The Stata Journal
Volume 18 Number 1: pp. 101-117

ldagibbs: A command for topic modeling in Stata using latent Dirichlet allocation

Carlo Schwarz
University of Warwick
Coventry, UK
Abstract.  In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
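
The abstract describes the two distributions that latent Dirichlet allocation estimates, and the keywords name Gibbs sampling as the estimation method. The sketch below shows collapsed Gibbs sampling for latent Dirichlet allocation in Python. It is not the `ldagibbs` implementation, and it makes several assumptions of its own:

* The function name `lda_gibbs` and its parameters are hypothetical.
* Words are assumed to be pre-encoded as integer ids.
* The priors `alpha` and `beta` are assumed to be symmetric.
* NumPy is assumed to be available.

```python
import numpy as np


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical sketch of collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in [0, n_words).
    alpha, beta: assumed symmetric Dirichlet priors on topics and words.
    Returns (theta, phi): document-topic and topic-word probability matrices.
    """
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    ndk = np.zeros((n_docs, n_topics))   # tokens assigned to topic k in document d
    nkw = np.zeros((n_topics, n_words))  # times word w is assigned to topic k
    nk = np.zeros(n_topics)              # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Conditional probability of each topic given all other assignments.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1

    # Posterior mean estimates; each row is a probability distribution.
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (nkw + beta) / (nk[:, None] + n_words * beta)
    return theta, phi


docs = [[0, 1, 2] * 5, [3, 4, 5] * 5, [0, 2, 1, 0] * 3, [4, 3, 5, 5] * 3]
theta, phi = lda_gibbs(docs, n_topics=2, n_words=6, n_iter=50, seed=1)
```

How to read the output:

* Row `d` of `theta` is document `d`'s probability distribution over topics.
* Row `k` of `phi` is topic `k`'s probability distribution over words.

These matrices mirror the two representations described in the abstract. The number of topics is supplied by the user, as the abstract notes.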

Keywords: ldagibbs, machine learning, latent Dirichlet allocation, Gibbs sampling, topic model, text analysis