ldagibbs: A command for topic modeling in Stata using latent Dirichlet allocation
Abstract. In this article, I introduce the ldagibbs command, which implements
latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most
popular machine-learning topic model. Topic models automatically cluster text
documents into a user-chosen number of topics. Latent Dirichlet allocation
represents each document as a probability distribution over topics and
represents each topic as a probability distribution over words. Therefore,
latent Dirichlet allocation provides a way to analyze the content of large
collections of unclassified text and offers an alternative to predefined
document classifications.
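
The two distributions described in the abstract can be illustrated with a small worked example. The sketch below is not output from ldagibbs and does not use its interface. The topic names, the three-word vocabulary, and all probabilities are invented for illustration. It only shows how a document's distribution over topics and each topic's distribution over words combine into the probability of seeing a given word in that document.

```python
# Toy illustration of the two distributions in latent Dirichlet allocation.
# All names and numbers are hypothetical; nothing here is produced by ldagibbs.

# Each topic is a probability distribution over the vocabulary.
topic_word = {
    "economy": {"tax": 0.50, "market": 0.40, "goal": 0.10},
    "sports": {"tax": 0.05, "market": 0.15, "goal": 0.80},
}

# One document is a probability distribution over topics.
doc_topic = {"economy": 0.7, "sports": 0.3}


def word_probability(word, doc_topic, topic_word):
    """Probability of a word in the document: sum over topics of
    P(topic | document) * P(word | topic)."""
    return sum(doc_topic[t] * topic_word[t][word] for t in doc_topic)


vocab = ["tax", "market", "goal"]
word_probs = {w: word_probability(w, doc_topic, topic_word) for w in vocab}

for w in vocab:
    print(f"P({w} | document) = {word_probs[w]:.3f}")
```

Because both inputs are proper probability distributions, the resulting word probabilities for the document also sum to one. In practice, the distributions are not chosen by hand as they are here but are estimated from the text, in the case of ldagibbs by Gibbs sampling.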
Author: Carlo Schwarz
Keywords: ldagibbs, machine learning, latent Dirichlet allocation, Gibbs sampling, topic model, text analysis