The Stata Journal
Volume 14 Number 4: pp. 817-829




txttool: Utilities for text analysis in Stata

Unislawa Williams
Spelman College
Atlanta, GA
uwilliams@spelman.edu
Sean P. Williams
SunTrust Bank
Atlanta, GA
sean.williams.1000@gmail.com
Abstract.  This article describes txttool, a command that provides a set of tools for managing free-form text. The command integrates several built-in Stata functions with new text capabilities, including a utility to create a bag-of-words representation of text and an implementation of Porter’s (1980, Program: Electronic library and information systems 14: 130–137) word-stemming algorithm. Collectively, these utilities provide a text-processing suite for text mining and other text-based applications in Stata.
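
To make the bag-of-words idea concrete, the following is a minimal sketch in Python rather than Stata. It does not reproduce txttool's syntax or internals: the function name bag_of_words, the tiny stop-word list, and the sample documents are all assumptions made for illustration, and stemming is omitted entirely.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; not the list txttool uses.
STOP_WORDS = {"the", "a", "of", "and", "in"}


def bag_of_words(text, stop_words=STOP_WORDS):
    """Return a Counter mapping each lowercase word in text to its frequency."""
    # Lowercase the text and keep runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stop words, then count what remains.
    return Counter(t for t in tokens if t not in stop_words)


# Two made-up documents.
docs = ["The cat and the hat", "A cat in a hat in a cat"]
bags = [bag_of_words(d) for d in docs]

# One column per distinct word, one row per document.
vocab = sorted(set().union(*bags))
matrix = [[bag[w] for w in vocab] for bag in bags]
```

Each row of matrix holds one document's word counts in the column order given by vocab, which is the general shape of a bag-of-words representation: word order is discarded and only frequencies are kept.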


Keywords: txttool, text mining, Porter stemmer, bag of words, cleaning, stop words, subwords
