The Stata Journal
Volume 14 Number 4: pp. 817-829

txttool: Utilities for text analysis in Stata

Unislawa Williams
Spelman College
Atlanta, GA
Sean P. Williams
SunTrust Bank
Atlanta, GA
Abstract. This article describes txttool, a command that provides a set of tools for managing free-form text. The command integrates several built-in Stata functions with new text-processing capabilities. The new capabilities include a utility to create a bag-of-words representation of text and an implementation of Porter’s (1980, Program: Electronic library and information systems 14: 130–137) word-stemming algorithm. Collectively, these utilities provide a text-processing suite for text mining and other text-based applications in Stata.
Keywords: txttool, text mining, Porter stemmer, bag of words, cleaning, stop words, subwords

