The Stata Journal
Volume 16 Number 4: pp. 1058-1071

Speaking Stata: Letter values as selected quantiles

Nicholas J. Cox
Department of Geography
Durham University
Durham, UK
Abstract.  Letter values were introduced and named by J. W. Tukey in the 1970s as selected quantiles. The idea is to choose a small set of quantiles to characterize an ordered sample of values, each quantile being defined as either an individual order statistic or the mean of two such statistics. The procedure is to start with the median, to continue with the quartiles, then with the first and last octiles, and so on. At each step, approximate medians of successively smaller tail fractions are identified, until the extremes, the minimum and maximum, are reached. As a historical aside, the same idea can be identified in work by Francis Galton from 1880.
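
As an illustration only, the procedure just described can be sketched in a few lines of Python. The sketch assumes the conventional depth rule usually attributed to Tukey: the median has depth (n + 1)/2, and each later depth is (floor(previous depth) + 1)/2, stopping at depth 1. It is not the code of lv or of lvalues. The function name letter_values and its return format are hypothetical and chosen only for this example.

```python
import math


def letter_values(values):
    """Return a list of (depth, lower, upper) tuples, median first.

    Assumption: the conventional depth rule is used, in which the median
    has depth (n + 1) / 2 and each subsequent depth is
    (floor(previous depth) + 1) / 2, ending at depth 1 (the extremes).
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("letter values need at least one observation")

    def at_depth(depth):
        # An integer depth picks a single order statistic; a half-integer
        # depth averages the two adjacent order statistics.
        lo = math.floor(depth)
        hi = math.ceil(depth)
        lower = (xs[lo - 1] + xs[hi - 1]) / 2  # counting up from the minimum
        upper = (xs[n - lo] + xs[n - hi]) / 2  # counting down from the maximum
        return lower, upper

    results = []
    depth = (n + 1) / 2  # depth of the median
    while True:
        lower, upper = at_depth(depth)
        results.append((depth, lower, upper))
        if depth <= 1:  # the extremes have been reached
            break
        depth = (math.floor(depth) + 1) / 2
    return results
```

For the nine values 1, 2, ..., 9, this sketch gives the median 5 at depth 5, quartiles 3 and 7 at depth 3, octiles 2 and 8 at depth 2, then 1.5 and 8.5 at depth 1.5, and finally the extremes 1 and 9 at depth 1.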

Letter values are supported by Stata through the official command lv, but that command is geared to letter-value displays and (arbitrarily) will compute no more than 21 letter values. A new command, lvalues, is introduced that supports calculation of letter values without such limits and is designed to save results in new variables for as many variables and distinct groups as are specified. Results may then be easily listed and (especially) plotted to identify and compare the level, spread, and shape of distributions. Examples are given with emphasis on quantile plots.
