{smcl}
{* *! version 1.2.2 27jul2009}{...}
{cmd:help stpm2} {right: ({browse "http://www.stata-journal.com/article.html?article=st0165":SJ9-2: st0165})}
{hline}

{title:Title}

{p2colset 5 14 16 2}{...}
{p2col :{hi:stpm2} {hline 2}}Flexible parametric survival models{p_end}
{p2colreset}{...}


{title:Syntax}


{p 8 16 2}{cmd:stpm2} [{varlist}] {ifin}, {opt sc:ale(scalename)} [{it:options}]


{marker options}{...}
{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Model}
{synopt :{opt scale(scalename)}}specify scale on which survival model is to be fit{p_end}
{synopt :{opt df(#)}}specify degrees of freedom for baseline hazard function{p_end}
{synopt :{opt knots(numlist)}}specify knot locations for baseline hazard{p_end}
{synopt :{opt tvc(varlist)}}specify varlist of time-dependent effects{p_end}
{synopt :{opt dft:vc(df_list)}}specify degrees of freedom for each time-dependent effect{p_end}
{synopt :{opt knotst:vc(numlist)}}specify knot locations for time-dependent effects{p_end}
{synopt :{opt knscale(scale)}}specify scale for user-defined knots; default is {cmd:knscale(time)}{p_end}
{synopt :{opt bk:nots(knotslist)}}specify boundary knots{p_end}
{synopt :{opt noorth:og}}do not use orthogonal transformation of spline variables{p_end}
{synopt :{opt bhaz:ard(varname)}}invoke relative survival models; {it:varname} holds the expected mortality rate (hazard) at the time of death or censoring{p_end}
{synopt :{opt nocons:tant}}suppress constant term{p_end}
{synopt :{opt st:ratify(varlist)}}for backward compatibility with {cmd:stpm}{p_end}
{synopt :{cmdab:th:eta(}{cmd:est}|{it:#}{cmd:)}}for backward compatibility with {cmd:stpm}{p_end}

{syntab:Reporting}
{synopt :{opt alleq}}report all equations used by {cmd:ml}{p_end}
{synopt :{opt ef:orm}}report exponentiated coefficients{p_end}
{synopt :{opt keepc:ons}}do not drop constraints used in {cmd:ml} routine{p_end}
{synopt :{opt l:evel(#)}}set confidence level; default is {cmd:level(95)}{p_end}
{synopt :{opt showc:ons}}list constraints in output{p_end}

{syntab:Max options}
{synopt :{opt const:heta(#)}}constrain value of theta{p_end}
{synopt :{opt initt:heta(#)}}specify initial value of theta{p_end}
{synopt :{opt lin:init}}obtain initial values by first fitting a linear function of ln(time); seldom used{p_end}
{synopt :{it:{help streg##maximize_options:maximize_options}}}control maximization process; seldom used{p_end}
{synoptline}
{p2colreset}{...}
{p 4 6 2}
You must {cmd:stset} your data before using {cmd:stpm2}; see {manhelp stset ST}.{p_end}


{title:Description}

{pstd}
{cmd:stpm2} fits flexible parametric survival models (Royston-Parmar models). {cmd:stpm2} 
can be used with single- or multiple-record or single- or multiple-failure {cmd:st} data.
Survival models can be fit on the log cumulative hazard scale, the log cumulative
odds scale, the standard normal deviate (probit) scale, or on a scale defined by the
value of theta using the Aranda-Ordaz family of link functions.

{pstd}
{cmd:stpm2} can fit the same models as {cmd:stpm}, but {cmd:stpm2} is more flexible in that it does
not force the knots for time-dependent effects to be the same as those used
for the baseline distribution function. {cmd:stpm2} can also fit relative survival
models by use of the {cmd:bhazard()} option, its postestimation commands go beyond
those available for {cmd:stpm}, and it is noticeably faster than {cmd:stpm}.

{pstd}
See {manhelp streg ST} for other (standard) parametric survival models.


{title:Options}

{dlgtab:Model}

{phang}
{opt scale(scalename)} specifies on which scale the survival model is to be
fit.

{pmore}
{cmd:scale({ul:h}azard)} fits a model on the log cumulative hazard scale,
i.e., the scale of ln[-ln{S(t)}]. If no time-dependent effects are specified,
the resulting model has proportional hazards.

{pmore}
{cmd:scale({ul:o}dds)} fits a model on the log cumulative odds scale,
i.e., ln[{1 - S(t)}/S(t)]. If no time-dependent effects
are specified, then this is a proportional-odds model.

{pmore}
{cmd:scale({ul:n}ormal)} fits a model on the normal equivalent deviate
scale, i.e., a probit link for the survival function invnorm{1 - S(t)}.

{pmore}
{cmd:scale({ul:t}heta)} fits a model on a scale defined by the value of theta
for the Aranda-Ordaz family of link functions, i.e.,
ln[{S(t)^(-theta) - 1}/theta]. theta = 1 corresponds to a
proportional-odds model, and the limit as theta approaches 0 corresponds to a
proportional cumulative-hazard model.
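
{pmore}
To make the link concrete, the following Python sketch (not part of {cmd:stpm2}; the function name {cmd:aranda_ordaz_link()} is hypothetical) evaluates the Aranda-Ordaz transformation of a survival probability and checks the two special cases described above. It assumes theta >= 0 and treats theta = 0 as the limiting log cumulative-hazard case.

```python
import math


def aranda_ordaz_link(s, theta):
    """Aranda-Ordaz transform ln[(S^(-theta) - 1)/theta] of a survival probability.

    Assumes 0 < s < 1 and theta >= 0; theta == 0 returns the limiting value
    ln[-ln(S)], i.e. the log cumulative-hazard scale.
    """
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie strictly between 0 and 1")
    if theta < 0:
        raise ValueError("this sketch only handles theta >= 0")
    if theta == 0:
        return math.log(-math.log(s))
    # expm1 computes S^(-theta) - 1 accurately even when theta is tiny
    return math.log(math.expm1(-theta * math.log(s)) / theta)


s = 0.7
print(aranda_ordaz_link(s, 1.0), math.log((1 - s) / s))    # theta = 1: log odds
print(aranda_ordaz_link(s, 1e-8), math.log(-math.log(s)))  # theta near 0: log cumulative hazard
```
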

{phang} {opt df(#)} specifies the degrees of freedom (df) for the restricted cubic
spline function used for the baseline hazard rate. {it:#} must be between 1
and 10, but a value between 1 and 5 is usually sufficient.
The {cmd:knots()} option is not applicable if the {cmd:df()}
option is specified. The knots are placed at the following centiles of the
distribution of the uncensored log survival times:

        {hline 60}
        df  knots  Centile positions
        {hline 60}
         1    0    (no knots)
         2    1    50
         3    2    33 67
         4    3    25 50 75
         5    4    20 40 60 80
         6    5    17 33 50 67 83
         7    6    14 29 43 57 71 86
         8    7    12.5 25 37.5 50 62.5 75 87.5
         9    8    11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9
        10    9    10 20 30 40 50 60 70 80 90
        {hline 60}

{pmore}
These are internal knots. In addition, boundary knots are placed at the
minimum and maximum of the distribution of uncensored survival times.
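
{pmore}
The centile positions in the table follow a simple rule: with {cmd:df(}{it:#}{cmd:)}, there are {it:#} - 1 internal knots at the equally spaced centiles 100k/{it:#}, k = 1, ..., {it:#} - 1 (the table rounds some of these values). The Python sketch below reproduces the table; {cmd:default_knot_centiles()} is a hypothetical helper, not code taken from {cmd:stpm2}.

```python
def default_knot_centiles(df):
    """Centile positions of the internal knots implied by df(#), before rounding."""
    if not 1 <= df <= 10:
        raise ValueError("df must be between 1 and 10")
    # df - 1 internal knots, equally spaced on the centile scale
    return [100.0 * k / df for k in range(1, df)]


for df in range(1, 11):
    centiles = default_knot_centiles(df)
    print(df, len(centiles), " ".join("%.1f" % c for c in centiles))
```
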

{phang}
{opt knots(numlist)} specifies knot locations for the baseline distribution
function, as opposed to the default locations set by {cmd:df()}. The knot
locations are specified on the scale defined by {cmd:knscale()}; however, the
restricted cubic spline function itself always uses log time.
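
{pmore}
For illustration, the Python sketch below evaluates one common truncated-power form of a restricted cubic spline basis on log time: the first term is ln(t) itself, and each internal knot adds a cubic term constructed so that the function is linear beyond the boundary knots. This is an assumption about the general form only; {cmd:stpm2} also orthogonalizes the spline variables unless {cmd:noorthog} is specified, so its actual variables will differ. The knot values and the name {cmd:rcs_basis()} are hypothetical.

```python
import math


def rcs_basis(x, knots):
    """Restricted cubic spline basis at x (here x = ln t).

    knots is sorted: boundary knot k_min, the internal knots, boundary knot
    k_max (all on the log-time scale). Returns one term per df: x itself
    plus one cubic term per internal knot.
    """
    k_min, k_max = knots[0], knots[-1]
    terms = [x]
    for k_j in knots[1:-1]:
        lam = (k_max - k_j) / (k_max - k_min)
        terms.append(max(x - k_j, 0.0) ** 3
                     - lam * max(x - k_min, 0.0) ** 3
                     - (1.0 - lam) * max(x - k_max, 0.0) ** 3)
    return terms


# Hypothetical knots: boundary knots at t = 0.5 and t = 8, one internal knot at t = 2
log_knots = [math.log(0.5), math.log(2.0), math.log(8.0)]
for t in [0.25, 1.0, 4.0, 16.0]:
    print(t, rcs_basis(math.log(t), log_knots))
```

The choice of lam makes the cubic and quadratic terms cancel above k_max, which is what "restricted" means here.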

{phang}
{opt tvc(varlist)} specifies the variables that have time-dependent effects.
Time-dependent effects are fit using restricted cubic splines, with the df
specified by the {opt dftvc()} option.

{phang} {opt dftvc(df_list)} specifies the df for time-dependent
effects. The df must be between 1 and 10; with 1 df, a linear effect of log
time is fit. If there is more than one time-dependent effect and a different
df is required for each, then the following syntax can be used:
{cmd:dftvc(x1:3 x2:2 1)}, where {cmd:x1} has 3 df, {cmd:x2} has 2 df, and any
remaining time-dependent effects have 1 df.
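
{pmore}
The rule for {cmd:dftvc()} can be sketched as follows. The Python function {cmd:parse_dftvc()} is hypothetical and is not {cmd:stpm2}'s own parser: {it:name}:{it:#} sets the df for one variable, and a bare {it:#} applies to every remaining variable in {cmd:tvc()}. Raising an error when a variable is left without a df is an assumption of this sketch.

```python
def parse_dftvc(spec, tvc_vars):
    """Map each tvc() variable to its df under the dftvc() syntax.

    spec is the text inside dftvc(), e.g. "x1:3 x2:2 1".
    """
    named = dict()
    default_df = None
    for token in spec.split():
        if ":" in token:
            name, df = token.split(":", 1)
            named[name] = int(df)
        else:
            default_df = int(token)      # bare number: df for remaining variables
    result = dict()
    for var in tvc_vars:
        df = named.get(var, default_df)
        if df is None:
            raise ValueError("no df given for tvc() variable " + var)
        if not 1 <= df <= 10:
            raise ValueError("df for " + var + " must be between 1 and 10")
        result[var] = df
    return result


print(parse_dftvc("x1:3 x2:2 1", ["x1", "x2", "x3"]))
```
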

{phang} {opt knotstvc(numlist)} specifies the location of the internal knots for
any time-dependent effects. If different knots are required for different
time-dependent effects, then this option can be specified as follows:
{cmd:knotstvc(x1 1 2 3 x2 1.5 3.5)}.
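
{pmore}
In the per-variable form, each variable name is followed by its own knot locations. The hypothetical Python helper {cmd:parse_knotstvc()} below sketches that grouping; as an assumption, a list with no variable names is treated as one set of knots shared by all time-dependent effects, and the knot values are left on the {cmd:knscale()} scale.

```python
def parse_knotstvc(spec):
    """Group a knotstvc() list into (variable, knots) pairs.

    A token that is not a number starts a new variable; numbers are knot
    locations for the most recent variable. A list with no variable names
    gives a single group labelled None, shared by every tvc() variable.
    """
    groups = []
    for token in spec.split():
        try:
            value = float(token)
        except ValueError:
            groups.append((token, []))    # variable name: start a new group
            continue
        if not groups:
            groups.append((None, []))     # bare numlist: common knots
        groups[-1][1].append(value)
    return groups


print(parse_knotstvc("x1 1 2 3 x2 1.5 3.5"))
```
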

{phang} {opt knscale(scale)} sets the scale on which user-defined knots are
specified.  {cmd:knscale(time)} denotes the original time scale,
{cmd:knscale(log)} denotes the log time scale, and {cmd:knscale(centile)}
specifies that the knots are taken to be centile positions in the distribution
of the uncensored log survival times.  The default is {cmd:knscale(time)}.
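
{pmore}
The Python sketch below shows how knots given on each {cmd:knscale()} scale map onto the log-time scale used by the spline. The function names are hypothetical, {it:event_times} is assumed to hold only uncensored survival times, and the linear-interpolation centile used here may not match the centile definition used by {cmd:stpm2}.

```python
import math


def percentile(sorted_values, p):
    """Linear-interpolation percentile (0 <= p <= 100) of a sorted list."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def knots_to_log_time(knots, knscale, event_times):
    """Express user-supplied knots on the log-time scale used by the spline.

    event_times is only needed for knscale="centile" and should hold the
    uncensored survival times.
    """
    if knscale == "time":
        return [math.log(k) for k in knots]
    if knscale == "log":
        return list(knots)
    if knscale == "centile":
        log_times = sorted(math.log(t) for t in event_times)
        return [percentile(log_times, c) for c in knots]
    raise ValueError("knscale must be time, log, or centile")


# Hypothetical uncensored survival times
times = [0.5, 1.2, 2.0, 3.1, 4.8, 7.5]
print(knots_to_log_time([20, 50, 80], "centile", times))
```
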

{phang}
{opt bknots(knotslist)} is a two-element list giving
the boundary knots. By default, these are located at the minimum and maximum
of the uncensored survival times. They are specified on the scale defined
by {cmd:knscale()}.

{phang}
{opt noorthog} suppresses the orthogonal transformation of the spline variables.

{phang} {opt bhazard(varname)} is used when fitting relative survival models.
{it:varname} gives the expected mortality rate at the time of death or censoring.
{cmd:stpm2} gives an error message when there are missing values of
{it:varname}, because this usually indicates that an error has occurred when
merging the expected mortality rates.
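
{pmore}
As a sketch, assume the usual additive formulation of relative survival, in which the all-cause hazard is the expected rate supplied by {cmd:bhazard()}, h*(t), plus a modeled excess hazard, lambda(t). One record's log-likelihood contribution is then d*ln[h*(t) + lambda(t)] - Lambda(t), where Lambda(t) is the cumulative excess hazard and terms not involving the model parameters are omitted (delayed entry is ignored). The function below is hypothetical, not {cmd:stpm2}'s code; it mirrors the check for missing expected rates described above.

```python
import math


def excess_hazard_loglik(d, expected_rate, excess_hazard, excess_cumhaz):
    """Log-likelihood contribution of one record (no delayed entry).

    d: 1 for a death, 0 if censored; expected_rate: population mortality
    rate at the exit time (the bhazard() variable); excess_hazard and
    excess_cumhaz: model-based excess hazard and cumulative excess hazard.
    Terms in the expected survival, which do not involve the model
    parameters, are omitted.
    """
    if expected_rate is None or math.isnan(expected_rate):
        raise ValueError("missing expected mortality rate; check the merge")
    # only deaths contribute the log of the all-cause hazard
    log_total = math.log(expected_rate + excess_hazard) if d else 0.0
    return d * log_total - excess_cumhaz


print(excess_hazard_loglik(1, 0.02, 0.05, 0.3))
print(excess_hazard_loglik(0, 0.02, 0.05, 0.3))
```
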

{phang}
{opt noconstant};
see {helpb st estimation options##noconstant:[R] estimation options}.

{phang}
{opt stratify(varlist)} is provided for backward compatibility with {helpb stpm}.
Members of {it:varlist} are modeled with time-dependent effects. See
the {opt tvc()} and {opt dftvc()} options for {cmd:stpm2}'s way of
specifying time-dependent effects.

{phang}
{cmd:theta(}{cmd:est}|{it:#}{cmd:)} is provided for backward compatibility with
{helpb stpm}. {cmd:est} requests that theta be estimated, whereas {it:#}
fixes theta to {it:#}. See {opt constheta()} and {opt inittheta()} for
{cmd:stpm2}'s way of specifying theta.


{dlgtab:Reporting}

{phang}
{opt alleq} reports all equations used by {cmd:ml}. The models are fit using
various constraints for parameters associated with the derivatives of the
spline functions. These parameters are generally not of interest and thus
are not shown by default. Also, an extra equation is used when fitting
delayed-entry models; again, this is not shown by default.

{phang}
{opt eform} reports the exponentiated coefficients. For models on the log
cumulative-hazard scale, {cmd:scale(hazard)}, this gives hazard ratios if
the covariate is not time dependent. Similarly, for models on the log
cumulative-odds scale, {cmd:scale(odds)}, this option gives odds ratios
for non-time-dependent effects.
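
{pmore}
For a non-time-dependent covariate with coefficient b, the reported ratio is exp(b), and, as is standard for Stata's exponentiated output, the confidence limits are the exponentiated limits of the interval for b. A minimal Python sketch under that assumption, using a made-up coefficient and standard error rather than output from {cmd:stpm2}; the name {cmd:eform_ci()} is hypothetical.

```python
import math
from statistics import NormalDist


def eform_ci(b, se, level=95):
    """Exponentiated coefficient with a Wald confidence interval.

    The interval b +/- z*se is formed on the coefficient scale and then
    exponentiated, so it is not symmetric around exp(b).
    """
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    return math.exp(b - z * se), math.exp(b), math.exp(b + z * se)


# Hypothetical log hazard ratio and standard error
lower, ratio, upper = eform_ci(-0.35, 0.13)
print(round(lower, 3), round(ratio, 3), round(upper, 3))
```
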

{phang}
{opt keepcons} prevents {cmd:stpm2} from dropping the constraints that it
imposes on the derivatives of the spline function when fitting delayed-entry
models. By default, these constraints are dropped.

{phang}
{opt level(#)} specifies the confidence level, as a percentage, for confidence
intervals.  The default is {cmd:level(95)} or as set by {helpb set level}.

{phang}
{opt showcons} lists in the output the constraints used by {cmd:stpm2} for the derivatives of the spline function and for fitting delayed-entry models; by default, they are not listed.

{marker maximize_options}{...}
{dlgtab:Max options}
 
{phang}
{opt constheta(#)} constrains the value of theta; i.e., it is treated as a known
constant.

{phang}
{opt inittheta(#)} specifies an initial value for theta in the Aranda-Ordaz
family of link functions.

{phang}
{opt lininit} obtains initial values by fitting only the first spline
basis function (i.e., a linear function of log survival time).
This option is seldom needed.

{phang}
{it:maximize_options}: {opt dif:ficult}, {opt tech:nique(algorithm_spec)}, 
{opt iter:ate(#)}, [{cmdab:no:}]{opt lo:g}, {opt tr:ace}, {opt grad:ient}, 
{opt showstep}, {opt hess:ian}, {opt shownr:tolerance}, {opt tol:erance(#)}, 
{opt ltol:erance(#)}, {opt gtol:erance(#)}, {opt nrtol:erance(#)}, 
{opt nonrtol:erance}, {opt from(init_specs)}; see {manhelp maximize R}.  These 
options are seldom used, but the {opt difficult} option may be useful if there
are convergence problems when fitting models that use the Aranda-Ordaz family of link
functions.


{title:Remarks}

{pstd}
Let t denote time. {cmd:stpm2} works by first calculating the survival function,
S(t), after fitting a Cox proportional hazards model. The procedure is
illustrated here for proportional hazards models, specified by the
{cmd:scale(hazard)} option. The estimate of S(t) is converted to an estimate of
the log cumulative hazard function, Z(t), by the formula

{pin}
	Z(t) = ln[-ln{S(t)}]

{pstd}
This estimate of Z(t) is then smoothed on ln(t) by using regression splines with
knots placed at certain quantiles of the distribution of t. The knot positions
are chosen automatically if the spline complexity is specified by the {cmd:df()}
option, or manually by way of the {cmd:knots()} option. (The knots
are placed on values of ln(t), not t.) Denote the predicted values of the log cumulative
hazard function by Z_hat(t). The density function, f(t), is

{pin}
	f(t) = -dS(t)/dt = -(dS/dZ_hat)(dZ_hat/dt) = S(t) exp(Z_hat) dZ_hat(t)/dt

{pstd}
dZ_hat(t)/dt is computed from the regression coefficients of the fitted spline
function. The estimated survival function is calculated as

{pin}
	S_hat(t) = exp{-exp Z_hat(t)}

{pstd}
The hazard function is calculated as f(t)/S_hat(t).
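
{pstd}
These relations can be checked numerically. The Python sketch below uses a hypothetical {cmd:df(1)} model on {cmd:scale(hazard)}, Z(t) = gamma0 + gamma1 ln(t), with made-up coefficients. It computes S(t), f(t), and h(t) from the formulas above, compares f(t) with a finite-difference derivative of S(t), and compares h(t) with the Weibull hazard gamma1 exp(gamma0) t^(gamma1 - 1) implied by a Z(t) that is linear in ln(t).

```python
import math

# Hypothetical df(1) coefficients on scale(hazard): Z(t) = gamma0 + gamma1*ln(t)
gamma0, gamma1 = -2.0, 1.3


def log_cumhaz(t):
    return gamma0 + gamma1 * math.log(t)


def dlog_cumhaz_dt(t):
    return gamma1 / t


def survival(t):
    # S(t) = exp(-exp(Z(t)))
    return math.exp(-math.exp(log_cumhaz(t)))


def density(t):
    # f(t) = S(t) * exp(Z(t)) * dZ(t)/dt
    return survival(t) * math.exp(log_cumhaz(t)) * dlog_cumhaz_dt(t)


def hazard(t):
    # h(t) = f(t) / S(t)
    return density(t) / survival(t)


t, eps = 3.0, 1e-6
numeric_f = -(survival(t + eps) - survival(t - eps)) / (2 * eps)
weibull_h = gamma1 * math.exp(gamma0) * t ** (gamma1 - 1)
print(density(t), numeric_f)
print(hazard(t), weibull_h)
```
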

{pstd}
If {it:varlist} is specified, the baseline survival function (i.e., at zero values
of the covariates) is used instead of the survival function of the raw
observations. With {cmd:df(1)}, a Weibull model is fit.

{pstd}
With {cmd:scale(normal)}, smoothing is of the normal quantile function,
invnorm{1 - S(t)}, instead of the log cumulative-hazard function. With
{cmd:df(1)}, a lognormal model is fit.

{pstd}
With {cmd:scale(odds)}, smoothing is of the log odds-of-failure function,
ln[{1 - S(t)}/S(t)], instead of the log cumulative-hazard function. With
{cmd:df(1)}, a loglogistic model is fit.
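
{pstd}
Each scale therefore links S(t) to the fitted index Z(t) in a different way. The hypothetical Python helper below inverts each link to recover S(t); with {cmd:df(1)}, Z(t) is linear in ln(t), and these inversions give Weibull, loglogistic, and lognormal survival curves for the hazard, odds, and normal scales, respectively. It is a sketch of the formulas above, not {cmd:stpm2}'s prediction code.

```python
import math
from statistics import NormalDist


def survival_from_z(z, scale):
    """Recover S(t) from the index Z(t) on each scale."""
    if scale == "hazard":
        return math.exp(-math.exp(z))      # Z = ln[-ln S]
    if scale == "odds":
        return 1.0 / (1.0 + math.exp(z))   # Z = ln[(1 - S)/S]
    if scale == "normal":
        return 1.0 - NormalDist().cdf(z)   # Z = invnorm(1 - S)
    raise ValueError("scale must be hazard, odds, or normal")


# Hypothetical df(1) index Z(t) = gamma0 + gamma1*ln(t), evaluated at t = 2
gamma0, gamma1 = -1.0, 1.5
z = gamma0 + gamma1 * math.log(2.0)
for scale in ["hazard", "odds", "normal"]:
    print(scale, survival_from_z(z, scale))
```
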

{pstd}
Estimation is performed by maximum likelihood. Optimization uses the
default technique, {cmd:nr} (meaning Stata's version of Newton-Raphson
iteration).


{title:Examples}

{pstd}Setup{p_end}

{phang2}{stata "webuse brcancer"}{p_end}
{phang2}{stata "stset rectime, failure(censrec = 1)"}{p_end}

{pstd}Proportional hazards model{p_end}

{phang2}{stata "stpm2 hormon, scale(hazard) df(4) eform"}{p_end}

{pstd}Proportional odds model{p_end}

{phang2}{stata "stpm2 hormon, scale(odds) df(4) eform"}{p_end}

{pstd}Time-dependent effects on cumulative hazard scale{p_end}

{phang2}{stata "stpm2 hormon, scale(hazard) df(4) tvc(hormon) dftvc(3)"}{p_end}

{pstd}User-defined knots at centiles of uncensored event times{p_end}

{phang2}{stata "stpm2 hormon, scale(hazard) knots(20 50 80) knscale(centile)"}{p_end}


{title:Author}

{pstd}Paul C. Lambert{p_end}
{pstd}Centre for Biostatistics and Genetic Epidemiology{p_end}
{pstd}Department of Health Sciences{p_end}
{pstd}University of Leicester, UK{p_end}
{pstd}paul.lambert@le.ac.uk{p_end}


{title:Also see}

{psee}
Article: {it:Stata Journal}, volume 9, number 2: {browse "http://www.stata-journal.com/article.html?article=st0165":st0165}

{psee}
Online:  {helpb stpm2_postestimation}; {manhelp stset ST}, {helpb stpm} (if
installed)
{p_end}