{smcl}
{hline}
help for {hi:qhapipf} {right: (TSJ-2: st0008)}
{hline}


{title:Analysis of quantitative traits using regression and log-linear modeling when}
{title:PHASE is unknown}

{p 8 27}
{cmdab:qhapipf}
{it:varlist} [{cmd:using} {it:filename}] [{cmd:if} {it:exp}] 
 {cmd:,} 
 {cmd:qt(}{it:varname}{cmd:)}
[{cmd:ipf(}{it:string}{cmd:)}
 {cmdab:reg:ress(}{it:string}{cmd:)}
 {cmd:start}
 {cmdab:dis:play}
 {cmd:known}
 {cmd:phase(}{it:varname}{cmd:)}
 {cmd:acc(}{it:#}{cmd:)}
 {cmd:ipfacc(}{it:#}{cmd:)}
 {cmd:nolog}
 {cmd:model(}{it:#}{cmd:)}
 {cmd:lrtest(}{it:#}{cmd:,}{it:#}{cmd:)}
 {cmd:convars(}{it:varlist}{cmd:)}
 {cmd:confile(}{it:filename}{cmd:)}
 {cmd:mv}
 {cmd:mvdel}
 {cmd:hap(}{it:string}{cmd:)}
 {cmd:menu}
] 


{title:Description}

{p}This command models the relationship between a normally distributed
continuous variable in a population-based random sample and individuals'
haplotypes.  It uses an EM algorithm to resolve haplotype phase.
Covariates are constructed from the haplotypes and used in a regression model.
The EM algorithm also handles missing typings, assuming that they are missing
completely at random (MCAR).

{p}There are two distinct models: a log-linear model for the haplotype
frequencies and a regression model for the quantitative trait. The log-linear
model is specified in {cmd:ipf()}; further details of this procedure are found
in the help for the Stata command {cmd:hapipf}. Haplotype frequencies are
estimated under the assumption of Hardy-Weinberg equilibrium.

{p}The regression model relates the haplotypes to the quantitative trait.
This model is specified in {cmd:regress()} with the dependent variable
specified by the {cmd:qt()} option.

{p}The {cmd:regress()} option takes its own syntax to specify the dummy
variables for the regression model. The syntax can specify within-loci,
between-loci, and between-chromosome effects.


{title:Options}

{p 0 4}{cmd:qt}{cmd:(}{it:varname}{cmd:)} specifies the dependent variable in
the regression model.

{p 0 4}{cmd:ipf(}{it:string}{cmd:)} specifies the log-linear model. It
requires syntax of the form {hi:l1*l2+l3}. In this example, {hi:l1*l2} allows
all the interactions between the first two loci, and {hi:+l3} makes locus 3
independent of them. This syntax is used in most books on log-linear modeling.
"-" terms and brackets are not allowed.

{p 0 4}{cmd:regress(}{it:string}{cmd:)} specifies the regression
model.  The program then creates "dummy" variables for all the effects. A
fuller description of this option is given in the examples.

{p 0 4}{cmd:start} specifies that the starting posterior weights of the EM
algorithm are chosen at random.

{p 0 4}{cmd:display} specifies that the parameter estimates are displayed.

{p 0 4}{cmd:known} specifies that phase is known.

{p 0 4}{cmd:phase(}{it:varname}{cmd:)} specifies a variable that contains 1s
where phase is known and 0s where phase is unknown.

{p 0 4}{cmd:acc(}{it:#}{cmd:)} specifies the convergence criterion based on
the log likelihood.

{p 0 4}{cmd:ipfacc(}{it:#}{cmd:)} specifies the convergence criterion for the
iterative proportional fitting (IPF) algorithm.

{p 0 4}{cmd:nolog} suppresses the iteration log.

{p 0 4}{cmd:model}{cmd:(}{it:#}{cmd:)} specifies a label for the
log-linear model being fitted. This label is used in the {hi:lrtest()}
option.

{p 0 4}{cmd:lrtest(}{it:#}{cmd:,}{it:#}{cmd:)} performs a likelihood-ratio
test between the two models saved by the {hi:model()} option.

{p 0 4}{cmd:convars(}{it:varlist}{cmd:)} specifies a list of variables in the
constraints file.

{p 0 4}{cmd:confile(}{it:filename}{cmd:)} specifies the name of the
constraints file.

{p 0 4}{cmd:mv} specifies that the algorithm should replace missing locus data
(".") with a copy of each of the possible alleles at this locus. This is
performed at the same stage as the handling of the missing phase, when the
dataset is expanded into all possible observations. If this option is not
specified but some of the alleles are missing, the algorithm treats
the symbol "." as another allele.

{p 0 4}{cmd:mvdel} specifies that all subjects with missing alleles are
deleted.

{p 0 4}{cmd:hap(}{it:string}{cmd:)} specifies the haplotype of interest.
The dummy variables in the regression are all related to this haplotype. If
the user does not select a particular haplotype, one is chosen automatically.

{p 0 4}{cmd:menu} specifies that the command is run through a window
interface.


{title:Examples}

{p}To run the menu interface version of this command, type

{p 8 12}{inp:. qhapipf, menu}

{p}For the examples, I shall assume there are three loci a, b, and c.  The
pairs of alleles are contained in the 6 variables a1, a2, b1, b2, c1, and c2.
Let the quantitative trait variable be y.

{p}All the models described here assume that the saturated model is fitted
for the haplotype frequencies. For a single locus {hi:a}, this saturated model
is specified by the option {cmd:ipf(l1)}. Given this, the regression models are
specified in the {cmd:regress()} option, and the more common models are
described below. All the regression models assume that there are two alleles
per locus; when a locus has more than two alleles, the algorithm recodes them
as an allele of interest versus all the others, which form the reference group.

{p}The one-parameter constant model is specified by {cmd:reg(1)}.  To add a
second parameter for the additive effect of the allele of interest, the
model is specified by the option {cmd:reg([l1+l1])}, where {hi:l1} represents
the first locus in the varlist.  This is the one-locus single-point
additive model (one-locus SAM).  The terms between the [] (brackets) represent
the within-locus model; in the SAM, the two chromosomes are independent but
share the same parameter for the effect of the allele of interest.  If the
allelic effect depended on the chromosome, there would be two parameters, and
this is specified by the option {cmd:reg([l1a+l1b])}; this model represents
the effect of parental imprinting rather than a purely additive effect.
Additionally, the within-locus between-chromosome interaction can be included
by replacing the {hi:+} symbol with {hi:*}.  This interaction parameter is
usually called the dominance parameter.  The two models become
{cmd:reg([l1*l1])} and {cmd:reg([l1a*l1b])}, respectively.

{p}The commands to fit these models are given below.

{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg(1) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1+l1]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1a+l1b]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1*l1]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1a*l1b]) qt(y)}{p_end}

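{p}To test whether locus a is associated with the quantitative trait, compare
the two regression models {hi:1} and {hi:[l1+l1]} by labeling each fit with
{cmd:model()} and requesting the likelihood-ratio test with {cmd:lrtest()}: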

{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1+l1]) model(0) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg(1) model(1) lrtest(0,1) qt(y)}{p_end}
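
{p}As a further illustration in the same pattern, the following sketch tests
for dominance at locus a by comparing the regression models {hi:[l1*l1]} and
{hi:[l1+l1]}.  It assumes the same variables a1, a2, and y, and it assumes that
the {cmd:model()} labels and the order of the arguments to {cmd:lrtest()}
behave as in the previous pair of commands; the labels 0 and 1 are arbitrary
choices.

{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1*l1]) model(0) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1+l1]) model(1) lrtest(0,1) qt(y)}{p_end}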

{p}When modeling more than one locus, there are additional between-loci
interaction terms.  The within-loci interactions are specified within the []
(brackets), and the between-loci interactions are specified between the []
(brackets).  The two-locus SAM now becomes the model {hi:[l1+l1]+[l2+l2]},
where the "+" symbol between the two sets of brackets specifies that the two
loci are independent.  An extension of this model allows one between-loci
interaction (or "haplotype" effect); this is the two-locus multipoint
additive model (two-locus MAM), which is specified by the option
{cmd:reg([l1+l1]*[l2+l2])}. The saturated model that ignores parental
imprinting is specified by the option {cmd:reg([l1*l1]*[l2*l2])}. This model
contains between-chromosome interactions. Between-chromosome interactions can
be further divided into within-loci between-chromosome interactions (dominance
parameters) and between-loci between-chromosome interactions. The full
saturated model including parental imprinting is specified by the option
{cmd:reg([l1a*l1b]*[l2a*l2b])}.

{p}The commands to fit these models are given below.

{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]*[l2+l2]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1*l1]*[l2*l2]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1a*l1b]*[l2a*l2b]) qt(y)}{p_end}

{p}The algorithm calculates the haplotype frequencies internally, and the
log-linear model option {cmd:ipf()} specifies this model. Generally, it is
taken to be the saturated model. It may be advantageous to use an intermediate
model to reduce the number of parameters in the full joint likelihood. Whether
an intermediate model is adequate can also be tested with this command by
using the likelihood-ratio test.
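
{p}For instance, a hedged sketch of such a test for two loci keeps the
regression model fixed at the two-locus SAM and compares the saturated
log-linear model {hi:l1*l2} with the independence model {hi:l1+l2}.  The
choice of {hi:l1+l2} as the intermediate model, the choice of regression
model, and the labels 0 and 1 are assumptions made for illustration only.

{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) model(0) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1+l2) reg([l1+l1]+[l2+l2]) model(1) lrtest(0,1) qt(y)}{p_end}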


{title:Author}

{p}Adrian Mander, MRC Biostatistics Unit, Forvie Site, Addenbrookes, Cambridge, UK.
Click here to see Adrian Mander's {browse "http://www.mrc-bsu.cam.ac.uk/personal/adrian":WEB site}.
Email {browse "mailto:adrian.mander@mrc-bsu.cam.ac.uk":adrian.mander@mrc-bsu.cam.ac.uk}


{title:Also see}

{p 0 19}On-line: help for {help hapipf} (if installed), {help ipf} (if installed){p_end}