{smcl}
{hline}
help for {hi:qhapipf} {right: (TSJ-2: st0008)}
{hline}

{title:Analysis of Quantitative traits using regression and log-linear modelling when}
{title:PHASE is unknown}

{p 8 27}
{cmdab:qhapipf}
{it:varlist}
[{cmd:using} {it:filename}]
[{cmd:if} {it:exp}]
{cmd:,}
{cmd:qt(}{it:varname}{cmd:)}
[{cmd:ipf(}{it:string}{cmd:)}
{cmdab:reg:ress(}{it:string}{cmd:)}
{cmd:start}
{cmdab:dis:play}
{cmd:known}
{cmd:phase(}{it:varname}{cmd:)}
{cmd:acc(}{it:#}{cmd:)}
{cmd:ipfacc(}{it:#}{cmd:)}
{cmd:nolog}
{cmd:model(}{it:#}{cmd:)}
{cmd:lrtest(}{it:#}{cmd:,}{it:#}{cmd:)}
{cmd:convars(}{it:varlist}{cmd:)}
{cmd:confile(}{it:filename}{cmd:)}
{cmd:mv}
{cmd:mvdel}
{cmd:hap(}{it:string}{cmd:)}
{cmd:menu}
]

{title:Description}

{p}This command models the relationship between a normally distributed quantitative trait, measured in a population-based random sample, and individuals' haplotypes. An EM algorithm resolves unknown haplotype phase. Covariates are constructed from the haplotypes and used in a regression model. The EM algorithm also handles missing typings, assuming that they are missing completely at random (MCAR).

{p}There are two distinct models. The first is the log-linear model for haplotype frequencies; further details of this procedure are given in the help for the Stata command {cmd:hapipf}. Haplotype frequencies are estimated under the assumption of Hardy--Weinberg equilibrium.

{p}The second is the regression model, which relates the haplotypes to the quantitative trait. This model is specified in {cmd:regress()}, with the dependent variable specified by the {cmd:qt()} option.

{p}The {cmd:regress()} option takes a syntax that specifies the dummy variables for the regression model. The syntax can specify within-loci, between-loci, and between-chromosome effects.

{title:Options}

{p 0 4}{cmd:qt}{cmd:(}{it:varname}{cmd:)} specifies the dependent variable in the regression model.

{p 0 4}{cmd:ipf(}{it:string}{cmd:)} specifies the log-linear model. It requires syntax of the form {hi:l1*l2+l3}.
Here {hi:l1*l2} allows all the interactions between the first two loci, and locus 3 is independent of them. This syntax is used in most books on log-linear modeling. "-" terms and brackets are not allowed.

{p 0 4}{cmd:regress(}{it:string}{cmd:)} specifies the regression model. The program then creates "dummy" variables for all the effects. A fuller description of this option is given in the examples.

{p 0 4}{cmd:start} specifies that the starting posterior weights of the EM algorithm are chosen at random.

{p 0 4}{cmd:display} specifies that the parameter estimates be displayed.

{p 0 4}{cmd:known} specifies that phase is known.

{p 0 4}{cmd:phase(}{it:varname}{cmd:)} specifies a variable that contains 1s where phase is known and 0s where phase is unknown.

{p 0 4}{cmd:acc(}{it:#}{cmd:)} specifies the convergence criterion based on the log likelihood.

{p 0 4}{cmd:ipfacc(}{it:#}{cmd:)} specifies the convergence criterion for the ipf algorithm.

{p 0 4}{cmd:nolog} suppresses the iteration log.

{p 0 4}{cmd:model}{cmd:(}{it:#}{cmd:)} specifies a label for the log-linear model being fitted. This label is used in the {hi:lrtest()} option.

{p 0 4}{cmd:lrtest(}{it:#}{cmd:,}{it:#}{cmd:)} performs a likelihood-ratio test between the two models saved by the {hi:model()} option.

{p 0 4}{cmd:convars(}{it:varlist}{cmd:)} specifies a list of variables in the constraints file.

{p 0 4}{cmd:confile(}{it:filename}{cmd:)} specifies the name of the constraints file.

{p 0 4}{cmd:mv} specifies that the algorithm should replace missing locus data (".") with a copy of each of the possible alleles at this locus. This is performed at the same stage as the handling of the missing phase, when the dataset is expanded into all possible observations. If this option is not specified and some of the alleles are missing, the algorithm treats the symbol "." as another allele.

{p 0 4}{cmd:mvdel} specifies that all subjects with missing alleles are deleted.{p_end}
{p 0 4}{cmd:hap(}{it:string}{cmd:)} specifies the haplotype of interest. The dummy variables in the regression are all related to this haplotype. If the user does not select a particular haplotype, the program chooses one.

{p 0 4}{cmd:menu} specifies that the command is run through a window interface.

{title:Examples}

{p}To execute the menu interface version of this command, type

{p 8 12}{inp:. qhapipf,menu}

{p}For the examples, I shall assume there are three loci a, b, and c. The pairs of alleles are contained in the six variables a1, a2, b1, b2, c1, and c2. Let the quantitative trait variable be y.

{p}All the models described here assume that the saturated model is fitted for the haplotype frequencies. For a single locus {hi:a}, this saturated model is specified by the option {cmd:ipf(l1)}. Given this, the regression models are specified in the {cmd:regress()} option, and the more common models are described below. All the regression models assume that there are two alleles per locus; when a locus has more than two alleles, the algorithm recodes them as the allele of interest versus all the other alleles, which form the reference group.

{p}The one-parameter constant model is specified by {cmd:reg(1)}. To add a parameter for the additive effect of the allele of interest, specify the option {cmd:reg([l1+l1])}, where {hi:l1} represents the first locus in the varlist. This is the one-locus single-point additive model (one-locus SAM). The terms between the [] (brackets) represent the within-locus model; in the SAM, the two chromosomes are independent but share the same parameter for the effect of the allele of interest. If the allelic effect depends on the chromosome, there are two parameters, and this model is specified by the option {cmd:reg([l1a+l1b])}; it captures the effect of parental imprinting rather than a common additive effect. Additionally, the within-locus between-chromosome interaction can be included by replacing the {hi:+} symbol with {hi:*}.
This interaction parameter is usually called the dominance parameter. The two models become {cmd:reg([l1*l1])} and {cmd:reg([l1a*l1b])}, respectively.

{p}The commands to fit these models are given below.

{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg(1) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1+l1]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1a+l1b]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1*l1]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1a*l1b]) qt(y)}{p_end}

{p}To test whether locus a is associated with the quantitative trait, compare the two regression models {hi:1} and {hi:[l1+l1]}. The first command below saves the additive model under label 0; the second saves the constant model under label 1 and performs the likelihood-ratio test between the two.

{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg([l1+l1]) model(0) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2, ipf(l1) reg(1) model(1) lrtest(0,1) qt(y)}{p_end}

{p}When modeling more than one locus, there are additional between-loci interaction terms. The within-loci interactions are specified within the [] (brackets), and the between-loci interactions are specified between the [] (brackets). The two-locus SAM now becomes the model {hi:[l1+l1]+[l2+l2]}, where the "+" symbol between the two sets of brackets specifies that the two loci are independent. An extension of this model allows one between-loci interaction (or "haplotype" effect); this is the two-locus multipoint additive model (two-locus MAM), specified by the option {cmd:reg([l1+l1]*[l2+l2])}. The saturated model that ignores parental imprinting is specified by the option {cmd:reg([l1*l1]*[l2*l2])}. This model contains between-chromosome interactions, which can be further divided into within-loci between-chromosome interactions (dominance parameters) and between-loci between-chromosome interactions. The full saturated model including parental imprinting is specified by the option {cmd:reg([l1a*l1b]*[l2a*l2b])}.

{p}The commands to fit these models are given below.

{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]+[l2+l2]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1+l1]*[l2+l2]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1*l1]*[l2*l2]) qt(y)}{p_end}
{p 8 12}{inp:. qhapipf a1 a2 b1 b2, ipf(l1*l2) reg([l1a*l1b]*[l2a*l2b]) qt(y)}{p_end}

{p}The algorithm estimates the haplotype frequencies internally, and the {cmd:ipf()} option specifies the log-linear model for them. Generally, this is taken to be the saturated model. It may be advantageous to use an intermediate model to reduce the number of parameters in the full joint likelihood; whether such a reduction is acceptable can be tested with the likelihood-ratio test, using the {cmd:model()} and {cmd:lrtest()} options.

{title:Author}

{p}Adrian Mander, MRC Biostatistics Unit, Forvie Site, Addenbrookes, Cambridge, UK.
Click here to see Adrian Mander's {browse "http://www.mrc-bsu.cam.ac.uk/personal/adrian":website}.
Email: {browse "mailto:adrian.mander@mrc-bsu.cam.ac.uk":adrian.mander@mrc-bsu.cam.ac.uk}

{title:Also see}

{p 0 19}On-line: help for {help hapipf} (if installed), {help ipf} (if installed){p_end}