HyPhy message board | |
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> Category vars: how to do gamma+discrete component? http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1153484845 Message started by jettlogic on Jul 21st, 2006 at 5:27am |
Title: Category vars: how to do gamma+discrete component? Post by jettlogic on Jul 21st, 2006 at 5:27am
Hi everyone,
For rate category variables, I'm looking for a way to combine a discretised gamma with equal-probability rate categories, (see code snippet), with a fixed-location discrete component. This would be much like the "Gamma with a class of invariant sites" model, except I want to place the discrete component at a particular non-zero rate r and have probability p (so that the Gamma distribution is multiplied by 1-p). I don't see any code in "Examples/Categories" that implements a Gamma+Invariant model, and my attempts to achieve this effect have bombed out with errors like "can't have a category variable depend on a category variable". Does anyone have an idea on how to go about doing this? global alpha = 0.5; alpha:>0.01; alpha:<100; beta:=alpha; category c = ( 5, /* number of rates */ EQUAL, /* probs. of rates */ MEAN, /* sampling method */ GammaDist(_x_,alpha,beta), /* density */ CGammaDist(_x_,alpha,beta), /* CDF */ 0, /* left bound */ 1e5, /* right bound */ CGammaDist(_x_,alpha+1,beta)*alpha/beta /* "antiderivative" of x f(x) */ ); |
Title: Re: Category vars: how to do gamma+discrete compon Post by jettlogic on Aug 1st, 2006 at 9:25am
Hi Sergei,
Thank you for the tips! I wouldn't have thought to use the extra parameter to the LikelihoodFunction to carry out the mixture as a mixture of two complete models. The resulting likelihoods behave well when comparing two mixtures (constraining some parameters reduces the likelihood, as expected). However, a simple (non-mixture) model like plain gamma or even plain point mass (!) comes out more likely than the mixture (by a few log-likelihood points), although the mixture is more general. (point mass significantly less likely than gamma of course) I have a few hypothesis, but after a lot of testing I suspect the mixture model is converging on some particular local optimum, or is converging too slowly to reach the true optimum in a reasonable step limit. I think the right approach is to optimise params for the gamma model, then for the point-mass model (keeping the TV_TS from gamma?), then optimise P such that the sum of "Log(SITE_LIKELIHOOD[0]*P+(1-P)*SITE_LIKELIHOOD[1])" over sites is maximised. However, if HyPhy is optimising everything simultaneously it could explain the local or slow convergence (?) Does this sound right to you? Would you like to see the test code? |
Title: Re: Category vars: how to do gamma+discrete compon Post by Sergei on Aug 1st, 2006 at 10:52am
Dear jettlogic,
wrote on Aug 1st, 2006 at 9:25am:
This does sound like a convergence issue. I'd definitely like to see the test code (you could just e-mail it to me) and help you resolve the issue. Cheers, Sergei |
Title: Re: Category vars: how to do gamma+discrete compon Post by jettlogic on Aug 2nd, 2006 at 7:51am
To check that the model mixture isn't converging, I compared it to something conceptually equivalent but implemented without model mixture.
Since I don't know how to make gamma+point mass using just a category variable (would like to know how, please!), I used gamma+invariant. The first uses 5-cat category var (first being invariant sites), and the second a 4-cat gamma mixed with an invariant model. The results are for P51.nex. For this code, the model-mixture gets a worse likelihood, and reject having invariant sites by setting IP=0 (!). I also noticed that with gamma+point-mass the model mixture rejects putting a point-mass anywhere (P=0), and produces worse-than-gamma likelihoods. What do you think is causing the above? Code (] Gamma+Invariant using category variable Log Likelihood = -3285.62637458753; Shared Parameters: IP=0.205486 alpha=0.369268 TV_TS=0.0989337 beta=alpha/(1-IP)=0.464773 Gamma+Invariant using model mixture Log Likelihood = -3286.09788196153; Shared Parameters: IP=0 alpha=0.225829 TV_TS=0.0992244 beta=alpha/(1-IP)=0.225829 [/code):
|
Title: Re: Category vars: how to do gamma+discrete compon Post by Sergei on Aug 9th, 2006 at 10:52am
Dear Graham,
This example code revealed a bug in how the HyPhy core handles model mixtures when one of the components had category variables, and the other did not. I greatly appreciate your bringing this to my attention. The problem has now been fixed (as of Aug 9th 2006) builds, and the slightly modified example code from your example (pasted here), now gives very similar results with two parameterizations. I realized that my earlier example was not scaling the mean of the mixture to be one correctly, and I also adjusted the number of rate categories to be the same in the gamma component in both cases. Code (] DataSet nucleotideSequences = ReadDataFile ("../data/p51.nex"); DataSetFilter filteredData = CreateFilter ( nucleotideSequences, 1 ); /* Get observed nucleotide frequencies */ HarvestFrequencies ( observedFreqs, filteredData, 1, 1, 1 ); /******************************************************************** Gamma+Invariant LF using category variable ********************************************************************/ fprintf ( stdout, "Gamma+Invariant using category variable\n\n" ); global alpha = 1.0; alpha:>0.01; alpha:<100; /* shape parameter */ global beta = 1.0; beta:>0.01; beta:<100; /* beta paramter */ global TV_TS; TV_TS:>0; /* transversion/transition rate ratio */ global ncats = 5; /* number of rate categories */ global IP = 1/ncats; IP :< 1-(1e-10); /* weight of invariant category */ beta:=alpha*(1-IP); /* mean of rate dist:=1 => (1-IP)*(alpha/beta)+IP*0 := 1 */ catFreqMatrix = { ncats, 1 }; catFreqMatrix[0):
Output [code] amma+Invariant using category variable Log Likelihood = -3285.32599393655; Shared Parameters: IP=0.438837 alpha=0.791996 TV_TS=0.0980606 beta=alpha*(1-IP)=0.444439 Tree givenTree=((((D_CD_83_ELI_ACC_K03454:0.0214343,D_CD_83_NDK_ACC_M27323:0.0101549)Node3:0.0121809,D_UG_94_94UG114_ACC_U88824:0.0659841)Node2:0.00275392,D_CD_84_84ZR085_ACC_U88822:0.0326156)Node1:0.0278006,B_US_83_RF_ACC_M17451:0.0301682,((B_FR_83_HXB2_ACC_K03455:0.0124699,B_US_86_JRFL_ACC_U63632:0.0189762)Node10:0.00189701,B_US_90_WEAU160_ACC_U21135:0.0225559)Node9:0.00470909); Gamma+Invariant using model mixture Log Likelihood = -3285.32535002373; Shared Parameters: IP=0.440838 alpha=0.799072 TV_TS=0.0980331 beta=alpha*(1-IP)=0.446811 Tree givenTree1=((((D_CD_83_ELI_ACC_K03454:0.0383084,D_CD_83_NDK_ACC_M27323:0.0181054)Node3:0.0218335,D_UG_94_94UG114_ACC_U88824:0.118069)Node2:0.00489725,D_CD_84_84ZR085_ACC_U88822:0.0583811)Node1:0.0497151,B_US_83_RF_ACC_M17451:0.0539374,((B_FR_83_HXB2_ACC_K03455:0.0222191,B_US_86_JRFL_ACC_U63632:0.0340032)Node10:0.00337515,B_US_90_WEAU160_ACC_U21135:0.0403464)Node9:0.0084376); Tree givenTree2=((((D_CD_83_ELI_ACC_K03454:0,D_CD_83_NDK_ACC_M27323:0)Node3:0,D_UG_94_94UG114_ACC_U88824:0)Node2:0,D_CD_84_84ZR085_ACC_U88822:0)Node1:0,B_US_83_RF_ACC_M17451:0,((B_FR_83_HXB2_ACC_K03455:0,B_US_86_JRFL_ACC_U63632:0)Node10:0,B_US_90_WEAU160_ACC_U21135:0)Node9:0); [/code] Cheers, Sergei |
HyPhy message board » Powered by YaBB 2.5.2! YaBB Forum Software © 2000-2024. All Rights Reserved. |