HyPhy message board | |
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Theoretical questions >> Sequence Analysis >> Site Likelihoods = 0 http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1235471674 Message started by Miguel Lacerda on Feb 24th, 2009 at 2:34am |
Title: Site Likelihoods = 0 Post by Miguel Lacerda on Feb 24th, 2009 at 2:34am
Hi Sergei,
After optimising all the parameters (including branch lengths) of a GY94 codon substitution model, I obtained a log-likelihood of -13062369.95 indicating that 13 sites have a likelihood of zero under this model. I have examined these sites and cannot find anything which makes them stand out as unusual. 1. Why do some sites end up with a zero likelihood after parameter optimisation? 2. Is there a way around this as I require non-zero likelihoods for all sites? Thanks so much! Best regards, Miguel |
Title: Re: Site Likelihoods = 0 Post by Sergei on Feb 24th, 2009 at 7:29am
Dear Miguel,
Assuming that underflow scaling works as planned (of which I am 99% confident, but get SVN 414 or later), you will see 0 likelihoods either if there is a NAN somewhere in a calculation, or if Pr {*->Y} = 0 for any starting character and a FIXED ending character (i.e. along a terminal branch). This will happen in a clade like (X:0,Y:0)*, or if the Y-th column of the transition matrix is all 0's. Check for these two conditions, and if you can't find them, send me the file. Cheers, Sergei |
Title: Re: Site Likelihoods = 0 Post by Miguel Lacerda on Feb 25th, 2009 at 7:06am
Dear Sergei,
I have tried everything you have suggested using SVN v298. The branch lengths were jointly optimized in HyPhy along with all the other parameters in the GY94 model and there are no clades of the form (X:0,Y:0) - in fact, there is only 1 terminal branch of length zero. I have also checked the transition matrix P = exp(Q*t) for columns of zeros and did not find any. As estimating the parameters of a GY94 model and computing site-by-site likelihoods is a very standard analysis, I would not expect anything unusual that would produce NANs in the substitution matrix. The attached txt file contains the site-by-site log-likelihoods that I get when running the attached bf file. You will notice that 13 sites have -Inf. Is there any way around this? Thanks so much! Miguel http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=SiteLikelihoods_001.zip (63 KB | )
|
Title: Re: Site Likelihoods = 0 Post by Sergei on Feb 25th, 2009 at 3:59pm
Dear Miguel,
In the newer SVN code the -Inf goes away. What happens is that if a site underflowed, HyPhy used to deal with it in the optimization phase, but report a 0 probability (hence -Inf log likelihood) in the site reconstruction phase. Now [code] ConstructCategoryMatrix (siteLikelihoods,`lfID`) [/code] will return both the scaled siteLikelihoods (so the comparison between classes at a site is valid) and a vector siteLikelihoods.site_scalers, which lists the number of times each site likelihood has been scaled by a constant factor (2^200). If the factor is 0, then a given site has not been scaled and a factor is, say 3, that the probability of the site is the corresponding value in siteLikelihoods divided by (2^200)^3. E.g. at site 15 (which has a -Inf in your example), the reported scaled log-likelihood is -11.455278, but the scaling factor is 7, hence the actual likelihood is -11.455278 - 7*200*Log(2) = -981.861 HTH, Sergei |
Title: Re: Site Likelihoods = 0 Post by Miguel Lacerda on Mar 4th, 2009 at 6:17am
Dear Sergei,
I am struggling to reproduce your results with SVN 415. Firstly, I discovered that the -Inf's only go away if I run HYPHYMP_DEV, but not HYPHYMP. And secondly, I am not able to retrieve the site_scalers vector you speak of. I have only added the following simple code to the end of the batch file discussed above: [code] ConstructCategoryMatrix(siteLikelihoods, "logL"); fprintf("sitelik.txt", siteLikelihoods); fprintf("scalers.txt", siteLikelihoods.site_scalers); [/code] Presumably this should do the trick? Or am I doing something stupid? Thanks again! Kind regards, Miguel |
Title: Re: Site Likelihoods = 0 Post by Sergei on Mar 4th, 2009 at 8:29am
Dear Miguel,
Yes indeed, you need to be using HYPHYMP_DEV. The HYPHYMP version is frozen at the May 8 2008 feature set and does not include any of the new scaling routines. Indeed, there should a site_scalers matrix and a .log_scale_multiplier number that tells you what one unit of scaling is on the log scale (in the HYPHYMP_DEV version). Cheers, Sergei |
Title: Re: Site Likelihoods = 0 Post by Miguel Lacerda on Mar 5th, 2009 at 6:27am
Dear Sergei,
I still cannot retrieve the site_scalers vector you refer to using SVN 415. I have attached my code - it is exactly the same as the previous code given above except that I have added: [code] ConstructCategoryMatrix(siteLikelihoods, logL); fprintf("sitelik.txt", siteLikelihoods); fprintf("scalers.txt", siteLikelihoods.site_scalers); [/code] at the end. The site likelihoods are returned correctly, but HyPhy does not seem to recognise site_scalers and outputs simply a 0. I have the same problem with the log_scale_multiplier. Please could you advise. Thanks, Miguel http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=SiteLikelihoods_Sergei.bf (8 KB | )
|
Title: Re: Site Likelihoods = 0 Post by Sergei on Mar 5th, 2009 at 4:26pm
Dear Miguel,
Let me put out the next SVN revision (mid next week, hopefully) that should have all this cleaned up. Thanks for your patience. Being a professor means I don't get to code like I could in grad school Sergei |
Title: Re: Site Likelihoods = 0 Post by Miguel Lacerda on Mar 13th, 2009 at 3:43am
Great! Thanks a lot - will give it a shot! I look forward to HyPhy v2.0 ;D
Thanks for all your effort, Miguel |
HyPhy message board » Powered by YaBB 2.5.2! YaBB Forum Software © 2000-2024. All Rights Reserved. |