HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Theoretical questions >> Sequence Analysis >> Site Likelihoods = 0
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1235471674

Message started by Miguel Lacerda on Feb 24th, 2009 at 2:34am

Title: Site Likelihoods = 0
Post by Miguel Lacerda on Feb 24th, 2009 at 2:34am
Hi Sergei,

After optimising all the parameters (including branch lengths) of a GY94 codon substitution model, I obtained a log-likelihood of -13062369.95 indicating that 13 sites have a likelihood of zero under this model. I have examined these sites and cannot find anything which makes them stand out as unusual.

1. Why do some sites end up with a zero likelihood after parameter optimisation?

2. Is there a way around this as I require non-zero likelihoods for all sites?

Thanks so much!

Best regards,
Miguel

Title: Re: Site Likelihoods = 0
Post by Sergei on Feb 24th, 2009 at 7:29am
Dear Miguel,

Assuming that underflow scaling works as planned (of which I am 99% confident, but get SVN 414 or later), you will see 0 likelihoods either if there is a NAN somewhere in a calculation, or if Pr {*->Y} = 0 for any starting character and a FIXED ending character (i.e. along a terminal branch). This will happen in a clade like (X:0,Y:0)*, or if the Y-th column of the transition matrix is all 0's. Check for these two conditions, and if you can't find them, send me the file.

Cheers,
Sergei

Title: Re: Site Likelihoods = 0
Post by Miguel Lacerda on Feb 25th, 2009 at 7:06am
Dear Sergei,

I have tried everything you have suggested using SVN v298. The branch lengths were  jointly optimized in HyPhy along with all the other parameters in the GY94 model and there are no clades of the form (X:0,Y:0) - in fact, there is only 1 terminal branch of length zero. I have also checked the transition matrix P = exp(Q*t) for columns of zeros and did not find any.

As estimating the parameters of a GY94 model and computing site-by-site likelihoods is a very standard analysis, I would not expect anything unusual that would produce NANs in the substitution matrix.

The attached txt file contains the site-by-site log-likelihoods that I get when running the attached bf file. You will notice that 13 sites have -Inf.

Is there any way around this?

Thanks so much!

Miguel


http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=SiteLikelihoods_001.zip (63 KB | )

Title: Re: Site Likelihoods = 0
Post by Sergei on Feb 25th, 2009 at 3:59pm
Dear Miguel,

In the newer SVN code the -Inf goes away. What happens is that if a site underflowed, HyPhy used to deal with it in the optimization phase, but report a 0 probability (hence -Inf log likelihood) in the site reconstruction phase. Now

[code]
ConstructCategoryMatrix (siteLikelihoods,`lfID`)
[/code]

will return both the scaled siteLikelihoods (so the comparison between classes at a site is valid) and a vector siteLikelihoods.site_scalers, which lists the number of times each site likelihood has been scaled by a constant factor (2^200). If the factor is 0, then a given site has not been scaled and a factor is, say 3, that the probability of the site is the corresponding value in  siteLikelihoods divided by (2^200)^3. E.g. at site 15 (which has a -Inf in your example), the reported scaled log-likelihood is -11.455278, but the scaling factor is 7, hence the actual likelihood is -11.455278 - 7*200*Log(2) = -981.861

HTH,
Sergei

Title: Re: Site Likelihoods = 0
Post by Miguel Lacerda on Mar 4th, 2009 at 6:17am
Dear Sergei,

I am struggling to reproduce your results with SVN 415. Firstly, I discovered that the -Inf's only go away if I run HYPHYMP_DEV, but not HYPHYMP. And secondly, I am not able to retrieve the site_scalers vector you speak of.

I have only added the following simple code to the end of the batch file discussed above:

[code]
ConstructCategoryMatrix(siteLikelihoods, "logL");
fprintf("sitelik.txt", siteLikelihoods);
fprintf("scalers.txt", siteLikelihoods.site_scalers);
[/code]

Presumably this should do the trick? Or am I doing something stupid?

Thanks again!

Kind regards,
Miguel

Title: Re: Site Likelihoods = 0
Post by Sergei on Mar 4th, 2009 at 8:29am
Dear Miguel,

Yes indeed, you need to be using HYPHYMP_DEV. The HYPHYMP version is frozen at the May 8 2008 feature set and does not include any of the new scaling routines.

Indeed, there should a site_scalers matrix and a .log_scale_multiplier number that tells you what one unit of scaling is on the log scale (in the HYPHYMP_DEV version).

Cheers,
Sergei




Title: Re: Site Likelihoods = 0
Post by Miguel Lacerda on Mar 5th, 2009 at 6:27am
Dear Sergei,

I still cannot retrieve the site_scalers vector you refer to using SVN 415. I have attached my code - it is exactly the same as the previous code given above except that I have added:

[code]
 ConstructCategoryMatrix(siteLikelihoods, logL);
 fprintf("sitelik.txt", siteLikelihoods);
 fprintf("scalers.txt", siteLikelihoods.site_scalers);
[/code]

at the end. The site likelihoods are returned correctly, but HyPhy does not seem to recognise site_scalers and outputs simply a 0. I have the same problem with the log_scale_multiplier.

Please could you advise.

Thanks,
Miguel

http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=SiteLikelihoods_Sergei.bf (8 KB | )

Title: Re: Site Likelihoods = 0
Post by Sergei on Mar 5th, 2009 at 4:26pm
Dear Miguel,

Let me put out the next SVN revision (mid next week, hopefully) that should have all this cleaned up.

Thanks for your patience. Being a professor means I don't get to code like I could in grad school  

Sergei

Title: Re: Site Likelihoods = 0
Post by Sergei on Mar 12th, 2009 at 6:42pm
Dear Miguel,

I've created a new SVN branch that houses a new codebase of HyPhy. You can download it directly from Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

Please be advised that this version is not very stable and doesn't yet support many of the older version's features, but I am actively working on with the goal of releasing HyPhy v2.0 (with things like GPU CUDA acceleration) by the end of the spring.

If you also grab the initial test suite from Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login and look inside the REL directory (SmallNucREL.bf and utils.bf) for examples of .site_scalers usage.

Cheers,
Sergei


Title: Re: Site Likelihoods = 0
Post by Miguel Lacerda on Mar 13th, 2009 at 3:43am
Great! Thanks a lot - will give it a shot! I look forward to HyPhy v2.0  ;D

Thanks for all your effort,
Miguel

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.