Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Site Likelihoods = 0 (Read 6025 times)
Miguel Lacerda
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 36
Natl Univ of Ireland, Galway
Gender: male
Site Likelihoods = 0
Feb 24th, 2009 at 2:34am
 
Hi Sergei,

After optimising all the parameters (including branch lengths) of a GY94 codon substitution model, I obtained a log-likelihood of -13062369.95 indicating that 13 sites have a likelihood of zero under this model. I have examined these sites and cannot find anything which makes them stand out as unusual.

1. Why do some sites end up with a zero likelihood after parameter optimisation?

2. Is there a way around this as I require non-zero likelihoods for all sites?

Thanks so much!

Best regards,
Miguel
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site Likelihoods = 0
Reply #1 - Feb 24th, 2009 at 7:29am
 
Dear Miguel,

Assuming that underflow scaling works as planned (of which I am 99% confident, but get SVN 414 or later), you will see 0 likelihoods either if there is a NAN somewhere in a calculation, or if Pr {*->Y} = 0 for any starting character and a FIXED ending character (i.e. along a terminal branch). This will happen in a clade like (X:0,Y:0)*, or if the Y-th column of the transition matrix is all 0's. Check for these two conditions, and if you can't find them, send me the file.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Miguel Lacerda
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 36
Natl Univ of Ireland, Galway
Gender: male
Re: Site Likelihoods = 0
Reply #2 - Feb 25th, 2009 at 7:06am
 
Dear Sergei,

I have tried everything you have suggested using SVN v298. The branch lengths were  jointly optimized in HyPhy along with all the other parameters in the GY94 model and there are no clades of the form (X:0,Y:0) - in fact, there is only 1 terminal branch of length zero. I have also checked the transition matrix P = exp(Q*t) for columns of zeros and did not find any.

As estimating the parameters of a GY94 model and computing site-by-site likelihoods is a very standard analysis, I would not expect anything unusual that would produce NANs in the substitution matrix.

The attached txt file contains the site-by-site log-likelihoods that I get when running the attached bf file. You will notice that 13 sites have -Inf.

Is there any way around this?

Thanks so much!

Miguel

Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (63 KB | )
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site Likelihoods = 0
Reply #3 - Feb 25th, 2009 at 3:59pm
 
Dear Miguel,

In the newer SVN code the -Inf goes away. What happens is that if a site underflowed, HyPhy used to deal with it in the optimization phase, but report a 0 probability (hence -Inf log likelihood) in the site reconstruction phase. Now

Code:
ConstructCategoryMatrix (siteLikelihoods,`lfID`)
 



will return both the scaled siteLikelihoods (so the comparison between classes at a site is valid) and a vector siteLikelihoods.site_scalers, which lists the number of times each site likelihood has been scaled by a constant factor (2^200). If the factor is 0, then a given site has not been scaled and a factor is, say 3, that the probability of the site is the corresponding value in  siteLikelihoods divided by (2^200)^3. E.g. at site 15 (which has a -Inf in your example), the reported scaled log-likelihood is -11.455278, but the scaling factor is 7, hence the actual likelihood is -11.455278 - 7*200*Log(2) = -981.861

HTH,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Miguel Lacerda
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 36
Natl Univ of Ireland, Galway
Gender: male
Re: Site Likelihoods = 0
Reply #4 - Mar 4th, 2009 at 6:17am
 
Dear Sergei,

I am struggling to reproduce your results with SVN 415. Firstly, I discovered that the -Inf's only go away if I run HYPHYMP_DEV, but not HYPHYMP. And secondly, I am not able to retrieve the site_scalers vector you speak of.

I have only added the following simple code to the end of the batch file discussed above:

Code:
ConstructCategoryMatrix(siteLikelihoods, "logL");
fprintf("sitelik.txt", siteLikelihoods);
fprintf("scalers.txt", siteLikelihoods.site_scalers);
 



Presumably this should do the trick? Or am I doing something stupid?

Thanks again!

Kind regards,
Miguel
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site Likelihoods = 0
Reply #5 - Mar 4th, 2009 at 8:29am
 
Dear Miguel,

Yes indeed, you need to be using HYPHYMP_DEV. The HYPHYMP version is frozen at the May 8 2008 feature set and does not include any of the new scaling routines.

Indeed, there should a site_scalers matrix and a .log_scale_multiplier number that tells you what one unit of scaling is on the log scale (in the HYPHYMP_DEV version).

Cheers,
Sergei



Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Miguel Lacerda
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 36
Natl Univ of Ireland, Galway
Gender: male
Re: Site Likelihoods = 0
Reply #6 - Mar 5th, 2009 at 6:27am
 
Dear Sergei,

I still cannot retrieve the site_scalers vector you refer to using SVN 415. I have attached my code - it is exactly the same as the previous code given above except that I have added:

Code:
  ConstructCategoryMatrix(siteLikelihoods, logL);
  fprintf("sitelik.txt", siteLikelihoods);
  fprintf("scalers.txt", siteLikelihoods.site_scalers);
 



at the end. The site likelihoods are returned correctly, but HyPhy does not seem to recognise site_scalers and outputs simply a 0. I have the same problem with the log_scale_multiplier.

Please could you advise.

Thanks,
Miguel
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (8 KB | )
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site Likelihoods = 0
Reply #7 - Mar 5th, 2009 at 4:26pm
 
Dear Miguel,

Let me put out the next SVN revision (mid next week, hopefully) that should have all this cleaned up.

Thanks for your patience. Being a professor means I don't get to code like I could in grad school  

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site Likelihoods = 0
Reply #8 - Mar 12th, 2009 at 6:42pm
 
Dear Miguel,

I've created a new SVN branch that houses a new codebase of HyPhy. You can download it directly from Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

Please be advised that this version is not very stable and doesn't yet support many of the older version's features, but I am actively working on with the goal of releasing HyPhy v2.0 (with things like GPU CUDA acceleration) by the end of the spring.

If you also grab the initial test suite from Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login and look inside the REL directory (SmallNucREL.bf and utils.bf) for examples of .site_scalers usage.

Cheers,
Sergei

Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Miguel Lacerda
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 36
Natl Univ of Ireland, Galway
Gender: male
Re: Site Likelihoods = 0
Reply #9 - Mar 13th, 2009 at 3:43am
 
Great! Thanks a lot - will give it a shot! I look forward to HyPhy v2.0  Grin

Thanks for all your effort,
Miguel
Back to top
 
 
IP Logged