Using nonreversible nucleotide models
Jeff Mower
Guest


Using nonreversible nucleotide models
Oct 8th, 2005 at 11:49pm
 
Hello,

The documentation for HyPhy states that "HyPhy can work properly with any member of the class of general time reversible models, regardless of the number of character states."

Does HyPhy also work properly for non-time-reversible models? Should I force HyPhy to accept a rooted topology when doing this?

From what I understand, the likelihood will change depending on the rooting when using a nonreversible model. I am not really interested in the value of the likelihood; I am more interested in the values of the rate matrix. Will the rate matrix values also change depending on the root?

Any help you can provide is greatly appreciated.  Thanks.

Jeff
 
Jeff Mower
Guest


Re: Using nonreversible nucleotide models
Reply #1 - Oct 9th, 2005 at 12:10am
 
Hi again,

Sorry, one other issue. I like to save my results as a batch file for future reference. I used a nonreversible model and forced HyPhy to accept rooted trees in my initial analysis, then saved the results as a batch file. But when I import the results back into HyPhy using the batch file and print the likelihood, it is different from the value I originally got.

I am guessing that HyPhy does not recognize that the initial analysis used rooted trees, and that the change in likelihood reflects HyPhy calculating the likelihood using an unrooted topology. If so, is there any way around this? Thanks again.

Jeff
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Using nonreversible nucleotide models
Reply #2 - Oct 10th, 2005 at 7:43am
 
Dear Jeff,

The only fundamental difference in how one goes about computing the phylogenetic likelihood function, when general (non-reversible) models are used, is that the placement of the root is now important, i.e. one can't automatically 'unroot' the tree.

HyPhy will unroot all trees unless the ACCEPT_ROOTED_TREES batch language
constant is set to '1'.

I can imagine cases when placement of the root may have a strong effect on rate estimates (e.g. a large tree, where the 'outgroup' is incorrectly rooted to be deep in the tree).
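
To illustrate why the root can no longer be moved freely, here is a small Python sketch (not HyPhy code). It computes the likelihood of one site on a two-taxon tree while sliding the root along the path between the tips. The rate values, the uniform root frequencies, and all function names are assumptions chosen for illustration only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) with a Taylor series plus scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def rate_matrix(off_diag):
    """Copy the off-diagonal rates and set each diagonal so rows sum to zero."""
    n = len(off_diag)
    q = [row[:] for row in off_diag]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def pair_likelihood(q, root_freqs, x, y, path_length, root_fraction):
    """Likelihood of states x and y at two tips joined through a root.

    root_fraction is the share of the tip-to-tip path lying between
    the root and the tip carrying state x.
    """
    to_x = mat_exp(q, root_fraction * path_length)
    to_y = mat_exp(q, (1.0 - root_fraction) * path_length)
    return sum(root_freqs[r] * to_x[r][x] * to_y[r][y]
               for r in range(len(q)))


A, C, G, T = range(4)
uniform = [0.25] * 4  # stationary for both made-up models below

# Reversible: all substitutions equally likely (symmetric rates).
reversible = rate_matrix([[0.0 if i == j else 1.0 / 3.0 for j in range(4)]
                          for i in range(4)])
# Non-reversible: extra rate in the direction A->C->G->T->A only.
nonreversible = rate_matrix(
    [[0.0 if i == j else 0.2 + (1.0 if j == (i + 1) % 4 else 0.0)
      for j in range(4)] for i in range(4)])

for name, q in (("reversible", reversible), ("non-reversible", nonreversible)):
    near_x = pair_likelihood(q, uniform, A, C, 1.0, 0.1)
    near_y = pair_likelihood(q, uniform, A, C, 1.0, 0.9)
    print(name, near_x, near_y)
```

With the symmetric matrix the two printed values agree up to rounding, so the root position is irrelevant. With the asymmetric matrix they differ, which is why unrooting changes the likelihood.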

Cheers,
Sergei
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Sergei
Re: Using nonreversible nucleotide models
Reply #3 - Oct 10th, 2005 at 7:44am
 
Dear Jeff,

Quote:
But when I import the results back into HyPhy using the batch file and print the likelihood, it is different from the value I originally got.


It would appear that this is a bug; I'll make sure that HyPhy sets the ACCEPT_ROOTED_TREES flag in saved analyses when necessary. In the meantime, you can edit the saved files and add the
Code:
ACCEPT_ROOTED_TREES=1;

flag at the beginning of the HyPhy NEXUS block to prevent tree unrooting.
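
If there are many saved files, the manual edit can be scripted. The Python sketch below inserts the flag directly after the block header. It assumes the saved file opens the HyPhy block with a line starting with BEGIN HYPHY; the function name and the sample file contents are hypothetical.

```python
FLAG = "ACCEPT_ROOTED_TREES=1;"


def add_rooted_flag(nexus_text):
    """Return nexus_text with FLAG inserted after the HyPhy block header.

    The text is returned unchanged if it already contains FLAG.
    Assumes the block header is a line starting with 'BEGIN HYPHY'.
    """
    if FLAG in nexus_text:
        return nexus_text
    lines = nexus_text.split("\n")
    for index, line in enumerate(lines):
        if line.strip().upper().startswith("BEGIN HYPHY"):
            lines.insert(index + 1, FLAG)
            return "\n".join(lines)
    raise ValueError("no HyPhy block header found")


# Hypothetical saved analysis, reduced to its skeleton.
sample = "#NEXUS\n\nBEGIN HYPHY;\n/* saved analysis commands */\nEND;\n"
print(add_rooted_flag(sample))
```

Running the function twice on the same text leaves it unchanged after the first pass, so it is safe to apply to files that were already edited by hand.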

Cheers,
Sergei
 
Jeff Mower
Guest


Re: Using nonreversible nucleotide models
Reply #4 - Oct 10th, 2005 at 10:52am
 
Hi Sergei,

Thanks for the quick reply and for the helpful information.

Quote:
HyPhy will unroot all trees unless the ACCEPT_ROOTED_TREES batch language
constant is set to '1'.


This is what I did, but then I got worried when I reimported the batch files and saw the likelihood change. I have now added the "ACCEPT_ROOTED_TREES=1;" line to the batch file as you suggested, and that worked great. Thanks.


I have another question if you don't mind.  I'd like to estimate the actual number of specific changes that have occurred along a particular lineage.  For example, the number of A->G transitions that have occurred along the human branch subsequent to the split between humans and chimps.  I'm using local rate matrices, so that each branch has its own set of rates.  And the matrices are nonreversible as I've mentioned, so that the A->G rate is separate from the G->A rate.  What would be the best way to go about this?

I tried doing this by first reconstructing ancestral sequences, then taking the number of A's present in the human-chimp common ancestor and multiplying this number by the A->G rate from the human branch matrix. It seems like this should give me the number of A->G changes that have occurred along the human branch, but do I need to consider the possibility of multiple changes along the branch? For instance, some of the A->G changes might actually go A->G->C by the end of the branch. Do you have any thoughts on whether this approach is valid, and/or any suggestions on how else I could approach this?

Thanks again for your help.
 
Sergei
Re: Using nonreversible nucleotide models
Reply #5 - Oct 10th, 2005 at 9:53pm
 
Dear Jeff,

Quote:
I have another question if you don't mind.  I'd like to estimate the actual number of specific changes that have occurred along a particular lineage.  For example, the number of A->G transitions that have occurred along the human branch subsequent to the split between humans and chimps.  I'm using local rate matrices, so that each branch has its own set of rates.  And the matrices are nonreversible as I've mentioned, so that the A->G rate is separate from the G->A rate.  What would be the best way to go about this?


This is a fairly tricky question. One easy option (which is likely to undercount the number of changes) is to infer the ancestral sequences and simply count the number of changes from A->G (or any other pair) along the branches. The problem with this is two-fold:

(a) We are treating ancestral states as known, ignoring reconstruction errors. This can be corrected fairly easily by computing support for a given branch labeling, much like we did in the WAC method.

(b) What you are really after is the expected number of A->G transitions over any branch. Computing this quantity is much more involved, because it requires integrating over all paths of a Markov chain that take it from one state to another (not necessarily from A or to G, but all 16 possible pairs) and include at least one A->G substitution. Dutheil et al. address this to an extent (see equations 4-7). Their approach still does not directly yield the expected number of A->G substitutions, but it can be modified to approximate it (I believe).
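
The easy counting option can be sketched in a few lines of Python. This is only an illustration: the two sequences are made up, the function name is hypothetical, and the reconstructed ancestor is treated as known, so caveat (a) applies in full.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def count_changes(ancestor, descendant):
    """Count directed differences between two aligned sequences.

    Returns a Counter keyed by (ancestral state, descendant state).
    Sites with gaps or ambiguity codes in either sequence are skipped.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for old, new in zip(ancestor.upper(), descendant.upper()):
        if old in NUCLEOTIDES and new in NUCLEOTIDES and old != new:
            counts[(old, new)] += 1
    return counts


# Made-up sequences: inferred human-chimp ancestor and the human tip.
inferred_ancestor = "AAGCTAAG"
human = "AGGCTGAA"

changes = count_changes(inferred_ancestor, human)
print("A->G:", changes[("A", "G")])
print("G->A:", changes[("G", "A")])
```

Because the keys are ordered pairs, A->G and G->A are tallied separately, which matches the nonreversible setting where the two rates differ.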

HTH,
Sergei
 
Jeff Mower
Guest


Re: Using nonreversible nucleotide models
Reply #6 - Oct 12th, 2005 at 8:56pm
 
Hi Sergei,

Ahhh well, I was afraid it wouldn't be easy.  I'll look into the references you mentioned and see what I can do.  You've been very helpful.  Thanks a lot.

Jeff
 
Sergei
Re: Using nonreversible nucleotide models
Reply #7 - Oct 13th, 2005 at 9:33am
 
Dear Jeff,
Quote:
Ahhh well, I was afraid it wouldn't be easy.  I'll look into the references you mentioned and see what I can do.  You've been very helpful.  Thanks a lot.


Unless your branches are long (e.g. >1 expected substitution per site with or without rate heterogeneity), you are probably fine using the simplest approach of counting differences at this point.
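
To give a feel for the branch-length effect, the Python sketch below compares two per-site quantities under a made-up nonreversible rate matrix: the expected number of A->G substitution events on a branch, and the probability of seeing A at the start and G at the end. The rate values, the uniform starting frequencies, and the function names are assumptions for illustration, not HyPhy features.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) with a Taylor series plus scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def expected_events(q, start_freqs, i, j, t, steps=1000):
    """Expected i->j events per site: q[i][j] times the time spent in i.

    The time spent in state i is integrated with the trapezoid rule
    while the state frequencies are propagated along the branch.
    """
    h = t / steps
    p_step = mat_exp(q, h)
    n = len(q)
    freqs = list(start_freqs)
    occupancy = 0.0
    for _ in range(steps):
        nxt = [sum(freqs[k] * p_step[k][c] for k in range(n))
               for c in range(n)]
        occupancy += 0.5 * h * (freqs[i] + nxt[i])
        freqs = nxt
    return q[i][j] * occupancy


def endpoint_pairs(q, start_freqs, i, j, t):
    """Probability that a site starts the branch in i and ends it in j."""
    return start_freqs[i] * mat_exp(q, t)[i][j]


def relative_gap(q, start_freqs, i, j, t):
    """Relative difference between endpoint counting and expected events."""
    expected = expected_events(q, start_freqs, i, j, t)
    return abs(endpoint_pairs(q, start_freqs, i, j, t) - expected) / expected


A, C, G, T = range(4)
uniform = [0.25] * 4
# Made-up nonreversible rates: base rate 0.2, plus 1.0 along A->C->G->T->A.
q = [[0.0 if r == c else 0.2 + (1.0 if c == (r + 1) % 4 else 0.0)
      for c in range(4)] for r in range(4)]
for r in range(4):
    q[r][r] = -sum(q[r][c] for c in range(4) if c != r)

for t in (0.01, 0.1, 0.5, 2.0):
    print(t, expected_events(q, uniform, A, G, t),
          endpoint_pairs(q, uniform, A, G, t))
```

In this toy setup the two quantities nearly coincide on the shortest branch and drift apart as the branch grows, since indirect paths and repeated hits become common.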

HyPhy has some (still undocumented) features to compute support for branch labeling, which I can help you with. I'd be more concerned with ancestral sequence uncertainty than with undercounting multiple substitutions.

To be fair, I am now intrigued with doing that properly too, so I might add the feature soon (when I have some time:( ).

Cheers,
Sergei