HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> Using nonreversible nucleotide models
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1128840581

Message started by Jeff Mower on Oct 8th, 2005 at 11:49pm

Title: Using nonreversible nucleotide models
Post by Jeff Mower on Oct 8th, 2005 at 11:49pm
Hello,

The documentation for Hyphy states that "HyPhy can work properly with any member of the class of general time reversible models, regardless of the number of character states. "

Does Hyphy also work properly for non time reversible models?  Should I force Hyphy to accept a rooted topology when doing this?  

From what I understand, the likelihood will change depending on the rooting when using a nonreversible model.   I am not really interested in the value of the likelihood;  I am more interested in the values of the rate matrix.  Will the rate matrix values also change depending on the root?

Any help you can provide is greatly appreciated.  Thanks.

Jeff  

Title: Re: Using nonreversible nucleotide models
Post by Jeff Mower on Oct 9th, 2005 at 12:10am
Hi again,

Sorry, one other issue.  I like to save my results as a batch file for future reference.  I used a nonreversible model and forced Hyphy to accept rooted trees in my initial analysis, then saved the results as a batch file.   But when I import the results back into Hyphy using the batch file and print the likelihood, it is different than the value I originally got.

I am guessing that Hyphy does not recognize that the initial analysis used rooted trees, and the change in likelihood reflects Hyphy calculating the likelihood using an unrooted topology.  If so, is there any way around this?  Thanks again.

Jeff

Title: Re: Using nonreversible nucleotide models
Post by Sergei on Oct 10th, 2005 at 7:43am
Dear Jeff,

The only fundamental difference in how one goes about computing the phylogenetic likelihood function, when general (non-reversible) models are used, is that the placement of the root is now important, i.e. one can't automatically 'unroot' the tree.
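
To make the root dependence concrete, here is a minimal Python sketch (not HyPhy code). It computes the likelihood of two tip states joined at a root while the root slides along the path between them. The rate values, the branch lengths and the uniform root frequencies are all made up for illustration. With the symmetric (reversible) matrix only the total path length matters; with the directional matrix the likelihood changes as the root moves.

```python
# Illustrative sketch only: rates, branch lengths and root frequencies
# are assumptions chosen for the demonstration, not HyPhy defaults.

def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, squarings=8, terms=20):
    """Return P(t) = exp(Q t) via a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def pair_likelihood(q, root_freqs, x, y, t1, t2):
    """Likelihood of tip states x and y joined at a root by branches t1 and t2."""
    p1 = transition_probs(q, t1)
    p2 = transition_probs(q, t2)
    return sum(root_freqs[r] * p1[r][x] * p2[r][y]
               for r in range(len(root_freqs)))


def circulant(forward, backward):
    """4-state rate matrix: rate `forward` for i -> i+1, `backward` for i -> i-1."""
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        q[i][(i + 1) % 4] = forward
        q[i][(i - 1) % 4] = backward
        q[i][i] = -(forward + backward)
    return q


# Columns of both matrices sum to zero, so uniform frequencies are stationary.
UNIFORM = [0.25] * 4
REVERSIBLE = circulant(0.5, 0.5)      # symmetric, hence reversible
NONREVERSIBLE = circulant(1.0, 0.1)   # rates depend on direction

for name, q in (("reversible", REVERSIBLE), ("non-reversible", NONREVERSIBLE)):
    for t1, t2 in ((0.1, 0.3), (0.2, 0.2), (0.3, 0.1)):
        lik = pair_likelihood(q, UNIFORM, 0, 1, t1, t2)
        print(f"{name:15s} t1={t1} t2={t2} likelihood={lik:.6f}")
```

The three root placements keep the total path length at 0.4, so any change in the printed likelihood comes from the root position alone.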

HyPhy will unroot all trees, unless the ACCEPT_ROOTED_TREES batch language
constant is set to '1'.

I can imagine cases when placement of the root may have a strong effect on rate estimates (e.g. a large tree, where the 'outgroup' is incorrectly rooted to be deep in the tree).

Cheers,
Sergei

Title: Re: Using nonreversible nucleotide models
Post by Sergei on Oct 10th, 2005 at 7:44am
Dear Jeff,


wrote on Oct 9th, 2005 at 12:10am:
But when I import the results back into Hyphy using the batch file and print the likelihood, it is different than the value I originally got.


It would appear that this is a bug; I'll make sure that HyPhy sets the ACCEPT_ROOTED_TREES flag in saved analyses when necessary. In the meantime, you can edit the saved files and add the flag
[code]
ACCEPT_ROOTED_TREES=1;
[/code]
at the beginning of the HyPhy NEXUS block to prevent tree unrooting.
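
If there are many saved files to patch, the edit can be scripted. The Python sketch below is only an illustration: it assumes the HyPhy block in the saved file opens with a line reading `BEGIN HYPHY;`, and the helper name `force_rooted_trees` and the sample file contents are made up. Check one file by hand before relying on it.

```python
import re


def force_rooted_trees(saved_text):
    """Insert 'ACCEPT_ROOTED_TREES=1;' right after the 'BEGIN HYPHY;' line.

    Assumption: the HyPhy block opens with a line reading 'BEGIN HYPHY;'
    (matched case-insensitively). Text that already sets the flag, or that
    has no such line, is returned unchanged.
    """
    already_set = re.search(r"^\s*ACCEPT_ROOTED_TREES\s*=\s*1\s*;",
                            saved_text, re.MULTILINE)
    if already_set:
        return saved_text
    block_start = re.compile(r"^([ \t]*BEGIN\s+HYPHY\s*;[ \t]*\r?\n)",
                             re.IGNORECASE | re.MULTILINE)
    return block_start.sub(
        lambda m: m.group(1) + "ACCEPT_ROOTED_TREES=1;\n",
        saved_text,
        count=1,
    )


# Hypothetical, heavily shortened stand-in for a saved analysis.
sample = "#NEXUS\n\nBEGIN HYPHY;\nTree t = ((a,b),c);\nEND;\n"
print(force_rooted_trees(sample))
```

To patch a real file, read it into a string, pass it through the function and write the result to a new file, so the original saved analysis stays untouched.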

Cheers,
Sergei

Title: Re: Using nonreversible nucleotide models
Post by Jeff Mower on Oct 10th, 2005 at 10:52am
Hi Sergei,

Thanks for the quick reply and for the helpful information.


wrote on Oct 10th, 2005 at 7:43am:
HyPhy will unroot all trees, unless the ACCEPT_ROOTED_TREES batch language
constant is set to '1'.


This is what I did, but then I got worried when reimporting the batch files and saw the likelihood changing.  I have now stuck the  "ACCEPT_ROOTED_TREES=1;"  code into the batch file as you suggested, and that worked great.  Thanks.


I have another question if you don't mind.  I'd like to estimate the actual number of specific changes that have occurred along a particular lineage.  For example, the number of A->G transitions that have occurred along the human branch subsequent to the split between humans and chimps.  I'm using local rate matrices, so that each branch has its own set of rates.  And the matrices are nonreversible as I've mentioned, so that the A->G rate is separate from the G->A rate.  What would be the best way to go about this?

I tried doing this by first reconstructing ancestral sequences, then taking the number of A's present in the human-chimp common ancestor and multiplying this number by the A->G rate from the human branch matrix.  It seems like this should give me the number of A->G changes that have occurred along the human branch, but do I need to consider the possibility of multiple changes along the branch?  For instance, some of the A->G changes might actually go A->G->C by the end of the branch.  Do you have any thoughts on whether this approach is valid, and/or any suggestions on how else I could approach this?

Thanks again for your help.

Title: Re: Using nonreversible nucleotide models
Post by Sergei on Oct 10th, 2005 at 9:53pm
Dear Jeff,


wrote on Oct 10th, 2005 at 10:52am:
I have another question if you don't mind.  I'd like to estimate the actual number of specific changes that have occurred along a particular lineage.  For example, the number of A->G transitions that have occurred along the human branch subsequent to the split between humans and chimps.  I'm using local rate matrices, so that each branch has its own set of rates.  And the matrices are nonreversible as I've mentioned, so that the A->G rate is separate from the G->A rate.  What would be the best way to go about this?


This is a fairly tricky question. One easy option (which is likely to undercount the number of changes) is to infer the ancestral sequences and simply count the number of changes from A->G (or any other pair) along the branches. The problem with this is two-fold

(a). We are treating ancestral states as known, ignoring reconstruction errors (this can be corrected fairly easily, by computing support for a given branch labeling, much like we did in the WAC method).

(b). Secondly, what you are really after is the expected number of A->G transitions over any branch. Computing this quantity is much more involved, because it requires integrating over all paths of a Markov chain that take it from one state to another (not necessarily from A or to G, but all possible 16 pairs) and that include at least one A->G substitution. Dutheil et al address this to an extent (see equations 4-7). Their approach still does not directly yield the expected number of A->G substitutions, but it can be modified to approximate it (I believe).
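
For reference, the easy counting option mentioned above can be sketched in a few lines of Python. This is only an illustration with made-up sequences and a hypothetical helper name; it takes the reconstructed ancestor as known (problem a) and sees at most one change per site (problem b).

```python
from collections import Counter


def count_changes(ancestor, descendant):
    """Count site-wise state changes between two aligned sequences.

    Returns a Counter keyed by (ancestral base, descendant base).
    Sites with gaps or ambiguity codes are skipped.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, d in zip(ancestor.upper(), descendant.upper()):
        if a != d and a in "ACGT" and d in "ACGT":
            counts[(a, d)] += 1
    return counts


ancestor = "ACGTAAGCTA"  # hypothetical reconstructed human-chimp ancestor
human = "GCGTAGGCTC"     # hypothetical human sequence
changes = count_changes(ancestor, human)
print(changes[("A", "G")])  # 2
print(changes[("A", "C")])  # 1
```

Because the counts are keyed by direction, A->G and G->A are tallied separately, which is what a nonreversible analysis needs.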

HTH,
Sergei

Title: Re: Using nonreversible nucleotide models
Post by Jeff Mower on Oct 12th, 2005 at 8:56pm
Hi Sergei,

Ahhh well, I was afraid it wouldn't be easy.  I'll look into the references you mentioned and see what I can do.  You've been very helpful.  Thanks a lot.

Jeff

Title: Re: Using nonreversible nucleotide models
Post by Sergei on Oct 13th, 2005 at 9:33am
Dear Jeff,

wrote on Oct 12th, 2005 at 8:56pm:
Ahhh well, I was afraid it wouldn't be easy.  I'll look into the references you mentioned and see what I can do.  You've been very helpful.  Thanks a lot.


Unless your branches are long (e.g. >1 expected substitution per site with or without rate heterogeneity), you are probably fine using the simplest approach of counting differences at this point.
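
A rough feel for how much counting misses can be had from the Jukes-Cantor (JC69) model, used here purely as an assumption for illustration since it is reversible and has equal rates. Under JC69 the expected fraction of sites that differ between the two ends of a branch of length d is 3/4 (1 - exp(-4d/3)), and the Python sketch below compares that with d itself.

```python
import math


def observed_difference_fraction(d):
    """Expected fraction of differing sites under JC69 for branch length d
    (d is measured in expected substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


for d in (0.05, 0.2, 1.0, 2.0):
    p = observed_difference_fraction(d)
    print(f"d = {d:4.2f}  observed = {p:.3f}  missed = {1 - p / d:.1%}")
```

For short branches the observed fraction stays close to d, while at one substitution per site a large share of the changes is hidden by multiple hits. The exact numbers will differ under a nonreversible model with branch-specific rates.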

HyPhy has some (undocumented still) features to compute support for branch labeling, which I can help you with - I'd be more concerned with ancestral sequence uncertainty than undercounting multiple substitutions.

To be fair, I am now intrigued with doing that properly too, so I might add the feature soon (when I have some time:( ).

Cheers,
Sergei
