HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1152741222

Message started by Masaki on Jul 12th, 2006 at 2:53pm

Title: Substitution model determination
Post by Masaki on Jul 12th, 2006 at 2:53pm
Hi, I am a beginner with HyPhy.
I wonder why a tree is required for NucModelCompare.bf.
How should I reconstruct the tree before the substitution model has been determined?
Since Modeltest indicated that HKY85+G is the best model for the data, I reconstructed an ML tree under that substitution model and then used the tree for NucModelCompare.bf.
NucModelCompare.bf also supported HKY85 (with the local option) and HKY85+G (with the global w/variation option), but is this a proper procedure for getting a correct result?

Title: Re: Substitution model determination
Post by Sergei on Jul 12th, 2006 at 10:50pm
Dear Masaki,

A very good question indeed. Firstly, you need a tree for model selection because you need to take identity by descent versus independent mutation into account. When you infer the tree based on some substitution model, this will definitely introduce a bias into subsequent model selection based on that tree, but most likely it is not fatal.

Model selection tends to be fairly robust with respect to some changes in the tree. You can check this by trying to select models on trees inferred by, for example, parsimony, neighbor joining and ML. In many cases you will find that the model selected for each tree will be the same. This is not to say that model selection is always insensitive to tree bias; I am sure there are examples where one strongly influences the other.

You can also try the inverse of this process, and infer several trees based on some candidate models (HKY, GTR, TrN etc) and see if the trees are identical or highly similar.

Lastly, if you want to be very safe, I would recommend a model-averaged approach from the 2004 Systematic Biology paper by Posada and Buckley. In essence, you can fit all 203 possible time-reversible nucleotide models (from F81 to GTR and everything in between), and try to base your inference on weighted contributions from all models.
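
A minimal Python sketch of the two ideas in this paragraph, not HyPhy code. It assumes that the 203 models correspond to the ways of grouping the six substitution rates into classes of equal rates (one class for F81, six classes for GTR), and that the weights are Akaike weights computed from AIC scores. The function names and the AIC values in the usage lines are hypothetical.

```python
import math


def count_rate_partitions(n_rates=6):
    """Count the ways to split n_rates rates into equal-rate classes.

    This is the Bell number of n_rates, built with the Bell triangle.
    """
    row = [1]
    for _ in range(n_rates - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


def akaike_weights(aic_scores):
    """Turn a {model: AIC} mapping into {model: weight}, weights summing to 1."""
    best = min(aic_scores.values())
    relative = {m: math.exp(-0.5 * (a - best)) for m, a in aic_scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


def model_averaged(estimates, weights):
    """Weighted average of one parameter estimated under several models."""
    return sum(weights[m] * estimates[m] for m in estimates)


# Hypothetical AIC scores for three of the candidate models.
example_aic = {"HKY85": 10310.2, "TrN": 10308.9, "GTR": 10312.5}
example_weights = akaike_weights(example_aic)
n_models = count_rate_partitions(6)
```

In practice the scores would come from fitting every candidate model to the same alignment and tree; models with negligible weight contribute almost nothing to the average.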

Hope this helps,
Sergei

Title: Re: Substitution model determination
Post by Masaki on Jul 14th, 2006 at 1:53am
Dear Sergei,

Thank you for your kind and quick reply to my question. I will check the robustness of the model selection by providing other trees constructed by MP, NJ and ML with other models.

I have another question about substitution model usage in HyPhy:
how do I choose among the local, global, and global w/variation options for the data analysis?
Can I use AIC to compare the best models that NucModelCompare.bf selected under each option?

Title: Re: Substitution model determination
Post by Sergei on Jul 14th, 2006 at 10:50am
Dear Masaki,

You could indeed do that. Local models are generally too parameter-rich for larger datasets, since they attempt to estimate multiple parameters per branch (e.g. up to 6 parameters with local GTR). Local models should only really be used for long alignments with few sequences (e.g. < 20).
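
To illustrate why local models become parameter-rich, here is a rough Python sketch of the parameter counts. It rests on assumptions not stated in the post: a fully resolved unrooted tree (2N - 3 branches for N sequences), base frequencies not counted, and a global GTR counted as one length per branch plus five shared rate ratios. The function names are hypothetical.

```python
def branch_count(n_sequences):
    """Number of branches in a fully resolved unrooted tree."""
    if n_sequences < 3:
        raise ValueError("an unrooted binary tree needs at least 3 sequences")
    return 2 * n_sequences - 3


def local_parameter_count(n_sequences, per_branch=6):
    """Local model: every branch carries its own set of parameters."""
    return per_branch * branch_count(n_sequences)


def global_parameter_count(n_sequences, shared_rates=5):
    """Global model: one length per branch plus rates shared by all branches."""
    return branch_count(n_sequences) + shared_rates


# Compare the two counts for a few alignment sizes.
for n in (10, 20, 50):
    print(n, local_parameter_count(n), global_parameter_count(n))
```

Under these assumptions the local count grows six times faster with the number of sequences than the global count, which is why a long alignment is needed to support it.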

You can test for the rate variation option by fitting the Beta+Gamma distribution on top of the Global nucleotide model and then using AIC (or small sample AIC or BIC) to choose the best model. In most cases, the Beta+Gamma model will be better than the plain Global one.
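
The comparison described here can be sketched in Python using the standard definitions of AIC, small-sample AIC (AICc) and BIC. The log-likelihoods, parameter counts and sample size below are hypothetical placeholders, not HyPhy output, and using the number of alignment sites as the sample size is an assumption.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2.0 * n_params - 2.0 * log_likelihood


def aic_c(log_likelihood, n_params, sample_size):
    """Small-sample AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    denominator = sample_size - n_params - 1
    if denominator <= 0:
        raise ValueError("sample_size must exceed n_params + 1")
    correction = 2.0 * n_params * (n_params + 1) / denominator
    return aic(log_likelihood, n_params) + correction


def bic(log_likelihood, n_params, sample_size):
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return n_params * math.log(sample_size) - 2.0 * log_likelihood


# Hypothetical fits: model name -> (log-likelihood, number of parameters).
fits = {
    "global": (-5230.4, 42),
    "global + rate variation": (-5105.9, 44),
}
n_sites = 1200  # assumed sample size (number of alignment columns)

aic_scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
bic_scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
```

The model with the lowest score is preferred under each criterion; BIC penalizes extra parameters more heavily than AIC once the sample size exceeds about 7 (where ln n > 2).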

Hope this helps,
Sergei

Title: Re: Substitution model determination
Post by Masaki on Jul 17th, 2006 at 6:10pm
Dear Sergei,

Thank you for your kind explanation.
I will do as you suggested.
Thank you, again.
