Substitution model determination
Masaki
Guest


Substitution model determination
Jul 12th, 2006 at 2:53pm
 
Hi, I am a beginner with HyPhy.
I wonder why a tree is required for NucModelCompare.bf.
How should I reconstruct the tree before the substitution model has been determined?
Since Modeltest indicated that HKY85+G is the best model for the data, I reconstructed an ML tree under that substitution model and then used the tree for NucModelCompare.bf.
NucModelCompare.bf also supported HKY85 (with the local option) and HKY85+G (with the global w/variation option). Is this a proper procedure for getting a correct result?
Sergei
YaBB Administrator


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Substitution model determination
Reply #1 - Jul 12th, 2006 at 10:50pm
 
Dear Masaki,

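A very good question indeed. First, you need a tree for model selection because you have to take identity by descent versus independent mutation into account: two sequences may share a nucleotide because they inherited it from a common ancestor or because the same change happened independently on separate branches, and the tree is what lets the likelihood calculation tell these apart. Inferring the tree under some substitution model will definitely introduce a bias into subsequent model selection based on that tree, but the bias is most likely not fatal.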

Model selection tends to be fairly robust to moderate changes in the tree. You can check this by selecting models on trees inferred by, for example, parsimony, neighbor joining, and ML. In many cases you will find that the same model is selected for each tree. This is not to say that model selection is always insensitive to tree bias; I am sure there are examples where one strongly influences the other.

You can also try the inverse of this process: infer several trees under some candidate models (HKY, GTR, TrN, etc.) and see whether the trees are identical or highly similar.
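One common way to put a number on "identical or highly similar" is the Robinson-Foulds distance, which counts the bipartitions (splits) present in one tree but not the other; a distance of 0 means the unrooted topologies are identical. The thread does not prescribe this measure, so treat the sketch below as an illustration under assumptions: trees are encoded as hypothetical nested Python tuples of taxon names rather than parsed from Newick, and branch lengths are ignored.

```python
from typing import FrozenSet, Set, Union

# Hypothetical tree encoding: a leaf is a taxon name (str), an internal
# node is a tuple of children. Where the tree is rooted does not matter.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions (splits) of an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed anchor
    taxon, so the same unrooted split always gets the same key.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    result: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> None:
        clade = leaves(node)
        if not isinstance(node, str):
            for child in node:
                walk(child)
        # Skip trivial splits (a single taxon, or everything, on one side).
        if 2 <= len(clade) <= len(all_taxa) - 2:
            result.add(all_taxa - clade if anchor in clade else clade)

    walk(tree)
    return result


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of splits found in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))
```

For example, `((("A", "B"), "C"), ("D", "E"))` and `(("A", "B"), ("C", ("D", "E")))` are the same unrooted tree drawn with different roots, so their distance is 0. In practice you would read the trees inferred under each model from Newick files with a phylogenetics library instead of typing them in as tuples.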

Lastly, if you want to be very safe, I would recommend a model-averaged approach from the 2004 Systematic Biology paper by Posada and Buckley. In essence, you can fit all 203 possible time-reversible nucleotide models (from F81 to GTR and everything in between) and base your inference on weighted contributions from all models.
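The number 203 comes from grouping the six exchangeability rates of a time-reversible model (AC, AG, AT, CG, CT, GT) into classes that share a parameter: there are exactly 203 ways to partition six items. The sketch below enumerates those groupings and then shows one way to do the weighting. Using Akaike weights is an assumption here (it is a standard choice, but the thread does not name the scheme), and the functions `akaike_weights` and `model_average` are hypothetical helpers, not part of HyPhy.

```python
import math
from typing import Dict, Iterator, List

# The six exchangeability rates of a time-reversible nucleotide model.
RATES = ["AC", "AG", "AT", "CG", "CT", "GT"]


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every way of grouping `items` into blocks of shared rates.

    Each block is one free rate parameter: a single block holding all six
    rates corresponds to F81, six singleton blocks to GTR.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # `first` gets its own rate parameter ...
        yield [[first]] + partition
        # ... or shares a rate with one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def akaike_weights(aic: Dict[str, float]) -> Dict[str, float]:
    """Convert AIC scores into normalized Akaike weights."""
    best = min(aic.values())
    rel = {m: math.exp(-0.5 * (score - best)) for m, score in aic.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def model_average(estimates: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of a quantity estimated under each model."""
    return sum(weights[m] * estimates[m] for m in estimates)
```

`list(set_partitions(RATES))` has 203 entries. HKY85, for instance, is the two-block grouping that puts the transitions {AG, CT} in one rate class and the four transversions in the other. After fitting each model, you would feed its AIC into `akaike_weights` and average whatever quantity you care about across models.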

Hope this helps,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
Masaki
Guest


Re: Substitution model determination
Reply #2 - Jul 14th, 2006 at 1:53am
 
Dear Sergei,

Thank you for your kind and quick reply to my question. I will check the robustness of the model selection by providing other trees constructed by MP, NJ and ML with other models.

I have another question on substitution model usage in HyPhy:
how do I choose between the local, global, and global w/variation options for the data analysis?
Can I compare them using AIC for the best model selected by NucModelCompare.bf under each option?
Sergei
Re: Substitution model determination
Reply #3 - Jul 14th, 2006 at 10:50am
 
Dear Masaki,

You could indeed do that. Local models are generally too parameter-rich for larger datasets, since they attempt to estimate multiple parameters per branch (e.g., up to 6 per branch with local GTR). Local models should really only be used for long alignments with few sequences (e.g., fewer than 20).
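To see why local models blow up, count parameters as the number of sequences grows. This is a rough sketch under stated assumptions: an unrooted, fully resolved tree with 2n - 3 branches; a global GTR model with one length per branch plus 5 shared relative rates (base frequencies taken empirically and not counted); and a local model with the "up to 6" parameters per branch mentioned above. HyPhy's exact bookkeeping may differ.

```python
def n_branches(n_taxa: int) -> int:
    """Branches in an unrooted, fully resolved tree."""
    return 2 * n_taxa - 3


def global_param_count(n_taxa: int, shared_rate_params: int = 5) -> int:
    """Global model: one length per branch plus rates shared by all branches.

    Assumes GTR (5 free relative rates) and empirical base frequencies.
    """
    return n_branches(n_taxa) + shared_rate_params


def local_param_count(n_taxa: int, per_branch_params: int = 6) -> int:
    """Local model: every branch gets its own set of rate parameters.

    Assumes up to 6 parameters per branch, as for local GTR.
    """
    return n_branches(n_taxa) * per_branch_params
```

With 20 sequences this gives 42 parameters for the global model versus 222 for the local one, and the gap widens linearly with every added sequence, which is why a long alignment is needed before the local option becomes reasonable.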

You can test the rate variation option by fitting the Beta+Gamma distribution on top of the global nucleotide model and then using AIC (or small-sample AIC, i.e. AICc, or BIC) to choose the better model. In most cases, the Beta+Gamma model will fit better than the plain global one.
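The three criteria are simple functions of each fit's maximized log-likelihood and parameter count, so the comparison can be sketched as below; lower scores are better. Two assumptions to flag: the sample size `n` used by AICc and BIC is taken to be the number of alignment sites (a common convention, not something stated in the thread), and `best_model` is a hypothetical helper, not a HyPhy function.

```python
import math
from typing import Dict, Tuple


def aic(log_l: float, k: int) -> float:
    """Akaike information criterion."""
    return 2 * k - 2 * log_l


def aicc(log_l: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; n is the sample size."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(log_l, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(log_l: float, k: int, n: int) -> float:
    """Bayesian (Schwarz) information criterion."""
    return k * math.log(n) - 2 * log_l


def best_model(fits: Dict[str, Tuple[float, int]], n: int,
               criterion: str = "AICc") -> str:
    """Return the name of the model with the lowest score.

    `fits` maps a model name to (log-likelihood, number of parameters).
    """
    scorers = {
        "AIC": lambda log_l, k: aic(log_l, k),
        "AICc": lambda log_l, k: aicc(log_l, k, n),
        "BIC": lambda log_l, k: bic(log_l, k, n),
    }
    score = scorers[criterion]
    return min(fits, key=lambda name: score(*fits[name]))
```

You would fill `fits` with the log-likelihoods and parameter counts reported for the global and global w/variation runs. The extra rate-variation parameters are only worth keeping if they raise the log-likelihood by more than the penalty each criterion charges for them; BIC's penalty grows with `n`, so it tends to favor the simpler model more often than AIC does.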

Hope this helps,
Sergei
Masaki
Guest


Re: Substitution model determination
Reply #4 - Jul 17th, 2006 at 6:10pm
 
Dear Sergei,

Thank you for your kind explanation.
I will do as you suggested.
Thank you again.