Sergei
Dear Masaki,
A very good question indeed. Firstly, you need a tree for model selection because you have to take identity by descent versus independent mutation into account. Inferring that tree under some substitution model will definitely introduce a bias into subsequent model selection based on the tree, but most likely the bias is not fatal.
Model selection tends to be fairly robust to moderate changes in the tree. You can check this by selecting models on trees inferred by, for example, parsimony, neighbor joining and ML. In many cases you will find that the same model is selected for every tree. This is not to say that model selection is always insensitive to tree bias; I am sure there are examples where one strongly influences the other.
You can also try the inverse of this process: infer several trees under different candidate models (HKY, GTR, TrN, etc.) and see whether the trees are identical or highly similar.
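One way to put a number on "identical or highly similar" is the Robinson-Foulds distance, which counts the splits (internal branches) present in one tree but not the other. The sketch below is only an illustration: it assumes trees are written as nested Python tuples of leaf names, that both trees have the same set of leaves, and that only topology matters. The function names are made up for this example and do not come from any phylogenetics package.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
        found.add(child_leaves)
    return leaves, found


def splits(tree):
    """Non-trivial splits of the unrooted version of `tree`.

    Each split is stored as the side that does NOT contain an
    arbitrary anchor leaf, so the rooting of the tuple is irrelevant.
    """
    all_leaves, found = clades(tree)
    anchor = min(all_leaves)
    result = set()
    for clade in found:
        side = clade if anchor not in clade else all_leaves - clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical topologies, e.g. one inferred under HKY and one under GTR.
tree_hky = ((("A", "B"), "C"), ("D", "E"))
tree_gtr = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(tree_hky, tree_gtr))
```

A distance of zero means the two models gave the same topology; small values relative to the number of internal branches suggest the choice of model matters little for the tree.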
Lastly, if you want to be very safe, I would recommend a model-averaged approach from the 2004 Systematic Biology paper by Posada and Buckley. In essence, you can fit all 203 possible time-reversible nucleotide models (from F81 to GTR and everything in between), and base your inference on weighted contributions from all models.
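To make the count and the weighting concrete, here is a rough sketch. The 203 models correspond to every way of grouping the six substitution rates into classes that share a value: one class is F81, six separate classes is GTR. For the weights I am assuming Akaike weights computed from each model's AIC score, which is one common choice; the function names are invented for this illustration, and no real likelihoods are fitted here.

```python
import math


def set_partitions(items):
    """Yield every way to split `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + partition


# The six exchangeability rates of a time-reversible nucleotide model.
RATES = ["AC", "AG", "AT", "CG", "CT", "GT"]
models = list(set_partitions(RATES))
print(len(models))  # 203


def akaike_weights(aic_scores):
    """Turn AIC scores into weights that sum to 1 (lower AIC, higher weight)."""
    best = min(aic_scores)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_scores]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, weights):
    """Weighted average of one parameter's estimates across models."""
    return sum(e * w for e, w in zip(estimates, weights))
```

In practice you would fit each of the 203 models on your alignment and tree, collect the AIC scores and the parameter estimates of interest, and pass them to `akaike_weights` and `model_average`.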
Hope this helps, Sergei