Sergei
Dear Olivier,
Both methods (not surprisingly) have their pros and cons. The rule of thumb is probably to use random effects (G+I plus empirical Bayes) for smaller data sets, and fixed effects (siteRates.bf) for larger data sets.
The strength of FEL is that it is distribution-free: it won't force a potentially wrong distribution (G+I or any other) onto the rates in your alignment. Its weakness is that it is fairly noisy, because you are estimating a rate from a single site, so your sample size is roughly the number of sequences; hence you need about 25 or more sequences to avoid serious overfitting.
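To see why per-site estimation is noisy, here is a toy sketch (not FEL itself, and not HyPhy code): treating a site's rate estimate as a proportion over N sequences, its noise shrinks only as 1/sqrt(N). The rate value, site count, and function names below are all illustrative assumptions.

```python
import random
import statistics

# Toy model: at each site, each of N sequences independently shows a
# change with probability TRUE_RATE; the per-site MLE is the observed
# fraction.  Its standard error scales as 1/sqrt(N), so with few
# sequences the per-site estimates are very noisy.
random.seed(42)
TRUE_RATE = 0.3   # hypothetical per-site change probability
N_SITES = 500     # number of simulated sites

def per_site_estimates(n_seqs):
    """Per-site rate MLEs: fraction of n_seqs sequences showing a change."""
    return [
        sum(random.random() < TRUE_RATE for _ in range(n_seqs)) / n_seqs
        for _ in range(N_SITES)
    ]

spreads = {}
for n in (10, 25, 100):
    spreads[n] = statistics.stdev(per_site_estimates(n))
    print(f"{n:>3} sequences: spread of site-rate estimates ~ {spreads[n]:.3f}")
```

The spread drops sharply between 10 and 25 sequences, which is one way to motivate the "about 25 or more" rule of thumb above.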
The strength of REL is that it can avoid overfitting by 'pooling' similar sites into a single rate class, but this is also its weakness, because one has to assume something about the unknown distribution of rates. In practice, the inference can 'smooth' the rates, forcing sites with different patterns into the same rate class if the underlying distribution is insufficiently flexible. The effective sample size for the hyperparameters of REL (alpha and P_I for G+I) is the length of the alignment, so it will almost surely do poorly on short alignments. I would recommend using beta-Gamma, as it is much more flexible than G+I with only one extra parameter.
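For concreteness, the G+I rate model mentioned above can be sketched as a two-part mixture: with probability P_I a site is invariable (rate 0), otherwise its rate is drawn from a gamma distribution with shape alpha, scaled to have mean 1. The particular alpha, P_I, and site count below are illustrative assumptions, not values from any real analysis.

```python
import random

# Hedged sketch of the G+I (gamma + proportion-invariable) rate
# distribution: alpha and p_inv are the two hyperparameters whose
# effective sample size is the alignment length.
random.seed(1)
alpha = 0.5   # gamma shape; small alpha means strong rate heterogeneity
p_inv = 0.2   # proportion of invariable sites (P_I)

def draw_site_rate():
    """Draw one site rate from the G+I mixture."""
    if random.random() < p_inv:
        return 0.0                              # invariable class
    return random.gammavariate(alpha, 1.0 / alpha)  # Gamma with mean 1

rates = [draw_site_rate() for _ in range(1000)]
mean_rate = sum(rates) / len(rates)  # ~ (1 - p_inv), since the gamma part has mean 1
```

Because the whole distribution is pinned down by just these two numbers, a pattern of site rates that doesn't fit a gamma-plus-spike shape gets smoothed into it, which is the inflexibility the beta-Gamma alternative is meant to relieve.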
HTH, Sergei