Nicola
Dear all,
I am developing a model that allows polymorphisms in a phylogenetic analysis. Until now I have assumed that the observed allele frequencies are the "true" ones, but I now want to account for sampling variance ("uncertainty" or "ambiguity", one could say). That is, the states at the leaves will no longer be treated as observed, and the tips of the tree will behave like inner nodes.
The perfect solution for me would be to specify the input file not the usual way: ">species1 ATTGCA...", but explicitly declaring at each site the probability of each character: ">species1 (A:0.9,C:0.05,G:0.05)TTG(A:0.05,C:0.9,G:0.05)A..." or something similar (in fact, instead of nucleotides I have codons). I guess that to obtain this result I would have to modify the functions "ReadDataFile", "CreateFilter", probably also "LikelihoodFunction"? Does this sound feasible in a reasonable time?
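To make the proposed format concrete, here is a minimal Python sketch (not HyPhy code) of how such a line could be parsed into one probability vector per site. The function name `parse_uncertain_sequence` and the exact syntax are hypothetical, and nucleotides are used instead of codons for brevity.

```python
import re

NUCLEOTIDES = "ACGT"

# A token is either a parenthesised group "(A:0.9,C:0.05,...)"
# or a single plain character such as "T".
TOKEN = re.compile(r"\(([^)]*)\)|([A-Za-z])")


def parse_uncertain_sequence(text, alphabet=NUCLEOTIDES):
    """Return one probability vector (ordered like `alphabet`) per site."""
    sites = []
    for match in TOKEN.finditer(text):
        group, plain = match.groups()
        probs = dict.fromkeys(alphabet, 0.0)
        if plain is not None:
            # An unambiguous character puts all of its weight on one state.
            entries = [(plain, 1.0)]
        else:
            # Split "A:0.9,C:0.05" into (state, probability) pairs.
            entries = []
            for item in group.split(","):
                state, value = item.split(":")
                entries.append((state.strip(), float(value)))
        for state, value in entries:
            state = state.upper()
            if state not in probs:
                raise ValueError(f"unknown state {state!r}")
            probs[state] = value
        sites.append([probs[s] for s in alphabet])
    return sites


sites = parse_uncertain_sequence(
    "(A:0.9,C:0.05,G:0.05)TTG(A:0.05,C:0.9,G:0.05)A"
)
```

Each vector would then play the role that the 0/1 indicator vector plays at an ordinary tip; states not listed inside a group are given probability 0 in this sketch.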
The alternative would be to introduce new characters: "observed" states (Ao, Co, Go, To) distinct from the "unobserved" states (Au, Cu, Gu, Tu). Only unobserved characters would be allowed in the inner phylogeny; I would add one "dummy" branch below each tree tip, and on those dummy branches allow only substitutions from unobserved to observed states. This should work, but it should also slow everything down (the state space would have a larger dimension, and I am already working with codons).
Alternatively, is there a way to allow at most 1 substitution on a branch?
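To be explicit about what I mean by "at most 1 substitution", here is a small Python sketch (again not HyPhy code; the function name is hypothetical) of the transition probabilities I have in mind for a continuous-time Markov chain with rate matrix Q. It assumes that the rows of Q sum to zero and keeps only the paths with zero or exactly one substitution on a branch of length t.

```python
import math


def at_most_one_substitution(Q, t):
    """Probability of going from state i to state j in time t
    along paths with zero or exactly one substitution.

    Q is a rate matrix (list of lists) whose rows sum to zero.
    """
    n = len(Q)
    rate = [-Q[i][i] for i in range(n)]  # total rate of leaving each state
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # No substitution at all during time t.
                P[i][j] = math.exp(-rate[i] * t)
            elif math.isclose(rate[i], rate[j]):
                # One jump i -> j when both states have the same leaving rate.
                P[i][j] = Q[i][j] * t * math.exp(-rate[i] * t)
            else:
                # One jump i -> j, integrated over the time of the jump.
                P[i][j] = (
                    Q[i][j]
                    * (math.exp(-rate[i] * t) - math.exp(-rate[j] * t))
                    / (rate[j] - rate[i])
                )
    return P


# Jukes-Cantor-like example with total leaving rate 1 for every state.
jc = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
P = at_most_one_substitution(jc, 0.1)
```

The rows of this matrix sum to less than 1, because the probability of two or more substitutions is discarded.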
Thanks, and best regards,
Nicola De Maio