Jamie
Sergei, thanks for offering the more explicit instructions. However, in reading them, I realized that I was looking for a cleaver and you gave me a scalpel (to reference the prez's budgeting...).
To explain further, and in reference to mm who started this thread, PAML has a 'dirty data' option which causes the program to ignore any site that has a gap. This is what I mean by a cleaver. My understanding of PAML is that, with this option turned on, if you give it a 10-codon alignment and there is a gap in one or more taxa for 1 codon, it analyzes only 9 codons. In HYPHY terms, it creates a partition that excludes any codon sites with a gap in the alignment.
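To make the "cleaver" behavior concrete, here is a minimal Python sketch of what I mean: drop every codon column in which any taxon has a gap. This is not PAML's or HYPHY's actual code, just an illustration. The function name, the toy alignment, and the choice of '-' as the only gap character are all my own assumptions.

```python
def drop_gapped_codons(alignment, gap_chars="-"):
    """Keep only the codon columns that have no gap in any taxon.

    `alignment` maps taxon name -> aligned, in-frame nucleotide string.
    Treating only '-' as a gap is an assumption; pass other characters
    via `gap_chars` if your alignment uses them.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if length % 3 != 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be equal length and a multiple of 3")

    # Start positions of codons that are gap-free in every taxon.
    keep = [
        start
        for start in range(0, length, 3)
        if not any(ch in gap_chars for s in seqs for ch in s[start:start + 3])
    ]
    return {
        name: "".join(seq[start:start + 3] for start in keep)
        for name, seq in alignment.items()
    }


# Hypothetical 10-codon alignment; taxonB has a gap at codon 4.
toy = {
    "taxonA": "ATGAAACCCGGGTTTATGAAACCCGGGTTT",
    "taxonB": "ATGAAACCC---TTTATGAAACCCGGGTTT",
}
cleaned = drop_gapped_codons(toy)
n_codons_left = len(cleaned["taxonA"]) // 3
```

With this toy input, the gapped codon is removed from both taxa, so 9 of the 10 codons remain.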
Now, assuming I'm understanding things correctly, the 'COUNT_GAPS_IN_FREQUENCIES = 0;' "scalpel" is much more nuanced than this. Using this option does not affect the partitioning of the data in any way -- all 10 sites in our alignment will still be analyzed. It just affects how the nucleotide frequencies are estimated for that codon site, as you describe above.
So what I should have asked is whether there is a way to tell HYPHY to ignore every alignment position that includes a gap. I believe this is what mm was asking for. So, in the case of an FEL analysis (for instance in the SelectionSubtreeComparison that I'm working with), I would expect output in which estimates are reported for only 9 of the 10 sites.
I do recognize that FELs are conveniently 'quantized' by site and that it would not be all that difficult to generate a 'gapped T/F' vector externally via some script that parses the alignment. But this is not the case for REL analyses, which combine info across sites. So, for anyone migrating from PAML to HYPHY (as I think we all should...), having this option in HYPHY would help the pilgrims.
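For the FEL case, the external 'gapped T/F' vector could look something like the Python sketch below. It only flags codon sites and does not touch HYPHY output itself. The function name, the toy sequences, and the assumption that '-' is the only gap character are mine.

```python
def gapped_codon_mask(sequences, gap_chars="-"):
    """Return one boolean per codon site: True if any taxon has a gap there.

    `sequences` is a list of aligned, in-frame nucleotide strings.
    Only '-' counts as a gap by default (an assumption).
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if length % 3 != 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be equal length and a multiple of 3")
    return [
        any(ch in gap_chars for seq in sequences for ch in seq[start:start + 3])
        for start in range(0, length, 3)
    ]


# Hypothetical 4-codon alignment: codon 2 is fully gapped in one taxon,
# codon 3 is partially gapped in another.
toy_sequences = [
    "ATGAAACCCGGG",
    "ATG---CCCGGG",
    "ATGAAACC-GGG",
]
mask = gapped_codon_mask(toy_sequences)

# 1-based codon sites whose per-site (FEL-style) estimates one would keep.
ungapped_sites = [i + 1 for i, gapped in enumerate(mask) if not gapped]
```

The mask can then be used to filter per-site results after the fact, which works for FEL precisely because each site is estimated separately; it would not undo the influence of gapped sites on a REL fit.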
Thanks for helping me out with the nitty-gritty here...
Jamie