Jan
Hi all,
I'm working on characterising the metagenome of a virus, and want to identify regions of the genome under positive evolutionary pressure. It looks like the HyPhy package would do the trick, but unfortunately I'm not well versed in genomic statistics or the programs associated with them, so I was hoping someone could point me in the right direction.
Some background: The virus has low overall NA diversity (~2% over ~6 kb), and even its most variable gene is 97% conserved (~1.5 kb gene). The virus does have subgroups, and these subgroups are the origin of most of the variation (that is, the polymorphisms are conserved in a subtype-specific manner). This leads me to conclude that those sites would be under positive selection, correct?
As a bit of an exercise, I plugged the alignment of my most variable gene into the REL app on the Datamonkey.org site, and got some results that puzzled me. The analysis reported only sites under negative selection, and the output was sensitive to the number of sequences I put in, even though identical sequences are apparently pruned out prior to analysis. I think I am missing something fundamental here...
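To make the pruning step concrete, here is a minimal Python sketch of collapsing identical sequences and counting variable alignment columns. It is not Datamonkey's actual code: the function names and the toy alignment are hypothetical, and it assumes exact string matching on already-aligned sequences of equal length.

```python
def collapse_identical(alignment):
    """Keep the first name seen for each distinct aligned sequence."""
    first_name = {}
    for name, seq in alignment.items():
        # setdefault only stores the name the first time a sequence is seen
        first_name.setdefault(seq.upper(), name)
    return {name: seq for seq, name in first_name.items()}


def variable_columns(alignment):
    """Return 0-based indices of columns with more than one character."""
    seqs = [s.upper() for s in alignment.values()]
    return [i for i, col in enumerate(zip(*seqs)) if len(set(col)) > 1]


# Hypothetical toy alignment: two subgroups, one exact duplicate.
toy = {
    "isolate_A1": "ATGGCTAAA",
    "isolate_A2": "ATGGCTAAA",
    "isolate_B1": "ATGGCAAAA",
    "isolate_B2": "ATGGCAAAG",
}

unique = collapse_identical(toy)
print(len(toy), "sequences,", len(unique), "unique")
print("variable columns:", variable_columns(unique))
```

Under these assumptions, adding further exact duplicates changes neither the unique set nor the variable columns, so a result that shifts with the number of input sequences would suggest the extra sequences are not strictly identical (for example, they differ by a gap or an ambiguous base).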
So, I guess my questions are: 1) Given the low sequence diversity and small sequence size, what would be the best method for analysing selective pressure? 2) Would it be most accurate to analyse the whole genome, including non-coding and regulatory regions, or analyse each gene separately?
Thanks for any help!