HyPhy message board - Print Page

Hi all,

I'm working on characterising the metagenome of a virus and want to identify regions of the genome under positive selective pressure. It looks like the HyPhy package would do the trick, but unfortunately I'm not well versed in genomic statistics or the programs associated with them, so I was hoping someone could point me in the right direction.

Some background: the virus has low overall nucleotide diversity (~2% over ~6 kb), and its most variable gene (~1.5 kb) is 97% conserved. The virus does have subgroups, and these subgroups are the origin of most of the variation (that is, the polymorphisms are conserved in a subtype-specific manner). This leads me to conclude that those sites would be under positive selection, correct?
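For anyone wanting to put a number on "low diversity" for their own alignment, here is a minimal sketch of a mean pairwise p-distance calculation. It is not part of HyPhy or Datamonkey; the function names and the toy sequences are made up for illustration, and it assumes the sequences are already aligned and of equal length.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguity code are skipped.
    """
    valid = "ACGT"
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 0.0

def mean_pairwise_diversity(seqs):
    """Average p-distance over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Toy alignment (hypothetical data, not from the virus discussed here)
toy = ["ATGGCTAAAGGT", "ATGGCTAAAGGT", "ATGGCCAAAGGT", "ATGGCCAAGGGT"]
print(round(mean_pairwise_diversity(toy), 4))
```

A value around 0.02 would correspond to the ~2% diversity mentioned above.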

As a bit of an exercise, I plugged the alignment of my most variable gene into the REL app on the Datamonkey.org site, and got some results that puzzled me. The analysis reported only sites under negative selection, and the output was sensitive to the number of sequences I put in, even though identical sequences are apparently pruned out prior to analysis. I think I am missing something fundamental here...

So,
I guess my questions are:
1) Given the low sequence diversity and small sequence size, what would be the best method for analysing selective pressure?
2) Would it be most accurate to analyse the whole genome, including non-coding and regulatory regions, or analyse each gene separately?

Thanks for any help!

Hi Jan,

Generally speaking, the level of divergence in your alignment is probably too low (unless you have many sequences, i.e. 50+) to reliably detect positive selection at the level of a single site, which is why REL gives you unstable results. Typically, the best you can do in these cases is to test for the action of selection on all (or part) of the gene (i.e. are there any sites under selection), in place of specific sites (i.e. which sites are under selection).
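To get a feel for how little information each site carries in a low-divergence alignment, one can simply count the codon columns that vary at all. The sketch below is only an illustration and is not what REL or PARRIS do internally; the function name and toy data are hypothetical, and it assumes an in-frame codon alignment and the standard genetic code.

```python
# Standard genetic code, built from the usual TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def summarise_codon_columns(seqs):
    """Return (codon columns, variable columns, columns with an amino acid change)."""
    length = min(len(s) for s in seqs) // 3 * 3  # ignore a trailing partial codon
    variable = nonsyn = 0
    for start in range(0, length, 3):
        codons = {s[start:start + 3].upper() for s in seqs}
        # Drop codons containing gaps or ambiguity codes
        codons = {c for c in codons if c in CODON_TABLE}
        if len(codons) > 1:
            variable += 1
            if len({CODON_TABLE[c] for c in codons}) > 1:
                nonsyn += 1
    return length // 3, variable, nonsyn

# Toy alignment (hypothetical data)
toy_codons = ["ATGGCTAAAGGT", "ATGGCCAAAGGT", "ATGGCCAAGGGT", "ATGGACAAAGGT"]
print(summarise_codon_columns(toy_codons))
```

If only a handful of columns vary, and each varies in only one or two sequences, a per-site test has very little to work with, whereas a gene-wide test pools the signal across all columns.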

What I would recommend you do instead is look for region-specific selection, e.g. using PARRIS on Datamonkey to test for evidence of selection on the most variable gene.

If you haven't had a chance to look at [link not available in this copy]

Hi Sergei,

I have run all of my genes through PARRIS (my 64 samples give ~35 unique sequences), including the most variable one (the capsid protein gene), and found no evidence of positive selection at p<0.1. This was also the case when I ran the capsid gene in ~100 nt windows, or when I used only a 198 nt region of the capsid gene encoding an antigenic loop in which one subtype has 2 AA substitutions.
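For reference, the two preprocessing steps mentioned above (collapsing identical sequences and cutting the gene into ~100 nt windows) can be sketched as follows. This is a hypothetical illustration, not Datamonkey's own pruning code; the function names are made up, and it assumes the alignment starts in frame at position 0. Since codon models need whole codons, the windows are 33 codons (99 nt) rather than exactly 100 nt.

```python
def unique_sequences(records):
    """Keep the first record for each distinct sequence.

    records is a list of (name, sequence) tuples; comparison ignores case.
    """
    seen = {}
    for name, seq in records:
        seen.setdefault(seq.upper(), name)
    return [(name, seq) for seq, name in seen.items()]

def codon_windows(length, window_codons=33, step_codons=33):
    """Yield (start, end) nucleotide coordinates of whole-codon windows.

    Assumes the alignment is in frame from position 0; a trailing partial
    codon is ignored and the last window may be shorter than the others.
    """
    n_codons = length // 3
    for first in range(0, n_codons, step_codons):
        last = min(first + window_codons, n_codons)
        yield first * 3, last * 3

# Toy usage (hypothetical data)
records = [("s1", "ATGGCT"), ("s2", "atggct"), ("s3", "ATGGCC")]
print(len(unique_sequences(records)))
print(list(codon_windows(198)))
```

Note that shorter windows contain fewer variable sites, so each window-level test has less power than a test on the whole gene.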

So I'm at a bit of a loss to explain the data. The best I can think of is that:

From a visual examination, the conserved subtype-specific AA changes in the capsid protein, especially in the antigenic loops, look as though they would be under positive (diversifying) selective pressure. However, PARRIS detected no regions under positive selection. Could it therefore be that these two subtypes have diverged enough that they exist as separate populations, and thus are no longer under positive selective pressure in relation to each other?

Thanks again for your help!

HyPhy message board: http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> Viral positive selection detection: http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1254007443
Message started by Jan on Sep 26th, 2009 at 4:24pm