HyPhy message board

Dear authors of HyPhy,

I have a question about the method used to compare selective pressures among populations/genes (compareselectivepressures). I have been unable to find any documentation for it. What rationale and which test are used to judge the significance of differences in mutation patterns between the two populations?

Thank you for any information,

Yours,
Ted

Dear Ted,

We haven't written the documentation for this analysis yet, because we are working on the manuscript describing it.

Very briefly, the idea for the pairwise comparison is as follows:

1) Find the best-fitting nucleotide model for each alignment, given that alignment's tree.
2) Fit the respective nucleotide model to each alignment to estimate branch lengths and rate bias parameters.
3) Holding those constant, fit an MG94x(appropriate nucleotide model) codon model to each alignment at every codon site. The MG model has two parameters: alpha (synonymous rate) and beta (non-synonymous rate). The general (alternative) model for every site assumes that alpha_1 and beta_1 at that site (first alignment) are independent of alpha_2 and beta_2 (second alignment). The null model assumes that beta_1/alpha_1 = beta_2/alpha_2. In other words, the null model stipulates equal selective pressure at the site in both alignments, whereas the alternative allows the pressures to differ. Significance is assessed with a likelihood ratio test (LRT) with one degree of freedom, since the null constraint removes one free parameter.
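The per-site test in step 3 can be sketched as follows. This is a minimal illustration, not HyPhy's implementation: it assumes you already have the maximized log-likelihoods of one site under the alternative (alpha_1, beta_1, alpha_2, beta_2 all free) and under the null (beta_1/alpha_1 = beta_2/alpha_2), and the names `site_lrt` and `chi2_1df_sf` are hypothetical. The constraint turns four free rates into three, so the statistic is compared against a chi-squared distribution with one degree of freedom.

```python
import math


def chi2_1df_sf(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    For 1 df, chi^2 = Z^2 with Z standard normal, so
    P(chi^2 > x) = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def site_lrt(lnl_alternative: float, lnl_null: float) -> tuple[float, float]:
    """Hypothetical helper: LRT for equal beta/alpha at one codon site.

    lnl_alternative: max log-likelihood with alpha_1, beta_1, alpha_2, beta_2 free.
    lnl_null: max log-likelihood under beta_1/alpha_1 = beta_2/alpha_2.
    Returns (LR statistic, p-value).
    """
    # The null is nested in the alternative, so lnl_alternative >= lnl_null in
    # theory; clamp small negative values caused by optimizer noise to zero.
    stat = max(0.0, 2.0 * (lnl_alternative - lnl_null))
    return stat, chi2_1df_sf(stat)
```

With made-up log-likelihoods, `site_lrt(-120.4, -123.1)` gives an LR statistic of about 5.4 and a p-value of about 0.02.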

The analysis assumes, of course, that the alignments are of the same gene but sampled from different populations (e.g. different subtypes of the same virus). In particular, the alignments must be of the same length. Power is satisfactory (around 80%) if the alignments are large (>50 sequences) or if the contrast in beta/alpha is high (e.g. positively selected in one alignment and negatively selected in the other). False positive rates are well controlled: in our simulations they stayed at or below the nominal p-value.
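Before running the comparison, it may help to check the two inputs against these requirements. The sketch below assumes the alignments are already loaded as plain Python lists of aligned sequence strings (no file format or library is assumed), that both are in-frame codon alignments (so the length should be a multiple of 3, an assumption of this sketch rather than a stated HyPhy rule), and it uses the ~50-sequence figure above only as a rough power guideline. The function name `check_inputs` is hypothetical.

```python
def check_inputs(aln1: list[str], aln2: list[str], min_seqs: int = 50) -> list[str]:
    """Hypothetical pre-flight check; returns warnings (empty list = no problems)."""
    warnings = []
    for name, aln in (("alignment 1", aln1), ("alignment 2", aln2)):
        if not aln:
            warnings.append(f"{name} is empty")
            continue
        lengths = {len(seq) for seq in aln}
        if len(lengths) != 1:
            warnings.append(f"{name}: sequences are not all the same length")
        elif next(iter(lengths)) % 3 != 0:
            # Assumption: codon alignments should be a whole number of codons.
            warnings.append(f"{name}: length is not a multiple of 3 (not codon-aligned?)")
        if len(aln) < min_seqs:
            warnings.append(f"{name}: only {len(aln)} sequences; power may be low")
    # Sites are compared pairwise, so both alignments must have the same length.
    if aln1 and aln2 and len(aln1[0]) != len(aln2[0]):
        warnings.append("alignments differ in length; sites cannot be paired")
    return warnings
```

For example, `check_inputs(["ATGAAA", "ATGAAG"], ["ATGAAAC"], min_seqs=2)` reports the length mismatch between alignments, the non-codon length of the second alignment, and its small sample size.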

Once we submit the manuscript, I'll post a link to the preprint.

HTH,
Sergei

Dear authors of HyPhy,

I have a question about the method used to compare selective pressures among populations/genes (compare selective pressures). I have tried your method, but it does not seem sensitive enough to detect differences in selected sites between compartments of HIV-1 sequences from the same patients. Although compartmentalization of the HIV-1 sequences was evident and many sites differed between the two populations over time, this method found no statistically significant sites. Might I have done something wrong? Or is the method not sensitive enough to detect minor variation? (Here I am comparing sequences from the same patient, rather than sequences from different clades.)
Thanks
Rafael

Dear Rafael,

The selection comparison method is most sensitive to the number of sequences in each sample. Our own analyses (in the process of being published) suggest that one needs 30 or more sequences in each sample for the method to gain power. For smaller samples, you may need to use a more lenient p-value cutoff, since the test is rather conservative overall.
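This advice about small samples could be applied along the following lines. This is a hypothetical helper, not part of HyPhy: it assumes you already have one LRT p-value per codon site (from the per-site test described in step 3 of the first reply), it uses the 30-sequence guideline above, and the lenient cutoff of 0.1 is an illustrative value chosen for this sketch, not a recommendation from the reply.

```python
def flag_sites(site_pvalues: dict[int, float], n1: int, n2: int,
               alpha: float = 0.05, small_sample_alpha: float = 0.1,
               min_seqs: int = 30) -> list[int]:
    """Hypothetical helper: return codon sites whose p-value is at or below the cutoff.

    If either sample has fewer than min_seqs sequences, the more lenient
    small_sample_alpha cutoff is used, because the test tends to be conservative.
    """
    cutoff = small_sample_alpha if min(n1, n2) < min_seqs else alpha
    return sorted(site for site, p in site_pvalues.items() if p <= cutoff)
```

With made-up p-values `{1: 0.03, 2: 0.08, 3: 0.5}`, two samples of 40 and 45 sequences flag only site 1, while samples of 12 and 45 sequences also flag site 2.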

Sequence diversity is also important. It may be that your within-patient samples are simply too similar for site-by-site analyses.

We are currently working on some random effects methods (which pool information across all sites) to look for differential selection. I'll post a note here when we have something publicly available.

HTH,
Sergei

Dear Rafael,

To add to Sergei's comments:

1. The compare selective pressures analysis asks whether dN/dS, estimated within each compartment, differs between the compartments. Hence, a site may be divergent between two compartments and driven there by positive selection (dN>dS), yet conserved (dN<dS) within each compartment. Because the analysis only compares selection within each compartment, nothing will show up in this particular case.

2. If one is truly interested in whether positive selection has driven compartmentalization, one needs to focus on the branch (or branches) of the tree that separate the compartments. To detect site-by-site differences between compartments, one needs (a) a lot of patients and (b) to hope that some sites are commonly divergent across patients. Even if you have many sequences from each compartment, and it therefore appears that you have lots of data, the evolution of HIV to infect another compartment may have arisen only once. So, with 10 sequences from each compartment, one really only has a sample size of one. Hence, your observation that many sites are distinct does not mean that you have the power to detect individual sites that have diverged under positive selection.

Good luck with your analyses!
Simon

Thread "compare populations/genes" (http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1089101292), started by Ted Mes on Jul 6th, 2004 at 1:08am.