HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1161979107

Message started by Jon on Oct 27th, 2006 at 12:58pm

Title: correction for multiple tests
Post by Jon on Oct 27th, 2006 at 12:58pm
Dear Sergei,

I've run CompareSelectivePressure.bf and CompareSelectivePressureIVL.bf. In each analysis, three of the 84 codons came out as evolving under significantly (alpha=0.05) different selective pressures between the two species.

For tips, the three positions are 44 (p=0.049), 68 (p=0.048), and 82 (p=0.008). Along internal branches, they are 66 (p=0.038), 80 (p=0.02), and 82 (p=0.033). Note that position 82 is significant in both analyses, and that the p-value for position 68, which was significant for tips, is 0.053 along internal branches - only just above the cutoff.

In the 2006 PLoS Comput Biol paper, you mention that the test is conservative, and judging by Fig. S8 it does indeed seem to be very conservative. However, should I still apply some kind of correction for multiple tests? What would be your view on that (let's say you were a referee on the paper)?

Cheers,

Jon

Title: Re: correction for multiple tests
Post by Sergei on Oct 27th, 2006 at 3:01pm
Dear Jon,

In this context, a multiple-test correction would only apply if I wanted to take the site-wise inferences and make a global conclusion (e.g. because some sites are selected, there is evidence of global selection, with an associated p-value).

I am testing S (the number of sites) independent hypotheses of the form "is site i under selection or not?". Each of these hypotheses can return a false positive at a rate given by the p-value cutoff (e.g. 0.05). If my question is whether or not a given site is under selection, then I can simply use that p-value, because I am testing a single hypothesis: is site 'i' under selection or not. What this means, exactly, is that had N independent samples of site 'i' evolving neutrally been generated, the test would have (falsely) reported the site to be under selection about 0.05*N times (or fewer, if the test is conservative). Our simulations on FEL in the 2005 MBE paper also showed that the rate of false positives (i.e. the proportion of negatively or neutrally selected sites reported by site-by-site inference as positively selected) was dominated by the p-value of the test. You can think of it this way: if the test reported 50 sites under selection and the error rate was 0.05, then 50*0.05 = 2.5, i.e. roughly 2-3 of those sites are expected to be false positives.
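
To see the first point numerically: a minimal Python sketch, assuming only that a well-calibrated test produces Uniform(0,1) p-values on neutrally evolving data (the counts here are illustrative, not from the actual analysis):

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 0.05        # per-site significance cutoff
    n_neutral = 10_000  # hypothetical neutral replicates of one site

    # A calibrated test yields Uniform(0,1) p-values on neutral data;
    # a conservative test yields larger ones, hence fewer rejections.
    p_values = rng.uniform(size=n_neutral)
    false_positives = int(np.sum(p_values < alpha))

    print(false_positives, "false positives, expected about", alpha * n_neutral)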

Now suppose I want to use site-by-site inference to say: if I see AT LEAST one site under selection, I am going to claim that this is evidence that the entire gene is under selection. Composite inference like this is where one needs to worry about multiple-test corrections, because if each test can fail at rate p, then the chance that at least one of S tests fails, 1-(1-p)^S, is much higher. A multiple-test correction would also be in order if you wanted to ensure that every single test reported as significant was likely correct. For example, in your case, the three significant tests are expected to ALL be correct at once (in the most conservative estimation) (1-p)^3 of the time. For p=0.05 this is about 86% of the time, but that's only if the test actually has an error rate of p; FEL tends to be conservative.
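
Putting numbers on both quantities (assuming independent tests; S=84 matches the alignment above, but is otherwise just an illustration):

    p = 0.05  # per-test false positive rate
    S = 84    # number of codons tested

    # Chance that at least one of S independent null tests rejects:
    print(1 - (1 - p) ** S)  # ~0.987 for S=84

    # Chance that three significant calls are ALL correct, in the most
    # conservative reading (each may be wrong with probability p):
    print((1 - p) ** 3)      # ~0.857, i.e. about 86%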

Multiple test corrections, however, are very hard to apply in site-by-site testing. Effectively, this is because all procedures (except for the ultra-conservative Bonferroni) expect that a proportion of the tests were conducted on data generated under the null - this is needed to estimate the expected distribution of p-values. For sequence analysis, however, the null model = neutral evolution, and very few sites in biological alignments are even close to neutral - most are conserved and a few are positively selected. This breaks most pFDR procedures - we tried to implement some for the "Not So Different" paper, and discovered this rather quickly.
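
To make the failure mode concrete: Storey-type pFDR procedures begin by estimating the proportion of true nulls, pi0, from the p-values near 1, under the assumption that null p-values are uniform. A sketch of that estimator (lambda = 0.5 is a common default; this is illustrative, not the exact procedure we tried):

    import numpy as np

    def storey_pi0(p_values, lam=0.5):
        # Storey's estimate of the proportion of true nulls:
        # pi0 ~ #{p > lam} / (m * (1 - lam)), which presumes that
        # null p-values are Uniform(0,1).
        p = np.asarray(p_values)
        return np.sum(p > lam) / (p.size * (1.0 - lam))

    # When most sites are under strong purifying selection, their p-values
    # are far from uniform, this estimate is badly biased, and the q-values
    # built on it inherit the bias.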

Hope this makes sense,
Sergei


Title: Re: correction for multiple tests
Post by Jon on Oct 30th, 2006 at 8:38am
Thanks Sergei,

Since the inference I wish to make is whether or not there is evidence of differential selection somewhere between the two species, based on the site-by-site tests, I should apply some false discovery rate correction such as the q-value approach proposed by Storey (I think Bonferroni is too conservative to be really useful). But you're saying that the fact that most of the data are not generated under the null creates a problem. Is it still worth applying?

Besides, I'm concerned about the lack of power if I feed the q-value procedure the p-values as they are, given your simulation results. Couldn't there be a way to adjust the observed p-values based on the observed rate of false positives from the simulation, so that they are closer to the desired alpha level? Looking at the red curve on Fig. S8 in the PLoS paper, obtaining a p-value of, let's say, 0.02 from this test would correspond to a probability of type I error of something like 0.00-something (the y-axis is not detailed).
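
One way to formalize that adjustment, assuming one had the raw p-values from a large set of null simulations (the data behind a curve like Fig. S8), would be an empirical recalibration - a hypothetical sketch:

    import numpy as np

    def calibrated_p(observed_p, null_ps):
        # Empirical type I error at the observed threshold: the fraction
        # of null-simulation p-values falling at or below observed_p.
        null_ps = np.sort(np.asarray(null_ps))
        return np.searchsorted(null_ps, observed_p, side="right") / null_ps.size

    # E.g. if only 0.4% of null-simulation p-values are <= 0.02, an
    # observed p = 0.02 recalibrates to ~0.004.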

Do you have any suggestions about what kind of correction I should apply? Do you think a q-value procedure, such as the one available as an R package, would be the one to use? If yes, should we attempt to correct the p-values based on the simulation results, to take the conservative nature of the test into account, before computing the q-values?

Title: Re: correction for multiple tests
Post by Sergei on Oct 30th, 2006 at 9:32am
Dear Jon,

I think there are actually NO sites evolving under the null, because the null means that selection pressures are exactly the same on a site in two different populations; there may be some where selection is similar, but not the same. I could never get q-values to work in a FEL setting, so I wouldn't recommend using them. We developed site-by-site tests to identify sites of interest, not so much to test global questions, but there are two things you could do (there may be more, but these are the ones that came to mind):

1). Do a global LRT-type test. Consider all sites that are variable in at least one of the two populations (constant sites have no signal, and are excluded from the analysis anyway); let there be K of them. The differential selection analysis reports an LR value for each site. Sum them up and compare the sum to a chi^2 distribution with K degrees of freedom, testing the global NULL model (all matched pairs of variable sites have the same dN/dS) against the global ALTERNATIVE model (dN/dS differs in each pair); a sketch follows below. This is likely to be very conservative, but if you can reject the NULL, you'll have your result.

2). Simulation-based: take the site-by-site rates estimated by the model which forces the same dN/dS for matched sites (a data-driven NULL). Simulate 100 replicates and run the differential selection test on each. Now record the SMALLEST p-value in each replicate and compare it with the smallest p-value in your actual test (a sketch of this also follows below). If 5 or fewer replicates can beat the observed p-value, this is evidence that you don't need a multiple-test correction.
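
A sketch of option 1), assuming the per-site likelihood-ratio statistics (2 * delta log-likelihood, one degree of freedom each) are already in hand; the function name is illustrative:

    import numpy as np
    from scipy.stats import chi2

    def global_lrt(site_LRs):
        # site_LRs: one LR statistic per variable site, as reported by
        # the differential selection test. Under the global NULL their
        # sum is approximately chi^2 with K degrees of freedom.
        lr_sum = float(np.sum(site_LRs))
        K = len(site_LRs)
        return lr_sum, chi2.sf(lr_sum, df=K)  # statistic and global p-value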

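And a sketch of option 2), again with illustrative names, assuming the per-site p-values from the real data and from each null replicate have been collected:

    import numpy as np

    def min_p_value_test(observed_ps, null_replicate_ps):
        # null_replicate_ps: shape (n_replicates, n_sites), per-site
        # p-values from data simulated under the data-driven NULL.
        observed_min = np.min(observed_ps)
        replicate_mins = np.min(null_replicate_ps, axis=1)
        # Fraction of replicates whose smallest p-value beats the observed
        # one; at or below 0.05 (e.g. 5 of 100), the best observed site
        # stands despite the implicit multiplicity of scanning all sites.
        return float(np.mean(replicate_mins <= observed_min))
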
Hope this helps,
Sergei

Title: Re: correction for multiple tests
Post by Jon on Oct 31st, 2006 at 7:45am
Thanks a lot. Both are good ideas. That solves my problem!
