Sergei
Dear Jon,
In this context, multiple correction would only apply if I wanted to take site-wise inference and make a global conclusion (e.g. because some sites are selected, there is evidence of global selection with a p-value).
I am testing S (the number of sites) independent hypotheses: "is site i under selection or not?" Each of these tests can return a false positive at a rate related to the p-value (e.g., 0.05). If my question is whether or not a given site is under selection, then I can simply use that p-value, because I am testing a single hypothesis: is site 'i' under selection or not. What this means, exactly, is that if N independent samples of site 'i' evolving neutrally had been generated, the test would have (falsely) reported the site to be under selection about 0.05*N times (or fewer, if the test is conservative). Our simulations on FEL in the 2005 MBE paper also showed that the rate of false positives (i.e., the proportion of negatively or neutrally selected sites reported by site-by-site inference as positively selected) was dominated by the p-value of the test. You can think of this as: if the test reported 50 sites under selection and the error rate was 0.05, then 50*0.05 ~ 2-3 of those sites are false positives.
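To make the "0.05*N" reasoning concrete, here is a minimal toy simulation (my own illustration, not FEL itself): under the null, a well-calibrated test's p-values are uniform on [0, 1], so rejecting at 0.05 produces false positives at about that rate.

```python
import random

random.seed(1)

ALPHA = 0.05   # per-test significance level
N = 100_000    # independent neutral replicates of a single site

# Under the null, p-values are uniform on [0, 1], so a calibrated test
# rejects (a false positive) with probability ALPHA on each replicate.
false_positives = sum(1 for _ in range(N) if random.random() < ALPHA)

print(false_positives / N)  # close to 0.05
```

A conservative test (as FEL tends to be) would reject less often than this, so the realized rate would sit below ALPHA.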
Now suppose I want to use site-by-site inference to say: if I see AT LEAST one site under selection, I am going to claim that this is evidence that the entire gene is under selection. Composite inference like this is where one needs to worry about multiple-test corrections, because if each test can fail at rate p, then at least one of S tests is going to fail much more frequently. A multiple-test correction would also be in order if you wanted to ensure that every single test reported as significant was likely correct. For example, in your case, three tests are expected to ALL be correct at once (in the most conservative estimation) (1-p)^3 of the time. For p = 0.05 this is about 86% of the time, but that's only if the test actually has an error rate of p; FEL tends to be conservative.
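The arithmetic behind both claims is just independence of the tests; a quick sketch:

```python
p = 0.05  # assumed per-test error rate

# Family-wise error: the probability that AT LEAST one of S independent
# tests returns a false positive, 1 - (1 - p)^S, grows quickly with S.
for S in (1, 3, 10, 100):
    print(S, 1 - (1 - p) ** S)

# Probability that three significant results are ALL correct at once:
print((1 - p) ** 3)  # ~0.857, i.e. about 86% of the time
```

So with, say, S = 100 neutral sites, seeing at least one "significant" site is nearly guaranteed, which is exactly why the at-least-one-site argument needs a correction.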
Multiple-test corrections, however, are very hard to apply in site-by-site testing. Effectively, this is because all procedures (except for the ultra-conservative Bonferroni) expect that a proportion of the tests were conducted on data generated under the null - this is needed to estimate the expected distribution of p-values. For sequence analysis, however, the null model = neutral evolution, and very few sites in biological alignments are even close to neutral - most are conserved, and a few are positively selected. This breaks most pFDR procedures - we tried to implement some for the "Not so different" paper, and discovered this rather quickly.
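For reference, here is a sketch of the standard Benjamini-Hochberg step-up procedure (the classic FDR correction, not the pFDR variants we experimented with; the p-values below are made up for illustration). Its calibration leans on null p-values being uniform, which is the assumption that fails when almost no site is actually neutral.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject all hypotheses ranked <= k.
    Returns the indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            cutoff = rank
    return sorted(order[:cutoff])

# Toy alignment-like mix: most "sites" far from the null (tiny p-values
# from strong conservation/selection), a couple of borderline ones.
pvals = [1e-6, 1e-5, 1e-4, 0.003, 0.02, 0.04, 0.2, 0.6]
print(benjamini_hochberg(pvals))
```

The trouble in the sequence-analysis setting is that when essentially every site is non-null, the procedure's implicit estimate of the null fraction is meaningless, so the realized FDR no longer matches the nominal level.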
Hope this makes sense,
Sergei