Sergei
Significance level and Type I errors for Selection
Jul 11th, 2005 at 10:09am
 
I received the following query via e-mail (see the linked reference for details of the SLAC/FEL/REL methods for detecting positive selection).

Quote:
(1) Do you have a suggested value for the Bayes factor as a cutoff for inferring positively selected sites?


Very crudely (a rule-of-thumb kind of thing, for uninformative priors), 1/Bayes factor is comparable to the p-value from a classical hypothesis test. Thus a Bayes factor of 100 is highly significant (p ~ 0.01), while a Bayes factor of 20 is roughly p ~ 0.05.
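
If it helps, here is a minimal sketch of applying that rule of thumb as a cutoff (plain Python, not a Datamonkey feature; the per-site Bayes factors below are made up for illustration):

Code:
# Rule of thumb under uninformative priors: approximate p-value ~ 1/BF,
# so BF >= 20 roughly corresponds to p <= 0.05 and BF >= 100 to p <= 0.01.
def flag_selected_sites(bayes_factors, cutoff=20.0):
    """Return (site, BF, approximate p) for sites whose BF meets the cutoff."""
    return [(site, bf, 1.0 / bf)
            for site, bf in enumerate(bayes_factors, start=1)
            if bf >= cutoff]

example_bfs = [3.5, 20.0, 250.0, 7.2, 105.0]     # hypothetical, NOT real REL output
print(flag_selected_sites(example_bfs))          # BF >= 20  (~ p <= 0.05)
print(flag_selected_sites(example_bfs, 100.0))   # BF >= 100 (~ p <= 0.01)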

Quote:
(2) Have you found that small datasets (say, n < 10) are prone to Type I errors?
I have a number of datasets that could be sub-divided into smaller datasets.


The SLAC and FEL methods are conservative and are very unlikely to falsely identify sites under selection in small datasets. However, they also lack power for alignments with few sequences. REL, on the other hand, is quite likely to suffer from high Type I error rates for very small datasets, because there may not be enough data to fit the distribution of rates reliably. When you run REL locally, you can reduce Type I errors by using the following heuristic:

(1). Check for the presence of synonymous rate variation first; if it can't be found (by comparing the Dual and Nonsynonymous models using AIC or an LRT), use the Nonsynonymous model for inference.

(2). Use the fewest possible number of rate classes (2x2, 2x3, 3x2, 3x3, etc.), stopping once you get no further AIC improvement - the fewer rate classes there are, the fewer parameters are being fitted, and consequently the errors in the remaining model parameter estimates may be reduced (see the sketch after this list for steps 1 and 2).

(3). We are currently developing a parameter sampling-resampling scheme, which will automatically quantify the errors in parameter estimates (similar to the BEB method of Yang/Nielsen, but using a more rigorous sampling scheme) and suggest which 'selected' sites may be suspect. Stay tuned.
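
To make the bookkeeping in steps (1) and (2) concrete, here is a minimal sketch (plain Python, not HyPhy code); the log-likelihoods, parameter counts, and grid results are placeholders standing in for your actual REL model fits:

Code:
from scipy.stats import chi2   # only needed for the optional LRT

def aic(log_l, n_params):
    # Akaike Information Criterion: lower is better.
    return 2.0 * n_params - 2.0 * log_l

# Step (1): is there synonymous rate variation?  Compare the Dual model
# against the Nonsynonymous-only model (placeholder numbers, not real output).
dual   = {"logL": -12345.6, "params": 62}
nonsyn = {"logL": -12349.1, "params": 59}

lrt_stat = 2.0 * (dual["logL"] - nonsyn["logL"])
lrt_p    = chi2.sf(lrt_stat, df=dual["params"] - nonsyn["params"])
model    = ("Dual" if aic(dual["logL"], dual["params"]) < aic(nonsyn["logL"], nonsyn["params"])
            else "Nonsynonymous")

# Step (2): walk the rate-class grids from smallest up, stopping as soon as
# AIC no longer improves, so the smallest adequate grid is kept.
grid_fits = {                      # (syn classes, nonsyn classes) -> placeholder fit
    (2, 2): {"logL": -12352.0, "params": 55},
    (2, 3): {"logL": -12347.2, "params": 57},
    (3, 3): {"logL": -12346.9, "params": 60},
}
best_grid, best_aic = None, float("inf")
for grid in sorted(grid_fits):                  # fewest rate classes first
    a = aic(grid_fits[grid]["logL"], grid_fits[grid]["params"])
    if a < best_aic:
        best_grid, best_aic = grid, a
    else:
        break                                   # no further AIC improvement

print("LRT p =", round(lrt_p, 4), "| model:", model, "| rate classes:", best_grid)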

Cheers,
Sergei
Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego