Small data set (Read 2168 times)
mattn
Small data set
Jul 11th, 2007 at 6:30am
 
Hi!

I am new to HyPhy, so please forgive me for these naïve questions.  My dataset is for a single gene and includes 6 allele sequences from a single species.  The gene is 676 codons long and has a very polymorphic exon in its extracellular domain.  I am interested in determining whether there is evidence for positive/negative selection in this gene.

1.      Is a dataset of 6 allele sequences enough to say anything?  I realize that the answer may depend on the data itself (i.e., how many variable sites, how much variation, etc.)

2.      Which method (FEL, SLAC, REL) is best for analyzing small data sets in general?  The impression I got from reading the "Not So Different..." MBE paper was that REL is best, but not as conservative as the other methods.  What is the minimum number of sequences you would use?

3.      For the above methods, is it possible to calculate the power to detect pos/neg selection on a site-by-site basis? 

Thank you so much for your time,

Matt
 
Sergei
Re: Small data set
Reply #1 - Jul 11th, 2007 at 10:49am
 
Dear Matt,

[quote]
Is a dataset of 6 allele sequences enough to say anything?  I realize that the answer may depend on the data itself (i.e., how many variable sites, how much variation, etc.)
[/quote]

Generally speaking, a 6-sequence alignment carries too little information to detect selection at an individual site. There may still be enough to test whether some proportion of sites in the gene is under selection, just not to say which individual sites those are. If the sequences are really divergent, you may also need to worry about alignment quality in variable regions (this depends on the gene, of course) and about saturation (i.e., where the rates are basically infinite).
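
As a rough way to gauge how much variation a small alignment actually holds before running any selection test, one could count the codon columns that differ between sequences. The following is a minimal Python sketch, not part of HyPhy: the function name `count_variable_codons` and the toy sequences are hypothetical, and it assumes the sequences are already aligned, in frame, and of equal length. Gap characters are treated as just another state, so gapped columns will count as variable.

```python
def count_variable_codons(seqs):
    """Return the 0-based indices of codon columns where the sequences differ."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")

    variable = []
    for start in range(0, length, 3):
        # Collect the distinct codons seen in this column (case-insensitive).
        codons = {s[start:start + 3].upper() for s in seqs}
        if len(codons) > 1:
            variable.append(start // 3)
    return variable


# Toy example (hypothetical data): three alleles, three codons each.
alleles = ["ATGGCTAAA", "ATGGCCAAA", "ATGGCTAAG"]
variable_sites = count_variable_codons(alleles)
print(f"{len(variable_sites)} of {len(alleles[0]) // 3} codons vary")
```

A count like this says nothing about selection by itself; it only indicates how many sites could possibly contribute signal to a site-by-site analysis.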

[quote]
Which method (FEL, SLAC, REL) is best for analyzing small data sets in general?  The impression I got from reading the "Not So Different..." MBE paper was that REL is best, but not as conservative as the other methods.  What is the minimum number of sequences you would use?
[/quote]

Run REL - also take a look at the J Mol Evol paper (http://www.springerlink.com/content/x471k2187q966135/) for how to run REL tests when false positive rates are a concern.

[quote]
For the above methods, is it possible to calculate the power to detect pos/neg selection on a site-by-site basis?
[/quote]

Again, the JME paper talks about that to an extent.

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
mattn
Re: Small data set
Reply #2 - Jul 12th, 2007 at 5:39am
 
Dear Sergei,

Thanks so much for your reply.  I will read the article you suggested.

BTW, I think it's great that you maintain this forum for users of HyPhy.  It's a great resource!

Matt