redundant sequences in one cluster and datamonkey
Mete
May 31st, 2010 at 2:12pm
 
Dear Sergei,
My question concerns whether my data are suitable for use with Datamonkey.

I have sequenced environmental samples for a specific gene. In the phylogenetic tree, the sequences cluster with one of two separate species. However, one cluster contains approximately 30 unique sequences (judging from the tree), while the other contains only 6. Diversity within each cluster is very low (over 99% identity), while identity between clusters is ~96%. Would it be wrong to use the whole data set for selection analyses, given that one cluster is represented by many more sequences? Should I analyze each cluster separately? (I assume the low divergence would be a problem.) Or should I use a subset of sequences from the large cluster?


Thank you very much for your answer,

Best,

Mete
Sergei
Reply #1 - Jun 1st, 2010 at 11:40am
 
Hi Mete,

Uneven sampling could indeed bias your analysis: imagine that a site is under negative selection in the clade with 30 sequences, but is under positive selection in the clade with 6 sequences. The former signal will probably dominate and show the site as being under negative selection. That said, all site-by-site selection analyses assume that there is no variation in selection strength between lineages.
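As a rough illustration of why the larger clade's signal tends to dominate, here is a toy sketch. It assumes a pooled per-site estimate behaves like a sequence-count-weighted average of the per-clade dN/dS values. That is a deliberate simplification and not how FEL or any other Datamonkey method actually fits the data; the function name and the dN/dS values (0.2 and 3.0) are hypothetical.

```python
# Toy illustration only: a sequence-count-weighted average is NOT how FEL
# or any other site-by-site method estimates dN/dS.

def pooled_omega(clades):
    """Return the sequence-count-weighted mean dN/dS across clades.

    clades: list of (number_of_sequences, per_site_dN_dS) tuples.
    """
    total = sum(n for n, _ in clades)
    return sum(n * omega for n, omega in clades) / total

# Hypothetical per-site values, assumed for illustration (not real data):
# 30 sequences under negative selection, 6 under positive selection.
clades = [(30, 0.2), (6, 3.0)]
pooled = pooled_omega(clades)

print(f"pooled dN/dS = {pooled:.2f}")  # 0.67, i.e. below 1
```

Under these assumed numbers the pooled value stays below 1, so the site would look negatively selected even though the small clade is above 1.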

I would run your data through GA Branch to look for lineage dN/dS variation, and assuming there is not too much (or that it is not localized to some parts of your subtree), run the FEL analysis.

Sergei
Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
Mete
Reply #2 - Jun 5th, 2010 at 4:15pm
 
Thanks a lot.