maximum tree length/sequence divergence
Mike (Spain)
maximum tree length/sequence divergence
Feb 14th, 2007 at 3:50am
 
Greetings - I am studying the evolution of several families of G protein-coupled receptors, and I would like to identify positively selected sites and determine whether particular lineages (subtrees) show evidence of positive selection. One of the families, in particular, is quite diverse, and parts of its alignment are 'questionable.' I was eager to try out Datamonkey, so I uploaded the alignment and ran the SLAC analysis. Out of 1701 sites, this analysis identified 419 positively selected sites and 45 under negative selection. However, the analysis results also report:
Quote:
Tree length (expected substitutions per site) = 12848.4

Warning: Unusually long tree. Perhaps the sequences were misaligned?

Global dN/dS = 0.5623


There are ~100 sequences in the file. Well, I know the alignment still needs work, but this raises the question of how closely related the sequences need to be to get meaningful results from this and other analyses. I can remove distant and unalignable sequences from the file and divide the rest into clusters of sequences that align well within a cluster but not between clusters. Is there an optimum or maximum number of substitutions per site that I should aim for?
Any comments or suggestions are welcome.
Mike
 
Sergei (UCSD)
Re: maximum tree length/sequence divergence
Reply #1 - Feb 14th, 2007 at 10:45am
 
Dear Mike,

419/1701 selected sites sounds unrealistic. Try stripping all of the sites that are very gappy (i.e., where only a few sequences have character data), and also choose the 'Resolve ambiguities' option on the analysis setup page.
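Stripping gappy columns from a codon alignment is easiest in whole-codon units: removing all three nucleotides of a codon together cannot shift the reading frame. Below is a minimal Python sketch, assuming the alignment is already loaded as a dict of equal-length, in-frame nucleotide strings. The function name `strip_gappy_codons`, the set of "no data" characters and the 50% threshold are hypothetical choices for illustration, not Datamonkey settings.

```python
from typing import Dict

# Assumption: these characters mean "no character data" at a position.
NO_DATA = set("-?.")


def strip_gappy_codons(alignment: Dict[str, str], min_fraction: float = 0.5) -> Dict[str, str]:
    """Drop codon columns in which fewer than `min_fraction` of the sequences
    have any nucleotide data. Whole codons are kept or dropped together,
    so every sequence stays in its reading frame."""
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")

    n_seqs = len(alignment)
    kept_starts = []
    for start in range(0, length, 3):
        # Count sequences with at least one non-gap nucleotide in this codon.
        with_data = sum(
            1
            for seq in alignment.values()
            if any(ch not in NO_DATA for ch in seq[start:start + 3])
        )
        if with_data >= min_fraction * n_seqs:
            kept_starts.append(start)

    return {
        name: "".join(seq[s:s + 3] for s in kept_starts)
        for name, seq in alignment.items()
    }


aln = {"a": "ATG---AAA", "b": "ATG---AAC", "c": "ATGCCCAAG"}
print(strip_gappy_codons(aln))
# {'a': 'ATGAAA', 'b': 'ATGAAC', 'c': 'ATGAAG'}
```

The middle codon is present in only one of three sequences, so it is dropped from every sequence at once; the threshold is a judgment call and would need tuning for a real alignment.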

Cheers,
Sergei
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Mike
Re: maximum tree length/sequence divergence
Reply #2 - Feb 14th, 2007 at 12:58pm
 
Hi Sergei -

I realized that deleting columns from the alignment would be difficult to do quickly without disrupting the reading frame, so instead I deleted about 10 sequences that were causing the most gaps. These sequences appear to have large insertions that may be incorrectly annotated exons. After removing them I have 93 sequences and 734 codons. The SLAC analysis with the 'Resolve ambiguities' option now reports 619 positively selected sites.
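Finding the sequences that cause the most gaps can also be scripted. One simple heuristic, which is an assumption of this sketch and not something Datamonkey does, is to count for each sequence how many alignment columns it occupies where most other sequences have a gap; sequences carrying large insertions rise to the top. The function name `rank_gap_causing_sequences` and the 50% cut-off are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: these characters mean "no character data" at a position.
NO_DATA = set("-?.")


def rank_gap_causing_sequences(
    alignment: Dict[str, str], min_fraction: float = 0.5
) -> List[Tuple[str, int]]:
    """Return (name, count) pairs sorted by how many 'sparse' columns each
    sequence occupies. A column is sparse when fewer than `min_fraction`
    of the sequences have data in it, i.e. most sequences are gapped there.
    Assumes all sequences have the same length."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    length = len(next(iter(alignment.values())))
    counts = {name: 0 for name in alignment}
    for col in range(length):
        occupants = [name for name, seq in alignment.items() if seq[col] not in NO_DATA]
        if len(occupants) < min_fraction * n_seqs:
            # Only a minority has data here: charge the column to them.
            for name in occupants:
                counts[name] += 1
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


aln = {"a": "ATG---AAA", "b": "ATG---AAC", "c": "ATGCCCAAG"}
print(rank_gap_causing_sequences(aln))
# [('c', 3), ('a', 0), ('b', 0)]
```

Sequences near the top of the list are candidates for re-checking (for example, for mis-annotated exons) before they are dropped.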

Thanks
Mike
 
Sergei
Re: maximum tree length/sequence divergence
Reply #3 - Feb 14th, 2007 at 2:25pm
 
Dear Mike,

619 selected sites still sounds really fishy. Can you send me (spond at ucsd dot edu) your job ID so I can look at it and see what is going on?

Cheers,
Sergei