HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1171453849

Message started by Mike on Feb 14th, 2007 at 3:50am

Title: maximum tree length/sequence divergence
Post by Mike on Feb 14th, 2007 at 3:50am
Greetings - I am studying the evolution of several families of G protein-coupled receptors, and I would like to identify positively selected sites and determine whether lineages (subtrees) show evidence of positive selection.  One of the families in particular is quite diverse, and parts of the alignment are 'questionable.'  I was eager to try out Datamonkey, so I plugged in the alignment and ran the SLAC analysis.  Out of 1701 sites, this analysis identified 419 positively selected sites and 45 under negative selection.  However, the analysis results also report:

Quote:
Tree length (expected substitutions per site) = 12848.4

Warning: Unusually long tree. Perhaps the sequences were misaligned?

Global dN/dS = 0.5623


There are ~100 sequences in the file.  I know the alignment still needs work, but this raises the question of how close the sequences need to be in order to get meaningful results from this and other analyses.  I can remove distant and unalignable sequences from the file, and divide it into clusters of sequences that align well within a cluster but not between clusters. Is there some optimum or maximum number of substitutions per site that I should try to achieve?
Any comments or suggestions are welcome.
Mike
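
For context on the figure in the quoted report: tree length is usually computed as the sum of all branch lengths in the tree. The following is a minimal Python sketch of that arithmetic, assuming the tree is available as a Newick string with branch lengths and that it is fully resolved and unrooted (so n sequences give 2n - 3 branches). The function names are hypothetical and are not part of HyPhy or Datamonkey.

```python
import re

# Matches a branch length following a colon in a Newick string, e.g. ":0.125".
# Assumption: taxon labels are unquoted and contain no colons.
_BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def tree_length(newick):
    """Sum all branch lengths found in a Newick string."""
    return sum(float(value) for value in _BRANCH_LENGTH.findall(newick))


def mean_branch_length(total_length, n_sequences):
    """Average branch length, assuming a fully resolved unrooted tree,
    which has 2n - 3 branches for n sequences."""
    n_branches = 2 * n_sequences - 3
    if n_branches <= 0:
        raise ValueError("need at least 2 sequences")
    return total_length / n_branches


if __name__ == "__main__":
    # Toy tree for illustration only (not data from this thread).
    print(tree_length("((A:0.1,B:0.2):0.05,C:0.3);"))
    # Figures from the quoted report, with "~100 sequences" taken as 100.
    print(mean_branch_length(12848.4, 100))
```

Under those assumptions, a tree length of 12848.4 spread over about 197 branches works out to roughly 65 expected substitutions per site on an average branch, which is why the report flags the tree as unusually long.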

Title: Re: maximum tree length/sequence divergence
Post by Sergei on Feb 14th, 2007 at 10:45am
Dear Mike,

419/1701 selected sites sounds unrealistic. Try stripping all of the sites that are very gappy (i.e. sites where only a few sequences have character data), and also choose the 'Resolve ambiguities' option on the analysis setup page.

Cheers,
Sergei

Title: Re: maximum tree length/sequence divergence
Post by Mike on Feb 14th, 2007 at 12:58pm
Hi Sergei -

I realized that deleting columns from the alignment was going to be difficult to do quickly without disrupting the reading frame, so instead I deleted about 10 sequences that were causing the most gaps.  These sequences appear to have large insertions that may be incorrectly annotated exons. After removing these sequences I have 93 sequences and 734 codons. The SLAC analysis with the 'Resolve ambiguities' option now reports 619 positively selected sites.

Thanks
Mike
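
Stripping gappy columns without disrupting the reading frame, as discussed above, can be done by dropping whole codons (three columns at a time) rather than single columns. The following is a minimal Python sketch, assuming the alignment is already in frame, is held in memory as a name-to-sequence dictionary, and uses '-' or '?' for missing data. The function name and the 50% coverage threshold are illustrative assumptions, not a HyPhy or Datamonkey feature.

```python
def strip_gappy_codons(alignment, min_coverage=0.5, gap_chars="-?"):
    """Drop codon columns in which too few sequences have data.

    alignment    -- dict mapping sequence name to an aligned, in-frame
                    nucleotide string (all the same length, multiple of 3)
    min_coverage -- minimum fraction of sequences that must have a codon
                    free of gap characters for the column to be kept
    """
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences are not all the same length")
    if length % 3 != 0:
        raise ValueError("alignment length is not a multiple of 3")

    kept = {name: [] for name in names}
    for start in range(0, length, 3):
        codons = [alignment[name][start:start + 3] for name in names]
        # A codon counts as covered only if none of its three positions is a gap.
        covered = sum(
            1 for codon in codons
            if not any(ch in gap_chars for ch in codon)
        )
        if covered / len(names) >= min_coverage:
            for name, codon in zip(names, codons):
                kept[name].append(codon)
    return {name: "".join(parts) for name, parts in kept.items()}


if __name__ == "__main__":
    # Toy alignment for illustration only: the middle codon is present
    # in just one of three sequences and is removed.
    toy = {"a": "ATG---AAA", "b": "ATG---AAC", "c": "ATGCCCAAG"}
    print(strip_gappy_codons(toy))
```

Because columns are removed three at a time on codon boundaries, the remaining codons stay in frame. The threshold is a judgment call and would need tuning for a given data set.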

Title: Re: maximum tree length/sequence divergence
Post by Sergei on Feb 14th, 2007 at 2:25pm
Dear Mike,

619 selected sites still sounds really fishy. Can you send me (spond at ucsd dot edu) your job ID so I can look at it and see what is going on?

Cheers,
Sergei
