YaBB - Yet another Bulletin Board
 
substitutions per site and positive selection
KBR
substitutions per site and positive selection
Jan 30th, 2007 at 8:42am
 
Hi,
I really need help. I do not know much about these things, and it is difficult for me to choose one method over another.
I have a dataset of 204 influenza HA sequences from 1999 up to today. I have used SLAC and FEL in HyPhy to predict positively selected sites. When they give different results, which should I believe? Should I publish the results from both methods? Is a threshold of 0.1 OK, or is it too strict?

Is there a good (most correct) way to estimate the global dN/dS ratio? Until now I have used the value that is given when I run the positive selection analysis.
Which option should I use if I would like to know the transition/transversion ratio?

I would also like to estimate the substitutions/site/year. Is this what is called the evolutionary rate? Does this imply a molecular clock? Which program can I use to calculate this?

Finally, I have a dataset of 19 sequences, also from 1999-2006, of some other influenza genes (internal genes). Is there any point in testing for positively selected sites on such a small dataset?
I hope you have time to help me.

Kindly
Karoline
 
Sergei
Re: substitutions per site and positive selection
Reply #1 - Jan 31st, 2007 at 9:51am
 
Dear Karoline,

For 204 sequences, 0.1 is probably too liberal. Use 0.05, and also take a look at [link unavailable].

Regarding different results from different methods, there could be several scenarios. If one of the methods is significant (e.g. 0.02) and the other is borderline (e.g. 0.06), then they aren't really disagreeing; the difficult cases are when the methods give diametrically opposed results (which should be rare for large data sets).
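
One way to make this comparison concrete is to sort sites by how the two p-values relate to the threshold. The sketch below is only illustrative: the function name, the 0.05 cutoff, the 0.10 "borderline" margin, and the example p-values are all assumptions, and it does not use HyPhy's output format or API.

```python
def compare_site_calls(p_slac, p_fel, alpha=0.05, borderline=0.10):
    """Classify one site given p-values from two tests.

    alpha and borderline are assumed cutoffs chosen for illustration only.
    """
    sig = [p <= alpha for p in (p_slac, p_fel)]
    if all(sig):
        return "both"
    if not any(sig):
        return "neither"
    # Exactly one method is significant: check how far off the other one is.
    other = p_fel if sig[0] else p_slac
    if other <= borderline:
        return "borderline"  # significant + borderline: not a real disagreement
    return "disagree"


# Hypothetical p-values (SLAC, FEL) for three sites; these are not real results.
examples = {"site A": (0.02, 0.06), "site B": (0.01, 0.03), "site C": (0.04, 0.60)}
for site, (p_slac, p_fel) in examples.items():
    print(site, "->", compare_site_calls(p_slac, p_fel))
```

Sites that come out as "disagree" are the ones worth inspecting by hand.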

In our experience, FEL is generally more accurate than SLAC, so I would go with it, and report SLAC results as a backup.

Global dN/dS ratio is fairly robust to the estimation method, so I would report the value from the SLAC page (with 95% confidence interval).

Rate estimation does imply a molecular clock. HyPhy can do it, so can TipDate and BEAST from the (former) members of the Oxford Evolutionary group. Take a look at DatedTipsMolecularClock analysis in HyPhy (it is under Molecular Clock in Standard Analyses).
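
To illustrate what a rate in substitutions/site/year means under a strict clock, here is a rough root-to-tip regression sketch. Note that this is a different and much cruder technique than the likelihood-based analyses in HyPhy, TipDate, or BEAST; the function name and all numbers are invented for illustration.

```python
def clock_rate(years, distances):
    """Least-squares slope of root-to-tip distance against sampling year.

    The slope is a crude estimate of substitutions/site/year. It is only
    meaningful if the sequences evolved at a roughly constant rate.
    """
    n = len(years)
    mean_t = sum(years) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(years, distances))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept


# Toy values (invented): sampling years and root-to-tip distances in subs/site.
years = [1999, 2001, 2003, 2005, 2006]
distances = [0.000, 0.010, 0.020, 0.030, 0.035]
rate, intercept = clock_rate(years, distances)
print(f"rate ~ {rate:.4f} substitutions/site/year")
```

If the points scatter widely around the fitted line, that is a hint that the constant-rate assumption does not hold for the data.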

You should be able to find strongly selected sites in 19 sequences; run all three methods on it; REL tends to have more power for this size.

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
KBR
Re: substitutions per site and positive selection
Reply #2 - Feb 8th, 2007 at 1:09am
 
Thanks Sergei,
I have now tried it out and it works fine except for one gene. I have 15 sequences, 300 bases long. I have run SLAC and FEL with a 0.1 threshold and REL with a Bayes factor threshold of 50. I get a result for REL that I do not trust: SLAC and FEL give no positively selected sites, while REL gives 4. Four sites in this gene is VERY unlikely. Is a Bayes factor of 50 too liberal? It worked fine for the other genes. Any suggestions? This gene is spliced, but I have done the measurements on each of the two coding frames in the gene.

I see that many publications report estimated nucleotide changes/site/year for influenza datasets. First: I have a suspicion that they do not calculate it based on likelihood or a molecular clock, but simply divide the overall number of transitions and transversions by the number of nucleotides, for each year. What do you think about that? That is not an acceptable way to do it for a publication, right?
Second: I can see in my dataset that the evolution of the genes is not constant, but is highly influenced by reassortments and the introduction of new strains from elsewhere. Given this, it cannot be right to assume a molecular clock to calculate substitutions/site/year, right? Would I be better off not mentioning anything about substitution rates in my publication, when I do not believe that there has been a constant rate of evolution?
 
Sergei
Re: substitutions per site and positive selection
Reply #3 - Feb 8th, 2007 at 4:37pm
 
Dear Karoline,

For your problem case, check to see if SLAC and FEL are borderline significant (e.g. 0.15) for the same sites. REL tends to be the most liberal of the tests, but it rarely produces a large number of false positives for > 10 sequences, unless there is something strange going on (e.g. very low sequence divergence). What do you mean by two reading frames? Is it a dual coding gene (the tail of PB2)?
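
A quick way to check for very low sequence divergence is the mean pairwise p-distance of the alignment. The sketch below assumes the sequences are already aligned and of equal length; the function names and the toy sequences are invented for illustration and are not part of HyPhy.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping positions with gaps or ambiguity codes."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_pairwise_distance(seqs):
    """Average p-distance over all pairs of sequences in the alignment."""
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists)


# Toy aligned sequences (invented, not real influenza data).
alignment = ["ATGGCAAGT", "ATGGCAAGC", "ATGGCGAGT"]
print(f"mean pairwise p-distance: {mean_pairwise_distance(alignment):.3f}")
```

A value close to zero means there are very few substitutions for any site-by-site method to work with.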

In terms of substitution rates, I think you are quite right not to report estimates which are based on a molecular clock (i.e. constant rates over time), because they may have little to do with real substitution rates when that assumption is badly violated.

Cheers,
Sergei