GA analysis: dN/dS = inf (Read 3860 times)
Sarah
YaBB Newbies
Posts: 47
GA analysis: dN/dS = inf
Jan 16th, 2007 at 1:52pm
 
I ran the GA selection analysis on ~200 sequences that were generated under high positive selection pressure in simulation (JC69 model). The analysis ran for 3 weeks on 10 processors (but pop size = 20; I made a mistake specifying CPUs) before it was killed. I looked at the output file and saw that the 2-rate class model had the lowest AIC. One rate was between 0 and 1 and the other had gone to 10^(5 or 6) (some big power of 10, can't remember which).

Most of the codons have finite dN/dS, as calculated in FEL.

Under what circumstances should I expect the GA analysis to fail if dN/dS = inf on some branches? Is it worth rerunning the analysis?

For comparison, I ran the analysis on ~200 real, positively selected sequences (012345 model), and the run finished after 3 days on 28 processors. It again settled on the 2-rate class model, with one rate just over one. I believe this tree has some dN/dS = inf branches.

At what frequencies of dN/dS = inf does this method fail?

Would greatly appreciate any insights or suggestions.

Thanks,
Sarah

 
Sergei
YaBB Administrator
Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: GA analysis: dN/dS = inf
Reply #1 - Jan 16th, 2007 at 7:03pm
 
Dear Sarah,

Infinite ratios are typically an artifact of low-divergence sequences, where there are only a few substitutions per branch and dS is therefore inferred to be 0.
An infinite ratio is technically not a failure of the method, but rather a consequence of undersampled data.

Another possibility is that your sequences are too short relative to the number of sequences. As set up by default, c-AIC model selection requires at least 2N+14 sites for N sequences (because the number of model parameters, including frequencies, must be less than the total number of sites less one). If your sequences are shorter than that, inference can become meaningless. I can modify the analysis to count the number of samples more liberally (as the number of characters in the alignment) to avoid this issue.
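
To make the length requirement concrete, here is a minimal Python sketch. It assumes the standard small-sample AIC correction, whose last term has n - k - 1 in the denominator (hence the need for more samples than parameters plus one), and it takes the 2N+14 threshold from the paragraph above at face value. The function names `aicc`, `min_sites`, and `long_enough` are hypothetical and are not part of the GA analysis code.

```python
def aicc(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Standard small-sample corrected AIC (assumed form, not taken from the GA code).

    The correction term divides by (n_samples - n_params - 1), so the score
    is only defined when n_params < n_samples - 1.
    """
    denom = n_samples - n_params - 1
    if denom <= 0:
        raise ValueError("need n_params < n_samples - 1 for the correction term")
    return -2.0 * log_likelihood + 2.0 * n_params + 2.0 * n_params * (n_params + 1) / denom


def min_sites(n_sequences: int) -> int:
    """Minimum alignment length under the 2N + 14 rule quoted in the thread."""
    return 2 * n_sequences + 14


def long_enough(n_sequences: int, n_sites: int) -> bool:
    """True if the alignment meets the quoted minimum length."""
    return n_sites >= min_sites(n_sequences)
```

For an alignment of 200 sequences, `min_sites(200)` gives 414, so anything shorter than that would fall below the default requirement described above.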

HTH,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Sarah
Re: GA analysis: dN/dS = inf
Reply #2 - Jan 17th, 2007 at 11:49am
 
Thanks, Sergei.

I'll start by trying to simulate with longer sequences--I'm currently near the minimum.

You imply the problem is deeper, though. Do you mean that if any branch has dS = 0, the GA analysis should not work? Is there no way to accommodate an "inf" rate class?

Sarah
 
Sergei
Re: GA analysis: dN/dS = inf
Reply #3 - Jan 17th, 2007 at 12:13pm
 
Dear Sarah,

An 'inf' estimate is not really a failure of the method. There could be two possibilities:

1). Imagine that on some branch the estimate of dN is 100 but that of dS is 0. The numerical estimate of the ratio is 100/0 = inf. Clearly, '0' is not biological, and simply suggests that we haven't sampled enough data to find the small but non-zero estimate of dS. This is easier to see if one computes confidence intervals (e.g. by profile likelihood) on dS and dN. Suppose, for example, that the 95% CI for dS is (0-0.1). In that case you can crudely approximate the CI for the ratio as (1000-Inf), and the estimate of inf is effectively correct (i.e. a very large dN/dS).

2). Now imagine that the estimates are dN = 0.1 and dS = 0 and confidence intervals are 0.05-0.15 and 0-0.1, respectively. Now the point estimate of dN/dS = inf can be very wrong indeed, because the ratio can credibly lie in 0.5-Inf.

In the latter case there is not much one can do with any method - the variance of dN/dS estimates will be just too high.
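
The crude interval arithmetic in the two cases above can be sketched in Python. This is only the back-of-the-envelope bound used in this post (smallest dN over largest dS, largest dN over smallest dS), not a proper confidence interval for a ratio; the function name `crude_ratio_interval` is hypothetical.

```python
import math


def crude_ratio_interval(dn_ci, ds_ci):
    """Crude bounds on dN/dS from separate intervals for dN and dS.

    dn_ci and ds_ci are (low, high) pairs. A lower dS bound of 0 makes
    the upper bound of the ratio infinite.
    """
    dn_lo, dn_hi = dn_ci
    ds_lo, ds_hi = ds_ci
    if ds_hi <= 0:
        raise ValueError("upper bound on dS must be positive")
    lower = dn_lo / ds_hi
    upper = dn_hi / ds_lo if ds_lo > 0 else math.inf
    return lower, upper


# Case 1: dN taken as its point estimate of 100, dS in (0, 0.1)
case1 = crude_ratio_interval((100.0, 100.0), (0.0, 0.1))
# Case 2: dN in (0.05, 0.15), dS in (0, 0.1)
case2 = crude_ratio_interval((0.05, 0.15), (0.0, 0.1))
```

In case 1 the lower bound is about 1000, so "inf" and "very large" tell the same story; in case 2 the lower bound is 0.5, so the interval spans both purifying and positive selection.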

Hope this helps,
Sergei
 
Sarah
Re: GA analysis: dN/dS = inf
Reply #4 - Jan 22nd, 2007 at 9:06am
 
Dear Sergei,

What I think I was hoping for was that the GA might create an inf category for branches for which dS = 0 (effectively 0--I know it's not strictly biologically possible). I assume from the long runtime that there's no way these kinds of branches might be accommodated. Is that right?

The alternatives appear to be to:

(i) Increase dS to measurable quantities with smaller confidence intervals by adding a neutrally evolving, dummy sequence to the current sequences.

(ii) Increase branch lengths by sampling more coarsely over the tree, i.e., discarding sequences.

Thanks again,
Sarah
 
Sergei
Re: GA analysis: dN/dS = inf
Reply #5 - Jan 22nd, 2007 at 12:25pm
 
Dear Sarah,

The best way around the dS = 0 problem is to re-parameterize the model to use separate alpha (dS) and beta (dN) rates on each branch, instead of defining beta = omega*alpha, which runs into problems when alpha = 0 and beta > 0. It's a bit tedious to do (because the representation of the problem for the GA changes somewhat), but it can be done.
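
A small Python sketch of why the two parameterizations differ. It is purely illustrative and is not the GA or HyPhy implementation; the class names `CoupledBranch` and `DecoupledBranch` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CoupledBranch:
    """beta is derived as omega * alpha, as in the original parameterization."""
    alpha: float  # synonymous rate (dS)
    omega: float  # dN/dS ratio

    @property
    def beta(self) -> float:
        return self.omega * self.alpha


@dataclass
class DecoupledBranch:
    """alpha and beta are free parameters; omega is only a derived summary."""
    alpha: float  # synonymous rate (dS)
    beta: float   # non-synonymous rate (dN)

    @property
    def omega(self) -> float:
        if self.alpha > 0:
            return self.beta / self.alpha
        return math.inf if self.beta > 0 else math.nan


# With alpha = 0, no finite omega can produce a positive beta ...
stuck = CoupledBranch(alpha=0.0, omega=1e6)
# ... whereas separate rates represent the same branch directly.
free = DecoupledBranch(alpha=0.0, beta=0.1)
```

Here `stuck.beta` is 0 no matter how large omega grows, which is consistent with a rate class drifting toward a huge power of 10, while `free` stores beta = 0.1 directly and reports an infinite ratio only as a derived quantity.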

Cheers,
Sergei
 
Sarah
Re: GA analysis: dN/dS = inf
Reply #6 - Jan 30th, 2007 at 8:45am
 
Thanks, Sergei! I've more pressing challenges at the moment :P but will keep this solution in mind.

Sarah