When to trust GA analysis
Sarah
Ex Member


When to trust GA analysis
Oct 25th, 2006 at 9:08am
 
I have a broad question.

Under what conditions can we trust the results of the GA branch selection analysis?

With PAML, one can test whether nested hypotheses are significant improvements over one another, and there are also indications of when the inferences are weak (e.g., an insanely high dN/dS value).
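The nested comparison mentioned above is usually done with a likelihood ratio test. Below is a minimal sketch, assuming you already have the maximized log-likelihoods of the two nested models and know how many extra free parameters the alternative adds. The function names are illustrative, not part of PAML. The log-likelihoods in the usage line are made up. The sketch uses only the standard library, computing the chi-squared tail probability for integer degrees of freedom. It ignores boundary cases (e.g. a null that fixes a parameter at the edge of its allowed range), for which a mixture of chi-squared distributions is often used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable
    with a positive integer number of degrees of freedom `df`."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 (erfc) and step up two df at a time using
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    total = math.erfc(math.sqrt(y))
    a = 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total


def likelihood_ratio_test(loglik_null: float, loglik_alt: float, extra_params: int):
    """Return (LRT statistic, p-value) for a null model nested in the alternative."""
    # Clamp at zero: a well-optimized alternative cannot fit worse than its nested null.
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(loglik_null=-1000.0, loglik_alt=-997.0, extra_params=1)
print(f"LRT = {stat:.2f}, p = {p:.4f}")
```

As Sergei points out in his reply, a small p-value here only says the alternative beats the null. It says nothing about models outside the nested pair.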

Are there any papers on this topic? Could you spell out what to look for, or list cases in which GA should perform poorly?

Thanks.

Sarah
 
Sergei
YaBB Administrator


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: When to trust GA analysis
Reply #1 - Oct 25th, 2006 at 10:29am
 
Dear Sarah,

'Trust' in statistics is a dangerous concept:)
Nested hypotheses are good when they adequately represent all plausible explanations of a phenomenon. Otherwise, one may find that H_A is better than H_0, when in reality both are wrong and the correct model is H_1, which is nested neither in H_0 nor in H_A.

There are several diagnostics for GA results:

1). Repeatability of runs. If multiple runs return very similar results, both in terms of the top N models found and in terms of model-averaged inference, then convergence has likely been achieved. Otherwise, try tightening the convergence criterion or increasing the population size.

2). Size of confidence sets. If the 95% confidence sets are very large (say, more than 30% of the total number of models), then the model being considered is probably too complex (e.g. too many rate classes), and only model-averaged inference should be trusted.

3). AIC-based model selection has an analogue of p-values: evidence ratios. To assess whether model 1 is much better than model 2, compute the Akaike weight of each and take their ratio. A sufficiently large ratio (e.g. 20) can serve as evidence that model 1 fits much better than model 2. If the models found by the GA have high evidence ratios relative to the 'extreme' models (e.g. single ratio, free ratio, or a priori), then we can have more confidence in the GA.

4). Sanity checks. For example, when you run GABranch, you can always compare its output with that of a local model (a separate dN/dS for every branch) and see whether the results point in the same direction.

For a very readable book on model selection issues, I would recommend [link unavailable].

Cheers,
Sergei
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Sarah
YaBB Newbies



Posts: 47
Re: When to trust GA analysis
Reply #2 - Oct 25th, 2006 at 10:40am
 
Yes, 'trust' is too... trusting. I suppose I'm looking for more indicators of confidence, and I don't have much experience with ML (and obviously didn't write the algorithm!).

I'm going to check out the book you recommended.

Thanks for listing the diagnostics--they are helpful.

Sarah
 
Sergei
Re: When to trust GA analysis
Reply #3 - Oct 25th, 2006 at 1:08pm
 
Dear Sarah,

Good luck! I like this quote from Edwards' 'Likelihood':

All our likelihood arguments are conditional on particular probability models: in a sense the model itself is a nuisance parameter. We would like to argue without it, but cannot.

Isn't statistical inference fun?

I was actually going to modify the web implementation of GA Branch to run faster, use a tweaked search algorithm (from our more recent papers), and automatically determine the appropriate number of rate classes. Stay tuned...

Cheers,
Sergei
 
Sergei
Re: When to trust GA analysis
Reply #4 - Nov 1st, 2006 at 7:04pm
 
Dear Sarah,

I just finished updating the GA Branch back-end scripts; they will now run about 10-50 times faster, depending on the data, and will also automatically cycle through the number of rate classes, from 2 up to a maximum of 10.
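Below is a minimal sketch of what cycling through the number of rate classes can look like. It assumes each candidate is scored with the small-sample corrected AIC (AICc) and that the lowest score wins. The post does not say which criterion or stopping rule the updated back end actually uses. The `fit` callback is hypothetical: it stands in for the real model fitting. What to pass as `sample_size` is likewise left to the caller.

```python
import math


def aicc(log_likelihood, n_params, sample_size):
    """Small-sample corrected AIC (lower is better)."""
    if sample_size <= n_params + 1:
        raise ValueError("AICc needs sample_size > n_params + 1")
    aic = 2.0 * n_params - 2.0 * log_likelihood
    return aic + 2.0 * n_params * (n_params + 1) / (sample_size - n_params - 1)


def choose_rate_classes(fit, sample_size, k_min=2, k_max=10):
    """Fit models with k_min..k_max rate classes and keep the one with the
    lowest AICc. `fit(k)` is a hypothetical callback that returns
    (maximized log-likelihood, number of free parameters) for k classes."""
    scores = {}
    for k in range(k_min, k_max + 1):
        log_l, n_params = fit(k)
        scores[k] = aicc(log_l, n_params, sample_size)
    # min() returns the first k with the lowest score, so ties favour fewer classes.
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Returning every score, not just the winner, lets you check how decisive the choice was, for example with the evidence-ratio idea from diagnostic 3.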

Cheers,
Sergei
 
Sarah
Ex Member


Re: When to trust GA analysis
Reply #5 - Nov 1st, 2006 at 7:06pm
 
Awesome! Thanks! I look forward to trying it out.

Sarah