Dear Sarah,
'Trust' in statistics is a dangerous concept :)
Nested hypotheses work well when they represent all plausible explanations of a phenomenon. Otherwise, one may find that H_A is better than H_0, while in reality both are wrong and the correct model is some H_1, nested neither in H_0 nor in H_A.
There are several diagnostics for GA results:
1). Repeatability of runs. If multiple runs return very similar results, both in terms of the top N models found and in terms of model-averaged inference, then convergence has likely been achieved. Otherwise, try tightening the convergence criterion or increasing the population size. (A quick way to quantify run-to-run agreement is shown in the first sketch after this list.)
2). Size of confidence sets. If the 95% confidence sets are very large (say, > 30% of the total number of models), then the model being considered is probably too complex (e.g. too many rate classes), and only model-averaged inference should be trusted. (The second sketch after this list shows how a confidence set is assembled from Akaike weights.)
3). AIC-based model selection has an analogue of p-values: evidence ratios. To assess whether model 1 fits much better than model 2, you can compute the Akaike weight of each and take their ratio. A sufficiently large ratio (e.g. 20) can serve as evidence that model 1 has a much better fit than model 2. If the models found by the GA have high evidence ratios compared to the 'extreme' models (e.g. single ratio, free ratio, or a priori), then we can have more confidence in the GA. (See the second sketch after this list for the arithmetic.)
4). Sanity checks. For example, when you are running GABranch, you can always compare its output with that of a local model (a separate dN/dS for every branch) and see if the results point in the same direction.
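
To make diagnostic (1) concrete, here is a minimal Python sketch of one way to quantify run-to-run agreement. The model identifiers are made up for illustration; in practice you would read the top-ranked models out of each GA run's output:

def top_n_overlap(run_a, run_b, n=10):
    """Fraction of the top-n models shared by two GA runs."""
    return len(set(run_a[:n]) & set(run_b[:n])) / float(n)

# Hypothetical model identifiers, ranked best-first by each run
run1 = ["m07", "m12", "m03", "m44", "m09"]
run2 = ["m12", "m07", "m03", "m31", "m09"]

print("Top-5 overlap: %.0f%%" % (100 * top_n_overlap(run1, run2, n=5)))

An overlap close to 100%, together with similar model-averaged estimates, is a good sign that the runs have converged.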
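For diagnostics (2) and (3), the Akaike weights, the evidence ratio, and the size of the 95% confidence set can all be computed directly from the raw AIC scores. A minimal sketch, assuming a made-up list of AIC values:

import math

def akaike_weights(aic_scores):
    """Convert AIC scores into Akaike weights."""
    best = min(aic_scores)
    # Relative likelihood of each model: exp(-delta_i / 2),
    # where delta_i = AIC_i - min(AIC)
    rel = [math.exp(-(a - best) / 2.0) for a in aic_scores]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AIC scores, for illustration only
aics = [1000.0, 1002.5, 1006.0, 1011.3, 1012.8]
weights = akaike_weights(aics)

# Evidence ratio of model 1 vs model 2: w_1 / w_2,
# which equals exp((AIC_2 - AIC_1) / 2)
print("Evidence ratio (model 1 vs model 2): %.2f" % (weights[0] / weights[1]))

# 95% confidence set: the smallest set of top-ranked models
# whose cumulative weight reaches 0.95
cum, size = 0.0, 0
for w in sorted(weights, reverse=True):
    cum += w
    size += 1
    if cum >= 0.95:
        break
print("95%% confidence set contains %d of %d models" % (size, len(aics)))

Note that the evidence ratio between two models depends only on their AIC difference, so it can be read off directly from the score table.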
For a very readable book on model selection issues, I would recommend …
Cheers,
Sergei