HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1161792492

Message started by Sarah on Oct 25th, 2006 at 9:08am

Title: When to trust GA analysis
Post by Sarah on Oct 25th, 2006 at 9:08am
I have a broad question.

Under what conditions can we trust the results of the GA branch selection analysis?

With PAML one can infer whether nested hypotheses are significant improvements over one another, and there are also indications (e.g., whether the dN/dS value is insanely high) if the inferences are weak.

Are there any papers on this topic? Could you spell out what to look for, or list cases in which GA should perform poorly?

Thanks.

Sarah

Title: Re: When to trust GA analysis
Post by Sergei on Oct 25th, 2006 at 10:29am
Dear Sarah,

'Trust' in statistics is a dangerous concept :)
Nested hypotheses are good when they represent all plausible explanations of a phenomenon well. Otherwise, one may find that H_A is better than H_0, but in reality both are wrong and the correct model is H_1, nested neither in H_0 nor in H_A.

There are a few diagnostics for GA results:

1). Repeatability of runs. If multiple runs return very similar results, both in terms of the top N models found and in terms of model-averaged inference, then convergence has likely been achieved. Otherwise, try tightening the convergence criterion or increasing the population size.

2). Size of confidence sets. If 95% confidence sets are very large (say > 30% of the total number of models), then the model being considered is probably too complex (e.g. too many rate classes) and only model-averaged inference should be trusted.

3). AIC-based model selection has an analogue of p-values: evidence ratios. To check whether model 1 is much better than model 2, you can compute Akaike weights for each and take their ratio. A sufficiently large ratio (e.g. 20) can serve as evidence that model 1 has a much better fit than model 2. If the models found by the GA have high evidence ratios compared to the 'extreme' models (e.g. single ratio or free ratio or a priori), then we can have more confidence in the GA.

4). Sanity checks. For example, when you are running GABranch, you can always compare its output with that of a local (separate dN/dS for every branch) model, and see if the results seem to point in the same direction.
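
To make diagnostics 2 and 3 concrete, here is a minimal Python sketch of Akaike weights, evidence ratios and a 95% confidence set. It is not HyPhy code: the function names and the AIC scores in the usage note are made up for illustration, and it assumes you already have one AIC score per candidate model.

```python
import math

def akaike_weights(aic_scores):
    """Return normalized Akaike weights for a list of AIC scores."""
    best = min(aic_scores)
    # Relative likelihood of each model: exp(-delta_i / 2),
    # where delta_i is the AIC difference from the best model.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_scores]
    total = sum(rel)
    return [r / total for r in rel]

def evidence_ratio(aic_1, aic_2):
    """Ratio of Akaike weights w_1 / w_2; values above 1 favor model 1."""
    # The normalizing constant cancels, so only the AIC difference matters.
    return math.exp(0.5 * (aic_2 - aic_1))

def confidence_set(aic_scores, level=0.95):
    """Indices of the smallest set of models whose summed weight reaches `level`."""
    weights = akaike_weights(aic_scores)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += weights[i]
        if cumulative >= level:
            break
    return chosen
```

With hypothetical scores [1000.0, 1002.0, 1006.0, 1020.0], evidence_ratio(1000.0, 1006.0) is exp(3), roughly 20, and the 95% confidence set contains the first two models. Dividing len(confidence_set(scores)) by len(scores) gives the fraction to compare against the 30% rule of thumb in point 2.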

For a very readable book on model selection issues, I would recommend [link not available in this copy of the thread].

Cheers,
Sergei

Title: Re: When to trust GA analysis
Post by Sarah on Oct 25th, 2006 at 10:40am
Yes, 'trust' is too... trusting. I suppose I'm looking for more indicators of confidence, and I don't have much experience with ML (and obviously didn't write the algorithm!).

I'm going to check out the book you recommended.

Thanks for listing the diagnostics--they are helpful.

Sarah

Title: Re: When to trust GA analysis
Post by Sergei on Oct 25th, 2006 at 1:08pm
Dear Sarah,

Good luck! I like this quote from Edwards' 'Likelihood'

All our likelihood arguments are conditional on particular probability models: in a sense the model itself is a nuisance parameter. We would like to argue without it, but cannot.

Isn't statistical inference fun?

I was actually going to modify the web implementation of GA Branch to run faster, use a tweaked (from our more recent papers) algorithm for the search, and automatically determine the appropriate number of rate classes. Stay tuned...

Cheers,
Sergei

Title: Re: When to trust GA analysis
Post by Sergei on Nov 1st, 2006 at 7:04pm
Dear Sarah,

I just finished updating the GA Branch back end scripts; they will now run about 10-50 times faster depending on the data, and also automatically cycle through the number of rate classes, from 2 up to a maximum of 10.

Cheers,
Sergei

Title: Re: When to trust GA analysis
Post by Sarah on Nov 1st, 2006 at 7:06pm
Awesome! Thanks! I look forward to trying it out. ;D

Sarah
