Interpretation of PARRIS and GARD results
Maria2
Aug 25th, 2008 at 5:39am
 
Dear Hyphy/Datamonkey

I'm working with PAML, PARRIS and GARD and have some questions about my results. I started my analysis with PAML and found sites in the sequences I'm working with to be under positive selection (4 sites under M2, 2 under M8). I checked for interference from recombination using PARRIS, and there still seems to be evidence for positive selection (the LRT is still significant). Then I screened for recombination breakpoints using GARD and detected one breakpoint. I subdivided the alignment into two parts to analyse both fragments again with PAML. For one fragment I found a higher number of sites under positive selection (6 sites vs. the 2 found earlier in this region; LRT significant, similar for M2 and M8); for the other fragment I found only 1 (one site lost; the LRT is significant for M1/M2 but no longer for M7/M8). The extra sites found in the first segment were also detected in the full-length PAML analysis, but there they had a low posterior probability under NEB (naive empirical Bayes).
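To compare site numbers between the full-length run and the two fragments, fragment-local codon positions have to be shifted back into full-alignment coordinates. The sketch below uses hypothetical helpers that are not part of PAML or GARD; it assumes the breakpoint is given as a nucleotide position marking the end of the first fragment, and snaps it down to a codon boundary so that both fragments stay in frame.

```python
def split_at_breakpoint(n_nucleotides: int, breakpoint_nt: int) -> list[tuple[int, int]]:
    """Return 1-based, inclusive nucleotide ranges for the two fragments.

    Assumption: breakpoint_nt is the last nucleotide of the first fragment.
    It is snapped down to the nearest complete codon so both fragments
    remain in reading frame.
    """
    if n_nucleotides % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    if not 0 < breakpoint_nt < n_nucleotides:
        raise ValueError("breakpoint must fall inside the alignment")
    end_first = (breakpoint_nt // 3) * 3  # last nucleotide of a whole codon
    if end_first == 0:
        raise ValueError("breakpoint falls inside the first codon")
    return [(1, end_first), (end_first + 1, n_nucleotides)]


def to_global_codon(fragment_start_nt: int, local_codon: int) -> int:
    """Map a 1-based codon index within a fragment to the full alignment."""
    codon_offset = (fragment_start_nt - 1) // 3  # codons preceding the fragment
    return codon_offset + local_codon


# Hypothetical 900-nt alignment with a breakpoint reported at nucleotide 455.
fragments = split_at_breakpoint(900, 455)
print(fragments)  # [(1, 453), (454, 900)]
print(to_global_codon(fragments[1][0], 10))  # 161
```

With the site numbers on a common scale, the sites called in each fragment can be lined up directly against the sites called in the full-length analysis.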

My question is: which sites can I now believe to be under positive selection? I understand that if GARD detects a recombination breakpoint, the parameters estimated in PAML for the complete sequences do not apply to all sites, and that the results on the two separate fragments are more correct. But then I would like to know how one can judge how "reliable" GARD's detection of the breakpoint is (the delta c-AIC between the 0- and 1-breakpoint models is 200, so I think it is very reliable, but what if this difference were only 20?). Also, I do not completely understand the table in the PARRIS results file. What information do the AIC values for M1 and M2 provide, and what is the given p-value based on? I computed the LRT from the difference in lnL reported for M1 and M2. Sorry, I'm not a statistician or a bioinformatician, so I have some difficulty fully understanding what a particular lnL or c-AIC value means.
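On the question of how much a c-AIC difference means: one common information-theoretic reading (general model-selection practice, not taken from GARD's documentation) is the evidence ratio exp(delta/2). Under that reading a difference of 20 still favours the better model by a factor of about 22,000, while a difference of 2 corresponds to a factor of only about 2.7. The sketch below computes the small-sample AIC from a log-likelihood and a parameter count; the function names are hypothetical, and what GARD uses as the sample size n is an assumption you should check.

```python
import math


def aicc(log_likelihood: float, n_params: int, sample_size: int) -> float:
    """Small-sample corrected AIC: -2 lnL + 2k + 2k(k + 1) / (n - k - 1).

    Assumption: sample_size is whatever n the program uses; this is not
    confirmed here for GARD.
    """
    k, n = n_params, sample_size
    if n - k - 1 <= 0:
        raise ValueError("sample_size must exceed n_params + 1")
    return -2.0 * log_likelihood + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def evidence_ratio(delta_aicc: float) -> float:
    """Evidence ratio exp(delta / 2) in favour of the lower-AICc model."""
    return math.exp(delta_aicc / 2.0)


# Compare the scale of support implied by different delta c-AIC values.
for delta in (200, 20, 10, 2):
    print(delta, evidence_ratio(delta))
```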

My question is also more general: how should I interpret results from others who did not check for recombination but did find positively selected sites with PAML? I understand that different models impose different conditions, so results should be presented within the framework of the model used. But if I find different sites under M2 and M8 of PAML, which sites can I believe to be "really" under diversifying selection?

thanks a lot,
Maria
 
Sergei
Re: interpretation PARRIS GARD results
Reply #1 - Aug 25th, 2008 at 8:26am
 
Dear Maria,

If recombination has influenced your sequences, then not accounting for it in any phylogeny-based analysis (not just PAML) can lead to erroneous inference. Our GARD paper in MBE shows an extreme example, and other people have demonstrated this for PAML-type analyses. I would always advise a recombination screen prior to running site-by-site selection screens.

As far as different results with PAML go: M2 and M8 are somewhat arbitrary models of rate variation, so you should expect them to give somewhat different results. I can't tell you which one is better; in reality they both approximate biological reality poorly (as do all existing models). I would recommend that you read our "Not so different ..." paper for a discussion of how different selection detection methods stack up. As a word of caution, even if an LRT of a selection model against its null (e.g. M2a vs M1a, or M8 vs M8a) is significant, it does not tell you that the selection model is correct; it only tells you that the null model is unlikely to have generated the sequences. We advocate using many different models and trying to build a consensus among different methods.
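One simple way to build such a consensus is to count, for each site, how many methods flag it and keep the sites supported by at least some minimum number of methods. This is a minimal sketch; the method names and site numbers are hypothetical placeholders, and the vote threshold is a choice you make, not a recommended value.

```python
from collections import Counter


def consensus_sites(calls: dict[str, set[int]], min_methods: int) -> list[int]:
    """Return the sites flagged by at least `min_methods` methods, in order."""
    # Each method contributes at most one vote per site because calls are sets.
    votes = Counter(site for sites in calls.values() for site in sites)
    return sorted(site for site, n in votes.items() if n >= min_methods)


# Hypothetical site calls (codon positions) from three methods.
calls = {
    "M2a": {12, 45, 87, 130},
    "M8": {45, 130},
    "FEL": {45, 87, 201},
}
print(consensus_sites(calls, min_methods=2))  # [45, 87, 130]
print(consensus_sites(calls, min_methods=3))  # [45]
```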

A delta AIC of 10 or greater is usually sufficient for significance. PARRIS carries out a standard LRT (1 degree of freedom, with the null value of the tested parameter lying on the boundary of the parameter space).
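The LRT statistic is twice the difference in log-likelihood between the alternative and null models. The sketch below computes that statistic and two p-values: the plain chi-square with 1 degree of freedom, and the 50:50 mixture of a point mass at 0 and a chi-square with 1 degree of freedom, which is the usual asymptotic null when the tested parameter sits on a boundary. Which of these PARRIS actually reports is an assumption here, so compare its output against both. The function names are hypothetical and the log-likelihoods in the usage line are invented for illustration.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood gain of the alternative model, floored at 0."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def p_chi2_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(stat / 2.0))


def p_boundary(stat: float) -> float:
    """p-value under a 50:50 mixture of a point mass at 0 and chi-square(1)."""
    return 0.5 * p_chi2_1df(stat) if stat > 0 else 1.0


# Invented log-likelihoods for a null (M1) and an alternative (M2) fit.
stat = lrt_statistic(lnl_null=-2456.8, lnl_alt=-2453.1)
print(stat, p_chi2_1df(stat), p_boundary(stat))
```

The boundary-adjusted p-value is half the plain one, so a test that is borderline under the plain chi-square can become significant once the boundary is taken into account.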

Run your data through datamonkey.org for comparison: see which sites come up as being under selection and contrast them with PAML's sets. One thing that PAML does not account for is synonymous rate variation; that could lead to some false positive signal.
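A sketch of that comparison, assuming you already have per-site posterior probabilities from PAML (NEB or BEB) and per-site p-values from a Datamonkey method such as FEL. The cutoffs (0.95 and 0.1), the helper names and all the numbers are hypothetical placeholders chosen for illustration, not defaults of either tool.

```python
def flagged_sites(scores: dict[int, float], cutoff: float, higher_is_stronger: bool) -> set[int]:
    """Sites passing a cutoff: posteriors pass if >= cutoff, p-values if <= cutoff."""
    if higher_is_stronger:
        return {site for site, s in scores.items() if s >= cutoff}
    return {site for site, s in scores.items() if s <= cutoff}


def contrast(paml: set[int], datamonkey: set[int]) -> dict[str, list[int]]:
    """Split two site sets into shared and method-specific calls."""
    return {
        "both": sorted(paml & datamonkey),
        "paml_only": sorted(paml - datamonkey),
        "datamonkey_only": sorted(datamonkey - paml),
    }


# Hypothetical example values, for illustration only.
paml_posteriors = {12: 0.97, 45: 0.99, 87: 0.80}
fel_pvalues = {45: 0.03, 87: 0.08, 201: 0.20}

paml_sites = flagged_sites(paml_posteriors, cutoff=0.95, higher_is_stronger=True)
dm_sites = flagged_sites(fel_pvalues, cutoff=0.1, higher_is_stronger=False)
print(contrast(paml_sites, dm_sites))
# {'both': [45], 'paml_only': [12], 'datamonkey_only': [87]}
```

Sites that appear only in the PAML set are the ones worth a second look in light of the synonymous rate variation point above.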

No method can guarantee that the sites it detects are under diversifying selection, I am afraid; all models are flawed to differing extents.

Cheers,
Sergei

PS If you haven't read this chapter yet, you may find its contents useful in furthering your understanding.


 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego