Differences between PAML and DATAMONKEY results (Read 4007 times)
Denise
Guest


Differences between PAML and DATAMONKEY results
Apr 25th, 2006 at 11:21am
 
I recently analyzed the same data set using both PAML and Datamonkey and found that the sites identified as being under positive selection differ completely between the two. Here are the details:
PAML - The LRT between M1 and M2 is not significant, while that between M7 and M8 is significant. The estimated omega for M8 is 1.18496, with 0.07359 as the proportion of sites in this category. The BEB analysis found 9 positively selected sites, one of them with a posterior probability greater than 0.95.
The same data set run through Datamonkey found 3 positively selected sites in each of the FEL and SLAC analyses at the 0.05 significance level. Two of the three sites are shared between FEL and SLAC. None of these sites overlap with the PAML results (there is no overlap even at the 0.1 significance level in Datamonkey). Finally, my data set contains 42 species and 294 amino acids. If you need any more information, let me know.
So my questions are: Has this happened to other people? Under what conditions might this happen? For example, it appears in this case that sites may be positively selected, but not at a rate much higher than the neutral rate. Is this a likely factor in the conflicting results, and are there any other conditions where such conflicts will arise? Why might I get such different results from these two programs? And the big question: any suggestions on how these results should be interpreted?

Thanks for any help,
- Denise
Sergei
YaBB Administrator
Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Differences between PAML and DATAMONKEY result
Reply #1 - Apr 25th, 2006 at 2:40pm
 
Dear Denise,

There are numerous differences between our methods (SLAC and FEL) and those implemented in PAML. I would direct your attention to the paper which describes our methods (http://mbe.oupjournals.org/cgi/content/short/22/5/1208) and points out how some of these differences may come into play.

Two major factors which could lead to different results are (a) Datamonkey.org allows synonymous rates to vary from site to site and PAML doesn't (see http://mbe.oxfordjournals.org/cgi/content/short/22/12/2375 for how this may affect the results); (b) M8 models may not always describe the distribution of rates in the alignment well; SLAC and FEL do not force all positively selected sites to have the same 'omega' (or its equivalent), but rather estimate omegas site by site.

One 'sanity check' I would recommend is to run SLAC and then use the 'Inferred Substitutions' link to see the inferred evolutionary history of all those sites which have been found as selected by SLAC/FEL and PAML. See if the results make sense (e.g. there should be some non-synonymous substitutions inferred at that site, and, generally, a lot fewer synonymous substitutions than non-synonymous ones).
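As a rough illustration of that sanity check, here is a minimal Python sketch. It assumes you have copied the inferred synonymous and non-synonymous substitution counts for each candidate site into a dictionary by hand; the function name, the data layout, and the counts are all hypothetical and are not Datamonkey output. Raw counts are only a crude reading aid, not a substitute for the tests themselves.

```python
def flag_suspect_sites(counts):
    """Flag candidate sites whose inferred substitutions look inconsistent
    with positive selection.

    counts maps a codon site to a (synonymous, non_synonymous) pair of
    inferred substitution counts. This layout is an assumption made for
    this sketch.
    """
    flagged = {}
    for site, (syn, nonsyn) in counts.items():
        if nonsyn == 0:
            flagged[site] = "no non-synonymous substitutions inferred"
        elif nonsyn <= syn:
            flagged[site] = "non-synonymous count does not exceed synonymous count"
    return flagged


# Hypothetical counts, for illustration only (not taken from this thread).
example_counts = {
    17: (0, 5),   # looks plausible: several non-synonymous, no synonymous
    88: (3, 0),   # suspicious: nothing non-synonymous inferred
    140: (4, 2),  # suspicious: more synonymous than non-synonymous
}
suspect = flag_suspect_sites(example_counts)
```

Sites that end up in `suspect` are the ones worth a closer look in the substitution maps before trusting either program's call.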

One other thing you may want to check is that the alignment uploaded to Datamonkey.org was processed correctly; check the numbers of sequences and sites reported on the upload page and also use the Inferred Substitutions link as described above, for example, to make sure that Datamonkey.org and PAML index codons the same way.
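One quick way to look for an indexing mismatch is to compare the two site lists under small shifts. The sketch below is only an illustration: the function name and the site numbers are made up, and a shift that suddenly produces overlap is a hint to check the alignments, not proof of an offset.

```python
def overlap_by_offset(sites_a, sites_b, max_shift=3):
    """For each shift in -max_shift..max_shift, add the shift to every
    site in sites_b and report which sites then coincide with sites_a."""
    a = set(sites_a)
    result = {}
    for shift in range(-max_shift, max_shift + 1):
        shifted = {s + shift for s in sites_b}
        result[shift] = sorted(a & shifted)
    return result


# Hypothetical site lists, for illustration only (not taken from this thread).
paml_sites = [12, 45, 101, 230]
datamonkey_sites = [11, 44, 150]
shared = overlap_by_offset(paml_sites, datamonkey_sites)
```

With these made-up lists there is no overlap at shift 0, but two sites coincide at shift +1, which is the pattern you would expect if one program counted codons from 0 and the other from 1.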

It's difficult to give definitive advice about how to interpret the results; this has to be handled on a case-by-case basis, especially when the evidence for selection does not seem terribly convincing - e.g. is omega = 1.2 inferred by PAML really THAT different from neutrality? If you want, you can send me (help@datamonkey.org) the link to the SLAC/FEL results for your data and the list of sites found to be under selection by PAML, and I'll take a look and see if anything stands out.

HTH,
Sergei
Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego