fubar_hyphy Vs datamonkey, result is different
Mar 13th, 2014 at 5:36am
Hello, everyone.
I ran a dN/dS analysis on my data with the FUBAR method in HyPhy 2.2, using the default parameters (e.g. 20, 2000000, and 1000000 as burn-in). I then uploaded the same sequences to Datamonkey and found that the result was a little different. The standalone run found 2 sites under positive selection with FDR = 0.11, while Datamonkey reported 1 site under positive selection with FDR = 0.06. In addition, Datamonkey reported sites under purifying selection, but the standalone run did not. So are the FUBAR versions in standalone HyPhy and Datamonkey different?
Another question: when interpreting the result, should I consider the FDR, or just take all the sites reported as positively selected (PP > 0.9)? Any help is appreciated.
Re: fubar_hyphy Vs datamonkey, result is different
Reply #1 - Mar 14th, 2014 at 2:05pm
Hi there,

1). There can be some (small) stochastic variation between FUBAR runs; the methods in Datamonkey (DM) and HyPhy use the same code. Did you use the same posterior probability cutoff (0.9 should be the default)? Was the site that 'disappeared' in one of the runs close to 0.9 (e.g. 0.905)?

2). Datamonkey has some extra post-processing code to find sites under negative selection. You can get the list of such sites from the FUBAR output by processing the .csv file (look at the Prob[alpha>beta] column).

3). Datamonkey reports the expected number of false positives, i.e. roughly FDR * [number of positive results]. What you choose to report is up to you. Generally, I recommend reporting all the called sites, the cutoff value, and some FDR measure.
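
To make points 2) and 3) concrete, here is a minimal Python sketch, not the actual Datamonkey post-processing code. It assumes the FUBAR .csv has a header row with one row per site; the Prob[alpha>beta] column name comes from the reply above, while the Prob[alpha<beta] column, the sample values, the >= comparison, and both function names are assumptions made for illustration. Check the header of your own file, since column names may differ between versions.

```python
import csv
import io


def sites_above_cutoff(csv_text, column, cutoff=0.9):
    """Return 1-based row positions whose posterior in `column` is >= cutoff.

    Row positions are used instead of a site-label column so that the
    sketch does not depend on how the file names its site index.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    called = []
    for position, row in enumerate(reader, start=1):
        if float(row[column]) >= cutoff:
            called.append(position)
    return called


def expected_false_positives(fdr, n_called):
    """Expected number of false positives, i.e. FDR * number of called sites."""
    return fdr * n_called


# Made-up posterior probabilities for four sites (illustrative only).
example = """Prob[alpha>beta],Prob[alpha<beta]
0.95,0.01
0.10,0.905
0.50,0.20
0.99,0.00
"""

# alpha > beta: synonymous rate exceeds non-synonymous rate (negative selection).
negative_sites = sites_above_cutoff(example, "Prob[alpha>beta]")
# alpha < beta: non-synonymous rate exceeds synonymous rate (positive selection).
positive_sites = sites_above_cutoff(example, "Prob[alpha<beta]")

print("negative selection:", negative_sites)
print("positive selection:", positive_sites)

# The two runs from the question: 2 sites at FDR = 0.11, 1 site at FDR = 0.06.
print("standalone:", expected_false_positives(0.11, 2))
print("datamonkey:", expected_false_positives(0.06, 1))
```

For a real run, read the file contents with open(path).read() and pass them in place of `example`. Note that the second made-up site sits at 0.905, just over the cutoff, which is exactly the kind of site that can appear in one run and drop out in another.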

