calculate standard errors for dN and dS
Jeff
calculate standard errors for dN and dS
Jan 10th, 2007 at 2:20pm
 
Hi Sergei,

I am interested in estimating the errors associated with the dN and dS values that are reported by post_sns.bf.  I think I know how to do it, but I have some questions.

Because the dS and synRate trees are exactly proportional, it seems dS for any branch can be calculated by multiplying the branch's synRate by a constant.

dS = C x synRate


Is this the case? Should these values really be so simply correlated? (I am using the MG94W9 codon model with local parameters.)

If the above formula is correct, then I think the error associated with dS can be calculated by multiplying the error for synRate by the same constant. The error for synRate can be obtained by taking the square root of the variance in synRate (which I am estimating by nonparametric bootstrap). Thus:

error_dS = C x sqrt(variance_synRate)


Does this sound about right to you?
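
The constant-C calculation described above can be sketched in a few lines. This is a minimal Python illustration, not HyPhy batch code: the replicate values and the constant `C` are made-up numbers, and `bootstrap_se` is a hypothetical helper name.

```python
import statistics

def bootstrap_se(replicates):
    """Estimate a standard error as the sample standard deviation of bootstrap replicates."""
    return statistics.stdev(replicates)

# Hypothetical synRate estimates for one branch across bootstrap replicates.
syn_rate_reps = [0.10, 0.12, 0.09, 0.11, 0.13]

# Hypothetical scaling constant relating synRate to dS (assumed fixed here).
C = 2.5

se_syn_rate = bootstrap_se(syn_rate_reps)
se_dS = C * se_syn_rate  # error_dS = C x sqrt(variance_synRate)

# Scaling every replicate first gives the same result, since sd(C*x) = |C| * sd(x).
se_dS_direct = bootstrap_se([C * r for r in syn_rate_reps])

print(se_syn_rate, se_dS, se_dS_direct)
```

The equality of the two results holds only while C is truly the same number in every replicate.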

Thanks!
Jeff
 
Sergei
Re: calculate standard errors for dN and dS
Reply #1 - Jan 10th, 2007 at 9:54pm
 
Dear Jeff,

You are nearly correct, but the constant 'C' will depend on other model parameters (base frequencies, transition/transversion frequencies, etc.), which vary from replicate to replicate. The most rigorous way to compute what you want is to compute dS (dN) directly for each replicate, hence incorporating data dependence; you need to modify the bootstrap batch code for that (but it's not hard to do). What you are suggesting (using a constant 'C') is probably not going to be too far off, unless the data set has few sequences (and the variance in C is substantial).
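
The difference between the two approaches can be illustrated with a small Python sketch. All numbers are invented for illustration, and the per-replicate factors in `c_reps` stand in for whatever scaling the fitted model parameters imply in each replicate; none of this is HyPhy output.

```python
import statistics

# Hypothetical per-replicate values for one branch: synRate and the scaling
# factor C, which shifts with the other fitted parameters in each replicate.
syn_rate_reps = [0.10, 0.12, 0.09, 0.11, 0.13]
c_reps = [2.4, 2.6, 2.5, 2.3, 2.7]

# Shortcut: treat C as a constant (here, its mean over the replicates).
c_fixed = statistics.mean(c_reps)
se_fixed_c = c_fixed * statistics.stdev(syn_rate_reps)

# Direct approach: compute dS within each replicate, then take the spread.
ds_reps = [c * r for c, r in zip(c_reps, syn_rate_reps)]
se_direct = statistics.stdev(ds_reps)

print(f"SE with constant C:       {se_fixed_c:.4f}")
print(f"SE from per-replicate dS: {se_direct:.4f}")
```

The two estimates agree only when C barely varies across replicates; the more C varies, the more the shortcut can misstate the error.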

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Jeff
Re: calculate standard errors for dN and dS
Reply #2 - Jan 11th, 2007 at 2:02am
 
Dear Sergei,

Yes, that makes sense.  The C value should change from replicate to replicate and this variance needs to be accounted for.  I'll have a look at the bootstrapping code and see what I can do.

Thanks!
Jeff
 
Jeff
Re: calculate standard errors for dN and dS
Reply #3 - Jan 17th, 2007 at 6:23am
 
Hi Sergei,

The BS output file lists branch synRates, nonSynRates, TotalBrLens, SynBrLens, and NonSynBrLens for each BS replicate. What I want are the branchwise dN and dS values for each replicate. Are these dN and dS values already available in any of the data matrices generated by simplebootstrap.bf, or will I have to calculate them?

If I do have to calculate dS, I think I need to multiply SynBrLen by # nucleotide sites and divide by # synonymous sites.  Similarly for dN.  Are these site numbers available already, or will I need to calculate these a la post_sns.bf?

Could I get the dN and dS values for each replicate by simply including post_sns.bf in the simplebootstrap.bf file?

As you can tell, I am looking to cut some corners here.  Any suggestions would be greatly appreciated.

Thanks!
Jeff
 
Sergei
Re: calculate standard errors for dN and dS
Reply #4 - Jan 17th, 2007 at 7:54am
 
Dear Jeff,

The easiest thing to do is output the numbers of synonymous/non-synonymous/total sites in each replicate as additional columns in the .csv file. I actually need this for another project, so I'll modify the file today and post a notice here...

[Part 2] I just put up a new build in which the bootstrap analysis also outputs the number of synonymous and non-synonymous (ns) sites for each bootstrap iterate into the .csv file it writes. Hence, to get dS and dN estimates (per branch or otherwise), you can rescale E[syn subs/site] by sites/syn sites and E[nonsyn subs/site] by sites/ns sites.
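
The rescaling can be sketched in Python as a post-processing step on the bootstrap .csv. This is a hedged illustration only: the column names (`SynBrLen`, `NonSynBrLen`, `SynSites`, `NonSynSites`), the sample values, and the assumption that total sites equal synonymous plus non-synonymous sites are all hypothetical, so check them against the file the build actually writes.

```python
import csv
import io
import statistics

# Hypothetical bootstrap output for one branch; the real .csv layout and
# column names may differ.
CSV_TEXT = """replicate,SynBrLen,NonSynBrLen,SynSites,NonSynSites
1,0.020,0.030,230.0,670.0
2,0.024,0.027,225.0,675.0
3,0.018,0.033,235.0,665.0
4,0.022,0.029,228.0,672.0
"""

def dn_ds_from_row(row):
    """Rescale per-site expected substitutions to per-ns-site (dN) and per-syn-site (dS)."""
    syn_sites = float(row["SynSites"])
    ns_sites = float(row["NonSynSites"])
    total_sites = syn_sites + ns_sites  # assumption: total = syn + ns sites
    d_s = float(row["SynBrLen"]) * total_sites / syn_sites
    d_n = float(row["NonSynBrLen"]) * total_sites / ns_sites
    return d_n, d_s

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
dn_reps, ds_reps = zip(*(dn_ds_from_row(r) for r in rows))

# Bootstrap standard errors: spread of the per-replicate estimates.
se_dn = statistics.stdev(dn_reps)
se_ds = statistics.stdev(ds_reps)
print(f"SE(dN) = {se_dn:.5f}, SE(dS) = {se_ds:.5f}")
```

Because dN and dS are computed inside each replicate, the replicate-to-replicate variation in site counts is carried into the standard errors automatically.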

Cheers,
Sergei
« Last Edit: Jan 17th, 2007 at 11:45am by Sergei »  

 
Jeff
Re: calculate standard errors for dN and dS
Reply #5 - Jan 17th, 2007 at 2:00pm
 
Looks great!  Thanks for your help yet again.

Jeff