Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
simulating positive selection on a branch (Read 2164 times)
ofedrigo
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 14
simulating positive selection on a branch
Sep 15th, 2006 at 12:23pm
 
Hi,

I am trying to simulate positive selection for non coding regions on a specific branch. For this, I use a four category model (as described in Zhang et al. 2005). It does not seem to work. Basically that's how I try to do it:

I simulate independently 4 dataset for each of the four site class with fraction of sites f0,f1,f2a,f2b respectively and I concatenate them. Here is the relevant part of my code:


ACCEPT_BRANCH_LENGTHS = 1;
characters = {{"A","C","G","T"}
                      {"1","","",""}};

/*set up parameters*/
treeString="(A:0.0070826,B:0.0117252,C:0.0623285)";
global kappa_inv=0.22;
obsFreqs={{0.5}{0.25}{0.25}{0.5}};
branch_name="A";
zeta0=0;
zeta1=1;
zeta2=1.5;
f0=500;
f1=300;
f2a=200;
f2b=250;

/*define model*/
HKY85RateMatrix = {{*,t*kappa_inv,t,t*kappa_inv}
                  {t*kappa_inv,*,t*kappa_inv,t}
                  {t,t*kappa_inv,*,t*kappa_inv}
                  {t*kappa_inv,t,t*kappa_inv,*}};
Model HKY85       = (HKY85RateMatrix, obsFreqs);

useModel(HKY85);

/*create neutral tree"
Tree dummy = treeString;
branchNames     = BranchName   (dummy,-1);
branchLengthsNow  = BranchLength (dummy,-1);
tToBranchLengthFactor:= 2*((obsFreqs[0]*obsFreqs[2]+obsFreqs[1]*obsFreqs[3]) + kappa_inv*(obsFreqs[0]*obsFreqs[1]+obsFreqs[0]*obsFreqs[3]+obsFreqs[1]*obsFreqs[2]+obsFreqs[2]*obsFreqs[3]));
Tree neutralTree=treeString;
for (k = 0; k < Columns (branchNames)-1; k=k+1)  {ExecuteCommands ("neutralTree."+branchNames[k]+".t:="+branchLengthsNow[k]+"/tToBranchLengthFactor;");}

Tree myTree=treeString;

/* zeta0 on foreground and background branches */
ClearConstraints (myTree);
ReplicateConstraint ("this1.?.t:=zeta0*this2.?.t",myTree,neutralTree);
/*simulate dataset*/
DataSet simulatedData = Simulate (myTree,obsFreqs,characters,f0,0);

/* zeta1 on foreground and background branches */
ClearConstraints (myTree);
ReplicateConstraint ("this1.?.t:=zeta1*this2.?.t",myTree,neutralTree);
/*simulate and concatenate dataset*/
DataSet sim = Simulate (myTree,obsFreqs,characters,f1,0);
DataSet simulatedData = Concatenate (simulatedData, sim);

/* zeta0 on background branches and zeta2 on foreground branch*/
ClearConstraints (myTree);
ReplicateConstraint("this1.?.t :=zeta0*this2.?.t", myTree,neutralTree);
ExecuteCommands("newTree."+branch_name+".t := zeta2*neutralTree."+branch_name+".t;");
/*simulate and concatenate dataset*/
DataSet sim = Simulate (myTree,obsFreqs,characters,f2a,0);
DataSet simulatedData = Concatenate (simulatedData, sim);

/* zeta1 on background branches and zeta2 on foreground branch*/
ClearConstraints (myTree);
ReplicateConstraint("this1.?.t :=zeta1*this2.?.t", myTree,neutralTree);
ExecuteCommands("newTree."+branch_name+".t := zeta2*neutralTree."+branch_name+".t;"); 
/*simulate and concatenate dataset*/
DataSet sim = Simulate (myTree,obsFreqs,characters,f2b,0);
DataSet simulatedData = Concatenate (simulatedData, sim);       

/*save concatenated dataset*/
DataSetFilter      simulatedDataFilter = CreateFilter (simulatedData,1);
DATA_FILE_PRINT_FORMAT = 2;
fprintf ("myoutput.phy", CLEAR_FILE,simulatedDataFilter);


Is it the right way to do it?

Thanks a lot,

Olivier
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: simulating positive selection on a branch
Reply #1 - Sep 15th, 2006 at 4:38pm
 
Dear Olivier,

A few small typos (e.g. obsFreqs need to sum to 1, newTree should be myTree in ExecuteCommands down the line), and a minor branch length scaling problem, but overall, looks good. Here's the corrected code:

Code:
 ACCEPT_BRANCH_LENGTHS = 1;
 characters = {{"A","C","G","T"}
     {"1","","",""}};

 /*set up parameters*/
 treeString="(A:0.0070826,B:0.0117252,C:0.0623285)";
 global kappa_inv=0.22;
 obsFreqs={{0.25}{0.25}{0.25}{0.25}};
 branch_name="A";
 zeta0=0;
 zeta1=1;
 zeta2=1.5;
 f0=500;
 f1=300;
 f2a=200;
 f2b=250;

 /*define model*/
 HKY85RateMatrix = {{*,t*kappa_inv,t,t*kappa_inv}
   {t*kappa_inv,*,t*kappa_inv,t}
   {t,t*kappa_inv,*,t*kappa_inv}
   {t*kappa_inv,t,t*kappa_inv,*}};
 Model HKY85 = (HKY85RateMatrix, obsFreqs);

 useModel(HKY85);

 /*create neutral tree" */
 Tree dummy 				= treeString;
 branchNames 				  = BranchName   (dummy,-1);
 branchLengthsNow 			= BranchLength (dummy,-1);
 ExecuteCommands("scalingFactor = branchLengthsNow[0]/dummy."+branchNames[0]+".t");

 Tree neutralTree=treeString;
 for (k = 0; k < Columns (branchNames)-1; k=k+1)
 {
 	ExecuteCommands ("neutralTree."+branchNames[k]+".t=dummy."+branchNames[k]+".t/scalingFactor");
 }


 Tree myTree=treeString;

 /* zeta0 on foreground and background branches */
 ClearConstraints (myTree);
 ReplicateConstraint ("this1.?.t:=zeta0*this2.?.t",myTree,neutralTree);
 /*simulate dataset*/

 fprintf (stdout, "zeta0 tree 1: ", Format (myTree,1,1), "\n");

 DataSet simulatedData = Simulate (myTree,obsFreqs,characters,f0,0);

 /* zeta1 on foreground and background branches */
 ClearConstraints (myTree);
 ReplicateConstraint ("this1.?.t:=zeta1*this2.?.t",myTree,neutralTree);
 /*simulate and concatenate dataset*/
 fprintf (stdout, "zeta1 tree 1: ", Format (myTree,1,1), "\n");
 DataSet sim = Simulate (myTree,obsFreqs,characters,f1,0);
 DataSet simulatedData = Concatenate (simulatedData, sim);

 /* zeta0 on background branches and zeta2 on foreground branch*/
 ClearConstraints (myTree);
 ReplicateConstraint("this1.?.t :=zeta0*this2.?.t", myTree,neutralTree);
 ExecuteCommands("myTree."+branch_name+".t := zeta2*neutralTree."+branch_name+".t;");
 /*simulate and concatenate dataset*/
 fprintf (stdout, "zeta0/zeta2 tree 1: ", Format (myTree,1,1), "\n");
 DataSet sim = Simulate (myTree,obsFreqs,characters,f2a,0);
 DataSet simulatedData = Concatenate (simulatedData, sim);

 /* zeta1 on background branches and zeta2 on foreground branch*/
 ClearConstraints (myTree);
 ReplicateConstraint("this1.?.t :=zeta1*this2.?.t", myTree,neutralTree);
 ExecuteCommands("myTree."+branch_name+".t := zeta2*neutralTree."+branch_name+".t;");
 fprintf (stdout, "zeta1/zeta2 tree 1: ", Format (myTree,1,1), "\n");
 /*simulate and concatenate dataset*/
 DataSet sim = Simulate (myTree,obsFreqs,characters,f2b,0);
 DataSet simulatedData = Concatenate (simulatedData, sim);

 /*save concatenated dataset*/
 DataSetFilter simulatedDataFilter = CreateFilter (simulatedData,1);
 DATA_FILE_PRINT_FORMAT = 2;
 fprintf ("myoutput.phy", CLEAR_FILE,simulatedDataFilter);
 



HTH,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
ofedrigo
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 14
Re: simulating positive selection on a branch
Reply #2 - Sep 20th, 2006 at 12:31pm
 
Thanks Sergei!

I am still confused about the branch length scaling. My understanding was that, for instance under HKY model, branch lengths are defined are function of the frequency and  trst trsv  rates:

BranchLength (trvs, trst) = 2*trst*(f_A*f_G+f_C*f_T) + 2*trsv (f_A*f_C+f_A*f_T+f_C*f_G+f_G*f_T)

In your code, in order to calculate the scaling factor you do not account for base freq and kappa_inv. Am I missing something?

thanks again,

Olivier
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: simulating positive selection on a branch
Reply #3 - Sep 20th, 2006 at 8:29pm
 
Dear Olivier,

Generally speaking, if you want to constrain the branch length to be constant, while some parameters are being optimized, you need to write out the formula explicitly as you had. In your case, branch lengths or parameters never change during simulation, hence you can use the following trick.

For nearly all models, the dependance of branch length (BL) on the time parameter (t) is linear, where the coefficient depends on other parameters (e.g. rates and frequencies), i.e. BL = scaler * t

For a fixed set of all other model parameters, you can compute the scaler by setting t to some non-zero value, calling the BranchLength function to compute BL, and then setting scaler = BL/t.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged