HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> How to simulate data with user branch lengths
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1116438143

Message started by Sergei on May 18th, 2005 at 10:42am

Title: How to simulate data with user branch lengths
Post by Sergei on May 18th, 2005 at 10:42am
I received this query by e-mail

Quote:
Here is what I tried for HKY.  But I got a bunch of sequences that were all identical.

****

HKYM =  [*,trvs,trst,trvs}
                {trvs,*,trvs,trst}
                {trst,trvs,*,trvs}
                {trvs,trst,trvs,*];


eqf = [0.2}{0.3}{0.2}{0.3];
Model  HKY = (HKYM,eqf,1);

ACCEPT_ROOTED_TREES = 1;
Tree L_1 = (((T1:0.0728981,T2:0.156032):0.821206,T3:0.829292):0.247948,((T4:0.123786,(T5:0.0252201,T6:0.0267699):0.0845906):0.646862,((T7:0.420598,T8:0.297769):0.648832,(T9:0.113452,T10:0.136777):0.684275):0.155985):0.44216);

characters = ["A","C","G","T"}
             {"1","","",""];


DataSet sim = Simulate (L_1,eqf,characters,10000,1,"testDataHKY");


Under HKY85 the branch length is defined as a weighted sum of the transversion rate and a transition rate (see below):

BranchLength (trvs, trst) = 2*trst*(f_A*f_G+f_C*f_T) + 2*trsv (f_A*f_C+f_A*f_T+f_C*f_G+f_G*f_T).

ecause a branch length is the function of two parameters, HyPhy can't map a branch length from the tree string to actual values of transition and transversion; thus both are set to '0' (default values) and sequences are generated along a zero-length tree. You could check for this by calling and printing the value of BranchLength(L_1, 0) - 0 is simply the first branch in post-order traversal.

To fix the the problem of branch length->parameter mapping, I have modified your file to

(1). Use HKY85 with a global tranversion/transition ratio (fixed by the user). Thus BranchLength becomes the function of a single parameter (t).
(2). Wrote a function which computes the scaling factor B is BranchLength = B*t. B is a function of R and base frequencies.
(3). Added a loop after the tree has been read to set correct branch lengths.

The file is attached. If you go to other models, you'll need to update the function which computes 'B', but the rest of the analysis should work fine.


Code (]
global TvTs = 0.3; /* the global transition/transversion ratio */

HKYM =  [*,TvTs*t,t,TvTs*t}
        {TvTs*t,*,TvTs*t,t}
        {t,TvTs*t,*,TvTs*t}
        {TvTs*t,t,TvTs*t,*];


eqf = [0.2}{0.3}{0.2}{0.3];


Model  HKY = (HKYM,eqf,1);

ACCEPT_ROOTED_TREES = 1;

t             = 1;

/* it is important to set the branch length parameter to 1, so that the rate matrix is evaluated properly for the function,
otherwise the scaling factor will be of by a factor of 1/t */

scalingB = computeScalingFactorB (HKYM, eqf);

Tree L_1 = (((T1:0.0728981,T2:0.156032):0.821206,T3:0.829292):0.247948,((T4:0.123786,(T5:0.0252201,T6:0.0267699):0.0845906):0.646862,((T7:0.420598,T8:0.297769):0.648832,(T9:0.113452,T10:0.136777):0.684275):0.155985):0.44216);

/* after this step, HyPhy will set .t parameters of each branch to the value in the string, e.g L_1.T1.t = 0.0728981.
But the branch length of T1 will then be B*0.0728981. Thus we need to divide the .t of each branch by B to obtain
correct branch lengths. I added the print statments for diagnostics */

fprintf (stdout, "\nBranch scaling factor computed to be ", scalingB,"\n\n");

/* get all branch names and branch lengths*/

branchNames      = BranchName      (L_1,-1);
branchLenghts      = BranchLength      (L_1,-1);

for (k = 0; k < Columns (branchNames)-1; k=k+1)
{
     ExecuteCommands ("L_1."+branchNames[k):

+".t=L_1."+branchNames[k]+".t/scalingB;");
     fprintf (stdout, "Branch ", branchNames[k],".\n\t Length before scaling: ", Format (branchLenghts[k],10,5),
                                                                    "\n\t Length after scaling : ", Format (BranchLength (L_1,k),10,5), "\n");
}

characters = ["A","C","G","T"}
             {"1","","",""];


DataSet sim = Simulate (L_1,eqf,characters,10000,1,"testDataHKY");


function computeScalingFactorB (rateMatrix, baseFreqs)
{
     B = 0;
     for (n1 = 0; n1 < 4; n1 = n1+1)
     {
           for (n2 = 0; n2 < 4; n2 = n2+1)
           {
                 if (n2!=n1)
                 {
                       B = B + baseFreqs[n1]*baseFreqs[n2]*rateMatrix[n1][n2];
                 }
           }
     }
     return B;
}


Sergei

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.