Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Branch lengths - models with different dimensions (Read 3056 times)
Jose_Patane
YaBB Newbies
*
Offline



Posts: 26
Brazil
Gender: male
Branch lengths - models with different dimensions
Dec 16th, 2006 at 6:09am
 
Hi all,

I got stuck trying to MLE a tree with 2 partitions within the 12s, one for loops (4x4), and another for stems (8x8). The thing is patristic ML-distances are way too high, even > 1, so it may be a problem with the joint use of models with different parameters/dimensions. I assume a proportional model among partitions, which works properly whenever I use the same model type for different partitions (e.g.: GTR_1 for codon position 1, GTR_2 for position 2...). Here's the code I've been trying to use:

--------------------------------------------------------------------------------
---------------------------------

IS_TREE_PRESENT_IN_DATA = 1;

DataSet gene_12s = ReadDataFile (USE_NEXUS_FILE_DATA);

DataSetFilter Loop_RNA = CreateFilter (gene_12s, 1, DATA_FILE_PARTITION_MATRIX[1][1]);

/* RNA doublets (with exclusion set) */
DataSetFilter Stem_RNA = CreateFilter (gene_12s, 2, "2,18,3,17,4,16,...", "" ,"AA,AG,CC,CT,GA,GG,TC,TT");


/* Single transition between doublets */
global S;

/* Double transition between canonical doublets */
global D;

/* Loop: TN_93 */
global P;
global Q;

/* "Proportional model" among partitions */
global R;

/* +I, +G */
global P_invar;
global alpha;

category Pinv = (2,{{P_invar}{1-P_invar}},MEAN, ,{{0}{1}}, 0,1);
category gam =(4, EQUAL, MEAN, GammaDist(_x_,alpha,alpha), CGammaDist(_x_,alpha,alpha),0,1e25);


HarvestFrequencies (Loop_Freqs, Loop_RNA, 1, 1, 1);

/* Empirical frequencies of the 8 most common doublet states */
Doublet_Freqs = {{0.00854592}{0.174745}{0.012174}{0.304436}{0.245536}{0.0181973}{0.221259}{0.006
37755}};

/* Partition rate weights */
global rate_loop := (Loop_RNA.sites + Stem_RNA.sites) / (Loop_RNA.sites + R * Stem_RNA.sites);
global rate_stem := R * rate_loop;


/*  Loops: TN_93 + I + G */
RNA_loop_matrix = {
{     *            ,(  m * Pinv * gam),(P*m * Pinv * gam),(  m * Pinv * gam)}
{(  m * Pinv * gam),     *            ,(  m * Pinv * gam),(Q*m * Pinv * gam)}
{(P*m * Pinv * gam),(  m * Pinv * gam),     *            ,(  m * Pinv * gam)}
{(  m * Pinv * gam),(Q*m * Pinv * gam),(  m * Pinv * gam),     *            }

};


/* Model used:

a = double transversion between canonical pairs.
S * a = single transition between doublets.
D * a = double transition between canonical pairs.

Everything else not allowed in a single time step.

Sequence of states: AC, AT, CA, CG, GC, GT, TA, TG.  */


RNA_stem_matrix = {
{  *      ,S * a,  0      ,  0      ,S * a,  0      ,  0      ,  0      }
{S * a,  *      ,  0      ,  a  ,D * a,S * a,  0      ,  0      }
{  0      ,  0      ,  *      ,S * a,  0  ,  0      ,S * a,  0      }
{  0      ,  a      ,S * a,  *      ,  0  ,  0  ,D * a,S * a}
{S * a,D * a,  0      ,  0      ,  *  ,S * a,  a      ,  0      }
{  0      ,S * a,  0      ,  0      ,S * a,  *      ,  0      ,  0      }
{  0      ,  0      ,S * a,D * a,  a  ,  0      ,  *      ,S * a}
{  0      ,  0      ,  0      ,S * a,  0  ,  0      ,S * a,  *      }

};


Model Loop_model = (RNA_loop_matrix, Loop_Freqs);
Model Stem_model = (RNA_stem_matrix, Doublet_Freqs);



     UseModel(Loop_model);
     Tree myTree_loop = DATAFILE_TREE;

     UseModel(Stem_model);
     Tree myTree_stem = DATAFILE_TREE;

     Tree myTree_12s = DATAFILE_TREE;


     ReplicateConstraint ("this1.?.a := this2.?.m*R", myTree_stem, myTree_loop);

     LikelihoodFunction theLikFun = (Loop_RNA, myTree_loop, Stem_RNA, myTree_stem);
     Optimize(results,theLikFun);

     ReplicateConstraint ("this1.?.a := (this2.?.m + this3.?.a)/(rate_loop + rate_stem)", myTree_loop, myTree_stem, myTree_12s);
--------------------------------------------------------------------------------
---------------------------------

... So how can I correct the problem stated above?
Back to top
« Last Edit: Dec 17th, 2006 at 9:02am by Jose_Patane »  
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Branch lengths - models with different dimensi
Reply #1 - Dec 18th, 2006 at 11:06am
 
Dear Jose,

There shouldn't be a problem with differing data types in your partitions with regard to branch lengths.

Regarding the code

[list][*]The definition of gamma is incomplete (it will run but unnecessarily slowly). Include the conditional mean expression as the last argument:

[code]
category gam =(4, EQUAL, MEAN, GammaDist(_x_,alpha,alpha), CGammaDist(_x_,alpha,alpha),0,1e25,CGammaDist(_x_,alpha+1,alpha));
[/code]
[*]There is an extraneous space in the second to last entry
[code]
Doublet_Freqs = {{0.00854592}{0.174745}{0.012174}{0.304436}{0.245536}{0.0181973}{0.22125 9}{0.00637755}};
[/code]
[*] When defining Gamma+Inv, care must be taken to ensure that the mean of the resulting distribution is one, otherwise the length of the tree will be confounded with the proportion of invariable sites. Typically, you can do this by using a two-parameter gamma distribution with mean beta/alpha and enforcing (1-PInv)*beta/alpha = 1
[*] I don't understand the point of the last 'ReplicateConstraint'. Are you trying to combine the branch lengths of the trees from the loop and the stem into a joint estimate? As in: joint_BL = (loop_sites*loop_BL+stem_sites*stem_BL)/(total sites)? The ReplicateConstraint will accomplish properly. Firstly, the order of the arguments should be "myTree_12s, myTree_loop, myTree_stem". Secondly, if you print myTree_12s, you will get very odd branch lengths, because myTree_12s will be using the Stem_model (last defined model), and the branch lengths will be some functions of 'a', which depend on many other parameters. I'll post the code on how to generate joint branch lengths from the two trees if this is what you after.
[/list]

Best,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Jose_Patane
YaBB Newbies
*
Offline



Posts: 26
Brazil
Gender: male
Re: Branch lengths - models with different dimensi
Reply #2 - Dec 18th, 2006 at 12:00pm
 
Hi Sergei,

about the points you posted:

- gamma was incomplete:
That's great, I'll try this way now;

- extraneous space within Double_freqs:
That might have been a problem when formatting for posting, it's not in the original *.bf file;

- gamma+Inv:
Great hint as well;

- ReplicateConstraint:
You got the point, I want to properly estimate the joint branch length;

- Order of the trees in ReplicateConstraint:
Another silly error of mine while editing for posting... the order is right in the original *.bf file.

- Joint branch lengths:
That's it, I want joint branch lengths... I couldn't find a way to make it work... nevertheless, the way the code is written fits perfectly when I use the very same model across partitions (e.g., dimensionality and number of parameters), but here it doesn't... you mention that it's wrong to define the final tree under one of the models... why is that, exactly, on a theoretical level, since it works fine in the previous mentioned case? (I've also thought about that possible catch before, but it seems not to be acceptable by HyPhy to define a tree without a model beforehand... )
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Branch lengths - models with different dimensi
Reply #3 - Dec 18th, 2006 at 12:22pm
 
Dear Jose,

Quote:
nevertheless, the way the code is written fits perfectly when I use the very same model across partitions (e.g., dimensionality and number of parameters), but here it doesn't... you mention that it's wrong to define the final tree under one of the models... why is that, exactly, on a theoretical level? (I've thought about that before, but it seems not to be acceptable by HyPhy to define a tree without a model beforehand... )


A branch length is usually a linear function of model parameters, e.g. BL = C * t, where 't' is the branch lengths (local) parameter in the model, and C is a function of all other model parameters (frequencies etc).

When you have the same model, 'C' is the same across all partitions, hence, by writing:

BL_J = a * BL_1 + b* BL_2

is the same as writing

t_j = a*t_1 + b*t_2.

If the constants are not the same, then you need is to enforce:

t_j = (a*t_1*C_1+b*t_2*C_2)/C_J.

These constant are not difficult to work out run time (they are \sum_j \pi_j r_{jj}, where 'r' is the diagonal element of the rate matrix, and \pi_j is the frequency of the respective character).

Alternatively, you can simply operate on branch lengths directly.
The code example below will mix the branch lengths in two trees to create a new one. Because HyPhy HBL does not currently support direct manipulation of branch lengths of tree objects, it is a bit hacky (using a post-order traversal 'flat' form of the trees).

Code:
UseModel (USE_NO_MODEL);
Tree mixedTree = treeString;
bl1     = BranchLength (myTree_stem,-1); /* get all branch lengths from the tree and return a vector, indexed by the post-order traversal order of the tree */
bl2      = BranchLength (myTree_loop,-1);
branchNames = BranchName (mixedTree,-1);

flatTree = mixedTree ^ 0; /* create a flat representation of the tree in an associative array */

mixedDistances = {}; /* create a new hash for branch lengths, using the form mixedDistances[branchName]  = branchLength */

for (bc = 0; bc < Columns(branchNames); bc = bc+1)
{
mixedDistances [branchNames[bc]] = prop1 * bl1[bc] + prop2 * bl2[bc];
}

/* include a file to convert the flat representation back to a tree string */
ExecuteAFile (HYPHY_BASE_DIRECTORY+"TemplateBatchFiles"+DIRECTORY_SEPARATOR+"TreeTools.ibf");

newTreeString = PostOrderAVL2StringDistances (flatTree, mixedDistances);
/* newTreeString now contains the Newick string with the mixed distances as you need; adjust prop1 and prop2 as needed */
 



Cheers,
Sergei

P.S. Example code above may have typos/bugs as I haven't tested it, but should be easy to fix.


Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Jose_Patane
YaBB Newbies
*
Offline



Posts: 26
Brazil
Gender: male
Re: Branch lengths - models with different dimensi
Reply #4 - Dec 24th, 2006 at 8:51pm
 
Hi Sergei,

I'm having some trouble with the proper definition of Gamma+Inv , I've tried this following your advice:

-----------------------------------------------------------
global alpha;
global beta;

category gam =(4, EQUAL, beta/alpha, GammaDist(_x_,alpha,beta), CGammaDist(_x_,alpha,beta),0,1e25);


global P_invar;

category Pinv = (2,{{P_invar}{1-P_invar}},MEAN, ,{{0}{1}}, 0,1);
Pinv := 1-(alpha/beta);

------------------------------------------------------------------

Some parameters now have strange values (such as R), so I guess I misunderstood something...
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Branch lengths - models with different dimensi
Reply #5 - Dec 27th, 2006 at 9:27am
 
Dear Jose,

Take a look at Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

There's quite a bit of discussion on how to define Gamma+Inv in that thread.

The relevant bits for your code would be

Code:
-----------------------------------------------------------
global alpha = 1;
global beta = 1;
global P_invar = 0.5;
beta:=alpha/(1-P_invar);

category gam = (
    4,	  /* number of rates */
    EQUAL,	   /* weights */
    MEAN,	    /* sampling method */
    GammaDist(_x_,alpha,beta),    /* density */
    CGammaDist(_x_,alpha,beta),   /* CDF */
    0,	  /* left bound */
    1e5,	/* right bound */
    CGammaDist(_x_,alpha+1,beta)*alpha/beta /* mean cumulative function */
    );


category Pinv = (2,{{P_invar}{1-P_invar}},MEAN, ,{{0}{1}}, 0,1);

------------------------------------------------------------------
 



Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Jose_Patane
YaBB Newbies
*
Offline



Posts: 26
Brazil
Gender: male
Re: Branch lengths - models with different dimensi
Reply #6 - Feb 27th, 2007 at 1:36pm
 
Thanks again Sergei...
Back to top
 
 
IP Logged