Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
accelerated sub rate in a particular lineage (Read 4485 times)
Rachel
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 9
accelerated sub rate in a particular lineage
Feb 19th, 2009 at 4:42pm
 
Hello Sergei,

I have a set of noncoding regions and want to look for elements that are accelerated in a particular lineage.  To do this I have taken regions surrounding each element as a reference and want to constrain all element branches to be proportional to the corresponding reference branch under the null model.  Under the alternative model, I want to add another variable to the lineage I am interested in that allows the element branch to be longer.  

I am able to constrain two branches to be proportional and I add a constant of proportionality that I have calculated from the global data set.  It looks something like this:  Element_tree.branchA.t:={constant}*Reference_tree.branchA.t

My questions are as follows:
1)      Under the alternative model, how do I estimate (or allow for) a variable, say G, that is > 1 and allows the element branch to be longer?  The formula would look something like this:  Element_tree.branchA.t:=G*{constant}*Reference_tree.branchA.t

2)      In publications doing similar analyses (I’m thinking about Bush&Lahn 2008), an additional scaling factor, S, that is a tree wide scaling factor which accounts for the fact that elements vary in their general level of conservation was estimated.  The formula for constrained branches would be as follows with this factor: Element_tree.branchA.t:=S*{constant}*Reference_tree.branchA.t.  How do I add such a factor into the analysis?

3)      Is there a way to weight the element and reference branches differently in the likelihood?  


Thanks very much in advance for your help.  

Rachel
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: accelerated sub rate in a particular lineage
Reply #1 - Feb 19th, 2009 at 10:14pm
 
Dear Rachel,

Are you planning to run this analysis in batch mode on many alignments or just once or twice? In the latter case you would simply use the interface, following the general guidelines outlined in Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (especially section 2.3).

For batch analyses you would have to write a little batch file to do this, for example with ReplicateConstraint, e.g (this applies the constraint to the whole tree).

Code:
global G; G:>1;
ReplicateConstraint ("this1.?.t:=G*this2.?.t", Element_tree,Reference_tree);

 



I can give you more specific guidance if you tell me about the route you were going to take (batch file vs GUI)

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Rachel
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 9
Re: accelerated sub rate in a particular lineage
Reply #2 - Feb 20th, 2009 at 10:44am
 
Thanks so much for the fast response.  I will be running the analysis in batch mode on many alignments.
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: accelerated sub rate in a particular lineage
Reply #3 - Feb 23rd, 2009 at 7:46am
 
Dear Rachel,

The easiest way to implement the analysis is to use the scaffold provided by Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

Most of the work lies in modifying the scripts to read input settings; the analysis itself will be very easy to implement. Think about what input is necessary for each analysis: reference tree, lineages to test, background alignments (or a tree estimated from those), evolutionary model to use etc. If you formalize the list of inputs/outputs, I can further assist you with how to modify the pipeline scaffold.

Cheers,
Sergei

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Rachel
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 9
Re: accelerated sub rate in a particular lineage
Reply #4 - Feb 24th, 2009 at 5:33pm
 
Hello Sergei,

I've formalized the list of inputs and outputs and I am hoping if you can tell me whether or not I am on the right track or if you can see any problems with my setup.  In short, I want to determine if the species1 branch is evolving at an accelerated rate.

Below is a snip from a test batch file.  The reference partition contains putatively neutral sequence, the element partition contains putatively functional noncoding sequence.   I define the expected rate of evolution through calculating a constant of proportionality external to the data.  S is a tree-wide scaling factor that accounts for varying levels of conservation and G is a variable that allows the species1 terminal branch to be longer under the alternative hypothesis.  

/*Create the partitions*/
DataSetFilter ReferenceFilter = CreateFilter(myData, 1, "0-429, 1119-1718");
DataSetFilter ElementFilter = CreateFilter(myData, 1, "430-1118");

/*Create vector of observed frequencies*/
HarvestFrequencies(obsFreqsRef, ReferenceFilter,1,1,1);
HarvestFrequencies(obsFreqsEle, ElementFilter, 1,1,1);

/*Define models and trees*/
HKY85RateMatrix = {{*,b,a,b}{b,*,b,a}{a,b,*,b}{b,a,b,*}};

Model HKY85Ref = (HKY85RateMatrix, obsFreqsRef);
Tree myTreeRef = ((species1,species2),species3);

Model HKY85Ele = (HKY85RateMatrix, obsFreqsEle);
Tree myTreeEle = ((species1,species2),species3);

LikelihoodFunction theLnLik = (ReferenceFilter, myTreeRef, ElementFilter, myTreeEle);

/* Initialize S, the tree-wide scaling factor*/
global S;

/*Set the constraints*/
myTreeEle.species1.a:=S*0.4*myTreeRef.species1.a;
myTreeEle.species1.b:=S*0.35*myTreeRef.species1.b;
myTreeEle.species2.a:=S*0.2*myTreeRef.species2.a;
myTreeEle.species2.b:=S*0.23*myTreeRef.species2.b;
myTreeEle.species3.a:=S*0.33*myTreeRef.species3.a;
myTreeEle.species3.b:=S*0.35*myTreeRef.species3.b;

Optimize (paramValues, theLnLik);
nullLnLik = paramValues[1][0];

/*Initialize G, the variable that allows the species1 terminal branch to be longer*/
global G; G:>1;
myTreeEle.species1.a:=G*S*0.4*myTreeRef.species1.a;
myTreeEle.species1.b:=G*S*0.35*myTreeRef.species1.b;
myTreeEle.species2.a:=S*0.2*myTreeRef.species2.a;
myTreeEle.species2.b:=S*0.23*myTreeRef.species2.b;
myTreeEle.species3.a:=S*0.33*myTreeRef.species3.a;
myTreeEle.species3.b:=S*0.35*myTreeRef.species3.b;

Optimize (paramValues, theLnLik);

lnlikDelta = 2(nullLnLik-paramValues[1][0]);


Many thanks.

Rachel
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: accelerated sub rate in a particular lineage
Reply #5 - Feb 25th, 2009 at 9:23pm
 
Dear Rachel,

The general outline looks sensible. I have a few comments/questions.

1). HKY85RateMatrix defines an HKY model with a separate transition AND transversion rate along every tree branch (a local model). Was that intentional?

2). Where did the constants (0.4, 0.35 etc) in the constraints come from?

3). You don't need to copy all the constraints for the alternative model, only the ones that change:

Code:
myTreeEle.species1.a:=G*S*0.4*myTreeRef.species1.a;
myTreeEle.species1.b:=G*S*0.35*myTreeRef.species1.b;
 



You seem to be well on your way to setting up the analysis. Once you get it working to your satisfaction on a few test cases, I can show you how to write a script that can take partition specifications from files and writes results out to a formatted file for further processing -- something that is necessary for inclusion in pipelines.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Rachel
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 9
Re: accelerated sub rate in a particular lineage
Reply #6 - Mar 6th, 2009 at 5:28pm
 
Hello Sergei,

I've set up some test cases and am almost ready to run the analysis.  My current plan is to write a perl wrapper to run the analysis for each alignment rather than use the batch analysis scaffold because it is very easy for me to do so.  Is there an advantage to using the hyphy batch scaffold or will I do just as well with my wrapper?  

The putatively neutral reference partitions are substantially larger than the element partitions.  I am wondering if it is possible and/or necessary to weight the element and reference bases differently.  

The constants (0.4, 0.35 etc) in the constraints (called constants of proportionality)  are based on the whole genome ratio between elements and neutral reference for a particular branch.  These are meant to account for lineage specific differences in constraint (which are genome-wide).  I have not yet determined if adding these is the best course of action.    

Thanks again for your help.

Cheers,

Rachel
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: accelerated sub rate in a particular lineage
Reply #7 - Mar 9th, 2009 at 11:51am
 
Dear Rachel,

If you are comfortable with Perl, use it by all means. Take a look at Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login for an example of how you can just pipe analysis options into HyPhy instead of creating config or control files.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Rachel
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 9
Re: accelerated sub rate in a particular lineage
Reply #8 - Mar 9th, 2009 at 3:28pm
 
Sergei wrote on Mar 9th, 2009 at 11:51am:
Dear Rachel,

If you are comfortable with Perl, use it by all means. Take a look at for an example of how you can just pipe analysis options into HyPhy instead of creating config or control files.

Cheers,
Sergei


Thanks for the link.  Should be quite helpful.  

What about weighting the element and reference bases differently due the differences in length.  Is that possible/necessary?  

Cheers,

Rachel


Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: accelerated sub rate in a particular lineage
Reply #9 - Mar 9th, 2009 at 6:04pm
 
Dear Rachel,

Rachel wrote on Mar 9th, 2009 at 3:28pm:
What about weighting the element and reference bases differently due the differences in length.  Is that possible/necessary?


I don't see why weighting is necessary (unless I misunderstand what you are asking) -- you are measuring rates/branch lengths, and those are already normalized (per site).

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Rachel
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 9
Re: accelerated sub rate in a particular lineage
Reply #10 - Mar 12th, 2009 at 2:59pm
 
Just wanted to say that I think I've got everything worked out.  Thanks so much for your help. 

Rachel
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: accelerated sub rate in a particular lineage
Reply #11 - Mar 12th, 2009 at 6:24pm
 
Dear Rachel,

Great! Let me know if you have any additional questions.

Best,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged