HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> accelerated sub rate in a particular lineage
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1235090579

Message started by Rachel on Feb 19th, 2009 at 4:42pm

Title: accelerated sub rate in a particular lineage
Post by Rachel on Feb 19th, 2009 at 4:42pm
Hello Sergei,

I have a set of noncoding regions and want to look for elements that are accelerated in a particular lineage.  To do this I have taken regions surrounding each element as a reference and want to constrain all element branches to be proportional to the corresponding reference branch under the null model.  Under the alternative model, I want to add another variable to the lineage I am interested in that allows the element branch to be longer.  

I am able to constrain two branches to be proportional and I add a constant of proportionality that I have calculated from the global data set.  It looks something like this:  Element_tree.branchA.t:={constant}*Reference_tree.branchA.t

My questions are as follows:
1)      Under the alternative model, how do I estimate (or allow for) a variable, say G, that is > 1 and allows the element branch to be longer?  The formula would look something like this:  Element_tree.branchA.t:=G*{constant}*Reference_tree.branchA.t

2)      In publications doing similar analyses (I’m thinking about Bush&Lahn 2008), an additional scaling factor, S, that is a tree wide scaling factor which accounts for the fact that elements vary in their general level of conservation was estimated.  The formula for constrained branches would be as follows with this factor: Element_tree.branchA.t:=S*{constant}*Reference_tree.branchA.t.  How do I add such a factor into the analysis?

3)      Is there a way to weight the element and reference branches differently in the likelihood?  


Thanks very much in advance for your help.  

Rachel

Title: Re: accelerated sub rate in a particular lineage
Post by Sergei on Feb 19th, 2009 at 10:14pm
Dear Rachel,

Are you planning to run this analysis in batch mode on many alignments or just once or twice? In the latter case you would simply use the interface, following the general guidelines outlined in Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (especially section 2.3).

For batch analyses you would have to write a little batch file to do this, for example with ReplicateConstraint, e.g (this applies the constraint to the whole tree).

[code]

global G; G:>1;
ReplicateConstraint ("this1.?.t:=G*this2.?.t", Element_tree,Reference_tree);

[/code]

I can give you more specific guidance if you tell me about the route you were going to take (batch file vs GUI)

Cheers,
Sergei

Title: Re: accelerated sub rate in a particular lineage
Post by Rachel on Feb 20th, 2009 at 10:44am
Thanks so much for the fast response.  I will be running the analysis in batch mode on many alignments.  

Title: Re: accelerated sub rate in a particular lineage
Post by Sergei on Feb 23rd, 2009 at 7:46am
Dear Rachel,

The easiest way to implement the analysis is to use the scaffold provided by Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

Most of the work lies in modifying the scripts to read input settings; the analysis itself will be very easy to implement. Think about what input is necessary for each analysis: reference tree, lineages to test, background alignments (or a tree estimated from those), evolutionary model to use etc. If you formalize the list of inputs/outputs, I can further assist you with how to modify the pipeline scaffold.

Cheers,
Sergei

Cheers,
Sergei

Title: Re: accelerated sub rate in a particular lineage
Post by Rachel on Feb 24th, 2009 at 5:33pm
Hello Sergei,

I've formalized the list of inputs and outputs and I am hoping if you can tell me whether or not I am on the right track or if you can see any problems with my setup.  In short, I want to determine if the species1 branch is evolving at an accelerated rate.

Below is a snip from a test batch file.  The reference partition contains putatively neutral sequence, the element partition contains putatively functional noncoding sequence.   I define the expected rate of evolution through calculating a constant of proportionality external to the data.  S is a tree-wide scaling factor that accounts for varying levels of conservation and G is a variable that allows the species1 terminal branch to be longer under the alternative hypothesis.  

/*Create the partitions*/
DataSetFilter ReferenceFilter = CreateFilter(myData, 1, "0-429, 1119-1718");
DataSetFilter ElementFilter = CreateFilter(myData, 1, "430-1118");

/*Create vector of observed frequencies*/
HarvestFrequencies(obsFreqsRef, ReferenceFilter,1,1,1);
HarvestFrequencies(obsFreqsEle, ElementFilter, 1,1,1);

/*Define models and trees*/
HKY85RateMatrix = {{*,b,a,b}{b,*,b,a}{a,b,*,b}{b,a,b,*}};

Model HKY85Ref = (HKY85RateMatrix, obsFreqsRef);
Tree myTreeRef = ((species1,species2),species3);

Model HKY85Ele = (HKY85RateMatrix, obsFreqsEle);
Tree myTreeEle = ((species1,species2),species3);

LikelihoodFunction theLnLik = (ReferenceFilter, myTreeRef, ElementFilter, myTreeEle);

/* Initialize S, the tree-wide scaling factor*/
global S;

/*Set the constraints*/
myTreeEle.species1.a:=S*0.4*myTreeRef.species1.a;
myTreeEle.species1.b:=S*0.35*myTreeRef.species1.b;
myTreeEle.species2.a:=S*0.2*myTreeRef.species2.a;
myTreeEle.species2.b:=S*0.23*myTreeRef.species2.b;
myTreeEle.species3.a:=S*0.33*myTreeRef.species3.a;
myTreeEle.species3.b:=S*0.35*myTreeRef.species3.b;

Optimize (paramValues, theLnLik);
nullLnLik = paramValues[1][0];

/*Initialize G, the variable that allows the species1 terminal branch to be longer*/
global G; G:>1;
myTreeEle.species1.a:=G*S*0.4*myTreeRef.species1.a;
myTreeEle.species1.b:=G*S*0.35*myTreeRef.species1.b;
myTreeEle.species2.a:=S*0.2*myTreeRef.species2.a;
myTreeEle.species2.b:=S*0.23*myTreeRef.species2.b;
myTreeEle.species3.a:=S*0.33*myTreeRef.species3.a;
myTreeEle.species3.b:=S*0.35*myTreeRef.species3.b;

Optimize (paramValues, theLnLik);

lnlikDelta = 2(nullLnLik-paramValues[1][0]);


Many thanks.

Rachel

Title: Re: accelerated sub rate in a particular lineage
Post by Sergei on Feb 25th, 2009 at 9:23pm
Dear Rachel,

The general outline looks sensible. I have a few comments/questions.

1). HKY85RateMatrix defines an HKY model with a separate transition AND transversion rate along every tree branch (a local model). Was that intentional?

2). Where did the constants (0.4, 0.35 etc) in the constraints come from?

3). You don't need to copy all the constraints for the alternative model, only the ones that change:

[code]
myTreeEle.species1.a:=G*S*0.4*myTreeRef.species1.a;
myTreeEle.species1.b:=G*S*0.35*myTreeRef.species1.b;
[/code]

You seem to be well on your way to setting up the analysis. Once you get it working to your satisfaction on a few test cases, I can show you how to write a script that can take partition specifications from files and writes results out to a formatted file for further processing -- something that is necessary for inclusion in pipelines.

Cheers,
Sergei

Title: Re: accelerated sub rate in a particular lineage
Post by Rachel on Mar 6th, 2009 at 5:28pm
Hello Sergei,

I've set up some test cases and am almost ready to run the analysis.  My current plan is to write a perl wrapper to run the analysis for each alignment rather than use the batch analysis scaffold because it is very easy for me to do so.  Is there an advantage to using the hyphy batch scaffold or will I do just as well with my wrapper?  

The putatively neutral reference partitions are substantially larger than the element partitions.  I am wondering if it is possible and/or necessary to weight the element and reference bases differently.  

The constants (0.4, 0.35 etc) in the constraints (called constants of proportionality)  are based on the whole genome ratio between elements and neutral reference for a particular branch.  These are meant to account for lineage specific differences in constraint (which are genome-wide).  I have not yet determined if adding these is the best course of action.    

Thanks again for your help.

Cheers,

Rachel

Title: Re: accelerated sub rate in a particular lineage
Post by Sergei on Mar 9th, 2009 at 11:51am
Dear Rachel,

If you are comfortable with Perl, use it by all means. Take a look at Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login for an example of how you can just pipe analysis options into HyPhy instead of creating config or control files.

Cheers,
Sergei

Title: Re: accelerated sub rate in a particular lineage
Post by Rachel on Mar 9th, 2009 at 3:28pm

Sergei wrote on Mar 9th, 2009 at 11:51am:
Dear Rachel,

If you are comfortable with Perl, use it by all means. Take a look at for an example of how you can just pipe analysis options into HyPhy instead of creating config or control files.

Cheers,
Sergei


Thanks for the link.  Should be quite helpful.  

What about weighting the element and reference bases differently due the differences in length.  Is that possible/necessary?  

Cheers,

Rachel



Title: Re: accelerated sub rate in a particular lineage
Post by Sergei on Mar 9th, 2009 at 6:04pm
Dear Rachel,


Rachel wrote on Mar 9th, 2009 at 3:28pm:
What about weighting the element and reference bases differently due the differences in length.  Is that possible/necessary?


I don't see why weighting is necessary (unless I misunderstand what you are asking) -- you are measuring rates/branch lengths, and those are already normalized (per site).

Cheers,
Sergei

Title: Re: accelerated sub rate in a particular lineage
Post by Rachel on Mar 12th, 2009 at 2:59pm
Just wanted to say that I think I've got everything worked out.  Thanks so much for your help.  

Rachel

Title: Re: accelerated sub rate in a particular lineage
Post by Sergei on Mar 12th, 2009 at 6:24pm
Dear Rachel,

Great! Let me know if you have any additional questions.

Best,
Sergei

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.