HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> constraining variables twice?
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1137771909

Message started by jamescotton on Jan 20th, 2006 at 7:45am

Title: constraining variables twice?
Post by jamescotton on Jan 20th, 2006 at 7:45am
Hi - i'm trying to model an alignment of four genes, but assuming a tree and branch lengths, with a separate GTR model for each partition. To do this, I want to constrain the overall mean rate to be equal to the branch lengths of the trees I give it.. This all seems to work fine, but I can't work out how to constrain the mean rate of my four partitions.

The rates of the four partitions are given by gene1rate, gene2rate etc., and I want to constrain the sum of rate*(numsite/totalsites) to be one.
I initially tried to do the following, but I think I realise what i'm doing, rather than constraining the sum of meanrates to be 1.0, the line overallrate := 1.0 clears the dependency of overallrate := meanrate1 + meanrate2 etc. and instead enforces that overallrate is 1! This seems obvious in retrospect.

How should I construct this slightly complicated constraint, given that the := operator only allows a single variable on the LHS? I'm sure there's an obvious way to do it, but i can't work it out.. I'm guessing you've come across similar problems and will see it straight away!

Thanks,
James


Code (]
DataSetFilter filteredData12S = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1):

[0]);
DataSetFilter filteredDataPOS1 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][1]);
DataSetFilter filteredDataPOS2 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][2]);
DataSetFilter filteredDataPOS3 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][3]);


ACCEPT_BRANCH_LENGTHS = 1;  
ACCEPT_ROOTED_TREES   = 1;  

global gene1rate = 1;  
global gene2rate = 1;
global gene3rate = 1;
global gene4rate = 1;

/*constrain all these to 1.*/
global totalsize := filteredData12S.sites + filteredDataPOS1.sites + filteredDataPOS2.sites + filteredDataPOS3.sites;
fprintf (stdout,"DATASET IS ",totalsize," CHARS\n");
fprintf (stdout,"PARTITIONS ARE ",filteredData12S.sites," ",filteredDataPOS1.sites," ",filteredDataPOS2.sites," ",filteredDataPOS3.sites,"\n");
global meanrate1 := gene1rate * (filteredData12S.sites / totalsize);
global meanrate2 := gene2rate * (filteredDataPOS1.sites / totalsize);
global meanrate3 := gene3rate * (filteredDataPOS2.sites / totalsize);
global meanrate4 := gene4rate * (filteredDataPOS3.sites / totalsize);

global overallrate := meanrate1 + meanrate2 + meanrate3 + meanrate4;
global overallrate := 1.0;

Title: Re: constraining variables twice?
Post by jamescotton on Jan 20th, 2006 at 9:06am
Whoops - its okay.. i've worked it out - I was being slightly dense - not difficult once i'd realised there are only really 3 rate parameters because of the constraints.. Just in case anyone else wants to do something similar and are having a stupid day, I've done this, and it seems to work OK.

There was another problem, in that HYPHY was trying to optimise the variables called things like filteredData12S.site - is there a way to avoid this, perhaps using the executeCommand function you've shown me before?


Code (]
DataSetFilter filteredData12S = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1):

[0]);
DataSetFilter filteredDataPOS1 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][1]);
DataSetFilter filteredDataPOS2 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][2]);
DataSetFilter filteredDataPOS3 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][3]);


ACCEPT_BRANCH_LENGTHS = 1;  
ACCEPT_ROOTED_TREES   = 1;  

global gene1rate = 1;  
global gene2rate = 1;
global gene3rate = 1;
global gene4rate = 1;

/*constrain all these to 1.*/
global totalsize := 1141;
fprintf (stdout,"DATASET IS ",totalsize," CHARS\n");
fprintf (stdout,"PARTITIONS ARE 383 254 254 250\n");
global meanrate1 := gene1rate * (383 / 1141);
global meanrate2 := gene2rate * (254 / 1141);
global meanrate3 := gene3rate * (254 / 1141);
global meanrate4 := gene4rate * (250 / 1141);

global overallrate := meanrate1 + meanrate2 + meanrate3 + meanrate4;
/*global overallrate := 1.0;*/

global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * (1141 / 383);

ACCEPT_BRANCH_LENGTHS =
/*overallrate no longer really needed, just here to check*/
global overallrate := meanrate1 + meanrate2 + meanrate3 + meanrate4;
/*global overallrate := 1.0;*/

global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * (totalsize / filteredData12S.sites);

Title: Re: constraining variables twice?
Post by Sergei on Jan 20th, 2006 at 9:08am
Dear James,

Perhaps the easiest thing to do is the following.

1). Define the sum of all the variables you want scaled to 1.

Code (]
global myRate1 = 1;
...
global myRate4 = 1;
global sum_var:= (myRate1*mySites1+...+myRate4*mySites4)/totalSites;
[/code):



2). Next, define four scaled variables:
[code]
global scaledRate1 := myRate1 / sum_var;
...
global scaledRate4 := myRate4 / sum_var;


3). Lastly, define your models in terms of scaledRate variables.

The only caveat: HyPhy is going to overcount the number of parameters by one, with this parameterization; so when you do LRT or AIC (or similar) calculations, be sure to correct for that.

Cheers,
Sergei

Title: Re: constraining variables twice?
Post by jamescotton on Jan 20th, 2006 at 9:20am
Thanks, Sergei.. If I do it my way (which I think is equivalent but less pretty), will Hyphy also overestimate the parameter count? Is there a simple explanation of why it does, so I can try and spot it in future.

Thanks again for the fantastic support you provide for Hyphy!


Title: Re: constraining variables twice?
Post by Sergei on Jan 20th, 2006 at 9:42am
Dear James,


wrote on Jan 20th, 2006 at 9:06am:
There was another problem, in that HYPHY was trying to optimise the variables called things like filteredData12S.site - is there a way to avoid this, perhaps using the executeCommand function you've shown me before?

...snip...


Code (]
global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * (totalsize / filteredData12S.sites);
[/code):



When HyPhy scans the definition for gene1rate, it will include all other variables in the formula as likelihood function parameters, even though they may have no effect on the likelihood function (HyPhy doesn't really have a certain way of determining, without too much effort, which parameters are 'real' and which are 'ghosts'). If, on the other hand, you write:

[code]
global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * (totalsize__ / filteredData12S.sites__);


HyPhy will interpret the (I think it may not be well documented) double underscore suffix (__) as the instruction to instead insert current values to totalsize filteredData12S.sites in the formula. This would be the same (but more compact) than writing

[code]
ExecuteCommands("global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * ("+totalsize+" /"+ filteredData12S.sites+");");
[/code]

Cheers,
Sergei

Title: Re: constraining variables twice?
Post by Sergei on Jan 20th, 2006 at 9:50am
Dear James,


wrote on Jan 20th, 2006 at 9:20am:
Thanks, Sergei.. If I do it my way (which I think is equivalent but less pretty), will Hyphy also overestimate the parameter count? Is there a simple explanation of why it does, so I can try and spot it in future.


Your parameterization is indeed equivalent. The only difference, is that, in your case, (1-meanrate1-meanrate2-meanrate3) could become negative, since meanrate1+meanrate2+meanrate3 could exceed one, because their upper bounds are not constrained. This is usually not an issue, because HyPhy will automatically enforce non-negative values on all likelihood parameters (unless explicitly overridden), by heavily penalizing (subtracting something like 10^100 from the likelihood score) if any parameter values become negative.

My parameterization avoids the <0 issue, which tends to speed up evaluations and help optimization routines not get trapped in odd parts of the parameter space.

The reason HyPhy overcounts the parameters in my parametrization, is because I impose a complex (i.e. not of the form varX:=expression) constraint implicitly. HyPhy will scan four "independent" variables (myRate1,...,myRate4), without being able to automatically determine that these four parameters are bound by an implicit constraint (weighted mean of one).

In your parametrization, on the other hand, all constraints are explicit, so the number of parameters will be counted correctly.

HTH,
Sergei

Title: Re: constraining variables twice?
Post by jamescotton on Jan 20th, 2006 at 10:14am
Thanks again Sergei - your explanation makes perfect sense!

I was slightly alarmed when you mentioned the parameter counting problem, in case its an issue for other scripts i've written, but I don't think my mind works that way!

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.