Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
constraining variables twice? (Read 4067 times)
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
constraining variables twice?
Jan 20th, 2006 at 7:45am
 
Hi - i'm trying to model an alignment of four genes, but assuming a tree and branch lengths, with a separate GTR model for each partition. To do this, I want to constrain the overall mean rate to be equal to the branch lengths of the trees I give it.. This all seems to work fine, but I can't work out how to constrain the mean rate of my four partitions.

The rates of the four partitions are given by gene1rate, gene2rate etc., and I want to constrain the sum of rate*(numsite/totalsites) to be one.
I initially tried to do the following, but I think I realise what i'm doing, rather than constraining the sum of meanrates to be 1.0, the line overallrate := 1.0 clears the dependency of overallrate := meanrate1 + meanrate2 etc. and instead enforces that overallrate is 1! This seems obvious in retrospect.

How should I construct this slightly complicated constraint, given that the := operator only allows a single variable on the LHS? I'm sure there's an obvious way to do it, but i can't work it out.. I'm guessing you've come across similar problems and will see it straight away!

Thanks,
James

Code:
DataSetFilter filteredData12S = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][0]);
DataSetFilter filteredDataPOS1 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][1]);
DataSetFilter filteredDataPOS2 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][2]);
DataSetFilter filteredDataPOS3 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][3]);


ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES   = 1;

global gene1rate = 1;
global gene2rate = 1;
global gene3rate = 1;
global gene4rate = 1;

/*constrain all these to 1.*/
global totalsize := filteredData12S.sites + filteredDataPOS1.sites + filteredDataPOS2.sites + filteredDataPOS3.sites;
fprintf (stdout,"DATASET IS ",totalsize," CHARS\n");
fprintf (stdout,"PARTITIONS ARE ",filteredData12S.sites," ",filteredDataPOS1.sites," ",filteredDataPOS2.sites," ",filteredDataPOS3.sites,"\n");
global meanrate1 := gene1rate * (filteredData12S.sites / totalsize);
global meanrate2 := gene2rate * (filteredDataPOS1.sites / totalsize);
global meanrate3 := gene3rate * (filteredDataPOS2.sites / totalsize);
global meanrate4 := gene4rate * (filteredDataPOS3.sites / totalsize);

global overallrate := meanrate1 + meanrate2 + meanrate3 + meanrate4;
global overallrate := 1.0;
 

Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining variables twice?
Reply #1 - Jan 20th, 2006 at 9:06am
 
Whoops - its okay.. i've worked it out - I was being slightly dense - not difficult once i'd realised there are only really 3 rate parameters because of the constraints.. Just in case anyone else wants to do something similar and are having a stupid day, I've done this, and it seems to work OK.

There was another problem, in that HYPHY was trying to optimise the variables called things like filteredData12S.site - is there a way to avoid this, perhaps using the executeCommand function you've shown me before?

Code:
DataSetFilter filteredData12S = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][0]);
DataSetFilter filteredDataPOS1 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][1]);
DataSetFilter filteredDataPOS2 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][2]);
DataSetFilter filteredDataPOS3 = CreateFilter (ds,1,DATA_FILE_PARTITION_MATRIX[1][3]);


ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES   = 1;

global gene1rate = 1;
global gene2rate = 1;
global gene3rate = 1;
global gene4rate = 1;

/*constrain all these to 1.*/
global totalsize := 1141;
fprintf (stdout,"DATASET IS ",totalsize," CHARS\n");
fprintf (stdout,"PARTITIONS ARE 383 254 254 250\n");
global meanrate1 := gene1rate * (383 / 1141);
global meanrate2 := gene2rate * (254 / 1141);
global meanrate3 := gene3rate * (254 / 1141);
global meanrate4 := gene4rate * (250 / 1141);

global overallrate := meanrate1 + meanrate2 + meanrate3 + meanrate4;
/*global overallrate := 1.0;*/

global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * (1141 / 383);

ACCEPT_BRANCH_LENGTHS =
/*overallrate no longer really needed, just here to check*/
global overallrate := meanrate1 + meanrate2 + meanrate3 + meanrate4;
/*global overallrate := 1.0;*/

global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * (totalsize / filteredData12S.sites);
 

Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: constraining variables twice?
Reply #2 - Jan 20th, 2006 at 9:08am
 
Dear James,

Perhaps the easiest thing to do is the following.

1). Define the sum of all the variables you want scaled to 1.
Code:
global myRate1 = 1;
...
global myRate4 = 1;
global sum_var:= (myRate1*mySites1+...+myRate4*mySites4)/totalSites;
 



2). Next, define four scaled variables:
Code:
global scaledRate1 := myRate1 / sum_var;
...
global scaledRate4 := myRate4 / sum_var;
 



3). Lastly, define your models in terms of scaledRate variables.

The only caveat: HyPhy is going to overcount the number of parameters by one, with this parameterization; so when you do LRT or AIC (or similar) calculations, be sure to correct for that.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining variables twice?
Reply #3 - Jan 20th, 2006 at 9:20am
 
Thanks, Sergei.. If I do it my way (which I think is equivalent but less pretty), will Hyphy also overestimate the parameter count? Is there a simple explanation of why it does, so I can try and spot it in future.

Thanks again for the fantastic support you provide for Hyphy!

Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: constraining variables twice?
Reply #4 - Jan 20th, 2006 at 9:42am
 
Dear James,

Quote:
There was another problem, in that HYPHY was trying to optimise the variables called things like filteredData12S.site - is there a way to avoid this, perhaps using the executeCommand function you've shown me before?

...snip...

Code:
global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * (totalsize / filteredData12S.sites);
 




When HyPhy scans the definition for gene1rate, it will include all other variables in the formula as likelihood function parameters, even though they may have no effect on the likelihood function (HyPhy doesn't really have a certain way of determining, without too much effort, which parameters are 'real' and which are 'ghosts'). If, on the other hand, you write:

Code:
global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * (totalsize__ / filteredData12S.sites__);
 



HyPhy will interpret the (I think it may not be well documented) double underscore suffix (__) as the instruction to instead insert current values to totalsize filteredData12S.sites in the formula. This would be the same (but more compact) than writing

Code:
ExecuteCommands("global gene1rate := (1.0 - (meanrate2 + meanrate3 + meanrate4)) * ("+totalsize+" /"+ filteredData12S.sites+");");
 



Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: constraining variables twice?
Reply #5 - Jan 20th, 2006 at 9:50am
 
Dear James,

Quote:
Thanks, Sergei.. If I do it my way (which I think is equivalent but less pretty), will Hyphy also overestimate the parameter count? Is there a simple explanation of why it does, so I can try and spot it in future.


Your parameterization is indeed equivalent. The only difference, is that, in your case, (1-meanrate1-meanrate2-meanrate3) could become negative, since meanrate1+meanrate2+meanrate3 could exceed one, because their upper bounds are not constrained. This is usually not an issue, because HyPhy will automatically enforce non-negative values on all likelihood parameters (unless explicitly overridden), by heavily penalizing (subtracting something like 10^100 from the likelihood score) if any parameter values become negative.

My parameterization avoids the <0 issue, which tends to speed up evaluations and help optimization routines not get trapped in odd parts of the parameter space.

The reason HyPhy overcounts the parameters in my parametrization, is because I impose a complex (i.e. not of the form varX:=expression) constraint implicitly. HyPhy will scan four "independent" variables (myRate1,...,myRate4), without being able to automatically determine that these four parameters are bound by an implicit constraint (weighted mean of one).

In your parametrization, on the other hand, all constraints are explicit, so the number of parameters will be counted correctly.

HTH,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining variables twice?
Reply #6 - Jan 20th, 2006 at 10:14am
 
Thanks again Sergei - your explanation makes perfect sense!

I was slightly alarmed when you mentioned the parameter counting problem, in case its an issue for other scripts i've written, but I don't think my mind works that way!
Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged