HyPhy message board - Relative Ratio Test and 2 nt. subs. models

	Welcome, Guest. Please Login

Home

Help

HyPhy message board › Methodology Questions › How to › Relative Ratio Test and 2 nt. subs. models

(Moderators: Sergei, Simon)

‹ Previous Topic | Next Topic ›

Pages: 1

Send Topic

Relative Ratio Test and 2 nt. subs. models (Read 1766 times)

David_Soria

YaBB Newbies

Offline

I love YaBB 1G - SP1!

Posts: 1

Relative Ratio Test and 2 nt. subs. models
Sep 10^th, 2005 at 11:15am

Hello,
I want to test whether two non-coding regions from different genomes are evolving in a proportional fashion using only 3 taxa. In order to do that I am implementing the Relative Ratio Test in Hyphy. However, the nucleotide substitution model for one of the region is F81 while for the other one is K81uf. In Hyphy only one nt. substitution model can be used under the Reltive Ratio Test. If I select the F81 the RRT is not statistically significant but for the K81uf the RRT has a p-value <0,01.
My first question is whether is possible to implement the RRT using two different subs model. Secondly, if it is not possible which substitution model I should use (the F81 or the K81uf).
Thank you very much.
David

IP Logged

Sergei

YaBB Administrator

Offline

Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male

Re: Relative Ratio Test and 2 nt. subs. models
Reply #1 - Sep 11^th, 2005 at 11:53am

Dear David,

It is possible to perform a Rel.Ratio test with two different models, but not through the standard analysis.

You can either use the GUI (see Multimedia File Viewing and Clickable Links are available for Registered Members only!! You need to

Login for some pointers) or write a custom batch file. You should be able to use the batch file below to run the test you want to try (F81 and K81uf). It should also be fairly straightforward to modify the file with other models (e.g. HKY85 and REV etc).

Code:

/* This is an example HY-PHY Batch File.

   It reads in two  nucleotide datasets (assumed to share the same tree)
   and performs the relative ratio test on a tree, using F81 model on the
   first partition and K81uf on the second. The likelihood ratio statistic
   is evaluated and the P-value for the test is reported.

   Sergei L. Kosakovsky Pond
   September 2005

*/



/* 1. Read in the data and store the result in  DataSet variables. Also prompt the user
for a tree (assumed to be in the first data file), by including a stanard queryTree.bf module*/


SetDialogPrompt ("Locate the first nucleotide sequence file:");
DataSet 		nucleotideSequence1 = ReadDataFile (PROMPT_FOR_FILE);

incFileName = HYPHY_BASE_DIRECTORY+"TemplateBatchFiles"+DIRECTORY_SEPARATOR+"queryTree.bf";
ExecuteCommands  ("#include \""+incFileName+"\";");

SetDialogPrompt ("Locate the second nucleotide sequence file:");
DataSet 		nucleotideSequence2 = ReadDataFile (PROMPT_FOR_FILE);

if (nucleotideSequence1.species != nucleotideSequence2.species)
{
	fprintf (stdout, "\nERROR: Both data sets must have the same number of sequences\n");
	return 0;
}

/* 2. Filter the data, specifying that all of the data is to be used
	  and that it is to be treated as nucleotides. */



DataSetFilter	filteredData1 = CreateFilter (nucleotideSequence1,1);
DataSetFilter	filteredData2 = CreateFilter (nucleotideSequence2,1);



/* 3. Collect observed nucleotide frequencies from the filtered data. observedFreqs will
	  store the vector of frequencies. */



HarvestFrequencies (observedFreqs1, filteredData1, 1, 1, 1);
HarvestFrequencies (observedFreqs2, filteredData2, 1, 1, 1);

/* 4. Define the F81 and K81uf substitution matrix. '*' is defined to be -(sum of off-diag row elements) */

F81RateMatrix    =  {{*,t,t,t}{t,*,t,t}{t,t,*,t}{t,t,t,*}};

global			 R_AC = 1; /* Relative to R_AG = 1*/
global			 R_AT = 1;

K81ufRateMatrix  =  {{*,R_AC*t,t,R_AT*t}{R_AC*t,*,R_AT*t,t}{t,R_AT*t,*,R_AC*t}{R_AT*t,t,R_AC*t,*}};


/*5.  Define the models models, by combining the substitution matrix with the vector of observed
	  (equilibrium) frequencies. */


Model 	F81_Model   = (F81RateMatrix , observedFreqs1);
Model   K81uf_Model = (K81ufRateMatrix , observedFreqs2);


/*6.  Now we can define 2 trees - one for each data block. We use appropriate models for each one.
	  treeString is defined by 'queryTree.bf' */

TRY_NUMERIC_SEQUENCE_MATCH = 1;

UseModel (F81_Model);
Tree	tree_1 = treeString;

UseModel (K81uf_Model);
Tree	tree_2 = treeString;



/*7.  Since all the likelihood function ingredients (data, tree, equilibrium frequencies)
	  have been defined we are ready to construct the likelihood function. We
	  combine both datasets into one likelihood function. */

LikelihoodFunction  theLnLik = (filteredData1, tree_1,filteredData2, tree_2);



/*8.  Maximize the likelihood function, storing parameter values in the matrix paramValues.
	  We also store the resulting ln-lik and the number of model parameters. */


Optimize (paramValues, theLnLik);
unconstrainedLnLik = paramValues[1][0];
paramCount = paramValues[1][1] + 6	;


/*9.  Print the tree with optimal branch lengths to the console. */
/*    also print the ratio in branch lengths between matching branches */

fprintf  (stdout, "\n1). UNCONSTRAINED MODEL:", theLnLik);

bl1 = BranchLength (tree_1, -1);
bl2 = BranchLength (tree_2, -1);
bn  = BranchName   (tree_1, -1);

fprintf (stdout, "\n\nBranch lengths report\n\nLength in Tree 1, Length in Tree 2, Ratio, Branch Name\n");

for (k=0; k<Columns (bn)-1; k=k+1)
{
	fprintf (stdout, Format (bl1[k], 16, 6), "," , Format (bl2[k], 17, 6), ", ", Format(bl1[k]/bl2[k],6,3), ", ", bn[k], "\n");
}

/*10. We now constrain the rate of evolution along each branch to be proportional in both trees.
	  R will represent the ratio. We use ReplicateConstraint to automatically
	  attach the same constraint to all branches of the tree.
	  F*/

global REL_R = 1;
ReplicateConstraint ("this1.?.t:=REL_R*this2.?.t", tree_2, tree_1);
Optimize (paramValues, theLnLik);


/*11. Now we compute the ln-lik ratio statistic and the P-Value, using the Chi^2 dist'n
	  with an appropriate degree of freedom. */

lnlikDelta = 2 (unconstrainedLnLik-paramValues[1][0]);
pValue = 1-CChi2 (lnlikDelta, paramCount - paramValues[1][1]);

fprintf (stdout, "\n\n1). Relative ratio constraint\n\nLikelihood Ratio Statistic = ", lnLikDelte, "\nP-value = ", pValue, "\n", theLnLik);

bl1 = BranchLength (tree_1, -1);
bl2 = BranchLength (tree_2, -1);
bn  = BranchName   (tree_1, -1);

fprintf (stdout, "\n\nBranch lengths report\n\nLength in Tree 1, Length in Tree 2, Ratio, Branch Name\n");

for (k=0; k<Columns (bn)-1; k=k+1)
{
	fprintf (stdout, Format (bl1[k], 16, 6), "," , Format (bl2[k], 17, 6), ", ", Format(bl1[k]/bl2[k],6,3), ", ", bn[k], "\n");
}

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego

WWW

IP Logged

Pages: 1

Send Topic

‹ Previous Topic | Next Topic ›

« Home

‹ Board

Top of this page