Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Relative Ratio Test and 2 nt. subs. models (Read 1766 times)
David_Soria
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 1
Relative Ratio Test and 2 nt. subs. models
Sep 10th, 2005 at 11:15am
 
Hello,
I want to test whether two non-coding regions from different genomes are evolving in a proportional fashion using only 3 taxa. In order to do that I am implementing the Relative Ratio Test in Hyphy. However, the nucleotide substitution model  for one of the region is F81 while for the other one is K81uf. In Hyphy only one nt. substitution model can be used under the Reltive Ratio Test. If I select the F81 the RRT is not statistically significant but for the K81uf the RRT has a p-value <0,01.
My first question is whether is possible to implement the RRT using two different subs model. Secondly, if it is not possible which substitution model I should use (the F81 or the K81uf).
Thank you very much.
David
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Relative Ratio Test and 2 nt. subs. models
Reply #1 - Sep 11th, 2005 at 11:53am
 
Dear David,

It is possible to perform a Rel.Ratio test with two different models, but not through the standard analysis.

You can either use the GUI (see Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login for some pointers) or write a custom batch file. You should be able to use the batch file below to run the test you want to try (F81 and K81uf). It should also be fairly straightforward to modify the file with other models (e.g. HKY85 and REV etc).

Code:
/* This is an example HY-PHY Batch File.

   It reads in two  nucleotide datasets (assumed to share the same tree)
   and performs the relative ratio test on a tree, using F81 model on the
   first partition and K81uf on the second. The likelihood ratio statistic
   is evaluated and the P-value for the test is reported.

   Sergei L. Kosakovsky Pond
   September 2005

*/



/* 1. Read in the data and store the result in  DataSet variables. Also prompt the user
for a tree (assumed to be in the first data file), by including a stanard queryTree.bf module*/


SetDialogPrompt ("Locate the first nucleotide sequence file:");
DataSet 		nucleotideSequence1 = ReadDataFile (PROMPT_FOR_FILE);

incFileName = HYPHY_BASE_DIRECTORY+"TemplateBatchFiles"+DIRECTORY_SEPARATOR+"queryTree.bf";
ExecuteCommands  ("#include \""+incFileName+"\";");

SetDialogPrompt ("Locate the second nucleotide sequence file:");
DataSet 		nucleotideSequence2 = ReadDataFile (PROMPT_FOR_FILE);

if (nucleotideSequence1.species != nucleotideSequence2.species)
{
	fprintf (stdout, "\nERROR: Both data sets must have the same number of sequences\n");
	return 0;
}

/* 2. Filter the data, specifying that all of the data is to be used
	  and that it is to be treated as nucleotides. */



DataSetFilter	filteredData1 = CreateFilter (nucleotideSequence1,1);
DataSetFilter	filteredData2 = CreateFilter (nucleotideSequence2,1);



/* 3. Collect observed nucleotide frequencies from the filtered data. observedFreqs will
	  store the vector of frequencies. */



HarvestFrequencies (observedFreqs1, filteredData1, 1, 1, 1);
HarvestFrequencies (observedFreqs2, filteredData2, 1, 1, 1);

/* 4. Define the F81 and K81uf substitution matrix. '*' is defined to be -(sum of off-diag row elements) */

F81RateMatrix    =  {{*,t,t,t}{t,*,t,t}{t,t,*,t}{t,t,t,*}};

global			 R_AC = 1; /* Relative to R_AG = 1*/
global			 R_AT = 1;

K81ufRateMatrix  =  {{*,R_AC*t,t,R_AT*t}{R_AC*t,*,R_AT*t,t}{t,R_AT*t,*,R_AC*t}{R_AT*t,t,R_AC*t,*}};


/*5.  Define the models models, by combining the substitution matrix with the vector of observed
	  (equilibrium) frequencies. */


Model 	F81_Model   = (F81RateMatrix , observedFreqs1);
Model   K81uf_Model = (K81ufRateMatrix , observedFreqs2);


/*6.  Now we can define 2 trees - one for each data block. We use appropriate models for each one.
	  treeString is defined by 'queryTree.bf' */

TRY_NUMERIC_SEQUENCE_MATCH = 1;

UseModel (F81_Model);
Tree	tree_1 = treeString;

UseModel (K81uf_Model);
Tree	tree_2 = treeString;



/*7.  Since all the likelihood function ingredients (data, tree, equilibrium frequencies)
	  have been defined we are ready to construct the likelihood function. We
	  combine both datasets into one likelihood function. */

LikelihoodFunction  theLnLik = (filteredData1, tree_1,filteredData2, tree_2);



/*8.  Maximize the likelihood function, storing parameter values in the matrix paramValues.
	  We also store the resulting ln-lik and the number of model parameters. */


Optimize (paramValues, theLnLik);
unconstrainedLnLik = paramValues[1][0];
paramCount = paramValues[1][1] + 6	;


/*9.  Print the tree with optimal branch lengths to the console. */
/*    also print the ratio in branch lengths between matching branches */

fprintf  (stdout, "\n1). UNCONSTRAINED MODEL:", theLnLik);

bl1 = BranchLength (tree_1, -1);
bl2 = BranchLength (tree_2, -1);
bn  = BranchName   (tree_1, -1);

fprintf (stdout, "\n\nBranch lengths report\n\nLength in Tree 1, Length in Tree 2, Ratio, Branch Name\n");

for (k=0; k<Columns (bn)-1; k=k+1)
{
	fprintf (stdout, Format (bl1[k], 16, 6), "," , Format (bl2[k], 17, 6), ", ", Format(bl1[k]/bl2[k],6,3), ", ", bn[k], "\n");
}

/*10. We now constrain the rate of evolution along each branch to be proportional in both trees.
	  R will represent the ratio. We use ReplicateConstraint to automatically
	  attach the same constraint to all branches of the tree.
	  F*/

global REL_R = 1;
ReplicateConstraint ("this1.?.t:=REL_R*this2.?.t", tree_2, tree_1);
Optimize (paramValues, theLnLik);


/*11. Now we compute the ln-lik ratio statistic and the P-Value, using the Chi^2 dist'n
	  with an appropriate degree of freedom. */

lnlikDelta = 2 (unconstrainedLnLik-paramValues[1][0]);
pValue = 1-CChi2 (lnlikDelta, paramCount - paramValues[1][1]);

fprintf (stdout, "\n\n1). Relative ratio constraint\n\nLikelihood Ratio Statistic = ", lnLikDelte, "\nP-value = ", pValue, "\n", theLnLik);

bl1 = BranchLength (tree_1, -1);
bl2 = BranchLength (tree_2, -1);
bn  = BranchName   (tree_1, -1);

fprintf (stdout, "\n\nBranch lengths report\n\nLength in Tree 1, Length in Tree 2, Ratio, Branch Name\n");

for (k=0; k<Columns (bn)-1; k=k+1)
{
	fprintf (stdout, Format (bl1[k], 16, 6), "," , Format (bl2[k], 17, 6), ", ", Format(bl1[k]/bl2[k],6,3), ", ", bn[k], "\n");
}
 

Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged