Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
KH tests (Read 5131 times)
ince
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 4
KH tests
Apr 1st, 2008 at 8:25am
 
Hello all. I'm a new HyPhy user with two questions:

1)I am trying to use GARD to analyze an alignment that is 3.5 kb and it seems the run-time required exceeds the time allotted, so the analysis doesn't run to completion. Is there a way around this without removing data?

2)I am also using HYPHY to do KH tests on a different data set (not trees generated from GARD). Hyphy reads the alignment and tree files fine, but I am not able to do bootstrap reps because I get an error: Operation MAccess is not defined for 0, Current BL Command: jvec[1][k]=Log(vec2[k]). I do not get this message all the time. What should I try changing? Also is there an obvious reason why I might get a negative LRT when doing a KH test?

Many thanks.


Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: KH tests
Reply #1 - Apr 1st, 2008 at 11:36am
 
Hi Ince,

How many sequences are you trying to analyze with GARD?  Our server has been getting hammered this week because Sergei's teaching some workshops in South Africa. 

An "Operation MAccess" error means that you're trying to index into a matrix outside of its dimensions.  Your example suggests to me that you might be indexing into a vector, but because HyPhy is based on zero-indexing, you should write:
Code:
jvec[0][k] = Log(vec2[k]);
 


The fact that this message doesn't always occur is probably because HyPhy retrieves a useable value from the memory address despite being outside of the matrix.

- Art.
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: KH tests
Reply #2 - Apr 1st, 2008 at 11:44am
 
Oops, spoke too soon  Embarrassed
Now that I'm actually looking over the KHTest.bf batch file, I can see that what's probably going on is that an empty vector is being passed to the function testLRT in place of vec2.  Something's up with the alignment partition and/or trees that you're running the test on.  It would make my life a lot easier if I could replicate the error; can you send your data files to my e-mail?

Best,
- Art.
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: KH tests
Reply #3 - Apr 1st, 2008 at 4:23pm
 
Hi Ince,

Okay, I've figured out the problem.  There was a bug in the template batch file KHtest.bf where the category matrix used to resample sites from the alignment was mistakenly being assigned to the wrong variable (v1 instead of v2), causing the matrix access error.  You need to replace lines 240-247 with the following:

Code:
	LikelihoodFunction lf1 = (filteredData,firstTree);
	Optimize (res1, lf1);
	ConstructCategoryMatrix (v1,lf1,COMPLETE);
	fprintf (stdout, "\n\n1). FITTING TREE 1 TO THE DATA\n", lf1);
	LikelihoodFunction lf2 = (filteredData,secondTree);
	Optimize (res2, lf2);
	ConstructCategoryMatrix (v2,lf2,COMPLETE);  /* corrected matrix assignment */
	fprintf (stdout, "\n\n2). FITTING TREE 2 TO THE DATA1\n", lf2);
 



You also need to make sure that you select the tree estimated from your data FIRST, and then your alternative tree SECOND.  Otherwise, your likelihood ratio test (LRT) statistic will usually be negative  In other words, we expect that a tree estimated from the data should do a better job (in re-fitting a substitution model to a similar data set) then some arbitrary tree.

Best,
- Art.
Back to top
 
 
IP Logged
 
ince
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 4
Re: KH tests
Reply #4 - Apr 4th, 2008 at 7:43am
 
Art, The fix seems to have worked. However, when I choose more than 2 rate categories, the program seems to choke. I'll do  100 bootstraps and it will give me an LRT and a p value, but then it freezes. I am not sure that the p value (which is always 0.000, but I'm not sure it shouldn't always be so small) is correct in these cases. Any thoughts?

thanks

Will
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: KH tests
Reply #5 - Apr 4th, 2008 at 10:24am
 
Hi Ince,

After a quick look over KHTest.bf, my hunch is that the bootstrap procedure won't work properly on models with rate variation.  This is because the function is expecting to take vector arguments (per-site likelihoods), rather than the full matrix of category-conditional site likelihoods that would be generated for a model with rate variation.

I'm going to confirm this for myself, and then I'll try to come up with a fix if I'm correct.

- Art.
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: KH tests
Reply #6 - Apr 4th, 2008 at 11:43am
 
Huzzah, I fixed it!  Grin

Paste the following code to replace lines 238-280 (approximate) in KHTest.bf:

Code:
else	/* full alignment */
{
	LikelihoodFunction lf1 = (filteredData,firstTree);
	Optimize (res1, lf1);
	ConstructCategoryMatrix (v1,lf1,COMPLETE);
	fprintf (stdout, "\n\n1). FITTING TREE 1 TO THE DATA\n", lf1);

	if (distribution)
	{
		GetInformation (distInfo1, c);
		props1 = {{0,1}} * distInfo1;
	}

	LikelihoodFunction lf2 = (filteredData,secondTree);
	Optimize (res2, lf2);
	ConstructCategoryMatrix (v2,lf2,COMPLETE);
	fprintf (stdout, "\n\n2). FITTING TREE 2 TO THE DATA1\n", lf2);

	if (distribution)
	{
		GetInformation (distInfo2, c);
		props2 = {{0,1}} * distInfo2;
	}

	LRT1 = 2*(res1[1][0]-res2[1][0]);

	fprintf (stdout, "\n\nSUMMARY:\n\n\tLRT = ",LRT1, "\n");
	if (LRT1 < 0)
	{
		fprintf (stdout, "\n\nERROR: The LRTs was expeceted to be positive.\nPlease check your trees and alignemt");
		return 0;
	}

	test1 = {itCount, 1};

	if (distribution)
	{
		test1 = testLRT2 (v1,v2,props1,props2);
	}
	else
	{
		test1 = testLRT (v1,v2)%0;	/* % sorts matrix on column 0 */
	}
 



and then add the following to the end of the file:

Code:
function testLRT2 (vec1, vec2, props1, props2)
{
	size1 = Columns(vec1);	/*  number of sites */
	nclasses1 = Columns(props1);

	sumVec1 = {size1,1};
	jvec	= {2,size1};
	resMx1	= {itCount,1};	/* itCount = number of bootstrap replicates */
	resMx2	= {itCount,1};

	for (k=0; k<size1; k=k+1)
	{
		sumVec1 [k]	   = 1;

		for (j = 0; j < nclasses1; j = j+1)
		{
			jvec [0][k] = jvec[0][k] + props1[j] * vec1[j][k];
			jvec [1][k] = jvec[1][k] + props2[j] * vec2[j][k];
		}
		jvec	[0][k] = Log(jvec[0][k]);	/* convert to log likelihoods */
		jvec	[1][k] = Log(jvec[1][k]);
	}


	for (k=0; k<itCount; k=k+1)
	{
		resampled = Random(jvec,1);		/* resample matrix of equal dimension WITH replacement */
		resampled = resampled*sumVec1;	/* matrix trick for summing log likelihoods across sites */
		resMx1[k] = resampled[0];
		resMx2[k] = resampled[1];
	}

	return (resMx1-resMx2)*2;
}
 



Cheers,
- Art.
Back to top
 
 
IP Logged
 
ince
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 4
Re: KH tests
Reply #7 - Apr 5th, 2008 at 2:09pm
 
Art, thanks, but now I get a P value of 1.000. Sorry for all the fuss, and thanks for all your help.

Will

below is the KHtest.bf I'm working with.

Code:
VERBOSITY_LEVEL = -1;
ALLOW_SEQUENCE_MISMATCH = 1;

fprintf(stdout,"\n ---- RUNNING KH ALTERNATIVE TOPOLOGY TEST ---- \n");

ChoiceList (dataType,"Data type",1,SKIP_NONE,"Nucleotide/Protein","Nucleotide or amino-acid (protein).",
				     "Codon","Codon (several available genetic codes).");

if (dataType<0)
{
	return;
}

if (dataType)
{
	NICETY_LEVEL = 3;
	SetDialogPrompt ("Please choose a codon data file:");
	#include "TemplateModels/chooseGeneticCode.def";
}
else
{
	SetDialogPrompt ("Please choose a nucleotide or amino-acid data file:");
}

DataSet ds = ReadDataFile (PROMPT_FOR_FILE);
fprintf (stdout,"The following data were read:\n",ds,"\n");

ChoiceList (compType,"Comparison Kind",1,SKIP_NONE,"Full Alignment","Compare the trees applied to the full alignment.",
				     "Recombination Test","Compare the trees on 2 parts of the alignment, reversing their roles before and after the breakpoint.");


if (compType<0)
{
	return;
}

if (dataType)
{
	DataSetFilter filteredData = CreateFilter (ds,3,"","",GeneticCodeExclusions);
}
else
{
	DataSetFilter filteredData = CreateFilter (ds,1);
}


SelectTemplateModel(filteredData);

firstTreeString  = "";
secondTreeString = "";

if (Rows(NEXUS_FILE_TREE_MATRIX)>=2)
{
	ChoiceList (tc,"Choose first tree",1,SKIP_NONE,NEXUS_FILE_TREE_MATRIX);
	if (tc>=0)
	{
		firstTreeString = NEXUS_FILE_TREE_MATRIX[tc][1];
		skipMx = {{tc__}};
		ChoiceList (tc,"Choose second tree",1,skipMx,NEXUS_FILE_TREE_MATRIX);
		if (tc>=0)
		{
			secondTreeString = NEXUS_FILE_TREE_MATRIX[tc][1];
		}
	}
	else
	{
		ChoiceList (tc,"Choose second tree",1,SKIP_NONE,NEXUS_FILE_TREE_MATRIX);
		if (tc>=0)
		{
			secondTreeString = NEXUS_FILE_TREE_MATRIX[tc][1];
		}
	}
}
else
{
	SetDialogPrompt ("Load a tree for the first partition");
	fscanf (PROMPT_FOR_FILE, "String", firstTreeString);
	SetDialogPrompt ("Load a tree for the second partition");
	fscanf (PROMPT_FOR_FILE, "String", secondTreeString);
}

if (Abs(firstTreeString) == 0)
{
	SetDialogPrompt ("Locate the file with the first tree:");
	fscanf 			(PROMPT_FOR_FILE,"Tree",firstTree);
}
else
{
	Tree firstTree = firstTreeString;
}

if (Abs(secondTreeString) == 0)
{
	SetDialogPrompt ("Locate the file with the second tree:");
	fscanf 			(PROMPT_FOR_FILE,"Tree",secondTree);
}
else
{
	Tree secondTree = secondTreeString;
}

if (firstTree == secondTree)
{
	fprintf (stdout, "\nERROR: The two trees cannot be equal\n");
	return 0;
}

firstTreeString  = ""+firstTree;
secondTreeString = ""+secondTree;

fprintf (stdout, "\nFirst tree:\n",firstTreeString, "\n\nSecond Tree:\n",secondTreeString,"\n");

itCount = 0;
while (itCount<1)
{
	fprintf (stdout, "\nHow many bootstrap replicates should be done (>0):");
	fscanf (stdin, "Number", itCount);
}

labels = {{"Resampled LRT"}};

if (compType)
{
	breakPoint = 0;
	if (dataType)
	{
		pPrompt = "codons";
	}
	else
	{
		pPrompt = "bp";
	}
	while (breakPoint<1 || breakPoint > filteredData.sites-2)
	{
		fprintf (stdout, "\nEnter the location of the breakpoint (0-based and measured in ", pPrompt, "). Valid locations are [1-", filteredData.sites-3, "]:");
		fscanf (stdin, "Number", breakPoint);
	}
	fprintf (stdout, "\nUsing the breakpoint at ", breakPoint, "-th ", pPrompt, "\n");
	if (dataType)
	{
		DataSetFilter filteredData1 = CreateFilter (ds,3,siteIndex<3*breakPoint,"",GeneticCodeExclusions);
		DataSetFilter filteredData2 = CreateFilter (ds,3,siteIndex>=3*breakPoint,"",GeneticCodeExclusions);
	}
	else
	{
		DataSetFilter filteredData1 = CreateFilter (ds,1,siteIndex<breakPoint);
		DataSetFilter filteredData2 = CreateFilter (ds,1,siteIndex>=breakPoint);
	}
	LikelihoodFunction lf1_1 = (filteredData1,firstTree);
	Optimize (res1_1, lf1_1);
	ConstructCategoryMatrix (v1,lf1_1,COMPLETE);
	fprintf (stdout, "\n\n1). FITTING TREE 1 TO PARTITION 1\n", lf1_1);
	LikelihoodFunction lf1_2 = (filteredData1,secondTree);
	Optimize (res1_2, lf1_2);
	ConstructCategoryMatrix (v2,lf1_2,COMPLETE);
	fprintf (stdout, "\n\n2). FITTING TREE 2 TO PARTITION 1\n", lf1_2);
	LikelihoodFunction lf2_1 = (filteredData2,firstTree);
	Optimize (res2_1, lf2_1);
	ConstructCategoryMatrix (v3,lf2_1,COMPLETE);
	fprintf (stdout, "\n\n3). FITTING TREE 1 TO PARTITION 2\n", lf2_1);
	LikelihoodFunction lf2_2 = (filteredData2,secondTree);
	Optimize (res2_2, lf2_2);
	ConstructCategoryMatrix (v4,lf2_2,COMPLETE);
	fprintf (stdout, "\n\n4). FITTING TREE 2 TO PARTITION 2\n", lf2_2);

	LRT1 = 2*(res1_1[1][0]-res1_2[1][0]);
	LRT2 = 2*(res2_2[1][0]-res2_1[1][0]);

	fprintf (stdout, "\n\nSUMMARY:\n\n\tPARTITION 1 LRT = ",LRT1, "\n\tPARTITION 2 LRT = ", LRT2);
	if (LRT1 < 0 || LRT2 < 0)
	{
		fprintf (stdout, "\n\nERROR: Both LRTs were expeceted to be positive.\nPlease check your trees and partition");
		return 0;
	}

	test1 = testLRT (v1,v2)%0;

	for (k=0; k<Columns(v1); k=k+1)
	{
		if (test1[k]>0)
		{
			break;
		}
	}

	fprintf (stdout, "\n\nEstimated p-value for the FIRST partition (based on ", itCount," replicates): ", Format(k/itCount,10,4));


	OpenWindow (CHARTWINDOW,{{"Simulated LRT for the First Partition"}
		{"labels"}
		{"test1"}
		{"Step Plot"}
		{"Index"}
		{labels[0]}
		{"Index"}
		{"LRT"}
		{"LRT"}
		{""}
		{""+LRT1}
		{""}
		{"10;1.309;0.785398"}
		{"Times:12:0;Times:10:0;Times:14:2"}
		{"0;0;16777215;5000268;0;0;6750054;11842740;13158600;14474460;0;3947580;16777215;5000268;11776947;10066329;9199669;7018159;1460610;16748822;11184810;14173291;14173291"}
		{"16"}
		},
		"(SCREEN_WIDTH-30)/2;(SCREEN_HEIGHT-50)/2;10;40");

	test2 = testLRT (v4,v3)%0;

	for (k=0; k<Columns(v3); k=k+1)
	{
		if (test2[k]>0)
		{
			break;
		}
	}

	fprintf (stdout, "\n\nEstimated p-value for the SECOND partition (based on ", itCount," replicates): ", Format(k/itCount,10,4),"\n");
	OpenWindow (CHARTWINDOW,{{"Simulated LRT for the Second Partition"}
		{"labels"}
		{"test2"}
		{"Step Plot"}
		{"Index"}
		{labels[0]}
		{"Index"}
		{"LRT"}
		{"LRT"}
		{""}
		{""+LRT2}
		{""}
		{"10;1.309;0.785398"}
		{"Times:12:0;Times:10:0;Times:14:2"}
		{"0;0;16777215;5000268;0;0;6750054;11842740;13158600;14474460;0;3947580;16777215;5000268;11776947;10066329;9199669;7018159;1460610;16748822;11184810;14173291;14173291"}
		{"16"}
		},
		"(SCREEN_WIDTH-30)/2;(SCREEN_HEIGHT-50)/2;20+(SCREEN_WIDTH)/2;40");
}
else	/* full alignment */
{
	LikelihoodFunction lf1 = (filteredData,firstTree);
	Optimize (res1, lf1);
	ConstructCategoryMatrix (v1,lf1,COMPLETE);
	fprintf (stdout, "\n\n1). FITTING TREE 1 TO THE DATA\n", lf1);

	if (distribution)
	{
		GetInformation (distInfo1, c);
		props1 = {{0,1}} * distInfo1;
	}

	LikelihoodFunction lf2 = (filteredData,secondTree);
	Optimize (res2, lf2);
	ConstructCategoryMatrix (v2,lf2,COMPLETE);
	fprintf (stdout, "\n\n2). FITTING TREE 2 TO THE DATA1\n", lf2);

	if (distribution)
	{
		GetInformation (distInfo2, c);
		props2 = {{0,1}} * distInfo2;
	}

	LRT1 = 2*(res1[1][0]-res2[1][0]);

	fprintf (stdout, "\n\nSUMMARY:\n\n\tLRT = ",LRT1, "\n");
	if (LRT1 < 0)
	{
		fprintf (stdout, "\n\nERROR: The LRTs was expeceted to be positive.\nPlease check your trees and alignemt");
		return 0;
	}

	test1 = {itCount, 1};

	if (distribution)
	{
		test1 = testLRT2 (v1,v2,props1,props2);
	}
	else
	{
		test1 = testLRT (v1,v2)%0;	/* % sorts matrix on column 0 */
	}
	fprintf (stdout, "\n\nEstimated p-value for  (based on ", itCount," replicates): ", Format(k/itCount,10,4),"\n");
	OpenWindow (CHARTWINDOW,{{"Simulated LRT"}
		{"labels"}
		{"test1"}
		{"Step Plot"}
		{"Index"}
		{labels[0]}
		{"Index"}
		{"LRT"}
		{"LRT"}
		{""}
		{""+LRT1}
		{""}
		{"10;1.309;0.785398"}
		{"Times:12:0;Times:10:0;Times:14:2"}
		{"0;0;16777215;5000268;0;0;6750054;11842740;13158600;14474460;0;3947580;16777215;5000268;11776947;10066329;9199669;7018159;1460610;16748822;11184810;14173291;14173291"}
		{"16"}
		},
		"(SCREEN_WIDTH-30)/2;(SCREEN_HEIGHT-50)/2;20+(SCREEN_WIDTH)/2;40");
}

/*--------------------------------------------------------------------------------------*/

function testLRT (vec1, vec2)
{
	size1 = Columns(vec1);

	sumVec1 = {size1,1};
	jvec	= {2,size1};
	resMx1	= {itCount,1};
	resMx2	= {itCount,1};

	for (k=0; k<size1; k=k+1)
	{
		sumVec1 [k]	   = 1;
		jvec	[0][k] = Log(vec1[k]);
		jvec	[1][k] = Log(vec2[k]);
	}


	for (k=0; k<itCount; k=k+1)
	{
		resampled = Random(jvec,1);
		resampled = resampled*sumVec1;
		resMx1[k] = resampled[0];
		resMx2[k] = resampled[1];
	}

	return (resMx1-resMx2)*2;
}
function testLRT2 (vec1, vec2, props1, props2)
{
	size1 = Columns(vec1);	/*  number of sites */
	nclasses1 = Columns(props1);

	sumVec1 = {size1,1};
	jvec	= {2,size1};
	resMx1	= {itCount,1};	/* itCount = number of bootstrap replicates */
	resMx2	= {itCount,1};

	for (k=0; k<size1; k=k+1)
	{
		sumVec1 [k]	   = 1;

		for (j = 0; j < nclasses1; j = j+1)
		{
			jvec [0][k] = jvec[0][k] + props1[j] * vec1[j][k];
			jvec [1][k] = jvec[1][k] + props2[j] * vec2[j][k];
		}
		jvec	[0][k] = Log(jvec[0][k]);	/* convert to log likelihoods */
		jvec	[1][k] = Log(jvec[1][k]);
	}


	for (k=0; k<itCount; k=k+1)
	{
		resampled = Random(jvec,1);		/* resample matrix of equal dimension WITH replacement */
		resampled = resampled*sumVec1;	/* matrix trick for summing log likelihoods across sites */
		resMx1[k] = resampled[0];
		resMx2[k] = resampled[1];
	}

	return (resMx1-resMx2)*2;
}

 

Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: KH tests
Reply #8 - Apr 7th, 2008 at 12:32pm
 
Hi Will,

Found a bug in the code.  Try this:

Code:
VERBOSITY_LEVEL = -1;
ALLOW_SEQUENCE_MISMATCH = 1;

fprintf(stdout,"\n ---- RUNNING KH ALTERNATIVE TOPOLOGY TEST ---- \n");

ChoiceList (dataType,"Data type",1,SKIP_NONE,"Nucleotide/Protein","Nucleotide or amino-acid (protein).",
				     "Codon","Codon (several available genetic codes).");

if (dataType<0)
{
	return;
}

if (dataType)
{
	NICETY_LEVEL = 3;
	SetDialogPrompt ("Please choose a codon data file:");
	#include "TemplateModels/chooseGeneticCode.def";
}
else
{
	SetDialogPrompt ("Please choose a nucleotide or amino-acid data file:");
}

DataSet ds = ReadDataFile (PROMPT_FOR_FILE);
fprintf (stdout,"The following data were read:\n",ds,"\n");

ChoiceList (compType,"Comparison Kind",1,SKIP_NONE,"Full Alignment","Compare the trees applied to the full alignment.",
				     "Recombination Test","Compare the trees on 2 parts of the alignment, reversing their roles before and after the breakpoint.");


if (compType<0)
{
	return;
}

if (dataType)
{
	DataSetFilter filteredData = CreateFilter (ds,3,"","",GeneticCodeExclusions);
}
else
{
	DataSetFilter filteredData = CreateFilter (ds,1);
}


SelectTemplateModel(filteredData);

firstTreeString  = "";
secondTreeString = "";

if (Rows(NEXUS_FILE_TREE_MATRIX)>=2)
{
	ChoiceList (tc,"Choose first tree",1,SKIP_NONE,NEXUS_FILE_TREE_MATRIX);
	if (tc>=0)
	{
		firstTreeString = NEXUS_FILE_TREE_MATRIX[tc][1];
		skipMx = {{tc__}};
		ChoiceList (tc,"Choose second tree",1,skipMx,NEXUS_FILE_TREE_MATRIX);
		if (tc>=0)
		{
			secondTreeString = NEXUS_FILE_TREE_MATRIX[tc][1];
		}
	}
	else
	{
		ChoiceList (tc,"Choose second tree",1,SKIP_NONE,NEXUS_FILE_TREE_MATRIX);
		if (tc>=0)
		{
			secondTreeString = NEXUS_FILE_TREE_MATRIX[tc][1];
		}
	}
}
else
{
	SetDialogPrompt ("Load a tree for the first partition");
	fscanf (PROMPT_FOR_FILE, "String", firstTreeString);
	SetDialogPrompt ("Load a tree for the second partition");
	fscanf (PROMPT_FOR_FILE, "String", secondTreeString);
}

if (Abs(firstTreeString) == 0)
{
	SetDialogPrompt ("Locate the file with the first tree:");
	fscanf 			(PROMPT_FOR_FILE,"Tree",firstTree);
}
else
{
	Tree firstTree = firstTreeString;
}

if (Abs(secondTreeString) == 0)
{
	SetDialogPrompt ("Locate the file with the second tree:");
	fscanf 			(PROMPT_FOR_FILE,"Tree",secondTree);
}
else
{
	Tree secondTree = secondTreeString;
}

if (firstTree == secondTree)
{
	fprintf (stdout, "\nERROR: The two trees cannot be equal\n");
	return 0;
}

firstTreeString  = ""+firstTree;
secondTreeString = ""+secondTree;

fprintf (stdout, "\nFirst tree:\n",firstTreeString, "\n\nSecond Tree:\n",secondTreeString,"\n");

itCount = 0;
while (itCount<1)
{
	fprintf (stdout, "\nHow many bootstrap replicates should be done (>0):");
	fscanf (stdin, "Number", itCount);
}

labels = {{"Resampled LRT"}};

if (compType)		/* recombination */
{
	breakPoint = 0;
	if (dataType)
	{
		pPrompt = "codons";
	}
	else
	{
		pPrompt = "bp";
	}
	while (breakPoint<1 || breakPoint > filteredData.sites-2)
	{
		fprintf (stdout, "\nEnter the location of the breakpoint (0-based and measured in ", pPrompt, "). Valid locations are [1-", filteredData.sites-3, "]:");
		fscanf (stdin, "Number", breakPoint);
	}
	fprintf (stdout, "\nUsing the breakpoint at ", breakPoint, "-th ", pPrompt, "\n");
	if (dataType)
	{
		DataSetFilter filteredData1 = CreateFilter (ds,3,siteIndex<3*breakPoint,"",GeneticCodeExclusions);
		DataSetFilter filteredData2 = CreateFilter (ds,3,siteIndex>=3*breakPoint,"",GeneticCodeExclusions);
	}
	else
	{
		DataSetFilter filteredData1 = CreateFilter (ds,1,siteIndex<breakPoint);
		DataSetFilter filteredData2 = CreateFilter (ds,1,siteIndex>=breakPoint);
	}
	LikelihoodFunction lf1_1 = (filteredData1,firstTree);
	Optimize (res1_1, lf1_1);
	ConstructCategoryMatrix (v1,lf1_1,COMPLETE);
	fprintf (stdout, "\n\n1). FITTING TREE 1 TO PARTITION 1\n", lf1_1);
	LikelihoodFunction lf1_2 = (filteredData1,secondTree);
	Optimize (res1_2, lf1_2);
	ConstructCategoryMatrix (v2,lf1_2,COMPLETE);
	fprintf (stdout, "\n\n2). FITTING TREE 2 TO PARTITION 1\n", lf1_2);
	LikelihoodFunction lf2_1 = (filteredData2,firstTree);
	Optimize (res2_1, lf2_1);
	ConstructCategoryMatrix (v3,lf2_1,COMPLETE);
	fprintf (stdout, "\n\n3). FITTING TREE 1 TO PARTITION 2\n", lf2_1);
	LikelihoodFunction lf2_2 = (filteredData2,secondTree);
	Optimize (res2_2, lf2_2);
	ConstructCategoryMatrix (v4,lf2_2,COMPLETE);
	fprintf (stdout, "\n\n4). FITTING TREE 2 TO PARTITION 2\n", lf2_2);

	LRT1 = 2*(res1_1[1][0]-res1_2[1][0]);
	LRT2 = 2*(res2_2[1][0]-res2_1[1][0]);

	fprintf (stdout, "\n\nSUMMARY:\n\n\tPARTITION 1 LRT = ",LRT1, "\n\tPARTITION 2 LRT = ", LRT2);
	if (LRT1 < 0 || LRT2 < 0)
	{
		fprintf (stdout, "\n\nERROR: Both LRTs were expeceted to be positive.\nPlease check your trees and partition");
		return 0;
	}

	test1 = testLRT (v1,v2)%0;

	for (k=0; k<Columns(v1); k=k+1)
	{
		if (test1[k]>0)
		{
			break;
		}
	}

	fprintf (stdout, "\n\nEstimated p-value for the FIRST partition (based on ", itCount," replicates): ", Format(k/itCount,10,4));


	OpenWindow (CHARTWINDOW,{{"Simulated LRT for the First Partition"}
		{"labels"}
		{"test1"}
		{"Step Plot"}
		{"Index"}
		{labels[0]}
		{"Index"}
		{"LRT"}
		{"LRT"}
		{""}
		{""+LRT1}
		{""}
		{"10;1.309;0.785398"}
		{"Times:12:0;Times:10:0;Times:14:2"}
		{"0;0;16777215;5000268;0;0;6750054;11842740;13158600;14474460;0;3947580;16777215;5000268;11776947;10066329;9199669;7018159;1460610;16748822;11184810;14173291;14173291"}
		{"16"}
		},
		"(SCREEN_WIDTH-30)/2;(SCREEN_HEIGHT-50)/2;10;40");

	test2 = testLRT (v4,v3)%0;

	for (k=0; k<Columns(v3); k=k+1)
	{
		if (test2[k]>0)
		{
			break;
		}
	}

	fprintf (stdout, "\n\nEstimated p-value for the SECOND partition (based on ", itCount," replicates): ", Format(k/itCount,10,4),"\n");
	OpenWindow (CHARTWINDOW,{{"Simulated LRT for the Second Partition"}
		{"labels"}
		{"test2"}
		{"Step Plot"}
		{"Index"}
		{labels[0]}
		{"Index"}
		{"LRT"}
		{"LRT"}
		{""}
		{""+LRT2}
		{""}
		{"10;1.309;0.785398"}
		{"Times:12:0;Times:10:0;Times:14:2"}
		{"0;0;16777215;5000268;0;0;6750054;11842740;13158600;14474460;0;3947580;16777215;5000268;11776947;10066329;9199669;7018159;1460610;16748822;11184810;14173291;14173291"}
		{"16"}
		},
		"(SCREEN_WIDTH-30)/2;(SCREEN_HEIGHT-50)/2;20+(SCREEN_WIDTH)/2;40");
}
else	/* full alignment */
{
	LikelihoodFunction lf1 = (filteredData,firstTree);
	Optimize (res1, lf1);
	ConstructCategoryMatrix (v1,lf1,COMPLETE);
	fprintf (stdout, "\n\n1). FITTING TREE 1 TO THE DATA\n", lf1);

	if (distribution)
	{
		GetInformation (distInfo1, c);
		props1 = {{0,1}} * distInfo1;
	}

	LikelihoodFunction lf2 = (filteredData,secondTree);
	Optimize (res2, lf2);
	ConstructCategoryMatrix (v2,lf2,COMPLETE);
	fprintf (stdout, "\n\n2). FITTING TREE 2 TO THE DATA1\n", lf2);

	if (distribution)
	{
		GetInformation (distInfo2, c);
		props2 = {{0,1}} * distInfo2;
	}

	LRT1 = 2*(res1[1][0]-res2[1][0]);

	fprintf (stdout, "\n\nSUMMARY:\n\n\tLRT = ",LRT1, "\n");
	if (LRT1 < 0)
	{
		fprintf (stdout, "\n\nERROR: The LRTs was expeceted to be positive.\nPlease check your trees and alignemt");
		return 0;
	}

	test1 = {itCount, 1};

	if (distribution)
	{
		test1 = testLRT2 (v1,v2,props1,props2)%0;
	}
	else
	{
		test1 = testLRT (v1,v2)%0;	/* % sorts matrix on column 0 */
	}

	/* for (k=0; k<Columns(v1); k=k+1) */
	for (k = 0; k < itCount; k = k + 1)
	{
		if ((test1[k] - LRT1) > 0)
		{
			break;		/* find bootstrap replicate whose LRT exceeds the observed value */
		}
	}

	fprintf (stdout, "\n\nEstimated p-value for  (based on ", itCount," replicates): ", Format(k/itCount,10,4),"\n");

	OpenWindow (CHARTWINDOW,{{"Simulated LRT"}
		{"labels"}
		{"test1"}
		{"Step Plot"}
		{"Index"}
		{labels[0]}
		{"Index"}
		{"LRT"}
		{"LRT"}
		{""}
		{""+LRT1}
		{""}
		{"10;1.309;0.785398"}
		{"Times:12:0;Times:10:0;Times:14:2"}
		{"0;0;16777215;5000268;0;0;6750054;11842740;13158600;14474460;0;3947580;16777215;5000268;11776947;10066329;9199669;7018159;1460610;16748822;11184810;14173291;14173291"}
		{"16"}
		},
		"(SCREEN_WIDTH-30)/2;(SCREEN_HEIGHT-50)/2;20+(SCREEN_WIDTH)/2;40");
}

/*--------------------------------------------------------------------------------------*/

function testLRT (vec1, vec2)
{
	size1 = Columns(vec1);	/*  number of sites */

	sumVec1 = {size1,1};
	jvec	= {2,size1};
	resMx1	= {itCount,1};	/* itCount = number of bootstrap replicates */
	resMx2	= {itCount,1};

	for (k=0; k<size1; k=k+1)
	{
		sumVec1 [k]	   = 1;
		jvec	[0][k] = Log(vec1[k]);	/* convert to log likelihoods */
		jvec	[1][k] = Log(vec2[k]);
	}


	for (k=0; k<itCount; k=k+1)
	{
		resampled = Random(jvec,1);		/* resample matrix of equal dimension WITH replacement */
		resampled = resampled*sumVec1;	/* matrix trick for summing log likelihoods across sites */
		resMx1[k] = resampled[0];
		resMx2[k] = resampled[1];
	}

	return (resMx1-resMx2)*2;
}


/*--------------------------------------------------------------------------------------*/

function testLRT2 (vec1, vec2, props1, props2)
{
	size1 = Columns(vec1);	/*  number of sites */
	nclasses1 = Columns(props1);

	sumVec1 = {size1,1};
	jvec	= {2,size1};
	resMx1	= {itCount,1};	/* itCount = number of bootstrap replicates */
	resMx2	= {itCount,1};

	for (k=0; k<size1; k=k+1)
	{
		sumVec1 [k]	   = 1;

		for (j = 0; j < nclasses1; j = j+1)
		{
			jvec [0][k] = jvec[0][k] + props1[j] * vec1[j][k];
			jvec [1][k] = jvec[1][k] + props2[j] * vec2[j][k];
		}
		jvec	[0][k] = Log(jvec[0][k]);	/* convert to log likelihoods */
		jvec	[1][k] = Log(jvec[1][k]);
	}


	for (k=0; k<itCount; k=k+1)
	{
		resampled = Random(jvec,1);		/* resample matrix of equal dimension WITH replacement */
		resampled = resampled*sumVec1;	/* matrix trick for summing log likelihoods across sites */
		resMx1[k] = resampled[0];
		resMx2[k] = resampled[1];
	}

	return (resMx1-resMx2)*2;
}
 


Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: KH tests
Reply #9 - Apr 7th, 2008 at 12:37pm
 
Although I fixed how P-value was being computed, I find it to be a funny sort of test.  Essentially, the function testLRT is resampling sites and recomputing LRT based on that sample.  Although the bootstrap is a decent means of quantifying our uncertainty in LRT (one should expect the 'real' value to fall somewhere at the median of the distribution, such that P is usually around 0.5), I don't quite understand how this is evaluating how much BETTER of a fit one tree provides than another.

- Art.
Back to top
 
 
IP Logged
 
ince
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 4
Re: KH tests
Reply #10 - Apr 10th, 2008 at 12:13pm
 
Art, I am using this test to determine if the two trees are different from each other on the assumption that the P value would reflect the probability that one tree fits the data significantly better than the other (based on re-sampling analysis of the LR), but I may be mistaken in my assumptions/conceptualization. As it is now for the KHtest, it looks like the P value is always around 0.5, and this may be consistent with how the test works as you have described. However, it seems like a departure from the KH test that is implemented in the GARDprocessor.bf. In that analysis, the P values makes a little more sense to me as trees generated from contiguous partitions of the alignment have higher P values than those that are further apart in an alignment where GARD detects recombination. Are these two versions of the test consistent? thoughts?

many thanks

Will
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: KH tests
Reply #11 - Apr 10th, 2008 at 12:26pm
 
Hi Will,

As far as determining whether one tree fits the entire alignment significantly better than another, resampling sites from the alignment in a non-parametric bootstrap makes little sense to me other than as a means of generating a confidence interval around your estimate of the LRT.  I would go with that interpretation and ignore the P-value reported by the batch file.  I'll have to ask (i.e. ridicule) Sergei about that batch file when he gets back.  Grin

Cheers,
-Art.
Back to top
 
 
IP Logged