Dear Sergei,
I have performed a GARD analysis and have noticed a few idiosyncrasies when processing the results using the latest versions of GARDProcessor.bf and KHTest.bf posted in this thread.
Firstly, running GARDProcessor.bf multiple times on the same dataset and splits file produces (slightly) different results. For example, here are the KH tests for 3 runs with the same input files:
Code:
RUN 1:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.72380305908;
Fitting tree 2 to partition 1
Log Likelihood = -4730.55612145537;
KH Testing partition 1
Tree 2 base LRT = 149.665. p-value = 0.0044
Fitting tree 1 to partition 2
Log Likelihood = -5043.76888590518;
Fitting tree 2 to partition 2
Log Likelihood = -4836.52025035318;
KH Testing partition 2
Tree 1 base LRT = 414.497. p-value = 0.0001
Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.00440 | 0.00880
----------------------------------------------------------------------
RUN 2:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.72380305908;
Fitting tree 2 to partition 1
Log Likelihood = -4730.55612145537;
KH Testing partition 1
Tree 2 base LRT = 149.665. p-value = 0.0038
Fitting tree 1 to partition 2
Log Likelihood = -5043.76888590518;
Fitting tree 2 to partition 2
Log Likelihood = -4836.52025035318;
KH Testing partition 2
Tree 1 base LRT = 414.497. p-value = 0.0001
Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.00380 | 0.00760
----------------------------------------------------------------------
RUN 3:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.31331109413;
Fitting tree 2 to partition 1
Log Likelihood = -4730.60159498021;
KH Testing partition 1
Tree 2 base LRT = 150.577. p-value = 0.0042
Fitting tree 1 to partition 2
Log Likelihood = -5043.72987443926;
Fitting tree 2 to partition 2
Log Likelihood = -4836.4869482817;
KH Testing partition 2
Tree 1 base LRT = 414.486. p-value = 0.0001
Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.00420 | 0.00840
------------------------------------------------------------------------
You'll notice that the likelihoods and p-values differ slightly between runs. Could this just be due to different starting values in the optimisation? (I'm actually just curious - the differences are trivial!)
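For my own curiosity I tried to reproduce the p-value jitter outside HyPhy. If KHTest.bf builds its null distribution by RELL-style bootstrap resampling of per-site log-likelihoods (the standard KH procedure - I'm assuming this, I haven't read the batch file closely), then the p-value is itself a Monte Carlo estimate and will wobble between runs unless the random seed is fixed. Here is a minimal Python sketch; kh_pvalue and the simulated data are my own stand-ins, not anything from KHTest.bf:
Code:
import numpy as np

def kh_pvalue(lnl_a, lnl_b, n_boot=10000, seed=None):
    """KH-style p-value via RELL bootstrap (illustrative stand-in only)."""
    # Per-site log-likelihood differences between the two trees
    delta = np.asarray(lnl_a) - np.asarray(lnl_b)
    observed = delta.sum()
    # Centre the differences so the null (no difference) holds on average
    centred = delta - delta.mean()
    rng = np.random.default_rng(seed)
    # RELL: resample sites with replacement and re-sum the differences
    idx = rng.integers(0, delta.size, size=(n_boot, delta.size))
    null = centred[idx].sum(axis=1)
    # Two-sided p-value: fraction of resampled sums at least as extreme
    return float((np.abs(null) >= abs(observed)).mean())

# Simulated per-site log-likelihoods for two trees (made-up data)
data_rng = np.random.default_rng(0)
lnl_tree1 = data_rng.normal(-9.0, 1.0, size=507)
lnl_tree2 = lnl_tree1 - data_rng.normal(0.05, 0.5, size=507)

# Same inputs, different bootstrap seeds -> slightly different p-values
print(kh_pvalue(lnl_tree1, lnl_tree2, seed=1))
print(kh_pvalue(lnl_tree1, lnl_tree2, seed=2))
The two calls return slightly different p-values purely because of the resampling, which looks a lot like the run-to-run jitter above.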
Of more concern to me is that I get very different results when I run the processor file locally versus on my cluster, which runs HYPHYMP_DEV SVN415 (so probably a different version of HyPhy). I obtained the above results on my machine, but this is what I get from two runs on the cluster:
Code:
RUN 1:
---------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.74764230272;
Fitting tree 2 to partition 1
Log Likelihood = -4730.57601129128;
KH Testing partition 1
Tree 2 base LRT = 149.657. p-value = 0.3469
Fitting tree 2 to partition 2
Log Likelihood = -4836.49821145768;
KH Testing partition 2
Tree 1 base LRT = 414.467. p-value = 0.0001
Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.34690 | 0.69380
---------------------------------------------------------------------
RUN 2:
---------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.74764230272;
Fitting tree 2 to partition 1
Log Likelihood = -4730.57601129128;
KH Testing partition 1
Tree 2 base LRT = 149.657. p-value = 0.354
Fitting tree 1 to partition 2
Log Likelihood = -5043.73191448845;
Fitting tree 2 to partition 2
Log Likelihood = -4836.49821145768;
KH Testing partition 2
Tree 1 base LRT = 414.467. p-value = 0.0001
Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.35400 | 0.70800
---------------------------------------------------------------------
Again, I have copied the GARDProcessor.bf and KHTest.bf files from this thread into TemplateBatchFiles on the cluster. As you can see, the breakpoint is no longer significant on the cluster: the RHS adjusted p-value jumps from ~0.008 locally to ~0.7.
Could you please advise which of the above results is correct? I have attached the relevant files.
Thanks a lot as always!
Miguel
PS: While I'm here... I have one more question.
I have a dataset with ~500 sequences and 1100 nucleotides - i.e. too many sequences relative to the number of sites for GARD to handle. So I have divided the dataset into random samples of 50 sequences and am running GARD on each of these smaller datasets (sketched below). Is that what you would advise?
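In case it helps to see exactly what I mean, here is roughly how I generate the subsamples (a sketch of my approach - the parse_fasta helper, the file names, and the non-overlapping split are my own choices, not anything GARD prescribes):
Code:
import random

def parse_fasta(path):
    """Minimal FASTA reader: returns a list of (header, sequence) pairs."""
    records, header, seq = [], None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(seq)))
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records

records = parse_fasta("alignment.fas")   # ~500 aligned sequences
random.seed(42)                          # reproducible subsamples
random.shuffle(records)

# Write non-overlapping random samples of 50 sequences each
n_full = len(records) - len(records) % 50
for i in range(0, n_full, 50):
    with open(f"subsample_{i // 50 + 1}.fas", "w") as out:
        for header, seq in records[i:i + 50]:
            out.write(f">{header}\n{seq}\n")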
Thanks again!