HyPhy message board

Dear everyone,

Has anyone encountered an error like the following when running the KH test for GARD with HyPhy?

Operation MAccess is not defined for 0
Current BL command: jvec[1][k]=vec2[k]
Current task has been terminated. Would you like to see the remaining error messages, if there are any?

I have run into this problem several times recently. The run always stops in the middle of the analysis, so I cannot get the final results of the KH test.
Could anyone tell me how to fix it? I would appreciate it!!

Best regards,

Sundy

Dear Sundy,

Try using the attached file (drop it into TemplateBatchFiles).
There was a bug in the older versions of KHTest.bf that could result in the error that you are seeing.

Keep those bug reports coming:)

Cheers,
Sergei

http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KHTest.bf (7 KB)

Dear Sergei,

Thank you so much. Unfortunately, the error is still there.
I have attached my sequence file and GARD.splits file. Could you please have a look and tell me whether something is wrong with them?

Thank you!

Best regards,

Sundy

http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KH_001.zip (7 KB)

Dear Sundy,

Please try replacing GARDProcessor.bf in TemplateBatchFiles with the file I attach. I was able to run your example through the script without problems.

Cheers,
Sergei

http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=GARDProcessor_001.bf (17 KB)

Dear Sergei,

Thank you so much!!!! This problem was fixed :D :D
Cheers,

Sundy

Dear Sergei,

I am sorry to trouble you again.
I have another question about the new "GARDProcessor.bf" you gave me.

After replacing the old GARDProcessor.bf in TemplateBatchFiles with this new file, I no longer get the error mentioned above. But when I re-ran some of my earlier data, which had several significant breakpoints with the old GARDProcessor.bf, there were no significant breakpoints anymore.

So I am a little confused. Which GARDProcessor.bf result is more credible?

For example, for the attached file, the old GARDProcessor.bf showed 4/5 significant, but the new GARDProcessor.bf showed 0/5 significant.

http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=10ORF.zip (11 KB)

Dear Sundy,

The results from the new script are correct -- there was a bug in the older version. Some of the breakpoints in your alignment are still significant at the 0.05 level, but not at the 0.01 level, which is the default in the script. I can modify the script to let you select the p-value if you'd like.

Cheers,
Sergei

Dear Sergei,

Thank you so much!
I would greatly appreciate it if you could modify the script to let me select the p-value.

I have another question: does the KH test in HyPhy already include the Bonferroni correction or not?

For the file above, the GARD and new KH test results are as follows:
Breakpoint location | LHS vs. RHS | RHS vs. LHS
454 | 0.031 | 0.033
729 | <0.001 | 0.022
1091 | 0.006 | 0.001
1927 | 0.001 | 0.026
3321 | 0.034 | 0.411
The KH test does not consider site 1091 significant, even though both of its p-values are < 0.01, so I think this is due to the Bonferroni correction. After the Bonferroni correction, 1091 is significant at the 0.05 level but not at the 0.01 level. Is my understanding right?

As I understand it, for this test with 5 breakpoints, p < 0.01 corresponds to significance at 0.05 (0.05/5), and p < 0.002 corresponds to significance at 0.01 (0.01/5). Is this right?

Thank you very much!!

Sundy

Dear Sundy,

I've modified GARDProcessor (attached) to summarize KH p-values (both raw and Bonferroni-corrected) at the end of the run and to also report how many KH-significant breakpoints there are at 3 different significance levels. This should make the interpretation of GARD results easier.

Example output follows:

[code]
Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
454 | 0.46570 | 1.00000 | 0.00290 | 0.02900
729 | 0.00020 | 0.00200 | 0.00010 | 0.00100
1091 | 0.25400 | 1.00000 | 0.00580 | 0.05800
1927 | 0.02710 | 0.27100 | 0.22930 | 1.00000
3321 | 0.40520 | 1.00000 | 0.00010 | 0.00100

At p = 0.01 there are 1 significant breakpoints
At p = 0.05 there are 1 significant breakpoints
At p = 0.1 there are 1 significant breakpoints

[/code]

Cheers,
Sergei

http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=GARDProcessor_002.bf (17 KB)

Dear Sergei,

Thank you very, very much!!!
This is really great. It is easy to use for someone like me.

But I found another problem:
When I ran this dataset on my computer again, I got totally different results from the ones you gave above (see below).
I thought maybe my HyPhy was not the latest version, so I downloaded the latest HyPhy (2008.5.8), but I still got these results.

Could you please help me figure out what the problem is?

Thank you so much!!
Best regards,

Sundy

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
454 | 0.03630 | 0.36300 | 0.03060 | 0.30600
729 | 0.02100 | 0.21000 | 0.00010 | 0.00100
1091 | 0.00100 | 0.01000 | 0.00780 | 0.07800
1927 | 0.02540 | 0.25400 | 0.00050 | 0.00500
3321 | 0.40440 | 1.00000 | 0.03530 | 0.35300

At p = 0.01 there are 0 significant breakpoints
At p = 0.05 there are 0 significant breakpoints
At p = 0.1 there are 1 significant breakpoints

Mean splits identify: 0.15
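
The adjusted columns in the tables in this thread look like a plain Bonferroni correction: each raw p-value is multiplied by the number of tests and capped at 1. The multiplier appears to be 2 x the number of breakpoints (10 for five breakpoints, since each breakpoint is tested from both sides), and a breakpoint appears to be counted as significant only when both of its adjusted p-values fall below the threshold. Both rules are inferred from the printed numbers, not read from the GARDProcessor.bf source, so the Python below is only a minimal sketch under those assumptions. The function names are made up for illustration; the raw p-values are copied from the table above.

```python
def bonferroni(raw_p, n_tests):
    """Multiply a raw p-value by the number of tests and cap the result at 1."""
    return min(1.0, raw_p * n_tests)


def count_significant(rows, alpha):
    """Count breakpoints whose two adjusted p-values are both below alpha.

    Assumption: the 'both sides must pass' rule is inferred from the summary
    lines in the thread, not taken from GARDProcessor.bf itself.
    """
    n_tests = 2 * len(rows)  # assumption: two KH tests per breakpoint
    count = 0
    for _breakpoint, lhs_raw, rhs_raw in rows:
        lhs_adj = bonferroni(lhs_raw, n_tests)
        rhs_adj = bonferroni(rhs_raw, n_tests)
        if max(lhs_adj, rhs_adj) < alpha:
            count += 1
    return count


# (breakpoint, LHS raw p, RHS raw p), copied from the table above
rows = [
    (454, 0.0363, 0.0306),
    (729, 0.0210, 0.0001),
    (1091, 0.0010, 0.0078),
    (1927, 0.0254, 0.0005),
    (3321, 0.4044, 0.0353),
]

for alpha in (0.01, 0.05, 0.1):
    n = count_significant(rows, alpha)
    print(f"At p = {alpha} there are {n} significant breakpoints")
```

Under these assumptions the sketch gives 0, 0 and 1 significant breakpoints, matching the summary lines above. It also suggests that, with ten tests, a raw p-value would need to be below 0.005 (0.05/10) to pass at the 0.05 level and below 0.001 (0.01/10) to pass at the 0.01 level.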

Dear Sundy,

I am using a developmental version of HyPhy (not available for download yet). Let me confirm the results and get back to you.

Cheers,
Sergei

Dear Sundy,

Your version is giving the correct p-values. Thanks for alerting me to this discrepancy - I found a bug in the developmental version because of it.

Cheers,
Sergei

Dear Sergei,

Thank you so much!! :D :D :D
With your help, I have finally resolved all my questions about GARD so far.
I really appreciate your quick replies every time.
Thanks!!

Sundy

Dear Sundy,

Thank you very much for bringing the discrepancy between your results and the ones I was getting with the prerelease version to my attention. This helped me identify a serious bug in the new likelihood evaluation routines. Thanks for helping make HyPhy a better product.

Best,
Sergei

Dear Sergei

I have performed a GARD analysis and have noticed a few idiosyncrasies when processing the results using the latest versions of GARDProcessor.bf and KHTest.bf posted in this thread.

Firstly, running GARDProcessor.bf multiple times on the same dataset and splits file produces (slightly) different results. For example, here are the KH tests for 3 runs with the same input files:

[code]
RUN 1:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.72380305908;
Fitting tree 2 to partition 1
Log Likelihood = -4730.55612145537;
KH Testing partition 1
Tree 2 base LRT = 149.665. p-value = 0.0044

Fitting tree 1 to partition 2
Log Likelihood = -5043.76888590518;
Fitting tree 2 to partition 2
Log Likelihood = -4836.52025035318;
KH Testing partition 2
Tree 1 base LRT = 414.497. p-value = 0.0001

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.00440 | 0.00880
----------------------------------------------------------------------

RUN 2:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.72380305908;
Fitting tree 2 to partition 1
Log Likelihood = -4730.55612145537;
KH Testing partition 1
Tree 2 base LRT = 149.665. p-value = 0.0038

Fitting tree 1 to partition 2
Log Likelihood = -5043.76888590518;
Fitting tree 2 to partition 2
Log Likelihood = -4836.52025035318;
KH Testing partition 2
Tree 1 base LRT = 414.497. p-value = 0.0001

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.00380 | 0.00760
----------------------------------------------------------------------

RUN 3:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.31331109413;
Fitting tree 2 to partition 1
Log Likelihood = -4730.60159498021;
KH Testing partition 1
Tree 2 base LRT = 150.577. p-value = 0.0042

Fitting tree 1 to partition 2
Log Likelihood = -5043.72987443926;
Fitting tree 2 to partition 2
Log Likelihood = -4836.4869482817;
KH Testing partition 2
Tree 1 base LRT = 414.486. p-value = 0.0001

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.00420 | 0.00840

------------------------------------------------------------------------
[/code]

You'll notice that the likelihoods and p-values differ slightly between runs. Could this just be due to different starting values in the optimisation? (I'm actually just curious - the differences are negligible!)
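
The "base LRT" values in the logs above are consistent with twice the difference between the two log likelihoods fitted to the same partition. That reading is inferred from the printed numbers rather than from the KHTest.bf source. A minimal Python check, with a helper name made up for illustration:

```python
def base_lrt(lnl_better, lnl_worse):
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_better - lnl_worse)


# Log likelihoods copied from RUN 1 above.
partition_1 = base_lrt(-4655.72380305908, -4730.55612145537)
partition_2 = base_lrt(-4836.52025035318, -5043.76888590518)

print(f"{partition_1:.3f}")  # 149.665
print(f"{partition_2:.3f}")  # 414.497
```

Both values agree with the LRT lines printed for RUN 1.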

Of more concern to me is that I get very different results when I run the processor file locally vs on my cluster with HYPHYMP_DEV SVN415 (probably due to different versions of HyPhy). I obtained the above results on my machine, but this is what I get from two runs on the cluster:

[code]
RUN 1:
---------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.74764230272;
Fitting tree 2 to partition 1
Log Likelihood = -4730.57601129128;
KH Testing partition 1
Tree 2 base LRT = 149.657. p-value = 0.3469

Fitting tree 2 to partition 2
Log Likelihood = -4836.49821145768;
KH Testing partition 2
Tree 1 base LRT = 414.467. p-value = 0.0001

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.34690 | 0.69380

---------------------------------------------------------------------

RUN 2:
---------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.74764230272;
Fitting tree 2 to partition 1
Log Likelihood = -4730.57601129128;
KH Testing partition 1
Tree 2 base LRT = 149.657. p-value = 0.354

Fitting tree 1 to partition 2
Log Likelihood = -5043.73191448845;
Fitting tree 2 to partition 2
Log Likelihood = -4836.49821145768;
KH Testing partition 2
Tree 1 base LRT = 414.467. p-value = 0.0001

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
507 | 0.00010 | 0.00020 | 0.35400 | 0.70800

[/code]

Again, I have copied the GARDProcessor.bf and KHTest.bf files from this thread into TemplateBatchFiles on the cluster. As you can see, the breakpoint is no longer significant.

Please could you advise as to which of the above results is correct. I have attached the relevant files.

Thanks a lot as always!

Miguel

PS: While I'm here... I have one more question :)

I have a dataset with ~500 sequences and 1100 nucleotides - i.e. too few sequences relative to the number of sites in order to run GARD. So I have divided the dataset into random samples of 50 sequences and am running GARD on each of these smaller datasets. Is that what you would advise?

Thanks again....

http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=GARD.zip (8 KB)
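
The subsampling described in the PS (splitting ~500 sequences into random groups of 50) could be scripted roughly as below. This only illustrates the mechanics; whether the approach is advisable for GARD is not answered in this thread. The sequence names, the function name, and the choice of non-overlapping groups are assumptions made for the example.

```python
import random


def split_into_random_subsets(names, subset_size, seed=None):
    """Shuffle the sequence names and cut them into non-overlapping groups."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(names)
    rng.shuffle(shuffled)
    return [
        shuffled[i:i + subset_size]
        for i in range(0, len(shuffled), subset_size)
    ]


# Hypothetical sequence identifiers standing in for the real alignment.
names = [f"seq{i:03d}" for i in range(500)]
subsets = split_into_random_subsets(names, 50, seed=1)

print(len(subsets), "subsets of sizes", [len(s) for s in subsets])
```

Each subset would then be written out as its own alignment and analysed separately.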

Dear Miguel,

Could you try running the same code on my cluster (you should have an account there) with the current system-wide HYPHYMPI command, to confirm whether the error is still there in the newest version?

Also, small variations in p-values are expected from run to run, because the test is permutation based (10000 replicates).

Sergei
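
To get a feel for how much run-to-run variation 10000 replicates allow, one can use the binomial standard error of a p-value estimated as a fraction of replicates. This is a generic back-of-the-envelope calculation, not HyPhy's own code, and it assumes the replicates are independent; the function name is made up for illustration.

```python
import math


def monte_carlo_se(p, n_replicates):
    """Binomial standard error of a p-value estimated from n_replicates resamples."""
    return math.sqrt(p * (1.0 - p) / n_replicates)


for p in (0.004, 0.35):
    se = monte_carlo_se(p, 10000)
    print(f"p = {p}: standard error ~ {se:.5f}")
```

For p around 0.004 the standard error is roughly 0.0006, so the local values of 0.0044, 0.0038 and 0.0042 sit comfortably within sampling noise. For p around 0.35 it is roughly 0.005, which covers the two cluster values (0.3469 and 0.354) but nowhere near the gap between the local and cluster results.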

Quote:

Could you try running the same code on my cluster (you should have an account there) with the current system-wide HYPHYMPI command, to confirm whether the error is still there in the newest version?

I wanted to run it on your cluster but seem to be having problems connecting:

ssh_exchange_identification: Connection closed by remote host

Quote:

Also, small variations in p-values are expected from run to run, because the test is permutation based (10000 replicates).

Thanks - that makes sense!

Miguel

Datamonkey Server >> Datamonkey feedback >> KH test error
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1234547144
Message started by Sundy on Feb 13th, 2009 at 9:45am