HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Datamonkey Server >> Datamonkey feedback >> KH test error
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1234547144

Message started by Sundy on Feb 13th, 2009 at 9:45am

Title: KH test error
Post by Sundy on Feb 13th, 2009 at 9:45am
Dear everyone,

Has anyone run into an error like the following when running the KH test for GARD with HyPhy?

Operation MAccess is not defined for 0
Current BL command: jvec[1][k]=vec2[k]
Current task has been terminated. Would you like to see the remaining error messages, if there are any?

I have run into this problem several times recently. The run always stops in the middle of the analysis, so I cannot get the final results of the KH test.
Could anyone tell me how to fix it? I would appreciate it!!

Best regards,

Sundy

Title: Re: KH test error
Post by Sergei on Feb 13th, 2009 at 11:31am
Dear Sundy,

Try using the attached file (drop it into TemplateBatchFiles).
There was a bug in the older versions of KHTest.bf that could result in the error that you are seeing.

Keep those bug reports coming:)

Cheers,
Sergei
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KHTest.bf (7 KB)

Title: Re: KH test error
Post by Sundy on Feb 13th, 2009 at 1:41pm
Dear Sergei,

Thank you so much. But I am sorry to say that the error is still there.
The attached file contains my sequence file and GARD.splits file. Could you please take a look and see whether something is wrong with these files?

Thank you!

Best regards,

Sundy
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KH_001.zip (7 KB)

Title: Re: KH test error
Post by Sergei on Feb 13th, 2009 at 3:27pm
Dear Sundy,

Please try replacing GARDProcessor.bf in TemplateBatchFiles with the file I attach. I was able to run your example through the script without problems.

Cheers,
Sergei
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=GARDProcessor_001.bf (17 KB)

Title: Re: KH test error
Post by Sundy on Feb 14th, 2009 at 7:20am
Dear Sergei,

Thank you so much!!!! This problem was fixed :D :D
Cheers,

Sundy

Title: Re: KH test error
Post by Sundy on Feb 14th, 2009 at 8:30am
Dear Sergei,

I am sorry to trouble you again.
I have another question about the new "GARDProcessor.bf" you gave me.

After I replaced the old GARDProcessor.bf in TemplateBatchFiles with this new file, the problem mentioned above went away. But when I reran some of my previous data, which had several significant breakpoints with the old GARDProcessor.bf, I found that there were no significant breakpoints anymore.

So I am a little confused. Which GARDProcessor.bf results are more credible?

For example, in the attached file, the old GARDProcessor.bf showed 4/5 significant, but the new GARDProcessor.bf indicated 0/5 significant.


http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=10ORF.zip (11 KB)

Title: Re: KH test error
Post by Sergei on Feb 15th, 2009 at 1:37pm
Dear Sundy,

The results from the new script are correct; there was a bug in the older version. Some of the breakpoints in your alignment are still significant at the 0.05 level, just not at the 0.01 level, which is the default in the script. I can modify the script to let you select the p-value if you'd like.

Cheers,
Sergei

Title: Re: KH test error
Post by Sundy on Feb 15th, 2009 at 4:33pm
Dear Sergei,

Thank you so much!
I would greatly appreciate it if you could modify the script to let me select the p-value.

I have another question: does the KH test in HyPhy already include Bonferroni’s correction, or not?

For the file above, the GARD and new KH test results are as follows:
Breakpoint location    LHS vs. RHS    RHS vs. LHS
      454                 0.031          0.033
      729                <0.001          0.022
     1091                 0.006          0.001
     1927                 0.001          0.026
     3321                 0.034          0.411
The KH test does not treat site 1091 as significant, even though both p-values are below 0.01, so I think this is due to Bonferroni’s correction: after the correction, 1091 is significant at the 0.05 level but not at the 0.01 level. Is my understanding right?

As I understand it, for this test with 5 breakpoints, a raw p < 0.01 corresponds to significance at the 0.05 level (0.05/5), and a raw p < 0.002 corresponds to significance at the 0.01 level (0.01/5). Is this right?

Thank you very much!!

Sundy

Title: Re: KH test error
Post by Sergei on Feb 16th, 2009 at 10:32am
Dear Sundy,

I've modified GARDProcessor (attached) to summarize KH p-values (both raw and Bonferroni-corrected) at the end of the run and to also report how many KH-significant breakpoints there are at 3 different significance levels. This should make the interpretation of GARD results easier.

Example output follows:

[code]
Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
      454 |   0.46570 |        1.00000 |   0.00290 |        0.02900
      729 |   0.00020 |        0.00200 |   0.00010 |        0.00100
     1091 |   0.25400 |        1.00000 |   0.00580 |        0.05800
     1927 |   0.02710 |        0.27100 |   0.22930 |        1.00000
     3321 |   0.40520 |        1.00000 |   0.00010 |        0.00100

At p = 0.01 there are 1 significant breakpoints
At p = 0.05 there are 1 significant breakpoints
At p = 0.1 there are 1 significant breakpoints

[/code]
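
In the example output above, each adjusted p-value equals the raw p-value multiplied by 10 and capped at 1. That is consistent with a Bonferroni factor of 2 x (number of breakpoints), i.e. one test for each side of each of the 5 breakpoints. The following is a minimal Python sketch of that arithmetic only; the factor is inferred from the printed numbers, not taken from GARDProcessor.bf, and the function name is hypothetical.

```python
# Sketch only: reproduces the arithmetic visible in the example table.
# ASSUMPTION: the Bonferroni factor is 2 * n_breakpoints (one test per side),
# inferred from the printed output rather than from the script source.

def bonferroni_adjust(raw_p, n_tests):
    """Bonferroni-adjust one raw p-value, capping the result at 1."""
    return min(1.0, raw_p * n_tests)

# (breakpoint, LHS raw p, RHS raw p) copied from the example output
rows = [
    (454, 0.4657, 0.0029),
    (729, 0.0002, 0.0001),
    (1091, 0.2540, 0.0058),
    (1927, 0.0271, 0.2293),
    (3321, 0.4052, 0.0001),
]

n_tests = 2 * len(rows)  # 10 tests for 5 breakpoints

adjusted = {
    bp: (bonferroni_adjust(lhs, n_tests), bonferroni_adjust(rhs, n_tests))
    for bp, lhs, rhs in rows
}

for bp, (lhs_adj, rhs_adj) in adjusted.items():
    print(f"{bp:>10} | {lhs_adj:.5f} | {rhs_adj:.5f}")
```

Under this reading, a raw p-value has to fall below 0.05/10 = 0.005 (not 0.05/5) to pass at the 0.05 level.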

Cheers,
Sergei
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=GARDProcessor_002.bf (17 KB)

Title: Re: KH test error
Post by Sundy on Feb 16th, 2009 at 11:21am
Dear Sergei,

Thank you very very very much!!!
This is really great. It is easy to use for someone like me.

But I found another problem:
When I ran this dataset on my computer again, I got totally different results from the ones you posted above (see below).
I thought my HyPhy might not be the latest version, so I downloaded the latest HyPhy (2008.5.8), but I still got these results.

Could you please help me figure out what the problem is?

Thank you so much!!
Best regards,

Sundy


Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
      454 |   0.03630 |        0.36300 |   0.03060 |        0.30600
      729 |   0.02100 |        0.21000 |   0.00010 |        0.00100
     1091 |   0.00100 |        0.01000 |   0.00780 |        0.07800
     1927 |   0.02540 |        0.25400 |   0.00050 |        0.00500
     3321 |   0.40440 |        1.00000 |   0.03530 |        0.35300

At p = 0.01 there are 0 significant breakpoints
At p = 0.05 there are 0 significant breakpoints
At p = 0.1 there are 1 significant breakpoints

Mean splits identify: 0.15
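
The summary counts above are consistent with counting a breakpoint as significant only when both its LHS and RHS adjusted p-values are at or below the threshold. A minimal Python sketch of that counting rule follows; the rule (including the use of "<=" rather than "<") is an assumption inferred from the printed numbers, not taken from the script, and the function name is hypothetical.

```python
# Sketch only. ASSUMPTION: a breakpoint counts as significant at a given level
# when BOTH adjusted p-values are <= that level (inferred from the output).

# (breakpoint, LHS adjusted p, RHS adjusted p) copied from the table above
adjusted_rows = [
    (454, 0.363, 0.306),
    (729, 0.210, 0.001),
    (1091, 0.010, 0.078),
    (1927, 0.254, 0.005),
    (3321, 1.000, 0.353),
]

def count_significant(rows, level):
    """Count breakpoints whose larger adjusted p-value is within the level."""
    return sum(1 for _, lhs, rhs in rows if max(lhs, rhs) <= level)

counts = {level: count_significant(adjusted_rows, level)
          for level in (0.01, 0.05, 0.1)}

for level, n in counts.items():
    print(f"At p = {level} there are {n} significant breakpoints")
```

With these values only breakpoint 1091 qualifies, and only at the 0.1 level, because its larger adjusted p-value is 0.078.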


Title: Re: KH test error
Post by Sergei on Feb 16th, 2009 at 12:42pm
Dear Sundy,

I am using a developmental version of HyPhy (not available for download yet). Let me confirm the results and get back to you.

Cheers,
Sergei

Title: Re: KH test error
Post by Sergei on Feb 16th, 2009 at 1:29pm
Dear Sundy,

Your version is giving the correct p-values. Thanks for alerting me to this discrepancy - I found a bug in the developmental version because of it.

Cheers,
Sergei

Title: Re: KH test error
Post by Sundy on Feb 16th, 2009 at 1:42pm
Dear Sergei,

Thank you so much!! :D :D :D
I finally figured out all my questions about GARD so far (with your help).
I really appreciate your quick replies every time.
Thanks!!

Sundy

Title: Re: KH test error
Post by Sergei on Feb 18th, 2009 at 12:10pm
Dear Sundy,

Thank you very much for bringing the discrepancy between your results and the ones I was getting with the prerelease version to my attention. This helped me identify a serious bug in the new likelihood evaluation routines. Thanks for helping make HyPhy a better product.

Best,
Sergei

Title: Re: KH test error
Post by Miguel Lacerda on Apr 28th, 2009 at 4:46am
Dear Sergei

I have performed a GARD analysis and have noticed a few idiosyncrasies when processing the results using the latest versions of GARDProcessor.bf and KHTest.bf posted in this thread.

Firstly, running GARDProcessor.bf multiple times on the same dataset and splits file produces (slightly) different results. For example, here are the KH tests for 3 runs with the same input files:


[code]
RUN 1:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.72380305908;
Fitting tree 2 to partition 1
Log Likelihood = -4730.55612145537;
KH Testing partition 1
Tree 2 base LRT = 149.665. p-value = 0.0044


Fitting tree 1 to partition 2
Log Likelihood = -5043.76888590518;
Fitting tree 2 to partition 2
Log Likelihood = -4836.52025035318;
KH Testing partition 2
Tree 1 base LRT = 414.497. p-value = 0.0001


Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
      507 |   0.00010 |        0.00020 |   0.00440 |        0.00880
----------------------------------------------------------------------

RUN 2:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.72380305908;
Fitting tree 2 to partition 1
Log Likelihood = -4730.55612145537;
KH Testing partition 1
Tree 2 base LRT = 149.665. p-value = 0.0038


Fitting tree 1 to partition 2
Log Likelihood = -5043.76888590518;
Fitting tree 2 to partition 2
Log Likelihood = -4836.52025035318;
KH Testing partition 2
Tree 1 base LRT = 414.497. p-value = 0.0001


Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
      507 |   0.00010 |        0.00020 |   0.00380 |        0.00760
----------------------------------------------------------------------

RUN 3:
----------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.31331109413;
Fitting tree 2 to partition 1
Log Likelihood = -4730.60159498021;
KH Testing partition 1
Tree 2 base LRT = 150.577. p-value = 0.0042


Fitting tree 1 to partition 2
Log Likelihood = -5043.72987443926;
Fitting tree 2 to partition 2
Log Likelihood = -4836.4869482817;
KH Testing partition 2
Tree 1 base LRT = 414.486. p-value = 0.0001

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
      507 |   0.00010 |        0.00020 |   0.00420 |        0.00840

------------------------------------------------------------------------
[/code]
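
The "base LRT" values in the output above match twice the difference between the two log likelihoods fitted to the same partition. A minimal Python check using the RUN 1 values is sketched below; the function name is hypothetical, and this only verifies the arithmetic of the printed numbers, not how KHTest.bf computes them internally.

```python
# Sketch only: checks that the printed "base LRT" equals
# 2 * |lnL(tree A) - lnL(tree B)| for the same partition.

def base_lrt(lnl_a, lnl_b):
    """Twice the absolute difference between two log likelihoods."""
    return 2.0 * abs(lnl_a - lnl_b)

# Log likelihoods copied from RUN 1 above
lrt_partition_1 = base_lrt(-4655.72380305908, -4730.55612145537)
lrt_partition_2 = base_lrt(-5043.76888590518, -4836.52025035318)

print(f"partition 1: {lrt_partition_1:.3f}")  # 149.665
print(f"partition 2: {lrt_partition_2:.3f}")  # 414.497
```

The small likelihood differences in RUN 3 shift the LRT in the same way (150.577 and 414.486), while the p-values depend on the resampling step as well.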

You'll notice that the likelihoods and p-values differ slightly between runs. Could this just be due to different starting values in the optimisation? (I'm actually just curious - the differences in the results are pedantic!)

Of more concern to me is that I get very different results when I run the processor file locally vs on my cluster with HYPHYMP_DEV SVN415 (probably due to different versions of HyPhy). I obtained the above results on my machine, but this is what I get from two runs on the cluster:

[code]
RUN 1:
---------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.74764230272;
Fitting tree 2 to partition 1
Log Likelihood = -4730.57601129128;
KH Testing partition 1
Tree 2 base LRT = 149.657. p-value = 0.3469

Fitting tree 2 to partition 2
Log Likelihood = -4836.49821145768;
KH Testing partition 2
Tree 1 base LRT = 414.467. p-value = 0.0001

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
      507 |   0.00010 |        0.00020 |   0.34690 |        0.69380

---------------------------------------------------------------------


RUN 2:
---------------------------------------------------------------------
Fitting tree 1 to partition 1
Log Likelihood = -4655.74764230272;
Fitting tree 2 to partition 1
Log Likelihood = -4730.57601129128;
KH Testing partition 1
Tree 2 base LRT = 149.657. p-value = 0.354


Fitting tree 1 to partition 2
Log Likelihood = -5043.73191448845;
Fitting tree 2 to partition 2
Log Likelihood = -4836.49821145768;
KH Testing partition 2
Tree 1 base LRT = 414.467. p-value = 0.0001

Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
      507 |   0.00010 |        0.00020 |   0.35400 |        0.70800

[/code]

Again, I have copied the GARDProcessor.bf and KHTest.bf files from this thread into TemplateBatchFiles on the cluster. As you can see, the breakpoint is no longer significant.

Please could you advise as to which of the above results is correct. I have attached the relevant files.

Thanks a lot as always!

Miguel

PS: While I'm here... I have one more question :)

I have a dataset with ~500 sequences and 1100 nucleotides - i.e. too few sequences relative to the number of sites in order to run GARD. So I have divided the dataset into random samples of 50 sequences and am running GARD on each of these smaller datasets. Is that what you would advise?
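
The subsampling described above could be sketched in Python as follows. This assumes the samples are disjoint (each sequence used once), which the post does not specify; the sequence names, function name, and seed are hypothetical.

```python
import random

# Sketch only. ASSUMPTION: the random samples are disjoint, so ~500 sequences
# yield about 10 samples of 50. Drawing overlapping samples is also possible.

def random_subsamples(seq_ids, sample_size, seed=None):
    """Shuffle the sequence IDs and cut them into chunks of sample_size."""
    rng = random.Random(seed)
    shuffled = list(seq_ids)
    rng.shuffle(shuffled)
    return [shuffled[i:i + sample_size]
            for i in range(0, len(shuffled), sample_size)]

# Hypothetical sequence names standing in for the real alignment
ids = [f"seq{i:03d}" for i in range(500)]
samples = random_subsamples(ids, 50, seed=1)

print(len(samples), "samples of", len(samples[0]), "sequences each")
```

Each list of IDs would then be used to extract the corresponding sequences into its own alignment file before running GARD on it.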

Thanks again....






http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=GARD.zip (8 KB)

Title: Re: KH test error
Post by Sergei on Apr 28th, 2009 at 11:54am
Dear Miguel,

Could you try running the same code on my cluster (you should have an account there) with the current systemwide HYPHYMPI command, to confirm whether the error is still present in the newest version?

Also, small variations in p-values are expected from run to run, because the test is permutation based (10000 replicates).
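
A rough way to gauge how much run-to-run variation to expect: if a p-value is estimated as a fraction of N replicates, its standard error is approximately sqrt(p(1-p)/N). The Python sketch below applies this with N = 10000; the binomial approximation is an assumption about how the p-value is estimated, not something taken from KHTest.bf, and the function name is hypothetical.

```python
import math

# Sketch only. ASSUMPTION: the reported p-value behaves like a binomial
# proportion over n_replicates independent resamples.

def mc_standard_error(p, n_replicates):
    """Approximate standard error of a resampling-based p-value estimate."""
    return math.sqrt(p * (1.0 - p) / n_replicates)

n_replicates = 10000

se_small = mc_standard_error(0.0042, n_replicates)  # p near the local runs
se_large = mc_standard_error(0.35, n_replicates)    # p near the cluster runs

print(f"SE at p = 0.0042: {se_small:.5f}")
print(f"SE at p = 0.35:   {se_large:.5f}")
```

Under this approximation the local runs (0.0038 to 0.0044) differ by roughly one standard error, whereas the gap between the local values (about 0.004) and the cluster values (about 0.35) is far larger than sampling noise would explain.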

Sergei

Title: Re: KH test error
Post by Miguel Lacerda on Apr 29th, 2009 at 4:40am

Quote:
Could you try running the same code on my cluster (you should have an account there) with the current systemwide HYPHYMPI command, to confirm whether the error is still present in the newest version?

I wanted to run it on your cluster but seem to be having problems connecting:

ssh_exchange_identification: Connection closed by remote host



Quote:
Also, small variations in p-values are expected from run to run, because the test is permutation based (10000 replicates).

Thanks - that makes sense!

Miguel
