KH test on GARD analysis results (Read 9735 times)
Phil485
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 9
KH test on GARD analysis results
Oct 18th, 2010 at 1:13am
 
Hi,

I am running the GARD analysis (GARD.bf) on a cluster in MPI mode. Now I want to perform a KH test to assess the significance of the detected breakpoints. I am not sure which of the TemplateBatchFiles performs this analysis in the same way as the Datamonkey server does. I tried to run KHTest.bf as well as GARDProcessor.bf on my GARD analysis files. However, I am not sure which input I have to provide and which parameters to choose to get a meaningful KH test.
The values I have obtained so far from KHTest.bf differ from the ones obtained on the Datamonkey server, and I often get the following message when calculating p-values: ERROR: Both LRTs were expected to be positive. Please check your trees and partition. Also, when running GARDProcessor.bf, I get an error:
"Tree 1 base LRT = 7.86179. p-value = 0.859
Tree 3 base LRT = 7.86178. p-value = Error:MPI Node:0
Operation MAccess is not defined for 0
Current BL Command:jvec[1][k]=vec2[k]"

How do I properly run a KH test on the GARD analysis? Which files from my GARD analysis do I have to provide as input, and how should I set the parameters?

Thanks for any help on this topic!

Phil
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: KH test on GARD analysis results
Reply #1 - Oct 19th, 2010 at 6:04pm
 
Hi Phil,

Take a look at this thread [link visible to registered members only] for some pointers and let me know if you experience further issues. The errors that you are seeing are unusual. Could you post the datamonkey job ID for me to look at and see what is going on?

Sergei
Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Phil485
Re: KH test on GARD analysis results
Reply #2 - Oct 19th, 2010 at 11:10pm
 
Hi,

Thanks for your reply. The Job ID for such a GARD analysis is 709348183612766.1
However, on the datamonkey server the analysis runs fine. The problem occurs when I run the GARDProcessor.bf analysis on our own server (I want to handle a few hundred files...). As input data for GARDProcessor.bf I used the output files "_finalout" and "_splits" from the GARD.bf analysis.
I executed the batch file as follows: mpirun -n 2 ./HYPHYMPI GARDProcessor.bf (for MPI environment)
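For a batch of a few hundred GARD runs, the first step is matching each "_finalout" file with its "_splits" file. The following is a minimal Python sketch of that pairing only. The function name pair_gard_outputs and the single-directory layout are assumptions for illustration, and how each pair is then handed to GARDProcessor.bf is deliberately left out.

```python
from pathlib import Path


def pair_gard_outputs(directory):
    """Return (finalout, splits) path pairs that share the same prefix.

    Hypothetical helper: assumes all GARD.bf outputs sit in one directory
    and follow the "<prefix>_finalout" / "<prefix>_splits" naming.
    """
    pairs = []
    for finalout in sorted(Path(directory).glob("*_finalout")):
        prefix = finalout.name[: -len("_finalout")]
        splits = finalout.with_name(prefix + "_splits")
        # Skip runs whose partition file is missing instead of failing later.
        if splits.exists():
            pairs.append((finalout, splits))
    return pairs
```

Runs without a matching "_splits" file are skipped, so an incomplete GARD run does not stop the batch.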

Thanks a lot for your help. If you need further information, let me know.

Cheers

Phil
 
Sergei
Re: KH test on GARD analysis results
Reply #3 - Oct 20th, 2010 at 5:46pm
 
Hi Phil,

You can execute GARDProcessor.bf in a single-processor environment (it doesn't make use of MPI), but that shouldn't matter.
I tried running your file through the current version of GARD.bf and GARDProcessor.bf and everything worked well.

Code:
Breakpoint | LHS Raw p | LHS adjusted p | RHS Raw p | RHS adjusted p
       458 |   0.19220 |        0.76880 |   0.03920 |        0.15680
       837 |   0.20450 |        0.81800 |   0.19470 |        0.77880

At p = 0.01 there are 0 significant breakpoints
At p = 0.05 there are 0 significant breakpoints
At p = 0.1 there are 0 significant breakpoints

Mean splits identify: 0.380952
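
Each adjusted value in the table above equals the raw value multiplied by four (two breakpoints times two sides), which is what a Bonferroni correction would produce. The Python sketch below reproduces that arithmetic. It assumes that GARDProcessor.bf applies a Bonferroni correction, and that a breakpoint counts as significant only when both its LHS and RHS adjusted p-values are at or below the threshold. Neither assumption is confirmed in this thread, and the function names are hypothetical.

```python
def bonferroni(raw_p_values):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    n_tests = len(raw_p_values)
    return [min(1.0, p * n_tests) for p in raw_p_values]


def count_significant(raw_by_breakpoint, alpha):
    """Count breakpoints whose LHS and RHS adjusted p-values are both <= alpha.

    Assumption: both sides must pass, and every side of every breakpoint
    counts as one test.
    """
    n_tests = 2 * len(raw_by_breakpoint)
    count = 0
    for lhs, rhs in raw_by_breakpoint.values():
        if min(1.0, lhs * n_tests) <= alpha and min(1.0, rhs * n_tests) <= alpha:
            count += 1
    return count


# Raw (LHS, RHS) p-values copied from the table above.
raw = {458: (0.19220, 0.03920), 837: (0.20450, 0.19470)}
for threshold in (0.01, 0.05, 0.1):
    print(threshold, count_significant(raw, threshold))
```

With these inputs no breakpoint passes at any of the three thresholds, which agrees with the summary lines in the output.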

 



Could you confirm the version of HyPhy (just launch HYPHYMP or HYPHYMPI from the install directory and look for the version as in the text below):

Code:
	 /HYPHY 2.0020101015beta(MP) for Linux on x86_64\
***************** TYPES OF STANDARD ANALYSES *****************


...
 



Could you also attach the _finalout and _splits files that gave you the error?

Sergei
 
Phil485
Re: KH test on GARD analysis results
Reply #4 - Oct 20th, 2010 at 11:57pm
 
Hi,

The version I have installed:

/HYPHY 1.0020080508beta(MPI) for Linux on x86_64\      
***************** TYPES OF STANDARD ANALYSES *************
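
When checking many installations, the banner line can be parsed instead of read by eye. This is a minimal Python sketch. It assumes the banner has the shape shown in this thread, and that the digits after "1.00" or "2.00" are a build date in YYYYMMDD form. That reading fits the banners quoted here but is not documented in the thread, and parse_banner is a hypothetical helper.

```python
import re
from datetime import date

# Assumed layout: "HYPHY <major>.<2-digit minor><YYYYMMDD>beta(<build>)"
BANNER = re.compile(r"HYPHY (\d+)\.(\d{2})(\d{4})(\d{2})(\d{2})beta\((\w+)\)")


def parse_banner(line):
    """Return version fields from a HyPhy banner line, or None if no match."""
    match = BANNER.search(line)
    if match is None:
        return None
    major, minor, year, month, day, build = match.groups()
    return {
        "major": int(major),
        "minor": int(minor),
        "build_date": date(int(year), int(month), int(day)),
        "build": build,
    }


info = parse_banner("/HYPHY 1.0020080508beta(MPI) for Linux on x86_64")
print(info["major"], info["build_date"], info["build"])
```

A major version of 1 in the parsed result would flag an installation as the older 1.0 line.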

I attached the input and output files of the HYPHY GARD.bf analysis.

If I run HYPHY in a single processor environment the GARDProcessor analysis stops with the same error:

>Tree 3 base LRT = 8.47178. p-value = Error:MPI Node:0
>Operation MAccess is not defined for 0
>Current BL Command:jvec[1][k]=vec2[k]
>Segmentation fault


I used 'GARD_BQ00320.out_finalout' and 'GARD_BQ00320.out_splits' as input for the alignment file and the GA partition result file, respectively.

Thank you so much for your support.

Phil
 
Phil485
Re: KH test on GARD analysis results
Reply #5 - Oct 21st, 2010 at 2:43am
 
Hi

I think I have figured out what the problem might have been. In my _splits file, there are two returns (empty lines) before the different trees are described. I removed those returns and GARDProcessor.bf ran without error. Additionally, when comparing the NEXUS output file and the splits file, I saw that the nucleotide positions of the partitions are shifted by one!?
I am not sure whether the analysis ran properly. I attached the output as a text file. Maybe you could have a look. Further, I can only find one p-value for each partition, but I should get two p-values for each breakpoint (RHS and LHS). How can I obtain the other two values? And my values are different from the ones in your analysis!
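
The manual clean-up described above can be scripted. The Python sketch below simply drops empty lines from a text file's content. It is only a workaround for the older version, since later replies in this thread point to upgrading HyPhy as the actual fix, and drop_blank_lines is a hypothetical helper that knows nothing about the _splits format beyond it being line-based text.

```python
def drop_blank_lines(text):
    """Return text with empty or whitespace-only lines removed."""
    kept = [line for line in text.splitlines() if line.strip()]
    return "\n".join(kept) + "\n"


# Placeholder content, not a real _splits file.
example = "first line\n\n\nsecond line\n"
print(drop_blank_lines(example), end="")
```

Writing the cleaned text to a new file rather than overwriting the original keeps the unmodified GARD output available for comparison.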

Thanks for the help.

Philipp
 
Sergei
Re: KH test on GARD analysis results
Reply #6 - Oct 21st, 2010 at 6:07am
 
Hi Phil,

Looks like you are running an older version of HyPhy (based on the output).
Could you download and compile the latest 2.0 (not 1.0) version?

The off-by-one coordinates in NEXUS files are handled correctly (most of HyPhy's internal coordinates are 0-based, whereas NEXUS is 1-based).
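
To make the shift concrete, here is a small Python sketch of converting between 0-based and 1-based site positions. The helper names are hypothetical and the sketch only illustrates the general convention. It does not reproduce how HyPhy writes either file.

```python
def to_one_based(position):
    """Convert a 0-based site index to a 1-based (NEXUS-style) position."""
    if position < 0:
        raise ValueError("0-based positions start at 0")
    return position + 1


def to_zero_based(position):
    """Convert a 1-based (NEXUS-style) position to a 0-based site index."""
    if position < 1:
        raise ValueError("1-based positions start at 1")
    return position - 1


def range_to_one_based(start, end):
    """Format an inclusive 0-based site range as a 1-based 'start-end' string."""
    return f"{to_one_based(start)}-{to_one_based(end)}"


print(range_to_one_based(0, 99))
```

The same partition therefore shows up with every coordinate one higher in the 1-based file, which is the shift noticed when comparing the two outputs.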

Sergei
 
Phil485
Re: KH test on GARD analysis results
Reply #7 - Oct 23rd, 2010 at 2:01am
 
Hi

So, I installed the newer version of HyPhy and now the analysis seems to run smoothly. The output looks very much the same as in your case. Now I can start analyzing my entire data sets!

Thanks a lot for all your help.

Cheers

Phil
 
Sergei
Re: KH test on GARD analysis results
Reply #8 - Oct 25th, 2010 at 12:12pm
 
Hi Phil,

I am glad the upgrade solved your problems. Please let me know if you run into any other issues.

Sergei
 
Phil485
Re: KH test on GARD analysis results
Reply #9 - Oct 26th, 2010 at 2:01pm
 
Hi Sergei

The GARD analysis is now running. However, the results of the analysis (breakpoints and significance) are somewhat different when I run it on our own server compared to the datamonkey server. I attached my input and output files for one example (test1). For this alignment, I obtained one breakpoint with the datamonkey analysis (significant, p = 0.1), whereas two different breakpoints were obtained when running the analysis on our own server (neither was significant). Also, the number and position of the identified breakpoints can differ when I run the analysis repeatedly. This might sound weird, but if, for example, I change the header names of my FASTA input file, the output is different! If I use the same files on the datamonkey server, the output is the same, meaning that the problem seems to be restricted to the version I installed on our server.

Thanks for your help!

Phil
 
Phil485
Re: KH test on GARD analysis results
Reply #10 - Oct 26th, 2010 at 2:08pm
 
[OOPS - sorry edited your post instead of responding to it. Sergei]

Hi Phil,

Definitely allow for rate variation, otherwise you may have GARD flagging segments that evolve faster or slower than others as recombinant (although the KH test should pick that up). I would recommend GTR + 3 bin general discrete distribution as a good default.

Sergei
 
Sergei
Re: KH test on GARD analysis results
Reply #11 - Oct 26th, 2010 at 4:38pm
 
Hi Phil,

This file is actually a good example of how leaving out rate variation leads to 'breakpoint' identification that is really driven by rate variation. You should expect some variability from run to run, because the algorithm is stochastic and what is reported to the screen is simply the 'best' model; there may be many others that are nearly equally good, but with breakpoints slightly moved (datamonkey reports this spread in breakpoint location plots). Also, as you can see from the run on your file (with no rate variation), there is almost no c-AIC difference between a model with 1 BP and a model with 2 BP (0.3), suggesting that they are almost equally good:

Code:
Breakpoints    c-AIC  Delta c-AIC  [BP 1]  [BP 2]  [BP 3]
          0  9061.54
          1  9044.88       16.657      92
          2  9044.60        0.285      92     303
          3  9044.60        0.000      92     303
GA has considered	  1726/     8649124 (3922 over all runs) unique models
Total run time	     0 hrs 1 mins 7 seconds
Throughput		   58.54 models/second
Allocated time remaining 999 hrs 58 mins 53 seconds (approx. 2.1073e+08 more models.)

 



When you run this file WITH rate variation, no recombination breakpoints are found.
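
The "Delta c-AIC" column above is the drop in c-AIC gained by allowing one more breakpoint. The Python sketch below recomputes it from the rounded c-AIC values in the table, so its results agree with the printed column only to about two decimals. The function name caic_improvements is hypothetical.

```python
def caic_improvements(caic_by_breakpoints):
    """Return the c-AIC drop gained by each additional breakpoint.

    Keys are breakpoint counts; values are c-AIC scores (lower is better).
    """
    counts = sorted(caic_by_breakpoints)
    return {
        current: round(caic_by_breakpoints[previous] - caic_by_breakpoints[current], 2)
        for previous, current in zip(counts, counts[1:])
    }


# c-AIC values copied from the table above.
caic = {0: 9061.54, 1: 9044.88, 2: 9044.60, 3: 9044.60}
for n_breakpoints, gain in caic_improvements(caic).items():
    print(n_breakpoints, gain)
```

The first breakpoint improves the score by roughly 16.7 units and the second by under 0.3, which is the basis for calling the 1 BP and 2 BP models almost equally good.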

Sergei
 
Phil485
Re: KH test on GARD analysis results
Reply #12 - Oct 27th, 2010 at 3:16am
 
Hi Sergei

OK, in the future I will run my analysis with GTRrev + 3 bins. However, I ran into another issue which might be related. I want to analyze alignments consisting of only 4 sequences. If I run these files locally, GARD detects significant recombination breakpoints in a large proportion of the alignments. The Datamonkey version detects similar breakpoints, but they are not significant in any of the tested alignments. I have to say that I trust the output from datamonkey more than my local version of GARD, as no significant recombination was detected for exactly the same genes when the number of taxa included in the alignment was increased.
I have no idea why the KH testing gives totally different output for the two analyses (Datamonkey vs. local version). Also, the p-values are odd: for each of the significant breakpoints they are more or less the same. This is true for most of the alignments I have analyzed so far (significant recombination detected in more than 50% of the analyzed genes). I attached my input and output files for one example (test2; with either four taxa or ten taxa). I also attached a screenshot of the datamonkey result output (in that case the p-values are all 1!?).

Thank you very much for all your help. I really appreciate that!

Best

Philipp
« Last Edit: Oct 27th, 2010 at 1:32pm by Sergei »
 
Sergei
Re: KH test on GARD analysis results
Reply #13 - Oct 27th, 2010 at 1:34pm
 
Hi Philipp,

Thanks very much for bringing this to my attention. GARDProcessor.bf in the distribution was not correctly processing the case of IDENTICAL trees between two successive partitions (when the KH p-value is obviously 1). This (and another bug I managed to introduce with a code push yesterday) have been fixed. Please download the latest HyPhy version and try again.
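
To illustrate why identical trees should give a KH p-value of 1, here is a minimal Python sketch of a resampling-based KH-style test on per-site log-likelihoods. It is not the GARDProcessor.bf implementation. The function name, the centering step, the one-sided p-value, and the toy inputs are all assumptions made for illustration.

```python
import random


def kh_style_pvalue(site_lnl_a, site_lnl_b, replicates=1000, seed=0):
    """One-sided resampling p-value for 'tree A fits better than tree B'.

    Inputs are per-site log-likelihoods of the same sites under each tree.
    """
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n_sites = len(diffs)
    observed = sum(diffs)
    # Center the differences so the resampled sums follow the null
    # hypothesis of no difference between the two trees.
    mean = observed / n_sites
    centered = [d - mean for d in diffs]
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(replicates):
        resampled = sum(rng.choice(centered) for _ in range(n_sites))
        if resampled >= observed:
            at_least_as_large += 1
    return at_least_as_large / replicates


# Identical trees give identical per-site log-likelihoods.
same = [-2.0, -3.5, -1.25, -4.0]
print(kh_style_pvalue(same, same))
```

When the two trees are identical, every per-site difference is zero, so every resampled sum ties with the observed sum and the p-value is 1.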

Your patience and reports of odd behavior are very much appreciated.

Sergei
 
Phil485
Re: KH test on GARD analysis results
Reply #14 - Oct 27th, 2010 at 3:03pm
 
Hi Sergei

Thanks for your reply! I downloaded the following version and I still experience the same problem: GARD is predicting 2-3 significant breakpoints in the example I had sent you before.

/HYPHY 2.0020101022beta(MPI) for Linux on x86_64\
***************** TYPES OF STANDARD ANALYSES *****************

Is this the current version? I downloaded it from [link visible to registered members only] (v2.0 developmental build).

Thanks

Philipp