GA branch method error
Kenes
GA branch method error
Feb 23rd, 2011 at 9:28am
 
Hello,

I'm trying to run a GA branch analysis on 32 sequences of length 1869, so we had to build a version on our own cluster. Our cluster now supports HYPHYMPI, albeit in non-interactive mode, so arguments have to be supplied before submitting the job. Cluster jobs kept dying without any error log files, so I tried running the analysis with HYPHYMP on a high-memory machine to see if there were any discernible errors. The following is the output from such a run:

--------------------------------------------------------------
$ ./HYPHYMP ModelSelectorBranchLocal.bf < tempinput


                       +-------------------+
                       |Choose Genetic Code|
                       +-------------------+


       (1):[Universal] Universal code. (Genebank transl_table=1).
       (2):[Vertebrate mtDNA] Vertebrate mitochondrial DNA code. (Genebank transl_table=2).
       (3):[Yeast mtDNA] Yeast mitochondrial DNA code. (Genebank transl_table=3).
       (4):[Mold/Protozoan mtDNA] Mold, Protozoan and Coelenterate mitochondrial DNA and the Mycloplasma/Spiroplasma code. (Genebank transl_table=4).
       (5):[Invertebrate mtDNA] Invertebrate mitochondrial DNA code. (Genebank transl_table=5).
       (6):[Ciliate Nuclear] Ciliate, Dasycladacean and Hexamita Nuclear code. (Genebank transl_table=6).
       (7):[Echinoderm mtDNA] Echinoderm mitochondrial DNA code. (Genebank transl_table=9).
       (8):[Euplotid Nuclear] Euplotid Nuclear code. (Genebank transl_table=10).
       (9):[Alt. Yeast Nuclear] Alternative Yeast Nuclear code. (Genebank transl_table=12).
       (10):[Ascidian mtDNA] Ascidian mitochondrial DNA code. (Genebank transl_table=13).
       (11):[Flatworm mtDNA] Flatworm mitochondrial DNA code. (Genebank transl_table=14).
       (12):[Blepharisma Nuclear] Blepharisma Nuclear code. (Genebank transl_table=15).
       (13):[Chlorophycean mtDNA] Chlorophycean Mitochondrial Code (transl_table=16).
       (14):[Trematode mtDNA] Trematode Mitochondrial Code (transl_table=21).
       (15):[Scenedesmus obliquus mtDNA] Scenedesmus obliquus mitochondrial Code (transl_table=22).
       (16):[Thraustochytrium mtDNA] Thraustochytrium Mitochondrial Code (transl_table=23).

Please choose an option (or press q to cancel selection):
$PATH/Codon file to analyze::
$PATH/Please select a tree file for the data::
Nucleotide a 6-character model string specification (e.g. HKY85 = 010010):?
[Preliminary Step 1] A total of 61 branches. Fitting a nucleotide model to approximate branch lengths...
[Preliminary Step 2] Nucleotide LogL = -30752.5
[Preliminary Step 3] Base model has 76 parameters. Log-L = -27319.3. Mean dN/dS = 0.0816161
Error:
Invalid Matrix Index [0][0] in a -2 by 2 matrix.

Function call stack
1 : sortedScores[0][0]=-crapAIC
-----------------------------------------------------------------

The file tempinput contains the input arguments (these could also be entered at the prompt, but I chose this option to ensure the problem does not lie with the structure of this file):

-------------------------------------
1
nuc_aligned_noemptyrows.fasta
tree.newick
012345
---------------------------------------

nuc_aligned_noemptyrows.fasta is a valid FASTA file, aligned by codon position.
tree.newick contains a valid Newick string of the species in the FASTA file, with branch lengths included.

I suspect this error is the reason my analysis won't start properly in MPI mode (assuming, of course, that it is not caused by running this example on a single machine). Can anybody help me with this?
Thanks

Kevin
 
Sergei
YaBB Administrator
UCSD
Gender: male
Re: GA branch method error
Reply #1 - Feb 23rd, 2011 at 10:54am
 
Hi Kevin,

The error you are seeing is because the analysis assumes an MPI environment, does not do error checking for having >1 machine and dies when only 1 machine is present. The error occurs once the analysis is running already, hence your configuration seems OK. I would suggest passing absolute paths to the alignment and the tree files -- the current working directory for MPI jobs/HyPhy environment is frequently not the same one as it is for interactive sessions.
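One way to follow the absolute-path suggestion is to generate the answer file instead of typing it by hand. The sketch below is only an illustration: the function name `build_hyphy_input` is hypothetical, and the order of the answers (genetic code, alignment, tree, model string) is simply copied from the tempinput file shown in the first post.

```python
from pathlib import Path


def build_hyphy_input(genetic_code, alignment, tree, model, base_dir="."):
    """Return the answers fed to the batch file on stdin, one per line.

    The alignment and tree are converted to absolute paths so that the job
    does not depend on the working directory of the MPI environment.
    (Hypothetical helper; the answer order mirrors the tempinput file above.)
    """
    base = Path(base_dir).resolve()
    answers = [
        str(genetic_code),
        str((base / alignment).resolve()),  # an already absolute path is kept as-is
        str((base / tree).resolve()),
        model,
    ]
    return "\n".join(answers) + "\n"


if __name__ == "__main__":
    # Print the answers; redirect to a file such as tempinput if desired.
    print(build_hyphy_input(1, "nuc_aligned_noemptyrows.fasta",
                            "tree.newick", "012345"), end="")
```

The printed text can be redirected into tempinput and passed to HYPHYMP or HYPHYMPI on stdin in the same way as before.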

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Kenes
Re: GA branch method error
Reply #2 - Feb 25th, 2011 at 8:46am
 
Thank you for your reply!

We had to do some searching, and it turned out that it indeed had something to do with the path structure, as you suggested. Our infrastructure uses a module load system that immediately loads all the correct binaries and libraries (with the underlying path structure), but the new BranchGA files I downloaded had not been put in the right library. After fixing this we got the script working in MPI mode.
However, I fear I have another question/problem. After the analysis finished, I tried post-processing the output (on a single high-memory machine) and got the following error:


$ HYPHYMP BranchGAResultProcessor.bf

$path_to_hyphy/Please locate the output of a GA branch analysis::nuc_aligned_noemptyrows.fasta_ga_branch.out

[PHASE 0] Processing nuc_aligned_noemptyrows.fasta_ga_branch.out

$path_to_hyphy/Save HTML results to:results
[PHASE 1] Reading raw data
       Read 27460 model fits

[PHASE 2] Tabulating Rates
       Best overall c-AIC      = 54587.7
       Best c-AIC with 1 rates = 54812.000. Rate estimates (branches):    0.0816 (  61)
       Best c-AIC with 2 rates = 54624.500. Rate estimates (branches):    0.1705 (  33)   0.0518 (  28)
       Best c-AIC with 3 rates = 54601.500. Rate estimates (branches):    0.1830 (  27)   0.0475 (  21)   0.0802 (  13)
       Best c-AIC with 4 rates = 54594.200. Rate estimates (branches):    0.1669 (  17)   0.0447 (  17)   0.2267 (  13)   0.0676 (  14)
       Best c-AIC with 5 rates = 54589.000. Rate estimates (branches):    0.1724 (  14)   0.0438 (  14)   0.0587 (  11)   0.2351 (  13)   0.0901 (   9)
       Best c-AIC with 6 rates = 54587.700. Rate estimates (branches):    0.1723 (  15)   0.0440 (  13)   0.0901 (  11)   0.2350 (  11)   0.0000 (   2)   0.0587 (   9)
       Best c-AIC with 7 rates = 54588.500. Rate estimates (branches):    0.1721 (  15)   0.0447 (  12)   0.0306 (   1)   0.0901 (  10)   0.2348 (  11)   0.0000 (   2)   0.0587 (  10)

[PHASE 3] Computing Akaike Weights and Model Averaged Rate Estimates


[PHASE 4] Computing Confidence Sets and Generating Result Files
       All models in the 95% confidence set (CS):    3923
       Highest c-AIC in the CS: 54594.60
       Models in the 95% CS with 1 rates =      0
       Models in the 95% CS with 2 rates =      0
       Models in the 95% CS with 3 rates =      0
       Models in the 95% CS with 4 rates =    118
       Models in the 95% CS with 5 rates =   2421
       Models in the 95% CS with 6 rates =   1043
       Models in the 95% CS with 7 rates =    341

[PHASE 5] Making a PostScript tree plot
Error:
Operation 'MAccess' is not implemented/defined for a Number

Function call stack
1 : psLegend*(""+colorMx[0]+" "+colorMx[1]+" "+colorMx[2]+" setrgbcolor\n"+currentPosX+" "+currentPosY+" moveto\n")

Seeing as all scripts on our side should be in place now, could this be a HyPhy problem? If necessary, I can attach the nuc_aligned_noemptyrows.fasta_ga_branch.out file, but it seems to be a valid output file upon inspection.
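For readers unfamiliar with PHASE 3 and PHASE 4 of the log above, here is a minimal sketch of the standard Akaike weight formula, w_i = exp(-d_i/2) / sum_j exp(-d_j/2), where d_i is the difference between model i's c-AIC and the best c-AIC. It is applied only to the seven "best c-AIC per number of rates" values printed in PHASE 2, so it will not reproduce the PHASE 4 counts, which are computed over all 27460 model fits. Whether the result processor computes its weights in exactly this way is an assumption.

```python
import math

# Best c-AIC for each number of rate classes, copied from the PHASE 2 output.
best_caic = {1: 54812.0, 2: 54624.5, 3: 54601.5, 4: 54594.2,
             5: 54589.0, 6: 54587.7, 7: 54588.5}


def akaike_weights(scores):
    """Map each model to its Akaike weight (the weights sum to 1)."""
    best = min(scores.values())
    rel = {k: math.exp(-0.5 * (v - best)) for k, v in scores.items()}
    total = sum(rel.values())
    return {k: r / total for k, r in rel.items()}


def confidence_set(weights, level=0.95):
    """Smallest set of models, taken by decreasing weight, whose cumulative
    weight reaches the requested level."""
    chosen, cumulative = [], 0.0
    for k, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(k)
        cumulative += w
        if cumulative >= level:
            break
    return chosen


if __name__ == "__main__":
    weights = akaike_weights(best_caic)
    for k in sorted(weights):
        print(f"{k} rates: weight {weights[k]:.4f}")
    print("95% set (toy example):", confidence_set(weights))
```

With these seven values, the 6-rate model gets the largest weight, which matches the "Best overall c-AIC" line in the log.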

K
 
Sergei
Re: GA branch method error
Reply #3 - Feb 25th, 2011 at 8:52am
 
Hi Kevin,

Yes, indeed, this is a HyPhy bug. Basically the old version of GAProcessor couldn't correctly render trees with more than 5 branch classes. Try using the file I am attaching here...

Sergei
[Attachment: 15 KB]
 
Kenes
Re: GA branch method error
Reply #4 - Feb 25th, 2011 at 9:51am
 
Thanks again for your swift reply.

The new script works and processing now runs properly to the end. However, could there still be an error in the generation of the .ps plot file? When I open this file I always get only half of my tree, with coloured branches and all values set to 0 percent. I don't think this is due to my viewer, as the lysozyme.ps example in the GA branch package opens just fine. I can attach the .ps file if necessary, of course. The other output files seem OK.

K
 
Sergei
Re: GA branch method error
Reply #5 - Feb 25th, 2011 at 9:54am
 
Hi Kevin,

Can you attach your result file (zipped) -- I'll see what the issue is.

Sergei
 
Kenes
Re: GA branch method error
Reply #6 - Feb 25th, 2011 at 10:02am
 
Here it is.

K
[Attachment: 2 KB]
 
Sergei
Re: GA branch method error
Reply #7 - Feb 25th, 2011 at 10:07am
 
Hi Kevin,

The reason you are seeing half a tree is that the PS file output by HyPhy has a non-standard size (taller than a page). Try running ps2pdf (part of Ghostscript) on your machine to convert it to a PDF file. The rest of the output looks fine: you are seeing 0% labels over tree branches because no rate class has dN>dS, hence no branch is on average positively selected.
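If a plain ps2pdf run still crops the tree, one possible workaround is to tell Ghostscript the page size explicitly. The sketch below reads a `%%BoundingBox` comment from the PS file and builds a ps2pdf command using the Ghostscript options `-dDEVICEWIDTHPOINTS` and `-dDEVICEHEIGHTPOINTS`. It assumes that the HyPhy plot actually contains a numeric `%%BoundingBox` line with its lower-left corner at the origin, which has not been checked here; the function names are hypothetical.

```python
import re


def bounding_box(ps_text):
    """Return (llx, lly, urx, ury) from the first numeric %%BoundingBox
    comment in the PostScript text, or None if there is none."""
    match = re.search(
        r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
        ps_text,
        re.MULTILINE,
    )
    return tuple(int(g) for g in match.groups()) if match else None


def ps2pdf_command(ps_path, pdf_path, bbox):
    """Build a ps2pdf argument list whose page size matches the bounding box."""
    llx, lly, urx, ury = bbox
    return [
        "ps2pdf",
        f"-dDEVICEWIDTHPOINTS={urx - llx}",
        f"-dDEVICEHEIGHTPOINTS={ury - lly}",
        ps_path,
        pdf_path,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. If `bounding_box` returns None, or the PS file sets its own page size, this approach may not help and the page size has to be supplied by hand.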

Incidentally, we have a new method for detecting episodic selection -- if that's what you are interested in, run HyPhy->standard analyses->Positive Selection->BranchSiteREL.bf (the first option). You don't need MPI for this analysis. Take a look at [link available to registered members only] for an example of use.

Sergei
 
Kenes
Re: GA branch method error
Reply #8 - Feb 25th, 2011 at 10:14am
 
Thanks again.

As you can see, my technical knowledge as a biologist sometimes falls short, but with that we have everything we need to finish this analysis. Our purpose does indeed also lie in finding episodic bursts of selection, so we are definitely going to explore this option as well, especially now that the whole HYPHY package is fully functional on all our servers and shares :)

K