KE
Huge data
May 21st, 2007 at 6:04am
 
Dear Sergei,

I tried to apply AnalyzeCodonData.bf (MG94, global) via the Analyses menu to a huge file: approx. 10^7 nt and 9 species. Everything went well until the likelihood function optimisation reached 99%, when I got an error message:

Failed to allocate 1086581776 bytes. Current BL Command: Optimize storing into, res, the following likelihood function: lf;

What happened? Is it possible to get the results somehow, or is the file simply too big? Perhaps with another method or a more powerful computer? And is it possible to run FEL on files this large?

Many thanks,
Kate
 
Sergei
Re: Huge data
Reply #1 - May 21st, 2007 at 7:52am
 
Dear Kate,

I am not sure why you got the memory error; a 10-Mbase analysis on 9 sequences should not actually take very much memory. A few things:

1). Do you have the latest HyPhy version? I doubt that this is the problem, but if you are running the latest version, at least we will be talking about the same code. Try running dNdSRatesAnalysis.bf under Codon Selection Analyses (see section 2 of the manual).

2). FEL is probably not the best choice, because its complexity is proportional to the length of the alignment; it will take a long time! REL is probably the way to go.

3). What computer/OS are you running HyPhy on?

Cheers,
Sergei
 
KE
Re: Huge data
Reply #2 - May 23rd, 2007 at 5:41am
 
Dear Sergei,

I installed HyPhy on 22.02.2007. My computer has an AMD Sempron 3100+ processor (1.8 GHz), 512 MB of RAM and Windows XP Professional SP2. I used HYPHY-AthlonXP.exe.

When I use REL, how many bins and which distribution should I choose to get interpretable results on 9 sequences, for alignments ranging from a few hundred nt to 10 Mb? Can using many bins substitute for FEL (and is it correct to think of this as approximating a continuous distribution by a discrete one)? And what are the rules for choosing these parameters in general?

Thank you very much for the manual; I'm going to work through it.

Kate
 
KE
Re: Huge data
Reply #3 - May 23rd, 2007 at 6:00am
 
I tried dNdSRatesAnalysis.bf and got the same error message, unfortunately.
 
Sergei
Re: Huge data
Reply #4 - May 23rd, 2007 at 6:20am
 
Dear Kate,

This is odd; it could be a bug in a recent version. I would like to try debugging the code on your file. Could you perhaps compress the alignment and e-mail it to me for testing?

Cheers,
Sergei
 
KE
Re: Huge data
Reply #5 - May 23rd, 2007 at 8:51am
 
I've e-mailed it. :)
 
Sergei
Re: Huge data
Reply #6 - May 23rd, 2007 at 12:07pm
 
Dear Kate,

Try editing the source of AnalyzeCodonData.bf (inside TemplateBatchFiles) to include these two lines (the first line is optional; the second is needed to avoid the memory overflow):

Code:
OPTIMIZE_SUMMATION_ORDER_PARTITION = 500;
CACHE_SUBTREES                     = 0;

Your data has about 1,000,000 unique codon patterns, and the cache allocator controlled by the second setting was asking for something like 1,000,000*3*61*sizeof(double) bytes to speed up likelihood calculations - that allocation caused the memory error.
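
As a rough sanity check (assuming the usual 8-byte double; the variable names below are just illustrative), that works out to about 1.5 GB - the same order of magnitude as the 1,086,581,776-byte request in the error message, and far more than the 512 MB of RAM on your machine:

Code:
/* back-of-the-envelope estimate of the requested cache size */
uniquePatterns = 1000000;                      /* approximate unique codon patterns */
cacheBytes     = uniquePatterns * 3 * 61 * 8;  /* 61 sense codons, 8 = sizeof(double) */
fprintf (stdout, "Approximate cache size: ", cacheBytes, " bytes\n");
/* prints roughly 1.46e9, i.e. about 1.5 GB */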

Without the caching scheme the calculations will be a bit slower, but not orders of magnitude slower.
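
For orientation, the top of the edited file would look roughly like the sketch below; the surrounding contents (elided here) depend on your HyPhy version, so treat it as a guide rather than a literal copy.

Code:
/* TemplateBatchFiles/AnalyzeCodonData.bf -- add the two settings near the
   top of the file, before the likelihood function is set up and optimised */

OPTIMIZE_SUMMATION_ORDER_PARTITION = 500;  /* optional */
CACHE_SUBTREES                     = 0;    /* needed to avoid the memory overflow */

/* ... the rest of AnalyzeCodonData.bf remains unchanged ... */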

Cheers,
Sergei
 
KE
Re: Huge data
Reply #7 - May 28th, 2007 at 1:42am
 
Thank you very much!!! It works. :)