YaBB - Yet another Bulletin Board
 
 
Maximum Sequence Length (Read 6500 times)
adk
Guest


Maximum Sequence Length
May 3rd, 2005 at 4:26pm
 
Hi there,

I'm trying to estimate parameters on an extremely big dataset (~154 megabases), but I get a memory-full error:

** malloc: vm_allocate(size=160923648) failed (error code=3)
*** malloc[18470]: error: Can't allocate region

the errors.log file looks like this:

Error:
Memory Full Exiting...
Current BL Command:Read Data Set myData from file "../../../data/simulans/syntenic/pairwise/melw501.auto.nonCDS.cat.fa"

I am guessing that this dataset is too big for the default compilation of HYPHY. Is this true? If so, how could I compile it to handle this big a dataset?

cheers,
Andy
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Maximum Sequence Length
Reply #1 - May 3rd, 2005 at 4:37pm
 
Dear Andy,

Quote:
I am guessing that this dataset is too big for the default compilation of HYPHY. Is this true? If so, how could I compile it to handle this big a dataset?



Never tried something quite this large. On Mac OS X there is nothing one can do to adjust the maximum memory size for a process; 4 GB is obviously the limit for a 32-bit system (non-G5). If I recall correctly, OS X also limits per-process memory allocation to something like 1.5 GB. What is your computer configuration like?

How many sequences do you have? I can try to generate some random data with the same length and same data format and see where the memory allocation error happens.

To be fair, I never really optimized the data reading module to use the least amount of memory; perhaps I should make a code revision to improve the memory footprint.

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
adk
Guest


Re: Maximum Sequence Length
Reply #2 - May 3rd, 2005 at 4:44pm
 
Hey,

So the dataset is only two sequences, each about 77 megabases long. The computer setup is a Dual G5 Xserve. How much memory should I need to open the data set? I'm taking it there is no hard coded limit in HYPHY then?

cheers,

Andy
 
adk
Guest


Re: Maximum Sequence Length
Reply #3 - May 3rd, 2005 at 4:50pm
 
BTW, our sysadmin tells me that the server should be able to use 4 GB per process. If the data file is 149 MB, is it possible that it is taking up over 4 GB of RAM?
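A quick back-of-envelope check of this question, using only numbers quoted in this thread: two sequences of about 77 megabases, the 160923648-byte request from the malloc error, and a 4 GB per-process limit. It assumes decimal megabases and binary gigabytes, and it says nothing about how HYPHY actually stores the data; `bytes_per_base_to_exhaust` is a hypothetical helper.

```python
# Back-of-envelope memory arithmetic using the figures quoted in this thread.
# Assumptions: "megabase" = 10**6 bases, "4 GB" = 4 * 2**30 bytes.
MIB = 2 ** 20
GIB = 2 ** 30

total_bases = 2 * 77_000_000      # two sequences, ~77 Mb each
failed_request = 160_923_648      # size= value from the vm_allocate error
process_limit = 4 * GIB           # per-process limit reported by the sys. admin.


def bytes_per_base_to_exhaust(limit_bytes: int, bases: int) -> float:
    """Bytes of storage per input base a reader would need to hit the limit."""
    return limit_bytes / bases


if __name__ == "__main__":
    print(f"failed request: {failed_request / MIB:.2f} MiB")
    print(f"failed request per input base: {failed_request / total_bases:.2f} bytes")
    print(f"bytes/base needed to exceed 4 GiB: "
          f"{bytes_per_base_to_exhaust(process_limit, total_bases):.1f}")
```

The failed request is roughly 153 MiB, about one byte per input base, while filling 4 GiB would take roughly 28 bytes per base. One plausible reading (an inference from these numbers, not a diagnosis of HYPHY's reader) is that earlier allocations had already used up or fragmented the process's address space, so one more ~153 MiB block could not be found.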

cheers,
Andy
 
Sergei
YaBB Administrator
Re: Maximum Sequence Length
Reply #4 - May 3rd, 2005 at 4:53pm
 
Dear Andy,

Quote:
So the dataset is only two sequences, each about 77 megabases long. The computer setup is a Dual G5 Xserve. How much memory should I need to open the data set? I'm taking it there is no hard coded limit in HYPHY then?



There is no hard-coded limit. I just looked at the data reader code; my latest revision was optimized for speed of reading MSAs (more than 2 sequences), and the current incarnation is actually very memory-inefficient for 2 long sequences.
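For context, a common way to keep likelihood-side storage proportional to the information in two long sequences (not necessarily what HYPHY does) is to collapse identical alignment columns into site patterns with multiplicities: for two unambiguous DNA sequences there are at most 4 x 4 = 16 distinct columns, however long the sequences are. The minimal Python sketch below illustrates the idea; `compress_site_patterns` is a hypothetical helper, not part of HYPHY.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def compress_site_patterns(seqs: List[str]) -> Dict[Pattern, int]:
    """Collapse alignment columns into unique site patterns with counts.

    Each pattern is the tuple of characters at one column; its count is how
    many columns share it. Counter consumes the zip iterator lazily, so no
    list of all columns is built.
    """
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return dict(Counter(zip(*seqs)))


if __name__ == "__main__":
    patterns = compress_site_patterns(["AAAACGT", "AAAGCGA"])
    for pattern, count in sorted(patterns.items()):
        print("".join(pattern), count)
```

The raw sequences still have to be read once, so this bounds what the model fit needs to keep, not the cost of the reading step itself.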

I am actually doing something that will require me to read genome-size pairs of sequences, so I'm probably going to optimize memory usage for your scenario; stay tuned - should have more for you in a couple of days.

Cheers,
Sergei
 
adk
Guest


Re: Maximum Sequence Length
Reply #5 - May 3rd, 2005 at 5:07pm
 
Excellent, Sergei! HYPHY is at the core of the phylogenetic analysis we are currently performing for an upcoming Drosophila genomics paper, so this addition would be extremely helpful!
cheers,
Andy

 
Simon
Ex Member


Re: Maximum Sequence Length
Reply #6 - May 4th, 2005 at 11:25am
 
Dear Andy,

Are you looking at comparisons between large numbers of orthologous genes? If so, what kinds of things are you looking at? I ask, as we're trying to develop various methods for looking at dN/dS comparisons at a genome-wide level, similar to the work that Andy Clark and Rasmus Nielsen did on the Celera human/chimp data, which may come in handy. Feel free to email me or Sergei if you'd like to be a beta-tester.

Best,
Simon
 
Sergei
YaBB Administrator
Re: Maximum Sequence Length
Reply #7 - May 4th, 2005 at 5:51pm
 
Quote:
I'm probably going to optimize memory usage for your scenario; stay tuned - should have more for you in a couple of days.


I have rewritten some old parts of the code that were never designed with very long data sets in mind (embarrassed). With the new code (May 4th, 2005 build) I was able to read in a 30 megabase human-chimp CDS in about 30 seconds and quickly run nucleotide and codon model fits on it with about 250 MB peak memory consumption. Give today's build a spin and let me know if it works...
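The post does not describe what the rewrite changed. As a generic illustration of a reader designed for very long sequences, the Python sketch below streams a FASTA file and appends each line to a per-record `bytearray`, so memory stays at roughly one byte per residue instead of holding many per-line objects to be joined at the end. `read_fasta_compact` is a hypothetical name; this is not HYPHY's reader, and it skips validation a real parser would need.

```python
import io
from typing import Dict, IO, Optional


def read_fasta_compact(handle: IO[str]) -> Dict[str, bytearray]:
    """Read FASTA records line by line into one growable bytearray each.

    Only the current line and the accumulated sequence bytes (one byte per
    residue) are held in memory; no list of lines is kept. Non-ASCII
    sequence characters raise UnicodeEncodeError.
    """
    records: Dict[str, bytearray] = {}
    current: Optional[str] = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            if current in records:
                raise ValueError(f"duplicate sequence name: {current!r}")
            records[current] = bytearray()
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # bytearray.extend grows in place with amortized reallocation
            records[current].extend(line.upper().encode("ascii"))
    return records


if __name__ == "__main__":
    demo = io.StringIO(">human\nACGTAC\ngtac\n>chimp\nACGTAC\nGTAA\n")
    for name, seq in read_fasta_compact(demo).items():
        print(name, len(seq), seq.decode("ascii"))
```

Passing an open file object (e.g. `open(path)`) instead of `io.StringIO` works the same way, since both are iterated line by line.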

Cheers,
Sergei
 
avilella
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 35
Re: Maximum Sequence Length
Reply #8 - Nov 30th, 2005 at 10:20am
 
Quote (Simon, Reply #6):
Dear Andy,

Are you looking at comparisons between large numbers of orthologous genes? If so, what kinds of things are you looking at? I ask, as we're trying to develop various methods for looking at dN/dS comparisons at a genome-wide level, similar to the work that Andy Clark and Rasmus Nielsen did on the Celera human/chimp data, which may come in handy. Feel free to email me or Sergei if you'd like to be a beta-tester.

Best,
Simon

Hi Simon and Sergei,

I stumbled upon this post today and wanted to say that I am
certainly doing that kind of comparison in my research, and I am
enormously interested in being a beta-tester of your methods.

The timing is perfect at this point in my research, so I am really
interested in getting into this as soon as possible.

Looking forward to hearing from you,

Bests,

    Albert.
 
avilella
YaBB Newbies
Re: Maximum Sequence Length
Reply #9 - Dec 22nd, 2005 at 2:38pm
 
Hi all,

I have an input file of 10 sequences x 4090497 for which I am trying to run a free-ratios/local codon model analysis.

After running for some hours and consuming ~1 GB of RAM, it stops without giving results and without much indication in the logs of what happened.

Should a file like this be analysable without problems? Any idea of what might be happening?

 
Sergei
YaBB Administrator
Re: Maximum Sequence Length
Reply #10 - Dec 22nd, 2005 at 3:51pm
 
Dear Albert,

Good catch. Some old experimental code related to sorting the order of alignment columns (which should have been deactivated) was choking on a large codon data set (with loads of unique codon patterns). I'll fix it in the next build; in the meantime, change line 7496 in likefunc.cpp from

Code:
checkParameter (useFullMST,kp,1.0);
 



to

Code:
checkParameter (useFullMST,kp,0.0);
 



recompile and try again.
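The post does not say exactly what the column-sorting step computes. Under the assumption (suggested only by the name `useFullMST`) that it builds a minimum spanning tree over the complete graph of unique codon patterns, the work grows with the number of pattern pairs, n(n-1)/2, which becomes enormous for a 10-sequence codon alignment with loads of unique codon patterns. The Python sketch below counts unique codon patterns and the number of pairs such an all-pairs step would touch; `codon_patterns` and `all_pairs` are hypothetical helpers, not HYPHY code.

```python
from collections import Counter
from typing import List


def codon_patterns(seqs: List[str]) -> Counter:
    """Count distinct codon columns: the tuple of triplets at each codon site."""
    if not seqs:
        return Counter()
    length = len(seqs[0])
    if length % 3 != 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must share a length divisible by 3")
    return Counter(
        tuple(s[i:i + 3] for s in seqs) for i in range(0, length, 3)
    )


def all_pairs(n_patterns: int) -> int:
    """Number of unordered pattern pairs an all-pairs step would visit."""
    return n_patterns * (n_patterns - 1) // 2


if __name__ == "__main__":
    seqs = ["ATGAAAATGCCC", "ATGAAGATGCCT", "ATGAAAATGCCA"]
    patterns = codon_patterns(seqs)
    print(len(patterns), "unique codon patterns")
    for n in (len(patterns), 10_000, 1_000_000):
        print(n, "patterns ->", all_pairs(n), "pairs")
```

Going from 10,000 to 1,000,000 unique patterns multiplies the pair count by roughly 10,000, which is why a step that is harmless on gene-sized alignments can stall on genome-sized ones.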

Cheers,
Sergei

P.S. Yes, I am still putting together multi-gene analyses. They were originally written for 2 sequences only, and the modification to more than 2 is a bit tedious (sad).
