HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
HYPHY Package >> HyPhy feedback >> Maximum Sequence Length
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1115162813
Message started by adk on May 3rd, 2005 at 4:26pm
Title: Maximum Sequence Length Post by adk on May 3rd, 2005 at 4:26pm
Hi there,
I'm trying to estimate parameters on an extremely big dataset (~154 megabases), but I get a memory full error:

** malloc: vm_allocate(size=160923648) failed (error code=3)
*** malloc[18470]: error: Can't allocate region

The errors.log file looks like this:

Error: Memory Full Exiting...
Current BL Command:Read Data Set myData from file "../../../data/simulans/syntenic/pairwise/melw501.auto.nonCDS.cat.fa"

I am guessing that this dataset is too big for the default compilation of HYPHY. Is this true? If so, how could I compile it to handle a dataset this big?

cheers,
Andy
Title: Re: Maximum Sequence Length Post by Sergei on May 3rd, 2005 at 4:37pm
Dear Andy,
I have never tried anything quite this large. On Mac OS X there is nothing one can do to adjust the maximum memory size for a process; 4 GB is obviously the limit for a 32-bit system (non-G5). If I recall correctly, OS X also limits per-process memory allocation to something like 1.5 GB.

What is your computer configuration like? How many sequences do you have? I can try to generate some random data with the same length and the same data format and see where the memory allocation error happens.

To be fair, I never really optimized the data reading module to use the least amount of memory; perhaps I should make a code revision to improve the memory footprint.

Cheers,
Sergei
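For scale, the numbers quoted so far can be put side by side. This is a rough sketch using only the figures in the thread (the request size from the vm_allocate message and the 4 GB ceiling of 32-bit addressing); the constant names are made up for illustration. The request that failed is far smaller than the address space, which suggests (an inference, not something stated in the thread) that most of the available memory was already in use by the time it was made.

```python
# Figures quoted in the thread; everything else is plain unit conversion.
FAILED_REQUEST_BYTES = 160_923_648   # size from the vm_allocate error message
ADDRESS_SPACE_32BIT = 2 ** 32        # bytes addressable with 32-bit pointers

MIB = 1024 ** 2
GIB = 1024 ** 3

failed_mib = FAILED_REQUEST_BYTES / MIB
address_space_gib = ADDRESS_SPACE_32BIT / GIB
share = FAILED_REQUEST_BYTES / ADDRESS_SPACE_32BIT

print(f"failed request:         {failed_mib:.2f} MiB")
print(f"32-bit address space:   {address_space_gib:.0f} GiB")
print(f"share of address space: {share:.1%}")
```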
Title: Re: Maximum Sequence Length Post by adk on May 3rd, 2005 at 4:44pm
Hey,
So the dataset is only two sequences, each about 77 megabases long. The computer setup is a Dual G5 Xserve. How much memory should I need to open the data set? I take it there is no hard-coded limit in HYPHY, then?

cheers,
Andy
Title: Re: Maximum Sequence Length Post by adk on May 3rd, 2005 at 4:50pm
btw, our sys. admin. tells me that that server should be able to use 4 GB per process. If the data file is 149 MB, is it possible that it is taking up over 4 GB of RAM?

cheers,
Andy
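One way to frame this question is to ask how many bytes of memory per character of input a reader would have to use before a 149 MB file exceeds a 4 GB process limit. The sketch below assumes roughly one byte per character on disk (plain-text FASTA, ignoring headers and newlines); the per-character overheads in the loop are hypothetical values chosen for illustration, not measurements of HyPhy.

```python
FILE_BYTES = 149 * 1024 ** 2     # 149 MB data file, as reported in the thread
PROCESS_LIMIT = 4 * 1024 ** 3    # 4 GB per-process limit quoted in the thread


def footprint(file_bytes, bytes_per_char):
    """In-memory size if every input character costs bytes_per_char bytes."""
    return file_bytes * bytes_per_char


# Overhead per character at which the file alone would fill the process limit.
break_even = PROCESS_LIMIT / FILE_BYTES
print(f"break-even overhead: {break_even:.1f} bytes per character")

# Hypothetical overheads, for illustration only.
for overhead in (1, 4, 8, 16, 32):
    gib = footprint(FILE_BYTES, overhead) / 1024 ** 3
    verdict = "over" if footprint(FILE_BYTES, overhead) > PROCESS_LIMIT else "under"
    print(f"{overhead:>2} bytes/char -> {gib:.2f} GiB ({verdict} the limit)")
```

Under these assumptions the limit is reached only if the reader spends a few dozen bytes per input character, so the answer depends entirely on how the data are stored internally.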
Title: Re: Maximum Sequence Length Post by Sergei on May 3rd, 2005 at 4:53pm
Dear Andy,
There is no hard-coded limit. I just looked at the data reader code; my latest revision was optimized for the speed of reading an MSA (more than 2 sequences), and the current incarnation is actually very memory-inefficient for 2 long sequences.

I am actually doing something that will require me to read genome-size pairs of sequences, so I'm probably going to optimize memory usage for your scenario; stay tuned, I should have more for you in a couple of days.

Cheers,
Sergei
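To illustrate what reading a pair of very long sequences with a small footprint can look like, here is a minimal sketch in Python. It is not HyPhy's data reader (which is C++ and not shown in this thread); the function name `read_fasta_compact` and the demo records are hypothetical. The idea is to stream the file line by line and append residues to one growing byte buffer per sequence, so that memory use stays close to one byte per character.

```python
import io


def read_fasta_compact(handle):
    """Read FASTA records from a text handle into {name: bytearray}.

    Lines are consumed one at a time and appended to a per-sequence
    bytearray, so no list of line strings is kept around.
    """
    sequences = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            # New record: the header text after '>' is used as the name.
            current = sequences.setdefault(line[1:].strip(), bytearray())
        elif current is not None:
            current.extend(line.encode("ascii"))
    return sequences


# Tiny in-memory stand-in for a two-sequence alignment file.
demo = io.StringIO(">mel\nACGT\nAC\n>sim\nACGA\nAC\n")
seqs = read_fasta_compact(demo)
for name, residues in seqs.items():
    print(name, len(residues), residues.decode("ascii"))
```

For a real file, the same function can be called with an open file handle, for example `with open(path) as fh: seqs = read_fasta_compact(fh)`.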
Title: Re: Maximum Sequence Length Post by adk on May 3rd, 2005 at 5:07pm
Excellent, Sergei! HYPHY is at the core of the phylogenetic analysis we are currently performing for an upcoming Drosophila genomics paper, so this addition would be extremely helpful!

cheers,
Andy
Title: Re: Maximum Sequence Length Post by Simon on May 4th, 2005 at 11:25am
Dear Andy,
Are you looking at comparisons between large numbers of orthologous genes? If so, what kinds of things are you looking at? I ask, as we're trying to develop various methods for looking at dN/dS comparisons at a genome-wide level, similar to the work that Andy Clark and Rasmus Nielsen did on the Celera human/chimp data, which may come in handy. Feel free to email me or Sergei if you'd like to be a beta-tester.

Best,
Simon
Title: Re: Maximum Sequence Length Post by Sergei on May 4th, 2005 at 5:51pm
I have rewritten some old parts of the code which were never designed with very long data sets in mind :-[. With the new code (May 4th, 2005 build) I was able to read in a 30 megabase human-chimp CDS in about 30 seconds and quickly run nucleotide and codon model fits on it, with about 250 MB peak memory consumption. Give today's build a spin and let me know if it works...

Cheers,
Sergei
Title: Re: Maximum Sequence Length Post by avilella on Nov 30th, 2005 at 10:20am
Hi Simon and Sergei,

I stumbled upon this post today and wanted to say that I am certainly doing that kind of comparison in my research, and am enormously interested in being a beta-tester of your methods. This is perfect timing at this point in my research, so I am really interested in getting into this as soon as possible.

Looking forward to hearing from you,

Bests,
Albert.
Title: Re: Maximum Sequence Length Post by Sergei on Dec 22nd, 2005 at 3:51pm
Dear Albert,
Good catch. Some old experimental code related to sorting the order of alignment columns (which should have been de-activated) was choking on a large codon data set (with loads of unique codon patterns). I'll fix it in the next build; in the meantime change line 7496 in likefunc.cpp from

    checkParameter (useFullMST,kp,1.0);

recompile and try again.

Cheers,
Sergei

P.S. Yes, I am still putting together multi-gene analyses. They were originally written for 2 sequences only and the modification to more than 2 is a bit tedious :(