HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> How can I profile a HYPHY execution?
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1129552284

Message started by avilella on Oct 17th, 2005 at 5:31am

Title: How can I profile a HYPHY execution?
Post by avilella on Oct 17th, 2005 at 5:31am
Dears,

HYPHY is taking a lot of time with the "Replace stop codons with gaps
in codon data" with a file of very long sequences:

echo -e "4\n1\n1\n/home/avb/wallace/eukarya/drosophila/concat/obacs/data/Dmsye.fasta\n4\n3\n1\n1\n/home/avb/wallace/eukarya/drosophila/concat/obacs/data/Dmsye.nogaps.fasta\n" | ./HYPHYMP

I was wondering how I can profile it to find out where it is spending
so much time.

Are there any tools that you, the HYPHY gurus, use for that?

Bests,

   Albert.

Title: Re: How can I profile a HYPHY execution?
Post by avilella on Oct 17th, 2005 at 5:36am
This is an example file with large sequences and a good amount of stop codons.

Title: Re: How can I profile a HYPHY execution?
Post by Sergei on Oct 17th, 2005 at 4:31pm
Dear Albert,

I fixed the bug which was causing the slowdown.

Effectively it's a 'stream problem'. What happens is this: imagine you have a very long string A and a string B, which you want to populate with a modified version of A.

The code I had in the file (and it works fine for short sequences) is like this:


Code:
B="";

for (k=0; k<Abs(A); k=k+1)
{
    if (A[k] == something)
    {
        B = B + function of A[k];
    }
    else
    {
        B = B + another function of A[k];
    }
}


The slowdown comes when Abs(A) is large. In B=B+something, new memory of length(B)+length(something) has to be allocated and the old contents of B copied into it. When length(B) is large, the cost of each such reallocation is of order length(B). Thus the overall complexity of the algorithm can be of order length(A)^2.
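The quadratic growth can be illustrated with a small simulation, written in Python rather than HyPhy batch language. This is only a sketch: it assumes that every B=B+something copies the whole of the old B plus the new piece into freshly allocated memory, as described above, and the function name `copy_cost_naive` is made up for this illustration.

```python
def copy_cost_naive(n_pieces, piece_len=1):
    """Count characters copied when B = B + piece reallocates B every time.

    Assumption (taken from the description above, not from HyPhy's source):
    each append allocates len(B) + len(piece) characters and copies both
    operands into the new block.
    """
    length = 0   # current length of B
    copied = 0   # total characters copied so far
    for _ in range(n_pieces):
        copied += length + piece_len   # copy the old B, then the new piece
        length += piece_len
    return copied


if __name__ == "__main__":
    # Each tenfold increase in input size costs roughly a hundredfold more copying.
    for n in (1_000, 10_000, 100_000):
        print(n, copy_cost_naive(n))
```

Under this model the total number of characters copied is n(n+1)/2 for n one-character appends, which is where the length(A)^2 behaviour comes from.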

Here's a streamed version of the same algorithm:


Code:
B="";
B*8192; /* make B into a stream string and allocate some initial storage */

for (k=0; k<Abs(A); k=k+1)
{
    if (A[k] == something)
    {
        B * function of A[k]; /* '*' in this context is an 'add to stream' operation */
    }
    else
    {
        B * another function of A[k];
    }
}
B*0; /* trim unused memory; B is now a regular string */


Now, when the time comes to allocate more memory for stream B, the new allocation is not length(function of A[k]), but rather length(B)/2 (i.e. if the string is already long, we allocate a lot more memory than is immediately needed, but then save on memory allocations). Now we only have on the order of Log(length(A)) allocations (assuming the functions of A[k] return constant-length strings), and execution time is at most of order Log(length(A)) * length(A).
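The effect of growing the buffer in proportion to its current length can be checked with a small Python simulation. It is a sketch under stated assumptions, not HyPhy's actual implementation: the growth rule (add half the current length whenever the buffer is full) and the 8192-character initial allocation are modelled on the description and code above, and the name `copy_cost_stream` is hypothetical.

```python
def copy_cost_stream(n_pieces, piece_len=1, initial_capacity=8192):
    """Simulate appending to a buffer that grows by half its length when full.

    Returns (reallocations, characters_copied). The growth rule is an
    assumption modelled on the forum description, not HyPhy's real code.
    """
    capacity = initial_capacity
    length = 0
    reallocations = 0
    copied = 0
    for _ in range(n_pieces):
        if length + piece_len > capacity:
            # Grow by half the current length (or at least enough for the piece).
            capacity += max(piece_len, length // 2)
            copied += length        # existing contents move to the new block
            reallocations += 1
        copied += piece_len         # write the new piece into the buffer
        length += piece_len
    return reallocations, copied


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        print(n, copy_cost_stream(n))
```

In this model the number of reallocations grows only logarithmically with the input length, and the total amount of copying stays far below what reallocating on every append would cost.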

The fixed version of CleanStopCodons.bf (link available to registered forum members only) will be rolled into the next update.

Having a HyPhy batch code profiler is a good idea! I will add it soon.

Cheers,
Sergei


Title: Re: How can I profile a HYPHY execution?
Post by Sergei on Oct 21st, 2005 at 3:39pm
Dear Albert,

I have added a simple code profiler to HyPhy as of today's (Oct 21st) build. Take a look at the profile_test.bf file in Examples/BatchLanguage for a trivial example.

Cheers,
Sergei
