MPI runtime problem
Matt
MPI runtime problem
Mar 2nd, 2010 at 8:14am
 
I am having problems running the MPI version of HYPHY on my computer. I run multiple MPI applications (OpenMPI), so I'm reasonably confident the issue is not with the MPI installation itself.

I am running on an Ubuntu workstation with 24 GB of RAM and 16 cores, and I am trying to work with a data file that has 115 sequences of 300 bases each. I downloaded the source code last week, so it is relatively recent.

Anyway, I can invoke the program either by
[code]macmanes@macmanes:~/hyphy/MPI$ /home/macmanes/apps/ompi1.4.1/bin/mpirun ./HYPHYMPI[/code]

or

[code]macmanes@macmanes:~/hyphy/MPI$ /home/macmanes/apps/ompi1.4.1/bin/mpirun ./HYPHYMPI /home/macmanes/hyphy/MPI/TemplateBatchFiles/dNdSRateAnalysis.bf[/code]

With the former, I type the relevant commands at the prompts.

After it reads in the data, I am asked to select the appropriate genetic code, and then it gives me the MPI errors/segmentation fault:

[code]Please choose an option (or press q to cancel selection):
     (1):[Universal] Universal code. (Genebank transl_table=1).
     ...
     (12):[Blepharisma Nuclear] Blepharisma Nuclear code. (Genebank transl_table=15).

1
Please choose an option (or press q to cancel selection):

[macmanes:03003] *** Process received signal ***
[macmanes:03003] Signal: Segmentation fault (11)
[macmanes:03003] Signal code: Address not mapped (1)
[macmanes:03003] Failing at address: 0x69
[macmanes:03003] [ 0] /lib/libpthread.so.0 [0x7fc831dc8190]
[macmanes:03003] [ 1] ./HYPHYMPI(_ZN7_StringC1ERKS_+0x11) [0x626861]
[macmanes:03003] [ 2] ./HYPHYMPI(_ZN18_ElementaryCommand5toStrEv+0x2fc9) [0x458539]
[macmanes:03003] [ 3] ./HYPHYMPI(_Z22ReturnCurrentCallStackv+0xd8) [0x445338]
[macmanes:03003] [ 4] ./HYPHYMPI(_Z9WarnError7_String+0x43) [0x629993]
[macmanes:03003] [ 5] ./HYPHYMPI(_ZN18_ElementaryCommand13ExecuteCase25ER14_ExecutionListb+0xf7e) [0x45ec3e]
[macmanes:03003] [ 6] ./HYPHYMPI(_ZN18_ElementaryCommand7ExecuteER14_ExecutionList+0x238) [0x46f8e8]
[macmanes:03003] [ 7] ./HYPHYMPI(_ZN14_ExecutionList7ExecuteEv+0x1e8) [0x472188]
[macmanes:03003] [ 8] ./HYPHYMPI(_ZN18_ElementaryCommand13ExecuteCase39ER14_ExecutionList+0x2af) [0x4725ef]
[macmanes:03003] [ 9] ./HYPHYMPI(_ZN18_ElementaryCommand7ExecuteER14_ExecutionList+0x13d) [0x46f7ed]
[macmanes:03003] [10] ./HYPHYMPI(_ZN14_ExecutionList7ExecuteEv+0x1e8) [0x472188]
[macmanes:03003] [11] ./HYPHYMPI(main+0x96a) [0x51381a]
[macmanes:03003] [12] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fc83031eabd]
[macmanes:03003] [13] ./HYPHYMPI [0x439fe9]
[macmanes:03003] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3003 on node macmanes exited on signal 11 (Segmentation fault)[/code]

Please let me know if there is other info I can send to help diagnose the problem. FWIW, I can run the examples, for instance,

[code]macmanes@macmanes:~/hyphy/MPI$ /home/macmanes/apps/ompi1.4.1/bin/mpirun ./HYPHYMPI /home/macmanes/hyphy/TestSuite/REL/ModelMixture.bf[/code]

but those do not appear to run in MPI...

Thanks. Matt
 
Sergei
Re: MPI runtime problem
Reply #1 - Mar 2nd, 2010 at 8:22am
 
Hi Matt,

I am not sure what is causing the error, but I suspect it may be the interplay between OpenMP (multithreading) and MPI (distributed computing). Could you confirm that HyPhy is able to execute the simple MPI test script attached below?

Sergei


Attachment: MPITest.bf (1 KB)
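For readers without access to the attachment, the snippet below is only a rough sketch of what a minimal HBL MPI polling test might look like; it is not the actual MPITest.bf. MPI_NODE_COUNT, MPISend and MPIReceive are genuine HBL built-ins, but the trivial job string and the loop structure here are placeholders.

[code]/* Hypothetical sketch of a minimal HBL MPI polling test (not the real MPITest.bf).
   MPI_NODE_COUNT, MPISend and MPIReceive are HBL built-ins; the job sent to each
   node is just a trivial placeholder expression. */

fprintf (stdout, "Running a HYPHY-MPI test\n");
fprintf (stdout, "Detected ", MPI_NODE_COUNT, " computational nodes\n");

for (node = 1; node < MPI_NODE_COUNT; node = node + 1)
{
    fprintf (stdout, "Polling node ", node + 1, "...\n");
    MPISend    (node, "1+1;");                  /* dispatch a trivial HBL job to the node */
    MPIReceive (node, receivedFrom, theResult); /* block until that node replies */
    fprintf (stdout, "OK\n");
}[/code]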

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Matt
Re: MPI runtime problem
Reply #2 - Mar 2nd, 2010 at 8:37am
 
Seems like no problem there!
Code:
macmanes@macmanes:~/hyphy/MPI$ mpirun -np 14 ./HYPHYMPI /home/macmanes/hyphy/MPITest.bf
Running a HYPHY-MPI test

Detected 14 computational nodes
Polling slave nodes...
Polling node 2...
OK
Polling node 3...
OK
Polling node 4...
OK
Polling node 5...
OK
Polling node 6...
OK
Polling node 7...
OK
Polling node 8...
OK
Polling node 9...
OK
Polling node 10...
OK
Polling node 11...
OK
Polling node 12...
OK
Polling node 13...
OK
Polling node 14...
OK

Measuring simple job send/receieve throughput...
Node     2 sent/received 15706 batch jobs per second
Node     3 sent/received 10789 batch jobs per second
Node     4 sent/received 17053.8 batch jobs per second
Node     5 sent/received 18098.4 batch jobs per second
Node     6 sent/received 17198.8 batch jobs per second
Node     7 sent/received 17404 batch jobs per second
Node     8 sent/received 8262.4 batch jobs per second
Node     9 sent/received 17831.4 batch jobs per second
Node    10 sent/received 17820.8 batch jobs per second
Node    11 sent/received 17751.2 batch jobs per second
Node    12 sent/received 22178.6 batch jobs per second
Node    13 sent/received 6194.6 batch jobs per second
Node    14 sent/received 21.2 batch jobs per second

Measuring relative computational performance...
Master node reference index:    1886740
Slave node   1 index:    1918280.     101.67% relative to the master
Slave node   2 index:    3340666.     177.06% relative to the master
Slave node   3 index:    1966488.     104.23% relative to the master
Slave node   4 index:    1918905.     101.70% relative to the master
Slave node   5 index:    1892695.     100.32% relative to the master
Slave node   6 index:    1893526.     100.36% relative to the master
Slave node   7 index:    1814552.	96.17% relative to the master
Slave node   8 index:    1934813.     102.55% relative to the master
Slave node   9 index:    1932888.     102.45% relative to the master
Slave node  10 index:    1819430.	96.43% relative to the master
Slave node  11 index:    2870106.     152.12% relative to the master
Slave node  12 index:    1902300.     100.82% relative to the master
Slave node  13 index:    1873175.	99.28% relative to the master


macmanes@macmanes:~/hyphy/MPI$ 


 
Sergei
Re: MPI runtime problem
Reply #3 - Mar 2nd, 2010 at 8:42am
 
Hi Matt,

Looking at the call trace from your first post, it seems the signal 11 (segmentation fault) happens while HyPhy is attempting to display an error message. ExecuteCase25 (another function on the stack trace) handles standard input, so my guess is that mpirun does not pass standard input through to the process. Could you try running the same command on the same data using the MP2 build of HYPHY and see whether that works? If it does, write a wrapper file (see the linked example) to encode the inputs and retry MPI.

Sergei
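As an illustration of the wrapper-file approach described above, here is a rough sketch in HBL using the standard input-redirect pattern, assuming ExecuteAFile's optional input-redirect argument and the HYPHY_BASE_DIRECTORY / DIRECTORY_SEPARATOR built-ins. The option keys, the answer values, and the alignment path are placeholders; the real prompts of dNdSRateAnalysis.bf have to be matched one-for-one, in the order they appear interactively.

[code]/* Hypothetical wrapper sketch: pre-answer the interactive prompts so that
   HYPHYMPI never needs to read from stdin. The keys and values below are
   placeholders and must match the actual prompts of dNdSRateAnalysis.bf. */

inputRedirect = {};
inputRedirect ["01"] = "Universal";                             /* genetic code */
inputRedirect ["02"] = "/home/macmanes/data/my_alignment.nex";  /* placeholder path to the codon alignment */
/* ... one entry per remaining prompt, in order ... */

ExecuteAFile (HYPHY_BASE_DIRECTORY + "TemplateBatchFiles" + DIRECTORY_SEPARATOR + "dNdSRateAnalysis.bf",
              inputRedirect);[/code]

The wrapper (here hypothetically named wrapper.bf) is then passed to mpirun in place of the analysis file, e.g. mpirun -np 14 ./HYPHYMPI wrapper.bf, so no interactive input is required.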
 
Matt
Re: MPI runtime problem
Reply #4 - Mar 3rd, 2010 at 8:35am
 
Hi Sergei,

The MP and MP_GTK builds work fine with my dataset, but still nothing from the MPI version. I'm currently trying to write a wrapper as in the example you provided, so I'll keep my fingers crossed for that.

Thanks. Matt
 
Matt
Re: MPI runtime problem
Reply #5 - Mar 3rd, 2010 at 9:39am
 
Hi Sergei,

Success using MPI and a wrapper! I'm not sure why, but I suspect it has something to do with misplaced carriage returns when entering options on the command line (i.e., when not using a wrapper).

For instance, after using the command
Code:
$mpirun -np 12 ./HYPHYMPI {options}  

I needed to hit return before I was prompted for my codon file location. Similarly, several other extra carriage returns were required to get to the next prompt.

I bet this is the problem.
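To illustrate the point: the interactive prompts are driven by HBL reads from standard input, roughly like the hypothetical snippet below (not actual HyPhy code). If an MPI launcher does not forward stdin cleanly to rank 0, such reads can stall or consume stray line endings instead of the intended answer.

[code]/* Illustrative only: interactive HBL prompts read answers from stdin with fscanf.
   If mpirun does not forward stdin cleanly to rank 0, these reads may stall or
   pick up stray carriage returns instead of the intended answer. */
fprintf (stdout, "Please choose a genetic code:\n");
fscanf  (stdin,  "String", geneticCodeChoice);
fprintf (stdout, "You selected: ", geneticCodeChoice, "\n");[/code]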
 
Sergei
Re: MPI runtime problem
Reply #6 - Mar 3rd, 2010 at 9:51am
 
Hi Matt,

Glad you solved the problem. Different implementations of MPI use different stdin and stdout buffering techniques, hence using a wrapper file is probably the safest way to go.

Cheers,
Sergei