HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
HYPHY Package >> HyPhy bugs >> MPI runtime problem
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1267546450

Message started by Matt on Mar 2nd, 2010 at 8:14am

Title: MPI runtime problem
Post by Matt on Mar 2nd, 2010 at 8:14am
I am having problems running the MPI version of HYPHY on my computer. I run multiple MPI apps (OpenMPI), so I'm reasonably confident that the issue is not there.

I am running on an Ubuntu workstation with 24 GB RAM and 16 cores, and am trying to work with a data file that has 115 sequences of 300 bases each. I downloaded the source code last week, so it is relatively recent.

Anyway, I can invoke the program either by

[code]macmanes@macmanes:~/hyphy/MPI$ /home/macmanes/apps/ompi1.4.1/bin/mpirun ./HYPHYMPI[/code]
or

[code]macmanes@macmanes:~/hyphy/MPI$ /home/macmanes/apps/ompi1.4.1/bin/mpirun ./HYPHYMPI /home/macmanes/hyphy/MPI/TemplateBatchFiles/dNdSRateAnalysis.bf[/code]


With the former, I type in the relevant commands at the prompts.

After it reads in the data, I am asked to select the appropriate genetic code, and then it gives me the MPI errors/segmentation fault.


[code]Please choose an option (or press q to cancel selection):
     (1):[Universal] Universal code. (Genebank transl_table=1).
     ...
     (12):[Blepharisma Nuclear] Blepharisma Nuclear code. (Genebank transl_table=15).

1
Please choose an option (or press q to cancel selection):

[macmanes:03003] *** Process received signal ***
[macmanes:03003] Signal: Segmentation fault (11)
[macmanes:03003] Signal code: Address not mapped (1)
[macmanes:03003] Failing at address: 0x69
[macmanes:03003] [ 0] /lib/libpthread.so.0 [0x7fc831dc8190]
[macmanes:03003] [ 1] ./HYPHYMPI(_ZN7_StringC1ERKS_+0x11) [0x626861]
[macmanes:03003] [ 2] ./HYPHYMPI(_ZN18_ElementaryCommand5toStrEv+0x2fc9) [0x458539]
[macmanes:03003] [ 3] ./HYPHYMPI(_Z22ReturnCurrentCallStackv+0xd8) [0x445338]
[macmanes:03003] [ 4] ./HYPHYMPI(_Z9WarnError7_String+0x43) [0x629993]
[macmanes:03003] [ 5] ./HYPHYMPI(_ZN18_ElementaryCommand13ExecuteCase25ER14_ExecutionListb+0xf7e) [0x45ec3e]
[macmanes:03003] [ 6] ./HYPHYMPI(_ZN18_ElementaryCommand7ExecuteER14_ExecutionList+0x238) [0x46f8e8]
[macmanes:03003] [ 7] ./HYPHYMPI(_ZN14_ExecutionList7ExecuteEv+0x1e8) [0x472188]
[macmanes:03003] [ 8] ./HYPHYMPI(_ZN18_ElementaryCommand13ExecuteCase39ER14_ExecutionList+0x2af) [0x4725ef]
[macmanes:03003] [ 9] ./HYPHYMPI(_ZN18_ElementaryCommand7ExecuteER14_ExecutionList+0x13d) [0x46f7ed]
[macmanes:03003] [10] ./HYPHYMPI(_ZN14_ExecutionList7ExecuteEv+0x1e8) [0x472188]
[macmanes:03003] [11] ./HYPHYMPI(main+0x96a) [0x51381a]
[macmanes:03003] [12] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fc83031eabd]
[macmanes:03003] [13] ./HYPHYMPI [0x439fe9]
[macmanes:03003] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3003 on node macmanes exited on signal 11 (Segmentation fault)[/code]


Please let me know if there is other info I can send to help diagnose the problem. FWIW, I can run the examples, for instance,

[code]macmanes@macmanes:~/hyphy/MPI$ /home/macmanes/apps/ompi1.4.1/bin/mpirun ./HYPHYMPI /home/macmanes/hyphy/TestSuite/REL/ModelMixture.bf[/code]

but those do not appear to run in MPI...

Thanks. Matt

Title: Re: MPI runtime problem
Post by Sergei on Mar 2nd, 2010 at 8:22am
Hi Matt,

I am not sure what is causing the error, but I suspect it may be the interplay between OpenMP (multithreading) and MPI (distributed computing). Could you confirm that HyPhy is able to execute the simple MPI test script attached below?

Sergei


http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=MPITest.bf (1 KB)

Title: Re: MPI runtime problem
Post by Matt on Mar 2nd, 2010 at 8:37am
Seems like no problem there!
[code]macmanes@macmanes:~/hyphy/MPI$ mpirun -np 14 ./HYPHYMPI /home/macmanes/hyphy/MPITest.bf
Running a HYPHY-MPI test

Detected 14 computational nodes
Polling slave nodes...
Polling node 2...
OK
Polling node 3...
OK
Polling node 4...
OK
Polling node 5...
OK
Polling node 6...
OK
Polling node 7...
OK
Polling node 8...
OK
Polling node 9...
OK
Polling node 10...
OK
Polling node 11...
OK
Polling node 12...
OK
Polling node 13...
OK
Polling node 14...
OK

Measuring simple job send/receieve throughput...
Node     2 sent/received 15706 batch jobs per second
Node     3 sent/received 10789 batch jobs per second
Node     4 sent/received 17053.8 batch jobs per second
Node     5 sent/received 18098.4 batch jobs per second
Node     6 sent/received 17198.8 batch jobs per second
Node     7 sent/received 17404 batch jobs per second
Node     8 sent/received 8262.4 batch jobs per second
Node     9 sent/received 17831.4 batch jobs per second
Node    10 sent/received 17820.8 batch jobs per second
Node    11 sent/received 17751.2 batch jobs per second
Node    12 sent/received 22178.6 batch jobs per second
Node    13 sent/received 6194.6 batch jobs per second
Node    14 sent/received 21.2 batch jobs per second

Measuring relative computational performance...
Master node reference index:    1886740
Slave node   1 index:    1918280.     101.67% relative to the master
Slave node   2 index:    3340666.     177.06% relative to the master
Slave node   3 index:    1966488.     104.23% relative to the master
Slave node   4 index:    1918905.     101.70% relative to the master
Slave node   5 index:    1892695.     100.32% relative to the master
Slave node   6 index:    1893526.     100.36% relative to the master
Slave node   7 index:    1814552.      96.17% relative to the master
Slave node   8 index:    1934813.     102.55% relative to the master
Slave node   9 index:    1932888.     102.45% relative to the master
Slave node  10 index:    1819430.      96.43% relative to the master
Slave node  11 index:    2870106.     152.12% relative to the master
Slave node  12 index:    1902300.     100.82% relative to the master
Slave node  13 index:    1873175.      99.28% relative to the master


macmanes@macmanes:~/hyphy/MPI$[/code]

Title: Re: MPI runtime problem
Post by Sergei on Mar 2nd, 2010 at 8:42am
Hi Matt,

Looking at the call trace from your first error post, it seems like the SIG11 fault happens while HyPhy is attempting to display an error message. ExecuteCase25 (another function on the stack trace) is used to deal with standard input; my guess is that mpirun does not pass standard input to the process. Could you try to run the same command on the same data using the MP2 build of HYPHY and see if that works OK? If it does, write a wrapper file (the linked example is visible to registered members only) to encode the inputs and retry MPI.
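
For reference, a minimal sketch of such a wrapper in the HyPhy batch language, assuming the standard inputRedirect/ExecuteAFile idiom; the file paths, prompt keys, and answer strings below are placeholders and have to match, in order, the prompts the analysis actually issues:

[code]/* wrapper.bf -- hypothetical example; adjust the keys and values so they
   answer, in order, the prompts dNdSRateAnalysis.bf prints on your system */

inputRedirect = {};
inputRedirect ["01"] = "Universal";                        /* genetic code                   */
inputRedirect ["02"] = "/home/macmanes/data/codons.fas";   /* codon alignment (placeholder)  */
inputRedirect ["03"] = "/home/macmanes/data/tree.nwk";     /* tree file (placeholder)        */
/* ... one entry per remaining prompt ... */

/* run the template analysis with the canned answers above */
ExecuteAFile (HYPHY_BASE_DIRECTORY + "TemplateBatchFiles" + DIRECTORY_SEPARATOR + "dNdSRateAnalysis.bf", inputRedirect);[/code]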

Sergei

Title: Re: MPI runtime problem
Post by Matt on Mar 3rd, 2010 at 8:35am
Hi Sergei,

The MP and MP_GTK builds work fine using my dataset, but still nothing from the MPI version. I'm currently trying to make a wrapper as in the example you provided, so I'll keep my fingers crossed for that.

Thanks. Matt

Title: Re: MPI runtime problem
Post by Matt on Mar 3rd, 2010 at 9:39am
Hi Sergei,

Success using MPI and a wrapper. I'm not sure why, but I suspect it has something to do with some misplaced carriage returns when entering options at the command line (i.e., when not using a wrapper).

For instance, after using the command
[code]$mpirun -np 12 ./HYPHYMPI {options}[/code] I needed to hit return before I would be prompted for my codon file location. Similarly, several other extra carriage returns were required to get to the next prompt.

I bet this is the problem.
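
With such a wrapper, the run needs no interactive input at all; a hypothetical invocation (the wrapper path is a placeholder) would look like:

[code]macmanes@macmanes:~/hyphy/MPI$ /home/macmanes/apps/ompi1.4.1/bin/mpirun -np 12 ./HYPHYMPI /home/macmanes/hyphy/MPI/wrapper.bf[/code]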

Title: Re: MPI runtime problem
Post by Sergei on Mar 3rd, 2010 at 9:51am
Hi Matt,

Glad you solved the problem. Different implementations of MPI use different stdin and stdout buffering techniques, hence using a wrapper file is probably the safest way to go.

Cheers,
Sergei
