A program called "maker" was built against this MPI. At least, that is what I believe I told it to do, like this:
Code: Select all
perl Build.PL 2>&1 | tee ../build_pl.log
#query:prompt relative to MPI installation
#mpi query: Y
#specify path to mpicc: /usr/lib64/openmpi/bin/mpicc
#specify path to mpi.h: /usr/include/openmpi-x86_64
./Build install 2>&1 | tee maker_install.log
Code: Select all
module add openmpi-x86_64
echo $LD_LIBRARY_PATH
/usr/lib64/openmpi/lib
echo $PATH
/usr/lib64/openmpi/bin:/home/mathog/perl5/perlbrew/bin:/home/mathog/perl5/perlbrew/perls/perl-5.20.0t/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/mathog/bin
nice /usr/lib64/openmpi/bin/mpiexec --prefix /usr/lib64/openmpi -n 1 \
/home/mathog/src/maker/bin/maker \
</dev/null >try_maker_1.log 2>&1 &
Code: Select all
[machinename:70819] mca: base: component_find: unable to open /usr/lib64/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[machinename:70819] mca: base: component_find: unable to open /usr/lib64/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[machinename:70819] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[machinename:70819] mca: base: component_find: unable to open /usr/lib64/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[10350,1],0]
Exit code: 1
--------------------------------------------------------------------------
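Before digging into the files themselves, one quick cross-check: "compiled for a different version of Open MPI" usually means the mpiexec, the compiler wrapper, and the installed plugins do not all come from the same build. A sketch of the check (the paths and package names below are assumptions for this CentOS-style layout; adjust to your system):

```shell
# All of these should report the same Open MPI version; a mismatch
# between mpiexec and the plugins under lib/openmpi produces exactly
# the "compiled for a different version" complaints seen above.
OMPI=/usr/lib64/openmpi/bin
V1=$("$OMPI/ompi_info" --version 2>/dev/null || echo "ompi_info not found")
V2=$("$OMPI/mpicc" --showme:version 2>/dev/null || echo "mpicc not found")
echo "$V1"
echo "$V2"
# On an RPM-based system, also compare the installed packages:
rpm -q openmpi openmpi-devel 2>/dev/null || echo "rpm query skipped"
```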
The components it says it is unable to open do seem to be present:
Code: Select all
ls -al /usr/lib64/openmpi/lib/openmpi/mca_shmem*
-rwxr-xr-x 1 root root 12352 May 11 2016 /usr/lib64/openmpi/lib/openmpi/mca_shmem_mmap.so
-rwxr-xr-x 1 root root 11304 May 11 2016 /usr/lib64/openmpi/lib/openmpi/mca_shmem_posix.so
-rwxr-xr-x 1 root root 8832 May 11 2016 /usr/lib64/openmpi/lib/openmpi/mca_shmem_sysv.so
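Since the files exist, the "perhaps a missing symbol" wording suggests dlopen() failed on them rather than that they are absent. Running ldd with relocation checking over each plugin should surface any unresolved dependency or undefined symbol (a sketch, using the plugin directory from the error messages):

```shell
# dlopen() failures on existing .so files are usually an unresolved
# dependency or an undefined symbol; ldd -r reports both.
PLUGDIR=/usr/lib64/openmpi/lib/openmpi
for f in "$PLUGDIR"/mca_shmem_*.so; do
    [ -e "$f" ] || { echo "no plugins found in $PLUGDIR"; break; }
    echo "== $f =="
    ldd -r "$f" 2>&1 | grep -E 'not found|undefined symbol' \
        || echo "   (all dependencies resolved)"
done
```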
binary is also the one referenced on the first line of the script. With some DEBUG print statements, the source of all those error messages was tracked down to this line:
Code: Select all
MPI_Init();
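Since the failure happens inside the very first MPI call, a quick way to rule out maker's Perl layer entirely is to launch something trivial under the same mpiexec. If the same component_find warnings show up for a plain hostname, the Open MPI runtime or environment is at fault, not maker's compiled glue (a sketch, reusing the paths from above):

```shell
# Isolation check: run a non-MPI program under the same mpiexec.
# The mca_shmem warnings come from component loading, which mpiexec
# itself exercises, so they would reappear here if the install is broken.
MPIEXEC=/usr/lib64/openmpi/bin/mpiexec
if [ -x "$MPIEXEC" ]; then
    "$MPIEXEC" -n 1 hostname
else
    echo "mpiexec not found at $MPIEXEC"
fi
```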
Code: Select all
use Process::MpiChunk;
use Process::MpiTiers;
use Parallel::Application::MPI qw(:all);
definitely is; it is in ~/src/maker/perl/lib/Parallel/Application/MPI.pm and consists of:
Code: Select all
sub MPI_Init {
    my $stat = 0;
    if($$ != 0 && !$INITIALIZED && _load()){
        # allow signals to interrupt blocked MPI calls
        UNSAFE_SIGNALS {
            $stat = C_MPI_Init();
        };
        $INITIALIZED = 1;
    }
    return $stat;
}
Code: Select all
eval{
    #this comment is just a way to force Inline::C to recompile on changing MPICC and MPIDIR
    my $comment = "void _comment() {\nchar comment[] = \"MPICC=$mpicc, MPIDIR=$mpidir, CCFLAGSEX=$extra\";\n}\n";
    Inline->bind(C => $CODE . $comment,
                 NAME => 'Parallel::Application::MPI',
                 DIRECTORY => $loc,
                 CC => $mpicc,
                 LD => $mpicc,
                 CCFLAGSEX => $extra,
                 INC => '-I'.$mpidir,);
};
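One possibility worth mentioning: Inline::C caches the compiled glue under the DIRECTORY passed to bind(), and a cached object built against an earlier or different Open MPI would keep being dlopen()ed even after mpicc and the headers are corrected. Clearing that cache forces a clean rebuild. A sketch (the search path is an assumption; the real cache location is whatever $loc is set to in MPI.pm):

```shell
# Look for Inline::C build caches under the maker tree; _Inline is the
# conventional directory name Inline uses when DIRECTORY is not set, so
# this may miss the real cache if $loc points elsewhere.
MAKER_DIR=~/src/maker
find "$MAKER_DIR" -type d -name '_Inline' 2>/dev/null || true
# Once identified, remove each cache and rerun maker so Inline::C
# recompiles against the current mpicc:
#   rm -rf /path/to/_Inline
```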
/usr/include/openmpi-x86_64, which is correct, as far as I can tell.
Any suggestions on what is going wrong here and how to fix it?