MPMD runs using the match script

The OpenMPI and MVAPICH2 versions of MPI can run Multiple Instruction - Multiple Data (MIMD), also called Multiple Program - Multiple Data (MPMD), programs. That is, each MPI task can be a different program. For example, one task can be a Fortran program and another a C program. You can also explicitly map MPI tasks to nodes to run fewer than 1 MPI task per core. This is useful when you want to run jobs that need a large amount of memory per task, when you want one task to be on a node by itself and the rest on other nodes, or when you are running hybrid OpenMP/MPI jobs. We will discuss running MPMD programs first.

The most versatile way to run an MPMD program is to use an "appfile". The appfile is a list of nodes on which to run, along with the program that is to be run on each node. The command for using an appfile to run an MPI program is

mpiexec -app appfile

If you specify the --app option on the mpiexec command line, all other arguments are ignored.

The appfile is a collection of lines of the form

-host <host name> -np <number of copies to run on host> <program name>

If you specify different application names in your appfile, then you have a MIMD parallel program. The -np number on each line determines how many copies of that line's program to run on that line's node.

It is possible to have a node listed more than once. For example, the following two appfiles are equivalent.

Appfile Example 1

          -host compute-1-1 -np 1 myprogram
          -host compute-1-1 -np 1 myprogram
          -host compute-1-1 -np 1 myprogram
          -host compute-1-1 -np 1 myprogram

Appfile Example 2

 
          -host compute-1-1 -np 4 myprogram
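
One way to convince yourself that two appfiles request the same work is to total the -np values for each host and program pair. The following is a minimal sketch, not part of the match script. The function name appfile_summary is made up for this example, and the parsing assumes every line has exactly the "-host <host> -np <count> <program>" form shown above.

```shell
# Hypothetical helper: print "host program total" for each host/program pair.
# Assumes the simple appfile line format: -host <host> -np <count> <program>
appfile_summary() {
    awk '
        $1 == "-host" && $3 == "-np" { total[$2 " " $5] += $4 }
        END { for (k in total) print k, total[k] }
    ' "$1" | sort
}

# Demonstration using the two example appfiles from the text.
tmp=$(mktemp -d)
cat > "$tmp/appfile1" <<'EOF'
-host compute-1-1 -np 1 myprogram
-host compute-1-1 -np 1 myprogram
-host compute-1-1 -np 1 myprogram
-host compute-1-1 -np 1 myprogram
EOF
cat > "$tmp/appfile2" <<'EOF'
-host compute-1-1 -np 4 myprogram
EOF
appfile_summary "$tmp/appfile1"   # prints: compute-1-1 myprogram 4
appfile_summary "$tmp/appfile2"   # prints: compute-1-1 myprogram 4
rm -r "$tmp"
```

Both summaries are identical, which is what "equivalent" means here: the same number of copies of the same program on the same node.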

The difficulty is that the names of the nodes assigned to you by the scheduler are not known until after the job is submitted, so you need to create the appfile on the fly from within your PBS script.

We have created a script, "match", located at /lustre/apps/utility/match on RA and /opt/utility/match on Mio, which takes a list of nodes and a list of applications to run on those nodes and creates an appfile.

The PBS variable $PBS_NODEFILE contains the name of a file that has the list of nodes on which your job will run. If you have a list of applications to run in a file, say app_list, then one usage of match would be

     match $PBS_NODEFILE app_list > appfile
     mpiexec -app appfile

The number of applications that get launched is equal to the length of the longer of the two lists, the node list or the application list. If there are more nodes than applications, the application list is cycled through again, so multiple copies of each application are launched. If you are running on RA, the $PBS_NODEFILE file contains 8 copies of the name of each node used in the computation. Assume you are running on 2 nodes, say compute-3-2 and fatcompute-9-8, and you have two different programs to run, c_ex00 and f_ex00. Your $PBS_NODEFILE would contain:

compute-3-2
compute-3-2
compute-3-2
compute-3-2
compute-3-2
compute-3-2
compute-3-2
compute-3-2
fatcompute-9-8
fatcompute-9-8
fatcompute-9-8
fatcompute-9-8
fatcompute-9-8
fatcompute-9-8
fatcompute-9-8
fatcompute-9-8

Here are some examples with different application list files. In these examples the file nodefile holds the node list shown above.


[tkaiser@ra mpiTests]$ cat applist
c_ex00
f_ex00
[tkaiser@ra mpiTests]$ match nodefile applist 
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00


[tkaiser@ra mpiTests]$ cat applist4
c_ex00
c_ex00
f_ex00
f_ex00
[tkaiser@ra mpiTests]$ match nodefile applist4
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  f_ex00


[tkaiser@ra mpiTests]$ cat applist4b
c_ex00
f_ex00
c_ex00
f_ex00
[tkaiser@ra mpiTests]$ match nodefile applist4b
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  compute-3-2  -np  1  c_ex00
-host  compute-3-2  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
-host  fatcompute-9-8  -np  1  c_ex00
-host  fatcompute-9-8  -np  1  f_ex00
[tkaiser@ra mpiTests]$ 
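
The pairing seen in the listings above can be imitated in a few lines of awk. This is only a simplified stand-in to illustrate the rule, not the match script itself. The function name pair_lists is made up, and wrapping around the shorter list in both directions is an assumption drawn from the examples.

```shell
# Hypothetical stand-in for the simplest use of match:
# pair line i of the node list with line i of the application list,
# wrapping around the shorter list, one copy (-np 1) per line.
pair_lists() {
    # $1 = node file, $2 = application file
    awk '
        NR == FNR { if (NF) node[n++] = $1; next }   # first file: node names
        NF        { app[a++] = $1 }                  # second file: program names
        END {
            if (n == 0 || a == 0) exit 1             # nothing to pair
            total = (n > a) ? n : a                  # length of the longer list
            for (i = 0; i < total; i++)
                printf "-host  %s  -np  1  %s\n", node[i % n], app[i % a]
        }
    ' "$1" "$2"
}

# Small demonstration with two slots per node instead of eight.
tmp=$(mktemp -d)
printf '%s\n' compute-3-2 compute-3-2 fatcompute-9-8 fatcompute-9-8 > "$tmp/nodefile"
printf '%s\n' c_ex00 f_ex00 > "$tmp/applist"
pair_lists "$tmp/nodefile" "$tmp/applist"
rm -r "$tmp"
```

With four node entries and two programs, the sketch prints four lines that alternate between c_ex00 and f_ex00, the same pattern as the first listing above.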

If you do a unique sort

sort -u $PBS_NODEFILE > shortlist

then the file shortlist will contain a single copy of each node name. If you are using two nodes then "shortlist" would contain the names of those two nodes.
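
Note that sort -u returns the node names in sorted order, which may differ from the order in which the scheduler listed them. If the order matters to you, the sketch below shows one alternative that keeps the order of first appearance, along with a way to count the slots on each node. The function names are made up for this example.

```shell
# Count how many slots (lines) each node has in a node file.
slots_per_node() {
    sort "$1" | uniq -c
}

# List each node once, keeping the order of first appearance.
unique_in_order() {
    awk '!seen[$0]++' "$1"
}

# Demonstration with a node file that is not in sorted order.
tmp=$(mktemp)
printf '%s\n' fatcompute-9-8 fatcompute-9-8 compute-3-2 compute-3-2 > "$tmp"
sort -u "$tmp"            # compute-3-2 first, then fatcompute-9-8
unique_in_order "$tmp"    # fatcompute-9-8 first, then compute-3-2
slots_per_node "$tmp"     # a count of 2 for each node
rm -f "$tmp"
```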

The match script can also take a replication count. The replication count is the number of copies of an application to run on a node, that is, the -np value written to the appfile. If your application list and your node list each contain two entries, then you can run four copies of one program on one node and four copies of the other program on the second node by doing the following:

     match shortlist applist -4 > appfile
     mpiexec -app appfile

If your application list only contained a single entry then the above example would run 4 copies of the application on each of the two nodes.

The match script has a help screen that is shown if you type

match --help

Hybrid OpenMP/MPI program
Running fewer than 8 MPI tasks per node
and using match in command line mode

The match script can also take the program name[s] and replication count[s] on the command line. For example, if you wanted to run a hybrid OpenMP/MPI program with four copies of the program "c_ex00" on each node, you could do the following:

/lustre/home/apps/utility/match  shortlist -p"c_ex00"  4 > appfile
mpiexec  --app appfile
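
In a hybrid run each MPI task normally starts several OpenMP threads, and a common choice is the number of cores per node divided by the number of MPI tasks per node. The following is a minimal sketch of that arithmetic, assuming the node file lists each node once per core, as described for RA above. The function name threads_per_task is made up for this example. OMP_NUM_THREADS is the standard OpenMP environment variable, but how it reaches tasks on remote nodes depends on your MPI launcher and is not covered here.

```shell
# Hypothetical helper: OpenMP threads per MPI task.
# $1 = node file with one line per core, $2 = MPI tasks per node
threads_per_task() {
    slots=$(grep -c . "$1")               # total cores in the job
    nodes=$(sort -u "$1" | grep -c .)     # number of distinct nodes
    [ "$nodes" -gt 0 ] || return 1
    [ "$2" -gt 0 ] || return 1
    echo $(( slots / nodes / $2 ))
}

# Demonstration: 2 nodes with 8 cores each, 4 MPI tasks per node.
tmp=$(mktemp)
for n in compute-3-2 fatcompute-9-8; do
    for i in 1 2 3 4 5 6 7 8; do echo "$n"; done
done > "$tmp"
OMP_NUM_THREADS=$(threads_per_task "$tmp" 4)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"   # prints OMP_NUM_THREADS=2
rm -f "$tmp"
```

Inside a real job you would pass $PBS_NODEFILE instead of the temporary file.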

Assume we have two MPI programs, one in C and the other in Fortran, c_ex00.c and f_ex00.f90. These programs just print the MPI task id and the name of the node on which they are running. Below are several examples of match being used in command line mode to map these programs to nodes, along with the sorted program output.



********** command **********

/lustre/home/apps/utility/match  shortlist -p"c_ex00"   1 8 > appfile

********** appfile **********

-host  compute-3-24.local  -np  1  c_ex00
-host  fatcompute-12-10.local  -np  8  c_ex00

********** output **********

   C says Hello from   0 on compute-3-24.local
   C says Hello from   1 on fatcompute-12-10.local
   C says Hello from   2 on fatcompute-12-10.local
   C says Hello from   3 on fatcompute-12-10.local
   C says Hello from   4 on fatcompute-12-10.local
   C says Hello from   5 on fatcompute-12-10.local
   C says Hello from   6 on fatcompute-12-10.local
   C says Hello from   7 on fatcompute-12-10.local
   C says Hello from   8 on fatcompute-12-10.local



********** command **********

/lustre/home/apps/utility/match  shortlist -p"c_ex00 f_ex00"   1 > appfile

********** appfile **********

-host  compute-3-24.local  -np  1  c_ex00
-host  fatcompute-12-10.local  -np  1  f_ex00

********** output **********

   C says Hello from   0 on compute-3-24.local
Fort says Hello from   1 on fatcompute-12-10.local



********** command **********

/lustre/home/apps/utility/match  shortlist -p"c_ex00 f_ex00"   2 > appfile

********** appfile **********

-host  compute-3-24.local  -np  2  c_ex00
-host  fatcompute-12-10.local  -np  2  f_ex00

********** output **********

   C says Hello from   0 on compute-3-24.local
   C says Hello from   1 on compute-3-24.local
Fort says Hello from   2 on fatcompute-12-10.local
Fort says Hello from   3 on fatcompute-12-10.local



********** command **********

/lustre/home/apps/utility/match  shortlist -p"c_ex00 f_ex00"   2 4 > appfile

********** appfile **********

-host  compute-3-24.local  -np  2  c_ex00
-host  fatcompute-12-10.local  -np  4  f_ex00

********** output **********

   C says Hello from   0 on compute-3-24.local
   C says Hello from   1 on compute-3-24.local
Fort says Hello from   2 on fatcompute-12-10.local
Fort says Hello from   3 on fatcompute-12-10.local
Fort says Hello from   4 on fatcompute-12-10.local
Fort says Hello from   5 on fatcompute-12-10.local



********** command **********

/lustre/home/apps/utility/match  shortlist -p"c_ex00"  2 > appfile

********** appfile **********

-host  compute-3-24.local  -np  2  c_ex00
-host  fatcompute-12-10.local  -np  2  c_ex00

********** output **********

   C says Hello from   0 on compute-3-24.local
   C says Hello from   1 on compute-3-24.local
   C says Hello from   2 on fatcompute-12-10.local
   C says Hello from   3 on fatcompute-12-10.local
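
In each of the outputs above, the MPI task ids follow the order of the lines in the appfile. Assuming that numbering holds in general, which the examples suggest but this page does not state, the sketch below predicts which task ids land on each line of an appfile. The function name rank_ranges is made up for this example.

```shell
# Hypothetical helper: show the range of MPI task ids for each appfile line,
# assuming ids are handed out in appfile order starting at 0.
rank_ranges() {
    awk '$1 == "-host" && $3 == "-np" {
        printf "%s  %s  ranks %d-%d\n", $2, $5, start, start + $4 - 1
        start += $4
    }' "$1"
}

# Demonstration using the appfile from the first example above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
-host  compute-3-24.local  -np  1  c_ex00
-host  fatcompute-12-10.local  -np  8  c_ex00
EOF
rank_ranges "$tmp"
rm -f "$tmp"
```

For that appfile the sketch reports task 0 on compute-3-24.local and tasks 1 through 8 on fatcompute-12-10.local, which agrees with the program output shown for that example.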

Feel free to copy and modify the match script for your own needs.