Reservations, Node Selection, Interactive Runs

Reservations on AuN are not currently supported.

Reservations on Mio

Reservations are no longer required on Mio to evict other users from your nodes. In the past, people would set a reservation on their nodes and, in doing so, purge jobs from users not belonging to their group. Now you need only submit your job to your group's partition. See Selecting Nodes on Mio and Running only on nodes you own below.

Selecting Nodes on Mio

There are two ways to manually select the nodes on which to run: list them explicitly on the command line, or select a partition. The partition method is discussed below in Running only on nodes you own (or in a particular partition).

Below is the portion of the srun man page that describes how to specify a list of nodes on which to run:


-w, --nodelist=<node name list>
    Request a specific list of hosts. The job will contain at least these hosts.
    The list may be specified as a comma-separated list of hosts, a range of hosts
    (compute[1-5,7,...] for example), or a filename. The host list will be assumed to
    be a filename if it contains a "/" character. If you specify a maximum node count
    (-N1-2, for example) and there are more than 2 hosts in the file, only the first
    2 nodes will be used in the request list. Rather than repeating a host name
    multiple times, an asterisk and a repetition count may be appended to a host
    name. For example "compute1,compute1" and "compute1*2" are equivalent.

Example: running the script myscript on compute001, compute002, and compute003...

[joeuser@mio001 ~]$ sbatch --nodelist=compute[001-003] myscript

Example: running the "hello world" program /opt/utility/phostname interactively on compute001, compute002, and compute003...

[joeuser@mio001 ~]$ srun --nodelist=compute[001-003] --tasks-per-node=4 /opt/utility/phostname
compute001
compute001
compute001
compute001
compute002
compute002
compute002
compute002
compute003
compute003
compute003
compute003
[joeuser@mio001 ~]$

Running only on nodes with particular features such as number of cores

There are several generations of nodes on Mio, each with different "features." You can see the features by running the command:

[joeuser@mio001 ~]$ /opt/utility/slurmnodes -fAvailableFeatures
compute000
   Features core8,nehalem,mthca,ddr
compute001
   Features core8,nehalem,mthca,ddr
...
compute032
   Features core12,westmere,mthca,ddr
compute033
   Features core12,westmere,mthca,ddr
...
compute157
   Features core24,haswell,mlx4,fdr
...

Features can be used to select subsets of nodes. For example, if you want to run only on nodes with 24 cores, you can add the option --constraint=core24 to your sbatch command line or script.

[joeuser@mio001 ~]$ sbatch --constraint=core24 simple_slurm
Submitted batch job 1289851
[joeuser@mio001 ~]$

Which gives us:

[joeuser@mio001 ~]$ squeue -u joeuser
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1289851   compute   hybrid  joeuser  R       0:01      2 compute[157-158]
[joeuser@mio001 ~]$
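
The constraint can also be placed inside the batch script itself on an #SBATCH line. Below is a minimal sketch of such a script; the job name, node count, and time limit are placeholder assumptions rather than values taken from the example above:

#!/bin/bash
#SBATCH --job-name=core24_test     # placeholder job name
#SBATCH --constraint=core24        # only run on nodes with the core24 feature
#SBATCH --nodes=2                  # assumed node count
#SBATCH --tasks-per-node=4         # assumed tasks per node
#SBATCH --time=00:10:00            # placeholder time limit

# report which nodes the job actually received
srun /opt/utility/phostname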

Running only on nodes you own (or in a particular partition)

Every normal compute node on Mio (the exceptions are the GPU and PHI nodes) is part of two partitions, or groupings: the compute partition and a partition assigned to a research group. That is, each research group has a partition, and the group's nodes are in that partition. The GPU and PHI nodes are in their own partitions to prevent people from accidentally running on them.

You can see the partitions that you are allowed to use (compute, phi, gpu, and your group's partition) by running the command sinfo. sinfo -N gives a node-oriented listing showing the partition of each node, sinfo -a shows all partitions, and sinfo -a --format="%P %N" shows a compact list of all partitions and their nodes.

Add the option -p partition_name to your srun or sbatch command to run in the named partition. The default partition is compute, which contains all of the normal nodes, so by default your job can end up on any of them. Specifying your group's partition will restrict your job to "your" nodes.

Also, starting a job in your group's partition will purge any jobs on your nodes that were run under the default partition, so it is not necessary to create a reservation to gain access to your nodes. Conversely, if you do not run in your group's partition, your jobs can be deleted by the group that owns the nodes they land on.

There is a shortcut command that will show you the partitions in which you can run, /opt/utility/partitions. For example:

[joeuser@mio001 utility]$ /opt/utility/partitions
Partitions and their nodes available to joeuser
    compute   compute[000-003,008-013,016-033,035-041,043-047,049-052,054-081,083-193]
        phi   phi[001-002]
        gpu   gpu[001-003]
  joesgroup   compute[056-061,160-167]
[joeuser@mio001 utility]$

We see that joeuser can run on nodes in the compute partition. The compute, phi, and gpu partitions are available to everyone. Joe's group "owns" compute[056-061,160-167], and running in the joesgroup partition allows his jobs to preempt default-partition jobs on those nodes.
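
For example (a sketch reusing the myscript script from the earlier example and the joesgroup partition shown above), Joe could restrict a batch or interactive job to his group's nodes with:

[joeuser@mio001 ~]$ sbatch -p joesgroup myscript
[joeuser@mio001 ~]$ srun -p joesgroup --tasks-per-node=4 /opt/utility/phostname

Any default-partition jobs running on those nodes would be preempted.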

Running threaded jobs and/or running with fewer than N MPI tasks per node

Slurm will try to pack as many tasks onto a node as it can, so that there is at least one task or thread per core. So if you are running fewer than N MPI tasks per node, where N is the number of cores on the node, Slurm may put additional jobs on your node.

You can prevent this by setting values for the flags --tasks-per-node and --cpus-per-task on your sbatch command line or in your Slurm script. The value of --tasks-per-node times --cpus-per-task should equal the number of cores on the node. For example, if you are running on two 16-core nodes and you want 8 MPI tasks, you might specify

--nodes=2 --tasks-per-node=4 --cpus-per-task=4

where 2*4*4 = 32, the total number of cores on the two nodes; each of the 8 MPI tasks then has 4 cores available for its threads.
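
Put together in a batch script, the settings above might look like the following sketch; the time limit and the executable name my_hybrid_program are placeholders, and the OMP_NUM_THREADS line applies only if your program uses OpenMP threads:

#!/bin/bash
#SBATCH --nodes=2              # two 16-core nodes
#SBATCH --tasks-per-node=4     # 4 MPI tasks on each node
#SBATCH --cpus-per-task=4      # 4 cores reserved for each task's threads
#SBATCH --time=00:10:00        # placeholder time limit

# give each MPI task one OpenMP thread per core reserved for it
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./my_hybrid_program       # placeholder executable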

You can also prevent additional jobs from running on your nodes by using the --exclusive flag.
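
For example (a sketch reusing the myscript name from the earlier example):

[joeuser@mio001 ~]$ sbatch --exclusive myscript

The same flag can be placed in a script as an #SBATCH --exclusive line.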