srun(1)                         SLURM Commands                         srun(1)



NAME
       srun - Run parallel jobs


SYNOPSIS
       srun [OPTIONS...]  executable [args...]


DESCRIPTION
       Run a parallel job on a cluster managed by SLURM.  If necessary, srun
       will first create a resource allocation in which to run the parallel
       job.

       The following document describes the influence of various options on
       the allocation of CPUs to jobs and tasks:
       http://slurm.schedmd.com/cpu_management.html


OPTIONS
       -A, --account=<account>
               Charge resources used by this job to the specified account.
               The account is an arbitrary string.  The account name may be
               changed after job submission using the scontrol command.


       --acctg-freq
              Define the job  accounting  and  profiling  sampling  intervals.
              This  can be used to override the JobAcctGatherFrequency parame-
               ter in SLURM’s configuration file, slurm.conf.  The supported
               format is as follows:

              --acctg-freq=<datatype>=<interval>
                          where  <datatype>=<interval> specifies the task sam-
                          pling interval for the jobacct_gather  plugin  or  a
                          sampling  interval  for  a  profiling  type  by  the
                          acct_gather_profile  plugin.  Multiple,  comma-sepa-
                          rated  <datatype>=<interval> intervals may be speci-
                          fied. Supported datatypes are as follows:

                          task=<interval>
                                 where <interval> is the task sampling  inter-
                                 val in seconds for the jobacct_gather plugins
                                 and    for    task    profiling    by     the
                                 acct_gather_profile  plugin.  NOTE: This fre-
                                  quency is used to monitor memory usage.  If
                                  memory limits are enforced, the highest fre-
                                  quency a user can request is the one config-
                                  ured in the slurm.conf file; sampling cannot
                                  be disabled (=0) either.

                          energy=<interval>
                                 where <interval> is the sampling interval  in
                                 seconds   for   energy  profiling  using  the
                                 acct_gather_energy plugin

                          network=<interval>
                                 where <interval> is the sampling interval  in
                                 seconds  for  infiniband  profiling using the
                                 acct_gather_infiniband plugin.

                          filesystem=<interval>
                                 where <interval> is the sampling interval  in
                                 seconds  for  filesystem  profiling using the
                                 acct_gather_filesystem plugin.

               The default value for the task sampling interval is 30.  The
               default value for all other intervals is 0.  An
              interval of 0 disables sampling of the specified type.   If  the
              task sampling interval is 0, accounting information is collected
              only at job termination (reducing SLURM  interference  with  the
              job).
              Smaller (non-zero) values have a greater impact upon job perfor-
              mance, but a value of 30 seconds is not likely to be  noticeable
              for applications having less than 10,000 tasks.
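
       As a sketch, the following hypothetical invocation sets a task
       sampling interval of 60 seconds and an energy sampling interval of 30
       seconds (the executable name is a placeholder, and the energy datatype
       requires the acct_gather_energy plugin to be configured):

```shell
# Sample task accounting every 60 s and energy usage every 30 s
# (task=... needs a jobacct_gather plugin; energy=... needs the
# acct_gather_energy plugin to be configured at this site).
srun --acctg-freq=task=60,energy=30 ./my_app
```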


       -B, --extra-node-info=<sockets[:cores[:threads]]>
              Request  a  specific  allocation of resources with details as to
              the number and type of computational resources within a cluster:
              number  of  sockets (or physical processors) per node, cores per
              socket, and threads per core.  The  total  amount  of  resources
              being  requested is the product of all of the terms.  Each value
              specified is considered a minimum.  An asterisk (*) can be  used
              as a placeholder indicating that all available resources of that
              type are to be utilized.  As with nodes, the  individual  levels
              can also be specified in separate options if desired:
                  --sockets-per-node=<sockets>
                  --cores-per-socket=<cores>
                  --threads-per-core=<threads>
              If  task/affinity  plugin is enabled, then specifying an alloca-
              tion in this manner also sets a  default  --cpu_bind  option  of
              threads  if the -B option specifies a thread count, otherwise an
              option of cores if a  core  count  is  specified,  otherwise  an
              option   of   sockets.    If   SelectType   is   configured   to
              select/cons_res,  it  must  have   a   parameter   of   CR_Core,
              CR_Core_Memory,  CR_Socket,  or CR_Socket_Memory for this option
              to be honored.  This option is not supported on BlueGene systems
               (select/bluegene plugin is configured).  If not specified, the
               output of scontrol show job will display ’ReqS:C:T=*:*:*’.
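
       For illustration, the following equivalent requests ask for nodes
       with at least two sockets and four cores per socket (node counts and
       the executable are placeholders):

```shell
# Combined form: sockets[:cores[:threads]], each value a minimum
srun -N1 -B 2:4 ./my_app

# Equivalent request using the separate per-level options
srun -N1 --sockets-per-node=2 --cores-per-socket=4 ./my_app
```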


       --begin=<time>
              Defer initiation of this  job  until  the  specified  time.   It
              accepts  times  of  the form HH:MM:SS to run a job at a specific
              time of day (seconds are optional).  (If that  time  is  already
              past,  the next day is assumed.)  You may also specify midnight,
              noon, fika (3  PM)  or  teatime  (4  PM)  and  you  can  have  a
              time-of-day suffixed with AM or PM for running in the morning or
               the evening.  You can also say what day the job will be run by
               specifying a date of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.
              Combine   date   and   time   using   the    following    format
              YYYY-MM-DD[THH:MM[:SS]].  You  can  also  give  times like now +
              count time-units, where the time-units can be seconds (default),
              minutes, hours, days, or weeks and you can tell SLURM to run the
              job today with the keyword today and to  run  the  job  tomorrow
              with  the  keyword tomorrow.  The value may be changed after job
              submission using the scontrol command.  For example:
                 --begin=16:00
                 --begin=now+1hour
                 --begin=now+60           (seconds by default)
                 --begin=2010-01-20T12:34:00


              Notes on date/time specifications:
               - Although the ’seconds’ field of the HH:MM:SS time  specifica-
              tion  is  allowed  by  the  code, note that the poll time of the
              SLURM scheduler is not precise enough to guarantee  dispatch  of
              the  job on the exact second.  The job will be eligible to start
              on the next poll following the specified time.  The  exact  poll
              interval  depends  on the SLURM scheduler (e.g., 60 seconds with
              the default sched/builtin).
               -  If  no  time  (HH:MM:SS)  is  specified,  the   default   is
              (00:00:00).
               -  If a date is specified without a year (e.g., MM/DD) then the
              current year is assumed, unless the  combination  of  MM/DD  and
              HH:MM:SS  has  already  passed  for that year, in which case the
              next year is used.


       --checkpoint=<time>
              Specifies the interval between creating checkpoints of  the  job
              step.   By  default,  the job step will have no checkpoints cre-
              ated.  Acceptable time formats include "minutes",  "minutes:sec-
              onds",  "hours:minutes:seconds",  "days-hours", "days-hours:min-
              utes" and "days-hours:minutes:seconds".


       --checkpoint-dir=<directory>
              Specifies the directory into which the job or job step’s  check-
              point  should be written (used by the checkpoint/blcr and check-
              point/xlch plugins only).  The  default  value  is  the  current
              working  directory.   Checkpoint  files  will  be  of  the  form
              "<job_id>.ckpt" for jobs and "<job_id>.<step_id>.ckpt"  for  job
              steps.
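
       A sketch combining the two checkpoint options (this assumes a check-
       point plugin such as checkpoint/blcr is configured, and the directory
       and interval are illustrative):

```shell
# Checkpoint the job step every 30 minutes, writing the
# <job_id>.<step_id>.ckpt files under /scratch/ckpt
# (requires the checkpoint/blcr or checkpoint/xlch plugin).
srun --checkpoint=30 --checkpoint-dir=/scratch/ckpt ./my_app
```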


       --comment=<string>
              An arbitrary comment.


       -C, --constraint=<list>
              Nodes  can  have features assigned to them by the SLURM adminis-
              trator.  Users can specify which of these features are  required
              by  their  job  using  the constraint option.  Only nodes having
              features matching the job constraints will be  used  to  satisfy
              the  request.   Multiple  constraints may be specified with AND,
              OR, matching OR, resource  counts,  etc.   Supported  constraint
              options include:

              Single Name
                     Only nodes which have the specified feature will be used.
                     For example, --constraint="intel"

              Node Count
                     A request can specify the number  of  nodes  needed  with
                     some feature by appending an asterisk and count after the
                      feature name.  For example "--nodes=16 --con-
                      straint=graphics*4 ..." indicates that the job requires
                      16 nodes and that at least four of those nodes must
                      have the feature "graphics".

               AND    Only nodes with all of the specified features will be
                      used.  The ampersand is used for an AND operator.  For
                      example, --constraint="intel&gpu"

               OR     Only nodes with at least one of the specified features
                      will be used.  The vertical bar is used for an OR oper-
                      ator.  For example, --constraint="intel|amd"

              Matching OR
                     If  only  one of a set of possible options should be used
                     for all allocated nodes, then use  the  OR  operator  and
                     enclose the options within square brackets.  For example:
                     "--constraint=[rack1|rack2|rack3|rack4]" might be used to
                     specify that all nodes must be allocated on a single rack
                     of the cluster, but any of those four racks can be  used.

              Multiple Counts
                     Specific counts of multiple resources may be specified by
                     using the AND operator and enclosing the  options  within
                     square      brackets.       For      example:     "--con-
                     straint=[rack1*2&rack2*4]" might be used to specify  that
                     two  nodes  must be allocated from nodes with the feature
                     of "rack1" and four nodes must be  allocated  from  nodes
                     with the feature "rack2".

       WARNING:  When  srun is executed from within salloc or sbatch, the con-
       straint value can only contain a single feature name. None of the other
       operators are currently supported for job steps.
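
       To illustrate the forms above, a few hedged examples (feature names
       such as "intel", "gpu", "graphics" and "rack1" are site-specific and
       purely illustrative; check the features defined on your cluster):

```shell
# Only nodes having both the "intel" and "gpu" features
srun -N4 --constraint="intel&gpu" ./my_app

# 16 nodes, at least four of which have the "graphics" feature
srun --nodes=16 --constraint="graphics*4" ./my_app

# All nodes drawn from one rack, whichever of the four is available
srun -N8 --constraint="[rack1|rack2|rack3|rack4]" ./my_app
```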


       --contiguous
              If  set,  then  the  allocated nodes must form a contiguous set.
              Not honored with the topology/tree or topology/3d_torus plugins,
              both  of  which can modify the node ordering.  Not honored for a
              job step’s allocation.


       --cores-per-socket=<cores>
              Restrict node selection to nodes with  at  least  the  specified
              number of cores per socket.  See additional information under -B
              option above when task/affinity plugin is enabled.


       --cpu_bind=[{quiet,verbose},]type
              Bind tasks  to  CPUs.   Used  only  when  the  task/affinity  or
              task/cgroup  plugin  is  enabled.   The  configuration parameter
              TaskPluginParam may override these  options.   For  example,  if
              TaskPluginParam  is  configured  to bind to cores, your job will
              not be able to bind tasks  to  sockets.   NOTE:  To  have  SLURM
              always  report on the selected CPU binding for all commands exe-
              cuted in a shell, you can enable verbose  mode  by  setting  the
              SLURM_CPU_BIND environment variable value to "verbose".

              The  following  informational environment variables are set when
              --cpu_bind is in use:
                   SLURM_CPU_BIND_VERBOSE
                   SLURM_CPU_BIND_TYPE
                   SLURM_CPU_BIND_LIST

              See the  ENVIRONMENT  VARIABLES  section  for  a  more  detailed
               description of the individual SLURM_CPU_BIND variables.  These
               variables are available only if the task/affinity plugin is
               configured.

              When  using --cpus-per-task to run multithreaded tasks, be aware
              that CPU binding is inherited from the parent  of  the  process.
              This  means that the multithreaded task should either specify or
              clear the CPU binding itself to avoid having all threads of  the
              multithreaded  task use the same mask/CPU as the parent.  Alter-
              natively, fat masks (masks which specify more than  one  allowed
              CPU)  could  be  used for the tasks in order to provide multiple
              CPUs for the multithreaded tasks.

              By default, a job step has access to every CPU allocated to  the
              job.   To  ensure  that  distinct CPUs are allocated to each job
              step, use the --exclusive option.

              If the job step allocation includes an allocation with a  number
              of sockets, cores, or threads equal to the number of tasks times
              cpus-per-task, then the tasks will by default be  bound  to  the
              appropriate resources (auto binding). Disable this mode of oper-
              ation by explicitly setting "--cpu_bind=none".

              Note that a job step can be allocated different numbers of  CPUs
              on each node or be allocated CPUs not starting at location zero.
              Therefore one of the options which  automatically  generate  the
              task  binding  is  recommended.   Explicitly  specified masks or
              bindings are only honored when the job step has  been  allocated
              every available CPU on the node.

              Binding  a task to a NUMA locality domain means to bind the task
              to the set of CPUs that belong to the NUMA  locality  domain  or
              "NUMA  node".   If NUMA locality domain options are used on sys-
              tems with no NUMA support, then  each  socket  is  considered  a
              locality domain.

              Supported options include:

              q[uiet]
                     Quietly bind before task runs (default)

              v[erbose]
                     Verbosely report binding before task runs

              no[ne] Do not bind tasks to CPUs (default unless auto binding is
                     applied)

              rank   Automatically bind by task  rank.   The  lowest  numbered
                     task  on each node is bound to socket (or core or thread)
                     zero, etc.  Not supported unless the entire node is allo-
                     cated to the job.

              map_cpu:<list>
                     Bind  by  mapping  CPU  IDs  to  tasks as specified where
                     <list> is <cpuid1>,<cpuid2>,...<cpuidN>.  The mapping  is
                     specified  for a node and identical mapping is applied to
                     the tasks on every node (i.e. the lowest task ID on  each
                     node is mapped to the first CPU ID specified in the list,
                     etc.).  CPU IDs are interpreted as decimal values  unless
                     they are preceded with ’0x’ in which case they are inter-
                     preted as hexadecimal values.  Not supported  unless  the
                     entire node is allocated to the job.

              mask_cpu:<list>
                     Bind  by  setting  CPU  masks on tasks as specified where
                     <list> is  <mask1>,<mask2>,...<maskN>.   The  mapping  is
                     specified  for a node and identical mapping is applied to
                     the tasks on every node (i.e. the lowest task ID on  each
                     node  is  mapped to the first mask specified in the list,
                     etc.).  CPU masks are always interpreted  as  hexadecimal
                     values  but  can  be  preceded with an optional ’0x’. Not
                     supported unless the entire node is allocated to the job.

              rank_ldom
                     Bind  to  a  NUMA  locality domain by rank. Not supported
                     unless the entire node is allocated to the job.

              map_ldom:<list>
                     Bind by mapping NUMA locality  domain  IDs  to  tasks  as
                     specified  where  <list>  is  <ldom1>,<ldom2>,...<ldomN>.
                     The locality domain IDs are interpreted as decimal values
                     unless they are preceded with ’0x’ in which case they are
                     interpreted as hexadecimal values.  Not supported  unless
                     the entire node is allocated to the job.

              mask_ldom:<list>
                     Bind  by  setting  NUMA locality domain masks on tasks as
                     specified  where  <list>  is  <mask1>,<mask2>,...<maskN>.
                     NUMA locality domain masks are always interpreted as hex-
                     adecimal values but can  be  preceded  with  an  optional
                     ’0x’.   Not supported unless the entire node is allocated
                     to the job.

              sockets
                     Automatically generate masks binding  tasks  to  sockets.
                     Only  the CPUs on the socket which have been allocated to
                     the job will be used.  If the  number  of  tasks  differs
                     from  the  number of allocated sockets this can result in
                     sub-optimal binding.

              cores  Automatically generate masks binding tasks to cores.   If
                     the  number of tasks differs from the number of allocated
                     cores this can result in sub-optimal binding.

              threads
                     Automatically generate masks binding  tasks  to  threads.
                     If  the  number of tasks differs from the number of allo-
                     cated threads this can result in sub-optimal binding.

              ldoms  Automatically generate masks binding tasks to NUMA local-
                     ity  domains.   If  the  number of tasks differs from the
                     number of allocated locality domains this can  result  in
                     sub-optimal binding.

              boards Automatically generate masks binding tasks to boards.  If
                     the number of tasks differs from the number of  allocated
                     boards  this  can  result  in  sub-optimal  binding. This
                     option is supported by the task/cgroup plugin only.

              help   Show help message for cpu_bind
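
               A few binding requests, for illustration (CPU IDs and masks
               are hypothetical and depend on the node hardware; the map and
               mask forms also require that the entire node is allocated to
               the job):

```shell
# Report the binding verbosely and bind one task per core
srun -n4 --cpu_bind=verbose,cores ./my_app

# Map the four lowest-ranked tasks on each node to CPUs 0, 2, 4, 6
srun -n4 --cpu_bind=map_cpu:0,2,4,6 ./my_app

# Give each task a two-CPU mask (0x3 = CPUs 0-1, 0xC = CPUs 2-3);
# masks are always read as hexadecimal
srun -n2 --cpu_bind=mask_cpu:0x3,0xC ./my_app
```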


       --cpu-freq=<requested frequency in kilohertz>

              Request that the job step initiated by this srun command be  run
              at the requested frequency if possible, on the CPUs selected for
              the step on the compute node(s).  Acceptable values  at  present
              include:

              Low           the lowest available frequency

              High          the highest available frequency

              HighM1        (high  minus  one)  will  select  the next highest
                            available frequency

              Medium        attempts to set a frequency in the middle  of  the
                            available range

              Conservative  attempts to use the Conservative CPU governor

              OnDemand      attempts  to  use  the  OnDemand CPU governor (the
                            default value)

              Performance   attempts to use the Performance CPU governor

              PowerSave     attempts to use the PowerSave CPU governor

       The following informational environment variable is set in the job step
       when --cpu-freq option is requested.
               SLURM_CPU_FREQ_REQ

       This  environment variable can also be used to supply the value for the
       CPU frequency request if it is set when the ’srun’ command  is  issued.
       The  --cpu-freq on the command line will override the environment vari-
       able value.  See the ENVIRONMENT VARIABLES section for a description of
       the SLURM_CPU_FREQ_REQ variable.

       NOTE:  This  parameter  is treated as a request, not a requirement.  If
       the job step’s node does not support setting the CPU frequency, or  the
       requested  value  is  outside  the  bounds of the legal frequencies, an
       error is logged, but the job step is allowed to continue.

       NOTE: Setting the frequency for just the CPUs of the job  step  implies
       that  the tasks are confined to those CPUs.  If task confinement (i.e.,
       TaskPlugin=task/affinity or TaskPlugin=task/cgroup with the "Constrain-
       Cores" option) is not configured, this parameter is ignored.

       NOTE:  When  the  step  completes,  the  frequency and governor of each
       selected CPU is reset to the configured CpuFreqDef value with a default
       value of the OnDemand CPU governor.

        NOTE: Submitting jobs with the --cpu-freq option when linuxproc is
        configured as the ProctrackType can cause jobs to run too quickly,
        before Accounting is able to poll for job information.  As a result,
        not all of the accounting information will be present.


       -c, --cpus-per-task=<ncpus>
              Request that ncpus be allocated per process. This may be  useful
              if  the  job is multithreaded and requires more than one CPU per
              task for optimal performance. The default is one  CPU  per  pro-
              cess.   If  -c  is  specified  without -n, as many tasks will be
              allocated per node as possible while satisfying the -c  restric-
              tion.  For  instance  on  a  cluster with 8 CPUs per node, a job
              request for 4 nodes and 3 CPUs per task may be allocated 3 or  6
              CPUs  per  node  (1 or 2 tasks per node) depending upon resource
              consumption by other jobs. Such a job may be unable  to  execute
              more than a total of 4 tasks.  This option may also be useful to
              spawn tasks without allocating resources to the  job  step  from
              the  job’s  allocation  when running multiple job steps with the
              --exclusive option.

              WARNING: There are configurations and options  interpreted  dif-
              ferently by job and job step requests which can result in incon-
              sistencies   for   this   option.    For   example   srun    -c2
              --threads-per-core=1  prog  may  allocate two cores for the job,
              but if each of those cores contains two threads, the job alloca-
              tion  will  include four CPUs. The job step allocation will then
              launch two threads per CPU for a total of two tasks.

              WARNING: When srun is executed from  within  salloc  or  sbatch,
              there  are configurations and options which can result in incon-
              sistent allocations when -c has a value greater than -c on  sal-
              loc or sbatch.
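
               As an example of the multithreaded case described above (the
               executable and thread count are placeholders; OMP_NUM_THREADS
               is an OpenMP convention, not something srun sets itself):

```shell
# Four tasks, each allocated four CPUs for its threads
export OMP_NUM_THREADS=4
srun -n4 -c4 ./my_openmp_app
```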


       -d, --dependency=<dependency_list>
               Defer the start of this job until the specified dependencies
               have been satisfied.  <dependency_list> is of the form
              <type:job_id[:job_id][,type:job_id[:job_id]]>.   Many  jobs  can
              share the same dependency and these jobs may even belong to dif-
              ferent   users.  The   value may be changed after job submission
              using the scontrol command.

              after:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                     have begun execution.

              afterany:job_id[:jobid...]
                     This  job  can  begin  execution after the specified jobs
                     have terminated.

              afternotok:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                     have terminated in some failed state (non-zero exit code,
                     node failure, timed out, etc).

              afterok:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                     have  successfully  executed  (ran  to completion with an
                     exit code of zero).

              expand:job_id
                     Resources allocated to this job should be used to  expand
                     the specified job.  The job to expand must share the same
                     QOS (Quality of Service) and partition.  Gang  scheduling
                     of resources in the partition is also not supported.

              singleton
                     This   job  can  begin  execution  after  any  previously
                     launched jobs sharing the same job  name  and  user  have
                     terminated.
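
               For example, a hypothetical two-stage pipeline in which the
               second stage runs only if the first succeeds (script names
               are placeholders, and the parsing below assumes sbatch's
               usual "Submitted batch job <id>" output format):

```shell
# Submit the first stage and capture its job ID
jobid=$(sbatch preprocess.sh | awk '{print $4}')

# Start the second stage only after the first completes successfully
srun --dependency=afterok:$jobid ./analyze
```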


       -D, --chdir=<path>
              Have  the  remote  processes do a chdir to path before beginning
              execution. The default is to chdir to the current working direc-
               tory of the srun process.  The path may be specified as an
               absolute path or as a path relative to the directory in which
               the command is executed.


       -e, --error=<mode>
              Specify  how  stderr is to be redirected. By default in interac-
              tive mode, srun redirects stderr to the same file as stdout,  if
              one is specified. The --error option is provided to allow stdout
              and stderr to be redirected  to  different  locations.   See  IO
              Redirection  below  for  more  options.   If  the specified file
              already exists, it will be overwritten.


       -E, --preserve-env
              Pass the current values of  environment  variables  SLURM_NNODES
              and  SLURM_NTASKS through to the executable, rather than comput-
              ing them from commandline parameters.


       --epilog=<executable>
              srun will run executable just after the job step completes.  The
              command  line  arguments  for executable will be the command and
              arguments of the job step.  If executable  is  "none",  then  no
              srun epilog will be run. This parameter overrides the SrunEpilog
              parameter in slurm.conf. This parameter is  completely  indepen-
              dent from the Epilog parameter in slurm.conf.



       --exclusive
              This  option has two slightly different meanings for job and job
              step allocations.  When used to initiate a job, the job  alloca-
              tion  cannot  share  nodes with other running jobs.  The default
              shared/exclusive behavior depends on  system  configuration  and
              the  partition’s  Shared  option takes precedence over the job’s
              option.

              This option can also be used when initiating more than  one  job
              step within an existing resource allocation, where you want sep-
              arate processors to be dedicated to each job step. If sufficient
              processors  are  not available to initiate the job step, it will
              be deferred. This can be thought of as providing a mechanism for
               resource management to the job within its allocation.

              The  exclusive  allocation  of  CPUs  only  applies to job steps
              explicitly invoked with the --exclusive option.  For example,  a
              job  might  be  allocated  one  node with four CPUs and a remote
              shell invoked on the  allocated  node.  If  that  shell  is  not
              invoked  with  the  --exclusive option, then it may create a job
              step with four tasks using the --exclusive option and  not  con-
              flict  with  the  remote  shell’s  resource allocation.  Use the
               --exclusive option to invoke every job step to ensure distinct
              resources for each step.

              Note  that all CPUs allocated to a job are available to each job
              step unless the --exclusive option is used plus task affinity is
               configured.  Since resource management is provided on a per-
               processor basis, the --ntasks option must be specified, but
               the following options
              should  NOT  be  specified --relative, --distribution=arbitrary.
              See EXAMPLE below.


       --export=<environment variables | NONE>
               Identify which environment variables are propagated to the
               launched application.  Multiple environment variable names
               should be comma separated.  Environment variable names may be
               specified to propagate the current value of those variables
               (e.g. "--export=EDITOR") or specific values for the variables
               may be exported (e.g. "--export=EDITOR=/bin/vi") in addition to the
              environment variables that would otherwise be set.   By  default
              all environment variables are propagated.
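
               For instance (the variable names are arbitrary):

```shell
# Propagate the current value of EDITOR, plus an explicit setting
# of a hypothetical MY_DEBUG variable, to the launched tasks
srun --export=EDITOR,MY_DEBUG=1 ./my_app
```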


       --gid=<group>
              If srun is run as root, and the --gid option is used, submit the
              job with group’s group access permissions.   group  may  be  the
              group name or the numerical group ID.


       --gres=<list>
              Specifies   a   comma   delimited  list  of  generic  consumable
              resources.   The  format  of  each  entry   on   the   list   is
              "name[[:type]:count]".   The  name  is  that  of  the consumable
              resource.  The count is the number of  those  resources  with  a
              default  value  of 1.  The specified resources will be allocated
              to the job on  each  node.   The  available  generic  consumable
              resources  is  configurable by the system administrator.  A list
              of available generic consumable resources will  be  printed  and
              the  command  will exit if the option argument is "help".  Exam-
               ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
               and "--gres=help".  NOTE: By default, a job step is allocated
               all of the generic resources that have been allocated to the
               job.  To change the behavior so that each job step is allocated no
              generic resources, explicitly set the value of --gres to specify
              zero  counts  for  each generic resource OR set "--gres=none" OR
              set the SLURM_STEP_GRES environment variable to "none".
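
               For example (GRES names and types such as "gpu" and "kepler"
               depend on the gres.conf at your site):

```shell
# Two generic GPUs on each allocated node
srun -N2 --gres=gpu:2 ./my_app

# Two GPUs of a specific, site-defined type
srun --gres=gpu:kepler:2 ./my_app

# Print the generic resources configured on this cluster, then exit
srun --gres=help
```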


       -H, --hold
              Specify the job is to be submitted in a held state (priority  of
               zero).  A held job can later be released using scontrol to
               reset its priority (e.g. "scontrol release <job_id>").


       -h, --help
              Display help information and exit.


       --hint=<type>
              Bind tasks according to application hints.

              compute_bound
                     Select settings for compute bound applications:  use  all
                     cores in each socket, one thread per core.

              memory_bound
                     Select  settings  for memory bound applications: use only
                     one core in each socket, one thread per core.

              [no]multithread
                     [don’t] use extra threads  with  in-core  multi-threading
                     which  can  benefit communication intensive applications.
                     Only supported with the task/affinity plugin.

              help   show this help message


       -I, --immediate[=<seconds>]
              exit if resources are not available within the time period spec-
              ified.   If  no  argument  is given, resources must be available
              immediately for the request to succeed.  By default, --immediate
              is off, and the command will block until resources become avail-
              able. Since this option’s argument is optional, for proper pars-
              ing  the  single letter option must be followed immediately with
              the value and not include a  space  between  them.  For  example
              "-I60" and not "-I 60".


       -i, --input=<mode>
               Specify how stdin is to be redirected.  By default, srun
               redirects stdin from the terminal to all tasks.  See IO
               Redirection below for more options.  On OS X, the poll()
               function does not support stdin, so input from a terminal is
               not possible.


       -J, --job-name=<jobname>
              Specify a name for the job. The specified name will appear along
              with the job id number when querying running jobs on the system.
              The default is the supplied  executable  program’s  name.  NOTE:
              This  information  may be written to the slurm_jobacct.log file.
               This file is space delimited, so if a space is used in the
               job name it will cause problems in properly displaying the
               contents of the slurm_jobacct.log file when the sacct
               command is used.


       --jobid=<jobid>
               Initiate a job step under an already allocated job with the
               specified job id.  Using this option will cause srun to
               behave exactly as if the SLURM_JOB_ID environment variable
               was set.


       -K, --kill-on-bad-exit[=0|1]
              Controls  whether  or  not  to terminate a job if any task exits
              with a non-zero exit code. If this option is not specified,  the
              default action will be based upon the SLURM configuration param-
              eter of KillOnBadExit. If this option is specified, it will take
              precedence  over  KillOnBadExit. An option argument of zero will
              not terminate the job. A non-zero argument or no  argument  will
              terminate  the job.  Note: This option takes precedence over the
              -W, --wait option to terminate the job  immediately  if  a  task
              exits  with  a non-zero exit code.  Since this option’s argument
              is optional, for proper parsing the single letter option must be
              followed  immediately  with  the  value  and not include a space
              between them. For example "-K1" and not "-K 1".


       -k, --no-kill
              Do not automatically terminate a job if one of the nodes it  has
              been  allocated  fails.  This option is only recognized on a job
              allocation, not for the submission of individual job steps.  The
               job will assume all responsibilities for fault-tolerance.
               Tasks launched using this option will not be considered
               terminated (e.g.
              -K,  --kill-on-bad-exit  and  -W,  --wait  options  will have no
              effect upon the job step).  The active job step (MPI  job)  will
              likely suffer a fatal error, but subsequent job steps may be run
              if this option is specified.  The default action is to terminate
              the job upon node failure.


       --launch-cmd
              Print  external  launch  command instead of running job normally
              through SLURM. This option is  only  valid  if  using  something
              other than the launch/slurm plugin.


       --launcher-opts=<options>
              Options  for the external launcher if using something other than
              the launch/slurm plugin.


       -l, --label
              Prepend task number to lines of stdout/err.  The --label  option
              will prepend lines of output with the remote task id.


       -L, --licenses=<license>
              Specification  of  licenses (or other resources available on all
              nodes of the cluster) which  must  be  allocated  to  this  job.
              License  names can be followed by a colon and count (the default
              count is one).  Multiple license names should be comma separated
              (e.g.  "--licenses=foo:4,bar").


       -m, --distribution=
              <block|cyclic|arbitrary|plane=<options>[:block|cyclic]>

              Specify  alternate  distribution  methods  for remote processes.
              This option controls the assignment of tasks  to  the  nodes  on
              which  resources  have  been  allocated, and the distribution of
              those resources to tasks for binding (task affinity). The  first
              distribution  method  (before the ":") controls the distribution
              of resources across  nodes.  The  optional  second  distribution
              method  (after  the  ":") controls the distribution of resources
              across sockets within a node.  Note that  with  select/cons_res,
              the number of cpus allocated on each socket and node may be dif-
              ferent. Refer  to  http://slurm.schedmd.com/mc_support.html  for
              more  information on resource allocation, assignment of tasks to
              nodes, and binding of tasks to CPUs.
              First distribution method:

              block  The block distribution method will distribute tasks to  a
                     node  such that consecutive tasks share a node. For exam-
                     ple, consider an allocation of three nodes each with  two
                     cpus.  A  four-task  block distribution request will dis-
                     tribute those tasks to the nodes with tasks one  and  two
                     on  the  first  node,  task three on the second node, and
                     task four on the third node.  Block distribution  is  the
                     default  behavior if the number of tasks exceeds the num-
                     ber of allocated nodes.

              cyclic The cyclic distribution method will distribute tasks to a
                     node  such  that  consecutive  tasks are distributed over
                     consecutive nodes (in a round-robin fashion).  For  exam-
                     ple,  consider an allocation of three nodes each with two
                     cpus. A four-task cyclic distribution request  will  dis-
                     tribute  those tasks to the nodes with tasks one and four
                     on the first node, task two on the second node, and  task
                     three  on  the  third node.  Note that when SelectType is
                     select/cons_res, the same number of CPUs may not be allo-
                     cated on each node. Task distribution will be round-robin
                     among all the nodes with  CPUs  yet  to  be  assigned  to
                     tasks.   Cyclic  distribution  is the default behavior if
                     the number of tasks is no larger than the number of allo-
                     cated nodes.

              plane  The  tasks are distributed in blocks of a specified size.
                     The options include a number representing the size of the
                     task  block.   This is followed by an optional specifica-
                     tion of the task distribution scheme within  a  block  of
                     tasks  and  between  the  blocks of tasks.  The number of
                     tasks distributed to each node is the same as for  cyclic
                     distribution,  but  the  taskids  assigned  to  each node
                     depend on the plane size.  For  more  details  (including
                     examples and diagrams), please see
                     http://slurm.schedmd.com/mc_support.html
                     and
                     http://slurm.schedmd.com/dist_plane.html

              arbitrary
                      The arbitrary method of distribution will allocate
                      processes in order as listed in the file designated
                      by the environment variable SLURM_HOSTFILE.  If this
                      variable is set, it will override any other method
                      specified.  If not set, the method will default to
                      block.  The hostfile must contain at minimum the
                      number of hosts requested, one per line or comma
                      separated.  If
                     specifying a task  count  (-n,  --ntasks=<number>),  your
                     tasks  will  be laid out on the nodes in the order of the
                     file.
                     NOTE: The arbitrary distribution option on a job  alloca-
                     tion  only  controls the nodes to be allocated to the job
                     and not the allocation  of  CPUs  on  those  nodes.  This
                     option  is  meant  primarily to control a job step’s task
                     layout in an existing job allocation for  the  srun  com-
                     mand.


              Second distribution method:

              block  The  block  distribution  method will distribute tasks to
                     sockets such that consecutive tasks share a socket.

              cyclic The cyclic distribution method will distribute  tasks  to
                     sockets  such that consecutive tasks are distributed over
                     consecutive sockets (in a round-robin fashion).
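              As a sketch of the first-method examples above (three nodes
              with two CPUs each, four tasks), the block and cyclic
              layouts can be reproduced with a little shell arithmetic;
              the 0-based task and node numbering here is purely
              illustrative, not SLURM output:

```shell
# Reproduce the 4-task / 3-node examples above (0-based ids).
ntasks=4; nnodes=3
base=$((ntasks / nnodes)); rem=$((ntasks % nnodes))

# block: consecutive tasks share a node; earlier nodes get the extras.
block=""; t=0
for node in 0 1 2; do
  count=$base
  [ "$node" -lt "$rem" ] && count=$((count + 1))
  while [ "$count" -gt 0 ]; do
    block="$block task$t:node$node"
    t=$((t + 1)); count=$((count - 1))
  done
done
block=${block# }

# cyclic: consecutive tasks go to consecutive nodes, round-robin.
cyclic=""
for t in 0 1 2 3; do
  cyclic="$cyclic task$t:node$((t % nnodes))"
done
cyclic=${cyclic# }

echo "block:  $block"    # tasks 0,1 on node0; task 2 on node1; task 3 on node2
echo "cyclic: $cyclic"   # tasks 0,3 on node0; task 1 on node1; task 2 on node2
```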


       --mail-type=<type>
              Notify user by email when certain event types occur.  Valid type
              values  are BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN,
              END, FAIL and REQUEUE), TIME_LIMIT,  TIME_LIMIT_90  (reached  90
              percent  of  time  limit),  TIME_LIMIT_80 (reached 80 percent of
              time limit), and  TIME_LIMIT_50  (reached  50  percent  of  time
              limit).   Multiple type values may be specified in a comma sepa-
              rated  list.   The  user  to  be  notified  is  indicated   with
              --mail-user.


       --mail-user=<user>
              User  to  receive email notification of state changes as defined
              by --mail-type.  The default value is the submitting user.


       --mem=<MB>
              Specify the real memory required per node in MegaBytes.  Default
              value  is  DefMemPerNode and the maximum value is MaxMemPerNode.
               If configured, both parameters can be seen using the
               scontrol show config command.  This parameter would
               generally be used if
              whole nodes are allocated  to  jobs  (SelectType=select/linear).
              Specifying  a  memory limit of zero for a job step will restrict
              the job step to the amount of memory allocated to the  job,  but
              not  remove any of the job’s memory allocation from being avail-
              able to other job steps.  Also  see  --mem-per-cpu.   --mem  and
              --mem-per-cpu are mutually exclusive.  NOTE: Enforcement of mem-
              ory limits currently  relies  upon  the  task/cgroup  plugin  or
              enabling  of  accounting, which samples memory use on a periodic
              basis (data need not be stored, just collected). In  both  cases
              memory  use  is  based upon the job’s Resident Set Size (RSS). A
              task may  exceed  the  memory  limit  until  the  next  periodic
              accounting sample.


       --mem-per-cpu=<MB>
              Minimum memory required per allocated CPU in MegaBytes.  Default
              value is DefMemPerCPU and the maximum value is MaxMemPerCPU (see
               exception below).  If configured, both parameters can be
               seen using the scontrol show config command.  Note that if
               the job’s
              --mem-per-cpu  value  exceeds  the configured MaxMemPerCPU, then
              the user’s limit will be treated as a  memory  limit  per  task;
               --mem-per-cpu will be reduced to a value no larger than
               MaxMemPerCPU; --cpus-per-task will be set, and the value of
               --cpus-per-task multiplied by the new --mem-per-cpu value
               will equal the original --mem-per-cpu value specified by
               the user.  This parameter
              would  generally  be used if individual processors are allocated
               to jobs (SelectType=select/cons_res).  If resources are
               allocated by the core, socket, or whole node, the number of
               CPUs allocated to a job may be higher than the task count
               and the
              value of --mem-per-cpu should be adjusted accordingly.  Specify-
              ing a memory limit of zero for a job step will restrict the  job
              step  to  the  amount  of  memory  allocated to the job, but not
              remove any of the job’s memory allocation from  being  available
              to  other  job  steps.  Also see --mem.  --mem and --mem-per-cpu
              are mutually exclusive.
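              The MaxMemPerCPU adjustment described above can be sketched
              numerically; the 4096 and 2048 MB values below are
              hypothetical, not SLURM defaults:

```shell
# Hypothetical values: user asks for 4096 MB per CPU, site caps at 2048 MB.
requested=4096
max_mem_per_cpu=2048

# SLURM caps --mem-per-cpu and raises --cpus-per-task so the product
# still covers the original per-task request (ceiling division).
cpus_per_task=$(( (requested + max_mem_per_cpu - 1) / max_mem_per_cpu ))
mem_per_cpu=$max_mem_per_cpu

echo "cpus-per-task=$cpus_per_task mem-per-cpu=$mem_per_cpu"
```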


       --mem_bind=[{quiet,verbose},]type
              Bind tasks to memory. Used only when the task/affinity plugin is
              enabled  and the NUMA memory functions are available.  Note that
              the resolution of CPU and memory  binding  may  differ  on  some
              architectures.  For example, CPU binding may be performed at the
              level of the cores within a processor while memory binding  will
              be  performed  at  the  level  of nodes, where the definition of
              "nodes" may differ from system to system. The use  of  any  type
              other  than  "none"  or "local" is not recommended.  If you want
              greater control, try running a simple test code with the options
              "--cpu_bind=verbose,none  --mem_bind=verbose,none"  to determine
              the specific configuration.

              NOTE: To have SLURM always report on the selected memory binding
              for  all  commands  executed  in a shell, you can enable verbose
              mode by setting the SLURM_MEM_BIND environment variable value to
              "verbose".

              The  following  informational environment variables are set when
              --mem_bind is in use:

                   SLURM_MEM_BIND_VERBOSE
                   SLURM_MEM_BIND_TYPE
                   SLURM_MEM_BIND_LIST

              See the  ENVIRONMENT  VARIABLES  section  for  a  more  detailed
              description of the individual SLURM_MEM_BIND* variables.

              Supported options include:

              q[uiet]
                     quietly bind before task runs (default)

              v[erbose]
                     verbosely report binding before task runs

              no[ne] don’t bind tasks to memory (default)

              rank   bind by task rank (not recommended)

              local  Use memory local to the processor in use

              map_mem:<list>
                     bind  by  mapping  a  node’s memory to tasks as specified
                     where <list> is <cpuid1>,<cpuid2>,...<cpuidN>.   CPU  IDs
                     are   interpreted  as  decimal  values  unless  they  are
                      preceded with ’0x’, in which case they are
                      interpreted as hexadecimal values (not recommended)

              mask_mem:<list>
                     bind  by setting memory masks on tasks as specified where
                      <list> is <mask1>,<mask2>,...<maskN>.  Memory masks
                      are always interpreted as hexadecimal values.  Note
                      that masks must be preceded with a ’0x’ if they
                      don’t begin with [0-9] so they are seen as numerical
                      values by srun.

              help   show this help message


       --mincpus=<n>
              Specify a minimum number of logical cpus/processors per node.


       --msg-timeout=<seconds>
              Modify the job launch message timeout.   The  default  value  is
              MessageTimeout  in  the  SLURM  configuration  file  slurm.conf.
              Changes to this are typically not recommended, but could be use-
              ful to diagnose problems.


       --mpi=<mpi_type>
              Identify the type of MPI to be used. May result in unique initi-
              ation procedures.

              list   Lists available mpi types to choose from.

              lam    Initiates one ’lamd’ process  per  node  and  establishes
                     necessary environment variables for LAM/MPI.

              mpich1_shmem
                     Initiates  one process per node and establishes necessary
                     environment variables for  mpich1  shared  memory  model.
                     This also works for mvapich built for shared memory.

              mpichgm
                     For use with Myrinet.

              mvapich
                     For use with Infiniband.

              openmpi
                     For use with OpenMPI.

              pmi2   To  enable  PMI2 support. The PMI2 support in Slurm works
                     only if the MPI  implementation  supports  it,  in  other
                     words  if the MPI has the PMI2 interface implemented. The
                     --mpi=pmi2 will load  the  library  lib/slurm/mpi_pmi2.so
                     which  provides  the  server  side  functionality but the
                     client side must  implement  PMI2_Init()  and  the  other
                     interface calls.

              none   No  special MPI processing. This is the default and works
                     with many other versions of MPI.


       --multi-prog
              Run a job with different programs and  different  arguments  for
              each  task.  In  this  case, the executable program specified is
              actually a configuration  file  specifying  the  executable  and
              arguments  for  each  task.  See  MULTIPLE PROGRAM CONFIGURATION
              below for details on the configuration file contents.
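              A minimal sketch of such a configuration file (the file and
              program names are hypothetical; see MULTIPLE PROGRAM
              CONFIGURATION for the full syntax): each line names a task
              range followed by the executable and its arguments, where
              "%t" expands to the task number.

```shell
# Write a hypothetical multi-prog config: task 0 runs hostname,
# tasks 1-3 run echo with their own task id.
cat > multi.conf <<'EOF'
0    hostname
1-3  echo task:%t
EOF
# Then, on a cluster: srun -n4 --multi-prog ./multi.conf
```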


       -N, --nodes=<minnodes[-maxnodes]>
              Request that a minimum of minnodes nodes be  allocated  to  this
              job.   A maximum node count may also be specified with maxnodes.
              If only one number is  specified,  this  is  used  as  both  the
              minimum  and  maximum  node  count.  The partition’s node limits
              supersede those of the job.  If a job’s node limits are  outside
              of  the  range  permitted  for its associated partition, the job
              will be left in a PENDING state.  This permits  possible  execu-
              tion at a later time, when the partition limit is changed.  If a
              job node limit exceeds the number of  nodes  configured  in  the
              partition,  the job will be rejected.  Note that the environment
              variable SLURM_JOB_NUM_NODES  (and  SLURM_NNODES  for  backwards
              compatibility)  will be set to the count of nodes actually allo-
              cated to the job. See the ENVIRONMENT VARIABLES section for more
              information.  If -N is not specified, the default behavior is to
              allocate enough nodes to satisfy the requirements of the -n  and
              -c options.  The job will be allocated as many nodes as possible
              within the range specified and without delaying  the  initiation
              of  the job.  The node count specification may include a numeric
              value followed by a suffix of "k" (multiplies numeric  value  by
              1,024) or "m" (multiplies numeric value by 1,048,576).
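              For illustration, the suffix arithmetic works out as
              follows ("2k" nodes means 2 x 1,024):

```shell
# Suffix multipliers used by the node count specification.
count=2
k_nodes=$((count * 1024))      # "-N 2k"
m_nodes=$((count * 1048576))   # "-N 2m"
echo "2k=$k_nodes 2m=$m_nodes"
# A range request such as "srun -N 2-4 hostname" asks for between two
# and four nodes (requires a cluster to actually run).
```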


       -n, --ntasks=<number>
              Specify  the  number of tasks to run. Request that srun allocate
              resources for ntasks tasks.  The default is one task  per  node,
              but  note  that  the  --cpus-per-task  option  will  change this
              default.


       --network=<type>
              Specify information pertaining to the switch  or  network.   The
               interpretation of type is system dependent.  This option is
               supported when running Slurm on a Cray natively.  It is
               used to request the use of Network Performance Counters.
               Only one value per request is valid.  All options are case
               insensitive.  In this configuration, supported values
               include:

              system
                    Use  the  system-wide  network  performance counters. Only
                    nodes requested will be marked in use for the job  alloca-
                     tion.  If the job does not fill up the entire system,
                     the rest of the nodes cannot be used by other jobs
                     using NPC; if idle, their state will appear as
                     PerfCnts.
                    These nodes are still available for other jobs  not  using
                    NPC.

              blade Use  the  blade  network  performance counters. Only nodes
                    requested will be marked in use for  the  job  allocation.
                     If the job does not fill up the entire blade(s)
                     allocated to the job, those blade(s) cannot be used
                     by other jobs using NPC; if idle, their state will
                     appear as PerfCnts.  These nodes are still available
                     for other jobs not using NPC.


               In all cases the job or step allocation request must
               specify the --exclusive option.  Otherwise the request will
               be denied.

               Also, with any of these options, steps are not allowed to
               share blades, so resources would remain idle inside an
               allocation if the step running on a blade does not take up
               all the nodes on the blade.

              The network option is also supported on systems with IBM’s  Par-
              allel  Environment (PE).  See IBM’s LoadLeveler job command key-
               word documentation about the keyword "network" for more
               information.  Multiple values may be specified in a comma
               separated list.  All options are case insensitive.
               Supported values include:

              BULK_XFER[=<resources>]
                          Enable  bulk  transfer  of data using Remote Direct-
                          Memory  Access  (RDMA).   The   optional   resources
                          specification  is  a  numeric value which can have a
                          suffix of "k", "K", "m", "M", "g" or "G"  for  kilo-
                          bytes,  megabytes or gigabytes.  NOTE: The resources
                          specification is not supported by the underlying IBM
                          infrastructure  as  of  Parallel Environment version
                          2.2 and no value should be specified at  this  time.
                           The devices allocated to a job must all be of
                           the same type.  The default value depends upon
                           what hardware is available and, in order of
                           preference, is IPONLY (which is not considered
                           in User Space mode), HFI, IB, HPCE, and KMUX.

               CAU=<count> Number of Collective Acceleration Units (CAU)
                          required.  Applies only to IBM Power7-IH processors.
                          Default  value  is  zero.   Independent  CAU will be
                          allocated for each programming interface (MPI, LAPI,
                          etc.)

              DEVNAME=<name>
                          Specify  the  device  name to use for communications
                          (e.g. "eth0" or "mlx4_0").

              DEVTYPE=<type>
                          Specify the device type to use  for  communications.
                          The supported values of type are: "IB" (InfiniBand),
                          "HFI" (P7 Host Fabric Interface), "IPONLY"  (IP-Only
                          interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker-
                          nel Emulation of HPCE).  The devices allocated to  a
                           job must all be of the same type.  The default
                           value depends upon what hardware is available
                           and, in order of preference, is IPONLY (which
                           is not considered in User Space mode), HFI, IB,
                           HPCE, and KMUX.

               IMMED=<count>
                          Number  of immediate send slots per window required.
                          Applies only to IBM Power7-IH  processors.   Default
                          value is zero.

               INSTANCES=<count>
                           Specify the number of network connections for
                           each task on each network.  The default
                           instance count is 1.

              IPV4        Use  Internet Protocol (IP) version 4 communications
                          (default).

              IPV6        Use Internet Protocol (IP) version 6 communications.

              LAPI        Use the LAPI programming interface.

              MPI         Use  the  MPI  programming  interface.   MPI  is the
                          default interface.

              PAMI        Use the PAMI programming interface.

              SHMEM       Use the OpenSHMEM programming interface.

              SN_ALL      Use all available switch networks (default).

              SN_SINGLE   Use one available switch network.

              UPC         Use the UPC programming interface.

              US          Use User Space communications.


              Some examples of network specifications:

              Instances=2,US,MPI,SN_ALL
                          Create two user space connections for MPI communica-
                          tions on every switch network for each task.

              US,MPI,Instances=3,Devtype=IB
                          Create three user space connections for MPI communi-
                          cations on every InfiniBand network for each task.

              IPV4,LAPI,SN_Single
                           Create an IP version 4 connection for LAPI
                           communications on one switch network for each
                           task.

              Instances=2,US,LAPI,MPI
                          Create  two user space connections each for LAPI and
                          MPI communications on every switch network for  each
                          task.  Note  that  SN_ALL  is  the default option so
                          every  switch  network  is  used.  Also  note   that
                          Instances=2   specifies  that  two  connections  are
                          established for each protocol  (LAPI  and  MPI)  and
                          each task.  If there are two networks and four tasks
                          on the node then  a  total  of  32  connections  are
                          established  (2 instances x 2 protocols x 2 networks
                          x 4 tasks).
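              The connection count in the last example is just the
              product of the four factors:

```shell
# 2 instances x 2 protocols (LAPI, MPI) x 2 networks x 4 tasks = 32.
instances=2; protocols=2; networks=2; tasks=4
connections=$((instances * protocols * networks * tasks))
echo "connections=$connections"
```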


       --nice[=adjustment]
              Run the job with an adjusted scheduling priority  within  SLURM.
              With no adjustment value the scheduling priority is decreased by
              100. The adjustment range is from -10000 (highest  priority)  to
              10000  (lowest  priority).  Only  privileged users can specify a
              negative adjustment. NOTE: This option is presently  ignored  if
              SchedulerType=sched/wiki or SchedulerType=sched/wiki2.


       --ntasks-per-core=<ntasks>
              Request the maximum ntasks be invoked on each core.  This option
              applies to the job allocation,  but  not  to  step  allocations.
              Meant   to  be  used  with  the  --ntasks  option.   Related  to
              --ntasks-per-node except at the core level instead of  the  node
              level.   Masks will automatically be generated to bind the tasks
               to specific cores unless --cpu_bind=none is specified.  NOTE:
              This option is not supported unless SelectTypeParameters=CR_Core
              or SelectTypeParameters=CR_Core_Memory is configured.


       --ntasks-per-node=<ntasks>
              Request that ntasks be invoked on each node.  If used  with  the
              --ntasks  option,  the  --ntasks option will take precedence and
              the --ntasks-per-node will be treated  as  a  maximum  count  of
              tasks per node.  Meant to be used with the --nodes option.  This
              is related to --cpus-per-task=ncpus, but does not require knowl-
              edge  of the actual number of cpus on each node.  In some cases,
              it is more convenient to be able to request that no more than  a
              specific  number  of tasks be invoked on each node.  Examples of
              this include submitting a hybrid MPI/OpenMP app where  only  one
              MPI  "task/rank"  should be assigned to each node while allowing
              the OpenMP portion to utilize all of the parallelism present  in
              the node, or submitting a single setup/cleanup/monitoring job to
              each node of a pre-existing allocation as one step in  a  larger
              job script.
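              The hybrid MPI/OpenMP case above might look like the
              following sketch (the node and CPU counts and the binary
              name are hypothetical, and the command requires a cluster
              to run):

```shell
# One MPI rank per node; OpenMP threads fill the remaining CPUs.
export OMP_NUM_THREADS=16
srun -N 4 --ntasks-per-node=1 --cpus-per-task=16 ./hybrid_app
```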


       --ntasks-per-socket=<ntasks>
              Request  the  maximum  ntasks  be  invoked on each socket.  This
              option applies to the job allocation, but not  to  step  alloca-
              tions.   Meant  to be used with the --ntasks option.  Related to
              --ntasks-per-node except at the socket level instead of the node
              level.   Masks will automatically be generated to bind the tasks
              to specific sockets unless --cpu_bind=none is specified.   NOTE:
              This   option   is   not   supported   unless  SelectTypeParame-
              ters=CR_Socket or SelectTypeParameters=CR_Socket_Memory is  con-
              figured.


       -O, --overcommit
              Overcommit  resources.  When applied to job allocation, only one
              CPU is allocated to the job per node and options used to specify
              the  number  of tasks per node, socket, core, etc.  are ignored.
              When applied to job step allocations (the srun command when exe-
              cuted  within  an  existing  job allocation), this option can be
              used to launch more than one task per CPU.  Normally, srun  will
              not  allocate  more  than  one  process  per CPU.  By specifying
              --overcommit you are explicitly allowing more than  one  process
              per  CPU. However no more than MAX_TASKS_PER_NODE tasks are per-
              mitted to execute per node.  NOTE: MAX_TASKS_PER_NODE is defined
              in  the  file  slurm.h and is not a variable, it is set at SLURM
              build time.


       -o, --output=<mode>
              Specify the mode for stdout redirection. By default in  interac-
              tive  mode,  srun  collects stdout from all tasks and sends this
              output via TCP/IP to the attached terminal. With --output stdout
              may  be  redirected  to  a  file,  to  one  file per task, or to
              /dev/null. See section IO  Redirection  below  for  the  various
              forms of mode.  If the specified file already exists, it will be
              overwritten.

               If --error is not also specified on the command line, both
               stdout and stderr will be directed to the file specified by
               --output.


       --open-mode=<append|truncate>
              Open the output and error files using append or truncate mode as
              specified.  The default value is specified by the system config-
              uration parameter JobFileAppend.


       -p, --partition=<partition_names>
              Request a specific partition for the  resource  allocation.   If
              not  specified,  the default behavior is to allow the slurm con-
              troller to select the default partition  as  designated  by  the
               system administrator.  If the job can use more than one
               partition, specify their names in a comma-separated list and
               the one offering the earliest initiation will be used with no
               regard given to the partition name ordering (although higher
               priority partitions will be considered first).  When the job
               is initiated, the name of the partition used will be placed
               first in the job record partition string.


       --priority=<value>
              Request  a  specific job priority.  May be subject to configura-
              tion specific constraints.


       --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
               Enables detailed data collection by the acct_gather_profile plu-
              gin.  Detailed data are typically time-series that are stored in
              an HDF5 file for the job.


              All       All data types are collected. (Cannot be combined with
                        other values.)


              None      No data types are collected. This is the default.
                        (Cannot be combined with other values.)


              Energy    Energy data is collected.


              Task      Task (I/O, Memory, ...) data is collected.


              Filesystem
                        Filesystem data is collected.


              Network   Network (InfiniBand) data is collected.


       --prolog=<executable>
              srun  will  run  executable  just before launching the job step.
              The command line arguments for executable will  be  the  command
              and arguments of the job step.  If executable is "none", then no
              srun prolog will be run. This parameter overrides the SrunProlog
              parameter  in  slurm.conf. This parameter is completely indepen-
              dent from the Prolog parameter in slurm.conf.


       --propagate[=rlimits]
              Allows users to specify which of the modifiable (soft)  resource
              limits  to  propagate  to  the  compute nodes and apply to their
              jobs.  If rlimits is not specified,  then  all  resource  limits
              will be propagated.  The following rlimit names are supported by
              Slurm (although some options may not be supported on  some  sys-
              tems):

              ALL       All limits listed below

              AS        The maximum address space for a process

              CORE      The maximum size of core file

              CPU       The maximum amount of CPU time

              DATA      The maximum size of a process’s data segment

              FSIZE     The  maximum  size  of files created. Note that if the
                        user sets FSIZE to less than the current size  of  the
                        slurmd.log,  job  launches will fail with a ’File size
                        limit exceeded’ error.

              MEMLOCK   The maximum size that may be locked into memory

              NOFILE    The maximum number of open files

              NPROC     The maximum number of processes available

              RSS       The maximum resident set size

              STACK     The maximum stack size
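
               For example, to propagate only the open-files and core-size
               soft limits (a sketch; ./my_app is a placeholder):

```shell
# Lower the soft limits locally, then propagate just these two to the
# compute nodes; all other limits keep the nodes' defaults.
ulimit -S -n 1024            # NOFILE
ulimit -S -c 0               # CORE (disable core dumps)
srun --propagate=NOFILE,CORE ./my_app
```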


       --pty  Execute task zero in  pseudo  terminal  mode.   Implicitly  sets
              --unbuffered.  Implicitly sets --error and --output to /dev/null
              for all tasks except task zero, which may cause those  tasks  to
              exit immediately (e.g. shells will typically exit immediately in
              that situation).  Not currently supported on AIX platforms.
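               A common use of --pty is to obtain an interactive shell on an
               allocated node (a sketch; the resource options are illustra-
               tive):

```shell
# Task zero runs bash on a pseudo terminal; requesting a single task
# avoids the other tasks exiting immediately as noted above.
srun --ntasks=1 --pty bash -i
```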


       -Q, --quiet
              Suppress informational messages from srun. Errors will still  be
              displayed.


       -q, --quit-on-interrupt
              Quit  immediately  on single SIGINT (Ctrl-C). Use of this option
              disables  the  status  feature  normally  available  when   srun
              receives  a single Ctrl-C and causes srun to instead immediately
              terminate the running job.


       --qos=<qos>
              Request a quality of service for the job.   QOS  values  can  be
              defined  for  each user/cluster/account association in the SLURM
               database.  Users will be limited to their association’s
               defined set of QOSs when the SLURM configuration parameter,
               AccountingStorageEnforce, includes "qos" in its definition.


       -r, --relative=<n>
              Run a job step relative to node n  of  the  current  allocation.
              This  option  may  be used to spread several job steps out among
              the nodes of the current job. If -r is  used,  the  current  job
              step  will  begin at node n of the allocated nodelist, where the
              first node is considered node 0.  The -r option is not permitted
              with  -w  or -x option and will result in a fatal error when not
              running within a prior allocation (i.e. when SLURM_JOB_ID is not
              set).  The  default  for n is 0. If the value of --nodes exceeds
              the number of nodes identified with  the  --relative  option,  a
              warning  message  will be printed and the --relative option will
              take precedence.


       --reboot
              Force the allocated nodes to reboot  before  starting  the  job.
              This  is only supported with some system configurations and will
              otherwise be silently ignored.


       --resv-ports
              Reserve communication ports for this job.  Used for OpenMPI.


       --reservation=<name>
              Allocate resources for the job from the named reservation.


       --restart-dir=<directory>
              Specifies the directory from which the job or job step’s  check-
               point should be read (used by the checkpoint/blcr and check-
               point/xlch plugins only).


       -s, --share
              The job allocation can share resources with other running  jobs.
              The  resources  to  be  shared  can be nodes, sockets, cores, or
              hyperthreads depending upon configuration.  The  default  shared
              behavior  depends  on  system  configuration and the partition’s
              Shared option takes precedence  over  the  job’s  option.   This
               option may result in the allocation being granted sooner than
               if the --share option were not set, and can allow higher sys-
               tem utilization, but application performance will likely
               suffer due to competition for resources.  Also see the
               --exclusive option.


       -S, --core-spec=<num>
              Count of specialized cores per node reserved by the job for sys-
              tem  operations and not used by the application. The application
              will not use these cores, but will be charged for their  alloca-
              tion.   Default  value  is  dependent upon the node’s configured
              CoreSpecCount value.  If a value of zero is designated  and  the
              Slurm  configuration  option AllowSpecResourcesUsage is enabled,
              the job will be allowed to override CoreSpecCount  and  use  the
              specialized resources on nodes it is allocated.


       --signal=<sig_num>[@<sig_time>]
              When  a  job is within sig_time seconds of its end time, send it
              the signal sig_num.  Due to the resolution of event handling  by
              SLURM,  the  signal  may  be  sent up to 60 seconds earlier than
              specified.  sig_num may either be a signal number or name  (e.g.
              "10"  or "USR1").  sig_time must have an integer value between 0
              and 65535.  By default, no signal is sent before the  job’s  end
              time.   If  a  sig_num  is  specified  without any sig_time, the
              default time will be 60 seconds.
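
               A job script can catch the signal to checkpoint before the
               time limit, e.g. when launched as "srun --signal=USR1@300
               ./job.sh" (a sketch; the checkpoint step is a placeholder):

```shell
#!/bin/sh
# SLURM delivers SIGUSR1 roughly 300 seconds before the end time
# (possibly up to 60 seconds earlier, per the note above).
trap 'echo "caught SIGUSR1, checkpointing"; exit 0' USR1
while :; do sleep 1; done    # stand-in for the real workload
```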


       --slurmd-debug=<level>
               Specify a debug level for slurmd(8). The level may be speci-
               fied either as an integer value between 0 [quiet, only errors
               are displayed] and 4 [verbose operation] or as one of the
               SlurmdDebug tags:

              quiet     Log nothing

              fatal     Log only fatal errors

              error     Log only errors

              info      Log errors and general informational messages

              verbose   Log errors and verbose informational messages


              The slurmd debug information is copied onto the stderr of
              the job. By default only errors are displayed.


       --sockets-per-node=<sockets>
              Restrict  node  selection  to  nodes with at least the specified
              number of sockets.  See additional information under  -B  option
              above when task/affinity plugin is enabled.


       --switches=<count>[@<max-time>]
              When  a tree topology is used, this defines the maximum count of
              switches desired for the job allocation and optionally the maxi-
              mum  time to wait for that number of switches. If SLURM finds an
               allocation containing more switches than the count specified,
               the job remains pending until it either finds an allocation
               with the desired switch count or the time limit expires.  If
               there is no switch count limit, there is no delay in starting
               the job.
              Acceptable time formats  include  "minutes",  "minutes:seconds",
              "hours:minutes:seconds",  "days-hours", "days-hours:minutes" and
              "days-hours:minutes:seconds".  The job’s maximum time delay  may
              be limited by the system administrator using the SchedulerParam-
              eters configuration parameter with the max_switch_wait parameter
               option.  The default max-time is the value of the
               max_switch_wait SchedulerParameters option.


       -T, --threads=<nthreads>
              Allows limiting the number of concurrent threads  used  to  send
              the job request from the srun process to the slurmd processes on
              the allocated nodes. Default is to use one thread per  allocated
              node  up  to a maximum of 60 concurrent threads. Specifying this
              option limits the number of concurrent threads to nthreads (less
              than  or  equal  to  60).  This should only be used to set a low
              thread count for testing on very small memory computers.


       -t, --time=<time>
              Set a limit on the total run time of the job or  job  step.   If
              the  requested time limit for a job exceeds the partition’s time
              limit, the job will be left in a PENDING state (possibly indefi-
              nitely).  If the requested time limit for a job step exceeds the
              partition’s time limit, the job step will not be initiated.  The
              default  time limit is the partition’s default time limit.  When
              the time limit is reached, each task in each job  step  is  sent
              SIGTERM  followed by SIGKILL.  The limit is for the job, all job
              steps are signaled. If the time limit is for a single  job  step
              within  an  existing  job allocation, only that job step will be
              affected. A job time limit supersedes all job step time  limits.
              The  interval  between  SIGTERM  and SIGKILL is specified by the
              SLURM configuration parameter KillWait.  A time  limit  of  zero
              requests that no time limit be imposed.  Acceptable time formats
              include "minutes",  "minutes:seconds",  "hours:minutes:seconds",
              "days-hours",  "days-hours:minutes" and "days-hours:minutes:sec-
              onds".


       --task-epilog=<executable>
              The slurmstepd daemon will run executable just after  each  task
              terminates.  This will be executed before any TaskEpilog parame-
              ter in slurm.conf is executed.  This  is  meant  to  be  a  very
              short-lived  program. If it fails to terminate within a few sec-
              onds, it will be killed along with any descendant processes.


       --task-prolog=<executable>
              The slurmstepd daemon will run executable just before  launching
              each  task. This will be executed after any TaskProlog parameter
              in slurm.conf is executed.  Besides the normal environment vari-
              ables, this has SLURM_TASK_PID available to identify the process
              ID of the task being started.  Standard output from this program
              of  the form "export NAME=value" will be used to set environment
              variables for the task being spawned.
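
               A minimal task prolog might look like the following sketch
               (the exported variable names are hypothetical):

```shell
#!/bin/sh
# srun parses "export NAME=value" lines on this script's stdout and
# sets them in the environment of the task about to be launched.
# SLURM_TASK_PID is supplied by slurmstepd for the task being started.
echo "export MY_TASK_SCRATCH=/tmp/task_${SLURM_TASK_PID}"
echo "export MY_APP_PHASE=production"
```

               Pass it to srun with --task-prolog=/path/to/the/script.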


       --test-only
              Returns an estimate of when a job  would  be  scheduled  to  run
              given  the  current  job  queue and all the other srun arguments
              specifying the job.  This limits srun’s behavior to just  return
               information; no job is actually submitted.  EXCEPTION: On
               BlueGene/Q systems, or when running within an existing job
               allocation, this disables the use of "runjob" to launch
               tasks. The program will be executed directly by the slurmd
               daemon.


       --threads-per-core=<threads>
              Restrict node selection to nodes with  at  least  the  specified
              number  of threads per core.  NOTE: "Threads" refers to the num-
              ber of processing units on each core rather than the  number  of
              application  tasks  to  be  launched  per  core.  See additional
              information under -B option above when task/affinity  plugin  is
              enabled.


       --time-min=<time>
              Set  a  minimum time limit on the job allocation.  If specified,
               the job may have its --time limit lowered to a value no lower
              than  --time-min  if doing so permits the job to begin execution
              earlier than otherwise possible.  The job’s time limit will  not
              be  changed  after the job is allocated resources.  This is per-
              formed by a backfill scheduling algorithm to allocate  resources
              otherwise  reserved  for  higher priority jobs.  Acceptable time
              formats  include   "minutes",   "minutes:seconds",   "hours:min-
              utes:seconds",     "days-hours",     "days-hours:minutes"    and
              "days-hours:minutes:seconds".


       --tmp=<MB>
              Specify a minimum amount of temporary disk space.


       -u, --unbuffered
               By default the task launched by srun uses the buffering of
               its runtime environment; e.g. with glibc, output to a pipe is
               fully buffered and stderr is unbuffered.  This option causes
               the stdout of the task to be line buffered, so each line
               appears as soon as it is written.

       --usage
              Display brief help message and exit.


       --uid=<user>
              Attempt to submit and/or run a job as user instead of the invok-
              ing user id. The invoking user’s credentials  will  be  used  to
              check access permissions for the target partition. User root may
              use this option to run jobs as a normal user in a RootOnly  par-
              tition  for  example. If run as root, srun will drop its permis-
              sions to the uid specified after node allocation is  successful.
              user may be the user name or numerical user ID.


       -V, --version
              Display version information and exit.


       -v, --verbose
              Increase the verbosity of srun’s informational messages.  Multi-
              ple -v’s will further increase  srun’s  verbosity.   By  default
              only errors will be displayed.


       -W, --wait=<seconds>
              Specify  how long to wait after the first task terminates before
              terminating all remaining tasks.  A  value  of  0  indicates  an
              unlimited  wait (a warning will be issued after 60 seconds). The
              default value is set by the WaitTime parameter in the slurm con-
              figuration  file  (see slurm.conf(5)). This option can be useful
               to ensure that a job is terminated in a timely fashion in the
              event  that  one or more tasks terminate prematurely.  Note: The
              -K, --kill-on-bad-exit option takes precedence over  -W,  --wait
              to terminate the job immediately if a task exits with a non-zero
              exit code.


       -w, --nodelist=<host1,host2,... or filename>
              Request a specific list of hosts.   Unless  constrained  by  the
              maximum  node  count,  the  job will contain all of these hosts.
              The list may be specified as a comma-separated list of hosts,  a
              range  of  hosts  (host[1-5,7,...]  for example), or a filename.
              The host list will be assumed to be a filename if it contains  a
              "/" character.  If you specify a maximum node count and the host
              list contains more nodes, the extra node names will be  silently
              ignored.   If  you  specify  a  minimum  node or processor count
              larger than can be satisfied by the supplied  host  list,  addi-
              tional  resources  will  be  allocated on other nodes as needed.
              Rather than repeating a host name multiple  times,  an  asterisk
               and a repetition count may be appended to a host name. For
               example "host1,host1" and "host1*2" are equivalent.


       --wckey=<wckey>
              Specify wckey to be used with job.  If  TrackWCKey=no  (default)
              in the slurm.conf this value is ignored.


       -X, --disable-status
              Disable  the  display of task status when srun receives a single
              SIGINT (Ctrl-C). Instead immediately forward the SIGINT  to  the
              running  job.  Without this option a second Ctrl-C in one second
              is required to forcibly terminate the job and srun will  immedi-
              ately  exit.  May  also  be  set  via  the  environment variable
              SLURM_DISABLE_STATUS.


       -x, --exclude=<host1,host2,... or filename>
              Request that a specific list of hosts not  be  included  in  the
              resources  allocated  to this job. The host list will be assumed
               to be a filename if it contains a "/" character.


       -Z, --no-allocate
              Run the specified tasks on a set of  nodes  without  creating  a
              SLURM  "job"  in the SLURM queue structure, bypassing the normal
              resource allocation step.  The list of nodes must  be  specified
              with  the  -w,  --nodelist  option.  This is a privileged option
              only available for the users "SlurmUser" and "root".


       The following options support Blue Gene systems, but may be  applicable
       to other systems as well.


       --blrts-image=<path>
               Path to blrts image for bluegene block.  BGL only.  Default
               from bluegene.conf if not set.


       --cnload-image=<path>
               Path to compute node image for bluegene block.  BGP only.
               Default from bluegene.conf if not set.


       --conn-type=<type>
              Require the block connection type to be of a certain type.  On
              Blue Gene the acceptable values of type are MESH, TORUS and
              NAV.  If NAV, or if not set, SLURM will try to fit what the
              DefaultConnType is set to in the bluegene.conf; if that is not
              set, the default is TORUS.  You should not normally set this
              option.  If running on a BGP system and wanting to run in HTC
              mode (only for 1 midplane and below), you can use HTC_S for
              SMP, HTC_D for Dual, HTC_V for virtual node mode, and HTC_L
              for Linux mode.  For systems that allow a different connection
              type per dimension, a comma-separated list of connection types
              may be specified, one for each dimension (e.g. M,T,T,T will
              give you a torus connection in all dimensions except the
              first).


       -g, --geometry=<XxYxZ> | <AxXxYxZ>
              Specify the geometry requirements for the job. On BlueGene/L and
              BlueGene/P  systems there are three numbers giving dimensions in
              the X, Y and Z directions, while on BlueGene/Q systems there are
              four  numbers  giving dimensions in the A, X, Y and Z directions
               and cannot be used to allocate sub-blocks.  For example
              "--geometry=1x2x3x4",  specifies a block of nodes having 1 x 2 x
              3 x 4 = 24 nodes (actually midplanes on BlueGene).


       --ioload-image=<path>
               Path to io image for bluegene block.  BGP only.  Default from
               bluegene.conf if not set.


       --linux-image=<path>
               Path to linux image for bluegene block.  BGL only.  Default
               from bluegene.conf if not set.


       --mloader-image=<path>
               Path to mloader image for bluegene block.  Default from blue-
               gene.conf if not set.


       -R, --no-rotate
              Disables  rotation  of  the job’s requested geometry in order to
              fit an appropriate block.  By default the specified geometry can
              rotate in three dimensions.


       --ramdisk-image=<path>
               Path to ramdisk image for bluegene block.  BGL only.  Default
               from bluegene.conf if not set.


       srun will submit the job request to the slurm job controller, then ini-
       tiate  all  processes on the remote nodes. If the request cannot be met
       immediately, srun will block until the resources are free  to  run  the
       job. If the -I (--immediate) option is specified srun will terminate if
       resources are not immediately available.

       When initiating remote processes srun will propagate the current  work-
       ing  directory,  unless --chdir=<path> is specified, in which case path
       will become the working directory for the remote processes.

       The -n, -c, and -N options control how CPUs  and nodes  will  be  allo-
       cated  to  the job. When specifying only the number of processes to run
       with -n, a default of one CPU per process is allocated.  By  specifying
       the  number  of  CPUs  required per task (-c), more than one CPU may be
       allocated per process. If the number of nodes  is  specified  with  -N,
       srun will attempt to allocate at least the number of nodes specified.

       Combinations  of the above three options may be used to change how pro-
       cesses are distributed across nodes and cpus. For instance, by specify-
       ing  both  the number of processes and number of nodes on which to run,
       the number of processes per node is implied. However, if the number
       of CPUs per process is more important, then the number of processes
       (-n) and the number of CPUs per process (-c) should be specified.

       srun will refuse to  allocate more than  one  process  per  CPU  unless
       --overcommit (-O) is also specified.

       srun will attempt to meet the above specifications "at a minimum." That
       is, if 16 nodes are requested for 32 processes, and some nodes  do  not
       have 2 CPUs, the allocation of nodes will be increased in order to meet
       the demand for CPUs. In other words, a minimum of 16  nodes  are  being
       requested.  However,  if  16 nodes are requested for 15 processes, srun
       will consider this an error, as  15  processes  cannot  run  across  16
       nodes.
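
       The combinations described above can be sketched as follows
       (application names are placeholders):

```shell
srun -n 32 ./a.out           # 32 tasks, 1 CPU each; SLURM picks nodes
srun -n 32 -N 16 ./a.out     # 32 tasks on at least 16 nodes
srun -n 8 -c 4 ./a.out       # 8 tasks, 4 CPUs per task
srun -n 8 -O ./a.out         # allow more than one task per CPU
```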


       IO Redirection

       By  default, stdout and stderr will be redirected from all tasks to the
       stdout and stderr of srun, and stdin will be redirected from the  stan-
       dard input of srun to all remote tasks.  If stdin is only to be read by
       a subset of the spawned tasks, specifying a file to  read  from  rather
       than  forwarding  stdin  from  the srun command may be preferable as it
       avoids moving and storing data that will never be read.

       For OS X, the poll() function does not support stdin, so input  from  a
       terminal is not possible.

       For BGQ, srun only supports stdin to one task running on the system.
       By default it is task id 0, but this can be changed with the
       -i <taskid> option as described below, or with
       --launcher-opts="--stdinrank=<taskid>".

       This  behavior  may  be changed with the --output, --error, and --input
       (-o, -e, -i) options. Valid format specifications for these options are

       all       stdout and stderr are redirected from all tasks to srun.
                 stdin is broadcast to all remote tasks.  (This is the
                 default behavior.)

       none      stdout and stderr are not received from any task.  stdin
                 is not sent to any task (stdin is closed).

       taskid    stdout and/or stderr are redirected from only the  task  with
                  relative id equal to taskid, where 0 <= taskid < ntasks,
                 where ntasks is the total number of tasks in the current  job
                 step.   stdin  is  redirected  from the stdin of srun to this
                 same task.  This file will be written on the  node  executing
                 the task.

       filename  srun  will  redirect  stdout  and/or stderr to the named file
                 from all tasks.  stdin will be redirected from the named file
                 and  broadcast to all tasks in the job.  filename refers to a
                 path on the host that runs srun.  Depending on the  cluster’s
                 file  system  layout, this may result in the output appearing
                 in different places depending on whether the job  is  run  in
                 batch mode.

       format string
                 srun  allows  for  a format string to be used to generate the
                 named IO file described above. The following list  of  format
                 specifiers  may  be  used  in the format string to generate a
                 filename that will be unique to a given jobid, stepid,  node,
                 or  task.  In  each case, the appropriate number of files are
                  opened and associated with the corresponding tasks. Note
                  that any format string containing %t, %n, and/or %N will be
                  written on the node executing the task rather than the node
                  where srun executes. These format specifiers are not sup-
                  ported on a BGQ system.

                 %A     Job array’s master job allocation number.

                 %a     Job array ID (index) number.

                 %J     jobid.stepid of the running job. (e.g. "128.0")

                 %j     jobid of the running job.

                 %s     stepid of the running job.

                 %N     short hostname. This will create a  separate  IO  file
                        per node.

                 %n     Node  identifier  relative to current job (e.g. "0" is
                        the first node of the running job) This will create  a
                        separate IO file per node.

                 %t     task  identifier  (rank) relative to current job. This
                        will create a separate IO file per task.

                 %u     User name.

                 A number placed between  the  percent  character  and  format
                 specifier  may be used to zero-pad the result in the IO file-
                 name. This number is ignored if the format  specifier  corre-
                 sponds to  non-numeric data (%N for example).

                 Some  examples  of  how the format string may be used for a 4
                 task job step with a Job ID of 128  and  step  id  of  0  are
                 included below:

                 job%J.out      job128.0.out

                 job%4j.out     job0128.out

                 job%j-%2t.out  job128-00.out, job128-01.out, ...
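
                  The zero padding in the examples above can be reproduced
                  with printf, shown here outside of SLURM only to illus-
                  trate the expansion (jobid 128, stepid 0, and task 1 are
                  the example values):

```shell
jobid=128 stepid=0 task=1
printf 'job%s.%s.out\n'   "$jobid" "$stepid"  # job%J.out     -> job128.0.out
printf 'job%04d.out\n'    "$jobid"            # job%4j.out    -> job0128.out
printf 'job%d-%02d.out\n' "$jobid" "$task"    # job%j-%2t.out -> job128-01.out
```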




INPUT ENVIRONMENT VARIABLES
       Some srun options may be set via environment variables.  These environ-
       ment variables, along with  their  corresponding  options,  are  listed
       below.  Note: Command line options will always override these settings.

       PMI_FANOUT            This is used exclusively  with  PMI  (MPICH2  and
                             MVAPICH2)  and controls the fanout of data commu-
                             nications. The srun  command  sends  messages  to
                             application  programs  (via  the PMI library) and
                             those applications may be called upon to  forward
                             that  data  to  up  to  this number of additional
                             tasks. Higher values offload work from  the  srun
                             command  to  the applications and likely increase
                             the vulnerability to failures.  The default value
                             is 32.

       PMI_FANOUT_OFF_HOST   This  is  used  exclusively  with PMI (MPICH2 and
                             MVAPICH2) and controls the fanout of data  commu-
                             nications.   The  srun  command sends messages to
                             application programs (via the  PMI  library)  and
                             those  applications may be called upon to forward
                             that data to additional tasks. By  default,  srun
                             sends  one  message per host and one task on that
                             host forwards the data to  other  tasks  on  that
                             host up to PMI_FANOUT.  If PMI_FANOUT_OFF_HOST is
                             defined, the user task may be required to forward
                             the  data  to  tasks  on  other  hosts.   Setting
                             PMI_FANOUT_OFF_HOST  may  increase   performance.
                             Since  more  work is performed by the PMI library
                             loaded by the user application, failures also can
                             be more common and more difficult to diagnose.

       PMI_TIME              This  is  used  exclusively  with PMI (MPICH2 and
                             MVAPICH2) and controls how  much  the  communica-
                             tions  from  the tasks to the srun are spread out
                             in time in order to avoid overwhelming  the  srun
                             command  with  work.  The  default  value  is 500
                             (microseconds) per task. On relatively slow  pro-
                             cessors  or  systems  with  very  large processor
                             counts (and large PMI data sets),  higher  values
                             may be required.

       SLURM_CONF            The location of the SLURM configuration file.

       SLURM_ACCOUNT         Same as -A, --account

       SLURM_ACCTG_FREQ      Same as --acctg-freq

       SLURM_BLRTS_IMAGE     Same as --blrts-image

       SLURM_CHECKPOINT      Same as --checkpoint

       SLURM_CHECKPOINT_DIR  Same as --checkpoint-dir

       SLURM_CNLOAD_IMAGE    Same as --cnload-image

       SLURM_CONN_TYPE       Same as --conn-type

       SLURM_CORE_SPEC       Same as --core-spec

       SLURM_CPU_BIND        Same as --cpu_bind

       SLURM_CPU_FREQ_REQ    Same as --cpu-freq.

       SLURM_CPUS_PER_TASK   Same as -c, --cpus-per-task

       SLURM_DEBUG           Same as -v, --verbose

       SLURMD_DEBUG          Same as -d, --slurmd-debug

       SLURM_DEPENDENCY      Same as -P, --dependency=<jobid>

       SLURM_DISABLE_STATUS  Same as -X, --disable-status

       SLURM_DIST_PLANESIZE  Same as -m plane

       SLURM_DISTRIBUTION    Same as -m, --distribution

       SLURM_EPILOG          Same as --epilog

       SLURM_EXCLUSIVE       Same as --exclusive

       SLURM_EXIT_ERROR      Specifies  the  exit  code generated when a SLURM
                             error occurs (e.g. invalid options).  This can be
                             used  by a script to distinguish application exit
                             codes from various SLURM error conditions.   Also
                             see SLURM_EXIT_IMMEDIATE.

       SLURM_EXIT_IMMEDIATE  Specifies   the  exit  code  generated  when  the
                             --immediate option is used and resources are  not
                             currently  available.   This  can  be  used  by a
                             script to distinguish application exit codes from
                             various   SLURM   error   conditions.   Also  see
                             SLURM_EXIT_ERROR.
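       As a sketch (the code values 213 and 214 and the program name my_app
       are hypothetical placeholders, not SLURM defaults), a job script
       might classify exit codes like this:

```shell
#!/bin/sh
# Hypothetical sketch: tell SLURM error codes apart from application
# exit codes.  213 and 214 are placeholder values chosen because the
# application never returns them; they correspond to the
# SLURM_EXIT_ERROR and SLURM_EXIT_IMMEDIATE input variables.
SLURM_EXIT_ERROR=213
SLURM_EXIT_IMMEDIATE=214
export SLURM_EXIT_ERROR SLURM_EXIT_IMMEDIATE

classify_rc() {
    case "$1" in
        0)                       echo "application succeeded" ;;
        "$SLURM_EXIT_ERROR")     echo "SLURM error (e.g. invalid options)" ;;
        "$SLURM_EXIT_IMMEDIATE") echo "resources not immediately available" ;;
        *)                       echo "application failed with code $1" ;;
    esac
}

# In a real job script this would follow an srun invocation, e.g.:
#   srun --immediate -n4 ./my_app
#   classify_rc $?
classify_rc 214
```

       The last line prints "resources not immediately available",
       demonstrating the --immediate case.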

       SLURM_GEOMETRY        Same as -g, --geometry

       SLURM_HINT            Same as --hint

       SLURM_GRES            Same as --gres. Also see SLURM_STEP_GRES

       SLURM_IMMEDIATE       Same as -I, --immediate

       SLURM_IOLOAD_IMAGE    Same as --ioload-image

       SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
                             Same as --jobid

       SLURM_JOB_NAME        Same as -J, --job-name except within an  existing
                             allocation,  in which case it is ignored to avoid
                             using the batch job’s name as the  name  of  each
                             job step.

       SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
                             Total  number  of  nodes  in  the job’s resource
                             allocation.

       SLURM_KILL_BAD_EXIT   Same as -K, --kill-on-bad-exit

       SLURM_LABELIO         Same as -l, --label

       SLURM_LINUX_IMAGE     Same as --linux-image

       SLURM_MEM_BIND        Same as --mem_bind

       SLURM_MEM_PER_CPU     Same as --mem-per-cpu

       SLURM_MEM_PER_NODE    Same as --mem

       SLURM_MLOADER_IMAGE   Same as --mloader-image

       SLURM_MPI_TYPE        Same as --mpi

       SLURM_NETWORK         Same as --network

       SLURM_NNODES          Same as -N, --nodes

       SLURM_NO_ROTATE       Same as -R, --no-rotate

       SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
                             Same as -n, --ntasks

       SLURM_NTASKS_PER_CORE Same as --ntasks-per-core

       SLURM_NTASKS_PER_NODE Same as --ntasks-per-node

       SLURM_NTASKS_PER_SOCKET
                             Same as --ntasks-per-socket

       SLURM_OPEN_MODE       Same as --open-mode

       SLURM_OVERCOMMIT      Same as -O, --overcommit

       SLURM_PARTITION       Same as -p, --partition

       SLURM_PMI_KVS_NO_DUP_KEYS
                             If set, then PMI key-pairs will contain no dupli-
                             cate  keys.  MPI  can use this variable to inform
                             the PMI library that it will  not  use  duplicate
                             keys  so  PMI  can  skip  the check for duplicate
                              keys.  This is the case for MPICH2;  skipping
                              the check reduces the overhead of testing for
                              duplicates and improves performance.

       SLURM_PROFILE         Same as --profile

       SLURM_PROLOG          Same as --prolog

       SLURM_QOS             Same as --qos

       SLURM_RAMDISK_IMAGE   Same as --ramdisk-image

       SLURM_REMOTE_CWD      Same as -D, --chdir=

       SLURM_REQ_SWITCH      When a tree topology is used,  this  defines  the
                             maximum  count  of  switches  desired for the job
                             allocation and optionally  the  maximum  time  to
                             wait for that number of switches. See --switches

       SLURM_RESERVATION     Same as --reservation

       SLURM_RESTART_DIR     Same as --restart-dir

       SLURM_RESV_PORTS      Same as --resv-ports

       SLURM_SIGNAL          Same as --signal

       SLURM_STDERRMODE      Same as -e, --error

       SLURM_STDINMODE       Same as -i, --input

       SLURM_SRUN_REDUCE_TASK_EXIT_MSG
                             if  set  and  non-zero, successive task exit mes-
                             sages with the same exit  code  will  be  printed
                             only once.

       SLURM_STEP_GRES       Same as --gres (only applies to job steps, not to
                             job allocations).  Also see SLURM_GRES

       SLURM_STEP_KILLED_MSG_NODE_ID=ID
                             If set, only the specified node will log when the
                             job or step are killed by a signal.

       SLURM_STDOUTMODE      Same as -o, --output

       SLURM_TASK_EPILOG     Same as --task-epilog

       SLURM_TASK_PROLOG     Same as --task-prolog

       SLURM_THREADS         Same as -T, --threads

       SLURM_TIMELIMIT       Same as -t, --time

       SLURM_UNBUFFEREDIO    Same as -u, --unbuffered

       SLURM_WAIT            Same as -W, --wait

       SLURM_WAIT4SWITCH     Max  time  waiting  for  requested  switches. See
                             --switches

       SLURM_WCKEY           Same as -W, --wckey

       SLURM_WORKING_DIR     Same as -D, --chdir



OUTPUT ENVIRONMENT VARIABLES
       srun will set some environment variables in the environment of the exe-
       cuting  tasks on the remote compute nodes.  These environment variables
       are:


       SLURM_CHECKPOINT_IMAGE_DIR
                             Directory into which checkpoint images should  be
                             written if specified on the execute line.

       SLURM_CLUSTER_NAME    Name  of  the cluster on which the job is execut-
                             ing.

       SLURM_CPU_BIND_VERBOSE
                             --cpu_bind verbosity (quiet,verbose).

       SLURM_CPU_BIND_TYPE   --cpu_bind type (none,rank,map_cpu:,mask_cpu:).

       SLURM_CPU_BIND_LIST   --cpu_bind map or mask list (list  of  SLURM  CPU
                             IDs  or  masks for this node, CPU_ID = Board_ID x
                             threads_per_board       +       Socket_ID       x
                             threads_per_socket + Core_ID x threads_per_core +
                             Thread_ID).
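       As a worked instance of that formula, consider a hypothetical node
       with one board of 16 threads, 8 threads per socket and 2 threads per
       core; the CPU ID for Board 0, Socket 1, Core 2, Thread 1 is then:

```shell
# CPU_ID = Board_ID x threads_per_board + Socket_ID x threads_per_socket
#        + Core_ID x threads_per_core + Thread_ID
echo $(( 0 * 16 + 1 * 8 + 2 * 2 + 1 ))   # prints 13
```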


       SLURM_CPU_FREQ_REQ    Contains the value requested for CPU  frequency
                              on the srun command, either as a numerical fre-
                              quency in kilohertz or as a coded value for  a
                              request of low, medium, highm1  or  high.  See
                              the description of the --cpu-freq option or the
                              SLURM_CPU_FREQ_REQ input environment variable.

       SLURM_CPUS_ON_NODE    Count  of processors available to the job on this
                             node.  Note the  select/linear  plugin  allocates
                             entire  nodes to jobs, so the value indicates the
                             total  count  of  CPUs  on  the  node.   For  the
                             select/cons_res plugin, this number indicates the
                             number of cores on this  node  allocated  to  the
                             job.

       SLURM_CPUS_PER_TASK   Number  of  cpus requested per task.  Only set if
                             the --cpus-per-task option is specified.

       SLURM_DISTRIBUTION    Distribution type for the allocated jobs. Set the
                             distribution with -m, --distribution.

       SLURM_GTIDS           Global  task IDs running on this node.  Zero ori-
                             gin and comma separated.
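       For example, a task prolog or wrapper script might iterate over the
       list (the value "0,2,4" below is hypothetical):

```shell
# Each node sees only the global task IDs scheduled on it.
SLURM_GTIDS="0,2,4"
for gtid in $(echo "$SLURM_GTIDS" | tr ',' ' '); do
    echo "global task $gtid runs on this node"
done
```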

       SLURM_JOB_CPUS_PER_NODE
                             Number of CPUS per node.

       SLURM_JOB_DEPENDENCY  Set to value of the --dependency option.

       SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
                             Job id of the executing job.


       SLURM_JOB_NAME        Set to the value of the --job-name option or  the
                             command  name  when  srun is used to create a new
                             job allocation. Not set when srun is used only to
                             create  a  job  step (i.e. within an existing job
                             allocation).


       SLURM_JOB_PARTITION   Name of the partition in which the  job  is  run-
                             ning.


       SLURM_LAUNCH_NODE_IPADDR
                             IP address of the node from which the task launch
                             was initiated (where the srun command ran  from).

       SLURM_LOCALID         Node  local task ID for the process within a job.


       SLURM_MEM_BIND_VERBOSE
                             --mem_bind verbosity (quiet,verbose).

       SLURM_MEM_BIND_TYPE   --mem_bind type (none,rank,map_mem:,mask_mem:).

       SLURM_MEM_BIND_LIST   --mem_bind map or mask  list  (<list  of  IDs  or
                             masks for this node>).


       SLURM_NNODES          Total number of nodes in the job’s resource allo-
                             cation.

       SLURM_NODE_ALIASES    Sets of node  name,  communication  address  and
                              hostname for nodes allocated to  the  job  from
                              the cloud. Each element in a set is colon sepa-
                              rated  and  each  set is comma separated.  For
                              example:
                             SLURM_NODE_ALIASES=
                             ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
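       A script can split such a value into its fields with standard tools;
       a sketch using the example value above:

```shell
# Split the comma-separated sets, then the colon-separated fields.
aliases="ec0:1.2.3.4:foo,ec1:1.2.3.5:bar"   # e.g. $SLURM_NODE_ALIASES
echo "$aliases" | tr ',' '\n' | while IFS=: read -r name addr host; do
    echo "node=$name addr=$addr host=$host"
done
```

       which prints one node=/addr=/host= line per set.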

       SLURM_NODEID          The relative node ID of the current node.

       SLURM_NODELIST        List of nodes allocated to the job.

       SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
                             Total number of processes in the current job.

       SLURM_PRIO_PROCESS    The  scheduling priority (nice value) at the time
                             of job submission.  This value is  propagated  to
                             the spawned processes.

       SLURM_PROCID          The MPI rank (or relative process ID) of the cur-
                             rent process.

       SLURM_SRUN_COMM_HOST  IP address of srun communication host.

       SLURM_SRUN_COMM_PORT  srun communication port.

       SLURM_STEP_LAUNCHER_PORT
                             Step launcher port.

       SLURM_STEP_NODELIST   List of nodes allocated to the step.

       SLURM_STEP_NUM_NODES  Number of nodes allocated to the step.

       SLURM_STEP_NUM_TASKS  Number of processes in the step.

       SLURM_STEP_TASKS_PER_NODE
                             Number of processes per node within the step.

       SLURM_STEP_ID (and SLURM_STEPID for backwards compatibility)
                             The step ID of the current job.

       SLURM_SUBMIT_DIR      The directory from which srun was invoked.

       SLURM_SUBMIT_HOST     The hostname of the computer  from  which  srun
                              was invoked.

       SLURM_TASK_PID        The process ID of the task being started.

       SLURM_TASKS_PER_NODE  Number  of  tasks  to  be initiated on each node.
                             Values are comma separated and in the same  order
                             as  SLURM_NODELIST.   If  two or more consecutive
                             nodes are to have the same task count, that count
                             is followed by "(x#)" where "#" is the repetition
                              count.  For example,
                              "SLURM_TASKS_PER_NODE=2(x3),1" indicates that
                              the first three nodes will each  execute  two
                              tasks and the fourth node  will  execute  one
                              task.
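       The compressed "(x#)" notation can be expanded back into one count
       per node with a small helper; a hypothetical sketch:

```shell
# Expand SLURM_TASKS_PER_NODE syntax such as "2(x3),1" into one
# task count per node, printed one per line (here: 2, 2, 2, 1).
expand_tasks_per_node() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r spec; do
        case "$spec" in
            *"(x"*)
                count=${spec%%\(*}                 # count before "(x"
                reps=${spec#*x}; reps=${reps%\)*}  # repetitions after "x"
                i=0
                while [ "$i" -lt "$reps" ]; do
                    echo "$count"; i=$((i + 1))
                done ;;
            *)  echo "$spec" ;;
        esac
    done
}

expand_tasks_per_node "2(x3),1"
```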


       SLURM_TOPOLOGY_ADDR   This is set only if the system has  the  topol-
                              ogy/tree plugin configured.  The value will be
                              set to the names of the network switches which
                              may be involved in  the  job’s  communications,
                              from the system’s top level switch down to the
                              leaf switch,  ending  with  the  node name.  A
                              period is used to separate each hardware  com-
                              ponent name.

       SLURM_TOPOLOGY_ADDR_PATTERN
                              This is set only if the system has  the  topol-
                              ogy/tree plugin configured.  The value will be
                              set   to   the   component  types  listed  in
                              SLURM_TOPOLOGY_ADDR.  Each component  will  be
                              identified as either "switch" or  "node".   A
                              period is used to separate each hardware  com-
                              ponent type.
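       The two variables line up component for component, so pairing them is
       straightforward (the values below are hypothetical):

```shell
# Hypothetical values, as they might appear with topology/tree configured:
SLURM_TOPOLOGY_ADDR="s0.s4.tux3"
SLURM_TOPOLOGY_ADDR_PATTERN="switch.switch.node"

# Print each hardware component with its type:
awk -v a="$SLURM_TOPOLOGY_ADDR" -v p="$SLURM_TOPOLOGY_ADDR_PATTERN" 'BEGIN {
    n = split(a, comp, "[.]"); split(p, type, "[.]")
    for (i = 1; i <= n; i++) print type[i] ": " comp[i]
}'
```

       printing "switch: s0", "switch: s4" and "node: tux3".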


       SRUN_DEBUG            Set to the logging level  of  the  srun  command.
                             Default  value  is  3 (info level).  The value is
                             incremented or decremented based upon the  --ver-
                             bose and --quiet options.


       MPIRUN_NOALLOCATE     On Blue Gene systems only: do not  allocate  a
                              block.

       MPIRUN_NOFREE         On Blue Gene systems only: do not free a block.

       MPIRUN_PARTITION      On Blue Gene systems only: the block name.



SIGNALS AND ESCAPE SEQUENCES
       Signals sent to the srun command are  automatically  forwarded  to  the
       tasks  it  is  controlling  with  a few exceptions. The escape sequence
       <control-c> will report the state of all tasks associated with the srun
       command.  If  <control-c>  is entered twice within one second, then the
       associated SIGINT signal will be sent to all tasks  and  a  termination
       sequence  will  be entered sending SIGCONT, SIGTERM, and SIGKILL to all
       spawned tasks.  If a third <control-c> is received,  the  srun  program
       will  be  terminated  without waiting for remote tasks to exit or their
       I/O to complete.

       The escape sequence <control-z> is presently ignored. Our intent is for
       this to put the srun command into a mode in which  various  special
       actions may be invoked.


MPI SUPPORT
       MPI use depends upon the type of MPI being used.  There are three  fun-
       damentally different modes  of  operation  used  by  these  various MPI
       implementations.

       1. SLURM directly launches the tasks  and  performs  initialization  of
       communications  (Quadrics  MPI, MPICH2, MPICH-GM, MVAPICH, MVAPICH2 and
       some MPICH1 modes). For example: "srun -n16 a.out".

       2. SLURM creates a resource allocation for  the  job  and  then  mpirun
       launches  tasks  using SLURM’s infrastructure (OpenMPI, LAM/MPI, HP-MPI
       and some MPICH1 modes).

       3. SLURM creates a resource allocation for  the  job  and  then  mpirun
       launches  tasks  using  some mechanism other than SLURM, such as SSH or
       RSH (BlueGene MPI and some MPICH1 modes).  These tasks are  initiated
       outside of SLURM’s monitoring or control. SLURM’s epilog should be con-
       figured to purge these tasks when the job’s allocation is relinquished.

       See http://slurm.schedmd.com/mpi_guide.html for more information on the
       use of these various MPI implementations with SLURM.


MULTIPLE PROGRAM CONFIGURATION
       Comments  in the configuration file must have a "#" in column one.  The
       configuration file contains the following  fields  separated  by  white
       space:

       Task rank
              One or more task ranks to use this configuration.  Multiple val-
               ues may be comma separated.  Ranges may be  indicated  with  two
               numbers separated with a '-' with the smaller number first (e.g.
               "0-4" and not "4-0").  To indicate all tasks not otherwise spec-
               ified,  specify  a rank of '*' as the last line of the file.  If
               an attempt is made to initiate a task for  which  no  executable
               program is defined, the following error message will be produced:
               "No executable program specified for this task".

       Executable
              The name of the program to  execute.   May  be  fully  qualified
              pathname if desired.

       Arguments
              Program  arguments.   The  expression "%t" will be replaced with
              the task’s number.  The expression "%o" will  be  replaced  with
              the task’s offset within this range (e.g. a configured task rank
              value of "1-5" would  have  offset  values  of  "0-4").   Single
              quotes  may  be  used to avoid having the enclosed values inter-
              preted.  This field is optional.  Any arguments for the  program
              entered on the command line will be added to the arguments spec-
              ified in the configuration file.

       For example:
       ###################################################################
       # srun multiple program configuration file
       #
       # srun -n8 -l --multi-prog silly.conf
       ###################################################################
       4-6       hostname
       1,7       echo  task:%t
       0,2-3     echo  offset:%o

       > srun -n8 -l --multi-prog silly.conf
       0: offset:0
       1: task:1
       2: offset:1
       3: offset:2
       4: linux15.llnl.gov
       5: linux16.llnl.gov
       6: linux17.llnl.gov
       7: task:7




EXAMPLES
       This simple example demonstrates the execution of the command  hostname
       in  eight tasks. At least eight processors will be allocated to the job
       (the same as the task count) on however many nodes are required to sat-
       isfy  the  request.  The output of each task will be preceded  by  its
       task number.  (The machine "dev" in the example below has  a  total  of
       two CPUs per node)


       > srun -n8 -l hostname
       0: dev0
       1: dev0
       2: dev1
       3: dev1
       4: dev2
       5: dev2
       6: dev3
       7: dev3


       The  srun -r option is used within a job script to run two job steps on
       disjoint nodes in the following example. The script is run using  allo-
       cate mode instead of as a batch job in this case.


       > cat test.sh
       #!/bin/sh
       echo $SLURM_NODELIST
       srun -lN2 -r2 hostname
       srun -lN2 hostname

       > salloc -N4 test.sh
       dev[7-10]
       0: dev9
       1: dev10
       0: dev7
       1: dev8


       The following script runs two job steps in parallel within an allocated
       set of nodes.


       > cat test.sh
       #!/bin/bash
       srun -lN2 -n4 -r 2 sleep 60 &
       srun -lN2 -r 0 sleep 60 &
       sleep 1
       squeue
       squeue -s
       wait

       > salloc -N4 test.sh
         JOBID PARTITION     NAME     USER  ST      TIME  NODES NODELIST
         65641     batch  test.sh   grondo   R      0:01      4 dev[7-10]

       STEPID     PARTITION     USER      TIME NODELIST
       65641.0        batch   grondo      0:01 dev[7-8]
       65641.1        batch   grondo      0:01 dev[9-10]


       This example demonstrates how one executes a simple MPICH job.  We  use
       srun  to  build  a list of machines (nodes) to be used by mpirun in its
       required format. A sample command line and the script  to  be  executed
       follow.


       > cat test.sh
       #!/bin/sh
       MACHINEFILE="nodes.$SLURM_JOB_ID"

       # Generate Machinefile for mpich such that hosts are in the same
       #  order as if run via srun
       #
       srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE

       # Run using generated Machine file:
       mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app

       rm $MACHINEFILE

       > salloc -N2 -n4 test.sh


       This simple example demonstrates the execution of  different  programs
       on different nodes in the same srun.  You can do this for  any  number
       of nodes or any number of programs.  The executable run on  each  node
       is selected by the SLURM_NODEID environment variable, starting at  0
       and going up to the number of nodes specified on the srun command line.


       > cat test.sh
       case $SLURM_NODEID in
           0) echo "I am running on "
              hostname ;;
           1) hostname
              echo "is where I am running" ;;
       esac

       > srun -N2 test.sh
       dev0
       is where I am running
       I am running on
       dev1


       This  example  demonstrates use of multi-core options to control layout
       of tasks.  We request that four sockets per  node  and  two  cores  per
       socket be dedicated to the job.


       > srun -N2 -B 4-4:2-2 a.out

       This  example shows a script in which SLURM is used to provide resource
       management for a job by executing the various job steps  as  processors
       become available for their dedicated use.


       > cat my.script
       #!/bin/bash
       srun --exclusive -n4 prog1 &
       srun --exclusive -n3 prog2 &
       srun --exclusive -n1 prog3 &
       srun --exclusive -n1 prog4 &
       wait



COPYING
       Copyright  (C)  2006-2007  The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Copyright (C) 2010-2013 SchedMD LLC.

       This file is  part  of  SLURM,  a  resource  management  program.   For
       details, see <http://slurm.schedmd.com/>.

       SLURM  is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published  by  the  Free
       Software  Foundation;  either  version  2  of  the License, or (at your
       option) any later version.

       SLURM is distributed in the hope that it will be  useful,  but  WITHOUT
       ANY  WARRANTY;  without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General  Public  License
       for more details.


SEE ALSO
       salloc(1),  sattach(1),  sbatch(1), sbcast(1), scancel(1), scontrol(1),
       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)



November 2014                     SLURM 14.11                          srun(1)
