Interactive sessions with Slurm

You don't have to run your jobs non-interactively using sbatch. It is also possible to open an interactive shell through the queuing system, just as if you had used ssh to log into a node. You may only use ssh to log into a node to check whether your sbatch job is running fine (using top, for example). You may not start calculations this way, since that bypasses the queuing system and would take away resources assigned to other users. When you start an interactive shell through the queuing system, you get the added benefit that the environment is set up just like for an sbatch job, so you can even use this to test and debug MPI-based jobs that use multiple nodes.
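For comparison, a non-interactive job is submitted with a batch script. The following minimal sketch requests the same kind of resources as the srun examples below; the job name, script name and program are placeholders and not taken from this cluster's configuration:

#!/bin/bash
#SBATCH --time=1-00:00:00   # one day, same time format as used below
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --job-name=example  # hypothetical job name

./my_program                # hypothetical program; replace with your own

Such a script would be submitted with ''sbatch job.sh'' (where job.sh is whatever you named the file); the rest of this page shows how to request the same resources interactively instead.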

Using ''srun''

In order to start an interactive session you use the ''srun'' command instead of ''sbatch''. The ''srun'' command accepts the same options as ''sbatch'' (e.g. for specifying the number of nodes, the number of CPUs or the amount of memory). Please note that you have to specify a time limit for interactive sessions, too. Here is a simple example that allocates one core on one node for one day and starts a login shell:

dreger@yoshi:~> srun --time=1-00:00:00 --nodes=1 --tasks=1 --mem=1G --pty /bin/bash
dreger@y063:~> squeue -al -u dreger
Tue Jun 16 16:14:43 2015
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
            191575      main     bash   dreger  RUNNING       0:16 1-00:00:00      1 y063

The option ''--pty'' is important to make the terminal behave correctly. Note that the prompt has changed from the login node (yoshi) to the allocated compute node (y063). In this session you can now check the environment and the limits set up by Slurm:

Environment
dreger@y063:~> env | grep ^SLURM
[...]
SLURM_NTASKS_PER_NODE=1
SLURM_JOB_ID=191575

Limits
dreger@y063:~> cat /cgroup/cpuset/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/cpuset.cpus
0
dreger@y063:~> cat /cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes
1073741824
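The cgroup paths above depend on how the cluster is set up. As an alternative cross-check, Slurm itself can report the limits of the running job; ''scontrol show job'' prints, among other fields, the time limit and the number of allocated CPUs (output omitted here):

dreger@y063:~> scontrol show job $SLURM_JOB_ID
[...]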

Using multiple cores or nodes

In the next example we will allocate 2 nodes with 8 cores each and run an MPI version of hello-world:

dreger@yoshi:~> mpicc -o mpi_hello mpi_hello.c
dreger@yoshi:~> srun --time=1-00:00:00 --nodes=2 --tasks-per-node=8 --mem=1G --pty /bin/bash
dreger@y015:~> squeue -al -u dreger
Tue Jun 16 16:25:39 2015
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
            191577      main     bash   dreger  RUNNING       0:05 1-00:00:00      2 y[015-016]

dreger@y015:~> mpirun mpi_hello
Process 0 on y015 out of 16
Process 2 on y015 out of 16
Process 4 on y015 out of 16
Process 1 on y015 out of 16
Process 3 on y015 out of 16
Process 6 on y015 out of 16
Process 7 on y015 out of 16
Process 5 on y015 out of 16
Process 12 on y016 out of 16
Process 10 on y016 out of 16
Process 13 on y016 out of 16
Process 15 on y016 out of 16
Process 8 on y016 out of 16
Process 9 on y016 out of 16
Process 11 on y016 out of 16
Process 14 on y016 out of 16
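When you are done, leaving the shell ends the interactive job and releases the allocation. A quick check with ''squeue'' should then no longer list the job (the prompts shown are illustrative):

dreger@y015:~> exit
dreger@yoshi:~> squeue -u dreger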
