Introduction to the Slurm HPC cluster

The primary source of documentation on Slurm usage and commands is the Slurm site. Please also consult the man pages for the Slurm commands, e.g. typing man sbatch will give you extensive information on the sbatch command.
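If you are unsure which options a command accepts, the man pages on the head node are the quickest reference. The commands listed below are simply the ones used later on this page:

man sbatch      # job submission options such as --time and --mem-per-cpu
man squeue      # inspecting the job queue
man scontrol    # detailed job, node and partition information
sbatch --help   # short summary of all sbatch options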

Quick Start for the impatient

  1. Log in to the head node sheldon-ng.physik.fu-berlin.de using ssh
  2. Create a job script file to be run by the queuing system, supplying information such as:
    • how much memory to allocate for your job
    • how many CPU cores your job needs to run
    • how long you expect your job to run
    • when and where you want the system to send mail
    • where output should be written to
  3. Submit your job script using the sbatch command

Consider the following example of a very basic job script named job1.sh:

#!/bin/bash

#SBATCH --job-name=job1                # Job name, will show up in squeue output
#SBATCH --ntasks=1                     # Number of cores
#SBATCH --nodes=1                      # Ensure that all cores are on one machine
#SBATCH --time=0-00:01:00              # Runtime in DAYS-HH:MM:SS format
#SBATCH --mem-per-cpu=100              # Memory per cpu in MB (see also --mem) 
#SBATCH --output=job1_%j.out           # File to which standard out will be written
#SBATCH --error=job1_%j.err            # File to which standard err will be written
#SBATCH --mail-type=END                # Type of email notification: BEGIN, END, FAIL, ALL
#SBATCH --mail-user=j.d@fu-berlin.de   # Email to which notifications will be sent 

# store job info in output file, if you want...
scontrol show job $SLURM_JOBID

# run your program...
hostname

Now just submit your job script using sbatch job1.sh from the command line. Please try to run jobs directly from the /scratch/username cluster-wide filesystem to lower the load on the /home server. For testing purposes, set the runtime of your job to below 1 minute and submit it to the test partition by adding -p test to sbatch:

dreger@sheldon-ng:..dreger/quickstart> pwd
/scratch/dreger/quickstart
dreger@sheldon-ng:..dreger/quickstart> sbatch -p test job1.sh
Submitted batch job 26494
dreger@sheldon-ng:..dreger/quickstart> cat job1_26494.out
JobId=26494 Name=job1
   UserId=dreger(4440) GroupId=fbedv(400)
   Priority=10916 Account=fbedv QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
   SubmitTime=2014-06-29T22:37:44 EligibleTime=2014-06-29T22:37:44
   StartTime=2014-06-29T22:37:44 EndTime=2014-06-29T22:38:44
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=test AllocNode:Sid=sheldon-ng:26448
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=x001
   BatchHost=x001
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryCPU=100M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/clusterfs/scratch/dreger/quickstart/job1.sh
   WorkDir=/clusterfs/scratch/dreger/quickstart

x001
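Once a job has been submitted, you can check on it with the standard Slurm commands. The user name and job ID below are just the ones from the example session above:

squeue -u dreger           # list your own pending and running jobs
scontrol show job 26494    # detailed information on a single job (same command as in the job script)
scancel 26494              # cancel a job you no longer need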
