====== Introduction to the HPC cluster of the physics department ======
  
The login node of the HPC cluster is ''sheldon.physik.fu-berlin.de''. You can connect to it from anywhere using ssh, e.g. by issuing ''ssh sheldon.physik.fu-berlin.de'' on the command line or by using PuTTY from Windows.
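For example, from a Linux or macOS terminal (''hpcuser'' is just a placeholder here, replace it with your own account name):

<xterm>
user@laptop:~> ssh hpcuser@sheldon.physik.fu-berlin.de
</xterm>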
===== Submitting a job to the HPC cluster =====
  
In order to do any calculations on the HPC cluster **you have to submit your jobs to the queuing system** using the ''qsub'' command. You may not log in to a compute node and start interactive calculations there, bypassing the queuing system.
  
You submit jobs to the queuing system by writing a job-script which tells the queuing system about the resources your job needs and about the programs that are to be run. Basically, the job-script is a shell script with some magic comments at the top (lines starting with ''#PBS'') which are parsed by the queuing system. Typical lines from such a job-script look like this:
  #PBS -l walltime=1:00:00
  #PBS -l nodes=1:ppn=1
  #PBS -m bea -M hpcuser@physik.fu-berlin.de

  ## go to the directory the user typed 'qsub' in
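Put together, a complete minimal job-script might look like the following sketch. The ''#PBS -N'' job name and the final ''env'' command are not spelled out above; they are assumed here so that the example matches the output files discussed below.

<code bash>
#!/bin/bash
## job name (assumed here; it determines the names of the output files below)
#PBS -N some-good-name
## resource requests: one hour of walltime on one core of one node
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=1
## send mail on job begin (b), end (e) and abort (a) to the given address
#PBS -m bea -M hpcuser@physik.fu-berlin.de

## go to the directory the user typed 'qsub' in
cd $PBS_O_WORKDIR

## example workload (assumed): dump all environment variables into a file
env > outputfile.txt
</code>

Save the script as e.g. ''job.sh'' and submit it with ''qsub job.sh''.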
''some-good-name.o26103'' contains the standard output of your job-script, while ''some-good-name.e26103'' contains the standard error output. ''outputfile.txt'' in this case contains the output of the ''env'' command, so you can list all PBS environment variables available to your job-scripts with ''grep ^PBS outputfile.txt''. Type ''man qsub'' for more information on #PBS options and $PBS_* environment variables.
  
=== Requesting resources ===

The most important resources that can (and should!) be specified by your job-script using the ''#PBS -l'' option are listed in the following table (a short combined example is shown after the table):

^ Resource ^ Format ^ Description ^ Example ^ Default ^
| nodes    | {<node_count> %%|%% <hostname>} [:ppn=<ppn>] | Number of nodes to be reserved for exclusive use by the job. ppn=# specifies the number of cores on each node. | **nodes=10:ppn=12** -> request 10 nodes with 12 cores each\\ **nodes=n100:ppn=8+n101:ppn=8** -> request two explicit nodes by hostname (possible, but not recommended) | nodes=1:ppn=1 |
| walltime | seconds, or [[HH:]MM:]SS | Maximum amount of real time during which the job can be in the running state. The job will be terminated once this limit is reached. | **walltime=100:00:00** -> request 100 hours for this job | walltime=1:00:00 (1 hour) |
| pmem | size* | Maximum amount of physical memory used by any single process of the job. In our case this means per core. | **pmem=8gb** -> request 8gb of RAM per core | pmem=2gb |
| file | size* | The amount of **local disk space** per core requested for the job. The space can be accessed at ''/local_scratch/$PBS_JOBID''. | **file=10gb** -> request 10 gigabytes of local disk space on each compute node | none |
  
**size* format** = integer, optionally followed by a multiplier {b,kb,mb,gb,tb} meaning {bytes,kilobytes,megabytes,gigabytes,terabytes}. No suffix means bytes.
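For example, a job that needs two full 12-core nodes for at most one day, 4gb of RAM per core and some local scratch space could combine these requests as follows (the values are purely illustrative):

<code bash>
## illustrative resource requests only - adjust to what your job really needs
#PBS -l nodes=2:ppn=12      ## two nodes with 12 cores each
#PBS -l walltime=24:00:00   ## at most 24 hours of real time
#PBS -l pmem=4gb            ## 4gb of physical memory per core
#PBS -l file=50gb           ## local scratch space for the job
</code>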
  
=== Recommendations on resource usage ===

Note that in general it is a bad idea to specify far too large values for pmem or walltime //just to be on the safe side//, since this will very likely delay the execution of your jobs. An explanation for this behaviour will be given in an upcoming section on the backfill strategy of the queuing system.

Please try to use local disk space on the compute nodes whenever possible. Since access to local storage is faster than access to your $PBS_O_WORKDIR, this will most likely speed up your compute jobs. At the same time it reduces the load on the central home-server. However, do not forget to copy data back from the compute nodes to $PBS_O_WORKDIR after the job has finished, since the local disk space will be cleared once your job-script has finished. The following advanced job-script uses local disk space.

=== Advanced job-script example running CP2K using MPI on 12 nodes with 8 cores each ===

<code bash>
#!/bin/bash
#PBS -N some_good_name
#PBS -l nodes=12:ppn=8
#PBS -l walltime=100:00:00
#PBS -l file=1000M
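## ask for mail when the job ends (e) or aborts (a), sent to the address given after -M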
#PBS -m ea -M hpcuser@physik.fu-berlin.de
  
cd $PBS_O_WORKDIR
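
## keep a two-digit run counter in the file 'seq': read the previous value and write back value+1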
export seq=`cat seq`
awk 'BEGIN{printf "%2.2d\n",ENVIRON["seq"]+1}' > seq
  
infile=${flag}.inp
</code>
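The part of a job-script that uses local disk space follows a simple copy-in/copy-out pattern. The following is only a rough sketch of that pattern; the file names and the ''my_program'' command are placeholders and not part of the CP2K example above:

<code bash>
## sketch only: file names and 'my_program' are placeholders
## stage input data from the submit directory to the fast local disk of the compute node
cp $PBS_O_WORKDIR/input.dat /local_scratch/$PBS_JOBID/

## run the calculation inside the local scratch directory
cd /local_scratch/$PBS_JOBID
my_program input.dat > output.dat

## copy the results back before the script ends, since local scratch is cleared after the job
cp output.dat $PBS_O_WORKDIR/
</code>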
  
===== Run a job interactively =====
If you want to run a job on a compute node interactively (e.g. for debugging purposes), simply add the **''-I''** option to the qsub command line, together with the Torque resource options you normally put in the #PBS lines of your job script:
  
<xterm>
hpcuser@sheldon:~> qsub **-I** -l cput=20:00:00 -l nodes=1:ppn=1 -N jobname
</xterm>
  
qsub does not return in this case; instead, as soon as your job gets scheduled, you get an interactive shell on one of the compute nodes.
===== Requesting resources =====
The following resources can be requested from the queuing system: