Use local storage on the compute nodes
If many jobs read from or write to the NFS server that holds the cluster home directories at the same time, the server can become very slow or even crash. It is therefore very important that all users use the local storage available on the compute nodes whenever possible. In most cases this will also speed up your jobs. To do so, tell the queuing system how much local disk space you want to reserve for your job (the TORQUE example below does this with the line #PBS -l file=10gb). The queuing system will then create a directory named /local_scratch/$PBS_JOBID on the nodes. After the computation has finished, you must copy the results you want to keep from the local disks back to your home directory. Please copy only the input and output files you really need, and refrain from using the * wildcard everywhere.
Example for TORQUE (the sheldon cluster)
#!/bin/bash
#PBS -N local-file
#PBS -l walltime=1:00:00
#PBS -l file=10gb
#PBS -m bea -M dreger@physik.fu-berlin.de

# location of local storage directory on the node
local_dir="/local_scratch/$PBS_JOBID"

if [[ -d "$local_dir" ]]; then
    echo "# found local storage at $local_dir. copying data from $PBS_O_WORKDIR to $local_dir."
    echo "# maximum file size is:" $(ulimit -f)
    # copy necessary input data
    cp "$PBS_O_WORKDIR/input.dat" "$local_dir"
    cd "$local_dir"
else
    local_dir=
    echo "# no local storage found. running calculations in $PBS_O_WORKDIR"
    cd "$PBS_O_WORKDIR"
fi

# run jobs now
md5sum input.dat > result.out

# copy results back to $PBS_O_WORKDIR after job has finished
if [[ -n "$local_dir" ]]; then
    cp result.out "$PBS_O_WORKDIR"
fi
Example run using this jobfile:
<xterm>
dreger@sheldon:~/test-file> ls
input.dat jobfile1
dreger@sheldon:~/test-file> qsub jobfile1
656781.torque.physik.fu-berlin.de
dreger@sheldon:~/test-file> ls
input.dat jobfile1 local-file.e656781 local-file.o656781 result.out
dreger@sheldon:~/test-file> grep ^# local-file.o656781
# found local storage at /local_scratch/656781.torque.physik.fu-berlin.de. copying data from /home/dreger/local-file to /local_scratch/656781.torque.physik.fu-berlin.de.
# maximum file size is: 10485760
dreger@sheldon:~/test-file> cat result.out
aee97cb3ad288ef0add6c6b5b5fae48a input.dat
</xterm>
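The advice to copy back only the files you really need, rather than using *, can be sketched as a small shell function. This is only an illustration: copy_back is a hypothetical helper name, not part of TORQUE, and the file names in the usage comment are placeholders. It copies an explicit list of files from one directory to another and returns a non-zero status if any of them is missing or cannot be copied.

```shell
#!/bin/bash
# Sketch only: copy_back is a hypothetical helper, not a TORQUE command.
# usage: copy_back SRC_DIR DEST_DIR FILE...
copy_back() {
    if [ "$#" -lt 2 ]; then
        echo "usage: copy_back SRC_DIR DEST_DIR FILE..." >&2
        return 2
    fi
    local src="$1"
    local dest="$2"
    local status=0
    local f
    shift 2
    # copy each named file individually, so nothing else is transferred
    for f in "$@"; do
        if [ -f "$src/$f" ]; then
            cp -- "$src/$f" "$dest/" || status=1
        else
            echo "# warning: $f not found in $src" >&2
            status=1
        fi
    done
    return "$status"
}

# Inside a job script this might be called as follows
# (result.out is just the file name from the example above):
#   copy_back "/local_scratch/$PBS_JOBID" "$PBS_O_WORKDIR" result.out
```

Naming each file explicitly keeps temporary data on the local disk and away from the NFS server, and the warning makes a missing result visible in the job's error file instead of failing silently.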