Important notes on cluster usage

Write output files to /scratch/username

For every account on the cluster, a directory is created on the cluster-wide filesystem /scratch. It is very important that your jobs write their output files to /scratch/username and not to /home/username; otherwise the NFS server behind /home can become severely overloaded. The easiest way to ensure this is to submit your jobs directly from /scratch/username. You can copy the input data your jobs need from /home to /scratch before the main programs start and copy important output data back to /home after the jobs finish, or simply keep the output data on /scratch. Nothing on the filesystem is ever deleted automatically. The main difference between the two locations is that /home is served by a single machine, while /scratch is a cluster filesystem based on Fraunhofer FS that utilizes many servers at the same time.
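
The following jobfile sketch illustrates this staging pattern. It assumes a SLURM-style scheduler and uses made-up paths and a made-up program name (project/input.dat, my_simulation); adapt the directives and names to the scheduler and software actually used on the cluster.

  #!/bin/bash
  #SBATCH --job-name=scratch-example
  #SBATCH --mem-per-cpu=2G

  # Stage input data from /home to a job-specific directory on /scratch.
  SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
  mkdir -p "$SCRATCHDIR"
  cp "/home/$USER/project/input.dat" "$SCRATCHDIR/"

  # Run with the working directory on /scratch, so all output is
  # written to the cluster filesystem instead of the /home NFS server.
  cd "$SCRATCHDIR"
  "/home/$USER/project/my_simulation" input.dat > output.dat

  # Copy only the important results back to /home when the job is done.
  cp output.dat "/home/$USER/project/results/"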

Always specify the amount of memory your job needs in the jobfile

When submitting a job you absolutely must specify the amount of memory to be allocated for it, because the default setting is deliberately very low: 1 MB per CPU. The main reason for this is that we need to make sure that jobs of user A cannot kill jobs of user B by using more memory than is available. When jobs of two different users run on the same node at the same time and the job of user A uses all the memory on the node, the job of user B may suffer or even be killed by the operating system in an attempt to free memory. To prevent this, every job has to declare how much memory should be allocated for it, and if the job exceeds that limit it is killed immediately.

So why not always request the maximum amount of memory, just to be on the safe side? Most nodes in the cluster currently have 8 cores and 48 GB of memory. If you request 48 GB even though your job does not need that much, your job can only run on completely empty nodes and may stay in the queue for a long time. If you lower the requested amount, your job can probably run alongside other jobs on nodes that still have some resources left, and it is likely to start much sooner.
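
As a rough sketch, a memory request in a jobfile could look like the following; the SLURM-style directive and the 4 GB figure are assumptions, so replace them with your scheduler's syntax and a realistic estimate for your own program.

  #!/bin/bash
  #SBATCH --job-name=memory-example
  # Request 4 GB in total rather than the full 48 GB of a node, so the
  # job can share a node with other jobs and is scheduled sooner. If
  # the job exceeds this limit, it is killed immediately.
  #SBATCH --mem=4G

  ./my_program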
