====== Important notes on cluster usage ======

==== ''/home'' on the cluster ====

The ''/home'' directories on the cluster are separate for each cluster and separate from our regular home directories, so you will need to copy over any configuration you may need, such as SSH keys.
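
For example, to copy your shell configuration over (a sketch; ''username'' and ''headnode'' are placeholders for your username and the cluster's login node):

<code bash>
# run on your own machine: copy your shell configuration to the cluster
scp ~/.bashrc username@headnode.physik.fu-berlin.de:~/
</code>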

==== Submit jobs from ''/scratch/username'' ====

For every account on the cluster, a directory on the cluster-wide filesystem ''/scratch'' is created.

Jobs cannot write output files to ''/home///username//'', and ''/scratch///username//'' must also be the working directory of the job. This is easily accomplished by submitting your jobs directly from ''/scratch///username//''.

''~/.cache'' points to a temporary filesystem, which you can use nonetheless. You can also just keep output data on ''/scratch'', as nothing on the filesystem will ever be deleted. The main difference is that ''/home'' is just a single server, while ''/scratch'' is a cluster filesystem based on BeeGFS that utilizes many servers at the same time.
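
A typical workflow looks like this (a sketch; the project and jobfile names are examples):

<code bash>
# copy input data from /home to /scratch, then submit from there
cp -r ~/myproject /scratch/$USER/
cd /scratch/$USER/myproject
sbatch job.sh
</code>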
  
==== Always specify the amount of memory your job needs in the jobfile ====
  
When submitting a job, you absolutely must specify the amount of memory to be allocated for your job.

By default, a job gets ''1MB'' of memory per allocated CPU. This is a ridiculously small value, which we set to make sure that some thought goes into how much memory you need. If you request more memory than you actually need, your job might wait longer than necessary for a free spot in the cluster. If you specify less memory than needed, your program will be killed automatically by Slurm.
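
With Slurm, this is done with the ''%%--%%mem'' option in the jobfile (a minimal sketch; the value and script name are placeholders):

<code bash>
#!/bin/bash
#SBATCH --mem=4G   # memory to allocate for the job; adjust to your program's needs

./run_simulation.sh
</code>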

==== Use ''/tmp'' for I/O-intensive single-node jobs ====

Jobs that do a lot of I/O operations on a shared cluster filesystem like ''/scratch'' can severely slow down the whole system. If your job does not use multiple nodes and is not reading and writing very large files, it might be a good idea to move input and output files to the ''/tmp'' folder on the compute node itself.

''/tmp'' is a RAM-based filesystem, meaning that anything you store there is actually stored in memory, so space is quite limited. Currently, all jobs on a node can use at most 20% of the total system memory for space in ''/tmp''. If you need more space, you should consider using ''/dev/shm'', where you can use up to 50% of the total system memory per job.

<note tip>
Usable space below ''/tmp'' and ''/dev/shm'' counts towards your job's memory usage and is thus limited by the ''%%--%%mem'' option.
</note>
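
A single-node jobfile could stage its data like this (a sketch; the paths, file names, and memory request are examples):

<code bash>
#!/bin/bash
#SBATCH --mem=8G   # remember: files in /tmp count towards this limit

# stage input data to the node-local /tmp
cp /scratch/$USER/myproject/input.dat /tmp/

# run in /tmp so intermediate I/O stays off the shared filesystem
cd /tmp
/scratch/$USER/myproject/my_program input.dat

# copy results back to /scratch before the job ends
cp /tmp/output.dat /scratch/$USER/myproject/
</code>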

==== SSH access ====

The login node allows password-based login only from within the university network. We generally recommend SSH access with key files. To get your SSH key onto the login node when you are not at the university, you have two options:

  - Use the VPN.
  - Use an SSH proxy jump via ''login.physik.fu-berlin.de''.

The latter is done via:

<code bash>
ssh-copy-id \
    -i ~/.ssh/id_sheldon \
    -o ProxyJump=username@login.physik.fu-berlin.de \
    username@headnode.physik.fu-berlin.de
</code>

This assumes a key file ''id_sheldon''. You will need to change ''username'' to your username and ''headnode'' to the name of the head node (login node) of the cluster.
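
Once the key is in place, a ''ProxyJump'' entry in ''~/.ssh/config'' saves typing (a sketch with the same placeholder names as above):

<code>
# ~/.ssh/config on your own machine
Host headnode
    HostName headnode.physik.fu-berlin.de
    User username
    IdentityFile ~/.ssh/id_sheldon
    ProxyJump username@login.physik.fu-berlin.de
</code>

With this entry, ''ssh headnode'' connects through the jump host automatically.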

==== Modules ====

Modules are a staple in the HPC world. They are a way to change your environment to include paths that are not normally in your binary (''PATH'') or library search paths (''LD_LIBRARY_PATH''), so that you can use other programs or different versions of them.

These are the most important commands:

<code bash>
# show available modules
module avail

# load a module
module load name_of_module
# e.g. module load gromacs/double/2020.4

# unload a module (usually not necessary in a job script, but you can use
# modules interactively, too)
module unload name_of_module
</code>

Somebody has to build the software. This is done by us and by interested users; e.g., the GROMACS packages are mostly built by users in AG Netz. The software modules can be found in ''/net/opt''. If you want to contribute, let us know!
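
For example, a jobfile can load a module before starting the program (a sketch using the GROMACS module mentioned above; adjust the module name and memory request to your software):

<code bash>
#!/bin/bash
#SBATCH --mem=4G

# make the module's programs and libraries available in the job environment
module load gromacs/double/2020.4

# ... then start your program as usual
</code>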