services:cluster:start

  
  - Your ZEDAT account username
  - The group you are using the system for (e.g. AG Netz, AG Eisert, AG Franke, …)
  - The software you are using for your numerics (e.g. externally developed software like GROMACS or Gaussian, or self-written code in Python, Fortran, Julia or C). Also let us know if you have any special needs, e.g. if you use MPI, GPU offloading (OpenCL/CUDA/Vulkan Compute), or need special compiler toolchains.
  - Software that you happen to know so well that other HPC users within the department may ask you for help.
  - A self-contained example job that is typical for the workload you will be using the HPC systems for.
  - If you are no longer a member of the physics department, we would like an estimate of how much longer you will need access to the systems (e.g. to finish a paper).
  
The example must contain:

  - a small README describing how to run (and, if necessary, build) the example,
  - a Slurm job script (a minimal sketch is shown below), and
  - the program that is run in the example and/or all input files needed to run it; this includes data files as well as definitions of the environment the job runs in (e.g. a ''requirements.txt'' for a Python virtual environment) or that is needed to build the software (e.g. a ''Cargo.lock'').
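
A minimal sketch of what such a job script could look like for a Python workload. The script name ''run.py'', the resource numbers, and the virtual-environment setup are placeholders, not a prescription for our clusters; adapt them to your example:

<code bash>
#!/bin/bash
# Resource requests: adapt the numbers to what your example really needs.
#SBATCH --job-name=example-job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:30:00
#SBATCH --output=example-%j.log

# Recreate the environment described in the README, here a Python
# virtual environment defined by a requirements.txt.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run the example itself; run.py is a placeholder for your program.
# Arguments given to sbatch after the script name are forwarded here.
srun python3 run.py "$@"
</code>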

If possible:

  - The example should have an option to scale it, so that it runs between a few minutes and an hour at most and can also be used for benchmarking.

If you are unsure what information to provide for your example, these questions can help:

  - If you have written the code yourself, what dependencies does it have (e.g. Python libraries you import)?
  - How long does your example run?
  - How many CPUs and how much memory does the example need?
  - Can the example's runtime be made to scale, preferably by changing a single parameter (see the sketch below)?
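
For example, if the runtime of the hypothetical ''run.py'' above is controlled by a single ''--nsteps'' argument, the same job can be scaled from a quick test to a benchmark run by changing that one number:

<code bash>
# Short functional test, runs a few minutes (parameter name is a placeholder).
sbatch job.sh --nsteps 1000

# Larger run of the same example for benchmarking, up to about an hour.
sbatch job.sh --nsteps 50000
</code>
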
===== Slurm documentation =====
  
Read this for an introduction to the Slurm queuing system, if you haven't used an HPC cluster before and want to learn the workflow:

  * Start with the [[slurm|Introduction to the Slurm HPC cluster]].

Read this for some important notes on the specifics of our clusters:

  * [[important|Important notes]] on cluster usage

These are more specialised topics:

  * Using [[interactivesessions|interactive sessions]] with the queuing system.
  * Here is a [[nodes|list of special nodes]] that are currently not part of slurm.
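
For orientation, these are the everyday Slurm commands that the pages above explain in detail (''job.sh'' and the job ID are placeholders):

<code bash>
sbatch job.sh       # submit a batch job script to the queue
squeue -u $USER     # list your pending and running jobs
scancel 12345       # cancel a job by its job ID
sacct -j 12345      # show accounting information for a (finished) job
srun --pty bash     # request an interactive shell on a compute node
</code>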
  
^ Hosts ^ Nodes ^ Cores/Node ^ RAM/Core ^ RAM/Node ^ CPU features ^ GPU ^ on-GPU RAM ^ #Cores ^ #RAM ^ #GPU ^
| @#cfc:** sheldon-ng cluster** - FB Physik - Location: Takustraße 7 - OS: Debian/Bookworm |||||||||||
| @#cfc:x[001-016,049-160] | 128 | 24 | 5.2GB | 125GB | x86-64-v2 |  |  | 3072 | 16000GB | 0 |
| @#cfc:x[017-048] | 32 | 24 | 20.9GB | 502GB | x86-64-v2 |  |  | 768 | 16064GB | 0 |
| @#cfc:x[161-176] | 16 | 24 | 5.2GB | 125GB | x86-64-v3 |  |  | 384 | 2000GB | 0 |
| @#cfc:sheldon,x[177-178,180-222] | 45 | 24 | 42.0GB | 1007GB | x86-64-v3 |  |  | 1080 | 45315GB | 0 |
| @#cfc:xq[01-10] | 10 | 128 | 2.0GB | 250GB | x86-64-v3 | 2x A5000 | 24GB | 1280 | 2500GB | 20 |
| @#cfc:xgpu[01-05,07-13] | 12 | 16 | 11.7GB | 187GB | x86-64-v4 | 4x nVidia RTX 2080 TI | 11GB | 192 | 2244GB | 48 |
| @#cfc:xgpu06 | 1 | 16 | 11.2GB | 179GB | x86-64-v4 | 4x nVidia RTX 2080 TI | 11GB | 16 | 179GB | 4 |
| @#cfc:xgpu[14-23] | 10 | 16 | 11.7GB | 187GB | x86-64-v4 | 4x A5000 | 24GB | 160 | 1870GB | 40 |
| @#cfc:xgpu[24-25] | 2 | 16 | 11.7GB | 187GB | x86-64-v3 | 4x nVidia RTX 3090 | 24GB | 32 | 374GB | 8 |
| @#cfc:xgpu26 | 1 | 64 | 2.0GB | 125GB | x86-64-v3 | 10x A5000 | 24GB | 64 | 125GB | 10 |
| @#cfc:xgpu28 | 1 | 24 | 10.4GB | 250GB | x86-64-v3 | 4x nVidia RTX 6000 Ada | 48GB | 24 | 250GB | 4 |
| @#cfc:xgpu[29-33] | 5 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia Titan V | 12GB | 120 | 625GB | 20 |
| @#cfc:xgpu[27,34-52,54-56,58,62] | 25 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x A5000 | 24GB | 600 | 3125GB | 100 |
| @#cfc:xgpu57 | 1 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia RTX A6000 | 48GB | 24 | 125GB | 4 |
| @#cfc:xgpu[59-61] | 3 | 36 | 41.9GB | 1509GB | x86-64-v4 | 8x nVidia Tesla P100 | 16GB | 108 | 4527GB | 24 |
| @#cfc:xgpu63 | 1 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia RTX A4500 Ada | 24GB | 24 | 125GB | 4 |
| @#cfc:**#Taku 7** | **293** | | | | | | | **7948** | **95448GB** | **286** |
  
(07.02.2025)