  - Your ZEDAT account username
  - The group you are using the system for (e.g. AG Netz, AG Eisert, AG Franke…)
  - The software you are using for your numerics (e.g. externally developed software like GROMACS or Gaussian, or self-written code in a language such as Python, Fortran, Julia, or C). Also let us know if you have any special needs, e.g. if you use MPI, GPU offloading (OpenCL/CUDA/Vulkan Compute), or need special compiler toolchains.
  - Software that you happen to know so well that other HPC users within the department may ask you for help.
  - A self-contained example job that is typical for the workload you will be using the HPC systems for.

The example must contain:

  - A small README describing how to run (and if necessary build) the example,
  - a Slurm job script, and
  - the program that is run in the example and/or all input files needed to run it. This includes data files as well as definitions for the environment the job is to run in (e.g. a ''requirements.txt'' for a Python virtual environment) or that are needed to build the software (e.g. a ''Cargo.lock'').

  - The example should offer a way to scale its runtime, so that it runs between a few minutes and at most an hour and can be used for benchmarking; a minimal job-script sketch is shown below.
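
As a rough illustration only, a minimal example could look like the job script below. All resource values are placeholders, and ''run_example.py'' is a hypothetical program standing in for your own code; adapt everything to your actual workload.

<code bash>
#!/bin/bash
#SBATCH --job-name=example-job    # descriptive name shown in the queue
#SBATCH --ntasks=1                # one task; increase for MPI jobs
#SBATCH --cpus-per-task=4         # cores your program actually uses
#SBATCH --mem-per-cpu=2G          # memory per core
#SBATCH --time=00:30:00           # walltime limit; keep it realistic

# Recreate the environment from the requirements.txt shipped with
# the example, so the job is self-contained.
python3 -m venv venv
venv/bin/pip install --quiet -r requirements.txt

# A single parameter (here: iteration count) scales the runtime from
# minutes up to about an hour, as requested above.
SCALE=${1:-1000}
venv/bin/python run_example.py --iterations "$SCALE"
</code>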

If you can't answer these questions for your example, the following points can help:

  - If you have written the code yourself, what dependencies does it have (e.g. Python libraries you import)?
  - How long does your example run?
  - How many CPUs and how much memory does the example need?
  - Can the example's runtime be made to scale, preferably by changing a single parameter?
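
If accounting is enabled on the cluster, Slurm's own tools can answer the runtime and resource questions after a test run; a sketch (''12345'' stands for your job ID, and ''seff'' is only available where the admins installed it):

<code bash>
# Elapsed time, peak memory and allocated CPUs of a finished job:
sacct -j 12345 --format=JobID,Elapsed,MaxRSS,AllocCPUS

# Summary of CPU and memory efficiency (if seff is installed):
seff 12345

# Outside of Slurm, GNU time reports the peak resident memory
# ("Maximum resident set size") of a local test run:
/usr/bin/time -v python3 run_example.py --iterations 1000
</code>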
===== Slurm documentation =====

^ Hosts ^ Nodes ^ Cores/Node ^ RAM/Core ^ RAM/Node ^ CPU features ^ GPU ^ on-GPU RAM ^ Total Cores ^ Total RAM ^ Total GPUs ^
| @#cfc:**sheldon-ng cluster** - FB Physik - Location: Takustraße 7 - OS: Debian/Bookworm |||||||||||
| @#cfc:x[001-016,049-160] | 128 | 24 | 5.2GB | 125GB | x86-64-v2 | | | 3072 | 16000GB | 0 |
| @#cfc:x[017-048] | 32 | 24 | 20.9GB | 502GB | x86-64-v2 | | | 768 | 16064GB | 0 |
| @#cfc:x[161-176] | 16 | 24 | 5.2GB | 125GB | x86-64-v3 | | | 384 | 2000GB | 0 |
| @#cfc:sheldon,x[177-178,180-222] | 45 | 24 | 42.0GB | 1007GB | x86-64-v3 | | | 1080 | 45315GB | 0 |
| @#cfc:xq[01-10] | 10 | 128 | 2.0GB | 250GB | x86-64-v3 | 2x A5000 | 24GB | 1280 | 2500GB | 20 |
| @#cfc:xgpu[01-05,07-13] | 12 | 16 | 11.7GB | 187GB | x86-64-v4 | 4x nVidia RTX 2080 Ti | 11GB | 192 | 2244GB | 48 |
| @#cfc:xgpu06 | 1 | 16 | 11.2GB | 179GB | x86-64-v4 | 4x nVidia RTX 2080 Ti | 11GB | 16 | 179GB | 4 |
| @#cfc:xgpu[14-23] | 10 | 16 | 11.7GB | 187GB | x86-64-v4 | 4x A5000 | 24GB | 160 | 1870GB | 40 |
| @#cfc:xgpu[24-25] | 2 | 16 | 11.7GB | 187GB | x86-64-v3 | 4x nVidia RTX 3090 | 24GB | 32 | 374GB | 8 |
| @#cfc:xgpu26 | 1 | 64 | 2.0GB | 125GB | x86-64-v3 | 10x A5000 | 24GB | 64 | 125GB | 10 |
| @#cfc:xgpu28 | 1 | 24 | 10.4GB | 250GB | x86-64-v3 | 4x nVidia RTX 6000 Ada | 48GB | 24 | 250GB | 4 |
| @#cfc:xgpu[29-33] | 5 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia Titan V | 12GB | 120 | 625GB | 20 |
| @#cfc:xgpu[27,34-52,54-56,58,62] | 25 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x A5000 | 24GB | 600 | 3125GB | 100 |
| @#cfc:xgpu57 | 1 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia RTX A6000 | 48GB | 24 | 125GB | 4 |
| @#cfc:xgpu[59-61] | 3 | 36 | 41.9GB | 1509GB | x86-64-v4 | 8x nVidia Tesla P100 | 16GB | 108 | 4527GB | 24 |
| @#cfc:xgpu63 | 1 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia RTX 4500 Ada | 24GB | 24 | 125GB | 4 |
| @#cfc:**Total (Taku 7)** | **293** | | | | | | | **7948** | **95448GB** | **286** |

(07.02.2025)
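
To target one of the node classes above, the usual route is Slurm's ''--constraint'' and ''--gres'' options. The sketch below assumes the CPU features in the table are exposed as node features and that GPU GRES is configured; the exact names are cluster-specific, so verify them with ''sinfo'' first:

<code bash>
# Show each node's CPU count, memory, features and GRES, i.e. the
# names the cluster actually defines:
sinfo -N -o "%N %c %m %f %G"

# Request one GPU on a node advertising the x86-64-v3 feature
# (feature and GRES names taken from the sinfo output above):
sbatch --constraint=x86-64-v3 --gres=gpu:1 job.sh
</code>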