services:cluster:start

  
  - Your ZEDAT account username
  - The group you are using the system for (e.g. AG Netz, AG Eisert, AG Franke, …)
  - The software you are using for your numerics (e.g. externally developed software like GROMACS or Gaussian, or self-written code in Python, Fortran, Julia or C). Also let us know if you have any special needs, e.g. if you use MPI, GPU offloading (OpenCL/CUDA/Vulkan Compute), or need special compiler toolchains.
  - Software that you happen to know so well that other HPC users within the department may ask you for help.
  - A self-contained example job that is typical for the workload you will be using the HPC systems for.
  - If you are no longer a member of the physics department, we would like an estimate of how much longer you will need access to the systems (e.g. to finish a paper).
  
The example must contain:

  - a small README describing how to run (and, if necessary, build) the example,
  - a Slurm job script (a minimal sketch is shown below), and
  - the program that is run in the example and/or all input files needed to run it; this includes data files as well as definitions of the environment the job runs in (e.g. a ''requirements.txt'' for a Python virtual environment) or that is needed to build the software (e.g. a ''Cargo.lock'').
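
A minimal sketch of what such a job script could look like for a Python workload. The script name ''run.py'', the resource numbers, and the virtual-environment setup are placeholders, not a prescription for our clusters; adapt them to your example:

<code bash>
#!/bin/bash
# Resource requests: adapt the numbers to what your example really needs.
#SBATCH --job-name=example-job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:30:00
#SBATCH --output=example-%j.log

# Recreate the environment described in the README, here a Python
# virtual environment defined by a requirements.txt.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run the example itself; run.py is a placeholder for your program.
# Arguments given to sbatch after the script name are forwarded here.
srun python3 run.py "$@"
</code>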

If possible:

  - The example should have an option to scale it, so that it runs between a few minutes and an hour at most and can also be used for benchmarking.

If you are unsure what information to provide for your example, these questions can help:

  - If you have written the code yourself, what dependencies does it have (e.g. Python libraries you import)?
  - How long does your example run?
  - How many CPUs and how much memory does the example need?
  - Can the example's runtime be made to scale, preferably by changing a single parameter (see the sketch below)?
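
For example, if the runtime of the hypothetical ''run.py'' above is controlled by a single ''--nsteps'' argument, the same job can be scaled from a quick test to a benchmark run by changing that one number:

<code bash>
# Short functional test, runs a few minutes (parameter name is a placeholder).
sbatch job.sh --nsteps 1000

# Larger run of the same example for benchmarking, up to about an hour.
sbatch job.sh --nsteps 50000
</code>
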
===== Slurm documentation =====
  
Read this for an introduction to the Slurm queuing system, if you haven't used an HPC cluster before and want to learn the workflow:

  * Start with the [[slurm|Introduction to the Slurm HPC cluster]].

Read this for some important notes on the specifics of our clusters:

  * [[important|Important notes]] on cluster usage

These are more specialised topics:

  * Using [[interactivesessions|interactive sessions]] with the queuing system.
  * Here is a [[nodes|list of special nodes]] that are currently not part of slurm.
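
For orientation, these are the everyday Slurm commands that the pages above explain in detail (''job.sh'' and the job ID are placeholders):

<code bash>
sbatch job.sh       # submit a batch job script to the queue
squeue -u $USER     # list your pending and running jobs
scancel 12345       # cancel a job by its job ID
sacct -j 12345      # show accounting information for a (finished) job
srun --pty bash     # request an interactive shell on a compute node
</code>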
  
^ Hosts ^ Nodes ^ Cores/Node ^ RAM/Core ^ RAM/Node ^ CPU features ^ GPU ^ on-GPU RAM ^ #Cores ^ #RAM ^ #GPU ^
| @#cfc:** sheldon-ng cluster** - FB Physik - Location: Takustraße 7 - OS: Debian/Bookworm |||||||||||
| @#cfc:x[001-016,049-160] | 128 | 24 | 5.2GB | 125GB | x86-64-v2 |  |  | 3072 | 16000GB | 0 |
| @#cfc:x[017-048] | 32 | 24 | 20.9GB | 502GB | x86-64-v2 |  |  | 768 | 16064GB | 0 |
| @#cfc:x[161-176] | 16 | 24 | 5.2GB | 125GB | x86-64-v3 |  |  | 384 | 2000GB | 0 |
| @#cfc:sheldon,x[177-178,180-222] | 45 | 24 | 42.0GB | 1007GB | x86-64-v3 |  |  | 1080 | 45315GB | 0 |
| @#cfc:xq[01-10] | 10 | 128 | 2.0GB | 250GB | x86-64-v3 | 2x A5000 | 24GB | 1280 | 2500GB | 20 |
| @#cfc:xgpu[01-05,07-13] | 12 | 16 | 11.7GB | 187GB | x86-64-v4 | 4x nVidia RTX 2080 TI | 11GB | 192 | 2244GB | 48 |
| @#cfc:xgpu06 | 1 | 16 | 11.2GB | 179GB | x86-64-v4 | 4x nVidia RTX 2080 TI | 11GB | 16 | 179GB | 4 |
| @#cfc:xgpu[14-23] | 10 | 16 | 11.7GB | 187GB | x86-64-v4 | 4x A5000 | 24GB | 160 | 1870GB | 40 |
| @#cfc:xgpu[24-25] | 2 | 16 | 11.7GB | 187GB | x86-64-v3 | 4x nVidia RTX 3090 | 24GB | 32 | 374GB | 8 |
| @#cfc:xgpu26 | 1 | 64 | 2.0GB | 125GB | x86-64-v3 | 10x A5000 | 24GB | 64 | 125GB | 10 |
| @#cfc:xgpu28 | 1 | 24 | 10.4GB | 250GB | x86-64-v3 | 4x nVidia RTX 6000 Ada | 48GB | 24 | 250GB | 4 |
| @#cfc:xgpu[29-33] | 5 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia Titan V | 12GB | 120 | 625GB | 20 |
| @#cfc:xgpu[27,34-52,54-56,58,62] | 25 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x A5000 | 24GB | 600 | 3125GB | 100 |
| @#cfc:xgpu57 | 1 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia RTX A6000 | 48GB | 24 | 125GB | 4 |
| @#cfc:xgpu[59-61] | 3 | 36 | 41.9GB | 1509GB | x86-64-v4 | 8x nVidia Tesla P100 | 16GB | 108 | 4527GB | 24 |
| @#cfc:xgpu63 | 1 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia RTX A4500 Ada | 24GB | 24 | 125GB | 4 |
| @#cfc:**#Taku 7** | **293** | | | | | | | **7948** | **95448GB** | **286** |
  
(07.02.2025)