Differences

This shows you the differences between two versions of the page.

--- services:cluster:start [2022/07/12 15:49] – [Slurm documentation] zedv
+++ services:cluster:start [2025/07/21 10:50] (current) – hoffmac00
@@ Line 1: / Line 1: @@
 ====== Information about the HPC-Cluster ======
-<note tip>If you have questions, you can find us on [[https://meet.physik.fu-berlin.de/#/room/!lwzXdWYwaTwKKSKfAb:physik.fu-berlin.de?via=physik.fu-berlin.de|Matrix in #hpc:physik.fu-berlin.de]]</note>
+<note tip>If you have questions, you can find us on [[https://meet.physik.fu-berlin.de/#/room/#hpc:physik.fu-berlin.de|Matrix in #hpc:physik.fu-berlin.de]]</note>
 ===== Access to the Cluster =====
-In order to get access to the department of physics HPC resources you need to send an email to hpc@physik.fu-berlin.de. Please supply the following information:
+In order to get access to the department of physics HPC resources you need to send an email to [[hpc@physik.fu-berlin.de]]. Please supply the following information:
   - Your ZEDAT account username
-  - The group you are using the system for (e.g. ag-netz,ag-imhof,...)
+  - The group you are using the system for (e.g. AG Netz, AG Eisert, AG Franke…)
-  - The software you are using for your simulations (e.g. gromacs, gaussian, self-written code in language XYZ, ...) and whether you use MPI or OpenCL/CUDA.
+  - The software you are using for your numerics (e.g. externally developed software like GROMACS or Gaussian, or self-written code in a language such as Python, Fortran, Julia or C). Also let us know if you have any special needs, e.g. if you use MPI or GPU offloading (OpenCL/CUDA/Vulkan Compute), or need special compiler toolchains.
   - Software that you happen to know so well that other HPC users within the department may ask you for help.
-  - A self-contained example job that is typical for the workload you will be using the HPC systems for, ideally with a small README describing how to run it and a job script. If possible scale it so it runs between a few minutes and an hour at maximum.
+  - A self-contained example job that is typical for the workload you will be using the HPC systems for.
-  - If you are no longer a member of the physics department, we would like to get an estimate on how much longer you will need access to the systems (e.g. to finish some paper)
+  - If you are no longer a member of the physics department, we would like to get an estimate on how much longer you will need access to the systems (e.g. to finish some paper).
+The example must contain:
+  - A small README describing how to run (and if necessary build) the example,
+  - a Slurm job script, and
+  - the program that is run in the example and/or all input files needed to run it. This includes data files and definitions of the environment the job is to run in (e.g. a ''requirements.txt'' for a Python virtual environment) or that are needed to build the software (e.g. a ''Cargo.lock'').
+If possible:
+  - The example should have an option to scale it so it runs between a few minutes and an hour at maximum, so that it can be used for benchmarking.
+If you are unsure what to tell us about your example, answering these questions can help:
+  - If you have written the code yourself, what dependencies does it have (e.g. Python libraries you import)?
+  - How long does your example run?
+  - How many CPUs and how much memory does the example need?
+  - Can the example's runtime be made to scale, preferably by changing a single parameter?
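As an illustration of the last question, here is a minimal sketch of an example program whose runtime scales with a single parameter. It is not taken from the cluster documentation: the script, the function ''estimate_pi'' and the ''--steps'' option are hypothetical names chosen for this sketch, and the workload (a numerical integration) merely stands in for your real computation.

```python
"""Hypothetical scalable example workload (illustration only)."""
import argparse
import time


def estimate_pi(steps: int) -> float:
    """Integrate 4/(1+x^2) on [0, 1] with the midpoint rule.

    The cost grows linearly with `steps`, so this one number
    controls how long the example runs.
    """
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Scalable toy workload")
    # --steps is the single knob that scales the runtime (assumed name).
    parser.add_argument("--steps", type=int, default=1_000_000)
    args = parser.parse_args(argv)

    start = time.perf_counter()
    result = estimate_pi(args.steps)
    elapsed = time.perf_counter() - start
    # Report the wall-clock time so the README can quote typical runtimes.
    print(f"steps={args.steps} result={result:.6f} elapsed={elapsed:.2f}s")


if __name__ == "__main__":
    main()
```

Running the script with a few different ''--steps'' values and noting the reported times also answers the question of how long the example runs.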
 ===== Slurm documentation =====
-  * [[important|Important notes]] on cluster usage
+Read this for an introduction to the Slurm queuing system if you haven't used an HPC cluster before and want to learn the workflow:
   * Start with the [[slurm|Introduction to the Slurm HPC cluster]].
-  * Using [[interactivesessions|interactive sessions]] with the queuing system.
-  * How to make use of the [[gpunodes|GPU-nodes]].
-  * Here is a [[nodes|list of special nodes]] that are currently not part of slurm.
-  * Here is a [[userlist|list of HPC users]] and the software they use
-  * Using [[sheldon-gpu|GPU nodes on sheldon]]
-===== General documentation =====
+Read this for some important notes on the specifics of our clusters:
-  * Robert Hübener from AG-Eisert has written a HOWTO for using [[mmacluster|Mathematica on a HPC-Cluster]].
+  * [[important|Important notes]] on cluster usage
-  * A more current Python version has been built for cluster usage. The [[pythoncluster|Python on the HPC-Cluster]] tutorial describes how to set it up.
-  * Try to [[usetmpforio|use /tmp for I/O intensive single node jobs]]
-===== Overview of available resources =====
+These are more specialised topics:
-The following table lists some HPC resources available at the physics department. At the end of the table we also list the resources for the ZEDAT [[http://www.zedat.fu-berlin.de/HPC/Soroban|soroban]] cluster. The tron cluster at Takustrasse 9 is currently restructured. We also have some [[nodes|special purpose nodes]] that are currently not managed by Slurm.
+  * Using [[interactivesessions|interactive sessions]] with the queuing system.
+  * Here is a [[nodes|list of special nodes]] that are currently not part of Slurm.
+  * Here is a [[userlist|list of HPC users]] and the software they use
-The name of the login node for each of our clusters has the same name as the cluster, e.g. the tron login node is reachable via ssh under the hostname ''tron''.
+===== Overview of available resources =====
-^ Hosts                                                                                              ^ Manager  ^ Nodes                                       ^ Form                                  ^ Hardware                ^ CPU                ^ Speed    ^ Core/Node  ^  RAM/Core ^ RAM/Node  ^  #RAM                                        ^  #Cores                                      ^
+The following table lists some HPC resources available at the physics department. The tron cluster at Takustraße 9 is currently being restructured. We also have some [[nodes|special purpose nodes]] that are not managed by Slurm.
-| @#cfc:**tron cluster** - FB Physik - Location: Takustrasse 9 - OS: Debian/Stretch                                                                                                                                                                                                                                                                                                     ||||||||||||
-| @#cfc:z001-z020                                                                                    | SLURM    |                                          20 | 1U                                    | IBM iDataPlex dx360 M4  | 2x Xeon E5-2680v2  | 2.8GHz   |  20        |  25G      |  512G     |                                       10024G |  400                                         |
-| @#cfc:z021-z040                                                                                    | SLURM    |                                          20 | 1U                                    | IBM iDataPlex dx360 M4  | 2x Xeon E5-2680v2  | 2.8GHz   |  20        |  12G      |  256G     |                                        5120G |  400                                         |
-| @#cfc:z041-z113                                                                                    | SLURM    |                                          72 | 2U GPU Nodes (2x Nvidia Tesla K20x)   | IBM iDataPlex dx360 M4  | 2x Xeon E5-2680v2  | 2.8GHz   |  20        |  6G       |  128G     |                                        9216G |  1440                                        |
-| @#cfc:z163-z166                                                                                    | SLURM    |                                           4 | 2U                                    | HP DL560 G8             | 4x Xeon E5-4650L   | 2.6GHz   |  32        |  24G      |  768G     |                                        3072G |  128                                         |
-| @#cfc:**#Taku9**                                                                                   |          |  **~~=sum(range(col(),1,col(),row()-1))~~** |                                       |                         |                    |          |            |           |           |  **~~=sum(range(col(),1,col(),row()-1))~~G** |  **~~=sum(range(col(),1,col(),row()-1))~~**  |
-|                                                                                                    |          |                                             |                                       |                         |                    |          |            |           |           |                                              |                                              |
-| @#ccf:**soroban cluster** - ZEDAT-HPC - Location: ZEDAT                                                                                                                                                                                                                                                                                                                               ||||||||||||
+The login node of each of our clusters has the same name as the cluster, e.g. the sheldon login node is reachable via ssh under the hostname ''sheldon.physik.fu-berlin.de'' (or just ''sheldon'' inside the department).
-| @#ccf:node001-002                                                                                  | SLURM    |                                           2 | 1U Twin                               | Asus Z8NH-D12           | 2x Xeon X5650      | 2.66GHz  | 12         |  8G       |  48G      |                                          96G |  24                                          |
-| @#ccf:node003-030                                                                                  | SLURM    |                                          28 | 1U Twin                               | Asus Z8NH-D12           | 2x Xeon X5650      | 2.66GHz  | 12         |  4G       |  24G      |                                         672G |  336                                         |
-| @#ccf:node031-100                                                                                  | SLURM    |                                          70 | 1U Twin                               | Asus Z8NH-D12           | 2x Xeon X5650      | 2.66GHz  | 12         |  8G       |  48G      |                                        3360G |  840                                         |
-| @#ccf:node101-112                                                                                  | SLURM    |                                          12 | 1U Twin                               | Asus Z8NH-D12           | 2x Xeon X5650      | 2.66GHz  | 12         |  16G      |  96G      |                                        1152G |  144                                         |
-| @#ccf:**#ZEDAT**                                                                                   |          |                                     **112** |                                       |                         |                    |          |            |           |           |                                    **5280G** |  **1344**                                    |
-|                                                                                                    |          |                                             |                                       |                         |                    |          |            |           |           |                                              |                                              |
-| @#ccc:Abacus4                                                                                      |          |                                           8 |                                       | IBM p575                | 16x POWER 5+       | 1.9Ghz   | 32         |  4G       |  128G     |                                        1024G |  256                                         |
+^ Hosts ^ Nodes ^ Cores/Node ^ RAM/Core ^ RAM/Node ^ CPU features ^ GPU ^ on-GPU RAM ^ #Cores ^ #RAM ^ #GPU ^
+| @#cfc:**sheldon-ng cluster** - FB Physik - Location: Takustraße 7 - OS: Debian/Bookworm |||||||||||
+| @#cfc:x[001-016,049-160] | 128 | 24 | 5.2GB | 125GB | x86-64-v2 |  |  | 3072 | 16000GB | 0 |
+| @#cfc:x[017-048] | 32 | 24 | 20.9GB | 502GB | x86-64-v2 |  |  | 768 | 16064GB | 0 |
+| @#cfc:x[161-176] | 16 | 24 | 5.2GB | 125GB | x86-64-v3 |  |  | 384 | 2000GB | 0 |
+| @#cfc:sheldon,x[177-178,180-222] | 45 | 24 | 42.0GB | 1007GB | x86-64-v3 |  |  | 1080 | 45315GB | 0 |
+| @#cfc:xq[01-10] | 10 | 128 | 2.0GB | 250GB | x86-64-v3 | 2x A5000 | 24GB | 1280 | 2500GB | 20 |
+| @#cfc:xgpu[01-05,07-13] | 12 | 16 | 11.7GB | 187GB | x86-64-v4 | 4x nVidia RTX 2080 Ti | 11GB | 192 | 2244GB | 48 |
+| @#cfc:xgpu06 | 1 | 16 | 11.2GB | 179GB | x86-64-v4 | 4x nVidia RTX 2080 Ti | 11GB | 16 | 179GB | 4 |
+| @#cfc:xgpu[14-23] | 10 | 16 | 11.7GB | 187GB | x86-64-v4 | 4x A5000 | 24GB | 160 | 1870GB | 40 |
+| @#cfc:xgpu[24-25] | 2 | 16 | 11.7GB | 187GB | x86-64-v3 | 4x nVidia RTX 3090 | 24GB | 32 | 374GB | 8 |
+| @#cfc:xgpu26 | 1 | 64 | 2.0GB | 125GB | x86-64-v3 | 10x A5000 | 24GB | 64 | 125GB | 10 |
+| @#cfc:xgpu28 | 1 | 24 | 10.4GB | 250GB | x86-64-v3 | 4x nVidia RTX A600 Ada | 48GB | 24 | 250GB | 4 |
+| @#cfc:xgpu[29-33] | 5 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia Titan V | 12GB | 120 | 625GB | 20 |
+| @#cfc:xgpu[27,34-52,54-56,58,62] | 25 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x A5000 | 24GB | 600 | 3125GB | 100 |
+| @#cfc:xgpu57 | 1 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia RTX A600 | 48GB | 24 | 125GB | 4 |
+| @#cfc:xgpu[59-61] | 3 | 36 | 41.9GB | 1509GB | x86-64-v4 | 8x nVidia Tesla P100 | 16GB | 108 | 4527GB | 24 |
+| @#cfc:xgpu63 | 1 | 24 | 5.2GB | 125GB | x86-64-v3 | 4x nVidia RTX A4500 Ada | 24GB | 24 | 125GB | 4 |
+| @#cfc:**#Taku 7** | **293** | | | | | | | **7948** | **95448GB** | **286** |
-(06.11.2018)
+(21.07.2025)
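The per-row totals in the table (#Cores, #RAM, #GPU) are the node count multiplied by the per-node values, and the last row sums them. The following sketch recomputes the totals from the rows listed above. The tuple layout and the function name ''totals'' are choices made for this illustration; the numbers are copied from the table.

```python
# (hosts, nodes, cores_per_node, ram_per_node_gb, gpus_per_node), copied from the table
ROWS = [
    ("x[001-016,049-160]", 128, 24, 125, 0),
    ("x[017-048]", 32, 24, 502, 0),
    ("x[161-176]", 16, 24, 125, 0),
    ("sheldon,x[177-178,180-222]", 45, 24, 1007, 0),
    ("xq[01-10]", 10, 128, 250, 2),
    ("xgpu[01-05,07-13]", 12, 16, 187, 4),
    ("xgpu06", 1, 16, 179, 4),
    ("xgpu[14-23]", 10, 16, 187, 4),
    ("xgpu[24-25]", 2, 16, 187, 4),
    ("xgpu26", 1, 64, 125, 10),
    ("xgpu28", 1, 24, 250, 4),
    ("xgpu[29-33]", 5, 24, 125, 4),
    ("xgpu[27,34-52,54-56,58,62]", 25, 24, 125, 4),
    ("xgpu57", 1, 24, 125, 4),
    ("xgpu[59-61]", 3, 36, 1509, 8),
    ("xgpu63", 1, 24, 125, 4),
]


def totals(rows):
    """Return (nodes, cores, ram_gb, gpus) summed over all rows."""
    nodes = sum(n for _, n, _, _, _ in rows)
    cores = sum(n * c for _, n, c, _, _ in rows)
    ram_gb = sum(n * r for _, n, _, r, _ in rows)
    gpus = sum(n * g for _, n, _, _, g in rows)
    return nodes, cores, ram_gb, gpus


print(totals(ROWS))  # (293, 7948, 95448, 286)
```

The result agrees with the **#Taku 7** summary row.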
 {{:fotos:dsc_0445wiki.jpg?width=370|}}{{:fotos:dsc_0450.jpg?width=370|}}
 {{:fotos:dsc_0446.jpg?width=740|}}