
Information about the HPC-Cluster

If you have questions, you can find us on Matrix in #hpc:physik.fu-berlin.de

Access to the Cluster

In order to get access to the department of physics HPC resources you need to send an email to hpc@physik.fu-berlin.de. Please supply the following information:

  1. Your ZEDAT account username
  2. The group you are using the system for (e.g. AG Netz, AG Eisert, AG Franke…)
  3. The software you are using for your numerics (e.g. externally developed software like GROMACS or Gaussian, or self-written code in languages such as Python, Fortran, Julia, or C). Also let us know if you have any special needs, e.g. if you use MPI, GPU offloading (OpenCL/CUDA/Vulkan Compute), or need special compiler toolchains.
  4. Software that you know well enough that other HPC users within the department may ask you for help with it.
  5. A self-contained example job that is typical for the workload you will be using the HPC systems for.
  6. If you are no longer a member of the physics department, an estimate of how much longer you will need access to the systems (e.g. to finish a paper).

The example must contain (a minimal sketch follows after the lists below):

  1. A small README describing how to run (and if necessary build) the example,
  2. a Slurm job script, and
  3. the program that is run in the example and/or all input files needed to run it. This includes data files as well as a definition of the environment the job runs in (e.g. a requirements.txt for a Python virtual environment) or that is needed to build the software (e.g. a Cargo.lock).

If possible:

  1. The example should have an option to scale it so it runs between a few minutes and an hour at maximum, so that it can be used for benchmarking.
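A minimal sketch of what such an example job script could look like, assuming a self-written Python script; the names simulate.py and requirements.txt, the NSTEPS parameter, and all resource values are placeholders to be adapted to your actual workload:

  #!/bin/bash
  # example-job.sh - minimal Slurm job script for the example (all values are placeholders)
  #SBATCH --job-name=example
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --mem=4G
  #SBATCH --time=00:30:00
  #SBATCH --output=example-%j.out

  # Recreate the environment described in requirements.txt (hypothetical file shipped with the example)
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt

  # NSTEPS controls the problem size and thereby the runtime (documented in the README)
  srun python3 simulate.py --steps "${NSTEPS:-1000}"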

If you can't answer the questions above for your example, working through the following points can help:

  1. If you have written the code yourself, what dependencies does it have (e.g. Python libraries you import)?
  2. How long does your example run?
  3. How many CPUs and how much memory does the example need?
  4. Can the example's runtime be made to scale, preferably by changing a single parameter?
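One way to answer the runtime and memory questions is Slurm's own accounting; a rough sketch (example-job.sh is the script from above, and 12345 stands for your actual job ID):

  # Submit the example and note the job ID that sbatch prints
  sbatch example-job.sh

  # While the job runs: check its state and elapsed time
  squeue -u "$USER"

  # After it has finished: query runtime, peak memory and CPU count from the accounting database
  sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,AllocCPUS,State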

Slurm documentation

Read this for an introduction to the Slurm queueing system if you haven't used an HPC cluster before and want to learn the workflow:
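For orientation, the day-to-day Slurm workflow boils down to a handful of commands (a sketch; my-job.sh and the job ID 12345 are placeholders):

  sbatch my-job.sh       # submit a job script; prints the assigned job ID
  squeue -u "$USER"      # list your own pending and running jobs
  scancel 12345          # cancel a job you no longer need
  sacct -X --format=JobID,JobName,Elapsed,State   # show recent jobs and how they ended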

Read this for some important notes on the specifics of our clusters.

These are more specialised topics:

Overview of available resources

The following table lists some HPC resources available at the physics department. The tron cluster at Takustraße 9 is currently being restructured. We also have some special purpose nodes that are not managed by Slurm.

The login node of each of our clusters has the same name as the cluster itself, e.g. the sheldon login node is reachable via ssh under the hostname sheldon.physik.fu-berlin.de (or just sheldon inside the department).
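For example, to log in with your ZEDAT account and get a first overview of the cluster (a sketch; zedatuser is a placeholder for your own username):

  # Log in to the sheldon login node
  ssh zedatuser@sheldon.physik.fu-berlin.de

  # On the login node: list partitions and node states
  sinfo

  # Show the resources (CPUs, memory, GPUs) of a single node, e.g. xgpu26
  scontrol show node xgpu26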

sheldon-ng cluster - FB Physik - Location: Takustraße 7 - OS: Debian/Bookworm

Hosts                      | Nodes | Cores/Node | RAM/Core | RAM/Node | CPU features | GPUs per node           | RAM/GPU | Total cores | Total RAM | Total GPUs
x[001-016,049-160]         |   128 |         24 |    5.2GB |    125GB | x86-64-v2    | -                       |       - |        3072 |   16000GB |          0
x[017-048]                 |    32 |         24 |   20.9GB |    502GB | x86-64-v2    | -                       |       - |         768 |   16064GB |          0
x[161-176]                 |    16 |         24 |    5.2GB |    125GB | x86-64-v3    | -                       |       - |         384 |    2000GB |          0
sheldon,x[177-178,180-222] |    45 |         24 |   42.0GB |   1007GB | x86-64-v3    | -                       |       - |        1080 |   45315GB |          0
xq[01-10]                  |    10 |        128 |    2.0GB |    250GB | x86-64-v3    | 2x A5000                |    24GB |        1280 |    2500GB |         20
xgpu[01-05,07-13]          |    12 |         16 |   11.7GB |    187GB | x86-64-v4    | 4x nVidia RTX 2080 TI   |    11GB |         192 |    2244GB |         48
xgpu06                     |     1 |         16 |   11.2GB |    179GB | x86-64-v4    | 4x nVidia RTX 2080 TI   |    11GB |          16 |     179GB |          4
xgpu[14-23]                |    10 |         16 |   11.7GB |    187GB | x86-64-v4    | 4x A5000                |    24GB |         160 |    1870GB |         40
xgpu[24-25]                |     2 |         16 |   11.7GB |    187GB | x86-64-v3    | 4x nVidia RTX 3090      |    24GB |          32 |     374GB |          8
xgpu26                     |     1 |         64 |    2.0GB |    125GB | x86-64-v3    | 10x A5000               |    24GB |          64 |     125GB |         10
xgpu28                     |     1 |         24 |   10.4GB |    250GB | x86-64-v3    | 4x nVidia RTX A600 Ada  |    48GB |          24 |     250GB |          4
xgpu[29-33]                |     5 |         24 |    5.2GB |    125GB | x86-64-v3    | 4x nVidia Titan V       |    12GB |         120 |     625GB |         20
xgpu[27,34-52,54-56,58,62] |    25 |         24 |    5.2GB |    125GB | x86-64-v3    | 4x A5000                |    24GB |         600 |    3125GB |        100
xgpu57                     |     1 |         24 |    5.2GB |    125GB | x86-64-v3    | 4x nVidia RTX A600      |    48GB |          24 |     125GB |          4
xgpu[59-61]                |     3 |         36 |   41.9GB |   1509GB | x86-64-v4    | 8x nVidia Tesla P100    |    16GB |         108 |    4527GB |         24
xgpu63                     |     1 |         24 |    5.2GB |    125GB | x86-64-v3    | 4x nVidia RTX A4500 Ada |    24GB |          24 |     125GB |          4
Total (Takustraße 7)       |   293 |            |          |          |              |                         |         |        7948 |   95448GB |        286

(07.02.2025)
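To run on one of the GPU nodes listed above, a job has to request GPUs explicitly; a minimal sketch (the resource values are placeholders, and the exact GRES names and any partition settings depend on the cluster's Slurm configuration):

  #!/bin/bash
  #SBATCH --job-name=gpu-example
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --mem=16G
  #SBATCH --time=01:00:00
  #SBATCH --gres=gpu:1     # request one GPU on the allocated node

  # nvidia-smi lists the GPU(s) that Slurm has assigned to this job
  srun nvidia-smi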
