
Information about the HPC-Cluster

If you have questions, you can find us on Matrix in #hpc:physik.fu-berlin.de

Access to the Cluster

In order to get access to the department of physics HPC resources you need to send an email to hpc@physik.fu-berlin.de. Please supply the following information:

  1. Your ZEDAT account username
  2. The group you are using the system for (e.g. AG Netz, AG Eisert, AG Franke…)
  3. The software you are using for your numerics (e.g. externally developed software like GROMACS or Gaussian, or self-written code in languages such as Python, Fortran, Julia, or C). Also let us know if you have any special needs, e.g. if you use MPI, GPU offloading (OpenCL/CUDA/Vulkan Compute), or need special compiler toolchains.
  4. Software that you know well enough that other HPC users within the department may ask you for help with it.
  5. A self-contained example job that is typical for the workload you will be using the HPC systems for.
  6. If you are no longer a member of the physics department, an estimate of how much longer you will need access to the systems (e.g. to finish a paper).

The example must contain (a minimal sketch follows after the lists below):

  1. A small README describing how to run (and if necessary build) the example,
  2. a Slurm job script, and
  3. the program that is run in the example and/or all input files needed to run it. This includes data files as well as a definition of the environment the job runs in (e.g. a requirements.txt for a Python virtual environment) or that is needed to build the software (e.g. a Cargo.lock).

If possible:

  1. The example should have an option to scale it so it runs between a few minutes and an hour at maximum, so that it can be used for benchmarking.
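A minimal sketch of what such an example job script could look like, assuming a self-written Python script; the names simulate.py and requirements.txt, the NSTEPS parameter, and all resource values are placeholders to be adapted to your actual workload:

  #!/bin/bash
  # example-job.sh - minimal Slurm job script for the example (all values are placeholders)
  #SBATCH --job-name=example
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --mem=4G
  #SBATCH --time=00:30:00
  #SBATCH --output=example-%j.out

  # Recreate the environment described in requirements.txt (hypothetical file shipped with the example)
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt

  # NSTEPS controls the problem size and thereby the runtime (documented in the README)
  srun python3 simulate.py --steps "${NSTEPS:-1000}"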

If you can't answer the questions above for your example, working through the following points can help:

  1. If you have written the code yourself, what dependencies does it have (e.g. Python libraries you import)?
  2. How long does your example run?
  3. How many CPUs and how much memory does the example need?
  4. Can the example's runtime be made to scale, preferably by changing a single parameter?
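One way to answer the runtime and memory questions is Slurm's own accounting; a rough sketch (example-job.sh is the script from above, and 12345 stands for your actual job ID):

  # Submit the example and note the job ID that sbatch prints
  sbatch example-job.sh

  # While the job runs: check its state and elapsed time
  squeue -u "$USER"

  # After it has finished: query runtime, peak memory and CPU count from the accounting database
  sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,AllocCPUS,State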

Slurm documentation

Read this for an introduction to the Slurm queueing system if you haven't used an HPC cluster before and want to learn the workflow:
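For orientation, the day-to-day Slurm workflow boils down to a handful of commands (a sketch; my-job.sh and the job ID 12345 are placeholders):

  sbatch my-job.sh       # submit a job script; prints the assigned job ID
  squeue -u "$USER"      # list your own pending and running jobs
  scancel 12345          # cancel a job you no longer need
  sacct -X --format=JobID,JobName,Elapsed,State   # show recent jobs and how they ended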

Read this for some important notes on the specifics of our clusters.

These are more specialised topics:

Overview of available resources

The following table lists some HPC resources available at the physics department. The tron cluster at Takustraße 9 is currently being restructured. We also have some special purpose nodes that are not managed by Slurm.

The login node of each of our clusters has the same name as the cluster itself, e.g. the sheldon login node is reachable via ssh under the hostname sheldon.physik.fu-berlin.de (or just sheldon inside the department).
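For example, to log in with your ZEDAT account and get a first overview of the cluster (a sketch; zedatuser is a placeholder for your own username):

  # Log in to the sheldon login node
  ssh zedatuser@sheldon.physik.fu-berlin.de

  # On the login node: list partitions and node states
  sinfo

  # Show the resources (CPUs, memory, GPUs) of a single node, e.g. xgpu26
  scontrol show node xgpu26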

sheldon-ng cluster - FB Physik - Location: Takustraße 7 - OS: Debian/Bookworm

Hosts                      | Nodes | Cores/Node | RAM/Core | RAM/Node | CPU features | GPUs per node           | RAM/GPU | Total cores | Total RAM | Total GPUs
x[001-016,049-160]         |   128 |         24 |    5.2GB |    125GB | x86-64-v2    | -                       |       - |        3072 |   16000GB |          0
x[017-048]                 |    32 |         24 |   20.9GB |    502GB | x86-64-v2    | -                       |       - |         768 |   16064GB |          0
x[161-176]                 |    16 |         24 |    5.2GB |    125GB | x86-64-v3    | -                       |       - |         384 |    2000GB |          0
sheldon,x[177-178,180-222] |    45 |         24 |   42.0GB |   1007GB | x86-64-v3    | -                       |       - |        1080 |   45315GB |          0
xq[01-10]                  |    10 |        128 |    2.0GB |    250GB | x86-64-v3    | 2x A5000                |    24GB |        1280 |    2500GB |         20
xgpu[01-05,07-13]          |    12 |         16 |   11.7GB |    187GB | x86-64-v4    | 4x nVidia RTX 2080 TI   |    11GB |         192 |    2244GB |         48
xgpu06                     |     1 |         16 |   11.2GB |    179GB | x86-64-v4    | 4x nVidia RTX 2080 TI   |    11GB |          16 |     179GB |          4
xgpu[14-23]                |    10 |         16 |   11.7GB |    187GB | x86-64-v4    | 4x A5000                |    24GB |         160 |    1870GB |         40
xgpu[24-25]                |     2 |         16 |   11.7GB |    187GB | x86-64-v3    | 4x nVidia RTX 3090      |    24GB |          32 |     374GB |          8
xgpu26                     |     1 |         64 |    2.0GB |    125GB | x86-64-v3    | 10x A5000               |    24GB |          64 |     125GB |         10
xgpu28                     |     1 |         24 |   10.4GB |    250GB | x86-64-v3    | 4x nVidia RTX A600 Ada  |    48GB |          24 |     250GB |          4
xgpu[29-33]                |     5 |         24 |    5.2GB |    125GB | x86-64-v3    | 4x nVidia Titan V       |    12GB |         120 |     625GB |         20
xgpu[27,34-52,54-56,58,62] |    25 |         24 |    5.2GB |    125GB | x86-64-v3    | 4x A5000                |    24GB |         600 |    3125GB |        100
xgpu57                     |     1 |         24 |    5.2GB |    125GB | x86-64-v3    | 4x nVidia RTX A600      |    48GB |          24 |     125GB |          4
xgpu[59-61]                |     3 |         36 |   41.9GB |   1509GB | x86-64-v4    | 8x nVidia Tesla P100    |    16GB |         108 |    4527GB |         24
xgpu63                     |     1 |         24 |    5.2GB |    125GB | x86-64-v3    | 4x nVidia RTX A4500 Ada |    24GB |          24 |     125GB |          4
Total (Takustraße 7)       |   293 |            |          |          |              |                         |         |        7948 |   95448GB |        286

(07.02.2025)
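To run on one of the GPU nodes listed above, a job has to request GPUs explicitly; a minimal sketch (the resource values are placeholders, and the exact GRES names and any partition settings depend on the cluster's Slurm configuration):

  #!/bin/bash
  #SBATCH --job-name=gpu-example
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --mem=16G
  #SBATCH --time=01:00:00
  #SBATCH --gres=gpu:1     # request one GPU on the allocated node

  # nvidia-smi lists the GPU(s) that Slurm has assigned to this job
  srun nvidia-smi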
