services:cluster:start
Differences
This shows you the differences between two versions of the page.
services:cluster:start [2025/02/06 13:42] – Don't link the general documentation anymore as both are severely outdated behrmj87 | services:cluster:start [2025/02/07 16:46] (current) – fix math in table behrmj87 | ||
---|---|---|---|
Line 8: | Line 8: | ||
- Your ZEDAT account username | - Your ZEDAT account username | ||
- | - The group you are using the system for (e.g. ag-netz,ag-imhof,...) | + | - The group you are using the system for (e.g. AG Netz, AG Eisert, AG Franke…) |
- | - The software you are using for your simulations | + | - The software you are using for your numerics |
- Software that you happen to know so well that other HPC users within the department may ask you for help. | - Software that you happen to know so well that other HPC users within the department may ask you for help. | ||
- | - A self-contained example job that is typical for the workload you will be using the HPC systems for, ideally **with a small README** describing how to run it **and a job script**. If possible scale it so it runs between a few minutes and an hour at maximum. | + | - A self-contained example job that is typical for the workload you will be using the HPC systems for. |
- | - If you are no longer a member of the physics department, we would like to get an estimate on how much longer you will need access to the systems (e.g. to finish some paper) | + | - If you are no longer a member of the physics department, we would like to get an estimate on how much longer you will need access to the systems (e.g. to finish some paper). |
+ | The example must contain: | ||
+ | |||
+ | - A small README | ||
+ | - a Slurm job script, and | ||
+ | - the program that is run in the example and/or all input files needed to run it, this includes data files and definitions for the environment the job is to run in (e.g. a '' | ||
+ | |||
+ | If possible: | ||
+ | |||
+ | - The example should have an option to scale it so it runs between a few minutes and an hour at maximum, so that it can be used for benchmarking. | ||
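As an illustration of the items above, here is a minimal sketch of a self-contained example job with a scaling option. It assumes that the job script may itself be a Python script: Slurm reads `#SBATCH` lines placed before the first executable statement of any script that starts with an interpreter line. The job name, the resource values, the output file name and the stand-in workload are placeholders, not settings taken from our clusters.

```python
#!/usr/bin/env python3
#SBATCH --job-name=example-job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:10:00
#SBATCH --output=example-job-%j.out
"""Hypothetical self-contained example job with a scaling knob."""
import os
import sys
import time


def workload(steps):
    """Stand-in numerics: left Riemann sum of 4/(1+x^2) on [0, 1], which tends to pi."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = i * h
        total += 4.0 / (1.0 + x * x)
    return total * h


def parse_steps(argv, default=100_000):
    """Return the step count from the first argument, or the default if it is absent or not a number."""
    return int(argv[1]) if len(argv) > 1 and argv[1].isdigit() else default


def main(argv):
    # The first argument scales the run time; the default is deliberately small.
    steps = parse_steps(argv)
    # Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given; fall back to 1 elsewhere.
    cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    start = time.perf_counter()
    result = workload(steps)
    elapsed = time.perf_counter() - start
    print(f"steps={steps} cpus={cpus} result={result:.6f} elapsed={elapsed:.2f}s")
    return result


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name such as `example_job.py`, the sketch could be submitted with `sbatch example_job.py 100000000`, where the trailing number sets the problem size and therefore the run time. It also runs without Slurm as `python3 example_job.py`, which makes it easy to describe in a README.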
+ | |||
+ | If you are unsure how to put the example together, answering the following questions can help: | ||
+ | |||
+ | - If you have written the code yourself, what dependencies does it have (e.g. Python libraries you import)? | ||
+ | - How long does your example run? | ||
+ | - How many CPUs and how much memory does the example need? | ||
+ | - Can the example' | ||
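For code written in Python, the questions about run time, memory and dependencies can be answered roughly from within the program itself. The following is a sketch using only the standard library. The `example` function is a hypothetical stand-in for the real computation, and the `resource` module is only available on Unix-like systems.

```python
import resource
import sys
import time


def profile(func, *args):
    """Run func(*args) and report wall time and the peak resident memory of this process."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak_kb


def top_level_imports():
    """Names of top-level modules loaded so far, as a rough first list of dependencies."""
    return sorted({name.split(".")[0] for name in sys.modules if not name.startswith("_")})


def example(n):
    # Hypothetical stand-in for the real computation.
    return sum(i * i for i in range(n))


result, elapsed, peak_kb = profile(example, 1000)
print(f"wall time: {elapsed:.3f} s, peak memory: {peak_kb / 1024:.1f} MiB")
print("loaded modules:", ", ".join(top_level_imports()[:10]))
```

The module list also contains standard-library modules, so it still has to be filtered by hand for third-party packages. For a job that has already run under Slurm, `sacct -j <jobid> --format=JobID,Elapsed,MaxRSS` reports comparable figures, provided job accounting is enabled on the cluster.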
===== Slurm documentation ===== | ===== Slurm documentation ===== | ||
- | * [[important|Important notes]] on cluster | + | Read this for an introduction to the Slurm queuing system, if you haven' |
* Start with the [[slurm|Introduction to the Slurm HPC cluster]]. | * Start with the [[slurm|Introduction to the Slurm HPC cluster]]. | ||
+ | |||
+ | Read this for some important notes on the specifics of our clusters. | ||
+ | |||
+ | * [[important|Important notes]] on cluster usage | ||
+ | |||
+ | These are more specialised topics: | ||
+ | |||
* Using [[interactivesessions|interactive sessions]] with the queuing system. | * Using [[interactivesessions|interactive sessions]] with the queuing system. | ||
- | * How to make use of the [[gpunodes|GPU-nodes]]. | ||
* Here is a [[nodes|list of special nodes]] that are currently not part of Slurm. | * Here is a [[nodes|list of special nodes]] that are currently not part of Slurm. | ||
* Here is a [[userlist|list of HPC users]] and the software they use | * Here is a [[userlist|list of HPC users]] and the software they use | ||
Line 25: | Line 48: | ||
===== Overview of available resources ===== | ===== Overview of available resources ===== | ||
- | <note important> | + | The following table lists some HPC resources available at the physics department. The tron cluster |
- | The following table lists some HPC resources available at the physics department. The tron cluster | + | The login node of each of our clusters has the same name as the cluster, e.g. the sheldon login node is reachable via ssh under the hostname '' |
- | The name of the login node for each of our clusters has the same name as the cluster, | + | ^ Hosts ^ Nodes ^ Cores/Node ^ RAM/Core ^ RAM/Node ^ CPU features ^ GPU ^ on-GPU RAM ^ #Cores ^ #RAM ^ #GPU ^ |
+ | | @#cfc:** sheldon-ng | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
+ | | @# | ||
- | ^ Hosts ^ Manager | ||
- | | @# | ||
- | | @# | ||
- | | @# | ||
- | | @# | ||
- | | @# | ||
- | | @# | ||
- | | | | | ||
- | (06.11.2018) | + | (07.02.2025) |
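The column headers suggest that the per-node and total columns follow from the per-core figures. The sketch below spells out that arithmetic under the assumption that RAM/Node = Cores/Node × RAM/Core, #Cores = Nodes × Cores/Node and #RAM = Nodes × RAM/Node. The numbers in the example row are hypothetical and not taken from the table.

```python
def derived_totals(nodes, cores_per_node, ram_per_core_gb):
    """Derive the per-node and total columns of one table row from its per-core figures."""
    ram_per_node_gb = cores_per_node * ram_per_core_gb
    return {
        "RAM/Node": ram_per_node_gb,
        "#Cores": nodes * cores_per_node,
        "#RAM": nodes * ram_per_node_gb,
    }


# Hypothetical row: 10 nodes with 24 cores each and 4 GB of RAM per core.
row = derived_totals(nodes=10, cores_per_node=24, ram_per_core_gb=4)
print(row)
```

A check like this can be rerun whenever a row changes, so that the totals stay consistent with the per-node values.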
services/cluster/start.1738849359.txt.gz · Last modified: 2025/02/06 13:42 by behrmj87