Introduction to GPU accelerated jobs

Currently we have 31 nodes in the yoshi cluster (ygpu01-ygpu31) equipped with GPU boards. The exact hardware configuration of each node is:

  • 2x NVIDIA Tesla M2070
  • 2x Intel Xeon X5570
  • 24GB RAM
  • QDR InfiniBand between all GPU nodes

In order to use the GPU cards, you need to allocate them through the queuing system using the --gres=gpu:2 option. You can also use just a single card by submitting with --gres=gpu:1. You also have to explicitly state the partition to run in using --partition=gpu-main (or gpu-test for the GPU test queue).
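For batch jobs the same options go into the script header. A minimal sketch of the relevant directives (the complete example script is shown at the end of this page):

# request both GPU cards of a node in the main GPU partition
#SBATCH --partition=gpu-main
#SBATCH --gres=gpu:2

# ...or request just one of the two cards
#SBATCH --gres=gpu:1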

GROMACS example using GPU acceleration

Here I give a simple example using GROMACS. First I'll use an interactive session to explore the GPU feature; at the end I'll supply a complete batch script for use with sbatch.

dreger@yoshi:~/gpu> sinfo | grep gpu
gpu-test     up    2:00:00      1   idle ygpu01
gpu-main     up   infinite     30   idle ygpu[02-31]

The test partition gpu-test, which consists of the single node ygpu01, will most likely be free, since it has a time limit of 2 hours. So we'll use that for testing:

dreger@yoshi:~/gpu> srun --time=02:00:00 --nodes=1 --ntasks=8 --gres=gpu:2 --partition=gpu-test --mem=1G --pty /bin/bash
dreger@ygpu01:~/gpu> env | grep CUDA
CUDA_VISIBLE_DEVICES=0,1
dreger@ygpu01:~/gpu> nvidia-smi
Thu Jun 18 14:16:19 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 340.65     Driver Version: 340.65         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2070         Off  | 0000:14:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      9MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2070         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      9MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+
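Slurm restricts a job to its allocated cards through the CUDA_VISIBLE_DEVICES variable; since we requested both cards with --gres=gpu:2, both device indices are listed. With --gres=gpu:1 you would expect to see only a single index, along the lines of this sketch:

dreger@ygpu01:~/gpu> echo $CUDA_VISIBLE_DEVICES
0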

The nvidia-smi command gives some information on the GPUs; currently no processes are running on them. We'll start a simple GROMACS computation:

dreger@ygpu01:~/gpu> module load gromacs/non-mpi/4.6.7-cuda
dreger@ygpu01:~/gpu> genbox -box 9 9 9 -p -cs spc216 -o waterbox.gro
dreger@ygpu01:~/gpu> grompp -f run.mdp -c waterbox.gro -p topol.top
dreger@ygpu01:~/gpu> mdrun
[...]
Using 2 MPI threads
Using 4 OpenMP threads per tMPI thread

2 GPUs detected:
  #0: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible

2 GPUs auto-selected for this run.
Mapping of GPUs to the 2 PP ranks in this node: #0, #1
[...]
               Core t (s)   Wall t (s)        (%)
       Time:      262.880       34.401      764.2
                 (ns/day)    (hour/ns)
Performance:       25.121        0.955
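As the log shows, mdrun auto-selects the GPUs and splits the work into thread-MPI ranks on its own. If you want to control the mapping yourself, GROMACS 4.6 accepts options along these lines (a sketch only; check mdrun -h of the installed version):

# run 2 thread-MPI ranks with 4 OpenMP threads each, pinned to GPUs 0 and 1
mdrun -ntmpi 2 -ntomp 4 -gpu_id 01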

While your job runs, you can log in to the node and call nvidia-smi to see whether the GPUs are being used at all:

dreger@ygpu01:~> nvidia-smi
Thu Jun 18 14:25:21 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 340.65     Driver Version: 340.65         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2070         Off  | 0000:14:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     67MiB /  5375MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2070         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     67MiB /  5375MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     11481  mdrun                                                 55MiB |
|    1     11481  mdrun                                                 55MiB |
+-----------------------------------------------------------------------------+
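If you don't know which node your job landed on, squeue will tell you; from there you can watch the utilization continuously. A sketch, assuming the job got scheduled on ygpu05:

dreger@yoshi:~> squeue -u $USER
dreger@yoshi:~> ssh ygpu05
dreger@ygpu05:~> watch -n 5 nvidia-smi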

Please check your job log files to see whether your program has problems using the GPUs. In the case of GROMACS this might look like:

NOTE: GPU(s) found, but the current simulation can not use GPUs
      To use a GPU, set the mdp option: cutoff-scheme = Verlet
      (for quick performance testing you can use the -testverlet option)

Using 8 MPI threads

2 GPUs detected:
  #0: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible

2 compatible GPUs detected in the system, but none will be used.
Consider trying GPU acceleration with the Verlet scheme!

In this case a cutoff-scheme was specified that cannot be used with GPU acceleration.
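The fix is exactly what the log message suggests: switch the cutoff scheme in the .mdp file. A minimal sketch of the relevant line in run.mdp:

; the Verlet scheme is required for GPU acceleration
cutoff-scheme = Verlet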

Compare the timings with a test run on the same node that does not use the GPUs. In some cases the GPUs will not help at all, even though nvidia-smi shows a high utilization. For this example without GPUs (note the missing -cuda suffix in the module load command) we get:

dreger@ygpu01:~/gpu> module load gromacs/non-mpi/4.6.7
dreger@ygpu01:~/gpu> grompp -f run.mdp -c waterbox.gro -p topol.top
dreger@ygpu01:~/gpu> mdrun

               Core t (s)   Wall t (s)        (%)
       Time:      844.970      106.315      794.8
                 (ns/day)    (hour/ns)
Performance:        8.128        2.953

So in this case the calculation runs about three times faster with two GPU cards (34.401 s instead of 106.315 s wall time, a speedup of roughly 3.1x).

Example batch file

A job script for the example given above could look like:

#!/bin/bash

#SBATCH --mail-user=dreger@physik.fu-berlin.de
#SBATCH --mail-type=end

#SBATCH --output=job%j.out
#SBATCH --error=job%j.err
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=1024
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:2
#SBATCH --nodes=1
#SBATCH --partition=gpu-main

module load gromacs/non-mpi/4.6.7-cuda

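# tag output files with the job id and host name to tell runs apart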
TAG="${SLURM_JOB_ID}-$(hostname -s)-cuda"

grompp -f run.mdp -c waterbox.gro -p topol.top -o output-$TAG
mdrun -nt ${SLURM_CPUS_ON_NODE} -testverlet -v -deffnm output-$TAG
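
To submit, save the script (here assumed to be called job.sh) and hand it to sbatch; the job ID in the reply will of course differ:

dreger@yoshi:~/gpu> sbatch job.sh
Submitted batch job 12345
dreger@yoshi:~/gpu> squeue -u $USER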

Please make sure you change the email address if you use this for your own tests ;)
