Introduction to GPU-accelerated jobs

Currently we have 31 nodes in the yoshi cluster (ygpu01-ygpu31) equipped with GPU boards. Each node has the following hardware configuration:

  • 2x NVidia Tesla M2070
  • 2x Xeon X5570
  • 24GB RAM
  • QDR Infiniband between all GPU nodes

In order to use the GPU cards, you need to allocate them through the queuing system using the --gres=gpu:2 option. If you only need one card, submit with --gres=gpu:1 instead. You also have to explicitly state the partition to run in using --partition=gpu-main (or gpu-test for the GPU test queue).
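
For example, the relevant directives in a batch script could look like this (a minimal sketch; a complete script follows at the end of this page):

#SBATCH --partition=gpu-main   # or gpu-test for short tests (2 hour limit)
#SBATCH --gres=gpu:2           # allocate both GPU cards of the node
#SBATCH --nodes=1              # the GPUs are local to a single node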

GROMACS example using GPU acceleration

Here I give a simple example using GROMACS. First I'll use an interactive session to explore the GPU feature; at the end I'll supply a complete batch script for use with sbatch.

dreger@yoshi:~/gpu> sinfo | grep gpu
gpu-test     up    2:00:00      1   idle ygpu01
gpu-main     up   infinite     30   idle ygpu[02-31]

The test partition gpu-test, which consists of the single node ygpu01, will most likely be free, since it has a time limit of 2 hours. So we'll use that for testing:

dreger@yoshi:~/gpu> srun --time=02:00:00 --nodes=1 --tasks=8 --gres=gpu:2 --partition=gpu-test --mem=1G --pty /bin/bash
dreger@ygpu01:~/gpu> env | grep CUDA
CUDA_VISIBLE_DEVICES=0,1
dreger@ygpu01:~/gpu> nvidia-smi
Thu Jun 18 14:16:19 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 340.65     Driver Version: 340.65         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2070         Off  | 0000:14:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      9MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2070         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      9MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

The nvidia-smi command shows status information for the GPUs; currently no compute process is running on them. We'll start a simple GROMACS computation:

dreger@ygpu01:~/gpu> module load gromacs/non-mpi/4.6.7-cuda
dreger@ygpu01:~/gpu> genbox -box 9 9 9 -p -cs spc216 -o waterbox.gro
dreger@ygpu01:~/gpu> grompp -f run.mdp -c waterbox.gro -p topol.top
dreger@ygpu01:~/gpu> mdrun
[...]
Using 2 MPI threads
Using 4 OpenMP threads per tMPI thread

2 GPUs detected:
  #0: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible

2 GPUs auto-selected for this run.
Mapping of GPUs to the 2 PP ranks in this node: #0, #1
[...]
               Core t (s)   Wall t (s)        (%)
       Time:      262.880       34.401      764.2
                 (ns/day)    (hour/ns)
Performance:       25.121        0.955
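
The (%) column is simply core time divided by wall time: 262.880 s / 34.401 s ≈ 7.64, i.e. on average about 7.6 of the 8 allocated cores were kept busy during the run.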

While your job runs, you can log in to the node and call nvidia-smi to see whether the GPUs are being used at all:

dreger@ygpu01:~> nvidia-smi
Thu Jun 18 14:25:21 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 340.65     Driver Version: 340.65         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2070         Off  | 0000:14:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     67MiB /  5375MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2070         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     67MiB /  5375MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     11481  mdrun                                                 55MiB |
|    1     11481  mdrun                                                 55MiB |
+-----------------------------------------------------------------------------+
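
Instead of calling nvidia-smi repeatedly, you can keep an eye on the utilization with watch, for example:

watch -n 5 nvidia-smi   # refresh the GPU status every 5 seconds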

Please check your job log files to see whether your program has problems using the GPUs. In the case of GROMACS this might look like:

NOTE: GPU(s) found, but the current simulation can not use GPUs
      To use a GPU, set the mdp option: cutoff-scheme = Verlet
      (for quick performance testing you can use the -testverlet option)

Using 8 MPI threads

2 GPUs detected:
  #0: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible

2 compatible GPUs detected in the system, but none will be used.
Consider trying GPU acceleration with the Verlet scheme!

In this case a cutoff scheme was specified that cannot be used with GPU acceleration.
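
The fix is to select the Verlet scheme in the .mdp file, e.g.:

; run.mdp: the Verlet cutoff scheme is required for GPU acceleration
cutoff-scheme = Verlet

For a quick performance test you can instead pass -testverlet to mdrun, as the note above suggests (the batch script below does exactly that).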

Compare the timings with a test run on the same node that does not use the GPUs. In some cases the GPUs will not help at all, even though nvidia-smi shows high utilization. For this example without GPUs (note the missing -cuda suffix in the module load command) we get:

dreger@ygpu01:~/gpu> module load gromacs/non-mpi/4.6.7
dreger@ygpu01:~/gpu> grompp -f run.mdp -c waterbox.gro -p topol.top
dreger@ygpu01:~/gpu> mdrun

               Core t (s)   Wall t (s)        (%)
       Time:      844.970      106.315      794.8
                 (ns/day)    (hour/ns)
Performance:        8.128        2.953

So in this case the calculation runs about three times faster with the two GPU cards.
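
The speedup follows directly from the wall times:

# wall-time ratio of the CPU-only run to the GPU run
echo "scale=2; 106.315 / 34.401" | bc   # prints 3.09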

Example batch file

A job script for the example given above could look like:

#!/bin/bash

#SBATCH --mail-user=dreger@physik.fu-berlin.de
#SBATCH --mail-type=end                 # send a mail when the job finishes

#SBATCH --output=job%j.out              # %j is replaced by the job ID
#SBATCH --error=job%j.err
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=1024              # memory per core in MB
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:2                    # allocate both GPU cards of the node
#SBATCH --nodes=1
#SBATCH --partition=gpu-main

module load gromacs/non-mpi/4.6.7-cuda

# tag the output files with job ID and host name
TAG="${SLURM_JOB_ID}-$(hostname -s)-cuda"

grompp -f run.mdp -c waterbox.gro -p topol.top -o output-$TAG
mdrun -nt ${SLURM_CPUS_ON_NODE} -testverlet -v -deffnm output-$TAG

Please make sure you change the email address if you use this for your own tests ;)
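
Assuming you saved the script as job.sh (the file name is just an example), submit it and check its status like this:

sbatch job.sh
squeue -u $USER   # list your queued and running jobs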
