Introduction to GPU accelerated jobs
Currently we have 31 nodes in the yoshi cluster (ygpu01-ygpu31) equipped with GPU boards. The exact hardware config is:
- 2x NVIDIA Tesla M2070
- 2x Xeon X5570
- 24GB RAM
- QDR Infiniband between all GPU nodes
In order to use the GPU cards, you need to allocate them through the queuing system using the --gres=gpu:2 option. You can also use just one card by submitting with --gres=gpu:1. You also have to explicitly state the partition to run in using --partition=gpu-main (or gpu-test for the GPU test queue).
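For illustration, a minimal sketch of the corresponding sbatch directives for a two-GPU job could look like this; the job name, time limit, memory request and the program to run are placeholders that you need to adapt:

#!/bin/bash
#SBATCH --job-name=gpu-job          # placeholder job name
#SBATCH --partition=gpu-main        # or gpu-test for the 2 hour test queue
#SBATCH --nodes=1
#SBATCH --gres=gpu:2                # request both GPU cards (use gpu:1 for a single card)
#SBATCH --time=01:00:00             # placeholder time limit
#SBATCH --mem=1G                    # placeholder memory request

srun ./my_gpu_program               # placeholder for your GPU application

Submit such a script with sbatch <scriptname>.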
GROMACS example using GPU acceleration
Here I give a simple example using GROMACS. First I'll use an interactive session to explore the GPU feature; at the end I'll supply a complete batch script for use with sbatch.
dreger@yoshi:~/gpu> sinfo | grep gpu
gpu-test     up    2:00:00      1   idle ygpu01
gpu-main     up   infinite     30   idle ygpu[02-31]
The test partition gpu-test, which consists of the single node ygpu01, will most likely be free since it has a time limit of 2 hours. So we'll use that for testing:
dreger@yoshi:~/gpu> srun --time=02:00:00 --nodes=1 --tasks=8 --gres=gpu:2 --partition=gpu-test --mem=1G --pty /bin/bash
dreger@ygpu01:~/gpu> env | grep CUDA
CUDA_VISIBLE_DEVICES=0,1
dreger@ygpu01:~/gpu> nvidia-smi
Thu Jun 18 14:16:19 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.65     Driver Version: 340.65        |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2070         Off  | 0000:14:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      9MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2070         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      9MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+
The nvidia-smi command shows some information about the GPUs; at the moment no processes are running on them. Now we'll start a simple GROMACS computation:
dreger@ygpu01:~/gpu> module load gromacs/non-mpi/4.6.7-cuda
dreger@ygpu01:~/gpu> genbox -box 9 9 9 -p -cs spc216 -o waterbox.gro
dreger@ygpu01:~/gpu> grompp -f run.mdp -c waterbox.gro -p topol.top
dreger@ygpu01:~/gpu> mdrun
[...]
Using 2 MPI threads
Using 4 OpenMP threads per tMPI thread
2 GPUs detected:
  #0: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
2 GPUs auto-selected for this run.
Mapping of GPUs to the 2 PP ranks in this node: #0, #1
[...]
               Core t (s)   Wall t (s)        (%)
       Time:      262.880       34.401      764.2
                 (ns/day)    (hour/ns)
Performance:       25.121        0.955
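Finally, here is a sketch of a complete batch script for use with sbatch that corresponds to the interactive session above. The resource requests mirror the srun example; the job name and time limit are placeholders, and run.mdp plus topol.top are assumed to already exist in the working directory, as in the interactive run:

#!/bin/bash
#SBATCH --job-name=gromacs-gpu      # placeholder job name
#SBATCH --partition=gpu-main
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gres=gpu:2
#SBATCH --mem=1G
#SBATCH --time=04:00:00             # placeholder time limit

module load gromacs/non-mpi/4.6.7-cuda

# Build a 9x9x9 nm water box, prepare the run input, then let mdrun
# auto-select both GPUs as in the interactive session above.
genbox -box 9 9 9 -p -cs spc216 -o waterbox.gro
grompp -f run.mdp -c waterbox.gro -p topol.top
mdrun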
