services:cluster:gpunodes (GROMACS example using GPU acceleration), revision 2015/06/18 13:02 by dreger
dreger@ygpu01:~/gpu> **genbox -box 9 9 9 -p -cs spc216 -o waterbox.gro**
dreger@ygpu01:~/gpu> **grompp -f {{:services:cluster:run.mdp|}} -c waterbox.gro -p {{:services:cluster:topol.top|}}**
dreger@ygpu01:~/gpu> **mdrun**
[...]
Using 2 MPI threads
Using 4 OpenMP threads per tMPI thread

2 GPUs detected:
  #0: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible

2 GPUs auto-selected for this run.
Mapping of GPUs to the 2 PP ranks in this node: #0, #1
[...]
               Core t (s)   Wall t (s)        (%)
       Time:      262.880       34.401      764.2
                 (ns/day)    (hour/ns)
Performance:       25.121        0.955
</xterm>

While your job is running you can log in to the node and call ''nvidia-smi'' to check whether the GPUs are being used at all:

<xterm>
dreger@ygpu01:~> **nvidia-smi**
Thu Jun 18 14:25:21 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.65     Driver Version: 340.65         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2070         Off  | 0000:14:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     67MiB /  5375MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2070         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     67MiB /  5375MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     11481  mdrun                                                 55MiB |
|    1     11481  mdrun                                                 55MiB |
+-----------------------------------------------------------------------------+
</xterm>
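If you only need the utilization numbers, you can filter them out of the ''nvidia-smi'' output. A minimal sketch; it is fed a sample line copied from the session above so it is self-contained, and on the node you would pipe the live ''nvidia-smi'' output instead:

```shell
# Extract the GPU-Util percentages from nvidia-smi's table output.
# The sample line is copied from the session above; on the node,
# replace the echo with a real nvidia-smi call.
sample='| N/A   N/A   P0    N/A /  N/A |     67MiB /  5375MiB |     76%      Default |'
echo "$sample" | grep -o '[0-9][0-9]*%'
```

This prints one percentage per matching line, here ''76%''.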

Please check your job log files to see whether your program has problems using the GPUs. For GROMACS this might look like:

<xterm>
**NOTE: GPU(s) found, but the current simulation can not use GPUs
      To use a GPU, set the mdp option: cutoff-scheme = Verlet
      (for quick performance testing you can use the -testverlet option)**

Using 8 MPI threads

2 GPUs detected:
  #0: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible

**2 compatible GPUs detected in the system, but none will be used.
Consider trying GPU acceleration with the Verlet scheme!**
</xterm>

In this case a cutoff-scheme was specified that cannot be used with GPU acceleration.
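The fix is to select the Verlet scheme in the run parameters, as the note suggests. The relevant line in ''run.mdp'' (a fragment; the rest of the file stays unchanged):

```
; run.mdp: the Verlet cutoff-scheme allows mdrun to offload
; non-bonded interactions to the GPUs
cutoff-scheme = Verlet
```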

Compare the timings with a test run on the same node that does not use the GPUs. In some cases the GPUs will not help at all, even though ''nvidia-smi'' shows high utilization. For this example without GPUs (note the missing ''-cuda'' suffix in the module load command) we get:

<xterm>
dreger@ygpu01:~/gpu> **module load gromacs/non-mpi/4.6.7**
dreger@ygpu01:~/gpu> **grompp -f run.mdp -c waterbox.gro -p topol.top**
dreger@ygpu01:~/gpu> **mdrun**

               Core t (s)   Wall t (s)        (%)
       Time:      844.970      106.315      794.8
                 (ns/day)    (hour/ns)
Performance:        8.128        2.953
</xterm>

So in this case the calculation runs about three times faster with the two GPU cards.
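The roughly threefold factor follows directly from the two ''Performance'' lines (25.121 ns/day with GPUs vs. 8.128 ns/day without); a quick check in the shell:

```shell
# Speedup of the GPU-accelerated run over the CPU-only run, using the
# ns/day figures reported by mdrun in the two sessions above.
gpu_perf=25.121   # ns/day, two GPUs
cpu_perf=8.128    # ns/day, no GPUs
awk -v g="$gpu_perf" -v c="$cpu_perf" 'BEGIN { printf "speedup: %.2fx\n", g / c }'
# prints: speedup: 3.09x
```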

===== Example batch file =====

A job script for the example given above could look like:

<xterm>
#!/bin/bash

#SBATCH --mail-user=dreger@physik.fu-berlin.de
#SBATCH --mail-type=end

#SBATCH --output=job%j.out
#SBATCH --error=job%j.err
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=1024
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:2
#SBATCH --nodes=1
#SBATCH --partition=gpu-main

module load gromacs/non-mpi/4.6.7-cuda

TAG="${SLURM_JOB_ID}-$(hostname -s)-cuda"

grompp -f run.mdp -c waterbox.gro -p topol.top -o output-$TAG
mdrun -nt ${SLURM_CPUS_ON_NODE} -testverlet -v -deffnm output-$TAG
</xterm>

Please make sure you change the email address if you use this for your own tests ;)

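The ''TAG'' variable in the script above just combines the Slurm job ID with the short hostname, so every run gets unique output file names. You can try the expansion outside of Slurm by setting ''SLURM_JOB_ID'' yourself (the job ID below is made up for illustration):

```shell
# Reproduce the output naming used in the job script above.
# SLURM_JOB_ID is normally set by Slurm; it is faked here so the
# snippet can run on any machine.
SLURM_JOB_ID=12345
TAG="${SLURM_JOB_ID}-$(hostname -s)-cuda"
echo "$TAG"
```

On a node this prints something like ''12345-ygpu01-cuda''; submit the real script with ''sbatch'' as usual.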
services/cluster/gpunodes.txt · Last modified: 2024/04/26 14:37 by hoffmac00