Mathematica on a cluster HOWTO
Basics
On the cluster, Mathematica can be started without a GUI from the command line (interactively or from batch files). In this mode, it processes so-called m-files (files with the suffix .m). These m-files are ordinary Mathematica notebooks that have been 'saved as' m-files, an option available in the File menu.
IMPORTANT NOTE: Every cell in the m-file that is supposed to be processed by Mathematica (and not ignored) has to be marked as a so-called 'initialization cell' in the Cell menu.
Given an m-file "a.m", one starts it interactively from the shell in the following manner:
math -run "<<a.m"
This shell command can be used as is in a script file or interactively on the cluster. Mathematica then successively evaluates all the initialization cells it finds in the m-file; all other cells are ignored. It writes text output to the shell (but no graphics), and save/load commands (see below) can write and read data files.
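For batch use, this command typically goes into a job script submitted to the queueing system. The following is only a sketch: the resource requests and the math path are assumptions that have to be adapted to the cluster at hand.

```
#!/bin/bash
# Minimal PBS job script (sketch; resources and paths are placeholders)
#PBS -l nodes=2:ppn=12
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR        # run in the directory the job was submitted from
math -run "<<a.m"        # process the initialization cells of a.m
```

The script would then be submitted with qsub as usual.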
Saving and loading
This is important for data management. One can save data via a notebook command; for example, to save a list called list1 to a file called file1:
Put[list1, "file1"]
Similarly, one can load data via
list1 = Get["file1"]
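A minimal round trip using these two commands looks as follows (the list contents and the file name are arbitrary examples):

```
list1 = {1, 2, 3};
Put[list1, "file1"]       (* writes the expression to the file file1 *)
list1 = Get["file1"]      (* reads it back from the file *)
```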
IMPORTANT NOTE: The alternative command
list1 = << "file1"
works in a notebook, but not in an m-file.
Parallel computing
The whole point of going to a cluster is parallel computing. In the notebook, one can make use of parallelism by using commands like
ParallelTable[x,{x,0,10}]
which replaces the usual Table command. To make it actually work and use several kernels, one has to configure Mathematica properly. Usually we would use a special solution called GridMathematica, but one can also do it with the normal Mathematica and some notebook magic.
The general idea is then the following: We use the usual math command (see above) from the command line to start one master kernel (i.e. one instance of Mathematica). Using the m-file that is processed by this master kernel, we then
1) start subkernels doing the actual work
2) take care of data distribution
3) collect data from the subkernels after/during computations
Steps 2) and 3) are more or less automatic. For step 1), Mathematica has to know about the available nodes.
So step 1) works as follows. After submitting a job to the cluster, an environment variable called $PBS_NODEFILE holds the path of the node file, which contains the names of the nodes that have been reserved for the job. In the following we
1) read out the environment variable $PBS_NODEFILE
2) use it to access the node file and read its content
3) configure subkernels (one for each processor)
4) launch the subkernels
Afterwards, the master kernel has started and is ready to use the subkernels for parallel computing. The appropriate commands are
Needs["SubKernels`RemoteKernels`"]
nodefile = Environment["PBS_NODEFILE"]
nodes = ReadList[nodefile]
machines = Table[
  RemoteMachine["\"" <> ToString[n] <> "\"",
    "ssh -x -f -l `3` `1` /net/opt/bin/math -mathlink -linkmode Connect `4` -linkname `2` -subkernel -noinit",
    1],
  {n, nodes}]
LaunchKernels[machines]
That's it. Note that PBS_NODEFILE contains one entry per processor, so that we have, say, n010 appearing 12 times in PBS_NODEFILE. This is fine, because each entry is fed into the machine-creation command [step 4 above], hence creating one subkernel per processor.
IMPORTANT NOTES:
- For some reason, Mathematica doesn't like it if we try to create more than about 250 subkernels. The procedure that usually works just hangs after about 250 subkernels have been created, giving a generic 'MathLink error'. No idea why; it could be the ssh configuration or Mathematica itself.
- "Parallel" commands like ParallelTable etc. accept an option called Method. This option has sometimes a huge impact on the achieved load. Although the default works OK, for parallel tasks with vastly different run times in each instance, the help advises to use Method→"FinestGrained". For me, this caused a very low load. Using Method→"CoarsestGrained" seems to work fine.
Data consistency
Sometimes several subkernels work on data that will be stored in a single data object, say they compute entries of a table, where the whole table is the common data object. In this case, it is important that they do not interfere when modifying the common data object or when saving it to a file. For this purpose, Mathematica allows such updates to be made atomic.
One needs to
- define which variables will be shared
- define an "unused" lock-variable, say lock1
- use a construction called CriticalSection
The following example is instructive:
n = 0
SetSharedVariable[n]
Clear[lock1]
ParallelDo[CriticalSection[{lock1}, n = n + 1], {i, 1, 10}]
Without the locking, each subkernel would fetch the value of n currently in memory and write it back after adding 1. The result is then more or less undefined (it could be anything from n = 1 to n = 10). With locking, the update of n is protected and the other subkernels wait. This is obviously also useful for writing to a file.
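The same pattern protects file output. The following sketch assumes the subkernels see a shared filesystem (as is typical on a cluster); the lock variable and file name are arbitrary examples:

```
Clear[lock2]
ParallelDo[
  CriticalSection[{lock2},
    PutAppend[i^2, "results"]    (* only one subkernel appends at a time *)
  ],
  {i, 1, 10}]
```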