On the cluster, Mathematica can be started without the GUI from the command line (in interactive mode and from batch files). In this mode, it processes so-called m-files (suffix .m). These m-files are normal Mathematica notebooks that have been saved in m-file format, which can be chosen via 'Save As' in the File menu.
IMPORTANT NOTE: Every cell in the m-file that is supposed to be processed by Mathematica (and not ignored) has to be marked as a so-called 'initialization cell' via the Cell menu.
Given an m-file "a.m", one starts it interactively from the shell in the following manner:
math -run "<<a.m"
This shell command can be used as-is on the cluster, both interactively and in a script file. Mathematica then successively computes all the initialization cells it finds in the m-file; all other cells are ignored. It writes text output to the shell (but no graphics), and save/load commands (see below) can produce and retrieve data files.
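For batch operation, this command is typically wrapped in a PBS job script submitted with qsub. The following is a minimal sketch; the resource requests and the file name a.m are placeholders to be adapted to your cluster:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
# run in the directory the job was submitted from,
# so that a.m and its data files are found
cd $PBS_O_WORKDIR
math -run "<<a.m"
```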
This is important for data management. One can save data via a notebook command; say we want to save a list called list1 to a file called file1:
Put[list1, "file1"]
Similarly, one can load data via
list1 = Get["file1"]
IMPORTANT NOTE: The alternative command
list1 = << "file1"
works in a notebook, but not in an m-file.
The whole point of going to a cluster is parallel computing. In the notebook, one can make use of parallelism by using commands like
ParallelTable[x,{x,0,10}]
which replaces the usual Table command. To make it actually work and use several kernels, one has to configure Mathematica properly. Usually one would use a dedicated product called GridMathematica, but one can also do it with normal Mathematica and some notebook magic.
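Before doing the full cluster setup, one can check this locally: on a multicore machine, a few subkernels can be launched by hand. A minimal sketch (the kernel count 4 is an assumption, adjust it to your machine):

```mathematica
LaunchKernels[4]                 (* start four local subkernels *)
ParallelTable[x^2, {x, 0, 10}]   (* the iterations are distributed over the subkernels *)
CloseKernels[]                   (* shut the subkernels down again *)
```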
The general idea is then the following: We use the usual math command (see above) from the command line to start one master kernel (i.e. one instance of Mathematica). Using the m-file that is processed by this master kernel, we then 1) launch one subkernel on each processor that has been reserved for the job, 2) distribute the work to the subkernels via the Parallel commands, and 3) collect the results in the master kernel. Steps 2) and 3) are more or less automatic. For step 1), Mathematica has to know about the available nodes.
Step 1) works as follows. After submitting a job to the cluster, there is an environment variable called PBS_NODEFILE telling us where to find the node file, i.e. the file that contains the names of the nodes which have been reserved for the job. In the following we read this environment variable, and then the node file itself, from within the master kernel.
Afterwards, once the master kernel has started, it is ready to launch the subkernels for parallel computing. The appropriate commands are:
Needs["SubKernels`RemoteKernels`"]
nodefile = Environment["PBS_NODEFILE"]
nodes = ReadList[nodefile]
machines = Table[
    RemoteMachine["\"" <> ToString[n] <> "\"",
        "ssh -x -f -l `3` `1` /net/opt/bin/math -mathlink -linkmode Connect `4` -linkname `2` -subkernel -noinit",
        1],
    {n, nodes}]
LaunchKernels[machines]
That's it. Note that PBS_NODEFILE contains one entry per processor, so that we may have, say, n010 12 times in PBS_NODEFILE. This is fine, because we feed each entry into the machine-creation command above (the Table over RemoteMachine), hence creating one subkernel for each processor.
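As a quick sanity check after LaunchKernels, one can compare the number of running subkernels with the number of entries in the node file (a sketch, using the nodes list read above):

```mathematica
Length[Kernels[]] == Length[nodes]  (* True if one subkernel runs per reserved processor *)
```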
IMPORTANT NOTES:
Sometimes several subkernels work on data that will be stored in a single data object; say they compute entries of a table, where the whole table is the common data object. In this case it is important that they do not interfere when modifying the common data object or when saving it to a file. For this purpose, Mathematica provides shared variables and critical sections.
One needs to declare the common data object as a shared variable (SetSharedVariable) and wrap each update in a CriticalSection, which lets only one subkernel at a time execute the protected code. It is instructive to see the following example:
n = 0
SetSharedVariable[n]
Clear[lock1]
ParallelDo[
    CriticalSection[{lock1}, n = n + 1],
    {i, 1, 10}]
Without the locking, each subkernel would fetch the value of n currently in memory and write it back after adding 1. The result is then more or less undefined (it could be anything from n = 1 to n = 10). With locking, the updating of n is protected and the other subkernels wait their turn. This is obviously also useful for writing to a file.
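The same pattern can protect file output, so that only one subkernel at a time appends to a shared result file. A minimal sketch following the example above (the file name results.dat is a placeholder):

```mathematica
Clear[lock2]
ParallelDo[
    CriticalSection[{lock2},
        PutAppend[{i, i^2}, "results.dat"]  (* only one subkernel writes at a time *)
    ],
    {i, 1, 10}]
```

Afterwards, ReadList["results.dat"] on the master kernel retrieves the appended entries.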