====== Mathematica on a cluster HOWTO ======

===== Basics =====

On the cluster, Mathematica can be started without a GUI from the command line, both interactively and from batch files. In this mode it processes so-called m-files (files with the suffix .m). These m-files are ordinary Mathematica notebooks that have been saved in m-file form, a format that can be chosen in the 'Save As' menu.

**IMPORTANT NOTE:** Every cell in the m-file that Mathematica is supposed to process (rather than ignore) has to be marked as an 'initialization cell' in the cell menu.

Given an m-file "a.m", one starts it interactively from the shell in the following manner:

''math -run "<'' [...]

<code>
[...] ToString[n] <> "\"", "ssh -x -f -l `3` `1` /net/opt/bin/math -mathlink -linkmode Connect `4` -linkname `2` -subkernel -noinit", 1], {n, nodes}]
LaunchKernels[machines]
</code>

That's it. Note that PBS_NODEFILE contains one identifier for each processor, so a node such as n010 appears, say, 12 times in PBS_NODEFILE. This is fine, because each entry is fed into the machine creation command [step 4 above], which creates one subkernel per processor.

**IMPORTANT NOTES:**

  * For some reason, Mathematica doesn't like it if we try to create more than about 250 subkernels. The procedure that usually works just hangs after about 250 subkernels have been created, giving a generic 'MathLink error'. The cause is unknown; it could be the ssh configuration or Mathematica itself.
  * "Parallel" commands like ParallelTable accept an option called Method. This option sometimes has a huge impact on the achieved load. The default works OK, but for parallel tasks whose instances have vastly different run times, the help advises using Method->"FinestGrained". For me, this caused a very low load. Using Method->"CoarsestGrained" seems to work fine.

===== Data consistency =====

Sometimes several subkernels work on data that will be stored in a single data object. For example, they compute entries of a table, where the whole table is the common data object.
In this case, it is important that they do not interfere with each other when modifying the common data object or when saving it to a file. For this purpose, Mathematica lets one define atomic expressions. One needs to

  - define which variables will be shared
  - define an "unused" lock variable, say lock1
  - use a construct called CriticalSection

The following example is instructive:

<code>
n = 0
SetSharedVariable[n]
Clear[lock1]
ParallelDo[CriticalSection[{lock1}, n = n + 1], {i, 1, 10}]
</code>

Without the locking, each subkernel would fetch the value of n that is currently in memory and write it back after adding 1. The result is more or less undefined in this case (it could be anything from n=1 to n=10). With locking, the update of n is protected and the other subkernels wait their turn. This is obviously also useful for writing to a file.
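
The same locking pattern can be sketched for the file-writing case. This is only a minimal illustration, not a tested recipe for this cluster. The names ''outfile'' and ''filelock'' and the stand-in computation ''i^2'' are hypothetical. It also assumes that the home directory is on a file system shared by all nodes, so that every subkernel sees the same file.

<code mathematica>
(* Hypothetical output file; assumes $HomeDirectory is shared across nodes. *)
outfile = FileNameJoin[{$HomeDirectory, "results.txt"}];
If[FileExistsQ[outfile], DeleteFile[outfile]];

(* Make the file name known on all subkernels. *)
DistributeDefinitions[outfile];

(* An otherwise unused symbol that serves as the lock. *)
Clear[filelock];

ParallelDo[
  With[{result = i^2},  (* stand-in for the real computation *)
    (* Only one subkernel at a time may append to the file. *)
    CriticalSection[{filelock},
      PutAppend[{i, result}, outfile]
    ]
  ],
  {i, 1, 10}
];

(* Read all stored expressions back on the master kernel. *)
data = ReadList[outfile];
</code>

Only the append is placed inside CriticalSection, so the computation itself still runs in parallel and the subkernels wait for each other only while writing. The entries end up in the file in whatever order the subkernels finish, so each record carries its index ''i''.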