Shared GitLab Runners of the Physics Department
All repositories can use the shared GitLab runners. They can be used to compile, test and deliver code automatically. They must not be used for numerical computations.
Intro to GitLab CI (Continuous Integration)
If CI is enabled and there is a push event, GitLab will automatically start a pipeline. A pipeline will run a series of specified jobs on the code and report errors it encounters. Using pipelines, you can see if a change causes an issue with existing code and you can test it for different environments. This works for all branches as well as merge requests or on a schedule.
To change project-wide settings for the runners, you need Maintainer or higher permissions.
Enabling GitLab CI
To enable GitLab CI for a repository, go to the page of the repository at
https://gitlabph.physik.fu-berlin.de/<your username / the group's name>/<repo name>, go to Settings → General → Visibility, project features, permissions and enable Pipelines. After that go to Settings → CI/CD → Runners and click on Enable shared runners.
The configuration of the CI is done using a YAML file. Create a file `.gitlab-ci.yml` at the root of the project.
The configuration file can be as simple as
```yaml
test:
  script:
    - ./testscript
```
This defines a job called `test`, which runs `./testscript` from the root of the project. If the script exits successfully, the job succeeds, and so does the pipeline.
Often you will want to store the results of a job, for example to use them in further jobs. For this, you can use artifacts. Below is a sample configuration that builds a LaTeX project with BibTeX into a PDF and uploads the resulting PDF to our GitLab instance.
```yaml
build:
  script:
    - pdflatex main
    - bibtex main
    - pdflatex main
    - pdflatex main
  artifacts:
    paths:
      - main.pdf
```
For more complex pipelines it is suitable to use multiple jobs or even multiple stages.
A stage is meant to collect a set of jobs that can be run at the same time in parallel. When the pipeline begins, it will start all jobs in the first stage, wait for them to succeed and then start the next stage with all its jobs, etc.
```yaml
stages:
  - dependencies
  - build
  - test

install_dependencies:
  stage: dependencies
  script:
    - ./get_deps.sh
  artifacts:
    paths:
      - deps/

build:
  stage: build
  script:
    - make
  artifacts:
    paths:
      - binary

test1:
  stage: test
  script:
    - make test1
  artifacts:
    paths:
      - test1.out

test2:
  stage: test
  script:
    - make test2
```
The order of the stages is defined by the `stages` section in the first four lines. With this configuration, the first job installs the dependencies into a folder `deps` and creates an artifact of that folder. Jobs automatically download all artifacts from all jobs in previous stages.
`test2` will get both the folder `deps`, as it is an artifact of `install_dependencies`, and the file `binary`, as it is an artifact of `build`; both of these jobs are in an earlier stage than `test2`. It will not get the file `test1.out`, because `test1` runs in the same stage.
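If a job does not need every artifact from earlier stages, the download can be restricted with the `dependencies` keyword. A sketch based on the jobs above:

```yaml
test1:
  stage: test
  # Fetch only the artifacts of the build job; deps/ from
  # install_dependencies is not downloaded for this job.
  dependencies:
    - build
  script:
    - make test1
```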
Environment variables can easily be defined within the YAML structure. They can be defined for all jobs at the top level of the file:
```yaml
stages:
  - dependencies
  - build
  - test

variables:
  VAR1: value1
  VAR2: value2

install_dependencies:
  ...
```
or for each job individually.
```yaml
install_dependencies:
  stage: dependencies
  variables:
    VAR1: value3
  script:
    - ./get_deps.sh
  artifacts:
    paths:
      - deps/
```
Variables defined inside a job take precedence over variables defined at the top level. Variables can also be set for a project or a group in the CI/CD settings. These settings are best suited for secrets. Such a variable can also provide its contents as a file during jobs; in that case the variable contains the absolute path to the file instead of the actual contents. When variables are set for a group, all projects and subgroups in the group inherit them.
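As a sketch, a job could use such a file-type variable like this (the variable name `DEPLOY_KEY` and the target host are made up for illustration; the variable is assumed to be created with type "File" in the CI/CD settings):

```yaml
deploy:
  script:
    # $DEPLOY_KEY holds the path to a temporary file containing the
    # secret, not the secret itself.
    - chmod 600 "$DEPLOY_KEY"
    - scp -i "$DEPLOY_KEY" main.pdf user@host.example.org:/srv/www/
```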
Parallel and matrix
Inside a job, you can use `parallel` to run it multiple times simultaneously. Some languages, e.g. Ruby, also have the ability to split tests across these parallel jobs.
```yaml
test:
  parallel: 3
  script:
    - bundle
    - bundle exec rspec_booster --job $CI_NODE_INDEX/$CI_NODE_TOTAL
```
This creates 3 subjobs, which all show up under the `test` job and all need to succeed for the `test` job to succeed. The `parallel` keyword can also be used in combination with `matrix` to run the same job with different environment variables. This allows for a compact way to define similar jobs.
```yaml
myjob:
  image: $RELEASE
  script:
    - ./test-script.sh
  parallel:
    matrix:
      - RELEASE: [buster, bullseye]
```
This runs `myjob` on two different Debian releases (see below for image options). Specifying multiple variables will run the job for all possible combinations (the entries of the Cartesian product of the lists).
```yaml
myjob:
  image: $RELEASE
  script:
    - ./get_dependency.sh $DEPENDENCY_VERSION
    - ./test-script.sh
  parallel:
    matrix:
      - RELEASE: [buster, bullseye]
        DEPENDENCY_VERSION: [1.0, 2.2, 2.6]
```
In this case, 6 jobs would be created. Note that `DEPENDENCY_VERSION` is not a new entry but another element inside the same entry (no new `-` at the beginning of the line). Adding a hyphen at the beginning would instead result in 5 jobs (one for `buster`, one for `bullseye`, and one for each of the three dependency versions):

```yaml
matrix:
  - RELEASE: [buster, bullseye]
  - DEPENDENCY_VERSION: [1.0, 2.2, 2.6]
```
Cache
Dependencies of a project that are not compiled inside the project should not be stored as artifacts. For this, GitLab CI has a feature called cache. The cache is not uploaded to the GitLab instance, so it is not accessible by users, but it is faster than using artifacts. With the cache you can reuse the same data across multiple branches, while artifacts are always tied to a single pipeline.
Cache configuration uses `key` to identify the cache and `paths` to specify the cached contents, the same as `artifacts`. Putting the following block at the top level of your config file enables the cache for the whole pipeline.
```yaml
cache:
  key: mycache
  paths:
    - testdir/
```
With this config, every pipeline will use the same cache. As dependencies might change over time and may differ from branch to branch, you can use predefined environment variables in the key. `key: $CI_COMMIT_REF_SLUG` will, for example, use the name of the branch/tag as the key, so every branch and tag has its own cache. You can also use checksums of the files that define the dependencies as the key.
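For illustration, a per-branch cache could look like this (the cached path `deps/` is an assumption):

```yaml
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - deps/
```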
```yaml
cache:
  key:
    files:
      - requirements.txt
  paths:
    - somedir/
```
will use the checksum of `requirements.txt` as the key. This means your cache only has to be recreated when `requirements.txt` changes.
Jobs that update the cache should always be idempotent (i.e. running them again should not change anything) and reuse existing files. This way you don't need to reinstall, say, 2 GiB of packages in a Python venv with every pipeline run, but only update the existing modules.
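As a sketch of such an idempotent job for a Python project (the venv path, `requirements.txt` and the use of pytest are assumptions):

```yaml
cache:
  key:
    files:
      - requirements.txt
  paths:
    - venv/

test:
  script:
    # Creating the venv is effectively a no-op if venv/ was restored
    # from the cache; pip only installs missing or changed packages.
    - python3 -m venv venv
    - ./venv/bin/pip install -r requirements.txt
    - ./venv/bin/python -m pytest
```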
Images
GitLab CI allows choosing between different images. All available images are based on Debian; there are multiple variants and multiple releases available.
- bookworm (julia is not packaged for this release currently, therefore the julia variant is not available)
- oldstable (last stable release)
- stable (stable release, installed on workstations)
- testing (next stable release)
These releases correspond to the Debian codenames.
- base (most simple variant e.g. for simple shell scripts)
- dev (for C/C++ projects)
- python (comes with the same python libs as the workstations)
- tex (comes with the same TeX packages as the workstations)
- fortran (comes with gfortran)
- haskell (comes with ghc)
- julia (comes with julia)
- ruby (comes with ruby)
- deb (for debian packaging)
- full (Includes everything from above and is close to the workstation configuration)
Launch time increases with the size of the image, which is why the full variant is the slowest to start.
To specify an image in the CI configuration, you can set the `image` option. The format is `<release>-<variant>`, e.g. `bullseye-python`. The `image` option can go into the YAML at the top level to set the default image, or into a specific job to override the image.
```yaml
image: bullseye-python

...

deploy:
  image: bullseye-base
  script:
    - ./deploy.sh
```
This will use `bullseye-python` for all jobs except `deploy`, which will use the minimal base image.
Notes on Resource Usage
- Cache is automatically deleted after not being accessed for 7 days in a row.
- Artifacts also expire. This can be controlled with expire_in. For example, to expire the artifacts after 2 hours:
  ```yaml
  job:
    artifacts:
      expire_in: 2h
  ```
- The latest artifacts will always be kept.
- Artifacts that are not meant to be investigated manually (which should be most, except final results like LaTeX pdfs) should have a low expiration time to save disk space.
- The maximum size for artifacts is 4 GiB.
- Each job can use up to 8 cores and 8 GiB RAM.
The Pipeline Editor can be used to validate and visualize the GitLab CI config. It can be accessed under CI/CD → Editor. The editor automatically checks the syntax of the config, and the Visualize tab shows a sketch of what the pipeline will look like. This can be useful for tracking dependencies across stages in more complicated pipelines.