Shared GitLab Runner of the Physics department

All repositories can use the shared GitLab runners. The runners can be used to compile, test and deliver code automatically. They must not be used for numerical computations.

Intro into GitLab CI (Continuous Integration)

If CI is enabled and there is a push event, GitLab will automatically start a pipeline. A pipeline will run a series of specified jobs on the code and report errors it encounters. Using pipelines, you can see if a change causes an issue with existing code and you can test it for different environments. This works for all branches as well as merge requests or on a schedule.

Permissions

To change project-wide settings for the runners, you need Maintainer or higher permissions.

Enabling GitLab CI

To enable GitLab CI for a repository, go to the page of the repository at https://gitlabph.physik.fu-berlin.de/<your username / the group's name>/<repo name>, go to Settings → General → Visibility, project features, permissions and enable Pipelines. After that, go to Settings → CI/CD → Runners and click on Enable shared runners.

Basic Pipeline

The configuration of the CI is done using a YAML file. Create a file .gitlab-ci.yml at the root of the project.

The configuration file can be as simple as

test:
    script:
        - ./testscript

This defines a job called test, which runs ./testscript from the root of the project. If this testscript exits successfully, the job succeeds and so does the pipeline.
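As an illustration, a minimal testscript could look like the following. This is a hypothetical sketch (the checks are placeholders); the only contract with GitLab is the exit code: the job passes exactly when the script exits with status 0.

```shell
#!/bin/sh
# Hypothetical testscript: the CI job passes iff this script exits with 0.
set -e                      # abort on the first failing command
echo "running checks"
test 1 -eq 1                # replace with real checks (unit tests, linters, ...)
echo "all checks passed"
```

Because of `set -e`, the first failing command makes the script exit non-zero, which marks the job as failed.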

Often you want to keep the results of a job, for example to use them in later jobs. For this, you can use artifacts. Below is a sample configuration which builds a LaTeX project with bibtex into a PDF and uploads the resulting PDF to our GitLab instance.

build:
    script:
        - pdflatex main
        - bibtex main
        - pdflatex main
        - pdflatex main
    artifacts:
        paths:
            - main.pdf

For more complex pipelines it is useful to use multiple jobs or even multiple stages.

A stage is meant to collect a set of jobs that can be run at the same time in parallel. When the pipeline begins, it will start all jobs in the first stage, wait for them to succeed and then start the next stage with all its jobs, etc.

stages:
    - dependencies
    - build
    - test

install_dependencies:
    stage: dependencies
    script:
        - ./get_deps.sh
    artifacts:
        paths:
            - deps/

build:
    stage: build
    script:
        - make
    artifacts:
        paths:
            - binary

test1:
    stage: test
    script:
        - make test1
    artifacts:
        paths:
            - test1.out

test2:
    stage: test
    script:
        - make test2

The order of the stages is defined by the stages section in the first four lines. With this configuration, the first job installs the dependencies, stores them in a folder deps and creates an artifact of that folder. Jobs automatically download all artifacts from all jobs in previous stages: test1 and test2 both get the folder deps (an artifact of install_dependencies) and the file binary (an artifact of build), because those jobs run in earlier stages. test2 will not get the file test1.out from test1, because the two jobs run in the same stage.
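If a job does not need all of the earlier artifacts, the dependencies keyword can restrict what is downloaded. A sketch based on the jobs above (the rest of the job stays unchanged):

```yaml
test2:
    stage: test
    dependencies:
        - build         # only fetch the artifact of build; skip deps/ from install_dependencies
    script:
        - make test2
```

An empty list (dependencies: []) skips all artifact downloads for that job.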

Variables

Environment variables can easily be defined within the YAML structure. They can be defined for all jobs at the top level of the file,

stages:
    - dependencies
    - build
    - test

variables:
    VAR1: value1
    VAR2: value2

install_dependencies:
...

or for each job individually.

install_dependencies:
    stage: dependencies
    variables:
        VAR1: value3
    script:
        - ./get_deps.sh
    artifacts:
        paths:
            - deps/

Variables defined inside a job take precedence over variables defined at the top level. Variables can also be set for a project or a group in the CI/CD settings; these settings are best used for secrets. Such variables can also be provided as files during jobs; in that case the variable contains the absolute path to the file instead of the actual contents. When variables are set for a group, all projects and subgroups in the group inherit them.
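As a sketch, suppose a project-level variable DEPLOY_KEY of type "File" holds an SSH private key (the variable name, target host and file names are placeholders). Inside the job, the variable then contains the path to a temporary file with the secret:

```yaml
deploy:
    script:
        # $DEPLOY_KEY is the absolute path to a file holding the secret,
        # not the secret itself, because the variable has the type "File".
        - chmod 600 "$DEPLOY_KEY"
        - scp -i "$DEPLOY_KEY" main.pdf user@example.org:/srv/www/
```

Keeping secrets out of the repository and out of .gitlab-ci.yml is the main point of these settings.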

Parallel and matrix

Inside a job, you can use parallel to run it multiple times simultaneously. Some languages, e.g. Ruby, also have tooling to split tests across these parallel jobs.

test:
  parallel: 3
  script:
    - bundle
    - bundle exec rspec_booster --job $CI_NODE_INDEX/$CI_NODE_TOTAL

This would create 3 subjobs, which all show up under the test job and all need to succeed for the test job to succeed.

The parallel keyword can also be used in combination with matrix to run the same job with different environment variables. This allows for a compact way to run similar jobs.

myjob:
  image: $RELEASE
  script:
    - ./test-script.sh
  parallel:
    matrix:
      - RELEASE: [buster, bullseye]

This would run myjob on two different Debian releases (see below for image options). Specifying multiple variables will run it for all possible combinations (entries of the cartesian product of the lists).

myjob:
  image: $RELEASE
  script:
    - ./get_dependency.sh $DEPENDENCY_VERSION
    - ./test-script.sh
  parallel:
    matrix:
      - RELEASE: [buster, bullseye]
        DEPENDENCY_VERSION: [1.0, 2.2, 2.6]

In this case, 6 jobs would be created. Note that DEPENDENCY_VERSION is not a new entry but another element of the same entry (no new - at the beginning of the line). Adding a hyphen at the beginning would instead create 5 jobs (one for buster, one for bullseye, one for 1.0, …):

    matrix:
      - RELEASE: [buster, bullseye]
      - DEPENDENCY_VERSION: [1.0, 2.2, 2.6]

Dependency Management

Dependencies that are not built inside the project itself should not be stored as artifacts. For them, GitLab CI has a feature called cache. A cache is not uploaded to the GitLab instance, so it is not accessible to users, but it is faster than using artifacts, and the same cache can be reused across multiple branches. Artifacts, in contrast, are always valid for a single job only.

Cache configuration uses key to identify the cache and paths to specify the cached contents, the same as artifacts. Putting the following block at the top of your config file will enable cache for the whole pipeline.

cache:
    key: mycache
    paths:
        - testdir/

With this configuration, every pipeline will use the same cache. Since dependencies may change over time and may differ from branch to branch, you can use predefined environment variables to specify the key. For example, key: $CI_COMMIT_REF_SLUG uses the name of the branch/tag as the key, so pipelines have a unique cache for every branch and tag. You can also use checksums of the files that define the dependencies as the key.
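A per-branch cache then looks like this (the paths entry is a placeholder):

```yaml
cache:
    key: $CI_COMMIT_REF_SLUG    # one cache per branch/tag
    paths:
        - testdir/
```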

cache:
    key:
        files:
            - requirements.txt
    paths:
        - somedir/

will use the checksum of requirements.txt as the key. This means your cache only has to be recreated when requirements.txt changes.

Jobs that update the cache should always be idempotent (i.e. they should not change anything when run again) and reuse existing files. This way you do not need to reinstall, say, 2 GiB in a Python venv on every run of a pipeline, but only update the existing modules.
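A sketch of such an idempotent job for a Python project, assuming the dependencies are listed in requirements.txt (the job and directory names are placeholders):

```yaml
install_dependencies:
    stage: dependencies
    cache:
        key:
            files:
                - requirements.txt
        paths:
            - venv/
    script:
        # Both steps are safe to re-run on a restored cache: creating the venv
        # leaves an existing one intact, and pip only installs what is missing.
        - python3 -m venv venv
        - venv/bin/pip install -r requirements.txt
```

Because the cache key is the checksum of requirements.txt, the venv is rebuilt from scratch only when the dependency list itself changes.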

Images

GitLab CI allows choosing between several images. All available images are based on Debian. There are multiple variants and multiple releases available.

Available releases:

  • buster
  • bullseye
  • sid
  • stable (stable release, installed on workstations)
  • testing (next stable release)

These releases correspond to the Debian codenames.

Available variants:

  • base (simplest variant, e.g. for simple shell scripts)
  • dev (for C/C++ projects)
  • python (comes with the same Python libs as the workstations)
  • tex (comes with the same TeX packages as the workstations)
  • fortran (comes with gfortran)
  • haskell (comes with ghc)
  • julia (comes with julia)
  • ruby (comes with ruby)
  • deb (for Debian packaging)
  • full (includes everything from above and is close to the workstation configuration)

Launch time increases with the size of the image which is why the full variant will be slowest to start.

To specify an image in the CI configuration, you can set image.

image: buster-python

The format is $release-$variant. The image option can be set at the top level of the YAML file to define the default image, or in a specific job to override it.

image: buster-python
 
...

deploy:
    image: buster-base
    script:
        - ./deploy.sh

This will use buster-python for all jobs, except deploy, which will use the minimal base image.

Notes on Resource Usage

  • Cache is automatically deleted after not being accessed for 7 days in a row.
  • Artifacts also expire. This can be controlled with expire_in. For example, to expire the artifacts after 2 hours:

        job:
            artifacts:
                expire_in: 2h
  • The latest artifacts will always be kept.
  • Artifacts that are not meant to be investigated manually (which should be most, except final results like LaTeX pdfs) should have a low expiration time to save disk space.
  • The maximum size for artifacts is 4 GiB.
  • Each job has TBD cores and TBD GiB of RAM.

Pipeline Editor

The Pipeline Editor can be used to validate and visualize the GitLab CI config. It can be accessed under CI/CD → Editor. This editor automatically checks the syntax of the config. The Visualize tab shows a sketch of what the pipeline will look like; this can be useful for dependencies across stages in more complicated pipelines.

Further Reading

services/gitlab/gitlab-runner.1621416950.txt.gz · Last modified: 2021/05/19 09:35 by hoffmac00
