<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Docker | Algorist</title><link>https://www.algorist.co.uk/tag/docker/</link><atom:link href="https://www.algorist.co.uk/tag/docker/index.xml" rel="self" type="application/rss+xml"/><description>Docker</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-gb</language><lastBuildDate>Sat, 06 Feb 2021 00:00:00 +0000</lastBuildDate><image><url>https://www.algorist.co.uk/images/icon_hu1a112552fd764c0568141c667be1573d_16776_512x512_fill_lanczos_center_2.png</url><title>Docker</title><link>https://www.algorist.co.uk/tag/docker/</link></image><item><title>Building a CI pipeline for R packages in gitlab</title><link>https://www.algorist.co.uk/post/building-a-ci-pipeline-for-r-packages-in-gitlab/</link><pubDate>Sat, 06 Feb 2021 00:00:00 +0000</pubDate><guid>https://www.algorist.co.uk/post/building-a-ci-pipeline-for-r-packages-in-gitlab/</guid><description>&lt;p>Continuous Integration (CI) is not a tool but a practice of continually merging new behaviour/features into a released product. To facilitate this practice without exposing end users to unstable behaviour and bugs, testing needs to be standardised and automated. It&amp;rsquo;s no wonder then that CI is often associated with Test Driven Development (TDD), which mandates that you write your tests &lt;em>first&lt;/em>, working backwards to write the minimal code that should pass each test.&lt;/p>
&lt;p>At first glance CI is not directly relevant to consultancy projects or products for internal use, which I tend to spend most of my time working on. However, it is a good discipline to foster ahead of developing packages for wider use, saves time when bug squashing and gives a framework for collaborating with other people (protect the master branch!).&lt;/p>
&lt;p>The version control tool of choice (for the moment at least) for my company&amp;rsquo;s team is Gitlab. I had noticed that the platform offers inbuilt CI/CD tools on projects and products. I was generally aware of CI and agreed with the idea of standardising and streamlining tests on code, so I thought I would give it a go.&lt;/p>
&lt;p>I went to &lt;a href="https://stackoverflow.com/a/51874023/10960765" target="_blank" rel="noopener">Stack Overflow&lt;/a> for inspiration (as usual). From there I built the pipeline in YAML syntax, below.&lt;/p>
&lt;pre>&lt;code>image: rocker/r-base

# include: is a top-level keyword, not a job-level one; this template
# defines a code_quality job, which we pin to our quality stage below
include:
  - template: Jobs/Code-Quality.gitlab-ci.yml

stages:
  - test
  - quality

default:
  before_script:
    - mkdir -p installed_deps
    - echo 'R_LIBS=&amp;quot;installed_deps&amp;quot;' &amp;gt; .Renviron
    - echo 'R_LIBS_USER=&amp;quot;installed_deps&amp;quot;' &amp;gt;&amp;gt; .Renviron
    - echo 'R_LIBS_SITE=&amp;quot;installed_deps&amp;quot;' &amp;gt;&amp;gt; .Renviron

test:
  stage: test
  script:
    - apt-get update
    - apt-get install --yes --no-install-recommends r-cran-testthat r-cran-devtools
    - R -e &amp;quot;devtools::install_deps(dependencies = TRUE)&amp;quot;
    - R CMD build . --no-build-vignettes --no-manual
    - PKG_FILE_NAME=$(ls -1t *.tar.gz | head -n 1)
    - R CMD check &amp;quot;${PKG_FILE_NAME}&amp;quot; --no-build-vignettes --no-manual
  cache:
    key: &amp;quot;$CI_COMMIT_REF_SLUG&amp;quot;
    paths:
      - installed_deps/
  artifacts:
    paths:
      - '*.Rcheck/'
  only:
    - master
    - dev

code_quality:
  stage: quality
&lt;/code>&lt;/pre>
&lt;p>What does it do? Well, we start with the base R docker image from the &lt;em>rocker&lt;/em> project, making sure that the package lists are up to date (the image is built on Debian Linux, so packages are managed with apt, the front end to dpkg).&lt;/p>
&lt;p>Before we start on the actual job scripts, we run some code to make a folder called &lt;code>installed_deps&lt;/code> and declare some environment variables in &lt;code>.Renviron&lt;/code> to let R know this is the place to install packages. This is important for caching, described further below.&lt;/p>
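&lt;p>The &lt;code>before_script:&lt;/code> lines can be reproduced locally in a throwaway directory to see exactly what ends up in &lt;code>.Renviron&lt;/code>:&lt;/p>

```shell
# Recreate the before_script in a throwaway directory
cd "$(mktemp -d)"
mkdir -p installed_deps
echo 'R_LIBS="installed_deps"' > .Renviron
echo 'R_LIBS_USER="installed_deps"' >> .Renviron
echo 'R_LIBS_SITE="installed_deps"' >> .Renviron
# R reads .Renviron on startup, so all three library path
# variables now point at the cacheable folder
cat .Renviron
```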
&lt;p>Next we focus on the test job. We install the bare minimum R libraries on top of base R so that we can test the package: &lt;code>testthat&lt;/code> and &lt;code>devtools&lt;/code>. Note we elect to install no recommended packages alongside to make sure the container is as debloated as possible.&lt;/p>
&lt;p>After this we install all the dependencies that are specified in the DESCRIPTION file you wrote as part of your R package. When I used the original Stack answer above, it did not have the &lt;code>dependencies = TRUE&lt;/code> argument, which meant that none of my suggested packages were installed. This caused my build to fail because some of my tests depended on them; however, for most use cases you may not need this argument.&lt;/p>
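&lt;p>For context: with its default arguments &lt;code>install_deps()&lt;/code> installs only the hard dependencies (Depends, Imports, LinkingTo), while &lt;code>dependencies = TRUE&lt;/code> also pulls in the &lt;code>Suggests:&lt;/code> field. A hypothetical DESCRIPTION fragment (package and field values invented for illustration):&lt;/p>

```
Package: mypkg
Imports:
    data.table
Suggests:
    testthat,
    knitr
```

&lt;p>Packages under Suggests are exactly the ones tests tend to use, which is why the default left my pipeline broken.&lt;/p>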
&lt;p>As part of this stage we also define the cache to include the R library folder we set up in the &lt;code>before_script:&lt;/code> section. This makes subsequent jobs run a &lt;strong>lot&lt;/strong> faster (&amp;gt;5 times in my experience) because we don&amp;rsquo;t have to install packages again if they haven&amp;rsquo;t changed on CRAN. Note that this cache is only available for this stage, unless we define the cache outside of the stage.&lt;/p>
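&lt;p>To share the cache across all jobs rather than just this one, the same block can simply be moved to the top level of the file:&lt;/p>

```yaml
# Top-level cache, shared by every job in the pipeline
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - installed_deps/
```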
&lt;p>The last three lines in the script build the package source as a &lt;em>tar.gz&lt;/em> compressed file, store the package name from this file into a variable, and then use it to call &lt;code>CMD check&lt;/code>. This final line tests both for whether all unit tests described in the &lt;em>/tests&lt;/em> folder have passed, and whether the package can be built without errors. Note that for building the source and checking it we specify that we will not build vignettes or man pages. Although testing these might be useful they take a long time to run. In lieu of testing the vignette, I will often run a battery of unit tests using &lt;code>context(&amp;quot;Testing according to the vignette&amp;quot;)&lt;/code>, replicating each line of the vignette in tests.&lt;/p>
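&lt;p>The &lt;code>ls -1t | head -n 1&lt;/code> trick simply picks the most recently modified tarball, so the check step never needs to hard-code the package version. A standalone sketch with dummy file names:&lt;/p>

```shell
# Demonstrate picking the newest *.tar.gz by modification time
cd "$(mktemp -d)"
touch mypkg_0.1.0.tar.gz
sleep 1                      # ensure distinct modification times
touch mypkg_0.2.0.tar.gz
# ls -1t sorts newest first; head takes the top entry
PKG_FILE_NAME=$(ls -1t *.tar.gz | head -n 1)
echo "$PKG_FILE_NAME"        # the newest build
```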
&lt;p>Artifacts are files that can be downloaded after the job is successfully run. We specify the folder where the compressed source file is created as an artifact.&lt;/p>
&lt;p>The &lt;code>only:&lt;/code> section within the test job lists the branches on which we wish to run the job. Instead of whitelisting branches we could deal by exception, using &lt;code>except:&lt;/code>. The configuration above runs the test job only on the master and dev branches.&lt;/p>
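&lt;p>The blacklist equivalent would run the job on every branch except those listed, e.g. (branch name invented):&lt;/p>

```yaml
test:
  stage: test
  except:
    - experimental   # hypothetical branch we never want tested
```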
&lt;p>The last job is from a Gitlab template that tests for code quality. I&amp;rsquo;m not sure how useful it is for R but it is a good illustration of how to chain jobs into a CI pipeline. Both test and code quality jobs must succeed for the pipeline to pass.&lt;/p>
&lt;h2 id="diy-runners">DIY runners&lt;/h2>
&lt;p>A note on using CI pipelines on the Gitlab free tier: it is &lt;strong>very&lt;/strong> easy to chew through the 200 free minutes per month. I managed to consume around 70% of this in a day and a half of experimenting, which spurred me on to make my own runner to get around this limitation.&lt;/p>
&lt;p>The &lt;a href="https://www.algorist.co.uk/project/internal-project/gitlab-runner/" target="_blank" rel="noopener">Gitlab-runner project&lt;/a> is the result of this experimentation. I successfully set up a runner on a Raspberry Pi 4 with 8GB of RAM, but since the &lt;em>rocker/r-base&lt;/em> image is not ARM-compatible the job failed. One possibility is to change the image; the other is to use a computer with a compatible architecture. In the end I used my other laptop and it ran just fine; I then changed the base image to the one in my &lt;a href="https://github.com/Daveyr/armr" target="_blank" rel="noopener">armr project&lt;/a> and it ran fine on my Raspberry Pi as well. No more limited pipeline minutes!&lt;/p></description></item><item><title>Using Rstudio Server with docker</title><link>https://www.algorist.co.uk/post/using-rstudio-server-with-docker/</link><pubDate>Sat, 30 Jan 2021 00:00:00 +0000</pubDate><guid>https://www.algorist.co.uk/post/using-rstudio-server-with-docker/</guid><description>
&lt;p>It’s taking a long time to run my genetic algorithm optimisation models recently. So much so that I’ve been looking at offloading processes to other computers lying idle on the network. The &lt;a href="https://github.com/Daveyr/armr">armr&lt;/a> project aims to do this with parallel processing and Rstudio server docker images running on the raspberry pi but this is a work in progress currently, chiefly due to having to build Rstudio server from source.&lt;/p>
&lt;p>In the meantime I have managed to run Rstudio server in a Docker container on my personal laptop, logging into it on my work laptop. Here’s how I did it, using the image provided by the &lt;a href="https://www.rocker-project.org/">Rocker project&lt;/a>.&lt;/p>
&lt;div id="on-the-host-machine" class="section level2">
&lt;h2>On the host machine&lt;/h2>
&lt;p>Assuming you already have docker installed, run the below code in a terminal on the host machine.&lt;/p>
&lt;pre class="bash">&lt;code>docker pull rocker/rstudio
docker run --rm -p 8787:8787 -e PASSWORD=&amp;quot;password&amp;quot; rocker/rstudio&lt;/code>&lt;/pre>
&lt;p>The default user name is “rstudio”. If you want to set the user name, add &lt;code>-e USER="user"&lt;/code> to the command above.&lt;/p>
&lt;p>As easy as that. To make the image usable we would have to create a new dockerfile based on this image and add run commands that install packages within R. For example,&lt;/p>
&lt;pre class="bash">&lt;code>from rocker/rstudio
RUN R -e &amp;quot;install.packages(&amp;#39;tidyverse&amp;#39;)&amp;quot;&lt;/code>&lt;/pre>
&lt;p>The base &lt;code>armr&lt;/code> dockerfile shows how to do this with an install script that reads a requirements.txt file - much quicker than installing each package individually. For saving work, you’ll also need to add a volume flag to the docker run command, e.g.,&lt;/p>
&lt;pre class="bash">&lt;code>docker run --rm -v $(pwd):/home/user/ -p 8787:8787 -e USER=&amp;quot;user&amp;quot; -e PASSWORD=&amp;quot;password&amp;quot; rocker/rstudio&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="on-the-external-machine" class="section level2">
&lt;h2>On the external machine&lt;/h2>
&lt;p>To access from another computer on the network, open a browser and navigate to &lt;code>hostname:8787&lt;/code>, where hostname is the hostname of the computer running the container. This can be found by running &lt;code>hostname&lt;/code> on it from the terminal (Linux only). If you want to access Rstudio server on the go, away from home, then you will need to do the following.&lt;/p>
&lt;ul>
&lt;li>Issue a static IP address to the host computer, most easily done using your router&lt;/li>
&lt;li>Set up port forwarding to forward port 8787 from the static IP to the outside world&lt;/li>
&lt;li>Read up on security settings in Rstudio server and implement them! These are likely to include IP whitelisting, certificate only authentication, banning IP addresses after several failed attempts (see &lt;em>fail2ban&lt;/em>) and more&lt;/li>
&lt;/ul>
&lt;/div></description></item><item><title>How to run R using Docker on Raspberry Pi</title><link>https://www.algorist.co.uk/post/how-to-run-r-using-docker-on-raspberry-pi/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://www.algorist.co.uk/post/how-to-run-r-using-docker-on-raspberry-pi/</guid><description>
&lt;p>When I began learning about how to use Docker I stumbled on an excellent project called &lt;a href="https://www.rocker-project.org/">Rocker&lt;/a>. These Rocker images allow anyone with an x86 machine to run R and most of its dependencies in a containerised environment. Plumber APIs, anyone? What about your own Shiny server? Finally, data scientists using R can have the same level of control over dependencies and package versions as Python users have become accustomed to through &lt;code>venv&lt;/code>.&lt;/p>
&lt;p>Things are a little more complicated for ARM users, especially 32-bit ARM architectures such as the Raspberry Pi. No Rocker images offer such compatibility, so we’re on our own. This is the major reason I’ve started a project called &lt;a href="https://www.algorist.co.uk/project/armr/">ARMR&lt;/a>: to build a series of Docker images that &lt;strong>do&lt;/strong> offer compatibility with the lovable credit card sized computer.&lt;/p>
&lt;div id="hello-world" class="section level1">
&lt;h1>Hello woRld&lt;/h1>
&lt;p>Whilst not much has happened with the project so far, at least I have a version of “Hello woRld”: a container with r-base installed. But first we must install Docker.&lt;/p>
&lt;div id="installation" class="section level2">
&lt;h2>Installation&lt;/h2>
&lt;p>From the terminal on a Raspberry Pi, run the following.&lt;/p>
&lt;pre>&lt;code># Downloads installation shell script and pipes it into the sh command
curl -sSL https://get.docker.com | sh
# Adds pi to the docker group so the user can run docker without sudo
sudo usermod -aG docker pi&lt;/code>&lt;/pre>
&lt;p>From here we can either reboot or run &lt;code>systemctl start docker.service&lt;/code> to start up Docker. To test it is working, try &lt;code>docker info&lt;/code>, then &lt;code>docker version&lt;/code>.&lt;/p>
&lt;p>Once you know that Docker is running, let’s try a few things in order of sophistication. &lt;code>docker run hello-world&lt;/code> will run a container based on an image called &lt;em>hello-world&lt;/em>. &lt;code>docker run -it ubuntu bash&lt;/code> takes it up a notch: now we have an ubuntu bash container running in interactive mode in the terminal.&lt;/p>
&lt;p>To make it even more useful, we ought to have access to persistent storage. Let’s modify the command to include a mount volume.&lt;/p>
&lt;pre>&lt;code>docker run -it -v /home:/home ubuntu bash&lt;/code>&lt;/pre>
&lt;p>The &lt;code>-v&lt;/code> flag tells Docker to attach a volume; the following argument contains the information on what locations should be used, of the form &lt;em>from_volume:to_volume&lt;/em>. The location on your machine is the &lt;em>from_volume&lt;/em> and the location on your container is the &lt;em>to_volume&lt;/em>. In our example, anything you create in the home folder within the container will persist in the home folder of your Raspberry Pi after you close the container. The easiest way to test this is to type &lt;code>touch myfile&lt;/code> in the interactive terminal in the container, and watch the same file appear in your home folder.&lt;/p>
&lt;/div>
&lt;div id="build-an-r-container" class="section level2">
&lt;h2>Build an R container&lt;/h2>
&lt;p>Building a base R container is as simple as writing the code below to a file called &lt;code>Dockerfile&lt;/code>. We use an arm32 ubuntu image as a base, from which we set an environment variable to force the terminal to be non-interactive. This is because when r-base is installed it waits for user input when setting parameters, hanging the container build. By setting the installation to be non-interactive we accept all the defaults, including timezone. Be mindful of this when handling datetimes!&lt;/p>
&lt;pre>&lt;code>FROM arm32v7/ubuntu
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update &amp;amp;&amp;amp; apt install r-base -y&lt;/code>&lt;/pre>
&lt;p>Once the Dockerfile is created you build it by typing &lt;code>docker build -t armr .&lt;/code> in the same folder in the terminal (the trailing dot sets the build context). Docker then builds an image with the tag &lt;em>armr&lt;/em>. It builds by starting with the base image, setting the environment variable and adding a layer that comprises the result of the &lt;code>RUN&lt;/code> command.&lt;/p>
&lt;p>In fact, you can see all the layers that are built into any image by running &lt;code>docker history &amp;lt;image_name&amp;gt;&lt;/code> in the terminal.&lt;/p>
&lt;/div>
&lt;/div>
&lt;div id="whats-next" class="section level1">
&lt;h1>What’s next?&lt;/h1>
&lt;p>Base R is fine but to be useful we need to add a lot more packages and supporting software. Future development is likely to encompass the following:&lt;/p>
&lt;ul>
&lt;li>Shiny server (I’d like to host one on this site)&lt;/li>
&lt;li>Plumber API server&lt;/li>
&lt;li>Rstudio server (so I can do analysis from anywhere on anything, even a tablet)&lt;/li>
&lt;li>Images for commonly used packages (Tidyverse, data.table, caret, etc.)&lt;/li>
&lt;/ul>
&lt;/div></description></item><item><title>armr</title><link>https://www.algorist.co.uk/project/internal-project/armr/</link><pubDate>Sun, 26 Jul 2020 23:14:11 +0100</pubDate><guid>https://www.algorist.co.uk/project/internal-project/armr/</guid><description/></item><item><title>Gitlab Runner</title><link>https://www.algorist.co.uk/project/internal-project/gitlab-runner/</link><pubDate>Thu, 06 Feb 2020 10:43:00 +0000</pubDate><guid>https://www.algorist.co.uk/project/internal-project/gitlab-runner/</guid><description>&lt;h1 id="readme">README&lt;/h1>
&lt;p>A runner is a server that is able to execute a CI/CD pipeline. These pipelines contain one or more jobs that check a version controlled repository for build errors, code style, etc. This repository contains a dockerfile and bash scripts to make a Gitlab runner. It is intended to be used with a Gitlab repository that has a CI/CD pipeline configured according to a &lt;code>.gitlab-ci.yml&lt;/code> file in the parent directory.&lt;/p>
&lt;p>As long as the base image in your .gitlab-ci.yml file is compatible with arm architecture, this runner will work on a Raspberry Pi. If the base image isn&amp;rsquo;t compatible then you will have the same experience as &lt;a href="https://www.talvbansal.me/blog/maximising-gitlab-ci-s-free-tier/" target="_blank" rel="noopener">this&lt;/a>. Of course, you can build this image on an x86 machine and it will work with most base images a CI file is likely to contain.&lt;/p>
&lt;h2 id="to-build">To build&lt;/h2>
&lt;p>Instructions from &lt;a href="https://www.devils-heaven.com/gitlab-runner-finally-use-your-raspberry-pi/">https://www.devils-heaven.com/gitlab-runner-finally-use-your-raspberry-pi/&lt;/a>. Note that you should copy the token that is located on your gitlab account, found under the project menu:
&lt;code>Settings &amp;gt; CI/CD &amp;gt; Runners &amp;gt; Specific Runners&lt;/code>&lt;/p>
&lt;pre>&lt;code>docker build -t gitlab-runner --build-arg token=&amp;lt;TOKEN&amp;gt; .
&lt;/code>&lt;/pre>
&lt;h2 id="to-run">To run&lt;/h2>
&lt;p>From &lt;a href="https://itnext.io/docker-in-docker-521958d34efd">https://itnext.io/docker-in-docker-521958d34efd&lt;/a>. Note we need to mount a volume as we run the container. This is because the runner depends on Docker. Running Docker in a Docker container is tricky so the easiest method is to &amp;ldquo;borrow&amp;rdquo; the host Docker service instead, which we do by mounting its .sock file.&lt;/p>
&lt;pre>&lt;code>docker run -v /var/run/docker.sock:/var/run/docker.sock gitlab-runner
&lt;/code>&lt;/pre></description></item></channel></rss>