<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Docker | Algorist</title><link>https://www.algorist.co.uk/tag/docker/</link><atom:link href="https://www.algorist.co.uk/tag/docker/index.xml" rel="self" type="application/rss+xml"/><description>Docker</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-gb</language><lastBuildDate>Sat, 06 Feb 2021 00:00:00 +0000</lastBuildDate><image><url>https://www.algorist.co.uk/images/icon_hu1a112552fd764c0568141c667be1573d_16776_512x512_fill_lanczos_center_2.png</url><title>Docker</title><link>https://www.algorist.co.uk/tag/docker/</link></image><item><title>Building a CI pipeline for R packages in gitlab</title><link>https://www.algorist.co.uk/post/building-a-ci-pipeline-for-r-packages-in-gitlab/</link><pubDate>Sat, 06 Feb 2021 00:00:00 +0000</pubDate><guid>https://www.algorist.co.uk/post/building-a-ci-pipeline-for-r-packages-in-gitlab/</guid><description>&lt;p>Continuous Integration (CI) is not a tool but a practice of continually merging new behaviour/features into a released product. To facilitate this practice without exposing end users to unstable behaviour and bugs, testing needs to be standardised and automated. It&amp;rsquo;s no wonder then that CI is often associated with Test Driven Development (TDD), which mandates that you write your tests &lt;em>first&lt;/em>, working backwards to write the minimal code that should pass each test.&lt;/p>
&lt;p>At first glance CI is not directly relevant to consultancy projects or products for internal use, which I tend to spend most of my time working on. However, it is a good discipline to foster ahead of developing packages for wider use, saves time when bug squashing and gives a framework for collaborating with other people (protect the master branch!).&lt;/p>
&lt;p>The version control tool of choice (for the moment at least) for my company&amp;rsquo;s team is Gitlab. I had noticed that the platform offers inbuilt CI/CD tools on projects and products. I was generally aware of CI and agreed with the idea of standardising and streamlining tests on code, so I thought I would give it a go.&lt;/p>
&lt;p>I went to &lt;a href="https://stackoverflow.com/a/51874023/10960765" target="_blank" rel="noopener">Stack Overflow&lt;/a> for inspiration (as usual). From there I built the pipeline in YAML syntax, below.&lt;/p>
&lt;pre>&lt;code>image: rocker/r-base

# include: is a top-level keyword, not a job-level one; this template
# defines a code_quality job, which we pin to our quality stage below
include:
  - template: Jobs/Code-Quality.gitlab-ci.yml

stages:
  - test
  - quality

default:
  before_script:
    - mkdir -p installed_deps
    - echo 'R_LIBS=&amp;quot;installed_deps&amp;quot;' &amp;gt; .Renviron
    - echo 'R_LIBS_USER=&amp;quot;installed_deps&amp;quot;' &amp;gt;&amp;gt; .Renviron
    - echo 'R_LIBS_SITE=&amp;quot;installed_deps&amp;quot;' &amp;gt;&amp;gt; .Renviron

test:
  stage: test
  script:
    - apt-get update
    - apt-get install --yes --no-install-recommends r-cran-testthat r-cran-devtools
    - R -e &amp;quot;devtools::install_deps(dependencies = TRUE)&amp;quot;
    - R CMD build . --no-build-vignettes --no-manual
    - PKG_FILE_NAME=$(ls -1t *.tar.gz | head -n 1)
    - R CMD check &amp;quot;${PKG_FILE_NAME}&amp;quot; --no-build-vignettes --no-manual
  cache:
    key: &amp;quot;$CI_COMMIT_REF_SLUG&amp;quot;
    paths:
      - installed_deps/
  artifacts:
    paths:
      - '*.Rcheck/'
  only:
    - master
    - dev

code_quality:
  stage: quality
&lt;/code>&lt;/pre>
&lt;p>What does it do? Well, we start with the base R docker image from the &lt;em>rocker&lt;/em> project, making sure that the package lists are up to date (the image is built on Debian Linux, so packages are managed with apt, the front end to dpkg).&lt;/p>
&lt;p>Before we start on the actual job scripts, we run some code to make a folder called &lt;code>installed_deps&lt;/code> and declare some environment variables in &lt;code>.Renviron&lt;/code> to let R know this is the place to install packages. This is important for caching, described further below.&lt;/p>
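&lt;p>The &lt;code>before_script:&lt;/code> lines can be reproduced locally in a throwaway directory to see exactly what ends up in &lt;code>.Renviron&lt;/code>:&lt;/p>

```shell
# Recreate the before_script in a throwaway directory
cd "$(mktemp -d)"
mkdir -p installed_deps
echo 'R_LIBS="installed_deps"' > .Renviron
echo 'R_LIBS_USER="installed_deps"' >> .Renviron
echo 'R_LIBS_SITE="installed_deps"' >> .Renviron
# R reads .Renviron on startup, so all three library path
# variables now point at the cacheable folder
cat .Renviron
```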
&lt;p>Next we focus on the test job. We install the bare minimum R libraries on top of base R so that we can test the package: &lt;code>testthat&lt;/code> and &lt;code>devtools&lt;/code>. Note we elect to install no recommended packages alongside to make sure the container is as debloated as possible.&lt;/p>
&lt;p>After this we install all the dependencies that are specified in the DESCRIPTION file you wrote as part of your R package. When I used the original Stack answer above, it did not have the &lt;code>dependencies = TRUE&lt;/code> argument, which meant that none of my suggested packages were installed. This caused my build to fail because some of my tests depended on them; however, for most use cases you may not need this argument.&lt;/p>
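&lt;p>For context: with its default arguments &lt;code>install_deps()&lt;/code> installs only the hard dependencies (Depends, Imports, LinkingTo), while &lt;code>dependencies = TRUE&lt;/code> also pulls in the &lt;code>Suggests:&lt;/code> field. A hypothetical DESCRIPTION fragment (package and field values invented for illustration):&lt;/p>

```
Package: mypkg
Imports:
    data.table
Suggests:
    testthat,
    knitr
```

&lt;p>Packages under Suggests are exactly the ones tests tend to use, which is why the default left my pipeline broken.&lt;/p>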
&lt;p>As part of this stage we also define the cache to include the R library folder we set up in the &lt;code>before_script:&lt;/code> section. This makes subsequent jobs run a &lt;strong>lot&lt;/strong> faster (&amp;gt;5 times in my experience) because we don&amp;rsquo;t have to install packages again if they haven&amp;rsquo;t changed on CRAN. Note that this cache is only available for this stage, unless we define the cache outside of the stage.&lt;/p>
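&lt;p>To share the cache across all jobs rather than just this one, the same block can simply be moved to the top level of the file:&lt;/p>

```yaml
# Top-level cache, shared by every job in the pipeline
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - installed_deps/
```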
&lt;p>The last three lines in the script build the package source as a &lt;em>tar.gz&lt;/em> compressed file, store the package name from this file into a variable, and then use it to call &lt;code>CMD check&lt;/code>. This final line tests both for whether all unit tests described in the &lt;em>/tests&lt;/em> folder have passed, and whether the package can be built without errors. Note that for building the source and checking it we specify that we will not build vignettes or man pages. Although testing these might be useful they take a long time to run. In lieu of testing the vignette, I will often run a battery of unit tests using &lt;code>context(&amp;quot;Testing according to the vignette&amp;quot;)&lt;/code>, replicating each line of the vignette in tests.&lt;/p>
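&lt;p>The &lt;code>ls -1t | head -n 1&lt;/code> trick simply picks the most recently modified tarball, so the check step never needs to hard-code the package version. A standalone sketch with dummy file names:&lt;/p>

```shell
# Demonstrate picking the newest *.tar.gz by modification time
cd "$(mktemp -d)"
touch mypkg_0.1.0.tar.gz
sleep 1                      # ensure distinct modification times
touch mypkg_0.2.0.tar.gz
# ls -1t sorts newest first; head takes the top entry
PKG_FILE_NAME=$(ls -1t *.tar.gz | head -n 1)
echo "$PKG_FILE_NAME"        # the newest build
```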
&lt;p>Artifacts are files that can be downloaded after the job is successfully run. We specify the folder where the compressed source file is created as an artifact.&lt;/p>
&lt;p>The &lt;code>only:&lt;/code> section within the test job lists the branches on which we wish to run the job. Instead of whitelisting branches we could deal by exception, using &lt;code>except:&lt;/code>. The configuration above runs the test job only on the master and dev branches.&lt;/p>
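&lt;p>The blacklist equivalent would run the job on every branch except those listed, e.g. (branch name invented):&lt;/p>

```yaml
test:
  stage: test
  except:
    - experimental   # hypothetical branch we never want tested
```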
&lt;p>The last job is from a Gitlab template that tests for code quality. I&amp;rsquo;m not sure how useful it is for R but it is a good illustration of how to chain jobs into a CI pipeline. Both test and code quality jobs must succeed for the pipeline to pass.&lt;/p>
&lt;h2 id="diy-runners">DIY runners&lt;/h2>
&lt;p>A note on using CI pipelines on the Gitlab free tier: it is &lt;strong>very&lt;/strong> easy to chew through the 200 free minutes per month. I managed to consume around 70% of this in a day and a half of experimenting, which spurred me on to make my own runner to get around this limitation.&lt;/p>
&lt;p>The &lt;a href="https://www.algorist.co.uk/project/internal-project/gitlab-runner/" target="_blank" rel="noopener">Gitlab-runner project&lt;/a> is the result of this experimentation. I successfully set up a runner on a Raspberry Pi 4 with 8GB of RAM, but since the &lt;em>rocker/r-base&lt;/em> image is not ARM-compatible the job failed. One possibility is to change the image; the other is to use a computer with a compatible architecture. In the end I used my other laptop and it ran just fine; I then changed the base image to the one in my &lt;a href="https://github.com/Daveyr/armr" target="_blank" rel="noopener">armr project&lt;/a> and it ran fine on my Raspberry Pi as well. No more limited pipeline minutes!&lt;/p></description></item><item><title>Using Rstudio Server with docker</title><link>https://www.algorist.co.uk/post/using-rstudio-server-with-docker/</link><pubDate>Sat, 30 Jan 2021 00:00:00 +0000</pubDate><guid>https://www.algorist.co.uk/post/using-rstudio-server-with-docker/</guid><description>
&lt;p>It’s taking a long time to run my genetic algorithm optimisation models recently. So much so that I’ve been looking at offloading processes to other computers lying idle on the network. The &lt;a href="https://github.com/Daveyr/armr">armr&lt;/a> project aims to do this with parallel processing and Rstudio server docker images running on the raspberry pi but this is a work in progress currently, chiefly due to having to build Rstudio server from source.&lt;/p>
&lt;p>In the meantime I have managed to run Rstudio server in a Docker container on my personal laptop, logging into it on my work laptop. Here’s how I did it, using the image provided by the &lt;a href="https://www.rocker-project.org/">Rocker project&lt;/a>.&lt;/p>
&lt;div id="on-the-host-machine" class="section level2">
&lt;h2>On the host machine&lt;/h2>
&lt;p>Assuming you already have docker installed, run the below code in a terminal on the host machine.&lt;/p>
&lt;pre class="bash">&lt;code>docker pull rocker/rstudio
docker run --rm -p 8787:8787 -e PASSWORD=&amp;quot;password&amp;quot; rocker/rstudio&lt;/code>&lt;/pre>
&lt;p>The default user name is “rstudio”. If you want to set the user name, add &lt;code>-e USER="user"&lt;/code> to the command above.&lt;/p>
&lt;p>As easy as that. To make the image usable we would have to create a new dockerfile based on this image and add run commands that install packages within R. For example,&lt;/p>
&lt;pre class="bash">&lt;code>from rocker/rstudio
RUN R -e &amp;quot;install.packages(&amp;#39;tidyverse&amp;#39;)&amp;quot;&lt;/code>&lt;/pre>
&lt;p>The base &lt;code>armr&lt;/code> dockerfile shows how to do this with an install script that reads a requirements.txt file - much quicker than installing each package individually. For saving work, you’ll also need to add a volume flag to the docker run command, e.g.,&lt;/p>
&lt;pre class="bash">&lt;code>docker run --rm -v $(pwd):/home/user/ -p 8787:8787 -e USER=&amp;quot;user&amp;quot; -e PASSWORD=&amp;quot;password&amp;quot; rocker/rstudio&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="on-the-external-machine" class="section level2">
&lt;h2>On the external machine&lt;/h2>
&lt;p>To access from another computer on the network, open a browser and navigate to &lt;code>hostname:8787&lt;/code>, where hostname is the hostname of the computer running the container. This can be found by running &lt;code>hostname&lt;/code> on it from the terminal (Linux only). If you want to access Rstudio server on the go, away from home, then you will need to do the following.&lt;/p>
&lt;ul>
&lt;li>Issue a static IP address to the host computer, most easily done using your router&lt;/li>
&lt;li>Set up port forwarding to forward port 8787 from the static IP to the outside world&lt;/li>
&lt;li>Read up on security settings in Rstudio server and implement them! These are likely to include IP whitelisting, certificate only authentication, banning IP addresses after several failed attempts (see &lt;em>fail2ban&lt;/em>) and more&lt;/li>
&lt;/ul>
&lt;/div></description></item><item><title>How to run R using Docker on Raspberry Pi</title><link>https://www.algorist.co.uk/post/how-to-run-r-using-docker-on-raspberry-pi/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://www.algorist.co.uk/post/how-to-run-r-using-docker-on-raspberry-pi/</guid><description>
&lt;p>When I began learning about how to use Docker I stumbled on an excellent project called &lt;a href="https://www.rocker-project.org/">Rocker&lt;/a>. These Rocker images allow anyone with an x86 machine to run R and most of its dependencies in a containerised environment. Plumber APIs, anyone? What about your own Shiny server? Finally, data scientists using R can have the same level of control over dependencies and package versions as Python users have become accustomed to through &lt;code>venv&lt;/code>.&lt;/p>
&lt;p>Things are a little more complicated for ARM users, especially 32-bit ARM architectures such as the Raspberry Pi. No Rocker images offer such compatibility, so we’re on our own. This is the major reason I’ve started a project called &lt;a href="https://www.algorist.co.uk/project/armr/">ARMR&lt;/a>: to build a series of Docker images that &lt;strong>do&lt;/strong> offer compatibility with the lovable credit card sized computer.&lt;/p>
&lt;div id="hello-world" class="section level1">
&lt;h1>Hello woRld&lt;/h1>
&lt;p>Whilst not much has happened with the project so far, at least I have a version of “Hello woRld”: a container with r-base installed. But first we must install Docker.&lt;/p>
&lt;div id="installation" class="section level2">
&lt;h2>Installation&lt;/h2>
&lt;p>From the terminal on a Raspberry Pi, run the following.&lt;/p>
&lt;pre>&lt;code># Downloads installation shell script and pipes it into the sh command
curl -sSL https://get.docker.com | sh
# Adds pi to the docker group so the user can run docker without sudo
sudo usermod -aG docker pi&lt;/code>&lt;/pre>
&lt;p>From here we can either reboot or run &lt;code>systemctl start docker.service&lt;/code> to start up Docker. To test it is working, try &lt;code>docker info&lt;/code>, then &lt;code>docker version&lt;/code>.&lt;/p>
&lt;p>Once you know that Docker is running, let’s try a few things in order of sophistication. &lt;code>docker run hello-world&lt;/code> will run a container based on an image called &lt;em>hello-world&lt;/em>. &lt;code>docker run -it ubuntu bash&lt;/code> takes it up a notch: now we have an ubuntu bash container running in interactive mode in the terminal.&lt;/p>
&lt;p>To make it even more useful, we ought to have access to persistent storage. Let’s modify the command to include a mount volume.&lt;/p>
&lt;pre>&lt;code>docker run -it -v /home:/home ubuntu bash&lt;/code>&lt;/pre>
&lt;p>The &lt;code>-v&lt;/code> flag tells Docker to attach a volume; the following argument contains the information on what locations should be used, of the form &lt;em>from_volume:to_volume&lt;/em>. The location on your machine is the &lt;em>from_volume&lt;/em> and the location on your container is the &lt;em>to_volume&lt;/em>. In our example, anything you create in the home folder within the container will persist in the home folder of your Raspberry Pi after you close the container. The easiest way to test this is to type &lt;code>touch myfile&lt;/code> in the interactive terminal in the container, and watch the same file appear in your home folder.&lt;/p>
&lt;/div>
&lt;div id="build-an-r-container" class="section level2">
&lt;h2>Build an R container&lt;/h2>
&lt;p>Building a base R container is as simple as writing the code below to a file called &lt;code>Dockerfile&lt;/code>. We use an arm32 ubuntu image as a base, from which we set an environment variable to force the terminal to be non-interactive. This is because when r-base is installed it waits for user input when setting parameters, hanging the container build. By setting the installation to be non-interactive we accept all the defaults, including timezone. Be mindful of this when handling datetimes!&lt;/p>
&lt;pre>&lt;code>FROM arm32v7/ubuntu
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update &amp;amp;&amp;amp; apt install r-base -y&lt;/code>&lt;/pre>
&lt;p>Once the Dockerfile is created you build it by typing &lt;code>docker build -t armr .&lt;/code> in the same folder in the terminal (the trailing dot sets the build context). Docker then builds an image with the tag &lt;em>armr&lt;/em>. It builds by starting with the base image, setting the environment variable and adding a layer that comprises the result of the &lt;code>RUN&lt;/code> command.&lt;/p>
&lt;p>In fact, you can see all the layers that are built into any image by running &lt;code>docker history &amp;lt;image_name&amp;gt;&lt;/code> in the terminal.&lt;/p>
&lt;/div>
&lt;/div>
&lt;div id="whats-next" class="section level1">
&lt;h1>What’s next?&lt;/h1>
&lt;p>Base R is fine but to be useful we need to add a lot more packages and supporting software. Future development is likely to encompass the following:&lt;/p>
&lt;ul>
&lt;li>Shiny server (I’d like to host one on this site)&lt;/li>
&lt;li>Plumber API server&lt;/li>
&lt;li>Rstudio server (so I can do analysis from anywhere on anything, even a tablet)&lt;/li>
&lt;li>Images for commonly used packages (Tidyverse, data.table, caret, etc.)&lt;/li>
&lt;/ul>
&lt;/div></description></item><item><title>armr</title><link>https://www.algorist.co.uk/project/internal-project/armr/</link><pubDate>Sun, 26 Jul 2020 23:14:11 +0100</pubDate><guid>https://www.algorist.co.uk/project/internal-project/armr/</guid><description/></item><item><title>Gitlab Runner</title><link>https://www.algorist.co.uk/project/internal-project/gitlab-runner/</link><pubDate>Thu, 06 Feb 2020 10:43:00 +0000</pubDate><guid>https://www.algorist.co.uk/project/internal-project/gitlab-runner/</guid><description>&lt;h1 id="readme">README&lt;/h1>
&lt;p>A runner is a server that is able to execute a CI/CD pipeline. These pipelines contain one or more jobs that check a version controlled repository for build errors, code style, etc. This repository contains a dockerfile and bash scripts to make a Gitlab runner. It is intended to be used with a Gitlab repository that has a CI/CD pipeline configured according to a &lt;code>.gitlab-ci.yml&lt;/code> file in the parent directory.&lt;/p>
&lt;p>As long as the base image in your .gitlab-ci.yml file is compatible with arm architecture, this runner will work on a Raspberry Pi. If the base image isn&amp;rsquo;t compatible then you will have the same experience as &lt;a href="https://www.talvbansal.me/blog/maximising-gitlab-ci-s-free-tier/" target="_blank" rel="noopener">this&lt;/a>. Of course, you can build this image on an x86 machine and it will work with most base images a CI file is likely to contain.&lt;/p>
&lt;h2 id="to-build">To build&lt;/h2>
&lt;p>Instructions from &lt;a href="https://www.devils-heaven.com/gitlab-runner-finally-use-your-raspberry-pi/">https://www.devils-heaven.com/gitlab-runner-finally-use-your-raspberry-pi/&lt;/a>. Note that you should copy the token that is located on your gitlab account, found under the project menu:
&lt;code>Settings &amp;gt; CI/CD &amp;gt; Runners &amp;gt; Specific Runners&lt;/code>&lt;/p>
&lt;pre>&lt;code>docker build -t gitlab-runner --build-arg token=&amp;lt;TOKEN&amp;gt; .
&lt;/code>&lt;/pre>
&lt;h2 id="to-run">To run&lt;/h2>
&lt;p>From &lt;a href="https://itnext.io/docker-in-docker-521958d34efd">https://itnext.io/docker-in-docker-521958d34efd&lt;/a>. Note we need to mount a volume as we run the container. This is because the runner depends on Docker. Running Docker in a Docker container is tricky so the easiest method is to &amp;ldquo;borrow&amp;rdquo; the host Docker service instead, which we do by mounting its .sock file.&lt;/p>
&lt;pre>&lt;code>docker run -v /var/run/docker.sock:/var/run/docker.sock gitlab-runner
&lt;/code>&lt;/pre></description></item></channel></rss>