Testing Packages on Linux ARM64 with GitHub Actions

How to use GitHub Actions to systematically build and test a Bioconductor package on Linux ARM64 architecture.

tech-notes
GitHub Actions
ARM64
Author
Published

July 14, 2023

Bioconductor on Linux ARM64

Introduction

The Bioconductor Build System (BBS) now includes routine package testing on Linux ARM64, but the relatively low frequency of testing means this, if a problem occurs with your package, it can take a while to identify and fix the issue using the build system alone.

The previous blog post “Emulated build and test of Bioconductor packages for Linux ARM64” described how one can use Docker and architecture emulation to build, test, and debug a Bioconductor package for running on Linux ARM64 architecture when you only have access to local x86 hardware.

However such manual testing can be frustrating to run as a package developer, either because it’s an extra task you have to run frequently, or because you only do it occasionally and forget the steps involved. Ideally such testing would happen automatically whenever you make changes to a package, but providing more rapid feedback than the BBS provides. In this article we build on these previously presented ideas to describe one approach for testing package on Linux ARM64 using a continuous integration environment on GitHub Actions.

Workflow Implementation

An example workflow implementation can be found at https://github.com/grimbough/bioc-testing-with-arm64/blob/main/.github/workflows/test-package-arm64.yml. In the remainder of this post we’ll discuss some of the implementation choices made there and how they work.

Choice of Docker container

The first thing to remember when using architecture emulation is that everything works much slower than when running natively - typically by at least an order of magnitude. This influences some of the decisions made during this workflow regarding which containers to use and what we want to cache between workflow steps. Some operations that might be acceptable in a standard workflow become painfully slow under emulation, and so we try reduce the number of slow steps.

The first of these is to use a modified version of the Bioconductor:devel docker image which has TinyTex pre-installed. This allows us to compile the package manual pages and and PDF vignettes during testing. Installing TinyTex and the required LaTeX packages takes approximately 10 minutes on our emulated system, so there is a noticeable time benefit to using an image with it already installed. The modified image can be found at ghcr.io/grimbough/bioc-with-tinytex:devel-arm64

Note: You could probably achieve a similar result by using the standard Bioconductor container and running R CMD check with the arguments --no-manual and --no-build-vignettes, however I would rather run the complete testing process in case there is problematic code in either the manual page examples or vignette.

Installing packages

Installing packages that require compilation is also incredibly slow on our emulated system, so it’s immediately desirable to cache the library of installed packages that are needed for testing. The actions/cache GitHub action does a good job of this, however it will only create a cache after a successful job run. Given we’re creating this workflow to test a potentially problematic package, it can be frustrating to repeatedly wait several hours for all the necessary packages to install, because you haven’t managed to fix the issue.

Given this, we can split our workflow into two jobs; the first installs the packages while the second runs the actual package tests. With this structure a failure in the second job doesn’t prevent the cache mechanism from working and makes repeated runs much faster.

Now lets take a look at the first few steps on our install-dependencies job and explain what’s happening. Most of these steps are pretty standard for regular users of GitHub Actions.

  install-dependencies:
    name: Install package dependencies
    runs-on: ubuntu-22.04
    
    steps:
      - name: checkout
        uses: actions/checkout@v3
        
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
        with:
          platforms: arm64

First off we’re checking out the git repository the workflow is found in. That’s probably just the package you’re testing. Then we’re using the docker/setup-qemu-action to install the QEMU emulator discussed in the previous post.

Now we set up the library cache.

      - name: Make R library
        run: mkdir -p ${RUNNER_TEMP}/R-lib

      - name: Cache Dependencies
        id: cache-deps
        uses: actions/cache@v3
        with:
          path: ${{ runner.temp }}/R-lib
          key: R_lib-ARM64-${{ hashFiles('**/DESCRIPTION') }}
          restore-keys: |
            R_lib-ARM64-${{ hashFiles('**/DESCRIPTION') }}
            R_lib-ARM64-

Initially we create an empty directory on our runner. In this example this is in the runners temporary directory, but it could be anywhere. We’ll later mount this location into our Docker container, and it will contain the installed package library. We have to create it outside of the Docker container and mount it so that the caching mechanism will work. If this location was created inside the Docker container, it would disappear when the container was destroyed, and we wouldn’t be able to retain the contents.

We then provide this location to the actions/cache action, and use a hash of the DESCRIPTION file to tag our cache. Update the DESCRIPTION e.g. to add a new dependency or bump the version number and a new cache will be created. This isn’t perfect, as it won’t necessary capture updates to installed packages in the library, but it does a reasonable job with being too complex.

The final step of the job is to install the dependencies.

      - name: Run the build process with Docker
        uses: addnab/docker-run-action@v3
        with:
          image: ghcr.io/grimbough/bioc-with-tinytex:devel-arm64
          options: |
            --platform linux/arm64
            --volume ${{ runner.temp }}/R-lib:/R-lib
            --volume ${{ github.workspace }}/../:/build
            --env R_LIBS_USER=/R-lib
          run: |
            echo "options(Ncpus=2L, timeout = 300)" >> ~/.Rprofile
            Rscript -e 'pkgs <- remotes::dev_package_deps("/build/examplePKG", dependencies = TRUE)' \
                    -e 'BiocManager::install(pkgs$package, update = TRUE, ask = FALSE)'

We use the addnab/docker-run-action action to run this step inside a docker container and provide the image argument with the TinyTex arm64 container discussed earlier.

The options argument supplies arguments you would give Docker at the command line if you were running it locally. Here we set the platform to linux/arm64 to work with the QEMU emulation. We mount two locations from our runner into to container: the location of the library we created earlier and the directory containing the package to be tests. Inside the container these will be found at /R-lib and /build respectively. We also set the R_LIBS_USER environment variable, so R will use the mounted library in preference to anywhere else.

The run section is where we provide the command to be executed inside the container. First there’s an optional step to set the number of CPUs R should use by default. Currently GitHub runners are dual core and there’s a performance benefit to ensuring R uses both of these when installing multiple packages from source as we’re doing here. Then we use the remotes and BiocManager packages to list the package dependencies and install them.

If this job executes successfully we should have a cached library containing all the packages required to test the package.

Running the package tests

The second job in our workflow will carry out the package tests. We can use the needs: argument to specify that this job requires the install-dependencies job to have completed successfully. Without specifying this GitHub Actions will try to run the two jobs simultaneously, which clearly isn’t appropriate.

  check-arm64:
    name: Test package on ARM64
    runs-on: ubuntu-22.04
    needs:   install-dependencies
    steps:

      - name: checkout
        uses: actions/checkout@v3

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
        with:
          platforms: arm64

      - name: Make R library
        run: mkdir -p ${RUNNER_TEMP}/R-lib

      - name: Cache Dependencies
        id: cache-deps
        uses: actions/cache@v3
        with:
          path: ${{ runner.temp }}/R-lib
          key: R_lib-ARM64-${{ hashFiles('**/DESCRIPTION') }}
          restore-keys: |
            R_lib-ARM64-${{ hashFiles('**/DESCRIPTION') }}
            R_lib-ARM64-

The first few steps are the same as before, checking out the package repository, installing QEMU, and then restoring the cached set of packages.

Next we can again use the addnab/docker-run-action action to execute our tests inside a docker container. We use the same container image and set of options as before to mount the package and library locations, as well as supplying the --workdir argument to ensure the following commands are executed in the folder where the package directory can be found.

      - name: Test Package
        uses: addnab/docker-run-action@v3
        with:
          image: ghcr.io/grimbough/bioc-with-tinytex:devel-arm64
          options: |
            --platform linux/arm64
            --volume ${{ runner.temp }}/R-lib:/R-lib
            --volume ${{ github.workspace }}:/build
            --env R_LIBS_USER=/R-lib
            --workdir /build
          run: |
            ## Install and store the log like on the BioC Build System
            R CMD INSTALL examplePKG &> examplePKG.install-out.txt
            if [ $? -eq  1 ]; then
              cat examplePKG.install-out.txt
              exit 1;
            fi
            
            ## build the package
            R CMD build --keep-empty-dirs --no-resave-data examplePKG
            if [ $? -eq  1 ]; then exit 1; fi
            
            ## Check the package using the shortcut from the BBS
            R CMD check --install=check:examplePKG.install-out.txt --library="${R_LIBS_USER}" --no-vignettes --timings examplePKG*.tar.gz
            if [ $? -eq  1 ]; then exit 1; fi
            
            ## build a package binary for Linux ARM64
            mkdir -p examplePKG.buildbin-libdir
            R CMD INSTALL --build --library=examplePKG.buildbin-libdir examplePKG*.tar.gz
            if [ $? -eq  1 ]; then exit 1; fi
          shell: bash

We use the run option to provide steps similar to the Bioconductor Build System. There are four distinct stages to this process: install, build, check, and build binary. The arguments and setting used here are representative of the BBS, but one can change them if other testing mechanism are required. You could also choose to split this into four separate job steps if you wanted more fine grained control.

One minor wrinkle when running this in a Docker container is that GitHub Actions will use the return code of the Docker process to determine whether the step has failed or not, rather than the processes run inside the container. Thus it will often give a green tick, even if something went wrong, and it is easy to miss an error if just glancing at the step summaries. To resolve this, after each process in the container we test the return code produced by R and exit if it indicates failure.

Finally, although some of the test outputs will be printed to the workflow log, we might want to make any output available to download and investigate further. To do this we can use the upload-artifact action.

      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: my-artifact
          path: |
            ~/**/*.tar.gz
            ~/**/*.install-out.txt
            ~/**/*.Rcheck
          if-no-files-found: warn

We use if: always() to ensure the upload happens even if a previous step has failed; it’s often more important to get the logs when there’s a problem! Assuming every step executed this should upload both the source and binary tarballs, the installation log file, and the folder produced by R CMD check. Hopefully that is enough information to diagnose any issue.

Conclusion

The combination of GitHub Actions and QEMU provides a platform for testing packages across multiple CPU architectures more rapidly than the Bioconductor Build System. Using them in a continuous integration environment allows one to detect and highlight unforeseen issues introduced by changes to a package or the wider R environment. The same emulation techniques can then be employed on your local development environment to find a solution, before testing again in a your GitHub Workflow, and finally deploying to Bioconductor.