Abstract

The BiocKubeInstall package is a cloud computing resource meant to be used with Kubernetes. The package creates binaries for R and Bioconductor packages on specific docker images created by Bioconductor. It works in concert with k8sredis, a Kubernetes application, and the RedisParam package, which parallelizes work over a Kubernetes cluster. The package requires knowledge of Google Cloud, Kubernetes, and binary package creation. It is an internal Bioconductor package, used to make creating and maintaining package binaries easier and faster.

Prerequisites

The following steps assume that you have a Google Cloud account, the Kubernetes command-line tool (kubectl) installed, and your Google account configured, and that at a minimum the following APIs are enabled on the project:

- Kubernetes Engine Admin
- Storage Admin
- Service account Admin

These services cost money, and are to be used at your expense.
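
The tool prerequisites above can be checked before starting. The following is a minimal sketch, not part of BiocKubeInstall or k8sredis: the helper name missing_tools is hypothetical, and it only tests whether a command is on the PATH, not whether your Google account or project APIs are configured.

```shell
# missing_tools TOOL...
# Hypothetical helper: prints the name of every listed command that is
# not found on the PATH. Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Report which of the two command-line tools used in this document are absent.
missing="$(missing_tools gcloud kubectl)"
if [ -n "$missing" ]; then
    echo "Please install: $missing"
else
    echo "gcloud and kubectl found"
fi
```

Checking the account and enabled APIs still has to be done separately, for example in the Google Cloud console.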

Kubernetes setup

The ‘BiocKubeInstall’ package uses the k8sredis Kubernetes configuration files to apply the Kubernetes configuration to the GKE cluster.

Once you have k8sredis downloaded from https://github.com/Bioconductor/k8sredis, you can do the following.

Steps:

Default settings used in the following commands are based on local preferences. Please change them according to your needs.

cluster name: biock8scluster

zone: us-east1-b

service account name: bioc-binaries

project ID: fancy-house-303821
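
Because the same four values recur in the commands below, one option is to keep them in shell variables and derive the service account address from them. This is a sketch under that assumption; the variable names (CLUSTER_NAME, ZONE, SA_NAME, PROJECT_ID, SA_EMAIL) are illustrative and are not read by gcloud, kubectl, or k8sredis.

```shell
# Defaults taken from the list above; override by exporting a variable
# before running this snippet.
CLUSTER_NAME="${CLUSTER_NAME:-biock8scluster}"
ZONE="${ZONE:-us-east1-b}"
SA_NAME="${SA_NAME:-bioc-binaries}"
PROJECT_ID="${PROJECT_ID:-fancy-house-303821}"

# Service account addresses follow the pattern NAME@PROJECT.iam.gserviceaccount.com,
# as seen in the gcloud commands in this document.
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

echo "cluster:         $CLUSTER_NAME ($ZONE)"
echo "service account: $SA_EMAIL"
```

With these set, a command such as the credentials step could be written as gcloud container clusters get-credentials --zone "$ZONE" "$CLUSTER_NAME".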

  1. Start a kubernetes cluster,

     gcloud container clusters create \
         --zone us-east1-b \
         --num-nodes=6 \
         --machine-type=e2-standard-4 biock8scluster
  2. Get the cluster credentials on your local machine,

     gcloud container clusters get-credentials --zone us-east1-b biock8scluster
  3. Create a service account and its key. The service account is given ‘Storage Admin’ permissions, so it can upload the binaries to a Google bucket.

     ## Create service account
     gcloud iam service-accounts create bioc-binaries \
         --display-name "Storage Admin SA" \
         --description "Bioc Binaries storage admin"
    
     ## List service account
     gcloud iam service-accounts list \
         --filter bioc-binaries@fancy-house-303821.iam.gserviceaccount.com
    
     ## Download service account key locally
     gcloud iam service-accounts keys create \
         bioc-binaries.json \
         --iam-account bioc-binaries@fancy-house-303821.iam.gserviceaccount.com
    
     ## Add 'Storage Admin' role to service account.
     gcloud projects add-iam-policy-binding fancy-house-303821 \
         --member \
         "serviceAccount:bioc-binaries@fancy-house-303821.iam.gserviceaccount.com" \
         --role "roles/storage.admin"
  4. Start the NFS volume that will be attached to your k8s compute nodes. The NFS mount is where the binary packages will be stored. The NFS volume is mounted under the directory /host on your compute nodes.

     kubectl apply -f k8s/nfs-volume/
  5. Create a Kubernetes secret from the service account key, bioc-binaries.json, which was downloaded in step 3.

    The service account key is going to be transmitted to the Kubernetes cluster in an encrypted manner. The command below assumes that bioc-binaries.json is located in the directory where you are running the kubectl commands.

     kubectl create secret generic \
         bioc-binaries-service-account-auth \
         --from-file=service_account_key=bioc-binaries.json
  6. Once the secret is created, you can launch the bioc-redis configuration on your Kubernetes cluster.

     kubectl apply -f k8s/bioc-redis
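
After the steps above, it can be useful to confirm that the key file was present and that the secret and the manager pod exist. The sketch below is one way to do that, not something k8sredis provides: the function names check_key_file, run, and verify_setup and the DRY_RUN variable are hypothetical, while the secret name and pod/manager are the ones used in this document. Setting DRY_RUN=1 prints the kubectl commands instead of running them.

```shell
# check_key_file FILE
# Succeeds only if the service account key file exists and is not empty.
check_key_file() {
    if [ ! -s "$1" ]; then
        echo "missing or empty key file: $1" >&2
        return 1
    fi
}

# run CMD...
# Executes the command, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# verify_setup
# Checks that the secret exists, lists the pods, and waits up to five
# minutes for the manager pod to report Ready.
verify_setup() {
    run kubectl get secret bioc-binaries-service-account-auth &&
    run kubectl get pods &&
    run kubectl wait --for=condition=Ready pod/manager --timeout=300s
}
```

A typical use would be check_key_file bioc-binaries.json before step 5 and verify_setup after step 6.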

Manage kubernetes cluster (if needed)

A few commands which help with the management of the kubernetes cluster are:

  • Check if all the pods, services, and nodes are working as expected:

      kubectl get all
  • Describe specific pods or nodes to check for failures:

      kubectl describe pod/manager
  • Log into a specific pod:

      kubectl exec pod/manager --stdin --tty -- /bin/bash

To delete and destroy your kubernetes cluster safely,

## Stop redis, rstudio services
kubectl delete -f k8s/bioc-redis/

## Delete NFS volume (this is not recommended)
kubectl delete -f k8s/nfs-volume/

## Delete the Kubernetes cluster completely, losing all data.
gcloud container clusters delete --zone us-east1-b biock8scluster
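
Since each teardown command is more destructive than the one before it, one cautious approach is to print the commands for a chosen level and review them before running anything. The sketch below assumes the default cluster name and zone used in this document; the function teardown_commands and its level names (services, volume, cluster) are hypothetical.

```shell
# teardown_commands [LEVEL] [CLUSTER] [ZONE]
# Prints, without executing, the teardown commands up to LEVEL:
#   services  stop the redis and rstudio services only
#   volume    additionally delete the NFS volume
#   cluster   additionally delete the whole cluster
teardown_commands() {
    level="${1:-services}"
    cluster="${2:-biock8scluster}"
    zone="${3:-us-east1-b}"
    case "$level" in
        services|volume|cluster) ;;
        *) echo "unknown level: $level" >&2; return 1 ;;
    esac
    echo "kubectl delete -f k8s/bioc-redis/"
    if [ "$level" = "services" ]; then return 0; fi
    echo "kubectl delete -f k8s/nfs-volume/"
    if [ "$level" = "volume" ]; then return 0; fi
    echo "gcloud container clusters delete --zone $zone $cluster"
}

# Show what the least destructive level would run.
teardown_commands services
```

After reviewing the output, the printed lines can be run by hand or piped to sh.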

BiocKubeInstall

Introduction

Cluster architecture and design

The k8s application launches multiple pods on the nodes within the cluster. The pods are classified into two types: a single “manager” pod and multiple “worker” pods. The manager pod delegates responsibility, i.e., the jobs are sent to the workers, which actually perform the task. The worker pods are always listening to the manager and waiting for a job. Each worker performs a single job until it is done.

The manager and the workers pass messages to each other using “redis”, and the redis functionality is provided by the package “RedisParam”.

Storage is also built into the k8s application. The manager and the worker also share a volume mount on the k8s cluster in the form of an NFS volume. The volume is mounted on the path /host/.

The “jobs” here refer to the building of binary R/Bioconductor packages. Each pod gets a single job, where the job is to build a single package binary.
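
The one-package-per-worker idea can be illustrated with a toy simulation. This is only a sketch of the dispatch pattern described above and makes no use of redis, RedisParam, or Kubernetes: the function simulate_dispatch is hypothetical, the package names are examples, and it hands out jobs in lockstep rounds, whereas real workers would pick up a new job whenever they become free.

```shell
# simulate_dispatch WORKERS PACKAGE...
# Hands each package (one "job") to the next idle worker. When every
# worker is busy, the remaining packages wait for the next round.
simulate_dispatch() {
    workers="$1"
    shift
    round=1
    while [ "$#" -gt 0 ]; do
        w=1
        while [ "$w" -le "$workers" ] && [ "$#" -gt 0 ]; do
            echo "round $round: worker-$w builds $1"
            shift
            w=$((w + 1))
        done
        round=$((round + 1))
    done
}

simulate_dispatch 2 BiocGenerics S4Vectors IRanges
# round 1: worker-1 builds BiocGenerics
# round 1: worker-2 builds S4Vectors
# round 2: worker-1 builds IRanges
```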

Run

To run an actual job, there is one relevant function: BiocKubeInstall::kube_run().

The function needs to be called from within the k8sredis/k8s/bioc-redis/manager-pod.yaml file in your k8sredis k8s application.

The function takes the following arguments: version, the version of Bioconductor; image_name, the docker image for which the binaries are being built; and worker_pool_size, the number of workers that need to run in parallel. Within the specification of manager-pod.yaml, the call is passed as follows:

args: ["-e", "BiocKubeInstall::kube_run(version = '3.13',
                    image_name = 'bioconductor_docker',
                    worker_pool_size = 17)"]

The docker image of the manager node is always bioconductor/bioc-redis:devel.
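
When the three arguments change between runs, the R expression passed in args can be generated rather than edited by hand. The sketch below only builds the string for the kube_run() call shown above; the helper kube_run_expr is hypothetical, and how the string is then placed into manager-pod.yaml is left to you.

```shell
# kube_run_expr VERSION IMAGE_NAME WORKER_POOL_SIZE
# Prints the R expression for BiocKubeInstall::kube_run() with the given
# arguments. Fails if the worker pool size is not a whole number.
kube_run_expr() {
    case "$3" in
        ''|*[!0-9]*)
            echo "worker_pool_size must be a whole number: '$3'" >&2
            return 1
            ;;
    esac
    printf "BiocKubeInstall::kube_run(version = '%s', image_name = '%s', worker_pool_size = %s)\n" \
        "$1" "$2" "$3"
}

kube_run_expr 3.13 bioconductor_docker 17
# BiocKubeInstall::kube_run(version = '3.13', image_name = 'bioconductor_docker', worker_pool_size = 17)
```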

Usage of Binaries on bioconductor_docker images

BiocManager::install('BiocParallel')

Usage of Binaries on the AnVIL

BiocManager::install('AnVIL')

library(AnVIL)
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BiocStyle_2.25.0
## 
## loaded via a namespace (and not attached):
##  [1] rstudioapi_0.13     knitr_1.39          magrittr_2.0.3     
##  [4] R6_2.5.1            ragg_1.2.2          rlang_1.0.4        
##  [7] fastmap_1.1.0       stringr_1.4.0       tools_4.2.1        
## [10] xfun_0.32           cli_3.3.0           jquerylib_0.1.4    
## [13] systemfonts_1.0.4   htmltools_0.5.3     yaml_2.3.5         
## [16] digest_0.6.29       rprojroot_2.0.3     pkgdown_2.0.6      
## [19] bookdown_0.28       textshaping_0.3.6   BiocManager_1.30.18
## [22] purrr_0.3.4         sass_0.4.2          fs_1.5.2           
## [25] memoise_2.0.1       cachem_1.0.6        evaluate_0.16      
## [28] rmarkdown_2.15      stringi_1.7.8       compiler_4.2.1     
## [31] bslib_0.4.0         desc_1.4.1          jsonlite_1.8.0