BiocKubeInstall_Tutorial.Rmd
Abstract
The BiocKubeInstall package is a cloud computing resource meant to be used with Kubernetes. The package is used to create binaries for R and Bioconductor packages on specific docker images created by Bioconductor. It works in concert with k8sredis - the Kubernetes application and RedisParam package that parallelizes over a Kubernetes cluster. The package requires knowledge of the google cloud, kubernetes, and binary package creation. It is an internal package to Bioconductor used to make creation and maintainence of package binaries easier and faster.
The following steps assume you have a Google cloud account,
kubernetes controller (kubectl
) installed, and your Google
account configured, and have the following APIs enabled on the project
at a minimum.
- Kubernetes Engine Admin
- Storage Admin
- Service account Admin
These services cost money, and are to be used at your expense.
The ‘BiocKubeInstall’ package uses the k8sredis
kubernetes configuration files to apply the kuberenetes configurations
on to the GKE cluster.
Once you have k8sredis
downloaded from https://github.com/Bioconductor/k8sredis, you can do the
following.
Default settings used in the following commands are based on local preferences. Please change them according to your needs.
cluster name: biock8scluster
zone: us-east1-b
service account name: bioc-binaries
project ID: fancy-house-303821
Start a kubernetes cluster,
gcloud container clusters create \
--zone us-east1-b \
--num-nodes=6 \
--machine-type=e2-standard-4 biock8scluster
Get the cluster credentials on your local machine,
gcloud container clusters get-credentials --zone us-east1-b biock8scluster
Create a service account key. The service account key has ‘Storage Admin’ permissions, so it can upload the binaries to a google bucket.
## Create service account
gcloud iam service-accounts create bioc-binaries \
--display-name "Storage Admin SA" \
--description "Bioc Binaries storage admin"
## List service account
gcloud iam service-accounts list \
--filter bioc-binaries@fancy-house-303821.iam.gserviceaccount.com
## Download service account key locally
gcloud iam service-accounts keys create \
bioc-binaries.json \
--iam-account bioc-binaries@fancy-house-303821.iam.gserviceaccount.com
## Add 'Storage Admin' role to service account.
gcloud projects add-iam-policy-binding fancy-house-303821 \
--member \
"serviceAccount:bioc-binaries@fancy-house-303821.iam.gserviceaccount.com" \
--role "roles/storage.admin"
Start NFS volume which is going to be attached to your k8s
compute nodes. The NFS mount is where the binary packages will be
stored. The location of the NFS volume is under the directory
/host
on your compute nodes.
kubectl apply -f k8s/nfs-volume/
Create a kubernetes secret from the service account key which was
downloaded in the previous steps bioc-binaries.json
.
The service account key is going to be transmitted to the kubernetes
cluster in an encrypted manner. This command below assumes that
bioc-binaries.json
is located in the same path where you
are running the kubectl
commands.
kubectl create secret generic \
bioc-binaries-service-account-auth \
--from-file=service_account_key=bioc-binaries.json
Once the key is created, you can now launch the
bioc-redis
configuration on your kubernetes cluster.
kubectl apply -f k8s/bioc-redis
A few commands which help with the management of the kubernetes cluster are:
Check if all the pods, services, and nodes are working as expected:
kubectl get all
Describe specific pods or nodes, to check for failures
kubectl describe pod/manager
Log into a specific pod,
kubectl exec pod/manager --stdin --tty /bin/bash
To delete and destroy your kubernetes cluster safely,
## Stop redis, rstudio services
kubectl delete -f k8s/bioc-redis/
## Delete NFS volume (this is not recommended)
kubectl delete -f k8s/nfs-volume/
## Delete the kubernetes cluster completely losing all data.
gcloud container clusters delete biock8scluster
The k8s application launches multiple pods on the nodes within the cluster. The pods are classified into two types, a single “manager” pod and multiple “worker” pods. The manager pod delegates reponsibility i.e the jobs are sent to the workers that actually perform the task. The worker pods are always listening to the manager and waiting for a job. Each worker performs a single job till it is done.
The manager and the worker pass messages between each other using “redis”, and the redis work is available inside the package “RedisParam”.
Storage is also built into the k8s application. The manager and the
worker also share a volume mount on the k8s cluster in the form of an
NFS volume. The volume is mounted on the path /host/
.
The “jobs” here refer to the building of binary R/Bioconductor packages. Each pod gets a single job, where the job is to build a single package binary.
To run an actual job, there is one relevant function, i.e
BiocKubeInstall::kube_run()
.
The function needs to be used within the
k8sredis/k8s/bioc-redis/manager-pod.yaml
file in your
k8sredis
k8s application. Within the specification of the
manager-pod.yaml,
The function takes in the following arguments, the
version
of Bioconductor, the docker image_name
for which the binaries are being built and the number of workers which
need to run in parallel as worker_pool_size
.
args: ["-e", "BiocKubeInstall::kube_run(version = '3.13',
image_name = 'bioconductor_docker',
worker_pool_size = 17)"]
The docker image of the manager node is always going to be
bioconductor/bioc-redis:devel
bioconductor_docker
images
BiocManager::install('BiocParallel')
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BiocStyle_2.25.0
##
## loaded via a namespace (and not attached):
## [1] rstudioapi_0.13 knitr_1.39 magrittr_2.0.3
## [4] R6_2.5.1 ragg_1.2.2 rlang_1.0.4
## [7] fastmap_1.1.0 stringr_1.4.0 tools_4.2.1
## [10] xfun_0.32 cli_3.3.0 jquerylib_0.1.4
## [13] systemfonts_1.0.4 htmltools_0.5.3 yaml_2.3.5
## [16] digest_0.6.29 rprojroot_2.0.3 pkgdown_2.0.6
## [19] bookdown_0.28 textshaping_0.3.6 BiocManager_1.30.18
## [22] purrr_0.3.4 sass_0.4.2 fs_1.5.2
## [25] memoise_2.0.1 cachem_1.0.6 evaluate_0.16
## [28] rmarkdown_2.15 stringi_1.7.8 compiler_4.2.1
## [31] bslib_0.4.0 desc_1.4.1 jsonlite_1.8.0