Bioconductor / AnVIL

Project Activities

This site summarizes ongoing Bioconductor development activities related to AnVIL.

Learn more about Bioconductor and AnVIL.

Project Activities Overview

Available Now

In Progress


Project Activities – Detailed

This section provides a more detailed description of projects.


The latest terra-jupyter-bioconductor Docker containers are available in AnVIL and on the Google Container Registry (GCR). They work like the bioconductor_docker images: any Bioconductor package can be installed, and a small “core” set of packages comes pre-installed. The terra-jupyter-bioconductor image inherits from the terra-jupyter-r image, which has all the system dependencies installed. The image has been tested on Leonardo and installs all but a few packages in the Bioconductor release; those few fail because of archived CRAN dependencies.

Jupyter notebooks

The images are based on R version 3.6 and Bioconductor version 3.10.

RStudio / Bioconductor

The image now has R 4.0.0 and the latest stable release, Bioconductor 3.11.

User and Developer Tools

AnVIL package (Bioconductor, GitHub).

Binary package installation (under development)

Notes on construction of binary package images within AnVIL

  1. Install AnVIL – do with Ncpus > 1
  2. Allow updates
  3. NOT YET: set options(repos=AnVIL::repositories()) to get fast install of CRAN packages
  4. BiocManager::install("vjcitn/BiocBBSpack", Ncpus=10)
  5. library(BiocBBSpack)
  6. Retrieve the manifest from Bioconductor git: pl = get_bioc_packagelist()
  7. BiocManager::install(pl, Ncpus=50) – this gets us 3212 binary packages. Oddly, affypdnn is not available for 3.11 yet appears in the manifest. These packages do not install:

    > dput(sort(setdiff(pl, installed)))
    c("affypdnn", "anamiR", "BatchQC", "CALIB", "ccfindR", "cellGrowth", 
    "cellTree", "CHARGE", "chroGPS", "cobindR", "CountClust", "CTDquerier", 
    "CVE", "debrowser", "DEDS", "Doscheda", "flowFit", "GeneGeneInteR", 
    "Genominator", "gpuMagic", "IdMappingAnalysis", "IdMappingRetrieval", 
    "Imetagene", "lol", "lpNet", "LVSmiRNA", "manta", "MCRestimate", 
    "Melissa", "MoonlightR", "MSGFgui", "MSGFplus", "MTseeker", "nem", 
    "netbenchmark", "nethet", "PAPi", "PathwaySplice", "pcaGoPromoter", 
    "pint", "proteoQC", "QUALIFIER", "R3CPET", "readat", "RIPSeeker", 
    "SANTA", "scAlign", "sparsenetgls", "splicegear", "trena", "waveTiling", 
    "xps", "YAPSA")
  8. Use dotarmv() as follows

     jnk = lapply(dir(), dotarmv)  # could probably be done with mclapply or bash

    Binaries will appear in the directory given by the dest= argument of dotarmv().

  9. Set up

  10. Use gsutil -m cp to copy the output of dotarmv() to the src/contrib bucket

  11. Create PACKAGES.gz using tools::write_PACKAGES(unpacked=TRUE), then copy it to src/contrib
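
Two of the steps above, finding the packages that did not install and indexing unpacked package directories, can be tried locally with base R alone. The following is a minimal sketch, not the procedure used for the AnVIL images: the two-element manifest, the demoPkg directory and its DESCRIPTION fields are hypothetical placeholders, and the temporary directory stands in for the src/contrib bucket.

```r
# Hypothetical stand-in for the manifest retrieved in step 6. "stats" ships
# with R; the second name is made up so that it is never installed.
pl <- c("stats", "notARealPackage12345")

# Manifest entries that are absent from the library (the check in step 7).
installed <- rownames(installed.packages())
not_installed <- sort(setdiff(pl, installed))
print(not_installed)

# A throwaway repository directory holding one unpacked package
# (hypothetical layout; a real run would point at the dotarmv() output).
repo_dir <- file.path(tempdir(), "src", "contrib")
pkg_dir <- file.path(repo_dir, "demoPkg")
dir.create(pkg_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("Package: demoPkg",
    "Version: 0.0.1",
    "Title: Placeholder Package",
    "License: GPL-3"),
  file.path(pkg_dir, "DESCRIPTION")
)

# Index the unpacked directories (step 11). This writes PACKAGES and
# PACKAGES.gz into repo_dir; recent R versions also write PACKAGES.rds.
tools::write_PACKAGES(repo_dir, unpacked = TRUE)

# Read the plain-text index back to confirm what was recorded.
index <- read.dcf(file.path(repo_dir, "PACKAGES"))
print(index[, c("Package", "Version"), drop = FALSE])
```

With unpacked = TRUE, write_PACKAGES() reads the DESCRIPTION file of each subdirectory rather than opening tarballs, which is why the index can be built directly from a library of installed packages.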

Metadata access and overview

AnVIL package tools can be used to discover incompatibilities or ambiguities in study annotation. BJ’s class worked through metadata survey exercises. An example of incompatible/ambiguous annotation is present in the Autism workspaces.

We are looking at two NYGC studies that refer to autism; one workspace name contains the substring ACE2 and the other SSC. AFFECTION_STATUS is coded 1/2 in the SSC study, and more prosaically in the ACE2 study. The labels in the ACE2 study may be the more problematic, as the options seem to be “0”, “ASD affected”, “ASD Affected”, and “Diagnosis uncertain” – or perhaps it is just a letter-casing issue.
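
A survey of this kind can be sketched in base R. The vectors below are hypothetical values modeled on the labels described above, not data pulled from the workspaces, and the sketch does not use the AnVIL package; it only shows one way to separate letter-casing differences from genuinely different codings.

```r
# Hypothetical AFFECTION_STATUS values, modeled on the labels reported above.
ace2_status <- c("0", "ASD affected", "ASD Affected",
                 "Diagnosis uncertain", "ASD Affected")
ssc_status <- c("1", "2", "2", "1")

# Raw tabulation: labels that differ only in case are counted separately.
raw_counts <- table(ace2_status)
print(raw_counts)

# Normalize case and surrounding whitespace, then tabulate again.
normalized <- tolower(trimws(ace2_status))
norm_counts <- table(normalized)
print(norm_counts)

# Labels that collapse together after normalization are casing variants.
by_label <- split(ace2_status, normalized)
case_variants <- Filter(function(v) length(unique(v)) > 1, by_label)
print(names(case_variants))

# Compare codings across the two studies: an empty intersection means the
# studies share no label, so a mapping is needed before pooling them.
shared <- intersect(unique(normalized), unique(tolower(ssc_status)))
print(length(shared))
```

In this toy input, normalization merges the two "ASD affected" spellings but leaves the ACE2 and SSC codings with no label in common, which is the incompatibility that would have to be resolved by an explicit mapping.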

The ingestion group was notified and replied that “there is no process for the AnVIL team to retrospectively address existing data”. Interest was expressed in learning more about our metadata survey capabilities.

Other activities