Advances in scRNA-seq technologies have increased the number of cells that can be assayed in routine experiments. Public databases such as GEO are continually expanding with more scRNA-seq studies, while large-scale projects such as the Human Cell Atlas are expected to generate data for billions of cells. For effective data analysis, the computational methods need to scale with the increasing size of scRNA-seq data sets. This section discusses how we can use various aspects of the Bioconductor ecosystem to tune our analysis pipelines for greater speed and efficiency.
The count matrix is the central structure around which our analyses
are based. In most of the previous chapters, this has been held fully in
memory as a dense matrix
or as a sparse
dgCMatrix
. Howevever, in-memory representations may not be
feasible for very large data sets, especially on machines with limited
memory. For example, the 1.3 million brain cell data set from 10X
Genomics (Zheng et al.,
2017) would require over 100 GB of RAM to hold as a
matrix
and around 30 GB as a dgCMatrix
. This
makes it challenging to explore the data on anything less than a HPC
system.
The obvious solution is to use a file-backed matrix representation where the data are held on disk and subsets are retrieved into memory as requested. While a number of implementations of file-backed matrices are available (e.g., bigmemory, matter), we will be using the implementation from the HDF5Array package. This uses the popular HDF5 format as the underlying data store, which provides a measure of standardization and portability across systems. We demonstrate with a subset of 20,000 cells from the 1.3 million brain cell data set, as provided by the TENxBrainData package.
library(TENxBrainData)
sce.brain <- TENxBrainData20k()
sce.brain
## class: SingleCellExperiment
## dim: 27998 20000
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(2): Ensembl Symbol
## colnames: NULL
## colData names(4): Barcode Sequence Library Mouse
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Examination of the SingleCellExperiment
object indicates
that the count matrix is a HDF5Matrix
. From a comparison of
the memory usage, it is clear that this matrix object is simply a stub
that points to the much larger HDF5 file that actually contains the
data. This avoids the need for large RAM availability during
analyses.
counts(sce.brain)
## <27998 x 20000> HDF5Matrix object of type "integer":
## [,1] [,2] [,3] [,4] ... [,19997] [,19998] [,19999]
## [1,] 0 0 0 0 . 0 0 0
## [2,] 0 0 0 0 . 0 0 0
## [3,] 0 0 0 0 . 0 0 0
## [4,] 0 0 0 0 . 0 0 0
## [5,] 0 0 0 0 . 0 0 0
## ... . . . . . . . .
## [27994,] 0 0 0 0 . 0 0 0
## [27995,] 0 0 0 1 . 0 2 0
## [27996,] 0 0 0 0 . 0 1 0
## [27997,] 0 0 0 0 . 0 0 0
## [27998,] 0 0 0 0 . 0 0 0
## [,20000]
## [1,] 0
## [2,] 0
## [3,] 0
## [4,] 0
## [5,] 0
## ... .
## [27994,] 0
## [27995,] 0
## [27996,] 0
## [27997,] 0
## [27998,] 0
object.size(counts(sce.brain))
## 2496 bytes
## [1] 76264332
Manipulation of the count matrix will generally result in the
creation of a DelayedArray
object from the DelayedArray
package. This remembers the operations to be applied to the counts and
stores them in the object, to be executed when the modified matrix
values are realized for use in calculations. The use of delayed
operations avoids the need to write the modified values to a new file at
every operation, which would unnecessarily require time-consuming disk
I/O.
## <27998 x 20000> DelayedMatrix object of type "double":
## [,1] [,2] [,3] ... [,19999] [,20000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [27994,] 0 0 0 . 0 0
## [27995,] 0 0 0 . 0 0
## [27996,] 0 0 0 . 0 0
## [27997,] 0 0 0 . 0 0
## [27998,] 0 0 0 . 0 0
Many functions described in the previous workflows are capable of
accepting HDF5Matrix
objects. This is powered by the
availability of common methods for all matrix representations (e.g.,
subsetting, combining, methods from DelayedMatrixStats
as well as representation-agnostic C++ code using beachmat. For
example, we compute QC metrics below with the same
calculateQCMetrics()
function that we used in the other
workflows.
## Loading required package: scuttle
## Loading required package: ggplot2
is.mito <- grepl("^mt-", rowData(sce.brain)$Symbol)
qcstats <- perCellQCMetrics(sce.brain, subsets = list(Mt = is.mito))
qcstats
## DataFrame with 20000 rows and 6 columns
## sum detected subsets_Mt_sum subsets_Mt_detected subsets_Mt_percent
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 3060 1546 123 10 4.01961
## 2 3500 1694 118 11 3.37143
## 3 3092 1613 58 9 1.87581
## 4 4420 2050 131 10 2.96380
## 5 3771 1813 100 8 2.65182
## ... ... ... ... ... ...
## 19996 4431 2050 127 9 2.866170
## 19997 6988 2704 60 9 0.858615
## 19998 8749 2988 305 11 3.486113
## 19999 3842 1711 129 8 3.357626
## 20000 1775 945 26 6 1.464789
## total
## <numeric>
## 1 3060
## 2 3500
## 3 3092
## 4 4420
## 5 3771
## ... ...
## 19996 4431
## 19997 6988
## 19998 8749
## 19999 3842
## 20000 1775
Needless to say, data access from file-backed representations is slower than that from in-memory representations. The time spent retrieving data from disk is an unavoidable cost of reducing memory usage. Whether this is tolerable depends on the application. One example usage pattern involves performing the heavy computing quickly with in-memory representations on HPC systems with plentiful memory, and then distributing file-backed counterparts to individual users for exploration and visualization on their personal machines.
Parallelization of calculations across genes or cells is an obvious strategy for speeding up scRNA-seq analysis workflows.
The BiocParallel
package provides a common interface for parallel computing throughout
the Bioconductor ecosystem, manifesting as a BPPARAM
argument in compatible functions. We can also use
BiocParallel
with more expressive functions directly
through the package’s interface.
BiocParallel
makes it quite easy to iterate over a
vector and distribute the computation across workers using the
bplapply
function. Basic knowledge of lapply
is required.
In this example, we find the square root of a vector of numbers in
parallel by indicating the BPPARAM
argument in
bplapply
.
param <- MulticoreParam(workers = 1)
bplapply(
X = c(4, 9, 16, 25),
FUN = function(x) { sqrt(x) },
BPPARAM = param
)
## [[1]]
## [1] 2
##
## [[2]]
## [1] 3
##
## [[3]]
## [1] 4
##
## [[4]]
## [1] 5
Note. The number of workers is set to 1 due to continuous testing resource limitations.
There exists a diverse set of parallelization backends depending on available hardware and operating systems.
For example, we might use forking across two cores to parallelize the variance calculations on a Unix system:
library(MouseGastrulationData)
library(scran)
sce <- WTChimeraData(samples = 5, type = "processed")
sce <- logNormCounts(sce)
dec.mc <- modelGeneVar(sce, BPPARAM = MulticoreParam(2))
dec.mc
## DataFrame with 29453 rows and 6 columns
## mean total tech bio p.value
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000051951 0.002800256 0.003504940 0.002856726 6.48214e-04 1.20918e-01
## ENSMUSG00000089699 0.000000000 0.000000000 0.000000000 0.00000e+00 NaN
## ENSMUSG00000102343 0.000000000 0.000000000 0.000000000 0.00000e+00 NaN
## ENSMUSG00000025900 0.000794995 0.000863633 0.000811027 5.26059e-05 3.68975e-01
## ENSMUSG00000025902 0.170777718 0.388633677 0.170893240 2.17740e-01 2.48097e-11
## ... ... ... ... ... ...
## ENSMUSG00000095041 0.35571083 0.34572194 0.3364126 0.009309343 0.443249
## ENSMUSG00000063897 0.49007956 0.41924282 0.4407846 -0.021541821 0.599512
## ENSMUSG00000096730 0.00000000 0.00000000 0.0000000 0.000000000 NaN
## ENSMUSG00000095742 0.00177158 0.00211619 0.0018073 0.000308881 0.189009
## tomato-td 0.57257331 0.47487832 0.4971974 -0.022319130 0.591555
## FDR
## <numeric>
## ENSMUSG00000051951 6.76330e-01
## ENSMUSG00000089699 NaN
## ENSMUSG00000102343 NaN
## ENSMUSG00000025900 7.56203e-01
## ENSMUSG00000025902 1.35619e-09
## ... ...
## ENSMUSG00000095041 0.756203
## ENSMUSG00000063897 0.756203
## ENSMUSG00000096730 NaN
## ENSMUSG00000095742 0.756203
## tomato-td 0.756203
Another approach would be to distribute jobs across a network of computers, which yields the same result:
dec.snow <- modelGeneVar(sce, BPPARAM = SnowParam(2))
For high-performance computing (HPC) systems with a cluster of
compute nodes, we can distribute jobs via the job scheduler using the
BatchtoolsParam
class. The example below assumes a SLURM
cluster, though the settings can be easily configured for a particular
system (see here
for details).
# 2 hours, 8 GB, 1 CPU per task, for 10 tasks.
rs <- list(walltime = 7200, memory = 8000, ncpus = 1)
bpp <- BatchtoolsParam(10, cluster = "slurm", resources = rs)
Parallelization is best suited for CPU-intensive calculations where the division of labor results in a concomitant reduction in compute time. It is not suited for tasks that are bounded by other compute resources, e.g., memory or file I/O (though the latter is less of an issue on HPC systems with parallel read/write). In particular, R itself is inherently single-core, so many of the parallelization backends involve (i) setting up one or more separate R sessions, (ii) loading the relevant packages and (iii) transmitting the data to that session. Depending on the nature and size of the task, this overhead may outweigh any benefit from parallel computing.
Identification of neighbouring cells in PC or expression space is a
common procedure that is used in many functions, e.g.,
buildSNNGraph()
, doubletCells()
. The default
is to favour accuracy over speed by using an exact nearest neighbour
(NN) search, implemented with the \(k\)-means for \(k\)-nearest neighbours algorithm. However,
for large data sets, it may be preferable to use a faster approximate
approach.
The BiocNeighbors
framework makes it easy to switch between search options by simply
changing the BNPARAM
argument in compatible functions. To
demonstrate, we will use the wild-type chimera data for which we had
applied graph-based clustering using the Louvain algorithm for community
detection:
library(bluster)
sce <- runPCA(sce)
colLabels(sce) <- clusterCells(sce, use.dimred = "PCA",
BLUSPARAM = NNGraphParam(cluster.fun = "louvain"))
The above clusters on a nearest neighbor graph generated with an
exact neighbour search. We repeat this below using an approximate
search, implemented using the Annoy algorithm. This
involves constructing a AnnoyParam
object to specify the
search algorithm and then passing it to the parameterization of the
NNGraphParam()
function. The results from the exact and
approximate searches are consistent with most clusters from the former
re-appearing in the latter. This suggests that the inaccuracy from the
approximation can be largely ignored.
library(scran)
library(BiocNeighbors)
clusters <- clusterCells(sce, use.dimred = "PCA",
BLUSPARAM = NNGraphParam(cluster.fun = "louvain",
BNPARAM = AnnoyParam()))
table(exact = colLabels(sce), approx = clusters)
## approx
## exact 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 1 91 0 0 0 3 0 0 0 1 0 0 0 0 0 0
## 2 0 143 0 0 0 0 0 0 0 0 0 0 0 0 1
## 3 0 0 75 0 2 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 341 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 388 0 0 0 1 1 0 7 0 0 0
## 6 0 0 0 0 0 206 3 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 1 244 0 0 1 0 0 0 0 0
## 8 0 0 0 0 3 0 0 91 0 0 0 2 0 0 0
## 9 1 0 0 0 1 0 0 0 106 0 0 0 0 0 0
## 10 0 0 0 0 0 0 0 0 0 113 0 0 0 0 0
## 11 0 0 0 0 0 2 0 0 0 7 145 0 0 0 0
## 12 0 0 0 0 1 0 0 0 0 0 0 208 0 0 0
## 13 0 0 0 0 0 0 0 0 0 0 0 0 146 0 0
## 14 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0
## 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 56
Note that Annoy writes the NN index to disk prior to performing the search. Thus, it may not actually be faster than the default exact algorithm for small datasets, depending on whether the overhead of disk write is offset by the computational complexity of the search. It is also not difficult to find situations where the approximation deteriorates, especially at high dimensions, though this may not have an appreciable impact on the biological conclusions.
set.seed(1000)
y1 <- matrix(rnorm(50000), nrow = 1000)
y2 <- matrix(rnorm(50000), nrow = 1000)
Y <- rbind(y1, y2)
exact <- findKNN(Y, k = 20)
approx <- findKNN(Y, k = 20, BNPARAM = AnnoyParam())
mean(exact$index != approx$index)
## [1] 0.561925
The singular value decomposition (SVD) underlies the PCA used
throughout our analyses, e.g., in denoisePCA()
,
fastMNN()
, doubletCells()
. (Briefly, the right
singular vectors are the eigenvectors of the gene-gene covariance
matrix, where each eigenvector represents the axis of maximum remaining
variation in the PCA.) The default base::svd()
function
performs an exact SVD that is not performant for large datasets.
Instead, we use fast approximate methods from the irlba and
rsvd
packages, conveniently wrapped into the BiocSingular
package for ease of use and package development. Specifically, we can
change the SVD algorithm used in any of these functions by simply
specifying an alternative value for the BSPARAM
argument.
library(scater)
library(BiocSingular)
# As the name suggests, it is random, so we need to set the seed.
set.seed(101000)
r.out <- runPCA(sce, ncomponents = 20, BSPARAM = RandomParam())
str(reducedDim(r.out, "PCA"))
## num [1:2411, 1:20] 14.79 5.79 13.07 -32.19 -26.45 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:2411] "cell_9769" "cell_9770" "cell_9771" "cell_9772" ...
## ..$ : chr [1:20] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "varExplained")= num [1:20] 192.6 87 29.4 23.1 21.6 ...
## - attr(*, "percentVar")= num [1:20] 25.84 11.67 3.94 3.1 2.89 ...
## - attr(*, "rotation")= num [1:500, 1:20] -0.174 -0.173 -0.157 0.105 -0.132 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:500] "ENSMUSG00000055609" "ENSMUSG00000052217" "ENSMUSG00000069919" "ENSMUSG00000048583" ...
## .. ..$ : chr [1:20] "PC1" "PC2" "PC3" "PC4" ...
set.seed(101001)
i.out <- runPCA(sce, ncomponents = 20, BSPARAM = IrlbaParam())
str(reducedDim(i.out, "PCA"))
## num [1:2411, 1:20] -14.79 -5.79 -13.07 32.19 26.45 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:2411] "cell_9769" "cell_9770" "cell_9771" "cell_9772" ...
## ..$ : chr [1:20] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "varExplained")= num [1:20] 192.6 87 29.4 23.1 21.6 ...
## - attr(*, "percentVar")= num [1:20] 25.84 11.67 3.94 3.1 2.89 ...
## - attr(*, "rotation")= num [1:500, 1:20] 0.174 0.173 0.157 -0.105 0.132 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:500] "ENSMUSG00000055609" "ENSMUSG00000052217" "ENSMUSG00000069919" "ENSMUSG00000048583" ...
## .. ..$ : chr [1:20] "PC1" "PC2" "PC3" "PC4" ...
Both IRLBA and randomized SVD (RSVD) are much faster than the exact
SVD with negligible loss of accuracy. This motivates their default use
in many scran and
scater
functions, at the cost of requiring users to set the seed to guarantee
reproducibility. IRLBA can occasionally fail to converge and require
more iterations (passed via maxit=
in
IrlbaParam()
), while RSVD involves an explicit trade-off
between accuracy and speed based on its oversampling parameter
(p=
) and number of power iterations (q=
). We
tend to prefer IRLBA as its default behavior is more accurate, though
RSVD is much faster for file-backed matrices.
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. Seurat can be used to identify and interpret sources of heterogeneity from single-cell transcriptomic measurements, and to integrate diverse types of single-cell data. Seurat is developed and maintained by the Satija lab and is released under the MIT license.
Although the basic processing of single-cell data with Bioconductor
packages (described in the OSCA book) and
with Seurat is very similar and will produce overall roughly identical
results, there is also complementary functionality with regard to cell
type annotation, dataset integration, and downstream analysis. To make
the most of both ecosystems it is therefore beneficial to be able to
easily switch between a SeuratObject
and a
SingleCellExperiment
. See also the Seurat conversion
vignette for conversion to/from other popular single cell formats
such as the AnnData format used by scanpy.
Here, we demonstrate converting the Seurat object produced in
Seurat’s PBMC
tutorial to a SingleCellExperiment
for further analysis
with functionality from OSCA/Bioconductor. We therefore need to first
install the SeuratData package,
which is available from GitHub only.
BiocManager::install("satijalab/seurat-data")
We then proceed by loading all required packages and installing the PBMC dataset:
library(SeuratData)
InstallData("pbmc3k")
## Installing package into '/tmp/RtmpCPQPPU/temp_libpath1ebf59bf16ee'
## (as 'lib' is unspecified)
We then load the dataset as an SeuratObject
and convert
it to a SingleCellExperiment
.
# Use PBMC3K from SeuratData
pbmc <- LoadData(ds = "pbmc3k", type = "pbmc3k.final")
pbmc
## An object of class Seurat
## 13714 features across 2638 samples within 1 assay
## Active assay: RNA (13714 features, 2000 variable features)
## 2 dimensional reductions calculated: pca, umap
pbmc.sce <- as.SingleCellExperiment(pbmc)
pbmc.sce
## class: SingleCellExperiment
## dim: 13714 2638
## metadata(0):
## assays(3): counts logcounts scaledata
## rownames(13714): AL627309.1 AP006222.2 ... PNRC2.1 SRSF10.1
## rowData names(0):
## colnames(2638): AAACATACAACCAC AAACATTGAGCTAC ... TTTGCATGAGAGGC
## TTTGCATGCCTCAC
## colData names(8): orig.ident nCount_RNA ... seurat_clusters ident
## reducedDimNames(2): PCA UMAP
## mainExpName: RNA
## altExpNames(0):
Seurat also allows conversion from SingleCellExperiment
objects to Seurat objects; we demonstrate this here on the wild-type
chimera mouse gastrulation dataset.
sce <- WTChimeraData(samples = 5, type = "processed")
assay(sce) <- as.matrix(assay(sce))
sce <- logNormCounts(sce)
sce
## class: SingleCellExperiment
## dim: 29453 2411
## metadata(0):
## assays(2): counts logcounts
## rownames(29453): ENSMUSG00000051951 ENSMUSG00000089699 ...
## ENSMUSG00000095742 tomato-td
## rowData names(2): ENSEMBL SYMBOL
## colnames(2411): cell_9769 cell_9770 ... cell_12178 cell_12179
## colData names(11): cell barcode ... doub.density sizeFactor
## reducedDimNames(2): pca.corrected.E7.5 pca.corrected.E8.5
## mainExpName: NULL
## altExpNames(0):
After some processing of the dataset, the actual conversion is
carried out with the as.Seurat
function.
## An object of class Seurat
## 29453 features across 2411 samples within 1 assay
## Active assay: originalexp (29453 features, 0 variable features)
## 2 dimensional reductions calculated: pca.corrected.E7.5, pca.corrected.E8.5
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells. Scanpy is developed and maintained by the Theis lab and is released under a BSD-3-Clause license. Scanpy is part of the scverse, a Python-based ecosystem for single-cell omics data analysis.
At the core of scanpy’s single-cell functionality is the
anndata
data structure, scanpy’s integrated single-cell
data container, which is conceptually very similar to Bioconductor’s
SingleCellExperiment
class.
Bioconductor’s zellkonverter
package provides a lightweight interface between the Bioconductor
SingleCellExperiment
data structure and the Python
AnnData
-based single-cell analysis environment. The idea is
to enable users and developers to easily move data between these
frameworks to construct a multi-language analysis pipeline across
R/Bioconductor and Python.
The readH5AD()
function can be used to read a
SingleCellExperiment
from an H5AD file. Here, we use an
example H5AD file contained in the zellkonverter
package.
example_h5ad <- system.file("extdata", "krumsiek11.h5ad",
package = "zellkonverter")
readH5AD(example_h5ad)
## Warning: The names of these selected uns$highlights items have been modified to match R
## conventions: '0' -> 'X0', '159' -> 'X159', '319' -> 'X319', '459' -> 'X459',
## and '619' -> 'X619'
## class: SingleCellExperiment
## dim: 11 640
## metadata(2): highlights iroot
## assays(1): X
## rownames(11): Gata2 Gata1 ... EgrNab Gfi1
## rowData names(0):
## colnames(640): 0 1 ... 158-3 159-3
## colData names(1): cell_type
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
We can also write a SingleCellExperiment
to an H5AD file
with the writeH5AD()
function. This is demonstrated below
on the wild-type chimera mouse gastrulation dataset.
## ℹ Using the 'counts' assay as the X matrix
The resulting H5AD file can then be read into Python using scanpy’s read_h5ad function and then directly used in compatible Python-based analysis frameworks.
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] zellkonverter_1.10.1 pbmc3k.SeuratData_3.1.4
## [3] SeuratData_0.2.2 SeuratObject_4.1.3
## [5] Seurat_4.3.0.1 BiocSingular_1.16.0
## [7] BiocNeighbors_1.18.0 bluster_1.10.0
## [9] scran_1.28.1 MouseGastrulationData_1.14.0
## [11] SpatialExperiment_1.10.0 BiocParallel_1.34.2
## [13] scater_1.28.0 ggplot2_3.4.2
## [15] scuttle_1.10.1 TENxBrainData_1.20.0
## [17] HDF5Array_1.28.1 rhdf5_2.44.0
## [19] DelayedArray_0.26.6 S4Arrays_1.0.4
## [21] Matrix_1.6-0 SingleCellExperiment_1.22.0
## [23] SummarizedExperiment_1.30.2 Biobase_2.60.0
## [25] GenomicRanges_1.52.0 GenomeInfoDb_1.36.1
## [27] IRanges_2.34.1 S4Vectors_0.38.1
## [29] BiocGenerics_0.46.0 MatrixGenerics_1.12.2
## [31] matrixStats_1.0.0 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.2 spatstat.sparse_3.0-2
## [3] bitops_1.0-7 httr_1.4.6
## [5] RColorBrewer_1.1-3 tools_4.3.0
## [7] sctransform_0.3.5 utf8_1.2.3
## [9] R6_2.5.1 lazyeval_0.2.2
## [11] uwot_0.1.16 rhdf5filters_1.12.1
## [13] withr_2.5.0 sp_2.0-0
## [15] gridExtra_2.3 progressr_0.13.0
## [17] cli_3.6.1 textshaping_0.3.6
## [19] spatstat.explore_3.2-1 sass_0.4.6
## [21] spatstat.data_3.0-1 ggridges_0.5.4
## [23] pbapply_1.7-2 pkgdown_2.0.7
## [25] systemfonts_1.0.4 R.utils_2.12.2
## [27] parallelly_1.36.0 limma_3.56.2
## [29] RSQLite_2.3.1 generics_0.1.3
## [31] ica_1.0-3 spatstat.random_3.1-5
## [33] dplyr_1.1.2 ggbeeswarm_0.7.2
## [35] fansi_1.0.4 abind_1.4-5
## [37] R.methodsS3_1.8.2 lifecycle_1.0.3
## [39] yaml_2.3.7 edgeR_3.42.4
## [41] BiocFileCache_2.8.0 Rtsne_0.16
## [43] grid_4.3.0 blob_1.2.4
## [45] promises_1.2.0.1 dqrng_0.3.0
## [47] ExperimentHub_2.8.0 crayon_1.5.2
## [49] dir.expiry_1.8.0 miniUI_0.1.1.1
## [51] lattice_0.21-8 beachmat_2.16.0
## [53] cowplot_1.1.1 KEGGREST_1.40.0
## [55] magick_2.7.4 pillar_1.9.0
## [57] knitr_1.43 metapod_1.8.0
## [59] rjson_0.2.21 future.apply_1.11.0
## [61] codetools_0.2-19 leiden_0.4.3
## [63] glue_1.6.2 data.table_1.14.8
## [65] vctrs_0.6.3 png_0.1-8
## [67] gtable_0.3.3 cachem_1.0.8
## [69] xfun_0.39 mime_0.12
## [71] DropletUtils_1.20.0 survival_3.5-5
## [73] statmod_1.5.0 interactiveDisplayBase_1.38.0
## [75] ellipsis_0.3.2 fitdistrplus_1.1-11
## [77] ROCR_1.0-11 nlme_3.1-162
## [79] bit64_4.0.5 filelock_1.0.2
## [81] RcppAnnoy_0.0.21 BumpyMatrix_1.8.0
## [83] rprojroot_2.0.3 bslib_0.5.0
## [85] irlba_2.3.5.1 vipor_0.4.5
## [87] KernSmooth_2.23-22 colorspace_2.1-0
## [89] DBI_1.1.3 tidyselect_1.2.0
## [91] bit_4.0.5 compiler_4.3.0
## [93] curl_5.0.1 basilisk.utils_1.12.1
## [95] desc_1.4.2 plotly_4.10.2
## [97] scales_1.2.1 lmtest_0.9-40
## [99] rappdirs_0.3.3 stringr_1.5.0
## [101] digest_0.6.33 goftest_1.2-3
## [103] spatstat.utils_3.0-3 rmarkdown_2.23
## [105] basilisk_1.12.1 XVector_0.40.0
## [107] htmltools_0.5.5 pkgconfig_2.0.3
## [109] sparseMatrixStats_1.12.2 dbplyr_2.3.3
## [111] fastmap_1.1.1 rlang_1.1.1
## [113] htmlwidgets_1.6.2 shiny_1.7.4.1
## [115] DelayedMatrixStats_1.22.1 jquerylib_0.1.4
## [117] zoo_1.8-12 jsonlite_1.8.7
## [119] R.oo_1.25.0 RCurl_1.98-1.12
## [121] magrittr_2.0.3 GenomeInfoDbData_1.2.10
## [123] patchwork_1.1.2 Rhdf5lib_1.22.0
## [125] munsell_0.5.0 Rcpp_1.0.11
## [127] viridis_0.6.3 reticulate_1.30
## [129] stringi_1.7.12 zlibbioc_1.46.0
## [131] MASS_7.3-60 AnnotationHub_3.8.0
## [133] plyr_1.8.8 parallel_4.3.0
## [135] listenv_0.9.0 ggrepel_0.9.3
## [137] deldir_1.0-9 Biostrings_2.68.1
## [139] splines_4.3.0 tensor_1.5
## [141] locfit_1.5-9.8 igraph_1.5.0
## [143] spatstat.geom_3.2-2 reshape2_1.4.4
## [145] ScaledMatrix_1.8.1 BiocVersion_3.17.1
## [147] evaluate_0.21 BiocManager_1.30.21
## [149] httpuv_1.6.11 RANN_2.6.1
## [151] tidyr_1.3.0 purrr_1.0.1
## [153] polyclip_1.10-4 future_1.33.0
## [155] scattermore_1.2 rsvd_1.0.5
## [157] xtable_1.8-4 later_1.3.1
## [159] viridisLite_0.4.2 ragg_1.2.5
## [161] tibble_3.2.1 memoise_2.0.1
## [163] beeswarm_0.4.0 AnnotationDbi_1.62.2
## [165] cluster_2.1.4 globals_0.16.2
## [167] here_1.0.1
BiocParallel
intro
vignette.Hint: see the HDF5Array
function for reading from HDF5
and the writeHDF5Array
function for writing to HDF5 from
the HDF5Array
package.
Hint: use the function system.time
to obtain the runtime
of each job.
SingleCellExperiment
objects with
annotated cell- and gene-level metadata. Consult the vignette of the scRNAseq package
to inspect all available datasets and select a dataset of your choice.
Convert the chosen dataset to a Seurat object and produce a PCA plot
with cells colored by a cell metadata column of your choice.Hint: use Seurat’s DimPlot
function.