RaggedExperiment
Martin Morgan
Roswell Park Comprehensive Cancer Center, Buffalo, NYMarcel Ramos
Roswell Park Comprehensive Cancer Center, Buffalo, NY1 May 2025
Source:vignettes/RaggedExperiment.Rmd
RaggedExperiment.Rmd
Introduction
The RaggedExperiment package provides a flexible data representation for copy number, mutation and other ragged array schema for genomic location data. It aims to provide a framework for a set of samples that have differing numbers of genomic ranges.
The RaggedExperiment
class derives from a
GRangesList
representation and provides a semblance of a
rectangular dataset. The row and column dimensions of the
RaggedExperiment
correspond to the number of ranges in the
entire dataset and the number of samples represented in the data,
respectively.
Installation
if (!require("BiocManager"))
install.packages("BiocManager")
BiocManager::install("RaggedExperiment")
Loading the package:
Citing RaggedExperiment
Your citations are crucial in keeping our software free and open source. To cite our package see the citation (Ramos et al. (2023)) in the Reference section. You may also browse to the publication at the link here.
RaggedExperiment
class overview
A schematic showing the class geometry and supported transformations
of the RaggedExperiment
class is show below. There are
three main operations for transforming the RaggedExperiment
representation:
sparseAssay
compactAssay
qreduceAssay
RaggedExperiment object schematic. Rows and columns represent genomic ranges and samples, respectively. Assay operations can be performed with (from left to right) compactAssay, qreduceAssay, and sparseAssay.
Constructing a RaggedExperiment
object
We start with a couple of GRanges
objects, each
representing an individual sample:
sample1 <- GRanges(
c(A = "chr1:1-10:-", B = "chr1:8-14:+", C = "chr2:15-18:+"),
score = 3:5)
sample2 <- GRanges(
c(D = "chr1:1-10:-", E = "chr2:11-18:+"),
score = 1:2)
Include column data colData
to describe the samples:
colDat <- DataFrame(id = 1:2)
Using GRanges
objects
ragexp <- RaggedExperiment(
sample1 = sample1,
sample2 = sample2,
colData = colDat
)
ragexp
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
Using a GRangesList
instance
grl <- GRangesList(sample1 = sample1, sample2 = sample2)
RaggedExperiment(grl, colData = colDat)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
Using a list
of GRanges
rangeList <- list(sample1 = sample1, sample2 = sample2)
RaggedExperiment(rangeList, colData = colDat)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
Using a List
of GRanges
with metadata
Note: In cases where a
SimpleGenomicRangesList
is provided along with accompanying
metadata (accessed by mcols
), the metadata is used as the
colData
for the RaggedExperiment
.
grList <- List(sample1 = sample1, sample2 = sample2)
mcols(grList) <- colDat
RaggedExperiment(grList)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
Accessors
Range data
rowRanges(ragexp)
## GRanges object with 5 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## A chr1 1-10 -
## B chr1 8-14 +
## C chr2 15-18 +
## D chr1 1-10 -
## E chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
Dimension names
dimnames(ragexp)
## [[1]]
## [1] "A" "B" "C" "D" "E"
##
## [[2]]
## [1] "sample1" "sample2"
colData
colData(ragexp)
## DataFrame with 2 rows and 1 column
## id
## <integer>
## sample1 1
## sample2 2
Subsetting
*Assay functions
RaggedExperiment
package provides several different
functions for representing ranged data in a rectangular matrix via the
*Assay
methods.
sparseAssay
The most straightforward matrix representation of a
RaggedExperiment
will return a matrix of dimensions equal
to the product of the number of ranges and samples.
dim(ragexp)
## [1] 5 2
## [1] 10
sparseAssay(ragexp)
## sample1 sample2
## A 3 NA
## B 4 NA
## C 5 NA
## D NA 1
## E NA 2
length(sparseAssay(ragexp))
## [1] 10
Support for sparse matrix output
We provide sparse matrix representations with the help of the
Matrix
package. To obtain a sparse representation, the user
can use the sparse = TRUE
argument.
sparseAssay(ragexp, sparse = TRUE)
## 5 x 2 sparse Matrix of class "dgCMatrix"
## sample1 sample2
## A 3 .
## B 4 .
## C 5 .
## D . 1
## E . 2
This representation is of class dgCMatrix
see the
Matrix
documentation for more details.
compactAssay
Samples with identical ranges are placed in the same row. Non-disjoint ranges are not collapsed.
compactAssay(ragexp)
## sample1 sample2
## chr1:8-14:+ 4 NA
## chr1:1-10:- 3 1
## chr2:11-18:+ NA 2
## chr2:15-18:+ 5 NA
Similarly, to sparseAssay
the compactAssay
function provides the option to obtain a sparse matrix representation
with the sparse = TRUE
argument. This will return a
dgCMatrix
class from the Matrix
package.
compactAssay(ragexp, sparse = TRUE)
## 4 x 2 sparse Matrix of class "dgCMatrix"
## sample1 sample2
## chr1:8-14:+ 4 .
## chr1:1-10:- 3 1
## chr2:11-18:+ . 2
## chr2:15-18:+ 5 .
disjoinAssay
This function returns a matrix of disjoint ranges across all samples.
Elements of the matrix are summarized by applying the
simplifyDisjoin
functional argument to assay values of
overlapping ranges.
disjoinAssay(ragexp, simplifyDisjoin = mean)
## sample1 sample2
## chr1:8-14:+ 4 NA
## chr1:1-10:- 3 1
## chr2:11-14:+ NA 2
## chr2:15-18:+ 5 2
qreduceAssay
The qreduceAssay
function works with a
query
parameter that highlights a window of ranges for the
resulting matrix. The returned matrix will have dimensions
length(query)
by ncol(x)
. Elements contain
assay values for the i th query range and the j th
sample, summarized according to the simplifyReduce
functional argument.
For demonstration purposes, we can have a look at the original
GRangesList
and the associated scores from which the
current ragexp
object is derived:
unlist(grl, use.names = FALSE)
## GRanges object with 5 ranges and 1 metadata column:
## seqnames ranges strand | score
## <Rle> <IRanges> <Rle> | <integer>
## A chr1 1-10 - | 3
## B chr1 8-14 + | 4
## C chr2 15-18 + | 5
## D chr1 1-10 - | 1
## E chr2 11-18 + | 2
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
This data is represented as rowRanges
and
assays
in RaggedExperiment
:
rowRanges(ragexp)
## GRanges object with 5 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## A chr1 1-10 -
## B chr1 8-14 +
## C chr2 15-18 +
## D chr1 1-10 -
## E chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
assay(ragexp, "score")
## sample1 sample2
## A 3 NA
## B 4 NA
## C 5 NA
## D NA 1
## E NA 2
Here we provide the “query” region of interest:
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-14 -
## [2] chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
The simplifyReduce
argument in qreduceAssay
allows the user to summarize overlapping regions with a custom method
for the given “query” region of interest. We provide one for calculating
a weighted average score per query range, where the weight is
proportional to the overlap widths between overlapping ranges and a
query range.
Note that there are three arguments to this function. See the documentation for additional details.
weightedmean <- function(scores, ranges, qranges)
{
isects <- pintersect(ranges, qranges)
sum(scores * width(isects)) / sum(width(isects))
}
A call to qreduceAssay
involves the
RaggedExperiment
, the GRanges
query and the
simplifyReduce
functional argument.
qreduceAssay(ragexp, query, simplifyReduce = weightedmean)
## sample1 sample2
## chr1:1-14:- 3 1
## chr2:11-18:+ 5 2
See the schematic for a visual representation.
Coercion
The RaggedExperiment
provides a family of parallel
functions for coercing to the SummarizedExperiment
class.
By selecting a particular assay index (i
), a parallel assay
coercion method can be achieved.
Here is the list of function names:
sparseSummarizedExperiment
compactSummarizedExperiment
disjoinSummarizedExperiment
qreduceSummarizedExperiment
See the documentation for details.
from dgCMatrix to RaggedExperiment
In the special case where the rownames of a sparseMatrix
are coercible to GRanges
, RaggedExperiment
provides the facility to convert sparse matrices into
RaggedExperiment
. This can be done using the
as
coercion method. The example below first creates an
example sparse dgCMatrix
class and then shows the
as
method usage to this end.
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
##
## expand
sm <- Matrix::sparseMatrix(
i = c(2, 3, 4, 3, 4, 3, 4),
j = c(1, 1, 1, 3, 3, 4, 4),
x = c(2L, 4L, 2L, 2L, 2L, 4L, 2L),
dims = c(4, 4),
dimnames = list(
c("chr2:1-10", "chr2:2-10", "chr2:3-10", "chr2:4-10"),
LETTERS[1:4]
)
)
as(sm, "RaggedExperiment")
## class: RaggedExperiment
## dim: 7 3
## assays(1): counts
## rownames: NULL
## colnames(3): A C D
## colData names(0):
Session Information
## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Matrix_1.7-3 RaggedExperiment_1.32.1 GenomicRanges_1.60.0
## [4] GenomeInfoDb_1.44.0 IRanges_2.42.0 S4Vectors_0.46.0
## [7] BiocGenerics_0.54.0 generics_0.1.3 BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_2.0.0 crayon_1.5.3
## [3] compiler_4.5.0 BiocManager_1.30.25
## [5] BiocBaseUtils_1.10.0 SummarizedExperiment_1.38.1
## [7] Biobase_2.68.0 jquerylib_0.1.4
## [9] systemfonts_1.2.3 textshaping_1.0.0
## [11] yaml_2.3.10 fastmap_1.2.0
## [13] lattice_0.22-7 R6_2.6.1
## [15] XVector_0.48.0 S4Arrays_1.8.0
## [17] knitr_1.50 htmlwidgets_1.6.4
## [19] DelayedArray_0.34.1 bookdown_0.43
## [21] desc_1.4.3 MatrixGenerics_1.20.0
## [23] GenomeInfoDbData_1.2.14 bslib_0.9.0
## [25] rlang_1.1.6 cachem_1.1.0
## [27] xfun_0.52 fs_1.6.6
## [29] sass_0.4.10 SparseArray_1.8.0
## [31] cli_3.6.5 pkgdown_2.1.2
## [33] grid_4.5.0 digest_0.6.37
## [35] lifecycle_1.0.4 evaluate_1.0.3
## [37] ragg_1.4.0 abind_1.4-8
## [39] rmarkdown_2.29 httr_1.4.7
## [41] matrixStats_1.5.0 tools_4.5.0
## [43] htmltools_0.5.8.1 UCSC.utils_1.4.0