SparseArray col/row summarization
SparseArray-matrixStats.Rd
The SparseArray package provides memory-efficient col/row
summarization methods (a.k.a. matrixStats methods) for SparseArray
objects, like colSums()
, rowSums()
, colMedians()
,
rowMedians()
, colVars()
, rowVars()
, etc...
Note that these are S4 generic functions defined in the MatrixGenerics package, with methods for ordinary matrices defined in the matrixStats package. This man page documents the methods defined for SparseArray objects.
Usage
## N.B.: Showing ONLY the col*() methods (usage of row*() methods is
## the same):
# S4 method for class 'SparseArray'
colAnyNAs(x, rows=NULL, cols=NULL, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colAnys(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colAlls(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colMins(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colMaxs(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colRanges(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colSums(x, na.rm=FALSE, dims=1)
# S4 method for class 'SparseArray'
colProds(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colMeans(x, na.rm=FALSE, dims=1)
# S4 method for class 'SparseArray'
colSums2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colMeans2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)
# S4 method for class 'SparseArray'
colVars(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1,
..., useNames=NA)
# S4 method for class 'SparseArray'
colSds(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1,
..., useNames=NA)
# S4 method for class 'SparseArray'
colMedians(x, rows=NULL, cols=NULL, na.rm=FALSE, ..., useNames=NA)
Arguments
- x
A SparseMatrix or SparseArray object.
Note that the
colMedians()
androwMedians()
methods only support 2D objects (i.e. SparseMatrix objects) at the moment.- rows, cols, ...
Not supported.
- na.rm, useNames, center
See man pages of the corresponding generics in the MatrixGenerics package (e.g.
?MatrixGenerics::colVars
) for a description of these arguments.Note that, unlike the methods for ordinary matrices defined in the matrixStats package, the
center
argument of thecolVars()
,rowVars()
,colSds()
, androwSds()
methods for SparseArray objects can only be a single value (or aNULL
). In particular, ifx
has more than one column, thencenter
cannot be a vector with one value per column inx
.- dims
See
?base::colSums
for a description of this argument. Note that all the methods above support it, exceptcolMedians()
androwMedians()
.
Details
All these methods operate natively on the SVT_SparseArray internal representation, for maximum efficiency.
Note that more col/row summarization methods might be added in the future.
Value
See man pages of the corresponding generics in the MatrixGenerics
package (e.g. ?MatrixGenerics::colRanges
)
for the value returned by these methods.
Note
Most col*()
methods for SparseArray objects are multithreaded.
See set_SparseArray_nthread
for how to control the number
of threads.
See also
SparseArray objects.
The man pages of the various generic functions defined in the MatrixGenerics package e.g.
MatrixGenerics::colVars
etc...
Examples
## ---------------------------------------------------------------------
## 2D CASE
## ---------------------------------------------------------------------
m0 <- matrix(0L, nrow=6, ncol=4, dimnames=list(letters[1:6], LETTERS[1:4]))
m0[c(1:2, 8, 10, 15:17, 24)] <- (1:8)*10L
m0["e", "B"] <- NA
svt0 <- SparseArray(m0)
svt0
#> <6 x 4 SparseMatrix> of type "integer" [nzcount=9 (38%)]:
#> A B C D
#> a 10 0 0 0
#> b 20 30 0 0
#> c 0 0 50 0
#> d 0 40 60 0
#> e 0 NA 70 0
#> f 0 0 0 80
colSums(svt0)
#> A B C D
#> 30 NA 180 80
colSums(svt0, na.rm=TRUE)
#> A B C D
#> 30 70 180 80
rowSums(svt0)
#> a b c d e f
#> 10 50 50 100 NA 80
rowSums(svt0, na.rm=TRUE)
#> a b c d e f
#> 10 50 50 100 70 80
colMeans(svt0)
#> A B C D
#> 5.00000 NA 30.00000 13.33333
colMeans(svt0, na.rm=TRUE)
#> A B C D
#> 5.00000 14.00000 30.00000 13.33333
colRanges(svt0)
#> [,1] [,2]
#> A 0 20
#> B NA NA
#> C 0 70
#> D 0 80
colRanges(svt0, useNames=FALSE)
#> [,1] [,2]
#> [1,] 0 20
#> [2,] NA NA
#> [3,] 0 70
#> [4,] 0 80
colRanges(svt0, na.rm=TRUE)
#> [,1] [,2]
#> A 0 20
#> B 0 40
#> C 0 70
#> D 0 80
colRanges(svt0, na.rm=TRUE, useNames=FALSE)
#> [,1] [,2]
#> [1,] 0 20
#> [2,] 0 40
#> [3,] 0 70
#> [4,] 0 80
colVars(svt0)
#> A B C D
#> 70.000 NA 1120.000 1066.667
colVars(svt0, useNames=FALSE)
#> [1] 70.000 NA 1120.000 1066.667
## Sanity checks:
stopifnot(
identical(colSums(svt0), colSums(m0)),
identical(colSums(svt0, na.rm=TRUE), colSums(m0, na.rm=TRUE)),
identical(rowSums(svt0), rowSums(m0)),
identical(rowSums(svt0, na.rm=TRUE), rowSums(m0, na.rm=TRUE)),
identical(colMeans(svt0), colMeans(m0)),
identical(colMeans(svt0, na.rm=TRUE), colMeans(m0, na.rm=TRUE)),
identical(colRanges(svt0), colRanges(m0, useNames=TRUE)),
identical(colRanges(svt0, useNames=FALSE), colRanges(m0, useNames=FALSE)),
identical(colRanges(svt0, na.rm=TRUE),
colRanges(m0, na.rm=TRUE, useNames=TRUE)),
identical(colVars(svt0), colVars(m0, useNames=TRUE)),
identical(colVars(svt0, na.rm=TRUE),
colVars(m0, na.rm=TRUE, useNames=TRUE))
)
## ---------------------------------------------------------------------
## 3D CASE (AND ARBITRARY NUMBER OF DIMENSIONS)
## ---------------------------------------------------------------------
set.seed(2009)
svt <- 6L * (poissonSparseArray(5:3, density=0.35) -
poissonSparseArray(5:3, density=0.35))
dimnames(svt) <- list(NULL, letters[1:4], LETTERS[1:3])
cs1 <- colSums(svt)
cs1 # cs1[j , k] is equal to sum(svt[ , j, k])
#> A B C
#> a -24 -12 6
#> b 0 0 18
#> c -18 0 6
#> d -12 -6 6
cs2 <- colSums(svt, dims=2)
cs2 # cv2[k] is equal to sum(svt[ , , k])
#> A B C
#> -54 -18 36
cv1 <- colVars(svt)
cv1 # cv1[j , k] is equal to var(svt[ , j, k])
#> A B C
#> a 25.2 28.8 61.2
#> b 18.0 0.0 10.8
#> c 28.8 36.0 115.2
#> d 46.8 25.2 7.2
cv2 <- colVars(svt, dims=2)
cv2 # cv2[k] is equal to var(svt[ , , k])
#> A B C
#> 28.32632 19.98947 42.06316
## Sanity checks:
k_idx <- setNames(seq_len(dim(svt)[3]), dimnames(svt)[[3]])
j_idx <- setNames(seq_len(dim(svt)[2]), dimnames(svt)[[2]])
cv1b <- sapply(k_idx, function(k)
sapply(j_idx, function(j) var(svt[ , j, k, drop=FALSE])))
cv2b <- sapply(k_idx, function(k) var(svt[ , , k]))
stopifnot(
identical(colSums(svt), colSums(as.array(svt))),
identical(colSums(svt, dims=2), colSums(as.array(svt), dims=2)),
identical(cv1, cv1b),
identical(cv2, cv2b)
)