Skip to contents

The SparseArray package provides memory-efficient col/row summarization methods (a.k.a. matrixStats methods) for SparseArray objects, like colSums(), rowSums(), colMedians(), rowMedians(), colVars(), rowVars(), etc...

Note that these are S4 generic functions defined in the MatrixGenerics package, with methods for ordinary matrices defined in the matrixStats package. This man page documents the methods defined for SparseArray objects.

Usage

## N.B.: Showing ONLY the col*() methods (usage of row*() methods is
## the same):

# S4 method for class 'SparseArray'
colAnyNAs(x, rows=NULL, cols=NULL, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colAnys(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colAlls(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colMins(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colMaxs(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colRanges(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colSums(x, na.rm=FALSE, dims=1)

# S4 method for class 'SparseArray'
colProds(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colMeans(x, na.rm=FALSE, dims=1)

# S4 method for class 'SparseArray'
colSums2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colMeans2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

# S4 method for class 'SparseArray'
colVars(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1,
           ..., useNames=NA)

# S4 method for class 'SparseArray'
colSds(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1,
          ..., useNames=NA)

# S4 method for class 'SparseArray'
colMedians(x, rows=NULL, cols=NULL, na.rm=FALSE, ..., useNames=NA)

Arguments

x

A SparseMatrix or SparseArray object.

Note that the colMedians() and rowMedians() methods only support 2D objects (i.e. SparseMatrix objects) at the moment.

rows, cols, ...

Not supported.

na.rm, useNames, center

See man pages of the corresponding generics in the MatrixGenerics package (e.g. ?MatrixGenerics::colVars) for a description of these arguments.

Note that, unlike the methods for ordinary matrices defined in the matrixStats package, the center argument of the colVars(), rowVars(), colSds(), and rowSds() methods for SparseArray objects can only be a single value (or a NULL). In particular, if x has more than one column, then center cannot be a vector with one value per column in x.

dims

See ?base::colSums for a description of this argument. Note that all the methods above support it, except colMedians() and rowMedians().

Details

All these methods operate natively on the SVT_SparseArray internal representation, for maximum efficiency.

Note that more col/row summarization methods might be added in the future.

Value

See man pages of the corresponding generics in the MatrixGenerics package (e.g. ?MatrixGenerics::colRanges) for the value returned by these methods.

Note

Most col*() methods for SparseArray objects are multithreaded. See set_SparseArray_nthread for how to control the number of threads.

See also

  • SparseArray objects.

  • The man pages of the various generic functions defined in the MatrixGenerics package e.g. MatrixGenerics::colVars etc...

Examples

## ---------------------------------------------------------------------
## 2D CASE
## ---------------------------------------------------------------------
m0 <- matrix(0L, nrow=6, ncol=4, dimnames=list(letters[1:6], LETTERS[1:4]))
m0[c(1:2, 8, 10, 15:17, 24)] <- (1:8)*10L
m0["e", "B"] <- NA
svt0 <- SparseArray(m0)
svt0
#> <6 x 4 SparseMatrix> of type "integer" [nzcount=9 (38%)]:
#>    A  B  C  D
#> a 10  0  0  0
#> b 20 30  0  0
#> c  0  0 50  0
#> d  0 40 60  0
#> e  0 NA 70  0
#> f  0  0  0 80

colSums(svt0)
#>   A   B   C   D 
#>  30  NA 180  80 
colSums(svt0, na.rm=TRUE)
#>   A   B   C   D 
#>  30  70 180  80 

rowSums(svt0)
#>   a   b   c   d   e   f 
#>  10  50  50 100  NA  80 
rowSums(svt0, na.rm=TRUE)
#>   a   b   c   d   e   f 
#>  10  50  50 100  70  80 

colMeans(svt0)
#>        A        B        C        D 
#>  5.00000       NA 30.00000 13.33333 
colMeans(svt0, na.rm=TRUE)
#>        A        B        C        D 
#>  5.00000 14.00000 30.00000 13.33333 

colRanges(svt0)
#>   [,1] [,2]
#> A    0   20
#> B   NA   NA
#> C    0   70
#> D    0   80
colRanges(svt0, useNames=FALSE)
#>      [,1] [,2]
#> [1,]    0   20
#> [2,]   NA   NA
#> [3,]    0   70
#> [4,]    0   80
colRanges(svt0, na.rm=TRUE)
#>   [,1] [,2]
#> A    0   20
#> B    0   40
#> C    0   70
#> D    0   80
colRanges(svt0, na.rm=TRUE, useNames=FALSE)
#>      [,1] [,2]
#> [1,]    0   20
#> [2,]    0   40
#> [3,]    0   70
#> [4,]    0   80

colVars(svt0)
#>        A        B        C        D 
#>   70.000       NA 1120.000 1066.667 
colVars(svt0, useNames=FALSE)
#> [1]   70.000       NA 1120.000 1066.667

## Sanity checks:
stopifnot(
  identical(colSums(svt0), colSums(m0)),
  identical(colSums(svt0, na.rm=TRUE), colSums(m0, na.rm=TRUE)),
  identical(rowSums(svt0), rowSums(m0)),
  identical(rowSums(svt0, na.rm=TRUE), rowSums(m0, na.rm=TRUE)),
  identical(colMeans(svt0), colMeans(m0)),
  identical(colMeans(svt0, na.rm=TRUE), colMeans(m0, na.rm=TRUE)),
  identical(colRanges(svt0), colRanges(m0, useNames=TRUE)),
  identical(colRanges(svt0, useNames=FALSE), colRanges(m0, useNames=FALSE)),
  identical(colRanges(svt0, na.rm=TRUE),
            colRanges(m0, na.rm=TRUE, useNames=TRUE)),
  identical(colVars(svt0), colVars(m0, useNames=TRUE)),
  identical(colVars(svt0, na.rm=TRUE),
            colVars(m0, na.rm=TRUE, useNames=TRUE))
)

## ---------------------------------------------------------------------
## 3D CASE (AND ARBITRARY NUMBER OF DIMENSIONS)
## ---------------------------------------------------------------------
set.seed(2009)
svt <- 6L * (poissonSparseArray(5:3, density=0.35) -
             poissonSparseArray(5:3, density=0.35))
dimnames(svt) <- list(NULL, letters[1:4], LETTERS[1:3])

cs1 <- colSums(svt)
cs1  # cs1[j , k] is equal to sum(svt[ , j, k])
#>     A   B  C
#> a -24 -12  6
#> b   0   0 18
#> c -18   0  6
#> d -12  -6  6

cs2 <- colSums(svt, dims=2)
cs2  # cv2[k] is equal to sum(svt[ , , k])
#>   A   B   C 
#> -54 -18  36 

cv1 <- colVars(svt)
cv1  # cv1[j , k] is equal to var(svt[ , j, k])
#>      A    B     C
#> a 25.2 28.8  61.2
#> b 18.0  0.0  10.8
#> c 28.8 36.0 115.2
#> d 46.8 25.2   7.2

cv2 <- colVars(svt, dims=2) 
cv2  # cv2[k] is equal to var(svt[ , , k])
#>        A        B        C 
#> 28.32632 19.98947 42.06316 

## Sanity checks:
k_idx <- setNames(seq_len(dim(svt)[3]), dimnames(svt)[[3]])
j_idx <- setNames(seq_len(dim(svt)[2]), dimnames(svt)[[2]])
cv1b <- sapply(k_idx, function(k)
                      sapply(j_idx, function(j) var(svt[ , j, k, drop=FALSE])))
cv2b <- sapply(k_idx, function(k) var(svt[ , , k]))
stopifnot(
  identical(colSums(svt), colSums(as.array(svt))),
  identical(colSums(svt, dims=2), colSums(as.array(svt), dims=2)),
  identical(cv1, cv1b),
  identical(cv2, cv2b)
)