SparseArray objects
The SparseArray package defines the SparseArray virtual class whose purpose is to be extended by other S4 classes that aim at representing in-memory multidimensional sparse arrays.
It currently has two concrete subclasses, COO_SparseArray and SVT_SparseArray, both also defined in this package. Each subclass uses its own internal representation for the nonzero multidimensional data: the COO layout for COO_SparseArray, and the SVT layout for SVT_SparseArray. The two layouts are described in the COO_SparseArray and SVT_SparseArray man pages, respectively.
Finally, the package also defines the SparseMatrix virtual class, as a subclass of the SparseArray class, for the specific 2D case.
Arguments
- x
An ordinary matrix or array, or a dg[C|R]Matrix object, or an lg[C|R]Matrix object, or any matrix-like or array-like object that supports coercion to SVT_SparseArray.
- type
A single string specifying the requested type of the object.
By default, the SparseArray object returned by the constructor function has the same type() as x. However, the user can use the type argument to request a different type. Note that doing:
    sa <- SparseArray(x, type=type)
is equivalent to doing:
    sa <- SparseArray(x)
    type(sa) <- type
but the former is more convenient and will generally be more efficient.
Supported types are all R atomic types plus "list".
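The equivalence described above can be checked on a small input. This is a minimal sketch, assuming the SparseArray package is installed; the matrix m and the names sa1 and sa2 are made up for illustration, and the comparison goes through as.array() so that it does not rely on the two objects sharing the same internal representation.

```r
library(SparseArray)

## A small integer matrix with mostly zeros (illustrative data only).
m <- matrix(c(0L, 3L, 0L, 0L, 0L, 7L), nrow=2)

## Request the type directly in the constructor...
sa1 <- SparseArray(m, type="double")

## ...or construct first, then change the type.
sa2 <- SparseArray(m)
type(sa2) <- "double"

type(sa1)
type(sa2)

## Compare the dense equivalents of the two results.
identical(as.array(sa1), as.array(sa2))
```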
Details
The SparseArray class extends the Array virtual class defined in the S4Arrays package. Here is the full SparseArray sub-hierarchy as defined in the SparseArray package (virtual classes are marked with an asterisk):
  : Array class   :                  Array*
  : hierarchy     :                     ^
                                        |
  - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - 
  : SparseArray   :               SparseArray*
  : sub-hierarchy :                ^    ^    ^
                                   |    |    |
                     COO_SparseArray    |    SVT_SparseArray
                            ^           |           ^
  - - - - - - - - - - - - - | - - - - - | - - - - - | - - - - - - - 
  : SparseMatrix      :     |     SparseMatrix*     |
  : sub-sub-hierarchy :     |       ^       ^       |
                            |       |       |       |
                      COO_SparseMatrix     SVT_SparseMatrix
Any object that belongs to a class that extends SparseArray (e.g. a SVT_SparseArray or SVT_SparseMatrix object) is called a SparseArray derivative.
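The relationships in the hierarchy can be queried with the standard S4 tools from the methods package. The following is a minimal sketch, assuming the SparseArray package is installed; the small matrix and the name sm are illustrative only.

```r
library(SparseArray)

## SparseArray and SparseMatrix are virtual classes.
isVirtualClass("SparseArray")
isVirtualClass("SparseMatrix")

## The 2D concrete classes extend both their N-dimensional
## counterpart and the SparseMatrix virtual class.
extends("SVT_SparseMatrix", "SVT_SparseArray")
extends("SVT_SparseMatrix", "SparseMatrix")
extends("COO_SparseMatrix", "COO_SparseArray")
extends("COO_SparseMatrix", "SparseArray")

## An object built from an ordinary matrix is a SparseArray derivative.
sm <- SparseArray(matrix(c(0L, 2L, 0L, 4L), nrow=2))
is(sm, "SparseArray")
is(sm, "SparseMatrix")
```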
Most of the standard matrix and array API defined in base R should
work on SparseArray derivatives, including dim(), length(),
dimnames(), `dimnames<-`(), [, drop(),
`[<-` (subassignment), t(), rbind(), cbind(),
etc...
SparseArray derivatives also support type(), `type<-`(),
is_sparse(), nzcount(), nzwhich(), nzvals(),
`nzvals<-`(), sparsity(), arbind(), and acbind().
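A few of the functions listed above can be tried on a small object. This is a minimal sketch, assuming the SparseArray package is installed; the 4 x 3 matrix m is made up for illustration and holds exactly two nonzero values.

```r
library(SparseArray)

## A 4 x 3 integer matrix with two nonzero elements (illustrative data).
m <- matrix(0L, nrow=4, ncol=3)
m[2, 1] <- 5L
m[4, 3] <- 9L
x <- SparseArray(m)

## Standard matrix/array API from base R:
dim(x)
length(x)
dim(t(x))
dim(rbind(x, x))

## API specific to Array and SparseArray derivatives:
type(x)
is_sparse(x)
nzcount(x)
sparsity(x)   # fraction of elements that are zero
```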
Value
A SparseArray derivative, that is a SVT_SparseArray, COO_SparseArray, SVT_SparseMatrix, or COO_SparseMatrix object.
The type() of the input object is preserved, except if a
different one was requested via the type argument.
What is considered a zero depends on the type():
- "logical" zero is FALSE;
- "integer" zero is 0L;
- "double" zero is 0;
- "complex" zero is 0+0i;
- "raw" zero is raw(1);
- "character" zero is "" (empty string);
- "list" zero is NULL.
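These zeros match the values base R uses to fill a freshly allocated vector of each type. The sketch below uses base R only and illustrates that correspondence; it does not show how the package determines zeros internally. The names types, zeros, and list_zero are made up for illustration.

```r
## One freshly allocated element per atomic type: base R fills new
## vectors with the same values that the list above calls "zero".
types <- c("logical", "integer", "double", "complex", "raw", "character")
zeros <- lapply(setNames(types, types), function(t) vector(t, 1L))
str(zeros)

## A new list is filled with NULL elements.
list_zero <- vector("list", 1L)[[1L]]
is.null(list_zero)

## The raw zero is the single byte 00, i.e. raw(1).
identical(zeros$raw, raw(1))
```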
See also
The COO_SparseArray and SVT_SparseArray classes.
is_nonzero for is_nonzero() and nz*() functions nzcount(), nzwhich(), etc...
SparseArray_aperm for permuting the dimensions of a SparseArray object (e.g. transposition).
SparseArray_subsetting for subsetting a SparseArray object.
SparseArray_subassignment for SparseArray subassignment.
SparseArray_abind for combining 2D or multidimensional SparseArray objects.
SparseArray_summarization for SparseArray summarization methods.
SparseArray_Arith, SparseArray_Compare, and SparseArray_Logic, for operations from the Arith, Compare, and Logic groups on SparseArray objects.
SparseArray_Math for operations from the Math and Math2 groups on SparseArray objects.
SparseArray_Complex for operations from the Complex group on SparseArray objects.
SparseArray_misc for miscellaneous operations on a SparseArray object.
SparseArray_matrixStats for col/row summarization methods for SparseArray objects.
rowsum_methods for rowsum() methods for sparse matrices.
SparseMatrix_mult for SparseMatrix multiplication and cross-product.
randomSparseArray to generate a random SparseArray object.
readSparseCSV to read/write a sparse matrix from/to a CSV (comma-separated values) file.
dgCMatrix-class, dgRMatrix-class, and lgCMatrix-class in the Matrix package, for the de facto standard for sparse matrix representations in the R ecosystem.
is_sparse in the S4Arrays package.
The Array class defined in the S4Arrays package.
Ordinary array objects in base R.
base::which in base R.
Examples
## ---------------------------------------------------------------------
## Display details of class definition & known subclasses
## ---------------------------------------------------------------------
showClass("SparseArray")
#> Virtual Class "SparseArray" [package "SparseArray"]
#>
#> Slots:
#>
#> Name: dim dimnames
#> Class: integer list
#>
#> Extends: "Array"
#>
#> Known Subclasses:
#> Class "SparseMatrix", directly
#> Class "COO_SparseArray", directly
#> Class "SVT_SparseArray", directly
#> Class "COO_SparseMatrix", by class "SparseMatrix", distance 2
#> Class "COO_SparseMatrix", by class "COO_SparseArray", distance 2, with explicit coerce
#> Class "SVT_SparseMatrix", by class "SVT_SparseArray", distance 2
## ---------------------------------------------------------------------
## The SparseArray() constructor
## ---------------------------------------------------------------------
a <- array(rpois(9e6, lambda=0.3), dim=c(500, 3000, 6))
SparseArray(a) # an SVT_SparseArray object
#> <500 x 3000 x 6 SparseArray> of type "integer" [nzcount=2333666 (26%)]:
#> ,,1
#> [,1] [,2] [,3] [,4] ... [,2997] [,2998] [,2999] [,3000]
#> [1,] 1 1 1 0 . 0 0 1 1
#> [2,] 0 1 0 0 . 1 0 0 0
#> ... . . . . . . . . .
#> [499,] 0 0 0 1 . 0 0 0 1
#> [500,] 0 0 0 1 . 0 0 0 0
#>
#> ...
#>
#> ,,6
#> [,1] [,2] [,3] [,4] ... [,2997] [,2998] [,2999] [,3000]
#> [1,] 0 1 1 0 . 0 1 0 1
#> [2,] 0 3 0 1 . 0 0 0 2
#> ... . . . . . . . . .
#> [499,] 0 0 0 0 . 0 0 0 1
#> [500,] 0 0 0 0 . 1 0 0 0
#>
m <- matrix(rpois(9e6, lambda=0.3), ncol=500)
SparseArray(m) # an SVT_SparseMatrix object
#> <18000 x 500 SparseMatrix> of type "integer" [nzcount=2333381 (26%)]:
#> [,1] [,2] [,3] [,4] ... [,497] [,498] [,499] [,500]
#> [1,] 0 0 0 0 . 0 0 0 0
#> [2,] 1 0 1 0 . 0 1 0 0
#> [3,] 0 1 0 2 . 0 0 0 1
#> [4,] 0 0 0 0 . 0 0 1 0
#> [5,] 1 0 0 0 . 1 0 0 0
#> ... . . . . . . . . .
#> [17996,] 0 0 0 0 . 0 0 0 2
#> [17997,] 0 0 0 1 . 0 0 0 0
#> [17998,] 0 1 0 0 . 0 0 0 1
#> [17999,] 1 1 0 0 . 0 0 0 0
#> [18000,] 0 0 0 0 . 1 0 0 1
dgc <- sparseMatrix(i=c(4:1, 2:4, 9:12, 11:9), j=c(1:7, 1:7),
                    x=runif(14), dims=c(12, 7))
class(dgc)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
SparseArray(dgc) # an SVT_SparseMatrix object
#> <12 x 7 SparseMatrix> of type "double" [nzcount=14 (17%)]:
#> [,1] [,2] [,3] ... [,6] [,7]
#> [1,] 0.0000000 0.0000000 0.0000000 . 0.00000000 0.00000000
#> [2,] 0.0000000 0.0000000 0.2981720 . 0.00000000 0.00000000
#> [3,] 0.0000000 0.1679951 0.0000000 . 0.08551527 0.00000000
#> [4,] 0.7284175 0.0000000 0.0000000 . 0.00000000 0.58189642
#> [5,] 0.0000000 0.0000000 0.0000000 . 0.00000000 0.00000000
#> ... . . . . . .
#> [8,] 0.0000000 0.0000000 0.0000000 . 0.0000000 0.0000000
#> [9,] 0.3782352 0.0000000 0.0000000 . 0.0000000 0.6285016
#> [10,] 0.0000000 0.1226004 0.0000000 . 0.4560827 0.0000000
#> [11,] 0.0000000 0.0000000 0.5249243 . 0.0000000 0.0000000
#> [12,] 0.0000000 0.0000000 0.0000000 . 0.0000000 0.0000000
dgr <- as(dgc, "RsparseMatrix")
class(dgr)
#> [1] "dgRMatrix"
#> attr(,"package")
#> [1] "Matrix"
SparseArray(dgr) # a COO_SparseMatrix object
#> <12 x 7 SparseMatrix> of type "double" [nzcount=14 (17%)]:
#> [,1] [,2] [,3] ... [,6] [,7]
#> [1,] 0.0000000 0.0000000 0.0000000 . 0.00000000 0.00000000
#> [2,] 0.0000000 0.0000000 0.2981720 . 0.00000000 0.00000000
#> [3,] 0.0000000 0.1679951 0.0000000 . 0.08551527 0.00000000
#> [4,] 0.7284175 0.0000000 0.0000000 . 0.00000000 0.58189642
#> [5,] 0.0000000 0.0000000 0.0000000 . 0.00000000 0.00000000
#> ... . . . . . .
#> [8,] 0.0000000 0.0000000 0.0000000 . 0.0000000 0.0000000
#> [9,] 0.3782352 0.0000000 0.0000000 . 0.0000000 0.6285016
#> [10,] 0.0000000 0.1226004 0.0000000 . 0.4560827 0.0000000
#> [11,] 0.0000000 0.0000000 0.5249243 . 0.0000000 0.0000000
#> [12,] 0.0000000 0.0000000 0.0000000 . 0.0000000 0.0000000
## ---------------------------------------------------------------------
## nzcount(), nzwhich(), nzvals(), `nzvals<-`()
## ---------------------------------------------------------------------
x <- SparseArray(a)
## Get the number of nonzero array elements in 'x':
nzcount(x)
#> [1] 2333666
## nzwhich() returns the indices of the nonzero array elements in 'x'.
## Either as an integer (or numeric) vector of length 'nzcount(x)'
## containing "linear indices":
nzidx <- nzwhich(x)
length(nzidx)
#> [1] 2333666
head(nzidx)
#> [1] 1 5 10 14 15 26
## Or as an integer matrix with 'nzcount(x)' rows and one column per
## dimension where the rows represent "array indices" (a.k.a. "array
## coordinates"):
Mnzidx <- nzwhich(x, arr.ind=TRUE)
dim(Mnzidx)
#> [1] 2333666 3
## Each row in the matrix is an n-tuple representing the "array
## coordinates" of a nonzero element in 'x':
head(Mnzidx)
#> [,1] [,2] [,3]
#> [1,] 1 1 1
#> [2,] 5 1 1
#> [3,] 10 1 1
#> [4,] 14 1 1
#> [5,] 15 1 1
#> [6,] 26 1 1
tail(Mnzidx)
#> [,1] [,2] [,3]
#> [2333661,] 477 3000 6
#> [2333662,] 482 3000 6
#> [2333663,] 487 3000 6
#> [2333664,] 489 3000 6
#> [2333665,] 497 3000 6
#> [2333666,] 499 3000 6
## Extract the values of the nonzero array elements in 'x' and return
## them in a vector "parallel" to 'nzwhich(x)':
x_nzvals <- nzvals(x) # equivalent to 'x[nzwhich(x)]'
length(x_nzvals)
#> [1] 2333666
head(x_nzvals)
#> [1] 1 1 1 1 1 1
nzvals(x) <- log1p(nzvals(x))
x
#> <500 x 3000 x 6 SparseArray> of type "double" [nzcount=2333666 (26%)]:
#> ,,1
#> [,1] [,2] [,3] ... [,2999] [,3000]
#> [1,] 0.6931472 0.6931472 0.6931472 . 0.6931472 0.6931472
#> [2,] 0.0000000 0.6931472 0.0000000 . 0.0000000 0.0000000
#> ... . . . . . .
#> [499,] 0 0 0 . 0.0000000 0.6931472
#> [500,] 0 0 0 . 0.0000000 0.0000000
#>
#> ...
#>
#> ,,6
#> [,1] [,2] [,3] ... [,2999] [,3000]
#> [1,] 0.0000000 0.6931472 0.6931472 . 0.0000000 0.6931472
#> [2,] 0.0000000 1.3862944 0.0000000 . 0.0000000 1.0986123
#> ... . . . . . .
#> [499,] 0 0 0 . 0.0000000 0.6931472
#> [500,] 0 0 0 . 0.0000000 0.0000000
#>
## Sanity checks:
stopifnot(identical(nzidx, which(a != 0)))
stopifnot(identical(Mnzidx, which(a != 0, arr.ind=TRUE, useNames=FALSE)))
stopifnot(identical(x_nzvals, a[nzidx]))
stopifnot(identical(x_nzvals, a[Mnzidx]))
stopifnot(identical(`nzvals<-`(x, nzvals(x)), x))