Random SparseArray object
randomSparseArray() and poissonSparseArray() can be used
to generate a random SparseArray object efficiently.
Usage
randomSparseArray(dim, density=0.05, dimnames=NULL)
poissonSparseArray(dim, lambda=-log(0.95), density=NA, dimnames=NULL)
## Convenience wrappers for the 2D case:
randomSparseMatrix(nrow, ncol, density=0.05, dimnames=NULL)
poissonSparseMatrix(nrow, ncol, lambda=-log(0.95), density=NA,
                    dimnames=NULL)
Arguments
- dim
The dimensions (specified as an integer vector) of the SparseArray object to generate.
- density
The desired density (specified as a number >= 0 and <= 1) of the SparseArray object to generate, that is, the ratio between its number of nonzero elements and its total number of elements. This is nzcount(x)/length(x) or 1 - sparsity(x).
Note that for poissonSparseArray() and poissonSparseMatrix(), density must be < 1 and the actual density of the returned object won't be exactly as requested but will typically be very close.
- dimnames
The dimnames to put on the object to generate. Must be NULL or a list of length the number of dimensions. Each list element must be either NULL or a character vector along the corresponding dimension.
- lambda
The mean of the Poisson distribution. Passed internally to the calls to rpois().
Only one of lambda and density can be specified.
When density is requested, rpois() is called internally with lambda set to -log(1 - density). This is expected to generate Poisson data with the requested density.
Finally, note that the default value for lambda corresponds to a requested density of 0.05.
- nrow, ncol
Number of rows and columns of the SparseMatrix object to generate.
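The relationship between lambda and density follows from the Poisson probability of a nonzero draw, P(X > 0) = 1 - exp(-lambda). The following is a minimal base-R sketch of that arithmetic. The variable names are illustrative, and it calls rpois() directly rather than the package functions:

```r
# Requested density and the lambda derived from it, as described above.
density <- 0.05
lambda <- -log(1 - density)

# Probability that a Poisson(lambda) draw is nonzero.
expected_density <- 1 - exp(-lambda)

# Empirical check on a plain vector of Poisson draws.
set.seed(42)
x <- rpois(1e6, lambda)
observed_density <- mean(x != 0)

expected_density
observed_density
```

The observed value fluctuates around the requested one from run to run, which is consistent with the note that the actual density is close to, but not exactly, the requested density.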
Details
randomSparseArray() mimics the rsparsematrix()
function from the Matrix package but returns a SparseArray
object instead of a dgCMatrix object.
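poissonSparseArray() populates a SparseArray object with
Poisson data, i.e. it's equivalent to generating a dense array a with
array(rpois(prod(dim), lambda), dim) and then calling SparseArray(a),
but is faster and more memory efficient because the intermediate dense
array a is never generated.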
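To illustrate why the dense intermediate can be avoided, here is a minimal base-R sketch that generates the data one column at a time and keeps only the nonzero entries. The helper sparse_rpois_columns() and its list-of-columns output are hypothetical, introduced for illustration only; this is not a description of how poissonSparseArray() is implemented.

```r
# Hypothetical helper (not part of SparseArray): draw Poisson data one
# column at a time and keep only the nonzero entries of each column, so
# the full dense matrix is never allocated.
sparse_rpois_columns <- function(nrow, ncol, lambda) {
    lapply(seq_len(ncol), function(j) {
        col <- rpois(nrow, lambda)   # a single dense column
        nz <- which(col != 0L)       # row offsets of its nonzero entries
        list(offsets = nz, values = col[nz])
    })
}

set.seed(123)
cols <- sparse_rpois_columns(1000, 50, lambda = 0.01)

# Dense reference drawn with the same seed, for comparison only.
set.seed(123)
a <- matrix(rpois(1000 * 50, lambda = 0.01), nrow = 1000)

# Total number of nonzero entries in each representation.
sum(lengths(lapply(cols, `[[`, "values")))
sum(a != 0L)
```

At any point only one column is held in dense form, so peak memory grows with the number of nonzero entries rather than with the total number of array elements.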
Value
A SparseArray derivative (of class SVT_SparseArray or SVT_SparseMatrix) with the requested dimensions and density.
The type of the returned object is "double" for
randomSparseArray() and randomSparseMatrix(),
and "integer" for poissonSparseArray() and
poissonSparseMatrix().
Note
Unlike with Matrix::rsparsematrix(), there's no
limit on the number of nonzero elements that can be contained in the
returned SparseArray object.
For example, Matrix::rsparsematrix(3e5, 2e4, density=0.5) will fail
with an error but randomSparseMatrix(3e5, 2e4, density=0.5) should
work (even though it will take some time and the memory footprint of the
resulting object will be about 18 Gb).
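The scale of this example can be checked with a short base-R computation. Linking the Matrix failure to the 32-bit integer limit is an assumption made here for illustration; this page does not state the cause of the error.

```r
# Dimensions and density from the example above.
nrow <- 3e5
ncol <- 2e4
density <- 0.5

# Expected number of nonzero elements (computed in double precision).
nz <- nrow * ncol * density

# Largest value representable as an R integer.
int_max <- .Machine$integer.max

# TRUE if the nonzero count cannot be held in a 32-bit integer.
exceeds_int <- nz > int_max

nz
int_max
exceeds_int
```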
See also
- The Matrix::rsparsematrix function in the Matrix package.
- The stats::rpois function in the stats package.
- SVT_SparseArray objects.
Examples
## ---------------------------------------------------------------------
## randomSparseArray() / randomSparseMatrix()
## ---------------------------------------------------------------------
set.seed(123)
dgcm1 <- rsparsematrix(2500, 950, density=0.1)
set.seed(123)
svt1 <- randomSparseMatrix(2500, 950, density=0.1)
svt1
#> <2500 x 950 SparseMatrix> of type "double" [nzcount=237500 (10%)]:
#> [,1] [,2] [,3] ... [,949] [,950]
#> [1,] 0 0 0 . 0 0
#> [2,] 0 0 0 . 0 0
#> [3,] 0 0 0 . 0 0
#> [4,] 0 0 0 . 0 0
#> [5,] 0 0 0 . 0 0
#> ... . . . . . .
#> [2496,] 0.00 0.00 0.00 . -0.18 0.00
#> [2497,] 0.00 0.00 0.00 . 0.00 0.00
#> [2498,] 0.00 0.00 0.00 . 0.00 0.00
#> [2499,] 0.00 0.00 0.00 . 0.00 0.00
#> [2500,] 0.00 0.53 0.00 . 0.00 0.00
type(svt1) # "double"
#> [1] "double"
stopifnot(identical(as(svt1, "dgCMatrix"), dgcm1))
## ---------------------------------------------------------------------
## poissonSparseArray() / poissonSparseMatrix()
## ---------------------------------------------------------------------
svt2 <- poissonSparseMatrix(2500, 950, density=0.1)
svt2
#> <2500 x 950 SparseMatrix> of type "integer" [nzcount=237276 (10%)]:
#> [,1] [,2] [,3] [,4] ... [,947] [,948] [,949] [,950]
#> [1,] 1 0 0 0 . 0 0 0 1
#> [2,] 1 0 0 0 . 0 0 0 0
#> [3,] 0 1 0 0 . 0 1 0 0
#> [4,] 0 1 0 0 . 0 0 0 0
#> [5,] 0 0 0 0 . 0 0 0 0
#> ... . . . . . . . . .
#> [2496,] 0 0 0 0 . 0 1 0 0
#> [2497,] 0 0 0 0 . 0 0 0 0
#> [2498,] 0 0 0 0 . 0 0 0 0
#> [2499,] 1 0 1 0 . 0 0 0 2
#> [2500,] 0 0 1 0 . 0 0 0 0
type(svt2) # "integer"
#> [1] "integer"
1 - sparsity(svt2) # very close to the requested density
#> [1] 0.09990568
set.seed(123)
svt3 <- poissonSparseArray(c(600, 1700, 80), lambda=0.01)
set.seed(123)
a3 <- array(rpois(length(svt3), lambda=0.01), dim(svt3))
stopifnot(identical(svt3, SparseArray(a3)))
## The memory footprint of 'svt3' is more than 10x smaller than that of 'a3':
object.size(svt3)
#> 20613424 bytes
object.size(a3)
#> 326400224 bytes
as.double(object.size(a3) / object.size(svt3))
#> [1] 15.83435