Write a matrix-like object as an HDF5-based sparse matrix
The 1.3 Million Brain Cell Dataset and other datasets published by 10x Genomics use an HDF5-based sparse matrix representation instead of the conventional (a.k.a. dense) HDF5 representation. writeTENxMatrix writes a matrix-like object to this format.
IMPORTANT NOTE: Only use writeTENxMatrix if the matrix-like object to write is sparse, that is, if most of its elements are zero. Using writeTENxMatrix on dense data is very inefficient! In this case, you should use writeHDF5Array instead.
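The note above can be turned into a quick check before writing. The following is a minimal base R sketch, not part of the package: the helper names zero_fraction and suggest_writer are hypothetical, and the 0.5 cutoff is only a literal reading of "most of its elements are zero", not a documented threshold.

```r
# Fraction of zero-valued elements in an ordinary (in-memory) matrix.
zero_fraction <- function(m) mean(m == 0)

# Hypothetical helper: returns the name of the writer suggested by the
# note above. The 0.5 cutoff is an assumption made for this sketch.
suggest_writer <- function(m, cutoff = 0.5) {
    if (zero_fraction(m) > cutoff) "writeTENxMatrix" else "writeHDF5Array"
}

# A mostly-zero matrix (23 nonzero values out of 300) and a fully dense one.
sparse_m <- matrix(0L, nrow = 25, ncol = 12)
sparse_m[cbind(2:24, c(12:1, 2:12))] <- 1L
dense_m <- matrix(1:300, nrow = 25, ncol = 12)

suggest_writer(sparse_m)
suggest_writer(dense_m)
```

For on-disk or delayed objects, computing the zero fraction this way may itself be expensive, so the check is best suited to data that already fits in memory.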
Arguments
- x
The matrix-like object to write to an HDF5 file.
The object to write should typically be sparse, that is, most of its elements should be zero.
If x is a DelayedMatrix object, writeTENxMatrix realizes it on disk, that is, all the delayed operations carried by the object are executed while the object is written to disk.
- filepath
NULL or the path (as a single string) to the (new or existing) HDF5 file where to write the data. If NULL, then the data will be written to the current HDF5 dump file, i.e. to the file whose path is returned by getHDF5DumpFile().
- group
NULL or the name of the HDF5 group where to write the data. If NULL, then the name returned by getHDF5DumpName will be used.
- level
The compression level to use for writing the data to disk. By default, getHDF5DumpCompressionLevel() will be used. See ?getHDF5DumpCompressionLevel for more information.
- verbose
Whether block processing progress should be displayed or not. If set to NA (the default), verbosity is controlled by DelayedArray:::get_verbose_block_processing(). Setting verbose to TRUE or FALSE overrides this.
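As an illustration of how these arguments fit together, here is a small sketch that writes a DelayedMatrix carrying one delayed operation. The toy matrix, the group name "doubled", and the compression level 6L are choices made for this example only.

```r
library(HDF5Array)

# A small, mostly-zero integer matrix with dimnames.
m <- matrix(0L, nrow = 8, ncol = 4,
            dimnames = list(letters[1:8], LETTERS[1:4]))
m[cbind(1:4, 1:4)] <- 1:4

# Wrap it in a DelayedMatrix and add a delayed multiplication.
# Nothing is computed at this point.
D <- DelayedArray(m) * 2L

# Writing D realizes the delayed multiplication on disk.
out_file <- tempfile(fileext = ".h5")
M <- writeTENxMatrix(D, filepath = out_file, group = "doubled",
                     level = 6L, verbose = FALSE)
M
```

Passing verbose=FALSE explicitly silences progress reporting regardless of the session-wide block processing setting.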
Details
Please note that, depending on the size of the data to write to disk and the performance of the disk, writeTENxMatrix can take a long time to complete. Use verbose=TRUE to see its progress.
Use setHDF5DumpFile and setHDF5DumpName to control the location of automatically created HDF5 datasets.
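A minimal sketch of relying on the dump settings instead of passing filepath and group explicitly. The dump file location and the name "my_sparse_data" are hypothetical values picked for this example; note that both setters change session-wide state.

```r
library(HDF5Array)

# Hypothetical dump file and dump name, chosen only for this sketch.
dump_file <- tempfile(fileext = ".h5")
setHDF5DumpFile(dump_file)
setHDF5DumpName("my_sparse_data")

m <- matrix(0L, nrow = 10, ncol = 5,
            dimnames = list(letters[1:10], LETTERS[1:5]))
m[cbind(1:5, 1:5)] <- 1:5

# filepath and group are left as NULL, so the dump settings above apply.
M <- writeTENxMatrix(m)
path(M)          # the file the data was written to
h5ls(path(M))    # the group and datasets that were created
```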
Value
A TENxMatrix object pointing to the newly written HDF5 data on disk.
See also
- TENxMatrix objects.
- The TENxBrainData dataset (in the TENxBrainData package).
- HDF5-dump-management to control the location and physical properties of automatically created HDF5 datasets.
- h5ls to list the content of an HDF5 file.
Examples
## ---------------------------------------------------------------------
## A SIMPLE EXAMPLE
## ---------------------------------------------------------------------
m0 <- matrix(0L, nrow=25, ncol=12,
             dimnames=list(letters[1:25], LETTERS[1:12]))
m0[cbind(2:24, c(12:1, 2:12))] <- 100L + sample(55L, 23, replace=TRUE)
out_file <- tempfile()
M0 <- writeTENxMatrix(m0, out_file, group="m0")
M0
#> <25 x 12> sparse TENxMatrix object of type "integer":
#> A B C D ... I J K L
#> a 0 0 0 0 . 0 0 0 0
#> b 0 0 0 0 . 0 0 0 135
#> c 0 0 0 0 . 0 0 136 0
#> d 0 0 0 0 . 0 155 0 0
#> e 0 0 0 0 . 119 0 0 0
#> . . . . . . . . . .
#> u 0 0 0 0 . 131 0 0 0
#> v 0 0 0 0 . 0 111 0 0
#> w 0 0 0 0 . 0 0 120 0
#> x 0 0 0 0 . 0 0 0 149
#> y 0 0 0 0 . 0 0 0 0
sparsity(M0)
#> [1] 0.9233333
path(M0) # same as 'out_file'
#> [1] "/tmp/RtmpYQEKxJ/filef48ae5ab7d"
## Use h5ls() to list the content of this HDF5 file:
h5ls(path(M0))
#> group name otype dclass dim
#> 0 / m0 H5I_GROUP
#> 1 /m0 barcodes H5I_DATASET STRING 12
#> 2 /m0 data H5I_DATASET INTEGER 23
#> 3 /m0 genes H5I_DATASET STRING 25
#> 4 /m0 indices H5I_DATASET INTEGER 23
#> 5 /m0 indptr H5I_DATASET INTEGER 13
#> 6 /m0 shape H5I_DATASET INTEGER 2
## ---------------------------------------------------------------------
## USING THE "1.3 Million Brain Cell Dataset"
## ---------------------------------------------------------------------
## The 1.3 Million Brain Cell Dataset from 10x Genomics is available via
## ExperimentHub:
library(ExperimentHub)
hub <- ExperimentHub()
query(hub, "TENxBrainData")
#> ExperimentHub with 8 records
#> # snapshotDate(): 2025-04-12
#> # $dataprovider: 10X Genomics
#> # $species: Mus musculus
#> # $rdataclass: character
#> # additional mcols(): taxonomyid, genome, description,
#> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> # rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["EH1039"]]'
#>
#> title
#> EH1039 | Brain scRNA-seq data, 'HDF5-based 10X Genomics' format
#> EH1040 | Brain scRNA-seq data, 'dense matrix' format
#> EH1041 | Brain scRNA-seq data, sample (column) annotation
#> EH1042 | Brain scRNA-seq data, gene (row) annotation
#> EH1689 | Brain scRNA-seq data 20k subset, 'HDF5-based 10x Genomics' format
#> EH1690 | Brain scRNA-seq data 20k subset, 'dense matrix' format
#> EH1691 | Brain scRNA-seq data 20k subset, sample (column) annotation
#> EH1692 | Brain scRNA-seq data 20k subset, gene (row) annotation
fname <- hub[["EH1039"]]
#> see ?TENxBrainData and browseVignettes('TENxBrainData') for documentation
#> loading from cache
oneM <- TENxMatrix(fname, group="mm10") # see ?TENxMatrix for the details
oneM
#> <27998 x 1306127> sparse TENxMatrix object of type "integer":
#> AAACCTGAGATAGGAG-1 ... TTTGTCATCTGAAAGA-133
#> ENSMUSG00000051951 0 . 0
#> ENSMUSG00000089699 0 . 0
#> ENSMUSG00000102343 0 . 0
#> ENSMUSG00000025900 0 . 0
#> ENSMUSG00000109048 0 . 0
#> ... . . .
#> ENSMUSG00000079808 0 . 0
#> ENSMUSG00000095041 1 . 0
#> ENSMUSG00000063897 0 . 0
#> ENSMUSG00000096730 0 . 0
#> ENSMUSG00000095742 0 . 0
## Note that the following transformation preserves sparsity:
M2 <- log(oneM + 1) # delayed
M2 # a DelayedMatrix instance
#> <27998 x 1306127> sparse DelayedMatrix object of type "double":
#> AAACCTGAGATAGGAG-1 ... TTTGTCATCTGAAAGA-133
#> ENSMUSG00000051951 0 . 0
#> ENSMUSG00000089699 0 . 0
#> ENSMUSG00000102343 0 . 0
#> ENSMUSG00000025900 0 . 0
#> ENSMUSG00000109048 0 . 0
#> ... . . .
#> ENSMUSG00000079808 0.0000000 . 0
#> ENSMUSG00000095041 0.6931472 . 0
#> ENSMUSG00000063897 0.0000000 . 0
#> ENSMUSG00000096730 0.0000000 . 0
#> ENSMUSG00000095742 0.0000000 . 0
## In order to reduce computation times, we'll write only the first
## 5000 columns of M2 to disk:
out_file <- tempfile()
M3 <- writeTENxMatrix(M2[ , 1:5000], out_file, group="mm10", verbose=TRUE)
#> / reading and realizing sparse block 1/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 2/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 3/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 4/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 5/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 6/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 7/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 8/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 9/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 10/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 11/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 12/12 ...
#> ok
#> \ Writing it ...
#> OK
#> sparsity: 0.93
M3 # a TENxMatrix instance
#> <27998 x 5000> sparse TENxMatrix object of type "double":
#> AAACCTGAGATAGGAG-1 ... CTGGTCTGTGAGGCTA-1
#> ENSMUSG00000051951 0 . 0
#> ENSMUSG00000089699 0 . 0
#> ENSMUSG00000102343 0 . 0
#> ENSMUSG00000025900 0 . 0
#> ENSMUSG00000109048 0 . 0
#> ... . . .
#> ENSMUSG00000079808 0.0000000 . 0.000000
#> ENSMUSG00000095041 0.6931472 . 1.386294
#> ENSMUSG00000063897 0.0000000 . 0.000000
#> ENSMUSG00000096730 0.0000000 . 0.000000
#> ENSMUSG00000095742 0.0000000 . 0.000000
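The h5ls() listing in the simple example shows that the group holds data, indices, indptr and shape datasets, with indptr one element longer than the number of columns. This matches a compressed sparse column layout with 0-based row indices. As a rough illustration of how the four vectors encode the matrix, here is a base R sketch; the function name csc_to_dense and the toy vectors are hypothetical, and the sketch does not read from an HDF5 file.

```r
# Rebuild a dense matrix from compressed sparse column components.
#   data    : the nonzero values, stored column by column
#   indices : 0-based row index of each nonzero value
#   indptr  : 0-based offsets into 'data' where each column starts,
#             with one extra trailing element (the total number of nonzeros)
#   shape   : c(nrow, ncol)
csc_to_dense <- function(data, indices, indptr, shape) {
    # diff(indptr) gives the number of nonzero values in each column.
    col <- rep(seq_len(shape[2]), times = diff(indptr))
    m <- matrix(0L, nrow = shape[1], ncol = shape[2])
    m[cbind(indices + 1L, col)] <- data
    m
}

# Toy 3 x 3 example: column 1 holds one value, column 2 none, column 3 two.
toy <- csc_to_dense(data    = c(5L, 7L, 9L),
                    indices = c(1L, 0L, 2L),
                    indptr  = c(0L, 1L, 1L, 3L),
                    shape   = c(3L, 3L))
toy
```

In practice there is no need to decode the file by hand: a TENxMatrix object such as M0 or M3 already exposes the data as a matrix-like object.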