Write a matrix-like object as an HDF5-based sparse matrix
The 1.3 Million Brain Cell Dataset and other datasets published by 10x Genomics use an HDF5-based sparse matrix representation instead of the conventional (a.k.a. dense) HDF5 representation.
writeTENxMatrix writes a matrix-like object to this format.
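As a rough illustration of that representation, the base-R sketch below encodes a dense matrix into the vectors that appear as datasets in a TENxMatrix group (data, indices, indptr, shape, genes, barcodes; compare the h5ls() listing in the Examples section). It uses a compressed sparse column layout with 0-based row indices. The helpers to_tenx_vectors() and from_tenx_vectors() are hypothetical names. This is not how writeTENxMatrix is implemented, and the sketch does not touch HDF5 at all.

```r
## Hypothetical helpers (not part of HDF5Array): encode a dense base-R matrix
## into compressed sparse column (CSC) vectors named like the datasets of a
## TENxMatrix group, and decode them back.
to_tenx_vectors <- function(m) {
  nz <- which(m != 0)                        # column-major positions of non-zeros
  rows <- (nz - 1L) %% nrow(m)               # 0-based row index of each non-zero
  cols <- (nz - 1L) %/% nrow(m) + 1L         # 1-based column of each non-zero
  counts <- tabulate(cols, nbins = ncol(m))  # number of non-zeros per column
  list(
    data     = m[nz],                        # non-zero values, column by column
    indices  = as.integer(rows),             # 0-based row indices, same order as data
    indptr   = c(0L, cumsum(counts)),        # column j uses entries indptr[j]+1 .. indptr[j+1]
    shape    = dim(m),                       # c(nrow, ncol)
    genes    = rownames(m),                  # row names (NULL if absent)
    barcodes = colnames(m)                   # column names (NULL if absent)
  )
}

from_tenx_vectors <- function(v) {
  m <- array(vector(typeof(v$data), prod(v$shape)), dim = v$shape)  # all zeros
  cols <- rep(seq_len(v$shape[2]), times = diff(v$indptr))          # column of each entry
  m[cbind(v$indices + 1L, cols)] <- v$data                          # back to 1-based rows
  if (!is.null(v$genes) || !is.null(v$barcodes))
    dimnames(m) <- list(v$genes, v$barcodes)
  m
}

## Small worked example: two non-zeros in a 3 x 2 matrix.
m <- matrix(c(0L, 5L, 0L,
              0L, 0L, 7L), nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("c1", "c2")))
v <- to_tenx_vectors(m)
v$data     # 5 7
v$indices  # 1 2
v$indptr   # 0 1 2
identical(from_tenx_vectors(v), m)  # TRUE
```

Applied to m0 from the examples, this sketch yields 23 values, 23 row indices and an indptr of length 13 (one more than the 12 columns), the same dataset sizes that h5ls() reports for the m0 group.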
IMPORTANT NOTE: Only use writeTENxMatrix if the matrix-like
object to write is sparse, that is, if most of its elements are zero.
Using writeTENxMatrix on dense data is very inefficient, because the format
stores a row index alongside every non-zero value.
For dense data, use writeHDF5Array instead.
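The choice in the note above hinges on the fraction of zero-valued elements, the quantity the examples report with sparsity(). The base-R sketch below computes it for an ordinary matrix. The helper names zero_fraction() and suggest_writer() and the 0.5 cut-off are hypothetical choices for illustration, not a rule defined by HDF5Array.

```r
## Hypothetical helper: fraction of elements equal to zero
## (NAs are counted as non-zero in this sketch).
zero_fraction <- function(m) sum(m == 0, na.rm = TRUE) / length(m)

## Hypothetical rule of thumb: suggest the sparse writer only when most
## elements are zero. The default cut-off of 0.5 is an arbitrary assumption.
suggest_writer <- function(m, min_zero_fraction = 0.5) {
  if (zero_fraction(m) >= min_zero_fraction) "writeTENxMatrix" else "writeHDF5Array"
}

sparse_m <- diag(5L)                # 5 x 5 identity: 20 of 25 elements are zero
dense_m  <- matrix(1:25, nrow = 5)  # no zeros at all
zero_fraction(sparse_m)   # 0.8
suggest_writer(sparse_m)  # "writeTENxMatrix"
suggest_writer(dense_m)   # "writeHDF5Array"
```

For m0 in the examples (23 non-zero values out of 25 x 12 = 300), zero_fraction() gives 277/300, about 0.9233, the same value that sparsity(M0) reports.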
Arguments
- x
The matrix-like object to write to an HDF5 file.
The object to write should typically be sparse, that is, most of its elements should be zero.
If x is a DelayedMatrix object, writeTENxMatrix realizes it on disk, that is, all the delayed operations carried by the object are executed while the object is written to disk.
- filepath
NULL or the path (as a single string) to the (new or existing) HDF5 file where the data will be written. If NULL, the data will be written to the current HDF5 dump file, i.e. to the file whose path is returned by getHDF5DumpFile().
- group
NULL or the name of the HDF5 group where the data will be written. If NULL, the name returned by getHDF5DumpName() will be used.
- level
The compression level to use for writing the data to disk. By default, getHDF5DumpCompressionLevel() will be used. See ?getHDF5DumpCompressionLevel for more information.
- verbose
Whether block processing progress should be displayed. If set to NA (the default), verbosity is controlled by DelayedArray:::get_verbose_block_processing(). Setting verbose to TRUE or FALSE overrides this.
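A call that spells out every argument documented above might look like the sketch below. It assumes HDF5Array is installed (the code is skipped otherwise); the temporary file name, the group name "counts" and the compression level 6 are arbitrary illustrative choices.

```r
if (requireNamespace("HDF5Array", quietly = TRUE)) {
  library(HDF5Array)

  ## A small, mostly-zero integer matrix with row (gene) and column (cell) names.
  m <- matrix(0L, nrow = 4, ncol = 3,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
  m[cbind(c(1, 4), c(2, 3))] <- c(10L, 20L)

  M <- writeTENxMatrix(m,
                       filepath = tempfile(fileext = ".h5"),  # new HDF5 file
                       group    = "counts",                   # group to create in it
                       level    = 6,                          # compression level
                       verbose  = FALSE)                      # no progress messages
  M
}
```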
Details
Depending on the size of the data to write and the performance of the disk,
writeTENxMatrix can take a long time to complete. Use verbose=TRUE to
display its progress.
Use setHDF5DumpFile and setHDF5DumpName to control the location of
automatically created HDF5 datasets, i.e. where the data goes when filepath
or group is NULL.
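When filepath and group are left as NULL, the dump settings decide where the data goes. The sketch below, assuming HDF5Array is installed, points the dump file at a temporary file before writing. The name "toy_counts" is arbitrary, and note that setHDF5DumpFile() changes a session-wide setting.

```r
if (requireNamespace("HDF5Array", quietly = TRUE)) {
  library(HDF5Array)

  dump_file <- tempfile(fileext = ".h5")
  setHDF5DumpFile(dump_file)       # session-wide: where automatic datasets go
  setHDF5DumpName("toy_counts")    # name to use for the next automatic dataset

  m <- matrix(0L, nrow = 4, ncol = 3,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
  m[2, 3] <- 5L

  M <- writeTENxMatrix(m)          # filepath = NULL, group = NULL
  print(path(M))                   # should be the dump file set above
  print(rhdf5::h5ls(path(M)))      # lists the group and its datasets
}
```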
Value
A TENxMatrix object pointing to the newly written HDF5 data on disk.
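Because the returned object only points to data on disk, the same data can be reopened later with the TENxMatrix() constructor, given the file path and the group. A minimal sketch, assuming HDF5Array is installed; the group name "toy" is arbitrary.

```r
if (requireNamespace("HDF5Array", quietly = TRUE)) {
  library(HDF5Array)

  m <- matrix(0L, nrow = 5, ncol = 4,
              dimnames = list(letters[1:5], LETTERS[1:4]))
  m[cbind(1:4, 4:1)] <- 1:4        # four non-zero values

  out_file <- tempfile(fileext = ".h5")
  M <- writeTENxMatrix(m, out_file, group = "toy")

  ## Reopen the same HDF5 data independently of M.
  M_again <- TENxMatrix(out_file, group = "toy")
  all.equal(as.matrix(M_again), m) # expected TRUE
}
```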
See also
TENxMatrix objects.
The TENxBrainData dataset (in the TENxBrainData package).
HDF5-dump-management to control the location and physical properties of automatically created HDF5 datasets.
h5ls to list the content of an HDF5 file.
Examples
## ---------------------------------------------------------------------
## A SIMPLE EXAMPLE
## ---------------------------------------------------------------------
m0 <- matrix(0L, nrow=25, ncol=12,
             dimnames=list(letters[1:25], LETTERS[1:12]))
m0[cbind(2:24, c(12:1, 2:12))] <- 100L + sample(55L, 23, replace=TRUE)
out_file <- tempfile()
M0 <- writeTENxMatrix(m0, out_file, group="m0")
M0
#> <25 x 12> sparse TENxMatrix object of type "integer":
#> A B C D ... I J K L
#> a 0 0 0 0 . 0 0 0 0
#> b 0 0 0 0 . 0 0 0 135
#> c 0 0 0 0 . 0 0 136 0
#> d 0 0 0 0 . 0 155 0 0
#> e 0 0 0 0 . 119 0 0 0
#> . . . . . . . . . .
#> u 0 0 0 0 . 131 0 0 0
#> v 0 0 0 0 . 0 111 0 0
#> w 0 0 0 0 . 0 0 120 0
#> x 0 0 0 0 . 0 0 0 149
#> y 0 0 0 0 . 0 0 0 0
sparsity(M0)
#> [1] 0.9233333
path(M0) # same as 'out_file'
#> [1] "/tmp/RtmpYQEKxJ/filef48ae5ab7d"
## Use h5ls() to list the content of this HDF5 file:
h5ls(path(M0))
#>   group     name       otype  dclass dim
#> 0     /       m0   H5I_GROUP
#> 1   /m0 barcodes H5I_DATASET  STRING  12
#> 2   /m0     data H5I_DATASET INTEGER  23
#> 3   /m0    genes H5I_DATASET  STRING  25
#> 4   /m0  indices H5I_DATASET INTEGER  23
#> 5   /m0   indptr H5I_DATASET INTEGER  13
#> 6   /m0    shape H5I_DATASET INTEGER   2
## ---------------------------------------------------------------------
## USING THE "1.3 Million Brain Cell Dataset"
## ---------------------------------------------------------------------
## The 1.3 Million Brain Cell Dataset from 10x Genomics is available via
## ExperimentHub:
library(ExperimentHub)
hub <- ExperimentHub()
query(hub, "TENxBrainData")
#> ExperimentHub with 8 records
#> # snapshotDate(): 2025-04-12
#> # $dataprovider: 10X Genomics
#> # $species: Mus musculus
#> # $rdataclass: character
#> # additional mcols(): taxonomyid, genome, description,
#> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> # rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["EH1039"]]'
#>
#> title
#> EH1039 | Brain scRNA-seq data, 'HDF5-based 10X Genomics' format
#> EH1040 | Brain scRNA-seq data, 'dense matrix' format
#> EH1041 | Brain scRNA-seq data, sample (column) annotation
#> EH1042 | Brain scRNA-seq data, gene (row) annotation
#> EH1689 | Brain scRNA-seq data 20k subset, 'HDF5-based 10x Genomics' format
#> EH1690 | Brain scRNA-seq data 20k subset, 'dense matrix' format
#> EH1691 | Brain scRNA-seq data 20k subset, sample (column) annotation
#> EH1692 | Brain scRNA-seq data 20k subset, gene (row) annotation
fname <- hub[["EH1039"]]
#> see ?TENxBrainData and browseVignettes('TENxBrainData') for documentation
#> loading from cache
oneM <- TENxMatrix(fname, group="mm10") # see ?TENxMatrix for the details
oneM
#> <27998 x 1306127> sparse TENxMatrix object of type "integer":
#> AAACCTGAGATAGGAG-1 ... TTTGTCATCTGAAAGA-133
#> ENSMUSG00000051951 0 . 0
#> ENSMUSG00000089699 0 . 0
#> ENSMUSG00000102343 0 . 0
#> ENSMUSG00000025900 0 . 0
#> ENSMUSG00000109048 0 . 0
#> ... . . .
#> ENSMUSG00000079808 0 . 0
#> ENSMUSG00000095041 1 . 0
#> ENSMUSG00000063897 0 . 0
#> ENSMUSG00000096730 0 . 0
#> ENSMUSG00000095742 0 . 0
## Note that the following transformation preserves sparsity:
M2 <- log(oneM + 1) # delayed
M2 # a DelayedMatrix instance
#> <27998 x 1306127> sparse DelayedMatrix object of type "double":
#> AAACCTGAGATAGGAG-1 ... TTTGTCATCTGAAAGA-133
#> ENSMUSG00000051951 0 . 0
#> ENSMUSG00000089699 0 . 0
#> ENSMUSG00000102343 0 . 0
#> ENSMUSG00000025900 0 . 0
#> ENSMUSG00000109048 0 . 0
#> ... . . .
#> ENSMUSG00000079808 0.0000000 . 0
#> ENSMUSG00000095041 0.6931472 . 0
#> ENSMUSG00000063897 0.0000000 . 0
#> ENSMUSG00000096730 0.0000000 . 0
#> ENSMUSG00000095742 0.0000000 . 0
## In order to reduce computation times, we'll write only the first
## 5000 columns of M2 to disk:
out_file <- tempfile()
M3 <- writeTENxMatrix(M2[ , 1:5000], out_file, group="mm10", verbose=TRUE)
#> / reading and realizing sparse block 1/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 2/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 3/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 4/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 5/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 6/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 7/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 8/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 9/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 10/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 11/12 ...
#> ok
#> \ Writing it ...
#> OK
#> / reading and realizing sparse block 12/12 ...
#> ok
#> \ Writing it ...
#> OK
#> sparsity: 0.93
M3 # a TENxMatrix instance
#> <27998 x 5000> sparse TENxMatrix object of type "double":
#> AAACCTGAGATAGGAG-1 ... CTGGTCTGTGAGGCTA-1
#> ENSMUSG00000051951 0 . 0
#> ENSMUSG00000089699 0 . 0
#> ENSMUSG00000102343 0 . 0
#> ENSMUSG00000025900 0 . 0
#> ENSMUSG00000109048 0 . 0
#> ... . . .
#> ENSMUSG00000079808 0.0000000 . 0.000000
#> ENSMUSG00000095041 0.6931472 . 1.386294
#> ENSMUSG00000063897 0.0000000 . 0.000000
#> ENSMUSG00000096730 0.0000000 . 0.000000
#> ENSMUSG00000095742 0.0000000 . 0.000000
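The remark in the examples that log(oneM + 1) preserves sparsity follows from log(0 + 1) = 0: every zero stays zero and every positive count stays positive, so the zero pattern is unchanged. The base-R sketch below checks this on a toy count matrix. By contrast, log(x) alone would turn zeros into -Inf. The helper zero_fraction() is a hypothetical name.

```r
## Hypothetical helper: fraction of zero-valued elements.
zero_fraction <- function(m) sum(m == 0, na.rm = TRUE) / length(m)

counts <- matrix(c(0L, 3L, 0L, 0L, 0L, 1L, 0L, 7L), nrow = 2)  # 5 of 8 are zero
logged <- log(counts + 1)

identical(counts == 0, logged == 0)             # TRUE: same zero positions
zero_fraction(counts) == zero_fraction(logged)  # TRUE: 0.625 for both
any(is.infinite(log(counts)))                   # TRUE: log() alone maps 0 to -Inf
```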