Write an array-like object to an HDF5 file
writeHDF5Array.Rd
A function for writing an array-like object to an HDF5 file.
Usage
writeHDF5Array(x, filepath=NULL, name=NULL,
H5type=NULL, chunkdim=NULL, level=NULL, as.sparse=NA,
with.dimnames=TRUE, verbose=NA)
Arguments
- x
The array-like object to write to an HDF5 file.
If x is a DelayedArray object, writeHDF5Array realizes it on disk, that is, all the delayed operations carried by the object are executed while the object is written to disk. See the "On-disk realization of a DelayedArray object as an HDF5 dataset" section below for more information.
- filepath
NULL or the path (as a single string) to the (new or existing) HDF5 file where the dataset will be written. If NULL, then the dataset will be written to the current HDF5 dump file, i.e. to the file whose path is returned by getHDF5DumpFile.
- name
NULL or the name of the HDF5 dataset to write. If NULL, then the name returned by getHDF5DumpName will be used.
- H5type
The H5 datatype to use for the HDF5 dataset to be written to the HDF5 file is automatically inferred from the type of x (type(x)). Advanced users can override this by specifying the H5 datatype they want via the H5type argument.
See rhdf5::h5const("H5T") for a list of available H5 datatypes. See the References section below for the link to the HDF Group's Support Portal where H5 predefined datatypes are documented.
A typical use case is to use a datatype that is smaller than the automatic one in order to reduce the size of the dataset on disk. For example, you could use "H5T_IEEE_F32LE" when type(x) is "double" and you don't care about preserving the precision of 64-bit floating-point numbers (the automatic H5 datatype used for "double" is "H5T_IEEE_F64LE"). Another example is to use "H5T_STD_U16LE" when x contains small non-negative integer values like counts (the automatic H5 datatype used for "integer" is "H5T_STD_I32LE").
- chunkdim
The dimensions of the chunks to use for writing the data to disk. By default (i.e. when chunkdim is set to NULL), getHDF5DumpChunkDim(dim(x)) will be used. See ?getHDF5DumpChunkDim for more information.
Set chunkdim to 0 to write unchunked data (a.k.a. contiguous data).
- level
The compression level to use for writing the data to disk. By default, getHDF5DumpCompressionLevel() will be used. See ?getHDF5DumpCompressionLevel for more information.
- as.sparse
Whether the data in the returned HDF5Array object should be flagged as sparse or not. If set to NA (the default), then is_sparse(x) is used.
IMPORTANT NOTE: This only controls the as.sparse flag of the returned HDF5Array object. See the man page of the HDF5Array() constructor for more information. In particular, this does NOT affect how the data will be laid out in the HDF5 file in any way (HDF5 doesn't natively support sparse storage at the moment). In other words, the data will always be stored in a dense format, even when as.sparse is set to TRUE.
- with.dimnames
Whether the dimnames on x should also be written to the HDF5 file or not. TRUE by default.
Note that h5writeDimnames is used internally to write the dimnames to disk. Setting with.dimnames to FALSE and calling h5writeDimnames is another way to write the dimnames on x to disk that gives more control. See ?h5writeDimnames for more information.
- verbose
Whether block processing progress should be displayed or not. If set to NA (the default), verbosity is controlled by DelayedArray:::get_verbose_block_processing(). Setting verbose to TRUE or FALSE overrides this.
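The H5type override described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the counts matrix and the dataset names "counts_i32" and "counts_u16" are made up for the purpose, and the sketch assumes that every value fits in the 0..65535 range of a 16-bit unsigned integer.

```r
library(HDF5Array)

## Hypothetical count matrix, used only for illustration.
counts <- matrix(rpois(200, lambda=5), nrow=20)
h5file <- tempfile(fileext=".h5")

## Default behavior: "integer" data is written as "H5T_STD_I32LE".
A1 <- writeHDF5Array(counts, h5file, name="counts_i32")

## Override: store the same values as 16-bit unsigned integers.
## Assumption: all values are non-negative and fit in 16 bits.
A2 <- writeHDF5Array(counts, h5file, name="counts_u16",
                     H5type="H5T_STD_U16LE")

## Both datasets now live in the same file.
rhdf5::h5ls(h5file)
```

Whether the smaller datatype noticeably reduces the file size also depends on the chunk dimensions and the compression level, so it is worth checking the result on your own data.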
Details
Please note that, depending on the size of the data to write to disk and the performance of the disk, writeHDF5Array() can take a long time to complete. Use verbose=TRUE to see its progress.
Use setHDF5DumpFile and setHDF5DumpName to control the location of automatically created HDF5 datasets.
Use setHDF5DumpChunkLength, setHDF5DumpChunkShape, and setHDF5DumpCompressionLevel to control the physical properties of automatically created HDF5 datasets.
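As a rough sketch of how the dump settings interact with writeHDF5Array() when filepath and name are left as NULL: the file path, the dataset name "my_dataset", and the compression level below are arbitrary choices made for illustration. Note that these setters change session-wide state.

```r
library(HDF5Array)

## Hypothetical dump file; it is created if it does not exist yet.
dump_file <- tempfile(fileext=".h5")
setHDF5DumpFile(dump_file)

## Name to use for the next automatically created dataset.
setHDF5DumpName("my_dataset")

## Trade write speed for a smaller file (9 is the maximum level).
setHDF5DumpCompressionLevel(9L)

## No 'filepath' and no 'name': the dump settings above are used.
m <- matrix(runif(100), nrow=10)
M <- writeHDF5Array(m)

rhdf5::h5ls(dump_file)
```

See ?setHDF5DumpFile for how the dump file and dump name behave across successive writes.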
Value
An HDF5Array object pointing to the newly written HDF5 dataset on disk.
On-disk realization of a DelayedArray object as an HDF5 dataset
When passed a DelayedArray object, writeHDF5Array realizes it on disk, that is, all the delayed operations carried by the object are executed on-the-fly while the object is written to disk. This uses a block-processing strategy so that the full object is not realized at once in memory. Instead the object is processed block by block, i.e. the blocks are realized in memory and written to disk one at a time.
In other words, writeHDF5Array(x, ...) is semantically equivalent to writeHDF5Array(as.array(x), ...), except that as.array(x) is not called because this would realize the full object at once in memory.
See ?DelayedArray for general information about DelayedArray objects.
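The equivalence stated above can be checked on a small object, where calling as.array() is harmless. This is only a sketch: the matrix m and the dataset name "D" are made up for illustration, and on real data the point of writeHDF5Array() is precisely to avoid the as.array() call.

```r
library(HDF5Array)

## Small in-memory matrix wrapped in a DelayedArray object.
m <- matrix(runif(60), nrow=6)
D <- log(t(DelayedArray(m)) + 1)   # delayed ops: nothing computed yet

## Realize D on disk, block by block.
h5file <- tempfile(fileext=".h5")
H <- writeHDF5Array(D, h5file, name="D")

## On a toy object we can afford to realize D in memory and compare.
all.equal(as.matrix(H), as.array(D))
```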
References
Documentation of the H5 predefined datatypes on the HDF Group's Support Portal: https://portal.hdfgroup.org/display/HDF5/Predefined+Datatypes
See also
HDF5Array objects.
h5writeDimnames for writing the dimnames of an HDF5 dataset to disk.
saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment in this package (the HDF5Array package) for saving/loading an HDF5-based SummarizedExperiment object to/from disk.
HDF5-dump-management to control the location and physical properties of automatically created HDF5 datasets.
h5ls to list the content of an HDF5 file.
Examples
## ---------------------------------------------------------------------
## WRITE AN ORDINARY ARRAY TO AN HDF5 FILE
## ---------------------------------------------------------------------
m0 <- matrix(runif(364, min=-1), nrow=26,
dimnames=list(letters, LETTERS[1:14]))
h5file <- tempfile(fileext=".h5")
M1 <- writeHDF5Array(m0, h5file, name="M1", chunkdim=c(5, 5))
M1
#> <26 x 14> HDF5Matrix object of type "double":
#> A B C ... M N
#> a -0.2415176 -0.1878519 0.1188636 . 0.09173722 -0.97734429
#> b -0.7996472 -0.2653078 0.5728390 . -0.58845059 -0.33130814
#> c 0.5510588 0.6076421 0.6109381 . 0.87479056 0.45154500
#> d -0.8835101 0.2675731 -0.3448681 . -0.67720860 -0.08890004
#> e -0.9128866 -0.3612784 -0.7045329 . -0.48803076 -0.27109176
#> . . . . . . .
#> v -0.15800392 0.94374470 0.03388799 . -0.33608094 0.23204147
#> w 0.40076755 -0.46812408 0.31903549 . 0.08697505 0.83354805
#> x -0.83867399 0.52069122 0.56992991 . -0.68192514 -0.51017577
#> y -0.57804483 -0.75715845 -0.47991500 . -0.86075243 0.89710130
#> z -0.78194136 -0.83990164 0.30505773 . 0.15409405 -0.66938476
chunkdim(M1)
#> [1] 5 5
## By default, writeHDF5Array() writes the dimnames to the HDF5 file:
dimnames(M1) # same as 'dimnames(m0)'
#> [[1]]
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
#>
#> [[2]]
#> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N"
#>
## Use 'with.dimnames=FALSE' to not write the dimnames to the file:
M1b <- writeHDF5Array(m0, h5file, name="M1b", with.dimnames=FALSE)
dimnames(M1b) # no dimnames
#> NULL
## With sparse data:
sm <- rsparsematrix(20, 8, density=0.1)
M2 <- writeHDF5Array(sm, h5file, name="M2", chunkdim=c(5, 5))
M2
#> <20 x 8> sparse HDF5Matrix object of type "double":
#> [,1] [,2] [,3] ... [,7] [,8]
#> [1,] 0.00 0.00 0.00 . 0.00 0.00
#> [2,] 0.00 0.00 0.66 . 0.00 0.00
#> [3,] 0.00 0.00 0.00 . 0.00 0.00
#> [4,] 0.00 0.00 0.00 . 0.00 0.00
#> [5,] -1.30 0.00 0.00 . 0.00 0.46
#> ... . . . . . .
#> [16,] 0.760 0.000 0.000 . 0.00 0.00
#> [17,] 0.000 0.000 0.000 . 0.00 0.00
#> [18,] -0.120 0.000 0.000 . 0.00 -0.64
#> [19,] 0.000 0.097 0.000 . 0.24 0.00
#> [20,] 0.000 0.000 0.000 . 0.00 0.00
is_sparse(M2) # TRUE
#> [1] TRUE
## ---------------------------------------------------------------------
## WRITE A DelayedArray OBJECT TO AN HDF5 FILE
## ---------------------------------------------------------------------
M3 <- log(t(DelayedArray(m0)) + 1)
M3 <- writeHDF5Array(M3, h5file, name="M3", chunkdim=c(5, 5))
M3
#> <14 x 26> HDF5Matrix object of type "double":
#> a b c ... y z
#> A -0.27643570 -1.60767569 0.43893780 . -0.86285620 -1.52299126
#> B -0.20807253 -0.30830366 0.47476855 . -1.41534612 -1.83196693
#> C 0.11231355 0.45288229 0.47681669 . -0.65376301 0.26624728
#> D 0.50309853 0.29427080 0.30847928 . 0.30508403 -3.10765286
#> E -0.13610159 -1.52728066 -0.02887988 . 0.05293947 -0.03455397
#> . . . . . . .
#> J 0.5047168 0.4204686 0.2668205 . -1.1184180 0.5872220
#> K 0.6326966 -1.2948789 -0.8031142 . -1.0752546 0.5189194
#> L 0.5539972 -0.3907982 0.1054876 . -0.1159769 -0.2439330
#> M 0.0877702 -0.8878262 0.6284970 . -1.9715019 0.1433157
#> N -3.7873434 -0.4024319 0.3726285 . 0.6403271 -1.1068000
chunkdim(M3)
#> [1] 5 5
library(h5vcData)
tally_file <- system.file("extdata", "example.tally.hfs5",
package="h5vcData")
h5ls(tally_file)
#> group name otype dclass dim
#> 0 / ExampleStudy H5I_GROUP
#> 1 /ExampleStudy 16 H5I_GROUP
#> 2 /ExampleStudy/16 Counts H5I_DATASET INTEGER 12 x 6 x 2 x 90354753
#> 3 /ExampleStudy/16 Coverages H5I_DATASET INTEGER 6 x 2 x 90354753
#> 4 /ExampleStudy/16 Deletions H5I_DATASET INTEGER 6 x 2 x 90354753
#> 5 /ExampleStudy/16 Reference H5I_DATASET INTEGER 90354753
#> 6 /ExampleStudy 22 H5I_GROUP
#> 7 /ExampleStudy/22 Counts H5I_DATASET INTEGER 12 x 6 x 2 x 51304566
#> 8 /ExampleStudy/22 Coverages H5I_DATASET INTEGER 6 x 2 x 51304566
#> 9 /ExampleStudy/22 Deletions H5I_DATASET INTEGER 6 x 2 x 51304566
#> 10 /ExampleStudy/22 Reference H5I_DATASET INTEGER 51304566
cvg0 <- HDF5Array(tally_file, "/ExampleStudy/16/Coverages")
cvg1 <- cvg0[ , , 29000001:29000007]
writeHDF5Array(cvg1, h5file, "cvg1")
#> <6 x 2 x 7> HDF5Array object of type "integer":
#> ,,1
#> [,1] [,2]
#> [1,] 19 40
#> [2,] 24 49
#> ... . .
#> [5,] 15 24
#> [6,] 22 30
#>
#> ...
#>
#> ,,7
#> [,1] [,2]
#> [1,] 18 30
#> [2,] 24 41
#> ... . .
#> [5,] 15 17
#> [6,] 21 18
#>
h5ls(h5file)
#> group name otype dclass dim
#> 0 / .M1_dimnames H5I_GROUP
#> 1 /.M1_dimnames 1 H5I_DATASET STRING 26
#> 2 /.M1_dimnames 2 H5I_DATASET STRING 14
#> 3 / .M2_dimnames H5I_GROUP
#> 4 / .M3_dimnames H5I_GROUP
#> 5 /.M3_dimnames 1 H5I_DATASET STRING 14
#> 6 /.M3_dimnames 2 H5I_DATASET STRING 26
#> 7 / M1 H5I_DATASET FLOAT 26 x 14
#> 8 / M1b H5I_DATASET FLOAT 26 x 14
#> 9 / M2 H5I_DATASET FLOAT 20 x 8
#> 10 / M3 H5I_DATASET FLOAT 14 x 26
#> 11 / cvg1 H5I_DATASET INTEGER 6 x 2 x 7