The H5File class provides a formal representation of an HDF5 file (local or remote).

Usage

## Constructor function:
H5File(filepath, s3=FALSE, s3credentials=NULL, .no_rhdf5_h5id=FALSE)

Arguments

filepath

A single string specifying the path or URL to an HDF5 file.

s3

TRUE or FALSE. Should the filepath argument be treated as the URL to a file stored in an Amazon S3 bucket, rather than the path to a local file?

s3credentials

A list of length 3, providing the credentials for accessing files stored in a private Amazon S3 bucket. See ?H5Pset_fapl_ros3 in the rhdf5 package for more information.

.no_rhdf5_h5id

For internal use only. Don't use.
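
As an illustration of the s3credentials argument, here is a minimal sketch. It assumes that the three list elements are, in this order, the AWS region, the access key ID, and the secret access key, as described in ?H5Pset_fapl_ros3. The element names, the key values, and the bucket URL are placeholders, not real resources, so the call to H5File() is not evaluated.

```r
library(h5mread)

## Placeholder credentials (assumed order: region, access key ID,
## secret access key). Replace with your own values.
s3_cred <- list(
    aws_region="eu-central-1",
    access_key_id="AKIAIOSFODNN7EXAMPLE",
    secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)

## Hypothetical URL to a file in a private bucket.
private_S3_url <-
    "https://my-private-bucket.s3.eu-central-1.amazonaws.com/my_file.h5"

## Not run: requires a real private bucket and valid credentials.
if (FALSE) {
    h5file <- H5File(private_S3_url, s3=TRUE, s3credentials=s3_cred)
    h5ls(h5file)
}
```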

Details

IMPORTANT NOTE ABOUT H5File OBJECTS AND PARALLEL EVALUATION

The short story is that H5File objects cannot be used in the context of parallel evaluation at the moment.

Here is why:

H5File objects contain an identifier to an open connection to the HDF5 file. This identifier becomes invalid in the following two situations:

  • After serialization/deserialization, that is, after loading a serialized H5File object with readRDS() or load().

  • In the context of parallel evaluation, when using the SnowParam parallelization backend. This is because, unlike the MulticoreParam backend, which uses a system fork, the SnowParam backend uses serialization/deserialization to transmit the object to the workers.

In both cases, the connection to the file is lost and any attempt to read data from the H5File object will fail. Note that this also applies to any H5File object that gets serialized indirectly, that is, as part of a bigger object. For example, if an HDF5Array object was constructed from an H5File object, then it contains the H5File object, and therefore blockApply(..., BPPARAM=SnowParam(4)) cannot be used on it.
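
To make the serialization case concrete, the following sketch round-trips an H5File object through saveRDS()/readRDS() and then re-opens the file from its path. It uses the test file shipped with this package and assumes that path() still returns the original file path on the deserialized object. The variable names are illustrative only.

```r
library(h5mread)

test_h5 <- system.file("extdata", "test.h5", package="h5mread")
h5file <- H5File(test_h5)

## Serialize the object to a temporary file, then load it back.
rds_path <- tempfile(fileext=".rds")
saveRDS(h5file, rds_path)
h5file_copy <- readRDS(rds_path)

## The connection identifier in 'h5file_copy' is no longer valid, so this
## read is expected to fail. try() lets the script continue either way.
try(h5mread(h5file_copy, "m1"))

## Assumption: the file path survives serialization, so a fresh H5File
## object (with a new connection) can be created from it.
h5file_new <- H5File(path(h5file_copy))
m1 <- h5mread(h5file_new, "m1")
```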

Furthermore, even though an H5File object sometimes seems to work fine with the MulticoreParam parallelization backend, this is highly unreliable and must be avoided.
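
One possible way to sidestep this limitation, sketched below, is to send the file path (a plain string) to the workers instead of an H5File object, so that nothing holding an open connection gets serialized. This relies on h5mread() accepting a file path as its first argument. The helper name row_sum is hypothetical, and the sketch only covers local files.

```r
library(h5mread)
library(BiocParallel)

test_h5 <- system.file("extdata", "test.h5", package="h5mread")

## Hypothetical helper: receives the file path, not an H5File object.
## Each worker opens the file itself when it calls h5mread().
row_sum <- function(i, filepath, name)
    sum(h5mread::h5mread(filepath, name, list(i, NULL)))

## Sum each of the first 5 rows of dataset 'm2' on 2 SnowParam workers.
res <- bplapply(1:5, row_sum, test_h5, "m2", BPPARAM=SnowParam(2))
```

Each call re-opens the file on the worker, which adds some overhead per task but avoids sharing a connection identifier between processes.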

Value

An H5File object.

See also

  • H5Pset_fapl_ros3 in the rhdf5 package for detailed information about how to pass your S3 credentials to the s3credentials argument.

  • The HDF5Array class defined in the HDF5Array package for representing and operating on a conventional (a.k.a. dense) HDF5 dataset.

  • The H5SparseMatrix class defined in the HDF5Array package for representing and operating on an HDF5 sparse matrix.

  • The H5ADMatrix class defined in the HDF5Array package for representing and operating on the central matrix of an h5ad file, or any matrix in its /layers group.

  • The TENxMatrix class defined in the HDF5Array package for representing and operating on a 10x Genomics dataset.

  • The h5mread function in this package (h5mread) that is used internally by HDF5Array, TENxMatrix, and H5ADMatrix objects, for (almost) all their data reading needs.

  • h5ls to list the content of an HDF5 file.

  • bplapply, MulticoreParam, and SnowParam, in the BiocParallel package.

Examples

## ---------------------------------------------------------------------
## A. BASIC USAGE
## ---------------------------------------------------------------------

## With a local file:
test_h5 <- system.file("extdata", "test.h5", package="h5mread")
h5file1 <- H5File(test_h5)
h5ls(h5file1)
#>           group         name       otype  dclass          dim
#> 0             / .m2_dimnames   H5I_GROUP                     
#> 1 /.m2_dimnames            1 H5I_DATASET  STRING         4000
#> 2 /.m2_dimnames            2 H5I_DATASET  STRING           90
#> 3             /           a3 H5I_DATASET INTEGER 180 x 75 x 4
#> 4             /           m1 H5I_DATASET INTEGER       12 x 5
#> 5             /           m2 H5I_DATASET   FLOAT    4000 x 90
#> 6             /           m4 H5I_DATASET INTEGER    28 x 4000
#> 7             /       rwords H5I_DATASET  STRING        30000
path(h5file1)
#> [1] "/tmp/RtmpFXHjrt/temp_libpathc9d33e3a6bd/h5mread/extdata/test.h5"

h5mread(h5file1, "m2", list(1:10, 1:6))
#>             [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
#>  [1,] -2.1242248         NA        NaN -2.5779525  1.9532461  4.9112339
#>  [2,]  2.8830514        Inf       -Inf -0.1996812 -1.2167090 -1.9776935
#>  [3,] -0.9102308  0.9364751  2.4219044  3.4048459  4.6469619 -0.6624099
#>  [4,]  3.8301740  3.0719664 -2.7439101  4.9207697  0.2735301 -3.3947909
#>  [5,]  4.4046728 -2.0594922  2.4889107 -2.5576531 -4.5567993  3.2302671
#>  [6,] -4.5444350 -3.5891478  3.2486851  1.2303872  1.7265274 -2.9190945
#>  [7,]  0.2810549  3.8862110 -4.0045662 -0.4121020  1.1311564 -2.1503403
#>  [8,]  3.9241904 -4.9171500 -2.2412863  3.4837234 -0.7017639  0.4620889
#>  [9,]  0.5143501  0.6912065 -4.7862322  2.2703366 -1.0607908  0.8904540
#> [10,] -0.4338526  4.6755019  0.9624297 -3.0869540  2.6021692 -0.5774682
get_h5mread_returned_type(h5file1, "m2")
#> [1] "double"

## With a file stored in an Amazon S3 bucket:
if (Sys.info()[["sysname"]] != "Darwin") {
  public_S3_url <-
   "https://rhdf5-public.s3.eu-central-1.amazonaws.com/rhdf5ex_t_float_3d.h5"
  h5file2 <- H5File(public_S3_url, s3=TRUE)
  h5ls(h5file2)

  h5mread(h5file2, "a1")
  get_h5mread_returned_type(h5file2, "a1")
}
#> [1] "double"

## ---------------------------------------------------------------------
## B. H5File OBJECTS AND PARALLEL EVALUATION
## ---------------------------------------------------------------------
## H5File objects cannot be used in the context of parallel evaluation
## at the moment!

library(BiocParallel)

FUN1 <- function(i, h5file, name)
    sum(h5mread::h5mread(h5file, name, list(i, NULL)))

FUN2 <- function(i, h5file, name)
    sum(h5mread::h5mread(h5file, name, list(i, NULL, NULL)))

## With the SnowParam parallelization backend, the H5File object
## does NOT work on the workers:
if (FALSE) { # \dontrun{
## ERROR!
res1 <- bplapply(1:150, FUN1, h5file1, "m2", BPPARAM=SnowParam(3))
## ERROR!
res2 <- bplapply(1:5, FUN2, h5file2, "a1", BPPARAM=SnowParam(3))
} # }

## With the MulticoreParam parallelization backend, the H5File object
## might seem to work on the workers. However this is highly unreliable
## and must be avoided:
if (FALSE) { # \dontrun{
if (.Platform$OS.type != "windows") {
  ## UNRELIABLE!
  res1 <- bplapply(1:150, FUN1, h5file1, "m2", BPPARAM=MulticoreParam(3))
  ## UNRELIABLE!
  res2 <- bplapply(1:5, FUN2, h5file2, "a1", BPPARAM=MulticoreParam(3))
}
} # }