Read/write a sparse matrix from/to a CSV file
readSparseCSV.Rd
Read/write a sparse matrix from/to a CSV (comma-separated values) file.
Usage
writeSparseCSV(x, filepath, sep=",", transpose=FALSE, write.zeros=FALSE,
               chunknrow=250)
readSparseCSV(filepath, sep=",", transpose=FALSE)
Arguments
- x
A matrix-like object, typically sparse. IMPORTANT: The object must have rownames and colnames! These will be written to the file.
Another requirement is that the object must be subsettable. More precisely, it must support 2D-style subsetting of the kind x[i, ] and x[ , j], where i and j are integer vectors of valid row and column indices.
- filepath
The path (as a single string) to the file to write the matrix-like object to, or to read it from. Compressed files are supported.
If filepath is "", writeSparseCSV() will write the data to the standard output connection.
Note that filepath can also be a connection.
- sep
The field separator character. Values on each line of the file are separated by this character.
- transpose
TRUE or FALSE. By default, rows in the matrix-like object correspond to lines in the CSV file. Set transpose to TRUE to transpose the matrix-like object on-the-fly, that is, to have its columns written to or read from the lines in the CSV file.
Note that using transpose=TRUE is semantically equivalent to calling t() on the object before writing it or after reading it, but it will tend to be more efficient. Also, it will work even if x does not support t() (not all matrix-like objects are guaranteed to be transposable).
- write.zeros
TRUE or FALSE. By default, the zero values in x are not written to the file. Set write.zeros to TRUE to write them.
- chunknrow
writeSparseCSV() uses a block-processing strategy to try to speed things up. By default, blocks of 250 rows (or columns if transpose=TRUE) are used. In our experience, increasing this (e.g. to 500 or more) generally does not produce significant benefits while it does increase memory usage, so change it with care.
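The block-processing idea behind chunknrow, and the reason x must support x[i, ]-style subsetting, can be illustrated with a small base R sketch. The helper sparse_csv_lines_sketch() below is hypothetical: it is not part of SparseArray and is not how writeSparseCSV() is implemented. The empty leading field on the header line is likewise an assumption of the sketch. It only shows that rows can be visited one block at a time and that the block size does not change the lines produced.

```r
## Hypothetical helper for illustration only; NOT part of SparseArray and
## not its implementation. Returns the CSV lines as a character vector.
sparse_csv_lines_sketch <- function(x, sep=",", write.zeros=FALSE,
                                    chunknrow=250L)
{
    stopifnot(!is.null(rownames(x)), !is.null(colnames(x)))
    ## Header line. The empty leading field is an assumption of this sketch.
    out <- paste(c("", colnames(x)), collapse=sep)
    ## Split the row indices into consecutive blocks of at most
    ## 'chunknrow' rows.
    row_idx <- seq_len(nrow(x))
    blocks <- split(row_idx, ceiling(row_idx / chunknrow))
    for (i in blocks) {
        ## Only the current block is realized as an ordinary matrix, which
        ## is why 'x' needs to support x[i, ]-style subsetting.
        block <- as.matrix(x[i, , drop=FALSE])
        vals <- as.integer(block)
        fields <- matrix(as.character(vals), nrow=nrow(block))
        if (!write.zeros)
            fields[which(vals == 0L)] <- ""  # blank out the zeros
        block_lines <- apply(cbind(rownames(block), fields), 1L,
                             paste, collapse=sep)
        out <- c(out, block_lines)
    }
    unname(out)
}

## Toy matrix with the same content as 'm0' in the Examples section:
m <- matrix(0L, nrow=6, ncol=4, dimnames=list(LETTERS[1:6], letters[1:4]))
m[c(1:2, 8, 10, 15:17, 24)] <- (1:8) * 10L

lines_small_blocks <- sparse_csv_lines_sketch(m, chunknrow=4L)
lines_one_block <- sparse_csv_lines_sketch(m, chunknrow=250L)
## The block size affects how many rows are realized at once, not the result:
stopifnot(identical(lines_small_blocks, lines_one_block))
writeLines(lines_small_blocks)
```

In this sketch a larger chunknrow means a larger dense block is held in memory at each step, which matches the trade-off described for the real argument.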
Value
writeSparseCSV returns an invisible NULL.
readSparseCSV returns a SparseMatrix object of class SVT_SparseMatrix.
See also
SparseArray objects.
dgCMatrix-class in the Matrix package.
Examples
## ---------------------------------------------------------------------
## writeSparseCSV()
## ---------------------------------------------------------------------
## Prepare toy matrix 'm0':
rownames0 <- LETTERS[1:6]
colnames0 <- letters[1:4]
m0 <- matrix(0L, nrow=length(rownames0), ncol=length(colnames0),
             dimnames=list(rownames0, colnames0))
m0[c(1:2, 8, 10, 15:17, 24)] <- (1:8)*10L
m0
#> a b c d
#> A 10 0 0 0
#> B 20 30 0 0
#> C 0 0 50 0
#> D 0 40 60 0
#> E 0 0 70 0
#> F 0 0 0 80
## writeSparseCSV():
writeSparseCSV(m0, filepath="", sep="\t")
#>   a   b   c   d
#> A 10
#> B 20  30
#> C         50
#> D     40  60
#> E         70
#> F             80
writeSparseCSV(m0, filepath="", sep="\t", write.zeros=TRUE)
#> a b c d
#> A 10 0 0 0
#> B 20 30 0 0
#> C 0 0 50 0
#> D 0 40 60 0
#> E 0 0 70 0
#> F 0 0 0 80
writeSparseCSV(m0, filepath="", sep="\t", transpose=TRUE)
#>   A   B   C   D   E   F
#> a 10  20
#> b     30      40
#> c         50  60  70
#> d                     80
## Note that writeSparseCSV() will automatically (and silently) coerce
## non-integer values to integer by passing them thru as.integer().
## Example where type(x) is "double":
m1 <- m0 * runif(length(m0))
m1
#> a b c d
#> A 8.244634 0.00000 0.000000 0.00000
#> B 16.774056 27.65383 0.000000 0.00000
#> C 0.000000 0.00000 3.269299 0.00000
#> D 0.000000 19.89808 22.160415 0.00000
#> E 0.000000 0.00000 27.650126 0.00000
#> F 0.000000 0.00000 0.000000 52.67919
type(m1)
#> [1] "double"
writeSparseCSV(m1, filepath="", sep="\t")
#>   a   b   c   d
#> A 8
#> B 16  27
#> C         3
#> D     19  22
#> E         27
#> F             52
## Example where type(x) is "logical":
writeSparseCSV(m0 != 0, filepath="", sep="\t")
#>   a   b   c   d
#> A 1
#> B 1   1
#> C         1
#> D     1   1
#> E         1
#> F             1
## Example where type(x) is "raw":
m2 <- m0
type(m2) <- "raw"
m2
#> a b c d
#> A 0a 00 00 00
#> B 14 1e 00 00
#> C 00 00 32 00
#> D 00 28 3c 00
#> E 00 00 46 00
#> F 00 00 00 50
writeSparseCSV(m2, filepath="", sep="\t")
#>   a   b   c   d
#> A 10
#> B 20  30
#> C         50
#> D     40  60
#> E         70
#> F             80
## ---------------------------------------------------------------------
## readSparseCSV()
## ---------------------------------------------------------------------
csv_file <- tempfile()
writeSparseCSV(m0, csv_file)
svt1 <- readSparseCSV(csv_file)
svt1
#> <6 x 4 SparseMatrix> of type "integer" [nzcount=8 (33%)]:
#> a b c d
#> A 10 0 0 0
#> B 20 30 0 0
#> C 0 0 50 0
#> D 0 40 60 0
#> E 0 0 70 0
#> F 0 0 0 80
svt2 <- readSparseCSV(csv_file, transpose=TRUE)
svt2
#> <4 x 6 SparseMatrix> of type "integer" [nzcount=8 (33%)]:
#> A B C D E F
#> a 10 20 0 0 0 0
#> b 0 30 0 40 0 0
#> c 0 0 50 60 70 0
#> d 0 0 0 0 0 80
## If you need the sparse data as a dgCMatrix object, just coerce the
## returned object:
as(svt1, "dgCMatrix")
#> 6 x 4 sparse Matrix of class "dgCMatrix"
#> a b c d
#> A 10 . . .
#> B 20 30 . .
#> C . . 50 .
#> D . 40 60 .
#> E . . 70 .
#> F . . . 80
as(svt2, "dgCMatrix")
#> 4 x 6 sparse Matrix of class "dgCMatrix"
#> A B C D E F
#> a 10 20 . . . .
#> b . 30 . 40 . .
#> c . . 50 60 70 .
#> d . . . . . 80
## Sanity checks:
stopifnot(identical(m0, as.matrix(svt1)))
stopifnot(identical(t(m0), as.matrix(svt2)))
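
The note in the examples about silent coercion depends on the behavior of base R's as.integer(), which truncates toward zero rather than rounding. The sketch below uses base R only and does not call writeSparseCSV(). The toy matrix m and the choice to round before writing are illustrative assumptions, shown as one option for callers who prefer rounding over truncation.

```r
## as.integer() truncates toward zero; it does not round:
truncated <- as.integer(c(8.244634, 27.65383, -27.65383))
truncated
#> [1]   8  27 -27
## Values strictly between -1 and 1 become 0:
as.integer(c(0.9, -0.9))
#> [1] 0 0
## Logical and raw values map to their integer codes:
as.integer(c(TRUE, FALSE))
#> [1] 1 0
as.integer(as.raw(c(10, 80)))
#> [1] 10 80

## One option if rounding is preferred: round and convert explicitly
## before writing. 'm' is a made-up toy matrix.
m <- matrix(c(0.4, 1.6, 0, 2.7), nrow=2,
            dimnames=list(c("A", "B"), c("a", "b")))
m_rounded <- round(m)
storage.mode(m_rounded) <- "integer"  # keeps dim and dimnames
m_rounded
```

Because the fractional part is dropped in either case, non-integer data will not survive a write/read round trip unchanged.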