Create simplified representation of ragged assay data. — assay-functions • RaggedExperiment

These methods transform assay() from the default (i.e., sparseAssay()) representation to various forms of more dense representation. compactAssay() collapses identical ranges across samples into a single row. disjoinAssay() creates disjoint (non-overlapping) regions, simplifies values within each sample in a user-specified manner, and returns a matrix of disjoint regions x samples.

qreduceAssay() transforms assay() from the default (i.e., sparseAssay()) representation to a reduced representation that summarizes, for each range in query, the original ranges overlapping it. The reduction in each cell can be tailored to individual needs using the simplifyReduce functional argument.

Usage

sparseAssay(
  x,
  i = 1,
  withDimnames = TRUE,
  background = NA_integer_,
  sparse = FALSE
)

compactAssay(
  x,
  i = 1,
  withDimnames = TRUE,
  background = NA_integer_,
  sparse = FALSE
)

disjoinAssay(
  x,
  simplifyDisjoin,
  i = 1,
  withDimnames = TRUE,
  background = NA_integer_
)

qreduceAssay(
  x,
  query,
  simplifyReduce,
  i = 1,
  withDimnames = TRUE,
  background = NA_integer_
)

Arguments

x

A RaggedExperiment object

i

integer(1) or character(1) name of assay to be transformed.

withDimnames

logical(1) include dimnames on the returned matrix. When there are no explicit rownames, these are manufactured with as.character(rowRanges(x)); rownames are always manufactured for compactAssay() and disjoinAssay().

background

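A value (default NA) used to fill cells of the matrix returned by the *Assay operations that have no assay value, i.e., where a sample has no range corresponding to that row.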

sparse

logical(1) whether to return a sparseMatrix representation

simplifyDisjoin

A function / functional operating on a *List, where the elements of the list are all within-sample assay values from ranges overlapping each disjoint range. For instance, with simplifyDisjoin = mean and ranges characterized by integer-valued scores, the entries are calculated as


                    a
    original: |-----------|
                        b
                   |----------|

                a    a, b   b
    disjoint: |----|------|---|

    values <- IntegerList(a, c(a, b), b)
    simplifyDisjoin(values)
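
As a concrete sketch of the diagram above, assume hypothetical scores a = 3 and b = 4 (illustrative values, not taken from the package). Because mean() is vectorized over the elements of an IntegerList, a single call yields one summary value per disjoint range:

```r
library(IRanges)

## hypothetical scores for the two original ranges in the diagram
a <- 3L
b <- 4L

## one list element per disjoint range: a only, a and b, b only
values <- IntegerList(a, c(a, b), b)

## mean() is applied element-wise, returning one value per disjoint range
means <- mean(values)
means
#> [1] 3.0 3.5 4.0
```

Any function that accepts a *List and returns one value per list element should be usable in the same way.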

query

GRanges providing regions over which reduction is to occur.

simplifyReduce

A function / functional accepting arguments score, range, and qrange:

score A *List, where each list element corresponds to a cell in the matrix to be returned by qreduceAssay. Vector elements correspond to ranges overlapping query. The *List objects support many vectorized mathematical operations, so simplifyReduce can be implemented efficiently.
range A GRangesList instance, 'parallel' to score. Each element of the list corresponds to a cell in the matrix to be returned by qreduceAssay. Each range in the element corresponds to the range for which the score element applies.
qrange A GRanges instance with the same length as unlist(score), providing the query range window to which the corresponding scores apply.
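
A minimal sketch of a simplifyReduce function that relies on the vectorized *List operations mentioned above. The helper name maxscore is hypothetical; it ignores its range and qrange arguments and returns the largest score overlapping each query range. The object re is a small two-sample RaggedExperiment built only for illustration:

```r
library(RaggedExperiment)

## small illustrative object: two samples with integer-valued scores
re <- RaggedExperiment(GRangesList(
    GRanges(c(A = "chr1:1-10:-", B = "chr1:8-14:-", C = "chr2:15-18:+"),
        score = 3:5),
    GRanges(c(D = "chr1:1-10:-", E = "chr2:11-18:+"), score = 1:2)
), colData = DataFrame(id = 1:2))

query <- GRanges(c("chr1:1-14:-", "chr2:11-18:+"))

## hypothetical helper: max() on a *List returns one value per element,
## i.e., one value per cell of the returned matrix
maxscore <- function(score, range, qrange)
    max(score)

res <- qreduceAssay(re, query, maxscore)
res
```

In the first sample, ranges A (score 3) and B (score 4) both overlap the first query range, so that cell should hold 4.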

Value

sparseAssay(): A matrix() with dimensions dim(x). Elements contain the assay value for the ith range and jth sample. Use 'sparse=TRUE' to obtain a sparseMatrix assay representation.

compactAssay(): A matrix in which ranges that are identical across samples are placed in the same row. Overlapping but non-identical (non-disjoint) ranges are NOT collapsed. Use 'sparse=TRUE' to obtain a sparseMatrix assay representation.

disjoinAssay(): A matrix with the number of rows equal to the number of disjoint ranges across all samples. Elements of the matrix are summarized by applying simplifyDisjoin() to the assay values of overlapping ranges.

qreduceAssay(): A matrix() with dimensions length(query) x ncol(x). Elements contain assay values for the ith query range and jth sample, summarized according to the function simplifyReduce.
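
The differences between the return values can be sketched by comparing dimensions on a small object. The object re and the variable names below are illustrative only. Ranges A and D are identical, so compactAssay() is expected to return one row fewer than sparseAssay(), while disjoinAssay() returns one row per disjoint range:

```r
library(RaggedExperiment)

## small illustrative object: two samples, five ranges in total
re <- RaggedExperiment(GRangesList(
    GRanges(c(A = "chr1:1-10:-", B = "chr1:8-14:-", C = "chr2:15-18:+"),
        score = 3:5),
    GRanges(c(D = "chr1:1-10:-", E = "chr2:11-18:+"), score = 1:2)
), colData = DataFrame(id = 1:2))

## one row per original range
sparse_dim <- dim(sparseAssay(re))

## identical ranges (A and D) share a row
compact_dim <- dim(compactAssay(re))

## one row per disjoint range, overlaps summarized by mean()
disjoin_dim <- dim(disjoinAssay(re, simplifyDisjoin = mean))

## fill cells without an assay value with 0 instead of NA
filled <- sparseAssay(re, background = 0L)
```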

Examples

re4 <- RaggedExperiment(GRangesList(
    GRanges(c(A = "chr1:1-10:-", B = "chr1:8-14:-", C = "chr2:15-18:+"),
        score = 3:5),
    GRanges(c(D = "chr1:1-10:-", E = "chr2:11-18:+"), score = 1:2)
), colData = DataFrame(id = 1:2))

query <- GRanges(c("chr1:1-14:-", "chr2:11-18:+"))

weightedmean <- function(scores, ranges, qranges)
{
    ## weighted average score per query range
    ## the weight corresponds to the size of the overlap of each
    ## overlapping subject range with the corresponding query range
    isects <- pintersect(ranges, qranges)
    sum(scores * width(isects)) / sum(width(isects))
}

qreduceAssay(re4, query, weightedmean)
#>                  [,1] [,2]
#> chr1:1-14:-  3.411765    1
#> chr2:11-18:+ 5.000000    2
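
The first cell of this result can be checked by hand. In the first sample, range A (score 3) overlaps the query chr1:1-14 over 10 positions and range B (score 4) over 7 positions, so the weighted mean is (3 * 10 + 4 * 7) / (10 + 7). The variable names below are illustrative:

```r
## scores and overlap widths of ranges A and B with the query chr1:1-14
scores <- c(3, 4)
overlap_widths <- c(10, 7)

## same formula as in weightedmean(), applied to a single cell
cell <- sum(scores * overlap_widths) / sum(overlap_widths)
cell
#> [1] 3.411765
```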

if (FALSE) { # \dontrun{
    ## Extended example: non-silent mutations, summarized by genic
    ## region
    suppressPackageStartupMessages({
        library(TxDb.Hsapiens.UCSC.hg19.knownGene)
        library(org.Hs.eg.db)
        library(GenomeInfoDb)
        library(MultiAssayExperiment)
        library(curatedTCGAData)
        library(TCGAutils)
    })

    ## TCGA MultiAssayExperiment with RaggedExperiment data
    mae <- curatedTCGAData("ACC", c("RNASeq2GeneNorm", "CNASNP", "Mutation"),
        version = "1.1.38", dry.run = FALSE)

    ## genomic coordinates
    gn <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
    gn <- keepStandardChromosomes(granges(gn), pruning.mode="coarse")
    seqlevelsStyle(gn) <- "NCBI"
    genome(gn)
    gn <- unstrand(gn)

    ## reduce mutations, marking any genomic range with non-silent
    ## mutation as TRUE
    nonsilent <- function(scores, ranges, qranges)
        any(scores != "Silent")
    mre <- mae[["ACC_Mutation-20160128"]]
    seqlevelsStyle(rowRanges(mre)) <- "NCBI"
    ## hack to make genomes match
    genome(mre) <- paste0(correctBuild(unique(genome(mre)), "NCBI"), ".p13")
    mutations <- qreduceAssay(mre, gn, nonsilent, "Variant_Classification")
    genome(mre) <- correctBuild(unique(genome(mre)), "NCBI")

    ## reduce copy number
    re <- mae[["ACC_CNASNP-20160128"]]
    class(re)
    ## [1] "RaggedExperiment"
    seqlevelsStyle(re) <- "NCBI"
    genome(re) <- "GRCh37.p13"
    cn <- qreduceAssay(re, gn, weightedmean, "Segment_Mean")
    genome(re) <- "GRCh37"

    ## ALTERNATIVE
    ##
    ## TCGAutils helper function to convert RaggedExperiment objects to
    ## RangedSummarizedExperiment based on annotated gene ranges
    mae2 <- mae
    mae2[[1L]] <- re
    mae2[[2L]] <- mre
    qreduceTCGA(mae2)
} # }