Skip to contents

This function returns a BAM file representing reads overlapping regions specified either as chromosomal regions or as gencode gene symbols.

Usage

slicing(
  uuid,
  regions,
  symbols,
  destination = file.path(tempdir(), paste0(uuid, ".bam")),
  overwrite = FALSE,
  progress = interactive(),
  token = gdc_token(),
  legacy = FALSE
)

Arguments

uuid

character(1) identifying the BAM file resource

regions

character() vector describing chromosomal regions, e.g., c("chr1", "chr2:10000", "chr3:10000-20000") (all of chromosome 1, chromosome 2 from position 10000 to the end, chromosome 3 from 10000 to 20000).

symbols

character() vector of gencode gene symbols, e.g., c("BRCA1", "PTEN")

destination

character(1) default tempfile() file path for BAM file slice

overwrite

logical(1) default FALSE can destination be overwritten?

progress

logical(1) default interactive() should a progress bar be used?

token

character(1) security token allowing access to restricted data. Almost all BAM data is restricted, so a token is usually required. See https://docs.gdc.cancer.gov/Data/Data_Security/Data_Security/#authentication-tokens.

legacy

logical(1) whether or not to use the "legacy" archive, containing older, non-harmonized data. Slicing of unharmonized legacy BAM files is not supported. See https://docs.gdc.cancer.gov/API/Users_Guide/BAM_Slicing/.

Value

character(1) destination to the downloaded BAM file

Details

This function uses the Genomic Data Commons "slicing" API to get portions of a BAM file specified either using "regions" or using HGNC gene symbols.

Examples

if (FALSE) {
 slicing("df80679e-c4d3-487b-934c-fcc782e5d46e",
        regions="chr17:75000000-76000000",
        token=gdc_token())

# Get 10 BAM files.
bamfiles = files() %>% 
           filter(data_format=='BAM') %>%
           results(size=10) %>% ids()

# Current alignments at the GDC are to GRCh38
library('TxDb.Hsapiens.UCSC.hg38.knownGene')
all_genes = genes(TxDb.Hsapiens.UCSC.hg38.knownGene)

first3genes = all_genes[1:3]
# remove strand info
strand(first3genes) = '*'

# We can get our regions easily now
as.character(first3genes)

# Use parallel downloads to speed processing
library(BiocParallel)
register(MulticoreParam())

fnames = bplapply(bamfiles, slicing, overwrite = TRUE,
                regions=as.character(first3genes))

# 10 BAM files
fnames

library(GenomicAlignments)
lapply(unlist(fnames), readGAlignments)

}