Skip to contents

This HowTo has been adapted from the list of HowTos provided in the vignette for the GenomicRanges Bioconductor package.

Bioconductor packages used in this document

How to read a BAM file in chunks

A large BAM file can be iterated through in chunks, in order to reduce the memory usage, by setting a yieldSize for the BamFile object. For illustration, we use data from the pasillaBamSubset data package.

suppressPackageStartupMessages({
    library(pasillaBamSubset)
    library(Rsamtools)
})
# Path to a bam file with single-end reads
(un1 <- untreated1_chr4())
#> [1] "/Users/runner/work/_temp/Library/pasillaBamSubset/extdata/untreated1_chr4.bam"
bf <- BamFile(un1, yieldSize = 100000)

Iteration through a BAM file requires that the file be opened, repeatedly queried inside a loop, then closed. Repeated calls to GenomicAlignments::readGAlignments without opening the file first result in the same 100000 records returned each time (with a yieldSize of 100000). As an example, let’s calculate the coverage for the bam file above.

suppressPackageStartupMessages({
    library(GenomicAlignments)
})
open(bf)
cvg <- NULL
repeat {
    chunk <- readGAlignments(bf)
    if (length(chunk) == 0L) {
        break
    }
    chunk_cvg <- coverage(chunk)
    if (is.null(cvg)) {
        cvg <- chunk_cvg
    } else {
        cvg <- cvg + chunk_cvg
    }
}
close(bf)
cvg
#> RleList of length 8
#> $chr2L
#> integer-Rle of length 23011544 with 1 run
#>   Lengths: 23011544
#>   Values :        0
#> 
#> $chr2R
#> integer-Rle of length 21146708 with 1 run
#>   Lengths: 21146708
#>   Values :        0
#> 
#> $chr3L
#> integer-Rle of length 24543557 with 1 run
#>   Lengths: 24543557
#>   Values :        0
#> 
#> $chr3R
#> integer-Rle of length 27905053 with 1 run
#>   Lengths: 27905053
#>   Values :        0
#> 
#> $chr4
#> integer-Rle of length 1351857 with 122061 runs
#>   Lengths:  891   27    5   12   13   45    5 ...    3  106   75 1600   75 1659
#>   Values :    0    1    2    3    4    5    4 ...    6    0    1    0    1    0
#> 
#> ...
#> <3 more elements>

Session info

Click to display session info
sessionInfo()
#> R Under development (unstable) (2025-02-15 r87725)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.7.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: UTC
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] GenomicAlignments_1.43.0    SummarizedExperiment_1.37.0
#>  [3] Biobase_2.67.0              MatrixGenerics_1.19.1      
#>  [5] matrixStats_1.5.0           Rsamtools_2.23.1           
#>  [7] Biostrings_2.75.3           XVector_0.47.2             
#>  [9] GenomicRanges_1.59.1        GenomeInfoDb_1.43.4        
#> [11] IRanges_2.41.3              S4Vectors_0.45.4           
#> [13] BiocGenerics_0.53.6         generics_0.1.3             
#> [15] pasillaBamSubset_0.45.0     BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] Matrix_1.7-2            jsonlite_1.8.9          compiler_4.5.0         
#>  [4] BiocManager_1.30.25     crayon_1.5.3            bitops_1.0-9           
#>  [7] parallel_4.5.0          BiocParallel_1.41.0     yaml_2.3.10            
#> [10] fastmap_1.2.0           lattice_0.22-6          R6_2.6.1               
#> [13] S4Arrays_1.7.3          knitr_1.49              DelayedArray_0.33.6    
#> [16] GenomeInfoDbData_1.2.13 rlang_1.1.5             xfun_0.50              
#> [19] SparseArray_1.7.5       cli_3.6.4               grid_4.5.0             
#> [22] digest_0.6.37           evaluate_1.0.3          codetools_0.2-20       
#> [25] abind_1.4-8             rmarkdown_2.29          httr_1.4.7             
#> [28] tools_4.5.0             htmltools_0.5.8.1       UCSC.utils_1.3.1