suppressPackageStartupMessages({
library(pasillaBamSubset)
})
# Path to a bam file with single-end reads
(un1 <- untreated1_chr4())
#> [1] "/Users/runner/work/_temp/Library/pasillaBamSubset/extdata/untreated1_chr4.bam"
This HowTo has been adapted from the list of HowTos provided in the vignette for the GenomicRanges Bioconductor package.
Bioconductor packages used in this document
How to compute read coverage
By “read coverage”, we mean the number of reads that cover a given genomic position. Computing the read coverage generally consists in computing the coverage at each position in the genome. This can be done with the coverage
function from GenomicAlignments. For illustration, we use data from the pasillaBamSubset data package.
Single-end reads can be read with the readGAlignments
function from GenomicAlignments.
suppressPackageStartupMessages({
library(GenomicAlignments)
})
(reads <- readGAlignments(un1))
#> GAlignments object with 204355 alignments and 0 metadata columns:
#> seqnames strand cigar qwidth start end width
#> <Rle> <Rle> <character> <integer> <integer> <integer> <integer>
#> [1] chr4 - 75M 75 892 966 75
#> [2] chr4 - 75M 75 919 993 75
#> [3] chr4 + 75M 75 924 998 75
#> [4] chr4 + 75M 75 936 1010 75
#> [5] chr4 + 75M 75 949 1023 75
#> ... ... ... ... ... ... ... ...
#> [204351] chr4 + 75M 75 1348268 1348342 75
#> [204352] chr4 + 75M 75 1348268 1348342 75
#> [204353] chr4 + 75M 75 1348268 1348342 75
#> [204354] chr4 - 75M 75 1348449 1348523 75
#> [204355] chr4 - 75M 75 1350124 1350198 75
#> njunc
#> <integer>
#> [1] 0
#> [2] 0
#> [3] 0
#> [4] 0
#> [5] 0
#> ... ...
#> [204351] 0
#> [204352] 0
#> [204353] 0
#> [204354] 0
#> [204355] 0
#> -------
#> seqinfo: 8 sequences from an unspecified genome
The coverage can then be calculated using the coverage
function from GenomicAlignments:
(cvg <- coverage(reads))
#> RleList of length 8
#> $chr2L
#> integer-Rle of length 23011544 with 1 run
#> Lengths: 23011544
#> Values : 0
#>
#> $chr2R
#> integer-Rle of length 21146708 with 1 run
#> Lengths: 21146708
#> Values : 0
#>
#> $chr3L
#> integer-Rle of length 24543557 with 1 run
#> Lengths: 24543557
#> Values : 0
#>
#> $chr3R
#> integer-Rle of length 27905053 with 1 run
#> Lengths: 27905053
#> Values : 0
#>
#> $chr4
#> integer-Rle of length 1351857 with 122061 runs
#> Lengths: 891 27 5 12 13 45 5 ... 3 106 75 1600 75 1659
#> Values : 0 1 2 3 4 5 4 ... 6 0 1 0 1 0
#>
#> ...
#> <3 more elements>
We can extract the coverage on a specific chromosome:
cvg$chr4
#> integer-Rle of length 1351857 with 122061 runs
#> Lengths: 891 27 5 12 13 45 5 ... 3 106 75 1600 75 1659
#> Values : 0 1 2 3 4 5 4 ... 6 0 1 0 1 0
We can also perform calculations on the returned object, e.g. to get the average and maximum coverage on chromosome 4:
We can use the slice
function to find the genomic regions where the coverage is greater or equal to a given threshold.
(chr4_highcov <- slice(cvg$chr4, lower = 500))
#> Views on a 1351857-length Rle subject
#>
#> views:
#> start end width
#> [1] 86849 87364 516 [ 525 538 554 580 583 585 589 706 942 ...]
#> [2] 87466 87810 345 [4924 4928 4941 4943 4972 5026 5039 5037 5050 ...]
#> [3] 340791 340798 8 [508 512 506 530 521 519 518 501]
#> [4] 340800 340885 86 [500 505 560 560 565 558 564 559 555 556 565 ...]
#> [5] 348477 348483 7 [503 507 501 524 515 513 512]
#> [6] 348488 348571 84 [554 554 559 552 558 553 549 550 559 560 565 ...]
#> [7] 692512 692530 19 [502 507 508 518 520 522 524 526 547 554 591 ...]
#> [8] 692551 692657 107 [ 530 549 555 635 645 723 725 733 740 ...]
#> [9] 692798 692800 3 [503 500 503]
#> ... ... ... ... ...
#> [34] 1054306 1054306 1 [502]
#> [35] 1054349 1054349 1 [501]
#> [36] 1054355 1054444 90 [510 521 525 532 532 539 549 555 557 560 563 ...]
#> [37] 1054448 1054476 29 [502 507 516 517 508 517 525 528 532 520 518 ...]
#> [38] 1054479 1054482 4 [504 503 506 507]
#> [39] 1054509 1054509 1 [500]
#> [40] 1054511 1054511 1 [502]
#> [41] 1054521 1054623 103 [529 521 529 530 524 525 547 540 536 533 529 ...]
#> [42] 1054653 1054717 65 [520 519 516 528 526 585 591 589 584 595 619 ...]
The weight of a given such region can be defined as the number of aligned nucleotides that belong to the region (i.e. the area under the coverage curve). It can be obtained with sum
:
sum(chr4_highcov)
#> [1] 1726347 1300700 4115 52301 3575 51233 10382 95103 1506
#> [10] 500 2051 500 5834 10382 92163 500 88678 1512
#> [19] 500 11518 14514 5915 3598 7821 511 508 503
#> [28] 500 1547 8961 43426 22842 503 502 501 51881
#> [37] 15116 2020 500 502 67010 40496
Note that coverage
and slice
are generic functions with methods for different types of objects. See ?coverage
or ?slice
for more information.
Further reading
- The GenomicAlignments vignette.
Session info
Click to display session info
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Ventura 13.7.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: UTC
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] GenomicAlignments_1.45.0 Rsamtools_2.25.0
#> [3] Biostrings_2.77.1 XVector_0.49.0
#> [5] SummarizedExperiment_1.39.0 Biobase_2.69.0
#> [7] MatrixGenerics_1.21.0 matrixStats_1.5.0
#> [9] GenomicRanges_1.61.0 GenomeInfoDb_1.45.3
#> [11] IRanges_2.43.0 S4Vectors_0.47.0
#> [13] BiocGenerics_0.55.0 generics_0.1.4
#> [15] pasillaBamSubset_0.47.0 BiocStyle_2.37.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.7-3 jsonlite_2.0.0 compiler_4.5.1
#> [4] BiocManager_1.30.25 crayon_1.5.3 bitops_1.0-9
#> [7] parallel_4.5.1 BiocParallel_1.43.2 yaml_2.3.10
#> [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
#> [13] S4Arrays_1.9.0 knitr_1.50 DelayedArray_0.35.1
#> [16] rlang_1.1.6 xfun_0.52 SparseArray_1.9.0
#> [19] cli_3.6.5 digest_0.6.37 grid_4.5.1
#> [22] evaluate_1.0.3 codetools_0.2-20 abind_1.4-8
#> [25] rmarkdown_2.29 httr_1.4.7 tools_4.5.1
#> [28] htmltools_0.5.8.1 UCSC.utils_1.5.0