Instructors and contact information
- Haibo Liu (Department of Animal Science, Iowa State University, Ames, IA, 50010, USA. haibol@iastate.edu)
- Jianhong Ou (Department of Cell Biology, Duke University Medical Center, Durham, NC, 27710, USA. Jianhong.ou@duke.edu)
- Lihua Julie Zhu (Corresponding author, Department of Molecular, Cell and Cancer Biology, Program in Molecular Medicine, Program in Bioinformatics and Integrative Biology, Worcester, MA, 01655, USA. Julie.Zhu@umassmed.edu)
Introduction to the ATAC-seq technology
In eukaryotes, nucleosomes are basic DNA packaging units, each of which consists of a nucleosome core particle (NPC), linker DNA and linker histone H1. The NPC is composed of ~147 bp of DNA wrapped around a histone core octamer, spaced from adjacent NPCs by a linker DNA of ~20-90 bp. Linker histones binds additional 20 bp of linker DNA at its entry/exit points to a NPC to stabilize nucleosome conformation and higher-order chromatin assembly. Nucleosomes can be hierarchically assembled into higher-order structures, eventually into chromatin or even chromosomes1,2 (Figure 1, adapted from Nature Education 1(1):26). In cells, different genomic regions are packaged with different accessibility to transcriptional machinery. Most notably, promoters and enhancers of actively transcribed genes are devoid of histone interaction, which are called “open” chromatin regions (Figure 2, adapted from Wang et al3). Thus, the interplay between histones and DNA serves as an important layer for controlling gene expression4,5. Therefore, it is important to determine the chromatin accessibility to better understand gene expression regulation in cells.
In recent years, a few methods have been developed to profile genome-wide chromatin accessibility (Figure 3, adapted from Tsompana Tsompana and Buck6) For review, see Tsompana and Buck6. Among these methods, ATAC-seq (an assay for Transposase-Accessible Chromatin using sequencing) is a rapid and sensitive method for profiling chromatin accessibility7. Compared to other methods, such as MNase-seq, FAIRE-seq and DNase-seq, ATAC-seq allows comparable or even higher signal-to-noise ratio, but requires much less amount of the biological materials and time to process. Briefly, hyperactive Tn5 transposases preloaded with adapters are first added to simultaneously tag and fragment open chromatin in nuclei (500~50,000) isolated from fresh or cryopreserved tissues/cells at nearly native states, a process called tagmentation. The tagged DNA fragments are then amplified and simultaneously sequencing adapters and indices compatible to Illumina sequencing platforms are added by using PCR with optimized cycles (Figure 4, adapted from Buenrostro et al7).
Since its first development, ATAC-seq has been used for quite a few purposes7,8:
- Infer nucleosome positioning
- Profile open chromatin regions
- Identify transcription factor (TF) footprints
- Infer gene transcriptional regulation
- Determine cell state/identity in combination with single cell technologies
Preprocessing of ATAC-seq data
Before using the ATACSeqQC package to assess the quality of ATAC-seq data, several preprocessing steps are needed as follows. These steps are usually performed in UNIX environments using open source software other than R. Detailed information about how to preprocess ATAC-seq data is provided in “vignettes/Preprocessing.scripts.for.BioC2019.tutorial.txt”.
- Check quality of raw reads using FASTQC.
- Generate an interactive summary report as a HTML file using MultiQC11.
- Trim sequencing adapters using Trimmomatic12 or Trim Galore{!}.
- Map the reads to the reference genome of choice using Bowtie/Bowtie213,14 or BWA-mem15.
- Convert and sort alignment files using SAMtools16 or sambamba17.
- Filter aligned reads
- Remove reads derived from organelles using SAMtools16
- Remove duplicates using picard tools
- Remove mapping artifacts (fragments < 38 bp6 and fragments > 2 kb, reads with un-coordinated mapping) using custom scripts
- Remove reads derived from organelles using SAMtools16
ATACseqQC workflow
In this workshop we will start with preprocessed alignment files in the BAM formats and use the R/BioConductor package ATACseqQC to perform some representative steps of post-alignment QC.
## prepare the example BAM files for importing
bamFile <- system.file("extdata", "GL1.bam",
package="ATACseqQC", mustWork=TRUE)
bamFileLabels <- gsub(".bam", "", basename(bamFile))
Assessing ATAC-seq read mapping status
FASTQC can generate a summary of the quality of raw reads, but it can’t reveal the quality of post-aligned reads or mapping status, such as mapping rate, duplicate rate, genome-wide distribution, mapping quality, and contamination. The bamQC
function can serve these needs, which requires sorted BAM files with duplicate reads marked as input.
## bamQC
bamQC(bamFile, outPath=NULL)
## $totalQNAMEs
## [1] 44357
##
## $duplicateRate
## [1] 0
##
## $mitochondriaRate
## [1] 0
##
## $properPairRate
## [1] 0.9815136
##
## $unmappedRate
## [1] 0
##
## $hasUnmappedMateRate
## [1] 0
##
## $notPassingQualityControlsRate
## [1] 0
##
## $nonRedundantFraction
## [1] 0.9398066
##
## $PCRbottleneckCoefficient_1
## [1] 0.9689694
##
## $PCRbottleneckCoefficient_2
## [1] 31.22622
##
## $MAPQ
## Var1 Freq
## 1 21 44
## 2 22 12
## 3 23 536
## 4 24 606
## 5 25 14
## 6 26 10
## 7 27 4
## 8 30 1682
## 9 31 780
## 10 32 238
## 11 34 244
## 12 35 118
## 13 36 44
## 14 37 62
## 15 38 64
## 16 39 34
## 17 40 2076
## 18 42 82146
##
## $idxstats
## seqnames seqlength mapped unmapped
## 1 chr1 249250621 88714 0
## 2 chr2 243199373 0 0
## 3 chr3 198022430 0 0
## 4 chr4 191154276 0 0
## 5 chr5 180915260 0 0
## 6 chr6 171115067 0 0
## 7 chr7 159138663 0 0
## 8 chr8 146364022 0 0
## 9 chr9 141213431 0 0
## 10 chr10 135534747 0 0
## 11 chr11 135006516 0 0
## 12 chr12 133851895 0 0
## 13 chr13 115169878 0 0
## 14 chr14 107349540 0 0
## 15 chr15 102531392 0 0
## 16 chr16 90354753 0 0
## 17 chr17 81195210 0 0
## 18 chr18 78077248 0 0
## 19 chr19 59128983 0 0
## 20 chr20 63025520 0 0
## 21 chr21 48129895 0 0
## 22 chr22 51304566 0 0
## 23 chrX 155270560 0 0
## 24 chrY 59373566 0 0
## 25 chrM 16571 0 0
Assessing insert size distribution
It is known that in ATAC-seq experiments, tagmentation of Tn5 transposases produce signature size pattern of fragments derived from nucleosome-free regions, mononucleosome, dinucleosome, trinucleosome and longer oligonucleosome from other open chromatin regions (Figure 6, adapted from Li et al18) . A typical distribution of insert fragment size is shown in (Figure 7, adapted from Buenrostro et al19. The function
fragSizeDist
in the ATACseqQC package can be used to generate such a distribution plot. Please note the pre-filtered BAM files need to be used to get an unbiased distribution of insert fragment size in the ATAC-seq library.fragSize <- fragSizeDist(bamFile, bamFileLabels)
Shifting aligned reads
Tagmentation of Tn5 transposases produce 5’ overhang of 9 base long, the coordinates of reads mapping to the positive and negative strands need to be shifted by + 4 and - 5, respectively7,20. The functions shiftGAlignmentsList
and shiftReads
in the ATACseqQC package can be used for this purpose.
## bamfile tags to be read in
possibleTag <- combn(LETTERS, 2)
possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]),
paste0(possibleTag[2, ], possibleTag[1, ]))
bamTop100 <- scanBam(BamFile(bamFile, yieldSize = 100),
param = ScanBamParam(tag=possibleTag))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
## files will be output into outPath
outPath <- "splitBam"
if (dir.exists(outPath))
{
unlink(outPath, recursive = TRUE, force = TRUE)
}
dir.create(outPath)
## shift the coordinates of 5'ends of alignments in the bam file
seqlev <- "chr1" ## subsample data for quick run
which <- as(seqinfo(Hsapiens)[seqlev], "GRanges")
gal <- readBamFile(bamFile, tag=tags, which=which,
asMates=TRUE, bigFile=TRUE)
shiftedBamFile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamFile)
Splitting BAM files
The shifted reads need to be split into different bins, namely bins for reads from nucleosome-free regions, and regions occupied by mononucleosome, dinucleosomes, and trinucleosomes. Shifted reads that do not fit into any of the three bins could be discarded. The function splitGAlignmentsByCut
has been implemented to meet this demand.
## run program for chromosome 1 only
txs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs <- txs[seqnames(txs) %in% "chr1"]
genome <- Hsapiens
## split the reads into bins for reads derived from nucleosome-free regions, mononucleosome, dinucleosome and trinucleosome. And save the binned alignments into bam files.
objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath)
## list the files generated by splitGAlignmentsByCut.
dir(outPath)
## [1] "dinucleosome.bam" "dinucleosome.bam.bai"
## [3] "inter1.bam" "inter1.bam.bai"
## [5] "inter2.bam" "inter2.bam.bai"
## [7] "inter3.bam" "inter3.bam.bai"
## [9] "mononucleosome.bam" "mononucleosome.bam.bai"
## [11] "NucleosomeFree.bam" "NucleosomeFree.bam.bai"
## [13] "others.bam" "others.bam.bai"
## [15] "shifted.bam" "shifted.bam.bai"
## [17] "trinucleosome.bam" "trinucleosome.bam.bai"
Plotting global signal distribution around transcriptional start sites (TSSs)
Previous studies have shown that transcriptionally active elements, such as promoters and enhancers, are defined by short regions of DNA that are devoid of direct histone interactions. These regions of “open” chromatin are usually occupied by transcription factors that facilitate gene transcription. By contrast, the promoters of genes that are not actively expressed in a given cell type exhibit much tighter association with histones, which prevents transcription factors from activating transcription and contributes to gene repression. Typical nucleosome density distributions around TSSs of actively transcribed genes and active enhancers are shown in Figure 9, adapted from Baldi21. The function
enrichedFragments
is for calculating signals around TSSs using the split BAM objects. And the function featureAlignedHeatmap
is for generating a heatmap showing signal distribution around TSSs. The function matplot
is for summarizing the signal distribution around TSSs using a density plot.bamFiles <- file.path(outPath,
c("NucleosomeFree.bam",
"mononucleosome.bam",
"dinucleosome.bam",
"trinucleosome.bam"))
TSS <- promoters(txs, upstream=0, downstream=1)
TSS <- unique(TSS)
## estimate the library size for normalization
(librarySize <- estLibSize(bamFiles))
## splitBam/NucleosomeFree.bam splitBam/mononucleosome.bam
## 17704 5379
## splitBam/dinucleosome.bam splitBam/trinucleosome.bam
## 4907 923
## calculate the signals around TSSs.
NTILE <- 101
dws <- ups <- 1010
sigs <- enrichedFragments(gal=objs[c("NucleosomeFree",
"mononucleosome",
"dinucleosome",
"trinucleosome")],
TSS=TSS,
librarySize=librarySize,
seqlev=seqlev,
TSS.filter=0.5,
n.tile = NTILE,
upstream = ups,
downstream = dws)
## log2 transformed signals
sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1))
#plot heatmap
featureAlignedHeatmap(sigs.log2, reCenterPeaks(TSS, width=ups+dws),
zeroAt=.5, n.tile=NTILE)
## get signals normalized for nucleosome-free and nucleosome-bound regions.
out <- featureAlignedDistribution(sigs,
reCenterPeaks(TSS, width=ups+dws),
zeroAt=.5, n.tile=NTILE, type="l",
ylab="Averaged coverage")
## rescale the nucleosome-free and nucleosome signals to 0~1
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
out <- apply(out, 2, range01)
matplot(out, type="l", xaxt="n",
xlab="Position (bp)",
ylab="Fraction of signal")
axis(1, at=seq(0, 100, by=10)+1,
labels=c("-1K", seq(-800, 800, by=200), "1K"), las=2)
abline(v=seq(0, 100, by=10)+1, lty=2, col="gray")
Streamlining IGV snapshots showing signal distribution along housekeeping genes
Housekeeping genes are relatively stably expressed across tissues. Signal enrichment is expected in the regulatory regions of housekeeping genes in successful ATAC-seq experiments, which provides valuable insights into the quality of the ATAC-seq library. We have implemented the function IGVSnapshot
is to facilitate automatic visualization of signal distribution along any genomic region of interest. CAUTION: no white spaces are allowed in the full path to BAM files.
source(system.file("extdata", "IGVSnapshot.R", package = "ATACseqQC"))
IGVSnapshot(maxMem="lm", genomeBuild = "hg19",
bamFileFullPathOrURLs = bamFile,
geneNames = c("PQLC2", "MINOS1"))
Assessing footprints of DNA-binding factors
Genomic regions bound by transcription factors/insulators are locally protected from Tn5 transposase tagmentation and the pattern of Tn5 transposase cutting sites around TF-binding sites can be used to infer the footprints of TFs. The function factorFootprints
is for plotting footprints of DNA-binding factors.
## foot prints
CTCF <- query(MotifDb, c("CTCF"))
CTCF <- as.list(CTCF)
print(CTCF[[1]], digits=2)
## 1 2 3 4 5 6 7 8 9 10 11 12
## A 0.10 0.16 0.30 0.072 0.012 0.786 0.024 0.122 0.914 0.012 0.376 0.022
## C 0.36 0.21 0.10 0.826 0.966 0.024 0.620 0.494 0.010 0.008 0.010 0.022
## G 0.12 0.41 0.44 0.050 0.012 0.108 0.336 0.056 0.048 0.976 0.602 0.606
## T 0.42 0.22 0.16 0.052 0.010 0.082 0.020 0.328 0.028 0.004 0.012 0.350
## 13 14 15 16 17 18 19
## A 0.028 0.024 0.096 0.424 0.086 0.12 0.34
## C 0.002 0.016 0.818 0.024 0.532 0.35 0.26
## G 0.962 0.880 0.038 0.522 0.326 0.12 0.32
## T 0.008 0.080 0.048 0.030 0.056 0.41 0.08
sigs <- factorFootprints(shiftedBamFile, pfm=CTCF[[1]],
genome=genome,
min.score="90%", seqlev=seqlev,
upstream=100, downstream=100)
Assessing sequencing depth and library complexity
Although all the above assessments can be used to determine the quality of the available ATAC-seq data, they can’t tell whether the sequencing depth is saturated or not, nor whether the library is complex enough for deeper sequencing. The functions saturationPlot
and estimateLibComplexity
are for these purposes. CAUTION: only BAM files without removing duplicates are informative for estimating library complexity.
estimateLibComplexity(readsDupFreq(bamFile))
Assessing similarity of replicates
If multiple ATAC-seq assays have been performed, signal correlation among replicates can be checked using the function plotCorrelation
.
path <- system.file("extdata", package="ATACseqQC", mustWork=TRUE)
bamFiles <- dir(path, "*.bam$", full.name=TRUE)
gals <- lapply(bamFiles, function(bamfile){
readBamFile(bamFile=bamfile, tag=character(0),
which=GRanges("chr1", IRanges(1, 1e6)),
asMates=FALSE)
})
txs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
plotCorrelation(GAlignmentsList(gals), txs, seqlev="chr1")
Assessing other QC metrics
A few new QC functions have been added to the ATACseqQC package.
distanceDyad
: calculate the distance of potential nucleosome dyad and the linear model for V.
NFRscore
: calculate the ratio of cutting signal immediately adjacent to TSSs and that located to the regions flanking TSSs.
PTscore
: calculate the ratio of read coverage over promoters to read coverage over transcript body.
TSSEscore
: calculate aggregated distribution of reads centered on TSSs and that of reads flanking the corresponding TSSs.
vPlot
: aggregate ATAC-seq Fragment Midpoint vs. Length for a given motif generated over binding sites within the genome.
Exercises and live demonstration
ATAC-seq datasets from two different studies were downloaded from the ENA Sequence Read Archive (SRA). SRR891269 and SRR891270 are ATAC-seq data for two biological replicates of 50K cells from EBV-transformed lymphoblastoid cell line GM128787. SRR5800801 and SRR5800802 are ATAC-seq data for two replicates of 75k cells from a breast cancer cell line T47 (Valles and Izquierd-Bouldstridge, unpublished). The first dataset is of good quality, while the second is from failed ATAC-seq assays. After raw read quality QC using FASTQC, reads were aligned to the human reference genome hg38 from the UCSC Genome Browser after removing the alternative sequences (chr_alt chromosomes). The resulting BAM files were further preprocessed as above and are available from the Google Drive.
Two subsets of filtered BAM files from SRR891270, containing alignments on chr20 only, are provided in this workshop package for quick in-class practice.
Due to the tight schedule, we will not demonstrate how to perform the QC analyses using the functions saturationPlot
,factorFootprints
, NFRscore
, PTscore
, TSSEscore
, or vPlot
. You may play with the other data and these functions after class. You are encouraged to post your questions at https://support.bioconductor.org with post title “ATACseqQC”.
For your reference, example R scripts for Tasks 1-5 are available at “inst/vignettes/BioC2019.demo.ATACseqQC.R”.
First, load the required packages.
suppressPackageStartupMessages({
library(ATACseqQC)
library(ChIPpeakAnno)
library(BSgenome.Hsapiens.UCSC.hg38)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(MotifDb)
library(SRAdb)
library(GenomicAlignments)
})
Task 1. Check read alignment statistics using the functionbamQC
.
To get the location of the BAM file for Tasks 1 and 2, , please copy and run the following scripts in a R console.
dupBamFile <- system.file("vignettes/extdata",
"SRR891270.chr20.full.bam",
package = "ATACseqQCWorkshop")
Task 2. Check library complexity for SRR891270 using the function estimateLibComplexity
. CAUTION: duplicated should NOT be removed from the input BAM file.
Using the following code to get access to the BAM file for Tasks 2-4.
bamFile <- system.file("vignettes/extdata",
"SRR891270.chr20.rmdup.bam",
package = "ATACseqQCWorkshop")
bamFileLabels <- gsub(".rmdup.bam", "",
basename(bamFile))
Task 3. Plot the distribution of library insert size using the function fragSizeDist
.
Task 4. Plot global signal enrichment around TSSs using the function enrichedFragments
. Please shift, and split the BAM file first. Make sure set seqlev as chr20.
Task 5. Visualize signal distribution along genomic regions of genes “SPTLC3”, “CRLS1”, and “NELFCD”, using the function IGVSnapshot
and the 4 subsets of filtered BAM files.
Download data to a location without white spaces in the full path from the folder subsetted_filtered_BAM in the Google Drive, by following the shared link.
source(system.file("extdata", "IGVSnapshot.R", package = "ATACseqQC"))
Best practices of ATAC-seq and ATAC-seq data QC
A few best practices for ATAC-seq assays are suggested as follows:
- Digest away background DNA (medium/dead cells) using DNase22
- Use fresh/cyropreserved cells/tissues to isolate nuclei7,9
- Reduce mitochondrial/chloroplast DNA contamination as much as possible22–25
- Optimize the ratio of the amount of Tn5 enzyme to the number of nuclei
- Optimize the number of PCR cycles19
- Perform Paired-end (PE) sequencing, e.g., 2 x 50bp
- Sequence > 50M PE reads (~200 M for footprint discovery)7
A few best practices for ATAC-seq data analysis are suggested as follows:
Session information
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 9 (stretch)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] GenomicAlignments_1.21.2
## [2] Rsamtools_2.1.2
## [3] SummarizedExperiment_1.15.3
## [4] DelayedArray_0.11.2
## [5] BiocParallel_1.19.0
## [6] matrixStats_0.54.0
## [7] SRAdb_1.47.0
## [8] RCurl_1.95-4.12
## [9] bitops_1.0-6
## [10] graph_1.63.0
## [11] RSQLite_2.1.1
## [12] MotifDb_1.27.0
## [13] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [14] GenomicFeatures_1.37.1
## [15] AnnotationDbi_1.47.0
## [16] Biobase_2.45.0
## [17] BSgenome.Hsapiens.UCSC.hg19_1.4.0
## [18] BSgenome_1.53.0
## [19] rtracklayer_1.45.1
## [20] ChIPpeakAnno_3.19.2
## [21] VennDiagram_1.6.20
## [22] futile.logger_1.4.3
## [23] GenomicRanges_1.37.12
## [24] GenomeInfoDb_1.21.1
## [25] Biostrings_2.53.0
## [26] XVector_0.25.0
## [27] IRanges_2.19.10
## [28] ATACseqQC_1.9.1
## [29] S4Vectors_0.23.13
## [30] BiocGenerics_0.31.4
## [31] BiocManager_1.30.4
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.4-1 seqinr_3.4-5
## [3] grImport2_0.1-5 base64enc_0.1-3
## [5] rGADEM_2.33.0 bit64_0.9-7
## [7] interactiveDisplayBase_1.23.0 xml2_1.2.0
## [9] splines_3.6.0 motifStack_1.29.5
## [11] knitr_1.23 ade4_1.7-13
## [13] splitstackshape_1.4.8 polynom_1.4-0
## [15] seqLogo_1.51.0 GO.db_3.8.2
## [17] dbplyr_1.4.2 png_0.1-7
## [19] shiny_1.3.2 readr_1.3.1
## [21] compiler_3.6.0 httr_1.4.0
## [23] assertthat_0.2.1 Matrix_1.2-17
## [25] lazyeval_0.2.2 limma_3.41.5
## [27] later_0.8.0 formatR_1.7
## [29] htmltools_0.3.6 prettyunits_1.0.2
## [31] tools_3.6.0 gtable_0.3.0
## [33] glue_1.3.1 GenomeInfoDbData_1.2.1
## [35] dplyr_0.8.1 rappdirs_0.3.1
## [37] Rcpp_1.0.1 multtest_2.41.0
## [39] blogdown_0.13.1 xfun_0.7
## [41] stringr_1.4.0 mime_0.7
## [43] ensembldb_2.9.2 XML_3.98-1.20
## [45] idr_1.2 AnnotationHub_2.17.3
## [47] edgeR_3.27.5 zlibbioc_1.31.0
## [49] MASS_7.3-51.4 scales_1.0.0
## [51] hms_0.4.2 promises_1.0.1
## [53] ProtGenerics_1.17.2 RBGL_1.61.0
## [55] GEOquery_2.53.0 AnnotationFilter_1.9.0
## [57] lambda.r_1.2.3 yaml_2.2.0
## [59] curl_3.3 memoise_1.1.0
## [61] ggplot2_3.2.0 MotIV_1.41.1
## [63] biomaRt_2.41.3 stringi_1.4.3
## [65] highr_0.8 randomForest_4.6-14
## [67] rlang_0.3.4 pkgconfig_2.0.2
## [69] evaluate_0.14 lattice_0.20-38
## [71] purrr_0.3.2 htmlwidgets_1.3
## [73] bit_1.1-14 tidyselect_0.2.5
## [75] magrittr_1.5 bookdown_0.11.1
## [77] R6_2.4.0 DBI_1.0.0
## [79] preseqR_4.0.0 pillar_1.4.1
## [81] survival_2.44-1.1 tibble_2.1.3
## [83] crayon_1.3.4 futile.options_1.0.1
## [85] KernSmooth_2.23-15 BiocFileCache_1.9.1
## [87] rmarkdown_1.13 jpeg_0.1-8
## [89] progress_1.2.2 locfit_1.5-9.1
## [91] data.table_1.12.2 blob_1.1.1
## [93] GenomicScores_1.9.0 digest_0.6.19
## [95] xtable_1.8-4 tidyr_0.8.3
## [97] httpuv_1.5.1 regioneR_1.17.3
## [99] munsell_0.5.0
References
1. Pederson, D. S., Thoma, F. & Simpson, R. T. Core particle, fiber, and transcriptionally active chromatin structure. Annual Review of Cell Biology 2, 117–147 (1986).
2. Bednar, J. et al. Structure and dynamics of a 197 bp nucleosome in complex with linker histone H1. Molecular Cell 66, 384–397.e8 (2017).
3. Wang, Y.-M. et al. Correlation between DNase I hypersensitive site distribution and gene expression in HeLa S3 cells. PloS one 7, e42414 (2012).
4. Gilbert, N. & Ramsahoye, B. The relationship between chromatin structure and transcriptional activity in mammalian genomes. Brief Funct Genomic Proteomic 4, 129–142 (2005).
5. Perino, M. & Veenstra, G. J. C. Chromatin control of developmental dynamics and plasticity. Developmental Cell 38, 610–620 (2016).
6. Tsompana, M. & Buck, M. J. Chromatin accessibility: A window into the genome. Epigenetics Chromatin 7, 33 (2014).
7. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods 10, 1213–1218 (2013).
8. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
9. Milani, P. et al. Cell freezing protocol suitable for atac-seq on motor neurons derived from human induced pluripotent stem cells. Scientific Reports 6, 25474 (2016).
10. Ou, J. et al. ATACseqQC: A Bioconductor package for post-alignment quality assessment of ATAC-seq data. BMC Genomics 19, 169 (2018).
11. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
12. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
13. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25 (2009).
14. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012).
15. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
16. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
17. Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: Fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
18. Li, Z. et al. Identification of transcription factor binding sites using ATAC-seq. Genome Biology 20, 45 (2019).
19. Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: A method for assaying chromatin accessibility genome-wide. Current protocols in molecular biology 109, 21.29.1–21.29.9 (2015).
20. Nag, D. K., DasGupta, U., Adelt, G. & Berg, D. E. IS50-mediated inverse transposition: Specificity and precision. Gene 34, 17–26 (1985).
21. Baldi, S. Nucleosome positioning and spacing: From genome-wide maps to single arrays. Essays In Biochemistry 63, 5–14 (2019).
22. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nature Methods 14, 959–962 (2017).
23. Lu, Z., Hofmeister, B. T., Vollmers, C., DuBois, R. M. & Schmitz, R. J. Combining ATAC-seq with nuclei sorting for discovery of cis-regulatory regions in plant genomes. Nucleic Acid Research 45, e41 (2016).
24. Rickner, H. D., Niu, S.-Y. & Cheng, C. S. ATAC-seq assay with low mitochondrial dna contamination from primary human CD4+ T lymphocytes. JoVE e59120 (2019). doi:10.3791/59120
25. Montefiori, L. et al. Reducing mitochondrial reads in ATAC-seq using CRISPR/Cas9. Scientific Reports 7, 2451 (2017).
26. Zhang, Y. et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology 9, R137 (2008).
27. Tarbell, E. D. & Liu, T. HMMRATAC: A hidden markov modeler for atac-seq. bioRxiv (2019). doi:10.1101/306621
28. Zhu, L. J. et al. ChIPpeakAnno: A Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 11, 237 (2010).
29. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnology 28, 495–501 (2010).