library(Biostrings)
library(GenomicRanges)
library(rtracklayer)
library(txdbmaker)
library(GenomicFeatures)
library(BSgenome)
How to extract promoter sequences
2025-06-19
Source:vignettes/how-to-extract-promoter-sequences.qmd
Promoters are DNA sequences typically located upstream of a gene’s transcription start site (TSS), and they contain binding sites for transcription factors. Researchers working on gene regulation often want to extract promoter sequences for all genes or a set of genes. With these sequences, researchers can, for example, look for known transcription factor binding sites (TFBS), or predict TFBS de novo. This HowTo will demonstrate how to extract promoter sequences for any gene(s) in a genome.
Bioconductor packages used in this document
How to extract promoter sequences
We will start by loading the required packages.
Then, we will get our example data set. Here, we will use data for the model plant Arabidopsis thaliana. We will start by obtaining genome sequences and gene annotation from Ensembl Plants.
Working with model and non-model organisms
Here, we’re obtaining data from Ensembl Plants to explicitly show how this can be done for any organism. However, if you’re lucky enough to work with a model organism (e.g. human, mouse, etc), you can find curated genome sequences and gene annotation in AnnotationHub (see, for instance, this HowTo article).
# Read genomic data from Ensembl Plants
## Genome
genome <- Biostrings::readDNAStringSet(
"https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-61/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz"
)
names(genome) <- gsub(" .*", "", names(genome)) # remove text after chr name
## Annotation
annotation <- rtracklayer::import(
"https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-61/gff3/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.61.gff3.gz"
)
Genome sequences are stored in DNAStringSet
objects, and gene annotation (i.e., start and end coordinates of each gene) are stored in GRanges
objects. This is what these objects look like:
# Take a look at the genome and annotation
genome
#> DNAStringSet object of length 7:
#> width seq names
#> [1] 30427671 CCCTAAACCCTAAACCCTAAAC...TTTAGGGTTTAGGGTTTAGGG 1
#> [2] 19698289 NNNNNNNNNNNNNNNNNNNNNN...TTTAGGGTTTAGGGTTTAGGG 2
#> [3] 23459830 NNNNNNNNNNNNNNNNNNNNNN...TAAACCCTAAACCCTAAACCC 3
#> [4] 18585056 NNNNNNNNNNNNNNNNNNNNNN...GTTTAGGGTTTAGGGTTTAGG 4
#> [5] 26975502 TATACCATGTACCCTCAACCTT...AGGATTTAGGGTTTTTAGATC 5
#> [6] 366924 GGATCCGTTCGAAACAGGTTAG...AGAATGGAAACAAACCGGATT Mt
#> [7] 154478 ATGGGCGAACGACGGGAATTGA...AATAACTTGGTCCCGGGCATC Pt
annotation
#> GRanges object with 791564 ranges and 21 metadata columns:
#> seqnames ranges strand | source type score
#> <Rle> <IRanges> <Rle> | <factor> <factor> <numeric>
#> [1] 1 1-30427671 * | TAIR10 chromosome NA
#> [2] 1 3631-5899 + | araport11 gene NA
#> [3] 1 3631-5899 + | araport11 mRNA NA
#> [4] 1 3631-3759 + | araport11 five_prime_UTR NA
#> [5] 1 3631-3913 + | araport11 exon NA
#> ... ... ... ... . ... ... ...
#> [791560] Pt 152806-154312 + | araport11 mRNA NA
#> [791561] Pt 152806-153195 + | araport11 exon NA
#> [791562] Pt 152806-153195 + | araport11 CDS NA
#> [791563] Pt 153878-154312 + | araport11 exon NA
#> [791564] Pt 153878-154312 + | araport11 CDS NA
#> phase ID Alias
#> <integer> <character> <CharacterList>
#> [1] <NA> chromosome:1 Chr1,CP002684.1,NC_003070.9
#> [2] <NA> gene:AT1G01010
#> [3] <NA> transcript:AT1G01010.1
#> [4] <NA> <NA>
#> [5] <NA> <NA>
#> ... ... ... ...
#> [791560] <NA> transcript:ATCG01310.1
#> [791561] <NA> <NA>
#> [791562] 0 CDS:ATCG01310.1
#> [791563] <NA> <NA>
#> [791564] 0 CDS:ATCG01310.1
#> Name biotype description gene_id
#> <character> <character> <character> <character>
#> [1] <NA> <NA> <NA> <NA>
#> [2] NAC001 protein_coding NAC domain containin.. AT1G01010
#> [3] NAC001-201 protein_coding <NA> <NA>
#> [4] <NA> <NA> <NA> <NA>
#> [5] AT1G01010.1.exon1 <NA> <NA> <NA>
#> ... ... ... ... ...
#> [791560] rpl2-201 protein_coding <NA> <NA>
#> [791561] ATCG01310.1.exon1 <NA> <NA> <NA>
#> [791562] <NA> <NA> <NA> <NA>
#> [791563] ATCG01310.1.exon2 <NA> <NA> <NA>
#> [791564] <NA> <NA> <NA> <NA>
#> logic_name Parent tag transcript_id
#> <character> <CharacterList> <character> <character>
#> [1] <NA> <NA> <NA>
#> [2] araport11 <NA> <NA>
#> [3] <NA> gene:AT1G01010 Ensembl_canonical AT1G01010.1
#> [4] <NA> transcript:AT1G01010.1 <NA> <NA>
#> [5] <NA> transcript:AT1G01010.1 <NA> <NA>
#> ... ... ... ... ...
#> [791560] <NA> gene:ATCG01310 Ensembl_canonical ATCG01310.1
#> [791561] <NA> transcript:ATCG01310.1 <NA> <NA>
#> [791562] <NA> transcript:ATCG01310.1 <NA> <NA>
#> [791563] <NA> transcript:ATCG01310.1 <NA> <NA>
#> [791564] <NA> transcript:ATCG01310.1 <NA> <NA>
#> constitutive ensembl_end_phase ensembl_phase exon_id
#> <character> <character> <character> <character>
#> [1] <NA> <NA> <NA> <NA>
#> [2] <NA> <NA> <NA> <NA>
#> [3] <NA> <NA> <NA> <NA>
#> [4] <NA> <NA> <NA> <NA>
#> [5] 1 1 -1 AT1G01010.1.exon1
#> ... ... ... ... ...
#> [791560] <NA> <NA> <NA> <NA>
#> [791561] 1 0 0 ATCG01310.1.exon1
#> [791562] <NA> <NA> <NA> <NA>
#> [791563] 1 0 0 ATCG01310.1.exon2
#> [791564] <NA> <NA> <NA> <NA>
#> rank protein_id Is_circular
#> <character> <character> <character>
#> [1] <NA> <NA> <NA>
#> [2] <NA> <NA> <NA>
#> [3] <NA> <NA> <NA>
#> [4] <NA> <NA> <NA>
#> [5] 1 <NA> <NA>
#> ... ... ... ...
#> [791560] <NA> <NA> <NA>
#> [791561] 1 <NA> <NA>
#> [791562] <NA> ATCG01310.1 <NA>
#> [791563] 2 <NA> <NA>
#> [791564] <NA> ATCG01310.1 <NA>
#> -------
#> seqinfo: 7 sequences from an unspecified genome; no seqlengths
Note that our GRanges
object contains ranges for multiple types of features, including genes, mRNAs, exons, and even entire chromosomes. Since we’re only interested in promoter regions, we will subset annotation
to keep only ‘gene’ ranges.
# Keep only gene ranges
genes <- annotation[annotation$type == "gene"]
genes
#> GRanges object with 27655 ranges and 21 metadata columns:
#> seqnames ranges strand | source type score
#> <Rle> <IRanges> <Rle> | <factor> <factor> <numeric>
#> [1] 1 3631-5899 + | araport11 gene NA
#> [2] 1 6788-9130 - | araport11 gene NA
#> [3] 1 11649-13714 - | araport11 gene NA
#> [4] 1 23121-31227 + | araport11 gene NA
#> [5] 1 31170-33171 - | araport11 gene NA
#> ... ... ... ... . ... ... ...
#> [27651] Pt 141854-143708 + | araport11 gene NA
#> [27652] Pt 144921-145154 - | araport11 gene NA
#> [27653] Pt 145291-152175 - | araport11 gene NA
#> [27654] Pt 152506-152787 + | araport11 gene NA
#> [27655] Pt 152806-154312 + | araport11 gene NA
#> phase ID Alias Name biotype
#> <integer> <character> <CharacterList> <character> <character>
#> [1] <NA> gene:AT1G01010 NAC001 protein_coding
#> [2] <NA> gene:AT1G01020 ARV1 protein_coding
#> [3] <NA> gene:AT1G01030 NGA3 protein_coding
#> [4] <NA> gene:AT1G01040 DCL1 protein_coding
#> [5] <NA> gene:AT1G01050 PPa1 protein_coding
#> ... ... ... ... ... ...
#> [27651] <NA> gene:ATCG01250 ndhB protein_coding
#> [27652] <NA> gene:ATCG01270 ycf15-B protein_coding
#> [27653] <NA> gene:ATCG01280 ycf2 protein_coding
#> [27654] <NA> gene:ATCG01300 rpl23 protein_coding
#> [27655] <NA> gene:ATCG01310 rpl2 protein_coding
#> description gene_id logic_name Parent
#> <character> <character> <character> <CharacterList>
#> [1] NAC domain containin.. AT1G01010 araport11
#> [2] ARV1 family protein .. AT1G01020 araport11
#> [3] AP2/B3-like transcri.. AT1G01030 araport11
#> [4] dicer-like 1 [Source.. AT1G01040 araport11
#> [5] pyrophosphorylase 1 .. AT1G01050 araport11
#> ... ... ... ... ...
#> [27651] <NA> ATCG01250 araport11
#> [27652] <NA> ATCG01270 araport11
#> [27653] Ycf2 [Source:NCBI ge.. ATCG01280 araport11
#> [27654] ribosomal protein L2.. ATCG01300 araport11
#> [27655] ribosomal protein L2.. ATCG01310 araport11
#> tag transcript_id constitutive ensembl_end_phase
#> <character> <character> <character> <character>
#> [1] <NA> <NA> <NA> <NA>
#> [2] <NA> <NA> <NA> <NA>
#> [3] <NA> <NA> <NA> <NA>
#> [4] <NA> <NA> <NA> <NA>
#> [5] <NA> <NA> <NA> <NA>
#> ... ... ... ... ...
#> [27651] <NA> <NA> <NA> <NA>
#> [27652] <NA> <NA> <NA> <NA>
#> [27653] <NA> <NA> <NA> <NA>
#> [27654] <NA> <NA> <NA> <NA>
#> [27655] <NA> <NA> <NA> <NA>
#> ensembl_phase exon_id rank protein_id Is_circular
#> <character> <character> <character> <character> <character>
#> [1] <NA> <NA> <NA> <NA> <NA>
#> [2] <NA> <NA> <NA> <NA> <NA>
#> [3] <NA> <NA> <NA> <NA> <NA>
#> [4] <NA> <NA> <NA> <NA> <NA>
#> [5] <NA> <NA> <NA> <NA> <NA>
#> ... ... ... ... ... ...
#> [27651] <NA> <NA> <NA> <NA> <NA>
#> [27652] <NA> <NA> <NA> <NA> <NA>
#> [27653] <NA> <NA> <NA> <NA> <NA>
#> [27654] <NA> <NA> <NA> <NA> <NA>
#> [27655] <NA> <NA> <NA> <NA> <NA>
#> -------
#> seqinfo: 7 sequences from an unspecified genome; no seqlengths
Next, we can use the promoters()
function from GenomicRanges to extract the genomic coordinates of promoters. By default, promoters start at 2000 bp upstream the transcription start site (TSS), and end at 200 bp downstream the TSS, but you can modify these positions in parameters upstream and downstream of promoters()
. For example, some transcription factor families in plants (e.g. HSF and C2C2-GATA) bind exclusively to regions downstream the TSS, which is contrary to the common view of TFBS being located upstream TSSs. Importantly, if a gene has multiple TSSs (e.g., because of different transcripts starting at different positions), the most upstream TSS is used, not a cleverly chosen ‘representative’ or ‘canonical’ transcript.
# Get promoter regions from a `GRanges` object
prom_genes <- GenomicRanges::promoters(genes)
prom_genes
#> GRanges object with 27655 ranges and 21 metadata columns:
#> seqnames ranges strand | source type score
#> <Rle> <IRanges> <Rle> | <factor> <factor> <numeric>
#> [1] 1 1631-3830 + | araport11 gene NA
#> [2] 1 8931-11130 - | araport11 gene NA
#> [3] 1 13515-15714 - | araport11 gene NA
#> [4] 1 21121-23320 + | araport11 gene NA
#> [5] 1 32972-35171 - | araport11 gene NA
#> ... ... ... ... . ... ... ...
#> [27651] Pt 139854-142053 + | araport11 gene NA
#> [27652] Pt 144955-147154 - | araport11 gene NA
#> [27653] Pt 151976-154175 - | araport11 gene NA
#> [27654] Pt 150506-152705 + | araport11 gene NA
#> [27655] Pt 150806-153005 + | araport11 gene NA
#> phase ID Alias Name biotype
#> <integer> <character> <CharacterList> <character> <character>
#> [1] <NA> gene:AT1G01010 NAC001 protein_coding
#> [2] <NA> gene:AT1G01020 ARV1 protein_coding
#> [3] <NA> gene:AT1G01030 NGA3 protein_coding
#> [4] <NA> gene:AT1G01040 DCL1 protein_coding
#> [5] <NA> gene:AT1G01050 PPa1 protein_coding
#> ... ... ... ... ... ...
#> [27651] <NA> gene:ATCG01250 ndhB protein_coding
#> [27652] <NA> gene:ATCG01270 ycf15-B protein_coding
#> [27653] <NA> gene:ATCG01280 ycf2 protein_coding
#> [27654] <NA> gene:ATCG01300 rpl23 protein_coding
#> [27655] <NA> gene:ATCG01310 rpl2 protein_coding
#> description gene_id logic_name Parent
#> <character> <character> <character> <CharacterList>
#> [1] NAC domain containin.. AT1G01010 araport11
#> [2] ARV1 family protein .. AT1G01020 araport11
#> [3] AP2/B3-like transcri.. AT1G01030 araport11
#> [4] dicer-like 1 [Source.. AT1G01040 araport11
#> [5] pyrophosphorylase 1 .. AT1G01050 araport11
#> ... ... ... ... ...
#> [27651] <NA> ATCG01250 araport11
#> [27652] <NA> ATCG01270 araport11
#> [27653] Ycf2 [Source:NCBI ge.. ATCG01280 araport11
#> [27654] ribosomal protein L2.. ATCG01300 araport11
#> [27655] ribosomal protein L2.. ATCG01310 araport11
#> tag transcript_id constitutive ensembl_end_phase
#> <character> <character> <character> <character>
#> [1] <NA> <NA> <NA> <NA>
#> [2] <NA> <NA> <NA> <NA>
#> [3] <NA> <NA> <NA> <NA>
#> [4] <NA> <NA> <NA> <NA>
#> [5] <NA> <NA> <NA> <NA>
#> ... ... ... ... ...
#> [27651] <NA> <NA> <NA> <NA>
#> [27652] <NA> <NA> <NA> <NA>
#> [27653] <NA> <NA> <NA> <NA>
#> [27654] <NA> <NA> <NA> <NA>
#> [27655] <NA> <NA> <NA> <NA>
#> ensembl_phase exon_id rank protein_id Is_circular
#> <character> <character> <character> <character> <character>
#> [1] <NA> <NA> <NA> <NA> <NA>
#> [2] <NA> <NA> <NA> <NA> <NA>
#> [3] <NA> <NA> <NA> <NA> <NA>
#> [4] <NA> <NA> <NA> <NA> <NA>
#> [5] <NA> <NA> <NA> <NA> <NA>
#> ... ... ... ... ... ...
#> [27651] <NA> <NA> <NA> <NA> <NA>
#> [27652] <NA> <NA> <NA> <NA> <NA>
#> [27653] <NA> <NA> <NA> <NA> <NA>
#> [27654] <NA> <NA> <NA> <NA> <NA>
#> [27655] <NA> <NA> <NA> <NA> <NA>
#> -------
#> seqinfo: 7 sequences from an unspecified genome; no seqlengths
Alternatively, if you want to extract promoters for each transcript separately, you can subset the GRanges
object to extract only ranges of type ‘transcript’, and then use promoters()
as demonstrated above. Note that GFF3 files from some databases do not contain a ‘transcript’ type. In these cases, you will most likely find transcript-level ranges in rows of type ‘mRNA’. This is the case for Ensembl Plants, from where we obtained our example data. To extract promoters for each A. thaliana transcript, you’d execute:
# Extract transcript ranges from a GRanges object
tx_ranges <- annotation[annotation$type == "mRNA"] # or 'transcript' (if any)
# Extract promoters for each transcript separately
prom_tx <- promoters(tx_ranges)
prom_tx
#> GRanges object with 48359 ranges and 21 metadata columns:
#> seqnames ranges strand | source type score
#> <Rle> <IRanges> <Rle> | <factor> <factor> <numeric>
#> [1] 1 1631-3830 + | araport11 mRNA NA
#> [2] 1 8538-10737 - | araport11 mRNA NA
#> [3] 1 8538-10737 - | araport11 mRNA NA
#> [4] 1 8931-11130 - | araport11 mRNA NA
#> [5] 1 8931-11130 - | araport11 mRNA NA
#> ... ... ... ... . ... ... ...
#> [48355] Pt 139854-142053 + | araport11 mRNA NA
#> [48356] Pt 144955-147154 - | araport11 mRNA NA
#> [48357] Pt 151976-154175 - | araport11 mRNA NA
#> [48358] Pt 150506-152705 + | araport11 mRNA NA
#> [48359] Pt 150806-153005 + | araport11 mRNA NA
#> phase ID Alias Name
#> <integer> <character> <CharacterList> <character>
#> [1] <NA> transcript:AT1G01010.1 NAC001-201
#> [2] <NA> transcript:AT1G01020.2 ARV1-202
#> [3] <NA> transcript:AT1G01020.6 ARV1-201
#> [4] <NA> transcript:AT1G01020.1 ARV1-206
#> [5] <NA> transcript:AT1G01020.4 ARV1-205
#> ... ... ... ... ...
#> [48355] <NA> transcript:ATCG01250.1 ndhB-201
#> [48356] <NA> transcript:ATCG01270.1 ycf15-B-201
#> [48357] <NA> transcript:ATCG01280.1 ycf2-201
#> [48358] <NA> transcript:ATCG01300.1 rpl23-201
#> [48359] <NA> transcript:ATCG01310.1 rpl2-201
#> biotype description gene_id logic_name Parent
#> <character> <character> <character> <character> <CharacterList>
#> [1] protein_coding <NA> <NA> <NA> gene:AT1G01010
#> [2] protein_coding <NA> <NA> <NA> gene:AT1G01020
#> [3] protein_coding <NA> <NA> <NA> gene:AT1G01020
#> [4] protein_coding <NA> <NA> <NA> gene:AT1G01020
#> [5] protein_coding <NA> <NA> <NA> gene:AT1G01020
#> ... ... ... ... ... ...
#> [48355] protein_coding <NA> <NA> <NA> gene:ATCG01250
#> [48356] protein_coding <NA> <NA> <NA> gene:ATCG01270
#> [48357] protein_coding <NA> <NA> <NA> gene:ATCG01280
#> [48358] protein_coding <NA> <NA> <NA> gene:ATCG01300
#> [48359] protein_coding <NA> <NA> <NA> gene:ATCG01310
#> tag transcript_id constitutive ensembl_end_phase
#> <character> <character> <character> <character>
#> [1] Ensembl_canonical AT1G01010.1 <NA> <NA>
#> [2] <NA> AT1G01020.2 <NA> <NA>
#> [3] <NA> AT1G01020.6 <NA> <NA>
#> [4] Ensembl_canonical AT1G01020.1 <NA> <NA>
#> [5] <NA> AT1G01020.4 <NA> <NA>
#> ... ... ... ... ...
#> [48355] Ensembl_canonical ATCG01250.1 <NA> <NA>
#> [48356] Ensembl_canonical ATCG01270.1 <NA> <NA>
#> [48357] Ensembl_canonical ATCG01280.1 <NA> <NA>
#> [48358] Ensembl_canonical ATCG01300.1 <NA> <NA>
#> [48359] Ensembl_canonical ATCG01310.1 <NA> <NA>
#> ensembl_phase exon_id rank protein_id Is_circular
#> <character> <character> <character> <character> <character>
#> [1] <NA> <NA> <NA> <NA> <NA>
#> [2] <NA> <NA> <NA> <NA> <NA>
#> [3] <NA> <NA> <NA> <NA> <NA>
#> [4] <NA> <NA> <NA> <NA> <NA>
#> [5] <NA> <NA> <NA> <NA> <NA>
#> ... ... ... ... ... ...
#> [48355] <NA> <NA> <NA> <NA> <NA>
#> [48356] <NA> <NA> <NA> <NA> <NA>
#> [48357] <NA> <NA> <NA> <NA> <NA>
#> [48358] <NA> <NA> <NA> <NA> <NA>
#> [48359] <NA> <NA> <NA> <NA> <NA>
#> -------
#> seqinfo: 7 sequences from an unspecified genome; no seqlengths
Extracting promoter regions from
TxDb
objectsIf you have a
TxDb
object with transcript coordinates and metadata, you can extract promoter regions for each transcript with thepromoters()
function from the GenomicFeatures package. Note that function names are the same, butGenomicFeatures::promoters()
takesTxDb
objects as input, whileGenomicRanges::promoters()
takesGRanges
objects as input.# Create a `TxDb` object from a `GRanges` object txdb <- txdbmaker::makeTxDbFromGRanges(annotation) #> Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information #> is not available for this TxDb object # Get promoter regions for all transcripts prom_tx2 <- GenomicFeatures::promoters(txdb) prom_tx2 #> GRanges object with 54013 ranges and 2 metadata columns: #> seqnames ranges strand | tx_id tx_name #> <Rle> <IRanges> <Rle> | <integer> <character> #> AT1G01010.1 1 1631-3830 + | 1 AT1G01010.1 #> AT1G03987.1 1 9101-11300 + | 2 AT1G03987.1 #> AT1G01040.1 1 21121-23320 + | 3 AT1G01040.1 #> AT1G01040.2 1 21416-23615 + | 4 AT1G01040.2 #> at1g01046 1 26500-28699 + | 5 at1g01046 #> ... ... ... ... . ... ... #> ATCG01200.1 Pt 135649-137848 - | 54009 ATCG01200.1 #> ATCG01210.1 Pt 137438-139637 - | 54010 ATCG01210.1 #> ATCG01220.1 Pt 137741-139940 - | 54011 ATCG01220.1 #> ATCG01270.1 Pt 144955-147154 - | 54012 ATCG01270.1 #> ATCG01280.1 Pt 151976-154175 - | 54013 ATCG01280.1 #> ------- #> seqinfo: 7 sequences from an unspecified genome; no seqlengths
Finally, you can extract sequences given a genome and some coordinates with the getSeq()
function from BSgenome.
# Extract promoter sequences (1k genes only for demo purposes)
prom_seqs <- getSeq(genome, prom_genes[1:1000])
prom_seqs
#> DNAStringSet object of length 1000:
#> width seq
#> [1] 2200 TTCCGTCATTTGGAGATACGAAATCAAATCTC...GAGGAGCTCGTTGGTCACTATCTCCGTAACAA
#> [2] 2200 CAAAAGGTATGATATGTTTTCACATTGTTTGA...TATAATATGTTTCATTCATGAACCTACGCATT
#> [3] 2200 TAATTAACAGTTTAACACACTTACGTGATTAA...ATTTGCTTAGCTCTTTTCTTTCCTTGATTTCA
#> [4] 2200 GCTAAAGAGACGGCGCTCTTGTCATTTTAAGT...TCTTCTTCGTGACCCCTTTTTACCTGCAAACA
#> [5] 2200 TTTCCAGCTTGTCATTCACAGGATGATTACCG...GTTCTAAGATCATCAAAGTTTGATGCTTTTCC
#> ... ... ...
#> [996] 2200 GTTTGTGTCTGCTTATATGCTCCAACGATTTG...CATCTCGTTCCTCCACCAACGCATCCACAACA
#> [997] 2200 CTCATGTAACCATATATCCACTAATAAAGAGA...TATCCACTTACCTTTTTCGATTTCGTTTGATC
#> [998] 2200 TTCTCTGGCCGAACAAACCCATCAATGCTCCA...CCAACGAAAAATCTTAGCTTAACACCATCGCG
#> [999] 2200 TAAATCTATGTTAGATACGTTCACTAGTCTTA...GAGCTATAATAGCATTGTTATAAGATTAGCAA
#> [1000] 2200 AGCTATTCTTGCTTACCAGTCCATTAATGGTC...ATTGTGGCGGCGAGGACCGTTCTGATAACGGG
Further reading
To learn more about how to work with sequences and ranges using Bioconductor packages, look at the vignettes of the Biostrings and GenomicRanges packages.
Session info
Click to display session info
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Ventura 13.7.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: UTC
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BSgenome_1.77.0 BiocIO_1.19.0 txdbmaker_1.5.4
#> [4] GenomicFeatures_1.61.3 AnnotationDbi_1.71.0 Biobase_2.69.0
#> [7] rtracklayer_1.69.0 GenomicRanges_1.61.0 Biostrings_2.77.1
#> [10] GenomeInfoDb_1.45.3 XVector_0.49.0 IRanges_2.43.0
#> [13] S4Vectors_0.47.0 BiocGenerics_0.55.0 generics_0.1.4
#> [16] BiocStyle_2.37.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.49.0 SummarizedExperiment_1.39.0
#> [3] httr2_1.1.2 rjson_0.2.23
#> [5] xfun_0.52 lattice_0.22-7
#> [7] vctrs_0.6.5 tools_4.5.1
#> [9] bitops_1.0-9 curl_6.2.3
#> [11] parallel_4.5.1 tibble_3.2.1
#> [13] RSQLite_2.3.11 blob_1.2.4
#> [15] pkgconfig_2.0.3 Matrix_1.7-3
#> [17] dbplyr_2.5.0 lifecycle_1.0.4
#> [19] stringr_1.5.1 compiler_4.5.1
#> [21] Rsamtools_2.25.0 progress_1.2.3
#> [23] codetools_0.2-20 htmltools_0.5.8.1
#> [25] RCurl_1.98-1.17 yaml_2.3.10
#> [27] pillar_1.10.2 crayon_1.5.3
#> [29] BiocParallel_1.43.2 DelayedArray_0.35.1
#> [31] cachem_1.1.0 abind_1.4-8
#> [33] tidyselect_1.2.1 digest_0.6.37
#> [35] stringi_1.8.7 dplyr_1.1.4
#> [37] restfulr_0.0.15 biomaRt_2.65.0
#> [39] fastmap_1.2.0 grid_4.5.1
#> [41] cli_3.6.5 SparseArray_1.9.0
#> [43] magrittr_2.0.3 S4Arrays_1.9.0
#> [45] XML_3.99-0.18 filelock_1.0.3
#> [47] rappdirs_0.3.3 prettyunits_1.2.0
#> [49] UCSC.utils_1.5.0 bit64_4.6.0-1
#> [51] rmarkdown_2.29 httr_1.4.7
#> [53] matrixStats_1.5.0 bit_4.6.0
#> [55] png_0.1-8 hms_1.1.3
#> [57] memoise_2.0.1 evaluate_1.0.3
#> [59] knitr_1.50 BiocFileCache_2.99.5
#> [61] rlang_1.1.6 glue_1.8.0
#> [63] DBI_1.2.3 xml2_1.3.8
#> [65] BiocManager_1.30.25 jsonlite_2.0.0
#> [67] R6_2.6.1 MatrixGenerics_1.21.0
#> [69] GenomicAlignments_1.45.0