suppressPackageStartupMessages({
library(txdbmaker)
})
gff_file <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")
txdb <- makeTxDbFromGFF(gff_file, format="gff3")
txdb
#> TxDb object:
#> # Db type: TxDb
#> # Supporting package: GenomicFeatures
#> # Data source: /Users/runner/work/_temp/Library/txdbmaker/extdata/GFF3_files/a.gff3
#> # Organism: NA
#> # Taxonomy ID: NA
#> # Genome: NA
#> # Nb of transcripts: 488
#> # Db created by: txdbmaker package from Bioconductor
#> # Creation time: 2025-06-19 17:14:40 +0000 (Thu, 19 Jun 2025)
#> # txdbmaker version at creation time: 1.5.4
#> # RSQLite version at creation time: 2.3.11
#> # DBSCHEMAVERSION: 1.2
How to load a gene model from a GFF or GTF file
2025-06-19
Source:vignettes/how-to-load-gene-from-gff-gtf.qmd
A gene model is essentially a set of annotations that describes the genomic locations of the known genes, transcripts, exons, and CDS, for a given organism. The standardized file format to hold gene models is a GFF or GTF. In Bioconductor, gene model information is typically represented as a TxDb object but also sometimes as a GRanges or GRangesList object. We can use the makeTxDbFromGFF()
function from the txdbmaker package to import a GFF or GTF file as a TxDb object.
Bioconductor packages used in this document
How to load a gene model from a GFF or GTF file
We will use a small .gff3 file provided by the txdbmaker package.
See ?makeTxDbFromGFF
in the txdbmaker package for more information.
Extract the exon coordinates grouped by gene from this gene model:
exonsBy(txdb, by="gene")
#> GRangesList object of length 488:
#> $Solyc00g005000.2
#> GRanges object with 2 ranges and 2 metadata columns:
#> seqnames ranges strand | exon_id exon_name
#> <Rle> <IRanges> <Rle> | <integer> <character>
#> [1] SL2.40ch00 16437-17275 + | 1 Solyc00g005000.2.1.1
#> [2] SL2.40ch00 17336-18189 + | 2 Solyc00g005000.2.1.2
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> $Solyc00g005020.1
#> GRanges object with 3 ranges and 2 metadata columns:
#> seqnames ranges strand | exon_id exon_name
#> <Rle> <IRanges> <Rle> | <integer> <character>
#> [1] SL2.40ch00 68062-68211 + | 3 Solyc00g005020.1.1.1
#> [2] SL2.40ch00 68344-68568 + | 4 Solyc00g005020.1.1.2
#> [3] SL2.40ch00 68654-68764 + | 5 Solyc00g005020.1.1.3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> $Solyc00g005040.2
#> GRanges object with 4 ranges and 2 metadata columns:
#> seqnames ranges strand | exon_id exon_name
#> <Rle> <IRanges> <Rle> | <integer> <character>
#> [1] SL2.40ch00 550920-550945 + | 6 Solyc00g005040.2.1.1
#> [2] SL2.40ch00 551034-551132 + | 7 Solyc00g005040.2.1.2
#> [3] SL2.40ch00 551218-551250 + | 8 Solyc00g005040.2.1.3
#> [4] SL2.40ch00 551343-551576 + | 9 Solyc00g005040.2.1.4
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> ...
#> <485 more elements>
Session info
Click to display session info
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Ventura 13.7.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: UTC
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] txdbmaker_1.5.4 GenomicFeatures_1.61.3 AnnotationDbi_1.71.0
#> [4] Biobase_2.69.0 GenomicRanges_1.61.0 GenomeInfoDb_1.45.3
#> [7] IRanges_2.43.0 S4Vectors_0.47.0 BiocGenerics_0.55.0
#> [10] generics_0.1.4 BiocStyle_2.37.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.49.0 SummarizedExperiment_1.39.0
#> [3] httr2_1.1.2 rjson_0.2.23
#> [5] xfun_0.52 lattice_0.22-7
#> [7] vctrs_0.6.5 tools_4.5.1
#> [9] bitops_1.0-9 curl_6.2.3
#> [11] parallel_4.5.1 tibble_3.2.1
#> [13] RSQLite_2.3.11 blob_1.2.4
#> [15] pkgconfig_2.0.3 Matrix_1.7-3
#> [17] dbplyr_2.5.0 lifecycle_1.0.4
#> [19] stringr_1.5.1 compiler_4.5.1
#> [21] Rsamtools_2.25.0 Biostrings_2.77.1
#> [23] progress_1.2.3 codetools_0.2-20
#> [25] htmltools_0.5.8.1 RCurl_1.98-1.17
#> [27] yaml_2.3.10 pillar_1.10.2
#> [29] crayon_1.5.3 BiocParallel_1.43.2
#> [31] cachem_1.1.0 DelayedArray_0.35.1
#> [33] abind_1.4-8 tidyselect_1.2.1
#> [35] digest_0.6.37 stringi_1.8.7
#> [37] dplyr_1.1.4 restfulr_0.0.15
#> [39] biomaRt_2.65.0 fastmap_1.2.0
#> [41] grid_4.5.1 cli_3.6.5
#> [43] SparseArray_1.9.0 magrittr_2.0.3
#> [45] S4Arrays_1.9.0 XML_3.99-0.18
#> [47] filelock_1.0.3 rappdirs_0.3.3
#> [49] prettyunits_1.2.0 UCSC.utils_1.5.0
#> [51] bit64_4.6.0-1 rmarkdown_2.29
#> [53] XVector_0.49.0 httr_1.4.7
#> [55] matrixStats_1.5.0 bit_4.6.0
#> [57] png_0.1-8 hms_1.1.3
#> [59] memoise_2.0.1 evaluate_1.0.3
#> [61] knitr_1.50 BiocIO_1.19.0
#> [63] BiocFileCache_2.99.5 rtracklayer_1.69.0
#> [65] rlang_1.1.6 glue_1.8.0
#> [67] DBI_1.2.3 xml2_1.3.8
#> [69] BiocManager_1.30.25 jsonlite_2.0.0
#> [71] R6_2.6.1 MatrixGenerics_1.21.0
#> [73] GenomicAlignments_1.45.0