
A gene model is a set of annotations that describes the genomic locations of the known genes, transcripts, exons, and coding sequences (CDS) for a given organism. The standard file formats for storing gene models are GFF and GTF. In Bioconductor, gene model information is typically represented as a TxDb object, but sometimes also as a GRanges or GRangesList object. The makeTxDbFromGFF() function from the txdbmaker package imports a GFF or GTF file as a TxDb object.
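The TxDb representation and the GRanges/GRangesList representations are not exclusive. The accessor functions from GenomicFeatures (attached by txdbmaker) extract range-based views from a TxDb. The following minimal sketch uses `transcripts()` to get a flat GRanges with one range per transcript, and `transcriptsBy()` to get a GRangesList of transcripts grouped by gene. It assumes the small example file shipped with txdbmaker, which is also used in the next section.

```r
suppressPackageStartupMessages({
    library(txdbmaker)  # also attaches GenomicFeatures
})

gff_file <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")
txdb <- makeTxDbFromGFF(gff_file, format="gff3")

# GRanges: one range per transcript
tx <- transcripts(txdb)

# GRangesList: transcripts grouped by gene ID (one list element per gene)
tx_by_gene <- transcriptsBy(txdb, by="gene")

class(tx)
class(tx_by_gene)
```

The TxDb keeps all the annotation in a single SQLite-backed object. The GRanges and GRangesList views are convenient for range operations such as overlaps.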

Bioconductor packages used in this document

How to load a gene model from a GFF or GTF file

We will use a small .gff3 file provided by the txdbmaker package.

suppressPackageStartupMessages({
    library(txdbmaker)
})

# Path to the example GFF3 file shipped with txdbmaker
gff_file <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")

# Parse the GFF3 file into a TxDb object
txdb <- makeTxDbFromGFF(gff_file, format="gff3")
txdb
#> TxDb object:
#> # Db type: TxDb
#> # Supporting package: GenomicFeatures
#> # Data source: /Users/runner/work/_temp/Library/txdbmaker/extdata/GFF3_files/a.gff3
#> # Organism: NA
#> # Taxonomy ID: NA
#> # miRBase build ID: NA
#> # Genome: NA
#> # Nb of transcripts: 488
#> # Db created by: txdbmaker package from Bioconductor
#> # Creation time: 2025-02-17 15:47:55 +0000 (Mon, 17 Feb 2025)
#> # txdbmaker version at creation time: 1.3.1
#> # RSQLite version at creation time: 2.3.9
#> # DBSCHEMAVERSION: 1.2

See ?makeTxDbFromGFF in the txdbmaker package for more information.
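Parsing a large GFF or GTF file can be slow. A TxDb is backed by an SQLite database, so it can be written to disk once with `saveDb()` and reloaded later with `loadDb()`. Both functions come from AnnotationDbi, which is attached along with GenomicFeatures. The following is a minimal sketch. The output file name `a_gff3.sqlite` is an arbitrary example, and any writable path works.

```r
suppressPackageStartupMessages({
    library(txdbmaker)  # also attaches GenomicFeatures and AnnotationDbi
})

gff_file <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")
txdb <- makeTxDbFromGFF(gff_file, format="gff3")

# Write the TxDb's underlying SQLite database to disk
# (hypothetical file name; choose any writable location)
db_path <- file.path(tempdir(), "a_gff3.sqlite")
saveDb(txdb, file=db_path)

# Later, e.g. in a new R session, reload it without re-parsing the GFF3 file
txdb2 <- loadDb(db_path)
txdb2
```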

Extract the exon coordinates grouped by gene from this gene model:


 exonsBy(txdb, by="gene")
#> GRangesList object of length 488:
#> $Solyc00g005000.2
#> GRanges object with 2 ranges and 2 metadata columns:
#>         seqnames      ranges strand |   exon_id            exon_name
#>            <Rle>   <IRanges>  <Rle> | <integer>          <character>
#>   [1] SL2.40ch00 16437-17275      + |         1 Solyc00g005000.2.1.1
#>   [2] SL2.40ch00 17336-18189      + |         2 Solyc00g005000.2.1.2
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $Solyc00g005020.1
#> GRanges object with 3 ranges and 2 metadata columns:
#>         seqnames      ranges strand |   exon_id            exon_name
#>            <Rle>   <IRanges>  <Rle> | <integer>          <character>
#>   [1] SL2.40ch00 68062-68211      + |         3 Solyc00g005020.1.1.1
#>   [2] SL2.40ch00 68344-68568      + |         4 Solyc00g005020.1.1.2
#>   [3] SL2.40ch00 68654-68764      + |         5 Solyc00g005020.1.1.3
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $Solyc00g005040.2
#> GRanges object with 4 ranges and 2 metadata columns:
#>         seqnames        ranges strand |   exon_id            exon_name
#>            <Rle>     <IRanges>  <Rle> | <integer>          <character>
#>   [1] SL2.40ch00 550920-550945      + |         6 Solyc00g005040.2.1.1
#>   [2] SL2.40ch00 551034-551132      + |         7 Solyc00g005040.2.1.2
#>   [3] SL2.40ch00 551218-551250      + |         8 Solyc00g005040.2.1.3
#>   [4] SL2.40ch00 551343-551576      + |         9 Solyc00g005040.2.1.4
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> ...
#> <485 more elements>
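Because the result is a GRangesList named by gene ID, per-gene summaries can be computed element-wise. In this minimal sketch, `lengths()` gives the number of exons per gene. The total exonic length is `sum(width(...))`, applied after `reduce()` so that exons shared or overlapping between alternative transcripts of the same gene are not counted twice. For the first three genes, the values follow directly from the exon ranges printed above.

```r
suppressPackageStartupMessages({
    library(txdbmaker)  # also attaches GenomicFeatures
})

gff_file <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")
txdb <- makeTxDbFromGFF(gff_file, format="gff3")

ex_by_gene <- exonsBy(txdb, by="gene")

# Number of exons per gene (2, 3 and 4 for the first three genes)
n_exons <- lengths(ex_by_gene)
head(n_exons, 3)

# Total exonic length per gene in bp; reduce() merges overlapping exons
# first (1693, 486 and 392 bp for the first three genes)
exonic_len <- sum(width(reduce(ex_by_gene)))
head(exonic_len, 3)
```

Other groupings are available through the same family of accessors, for example `exonsBy(txdb, by="tx")` for exons per transcript or `cdsBy(txdb, by="tx")` for coding ranges per transcript.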

Session info

sessionInfo()
#> R Under development (unstable) (2025-02-15 r87725)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.7.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: UTC
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] txdbmaker_1.3.1        GenomicFeatures_1.59.1 AnnotationDbi_1.69.0  
#>  [4] Biobase_2.67.0         GenomicRanges_1.59.1   GenomeInfoDb_1.43.4   
#>  [7] IRanges_2.41.3         S4Vectors_0.45.4       BiocGenerics_0.53.6   
#> [10] generics_0.1.3         BiocStyle_2.35.0      
#> 
#> loaded via a namespace (and not attached):
#>  [1] KEGGREST_1.47.0             SummarizedExperiment_1.37.0
#>  [3] httr2_1.1.0                 rjson_0.2.23               
#>  [5] xfun_0.50                   lattice_0.22-6             
#>  [7] vctrs_0.6.5                 tools_4.5.0                
#>  [9] bitops_1.0-9                curl_6.2.0                 
#> [11] parallel_4.5.0              tibble_3.2.1               
#> [13] RSQLite_2.3.9               blob_1.2.4                 
#> [15] pkgconfig_2.0.3             Matrix_1.7-2               
#> [17] dbplyr_2.5.0                lifecycle_1.0.4            
#> [19] GenomeInfoDbData_1.2.13     stringr_1.5.1              
#> [21] compiler_4.5.0              Rsamtools_2.23.1           
#> [23] Biostrings_2.75.3           progress_1.2.3             
#> [25] codetools_0.2-20            htmltools_0.5.8.1          
#> [27] RCurl_1.98-1.16             yaml_2.3.10                
#> [29] pillar_1.10.1               crayon_1.5.3               
#> [31] BiocParallel_1.41.0         cachem_1.1.0               
#> [33] DelayedArray_0.33.6         abind_1.4-8                
#> [35] tidyselect_1.2.1            digest_0.6.37              
#> [37] stringi_1.8.4               dplyr_1.1.4                
#> [39] restfulr_0.0.15             biomaRt_2.63.1             
#> [41] fastmap_1.2.0               grid_4.5.0                 
#> [43] cli_3.6.4                   SparseArray_1.7.5          
#> [45] magrittr_2.0.3              S4Arrays_1.7.3             
#> [47] XML_3.99-0.18               filelock_1.0.3             
#> [49] rappdirs_0.3.3              prettyunits_1.2.0          
#> [51] UCSC.utils_1.3.1            bit64_4.6.0-1              
#> [53] rmarkdown_2.29              XVector_0.47.2             
#> [55] httr_1.4.7                  matrixStats_1.5.0          
#> [57] bit_4.5.0.1                 png_0.1-8                  
#> [59] hms_1.1.3                   memoise_2.0.1              
#> [61] evaluate_1.0.3              knitr_1.49                 
#> [63] BiocIO_1.17.1               BiocFileCache_2.15.1       
#> [65] rtracklayer_1.67.0          rlang_1.1.5                
#> [67] glue_1.8.0                  DBI_1.2.3                  
#> [69] xml2_1.3.6                  BiocManager_1.30.25        
#> [71] jsonlite_1.8.9              R6_2.6.1                   
#> [73] MatrixGenerics_1.19.1       GenomicAlignments_1.43.0