Curation of Bioconductor package metadata, targeting EDAM ontology and ELIXIR bio.tools metadata schemas
Vincent J. Carey, stvjc at channing.harvard.edu
October 18, 2025
Source: vignettes/curate.Rmd
Introduction
This vignette is derived almost entirely from collaborative code supplied by Anh Nguyet Vu of Sage Bionetworks. Its purpose is to illustrate the use of OpenAI models to transform, systematically organize, and tag content available for Bioconductor packages.
Code in this vignette runs only when the environment variable OPENAI_API_KEY is defined; each chunk is guarded by a check on it.
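The guard on OPENAI_API_KEY can be sketched as a small helper. The name has_openai_key is hypothetical, introduced here only for illustration, and is not part of biocEDAM. The key itself would typically be supplied with Sys.setenv() or an entry in ~/.Renviron.

```r
# Hypothetical helper (not part of biocEDAM):
# TRUE when OPENAI_API_KEY is set to a non-empty string.
has_openai_key <- function() {
  nzchar(Sys.getenv("OPENAI_API_KEY"))
}

if (has_openai_key()) {
  message("OPENAI_API_KEY found; the examples will run.")
} else {
  message("OPENAI_API_KEY is not set; the examples will be skipped.")
}
```

Sys.getenv() returns an empty string for an unset variable, so nzchar() covers both the unset case and the empty case.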
Example 1: tximeta
We start by transforming vignette content, which may be HTML or PDF. The approach follows the structured data extraction examples given in a vignette for the ellmer package on CRAN. We prompt GPT-4o to produce a concise and objective summary of at most 450 words, which is returned in the focused component of the result (read in the code as content$focus, which R resolves by partial matching of list names).
if (nchar(Sys.getenv("OPENAI_API_KEY"))>0) {
  library(biocEDAM)
  # extract authors, topics, a summary, and scores from the vignette
  content = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html")
  str(content)
  nchar(content$focus)  # number of characters in the summary
}
## List of 5
## $ author : chr [1:4] "Michael I. Love" "Charlotte Soneson" "Peter F. Hickey" "Rob Patro"
## $ topics : chr [1:19] "RNA-seq" "transcript quantification" "genomic annotation" "metadata provenance" ...
## $ focused : chr "The article introduces tximeta, a Bioconductor package for R that automates annotation and metadata attachment "| __truncated__
## $ coherence : int 95
## $ persuasion: num 0.94
## [1] 2029
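In the output above the summary is stored under the name focused, while the code reads it as content$focus. This works because `$` on an R list accepts a unique prefix of an element name. The following sketch illustrates the behavior on a toy list that reuses the field names shown above; all values are made up for illustration.

```r
# Toy list with the field names from the output above (values are invented).
content <- list(
  author = c("A. Author", "B. Author"),
  topics = c("topic one", "topic two"),
  focused = "A short summary.",
  coherence = 95L,
  persuasion = 0.94
)

by_dollar  <- content$focus                      # `$` partially matches "focused"
by_exact   <- content[["focus"]]                 # `[[` matches exactly by default: NULL
by_partial <- content[["focus", exact = FALSE]]  # partial matching requested explicitly
by_full    <- content$focused                    # full name, no matching needed

print(by_dollar)
print(is.null(by_exact))
```

Spelling out the full name, content$focused, avoids relying on partial matching, which would stop resolving if another element starting with "focus" were added to the list.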
We then use schema-driven inference to produce associated EDAM tags; see the code in inst/curbioc in the package source.
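if (nchar(Sys.getenv("OPENAI_API_KEY"))>0) {
  # first 250 characters of the summary; not displayed as written, since only
  # the last value inside the braces is auto-printed (wrap in print() to show it)
  substr(content$focus,1,250)
  # map the summary to EDAM terms and show them as an interactive table
  ans = edamize(content$focus)
  DT::datatable(mkdf(ans))
}
## Loading required namespace: reticulate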
## Success after 0 attempts
Example 2: MSnbase
if (nchar(Sys.getenv("OPENAI_API_KEY"))>0) {
  # summarize the MSnbase development vignette, then map the summary to EDAM terms
  mm = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/v05-MSnbase-development.html")
  uu = edamize(mm$focus)
  if (is.null(uu)) uu = edamize(mm$focus) # second try
  DT::datatable(mkdf(uu))
}
## Using model = "gpt-4.1".
## Success after 0 attempts
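The "second try" line in Example 2 repeats a call when the first result is NULL. A minimal sketch of the same idea as a reusable helper follows. The names retry_non_null and flaky are hypothetical and not part of biocEDAM, and the sketch assumes that a failed call returns NULL, as the second-try line suggests.

```r
# Hypothetical helper: call f() up to max_tries times and return the
# first non-NULL result, or NULL if every attempt returns NULL.
retry_non_null <- function(f, max_tries = 2) {
  for (i in seq_len(max_tries)) {
    result <- f()
    if (!is.null(result)) return(result)
  }
  NULL
}

# Stand-in for a call that returns NULL once and then succeeds
# (no real API call is made here).
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 2) NULL else "tags"
}

out <- retry_non_null(flaky, max_tries = 3)
print(out)
```

Under the same assumption, the call in Example 2 could be written as retry_non_null(function() edamize(mm$focus)), which keeps the number of attempts in one place.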