Curation of Bioconductor package metadata, targeting EDAM ontology and ELIXIR bio.tools metadata schemas
Vincent J. Carey, stvjc at channing.harvard.edu
June 03, 2026
Source: vignettes/curate.Rmd
Introduction
This vignette is derived almost entirely from collaborative code supplied by Anh Nguyet Vu of Sage Bionetworks. Its purpose is to illustrate the use of OpenAI API-based transformation to systematically organize and tag content available for Bioconductor packages.
Code in this vignette requires that the environment variable OPENAI_API_KEY be defined.
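The guard used by the code chunks in this vignette can be sketched as a small helper, assuming only base R. The name `has_openai_key` is hypothetical and is not part of biocEDAM; it wraps the same test on OPENAI_API_KEY that the chunks perform inline.

```r
# Hypothetical helper (not part of biocEDAM): returns TRUE when the
# OPENAI_API_KEY environment variable is set to a non-empty string.
has_openai_key <- function() {
  nzchar(Sys.getenv("OPENAI_API_KEY"))
}

# Report whether the API-dependent chunks would run in this session.
if (has_openai_key()) {
  message("OPENAI_API_KEY found; API-dependent chunks will run.")
} else {
  message("OPENAI_API_KEY not set; API-dependent chunks will be skipped.")
}
```

Setting the key for a session with `Sys.setenv(OPENAI_API_KEY = "...")`, or in `~/.Renviron`, keeps it out of the vignette source.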
Example 1: tximeta
We start by transforming vignette content, which may be in HTML or PDF. The approach follows the structured data extraction code examples given in a vignette for the ellmer package on CRAN. GPT-4o is prompted to produce a concise and objective summary of at most 450 words, which is placed in the focus component of the returned data.
if (nchar(Sys.getenv("OPENAI_API_KEY"))>0) {
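  library(biocEDAM)
  # extract structured data from the rendered tximeta vignette
  content = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html")
  # inspect the returned components and the length of the summary
  str(content)
  nchar(content$focus)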
}
We then use schema-driven inference to produce associated EDAM tags; see the code in inst/curbioc in the package source.
if (nchar(Sys.getenv("OPENAI_API_KEY"))>0) {
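  # show the first 250 characters of the summary; print() is needed
  # because only the last value inside braces is auto-printed
  print(substr(content$focus, 1, 250))
  # declare and import the Python modules needed via reticulate
  pks = c("requests", "openai", "jsonschema", "tiktoken", "pandas")
  for (i in pks) reticulate::py_require(i)
  for (i in pks) reticulate::import(i)
  # infer EDAM tags from the summary and show them as an interactive table
  ans = edamize(content$focus)
  DT::datatable(mkdf(ans))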
}
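To illustrate what "schema-driven" can mean here, the following is a minimal sketch in base R of constraining returned tags to a controlled vocabulary. It is not the code in inst/curbioc: the schema layout, the three-entry vocabulary, and the names `allowed_topics`, `tag_schema` and `validate_tags` are assumptions made for illustration only.

```r
# Illustrative vocabulary of EDAM topic URIs; the vocabulary and schema
# actually used by the package live in inst/curbioc and may differ.
allowed_topics <- c(
  "http://edamontology.org/topic_3170",
  "http://edamontology.org/topic_0203",
  "http://edamontology.org/topic_3308"
)

# Hypothetical schema in JSON Schema style, expressed as an R list.
tag_schema <- list(
  type = "object",
  required = "topics",
  properties = list(
    topics = list(type = "array", items = list(enum = allowed_topics))
  )
)

# Hypothetical validator: checks required fields and enum membership only.
validate_tags <- function(response, schema) {
  missing_fields <- setdiff(schema$required, names(response))
  if (length(missing_fields) > 0) return(FALSE)
  allowed <- schema$properties$topics$items$enum
  all(unlist(response$topics) %in% allowed)
}

# A response using a listed term passes; one with an unlisted term fails.
good <- list(topics = list("http://edamontology.org/topic_3170"))
bad <- list(topics = list("http://edamontology.org/topic_9999"))
validate_tags(good, tag_schema)
validate_tags(bad, tag_schema)
```

Restricting the model's output to an enumerated vocabulary means that any returned tag can be checked mechanically, instead of relying on free-text matching against ontology labels.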