Curation of Bioconductor package metadata, targeting EDAM ontology and ELIXIR bio.tools metadata schemas

Introduction

This vignette is derived almost entirely from collaborative code supplied by Anh Nguyet Vu of Sage Bionetworks. Its purpose is to illustrate the use of OpenAI-based transformation of package documentation to systematically organize and tag the content available for Bioconductor packages.

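Code in this vignette requires that the environment variable OPENAI_API_KEY be defined. Each code chunk below is guarded by a check on this variable and is skipped when it is empty.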
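A minimal sketch of checking for the key in an R session follows. The helper name have_openai_key is hypothetical, and the key value in the commented line is a placeholder.

```r
# Hypothetical helper: TRUE when OPENAI_API_KEY is set to a non-empty string.
have_openai_key = function() nzchar(Sys.getenv("OPENAI_API_KEY"))

# Set the key for the current session only (placeholder value shown).
# A line of the form OPENAI_API_KEY=... in ~/.Renviron makes it persistent.
# Sys.setenv(OPENAI_API_KEY = "<your key>")

if (!have_openai_key()) {
  message("OPENAI_API_KEY is not set; the code chunks below will be skipped.")
}
```

The check is equivalent to the nchar(Sys.getenv("OPENAI_API_KEY"))>0 guard used in the chunks of this vignette.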

Example 1: tximeta

We start by transforming vignette content, which may be in HTML or PDF, following the structured data extraction examples given in a vignette for the ellmer package on CRAN. We prompt GPT-4o to produce a concise and objective summary of at most 450 words, which is placed in the focused component of the returned data. The code below refers to this component as content$focus, which works because R's $ operator partially matches list names.

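if (nchar(Sys.getenv("OPENAI_API_KEY"))>0) {
  library(biocEDAM)
  # Extract structured data (authors, topics, summary, scores) from the vignette
  content = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html")
  str(content)
  # Length of the summary in characters
  nchar(content$focus)
}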

## List of 5
##  $ author    : chr [1:4] "Michael I. Love" "Charlotte Soneson" "Peter F. Hickey" "Rob Patro"
##  $ topics    : chr [1:19] "RNA-seq" "transcript quantification" "genomic annotation" "metadata provenance" ...
##  $ focused   : chr "The article introduces tximeta, a Bioconductor package for R that automates annotation and metadata attachment "| __truncated__
##  $ coherence : int 95
##  $ persuasion: num 0.94

## [1] 2029
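The structured extraction step that produces a list of this shape can be sketched with ellmer as follows. This is an illustration under stated assumptions, not the code inside vig2data: the function name summarize_text and the field descriptions are hypothetical, the field names are copied from the str() output above, and chat_structured is the method name in recent ellmer releases (earlier releases called it extract_data).

```r
# Sketch only; not the biocEDAM implementation.
# Hypothetical function: ask an OpenAI model for a structured summary of text.
summarize_text = function(text, model = "gpt-4o") {
  stopifnot(requireNamespace("ellmer", quietly = TRUE),
            nzchar(Sys.getenv("OPENAI_API_KEY")))
  # Type specification: field names follow the str() output shown above;
  # the descriptions are assumptions made for this sketch.
  spec = ellmer::type_object(
    .description = "Summary of a package vignette.",
    author = ellmer::type_array(items = ellmer::type_string(),
                                description = "Names of the authors"),
    topics = ellmer::type_array(items = ellmer::type_string(),
                                description = "Specific topics covered"),
    focused = ellmer::type_string(
      "Concise, objective summary of at most 450 words"),
    coherence = ellmer::type_integer("Coherence score, 0 to 100"),
    persuasion = ellmer::type_number("Persuasion score, 0.0 to 1.0")
  )
  chat = ellmer::chat_openai(model = model)
  # Returns a named list whose elements match the fields of spec
  chat$chat_structured(text, type = spec)
}
```

Calling summarize_text on the plain text of a vignette would return a list comparable to content, although the values vary between calls.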

We then use schema-driven inference to produce associated EDAM tags; see the code in inst/curbioc in the package source.

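if (nchar(Sys.getenv("OPENAI_API_KEY"))>0) {
  # First 250 characters of the summary; not auto-printed inside the
  # if block, wrap in print() to display it
  substr(content$focus,1,250)
  # Map the summary to EDAM terms and show them as an interactive table
  ans = edamize(content$focus)
  DT::datatable(mkdf(ans))
}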

## Loading required namespace: reticulate

## Success after 0 attempts
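The chunks above access content$focus although str(content) lists the component as focused. This relies on partial matching of list names by the $ operator, a base R behavior that the following sketch illustrates with a small stand-in list (the list x is made up for this example).

```r
# Stand-in for the list returned by vig2data (values are made up)
x = list(focused = "summary text", coherence = 95L)

x$focus                       # partial match on the name "focused"
x[["focus"]]                  # NULL: [[ matches names exactly by default
x[["focus", exact = FALSE]]   # partial matching requested explicitly
x$focused                     # full name, the unambiguous form
```

Writing the full name, content$focused, avoids surprises if a later version of the returned list gains another component whose name starts with focus.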

Example 2: MSnbase

if (nchar(Sys.getenv("OPENAI_API_KEY"))>0) {
  # Extract structured data from the MSnbase development vignette
  mm = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/v05-MSnbase-development.html")
  uu = edamize(mm$focus)
  if (is.null(uu)) uu = edamize(mm$focus)  # second try
  DT::datatable(mkdf(uu))
}

## Using model = "gpt-4.1".

## Success after 0 attempts

Caveats

Sometimes edamize returns no result (NULL). This reflects nondeterminism in the GPT service we are using. A second try often succeeds, which is why Example 2 repeats the call when the first result is NULL. If you have persistent trouble, please file an issue.
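The single retry used in Example 2 can be generalized with a small helper. The sketch below is not part of biocEDAM; the name retry_until_value and its arguments are hypothetical, and errors are treated the same way as a NULL result.

```r
# Hypothetical helper: call f() until it returns a non-NULL value or
# the allowed number of tries is used up. Errors count as NULL.
retry_until_value = function(f, tries = 3, wait = 1) {
  for (i in seq_len(tries)) {
    ans = tryCatch(f(), error = function(e) NULL)
    if (!is.null(ans)) return(ans)
    if (i < tries) Sys.sleep(wait)  # pause before the next attempt
  }
  NULL  # every attempt failed
}

# Usage with the objects from Example 2 (needs OPENAI_API_KEY):
# uu = retry_until_value(function() edamize(mm$focus))
```

Keeping the number of tries small limits API cost when the failure is persistent rather than random.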

Vincent J. Carey, stvjc at channing.harvard.edu

October 18, 2025
