Use the extract_data facility described in ellmer's documentation to obtain summary information about an HTML document, tailored to Bioconductor vignettes.

Usage

vig2data(
  url = "https://bioconductor.org/packages/release/bioc/html/Voyager.html",
  maxnchar = 30000,
  n_pdf_pages = 10
)

Arguments

url: character(1) URL of an HTML Bioconductor vignette
maxnchar: numeric(1) maximum number of characters retained; the document text is truncated to a substring of this length
n_pdf_pages: numeric(1) maximum number of pages from which to extract text for PDF vignettes
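
The truncation described for maxnchar can be illustrated in base R. This is only a sketch: the helper name truncate_text is hypothetical and not part of the package, and vig2data() may implement the step differently.

```r
# Hypothetical helper (not part of the package): keep at most `maxnchar`
# characters of a single string, mirroring the documented truncation.
truncate_text <- function(txt, maxnchar = 30000) {
  stopifnot(is.character(txt), length(txt) == 1L,
            is.numeric(maxnchar), length(maxnchar) == 1L)
  # substr() returns the whole string when it is shorter than maxnchar
  substr(txt, 1L, maxnchar)
}

long_txt <- strrep("a", 50000)
nchar(truncate_text(long_txt))      # 30000
nchar(truncate_text("short text"))  # 10
```

Capping the input this way bounds how much text is passed on for summarization, at the cost of ignoring anything past the cutoff in long vignettes.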

Value

A list with components author (character vector), topics (character vector), focused (character(1)), coherence (integer), and persuasion (numeric).

Note

Based on code from https://cran.r-project.org/web/packages/ellmer/vignettes/structured-data.html as of March 15, 2025. Requires that OPENAI_API_KEY is set in the environment.
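
Because the function depends on OPENAI_API_KEY, it can help to check for the variable before calling it. The following is a minimal sketch in base R; has_openai_key is a hypothetical helper, not something the package provides.

```r
# Hypothetical pre-flight check (not part of the package): TRUE when
# OPENAI_API_KEY is set to a non-empty value in the environment.
has_openai_key <- function() {
  nzchar(Sys.getenv("OPENAI_API_KEY"))
}

if (!has_openai_key()) {
  message("OPENAI_API_KEY is not set; set it before calling vig2data().")
}
```

Sys.getenv() returns an empty string for an unset variable, so nzchar() covers both the unset and the empty case.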

Examples

if (interactive()) {
  # be sure OPENAI_API_KEY is available to Sys.getenv
  tst <- vig2data()
  str(tst)
}
#> Using model = "gpt-4.1".
#> List of 5
#>  $ author    : chr [1:5] "Lambda Moses" "Alik Huseynov" "Kayla Jackson" "Laura Luebbert" ...
#>  $ topics    : chr [1:15] "Spatial omics" "Single-cell genomics" "Spatial statistics" "Moran's I" ...
#>  $ focused   : chr "The Voyager R/Bioconductor package provides a suite of exploratory spatial data analysis (ESDA) methods specifi"| __truncated__
#>  $ coherence : int 92
#>  $ persuasion: num 0.87
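
A result like the one shown above can be checked for the five documented components before it is used downstream. The sketch below is an assumption-laden illustration: check_vig_summary and the mock object are hypothetical, and the mock holds placeholder values rather than real model output.

```r
# Hypothetical checker (not part of the package): confirm that an object
# is a list carrying the five components named in the Value section.
expected_components <- c("author", "topics", "focused", "coherence", "persuasion")

check_vig_summary <- function(x) {
  absent <- setdiff(expected_components, names(x))
  if (!is.list(x) || length(absent) > 0L) {
    stop("not a vig2data()-style result; missing: ",
         paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in object with placeholder values, used only to exercise the checker.
mock <- list(author = "A. Author", topics = "topic", focused = "summary",
             coherence = 0L, persuasion = 0)
check_vig_summary(mock)
```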