Use Anh Vu's OpenAI prompting to develop structured metadata about Bioconductor packages, targeting the EDAM ontology and the bio.tools schema.

Usage

curate_bioc(
  packageName = "chromVAR",
  devurl =
    "https://raw.githubusercontent.com/GreenleafLab/chromVAR/refs/heads/master/README.md"
)

Arguments

packageName

character(1), the name of a Bioconductor software package; its release landing page will be scraped

devurl

character(1), a URL for documentation written by the package developer, such as a raw README.md

Value

a list of two Python dicts, base_final and edam_processed

Note

Schema completion is done with temperature set to 0.0; see the edamize function for more flexibility.

Examples

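if (interactive()) {
  # the OpenAI calls need an API key in the environment
  key = Sys.getenv("OPENAI_API_KEY")
  if (!nzchar(key)) stop("need to have OPENAI_API_KEY set")
  # no arguments given, so the chromVAR defaults shown under Usage apply
  lk = curate_bioc()
  str(lk)
}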
#> Loading required namespace: reticulate
#> 
#> Warning: An ephemeral virtual environment managed by 'reticulate' is currently in use.
#> To add more packages to your current session, call `py_require()` instead
#> of `py_install()`. Running:
#>   `py_require(c("jsonschema==4.23.0", "openai==1.66.3", "pandas==2.2.3", "requests==2.32.3", "tiktoken==0.9.0"))`
#> Done!
#> List of 2
#>  $ base_final    :Dict (18 items)
#>  $ edam_processed:{'topic': [{'term': 'Epigenomics', 'uri': 'http://edamontology.org/topic_3173'}, {'term': 'Functional genomics', 'uri': 'http://edamontology.org/topic_0085'}, {'term': 'Gene regulation', 'uri': 'http://edamontology.org/topic_0204'}, {'term': 'Bioinformatics', 'uri': 'http://edamontology.org/topic_0091'}], 'function': [{'operation': [{'term': 'Sequence motif discovery', 'uri': 'http://edamontology.org/operation_0238'}, {'term': 'Sequence motif recognition', 'uri': 'http://edamontology.org/operation_0239'}, {'term': 'Gene expression profiling', 'uri': 'http://edamontology.org/operation_0314'}, {'term': 'Differential binding analysis', 'uri': 'http://edamontology.org/operation_3677'}], 'input': [{'data': {'term': 'Sequence record', 'uri': 'http://edamontology.org/data_0849'}, 'format': [{'term': 'BED', 'uri': 'http://edamontology.org/format_3003'}, {'term': 'BAM', 'uri': 'http://edamontology.org/format_2572'}]}], 'output': [{'data': {'term': 'Gene expression profile', 'uri': 'http://edamontology.org/data_0928'}, 'format': [{'term': 'CSV', 'uri': 'http://edamontology.org/format_3752'}]}]}]}
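The nested edam_processed structure printed above can be awkward to read in one line. The following is a minimal sketch of tabulating its term/URI pairs in base R. It assumes the dict has already been converted to a nested R list (for example with reticulate::py_to_r); the `edam` object below is a hand-built stand-in that copies a few values from the output above, and `terms_to_df` is a hypothetical helper, not part of the package.

```r
# Hand-built stand-in for edam_processed after conversion to an R list
# (an assumption; values copied from the example output above).
edam <- list(
  topic = list(
    list(term = "Epigenomics", uri = "http://edamontology.org/topic_3173"),
    list(term = "Functional genomics", uri = "http://edamontology.org/topic_0085")
  ),
  `function` = list(
    list(
      operation = list(
        list(term = "Sequence motif discovery",
             uri = "http://edamontology.org/operation_0238")
      )
    )
  )
)

# Hypothetical helper: flatten a list of term/uri pairs into a data.frame.
terms_to_df <- function(x) {
  data.frame(
    term = vapply(x, function(e) e$term, character(1)),
    uri = vapply(x, function(e) e$uri, character(1)),
    stringsAsFactors = FALSE
  )
}

topics <- terms_to_df(edam$topic)
# "function" is a reserved word in R, so index it with [[ ]] rather than $
operations <- terms_to_df(edam[["function"]][[1]]$operation)

print(topics)
print(operations)
```

The same helper should apply to the format entries under input and output, since they follow the same term/uri layout in the printed result.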