The NCI GDC has a complex data model that lets individual studies supply numerous clinical and demographic data elements. Across all projects that enter the GDC, however, there are similarities. This function returns four data.frames associated with the supplied case_ids from the GDC.
Arguments
- case_ids
a character() vector of case_ids, typically from a "cases" query.
- include_list_cols
logical(1), whether to include list columns in the "main" data.frame. These list columns hold values for aliquots, samples, etc. They may be useful in some situations, but they are generally of limited use as clinical annotations.
Value
A list of four data.frames:
main, representing basic case identification and metadata (update date, etc.)
diagnoses
exposures
demographic
Details
Note that these data.frames can, in general, have different numbers of rows (or even no rows at all). To combine them into a single data.frame, left joining each one to the "main" data.frame yields a useful result. The function does not do this itself because of the potential for 1:many relationships (for example, more than one diagnosis record per case). It is up to the user to determine the best approach for any given dataset.
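The left-join approach described above can be sketched with base R's merge(). This is a minimal illustration, not the package's own method: the two data.frames below are small made-up stand-ins for clinical_data$main and clinical_data$diagnoses, and their values are hypothetical.

```r
# Toy stand-ins for the "main" and "diagnoses" data.frames (hypothetical values).
main <- data.frame(
  case_id = c("case-1", "case-2", "case-3"),
  primary_site = c("Kidney", "Kidney", "Kidney"),
  stringsAsFactors = FALSE
)
diagnoses <- data.frame(
  case_id = c("case-1", "case-1", "case-2"),
  morphology = c("8960/3", "8960/3", "8960/3"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of `main` and add diagnosis columns where present.
combined <- merge(main, diagnoses, by = "case_id", all.x = TRUE)

# case-1 has two diagnosis rows, so it appears twice in the result;
# case-3 has none, so its diagnosis columns are filled with NA.
nrow(combined)
table(combined$case_id)
```

With real results, the data.frames may share columns other than case_id (the example output below shows submitter_id and state in more than one of them); merge() disambiguates such columns with .x and .y suffixes, so it may be worth selecting the columns of interest before joining.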
Examples
case_ids = cases() %>% results(size=10) %>% ids()
clinical_data = gdc_clinical(case_ids)
# overview of clinical results
class(clinical_data)
#> [1] "GDCClinicalList" "list"
names(clinical_data)
#> [1] "demographic" "diagnoses" "exposures" "main"
sapply(clinical_data, class)
#> demographic diagnoses exposures main
#> [1,] "tbl_df" "tbl_df" "tbl_df" "tbl_df"
#> [2,] "tbl" "tbl" "tbl" "tbl"
#> [3,] "data.frame" "data.frame" "data.frame" "data.frame"
sapply(clinical_data, nrow)
#> demographic diagnoses exposures main
#> 10 10 0 10
# available data
head(clinical_data$main)
#> # A tibble: 6 × 7
#> id disea…¹ submi…² prima…³ updat…⁴ case_id state
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 b9a32a1c-9c93-5a92-8b30-e09a91d… Comple… TARGET… Kidney 2019-0… b9a32a… rele…
#> 2 c2829ab9-d5b2-5a82-a134-de9c591… Comple… TARGET… Kidney 2019-0… c2829a… rele…
#> 3 f5548317-3be4-5227-a655-dfb97e6… Comple… TARGET… Kidney 2019-0… f55483… rele…
#> 4 cf3bd8c5-4cd6-57c6-b07a-f20d414… Comple… TARGET… Kidney 2019-0… cf3bd8… rele…
#> 5 eaffceb7-3b14-5b19-a7f0-a43bd8c… Comple… TARGET… Kidney 2019-0… eaffce… rele…
#> 6 c07901d8-2829-5e98-9e8a-e12faaf… Comple… TARGET… Kidney 2019-0… c07901… rele…
#> # … with abbreviated variable names ¹disease_type, ²submitter_id,
#> # ³primary_site, ⁴updated_datetime
head(clinical_data$demographic)
#> # A tibble: 6 × 10
#> demograph…¹ race gender ethni…² vital…³ updat…⁴ submi…⁵ state creat…⁶ case_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 6eec0bff-2… white female not re… Alive 2019-0… TARGET… rele… 2017-0… b9a32a…
#> 2 25d5d300-a… white male not re… Alive 2019-0… TARGET… rele… 2017-0… c2829a…
#> 3 4b79eb03-2… white male not hi… Alive 2019-0… TARGET… rele… 2017-0… f55483…
#> 4 77a2621d-4… white male not re… Alive 2019-0… TARGET… rele… 2017-0… cf3bd8…
#> 5 dfe7b3aa-1… white female not hi… Alive 2019-0… TARGET… rele… 2017-0… eaffce…
#> 6 f7ae4449-8… white male not hi… Alive 2019-0… TARGET… rele… 2017-0… c07901…
#> # … with abbreviated variable names ¹demographic_id, ²ethnicity, ³vital_status,
#> # ⁴updated_datetime, ⁵submitter_id, ⁶created_datetime
head(clinical_data$diagnoses)
#> # A tibble: 6 × 19
#> case_id days_…¹ morph…² submi…³ created_datetime last_…⁴ tissu…⁵ days_…⁶
#> <chr> <lgl> <chr> <chr> <dttm> <chr> <chr> <dbl>
#> 1 b9a32a1c-… NA 8960/3 TARGET… 2017-02-25 02:55:58 not re… Kidney… 3828
#> 2 c2829ab9-… NA 8960/3 TARGET… 2017-02-25 02:57:38 not re… Kidney… 3706
#> 3 f5548317-… NA 8960/3 TARGET… 2017-02-25 02:56:34 not re… Kidney… 2717
#> 4 cf3bd8c5-… NA 8960/3 TARGET… 2017-02-25 02:55:01 not re… Kidney… 4695
#> 5 eaffceb7-… NA 8960/3 TARGET… 2017-02-25 02:57:07 not re… Kidney… 3954
#> 6 c07901d8-… NA 8960/3 TARGET… 2017-02-25 03:01:15 not re… Kidney… 1918
#> # … with 11 more variables: age_at_diagnosis <int>, primary_diagnosis <chr>,
#> # updated_datetime <dttm>, diagnosis_id <chr>, year_of_diagnosis <dbl>,
#> # cog_renal_stage <chr>, site_of_resection_or_biopsy <chr>, state <chr>,
#> # tumor_grade <chr>, days_to_last_known_disease_status <lgl>,
#> # progression_or_recurrence <chr>, and abbreviated variable names
#> # ¹days_to_recurrence, ²morphology, ³submitter_id,
#> # ⁴last_known_disease_status, ⁵tissue_or_organ_of_origin, …
head(clinical_data$exposures)
#> # A tibble: 0 × 0