The NCI GDC has a complex data model that allows individual studies to supply numerous clinical and demographic data elements. Even so, the projects that enter the GDC share some common structure. This function returns four data.frames of clinical data associated with the supplied case_ids.

Usage

gdc_clinical(case_ids, include_list_cols = FALSE)

Arguments

case_ids

a character() vector of case_ids, typically from a "cases" query.

include_list_cols

logical(1), whether to include list columns in the "main" data.frame. These list columns hold values for aliquots, samples, etc. They may be useful in some situations, but they are generally of limited use as clinical annotations.

Value

A list of four data.frames:

  1. main, representing basic case identification and metadata (update date, etc.)

  2. diagnoses

  3. exposures

  4. demographic

Details

Note that these data.frames can, in general, have different numbers of rows (or even no rows at all). To combine them into a single data.frame, left joining each one to the "main" data.frame will yield a useful result. The function does not do this itself because of the potential for 1:many relationships between a case and its records; it is up to the user to determine the best approach for any given dataset.
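The left join described above can be sketched with base R's merge(). This is a minimal illustration, not the package's own method: the two data.frames below are invented stand-ins for the "main" and "diagnoses" elements, and the join key is assumed to be the case_id column that both tables carry in the example output on this page.

```r
# Toy stand-ins for clinical_data$main and clinical_data$diagnoses.
# All values are invented for illustration only.
main <- data.frame(
  case_id      = c("case-a", "case-b", "case-c"),
  primary_site = c("Kidney", "Kidney", "Kidney"),
  state        = "released",
  stringsAsFactors = FALSE
)
diagnoses <- data.frame(
  case_id    = c("case-a", "case-a", "case-b"),  # case-a has two diagnoses
  morphology = c("8960/3", "8960/3", "8960/3"),
  state      = "released",
  stringsAsFactors = FALSE
)

# Left join: keep every row of `main` (all.x = TRUE). Columns that share a
# name in both tables (here `state`) get a suffix so they stay distinguishable.
combined <- merge(
  main, diagnoses,
  by = "case_id", all.x = TRUE,
  suffixes = c(".main", ".diagnoses")
)

# A case with several diagnoses appears on several rows; a case with none
# keeps one row with NA in the diagnosis columns.
table(combined$case_id)
```

Because of the 1:many relationship, the combined table can have more rows than "main", so any per-case summary computed afterwards should account for the repeated cases.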

Examples

library(GenomicDataCommons)
# fetch the ids of ten cases, then retrieve their clinical data
case_ids = cases() %>% results(size = 10) %>% ids()
clinical_data = gdc_clinical(case_ids)

# overview of clinical results
class(clinical_data)
#> [1] "GDCClinicalList" "list"           
names(clinical_data)
#> [1] "demographic" "diagnoses"   "exposures"   "main"       
sapply(clinical_data, class)
#>      demographic  diagnoses    exposures    main        
#> [1,] "tbl_df"     "tbl_df"     "tbl_df"     "tbl_df"    
#> [2,] "tbl"        "tbl"        "tbl"        "tbl"       
#> [3,] "data.frame" "data.frame" "data.frame" "data.frame"
sapply(clinical_data, nrow)
#> demographic   diagnoses   exposures        main 
#>          10          10           0          10 

# available data
head(clinical_data$main)
#> # A tibble: 6 × 7
#>   id                               disea…¹ submi…² prima…³ updat…⁴ case_id state
#>   <chr>                            <chr>   <chr>   <chr>   <chr>   <chr>   <chr>
#> 1 b9a32a1c-9c93-5a92-8b30-e09a91d… Comple… TARGET… Kidney  2019-0… b9a32a… rele…
#> 2 c2829ab9-d5b2-5a82-a134-de9c591… Comple… TARGET… Kidney  2019-0… c2829a… rele…
#> 3 f5548317-3be4-5227-a655-dfb97e6… Comple… TARGET… Kidney  2019-0… f55483… rele…
#> 4 cf3bd8c5-4cd6-57c6-b07a-f20d414… Comple… TARGET… Kidney  2019-0… cf3bd8… rele…
#> 5 eaffceb7-3b14-5b19-a7f0-a43bd8c… Comple… TARGET… Kidney  2019-0… eaffce… rele…
#> 6 c07901d8-2829-5e98-9e8a-e12faaf… Comple… TARGET… Kidney  2019-0… c07901… rele…
#> # … with abbreviated variable names ¹​disease_type, ²​submitter_id,
#> #   ³​primary_site, ⁴​updated_datetime
head(clinical_data$demographic)
#> # A tibble: 6 × 10
#>   demograph…¹ race  gender ethni…² vital…³ updat…⁴ submi…⁵ state creat…⁶ case_id
#>   <chr>       <chr> <chr>  <chr>   <chr>   <chr>   <chr>   <chr> <chr>   <chr>  
#> 1 6eec0bff-2… white female not re… Alive   2019-0… TARGET… rele… 2017-0… b9a32a…
#> 2 25d5d300-a… white male   not re… Alive   2019-0… TARGET… rele… 2017-0… c2829a…
#> 3 4b79eb03-2… white male   not hi… Alive   2019-0… TARGET… rele… 2017-0… f55483…
#> 4 77a2621d-4… white male   not re… Alive   2019-0… TARGET… rele… 2017-0… cf3bd8…
#> 5 dfe7b3aa-1… white female not hi… Alive   2019-0… TARGET… rele… 2017-0… eaffce…
#> 6 f7ae4449-8… white male   not hi… Alive   2019-0… TARGET… rele… 2017-0… c07901…
#> # … with abbreviated variable names ¹​demographic_id, ²​ethnicity, ³​vital_status,
#> #   ⁴​updated_datetime, ⁵​submitter_id, ⁶​created_datetime
head(clinical_data$diagnoses)
#> # A tibble: 6 × 19
#>   case_id    days_…¹ morph…² submi…³ created_datetime    last_…⁴ tissu…⁵ days_…⁶
#>   <chr>      <lgl>   <chr>   <chr>   <dttm>              <chr>   <chr>     <dbl>
#> 1 b9a32a1c-… NA      8960/3  TARGET… 2017-02-25 02:55:58 not re… Kidney…    3828
#> 2 c2829ab9-… NA      8960/3  TARGET… 2017-02-25 02:57:38 not re… Kidney…    3706
#> 3 f5548317-… NA      8960/3  TARGET… 2017-02-25 02:56:34 not re… Kidney…    2717
#> 4 cf3bd8c5-… NA      8960/3  TARGET… 2017-02-25 02:55:01 not re… Kidney…    4695
#> 5 eaffceb7-… NA      8960/3  TARGET… 2017-02-25 02:57:07 not re… Kidney…    3954
#> 6 c07901d8-… NA      8960/3  TARGET… 2017-02-25 03:01:15 not re… Kidney…    1918
#> # … with 11 more variables: age_at_diagnosis <int>, primary_diagnosis <chr>,
#> #   updated_datetime <dttm>, diagnosis_id <chr>, year_of_diagnosis <dbl>,
#> #   cog_renal_stage <chr>, site_of_resection_or_biopsy <chr>, state <chr>,
#> #   tumor_grade <chr>, days_to_last_known_disease_status <lgl>,
#> #   progression_or_recurrence <chr>, and abbreviated variable names
#> #   ¹​days_to_recurrence, ²​morphology, ³​submitter_id,
#> #   ⁴​last_known_disease_status, ⁵​tissue_or_organ_of_origin, …
head(clinical_data$exposures)
#> # A tibble: 0 × 0
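The exposures table above is empty and has no columns at all, so it carries no case_id to join on. One way to handle this before combining, sketched here under assumptions, is to drop any element that lacks rows or a case_id column. The list `clinical_toy` and its values are hypothetical stand-ins for the result of gdc_clinical().

```r
# Hypothetical stand-in for a gdc_clinical() result; values are invented.
clinical_toy <- list(
  main        = data.frame(case_id = c("case-a", "case-b"),
                           stringsAsFactors = FALSE),
  demographic = data.frame(case_id = c("case-a", "case-b"),
                           gender  = c("female", "male"),
                           stringsAsFactors = FALSE),
  exposures   = data.frame()  # 0 rows, 0 columns, like the output above
)

# Keep only elements that have rows and a case_id column to join on.
has_key <- vapply(
  clinical_toy,
  function(df) nrow(df) > 0 && "case_id" %in% names(df),
  logical(1)
)
joinable <- clinical_toy[has_key]

# Left join every remaining table onto "main", one at a time.
others <- joinable[setdiff(names(joinable), "main")]
combined <- Reduce(
  function(x, y) merge(x, y, by = "case_id", all.x = TRUE),
  others,
  joinable$main
)
names(combined)
```

Whether to join all tables at once or keep them separate remains a judgment call for each dataset, since every 1:many table multiplies the rows of the result.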