1 Introduction

In this tutorial we walk through a typical spatial transcriptomics analysis using Bioconductor packages.

Spatial transcriptomics is a fast-evolving set of technologies, and we cannot cover all protocols here. Instead, we show one example of a spot-based protocol, 10x Genomics Visium, and one example of an imaging-based method, NanoString CosMx. Note that this is an area of active development, and some methods will likely be improved in the coming months.

Most of the steps covered here, especially those for spot-based data, are described in the Best Practices ST book.

The Voyager Bioconductor package (Moses et al. 2023) also has an extensive set of tutorials that cover most of the currently available spatial transcriptomics technologies.

While not covered in this tutorial, there are also packages and software tools for the analysis of spatial transcriptomics data outside of Bioconductor. Popular tools include the Seurat R package, the Giotto R package, and the SpatialData Python package.

2 Visium Data

We will initially focus on 10x Genomics Visium. We start from the output of Space Ranger, the 10x Genomics software suite that pre-processes the FASTQ files generated by the sequencing platform and performs alignment and quantification. We will perform exploratory data analysis (EDA) and quality control (QC). We will then cover normalization, the identification of spatially variable genes, dimensionality reduction, and cell type identification.

2.1 Experimental data

We will use one sample of human brain from the dorsolateral prefrontal cortex (DLPFC) region, measured using the 10x Genomics Visium platform.

In the full dataset, there are 12 samples in total, from 3 individuals, with 2 pairs of spatially adjacent replicates (serial sections) per individual (4 samples per individual). The individuals and spatially adjacent replicates can be used as blocking factors. Each sample spans the six layers of the cortex plus white matter in a perpendicular tissue section. For the examples in this workflow we use a single sample from this dataset (sample 151673).

For more details on the dataset, see Maynard et al. (2021). The full dataset is publicly available through the spatialLIBD Bioconductor package.

2.2 The SpatialExperiment class

SpatialExperiment is an S4 class that extends SingleCellExperiment and can be used for efficiently storing and working with spatial data in R/Bioconductor.

This class is itself extended by MoleculeExperiment and SpatialFeatureExperiment, which make it easier to work with imaging-based data. For the moment, we will use SpatialExperiment.

A more thorough overview of SpatialExperiment can be found in Righelli et al. (2022).
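To make the structure of the class concrete, the following minimal sketch builds a toy object from simulated counts. The gene and spot names, the coordinate columns x and y, and the in_tissue values are all made up for illustration; only the SpatialExperiment() constructor and its spatialCoordsNames argument come from the package.

```r
library(SpatialExperiment)

# Toy count matrix: 4 hypothetical genes (rows) by 5 hypothetical spots (columns)
set.seed(1)
counts <- matrix(
  rpois(20, lambda = 5), nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("spot", 1:5))
)

# Per-spot annotation, including made-up spatial coordinates x and y
spot_info <- DataFrame(
  in_tissue = c(0L, 1L, 1L, 1L, 1L),
  x = c(10, 20, 30, 40, 50),
  y = c(5, 15, 25, 35, 45),
  row.names = colnames(counts)
)

# The columns listed in spatialCoordsNames are stored as spatial coordinates
toy <- SpatialExperiment(
  assays = list(counts = counts),
  colData = spot_info,
  spatialCoordsNames = c("x", "y")
)

toy
spatialCoords(toy)  # numeric matrix with one row per spot
colData(toy)        # remaining per-spot annotation
```

The columns named in spatialCoordsNames are moved out of colData and into spatialCoords, which is the same layout used by the real dataset.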

We start by loading the data.

library(SpatialExperiment)
library(STexampleData)
spe <- Visium_humanDLPFC()
spe
## class: SpatialExperiment 
## dim: 33538 4992 
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(8): barcode_id sample_id ... reference cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor

The SpatialExperiment class should be fairly familiar, since it is heavily based on the SingleCellExperiment class. In addition to the slots that you already know, a SpatialExperiment object includes the spatialCoords and the imgData slots.

head(spatialCoords(spe))
##                    pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1               3913               2435
## AAACAAGTATCTCCCA-1               9791               8468
## AAACAATCTACTAGCA-1               5769               2807
## AAACACCAATAACTGC-1               4068               9505
## AAACAGAGCGACTCCT-1               9271               4151
## AAACAGCTTTCAGAAG-1               3393               7583
imgData(spe)
## DataFrame with 2 rows and 4 columns
##       sample_id    image_id   data scaleFactor
##     <character> <character> <list>   <numeric>
## 1 sample_151673      lowres   ####   0.0450045
## 2 sample_151673       hires   ####   0.1500150
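
The scaleFactor column relates the two slots: the spatial coordinates are expressed in pixels of the full-resolution image, and multiplying them by an image's scale factor gives the corresponding position on that lower-resolution image. The sketch below illustrates the arithmetic in base R, with coordinates and scale factors copied by hand from the output above; the object names are hypothetical.

```r
# Full-resolution pixel coordinates of the first three spots,
# copied from the output of head(spatialCoords(spe))
xy <- rbind(
  "AAACAACGAATAGTTC-1" = c(3913, 2435),
  "AAACAAGTATCTCCCA-1" = c(9791, 8468),
  "AAACAATCTACTAGCA-1" = c(5769, 2807)
)
colnames(xy) <- c("pxl_col_in_fullres", "pxl_row_in_fullres")

# Scale factors copied from the output of imgData(spe)
scale_lowres <- 0.0450045
scale_hires <- 0.1500150

# Positions of the same spots on the lowres and hires images
xy_lowres <- xy * scale_lowres
xy_hires <- xy * scale_hires

round(xy_lowres, 1)
round(xy_hires, 1)
```

With the real object, the scale factors do not need to be typed in by hand; they can be read from the scaleFactor column of imgData(spe).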

Note that the colData DataFrame is still where most of the useful information about the spots is stored.

colData(spe)
## DataFrame with 4992 rows and 8 columns
##                            barcode_id     sample_id in_tissue array_row
##                           <character>   <character> <integer> <integer>
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673         0         0
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673         1        50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673         1         3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673         1        59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673         1        14
## ...                               ...           ...       ...       ...
## TTGTTTCACATCCAGG-1 TTGTTTCACATCCAGG-1 sample_151673         1        58
## TTGTTTCATTAGTCTA-1 TTGTTTCATTAGTCTA-1 sample_151673         1        60
## TTGTTTCCATACAACT-1 TTGTTTCCATACAACT-1 sample_151673         1        45
## TTGTTTGTATTACACG-1 TTGTTTGTATTACACG-1 sample_151673         1        73
## TTGTTTGTGTAAATTC-1 TTGTTTGTGTAAATTC-1 sample_151673         1         7
##                    array_col ground_truth   reference cell_count
##                    <integer>  <character> <character>  <integer>
## AAACAACGAATAGTTC-1        16           NA          NA         NA
## AAACAAGTATCTCCCA-1       102       Layer3      Layer3          6
## AAACAATCTACTAGCA-1        43       Layer1      Layer1         16
## AAACACCAATAACTGC-1        19           WM          WM          5
## AAACAGAGCGACTCCT-1        94       Layer3      Layer3          2
## ...                      ...          ...         ...        ...
## TTGTTTCACATCCAGG-1        42           WM          WM          3
## TTGTTTCATTAGTCTA-1        30           WM          WM          4
## TTGTTTCCATACAACT-1        27       Layer6      Layer6          3
## TTGTTTGTATTACACG-1        41           WM          WM         16
## TTGTTTGTGTAAATTC-1        51       Layer2      Layer2          5

In this case, we have a ground truth annotation of the spots, as well as an indication of the number of cells covered by each spot.
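
A quick way to get a feel for these columns is to tabulate them. The following sketch uses only base R summaries on the colData columns shown above; it reloads the object so that the block runs on its own, and the on_tissue name is a hypothetical helper.

```r
library(SpatialExperiment)
library(STexampleData)

# Reloaded here only so that this block is self-contained
spe <- Visium_humanDLPFC()

# Number of spots off tissue (0) and on tissue (1)
table(colData(spe)$in_tissue)

# Number of spots per annotated layer, counting missing annotations too
table(colData(spe)$ground_truth, useNA = "ifany")

# Median number of cells per spot in each layer, on-tissue spots only
on_tissue <- colData(spe)$in_tissue == 1
tapply(
  colData(spe)$cell_count[on_tissue],
  colData(spe)$ground_truth[on_tissue],
  median, na.rm = TRUE
)
```

Spots with a missing ground truth label are dropped from the tapply() summary, since tapply() ignores NA groups.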

2.3 Visualization

One nice feature of spatial transcriptomics is that we can “see” the tissue under study and visualize the spots on top of the histology image. We will use the ggspavis package to do so.

library(ggspavis)

plotVisium(spe)
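
As a complement, the spot coordinates can also be plotted directly, without the histology image. The sketch below does not use ggspavis: it is a plain ggplot2 scatter plot of the on-tissue spots, coloured by the ground truth annotation. The df and p names are hypothetical, and the object is reloaded so that the block runs on its own.

```r
library(SpatialExperiment)
library(STexampleData)
library(ggplot2)

# Reloaded here only so that this block is self-contained
spe <- Visium_humanDLPFC()

# One row per spot: full-resolution pixel coordinates plus the layer label
df <- data.frame(
  spatialCoords(spe),
  ground_truth = colData(spe)$ground_truth
)

# Keep only the spots that overlap the tissue
df <- df[colData(spe)$in_tissue == 1, ]

p <- ggplot(df, aes(x = pxl_col_in_fullres, y = pxl_row_in_fullres,
                    colour = ground_truth)) +
  geom_point(size = 0.5) +
  coord_fixed() +      # same scale on both axes, so the tissue is not distorted
  scale_y_reverse() +  # image rows increase downwards
  theme_void()
p
```

The y axis is reversed because pixel rows are counted from the top of the image, so that the orientation matches the histology image.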