STalign: Alignment of spatial transcriptomics data using diffeomorphic metric mapping (Q10348)

Dataset published at Zenodo repository.

Language	Label	Description	Also known as
default for all languages	No label defined
English	STalign: Alignment of spatial transcriptomics data using diffeomorphic metric mapping	Dataset published at Zenodo repository.

Statements

instance of

data set

0 references

description

Spatial transcriptomics (ST) technologies enable high-throughput gene expression characterization within thin tissue sections. However, comparing spatial observations across sections, samples, and technologies remains challenging. To address this challenge, we developed STalign to align ST datasets in a manner that accounts for partially matched tissue sections and other local non-linear distortions using diffeomorphic metric mapping. We apply STalign to align ST datasets within and across technologies as well as to align ST datasets to a 3D common coordinate framework. We show that STalign achieves high gene expression and cell-type correspondence across matched spatial locations that is significantly improved over landmark-based affine alignments. Applying STalign to align ST datasets of the mouse brain to the 3D common coordinate framework from the Allen Brain Atlas, we highlight how STalign can be used to lift over brain region annotations and enable the interrogation of compositional heterogeneity across anatomical structures. STalign is available as an open-source Python toolkit at https://github.com/JEFworks-Lab/STalign and as supplementary software with additional documentation and tutorials available at https://jef.works/STalign.
Here we have included alignment results that were used in the performance analysis of STalign. We aligned Slice 2 Replicate 3 (S2R3) to Slice 2 Replicate 2 (S2R2) of the MERFISH mouse coronal brain sections available from Vizgen Data Release V1.0, May 2021 (https://info.vizgen.com/mouse-brain-map). STalign_S2R3_to_S2R2.csv.gz contains cell ids, original cell centroid positions of S2R3, cell positions of S2R3 after alignment to S2R2 with STalign, cell positions of S2R3 after supervised affine alignment to S2R2, and counts for genes and blanks. STalign_S2R2.csv.gz contains cell ids, cell centroid positions of S2R2, and counts for genes and blanks.
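As a rough illustration of how the alignment files described above might be inspected, the following sketch reads a gzipped CSV and measures how far each cell moved between its original and STalign-aligned position. The column names (cell_id, x_original, y_original, x_stalign, y_stalign) are hypothetical placeholders, not the actual headers of the shared files, and should be replaced after checking the real header row. Cell ids are deliberately kept as strings so that long numeric ids are not rounded.

```python
import csv
import gzip
import math

# Hypothetical column names: the real headers in the shared files may differ.
ORIGINAL_COLS = ("x_original", "y_original")
ALIGNED_COLS = ("x_stalign", "y_stalign")


def read_positions(path, id_col="cell_id"):
    """Return {cell id: (original xy, aligned xy)} from a gzipped CSV file."""
    positions = {}
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            original = tuple(float(row[c]) for c in ORIGINAL_COLS)
            aligned = tuple(float(row[c]) for c in ALIGNED_COLS)
            # Keep the id as a string so large numeric ids are not rounded.
            positions[row[id_col]] = (original, aligned)
    return positions


def displacement(original, aligned):
    """Euclidean distance between a cell's original and aligned position."""
    return math.dist(original, aligned)
```

With the real column names filled in, calling read_positions on STalign_S2R3_to_S2R2.csv.gz and applying displacement to each entry would give a per-cell shift that could be compared against the affine-aligned positions in the same way.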
Additionally, we aligned Slice 2 Replicate 3 to a Visium dataset of an FFPE-preserved adult mouse brain, obtained from the 10x Genomics Datasets website (Spatial Gene Expression Dataset by Space Ranger 1.3.0, https://www.10xgenomics.com/resources/datasets/adult-mouse-brain-ffpe-1-standard-1-3-0). STalign_S2R3_to_Visium.csv.gz contains cell ids, original cell centroid positions of S2R3, cell positions of S2R3 after alignment to the Visium H&E staining with STalign, and counts for genes and blanks.
Furthermore, we performed alignments with the 50 µm resolution 3D Allen Reference Atlas Nissl common coordinate framework (CCF) (https://help.brain-map.org/display/mouseconnectivity/API). We applied STalign to align the Allen CCF to each of the 9 MERFISH slices (3 slice locations with 3 biological replicates) provided by Vizgen. Because the Allen CCF has annotated brain regions, we were able to lift over those brain region annotations to label all cells in the MERFISH datasets. Also, since the STalign mappings from the Allen CCF to the MERFISH slices are invertible, for each slice we can apply the inverse of the mapping to get cell positions in Allen CCF coordinates. STalign_SXRX_with_structure_id_name.csv.gz contains cell ids for Slice X Replicate X, original cell centroid positions, cell xyz-coordinates in the Allen CCF, brain structure id per cell, and brain structure acronym.
To evaluate the 3D CCF alignment, we performed unified transcriptional clustering analysis and cell-type annotation. All MERFISH datasets were combined. Transcriptional clustering analysis and cell-type annotation were performed using the SCANPY package (version 1.9.1). Data were normalized to counts per million (scanpy: normalize_total) and log transformed (scanpy: log1p). PCA (scanpy: pca) was computed on the cell-by-gene matrix.
A neighborhood graph of cells was created using the top 10 PCs and 10 nearest neighbors (scanpy: neighbors), and Leiden clustering was performed on this graph (scanpy: leiden) to identify 29 clusters. Differentially expressed genes were extracted from each cluster (scanpy: rank_genes_groups), and cell types were annotated based on marker genes in each cluster. STalign_celltypeannotations_merfishslices_v2.csv.gz contains cell ids and cell-type annotations for all nine slices. This updated (v2) cell-type annotation file contains a new column with simplified cell types. Briefly, we fixed typos, standardized lower case/upper case formats, and merged the subclasses of each cell type. For example, subclasses of astrocytes, originally labeled Astrocytes, Astrocytes(1), Astrocytes(2), and Astrocytes(3), are all labeled Astrocytes in the added column.
Note: Cell ids may have been mutated from the original string of numbers through reading and writing across programming languages that handle numbers with different precision. If using R to read the files shared here, one can match the cells in STalign_celltypeannotations_merfishslices_v2.csv.gz to those in STalign_SXRX_with_structure_id_name.csv.gz when cell ids are formatted as doubles in scientific notation, which is how R reads the files by default.
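The subclass merging described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: it assumes subclass labels differ from the parent label only by a trailing "(n)" suffix, as in the Astrocytes example, and it does not attempt the typo fixes or case standardization. The function name simplify_cell_type is hypothetical.

```python
import re

# Matches a trailing numeric subclass suffix such as "(1)" or " (12)".
_SUBCLASS_SUFFIX = re.compile(r"\s*\(\d+\)\s*$")


def simplify_cell_type(label):
    """Drop a trailing "(n)" subclass suffix and surrounding whitespace."""
    return _SUBCLASS_SUFFIX.sub("", label).strip()


labels = ["Astrocytes", "Astrocytes(1)", "Astrocytes(2)", "Astrocytes(3)"]
simplified = sorted({simplify_cell_type(label) for label in labels})
print(simplified)  # ['Astrocytes']
```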

0 references

publication date

28 February 2024

0 references
