HTA DATA TYPES AND STANDARDS
Contents
Data Types | Data Standards | Data Levels
Data Types
The HTA is committed to ensuring access to spatial profiling data according to emerging FAIR standards (Findability, Accessibility, Interoperability, and Reusability). However, few well-recognized public standards and repositories exist for these data, so access to different types of data remains a work in progress.
HTA data currently comprises:
- High-plex whole slide image data collected using cyclic immunofluorescence (CyCIF), ORION, and 3D CyCIF
- Spatial feature tables computed from high-plex image data describing marker intensity and morphological features at a single cell level
- Dissociative mRNA-Seq data
- Transcriptional profiles obtained using micro-region transcript profiling methods such as GeoMx
Data Standards
Imaging data
Bio-Formats and OME-TIFF, or their next-generation file format (NGFF) replacements
Transcriptional data
Distributed via GEO using well-established approaches
Highly Multiplexed Tissue Images
We developed a standard for multiplexed tissue imaging called Minimum Information about highly multiplexed Tissue Imaging (MITI).
Data Levels
MITI adopted the concept of Data Levels from dbGaP to manage image data and the corresponding spatial profiles. Higher data levels are more processed (see the MITI publication for additional details and background information).
Level 1 Data
Whole slide imaging usually involves acquisition of ~100 to 1,000 individual image tiles, each collected from a different X and Y location.
Level 2 Data
Level 2 data are full-resolution images that have undergone automated stitching, registration, illumination correction, background subtraction, and intensity normalization, and that have been stored in a standardized OME format.
To achieve this, Level 1 image tiles are combined at sub-pixel accuracy into a mosaic image in a process known as stitching. When high-plex images are assembled from multiple rounds of lower-plex imaging, it is also necessary to register channels to each other across imaging cycles and to correct for any unevenness in illumination (so-called flat fielding). Stitched and registered mosaics can be as large as 50,000 x 50,000 pixels x 100 channels, requiring ~500 GB of disk space.
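The ~500 GB figure can be checked with simple arithmetic. The sketch below assumes uncompressed storage at 16 bits (2 bytes) per pixel per channel; that bit depth is an assumption made for illustration and is not stated in the text above.

```python
def mosaic_size_bytes(width_px, height_px, channels, bytes_per_sample=2):
    """Uncompressed size of a multi-channel mosaic image.

    bytes_per_sample=2 assumes 16-bit pixels (an assumption for this sketch).
    """
    return width_px * height_px * channels * bytes_per_sample


# Largest mosaic quoted above: 50,000 x 50,000 pixels x 100 channels.
size = mosaic_size_bytes(50_000, 50_000, 100)
size_gb = size / 1e9  # decimal gigabytes
print(f"{size_gb:.0f} GB")
```

Under these assumptions the result matches the quoted figure; compression or a different bit depth would change it.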
Level 3 Data
Level 3 data represent images that have been processed with some interpretive intent and are intended to be the primary type of image data distributed by tissue atlases and similar projects. Interpretive intent may include (i) full-resolution images following quality control or artifact removal, (ii) segmentation masks computed from such images, (iii) machine-generated spatial models, and (iv) images with human or machine-generated annotations.
Level 4 Data
Level 4 data comprise features derived from Level 3 images, most commonly single-cell features in “spatial feature tables” that describe marker intensities, cell coordinates, and other single-cell features.
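To make the idea of a spatial feature table concrete, the following is a minimal sketch of how one row per cell could be derived from a segmentation mask and per-channel intensity images. It is an illustration only, not the pipeline used to produce HTA data; the function name, the column names, and the "DNA" marker are hypothetical.

```python
from collections import defaultdict


def spatial_feature_table(mask, channels):
    """Build one row per cell from a label mask and per-channel images.

    mask: 2D list of ints; 0 is background, k > 0 is the ID of cell k.
    channels: dict mapping a marker name to a 2D list of intensities
              with the same shape as mask.
    """
    pixels = defaultdict(list)  # cell ID -> list of (row, col) positions
    for r, row in enumerate(mask):
        for c, cell_id in enumerate(row):
            if cell_id != 0:
                pixels[cell_id].append((r, c))

    table = []
    for cell_id in sorted(pixels):
        coords = pixels[cell_id]
        n = len(coords)
        record = {
            "cell_id": cell_id,
            "area_px": n,  # simple morphological feature
            "centroid_y": sum(r for r, _ in coords) / n,
            "centroid_x": sum(c for _, c in coords) / n,
        }
        # Mean intensity of each marker over the pixels of this cell.
        for marker, image in channels.items():
            record[f"mean_{marker}"] = sum(image[r][c] for r, c in coords) / n
        table.append(record)
    return table


# Toy 3 x 4 image with two cells and one hypothetical marker channel.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
channels = {
    "DNA": [
        [10, 20, 0, 0],
        [30, 40, 0, 8],
        [0, 0, 0, 4],
    ],
}
table = spatial_feature_table(mask, channels)
for row in table:
    print(row)
```

Real tables typically hold many more markers and morphological features per cell, but the shape is the same: one row per segmented cell, one column per feature.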
Level 5 Data
Level 5 data include results computed from spatial feature tables or primary images.
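As an illustration of the kind of result that can be computed from a spatial feature table, the sketch below derives two simple summaries: the fraction of marker-positive cells and each cell's distance to its nearest neighbor. The column names, the "CD8" marker, and the threshold are hypothetical, and these analyses are examples only, not a description of specific HTA results.

```python
import math


def positive_fraction(table, column, threshold):
    """Fraction of cells whose value in `column` exceeds `threshold`."""
    positives = sum(1 for row in table if row[column] > threshold)
    return positives / len(table)


def nearest_neighbor_distances(table):
    """Distance from each cell to its closest other cell.

    Brute force, O(n^2); requires at least two cells.
    """
    distances = []
    for i, a in enumerate(table):
        best = min(
            math.hypot(a["centroid_x"] - b["centroid_x"],
                       a["centroid_y"] - b["centroid_y"])
            for j, b in enumerate(table)
            if j != i
        )
        distances.append(best)
    return distances


# Toy feature table: coordinates in pixels, one hypothetical marker column.
cells = [
    {"centroid_x": 0.0, "centroid_y": 0.0, "mean_CD8": 5.0},
    {"centroid_x": 3.0, "centroid_y": 4.0, "mean_CD8": 50.0},
    {"centroid_x": 3.0, "centroid_y": 5.0, "mean_CD8": 70.0},
    {"centroid_x": 10.0, "centroid_y": 5.0, "mean_CD8": 2.0},
]
frac = positive_fraction(cells, "mean_CD8", threshold=20.0)
nn = nearest_neighbor_distances(cells)
print(frac, nn)
```

For tables with millions of cells, a spatial index such as a k-d tree would replace the brute-force loop.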
Downloading terabyte-size full-resolution image data is impractical for browsing a large dataset, so we have developed a specialized browsing tool, MINERVA, that enables panning and zooming across large images in a standard web browser.