XJTU

Publications

OpenHI - An open source framework for annotating histopathological image




https://doi.org/10.1109/BIBM.2018.8621393

Puttapirat, P., Zhang, H., Lian, Y., Wang, C., Zhang, X., Yao, L., & Li, C. (2018). OpenHI - An open source framework for annotating histopathological image. 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 1076–1082.

Abstract: Histopathological images carry informative cellular visual phenotypes and have been digitized in huge amounts in medical institutes. However, the lack of software for annotating these specialized images has been a hurdle to fully exploiting them for education and research, and for enabling intelligent systems for automatic diagnosis or phenotype-genotype association studies. This paper proposes an open-source web framework, OpenHI, for whole-slide image annotation. The proposed framework can be used for simultaneous collaborative or crowd-sourced annotation with standardized semantic enrichment at pixel-level precision. Meanwhile, our accurate virtual magnification indicator provides annotators a crucial reference for deciding the grading of each region. In testing, the framework responsively annotated whole-slide images acquired from the TCGA project and provided efficient annotation that is precise and semantically meaningful. OpenHI is an open-source framework, so it can be extended to support the annotation of whole-slide images from different sources and of different oncological types. The framework may facilitate the creation of large-scale, precisely annotated histopathological image datasets.
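
The virtual magnification indicator mentioned above can be pictured with a short sketch. This is not OpenHI's actual code; it assumes an OpenSlide-readable slide that records its scanned objective power, and the function name is ours:

import openslide

def virtual_magnification(slide_path, level, extra_zoom=1.0):
    """Estimate the effective on-screen magnification at a pyramid level.

    extra_zoom stands in for any additional client-side scaling
    (e.g., browser zoom in a web viewer).
    """
    slide = openslide.OpenSlide(slide_path)
    # Objective power the slide was scanned at, e.g. 40 for a 40x scan.
    objective = float(slide.properties[openslide.PROPERTY_NAME_OBJECTIVE_POWER])
    downsample = slide.level_downsamples[level]  # e.g. 1.0, 4.0, 16.0, ...
    return objective / downsample * extra_zoom

# e.g., a 40x scan viewed at a pyramid level with downsample 4 and no
# extra zoom corresponds to an effective 10x magnification.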

OpenHI: Open platform for histopathological image annotation




https://doi.org/10.1504/IJDMB.2019.101393

Puttapirat, P., Zhang, H., Deng, J., Dong, Y., Shi, J., Lou, P., Wang, C., Yao, L., Zhang, X., Li, C. (2019). OpenHI: Open platform for histopathological image annotation. International Journal of Data Mining and Bioinformatics, 22(4), 328–349.

Abstract: Consolidating semantically rich digital histopathological images by annotating scanned glass slides, known as whole-slide images, requires software capable of handling this type of biomedical data and supporting procedures that align with existing pathological routines. Demand for large-scale annotated histopathological datasets is on the rise because such datasets are needed to develop artificial intelligence techniques for automatic diagnosis, mass screening, or phenotype-genotype association studies. This paper presents an open platform for efficient collaborative histopathological image annotation with standardized semantic enrichment at pixel-level precision, named OpenHI (Open Histopathological Image). The framework's responsive processing algorithm can perform large-scale histopathological image annotation and serve as biomedical data infrastructure for digital pathology. It is highly configurable and can be extended to annotate histopathological images of various oncological types. The framework is open-source and fully documented. It is publicly available at https://gitlab.com/BioAI/OpenHI.

OpenHI2 – Open source histopathological image platform




Puttapirat, P., Zhang, H., Deng, J., Dong, Y., Shi, J., He, H., Gao, Z., Wang, C., Zhang, X., Li, C. (2019). OpenHI2 – Open source histopathological image platform. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), xxxx–xxxx.

Abstract: The transition from conventional to digital pathology requires a new category of biomedical informatics infrastructure that can facilitate delicate pathological routines. Pathological diagnoses are sensitive to many external factors and are known to be subjective. Only systems that meet the strict requirements of pathology can run alongside pathological routines and eventually digitize the field, and the developed platform should comply with existing pathological routines and international standards. Currently, a number of software tools can perform histopathological tasks including virtual slide viewing, annotation, and basic image analysis; however, none of them can serve as a digital platform for pathology. Here we describe OpenHI2, an enhanced version of the Open Histopathological Image platform, which supports all basic pathological tasks and file formats and is ready to be deployed in medical institutions on a standard server environment or cloud computing infrastructure. In this paper, we also describe the development decisions behind the platform and propose solutions to technical challenges so that OpenHI2 can serve as a platform for histopathological images. Further additions can be made to the platform since each component is modularized and fully documented. OpenHI2 is free, open-source, and available at https://gitlab.com/BioAI/OpenHI.

Effects of annotation granularity in deep learning models for histopathological images




Shi, J., Gao, Z., Zhang, H., Puttapirat, P., Wang, C., Zhang, X., Li, C. (2019). Effects of annotation granularity in deep learning models for histopathological images. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), xxxx–xxxx.

Abstract: Pathology is crucial to cancer diagnosis. Usually, pathologists draw their conclusions based on the cell and tissue structures observed on histology slides. Rapid developments in machine learning, especially deep learning, have produced robust and accurate classifiers, which are being used to analyze histopathological slides and assist pathologists in diagnosis. Most machine learning systems rely heavily on annotated datasets to gain the experience and knowledge needed to correctly and accurately perform tasks such as classification and segmentation. Generally, annotations in pathology-related datasets have inherited their annotation methods from natural scene images. This work investigates different granularities of annotation in histopathological datasets, including image-wise, bounding-box, ellipse-wise, and pixel-wise annotation, to verify the influence of annotation granularity on deep learning models for pathological slides. We design corresponding experiments to test the classification and segmentation performance of deep learning models trained on annotations of different granularity. In classification, state-of-the-art deep learning-based classifiers perform better when trained on pixel-wise annotations: on average, precision, recall, and F1-score improve by 7.87%, 8.83%, and 7.85%, respectively. This suggests that finer-granularity annotations are better utilized by deep learning algorithms in classification tasks. Similarly, semantic segmentation algorithms achieve 8.33% better segmentation accuracy when trained on pixel-wise annotations. Our study shows not only that finer-grained annotation can improve the performance of deep learning models, but also that it helps them extract more accurate phenotypic information from histopathological slides. Accurate and spatially precise acquisition of phenotypic information can improve the reliability of model predictions. Intelligent systems trained on granular annotations may help pathologists inspect the regions and features in a slide that contributed most to a prediction. A compartmentalized prediction approach similar to this work may contribute to phenotype and genotype association studies.
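
To make the compared granularities concrete, the coarser annotations can be derived from a pixel-wise mask. The sketch below is illustrative, not the paper's pipeline; it assumes binary masks and uses OpenCV:

import cv2
import numpy as np

def coarsen_annotation(mask: np.ndarray):
    """mask: binary array, 1 where a region is annotated pixel-wise.

    Returns bounding-box and ellipse-wise versions of the same regions,
    i.e., two of the coarser granularities studied in this work.
    """
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    boxes, ellipses = [], []
    for c in contours:
        boxes.append(cv2.boundingRect(c))       # (x, y, w, h)
        if len(c) >= 5:                         # fitEllipse needs >= 5 points
            ellipses.append(cv2.fitEllipse(c))  # ((cx, cy), (w, h), angle)
    return boxes, ellipses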

Comparing digital histology slides with multiple staining based on decoloring and dyeing technique




Wang, C., Yang, Z., Wang, K., Puttapirat, P., Li, C., Zhang, G. (2019). Comparing digital histology slides with multiple staining based on decoloring and dyeing technique. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), xxxx–xxxx.

Abstract: Information in histology slides is usually visualized by different staining techniques, each of which unveils specific chemical and biological substances within tissue samples. Correlations between different stains can be useful for predicting how certain tissue slides would look if they were stained by other techniques. This work investigates two stains, hematoxylin and eosin (H&E) and immunohistochemistry (IHC), in digital pathological slides. Four cases of surgical biopsies were used. The specimens were subjected to two consecutive stains with a decoloring process, based on ethanol and potassium permanganate, in between. After each stain, slides were digitized and archived. Comparing the two staining pipelines, IHC slides prepared after decoloring of H&E showed clear cell structure, accurate positive IHC staining, a clean slide background, no DAB residue, and intact tissue fragments. In the other pipeline, where IHC was stained before H&E, the nuclear border was blurred, eosin was lightly colored resulting in low-contrast visualization of nucleoplasm, DAB was not completely decolored, and parts of the tissue were fragmented. We conclude that, with the proposed staining and decoloring technique, tissue slides can be stained with IHC more effectively on decolored H&E slides than with H&E after IHC. Utilizing digital slide scanning technology, we can obtain pairs of differently stained tissue images while preserving the exact same tissue structure.

Renal Cell Carcinoma Detection and Subtyping with Minimal Point-Based Annotation in Whole-Slide Images

Link to publisher (Springer). Accepted by MICCAI 2020.



Gao Z., Puttapirat P., Shi J., Li C. (2020) Renal Cell Carcinoma Detection and Subtyping with Minimal Point-Based Annotation in Whole-Slide Images. In: Martel A.L. et al. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science, vol 12265. Springer, Cham. https://doi.org/10.1007/978-3-030-59722-1_42.

Abstract: Cancerous region detection and subtyping in whole-slide images (WSIs) are fundamental for renal cell carcinoma (RCC) diagnosis. The main challenge in the development of automated RCC diagnostic systems is the lack of large-scale datasets with precise annotations. In this paper, we propose a framework that employs a semi-supervised learning (SSL) method to accurately detect cancerous regions with a novel annotation method called Minimal Point-Based (Min-Point) annotation. The predicted results are efficiently utilized by a hybrid loss training strategy in a classification model for subtyping. The annotator only needs to mark a few cancerous and non-cancerous points in each WSI. Experiments on three significant subtypes of RCC showed that the performance of the cancerous region detector trained on the Min-Point annotated dataset is comparable to that of classifiers trained on datasets with full cancerous region delineation. In subtyping, the proposed model outperforms the model trained with only whole-slide diagnostic labels by 12% in terms of testing F1-score. We believe that our "detect then classify" schema combined with Min-Point annotation could set a standard for developing intelligent systems facing similar challenges.
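
The Min-Point workflow can be pictured with a small sketch. The names and patch bookkeeping below are our assumptions, not the authors' implementation: a few clicked points seed labels for the patches that contain them, and the remaining patches are left for the SSL stage to pseudo-label.

import numpy as np

def seed_labels(patch_boxes, points, labels):
    """patch_boxes: (N, 4) array of (x, y, w, h); points: (M, 2) clicked
    coordinates; labels: (M,) with 1 = cancerous, 0 = non-cancerous."""
    seeds = np.full(len(patch_boxes), -1)  # -1 = unlabeled patch
    for (px, py), lab in zip(points, labels):
        inside = (
            (patch_boxes[:, 0] <= px) & (px < patch_boxes[:, 0] + patch_boxes[:, 2])
            & (patch_boxes[:, 1] <= py) & (py < patch_boxes[:, 1] + patch_boxes[:, 3])
        )
        seeds[inside] = lab
    # Train on patches with seeds >= 0; pseudo-label the -1 patches
    # in the semi-supervised stage.
    return seeds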

Structured Information Extraction of Pathology Reports with Attention-based Graph Convolutional Network

Link to publisher (IEEE Xplore). Accepted by BIBM 2020.



J. Wu, K. Tang, H. Zhang, C. Wang and C. Li, "Structured Information Extraction of Pathology Reports with Attention-based Graph Convolutional Network," 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020, pp. 2395-2402, doi: 10.1109/BIBM49941.2020.9313347.


LibMI – An open source library for efficient histopathological image processing




Dong, Y., Puttapirat, P., Deng, J., Li, C. (2020). LibMI – An open source library for efficient histopathological image processing. Journal of Pathology Informatics, x(x). (in press)

Abstract: Background: Whole-slide images are a rapidly growing kind of image data in the digital pathology domain. Their unusually high resolution makes them hard to support with conventional tools or file formats, which obstructs data sharing and automated analysis. Here we propose a library, libMI, along with an open and standardized image file format. Together they can be used to efficiently read, write, modify, and annotate large images. Methods and Material: LibMI utilizes the concept of a pyramid image structure together with lazy propagation from the segment tree algorithm to support reading and modifying images, and guarantees that both operations have linear time complexity. A cache mechanism was also introduced to speed up the program. Results: LibMI is an open and efficient library for histopathological image processing. To demonstrate its functions, we applied it to several tasks including image thresholding, microscopic color correction, and storing pixel-wise information on whole-slide images. The results show that libMI is particularly suitable for modifying large images. Furthermore, compared with similar libraries and file formats, libMI and MMSI run 18.237 times faster on read-only tasks. Conclusions: The combination of the libMI library and the MMSI file format enables developers to efficiently read and modify whole-slide images, and can thus assist pixel-wise processing of extremely large images and help build image processing pipelines. The library and its data schema are freely available on GitLab: https://gitlab.com/BioAI/libMI.
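
Lazy propagation is the technique behind libMI's efficient reads and writes. The one-dimensional sketch below only illustrates the underlying segment-tree idea (libMI applies it to image regions, and its internals differ): range assignments are recorded at internal nodes and pushed down on demand, so a whole range can be updated or read without touching every element.

class LazySegmentTree:
    def __init__(self, n, fill=0):
        self.n = n
        self.val = [fill * n] + [fill] * (4 * n - 1)  # range sums
        self.lazy = [None] * (4 * n)                  # pending assignments

    def _push(self, node, length):
        # Push a pending assignment down to the two children.
        if self.lazy[node] is not None:
            for child, sub in ((2 * node, length // 2),
                               (2 * node + 1, length - length // 2)):
                self.lazy[child] = self.lazy[node]
                self.val[child] = self.lazy[node] * sub
            self.lazy[node] = None

    def assign(self, lo, hi, x, node=1, l=0, r=None):
        """Set every element in [lo, hi) to x."""
        if r is None:
            r = self.n
        if hi <= l or r <= lo:
            return
        if lo <= l and r <= hi:
            self.lazy[node] = x
            self.val[node] = x * (r - l)
            return
        self._push(node, r - l)
        m = (l + r) // 2
        self.assign(lo, hi, x, 2 * node, l, m)
        self.assign(lo, hi, x, 2 * node + 1, m, r)
        self.val[node] = self.val[2 * node] + self.val[2 * node + 1]

    def query(self, lo, hi, node=1, l=0, r=None):
        """Sum over [lo, hi)."""
        if r is None:
            r = self.n
        if hi <= l or r <= lo:
            return 0
        if lo <= l and r <= hi:
            return self.val[node]
        self._push(node, r - l)
        m = (l + r) // 2
        return (self.query(lo, hi, 2 * node, l, m)
                + self.query(lo, hi, 2 * node + 1, m, r))

For example, assigning a value to a million-pixel span touches only the tree nodes covering that span; the children are filled in lazily the next time a read descends through them.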

Nuclei Grading of Clear Cell Renal Cell Carcinoma in Histopathological Image by Composite High-Resolution Network

Link to publisher (Springer). Accepted by MICCAI 2021.



Abstract: The grade of clear cell renal cell carcinoma (ccRCC) is a critical prognostic factor, making ccRCC nuclei grading a crucial task in RCC pathology analysis. Computer-aided nuclei grading aims to improve pathologists' work efficiency while reducing their misdiagnosis rate by automatically identifying the grades of tumor nuclei within histopathological images. Such a task requires precisely segmenting and accurately classifying the nuclei. However, most existing nuclei segmentation and classification methods cannot handle the inter-class similarity inherent in nuclei grading and therefore cannot be directly applied to the ccRCC grading task. In this paper, we propose a Composite High-Resolution Network for ccRCC nuclei grading. Specifically, we propose a segmentation network called W-Net that can separate clustered nuclei. Then, we recast the fine-grained classification of nuclei as two cross-category classification tasks, learned by two high-resolution feature extractors (HRFEs) proposed for this purpose. The two HRFEs share the same backbone encoder as W-Net through a composite connection, so that meaningful features learned for the segmentation task are inherited by the classification task. Finally, a head-fusion block is applied to generate the predicted label of each nucleus. Furthermore, we introduce a dataset for ccRCC nuclei grading containing 1000 image patches with 70945 annotated nuclei. We demonstrate that our proposed method achieves state-of-the-art performance compared to existing methods on this large ccRCC grading dataset.
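
How two cross-category heads might be fused into one nucleus label can be sketched as follows. The category grouping and the paper's head-fusion block may differ from this, so treat it purely as an illustration of combining two classification heads over a joint label space:

import torch

def fuse_heads(logits_a: torch.Tensor, logits_b: torch.Tensor):
    """logits_a: (N, A) scores from one cross-category head;
    logits_b: (N, B) scores from the other.

    Returns an (N,) index into the A x B joint label space, taking the
    most probable (a, b) combination under head independence."""
    pa = logits_a.log_softmax(dim=1)           # (N, A) log-probs
    pb = logits_b.log_softmax(dim=1)           # (N, B) log-probs
    joint = pa.unsqueeze(2) + pb.unsqueeze(1)  # (N, A, B) joint log-probs
    return joint.flatten(1).argmax(dim=1)      # best pair per nucleus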

Instance-based Vision Transformer for Subtyping of Papillary Renal Cell Carcinoma in Histopathological Image

Link to publisher (Springer). Accepted by MICCAI 2021.



Abstract: The histological subtype of papillary renal cell carcinoma (pRCC), type 1 vs. type 2, is an essential prognostic factor. The two subtypes of pRCC share a similar pattern, i.e., the papillary architecture, yet differ subtly at the cellular and cell-layer levels. However, these cellular and cell-layer level patterns can hardly be captured by existing CNN-based models in large histopathological images, which hinders directly applying such models to this fine-grained classification task. This paper proposes a novel instance-based Vision Transformer (i-ViT) that learns robust representations of histopathological images for the pRCC subtyping task by extracting finer features from instance patches (obtained by cropping around segmented nuclei and assigning predicted grades). The proposed i-ViT takes the top-K instances as input and aggregates them to capture both cellular and cell-layer level patterns through a position-embedding layer, a grade-embedding layer, and a multi-head multi-layer self-attention module. To evaluate the performance of the proposed framework, experienced pathologists were invited to select 1162 regions of interest from 171 whole-slide images of type 1 and type 2 pRCC. Experimental results show that the proposed method outperforms existing CNN-based models by a significant margin.
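
The i-ViT input construction (instance features summed with position and grade embeddings, then self-attention) can be sketched in PyTorch. The dimensions, depth, and mean-pooling head here are our assumptions, not the paper's exact configuration:

import torch
import torch.nn as nn

class InstanceViTSketch(nn.Module):
    def __init__(self, dim=256, n_grades=4, top_k=128, n_classes=2):
        super().__init__()
        self.pos_embed = nn.Embedding(top_k, dim)       # instance position
        self.grade_embed = nn.Embedding(n_grades, dim)  # predicted nucleus grade
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, n_classes)           # type 1 vs. type 2

    def forward(self, feats, grades):
        """feats: (B, K, dim) instance features; grades: (B, K) grade ids."""
        k = feats.size(1)
        pos = torch.arange(k, device=feats.device)
        x = feats + self.pos_embed(pos) + self.grade_embed(grades)
        x = self.encoder(x)              # self-attention over instances
        return self.head(x.mean(dim=1))  # aggregate instances, then classify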

A Semi-supervised Multi-task Learning Framework for Cancer Classification with Weak Annotation in Whole-slide Images

Link to publisher (Elsevier). Published in Medical Image Analysis.



Abstract: Cancer region detection (CRD) and subtyping are two fundamental tasks in digital pathology image analysis. The development of data-driven models for CRD and subtyping on whole-slide images (WSIs) would mitigate the burden on pathologists and improve their diagnostic accuracy. However, existing models face two major limitations. First, they typically require large-scale datasets with precise annotations, which contradicts the original intention of reducing labor effort. Second, for the subtyping task, non-cancerous regions are treated the same as cancerous regions within a WSI, which confuses a subtyping model during training. To tackle the latter limitation, previous research proposed performing CRD first to rule out the non-cancerous regions, then training a subtyping model on the remaining cancerous patches. However, training the two tasks separately ignores their interaction and propagates errors from the CRD task to the subtyping task. To address these issues and concurrently improve performance on both tasks, we propose a semi-supervised multi-task learning (MTL) framework for cancer classification. Our framework consists of a backbone feature extractor, two task-specific classifiers, and a weight control mechanism. The backbone feature extractor is shared by the two task-specific classifiers, so that the interaction between the CRD and subtyping tasks can be captured. The weight control mechanism preserves the sequential relationship of the two tasks and guarantees error back-propagation from the subtyping task to the CRD task under the MTL framework. We train the overall framework in a semi-supervised setting, where the datasets involve only small quantities of annotations produced by our minimal point-based (min-point) annotation strategy. Extensive experiments on four large datasets with different cancer types demonstrate the effectiveness of the proposed framework in both accuracy and generalization.
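
A simplified view of the shared-backbone multi-task setup is sketched below. The paper's weight control mechanism is only approximated here by weighting each patch's subtyping loss with its predicted cancer probability, so subtyping errors still back-propagate into the CRD branch; the actual mechanism may differ:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MTLSketch(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, n_subtypes: int):
        super().__init__()
        self.backbone = backbone                 # shared feature extractor
        self.crd_head = nn.Linear(feat_dim, 2)   # cancer vs. non-cancer
        self.sub_head = nn.Linear(feat_dim, n_subtypes)

    def forward(self, x):
        f = self.backbone(x)
        return self.crd_head(f), self.sub_head(f)

def mtl_loss(crd_logits, sub_logits, crd_target, sub_target):
    loss_crd = F.cross_entropy(crd_logits, crd_target)
    # Weight each patch's subtyping loss by its predicted cancer
    # probability, so the subtyping gradient also reaches the CRD branch.
    w = crd_logits.softmax(dim=1)[:, 1]
    loss_sub = (w * F.cross_entropy(sub_logits, sub_target, reduction="none")).mean()
    return loss_crd + loss_sub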

Uncertainty-based Model Acceleration for Cancer Classification in Whole-Slide Images

Link to publisher (IEEE Xplore). Accepted by BIBM 2022.



Abstract: Computational pathology (CPATH) offers the possibility of highly accurate and low-cost automated pathological diagnosis. However, the high time cost of model inference is one of the main issues limiting the application of CPATH methods. Due to the large size of whole-slide images (WSIs), commonly used CPATH methods divide a WSI into a large number of image patches at relatively high magnification and predict each patch individually, which is time-consuming. In this paper, we propose a novel Uncertainty-based Model Acceleration (UMA) method for reducing the time cost of model inference, thereby relieving the deployment burden of CPATH applications. Inspired by the slide-viewing process of pathologists, only a few high-uncertainty regions are regarded as "suspicious" regions that need to be predicted at high magnification, while most regions of the WSI are predicted at low magnification, thereby reducing the number of patch extractions and predictions. Meanwhile, uncertainty estimation ensures prediction accuracy at low magnification. We take two fundamental CPATH classification tasks (i.e., cancer region detection and subtyping) as examples. Extensive experiments on two large-scale renal cell carcinoma classification datasets demonstrate that UMA can significantly reduce the time cost of model inference while maintaining competitive classification performance.
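
The coarse-to-fine loop behind UMA can be sketched as follows; the models, the refine_fn helper, and the entropy threshold are hypothetical stand-ins for the paper's components:

import torch

def predictive_entropy(probs: torch.Tensor) -> torch.Tensor:
    """probs: (N, C) softmax outputs; returns (N,) entropy in nats."""
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

def classify_slide(low_mag_tiles, model_low, model_high, refine_fn, tau=0.5):
    probs = model_low(low_mag_tiles).softmax(dim=1)  # cheap coarse pass
    uncertain = predictive_entropy(probs) > tau      # "suspicious" tiles only
    for i in uncertain.nonzero(as_tuple=True)[0]:
        # refine_fn fetches the high-magnification patches under tile i;
        # only these few tiles pay the expensive fine-grained pass.
        probs[i] = model_high(refine_fn(i)).softmax(dim=1).mean(dim=0)
    return probs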

Unsupervised Representation Learning for Tissue Segmentation in Histopathological Images: From Global to Local Contrast

Link to publisher (IEEE Xplore). Published in IEEE Transactions on Medical Imaging.



Abstract: Tissue segmentation is an essential task in computational pathology. However, relevant datasets for such a pixel-level classification task are hard to obtain due to the difficulty of annotation, creating obstacles to training deep learning-based segmentation models. Recently, contrastive learning has provided a feasible solution for mitigating the heavy reliance of deep learning models on annotation. Nevertheless, because they apply the contrastive loss to the most abstract image representations, existing contrastive learning frameworks focus on global features and are therefore less capable of encoding the finer-grained features (e.g., pixel-level discrimination) needed for the tissue segmentation task. Guided by domain knowledge, we design three contrastive learning tasks with multi-granularity views (from global to local) that encode the necessary features into representations without accessing annotations. Specifically, we construct: (1) an image-level task to capture the difference between tissue components, i.e., encoding component discrimination; (2) a superpixel-level task to learn discriminative representations of local regions with different tissue components, i.e., encoding prototype discrimination; (3) a pixel-level task to encourage similar representations of different tissue components within a local region, i.e., encoding spatial smoothness. Through our global-to-local pre-training strategy, the learned representations reasonably capture domain-specific and fine-grained patterns, making them easily transferable to various tissue segmentation tasks in histopathological images. We conduct extensive experiments on two tissue segmentation datasets, considering two real-world scenarios with limited or sparse annotations. The experimental results demonstrate that our framework is superior to existing contrastive learning methods and can easily be combined with weakly supervised and semi-supervised segmentation methods.
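
All three tasks can reuse a standard contrastive objective at different granularities. Below is a generic InfoNCE sketch, not the paper's exact losses; a "view" here may be a whole image, a superpixel region, or a pixel embedding depending on the task:

import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature=0.1):
    """z1, z2: (N, D) embeddings of two views; row i of z2 is the
    positive for row i of z1, and all other rows act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature  # (N, N) cosine-similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)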

HistoML, a markup language for representation and exchange of histopathological features in pathology images

Link to publisher (Nature Publishing). Published in Scientific Data.



Abstract: The study of histopathological phenotypes is vital for cancer research and medicine, as it links molecular mechanisms to disease prognosis. It typically involves integration of heterogeneous histopathological features in whole-slide images (WSIs) to objectively characterize a histopathological phenotype. However, the large-scale implementation of phenotype characterization has been hindered by the fragmentation of histopathological features, resulting from the lack of a standardized format and a controlled vocabulary for the structured and unambiguous representation of semantics in WSIs. To fill this gap, we propose the Histopathology Markup Language (HistoML), a representation language along with a controlled vocabulary (Histopathology Ontology) based on Semantic Web technologies. Multiscale features within a WSI, from single-cell features to mesoscopic features, can be represented using HistoML, a crucial step towards the goal of making WSIs findable, accessible, interoperable and reusable (FAIR). We pilot HistoML in representing WSIs of kidney cancer as well as thyroid carcinoma and exemplify the use of HistoML representations in semantic queries to demonstrate the potential of HistoML-powered applications for phenotype characterization.
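
A semantic query over a HistoML document might look like the rdflib sketch below. The prefix URI, file name, and property names are invented placeholders; real queries should use terms from the published Histopathology Ontology:

import rdflib

g = rdflib.Graph()
g.parse("slide_annotations.rdf")  # a HistoML representation of one WSI

# Find regions annotated with a given phenotype (prefix and property
# names are illustrative only, not the actual ontology terms).
results = g.query("""
    PREFIX histo: <http://example.org/histoml#>
    SELECT ?region WHERE {
        ?region histo:hasPhenotype histo:ClearCellCarcinoma .
    }
""")
for row in results:
    print(row.region)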

To be updated