A systematic analysis of deep learning in genomics and histopathology for precision oncology

Unger, Michaela; Kather, Jakob Nikolas

doi:10.1186/s12920-024-01796-9

Research
Open access
Published: 05 February 2024

BMC Medical Genomics volume 17, Article number: 48 (2024)

Abstract

Background

Digitized histopathological tissue slides and genomics profiling data are available for many patients with solid tumors. In the last 5 years, Deep Learning (DL) has been broadly used to extract clinically actionable information and biological knowledge from pathology slides and genomic data in cancer. In addition, a number of recent studies have introduced multimodal DL models designed to simultaneously process both images from pathology slides and genomic data as inputs. By comparing patterns from one data modality with those in another, multimodal DL models are capable of achieving higher performance compared to their unimodal counterparts. However, the application of these methodologies across various tumor entities and clinical scenarios lacks consistency.

Methods

Here, we present a systematic survey of the academic literature from 2010 to November 2023, aiming to quantify the application of DL for pathology, genomics, and the combined use of both data types. After filtering 3048 publications, our search identified 534 relevant articles, which were then evaluated by basic (diagnosis, grading, subtyping) and advanced (mutation, drug response and survival prediction) application types, publication year and addressed cancer tissue.

Results

Our analysis reveals a predominant application of DL in pathology compared to genomics. However, there is a notable surge in DL incorporation within both domains. Furthermore, while DL applied to pathology primarily targets the identification of histology-specific patterns in individual tissues, DL in genomics is more commonly used in a pan-cancer context. Multimodal DL, on the contrary, remains a niche topic, evidenced by a limited number of publications, primarily focusing on prognosis predictions.

Conclusion

In summary, our quantitative analysis indicates that DL not only has a well-established role in histopathology but is also being successfully integrated into both genomic and multimodal applications. In addition, there is considerable potential in multimodal DL for harnessing further advanced tasks, such as predicting drug response. Nevertheless, this review also underlines the need for further research to bridge the existing gaps in these fields.

Background

Over the last few decades, precision oncology has emerged as the standard strategy in cancer care. Once a diagnosis is made, precision oncology tailors cancer treatment based on the specific molecular alterations unique to each patient [1]. This is enabled by biomarkers present in the tumor’s morphology or genotype. Biomarkers are biological features that serve as indicators of healthy or pathogenic processes, as well as responses to specific drug treatments [2]. While conventional biomarkers such as histopathological grade or subtype provide preliminary insights into a patient’s disease, there are also more nuanced prognostic and predictive biomarkers available. For example, the presence of lymphocytes in tumor tissue is a useful prognostic biomarker that indicates the course of the disease in various types of cancer [3]. Additionally, predictive biomarkers forecast response to specific treatments. One notable example is homologous repair deficiency (HRD), which can increase patients’ susceptibility to treatment with Poly-ADP-Ribose-Polymerase (PARP) inhibitors [4,5,6]. The routine acquisition of patient-specific features is typically limited to a select set of biomarkers, because complex biological assays are costly and time-consuming and require specialized equipment and expertise [7]. Furthermore, many of these advancements remain accessible only to a limited number of cancer patients globally. Consequently, new concepts could help to streamline clinical workflows, enhancing the process from diagnosis to treatment, especially in low- and middle-income countries. Artificial intelligence (AI) tools could play a role in addressing this challenge by offering predictive estimates of biomarkers, thereby supporting clinicians in making informed decisions [8]. Ultimately, in some cases, AI has the potential to bypass the conventional biomarker detection stage entirely [9].

Deep learning (DL), a subclass of AI [10], can extract meaningful patterns from complex data. DL models are neural networks that can undergo supervised training, in which they process input through layers of small units, the neurons. These models generate an output which is then compared to a predefined label. The error produced in this comparison is then propagated back through the network, causing updates to the internal parameters and thereby improving the prediction accuracy in the subsequent round [11]. When labels are only partially available or entirely absent, DL models can also be trained using weakly-supervised or unsupervised methods [12, 13]. In oncology, the histopathological phenotype and genetic alterations create an abundance of complex data which can in principle be analyzed with DL. In digital pathology, phenotypes from routinely available hematoxylin and eosin (H&E) stained whole slide images (WSIs) serve as a rich data source [14]. In recent years, DL has demonstrated its ability to derive global patterns for cancer diagnosis from WSIs [15, 16], which could offer a more quantitative measure of the disease and enhance diagnostic throughput. Nevertheless, for the majority of cancer types, there remains a need for more comprehensive prognostic and predictive information to refine therapeutic choices. Genomic tests, designed to detect specific alterations in tumor DNA, are part of an arsenal of tests that can yield additional data for clinical decision-making. Traditionally, genomic data has been analyzed using standard bioinformatics pipelines. These are composed of deterministic computer programs, enabling the comparison of alterations in the tumor’s genome either with the patient’s germline genome or a reference genome. In this context, DL holds the potential to replace certain aspects of these traditional pipelines. 
DL’s ability to discover known patterns in the genomic sequence, but also to identify new ones, could facilitate the accessibility of concealed features within the data. Another recent development is taking place at the intersection of histopathology and genomics – the domain of multimodal models [17]. Technological advancements now enable the simultaneous integration and interpretation of patterns across both data types. Potentially, some patterns in pathology slides might only be meaningful given a known genetic background, or vice versa. As such, multimodal models could offer more comprehensive insights than an independent analysis of either data modality. In conclusion, the application of DL has the potential to advance precision oncology, conceivably making the acquisition of biomarkers quicker and more affordable.
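The supervised training cycle outlined in this section (forward pass, comparison of the output with a label, backpropagation of the error, parameter update) can be illustrated with a minimal sketch. It assumes a toy model consisting of a single sigmoid neuron trained with a cross-entropy loss and plain gradient descent; the function names, learning rate, epoch count and toy data are illustrative assumptions and are not taken from any of the reviewed studies.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Forward pass: weighted sum of the inputs followed by a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, labels, lr=0.5, epochs=500):
    """Train a single sigmoid neuron with per-sample gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Forward pass: generate an output for this input.
            pred = predict(weights, bias, x)
            # Compare with the label. For sigmoid + cross-entropy,
            # the gradient of the loss w.r.t. the pre-activation is pred - y.
            error = pred - y
            # Propagate the error back and update the parameters.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy, linearly separable data (logical AND), purely for illustration.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 0, 1]
weights, bias = train_neuron(samples, labels)
```

Real DL models stack many such units into layers and rely on automatic differentiation, but the cycle of prediction, error computation and parameter update is the same.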

Here, we present a systematic review of the literature, covering DL applications in pathology, genomics, and their multimodal combination for precision oncology (Fig. 1a). In order to perform a comprehensive analysis across these expansive fields, we needed to establish a set of criteria related to workflows and biomarker usage in the clinic. Therefore, we divide the literature into six fields of DL application, as established by previous studies [18]. These comprise three “basic” applications, namely DL for predicting the diagnosis (cancer detection), grading (determining the grade of cancer) or subtype of a tumor, and three “advanced” applications, namely predicting prognosis (survival probability of the patient), patterns of genetic alterations (such as the detection of driver mutations) or treatment response to a specific therapy scheme or a single medicine [18, 19]. Our systematic analysis resulted in 534 academic publications (Fig. 1b), all of which are enumerated in the Supplementary material and will be explained in the following sections. With this approach, we summarize the integration of DL in these fields, examine overall trends and identify gaps warranting further research.

Methods

Article selection criteria

For conducting our systematic review we aimed to adhere to the PRISMA [20] guidelines as closely as possible. However, given that the scope of the review was primarily oriented towards a quantitative analysis of publication numbers, not all screening criteria outlined in the PRISMA guidelines were considered applicable. We designed our query to include publications which employed DL techniques within genomics and histopathology in oncology, and for multimodality in these two fields (Fig. 1b). Additionally, the considered papers had to meet the following criteria: published between the year 2010 and the 16th of November 2023, written in English, and have both title and abstract readily accessible. The considered studies had to utilize DL in at least one of the following six categories: diagnosis, grading, subtyping, prognosis, mutation, and response. In alignment with other publications, we categorized the application areas into basic (diagnosis, grading, subtyping) and advanced (prognosis, mutation, response) tasks. A corresponding flowchart is depicted in Fig. 1b.
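The selection criteria above can be expressed as a simple filter. The following is a minimal sketch, not the screening procedure actually used: the `Paper` record, its field names and the helper functions are hypothetical, and the start of the search window is assumed to be 1 January 2010 because the text only names the year.

```python
from dataclasses import dataclass, field
from datetime import date

# The six application categories named in the text.
BASIC = {"diagnosis", "grading", "subtyping"}
ADVANCED = {"prognosis", "mutation", "response"}

SEARCH_START = date(2010, 1, 1)   # assumption: the text only says "the year 2010"
SEARCH_END = date(2023, 11, 16)   # stated in the text


@dataclass
class Paper:
    """Hypothetical record for one screened publication."""
    title: str
    abstract: str
    published: date
    language: str
    categories: set = field(default_factory=set)


def meets_criteria(paper):
    """Return True if the paper satisfies all stated inclusion criteria."""
    in_window = SEARCH_START <= paper.published <= SEARCH_END
    in_english = paper.language.lower() == "english"
    has_text = bool(paper.title.strip()) and bool(paper.abstract.strip())
    has_category = bool(paper.categories & (BASIC | ADVANCED))
    return in_window and in_english and has_text and has_category


def task_levels(paper):
    """Map a paper's categories to 'basic' and/or 'advanced'."""
    levels = set()
    if paper.categories & BASIC:
        levels.add("basic")
    if paper.categories & ADVANCED:
        levels.add("advanced")
    return levels
```

In practice the category assignment required human judgment on title, abstract and sometimes full text; the sketch only captures the formal criteria once those annotations exist.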

Data extraction

All papers obtained from the PubMed query (queries available in the Supplementary Material) were collected, pooled and annotated with regard to data type (histopathology, genomics or multimodal), tissue type, and application class. Rayyan was utilized as a tool for assessing papers and structuring the systematic review process. The applicability of each paper was determined by screening its title and abstract according to our selection criteria. If the relevance of a paper remained ambiguous following this step, we proceeded to a full-text review. Any papers without available full text, or those for which the relevance remained uncertain after full-text review, were discarded. From the publication list we furthermore excluded review articles, duplicated papers, and articles not related to this review topic. We defined as out of scope for this review papers that were not related to oncology, did not apply DL methods, did not apply DL methods in our six categories, utilized imaging techniques other than bright field microscopy of histological sections, utilized proteome or metabolome data, or did not use human samples. Papers that encompassed multiple application classes were labeled with all applicable types. A comprehensive list of all selected papers is available in Supplementary Tables 1–3. The search in PubMed had some limitations. Specifically, it might not have identified publications that did not include our specified keywords in their title or abstract. Hence, relevant papers that align with our topic of interest may have been omitted. Furthermore, restricting our search to PubMed as the only database could introduce biases into our findings. Nonetheless, in summary, our approach facilitated the discovery of a diverse range of papers, providing valuable insights into the fields of interest.
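Because papers covering several application classes were labeled with all applicable types, per-class counts can sum to more than the number of papers. A minimal sketch of such a tally is shown below; the annotation format and the example entries are hypothetical and do not reproduce the actual supplementary tables.

```python
from collections import Counter


def tally(annotations):
    """Count papers per (data type, application class).

    `annotations` is an iterable of (data_type, set_of_classes) pairs.
    A paper with several classes is counted once for each class.
    """
    counts = Counter()
    for data_type, classes in annotations:
        for application in classes:
            counts[(data_type, application)] += 1
    return counts


# Hypothetical annotations for four papers.
example = [
    ("histopathology", {"diagnosis"}),
    ("histopathology", {"subtyping", "mutation"}),
    ("genomics", {"response"}),
    ("multimodal", {"prognosis"}),
]
counts = tally(example)
```

Here the four example papers yield five class labels in total, which is why the numbers reported per application should not be added up to recover the number of unique articles.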

Results

Initially, we gathered all publications which describe basic and advanced applications from the three key research areas: histopathology, genomics and their multimodal combination. Overall, DL is applied more widely in histopathology than in genomics (Fig. 2a). Cancer diagnosis is the primary task tackled by DL-based studies in histopathology, with a total of 128 articles published on this topic. On the other hand, determining tumor grade, a key biomarker, has received less attention in DL studies within histopathology, with 19 publications in this domain. Turning our attention to genomics-based cancer diagnosis, our search yielded 18 papers. Strikingly, only two publications addressed the grading aspect [21, 22]. Regarding tumor subtype, the publication count increased, with 33 for histopathology and 26 for genomics. As anticipated, multimodality demonstrated the fewest publications, with two for diagnosis [23, 24], three for grading [25,26,27], and two for subtyping [28, 29]. Nevertheless, it is crucial to acknowledge that multimodal models present a significantly higher level of complexity compared to their unimodal counterparts. Given the satisfactory performance of unimodal models in most applications, there has not been a compelling need to utilize more complex multimodal models for simpler tasks.

The landscape shifts when we explore the advanced applications of DL across our research domains (Fig. 2a). Interestingly, the prediction of biomarkers in the form of (driver) mutations is most frequently seen in DL for histopathology, as indicated by the 64 relevant publications. Conversely, genomics accounted for 20 articles in this area. It should be noted that in genomics data, when utilizing whole exome or genome sequencing, driver mutations can be derived without the need for DL, which might account for the lower publication count. Our search revealed only one multimodal publication for mutation biomarkers. Predicting drug response, and thereby bypassing the traditional biomarker approach, was an area that the histopatho-genomic multimodality did not address at all. Intriguingly, however, treatment response was the only DL application for which genomics yielded the highest publication count. Nevertheless, to put this number into perspective, DL in genomics produced 31 publications for this application. This modest publication volume is likely due to the scarcity of data. Publications for drug response in histopathology mostly targeted general therapies like (neoadjuvant) chemotherapy, as opposed to individual drugs. In contrast, in genomics, cancer cell line screens are the routine data source for this application type. Moving on to the most prevalent advanced application of DL, prognosis does not focus on biomarker prediction either, but on directly estimating a patient’s survival probability. Here, histopathology led this area with 125 papers, followed by 60 in genomics. Significantly, 75% of all multimodality models were developed for prognosis prediction. Multimodality thrives in this area as it merges insights from diverse sources, potentially outperforming unimodal models. Yet, the existing number of multimodality publications indicates opportunities for further exploration in this research domain.

Additionally, we investigated publication trends over time and examined the coverage of cancer entities. The first application of DL in histopathology targeted breast cancer diagnosis in 2016 [30], aligning with its status as one of the most prevalent cancers [31]. Furthermore, breast cancer was the focus of the pioneering CAMELYON computational histopathology challenges in 2016 and 2017 [32, 33]. Such challenges not only emphasize the current trends of the field in terms of state-of-the-art techniques and emerging directions but can also highlight knowledge gaps and facilitate collaborations, as well as data and resource sharing. The year 2017 witnessed further publications in breast cancer, but also in brain and colorectal cancer, alongside the first grading prediction in histopathology. The field’s substantial growth is particularly noticeable when comparing the publication numbers of 2017 and 2018, between which the number of articles approximately doubled. A seminal paper by Coudray et al. [34], employing DL exclusively to classify lung cancer subtypes and their driver mutations, was published during this period. In 2018, the first four pan-cancer studies in DL for histopathology were published, a novelty considering the substantial data volume required for such research. The first prognosis prediction in this field also occurred in this year. In 2019, ten prognosis-related histopathology DL papers were published, covering a wide range of tissues including breast, colorectal, kidney, and skin, among others. This milestone marked a shift in DL for histopathology, where basic applications no longer dominated the field. The year 2020 was the first in which the number of advanced applications reached the level of basic ones, pushing the field’s knowledge boundaries further. This period also witnessed the publication of the first drug response paper in DL for histopathology. 
Over time similar trends were observable; a surge in prognosis-related publications in 2019 followed the introduction of prognostic applications in 2018. The popularity of mutation prediction, the biomarkers linking genotypic alterations to phenotypic traits in histopathology, was similarly noted in 2020, two years after its introduction. Furthermore, in 2020, pan-cancer publications also expanded their application repertoire. Notably, Fu et al. [21] advanced the field with a publication employing an immense patient cohort for mutation prediction. For the year 2021, we identified a trend for DL in histopathology where specific cancer tissues, including breast, colorectal, skin, and stomach, attracted more research attention across various application types. In contrast, gastrointestinal and ovarian cancers were only explored for the first time, potentially due to prior limitations in cohort size. DL demands large sample sizes for good model performance. As such, more prevalent cancers typically benefit from data availability, while models trained on scarce data are likely to perform poorly. Another factor contributing to the limited number of publications in certain tissue types could be their inherent morphology. Certain cancer entities could form more heterogeneous patterns difficult to recognize for the models, compromising the learning process. Despite these challenges, the growth trend in the field persisted in 2021, yielding 65 publications, only surpassed by the 90 papers published in 2022 and 114 in 2023. When compared to the humble beginnings in the 2010s, this progress is remarkable. Finally, as the last application type, drug response became widely utilized in the field in 2023, not only in single tissue studies such as ovary, colorectal or esophagus but also in pan-cancer studies. 
This development could open up numerous new clinical applications for DL in computational pathology, ultimately including the use of DL models as companion diagnostics for new drugs. Thus, although the expansion of DL in histopathology might decelerate in the future, automated biomarker prediction from WSIs is anticipated to be translated into clinical workflows in the near future [14].

In DL for genomics, a notable initial observation was the narrower range of investigated cancer entities compared to histopathology. One of the first articles in this field was published as early as 2016 [35], focusing on differentiating tumor types from genomic data. Furthermore, early publications targeted pan-cancer studies rather than specific cancer types, likely due to the availability of pan-cancer datasets and the goal of genomics to understand general cancer mechanisms. By 2018, the field had diversified with applications emerging in subtyping, drug response, and mutation prediction, resulting in a total of five publications. Notably, in this year, the first DL model was applied to liver cancers [36]. The publication count grew to nine by 2019, with prognostic applications accounting for 30% of the total. Furthermore, additional cancer tissues, such as breast and stomach, were investigated. Interestingly, DL for genomics started the exploration of drug response applications earlier, with seven papers published before the debut publication in histopathology-DL in the same field. This is probably the result of greater data generation and time efficiency of cancer cell line studies compared to actual patient data. By 2020, the first diagnostic DL methods appeared in genomics; however, this development did not spark a significant trend in the years to follow. As in pathology, breast cancer remained a dominant research area, with other cancers only occasionally studied. The year 2021 saw a temporary peak in pan-cancer research for DL in genomics, yielding 24 publications. This suggests that genomic biomarkers may be more broadly applicable across various cancer types than histopathological ones. In 2022, the interest in individual prognosis predictions slightly increased. However, the absence of an expanding trend in DL for clinical genomics similar to that observed in histopathology was a surprising discovery. 
This suggests that there are hurdles yet to be addressed for the comprehensive integration of DL in this field. Nevertheless, DL could play an immense role in the discovery of novel genetic biomarkers as indicators or targets for individualized therapy in the future.

Multimodal DL research combining histopathology and genomics was conducted in the fewest tissue types. This field, established only in 2018, saw the release of three publications in its first year. The emergence of studies on brain tumors in this context is likely attributable to changes in the WHO guidelines that now mandate molecular tests alongside pathological sample examinations for patient diagnosis [37]. This requirement probably led DL models to incorporate histopathologic and genomic data to reflect medical workflows. Prostate cancer research also made an appearance in 2018 with two papers, both published by Ren et al. [38, 39]. In addition to single cancer types, pan-cancer studies are also used in multimodality, as demonstrated by Cheerla and Gevaert [40]. This was the only multimodal DL study published in 2019, suggesting that despite the initial momentum, the field’s overall popularity receded. Perhaps the concept and application of multimodal biomarkers had not been fully developed or realized at that time. Three more articles followed in 2020, two focusing on breast [41, 42] and one on the brain [43], all targeting prognostic biomarkers or direct implications for patient survival. Interestingly, multimodal research changed paradigms in 2021 and was predominantly directed towards basic applications, namely grading and subtyping. An increase in publications was observed in 2022, including both basic and advanced studies, culminating in a total of seven publications. An expansion was observed in 2023 as well, not only focusing on prognosis predictions but also introducing mutation predictions for the first time in the field. This trend hints at the growing recognition and potential of multimodality in histopathology and genomics, a development that will probably increase in importance in the forthcoming years.

Discussion

A first difference between DL in histopathology and genomics emerges when comparing the sheer number of publications. Our literature search yielded more than twice as many articles utilizing DL for histopathology as it did for genomics. The reasons for this difference are difficult to pin down, but one possible explanation could be data availability. H&E stained slides have been available for decades and are ready to be digitized. Thus, it may be easier to establish DL-appropriate cohort sizes in histopathology than in genomics, given that genomics has only been generating data since the early 2000s. Furthermore, genomics data is not yet routinely collected for every cancer patient, which may lead to differences in cohort size as well. Additionally, the origins of DL in histopathology and genomics may play a role in their adoption within these fields. While DL for medical imaging evolved from successful applications in computer vision, its use for sparse, tabular genomic data was less prevalent, facing substantial competition from traditional bioinformatics tools. Another crucial point to consider is the human interpretability of histopathological images compared to genomic information. The human eye can detect distinct patterns in histology, which form the basis for patient diagnosis, making it less abstract and more intuitive than genomic data. As a reflection of this, the field of explainable AI is emerging, aiming to elucidate the black-box properties of DL models. For histopathology, pixel- or region-wise attention maps [44] can be employed to display important areas in the input images. Clinical applications of DL in genomics, on the other hand, are less favored, as the relationships between specific genes and model outputs are often not interpretable. Here, attribution methods like SHAP values [45] can be applied to highlight the most influential features of the data, but this is effective only when the input features are already human comprehensible. 
Consequently, the application of DL in a medical context may be more straightforward for image-based fields, where practitioners can directly associate the model’s attention with a biological rationale. Furthermore, institutional biases could be stronger for genomic data than for histopathology. Protocols and techniques might lead to more substantial data compatibility issues in genomics than in histopathology. Lastly, legal hurdles could make the distribution of genomic data more challenging than in pathology. Collectively, these factors could contribute to the observed lower utilization of DL in genomics.
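The idea behind the attribution methods mentioned above can be illustrated by computing exact Shapley values, the quantity that SHAP approximates, for a toy model. The sketch below does not use the SHAP library; it enumerates all feature orderings, which is only feasible for a handful of features. The model, its three inputs (which could be thought of as hypothetical gene-level features) and the all-zero baseline are assumptions made purely for illustration.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features are switched from their baseline value to their actual value
    one at a time, in every possible order, and each feature's marginal
    contributions are averaged over all orders.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]
            output = model(current)
            totals[i] += output - previous_output
            previous_output = output
    return [t / len(orderings) for t in totals]


def toy_model(features):
    """Hypothetical model with two additive terms and one interaction."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * c


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is split between the two features involved. Such numbers are only informative if each input feature itself has a meaning a practitioner can interpret, which is the limitation noted in the text.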

Throughout the years, a common theme across all three fields is their expansion in DL. DL in histopathology emerged around the same time as DL in genomics, but in its early years, it primarily focused on basic applications such as diagnosis, grading, and subtyping. This trend could be attributed to the fundamental role that histopathology plays in the medical workflow, forming the basis for these applications. Typically, the initial step in diagnosing cancer patients involves pathology, with genomic data often obtained post-diagnosis, thus rendering its use for genomic-DL in diagnosis redundant in most cases. Additionally, obtaining genomic data from sequencing technologies is more costly than the preparation of H&E tissue slides. This economic factor might contribute to the usage pattern, where genomic data is reserved for challenging cases and advanced applications, while histopathology suffices for diagnosis and basic biomarkers. This trend could explain why genomics predominantly finds application in advanced tasks such as prognosis, prediction of drug response, and mutation prediction. Genomics can potentially offer more profound insights into molecular mechanisms within cancer cells necessary for these advanced tasks. Our observation indicates that drug response was infrequently the objective of DL in histopathology, while it was more extensively covered in genomics. This underlines the possibility that information derived from histopathology might not be sufficient for precise predictions. On the other hand, drug response predictions in genomics were mostly carried out in relation to pharmacogenomics in which cancer cell lines were used. Predicting drug response using fixed cell lines in genomics may be simpler as it lacks the added complexity of real-life scenarios in histopathology, such as the tumor microenvironment and other factors. 
Regardless, we expect that the integration of DL in both fields will continue to increase, with vast potential for further growth in the future.

A distinct feature of DL in histopathology is the diverse range of cancer tissues studied. In contrast, DL studies in genomics primarily focus on pan-cancer approaches, only occasionally addressing prevalent cancer types such as breast or liver. This could be attributed to the questions posed in genomics, which often span cancer types and might therefore necessitate pan-cancer studies. Moreover, genomic data might be less reliant on the specific cancer type and exhibit more consistency. Evidence for this can be seen in the similar molecular alterations, like driver mutations, which are active across different cancer types. This could facilitate the aggregation of different tissue types into larger pan-cancer studies. Nevertheless, genomics could be used in the future to address specific cancer tissues, making predictions more precise and deepening the understanding of molecular alterations in these cancer types. Regardless of the rarity of cancer types, patients could still benefit from DL applications. On the other hand, WSIs are highly tissue specific. This means that the absence of pan-cancer studies can be explained by substantially different visual patterns of various cancer types. For instance, breast cancer and brain cancer appear significantly different histologically, as the underlying tissue architecture varies. The patterns recognized by the DL algorithm could become confusing or even contradictory when combined, which may hamper the model’s performance. Consequently, to yield the highest-performing models, DL in histopathology might prefer to keep tissue types separated. However, for future medical applications, it may become necessary to develop models applicable to the broadest possible patient group. Therefore, building DL models that encompass diverse cancer types is arguably more feasible for clinical use than focusing on a single cancer type.

In the realm of multimodality, we observed that the most prominent application was prognosis prediction. Given the complexity of securing a large sample size that includes both genomic and histopathologic data, researchers might have prioritized addressing key challenges such as predicting survival rates or individual patient risks. The articles demonstrated that the combination of histopathology and genomics can exploit synergies between them and make DL predictions more reliable. Moreover, the integration of synergistic data could enable a direct progression to advanced tasks, potentially circumventing the initial biomarker detection stage. Yet, it is unclear how interactions between modalities affect the predictions of these models. In turn, this raises an important question: Is a comprehensive understanding of the model’s inner workings necessary for its clinical deployment, or is exceptional performance justification enough? Although this question remains open to debate in the scientific community, we anticipate substantial growth in multimodality in the coming years, since the field’s relative novelty is paired with growing data sources and immense medical relevance.

In all three areas, there remains a diverse range of unexplored cancer tissues and application combinations. While DL for histopathology needs to enhance its pan-cancer comprehension, DL for genomics must work towards refining its approaches for specific cancer types. For multimodality, we found the most significant gaps concerning advanced applications, with no publications discovered for drug response and only one for mutation prediction. This presents a substantial opportunity for future researchers to address these vital questions. By understanding how DL models work and elucidating connections between both data types, novel knowledge could be uncovered. This newfound understanding of the interplay between the genome and tissue morphology could potentially shift our perception of fundamental biological processes behind cancer development.

Conclusion

In this review, we have explored applications of DL in histopathology and genomics. Evidently, the rise of DL in these fields began in the 2010s and maintained a steady growth trajectory. In the realm of histopathology, DL has found numerous applications spanning basic and advanced topics. Initially, the primary focus was on diagnosis, which then broadened to include prediction of phenotypic biomarkers such as cancer grade and subtypes. Over time, the scope extended to include molecular biomarkers, and ultimately evolved to encompass prognosis and drug response prediction. This diversification presents opportunities for more focused research on rare cancer entities, thereby enlarging our understanding of them. Conversely, the application of DL in genomics is currently less prevalent than in histopathology. The trend in genomics has leaned towards pan-cancer approaches with only a few publications investigating specific cancer types. However, there is a need to develop more cancer-type-specific diagnostic tests and prognostic biomarkers, thus paving the way for more personalized cancer care. Lastly, multimodal DL is a relatively new area that brings together data from both previously mentioned fields. Multimodal approaches have demonstrated the potential to outperform single-modality models, signifying their promising future. The synthesis of data from diverse sources, such as histopathology images and genomic sequences, offers a more comprehensive view of the disease, potentially leading to more accurate and clinically actionable insights. In conclusion, the dynamic evolution of DL in medical research, particularly in histopathology and genomics, underlines its potential in fostering breakthroughs in our understanding of diagnosis and treatment of cancer. Nevertheless, a considerable scope for further exploration and advancement remains. 
As the fields continue to grow and technology continues to improve, we expect that DL will play an increasing role in shaping the landscape of precision medicine.

Availability of data and materials

All data generated or analyzed during this study are included in this published article (and its supplementary information files).

Abbreviations

AI: Artificial intelligence
DL: Deep learning
DNA: Deoxyribonucleic acid
HRD: Homologous repair deficiency
H&E: Hematoxylin and eosin
PARP: Poly-ADP-Ribose-Polymerase
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
TCGA: The Cancer Genome Atlas
WHO: World Health Organization
WSI: Whole slide image

Acknowledgements

BioRender.com was used to generate Figs. 1 and 2.

Funding

Open Access funding enabled and organized by Projekt DEAL. JNK is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111), the Max-Eder-Programme of the German Cancer Aid (grant #70113864), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; SWAG, 01KD2215A; TRANSFORM LIVER, 031L0312A; TANGERINE, 01KT2302 through ERA-NET Transcan), the German Academic Exchange Service (SECAI, 57616814), the German Federal Joint Committee (Transplant.KI, 01VSF21048), the European Union’s Horizon Europe and innovation programme (ODELIA, 101057091; GENIAL, 101096312) and the National Institute for Health and Care Research (NIHR, NIHR213331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

Author information

Authors and Affiliations

Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
Michaela Unger & Jakob Nikolas Kather
Department of Medicine I, University Hospital Dresden, Dresden, Germany
Jakob Nikolas Kather
Pathology & Data Analytics, Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, UK
Jakob Nikolas Kather
Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
Jakob Nikolas Kather

Contributions

MU and JNK jointly conceived this study. MU performed the analyses. MU and JNK jointly wrote the manuscript. As per the ICMJE/COPE guidelines of April 2023, we hereby disclose that the following tools were used to write this article: Microsoft Word and Google Documents as word-processing software, and ChatGPT-4 for checking and correcting spelling and grammar.

Corresponding author

Correspondence to Jakob Nikolas Kather.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

JNK declares consulting services (scientific advisory board member) for Owkin, France; DoMore Diagnostics, Norway; and Panakeia, UK, and has received honoraria for lectures from Bayer, Eisai, MSD, BMS, Roche, Pfizer and Fresenius. MU does not have any competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Supplementary Table 1. Histopathology Papers. For all papers title, authors, year, journal and PubMed-URL, category, subcategory and cancer tissue are listed.
Supplementary Table 2. Genomic Papers. For all papers title, authors, year, journal and PubMed-URL, category, subcategory and cancer tissue are listed.
Supplementary Table 3. Multimodal Papers. For all papers title, authors, year, journal and PubMed-URL, category, subcategory and cancer tissue are listed.
Supplementary Table 4. All Publications. For all papers title, authors, year, PubMed-URL and decision are listed.
Supplementary Table 5. Timewise Publication Counts. Raw counts of publications grouped by year and tissue type needed to generate Figure 2.
Supplementary Material. Data selection of this study. Publications were collected from PubMed in nine search queries obtaining 3048 results. All papers were then uploaded to Rayyan to manually filter and classify them down to a total number of 534 articles used for this study.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Unger, M., Kather, J.N. A systematic analysis of deep learning in genomics and histopathology for precision oncology. BMC Med Genomics 17, 48 (2024). https://doi.org/10.1186/s12920-024-01796-9

Received: 03 August 2023
Accepted: 02 January 2024
Published: 05 February 2024
DOI: https://doi.org/10.1186/s12920-024-01796-9