Mapping the genomic diaspora of gastric cancer

REVIEWS | FEBRUARY 2022 | VOLUME 22 | www.nature.com/nrc

Gastric cancer (GC) is a leading contributor to global cancer mortality, being responsible for >700,000 deaths annually1–3. GC incidence is geographically distinct across the world, with a high prevalence in Asia, Africa, South America and eastern Europe4. In Western countries, while absolute GC incidence has steadily declined over past decades5, the number of cases in the proximal stomach appears to be increasing6. These geographical differences have been ascribed to various factors, including Helicobacter pylori infection, lifestyle factors (for example, alcohol, smoking) and genetic risk factors7.

GC is a heterogeneous disease, with patients routinely stratified by clinical stage and histopathological variants. Prior to the genomics era, the most widely applied classification scheme was proposed by Lauren, classifying GCs into ‘intestinal’ and ‘diffuse’ subtypes8. Intestinal-type GC is associated with H. pylori infection and glandular or papillary differentiated structures9, while diffuse-type GC typically comprises poorly cohesive, dedifferentiated tumour cells amidst a copious cellular stroma10. Besides that of Lauren, other histopathological classifications have been proposed11–14. However, unlike clinical staging (tumour size, lymph node status, presence of metastasis (TNM)), histopathological variation is currently not routinely used to dictate GC treatment and management.

Motivated by the need for a clinically informative GC taxonomy, pioneering genomic studies by our group and others in the earlier half of this decade uncovered many GC-related genes15–18. GC driver genes can be broadly divided into two classes: those frequently altered in multiple tumour types (for example, TP53, ARID1A, ERBB2 (also known as HER2), FGFR2) and others that are more tissue-restricted and lineage-restricted (for example, CDH1, RHOA, GATA factors)19,20.
Seminal studies by The Cancer Genome Atlas (TCGA) consortium and the Asian Cancer Research Group (ACRG) have also defined specific molecular subtypes of GC21,22 as extensively reviewed elsewhere23,24. Using DNA-based alterations, the TCGA consortium reported four major GC molecular subtypes, exhibiting chromosomal instability (CIN), microsatellite instability (MSI), genome stability (GS) or Epstein–Barr virus (EBV) positivity. Driver pathways associated with these four subtypes include receptor tyrosine kinase (RTK) gene or KRAS amplifications and TP53 mutations in CIN GCs, high tumour mutation burden and MLH1 methylation silencing in MSI GCs, CDH1 and RHOA mutations or ARHGAP fusions in GS GCs, and genome-wide DNA hypermethylation, PIK3CA mutations, and PDL1 or PDL2 overexpression in EBV+ GCs21 (Fig. 1a). Likewise, using transcriptomic

Khay Guan Yeoh1,2,3 and Patrick Tan3,4,5,6 ✉

Abstract | Gastric cancer (GC) is a leading contributor to global cancer incidence and mortality. Pioneering genomic studies, focusing largely on primary GCs, revealed driver alterations in genes such as ERBB2, FGFR2, TP53 and ARID1A as well as multiple molecular subtypes. However, clinical efforts targeting these alterations have produced variable results, hampered by complex co-alteration patterns in molecular profiles and intra-patient genomic heterogeneity. In this Review, we highlight foundational and translational advances in dissecting the genomic cartography of GC, including non-coding variants, epigenomic aberrations and transcriptomic alterations, and describe how these alterations interplay with environmental influences, germline factors and the tumour microenvironment. Mapping of these alterations over the GC life cycle in normal gastric tissues, metaplasia, primary carcinoma and distant metastasis will improve our understanding of biological mechanisms driving GC development and promoting cancer hallmarks.
On the translational front, integrative genomic approaches are identifying diverse mechanisms of GC therapy resistance and emerging preclinical targets, enabled by technologies such as single-cell sequencing and liquid biopsies. Validating these insights will require specifically designed GC cohorts, converging multi-modal genomic data with longitudinal data on therapeutic challenges and patient outcomes. Genomic findings from these studies will facilitate ‘next-generation’ clinical initiatives in GC precision oncology and prevention.

1 Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
2 Department of Gastroenterology and Hepatology, National University Health System, Singapore, Singapore.
3 Singapore Gastric Cancer Consortium, Singapore, Singapore.
4 Cancer and Stem Cell Biology, Duke-NUS Medical School Singapore, Singapore, Singapore.
5 Genome Institute of Singapore, Singapore, Singapore.
6 Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore.
✉ e-mail: gmstanp@duke-nus.edu.sg
https://doi.org/10.1038/s41568-021-00412-7

Nature Reviews | Cancer, volume 22 | February 2022 | 71

data, the ACRG reported that GCs could be divided into TP53-active and TP53-inactive, mesenchymal-like, and MSI subtypes. Independent studies, including those from our group, have confirmed the mesenchymal GC subtype25,26. While clearly representing milestones in the field, these studies have also had limitations, such as a focus on surgically resected primary tumours, and little being known about the presence and distribution of these alterations in pre-malignant gastric lesions or at metastatic sites. The TCGA study, by prioritizing samples with high tumour content, may also have de-emphasized the role of stromal cell types, which are now appreciated as critical components of the tumour ecosystem27. Similarly, widespread applicability of the ACRG transcriptomic classifier has been challenged by issues of data normalization, platform differences, data centering and the need for robust single-sample predictor algorithms28,29.

In this Review, we describe recent foundational and translational advances in GC genomics and epigenomics and highlight how these findings have improved our understanding of basic mechanisms driving GC biology. While our Review does not focus on the direct clinical actionability of genomic targets or treatment recommendations, it is nevertheless worth noting recent clinical studies stratifying patients with GC into clinical trials based on their genomic aberrations (reviewed in refs 24,30) (Supplementary Table 1). Conducted as either single-biomarker-specific trials31–34 or as ‘umbrella’ trials (VIKTORY, PANGEA, GI-SCREEN), where multiple biomarkers are profiled for each tumour35–37, most of these studies have reported relatively modest improvements in patient outcomes.
[Figure 1 panel labels]
a | Genetic changes
Non-coding alterations
• Recurrent indels in gastric lineage-specific genes (e.g. PGC, LIPF, MUC6)
• Hotspot mutations in CTCF binding sites involving AT>CG and AT>GC substitutions
• Hotspot tandem duplications cause ‘enhancer hijacking’ and high oncogene expression (e.g. IGF2, CCNE1)
Mutational signatures
• Tissue agnostic (e.g. signature 3 homologous recombination defects, platinum or PARPi therapy)
• Tissue specific (e.g. signature 17 gastric acid reflux?)
TCGA molecular subtypes
• EBV
• Microsatellite instability
• Genome stability
• Chromosomal instability
b | Epigenetic changes
• DNA methylation (e.g. miR-124a-3, EMX1 as risk predictors of metachronous GC)
• Histone modifications (e.g. H3K27ac, H3K4me3, H3K4me1 to profile promoters and enhancers)
• Alternative promoter usage
• Host–EBV interactions
• Host enhancer activation
c | RNA-based tumour alterations
• Tumour-associated splicing
• RNA editing and A-to-I base-pair changes
• N6-methyladenosine modifications via METTL3 and YTHDF1

Fig. 1 | Discovery of non-coding genomic alterations, epigenetic modifications and RNA alterations in GC. a | Delving into the non-coding genome has identified novel mechanisms impacting gastric cancer (GC)-related genes, such as recurrent insertion-deletions in tissue-specific genes and structural variants including tandem duplications. Mutational signatures can provide clues about prior cellular insults and potential therapeutic options (signatures linked to homologous recombination deficiency and microsatellite instability). b | Epigenetic modifications encompass DNA methylation, histone marks and, more recently, the pervasive use of alternative promoters and Epstein–Barr virus (EBV)–host interactions resulting in abnormal chromatin states and gene expression. Helicobacter pylori has been shown to cause DNA methylation epigenetic field cancerization. c | RNA alterations, including splicing changes, RNA editing and RNA modifications, are emerging contributors to the transcriptomic complexity of GC and potential drivers of GC biology. PARPi, poly(ADP-ribose) polymerase inhibitor; TCGA, The Cancer Genome Atlas.

Helicobacter pylori. A bacterium capable of living in the stomach and associated with increased gastric cancer risk.
Stroma. Cell types present in a tumour but not cancer epithelial cells, such as fibroblasts, blood vessels and immune cells.
Driver gene. A gene where mutations functionally contribute to the development of cancer.
Chromosomal instability (CIN). Defective control of chromosomal number and structure.
Microsatellite instability (MSI). Defective replication of specific repeat-regions (microsatellites) in the genome.
Metachronous cancers. Cancers that occur 6 months after resection of the primary cancer.
Clones. Set of cells with shared genetic traits among a larger field of cells.

Factors contributing to the latter include high levels of inter-patient and intra-patient genomic heterogeneity38,39, the reliance on archival primary sample profiles to inform the treatment of metachronous cancers, differences in tumour molecular profiles between primary and metastatic sites (see later section), and the emergence of resistant tumour clones following the death of sensitive clones as also shown in other tumour types40. However, some promising signals have also emerged from these studies, such as relationships between high MET copy numbers in circulating tumour DNA (ctDNA) and response to the MET inhibitor savolitinib36. Several of these studies have also demonstrated how integrative genomic profiling can identify mechanisms of baseline or acquired treatment resistance. For example, patients with ERBB2-positive GCs (IHC3+ or IHC2+/FISH+) have better outcomes when treated with anti-HER2 therapies. However, there is still considerable variability in overall responses (90% progressing by 24 months)31, and upfront and acquired resistance to ERBB2 targeting is not uncommon (47% objective response rate). Genomic profiling of patients treated with ERBB2-targeting agents (trastuzumab, lapatinib, afatinib) has shown that resistance can occur by diverse mechanisms, such as RTK and/or cell cycle regulator co-amplification (EGFR, MET, CCNE1),

selection of ERBB2-negative resistant tumour subclones, acquired RAS and PIK3CA mutations, and secondary ERBB2 mutations38,41,42. Studies targeting other genomic alterations, such as FGFR2, MET and PIK3CA, have yielded similar findings43–45. These findings argue that single-target-based approaches are unlikely to prove widely applicable to GC and that a deeper understanding of GC at the systems level is required to improve the outcomes of patients with GC, for example, by identifying promising therapeutic combinations46. It is thus our hope that a comprehensive understanding of the (epi)genomic cartography of GC, with the associated causative mutational processes and downstream impact on cancer driver pathways, will reveal novel translational opportunities to improve the management of patients with GC and interception efforts for early detection and prevention.

Beyond protein coding genes

Non-coding genetic alterations. Previous GC genomic surveys focused largely on point mutations (for example, PIK3CA, TP53) or chromosomal aberrations affecting protein coding genes (for example, ERBB2 amplification, ARHGAP fusions). However, protein-encoding genomic regions (exomes) represent only 1–2% of the entire human genome and our knowledge of somatic events affecting the remaining 98–99% of non-coding regions remains scanty. Recent whole-genome sequencing studies of GC have enabled the identification of non-coding genomic alterations. We analysed 212 primary GC whole-genome sequences using an analytical framework controlling for >200 covariates influencing regional mutation rates (for example, local sequence context, chromatin profiles). This study identified two classes of frequent non-coding mutations in GC (‘hotspots’) occurring significantly above background mutation levels47 (Fig. 1a).
The first class included insertion-deletion sequences (indels) in the intronic regions of gastric lineage-specific genes, such as PGC, LIPF and MUC6, in 18% of non-MSI GCs. Parallel pan-cancer studies, published around the same period, showed a similar relationship between non-coding somatic indels and highly expressed tissue-specific genes (for example, surfactant genes in lung cancer)48. Transcriptional analyses of these lineage-specific genes did not reveal significant changes between GCs with and without indel mutations, making it likely that these non-coding indel mutations are passenger rather than driver events47. However, supporting the activity of an as-yet unknown mutational process linking cell-lineage pathways to cancer mutability, these non-coding indels are associated with specific sequence motifs, chromatin modifications and, possibly, transcription-associated mutagenesis48. From a translational perspective, the tissue-specific nature of these non-coding mutations could be exploited in situations such as carcinoma of unknown primary to determine cancer tissues-of-origin49.

The second class of non-coding mutations comprises frequent non-coding hotspot mutations in genomic regions associated with CTCF binding sites (CBSs), occurring in 25% of GCs47 (Fig. 1a). CTCF encodes a chromatin regulator protein necessary for the establishment of topologically associated domains (TADs)50,51. Interestingly, CBS hotspot mutations have also been reported in other gastrointestinal malignancies such as colorectal cancer52. Work is ongoing to clarify if CBS hotspot mutations are driver or passenger events caused by an underlying mutational process that remains poorly understood. In GC, CBS hotspot mutations are enriched in CIN GCs, marginally linked to gene expression alterations at nearby genes, and associated with local chromosomal breakpoints47.
This latter point is intriguing given the function of CTCF in establishing TADs, as higher-order genomic interactions are strong determinants of regions susceptible to chromosomal breaks53.

An independent whole-genome analysis of somatic structural variants in 168 primary GCs reported a new subtype exhibiting high levels of tandem duplications (TDs), another class of non-coding genomic alterations54 (Fig. 1a). This ‘tandem duplicator phenotype’, present in 14% of GCs, has also been observed in triple-negative breast, serous ovarian and endometrial cancers55,56. However, unlike the latter tumour types, tandem duplicator phenotype positivity in GC does not appear to be associated with TP53 mutations or BRCA status. In cancer, TDs are generally notable as they can disrupt TAD boundaries, relocating enhancers normally insulated from oncogenes into the same TAD, thereby causing oncogene activation57, a phenomenon termed ‘enhancer hijacking’. Reflecting the importance of TDs in promoting GC, several GC TDs were concentrated at known GC driver genes such as ERBB2, MYC and KLF5 (ref. 54). This study also demonstrated the utility of studying TDs for cancer gene discovery by pinpointing new GC-related genes such as the gene encoding the RNA-binding protein ZFP36L2. In a similar vein, by paired-end H3K27ac ChIP-sequencing we found that enhancer hijacking is a relatively common mechanism in GC, driving the high expression of oncogenes such as IGF2, CCNE1 and CCND1 (ref. 58). In our study, CCNE1 enhancer hijacking was observed in 8% of GCs, caused by multiple chromosomal rearrangements juxtaposing enhancers from diverse genomic regions to the CCNE1 gene body. This observation may have implications for GC therapeutic resistance as high CCNE1 levels have been associated with resistance to ERBB2-targeting agents in GC and other tumour types59,60.

Mutational signatures.
Another class of genomic alterations more readily captured by whole-genome sequencing than exome sequencing are ‘mutational signatures’, comprising sequence-specific mutation patterns across the genome caused by mutational processes in human cancers61 (Fig. 1a). Prevalent mutation signatures in GC include signature 1 (associated with ageing and 5-methyl cytosine deamination), signature 2 (APOBEC activity) and signature 5 (aetiology currently unknown). These signatures are also present in other tumour types and are likely associated with intrinsic biological processes. However, GCs have also been shown to harbour tissue-specific signatures that are possibly the result of extrinsic factors such as signature 17 (T>G substitutions in a CTT context) and signature 18 (G:C>T:A transversions)62.

IHC3+. A measure of the intensity of immunohistochemical staining (3+: high; 2+: moderate; 1+: weak).
Covariates. Factors that directly or indirectly influence the variable of interest.
Chromatin modifications. Chemical modifications occurring at precise positions in DNA–protein complexes and necessary for gene regulation.
Topologically associated domains (TADs). Three-dimensional chromatin domains involved in regulating gene expression.
Tandem duplications (TDs). A mutational process where genomic regions are copied adjacent to one another.
Enhancer. A non-coding region of the genome that can activate the expression of target genes.
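To make the notion of a sequence-specific mutation pattern concrete, the following is a minimal Python sketch of how a single base substitution can be assigned to a trinucleotide-context channel, the unit that signature analyses count. It assumes the widely used convention of reporting each substitution from the pyrimidine (C or T) strand; the function names `reverse_complement` and `sbs_channel` are hypothetical and are not taken from any study cited in this Review.

```python
# Illustrative sketch only: helper names are hypothetical, and the
# pyrimidine-strand convention is an assumption not stated in the Review.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs_channel(context: str, alt: str) -> str:
    """Label a substitution by its trinucleotide context.

    context: reference trinucleotide (5'->3') centred on the mutated base.
    alt: the mutant base observed at the central position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt allele")
    if any(base not in COMPLEMENT for base in context + alt):
        raise ValueError("only A, C, G and T are supported")
    ref = context[1]
    if ref == alt:
        raise ValueError("alt allele must differ from the reference base")
    if ref in "AG":
        # Purine reference base: report the event on the opposite strand
        # so that the mutated base is always C or T.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


if __name__ == "__main__":
    # T>G in a CTT context, the pattern described for signature 17.
    print(sbs_channel("CTT", "G"))
    # The same event observed on the opposite strand (A>C in AAG).
    print(sbs_channel("AAG", "C"))
    # A G>T change, which is reported as C>A on the pyrimidine strand.
    print(sbs_channel("TGA", "T"))
```

Counting these channel labels across all somatic substitutions in a tumour yields the mutation spectrum from which signatures are inferred; the inference step itself is beyond this sketch.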

Signature 18, largely present in the non-coding genome, has been linked to oxidative damage and base excision repair defects63. While gastric acid reflux has been proposed as a potential cause of signature 17 (refs 64,65), this remains to be definitively proven. Establishing the definitive causes of these signatures, perhaps using defined experimental models66, represents an important area for future GC genomic research as the gastric epithelium can encounter several different genotoxic and pro-inflammatory metabolites, including unconjugated bile acids (for example, deoxycholate) and nitrosamines67,68. Identifying instigator compounds underlying these ‘extrinsic’ signatures may suggest strategies to modulate their mutagenic effects.

The clinical impact of mutational signatures in GC is under active investigation. Evidence is emerging that mutational signatures in primary and metastatic GCs may differ (see next section). Analysis of over 470 GC exome and whole-genome sequences revealed that 7–12% of GCs harbour a mutational signature associated with homologous recombination defects (HRDs), which may sensitize GCs bearing these signatures to platinum therapy or poly(ADP-ribose) polymerase inhibitors (PARPi)69. Stratifying patients by HRD mutational signatures for PARPi may prove promising in light of recent trials reporting a lack of overall PARPi efficacy in unselected GC populations70. The HRD signature is enriched in GCs with inactivating germline or somatic mutations, or epigenetic silencing, of genes regulating homologous DNA recombination such as BRCA1, BRCA2, PALB2 and RAD51C71,72. GC samples carrying HRD signatures also tend to be associated with CIN, intestinal-subtype GC and focal somatic copy number alteration events69,71.

Other notable GC mutational signatures are those associated with defective DNA mismatch repair (signatures 15, 20)73 (Fig. 1a). GCs exhibiting mismatch repair signatures, which tend to be MSI-positive, are distinct from GCs with high HRD signature levels. Compared to other GCs, MSI-positive GCs exhibit highly distinct associations between mutation patterns and epigenetic features, such as open chromatin marks or replication timing47, which may be caused by the loss of mismatch-coupled repair at early-replicating open chromatin regions. Analysis of clinical trial samples suggests that, in GC, MSI positivity is associated with improved prognosis and response to immune-checkpoint inhibitors (ICIs) (see next section) but paradoxically predictive of lack of benefit to standard chemotherapy38,74,75. The resistance of MSI GCs to chemotherapy may be related to chemotherapeutic drugs requiring components of an intact DNA mismatch repair machinery to exert their cytotoxic effects or to the suppression of the immune system by chemotherapy76,77.

Epigenetic and RNA regulation. Another category of GC alterations affecting non-coding genomic regions is epigenetic alterations, which can influence the expression of oncogenes and tumour suppressor genes78 (Fig. 1b). Similar to genomic alterations, epigenetic alterations are transmitted through cell divisions and preserved during DNA replication, thereby allowing epigenetic aberrations to propagate during clonal outgrowth. Previous epigenetic studies in GC focused largely on DNA methylation alterations at CpG islands, often associated with transcriptional gene silencing. Aberrant DNA methylation in GC can be caused by exposure to extrinsic agents such as H. pylori or EBV. Chronic inflammatory processes induced by H. pylori can result in widespread DNA hypermethylation and hypomethylation across the stomach epithelium (‘epigenetic field cancerization’)79,80.
These DNA methylation changes can occur at regulatory regions (for example, TF binding motifs, CpG islands or gene promoters of cancer-related genes) and repeat elements, effectively remodelling normal gastric methylomes81,82. Notably, a proportion of these alterations are irreversible even after H. pylori eradication83, suggesting that DNA methylation levels may be harnessed as a biomarker to predict GC risk. This was confirmed in a recent prospective study showing that increased DNA methylation levels of marker genes, such as miR-124a-3 and EMX1, can be used to predict the risk of metachronous GC after initial surgical resection84,85 (Fig. 1b). In terms of molecular subtypes, EBV-positive GCs have been reported to exhibit a profoundly high level of genome-wide DNA methylation86, and certain GCs can also exhibit a CpG island methylator phenotype (CIMP) even in the absence of EBV infection87.

More recently, GC epigenetic studies have focused on other chromatin modifications, such as histone alterations (for example, H3K27ac, H3K4me3 and H3K4me1), that bookmark enhancers and promoters88,89. In a genome-wide survey of promoter activity in GC, we found that alternate promoter selection is a pervasive feature of GCs90 and this phenomenon has been subsequently validated in other tumour types by our group and others91,92 (Fig. 1b). We have proposed that alternate promoter usage may be utilized by emerging tumours to avoid the host immune system, thereby representing a novel immune-editing process91,92. Alternative promoter usage can also produce tumour-associated gene isoforms93. We also performed another study investigating higher-order chromatin topologies in GC, finding that EBV-positive GCs have a specific 3D topological architecture established through cross-species chromatin interactions between EBV episomes and the host GC genome94.
These EBV–human chromatin interactions may function to reactivate latent enhancers and drive EBV-positive GC by increasing proto-oncogene expression (Fig. 1b). Notably, despite stability across cell divisions, epigenetic modifications are also conceptually targetable by virtue of their associated enzymatic readers, writers and erasers. The development of novel epigenetic compounds to target GC, either as monotherapy or in combination with other therapeutics, thus comprises an important future area of research95,96.

Homologous recombination. A repair process where double-stranded DNA breaks are corrected using similar or identical DNA molecules in the cell.

An emerging area of investigation in GC is RNA-associated alterations (Fig. 1c). Previous GC RNA studies focused on microRNAs (miRNAs) and long non-coding RNAs97,98. These RNA species can be oncogenic (for example, the long non-coding RNA ZFAS1 promotes GC proliferation by suppressing KLF2 and NKD2 (ref. 99)) or tumour inhibitory (for example,

miR-35-5p and miR-584-3p suppress GC progression or metastasis)100,101. Transcriptome-focused studies are now highlighting the presence of other levels of RNA alterations in GC. These include tumour-associated alternative splicing102,103, A-to-I base-pair alterations caused by RNA editing104, and RNA-based modifications such as N6-methyladenosine (m6A)105. The extent to which these alterations represent true cancer driver events remains to be fully established since, unlike DNA and epigenetic alterations, RNA alterations may by themselves not be heritable and should thus be interpreted with caution. Nevertheless, m6A, the most common RNA methylation, has been recently shown to be significantly increased in GC and mediated through the METTL3 writer, stabilizing mRNA transcripts of genes, such as HDGF and ZMYM1, that drive GC aggressiveness105,106. An m6A reader, YTHDF1, has also been shown to promote GC carcinogenesis by increasing the translation of the Wnt receptor frizzled 7 (FZD7)107. Notably, RNA alterations, such as alternative splicing and m6A modifications, cannot be readily detected through conventional DNA-only analysis. Thus, if functionally validated as GC drivers, such findings raise the importance of profiling GCs at levels beyond the genome to fully elucidate regulators of GC biology.

Beyond early-stage tumours

Evolution of GC from pre-malignancy. Understanding how (epi)genomic changes evolve over the complete GC life cycle, from normal, metaplasia (pre-malignancy), early-stage tumours to late-stage metastases is an enduring goal of GC genomics. At one end of the spectrum are lesions associated with gastric pre-malignancy. The ‘Correa cascade’ posits that intestinal-type GC emerges from normal gastric epithelia, progressing to chronic gastritis, intestinal metaplasia (IM), dysplasia and GC108 (Fig. 2).
Among these, IM is a pre-malignant condition where the gastric epithelial lining is replaced by cells exhibiting intestinal-like phenotypes (Fig. 2). Although patients with IM have an estimated GC risk of 0.129–0.3% per year109,110, many more individuals have IM compared to GC. IM heterogeneity is supported by studies describing distinct IM histopathological categories, with patients with incomplete (or type III) IM having an increased risk of progression111. Thus, identifying the subset of patients with IM that will eventually progress to full malignancy may identify ‘very-high-risk’ individuals for intensive endoscopic surveillance112. In Singapore, the Gastric Cancer Epidemiology Program recruited 2,980 individuals with multiple risk factors for GC (Chinese, age ≥50 years, with history of H. pylori infection and/or known pre-malignant gastric lesions such as atrophic gastritis and intestinal metaplasia)113. Each participant was prospectively followed for 5 years with scheduled surveillance endoscopies at years 3 and 5 (ref. 113). At completion of the follow-up period, 21 patients had early gastric neoplasia (dysplasia or carcinoma). In a pilot study, we analysed 108 high-risk IM samples from the Gastric Cancer Epidemiology Program and discovered that features of large-scale chromosomal instability, such as reduced telomere lengths and somatic copy number alterations, were predictive of IM progression to GC112.

[Figure 2 labels: stomach regions (cardia, fundus, corpus, pyloric antrum); gastric gland zones (gastric pit, isthmus, neck, base); cell types (mucous, stem, parietal, chief, endocrine, Paneth, enterocyte, goblet and neuroendocrine cells); stages (H. pylori, normal gastric mucosa, chronic to atrophic gastritis, intestinal metaplasia, dysplasia, intestinal-type GC, diffuse-type GC); evolution of GC (aberrant DNA methylation; ↑ chromosomal instability; ↓ telomere length; ↑ copy number alterations; identification of high-risk individuals).]

Fig. 2 | Temporal evolution of GCs from pre-malignancy to metaplasia and carcinoma. The Correa cascade posits that normal gastric cells are transformed to malignancy via a stepwise process starting from Helicobacter pylori infection of normal gastric mucosa to chronic-atrophic gastritis, intestinal metaplasia, dysplasia and, eventually, intestinal-type gastric cancers (GCs). The pathogenesis of diffuse-type tumours remains poorly understood but may also require H. pylori exposure190. Genetic (for example, chromosomal instability, reduced telomere length, copy number alterations) and epigenetic (for example, DNA methylation) changes are associated with transition to carcinoma. Intestinal metaplasia may represent a ‘point of no return’, signalling increased cancer risk and involving histological changes (for example, presence of intestinal cell types).
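As a rough illustration of what a per-year risk in the 0.129–0.3% range implies over a surveillance window, the following Python sketch converts an annual risk into a cumulative risk over several years. It assumes a constant and independent risk in every year, which is a simplification introduced here for arithmetic purposes and is not a model used in the cited studies; the function name `cumulative_risk` is hypothetical.

```python
# Illustrative arithmetic only. The constant-annual-risk assumption is a
# simplification made for this sketch, not a claim from the cited studies.
def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of at least one event within `years`, given a constant
    and independent per-year probability `annual_risk`."""
    if not 0.0 <= annual_risk <= 1.0:
        raise ValueError("annual_risk must be a probability between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Chance of remaining event-free every year, subtracted from 1.
    return 1.0 - (1.0 - annual_risk) ** years


if __name__ == "__main__":
    # Lower and upper bounds of the annual risk range quoted for IM,
    # evaluated over a 5-year follow-up period.
    for annual in (0.00129, 0.003):
        print(f"annual {annual:.3%} -> 5-year {cumulative_risk(annual, 5):.2%}")
```

Under this simplification the 5-year risk stays below 2% at both ends of the range, which is consistent with the point that most individuals with IM do not progress and that additional markers are needed to single out those who will.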

These results suggest that loss of regulation of chromosomal integrity (that is, aneuploidy) may represent a defining feature of patients with IM that eventually progresses to malignancy. In this same study, we also observed somatic mutations in normal gastric tissues relative to matched blood samples albeit to a lower extent than in IM samples. These results are consistent with previous data establishing the presence of sub-clonal mutations in normal gastric tissues114. Interestingly, mutation rates were correlated with age in normal gastric samples but not in IM samples, indicating the presence of additional mutational processes operative in IM.

Gene–environment interactions. Dissecting how environmental, germline and lifestyle factors interplay to drive GC development may improve our understanding of GC aetiology and how intrinsic and extrinsic factors interact to drive carcinogenesis (Box 1 summarizes the extrinsic and germline factors associated with GC). An emerging area of interest involves how tumour molecular phenotypes vary across different ethnic groups, given geographical variations in GC incidence. While stage-matched clinical outcomes are known to vary between ethnic groups with Asian patients showing better prognosis115–117, somatic alterations and driver gene profiles between Asian and non-Asian patients with GC appear to be similar118, although subtype frequencies may differ119. Suzuki et al. compared the GC exome profiles of 319 Asian patients versus 212 non-Asian patients120. This study identified a subtype of Asian-specific GCs where mutations were strongly dominated by mutational signature 16 (T>C mutations at ApTpN motifs). Notably, many Asian patients in this subgroup harboured a lifestyle history of alcohol intake and a combination of inactive germline ALDH2 alleles, a well-known Asian-specific germline polymorphism120–122.
While the exact aetiology of signature 16 is still unknown, it is possible that loss of germline ALDH2 activity in these patients, compounded with excessive alcohol intake, may result in increased levels of acetaldehyde, a by-product of ethanol and a known genotoxic agent123. Interestingly, this Asian-specific GC subtype also exhibited distinct compositions of immune cells, which may contribute to previously described differences in immune populations between Asian and non-Asian GCs124. This observation provides an elegant example of how germline factors (ALDH2) and lifestyle choices (alcohol intake) can interact to directly influence somatic mutations in GC (signature 16).

Molecular processes in metastatic GC. At the other end of the clinical spectrum are GCs occurring in the metastatic setting. Recent profiling studies of patients with late-stage GC or local/regional metastases have revealed important molecular differences compared with primary tumours. These include lower proportions in metastatic GCs of specific GC subtypes (MSI, EBV), enrichment of TP53 mutations, and less frequent mutations in driver genes such as KMT2C, PTPRD and CTNNB1 (refs 38,39). Analysis of GC profiles from peritoneal carcinomatosis (PC) has also suggested a higher degree of CDH1 mutations, increased proportions of ‘ageing’-related mutational signatures (signature 1), whole-genome doubling events and enhanced chromosomal instability125,126. Notably, these reports have also highlighted specific driver mutations in metastatic GCs previously unreported in early-stage GC studies such as mutations in genes encoding the transcription initiation factor TAF1 and the polymeric immunoglobulin receptor PIGR125,127. It is possible that these primary versus metastatic genomic differences may be due to metastatic lesions experiencing distinct selective pressures not encountered at gastric primary sites.
For example, it has been reported that primary and metastatic GCs are associated with distinct immune environments128. Studies comparing the genomic profiles of primary lesions and metastases from the same patient are also providing insights into the origins of metastases39,129. While ‘founder’ mutations, such as TP53 and CCNE1 amplifications, are often common between primary and metastatic lesions, only 40–60% of mutations are shared. Phylogenetic analysis supports an evolutionary model where metastatic lesions first emerge from subclones resident in primary GC and subsequently undergo further molecular adaptation and evolution at the metastatic site. This model is consistent with multi-sector genomic profiling of primary GCs showing that tumours with high intra-tumour heterogeneity (ITH) are associated with poor survival, possibly by providing more opportunities for generating metastatic clones130,131 (Box 2). Highlighting their capacity for ongoing tumour evolution, metastatic lesions in GC have also been shown to exhibit high levels of clonal complexity, which may be further enhanced by successive rounds of polyclonal seeding132. From a translational perspective,

Box 1 | Environmental and germline factors in GC

Helicobacter pylori infection is the most significant environmental gastric cancer (GC) risk factor191,192. H. pylori infection also suppresses gastric acid secretion193, facilitating colonization of the stomach by other bacteria. Emerging metagenomic studies now suggest that the GC microbiota is characterized by reduced H. pylori levels and enrichment of other bacterial genera, including intestinal commensals such as Achromobacter, Citrobacter and Lactobacillus194. This may create an environment of ‘microbial dysbiosis’ facilitating the production of nitrosating metabolites (for example, nitrite, N-nitrosamines) contributing to GC development via their genotoxic potential195.
Previous studies on germline variants associated with GC risk have largely focused on investigating candidate genes, such as the pro-inflammatory cytokine IL1B196, or have been confined to families exhibiting high hereditary risk such as hereditary diffuse GC (HDGC) and gastric adenocarcinoma and proximal polyposis of the stomach (GAPPS, caused by APC promoter 1b mutations)197,198. HDGC is characterized by a high prevalence of signet ring cell or diffuse-type GC199 and is caused in a large proportion of cases (30–50%) by inactivating germline mutations in CDH1 (ref.200). Pathogenic variants in CTNNA1 have been found in a small number of HDGC families201. Recent studies have revealed that patients with GC, including those with and without family histories of GC, may carry pathogenic germline variants in genes such as PALB2, BRCA1, BRCA2 and RAD51C at frequencies above those found in the general population72,200,202, in some cases approaching 10% in East Asian countries120. Besides single gene variants, new genomic approaches, such as polygenic risk scores (PRSs), are also now being applied to GC, extending previous genome-wide association studies that have highlighted GC risk genes such as PLEC1, SPOCD1, CUX2 and PRKA41 (refs203–205). A recent study described the development of GC PRSs in Han Chinese patients206. This analysis revealed not only that PRSs can identify individuals with an increased risk of GC but also that, even among individuals with a high PRS genetic risk, participants adopting a ‘favourable lifestyle’ had a lower risk of GC.

Polygenic risk scores (PRSs). A number comprising the weighted aggregate of multiple genetic variants associated with a particular phenotype.

Peritoneal carcinomatosis (PC). A condition where tumour cells from the primary tumour are shed into the peritoneal cavity.
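The glossary definition of a polygenic risk score as a weighted aggregate of genetic variants can be illustrated with a minimal sketch. This is not the scoring method of the cited study; the variant names, effect weights and dosages below are hypothetical, and treating a missing genotype as zero risk alleles is an assumption made only to keep the example short.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Return the weighted sum of risk-allele dosages.

    dosages: risk-allele count per variant (0, 1 or 2 for one individual).
    weights: per-variant effect size, for example a log odds ratio from a
             genome-wide association study.
    Variants absent from `dosages` are counted as 0 copies (an assumption).
    """
    return sum(
        weight * dosages.get(variant, 0.0)
        for variant, weight in weights.items()
    )


# Hypothetical variants and effect sizes, for illustration only.
example_weights = {"variant_a": 0.20, "variant_b": 0.10, "variant_c": 0.05}
example_dosages = {"variant_a": 2, "variant_b": 0, "variant_c": 1}

score = polygenic_risk_score(example_dosages, example_weights)
print(f"PRS = {score:.2f}")
```

In practice such scores are usually compared against the distribution in a reference population (for example, by percentile) rather than interpreted as absolute values.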

these differences between primary and metastatic GC molecular profiles suggest that relying on archival tissue samples to inform treatment decisions in the metachronous setting may require revisiting, underscoring the growing importance of analysing metastatic biopsies to allocate therapies for patients with late-stage GC, particularly for clinical trials (Box 2). The presence of ITH also suggests that, even for primary GCs, current clinical protocols relying on endoscopic biopsies (which sample superficial regions of primary GCs) to inform therapeutic choices may need to be interpreted with caution133, as evidenced by our recent work showing that samples from deep regions of primary tumours are more similar to paired lymph node metastases than to samples from superficial regions134. While these are admittedly technically difficult issues, emerging platforms such as liquid biopsies may address the challenges of tissue-based sampling.

Genomics of therapy response. Many previous GC genomic studies analysed tumour samples that were either untreated or associated with heterogeneous treatments and limited clinical outcome data135–137. Recent studies are now describing bespoke clinical cohorts specifically designed to identify predictive factors associated with particular therapies. For example, to identify molecular predictors associated with ICIs, Kim et al. assembled a prospective phase II cohort of 61 patients with GC uniformly treated with the PD1-targeting agent pembrolizumab138. Comprehensive molecular profiling of this cohort revealed that patients with EBV and MSI GCs were highly sensitive to ICIs. A follow-up study from the same investigators, focusing on MSI GCs, further demonstrated correlations of non-synonymous mutation levels and T cell receptor diversity with ICI antitumour activity139.
Notably, while less pronounced, responses to ICIs in patients with non-EBV and non-MSI GCs were also observed. Highlighting the value of such datasets as resources for novel hypothesis generation, we have recently shown that epigenetic promoter alterations may explain responses to ICIs in patients with these CIN and GS subtype tumours140. Regarding chemotherapy, Li et al. recently profiled a cohort of 35 patients treated with neoadjuvant 5-fluorouracil plus oxaliplatin-based chemotherapy141. They reported that patients with GCs exhibiting MSI and increased mutational burden were predominantly chemotherapy non-responders, and they discovered new gene alterations associated with either treatment response (MYC) or resistance (C10orf71, MDM2). Genomic profiling of trial cohorts can also provide insights into GC alterations arising as a consequence of therapeutic pressure. For example, integrative genomic profiling of post-treatment samples after EGFR blockade highlighted a role for ERBB2 amplification and KRAS mutations in tumours from patients with EGFR-inhibitor resistance44. Likewise, transcriptomic profiling of GCs from patients post-chemotherapy revealed the upregulation of pathways related to inflammation and oncogenic signalling (KRAS, IL-6–JAK–STAT3)141.

Beyond bulk tumour analysis

New genomic technologies, such as single-cell sequencing, have also facilitated investigations of sub-lineage heterogeneity, distinct cell types and GC gene expression programmes142–144 (Box 2). To obtain insights into early GC development, Zhang et al.
reported a single-cell analysis of >30,000 cells comprising pre-malignant lesions (chronic atrophic gastritis, IM) and biopsies of early GC, revealing a diverse repertoire of gastric epithelial cells145, including antral basal gland mucous cells (expressing MUC6 and TFF2), pit mucous cells (expressing MUC5AC and TFF1), chief cells (expressing PGA4 and PGA3), enteroendocrine cells (expressing CHGA and CHGB) and IM-associated cell types such as goblet cells (expressing MUC2 and ITLN1) and enterocytes (expressing FABP1 and APOA1). Relating these cell types to early GC revealed that cancer cells show transcriptional similarity to enterocytes and to a metaplastic Wnt-driven stem cell type expressing OLFM4, EPHB2 and SOX9. Sathe et al., profiling ~55,000 cells from seven GCs and one IM biopsy, generated a receptor–ligand network linking different components of the GC tumour microenvironment, an analysis that would be highly challenging using only bulk-profiling platforms143. We have also established a GC atlas of >200,000 cells across 48 tumour tissue samples from 31 patients with GC across clinical stages and subtypes, revealing cancer-associated expression programmes in different cellular lineages, epithelial-resident KLF2 in diffuse-type GCs as a mechanism for plasma cell homing, and distinct cancer-associated fibroblast populations146. Single-cell analysis can also be applied to selected cellular subtypes in GCs; specifically, Fu et al. performed single-cell RNA sequencing on CD45-isolated immune cells from GC biopsies (n ≈ 26,000 cells) and found

Box 2 | Technical challenges and advances in tackling ITH in GC

Intra-tumour heterogeneity (ITH) can occur between primary tumours and metastatic lesions in the same patient or even within a single tumour site.
Spatiotemporal changes and differences in the cellular microenvironment can significantly influence tumour signals at multiple molecular layers (for example, genetic, epigenetic and transcriptomic), determining cancer phenotypes and therapy outcomes. To address these biological differences, multi-sector profiling and single-cell RNA sequencing (scRNA-seq) have emerged as major approaches to investigate gastric cancer (GC) ITH.

Multi-sector profiling
• Current clinical endoscopic biopsy protocols do not entirely capture ITH. Multi-region sequencing of primary GCs points to both shared and private clonal mutations (for example, pathogenic CTNNB1 and PTEN mutations)207.
• ITH levels differ between different GC molecular subtypes208.
• Liquid biopsies can circumvent challenges of tissue-based and longitudinal sampling.

scRNA-seq
• scRNA-seq of GCs has been used to generate receptor–ligand networks linking different components of the GC microenvironment143.
• scRNA-seq of intestinal metaplasia revealed intestinal metaplasia-associated cell types such as goblet cells (expressing MUC2 and ITLN1) and enterocytes (expressing FABP1 and APOA1). Early cancer cells showed transcriptional similarity to enterocytes and metaplastic Wnt-driven stem cells expressing OLFM4, EPHB2 and SOX9 (ref.145).
• snRNA-seq of peritoneal carcinomatosis cells revealed diverse cells-of-origin, including stomach/gastric (‘gastric dominant’) but also other tissue types such as colorectal (‘mixed gastrointestinal’)148.
• Loss of spatial information in scRNA-seq platforms can be addressed with spatial transcriptomics (for example, digital spatial profiling).

that the IRF8 transcription factor was downregulated in CD8+ tumour-infiltrating lymphocytes compared to CD8+ cells from adjacent normal tissues147. A summary of non-epithelial cell types found in the GC ecosystem, including cancer-associated fibroblasts and immune cells, is presented in Box 3. Single-cell analysis is also yielding insights into mechanisms of GC dissemination and peritoneal seeding125, as exemplified by another recent analysis of >45,000 cells from malignant ascites148. Intriguingly, despite similar clinical origins (primary GC), comparisons of PC cells from different patients with GC revealed strong gene expression similarities to diverse cells-of-origin, including cells of gastric origin (‘gastric-dominant’) but also cells of colorectal and pancreatic origin. Clinically, patients with PC and a ‘gastric-dominant’ profile exhibited worse survival, possibly due to the activation of oncogenic pathways involved in cell cycle regulation, DNA repair and metabolic reprogramming. These findings suggest that, during metastatic progression, GC cells acquire a high degree of developmental plasticity that may represent a key determinant of ITH. It is likely that additional single-cell sequencing studies, performed on larger numbers of cells and clinical samples, will improve our understanding of GC heterogeneity. However, limitations of the more widely used single-cell platforms include the need for extensive normalization and batch correction (particularly when combining cohorts), limited sensitivity in detecting lowly expressed genes, difficulties in obtaining isoform-level transcript data, and the current inability to profile both DNA mutations and transcriptomes from the same cell149.
Emerging GC targets and platforms

On the translational front, genomic approaches are also revealing new preclinical GC targets, leveraging new experimental models such as patient-derived organoids (PDOs) and genetically engineered mouse models (GEMMs). While cancer cell lines have long been used in GC research150, cell lines often exhibit biological differences from in vivo disease and lack matched germline counterparts and/or key GC driver alterations (for example, ARHGAP fusions)151–153. Patient-derived xenografts may better recapitulate in vivo disease, but can also suffer from low establishment efficiency, clonal selection and relatively high maintenance costs154–156. PDOs have emerged as a versatile platform for GC translation and research157–160. Recent studies have described the creation of GC PDOs representing the TCGA molecular subtypes with canonical molecular features, including RHOA mutations and ARHGAP fusions, enabling functional investigation of these poorly understood alterations in GC-related pathways159,160. GC PDOs have been used to demonstrate a role for CDH1 and TP53 compound mutations in R-spondin growth factor independence159 and to identify napabucasin (STAT3 inhibitor), abemaciclib (CDK4 and CDK6 inhibitor), and VE-822 (ATR inhibitor for ARID1A-mutated GC PDOs) as promising GC therapeutics160. GEMMs represent another important GC preclinical platform. While a comprehensive overview of GC GEMMs is beyond the scope of this review161,162, two recent studies are noteworthy. To understand how environmental exposures interplay with GC driver alterations, Sethi et al. utilized a GEMM where TP53 was conditionally deleted in different stomach regions163.
Although TP53 inactivation was sufficient to induce gastric dysplasia, co-depletion of TP53 and the cell cycle regulator CDKN2A was required for further GC progression, suggesting that CDKN2A inactivation in TP53-depleted or TP53-inactivated gastric epithelia is required for further GC development. GEMMs can also be used to perform lineage-tracing stem cell studies. For example, transcriptomic profiling of LGR5+ stem cell populations from the mouse pyloric stomach identified the AQP5 membrane protein as a marker for mouse and human adult pyloric stem cells164. Preclinical research investigating molecular vulnerabilities caused by GC genomic alterations is also highlighting novel targets and therapeutic combinations (Table 1). We and others have previously shown that genomic amplification of wild-type KRAS is frequent in GC (9–12% of GCs; >480 primary GCs analysed collectively) compared to KRAS-activating mutations (for example, G12D and G12V)165,166. Recently, Wong et al. reported that wild-type KRAS-amplified GCs were resistant to MEK blockade using single agents (AZD6244 or GSK1120212)165. However, concurrent inhibition of the guanine nucleotide exchange factors SOS1 and SOS2 or the protein tyrosine phosphatase SHP2 attenuated this resistance, demonstrating how a system-level approach can uncover promising combination strategies targeting wild-type KRAS-amplified GC. The same group has also investigated the function of RHOA mutations in GC, found in 15–20% of diffuse-type GCs167. They reported that RHOA (Y42C)

Box 3 | Emerging roles of non-epithelial cell types in GC

Studies informed by genomics are illuminating how non-epithelial cell types promote the gastric cancer (GC) phenotype, extending previous studies establishing a prognostic and predictive role for elevated stromal compositions in GC209,210.
For example, cancer-associated fibroblasts (CAFs) can promote context-dependent GC growth and drug resistance through the local secretion of factors such as WNT5A, CXCL1, CXCL8 and ANXA6 (refs211–213). By performing RNA sequencing of normal fibroblasts and CAFs from patients with diffuse-type GC, Ishimoto et al. reported that CAFs exhibit heightened TGFβ1 signalling and increased motility, and enhance cancer cell aggressiveness when GC cells and CAFs are co-injected into mice214. Activation of TGFβ1 signalling in CAFs may be achieved through RHBDF2-mediated cleavage of the TGFβR1 receptor, where RHBDF2 expression is in turn induced in CAFs by pro-inflammatory cytokines, such as IL-1α and IL-1β, secreted by diffuse-type GC cells. Another recent study has described a role for the heat-shock factor HSF1 in regulating the secretion of CAF-derived extracellular vesicles to promote GC215. The increasing importance of immunotherapy has highlighted immune cells as another major cell type in the GC tumour microenvironment147,216–218. FOXP3+ regulatory T (Treg) cells are associated with Helicobacter pylori infection and development of precancerous lesions219, while other Treg cell subtypes, such as ICOS+FOXP3+ Treg cells and CD45RA–CCR7– Treg cells, have inhibitory functions in the tumour immune microenvironment associated with poorer survival220. Subsets of CD8+ cytotoxic T cell infiltrates (for example, CD103+CD8+ T cells) have shown an association with unique immunophenotypic characteristics and improved prognosis221,222. Plasmacytoid and activated dendritic cells have been identified in GC, with the latter expressing high levels of IDO1, suggesting an immunosuppressive phenotype143. Huang et al. performed a spatial analysis of tumour-associated macrophages (TAMs) in GC using multiplex immunohistochemistry.
While based on a relatively small cohort, this work represents one of the first spatial analyses of immune cells in GC, describing CD68+IF8+ TAMs in close proximity to tumour cells, while CD163+CD206–CD68+ TAMs (CD163+ is a marker of M2 macrophages) were associated with upregulated immune signalling223.
