Bibliographic Reference
The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium (2020). Pan-cancer analysis of whole genomes. Nature, 578(7793), 82–93. https://doi.org/10.1038/s41586-020-1969-6
Core Argument
Whole-genome sequencing of 2,658 cancers across 38 tumour types provides a genome-wide view of the somatic and germline variation that drives cancer. Uniform alignment, variant calling, and quality control across all samples enabled integrated analysis of point mutations, structural variants, copy-number alterations, retrotransposition, mitochondrial mutations, and telomere maintenance across the pan-cancer cohort. This flagship paper describes the PCAWG resource and presents core findings on driver mutations, clustered mutational processes, germline determinants of somatic mutation, and replicative immortality. It frames these findings in Darwinian terms: cancer arises from “the stochastic nature of Darwinian evolution”, which has three preconditions: heritable variation, differential fitness, and competition for survival.
Methods
- Cohort: 2,658 white- and grey-listed donors, contributing 2,605 primary tumours and 173 metastases or recurrences, across 38 tumour types.
- Coverage: mean 39× for normal samples; tumour coverage was bimodal, with modes at 38× and 60×.
- Transcriptomes: RNA-sequencing data were available for 1,222 donors.
- Variant calling: three core pipelines for SNVs, indels, CNAs, and SVs, plus bespoke algorithms for retrotransposition, mtDNA mutations, and telomere length.
- Compute: cloud computing distributed across 13 data centres on 3 continents, with pipelines run in Docker containers.
- Validation: 63 representative tumour–normal pairs, deep-sequenced using a hybridization bait set.
- Consensus-call accuracy: SNVs, sensitivity 95% and precision 95%; indels, sensitivity 60% and precision 91%; SVs, sensitivity ~90% and precision 97.5%.
- Driver identification: a ‘rank-and-cut’ approach ranked mutations by recurrence, functional consequence, and driver pattern expectation, then cut the ranked list at the excess-over-background threshold to identify probable driver mutations.
- Known drivers: a ‘compendium of mutational driver elements’ supplemented novel discoveries with previously known cancer genes.
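The rank-and-cut idea described above can be illustrated with a toy sketch. This is not the consortium's implementation: the `Candidate` fields, the `consequence_score` scale, the tie-breaking rule, and the `expected_background` input are all hypothetical, and the third ranking criterion named in the paper (driver pattern expectation) is left out for brevity. The sketch only shows the shape of the procedure: rank the mutations in one element, then keep as many top-ranked mutations as the estimated excess over background.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    mutation_id: str
    recurrence: int           # tumours carrying a mutation at this site (toy value)
    consequence_score: float  # hypothetical 0..1 estimate of functional impact


def rank_and_cut(candidates, expected_background):
    """Rank candidates for one element and keep the estimated excess over background."""
    # "Cut": number of mutations observed beyond the background expectation.
    excess = max(0, round(len(candidates) - expected_background))
    # "Rank": most recurrent first, ties broken by the consequence score.
    ranked = sorted(
        candidates,
        key=lambda c: (c.recurrence, c.consequence_score),
        reverse=True,
    )
    return ranked[:excess]


# Toy data, not taken from the paper.
demo = [
    Candidate("a", 12, 0.9),
    Candidate("b", 1, 0.2),
    Candidate("c", 7, 0.8),
    Candidate("d", 1, 0.6),
    Candidate("e", 3, 0.4),
]
probable_drivers = rank_and_cut(demo, expected_background=2.0)
print([c.mutation_id for c in probable_drivers])
```

With five observed mutations and a background expectation of two, the sketch keeps the three top-ranked candidates; if the background expectation exceeds the observed count, it keeps none.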
Key Findings
- Driver landscape across 2,658 cancers. 91% of tumours had at least one identified driver mutation, with an average of 4.6 drivers per tumour (2.6 coding point mutations, plus non-coding, SVs, and CNAs). In ~5% of cases no drivers were identified, suggesting incomplete driver discovery. The most recurrently mutated genes were TP53 (954), CDKN2A (475), ARID1A (316), KRAS (287), PTEN (269), TERT (263), CDKN2B (258), SMAD4 (181), PIK3CA (177), and RB1 (167). 77% of TP53-mutated tumours had biallelic inactivation, and 96% of these combined a point mutation with deletion of the other allele.
- Non-coding drivers are infrequent but present. Only 13% (785/5,913) of driver point mutations were non-coding. Nonetheless, 25% of tumours bore at least one putative non-coding driver, and one-third (237/785) affected the TERT promoter (9% of tumours). Beyond TERT, individual enhancers and promoters were only infrequent targets.
- Driver type varies by cancer type. Structural variant drivers dominated in breast adenocarcinomas (6.4 SVs vs 2.2 point mutations) and ovarian adenocarcinomas (5.8 vs 1.9). Point mutation drivers predominated in colorectal adenocarcinomas (7.4 point mutations vs 2.4 SVs) and mature B cell lymphomas (6 vs 2.2). SETD2 was discovered as a novel driver in medulloblastoma group-4 tumours, with biallelic inactivation significantly decreasing gene expression.
- Chromothripsis is frequent, early, and enriched for drivers. 22.3% of samples (587/2,583) exhibited chromothripsis, most frequently in sarcoma, glioblastoma, lung SCC, melanoma, and breast adenocarcinoma. Chromothripsis tended to be clonal rather than subclonal, indicating that it occurs early in tumour evolution. Chromothripsis regions coincided with 3.6% of all identified drivers and ~7% of copy-number drivers, a significant enrichment beyond chance. The majority of coinciding driver events were amplifications (58%), followed by homozygous deletions (34%) and SV disruptions (8%). Chromothripsis manifested in cancer-type-specific patterns: acral melanoma with CCND1 amplification, lung SCC with SOX2 amplification, and chromophobe RCC almost exclusively on chromosome 5 near TERT (increasing TERT expression 80-fold).
- Clustered mutational processes: kataegis and chromoplexy. Kataegis and chromoplexy showed distinct cancer-type distributions. Lymphoid tumours displayed hypermutation hot spots near known cancer genes with associated SVs. Chromoplexy was recurrent in lymphoid, prostate, and thyroid cancers.
- Germline variants shape the somatic mutation landscape. A GWAS in Europeans (n=1,201) identified rs12628403 at 22q13.1 as associated with APOBEC3B-like mutagenesis. The minor allele tags a ~30-kb germline deletion that fuses APOBEC3B to APOBEC3A and is protective against APOBEC mutagenesis. Rare-variant analysis showed that BRCA1 PTVs were associated with increased small tandem duplications and a novel ‘cycles of templated insertions’ SV class; BRCA2 PTVs with increased small deletions; and MBD4 PTVs with increased rates of C>T mutation at CpG dinucleotides (replicated in TCGA exomes, n=8,134).
- Germline LINE-1 elements show two activity patterns. Of 114 active germline L1 source elements, 16 accounted for 67% of all somatic L1-mediated retrotranspositions. These fell into two patterns. ‘Strombolian’ elements are frequently active with modest somatic activity and are common in the population (MAF > 2%). ‘Plinian’ elements are rarely active but show aggressive somatic activity, are infrequent in the population (MAF ≤ 2%), and are potentially younger in the germline. Only 38% of donors carried ≥1 Plinian element.
- Telomere maintenance is more diverse than the TERT/ALT dichotomy. Four distinct telomere clusters were identified from 12 telomere features. C1 (ALT subtype 1): enriched for RB1 mutations, sarcomas, longer telomeres. C2 (ALT subtype 2): ATRX/DAXX mutations, pancreatic NET, low-grade glioma. C3: TERT promoter mutations, thyroid adenocarcinoma. C4: normal-like. RB1 mutations emerged as a potential third route to ALT activation. Tumour types with the highest rates of abnormal telomere maintenance originated from tissues with low endogenous replicative activity, that is, stem cell division rate and the frequency of telomere abnormality were inversely correlated.
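The concentration of L1 activity reported in the findings above (16 of 114 source elements accounting for 67% of somatic events) is a "top-k share" statistic. A minimal sketch of how such a figure could be computed from per-element event counts follows. The function name, the element names, and the counts are hypothetical illustrations, not data or code from the paper.

```python
def smallest_set_covering(counts, target_share):
    """Return (k, share): the fewest top-ranked elements whose events
    reach at least target_share of all events, and the share they reach."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no events to summarise")
    running = 0
    # Walk elements from most to least active, accumulating their events.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for k, (_, n) in enumerate(ordered, start=1):
        running += n
        if running / total >= target_share:
            return k, running / total
    # Only reached if target_share is above 1.0.
    return len(counts), 1.0


# Toy per-element counts of somatic retrotransposition events (not from the paper).
toy_counts = {"elem_A": 50, "elem_B": 30, "elem_C": 10, "elem_D": 6, "elem_E": 4}
k, share = smallest_set_covering(toy_counts, target_share=0.67)
print(k, share)
```

Applied to real per-element counts, the same loop would report how many source elements are needed to reach any chosen share of events.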
Concepts Introduced or Used
clonal-evolution, driver-mutation, passenger-mutation, chromothripsis, kataegis, chromoplexy, whole-genome-duplication, mutational-signature, APOBEC-mutagenesis, telomere-maintenance, alternative-lengthening-of-telomeres, retrotransposition, germline-variant, structural-variant, copy-number-alteration, subclonal-architecture, intratumor-heterogeneity, molecular-clock, biallelic-inactivation, rank-and-cut, templated-insertion, non-coding-driver, TERT-promoter, Plinian, Strombolian
Entities Referenced
- PCAWG Consortium, ICGC, TCGA
- 38 tumour types including glioblastoma, medulloblastoma, melanoma (acral/cutaneous/mucosal), breast adenocarcinoma, ovarian adenocarcinoma, colorectal adenocarcinoma, lung SCC, lung adenocarcinoma, prostate adenocarcinoma, pancreatic adenocarcinoma, pancreatic neuroendocrine, chromophobe RCC, clear cell RCC, papillary RCC, hepatocellular carcinoma, B cell lymphoma, CLL, thyroid adenocarcinoma, sarcoma (liposarcoma, leiomyosarcoma, osteosarcoma), bladder TCC, stomach adenocarcinoma, head and neck SCC, cervical SCC and adenocarcinoma, etc.
- Genes: TP53, CDKN2A, ARID1A, KRAS, PTEN, TERT, CDKN2B, SMAD4, PIK3CA, RB1, BRAF, CTNNB1, ERG, MYC, NF1, CCNE1, VHL, KMT2D, APC, PBRM1, MCL1, CCND1, MAP2K4, CREBBP, ATM, SETD2, ATRX, DAXX, SOX2, EGFR, MDM2, CDK4, ERBB2, BRCA1, BRCA2, MBD4, APOBEC3A, APOBEC3B; also the LINE-1 (L1) retrotransposon family
- Methods: Whole-genome sequencing (mean 38–60×), RNA-sequencing, cloud computing (13 data centres), Docker containers, rank-and-cut driver identification, molecular clock timing, GWAS, rare-variant association, t-SNE clustering
- Software: Dockstore, ICGC Data Portal, UCSC Xena, Expression Atlas, PCAWG Scout, Chromothripsis Explorer, PLINK
Limitations
- Driver identification depends on current knowledge; ~5% of tumours had no identified drivers, and non-coding driver discovery remains incomplete. The rank-and-cut approach relies on estimated functional consequence, which will improve with better annotations.
- Bulk sequencing predominates; subclonal architecture below detection limits is not captured. Single-time-point sampling limits longitudinal inference.
- Indel calling sensitivity (60%) is substantially lower than SNV sensitivity (95%), and low-VAF variants (likely subclonal) remain challenging.
- The GWAS and rare-variant analyses were restricted to individuals of inferred European or East Asian ancestry because sample sizes in other populations were too small.
- Telomere features were measured in bulk tumour tissue, potentially masking subclonal variation in telomere maintenance mechanisms.
- The companion papers contain deeper analyses of non-coding drivers, mutational signatures, structural variation, tumour evolution, and transcriptomic effects — this flagship paper provides the resource overview rather than exhaustive treatment of each topic.
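The gap between indel and SNV calling noted in the limitations above can be put on a single scale by combining the reported sensitivity and precision into an F1 score (their harmonic mean). The paper summary does not report F1; the values computed here are derived only from the figures quoted in this note, and the SV sensitivity is approximate (~90%), so treat the result as a rough illustration.

```python
def f1_score(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)


# (sensitivity, precision) as quoted in the Methods summary; SV sensitivity is approximate.
reported = {
    "SNV": (0.95, 0.95),
    "indel": (0.60, 0.91),
    "SV": (0.90, 0.975),
}

f1 = {kind: f1_score(sens, prec) for kind, (sens, prec) in reported.items()}
for kind, value in f1.items():
    print(f"{kind}: F1 = {value:.3f}")
```

On this scale the indel calls sit well below both SNV and SV calls, which is driven almost entirely by the lower indel sensitivity rather than by precision.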
Relevance to Clonal Evolution
This is the flagship paper of the largest cancer whole-genome sequencing project to date, and its uniformly processed catalogue is a primary genomic resource for clonal evolution research. It explicitly frames cancer as a Darwinian evolutionary system with heritable variation (mutations), differential fitness (driver vs passenger), and competition for survival within the somatic cell population. The finding that chromothripsis is predominantly an early, clonal event has direct implications for phylogenetic reconstruction and evolutionary timing. The diversity of driver types across cancer types (SV-driven vs point-mutation-driven) suggests that the mode of clonal evolution varies by tissue context. The germline findings, in which inherited variants shape the somatic mutational landscape, add a layer of host genetic constraint on the evolutionary substrate available to tumours. The companion publication on evolutionary history (Gerstung et al. 2020, also in the vault) provides the detailed temporal analysis; this flagship paper provides the resource and the foundational genomic catalogue.