Ma, Liu, Liu, et al. (2018) — Pan-Cancer Genome and Transcriptome Analyses of 1,699 Paediatric Leukaemias and Solid Tumours

Bibliographic Reference

Ma, X., Liu, Y., Liu, Y., Alexandrov, L. B., Edmonson, M. N., Gawad, C., Zhou, X., Li, Y., Rusch, M. C., Easton, J., Huether, R., Gonzalez-Pena, V., Wilkinson, M. R., Hermida, L. C., Davis, S., Sioson, E., Pounds, S., Cao, X., Ries, R. E., … Zhang, J. (2018). Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature, 555, 371–376. https://doi.org/10.1038/nature25795

Core Argument

This is the first comprehensive pan-cancer analysis of pediatric leukaemias and solid tumours, covering 1,699 tumours across six histotypes (B-ALL, T-ALL, AML, neuroblastoma, Wilms tumour, osteosarcoma) with whole-genome, whole-exome, and transcriptome sequencing data processed under a uniform analytical framework. The study establishes that the genomic landscape of pediatric cancers differs fundamentally from that of adult cancers: somatic mutation rates are much lower, the repertoire of driver genes is different (only 45% overlap with adult pan-cancer studies), and copy number alterations and structural variants, rather than point mutations, account for the majority (62%) of driver events. The authors relate these differences to tissue of origin (pediatric cancers commonly arise in developing mesodermic rather than adult epithelial tissues) and argue that they call for pediatric-specific development of precision therapies.

Methods

Cohort. Paired tumour-normal samples from 1,699 patients enrolled in Children’s Oncology Group (COG) clinical trials via the TARGET project: 689 B-ALL, 267 T-ALL, 210 AML, 316 neuroblastomas (NBL), 128 Wilms tumours (WT), and 89 osteosarcomas (OS). All specimens obtained at initial diagnosis; 98.5% of patients were 20 years of age or younger.

Sequencing and analysis. WGS data generated with Complete Genomics Inc. technology (~50x coverage, 31–35 bp mate-paired reads) and aligned to hg19/GRCh37. WES data generated with standard capture. RNA-seq data were mapped with StrongArm; rearrangements identified with CICERO. All data processed under a uniform analytical framework.

Variant calling and filtering. Somatic SNVs and indels from CGI data passed through a multi-stage filter that removed germline variants (dbSNP, NHLBI ESP, PCGP, cohort-specific) and required: ≥3 mutant reads in tumour, significant enrichment over normal (Fisher’s exact P < 0.01), a mutant allele fraction (MAF) in normal < 0.05, and unique BLAT mapping. A “rescue” pipeline recovered pathogenic variants via Medal_Ceremony. Overall, the pipeline reduced 51 million raw SNVs and 38 million indels to 711,490 SNVs and 57,700 indels (9,397 SNVs and 1,000 indels in coding regions).
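The read-level criteria above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function names, the argument layout, and the choice of a one-sided Fisher's exact test are assumptions, and the germline-database and BLAT checks are reduced to boolean flags supplied by the caller.

```python
from math import comb


def fisher_one_sided(mut_t, ref_t, mut_n, ref_n):
    """One-sided Fisher's exact P: probability of at least `mut_t` mutant reads
    falling in the tumour, given fixed row and column totals (hypergeometric null)."""
    total_mut = mut_t + mut_n
    n_tumour = mut_t + ref_t
    n_total = n_tumour + mut_n + ref_n
    denom = comb(n_total, n_tumour)
    upper = min(total_mut, n_tumour)
    tail = sum(
        comb(total_mut, k) * comb(n_total - total_mut, n_tumour - k)
        for k in range(mut_t, upper + 1)
    )
    return tail / denom


def passes_filter(mut_t, ref_t, mut_n, ref_n,
                  known_germline=False, unique_blat=True):
    """Apply the thresholds listed in the text to one candidate variant.

    mut_*/ref_* are mutant and reference read counts in tumour (t) and normal (n).
    `known_germline` and `unique_blat` are hypothetical stand-ins for the
    database lookup and the BLAT re-mapping step.
    """
    if known_germline or not unique_blat:
        return False
    if mut_t < 3:                      # at least 3 mutant reads in tumour
        return False
    normal_depth = mut_n + ref_n
    normal_maf = mut_n / normal_depth if normal_depth else 0.0
    if normal_maf >= 0.05:             # mutant allele fraction in normal < 0.05
        return False
    # significant enrichment of mutant reads in tumour over normal
    return fisher_one_sided(mut_t, ref_t, mut_n, ref_n) < 0.01


# Illustrative read counts (not from the paper)
print(passes_filter(10, 30, 0, 40))   # strong tumour-only signal
print(passes_filter(3, 37, 0, 40))    # enough reads, but enrichment not significant
```

Note that three mutant reads out of 40 in the tumour, with none in a 40x normal, meet the read-count threshold but not the enrichment test, so the two criteria are not redundant at modest depth.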

Driver gene discovery. GRIN analysis integrated SNVs, indels, CNAs, structural variants, and fusions for 654 CGI samples; MutSigCV was run on coding SNVs/indels from WGS+WES. Candidate genes with Q < 0.01 by either method were then curated to determine driver status. Subnetwork analysis via HotNet2 and variant pathogenicity classification via Medal_Ceremony identified additional candidates. Down-sampling analysis assessed saturation of gene discovery.

Mutational signature analysis. Catalogues generated using 96-bin trinucleotide classification. Signatures deciphered and activities quantified using established methodology (Alexandrov et al., 2013).
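As a minimal sketch of the 96-bin classification: each SNV is reported on the pyrimidine strand, giving 6 substitution classes × 16 flanking-base contexts = 96 channels. The helper names and the channel label format (for example T[C>T]A) are illustrative conventions, not taken from the paper, and signature extraction itself is not shown.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# All 96 channels: pyrimidine reference (C or T), 3 alternatives, 4 x 4 flanks
CHANNELS = [
    f"{five}[{ref}>{alt}]{three}"
    for ref in "CT"
    for alt in "ACGT" if alt != ref
    for five in "ACGT"
    for three in "ACGT"
]


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def channel(trinuc, alt):
    """Map one SNV to its channel.

    trinuc: reference trinucleotide (5'->3') centred on the mutated base.
    alt:    mutant base on the same strand.
    """
    trinuc, alt = trinuc.upper(), alt.upper()
    if len(trinuc) != 3 or trinuc[1] == alt:
        raise ValueError("expected a 3-base context and a non-reference alt base")
    if trinuc[1] in "AG":              # purine reference: flip to the other strand
        trinuc, alt = revcomp(trinuc), COMPLEMENT[alt]
    return f"{trinuc[0]}[{trinuc[1]}>{alt}]{trinuc[2]}"


def catalogue(mutations):
    """Count SNVs per channel; `mutations` is an iterable of (trinuc, alt) pairs."""
    counts = Counter(channel(t, a) for t, a in mutations)
    return {ch: counts.get(ch, 0) for ch in CHANNELS}


# A C>T change in TCA and a G>A change in TGA fall in the same channel
cat = catalogue([("TCA", "T"), ("TGA", "A")])
print(cat["T[C>T]A"])
```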

Allele-specific expression (ASE). Point mutations required ≥20x coverage in both DNA and RNA. ASE called when |RNA_MAF − DNA_MAF| > 0.2 and FDR < 0.01 (two-sided Fisher’s exact test, qvalue R package). Single-cell targeted resequencing (Fluidigm C1) performed to disambiguate ASE from subclonal LOH.
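The ASE criteria can be illustrated for a single mutation as below. This is a hedged sketch: the function names and return structure are hypothetical, and the significance threshold is applied to the raw two-sided Fisher P value for simplicity, whereas the study controlled FDR across all tested mutations with the qvalue R package.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]]:
    sum of the probabilities of all tables no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    total = sum(prob(k) for k in range(lo, hi + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def call_ase(dna_mut, dna_ref, rna_mut, rna_ref,
             min_cov=20, min_delta=0.2, alpha=0.01):
    """Return None if coverage is insufficient, else a dict with the MAF shift,
    the Fisher P value, and whether both ASE thresholds are met."""
    dna_cov, rna_cov = dna_mut + dna_ref, rna_mut + rna_ref
    if dna_cov < min_cov or rna_cov < min_cov:
        return None
    delta = rna_mut / rna_cov - dna_mut / dna_cov   # RNA_MAF - DNA_MAF
    p = fisher_two_sided(dna_mut, dna_ref, rna_mut, rna_ref)
    return {"delta": delta, "p": p, "ase": abs(delta) > min_delta and p < alpha}


# Illustrative read counts (not from the paper)
print(call_ase(20, 20, 38, 2))    # mutant allele over-represented in RNA
print(call_ase(30, 30, 3, 57))    # mutant allele suppressed in RNA
print(call_ase(25, 25, 26, 24))   # balanced expression
```

The sign of `delta` separates the two patterns described in the findings: negative for suppressed mutant alleles, positive for elevated ones.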

Chromothripsis detection. Bartlett’s goodness-of-fit test assessed whether the distribution of structural variant breakpoints along a chromosome departed from random. Chromosomes with P < 0.01 in Bartlett’s test (non-random breakpoint distribution) and P > 0.01 in the SV type test (no significant skew in SV types) were then reviewed for oscillation between restricted CNA states.
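The summary does not spell out the exact formulation of the breakpoint test. One common reading, assumed here, is the classical Bartlett test for exponentiality applied to the distances between adjacent breakpoints: randomly placed breakpoints give roughly exponential gaps, whereas clustered breakpoints give gaps that are far more variable. The sketch below computes only the test statistic and its degrees of freedom; the function name is hypothetical, and the P value would come from the upper tail of a chi-square distribution in a statistics library.

```python
from math import log


def bartlett_exponential_stat(breakpoints):
    """Bartlett's statistic for exponentiality of inter-breakpoint distances.

    Returns (statistic, degrees_of_freedom). Under the exponential null the
    statistic is approximately chi-square distributed with r - 1 degrees of
    freedom, where r is the number of gaps. Large values indicate gaps more
    variable than expected, i.e. clustered breakpoints.
    """
    pos = sorted(breakpoints)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    if len(gaps) < 2 or min(gaps) <= 0:
        raise ValueError("need at least three distinct breakpoint positions")
    r = len(gaps)
    mean_gap = sum(gaps) / r
    mean_log_gap = sum(log(g) for g in gaps) / r
    stat = 2 * r * (log(mean_gap) - mean_log_gap) / (1 + (r + 1) / (6 * r))
    return stat, r - 1


# Illustrative coordinates (not from the paper): two tight clusters plus an outlier
stat, df = bartlett_exponential_stat([0, 1, 2, 3, 1000, 1001, 1002, 5000])
print(round(stat, 1), df)
```

For this toy input the statistic is well above 16.81, the chi-square critical value for 6 degrees of freedom at P = 0.01, so the chromosome would pass the first criterion and move on to the SV type test and manual review.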

Key Findings

  1. Pediatric cancers have dramatically lower somatic mutation rates than adult cancers. Median mutation rate ranged from 0.17 per Mb (AML and Wilms tumours) to 0.79 per Mb (osteosarcomas), compared to 1–10 per Mb in common adult cancers.

  2. 142 driver genes identified; only 45% are shared with adult pan-cancer studies. More than half (73) of driver genes were specific to a single histotype. Thirty-seven genes were absent from both adult pan-cancer studies and the Cancer Gene Census (v81), including NIPBL and LEMD3. Copy number alterations and structural variants constituted the majority (62%) of driver events.

  3. Eleven genome-wide mutational signatures were identified. Signatures T-1 through T-9 corresponded to known COSMIC signatures. T-5, attributed to ultraviolet-light exposure, was unexpectedly present in eight aneuploid B-ALL samples (all lacking oncogenic fusions, P = 3 × 10^(-5) for aneuploidy enrichment). CC>TT dinucleotide mutations were enriched 110-fold in these samples (P = 1.07 × 10^(-7)). Two novel signatures (T-10, T-11) were enriched in low-MAF mutations and T-11 was likely associated with platform-specific sequencing artefacts.

  4. Chromothripsis detected in 11% of all samples, including 13/89 osteosarcomas, 15/128 Wilms tumours, 22/316 NBLs, 14/689 B-ALLs, and 6/210 AMLs.

  5. Mutant alleles were expressed for 34% of protein-coding mutations; 20% exhibited allele-specific expression (ASE). Truncating mutations showed suppression of the mutant allele (76% of ASE cases, P = 7 × 10^(-5)), while hotspot mutations showed elevated mutant allele expression (87%, P = 6 × 10^(-5)). Single-cell sequencing confirmed that WT1 D447N ASE in AML was attributable to epigenetic silencing rather than subclonal LOH.

  6. Novel KRAS isoforms detected in 70% of leukaemias but rarely in solid tumours. These isoforms create truncated KRAS proteins retaining the GTPase domain but lacking the hypervariable region critical for membrane targeting.

Concepts Introduced or Used

  • mutational-signature — Characteristic patterns of somatic mutations reflecting underlying DNA damage and repair processes. Eleven signatures identified, nine matching known COSMIC signatures and two novel (likely artefactual).
  • driver-mutation — Somatic alteration conferring selective fitness advantage. 142 driver genes catalogued; the study establishes that pediatric drivers are predominantly CNAs/SVs rather than point mutations, and largely distinct from adult drivers.
  • chromothripsis — Massive genomic rearrangements from a single catastrophic event, detected in 11% of samples across all six histotypes.
  • clonal-evolution — The framework within which this study is situated. Low mutation rates in pediatric cancers imply different evolutionary dynamics compared to adult cancers — fewer substrates for selection acting over shorter developmental timeframes.
  • intratumor-heterogeneity — Nearly half (40–50%) of point mutations in leukaemia and NBL driver genes had low MAFs (<0.3), indicative of subclonal mutations contributing to tumorigenesis.
  • allele-specific-expression — Differential expression of mutant versus wild-type alleles. 20% of expressed mutations showed ASE, with distinct patterns for truncating versus hotspot mutations.

Entities Referenced

  • Genes: CDKN2A (most frequently altered, predominantly deletions), STAG2 (five alteration types across five histotypes), WT1 (ASE confirmed by single-cell sequencing), KRAS (novel isoforms in leukaemias), TAL1 (T-ALL specific), MYCN, ATRX, ALK (NBL), TP53 (co-occurrence/exclusivity), MAP3K4 (novel recurrent hotspot G1366R), UBTF (ITD in AML), NIPBL, LEMD3 (novel driver genes), BRAF (most frequent in pathogenicity analysis)
  • Pathways: RAS, JAK-STAT, PI3K signalling; cell cycle; epigenetic regulation; NOTCH signalling; Wnt/beta-catenin
  • Databases: COSMIC mutational signatures, Cancer Gene Census (v81), TCGA pan-cancer studies, gnomAD, dbSNP, ClinVar
  • Projects: TARGET (Therapeutically Applicable Research to Generate Effective Treatments), COG (Children’s Oncology Group)
  • Tools: ProteinPaint portal, GRIN, MutSigCV, HotNet2, Medal_Ceremony, CONSERTING, CICERO, StrongArm, ESTIMATE, CIBERSORT
  • Platforms: Complete Genomics Inc. (CGI) sequencing, Illumina WES, Fluidigm C1 single-cell system
  • Software: Bambino, BLAT, Circos

Limitations (as stated by authors)

  • Detection power for low-frequency drivers: “Because of reduced power for detecting low-frequency drivers, detection limits were 1% for the entire cohort and 3% for individual histotypes with more than 200 samples” — implying rare driver genes may have been missed, especially in osteosarcomas and Wilms tumours (smaller sample sizes).
  • Novel signatures likely artefactual: “T-11 was the only signature that was significantly correlated (r^2 = 0.9) with the presence of multi-nucleotide variations composed of co-occurring SNVs separated by 3 or 4 bp which were not verified by Illumina WES. Therefore, it is likely to be associated with platform-specific sequencing artefacts.” Both T-10 and T-11 were enriched in low-MAF mutations, but the quoted artefact attribution applies specifically to T-11.
  • Statistical power for mutual exclusivity: “Although the co-occurrence test is well-powered for most gene pairs, we recognize that the mutual exclusivity test is not powered for most gene pairs” — meaning many reported exclusivity pairs with P < 0.05 but Q > 0.05 may not replicate.
  • No statistical methods for sample size: “No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.”
  • Osteosarcoma WES quality: 23 osteosarcoma WES samples excluded from driver discovery and coding mutation rate calculations due to quality issues.
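The detection-power limitation in the first bullet can be made concrete with a simple binomial calculation. This is an illustration only, not the authors' power analysis: it assumes a driver is "seen" when at least `k` samples carry it, and the choice of k = 3 is hypothetical. Cohort sizes are those given under Methods.

```python
from math import comb


def prob_at_least(k, n, freq):
    """P(at least k of n samples carry the alteration) when each sample
    independently carries it with probability `freq` (binomial model)."""
    below = sum(comb(n, i) * freq**i * (1 - freq) ** (n - i) for i in range(k))
    return 1 - below


cohorts = {"B-ALL": 689, "NBL": 316, "WT": 128, "OS": 89}
for name, n in cohorts.items():
    # chance of observing a 3%-frequency driver in at least 3 samples
    print(name, round(prob_at_least(3, n, 0.03), 3))
```

Under these assumptions a driver present in 3% of cases is almost certain to recur in the 689 B-ALL samples but has only about an even chance of appearing three times among the 89 osteosarcomas, which is the sense in which smaller cohorts are more likely to miss rare drivers.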

Relevance to Clonal Evolution

This paper is foundational for understanding pediatric cancers as a distinct evolutionary regime within clonal evolution theory. Several findings reshape how evolutionary dynamics should be understood in this context:

Low mutation burden constrains selection. With median mutation rates of 0.17–0.79 per Mb, against 1–10 per Mb in common adult cancers (from slightly lower to roughly 60-fold lower), the substrate upon which natural selection can act is dramatically reduced. This implies that fewer driver events are available and that the evolutionary dynamics may be dominated by a small number of strong-effect drivers, or that non-genetic mechanisms (epigenetic, developmental) play a proportionally larger role in pediatric cancer progression.

Different driver architecture alters evolutionary trajectories. The dominance of CNAs and structural variants (62% of driver events) over point mutations in pediatric cancers means that the evolutionary “steps” are larger and qualitatively different from those in adult cancers. A single chromothripsis event can simultaneously alter many genes, producing punctuated evolutionary changes rather than gradual accumulation. The 11% chromothripsis rate is consistent with punctuated-evolution models.

Developmental origins alter the evolutionary starting point. Pediatric cancers arise in developing (mesodermic) tissues rather than adult epithelial tissues, meaning the normal cellular context — proliferation rate, differentiation state, microenvironment — is fundamentally different. The clock-like mutational signatures (T-1, T-4) contributed 97% of mutations in T-ALL and 63% in AML, consistent with rapid proliferation in developing tissues.

Subclonal architecture is prevalent despite low mutation rates. The finding that 40–50% of point mutations in leukaemia and NBL driver genes had low MAFs (<0.3) indicates that intratumor-heterogeneity exists even in low-mutation-burden cancers. This challenges the assumption that low mutation rates would produce homogeneous tumours.

ASE as an underappreciated evolutionary mechanism. The finding that truncating mutations are preferentially suppressed while hotspot mutations are elevated at the RNA level reveals a layer of expression-level regulation that modulates the evolutionary impact of somatic mutations. Single-cell sequencing of the WT1 D447N case showed that such ASE can reflect epigenetic silencing rather than subclonal LOH, so this dimension is not captured by DNA-level subclonal-reconstruction alone.

The companion pediatric pan-cancer study (Gröbner et al., 2018, analysing 961 tumours across 24 cancer types, including CNS tumours) complements these findings, and both datasets are available through the ProteinPaint portal, providing an integrated resource for pediatric cancer evolutionary genomics.