Kataegis

Kataegis (from the Greek for “thunderstorm”) is a pattern of localized hypermutation first described by Nik-Zainal et al. (2012) in their whole-genome analysis of 21 breast cancers.

Defining Features

Kataegis is characterized by:

  • Clusters of C>T and C>G substitutions at TpC dinucleotides (the canonical APOBEC-mutagenesis motif)
  • Colocalization with sites of somatic structural rearrangement (breakpoints)
  • Spatial restriction: most kataegic foci span a few kilobases
  • Variable presence across tumors: some cancers show multiple kataegic events, others none
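
The first feature in the list above can be checked mechanically. The following is a minimal sketch, assuming each substitution is reported on the reference strand together with its two flanking bases; the function name and argument layout are hypothetical, not taken from any particular tool. Because a C>T change on one strand appears as G>A on the other, calls at a reference G are reverse-complemented before testing for the TpC context.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_tpc_c_to_t_or_g(ref, alt, five_prime, three_prime):
    """Return True for a C>T or C>G substitution at a TpC dinucleotide.

    All bases are given on the reference strand; five_prime and three_prime
    are the bases immediately flanking the mutated position (hypothetical
    input layout, chosen for illustration).
    """
    ref, alt = ref.upper(), alt.upper()
    five_prime, three_prime = five_prime.upper(), three_prime.upper()
    if ref == "G":
        # Express the change on the cytosine-containing strand: the base 5'
        # of the C is the complement of the base 3' of the G.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        five_prime = COMPLEMENT[three_prime]
    return ref == "C" and alt in ("T", "G") and five_prime == "T"


print(is_tpc_c_to_t_or_g("C", "T", "T", "A"))  # C>T in a TpC context
print(is_tpc_c_to_t_or_g("G", "A", "C", "A"))  # same event seen on the other strand
print(is_tpc_c_to_t_or_g("C", "T", "A", "T"))  # C>T, but at ApC rather than TpC
```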

Mechanistic Inference

The tight colocalization with rearrangement breakpoints suggests a mechanistic coupling between DNA breakage and localized hypermutation. Nik-Zainal et al. (2012) proposed the following model: single-stranded DNA (ssDNA) generated during resection at rearrangement breakpoints serves as a substrate for APOBEC cytidine deaminases. APOBEC enzymes act on single-stranded rather than double-stranded DNA, so in this model the break matters because resection exposes the ssDNA they require.

At the time of discovery, APOBEC involvement was inferred from the TpC mutational pattern — not directly demonstrated. Petljak et al. (2022) later provided direct causal evidence, identifying APOBEC3A as the primary driver of somatic APOBEC mutagenesis in cancer cells.

Enzymatic Chemistry

The AID/APOBEC family (Conticello, 2008) belongs to the zinc-dependent deaminase superfamily. The catalytic mechanism: a zinc-activated water molecule attacks carbon 4 of the cytosine ring, with a nearby glutamate shuttling protons, converting cytidine to uridine (C→U). The catalytic core is defined by the H[AV]E-x[24-36]-PCxxC motif, in which the histidine and the two cysteines coordinate the zinc atom and the glutamate is the catalytic residue. APOBEC enzymes preferentially target cytosine at TpC dinucleotide motifs on single-stranded DNA (Nik-Zainal et al., 2012).
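
The motif notation above maps directly onto a regular expression. The sketch below is illustrative only: it assumes a protein sequence in one-letter amino-acid codes, and the test sequence is synthetic rather than a real APOBEC protein. A match shows that the pattern is present, not that the protein is an active deaminase.

```python
import re

# H[AV]E-x[24-36]-PCxxC: His, Ala or Val, Glu, then 24 to 36 residues of any
# kind, then Pro-Cys, two residues of any kind, and a final Cys.
DEAMINASE_MOTIF = re.compile(r"H[AV]E.{24,36}PC.{2}C")


def find_deaminase_motifs(protein_seq):
    """Return (start_index, matched_text) for each non-overlapping motif hit."""
    return [(m.start(), m.group())
            for m in DEAMINASE_MOTIF.finditer(protein_seq.upper())]


# Synthetic sequence with a 30-residue spacer (made up for illustration).
toy_seq = "MK" + "HAE" + "G" * 30 + "PCAAC" + "G"
hits = find_deaminase_motifs(toy_seq)
print(hits[0][0])  # 0-based start of the first hit
```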

Two Mutational Outcomes

Once uracil is present in DNA, two paths determine the final mutation type (Nik-Zainal et al., 2012):

  1. Replication across uracil → C>T transition. If uracil persists until DNA replication, the replicative polymerase reads uracil as thymine and inserts adenine on the complementary strand. The result is a C:G → T:A transition.

  2. UNG excision → abasic site → C>G transversion. Uracil DNA glycosylase (UNG) recognizes uracil in DNA as foreign and excises it, creating an abasic site. Translesion synthesis polymerases (primarily REV1) bypass the abasic site, often inserting cytosine opposite it. At the next round of replication that cytosine templates a guanine at the originally mutated position. The result is a C:G → G:C transversion.

The balance between these two fates — replication bypass vs. UNG excision — determines the relative abundance of C>T and C>G mutations at a kataegic focus. This balance shapes the distinction between the two APOBEC-associated signatures: SBS2 (C>T dominant) and SBS13 (C>G dominant).
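
The two paths can be traced base by base. This is a deliberately simplified sketch that encodes only the two outcomes described above (adenine inserted opposite uracil, cytosine inserted opposite the abasic site); the path labels and function name are hypothetical, and other possible insertions opposite an abasic site are ignored.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Base inserted opposite the lesion on the newly synthesized strand.
INSERTED_OPPOSITE_LESION = {
    "replication": "A",  # uracil is read like thymine, so adenine is inserted
    "ung_rev1": "C",     # REV1 inserts cytosine opposite the abasic site
}


def substitution_after_deamination(path):
    """Return the substitution seen at a deaminated cytosine for a given path.

    The new strand is used as template in the following round of replication,
    so the base that replaces the original C is the complement of the
    inserted base.
    """
    if path not in INSERTED_OPPOSITE_LESION:
        raise ValueError(f"unknown path: {path!r}")
    inserted = INSERTED_OPPOSITE_LESION[path]
    return "C>" + COMPLEMENT[inserted]


for path in ("replication", "ung_rev1"):
    print(path, substitution_after_deamination(path))
```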

Significance

Kataegis demonstrates that mutational processes can be spatially and temporally heterogeneous within a single genome — not all regions experience the same mutational forces.

Spatial. Most kataegic foci span a few kilobases flanking the rearrangement breakpoint. The clustering is tight because the ssDNA substrate is available only where resection has exposed it.
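
Spatial clustering of this kind is usually detected from intermutation distances. The sketch below is a simplified heuristic, not the procedure used by Nik-Zainal et al.: it groups mutations on one chromosome whenever neighbors lie within max_gap base pairs and keeps groups with at least min_mutations members. Both thresholds and the function name are assumptions chosen for illustration.

```python
def find_mutation_clusters(positions, max_gap=1000, min_mutations=6):
    """Find runs of closely spaced mutations on a single chromosome.

    positions: iterable of integer coordinates.
    Returns a list of (start, end, n_mutations) tuples, one per run in which
    every pair of neighboring mutations is at most max_gap apart and the run
    holds at least min_mutations mutations.
    """
    clusters = []
    run = []
    for pos in sorted(positions):
        if run and pos - run[-1] > max_gap:
            # Gap too large: close the current run and start a new one.
            if len(run) >= min_mutations:
                clusters.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_mutations:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


# Made-up coordinates: two isolated mutations around one dense focus.
toy_positions = [100, 50_000, 50_200, 50_450, 50_900,
                 51_300, 51_800, 52_100, 900_000]
print(find_mutation_clusters(toy_positions))
```

A fuller analysis would also test whether each cluster sits near a rearrangement breakpoint and whether its mutations share the TpC context.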

Temporal. The mutations in a kataegic focus are thought to arise together with the rearrangement, not one by one over time. This violates the assumption underlying the molecular clock: that passenger mutations accumulate independently and gradually. A kataegic event can deposit many mutations at once in a single genomic region, and treating that burst as gradual accumulation would distort phylogenetic branch-length estimates. This has implications for subclonal reconstruction, which must distinguish between mutations acquired gradually and those acquired in a single burst.

Asexual Evolution and Linked Mutations

Cancer cells reproduce clonally through mitosis, without meiotic recombination. Beneficial mutations stay permanently linked to the genome they arose in, so the entire genome hitchhikes along with any driver event (McGranahan & Swanton, 2017). This is the population-genetic definition of asexual evolution: no segregation, no independent assortment, no meiotic shuffling.

Kataegis is the linked-genome principle at micro-scale. A single rearrangement event deposits a burst of temporally clustered mutations within a few kilobases. Those mutations are physically adjacent on the same chromosome and, barring a later rearrangement or deletion at that locus, travel together for the lifetime of the lineage. The kataegic focus is not just a mutational cluster; it is a linked haplotype generated in a single event.

This has a practical consequence for subclonal reconstruction: kataegic mutations cannot be treated as independent phylogenetic markers, because they share a single causative event. A phylogenetic method that weights each mutation equally would overweight a kataegic focus relative to the same number of mutations scattered across the genome. The scattered mutations represent independent events spread over evolutionary time, while the kataegic mutations represent a single moment.
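
One simple way to act on this is to down-weight clustered mutations so that each focus counts as a single event. The sketch below shows one possible weighting, not a method taken from the cited papers. It assumes each mutation has already been labeled with a focus identifier, or None if it is scattered; the labels and function names are hypothetical.

```python
from collections import Counter


def event_weights(focus_labels):
    """Return one weight per mutation so that every focus sums to 1.

    focus_labels: list with one entry per mutation; None marks a scattered
    mutation, any other value names the kataegic focus it belongs to.
    """
    sizes = Counter(label for label in focus_labels if label is not None)
    return [1.0 if label is None else 1.0 / sizes[label]
            for label in focus_labels]


def effective_event_count(focus_labels):
    """Count scattered mutations individually and each focus once."""
    scattered = sum(1 for label in focus_labels if label is None)
    foci = len({label for label in focus_labels if label is not None})
    return scattered + foci


# Six mutations: two scattered, four from a single focus labeled "k1".
labels = [None, "k1", "k1", "k1", "k1", None]
print(event_weights(labels))
print(effective_event_count(labels))
```

Under equal weighting these six mutations would count as six markers; under this scheme they count as three independent events.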