APOBEC Mutagenesis
Definition
APOBEC mutagenesis is an endogenous mutational process driven by the AID/APOBEC family of cytidine deaminases — enzymes that deaminate cytidine to uridine on single-stranded DNA. The resulting mutations are predominantly C>T and C>G substitutions occurring at TpC dinucleotides (the nucleotide immediately 5’ of the mutated cytidine is thymine). This process generates mutational signatures 2 and 13 and is active across the majority of human cancer types, providing a continuous source of somatic genetic variation throughout tumor evolution.
Molecular Basis
The AID/APOBEC family (Conticello, 2008) belongs to the zinc-dependent deaminase superfamily and originated from tRNA-editing enzymes at the dawn of vertebrate radiation. All members share a conserved catalytic mechanism: a zinc-activated water molecule attacks carbon 4 of cytidine, with a nearby glutamate acting as proton donor, converting cytidine to uridine (C→U). The catalytic core is defined by the H[AV]E-x[24-36]-PCxxC motif, which coordinates the zinc atom.
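The motif notation above maps directly onto a regular expression. The following is a minimal sketch, assuming `x` means any residue and `x[24-36]` means a spacer of 24 to 36 residues. The function name and the toy sequence are illustrative; the toy string is synthetic and is not a real APOBEC protein sequence.

```python
import re

# Regex rendering of the H[AV]E-x[24-36]-PCxxC motif quoted above.
# Reading "x" as "any residue" is an assumption of this sketch.
ZDD_MOTIF = re.compile(r"H[AV]E.{24,36}PC.{2}C")


def find_deaminase_motifs(protein_seq):
    """Return (start, end, matched_text) for each non-overlapping motif hit."""
    seq = protein_seq.upper()
    return [(m.start(), m.end(), m.group()) for m in ZDD_MOTIF.finditer(seq)]


# Synthetic toy sequence: HAE + 24-residue spacer + PCLHC, with short flanks.
toy = "MK" + "HAE" + "G" * 24 + "PCLHC" + "RR"
hits = find_deaminase_motifs(toy)
```

A spacer shorter than 24 residues yields no hit, which is the behaviour the bracketed range in the motif implies.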
The key members relevant to cancer mutagenesis are:
- APOBEC3A: The main driver of somatic APOBEC mutagenesis in cancer (Petljak et al., 2022). Preferentially targets YTCA sequence motifs (Y = pyrimidine). Its expression can be induced by viral infection and inflammatory signaling.
- APOBEC3B: A minor contributor to APOBEC mutational burdens. Despite higher expression levels and stronger in vitro deaminase activity — which historically led the field to designate it the primary mutator — APOBEC3B deletion does not significantly reduce SBS2/SBS13 in cell lines with strong APOBEC3 mutagenesis. In some contexts, APOBEC3B can restrain APOBEC3A by reducing its protein levels (Petljak et al., 2022). Preferentially targets RTCA motifs (R = purine).
- AID: The ancestral family member, normally restricted to activated B cells in germinal centers. Required for physiological somatic hypermutation and class-switch recombination at the immunoglobulin locus. Aberrant AID activity causes c-myc/IgH translocations in B-cell lymphomas.
- APOBEC3F and APOBEC3G: Broadly expressed antiviral restriction factors; packaged into HIV virions and deaminate viral cDNA during reverse transcription.
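Each APOBEC paralog has a distinct sequence-context preference for the nucleotides 5’ of the deaminated cytidine: the base at position -1 separates APOBEC3G (5’-CC) from the TC-preferring paralogs, and the base at position -2 separates APOBEC3A (YTCA) from APOBEC3B (RTCA). These preferences are the basis for distinguishing their relative contributions in mutational signature analysis.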
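The YTCA versus RTCA distinction can be sketched as a small context classifier. This is an illustrative helper, not a published tool: the function names are hypothetical, and it assumes a plain reference string with a 0-based index of the mutated base. A mutated G is treated as a C on the opposite strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_apobec_context(ref_seq, pos):
    """Label the base at ref_seq[pos] as 'YTCA', 'RTCA', or None.

    The 4-mer is read 5'->3' on the strand carrying the cytosine, with the
    mutated C in the third position (N-T-C-A).
    """
    seq = ref_seq.upper()
    base = seq[pos]
    if base == "C":
        if pos < 2 or pos + 1 >= len(seq):
            return None  # not enough flanking sequence
        window = seq[pos - 2 : pos + 2]
    elif base == "G":
        if pos < 1 or pos + 2 >= len(seq):
            return None
        # The C sits on the opposite strand, so flip the window.
        window = reverse_complement(seq[pos - 1 : pos + 3])
    else:
        return None
    if window[1:] != "TCA":
        return None
    if window[0] in "CT":
        return "YTCA"  # pyrimidine at -2: APOBEC3A-like preference
    if window[0] in "AG":
        return "RTCA"  # purine at -2: APOBEC3B-like preference
    return None
```

For example, `classify_apobec_context("TTCA", 2)` gives `"YTCA"`, and the same site seen from the other strand, `classify_apobec_context("TGAA", 1)`, gives the same label.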
Cancer Mutational Signatures
Nik-Zainal et al. (2012) were the first to extract APOBEC-associated mutational signatures from whole-genome sequencing of 21 breast cancers, identifying two distinct patterns:
- Signature 2: Predominantly C>T substitutions at TpC dinucleotides, attributed to APOBEC activity. Found across most cancer types.
- Signature 13: A related pattern at TpC dominated by C>G substitutions (with some C>A), also attributed to APOBEC enzymes. Its difference from Signature 2 largely reflects how the uracil is processed after deamination, although paralog-specific sequence-context preferences may also contribute.
The TpC preference arises because APOBEC enzymes preferentially deaminate cytidines preceded by thymine in the single-stranded DNA substrate. The resulting uracil is then either: (i) replicated across, producing a C→T transition (the uracil is read as thymine by DNA polymerase), or (ii) excised by uracil DNA glycosylase (UNG), creating an abasic site that is bypassed by translesion synthesis polymerases, often generating C→G transversions. The balance between these two outcomes determines the relative abundance of C>T and C>G mutations in the signature.
flowchart TD
    Enzyme["APOBEC Enzyme Activation<br>ssDNA substrate at TpC motifs"] --> Deam["C to U Deamination<br>A3A: YTCA preference (main driver)<br>A3B: RTCA preference (minor, restrains A3A)"]
    Deam --> Fork{"Uracil processed?"}
    Fork -->|"YES: UNG excises Uracil<br>creates abasic site"| REV1["REV1 TLS Bypass<br>dCMP insertion opposite<br>abasic site"]
    Fork -->|"NO: Uracil persists"| Repl["Replication across Uracil<br>U read as Thymine<br>by DNA polymerase"]
    REV1 --> CG["C to G Transversions<br>(primary outcome)"]
    REV1 --> CA["C to A Transversions<br>(minor: other TLS polymerases)"]
    Repl --> CT["C to T Transitions"]
    CT --> SBS2["SBS2<br>(C to T at TpC)"]
    CG --> SBS13["SBS13<br>(C to G at TpC)"]
    CA --> SBS13
    SBS2 --> Episodic["Subclonal Mutation Burden<br>13-fold episodic variation<br>between daughter clones<br>(Petljak et al., 2022)"]
    SBS13 --> Episodic
Figure: The APOBEC mutagenesis pathway from deamination to mutational signature output. C-to-U deamination by APOBEC3A (main driver, YTCA preference) or APOBEC3B (minor/restrainer, RTCA preference) on single-stranded DNA at TpC motifs produces uracil. When UNG excises the uracil, the resulting abasic site is bypassed by REV1-dependent translesion synthesis, producing C-to-G transversions (SBS13). When uracil persists, replication reads it as thymine, producing C-to-T transitions (SBS2). The balance between these two processing paths determines the SBS2/SBS13 ratio. Output is episodic, with 13-fold variation between daughter clones. Synthesized from Nik-Zainal et al. (2012), Petljak et al. (2022), and Conticello (2008).
Causal Evidence
For two decades, the role of APOBEC3 enzymes in cancer mutagenesis was supported by indirect evidence: expression correlations, in vitro deamination activity, and mutational pattern similarity. Petljak et al. (2022) provided the first direct causal evidence by deleting APOBEC3A and APOBEC3B from human cancer cell lines that naturally acquire APOBEC3-associated mutations and tracking de novo mutation acquisition across 251 whole-genome-sequenced clonal lines:
- APOBEC3A deletion significantly diminished SBS2 and SBS13 mutations across breast cancer and B cell lymphoma cell lines, directly establishing APOBEC3A as the main driver.
- APOBEC3B deletion alone did not significantly reduce SBS2/SBS13 in cell lines with strong APOBEC3 mutagenesis, challenging predictions based on APOBEC3B’s higher expression and in vitro deaminase activity.
- APOBEC3B deletion increased APOBEC3A-mediated mutagenesis in MDA-MB-453 breast cancer cells — APOBEC3A protein levels were stabilized after APOBEC3B loss, revealing that APOBEC3B can restrain APOBEC3A. This provides a mechanistic explanation for why the APOBEC3B germline deletion polymorphism is associated with increased cancer risk.
- APOBEC3A/APOBEC3B double knockout further decreased but did not eliminate APOBEC3 mutations, suggesting APOBEC3H or another enzyme may contribute small amounts.
This study also confirmed the episodic nature of APOBEC3 mutagenesis: daughter clones from the same parent acquired anywhere from 954 to 12,504 SBS2 mutations over similar time periods — a 13-fold range not explained by growth differences or expression levels.
Historical Consensus Shift: APOBEC3B-Centric to APOBEC3A-Centric Model
The causal evidence from Petljak et al. (2022) represents a major revision of the field’s understanding of which APOBEC paralog drives cancer mutagenesis. The pre-2022 consensus, established primarily by Burns et al. (2013, 2015), held that APOBEC3B was the primary somatic mutator — a conclusion based on its higher expression levels in cancer and stronger in vitro deaminase activity. The Petljak et al. CRISPR knockout experiments directly tested this model and reversed it.
| Aspect | Pre-2022 Consensus (Burns 2013, 2015) | Petljak et al. (2022) Empirical Evidence |
|---|---|---|
| Primary mutator | APOBEC3B — higher expression, stronger in vitro activity | APOBEC3A — A3A knockout significantly diminishes SBS2/SBS13; A3B knockout does not |
| APOBEC3B role | Primary driver of somatic APOBEC mutations | Minor contributor; can restrain APOBEC3A protein stability |
| Evidence type | Correlative: expression levels and in vitro deamination assays | Causal: CRISPR knockout with de novo mutation tracking across 251 clonal WGS lines |
| A3B germline deletion | Predicted protective (removing the “mutator”) | Paradoxically increases APOBEC3A-mediated mutagenesis (removing a brake) |
| Sequence preference | APOBEC3B: RTCA | APOBEC3A: YTCA (matches the dominant TpC pattern in tumors) |
This is not a contradiction between studies of equal evidentiary weight. It is a case where stronger evidence (causal knockout experiments with direct mutation tracking) superseded weaker evidence (expression correlations and in vitro activity measurements). The Burns et al. model was a reasonable inference given the data available at the time — APOBEC3B is more highly expressed and shows stronger in vitro deaminase activity — but it conflated expression level with mutagenic impact. In cells, APOBEC3A, despite lower expression, is the predominant enzymatic source of APOBEC mutations. APOBEC3B’s higher expression may reflect a distinct biological role: it can restrain APOBEC3A by reducing its protein levels, functioning as a negative regulator rather than a primary mutator.
Dananberg et al. (2024) confirm this revised understanding across both in vitro and in vivo evidence: APOBEC3A is the most potent carcinogenic APOBEC3 paralog (the only one driving HCC in the Fah liver regeneration model), while APOBEC3B shows variable effects (promoting or constraining tumorigenesis depending on context).
Burns et al. papers not in corpus
Burns et al. (2013) “APOBEC3B is an enzymatic source of mutation in breast cancer” (Nature) and Burns et al. (2015) “APOBEC3B: pathological consequences of an innate immune DNA mutator” (Biomedical Journal) are not in the current wiki corpus. The pre-2022 consensus characterization above is reconstructed from how Petljak et al. (2022) and Dananberg et al. (2024) describe the prior literature. Direct source-summaries for Burns et al. would strengthen the evidence trail for this claim revision.
Downstream Processing: UNG and REV1
APOBEC3-mediated C→U deamination is only the first step. The resulting uracil is processed by downstream enzymes that determine the final mutation type:
- UNG (uracil DNA glycosylase) excises the uracil, creating an abasic site. This is required for APOBEC3-mediated transversions (C>G and C>A). UNG deletion reduced C>A and C>G proportions and decreased SBS13 burdens; UNG reconstitution restored them (Petljak et al., 2022).
- REV1 is a translesion synthesis polymerase with deoxycytidyl transferase activity opposite abasic sites. REV1 deletion decreased overall SBS2 and SBS13a/b burdens and reduced C>G proportions. REV1 may also contribute to the ubiquitous clock-like signature SBS5 (Petljak et al., 2022).
- SMUG1 can partially substitute for UNG in excising APOBEC3-mediated uracil, consistent with previous observations in immunoglobulin diversification (Petljak et al., 2022).
The balance between these processing pathways determines the mutational outcome: UNG excision followed by REV1-dependent TLS produces transversions (SBS13), while replication across the uracil without excision produces transitions (SBS2).
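The transition/transversion balance described above can be summarised with a simple tally. This is a rough sketch, not a signature-extraction method: it assumes each mutation is already expressed on the pyrimidine strand as a hypothetical `(five_prime_base, ref, alt)` tuple, and it only counts substitutions at TpC.

```python
from collections import Counter


def tally_tpc_outcomes(mutations):
    """Count C>T, C>G and C>A substitutions at TpC sites.

    mutations: iterable of (five_prime_base, ref, alt) tuples, all given on
    the strand carrying the cytosine (an assumption of this sketch).
    """
    counts = Counter()
    for five_prime, ref, alt in mutations:
        if five_prime != "T" or ref != "C":
            continue  # not a TpC cytosine
        if alt == "T":
            counts["transition_C>T"] += 1  # replication across uracil
        elif alt in ("G", "A"):
            counts["transversion_C>" + alt] += 1  # abasic-site bypass
    return counts


def transversion_fraction(counts):
    """Share of TpC mutations that are transversions, or None if empty."""
    total = sum(counts.values())
    if total == 0:
        return None
    transversions = counts["transversion_C>G"] + counts["transversion_C>A"]
    return transversions / total


example = [("T", "C", "T"), ("T", "C", "T"), ("T", "C", "G"),
           ("T", "C", "A"), ("A", "C", "T")]
example_counts = tally_tpc_outcomes(example)
```

A fraction near 0 would be consistent with uracil mostly persisting until replication, and a higher fraction with more excision followed by translesion bypass. The tally is only a crude proxy for the SBS2/SBS13 ratio.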
Episodic Mutagenesis
APOBEC3 mutagenesis does not operate continuously. Petljak et al. (2022) documented 13-fold variation in SBS2 burdens between daughter clones propagated in parallel from the same parent, consistent with episodic bursts of APOBEC3 activity. This episodic pattern had previously been reported in cancer cell lines and has implications for intratumor heterogeneity: a single APOBEC3 burst can generate thousands of mutations in a short time window, creating substantial subclonal genetic diversity in a single event rather than through gradual accumulation.
APOBEC Mutagenesis in Tumor Evolution
Late-acting mutational process. Unlike age-related signature 1 (C>T at CpG, which accumulates steadily throughout life), APOBEC mutagenesis often appears to act episodically and can remain active late in tumor evolution (McGranahan & Swanton, 2017; Nik-Zainal et al., 2012). This late activity generates ongoing subclonal mutational diversity in established tumors, providing the standing variation from which therapy-resistant and metastasis-competent clones can emerge.
Coupling to genomic catastrophe. APOBEC mutagenesis is mechanistically linked to large-scale genomic alterations. Single-stranded DNA generated during resection at chromothripsis and rearrangement breakpoints serves as substrate for APOBEC deaminases. This coupling produces kataegis — localized hypermutation clusters colocalizing with rearrangement breakpoints — where hundreds of C>T and C>G substitutions occur in kilobase-scale regions surrounding structural variant junctions.
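Localized clusters of this kind are usually found by scanning inter-mutation distances along a chromosome. The sketch below is one simple way to do that. The thresholds (`max_gap=1000`, `min_mutations=6`) are illustrative defaults chosen for this example, not values taken from the text above, and the function name is hypothetical.

```python
def find_mutation_clusters(positions, max_gap=1000, min_mutations=6):
    """Return (start, end, n_mutations) for dense runs of mutations.

    A run is a maximal set of consecutive mutations on one chromosome in
    which every neighbouring pair lies within max_gap base pairs. Runs with
    fewer than min_mutations members are discarded.
    """
    clusters = []
    run = []
    for pos in sorted(positions):
        if run and pos - run[-1] > max_gap:
            # Gap too large: close the current run and start a new one.
            if len(run) >= min_mutations:
                clusters.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_mutations:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


# Synthetic positions: one dense run of six, then scattered mutations.
demo_positions = [100, 300, 450, 900, 1200, 1500, 50000, 120000, 120100]
demo_clusters = find_mutation_clusters(demo_positions)
```

To tie a cluster to APOBEC activity specifically, the same scan would be restricted to C>T and C>G substitutions at TpC and compared against structural variant breakpoint coordinates. Both steps are left out of this sketch.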
A substrate for selection. APOBEC mutations generate neo-epitopes that can be recognized by the immune system, linking this mutational process to immunotherapy response. Conversely, APOBEC-mediated mutations can also generate therapy-resistance alleles — the same enzymatic activity that diversifies the immunoglobulin repertoire in B cells can, when misdirected, diversify the cancer genome to evade treatment.
Germline Modulation
The PCAWG Consortium (2020) demonstrated that germline genetic variation directly modulates APOBEC mutagenesis. A common ~30-kb germline deletion on chromosome 22q13.1 removes the APOBEC3B coding sequence and fuses the APOBEC3A coding sequence to the APOBEC3B 3’ UTR (tagged by rs12628403, MAF ~8% in Europeans). Paradoxically, carriers of this deletion show higher APOBEC3-associated mutation burdens, and the deletion is associated with increased cancer risk in some contexts.
Petljak et al. (2022) provided the mechanistic explanation: loss of APOBEC3B increases APOBEC3A protein stability, leading to higher APOBEC3A-mediated mutagenesis. The APOBEC3B germline deletion, by removing APOBEC3B, effectively removes a brake on APOBEC3A — the hybrid APOBEC3A-APOBEC3B transcript produced by the deletion may further stabilize APOBEC3A.
The A3AB deletion shows striking ethnic variation in prevalence (Dananberg et al., 2024): Southeast Asian 36.9%, South American 57.7%, African 0.9%, European ~6%. This has implications for population-level cancer susceptibility — populations with higher deletion frequencies may have higher APOBEC3A-mediated mutagenesis.
A second germline variant, rs1014971 (allele T), located in a long-distance enhancer upstream of the APOBEC3 cluster, interacts with the APOBEC3B promoter and is associated with increased bladder cancer risk, elevated APOBEC3B expression, and higher APOBEC-signature burdens in bladder tumors. In breast cancer, the same variant is associated with increased APOBEC3B expression but NOT with increased APOBEC mutations — suggesting APOBEC3B may contribute to breast cancer susceptibility through mutagenesis-independent mechanisms.
Tissue Specificity and Timing
APOBEC mutagenesis is profoundly tissue-specific. Dananberg et al. (2024) synthesized data from 20+ normal-tissue sequencing studies:
Tissues with APOBEC in normal epithelium (signatures active before malignancy):
- Lung bronchus: 11–78% of samples
- Small intestine: 14–73% of samples
- Bladder urothelium: ~22% of samples
Tissues where APOBEC is absent or rare in normal tissue (signatures appear only during/after transformation):
- Colon epithelium: 0–0.5% of samples
- Liver, endometrium, blood, bone marrow, placenta, prostate: 0% across thousands of samples
- Esophagus: ~0–2% in normal epithelium, rising to ~28% at high-grade intraepithelial neoplasia
This distribution mirrors the pattern seen in cancers: APOBEC signatures are prevalent in lung, bladder, and breast cancers but rare or absent in liver, testicular, and thyroid cancers. The tissue-specific regulation of APOBEC enzymes, likely involving epigenetic silencing in tissues where their activity would be most dangerous, determines which cancer types experience APOBEC-driven clonal evolution.
Timing differs by tissue. In esophageal squamous cell carcinoma, APOBEC hypermutation appears at the high-grade dysplasia stage, after TP53 biallelic loss: it is a late event, a consequence rather than a cause of clonal expansion. In lung and intestine, where APOBEC signatures appear in histologically normal epithelium, mutagenesis may contribute from the earliest stages of carcinogenesis. This has implications for early cancer detection: in tissues where APOBEC acts early, APOBEC-associated mutations could serve as biomarkers of cancer susceptibility.
In Vivo Evidence
Dananberg et al. (2024) catalogued transgenic mouse models that establish causal roles for APOBEC enzymes in carcinogenesis:
- APOBEC3A is the most potent carcinogenic APOBEC3. In the Fah liver regeneration model, APOBEC3A expression drove hepatocellular carcinoma in a manner dependent on catalytic activity, whereas the other six APOBEC3 paralogs, including APOBEC3B, failed to induce tumors. APOBEC3A also promoted colon cancer in ApcMin mice and drove pancreatic cancer metastasis through a non-catalytic, deaminase-independent mechanism.
- APOBEC3B has context-dependent effects. Constitutive tumor-level APOBEC3B expression accelerated carcinogenesis, increased tumor heterogeneity, and promoted metastasis in older wild-type mice (Durfee 2023). However, in EGFR-mutant lung cancer, APOBEC3B constrained tumorigenesis (Caswell 2023), and acute overexpression caused RNA editing and lethality (de la Vega 2023). These variable outcomes mirror the Petljak et al. (2022) finding that APOBEC3B can restrain APOBEC3A in some contexts.
- APOBEC3G contributes to bladder cancer. Constitutive APOBEC3G expression in a BBN-induced bladder cancer model promoted mutagenesis, genomic instability, and kataegis, producing a novel SBS signature and shorter survival (Liu 2023). This extends the mutational repertoire beyond the canonical APOBEC3A/B dichotomy.
Detection and Interpretation
APOBEC mutagenesis is inferred from mutational pattern similarity — the observed C>T and C>G substitutions at TpC dinucleotides match the in vitro sequence preferences of APOBEC enzymes. Direct demonstration of APOBEC enzymatic activity in individual tumors remains challenging. The relative contribution of different APOBEC paralogs (3A vs 3B vs 3F vs 3G) is inferred from their distinct sequence-context preferences (e.g., APOBEC3G prefers 5’-CC while APOBEC3B prefers 5’-TC), but deconvolution of these overlapping signals is an area of active investigation.
APOBEC as a Clock-Disrupting Mutational Process
APOBEC mutagenesis poses a fundamental challenge to the molecular clock framework used in cancer evolutionary reconstruction. The molecular clock depends on the constant-rate assumption: that neutral passenger mutations accumulate at a stable rate over a tumor’s evolutionary history. APOBEC violates this assumption through three distinct mechanisms.
Episodic Activity Breaks Rate Constancy
Petljak et al. (2022) documented 13-fold variation in SBS2 burdens between daughter clones propagated from the same parent over similar time periods. This is not gradual accumulation — it is a burst process. When an APOBEC burst fires, a clone can acquire thousands of mutations in a short time window; between bursts, the mutation rate returns to baseline. This episodic pattern means passenger mutation count cannot be naively converted to elapsed evolutionary time in lineages experiencing APOBEC activity: a branch with 10,000 mutations is not necessarily an order of magnitude older than a branch with 1,000 — it may simply have experienced an APOBEC burst that the other branch did not.
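The 10,000 versus 1,000 example can be worked through numerically. This is a toy calculation under stated assumptions: the baseline rate (100 mutations per unit time), the elapsed time, and the burst size are all hypothetical numbers chosen to reproduce the example, and the function names are illustrative.

```python
def branch_burden(elapsed_time, baseline_rate, burst_mutations=0):
    """Total passenger count: steady accumulation plus any burst."""
    return elapsed_time * baseline_rate + burst_mutations


def naive_elapsed_time(mutation_count, baseline_rate):
    """Clock estimate that assumes a constant mutation rate."""
    return mutation_count / baseline_rate


BASELINE_RATE = 100   # hypothetical mutations per unit time
TRUE_ELAPSED = 10     # both branches have the same true age

quiet = branch_burden(TRUE_ELAPSED, BASELINE_RATE)
bursty = branch_burden(TRUE_ELAPSED, BASELINE_RATE, burst_mutations=9000)

quiet_estimate = naive_elapsed_time(quiet, BASELINE_RATE)
bursty_estimate = naive_elapsed_time(bursty, BASELINE_RATE)
```

Both branches are the same age, yet the constant-rate estimate for the branch with the burst is ten times larger. The error comes entirely from treating burst mutations as if they had accumulated at the baseline rate.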
Tissue-Specific Onset Creates Clock Asynchrony
In tissues where APOBEC turns on late in tumor evolution — for example, esophageal squamous cell carcinoma, where APOBEC hypermutation appears only after TP53 biallelic loss at the high-grade dysplasia stage (Dananberg et al., 2024) — the clock effectively “accelerates” at a defined evolutionary transition. Mutations acquired before this transition accumulated at a different rate (and with a different mutational spectrum) than those acquired after. Comparing passenger counts across this boundary without accounting for the rate change would misdate branching events. In tissues where APOBEC acts early (lung, intestine — signatures present in histologically normal epithelium), the clock disruption spans the entire observable evolutionary history, potentially making APOBEC-active lineages appear systematically older than APOBEC-quiescent ones of equal chronological age.
Signature Shifts Compound Rate Changes
Gerstung et al. (2020) found that ~40% of tumors show significant mutational spectrum shifts between early (clonal) and late (subclonal) evolutionary epochs. APOBEC-driven spectrum changes — from clock-like SBS1 dominance early to SBS2/SBS13 dominance late — are a leading example (Nik-Zainal et al., 2012). When the mutational spectrum changes, the overall passenger mutation rate likely changes as well, because different mutational processes operate at different intrinsic rates. A lineage that experienced an APOBEC burst will show both a higher total mutation burden and a different signature composition than a lineage of equal chronological age that did not — confounding both clock timing and signature-based evolutionary epoch classification.
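One generic way to quantify a spectrum shift between two epochs is the cosine similarity of their signature exposure vectors. The sketch below illustrates that idea only; it is not the test used by Gerstung et al. (2020), and the exposure values are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two {signature: exposure} dicts, or None."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return None  # undefined for an all-zero vector
    return dot / (norm_a * norm_b)


# Hypothetical exposures: clock-like early, APOBEC-dominated late.
clonal = {"SBS1": 0.6, "SBS5": 0.3, "SBS2": 0.05, "SBS13": 0.05}
subclonal = {"SBS1": 0.1, "SBS5": 0.2, "SBS2": 0.4, "SBS13": 0.3}

shift_similarity = cosine_similarity(clonal, subclonal)
```

A value near 1 means the two epochs share a spectrum; a markedly lower value flags a shift. Note that cosine similarity compares only the shape of the spectrum, so any accompanying change in overall mutation rate has to be assessed separately.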
Implications for Neutral Evolution Inference
Under neutral evolution, passenger mutations provide the temporal record against which selection is tested. When those passengers are generated by an episodic, clock-disrupting process like APOBEC, the neutral record becomes distorted in three ways:
False selection signal. An APOBEC burst in a subclone generates a sudden spike of high-frequency mutations. Under the standard neutral null (constant rate), these mutations would appear “too abundant for their frequency” — mimicking the signature of a selective sweep. The excess is real (the mutations exist), but its cause is mutagenic rather than selective. Without correcting for APOBEC activity, a neutral subclone that experienced a burst could be misclassified as showing positive selection. The two-step selection-inference procedure formalized in neutral-evolution (estimate death-birth ratio delta first, then test for deviations) does not automatically correct for this — APOBEC-induced mutation rate variation is a distinct confounder from the cell-turnover effects captured by delta.
Clonal-not-truncal ambiguity amplified. The Bozic et al. (2016) insight that clonal passengers can reach fixation without selection (clonal mutations are not necessarily truncal mutations) is amplified by APOBEC: an APOBEC burst early in clonal expansion generates mutations that ride the expansion to high frequency. These mutations appear clonal (present in all sampled cells) but are not truncal (not present in the founding cell). Standard subclonal reconstruction methods that equate clonal with truncal will overestimate the number of truncal events in APOBEC-active tumors. See neutral-evolution for the mathematical basis.
Bursts hidden in the detection blind zone. The molecular clock’s detection-limit blind zone (CCF below 0.05-0.10, corresponding to the most recent ~7 population doublings at standard sequencing depth; Tarabichi et al., 2021) means the most recent APOBEC bursts are invisible to standard sequencing. A clone that experienced a burst 3 doublings ago would have its burst mutations at CCF ~0.01-0.02 — below detection threshold. The clock systematically misses APOBEC’s most recent disruptions, underestimating its contribution to ongoing subclonal diversity and to the total passenger burden of recent subclones.
Practical Mitigation
When APOBEC signatures are present, evolutionary timing inferences should account for the rate disruption. Two approaches are available: (i) exclude APOBEC-attributed mutations from clock calculations and use only clock-like signatures (SBS1, SBS5) for timing, or (ii) model APOBEC as a time-varying rate process, estimating burst timing from the VAF distribution of SBS2/SBS13 mutations. The first approach is simpler but discards information; the second is more informative but requires modeling assumptions about burst frequency and duration that remain poorly constrained by current data. Neither approach is standard practice in current cancer genomics pipelines.
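Approach (i) amounts to a filter on per-signature mutation counts. The following is a minimal sketch, assuming a signature-attribution step has already produced hypothetical per-branch counts keyed by signature name. The branch values are invented, and the helper names are illustrative rather than part of any existing pipeline.

```python
CLOCK_LIKE = ("SBS1", "SBS5")
APOBEC = ("SBS2", "SBS13")


def clock_like_burden(signature_counts):
    """Mutations attributed to clock-like signatures only."""
    return sum(signature_counts.get(sig, 0) for sig in CLOCK_LIKE)


def apobec_fraction(signature_counts):
    """Share of all mutations attributed to SBS2/SBS13, or None if empty."""
    total = sum(signature_counts.values())
    if total == 0:
        return None
    return sum(signature_counts.get(sig, 0) for sig in APOBEC) / total


# Hypothetical branches of equal age; branch_a experienced an APOBEC burst.
branch_a = {"SBS1": 300, "SBS5": 700, "SBS2": 6000, "SBS13": 3000}
branch_b = {"SBS1": 280, "SBS5": 720}

total_ratio = sum(branch_a.values()) / sum(branch_b.values())
clock_ratio = clock_like_burden(branch_a) / clock_like_burden(branch_b)
```

With these made-up numbers the total burdens differ tenfold while the clock-like burdens are equal, which is the correction approach (i) aims for. The cost is visible too: nine tenths of branch_a's mutations are discarded, and the result is only as good as the upstream signature attribution.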
Revision history
- 2026-06-27 — Added Mermaid pathway diagram: enzyme activation to mutational signature output (deamination, UNG excision fork, REV1-dependent transversions vs. replication-across-uracil transitions, SBS2/SBS13). Added “APOBEC as a Clock-Disrupting Mutational Process” section connecting to molecular-clock and neutral-evolution: episodic rate violation, tissue-specific clock asynchrony, signature-shift compound effects, three implications for neutral evolution inference (false selection signal, clonal-not-truncal amplification, detection blind zone), practical mitigation approaches. Formalized Burns (2013, 2015) to Petljak et al. (2022) historical consensus shift as a structured tension table with evidence-type comparison, explicit characterization as stronger-evidence-superseding-weaker-evidence rather than contradiction of equals, and corpus-gap note for Burns papers. Added molecular-clock and neutral-evolution to related frontmatter links.
- 2026-06-20 — Added tissue specificity and in vivo evidence from Dananberg et al. (2024): tissue prevalence table (lung 78%, intestine 73%, bladder 22%, liver/endometrium/blood 0%), APOBEC3A as most potent carcinogenic APOBEC3 in vivo (only paralog driving HCC in Fah model), APOBEC3G bladder cancer role, APOBEC3B variable effects across models. Added A3AB deletion ethnic variation and rs1014971 SNP. (dananberg2024-apobec-cancer-review)
- 2026-06-20 — Claim revised. Previous claim: “APOBEC3A and APOBEC3B are the primary sources of somatic APOBEC mutagenesis in cancer” — reflecting the Burns et al. (2013, 2015) APOBEC3B-centric model based on expression and in vitro deamination activity. New claim from Petljak et al. (2022): APOBEC3A is the main driver; APOBEC3B contributes only small burdens and can restrain APOBEC3A protein stability. First direct causal evidence from knockout experiments. Also added UNG/REV1 downstream processing and episodic mutagenesis. (petljak2022-apobec3-mechanisms)
- 2026-06-20 — Created from Conticello (2008), Nik-Zainal et al. (2012), PCAWG Consortium (2020), and existing wiki sources (McGranahan & Swanton 2017, Turajlic et al. 2019). Resolves 10+ pre-existing wikilinks across concept pages.