Cancer Evolution Olog

Introduction

An ontology log (olog) is a category-theoretic knowledge representation in which objects are sets representing entities in a domain and arrows are functions between them, so each element of an arrow's domain maps to exactly one element of its codomain (Giesa, Spivak, & Buehler, 2011). A hierarchical olog adds a subcategory H forming a forest (a collection of trees) that organizes objects by compositional scale. Commutative diagrams enforce consistency: different paths between the same two objects must yield the same result. A functor between two ologs preserves this compositional structure, providing a formal criterion for valid cross-domain analogies.

This olog formalizes the cancer evolution domain. Every object is a set whose elements are instances of the corresponding biological entity. Every arrow is a function mapping elements of its domain to elements of its codomain. Every commutativity condition is a biological constraint that the empirical literature either supports or challenges.
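The reading of objects as sets, arrows as functions, and commutativity as equality of composite functions can be checked mechanically on small examples. The following is a minimal Python sketch under stated assumptions: the element names (founder, subclone_A, g0, g1), the fitness values, the toy arrows fitness_of_genome and clone_fitness, and the helper names is_function, compose, and commutes are all hypothetical and are not part of the olog itself.

```python
# Hypothetical toy data: objects are finite sets, arrows are total functions (dicts).
Clone = {"founder", "subclone_A"}
GenomeState = {"g0", "g1"}
FitnessValue = {0.00, 0.02}

genome = {"founder": "g0", "subclone_A": "g1"}          # Clone -> GenomeState
fitness_of_genome = {"g0": 0.00, "g1": 0.02}            # GenomeState -> FitnessValue (toy arrow)
clone_fitness = {"founder": 0.00, "subclone_A": 0.02}   # Clone -> FitnessValue (toy arrow)


def is_function(arrow, domain, codomain):
    """An arrow is valid if it sends every domain element to one codomain element."""
    return set(arrow) == set(domain) and all(v in codomain for v in arrow.values())


def compose(first, second):
    """Return the composite arrow 'first, then second' as a dict."""
    return {x: second[y] for x, y in first.items()}


def commutes(path_a, path_b):
    """Two paths commute when they agree on every element of the shared domain."""
    return path_a == path_b


path_via_genome = compose(genome, fitness_of_genome)
print(commutes(path_via_genome, clone_fitness))
```

In this toy diagram the two paths from Clone to FitnessValue agree on every clone, so the triangle commutes; changing a single value in either dictionary would break the condition.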

The olog is constructed from the wiki’s existing concept pages: clonal-evolution, clonal-sweep, population-bottleneck, dual-regime-evolution, compression-progress-evolution, therapy-resistance, and subclonal-reconstruction. It serves as the foundational domain model for cross-domain functor construction — the canonical target category for analogies to ecology, protein science, music, and other domains formalized in the wiki’s cross-domain synthesis program.


1. Objects

Objects are organized by hierarchical (compositional) level. Each object is a set; its elements are instances of the biological entity it represents. For each object we provide a definition (what the set contains), biological examples (what elements look like), and wikilinks to the relevant wiki concept pages.

1.1 Molecular Level

GenomeState

The set of all possible DNA sequence configurations of a tumor cell, including point mutations (SNVs), insertions and deletions (indels), copy-number alterations (CNAs), structural variants (SVs), and whole-genome duplication (WGD) events. An element g ∈ GenomeState is a specific genome — the complete DNA sequence with all acquired alterations, measured relative to the germline reference.

  • Examples: TP53 R175H mutation + 8q amplification + chromothripsis of chr5; germline-normal genome with an acquired EGFR exon 19 deletion.
  • Formal note: GenomeState is the state space of the Darwinian regime. It changes only through mutation arrows (irreversible composition).
  • Sources: clonal-evolution, subclonal-reconstruction
  • Related objects: Mutation, DriverMutation, PassengerMutation

Mutation

The set of all possible changes in GenomeState — a function m: GenomeState → GenomeState mapping one genome state to another. An element µ ∈ Mutation is a specific mutational event: a C→T transition at position chr1:100,000, a 2 Mb duplication on 7p, a chromothriptic rearrangement.

  • Examples: APOBEC signature C→T mutation at a trinucleotide context TpCpW; chromothripsis event shattering chr3; focal MYC amplification.
  • Sources: clonal-evolution
  • Related objects: GenomeState, DriverMutation, PassengerMutation

DriverMutation

The subset of Mutation whose elements confer a fitness advantage on the cell bearing them: DriverMutation ⊆ Mutation. A mutation d ∈ DriverMutation is one that undergoes positive selection — it increases the net reproductive rate (b − d) of the cell relative to competitors.

  • Examples: KRAS G12D (constitutive MAPK signaling), TP53 R175H (loss of tumor suppression), EGFR L858R (activating mutation).
  • Empirical anchor: Average ~4.6 drivers per tumor across 38 cancer types (PCAWG Consortium, 2020). Average selective advantage ~0.4% (Bozic et al., 2010, cited in Greaves & Maley, 2012).
  • Sources: clonal-evolution, therapy-resistance

PassengerMutation

The subset of Mutation that does not confer a fitness advantage: PassengerMutation ⊆ Mutation, and DriverMutation ∩ PassengerMutation = ∅. A passenger mutation p ∈ PassengerMutation is selectively neutral — it may drift to high or low frequency through stochastic birth-death processes.

  • Examples: Synonymous coding mutations, non-coding mutations in intergenic regions not affecting regulatory elements, intronic mutations with no splice effect.
  • Empirical anchor: The vast majority of somatic mutations are passengers. Neutral evolution is detectable by the 1/f² VAF distribution (Turajlic et al., 2019).
  • Sources: clonal-evolution, subclonal-reconstruction

MutationalSignature

The set of characteristic mutation patterns produced by specific mutagenic processes. An element s ∈ MutationalSignature is a probability distribution over 96 trinucleotide contexts (6 substitution types × 4 bases 5’ × 4 bases 3’).

  • Examples: Signature 2 (APOBEC, C→T at TpCpW), Signature 4 (tobacco smoke, C→A), Signature 3 (homologous recombination deficiency, a relatively flat substitution profile).
  • Sources: clonal-evolution, dual-regime-evolution
  • Related objects: GenomeState, Mutation
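The count of 96 contexts follows directly from the product 6 × 4 × 4. A minimal Python sketch of that enumeration is given below; the channel label format (for example T[C>T]A), the helper names, and the toy counts are illustrative assumptions, not data from the cited sources.

```python
from itertools import product

BASES = "ACGT"
# Six substitution classes, written from the pyrimidine of the mutated base pair.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_contexts():
    """List the 96 channels as strings such as 'T[C>T]A' (5' base, change, 3' base)."""
    return [f"{five}[{sub}]{three}"
            for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)]


def normalize(counts):
    """Turn raw channel counts into a probability distribution over all 96 channels."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutations to normalize")
    return {ctx: counts.get(ctx, 0) / total for ctx in trinucleotide_contexts()}


contexts = trinucleotide_contexts()
toy_counts = {"T[C>T]A": 30, "T[C>T]T": 10}   # hypothetical counts at TpCpW sites
signature = normalize(toy_counts)
print(len(contexts))
```

Under these assumptions, an element s ∈ MutationalSignature corresponds to one such dictionary: 96 non-negative entries summing to 1.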

EpigeneticState

The set of all possible chromatin configurations — DNA methylation, histone modifications, chromatin accessibility, and 3D genome organization. An element e ∈ EpigeneticState is a specific chromatin configuration across the genome.

  • Examples: MGMT promoter methylated → silenced; global DNA hypomethylation; H3K27me3-enriched Polycomb-repressed domain; open chromatin at active enhancers.
  • Sources: dual-regime-evolution
  • Related objects: GenomeState

1.2 Cellular/Population Level

Clone

The set of tumor cell lineages, each consisting of the cells that share a common GenomeState inherited from a recent common ancestor. An element c ∈ Clone is a cell lineage defined by a specific constellation of mutations that uniquely identifies it. Clones are distinguished by their private mutations (present in all cells of the clone) and shared branch mutations (present in this clone and its descendants).

  • Examples: The founding clone (truncal mutations), a subclone with KRAS G12D + private TP53 mutation, a resistant subclone with EGFR T790M.
  • Formal note: Clones are the objects of the phylogenetic tree. The function genome: Clone → GenomeState assigns each clone its defining genome state.
  • Sources: clonal-evolution, subclonal-reconstruction

TumorCellPopulation

The set of possible clonal compositions of a tumor at a given time. An element p ∈ TumorCellPopulation is a tumor’s complete clonal composition: a finite set of clones, each with a frequency (size).

  • Formal note: A TumorCellPopulation can be represented as a probability distribution over Clones: freq: Clone → [0,1], summing to 1.
  • Sources: clonal-evolution, subclonal-reconstruction
  • Related objects: Clone, CloneFrequency

CloneFrequency

The set of real numbers in [0,1] representing the proportion of tumor cells belonging to a specific clone — equivalently, the Cancer Cell Fraction (CCF). An element φ ∈ CloneFrequency is the CCF of a particular clone: φ = |c| / Σ|cⱼ|.

  • Examples: CCF = 1.0 (clonal, present in every tumor cell); CCF = 0.3 (subclonal, present in 30% of cells); CCF ≈ 0.05 (near detection limit).
  • Detection limit: CCF ~ 0.05–0.10 at standard 100× sequencing depth (Tarabichi et al., 2021; Turajlic et al., 2019).
  • Sources: subclonal-reconstruction
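The formula φ = |c| / Σ|cⱼ| can be made concrete with a short Python sketch. It assumes that clones partition the tumor cells into disjoint groups, so the frequencies sum to 1; the CCF of a mutation is then the summed frequency of every clone that carries it. The cell counts and function names are hypothetical.

```python
def clone_frequencies(cell_counts):
    """Map each clone to its share of all tumor cells: phi = |c| / sum_j |c_j|."""
    total = sum(cell_counts.values())
    if total <= 0:
        raise ValueError("population must contain at least one cell")
    return {clone: n / total for clone, n in cell_counts.items()}


def mutation_ccf(freq, carriers):
    """CCF of a mutation: summed frequency of the (disjoint) clones that carry it."""
    return sum(freq[clone] for clone in carriers)


# Hypothetical cell counts for three disjoint clones.
toy_population = {"founder": 600, "subclone_A": 300, "subclone_B": 100}
freq = clone_frequencies(toy_population)

truncal_ccf = mutation_ccf(freq, toy_population)      # carried by every clone
private_ccf = mutation_ccf(freq, ["subclone_A"])      # carried by one subclone only
```

With these toy counts, a truncal mutation has a CCF of about 1.0 (clonal) and a mutation private to subclone_A has a CCF of about 0.3 (subclonal), matching the examples listed above.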

SubclonalArchitecture

The set of possible phylogenetic structures relating clones: ordered pairs (Tree, Frequencies) where Tree is a rooted tree (clonal phylogeny; not necessarily binary, since a linear architecture has single-child nodes) and Frequencies is an assignment of CCFs to each node.

  • Examples: Linear architecture (one clone per time slice), branched architecture (coexisting sibling subclones), neutral architecture (many small subclones on a single branch).
  • Sources: clonal-evolution, subclonal-reconstruction

IntratumorHeterogeneity

The set of diversity measures over TumorCellPopulation. An element ι ∈ IntratumorHeterogeneity is a scalar (Shannon diversity index, number of subclones, VAF distribution width) capturing the degree of clonal diversity.
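As one example of such a scalar, the Shannon diversity index H = −Σ pᵢ ln pᵢ can be computed from clone frequencies. The sketch below is illustrative only: the use of the natural logarithm and the function name are assumptions, and the frequencies are hypothetical.

```python
import math


def shannon_diversity(frequencies):
    """H = -sum(p * ln p) over clones with nonzero frequency."""
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return -sum(p * math.log(p) for p in frequencies if p > 0)


monoclonal = shannon_diversity([1.0])              # a single clone: no diversity
two_equal = shannon_diversity([0.5, 0.5])          # equals ln 2
branched = shannon_diversity([0.6, 0.3, 0.1])      # hypothetical branched tumor
```

A population dominated by one clone scores near 0, and the index grows as frequencies become more even across more clones.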

1.3 Fitness/Selection Level

FitnessValue

The set of real numbers representing the reproductive fitness of a clone — the net growth rate r = b − d where b is the birth rate and d is the death rate per cell division. An element f ∈ FitnessValue is a real number; positive values indicate net growth; negative values indicate net decline.

  • Formal note: The absolute scale is arbitrary; fitness is a relative measure. The selection coefficient s = (r_driver − r_wt)/r_wt is the relevant comparative quantity.
  • Empirical anchor: Typical driver advantage s ≈ 0.004 (0.4%). WHIM-09 chromothriptic cure: s ≈ 0.01–0.1 (McDermott et al., 2015).
  • Sources: clonal-evolution, clonal-sweep, therapy-resistance
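The two quantities defined here, r = b − d and s = (r_driver − r_wt)/r_wt, are simple enough to state as code. The sketch below uses hypothetical rates chosen so that s comes out near the 0.4% figure quoted above; the function names are illustrative.

```python
def net_growth_rate(birth_rate, death_rate):
    """r = b - d."""
    return birth_rate - death_rate


def selection_coefficient(r_driver, r_wt):
    """s = (r_driver - r_wt) / r_wt, the relative fitness advantage."""
    if r_wt == 0:
        raise ValueError("wild-type growth rate must be nonzero")
    return (r_driver - r_wt) / r_wt


# Hypothetical rates: the driver clone grows 0.4% faster than the wild type.
r_wt = net_growth_rate(0.5000, 0.4750)       # 0.0250
r_driver = net_growth_rate(0.5001, 0.4750)   # 0.0251
s = selection_coefficient(r_driver, r_wt)    # about 0.004
```

Because s is a ratio, rescaling both growth rates by the same factor leaves it unchanged, which is why the absolute scale of FitnessValue is arbitrary.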

SelectionEvent

The set of time intervals during which clone frequencies change directionally due to fitness differences. An element σ ∈ SelectionEvent is an episode where one or more clones expand relative to competitors because they carry fitness-conferring mutations.

  • Examples: A clonal sweep after TP53 loss; elimination of HR-proficient cells after PARP inhibitor treatment; immune editing eliminating neo-antigen-bearing clones.
  • Sources: clonal-evolution, clonal-sweep

ClonalSweep

The subset of SelectionEvent where a single clone expands to dominate the population (CCF → 1). An element w ∈ ClonalSweep is a complete or near-complete selective replacement.

  • Sources: clonal-sweep
  • Condition for occurrence: The Bozic-Nowak condition τ_k > sweep_time must hold (see §3.5).

ClonalInterference

The subset of SelectionEvent where multiple adaptive clones expand simultaneously, competing for resources and space, with no single clone achieving fixation. ClonalInterference ∩ ClonalSweep = ∅.

  • Sources: clonal-evolution, clonal-sweep
  • Condition for occurrence: The Bozic-Nowak condition fails: τ_k < sweep_time — new drivers appear before the previous clone has swept.

GeneticDrift

The set of stochastic frequency changes in CloneFrequency over time, occurring when selection coefficients are small relative to 1/N (where N is effective population size). An element δ ∈ GeneticDrift is a random fluctuation with expected value 0 and variance proportional to φ(1−φ)/N.
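The stated mean and variance can be verified exactly for one simple drift model. The sketch below assumes Wright-Fisher style binomial resampling of N cells per generation, which is a modeling assumption and not the only stochastic birth-death process compatible with the definition above; under it, the one-generation change in φ has mean 0 and variance φ(1−φ)/N.

```python
from math import comb


def drift_moments(phi, n):
    """Exact mean and variance of the one-generation frequency change under
    binomial (Wright-Fisher) resampling of n cells at current frequency phi."""
    mean = 0.0
    second_moment = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * phi**k * (1 - phi) ** (n - k)
        delta = k / n - phi                  # change in clone frequency
        mean += prob * delta
        second_moment += prob * delta**2
    return mean, second_moment - mean**2


# Hypothetical values: a clone at 30% frequency in a population of 50 cells.
mean_change, variance = drift_moments(0.3, 50)
expected_variance = 0.3 * (1 - 0.3) / 50     # phi * (1 - phi) / N
```

The variance shrinks as 1/N, which is why drift dominates only when selection coefficients are small relative to 1/N.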

1.4 Environmental Level

Microenvironment

The set of local tissue contexts that a tumor cell experiences — oxygen tension, nutrient availability, pH, immune cell infiltration, growth factor concentrations, extracellular matrix composition, stromal cell interactions. An element m ∈ Microenvironment is a specific combination of these conditions at a time and spatial location.

TherapeuticPressure

The set of drug types, concentrations, schedules, and combinations applied to the tumor. An element t ∈ TherapeuticPressure is a specific therapy regimen.

  • Examples: Maximum-tolerated-dose cisplatin + etoposide (Q3W); adaptive therapy with intermittent abiraterone (PSA-driven); lenalidomide maintenance (continuous 10 mg daily).
  • Sources: therapy-resistance, clonal-evolution

BottleneckEvent

The set of severe population reductions — measurable decreases in TumorCellPopulation size caused by TherapeuticPressure, microenvironmental catastrophe, or natural stochastic dynamics. An element b ∈ BottleneckEvent is a specific bottleneck episode.

  • Examples: Therapy-induced bottleneck (CR/vgPR in Myeloma XI); natural bottleneck (hypoxic crisis); spatial bottleneck (founder cells of a metastasis).
  • Sources: population-bottleneck, clonal-evolution

BottleneckSeverity

The set of metrics quantifying the depth of population reduction — the set of possible values of the ratio N_post/N_pre (surviving fraction). An element β ∈ BottleneckSeverity is a survival fraction.

  • Empirical anchor: Shallow (β > 0.01, corresponds to PR/incomplete response) vs. deep (β < 0.01, corresponds to CR/vgPR) (Miething, 2019). These are clinical surrogates; direct measurements are not available.
  • Sources: population-bottleneck
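The surviving fraction and the shallow/deep split can be sketched as follows. The treatment of the boundary value β = 0.01 is an assumption (the source only gives β > 0.01 and β < 0.01), and the cell counts are hypothetical.

```python
def bottleneck_severity(n_pre, n_post):
    """beta = N_post / N_pre, the surviving fraction."""
    if n_pre <= 0 or not 0 <= n_post <= n_pre:
        raise ValueError("need 0 <= n_post <= n_pre and n_pre > 0")
    return n_post / n_pre


def classify_bottleneck(beta, threshold=0.01):
    """Deep if beta < threshold, otherwise shallow.
    Assumption: the boundary beta == threshold is treated as shallow."""
    return "deep" if beta < threshold else "shallow"


deep_case = classify_bottleneck(bottleneck_severity(1_000_000, 500))   # beta = 0.0005
shallow_case = classify_bottleneck(bottleneck_severity(1_000, 200))    # beta = 0.2
```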

1.5 Immune Level

ImmuneSystem

The set of immune cell populations and their functional states within the tumor microenvironment. An element IS ∈ ImmuneSystem is a specific immune contexture — the composition, activation state, and spatial distribution of T cells (CD8+, CD4+), B cells, NK cells, macrophages (M1/M2), dendritic cells, and myeloid-derived suppressor cells (MDSCs).

  • Empirical anchor: Immune infiltration measured by IHC (CD8, PD-L1), gene expression signatures (GEP, Immunoscore), or TCR repertoire sequencing. PCAWG Consortium (2020) identified immune-edited tumors with reduced neoantigen burden.
  • Sources: clonal-evolution, PCAWG Consortium (2020)

Neoantigen

The set of peptide sequences produced by somatic mutations that can be recognized as non-self by T cells. An element n ∈ Neoantigen is a specific peptide-MHC complex derived from a mutated protein. Clonal neoantigens (from truncal mutations) are present in all tumor cells; subclonal neoantigens are present only in a subset.

  • Empirical anchor: Neoantigen burden correlates with immunotherapy response. Clonal neoantigens are superior therapeutic targets (McGranahan & Swanton, 2017). Immune editing preferentially eliminates subclones bearing immunogenic neoantigens.
  • Sources: clonal-evolution, McGranahan & Swanton (2017)

ImmuneRecognition

The set of molecular events by which the immune system detects a clone. An element ir ∈ ImmuneRecognition is a specific recognition event: TCR binding to peptide-MHC, NK cell recognition of stress ligands, or antibody binding to surface antigens.

  • Sources: McGranahan & Swanton (2017)

ImmuneResponse

The set of immune effector functions triggered by recognition. An element r ∈ ImmuneResponse is a specific outcome: CD8+ T-cell killing, NK-mediated lysis, antibody-dependent cellular cytotoxicity (ADCC), or cytokine-mediated growth arrest.

  • Empirical anchor: Immune checkpoint blockade (anti-PD-1, anti-CTLA-4) removes inhibitory signals, enabling pre-existing but suppressed immune responses.
  • Sources: clonal-evolution, therapy-resistance

ImmuneEvasion

The set of mechanisms by which clones escape immune elimination. Elements include: loss of antigen presentation (B2M mutation, HLA loss), upregulation of checkpoint ligands (PD-L1), secretion of immunosuppressive cytokines (TGF-β, IL-10), recruitment of regulatory T cells (Tregs), and immunoediting (selective elimination of immunogenic subclones, leaving non-immunogenic clones).

  • Empirical anchor: HLA LOH is common in lung cancer (McGranahan & Swanton, 2017). PD-L1 expression is a predictive biomarker for checkpoint inhibitor response. Immunoediting produces tumors with reduced neoantigen burden compared to expected mutation load.
  • Sources: McGranahan & Swanton (2017), PCAWG Consortium (2020), therapy-resistance

1.6 Temporal/Phylogenetic Level

EvolutionaryTime

The set of positive real numbers representing elapsed time in units of cell division generations. An element τ ∈ EvolutionaryTime is a time index.

  • Formal note: Evolution is time-indexed: many objects (Clone, TumorCellPopulation, Microenvironment) are functions of EvolutionaryTime.
  • Sources: clonal-evolution, clonal-sweep

PhylogeneticTree

The set of rooted bifurcating trees representing clonal ancestry — the hierarchical relationship among clones. An element T ∈ PhylogeneticTree is a specific tree with nodes = clones, edges = ancestor-descendant relationships, root = the single cell of origin.

EvolutionaryMode

The set of four qualitative patterns describing how clonal composition changes over time: {linear, branching, neutral, punctuated}. An element m ∈ EvolutionaryMode classifies the dominant dynamics over a specified time interval.

MolecularClock

The set of passenger mutation accumulation rates — the expected number of neutral mutations per cell division. An element µ_c ∈ MolecularClock is a rate parameter (mutations per division per genome).

1.7 Clinical Level

TherapyResponse

The set of clinical response categories: {CR, vgPR, PR, SD, PD}. An element r ∈ TherapyResponse is a categorical assessment of tumor burden change under therapy.

Relapse

The set of recurrence events — tumor regrowth after TherapyResponse. An element r ∈ Relapse is a specific relapse episode, characterized by its clonal architecture (linear vs. branching) and time to progression.

OverallSurvival

The set of positive real numbers representing time from diagnosis (or treatment start) to death from any cause. An element s ∈ OverallSurvival is a survival time.

Metastasis

The set of dissemination events — a tumor cell or cell cluster that establishes a secondary tumor at a distant anatomical site. An element m ∈ Metastasis is a specific metastatic event, characterized by its divergence time relative to the last clonal sweep (early vs. late divergence).

  • Sources: clonal-evolution, clonal-sweep
  • Empirical anchor: ~75% of NSCLC metastases show late divergence (after last clonal sweep); ~25% show early divergence (Al Bakir et al., 2023).

2. Arrows

Arrows are functions between object sets, labeled by what they mean biologically. For each arrow we specify: name, domain → codomain, biological interpretation, and wikilinks to relevant concept pages.

2.1 Genomic Arrows

| Arrow | Domain → Codomain | Meaning | Sources |
| --- | --- | --- | --- |
| genome | Clone → GenomeState | Assigns each clone its defining genome state: the set of somatic mutations shared by all cells in that clone | clonal-evolution, subclonal-reconstruction |
| mutates | GenomeState → GenomeState | A single mutational event transforming one genome state to another; composition is sequential mutation accumulation | clonal-evolution |
| isDriver | DriverMutation → GenomeState | A driver mutation alters the genome (inclusion arrow); DriverMutation ⊆ Mutation | clonal-evolution |
| isPassenger | PassengerMutation → GenomeState | A passenger mutation alters the genome without fitness effect; PassengerMutation ⊆ Mutation | clonal-evolution |
| generatesSignature | MutationalSignature → GenomeState | A specific mutagenic process acting on a genome produces a characteristic pattern of mutations | clonal-evolution |
| hasEpigenome | Clone → EpigeneticState | Assigns each clone its epigenetic configuration: the chromatin state maintained through cell division | dual-regime-evolution |
| epimutates | EpigeneticState → EpigeneticState | Epigenetic modification event (methylation gain/loss, histone mark deposition/removal); reversible (arrows form a groupoid) | dual-regime-evolution |
| couplesGeneticToEpigenetic | DriverMutation × EpigeneticState → EpigeneticState | A genetic mutation alters the epigenetic landscape (e.g., IDH1 mutation → DNA hypermethylation via 2-HG; ARID1A mutation → SWI/SNF complex disruption) | dual-regime-evolution |
| couplesEpigeneticToGenetic | EpigeneticState → MutationRate | Epigenetic changes modulate the rate of genetic mutation (e.g., MGMT promoter methylation → increased C→T mutation rate from unrepaired O⁶-methylguanine; chromatin accessibility affects APOBEC mutagenesis) | dual-regime-evolution |
| producesMutation | MutationalSignature × GenomeState → Mutation | A mutational signature acting on a genome produces a specific mutation: the composition of signature generation and genome alteration | clonal-evolution |

2.2 Population Arrows

| Arrow | Domain → Codomain | Meaning | Sources |
| --- | --- | --- | --- |
| composes | Clone × TumorCellPopulation → TumorCellPopulation | A clone is a subset of a tumor cell population; the population is the union of its clones | clonal-evolution, subclonal-reconstruction |
| frequency | Clone × TumorCellPopulation → CloneFrequency | Returns the proportion of tumor cells belonging to a given clone at a given time | subclonal-reconstruction |
| architecture | TumorCellPopulation → SubclonalArchitecture | Maps a tumor cell population to its clonal phylogenetic structure | clonal-evolution, subclonal-reconstruction |
| diversity | TumorCellPopulation → IntratumorHeterogeneity | Computes a diversity measure (e.g., Shannon index, number of subclones, pairwise VAF divergence) from the clonal composition | clonal-evolution |
| bifurcates | Clone → Clone × Clone | A cell division produces two daughter cells; after a driver mutation in one daughter, a new subclone diverges from the parent | clonal-evolution |
| expands | Clone × EvolutionaryTime → CloneFrequency | The change in a clone’s frequency over time (population dynamics function) | clonal-evolution, clonal-sweep |
| contracts | Clone × EvolutionaryTime → CloneFrequency | Reduction in clone frequency due to selection, drift, or therapy | clonal-evolution |
| initiates | CellOfOrigin → Clone | A single normal cell acquires the first driver mutation and founds a new clone: the origin of the neoplasm (Nowell, 1976) | clonal-evolution |
| foundsFromBottleneck | BottleneckEvent × TumorCellPopulation → Clone | A bottleneck event selects a subset of survivors; these survivors found the new clonal population | population-bottleneck |

2.3 Selection Arrows

| Arrow | Domain → Codomain | Meaning | Sources |
| --- | --- | --- | --- |
| fitness | Clone × Microenvironment → FitnessValue | Returns the fitness of a clone conditional on its microenvironment; the same clone has different fitness in different environments | clonal-evolution |
| confersFitness | DriverMutation → FitnessValue | A driver mutation increases the fitness of the clone that bears it; formally: fitness(mutated_clone) > fitness(parent_clone) | clonal-evolution, clonal-sweep |
| selects | TumorCellPopulation × SelectionEvent → TumorCellPopulation | A selection event maps a population to its descendant population with altered clone frequencies (directional change) | clonal-evolution |
| drifts | TumorCellPopulation × GeneticDrift × EvolutionaryTime → TumorCellPopulation | Genetic drift maps a population to its descendant population with stochastic frequency changes | clonal-evolution |
| sweeps | ClonalSweep × TumorCellPopulation → TumorCellPopulation | A clonal sweep maps a population to one where a single clone has reached fixation (CCF → 1) | clonal-sweep |
| interferes | ClonalInterference × TumorCellPopulation → TumorCellPopulation | Clonal interference maps a population to one with multiple expanding clones and no fixation | clonal-evolution, clonal-sweep |
| fitnessGradient | Clone × Microenvironment → ℝ | The gradient of fitness with respect to mutational distance (the “interestingness” in Schmidhuber’s sense); steep gradients mean large fitness effects per mutation | compression-progress-evolution |

2.4 Environmental Arrows

| Arrow | Domain → Codomain | Meaning | Sources |
| --- | --- | --- | --- |
| createsBottleneck | TherapeuticPressure × TumorCellPopulation → BottleneckEvent | A therapy regimen applied to a tumor cell population creates a bottleneck (surviving subpopulation) | population-bottleneck, therapy-resistance |
| determinesSeverity | TherapeuticPressure × TumorCellPopulation → BottleneckSeverity | The depth of the bottleneck is a function of therapy intensity and the population’s composition (fraction of resistant cells) | population-bottleneck |
| survive | BottleneckEvent × TumorCellPopulation → TumorCellPopulation | The survivors of a bottleneck: the subset of clones that persists after population reduction | population-bottleneck |
| remodelsMicroenvironment | TumorCellPopulation → Microenvironment | The tumor modifies its own microenvironment through angiogenesis, ECM remodeling, metabolic waste production, and immune modulation | clonal-evolution |
| respondsToTherapy | Clone × TherapeuticPressure → TherapyResponse | A clone’s response to therapy, determined by its resistance genotype and the therapy mechanism | therapy-resistance |
| inducesEpigenetic | Microenvironment → EpigeneticState | Environmental signals (hypoxia, inflammation, ECM stiffness) induce specific epigenetic changes (non-Darwinian variation generation) | dual-regime-evolution |

2.5 Immune Arrows

| Arrow | Domain → Codomain | Biological meaning | Source |
| --- | --- | --- | --- |
| presents | Mutation × MHC → Neoantigen | A somatic mutation produces a peptide presented on MHC: the molecular basis of immune recognition | McGranahan & Swanton (2017) |
| recognizes | Neoantigen × ImmuneSystem → ImmuneRecognition | A T cell clone recognizes a neoantigen and initiates the immune response | McGranahan & Swanton (2017) |
| elicits | ImmuneRecognition → ImmuneResponse | Recognition triggers effector function: killing, cytokine release, ADCC | McGranahan & Swanton (2017) |
| eliminates | ImmuneResponse × Clone → CloneFrequency | An effective immune response reduces clone frequency (immune-mediated negative selection) | clonal-evolution, PCAWG (2020) |
| evades | Clone × ImmuneSystem → ImmuneEvasion | A clone deploys mechanisms to escape immune elimination: HLA loss, PD-L1 upregulation, Treg recruitment | McGranahan & Swanton (2017) |
| edits | ImmuneSystem × TumorCellPopulation → SubclonalArchitecture | Sustained immune pressure reshapes clonal architecture by eliminating immunogenic subclones (immunoediting) | PCAWG (2020) |

2.6 Temporal/Phylogenetic Arrows

| Arrow | Domain → Codomain | Meaning | Sources |
| --- | --- | --- | --- |
| ancestor | Clone × Clone → Clone | Returns the most recent common ancestor of two clones (the MRCA node in the phylogenetic tree). Partial function, defined iff the two clones share a common ancestor. | clonal-evolution |
| divergenceTime | Clone × Clone → EvolutionaryTime | Returns the time (in generations) since two clones diverged from their MRCA | clonal-sweep |
| treeFromPopulation | TumorCellPopulation → PhylogeneticTree | Reconstructs the phylogenetic tree from the clonal composition (inverse of architecture arrow, subject to detection limits) | subclonal-reconstruction |
| modeFromTree | PhylogeneticTree × EvolutionaryTime → EvolutionaryMode | Classifies the evolutionary mode from the tree structure and dynamics over a time interval | clonal-evolution |
| tick | MolecularClock × EvolutionaryTime → PassengerMutation | The molecular clock: passenger mutations accumulate at a characteristic rate per generation | clonal-evolution |
| sweepTimingReference | ClonalSweep × Metastasis → EvolutionaryTime | The last clonal sweep provides a reference point for timing metastatic divergence: post-sweep = late, pre-sweep = early | clonal-sweep |

2.7 Clinical Arrows

| Arrow | Domain → Codomain | Meaning | Sources |
| --- | --- | --- | --- |
| clinResponse | TumorCellPopulation × EvolutionaryTime → TherapyResponse | Clinical assessment of tumor burden change under therapy at a given time point | therapy-resistance |
| relapseArchitecture | Relapse → SubclonalArchitecture | Assigns the clonal architecture at relapse: linear, branching, or polyclonal | population-bottleneck |
| precedesRelapse | TherapyResponse → BottleneckEvent | A therapy response (especially CR/vgPR) produces a bottleneck; this arrow maps response depth to bottleneck severity | population-bottleneck |
| prognosisFromMode | EvolutionaryMode → OverallSurvival | Evolutionary mode predicts clinical outcome: punctuated → worse survival; branching → intermediate; linear → better (Turajlic et al., 2019) | clonal-evolution |
| metastasizes | Clone × AnatomicalSite → Metastasis | A clone disseminates to a distant site and establishes a secondary tumor | clonal-evolution |

3. Commutativity Conditions

Commutativity conditions are the formal constraints that make the olog a verified knowledge structure. A condition states that two paths from A to B (composite arrows) must map each element of A to the same element of B — they must be equal as functions. Each condition encodes a biological constraint that empirical evidence supports, partially supports, or challenges.

3.1 Driver-Sweep Commutativity

Path 1: Clone →(genome)→ GenomeState →(mutates via driver)→ GenomeState →(confersFitness)→ FitnessValue →(selection)→ ClonalSweep →(sweeps)→ TumorCellPopulation

Path 2: Clone →(genome)→ GenomeState →(isDriver)→ DriverMutation →(confersFitness)→ FitnessValue →(selects via sweep)→ TumorCellPopulation

Constraint: A driver mutation confers a fitness advantage that produces a clonal sweep through direct selection of the mutant genome. The two paths must commute: the process of “mutate genome → increase fitness → sweep” produces the same population outcome as “acquire driver mutation → undergo selection → fixed population.”

Biological meaning: This is the core Darwinian condition. It asserts that the arrow from DriverMutation to ClonalSweep factors uniquely through FitnessValue: drivers cause sweeps because and only because they increase fitness. The condition does not depend on how the fitness gain arises. A driver that only reduces the cell death rate, without changing the birth rate, still satisfies it, provided FitnessValue captures the net growth rate b − d.

Empirical support: Supported in the strong-selection regime (s >> 1/N). Bozic-Nowak deterministic model (Bozic et al., 2010) explicitly assumes this commutativity: driver → fitness increase → sweep. PCAWG Consortium (2020) data confirms 91% of tumors have at least one identified driver, consistent with this path dominating early tumor evolution.

Challenge: In the weak-selection regime (s ~ 1/N) and clonal interference regime, the commutativity fails because multiple drivers with similar fitness effects compete — no single DriverMutation → ClonalSweep path is defined. The condition holds only on the subcategory of strong-selection events. This is noted in compression-progress-evolution §Category-theoretic validation: “Where commutativity breaks… For passenger mutations and weak selection (s ~ 1/N, drift regime), commutativity fails.”

Sources: clonal-evolution, clonal-sweep, compression-progress-evolution, therapy-resistance

3.2 Bottleneck-Diversity Condition

Path 1: TherapeuticPressure →(createsBottleneck)→ BottleneckEvent →(survive)→ TumorCellPopulation(survivors) →(architecture)→ SubclonalArchitecture(post-bottleneck)

Path 2: TumorCellPopulation(pre-treatment) →(architecture)→ SubclonalArchitecture(pre-treatment) →(apply filter via bottleneck)→ SubclonalArchitecture(post-bottleneck)

Constraint: The post-bottleneck subclonal architecture obtained by “applying therapy → surviving cells → reconstruct architecture” must equal the architecture obtained by “compute pre-treatment architecture → filter through bottleneck.” Mathematically: the bottleneck function on populations must commute with the architecture function — the surviving population’s architecture is the pre-treatment architecture restricted to the survivor subset.

Biological meaning: The bottleneck preserves the ancestral relationships among surviving clones. The phylogenetic tree after a bottleneck is a subtree (not a restructured tree) of the pre-bottleneck tree. If a clone A was ancestral to clone B before the bottleneck, and both survive, A remains ancestral to B after the bottleneck. The condition enforces phylogenetic consistency through population perturbations.
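The subtree claim can be illustrated on a toy phylogeny. In the Python sketch below, a tree is a child-to-parent map, the bottleneck on trees reattaches each survivor to its nearest surviving ancestor, and the check compares ancestry pairs computed by the two routes. The clone names, the tree, and the helper names are hypothetical, and the sketch ignores de novo mutations acquired during the bottleneck.

```python
def ancestors(clone, parent):
    """Yield the proper ancestors of a clone, nearest first."""
    node = parent[clone]
    while node is not None:
        yield node
        node = parent[node]


def ancestry_pairs(parent, keep=None):
    """All (ancestor, descendant) pairs, optionally restricted to a set of clones."""
    keep = set(parent) if keep is None else set(keep)
    return {(a, c) for c in keep for a in ancestors(c, parent) if a in keep}


def prune(parent, survivors):
    """Bottleneck on trees: reattach each survivor to its nearest surviving ancestor."""
    survivors = set(survivors)
    return {c: next((a for a in ancestors(c, parent) if a in survivors), None)
            for c in survivors}


# Hypothetical pre-treatment phylogeny (child -> parent) and survivor set.
pre_tree = {"founder": None, "A": "founder", "B": "A", "C": "founder"}
survivors = {"founder", "B"}

post_tree = prune(pre_tree, survivors)                     # survive, then build the tree
restricted = ancestry_pairs(pre_tree, keep=survivors)      # build the tree, then filter
print(ancestry_pairs(post_tree) == restricted)
```

Both routes give the single pair (founder, B): the intermediate clone A is lost, but founder remains ancestral to B, which is the pruning-without-reordering behavior the condition requires.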

Empirical support: Supported by Myeloma XI data (Miething, 2019): diagnosis-clonal relationships are preserved at relapse — the bottleneck prunes but does not reorder the phylogeny. Walens et al. (2020) cellular barcoding confirms that recurrent clones are present in the pre-treatment population: “Clonal diversity decreased progressively during regression and residual disease” — consistent with pruning, not restructuring.

Challenge: The bottleneck can induce de novo mutations in survivors (therapy-induced mutagenesis). If a survivor acquires a genome-state-changing mutation during or immediately after the bottleneck, the post-bottleneck phylogeny gains a new branch not present in the pre-treatment architecture. This creates a formal violation of commutativity unless we restrict GenomeState to pre-bottleneck mutations. The empirical question: how many relapse mutations are pre-existing vs. de novo? Walens et al. (2020) found ~50% polyclonal recurrence involving Jak/Stat pathway activation — some of which was pre-existing (epigenetic plasticity) and some potentially de novo.

Sources: population-bottleneck, therapy-resistance, clonal-evolution

3.3 Phylogenetic Consistency Condition

Path 1: Clone₁ × Clone₂ →(ancestor)→ MRCA →(treeFromPopulation)→ PhylogeneticTree

Path 2: Clone₁ × Clone₂ →(treeFromPopulation)→ PhylogeneticTree →(findMRCA)→ MRCA

Constraint: The most recent common ancestor (MRCA) of two clones derived from the pairwise ancestor arrow must equal the MRCA computed from the full phylogenetic tree. Formally: the ancestor arrow commutes with the tree-building and MRCA-extraction arrows.
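A minimal Python sketch of this check follows. It assumes the tree is given as a child-to-parent map and the pairwise ancestor arrow as a lookup table; it also adopts the convention that a clone counts as its own ancestor, so the MRCA of a clone and its descendant is the clone itself. All names and data are hypothetical.

```python
def lineage(clone, parent):
    """Path from a clone up to the root, starting with the clone itself."""
    path = []
    while clone is not None:
        path.append(clone)
        clone = parent[clone]
    return path


def mrca(a, b, parent):
    """Most recent common ancestor of two clones, or None if they share none."""
    seen = set(lineage(a, parent))
    return next((node for node in lineage(b, parent) if node in seen), None)


def inconsistent_pairs(pairwise, parent):
    """Clone pairs whose claimed pairwise MRCA disagrees with the tree-derived MRCA."""
    return [pair for pair, claimed in pairwise.items()
            if mrca(*pair, parent) != claimed]


# Hypothetical tree (child -> parent) and pairwise ancestor assignments.
tree = {"founder": None, "A": "founder", "B": "A", "C": "founder"}
pairwise = {("B", "C"): "founder", ("A", "B"): "A"}
violations = inconsistent_pairs(pairwise, tree)
```

An empty violations list means the pairwise comparisons agree with the global phylogeny; any listed pair flags a candidate for the failure modes discussed next.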

Biological meaning: Pairwise clone comparisons are consistent with the global phylogeny. This is the fundamental consistency condition that subclonal reconstruction algorithms (Tarabichi et al., 2021) attempt to satisfy. When it fails, it indicates: (a) convergent evolution — two clones independently acquired the same mutation, violating the infinite sites assumption; (b) polyclonal origin — the tumor had more than one cell of origin; (c) sequencing errors or misclustering — two clones were artifactually merged or split.

Empirical support: Generally holds for well-validated phylogenetic reconstructions. The infinite sites assumption is reasonable for point mutations (SNVs) in most tumors. PCAWG Consortium (2020) used branching phylogenies across 2,658 tumors and found consistency in the vast majority.

Challenge: In large tumors with high mutation rates, the same mutation can arise independently in different clones (homoplasy). Chromosomal instability generates CNAs that violate the infinite sites assumption for structural variants (McGranahan & Swanton, 2017). Subclonal reconstruction tools must explicitly model violations: “The ‘infinite sites’ assumption… can be violated in large tumors” (Tarabichi et al., 2021). The condition holds as an approximation, not a strict law.

Sources: subclonal-reconstruction, clonal-evolution

3.4 Fitness-Environment Coupling Condition

Path 1: Clone →(fitness in env1)→ FitnessValue₁

Path 2: Clone →(fitness in env2)→ FitnessValue₂

Constraint (non-commutativity): For a fixed Clone c, fitness(c, Microenvironment₁) ≠ fitness(c, Microenvironment₂) in general. More precisely, the function fitness: Clone × Microenvironment → FitnessValue does NOT factor through the projection Clone × Microenvironment → Clone: there is no map Clone → FitnessValue that reproduces fitness once the Microenvironment is discarded. Fitness is not an intrinsic property of a clone, but a joint property of clone and environment.
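
As a small illustration, the following Python sketch tests whether a tabulated fitness function depends on the clone alone. The clone names, environment names, and fitness values are invented for the example, and `factors_through_clone` is a hypothetical helper.

```python
def factors_through_clone(fitness, clones, environments):
    """True if fitness(c, e) is identical across all environments e for every
    clone c, i.e. the map depends on the clone alone."""
    return all(
        len({fitness[(c, e)] for e in environments}) == 1
        for c in clones
    )

# Hypothetical fitness values, for illustration only
clones = ["KRAS_mut", "wild_type"]
environments = ["primary", "liver_met"]
fitness = {
    ("KRAS_mut", "primary"): 1.3,
    ("KRAS_mut", "liver_met"): 0.8,
    ("wild_type", "primary"): 1.0,
    ("wild_type", "liver_met"): 1.0,
}

print(factors_through_clone(fitness, clones, environments))  # False
```

A single clone whose fitness differs between two environments is enough to block the factorization, which is what the constraint asserts in general.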

Biological meaning: The same genotype has different fitness in different microenvironments. A KRAS-mutant clone that is fit in the primary tumor (growth factors present, wild-type stroma) may be unfit in the metastatic liver microenvironment (different growth factors, activated stellate cells, different immune milieu). This is the biological basis for: (a) microenvironment-dependent therapy response — drugs work differently in different niches; (b) metastatic tropism — some clones thrive only in specific metastatic sites; (c) the failure of single-biopsy genomics to capture the full adaptive landscape.

Empirical support: Definitive. A hypoxic microenvironment selects for p53-loss clones that are less fit in normoxic conditions. Breast cancer brain metastases are enriched for HER2 amplification even in patients whose primary was HER2-negative; the brain microenvironment selects for HER2-amplified clones (Turajlic et al., 2019). Single-cell prostate cancer data show transcriptional state shifts across spatially distinct sites while genetic lineage is preserved (Mikutenaite et al., 2025, cited in dual-regime-evolution).

Formal note: This condition is technically a non-commutativity — it states that a certain diagram fails to commute, which is itself a biologically meaningful constraint. It is the reason the olog requires Microenvironment as an explicit argument to fitness rather than a global parameter. In compression-progress language: fitness = compression quality, but the data being compressed (the environment) changes. The same compressor (genome) may be good for one environment and bad for another.

Sources: clonal-evolution, dual-regime-evolution, compression-progress-evolution

3.5 Sweep-Timing Condition (Bozic-Nowak)

Path 1: DriverMutation →(confersFitness)→ FitnessValue →(selection)→ ClonalSweep →(requires)→ τ_k > sweep_time

Path 2: DriverMutation →(confersFitness)→ FitnessValue →(selection)→ ClonalInterference →(requires)→ τ_k ≤ sweep_time

Constraint: For a given driver mutation in a population of size N, the evolutionary outcome (sweep vs. interference) is determined by the inequality τ_k > sweep_time, where τ_k = (T/ks) × log(2ks/u) is the waiting time for the next driver and sweep_time is proportional to N/(ks) (Greaves & Maley, 2012). The two paths partition the domain: the same mutation cannot produce both a sweep and interference.
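
The inequality can be evaluated numerically. The sketch below is only an illustration under stated assumptions: it reads T as generation time, k as the number of drivers, s as the selective advantage per driver, and u as the driver mutation rate (the section does not define these symbols), and it introduces a hypothetical constant c to turn "proportional to N/(ks)" into a number. All parameter values are made up.

```python
import math

def waiting_time(T, k, s, u):
    """tau_k = (T / (k*s)) * log(2*k*s / u), as stated in the constraint."""
    return (T / (k * s)) * math.log(2 * k * s / u)

def sweep_time(N, k, s, c):
    """Sweep time taken as c * N / (k*s); c is an assumed proportionality constant."""
    return c * N / (k * s)

def regime(N, T, k, s, u, c):
    """'sweep' if tau_k > sweep_time, otherwise 'interference'."""
    if waiting_time(T, k, s, u) > sweep_time(N, k, s, c):
        return "sweep"
    return "interference"

# Hypothetical parameter values, chosen only to show both regimes
params = dict(T=4.0, k=1, s=0.004, u=1e-5, c=0.01)
print(regime(N=1e3, **params))  # sweep
print(regime(N=1e9, **params))  # interference
```

Because the two branches of `regime` are exhaustive and exclusive, the function mirrors the partition stated in the constraint: a given parameter set yields exactly one outcome.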

Biological meaning: Sweeps occur when the waiting time for the next driver exceeds the time needed for the current driver to reach fixation. This is the fundamental timescale separation that determines tumor evolutionary mode. When τ_k >> sweep_time (early tumors, N ~ 10³–10⁵), clean sequential sweeps dominate — Nowell’s regime. When τ_k ≤ sweep_time (late tumors, N ~ 10⁸–10¹¹), clonal interference dominates — Greaves & Maley’s regime.

Empirical support: Strong. The condition is derived from the Bozic et al. (2010) branching process model. It explains why early tumors show linear evolution (small N, long τ_k) and late tumors show branching evolution (large N, short τ_k). Therapy artificially shortens sweep time by killing sensitive cells, converting a τ_k ≤ sweep_time situation into τ_k > sweep_time. This is why therapy triggers sweeps of resistant clones (clonal-sweep §When Sweeps Can Occur, therapy-resistance).

Formal note: This is a conditional commutativity: the selection arrow into ClonalSweep is only defined (only has elements in its domain) when the inequality holds. In categorical terms, ClonalSweep is a subset of SelectionEvent, and the inclusion is defined by the inequality.

Sources: clonal-sweep, clonal-evolution, therapy-resistance

3.6 Bottleneck-Severity Dichotomy Condition

Path 1 (Shallow bottleneck): TherapeuticPressure →(createsBottleneck)→ BottleneckEvent →(determinesSeverity)→ BottleneckSeverity(shallow) →(survive)→ ClonePopulation(survivors) →(architecture)→ SubclonalArchitecture(linear/stable)

Path 2 (Deep bottleneck): TherapeuticPressure →(createsBottleneck)→ BottleneckEvent →(determinesSeverity)→ BottleneckSeverity(deep) →(survive)→ ClonePopulation(survivors) →(architecture)→ SubclonalArchitecture(branching)

Constraint: Bottleneck severity determines post-bottleneck architecture. Shallow bottlenecks (β > ~0.01, i.e., partial response) produce linear or stable clonal architecture at relapse. Deep bottlenecks (β < ~0.01, i.e., CR/vgPR) produce branching clonal architecture at relapse. The two paths from TherapeuticPressure to SubclonalArchitecture factor through different values of BottleneckSeverity and produce distinct outcomes.
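
Read as a function, the constraint is a threshold map. The sketch below assumes β is the surviving fraction of the population (an interpretation, since larger β corresponds to the shallower bottleneck here) and uses the notional threshold of 0.01. The function names are hypothetical.

```python
def severity(beta, threshold=0.01):
    """Classify a bottleneck by surviving fraction beta (threshold is notional)."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    return "deep" if beta < threshold else "shallow"

# Expected relapse architecture for each severity class, per the constraint
EXPECTED_ARCHITECTURE = {"shallow": "linear/stable", "deep": "branching"}

def expected_architecture(beta, threshold=0.01):
    """Compose determinesSeverity with the severity -> architecture mapping."""
    return EXPECTED_ARCHITECTURE[severity(beta, threshold)]

print(expected_architecture(0.2))    # linear/stable
print(expected_architecture(0.001))  # branching
```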

Biological meaning: The bottleneck paradox (Miething, 2019): deeper responses produce more diverse relapses. This is a non-linear mapping — the relationship between population reduction and architectural outcome is not monotonic. In compression-progress language: shallow bottlenecks partially damage the dominant compression; the same clone repairs and re-expands. Deep bottlenecks destroy the compression entirely; survivors must re-explore, generating new diversity (compression-progress-evolution §The bottleneck paradox, population-bottleneck §Resolution via Compression-Progress).

Empirical support: Myeloma XI trial (Jones et al., 2019, N = 56 diagnosis/relapse pairs): “Patients achieving CR/vgPR underwent a clonal bottleneck leading to branched clonal architecture upon relapse, whereas patients with partial responses maintained linear evolution or stable clonal patterns” (Miething, 2019). Walens et al. (2020) provides the mechanism: ~50% of recurrences after a deep bottleneck use clonal dominance (Met amplification — genetic compression breakthrough) and ~50% use polyclonal recurrence (Jak/Stat pathway — plasticity-driven re-diversification).

Challenge: Bottleneck severity is not directly measured — response depth (CR/vgPR vs. PR) is a clinical surrogate (population-bottleneck §Limitations). The threshold value (β ≈ 0.01) is a notional estimate from the clinical data, not a precisely calibrated parameter. Replication in other cancer types is needed.

Sources: population-bottleneck, compression-progress-evolution, therapy-resistance

3.7 Dual-Regime Coupling Condition

Path 1 (Epigenetic → Genetic): Microenvironment →(inducesEpigenetic)→ EpigeneticState(methylated MGMT) →(couplesEpigeneticToGenetic)→ MutationRate(increased) →(mutates)→ GenomeState(hypermutated)

Path 2 (Genetic → Epigenetic): DriverMutation(IDH1 R132) →(couplesGeneticToEpigenetic)→ EpigeneticState(hypermethylated) →(hasEpigenome)→ Clone(altered chromatin)

Constraint: The two arrows couplesEpigeneticToGenetic and couplesGeneticToEpigenetic must compose such that the coupled system is irreducible — the state of the genetic regime is a function of the epigenetic regime and vice versa. Formally, the diagram formed by these two arrows and the identity arrows on GenomeState and EpigeneticState must have the property that there is no projection GenomeState × EpigeneticState → GenomeState (or → EpigeneticState) that factors all dynamics — the coupled system is not a product of independent systems.

Biological meaning: The genetic and epigenetic regimes are not independent. They are coupled through specific mechanisms: (a) epigenetic silencing of DNA repair genes (MGMT, MLH1) increases mutation rate — the epigenetic state controls the genetic mutation rate; (b) mutations in chromatin modifiers (IDH1/2, ARID1A, PBRM1, EZH2, DNMT3A, TET2) globally alter the epigenetic landscape — genetic changes control the epigenetic regulatory machinery. Neither regime can be understood in isolation (dual-regime-evolution §Coupling Between Regimes).

Empirical support: Definitive. IDH1/2 mutations cause genome-wide DNA hypermethylation via 2-hydroxyglutarate production, producing the Glioma CpG Island Methylator Phenotype (G-CIMP). MGMT promoter methylation increases C→T mutation rates in glioblastoma — tumors with methylated MGMT have more mutations than those without (PCAWG Consortium, 2020, cited in dual-regime-evolution). TP53 mutation disables the DNA damage response, decoupling entropy detection from epigenetic restructuring.

Formal note: The coupling arrows define a profunctor, not a functor, between DarwCat and NonDarwCat (dual-regime-evolution §Category-theoretic Analysis). A strict functor F: DarwCat → NonDarwCat does not exist because: (1) epigenetic arrows are reversible (groupoid structure) while genetic arrows are irreversible (partial order); (2) epigenetic change is massively parallel while genetic mutation is serial; (3) the “same” genetic arrow produces different epigenetic outcomes depending on context. The coupling mechanisms are the coherence conditions that make the profunctor well-behaved.

Sources: dual-regime-evolution, compression-progress-evolution

3.8 Compression-Entrenchment Condition

Path 1: Clone →(fitness in env0)→ FitnessValue(high, local max) →(remodelsMicroenvironment)→ Microenvironment(stable) →(fitness in perturbed env1)→ FitnessValue(high, still fit)

Path 2: Clone →(fitness in perturbed env1 directly)→ FitnessValue(high)

Constraint: A clone at a local fitness maximum (high FitnessValue in its current Microenvironment) resists displacement by small perturbations to that Microenvironment. Formally: for a clone c with fitness(c, env0) >> fitness(c', env0) for all competitors c', and for “small” perturbations δ (TherapeuticPressure or microenvironmental change), fitness(c, env0+δ) > fitness(c', env0+δ) still holds: the entrenched clone remains fitter despite perturbation. The clone can be displaced only by a perturbation large enough to move it off its fitness peak.

Biological meaning: Entrenchment — the longer a clone has dominated (the more it has “compressed” its microenvironment), the harder it is to dislodge therapeutically. This is the compression-entrenchment hypothesis (compression-progress-evolution, population-bottleneck): a clone that has achieved a high-quality compression of its environment is resistant to small perturbations. Large perturbations (deep bottlenecks) are required to force decompression, but these carry the risk of maximal re-diversification.

Empirical support: Indirect — the bottleneck paradox (deep response → branching relapse) is consistent with entrenchment (Miething, 2019). The compression-entrenchment hypothesis is formally specified in docs/superpowers/specs/2026-07-05-ith-outcome-test-design.md (referenced from population-bottleneck §Falsifiable Predictions). Walens et al. (2020) clonal dominance route (Met amplification) is an example of entrenchment: the Met-amplified clone has achieved a superior compression and resists displacement.

Challenge: The “size of perturbation” is not formally defined in the olog — there is no metric on Microenvironment or TherapeuticPressure that specifies what counts as “small.” The condition is qualitative. Also, the condition implicitly assumes the microenvironment is the only thing that changes — in reality, the clone continues to mutate during the perturbation, potentially finding an even better compression rather than being displaced. The condition holds as a first-order approximation on short timescales relative to mutation rates.

Sources: compression-progress-evolution, population-bottleneck, dual-regime-evolution

3.9 Molecular Clock Consistency Condition

Path 1: MolecularClock × EvolutionaryTime →(tick)→ PassengerMutation(accumulated)

Path 2: MolecularClock' × EvolutionaryTime →(tick)→ PassengerMutation(accumulated)

Constraint: For a given molecular clock rate µ_c and elapsed time τ, the number of accumulated passenger mutations N_pass = µ_c × τ must be the same regardless of which clone’s lineage is measured: the molecular clock ticks at the same rate for all clones in the same tumor.

Biological meaning: Passenger mutations accumulate at a constant rate per cell division, independent of the clone’s fitness or selection history. This is the molecular clock assumption that underlies phylogenetic dating: the number of passenger mutations shared by two clones is proportional to the time since their divergence.

Empirical support: “The molecular clock assumption holds in ~60% of samples” (CN-005 in contradiction-registry). Gerstung et al. (2020) found that ~40% of tumors show mutational signature shifts during their evolution, which implies rate changes. Constant-rate clock precision is reduced in these cases, but “rate changes can be calibrated over separate evolutionary epochs” (contradiction-registry resolution for CN-005).

Challenge: ~40% of tumors have non-constant mutation rates due to mutational signature changes (APOBEC activation late in evolution, HRD emergence, therapy-induced mutagenesis). Rate changes violate commutativity unless EvolutionaryTime is partitioned into epochs with distinct MolecularClock rates. The condition holds as an approximation for the constant-rate ~60% of tumors.
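
The epoch partition can be sketched as a piecewise-constant clock. In this illustration each lineage is a list of (rate, duration) epochs, and the expected passenger count is the sum of µ_c × τ over epochs. The rates, durations, and function names are hypothetical.

```python
def expected_passengers(epochs):
    """N_pass as the sum of rate * duration over (rate, duration) epochs."""
    return sum(rate * duration for rate, duration in epochs)

def clock_consistent(lineages):
    """True if every lineage yields the same expected passenger count."""
    counts = {expected_passengers(epochs) for epochs in lineages.values()}
    return len(counts) == 1

# Hypothetical rates (mutations per year) and durations (years)
constant = {"clone_A": [(2, 10)], "clone_B": [(2, 10)]}
shifted = {"clone_A": [(2, 10)], "clone_B": [(2, 6), (5, 4)]}  # late rate increase in B

print(clock_consistent(constant))  # True
print(clock_consistent(shifted))   # False (20 vs 32)
```

With a single epoch per lineage the function reduces to N_pass = µ_c × τ. The second case shows how a late rate shift in one lineage breaks the equality unless each epoch is dated with its own rate.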

Sources: clonal-evolution, subclonal-reconstruction

3.10 Metastatic Timing Consistency Condition

Path 1: Clone(primary) →(metastasizes)→ Metastasis →(divergenceTime from primary last sweep)→ EvolutionaryTime(post-sweep → late divergence)

Path 2: Clone(primary) →(metastasizes)→ Metastasis →(divergenceTime from primary last sweep)→ EvolutionaryTime(pre-sweep → early divergence)

Constraint: The divergence time of a metastasis relative to the last clonal sweep in the primary tumor partitions the set of Metastasis into two disjoint subsets: early divergence (metastasis seeded before the sweep) and late divergence (seeded after). If the metastasis shares truncal mutations that arose during or after the last sweep, divergence is late. If it lacks them, divergence is early. The two paths cover all possibilities: every metastasis is either early or late relative to the last sweep.
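
A minimal sketch of the partition, assuming each metastasis and the last primary sweep are summarized as sets of mutation labels (a simplification, with hypothetical labels). Partial sharing is treated here as an input problem rather than a third class, which is a design choice of this sketch and not something the constraint specifies.

```python
def divergence_timing(metastasis_mutations, last_sweep_mutations):
    """Classify a metastasis as 'early' or 'late' relative to the last primary sweep."""
    met = set(metastasis_mutations)
    sweep = set(last_sweep_mutations)
    if not sweep:
        raise ValueError("no last-sweep mutations supplied")
    shared = met & sweep
    if shared == sweep:
        return "late"   # seeded after the sweep: carries all sweep mutations
    if not shared:
        return "early"  # seeded before the sweep: carries none of them
    raise ValueError("partial sharing; check mutation calls or sweep definition")

# Hypothetical mutation labels
sweep = {"m7", "m8"}
print(divergence_timing({"m1", "m2", "m7", "m8", "m9"}, sweep))  # late
print(divergence_timing({"m1", "m2", "m5"}, sweep))              # early
```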

Biological meaning: The last clonal sweep in the primary tumor is the reference point for timing metastatic dissemination. Al Bakir et al. (2023): “If the metastasis shares these mutations, divergence occurred after the last sweep (late divergence, ~75% of cases). If the metastasis lacks them, divergence occurred before the last sweep (early divergence, ~25%).” Early-diverging metastases evolved independently from the primary for longer and may have distinct therapeutic vulnerabilities.

Empirical support: TRACERx NSCLC data (Al Bakir et al., 2023). The ~75%/~25% split is specific to NSCLC and may differ across cancer types. Turajlic et al. (2019) note that punctuated-evolution tumors seed metastases monophyletically (early divergence from a single ancestral clone), while gradual-evolution tumors produce intermetastatic heterogeneity (mixed early and late divergence). The condition captures the relationship between clonal sweep dynamics and metastatic timing.

Sources: clonal-sweep, clonal-evolution

3.11 Immune Editing Commutativity Condition

Path 1: Mutation →(presents)→ Neoantigen →(recognizes)→ ImmuneRecognition →(elicits)→ ImmuneResponse →(eliminates)→ CloneFrequency(reduced)

Path 2: Mutation →(evades)→ ImmuneEvasion →(blocks)→ ImmuneRecognition →(no elicitation)→ CloneFrequency(unchanged)

Constraint: For a clone bearing a mutation that generates a neoantigen, the immune arrow (presents → recognizes → elicits → eliminates) and the evasion arrow (evades → blocks recognition) are competing paths with opposite effects on CloneFrequency. Path 1 reduces clone frequency; Path 2 preserves it. The actual outcome depends on whether immune evasion is effective: if evades succeeds, Path 2 overrides Path 1 and the clone survives. If evades fails (or is blocked by checkpoint inhibition), Path 1 dominates and the clone is eliminated.

Biological meaning: Immune recognition is necessary but not sufficient for clone elimination. A clone can be recognized (Path 1 initiated) but survive if it deploys effective evasion mechanisms (Path 2 overrides). This is why checkpoint inhibitors work: they block evasion, allowing Path 1 to complete. Immunoediting — the selective elimination of immunogenic clones over evolutionary time — is the population-level consequence: ImmuneSystem →(edits)→ SubclonalArchitecture maps the set of clones to the subset that have either low neoantigen burden or effective evasion.

Empirical support: PCAWG Consortium (2020) found evidence of immunoediting: tumors with high immune infiltration had reduced neoantigen burden compared to expected mutation load, consistent with selective elimination of immunogenic subclones. McGranahan & Swanton (2017): HLA LOH is a common evasion mechanism in lung cancer; clonal neoantigens are superior therapeutic targets because they are present in all cells and cannot be evaded by subclonal loss.

Formal note: Unlike the other commutativity conditions, which assert that two paths MUST produce the same result, this condition asserts that two paths CANNOT both hold simultaneously: they are mutually exclusive for a given clone at a given time. This is a conflict condition, not a consistency condition. It formalizes the evolutionary arms race between immune recognition and immune evasion.

Sources: clonal-evolution, McGranahan & Swanton (2017), PCAWG Consortium (2020), therapy-resistance


4. Hierarchical Structure (Forest Subcategory H)

Define H as a subcategory of the olog where objects are the same objects as the full olog (Ob(H) = Ob(C)) and arrows are inclusion maps representing compositional hierarchies: lower-level objects are included IN higher-level objects. H is a forest — a disjoint union of trees — where each tree represents a biological scale level and edges represent “is a component of” (part-whole) or “is a type of” (subtype) relationships.

Level 1: Molecular
├── GenomeState ─────────┐
│   ├── Mutation ────────┤── includes → DriverMutation, PassengerMutation
│   └── EpigeneticState ─┤
└── MutationalSignature ─┘

Level 2: Cellular
├── Clone ─────────────── contains → genome(Clone → GenomeState)
│                         contains → hasEpigenome(Clone → EpigeneticState)
└── FitnessValue ──────── assigned by → fitness(Clone × Microenvironment → FitnessValue)

Level 3: Population
├── TumorCellPopulation ─ composed of → Clone instances
├── CloneFrequency ────── assigned by → frequency(Clone × TumorCellPopulation → CloneFrequency)
├── SubclonalArchitecture ──── constructed by → architecture(TumorCellPopulation → SubclonalArchitecture)
│                            └── IN → PhylogeneticTree (subtype)
└── IntratumorHeterogeneity ── computed by → diversity(TumorCellPopulation → IntratumorHeterogeneity)

Level 4: Environmental
├── Microenvironment ──── shapes → fitness(Clone × Microenvironment → FitnessValue)
│                         induces → inducesEpigenetic(Microenvironment → EpigeneticState)
├── TherapeuticPressure ── creates → createsBottleneck(TherapeuticPressure × TumorCellPopulation → BottleneckEvent)
├── BottleneckEvent ────── has → determinesSeverity(BottleneckEvent → BottleneckSeverity)
└── BottleneckSeverity ── quantifies → depth of population reduction

Level 5: Temporal/Phylogenetic
├── EvolutionaryTime ──── parameter of → expands(Clone × EvolutionaryTime → CloneFrequency)
├── PhylogeneticTree ──── includes → ancestor: Clone × Clone → Clone
├── EvolutionaryMode ──── classifies → modeFromTree(PhylogeneticTree × EvolutionaryTime → EvolutionaryMode)
└── MolecularClock ────── generates → tick(MolecularClock × EvolutionaryTime → PassengerMutation)

Level 6: Clinical
├── TherapyResponse ───── evaluates → clinResponse(TumorCellPopulation × EvolutionaryTime → TherapyResponse)
├── Relapse ────────────── characterized by → relapseArchitecture(Relapse → SubclonalArchitecture)
├── OverallSurvival ────── predicted by → prognosisFromMode(EvolutionaryMode → OverallSurvival)
└── Metastasis ─────────── timed by → sweepTimingReference(ClonalSweep × Metastasis → EvolutionaryTime)

Hierarchy constraints:

  • Each object appears at exactly one level (disjoint union of levels).
  • Arrows within a level connect objects at the same hierarchical scale (e.g., population-level arrows connect population objects).
  • Arrows crossing levels (e.g., confersFitness: DriverMutation → FitnessValue crosses from molecular to cellular) are “hierarchical interactions” in the sense of Giesa et al. (2011): a property of a higher-level structure depends on a lower-level element. These cross-level arrows are the most biologically interesting; they encode why molecular changes (mutations) affect population dynamics (selection).
  • The forest H satisfies Ob(H) = Ob(C) (all objects appear in the hierarchy) and H is a forest (each object has at most one parent arrow pointing to a higher-level object — though cross-level arrows in the full olog are not hierarchy arrows).
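
These constraints are mechanical enough to check in code. The sketch below assumes H is stored as a child-to-parent map and the levels as lists of object names. The fragment shown is one reading of the Level 1 tree, and both helper names are hypothetical.

```python
from collections import Counter

def is_forest(parent):
    """parent maps each object to its single parent (or None for a root).
    The dict guarantees at most one parent; this checks that no cycle exists."""
    for start in parent:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return False
            seen.add(node)
            node = parent.get(node)
    return True

def multi_level_objects(levels):
    """Objects listed under more than one level (should be empty)."""
    counts = Counter(obj for members in levels.values() for obj in members)
    return sorted(obj for obj, n in counts.items() if n > 1)

# Fragment of the hierarchy, transcribed for illustration
parent = {
    "GenomeState": None,
    "Mutation": "GenomeState",
    "DriverMutation": "Mutation",
    "PassengerMutation": "Mutation",
}
levels = {"Molecular": ["GenomeState", "Mutation"], "Cellular": ["Clone", "FitnessValue"]}

print(is_forest(parent))            # True
print(multi_level_objects(levels))  # []
```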

5. Core Olog Diagram

The following Mermaid diagram shows the core olog structure — the minimal set of objects and arrows that capture the fundamental dynamics of clonal evolution. Objects are grouped by hierarchical level (dashed boxes). Commutativity conditions are annotated in red with their condition numbers. Cross-level arrows (hierarchical interactions) are emphasized with dotted lines.

flowchart TD
    subgraph L1["Level 1: Molecular"]
        GS[GenomeState]
        M[Mutation]
        DM[DriverMutation]
        PM[PassengerMutation]
        MS[MutationalSignature]
        ES[EpigeneticState]
    end

    subgraph L2["Level 2: Cellular"]
        C[Clone]
        FV[FitnessValue]
    end

    subgraph L3["Level 3: Population"]
        TCP[TumorCellPopulation]
        CF[CloneFrequency]
        SA[SubclonalArchitecture]
        ITH[IntratumorHeterogeneity]
    end

    subgraph L4["Level 4: Environmental"]
        ME[Microenvironment]
        TP[TherapeuticPressure]
        BE[BottleneckEvent]
        BS[BottleneckSeverity]
    end

    subgraph L5["Level 5: Temporal"]
        ET[EvolutionaryTime]
        PT[PhylogeneticTree]
        EM[EvolutionaryMode]
        CS[ClonalSweep]
        CI[ClonalInterference]
    end

    subgraph L6["Level 6: Clinical"]
        TR[TherapyResponse]
        RL[Relapse]
        OS[OverallSurvival]
        MT[Metastasis]
    end

    %% Genomic arrows
    M -->|"mutates"| GS
    DM -->|"isDriver"| GS
    PM -->|"isPassenger"| GS
    MS -->|"generatesSignature"| GS
    ES ---|"epimutates"| ES

    %% Population arrows
    C -->|"genome"| GS
    C ---|"hasEpigenome"| ES
    C -->|"composes"| TCP
    TCP -->|"architecture"| SA
    TCP -->|"diversity"| ITH

    %% Cross-level interaction arrows
    DM -.->|"confersFitness"| FV
    C -.->|"fitness(env)"| FV
    ME -->|"fitness(env)"| FV

    %% Selection arrows
    FV -->|"selects"| TCP
    FV -->|"sweeps"| CS
    FV -->|"interferes"| CI

    %% Environmental arrows
    TP -->|"createsBottleneck"| BE
    BE -->|"determinesSeverity"| BS
    BE -->|"survive"| TCP
    ME ---|"inducesEpigenetic"| ES
    TCP ---|"remodelsMicro"| ME

    %% Temporal arrows
    ET -->|"expands"| C
    C -->|"bifurcates"| C
    PT -->|"modeFromTree"| EM

    %% Clinical arrows
    TCP -->|"clinResponse"| TR
    TR ---|"precedesRelapse"| BE
    RL -->|"relapseArchitecture"| SA
    EM -->|"prognosisFromMode"| OS
    C -->|"metastasizes"| MT
    CS ---|"sweepTimingReference"| MT

    %% Dual-regime coupling
    DM -.->|"couplesGeneticToEpigenetic"| ES
    ES -.->|"couplesEpigeneticToGenetic"| M

    %% Commutativity annotations
    C3["CC3: Phylogenetic Consistency\nancestor ∘ tree = tree ∘ MRCA"] -.-> SA
    C5["CC5: Sweep-Timing\nτ_k > sweep_time → Sweep\nτ_k ≤ sweep_time → Interference"] -.-> CS
    C5 -.-> CI
    C6["CC6: Bottleneck Severity\nShallow → linear\nDeep → branching"] -.-> BE
    C7["CC7: Dual-Regime Coupling\nIrreducible coupled system"] -.-> DM
    C7 -.-> ES
    C10["CC10: Metastatic Timing\nEarly vs Late divergence"] -.-> MT

    %% Style
    classDef molecular fill:#e1f5fe,stroke:#01579b
    classDef cellular fill:#f3e5f5,stroke:#4a148c
    classDef population fill:#fff3e0,stroke:#e65100
    classDef environmental fill:#e8f5e9,stroke:#1b5e20
    classDef temporal fill:#fff8e1,stroke:#f57f17
    classDef clinical fill:#fce4ec,stroke:#880e4f
    classDef condition fill:#ffebee,stroke:#b71c1c,stroke-dasharray: 5 5

    class GS,M,DM,PM,MS,ES molecular
    class C,FV cellular
    class TCP,CF,SA,ITH population
    class ME,TP,BE,BS environmental
    class ET,PT,EM,CS,CI temporal
    class TR,RL,OS,MT clinical
    class C3,C5,C6,C7,C10 condition

Core cancer evolution olog. Objects grouped by hierarchical level (six levels: molecular through clinical). Solid arrows are within-level or cross-level functions. Dotted arrows are cross-level hierarchical interactions: properties of higher-level systems that depend on lower-level elements (Giesa et al., 2011). Annotation nodes with dashed red borders, attached by dotted arrows, mark commutativity conditions. Five of the eleven commutativity conditions are shown on the diagram for readability. Condition CC1 (Driver-Sweep) is the central Darwinian spine (DM → FV → TCP and DM → GS → TCP); CC2 (Bottleneck-Diversity) involves the BE → TCP → SA path; CC4 (Fitness-Environment) asserts that FV depends on both C and ME; CC8 (Compression-Entrenchment) and CC9 (Molecular Clock) are temporal constraints not directly shown; CC11 (Immune Editing) is omitted because its immune objects are not part of the core diagram.


6. Discussion and Limitations

What the olog captures

The olog formalizes the complete clonal evolution cycle: mutation generates variation (GenomeState changes), variation affects fitness (FitnessValue assignments), fitness differences drive selection (ClonalSweep, ClonalInterference), selection shapes population structure (SubclonalArchitecture, IntratumorHeterogeneity), and the population interacts with its environment (Microenvironment, TherapeuticPressure) — producing clinical outcomes (TherapyResponse, Relapse, OverallSurvival).

The eleven commutativity conditions are the formal constraints that make this olog a verified knowledge structure, not just a diagram:

  • Conditions 1, 3, 5, 9, and 10 are well-supported by empirical evidence and hold as approximations in defined regimes.
  • Conditions 2, 6, and 8 are partially supported: the structure is correct but the quantitative boundaries (bottleneck severity thresholds, perturbation size metrics) are not yet empirically calibrated.
  • Condition 4 is a non-commutativity: the failure of commutativity is itself biologically meaningful.
  • Condition 7 is a structural constraint: the dual-regime coupling is well-supported mechanistically, but the categorical structure (profunctor, not functor) requires further formal development.
  • Condition 11 is a conflict condition: its two paths are mutually exclusive for a given clone at a given time rather than required to agree.

Limitations

Deterministic bias. The olog’s arrow semantics presume deterministic functions. Evolution, however, is strongly shaped by stochastic drift. A proper evolutionary olog would require probabilistic extensions such as Markov categories (where arrows are Markov kernels) or random functions. The current olog captures the expected (mean) behavior but not the variance.
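
To make the Markov-kernel idea concrete, the following sketch represents an arrow as a table of conditional probabilities and composes two such arrows by summing over the intermediate object. The state names and probabilities are invented for illustration, and `compose` is a hypothetical helper rather than part of any library.

```python
def compose(k1, k2):
    """Compose Markov kernels stored as {x: {y: p}}:
    (k2 after k1)(x)(z) = sum over y of k1[x][y] * k2[y][z]."""
    out = {}
    for x, dist in k1.items():
        acc = {}
        for y, p in dist.items():
            for z, q in k2[y].items():
                acc[z] = acc.get(z, 0.0) + p * q
        out[x] = acc
    return out

# Hypothetical stochastic versions of confersFitness and selection
confers_fitness = {"driver": {"high": 0.5, "neutral": 0.5}}
selection = {
    "high": {"sweep": 0.5, "loss": 0.5},     # a fit clone can still be lost to drift
    "neutral": {"sweep": 0.0, "loss": 1.0},
}

print(compose(confers_fitness, selection))
# {'driver': {'sweep': 0.25, 'loss': 0.75}}
```

A deterministic arrow is the special case in which every row puts probability 1 on a single outcome, so the existing olog embeds into this representation.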

Static representation. The olog captures structure at a snapshot in time. Evolutionary dynamics — frequency changes, sweep times, drift variance — require time-indexed categories or temporal logics. The object EvolutionaryTime is included as an explicit parameter to many arrows, but the olog does not enforce temporal consistency constraints (e.g., that a clone cannot be ancestral to itself, that mutation times are monotonic).

Incomplete domain. The olog is constructed from the wiki’s existing concept pages. Important objects may be missing: immune system components (T-cell clones, neo-antigens, HLA types), metabolic state, cell-of-origin, dormancy, polyclonal origins, and spatial structure are not fully formalized. The olog is extensible — new objects and arrows can be added as the wiki grows.

Fuzzy boundaries. Some object boundaries are not sharp: the DriverMutation/PassengerMutation distinction is context-dependent (a mutation can be a driver in one microenvironment and a passenger in another). MutationalSignature is a distribution, not a discrete set. This fuzziness reflects biological reality but reduces the olog’s formal precision.

No universal properties. The olog is a descriptive category, not a free category with universal properties. It does not have limits, colimits, adjunctions, or natural transformations, all of which would increase formal power at the cost of biological interpretability. The olog is a category in the “thin” sense used by Giesa et al. (2011).

Cross-domain synthesis

This cancer evolution olog is the target category for the wiki’s cross-domain functor program. The existing analogies — to compression progress (Schmidhuber 2009 → compression-progress-evolution), to cultural evolution (Gabora 2017 → dual-regime-evolution), and to ecological invasion (Geng et al. 2016 → population-bottleneck) — can be formalized as functors to or from this olog:

  • Compression progress functor G: ClonalCat → CompressionCat maps GenomeState → Data, FitnessValue → CompressionQuality, ClonalSweep → CompressionBreakthrough. Commutativity holds for strong-selection events (CC1 restricted to s >> 1/N) but fails for passenger mutations and the drift regime (compression-progress-evolution §Category-theoretic validation).

  • Dual-regime profunctor: The coupling between DarwCat (genetic) and NonDarwCat (epigenetic) is a profunctor, not a functor — defined by the coupling arrows (CC7). The profunctor structure captures the biological reality that a given mutation does not determine a unique epigenetic outcome but enables a distribution of possible outcomes.

  • Bottleneck functor F: EcologyCat → CancerCat maps: plant population → tumor cell population, founder bottleneck → therapy bottleneck, ISSR diversity → subclonal heterogeneity, phenotypic plasticity → epigenetic plasticity. The commutativity condition P → bottleneck → diversity → success = P → bottleneck → plasticity → success is the formal encoding of the bottleneck paradox (population-bottleneck §Category-theoretic structure).

These functors are defined and their commutativity conditions analyzed in the respective concept pages. The present olog provides the unified target category — the canonical representation of the cancer evolution domain against which all cross-domain analogies can be validated.
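
A first validation step for any such functor is to check that each mapped arrow has the mapped domain and codomain. The sketch below does only that typing check and does not test composition or the commutativity conditions. The object map follows the compression progress functor G described above; the target arrow name `breaksThrough` and the helper `typing_failures` are hypothetical.

```python
def typing_failures(source_arrows, target_arrows, obj_map, arrow_map):
    """Source arrows whose image is missing or has the wrong domain/codomain.
    Arrows are stored as {name: (domain, codomain)}."""
    failures = []
    for name, (dom, cod) in source_arrows.items():
        image = arrow_map.get(name)
        expected = (obj_map.get(dom), obj_map.get(cod))
        if image is None or target_arrows.get(image) != expected:
            failures.append(name)
    return failures

# Source arrow taken from the core diagram; target arrow name is hypothetical
source_arrows = {"sweeps": ("FitnessValue", "ClonalSweep")}
target_arrows = {"breaksThrough": ("CompressionQuality", "CompressionBreakthrough")}
obj_map = {
    "GenomeState": "Data",
    "FitnessValue": "CompressionQuality",
    "ClonalSweep": "CompressionBreakthrough",
}

print(typing_failures(source_arrows, target_arrows, obj_map, {"sweeps": "breaksThrough"}))  # []
print(typing_failures(source_arrows, target_arrows, obj_map, {}))  # ['sweeps']
```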