Compression Progress Olog

Definition

A category-theoretic ontology log (olog) for the compression progress domain, based on Schmidhuber (2009) and formalized following the hierarchical olog methodology of Giesa, Spivak, & Buehler (2011) (buehler2011-reoccurring-patterns). This olog formalizes the entities, functions, and commutativity conditions of the compression progress framework — the principle that adaptive systems are driven by intrinsic reward for compressing sensory history — and extends it to the evolutionary domain via the compression-evolution isomorphism (compression-progress-evolution).

This is one of three domain ologs (cancer, ecology, compression) designed to admit functorial mappings between them, enabling formal cross-domain analogy verification. The compression olog is the most abstract of the three: it formalizes a computational theory of intelligence, creativity, and curiosity that Schmidhuber (2009) proposes as universal across all adaptive systems.

Confidence note: The olog formalizes a theoretical framework with significant limitations — uncomputability of Kolmogorov complexity, unfalsifiability risk, literature isolation, and an unsupported consciousness claim (schmidhuber2009-compression-progress). The olog captures the formal structure of the framework as stated, not its empirical validation. The evolutionary extension (Level 5 objects) is the wiki’s cross-domain synthesis, not Schmidhuber’s own claim. The functorial mappings to cancer and ecology ologs are predictions to be tested, not established results.

Hierarchical Structure

The olog is organized into five levels forming a forest (following Buehler’s hierarchical subcategory H). Each level builds on the previous: data is compressed, compression drives cognition, cognition drives behavior, and behavior maps onto evolutionary dynamics.

Level 1: Data/Information      (foundational — sensory history, regularity, noise)
Level 2: Compressor             (compression machinery — representation, quality, progress)
Level 3: Cognitive/Agent        (curious agent — beauty, interestingness, learning)
Level 4: Action/Exploration     (behavioral — exploration, exploitation, discovery)
Level 5: Evolutionary Mapping   (biological — genome, fitness, selection, cancer)

Hierarchy condition: Ob(C) = Ob(H) and H is a forest: every object belongs to exactly one level, and arrows between levels respect the composition hierarchy (lower-level objects feed into higher-level functions). Cross-level arrows are permitted only when they represent hierarchical interactions (a higher-level property depending on a lower-level element).

1. Objects

Level 1 — Data/Information

  • DataSequence (Set): The set of all possible sensory histories or observation streams experienced by an agent. Each element is a finite string over some alphabet.
  • Regularity (Set): The set of compressible patterns, structures, or algorithmic regularities that can be discovered in data. A regularity is a description that is shorter than the data it describes.
  • Noise (Set): The set of incompressible randomness: data lacking any algorithmic regularity. A noise sequence has Kolmogorov complexity approximately equal to its length (K(s) ≈ |s|).
  • KolmogorovComplexity (Set, N): The set of values representing the length of the shortest program that generates a given data sequence. An uncomputable ideal (theoretical limit).

Foundational constraint (Level 1 closure): The union Regularity ∪ Noise = DataSequence (every data sequence partitions into compressible structure and incompressible residue). The intersection Regularity ∩ Noise = ∅ (no sequence is both compressible and incompressible under the same compressor).
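Because KolmogorovComplexity is uncomputable, any working sketch has to substitute a real compressor. The sketch below assumes zlib's compressed length as an upper-bound proxy for K(s) (an assumption; the olog names no compressor) and contrasts a regular sequence with random bytes standing in for Noise.

```python
import os
import zlib

def k_proxy(s: bytes) -> int:
    """Upper-bound proxy for K(s): length in bytes of the zlib encoding.

    True Kolmogorov complexity is uncomputable; a real compressor only ever
    yields an upper bound (plus a constant for the decompressor itself).
    """
    return len(zlib.compress(s, 9))

regular = b"0123456789" * 1000   # highly regular: a short description exists
noise = os.urandom(10_000)       # random bytes: no regularity for zlib to find

for name, s in [("regular", regular), ("noise", noise)]:
    print(f"{name}: |s| = {len(s)}, proxy K(s) = {k_proxy(s)}")
```

A proxy like this errs in only one direction: it can overestimate K(s) but never underestimate it, so "zlib could not shrink it" is weaker evidence of membership in Noise than "zlib shrank it" is of membership in Regularity.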

Level 2 — Compressor

  • Compressor (Set): The set of all algorithms or programs that discover regularities in data and produce compressed representations. A compressor takes a data sequence and outputs a shorter encoding.
  • CompressedRepresentation (Set): The set of compressed encodings, that is, shorter programs or representations that generate (uncompress to) the original data. The compressed representation is the compressor’s output.
  • CompressionQuality (Set, R): The set of values measuring how well the compressor encodes the data. Typically measured as bits saved, |uncompressed| - |compressed| (higher is better), or as the ratio |compressed| / |uncompressed| (lower is better).
  • CompressionProgress (Set, R): The set of values representing the first derivative of CompressionQuality with respect to time (or with respect to compressor improvement). Negative values = decompression; zero = stasis; positive = learning.

Compressor identity condition: A Compressor is identified by its input-output behavior. Two compressors that produce identical compressed representations for all data sequences in the agent’s history are equivalent. The compressor is always evaluated on the same data history before and after modification (asynchronous evaluation requirement, Schmidhuber 2009, Appendix A.6).
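The two quality measures and the identity condition can be made concrete in a few lines. This sketch assumes a compressor is any function from bytes to bytes and uses zlib compression levels as stand-in compressors (hypothetical choices, not part of the framework); two compressors count as equivalent when they agree on every sequence in the agent's history.

```python
import zlib
from typing import Callable, Sequence

Compressor = Callable[[bytes], bytes]

def bits_saved(data: bytes, encoded: bytes) -> int:
    """CompressionQuality as bits saved: |uncompressed| - |compressed|."""
    return 8 * (len(data) - len(encoded))

def ratio(data: bytes, encoded: bytes) -> float:
    """Alternative CompressionQuality: |compressed| / |uncompressed| (lower is better)."""
    return len(encoded) / len(data)

def equivalent(c1: Compressor, c2: Compressor, history: Sequence[bytes]) -> bool:
    """Identity condition: identical output on every sequence in the history."""
    return all(c1(d) == c2(d) for d in history)

def zlib_level(level: int) -> Compressor:
    """Hypothetical stand-in compressor: zlib at a fixed compression level."""
    return lambda d: zlib.compress(d, level)

history = [b"abcabcabc" * 200, b"the quick brown fox " * 50]
default, six, stored = zlib_level(-1), zlib_level(6), zlib_level(0)

print(equivalent(default, six, history))   # zlib's default level (-1) means level 6
print(equivalent(stored, six, history))    # level 0 stores the data uncompressed
print(bits_saved(history[0], six(history[0])), ratio(history[0], six(history[0])))
```

Identity is judged only on the agent's history, so two compressors that agree there but differ on unseen data still count as the same Compressor under this condition.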

Level 3 — Cognitive/Agent

  • CuriousAgent (Set): The set of agents equipped with an adaptive compressor and the intrinsic reward mechanism. The agent seeks to maximize cumulative intrinsic reward over time. An agent is identified by its current compressor state and data history.
  • IntrinsicReward (Set, R): The set of reward values generated internally by the agent, proportional to compression progress. Reward = α · Δ(compression_quality), where α > 0 is an intrinsic scaling factor.
  • SubjectiveBeauty (Set, R): The set of values representing the compressibility of a data sequence given the observer’s current knowledge and computational limits. Beauty is a stock: it measures how few bits are needed to encode the data. A fully compressed pattern is maximally beautiful; a novel pattern requiring new compression is minimally beautiful.
  • SubjectiveInterestingness (Set, R): The set of values representing the first derivative of SubjectiveBeauty with respect to time, d(beauty)/dt. Interestingness is a flow: it measures how fast the observer is learning to compress better. It is what drives curiosity, attention, and exploration.
  • LearningCurve (Set of functions R → R): The set of trajectories (time-indexed sequences of CompressionQuality values) tracking how compression quality evolves over repeated exposure to a given data stream. A learning curve has positive slope when compression progress > 0, zero slope at plateau, and negative slope during forgetting or decompression.

Subjectivity condition: All objects at Level 3 are observer-dependent — they depend on the specific CuriousAgent’s current compressor state and data history. Two agents with different compressors will assign different beauty and interestingness values to the same data sequence.

Level 4 — Action/Exploration

  • Exploration (Set): The set of actions taken to seek new data that yields compression progress. Exploration is driven by intrinsic reward for compression progress (curiosity).
  • Exploitation (Set): The set of actions that use known regularities without seeking new data. Exploitation occurs when no further compression progress is possible on available data.
  • Discovery (Set): The set of discrete, large-magnitude improvements in compression (a “compression breakthrough”). A discovery is a step-change in CompressionQuality that is large relative to the baseline rate of compression progress.
  • BoringPredictability (Subset of DataSequence): The set of data sequences that have been fully compressed; their structure is completely known. No further compression progress is possible. Beauty is maximal; interestingness is zero.
  • BoringRandomness (Subset of DataSequence): The set of data sequences that are permanently incompressible (stochastic noise). No regularities exist to be discovered. Beauty is minimal (no compression possible); interestingness is zero.

Boredom dichotomy: BoringPredictability ∪ BoringRandomness = BoringData (the union of both boredom types). BoringPredictability ∩ BoringRandomness = ∅. Every data sequence that is not boring (not in either subset) is “interesting” — compression progress is still possible. Note that a sequence cannot be both perfectly predictable and completely random, hence the disjointness condition.

Level 5 — Evolutionary Mapping

  • GenomeAsCompressor (Set): The set of genomes that function as compressors of environmental regularities. A genome encodes a strategy for surviving in an environment; the description length of “how to survive” is compressed into the genomic sequence.
  • FitnessAsCompressionQuality (Set, R): The set of values representing how well the genome compresses the environment. Fitness = compression quality of the genome as a predictor/survival program. Higher fitness = better compression.
  • MutationAsCompressorPerturbation (Set): The set of mutational events that modify the genome, analogous to modifying the compressor algorithm. Single mutations are small perturbations; large-scale rearrangements are major compressor rewrites.
  • SelectionAsCompressionEvaluation (Set): The set of selection processes that evaluate how well different genomes compress the environment. Natural selection is the evaluation function that compares compressed representations.
  • ClonalSweepAsDiscovery (Set): The set of clonal sweeps: rapid expansions of a lineage bearing a mutation that achieves qualitatively better compression of the environment. A sweep is the evolutionary analogue of a discovery (compression breakthrough).
  • CancerAsDecompression (Set; subtype of DataSequence or a mapping from CompressedRepresentation to DataSequence): The set of malignant transformations representing progressive loss of the compressed regulatory program. Cancer = decompression through genomic instability and regulatory module failure.

Self-referential constraint (critical): In the compression framework, the compressor is distinct from the data it compresses. In evolution, the genome is both the compressor and the data being compressed: it is both the program that encodes survival strategy AND the sequence that is evaluated by selection. This is formalized as an endofunctor condition: the genome maps to itself under the compression mapping. This self-referential structure has no analogue in Schmidhuber’s original framework and is the root limitation of the compression-evolution isomorphism. All commutativity conditions involving Level 5 objects must account for this self-reference (compression-progress-evolution).

2. Arrows

2.1 Compression Arrows

compresses: DataSequence × Compressor → CompressedRepresentation

A compressor, applied to a data sequence, yields a compressed representation. This is the fundamental compression operation. The same data compressed by different compressors yields different representations; the same compressor applied to different data yields representations of varying lengths.

Domain: (data, compressor) pairs. Codomain: compressed encodings.

hasQuality: CompressedRepresentation → CompressionQuality

Every compressed representation has a quality score — the number of bits saved relative to the uncompressed data (or relative to the previous best compression). This is the measure that drives all downstream cognitive and behavioral processes.

Domain: any compressed representation. Codomain: a real number.

improvesOn: Compressor_new × Compressor_old × DataSequence → CompressionProgress

Given a data sequence and two compressors (old and new), measures how many additional bits the new compressor saves over the old one. This is the fundamental measure of compression progress. Requires asynchronous evaluation: both compressors must be tested on the same data history to be fair (schmidhuber2009-compression-progress).

Domain: (new compressor, old compressor, data) triples. Codomain: real number (positive = progress, zero = no improvement, negative = regression/decompression).
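A minimal sketch of improvesOn under the asynchronous evaluation requirement: both compressors run on the same data and the difference in encoded length is reported in bits. As an assumption for illustration only, the "new" compressor is modelled as zlib primed with a preset dictionary (zdict) holding structure the agent has already learned, and the "old" compressor is plain zlib.

```python
import random
import zlib

def compress_with(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress `data`; a non-empty zdict stands in for learned regularities."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()

def improves_on(data: bytes, new_zdict: bytes, old_zdict: bytes = b"") -> int:
    """CompressionProgress in bits: extra bits the new compressor saves on the SAME data."""
    old_len = len(compress_with(data, old_zdict))
    new_len = len(compress_with(data, new_zdict))
    return 8 * (old_len - new_len)

# Hypothetical regularity already absorbed by the new compressor; it looks
# random to plain zlib, so only the primed compressor can exploit it.
learned = random.Random(0).randbytes(512)
history = learned[128:] + learned[:128]   # the same structure, observed again

print(improves_on(history, new_zdict=learned))  # primed vs unprimed compressor
print(improves_on(history, new_zdict=b""))      # identical compressors
```

Because both lengths come from the same `history`, any difference is attributable to the compressor change alone, which is the point of the asynchronous evaluation condition.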

discovers: Compressor × DataSequence → Regularity

When a compressor discovers a pattern in data, it identifies a regularity — a compressible structure that allows the data to be encoded more compactly. This arrow represents the active discovery process, which is distinct from mere compression (the discovery of a new regularity changes the compressor itself).

Domain: (compressor, data) pairs. Codomain: the regularity discovered.

2.2 Aesthetic Arrows

hasBeauty: CompressedRepresentation × CuriousAgent → SubjectiveBeauty

A compressed representation has beauty for a given observer — the beauty score is how compressible the data is given the observer’s current knowledge. This is observer-dependent: what is beautiful to one agent (who has already compressed the pattern) may be opaque to another (who hasn’t yet discovered the regularity). Beauty is a stock — it measures the current state of compression, not the rate of improvement.

Domain: (representation, agent) pairs. Codomain: real number (higher = more compressible = more beautiful).

hasInterestingness: DataSequence × CuriousAgent → SubjectiveInterestingness

A data sequence has interestingness for a given observer — the degree to which exposure to this data yields compression progress. Interestingness is the first derivative of beauty with respect to time. A sequence is interesting iff the learning curve has positive slope (the observer is learning to compress it better). Once fully compressed, the same sequence becomes boring (zero interestingness).

Domain: (data, agent) pairs. Codomain: real number (positive = interesting, zero = boring, negative = actively confusing/decompressing).

isInteresting: DataSequence × CuriousAgent → Boolean

A data sequence is interesting for an agent iff compression progress is possible — iff the sequence is neither already fully compressed (BoringPredictability) nor permanently incompressible (BoringRandomness). This is the binary threshold function derived from the continuous interestingness measure.

Domain: (data, agent) pairs. Codomain: {true, false}.

2.3 Exploration Arrows

seeks: CuriousAgent → Exploration

An agent in exploration mode chooses actions that yield new data on which compression progress is possible. The seeking behavior is driven by intrinsic reward for compression progress: the agent allocates attention to stimuli where d(beauty)/dt > 0. This arrow is active when the agent is in the “curious” state.

Domain: agents currently in exploration mode. Codomain: exploration actions.

exploits: CuriousAgent → Exploitation

An agent that exploits uses known regularities without seeking new compression. This occurs when no further compression progress is possible on available data — either because all data is fully compressed (boringly predictable) or because all available data is incompressible noise (boringly random). The exploration-exploitation boundary is determined by the compression progress gradient.

Domain: agents currently in exploitation mode. Codomain: exploitation actions.

discovers: CuriousAgent × DataSequence → Discovery

When an agent encounters a data sequence that yields a large, discrete improvement in compression quality, the result is a discovery (a compression breakthrough). The threshold for discovery is CompressionProgress > avg(CompressionProgress) + k · stddev(CompressionProgress) for a fixed threshold k > 1. Normal learning is continuous; discovery is discontinuous.

Domain: (agent, data) pairs that yield breakthroughs. Codomain: discovery events.

generatesReward: CompressionProgress → IntrinsicReward

Compression progress generates intrinsic reward. The reward is proportional to bits saved: reward = α · CompressionProgress, where α > 0. This is the core motivational mechanism — the agent is pre-wired to find compression progress rewarding, which drives learning without requiring external rewards. The proportionality constant α determines the agent’s “curiosity intensity.”

Domain: positive compression progress values. Codomain: reward values.
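A sketch of generatesReward as stated: reward is α times compression progress, defined only on positive progress. The default value of alpha and the choice to reject non-positive inputs (rather than, say, mapping them to zero reward) are assumptions of this sketch.

```python
def generates_reward(progress_bits: float, alpha: float = 0.01) -> float:
    """IntrinsicReward = alpha * CompressionProgress, with alpha > 0.

    The arrow's domain is positive progress only, so other inputs are rejected.
    """
    if alpha <= 0:
        raise ValueError("alpha (curiosity intensity) must be positive")
    if progress_bits <= 0:
        raise ValueError("generatesReward is defined only for positive progress")
    return alpha * progress_bits

print(generates_reward(800.0))             # 100 bytes saved, default alpha
print(generates_reward(800.0, alpha=0.1))  # a more "curious" agent
```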

2.4 Evolutionary Arrows

encodes: GenomeAsCompressor × Environment → CompressedRepresentation

A genome, operating in an environment, produces a compressed representation — the organism’s phenotype as an encoding of the environmental challenge. This is the evolutionary analogue of compresses: the genome compresses the environment into a survival strategy. Unlike the computational compressor, however, the genome’s encoding is evaluated over generations, not over a single learning episode.

Domain: (genome, environment) pairs. Codomain: compressed representations (phenotypic strategies).

evaluates: SelectionAsCompressionEvaluation × CompressedRepresentation → FitnessAsCompressionQuality

Natural selection evaluates a compressed representation (the phenotypic strategy encoded by the genome) and assigns a fitness value. The fitness value IS the compression quality — how well the genome’s program compresses the environmental challenge into a viable survival strategy. A genome that encodes the environment efficiently (short prediction program, accurate survival decisions) has high fitness.

Domain: (selection process, compressed representation) pairs. Codomain: fitness values.

perturbs: MutationAsCompressorPerturbation × GenomeAsCompressor → GenomeAsCompressor

A mutation perturbs the genome, yielding a new genome — a modified compressor. This is the evolutionary analogue of compressor improvement. The arrow is a function on the GenomeAsCompressor set: given a mutation and an existing genome, produce a modified genome. Note that mutations are generally random with respect to fitness effects; the compression framework provides no guarantee that perturbations improve compression.

Domain: (mutation, existing genome) pairs. Codomain: modified genomes.
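A toy sketch of perturbs as a function Mutation × Genome → Genome, assuming a genome is a string over ACGT and a mutation is a hypothetical (position, base) point substitution; large-scale rearrangements would need a richer mutation type. Nothing in the function consults fitness, mirroring the point that perturbations carry no guarantee of better compression.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class PointMutation:
    """Hypothetical MutationAsCompressorPerturbation: substitute one base."""
    position: int
    base: str

def perturbs(mutation: PointMutation, genome: str) -> str:
    """Return the modified genome; fitness plays no role in the operation."""
    if len(mutation.base) != 1 or mutation.base not in "ACGT":
        raise ValueError("base must be one of A, C, G, T")
    if not 0 <= mutation.position < len(genome):
        raise IndexError("mutation position outside genome")
    p = mutation.position
    return genome[:p] + mutation.base + genome[p + 1:]

def random_mutation(genome: str, rng: random.Random) -> PointMutation:
    """Draw a mutation blind to its effect (random with respect to fitness)."""
    pos = rng.randrange(len(genome))
    alternatives = [b for b in "ACGT" if b != genome[pos]]
    return PointMutation(pos, rng.choice(alternatives))

rng = random.Random(1)
old = "ACGTACGTACGTACGT"
new = perturbs(random_mutation(old, rng), old)
print(old)
print(new)
```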

sweeps: ClonalSweepAsDiscovery × CompressedRepresentation_old → CompressedRepresentation_new

A clonal sweep replaces an old compressed representation with a new, better one. The old genotype (whose compressed representation was suboptimal) is outcompeted by the new genotype (whose compression is better). This is the evolutionary analogue of discovers: a clonal sweep IS a discovery at the population level — a discrete compression breakthrough that changes which genomic program dominates.

Domain: (sweep event, old representation) pairs. Codomain: new representations.

decompresses: CancerAsDecompression × CompressedRepresentation → DataSequence

Cancer is decompression: it takes a compressed representation (a normal cell’s regulatory program) and degrades it, producing something closer to “raw data”, a cell that has lost its regulatory structure and reverts to default behavior (unchecked proliferation). Unlike all other arrows in the olog, this arrow runs opposite to compression: it increases description length rather than decreasing it. Critically, it is not a true inverse. Many different decompression paths can produce the same degraded phenotype, and decompression is not the inverse of any specific compressor: the identity decompresses(compresses(d)) = d does not hold in general.

Domain: (decompression event, compressed program) pairs. Codomain: decompressed data sequences.

3. Commutativity Conditions

Condition 1: Compression Progress = d(Beauty)/dt

The compression progress achieved by improving a compressor on a data sequence must equal the change in subjective beauty of the compressed representation for the agent.

Formal statement: The following diagram commutes:

Compressor_new × Compressor_old × DataSequence ──improvesOn──→ CompressionProgress
                      │                                               │
                      │ (project C_new and C_old, compress the        │ identity
                      │  same d with each, apply hasBeauty for        │
                      │  the same agent)                              │
                      ▼                                               ▼
SubjectiveBeauty_new × SubjectiveBeauty_old ──(subtract)──→ CompressionProgress

Equivalently: For a fixed agent, data sequence d, old compressor C_old, and new compressor C_new:

CompressionProgress(C_new, C_old, d) = SubjectiveBeauty(compresses(d, C_new), agent) - SubjectiveBeauty(compresses(d, C_old), agent)
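The identity follows in one line if, as an assumption, subjective beauty is measured as bits saved by the agent's compressor (the bits-saved form of CompressionQuality in Level 2). Writing ℓ_C(d) for the length of compresses(d, C) and CP for CompressionProgress, the |d| terms cancel, leaving exactly what improvesOn measures: the additional bits the new compressor saves over the old one.

```latex
\begin{aligned}
\mathrm{Beauty}(d, C) &= |d| - \ell_C(d) \\
\mathrm{CP}(C_{\text{new}}, C_{\text{old}}, d)
  &= \mathrm{Beauty}(d, C_{\text{new}}) - \mathrm{Beauty}(d, C_{\text{old}}) \\
  &= \bigl(|d| - \ell_{C_{\text{new}}}(d)\bigr) - \bigl(|d| - \ell_{C_{\text{old}}}(d)\bigr) \\
  &= \ell_{C_{\text{old}}}(d) - \ell_{C_{\text{new}}}(d)
\end{aligned}
```

CP is positive exactly when the new encoding of d is shorter than the old one.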

Basis: This is the core identity of Schmidhuber’s framework — compression progress IS the improvement in subjective compressibility. The beauty difference across a compressor update equals the compression progress. (Schmidhuber, 2009, Sections 2.3–2.4)

Limitation: This holds only when the agent’s computational limits are constant across the comparison. If the agent’s architecture changes (new learning algorithm, different computational resources), the beauty metric shifts and the subtraction is apples-to-oranges. Valid comparisons require the asynchronous evaluation condition (same agent, same data, different compressor) (Schmidhuber, 2009, Appendix A.6).

Condition 2: Interestingness Gradient

SubjectiveInterestingness = d(SubjectiveBeauty)/dt. A data sequence is interesting iff the learning curve has positive slope.

Formal statement: For any agent A and data sequence d, let t index time (or learning episodes). Then:

SubjectiveInterestingness(d, A) = d(SubjectiveBeauty(compresses(d, C_t), A))/dt

where C_t is the agent’s compressor at time t. The binary predicate isInteresting(d, A) = true iff d(beauty)/dt > 0.

Basis: This is Schmidhuber’s central conceptual contribution: the distinction between beauty (stock) and interestingness (flow). All cognitive phenomena involving curiosity, attention, and preference for novelty follow from this derivative relationship. (Schmidhuber, 2009, Section 2.4)

Limitation: In discrete time (which is the only tractable case), the derivative must be approximated by finite differences. The choice of Δt affects what counts as “interesting” — a very large Δt may smooth over interesting transients; a very small Δt may be dominated by noise. There is no principled criterion for choosing Δt in the framework.
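A sketch of the finite-difference approximation, showing how the choice of Δt changes the verdict. The learning curve is synthetic and hypothetical: beauty rises in a short burst (t = 5 to 7) and then plateaus, with ±1 alternating measurement noise added.

```python
def beauty_curve(n: int = 20) -> list[int]:
    """Hypothetical learning curve: a burst of learning at t = 5..7, then a
    plateau, plus +/-1 alternating measurement noise."""
    base = [10 * min(max(t - 4, 0), 3) for t in range(n)]
    noise = [1 if t % 2 == 0 else -1 for t in range(n)]
    return [b + e for b, e in zip(base, noise)]

def interestingness(curve: list[int], dt: int) -> dict[int, float]:
    """Finite-difference estimate of d(beauty)/dt at each t >= dt."""
    return {t: (curve[t] - curve[t - dt]) / dt for t in range(dt, len(curve))}

def interesting_times(curve: list[int], dt: int) -> list[int]:
    """Times at which isInteresting holds, i.e. the estimated slope is > 0."""
    return [t for t, v in interestingness(curve, dt).items() if v > 0]

curve = beauty_curve()
for dt in (1, 2, 10):
    print(dt, interesting_times(curve, dt))
```

With Δt = 1 the noise makes the plateau look interesting at every other step; Δt = 2 happens to cancel this particular noise and isolates the burst (t = 5 to 8); Δt = 10 keeps reporting interestingness at t = 10 to 16, well after the underlying curve has flattened at t = 7.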

Condition 3: Exploration-Exploitation Boundary

The CuriousAgent’s behavioral mode is determined by the compression progress afforded by the current data stream.

Formal statement: For a CuriousAgent A with data stream d_t at time t:

  • If CompressionProgress(C_t, C_{t-1}, d_t) > 0, then seeks(A) ∈ Exploration (exploration is active)
  • If CompressionProgress ≈ 0 AND CompressionQuality(compresses(d_t, C_t)) is high (above a threshold), then exploits(A) ∈ Exploitation (exploitation of well-compressed data)
  • If CompressionProgress ≈ 0 AND CompressionQuality is low, the agent is stuck in a region where neither exploration nor exploitation is productive (boring randomness)
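The three cases can be sketched as a classifier over the two quantities the condition uses. The thresholds `eps` (for "progress ≈ 0") and `q_high` (for "quality is high"), the normalisation of quality to a fraction of bits saved, and the decision to reject clearly negative progress are all hypothetical choices; the framework does not fix them.

```python
from enum import Enum

class Mode(Enum):
    EXPLORE = "exploration (seeks)"
    EXPLOIT = "exploitation (exploits)"
    STUCK = "stuck: boring randomness"

def behavioral_mode(progress: float, quality: float,
                    eps: float = 1e-3, q_high: float = 0.5) -> Mode:
    """Classify the agent's mode from CompressionProgress and CompressionQuality.

    `quality` is assumed normalised to [0, 1] (fraction of bits saved).
    """
    if progress > eps:
        return Mode.EXPLORE
    if abs(progress) <= eps:
        return Mode.EXPLOIT if quality >= q_high else Mode.STUCK
    raise ValueError("negative progress (forgetting) is not covered by Condition 3")

print(behavioral_mode(0.05, 0.3))   # learning is still happening
print(behavioral_mode(0.0, 0.9))    # well compressed, nothing left to learn
print(behavioral_mode(0.0, 0.01))   # incompressible stream
```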

Basis: Schmidhuber’s optimal curious agent (Section 2.7) maximizes intrinsic reward by allocating attention to data where compression progress is expected. The exploration-exploitation boundary is determined by the gradient of the learning curve. (Schmidhuber, 2009, Section 2.7)

Limitation: The boundary requires a threshold for “compression progress ≈ 0.” In practice, the agent must estimate expected compression progress for unobserved data, which introduces an exploration-exploitation dilemma at the meta-level: should the agent explore new data (to potentially discover compressible structure) or exploit existing compressible data? The framework does not provide an optimal solution to this meta-problem — it inherits the same exploration-exploitation trade-off it attempts to resolve.

Condition 4: Boredom Dichotomy

A data sequence is boring iff it belongs to either BoringPredictability or BoringRandomness. Interesting data lies in the complement.

Formal statement: Let B = BoringPredictability ∪ BoringRandomness. Then for any agent A and data sequence d:

isInteresting(d, A) = false iff d ∈ B.

For d ∈ BoringPredictability: CompressionQuality is maximal and CompressionProgress = 0 (the data is fully compressed; no further learning possible). For d ∈ BoringRandomness: the compressed length stays ≈ |d| (no compression has been achieved), so CompressionQuality measured as bits saved is ≈ 0, AND CompressionProgress = 0 (no regularities exist to be discovered; this is not a failure of the compressor but a property of the data).

Basis: Schmidhuber’s key refinement of the concept of surprise: Shannon surprise (negative log probability) does not distinguish between white noise (which is boring despite being maximally surprising in the Shannon sense) and a learnable pattern (which is interesting because compression progress is possible). The boredom dichotomy is the framework’s most empirically successful prediction: it matches human aesthetic preferences (people find neither pure noise nor pure repetition interesting). (Schmidhuber, 2009, Section 2.6)

Limitation: In practice, the boundary between BoringRandomness and “data that is learnable with more computational power” is porous. Data that appears incompressible to a bounded compressor may be highly compressible to a more powerful one. The framework defines boredom relative to the agent’s computational capacity, but provides no theory of computational capacity ceilings. “Incompressible for this agent” does not equal “incompressible in principle.”
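The gap between "incompressible for this agent" and "incompressible in principle" is easy to exhibit. In the sketch below, bytes from a seeded pseudorandom generator defeat zlib, yet the whole sequence is reproduced exactly by a few lines of code plus a seed, so its Kolmogorov complexity is small. Using zlib to stand in for a bounded agent's compressor is an assumption of the sketch.

```python
import random
import zlib

def generator_program(seed: int, n: int) -> bytes:
    """A short program that regenerates the whole sequence from a seed."""
    return random.Random(seed).randbytes(n)

data = generator_program(seed=2009, n=100_000)
zlib_len = len(zlib.compress(data, 9))

# To zlib (the bounded agent), the data looks like BoringRandomness ...
print(f"|d| = {len(data)}, zlib length = {zlib_len}")
# ... but the seed plus the generator reproduce it exactly.
print(generator_program(seed=2009, n=100_000) == data)
```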

Condition 5: Discovery as Discontinuity

A Discovery is a step-change in CompressionQuality that is large relative to the baseline rate of compression progress.

Formal statement: Let ΔQ(t) = CompressionQuality(t) - CompressionQuality(t-1) be the compression progress at time t. Let μ = mean(ΔQ) and σ = std(ΔQ) over a recent window. Then an event at time t is a Discovery iff:

ΔQ(t) > μ + k · σ for a fixed threshold k > 1 (typically k = 2 or 3)

Normal learning (iteration of the compressor on familiar data) produces small, continuous improvements in compression. Discovery (a fundamentally new regularity) produces a discontinuity — a qualitative jump in compression quality.
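A sketch of the discovery test. It assumes the window for μ and σ is the w increments strictly before t (so a breakthrough does not inflate its own baseline), uses the population standard deviation, and defaults to k = 3; all three are modelling choices. The learning curve is hypothetical.

```python
from statistics import mean, pstdev

def discoveries(quality: list[float], k: float = 3.0, w: int = 5) -> list[int]:
    """Return times t with ΔQ(t) > μ + k·σ, where μ and σ are taken over the
    w increments immediately before t (an assumed windowing rule)."""
    dq = [quality[t] - quality[t - 1] for t in range(1, len(quality))]
    hits = []
    for i in range(w, len(dq)):
        window = dq[i - w:i]
        if dq[i] > mean(window) + k * pstdev(window):
            hits.append(i + 1)  # dq[i] is ΔQ at time t = i + 1
    return hits

# Hypothetical learning curve (bits saved): steady gains of about 10 bits per
# step, with one breakthrough of 90 bits between t = 7 and t = 8.
quality = [0, 10, 22, 31, 42, 52, 60, 71, 161, 171, 183, 192]
print(discoveries(quality))
print(discoveries(quality, k=100))
```

Raising k far enough makes the same breakthrough disappear, which is the arbitrariness the limitation describes.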

Basis: Schmidhuber discusses “compression breakthroughs” as the mechanism behind creativity, scientific discovery, and humor (Sections 2.10–2.16). A joke is a compression breakthrough (the punchline restructures the preceding context into a shorter description); a scientific theory is a compression breakthrough (it unifies previously disparate phenomena under a shorter explanation). (Schmidhuber, 2009, Sections 2.10, 2.15)

Limitation: The threshold k is arbitrary. Without a principled criterion for distinguishing “discontinuous progress” from “the upper tail of a continuous distribution,” the discovery/non-discovery boundary is a modeling choice, not an empirical fact. Furthermore, the same event may be a discovery for one agent (who has never seen the pattern before) and normal learning for another (who is refining a known regularity). Subjectivity of discovery is a feature of the framework, but it means the olog cannot provide an objective definition.

Condition 6: Evolution-Compression Isomorphism

This is the core functorial claim connecting Level 5 (evolutionary) objects to Level 2–4 (compression) objects. The functor G: Olog_Evolution → Olog_Compression maps evolutionary concepts to their compression analogues.

Formal statement — commutative subdiagrams:

(6a) Encoding commutes with compression:

GenomeAsCompressor × Environment ──encodes──→ CompressedRepresentation
       │                                              │
       │ G │                                        G │
       │   │                                         │
       ▼   ▼                                         ▼
  Compressor × DataSequence ──compresses──→ CompressedRepresentation

The genome’s encoding of the environment IS the compression operation. A genome that efficiently encodes environmental regularities produces a compressed representation (a well-adapted phenotype).

Commutes when: The environment’s structure is weakly stationary (the statistical regularities that selection acts on are preserved across the timescale of adaptation). If the environment changes faster than selection can track, the genome’s compression is of out-of-date data, and the encoding fails to be a valid compression of current conditions.

(6b) Evaluation commutes with quality measurement:

SelectionAsCompressionEvaluation × CompressedRepresentation ──evaluates──→ FitnessAsCompressionQuality
                      │                                                              │
                      │ G (forget the selection component)                         G │
                      │                                                              │
                      ▼                                                              ▼
            CompressedRepresentation ────────────hasQuality────────────→ CompressionQuality

Selection evaluates the quality of the genome’s compression of the environment. Fitness IS compression quality. A genome that compresses the environment better (makes better predictions about survival-relevant variables) has higher fitness.

Commutes when: Selection acts directly on the compressed representation (i.e., on the phenotype) without confounding factors. When selection is frequency-dependent, niche-constructed, or otherwise indirect, the evaluation path is not a simple function — fitness depends on what other genomes are doing, not just on the genome-environment compression.

(6c) Mutation commutes with compressor perturbation:

MutationAsCompressorPerturbation × GenomeAsCompressor ──perturbs──→ GenomeAsCompressor
       │                                                                     │
       │ G │                                                               G │
       │     │                                                               │
       ▼     ▼                                                               ▼
   Compressor_new × Compressor_old × DataSequence ──improvesOn──→ CompressionProgress

A mutation modifies the genome, producing a new genome that compresses the environment differently. The compression progress achieved by the mutation is how much better (or worse) the new genome compresses the environment compared to the old one.

Does NOT commute when: The mutation is neutral (passenger mutation, drift regime). A neutral mutation increases the genome’s description length without affecting the organism’s compression of the environment. There is no compression analogue of “an irrelevant bit that doesn’t change the program’s output but gets copied anyway.” The functor G is only defined on the subcategory of fitness-affecting mutations (compression-progress-evolution).

(6d) Clonal sweep commutes with discovery:

ClonalSweepAsDiscovery × CompressedRepresentation_old ──sweeps──→ CompressedRepresentation_new
       │                                                                          │
       │ G │                                                                   G │
       │     │                                                                     │
       ▼     ▼                                                                     ▼
   Discovery_event ────────────────────────→ CompressionProgress

A clonal sweep is a discrete compression breakthrough at the population level — the old genome (suboptimal compression) is replaced by the new genome (better compression). The sweep replaces the old compressed representation with a new one in the population.

Commutes when: The fitness advantage of the new compression is large enough that selection overcomes drift (s >> 1/N, the Bozic-Nowak deterministic sweep regime). For weakly beneficial mutations (s ~ 1/N), the sweep may not complete, and the discovery (the new compression) does not replace the old representation — the population may contain multiple competing compressions (intratumor heterogeneity as “failed discovery”).
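The s ≫ 1/N condition can be illustrated with the textbook fixation probability of a single mutant of relative fitness r = 1 + s in a Moran process of size N, P = (1 - 1/r) / (1 - 1/r^N), whose neutral limit (s = 0) is 1/N. This swaps in the Moran model as a simple stand-in; it is not the Bozic-Nowak model of tumour evolution.

```python
def moran_fixation(s: float, n: int) -> float:
    """Probability that one mutant with advantage s takes over a population of n."""
    if s == 0:
        return 1.0 / n  # neutral limit: pure drift
    r = 1.0 + s
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))

N = 1000
for s in (0.0, 1.0 / N, 0.1):
    p = moran_fixation(s, N)
    print(f"s = {s:<6} P(fix) = {p:.5f}  ({p * N:.1f} x neutral)")
```

With N = 1000, s = 1/N gives a fixation probability within a factor of two of neutral drift, while s = 0.1 is roughly 90 times more likely to fix than a neutral mutant; the first regime is where a "discovery" can fail to sweep.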

(6e) Where the isomorphism breaks — decompression has no compression analogue:

CancerAsDecompression × CompressedRepresentation ──decompresses──→ DataSequence

This arrow does NOT commute with any compression arrow. Decompression is the inverse of compression — it takes a compressed program and produces (or reverts to) raw, unstructured data. The inverse of compression is not uniquely defined: many different decompression paths (different sequences of genomic alterations, different orders of regulatory module failure) can produce the same decompressed state. There is no arrow f: CompressedRepresentation → DataSequence in the compression framework because the compression framework is about reducing description length, not increasing it.

Significance of the failure: This is the most important limitation of the compression-evolution isomorphism. In evolution, decompression (cancer) is a natural and inevitable process — genomes that have been optimized for one environment may fail (decompress) when the environment changes. In Schmidhuber’s framework, decompression does not occur — the compressor either compresses better, compresses the same, or is inviable. Evolution, unlike Schmidhuber’s artificial agents, can lose compression. The olog captures this asymmetry: the evolutionary domain has decompression as a basic operation; the compression domain does not. This is a genuine disanalogy that no functor can bridge.

Condition 7: Self-Referential Limit

The genome is both compressor and compressed data.

In Schmidhuber’s framework, the compressor and the data are distinct objects: compresses: DataSequence × Compressor → CompressedRepresentation. The compressor acts on data from outside.

In evolution, the genome is the compressor (it encodes a survival strategy) AND the data being compressed (the genome sequence itself is what selection evaluates). This creates a self-referential loop:

GenomeAsCompressor ──acts as──→ DataSequence (the genome IS the data)
      ↑                                   │
      │ (the genome encodes)              │ (selection evaluates the sequence)
      │                                   │
      └───────────────────────────────────┘

Formal statement: There exists a mapping E: GenomeAsCompressor → DataSequence that identifies the genome as a data sequence. Strictly, E is an arrow between two objects of the same olog; the olog treats the self-reference it induces as an endofunctor (a functor from the evolutionary subcategory to itself) rather than as a functor between distinct categories.

Consequences for commutativity (self-referential correction to Condition 6):

All commutativity conditions in Condition 6 that involve GenomeAsCompressor must be read with the self-referential correction: the diagram does NOT fully commute because the genome, when treated as a data sequence, must be compressed by itself. This is analogous to a universal Turing machine taking itself as input — the operation is well-defined but the results are not guaranteed to be identical to those of a non-self-referential compressor.
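A toy analogy (not a model of genomes) shows why a self-referential compressor need not agree with an external one. zlib's preset-dictionary feature (`zdict`) lets the compressor carry bytes inside itself. When those bytes are the data being compressed, the measured length collapses and no longer reflects the data's regularity. The random "genome" and the helper name `deflate` are hypothetical illustrations.

```python
import random
import zlib
from typing import Optional


def deflate(data: bytes, preset: Optional[bytes] = None) -> bytes:
    """zlib-compress `data`, optionally priming the compressor with a preset
    dictionary so that the compressor itself 'contains' those bytes."""
    if preset is None:
        compressor = zlib.compressobj(level=9)
    else:
        compressor = zlib.compressobj(level=9, zdict=preset)
    return compressor.compress(data) + compressor.flush()


# Hypothetical stand-in for a genome: 4000 random bases.
rng = random.Random(7)
GENOME = "".join(rng.choice("ACGT") for _ in range(4000)).encode("ascii")

external = deflate(GENOME)                 # compressor distinct from the data
self_ref = deflate(GENOME, preset=GENOME)  # compressor built from the data
print(len(external), len(self_ref))
```

The external compressor needs roughly two bits per base, while the self-primed one encodes little more than "repeat what I already contain". Both are well-defined operations, but they yield very different lengths for the same sequence.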

Where this matters concretely:

  • The mutation arrow perturbs: MutationAsCompressorPerturbation × GenomeAsCompressor → GenomeAsCompressor changes both the compressor AND the data simultaneously. In Schmidhuber’s framework, the compressor is modified independently of the data. In evolution, modifying the genome changes both what encodes and what is encoded — a perturbation to the compression algorithm IS a perturbation to the data being compressed.
  • The evaluation arrow evaluates: SelectionAsCompressionEvaluation × CompressedRepresentation → FitnessAsCompressionQuality takes the compressed representation (the phenotype) as input, not the genome-as-compressor. That phenotype is produced by the genome-as-compressor (the program) acting on the environment, so selection evaluates the COMPRESSOR’S OUTPUT, not the COMPRESSOR itself. In Schmidhuber’s framework, compression quality is a property of the compressed representation alone. In evolution, the compressed representation (phenotype) is evaluated, but the compressor (genotype) is what evolves. This disconnect is the deepest formal challenge for the isomorphism.
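The genotype/phenotype disconnect can be sketched with a deliberately tiny model. The genome is a bit list, only its first `CODING_LENGTH` bits are expressed, the phenotype is the expressed pattern's predictions of a periodic environment, and selection scores only the phenotype. Everything here (names, the coding/non-coding split, the periodic environment) is a hypothetical illustration. It shows that a mutation in the unexpressed region changes the genome without any change that selection can see, which is also the neutral-mutation case where the mutation diagram fails to commute.

```python
from typing import List

CODING_LENGTH = 8  # hypothetical: only these genome positions are expressed


def phenotype(genome: List[int], environment: List[int]) -> List[int]:
    """Genome-as-compressor acting on the environment: the coding region is
    used as a repeating pattern to predict each environmental symbol."""
    coding = genome[:CODING_LENGTH]
    return [coding[i % CODING_LENGTH] for i in range(len(environment))]


def fitness(pheno: List[int], environment: List[int]) -> float:
    """Selection sees only the phenotype: fraction of correct predictions."""
    hits = sum(p == e for p, e in zip(pheno, environment))
    return hits / len(environment)


def mutate(genome: List[int], position: int) -> List[int]:
    """Flip one bit, returning a new genome (the original is unchanged)."""
    mutant = list(genome)
    mutant[position] ^= 1
    return mutant


env = [0, 1, 1, 0, 1, 0, 0, 1] * 4
genome = [0, 1, 1, 0, 1, 0, 0, 1] + [0] * 8  # 8 coding + 8 non-coding bits
for label, g in [("wild type", genome),
                 ("non-coding mutant", mutate(genome, 12)),
                 ("coding mutant", mutate(genome, 3))]:
    print(label, fitness(phenotype(g, env), env))
```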

Basis: Identified in the category-theoretic validation section of compression-progress-evolution. Not addressed by Schmidhuber (2009) or by Buehler et al. (2011). This is the wiki’s original analysis.

Condition 8: Bottleneck as Forced Decompression

A population bottleneck (therapy-induced collapse) forces a reset of the compressor — the existing compressed representation is destroyed, and the surviving clones must re-compress from scratch.

Formal statement: Let B: BottleneckEvent × CompressedRepresentation → CompressedRepresentation’ be a function that maps a compressed representation (a well-adapted genome/phenotype) to a degraded representation (a genome in a post-bottleneck environment). The bottleneck event satisfies:

  • CompressionQuality(B(compressed)) < CompressionQuality(compressed) — the bottleneck reduces compression quality
  • CompressionProgress(B(compressed)) may be negative (active decompression) or zero (stasis at reduced quality)
  • The severity of decompression determines the subsequent exploration regime:
    • Shallow bottleneck: CompressionQuality loss < threshold τ. The agent can repair existing compression (return to roughly the same compressed representation) through small adjustments.
    • Deep bottleneck: CompressionQuality loss >> τ. The previous compression is largely destroyed. The agent must re-explore — generating diverse new compressions (diversified relapse, branching architecture).

Mapping to evolutionary dynamics (from population-bottleneck):

In the Myeloma XI trial (Miething, 2019), patients achieving CR/VGPR (complete response/very good partial response) showed branched clonal architecture at relapse, consistent with the deep bottleneck forcing re-diversification. Incomplete responders maintained linear or stable patterns, consistent with the shallow bottleneck preserving pre-existing clonal structure. This is consistent with:

DeepBottleneck → RenewedExploration → DiverseNewCompressions → BranchingRelapse
ShallowBottleneck → RepairExistingCompression → PreservedArchitecture → LinearRelapse
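The two chains can be sketched as a classifier over the bottleneck map B. This is a minimal sketch under stated assumptions. CompressionQuality is taken on any scale where larger is better (for example, negative code length). `tau` is a hypothetical parameter, since the threshold's empirical value is unknown. The text's gap between "below τ" and "well above τ" is collapsed into a single cut at τ. The names `classify_bottleneck` and `BottleneckOutcome` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BottleneckOutcome:
    quality_loss: float
    regime: str             # "shallow" (repair) or "deep" (re-explore)
    predicted_relapse: str  # "LinearRelapse" or "BranchingRelapse"


def classify_bottleneck(quality_before: float, quality_after: float,
                        tau: float) -> BottleneckOutcome:
    """Assign a bottleneck to the shallow or deep regime of Condition 8.
    `tau` is a hypothetical threshold; no empirical value is known."""
    if quality_after >= quality_before:
        raise ValueError("a bottleneck must reduce CompressionQuality")
    loss = quality_before - quality_after
    if loss < tau:
        # Small loss: repair the existing compression, keep the architecture.
        return BottleneckOutcome(loss, "shallow", "LinearRelapse")
    # Large loss: previous compression destroyed, diverse re-exploration.
    return BottleneckOutcome(loss, "deep", "BranchingRelapse")


print(classify_bottleneck(-100.0, -110.0, tau=50.0))
print(classify_bottleneck(-100.0, -400.0, tau=50.0))
```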

Basis: Synthesized from the bottleneck paradox analysis in compression-progress-evolution and population-bottleneck. The compression framework provides a formal language for describing the clinically observed pattern: deeper responses (more complete decompression) generate more diverse relapses (more exploration of compression space).

Limitation: The threshold τ between shallow and deep decompression is unknown and likely context-dependent (tissue type, therapy mechanism, pre-treatment genomic architecture). The olog formalizes the structure of the relationship but provides no empirical values for thresholds.

4. Hierarchical Subcategory H (Forest Structure)

The hierarchical subcategory H organizes all objects into a forest with five levels. Each object belongs to exactly one level; arrows between levels represent hierarchical interactions.

Level 1 (Data):        DataSequence ──contains──→ Regularity
                                                  Noise
                       KolmogorovComplexity ──measures──→ DataSequence (theoretical limit)

Level 2 (Compression): Compressor ──produces──→ CompressedRepresentation
                       CompressedRepresentation ──has──→ CompressionQuality
                       CompressionQuality ──changes──→ CompressionProgress

Level 3 (Cognitive):   CuriousAgent ──possesses──→ Compressor
                                          ├──→ IntrinsicReward
                                          ├──→ LearningCurve
                                          └──→ SubjectiveBeauty (for representations)
                       SubjectiveBeauty ──derivative──→ SubjectiveInterestingness

Level 4 (Behavioral):  CuriousAgent ──engages in──→ Exploration
                                          └──→ Exploitation
                       CuriousAgent × DataSequence ──yields──→ Discovery

Level 5 (Evolutionary): GenomeAsCompressor ──encodes environment──→ CompressedRepresentation (phenotype)
                         GenomeAsCompressor ←──self-reference──→ DataSequence (the genome is also the data)
                         MutationAsCompressorPerturbation ──modifies──→ GenomeAsCompressor
                         SelectionAsCompressionEvaluation ──evaluates──→ CompressedRepresentation → FitnessAsCompressionQuality
                        ClonalSweepAsDiscovery ──replaces──→ CompressedRepresentation_old → CompressedRepresentation_new
                        CancerAsDecompression ──degrades──→ CompressedRepresentation → DataSequence
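The Level 2 chain CompressionQuality ──changes──→ CompressionProgress can be made concrete with a small adaptive compressor. This sketch assumes a Laplace-smoothed symbol-frequency model as the compressor (our choice; the framework admits any adaptive compressor) and measures quality as code length in bits. At each step it reports the bits that the updated model saves on the history seen so far relative to the previous model. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def code_length_bits(history: str, counts: Counter, alphabet_size: int) -> float:
    """Ideal code length (bits) of `history` under a Laplace-smoothed
    frequency model built from `counts`; lower means better compression."""
    total = sum(counts.values())
    return sum(-math.log2((counts[s] + 1) / (total + alphabet_size))
               for s in history)


def compression_progress(history: str, alphabet_size: int) -> List[float]:
    """Per-step progress: bits saved on h(<=t) by the updated model p_t
    compared with the previous model p_(t-1)."""
    counts: Counter = Counter()
    progress = []
    for t, symbol in enumerate(history, start=1):
        seen = history[:t]
        before = code_length_bits(seen, counts, alphabet_size)
        counts[symbol] += 1  # learning step: update the compressor
        after = code_length_bits(seen, counts, alphabet_size)
        progress.append(before - after)
    return progress


r = compression_progress("a" * 50, alphabet_size=2)
print(round(r[0], 3), round(r[-1], 3))
```

On a perfectly predictable history, progress is positive but shrinks toward zero as the regularity is learned, the "boring predictability" end of the interestingness curve.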

Hierarchical interaction condition: A property of a higher-level structure can depend on an element of a lower-level structure (following Buehler et al., 2011, Section 3, Figure 2). For example:

  • The CompressionQuality of a CompressedRepresentation (Level 2) depends on the properties of the DataSequence (Level 1) — specifically, on the amount of Regularity present.
  • The SubjectiveBeauty (Level 3) depends on CompressedRepresentation (Level 2) and CuriousAgent (Level 3).
  • The Discovery (Level 4) depends on CompressionProgress (Level 2) exceeding a threshold relative to baseline learning rate (Level 3 via LearningCurve).
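The Discovery criterion used in the diagram, CompressionProgress > μ + kσ, can be sketched as a sliding-window test. Here μ and σ are the mean and sample standard deviation of progress over a hypothetical baseline window that stands in for the agent's LearningCurve. The window length and k = 3 are assumptions for illustration, not values from the source.

```python
from statistics import mean, stdev
from typing import List, Sequence


def detect_discoveries(progress: Sequence[float], baseline_window: int,
                       k: float = 3.0) -> List[int]:
    """Return time steps whose compression progress exceeds mu + k*sigma of
    the preceding `baseline_window` steps (hypothetical baseline)."""
    if baseline_window < 2:
        raise ValueError("need at least two baseline points for a deviation")
    flagged = []
    for t in range(baseline_window, len(progress)):
        window = progress[t - baseline_window:t]
        mu, sigma = mean(window), stdev(window)
        if progress[t] > mu + k * sigma:
            flagged.append(t)
    return flagged


series = [0.1, 0.12, 0.09, 0.11, 0.1, 0.1, 0.9, 0.1]
print(detect_discoveries(series, baseline_window=5))
```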

These cross-level dependencies are allowed but must be explicitly annotated when they descend more than one level (e.g., Level 4 depending on Level 1 must pass through Level 2 or Level 3 first, preserving the forest hierarchy).

5. Mermaid Diagram

flowchart TB
    subgraph L1["Level 1: Data/Information"]
        DS[DataSequence]
        R[Regularity]
        N[Noise]
        KC[KolmogorovComplexity]
    end

    subgraph L2["Level 2: Compressor"]
        C[Compressor]
        CR[CompressedRepresentation]
        CQ[CompressionQuality]
        CP[CompressionProgress]
    end

    subgraph L3["Level 3: Cognitive/Agent"]
        CA[CuriousAgent]
        IR[IntrinsicReward]
        SB[SubjectiveBeauty]
        SI[SubjectiveInterestingness]
    end

    subgraph L4["Level 4: Action/Exploration"]
        E[Exploration]
        ET[Exploitation]
        D[Discovery]
    end

    subgraph L5["Level 5: Evolutionary Mapping"]
        GC[GenomeAsCompressor]
        F[FitnessAsCompressionQuality]
        M[MutationAsCompressorPerturbation]
        S[SelectionAsCompressionEvaluation]
        CS[ClonalSweepAsDiscovery]
        CD[CancerAsDecompression]
    end

    %% Level 1 → Level 2
    DS -->|contains| R
    DS -->|contains| N
    KC -.->|measures theoretical limit| DS
    C -->|compresses DS×C| CR
    CR -->|hasQuality| CQ
    CQ -->|improvesOn| CP

    %% Level 1 + 2 → Level 3
    CR -->|hasBeauty, with CA| SB
    DS -->|hasInterestingness, with CA| SI
    CP -->|generatesReward| IR
    SB -.->|d/dt| SI
    CA -->|possesses| C
    CA -.->|evaluates beauty for| SB

    %% Level 3 → Level 4
    SI -->|positive? →| E
    SI -->|zero? →| ET
    D -->|CompressionProgress > μ + kσ| CP
    IR -->|drives| E

    %% Level 5 internal (evolutionary)
    GC -->|encodes environment| CR
    S -->|evaluates CR| F
    M -->|perturbs| GC
    CS -->|sweeps, replaces old CR| CR
    CD -->|decompresses| CR

    %% Cross-level arrows (evolutionary ↔ compression)
    GC -.->|self-referential: IS the data| DS
    CD -.->|maps to data, not compression| DS

    %% Style
    classDef data fill:#e1f5fe,stroke:#0288d1
    classDef compression fill:#e8f5e9,stroke:#388e3c
    classDef cognitive fill:#fff3e0,stroke:#f57c00
    classDef behavioral fill:#fce4ec,stroke:#d32f2f
    classDef evolutionary fill:#f3e5f5,stroke:#7b1fa2
    class DS,R,N,KC data
    class C,CR,CQ,CP compression
    class CA,IR,SB,SI cognitive
    class E,ET,D behavioral
    class GC,F,M,S,CS,CD evolutionary

Caption: Hierarchical olog for the compression progress domain, organized by level. Solid arrows represent functional relationships (olog arrows) within the domain; dotted arrows represent cross-level hierarchical interactions and the self-referential constraint. The evolutionary mapping level (Level 5) is connected to Level 1 via the self-referential constraint (the genome is both compressor and data) and to Level 2 via the compression arrows (genome compresses environment → compressed representation → quality → progress). Synthesis of Schmidhuber (2009), Buehler et al. (2011), and the compression-evolution isomorphism (compression-progress-evolution).

6. Functorial Mapping Notes

6.1 Target: Olog_Cancer

When the cancer olog is constructed, the functor F: Olog_Compression → Olog_Cancer should map:

Compression Object          Cancer Object                  Status
DataSequence                TumorMicroenvironmentSignal    expected to commute
Compressor                  GenomeAsRegulatoryProgram      expected to commute
CompressedRepresentation    NormalCellPhenotype            expected to commute
CompressionQuality          CellFitness                    expected to commute
CompressionProgress         FitnessGradient                expected to commute
Discovery                   ClonalSweep                    expected to commute
CancerAsDecompression       MalignantTransformation        identity (same concept)
BoringPredictability        MonoclonalDominance            candidate
BoringRandomness            HighIntratumorHeterogeneity    candidate
Exploration                 AdaptiveTherapyExploration     candidate
Exploitation                CytotoxicTherapy               candidate

6.2 Target: Olog_Ecology

When the ecology olog is constructed, the functor G: Olog_Compression → Olog_Ecology (not to be confused with the evolutionary functor G of Condition 6) should map:

Compression Object          Ecology Object                Status
DataSequence                EnvironmentalNiche            candidate
Compressor                  PopulationGenePool            candidate
CompressedRepresentation    EcosystemStructure            candidate
CompressionQuality          EcosystemFitness/Stability    candidate
Discovery                   EvolutionaryInnovation        candidate
Exploration                 RangeExpansion                candidate

6.3 Verification Criterion

Each functor must satisfy the commutativity condition: for every commutative diagram in the source olog (Compression), the image diagram in the target olog must also commute. When a diagram fails to commute, the failure is informative — it identifies where the analogy breaks and what additional structure is needed in the target domain to restore consistency.

The three functors (Compression → Cancer, Compression → Ecology, and the secondary functor Cancer → Ecology) will be constructed and their commutativity verified in a subsequent analysis step. The present olog is the source category for these functorial mappings.
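The verification criterion can be checked mechanically on a concrete instance under one common reading of ologs, in which an instance assigns a finite set to each object and a function to each arrow. The sketch below maps each source arrow to a target arrow and tests whether the two image paths of every source square agree as functions. It reports the failures, which are the informative cases. Agreement on one instance is evidence, not a proof, that the image diagram commutes. The dict encoding and all arrow names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical encoding: an olog instance maps each arrow name to a finite
# function, stored as a dict from input elements to output elements.
Instance = Dict[str, Dict[Hashable, Hashable]]
Path = Sequence[str]          # arrow names, applied left to right
Square = Tuple[Path, Path]    # two paths the source olog declares equal


def compose(instance: Instance, path: Path) -> Dict[Hashable, Hashable]:
    """Compose the finite functions along `path` (first arrow applied first)."""
    result = dict(instance[path[0]])
    for name in path[1:]:
        step = instance[name]
        result = {x: step[y] for x, y in result.items()}
    return result


def failed_images(squares: List[Square], arrow_map: Dict[str, str],
                  target: Instance) -> List[Square]:
    """Return the source squares whose images under `arrow_map` fail to
    commute in the target instance; each failure marks where the analogy breaks."""
    failures = []
    for path_a, path_b in squares:
        image_a = [arrow_map[name] for name in path_a]
        image_b = [arrow_map[name] for name in path_b]
        if compose(target, image_a) != compose(target, image_b):
            failures.append((path_a, path_b))
    return failures


# Toy usage with hypothetical arrow names.
target = {
    "f": {1: "a", 2: "b"},
    "g": {"a": "x", "b": "y"},
    "h": {1: "x", 2: "y"},
    "k": {1: "x", 2: "x"},
}
squares = [(["p", "q"], ["r"]), (["p", "q"], ["s"])]
arrow_map = {"p": "f", "q": "g", "r": "h", "s": "k"}
print(failed_images(squares, arrow_map, target))
```

Here the first square survives the mapping (f then g equals h), while the second fails because k collapses both elements to "x". That is the kind of break that signals missing structure in the target domain.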

7. Limitations Specific to the Olog

  1. Uncomputability of Kolmogorov complexity. The theoretical foundation of the compression framework (Kolmogorov complexity) is uncomputable. The olog includes KolmogorovComplexity as an object, but the associated arrows (measuring, comparing) are not computable functions. Practical approximations (neural network compressors, prediction-error-based reward) have no formal guarantee of approximating the ideal Kolmogorov-based measures. This is a foundational limitation inherited from the source theory (schmidhuber2009-compression-progress).

  2. The olog is descriptive, not predictive. Like Buehler et al.’s (2011) silk-music analogy, this olog captures the structural relationships within the compression framework but makes no novel predictions. The commutativity conditions are identities derived from definitions, not empirical claims subject to falsification. The framework’s unfalsifiability risk (identified in schmidhuber2009-compression-progress) carries over to the olog.

  3. No temporal dynamics in the base formalism. Buehler’s ologs are static: they capture structure at a point in time. Compression progress, interestingness, and learning are inherently temporal. The olog handles dynamics through derivative relationships (Condition 2) that are not native to the category-theoretic formalism. A more complete treatment would require a time-indexed category (a functor category from a time category to the olog), which is not developed here (buehler2011-reoccurring-patterns).

  4. Evolution cannot ignore noise. In Schmidhuber’s framework, the curious agent can avoid incompressible data and seek learnable regularities. Evolution cannot: if the environment contains stochastic elements that affect survival, the lineage must encode responses to them, compressible or not. This means the evolutionary olog must include an arrow that compresses noise (which is impossible in the pure compression framework). The olog captures this as a disanalogy in Condition 6e, but does not resolve it (compression-progress-evolution).

  5. Self-referential constraint limits functoriality. The endofunctor status of the genome (Condition 7) means that the functor G is not a standard functor between distinct categories but a more complex structure involving self-maps. Buehler et al.’s (2011) framework does not address endofunctors or self-referential categories. The functorial mapping between the compression olog and the evolutionary sub-olog is therefore approximate and requires relaxation of the strict commutativity condition.

  6. Deterministic assumption. Category theory in its standard form is silent on probability. Evolution is dominated by stochastic drift, yet the olog’s arrows are deterministic functions. A probabilistic or Markov-categorical extension would be needed for fully accurate evolutionary modeling. The olog in its current form captures the deterministic structure of strong selection (s >> 1/N) but not the drift regime (s ~ 1/N) (buehler2011-reoccurring-patterns).
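The gap described in limitation 1 can be seen with the most common computable proxy, compressed length under a general-purpose compressor. This sketch uses zlib (our choice of proxy). The compressed length gives an upper bound on Kolmogorov complexity up to an additive constant for the decompressor, but nothing guarantees the bound is tight. The pseudorandom bytes below come from a short program (a seeded generator), so their Kolmogorov complexity is small, yet zlib cannot compress them at all.

```python
import random
import zlib


def zlib_length(data: bytes) -> int:
    """Computable proxy for description length: size in bytes after zlib at
    maximum compression (an upper bound, not an estimate with guarantees)."""
    return len(zlib.compress(data, 9))


# Highly regular data: the proxy finds the pattern.
REGULAR = b"01" * 5000

# Pseudorandom bytes: generated by a short seeded program, so their
# Kolmogorov complexity is low, but zlib sees no exploitable regularity.
rng = random.Random(42)
PSEUDO_RANDOM = bytes(rng.getrandbits(8) for _ in range(10000))

print(zlib_length(REGULAR), zlib_length(PSEUDO_RANDOM))
```

Any compression-progress signal computed from such a proxy inherits this blind spot: regularities the proxy cannot represent yield no measured progress, however simple they are in principle.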