Subclonal Architecture

Definition

Subclonal architecture is the composition, phylogenetic structure, and relative abundance of genetically distinct subclonal populations within a tumor. It describes which mutations coexist in which clones, the ancestral relationships between clones, and the fraction of tumor cells each clone occupies.

Components

A full description of subclonal architecture comprises:

  • Clone genotypes: the set of somatic mutations (SNVs, indels, CNAs, structural variants) that define each clone
  • Clone frequencies: the cancer-cell-fraction (CCF) of each clone — the proportion of cancer cells carrying that clone’s defining mutations
  • Phylogenetic relationships: the ancestral hierarchy of clones, represented as a phylogenetic-tree
  • Spatial distribution: the geographic arrangement of clones across the tumor (requires multi-region-sequencing)
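
CCF is rarely observed directly; it is usually estimated from the variant allele fraction once purity and local copy number are known. The following is a minimal sketch of that standard conversion, assuming a diploid normal genome and a known mutation multiplicity. The function and parameter names are illustrative and do not come from any particular tool.

```python
def ccf_from_vaf(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Estimate cancer cell fraction (CCF) from an observed VAF.

    Assumes the mutation sits on `multiplicity` copies in every cancer cell
    that carries it, and that contaminating normal cells have `normal_cn`
    copies of the locus (2 for autosomes).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * tumor_cn + (1 - purity) * normal_cn
    # Rescale the mutant read fraction to a fraction of cancer cells.
    return vaf * avg_copies / (purity * multiplicity)


# A heterozygous clonal SNV in a diploid region at 50% purity has VAF 0.25.
clonal = ccf_from_vaf(0.25, purity=0.5, tumor_cn=2)      # 1.0
subclonal = ccf_from_vaf(0.10, purity=0.5, tumor_cn=2)   # 0.4
```

Because read counts are noisy, raw estimates can exceed 1; clustering methods typically absorb this by estimating CCF per cluster rather than per mutation.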

Relationship to Molecular Clock

Subclonal architecture is the structural description from which molecular-clock timing inferences are derived. The clock does not measure time directly — it uses the passenger mutation burden on phylogenetic branches, and the branch structure itself is defined by the subclonal architecture. Three architectural features provide the raw material for clock inference:

  • Passenger counts per branch: The number of passenger mutations unique to each subclonal lineage provides the clock signal. Clonal (truncal) mutations with CCF = 1.0 accumulated before the tumor’s most recent common ancestor; subclonal mutations with CCF < 1.0 accumulated after branching. The deeper a branch in the phylogeny, the older the event (Gerstung et al., 2020).

  • VAF distributions as temporal stratification: The variant-allele-fraction distribution of mutations stratifies them by evolutionary time. Once purity and local copy number are accounted for, high-VAF mutations are early (clonal) and low-VAF mutations are late (subclonal). This VAF-to-time mapping underpins clock-based inference and was used by Nik-Zainal et al. (2012) to demonstrate that different mutational processes (e.g., APOBEC) are active at different phases of clonal evolution.

  • CNA timing via mutation copy-number states: Mutations that occur before a chromosomal gain, on the allele that is later duplicated, are carried on both resulting copies; mutations that occur after the gain are carried on only one copy. The ratio of two-copy to one-copy mutations on a gained segment therefore dates the gain relative to the accumulation of point mutations, anchoring the clock to large-scale genomic events. Gerstung et al. (2020) used this method to show that chromosomal gains occur predominantly early and that whole-genome-duplication precedes extensive subclonal diversification.
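
The two-copy versus one-copy logic in the last item can be turned into a rough point estimate. The sketch below covers only the simplest case, a single-copy gain producing a 2+1 state, and assumes a constant mutation rate per copy and clonal mutations only. It illustrates the arithmetic and is not the procedure of any published method, which would typically use a fuller probabilistic model; the function name is hypothetical.

```python
def time_single_gain(n_two_copy, n_one_copy):
    """Relative time (0 = early, 1 = late) of a single-copy gain (2+1 state).

    With t the fraction of mutational time elapsed before the gain:
      two-copy mutations  ~ t                (pre-gain, on the duplicated allele)
      one-copy mutations  ~ t + 3 * (1 - t)  (pre-gain on the other allele,
                                              plus post-gain on all 3 copies)
    Solving for t gives t = 3 * n_two / (n_one + 2 * n_two).
    """
    denom = n_one_copy + 2 * n_two_copy
    if denom == 0:
        raise ValueError("no mutations on the gained segment")
    t = 3 * n_two_copy / denom
    # Sampling noise can push the raw estimate above 1; clip it.
    return min(t, 1.0)


early = time_single_gain(n_two_copy=0, n_one_copy=50)    # 0.0, gain at the start
mid = time_single_gain(n_two_copy=30, n_one_copy=150)    # 3/7, about 0.43
late = time_single_gain(n_two_copy=40, n_one_copy=40)    # 1.0, gain at the end
```

The estimate is in units of mutational time, not calendar time, so it inherits any change in mutation rate over the tumor's history.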

Subclonal architecture is the clock’s necessary substrate — without a correctly reconstructed phylogeny, clone frequencies, and multiplicity assignments, clock inferences are ungrounded. Conversely, the clock transforms the static architecture into a dynamic timeline, revealing when each architectural element (clone, driver mutation, copy-number alteration) arose.

Inference

Subclonal architecture is inferred through subclonal-reconstruction — a multi-step computational process that involves copy number reconstruction, SNV-to-clone assignment, multiplicity determination (how many allelic copies carry each mutation), and phylogenetic tree construction (Tarabichi et al., 2021).

The inference process is error-prone. Errors in each step propagate through the analysis. As Tarabichi et al. (2021) note, in a sample of 50% purity, the difference in variant-allele-fraction of an SNV present on one of three copies versus one of four copies is only ~3% — a level of accuracy rarely achieved with moderate-depth (~100x) sequencing. Consequently, “errors in the allelic copy number inference…propagate to produce an erroneous clone phylogenetic tree and give a misleading picture of the clonal structure of a tumour” (Turajlic et al., 2019, p. 408).
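
The ~3% figure follows directly from the expected VAF of a clonal SNV in a mixed sample. The sketch below reproduces it and compares the gap with the read-sampling noise at 100x, assuming a diploid normal genome and simple binomial sampling of reads. The function name is illustrative.

```python
def expected_vaf(purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Expected VAF of a clonal SNV on `multiplicity` of `tumor_cn` copies."""
    mutant_copies = purity * multiplicity
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return mutant_copies / total_copies


vaf_1_of_3 = expected_vaf(purity=0.5, tumor_cn=3)   # 0.5 / 2.5 = 0.200
vaf_1_of_4 = expected_vaf(purity=0.5, tumor_cn=4)   # 0.5 / 3.0, about 0.167
gap = vaf_1_of_3 - vaf_1_of_4                       # about 0.033

# Binomial standard deviation of an observed VAF at 100 reads (assumption:
# reads are independent draws; real data are usually noisier than this).
depth = 100
sd = (vaf_1_of_3 * (1 - vaf_1_of_3) / depth) ** 0.5  # 0.04
```

The gap between the two copy-number states (about 0.033) is smaller than one standard deviation of the sampling noise for a single SNV (0.04), which is why the two states are hard to tell apart at this depth.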

The crossing-rule. When multiple tumor regions are sampled, the crossing-rule provides a decisive test for branching versus linear phylogenies (Tarabichi et al., 2021). If clone A has higher CCF than clone B in one region but lower in another, A and B cannot be ancestor-descendant; they must be sibling clones arising from a shared ancestor. This is because an ancestor clone must be present in at least as many cells as its descendant in every sampled region (the descendant inherits all the ancestor’s mutations). A crossing of CCF ranks between regions violates this constraint and reveals branching. The rule complements the pigeonhole principle, which applies within a single sample: two clusters whose CCFs sum to more than 1 must share cells and are therefore nested rather than siblings.

Consider two regions (R1, R2) and two mutation clusters (A, B):

Cluster    CCF in R1    CCF in R2
A          0.80         0.30
B          0.15         0.65

In R1, A > B (0.80 > 0.15), suggesting A could be ancestral to B. In R2, B > A (0.65 > 0.30), suggesting B could be ancestral to A. The reversal is a crossing: neither clone can be the ancestor of the other. They must be sibling branches descended from a common (undetected or clonal) ancestor, with different spatial distributions across the two regions. Consistent with this, their CCFs sum to no more than 1 in either region (0.95 in both), as required for sibling clones that occupy disjoint sets of cells. This principle is the foundation of multi-sample phylogenetic reconstruction and is implemented in tools such as PhyloWGS (Tarabichi et al., 2021).
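
The pairwise test can be sketched in a few lines. The code below is an illustration under stated assumptions, not the algorithm of PhyloWGS or any other tool: it checks for a rank reversal across regions and, as a sanity check, whether the two CCFs sum to more than 1 in any region (which would be incompatible with sibling clones). The CCF values, the tolerance, and all names are hypothetical and chosen for the sketch.

```python
def relate(ccf_a, ccf_b, tol=0.05):
    """Classify two clusters from their per-region CCFs.

    ccf_a, ccf_b: CCFs of clusters A and B, one value per region, same order.
    tol: slack for estimation noise (an illustrative choice).
    """
    diffs = [a - b for a, b in zip(ccf_a, ccf_b)]
    # Crossing: A clearly above B somewhere and clearly below B elsewhere.
    crossing = any(d > tol for d in diffs) and any(d < -tol for d in diffs)
    # Overlap: CCFs sum past 1, so some cells must carry both clusters.
    overlap = any(a + b > 1 + tol for a, b in zip(ccf_a, ccf_b))
    if crossing and overlap:
        return "inconsistent"   # no tree satisfies both constraints
    if crossing:
        return "siblings"       # branching: neither is the other's ancestor
    if overlap:
        return "nested"         # one must be the ancestor of the other
    return "undetermined"       # both topologies remain possible


# Illustrative CCFs for two regions (R1, R2).
print(relate([0.80, 0.30], [0.15, 0.65]))   # siblings
print(relate([0.90, 0.80], [0.50, 0.40]))   # nested
```

An "inconsistent" result usually points to a clustering or copy-number error upstream rather than to real biology.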

Detection limits and systematic incompleteness. Two independent lines of analysis converge on the same blind zone in subclonal architecture inference. From the sequencing side, Tarabichi et al. (2021) establish that mutations with CCF below ~0.05–0.10 are undetectable at standard (~100x) sequencing depths. From the evolutionary side, Turajlic et al. (2019) calculate that ~7 population doublings are undetectable at 100x: each doubling halves the frequency of newly arising mutations, and after ~7 doublings they fall below the detection threshold. The two derivations describe the same limit from different starting points, sequencing sensitivity and population dynamics, respectively. The implication is that subclonal architecture as described from bulk sequencing is systematically incomplete: the most recent ~7 doublings of the tumor’s evolutionary history are invisible. The terminal branches of the inferred phylogeny are truncated, and the subclones we can describe are those that have expanded substantially since their origin, not the full set of genetically distinct populations present in the tumor.

Evolutionary Patterns

Different subclonal architectures reflect different evolutionary histories:

  • Punctuated: low subclonal diversity, early clonal aneuploidy, rapid sweeps (Turajlic et al., 2019)
  • Gradual/Darwinian: high subclonal diversity, ongoing diversification, multiple coexisting clones (Turajlic et al., 2019)
  • Neutral: subclonal diversity consistent with mutation and drift, no evidence of ongoing selection (Williams et al., 2016, cited in Turajlic et al., 2019)

```mermaid
flowchart TD
    subgraph Punctuated["Punctuated (burst early, then stasis)"]
        direction LR
        P0(Founder) --> P1(Clone A<br/>many drivers<br/>in one burst)
        P0 --> P2(Extinct)
        P1 --> P3(Low diversity<br/>all cells carry<br/>same clonal drivers)
    end
    subgraph Gradual["Gradual / Darwinian (ongoing selection)"]
        direction LR
        G0(Founder) --> G1(Clone A)
        G0 --> G2(Clone B)
        G1 --> G3(Clone A1)
        G1 --> G4(Clone A2)
        G2 --> G5(Clone B1)
    end
    subgraph Neutral["Neutral (drift only)"]
        direction LR
        N0(Founder) --> N1(Clone A)
        N0 --> N2(Clone B)
        N0 --> N3(Clone C)
        N0 --> N4(Clone D)
    end
    Punctuated ---|"Rapid progression<br/>low ITH<br/>clonal aneuploidy"| Punctuated
    Gradual ---|"Ongoing diversification<br/>high ITH<br/>resistance-prone"| Gradual
    Neutral ---|"Passenger accumulation<br/>no detectable selection<br/>1/f² VAF spectrum"| Neutral
```

The three patterns are not mutually exclusive — a single tumor may show punctuated bursts superimposed on a background of neutral evolution, or a punctuated event followed by gradual diversification of the resulting dominant clone.
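
The neutral pattern has a simple quantitative signature: if subclonal mutations are distributed as 1/f² in frequency f, the cumulative number of mutations with frequency at least f grows linearly with 1/f. The sketch below measures how well a set of VAFs fits that straight line. The frequency window defaults and the function name are illustrative assumptions, and a high R² is consistent with neutrality but does not prove it.

```python
def neutral_fit_r2(vafs, f_min=0.12, f_max=0.24):
    """R^2 of cumulative mutation count against 1/f inside [f_min, f_max].

    The window is meant to exclude the clonal peak (high VAF) and the
    noise-dominated low-VAF tail; the defaults here are illustrative.
    """
    window = sorted((v for v in vafs if f_min <= v <= f_max), reverse=True)
    n = len(window)
    if n < 3:
        raise ValueError("need at least 3 mutations in the window")
    x = [1.0 / v for v in window]             # inverse frequency, ascending
    y = [float(i) for i in range(1, n + 1)]   # mutations with VAF >= v
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("VAFs in the window are all identical")
    # Squared Pearson correlation equals R^2 for a simple linear fit.
    return sxy ** 2 / (sxx * syy)


# Synthetic VAFs evenly spaced in 1/f, i.e. an idealised neutral tail.
synthetic = [1.0 / (4.0 + 0.1 * i) for i in range(1, 41)]
r2 = neutral_fit_r2(synthetic)   # very close to 1
```

On real data the fit is sensitive to the chosen window, sequencing depth, and copy-number state, so it is best read as one line of evidence alongside the reconstructed phylogeny.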

Clinical Significance

Tumors with complex subclonal structures tend to be more aggressive, more likely to develop drug resistance, and more likely to metastasize (Tarabichi et al., 2021). Understanding subclonal architecture is therefore central to prognosis and therapy selection. Truncal (clonal) mutations, present in all cells, represent more homogeneous therapeutic targets, while subclonal mutations, even in driver genes, may have limited therapeutic value because they are absent from a substantial fraction of the tumor (McGranahan & Swanton, 2017).