Subclonal Reconstruction

Definition

Subclonal reconstruction is the computational inference of a tumor’s clonal composition — clone genotypes, clone frequencies, and phylogenetic relationships — from bulk DNA sequencing data (Tarabichi et al., 2021). It has become a pillar of cancer evolution studies, providing insight into the clonality and relative ordering of mutations and mutational processes.

Three Core Tasks

Subclonal reconstruction involves three key aspects (Tarabichi et al., 2021):

  1. Clone identification: Characterizing the major cell populations by identifying which somatic mutations co-occur in each clone
  2. Frequency estimation: Quantifying the proportion of cancer cells belonging to each clone (its cancer cell fraction, CCF, sometimes reported as a cellular prevalence)
  3. Phylogenetic inference: Reconstructing the ancestral relationships between clones, summarized as a phylogenetic tree

The Standard Workflow

For single-sample reconstruction:

  1. Copy number reconstruction: Infer regions of clonal and subclonal copy number change from read depth (logR) and B-allele frequency (BAF) data
  2. VAF-to-CCF translation: Convert the observed variant allele fraction (VAF) to a cancer cell fraction (CCF) by correcting for sample purity and local copy number
  3. SNV clustering: Group mutations with similar CCFs, assuming they belong to the same clone (typically with Bayesian Dirichlet process models fitted by MCMC sampling)
  4. Multiplicity determination: Infer how many allelic copies carry each mutation (e.g., by integer programming)
  5. Phylogenetic tree construction: Build the clonal phylogeny using maximum parsimony or probabilistic models
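
Step 2 can be illustrated with the commonly used purity and copy number correction, CCF = VAF × (purity × N_tumor + (1 − purity) × N_normal) / (purity × multiplicity). The Python sketch below is a minimal illustration under stated assumptions, not the method of any particular tool: the function names are hypothetical, the copy number at the locus is assumed clonal, and the multiplicity is guessed by simple rounding, a heuristic that can be wrong for subclonal mutations in amplified regions.

```python
def estimate_multiplicity(vaf, purity, total_cn_tumor, total_cn_normal=2):
    """Heuristic guess of how many copies per tumor cell carry the mutation."""
    # Average number of copies of the locus per cell in the tumor/normal mixture
    avg_copies = purity * total_cn_tumor + (1.0 - purity) * total_cn_normal
    # Mutated copies per tumor cell, if every tumor cell carried the mutation
    raw = vaf * avg_copies / purity
    # Round to an integer, bounded below by 1 and above by the tumor copy number
    return min(max(1, round(raw)), max(1, total_cn_tumor))


def vaf_to_ccf(vaf, purity, total_cn_tumor, multiplicity=None, total_cn_normal=2):
    """Convert a VAF to a CCF, correcting for purity and local copy number."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    avg_copies = purity * total_cn_tumor + (1.0 - purity) * total_cn_normal
    if multiplicity is None:
        multiplicity = estimate_multiplicity(
            vaf, purity, total_cn_tumor, total_cn_normal
        )
    return vaf * avg_copies / (purity * multiplicity)


# Hypothetical example: a heterozygous mutation in a diploid region, 80% purity
ccf = vaf_to_ccf(vaf=0.4, purity=0.8, total_cn_tumor=2)
print(f"CCF = {ccf:.2f}")
```

Noise in the VAF can push the estimate above 1, so downstream clustering usually works with the distribution of CCFs rather than with individual values.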

Multi-sample reconstruction adds the crossing rule. A descendant clone can never have a higher CCF than its ancestor in any sample, so if clone A has a higher CCF than clone B in one region but a lower CCF in another, the two cannot be in an ancestor-descendant relationship and must sit on separate branches of a branching phylogeny.
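
The crossing rule reduces to a pairwise comparison of CCF vectors across samples. The following Python sketch is illustrative only: the function name and the `tol` parameter (a hypothetical margin to absorb noise in the CCF estimates) are assumptions, and real tools typically work with full posterior distributions rather than point estimates.

```python
def crossing_rule_violated(ccf_a, ccf_b, tol=0.0):
    """Return True if clones A and B cannot be ancestor and descendant.

    ccf_a and ccf_b hold the CCF of each clone in the same ordered samples.
    """
    if len(ccf_a) != len(ccf_b):
        raise ValueError("both clones need a CCF for every sample")
    pairs = list(zip(ccf_a, ccf_b))
    # A exceeds B (beyond the noise margin) in at least one sample
    a_higher = any(a > b + tol for a, b in pairs)
    # B exceeds A (beyond the noise margin) in at least one sample
    b_higher = any(b > a + tol for a, b in pairs)
    # If both happen, neither clone can be the ancestor of the other
    return a_higher and b_higher


# Hypothetical CCFs in two tumor regions
print(crossing_rule_violated([0.8, 0.2], [0.3, 0.6]))  # crossing: separate branches
print(crossing_rule_violated([0.9, 0.7], [0.4, 0.3]))  # A could be ancestral to B
```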

Key Tools

PyClone (Roth et al., 2014), PhyloWGS (Deshwar et al., 2015), SciClone (Miller et al., 2014), and ABSOLUTE (Carter et al., 2012) are among the most widely used tools. Tarabichi et al. (2021) recommend comparing results from multiple tools and validating with orthogonal data (e.g., FISH for CNAs) whenever possible.

Limitations

The process is error-prone. Key challenges include:

  • Copy number confounds VAF interpretation: a miscalled copy number state can erroneously create or merge clones
  • Detection limits (~0.05–0.10 CCF at standard depths) mean that subclones below this range go undetected
  • The “infinite sites” assumption (each mutation arises only once in the tumor’s history) can be violated in large tumors, for example when the same mutation arises independently in separate lineages
  • Single-sample reconstruction undersamples the tumor’s spatial diversity
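
The detection limit can be made concrete with a simple read-sampling model. The Python sketch below is a rough illustration under stated assumptions: variant reads are binomially distributed, sequencing error is ignored, and a mutation counts as detected when it has at least `min_alt_reads` supporting reads. That threshold, the function names, and the example values are all hypothetical.

```python
from math import comb


def expected_vaf(ccf, purity, total_cn_tumor=2, multiplicity=1, total_cn_normal=2):
    """Expected VAF of a mutation with the given CCF (inverse of VAF-to-CCF)."""
    avg_copies = purity * total_cn_tumor + (1.0 - purity) * total_cn_normal
    return ccf * purity * multiplicity / avg_copies


def detection_probability(ccf, purity, depth, min_alt_reads=3, **copy_number):
    """Probability of observing at least min_alt_reads variant reads."""
    p = expected_vaf(ccf, purity, **copy_number)
    # P(X < k) for X ~ Binomial(depth, p)
    p_below = sum(
        comb(depth, i) * p**i * (1.0 - p) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return max(0.0, 1.0 - p_below)


# Hypothetical setting: 70% purity, 60x depth, diploid locus
for ccf in (0.05, 0.10, 0.30, 1.00):
    prob = detection_probability(ccf, purity=0.7, depth=60)
    print(f"CCF {ccf:.2f}: detection probability {prob:.3f}")
```

Under this model the detection probability falls quickly as the CCF decreases at a fixed depth, which is why low-frequency subclones are missed unless sequencing is deeper.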