Variant Allele Fraction

Definition

Variant allele fraction (VAF), also called variant allele frequency, is the proportion of sequencing reads at a genomic locus that carry a specific variant allele, reported either as a fraction between 0 and 1 or as a percentage. In cancer genomics, VAF is the primary observable from which clone abundance is inferred.

Relationship to Clone Frequency

VAF is a function of:

  • Cancer cell fraction (CCF): the true proportion of cancer cells carrying the mutation
  • Sample purity (ρ): the proportion of cancer cells in the sampled tissue
  • Local copy number (NT): the total number of copies at the locus in the cancer cells
  • Multiplicity (m): how many of those copies carry the mutation

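The relationship is: VAF = m × ρ × CCF / (ρ × NT + 2 × (1 − ρ)) (Tarabichi et al., 2021), where VAF and CCF are fractions between 0 and 1 and the constant 2 is the copy number assumed for the normal (non-cancer) cells in the sample.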
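The relationship can be evaluated directly, and inverted to estimate CCF from an observed VAF. The following is a minimal sketch: the function names `expected_vaf` and `ccf_from_vaf` and the `normal_cn` parameter are illustrative and not taken from any published tool, and the normal cells are assumed to be diploid at the locus.

```python
def expected_vaf(ccf, purity, total_cn, multiplicity=1, normal_cn=2):
    """Expected VAF (fraction, 0..1) given CCF, purity, NT and m."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the mixed sample:
    # cancer cells contribute NT copies, normal cells contribute normal_cn.
    copies_per_cell = purity * total_cn + normal_cn * (1.0 - purity)
    return multiplicity * purity * ccf / copies_per_cell


def ccf_from_vaf(vaf, purity, total_cn, multiplicity=1, normal_cn=2):
    """Invert the same relationship to estimate CCF from an observed VAF."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    copies_per_cell = purity * total_cn + normal_cn * (1.0 - purity)
    return vaf * copies_per_cell / (multiplicity * purity)


# Clonal heterozygous SNV in a diploid region of a 60%-pure sample.
vaf = expected_vaf(ccf=1.0, purity=0.6, total_cn=2, multiplicity=1)
print(f"expected VAF: {vaf:.3f}")
print(f"recovered CCF: {ccf_from_vaf(vaf, purity=0.6, total_cn=2):.3f}")
```

In practice the inversion can return values above 1 because of sampling noise or a wrong multiplicity assignment, so an estimate produced this way should be read as approximate.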

The 1/f² Distribution

Under neutral evolution in a growing population, the number of mutations as a function of their VAF (f) follows a 1/f² distribution (Turajlic et al., 2019); equivalently, the cumulative number of mutations with frequency above f scales as 1/f. This is the null model against which selection is tested. The distribution arises because each new mutation appears in a single cell: a mutation that originates when the population has size N starts at a frequency of about 1/N, so each doubling of N halves the frequency at which new mutations enter the population. Older (higher-frequency) mutations are rarer because they arose when the population was smaller and fewer cells were dividing.
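The doubling argument can be checked with a deterministic toy model. The sketch below assumes synchronous doublings, no cell death, and a fixed number of new mutations per cell per generation; the function name `cumulative_mutation_counts` is hypothetical. Frequencies here are cell fractions, which for a heterozygous mutation in a pure diploid sample are twice the VAF. If the cumulative count M(f) of mutations at frequency f or higher scales as 1/f, the product f × M(f) should settle to a constant, which corresponds to a 1/f² density.

```python
def cumulative_mutation_counts(doublings, mutations_per_cell=1.0):
    """Return (frequency, cumulative_count) pairs, one per generation."""
    results = []
    cumulative = 0.0
    for k in range(1, doublings + 1):
        population = 2 ** k               # cells present after k doublings
        frequency = 1.0 / population      # cell fraction of a mutation born now
        # Every cell acquires new private mutations in this generation.
        cumulative += mutations_per_cell * population
        results.append((frequency, cumulative))
    return results


for frequency, count in cumulative_mutation_counts(10):
    # The product approaches a constant as the population grows.
    print(f"f = {frequency:.5f}  M(f) = {count:7.0f}  f * M(f) = {frequency * count:.3f}")
```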

The Detection Ceiling

Bulk sequencing imposes a fundamental time bias through the VAF. “Each doubling of the cancer cell population halves the frequency of new mutations arising in the population; hence, after just seven doublings, new mutations are undetectable with 100× sequencing, and after ten doublings, new mutations are undetectable at 1,000× sequencing depth” (Turajlic et al., 2019, p. 406). Consequently, bulk sequencing mostly recovers mutations that arose early in the tumor’s history. Late-arising mutations fall below the detection limit and are effectively invisible.
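The arithmetic behind the quoted thresholds can be sketched as an expected read count. The example assumes a heterozygous mutation at a diploid locus, so that VAF = ρ × CCF / 2, and a mutation that starts in one cell after a given number of doublings and keeps that cell fraction. The function names are illustrative, and the result is an expectation only: real variant callers require several supporting reads and read counts vary by sampling.

```python
def new_mutation_vaf(doublings, purity=1.0):
    """VAF of a mutation that arose in one cell after `doublings` doublings."""
    cell_fraction = 0.5 ** doublings      # 1 cell out of 2**doublings
    # Heterozygous, diploid locus: one of two copies carries the variant.
    return purity * cell_fraction / 2.0


def expected_variant_reads(doublings, depth, purity=1.0):
    """Expected number of reads supporting the variant at a given depth."""
    return depth * new_mutation_vaf(doublings, purity)


for doublings, depth in [(7, 100), (10, 1000)]:
    reads = expected_variant_reads(doublings, depth)
    print(f"{doublings} doublings at {depth}x: {reads:.2f} expected variant reads")
```

Under these assumptions both cases give fewer than one expected supporting read (about 0.39 and 0.49), which is consistent with the mutations being undetectable.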

Copy Number Confounding

Copy number alterations (CNAs) change the VAF independently of clone abundance. An SNV on one of three copies will have a different VAF than the same SNV on one of four copies, even if both are present in 100% of cancer cells. As Turajlic et al. (2019) note, “In a tumour sample composed of 50% cancer cells, the difference in frequency of an SNV present on one of three copies versus one of four copies is only ~3%, which is a level of accuracy that is rarely achievable with moderate-depth sequencing (~100×)” (p. 408). Without accurate copy number correction, VAF-based clone inference can produce misleading results.
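The quoted ~3% figure follows from the VAF relationship with ρ = 0.5, m = 1 and CCF = 1. The sketch below reproduces it and compares the difference with the binomial standard error of a VAF estimate at 100× depth. The binomial sampling model and the function names are assumptions made for illustration, and normal cells are taken to be diploid.

```python
import math


def expected_vaf(ccf, purity, total_cn, multiplicity=1, normal_cn=2):
    """Expected VAF (fraction) given CCF, purity, NT and m."""
    copies_per_cell = purity * total_cn + normal_cn * (1.0 - purity)
    return multiplicity * purity * ccf / copies_per_cell


def vaf_standard_error(vaf, depth):
    """Binomial standard error of an observed VAF at a given read depth."""
    return math.sqrt(vaf * (1.0 - vaf) / depth)


vaf_3 = expected_vaf(ccf=1.0, purity=0.5, total_cn=3)  # SNV on 1 of 3 copies
vaf_4 = expected_vaf(ccf=1.0, purity=0.5, total_cn=4)  # SNV on 1 of 4 copies
difference = vaf_3 - vaf_4
noise = vaf_standard_error(vaf_3, depth=100)

print(f"VAF on 1 of 3 copies: {vaf_3:.3f}")
print(f"VAF on 1 of 4 copies: {vaf_4:.3f}")
print(f"difference: {difference:.3f}, standard error at 100x: {noise:.3f}")
```

Under these assumptions the difference (about 0.033) is smaller than one standard error of the estimate at 100× (0.040), which illustrates why the two copy number states are hard to separate at moderate depth.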