Phylogenetics III

Bayesian phylogenetics

Public Health Modeling Unit

2025-08-19

Barney Isaksen Potter

Series overview

  1. Trees, tree likelihoods, and models of evolution
  2. Rate heterogeneity and maximum likelihood
  3. Bayesian phylogenetics, Markov chain Monte Carlo, and summary trees
  4. Phylogeography and Kingman's coalescent

Thank you Philippe

"the global workhorse of genomic epidemiology"

One Tree to rule them all, One Tree to find them, One Tree to bring them all and in the darkness bind them

  • The (maximum likelihood) method we've learned so far gets a single tree
  • The one it finds is likely not the "best"
  • Are we doing a good job of reporting a single tree?

Frequentist philosophy

  • Probabilities refer to the outcome of experiments (i.e. data)
  • Probabilities are objectively real in the same way that physical objects are real
  • "Likelihood" referes to the degree to which data support a hypothesis

Bayesian philosophy

  • BOTH data and model parameters are described by probabilities
  • Probability represents the degree to which we believe a hypothesis
  • Hypotheses can have probabilities in the absence of data

Bayesian inference

Fundamentals of Bayesian inference

  • Bayesian inference produces a posterior probability distribution instead of a single MLE
  • The posterior combines information from both data and prior knowledge
  • Each parameter in the model has a prior probability distribution representing known knowledge about that parameter

"Human heights follow a uniform (flat) distribution between 1 angstrom and the width of the universe."

"Human heights follow a uniform (flat) distribution between 1 angstrom and the width of the universe."

"Human heights follow a uniform (flat) distribution between 1 angstrom and the width of the universe."

"Human heights follow a uniform (flat) distribution between 1 angstrom and the width of the universe."

"Human heights follow a normal distribution with mean 170cm and standard deviation 5cm."

"Human heights follow a normal distribution with mean 170cm and standard deviation 5cm."

How can observing data (X) change our belief in a hypothesis ($\theta$)?

How can observing data (X) change our belief in a hypothesis ($\theta$)?

How can observing data (X) change our belief in a hypothesis ($\theta$)?

How can observing data (X) change our belief in a hypothesis ($\theta$)?

How can observing data (X) change our belief in a hypothesis ($\theta$)?

How can we apply this framework to phylogenetic inference?

\[ P(\theta|\textbf{X}) = \frac{P(\textbf{X} |\theta) \times P(\theta )}{P(\textbf{X})} \]

The posterior probability of a phylogenetic tree, $\tau$:


\[ P(\tau|\textbf{X}) = \frac{P(\textbf{X} |\tau) \times P(\tau )}{P(\textbf{X})} \]

$\tau = $ phylogenetic hyopthesis (tree)

$\textbf{X} =$ genomic sequence data

Likelihood calculation


\[ P(\tau|\textbf{X}) = \frac{\begingroup \color{teal} P(\textbf{X} |\tau) \endgroup \times P(\tau )}{P(\textbf{X})} \] \[ \begingroup \color{teal} L(\tau,\nu,\Theta | x_1 \mathellipsis x_N) \endgroup = \prod_{i=1}^N Pr(x_i | \begingroup \color{darkmagenta} \tau \endgroup , \begingroup \color{darkblue} \nu \endgroup , \begingroup \color{mediumseagreen} \Theta \endgroup) \]

$\begingroup \color{darkmagenta} \tau = \text{tree topology} \endgroup, \begingroup \color{darkblue} \nu = \text{branch lengths} \endgroup, \atop \begingroup \color{mediumseagreen} \Theta = \text{model parameters} \endgroup, i \in \text{sites in genome}$

Prior calculation


\[ P(\tau|\textbf{X}) = \frac{P(\textbf{X} |\tau) \times \begingroup \color{chocolate} P(\tau ) \endgroup}{P(\textbf{X})} \] \[ \begingroup \color{chocolate} P(\tau) \endgroup = \frac{1}{\begingroup \color{crimson} B(s) \endgroup} \]

$\begingroup \color{crimson} B(s) = \text{number of possible topologies} \endgroup$

Marginal term calculation


\[ P(\tau|\textbf{X}) = \frac{P(\textbf{X} |\tau) \times P(\tau )}{\begingroup \color{goldenrod} P(\textbf{X}) \endgroup} \] \[ \begingroup \color{goldenrod} P(\textbf{X}) \endgroup = \sum_{j=1}^{\begingroup \color{crimson} B(s) \endgroup} P(\textbf{X} | \tau_j) \times P(\tau_j) \]

To calculate this we need to sum the density across every possible tree...

... but tree topology space is too big!

\[\tiny \begin{array}{cc} \text{Num.~taxa} & \text{Num.~topologies:} \begingroup \color{crimson} B(s) \endgroup \\ \hline 1 & 1 \\ 2 & 1 \\ 3 & 3 \\ 4 & 15 \\ 5 & 105 \\ 6 & 945 \\ 7 & 10,395 \\ 8 & 135,135 \\ 9 & 2,027,025 \\ \vdots & \vdots \\ 20 & 8,200,794,532,637,891,559,375 \\ \vdots & \vdots \\ 769 & 3.753 \times 10^{2,110} \\ \end{array} \]
\[ 3 \times 5 \times 7 \times \mathellipsis \times (2n-3) = \frac{(2n-3)!}{2^{n-1} \times (n-1)!} \]

Markov chain Monte Carlo (MCMC)

Markov chain Monte Carlo (MCMC) Sampling

Posterior probabilities are difficult to calculate analytically. However, we can sample values from the posterior distribution with a frequency proportional to thir posterior probability by using MCMC.

Recall: ML optimization

Recall: ML optimization

Recall: ML optimization

Recall: ML optimization

Recall: ML optimization

Recall: ML optimization

MCMC leverages randomness

Metropolis-hastings algorithm

Operators

Tuning and mixing

Operators on trees

MCMC diagnostics, summaries, and interpretation

Downsampling

Stationarity

Convergence in multiple chains

Effective sample size

Autocorrelation within a chain

ESS Calculation

How do we summarize and interpret the results of our Bayesian phylogenetic analysis?

Summary of continuous parameters

Posterior distribution

Credible intervals

Credible intervals

How do we choose a consensus?

Maximum clade credibility (MCC) trees

MCC calculation

My chain won't converge and I am confused and angry and a little bit scared. What can I do?

DON'T PANIC

Some things might help:

  • Wait; run more parallel chains (different seeds)
  • Retune parameters (target $10 \text{--} 70\%$ acceptance)
  • Propose changes to "difficult" parameters more often (operator weights)
  • Use different operators
  • Simplify or make the model more realistic (stronger priors)

FIN