Phylogenetics III

Bayesian phylogenetics

Public Health Modeling Unit

2025-08-19

Barney Isaksen Potter

Series overview

Trees, tree likelihoods, and models of evolution
Rate heterogeneity and maximum likelihood
Bayesian phylogenetics, Markov chain Monte Carlo, and summary trees
Phylogeography and Kingman's coalescent

Thank you Philippe

"the global workhorse of genomic epidemiology"

One Tree to rule them all, One Tree to find them, One Tree to bring them all and in the darkness bind them

The (maximum likelihood) method we've learned so far gets a single tree
The one it finds is likely not the "best"
Are we doing a good job of reporting a single tree?

Frequentist philosophy

Probabilities refer to the outcome of experiments (i.e. data)
Probabilities are objectively real in the same way that physical objects are real
"Likelihood" refers to the degree to which data support a hypothesis

Bayesian philosophy

BOTH data and model parameters are described by probabilities
Probability represents the degree to which we believe a hypothesis
Hypotheses can have probabilities in the absence of data

Bayesian inference

Fundamentals of Bayesian inference

Bayesian inference produces a posterior probability distribution instead of a single MLE
The posterior combines information from both data and prior knowledge
Each parameter in the model has a prior probability distribution representing existing knowledge about that parameter

"Human heights follow a uniform (flat) distribution between 1 angstrom and the width of the universe."

"Human heights follow a normal distribution with mean 170cm and standard deviation 5cm."

How can observing data ($\textbf{X}$) change our belief in a hypothesis ($\theta$)?

How can we apply this framework to phylogenetic inference?

\[ P(\theta|\textbf{X}) = \frac{P(\textbf{X} |\theta) \times P(\theta )}{P(\textbf{X})} \]

The posterior probability of a phylogenetic tree, $\tau$:

\[ P(\tau|\textbf{X}) = \frac{P(\textbf{X} |\tau) \times P(\tau )}{P(\textbf{X})} \]

$\tau = $ phylogenetic hypothesis (tree)

$\textbf{X} =$ genomic sequence data
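For a tiny case the whole formula can be evaluated directly. The sketch below does so for the three rooted topologies of three taxa, assuming a flat prior over topologies. The likelihood values are invented purely for illustration and do not come from any real alignment.

```python
# Toy posterior over the three rooted topologies of three taxa.
# The likelihood values are made up for illustration only.
likelihoods = {
    "((A,B),C)": 2.0e-5,
    "((A,C),B)": 5.0e-6,
    "((B,C),A)": 5.0e-6,
}

# Assumed flat prior: every topology is equally probable before seeing data.
prior = 1.0 / len(likelihoods)

# Marginal term P(X): likelihood x prior summed over every topology.
marginal = sum(lik * prior for lik in likelihoods.values())

# Bayes' theorem applied to each topology in turn.
posterior = {
    tree: lik * prior / marginal for tree, lik in likelihoods.items()
}

for tree, prob in posterior.items():
    print(f"{tree}: {prob:.3f}")
```

With only three trees the sum in the denominator is trivial, which is exactly what stops being true as taxa are added.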

Likelihood calculation

\[ P(\tau|\textbf{X}) = \frac{\begingroup \color{teal} P(\textbf{X} |\tau) \endgroup \times P(\tau )}{P(\textbf{X})} \] \[ \begingroup \color{teal} L(\tau,\nu,\Theta | x_1 \mathellipsis x_N) \endgroup = \prod_{i=1}^N Pr(x_i | \begingroup \color{darkmagenta} \tau \endgroup , \begingroup \color{darkblue} \nu \endgroup , \begingroup \color{mediumseagreen} \Theta \endgroup) \]

$\begingroup \color{darkmagenta} \tau = \text{tree topology} \endgroup, \begingroup \color{darkblue} \nu = \text{branch lengths} \endgroup, \atop \begingroup \color{mediumseagreen} \Theta = \text{model parameters} \endgroup, i \in \text{sites in genome}$

Prior calculation

\[ P(\tau|\textbf{X}) = \frac{P(\textbf{X} |\tau) \times \begingroup \color{chocolate} P(\tau ) \endgroup}{P(\textbf{X})} \] \[ \begingroup \color{chocolate} P(\tau) \endgroup = \frac{1}{\begingroup \color{crimson} B(s) \endgroup} \]

$\begingroup \color{crimson} B(s) = \text{number of possible topologies} \endgroup$

Marginal term calculation

\[ P(\tau|\textbf{X}) = \frac{P(\textbf{X} |\tau) \times P(\tau )}{\begingroup \color{goldenrod} P(\textbf{X}) \endgroup} \] \[ \begingroup \color{goldenrod} P(\textbf{X}) \endgroup = \sum_{j=1}^{\begingroup \color{crimson} B(s) \endgroup} P(\textbf{X} | \tau_j) \times P(\tau_j) \]

To calculate this we need to sum the density across every possible tree...

... but tree topology space is too big!

\[\tiny \begin{array}{cc} \text{Num.~taxa} & \text{Num.~topologies:} \begingroup \color{crimson} B(s) \endgroup \\ \hline 1 & 1 \\ 2 & 1 \\ 3 & 3 \\ 4 & 15 \\ 5 & 105 \\ 6 & 945 \\ 7 & 10,395 \\ 8 & 135,135 \\ 9 & 2,027,025 \\ \vdots & \vdots \\ 20 & 8,200,794,532,637,891,559,375 \\ \vdots & \vdots \\ 769 & 3.753 \times 10^{2,110} \\ \end{array} \]

\[ 3 \times 5 \times 7 \times \mathellipsis \times (2n-3) = \frac{(2n-3)!}{2^{n-2} \times (n-2)!} \]
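The product above is easy to check numerically. The following is a minimal sketch that counts rooted bifurcating topologies; the function name `num_rooted_topologies` is a hypothetical helper, not part of any phylogenetics package.

```python
def num_rooted_topologies(n_taxa: int) -> int:
    """Count rooted bifurcating topologies: 3 x 5 x ... x (2n - 3)."""
    if n_taxa < 1:
        raise ValueError("n_taxa must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n < 3).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


for n in (3, 4, 5, 9, 20):
    print(n, num_rooted_topologies(n))
```

Python integers have arbitrary precision, so the count stays exact even where it would overflow a 64-bit integer.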

Markov chain Monte Carlo (MCMC)

Markov chain Monte Carlo (MCMC) Sampling

Posterior probabilities are difficult to calculate analytically. However, we can sample values from the posterior distribution with a frequency proportional to their posterior probability by using MCMC.

Recall: ML optimization

MCMC leverages randomness

Metropolis-Hastings algorithm

Recall: our state $\Psi$ is made up of $\tau$ = tree, $\nu$ = branch lengths, $\Theta$ = model parameters
The algorithm proposes a new, modified state $\Psi^*$
Calculate a ratio $R$:

Ratio of posterior probabilities
A "Hastings ratio" term accounts for potential asymmetry in proposals (we can ignore it here)
Expand posterior probabilities according to Bayes' theorem
Same problem as earlier, except that the marginal term $P(\textbf{X})$ cancels out!
This is the crux of MCMC, because it allows us to never calculate the marginal term
The end result is a value between 0 and 1; with flat priors and symmetric proposals it is simply the ratio of the likelihoods of $\Psi^*$ and $\Psi$, capped at 1
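
The steps in this list can be written out as a sketch. Here $\Psi = (\tau, \nu, \Theta)$ is the current state, $\Psi^*$ is the proposal, and $q$ is the proposal density; the symbol $q$ is an assumption introduced for illustration.

\[ R = \min\left(1, \frac{P(\Psi^*|\textbf{X})}{P(\Psi|\textbf{X})} \times \frac{q(\Psi|\Psi^*)}{q(\Psi^*|\Psi)} \right) \]

\[ \frac{P(\Psi^*|\textbf{X})}{P(\Psi|\textbf{X})} = \frac{P(\textbf{X}|\Psi^*) \times P(\Psi^*) / P(\textbf{X})}{P(\textbf{X}|\Psi) \times P(\Psi) / P(\textbf{X})} = \frac{P(\textbf{X}|\Psi^*) \times P(\Psi^*)}{P(\textbf{X}|\Psi) \times P(\Psi)} \]

When proposals are symmetric, $q(\Psi|\Psi^*) = q(\Psi^*|\Psi)$ and the Hastings ratio equals 1.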

Generate a random number uniformly between 0 and 1
If that value is less than the ratio R, we accept the new proposal; otherwise we stay at the current state
Takeaway: we always take a step up, we often take small steps down, we rarely (but sometimes) take big steps down
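
The accept/reject loop can be sketched on a toy problem. The example below is not a phylogenetic sampler: it assumes five hypothetical states with made-up unnormalized posterior weights and a symmetric proposal (so the Hastings ratio is 1). The function name `metropolis` and all values are illustrative assumptions.

```python
import random

# Unnormalized "posterior" weights for five hypothetical states (made up).
# The sampler only ever uses ratios of these, never their sum.
weights = [1.0, 4.0, 10.0, 4.0, 1.0]


def metropolis(weights, n_steps, seed=1):
    """Sample state indices with frequency proportional to their weight."""
    rng = random.Random(seed)
    state = 0
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: one step left or right on a ring of states.
        proposal = (state + rng.choice((-1, 1))) % len(weights)
        # Ratio R: steps up give R = 1, steps down give R < 1.
        ratio = min(1.0, weights[proposal] / weights[state])
        # Accept if a uniform random number falls below R; otherwise stay.
        if rng.random() < ratio:
            state = proposal
        samples.append(state)
    return samples


samples = metropolis(weights, n_steps=200_000)
freqs = [samples.count(s) / len(samples) for s in range(len(weights))]

# Normalizing here is for comparison only; the sampler never needed it.
expected = [w / sum(weights) for w in weights]
for s, (f, e) in enumerate(zip(freqs, expected)):
    print(f"state {s}: sampled {f:.3f}, expected {e:.3f}")
```

The sampled frequencies should approach the normalized weights as the chain gets longer, even though the normalizing constant was never used inside the loop.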