Module 4: An Introduction to Bayesian Modelling and Stan

Now that we have our data and have forecast the future values of our predictors, it’s time to decide how we’re going to model them. We are using a Bayesian approach, a framework for statistical modelling that is especially well suited to quantifying uncertainty.

1. What is Bayesian Inference?

At its core, Bayesian inference is a way of updating our beliefs in the face of new evidence. It’s a mathematical formalization of how we learn.

Imagine you think there’s a 50% chance of rain today. You look outside and see dark clouds. You might update your belief to a 90% chance of rain. That’s Bayesian thinking!
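The rain example can be worked through with Bayes' rule. The sketch below assumes two illustrative numbers that are not part of the example above: dark clouds appear on 90% of rainy days and on 10% of dry days. Under those assumptions, a 50% prior belief in rain becomes a 90% posterior belief.

```python
def posterior_probability(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a yes/no hypothesis after seeing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    # Total probability of seeing the evidence, whether or not the hypothesis holds
    evidence = numerator + (1 - prior) * p_evidence_if_false
    return numerator / evidence


prior_rain = 0.5          # belief before looking outside (from the example)
p_clouds_if_rain = 0.9    # assumed for illustration
p_clouds_if_dry = 0.1     # assumed for illustration

posterior_rain = posterior_probability(prior_rain, p_clouds_if_rain, p_clouds_if_dry)
print(f"P(rain | dark clouds) = {posterior_rain:.2f}")
```

With different assumed cloud probabilities the updated belief would change, which is the point: the posterior depends on both the prior and how strongly the evidence favours one explanation over the other.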

In statistical terms, it works like this:

  • Prior Belief (Prior): What we believe about something before seeing the data. For example, we might have a general idea of how many staff absences are “normal.”
  • Evidence (Likelihood): How probable the data we collected would be under each possible value of the unknown quantities. In our case, the data is the nhs_opel_data.csv file.
  • Updated Belief (Posterior): The combination of our prior beliefs and the evidence. This is the result of our model—a new, more informed understanding of the situation.

The key output of a Bayesian model is not a single number, but a probability distribution. Instead of saying “the effect of staff absences is 0.08,” a Bayesian model says, “there is a 95% probability that the effect of staff absences lies between 0.06 and 0.10.” A range like this is called a 95% credible interval. It is useful for decision-making under uncertainty because it shows how far the estimate can be trusted, not just what the estimate is.
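To show what "a distribution instead of a single number" looks like, here is a minimal sketch using grid approximation, which is a simple teaching technique and not how Stan fits models. The data are hypothetical (6 OPEL-4 days out of 40 observed days), the prior is flat, and the function names are made up for this illustration.

```python
def grid_posterior(successes, trials, grid_size=1001):
    """Posterior over a probability theta, evaluated on an evenly spaced grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat prior: every value of theta equally plausible
    likelihood = [t**successes * (1 - t) ** (trials - successes) for t in grid]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


def credible_interval(grid, posterior, mass=0.95):
    """Equal-tailed interval containing roughly `mass` of the posterior probability."""
    lower_tail = (1 - mass) / 2
    upper_tail = 1 - lower_tail
    cumulative = 0.0
    lower = upper = None
    for theta, p in zip(grid, posterior):
        cumulative += p
        if lower is None and cumulative >= lower_tail:
            lower = theta
        if upper is None and cumulative >= upper_tail:
            upper = theta
            break
    return lower, upper


# Hypothetical data: 6 OPEL-4 days observed in 40 days
grid, posterior = grid_posterior(successes=6, trials=40)
posterior_mean = sum(t * p for t, p in zip(grid, posterior))
lower, upper = credible_interval(grid, posterior)
print(f"Posterior mean: {posterior_mean:.3f}")
print(f"95% credible interval: {lower:.3f} to {upper:.3f}")
```

The interval is read directly as a probability statement about the unknown quantity, which is the kind of sentence quoted above.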

2. Why Use a Bayesian Approach for this Problem?

  • Quantifying Uncertainty: It gives us a direct way to measure our uncertainty about the future. We won’t just predict that the hospital will enter OPEL-4; we’ll predict the probability of it happening.
  • Flexibility: Bayesian models are very flexible and can be extended to include new information or more complex relationships.
  • Works with Less Data: While more data is always better, Bayesian methods can often provide reasonable estimates even with limited data by incorporating prior knowledge.
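The point about limited data can be illustrated with a Beta prior on a probability, where the posterior mean has a simple closed form. The numbers below are hypothetical: a Beta(2, 8) prior encodes a rough belief that OPEL-4 days occur about 20% of the time, and the "data" are a short run of days that all happened to be OPEL-4.

```python
def beta_posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a probability under a Beta(prior_a, prior_b) prior
    after observing `successes` out of `trials` (Beta-Binomial conjugacy)."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


PRIOR_A, PRIOR_B = 2, 8  # hypothetical prior, mean 2 / (2 + 8) = 0.2

# Very little data: 3 observed days, all OPEL-4
few_days_raw = 3 / 3
few_days_bayes = beta_posterior_mean(PRIOR_A, PRIOR_B, successes=3, trials=3)

# More data: 30 observed days, all OPEL-4
many_days_bayes = beta_posterior_mean(PRIOR_A, PRIOR_B, successes=30, trials=30)

print(f"Raw proportion from 3 days:   {few_days_raw:.2f}")
print(f"Posterior mean after 3 days:  {few_days_bayes:.2f}")
print(f"Posterior mean after 30 days: {many_days_bayes:.2f}")
```

With only three days, the raw proportion jumps to an implausible 100%, while the posterior mean stays moderate. As more data arrive, the data increasingly outweigh the prior.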

3. What is Stan?

Stan is a platform for statistical modelling and high-performance statistical computation. It is a probabilistic programming language, which means we write down the structure of our model, and Stan handles the difficult job of fitting it to the data. By default it does this by drawing samples from the posterior distribution with the No-U-Turn Sampler (NUTS), a form of Hamiltonian Monte Carlo.

4. The Basic Structure of a Stan Model

A Stan program is typically organized into three main blocks:

  1. data block: Here, you declare all the data that you will pass in from R. This includes the number of observations, the predictors (like daily_admissions), and the outcome (like opel_level).

  2. parameters block: This is where you declare the unknown quantities you want the model to estimate. These are the things we have “beliefs” about, such as the coefficients for our predictors.

  3. model block: This is the heart of the program. Here, you specify the relationship between the data and the parameters. This is where you define the priors for the parameters and the likelihood of the data.

We will also often use a generated quantities block to calculate additional values of interest, such as making predictions for future days.
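To make the four blocks concrete, here is a skeleton Stan program held in a Python string, with a small check that lists the blocks in order. Everything in it is an assumption for illustration, not the model used later in this course: it assumes OPEL levels run from 1 to 4, uses a single standardised predictor, picks an ordered logistic likelihood, and invents the names beta_admissions, cutpoints, N_future and daily_admissions_future. The array[N] int syntax needs a reasonably recent Stan version. Compiling and fitting the program requires a Stan interface, which is not shown here.

```python
STAN_PROGRAM = """
data {
  int<lower=1> N;                              // number of observed days
  vector[N] daily_admissions;                  // predictor (assumed standardised)
  array[N] int<lower=1, upper=4> opel_level;   // outcome (assumed to run 1 to 4)
  int<lower=0> N_future;                       // number of days to predict
  vector[N_future] daily_admissions_future;    // forecast predictor values
}
parameters {
  real beta_admissions;                        // effect of admissions
  ordered[3] cutpoints;                        // thresholds between the 4 levels
}
model {
  // Priors: what we believe before seeing the data
  beta_admissions ~ normal(0, 1);
  cutpoints ~ normal(0, 5);
  // Likelihood: how the data relate to the parameters
  opel_level ~ ordered_logistic(beta_admissions * daily_admissions, cutpoints);
}
generated quantities {
  // Simulated OPEL level for each future day
  array[N_future] int opel_level_future;
  for (n in 1:N_future) {
    opel_level_future[n] = ordered_logistic_rng(
      beta_admissions * daily_admissions_future[n], cutpoints);
  }
}
"""

BLOCK_NAMES = ["data", "parameters", "model", "generated quantities"]

# Find where each block starts, to confirm they appear in the order Stan expects
block_positions = [STAN_PROGRAM.index(f"\n{name} {{") for name in BLOCK_NAMES]
blocks_in_order = block_positions == sorted(block_positions)

for name in BLOCK_NAMES:
    print(f"found block: {name}")
print(f"blocks appear in the expected order: {blocks_in_order}")
```

Note how the priors and the likelihood both live in the model block, while predictions for future days are produced in generated quantities, after the parameters have been sampled.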


This was a brief, high-level overview. Don’t worry if it doesn’t all make sense yet. It will become much clearer when we see it in action.

Let’s move on to the most important part.