Module 3: Forecasting Predictors with ARIMA

In real-world scenarios, when you want to predict future outcomes, you often don’t have the future values of your predictor variables. For instance, when forecasting OPEL-4 for the next 10 days, you won’t know the exact daily admissions or staff absences for those future days.

Instead of making simplifying assumptions (like using historical means or sampling from recent data), a more robust approach is to forecast the predictors themselves using time-series models. This module will introduce you to using ARIMA models to forecast daily_admissions and staff_absences.

1. Introduction to ARIMA Models

ARIMA (AutoRegressive Integrated Moving Average) models are a popular class of statistical models for analysing and forecasting time-series data. Differencing lets them handle trending (non-stationary) series, and the seasonal extension (often called SARIMA) adds terms for repeating patterns such as a weekly cycle.

  • AR (AutoRegressive): Uses the relationship between an observation and a number of lagged observations.
  • I (Integrated): Differences the raw observations (subtracts the previous value from each observation) to make the time series stationary, for example by removing a trend. Differencing at the seasonal lag does the same for a seasonal pattern.
  • MA (Moving Average): Models the current observation as a linear combination of the current and past forecast errors (the random shocks at earlier time points).
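The three components above can be illustrated with a minimal sketch in base R. The series here are simulated purely for illustration (they are not the module's data), and the coefficient values 0.7 and 0.5 are arbitrary assumptions.

```r
# Illustrative only: simulated series, not the module's NHS dataset.
set.seed(42)

# AR(1): each value depends on the previous value plus new noise
ar_series <- arima.sim(model = list(ar = 0.7), n = 200)

# MA(1): each value depends on the current and the previous noise term
ma_series <- arima.sim(model = list(ma = 0.5), n = 200)

# I: a random walk with drift trends upward and is non-stationary;
# taking first differences removes the trend
trend_series <- cumsum(rnorm(200, mean = 0.5))
differenced  <- diff(trend_series)

# Fit an ARIMA(1, 0, 0) model to the AR series and inspect the
# estimated coefficients ("ar1" and "intercept")
fit_ar <- arima(ar_series, order = c(1, 0, 0))
print(coef(fit_ar))
```

The estimated ar1 coefficient should land reasonably close to the value used in the simulation, and the differenced series should fluctuate around the drift rather than trend upward.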

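The forecast package in R provides the auto.arima() function, which searches over candidate ARIMA orders and selects the model with the lowest information criterion (AICc by default). This makes it very convenient to use, although the default stepwise search is not guaranteed to find the best possible model.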
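As a rough sketch of what auto.arima() returns and how to inspect it, the example below fits a model to a made-up daily series with a weekly pattern. The series and the names demo_ts and fit_demo are assumptions for illustration only.

```r
library(forecast)

# Illustrative only: a synthetic daily series with a weekly pattern
set.seed(1)
n_days <- 140
weekly_pattern <- rep(c(5, 3, 0, -1, 0, -3, -4), length.out = n_days)
demo_counts <- 100 + weekly_pattern + rnorm(n_days, sd = 2)
demo_ts <- ts(demo_counts, frequency = 7)

# Let auto.arima() choose the orders (AICc is the default criterion)
fit_demo <- auto.arima(demo_ts)

# Show the selected model, its coefficients and fit statistics
summary(fit_demo)

# Extract the selected orders: p, d, q, plus P, D, Q and the
# frequency when a seasonal model was chosen
selected_order <- arimaorder(fit_demo)
print(selected_order)

# Rough residual check: a small p-value suggests leftover autocorrelation.
# This call does not adjust the degrees of freedom for fitted parameters.
lb_test <- Box.test(residuals(fit_demo), lag = 14, type = "Ljung-Box")
print(lb_test)
```

Checking the selected orders and the residuals before relying on the forecasts is a cheap safeguard, because an automatic search can still settle on a model that fits poorly.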

2. Forecasting Daily Admissions and Staff Absences

We will use the auto.arima() function to forecast our synthetic daily_admissions and staff_absences for the next 10 days. These forecasted values will then be passed to our Stan model.

# Load necessary libraries
library(tidyverse)
library(forecast)
library(lubridate)

# Load the synthetic data (ensure you've run Module 2 to create nhs_opel_data.csv)
all_data <- read_csv("nhs_opel_data.csv")

# Convert each predictor column to a time series (ts) object for forecasting
# We assume a weekly seasonality (frequency = 7) for these daily data
# Note: ts() assumes the rows are in date order with no missing days

# Forecast Daily Admissions
admissions_ts <- ts(all_data$daily_admissions, frequency = 7)
fit_admissions <- auto.arima(admissions_ts)
forecast_admissions <- forecast(fit_admissions, h = 10) # Forecast 10 days ahead

print(forecast_admissions)

# Forecast Staff Absences
absences_ts <- ts(all_data$staff_absences, frequency = 7)
fit_absences <- auto.arima(absences_ts)
forecast_absences <- forecast(fit_absences, h = 10) # Forecast 10 days ahead

print(forecast_absences)

# The 'mean' component of these forecast objects will be used as inputs
# for the future predictor values in our Stan model.
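One way to turn a forecast object into plain inputs for Stan is sketched below. The helper forecast_to_vector() and the list names N_future and admissions_future are hypothetical, since the Stan model's actual data block is not shown in this module; the series is synthetic so that the snippet runs on its own.

```r
library(forecast)

# Hypothetical helper: convert a forecast object's point forecasts
# into a plain numeric vector suitable for a Stan data list
forecast_to_vector <- function(fc, floor_at_zero = TRUE) {
  values <- as.numeric(fc$mean)                 # drop the ts attributes
  if (floor_at_zero) values <- pmax(values, 0)  # counts cannot be negative
  values
}

# Illustrative only: synthetic stand-in for daily_admissions
set.seed(123)
demo_ts <- ts(rpois(120, lambda = 50), frequency = 7)
demo_fc <- forecast(auto.arima(demo_ts), h = 10)

# Assumed names; match them to the data block of your own Stan model
stan_future <- list(
  N_future          = 10L,
  admissions_future = forecast_to_vector(demo_fc)
)
str(stan_future)
```

Point forecasts are generally not whole numbers, so if the Stan model declares a predictor as an integer, the values would need rounding first.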

3. Understanding the Forecast Output

The forecast() function provides several components, including:

  • mean: The point forecasts for the next h periods.
  • lower and upper: Matrices holding the lower and upper bounds of the prediction intervals, with one column per level (80% and 95% by default).

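For our Stan model, we will primarily use the mean values as the expected future values of our predictors. Note that this treats the forecast predictors as if they were known exactly, so the resulting predictions will understate the true uncertainty. In more advanced Bayesian models, you could incorporate the uncertainty from these forecasts into your Stan model.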
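A minimal sketch of how the forecast uncertainty could be extracted is shown below, under the assumption that simulated future paths are an acceptable way to represent it. The series is synthetic, and the names interval_95, n_paths and sim_paths are illustrative only.

```r
library(forecast)

# Illustrative only: synthetic stand-in for a daily predictor
set.seed(2024)
demo_ts  <- ts(rpois(120, lambda = 50), frequency = 7)
fit_demo <- auto.arima(demo_ts)
demo_fc  <- forecast(fit_demo, h = 10, level = c(80, 95))

# Point forecasts with their 95% prediction interval, one row per day
interval_95 <- data.frame(
  day   = 1:10,
  mean  = as.numeric(demo_fc$mean),
  lower = as.numeric(demo_fc$lower[, "95%"]),
  upper = as.numeric(demo_fc$upper[, "95%"])
)
print(interval_95)

# Simulate possible future paths from the fitted model.
# Each column is one simulated 10-day trajectory.
n_paths <- 200
sim_paths <- replicate(
  n_paths,
  as.numeric(simulate(fit_demo, nsim = 10, future = TRUE))
)
dim(sim_paths)
```

Running the outcome model once per simulated path and pooling the results is one simple way to let predictor uncertainty flow through to the final predictions, at the cost of extra computation.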


Now that we know how to forecast our predictors, let’s move on to the next module.