Module 5: Building the Stan Model
This is the heart of our project. We will now write the Stan code to build our predictive model. The model will be a logistic regression model, which is a standard choice for binary outcomes (in our case, whether the system is in OPEL-4 or not).
We will save this code in a separate file named opel_model.stan
.
1. The Full Stan Model Code
Create a new file in your RStudio project named opel_model.stan
and paste the following code into it.
// Stan model for predicting OPEL-4 status
data {
int<lower=0> N; // Number of historical data points
int<lower=0, upper=1> y[N]; // Binary outcome: 1 if OPEL-4, 0 otherwise
vector[N] daily_admissions;
vector[N] staff_absences;
// --- New data for forecasted predictors ---
int<lower=0> N_future; // Number of days to forecast ahead
vector[N_future] future_daily_admissions; // Forecasted admissions
vector[N_future] future_staff_absences; // Forecasted absences
}
parameters {
real alpha; // Intercept
real beta_admissions;
real beta_absences;
}
model {
// Priors - our initial beliefs about the parameters
4, 2);
alpha ~ normal(-0, 0.5);
beta_admissions ~ normal(0, 0.5);
beta_absences ~ normal(
// Likelihood - how the data relates to the parameters
y ~ bernoulli_logit(alpha + beta_admissions * daily_admissions + beta_absences * staff_absences);
}
generated quantities {
vector[N_future] y_pred; // 0/1 outcome (simulated)
vector[N_future] p_opel_4_pred; // Probability of OPEL-4
// Loop through each future day and use the corresponding forecast
for (t in 1:N_future) {
real linear_predictor = alpha +
beta_admissions * future_daily_admissions[t] +
beta_absences * future_staff_absences[t];
// Probability of OPEL-4
p_opel_4_pred[t] = inv_logit(linear_predictor); // Simulated 0/1 outcome
y_pred[t] = bernoulli_rng(p_opel_4_pred[t]);
} }
2. Understanding the Stan Code
data
block
int<lower=0> N;
: An integerN
for the number of data points, which must be non-negative.int<lower=0, upper=1> y[N];
: An array ofN
integers for our outcome variable. We constrain it to be 0 or 1, representing our binary outcome (OPEL-4 or not).vector[N] daily_admissions;
: A vector to hold the daily admissions data.vector[N] staff_absences;
: A vector for the staff absences data.int<lower=0> N_future;
: The number of days we want to forecast into the future (we will set this to 10 in R).
parameters
block
real alpha;
: The intercept term of our logistic regression. This is the baseline log-odds of being in OPEL-4 when all predictors are zero.real beta_admissions;
: The coefficient fordaily_admissions
. This represents how much the log-odds of OPEL-4 change for each additional admission.real beta_absences;
: The coefficient forstaff_absences
.
model
block
- Priors:
alpha ~ normal(-4, 2);
This is our prior belief for the intercept. Anormal(-4, 2)
prior suggests that the baseline probability of OPEL-4 is very low, which is a reasonable assumption. The priors on thebeta
coefficients are centered at 0, indicating that before seeing the data, we don’t know if the effect will be positive or negative. - Likelihood:
y ~ bernoulli_logit(...)
. This is the core of the model. It states that our outcomey
follows a Bernoulli distribution (a coin flip) where the probability of success is determined by the logistic function of our linear predictor (alpha + beta_admissions * ...
).
generated quantities
block
This block is for calculations we want to perform after the model has been fitted.
y_pred
: This is the simulated 0/1 outcome for each future day, based on the model’s predictions.p_opel_4_pred
: This is the probability of being in OPEL-4 for each future day. We will use this to calculate credible intervals.Important Note on Predictors: In this version of the model, we are directly using forecasted future values for
daily_admissions
andstaff_absences
. These forecasts would typically be generated by separate time-series models (as discussed in Module 3). This approach provides a more robust and realistic prediction by incorporating the expected future trends of the predictors.
With the model defined, the next step is to use R to pass our data to this Stan program, run the model, and analyse the results.