Module 5: Building the Stan Model

This is the heart of our project. We will now write the Stan code to build our predictive model. The model will be a logistic regression model, which is a standard choice for binary outcomes (in our case, whether the system is in OPEL-4 or not).

We will save this code in a separate file named opel_model.stan.

1. The Full Stan Model Code

Create a new file in your RStudio project named opel_model.stan and paste the following code into it.

// Stan model for predicting OPEL-4 status

data {
  int<lower=0> N; // Number of historical data points
  int<lower=0, upper=1> y[N]; // Binary outcome: 1 if OPEL-4, 0 otherwise
  vector[N] daily_admissions;
  vector[N] staff_absences;
  
  // --- New data for forecasted predictors ---
  int<lower=0> N_future; // Number of days to forecast ahead
  vector[N_future] future_daily_admissions; // Forecasted admissions
  vector[N_future] future_staff_absences;   // Forecasted absences
}

parameters {
  real alpha; // Intercept
  real beta_admissions;
  real beta_absences;
}

model {
  // Priors - our initial beliefs about the parameters
  alpha ~ normal(-4, 2);
  beta_admissions ~ normal(0, 0.5);
  beta_absences ~ normal(0, 0.5);
  
  // Likelihood - how the data relates to the parameters
  y ~ bernoulli_logit(alpha + beta_admissions * daily_admissions + beta_absences * staff_absences);
}

generated quantities {
  vector[N_future] y_pred; // 0/1 outcome (simulated)
  vector[N_future] p_opel_4_pred; // Probability of OPEL-4
  
  // Loop through each future day and use the corresponding forecast
  for (t in 1:N_future) {
    real linear_predictor = alpha + 
                            beta_admissions * future_daily_admissions[t] + 
                            beta_absences * future_staff_absences[t];
                            
    p_opel_4_pred[t] = inv_logit(linear_predictor); // Probability of OPEL-4
    y_pred[t] = bernoulli_rng(p_opel_4_pred[t]);     // Simulated 0/1 outcome
  }
}

2. Understanding the Stan Code

data block

  • int<lower=0> N;: An integer N for the number of data points, which must be non-negative.
  • int<lower=0, upper=1> y[N];: An array of N integers for our outcome variable. We constrain it to be 0 or 1, representing our binary outcome (OPEL-4 or not).
  • vector[N] daily_admissions;: A vector to hold the daily admissions data.
  • vector[N] staff_absences;: A vector for the staff absences data.
  • int<lower=0> N_future;: The number of days we want to forecast into the future (we will set this to 10 in R).

parameters block

  • real alpha;: The intercept term of our logistic regression. This is the baseline log-odds of being in OPEL-4 when all predictors are zero.
  • real beta_admissions;: The coefficient for daily_admissions. This represents how much the log-odds of OPEL-4 change for each additional admission.
  • real beta_absences;: The coefficient for staff_absences.

model block

  • Priors: alpha ~ normal(-4, 2); This is our prior belief for the intercept. A normal(-4, 2) prior suggests that the baseline probability of OPEL-4 is very low, which is a reasonable assumption. The priors on the beta coefficients are centered at 0, indicating that before seeing the data, we don’t know if the effect will be positive or negative.
  • Likelihood: y ~ bernoulli_logit(...). This is the core of the model. It states that our outcome y follows a Bernoulli distribution (a coin flip) where the probability of success is determined by the logistic function of our linear predictor (alpha + beta_admissions * ...).

generated quantities block

This block is for calculations we want to perform after the model has been fitted.

  • y_pred: This is the simulated 0/1 outcome for each future day, based on the model’s predictions.

  • p_opel_4_pred: This is the probability of being in OPEL-4 for each future day. We will use this to calculate credible intervals.

  • Important Note on Predictors: In this version of the model, we are directly using forecasted future values for daily_admissions and staff_absences. These forecasts would typically be generated by separate time-series models (as discussed in Module 3). This approach provides a more robust and realistic prediction by incorporating the expected future trends of the predictors.


With the model defined, the next step is to use R to pass our data to this Stan program, run the model, and analyse the results.