What do we mean by glass-box, exactly?

Refining the definition of glass-box modeling: part 1

Are glass-box models a novelty or a necessity?

Categories: model selection, glass-box modeling, interpretability, analysis, R

Author: Ryan Peterson
Affiliation: University of Iowa
Published: November 24, 2025

Note: Reviewed by Logan Harris on November 21, 2025

Introduction

What are glass-box models?

Glass-box models are transparent, intrinsically interpretable alternatives to their opaque counterparts, black box models. Data scientists typically consider regression-based methods and sparse decision trees as “glass-box”. In this post, I describe the benefits of glass-box methods for modeling data, arguing for the importance of intrinsic interpretability. An intrinsically interpretable model is one whose internal logic is clear enough that a human can see why it makes each prediction, not merely trust a separate explainer.

I hope to convince you that the prevailing definition of glass-box models requires significant refinement in order to be true to its name.

Just because a model is a “simple regression” does not mean it is interpretable. In fact, model selection methods that attempt to create more interpretable models often invalidate inference by making p-values unjustifiably low, and confidence intervals (CIs) unrealistically narrow.

Are glass-box models a novelty? Or a necessity?

Consider the useful analogy in the image below, which compares two very different uses of glass as a window into a process. On the left, a penny press. I always considered these a fun novelty (I still don’t understand how they are legal!). The glass box is part of the appeal - you get to watch a penny turn into a souvenir. Cool!

On the right, you see scientists at the Hanford Site in Washington peering through shield windows while working with the plutonium that would eventually be used in the world’s first atomic bomb explosion. One intact version of these shield windows used during the Manhattan Project is currently priced at 10 million dollars, although you can purchase a fragment of one for much less (shop.minimuseum.com).

The example of the Hanford Site clearly shows that in high-stakes decisions or systems, opacity is dangerous. Sometimes, then, a glass-box model isn’t a novelty like the penny press - it’s a requirement for oversight and correction.

If the AI doomers are right that AI represents an existential threat on par with nuclear war… it is easy to see that glass-box models should (at least in many cases) be classified as a necessity, not a novelty.

Why do we build models in the first place?

Let’s start with a quote you have probably heard before from George Box:

All models are wrong, but some are useful.

What makes them useful? Good models help us to:

  1. make good predictions
  2. understand phenomena.

The mix of these two goals is entirely context dependent. In my experience, it is rare for the focus to be entirely on one of them.

A brief history of model selection

Hypothesis testing

Starting in the 1700s and through the 1980s, hypothesis testing and p-values were the main way to build models. In the typical setup, two competing models are compared (e.g., a null and an alternative), and evidence against the null model is summarized via a p-value.
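As a minimal sketch of this paradigm in R (simulated data; all names here are invented for illustration):

```r
# Simulated example: evidence against a null model, summarized by a p-value
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)  # true slope is 0.3

fit_null <- lm(y ~ 1)  # null model: intercept only
fit_alt  <- lm(y ~ x)  # alternative model: adds x

# F-test comparing the two nested models
anova(fit_null, fit_alt)
```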

Information criteria

In the 1970s, Akaike changed the game with his famous information criterion, AIC, and the modeling goal substantively changed from one of testing to one of optimization. Instead of model A vs. model B, we now had a whole set of models that we could compare at once to find the best.
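In R, this optimization view amounts to scoring an entire candidate set at once; a minimal simulated sketch (all names invented for illustration):

```r
# Simulated example: score a set of candidate models at once with AIC
set.seed(2)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)  # x3 is pure noise

candidates <- list(
  m1 = lm(y ~ x1),
  m2 = lm(y ~ x1 + x2),
  m3 = lm(y ~ x1 + x2 + x3)
)

sort(sapply(candidates, AIC))  # lower AIC is better
```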

Computationally-intensive validation

More recently, however, “computationally-intensive” validation opened Pandora’s box. In this era, the data scientist is limited only by their imagination. Any model can be compared against any other and fed through an optimization pipeline that uses computationally-intensive validation to ensure bad models (that is, models that predict poorly) get sifted out and never see the light of day.
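A minimal sketch of this sifting, using simulated data and 5-fold cross-validation (the setup and names are invented for illustration):

```r
# Simulated sketch: 5-fold cross-validation to sift out the worse model
set.seed(3)
n <- 500
x <- rnorm(n)
y <- x + 0.5 * x^2 + rnorm(n)  # the truth is curved
dat <- data.frame(x = x, y = y)

folds <- sample(rep(1:5, length.out = n))
rmse <- function(form) {
  errs <- sapply(1:5, function(k) {
    fit <- lm(form, data = dat[folds != k, ])         # train on 4 folds
    pred <- predict(fit, newdata = dat[folds == k, ]) # predict the held-out fold
    (dat$y[folds == k] - pred)^2
  })
  sqrt(mean(unlist(errs)))  # out-of-sample root mean squared error
}

c(linear = rmse(y ~ x), quadratic = rmse(y ~ x + I(x^2)))
```

Here the misspecified linear model has a visibly worse out-of-sample error, so the pipeline would discard it.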

Now, so-called black box modeling approaches, where the desire to understand phenomena is completely defenestrated in the pursuit of better predictions, are ubiquitous.

I suppose that’s what Pandora deserved for having an opaque box to begin with.

Now what?

In light of these advancements, “scientific” model sifting via interdisciplinary expertise remains essential, and is arguably more important than ever. Burnham and Anderson suggest that scientists should build a small set of models to clearly and uniquely represent their hypotheses a priori:

“…it seems poor practice to consider all possible models; surely some science can be brought to bear on such an unthinking approach (otherwise, the scientist is superfluous)”

So in the present era (with potentially huge data sets and lots of features), how can we use domain expertise efficiently?

This is an important question I’ve devoted a lot of effort to, but it is not the topic of today’s post.

On black & glass boxes

Black box machine learning (ML) methods are not designed with interpretability in mind. Black box models:

  • include random forests, ensembles/super learners, neural networks, XGBoost, etc.
  • can capture nonlinearities and/or high-order interactions well.
  • may have predictions that are extrinsically explainable (e.g., via post-hoc explainers such as SHAP or LIME). This differs from intrinsic interpretability, however.

The term glass-box model arose to distinguish traditional statistical models from their black box counterparts.

Generally, the following are considered glass-box:

  • Regression (linear, logistic, etc.)
  • Penalized regression (if high-dimensional)
  • Interpretable decision trees

Regression methods allow us to estimate how an outcome \(Y\) changes with, or is impacted by, a covariate \(X\), holding other covariates constant. “Holding confounders constant” is, in my view, a statistical sleight of hand that is often very poorly understood and misapplied. At its best, though, the language and techniques of regression allow us to get closer to a causal interpretation of \(X \rightarrow Y\). Modern causal inference methods formalize this and can yield additional insights.

In traditional statistical models, we not only describe these model-based associations, we seek inferences about them (often with p-values and CIs).
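For a standard linear model in R, these inferential quantities come essentially for free from the fitted object; a minimal simulated sketch (names invented for illustration):

```r
# Simulated sketch: p-values and confidence intervals from a fitted model
set.seed(7)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
fit <- lm(y ~ x)

summary(fit)$coefficients   # estimates, SEs, t statistics, p-values
confint(fit, level = 0.95)  # 95% CIs for each parameter
```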

Note

Confidence intervals and p-values are hard to obtain in black box ML settings.

Not all regression models are glass-box.

If you disagree, please keep reading.

A refined glass-box model definition

Here’s my suggestion for a refined definition of “glass-box model”.

Tip: Glass-box model

A statistical model expressed in terms of a linear combination of a parsimonious set of meaningful parameters with quantified uncertainty. The best glass-box models are transparent and small, quantify parameter uncertainty honestly, and still predict well.

Transparency is reduced as more features are added, especially features that render models difficult to interpret (like interactions) or that involve complex transformations. This notion of transparency resembles typical applications of Occam’s Razor in model selection, where the number of parameters translates directly into simplicity, except that we consider some parameters (coefficients on interactions, for instance) more complex than others (main effects).

Uncertainty quantification is also a key component of this definition. Inferential tools like p-values and CIs are a cornerstone of science, replicability, and transparency. Models without such measures are limited to description, and may therefore have poor generalizability.

Linearity: While the definition contains the word “linear,” its language is careful to encompass generalized linear models in addition to linear regression.

Under this definition, transparency (conversely, opacity) is a spectrum:

  • The most transparent model is the “null” model
  • Single-predictor models, often used to describe “unadjusted” relationships, might be labeled as the next most transparent.*
  • On the other end might be large-language models with billions of interconnected parameters.

My college friend Tom smoked cigarettes, despite being “pre-med”. I asked him about why he smoked given the health consequences. He said “It’s not bad for me until I’ve smoked 20 pack-years worth! That’s what the literature says!”

In fact, this 20+ pack-year threshold is all over the literature and official screening guidelines. Frank Harrell refers to this as dichotomania. It illustrates the appeal of a glass-box model in the most negative possible light, but an appeal nonetheless.

Let’s go through a simple example to illustrate.

Example: HERS Dataset

The data

The Heart and Estrogen/progestin Replacement Study (HERS) was a clinical trial of hormone therapy for prevention of recurrent heart attacks and death among post-menopausal women with existing coronary heart disease.

This data set contains 27 baseline features for n=2571 patients, and complete 1-year follow-up cholesterol data.

Baseline covariates include age, baseline cholesterol, clinical characteristics, treatment, patient-reported health/activity, diabetes status, blood biomarkers, etc.

Let’s take a fresh look at the HERS data to build a glass-box model for HDL cholesterol 1 year post-baseline. We might want to do this for several reasons: for instance, to plan a trial that considers whether a particular intervention will impact cholesterol, where we must pre-specify our statistical analysis and the model used to test our hypothesis. Specifically, the question might be posed: what patient factors should our model for cholesterol include as “precision” variables?

Note: This setting is “easy”…

We have \(n = 2571 \gg p = 32\), a relatively Gaussian response, and a small set of interpretable features. However, we’ll see that even in this “easy” setting, building a glass-box model (under the refined definition) is not trivial.

Describing the data

Here’s our outcome:

library(tidyverse)
library(ggplot2)
library(gtsummary)
library(kableExtra)
library(recipes)
library(selectInferToolkit) #github.com/petersonR/selectInferToolkit

data("hers")

hers %>%
  ggplot(aes(x = hdl1)) +
  geom_histogram() +
  theme_minimal() +
  xlab("HDL (1 year ahead)")

hers %>%
  select(where(is.numeric)) %>%
  select(-hdl1) %>%
  tbl_summary() 
Characteristic N = 2,5711
age 67 (62, 72)
weight 71 (62, 81)
bmi 27.8 (24.6, 31.7)
waist 91 (82, 100)
whr 0.87 (0.81, 0.92)
glucose 99 (91, 114)
ldl 141 (119, 166)
hdl 49 (41, 57)
tg 157 (116, 208)
sbp 134 (121, 146)
dbp 72 (67, 80)
1 Median (Q1, Q3)

And our predictors:

hers %>%
  select(where(is.factor)) %>%
  tbl_summary() 
Characteristic N = 2,5711
ht
    placebo 1,303 (51%)
    hormone therapy 1,268 (49%)
raceth
    White 2,299 (89%)
    African American 184 (7.2%)
    Other 88 (3.4%)
smoking 328 (13%)
drinkany 1,020 (40%)
exercise 1,012 (39%)
physact
    much less active 170 (6.6%)
    somewhat less active 459 (18%)
    about as active 854 (33%)
    somewhat more active 794 (31%)
    much more active 294 (11%)
globrat
    poor 46 (1.8%)
    fair 536 (21%)
    good 1,229 (48%)
    very good 648 (25%)
    excellent 112 (4.4%)
medcond 947 (37%)
htnmeds 2,107 (82%)
statins 951 (37%)
diabetes 662 (26%)
dmpills 246 (9.6%)
insulin 244 (9.5%)
1 n (%)

Let’s clean it up a bit for modeling. The steps below use the recipes package to standardize the data and create indicators for each of the factor variables.

hers <- hers %>%
  mutate(physact = relevel(physact, "about as active"),
         globrat = relevel(globrat, "good"))

rec_obj <- recipe(hdl1 ~ ., data = hers)

rec_obj <- rec_obj %>%
  step_dummy(all_nominal(), keep_original_cols = FALSE) %>%
  step_center(all_predictors()) %>%
  step_scale(all_numeric()) %>%
  prep()

df <- bake(rec_obj, new_data = hers)

Unadjusted models

We can look at the unadjusted associations via gtsummary’s tbl_uvregression() function below (recall, these single-predictor models are close to the most transparent, interpretable models we have, but they probably don’t predict well).

tbl_uvregression(df, method = "lm", y = hdl1)
Characteristic N Beta 95% CI p-value
age 2,571 0.11 0.08, 0.15 <0.001
weight 2,571 -0.19 -0.23, -0.15 <0.001
bmi 2,571 -0.19 -0.23, -0.15 <0.001
waist 2,571 -0.23 -0.26, -0.19 <0.001
whr 2,571 -0.20 -0.24, -0.16 <0.001
glucose 2,571 -0.15 -0.19, -0.11 <0.001
ldl 2,571 -0.01 -0.05, 0.03 0.5
hdl 2,571 0.73 0.70, 0.76 <0.001
tg 2,571 -0.36 -0.40, -0.33 <0.001
sbp 2,571 0.00 -0.03, 0.04 0.8
dbp 2,571 0.02 -0.01, 0.06 0.2
ht_hormone.therapy 2,571 0.17 0.13, 0.21 <0.001
raceth_African.American 2,571 0.04 0.00, 0.08 0.061
raceth_Other 2,571 -0.04 -0.08, 0.00 0.056
smoking_yes 2,571 -0.04 -0.08, 0.00 0.057
drinkany_yes 2,571 0.14 0.10, 0.18 <0.001
exercise_yes 2,571 0.05 0.02, 0.09 0.006
physact_much.less.active 2,571 -0.05 -0.09, -0.01 0.008
physact_somewhat.less.active 2,571 -0.06 -0.10, -0.02 0.004
physact_somewhat.more.active 2,571 0.07 0.03, 0.11 <0.001
physact_much.more.active 2,571 0.02 -0.02, 0.06 0.2
globrat_poor 2,571 -0.03 -0.07, 0.01 0.2
globrat_fair 2,571 -0.03 -0.07, 0.00 0.079
globrat_very.good 2,571 0.03 0.00, 0.07 0.087
globrat_excellent 2,571 0.06 0.02, 0.09 0.005
medcond_yes 2,571 -0.01 -0.05, 0.03 0.7
htnmeds_yes 2,571 -0.06 -0.10, -0.02 0.004
statins_yes 2,571 0.01 -0.02, 0.05 0.5
diabetes_yes 2,571 -0.16 -0.20, -0.13 <0.001
dmpills_yes 2,571 -0.13 -0.17, -0.09 <0.001
insulin_yes 2,571 -0.08 -0.12, -0.05 <0.001
Abbreviation: CI = Confidence Interval

So… in these unadjusted relationships, nearly everything is significantly associated with HDL 1-year ahead… Are these helpful?

A kitchen sink approach

In settings like this when \(n > p\), and especially when \(n > 10p\), quite a few statisticians suggest that a “full model approach” is best for inference. We’ll tackle that debate in another post.

For now, let’s check out where this kitchen sink approach (as in, throw all predictors into the model except the kitchen sink) gets us:

fit <- lm(hdl1 ~., data = df)

tbl_regression(fit) %>%
  bold_p()
Characteristic Beta 95% CI p-value
age 0.03 0.01, 0.06 0.018
weight -0.03 -0.11, 0.04 0.4
bmi -0.01 -0.08, 0.06 0.8
waist 0.01 -0.08, 0.10 0.9
whr -0.02 -0.06, 0.03 0.5
glucose -0.01 -0.05, 0.02 0.4
ldl 0.00 -0.03, 0.03 >0.9
hdl 0.69 0.66, 0.72 <0.001
tg -0.05 -0.08, -0.03 <0.001
sbp -0.01 -0.04, 0.02 0.5
dbp 0.02 -0.01, 0.05 0.3
ht_hormone.therapy 0.19 0.16, 0.21 <0.001
raceth_African.American 0.03 0.01, 0.06 0.016
raceth_Other -0.03 -0.05, 0.00 0.038
smoking_yes -0.01 -0.04, 0.02 0.5
drinkany_yes 0.01 -0.01, 0.04 0.3
exercise_yes 0.01 -0.01, 0.04 0.4
physact_much.less.active -0.01 -0.04, 0.01 0.3
physact_somewhat.less.active 0.01 -0.02, 0.04 0.5
physact_somewhat.more.active -0.01 -0.04, 0.02 0.6
physact_much.more.active -0.03 -0.06, 0.00 0.025
globrat_poor -0.01 -0.03, 0.02 0.7
globrat_fair -0.01 -0.03, 0.02 0.7
globrat_very.good 0.02 -0.01, 0.04 0.3
globrat_excellent 0.01 -0.02, 0.03 0.6
medcond_yes 0.00 -0.02, 0.03 0.8
htnmeds_yes -0.03 -0.05, 0.00 0.054
statins_yes 0.01 -0.02, 0.04 0.5
diabetes_yes 0.01 -0.04, 0.05 0.7
dmpills_yes -0.03 -0.06, 0.01 0.10
insulin_yes -0.02 -0.06, 0.01 0.2
Abbreviation: CI = Confidence Interval

OK - we have a good starting point now. A few questions arise.

Is this a good model?

The predictive accuracy (\(R^2\)) is 0.58. Not bad, but actually not great considering a model containing only baseline HDL achieves an \(R^2\) of 0.53.

Is this a glass-box model?

Well, we can learn quickly that baseline HDL, triglycerides, treatment group, race and maybe physical activity are significant predictors of HDL 1-year out.

A follow-up question: are the remaining “insignificant” variables therefore unimportant? The answer is NO. In fact, we are missing something big… More on this soon.

Let’s look back at our new glass-box model definition:

A statistical model expressed in terms of a linear combination of a parsimonious set of meaningful parameters with quantified uncertainty.

How does our kitchen sink model stack up?

  • Linear? ✅
  • Predicts well? 🤷
  • Parsimonious? ❌
  • Meaningful parameters? ❌*
  • Valid uncertainty: ✅

So, while this checks some of the boxes, I would not consider this kitchen sink model a glass-box model. How can we make this a better glass box?

On the face of it, you might think it’s straightforward to interpret each parameter of the kitchen sink model. For instance, holding other variables constant, for every 1 SD increase in age, the expected HDL 1 year after baseline increases by 0.03 SDs. However, for other parameters, it’s not so easy.

BMI (body mass index), WHR (waist-to-hip ratio), weight, and waist circumference are highly collinear with each other; they all measure adiposity. In the kitchen sink model, it appeared that none of these was “significant”. However, the meaning of these parameters in the full model is murky: “the effect of BMI holding WHR and waist circumference constant” is quite difficult to conceptualize, since these variables are so highly related.

To see this, consider the model below, where we remove three of the four adiposity variables so that only waist circumference remains as a “singular flagship” for adiposity:

fit <- lm(hdl1 ~. - bmi - whr - weight, data = df)

tbl_regression(fit) %>%
  bold_p()
Characteristic Beta 95% CI p-value
age 0.04 0.01, 0.07 0.009
waist -0.04 -0.07, -0.01 0.009
glucose -0.02 -0.05, 0.02 0.4
ldl 0.00 -0.03, 0.02 >0.9
hdl 0.69 0.66, 0.72 <0.001
tg -0.05 -0.08, -0.03 <0.001
sbp -0.01 -0.04, 0.02 0.5
dbp 0.02 -0.01, 0.05 0.3
ht_hormone.therapy 0.19 0.16, 0.21 <0.001
raceth_African.American 0.03 0.00, 0.06 0.020
raceth_Other -0.03 -0.05, 0.00 0.042
smoking_yes -0.01 -0.03, 0.02 0.6
drinkany_yes 0.01 -0.01, 0.04 0.4
exercise_yes 0.01 -0.01, 0.04 0.4
physact_much.less.active -0.02 -0.04, 0.01 0.3
physact_somewhat.less.active 0.01 -0.02, 0.04 0.5
physact_somewhat.more.active -0.01 -0.04, 0.02 0.6
physact_much.more.active -0.03 -0.06, 0.00 0.026
globrat_poor -0.01 -0.03, 0.02 0.7
globrat_fair 0.00 -0.03, 0.02 0.7
globrat_very.good 0.02 -0.01, 0.04 0.3
globrat_excellent 0.01 -0.02, 0.03 0.6
medcond_yes 0.00 -0.02, 0.03 0.8
htnmeds_yes -0.03 -0.05, 0.00 0.054
statins_yes 0.01 -0.02, 0.04 0.5
diabetes_yes 0.01 -0.04, 0.05 0.7
dmpills_yes -0.03 -0.06, 0.00 0.095
insulin_yes -0.03 -0.06, 0.01 0.14
Abbreviation: CI = Confidence Interval

We’ve reduced our model dimension and in fact now discover that waist circumference was significant after all! This is the benefit of a small degree of critical thought applied to the kitchen sink model - we’ve gotten one step closer to a good glass-box model.
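The same phenomenon is easy to reproduce in miniature with simulated data: two noisy measurements of one latent adiposity trait, each strongly predictive alone, inflate each other’s standard errors when entered together. A hedged sketch (all names invented for illustration):

```r
# Simulated sketch: two noisy measurements of one latent adiposity trait
set.seed(4)
n <- 1000
adiposity <- rnorm(n)                    # latent trait (unobserved in practice)
bmi   <- adiposity + rnorm(n, sd = 0.2)  # noisy measurement #1
waist <- adiposity + rnorm(n, sd = 0.2)  # noisy measurement #2
y <- adiposity + rnorm(n)

# Alone, waist has a small standard error...
se_alone <- coef(summary(lm(y ~ waist)))["waist", "Std. Error"]
# ...alongside its collinear twin, its standard error balloons
se_both  <- coef(summary(lm(y ~ waist + bmi)))["waist", "Std. Error"]

c(alone = se_alone, together = se_both)
```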

Selecting a glass-box model…

Let’s say we want to get an even better glass-box model. Lots of stuff in our last model was insignificant; do we really need to keep them all? If we use fewer predictors, our model becomes more parsimonious, and thereby becomes more transparent.

Here’s how we could go about this:

  1. Contextual model refinement: This is a great choice and should be the first go-to, but it can be hard when there isn’t much context on the features, or in high dimensions. We already did this by keeping only waist as a candidate adiposity predictor.
  2. Stepwise selection (select with AIC, BIC, p-values, etc.)
  3. Best-subsets (select with AIC or BIC)
  4. Lasso/penalized regression (Simultaneous selection & estimation with shrinkage toward zero)
  5. Bayesian methods
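As a sketch of option 4, here is how a lasso fit might look on simulated data, assuming the glmnet package is installed (the variable setup is invented for illustration):

```r
library(glmnet)  # assumes glmnet is installed

# Simulated sketch: 10 candidate predictors, only the first two matter
set.seed(5)
n <- 300; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(n)

# cv.glmnet chooses the penalty by cross-validation; the lasso shrinks
# most coefficients exactly to zero (simultaneous selection & estimation)
cvfit <- cv.glmnet(X, y)
coef(cvfit, s = "lambda.1se")
```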

For now, let’s show the results of a model selected via stepwise selection with AIC:

fit_stepAIC <- MASS::stepAIC(fit, direction = "both", trace = 0)
tbl_regression(fit_stepAIC) %>%
  bold_p()
Characteristic Beta 95% CI p-value
age 0.03 0.01, 0.06 0.015
waist -0.04 -0.06, -0.01 0.008
hdl 0.69 0.67, 0.72 <0.001
tg -0.06 -0.08, -0.03 <0.001
ht_hormone.therapy 0.19 0.16, 0.21 <0.001
raceth_African.American 0.03 0.00, 0.05 0.030
raceth_Other -0.03 -0.05, 0.00 0.032
physact_much.less.active -0.02 -0.05, 0.00 0.10
physact_much.more.active -0.03 -0.05, 0.00 0.034
htnmeds_yes -0.03 -0.05, 0.00 0.030
dmpills_yes -0.03 -0.06, -0.01 0.012
insulin_yes -0.03 -0.06, -0.01 0.015
Abbreviation: CI = Confidence Interval

Whoa! Such low p-values!!!

After this selective process, it seems that we can also claim significance for age, waist, medications, and insulin! And it’s a smaller model so it’s a more “glass-box” approach!

Right?

…Right?

……

Well, no. This is a classic example of an UPSI.

Tip: UPSI

A term I’m coining right now that stands for “Unadjusted Post-Selection Inference”. It is also a statistical “oopsie”.

You should know deep down that the p-values from the model selected via stepwise AIC are too low: the data set has already been used to select the model, so of course everything that is selected is more likely to look significant.

We’ll tackle UPSIs in more depth in another post, as well as “better” alternatives like selective inference. In short, inference after selection gets tricky. Ignoring the variability inherent in the model selection process has been characterized as a common “bad” practice in statistics, and UPSIs generally lead to invalid, non-replicable inferences.
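To see how severe an UPSI can be, here is a small simulation (a sketch, not this post’s formal analysis): generate pure-noise data, run stepwise AIC, and collect the p-values of whatever survives. If these p-values were valid, only about 5% would fall below 0.05.

```r
library(MASS)  # ships with R; provides stepAIC

# Global-null simulation: y is pure noise, unrelated to all 10 predictors
set.seed(6)
one_run <- function() {
  dat <- as.data.frame(matrix(rnorm(100 * 11), 100, 11))
  names(dat)[11] <- "y"
  fit <- stepAIC(lm(y ~ ., data = dat), trace = 0)
  coefs <- coef(summary(fit))
  # p-values of whatever survived selection (drop the intercept)
  coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)"]
}
pvals <- unlist(replicate(200, one_run()))

# Far more than 5% of the surviving p-values fall below 0.05,
# despite there being no real signal at all
mean(pvals < 0.05)
```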

Then… why are UPSIs so common?

Well, you tell me why anyone would want their p-values to be lower than they should be… 🙄

Eyerolls and publication bias aside, even well-intentioned statisticians find it difficult to properly adjust these p-values for the selection process. Some say it’s impossible. Again, we’ll save this for another post.

Is the stepwise AIC model a good glass-box model?

  • Linear? ✅
  • Predicts well? ✅
  • Parsimonious? ✅
  • Meaningful parameters? ✅ This model has fewer highly collinear features.
  • Valid uncertainty: ❌ Unfortunately, selecting a more parsimonious model produced dishonestly low p-values.

So while this model may predict better, is more parsimonious, and has more meaningful parameters, it’s still not an optimal glass-box model (under our refined definition).

Conclusion

Producing high-quality glass-box models is both necessary and difficult.

Key Takeaways

  • Regression is not necessarily a glass-box approach
    • collinearity can obfuscate meaningful relationships and patterns
    • p-values can be easily invalidated by selection
  • Building an optimal glass-box model is not easy, even for easy scenarios
  • There is no substitute for domain expertise and critical thinking.

Appendix

R Session Info
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13)
 os       macOS Sequoia 15.7.2
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Los_Angeles
 date     2025-11-22
 pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.7.32 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package            * version    date (UTC) lib source
 adaptMCMC            1.5        2024-01-29 [1] CRAN (R 4.5.0)
 backports            1.5.0      2024-05-23 [1] CRAN (R 4.5.0)
 base64enc            0.1-3      2015-07-28 [1] CRAN (R 4.5.0)
 broom              * 1.0.10     2025-09-13 [1] CRAN (R 4.5.0)
 broom.helpers        1.22.0     2025-09-17 [1] CRAN (R 4.5.0)
 cards                0.7.0      2025-08-27 [1] CRAN (R 4.5.0)
 cardx                0.3.0      2025-08-27 [1] CRAN (R 4.5.0)
 class                7.3-23     2025-01-01 [2] CRAN (R 4.5.1)
 cli                  3.6.5      2025-04-23 [1] CRAN (R 4.5.0)
 coda                 0.19-4.1   2024-01-31 [1] CRAN (R 4.5.0)
 codetools            0.2-20     2024-03-31 [2] CRAN (R 4.5.1)
 commonmark           2.0.0      2025-07-07 [1] CRAN (R 4.5.0)
 data.table           1.17.8     2025-07-10 [1] CRAN (R 4.5.0)
 digest               0.6.37     2024-08-19 [1] CRAN (R 4.5.0)
 dplyr              * 1.1.4      2023-11-17 [1] CRAN (R 4.5.0)
 evaluate             1.0.5      2025-08-27 [1] CRAN (R 4.5.0)
 farver               2.1.2      2024-05-13 [1] CRAN (R 4.5.0)
 fastmap              1.2.0      2024-05-15 [1] CRAN (R 4.5.0)
 forcats            * 1.0.1      2025-09-25 [1] CRAN (R 4.5.0)
 foreach              1.5.2      2022-02-02 [1] CRAN (R 4.5.0)
 future               1.67.0     2025-07-29 [1] CRAN (R 4.5.0)
 future.apply         1.20.0     2025-06-06 [1] CRAN (R 4.5.0)
 generics             0.1.4      2025-05-09 [1] CRAN (R 4.5.0)
 ggplot2            * 4.0.0      2025-09-11 [1] CRAN (R 4.5.0)
 glmnet               4.1-10     2025-07-17 [1] CRAN (R 4.5.0)
 globals              0.18.0     2025-05-08 [1] CRAN (R 4.5.0)
 glue                 1.8.0      2024-09-30 [1] CRAN (R 4.5.0)
 gower                1.0.2      2024-12-17 [1] CRAN (R 4.5.0)
 gt                   1.0.0      2025-04-05 [1] CRAN (R 4.5.0)
 gtable               0.3.6      2024-10-25 [1] CRAN (R 4.5.0)
 gtsummary          * 2.4.0      2025-08-28 [1] CRAN (R 4.5.0)
 hardhat              1.4.2      2025-08-20 [1] CRAN (R 4.5.0)
 haven                2.5.5      2025-05-30 [1] CRAN (R 4.5.0)
 hms                  1.1.4      2025-10-17 [1] CRAN (R 4.5.0)
 htmltools            0.5.8.1    2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets          1.6.4      2023-12-06 [1] CRAN (R 4.5.0)
 intervals            0.15.5     2024-08-23 [1] CRAN (R 4.5.0)
 ipred                0.9-15     2024-07-18 [1] CRAN (R 4.5.0)
 iterators            1.0.14     2022-02-05 [1] CRAN (R 4.5.0)
 jsonlite             2.0.0      2025-03-27 [1] CRAN (R 4.5.0)
 kableExtra         * 1.4.0      2024-01-24 [1] CRAN (R 4.5.0)
 knitr                1.50       2025-03-16 [1] CRAN (R 4.5.0)
 labeling             0.4.3      2023-08-29 [1] CRAN (R 4.5.0)
 labelled             2.15.0     2025-09-16 [1] CRAN (R 4.5.0)
 lattice              0.22-7     2025-04-02 [2] CRAN (R 4.5.1)
 lava                 1.8.2      2025-10-30 [1] CRAN (R 4.5.0)
 lifecycle            1.0.4      2023-11-07 [1] CRAN (R 4.5.0)
 listenv              0.10.0     2025-11-02 [1] CRAN (R 4.5.0)
 litedown             0.7        2025-04-08 [1] CRAN (R 4.5.0)
 lubridate          * 1.9.4      2024-12-08 [1] CRAN (R 4.5.0)
 magrittr             2.0.4      2025-09-12 [1] CRAN (R 4.5.0)
 markdown             2.0        2025-03-23 [1] CRAN (R 4.5.0)
 MASS                 7.3-65     2025-02-28 [2] CRAN (R 4.5.1)
 Matrix               1.7-3      2025-03-11 [2] CRAN (R 4.5.1)
 ncvreg               3.16.0     2025-10-09 [1] Github (pbreheny/ncvreg@5fecc8c)
 nnet                 7.3-20     2025-01-01 [1] CRAN (R 4.5.0)
 parallelly           1.45.1     2025-07-24 [1] CRAN (R 4.5.0)
 pbapply              1.7-4      2025-07-20 [1] CRAN (R 4.5.0)
 pillar               1.11.1     2025-09-17 [1] CRAN (R 4.5.0)
 pkgconfig            2.0.3      2019-09-22 [1] CRAN (R 4.5.0)
 prodlim              2025.04.28 2025-04-28 [1] CRAN (R 4.5.0)
 purrr              * 1.2.0      2025-11-04 [1] CRAN (R 4.5.0)
 R6                   2.6.1      2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer         1.1-3      2022-04-03 [1] CRAN (R 4.5.0)
 Rcpp                 1.1.0      2025-07-02 [1] CRAN (R 4.5.0)
 readr              * 2.1.5      2024-01-10 [1] CRAN (R 4.5.0)
 recipes            * 1.3.1      2025-05-21 [1] CRAN (R 4.5.0)
 rlang                1.1.6      2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown            2.30       2025-09-28 [1] CRAN (R 4.5.0)
 rpart                4.1.24     2025-01-07 [2] CRAN (R 4.5.1)
 rstudioapi           0.17.1     2024-10-22 [1] CRAN (R 4.5.0)
 S7                   0.2.0      2024-11-07 [1] CRAN (R 4.5.0)
 sass                 0.4.10     2025-04-11 [1] CRAN (R 4.5.0)
 scales               1.4.0      2025-04-24 [1] CRAN (R 4.5.0)
 selectInferToolkit * 0.4.1      2025-11-19 [1] Github (petersonR/selectInferToolkit@8d37f5e)
 selectiveInference   1.2.5      2019-09-07 [1] CRAN (R 4.5.0)
 sessioninfo          1.2.3      2025-02-05 [1] CRAN (R 4.5.0)
 shape                1.4.6.1    2024-02-23 [1] CRAN (R 4.5.0)
 sparsevctrs          0.3.4      2025-05-25 [1] CRAN (R 4.5.0)
 stringi              1.8.7      2025-03-27 [1] CRAN (R 4.5.0)
 stringr            * 1.6.0      2025-11-04 [1] CRAN (R 4.5.0)
 survival             3.8-3      2024-12-17 [2] CRAN (R 4.5.1)
 svglite              2.2.1      2025-05-12 [1] CRAN (R 4.5.0)
 systemfonts          1.3.1      2025-10-01 [1] CRAN (R 4.5.0)
 textshaping          1.0.4      2025-10-10 [1] CRAN (R 4.5.0)
 tibble             * 3.3.0      2025-06-08 [1] CRAN (R 4.5.0)
 tidyr              * 1.3.1      2024-01-24 [1] CRAN (R 4.5.0)
 tidyselect           1.2.1      2024-03-11 [1] CRAN (R 4.5.0)
 tidyverse          * 2.0.0      2023-02-22 [1] CRAN (R 4.5.0)
 timechange           0.3.0      2024-01-18 [1] CRAN (R 4.5.0)
 timeDate             4051.111   2025-10-17 [1] CRAN (R 4.5.0)
 tzdb                 0.5.0      2025-03-15 [1] CRAN (R 4.5.0)
 vctrs                0.6.5      2023-12-01 [1] CRAN (R 4.5.0)
 viridisLite          0.4.2      2023-05-02 [1] CRAN (R 4.5.0)
 withr                3.0.2      2024-10-28 [1] CRAN (R 4.5.0)
 xfun                 0.54       2025-10-30 [1] CRAN (R 4.5.0)
 xml2                 1.4.1      2025-10-27 [1] CRAN (R 4.5.0)
 yaml                 2.3.10     2024-07-26 [1] CRAN (R 4.5.0)

 [1] /Users/rpterson/Library/R/arm64/4.5/library
 [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────

Thanks for reading

© 2025 Glassbox Modeling Working Group