<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Data Diction</title>
<link>https://www.data-diction.com/</link>
<atom:link href="https://www.data-diction.com/index.xml" rel="self" type="application/rss+xml"/>
<description>Data and the stories they tell us</description>
<image>
<url>https://www.data-diction.com/logo2.png</url>
<title>Data Diction</title>
<link>https://www.data-diction.com/</link>
<height>149</height>
<width>144</width>
</image>
<generator>quarto-1.9.37</generator>
<lastBuildDate>Tue, 28 Apr 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Can an AI assistant handle the tedious parts of academic writing?</title>
  <dc:creator>Ryan Peterson</dc:creator>
  <link>https://www.data-diction.com/posts/claude-code-demo/</link>
  <description><![CDATA[ 





<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Author Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>This post deviates from our usual AI use policy as an experiment using Claude Code, the result of which will become clear as you read.</p>
</div>
</div>
<div class="ai-text">
<p>What if you could offload the parts of academic writing that have nothing to do with <em>writing</em>? Not the thinking, not the modeling, not the prose — but the LaTeX errors, the git housekeeping, the YAML frontmatter surgery that eats an afternoon every time you switch journals.</p>
<p>I recently put this to the test with <a href="https://claude.ai/code">Claude Code</a>, Anthropic’s AI coding assistant. Over a single conversation, I used it to clean up a git repo, migrate a manuscript from one journal template to another, and debug the resulting build errors. Here’s how it went.</p>
<section id="the-setup" class="level2">
<h2 class="anchored" data-anchor-id="the-setup">The setup</h2>
<p>I’m working on a paper targeting MDPI’s journal <em>Entropy</em>, but the manuscript (<code>RBIC_Multimodal.Rmd</code>) was still using the <code>rticles::elsevier_article</code> template from an earlier submission plan. The repo also had some generated figure files tracked in git that shouldn’t have been. Routine housekeeping, but the kind that quietly devours time.</p>
<p>Claude Code runs in your terminal (or IDE) and has direct access to your project files, shell, and git. You describe what you want, it proposes a plan, and you approve or redirect. It’s a conversation, not a one-shot prompt.</p>
</section>
<section id="task-1-stop-tracking-build-artifacts" class="level2">
<h2 class="anchored" data-anchor-id="task-1-stop-tracking-build-artifacts">Task 1: Stop tracking build artifacts</h2>
<p>The <code>RBIC_Multimodal_files/</code> directory — full of generated PDFs from knitr — was being tracked in git. These get regenerated every build, so they just add noise to diffs.</p>
<p>I asked Claude about it, and it laid out the standard three-step fix:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add to .gitignore</span></span>
<span id="cb1-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">echo</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RBIC_Multimodal_files/"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;</span> .gitignore</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove from git's index (but keep the local files!)</span></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> rm <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-r</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--cached</span> RBIC_Multimodal_files/</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Commit and push</span></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> commit <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-m</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Stop tracking RBIC_Multimodal_files/"</span></span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> push</span></code></pre></div></div>
<p>The key here is the <code>--cached</code> flag — it untracks the files without deleting them from disk. Claude explained this clearly and then, after I confirmed, executed it. Eight PDFs removed from the repo, <code>.gitignore</code> updated, done.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Nothing here is beyond a quick Stack Overflow search. But Claude handled it end-to-end — checking what was tracked, editing <code>.gitignore</code>, running the commands, committing — without me switching contexts.</p>
</div>
</div>
</section>
<section id="task-2-elsevier-to-mdpi-entropy" class="level2">
<h2 class="anchored" data-anchor-id="task-2-elsevier-to-mdpi-entropy">Task 2: Elsevier to MDPI Entropy</h2>
<p>This is where things got more interesting. Switching <code>rticles</code> templates isn’t just changing one line in the YAML. The author/affiliation format is different, the citation engine changes (CSL to natbib), extra metadata fields are required, and you need a <code>Definitions/</code> folder with the MDPI class files.</p>
<p>I asked Claude to help ensure the proper template was in use. It:</p>
<ol type="1">
<li>Spawned a <strong>sub-agent</strong> to research <code>rticles::mdpi_article</code> requirements, YAML fields, and Entropy-specific settings</li>
<li>Read my existing Rmd frontmatter</li>
<li>Rewrote the YAML from scratch</li>
</ol>
<p>Here’s a simplified before/after:</p>
<p><strong>Before (Elsevier):</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">output</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb2-2"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  rticles:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:elsevier_article</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb2-3"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">keep_tex</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">true</span></span>
<span id="cb2-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">author</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb2-5"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">name</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Ryan A. Peterson"</span></span>
<span id="cb2-6"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">affiliation</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> a,b</span></span>
<span id="cb2-7"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">footnote</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Corresponding Author"</span></span>
<span id="cb2-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">address</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb2-9"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">code</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> a</span></span>
<span id="cb2-10"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">address</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Department of Biostatistics..."</span></span>
<span id="cb2-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">csl</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> biometrics.csl</span></span></code></pre></div></div>
<p><strong>After (MDPI):</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">output</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb3-2"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  rticles:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:mdpi_article</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb3-3"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extra_dependencies</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> longtable</span></span>
<span id="cb3-4"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">keep_tex</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">true</span></span>
<span id="cb3-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">author</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb3-6"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">name</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> Ryan A. Peterson</span></span>
<span id="cb3-7"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">affil</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1,2,*"</span></span>
<span id="cb3-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">affiliation</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb3-9"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">num</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb3-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">    address</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">: </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">|</span></span>
<span id="cb3-11">      Department of Biostatistics...</span>
<span id="cb3-12"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">email</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> ryan-peterson@uiowa.edu</span></span>
<span id="cb3-13"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">journal</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> entropy</span></span>
<span id="cb3-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> article</span></span>
<span id="cb3-15"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">status</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> submit</span></span></code></pre></div></div>
<p>Claude also copied the <code>Definitions/</code> folder from the installed <code>rticles</code> package, added required back-matter fields (<code>acknowledgement</code>, <code>funding</code>, <code>conflictsofinterest</code>), and removed packages that conflict with <code>mdpi.cls</code> (like <code>endfloat</code> and the custom <code>caption</code> width).</p>
</section>
<section id="task-3-debugging-the-build" class="level2">
<h2 class="anchored" data-anchor-id="task-3-debugging-the-build">Task 3: Debugging the build</h2>
<p>I rendered the document in RStudio and fed the errors back to Claude. Three rounds of fixes followed.</p>
<section id="round-1-missing-ghostscript" class="level3">
<h3 class="anchored" data-anchor-id="round-1-missing-ghostscript">Round 1: Missing Ghostscript</h3>
<pre><code>! epstopdf Error: Required program gs not found</code></pre>
<p>The MDPI logos are <code>.eps</code> files, and pdfLaTeX needs Ghostscript to convert them. Claude proposed two options: install Ghostscript, or pre-convert the logos to PDF so collaborators don’t hit the same issue.</p>
<p>I pointed out that option 2 is better for the team:</p>
<blockquote class="blockquote">
<p>“It seems like [option 2] is the better option because if others are rendering this document on their machines, they may run into a similar issue.”</p>
</blockquote>
<p>Claude agreed, installed Ghostscript via conda (which I already had), converted the three EPS logos to PDF, and then patched <code>mdpi.cls</code> to drop the <code>.eps</code> extensions from <code>\includegraphics</code> calls. Now pdfLaTeX finds the PDFs automatically — no Ghostscript required at build time.</p>
</section>
<section id="round-2-a-sneaky-bibliography-entry" class="level3">
<h3 class="anchored" data-anchor-id="round-2-a-sneaky-bibliography-entry">Round 2: A sneaky bibliography entry</h3>
<p>The next error looked like a math issue:</p>
<pre><code>! Missing $ inserted.
l.22 ...95/3/10.1093/biomet/asn034/2/asn034.pdf]}}</code></pre>
<p>I told Claude I’d seen this kind of thing before with tables and escape characters. But it traced the actual source to a <code>.bib</code> entry with an <code>eprint</code> field containing a URL-like path full of underscores. Under natbib, those underscores get interpreted as LaTeX subscript operators. The <code>doi</code> and <code>URL</code> fields already covered the same reference, so removing <code>eprint</code> was the clean fix.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>This was the moment that sold me. I had a plausible (but wrong) hypothesis about the error source. Claude didn’t anchor on my suggestion — it searched the <code>.bib</code> file, matched the error text, and found the real cause.</p>
</div>
</div>
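<p>A minimal sketch of that kind of search, in R. The <code>.bib</code> lines below are a made-up stand-in (the real entry is not reproduced here), and the names <code>bib</code>, <code>eprint_lines</code>, and <code>flagged</code> are illustrative only. It assumes one field per line, which holds for most exported <code>.bib</code> files but is not guaranteed.</p>

```r
# Hypothetical .bib content; in practice something like readLines("refs.bib")
bib <- c(
  "@article{hypothetical2008,",
  "  title  = {A made-up entry},",
  "  doi    = {10.0000/example},",
  "  eprint = {/some/path/with_under_scores/file.pdf}",
  "}"
)

# Line numbers of every eprint field
eprint_lines <- grep("^[[:space:]]*eprint[[:space:]]*=", bib)

# Keep only the eprint fields that contain a raw underscore
flagged <- eprint_lines[grepl("_", bib[eprint_lines], fixed = TRUE)]

# Show each offending line with its line number
data.frame(line = flagged, text = trimws(bib[flagged]))
```

<p>Any flagged line is a candidate for removal or escaping. Whether dropping the field is safe depends on the entry having another identifier, such as the <code>doi</code> in this case.</p>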
</section>
<section id="round-3-unused-packages" class="level3">
<h3 class="anchored" data-anchor-id="round-3-unused-packages">Round 3: Unused packages</h3>
<pre><code>Package gensymb Warning: Not defining \perthousand.</code></pre>
<p>I wasn’t sure whether <code>gensymb</code> was actually used anywhere in the paper. Claude searched the entire Rmd for any <code>gensymb</code> commands (<code>\degree</code>, <code>\celsius</code>, <code>\micro</code>, etc.) — found nothing but the <code>\usepackage</code> line itself. Removed it.</p>
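<p>A check along these lines can be sketched in a few lines of R. The manuscript text here is a hypothetical stand-in, and the command list covers the symbols <code>gensymb</code> provides (<code>\degree</code>, <code>\celsius</code>, <code>\perthousand</code>, <code>\ohm</code>, <code>\micro</code>). It is an illustration, not a record of the search that was actually run.</p>

```r
# Hypothetical manuscript lines; in practice readLines("RBIC_Multimodal.Rmd")
rmd <- c(
  "  - \\usepackage{gensymb}",
  "We model the outcome as $y = x\\beta + \\epsilon$.",
  "Results are shown in Table 1."
)

# Commands provided by gensymb
gensymb_cmds <- c("degree", "celsius", "perthousand", "ohm", "micro")

# A backslash, one of the command names, then a non-letter or end of line,
# so that longer commands sharing a prefix are not matched by accident
pattern <- paste0("\\\\(", paste(gensymb_cmds, collapse = "|"), ")([^A-Za-z]|$)")

hits <- grep(pattern, rmd)
length(hits)  # 0 here: only the \usepackage line mentions gensymb
```

<p>Zero hits suggests the package is safe to drop, provided no child documents or included <code>.tex</code> files use those commands.</p>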
</section>
</section>
<section id="the-collaboration-pattern" class="level2">
<h2 class="anchored" data-anchor-id="the-collaboration-pattern">The collaboration pattern</h2>
<p>What I found most useful wasn’t any single capability — it was the iteration loop:</p>
<ol type="1">
<li>I describe the goal</li>
<li>Claude proposes a plan</li>
<li>I approve or redirect</li>
<li>Claude executes</li>
<li>I report results (or errors)</li>
<li>Repeat</li>
</ol>
<p>I stayed in control throughout. Claude asked before running destructive commands. When I redirected (the EPS portability issue), it adapted immediately. When I told it the undefined references were expected (those chunks have <code>eval=FALSE</code> while I re-run an analysis), it moved on without trying to “fix” them.</p>
</section>
<section id="key-takeaways" class="level2">
<h2 class="anchored" data-anchor-id="key-takeaways">Key takeaways</h2>
<ol type="1">
<li><p><strong>Claude Code is a collaborator, not a button.</strong> It works best with back-and-forth. The human provides judgment; the AI handles execution and research.</p></li>
<li><p><strong>It handles tedious format migrations well.</strong> YAML rewriting, class file patching, bibliography fixes — exactly the kind of work that’s straightforward but time-consuming.</p></li>
<li><p><strong>It debugs iteratively.</strong> Each error got diagnosed and fixed in one round, not blindly retried.</p></li>
<li><p><strong>Human oversight matters.</strong> I caught the portability issue with EPS conversion. I knew the undefined references were expected. The AI didn’t need to know everything — it just needed to listen when I told it.</p></li>
<li><p><strong>It’s git-aware.</strong> It reads status, writes descriptive commit messages, and pushes when asked — but only when asked.</p></li>
</ol>
</section>
<section id="one-more-thing" class="level2">
<h2 class="anchored" data-anchor-id="one-more-thing">One more thing</h2>
<p>At the end of our session, I asked Claude to generate a Quarto reveal.js presentation summarizing everything we’d done. It wrote 20 slides with accurate quotes from our conversation, code blocks from the actual commands, a mermaid diagram of the workflow, and custom SCSS theming.</p>
<p>Then I asked it to fix three issues with the first draft. It did.</p>
<p>Then I asked it to write this blog post.</p>
<p>It did that too.</p>
<hr>
</section>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Author Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>It felt incorrect to say that I – Ryan Peterson – authored this post, because the “I” used throughout – Claude generating text through my perspective – is not me. The only text written by me is contained in the two “Author Note” boxes.</p>
<p>We decided to leave the text as it is, without our usual review process, so that it stands as a genuine experiment of what Claude is capable of from a pure writing perspective. This post therefore represents an exception to a key GMWG value to be <strong>human first</strong>:</p>
<blockquote class="blockquote">
<p>We pledge to only use AI as a supporting writing tool</p>
</blockquote>
<p>It also demonstrates the importance of such a pledge.</p>
<p>In this and future posts, any AI generated content will be clearly denoted as such with a dotted border. For example:</p>
<div class="ai-text">
<p>This text is AI-generated…</p>
</div>
<p>…This text is not.</p>
</div>
</div>



 ]]></description>
  <category>tools</category>
  <category>R</category>
  <guid>https://www.data-diction.com/posts/claude-code-demo/</guid>
  <pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://www.data-diction.com/posts/claude-code-demo/claude-logo.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>How can I guarantee a significant result?</title>
  <dc:creator>Perry Hackman, PhD</dc:creator>
  <link>https://www.data-diction.com/posts/upsi-example/</link>
  <description><![CDATA[ 





<p><em>Dear student,</em></p>
<p>Before you embark on your “career” as a statistician, you must purge yourself of a childish misconception: that our job is to seek truth. Truth is stubborn, unpredictable, and worst of all, <strong>often unpublishable</strong>. Scientists crave confidence, the journals crave significance, and we, if we are clever, can provide both without the nuisance of real rigor.</p>
<p>In this post series, I will instruct you in the Statistical Dark Arts. Today, I’ll describe how to always ensure a publishable result with model selection and unadjusted post-selection inferences (UPSIs).</p>
<section id="why-upsis" class="level1">
<h1>Why UPSIs?</h1>
<p>You will frequently come across scientists with ambitious ideas, saying things like:</p>
<blockquote class="blockquote">
<p>I’ve conceived a brilliant new way to treat chronic pain effectively. I don’t want to waste time, effort, and money on a result that ends up being insignificant… <strong>How can I guarantee a significant result??</strong></p>
</blockquote>
<p>No statistician wants to contribute to a null finding. Interpreting those is too hard! I’ll let you in on a statistical secret: there IS a way to guarantee statistical significance – and it’s actually extremely common in science.</p>
<p>It’s called <em>unadjusted post-selection inference (UPSI)</em>.</p>
<p>With good model selection tools and UPSIs in your toolbelt, you can turn any negative study into a positive one – guaranteed.</p>
</section>
<section id="example-a-study-for-treatment-of-chronic-pain" class="level1">
<h1>Example: a study for treatment of chronic pain</h1>
<p>Say the researcher who approached you is planning a cross-over study in which each patient receives both of two treatments for chronic pain, one after the other. Patients will record their overall pain each day for 1 week while undergoing treatment A, and for 1 week while undergoing treatment B.</p>
<p>Investigators are hoping to determine the difference between treatments in the average pain score. Our outcome <img src="https://latex.codecogs.com/png.latex?Y_i"> is thus a continuous measure of pain reduction: for each subject, this value reflects the difference in that subject’s weekly average pain score between the two candidate treatments.</p>
<p>In addition to determining whether the treatment works on average in their population, investigators wish to determine specific subgroups that might see the most benefit. They are primarily interested in subgroups by age and sex, as well as patient self-reported data such as alcohol consumption and physical activity.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Make sure they have an <em>exhaustive</em> list of candidate effect modifiers; I’d even suggest a few new ones like left- or right-handedness and coffee consumption.</p>
</div>
</div>
<p>Here’s a way to list all the possible subgroups of this set of variables:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">subgroups <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expand.grid</span>(</span>
<span id="cb1-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"18-35"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"36-50"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"51-65"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"66-80"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"80+"</span>),</span>
<span id="cb1-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sex =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Male"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Female"</span>),</span>
<span id="cb1-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hand =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Left"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Right"</span>), </span>
<span id="cb1-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coffee =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2+"</span>),</span>
<span id="cb1-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alcohol =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"3+"</span>),</span>
<span id="cb1-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">physical_activity =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"3+"</span>)</span>
<span id="cb1-8">)</span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(subgroups)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 960</code></pre>
</div>
</div>
<p>This is a LOT of subgroups! Far too many to sift through one by one, at least without tipping off the statistical reviewers about multiplicity.</p>
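<p>The 960 is simply the product of the number of levels in each factor. A quick check, assuming age has five bands (that is the count the 960 rows imply once the other five factors are accounted for):</p>

```r
# Levels per factor; the 5 for age is inferred from the row count,
# the others are read off the grid definition
n_levels <- c(age = 5, sex = 2, hand = 2, coffee = 3,
              alcohol = 4, physical_activity = 4)

# Every combination of levels is its own subgroup
n_subgroups <- prod(n_levels)
n_subgroups
```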
<div class="callout callout-style-default callout-caution callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Caution</span>Multiplicity
</div>
</div>
<div class="callout-body-container callout-body">
<p>Gone, alas, are the carefree days when we could run tons of tests, report the precious few that were significant, and quietly ignore the rest. Some meddlesome truth-seekers eventually caught on and began scolding us for our ingenuity, slapping the sinister label <strong>multiplicity</strong> onto our beloved golden-egg-laying goose. Now the reputable journals know to look for it. Our approach must become more subtle.</p>
</div>
</div>
<p>Here’s the key: turn to <strong>model selection</strong>. Write in the analysis plan:</p>
<blockquote class="blockquote">
<p>We will use forward selection to determine which variables or interactions improve our model’s predictions, adding each candidate predictor in a stepwise fashion until the AIC indicates the model’s predictions can no longer be improved.</p>
</blockquote>
<p>Sounds rigorous and honest, right?</p>
<p>This simple trick all but guarantees that your study will end with a statistically significant result, even when no real effect exists!</p>
<p>How about that?! ZERO RISK! (Well, to us anyway.)</p>
<p>As a bonus, the more interactions we include that are truly null, the more unnecessarily opaque our model becomes – talk about a win-win.</p>
<hr>
<section id="virtual-trial-1" class="level2">
<h2 class="anchored" data-anchor-id="virtual-trial-1">Virtual Trial 1</h2>
<p>Let me illustrate via a simulation. Let’s virtually collect <img src="https://latex.codecogs.com/png.latex?n%20=%20100"> patients in our trial.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb3-2"></span>
<span id="cb3-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>)</span>
<span id="cb3-4">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sample size  </span></span>
<span id="cb3-5"></span>
<span id="cb3-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate the outcome (y: pure noise, independent of every subgroup)</span></span>
<span id="cb3-7">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb3-8"></span>
<span id="cb3-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate recruitment (assume each new person falls in a random subgroup)</span></span>
<span id="cb3-10">x_idx <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(subgroups), n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) </span>
<span id="cb3-11">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> subgroups[x_idx,]</span>
<span id="cb3-12">simdata <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> y, X)</span>
<span id="cb3-13">simdata</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 100 × 7
          y age   sex    hand  coffee alcohol physical_activity
      &lt;dbl&gt; &lt;fct&gt; &lt;fct&gt;  &lt;fct&gt; &lt;fct&gt;  &lt;fct&gt;   &lt;fct&gt;            
 1  0.793   36-50 Female Left  0      2       1                
 2  0.522   80+   Female Left  2+     1       3+               
 3  1.75    36-50 Female Left  2+     1       0                
 4 -1.27    66-80 Female Right 0      3+      0                
 5  2.20    51-65 Male   Right 2+     0       2                
 6  0.433   36-50 Male   Left  0      2       1                
 7 -1.57    66-80 Male   Right 0      2       0                
 8 -0.935   80+   Female Left  1      3+      3+               
 9  0.0635  36-50 Female Left  2+     0       2                
10 -0.00239 66-80 Male   Left  0      2       0                
# ℹ 90 more rows</code></pre>
</div>
</div>
<p>We need to create interactions to select from:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create model matrix with all main effects and two-way interactions (intercept dropped)</span></span>
<span id="cb5-2">X2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">model.matrix</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata)[,<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb5-3"></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sanity check: make sure these columns represent main effects + interactions</span></span>
<span id="cb5-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colnames</span>(X2), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) </span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "alcohol2"                     "sexFemale:coffee1"           
 [3] "alcohol1:physical_activity3+" "age66-80:handRight"          
 [5] "alcohol1:physical_activity1"  "age51-65:sexFemale"          
 [7] "handRight:coffee1"            "age66-80:physical_activity2" 
 [9] "age51-65:coffee2+"            "age36-50:coffee1"            </code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># combine into data set for model fitting</span></span>
<span id="cb7-2">simdata2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y=</span>y, X2)</span></code></pre></div></div>
</div>
<p>OK! We have 93 candidate predictors for model selection. Don’t worry: unless our paper’s reviewers are extremely thorough, we don’t have to report this number. Our final model will have far fewer.</p>
<p>Model selection is a simple task. The following code uses forward stepwise selection with AIC to build an optimally predicting model, per our analysis plan.</p>
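<p>For anyone who does want to audit that number, here is a rough count of the design-matrix columns. It assumes the default treatment contrasts (each factor contributes its number of levels minus one) and the same level counts as the subgroup grid, with five age bands. By this tally there are 14 main-effect columns and 78 two-way interaction columns, 92 in all, so the 93 quoted above is reached only if the intercept (or the outcome column of <code>simdata2</code>) is counted too.</p>

```r
# Levels per factor (assumed to match the subgroup grid; 5 age bands inferred)
n_levels <- c(age = 5, sex = 2, hand = 2, coffee = 3,
              alcohol = 4, physical_activity = 4)

# Under treatment contrasts each factor yields (levels - 1) dummy columns
df <- n_levels - 1

n_main  <- sum(df)                   # main-effect columns
n_inter <- sum(combn(df, 2, prod))   # one product per pair of factors

c(main = n_main, interactions = n_inter, total = n_main + n_inter)
```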
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(selectInferToolkit)</span>
<span id="cb8-2">fit_aic <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_stepwise_ic</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"forward"</span>)</span></code></pre></div></div>
</div>
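<p>If you would rather see the mechanics without the package, the following is a minimal sketch of the same idea using <code>step()</code> from base R. It is a stand-in, not a description of how <code>select_stepwise_ic()</code> works internally, and the data frame <code>noise</code> and the predictor count <code>p</code> are made up for illustration. Everything here is pure noise, so any term that survives selection is a false discovery, and the p-values come from refitting as though the model had been specified in advance.</p>

```r
set.seed(1)
n <- 100   # sample size, as in the virtual trial
p <- 40    # hypothetical number of pure-noise candidate predictors

# Outcome and predictors are independent noise: no true effects anywhere
noise <- data.frame(y = rnorm(n), matrix(rnorm(n * p), n, p))

null_fit <- lm(y ~ 1, data = noise)
upper    <- reformulate(setdiff(names(noise), "y"), response = "y")

# Forward selection: add whichever term lowers AIC most, stop when none does
fwd <- step(null_fit, scope = list(lower = ~ 1, upper = upper),
            direction = "forward", trace = 0)

# Naive p-values that ignore the selection step entirely
naive_p <- summary(fwd)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
naive_p
```

<p>Changing the seed changes which noise columns get picked, but with this many candidates it is rare for the procedure to come back empty-handed.</p>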
<div class="callout callout-style-default callout-caution callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Caution
</div>
</div>
<div class="callout-body-container callout-body">
<p>While other packages provide p-values after selection, most are not transparent about how they do it, making it difficult to be sure they are truly UPSIs. If you aren’t careful with these less transparent tools, you might report p-values that are somehow diabolically adjusted towards insignificance. In contrast, the <code>selectInferToolkit</code> package, available on <a href="https://github.com/petersonR/selectInferToolkit/">GitHub</a>, helps you <strong>explicitly</strong> use UPSIs for inference:</p>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">infer_upsi</span>(fit_aic, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(coef <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div></div>
</div>
<div class="cell">
<div class="cell-output-display">
<div id="jucqwrccuk" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#jucqwrccuk table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#jucqwrccuk thead, #jucqwrccuk tbody, #jucqwrccuk tfoot, #jucqwrccuk tr, #jucqwrccuk td, #jucqwrccuk th {
  border-style: none;
}

#jucqwrccuk p {
  margin: 0;
  padding: 0;
}

#jucqwrccuk .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#jucqwrccuk .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#jucqwrccuk .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#jucqwrccuk .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#jucqwrccuk .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#jucqwrccuk .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#jucqwrccuk .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#jucqwrccuk .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#jucqwrccuk .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#jucqwrccuk .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#jucqwrccuk .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#jucqwrccuk .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#jucqwrccuk .gt_spanner_row {
  border-bottom-style: hidden;
}

#jucqwrccuk .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#jucqwrccuk .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#jucqwrccuk .gt_from_md > :first-child {
  margin-top: 0;
}

#jucqwrccuk .gt_from_md > :last-child {
  margin-bottom: 0;
}

#jucqwrccuk .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#jucqwrccuk .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#jucqwrccuk .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#jucqwrccuk .gt_row_group_first td {
  border-top-width: 2px;
}

#jucqwrccuk .gt_row_group_first th {
  border-top-width: 2px;
}

#jucqwrccuk .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#jucqwrccuk .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#jucqwrccuk .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#jucqwrccuk .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#jucqwrccuk .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#jucqwrccuk .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#jucqwrccuk .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#jucqwrccuk .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#jucqwrccuk .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#jucqwrccuk .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#jucqwrccuk .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#jucqwrccuk .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#jucqwrccuk .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#jucqwrccuk .gt_left {
  text-align: left;
}

#jucqwrccuk .gt_center {
  text-align: center;
}

#jucqwrccuk .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#jucqwrccuk .gt_font_normal {
  font-weight: normal;
}

#jucqwrccuk .gt_font_bold {
  font-weight: bold;
}

#jucqwrccuk .gt_font_italic {
  font-style: italic;
}

#jucqwrccuk .gt_super {
  font-size: 65%;
}

#jucqwrccuk .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#jucqwrccuk .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#jucqwrccuk .gt_indent_1 {
  text-indent: 5px;
}

#jucqwrccuk .gt_indent_2 {
  text-indent: 10px;
}

#jucqwrccuk .gt_indent_3 {
  text-indent: 15px;
}

#jucqwrccuk .gt_indent_4 {
  text-indent: 20px;
}

#jucqwrccuk .gt_indent_5 {
  text-indent: 25px;
}

#jucqwrccuk .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#jucqwrccuk div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="term" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">term</th>
<th id="coef" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">coef</th>
<th id="ci_low" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">ci_low</th>
<th id="ci_high" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">ci_high</th>
<th id="p_value" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">p_value</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="term">(Intercept)</td>
<td class="gt_row gt_right" headers="coef">0.07</td>
<td class="gt_row gt_right" headers="ci_low">−0.07</td>
<td class="gt_row gt_right" headers="ci_high">0.21</td>
<td class="gt_row gt_right" headers="p_value">0.315</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age36.50</td>
<td class="gt_row gt_right" headers="coef">0.17</td>
<td class="gt_row gt_right" headers="ci_low">−0.06</td>
<td class="gt_row gt_right" headers="ci_high">0.41</td>
<td class="gt_row gt_right" headers="p_value">0.156</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age66.80</td>
<td class="gt_row gt_right" headers="coef">−0.35</td>
<td class="gt_row gt_right" headers="ci_low">−0.55</td>
<td class="gt_row gt_right" headers="ci_high">−0.15</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">handRight</td>
<td class="gt_row gt_right" headers="coef">0.23</td>
<td class="gt_row gt_right" headers="ci_low">−0.02</td>
<td class="gt_row gt_right" headers="ci_high">0.47</td>
<td class="gt_row gt_right" headers="p_value">0.076</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age80..sexFemale</td>
<td class="gt_row gt_right" headers="coef">−0.29</td>
<td class="gt_row gt_right" headers="ci_low">−0.51</td>
<td class="gt_row gt_right" headers="ci_high">−0.07</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.013</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age36.50.handRight</td>
<td class="gt_row gt_right" headers="coef">−0.21</td>
<td class="gt_row gt_right" headers="ci_low">−0.43</td>
<td class="gt_row gt_right" headers="ci_high">0.01</td>
<td class="gt_row gt_right" headers="p_value">0.071</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age80..handRight</td>
<td class="gt_row gt_right" headers="coef">−0.22</td>
<td class="gt_row gt_right" headers="ci_low">−0.44</td>
<td class="gt_row gt_right" headers="ci_high">0.00</td>
<td class="gt_row gt_right" headers="p_value">0.053</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age66.80.coffee1</td>
<td class="gt_row gt_right" headers="coef">0.45</td>
<td class="gt_row gt_right" headers="ci_low">0.19</td>
<td class="gt_row gt_right" headers="ci_high">0.71</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age36.50.coffee2.</td>
<td class="gt_row gt_right" headers="coef">0.19</td>
<td class="gt_row gt_right" headers="ci_low">−0.01</td>
<td class="gt_row gt_right" headers="ci_high">0.40</td>
<td class="gt_row gt_right" headers="p_value">0.070</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age51.65.coffee2.</td>
<td class="gt_row gt_right" headers="coef">0.33</td>
<td class="gt_row gt_right" headers="ci_low">0.16</td>
<td class="gt_row gt_right" headers="ci_high">0.49</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age36.50.alcohol1</td>
<td class="gt_row gt_right" headers="coef">−0.13</td>
<td class="gt_row gt_right" headers="ci_low">−0.31</td>
<td class="gt_row gt_right" headers="ci_high">0.06</td>
<td class="gt_row gt_right" headers="p_value">0.178</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age66.80.alcohol1</td>
<td class="gt_row gt_right" headers="coef">−0.26</td>
<td class="gt_row gt_right" headers="ci_low">−0.50</td>
<td class="gt_row gt_right" headers="ci_high">−0.01</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.042</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age66.80.alcohol2</td>
<td class="gt_row gt_right" headers="coef">−0.19</td>
<td class="gt_row gt_right" headers="ci_low">−0.40</td>
<td class="gt_row gt_right" headers="ci_high">0.02</td>
<td class="gt_row gt_right" headers="p_value">0.081</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age51.65.physical_activity1</td>
<td class="gt_row gt_right" headers="coef">−0.33</td>
<td class="gt_row gt_right" headers="ci_low">−0.50</td>
<td class="gt_row gt_right" headers="ci_high">−0.17</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age51.65.physical_activity2</td>
<td class="gt_row gt_right" headers="coef">−0.18</td>
<td class="gt_row gt_right" headers="ci_low">−0.35</td>
<td class="gt_row gt_right" headers="ci_high">−0.01</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.039</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">sexFemale.alcohol1</td>
<td class="gt_row gt_right" headers="coef">0.33</td>
<td class="gt_row gt_right" headers="ci_low">0.15</td>
<td class="gt_row gt_right" headers="ci_high">0.50</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">sexFemale.alcohol2</td>
<td class="gt_row gt_right" headers="coef">0.19</td>
<td class="gt_row gt_right" headers="ci_low">0.02</td>
<td class="gt_row gt_right" headers="ci_high">0.37</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.036</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">sexFemale.alcohol3.</td>
<td class="gt_row gt_right" headers="coef">−0.17</td>
<td class="gt_row gt_right" headers="ci_low">−0.37</td>
<td class="gt_row gt_right" headers="ci_high">0.03</td>
<td class="gt_row gt_right" headers="p_value">0.091</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">sexFemale.physical_activity2</td>
<td class="gt_row gt_right" headers="coef">−0.27</td>
<td class="gt_row gt_right" headers="ci_low">−0.44</td>
<td class="gt_row gt_right" headers="ci_high">−0.10</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.003</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">handRight.alcohol3.</td>
<td class="gt_row gt_right" headers="coef">−0.16</td>
<td class="gt_row gt_right" headers="ci_low">−0.36</td>
<td class="gt_row gt_right" headers="ci_high">0.04</td>
<td class="gt_row gt_right" headers="p_value">0.123</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">handRight.physical_activity3.</td>
<td class="gt_row gt_right" headers="coef">−0.29</td>
<td class="gt_row gt_right" headers="ci_low">−0.48</td>
<td class="gt_row gt_right" headers="ci_high">−0.10</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.004</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">coffee2..alcohol3.</td>
<td class="gt_row gt_right" headers="coef">0.12</td>
<td class="gt_row gt_right" headers="ci_low">−0.05</td>
<td class="gt_row gt_right" headers="ci_high">0.29</td>
<td class="gt_row gt_right" headers="p_value">0.172</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">coffee1.physical_activity1</td>
<td class="gt_row gt_right" headers="coef">0.17</td>
<td class="gt_row gt_right" headers="ci_low">0.01</td>
<td class="gt_row gt_right" headers="ci_high">0.33</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.040</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">alcohol3..physical_activity3.</td>
<td class="gt_row gt_right" headers="coef">0.25</td>
<td class="gt_row gt_right" headers="ci_low">0.08</td>
<td class="gt_row gt_right" headers="ci_high">0.42</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.006</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
<p>And there you have it!</p>
<p>For many subgroups, treatment A worked great (increased pain scores, significant positive effects). For others, treatment A didn’t work so well (significant negative effects): it actually appears to decrease their pain scores. These results indicate we could maximize pain by giving certain patients A and other patients B. I wouldn’t fret too much about how we included interactions without their constituent main effects; leave that subtlety to the media to figure out.</p>
<div class="callout callout-style-default callout-caution callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Caution
</div>
</div>
<div class="callout-body-container callout-body">
<p>Sometimes, researchers want to actually decrease pain scores; you might want to check on this.</p>
</div>
</div>
</section>
<section id="whats-with-the-high-p-values" class="level2">
<h2 class="anchored" data-anchor-id="whats-with-the-high-p-values">What’s with the high p-values?</h2>
<p>I know what you’re thinking: won’t it be confusing to include all those results with high p-values?</p>
<p>An expert tip: if you want to make fewer discoveries and get even lower p-values, simply use BIC instead of AIC. BIC only lets the <em>most</em> significant results into the model, and you can say it’s <em>asymptotically consistent</em>, which sounds just as rigorous as AIC being <em>asymptotically efficient</em>.</p>
<p>In this example, here’s the model selected via stepwise BIC:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">fit_bic <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_stepwise_ic</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata2, </span>
<span id="cb10-2">                              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"forward"</span>, </span>
<span id="cb10-3">                              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BIC"</span>)</span></code></pre></div></div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">infer_upsi</span>(fit_bic, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(coef <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div></div>
</div>
<div class="cell">
<div class="cell-output-display">
<div id="cngiboesdc" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#cngiboesdc table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#cngiboesdc thead, #cngiboesdc tbody, #cngiboesdc tfoot, #cngiboesdc tr, #cngiboesdc td, #cngiboesdc th {
  border-style: none;
}

#cngiboesdc p {
  margin: 0;
  padding: 0;
}

#cngiboesdc .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#cngiboesdc .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#cngiboesdc .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#cngiboesdc .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#cngiboesdc .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#cngiboesdc .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#cngiboesdc .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#cngiboesdc .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#cngiboesdc .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#cngiboesdc .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#cngiboesdc .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#cngiboesdc .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#cngiboesdc .gt_spanner_row {
  border-bottom-style: hidden;
}

#cngiboesdc .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#cngiboesdc .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#cngiboesdc .gt_from_md > :first-child {
  margin-top: 0;
}

#cngiboesdc .gt_from_md > :last-child {
  margin-bottom: 0;
}

#cngiboesdc .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#cngiboesdc .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#cngiboesdc .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#cngiboesdc .gt_row_group_first td {
  border-top-width: 2px;
}

#cngiboesdc .gt_row_group_first th {
  border-top-width: 2px;
}

#cngiboesdc .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#cngiboesdc .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#cngiboesdc .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#cngiboesdc .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#cngiboesdc .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#cngiboesdc .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#cngiboesdc .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#cngiboesdc .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#cngiboesdc .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#cngiboesdc .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#cngiboesdc .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#cngiboesdc .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#cngiboesdc .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#cngiboesdc .gt_left {
  text-align: left;
}

#cngiboesdc .gt_center {
  text-align: center;
}

#cngiboesdc .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#cngiboesdc .gt_font_normal {
  font-weight: normal;
}

#cngiboesdc .gt_font_bold {
  font-weight: bold;
}

#cngiboesdc .gt_font_italic {
  font-style: italic;
}

#cngiboesdc .gt_super {
  font-size: 65%;
}

#cngiboesdc .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#cngiboesdc .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#cngiboesdc .gt_indent_1 {
  text-indent: 5px;
}

#cngiboesdc .gt_indent_2 {
  text-indent: 10px;
}

#cngiboesdc .gt_indent_3 {
  text-indent: 15px;
}

#cngiboesdc .gt_indent_4 {
  text-indent: 20px;
}

#cngiboesdc .gt_indent_5 {
  text-indent: 25px;
}

#cngiboesdc .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#cngiboesdc div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="term" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">term</th>
<th id="coef" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">coef</th>
<th id="ci_low" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">ci_low</th>
<th id="ci_high" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">ci_high</th>
<th id="p_value" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">p_value</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="term">(Intercept)</td>
<td class="gt_row gt_right" headers="coef">0.07</td>
<td class="gt_row gt_right" headers="ci_low">−0.09</td>
<td class="gt_row gt_right" headers="ci_high">0.24</td>
<td class="gt_row gt_right" headers="p_value">0.393</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age36.50</td>
<td class="gt_row gt_right" headers="coef">0.27</td>
<td class="gt_row gt_right" headers="ci_low">0.10</td>
<td class="gt_row gt_right" headers="ci_high">0.45</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.003</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age66.80</td>
<td class="gt_row gt_right" headers="coef">−0.38</td>
<td class="gt_row gt_right" headers="ci_low">−0.57</td>
<td class="gt_row gt_right" headers="ci_high">−0.18</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age66.80.coffee1</td>
<td class="gt_row gt_right" headers="coef">0.23</td>
<td class="gt_row gt_right" headers="ci_low">0.04</td>
<td class="gt_row gt_right" headers="ci_high">0.42</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.019</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age51.65.coffee2.</td>
<td class="gt_row gt_right" headers="coef">0.29</td>
<td class="gt_row gt_right" headers="ci_low">0.11</td>
<td class="gt_row gt_right" headers="ci_high">0.46</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.002</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age51.65.physical_activity1</td>
<td class="gt_row gt_right" headers="coef">−0.24</td>
<td class="gt_row gt_right" headers="ci_low">−0.42</td>
<td class="gt_row gt_right" headers="ci_high">−0.07</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.007</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">sexFemale.physical_activity2</td>
<td class="gt_row gt_right" headers="coef">−0.32</td>
<td class="gt_row gt_right" headers="ci_low">−0.49</td>
<td class="gt_row gt_right" headers="ci_high">−0.15</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">&lt;0.001</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
<p>Voila – the headlines practically write themselves!</p>
</section>
<section id="how-upsis-work-so-well" class="level2">
<h2 class="anchored" data-anchor-id="how-upsis-work-so-well">How UPSIs work so well</h2>
<p>Let me let you in on a little secret. If you were paying attention, you’d have seen the outcomes in the preceding example, <img src="https://latex.codecogs.com/png.latex?Y_i">, were generated <em>completely randomly</em>. <strong>There was, by design, no true relationship to discover at all</strong>. Yet, I was virtually guaranteed to find a significant result. This is due to multiplicity’s clever younger brother, selective inference.</p>
<p>Selective inference is a difficult problem to solve; in fact, some call it impossible. Others hold out hope. At any rate, UPSIs have inertia and incentives on their side.</p>
<p>Later posts to this blog will go into specific solutions to the selective inference problem and how to implement them in R. Fair warning though, these solutions are often less promising and will certainly be less significant.</p>
<p>For now, let me finish convincing you that UPSIs are indeed effective at resolving the “publish or perish” dilemma; with UPSIs, we can thrive at both! It should suffice to repeat this simulation, as though it were performed in a parallel universe.</p>
</section>
<section id="virtual-trial-2" class="level2">
<h2 class="anchored" data-anchor-id="virtual-trial-2">Virtual Trial 2</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">simdata2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb12-2">fit_bic <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_stepwise_ic</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata2, </span>
<span id="cb12-3">                              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"forward"</span>, </span>
<span id="cb12-4">                              <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BIC"</span>)</span></code></pre></div></div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">infer_upsi</span>(fit_bic, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(coef <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) </span></code></pre></div></div>
</div>
<div class="cell">
<div class="cell-output-display">
<div id="adrrnzvnsg" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#adrrnzvnsg table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#adrrnzvnsg thead, #adrrnzvnsg tbody, #adrrnzvnsg tfoot, #adrrnzvnsg tr, #adrrnzvnsg td, #adrrnzvnsg th {
  border-style: none;
}

#adrrnzvnsg p {
  margin: 0;
  padding: 0;
}

#adrrnzvnsg .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#adrrnzvnsg .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#adrrnzvnsg .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#adrrnzvnsg .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#adrrnzvnsg .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#adrrnzvnsg .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#adrrnzvnsg .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#adrrnzvnsg .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#adrrnzvnsg .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#adrrnzvnsg .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#adrrnzvnsg .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#adrrnzvnsg .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#adrrnzvnsg .gt_spanner_row {
  border-bottom-style: hidden;
}

#adrrnzvnsg .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#adrrnzvnsg .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#adrrnzvnsg .gt_from_md > :first-child {
  margin-top: 0;
}

#adrrnzvnsg .gt_from_md > :last-child {
  margin-bottom: 0;
}

#adrrnzvnsg .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#adrrnzvnsg .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#adrrnzvnsg .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#adrrnzvnsg .gt_row_group_first td {
  border-top-width: 2px;
}

#adrrnzvnsg .gt_row_group_first th {
  border-top-width: 2px;
}

#adrrnzvnsg .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#adrrnzvnsg .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#adrrnzvnsg .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#adrrnzvnsg .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#adrrnzvnsg .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#adrrnzvnsg .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#adrrnzvnsg .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#adrrnzvnsg .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#adrrnzvnsg .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#adrrnzvnsg .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#adrrnzvnsg .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#adrrnzvnsg .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#adrrnzvnsg .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#adrrnzvnsg .gt_left {
  text-align: left;
}

#adrrnzvnsg .gt_center {
  text-align: center;
}

#adrrnzvnsg .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#adrrnzvnsg .gt_font_normal {
  font-weight: normal;
}

#adrrnzvnsg .gt_font_bold {
  font-weight: bold;
}

#adrrnzvnsg .gt_font_italic {
  font-style: italic;
}

#adrrnzvnsg .gt_super {
  font-size: 65%;
}

#adrrnzvnsg .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#adrrnzvnsg .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#adrrnzvnsg .gt_indent_1 {
  text-indent: 5px;
}

#adrrnzvnsg .gt_indent_2 {
  text-indent: 10px;
}

#adrrnzvnsg .gt_indent_3 {
  text-indent: 15px;
}

#adrrnzvnsg .gt_indent_4 {
  text-indent: 20px;
}

#adrrnzvnsg .gt_indent_5 {
  text-indent: 25px;
}

#adrrnzvnsg .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#adrrnzvnsg div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="term" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">term</th>
<th id="coef" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">coef</th>
<th id="ci_low" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">ci_low</th>
<th id="ci_high" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">ci_high</th>
<th id="p_value" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">p_value</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="term">(Intercept)</td>
<td class="gt_row gt_right" headers="coef">0.03</td>
<td class="gt_row gt_right" headers="ci_low">−0.14</td>
<td class="gt_row gt_right" headers="ci_high">0.21</td>
<td class="gt_row gt_right" headers="p_value">0.715</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age66.80.sexFemale</td>
<td class="gt_row gt_right" headers="coef">−0.25</td>
<td class="gt_row gt_right" headers="ci_low">−0.43</td>
<td class="gt_row gt_right" headers="ci_high">−0.07</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.008</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="term">age80..coffee2.</td>
<td class="gt_row gt_right" headers="coef">0.30</td>
<td class="gt_row gt_right" headers="ci_low">0.11</td>
<td class="gt_row gt_right" headers="ci_high">0.50</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.003</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="term">age80..physical_activity1</td>
<td class="gt_row gt_right" headers="coef">−0.26</td>
<td class="gt_row gt_right" headers="ci_low">−0.46</td>
<td class="gt_row gt_right" headers="ci_high">−0.07</td>
<td class="gt_row gt_right" headers="p_value" style="font-weight: bold">0.009</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
<p>Indeed, we still discover “significant” relationships. And these are completely different treatment effect modifiers. In this parallel universe, our scientist friends publish completely different effects and will no doubt pour their hard-won resources into what we well know are wild goose chases.</p>
</section>
<section id="this-is-not-a-fluke." class="level2">
<h2 class="anchored" data-anchor-id="this-is-not-a-fluke.">This is not a fluke.</h2>
<p>In case you are still skeptical, let’s repeat this 50 times, as though we ran the study in 50 parallel universes.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">sim_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb14-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span>(s <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>) { </span>
<span id="cb14-3">  simdata2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb14-4">  fit_bic <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_stepwise_ic</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata2, </span>
<span id="cb14-5">                                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"forward"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BIC"</span>)</span>
<span id="cb14-6">  sim_results[[s]] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">infer_upsi</span>(fit_bic, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> simdata2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(coef <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) </span>
<span id="cb14-9">}</span>
<span id="cb14-10"></span>
<span id="cb14-11">all_selections <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(sim_results, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.id =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"simulation"</span>)</span></code></pre></div></div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">main_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> all_selections <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(term <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(Intercept)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(simulation) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb15-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb15-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">any_p_lt_05 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">any</span>(p_value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>), </span>
<span id="cb15-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_selections =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()</span>
<span id="cb15-7">  )</span></code></pre></div></div>
</div>
<p>Using BIC, we found at least one significant effect in 46 of 50 trials, and on average these trials found 2.6 significant effects. We achieved falsely significant results 92% of the time.</p>
<p>If you’re thinking “hmm, my study still might have a chance of failing”, well, good news: AIC finds a significant result nearly 100% of the time. We also only considered pairwise interactions between 6 candidate predictors. To minimize the risk of a negative study, one could consider higher-order interactions or a greater number of subgroups, and a positive result quickly becomes a sure thing.</p>
<p>As I’ve said, UPSIs have incentives and inertia on their side. The more broadly we apply this practice, the better we can ensure that the publications keep flowing, regardless of the truth.</p>
<p>Despicably yours,</p>
<p><em>P. Hackman</em></p>
<hr>
</section>
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<section id="key-takeaways" class="level2">
<h2 class="anchored" data-anchor-id="key-takeaways">Key Takeaways</h2>
<ul>
<li>Unadjusted post-selection inference nearly guarantees false positives</li>
<li>Searching for subgroup effects is a model selection problem</li>
<li>Model selection algorithms don’t make multiplicity issues go away; they make them more subtle and harder to adjust for.</li>
</ul>
</section>
<section id="future-threads" class="level2">
<h2 class="anchored" data-anchor-id="future-threads">Future Threads</h2>
<ul>
<li>What are some alternatives to UPSI-based inference, and how are they implemented in R?</li>
<li>What is the <code>selectInferToolkit</code> package?</li>
</ul>
</section>
<section id="related" class="level2">
<h2 class="anchored" data-anchor-id="related">Related</h2>
<ul>
<li>Our group first introduced the term UPSI in the post on <a href="https://data-diction.com/posts/glassbox-models/">glass-box modeling</a>.</li>
<li>Good further reading: <a href="https://hdsr.mitpress.mit.edu/pub/l39rpgyc/release/3">Benjamini 2020</a></li>
</ul>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-5-contents" aria-controls="callout-5" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Reviewers
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-5" class="callout-5-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<ul>
<li>Logan Harris (December 1, 2025)</li>
<li>Matt Bolt (December 11, 2025)</li>
</ul>
</div>
</div>
</div>
<hr>
</section>
</section>
<section id="appendix" class="level1">
<h1>Appendix</h1>
<details>
<summary>
R Session Info
</summary>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">sessioninfo<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">session_info</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       macOS Sequoia 15.7.2
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2025-12-22
 pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.7.32 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package            * version     date (UTC) lib source
 adaptMCMC            1.5         2024-01-29 [1] CRAN (R 4.5.0)
 backports            1.5.0       2024-05-23 [1] CRAN (R 4.5.0)
 broom              * 1.0.11      2025-12-04 [1] CRAN (R 4.5.2)
 class                7.3-23      2025-01-01 [2] CRAN (R 4.5.2)
 cli                  3.6.5       2025-04-23 [1] CRAN (R 4.5.0)
 coda                 0.19-4.1    2024-01-31 [1] CRAN (R 4.5.0)
 codetools            0.2-20      2024-03-31 [2] CRAN (R 4.5.2)
 data.table           1.17.8      2025-07-10 [1] CRAN (R 4.5.0)
 digest               0.6.39      2025-11-19 [1] CRAN (R 4.5.2)
 dplyr              * 1.1.4       2023-11-17 [1] CRAN (R 4.5.0)
 evaluate             1.0.5       2025-08-27 [1] CRAN (R 4.5.0)
 farver               2.1.2       2024-05-13 [1] CRAN (R 4.5.0)
 fastmap              1.2.0       2024-05-15 [1] CRAN (R 4.5.0)
 forcats            * 1.0.1       2025-09-25 [1] CRAN (R 4.5.0)
 foreach              1.5.2       2022-02-02 [1] CRAN (R 4.5.0)
 fs                   1.6.6       2025-04-12 [1] CRAN (R 4.5.0)
 future               1.68.0      2025-11-17 [1] CRAN (R 4.5.2)
 future.apply         1.20.0      2025-06-06 [1] CRAN (R 4.5.0)
 generics             0.1.4       2025-05-09 [1] CRAN (R 4.5.0)
 ggplot2            * 4.0.1       2025-11-14 [1] CRAN (R 4.5.2)
 glmnet               4.1-10      2025-07-17 [1] CRAN (R 4.5.0)
 globals              0.18.0      2025-05-08 [1] CRAN (R 4.5.0)
 glue                 1.8.0       2024-09-30 [1] CRAN (R 4.5.0)
 gower                1.0.2       2024-12-17 [1] CRAN (R 4.5.0)
 gt                 * 1.1.0       2025-09-23 [1] CRAN (R 4.5.0)
 gtable               0.3.6       2024-10-25 [1] CRAN (R 4.5.0)
 hardhat              1.4.2       2025-08-20 [1] CRAN (R 4.5.0)
 hms                  1.1.4       2025-10-17 [1] CRAN (R 4.5.0)
 htmltools            0.5.9       2025-12-04 [1] CRAN (R 4.5.2)
 htmlwidgets          1.6.4       2023-12-06 [1] CRAN (R 4.5.0)
 intervals            0.15.5      2024-08-23 [1] CRAN (R 4.5.0)
 ipred                0.9-15      2024-07-18 [1] CRAN (R 4.5.0)
 iterators            1.0.14      2022-02-05 [1] CRAN (R 4.5.0)
 jsonlite             2.0.0       2025-03-27 [1] CRAN (R 4.5.0)
 knitr                1.50        2025-03-16 [1] CRAN (R 4.5.0)
 lattice              0.22-7      2025-04-02 [2] CRAN (R 4.5.2)
 lava                 1.8.2       2025-10-30 [1] CRAN (R 4.5.0)
 lifecycle            1.0.4       2023-11-07 [1] CRAN (R 4.5.0)
 listenv              0.10.0      2025-11-02 [1] CRAN (R 4.5.0)
 lubridate          * 1.9.4       2024-12-08 [1] CRAN (R 4.5.0)
 magrittr             2.0.4       2025-09-12 [1] CRAN (R 4.5.0)
 MASS                 7.3-65      2025-02-28 [2] CRAN (R 4.5.2)
 Matrix               1.7-4       2025-08-28 [2] CRAN (R 4.5.2)
 ncvreg               3.16.0      2025-10-09 [1] Github (pbreheny/ncvreg@5fecc8c)
 nnet                 7.3-20      2025-01-01 [1] CRAN (R 4.5.0)
 parallelly           1.45.1      2025-07-24 [1] CRAN (R 4.5.0)
 pbapply              1.7-4       2025-07-20 [1] CRAN (R 4.5.0)
 pillar               1.11.1      2025-09-17 [1] CRAN (R 4.5.0)
 pkgconfig            2.0.3       2019-09-22 [1] CRAN (R 4.5.0)
 prodlim              2025.04.28  2025-04-28 [1] CRAN (R 4.5.0)
 purrr              * 1.2.0       2025-11-04 [1] CRAN (R 4.5.0)
 R6                   2.6.1       2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer         1.1-3       2022-04-03 [1] CRAN (R 4.5.0)
 Rcpp                 1.1.0       2025-07-02 [1] CRAN (R 4.5.0)
 readr              * 2.1.6       2025-11-14 [1] CRAN (R 4.5.2)
 recipes              1.3.1       2025-05-21 [1] CRAN (R 4.5.0)
 rlang                1.1.6       2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown            2.30        2025-09-28 [1] CRAN (R 4.5.0)
 rpart                4.1.24      2025-01-07 [2] CRAN (R 4.5.2)
 rstudioapi           0.17.1      2024-10-22 [1] CRAN (R 4.5.0)
 S7                   0.2.1       2025-11-14 [1] CRAN (R 4.5.2)
 sass                 0.4.10      2025-04-11 [1] CRAN (R 4.5.0)
 scales               1.4.0       2025-04-24 [1] CRAN (R 4.5.0)
 selectInferToolkit * 0.4.2       2025-12-15 [1] local
 selectiveInference   1.2.5       2019-09-07 [1] CRAN (R 4.5.0)
 sessioninfo          1.2.3       2025-02-05 [1] CRAN (R 4.5.0)
 shape                1.4.6.1     2024-02-23 [1] CRAN (R 4.5.0)
 sparsevctrs          0.3.4       2025-05-25 [1] CRAN (R 4.5.0)
 stringi              1.8.7       2025-03-27 [1] CRAN (R 4.5.0)
 stringr            * 1.6.0       2025-11-04 [1] CRAN (R 4.5.0)
 survival             3.8-3       2024-12-17 [2] CRAN (R 4.5.2)
 tibble             * 3.3.0       2025-06-08 [1] CRAN (R 4.5.0)
 tidyr              * 1.3.1       2024-01-24 [1] CRAN (R 4.5.0)
 tidyselect           1.2.1       2024-03-11 [1] CRAN (R 4.5.0)
 tidyverse          * 2.0.0       2023-02-22 [1] CRAN (R 4.5.0)
 timechange           0.3.0       2024-01-18 [1] CRAN (R 4.5.0)
 timeDate             4051.111    2025-10-17 [1] CRAN (R 4.5.0)
 tzdb                 0.5.0       2025-03-15 [1] CRAN (R 4.5.0)
 utf8                 1.2.6       2025-06-08 [1] CRAN (R 4.5.0)
 vctrs                0.6.5       2023-12-01 [1] CRAN (R 4.5.0)
 withr                3.0.2       2024-10-28 [1] CRAN (R 4.5.0)
 xfun                 0.54        2025-10-30 [1] CRAN (R 4.5.0)
 xml2                 1.5.1       2025-12-01 [1] CRAN (R 4.5.2)
 yaml                 2.3.11.9000 2025-12-10 [1] Github (r-lib/r-yaml@6dc4582)

 [1] /Users/rpterson/Library/R/arm64/4.5/library
 [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────</code></pre>
</div>
</div>
</details>


</section>

 ]]></description>
  <category>model selection</category>
  <category>glass-box modeling</category>
  <category>post-selection inference</category>
  <category>analysis</category>
  <category>R</category>
  <category>satire</category>
  <guid>https://www.data-diction.com/posts/upsi-example/</guid>
  <pubDate>Mon, 22 Dec 2025 00:00:00 GMT</pubDate>
  <media:content url="https://www.data-diction.com/posts/upsi-example/thumbnail.png" medium="image" type="image/png" height="112" width="144"/>
</item>
<item>
  <title>Does debiasing estimates lead to better predictions?</title>
  <dc:creator>Logan Harris</dc:creator>
  <link>https://www.data-diction.com/posts/bias-benefit/</link>
  <description><![CDATA[ 





<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Reviewers
</div>
</div>
<div class="callout-body-container callout-body">
<ul>
<li>Ryan Peterson (2025-11-24).</li>
<li>Patrick Breheny (2025-12-12).</li>
</ul>
</div>
</div>
<hr>
<p>What connotation do you attach to the word “bias”? A negative one?</p>
<p>In this post we will see why not all bias is bad… at least when it comes to building predictive models. In fact, for many years, statisticians have recognized the benefits of biased estimators in reducing prediction error. Perhaps you knew this, but if not, don’t worry. That is the purpose of this post.</p>
<section id="modeling-goal" class="level1">
<h1>Modeling Goal</h1>
<p>When starting off with a data analysis, it is important to outline the primary goals. Whether you realize it or not, in virtually all cases, a primary goal is to use the data to develop a predictive model.</p>
<p>Consider a clinical trial evaluating a new drug for metastatic lung cancer. We say our goal is to “estimate the treatment effect” of our new drug. However, estimating a treatment effect is fundamentally a predictive task. That is, we need to <em>predict</em> what would happen with treatment and what would happen without treatment. The treatment effect is the difference between those two predictions*, so the better our predictions, the better our estimate of the treatment effect.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>A pre-<em>dictor</em> is pre-<em>what</em>, exactly?*
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>When we say a treatment effect is “the difference between two predictions,” that only works if the predictions are made using information that comes before the treatment starts. These are true pre-<em>treatment</em> predictors.</p>
<p>If we accidentally include a model “predictor” that is measured after treatment begins, the treatment could affect it, and the real treatment effect gets obfuscated. The model might appear to “predict” well, but the truth is that it’s not “predicting” at all. Models built this way can give a misleading estimate of the effect of the treatment because part of the effect has been absorbed by the post-treatment variable.</p>
<p>For a valid treatment effect, we must base predictions only on information that could not have been influenced by the treatment – valid <strong>pre</strong>-dictor variables.</p>
</div>
</div>
</div>
<p>Now, suppose with treatment alone we can predict remission status with 60% accuracy. If you also have patients’ genetic information, what should you do with that?</p>
<p>Including it directly in a model is problematic because the human genome is large*. Unless you have a massive number of patients in your trial (hundreds of thousands), you will NEED to make assumptions about the plausible effects of someone’s genetic information on how well the new treatment works. These assumptions allow us to fit a model with all the genetic information, but they also intentionally introduce bias into the estimates. This is what a method known as penalized regression does, which we will cover momentarily.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Human Genome*
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>The human genome is estimated to contain roughly 20,000 protein-coding genes!</p>
</div>
</div>
</div>
<p>Now, suppose including genetic information allows us to predict whether a patient will experience remission with 75% accuracy. But this was with biased estimates?! Surely, if we could remove this bias we would get even better predictions, right?</p>
<p>To answer this, we will now break down what makes a set of estimates good at prediction.</p>
</section>
<section id="breaking-down-predictive-performance" class="level1">
<h1>Breaking down predictive performance</h1>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Bias-variance tradeoff
</div>
</div>
<div class="callout-body-container callout-body">
<p>When we estimate parameters for a model, we are always juggling two competing forces: <strong>bias</strong>, which measures how far our average estimate is from the truth, and <strong>variance</strong>, which reflects how much our estimates would change if we collected a new sample. A model’s ability to predict an outcome depends on both. Increasing or inducing bias can often <em>reduce</em> variance, whereas decreasing bias can often <em>increase</em> variance. Good predictions come from striking a balance between the two. Intentionally introducing a little bias can often drastically reduce variance, resulting in better predictions.</p>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/bias-benefit/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>It is like comparing one friend who is always 5 minutes late to one who is sometimes 30 minutes early and other times 30 minutes late. While the latter is on time on average, you’d probably describe the one who is always 5 minutes late as more reliable.</p>
<p>If you are someone who likes mathematical details, keep reading. But if the previous example makes sense, feel free to skip to the next section.</p>
<p>For a sample of size <img src="https://latex.codecogs.com/png.latex?n">, suppose we have a continuous outcome stored in a length <img src="https://latex.codecogs.com/png.latex?n"> vector <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7By%7D"> and <img src="https://latex.codecogs.com/png.latex?p"> features on each sample unit stored in an <img src="https://latex.codecogs.com/png.latex?n%20%5Ctimes%20p"> matrix <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BX%7D">. We will focus on a linear predictor setting. That is, we want to predict <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7By%7D"> based on a linear combination of the features in <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BX%7D">. We do this by estimating <img src="https://latex.codecogs.com/png.latex?p"> parameters <img src="https://latex.codecogs.com/png.latex?%5Cboldsymbol%7B%5Cbeta%7D"> and then obtaining predictions <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmathbf%7By%7D%7D">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cmathbf%7By%7D%7D%20=%20%5Cmathbf%7BX%7D%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D%0A"></p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D"> is usually estimated in a way that is based on minimizing <img src="https://latex.codecogs.com/png.latex?%5ClVert%5Cmathbf%7By%7D-%20%5Chat%7B%5Cmathbf%7By%7D%7D%5CrVert_2%5E2">, known as the residual sum of squares. However, to assess predictive performance we might consider mean square prediction error (MSPE). MSPE shifts our focus to how well our model is expected to predict <em>a single out-of-sample observation</em> (an observation <img src="https://latex.codecogs.com/png.latex?y_0"> with predictors <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bx%7D_0"> that was not in the original sample used to estimate the model).</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BMSPE%7D%20=%20E%20%5Cleft%5B%5ClVert%20y_0%20-%20%5Cmathbf%7Bx%7D_0%5E%7B%5Cscriptscriptstyle%5Ctop%7D%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D%5CrVert_2%5E2%5Cright%5D%20=%20E%20%5Cleft%5B%20(y_0%20-%20%5Chat%20y_0)%5E2%5Cright%5D%0A"></p>
<p>Assuming that the error structure for <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7By%7D"> is normally distributed (i.e.&nbsp;<img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7By%7D=%20%5Cmathbf%7BX%7D%5Cboldsymbol%7B%5Cbeta%7D+%20%5Cepsilon,%20%5Cepsilon%5Coverset%7B%5Ctext%7Biid%7D%7D%7B%5Csim%7D%5Ctextrm%7BN%7D(0,%20%5Csigma%5E2)">) and letting <img src="https://latex.codecogs.com/png.latex?%5Cepsilon_0"> be the error corresponding to <img src="https://latex.codecogs.com/png.latex?y_0"> and <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bx%7D_0">, MSPE can be decomposed as follows:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A(y_0%20-%20%5Chat%7By%7D_0)%5E2%20&amp;=%20(%5Cmathbf%7Bx%7D_0%5E%7B%5Cscriptscriptstyle%5Ctop%7D%20%5Cboldsymbol%7B%5Cbeta%7D-%20%5Cmathbf%7Bx%7D_0%5E%7B%5Cscriptscriptstyle%5Ctop%7D%20%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D+%20%5Cepsilon_0)%5E2%20%5C%5C%0A&amp;=%20(%5Cmathbf%7Bx%7D_0%5E%7B%5Cscriptscriptstyle%5Ctop%7D(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D-%20%5Cboldsymbol%7B%5Cbeta%7D))%5E2%20-%202%20%5Cmathbf%7Bx%7D_0%20%5E%7B%5Cscriptscriptstyle%5Ctop%7D(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D-%20%5Cboldsymbol%7B%5Cbeta%7D)%5Cepsilon_0%20+%20%5Cepsilon_0%5E2%20%5C%5C%0A%5CRightarrow%20%5Ctext%7BMSPE%7D%20&amp;=%20E%5Cleft%5B%5Cmathbf%7Bx%7D_0%5E%7B%5Cscriptscriptstyle%5Ctop%7D(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D-%20%5Cboldsymbol%7B%5Cbeta%7D)(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D-%20%5Cboldsymbol%7B%5Cbeta%7D)%5E%7B%5Cscriptscriptstyle%5Ctop%7D%5Cmathbf%7Bx%7D_0%5Cright%5D%20+%20%5Csigma%5E2%20%5C%5C%0A&amp;=%20%5Cmathbf%7Bx%7D_0%5E%7B%5Cscriptscriptstyle%5Ctop%7DE%5Cleft%5B(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D-%20%5Cboldsymbol%7B%5Cbeta%7D)(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D-%20%5Cboldsymbol%7B%5Cbeta%7D)%5E%7B%5Cscriptscriptstyle%5Ctop%7D%5Cright%5D%5Cmathbf%7Bx%7D_0%20+%20%5Csigma%5E2%20%5C%5C%0A&amp;=%20%5Cmathbf%7Bx%7D_0%5E%7B%5Cscriptscriptstyle%5Ctop%7D%5Cleft%5B%5Ctext%7BVar%7D(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D)%20+%20%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D)%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cboldsymbol%7B%5Cbeta%7D%7D)%5E%7B%5Cscriptscriptstyle%5Ctop%7D%5Cright%5D%5Cmathbf%7Bx%7D_0%20+%20%5Csigma%5E2%0A%5Cend%7Baligned%7D%0A"></p>
<p>To make the point easier to see, assume we have just a single predictor <img src="https://latex.codecogs.com/png.latex?x_0"> (i.e., <img src="https://latex.codecogs.com/png.latex?p%20=%201">) and consider two different candidate estimators for <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D%5EA"> and <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D%5EB">. Then, the difference in their MSPEs is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BMSPE%7D(%5Cwidehat%7B%5Cbeta%7D%5EA)%20-%20%5Ctext%7BMSPE%7D(%5Cwidehat%7B%5Cbeta%7D%5EB)%0A"></p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A&amp;%20=%20x_0%5E2%20%5Cleft(%5Ctext%7BVar%7D(%5Cwidehat%7B%5Cbeta%7D%5EA)%20+%20%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cbeta%7D%5EA)%5E2%5Cright)+%20%5Csigma%5E2%0A-%20%5Cleft%5B%20x_0%5E2%20%5Cleft(%20%5Ctext%7BVar%7D(%5Cwidehat%7B%5Cbeta%7D%5EB)%20+%20%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cbeta%7D%5EB)%5E2%5Cright)%20+%20%5Csigma%5E2%20%5Cright%5D%20%5C%5C%0A&amp;%20%5Cpropto%20(%5Ctext%7BVar%7D(%5Cwidehat%7B%5Cbeta%7D%5EA)-%5Ctext%7BVar%7D(%5Cwidehat%7B%5Cbeta%7D%5EB))%20+%20(%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cbeta%7D%5EA)%5E2%20-%20%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cbeta%7D%5EB)%5E2).%0A%5Cend%7Baligned%7D%0A"></p>
<p>So now we see the details of the earlier claim that predictive performance is a function of both variance and bias of the estimates.</p>
<p>Let’s work through a toy example. Suppose estimator A has a bias of -0.5 and a variance of 1. This scenario might reflect a <strong>penalized</strong> estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D%5EA">, which shrinks the estimate by a certain amount. Now consider an alternative estimator B that corrects the bias of estimator A so that <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cbeta%7D%5EB)%20=%200">, at the cost of increasing its variance to 1.5. Then</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A(%5Ctext%7BVar%7D(%5Cwidehat%7B%5Cbeta%7D%5EA)-%5Ctext%7BVar%7D(%5Cwidehat%7B%5Cbeta%7D%5EB))%20+%20(%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cbeta%7D%5EA)%5E2%20-%20%5Ctext%7BBias%7D(%5Cwidehat%7B%5Cbeta%7D%5EB)%5E2)%20&amp;=%20(1%20-%201.5)%20+%20((-0.5)%5E2%20-%200%5E2)%20%5C%5C%0A&amp;=%20-0.25%0A%5Cend%7Baligned%7D%0A"></p>
<p>Because the difference is negative, the MSPE for the “debiased” estimator B is greater than that of the “biased” estimator A. Correcting the bias in the estimator had a negative impact on predictive performance.</p>
<p>So, if we care about predictive performance, then bias in our estimates is not necessarily a bad thing.</p>
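<p>As a quick check of the arithmetic in the toy example, here is a minimal R sketch of the single-predictor MSPE formula. The helper <code>mspe()</code> and the values <code>x0 = 1</code> and <code>sigma2 = 1</code> are assumptions made purely for illustration; any nonzero <code>x0</code> would give a difference with the same sign.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># Sketch: MSPE at a single point x0 with p = 1, using
# MSPE = x0^2 * (Var + Bias^2) + sigma^2 from the decomposition in this section.
# x0 = 1 and sigma2 = 1 are assumed values, chosen only for illustration.
mspe &lt;- function(variance, bias, x0 = 1, sigma2 = 1) {
  x0^2 * (variance + bias^2) + sigma2
}

mspe_a &lt;- mspe(variance = 1.0, bias = -0.5)  # biased (penalized) estimator A
mspe_b &lt;- mspe(variance = 1.5, bias = 0)     # "debiased" estimator B

# Negative difference: A has the smaller prediction error
mspe_a - mspe_b</code></pre>
</div>
<p>The irreducible error <code>sigma2</code> cancels in the difference, so only the variance and squared-bias terms decide which estimator predicts better.</p>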
</section>
<section id="penalized-regression" class="level1">
<h1>Penalized Regression</h1>
<p>A common rule of thumb is that you need <em>at least</em> 10 observations (<img src="https://latex.codecogs.com/png.latex?n"> denotes number of observations) per predictor (<img src="https://latex.codecogs.com/png.latex?p"> denotes number of predictors) in traditional linear regression settings to estimate <img src="https://latex.codecogs.com/png.latex?%5Cboldsymbol%7B%5Cbeta%7D"> “stably” using ordinary least squares (OLS), but even more may be required. If <img src="https://latex.codecogs.com/png.latex?n%20%3E%20p"> but <img src="https://latex.codecogs.com/png.latex?n%20%3C%2010p">, this rule of thumb would suggest we are in a gray area where estimates for <img src="https://latex.codecogs.com/png.latex?%5Cboldsymbol%7B%5Cbeta%7D"> tend to be highly variable and can lead to poor predictions. Of course, if <img src="https://latex.codecogs.com/png.latex?n%20%3C%20p">, then it is not possible to use OLS at all.</p>
<p>Adding bias to the estimation process is helpful in both of these scenarios. Enter <em>penalized regression</em> methods.</p>
<p>In general, such methods “penalize” larger estimates of <img src="https://latex.codecogs.com/png.latex?%5Cboldsymbol%7B%5Cbeta%7D"> by an amount dictated by a tuning parameter, <img src="https://latex.codecogs.com/png.latex?%5Clambda">. The penalty acts like a “complexity tax,” forcing the model to stop chasing noise and start paying attention to the real signal. So, while this introduces bias in the estimates, it also reduces their variance, often leading to superior predictive performance. In fact, this can be true even when <img src="https://latex.codecogs.com/png.latex?n"> is much larger than <img src="https://latex.codecogs.com/png.latex?p">. In high-dimensional or noisy settings, or when predictors are highly correlated, the gains are often large enough that biased estimators dominate unbiased ones in prediction.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-5-contents" aria-controls="callout-5" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Popular Penalized Regression Methods
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-5" class="callout-5-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<ol type="1">
<li>The least absolute shrinkage and selection operator (lasso)</li>
<li>Ridge regression</li>
<li>Elastic net (simply a mix of 1 and 2)</li>
</ol>
</div>
</div>
</div>
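<p>To see the trade-off in a controlled setting, here is a small simulation sketch using ridge regression (method 2 above) in its closed form, so that only base R is needed. The sample size, the correlation structure, the true coefficients, and the value <code>lambda = 5</code> are all arbitrary choices for illustration, not tuned values.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)
n &lt;- 40; p &lt;- 30; lambda &lt;- 5          # arbitrary illustrative settings
beta &lt;- c(rep(1, 5), rep(0, p - 5))    # assumed true coefficients

fit_once &lt;- function() {
  # Correlated predictors: a shared factor plus independent noise
  X &lt;- 0.8 * rnorm(n) + matrix(rnorm(n * p), n, p)
  y &lt;- drop(X %*% beta) + rnorm(n)
  ols   &lt;- solve(crossprod(X), crossprod(X, y))
  ridge &lt;- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
  c(ols = ols[1], ridge = ridge[1])    # estimates of the first coefficient
}

est &lt;- replicate(500, fit_once())      # 2 x 500 matrix of estimates
rowMeans(est) - 1                      # empirical bias (true value is 1)
apply(est, 1, var)                     # empirical variance
rowMeans((est - 1)^2)                  # empirical mean squared error</code></pre></div></div>
</div>
<p>Under these settings, the ridge estimate should show some bias but a much smaller variance than OLS. How the two net out in mean squared error depends on the chosen <code>lambda</code>.</p>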
<p>It’s easiest to show this with data, so we’ll now turn to an example.</p>
</section>
<section id="example-predicting-leukemia-subtype" class="level1">
<h1>Example: Predicting Leukemia Subtype</h1>
<p>To start, we’ll load some libraries and a helpful function.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(hdrm)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ncvreg)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(hdi)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb1-5"></span>
<span id="cb1-6">estimate_intercept <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(beta, X, y) {</span>
<span id="cb1-7">  eta_no_intercept <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop</span>(X <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> beta)</span>
<span id="cb1-8"></span>
<span id="cb1-9">  f <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(alpha) {</span>
<span id="cb1-10">    p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plogis</span>(alpha <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eta_no_intercept)</span>
<span id="cb1-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(p)</span>
<span id="cb1-12">  }</span>
<span id="cb1-13"></span>
<span id="cb1-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">uniroot</span>(f, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">interval =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>root</span>
<span id="cb1-15">}</span></code></pre></div></div>
</div>
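<p>The helper <code>estimate_intercept()</code> is not explained above. Reading the code, it appears to choose the intercept so that the sum of fitted probabilities equals the number of observed events, which is the score equation for the intercept in a logistic regression when the other coefficients are held fixed. The sketch below checks that reading on simulated data (all values are made up) by comparing the result against <code>glm()</code> with the fixed linear predictor supplied as an offset.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Same helper as above, repeated so this chunk runs on its own
estimate_intercept &lt;- function(beta, X, y) {
  eta_no_intercept &lt;- drop(X %*% beta)
  f &lt;- function(alpha) sum(y) - sum(plogis(alpha + eta_no_intercept))
  uniroot(f, interval = c(-1000, 1000))$root
}

set.seed(1)
n &lt;- 200
X &lt;- matrix(rnorm(n * 3), n, 3)
beta &lt;- c(1, -1, 0.5)                       # treated as known and fixed
y &lt;- rbinom(n, 1, plogis(-0.3 + drop(X %*% beta)))

a_root &lt;- estimate_intercept(beta, X, y)

# Intercept-only logistic regression with X %*% beta as a fixed offset
glm_fit &lt;- glm(y ~ 1, offset = drop(X %*% beta), family = binomial)
a_glm &lt;- unname(coef(glm_fit))

c(uniroot = a_root, glm = a_glm)            # should agree up to uniroot's tolerance</code></pre></div></div>
</div>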
<p>Now, consider a data set for predicting leukemia subtype using gene expression data <span class="citation" data-cites="Golub1999">(Golub et al. 1999)</span>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">golub <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> hdrm<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_data</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Golub1999"</span>)</span>
<span id="cb2-2">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> golub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>X</span>
<span id="cb2-3">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.integer</span>(golub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ALL"</span>)</span>
<span id="cb2-4">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(golub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y)</span></code></pre></div></div>
</div>
<p>This dataset has 47 patients with acute lymphoblastic leukemia (abbreviated as ALL) and 25 patients with acute myeloid leukemia (AML). There are 7129 gene expression features, putting us in the high-dimensional realm of <img src="https://latex.codecogs.com/png.latex?n%20%3C%20p">.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>When <img src="https://latex.codecogs.com/png.latex?n%20%3C%20p">, the variance of the estimate of <img src="https://latex.codecogs.com/png.latex?%5Cboldsymbol%7B%5Cbeta%7D"> for an ordinary, unpenalized regression of any type is essentially <img src="https://latex.codecogs.com/png.latex?%5Cinfty"> because the model is not identifiable. It’s like trying to solve a puzzle with more missing pieces than clues; many solutions look “possible,” so you can’t tell which one is the real one.</p>
</div>
</div>
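<p>The identifiability problem in the note can be shown directly. The following is a minimal sketch on a small random matrix (the dimensions and coefficient values are arbitrary): when there are more columns than rows, we can construct two different coefficient vectors that produce exactly the same fitted values, so the data cannot distinguish between them.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)
n &lt;- 5; p &lt;- 10                       # more predictors than observations
X &lt;- matrix(rnorm(n * p), n, p)
beta1 &lt;- rnorm(p)

# With all p right singular vectors requested, those beyond the first n
# span the null space of X, so X %*% v_null is (numerically) zero
v_null &lt;- svd(X, nu = 0, nv = p)$v[, p]
beta2 &lt;- beta1 + 3 * v_null           # a clearly different coefficient vector

max(abs(beta1 - beta2))               # coefficients differ
max(abs(X %*% beta1 - X %*% beta2))   # fitted values agree up to rounding error</code></pre></div></div>
</div>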
<p>We split the dataset 50/50 into train and test sets.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2806</span>)</span>
<span id="cb3-2">idx_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample.int</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">floor</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>))</span>
<span id="cb3-3">Xtrain <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> X[idx_train,]</span>
<span id="cb3-4">ytrain <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> y[idx_train]</span>
<span id="cb3-5">Xtest <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> X[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>idx_train,]</span>
<span id="cb3-6">ytest <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> y[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>idx_train]</span></code></pre></div></div>
</div>
<p>Then, we fit a penalized regression method called the “lasso” and select the tuning parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda"> using cross-validation:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2806</span>)</span>
<span id="cb4-2">cv_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.ncvreg</span>(Xtrain, ytrain, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lasso"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"binomial"</span>)</span>
<span id="cb4-3">lambda_min <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> cv_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lambda.min</span></code></pre></div></div>
</div>
<p>Finally, we can visualize the predicted probabilities on the test data to see how well the selected lasso model differentiates between the two disease subtypes.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">lasso_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(</span>
<span id="cb5-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">predicted_prob_ALL =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(cv_fit, Xtest, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>),</span>
<span id="cb5-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(ytest, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AML"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ALL"</span>))</span>
<span id="cb5-4">)</span>
<span id="cb5-5"></span>
<span id="cb5-6">lasso_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>prediction_quality <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(</span>
<span id="cb5-7">  lasso_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>predicted_prob_ALL <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> lasso_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>class <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AML"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> </span>
<span id="cb5-8">    lasso_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>predicted_prob_ALL <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> lasso_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>class <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ALL"</span>,</span>
<span id="cb5-9">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bad"</span>, </span>
<span id="cb5-10">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Good"</span></span>
<span id="cb5-11">)</span>
<span id="cb5-12"></span>
<span id="cb5-13"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(lasso_res, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> class, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> predicted_prob_ALL, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> prediction_quality)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">position_jitter</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_discrete</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb5-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Lasso-based predicted probability ALL"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/bias-benefit/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Beyond reducing the variability of estimation (enough to actually fit a model), introducing bias also helps mitigate overfitting, which leads to more generalizable models with improved out-of-sample predictions. In this example, we get good (though imperfect) separation between the two subtypes.</p>
<p>This is not the only benefit of introducing bias here. Penalties like the lasso produce sparse fits, with many coefficients set exactly to zero (the second “s” in lasso does literally stand for <em>selection</em>). Weak and noisy predictors are removed, leading to more interpretable <a href="https://data-diction.com/posts/glassbox-models/">glass-box models</a>.</p>
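<p>Sparsity is easy to inspect from a fitted object. The sketch below uses simulated data rather than the leukemia data (the dimensions and the two nonzero effects are invented for illustration) and counts how many coefficients the cross-validated lasso keeps. The same two lines at the end could be applied to a <code>cv.ncvreg</code> fit on real data.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(ncvreg)

set.seed(1)
n &lt;- 60; p &lt;- 200                       # simulated high-dimensional setting
X &lt;- matrix(rnorm(n * p), n, p)
eta &lt;- 2 * X[, 1] - 2 * X[, 2]          # only two predictors carry signal
y &lt;- rbinom(n, 1, plogis(eta))

cv_fit &lt;- cv.ncvreg(X, y, penalty = "lasso", family = "binomial")

b &lt;- coef(cv_fit)                       # intercept plus p coefficients at lambda.min
sum(b[-1] != 0)                         # number of predictors with nonzero coefficients</code></pre></div></div>
</div>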
</section>
<section id="the-pitfall-of-debiasing" class="level1">
<h1>The Pitfall of Debiasing</h1>
<p>That being said, it is reasonable to think that reducing the bias of the estimates for <img src="https://latex.codecogs.com/png.latex?%5Cboldsymbol%7B%5Cbeta%7D"> would lead to better predictive performance. One method designed to do this is the debiased lasso (also known as the desparsified lasso; <span class="citation" data-cites="ZhangZhang2014">Zhang and Zhang 2014</span>). However, while this method is useful for producing asymptotically unbiased estimators, debiasing can come at the cost of reintroducing a large amount of variance.</p>
<p>As a result, debiasing often leads to noticeably <em>worse</em> predictions. To see this, we will fit the desparsified lasso to our leukemia dataset.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="do" style="color: #5E5E5E;
background-color: null;
font-style: italic;">## Takes about 20 minutes; cached </span></span>
<span id="cb6-2">debiased_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> hdi<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lasso.proj</span>(Xtrain, ytrain, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"binomial"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda =</span> lambda_min)</span>
<span id="cb6-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">saveRDS</span>(debiased_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bhat, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"debiased_fit.rds"</span>)</span></code></pre></div></div>
</div>
<p>In the code below, we re-estimate the intercept after we obtain the debiased estimates. Then, we can visualize the predictions as we did with the lasso.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">debiased_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readRDS</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"debiased_fit.rds"</span>)</span>
<span id="cb7-2">debiased_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(</span>
<span id="cb7-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">predicted_prob_ALL =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plogis</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">estimate_intercept</span>(debiased_fit, Xtrain, ytrain) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Xtest <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> debiased_fit),</span>
<span id="cb7-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">class =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(ytest, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AML"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ALL"</span>))</span>
<span id="cb7-5">)</span>
<span id="cb7-6"></span>
<span id="cb7-7">debiased_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>prediction_quality <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(</span>
<span id="cb7-8">  debiased_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>predicted_prob_ALL <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> debiased_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>class <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AML"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> </span>
<span id="cb7-9">    debiased_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>predicted_prob_ALL <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> debiased_res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>class <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ALL"</span>,</span>
<span id="cb7-10">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bad"</span>, </span>
<span id="cb7-11">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Good"</span></span>
<span id="cb7-12">)</span>
<span id="cb7-13"></span>
<span id="cb7-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(debiased_res, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> class, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> predicted_prob_ALL, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> prediction_quality)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">position_jitter</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_discrete</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Debiased-lasso-based predicted probability ALL"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/bias-benefit/index_files/figure-html/unnamed-chunk-8-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Debiasing the original lasso point estimates results in 1 additional subject with AML being misclassified as ALL. Additionally, whereas the lasso correctly classified all subjects with ALL, debiasing leads to 2 of them being incorrectly predicted as AML. This is a reduction in accuracy from 87.5% to 78.1%!</p>
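<p>Accuracy figures like these can be computed from a results data frame with a confusion table. Here is a minimal sketch using a handful of made-up probabilities and labels (not the values from the models above); the column names mirror <code>lasso_res</code> and <code>debiased_res</code>, so the same lines could be applied to either.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical predicted probabilities and true classes, for illustration only
res &lt;- data.frame(
  predicted_prob_ALL = c(0.9, 0.8, 0.4, 0.2, 0.6, 0.1),
  class = factor(c("ALL", "ALL", "ALL", "AML", "AML", "AML"),
                 levels = c("AML", "ALL"))
)

# Classify as ALL when the predicted probability exceeds 0.5
predicted &lt;- factor(ifelse(res$predicted_prob_ALL &gt; 0.5, "ALL", "AML"),
                    levels = c("AML", "ALL"))

table(truth = res$class, predicted = predicted)  # confusion table
accuracy &lt;- mean(predicted == res$class)         # 4 of 6 correct here
accuracy</code></pre></div></div>
</div>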
<p>While you might say you don’t care about prediction, in a future post we will explore why you can’t afford not to.</p>
</section>
<section id="take-aways" class="level1">
<h1>Takeaways</h1>
<ul>
<li>Biased estimation can improve a model’s predictive performance</li>
<li>Reducing bias (via debiasing) often worsens predictions</li>
<li>The lasso and other penalized regression methods can yield better <a href="https://data-diction.com/posts/glassbox-models/">glass-box models</a></li>
</ul>
</section>
<section id="follow-up-questions" class="level1">
<h1>Follow-up Questions</h1>
<p>The following questions might be of interest for new posts:</p>
<ul>
<li>What implications does bias have on inference and model interpretation?</li>
<li>When might debiasing be a good idea?</li>
</ul>



</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-Golub1999" class="csl-entry">
Golub, Todd R., Donna K. Slonim, Pablo Tamayo, et al. 1999. <span>“Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring.”</span> <em>Science</em> 286 (5439): 531–37. <a href="https://doi.org/10.1126/science.286.5439.531">https://doi.org/10.1126/science.286.5439.531</a>.
</div>
<div id="ref-ZhangZhang2014" class="csl-entry">
Zhang, C. H., and S. S. Zhang. 2014. <span>“Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models.”</span> <em>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</em> 76 (1): 217–42.
</div>
</div></section></div> ]]></description>
  <category>lasso</category>
  <category>penalized regression</category>
  <category>model selection</category>
  <category>interpretability</category>
  <guid>https://www.data-diction.com/posts/bias-benefit/</guid>
  <pubDate>Mon, 15 Dec 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>What do we mean by glass-box, exactly?</title>
  <dc:creator>Ryan Peterson</dc:creator>
  <link>https://www.data-diction.com/posts/glassbox-models/</link>
  <description><![CDATA[ 





<div class="callout callout-style-default callout-note callout-empty-content callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Reviewed by Logan Harris on November 21, 2025
</div>
</div>
<div class="callout-body-container callout-body">

</div>
</div>
<section id="introduction" class="level1">
<h1>Introduction</h1>
<section id="what-are-glass-box-models" class="level2">
<h2 class="anchored" data-anchor-id="what-are-glass-box-models">What are glass-box models?</h2>
<p>Glass-box models are transparent, intrinsically interpretable alternatives to their opaque counterparts, black box models. Data scientists typically consider regression-based methods and sparse decision trees as “glass-box”. In this post, I describe the benefits of glass-box methods for modeling data, arguing for the importance of <em>intrinsic interpretability</em>. An intrinsically interpretable model is one whose internal logic is clear enough that a human can see why it makes each prediction, not merely trust a separate explainer.</p>
<p>I hope to convince you that the prevailing definition of glass-box models requires significant refinement in order to be true to its name.</p>
<p>Just because a model is a “simple regression” <strong>does not</strong> mean it is interpretable. In fact, model selection methods that attempt to create more interpretable models often invalidate inference by making p-values unjustifiably low and confidence intervals (CIs) unrealistically narrow.</p>
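<p>To see how selection can distort inference, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the sample size, the number of candidate predictors, and the rule “keep the predictor most correlated with the outcome”); it is not the analysis used later in this post. Every predictor is pure noise, so an honest test should reject about 5% of the time.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
n &lt;- 100       # observations per simulated data set
p &lt;- 50        # candidate predictors, all pure noise
n_sims &lt;- 500  # number of simulated data sets

# For each data set: keep the predictor most correlated with y, then
# compute its p-value as if that predictor had been chosen in advance.
p_selected &lt;- replicate(n_sims, {
  X &lt;- matrix(rnorm(n * p), n, p)
  y &lt;- rnorm(n)                       # unrelated to every column of X
  r &lt;- max(abs(cor(X, y)))            # strongest sample correlation
  t_stat &lt;- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(t_stat, df = n - 2, lower.tail = FALSE)
})

# Share of "significant" results at the 5% level after selection
mean(p_selected &lt; 0.05)</code></pre></div>
<p>With 50 independent noise predictors, the chance that at least one clears the 5% bar is about 1 − 0.95<sup>50</sup>, or roughly 92%, so the naive p-value of the selected predictor is far too optimistic.</p>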
</section>
<section id="are-glass-box-models-a-novelty-or-a-necessity" class="level2">
<h2 class="anchored" data-anchor-id="are-glass-box-models-a-novelty-or-a-necessity">Are glass-box models a novelty? Or a necessity?</h2>
<p>Consider the useful analogy in the image below, which compares two very different uses of glass as a window into a process. On the left, a penny press. I always considered these to be a fun novelty (I still don’t understand how they are legal!). The glass box is part of the appeal: you get to watch a penny turn into a souvenir. Cool!</p>
<p><img src="https://www.data-diction.com/posts/glassbox-models/thumbnail.png" class="img-fluid"></p>
<p>On the right, you see scientists at the Hanford Site in Washington peering through shield windows while working on the plutonium that would eventually be used in the world’s first atomic bomb explosion. One intact version of these shield windows used during the Manhattan Project is currently priced at 10 million dollars, although you can purchase a fragment of it for much less at <a href="https://shop.minimuseum.com/collections/manhattan-project-shield-window-glass?srsltid=AfmBOorJNhqFW5iN0z9sRdzW7xiFAIDY_s7co9zt41ZPnvypEH0V4imX">shop.minimuseum.com</a>.</p>
<p>The example of the Hanford Site clearly shows how, in high-stakes decisions or systems, <strong>opacity is dangerous</strong>. Sometimes, therefore, a glass-box model isn’t a novelty like the penny press; it’s a requirement for oversight and correction.</p>
<p>If the AI doomers are right that AI represents an existential threat on par with nuclear war, it is easy to see that glass-box models should be (at least in many cases) classified as a <em>necessity</em>, not a novelty.</p>
</section>
</section>
<section id="why-do-we-build-models-in-the-first-place" class="level1">
<h1>Why do we build models in the first place?</h1>
<p>Let’s start with a quote you have probably heard before from George Box:</p>
<blockquote class="blockquote">
<p>All models are wrong, but some are useful.</p>
</blockquote>
<p>What makes them useful? Good models help us to:</p>
<ol type="1">
<li>make good predictions</li>
<li>understand phenomena.</li>
</ol>
<p>The mix of these two goals is entirely context-dependent. In my experience, it is rare for the focus to be entirely on one of these goals.</p>
<section id="a-brief-history-of-model-selection" class="level2">
<h2 class="anchored" data-anchor-id="a-brief-history-of-model-selection">A brief history of model selection</h2>
<section id="hypothesis-testing" class="level3">
<h3 class="anchored" data-anchor-id="hypothesis-testing">Hypothesis testing</h3>
<p>From the 1700s through the 1980s, hypothesis testing &amp; p-values were the main way to build models. In the typical setup, two competing models are compared (e.g., a null and an alternative), and evidence against the null model is summarized via a p-value.</p>
</section>
<section id="information-criteria" class="level3">
<h3 class="anchored" data-anchor-id="information-criteria">Information criteria</h3>
<p>In the 1970s, Akaike changed the game with his famous information criterion, AIC, and the modeling goal substantively changed from one of <em>testing</em> to one of <em>optimization</em>. Instead of model A vs model B, we now had a whole set of models that we could compare at once to find the best.</p>
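<p>As a rough sketch of what “comparing a set of models at once” looks like in practice, the snippet below ranks a few candidate regressions by AIC. The built-in <code>mtcars</code> data and the candidate formulas are stand-ins chosen purely for illustration; they are not part of this post’s analysis.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Candidate models for fuel efficiency in the built-in mtcars data
fits &lt;- list(
  null      = lm(mpg ~ 1, data = mtcars),
  weight    = lm(mpg ~ wt, data = mtcars),
  weight_hp = lm(mpg ~ wt + hp, data = mtcars),
  full      = lm(mpg ~ ., data = mtcars)
)

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
# Lower values indicate a better trade-off between fit and complexity.
aic &lt;- sapply(fits, AIC)

sort(aic)               # all candidates, ranked from best to worst
names(which.min(aic))   # the AIC-preferred model</code></pre></div>
<p>No pairwise test is involved: every candidate gets a score, and the scores are compared directly.</p>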
</section>
<section id="computationally-intensive-validation" class="level3">
<h3 class="anchored" data-anchor-id="computationally-intensive-validation">Computationally-intensive validation</h3>
<p>More recently, however, “computationally-intensive” validation <em>opened Pandora’s box</em>. In this era, data scientists are limited only by their imagination. Any model can be compared against any other model and fed through an optimization pipeline that uses computationally-intensive validation to ensure that bad models (that is, models that predict poorly) get sifted out and never see the light of day.</p>
<p>Now, so-called black box modeling approaches, where the desire to understand phenomena is completely defenestrated in pursuit of better predictions, are ubiquitous.</p>
<p>I suppose that’s what Pandora deserved for having an opaque box to begin with.</p>
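<p>For readers who have not seen such a pipeline, here is a minimal sketch of k-fold cross-validation in base R. The data set (<code>mtcars</code>), the three candidate formulas, and the helper name <code>cv_rmse</code> are all hypothetical choices for illustration, not anything used elsewhere in this post.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
k &lt;- 5
folds &lt;- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels

# Candidate model formulas (illustrative only)
candidates &lt;- list(
  weight    = mpg ~ wt,
  weight_hp = mpg ~ wt + hp,
  cubic     = mpg ~ poly(wt, 3) + poly(hp, 3)
)

# Out-of-fold root mean squared error for one formula
cv_rmse &lt;- function(formula) {
  sq_err &lt;- unlist(lapply(1:k, function(i) {
    fit  &lt;- lm(formula, data = mtcars[folds != i, ])       # train on k - 1 folds
    pred &lt;- predict(fit, newdata = mtcars[folds == i, ])   # predict the held-out fold
    (mtcars$mpg[folds == i] - pred)^2
  }))
  sqrt(mean(sq_err))
}

# Models with a high out-of-fold error are the ones that get sifted out
sapply(candidates, cv_rmse)</code></pre></div>
<p>Nothing in this loop asks whether a model can be understood; it only asks how well it predicts data it has not seen.</p>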
</section>
<section id="now-what" class="level3">
<h3 class="anchored" data-anchor-id="now-what">Now what?</h3>
<p>In light of these advancements, “scientific” model sifting via interdisciplinary expertise remains important, and increasingly so. Burnham and Anderson suggest that scientists should build a small set of models to clearly and uniquely represent their hypotheses <em>a priori</em>:</p>
<blockquote class="blockquote">
<p>“…it seems poor practice to consider all possible models; surely some science can be brought to bear on such an unthinking approach (otherwise, the scientist is superfluous)”</p>
</blockquote>
<p>So in the present era (with potentially huge data sets and lots of features), how can we use domain expertise efficiently?</p>
<p>This is an important question I’ve devoted a lot of effort to, but it is not the topic of today’s post.</p>
</section>
</section>
</section>
<section id="on-black-glass-boxes" class="level1">
<h1>On black &amp; glass boxes</h1>
<p>Black box machine learning (ML) methods are not designed with interpretability in mind. Black box models:</p>
<ul>
<li>include random forests, ensembles/super learners, neural networks, XGBoost, etc.</li>
<li>can capture nonlinearities and/or high-order interactions well.</li>
<li>may have predictions that are extrinsically explainable. This differs from intrinsic interpretability, however.</li>
</ul>
<p>The term <strong>glass-box models</strong> arose to contrast black box models with traditional statistical models.</p>
<p>Generally, the following are considered glass-box:</p>
<ul>
<li>Regression (linear, logistic, etc.)</li>
<li>Penalized regression (if high-dimensional)</li>
<li>Interpretable decision trees</li>
</ul>
<p>Regression methods allow us to estimate how an outcome <img src="https://latex.codecogs.com/png.latex?Y"> changes with, or is impacted by, a covariate <img src="https://latex.codecogs.com/png.latex?X">, holding other covariates constant. “<em>Holding confounders constant</em>” is, in my view, a statistical sleight of hand that is often very poorly understood and misapplied. At its best though, the language and techniques of regression allow us to get closer to a causal interpretation of <img src="https://latex.codecogs.com/png.latex?X%20%5Crightarrow%20Y">. Modern causal inference methods formalize this and can yield additional insights.</p>
<p>In traditional statistical models, we not only describe these model-based associations, we seek inferences about them (often with p-values and CIs).</p>
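<p>Here is a small simulated sketch of what “holding other covariates constant” buys us, along with the inference that comes with it. The data-generating values (a true effect of 1 for <code>x</code>, a confounder <code>z</code> with effect 2) are assumptions I made up for this illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
n &lt;- 5000
z &lt;- rnorm(n)                   # confounder
x &lt;- 0.8 * z + rnorm(n)         # covariate of interest, partly driven by z
y &lt;- 1 * x + 2 * z + rnorm(n)   # true effect of x on y is 1 at any fixed z

fit_unadj &lt;- lm(y ~ x)          # ignores z
fit_adj   &lt;- lm(y ~ x + z)      # holds z constant

coef(fit_unadj)["x"]            # mixes the effect of x with that of z
coef(fit_adj)["x"]              # change in y per unit change in x at a fixed z

# Inference for the adjusted association: 95% CI and p-value
confint(fit_adj)["x", ]
summary(fit_adj)$coefficients["x", "Pr(&gt;|t|)"]</code></pre></div>
<p>The unadjusted slope lands near 2 rather than 1 because <code>x</code> carries information about <code>z</code>; adjusting recovers the effect only because the confounder was measured and modeled correctly, which is exactly where the sleight of hand can go wrong.</p>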
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Confidence intervals and p-values are hard to obtain in black box ML settings.</p>
</div>
</div>
</section>
<section id="not-all-regression-models-are-glass-box." class="level1">
<h1>Not all regression models are glass-box.</h1>
<p>If you disagree, please keep reading.</p>
<section id="a-refined-glass-box-model-definition" class="level2">
<h2 class="anchored" data-anchor-id="a-refined-glass-box-model-definition">A refined glass-box model definition</h2>
<p>Here’s my suggestion for a refined definition of “glass-box model”.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Glass-box model:
</div>
</div>
<div class="callout-body-container callout-body">
<p>A statistical model expressed in terms of a linear combination of a parsimonious set of meaningful parameters with quantified uncertainty. The <em>best</em> glass-box models are transparent, small, quantify parameter uncertainty honestly, and still predict well.</p>
</div>
</div>
<p><strong>Transparency</strong> is reduced as more features are added, especially features that render models difficult to interpret (like interactions), or&nbsp;those involving complex transformations. This definition of transparency resembles that for typical applications of Occam’s Razor in model selection, where the number of parameters in the model translates directly to its simplicity, except that we consider some parameters (coefficients on interactions, for&nbsp;instance) more complex than&nbsp;others (main effects).</p>
<p><strong>Uncertainty quantification</strong> is also a key component of this definition. Inferential tools like p-values and CIs are a cornerstone of science, replicability, and transparency. Models without such measures are limited to description, and may therefore have poor generalizability.</p>
<p><strong>Linearity</strong>: While the definition contains the word “linear,” its language is careful to encompass <em>generalized linear models</em> in addition to linear regression.</p>
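<p>As a quick sketch of why a generalized linear model still fits the definition, consider a logistic regression: the model is a linear combination of parameters on the log-odds scale, and each parameter comes with an uncertainty estimate. The <code>mtcars</code> data (transmission type <code>am</code> versus weight <code>wt</code>, in 1000 lbs) is only a convenient built-in stand-in here.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Logistic regression: log-odds of a manual transmission as a linear function of weight
fit &lt;- glm(am ~ wt, data = mtcars, family = binomial)

coef(fit)                               # intercept and slope on the log-odds scale
exp(coef(fit)["wt"])                    # odds ratio per additional 1000 lbs
summary(fit)$coefficients[, "Std. Error"]  # quantified parameter uncertainty</code></pre></div>
<p>The link function changes the scale on which the model is linear, not the fact that it is a small set of interpretable parameters.</p>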
<p>Under&nbsp;this definition, transparency (conversely, opacity) is a spectrum:</p>
<ul>
<li>The most transparent model is the “null” model</li>
<li>Single-predictor models, often used to describe “unadjusted” relationships, might be labeled as the next most transparent.*</li>
<li>On the other end might be large language models with billions of interconnected parameters.</li>
</ul>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-4-contents" aria-controls="callout-4" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>*A brief aside.
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-4" class="callout-4-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>My college friend Tom smoked cigarettes, despite being “pre-med”. I asked him why he smoked given the health consequences. He said <em>“It’s not bad for me until I’ve smoked 20 pack-years worth! That’s what the literature says!”</em></p>
<p>In fact, this 20+ pack-year threshold is all over the literature and official screening guidelines. Frank Harrell refers to this as <em>dichotomania</em>. It illustrates the appeal of a glass-box model in the most negative possible light, but an appeal nonetheless.</p>
</div>
</div>
</div>
<p>Let’s go through a simple example to illustrate.</p>
</section>
</section>
<section id="example-hers-dataset" class="level1">
<h1>Example: HERS Dataset</h1>
<section id="the-data" class="level2">
<h2 class="anchored" data-anchor-id="the-data">The data</h2>
<p>The Heart and Estrogen/progestin Replacement Study (HERS) was a clinical trial of hormone therapy for prevention of recurrent heart attacks and death among post-menopausal women with existing coronary heart disease.</p>
<p>This data set contains 27 baseline features for n=2571 patients, and complete 1-year follow-up cholesterol data.</p>
<p>Baseline covariates include age, baseline cholesterol, clinical characteristics, treatment, patient-reported health/activity, diabetes status, blood biomarkers, etc.</p>
<p>Let’s take a fresh look at the HERS data to build a glass-box model for HDL cholesterol 1-year post-baseline. We might want to do this for several reasons, for instance, if we want to plan a trial to consider whether a particular intervention will impact cholesterol. We may want to pre-specify our statistical analysis and specify what model will be used to test our hypothesis. Specifically, the question might be posed: what patient factors should our model for cholesterol include as “precision” variables?</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>This setting is “easy”…
</div>
</div>
<div class="callout-body-container callout-body">
<p>We have <img src="https://latex.codecogs.com/png.latex?n=2571%20%3E%3E%20p=%2032">, a relatively Gaussian response, and a small set of interpretable features. However, we’ll see that even in this “easy” setting, building a glass-box model (under the refined definition) is not trivial.</p>
</div>
</div>
</section>
<section id="describing-the-data" class="level2">
<h2 class="anchored" data-anchor-id="describing-the-data">Describing the data</h2>
<p>Here’s our outcome:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gtsummary)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(kableExtra)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(recipes)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(selectInferToolkit) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#github.com/petersonR/selectInferToolkit</span></span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"hers"</span>)</span>
<span id="cb1-9"></span>
<span id="cb1-10">hers <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> hdl1)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb1-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HDL (1 year ahead)"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/glassbox-models/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">hers <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>hdl1) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl_summary</span>() </span></code></pre></div></div>
<div class="cell-output-display">
<div id="aiondlliuw" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#aiondlliuw table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#aiondlliuw thead, #aiondlliuw tbody, #aiondlliuw tfoot, #aiondlliuw tr, #aiondlliuw td, #aiondlliuw th {
  border-style: none;
}

#aiondlliuw p {
  margin: 0;
  padding: 0;
}

#aiondlliuw .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#aiondlliuw .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#aiondlliuw .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#aiondlliuw .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#aiondlliuw .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#aiondlliuw .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#aiondlliuw .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#aiondlliuw .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#aiondlliuw .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#aiondlliuw .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#aiondlliuw .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#aiondlliuw .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#aiondlliuw .gt_spanner_row {
  border-bottom-style: hidden;
}

#aiondlliuw .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#aiondlliuw .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#aiondlliuw .gt_from_md > :first-child {
  margin-top: 0;
}

#aiondlliuw .gt_from_md > :last-child {
  margin-bottom: 0;
}

#aiondlliuw .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#aiondlliuw .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#aiondlliuw .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#aiondlliuw .gt_row_group_first td {
  border-top-width: 2px;
}

#aiondlliuw .gt_row_group_first th {
  border-top-width: 2px;
}

#aiondlliuw .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#aiondlliuw .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#aiondlliuw .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#aiondlliuw .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#aiondlliuw .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#aiondlliuw .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#aiondlliuw .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#aiondlliuw .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#aiondlliuw .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#aiondlliuw .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#aiondlliuw .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#aiondlliuw .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#aiondlliuw .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#aiondlliuw .gt_left {
  text-align: left;
}

#aiondlliuw .gt_center {
  text-align: center;
}

#aiondlliuw .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#aiondlliuw .gt_font_normal {
  font-weight: normal;
}

#aiondlliuw .gt_font_bold {
  font-weight: bold;
}

#aiondlliuw .gt_font_italic {
  font-style: italic;
}

#aiondlliuw .gt_super {
  font-size: 65%;
}

#aiondlliuw .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#aiondlliuw .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#aiondlliuw .gt_indent_1 {
  text-indent: 5px;
}

#aiondlliuw .gt_indent_2 {
  text-indent: 10px;
}

#aiondlliuw .gt_indent_3 {
  text-indent: 15px;
}

#aiondlliuw .gt_indent_4 {
  text-indent: 20px;
}

#aiondlliuw .gt_indent_5 {
  text-indent: 25px;
}

#aiondlliuw .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#aiondlliuw div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="label" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col"><strong>Characteristic</strong></th>
<th id="stat_0" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>N = 2,571</strong><span class="gt_footnote_marks" style="white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;"><sup>1</sup></span></th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="label">age</td>
<td class="gt_row gt_center" headers="stat_0">67 (62, 72)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">weight</td>
<td class="gt_row gt_center" headers="stat_0">71 (62, 81)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">bmi</td>
<td class="gt_row gt_center" headers="stat_0">27.8 (24.6, 31.7)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">waist</td>
<td class="gt_row gt_center" headers="stat_0">91 (82, 100)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">whr</td>
<td class="gt_row gt_center" headers="stat_0">0.87 (0.81, 0.92)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">glucose</td>
<td class="gt_row gt_center" headers="stat_0">99 (91, 114)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">ldl</td>
<td class="gt_row gt_center" headers="stat_0">141 (119, 166)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">hdl</td>
<td class="gt_row gt_center" headers="stat_0">49 (41, 57)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">tg</td>
<td class="gt_row gt_center" headers="stat_0">157 (116, 208)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">sbp</td>
<td class="gt_row gt_center" headers="stat_0">134 (121, 146)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">dbp</td>
<td class="gt_row gt_center" headers="stat_0">72 (67, 80)</td>
</tr>
</tbody><tfoot class="gt_footnotes">
<tr class="odd">
<td colspan="2" class="gt_footnote"><span class="gt_footnote_marks" style="white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;"><sup>1</sup></span> Median (Q1, Q3)</td>
</tr>
</tfoot>

</table>

</div>
</div>
</div>
<p>And our predictors:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">hers <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.factor)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl_summary</span>() </span></code></pre></div></div>
<div class="cell-output-display">
<div id="fuggkgcubm" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#fuggkgcubm table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#fuggkgcubm thead, #fuggkgcubm tbody, #fuggkgcubm tfoot, #fuggkgcubm tr, #fuggkgcubm td, #fuggkgcubm th {
  border-style: none;
}

#fuggkgcubm p {
  margin: 0;
  padding: 0;
}

#fuggkgcubm .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#fuggkgcubm .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#fuggkgcubm .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#fuggkgcubm .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#fuggkgcubm .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#fuggkgcubm .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#fuggkgcubm .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#fuggkgcubm .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#fuggkgcubm .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#fuggkgcubm .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#fuggkgcubm .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#fuggkgcubm .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#fuggkgcubm .gt_spanner_row {
  border-bottom-style: hidden;
}

#fuggkgcubm .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#fuggkgcubm .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#fuggkgcubm .gt_from_md > :first-child {
  margin-top: 0;
}

#fuggkgcubm .gt_from_md > :last-child {
  margin-bottom: 0;
}

#fuggkgcubm .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#fuggkgcubm .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#fuggkgcubm .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#fuggkgcubm .gt_row_group_first td {
  border-top-width: 2px;
}

#fuggkgcubm .gt_row_group_first th {
  border-top-width: 2px;
}

#fuggkgcubm .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#fuggkgcubm .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#fuggkgcubm .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#fuggkgcubm .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#fuggkgcubm .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#fuggkgcubm .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#fuggkgcubm .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#fuggkgcubm .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#fuggkgcubm .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#fuggkgcubm .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#fuggkgcubm .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#fuggkgcubm .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#fuggkgcubm .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#fuggkgcubm .gt_left {
  text-align: left;
}

#fuggkgcubm .gt_center {
  text-align: center;
}

#fuggkgcubm .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#fuggkgcubm .gt_font_normal {
  font-weight: normal;
}

#fuggkgcubm .gt_font_bold {
  font-weight: bold;
}

#fuggkgcubm .gt_font_italic {
  font-style: italic;
}

#fuggkgcubm .gt_super {
  font-size: 65%;
}

#fuggkgcubm .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#fuggkgcubm .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#fuggkgcubm .gt_indent_1 {
  text-indent: 5px;
}

#fuggkgcubm .gt_indent_2 {
  text-indent: 10px;
}

#fuggkgcubm .gt_indent_3 {
  text-indent: 15px;
}

#fuggkgcubm .gt_indent_4 {
  text-indent: 20px;
}

#fuggkgcubm .gt_indent_5 {
  text-indent: 25px;
}

#fuggkgcubm .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#fuggkgcubm div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<colgroup>
<col style="width: 50%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="gt_col_headings header">
<th id="label" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col"><strong>Characteristic</strong></th>
<th id="stat_0" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>N = 2,571</strong><span class="gt_footnote_marks" style="white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;"><sup>1</sup></span></th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="label">ht</td>
<td class="gt_row gt_center" headers="stat_0"><br>
</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;placebo</td>
<td class="gt_row gt_center" headers="stat_0">1,303 (51%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;hormone therapy</td>
<td class="gt_row gt_center" headers="stat_0">1,268 (49%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">raceth</td>
<td class="gt_row gt_center" headers="stat_0"><br>
</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;White</td>
<td class="gt_row gt_center" headers="stat_0">2,299 (89%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;African American</td>
<td class="gt_row gt_center" headers="stat_0">184 (7.2%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;Other</td>
<td class="gt_row gt_center" headers="stat_0">88 (3.4%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">smoking</td>
<td class="gt_row gt_center" headers="stat_0">328 (13%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">drinkany</td>
<td class="gt_row gt_center" headers="stat_0">1,020 (40%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">exercise</td>
<td class="gt_row gt_center" headers="stat_0">1,012 (39%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">physact</td>
<td class="gt_row gt_center" headers="stat_0"><br>
</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;much less active</td>
<td class="gt_row gt_center" headers="stat_0">170 (6.6%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;somewhat less active</td>
<td class="gt_row gt_center" headers="stat_0">459 (18%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;about as active</td>
<td class="gt_row gt_center" headers="stat_0">854 (33%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;somewhat more active</td>
<td class="gt_row gt_center" headers="stat_0">794 (31%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;much more active</td>
<td class="gt_row gt_center" headers="stat_0">294 (11%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">globrat</td>
<td class="gt_row gt_center" headers="stat_0"><br>
</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;poor</td>
<td class="gt_row gt_center" headers="stat_0">46 (1.8%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;fair</td>
<td class="gt_row gt_center" headers="stat_0">536 (21%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;good</td>
<td class="gt_row gt_center" headers="stat_0">1,229 (48%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;very good</td>
<td class="gt_row gt_center" headers="stat_0">648 (25%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">&nbsp;&nbsp;&nbsp;&nbsp;excellent</td>
<td class="gt_row gt_center" headers="stat_0">112 (4.4%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">medcond</td>
<td class="gt_row gt_center" headers="stat_0">947 (37%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">htnmeds</td>
<td class="gt_row gt_center" headers="stat_0">2,107 (82%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">statins</td>
<td class="gt_row gt_center" headers="stat_0">951 (37%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">diabetes</td>
<td class="gt_row gt_center" headers="stat_0">662 (26%)</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">dmpills</td>
<td class="gt_row gt_center" headers="stat_0">246 (9.6%)</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">insulin</td>
<td class="gt_row gt_center" headers="stat_0">244 (9.5%)</td>
</tr>
</tbody><tfoot class="gt_footnotes">
<tr class="odd">
<td colspan="2" class="gt_footnote"><span class="gt_footnote_marks" style="white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;"><sup>1</sup></span> n (%)</td>
</tr>
</tfoot>

</table>

</div>
</div>
</div>
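<p>Let’s clean the data up a bit for modeling. The steps below first set the reference levels for <code>physact</code> and <code>globrat</code>, then use the <code>recipes</code> package to create indicator columns for each of the factor variables and to standardize the data (predictors are centered, and every numeric column, including the outcome <code>hdl1</code>, is scaled).</p>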
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">hers <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> hers <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">physact =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">relevel</span>(physact, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"about as active"</span>),</span>
<span id="cb4-3">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">globrat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">relevel</span>(globrat, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"good"</span>))</span>
<span id="cb4-4"></span>
<span id="cb4-5">rec_obj <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(hdl1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> hers)</span>
<span id="cb4-6"></span>
<span id="cb4-7">rec_obj <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rec_obj <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_dummy</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_nominal</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">keep_original_cols =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_center</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_scale</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prep</span>()</span>
<span id="cb4-12"></span>
<span id="cb4-13">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bake</span>(rec_obj, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> hers)</span></code></pre></div></div>
</div>
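<p>To see what these steps do, here is a minimal sketch on a small made-up data frame (<code>toy</code>, with columns <code>y</code>, <code>x</code>, and <code>grp</code>, is hypothetical and not part of the HERS data). It applies the same three recipe steps and then checks the result. Note that <code>step_scale(all_numeric())</code> also rescales the outcome, so downstream coefficients are on a standardized scale.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(recipes)  # attaches dplyr, which provides %&gt;%

# Hypothetical toy data: numeric outcome, one numeric and one factor predictor
toy &lt;- data.frame(
  y   = c(50, 42, 61, 55, 47, 58),
  x   = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.9),
  grp = factor(c("a", "b", "c", "a", "b", "c"))
)

toy_rec &lt;- recipe(y ~ ., data = toy) %&gt;%
  step_dummy(all_nominal(), keep_original_cols = FALSE) %&gt;%  # grp becomes grp_b, grp_c
  step_center(all_predictors()) %&gt;%                          # predictors get mean 0
  step_scale(all_numeric()) %&gt;%                              # every numeric column gets SD 1
  prep()

toy_baked &lt;- bake(toy_rec, new_data = toy)

names(toy_baked)                # indicator columns replace the factor
round(colMeans(toy_baked), 10)  # predictors are centered; y is not
sapply(toy_baked, sd)           # all columns, including y, have SD 1</code></pre></div>
</div>
<p>Running the same checks on <code>df</code> is a quick way to confirm the real recipe behaved as intended.</p>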
</section>
<section id="unadjusted-models" class="level2">
<h2 class="anchored" data-anchor-id="unadjusted-models">Unadjusted models</h2>
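<p>We can look at the <em>unadjusted associations</em> using <code>gtsummary</code>’s <code>tbl_uvregression()</code>, which fits a separate single-predictor linear model of <code>hdl1</code> on each variable. Recall that these are close to the most transparent, interpretable models we have, but they probably don’t predict well.</p>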
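<p>Before running it, here is a minimal base-R sketch of what an unadjusted model is mechanically: one <code>lm()</code> per predictor, each ignoring all the others. The data frame <code>toy</code> and its columns <code>x1</code>, <code>x2</code>, and <code>y</code> are simulated for illustration only. This is not how <code>gtsummary</code> is implemented internally, just the same idea written out by hand.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical simulated data: y depends on x1 but not on x2
set.seed(1)
toy &lt;- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$y &lt;- 0.5 * toy$x1 + rnorm(50)

predictors &lt;- setdiff(names(toy), "y")

# Fit y ~ predictor separately for each predictor and keep its slope row
unadjusted &lt;- do.call(rbind, lapply(predictors, function(p) {
  fit &lt;- lm(reformulate(p, response = "y"), data = toy)
  ci  &lt;- confint(fit)[p, ]
  data.frame(
    term      = p,
    beta      = unname(coef(fit)[p]),
    conf.low  = ci[[1]],
    conf.high = ci[[2]],
    p.value   = summary(fit)$coefficients[p, "Pr(&gt;|t|)"]
  )
}))

unadjusted</code></pre></div>
</div>
<p>Each row comes from its own model, so the estimates are not adjusted for one another.</p>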
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl_uvregression</span>(df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> hdl1)</span></code></pre></div></div>
<div class="cell-output-display">
<div id="lgkrtpudfy" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#lgkrtpudfy table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#lgkrtpudfy thead, #lgkrtpudfy tbody, #lgkrtpudfy tfoot, #lgkrtpudfy tr, #lgkrtpudfy td, #lgkrtpudfy th {
  border-style: none;
}

#lgkrtpudfy p {
  margin: 0;
  padding: 0;
}

#lgkrtpudfy .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#lgkrtpudfy .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#lgkrtpudfy .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#lgkrtpudfy .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#lgkrtpudfy .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#lgkrtpudfy .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#lgkrtpudfy .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#lgkrtpudfy .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#lgkrtpudfy .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#lgkrtpudfy .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#lgkrtpudfy .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#lgkrtpudfy .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#lgkrtpudfy .gt_spanner_row {
  border-bottom-style: hidden;
}

#lgkrtpudfy .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#lgkrtpudfy .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#lgkrtpudfy .gt_from_md > :first-child {
  margin-top: 0;
}

#lgkrtpudfy .gt_from_md > :last-child {
  margin-bottom: 0;
}

#lgkrtpudfy .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#lgkrtpudfy .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#lgkrtpudfy .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#lgkrtpudfy .gt_row_group_first td {
  border-top-width: 2px;
}

#lgkrtpudfy .gt_row_group_first th {
  border-top-width: 2px;
}

#lgkrtpudfy .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#lgkrtpudfy .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#lgkrtpudfy .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#lgkrtpudfy .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#lgkrtpudfy .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#lgkrtpudfy .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#lgkrtpudfy .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#lgkrtpudfy .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#lgkrtpudfy .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#lgkrtpudfy .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#lgkrtpudfy .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#lgkrtpudfy .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#lgkrtpudfy .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#lgkrtpudfy .gt_left {
  text-align: left;
}

#lgkrtpudfy .gt_center {
  text-align: center;
}

#lgkrtpudfy .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#lgkrtpudfy .gt_font_normal {
  font-weight: normal;
}

#lgkrtpudfy .gt_font_bold {
  font-weight: bold;
}

#lgkrtpudfy .gt_font_italic {
  font-style: italic;
}

#lgkrtpudfy .gt_super {
  font-size: 65%;
}

#lgkrtpudfy .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#lgkrtpudfy .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#lgkrtpudfy .gt_indent_1 {
  text-indent: 5px;
}

#lgkrtpudfy .gt_indent_2 {
  text-indent: 10px;
}

#lgkrtpudfy .gt_indent_3 {
  text-indent: 15px;
}

#lgkrtpudfy .gt_indent_4 {
  text-indent: 20px;
}

#lgkrtpudfy .gt_indent_5 {
  text-indent: 25px;
}

#lgkrtpudfy .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#lgkrtpudfy div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="label" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col"><strong>Characteristic</strong></th>
<th id="stat_n" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>N</strong></th>
<th id="estimate" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>Beta</strong></th>
<th id="conf.low" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>95% CI</strong></th>
<th id="p.value" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>p-value</strong></th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="label">age</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.11</td>
<td class="gt_row gt_center" headers="conf.low">0.08, 0.15</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">weight</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.19</td>
<td class="gt_row gt_center" headers="conf.low">-0.23, -0.15</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">bmi</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.19</td>
<td class="gt_row gt_center" headers="conf.low">-0.23, -0.15</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">waist</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.23</td>
<td class="gt_row gt_center" headers="conf.low">-0.26, -0.19</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">whr</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.20</td>
<td class="gt_row gt_center" headers="conf.low">-0.24, -0.16</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">glucose</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.15</td>
<td class="gt_row gt_center" headers="conf.low">-0.19, -0.11</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">ldl</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.03</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">hdl</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.73</td>
<td class="gt_row gt_center" headers="conf.low">0.70, 0.76</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">tg</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.36</td>
<td class="gt_row gt_center" headers="conf.low">-0.40, -0.33</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">sbp</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.00</td>
<td class="gt_row gt_center" headers="conf.low">-0.03, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.8</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">dbp</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.06</td>
<td class="gt_row gt_center" headers="p.value">0.2</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">ht_hormone.therapy</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.17</td>
<td class="gt_row gt_center" headers="conf.low">0.13, 0.21</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">raceth_African.American</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.04</td>
<td class="gt_row gt_center" headers="conf.low">0.00, 0.08</td>
<td class="gt_row gt_center" headers="p.value">0.061</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">raceth_Other</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.04</td>
<td class="gt_row gt_center" headers="conf.low">-0.08, 0.00</td>
<td class="gt_row gt_center" headers="p.value">0.056</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">smoking_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.04</td>
<td class="gt_row gt_center" headers="conf.low">-0.08, 0.00</td>
<td class="gt_row gt_center" headers="p.value">0.057</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">drinkany_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.14</td>
<td class="gt_row gt_center" headers="conf.low">0.10, 0.18</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">exercise_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.05</td>
<td class="gt_row gt_center" headers="conf.low">0.02, 0.09</td>
<td class="gt_row gt_center" headers="p.value">0.006</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">physact_much.less.active</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.05</td>
<td class="gt_row gt_center" headers="conf.low">-0.09, -0.01</td>
<td class="gt_row gt_center" headers="p.value">0.008</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">physact_somewhat.less.active</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.06</td>
<td class="gt_row gt_center" headers="conf.low">-0.10, -0.02</td>
<td class="gt_row gt_center" headers="p.value">0.004</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">physact_somewhat.more.active</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.07</td>
<td class="gt_row gt_center" headers="conf.low">0.03, 0.11</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">physact_much.more.active</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.06</td>
<td class="gt_row gt_center" headers="p.value">0.2</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">globrat_poor</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.07, 0.01</td>
<td class="gt_row gt_center" headers="p.value">0.2</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">globrat_fair</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.07, 0.00</td>
<td class="gt_row gt_center" headers="p.value">0.079</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">globrat_very.good</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.03</td>
<td class="gt_row gt_center" headers="conf.low">0.00, 0.07</td>
<td class="gt_row gt_center" headers="p.value">0.087</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">globrat_excellent</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.06</td>
<td class="gt_row gt_center" headers="conf.low">0.02, 0.09</td>
<td class="gt_row gt_center" headers="p.value">0.005</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">medcond_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.03</td>
<td class="gt_row gt_center" headers="p.value">0.7</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">htnmeds_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.06</td>
<td class="gt_row gt_center" headers="conf.low">-0.10, -0.02</td>
<td class="gt_row gt_center" headers="p.value">0.004</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">statins_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.05</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">diabetes_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.16</td>
<td class="gt_row gt_center" headers="conf.low">-0.20, -0.13</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">dmpills_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.13</td>
<td class="gt_row gt_center" headers="conf.low">-0.17, -0.09</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">insulin_yes</td>
<td class="gt_row gt_center" headers="stat_n">2,571</td>
<td class="gt_row gt_center" headers="estimate">-0.08</td>
<td class="gt_row gt_center" headers="conf.low">-0.12, -0.05</td>
<td class="gt_row gt_center" headers="p.value">&lt;0.001</td>
</tr>
</tbody><tfoot class="gt_sourcenotes">
<tr class="odd">
<td colspan="5" class="gt_sourcenote">Abbreviation: CI = Confidence Interval</td>
</tr>
</tfoot>

</table>

</div>
</div>
</div>
<p>So… in these unadjusted relationships, most predictors (19 of 31) are significantly associated with HDL one year ahead… Are these helpful?</p>
</section>
<section id="a-kitchen-sink-approach" class="level2">
<h2 class="anchored" data-anchor-id="a-kitchen-sink-approach">A <strong>kitchen sink</strong> approach</h2>
<p>In settings like this one, where <img src="https://latex.codecogs.com/png.latex?n%20%3E%20p"> (more observations than predictors), and especially where <img src="https://latex.codecogs.com/png.latex?n%20%3E%2010p">, quite a few statisticians suggest that a <em>“full model approach”</em> is best for inference. We’ll tackle that debate in another post.</p>
<p>For now, let’s check out where this <em>kitchen sink</em> approach (as in, throw all predictors into the model except the kitchen sink) gets us:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(hdl1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> df)</span>
<span id="cb6-2"></span>
<span id="cb6-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl_regression</span>(fit) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bold_p</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div id="eycfftdzzx" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#eycfftdzzx table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#eycfftdzzx thead, #eycfftdzzx tbody, #eycfftdzzx tfoot, #eycfftdzzx tr, #eycfftdzzx td, #eycfftdzzx th {
  border-style: none;
}

#eycfftdzzx p {
  margin: 0;
  padding: 0;
}

#eycfftdzzx .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#eycfftdzzx .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#eycfftdzzx .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#eycfftdzzx .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#eycfftdzzx .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#eycfftdzzx .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#eycfftdzzx .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#eycfftdzzx .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#eycfftdzzx .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#eycfftdzzx .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#eycfftdzzx .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#eycfftdzzx .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#eycfftdzzx .gt_spanner_row {
  border-bottom-style: hidden;
}

#eycfftdzzx .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#eycfftdzzx .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#eycfftdzzx .gt_from_md > :first-child {
  margin-top: 0;
}

#eycfftdzzx .gt_from_md > :last-child {
  margin-bottom: 0;
}

#eycfftdzzx .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#eycfftdzzx .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#eycfftdzzx .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#eycfftdzzx .gt_row_group_first td {
  border-top-width: 2px;
}

#eycfftdzzx .gt_row_group_first th {
  border-top-width: 2px;
}

#eycfftdzzx .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#eycfftdzzx .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#eycfftdzzx .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#eycfftdzzx .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#eycfftdzzx .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#eycfftdzzx .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#eycfftdzzx .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#eycfftdzzx .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#eycfftdzzx .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#eycfftdzzx .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#eycfftdzzx .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#eycfftdzzx .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#eycfftdzzx .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#eycfftdzzx .gt_left {
  text-align: left;
}

#eycfftdzzx .gt_center {
  text-align: center;
}

#eycfftdzzx .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#eycfftdzzx .gt_font_normal {
  font-weight: normal;
}

#eycfftdzzx .gt_font_bold {
  font-weight: bold;
}

#eycfftdzzx .gt_font_italic {
  font-style: italic;
}

#eycfftdzzx .gt_super {
  font-size: 65%;
}

#eycfftdzzx .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#eycfftdzzx .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#eycfftdzzx .gt_indent_1 {
  text-indent: 5px;
}

#eycfftdzzx .gt_indent_2 {
  text-indent: 10px;
}

#eycfftdzzx .gt_indent_3 {
  text-indent: 15px;
}

#eycfftdzzx .gt_indent_4 {
  text-indent: 20px;
}

#eycfftdzzx .gt_indent_5 {
  text-indent: 25px;
}

#eycfftdzzx .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#eycfftdzzx div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="label" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col"><strong>Characteristic</strong></th>
<th id="estimate" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>Beta</strong></th>
<th id="conf.low" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>95% CI</strong></th>
<th id="p.value" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>p-value</strong></th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="label">age</td>
<td class="gt_row gt_center" headers="estimate">0.03</td>
<td class="gt_row gt_center" headers="conf.low">0.01, 0.06</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.018</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">weight</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.11, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.4</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">bmi</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.08, 0.06</td>
<td class="gt_row gt_center" headers="p.value">0.8</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">waist</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.08, 0.10</td>
<td class="gt_row gt_center" headers="p.value">0.9</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">whr</td>
<td class="gt_row gt_center" headers="estimate">-0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, 0.03</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">glucose</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.4</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">ldl</td>
<td class="gt_row gt_center" headers="estimate">0.00</td>
<td class="gt_row gt_center" headers="conf.low">-0.03, 0.03</td>
<td class="gt_row gt_center" headers="p.value">&gt;0.9</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">hdl</td>
<td class="gt_row gt_center" headers="estimate">0.69</td>
<td class="gt_row gt_center" headers="conf.low">0.66, 0.72</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">tg</td>
<td class="gt_row gt_center" headers="estimate">-0.05</td>
<td class="gt_row gt_center" headers="conf.low">-0.08, -0.03</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">sbp</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">dbp</td>
<td class="gt_row gt_center" headers="estimate">0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.05</td>
<td class="gt_row gt_center" headers="p.value">0.3</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">ht_hormone.therapy</td>
<td class="gt_row gt_center" headers="estimate">0.19</td>
<td class="gt_row gt_center" headers="conf.low">0.16, 0.21</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">raceth_African.American</td>
<td class="gt_row gt_center" headers="estimate">0.03</td>
<td class="gt_row gt_center" headers="conf.low">0.01, 0.06</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.016</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">raceth_Other</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.00</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.038</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">smoking_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">drinkany_yes</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.3</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">exercise_yes</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.4</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">physact_much.less.active</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.01</td>
<td class="gt_row gt_center" headers="p.value">0.3</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">physact_somewhat.less.active</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">physact_somewhat.more.active</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.6</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">physact_much.more.active</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, 0.00</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.025</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">globrat_poor</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.03, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.7</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">globrat_fair</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.03, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.7</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">globrat_very.good</td>
<td class="gt_row gt_center" headers="estimate">0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.3</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">globrat_excellent</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.03</td>
<td class="gt_row gt_center" headers="p.value">0.6</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">medcond_yes</td>
<td class="gt_row gt_center" headers="estimate">0.00</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.03</td>
<td class="gt_row gt_center" headers="p.value">0.8</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">htnmeds_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.00</td>
<td class="gt_row gt_center" headers="p.value">0.054</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">statins_yes</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">diabetes_yes</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.05</td>
<td class="gt_row gt_center" headers="p.value">0.7</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">dmpills_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, 0.01</td>
<td class="gt_row gt_center" headers="p.value">0.10</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">insulin_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, 0.01</td>
<td class="gt_row gt_center" headers="p.value">0.2</td>
</tr>
</tbody><tfoot class="gt_sourcenotes">
<tr class="odd">
<td colspan="4" class="gt_sourcenote">Abbreviation: CI = Confidence Interval</td>
</tr>
</tfoot>

</table>

</div>
</div>
</div>
<p>OK - we have a good starting point now. A few questions arise.</p>
<blockquote class="blockquote">
<p>Is this a good model?</p>
</blockquote>
<p>The predictive accuracy (<img src="https://latex.codecogs.com/png.latex?R%5E2">) is 0.58. Not bad, but actually not great considering a model containing <em>only</em> baseline HDL achieves an <img src="https://latex.codecogs.com/png.latex?R%5E2"> of 0.53.</p>
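<p>As a rough sketch of how those two numbers can be compared, assuming <code>df</code> and the kitchen sink <code>fit</code> from the cell above are still in the session (<code>fit_hdl_only</code> is just a name made up here for illustration):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Assumption: `df` and the kitchen sink `fit` come from the cell above.
# `fit_hdl_only` is an illustrative name, not part of the original analysis.
fit_hdl_only &lt;- lm(hdl1 ~ hdl, data = df)

# In-sample R-squared: full model vs. baseline HDL alone
summary(fit)$r.squared
summary(fit_hdl_only)$r.squared</code></pre></div>
</div>
<p>Going by the figures quoted above, the 30 additional terms buy only about 0.05 of extra <img src="https://latex.codecogs.com/png.latex?R%5E2"> over baseline HDL alone.</p>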
<blockquote class="blockquote">
<p>Is this a glass-box model?</p>
</blockquote>
<p>Well, we can quickly learn that age, baseline HDL, triglycerides, treatment group, race, and maybe physical activity are significant predictors of HDL one year out.</p>
<p>A follow-up question is whether the other “insignificant” variables are therefore unimportant. The answer is NO. In fact, we are missing something big… More on this soon.</p>
<p>Let’s look back at our new glass-box model definition:</p>
<blockquote class="blockquote">
<p>A statistical model expressed in terms of a linear combination of a parsimonious set of meaningful parameters with quantified uncertainty.</p>
</blockquote>
<p>How does our kitchen sink model stack up?</p>
<ul>
<li>Linear? ✅</li>
<li>Predicts well? 🤷</li>
<li>Parsimonious? ❌</li>
<li>Meaningful parameters? ❌*</li>
<li>Valid uncertainty? ✅</li>
</ul>
<p>So, while this checks some of the boxes, <strong>I would not consider this kitchen sink model a glass-box model</strong>. How can we make this a better glass box?</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-6-contents" aria-controls="callout-6" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>*The trouble with collinearity
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-6" class="callout-6-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>On the face of it, you might think it’s straightforward to interpret each parameter of the kitchen sink model. For instance, holding other variables constant, for every 1 SD increase in age, the expected HDL 1 year after baseline increases by 0.03 SDs. However, for other parameters, it’s not so easy.</p>
</div>
</div>
</div>
<p>BMI (body mass index), WHR (waist-to-hip ratio), weight, and waist circumference are highly collinear with each other. They all measure adiposity. In the kitchen sink model, none of them appeared “significant”. However, these parameters have little meaning in the full model. “The effect of BMI <em>holding WHR and waist circumference constant</em>” is, in fact, quite ridiculous to conceptualize, since these variables are so strongly related to one another.</p>
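<p>One way to check that overlap directly is sketched below. It assumes <code>df</code> from above, with the four adiposity measures stored as numeric columns under the names shown in the tables; <code>adiposity</code> and <code>vif_bmi</code> are illustrative names introduced here, and the variance inflation factor is computed by hand rather than with any particular package.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Assumption: `df` from above; weight, bmi, waist and whr are numeric columns.
adiposity &lt;- c("weight", "bmi", "waist", "whr")

# Pairwise Pearson correlations among the four adiposity measures
round(cor(df[, adiposity], use = "pairwise.complete.obs"), 2)

# Variance inflation factor for bmi: 1 / (1 - R^2) from regressing bmi
# on every other predictor (the outcome hdl1 is excluded)
vif_bmi &lt;- 1 / (1 - summary(lm(bmi ~ . - hdl1, data = df))$r.squared)
vif_bmi</code></pre></div>
</div>
<p>Correlations close to 1 and a large variance inflation factor would both suggest that the model has very little independent information with which to separate these effects.</p>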
<p>To see this, consider the model below, where we remove three of the four adiposity variables so that only waist circumference represents adiposity, as a “singular flagship”:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(hdl1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>. <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> bmi <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> whr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> weight, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> df)</span>
<span id="cb7-2"></span>
<span id="cb7-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl_regression</span>(fit) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bold_p</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div id="kcrkwrhuyo" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#kcrkwrhuyo table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#kcrkwrhuyo thead, #kcrkwrhuyo tbody, #kcrkwrhuyo tfoot, #kcrkwrhuyo tr, #kcrkwrhuyo td, #kcrkwrhuyo th {
  border-style: none;
}

#kcrkwrhuyo p {
  margin: 0;
  padding: 0;
}

#kcrkwrhuyo .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#kcrkwrhuyo .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#kcrkwrhuyo .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#kcrkwrhuyo .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#kcrkwrhuyo .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#kcrkwrhuyo .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kcrkwrhuyo .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#kcrkwrhuyo .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#kcrkwrhuyo .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#kcrkwrhuyo .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#kcrkwrhuyo .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#kcrkwrhuyo .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#kcrkwrhuyo .gt_spanner_row {
  border-bottom-style: hidden;
}

#kcrkwrhuyo .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#kcrkwrhuyo .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#kcrkwrhuyo .gt_from_md > :first-child {
  margin-top: 0;
}

#kcrkwrhuyo .gt_from_md > :last-child {
  margin-bottom: 0;
}

#kcrkwrhuyo .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#kcrkwrhuyo .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#kcrkwrhuyo .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#kcrkwrhuyo .gt_row_group_first td {
  border-top-width: 2px;
}

#kcrkwrhuyo .gt_row_group_first th {
  border-top-width: 2px;
}

#kcrkwrhuyo .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#kcrkwrhuyo .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#kcrkwrhuyo .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#kcrkwrhuyo .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kcrkwrhuyo .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#kcrkwrhuyo .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#kcrkwrhuyo .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#kcrkwrhuyo .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#kcrkwrhuyo .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kcrkwrhuyo .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#kcrkwrhuyo .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#kcrkwrhuyo .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#kcrkwrhuyo .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#kcrkwrhuyo .gt_left {
  text-align: left;
}

#kcrkwrhuyo .gt_center {
  text-align: center;
}

#kcrkwrhuyo .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#kcrkwrhuyo .gt_font_normal {
  font-weight: normal;
}

#kcrkwrhuyo .gt_font_bold {
  font-weight: bold;
}

#kcrkwrhuyo .gt_font_italic {
  font-style: italic;
}

#kcrkwrhuyo .gt_super {
  font-size: 65%;
}

#kcrkwrhuyo .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#kcrkwrhuyo .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#kcrkwrhuyo .gt_indent_1 {
  text-indent: 5px;
}

#kcrkwrhuyo .gt_indent_2 {
  text-indent: 10px;
}

#kcrkwrhuyo .gt_indent_3 {
  text-indent: 15px;
}

#kcrkwrhuyo .gt_indent_4 {
  text-indent: 20px;
}

#kcrkwrhuyo .gt_indent_5 {
  text-indent: 25px;
}

#kcrkwrhuyo .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#kcrkwrhuyo div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="label" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col"><strong>Characteristic</strong></th>
<th id="estimate" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>Beta</strong></th>
<th id="conf.low" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>95% CI</strong></th>
<th id="p.value" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>p-value</strong></th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="label">age</td>
<td class="gt_row gt_center" headers="estimate">0.04</td>
<td class="gt_row gt_center" headers="conf.low">0.01, 0.07</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.009</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">waist</td>
<td class="gt_row gt_center" headers="estimate">-0.04</td>
<td class="gt_row gt_center" headers="conf.low">-0.07, -0.01</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.009</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">glucose</td>
<td class="gt_row gt_center" headers="estimate">-0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.4</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">ldl</td>
<td class="gt_row gt_center" headers="estimate">0.00</td>
<td class="gt_row gt_center" headers="conf.low">-0.03, 0.02</td>
<td class="gt_row gt_center" headers="p.value">&gt;0.9</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">hdl</td>
<td class="gt_row gt_center" headers="estimate">0.69</td>
<td class="gt_row gt_center" headers="conf.low">0.66, 0.72</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">tg</td>
<td class="gt_row gt_center" headers="estimate">-0.05</td>
<td class="gt_row gt_center" headers="conf.low">-0.08, -0.03</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">sbp</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">dbp</td>
<td class="gt_row gt_center" headers="estimate">0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.05</td>
<td class="gt_row gt_center" headers="p.value">0.3</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">ht_hormone.therapy</td>
<td class="gt_row gt_center" headers="estimate">0.19</td>
<td class="gt_row gt_center" headers="conf.low">0.16, 0.21</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">raceth_African.American</td>
<td class="gt_row gt_center" headers="estimate">0.03</td>
<td class="gt_row gt_center" headers="conf.low">0.00, 0.06</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.020</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">raceth_Other</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.00</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.042</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">smoking_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.03, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.6</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">drinkany_yes</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.4</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">exercise_yes</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.4</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">physact_much.less.active</td>
<td class="gt_row gt_center" headers="estimate">-0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.01</td>
<td class="gt_row gt_center" headers="p.value">0.3</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">physact_somewhat.less.active</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">physact_somewhat.more.active</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.6</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">physact_much.more.active</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, 0.00</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.026</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">globrat_poor</td>
<td class="gt_row gt_center" headers="estimate">-0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.03, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.7</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">globrat_fair</td>
<td class="gt_row gt_center" headers="estimate">0.00</td>
<td class="gt_row gt_center" headers="conf.low">-0.03, 0.02</td>
<td class="gt_row gt_center" headers="p.value">0.7</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">globrat_very.good</td>
<td class="gt_row gt_center" headers="estimate">0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.01, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.3</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">globrat_excellent</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.03</td>
<td class="gt_row gt_center" headers="p.value">0.6</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">medcond_yes</td>
<td class="gt_row gt_center" headers="estimate">0.00</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.03</td>
<td class="gt_row gt_center" headers="p.value">0.8</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">htnmeds_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.00</td>
<td class="gt_row gt_center" headers="p.value">0.054</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">statins_yes</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.02, 0.04</td>
<td class="gt_row gt_center" headers="p.value">0.5</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">diabetes_yes</td>
<td class="gt_row gt_center" headers="estimate">0.01</td>
<td class="gt_row gt_center" headers="conf.low">-0.04, 0.05</td>
<td class="gt_row gt_center" headers="p.value">0.7</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">dmpills_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, 0.00</td>
<td class="gt_row gt_center" headers="p.value">0.095</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">insulin_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, 0.01</td>
<td class="gt_row gt_center" headers="p.value">0.14</td>
</tr>
</tbody><tfoot class="gt_sourcenotes">
<tr class="odd">
<td colspan="4" class="gt_sourcenote">Abbreviation: CI = Confidence Interval</td>
</tr>
</tfoot>

</table>

</div>
</div>
</div>
<p>We’ve reduced our model dimension and in fact now discover that waist circumference <em>was</em> significant after all! This is the benefit of applying a small degree of critical thought to the kitchen sink model: we’ve gotten one step closer to a good glass-box model.</p>
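<p>One way to put a number on the collinearity described above is the variance inflation factor (VIF): for each predictor, regress it on all the other predictors and compute 1 / (1 - R²). The sketch below does this in base R on simulated data. The <code>sim</code> data frame, its latent <code>adiposity</code> variable, and the helper <code>vif_one()</code> are made up for illustration and are not part of the analysis in this post.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># Simulated stand-in for the adiposity variables (NOT the data used in this post)
set.seed(1)
n &lt;- 500
adiposity &lt;- rnorm(n)                    # hypothetical latent trait behind all three measures
sim &lt;- data.frame(
  bmi    = adiposity + rnorm(n, sd = 0.3),
  waist  = adiposity + rnorm(n, sd = 0.3),
  weight = adiposity + rnorm(n, sd = 0.3),
  age    = rnorm(n)                      # unrelated covariate, for contrast
)

# VIF for one predictor: 1 / (1 - R^2), where R^2 comes from
# regressing that predictor on all of the other predictors
vif_one &lt;- function(var, data) {
  others &lt;- setdiff(names(data), var)
  r2 &lt;- summary(lm(reformulate(others, response = var), data = data))$r.squared
  1 / (1 - r2)
}

vifs &lt;- sapply(names(sim), vif_one, data = sim)
round(vifs, 1)</code></pre>
</div>
<p>In a simulation like this, the three adiposity stand-ins should show VIFs well above 1, while the unrelated <code>age</code> column should sit near 1. A large VIF is a hint that “holding the others constant” is not a meaningful thought experiment for that predictor.</p>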
</section>
<section id="selecting-a-glass-box-model" class="level2">
<h2 class="anchored" data-anchor-id="selecting-a-glass-box-model">Selecting a glass-box model…</h2>
<p>Let’s say we want to get an even better glass-box model. Many predictors in our last model were not significant; do we really need to keep them all? If we use fewer predictors, our model becomes more <em>parsimonious</em>, and thereby more transparent.</p>
<p>Here’s how we could go about this:</p>
<ol type="1">
<li><em>Contextual model refining</em>: This is a great choice and should be the first go-to, but it can be hard when there isn’t much context on the features or when there are many of them. We already did this by keeping only waist circumference to represent adiposity.</li>
<li>Stepwise selection (select with AIC, BIC, p-values, etc.)</li>
<li>Best-subsets (select with AIC or BIC)</li>
<li>Lasso/penalized regression (simultaneous selection &amp; estimation with shrinkage toward zero)</li>
<li>Bayesian methods</li>
</ol>
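<p>To give a flavor of option 4, here is a minimal lasso sketch on simulated data. It assumes the <code>glmnet</code> package is installed (it is not used elsewhere in this post), and the matrix <code>x</code>, the outcome <code>y</code>, and the choice of the “1 standard error” penalty are illustrative assumptions rather than a recommendation for the HDL data.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">library(glmnet)  # assumed to be installed

# Simulated data (NOT the data used in this post): 10 predictors, only 2 matter
set.seed(1)
n &lt;- 200
p &lt;- 10
x &lt;- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y &lt;- 2 * x[, "x1"] - 1 * x[, "x2"] + rnorm(n)

# alpha = 1 requests the lasso penalty; the penalty strength (lambda)
# is chosen by cross-validation (10 folds by default)
cv_fit &lt;- cv.glmnet(x, y, alpha = 1)

# Coefficients at the "1 standard error" lambda;
# predictors shrunk exactly to zero have been dropped from the model
b &lt;- as.matrix(coef(cv_fit, s = "lambda.1se"))
b[b[, 1] != 0, , drop = FALSE]</code></pre>
</div>
<p>Unlike stepwise selection, the lasso selects and estimates in one step, and the surviving coefficients are shrunk toward zero, so they should not be read as ordinary least-squares estimates.</p>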
<p>For now, let’s show the results of a model selected via stepwise selection with AIC:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">fit_stepAIC <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> MASS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stepAIC</span>(fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"both"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">trace =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb8-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tbl_regression</span>(fit_stepAIC) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bold_p</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div id="glgrtdklcm" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#glgrtdklcm table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#glgrtdklcm thead, #glgrtdklcm tbody, #glgrtdklcm tfoot, #glgrtdklcm tr, #glgrtdklcm td, #glgrtdklcm th {
  border-style: none;
}

#glgrtdklcm p {
  margin: 0;
  padding: 0;
}

#glgrtdklcm .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#glgrtdklcm .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#glgrtdklcm .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#glgrtdklcm .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#glgrtdklcm .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#glgrtdklcm .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#glgrtdklcm .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#glgrtdklcm .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#glgrtdklcm .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#glgrtdklcm .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#glgrtdklcm .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#glgrtdklcm .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#glgrtdklcm .gt_spanner_row {
  border-bottom-style: hidden;
}

#glgrtdklcm .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#glgrtdklcm .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#glgrtdklcm .gt_from_md > :first-child {
  margin-top: 0;
}

#glgrtdklcm .gt_from_md > :last-child {
  margin-bottom: 0;
}

#glgrtdklcm .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#glgrtdklcm .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#glgrtdklcm .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#glgrtdklcm .gt_row_group_first td {
  border-top-width: 2px;
}

#glgrtdklcm .gt_row_group_first th {
  border-top-width: 2px;
}

#glgrtdklcm .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#glgrtdklcm .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#glgrtdklcm .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#glgrtdklcm .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#glgrtdklcm .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#glgrtdklcm .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#glgrtdklcm .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#glgrtdklcm .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#glgrtdklcm .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#glgrtdklcm .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#glgrtdklcm .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#glgrtdklcm .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#glgrtdklcm .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#glgrtdklcm .gt_left {
  text-align: left;
}

#glgrtdklcm .gt_center {
  text-align: center;
}

#glgrtdklcm .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#glgrtdklcm .gt_font_normal {
  font-weight: normal;
}

#glgrtdklcm .gt_font_bold {
  font-weight: bold;
}

#glgrtdklcm .gt_font_italic {
  font-style: italic;
}

#glgrtdklcm .gt_super {
  font-size: 65%;
}

#glgrtdklcm .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#glgrtdklcm .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#glgrtdklcm .gt_indent_1 {
  text-indent: 5px;
}

#glgrtdklcm .gt_indent_2 {
  text-indent: 10px;
}

#glgrtdklcm .gt_indent_3 {
  text-indent: 15px;
}

#glgrtdklcm .gt_indent_4 {
  text-indent: 20px;
}

#glgrtdklcm .gt_indent_5 {
  text-indent: 25px;
}

#glgrtdklcm .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#glgrtdklcm div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-bootstrap="false">
<thead>
<tr class="gt_col_headings header">
<th id="label" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col"><strong>Characteristic</strong></th>
<th id="estimate" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>Beta</strong></th>
<th id="conf.low" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>95% CI</strong></th>
<th id="p.value" class="gt_col_heading gt_columns_bottom_border gt_center" data-quarto-table-cell-role="th" scope="col"><strong>p-value</strong></th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="label">age</td>
<td class="gt_row gt_center" headers="estimate">0.03</td>
<td class="gt_row gt_center" headers="conf.low">0.01, 0.06</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.015</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">waist</td>
<td class="gt_row gt_center" headers="estimate">-0.04</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, -0.01</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.008</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">hdl</td>
<td class="gt_row gt_center" headers="estimate">0.69</td>
<td class="gt_row gt_center" headers="conf.low">0.67, 0.72</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">tg</td>
<td class="gt_row gt_center" headers="estimate">-0.06</td>
<td class="gt_row gt_center" headers="conf.low">-0.08, -0.03</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">ht_hormone.therapy</td>
<td class="gt_row gt_center" headers="estimate">0.19</td>
<td class="gt_row gt_center" headers="conf.low">0.16, 0.21</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">&lt;0.001</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">raceth_African.American</td>
<td class="gt_row gt_center" headers="estimate">0.03</td>
<td class="gt_row gt_center" headers="conf.low">0.00, 0.05</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.030</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">raceth_Other</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.00</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.032</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">physact_much.less.active</td>
<td class="gt_row gt_center" headers="estimate">-0.02</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.00</td>
<td class="gt_row gt_center" headers="p.value">0.10</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">physact_much.more.active</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.00</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.034</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">htnmeds_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.05, 0.00</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.030</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="label">dmpills_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, -0.01</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.012</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="label">insulin_yes</td>
<td class="gt_row gt_center" headers="estimate">-0.03</td>
<td class="gt_row gt_center" headers="conf.low">-0.06, -0.01</td>
<td class="gt_row gt_center" headers="p.value" style="font-weight: bold">0.015</td>
</tr>
</tbody><tfoot class="gt_sourcenotes">
<tr class="odd">
<td colspan="4" class="gt_sourcenote">Abbreviation: CI = Confidence Interval</td>
</tr>
</tfoot>

</table>

</div>
</div>
</div>
<p>Whoa! Such low p-values!!!</p>
<p>After this selective process, it seems that we can also claim significance for age, waist, medications, and insulin! And it’s a smaller model so it’s a more “glass-box” approach!</p>
<p>Right?</p>
<p>…Right?</p>
<p>……</p>
<p>Well, no. This is a classic example of an <strong>UPSI</strong>.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>UPSI
</div>
</div>
<div class="callout-body-container callout-body">
<p>A term I’m coining right now that stands for an “Unadjusted Post-Selection Inference”. It is also a statistical “oopsie”.</p>
</div>
</div>
<p>You should know deep down that the p-values from the model selected via stepwise AIC are too low. The data set has already been used to select the model, so of course everything that’s selected is more likely to look significant.</p>
<p>We’ll tackle UPSIs in more depth in another post, as well as “better” alternatives like selective inference. In short, inferences post-selection get tricky. Ignoring the variability inherent in the model selection process has been characterized as a common “bad” practice in statistics. UPSIs generally lead to invalid, non-replicable inferences.</p>
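<p>To see why, here is a minimal simulation sketch (not part of the analysis above). It assumes pure-noise data, so every predictor is truly unrelated to the outcome, and uses base R’s <code>step()</code> for backward AIC selection. The sample size, number of predictors, and number of replications are arbitrary choices for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
n &lt;- 100   # observations per simulated data set
p &lt;- 20    # pure-noise predictors, none related to y

one_run &lt;- function() {
  # Outcome and predictors are all independent standard normal draws
  d &lt;- as.data.frame(matrix(rnorm(n * (p + 1)), nrow = n))
  names(d) &lt;- c("y", paste0("x", seq_len(p)))
  # Backward stepwise selection by AIC, starting from the full model
  fit &lt;- step(lm(y ~ ., data = d), trace = 0)
  # Column 4 of the coefficient table holds the unadjusted p-values
  tab &lt;- coef(summary(fit))
  tab[rownames(tab) != "(Intercept)", 4]
}

# Pool the p-values of every selected predictor across 200 replications
pvals &lt;- unlist(replicate(200, one_run(), simplify = FALSE))

# Share of selected (truly null) predictors declared significant at 0.05
mean(pvals &lt; 0.05)</code></pre></div>
</div>
<p>If the unadjusted p-values were valid, about 5% of them would fall below 0.05 here. Because AIC tends to keep only the predictors that happened to look strong in this sample, the share among selected terms is typically several times larger.</p>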
<blockquote class="blockquote">
<p>Then… why are UPSIs so common?</p>
</blockquote>
<p>Well, you tell me why anyone would want their p-values to be lower than they should be… 🙄</p>
<p>Eyerolls and publication bias aside, even well-intentioned statisticians find it difficult to properly adjust these p-values for the selection process. Some say it’s impossible. Again, we’ll save this for another post.</p>
<blockquote class="blockquote">
<p>Is the stepwise AIC model a good glass-box model?</p>
</blockquote>
<ul>
<li>Linear? ✅</li>
<li>Predicts well? ✅</li>
<li>Parsimonious? ✅</li>
<li>Meaningful parameters? ✅ This model has fewer highly collinear features.</li>
<li>Valid uncertainty? ❌ Unfortunately, <strong>selecting for a more parsimonious model created dishonestly low p-values</strong>.</li>
</ul>
<p>So while this model may predict better, is more parsimonious, and has more meaningful parameters, it’s still not an optimal glass-box model (under our refined definition).</p>
</section>
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>Producing high-quality glass-box models is both necessary and difficult.</p>
<section id="key-takeaways" class="level2">
<h2 class="anchored" data-anchor-id="key-takeaways">Key Takeaways</h2>
<ul>
<li>Regression is not necessarily a glass-box approach
<ul>
<li>collinearity can obfuscate meaningful relationships and patterns</li>
<li>p-values can be easily invalidated by selection</li>
</ul></li>
<li>Building an optimal glass-box model is not easy, even for <em>easy</em> scenarios</li>
<li>There is no substitute for domain expertise and critical thinking.</li>
</ul>
</section>
<section id="future-threads-related-questions" class="level2">
<h2 class="anchored" data-anchor-id="future-threads-related-questions">Future Threads / Related Questions</h2>
<ul>
<li>In the present era, how can we efficiently use domain expertise?</li>
<li>Isn’t there a tradeoff between model opacity and its ability to predict well?</li>
<li>Why are UPSIs a problem?</li>
<li>How can one perform valid post-selection inference?</li>
<li>What’s the difference between intrinsic and extrinsic interpretability?</li>
</ul>
<hr>
</section>
</section>
<section id="appendix" class="level1">
<h1>Appendix</h1>
<details>
<summary>
R Session Info
</summary>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">sessioninfo<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">session_info</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13)
 os       macOS Sequoia 15.7.2
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Los_Angeles
 date     2025-11-22
 pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.7.32 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package            * version    date (UTC) lib source
 adaptMCMC            1.5        2024-01-29 [1] CRAN (R 4.5.0)
 backports            1.5.0      2024-05-23 [1] CRAN (R 4.5.0)
 base64enc            0.1-3      2015-07-28 [1] CRAN (R 4.5.0)
 broom              * 1.0.10     2025-09-13 [1] CRAN (R 4.5.0)
 broom.helpers        1.22.0     2025-09-17 [1] CRAN (R 4.5.0)
 cards                0.7.0      2025-08-27 [1] CRAN (R 4.5.0)
 cardx                0.3.0      2025-08-27 [1] CRAN (R 4.5.0)
 class                7.3-23     2025-01-01 [2] CRAN (R 4.5.1)
 cli                  3.6.5      2025-04-23 [1] CRAN (R 4.5.0)
 coda                 0.19-4.1   2024-01-31 [1] CRAN (R 4.5.0)
 codetools            0.2-20     2024-03-31 [2] CRAN (R 4.5.1)
 commonmark           2.0.0      2025-07-07 [1] CRAN (R 4.5.0)
 data.table           1.17.8     2025-07-10 [1] CRAN (R 4.5.0)
 digest               0.6.37     2024-08-19 [1] CRAN (R 4.5.0)
 dplyr              * 1.1.4      2023-11-17 [1] CRAN (R 4.5.0)
 evaluate             1.0.5      2025-08-27 [1] CRAN (R 4.5.0)
 farver               2.1.2      2024-05-13 [1] CRAN (R 4.5.0)
 fastmap              1.2.0      2024-05-15 [1] CRAN (R 4.5.0)
 forcats            * 1.0.1      2025-09-25 [1] CRAN (R 4.5.0)
 foreach              1.5.2      2022-02-02 [1] CRAN (R 4.5.0)
 future               1.67.0     2025-07-29 [1] CRAN (R 4.5.0)
 future.apply         1.20.0     2025-06-06 [1] CRAN (R 4.5.0)
 generics             0.1.4      2025-05-09 [1] CRAN (R 4.5.0)
 ggplot2            * 4.0.0      2025-09-11 [1] CRAN (R 4.5.0)
 glmnet               4.1-10     2025-07-17 [1] CRAN (R 4.5.0)
 globals              0.18.0     2025-05-08 [1] CRAN (R 4.5.0)
 glue                 1.8.0      2024-09-30 [1] CRAN (R 4.5.0)
 gower                1.0.2      2024-12-17 [1] CRAN (R 4.5.0)
 gt                   1.0.0      2025-04-05 [1] CRAN (R 4.5.0)
 gtable               0.3.6      2024-10-25 [1] CRAN (R 4.5.0)
 gtsummary          * 2.4.0      2025-08-28 [1] CRAN (R 4.5.0)
 hardhat              1.4.2      2025-08-20 [1] CRAN (R 4.5.0)
 haven                2.5.5      2025-05-30 [1] CRAN (R 4.5.0)
 hms                  1.1.4      2025-10-17 [1] CRAN (R 4.5.0)
 htmltools            0.5.8.1    2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets          1.6.4      2023-12-06 [1] CRAN (R 4.5.0)
 intervals            0.15.5     2024-08-23 [1] CRAN (R 4.5.0)
 ipred                0.9-15     2024-07-18 [1] CRAN (R 4.5.0)
 iterators            1.0.14     2022-02-05 [1] CRAN (R 4.5.0)
 jsonlite             2.0.0      2025-03-27 [1] CRAN (R 4.5.0)
 kableExtra         * 1.4.0      2024-01-24 [1] CRAN (R 4.5.0)
 knitr                1.50       2025-03-16 [1] CRAN (R 4.5.0)
 labeling             0.4.3      2023-08-29 [1] CRAN (R 4.5.0)
 labelled             2.15.0     2025-09-16 [1] CRAN (R 4.5.0)
 lattice              0.22-7     2025-04-02 [2] CRAN (R 4.5.1)
 lava                 1.8.2      2025-10-30 [1] CRAN (R 4.5.0)
 lifecycle            1.0.4      2023-11-07 [1] CRAN (R 4.5.0)
 listenv              0.10.0     2025-11-02 [1] CRAN (R 4.5.0)
 litedown             0.7        2025-04-08 [1] CRAN (R 4.5.0)
 lubridate          * 1.9.4      2024-12-08 [1] CRAN (R 4.5.0)
 magrittr             2.0.4      2025-09-12 [1] CRAN (R 4.5.0)
 markdown             2.0        2025-03-23 [1] CRAN (R 4.5.0)
 MASS                 7.3-65     2025-02-28 [2] CRAN (R 4.5.1)
 Matrix               1.7-3      2025-03-11 [2] CRAN (R 4.5.1)
 ncvreg               3.16.0     2025-10-09 [1] Github (pbreheny/ncvreg@5fecc8c)
 nnet                 7.3-20     2025-01-01 [1] CRAN (R 4.5.0)
 parallelly           1.45.1     2025-07-24 [1] CRAN (R 4.5.0)
 pbapply              1.7-4      2025-07-20 [1] CRAN (R 4.5.0)
 pillar               1.11.1     2025-09-17 [1] CRAN (R 4.5.0)
 pkgconfig            2.0.3      2019-09-22 [1] CRAN (R 4.5.0)
 prodlim              2025.04.28 2025-04-28 [1] CRAN (R 4.5.0)
 purrr              * 1.2.0      2025-11-04 [1] CRAN (R 4.5.0)
 R6                   2.6.1      2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer         1.1-3      2022-04-03 [1] CRAN (R 4.5.0)
 Rcpp                 1.1.0      2025-07-02 [1] CRAN (R 4.5.0)
 readr              * 2.1.5      2024-01-10 [1] CRAN (R 4.5.0)
 recipes            * 1.3.1      2025-05-21 [1] CRAN (R 4.5.0)
 rlang                1.1.6      2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown            2.30       2025-09-28 [1] CRAN (R 4.5.0)
 rpart                4.1.24     2025-01-07 [2] CRAN (R 4.5.1)
 rstudioapi           0.17.1     2024-10-22 [1] CRAN (R 4.5.0)
 S7                   0.2.0      2024-11-07 [1] CRAN (R 4.5.0)
 sass                 0.4.10     2025-04-11 [1] CRAN (R 4.5.0)
 scales               1.4.0      2025-04-24 [1] CRAN (R 4.5.0)
 selectInferToolkit * 0.4.1      2025-11-19 [1] Github (petersonR/selectInferToolkit@8d37f5e)
 selectiveInference   1.2.5      2019-09-07 [1] CRAN (R 4.5.0)
 sessioninfo          1.2.3      2025-02-05 [1] CRAN (R 4.5.0)
 shape                1.4.6.1    2024-02-23 [1] CRAN (R 4.5.0)
 sparsevctrs          0.3.4      2025-05-25 [1] CRAN (R 4.5.0)
 stringi              1.8.7      2025-03-27 [1] CRAN (R 4.5.0)
 stringr            * 1.6.0      2025-11-04 [1] CRAN (R 4.5.0)
 survival             3.8-3      2024-12-17 [2] CRAN (R 4.5.1)
 svglite              2.2.1      2025-05-12 [1] CRAN (R 4.5.0)
 systemfonts          1.3.1      2025-10-01 [1] CRAN (R 4.5.0)
 textshaping          1.0.4      2025-10-10 [1] CRAN (R 4.5.0)
 tibble             * 3.3.0      2025-06-08 [1] CRAN (R 4.5.0)
 tidyr              * 1.3.1      2024-01-24 [1] CRAN (R 4.5.0)
 tidyselect           1.2.1      2024-03-11 [1] CRAN (R 4.5.0)
 tidyverse          * 2.0.0      2023-02-22 [1] CRAN (R 4.5.0)
 timechange           0.3.0      2024-01-18 [1] CRAN (R 4.5.0)
 timeDate             4051.111   2025-10-17 [1] CRAN (R 4.5.0)
 tzdb                 0.5.0      2025-03-15 [1] CRAN (R 4.5.0)
 vctrs                0.6.5      2023-12-01 [1] CRAN (R 4.5.0)
 viridisLite          0.4.2      2023-05-02 [1] CRAN (R 4.5.0)
 withr                3.0.2      2024-10-28 [1] CRAN (R 4.5.0)
 xfun                 0.54       2025-10-30 [1] CRAN (R 4.5.0)
 xml2                 1.4.1      2025-10-27 [1] CRAN (R 4.5.0)
 yaml                 2.3.10     2024-07-26 [1] CRAN (R 4.5.0)

 [1] /Users/rpterson/Library/R/arm64/4.5/library
 [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────</code></pre>
</div>
</div>
</details>


</section>

 ]]></description>
  <category>model selection</category>
  <category>glass-box modeling</category>
  <category>interpretability</category>
  <category>analysis</category>
  <category>R</category>
  <guid>https://www.data-diction.com/posts/glassbox-models/</guid>
  <pubDate>Mon, 24 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://www.data-diction.com/posts/glassbox-models/thumbnail.png" medium="image" type="image/png" height="69" width="144"/>
</item>
<item>
  <title>Did Denver’s 2022 ‘Zero Fare for Better Air’ campaign actually work?</title>
  <dc:creator>Ryan Peterson</dc:creator>
  <link>https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/</link>
  <description><![CDATA[ 





<p><em>A data-dictated look at whether Denver’s free August public transit policy had its intended effect on air quality.</em></p>
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/hazy-denver.jpg" class="img-fluid" alt="Hazy Denver"> <small>Image credit: National Renewable Energy Laboratory, Colorado State University</small></p>
<section id="backstory" class="level1">
<h1>Backstory</h1>
<p>Most summers, Coloradoans flock to the majestic Rocky Mountains with their beautiful hikes and various mountain activities. This is the case, at least, unless poor air quality forces them indoors. For me, this occurred on a smoky July day in 2020, when surreal “snowing” ash sprinkling down from nearby wildfires forced us to evacuate the pickleball courts.</p>
<p>Between wildfires and pollution, Denver’s summer air often leaves room for improvement. Sadly, the Rockies don’t seem so enticing when they are obscured behind a polluted haze.</p>
<p>In August 2022, I noticed my RTD bus was more crowded than usual, and I was not asked to scan my bus pass. This is how I learned of Denver’s 2022 “Zero fare for better air” initiative. Throughout the month, my packed bus led me to believe the policy did work to increase ridership.</p>
<p>My story was validated by the RTD; according to the <a href="https://www.rtd-denver.com/sites/default/files/files/2022-11/Zero-Fare%20August%20Impact%20Analysis%20Final%20Report%20-%2011.30.2022.pdf">final RTD report</a>, RTD did indeed see a 22% increase in ridership during the free-fare month, and ridership was up 36% from the prior August. This increase led some to conclude that the campaign was a huge success, and it also led to the expansion of the program in 2023.</p>
<p>But wait… the campaign is called “Zero fare for better air”. So for this to really be a success, the policy change should produce measurably better air quality, not just higher ridership. On this point, the report concluded that “impacts to air quality are difficult to quantify”, attributing the difficulty to the lack of a baseline. So we’re left wondering – did it work? Did we actually have cleaner air in August of 2022?</p>
<p>Recently, my team investigated how the Covid-19 pandemic affected congestion and air quality in cities across the US (<a href="https://www.mdpi.com/2071-1050/13/13/7275">we found that it did</a>). In this post I use similar outcomes and methods to determine the impact of this policy in Denver.</p>
</section>
<section id="air-quality-data" class="level1">
<h1>Air Quality Data</h1>
<p>There are plenty of important pollutants to worry about in our air, but automobile traffic contributes especially to nitrogen dioxide (NO2) and ozone (O3). We’ll consider each of these using data from the EPA’s <a href="https://www.epa.gov/aqs">Air Quality System</a>.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Code and data for this and other blog posts are available <a href="https://github.com/petersonR/datadiction">here</a>.</p>
</div>
</div>
</section>
<section id="no2" class="level1">
<h1>NO2</h1>
<section id="data-visualizations" class="level2">
<h2 class="anchored" data-anchor-id="data-visualizations">Data visualizations</h2>
<p>Here are plots of the historical daily data for NO2 in Denver.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">Raw data</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Transformed data</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false" href="">2022 only</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-4" aria-controls="tabset-1-4" aria-selected="false" href="">Monthly</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-5-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-5" aria-controls="tabset-1-5" aria-selected="false" href="">Yearly</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<div id="tabset-1-4" class="tab-pane" aria-labelledby="tabset-1-4-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<div id="tabset-1-5" class="tab-pane" aria-labelledby="tabset-1-5-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="modeling" class="level2">
<h2 class="anchored" data-anchor-id="modeling">Modeling</h2>
<p>We can use this historical data to build a forecast of what August’s NO2 levels would have been, using <a href="https://arxiv.org/abs/2211.01492">forecasting methodology</a> available in the <a href="https://www.github.com/petersonR/fastTS"><code>fastTS</code> R package</a> that can handle this kind of seasonal data. The series is log-transformed (after adding 10) prior to modeling. We include weekday and month indicator variables and a natural cubic spline basis for time. We computed 30-day-ahead predictions and tested whether these predictions were significantly different from the observed daily values during the zero-fare period. As some months were easier to forecast than others (August was easier to forecast than winter months), we also use heteroskedasticity-corrected standard errors. To evaluate our model, a 10% test set was held out.</p>
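<p>As a rough illustration of the model terms described above (not the <code>fastTS</code> model used here), the sketch below builds the same kinds of covariates with an ordinary linear model on simulated data. The simulated series, the date range, and the spline’s <code>df = 4</code> are assumptions made for illustration; the forecasting step, the held-out test set, and the heteroskedasticity-corrected standard errors are omitted.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(splines)  # ships with R; provides ns() for natural cubic splines

set.seed(42)
dates &lt;- seq(as.Date("2019-01-01"), as.Date("2022-07-31"), by = "day")

# Simulated stand-in for a daily NO2 series (NOT the real AQS data)
no2 &lt;- exp(rnorm(length(dates), mean = 2.5, sd = 0.3))

dat &lt;- data.frame(
  log_no2 = log(no2 + 10),             # outcome logged after adding 10
  weekday = factor(weekdays(dates)),   # weekday indicator variables
  month   = factor(months(dates)),     # month indicator variables
  time    = as.numeric(dates)          # numeric time for the spline basis
)

# Plain linear model as a stand-in; df = 4 is an arbitrary choice
fit &lt;- lm(log_no2 ~ weekday + month + ns(time, df = 4), data = dat)

# Intercept + 6 weekday + 11 month + 4 spline coefficients
length(coef(fit))</code></pre></div>
</div>
<p>The indicator variables absorb weekly and seasonal patterns, while the spline captures slow drift over the years, so a forecast from this kind of model reflects what a typical August day would look like historically.</p>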
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/modeling_no2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Our model can predict daily NO2 on the log scale to within about 0.175 units, with an out-of-sample <img src="https://latex.codecogs.com/png.latex?R%5E2"> of 0.533 (about 53% of the variation in this outcome can be explained by historical patterns in our model).</p>
<p>The observed daily NO2 values were on average a factor of 0.932 lower, or 6.8% lower, during the zero-fare month compared to their forecasted values (95% CI: 0.921, 0.944). This is strong evidence that daily NO2 during the month of August 2022 was lower than would have been expected historically.</p>
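<p>The percent change follows directly from the ratio scale. As a quick check of the arithmetic, using only the estimate and interval reported above (the variable names are illustrative):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Observed-to-forecast ratio and its 95% CI, as reported above
ratio &lt;- c(estimate = 0.932, lower = 0.921, upper = 0.944)

# A ratio r corresponds to an average difference of log(r) on the log scale
log_diff &lt;- log(ratio)

# Percent reduction relative to forecast: (1 - r) * 100
pct_lower &lt;- (1 - ratio) * 100
round(pct_lower, 1)</code></pre></div>
</div>
<p>This gives 6.8% lower for the point estimate, with the interval endpoints corresponding to roughly 7.9% and 5.6% lower.</p>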
</section>
</section>
<section id="ozone" class="level1">
<h1>Ozone</h1>
<section id="data-visualizations-1" class="level2">
<h2 class="anchored" data-anchor-id="data-visualizations-1">Data visualizations</h2>
<p>Here are plots of the historical daily data for ozone in Denver.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">Raw data</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">2022 only</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-3" aria-controls="tabset-2-3" aria-selected="false" href="">Monthly</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-4" aria-controls="tabset-2-4" aria-selected="false" href="">Yearly</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<div id="tabset-2-3" class="tab-pane" aria-labelledby="tabset-2-3-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-8-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<div id="tabset-2-4" class="tab-pane" aria-labelledby="tabset-2-4-tab">
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/unnamed-chunk-9-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="modeling-1" class="level2">
<h2 class="anchored" data-anchor-id="modeling-1">Modeling</h2>
<p>Modeling of ozone data proceeded similarly, although no outcome transformation was used.</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/index_files/figure-html/modeling_ozn-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Our model can predict daily ozone to within about 0.005 units, with an out-of-sample <img src="https://latex.codecogs.com/png.latex?R%5E2"> of 0.67 (about 67% of the variation in this outcome can be explained by historical patterns in our model).</p>
<p>The observed daily ozone values differed from their forecasted values by an average of -0.001 parts per million during the zero-fare month (95% CI: -0.002, -0.001). This doesn’t show evidence of a change in daily ozone during the month of August 2022 in comparison to what would have been expected historically.</p>
</section>
</section>
<section id="takeaways" class="level1">
<h1>Takeaways</h1>
<ul>
<li>Daily average NO2 in Denver during the zero-fare month was about 7% lower than forecast (p &lt; 0.001)!</li>
<li>No observable change was seen in ozone relative to forecasts.</li>
<li>There is room for more to be done to improve Denver’s air quality.</li>
</ul>
</section>
<section id="limitations" class="level1">
<h1>Limitations</h1>
<p>Ozone and NO2 are affected by many things on a daily basis, which were not controlled for in this analysis. A more effective analysis would control for these things, which might improve the precision of the model estimates or better account for the possibility of confounding. Neither outcome is perfectly measured by the AQS stations scattered about the city of Denver; there’s always the possibility that more accurate or more granular data could better show an effect of the zero-fare policy.</p>
</section>
<section id="sensitivity-of-method" class="level1">
<h1>Sensitivity of method</h1>
<p>If the method we used for determining the effect of intervention were flawed, we might expect to see high rejection rates for any other subset of 31 days. We can check this by reproducing the same method for 31-day chunks of time surrounding August 2022. Below is a table of the estimated effect under the same methodology for every cut point listed, including FDR-adjusted (and nominal) p-values as well as effect size estimates.</p>
<div class="cell">
<div class="cell-output-display">
<table class="lightable-minimal caption-top table table-sm table-striped small">
<thead>
<tr class="header">
<th style="text-align: left;" data-quarto-table-cell-role="th">cut</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">estimate</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">ci_lb</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">ci_ub</th>
<th style="text-align: left;" data-quarto-table-cell-role="th">p.value</th>
<th style="text-align: left;" data-quarto-table-cell-role="th">p.adj</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">2021-01-01</td>
<td style="text-align: right;">1.008</td>
<td style="text-align: right;">0.96</td>
<td style="text-align: right;">1.06</td>
<td style="text-align: left;">0.86</td>
<td style="text-align: left;">0.95</td>
</tr>
<tr class="even">
<td style="text-align: left;">2021-02-01</td>
<td style="text-align: right;">0.989</td>
<td style="text-align: right;">0.95</td>
<td style="text-align: right;">1.03</td>
<td style="text-align: left;">0.80</td>
<td style="text-align: left;">0.95</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2021-03-01</td>
<td style="text-align: right;">1.082</td>
<td style="text-align: right;">1.04</td>
<td style="text-align: right;">1.13</td>
<td style="text-align: left;">0.054</td>
<td style="text-align: left;">0.21</td>
</tr>
<tr class="even">
<td style="text-align: left;">2021-04-01</td>
<td style="text-align: right;">0.997</td>
<td style="text-align: right;">0.97</td>
<td style="text-align: right;">1.02</td>
<td style="text-align: left;">0.90</td>
<td style="text-align: left;">0.95</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2021-05-01</td>
<td style="text-align: right;">1.035</td>
<td style="text-align: right;">1.01</td>
<td style="text-align: right;">1.06</td>
<td style="text-align: left;">0.094</td>
<td style="text-align: left;">0.29</td>
</tr>
<tr class="even">
<td style="text-align: left;">2021-06-01</td>
<td style="text-align: right;">1.030</td>
<td style="text-align: right;">1.01</td>
<td style="text-align: right;">1.05</td>
<td style="text-align: left;">0.17</td>
<td style="text-align: left;">0.37</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2021-07-01</td>
<td style="text-align: right;">0.965</td>
<td style="text-align: right;">0.95</td>
<td style="text-align: right;">0.98</td>
<td style="text-align: left;">0.033</td>
<td style="text-align: left;">0.16</td>
</tr>
<tr class="even">
<td style="text-align: left;">2021-08-01</td>
<td style="text-align: right;">0.977</td>
<td style="text-align: right;">0.96</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: left;">0.18</td>
<td style="text-align: left;">0.37</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2021-09-01</td>
<td style="text-align: right;">0.999</td>
<td style="text-align: right;">0.97</td>
<td style="text-align: right;">1.03</td>
<td style="text-align: left;">0.99</td>
<td style="text-align: left;">0.99</td>
</tr>
<tr class="even">
<td style="text-align: left;">2021-10-01</td>
<td style="text-align: right;">1.004</td>
<td style="text-align: right;">0.97</td>
<td style="text-align: right;">1.04</td>
<td style="text-align: left;">0.91</td>
<td style="text-align: left;">0.95</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2021-11-01</td>
<td style="text-align: right;">0.944</td>
<td style="text-align: right;">0.91</td>
<td style="text-align: right;">0.98</td>
<td style="text-align: left;">0.12</td>
<td style="text-align: left;">0.32</td>
</tr>
<tr class="even">
<td style="text-align: left;">2021-12-01</td>
<td style="text-align: right;">0.977</td>
<td style="text-align: right;">0.94</td>
<td style="text-align: right;">1.02</td>
<td style="text-align: left;">0.58</td>
<td style="text-align: left;">0.89</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2022-01-01</td>
<td style="text-align: right;">1.132</td>
<td style="text-align: right;">1.09</td>
<td style="text-align: right;">1.18</td>
<td style="text-align: left;">0.002</td>
<td style="text-align: left;">0.019</td>
</tr>
<tr class="even">
<td style="text-align: left;">2022-02-01</td>
<td style="text-align: right;">1.024</td>
<td style="text-align: right;">0.98</td>
<td style="text-align: right;">1.07</td>
<td style="text-align: left;">0.59</td>
<td style="text-align: left;">0.89</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2022-03-01</td>
<td style="text-align: right;">0.958</td>
<td style="text-align: right;">0.93</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: left;">0.15</td>
<td style="text-align: left;">0.36</td>
</tr>
<tr class="even">
<td style="text-align: left;">2022-04-01</td>
<td style="text-align: right;">0.920</td>
<td style="text-align: right;">0.89</td>
<td style="text-align: right;">0.95</td>
<td style="text-align: left;">0.009</td>
<td style="text-align: left;">0.068</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2022-05-01</td>
<td style="text-align: right;">0.986</td>
<td style="text-align: right;">0.96</td>
<td style="text-align: right;">1.01</td>
<td style="text-align: left;">0.54</td>
<td style="text-align: left;">0.89</td>
</tr>
<tr class="even">
<td style="text-align: left;">2022-06-01</td>
<td style="text-align: right;">1.019</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: right;">1.04</td>
<td style="text-align: left;">0.44</td>
<td style="text-align: left;">0.81</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2022-07-01</td>
<td style="text-align: right;">0.970</td>
<td style="text-align: right;">0.95</td>
<td style="text-align: right;">0.99</td>
<td style="text-align: left;">0.098</td>
<td style="text-align: left;">0.29</td>
</tr>
<tr class="even">
<td style="text-align: left; font-weight: bold;">2022-08-01</td>
<td style="text-align: right; font-weight: bold;">0.932</td>
<td style="text-align: right; font-weight: bold;">0.92</td>
<td style="text-align: right; font-weight: bold;">0.94</td>
<td style="text-align: left; font-weight: bold;">&lt; 0.001</td>
<td style="text-align: left; font-weight: bold;">&lt; 0.001</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2022-09-01</td>
<td style="text-align: right;">0.989</td>
<td style="text-align: right;">0.97</td>
<td style="text-align: right;">1.01</td>
<td style="text-align: left;">0.66</td>
<td style="text-align: left;">0.93</td>
</tr>
<tr class="even">
<td style="text-align: left;">2022-10-01</td>
<td style="text-align: right;">0.990</td>
<td style="text-align: right;">0.96</td>
<td style="text-align: right;">1.02</td>
<td style="text-align: left;">0.70</td>
<td style="text-align: left;">0.93</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2022-11-01</td>
<td style="text-align: right;">1.008</td>
<td style="text-align: right;">0.97</td>
<td style="text-align: right;">1.05</td>
<td style="text-align: left;">0.85</td>
<td style="text-align: left;">0.95</td>
</tr>
<tr class="even">
<td style="text-align: left;">2022-12-01</td>
<td style="text-align: right;">1.103</td>
<td style="text-align: right;">1.05</td>
<td style="text-align: right;">1.15</td>
<td style="text-align: left;">0.032</td>
<td style="text-align: left;">0.16</td>
</tr>
</tbody>
</table>
</div>
</div>
</section>
<section id="on-the-horizon" class="level1">
<h1>On the horizon</h1>
<p>Come August 2023, Denver will roll out the program again and I will revisit this analysis to see whether zero fares produce <em>observably</em> cleaner air throughout the month. Please check out <a href="https://zerofareaugust.coloradotransit.com/">their website</a> to sign up and participate.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Again, code and data for this and other posts are available <a href="https://github.com/petersonR/datadiction">here</a>. This post was updated on 2/15/2024 to point to the <code>fastTS</code> R package, which is an updated version of <code>srlTS</code>, and again on 6/11/2024 based on a bug fix in <code>fastTS</code> 1.0.0, which strengthened the observed effect of NO2.</p>
</div>
</div>
<hr>
</section>
<section id="appendix" class="level1">
<h1>Appendix</h1>
<details>
<summary>
R Session Info
</summary>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">sessioninfo<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">session_info</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13)
 os       macOS Sequoia 15.7.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2025-11-20
 pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.7.32 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version date (UTC) lib source
 bit            4.6.0   2025-03-06 [1] CRAN (R 4.5.0)
 bit64          4.6.0-1 2025-01-16 [1] CRAN (R 4.5.0)
 cli            3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
 crayon         1.5.3   2024-06-20 [1] CRAN (R 4.5.0)
 digest         0.6.37  2024-08-19 [1] CRAN (R 4.5.0)
 dplyr        * 1.1.4   2023-11-17 [1] CRAN (R 4.5.0)
 evaluate       1.0.5   2025-08-27 [1] CRAN (R 4.5.0)
 farver         2.1.2   2024-05-13 [1] CRAN (R 4.5.0)
 fastmap        1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
 forcats      * 1.0.1   2025-09-25 [1] CRAN (R 4.5.0)
 generics       0.1.4   2025-05-09 [1] CRAN (R 4.5.0)
 ggplot2      * 4.0.0   2025-09-11 [1] CRAN (R 4.5.0)
 glue           1.8.0   2024-09-30 [1] CRAN (R 4.5.0)
 gtable         0.3.6   2024-10-25 [1] CRAN (R 4.5.0)
 here           1.0.2   2025-09-15 [1] CRAN (R 4.5.0)
 hms            1.1.4   2025-10-17 [1] CRAN (R 4.5.0)
 htmltools      0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets    1.6.4   2023-12-06 [1] CRAN (R 4.5.0)
 jsonlite       2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
 knitr          1.50    2025-03-16 [1] CRAN (R 4.5.0)
 lifecycle      1.0.4   2023-11-07 [1] CRAN (R 4.5.0)
 lubridate    * 1.9.4   2024-12-08 [1] CRAN (R 4.5.0)
 magrittr       2.0.4   2025-09-12 [1] CRAN (R 4.5.0)
 pillar         1.11.1  2025-09-17 [1] CRAN (R 4.5.0)
 pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.5.0)
 png          * 0.1-8   2022-11-29 [1] CRAN (R 4.5.0)
 prettyunits    1.2.0   2023-09-24 [1] CRAN (R 4.5.0)
 progress     * 1.2.3   2023-12-06 [1] CRAN (R 4.5.0)
 purrr        * 1.2.0   2025-11-04 [1] CRAN (R 4.5.0)
 R6             2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer   1.1-3   2022-04-03 [1] CRAN (R 4.5.0)
 readr        * 2.1.5   2024-01-10 [1] CRAN (R 4.5.0)
 rlang          1.1.6   2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown      2.30    2025-09-28 [1] CRAN (R 4.5.0)
 rprojroot      2.1.1   2025-08-26 [1] CRAN (R 4.5.0)
 rstudioapi     0.17.1  2024-10-22 [1] CRAN (R 4.5.0)
 S7             0.2.0   2024-11-07 [1] CRAN (R 4.5.0)
 scales         1.4.0   2025-04-24 [1] CRAN (R 4.5.0)
 sessioninfo    1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
 stringi        1.8.7   2025-03-27 [1] CRAN (R 4.5.0)
 stringr      * 1.6.0   2025-11-04 [1] CRAN (R 4.5.0)
 tibble       * 3.3.0   2025-06-08 [1] CRAN (R 4.5.0)
 tidyr        * 1.3.1   2024-01-24 [1] CRAN (R 4.5.0)
 tidyselect     1.2.1   2024-03-11 [1] CRAN (R 4.5.0)
 tidyverse    * 2.0.0   2023-02-22 [1] CRAN (R 4.5.0)
 timechange     0.3.0   2024-01-18 [1] CRAN (R 4.5.0)
 tzdb           0.5.0   2025-03-15 [1] CRAN (R 4.5.0)
 vctrs          0.6.5   2023-12-01 [1] CRAN (R 4.5.0)
 vroom          1.6.6   2025-09-19 [1] CRAN (R 4.5.0)
 withr          3.0.2   2024-10-28 [1] CRAN (R 4.5.0)
 xfun           0.54    2025-10-30 [1] CRAN (R 4.5.0)
 yaml           2.3.10  2024-07-26 [1] CRAN (R 4.5.0)

 [1] /Users/rpterson/Library/R/arm64/4.5/library
 [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────</code></pre>
</div>
</div>
</details>


</section>

 ]]></description>
  <category>news</category>
  <category>analysis</category>
  <category>environment</category>
  <category>time series</category>
  <category>R</category>
  <guid>https://www.data-diction.com/posts/did-denver-zero-fare-policy-work/</guid>
  <pubDate>Fri, 21 Jul 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Detecting interactions in R</title>
  <dc:creator>Ryan Peterson</dc:creator>
  <link>https://www.data-diction.com/posts/selecting-interactions/</link>
  <description><![CDATA[ 





<p><em>But what about interactions; are any of those significant?</em></p>
<p>I have heard some variant of this question from clinicians and researchers in many fields of science. While usually asked in earnest, <strong>this question is a dangerous one</strong>; the sheer number of candidate interactions can greatly inflate the number of false discoveries, resulting in difficult-to-interpret models with many unnecessary interactions. Still, there are times when these expeditions are necessary and fruitful. Thankfully, useful tools are now available to help with the process. This article discusses two regularization-based approaches: Group-Lasso INTERaction-NET (glinternet) and the Sparsity-Ranked Lasso (SRL). The glinternet method implements a hierarchy-preserving selection and estimation procedure, while the SRL is a hierarchy-preferring regularization method that operates under ranked sparsity principles (in short, ranked sparsity methods ensure interactions are treated more skeptically than main effects <em>a priori</em>).</p>
<section id="useful-package-1-ranked-sparsity-methods-via-sparser" class="level2">
<h2 class="anchored" data-anchor-id="useful-package-1-ranked-sparsity-methods-via-sparser">Useful package #1: ranked sparsity methods via <strong>sparseR</strong></h2>
<p>The <strong>sparseR</strong> package is designed to make dealing with interactions and polynomials much more analyst-friendly. Building on the <strong>recipes</strong> package, <strong>sparseR</strong> has many built-in tools to facilitate the prepping of a model matrix with interactions and polynomials; these features are presented on the package website at https://petersonr.github.io/sparseR/. The package is available on CRAN and can be installed and loaded with the code below:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sparseR"</span>)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(sparseR)</span></code></pre></div></div>
</div>
<p>The simplest way to implement the SRL in <strong>sparseR</strong> is via a single call to the <code>sparseR()</code> function, demonstrated here with Fisher’s <code>iris</code> data set. Setting <code>k = 1</code> asks for order-1 (pairwise) interactions among the predictors. Because 10-fold cross-validation is used by default, we set <code>seed = 1</code> for reproducibility.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span>
<span id="cb2-2">srl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sparseR</span>(Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb2-3">srl</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Model summary @ min CV:
-----------------------------------------------------
  lasso-penalized linear regression with n=150, p=21
  (At lambda=0.0019):
    Nonzero coefficients: 8
    Cross-validation error (deviance): 0.07
    R-squared: 0.63
    Signal-to-noise ratio: 1.71
    Scale estimate (sigma): 0.264

  SR information:
             Vartype Total Selected Saturation Penalty
         Main effect     6        3      0.500    2.45
 Order 1 interaction    12        3      0.250    3.46
  Order 2 polynomial     3        2      0.667    3.00


Model summary @ CV1se:
-----------------------------------------------------
  lasso-penalized linear regression with n=150, p=21
  (At lambda=0.0070):
    Nonzero coefficients: 6
    Cross-validation error (deviance): 0.08
    R-squared: 0.58
    Signal-to-noise ratio: 1.39
    Scale estimate (sigma): 0.281

  SR information:
             Vartype Total Selected Saturation Penalty
         Main effect     6        2      0.333    2.45
 Order 1 interaction    12        2      0.167    3.46
  Order 2 polynomial     3        2      0.667    3.00</code></pre>
</div>
</div>
<p>The <code>summary</code> function produces additional details:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(srl, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">at =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cv1se"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>lasso-penalized linear regression with n=150, p=21
At lambda=0.0070:
-------------------------------------------------
  Nonzero coefficients         :   6
  Expected nonzero coefficients:   0.44
  Average mfdr (6 features)    :   0.074

                               Estimate      z       mfdr Selected
Species_setosa                  0.82889 18.596    &lt; 1e-04        *
Sepal.Length_poly_1             0.19494  9.638    &lt; 1e-04        *
Petal.Width_poly_2              0.10142  4.698 0.00016138        *
Petal.Width:Species_versicolor  0.29190  3.335 0.02568952        *
Sepal.Length:Species_setosa     0.06826  2.769 0.14613161        *
Sepal.Length_poly_2            -0.03215 -2.694 0.27005358        *</code></pre>
</div>
</div>
<p>The printed <code>srl</code> object displays two models by default, corresponding to two “smart” choices for the penalization parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda">. The first model printed refers to the model where <img src="https://latex.codecogs.com/png.latex?%5Clambda"> is set to minimize the cross-validated error, while the second refers to a model where <img src="https://latex.codecogs.com/png.latex?%5Clambda"> is set so that the model is as sparse as possible while still being within 1 standard error of the minimum cross-validated error. Plots are also available via sparseR to help visualize both the solution path and the resulting model (interactions can be very challenging to interpret without a good figure!).</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(srl)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/selecting-interactions/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/selecting-interactions/index_files/figure-html/unnamed-chunk-5-2.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">effect_plot</span>(srl, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Width"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Species"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">at =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cvmin"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/selecting-interactions/index_files/figure-html/unnamed-chunk-5-3.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">effect_plot</span>(srl, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Width"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Species"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">at =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cv1se"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/selecting-interactions/index_files/figure-html/unnamed-chunk-5-4.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
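<p>To make the two choices of the penalization parameter (<code>cvmin</code> and <code>cv1se</code>) used in the figures above more concrete, here is a minimal base R sketch of the general recipe. The numbers are made up purely for illustration, and the names <code>lambda</code>, <code>cve</code>, and <code>cvse</code> are my own; this is not sparseR’s internal code, only the logic behind a minimum-error choice versus a one-standard-error choice.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical cross-validation results over a decreasing grid of penalties
# (illustrative numbers only, not taken from the iris fit above)
lambda &lt;- c(0.100, 0.050, 0.020, 0.010, 0.005, 0.002)
cve    &lt;- c(0.120, 0.095, 0.081, 0.078, 0.076, 0.077)  # mean CV error
cvse   &lt;- c(0.010, 0.008, 0.007, 0.006, 0.006, 0.006)  # standard error of the CV error

# "cvmin"-style choice: the penalty with the smallest CV error
i_min      &lt;- which.min(cve)
lambda_min &lt;- lambda[i_min]

# "cv1se"-style choice: the largest (sparsest) penalty whose CV error is
# still within one standard error of the minimum
threshold  &lt;- cve[i_min] + cvse[i_min]
lambda_1se &lt;- max(lambda[cve &lt;= threshold])

c(lambda_min = lambda_min, lambda_1se = lambda_1se)</code></pre></div>
</div>
<p>The one-standard-error choice is never smaller than the minimum-error choice, which is why the second model printed above has fewer nonzero coefficients than the first.</p>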
<p>Note that while ranked sparsity principles were motivated by the estimation of the lasso (Peterson &amp; Cavanaugh 2022), they can also be implemented with MCP, SCAD, or elastic net and for binary, normal, and survival data. Finally, sparseR includes some functionality to perform forward-stepwise selection using a sparsity-ranked modification of BIC, as well as post-selection inferential techniques using sample splitting and bootstrapping.</p>
</section>
<section id="useful-package-2-hierarchy-preserving-regularization-via-glinternet" class="level2">
<h2 class="anchored" data-anchor-id="useful-package-2-hierarchy-preserving-regularization-via-glinternet">Useful package #2: hierarchy-preserving regularization via <strong>glinternet</strong></h2>
<p>Some argue that when it comes to interactions, hierarchy is very important (i.e., an interaction shouldn’t be included in a model without its constituent main effects). While ranked sparsity methods do <em>prefer</em> hierarchical models, they can often still produce non-hierarchical ones. The <strong>glinternet</strong> package and its function of the same name use regularization for model selection under a hierarchy constraint, such that all candidate models are hierarchical. <strong>Glinternet</strong> can handle both continuous and categorical predictors, but requires a numeric model matrix to be specified in advance: categorical predictors are coded as integers starting at 0, and the <code>numLevels</code> argument gives the number of levels for each column (1 for continuous predictors). It can be performed as follows:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("glinternet")</span></span>
<span id="cb9-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glinternet)</span>
<span id="cb9-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb9-4"></span>
<span id="cb9-5">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> iris <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb9-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>Sepal.Width) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb9-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Species =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(Species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb9-8"></span>
<span id="cb9-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">321</span>)</span>
<span id="cb9-10">cv_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glinternet.cv</span>(X, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Y =</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Width, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">numLevels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span></code></pre></div></div>
</div>
<p>The <code>cv_fit</code> object contains the necessary information from the cross-validation procedure, along with the fits themselves, stored in a series of lists. A more in-depth tutorial on extracting coefficients (and facilitating model interpretation) using the <strong>glinternet</strong> package can be found at https://strakaps.github.io/post/glinternet/. Importantly, both <strong>glinternet</strong> and <strong>sparseR</strong> have associated <code>predict</code> methods, which can yield predictions on new (or the training) data, as shown below. For comparison, we also fit a “main effects only” model with <strong>sparseR</strong> by setting <code>k = 0</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">me <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sparseR</span>(Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">333</span>)</span>
<span id="cb10-2">p_me <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(me)</span>
<span id="cb10-3">p_srl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(srl)</span>
<span id="cb10-4">p_gln <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.vector</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(cv_fit, X))</span></code></pre></div></div>
</div>
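<p>For readers who want a quick look at which terms <strong>glinternet</strong> selected, the sketch below follows the approach in the tutorial linked above. It rebuilds the inputs with base R so it can be run on its own. I am assuming that the fitted object exposes <code>lambda</code>, <code>lambdaHat1Std</code>, and <code>glinternetFit</code> as described in that tutorial; the name <code>i_1std</code> is my own.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(glinternet)

# Rebuild the inputs used above with base R (same column order as X)
X &lt;- iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")]
X$Species &lt;- as.numeric(iris$Species) - 1  # categorical levels coded 0, 1, 2

set.seed(321)
cv_fit &lt;- glinternet.cv(X, Y = iris$Sepal.Width, numLevels = c(1, 1, 1, 3))

# Position of the penalty chosen by the one-standard-error rule
# (assumes the element names used in the linked tutorial)
i_1std &lt;- which(cv_fit$lambdaHat1Std == cv_fit$lambda)

# Coefficients at that penalty, grouped into main effects and interactions
coefs &lt;- coef(cv_fit$glinternetFit)[[i_1std]]
coefs$mainEffects   # indices of the selected categorical / continuous main effects
coefs$interactions  # index pairs of the selected interactions</code></pre></div>
</div>
<p>The indices refer to positions within the categorical and continuous columns of <code>X</code> separately, so some bookkeeping is needed to map them back to variable names; the tutorial walks through that step.</p>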
<p>With a little help from the <strong>yardstick</strong> package’s <code>metrics()</code> function, we can compare the accuracy of each model’s predictions (computed here on the training data) using root-mean-squared error (RMSE), R-squared (RSQ), and mean absolute error (MAE); see table below. Evidently, <strong>glinternet</strong> and SRL are similar in terms of their predictive performance. However, both outperform the main effects model considerably, suggesting that interactions among the predictors carry signal worth capturing when predicting <code>Sepal.Width</code>.</p>
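<p>As a reminder of what these three metrics measure, here is a minimal base R sketch. It uses an ordinary <code>lm()</code> fit as a hypothetical stand-in for the models above (it is not one of the models being compared), and it computes R-squared as the squared correlation between observed and predicted values, which is my understanding of how <strong>yardstick</strong> defines <code>rsq</code>.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in model: plain least squares with main effects only (illustration only)
fit      &lt;- lm(Sepal.Width ~ ., data = iris)
truth    &lt;- iris$Sepal.Width
estimate &lt;- fitted(fit)

# Root-mean-squared error: typical size of a prediction error, penalizing large misses
rmse &lt;- sqrt(mean((truth - estimate)^2))

# Mean absolute error: average size of a prediction error
mae &lt;- mean(abs(truth - estimate))

# R-squared as the squared correlation between observed and predicted values
rsq &lt;- cor(truth, estimate)^2

round(c(rmse = rmse, rsq = rsq, mae = mae), 3)</code></pre></div>
</div>
<p>Lower RMSE and MAE and higher R-squared indicate better agreement between predictions and observed values.</p>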
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">gln_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(p_gln, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Width) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-2">  yardstick<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">metrics</span>(y, p_gln) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"glinternet"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> .estimate) </span>
<span id="cb11-4">srl_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(p_srl, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Width) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-5">  yardstick<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">metrics</span>(y, p_srl) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SRL"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> .estimate) </span>
<span id="cb11-7">me_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(p_me, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Width) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-8">  yardstick<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">metrics</span>(y, p_me) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Main effects only"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> .estimate) </span>
<span id="cb11-10"></span>
<span id="cb11-11">results_table <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> gln_res <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_cols</span>(srl_res[,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_cols</span>(me_res[,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Metric"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> .metric) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Metric =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">toupper</span>(Metric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>.estimator)</span></code></pre></div></div>
</div>
<div class="cell">
<div class="cell-output-display">
<table class="table table-striped caption-top table-sm small">
<thead>
<tr class="header">
<th style="text-align: left;" data-quarto-table-cell-role="th">Metric</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">glinternet</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">SRL</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">Main effects only</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">RMSE</td>
<td style="text-align: right;">0.24</td>
<td style="text-align: right;">0.25</td>
<td style="text-align: right;">0.25</td>
</tr>
<tr class="even">
<td style="text-align: left;">RSQ</td>
<td style="text-align: right;">0.69</td>
<td style="text-align: right;">0.68</td>
<td style="text-align: right;">0.66</td>
</tr>
<tr class="odd">
<td style="text-align: left;">MAE</td>
<td style="text-align: right;">0.19</td>
<td style="text-align: right;">0.19</td>
<td style="text-align: right;">0.19</td>
</tr>
</tbody>
</table>
</div>
</div>
</section>
<section id="other-packages-worth-mentioning-ncvreg-hiernet-visreg-sjplot" class="level2">
<h2 class="anchored" data-anchor-id="other-packages-worth-mentioning-ncvreg-hiernet-visreg-sjplot">Other packages worth mentioning: ncvreg, hierNet, visreg, sjPlot</h2>
<p>The SRL and other sparsity-ranked regularization methods implemented in <strong>sparseR</strong> would not be possible without the <strong>ncvreg</strong> package, which does the heavy lifting of model fitting, optimization, and cross-validation. The <strong>hierNet</strong> package offers another hierarchy-enforcing procedure that may yield better models than <strong>glinternet</strong>; however, <strong>glinternet</strong> is more computationally efficient, especially with a medium-to-large number of covariates. Finally, when models include interactions or polynomials, figures are truly worth a thousand words, and packages such as <strong>visreg</strong> and <strong>sjPlot</strong> have great functionality for plotting the effects of interactions.</p>
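<p>As a minimal sketch of plotting an interaction (not part of the original analysis), the code below fits an ordinary linear model with a <code>Sepal.Length</code> by <code>Species</code> interaction to the same <code>iris</code> outcome used above, then plots that interaction with <strong>visreg</strong> and with <strong>sjPlot</strong>. The model formula and the object name <code>fit_int</code> are assumptions chosen for illustration, and both packages must be installed.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Illustrative model (an assumption, not the models fitted in this post):
# Sepal.Width explained by Sepal.Length, Species, and their interaction
fit_int &lt;- lm(Sepal.Width ~ Sepal.Length * Species, data = iris)

# visreg: fitted Sepal.Length effect, one line per Species on shared axes
visreg::visreg(fit_int, "Sepal.Length", by = "Species", overlay = TRUE)

# sjPlot: predicted values for the interaction term in the model
sjPlot::plot_model(fit_int, type = "int")</code></pre></div>
</div>
<p>Overlaying the species on one panel makes differences in slope easy to compare; dropping <code>overlay = TRUE</code> gives one panel per species instead.</p>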
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li>Bien J and Tibshirani R (2020). hierNet: A Lasso for Hierarchical Interactions. R package version 1.9. https://CRAN.R-project.org/package=hierNet</li>
<li>Breheny P and Burchett W (2017). Visualization of Regression Models Using visreg. The R Journal, 9: 56-71.</li>
<li>Breheny P and Huang J (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.</li>
<li>Kuhn M and Vaughan D (2021). yardstick: Tidy Characterizations of Model Performance. R package version 0.0.8. https://CRAN.R-project.org/package=yardstick</li>
<li>Lim M and Hastie T (2020). glinternet: Learning Interactions via Hierarchical Group-Lasso Regularization. R package version 1.0.11. https://CRAN.R-project.org/package=glinternet</li>
<li>Lüdecke D (2021). sjPlot: Data Visualization for Statistics in Social Science. R package version 2.8.8. https://CRAN.R-project.org/package=sjPlot</li>
<li>Peterson R (2021). sparseR: Variable selection under ranked sparsity principles for interactions and polynomials. https://github.com/petersonR/sparseR/.</li>
<li>Peterson R and Cavanaugh J (2022). Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials. AStA Adv. Stat. Anal., 106: 427-454. https://doi.org/10.1007/s10182-021-00431-7</li>
</ul>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>This post was originally published in the <a href="https://www.biometricsociety.org/publications/biometric-bulletin">Biometric Bulletin (2021) Volume 38 Issue 3</a>.</p>
</div>
</div>
<hr>
</section>
<section id="appendix" class="level1">
<h1>Appendix</h1>
<details>
<summary>
R Session Info
</summary>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">sessioninfo<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">session_info</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13)
 os       macOS Sequoia 15.7.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2025-11-20
 pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.7.32 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version    date (UTC) lib source
 class          7.3-23     2025-01-01 [2] CRAN (R 4.5.1)
 cli            3.6.5      2025-04-23 [1] CRAN (R 4.5.0)
 codetools      0.2-20     2024-03-31 [2] CRAN (R 4.5.1)
 data.table     1.17.8     2025-07-10 [1] CRAN (R 4.5.0)
 digest         0.6.37     2024-08-19 [1] CRAN (R 4.5.0)
 dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.5.0)
 evaluate       1.0.5      2025-08-27 [1] CRAN (R 4.5.0)
 farver         2.1.2      2024-05-13 [1] CRAN (R 4.5.0)
 fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.5.0)
 future         1.67.0     2025-07-29 [1] CRAN (R 4.5.0)
 future.apply   1.20.0     2025-06-06 [1] CRAN (R 4.5.0)
 generics       0.1.4      2025-05-09 [1] CRAN (R 4.5.0)
 glinternet   * 1.0.12     2021-09-03 [1] CRAN (R 4.5.0)
 globals        0.18.0     2025-05-08 [1] CRAN (R 4.5.0)
 glue           1.8.0      2024-09-30 [1] CRAN (R 4.5.0)
 gower          1.0.2      2024-12-17 [1] CRAN (R 4.5.0)
 hardhat        1.4.2      2025-08-20 [1] CRAN (R 4.5.0)
 htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.5.0)
 ipred          0.9-15     2024-07-18 [1] CRAN (R 4.5.0)
 jsonlite       2.0.0      2025-03-27 [1] CRAN (R 4.5.0)
 kableExtra   * 1.4.0      2024-01-24 [1] CRAN (R 4.5.0)
 knitr          1.50       2025-03-16 [1] CRAN (R 4.5.0)
 lattice        0.22-7     2025-04-02 [2] CRAN (R 4.5.1)
 lava           1.8.2      2025-10-30 [1] CRAN (R 4.5.0)
 lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.5.0)
 listenv        0.10.0     2025-11-02 [1] CRAN (R 4.5.0)
 lubridate      1.9.4      2024-12-08 [1] CRAN (R 4.5.0)
 magrittr       2.0.4      2025-09-12 [1] CRAN (R 4.5.0)
 MASS           7.3-65     2025-02-28 [2] CRAN (R 4.5.1)
 Matrix         1.7-3      2025-03-11 [2] CRAN (R 4.5.1)
 ncvreg         3.16.0     2025-10-09 [1] Github (pbreheny/ncvreg@5fecc8c)
 nnet           7.3-20     2025-01-01 [1] CRAN (R 4.5.0)
 parallelly     1.45.1     2025-07-24 [1] CRAN (R 4.5.0)
 pillar         1.11.1     2025-09-17 [1] CRAN (R 4.5.0)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.5.0)
 prodlim        2025.04.28 2025-04-28 [1] CRAN (R 4.5.0)
 purrr          1.2.0      2025-11-04 [1] CRAN (R 4.5.0)
 R6             2.6.1      2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer   1.1-3      2022-04-03 [1] CRAN (R 4.5.0)
 Rcpp           1.1.0      2025-07-02 [1] CRAN (R 4.5.0)
 recipes        1.3.1      2025-05-21 [1] CRAN (R 4.5.0)
 rlang          1.1.6      2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown      2.30       2025-09-28 [1] CRAN (R 4.5.0)
 rpart          4.1.24     2025-01-07 [2] CRAN (R 4.5.1)
 rstudioapi     0.17.1     2024-10-22 [1] CRAN (R 4.5.0)
 scales         1.4.0      2025-04-24 [1] CRAN (R 4.5.0)
 sessioninfo    1.2.3      2025-02-05 [1] CRAN (R 4.5.0)
 sparseR      * 0.3.2      2025-04-14 [1] CRAN (R 4.5.0)
 sparsevctrs    0.3.4      2025-05-25 [1] CRAN (R 4.5.0)
 stringi        1.8.7      2025-03-27 [1] CRAN (R 4.5.0)
 stringr        1.6.0      2025-11-04 [1] CRAN (R 4.5.0)
 survival       3.8-3      2024-12-17 [2] CRAN (R 4.5.1)
 svglite        2.2.1      2025-05-12 [1] CRAN (R 4.5.0)
 systemfonts    1.3.1      2025-10-01 [1] CRAN (R 4.5.0)
 textshaping    1.0.4      2025-10-10 [1] CRAN (R 4.5.0)
 tibble         3.3.0      2025-06-08 [1] CRAN (R 4.5.0)
 tidyr          1.3.1      2024-01-24 [1] CRAN (R 4.5.0)
 tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.5.0)
 timechange     0.3.0      2024-01-18 [1] CRAN (R 4.5.0)
 timeDate       4051.111   2025-10-17 [1] CRAN (R 4.5.0)
 vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.5.0)
 viridisLite    0.4.2      2023-05-02 [1] CRAN (R 4.5.0)
 withr          3.0.2      2024-10-28 [1] CRAN (R 4.5.0)
 xfun           0.54       2025-10-30 [1] CRAN (R 4.5.0)
 xml2           1.4.1      2025-10-27 [1] CRAN (R 4.5.0)
 yaml           2.3.10     2024-07-26 [1] CRAN (R 4.5.0)
 yardstick      1.3.2      2025-01-22 [1] CRAN (R 4.5.0)

 [1] /Users/rpterson/Library/R/arm64/4.5/library
 [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────</code></pre>
</div>
</div>
</details>


</section>

 ]]></description>
  <category>statistical computation</category>
  <category>analysis</category>
  <category>interpretability</category>
  <category>model selection</category>
  <category>R</category>
  <guid>https://www.data-diction.com/posts/selecting-interactions/</guid>
  <pubDate>Tue, 20 Jun 2023 00:00:00 GMT</pubDate>
  <media:content url="https://www.data-diction.com/posts/selecting-interactions/thumbnail.png" medium="image" type="image/png" height="103" width="144"/>
</item>
<item>
  <title>Welcome to Data Diction</title>
  <dc:creator>Ryan Peterson</dc:creator>
  <link>https://www.data-diction.com/posts/welcome/</link>
  <description><![CDATA[ 





<div class="callout callout-style-default callout-note callout-empty-content callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Reviewed by Logan Harris on 2025-11-12
</div>
</div>
<div class="callout-body-container callout-body">

</div>
</div>
<hr>
<p><img src="https://www.data-diction.com/posts/welcome/logo2.png" class="img-fluid"></p>
<section id="data-diction" class="level1">
<h1>Data Diction</h1>
<ul>
<li><p><strong>Data</strong>: things known or assumed as facts, making the basis of reasoning or calculation</p></li>
<li><p><strong>Diction</strong>: 1) the choice and use of words and phrases in speech or writing. 2) the choice of words especially with regard to correctness, clearness, or effectiveness.</p></li>
</ul>
<p>Besides the play on “Data Addiction”, the name Data Diction also plays on “data dictionary”, a common term that statistical practitioners should find familiar.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Data Diction started in 2022 as a blog with infrequent posts. The Glassbox Modeling Working Group (see below) now aims to increase post frequency and broaden participation, with multiple authors and reviewers collaborating to share quick tutorials, opinion pieces, and similar content.</p>
</div>
</div>
</section>
<section id="logo" class="level1">
<h1>Logo</h1>
<p>Ryan Peterson created the logo using ozone data from Denver that he compiled for the <a href="../../posts/did-denver-zero-fare-policy-work/index.html">post about Denver’s 2021 “Zero Fare for Cleaner Air”</a>.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"></span>
<span id="cb1-3">df_ozone <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(here<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts/did-denver-zero-fare-policy-work/ozone_data-91-23.csv"</span>))</span>
<span id="cb1-4"></span>
<span id="cb1-5">df_ozone<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>daily_avg <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> imputeTS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">na_kalman</span>(df_ozone<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>daily_avg)</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(df_ozone, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>date_local, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y=</span>daily_avg)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb1-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grey"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">85</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb1-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_smooth</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue4"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">level =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">999999</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb1-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ylab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Daily average Ozone in Denver"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb1-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xlab</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.data-diction.com/posts/welcome/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="glassbox-modeling-working-group" class="level1">
<h1>Glassbox Modeling Working Group</h1>
<section id="overviewpurpose" class="level2">
<h2 class="anchored" data-anchor-id="overviewpurpose">Overview/Purpose</h2>
<p>This working group is a collaboration aimed at improving statistical practice in model selection, model transparency, and post-selection inference.</p>
<p><strong>Mission</strong>: Advance the science and practice of transparent, interpretable, and reproducible modeling through collaborative research, education, and dissemination.</p>
<p><strong>Vision</strong>: Establish a cross-institutional hub that develops novel glass box methods and disseminates best practices for them in accessible formats such as software, tutorials, papers, and concise blog posts.</p>
</section>
<section id="values" class="level2">
<h2 class="anchored" data-anchor-id="values">Values</h2>
<ol type="1">
<li>Team science: We approach each issue from multiple perspectives, including
<ul>
<li><strong>Applied statisticians</strong> wishing to follow best practices</li>
<li><strong>Domain experts</strong> seeking to understand glass-box approaches and issues with bad statistical practices</li>
<li><strong>Students</strong> aspiring to understand and apply sound statistical reasoning</li>
<li><strong>AI systems</strong> (e.g.&nbsp;LLMs like ChatGPT) ingesting our human-authored material</li>
</ul></li>
<li>Reproducibility: We ensure all analyses can be independently verified and replicated</li>
<li>Transparency: We work to produce interpretable methods with explicit assumptions</li>
<li>Humility: We recognize the limits of our current knowledge and remain open to revision and critique</li>
<li>Human-first
<ul>
<li>We pledge to use AI only as a supporting writing tool</li>
<li>We encourage dialogue through comment sections</li>
</ul></li>
<li>Accessibility
<ul>
<li>We release content in multiple formats to reach diverse audiences</li>
</ul></li>
<li>Occam’s Razor: We will produce material that is as simple as possible, but no simpler</li>
</ol>


</section>
</section>

 ]]></description>
  <category>glass-box modeling</category>
  <category>model selection</category>
  <guid>https://www.data-diction.com/posts/welcome/</guid>
  <pubDate>Wed, 01 Jun 2022 00:00:00 GMT</pubDate>
  <media:content url="https://www.data-diction.com/posts/welcome/logo2.png" medium="image" type="image/png" height="149" width="144"/>
</item>
</channel>
</rss>
