Medical Research Outline

 

COURSE PHILOSOPHY

Most medical research is shallow.
Not because clinicians are unintelligent —
but because the system rewards speed over rigor and volume over depth.

Overview of current pitfalls

  • weak assumptions

  • soft models

  • p-hacked results

  • copy-paste methods

  • superficial inference

  • research as résumé filler

Instead, this course helps me to:

Build intellectual leverage through rigor, design, and truth-seeking.


CORE IDENTITY PRINCIPLE

Move from being a consumer of research

to a producer of methods and models.

You are not here to:

  • run someone else’s code

  • echo literature conclusions

  • decorate CVs with low-impact papers

You are here to: build, test, model, verify, challenge.


MODULE 1

STATISTICAL RED FLAGS

Objective

Ensure rigorous statistics, and check every study you read against the red flags below.

Content

Red flags to master:

  • Underspecified models

  • No causal framework

  • Unjustified adjustments

  • P-hacking behaviors

  • Selective outcome reporting

  • Forking-path analyses

  • Overconfident conclusions from weak data

  • Regression worship without clinical sense



Underspecified models

Meaning:
The authors ran a model without including important variables or structure — they used a “thin” model for a complex problem.

Example:
A paper models:

steroid use → reduced mortality

But does NOT include:

  • disease severity

  • oxygen requirement

  • ICU admission status

  • comorbidities

They adjusted for:

  • age

  • sex
    …and stopped.

That’s underspecified.
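A minimal simulation of this failure, using numpy and statsmodels on invented data (every number here is hypothetical): severity drives both steroid use and death, and the thin model makes a truly protective drug look harmful.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20_000
severity = rng.normal(size=n)                # the omitted variable
age = rng.normal(65, 10, size=n)
sex = rng.integers(0, 2, size=n)

# Sicker patients are more likely to receive steroids...
steroid = rng.binomial(1, 1 / (1 + np.exp(-1.5 * severity)))

# ...and more likely to die. The true steroid effect is protective (-0.5).
logit_death = -2 + 1.2 * severity - 0.5 * steroid + 0.01 * (age - 65)
death = rng.binomial(1, 1 / (1 + np.exp(-logit_death)))

def steroid_coef(*covariates):
    # Log-odds coefficient on steroid under a given adjustment set.
    X = sm.add_constant(np.column_stack([steroid, *covariates]))
    return sm.Logit(death, X).fit(disp=0).params[1]

print("thin model (age, sex):", steroid_coef(age, sex))            # biased upward
print("severity included:    ", steroid_coef(age, sex, severity))  # near -0.5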


No causal framework

Meaning:
There’s no articulated explanation for how X is supposed to cause Y. They’re just calculating associations.

Example:

“High CRP predicts mortality. Therefore CRP should be targeted.”

But:

  • Is CRP a marker?

  • A mediator?

  • A confounder?

  • A consequence?

No causal thinking.

Translation:
They found correlation and wrote destiny.


Unjustified adjustments

Meaning:
They adjusted for variables that should not be adjusted for, which actually biases results.

Example:
Studying:

effect of ventilation on survival

They adjust for:

oxygen saturation after intubation

That’s a mediator — not a confounder.

Result = distorted estimate.

Translation:
They “controlled” for the very thing they were trying to measure.



The Three Core Variable Types

1. Confounder ✅ Adjust for this

Definition:

A confounder is a variable that:

  • affects the exposure

  • affects the outcome

  • is NOT caused by the exposure

It creates a fake relationship if you ignore it.


Example:

Studying:

Steroids → Mortality in ICU

Confounder:

Disease severity

Why?

  • Sicker patients are more likely to get steroids

  • Sicker patients are more likely to die

So if you don’t adjust:

It looks like steroids cause death
because disease severity is mixed into the estimate.


Rule:

✅ Adjust for confounders.


How to recognize one:

Ask:

“Does this cause both my exposure and my outcome?”

If yes → likely a confounder.
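A minimal sketch of that rule, on invented numbers where steroids truly do nothing: the crude comparison looks harmful, and stratifying on the confounder recovers the null.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
severe = rng.binomial(1, 0.4, n)                       # the confounder
steroid = rng.binomial(1, np.where(severe, 0.8, 0.2))  # sicker -> more steroids
death = rng.binomial(1, np.where(severe, 0.40, 0.05))  # steroids do nothing

crude = death[steroid == 1].mean() - death[steroid == 0].mean()
adjusted = np.mean([
    death[(steroid == 1) & (severe == s)].mean()
    - death[(steroid == 0) & (severe == s)].mean()
    for s in (0, 1)
])
print(f"crude risk difference:  {crude:+.3f}")     # looks harmful
print(f"stratified on severity: {adjusted:+.3f}")  # ~ 0, the truth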


2. Mediator ❌ Do NOT adjust

Definition:

A mediator is:

  • caused by the exposure

  • in the causal path to the outcome

It explains how the exposure works.


Example:

Studying:

Ventilator strategy → Survival

Mediator:

Oxygenation after ventilation

Ventilator → oxygenation → survival

If you adjust for oxygenation…

You erase the very effect you’re trying to measure.


Rule:

❌ Do not adjust for mediators if you want the total effect.


How to recognize one:

Ask:

“Is this a result of the exposure?”

If yes → it’s a mediator.
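A minimal simulation of the mistake, on hypothetical data where the entire survival benefit flows through oxygenation: adjust for the mediator and the effect disappears.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 50_000
strategy = rng.binomial(1, 0.5, n)           # exposure
oxygenation = strategy + rng.normal(size=n)  # mediator, caused by exposure
survive = rng.binomial(1, 1 / (1 + np.exp(-0.8 * oxygenation)))

total = sm.Logit(survive, sm.add_constant(strategy)).fit(disp=0)
blocked = sm.Logit(survive, sm.add_constant(
    np.column_stack([strategy, oxygenation]))).fit(disp=0)

print("total effect of strategy:", total.params[1])    # clearly positive
print("after adjusting mediator:", blocked.params[1])  # ~ 0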


3. Collider ❌❌ Never adjust

Definition:

A collider is:

  • caused by both exposure and outcome

Adjusting for it creates bias.


Example:

Studying:

Smoking → Lung cancer

Collider:

Hospital admission

Smoking increases admissions
Lung cancer increases admissions

Conditioning on hospital admission introduces fake inverse relationships.
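A minimal simulation, with hypothetical numbers and one deliberate change: smoking and the disease are made independent here, so any association that appears among admitted patients is pure collider bias.

import numpy as np

rng = np.random.default_rng(3)
n = 200_000
smoking = rng.binomial(1, 0.3, n)
cancer = rng.binomial(1, 0.1, n)  # independent of smoking by construction
admitted = rng.binomial(1, 0.05 + 0.3 * smoking + 0.5 * cancer)

def cancer_gap(mask):
    # Difference in cancer prevalence, smokers minus non-smokers.
    return (cancer[mask & (smoking == 1)].mean()
            - cancer[mask & (smoking == 0)].mean())

everyone = np.ones(n, dtype=bool)
print(f"whole population: {cancer_gap(everyone):+.3f}")       # ~ 0
print(f"admitted only:    {cancer_gap(admitted == 1):+.3f}")  # negative, fake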




Forking-path analyses

Meaning:
They made many analytic choices without accounting for uncertainty created by those choices.

Example:

  • Changed inclusion age cutoffs

  • Tried different covariate sets

  • Used different model forms
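A minimal simulation of the cost, on invented null data: the exposure does nothing, but trying three inclusion cutoffs and keeping only the best-looking p-value pushes the false-positive rate past the nominal 5%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def one_study():
    n = 200
    age = rng.uniform(18, 90, n)
    exposed = rng.binomial(1, 0.5, n)
    outcome = rng.normal(size=n)  # truly unrelated to exposure
    # Forking path: three defensible-looking inclusion cutoffs.
    pvals = [stats.ttest_ind(outcome[(age >= c) & (exposed == 1)],
                             outcome[(age >= c) & (exposed == 0)]).pvalue
             for c in (18, 40, 65)]
    return min(pvals)

rate = np.mean([one_study() < 0.05 for _ in range(2000)])
print(f"false-positive rate after forking: {rate:.1%}")  # > 5%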


Assignment

Take 5 papers relevant to your interest area and classify:

  • What model is used?

  • What causal assumption is implied?

  • What bias is unaddressed?

  • What claim is unjustified?



MODULE 2

FROM “DATA” TO CAUSE

Objective

Move from association to explanation.

Content

Topics:

  • Directed acyclic graphs (DAGs)

  • Confounding vs mediation

  • Collider bias

  • Identification strategies

  • Exchangeability

  • Positivity

  • Target trial emulation
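One way to make a DAG explicit in code, sketched with networkx and hypothetical variable names; the confounder check below is a crude heuristic (direct causes of the exposure that also reach the outcome), not the full backdoor criterion.

import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("severity", "steroids"),      # confounder -> exposure
    ("severity", "survival"),      # confounder -> outcome
    ("steroids", "inflammation"),  # exposure -> mediator
    ("inflammation", "survival"),  # mediator -> outcome
])
assert nx.is_directed_acyclic_graph(g)

# Crude screen: parents of the exposure that also reach the outcome.
confounders = [v for v in g
               if g.has_edge(v, "steroids") and nx.has_path(g, v, "survival")]
print("candidate adjustment set:", confounders)  # ['severity']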

Assignment

Build DAGs for:

  • Vasopressors → mortality

  • Ventilator strategy → lung injury

  • Diuretics → renal outcomes

  • Steroids → survival

Label:

  • Confounders

  • Mediators

  • Colliders

  • Unmeasured bias


MODULE 3

ASSOCIATION MODELS ≠ ANSWERS

Objective

Use standard models honestly, not lazily.

Content

Tools:

  • Logistic regression

  • Cox models

  • KM curves

  • Propensity scores

  • Splines

But learn:

  • When each model lies

  • When it misleads

  • When hazard ratios distort reality

  • When matching worsens bias

  • When proportional hazards fails
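One way to probe the last point, sketched with the lifelines package and its bundled Rossi recidivism dataset: fit a Cox model, then test the proportional-hazards assumption before quoting any hazard ratio.

from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()

# Scaled Schoenfeld residual tests; a violation means the hazard ratio
# is not one constant number over follow-up.
cph.check_assumptions(df, p_value_threshold=0.05)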

Assignment

Re-analyze one “classic” study and show:

  • What assumptions were violated

  • How conclusions change under different models


MODULE 4

CAUSAL INFERENCE THAT ACTUALLY WORKS

Objective

Use methods clinicians never learn — and therefore misuse.

Content

Methods:

  • IPW / IPTW

  • Marginal structural models

  • G-methods

  • Instrumental variables

  • Sensitivity analyses

  • Negative controls
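A minimal IPTW sketch on invented data, with scikit-learn for the propensity model; the true treatment effect is zero by construction, so the weighted estimate should land near it.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 20_000
severity = rng.normal(size=n)
treated = rng.binomial(1, 1 / (1 + np.exp(-severity)))   # confounded assignment
death = rng.binomial(1, 1 / (1 + np.exp(1 - severity)))  # no treatment effect

# Propensity scores, then inverse-probability-of-treatment weights.
ps = LogisticRegression().fit(severity[:, None], treated).predict_proba(
    severity[:, None])[:, 1]
w = np.where(treated == 1, 1 / ps, 1 / (1 - ps))

naive = death[treated == 1].mean() - death[treated == 0].mean()
iptw = (np.average(death[treated == 1], weights=w[treated == 1])
        - np.average(death[treated == 0], weights=w[treated == 0]))
print(f"naive difference: {naive:+.3f}")  # biased by severity
print(f"IPTW estimate:    {iptw:+.3f}")   # ~ 0, the truth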

Assignment

Pick one clinical question and:

  • Emulate a target trial

  • Define exposure, outcome, eligibility

  • Identify time-zero clearly

  • Justify design choices


MODULE 5

BAYESIAN THINKING FOR CLINICIANS

Objective

Think in probability, not superstition.

Content

Topics:

  • Bayesian interpretation

  • Prior formation

  • Posterior updating

  • Credible intervals vs confidence intervals

  • Hierarchical models

  • Bayesian decision theory
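A minimal conjugate example of posterior updating, with invented numbers: a Beta prior on a response rate, a binomial likelihood, and a credible interval read directly off the posterior.

from scipy import stats

prior = stats.beta(a=2, b=8)   # prior belief: response rate around 20%
successes, trials = 14, 40     # hypothetical new data
posterior = stats.beta(a=2 + successes, b=8 + trials - successes)

lo, hi = posterior.ppf([0.025, 0.975])
print(f"posterior mean: {posterior.mean():.2f}")
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
print(f"P(rate > 0.25): {posterior.sf(0.25):.2f}")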

Assignment

Rewrite a frequentist paper as:

  • a prior

  • a likelihood

  • a posterior belief statement


MODULE 6

INTERPRETABLE MACHINE LEARNING

Objective

Do ML that’s defensible, not embarrassing.

Content

Tools:

  • SHAP

  • calibration curves

  • decision curves

  • feature dependence plots

  • nested cross-validation

Concepts:

  • Overfitting

  • Data leakage

  • Bias amplification

  • Model stability
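A minimal sketch of the calibration check on synthetic data, with scikit-learn: a respectable AUC says nothing about whether a predicted 30% risk actually means 30%.

from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, p), 3))

# Per-bin comparison of predicted risk vs observed event rate.
frac_pos, mean_pred = calibration_curve(y_te, p, n_bins=10)
for pred, obs in zip(mean_pred, frac_pos):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")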

Assignment

Train a model and:

  • explain it

  • validate it

  • test calibration

  • demonstrate generalization


MODULE 7

SIMULATION & BOOTSTRAP THINKING

Objective

Stop relying on asymptotics and wishful thinking.

Content

  • Monte Carlo simulation

  • Bootstrap confidence intervals

  • Power simulation

  • Resampling logic

  • Uncertainty propagation
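A minimal bootstrap sketch on an invented skewed sample: resample the data itself instead of trusting an asymptotic formula for the standard error.

import numpy as np

rng = np.random.default_rng(6)
data = rng.lognormal(mean=0, sigma=1, size=80)  # small, skewed sample

# Percentile bootstrap for the mean.
boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(10_000)])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean: {data.mean():.2f}")
print(f"95% bootstrap CI: ({lo:.2f}, {hi:.2f})")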

Assignment

Simulate:

  • a biased dataset

  • a confounded dataset

  • a mis-specified model
    and show failure modes.


MODULE 8

REPRODUCIBILITY OR NOTHING

Objective

Build research that survives close inspection.

Content

Practices:

  • Version control

  • Data pipeline discipline

  • Code hygiene

  • Pre-registration

  • Notebooks

  • Reproducible reporting
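One small habit in code (the filename and fields below are my own choices, not a standard): pin the seed and write an environment manifest next to every result, so a rerun can be checked against the original.

import json
import platform
import random
import sys

SEED = 42
random.seed(SEED)  # every stochastic step should flow from a recorded seed

manifest = {
    "python": sys.version,
    "platform": platform.platform(),
    "seed": SEED,
}
with open("run_manifest.json", "w") as f:  # hypothetical filename
    json.dump(manifest, f, indent=2)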

Assignment

Take one old project and:

  • rebuild it clean

  • document assumptions

  • make it rerunnable


MODULE 9

CHOOSING QUESTIONS THAT MATTER

Objective

Research should matter — or it dies quietly.

Content

Good research questions:

  • change decisions

  • reduce uncertainty

  • expose bias

  • alter treatment

  • predict reality

Bad ones:

  • are convenient

  • chase trends

  • report hollow associations

  • pad CVs

Assignment

Kill one project idea that isn’t worth doing.
Design one that is.


CAPSTONE

BUILD SOMETHING REAL

Your final project must:

✅ Answer a real clinical question
✅ Include causal reasoning
✅ Use appropriate models
✅ Produce interpretable results
✅ Be reproducible
✅ Teach you something new
