11  Systematic Bias Analysis

11.1 Objective

Detect and quantify systematic biases in AI and pathologist assessments.

Note for Pathologist: Systematic bias means “always missing in the same direction”. For example, does the AI always underestimate Ki67 in high-grade tumors? Or does it always score ER higher than humans? Detecting these biases lets us “calibrate” our expectations: if we know the AI runs high, we can mentally adjust.

11.2 Setup

11.3 Load Data

[1] "Filtered to common cohort. Merged N: 1184"
[1] "Before NA removal: 1184"
[1] "After NA removal: 1037"

11.4 Overall AI Bias

Does the AI systematically over- or underestimate compared to pathologists’ initial assessments?

Overall AI Bias
Mean and median differences (Post-AI minus Pre-AI), n = 1037

Marker   Mean Bias¹   Median Bias   SD of Bias
ER       −0.76        0.00          6.39
PR       −1.23        0.00          7.62
Ki67     5.88         4.00          7.69

¹ Positive values = Post-AI score higher than Pre-AI score (AI tends to give higher scores)

Statistical Tests for Systematic Bias
Paired t-tests: H0 = no difference between Pre and Post

Marker   Mean Difference   t-statistic   P-value    95% CI Lower   95% CI Upper
ER       −0.764            −3.846        0.0001     −1.153         −0.374
PR       −1.226            −5.176        < 0.0001   −1.690         −0.761
Ki67     5.884             24.642        < 0.0001   5.416          6.353
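
The quantities in the two tables above can be reproduced from paired scores along the following lines. This is a minimal sketch, not the code used for this report: the function name `paired_bias_summary` and the toy scores are hypothetical, and the confidence interval uses a normal-approximation critical value (1.96), which is close to the t critical value only for large samples such as n = 1037.

```python
import math
import statistics


def paired_bias_summary(pre, post, z=1.96):
    """Summarise Post-AI minus Pre-AI differences for one marker.

    z is a normal-approximation critical value (an assumption of this
    sketch); an exact paired t-test would use the t distribution instead.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    se = sd_diff / math.sqrt(n)        # standard error of the mean difference
    return {
        "n": n,
        "mean_bias": mean_diff,
        "median_bias": statistics.median(diffs),
        "sd_bias": sd_diff,
        "t_statistic": mean_diff / se,
        "ci_lower": mean_diff - z * se,
        "ci_upper": mean_diff + z * se,
    }


# Hypothetical toy scores, not study data.
pre_scores = [10, 20, 30, 40, 50]
post_scores = [12, 21, 33, 44, 55]
summary = paired_bias_summary(pre_scores, post_scores)
```

As a consistency check on the reported ER row, a mean difference of −0.764 with SD 6.39 and n = 1037 gives a standard error of about 0.198 and a t-statistic of about −3.85, in line with the tabulated −3.846.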

11.5 Bias by Score Range

Test if AI bias varies by the initial score level (e.g., more bias for low vs high scores).

ANOVA: Bias Variation by Score Range
Test if AI bias differs across score ranges

Marker   F-statistic   P-value
ER       10.72         < 0.0001
PR       56.61         < 0.0001
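
To illustrate the kind of test summarised above, the sketch below bins each case by its Pre-AI score and computes a one-way ANOVA F-statistic on the Post-AI minus Pre-AI changes. The bin edges, function names, and toy values are assumptions made for illustration; the score ranges actually used in this analysis are not stated in this section.

```python
import statistics


def bin_changes(pre, post, edges):
    """Group Post-AI minus Pre-AI changes by the bin of the Pre-AI score.

    edges are hypothetical cut points; a score equal to an edge falls in
    the higher bin.
    """
    groups = [[] for _ in range(len(edges) + 1)]
    for a, b in zip(pre, post):
        idx = sum(a >= e for e in edges)  # number of edges at or below the score
        groups[idx].append(b - a)
    return groups


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)
    n = len(all_values)
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (v - statistics.fmean(g)) ** 2 for g in groups for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data, not study data.
toy_groups = bin_changes(
    pre=[0, 5, 20, 60], post=[1, 5, 18, 70], edges=[10, 50]
)
f_value = one_way_anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
```

A large F-statistic indicates that the mean change differs between score ranges by more than the within-range scatter would suggest; converting F to a P-value additionally requires the F distribution, which this sketch does not implement.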

11.6 Regression to the Mean

Test if AI shows regression to the mean (extreme values become more moderate).

Regression to the Mean Analysis
Relationship between initial score and change (Post-AI minus Pre-AI)

Marker   Slope (β)¹   SE       t-value    P-value
ER       −0.0199      0.0053   −3.7706    0.00017
PR       −0.0627      0.0062   −10.1466   < 0.00001
Ki67     −0.0296      0.0112   −2.6327    0.00860

¹ Negative slope = regression to mean (high values decrease, low values increase)
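
The slope in the table above can be obtained by regressing the change in score on the initial score. The following is a minimal sketch assuming a simple ordinary least squares fit of change = α + β × Pre-AI score; the function name and toy values are hypothetical, and the model actually fitted for this report may differ.

```python
import math
import statistics


def slope_of_change_on_baseline(pre, post):
    """Fit change = alpha + beta * pre by ordinary least squares.

    Returns (beta, se_beta). Requires at least three cases with
    non-identical Pre-AI scores.
    """
    change = [b - a for a, b in zip(pre, post)]
    n = len(pre)
    mean_x = statistics.fmean(pre)
    mean_y = statistics.fmean(change)
    sxx = sum((x - mean_x) ** 2 for x in pre)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pre, change))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    residual_ss = sum(
        (y - (alpha + beta * x)) ** 2 for x, y in zip(pre, change)
    )
    se_beta = math.sqrt(residual_ss / (n - 2) / sxx)
    return beta, se_beta


# Hypothetical toy scores: low values rise and high values fall after AI input.
beta, se_beta = slope_of_change_on_baseline(
    pre=[0, 25, 50, 75, 100], post=[6, 27, 50, 71, 96]
)
t_value = beta / se_beta
```

The t-value is the slope divided by its standard error; for the reported ER row, −0.0199 / 0.0053 is approximately −3.75, consistent with the tabulated value once rounding is taken into account. Note that random measurement error in the initial score can by itself produce a negative slope in this kind of regression, so a negative β is not on its own evidence of an AI-driven effect.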

11.7 Pathologist-Specific Bias Patterns

Compare systematic biases across pathologists.

Pathologist-Specific Bias
Mean change after AI by pathologist

Pathologist     ER Bias   PR Bias   Ki67 Bias
Pathologist 1   −1.83     −1.69     8.63
Pathologist 2   −1.38     −1.94     3.43
Pathologist 3   0.29      −1.03     5.62
Pathologist 4   −0.05     −0.16     6.00
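
A per-reader breakdown like the one above amounts to grouping the Post-AI minus Pre-AI changes by pathologist and averaging within each group. The sketch below assumes each case is available as a (reader, pre, post) tuple; that record layout, the function name, and the toy values are hypothetical.

```python
from collections import defaultdict
import statistics


def mean_change_by_reader(records):
    """Average Post-AI minus Pre-AI change per reader.

    records: iterable of (reader_id, pre_score, post_score) tuples.
    """
    changes = defaultdict(list)
    for reader, pre, post in records:
        changes[reader].append(post - pre)
    return {reader: statistics.fmean(vals) for reader, vals in changes.items()}


# Hypothetical toy records, not study data.
toy_records = [
    ("Pathologist 1", 10, 14),
    ("Pathologist 1", 20, 22),
    ("Pathologist 2", 30, 29),
    ("Pathologist 2", 40, 37),
]
reader_bias = mean_change_by_reader(toy_records)
```

Running the same function once per marker would give one column of the table at a time.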

11.8 HER2 Bias Patterns

Analyze directional bias in HER2 scoring.

HER2 Score Changes
Overall directional patterns

Change Direction   Count   Percentage (%)
Downgraded         56      5.4
No Change          938     90.5
Upgraded           43      4.1

HER2 Changes by Pathologist
Distribution of upgrades, downgrades, and no changes (Net Change = Upgraded minus Downgraded)

Pathologist     Downgraded   No Change   Upgraded   Net Change   Total Cases
Pathologist 1   8            237         14         6            259
Pathologist 2   30           230         15         −15          275
Pathologist 3   12           231         9          −3           252
Pathologist 4   6            240         5          −1           251

Most Common HER2 Transitions
Top 10 score change patterns

Pre-AI Score   Post-AI Score   Direction    Count
1              1               No Change    384
0              0               No Change    293
2              2               No Change    137
3              3               No Change    124
2              1               Downgraded   44
0              1               Upgraded     24
1              2               Upgraded     16
1              0               Downgraded   10
2              3               Upgraded     3
3              2               Downgraded   2
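
The HER2 tables above are mutually consistent, which can be checked by collapsing the transition counts into the three change directions. The sketch below uses the counts from the transition table as input; the function names are hypothetical, and the code is an illustration rather than the pipeline used for this report.

```python
from collections import Counter


def direction(pre, post):
    """Classify a HER2 score change by comparing Post-AI to Pre-AI."""
    if post > pre:
        return "Upgraded"
    if post < pre:
        return "Downgraded"
    return "No Change"


def summarise_directions(transition_counts):
    """Collapse {(pre, post): count} into totals and percentages per direction."""
    totals = Counter()
    for (pre, post), count in transition_counts.items():
        totals[direction(pre, post)] += count
    n = sum(totals.values())
    percentages = {k: round(100 * v / n, 1) for k, v in totals.items()}
    return totals, percentages


# Counts copied from the transition table above.
transitions = {
    (1, 1): 384, (0, 0): 293, (2, 2): 137, (3, 3): 124,
    (2, 1): 44, (0, 1): 24, (1, 2): 16, (1, 0): 10,
    (2, 3): 3, (3, 2): 2,
}
totals, percentages = summarise_directions(transitions)
net_change = totals["Upgraded"] - totals["Downgraded"]
```

The ten listed transitions sum to 1037 cases, so they account for every case in the cohort, and the resulting direction totals match the overall table (56 downgraded, 938 unchanged, 43 upgraded).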

11.9 Threshold Effects

Examine if bias is stronger near clinically important thresholds.

AI Influence Near Clinical Thresholds
Mean absolute change within 5% of threshold vs away from threshold (Difference = Near minus Far; Ratio = Near / Far)

Marker   Threshold   Near Threshold   Far from Threshold   Difference   Ratio¹
ER       1%          0.33             3.76                 −3.43        0.09
ER       10%         4.05             3.09                 0.96         1.31
PR       1%          0.54             7.40                 −6.86        0.07
PR       10%         3.58             4.05                 −0.47        0.88
Ki67     20%         7.81             6.60                 1.21         1.18
Ki67     30%         8.61             6.55                 2.06         1.32

¹ Ratio > 1 = More AI influence near thresholds
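
The near-versus-far comparison above can be sketched as follows. The code assumes that "within 5%" means the Pre-AI score lies within 5 percentage points of the threshold, inclusive; that reading, the function name, and the toy values are assumptions rather than details confirmed by this section.

```python
import statistics


def threshold_effect(pre, post, threshold, window=5.0):
    """Compare mean absolute change near a threshold with that far from it.

    A case counts as "near" when its Pre-AI score is within `window`
    percentage points of `threshold` (an assumed definition). Raises
    statistics.StatisticsError if either group is empty.
    """
    near, far = [], []
    for a, b in zip(pre, post):
        target = near if abs(a - threshold) <= window else far
        target.append(abs(b - a))
    near_mean = statistics.fmean(near)
    far_mean = statistics.fmean(far)
    return {
        "near": near_mean,
        "far": far_mean,
        "difference": near_mean - far_mean,
        "ratio": near_mean / far_mean,
    }


# Hypothetical toy scores around a 10% threshold, not study data.
effect = threshold_effect(
    pre=[8, 12, 40, 80], post=[12, 10, 41, 79], threshold=10
)
```

With a 1% threshold, a window of this width covers only scores from 0% to 6%, where little absolute movement is possible, which may partly explain the very low ratios for ER 1% and PR 1%.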

11.10 Bland-Altman Analysis

Classic method for assessing agreement and detecting systematic bias.

Note: The Bland-Altman analysis below pools observations across all four pathologists. This approach captures overall pre-vs-post bias but conflates within-pathologist change with between-pathologist variation, potentially widening the limits of agreement. Per-pathologist Bland-Altman plots are available in the individual pathologist performance analysis (Chapter 7).

Bland-Altman Statistics
Mean difference and 95% limits of agreement (LoA = mean difference ± 1.96 × SD of differences)

Marker   Mean Difference   SD of Differences   Upper LoA   Lower LoA
ER       −0.76             6.39                11.77       −13.30
PR       −1.23             7.62                13.72       −16.17
Ki67     5.88              7.69                20.96       −9.19
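
The limits of agreement follow directly from the mean and SD of the paired differences. The sketch below shows the calculation both from summary statistics and from raw paired scores; the function names are hypothetical, and the multiplier 1.96 is the conventional choice for 95% limits.

```python
import statistics


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Return (lower, upper) 95% limits of agreement from summary statistics."""
    half_width = z * sd_diff
    return mean_diff - half_width, mean_diff + half_width


def bland_altman(pre, post):
    """Compute mean difference, SD, and limits of agreement from paired scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    lower, upper = limits_of_agreement(mean_diff, sd_diff)
    return mean_diff, sd_diff, lower, upper


# Rounded values taken from the Ki67 row of the table above.
ki67_lower, ki67_upper = limits_of_agreement(5.88, 7.69)
```

Applied to the rounded Ki67 values, this gives limits of roughly −9.19 and 20.95, matching the table to within rounding of the inputs.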

11.11 Conclusion

11.11.1 Summary of Bias Patterns

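  1. Overall Bias: Post-AI scores were on average lower than Pre-AI scores for ER (−0.76) and PR (−1.23) and higher for Ki67 (+5.88); all three shifts were statistically significant in paired t-tests

  2. Range-Dependent Bias: The size of the shift differed across initial score ranges for both ER (F = 10.72) and PR (F = 56.61)

  3. Regression to the Mean: The change in score decreased as the initial score increased for all three markers (slopes of −0.0199 for ER, −0.0627 for PR, and −0.0296 for Ki67), so extreme values tended to become more moderate after AI input

  4. Pathologist-Specific Patterns: Mean shifts varied by pathologist, most visibly for Ki67 (from +3.43 for Pathologist 2 to +8.63 for Pathologist 1); Pathologist 2 also accounted for 30 of the 56 HER2 downgrades

  5. Threshold Effects: AI influence was stronger near the ER 10%, Ki67 20%, and Ki67 30% cutoffs (ratios 1.18 to 1.32) but weaker near the ER 1%, PR 1%, and PR 10% cutoffs (ratios 0.07 to 0.88)

  6. Bland-Altman Analysis: Summarizes agreement through the mean difference and 95% limits of agreement, which were widest for Ki67 (−9.19 to 20.96)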

11.11.2 Clinical Implications

  • Awareness of systematic biases is crucial for appropriate AI implementation
  • Bias patterns should inform calibration and validation strategies
  • Individual pathologists may need targeted training to address specific biases
  • Cases near clinical thresholds require extra scrutiny