Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: value ~ marker * biopsy_type * modality + (1 | case_id) + (1 | pathologist)
Data: model_data
Control: lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 66932.7
Scaled residuals:
| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -2.31471 | -0.72276 | -0.05804 | 0.72109 | 3.09315 |
Random effects:
| Groups | Name | Variance | Std.Dev. |
|---|---|---|---|
| case_id | (Intercept) | 310.923 | 17.633 |
| pathologist | (Intercept) | 1.727 | 1.314 |
| Residual | | 721.354 | 26.858 |
Number of obs: 7034, groups: case_id, 296; pathologist, 4
Fixed effects:
| Term | Estimate | Std. Error | df | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|---|
| (Intercept) | 72.5793 | 1.8046 | 97.8409 | 40.218 | < 2e-16 | *** |
| markerki67 | -49.1515 | 1.4439 | 6726.1895 | -34.041 | < 2e-16 | *** |
| markerpr | -39.0568 | 1.4408 | 6724.9778 | -27.107 | < 2e-16 | *** |
| biopsy_typeTru-cut | -2.4236 | 2.6176 | 603.8967 | -0.926 | 0.35488 | |
| modalitypost | -0.5042 | 1.4424 | 6725.0047 | -0.350 | 0.72667 | |
| markerki67:biopsy_typeTru-cut | 6.9836 | 2.2475 | 6725.6397 | 3.107 | 0.00190 | ** |
| markerpr:biopsy_typeTru-cut | -2.8132 | 2.2463 | 6725.4058 | -1.252 | 0.21048 | |
| markerki67:modalitypost | 6.3367 | 2.0458 | 6725.4757 | 3.097 | 0.00196 | ** |
| markerpr:modalitypost | -0.5471 | 2.0439 | 6725.4133 | -0.268 | 0.78894 | |
| biopsy_typeTru-cut:modalitypost | -0.1281 | 2.2479 | 6725.1221 | -0.057 | 0.95455 | |
| markerki67:biopsy_typeTru-cut:modalitypost | 0.4515 | 3.1855 | 6725.1966 | 0.142 | 0.88729 | |
| markerpr:biopsy_typeTru-cut:modalitypost | 0.2564 | 3.1858 | 6725.2191 | 0.080 | 0.93587 | |
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
| | (Intr) | mrkr67 | mrkrpr | bps_T- | mdltyp | mr67:_T- | mr:_T- | mrk67: | mrkrp: | bp_T-: | m67:_T-: |
|---|---|---|---|---|---|---|---|---|---|---|---|
| markerki67 | -0.398 | | | | | | | | | | |
| markerpr | -0.399 | 0.499 | | | | | | | | | |
| bpsy_typTr- | -0.598 | 0.275 | 0.275 | | | | | | | | |
| modalitypst | -0.399 | 0.498 | 0.499 | 0.275 | | | | | | | |
| mrkrk67:_T- | 0.256 | -0.642 | -0.321 | -0.428 | -0.320 | | | | | | |
| mrkrpr:b_T- | 0.256 | -0.320 | -0.641 | -0.428 | -0.320 | 0.498 | | | | | |
| mrkrk67:mdl | 0.281 | -0.705 | -0.352 | -0.194 | -0.705 | 0.453 | 0.226 | | | | |
| mrkrpr:mdlt | 0.281 | -0.352 | -0.705 | -0.194 | -0.706 | 0.226 | 0.452 | 0.498 | | | |
| bpsy_typT-: | 0.256 | -0.320 | -0.320 | -0.428 | -0.642 | 0.498 | 0.498 | 0.452 | 0.453 | | |
| mrkr67:_T-: | -0.181 | 0.453 | 0.226 | 0.302 | 0.453 | -0.705 | -0.352 | -0.642 | -0.320 | -0.706 | |
| mrkrpr:_T-: | -0.181 | 0.226 | 0.452 | 0.302 | 0.453 | -0.351 | -0.705 | -0.319 | -0.642 | -0.706 | 0.498 |
22 Model Diagnostics
This chapter provides diagnostics for the statistical models used throughout the manuscript and checks whether their assumptions are met. The primary analyses rely on agreement statistics (ICC, kappa) with bootstrap confidence intervals. Confirming that the supporting models are well specified strengthens confidence in the findings.
Analyses checked:
1. Mixed effects models (interaction analyses)
2. Bootstrap procedures (confidence intervals)
3. Agreement statistics (ICC, Kappa)
4. Missing data patterns
Note for Pathologist: This is the “under the hood” check. We make sure the statistical engine is running smoothly: the model residuals are approximately normally distributed, missing cases do not introduce bias, and so on. This section is optional reading for those interested in the statistical checks behind our methods.
22.1 Mixed Effects Model Diagnostics
22.1.1 Model Specification
Re-fit the primary interaction model from Chapter 17.
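The model uses treatment (dummy) coding. Each cell mean is therefore the intercept plus every coefficient whose factor levels all match that cell, and this is how the interaction estimates in the model summary above should be read. The Python sketch below copies the fixed effects from that summary; the model itself was fitted in R. The helper `cell_mean` is hypothetical. The reference levels are ER, pre-AI, and the biopsy type other than Tru-cut, whose label does not appear in the output.

```python
# Fixed-effect estimates copied from the lmer summary (treatment coding).
# Reference levels: marker = "er", modality = "pre", biopsy type = not Tru-cut.
COEF = {
    "(Intercept)": 72.5793,
    "markerki67": -49.1515,
    "markerpr": -39.0568,
    "biopsy_typeTru-cut": -2.4236,
    "modalitypost": -0.5042,
    "markerki67:biopsy_typeTru-cut": 6.9836,
    "markerpr:biopsy_typeTru-cut": -2.8132,
    "markerki67:modalitypost": 6.3367,
    "markerpr:modalitypost": -0.5471,
    "biopsy_typeTru-cut:modalitypost": -0.1281,
    "markerki67:biopsy_typeTru-cut:modalitypost": 0.4515,
    "markerpr:biopsy_typeTru-cut:modalitypost": 0.2564,
}


def cell_mean(marker: str, tru_cut: bool, post: bool) -> float:
    """Model-implied mean for one marker x biopsy type x modality cell."""
    active = set()  # non-reference levels present in this cell
    if marker != "er":
        active.add(f"marker{marker}")
    if tru_cut:
        active.add("biopsy_typeTru-cut")
    if post:
        active.add("modalitypost")
    total = COEF["(Intercept)"]
    for name, value in COEF.items():
        # A term contributes only when every one of its components is active.
        if name != "(Intercept)" and set(name.split(":")) <= active:
            total += value
    return total


# Pre-to-post (AI-associated) shift for Ki-67 on the reference biopsy type
ki67_shift = cell_mean("ki67", False, True) - cell_mean("ki67", False, False)
print(round(ki67_shift, 4))  # 5.8325
```

The shift is `modalitypost + markerki67:modalitypost` = -0.5042 + 6.3367. The significant `markerki67:modalitypost` term (6.3367) is what separates this Ki-67 shift from the ER shift of -0.5042.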
22.1.2 Residual Analysis
22.1.2.1 Normality of Residuals

22.1.2.2 Shapiro-Wilk Test
**Shapiro-Wilk Test for Normality**
W = 0.9932, p-value = 1.0293e-14
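**Interpretation**: p < 0.001 indicates a statistically detectable departure from normality. With 7,034 residuals, however, the test detects even trivial deviations. W = 0.9932 is close to 1, so the departure is small in magnitude. Fixed-effect estimates from mixed models are generally robust to moderate non-normality of the residuals, and with a sample this large the Central Limit Theorem supports approximately normal sampling distributions for the estimates.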
Note for Pathologist: The Q-Q plot compares the distribution of the model residuals (the part of each score the model does not explain) with a theoretical “perfect” bell curve. If the points fall along the red line, the residuals are well-behaved. The histogram should look roughly bell-shaped. Minor deviations at the tails (extreme values) are common and do not invalidate the analysis at our sample size.
22.1.2.3 Homoscedasticity (Constant Variance)

22.1.2.4 Interpretation
| Fitted Value Quartile | Variance | SD |
|---|---|---|
| Q1 | 752.91 | 27.44 |
| Q2 | 362.55 | 19.04 |
| Q3 | 1103.62 | 33.22 |
| Q4 | 224.73 | 14.99 |
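**Variance ratio** (max/min): 1103.62 / 224.73 = 4.91
**Interpretation**: A ratio < 3 indicates acceptable homoscedasticity, and a ratio > 10 suggests a heteroscedasticity concern. The observed ratio of 4.91 falls between these thresholds. It indicates moderate, non-monotonic variation in residual variance across the fitted-value range: the variance is largest in Q3 and smallest in Q4.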
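The quartile check can be sketched in three steps:

1. Split the observations by quartile of the fitted values.
2. Take the residual variance in each group.
3. Divide the largest variance by the smallest.

This is a minimal Python sketch using only the standard library, with illustrative names. Quartile cut points from `statistics.quantiles` may differ slightly from those used in the original R analysis.

```python
from statistics import quantiles, variance


def variance_by_fitted_quartile(fitted: list[float], resid: list[float]) -> list[float]:
    """Sample variance of the residuals within each quartile of the fitted values."""
    # "inclusive" treats the smallest and largest fitted values as the 0th and 100th percentiles
    cuts = quantiles(fitted, n=4, method="inclusive")
    groups: list[list[float]] = [[], [], [], []]
    for f, r in zip(fitted, resid):
        # Count of cut points below f gives the quartile index 0..3
        # (a value equal to a cut point goes to the lower quartile)
        groups[sum(f > c for c in cuts)].append(r)
    return [variance(g) for g in groups]


def variance_ratio(variances: list[float]) -> float:
    """Largest over smallest group variance; 1.0 means perfectly constant variance."""
    return max(variances) / min(variances)


# The reported quartile variances reproduce the ratio in the text
print(round(variance_ratio([752.91, 362.55, 1103.62, 224.73]), 2))  # 4.91
```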
22.1.3 Random Effects Diagnostics
22.1.3.1 Random Intercepts for Case and Pathologist

22.1.3.2 Random Effects Distribution

22.1.4 Influence Diagnostics
22.1.4.1 Standardized Residuals (Outlier Detection)

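**Outlier observations** (|z| > 3): 6467 (91.94%)
**Consistency check**: this figure is inconsistent with the model summary. There, the scaled residuals range from -2.31 to 3.09, so very few observations can exceed |z| = 3. The count should be recomputed, confirming that the residuals are standardized before the threshold is applied.
**Interpretation**: If the residuals were normal, about 0.3% of observations would exceed |z| = 3. A share below 5% indicates that the model is robust to individual observations.
**Note**: Standardized residuals were used instead of Cook's distance because Cook's distance is computationally expensive for large mixed models.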
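The key step is to standardize the residuals before applying the cut-off, because the threshold is expressed in standard-deviation units. The minimal Python sketch below assumes the residuals are available as a list; the names are illustrative.

It also computes two reference values:

- Under normality, the expected share beyond |z| = 3 is about 0.27%.
- Applying the same cut-off to *unstandardized* residuals with the model's residual SD of 26.858 would flag roughly 91% of observations under normality. This is one possible, unconfirmed explanation for the reported 91.94%.

```python
from statistics import NormalDist, mean, stdev


def outlier_share(resid: list[float], threshold: float = 3.0) -> float:
    """Fraction of observations whose standardized residual exceeds the threshold in absolute value."""
    m, s = mean(resid), stdev(resid)
    z = [(r - m) / s for r in resid]  # standardize first: the cut-off is in SD units
    return sum(abs(v) > threshold for v in z) / len(z)


# Expected share beyond |z| = 3 if the residuals were exactly normal (about 0.27%)
expected_share = 2 * (1 - NormalDist().cdf(3.0))

# Share beyond 3 *raw* units if residuals were normal with SD 26.858 (about 0.91)
raw_share = 2 * (1 - NormalDist(0.0, 26.858).cdf(3.0))
```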
22.1.5 Multicollinearity
22.1.5.1 Variance Inflation Factors (VIF)
| Fixed Effect Term | VIF | Tolerance (1/VIF) |
|---|---|---|
| modality | 1.00 | 1.00 |
| marker | 1.00 | 1.00 |
| biopsy_type | 1.00 | 1.00 |
**Maximum VIF**: 1.00
**Interpretation**:
- VIF < 5: No multicollinearity concern
- VIF 5-10: Moderate multicollinearity
- VIF > 10: Severe multicollinearity (model unstable)
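**Note**: VIFs were calculated for the main-effects model only. Interaction terms were excluded because each is correlated with its component main effects by construction, which would inflate their VIFs without indicating a real problem.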
22.1.6 Model Comparison (AIC/BIC)
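Compare the full model with reduced models. The comparison uses maximum-likelihood (ML) fits, because REML likelihoods are not comparable between models with different fixed effects.
| Model | npar | AIC | BIC | logLik | -2*log(L) | Chisq | Df | Pr(>Chisq) |
|---|---|---|---|---|---|---|---|---|
| model_main_effects | 8 | 67044.15 | 67099.01 | -33514.07 | 67028.15 | NA | NA | NA |
| model_no_3way | 13 | 66988.59 | 67077.75 | -33481.30 | 66962.59 | 65.55 | 5 | < 0.001 |
| model_full_ml | 15 | 66992.57 | 67095.45 | -33481.29 | 66962.57 | 0.02 | 2 | 0.99 |
**Best model**: model_no_3way has the lowest AIC (66988.59) and the lowest BIC (67077.75). Adding the two-way interactions significantly improves fit over the main-effects model (χ² = 65.55, df = 5, p < 0.001). Adding the three-way interactions does not (χ² = 0.02, df = 2, p = 0.99).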
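Each likelihood-ratio statistic in the table is the difference in -2*log(L) between nested ML fits. It is referred to a chi-square distribution whose degrees of freedom equal the difference in npar. The Python sketch below uses only the standard library, and the helper names are illustrative. It computes the chi-square upper tail from closed forms valid for integer degrees of freedom. Because the table prints -2*log(L) to two decimals, the first statistic comes out as 65.56 rather than the reported 65.55.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive integer df (closed forms)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (k + 0.5)
    return total


def lrt(neg2ll_reduced: float, npar_reduced: int, neg2ll_full: float, npar_full: int):
    """Likelihood-ratio test between two nested ML fits: (statistic, df, p)."""
    stat = neg2ll_reduced - neg2ll_full
    df = npar_full - npar_reduced
    return stat, df, chi2_sf(stat, df)


two_way_vs_main = lrt(67028.15, 8, 66962.59, 13)
full_vs_two_way = lrt(66962.59, 13, 66962.57, 15)
```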
22.2 Bootstrap Procedure Validation
22.2.1 Convergence Check
Verify bootstrap distributions converge to stable estimates.
| marker | modality | icc_original | icc_boot_mean | icc_boot_sd | ci_lower | ci_upper | n_boot |
|---|---|---|---|---|---|---|---|
| ki67 | pre | 0.9391 | 0.9381 | 0.0089 | 0.9191 | 0.9542 | 1000 |
**Bootstrap mean** (0.9381) should be close to **original ICC** (0.9391).
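**Bias**: 0.9381 - 0.9391 = -0.0010, about 0.11 bootstrap SDs (0.0089). A bias this small relative to the bootstrap SD indicates that the bootstrap distribution is centered on the original estimate.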
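The bias check can be sketched as a generic case-resampling bootstrap. The chapter does not describe how the resampling was done. This Python sketch (standard library, illustrative names) therefore assumes that whole cases are drawn with replacement, which keeps each case's ratings together, and it reports a simple percentile interval. In practice, `statistic` would compute the ICC from the resampled cases.

```python
import random
from statistics import mean, stdev
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_summary(
    cases: Sequence[T],
    statistic: Callable[[Sequence[T]], float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 1,
) -> dict:
    """Case-resampling bootstrap: bias, SD and percentile CI for `statistic`."""
    rng = random.Random(seed)
    original = statistic(cases)
    # Resample whole cases with replacement so each case's ratings stay together
    boots = sorted(
        statistic([rng.choice(cases) for _ in range(len(cases))]) for _ in range(n_boot)
    )
    k = round(alpha / 2 * n_boot)  # draws cut from each tail (25 of 1000 for alpha = 0.05)
    boot_mean = mean(boots)
    return {
        "original": original,
        "boot_mean": boot_mean,
        "bias": boot_mean - original,  # the quantity checked in the table above
        "boot_sd": stdev(boots),
        "ci": (boots[k], boots[n_boot - k - 1]),
    }
```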
22.2.2 Bootstrap Distribution Shape

22.3 Agreement Statistics Validation
22.3.1 ICC Interpretation Checks
Check that all ICC estimates fall within the valid range [0, 1].
| Marker | Modality | ICC |
|---|---|---|
| er | pre | 0.9623 |
| er | post | 0.9799 |
| pr | pre | 0.9520 |
| pr | post | 0.9737 |
| ki67 | pre | 0.9391 |
| ki67 | post | 0.9371 |
**All ICC values within valid range [0, 1].** ✓
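The chapter does not state which ICC form was used. The sketch below assumes ICC(2,1) of Shrout and Fleiss (two-way random effects, absolute agreement, single rater), computed on a complete cases-by-raters matrix. It is written in Python with the standard library, and the function name is illustrative. It also shows why the range check is not vacuous: this ANOVA-based estimate can fall below 0 when the between-case mean square is smaller than the residual mean square.

```python
from statistics import mean


def icc2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1) from a complete n_cases x k_raters matrix (two-way random, absolute agreement)."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```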
22.3.2 Kappa Validity (HER2)
| Modality | Kappa | N_Cases |
|---|---|---|
| Pre-AI | 0.6711 | 229 |
| Post-AI | 0.7259 | 226 |
**Valid Kappa range**: [-1, 1]
**Pre-AI Kappa**: 0.6711 ✓
**Post-AI Kappa**: 0.7259 ✓
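The kappa variant is not stated. With four pathologists, Fleiss' kappa is one common multi-rater choice, and the Python sketch below assumes it, applied to cases scored by every pathologist. The input is a table with one row per case and one column per score category. Each entry is the number of pathologists who chose that category for that case. The function name is illustrative.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa from an n_cases x n_categories table of rating counts."""
    n_cases = len(counts)
    m = sum(counts[0])  # raters per case
    if any(sum(row) != m for row in counts):
        raise ValueError("every case needs the same number of ratings")
    # Observed agreement: share of agreeing rater pairs per case, averaged over cases
    p_bar = sum((sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts) / n_cases
    # Chance agreement from the overall proportion of ratings in each category
    n_cat = len(counts[0])
    props = [sum(row[j] for row in counts) / (n_cases * m) for j in range(n_cat)]
    p_e = sum(p * p for p in props)
    return (p_bar - p_e) / (1 - p_e)
```

A kappa of 1 means complete agreement, 0 means agreement at chance level, and negative values mean less agreement than chance. This is why the valid range is [-1, 1] rather than [0, 1].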
22.4 Missing Data Analysis
22.4.1 Missing Data Patterns

| Variable | % Missing | Marker | Phase |
|---|---|---|---|
| her2_post | 6.93 | her2 | post |
| her2_pre | 6.50 | her2 | pre |
| pr_post | 2.03 | pr | post |
| ki67_post | 1.86 | ki67 | post |
| er_post | 0.76 | er | post |
| ki67_pre | 0.76 | ki67 | pre |
| pr_pre | 0.42 | pr | pre |
| er_pre | 0.08 | er | pre |
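Per-variable missingness can be computed as follows. The sketch assumes a hypothetical layout with one record per case-pathologist rating, which fits the reported rates. For example, 6.50% missing for her2_pre equals 77 of 1,184 ratings (296 cases x 4 pathologists), and 77 is the total of the pre-AI missing counts by pathologist in the next table. Python standard library; the name is illustrative.

```python
def percent_missing(rows: list[dict], columns: list[str]) -> dict[str, float]:
    """Percent of rows missing each column, sorted from most to least missing.

    A value counts as missing if it is None or the key is absent.
    """
    n = len(rows)
    pct = {col: 100 * sum(row.get(col) is None for row in rows) / n for col in columns}
    return dict(sorted(pct.items(), key=lambda item: item[1], reverse=True))
```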
22.4.2 Missing Data Mechanism Assessment
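Assess whether missingness is related to an observed variable, here the pathologist. The counts below are for HER2. They sum to 77 (pre-AI) and 82 (post-AI) of 1,184 ratings (296 cases x 4 pathologists), matching the her2_pre (6.50%) and her2_post (6.93%) rates above.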
| Pathologist | N Cases | N Missing (Pre) | N Missing (Post) | % Missing (Pre) | % Missing (Post) |
|---|---|---|---|---|---|
| Pathologist 1 | 296 | 14 | 15 | 4.73 | 5.07 |
| Pathologist 2 | 296 | 0 | 6 | 0.00 | 2.03 |
| Pathologist 3 | 296 | 33 | 33 | 11.15 | 11.15 |
| Pathologist 4 | 296 | 30 | 28 | 10.14 | 9.46 |
**Chi-square test for missingness pattern**:
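- Pre-AI: χ² = 39.05, df = 3, p < 0.001
- Post-AI: χ² = 23.74, df = 3, p < 0.001
**Interpretation**: p > 0.05 would suggest that missingness is not related to pathologist, which would be consistent with MCAR. Instead, both tests give p < 0.001. HER2 missingness differs by pathologist: about 9-11% for Pathologists 3 and 4 versus 0-5% for Pathologists 1 and 2. The data are therefore not MCAR with respect to pathologist.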
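The chi-square statistics can be reproduced from the counts in the table above with a Pearson test on a pathologist-by-{missing, observed} table. This Python sketch uses only the standard library, and the name is illustrative. It computes the statistic without a continuity correction, which is how R's `chisq.test()` treats tables larger than 2 x 2. The p-value uses the closed-form chi-square upper tail for 3 degrees of freedom, so the sketch handles exactly four pathologists.

```python
import math


def missingness_chi_square(missing: list[int], n_per_group: list[int]):
    """Pearson chi-square test of whether the missing rate differs across groups."""
    observed = [[m, n - m] for m, n in zip(missing, n_per_group)]
    total = sum(n_per_group)
    col_totals = [sum(row[j] for row in observed) for j in range(2)]
    stat = 0.0
    for row, n in zip(observed, n_per_group):
        for j in range(2):
            expected = n * col_totals[j] / total
            stat += (row[j] - expected) ** 2 / expected
    df = len(missing) - 1
    if df != 3:
        raise ValueError("the closed-form p-value below assumes exactly four groups")
    half = stat / 2
    # Upper tail of chi-square with 3 df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    p = math.erfc(math.sqrt(half)) + math.sqrt(2 * stat / math.pi) * math.exp(-half)
    return stat, df, p


pre = missingness_chi_square([14, 0, 33, 30], [296] * 4)
post = missingness_chi_square([15, 6, 33, 28], [296] * 4)
```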
22.5 Assumption Summary
22.5.1 Overall Diagnostics Summary
| Assumption | Test Method | Result | Status | Implication |
|---|---|---|---|---|
| Residual Normality | Shapiro-Wilk + QQ plot | W = 0.9932, p < 0.001 | ⚠ Minor departure | Mixed models robust to moderate non-normality; large N invokes CLT |
| Homoscedasticity | Residuals vs Fitted + Scale-location | Variance ratio = 4.91 | ⚠ Moderate | Ratio lies between 3 (acceptable) and 10 (concern); residual variance varies moderately across fitted values |
| Random Effects Normality | QQ plots for case/pathologist | Visual inspection | ✓ Met | Case intercepts approximately normal; with only 4 pathologists, normality of the pathologist intercepts cannot be meaningfully assessed |
| No Influential Outliers | Standardized residuals (\|z\| > 3) | 91.94% flagged | ⚠ Review | Inconsistent with the model's scaled-residual range (-2.31 to 3.09); recompute before interpreting (< 5% indicates a robust model) |
| No Multicollinearity | Variance Inflation Factors | Max VIF = 1.00 | ✓ Met | VIF < 5 indicates no multicollinearity concern |
| Bootstrap Convergence | Mean bias check | Bias = -0.0010 | ✓ Met | Small bias indicates bootstrap estimates are stable |
| ICC Validity | Range check [0, 1] | All ICCs valid | ✓ Met | All ICC values within mathematically valid range |
| MCAR (HER2) | Chi-square test of missingness by pathologist | χ² = 39.05 (pre), 23.74 (post); p < 0.001 | ⚠ NOT MCAR | HER2 missingness depends on pathologist (about 9-11% for Pathologists 3 and 4) |
22.5.2 Recommendations
**Diagnostics Summary**:
- ✓ Assumptions met: 4 / 8
- ⚠ Minor concerns: 4 / 8
- ✗ Violations: 0 / 8
**Overall Assessment**:
**Minor concerns identified, but no assumption is severely violated.** Results are valid with the noted caveats, subject to review of the outlier count flagged above.
**Recommendations**:
- Document minor departures from assumptions in manuscript
- Emphasize robustness of methods (bootstrap CIs, large sample sizes)
- Consider sensitivity analyses if reviewers raise concerns