| marker  | modality | n_cases |
|---------|----------|--------:|
| ER      | post     |    1184 |
| ER      | pre      |    1184 |
| PR      | post     |    1184 |
| PR      | pre      |    1184 |
| Ki67    | post     |    1184 |
| Ki67    | pre      |    1184 |
| HER2    | post     |    1184 |
| HER2    | pre      |    1184 |
| Subtype | post     |    1184 |
| Subtype | pre      |    1184 |
15 Agreement Extensions and Precision
15.1 Aim
Quantify how precisely our agreement metrics are estimated, retain more data by handling raters pairwise, and compare pre- and post-AI reliability directly.
Note for Pathologist: This is a technical supplement.
We calculate “Confidence Intervals” to show how certain we are about the agreement numbers.
We also look at “Pairwise Agreement” to see if specific pairs of pathologists agree more than others (e.g., do Senior pathologists agree more with each other?).
15.2 Sample Size and Completeness
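A minimal sketch of how the completeness counts shown at the start of this section could be derived. It assumes a long-format data frame `scores_long` with columns `case_id`, `marker`, `modality`, `rater`, and `value`; the object and column names are placeholders, not necessarily those used in this project.

```r
library(dplyr)

# Count cases with at least one non-missing rating per marker and modality
# (assumed long-format input `scores_long`; names are illustrative).
completeness <- scores_long %>%
  filter(!is.na(value)) %>%
  distinct(marker, modality, case_id) %>%
  count(marker, modality, name = "n_cases")

completeness
```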
15.3 Bootstrap Confidence Intervals for Kappa/ICC
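A minimal sketch of a case-resampling (percentile) bootstrap for Fleiss' Kappa and ICC using the `irr` package. The input `ratings` is assumed to be a cases × raters matrix for a single marker and modality; the helper names, B = 2000 replicates, and the 95% level are illustrative choices rather than the settings actually used.

```r
library(irr)

# Percentile bootstrap CI for an agreement statistic, resampling cases.
boot_ci <- function(ratings, stat_fn, B = 2000, conf = 0.95) {
  n <- nrow(ratings)
  est <- stat_fn(ratings)
  boots <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)      # resample cases with replacement
    stat_fn(ratings[idx, , drop = FALSE])
  })
  alpha <- (1 - conf) / 2
  c(estimate = est,
    ci_low   = quantile(boots, alpha, names = FALSE),
    ci_high  = quantile(boots, 1 - alpha, names = FALSE))
}

# Categorical markers (HER2, subtype): Fleiss' Kappa across all raters
kappa_stat <- function(x) kappam.fleiss(as.data.frame(x))$value

# Continuous markers (ER/PR/Ki67): two-way agreement ICC, single rater
icc_stat <- function(x) icc(as.data.frame(x), model = "twoway",
                            type = "agreement", unit = "single")$value

# boot_ci(her2_post_matrix, kappa_stat)   # hypothetical input matrix
```

A bias-corrected (BCa) interval via the `boot` package would be a common alternative to the percentile interval shown here.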
15.4 Pairwise Agreement by Rater Pair
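A minimal sketch of pairwise agreement over all rater pairs for one marker and modality, again assuming a cases × raters matrix with rater IDs as column names. Cohen's Kappa (`irr::kappa2`) is shown for the categorical markers; a two-rater ICC for ER/PR/Ki67 would follow the same pattern. Object names are illustrative.

```r
library(irr)

# Agreement for every pair of raters; complete cases are taken per pair,
# which retains cases that are missing for only some raters.
pairwise_kappa <- function(ratings) {
  raters <- colnames(ratings)
  pairs  <- combn(raters, 2, simplify = FALSE)
  do.call(rbind, lapply(pairs, function(p) {
    pair_data <- ratings[, p, drop = FALSE]
    pair_data <- pair_data[complete.cases(pair_data), , drop = FALSE]
    data.frame(rater_1 = p[1], rater_2 = p[2],
               n_cases = nrow(pair_data),
               kappa   = kappa2(as.data.frame(pair_data))$value)
  }))
}

# pairwise_kappa(her2_post_matrix)   # hypothetical input
```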
15.5 Pre vs Post Reliability Shift
Note for Pathologist: The delta table (Pre vs Post ICC/Kappa) is the key output here. A positive delta means agreement improved after AI; a negative delta means it worsened. If the bootstrap confidence intervals for the delta do not cross zero, we can be confident the change is real and not due to chance.
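A minimal sketch of a paired bootstrap for the pre vs post delta described in the note above. `pre` and `post` are assumed cases × raters matrices aligned on the same cases, and `stat_fn` is one of the kappa/ICC helpers from the bootstrap sketch in 15.3; resampling the same case indices in both arms keeps the pairing intact.

```r
# Paired bootstrap for delta = post - pre on an agreement statistic.
delta_boot_ci <- function(pre, post, stat_fn, B = 2000, conf = 0.95) {
  stopifnot(nrow(pre) == nrow(post))
  n <- nrow(pre)
  est <- stat_fn(post) - stat_fn(pre)
  boots <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)   # same cases drawn in both arms
    stat_fn(post[idx, , drop = FALSE]) - stat_fn(pre[idx, , drop = FALSE])
  })
  alpha <- (1 - conf) / 2
  c(delta   = est,
    ci_low  = quantile(boots, alpha, names = FALSE),
    ci_high = quantile(boots, 1 - alpha, names = FALSE))
}
```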
15.6 Reporting Plan
- Present bootstrap CIs for Fleiss’ Kappa (HER2, subtype) and ICC (ER/PR/Ki67) to convey precision.
- Add pairwise Kappa/ICC tables to highlight which raters benefit most from AI.
- Include a delta table with Pre vs Post ICC/Kappa and a note on whether each delta’s bootstrap CI crosses zero; a sketch of how this table could be assembled follows below.
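A minimal sketch of how the planned delta table could be assembled from the helpers above. The list of pre/post matrices per marker and the `excludes_zero` flag are illustrative; the project's actual object names will differ.

```r
library(dplyr)

# Hypothetical container: one pre/post matrix pair per categorical marker.
marker_data <- list(
  HER2    = list(pre = her2_pre_matrix,    post = her2_post_matrix),
  Subtype = list(pre = subtype_pre_matrix, post = subtype_post_matrix)
)

delta_table <- bind_rows(lapply(names(marker_data), function(m) {
  ci <- delta_boot_ci(marker_data[[m]]$pre, marker_data[[m]]$post, kappa_stat)
  tibble(marker = m,
         delta = ci[["delta"]], ci_low = ci[["ci_low"]], ci_high = ci[["ci_high"]],
         excludes_zero = ci[["ci_low"]] > 0 | ci[["ci_high"]] < 0)
}))

delta_table
```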