6  Impact of AI on Interpretation

6.1 Objective

Analyze how AI assistance changes the interpretations of individual pathologists and whether it leads to greater consensus among them.

Note for Pathologist: This analysis focuses on the “Impact of AI” on your individual assessments. We perform a “Non-inferiority Analysis” to ensure AI doesn’t negatively disrupt standard scoring (benchmark >75% agreement). We also visualize “Score Shifting” (did AI make you grade higher or lower?) and “Variance Reduction” (did AI make the group of pathologists agree more with each other?).

6.2 Molecular Classification Definitions

  • HER2 Positive: HER2 Score 2 or 3.
  • Luminal A: HER2 (0/1) & ER > 10% & PR > 10% & Ki67 < 30%.
  • Luminal B: HER2 (0/1) & ER > 10% & Ki67 >= 30%.
  • Hormone Weak Positive: HER2 (0/1) & (ER > 0 OR PR > 0) & Not Luminal A/B.
  • Triple Negative: HER2 (0/1) & ER = 0 & PR = 0.
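The definitions above can be read as an ordered rule set. The following Python sketch applies them literally; the function name `classify_subtype` is hypothetical, the report's own code is not shown, and the strict `> 10` cut-offs are taken as written in the list. The example inputs are the pre-AI values of the first four rows of the data preview in 6.4.

```python
def classify_subtype(her2, er, pr, ki67):
    """Assign a molecular subtype from a HER2 score (0-3) and ER/PR/Ki67 percentages."""
    if her2 >= 2:
        return "HER2 Positive"
    # From here on the HER2 score is 0 or 1.
    if er > 10 and pr > 10 and ki67 < 30:
        return "Luminal A"
    if er > 10 and ki67 >= 30:
        return "Luminal B"
    if er == 0 and pr == 0:
        return "Triple Negative"
    # Remaining cases have ER > 0 or PR > 0 but fit neither Luminal class.
    return "Hormone Weak Positive"


# Pre-AI values from the first four preview rows.
examples = {
    "34937-25": classify_subtype(her2=1, er=90, pr=60, ki67=6),
    "35131-25": classify_subtype(her2=1, er=100, pr=1, ki67=35),
    "35077-25": classify_subtype(her2=3, er=0, pr=0, ki67=60),
    "35091-25": classify_subtype(her2=1, er=50, pr=1, ki67=1),
}
for case_id, subtype in examples.items():
    print(case_id, subtype)
```

Checking HER2 first matters because the remaining four classes are only defined for HER2 0/1; among those four, the rules are mutually exclusive, so their order does not change the result.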

6.3 Setup

6.4 Load Data

 [1] "case_id"                "er_pre"                 "er_post"               
 [4] "pr_pre"                 "pr_post"                "her2_pre"              
 [7] "her2_post"              "ki67_pre"               "ki67_post"             
[10] "comment"                "pathologist"            "er_pre_cat"            
[13] "er_post_cat"            "pr_pre_cat"             "pr_post_cat"           
[16] "ki67_pre_cat20"         "ki67_post_cat20"        "ki67_pre_cat30"        
[19] "ki67_post_cat30"        "molecular_subtype_pre"  "molecular_subtype_post"
[22] "biopsy_type"           
# A tibble: 6 × 22
  case_id  er_pre er_post pr_pre pr_post her2_pre her2_post ki67_pre ki67_post
  <chr>     <dbl>   <dbl>  <dbl>   <dbl> <fct>    <fct>        <dbl>     <dbl>
1 34937-25     90      95     60      64 1        1                6         6
2 35131-25    100      97      1       2 1        1               35        35
3 35077-25      0       0      0       0 3        3               60        60
4 35091-25     50      40      1       1 1        0                1         5
5 34367-25    100      96     50      70 0        0               10        11
6 34681-25    100     100    100     100 1        1               23        17
# ℹ 13 more variables: comment <chr>, pathologist <chr>, er_pre_cat <fct>,
#   er_post_cat <fct>, pr_pre_cat <fct>, pr_post_cat <fct>,
#   ki67_pre_cat20 <fct>, ki67_post_cat20 <fct>, ki67_pre_cat30 <fct>,
#   ki67_post_cat30 <fct>, molecular_subtype_pre <chr>,
#   molecular_subtype_post <chr>, biopsy_type <fct>
[1] "Filtered to common cohort. Merged N: 1184"
[1] "Before NA removal: 1184"
[1] "After NA removal: 1037"
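The log above shows two filtering steps: restriction to a common cohort (1184 rows) and removal of rows with missing values (1037 rows, i.e. 147 rows dropped). The report does not show how either step is implemented. The Python sketch below is one plausible reading, assuming that "common cohort" means cases read by every pathologist; the function names and the toy rows are hypothetical.

```python
def common_cohort(rows, readers):
    """Keep rows whose case_id was read by every pathologist in `readers`."""
    seen = {}
    for row in rows:
        seen.setdefault(row["case_id"], set()).add(row["pathologist"])
    required = set(readers)
    keep = {case_id for case_id, who in seen.items() if who >= required}
    return [row for row in rows if row["case_id"] in keep]


def drop_incomplete(rows, columns):
    """Drop rows in which any of the listed columns is missing (None)."""
    return [row for row in rows if all(row.get(c) is not None for c in columns)]


# Hypothetical toy rows, not taken from the study data.
rows = [
    {"case_id": "c1", "pathologist": "P1", "er_pre": 90, "er_post": 95},
    {"case_id": "c1", "pathologist": "P2", "er_pre": 80, "er_post": None},
    {"case_id": "c2", "pathologist": "P1", "er_pre": 10, "er_post": 10},
]
cohort = common_cohort(rows, readers=["P1", "P2"])
complete = drop_incomplete(cohort, ["er_pre", "er_post"])
print("Merged N:", len(cohort))
print("After NA removal:", len(complete))
```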

6.5 Change in Interpretation (Continuous Variables)

Visualize the shift in ER, PR, and Ki67 scores for each pathologist.

6.6 Intra-observer Consistency (Non-inferiority Analysis)

Following the methodology of Abele et al. (2023), we assess the consistency between Pre-AI and Post-AI scoring. Abele et al. proposed a non-inferiority margin of 75% agreement for categorized variables when evaluating AI assistance tools in routine diagnostics. High consistency (>75%) suggests that AI assistance does not disrupt the diagnostic reliability.

Intra-Observer Consistency (Pre-AI vs Post-AI)
Benchmark: >75% Agreement indicates Non-inferiority (Abele et al. 2023)

Marker               Agreement Rate (%)  Benchmark  Status (>75%)
ER_Agreement_Rate    99.2                75         Non-inferior (Pass)
PR_Agreement_Rate    96.6                75         Non-inferior (Pass)
Ki67_Agreement_Rate  85.1                75         Non-inferior (Pass)
HER2_Agreement_Rate  94.2                75         Non-inferior (Pass)
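The agreement rate is presumably the share of paired readings whose pre-AI and post-AI categories coincide, compared against the 75% benchmark. A minimal Python sketch of that calculation follows; the function names are hypothetical, and the six HER2 pairs come from the data preview in 6.4, so the resulting rate is illustrative only and is not the study result.

```python
def agreement_rate(pre, post):
    """Percentage of paired readings with identical pre- and post-AI categories."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(pre, post))
    return 100.0 * matches / len(pre)


def is_non_inferior(rate, benchmark=75.0):
    """Apply the >75% agreement benchmark."""
    return rate > benchmark


# HER2 scores from the six preview rows (pre-AI, post-AI).
her2_pre = [1, 1, 3, 1, 0, 1]
her2_post = [1, 1, 3, 0, 0, 1]

rate = agreement_rate(her2_pre, her2_post)
print(f"{rate:.1f}% agreement, non-inferior: {is_non_inferior(rate)}")

# The four rates reported in the table all clear the benchmark.
reported = {"ER": 99.2, "PR": 96.6, "Ki67": 85.1, "HER2": 94.2}
all_pass = all(is_non_inferior(r) for r in reported.values())
```

For continuous markers (ER, PR, Ki67) the inputs would be the categorized columns rather than the raw percentages.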

6.7 Molecular Subtype Changes

Visualize transitions between molecular subtypes after AI assistance.

Molecular Subtype Transition Matrix
Rows = Pre-AI, Columns = Post-AI

Pre-AI Subtype         HER2 Positive  Hormone Weak Positive  Luminal A  Luminal B  Triple Negative
HER2 Positive          266            7                      11         22         4
Hormone Weak Positive  4              120                    9          20         0
Luminal A              7              5                      243        74         0
Luminal B              5              0                      3          116        0
Triple Negative        0              2                      0          0          119
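The transition matrix can be summarized numerically. The Python sketch below, using the counts exactly as tabulated, computes the overall share of readings whose subtype did not change (the diagonal) and the most frequent single transition; the variable names are illustrative.

```python
labels = [
    "HER2 Positive",
    "Hormone Weak Positive",
    "Luminal A",
    "Luminal B",
    "Triple Negative",
]
# Rows = pre-AI subtype, columns = post-AI subtype (same order as labels).
matrix = [
    [266, 7, 11, 22, 4],
    [4, 120, 9, 20, 0],
    [7, 5, 243, 74, 0],
    [5, 0, 3, 116, 0],
    [0, 2, 0, 0, 119],
]

n = len(labels)
total = sum(sum(row) for row in matrix)
unchanged = sum(matrix[i][i] for i in range(n))
overall_agreement = 100.0 * unchanged / total

# Share of each pre-AI subtype that kept its label after AI assistance.
retention = {labels[i]: 100.0 * matrix[i][i] / sum(matrix[i]) for i in range(n)}

# Most frequent off-diagonal transition.
count, src, dst = max(
    (matrix[i][j], labels[i], labels[j])
    for i in range(n)
    for j in range(n)
    if i != j
)

print(f"{unchanged}/{total} unchanged ({overall_agreement:.1f}%)")
print(f"Largest shift: {src} -> {dst} ({count} readings)")
```

The matrix total equals the 1037 rows left after NA removal. Of these, 864 (83.3%) keep their subtype, and the largest single shift is Luminal A to Luminal B (74 readings), which corresponds to Ki67 crossing the 30% cut-off in the definitions of 6.2.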

6.8 Change in Interpretation (HER2)

Visualize the shift in HER2 scores using Sankey diagrams.

6.9 Consensus Analysis

Did the variance among pathologists decrease after AI?
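One way to answer this question is to compute, for each case, the standard deviation of the pathologists' scores before and after AI and compare the two. The Python sketch below illustrates the idea; the function name and the toy Ki67 values are hypothetical and are not taken from the study data.

```python
from statistics import stdev


def sd_by_case(readings):
    """Return {case_id: (pre_sd, post_sd)} across pathologists.

    `readings` maps each case_id to a list of (pre, post) score pairs,
    one pair per pathologist.
    """
    summary = {}
    for case_id, pairs in readings.items():
        if len(pairs) < 2:
            continue  # SD is undefined for a single reader
        pre_sd = stdev([pre for pre, _ in pairs])
        post_sd = stdev([post for _, post in pairs])
        summary[case_id] = (pre_sd, post_sd)
    return summary


# Hypothetical Ki67 readings by four pathologists.
toy = {
    "case-A": [(10, 18), (20, 20), (30, 22), (40, 24)],  # readers converge
    "case-B": [(5, 2), (5, 5), (6, 8), (6, 11)],         # readers diverge
}

summary = sd_by_case(toy)
improved = [c for c, (pre_sd, post_sd) in summary.items() if post_sd < pre_sd]
share_improved = len(improved) / len(summary)
print(f"Cases with lower post-AI SD: {improved} ({share_improved:.0%})")
```

A paired test on the per-case SDs (for example a Wilcoxon signed-rank test) would be a natural follow-up, but it is not implemented here.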

6.10 Documenting Major Changes

Identify cases where the diagnosis changed significantly (e.g., HER2 1+ → 3+, or ER Negative → Positive).

Cases with Changed HER2 Scores

case_id   pathologist    her2_pre  her2_post  comment
35091-25  Pathologist 2  1         0          Does not detect all membranes
33813-25  Pathologist 2  0         1
33533-25  Pathologist 2  1         2
33089-25  Pathologist 2  1         0
31458-25  Pathologist 2  1         2
30130-25  Pathologist 2  0         1
24311-25  Pathologist 2  2         1
23916-25  Pathologist 2  2         1          ER, PR, HER2 absent
23259-25  Pathologist 2  0         1          Cannot recognize lobular carcinoma
22977-25  Pathologist 2  2         1
22026-25  Pathologist 2  1         0
20235-25  Pathologist 2  1         0          I could not calculate the 10%
20234-25  Pathologist 2  2         1
20256-25  Pathologist 2  2         3          ER is very difficult. Once HER2 was counted, I was convinced.
19362-25  Pathologist 2  1         0
19361-25  Pathologist 2  2         1
19258-25  Pathologist 2  1         2
18413-25  Pathologist 2  2         1
18012-25  Pathologist 2  2         1          PR: detects hemosiderin as positive
18217-25  Pathologist 2  2         1
16396-25  Pathologist 2  1         0
16497-25  Pathologist 2  1         0
15965-25  Pathologist 2  2         1
15840-25  Pathologist 2  1         2
14982-25  Pathologist 2  2         1
14987-25  Pathologist 2  2         1
12343-25  Pathologist 2  1         0
12068-25  Pathologist 2  2         1          In PR, does not pick up the faintly stained cells
12440-25  Pathologist 2  0         1
11632-25  Pathologist 2  0         1
10928-25  Pathologist 2  2         1
10676-25  Pathologist 2  2         1
10454-25  Pathologist 2  1         2
10157-25  Pathologist 2  2         1
9859-25   Pathologist 2  2         1
9930-25   Pathologist 2  3         2
9347-25   Pathologist 2  2         1
8135-25   Pathologist 2  1         2
7468-25   Pathologist 2  1         0
7291-25   Pathologist 2  2         1
7010-25   Pathologist 2  0         1
6706-25   Pathologist 2  1         2
6476-25   Pathologist 2  2         1
6444-25   Pathologist 2  0         1
5935-25   Pathologist 2  2         1
34681-25  Pathologist 1  0         1
30736-25  Pathologist 1  0         1
30767-25  Pathologist 1  0         1
29841-25  Pathologist 1  0         1
28310-25  Pathologist 1  0         1
27588-25  Pathologist 1  0         1
25564-25  Pathologist 1  2         1
22803-25  Pathologist 1  1         2
21551-25  Pathologist 1  1         2
15965-25  Pathologist 1  2         1
14933-25  Pathologist 1  0         1
14058-25  Pathologist 1  2         1
13833-25  Pathologist 1  2         1
13471-25  Pathologist 1  0         1
12343-25  Pathologist 1  0         1
12818-25  Pathologist 1  2         1
9859-25   Pathologist 1  2         1          Solid papillary carcinoma: should we take it as Tis?
7468-25   Pathologist 1  1         0
7291-25   Pathologist 1  2         1
7010-25   Pathologist 1  0         1
6706-25   Pathologist 1  1         2
6444-25   Pathologist 1  0         1
33533-25  Pathologist 4  1         2
30029-25  Pathologist 4  2         1          6
24311-25  Pathologist 4  2         1
22030-25  Pathologist 4  2         1
20823-25  Pathologist 4  2         1
20256-25  Pathologist 4  2         3
15840-25  Pathologist 4  1         2
14987-25  Pathologist 4  2         1          I would have called it score 2, but changed my mind when the AI reported about 3% moderate complete membranous staining.
8399-25   Pathologist 4  1         2          A case where, in routine practice, I would ask a colleague whether they would call it score 2; I was convinced when the AI said score 2.
7636-25   Pathologist 4  1         2
7291-25   Pathologist 4  2         1
31448-25  Pathologist 3  0         1
31485-25  Pathologist 3  0         1
30029-25  Pathologist 3  0         1
30017-25  Pathologist 3  0         1
22977-25  Pathologist 3  2         1
20234-25  Pathologist 3  2         1
19731-25  Pathologist 3  1         2
15655-25  Pathologist 3  2         1
14886-25  Pathologist 3  2         1
14987-25  Pathologist 3  2         1
14074-25  Pathologist 3  2         1
13833-25  Pathologist 3  2         1
12675-25  Pathologist 3  2         3
12366-25  Pathologist 3  2         1          t1
12068-25  Pathologist 3  2         1
10719-25  Pathologist 3  3         2
10183-25  Pathologist 3  0         1
9859-25   Pathologist 3  2         1
7291-25   Pathologist 3  2         1
6706-25   Pathologist 3  1         2
6444-25   Pathologist 3  0         1

6.11 Influence of AI by Pathologist Seniority

Evaluate if the magnitude of change (influence) varies by pathologist experience level.

Assumption: Pathologist 1 is the most senior and Pathologist 4 is the most junior (P1 → P2 → P3 → P4, from most to least experienced). Wu et al. (2023) found that junior pathologists benefited most from AI assistance, showing greater score modifications after viewing AI suggestions (Wu et al. 2023).

6.11.1 Statistical Tests

AI Influence by Seniority: Statistical Tests
Spearman ρ > 0 indicates junior pathologists are more influenced

Marker  Kruskal-Wallis p  Spearman ρ [1]  Spearman p  Median Δ (P1)  Median Δ (P4)  Wilcoxon p (P1 vs P4)
ER      <0.001            −0.398          <0.001      2.000          0.000          <0.001
PR      <0.001            −0.254          <0.001      2.000          0.000          <0.001
Ki67    <0.001            −0.084          0.007       7.000          5.000          <0.001

[1] Spearman ρ: correlation between seniority rank (1=most senior, 4=most junior) and absolute change magnitude
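To make the sign convention concrete, the Python sketch below computes Spearman's ρ between seniority rank (1 = most senior, 4 = most junior) and absolute change, as the Pearson correlation of tie-averaged ranks. It is a standard-library illustration with hypothetical function names and toy data; it does not compute p-values and is not the code that produced the table.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: two readings per pathologist, |post - pre| per reading.
seniority_rank = [1, 1, 2, 2, 3, 3, 4, 4]
abs_delta = [5, 4, 4, 3, 2, 2, 1, 0]

rho = spearman_rho(seniority_rank, abs_delta)
print(f"rho = {rho:.3f}")  # negative: larger changes among senior readers
```

With this coding, a negative ρ means that the absolute change shrinks as the rank number grows, i.e. the more senior pathologists change their scores more.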

6.11.2 Seniority Trend Visualization

6.11.3 Interpretation of Influence

  • Magnitude of Change (Delta): The “Absolute Change” represents how much the pathologist altered their score after seeing the AI result. A higher value indicates greater influence.
  • Seniority Gradient: The trend plot shows the relationship between experience level and AI influence across all four pathologists, not just the extremes.
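    • A positive Spearman ρ indicates that junior pathologists show greater AI influence, which would be consistent with the finding that less experienced pathologists benefit more from AI assistance (Wu et al. 2023). In the table in 6.11.1, ρ is negative for all three markers (−0.398 for ER, −0.254 for PR, −0.084 for Ki67), so under the seniority assumption above the more senior pathologists changed their scores more.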
    • If the trend is flat, seniority may not be a primary factor in AI adoption/influence.
  • Senior vs Junior: The boxplot compares the most senior (P1) and most junior (P4) pathologists directly.
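    • If P4 has a higher median delta, it suggests the junior pathologist is more influenced by the AI. In the table in 6.11.1 the median Δ is higher for P1 than for P4 (2 vs 0 for ER and PR, 7 vs 5 for Ki67).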
  • Statistical Significance: The Kruskal-Wallis test assesses overall differences, Spearman ρ tests the seniority gradient, and the Wilcoxon test provides the direct P1 vs P4 comparison.

6.12 Critical Clinical Changes

Evaluate cases where the change in score crosses a clinically significant threshold.

Thresholds:

  • ER/PR:
    • Negative: 0%
    • Low Positive: 1-9%
    • Positive: >= 10%
    • Critical Change: Moving between these categories.
  • Ki67:
    • Low: < 30%
    • High: >= 30%
    • Critical Change: Moving across the 30% threshold.
  • HER2:
    • Any Change: Any difference in score (0, 1, 2, 3) is considered critical.
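The thresholds above translate directly into per-marker flags. The Python sketch below applies them to a pair of pre-AI and post-AI readings; the function names and the dictionary layout are hypothetical. The first example uses case 35091-25 from the data preview in 6.4, and the second is an invented reading added to show a Ki67 crossing.

```python
def er_pr_category(pct):
    """ER/PR category: Negative (0%), Low Positive (1-9%), Positive (>= 10%)."""
    if pct == 0:
        return "Negative"
    return "Low Positive" if pct < 10 else "Positive"


def ki67_category(pct):
    """Ki67 category: Low (< 30%) or High (>= 30%)."""
    return "High" if pct >= 30 else "Low"


def critical_changes(pre, post):
    """Flag, per marker, whether the change crosses a listed threshold."""
    return {
        "er": er_pr_category(pre["er"]) != er_pr_category(post["er"]),
        "pr": er_pr_category(pre["pr"]) != er_pr_category(post["pr"]),
        "ki67": ki67_category(pre["ki67"]) != ki67_category(post["ki67"]),
        "her2": pre["her2"] != post["her2"],  # any HER2 change counts
    }


# Case 35091-25 from the data preview.
flags = critical_changes(
    pre={"er": 50, "pr": 1, "ki67": 1, "her2": 1},
    post={"er": 40, "pr": 1, "ki67": 5, "her2": 0},
)
print(flags)

# Hypothetical reading: Ki67 crosses 30% and ER drops into Low Positive.
flags_hypothetical = critical_changes(
    pre={"er": 12, "pr": 0, "ki67": 28, "her2": 2},
    post={"er": 8, "pr": 0, "ki67": 32, "her2": 2},
)
```

For case 35091-25 only the HER2 flag is raised: ER moves from 50% to 40% and Ki67 from 1% to 5%, but neither crosses a category boundary.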

6.13 Conclusion

6.13.1 Interpretation of Results

  • Shift in Scores:
    • In the scatter plots, points on the red dashed line indicate no change in the pathologist’s assessment after AI assistance.
    • Points above the line indicate that the post-AI score was higher than the pre-AI score; points below indicate that it was lower.
    • A systematic shift (e.g., most points above the line) would suggest that AI assistance tends to move pathologists toward higher scores.
  • Variance Reduction (Consensus):
    • The “Change in Inter-Pathologist Variance” plot shows the standard deviation (SD) of scores among pathologists for each case.
    • Points below the red line indicate that the variance decreased after using AI (Post-AI SD < Pre-AI SD). This means the pathologists became more consistent with each other (better consensus).
    • Points above the red line indicate increased variance (less consensus).
    • Ideally, the majority of cases fall below the red line, which would suggest that AI helps standardize the diagnosis. This is a key benefit highlighted in broad reviews of AI in breast pathology (Liu et al. 2023; Soliman, Li, and Parwani 2024; Ibrahim et al. 2020; Baxi et al. 2022; Niazi, Parwani, and Gurcan 2019).
  • Expert-AI Synergy:
    • Niazi et al. (2019) emphasized that the “Expert-AI combination” yields results that are more accurate and consistent than what an expert can do alone. They argue that AI should not be seen as a replacement but as a tool that enables pathologists to extract information beyond human limits (e.g., sub-visual features) and provides a safety mechanism for error prevention (Niazi, Parwani, and Gurcan 2019).
    • Shafi et al. (2022) validated this in a fully digital clinical workflow for ER assessment, demonstrating excellent concordance (93.8%) between automated analysis and manual scoring. Crucially, they highlighted that automated batch processing of biomarkers can significantly save time and labor while correctly identifying pitfalls like DCIS and benign glands (Shafi et al. 2022).
    • Soliman et al. (2024), in a comprehensive review, concluded that AI integration stands to improve diagnostic accuracy and reduce avoidable errors across the board. They emphasized AI’s capability to standardize the quantification of biomarkers (ER, PR, HER2, Ki-67) and mitigate interobserver variability, advocating for its use not just as a diagnostic aid but as a tool for “adaptive sampling” to handle large whole-slide images efficiently (Soliman, Li, and Parwani 2024).

6.14 Future Directions: The Technical Horizon

  • Near-Perfect Segmentation: The technical capabilities of AI are advancing rapidly. Begum & Kalaivani (2025) recently demonstrated a “Smart Neural Network” approach that achieved a Dice coefficient of ~99.9% for nuclei segmentation on the BreakHis dataset (Begum and Kalaivani 2025).

  • Implication: This level of technical precision suggests that the traditional “bottlenecks” of computer vision (e.g., inaccurate cell boundaries, overlapping nuclei) are being solved. The future challenge will shift from algorithm development to clinical integration, validation on diverse real-world datasets, and establishing trust with pathology teams.

  • HER2-Low Precision: With the advent of antibody-drug conjugates (ADCs) such as trastuzumab deruxtecan (DESTINY-Breast04), distinguishing HER2 score 0 from 1+ has become a critical clinical threshold. Albuquerque et al. (2025) report that AI is highly sensitive (0.97) for this task and could serve as an effective screening tool to ensure eligible patients are not missed, though expert review remains essential for final scoring (Albuquerque et al. 2025).

  • Major Diagnostic Changes:

    • The table “Cases with Changed HER2 Scores” highlights clinically significant disagreements between the pathologist’s initial review and the AI-assisted review.
    • Changes between HER2 Negative (0/1+) and Positive (3+) are critical as they directly impact treatment eligibility (e.g., Herceptin).
    • Changes involving Equivocal (2+) cases are also important, as they often trigger reflex testing (FISH).
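The distinctions drawn in these bullets can be applied to each row of the "Cases with Changed HER2 Scores" table. The Python sketch below groups a score change by clinical relevance, using the grouping stated above (0/1+ negative, 2+ equivocal, 3+ positive) and treating the remaining 0 to 1+ changes as the HER2-low boundary discussed earlier in this section. The function names and category labels are hypothetical.

```python
def her2_group(score):
    """Map a HER2 score (0-3) to Negative (0/1+), Equivocal (2+) or Positive (3+)."""
    if score <= 1:
        return "Negative"
    return "Equivocal" if score == 2 else "Positive"


def describe_her2_change(pre, post):
    """Label a pre-AI to post-AI HER2 change by its clinical relevance."""
    if pre == post:
        return "No change"
    groups = {her2_group(pre), her2_group(post)}
    if groups == {"Negative", "Positive"}:
        return "Negative <-> Positive"
    if "Equivocal" in groups:
        return "Involves equivocal (2+)"
    return "0 <-> 1+ (HER2-low boundary)"


# Three rows from the table of changed HER2 scores (Pathologist 2).
examples = {
    "20256-25": describe_her2_change(2, 3),
    "35091-25": describe_her2_change(1, 0),
    "9930-25": describe_her2_change(3, 2),
}
for case_id, label in examples.items():
    print(case_id, label)
```

In the table as listed, the five rows that involve a score of 3+ all move between 2+ and 3+; no listed row moves directly between 0/1+ and 3+.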

6.14.1 Challenges to Implementation

While the benefits of AI in standardization and efficiency are clear, broader implementation faces significant hurdles. Reis-Filho & Kather (2023) outlined the key challenges that must be overcome for widespread adoption:

  • Cultural Shift: The transition from the microscope to a digital screen represents a fundamental change in pathology practice that can meet resistance.
  • Quality Control: AI algorithms are sensitive to variations in tissue processing and staining (“garbage in, garbage out”). Ensuring that AI models generalize across different laboratories is a major technical hurdle.
  • Regulatory & Financial: The regulatory landscape for AI-based biomarkers is complex, and the financial cost of digitizing pathology departments (scanners, storage) remains a significant barrier without clear reimbursement models (Reis-Filho and Kather 2023).