AI for Breast Cancer Analysis

Authors and Affiliations

Fadime Gul Salman

Memorial Hospitals Group Department of Pathology

Murat Oktay

Memorial Hospitals Group Department of Pathology

Çiğdem Irkkan

Memorial Hospitals Group Department of Pathology

Fatma Aktepe

Memorial Hospitals Group Department of Pathology

Ilknur Turkmen

Memorial Hospitals Group Department of Pathology

Published

February 10, 2026

Introduction

Background

Breast cancer is the most common malignancy in women worldwide, and accurate assessment of biomarker expression is critical for treatment planning. Immunohistochemical (IHC) assessment of estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and the Ki67 proliferation index forms the foundation of breast cancer molecular classification and treatment decisions.

Despite standardized protocols, inter-observer variability in biomarker interpretation remains a significant challenge in diagnostic pathology. Studies have shown considerable disagreement among pathologists, particularly for:

  • Ki67 proliferation index: High variability due to scoring methodology and tumor heterogeneity
  • HER2 scoring: Challenges in distinguishing between scores 1+ and 2+, impacting reflex testing decisions
  • ER/PR low-positive cases: Difficulty in accurately assessing cases with 1-10% positivity

The Promise of Artificial Intelligence

Recent advances in artificial intelligence (AI) and digital pathology offer potential solutions to reduce inter-observer variability. AI algorithms can:

  • Provide objective, reproducible quantification of biomarker expression
  • Standardize assessment across different observers and institutions
  • Potentially improve diagnostic accuracy and consistency
  • Assist pathologists in challenging cases

However, the real-world impact of AI on pathologist behavior and diagnostic consensus remains unclear. Open questions include:

  1. Does AI truly improve inter-observer agreement?
  2. How do individual pathologists adopt and integrate AI recommendations?
  3. Which types of cases benefit most from AI assistance?
  4. Does AI introduce new biases or systematic errors?
  5. What is the clinical impact of AI-influenced diagnoses on treatment decisions?

Study Rationale

This validation study aims to comprehensively evaluate the impact of the Aiforia AI system on breast cancer biomarker assessment by pathologists. We hypothesize that:

  • AI assistance will improve inter-observer agreement for all biomarkers
  • AI will be particularly beneficial in borderline and difficult cases
  • Individual pathologists will show varying degrees of AI adoption
  • AI-assisted diagnosis will lead to meaningful changes in molecular classification and treatment recommendations

Study Design

Participants

Four board-certified pathologists with varying levels of experience independently evaluated breast cancer cases for ER, PR, HER2, and Ki67 expression.

Two-Phase Assessment

Phase 1 (Pre-AI): Each pathologist independently assessed all cases using standard microscopic evaluation and digital pathology workflows, recording their initial interpretations.

Phase 2 (Post-AI): The same pathologists re-evaluated the identical cases using the Aiforia AI platform, which provided AI-generated biomarker quantifications and recommendations. Pathologists then recorded their final interpretations, which could either align with or differ from the AI suggestions.

Biomarkers Assessed

  1. ER (Estrogen Receptor): Percentage of positive tumor cells (0-100%)
  2. PR (Progesterone Receptor): Percentage of positive tumor cells (0-100%)
  3. Ki67: Percentage of tumor cells with positive nuclear staining (0-100%)
  4. HER2: Score 0, 1+, 2+, or 3+ based on membrane staining pattern

Molecular Classification

Cases were classified into molecular subtypes using a hierarchical algorithm:

  1. HER2 Positive: HER2 Score 2+ or 3+ (regardless of other markers)
  2. Luminal A: HER2 (0/1+) AND ER > 10% AND PR > 10% AND Ki67 < 30%
  3. Luminal B: HER2 (0/1+) AND ER > 10% AND Ki67 ≥ 30%
  4. Hormone Weak Positive: HER2 (0/1+) AND (ER > 0% OR PR > 0%) AND not meeting the Luminal A or Luminal B criteria
  5. Triple Negative: HER2 (0/1+) AND ER = 0% AND PR = 0%
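
The hierarchy above can be sketched as a first-match-wins function. This is a minimal illustration of the rules as listed, not the study's actual implementation; the function name, argument names, and the string encoding of HER2 scores are assumptions made for the example.

```python
# Hypothetical helper: names and HER2 string encoding are illustrative assumptions.
VALID_HER2 = {"0", "1+", "2+", "3+"}


def classify_subtype(er: float, pr: float, ki67: float, her2: str) -> str:
    """Return the molecular subtype; rules are checked in order, first match wins.

    er, pr and ki67 are percentages (0-100); her2 is "0", "1+", "2+" or "3+".
    """
    if her2 not in VALID_HER2:
        raise ValueError(f"unexpected HER2 score: {her2!r}")
    # Rule 1: HER2 2+ or 3+ overrides all other markers.
    if her2 in ("2+", "3+"):
        return "HER2 Positive"
    # From here on HER2 is 0 or 1+.
    if er > 10 and pr > 10 and ki67 < 30:
        return "Luminal A"
    if er > 10 and ki67 >= 30:
        return "Luminal B"
    if er > 0 or pr > 0:
        return "Hormone Weak Positive"
    return "Triple Negative"


if __name__ == "__main__":
    print(classify_subtype(er=90, pr=80, ki67=10, her2="0"))
    print(classify_subtype(er=50, pr=5, ki67=10, her2="1+"))
```

Because the rules are applied in order, a case with ER 50%, PR 5%, and Ki67 10% meets neither the Luminal A criteria (PR is not above 10%) nor the Luminal B criteria (Ki67 is below 30%), so under the rules as written it falls through to Hormone Weak Positive.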

Analytical Approach

This study employs a multi-faceted analytical strategy:

Inter-Observer Agreement Analysis

  • Intraclass Correlation Coefficient (ICC) for continuous markers
  • Fleiss’ Kappa for categorical markers
  • Comparison of pre-AI vs post-AI agreement metrics
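
As a rough sketch of these agreement metrics, the block below computes Fleiss' kappa for categorical scores (such as HER2) and an intraclass correlation for percentage scores (ER, PR, Ki67) in plain Python. The report does not state which ICC form is used; ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption made here, and the function names and data layout (one row per case, one column per pathologist) are illustrative.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of labels (one per rater)."""
    n_cases, n_raters = len(ratings), len(ratings[0])
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every case needs the same number of raters")
    counts = [Counter(row) for row in ratings]
    categories = {label for row in ratings for label in row}
    # Observed agreement: share of agreeing rater pairs, averaged over cases.
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_cases
    # Chance agreement from the overall category proportions.
    p_e = sum(
        (sum(cnt[cat] for cnt in counts) / (n_cases * n_raters)) ** 2
        for cat in categories
    )
    if p_e == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def icc_2_1(scores):
    """ICC(2,1) from a two-way ANOVA decomposition (assumed ICC form)."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Illustrative values only, not study data.
    her2_pre = [["1+", "2+", "1+", "2+"], ["0", "0", "1+", "0"], ["3+", "3+", "3+", "3+"]]
    her2_post = [["2+", "2+", "2+", "2+"], ["0", "0", "0", "0"], ["3+", "3+", "3+", "3+"]]
    print(fleiss_kappa(her2_pre), fleiss_kappa(her2_post))
```

Running each function once on the pre-AI matrix and once on the post-AI matrix gives the paired metrics that the pre/post comparison needs.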

Impact of AI on Individual Assessments

  • Magnitude and direction of changes after AI input
  • Individual pathologist adoption patterns
  • Cases where AI improved vs worsened consensus

Discordance Analysis

  • Identification of high-variance cases
  • Cases where AI increased disagreement
  • Marker-specific discordance patterns
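
One simple way to operationalize the discordance items above is to compare the spread of the four readings per case before and after AI. The sketch below uses the sample standard deviation across pathologists as the spread measure; that choice, the dictionary layout, and the function names are assumptions for illustration rather than the study's stated method.

```python
from statistics import stdev


def disagreement_shift(pre, post):
    """Per-case change in across-pathologist SD (post minus pre).

    pre and post map a case id to the list of readings, one per pathologist.
    """
    return {case_id: stdev(post[case_id]) - stdev(pre[case_id]) for case_id in pre}


def cases_with_more_disagreement(pre, post, min_increase=0.0):
    """Case ids whose spread grew by more than min_increase after AI."""
    shift = disagreement_shift(pre, post)
    return sorted(cid for cid, delta in shift.items() if delta > min_increase)


if __name__ == "__main__":
    # Illustrative Ki67 readings (percent), not study data.
    ki67_pre = {"case_01": [10, 20, 30, 40], "case_02": [10, 30, 50, 70]}
    ki67_post = {"case_01": [10, 30, 50, 70], "case_02": [38, 40, 40, 42]}
    print(cases_with_more_disagreement(ki67_pre, ki67_post))
```

Sorting the same shift values in descending order of pre-AI spread would give a list of high-variance cases for manual review.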

Statistical Validation

  • Bootstrap confidence intervals for agreement metrics
  • Paired tests for systematic changes
  • Mixed-effects models accounting for the nested data structure
  • Variance component analysis
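
A percentile bootstrap is one common way to obtain confidence intervals for agreement metrics. The sketch below resamples whole cases with replacement, so that the four readings of a case stay together; the function names, the default of 2000 resamples, and the example statistic (mean across-rater range) are assumptions for illustration. Any agreement statistic that takes a list of cases could be passed in instead.

```python
import random


def bootstrap_ci(cases, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for statistic(cases), resampling cases as units."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(cases)
    estimates = sorted(
        statistic([cases[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = max(0, round(n_boot * alpha / 2))
    hi_idx = min(n_boot - 1, round(n_boot * (1 - alpha / 2)) - 1)
    return estimates[lo_idx], estimates[hi_idx]


def mean_range(cases):
    """Example statistic: average max-minus-min across raters per case."""
    return sum(max(c) - min(c) for c in cases) / len(cases)


if __name__ == "__main__":
    # Illustrative Ki67 readings (percent), four pathologists per case.
    ki67 = [[10, 20, 15, 12], [40, 42, 41, 39], [5, 30, 10, 20], [60, 61, 59, 60]]
    print(bootstrap_ci(ki67, mean_range))
```

Resampling at the case level rather than at the level of individual readings keeps the within-case dependence between pathologists intact in every bootstrap replicate.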

Clinical Impact Assessment

  • Treatment decision changes (endocrine therapy, HER2-targeted therapy, chemotherapy)
  • Molecular subtype reclassification rates
  • Impact on FISH testing requirements
  • Cost-effectiveness implications
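
Two of these measures reduce to simple counts. The sketch below computes a subtype reclassification rate and the net change in the number of cases that would be sent for FISH. It assumes that a HER2 score of 2+ is what triggers reflex FISH testing; the report does not spell this rule out, so the `reflex_score` parameter and both function names are hypothetical.

```python
def reclassification_rate(pre_subtypes, post_subtypes):
    """Fraction of readings whose molecular subtype changed after AI."""
    if len(pre_subtypes) != len(post_subtypes):
        raise ValueError("pre and post lists must be the same length")
    changed = sum(a != b for a, b in zip(pre_subtypes, post_subtypes))
    return changed / len(pre_subtypes)


def fish_requirement_change(pre_her2, post_her2, reflex_score="2+"):
    """Net change in readings needing reflex FISH (assumed trigger: HER2 2+)."""
    pre_n = sum(score == reflex_score for score in pre_her2)
    post_n = sum(score == reflex_score for score in post_her2)
    return post_n - pre_n


if __name__ == "__main__":
    # Illustrative values only, not study data.
    pre = ["Luminal A", "Luminal B", "Triple Negative", "Luminal A"]
    post = ["Luminal A", "Luminal A", "Triple Negative", "Luminal B"]
    print(reclassification_rate(pre, post))
    print(fish_requirement_change(["1+", "2+", "2+", "0"], ["1+", "1+", "2+", "0"]))
```

A negative value from `fish_requirement_change` means fewer readings would require reflex testing after AI input, which is the quantity a cost estimate would build on.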

Systematic Bias Detection

  • Overall AI bias (over/underestimation)
  • Range-dependent and threshold effects
  • Regression to the mean analysis
  • Bland-Altman plots
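
The quantities behind a Bland-Altman plot are the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 standard deviations of the differences. The sketch below computes them for paired readings; treating the pairs as AI value versus pathologist value is an assumption about how the comparison is set up, and the function name is illustrative.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Bias and 95% limits of agreement for paired measurements (a minus b)."""
    if len(method_a) != len(method_b):
        raise ValueError("paired lists must be the same length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "means": means,  # x-axis of the plot
        "diffs": diffs,  # y-axis of the plot
    }


if __name__ == "__main__":
    # Illustrative Ki67 values (percent), not study data.
    ai_values = [12, 22, 35, 41]
    pathologist_values = [10, 20, 30, 40]
    result = bland_altman(ai_values, pathologist_values)
    print(result["bias"], result["lower"], result["upper"])
```

A positive bias here would indicate that the AI values run higher than the pathologist readings on average. Plotting `diffs` against `means` shows whether that gap depends on the expression range, which is the range-dependent effect listed above.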

Individual Performance Metrics

  • Intra-pathologist consistency
  • AI adoption indices
  • Agreement with group consensus
  • Learning effects over time
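
The report does not define the AI adoption index at this point. One plausible definition, used here purely as an assumption, is the fraction of the gap between the pre-AI reading and the AI value that the pathologist closed in the final reading: 0 means the reading did not move, 1 means the AI value was adopted in full. The function names below are hypothetical.

```python
def adoption_index(pre, ai, post):
    """Share of the pre-to-AI gap closed by the final reading.

    Returns None when the pre-AI reading already equals the AI value,
    because there is no gap to close and the ratio is undefined.
    """
    if ai == pre:
        return None
    return (post - pre) / (ai - pre)


def mean_adoption(readings):
    """Average adoption index over (pre, ai, post) triples, skipping undefined ones."""
    values = [adoption_index(pre, ai, post) for pre, ai, post in readings]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Illustrative Ki67 triples (percent) for one pathologist, not study data.
    triples = [(20, 30, 25), (40, 20, 20), (10, 10, 10)]
    print(mean_adoption(triples))
```

Values below 0 or above 1 are possible under this definition (the final reading moved away from the AI value, or overshot it), and may be worth reporting separately rather than clipping.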

Subgroup Analysis

  • Performance stratified by molecular subtype
  • HER2 status subgroups
  • Borderline cases near clinical thresholds
  • Triple negative and Luminal A/B differentiation
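
Borderline cases can be flagged as those where the pathologists' readings fall on both sides of a decision threshold. The sketch below applies this to the Ki67 cut-off of 30% and the ER cut-off of 10% from the classification rules; the function names and the dictionary layout are assumptions for illustration.

```python
def straddles_threshold(readings, threshold):
    """True if at least one reading is below and one is at or above the threshold."""
    return min(readings) < threshold <= max(readings)


def borderline_cases(readings_by_case, threshold):
    """Case ids whose readings disagree about which side of the threshold they are on."""
    return sorted(
        case_id
        for case_id, readings in readings_by_case.items()
        if straddles_threshold(readings, threshold)
    )


if __name__ == "__main__":
    # Illustrative Ki67 readings (percent), four pathologists per case.
    ki67 = {
        "case_01": [25, 28, 32, 35],
        "case_02": [10, 12, 15, 20],
        "case_03": [30, 45, 50, 40],
    }
    print(borderline_cases(ki67, threshold=30))
```

The comparison follows the Ki67 rule as written (below 30% versus 30% or above). For rules that use a strict "greater than" cut-off, such as ER > 10%, the boundary handling would need to be adjusted accordingly.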

Study Objectives

Primary Objectives

  1. Determine if AI assistance significantly improves inter-observer agreement for breast cancer biomarkers
  2. Quantify the clinical impact of AI on treatment-relevant diagnostic classifications

Secondary Objectives

  1. Characterize individual pathologist adoption patterns and AI influence
  2. Identify case types and scenarios where AI provides maximum benefit
  3. Detect and quantify systematic biases introduced by AI
  4. Evaluate AI performance across different molecular subtypes and clinical scenarios
  5. Assess the real-world feasibility and workflow integration of AI-assisted diagnosis

Significance

This study provides comprehensive, real-world evidence on:

  • The actual impact of AI on diagnostic reproducibility in breast pathology
  • Individual variation in AI adoption and integration
  • Clinical scenarios where AI adds most value
  • Potential pitfalls and systematic biases to monitor
  • Evidence-based recommendations for AI implementation in diagnostic pathology

The findings will inform best practices for AI integration, quality assurance protocols, and training requirements for AI-assisted breast cancer diagnosis.

Data Source

Study data and materials are available at: https://drive.google.com/drive/folders/177V9w4XYFs6UjU2qQDpmHZFSeX_RWTP_?usp=sharing


Note: This is a validation study of the Aiforia AI system for breast cancer biomarker assessment. All pathologists are anonymized as Pathologist 1-4 throughout this report.