AI for Breast Cancer Analysis

Authors

Affiliations

Fadime Gul Salman

Memorial Hospitals Group Department of Pathology

Murat Oktay

Memorial Hospitals Group Department of Pathology

Çiğdem Irkkan

Memorial Hospitals Group Department of Pathology

Serdar Balcı

Memorial Hospitals Group Department of Pathology

Fatma Aktepe

Memorial Hospitals Group Department of Pathology

Ilknur Turkmen

Memorial Hospitals Group Department of Pathology

Published

February 10, 2026

Introduction

Background

Breast cancer is the most common malignancy in women worldwide, and accurate assessment of biomarker expression is critical for treatment planning. Immunohistochemical (IHC) assessment of estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and the Ki67 proliferation index forms the foundation of breast cancer molecular classification and treatment decisions.

Despite standardized protocols, inter-observer variability in biomarker interpretation remains a significant challenge in diagnostic pathology. Studies have shown considerable disagreement among pathologists, particularly for:

Ki67 proliferation index: High variability due to scoring methodology and tumor heterogeneity
HER2 scoring: Challenges in distinguishing between scores 1+ and 2+, impacting reflex testing decisions
ER/PR low-positive cases: Difficulty in accurately assessing cases with 1-10% positivity

The Promise of Artificial Intelligence

Recent advances in artificial intelligence (AI) and digital pathology offer potential solutions to reduce inter-observer variability. AI algorithms can:

Provide objective, reproducible quantification of biomarker expression
Standardize assessment across different observers and institutions
Potentially improve diagnostic accuracy and consistency
Assist pathologists in challenging cases

However, the real-world impact of AI on pathologist behavior and diagnostic consensus remains unclear. Questions include:

Does AI truly improve inter-observer agreement?
How do individual pathologists adopt and integrate AI recommendations?
Which types of cases benefit most from AI assistance?
Does AI introduce new biases or systematic errors?
What is the clinical impact of AI-influenced diagnoses on treatment decisions?

Study Rationale

This validation study aims to comprehensively evaluate the impact of the Aiforia AI system on pathologists' assessment of breast cancer biomarkers. We hypothesize that:

AI assistance will improve inter-observer agreement for all biomarkers
AI will be particularly beneficial in borderline and difficult cases
Individual pathologists will show varying degrees of AI adoption
AI-assisted diagnosis will lead to meaningful changes in molecular classification and treatment recommendations

Study Design

Participants

Four board-certified pathologists with varying levels of experience independently evaluated breast cancer cases for ER, PR, HER2, and Ki67 expression.

Two-Phase Assessment

Phase 1 (Pre-AI): Each pathologist independently assessed all cases using standard microscopic evaluation and digital pathology workflows, recording their initial interpretations.

Phase 2 (Post-AI): The same pathologists re-evaluated the identical cases using the Aiforia AI platform, which provided AI-generated biomarker quantifications and recommendations. Pathologists then recorded their final interpretations, which could either align with or differ from the AI suggestions.

Biomarkers Assessed

ER (Estrogen Receptor): Percentage of positive tumor cells (0-100%)
PR (Progesterone Receptor): Percentage of positive tumor cells (0-100%)
Ki67: Percentage of tumor cells with positive nuclear staining (0-100%)
HER2: Score 0, 1+, 2+, or 3+ based on membrane staining pattern

Molecular Classification

Cases were classified into molecular subtypes using a hierarchical algorithm, in which the rules below are applied in order and the first matching rule determines the subtype:

HER2 Positive: HER2 Score 2+ or 3+ (regardless of other markers)
Luminal A: HER2 (0/1+) AND ER > 10% AND PR > 10% AND Ki67 < 30%
Luminal B: HER2 (0/1+) AND ER > 10% AND Ki67 ≥ 30%
Hormone Weak Positive: HER2 (0/1+) AND (ER > 0 OR PR > 0) AND (Not Luminal A/B)
Triple Negative: HER2 (0/1+) AND ER = 0 AND PR = 0
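
The hierarchy above can be written as an ordered chain of checks in which the first matching rule wins. The following is a minimal Python sketch, not the study's actual code; the function name classify_subtype and the string encoding of HER2 scores ("0", "1+", "2+", "3+") are assumptions made for illustration.

```python
def classify_subtype(er: float, pr: float, ki67: float, her2: str) -> str:
    """Assign a molecular subtype using the hierarchical rules listed above.

    er, pr, ki67 are percentages (0-100); her2 is "0", "1+", "2+", or "3+".
    The function name and input encoding are illustrative assumptions.
    """
    if her2 not in {"0", "1+", "2+", "3+"}:
        raise ValueError(f"unknown HER2 score: {her2!r}")
    # Rule 1: HER2 2+ or 3+ overrides every other marker.
    if her2 in {"2+", "3+"}:
        return "HER2 Positive"
    # All remaining cases are HER2 0 or 1+.
    if er > 10 and pr > 10 and ki67 < 30:
        return "Luminal A"
    if er > 10 and ki67 >= 30:
        return "Luminal B"
    # Any residual hormone receptor expression that is not Luminal A/B.
    if er > 0 or pr > 0:
        return "Hormone Weak Positive"
    return "Triple Negative"
```

Under the rules as written, a HER2 0/1+ case with ER > 10%, PR of 10% or less, and Ki67 < 30% matches neither luminal rule and falls through to Hormone Weak Positive.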

Analytical Approach

This study employs a multi-faceted analytical strategy:

Inter-Observer Agreement Analysis

Intraclass Correlation Coefficient (ICC) for continuous markers
Fleiss’ Kappa for categorical markers
Comparison of pre-AI vs post-AI agreement metrics
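
As an illustration of the two agreement metrics, the sketch below computes Fleiss' kappa and one form of the ICC in plain Python. The report does not state which ICC form is used; ICC(2,1) (two-way random effects, absolute agreement, single rater) is assumed here, and the function names and input layout (one list of ratings per case, one entry per pathologist) are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of labels (one per rater)."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    category_totals = Counter()
    observed = 0.0
    for case in ratings:
        counts = Counter(case)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this case.
        observed += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = observed / n_cases
    total = n_cases * n_raters
    # Chance agreement from the overall category proportions.
    p_e = sum((c / total) ** 2 for c in category_totals.values())
    if p_e == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)


def icc_2_1(scores):
    """ICC(2,1) (assumed form): two-way random, absolute agreement, single rater.

    scores: list of cases, each a list of numeric ratings (one per rater).
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Running each function once on the Phase 1 ratings and once on the Phase 2 ratings gives the pre-AI and post-AI values to compare. In practice an established statistics package would likely be preferable to hand-written formulas.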

Impact of AI on Individual Assessments

Magnitude and direction of changes after AI input
Individual pathologist adoption patterns
Cases where AI improved vs worsened consensus

Discordance Analysis

Identification of high-variance cases
Cases where AI increased disagreement
Marker-specific discordance patterns

Statistical Validation

Bootstrap confidence intervals for agreement metrics
Paired tests for systematic changes
Mixed effects models accounting for nested structure
Variance component analysis
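
A bootstrap confidence interval for an agreement metric can be sketched by resampling whole cases with replacement, so that each pathologist's ratings of a case stay together. The percentile method, the default of 2000 replicates, and all names below are assumptions for illustration; the report does not specify these details.

```python
import random


def full_agreement_rate(cases):
    """Share of cases in which every rater gave the same label (example statistic)."""
    return sum(1 for case in cases if len(set(case)) == 1) / len(cases)


def bootstrap_ci(cases, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic`, resampling cases.

    cases: list of per-case rating lists; statistic: function of such a list.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(cases)
    replicates = []
    for _ in range(n_boot):
        # Resample cases (not individual ratings) with replacement.
        sample = [cases[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(sample))
    replicates.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return replicates[lo_idx], replicates[hi_idx]
```

Any agreement statistic that accepts the same case-by-rater layout can be passed in place of full_agreement_rate. Statistics that can be undefined on a resample (for example, kappa when only one category is drawn) would need extra handling that this sketch omits.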

Clinical Impact Assessment

Treatment decision changes (endocrine therapy, HER2-targeted therapy, chemotherapy)
Molecular subtype reclassification rates
Impact on FISH testing requirements
Cost-effectiveness implications
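
Two of these measures reduce to simple counts over paired pre-AI and post-AI labels. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a HER2 score of 2+ is the score that triggers reflex FISH testing.

```python
def reclassification_rate(pre_labels, post_labels):
    """Fraction of cases whose molecular subtype changed between phases."""
    if len(pre_labels) != len(post_labels) or not pre_labels:
        raise ValueError("need two non-empty label lists of equal length")
    changed = sum(1 for a, b in zip(pre_labels, post_labels) if a != b)
    return changed / len(pre_labels)


def fish_reflex_change(pre_her2, post_her2):
    """Net change in the number of cases scored HER2 2+.

    Assumption: a 2+ score is what triggers reflex FISH testing, so a
    negative result means fewer FISH tests after AI assistance.
    """
    if len(pre_her2) != len(post_her2):
        raise ValueError("need HER2 score lists of equal length")
    return sum(s == "2+" for s in post_her2) - sum(s == "2+" for s in pre_her2)
```

Both functions expect the two lists to describe the same cases in the same order, for one pathologist at a time.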

Systematic Bias Detection

Overall AI bias (over/underestimation)
Range-dependent and threshold effects
Regression to the mean analysis
Bland-Altman plots
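
The quantities behind a Bland-Altman plot are the per-case difference between two measurements, plotted against their mean, together with the mean difference (bias) and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). A minimal sketch follows, assuming the comparison is between AI values and a pathologist's values for a continuous marker; the function and key names are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(ai_values, pathologist_values):
    """Bias and 95% limits of agreement for paired continuous measurements.

    Differences are computed as AI minus pathologist, so a positive bias
    means the AI reads higher on average.
    """
    if len(ai_values) != len(pathologist_values) or len(ai_values) < 2:
        raise ValueError("need two lists of equal length with at least 2 cases")
    diffs = [a - p for a, p in zip(ai_values, pathologist_values)]
    means = [(a + p) / 2 for a, p in zip(ai_values, pathologist_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "points": list(zip(means, diffs)),  # (x, y) pairs for the scatter plot
    }
```

A trend in the plotted points across the x-axis would suggest range-dependent bias, which a single bias value cannot capture.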

Individual Performance Metrics

Intra-pathologist consistency
AI adoption indices
Agreement with group consensus
Learning effects over time
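
The report does not define the AI adoption index at this point. One possible definition, offered purely as an assumption, is the fraction of the gap between a pathologist's pre-AI value and the AI value that the final reading closed: 0 means the pathologist kept the original value, 1 means the AI value was adopted in full. All names in the sketch are hypothetical.

```python
def adoption_index(pre, ai, post):
    """Hypothetical adoption index for one reading of a continuous marker.

    Returns (post - pre) / (ai - pre), or None when the pathologist and the
    AI already agreed and there was no gap to close.
    """
    gap = ai - pre
    if gap == 0:
        return None
    return (post - pre) / gap


def mean_adoption(readings):
    """Average adoption index over (pre, ai, post) triples, skipping undefined ones."""
    values = [adoption_index(pre, ai, post) for pre, ai, post in readings]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None
```

Values below 0 or above 1 are possible under this definition (moving away from the AI value, or past it), and small gaps make the ratio unstable, so any real analysis would need to decide how to treat such cases.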

Subgroup Analysis

Performance stratified by molecular subtype
HER2 status subgroups
Borderline cases near clinical thresholds
Triple negative and Luminal A/B differentiation

Study Objectives

Primary Objectives

Determine if AI assistance significantly improves inter-observer agreement for breast cancer biomarkers
Quantify the clinical impact of AI on treatment-relevant diagnostic classifications

Secondary Objectives

Characterize individual pathologist adoption patterns and AI influence
Identify case types and scenarios where AI provides maximum benefit
Detect and quantify systematic biases introduced by AI
Evaluate AI performance across different molecular subtypes and clinical scenarios
Assess the real-world feasibility and workflow integration of AI-assisted diagnosis

Significance

This study provides comprehensive, real-world evidence on:

The actual impact of AI on diagnostic reproducibility in breast pathology
Individual variation in AI adoption and integration
Clinical scenarios where AI adds most value
Potential pitfalls and systematic biases to monitor
Evidence-based recommendations for AI implementation in diagnostic pathology

The findings will inform best practices for AI integration, quality assurance protocols, and training requirements for AI-assisted breast cancer diagnosis.

Data Source

Study data and materials are available at: https://drive.google.com/drive/folders/177V9w4XYFs6UjU2qQDpmHZFSeX_RWTP_?usp=sharing

Note: This is a validation study of the Aiforia AI system for breast cancer biomarker assessment. All pathologists are anonymized as Pathologist 1-4 throughout this report.