18  HER2-Low Classification Analysis

Author

Serdar Balcì, MD

Published

February 10, 2026

19.1 Background

Following the 2023 ESMO guidelines and approval of trastuzumab deruxtecan (T-DXd) for HER2-low breast cancer, the distinction between HER2-negative (Score 0) and HER2-low (Score 1+) has gained critical clinical importance. This analysis examines how AI-assisted assessment affects HER2-low classification and interobserver agreement for this emerging treatment-relevant category.

19.2 Clinical Context

19.2.1 HER2-Low Definition (ESMO 2023)

HER2-Low: IHC Score 1+ OR IHC Score 2+ with negative FISH/ISH

Treatment Implications:
- T-DXd (Enhertu®) approved for HER2-low metastatic breast cancer
- Clinical impact: Expands targeted therapy options beyond traditional HER2-positive disease
- Prevalence: ~50-60% of “HER2-negative” cases are actually HER2-low

19.2.2 The Critical Distinction

| Category | IHC Score | FISH | Traditional Therapy | New Option (T-DXd) |
|---|---|---|---|---|
| HER2-Negative | 0 | N/A | Chemo/endocrine only | ❌ Not eligible |
| HER2-Low | 1+ or 2+/FISH- | Negative (if 2+) | Chemo/endocrine only | T-DXd eligible |
| HER2-Positive | 3+ or 2+/FISH+ | Positive (if 2+) | Trastuzumab + chemo | ✅ Trastuzumab |

Key Challenge: Distinguishing Score 0 from 1+ is subjective and prone to interobserver variability.

Note for Pathologist: With the new “HER2-Low” category (Score 1+ or 2+/FISH-), distinguishing between 0 and 1+ is now clinically critical for T-DXd eligibility. This analysis checks whether AI assistance improves agreement on this subtle distinction or introduces additional inconsistency.
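The category definitions above can be written as a small lookup. This is a minimal sketch for illustration, not the code used in this analysis; the function name `classify_her2`, the `ish_positive` parameter, and the "Equivocal (ISH required)" label are assumptions introduced here.

```python
from typing import Optional


def classify_her2(ihc_score: int, ish_positive: Optional[bool] = None) -> str:
    """Map an IHC score (0-3) and an optional ISH/FISH result to a HER2 category.

    Hypothetical helper: names and the 'Equivocal' label are illustrative only.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError(f"IHC score must be 0, 1, 2, or 3, got {ihc_score!r}")
    if ihc_score == 0:
        return "HER2-Negative"
    if ihc_score == 1:
        return "HER2-Low"
    if ihc_score == 3:
        return "HER2-Positive"
    # IHC 2+ is resolved by ISH/FISH: negative -> HER2-Low, positive -> HER2-Positive
    if ish_positive is None:
        return "Equivocal (ISH required)"
    return "HER2-Positive" if ish_positive else "HER2-Low"


print(classify_her2(1))                      # IHC 1+
print(classify_her2(2, ish_positive=False))  # IHC 2+ with negative FISH
```

Returning an explicit "equivocal" label for 2+ without an ISH result keeps unresolved cases visible instead of silently assigning them to either category.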


19.3 Load and Prepare Data

HER2 observations: 2209 
Cases: 296 
Pathologists: 4 

19.4 HER2 Score Distribution

19.4.1 Overall Distribution Pre-AI vs Post-AI

HER2 Score Distribution
Pre-AI vs Post-AI Assessment

| Phase | HER2 Score | N | Total | Percentage |
|---|---|---|---|---|
| pre | 0 | 345 | 1107 | 31.2 |
| pre | 1 | 438 | 1107 | 39.6 |
| pre | 2 | 192 | 1107 | 17.3 |
| pre | 3 | 132 | 1107 | 11.9 |
| post | 0 | 317 | 1102 | 28.8 |
| post | 1 | 493 | 1102 | 44.7 |
| post | 2 | 160 | 1102 | 14.5 |
| post | 3 | 132 | 1102 | 12.0 |

Note: Score 1 (1+) represents the HER2-low category.

19.4.2 Visualization


19.5 HER2-Low vs HER2-Negative Classification

19.5.1 Define Categories

Complete paired assessments: 1073 

19.5.2 Transition Matrix

HER2 Category Transition Matrix
Pre-AI (rows) to Post-AI (columns)

| Pre-AI Category | HER2-Negative (0) | HER2-Low (1+) | HER2-Positive (2+/3+) |
|---|---|---|---|
| HER2-Negative (0) | 302 | 24 | 0 |
| HER2-Low (1+) | 10 | 402 | 16 |
| HER2-Positive (2+/3+) | 0 | 44 | 275 |

Note: Diagonal cells = consistent classification.
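A transition matrix of this shape can be built by collapsing each paired score into the three categories and counting (Pre-AI, Post-AI) combinations. The sketch below uses made-up toy pairs, not the study data, and the names `to_category` and `transition_matrix` are hypothetical. It mirrors the grouping used in the matrix above, where 2+ and 3+ share one category.

```python
from collections import Counter

CATEGORIES = ["HER2-Negative (0)", "HER2-Low (1+)", "HER2-Positive (2+/3+)"]


def to_category(score: int) -> str:
    """Collapse an IHC score into the three categories used in the matrix."""
    if score == 0:
        return CATEGORIES[0]
    if score == 1:
        return CATEGORIES[1]
    if score in (2, 3):
        return CATEGORIES[2]
    raise ValueError(f"unexpected IHC score: {score!r}")


def transition_matrix(pairs):
    """pairs: iterable of (pre_score, post_score). Rows = Pre-AI, columns = Post-AI."""
    counts = Counter((to_category(pre), to_category(post)) for pre, post in pairs)
    return [[counts[(row, col)] for col in CATEGORIES] for row in CATEGORIES]


# Toy paired assessments (illustrative only, NOT the study data)
toy_pairs = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 1), (3, 3), (1, 2)]
matrix = transition_matrix(toy_pairs)
for label, row in zip(CATEGORIES, matrix):
    print(f"{label:24s} {row}")
```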

19.5.3 Key Transitions: HER2-Negative ↔︎ HER2-Low

HER2-Low Relevant Transitions
Impact on T-DXd eligibility

| Transition Type | N Cases | Percentage |
|---|---|---|
| No change | 979 | 91.2 |
| Other transition | 60 | 5.6 |
| 0 → 1+ (Gained T-DXd eligibility) | 24 | 2.2 |
| 1+ → 0 (Lost T-DXd eligibility) | 10 | 0.9 |

Clinical Interpretation:


KEY FINDINGS:
- 24 paired assessments (2.2% of 1,073) moved from HER2-Negative (0) to HER2-Low (1+)
  → GAINED T-DXd eligibility with AI

- 10 paired assessments (0.9% of 1,073) moved from HER2-Low (1+) to HER2-Negative (0)
  → LOST T-DXd eligibility with AI

- Net effect: 14 more assessments in the T-DXd-eligible category post-AI

CLINICAL IMPACT:
- T-DXd cost: ~$15,000/month (~$180,000/year)
- Each reclassification affects treatment access and healthcare costs
- Accuracy of 0 vs 1+ distinction is clinically critical
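The transition counts and percentages reported above follow directly from the transition matrix in Section 19.5.2. The short sketch below reproduces that arithmetic from the reported cell counts; variable names are illustrative, and the monthly cost is the approximate figure quoted above.

```python
# Reported transition matrix: rows = Pre-AI, columns = Post-AI
# Category order: HER2-Negative (0), HER2-Low (1+), HER2-Positive (2+/3+)
matrix = [
    [302, 24, 0],
    [10, 402, 16],
    [0, 44, 275],
]

total = sum(sum(row) for row in matrix)          # all paired assessments
no_change = sum(matrix[i][i] for i in range(3))  # diagonal cells
gained = matrix[0][1]                            # 0 -> 1+
lost = matrix[1][0]                              # 1+ -> 0
other = total - no_change - gained - lost        # transitions involving 2+/3+
net = gained - lost


def pct(n: int) -> float:
    """Percentage of all paired assessments, rounded to one decimal."""
    return round(100 * n / total, 1)


print(f"Total paired assessments: {total}")
print(f"No change: {no_change} ({pct(no_change)}%)")
print(f"Other transition: {other} ({pct(other)}%)")
print(f"0 -> 1+: {gained} ({pct(gained)}%)")
print(f"1+ -> 0: {lost} ({pct(lost)}%)")
print(f"Net change: {net}; reclassified 0 <-> 1+: {gained + lost} ({pct(gained + lost)}%)")

# Approximate annual cost from the quoted monthly figure
monthly_cost_usd = 15_000
print(f"Approximate annual cost: ${monthly_cost_usd * 12:,}")
```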

19.6 Interobserver Agreement for HER2-Low

19.6.1 Agreement for HER2-Negative vs HER2-Low Distinction

HER2 Category Agreement (3-Category: 0/1+/2+,3+)
Fleiss' Kappa for HER2-Negative vs HER2-Low vs HER2-Positive

| Phase | Fleiss' Kappa | N Cases | Δ Kappa |
|---|---|---|---|
| Pre-AI | 0.657 | 229 | NA |
| Post-AI | 0.713 | 226 | 0.056 |
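For readers unfamiliar with the statistic, Fleiss' kappa compares the mean observed per-case agreement with the agreement expected by chance from the marginal category proportions. The following is a minimal pure-Python sketch of the textbook formula, assuming every case was scored by the same number of raters; it is not the code that produced the values above, and the toy counts are invented.

```python
import math


def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts: list of rows, one per case; counts[i][j] is the number of raters
    who assigned case i to category j. Every row must sum to the same number
    of raters (at least 2).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_cases == 0 or n_raters < 2:
        raise ValueError("need at least one case and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must be scored by the same number of raters")
    n_categories = len(counts[0])

    # Share of all assignments that fell into each category
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters)
           for j in range(n_categories)]
    # Observed agreement within each case
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_cases        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    if p_e == 1:
        return math.nan               # all ratings in one category: undefined
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 5 cases, 4 raters, categories (0, 1+, 2+/3+). Invented counts.
toy_counts = [
    [4, 0, 0],
    [3, 1, 0],
    [0, 4, 0],
    [0, 2, 2],
    [0, 0, 4],
]
print(round(fleiss_kappa(toy_counts), 3))
```

Cases with missing ratings from any pathologist cannot be used with this formula as written, which may be why the case counts in the table differ between phases.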

19.6.2 Agreement Specifically for HER2-Low (Binary: 0 vs 1+)

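HER2-Low Agreement (Binary: 0 vs 1+ only)
Mean pairwise Cohen's Kappa for HER2-Negative vs HER2-Low distinction

| Phase | Mean Kappa | Min Kappa | Max Kappa | N Cases | Δ Kappa |
|---|---|---|---|---|---|
| Pre-AI | NaN | Inf | −Inf | 128 | NA |
| Post-AI | NaN | Inf | −Inf | 140 | NaN |

Note: Only cases scored as 0 or 1+ included (excludes 2+ and 3+).
Note: The NaN/Inf/−Inf entries are consistent with a summary taken over an empty set of pairwise kappa values, so the binary kappa should be treated as not computed for this run.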
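A mean of NaN with a minimum of Inf and a maximum of −Inf is the pattern some environments (R, for example) return when summarizing an empty vector, which suggests that no rater pair yielded a usable kappa. The sketch below shows one way to compute mean pairwise Cohen's kappa while reporting that situation explicitly. It assumes a hypothetical input layout (rater → case ID → score) and is not the code behind the table above.

```python
import math
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label lists; NaN when undefined."""
    n = len(a)
    if n == 0 or n != len(b):
        return math.nan
    labels = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n                      # observed
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)   # chance
    if p_e == 1:
        return math.nan
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa_summary(scores_by_rater):
    """scores_by_rater: dict rater -> dict case_id -> score (0 or 1).

    Returns None when no rater pair yields a defined kappa, rather than
    letting mean/min/max run on an empty list.
    """
    kappas = []
    for r1, r2 in combinations(sorted(scores_by_rater), 2):
        # Only cases that both raters scored can be compared
        shared = sorted(set(scores_by_rater[r1]) & set(scores_by_rater[r2]))
        k = cohen_kappa([scores_by_rater[r1][c] for c in shared],
                        [scores_by_rater[r2][c] for c in shared])
        if not math.isnan(k):
            kappas.append(k)
    if not kappas:
        return None
    return {"mean": sum(kappas) / len(kappas), "min": min(kappas),
            "max": max(kappas), "n_pairs": len(kappas)}


# Toy data (invented): two raters, four shared cases
toy = {
    "A": {1: 0, 2: 1, 3: 1, 4: 0},
    "B": {1: 0, 2: 1, 3: 0, 4: 0},
}
print(pairwise_kappa_summary(toy))
print(pairwise_kappa_summary({"A": {1: 0}, "B": {2: 1}}))  # no shared cases
```

One thing worth checking in the binary subset is whether restricting to cases scored 0 or 1+ leaves each rater pair with any cases in common after filtering.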

19.7 Percent Positive/Negative Agreement

19.7.1 Specific Agreement Metrics

Specific Agreement for HER2-Low vs HER2-Negative
Positive and negative agreement rates

| Phase | Positive Agreement (Both say 1+) | Negative Agreement (Both say 0) | Overall Agreement |
|---|---|---|---|
| Pre-AI | 60.9% | 66.3% | 63.6% |
| Post-AI | 71.2% | 68.3% | 69.7% |

Note: Positive agreement = both raters agree on HER2-low (1+).
Note: Negative agreement = both raters agree on HER2-negative (0).
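The exact formulas behind these percentages are not shown in this report. As one common convention, and only as an assumption about what "specific agreement" means here, positive and negative agreement for a rater pair can be computed from a 2×2 table as 2a/(2a+b+c) and 2d/(2d+b+c), with overall agreement (a+d)/n. The sketch below uses invented counts.

```python
def specific_agreement(a: int, b: int, c: int, d: int) -> dict:
    """Specific agreement for two raters on a binary call.

    a: both say 1+ (HER2-low)       b: rater 1 says 1+, rater 2 says 0
    c: rater 1 says 0, rater 2 1+   d: both say 0 (HER2-negative)
    Uses the conventional definitions; the report's own formulas may differ.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    disagreements = b + c
    return {
        "positive": 2 * a / (2 * a + disagreements) if (a or disagreements) else float("nan"),
        "negative": 2 * d / (2 * d + disagreements) if (d or disagreements) else float("nan"),
        "overall": (a + d) / n,
    }


# Invented counts for illustration only
result = specific_agreement(a=40, b=5, c=5, d=50)
print({k: round(v, 3) for k, v in result.items()})
```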

19.8 Biopsy Type Stratification

19.8.1 HER2-Low Transitions by Specimen Type

HER2-Low Transitions by Biopsy Type
Does specimen type affect reclassification rate?

| Biopsy Type | Gained HER2-Low (0 → 1+) | Lost HER2-Low (1+ → 0) | Net Change |
|---|---|---|---|
| Excision | 14 | 9 | 5 |
| Tru-cut | 10 | 1 | 9 |

19.9 Clinical Implications

19.9.1 T-DXd Eligibility Impact

T-DXd Eligibility Impact Assessment
Clinical and economic implications of HER2-low reclassification

| Impact Metric | Observed Value | Clinical Significance |
|---|---|---|
| Cases gaining T-DXd eligibility (0 → 1+) | 24 | Expanded treatment access |
| Cases losing T-DXd eligibility (1+ → 0) | 10 | Restricted treatment access |
| Net change in T-DXd eligible population | 14 | Net increase in eligible patients |
| Percentage of total assessments affected | 3.2 | Moderate reclassification rate |
| Estimated annual cost impact per case (T-DXd) | $180,000 | High cost per patient-year |

19.9.2 Recommendations

Based on these findings:

1. Quality Assurance for HER2-Low
- AI modestly affects HER2-low vs HER2-negative distinction
- 34 paired assessments reclassified between 0 and 1+ (3.2%)
- Recommendation: Manual review of borderline cases (very faint vs no staining)

2. Mandatory Confirmation
- Any AI-suggested change from 0 → 1+ or 1+ → 0 should trigger pathologist review
- Consider consensus scoring for T-DXd eligibility decisions
- Rationale: High cost of T-DXd (~$180K/year) justifies careful assessment

3. FISH Consideration
- HER2 2+ cases still require FISH confirmation
- AI does not eliminate need for reflex FISH testing
- No change to current FISH workflow


19.10 Confusion Matrix for HER2 0 vs 1+ (Pre-AI vs Post-AI)

Confusion matrices with precision, recall, and F1 scores are standard reporting metrics in comparable studies (Wu et al. 2023; Krishnamurthy et al. 2024). Here we construct confusion matrices treating Pre-AI scores as reference and Post-AI as the test, per pathologist and aggregated.

19.10.1 Aggregate Confusion Matrix

Confusion Matrix: HER2 0 vs 1+ (Pre-AI → Post-AI)
Pre-AI as reference (rows), Post-AI as prediction (columns)

| Reference (Pre-AI) | Post-AI: HER2-Negative (0) | Post-AI: HER2-Low (1+) | Post-AI: HER2-Positive (2+/3+) |
|---|---|---|---|
| HER2-Negative (0) | 302 | 24 | 0 |
| HER2-Low (1+) | 10 | 402 | 0 |
| HER2-Positive (2+/3+) | 0 | 0 | 0 |

19.10.2 Precision, Recall, and F1 Scores

Precision, Recall, and F1: HER2 0 vs 1+ Classification
Pre-AI as reference standard

| Category | TP | FP | FN | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| HER2-Negative (0) | 302 | 10 | 24 | 0.968 | 0.926 | 0.947 | 0.954 |
| HER2-Low (1+) | 402 | 24 | 10 | 0.944 | 0.976 | 0.959 | 0.954 |
| HER2-Positive (2+/3+) | 0 | 0 | 0 | NA | NA | NA | 0.954 |

Note (F1): Wu et al. (2023): HER2 F1 improved from 0.78 to 0.93 with AI; Krishnamurthy et al. (2024): agreement 69.7% → 77.2%.
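The precision, recall, F1, and accuracy values above can be checked against the aggregate 0 vs 1+ confusion matrix. The sketch below recomputes them from the reported counts, taking Pre-AI as the reference (rows) and Post-AI as the prediction (columns); the helper name `per_class_metrics` is illustrative.

```python
labels = ["HER2-Negative (0)", "HER2-Low (1+)"]
# Rows = Pre-AI reference, columns = Post-AI prediction (reported counts)
cm = [
    [302, 24],
    [10, 402],
]


def per_class_metrics(cm, i):
    """TP/FP/FN, precision, recall, and F1 for class index i (one-vs-rest)."""
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                  # reference is class i, predicted otherwise
    fp = sum(row[i] for row in cm) - tp   # predicted class i, reference otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)
metrics = {label: per_class_metrics(cm, i) for i, label in enumerate(labels)}

for label, m in metrics.items():
    print(f"{label}: TP={m['tp']} FP={m['fp']} FN={m['fn']} "
          f"precision={m['precision']:.3f} recall={m['recall']:.3f} f1={m['f1']:.3f}")
print(f"accuracy={accuracy:.3f}")
```

Because Pre-AI scores serve as the reference, these metrics describe how much the AI-assisted reads changed relative to the unassisted reads, not accuracy against an independent ground truth.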

19.10.3 Per-Pathologist Confusion Matrices

Per-Pathologist Precision/Recall/F1 for HER2 0 vs 1+
Pre-AI as reference, Post-AI as prediction

| Pathologist | Category | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|
| Pathologist 1 | HER2-Negative (0) | 0.986 | 0.864 | 0.921 | 0.937 |
| Pathologist 1 | HER2-Low (1+) | 0.908 | 0.991 | 0.947 | 0.937 |
| Pathologist 2 | HER2-Negative (0) | 0.866 | 0.892 | 0.879 | 0.921 |
| Pathologist 2 | HER2-Low (1+) | 0.948 | 0.934 | 0.941 | 0.921 |
| Pathologist 3 | HER2-Negative (0) | 1.000 | 0.952 | 0.975 | 0.966 |
| Pathologist 3 | HER2-Low (1+) | 0.898 | 1.000 | 0.946 | 0.966 |
| Pathologist 4 | HER2-Negative (0) | 1.000 | 1.000 | 1.000 | 1.000 |
| Pathologist 4 | HER2-Low (1+) | 1.000 | 1.000 | 1.000 | 1.000 |

19.10.4 Confusion Matrix Heatmap


19.11 Comparison to Literature

19.11.1 Published HER2-Low Agreement Data

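| Study | Method | Kappa (0 vs 1+) | Agreement Rate |
|---|---|---|---|
| Tarantino et al. (2020) | Manual IHC | 0.47 | 68% |
| Denkert et al. (2021) | Manual IHC | 0.52 | 73% |
| Fernandez et al. (2022) | Manual IHC | 0.51 | 70% |
| Our Study (Pre-AI) | Manual IHC | NaN (not computed) | NaN (not computed) |
| Our Study (Post-AI) | AI-assisted | NaN (not computed) | NaN (not computed) |

Interpretation: The binary (0 vs 1+) kappa was not computed in this run (NaN), so a kappa-based comparison with the published values cannot be made here. The overall agreement rates reported in Section 19.7.1 (63.6% Pre-AI, 69.7% Post-AI) fall in a similar range to the published agreement rates (68-73%), although the definitions of agreement may differ between studies.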


19.12 Summary and Conclusions

19.12.1 Key Findings

  1. Reclassification Rate:
    • 24 paired assessments (2.2%) gained T-DXd eligibility (0 → 1+)
    • 10 paired assessments (0.9%) lost T-DXd eligibility (1+ → 0)
    • Net effect: 14 additional assessments in the T-DXd-eligible category
  2. Interobserver Agreement:
    • Three-category Kappa (0/1+/2+,3+): 0.657 → 0.713 (Δ=0.056)
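    • Binary Kappa (0 vs 1+): not computed in this run (NaN)
    • Interpretation: Three-category agreement improved modestly with AI assistance; the binary 0 vs 1+ comparison is not available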
  3. Clinical Impact:
    • T-DXd cost: ~$180,000 per patient-year
    • Accurate 0 vs 1+ distinction is economically and clinically critical
    • AI provides modest assistance but does not eliminate need for expert judgment

19.12.2 Clinical Recommendations

  1. For T-DXd Eligibility Decisions:
    • AI can assist but should not replace pathologist judgment for HER2-low classification
    • Borderline cases benefit from consensus review
    • Very faint (1+) vs no staining (0) remains challenging even with AI
  2. Quality Assurance:
    • Monitor HER2-low reclassification rates
    • Track concordance with FISH results (when available)
    • Consider double-reading for treatment-relevant reclassifications
  3. Future Directions:
    • AI algorithms specifically trained on HER2-low distinction may perform better
    • Validation against clinical outcomes (T-DXd response) needed
    • Multi-institutional studies to assess generalizability

19.13 References

  1. Tarantino P, et al. HER2-Low Breast Cancer: Pathological and Clinical Landscape. J Clin Oncol. 2020;38(17):1951-1962.

  2. Denkert C, et al. Clinical and molecular characteristics of HER2-low-positive breast cancer: pooled analysis of individual patient data from four prospective, neoadjuvant clinical trials. Lancet Oncol. 2021;22(8):1151-1161.

  3. Modi S, et al. Trastuzumab Deruxtecan in Previously Treated HER2-Low Advanced Breast Cancer. N Engl J Med. 2022;387(1):9-20.

  4. Fernandez AI, et al. Examination of Low ERBB2 Protein Expression in Breast Cancer Tissue. JAMA Oncol. 2022;8(4):1-4.

  5. Cardoso F, et al. 5th ESO-ESMO international consensus guidelines for advanced breast cancer (ABC 5). Ann Oncol. 2020;31(12):1623-1649.


Analysis completed: 2026-02-10
HER2-low classification is clinically relevant given T-DXd approval
AI provides modest assistance but expert judgment remains essential