19 HER2-Low Classification Analysis – AI for Breast Cancer Analysis

19.1 Background

Following the 2023 ESMO guidelines and approval of trastuzumab deruxtecan (T-DXd) for HER2-low breast cancer, the distinction between HER2-negative (Score 0) and HER2-low (Score 1+) has gained critical clinical importance. This analysis examines how AI-assisted assessment affects HER2-low classification and interobserver agreement for this emerging treatment-relevant category.

19.2 Clinical Context

19.2.1 HER2-Low Definition (ESMO 2023)

HER2-Low: IHC Score 1+ OR IHC Score 2+ with negative FISH/ISH

Treatment Implications:
- T-DXd (Enhertu®) approved for HER2-low metastatic breast cancer
- Clinical impact: Expands targeted therapy options beyond traditional HER2-positive disease
- Prevalence: ~50-60% of “HER2-negative” cases are actually HER2-low

19.2.2 The Critical Distinction

Category	IHC Score	FISH	Traditional Therapy	New Option (T-DXd)
HER2-Negative	0	N/A	Chemo/endocrine only	❌ Not eligible
HER2-Low	1+ or 2+/FISH-	Negative (if 2+)	Chemo/endocrine only	✅ T-DXd eligible
HER2-Positive	3+ or 2+/FISH+	Positive (if 2+)	Trastuzumab + chemo	✅ Trastuzumab

Key Challenge: Distinguishing Score 0 from 1+ is subjective and prone to interobserver variability.

Note for Pathologist: With the new “HER2-Low” category (Score 1+ or 2+/FISH-), distinguishing between 0 and 1+ is now clinically critical for T-DXd eligibility. This analysis checks whether AI assistance improves our agreement on this subtle distinction or makes it worse.
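The category rules in the table above can be written as a small lookup. This is a minimal sketch, not the code used for this analysis: the function name `her2_category`, the `ish_positive` parameter, and the "Equivocal (ISH required)" label for an IHC 2+ score without an ISH result are assumptions made for illustration.

```python
from typing import Optional


def her2_category(ihc_score: int, ish_positive: Optional[bool] = None) -> str:
    """Map an IHC score (0-3) and an optional FISH/ISH result to a HER2 category.

    Hypothetical helper: mirrors the table in Section 19.2.2.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError(f"IHC score must be 0, 1, 2 or 3, got {ihc_score!r}")
    if ihc_score == 0:
        return "HER2-Negative"
    if ihc_score == 1:
        return "HER2-Low"
    if ihc_score == 3:
        return "HER2-Positive"
    # IHC 2+ depends on the FISH/ISH result.
    if ish_positive is None:
        return "Equivocal (ISH required)"  # assumed label, not from the report
    return "HER2-Positive" if ish_positive else "HER2-Low"


if __name__ == "__main__":
    for score, ish in [(0, None), (1, None), (2, False), (2, True), (3, None)]:
        print(score, ish, her2_category(score, ish))
```

Keeping the 2+ case explicit makes it visible that an IHC-only analysis cannot place 2+ scores into HER2-low or HER2-positive without the FISH/ISH result.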

19.3 Load and Prepare Data

HER2 observations: 2209

Cases: 296

Pathologists: 4

19.4 HER2 Score Distribution

19.4.1 Overall Distribution Pre-AI vs Post-AI

HER2 Score Distribution
Pre-AI vs Post-AI Assessment
Phase	HER2 Score¹	N	Total	Percentage (%)
pre	0	345	1107	31.2
pre	1	438	1107	39.6
pre	2	192	1107	17.3
pre	3	132	1107	11.9
post	0	317	1102	28.8
post	1	493	1102	44.7
post	2	160	1102	14.5
post	3	132	1102	12.0
¹ Score 1 (IHC 1+) represents the HER2-low category
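The percentages in this table are each score's count divided by the phase total. A minimal sketch that reproduces them from the counts above; the names `counts` and `score_distribution` are illustrative and not taken from the original analysis code.

```python
# Counts per phase and IHC score, copied from the distribution table.
counts = {
    "pre": {0: 345, 1: 438, 2: 192, 3: 132},
    "post": {0: 317, 1: 493, 2: 160, 3: 132},
}


def score_distribution(phase_counts: dict) -> dict:
    """Return the percentage of observations per score, rounded to one decimal."""
    total = sum(phase_counts.values())
    return {score: round(100 * n / total, 1) for score, n in phase_counts.items()}


if __name__ == "__main__":
    for phase, phase_counts in counts.items():
        print(phase, sum(phase_counts.values()), score_distribution(phase_counts))
```

The two phase totals (1107 and 1102) add up to the 2209 observations reported in Section 19.3.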

19.4.2 Visualization

19.5 HER2-Low vs HER2-Negative Classification

19.5.1 Define Categories

Complete paired assessments: 1073

19.5.2 Transition Matrix

HER2 Category Transition Matrix¹
Pre-AI (rows) to Post-AI (columns)
Pre-AI Category	HER2-Negative (0)	HER2-Low (1+)	HER2-Positive (2+/3+)
HER2-Negative (0)	302	24	0
HER2-Low (1+)	10	402	16
HER2-Positive (2+/3+)	0	44	275
¹ Diagonal cells = consistent classification

19.5.3 Key Transitions: HER2-Negative ↔︎ HER2-Low

HER2-Low Relevant Transitions
Impact on T-DXd eligibility
Transition Type	N Paired Assessments	Percentage (%)
No change	979	91.2
Other transition	60	5.6
0 → 1+ (Gained T-DXd eligibility)	24	2.2
1+ → 0 (Lost T-DXd eligibility)	10	0.9
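Each row of this summary can be derived from the transition matrix in Section 19.5.2: the diagonal gives "No change", the two cells between 0 and 1+ give the gained and lost counts, and the remainder is "Other transition". A minimal sketch under that reading; the dictionary keys `neg`, `low`, `pos` and the function name are illustrative.

```python
# Transition matrix from Section 19.5.2: matrix[pre_ai][post_ai] = count.
matrix = {
    "neg": {"neg": 302, "low": 24, "pos": 0},
    "low": {"neg": 10, "low": 402, "pos": 16},
    "pos": {"neg": 0, "low": 44, "pos": 275},
}


def summarize_transitions(m: dict) -> dict:
    """Collapse a 3x3 transition matrix into the four summary rows."""
    total = sum(sum(row.values()) for row in m.values())
    no_change = sum(m[c][c] for c in m)   # diagonal cells
    gained = m["neg"]["low"]              # 0 -> 1+
    lost = m["low"]["neg"]                # 1+ -> 0
    other = total - no_change - gained - lost
    pct = lambda n: round(100 * n / total, 1)
    return {
        "total": total,
        "no_change": (no_change, pct(no_change)),
        "other": (other, pct(other)),
        "gained": (gained, pct(gained)),
        "lost": (lost, pct(lost)),
        "net": gained - lost,
    }


if __name__ == "__main__":
    print(summarize_transitions(matrix))
```

The total of 1073 matches the number of complete paired assessments reported in Section 19.5.1.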

Clinical Interpretation:


KEY FINDINGS:
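- 24 paired assessments (2.2%) transitioned from HER2-Negative (0) to HER2-Low (1+)
  → GAINED T-DXd eligibility with AI

- 10 paired assessments (0.9%) transitioned from HER2-Low (1+) to HER2-Negative (0)
  → LOST T-DXd eligibility with AI

- Net effect: 14 more assessments classified as T-DXd eligible post-AI
  (out of 1073 paired assessments; each case was read by up to 4 pathologists, so this is not a count of unique patients)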

CLINICAL IMPACT:
- T-DXd cost: ~$15,000/month (~$180,000/year)
- Each reclassification affects treatment access and healthcare costs
- Accuracy of 0 vs 1+ distinction is clinically critical

19.6 Interobserver Agreement for HER2-Low

19.6.1 Agreement for HER2-Negative vs HER2-Low Distinction

HER2 Category Agreement (3-Category: 0/1+/2+,3+)
Fleiss' Kappa for HER2-Negative vs HER2-Low vs HER2-Positive
Phase	Fleiss' Kappa	N Cases	Δ Kappa
Pre-AI	0.657	229	NA
Post-AI	0.713	226	0.056
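For readers who want to see how a Fleiss' kappa of this kind is computed, the following is a minimal sketch of the standard formula. It assumes every included case was scored by the same number of raters (the N Cases column being smaller than the 296 cases suggests a complete-case restriction, but that is an inference). The function name and the toy ratings are illustrative, not the study data.

```python
from collections import Counter


def fleiss_kappa(ratings: list, categories: list) -> float:
    """Fleiss' kappa for a list of cases, each a list of one label per rater.

    Assumes the same number of raters (at least 2) for every case.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    label_totals = Counter()
    per_case_agreement = 0.0
    for case in ratings:
        case_counts = Counter(case)
        label_totals.update(case_counts)
        # Proportion of agreeing rater pairs for this case.
        per_case_agreement += (
            sum(v * v for v in case_counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))

    p_bar = per_case_agreement / n_cases
    # Chance agreement from the overall label proportions.
    p_e = sum((label_totals[c] / (n_cases * n_raters)) ** 2 for c in categories)
    if p_e == 1.0:
        return float("nan")  # all ratings in one category: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    cats = ["0", "1+", "2+/3+"]
    toy = [
        ["0", "0", "0", "0"],
        ["1+", "1+", "1+", "0"],
        ["2+/3+", "2+/3+", "2+/3+", "2+/3+"],
        ["0", "1+", "1+", "1+"],
    ]
    print(round(fleiss_kappa(toy, cats), 3))
```

Δ Kappa in the table is simply the post-AI value minus the pre-AI value (0.713 − 0.657 = 0.056).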

19.6.2 Agreement Specifically for HER2-Low (Binary: 0 vs 1+)

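HER2-Low Agreement (Binary: 0 vs 1+ only)¹
Mean pairwise Cohen's Kappa for HER2-Negative vs HER2-Low distinction
Phase	Mean Kappa	Min Kappa	Max Kappa	N Cases	Δ Kappa
Pre-AI	NaN	Inf	−Inf	128	NA
Post-AI	NaN	Inf	−Inf	140	NaN
¹ Only cases scored as 0 or 1+ included (excludes 2+ and 3+)

Note: The binary kappa summary is not available in this run. A mean of NaN together with a minimum of Inf and a maximum of −Inf is the pattern typically produced when an empty set of pairwise kappa values is summarised, so these cells should be read as missing results, not as evidence of zero agreement.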
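A mean pairwise Cohen's kappa restricted to 0 and 1+ scores can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the input layout (`scores_by_rater` mapping rater to case to score), the rule that a case is used for a pair only if both raters scored it 0 or 1, and the choice to return `None` when no pair yields a valid kappa are all assumptions. The last choice avoids summarising an empty list, which is the situation that yields NaN/Inf/−Inf values like those in the table.

```python
from itertools import combinations
from statistics import mean


def cohen_kappa(a: list, b: list):
    """Cohen's kappa for two equal-length label lists; None if undefined."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in set(a) | set(b))
    if p_e == 1.0:
        return None  # both raters used a single identical label
    return (p_o - p_e) / (1 - p_e)


def pairwise_binary_kappa(scores_by_rater: dict):
    """Summarise Cohen's kappa over all rater pairs, using only 0/1+ scores."""
    kappas = []
    for r1, r2 in combinations(sorted(scores_by_rater), 2):
        s1, s2 = scores_by_rater[r1], scores_by_rater[r2]
        shared = [c for c in s1 if c in s2 and s1[c] in (0, 1) and s2[c] in (0, 1)]
        if not shared:
            continue
        k = cohen_kappa([s1[c] for c in shared], [s2[c] for c in shared])
        if k is not None:
            kappas.append(k)
    if not kappas:
        return None  # report "not available" instead of NaN/Inf
    return {"mean": mean(kappas), "min": min(kappas), "max": max(kappas),
            "n_pairs": len(kappas)}


if __name__ == "__main__":
    toy = {
        "P1": {"c1": 0, "c2": 1, "c3": 1, "c4": 2},
        "P2": {"c1": 0, "c2": 1, "c3": 0, "c4": 2},
    }
    print(pairwise_binary_kappa(toy))
```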

19.7 Percent Positive/Negative Agreement

19.7.1 Specific Agreement Metrics

Specific Agreement for HER2-Low vs HER2-Negative
Positive and negative agreement rates
Phase	Positive Agreement (Both say 1+)¹	Negative Agreement (Both say 0)²	Overall Agreement
Pre-AI	60.9%	66.3%	63.6%
Post-AI	71.2%	68.3%	69.7%
¹ Positive agreement = both raters agree on HER2-low (1+)
² Negative agreement = both raters agree on HER2-negative (0)
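The report does not state the exact formulas behind these rates. One common convention for a two-rater 2×2 table defines positive agreement as 2a / (2a + b + c) and negative agreement as 2d / (2d + b + c), where a and d are the concordant 1+ and 0 counts and b + c the discordant count. The sketch below uses that convention as an assumption; it may differ from how the values in the table were obtained, and the function name and example counts are illustrative.

```python
def specific_agreement(both_low: int, both_neg: int, discordant: int) -> dict:
    """Positive, negative and overall agreement for a two-rater 2x2 table.

    both_low   : pairs where both raters scored 1+   (a)
    both_neg   : pairs where both raters scored 0    (d)
    discordant : pairs where the raters disagreed    (b + c)
    """
    n = both_low + both_neg + discordant
    if n == 0:
        raise ValueError("no rating pairs supplied")
    pos_den = 2 * both_low + discordant
    neg_den = 2 * both_neg + discordant
    return {
        "positive": 2 * both_low / pos_den if pos_den else None,
        "negative": 2 * both_neg / neg_den if neg_den else None,
        "overall": (both_low + both_neg) / n,
    }


if __name__ == "__main__":
    # Illustrative counts only, not taken from the study.
    print(specific_agreement(both_low=40, both_neg=50, discordant=10))
```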

19.8 Biopsy Type Stratification

19.8.1 HER2-Low Transitions by Specimen Type

HER2-Low Transitions by Biopsy Type
Does specimen type affect reclassification rate?
Biopsy Type	Gained HER2-Low (0 → 1+)	Lost HER2-Low (1+ → 0)	Net Change
Excision	14	9	5
Tru-cut	10	1	9

19.9 Clinical Implications

19.9.1 T-DXd Eligibility Impact

T-DXd Eligibility Impact Assessment
Clinical and economic implications of HER2-low reclassification
Impact Metric	Observed Value	Clinical Significance
Paired assessments gaining T-DXd eligibility (0 → 1+)	24	Expanded treatment access
Paired assessments losing T-DXd eligibility (1+ → 0)	10	Restricted treatment access
Net change in T-DXd eligible assessments	14	Net increase in eligible classifications
Percentage of total paired assessments affected (%)	3.2	Moderate reclassification rate
Estimated annual T-DXd cost per treated patient	$180,000	High cost per patient-year
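The values in this table follow from three inputs reported earlier: the gained and lost counts, the 1073 complete paired assessments, and the approximate monthly T-DXd cost of $15,000. A minimal sketch of that arithmetic; the function and key names are illustrative.

```python
def eligibility_impact(gained: int, lost: int, total_paired: int,
                       monthly_cost_usd: int) -> dict:
    """Derive the impact metrics from reclassification counts and monthly cost."""
    return {
        "net_change": gained - lost,
        "pct_affected": round(100 * (gained + lost) / total_paired, 1),
        "annual_cost_usd": monthly_cost_usd * 12,
    }


if __name__ == "__main__":
    print(eligibility_impact(gained=24, lost=10, total_paired=1073,
                             monthly_cost_usd=15_000))
```

The annual figure is a per-patient drug cost only; it is not multiplied by the net change, because the 14 net reclassifications are assessments by individual pathologists, not unique patients.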

19.9.2 Recommendations

Based on these findings:

1. Quality Assurance for HER2-Low
- AI modestly affects HER2-low vs HER2-negative distinction
- 34 paired assessments reclassified between 0 and 1+ (3.2%)
- Recommendation: Manual review of borderline cases (very faint vs no staining)

2. Mandatory Confirmation
- Any AI-suggested change from 0 → 1+ or 1+ → 0 should trigger pathologist review
- Consider consensus scoring for T-DXd eligibility decisions
- Rationale: High cost of T-DXd (~$180K/year) justifies careful assessment

3. FISH Consideration
- HER2 2+ cases still require FISH confirmation
- AI does not eliminate need for reflex FISH testing
- No change to current FISH workflow
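The confirmation and FISH rules above could be encoded as a simple triage check in a reporting workflow. This is a hypothetical sketch: the function name `review_flags` and the flag strings are invented for illustration, and scores are assumed to be integers 0 to 3.

```python
def review_flags(pre_score: int, post_score: int) -> list:
    """Return follow-up actions for one assessment, given pre-AI and post-AI IHC scores."""
    for s in (pre_score, post_score):
        if s not in (0, 1, 2, 3):
            raise ValueError(f"IHC score must be 0-3, got {s!r}")
    flags = []
    # Recommendation 2: any change between 0 and 1+ triggers pathologist review.
    if {pre_score, post_score} == {0, 1}:
        flags.append("pathologist_review")
    # Recommendation 3: a final score of 2+ still requires reflex FISH.
    if post_score == 2:
        flags.append("reflex_fish")
    return flags


if __name__ == "__main__":
    for pre, post in [(0, 1), (1, 0), (1, 1), (1, 2), (3, 3)]:
        print(pre, post, review_flags(pre, post))
```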

19.10 Confusion Matrix for HER2 0 vs 1+ (Pre-AI vs Post-AI)

Confusion matrices with precision, recall, and F1 scores are standard reporting metrics in comparable studies (Wu et al. 2023; Krishnamurthy et al. 2024). Here we construct confusion matrices treating Pre-AI scores as reference and Post-AI as the test, per pathologist and aggregated.

19.10.1 Aggregate Confusion Matrix

Confusion Matrix: HER2 0 vs 1+ (Pre-AI → Post-AI)
Pre-AI as reference (rows), Post-AI as prediction (columns)
Reference (Pre-AI)	HER2-Negative (0)	HER2-Low (1+)	HER2-Positive (2+/3+)
HER2-Negative (0)	302	24	0
HER2-Low (1+)	10	402	0
HER2-Positive (2+/3+)	0	0	0

Note: This matrix is restricted to the 738 paired assessments scored 0 or 1+ in both phases, so the HER2-Positive row and column are empty.

19.10.2 Precision, Recall, and F1 Scores

Precision, Recall, and F1: HER2 0 vs 1+ Classification
Pre-AI as reference standard
Category	TP	FP	FN	Precision	Recall	F1¹	Accuracy
HER2-Negative (0)	302	10	24	0.968	0.926	0.947	0.954
HER2-Low (1+)	402	24	10	0.944	0.976	0.959	0.954
HER2-Positive (2+/3+)	0	0	0	NA	NA	NA	0.954
¹ Wu et al. (2023): HER2 F1 improved from 0.78 to 0.93 with AI; Krishnamurthy et al. (2024): agreement 69.7% → 77.2%
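The precision, recall, F1, and accuracy values in this table can be reproduced from the four non-empty cells of the aggregate confusion matrix. A minimal sketch; the dictionary layout `cm[reference][prediction]` and the function name are illustrative choices.

```python
# Aggregate confusion matrix (0 vs 1+ only): cm[pre_ai][post_ai] = count.
cm = {
    "neg": {"neg": 302, "low": 24},
    "low": {"neg": 10, "low": 402},
}


def class_metrics(matrix: dict, target: str) -> dict:
    """Precision, recall, F1 for one class, plus overall accuracy."""
    labels = list(matrix)
    tp = matrix[target][target]
    fp = sum(matrix[ref][target] for ref in labels if ref != target)
    fn = sum(matrix[target][pred] for pred in labels if pred != target)
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[l][l] for l in labels)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    if precision and recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = None  # undefined when the class has no true positives
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "f1": f1, "accuracy": correct / total}


if __name__ == "__main__":
    for label in cm:
        m = class_metrics(cm, label)
        print(label, {k: round(v, 3) if isinstance(v, float) else v
                      for k, v in m.items()})
```

Because pre-AI scores serve as the reference, these figures measure how closely post-AI scores track pre-AI scores; they do not measure accuracy against an independent ground truth.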

19.10.3 Per-Pathologist Confusion Matrices

Per-Pathologist Precision/Recall/F1 for HER2 0 vs 1+
Pre-AI as reference, Post-AI as prediction
Pathologist	Category	Precision	Recall	F1	Accuracy
Pathologist 1	HER2-Negative (0)	0.986	0.864	0.921	0.937
Pathologist 1	HER2-Low (1+)	0.908	0.991	0.947	0.937
Pathologist 2	HER2-Negative (0)	0.866	0.892	0.879	0.921
Pathologist 2	HER2-Low (1+)	0.948	0.934	0.941	0.921
Pathologist 3	HER2-Negative (0)	1.000	0.952	0.975	0.966
Pathologist 3	HER2-Low (1+)	0.898	1.000	0.946	0.966
Pathologist 4	HER2-Negative (0)	1.000	1.000	1.000	1.000
Pathologist 4	HER2-Low (1+)	1.000	1.000	1.000	1.000

19.10.4 Confusion Matrix Heatmap

19.11 Comparison to Literature

19.11.1 Published HER2-Low Agreement Data

Study	Method	Kappa (0 vs 1+)	Agreement Rate
Tarantino et al. (2020)	Manual IHC	0.47	68%
Denkert et al. (2021)	Manual IHC	0.52	73%
Fernandez et al. (2022)	Manual IHC	0.51	70%
Our Study (Pre-AI)	Manual IHC	Not available	Not available
Our Study (Post-AI)	AI-assisted	Not available	Not available

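Interpretation: The published studies show moderate agreement for the HER2-low distinction (kappa 0.47 to 0.52). The binary (0 vs 1+) kappa and agreement rate for our study were not available in this run, so a direct comparison with these values cannot be made here. The effect of AI assistance on binary agreement is therefore not established by this table; the three-category Fleiss' kappa in Section 19.6.1 rose from 0.657 to 0.713.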

19.12 Summary and Conclusions

19.12.1 Key Findings

Reclassification Rate:
- 24 paired assessments (2.2%) gained T-DXd eligibility (0 → 1+)
- 10 paired assessments (0.9%) lost T-DXd eligibility (1+ → 0)
- Net effect: 14 additional assessments classified as T-DXd eligible

Interobserver Agreement:
- Three-category Kappa (0/1+/2+,3+): 0.657 → 0.713 (Δ=0.056)
- Binary Kappa (0 vs 1+): not available (NaN in both phases)
- Interpretation: Three-category agreement improved modestly with AI; the effect on the binary 0 vs 1+ distinction could not be quantified by kappa

Clinical Impact:
- T-DXd cost: ~$180,000 per patient-year
- Accurate 0 vs 1+ distinction is economically and clinically critical
- AI provides modest assistance but does not eliminate need for expert judgment

19.12.2 Clinical Recommendations

For T-DXd Eligibility Decisions:
- AI can assist but should not replace pathologist judgment for HER2-low classification
- Borderline cases benefit from consensus review
- Very faint (1+) vs no staining (0) remains challenging even with AI

Quality Assurance:
- Monitor HER2-low reclassification rates
- Track concordance with FISH results (when available)
- Consider double-reading for treatment-relevant reclassifications

Future Directions:
- AI algorithms specifically trained on HER2-low distinction may perform better
- Validation against clinical outcomes (T-DXd response) needed
- Multi-institutional studies to assess generalizability

19.13 References

Tarantino P, et al. HER2-Low Breast Cancer: Pathological and Clinical Landscape. J Clin Oncol. 2020;38(17):1951-1962.
Denkert C, et al. Clinical and molecular characteristics of HER2-low-positive breast cancer: pooled analysis of individual patient data from four prospective, neoadjuvant clinical trials. Lancet Oncol. 2021;22(8):1151-1161.
Modi S, et al. Trastuzumab Deruxtecan in Previously Treated HER2-Low Advanced Breast Cancer. N Engl J Med. 2022;387(1):9-20.
Fernandez AI, et al. Examination of Low ERBB2 Protein Expression in Breast Cancer Tissue. JAMA Oncol. 2022;8(4):1-4.
Cardoso F, et al. 5th ESO-ESMO international consensus guidelines for advanced breast cancer (ABC 5). Ann Oncol. 2020;31(12):1623-1649.

Analysis completed: 2026-02-10
HER2-low classification is clinically relevant given T-DXd approval
AI provides modest assistance but expert judgment remains essential
