27  Clustering Analysis: Cases vs Markers

27.1 Objective

Cluster CASES based on marker data (ER, PR, Ki67, HER2) from all pathologists. The goal is to visualize how cases group together biologically and how consistent these groupings are across pathologists, comparing Pre-AI and Post-AI phases.

Note for Pathologist: This is an experimental visualization that groups patients by the similarity of their marker scores. If AI assistance makes scoring more consistent, we expect the groups to become sharper and more distinct in the Post-AI phase (e.g., “Luminal B” patients clustering more tightly together), because pathologists are scoring the same cases more similarly.

27.2 Setup

27.3 Load Data

27.4 Data Preparation

We will create a matrix where:
- Rows = Cases
- Columns = Marker values for each Pathologist (e.g., ER_Path1, ER_Path2, PR_Path1…)

We will do this separately for Pre-AI and Post-AI.

[1] "Pre-AI Cases: 223"
[1] "Post-AI Cases: 206"
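
The code for this step is not shown in the rendered report, and the printed output suggests it was written in R. As a rough illustration only, the matrix layout described above could be built as in the following Python sketch. The long-format table and its column names (`case_id`, `pathologist`, `marker`, `value`, `phase`) and the helper `build_case_matrix` are assumptions made for the example, and the values are invented, not study data.

```python
import pandas as pd

# Hypothetical long-format scores: one row per (case, pathologist, marker).
# Column names and values are illustrative, not taken from the study data.
scores = pd.DataFrame({
    "case_id":     ["C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2"],
    "pathologist": ["Path1", "Path2", "Path1", "Path2"] * 2,
    "marker":      ["ER", "ER", "PR", "PR"] * 2,
    "value":       [90.0, 85.0, 70.0, 75.0, 5.0, 0.0, 10.0, 5.0],
    "phase":       ["Pre-AI"] * 8,
})


def build_case_matrix(df, phase):
    """Return a cases x (marker_pathologist) matrix for one phase."""
    sub = df[df["phase"] == phase]
    wide = sub.pivot_table(index="case_id",
                           columns=["marker", "pathologist"],
                           values="value", aggfunc="mean")
    # Flatten the two-level column index into names like "ER_Path1".
    wide.columns = [f"{m}_{p}" for m, p in wide.columns]
    return wide


pre_matrix = build_case_matrix(scores, "Pre-AI")
print(f"Pre-AI Cases: {pre_matrix.shape[0]}")
```

Calling the same helper with `"Post-AI"` would give the second matrix. Cases that a pathologist did not score appear as missing values in the wide matrix and would need to be dropped or imputed before clustering.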

27.5 Clustering Analysis

27.5.1 Pre-AI Clustering

27.5.2 Post-AI Clustering

27.6 Interpretation

27.6.1 Column Clustering (Inter-observer Agreement)

  • Look at the column dendrogram at the top of each heatmap.
  • Ideal Scenario: All columns for a given marker (e.g., ER_Path1, ER_Path2, etc.) should cluster tightly together, separate from the columns of the other markers (PR, Ki67, HER2).
  • Comparison: Does the Post-AI heatmap show tighter grouping of the same marker across pathologists than the Pre-AI heatmap? If so, this would suggest that AI assistance improved inter-observer consistency.
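
The visual check above can also be expressed as a number. The sketch below is one possible approach, not the method used to draw the heatmaps: it clusters the columns with average linkage on correlation distance, cuts the dendrogram into as many groups as there are markers, and counts how many groups each marker's pathologist columns fall into (1 is the ideal scenario). The function name `marker_group_counts` and the synthetic data are assumptions for illustration, and the input is assumed to have no missing values.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage


def marker_group_counts(matrix, n_markers):
    """Count, per marker, how many column clusters its pathologists span."""
    # Transpose so that each column (e.g. "ER_Path1") is one observation.
    z = linkage(matrix.T.to_numpy(), method="average", metric="correlation")
    labels = fcluster(z, t=n_markers, criterion="maxclust")
    # Recover the marker name from column names such as "ER_Path1".
    markers = [name.split("_")[0] for name in matrix.columns]
    return pd.Series(labels, index=markers).groupby(level=0).nunique()


# Synthetic stand-in data (NOT study data): two markers, three pathologists
# who each score the same underlying value with a little noise.
rng = np.random.default_rng(0)
n_cases = 40
latent = {"ER": rng.uniform(0, 100, n_cases),
          "PR": rng.uniform(0, 100, n_cases)}
demo = pd.DataFrame({
    f"{marker}_Path{p}": values + rng.normal(0, 2, n_cases)
    for marker, values in latent.items()
    for p in (1, 2, 3)
})

counts = marker_group_counts(demo, n_markers=2)
print(counts)
```

Running the same function on the Pre-AI and Post-AI matrices and comparing the counts gives a coarse summary of whether same-marker columns group together in each phase. It does not replace inspecting the dendrogram heights.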

27.6.2 Row Clustering (Biological Subtypes)

  • Look at the blocks of cases in the heatmap.
  • Biological Groups: We expect to see distinct blocks of:
    • Luminal: High ER/PR, low HER2, with low Ki67 (Luminal A-like) or high Ki67 (Luminal B-like).
    • HER2+: High HER2.
    • Triple Negative: Low ER, PR, and HER2.
  • Comparison: Are these blocks more distinct (sharper contrast) in the Post-AI heatmap? This would suggest that AI assistance helps delineate biological subtypes more clearly.
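
"More distinct" can be given a rough numeric counterpart. The sketch below is an assumption-laden illustration rather than part of the original analysis: it clusters cases with Ward linkage, cuts the tree into three groups (matching the three expected blocks), and reports the mean silhouette width, where values closer to 1 indicate tighter, better separated groups. The helpers `mean_silhouette` and `row_cluster_sharpness` are hypothetical names, and the two synthetic matrices only mimic a noisier and a more consistent scoring phase.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def mean_silhouette(x, labels):
    """Mean silhouette width using Euclidean distance between cases."""
    d = squareform(pdist(x))
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False  # exclude the case itself
        if not same.any():
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = d[i, same].mean()  # mean distance to own cluster
        b = min(d[i, labels == k].mean()  # mean distance to nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def row_cluster_sharpness(x, n_clusters):
    """Ward-cluster the rows (cases) and score how distinct the groups are."""
    z = linkage(x, method="ward")
    labels = fcluster(z, t=n_clusters, criterion="maxclust")
    return mean_silhouette(x, labels)


# Synthetic stand-in data (NOT study data): three subtype-like centres in
# standardized (ER, PR, Ki67, HER2) space, 20 cases each.
rng = np.random.default_rng(1)
centres = np.array([[2.0, 2.0, 0.0, -1.0],     # luminal-like
                    [-1.0, -1.0, 1.0, 2.0],    # HER2+-like
                    [-1.0, -1.0, 1.0, -1.0]])  # triple-negative-like
truth = np.repeat(centres, 20, axis=0)
noisy = truth + rng.normal(0, 1.0, truth.shape)  # less consistent scoring
tight = truth + rng.normal(0, 0.3, truth.shape)  # more consistent scoring

pre_score = row_cluster_sharpness(noisy, 3)
post_score = row_cluster_sharpness(tight, 3)
print(f"noisy: {pre_score:.2f}, tight: {post_score:.2f}")
```

Silhouette widths depend on how the columns are scaled. Because HER2 is typically scored on a 0 to 3+ scale while ER, PR, and Ki67 are percentages, standardizing each column before clustering is a sensible precaution. The Pre-AI and Post-AI phases also contain different numbers of cases (223 and 206), so scores are most comparable when computed on the cases common to both phases.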