The performance of overseas-trained AI models on Australian screening mammograms
Abstract
Background: BreastScreen Australia (BSA) uses double-blinded reading, with a third reader acting as arbiter when the two radiologists disagree on a case. Although screening has reduced breast cancer-related mortality by 32%, it increases workload pressures for breast radiologists.
Aims: This study evaluates the performance of two publicly available US Artificial Intelligence (AI) models on 10 BreastScreen Reader Assessment Strategy (BREAST) sets of screening mammograms, to contextualize them to an Australian setting.
Methods: BREAST contains 660 cases (200 malignant and 460 normal), which are routinely used to assess the performance of Australian radiologists. The contrast of the BREAST images was enhanced using the contrast-limited adaptive histogram equalization (CLAHE) algorithm. Transfer learning of the Globally-aware Multiple Instance Classifier (GMIC) and Global-Local Activation Maps (GLAM) AI models was then conducted on Lifepool, a screening mammography database. Specificity, case sensitivity and lesion sensitivity were used to evaluate AI performance on the BREAST sets. An ANOVA test was used to compare the performance of the two AI models.
Results: The GMIC model achieved an average specificity of 85.3%, case sensitivity of 85.5% and lesion sensitivity of 83.0%. The GLAM model achieved an average specificity of 80.0%, case sensitivity of 80.1% and lesion sensitivity of 77.4%. GMIC outperformed GLAM on all three performance metrics. The malignancy probabilities that GMIC and GLAM assigned to mammograms differed significantly (P = 0.037).
Conclusions: With transfer learning and contrast enhancement, the US AI models can correctly identify cancer cases in an Australian context. AI tools show promise for detecting cancers in screening mammograms within BSA at a standard similar to that of radiologists.
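
As an illustration of the case-level metrics named in the Methods, the following is a minimal sketch of how specificity and case sensitivity could be computed from per-case malignancy probabilities. The function name, the 0.5 decision threshold, and the toy probabilities are assumptions made for illustration; they are not taken from the study, which does not state its threshold here. Lesion sensitivity is omitted because it additionally requires matching predicted locations to annotated lesions.

```python
from typing import Sequence, Tuple


def specificity_and_case_sensitivity(
    probs: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,  # hypothetical cut-off, not the study's value
) -> Tuple[float, float]:
    """Return (specificity, case sensitivity).

    probs  : model malignancy probability for each case
    labels : ground truth per case, 1 = malignant, 0 = normal
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")

    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_malignant = p >= threshold
        if y == 1:
            if predicted_malignant:
                tp += 1  # malignant case correctly flagged
            else:
                fn += 1  # malignant case missed
        else:
            if predicted_malignant:
                fp += 1  # normal case wrongly flagged
            else:
                tn += 1  # normal case correctly cleared

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one normal case")

    specificity = tn / (tn + fp)   # share of normal cases called normal
    sensitivity = tp / (tp + fn)   # share of malignant cases called malignant
    return specificity, sensitivity


# Toy data for illustration only (not from BREAST or Lifepool).
toy_probs = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]
toy_labels = [1, 0, 1, 1, 0, 0]
toy_spec, toy_sens = specificity_and_case_sensitivity(toy_probs, toy_labels)
```

In practice the threshold would be chosen on a validation set, since raising it trades case sensitivity for specificity.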