June 30, 2025
DeepECG AI
Alexis Nolin-Lapalme, Achille Sowa, Jacques Delfrate, Olivier Tastet, Denis Corbin, Merve Kulbay, Derman Ozdemir, Marie-Jeanne Noël, François-Christophe Marois-Blanchet, François Harvey, Surbhi Sharma, Minhaj Ansari, I-Min Chiu, Valentina Dsouza, Sam F. Friedman, Michaël Chassé, Brian J. Potter, Jonathan Afilalo, Pierre Adil Elias, Gilbert Jabbour, Mourad Bahani, Marie-Pierre Dubé, Patrick M. Boyle, Neal A. Chatterjee, Joshua Barrios, Geoffrey H. Tison, David Ouyang, Mahnaz Maddah, Shaan Khurshid, Julia Cadrin-Tourigny, Rafik Tadros, Julie Hussin, Robert Avram
An ECG at the Heart of Artificial Intelligence

The 12-lead electrocardiogram (ECG) is a fundamental tool for diagnosing heart disease. However, artificial intelligence (AI) solutions developed to date are often limited: they lack generalizability, are rarely open-source, and rely primarily on supervised learning, which hinders their adaptation to diverse clinical settings.

Faced with these challenges, we designed and compared two foundation models for ECG: DeepECG-SSL, a model based on self-supervised learning, and DeepECG-SL, a traditional supervised model.

The goal? To offer a more robust, equitable and efficient solution for the automated interpretation of ECGs.

The strategy:

We trained our models on over a million ECGs from the Montreal Heart Institute. We also developed an automated approach for generating diagnostic labels from the reports associated with these ECGs, using a BERT model that handles both English and French.

The models were then evaluated in seven hospitals and on four public datasets, covering a wide range of diagnoses. An equity analysis was also conducted to assess potential performance disparities by age and gender.

The results:

DeepECG-SSL performed very well as measured by the area under the receiver operating characteristic curve (AUROC), reaching 0.990 on the internal dataset, 0.981 on public databases, and 0.983 on private databases. DeepECG-SL achieved comparable results, with AUROCs of 0.992, 0.980, and 0.983, respectively. This consistently high performance across internal, public, and private databases highlights the robustness and generalization ability of both models, and strengthens their potential for large-scale clinical application.
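For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen positive ECG receives a higher model score than a randomly chosen negative one. The sketch below illustrates that definition for a single binary diagnosis. It is not the evaluation code used in the study; the function name `auroc` and the toy labels and scores are assumptions made purely for illustration.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values for one diagnosis.
    scores: iterable of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (not data from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auroc = auroc(toy_labels, toy_scores)
```

This pairwise form is quadratic in the number of ECGs, so it is meant for explanation rather than for datasets of the size described here, where a rank-based implementation would be the usual choice.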

The same models, new tasks:

A major objective of our study was to evaluate the ability of the DeepECG-SL and DeepECG-SSL models to generalize to novel tasks, specifically left ventricular ejection fraction (LVEF) prediction and classification, long QT syndrome (LQTS) subtype detection and classification, and 5-year atrial fibrillation risk prediction (iAF5). These tasks were selected not only because we had large annotated databases from our external validation sites, but also because they allow direct comparison with previous clinical studies conducted by our team.

We evaluated these tasks on internal and external datasets, measuring model performance using AUROC. DeepECG-SSL outperformed DeepECG-SL for 5-year atrial fibrillation risk prediction (0.742 vs. 0.720, Δ=0.022, P<0.001) and for identification of reduced ejection fraction ≤40% (0.928 vs. 0.900, Δ=0.028, P<0.001), while maintaining equivalent performance for classification of LVEF <50% and for LQTS detection. In external validation on independent databases, DeepECG-SSL generalized better across multiple institutions and populations. Analysis of the impact of training data size confirmed this advantage, especially for tasks with a limited volume of annotations. These results highlight the potential of self-supervised learning to improve the robustness and adaptability of ECG models in various clinical settings.

Equity:

In addition to their diagnostic performance, we evaluated the models’ fairness to ensure consistent results across different demographic groups. Fairness in AI can be quantified with metrics such as equalized odds, which checks that the true positive rate (TPR) and false positive rate (FPR) are similar across groups, thus limiting bias. Our analyses show that both models exhibit strong fairness, with TPR and FPR differences between genders below 0.01. DeepECG-SSL displays better balance across age and gender groups, strengthening its potential for equitable clinical application.
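The equalized odds check described above can be sketched as follows: compute TPR and FPR separately for each demographic group, then report the largest between-group difference for each rate. This is a minimal illustration under stated assumptions, not the study's analysis code. The function names, the binary predictions (which assume some decision threshold has already been applied), and the toy data are all hypothetical.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("each group needs at least one positive and one negative")
    return tp / (tp + fn), fp / (fp + tn)


def equalized_odds_gaps(y_true, y_pred, groups):
    """Largest between-group difference in TPR and in FPR.

    groups: one group identifier per example (for instance a gender or age band).
    Returns (tpr_gap, fpr_gap); values near 0 indicate similar error rates.
    """
    per_group = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        per_group[g] = tpr_fpr([y_true[i] for i in idx],
                               [y_pred[i] for i in idx])
    tprs = [rates[0] for rates in per_group.values()]
    fprs = [rates[1] for rates in per_group.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Toy values for illustration only (not data from the study).
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 1, 0]
toy_groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
tpr_gap, fpr_gap = equalized_odds_gaps(toy_true, toy_pred, toy_groups)
```

In this toy example the two groups differ markedly on both rates, which is the kind of disparity the metric is designed to surface; the gaps reported for the models in this study are below 0.01 between genders.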

Conclusion:

Our results demonstrate that self-supervised learning applied to ECG enables the development of generalizable, high-performance, and fair models. Compared with supervised learning, DeepECG-SSL adapted better to new tasks, especially when annotated data are limited, while maintaining robust fairness across demographic groups.

By integrating advanced natural language processing methods for extracting diagnoses and validating our models on a variety of databases, we have laid the foundations for open, transparent, and accessible AI for ECG interpretation.

Code: https://github.com/HeartWise-AI/DeepECG_Docker/tree/main