January 23, 2026
DeepECG AI
Alexis Nolin-Lapalme, Achille Sowa, Jacques Delfrate, Olivier Tastet, Denis Corbin, Merve Kulbay, Derman Ozdemir, Marie-Jeanne Noël, François-Christophe Marois-Blanchet, François Harvey, Surbhi Sharma, Minhaj Ansari, I-Min Chiu, Valentina D'souza, Sam F. Friedman, Michaël Chassé, Brian J. Potter, Jonathan Afilalo, Pierre Adil Elias, Gilbert Jabbour, Mourad Bahani, Marie-Pierre Dubé, Patrick M. Boyle, Neal A. Chatterjee, Joshua Barrios, Geoffrey H. Tison, David Ouyang, Mahnaz Maddah, Shaan Khurshid, Julia Cadrin-Tourigny, Rafik Tadros, Julie Hussin, Robert Avram
Foundation Models for ECG Interpretation

The 12-lead electrocardiogram (ECG) remains a cornerstone of cardiovascular diagnostics, with over 300 million performed annually worldwide. Existing AI solutions for automated interpretation often lack generalizability, remain closed source, and depend on supervised learning requiring extensive labelled datasets.

We developed and validated two open-source foundation models for ECG interpretation: DeepECG-SL, trained using traditional supervised learning, and DeepECG-SSL, a self-supervised model leveraging contrastive learning and masked-lead modeling. Both models predict 77 cardiac conditions derived from American Heart Association recommendations.
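
DeepECG-SSL's pre-training pairs a contrastive objective with masked-lead modeling, in which randomly hidden leads are reconstructed from the visible ones. Below is a minimal sketch of that masked-lead objective, assuming a (batch, 12, samples) tensor and a generic reconstruction model; the contrastive term and the paper's exact masking scheme are omitted.

```python
import torch
import torch.nn as nn

def masked_lead_loss(model: nn.Module, ecg: torch.Tensor,
                     mask_ratio: float = 0.25) -> torch.Tensor:
    """Illustrative masked-lead pre-training step.

    ecg: (batch, 12, n_samples) tensor of 12-lead signals. A random
    subset of leads is zeroed out, and the model is trained to
    reconstruct the full recording from the remaining leads.
    """
    batch, n_leads, n_samples = ecg.shape
    # True where a lead is hidden from the encoder.
    mask = torch.rand(batch, n_leads, device=ecg.device) < mask_ratio
    masked_input = ecg.masked_fill(mask.unsqueeze(-1), 0.0)
    reconstruction = model(masked_input)  # same shape as ecg
    # Mean squared error computed only over the hidden leads.
    squared_error = (reconstruction - ecg) ** 2
    denom = mask.sum().clamp(min=1) * n_samples
    return (squared_error * mask.unsqueeze(-1)).sum() / denom
```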

Training and Validation

Training Data: Over 1 million ECGs from the Montreal Heart Institute (MHI-ds), with DeepECG-SSL additionally pre-trained on 1.9 million ECGs combining the MHI-ds, Code-15, and MIMIC-IV datasets.

External Validation: Both models were validated across 11 geographically diverse cohorts totaling 881,403 ECGs:

  1. 4 public datasets (UK Biobank, CLSA, MIMIC-IV, PTB): 373,865 ECGs
  2. 7 healthcare centers (UCSF, MGH, Cedars-Sinai, JGH, UW, NYP, CHUM): 507,538 ECGs

Multilingual Capability: We developed a BERT-based classifier trained on 640,518 paragraph-label pairs to enable automated diagnostic extraction from both English and French ECG reports.
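
A minimal sketch of such a report classifier with the HuggingFace transformers library, assuming a multilingual BERT checkpoint and framing diagnosis extraction as multi-label classification over the 77 conditions; the authors' exact checkpoint and training setup may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

N_LABELS = 77  # AHA-derived diagnostic classes, per the paper
CHECKPOINT = "bert-base-multilingual-cased"  # assumed; covers English and French

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=N_LABELS,
    problem_type="multi_label_classification",  # sigmoid head + BCE loss
)

# A French report paragraph; the classifier maps free text to label probabilities.
report = "Fibrillation auriculaire avec réponse ventriculaire rapide."
inputs = tokenizer(report, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
probabilities = torch.sigmoid(logits)  # one score per diagnostic label
```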

ECG Interpretation Performance

Both models achieved consistently high performance across internal and external datasets:

Dataset              DeepECG-SL AUROC       DeepECG-SSL AUROC
MHI (Internal)       0.992 (0.992, 0.992)   0.990 (0.990, 0.990)
External Public      0.980 (0.980, 0.980)   0.981 (0.981, 0.981)
External Healthcare  0.983 (0.983, 0.984)   0.983 (0.983, 0.983)
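
Confidence intervals like those in the table can be estimated with a percentile bootstrap over the evaluation set; a minimal sketch with scikit-learn, an assumed procedure rather than the authors' exact one.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc(y_true, y_score, n_boot=1000, seed=0):
    """AUROC point estimate with a percentile-bootstrap 95% CI."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_true, y_score)
    stats, n = [], len(y_true)
    while len(stats) < n_boot:
        idx = rng.integers(0, n, n)  # resample ECGs with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUROC is undefined without both classes
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(stats, [2.5, 97.5])
    return point, lower, upper
```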

Performance remained robust across diagnostic categories: rhythm disorders (AUROC >0.92), conduction abnormalities (>0.96), and chamber enlargement (>0.92).

Digital Biomarker Tasks

We evaluated both models on emerging biomarker applications beyond traditional ECG interpretation:

Left Ventricular Ejection Fraction (LVEF)

  1. LVEF ≤40% classification: DeepECG-SSL 0.926 vs DeepECG-SL 0.917 (P<0.001)
  2. LVEF <50% classification: Comparable performance across models

5-Year Atrial Fibrillation Risk Prediction (iAF5)

  1. DeepECG-SSL 0.742 vs DeepECG-SL 0.734 (P<0.001, n=132,050 ECGs)
  2. DeepECG-SSL demonstrated superior external generalization on MIMIC-IV

Long QT Syndrome (LQTS)

  1. LQTS detection: DeepECG-SSL 0.767 vs DeepECG-SL 0.735 (P=0.117, n=934 ECGs)
  2. LQTS genotype classification (Type 1 vs Type 2): DeepECG-SSL 0.931 vs DeepECG-SL 0.850 (P=0.026, n=127 ECGs)

The SSL advantage widened as the labelled training set shrank, demonstrating particular value for rare diseases and other data-limited clinical applications.
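
This label-efficiency effect can be probed by fine-tuning both initializations on nested subsamples of the labelled training set. A sketch of that protocol follows, where `finetune_and_eval` is a hypothetical helper standing in for the full training loop.

```python
import numpy as np

def label_efficiency_curve(train_idx, finetune_and_eval,
                           fractions=(0.01, 0.1, 1.0), seed=0):
    """Compare SSL-pretrained vs from-scratch fine-tuning as the
    labelled set shrinks. `finetune_and_eval(indices, init)` is a
    hypothetical helper: it fine-tunes from `init` ('ssl' or 'scratch')
    on the given ECG indices and returns held-out AUROC."""
    rng = np.random.default_rng(seed)
    results = {}
    for frac in fractions:
        n = max(1, int(frac * len(train_idx)))
        subset = rng.choice(train_idx, size=n, replace=False)
        results[frac] = {
            "ssl": finetune_and_eval(subset, init="ssl"),
            "scratch": finetune_and_eval(subset, init="scratch"),
        }
    # Expectation per the paper: the ssl-vs-scratch gap widens as frac shrinks.
    return results
```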

Preprocessing Pipeline

We developed an automated three-step preprocessing pipeline enabling deployment across heterogeneous ECG acquisition systems (a code sketch follows the list):

  1. High-pass filtering: Detects excessive low-frequency noise (<1 Hz) by comparing its Fast Fourier Transform power against the 1-30 Hz diagnostic band
  2. Artefact suppression: Identifies and flattens narrowband interference (50/60 Hz) using LOESS fitting
  3. Amplitude scaling: Normalizes signals to consistent millivolt range
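
A minimal sketch of the three steps, assuming NumPy arrays sampled at 250 Hz; the notch filter in step 2 is a simpler stand-in for the LOESS-based suppression described above, and the released repository holds the authoritative implementation.

```python
import numpy as np
from scipy import signal

FS = 250  # assumed sampling rate (Hz)

def needs_highpass(lead: np.ndarray, fs: int = FS) -> bool:
    """Step 1: flag excessive baseline wander by comparing spectral
    power below 1 Hz against the 1-30 Hz diagnostic band."""
    freqs = np.fft.rfftfreq(len(lead), d=1 / fs)
    power = np.abs(np.fft.rfft(lead)) ** 2
    low = power[freqs < 1.0].sum()
    band = power[(freqs >= 1.0) & (freqs <= 30.0)].sum()
    return low > band  # illustrative threshold

def suppress_powerline(lead: np.ndarray, fs: int = FS, hz: float = 60.0) -> np.ndarray:
    """Step 2: attenuate narrowband 50/60 Hz interference (notch
    filter shown as a stand-in for the paper's LOESS fitting)."""
    b, a = signal.iirnotch(hz, Q=30.0, fs=fs)
    return signal.filtfilt(b, a, lead)

def scale_amplitude(lead: np.ndarray, target_mv: float = 1.0) -> np.ndarray:
    """Step 3: normalize the signal to a consistent millivolt range."""
    peak = np.abs(lead).max()
    return lead if peak == 0 else lead * (target_mv / peak)
```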

This pipeline improved cross-dataset AUROC by up to 0.251, addressing a fundamental challenge in ECG-AI deployment.

Fairness Analysis

Using the Equalized Odds framework, both models demonstrated strong fairness across demographic groups (a computation sketch follows the list):

  1. True-positive rate differences by sex: <0.01
  2. False-positive rate differences: <0.02
  3. DeepECG-SSL showed marginally better balance across age and gender groups
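
A minimal sketch of the Equalized Odds check, computing TPR and FPR gaps between two demographic groups from binarized predictions; thresholds and groupings are assumed.

```python
import numpy as np

def equalized_odds_gaps(y_true, y_pred, group):
    """Absolute TPR and FPR differences between two groups (e.g., sex).
    y_pred holds binary predictions; assumes exactly two group values."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        m = group == g
        tpr = y_pred[m & (y_true == 1)].mean()  # P(pred=1 | y=1, group=g)
        fpr = y_pred[m & (y_true == 0)].mean()  # P(pred=1 | y=0, group=g)
        rates[g] = (tpr, fpr)
    (tpr_a, fpr_a), (tpr_b, fpr_b) = rates.values()
    return abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)
```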

Computational Efficiency

Metric                    DeepECG-SL   DeepECG-SSL
Parameters                1.51M        90.37M
Operations                530.57 MMAC  14.17 GMAC
Energy (1,000 ECGs, GPU)  0.1786 Wh    0.7463 Wh
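
Parameter and MAC counts like those in the table can be reproduced with, for example, fvcore on a PyTorch model; this is an assumed tooling choice (fvcore counts one fused multiply-add per reported operation), and the input shape is a guess at a 10 s, 250 Hz, 12-lead ECG.

```python
import torch
from fvcore.nn import FlopCountAnalysis

def profile_model(model: torch.nn.Module, input_shape=(1, 12, 2500)):
    """Report parameter count and per-forward-pass MACs for a dummy
    12-lead ECG input (shape is an assumption, not from the paper)."""
    params = sum(p.numel() for p in model.parameters())
    dummy = torch.randn(*input_shape)
    macs = FlopCountAnalysis(model, dummy).total()
    print(f"parameters: {params / 1e6:.2f}M, operations: {macs / 1e6:.2f} MMAC")
```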

DeepECG-SL is 60 times smaller and 29 times faster at inference, and it reduces CO₂ emissions by up to 9.7-fold on equivalent tasks.

Comparison with State-of-the-Art

Direct comparison across shared diagnostic classes showed that the DeepECG models consistently outperformed ECGFounder and ECG-FM on external datasets. DeepECG-SSL achieved net reclassification improvements ranging from +0.113 to +1.20 across overlapping labels.
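
A category-free (continuous) net reclassification improvement can be computed as below, assuming paired risk scores from the two models on the same ECGs; the paper's exact NRI variant is not restated here.

```python
import numpy as np

def continuous_nri(y_true, score_new, score_old):
    """Category-free NRI of a new model over a comparator.
    'Up' means the new model assigns a higher risk than the old one;
    ties count as neither up nor down. Ranges from -2 to +2."""
    y_true = np.asarray(y_true).astype(bool)
    up = np.asarray(score_new) > np.asarray(score_old)
    down = np.asarray(score_new) < np.asarray(score_old)
    nri_events = up[y_true].mean() - down[y_true].mean()
    nri_nonevents = down[~y_true].mean() - up[~y_true].mean()
    return nri_events + nri_nonevents
```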

Conclusions

Self-supervised learning enables the development of generalizable, high-performance, and fair ECG models. DeepECG-SSL excels at adapting to novel tasks when annotated data is limited, while DeepECG-SL offers a lightweight alternative suited to resource-constrained environments. Both models maintain robust fairness across demographic groups.

By releasing model weights, preprocessing tools, and validation code, we aim to support robust, data-efficient AI diagnostics across diverse clinical environments.

Code: https://github.com/HeartWise-AI/DeepECG_Docker/tree/main