Fault-Aware Design and Training to Enhance DNNs Reliability with Zero-Overhead

Niccolò Cavagnero; Fernando Fernandes dos Santos; Marco Ciccone; Giuseppe Averta; Tatiana Tommasi; Paolo Rech

Preprint, Working Paper. Year: 2022

Niccolò Cavagnero

Role: Author

Politecnico di Torino = Polytechnic of Turin

Fernando Fernandes dos Santos

Role: Author
PersonId: 754557
IdHAL: ffernand
ORCID: 0000-0002-3504-9862

Architectures matérielles spécialisées pour l’ère post loi-de-Moore

Marco Ciccone

Role: Author

Politecnico di Torino = Polytechnic of Turin

Giuseppe Averta

Role: Author

Politecnico di Torino = Polytechnic of Turin

Tatiana Tommasi

Role: Author

Politecnico di Torino = Polytechnic of Turin

Paolo Rech

Role: Author

Università degli Studi di Trento = University of Trento

Abstract

Deep Neural Networks (DNNs) enable a wide range of technological advancements, from clinical imaging to predictive industrial maintenance and autonomous driving. However, recent findings indicate that transient hardware faults may dramatically corrupt a model's predictions. For instance, the radiation-induced misprediction probability can be so high as to impede the safe deployment of DNN models at scale, which makes efficient and effective hardening solutions an urgent need. In this work, we propose to tackle the reliability issue at both training time and model design time. First, we show that vanilla models are highly affected by transient faults, which can induce a performance drop of up to 37%. We then provide three zero-overhead solutions, based on DNN re-design and re-training, that can improve DNN reliability against transient faults by up to one order of magnitude. We complement our work with extensive ablation studies that quantify the performance gain of each hardening component.
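To make the notion of a transient hardware fault more concrete, the following is a minimal sketch that assumes a fault can be modeled as a single bit flip in a float32 weight. This fault model, the helper name `flip_bit`, and the example weight are illustrative assumptions and are not taken from the paper; the authors' actual fault model and injection procedure may differ.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, encoded as IEEE 754 float32, with one bit flipped.

    Hypothetical helper for illustration only. Bit 31 is the sign bit,
    bits 30..23 are the exponent, bits 22..0 are the mantissa.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 31]")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the selected bit and reinterpret the result as float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


weight = 0.5  # illustrative weight, exactly representable in float32

mantissa_fault = flip_bit(weight, 0)   # least significant mantissa bit
exponent_fault = flip_bit(weight, 30)  # most significant exponent bit
sign_fault = flip_bit(weight, 31)      # sign bit

print(mantissa_fault, exponent_fault, sign_fault)
```

Under this assumed model, a flip in a low mantissa bit barely changes the weight, while a flip in a high exponent bit changes it by many orders of magnitude, which suggests why some faults are harmless and others can corrupt a prediction.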

Domains

Computer Science [cs]

https://inria.hal.science/hal-03684224

Submitted on: Wednesday, June 1, 2022, 11:35:52

Last modified on: Tuesday, March 19, 2024, 09:34:06

Dates and versions

hal-03684224, version 1 (01-06-2022)

License

Attribution

Identifiers

HAL Id: hal-03684224, version 1
arXiv: 2205.14420

Cite

Niccolò Cavagnero, Fernando Fernandes dos Santos, Marco Ciccone, Giuseppe Averta, Tatiana Tommasi, et al. Fault-Aware Design and Training to Enhance DNNs Reliability with Zero-Overhead. 2022. ⟨hal-03684224⟩

Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
