https://hal.inria.fr/hal-03081582Lee, WonyeolWonyeolLeeCS - Department of Computer Science [KAIST] - KAIST - Korea Advanced Institute of Science and TechnologyYu, HangyeolHangyeolYuCS - Department of Computer Science [KAIST] - KAIST - Korea Advanced Institute of Science and TechnologyRival, XavierXavierRivalANTIQUE - Analyse Statique par Interprétation Abstraite - DI-ENS - Département d'informatique - ENS Paris - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris sciences et lettres - Inria - Institut National de Recherche en Informatique et en Automatique - CNRS - Centre National de la Recherche Scientifique - Inria de Paris - Inria - Institut National de Recherche en Informatique et en AutomatiqueDI-ENS - Département d'informatique - ENS Paris - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris sciences et lettres - Inria - Institut National de Recherche en Informatique et en Automatique - CNRS - Centre National de la Recherche ScientifiquePSL - Université Paris sciences et lettresYang, HongseokHongseokYangCS - Department of Computer Science [KAIST] - KAIST - Korea Advanced Institute of Science and TechnologyOn Correctness of Automatic Differentiation for Non-Differentiable FunctionsHAL CCSD2020[INFO.INFO-PL] Computer Science [cs]/Programming Languages [cs.PL]Rival, Xavier2020-12-18 11:21:382023-03-24 14:53:202020-12-18 14:36:51enConference papersapplication/pdf1Differentiation lies at the core of many machine-learning algorithms, and is wellsupported by popular autodiff systems, such as TensorFlow and PyTorch. Originally, these systems have been developed to compute derivatives of differentiable functions, but in practice, they are commonly applied to functions with non-differentiabilities. For instance, neural networks using ReLU define nondifferentiable functions in general, but the gradients of losses involving those functions are computed using autodiff systems in practice. This status quo raises a natural question: are autodiff systems correct in any formal sense when they are applied to such non-differentiable functions? In this paper, we provide a positive answer to this question. Using counterexamples, we first point out flaws in oftenused informal arguments, such as: non-differentiabilities arising in deep learning do not cause any issues because they form a measure-zero set. We then investigate a class of functions, called PAP functions, that includes nearly all (possibly nondifferentiable) functions in deep learning nowadays. For these PAP functions, we propose a new type of derivatives, called intensional derivatives, and prove that these derivatives always exist and coincide with standard derivatives for almost all inputs. We also show that these intensional derivatives are what most autodiff systems compute or try to compute essentially. In this way, we formally establish the correctness of autodiff systems applied to non-differentiable functions.