Conference papers

DataStates: Towards Lightweight Data Models for Deep Learning

Abstract : A key emerging pattern in deep learning applications is the need to capture intermediate DNN model snapshots and preserve or clone them to explore a large number of alternative training and/or inference paths. However, with increasing model complexity and new training approaches that mix data, model, pipeline and layer-wise parallelism, this pattern is challenging to address in a scalable and efficient manner. To this end, this position paper advocates for rethinking how to represent and manipulate DNN learning models. It relies on a broader notion of data states: collections of annotated, potentially distributed data sets (tensors in the case of DNN models) that AI applications can capture at key moments during runtime and revisit or reuse later. Instead of explicitly interacting with the storage layer (e.g., writing to a file), users can "tag" DNN models at key moments during runtime with metadata that expresses attributes and persistency/movement semantics. A high-performance runtime is then responsible for interpreting the metadata and performing the necessary actions in the background, while offering a rich interface to find data states of interest. This approach has benefits at several levels: new capabilities, performance portability, high performance and scalability.
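The tagging pattern described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names DataState, DataStateStore, tag, find, clone and wait are hypothetical, nested lists stand in for real tensors, and an in-memory list stands in for a storage tier. It only shows the general idea of annotating a snapshot with metadata and letting a runtime act on that metadata in the background.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class DataState:
    """A captured snapshot plus the metadata it was tagged with (hypothetical)."""
    tensors: dict          # name -> nested list of floats (stand-in for tensors)
    attributes: dict       # user-defined annotations, e.g. {"epoch": 3}
    persist: bool = False  # hypothetical persistency hint


class DataStateStore:
    """Hypothetical runtime that captures snapshots and acts on their metadata."""

    def __init__(self):
        self._states = []
        self._durable = []  # stand-in for a real storage tier
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def tag(self, tensors, persist=False, **attributes):
        # Deep-copy so later training steps cannot mutate the snapshot.
        state = DataState(copy.deepcopy(tensors), dict(attributes), persist)
        self._states.append(state)
        if persist:
            # The caller does not wait; "persistence" runs in the background.
            self._pending.append(self._pool.submit(self._durable.append, state))
        return state

    def find(self, **query):
        # Return every state whose attributes match all query key/value pairs.
        return [s for s in self._states
                if all(s.attributes.get(k) == v for k, v in query.items())]

    def clone(self, state):
        # Independent copy, usable as the start of an alternative training path.
        return copy.deepcopy(state.tensors)

    def wait(self):
        # Block until all background persistence requests have completed.
        for future in self._pending:
            future.result()
        self._pending.clear()


store = DataStateStore()
model = {"layer0.weight": [[0.1, 0.2], [0.3, 0.4]]}
store.tag(model, persist=True, epoch=1, branch="baseline")
model["layer0.weight"][0][0] = 9.9          # training continues and mutates the model
store.tag(model, epoch=2, branch="baseline")
snapshot = store.find(epoch=1)[0]           # revisit the earlier state by its metadata
fork = store.clone(snapshot)                # explore an alternative path from it
store.wait()
```

In this sketch the application never writes a file itself: it only states, through the persist flag and the attributes, what should happen to a snapshot, and later retrieves snapshots by querying those attributes.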

Cited literature [36 references]

https://hal.archives-ouvertes.fr/hal-02941295
Contributor : Bogdan Nicolae
Submitted on : Wednesday, September 16, 2020 - 10:15:43 PM
Last modification on : Wednesday, October 14, 2020 - 4:11:55 AM

File

p67_nicolae.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02941295, version 1

Citation

Bogdan Nicolae. DataStates: Towards Lightweight Data Models for Deep Learning. SMC'20: The 2020 Smoky Mountains Computational Sciences and Engineering Conference, Aug 2020, Nashville (virtual conference), United States. ⟨hal-02941295⟩
