Publications of Torsten Hoefler
Grzegorz Kwasniewski, Tal Ben-Nun, Lukas Gianinazzi, Alexandru Calotoiu, Timo Schneider, Alexandros Nikolaos Ziogas, Maciej Besta, Torsten Hoefler:

 Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs

(In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'21), Jul. 2021)

Abstract

Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel algorithms, both across the memory hierarchy and between processors. Current approaches either study specific algorithms individually, disallow programmatic motifs such as recomputation, or produce asymptotic bounds that exclude important constants. We propose a novel approach for obtaining precise I/O lower bounds on a general class of programs, which we call Simple Overlap Access Programs (SOAP). SOAP analysis covers a wide variety of algorithms, from ubiquitous computational kernels to full scientific computing applications. Using the red-blue pebble game and combinatorial methods, we are able to bound the I/O of the SOAP-induced Computational Directed Acyclic Graph (CDAG), taking into account multiple statements, input/output reuse, and optimal tiling. To deal with programs that are outside of our representation (e.g., non-injective access functions), we describe methods to approximate them with SOAP. To demonstrate our method, we analyze 38 different applications, including kernels from the Polybench benchmark suite, deep learning operators, and -- for the first time -- applications in unstructured physics simulations, numerical weather prediction stencil compositions, and full deep neural networks. We derive tight I/O bounds for several linear algebra kernels, such as Cholesky decomposition, improving the existing reported bounds by a factor of two. For stencil applications, we improve the existing bounds by a factor of up to 14. We implement our method as an open-source tool, which can derive lower bounds directly from provided C code.
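
To make the red-blue pebble game mentioned in the abstract concrete, the following is a minimal sketch of a move checker that replays a pebbling schedule on a tiny CDAG and counts its I/O operations. It is not the authors' tool and does not derive lower bounds; it only evaluates the cost of one given schedule. The function name `count_io`, the move names, and the example graph are hypothetical, and the sketch assumes the common variant of the game in which a computed vertex needs a free red pebble (no sliding).

```python
from typing import Dict, List, Set, Tuple

def count_io(preds: Dict[str, List[str]], inputs: Set[str],
             moves: List[Tuple[str, str]], S: int) -> int:
    """Replay a red-blue pebbling schedule and return its number of I/O moves.

    preds  : predecessors of every non-input vertex of the CDAG (illustrative format)
    inputs : vertices that start with a blue pebble (they reside in slow memory)
    moves  : sequence of (operation, vertex) pairs
    S      : maximum number of red pebbles (fast memory size)
    """
    red: Set[str] = set()
    blue: Set[str] = set(inputs)
    io = 0
    for op, v in moves:
        if op == "load":          # slow -> fast memory, costs one I/O
            if v not in blue:
                raise ValueError(f"cannot load {v}: no blue pebble")
            red.add(v)
            io += 1
        elif op == "store":       # fast -> slow memory, costs one I/O
            if v not in red:
                raise ValueError(f"cannot store {v}: no red pebble")
            blue.add(v)
            io += 1
        elif op == "compute":     # free, but all predecessors must be red
            if v not in preds:
                raise ValueError(f"{v} is an input and cannot be computed")
            if not all(p in red for p in preds[v]):
                raise ValueError(f"cannot compute {v}: predecessor not in fast memory")
            red.add(v)
        elif op == "discard":     # free, releases a red pebble
            red.discard(v)
        else:
            raise ValueError(f"unknown move {op}")
        if len(red) > S:
            raise ValueError("fast memory capacity exceeded")
    return io

# Example CDAG for: c = a + b; d = c + a
preds = {"c": ["a", "b"], "d": ["c", "a"]}
inputs = {"a", "b"}
moves = [("load", "a"), ("load", "b"), ("compute", "c"),
         ("discard", "b"), ("compute", "d"), ("store", "d")]

io = count_io(preds, inputs, moves, S=3)
print(io)
```

An I/O lower bound is a statement about every valid schedule for a given S, whereas this checker only scores a single schedule, so its result is an upper bound on the optimal I/O cost of the example graph.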

BibTeX

@inproceedings{,
  author={Grzegorz Kwasniewski and Tal Ben-Nun and Lukas Gianinazzi and Alexandru Calotoiu and Timo Schneider and Alexandros Nikolaos Ziogas and Maciej Besta and Torsten Hoefler},
  title={{Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs}},
  year={2021},
  month={Jul.},
  booktitle={Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'21)},
  source={http://www.unixer.de/~htor/publications/},
}