Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000

Xavier Cavin (1), Laurent Alonso, Jean-Claude Paul
(1) ISA - Models, algorithms and geometry for computer graphics and vision
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Hierarchical algorithms form a class of applications widely used in high-performance scientific computing because of their ability to solve very large physical problems. They rely on the physical property that the farther apart two points are, the less they influence each other. However, their irregular and dynamic characteristics make them challenging to parallelize efficiently. Two conflicting objectives have to be taken into account: load balancing and data locality. It has been shown that the message-passing paradigm is not well suited to this kind of application because of the intensive communication it introduces. Implicit communication through a shared address space appears to be better adapted. In particular, the ccNUMA architecture of the Origin2000 can help obtain the desired data locality through its memory hierarchy. We have experimented with a parallel implementation of a well-known hierarchical algorithm from computer graphics: wavelet radiosity. This algorithm is a very efficient approach to computing global illumination in diffuse environments, but it remains too time- and memory-consuming when dealing with extremely complex models. Our parallel algorithm focuses on optimizing load balancing and relies heavily on the efficiency of the ccNUMA architecture for data locality. Load balancing is handled by a general dynamic tasking mechanism with specific improvements. Only minimal effort is devoted to memory management (such as writing thread-safe, non-blocking versions of the C malloc/free functions), and the Origin2000 proves able to handle the natural data locality of our application efficiently. Our best results yield a speed-up of 24 with 36 processors. Moreover, we were able to compute the illumination of a complex scene (a cloister in Quito, composed of 54789 initial surfaces and leading to 600000 final meshes) in 2 hours 41 minutes with 24 processors. To the authors' knowledge, this is the most complex "real world" scene ever computed.
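As a rough reading of the reported speed-up, and assuming the usual definition of parallel efficiency as speed-up divided by processor count (the abstract itself states no efficiency figure, and the symbols E, S and p are introduced here only for illustration), the best result corresponds to:

```latex
% Assumption: E = parallel efficiency, S = speed-up, p = number of processors.
E = \frac{S}{p} = \frac{24}{36} \approx 0.67
```

In other words, under this definition each of the 36 processors contributes about two thirds of the work an ideal linear speed-up would deliver.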
Document type:
Conference paper
Fourth European CRAY-SGI MPP Workshop, 1998, Garching/Munich, Germany, IPP, R/46 (R/46), pp.178-187, 1998
https://hal.inria.fr/inria-00098705
Contributor: Publications Loria
Submitted on: Tuesday, September 26, 2006 - 08:18:27
Last modified on: Thursday, January 11, 2018 - 06:19:48
Document(s) archived on: Wednesday, March 29, 2017 - 12:34:55

Identifiers

  • HAL Id : inria-00098705, version 1

Citation

Xavier Cavin, Laurent Alonso, Jean-Claude Paul. Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000. Fourth European CRAY-SGI MPP Workshop, 1998, Garching/Munich, Germany, IPP, R/46 (R/46), pp.178-187, 1998. 〈inria-00098705〉
