1. Introduction

The currently traditional parallel programming approaches, as communicating processes and data parallelism, can easily exploit hardware parallelism when the computation can be split into smaller equally weighted computation processes, with good data locality. This type of parallel computation is called regular. Simple mapping and scheduling algorithms can be used to exploit satisfactorily the hardware parallelism. The same does not happen if the computations do not have equivalent sizes (or they are not previously known) or do not have good data locality properties. Parallel computations like that are called irregular.

Traditional parallel programming approaches, such as communicating processes (PVM, MPI) and data partitioning (HPF), face the same problem differently. The former offers low-level programming tools, allowing to write a large class of algorithms and to handle efficiently irregular problems. Nevertheless, a considerable programming effort is necessary, because the programmer must predict all possible cases of unbalance, in order to adapt the strategy of data and computation distribution. The latter approach is not currently adapted to process irregular problems, and leads, in this case, to non-efficient programs.

The programming model of Athapascan-0 exploits two levels of parallelism. The first level is the parallelism between computers, implemented with system processes. It is a coarse grain parallelism, due to the operating system overhead, the lack of physically shared memory and the low speed of the computer network if compared to the memory bus. The second level of parallelism appears inside a computer, between a computation processor and its attached communication processor or between processors in a symmetric multiprocessor machine.

To achieve better performance, the fine granularity must be implemented by inexpensive mechanisms. We propose the use of threads as the execution support of a task, with local and remote thread handling functions. With this approach, an irregular algorithm is expressed as a dynamic network of communicating threads.

Parallel machines are considered as generic networks of heterogeneous symmetric multiprocessors, with remote thread creation and message exchange. Athapascan-0 can dynamically map fine granularity computations on such machines and overlap communication delays with computation. These features allow to perform better load sharing and therefore to achieve better performance.