Scheduling Algorithms for new Emerging Applications

May, 29th - June, 2nd 2006, CIRM, Marseille, France

Scheduling trees of tasks for parallel mutifrontal methods

SpeakerAbdou Guermouche

Co-authorsOlivier Beaumont, Pierre-François Dutot and Denis Trystram

We consider the direct solution of large sparse systems of linear equations of the form Ax = b on distributed memory parallel computers using multifrontal Gaussian elimination. For an unsymmetric matrix, we compute its LU factorization; if the matrix is symmetric, its LDLT factorization is computed. Because of numerical stability, pivoting may be required.

The multifrontal method was initially developed for indefinite sparse symmetric linear systems and was then extended to unsymmetric matrices.

The factorization of the matrix is done by performing a succession of partial factorizations of small dense matrices called frontal matrices, that are associated to the nodes of a so-called assembly tree representing the dependencies between tasks. The frontal matrix is divided into two parts: the factor block, also called fully summed block, which corresponds to the variables factorized when the elimination algorithm processes the frontal matrix; and the contribution block, which corresponds to the variables updated when processing the frontal matrix. Once the partial factorization is completed, the contribution block is passed to the parent node. When contributions from all children are available on the parent, they can be assembled (i.e. summed with the values contained in the frontal matrix of the parent). The elimination algorithm is a postorder traversal (we do not process parent nodes before their children) of the assembly tree.

In the parallel context, the frontal matrices may be treated either on a single processor, or with a parallel factorization scheme. Thus, it is important to be able to efficiently map the nodes of the assembly tree on the set of working processors. This can be viewed as a problem of mapping a tree of malleable tasks over a set of computational resources. We study in our context the behaviour of two of the state-of-the-art techniques achieving this goal: The Algorithm presented by Prasanna and Musicus in for scheduling a tree of malleable tasks over a set of processors, and an application of the techniques proposed by Skutella in dealing with the time-cost tradeoff problem. This work will be done in the context of the parallel mutifrontal solver MUMPS.