# ASTEC 2009 Workshop/Summer school on Algorithms and Techniques for Scheduling on Clusters and Grids

## List of talks

### Around truthfulness in scheduling: VCG mechanism and preemption

SpeakerEvripides Bampis

Co-authorEric Angel, Fanny Pascual, Nicolas Thibault

Around truthfulness in scheduling: VCG mechanism and preemption

### Plenary talk : Scheduling Pipelined Applications: Models, Algorithms and Complexity

SpeakerAnne Benoit

Scheduling Pipelined Applications: Models, Algorithms and Complexity

In this talk, I explore the problem of scheduling pipelined applications onto large-scale distributed platforms, in order to optimize several criteria. A particular attention is given to the throughput maximization, (i.e., the number of data sets that can be processed every time unit), the latency minimization (i.e., the time required to process one data set entirely), and the failure probability minimization. First, I accurately define the models and the scheduling problems, and exhibit surprising results, such as the difficulty to compute the optimal throughput and/or latency that can be obtained given a mapping. In particular, I detail the importance of the communication model on the encountered difficulties. Second, I give an overview of complexity results for various cases, both for mono-criterion and for bi-criteria optimization problems. I illustrate the impact of the model on the problem complexity. Finally, I will show some extensions of this work to a different applicative context, such as filtering tasks (that can be found in Web services for instance).

slides(pdf)

### A Meta-Greedy Approach for Multiobjectives Combinatorial Optimization

SpeakerLouis-Claude Canon

"A Meta-Greedy Approach for Multiobjectives Combinatorial Optimization"

Optimization problems can often be tackled with respect to several objectives. In such cases, there can be several incomparable Pareto-optimal solutions. We propose an approach that refines any existing greedy algorithm or heuristic by considering a set of objectives at each step. We show that the modified method is able to generate, in a single execution, a set of non-dominated solutions that approximates the set of Pareto-optimal solutions.

slides(pdf)

### Distributed Learning of Equilibria in scheduling

SpeakerJohanne Cohen

Co-authorDominique Barth, Olivier Bournez, Octave Boussaton

Distributed Learning of Equilibria in scheduling

We study how to reach a Nash equilibrium in a load balancing scenario where each tash is managed by a selfish agent and attempts to migrate to a machine which will minimize its cost. The cost of a ma- chine is a function of the load on it. The load on a machine is the sum of the weights of the jobs running on it. We prove that mixed Nash equi- libria can be learned on that games with incomplete information, with some bounds on the time before convergence.

### Game Theoretic Analysis of the Multi-Organization Scheduling Problem

SpeakerDaniel Cordeiro

Co-authorDenis Trystram, Johanne Cohen and Frederic Wagner

Game Theoretic Analysis of the Multi-Organization Scheduling Problem

Current grid computing systems allow several users and/or organizations to combine their computation resources in order to produce a powerful distributed supercomputer infrastructure. Nevertheless, combining different performance goals of users with potentially different profiles is a challenging task. In this talk we will present our attempts to model this problem as a multi-objective scheduling problem. We will present some centralized algorithms with guaranteed performance bounds and our current work-in-progress to describe this problem as a game theoretic problem. We will show that in our current model this NP-Complete problem doesn't admit Nash Equilibriums when each organization is interested in minimizing its own maximum completion time or its own average completion time.

slides(pdf)

### Job scheduling in an agent-based Grid middleware

SpeakerMichal Drozdowicz

Job scheduling in an agent-based Grid middleware

slides(pdf)

### On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms

SpeakerFanny Dufosse

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms

In this paper, we explore the problem of mapping filtering services on large-scale heterogeneous platforms. Two important optimization criteria should be considered in such a framework. The period, which is the inverse of the throughput, measures the rate at which data sets can enter the system. The latency measures the response time of the system in order to process one single data set entirely. Both criteria are antagonistic. For homogeneous platforms, the complexity of period minimization is already known; we derive an algorithm to solve the latency minimization problem in the general case with service precedence constraints; we also show that the bi-criteria problem (latency minimization without exceeding a prescribed value for the period) is of polynomial complexity. However, when adding heterogeneity to the platform, we prove that minimizing the period or the latency becomes NP-complete, and that these problems cannot be approximated by any constant factor (unless P=NP). The latter results hold true even for services without precedence constraints.

slides(pdf)

### Scheduling on multi-core clusters

SpeakerEmilio Francesquini

Scheduling on multi-core clusters

Nowadays, with the advent of the multi-core technology, most of the modern computers have several cores. However, the current schedulers do not provide automatic tools to schedule tasks to the cores. Indeed, some of them consider each core as one whole node neglecting the advantages (and disadvantages) that the spatial proximity amongst the cores could provide. In this talk, we will show some of the solutions currently being employed to tackle these scheduling problems as well as propose some of our own ideas for a model which considers the multi-core context and its impact on the scheduling algorithms.

slides(pdf)

### Homogeneous versus hierarchical communication delay model : complexity and approximation

SpeakerRodolphe Giroudeau

Homogeneous versus hierarchical communication delay model:complexity and approximation

We propose an analysis point of view of the complexity and the approximation of the known results on the homogeneous model and the hierarchical model. In the homogeneous scheduling delay model, each arc $(i,j) \in E$ modelizes the potential data transfer between task $i$ and task $j$ provided that $i$ and $j$ are processed on two different processors. So the aim is to find a compromise between a sequential execution and a parallel execution. In the {\em hierarchical communication model}, we assume that the communication delays are not homogeneous anymore; the processors are connected in {\em clusters} and the communications inside a same cluster are much faster than those between processors belonging to different ones. More formally, given a set of clusters of identical processors, and a precedence graph $G=(V,E)$, we consider that if two communicating tasks are executed on the same processor (resp. on different processors of the same cluster) then the corresponding communication delay is neglected (resp. is equal to what we call {\em interprocessor communication delay}). On the contrary, if these tasks are executed on different clusters, then the communication delay is more significant and is called the {\em inter-cluster communication delay}. We are given $m$ multiprocessors machines (or clusters) that are used to process $n$ precedence-constrained tasks. Each machine (cluster) comprises several identical parallel processors. A couple $(c_{ij},\epsilon_{ij})$ of communication delays is associated to each arc $(i,j)$ between two tasks in the precedence graph. In what follows, $c_{ij}$ (resp. $\epsilon_{ij}$) is called inter-cluster (resp. interprocessor) communication, and we consider that $c_{ij} \geq \epsilon_{ij}$. Suppose the existence of arc $(i,j)$ in the precedence graph. If tasks $i$ and $j$ are executed on different machines $\Pi^i$ and $\Pi^j$, then $j$ must be processed at least $c_{ij}$ time units after the completion of $i$. Similarly, if $i$ and $j$ are executed on the same machine $\Pi$ but on different processors $\pi^i$ and $\pi^j$ then the processing of $j$ can only start $\epsilon_{ij}$ units of time after the completion of $i$. However, if $i$ and $j$ are executed on the same processor then $j$ can start immediately after the end of $i$. The communication overhead (inter-cluster or interprocessor delay) does not interfere with the availability of the processors and any processor may execute any task. Our goal is to find a feasible schedule of the tasks minimizing the {\em makespan}, i.e., the time needed to execute all the tasks subject to the precedence graph. We offer techniques and strategies to resist the passage of homogeneous hierarchical model. We look at the UET-UCT problem and large communication problem. Remarks: We can note that homogeneous model is exaclty the hierarchical model with only one machine (Small communication delay) or with only one processor per machine (large communication delay)

slides(pdf)

### Plenary talk : Approximation algorithms for knapsack and scheduling problems

SpeakerKlaus Jansen

Approximation algorithms for knapsack and scheduling problems.

We study two interesting problems from the field of knapsack and scheduling problems. In the Multiple Knapsack problem we are given a list of items, each described by a size and a profit, and several one-dimensional bins of possibly different capacities. The objective is to select a subset of the items which can be packed into the bins in order to maximize the total profit. We present an efficient polynomial time approximation scheme (EPTAS), i.e. a PTAS with efficient running time 2^{f(1/epsilon)} poly(n). The presentation of this EPTAS at SODA 2009 answers an open question posed by Chekuri and Khanna at SODA 2000. In scheduling with fixed Jobs, we are given a list of sequential jobs described by their processing times. Some of the jobs are already scheduled on a system of identical parallel machines. The goal is to schedule the non-fixed jobs without preemption and overlap; the objective is to minimize the makespan. We obtain an approximation algorithm with ratio 3/2. Our result, published at SODA 2009, matches the lower bound presented by Steger, Scharbrodt and Weisser presented at SODA 1999 ten years ago and improves the previously best known algorithm with ratio 2+epsilon.

slides(pdf)

### Plenary talk: Scheduling on Low-Power Multi- and Many-Cores

Scheduling on Low-Power Multi- and Many-Cores

This presentation consists of two parts. In the first part we will describe our leakage-aware multiprocessor scheduling algorithms. When peak performance is unnecessary, Dynamic Voltage/Frequency Scaling (DVFS) can be used to reduce the dynamic power consumption of embedded multiprocessors. In future technologies, however, the static power consumption due to leakage current is expected to increase significantly. Then it will be more effective to limit the number of processors employed (i.e., to turn some of them off), or to use a combination of DVFS and processor shutdown. Leakage-aware scheduling heuristics will be presented that determine the best trade-off between these three techniques: DVFS, processor shutdown, and finding the optimal number of processors. In addition, we will discuss the scope for improvement. In the second part we will describe our efforts to develop a highly scalable parallel implementation of H.264 decoding. A previous implementation that exploits intra-frame macroblock (MB) level parallelism scales to about 30 cores. We have developed a parallel algorithm that combines intra-frame and inter-frame MB-level parallelism and that scales to 100+ cores. However, several scheduling issues exist. First, because different MBs have different processing times, dynamic scheduling is a necessity. Second, to reduce the task management overhead and to increase data locality, adjacent MBs should be scheduled on the same core. We will discuss the possibilities of providing the scheduler with sufficient information to make the optimal scheduling decisions, relieving the programmer from this task.

slides(pdf)

### Scheduling Bag-of-Tasks Applications: Practical Environments and Theoretical Results and Perspectives

SpeakerArnaud Legrand

Scheduling Bag-of-Tasks Applications: Practical Environments and Theoretical Results and Perspectives.

In this talk, I will present the bag-of-tasks scheduling problem in a general setting and try to highlight its difficulties both on a theoretical side and on a practical side. To this end, I will briefly present various theoretical results and how practical environments handle these difficulties.

slides(pdf)

### Impact of disturbance in scheduling with unavailability periods

SpeakerAmine Mahjoub

Co-authorGregory Mounie, Adel Safi and Denis Trystram

Impact of disturbance in scheduling with unavailability periods

### Efficient Scheduling of Task Graph Collections on Heterogeneous Resources

SpeakerLoris Marchal

Efficient Scheduling of Task Graph Collections on Heterogeneous Resources

In this work, we focus on scheduling jobs on computing Grids. In our model, a Grid job is made of a large collection of input data sets, which must all be processed by the same task graph or workflow, thus resulting in a collection of task graphs, problem. We are looking for a competitive scheduling algorithm not requiring complex control. We thus only consider single-allocation strategies. In addition to a mixed linear programming approach to find an optimal allocation, we present different heuristic schemes. Then, using simulations, we compare the performance of our different heuristics to the performance of a classical scheduling policy in Grids, HEFT. The results show that some of our static-scheduling policies take advantage of their platform and application knowledge and outperform HEFT, especially under communication-intensive scenarios.

slides(pdf)

### Comparison of Program Task Scheduling Algorithms for Dynamic SMP Clusters with Communication on the Fly

Co-authorMarek Tudruj, Gregory Mounie, Denis Trytram

Comparison of Program Task Scheduling Algorithms for Dynamic SMP Clusters with Communication on the Fly

The paper presents a comparison of the two novel scheduling algorithms developed for program structurization for execution in dynamic SMP clusters implemented in Systems on Chip (SoC) technology. SoC modules are built of a set of processors, memory modules and a multi-bus interconnection network. A set of such SoCs is interconnected by a global communication network. Inter-processor communication inside SoC modules communicate using a novel technique of data transfers on the fly. The algorithms present two different scheduling approaches. The first uses ETF-based genetically supported list scheduling heuristics to map nodes of a program to processors, then applies program transformations to introduce data transfers on the fly. The second algorithm is a clustering-based algorithm using Moldable Tasks (MT) to structure the graph. Inside each MT, program transformations are applied, which introduce data transfers on the fly. The algorithms were tested using a set of automatically generated parameterized program graphs. The obtained results were compared to results obtained using a classic ETF-based list scheduling without data transmissions on the fly.

slides(pdf)

### Scheduling Multi-User Periodic Arrival of Tasks : Two Linear Programming Formulations

SpeakerEmmanuel Medernach

Scheduling Multi-User Periodic Arrival of Tasks : Two Linear Programming Formulations

We are studying cyclic scheduling with many users. Users share a set of homogeneous machines. Each user submits periodically a set of independent tasks. We approach this problem with two linear programs: The first one models the scheduling of a transitional phase until the establishment of the permanent flow. The second model works directly on the periodical pattern. We take into account the fairness between users to prevent few users bringing system congestion. For this we suggest several criteria for measuring the load induced by users tasks.

slides(pdf)

SpeakerRichard Olejnik

slides(pdf)

### Approximation algorithms for Muliple Strip Packing

SpeakerChristina Otte

Co-authorK. Jansen

Approximation algorithms for Muliple Strip Packing

We study the Multiple Strip Packing (MSP) problem, a generalization of the well-known Strip Packing problem. For a given set of rectangles, $r_1,\ldots,r_n$, with heights and widths $\le 1$, the goal is to find a non-overlapping orthogonal packing without rotations into $k\in\mathbb{N}$ strips $[0,1]\times[0,\infty)$, minimizing the maximum of the heights. We present an approximation algorithm with absolute ratio $2$, which is the best possible, unless ${\cal P}={\cal NP}$, and an improvement of the further best result with ratio $2+\EPS$. Furthermore we present simple shelf-based algorithms with short running-time and an AFPTAS for MSP. Since MSP is strongly ${\cal NP}$-hard, an FPTAS is ruled out and an AFPTAS is also the best possible result in the sense of approximation theory.

slides(pdf)

### 30 min talk. Title?

SpeakerJonathan Pecero

No abstract yet

### Energy-aware scheduling of flow applications on master-worker platforms

SpeakerJean-Francois Pineau

Energy-aware scheduling of flow applications on master-worker platforms

In this work, we consider the problem of scheduling an application composed of independent tasks on a fully heterogeneous master-worker platform with communication costs. We introduce a bi-criteria approach aiming at maximizing the throughput of the application while minimizing the energy consumed by participating resources. Assuming arbitrary super-linear power consumption laws, we investigate different models for energy consumption, with and without start-up overheads. Building upon closed-form expressions for the uniprocessor case, we are able to derive optimal or asymptotically optimal solutions for both models.

slides(pdf)

### On Tolerating Unrecoverable Interruptions via Task Replication

SpeakerYves Robert

Co-authorAnne Benoit, Arnold Rosenberg and Frederic Vivien

On Tolerating Unrecoverable Interruptions via Task Replication

We have access to $p$ identical computers to compute a large diviible workload. These computers are subject to unrecoverable interruptions (that cause to lose all work currently in progress on an interrupted computer). The scheduling tool that we employ to cope with these interruptions is task replication. We assume a priori knowledge of the probability of a computer's being interrupted. We also assume that the risk of a computer's being interrupted increases with the time that it operates, an assumption that is certainly reasonable when interruptions result from hardware failures or from returning owners in cycle-stealing scenarios. The goal is to maximize the expected amount of work that gets computed by the $p$ computers, no matter which, or how many computers get interrupted. We strive to use judicious task replication strategies, and we develop guidelines for crafting efficient schedules that provably maximize the expected amount of work done by the remote computers.

slides(pdf)

### Plenary talk: Robust approaches for scheduling under uncertainties : the RobOCoop project and its relevance for grid computing

SpeakerEric Sanlaville

Robust approaches for scheduling under uncertainties: the RobOCoop project and its relevance for grid computing

This presentation will begin with a short overview of scheduling under uncertainties: application domains and uncertainty sources, typology of methods. It will then focus on the RobOCoop project, labelled in 2008 by the french agency ANR . This project, Robustness and Cooperation in Scheduling, has four partners: the french research laboratories LI Tours, LAAS Toulouse and LITIS Le Havre, and the ILOG-IBM enterprise (ILOG scheduler team). It aims at studying the robust and cooperative approaches in all relevant domains of scheduling under uncertainties, mainly project scheduling, shop scheduling, grid scheduling. One important statement is that in many cases, a form of cooperation will prove useful, or even unavoidable to take uncertainties into account. The concrete objective of the project is to design, build and deliver to the international research community an evaluation tool of scheduling methods. The presentation will focus on some critical aspects of the design of the tool: general software architecture, design of specific benchmark, performance indicators. In the case of grid scheduling, some specific characteristics will be emphasized, concerning the production grid [1]. The need for a cooperative approach will be stressed. The problem can be seen as a dynamic load balancing problem; some approaches will be shortly discussed.

[1]: J.-C. Billaut, A. Moukrim, E. Sanlaville (editors). Flexibility and Robustness in Scheduling. ISTE and Wiley publishers (Control Systems, Robotics and Manufacturing Series).

slides(pdf)

### Plenary talk: Multi-objective optimization/approximation in scheduling

SpeakerErik Saule

Multi-objective optimization/approximation in scheduling.

Classical scheduling theory is interested in finding a schedule a schedule that optimize one criterion. However, the reality is most of the time more complex and decision are made according to several criteria. This talk presents the key concepts in multi objective optimization which will be illustrated on three different problems. Since most practical problems can not be solved in polynomial time, the two kinds of approximation - that is zenith approximation and Pareto front optimization - will receive much interest.

slides(pdf)

### Applying Learning Classifier Systems for Multiprocessor Scheduling Problem

SpeakerFranciszek Seredynski

Applying Learning Classifier Systems for Multiprocessor Scheduling Problem

The problem of multiprocessor scheduling is considered where a parallel system and a parallel program are represented by non oriented graph and weighted oriented acyclic graph, respectively. The purpose of the scheduling is to minimize the total execution time of a program graph in a system graph. To solve the problem we propose a scheduling algorithm which uses an evolutionary driven learning machine called XCS (eXtended Classifier System), which manages a number of scheduling heuristics. In the learning phase of the scheduling algorithm XCS develops general rules to match instances of the scheduling problem with appropriate scheduling algorithms. In the normal operating phase the LCS uses the found rules to assign automatically a scheduling algorithm to a given instance of the problem to solve it in an optimal or suboptimal way.

slides(ppt)

### Batch Scheduling of Conflicting jobs

Batch Scheduling of Conflicting jobs

The classic p-batch scheduling problem is to partition a set of jobs to batches, each to be processed simultaneously on set of parallel machines. Many real-life applications impose restrictions on the subsets of jobs that can be processed simultaneously. Such conflicts among the jobs are often modeled by an undirected conflict graph $G$, where each job is represented by a vertex. The length of a vertex is the processing time of the corresponding job, and the edges in the graph represent the conflicts between pairs of jobs. In each time period we can process a batch of jobs that forms an independent set in $G$. Indeed, finding a batch schedule for the jobs is equivalent to finding a proper batch coloring of the graph $G$, where each batch is assigned a distinct contiguous set of colors, whose size is equal to the maximum length of any vertex in the batch. In this talk I will survey known results for the above problem of batch scheduling with conflicts, considering three different measures: minimizing the makespan of the batch schedule, minimum sum of job completion times and minimum sum of batch completion times.

Based on papers with Leah Epstein, Magn\'{u}s Halld\'{o}rsson and Asaf Levin.

slides(ppt)

### Integrality gap for the bin-packing problem

Integrality gap for the bin-packing problem

The bin packing problem may be understood as a problem of partitioning a given integral vector into the minimum number of feasible knapsack solutions. This yields an integer program, which is sometimes referred to as the Gilmore-Gomory formulation. We study the quality of this formulation; namely, the difference between its optimum value and the ceiling of the optimum value of the associated linear programming relaxation. For the case when the number of distinct item sizes does not exceed 7, we show that this difference is at most 1.

slides(pdf)

### Considering the real cost of communication in task scheduling

SpeakerOliver Sinnen

Considering the real cost of communication in task scheduling

Scheduling a task graph onto several processors is a trade-off between maximising concurrency and minimising interprocessor communication. It is already difficult to find a good trade-off under the classic scheduling model. Yet, in real parallel systems the contention for communication resources and the involvement of processors in communication play an important role. In this talk, models will be presented that allow the consideration of these aspects in scheduling. Given the increased importance of communication avoidance under these models, a task duplication algorithm for the contention model will also be discussed.

slides(pdf)

### Mapping regular iterative computations in a streaming model architecture

Co-authorMarek Tudruj

Mapping regular iterative computations in a streaming model architecture

The Finite Difference Time Domain (FDTD) method is an iterative algorithm used for simulation of the electromagnetic wave propagation in a given area. We propose a streaming model of the FDTD computations, which can be characterized by injection of small portions of data into computing nodes, fast processing and passing of the results either into the main storage or to next computational steps. The amount of data processed in one computational step must be adjusted to the memory size in each computing node. We achieve it by using a tiling approach in which a data flow graph representing computations is decomposed into a set of tiles. A single tile is processed on a single computing node and it defines atomic data transfers. In our case, each tile represents a small subset of FDTD computational cells. They are processed in a pipe manner: data are sent from the main memory to the local memory of given computational node, the computation are performed and the results are sent back to the main memory. In such a model, computations can easily overlap communication. In order to obtain satisfactory performance, the sizes and shapes of tiles have to match the architecture of the assumed executive system. The computation efficiency depends on the relation between the processing time of a single tile and the time needed to transfer a single tile data to and from a chosen computational node. We have tested, discussed and compared efficiency of computations for various parameters of communication system, wave propagation area shapes and for various sizes of tiles.

slides(pdf)

### 2 talks: Scheduling architecture-supported regions in parallel programs; Using extremal optimization for Java program initial placement in clusters of JVMs

SpeakerMarek Tudruj

Scheduling architecture-supported regions in parallel programs

It is quite often that a parallel or distributed executive system includes specialized processing units dedicated to accelerated execution of particular computational tasks, like peculiar library operations, special functions generation, optimized data search et similar. With current multicore system technology such dedicated units can be easilly designed to be parallel to increase their performance. Optimal processing of program code in such dedicated units can impose paticular requirements on the structure and behaviour of the involved embedded parts of programs. These requiements can concern particular start-up of computations in dedicated processors, synchrony or asynchrony of data supply and processing, elimination of intermediate communication towards the rest of the system, fully synchronous layered execution and similar. Parallel or distributed programs built for such executive architectural assumptions can be considered as built of well defined regions dedicated to execution in specialized parallel units bound with the rest of the program code by scripts (glue code) which define the necessary program execution interfaces. Scheduling programs to such defined executive systems requires special program representation and algorithms. The paper presents such representation based on graph approach and respective program graph scheduling algorithms, which are based on the ETF heuristics. The algorithms are meant for a modular architecture composed of many Chip-Multi-Processor modules interconnected by a global data communication network. The assumed architecture of dedicated CMP modules enables personalized fully synchronous program execution, which uses communication on the fly to strongly reduce inter-core communication overheads.

Co-authors:Eryk Laskowski, Ivanoe de Falco, Umberto Scafuri, Ernesto Tarantino, Richard Olejnik

Using extremal optimization for Java program initial placement in clusters of JVMs

no abstract yet.

slides(pdf)slides(ppt)