
The tracing procedure

Tracing involves five phases: preprocessing, clock sampling, execution, a second clock sampling, and post mortem trace collection.

1) Preprocessing, compiling and linking

A program to be traced must be built with the dedicated tracing Makefile and its compile and link commands. The preprocessing performed during compilation replaces each call to an Athapascan function with a call to an instrumented function.

For instance, each a0Send is replaced by _a0_perf_Send (the instrumented function), which in turn calls _a0Send (the non-instrumented function).
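The wrapping pattern described above can be illustrated with a minimal Python sketch. This is only an analogy: the real functions are part of the Athapascan library, and everything here (the stand-in functions, the in-memory trace_events list, the recorded fields) is a hypothetical illustration, not the library's implementation.

```python
import time

trace_events = []  # hypothetical in-memory trace buffer


def a0_send_plain(dest, payload):
    """Stand-in for the non-instrumented function (_a0Send in the text)."""
    return len(payload)


def a0_perf_send(dest, payload):
    """Stand-in for the instrumented function (_a0_perf_Send in the text)."""
    start = time.perf_counter()
    result = a0_send_plain(dest, payload)  # delegate to the original function
    end = time.perf_counter()
    # Record one event: function name, destination, start and end times.
    trace_events.append(("a0Send", dest, start, end))
    return result


# Effect of the preprocessing step: the name used by the program
# now refers to the instrumented version.
a0Send = a0_perf_send
```

The caller's code is unchanged; only the name binding differs, which is why the tracing build needs no source modifications.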

Object files are then linked with the instrumented library containing both normal and instrumented functions.

2) Clock sampling before execution (optional)

Parallel systems providing a global clock on all nodes do not require the sampling step.
Without a global clock, the local clocks of the nodes are not coherent. In Athapascan-tr, a global clock is implemented in software by correcting the local clocks of the trace files post mortem. This correction step uses an estimate of the clock drifts computed during a sampling phase prior to program execution:

prompt% a0clock -a0procs=nb_nodes -of=clock_file1 -nbp=nb_points -ws=win_size -delay=time

Clock_file1 is the name of the clock drift file to create.

Nb_points and time define the number of samples and the delay (in microseconds) between two consecutive samples.

The smoothing window retains the last win_size samples (typically 2 to 5). To remove erratic samples, each sample is replaced by the median value of the smoothing window.
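The median smoothing described above can be sketched as follows. This is a minimal illustration, not the a0clock code: the function name smooth is hypothetical, and the behaviour while the window is still filling (using the median of the samples seen so far) is an assumption.

```python
from collections import deque
from statistics import median


def smooth(samples, win_size):
    """Replace each sample by the median of the last win_size samples."""
    window = deque(maxlen=win_size)  # oldest sample is dropped automatically
    smoothed = []
    for sample in samples:
        window.append(sample)
        # Assumption: before the window is full, use the samples seen so far.
        smoothed.append(median(window))
    return smoothed
```

With win_size=3, an erratic value such as 500 in the series 10, 11, 500, 12, 13 never appears in the output, because it is never the middle value of its window.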

Use a0decode -clock_file=clock_file1 in step 5.

3) Execution

Traced programs are executed similarly to the normal ones, with some extra parameters:

prompt% a0run my_prog -a0procs=nb_nodes -a0trace_file=tfile -a0nbuf=nb_buf -a0sbuf=bufsize

Nb_buf and bufsize are the number and size (in bytes) of the trace buffers, and tfile is the name of the trace files to build (/TMP/perf_trace by default).

Put your trace file on a local file system (such as /tmp) rather than a remote one, to reduce network file system (NFS) daemon activity during program execution. Check that you have permission to write the trace file, especially when using the default file name.

Use at least as many buffers as the maximum number of concurrent threads (add about a dozen internal daemons to your own threads).

4) Clock sampling after execution (optional)

A second clock sampling step after program execution produces more accurate results. It takes the clock file produced in step 2 as input:

prompt% a0clock -a0procs=nb_nodes -of=clock_file2 -nbp=nb_points -ws=win_size -delay=time -if=clock_file1

Use a0decode -clock_file=clock_file2 in step 5.
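To see why sampling both before and after execution helps, consider a sketch of a post mortem clock correction. The contents of the clock files and the formula used by a0decode are not described here, so this is purely an assumed model: one offset measurement (local clock minus reference clock) before the run and one after, with the drift taken to be linear in between. All names are hypothetical.

```python
def make_corrector(t1, offset1, t2, offset2):
    """Build a function mapping a local timestamp to global time.

    (t1, offset1): local time and measured offset before execution.
    (t2, offset2): local time and measured offset after execution.
    Offsets are local clock minus reference clock, in microseconds.
    """
    if t2 == t1:
        raise ValueError("the two measurements need distinct local times")
    drift = (offset2 - offset1) / (t2 - t1)  # offset change per microsecond

    def to_global(t_local):
        # Assumed linear model: interpolate the offset at t_local, remove it.
        return t_local - (offset1 + drift * (t_local - t1))

    return to_global
```

With a single measurement only a constant offset can be removed; a second measurement makes it possible to estimate how the offset changes during the run.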

5) Post mortem trace processing

Program execution creates a set of trace files tfile.x (where tfile is the trace file name passed to a0run and x is the node number). Collect all trace files:

prompt% rcp my_node_0:tfile.0 .
prompt% rcp my_node_1:tfile.1 .
...
prompt% rcp my_node_n-1:tfile.n-1 .
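For many nodes, the copy commands can be generated rather than typed. The helper below is a hypothetical sketch that only builds the command strings following the tfile.x naming convention given above; it does not run them, and the node names are placeholders.

```python
def collect_commands(nodes, tfile):
    """Return one rcp command per node, copying tfile.<node number> locally."""
    return [f"rcp {node}:{tfile}.{i} ." for i, node in enumerate(nodes)]


if __name__ == "__main__":
    # Placeholder node names; replace with the hosts used for the run.
    for command in collect_commands(["my_node_0", "my_node_1"], "tfile"):
        print(command)
```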

Then decode, merge and sort the node trace files into a single ASCII trace file:

prompt% a0decode tfile.* > my_ascii_file.trace
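The merge-and-sort part of this step can be pictured with a short sketch. The trace file format is not described here, so the event layout (timestamp, node number, label) is an assumption, as is the premise that each node's events are already in timestamp order after clock correction. This is not how a0decode is implemented, only an illustration of the idea.

```python
import heapq


def merge_traces(per_node_events):
    """Merge per-node event lists (each sorted by timestamp) into one list."""
    # heapq.merge performs a k-way merge without re-sorting everything.
    return list(heapq.merge(*per_node_events, key=lambda event: event[0]))
```

For example, merging [(1, 0, "a"), (5, 0, "b")] from node 0 with [(2, 1, "c"), (3, 1, "d")] from node 1 yields the four events in timestamp order 1, 2, 3, 5.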

Use my_ascii_file.trace as the input file for Paje.


Last updated: 8 January 2003