M2R Scientific Methodology and Performance Evaluation
General Information
The coordinator for these lectures is Jean-Marc Vincent. The lecturers are Jean-Marc Vincent and Arnaud Legrand.
Lectures generally take place on Monday mornings from 9:15 to 12:45 (a lecture will usually last less than 3 hours, so we will hold more of them).
The schedule with lecture rooms is available on the ADE website (look for PDES and then for the Workshop on Performance Evaluation).
Objectives
The aim of this course is to provide the fundamental basis for sound scientific methodology of performance evaluation of computer systems. Two approaches are developed:
- performance measurement: based on experimental platforms (benchmarks or execution of your own instrumented code), how to analyze data and synthesize performance indices
- performance modeling: from a description of resources and the behavior of applications, how to predict the performance of the application
Here are links to the previous editions of this lecture: 2011-2012, 2012-2013, 2013-2014.
Program and expected schedule
13 October 2014 (09:45 - 12:45): Arnaud Legrand, D211
Introduction to reproducible research, how to manage a laboratory notebook and report results.
Documents:
slides
References:
A few interesting git tutorials:
- http://try.github.com/
- https://try.github.io/levels/1/challenges/1
- http://gitimmersion.com/
- http://git-scm.com/docs/gittutorial
And of course, a link to the org-mode project.
To do for the next time: Conduct experiments with a parallel implementation of quicksort. The archive with all the source files is available on GitHub.
- Fork this project on GitHub: https://github.com/alegrand/M2R-ParallelQuicksort
- Experiment with this code in various environments (laptop, G5K, …)
- Take notes on what you did and push your journal back to GitHub
- Write a concise one-page IMRAD report
No lecture on 20 or 27 October 2014!
3 November 2014 (09:45 - 12:45): Arnaud Legrand, D113
Data presentation, reporting results. Introduction to visualizing data with R and reporting results.
Documents:
slides and second part of 2011 lecture if time allows.
References:
- R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley-Interscience, New York, NY, April 1991.
- Brief introduction to visualisation using R, ggplot, knitr/Rmd, Rstudio.
- http://www-958.ibm.com/software/analytics/manyeyes/
To do for the next time: Improve your previous presentation of experimental results. Possibly
10 November 2014 (09:45 - 12:45): Jean-Marc Vincent, D113
Basic notions of statistics: expectation and variance estimation, confidence intervals, distribution comparison. Checking whether the hypotheses hold or not.
Documents:
./M2R_EP_Lecture3.pdf
Goal:
understand how to deal with randomness
To do for the next time: Use what you learnt to improve previous work! :) Make sure you do not run into issues such as:
- Crappy data
- Inadequate data
- Temporal dependencies
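As an illustration of the notions listed for this lecture, here is a minimal R sketch that estimates the expectation and variance of a sample and computes a 95% confidence interval for the mean. The data are synthetic (drawn with `rnorm`) and the helper name `mean_ci` is hypothetical. The interval is based on the Student t distribution, so it assumes independent and roughly normally distributed measurements, which is precisely the kind of hypothesis that should be checked on real data.

```r
# Hypothetical helper: point estimates and a confidence interval for the mean,
# assuming independent and roughly normal measurements.
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  m <- mean(x)    # estimate of the expectation
  s2 <- var(x)    # unbiased estimate of the variance
  # Half-width of the interval, using the Student t quantile with n - 1 df
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sqrt(s2 / n)
  c(mean = m, variance = s2, lower = m - half, upper = m + half)
}

set.seed(42)                               # make the synthetic data reproducible
durations <- rnorm(30, mean = 10, sd = 2)  # synthetic "execution times"
ci <- mean_ci(durations)
print(ci)
```

The built-in `t.test(durations)$conf.int` should give the same bounds. With temporal dependencies between measurements, the interval would typically be too narrow.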
17 November 2014 (13:45 - 15:45): Jean-Marc Vincent, D213
More advanced notions of statistics: linear regression.
Documents:
slides
Goals:
- Understand the strengths and limitations of the linear model
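A minimal R sketch of a linear regression may help here. The data are synthetic: the input sizes, the true coefficients (0.5 and 2e-4) and the noise level are all assumptions made for the illustration, not measurements from the course.

```r
set.seed(1)                                   # reproducible synthetic data
sizes <- c(1e3, 2e3, 5e3, 1e4, 2e4, 5e4)      # hypothetical input sizes
size <- rep(sizes, each = 5)                  # 5 replicates per size
# Synthetic timings: linear in size plus Gaussian noise
time <- 0.5 + 2e-4 * size + rnorm(length(size), sd = 0.2)
runs <- data.frame(size = size, time = time)

fit <- lm(time ~ size, data = runs)           # ordinary least squares fit
print(coef(fit))                              # estimated intercept and slope
print(confint(fit))                           # 95% confidence intervals

# The linear model assumes residuals with no structure and constant variance;
# inspecting them is a first check of these hypotheses.
res <- residuals(fit)
print(summary(res))
```

Here the model fits well because the data were generated from a linear law. On real measurements, plotting `res` against `size` is a simple way to spot non-linearity or non-constant variance.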
To do for the next time
:
No lecture on 24 November 2014!
1 December 2014 (09:45 - 12:45): Arnaud Legrand, ???
Measurement on computer systems (benchmarking, observation, tracing, monitoring, profiling).
Documents:
Goal:
become fully aware of variability, the need to replicate, experimental setup (compiling, machine, input workload), timing resolution, warmup, randomization, …
To do for the next time: Use what you learnt to improve previous work! :) Try to assess the importance of gcc optimization, of loop unrolling, of input workload, … Try to understand why your performance is so bad.
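The ideas of replication, warmup and randomization can be sketched in R as follows. This is only an illustration under stated assumptions: R's built-in `sort` stands in for the program under test, and the sizes and the number of replicates are arbitrary choices.

```r
set.seed(2014)                                 # reproducible run order and inputs

# Full factorial design: every input size is measured 5 times
design <- expand.grid(size = c(1e4, 1e5, 1e6), replicate = 1:5)
design <- design[sample(nrow(design)), ]       # randomize the run order
rownames(design) <- NULL

invisible(sort(runif(1e4)))                    # warm-up run, result discarded

# Run the measurements in the randomized order
design$elapsed <- sapply(design$size, function(n) {
  input <- runif(n)                            # fresh input workload for each run
  system.time(sort(input))[["elapsed"]]        # wall-clock time in seconds
})

# Summarize per size; keep the raw data to study the variability
print(aggregate(elapsed ~ size, data = design, FUN = median))
```

Randomizing the order prevents a temporary perturbation of the machine from affecting all the runs of one configuration. Note that small inputs may take less time than the timer resolution, in which case the measured durations are not meaningful.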
8 December 2014 (09:45 - 12:45): Arnaud Legrand and Jean-Marc Vincent, ???
Mini-defense of students.
Using R
Installing R and Rstudio
Here is how to proceed on Debian-based distributions:
sudo apt-get install r-base r-cran-ggplot2 r-cran-reshape
Rstudio and knitr are unfortunately not packaged in Debian, so the easiest way is to download the corresponding Debian package from the Rstudio webpage and then install it manually (depending on when you do this, you should adapt the version number).
wget http://download1.rstudio.org/rstudio-0.97.551-amd64.deb ## actually, this archive is likely to be outdated now so get the most recent one.
sudo dpkg -i rstudio-0.97.551-amd64.deb
sudo apt-get -f install # to fix possibly missing dependencies
You will also need to install knitr. To this end, you should simply run R (or Rstudio) and use the following command.
install.packages("knitr")
If r-cran-ggplot2 or r-cran-reshape could not be installed for some reason, you can also install them from within R:
install.packages("ggplot2")
install.packages("reshape")
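To find out which of these packages are still missing before installing anything, a small check along the following lines can be run in R. The helper name `missing_packages` is hypothetical; it only relies on the base function `requireNamespace`.

```r
# Hypothetical helper: returns the names of the packages that cannot be loaded.
# Note that requireNamespace() loads the namespace of each package it finds.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

todo <- missing_packages(c("ggplot2", "reshape", "knitr"))
print(todo)
# Uncomment to install whatever is missing (requires network access):
# if (length(todo) > 0) install.packages(todo)
```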
Producing documents
The easiest way to go is probably to use R+Markdown (Rmd files) in Rstudio and to export them via Rpubs to publish whatever you want.
We can roughly distinguish between three kinds of documents:
- Lab notebook (with everything you try and that is meant mainly for yourself)
- Experimental report (selected results and explanations with enough details to discuss with your advisor)
- Result description (rather short, with only the main points, which could be embedded in an article)
We expect you to provide us with the last two and to make them publicly available so that others can comment on them.
Documentation
For a quick start, you may want to look at R for Beginners (French version). A probably more entertaining way to go is to follow a good online course providing an introduction to R and to data analysis, such as this one: https://www.coursera.org/course/compdata
Bibliography
- R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley-Interscience, New York, NY, April 1991.
- Jean-Yves Le Boudec. Methods, practice and theory for the performance evaluation of computer and communication systems, 2006. EPFL electronic book.
- David J. Lilja, Measuring Computer Performance: A Practitioner’s Guide, Cambridge University Press, 2005.
- R. Nelson, Probability, Stochastic Processes, and Queueing Theory: The Mathematics of Computer Performance Modeling, Springer-Verlag, 1995.