In companies, institutes or at home, we generate large amounts and many types of data. The success of our business or research may depend on handling that data properly. We can extract meaningful patterns by applying data reduction and analysis techniques, but eventually we must present the data graphically. Many software packages let us create simple line plots or bar charts, but creating data-dense visualisations without distortions is still more of an art than a science. The goal of the workshop is to arm you with practice-oriented tips that will help you avoid clutter and increase the data density of your graphs.
matplotlib is the de facto standard for 2D plotting with Python and has been in active development since 2003. No other plotting library available for Python matches its large number of visualisation types. It can be used to create interactive visualisations, hard-copy plots or standalone applications.
The tutorial will introduce the basic theory of data visualisation and put it to use through matplotlib. To unleash the full power of matplotlib, we will reach under the hood and discover some hidden gems in terms of customisation and working with visual primitives. The participants will be encouraged to practise their visualisation skills through a series of examples. They will learn how to build complex data visualisations from the ground up and spice them up with a bit of interactivity.
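As a taste of what customisation and visual primitives can look like, here is a minimal sketch, not taken from the tutorial materials. It removes the frame lines that carry no data and draws a `Rectangle` patch behind the line to highlight a range. The function name `decluttered_line_plot`, its `highlight` parameter and the sine-wave data are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import numpy as np


def decluttered_line_plot(x, y, highlight=(2.0, 4.0)):
    """Plot y against x with reduced clutter and a highlighted x-range.

    `highlight` is a hypothetical parameter used only in this illustration.
    """
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(x, y, color="black", linewidth=1)

    # Customisation: hide the top and right frame lines, which carry no data.
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.tick_params(direction="out")

    # Visual primitive: a Rectangle patch drawn behind the line (zorder=0).
    y_min, y_max = ax.get_ylim()
    x0, x1 = highlight
    band = Rectangle((x0, y_min), x1 - x0, y_max - y_min,
                     facecolor="0.85", edgecolor="none", zorder=0)
    ax.add_patch(band)
    ax.set_ylim(y_min, y_max)  # keep the limits chosen for the data

    ax.set_xlabel("time")
    ax.set_ylabel("signal")
    return fig, ax, band


# Made-up example data: a sine wave.
x = np.linspace(0, 10, 200)
fig, ax, band = decluttered_line_plot(x, np.sin(x))
```

Calling `fig.savefig(...)` on the returned figure would produce a hard copy. In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` instead.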
0) Introduction
1) Visualising patterns over time
   pyplot interface
2) Visualising proportions
3) Visualising distributions
4) Visualising correlations
5) Finding patterns
6) Making maps
See also slides, examples and exercises from my previous tutorial
Required software:
Optional software:
All of the libraries are available in standard Linux distributions, via pip or in scientific Python distributions (such as Anaconda).
You can also install them with pip (for example, inside a virtual environment):
mkdir ~/.virtualenv
virtualenv ~/.virtualenv/ep2014_mpl #create virtualenv
source ~/.virtualenv/ep2014_mpl/bin/activate #activate it (in bash)
pip install -U numpy scipy matplotlib #absolute basics
pip install -U ipython pyzmq jinja2 #for ipython notebook
pip install -U pandas #used in many examples
pip install -U scikit-learn lxml patsy #extras
pip install -U statsmodels
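After installing, a quick way to confirm that the packages are visible to your interpreter is a short check like the one below. It is only a sketch: the `check_modules` helper and the grouping into `BASICS` and `EXTRAS` are assumptions made for this example. Note that some packages are imported under a different name than the one used with pip (`scikit-learn` as `sklearn`, `pyzmq` as `zmq`, `ipython` as `IPython`).

```python
import importlib.util

# Import names, which may differ from the pip package names.
# The split into two groups is an assumption for this example.
BASICS = ["numpy", "scipy", "matplotlib"]
EXTRAS = ["IPython", "zmq", "jinja2", "pandas",
          "sklearn", "lxml", "patsy", "statsmodels"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for label, names in (("basics", BASICS), ("extras", EXTRAS)):
    for name, found in check_modules(names).items():
        print(f"{label:7s} {name:12s} {'ok' if found else 'MISSING'}")
```

Run it with the virtual environment activated. Any line marked `MISSING` points to a package that is not installed in that environment.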
Bartosz Telenczuk has been an active Python user since 2005. He is the creator of svgutils and has contributed to many open source Python libraries, including numpy and matplotlib. He is also a Python advocate and co-organiser of advanced Python schools for scientists. Currently he is a researcher in France, developing methods to interpret the electrical activity of the brain.