Using GraphTool
This page is a short tutorial on how to generate graphs using CMS's GraphTool application. It covers only the graphing interface, and is meant for users who do not wish to use the XML configuration interface.
Introduction
GraphTool is a plotting library based on the venerable matplotlib (matplotlib.sf.net). A couple of the high points include:
- The ability to be a "complete package". A user with a SQL database can specify a few configuration files, and end up with nice looking graphs and a rudimentary web interface.
- Produces "paper quality" graphs. We wish to produce graphs which can be taken directly from the web and given to managers to put into PowerPoint presentations, and, eventually, graphs which can be put into scientific papers. While there are always personal preferences, we hope even the basic plots produce pleasing images.
GraphTool Installation
The GraphTool installation procedure is documented in detail here.
GraphTool Organization
If you're interested, here's the organization of the folders. This section can be skipped.
- $GRAPHTOOL_ROOT
- examples - Simple "whole package" example provided by Ricky Egeland
- config - XML Configuration files for example
- setup.sh - file to configure environment
- tools - Scripts to start up the web application server.
- README - readme provided by Ricky Egeland. Currently rather terse, but to be expanded.
- setup.py (don't use this for now; in the future, we might make this functional or drop it completely)
- src - source code for GraphTool
- static_content - static content needed for the web interface, as well as the CMS watermark currently used.
Hello (Graphing) World!
The simplest kind of plot we will consider today is a pie chart. First, here's the entire example. Below, we'll cover things line-by-line:
$ python
Python 2.4.1 (#2, Mar 31 2005, 00:05:10)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from graphtool.graphs.common_graphs import PieGraph
>>> file = '/tmp/hello.png'
>>> data = {'foo':45, 'bar':55}
>>> metadata = {'title':'Hello Graphing World!'}
>>> pie = PieGraph()
>>> coords = pie.run( data, file, metadata )
The resulting graph should look something like this:

We'll talk about how to resize it to fit your needs later.
First, a line-by-line analysis.
- Start the python interpreter:
$ python
- Import the common graphs:
>>> from graphtool.graphs.common_graphs import PieGraph
- Pick a filename to save the image to:
>>> file = '/tmp/hello.png'
- Create some sample data. The PieGraph expects a simple dictionary of values as data.
>>> data = {'foo':45, 'bar':55}
- Create a dictionary to hold the metadata, such as the title or the units the graph is in:
>>> metadata = {'title':'Hello Graphing World!'}
- Create the graphing object:
>>> pie = PieGraph()
- Finally, plot the graph. Notice that the run method's arguments are ( data, file, metadata ), and that it returns a coordinates dictionary. This dictionary holds the screen coordinates of each slice of the pie, and can easily be converted to an HTML image map if you choose to put the images on the web.
>>> coords = pie.run( data, file, metadata )
Open up the file, and admire your handiwork! Luckily, it doesn't get too much harder than this.
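As a rough illustration of the image-map remark above, here is a minimal sketch that turns a coordinates dictionary into an HTML <map> element. It assumes, hypothetically, that each entry maps a slice label to a list of (x, y) pixel points outlining that slice; the structure GraphTool actually returns may differ, so inspect coords before relying on this. The function name coords_to_image_map is our own.

```python
from xml.sax.saxutils import quoteattr


def coords_to_image_map(coords, map_name, href_template='#%s'):
    """Build an HTML <map> element from a {label: [(x, y), ...]} dictionary.

    ASSUMPTION: each value is a polygon given as (x, y) pixel points; check
    the structure your GraphTool version actually returns first.
    """
    lines = ['<map name=%s>' % quoteattr(map_name)]
    for label in sorted(coords):
        # <area shape="poly"> wants a flat "x1,y1,x2,y2,..." integer list
        flat = ','.join('%d,%d' % (int(round(x)), int(round(y)))
                        for x, y in coords[label])
        lines.append('  <area shape="poly" coords="%s" href=%s alt=%s>'
                     % (flat, quoteattr(href_template % label), quoteattr(label)))
    lines.append('</map>')
    return '\n'.join(lines)


html = coords_to_image_map({'foo': [(10, 10), (50, 10), (30.4, 40.2)]}, 'hello')
print(html)
```

To use the map, reference it from the image tag, for example <img src="hello.png" usemap="#hello">.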
Note - if you think the graph layout just absolutely sucks, let us know! While we have specific reasons behind many of our decisions, we welcome future improvements. Additionally, we can help you create customized graphs.
Bar Graphs
The BarGraph example is similar to the PieGraph. It is done in these steps:
- Define the data object
- Define the metadata dictionary
- Create the file object
- Create and run the Graph object
Here is the code:
#!/usr/bin/env python
import os
from graphtool.graphs.common_graphs import BarGraph
from graphtool.tools.common import expand_string
# First example - we have a bar graph with text labels
data = {'Team A':4, 'Team B':7}
print(data)
metadata = {'title':'First Bar Example', 'height':200, 'width':400,
            'title_size':20, 'text_size':17}
file = open(expand_string('$HOME/tmp/bar_strings.png',os.environ),'w')
BG = BarGraph()
BG(data, file, metadata)
Here is the output graph:

The data for the BarGraph is a simple python dictionary. The keys are the bar positions and the values are the bar heights. If any of the keys is a string, the grapher will automatically add appropriate text labels.
Notice we are using the following metadata entries:
- title: The title to add to the graph.
- height: Height (in pixels) of the graph output.
- width: Width (in pixels) of the graph output. In the future, the user will be able to specify either the size in pixels or the size in inches plus a DPI.
- title_size: Size (in pixels) of the title font.
- text_size: Size (in pixels) of the text font used in the graph.
We use the helper function expand_string to expand the environment variables in the file name.
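If you want to see what this substitution amounts to, here is a minimal sketch using only the standard library's string.Template. It assumes expand_string performs shell-style $NAME substitution from the mapping it is given; expand_string_sketch is a hypothetical name, not part of GraphTool. The standard os.path.expandvars does the same job against os.environ directly.

```python
import os
from string import Template


def expand_string_sketch(text, env):
    """Substitute $NAME and ${NAME} in text using the env mapping.

    Hypothetical stand-in for expand_string, ASSUMING shell-style
    substitution; names missing from env are left as they are.
    """
    return Template(text).safe_substitute(env)


print(expand_string_sketch('$HOME/tmp/bar_strings.png', os.environ))
print(os.path.expandvars('$HOME/tmp/bar_strings.png'))  # stdlib equivalent for os.environ
```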
The following example is similar to the above one, except it places data at integer points.
#!/usr/bin/env python
import os
from graphtool.graphs.common_graphs import BarGraph
from graphtool.tools.common import expand_string
data = {}
for i in range(10):
    data[i] = i**2
print(data)
metadata = {'title':'Second Bar Example'}
file = open(expand_string('$HOME/tmp/bar_ints.png',os.environ),'w')
BG = BarGraph()
BG(data, file, metadata)
Here is the output graph:

Histogram Example
The Histogram class is a simple way to histogram data. While it shouldn't be used for the SQL DB bindings (use histogram_parser and BarGraph directly instead), it's very convenient for the python API.
The Histogram expects its data as a list or array. Compared to the BarGraph class, it needs only one additional metadata entry, nbins, an integer which tells the histogramming function how many bins to make (the default value is 10).
Here is some example code which uses numpy's normal function to generate random data drawn from a normal distribution.
#!/usr/bin/env python
import os
from graphtool.tools.common import expand_string
from graphtool.graphs.common_graphs import Histogram
import numpy
data = numpy.random.normal(size=100, loc=1.0)
metadata = {'title':'First Histogram Example', 'nbins': 15}
file = open(expand_string('$HOME/tmp/histogram.png',os.environ),'w')
HG = Histogram()
HG(data, file, metadata)
The resulting output graph is below:

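For the SQL-style workflow mentioned above, you bin the data yourself and hand the counts to a BarGraph. The sketch below does equal-width binning in plain Python; it is not GraphTool's histogram_parser (whose interface we don't assume), and histogram_dict is a hypothetical helper. The keys are bin left edges, so the result has the same shape as the integer BarGraph data shown earlier.

```python
def histogram_dict(values, nbins=10):
    """Count values into nbins equal-width bins.

    Returns (counts, width): counts maps each bin's left edge to the number
    of values in that bin; the maximum value goes into the last bin.
    """
    values = list(values)
    if not values:
        return {}, 0.0
    lo, hi = min(values), max(values)
    # fall back to width 1.0 when every value is identical
    width = (hi - lo) / float(nbins) or 1.0
    counts = dict((lo + i * width, 0) for i in range(nbins))
    for v in values:
        # clamp so that v == hi lands in the last bin instead of one past it
        i = min(int((v - lo) / width), nbins - 1)
        counts[lo + i * width] += 1
    return counts, width


counts, width = histogram_dict(range(11), nbins=5)
print(sorted(counts.items()))  # [(0.0, 2), (2.0, 2), (4.0, 2), (6.0, 2), (8.0, 3)]
```

The bin width is returned alongside the counts because you will likely want it as the bar width; whether BarGraph honours a 'span' entry the way the stacked bar examples describe is worth verifying before relying on it.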
Stacked Bar Graph
The StackedBarGraph class implements a graph which has bars stacked on top of each other. This is useful for data which has two pivots - such as a site name and time information. Here is an example StackedBarGraph:
Example Code
import os
from graphtool.graphs.common_graphs import StackedBarGraph
from graphtool.tools.common import expand_string
# First example - we have a bar graph with text labels
entry1 = {'foo':3, 'bar':5}
entry2 = {'foo':4, 'bar':6}
data = {'Team A':entry1, 'Team B':entry2}
metadata = {'title':'First Stacked Bar Example'}
file = open(expand_string('$HOME/tmp/stacked_strings.png',os.environ),'w')
SBG = StackedBarGraph()
SBG(data, file, metadata)
Here is the output graph:

The data object used here is a dictionary-of-dictionaries. The outer dictionary describes the "pivot", which is used to stack the bars. The pivot from the above example is the team name. On the other hand, the inner dictionary keys provide the "grouping". In this case, the two groupings are slightly arbitrary - "foo" and "bar". For most common uses, the grouping will be time stamps that the bars correspond to.
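In practice the pivot/grouping data often starts life as flat rows, for example (site, timestamp, value) tuples from a SQL query. Here is a minimal sketch for folding such rows into the dictionary-of-dictionaries a StackedBarGraph expects; pivot_group is a hypothetical helper, and summing rows that share a pivot and grouping is our own choice.

```python
def pivot_group(records):
    """Turn (pivot, group, value) triples into {pivot: {group: value}}.

    Rows with the same pivot and group are summed.
    """
    data = {}
    for pivot, group, value in records:
        inner = data.setdefault(pivot, {})
        inner[group] = inner.get(group, 0) + value
    return data


rows = [('Team A', 'foo', 3), ('Team A', 'bar', 5),
        ('Team B', 'foo', 4), ('Team B', 'bar', 6)]
data = pivot_group(rows)
# Same structure as the first StackedBarGraph example above
```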
Second Example
For the second example, we use integer data for the grouping.
import os
from graphtool.graphs.common_graphs import StackedBarGraph
from graphtool.tools.common import expand_string
entry1 = {}; entry2 = {}
for i in range(0,10,2):
    entry1[i] = i**2
    entry2[i] = i
data = {'Team A':entry1, 'Team B':entry2}
metadata = {'title':'Second Stacked Bar Example', 'span':2}
del entry1[6]; del entry2[6]; del entry2[8]
entry1[0] = 4
print(data)
file = open(expand_string('$HOME/tmp/stacked_ints.png',os.environ),'w')
SBG = StackedBarGraph()
SBG(data, file, metadata)
Graph:

There is one new entry for the metadata dictionary, span. This describes the width of each bar in the units of the grouping keys; in this case, 2. If the graph is a time plot, span is given in seconds.
Time Graph Example
Often, one wants a certain set of defaults to show time-evolution data. To this end, we have provided the TimeGraph mix-in. The time graph adds a subtitle stating the date range, selects appropriate tick marks for the x-axis, and correctly formats the x-axis labels.
The TimeGraph example is a bit more involved because TimeGraph is a mix-in, not a stand-alone graph. Accordingly, in your code you have to mix TimeGraph with a graph class, such as BarGraph. For example, this class will create a bar graph with time on the x-axis:
class TimeBarGraph( TimeGraph, BarGraph ):
    pass
That's all that is required! The TimeGraph methods will override the appropriate parts of the BarGraph to create the desired graph. The whole example is:
import time, os, random
from graphtool.graphs.common_graphs import BarGraph, StackedBarGraph
from graphtool.graphs.graph import TimeGraph
from graphtool.tools.common import expand_string
span = 3600
max_value = 40
# Generate our time series: one random value per bin over the last 24 hours
def make_time_data( ):
    # int() because range() below needs integer arguments
    end_time = int(time.time()); end_time -= end_time % span
    begin_time = end_time - 24*span
    data = {}
    for i in range(begin_time, end_time, span):
        data[i] = random.random()*max_value
    return begin_time, end_time, data
# Our classes
class TimeBarGraph( TimeGraph, BarGraph ):
    pass
class TimeStackedBarGraph( TimeGraph, StackedBarGraph ):
    pass
# Bar graph stuff.
TBG = TimeBarGraph()
begin_time, end_time, data = make_time_data()
metadata = {'title':'Bar Graph w.r.t. Time', 'starttime':begin_time, 'endtime':end_time, 'span':span }
filename = expand_string('$HOME/tmp/time_bar.png',os.environ)
file = open( filename, 'w' )
TBG( data, file, metadata )
# Stacked Bar graph stuff.
TSBG = TimeStackedBarGraph()
begin_time, end_time, data1 = make_time_data()
begin_time, end_time, data2 = make_time_data()
data = {'Team A': data1, 'Team B': data2}
metadata = {'title':'Stacked Bar Graph w.r.t. Time', 'starttime':begin_time, 'endtime':end_time, 'span':span }
filename = expand_string('$HOME/tmp/time_stacked_bar.png',os.environ)
file = open( filename, 'w' )
TSBG( data, file, metadata )
A few things of note:
- make_time_data is just a helper function which returns randomly generated data for the last 24 hours. Usually, your application will generate the data. The data should be in time bins of size span, covering the intervals [begin_time + i*span, begin_time + (i+1)*span) for i = 0, 1, 2, ..., with each bin keyed by its start time as a Unix timestamp (as make_time_data does). If your application gives data which is not aligned (for example, it gives data every 30 minutes but sets the span to an hour), the resulting graph will not be correct.
- The begin time is always passed in the metadata as 'starttime' and the end time as 'endtime'. A value for 'span' is also required.
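The mix-in trick described at the start of this section relies on Python's method resolution order: because TimeGraph is listed before BarGraph, any method both classes define is taken from TimeGraph. The sketch below shows that mechanism with hypothetical stand-in classes; the method names subtitle, x_label and draw are invented for illustration and are not GraphTool's actual TimeGraph interface.

```python
import time


class BarGraphStandIn(object):
    """Hypothetical stand-in for a plain graph class (not GraphTool code)."""

    def subtitle(self, metadata):
        return ''

    def x_label(self, x):
        return str(x)

    def draw(self, data, metadata):
        # The hooks are looked up on self, so a mix-in that comes earlier
        # in the method resolution order replaces them.
        return self.subtitle(metadata), [self.x_label(x) for x in sorted(data)]


class TimeMixinStandIn(object):
    """Hypothetical stand-in for a TimeGraph-style mix-in."""

    def subtitle(self, metadata):
        # Describe the date range, as the text says TimeGraph does
        return '%s to %s' % (self.x_label(metadata['starttime']),
                             self.x_label(metadata['endtime']))

    def x_label(self, x):
        # Format Unix timestamps as UTC dates instead of raw numbers
        return time.strftime('%Y-%m-%d %H:%M', time.gmtime(x))


class TimeBarStandIn(TimeMixinStandIn, BarGraphStandIn):
    pass


graph = TimeBarStandIn()
subtitle, labels = graph.draw({0: 1.5, 3600: 2.0},
                              {'starttime': 0, 'endtime': 7200})
print(subtitle)  # 1970-01-01 00:00 to 1970-01-01 02:00
print(labels)    # ['1970-01-01 00:00', '1970-01-01 01:00']
```

Note that draw itself lives only in the plain graph class; the mix-in changes its behaviour purely by supplying earlier definitions of the hooks it calls.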


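If your application records raw timestamps rather than pre-binned values, you need to align them to the span grid described in the time graph notes before plotting. Here is a minimal sketch, assuming integer begin_time, end_time and span in seconds, and that summing the values within a bin is what you want (for rates you would average instead); rebin is a hypothetical helper.

```python
def rebin(samples, begin_time, end_time, span):
    """Sum (timestamp, value) samples into bins aligned to begin_time.

    Keys are bin start times begin_time + i*span. Samples outside
    [begin_time, end_time) are dropped; every bin is present, even if empty.
    """
    data = dict((t, 0) for t in range(begin_time, end_time, span))
    for t, value in samples:
        if begin_time <= t < end_time:
            start = begin_time + ((t - begin_time) // span) * span
            data[start] += value
    return data


binned = rebin([(0, 1), (1800, 2), (3600, 5), (7300, 1), (-5, 100)], 0, 10800, 3600)
print(sorted(binned.items()))  # [(0, 3), (3600, 5), (7200, 1)]
```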
Cumulative Plot
This plot shows the cumulative entries with respect to time. It expects a dictionary-of-dictionaries (like the StackedBarGraph) as data, where the keys of the inner dictionaries are Unix timestamps. Its usage is very similar to that of the TimeStackedBarGraph in the example above.
import time, os, random
from graphtool.graphs.common_graphs import CumulativeGraph
from graphtool.tools.common import expand_string
span = 3600
max_value = 40
# Generate our time series
def make_time_data( ):
    # int() because range() below needs integer arguments
    end_time = int(time.time()); end_time -= end_time % span
    begin_time = end_time - 24*span
    data = {}
    for i in range(begin_time, end_time, span):
        data[i] = random.random()*max_value
    return begin_time, end_time, data
# Create and plot cumulative plot.
CG = CumulativeGraph()
begin_time, end_time, data1 = make_time_data()
begin_time, end_time, data2 = make_time_data()
data = {'Team A': data1, 'Team B': data2}
metadata = {'title':'Some Cumulative Data', 'starttime':begin_time, 'endtime':end_time, 'span':span, 'is_cumulative':False }
filename = expand_string('$HOME/tmp/cumulative.png',os.environ)
file = open( filename, 'w' )
CG( data, file, metadata )
Note that we have reused make_time_data from above to generate random data again.
The following metadata entries are mandatory:
- starttime: The starting time of the data shown.
- endtime: The end time of the data.
- The axes will be trimmed to [starttime, endtime]. Make sure your desired data is within these points!
- span: The width of the time bins used.
- is_cumulative: Whether or not the data is already cumulative. If this is false, the data will be made cumulative by the object.
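To make the is_cumulative flag concrete, here is a sketch of turning per-bin data into running totals. It is our own illustration, not the CumulativeGraph code, and make_cumulative is a hypothetical helper. It assumes bins outside [starttime, endtime) are ignored rather than carried into the total, which may differ from what GraphTool does.

```python
def make_cumulative(data, starttime, endtime):
    """Running totals per pivot for {pivot: {timestamp: value}} data.

    ASSUMPTION: bins whose timestamp is outside [starttime, endtime) are
    ignored, so they neither appear in nor contribute to the totals.
    """
    result = {}
    for pivot, series in data.items():
        total = 0
        cumulative = {}
        for t in sorted(series):
            if starttime <= t < endtime:
                total += series[t]
                cumulative[t] = total
        result[pivot] = cumulative
    return result


raw = {'Team A': {0: 1, 3600: 2, 7200: 3}, 'Team B': {3600: 5}}
totals = make_cumulative(raw, 0, 7200)
```

Data prepared this way would be passed with 'is_cumulative': True, which by the description above tells the object not to sum it again.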

Scatter Plot
The ScatterPlot class is a tad more complex to use, but it is very powerful in that it allows you to specify an extra dimension of data. Each point in a series has an x and y coordinate and a size associated with it. The scatter plot, like the PieGraph, enforces a square axis in order to make things easier to read.
There are no special metadata attributes which go along with the scatter plot beyond the standard ones for pivot-group data.
Here is the scatter plot example code:
#!/usr/bin/env python
import os
import sys
sys.path.insert(0, os.path.expandvars('$GRAPHTOOL_ROOT/src'))
from graphtool.tools.common import expand_string
from graphtool.graphs.common_graphs import ScatterPlot
import numpy
numpts = 100
data = {}
for site in ['Site A', 'Site B']:
    data1 = numpy.random.normal(size=numpts, loc=1.0)
    data2 = numpy.random.normal(size=numpts, loc=1.0)
    data3 = numpy.random.random(numpts)
    data[site] = {}
    for i in range(numpts):
        data[site][data1[i], data2[i]] = .1*data3[i]
metadata = {'title':'Scatter Plot Example'}
file = open(expand_string('$HOME/tmp/scatter.png',os.environ),'w')
SP = ScatterPlot()
SP(data, file, metadata)
Notice that the grouping keys are 2-tuples; the first entry in each tuple is the x coordinate and the second is the y coordinate. The value is the size of the point.
Here is the corresponding graph:

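If numpy is not available, the same {site: {(x, y): size}} structure can be built with the standard library's random module. Here is a minimal sketch; make_scatter_data is a hypothetical helper that mirrors the example above, using random.gauss(1.0, 1.0) in place of numpy.random.normal(loc=1.0) (numpy's default scale is 1.0).

```python
import random


def make_scatter_data(sites, numpts, seed=None):
    """Build {site: {(x, y): size}} like the numpy-based example above."""
    rng = random.Random(seed)  # private generator so a seed gives repeatable data
    data = {}
    for site in sites:
        points = {}
        for _ in range(numpts):
            # gauss(mu, sigma) with the same loc=1.0 and default scale 1.0
            x = rng.gauss(1.0, 1.0)
            y = rng.gauss(1.0, 1.0)
            points[(x, y)] = 0.1 * rng.random()  # size in [0, 0.1)
        data[site] = points
    return data


data = make_scatter_data(['Site A', 'Site B'], 100, seed=42)
```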
Quality Map
The QualityMap class is a non-traditional graph, but has proved very helpful for displaying quality data for multiple sites versus time.
For example, we have this graph, used in CMS's PhEDEx application:
Each box represents an hour's worth of transfers. The closer a site's box is to green, the higher the success rate for that hour; red boxes represent hours with poor success rates.
Example Code
#!/usr/bin/env python
import time, os, random
from graphtool.graphs.common_graphs import QualityMap
from graphtool.tools.common import expand_string
span = 3600
# Generate our time series: one random quality value in [0, 1) per bin
def make_time_data( ):
    # int() because range() below needs integer arguments
    end_time = int(time.time()); end_time -= end_time % span
    begin_time = end_time - 24*span
    data = {}
    for i in range(begin_time, end_time, span):
        data[i] = random.random()
    return begin_time, end_time, data
QM = QualityMap()
# Data generation
full_data = {}
for i in range(10):
    team_name = 'Team %i' % i
    begin_time, end_time, data = make_time_data()
    full_data[team_name] = data
metadata = {'title':'Quality Plot w.r.t. Time', 'starttime':begin_time,
            'endtime':end_time, 'span':span }
filename = expand_string('$HOME/tmp/quality_map.png',os.environ)
file = open( filename, 'w' )
QM( full_data, file, metadata )
Notice that the quality map, like the stacked bar graph, takes a dictionary-of-dictionaries as input. The data values are expected to be numbers between 0.0 and 1.0. However, they may also be two-tuples giving the number of "successes" and the number of "failures". The two-tuple form will be more useful later on, as it produces a more usable web interface.
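Since the two-tuple form is the more useful one, here is a sketch of building it from raw transfer outcomes. It assumes each event is a (site, timestamp, succeeded) triple and bins events on the same span grid as the time examples; tally_transfers and success_fraction are hypothetical helpers, and success_fraction only shows which 0.0 to 1.0 value a tuple corresponds to.

```python
def tally_transfers(events, begin_time, span):
    """Fold (site, timestamp, succeeded) events into
    {site: {bin_start: (successes, failures)}}.

    Bins start at begin_time + i*span; events before begin_time are skipped.
    """
    data = {}
    for site, t, succeeded in events:
        if t < begin_time:
            continue
        start = begin_time + ((t - begin_time) // span) * span
        site_bins = data.setdefault(site, {})
        successes, failures = site_bins.get(start, (0, 0))
        if succeeded:
            successes += 1
        else:
            failures += 1
        site_bins[start] = (successes, failures)
    return data


def success_fraction(successes, failures):
    """The 0.0 to 1.0 quality a (successes, failures) pair stands for, or None if empty."""
    total = successes + failures
    if total == 0:
        return None
    return successes / float(total)


events = [('Team 0', 10, True), ('Team 0', 20, False), ('Team 0', 3700, True),
          ('Team 1', 5, False), ('Team 1', -1, True)]
tally = tally_transfers(events, 0, 3600)
```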
The following metadata entries are mandatory:
- starttime: The starting time of the data shown.
- endtime: The end time of the data.
- The axes will be trimmed to [starttime, endtime]. Make sure your desired data is within these points!
- span: The width of the time bins used, in seconds.
- color_override (optional): A dictionary of color overrides. If a data value is a key in the color_override dictionary, the corresponding color_override value is used as its color. This is useful if you want to assign special color values; for example, you can map -1 to the color "grey" if the value -1 has some special significance.
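The override rule described for color_override can be sketched as a simple lookup that falls back to a color scale. Only the lookup reflects the documented behaviour; the linear red-to-green fallback and the name pick_color are hypothetical stand-ins for whatever colormap QualityMap really uses, and the sketch handles scalar values only, not two-tuples.

```python
def pick_color(value, color_override=None):
    """Return the override color for special values, else a red-to-green hex.

    The red-to-green fallback is a HYPOTHETICAL stand-in for QualityMap's
    real colormap; only the override lookup mirrors the documented rule.
    """
    if color_override and value in color_override:
        return color_override[value]
    v = min(max(float(value), 0.0), 1.0)  # clamp to [0, 1]
    red = int(round(255 * (1.0 - v)))
    green = int(round(255 * v))
    return '#%02x%02x00' % (red, green)


metadata_extra = {'color_override': {-1: 'grey'}}
print(pick_color(-1, metadata_extra['color_override']))  # grey
print(pick_color(1.0))                                   # #00ff00
```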