Personal tools
You are here: Home Documentation GraphTool GraphTool Advanced
Document Actions

GraphTool Advanced

by admin last modified 2008-02-06 16:15

This page covers some of the more advanced XML configurations. This allows the full chain (SQL DB -> python -> GraphTool API -> cherrypy -> webpage) to be worked.

Note:

This page is not done - but it's getting there!  If you want to learn how to setup a graphing interface, a good place to start is here and the examples/ directory of the GraphTool source code.


The point of this page is to explore some of the more advanced GraphTool functionality.  Other tutorials cover using the Python API to generate graphs.  This tutorial will cover the simple GraphTool XML language which allows you to specify the SQL queries and a few bits of metadata and come out with a working web interface to your database.

The Big Picture

Everything can be put together through a XML config file.  The files allow you to create object, import other XML files, and import python modules.  Here's a simple config file which does absolutely nothing except import a few other modules:

<graphtool-config>

  <import module="graphtool.web"> WebHost, StaticContent, HelloWorld </import>
  <import file="$CONFIG_ROOT/gratia_graphs.xml" />
  <import file="$CONFIG_ROOT/text_queries.xml" />

</graphtool-config>

The <import file="..."/> tag imports another XML config file; note that environmental variables are expanded.  The other import tag imports a python module.  Use a star (*) for everything or specify objects.  To create an object, use the <class/> tag:

  <class name="static" type="StaticContent">
    <directory name="content"> $GRAPHTOOL_ROOT/static_content </directory>
  </class>

Specify the name ("static") and the type ("StaticContent").  The namespace for graphtool is flat, so make sure that your object names are unique!

In order to make a web interface, you must have the following objects (we will cover each of these later):
  • WebHost object, which sets up the CherryPy webserver and mounts GraphTool objects onto web paths.  The WebHost object takes a separate CherryPy configuration file (see cherrypy.org for more info).
  • The XmlGenerator object which will take the results from the SQL query and output a webpage.
  • The ConnectionManager object which takes the connection parameters to query the database.
  • SqlQueries object, which allows you to specify the SQL query for your web application; note that these can inherit from other queries.  This way, we can have one generic query and not have to rewrite the SQL each time.
  • The Grapher object which builds and caches graphs from one or more SqlQueries objects.

ConnectionManager

The connection manager (graphtool.database.connection_manager) lets you specify one or more database connections.  For example, the below one specifies a MySQL database named OSG_020_0 and sets it as the default connection for this object.
<graphtool-config>

  <import module="graphtool.database.connection_manager"> ConnectionManager </import>

  <class name="ProdConnMan" type="ConnectionManager" default="OSG_020_0">

    <attribute name="default"> OSG_primary </attribute>

    <connection name="OSG_020_0">
      <attribute name="Interface"> MySQL </attribute>
      <attribute name="Database"> OSG_020_0 </attribute>
      <attribute name="Host"> localhost </attribute>
      <attribute name="Port"> 49152 </attribute>
      <attribute name="AuthDBUsername"> brian </attribute>
      <attribute name="AuthDBPassword"> ***** </attribute>
    </connection>

  </class>

</graphtool-config>
Note that there are usually passwords in this file, so make sure you keep it separate from the rest of your configuration and DO NOT COMMIT IT TO CVS!

SqlQueries

The SqlQueries object is the heart of the GraphTool configuration and requires the most customization to match your application.  It allows you to specify one or more queries; the queries can be abstract or inherit from another query.  Here is an example of an abstract query:
<graphtool-config>

  <import module="gratia.database.query_handler">*</import>
  <import module="graphtool.database.queries">SqlQueries</import>
  <import module="graphtool.database.query_handler">*</import>
  <import module="time"/>
  <import file="$HOME/DBParam.xml" />
  <import file="$CONFIG_ROOT/security.xml" />

  <class type="SqlQueries" name="GratiaGenericQuery">

    <attribute name="connection_manager"> GratiaConnMan </attribute>

    <aggregate>
      <connection> gratia </connection>
    </aggregate>

    <query name="simple_query">
      <inputs>
        <input name="span" type="int" kind="sql">3600</input>
        <input name="starttime" type="datetime" kind="sql">time.time()-2*86400</input>
        <input name="endtime" type="datetime" kind="sql">time.time()</input>
        <input name="facility" kind="sql"> .* </input>
        <input name="vo" kind="sql"> .* </input>
      </inputs>
      <sql>
        SELECT
          <slot name="group"/>,
          <slot name="column"/>
        FROM
          JobUsageSimple
        JOIN
          CEProbes ON JobUsageSimple.ProbeName = CEProbes.probename
        JOIN
          CETable ON CEProbes.facility_id = CETable.facility_id
        <slot name="JOIN"/>
        WHERE
          EndTime &gt;= :starttime AND
          EndTime &lt; :endtime AND
          CETable.facility_name regexp :facility AND
          ReportableVOName regexp :vo
          <slot name="where"/>
        GROUP BY
          <slot name="group"/>
        <slot name="having"/>
      </sql>
      <results module="graphtool.database.query_handler" function="simple_results_parser">
        <inputs>
          <input name="pivots"> 0 </input>
          <input name="results"> 1 </input>
        </inputs>
      </results>
      <attribute name="graph_type">GratiaPie</attribute>
      <attribute name="pivot_name" />
      <attribute name="title" />
      <attribute name="column_names" />
      <attribute name="column_units" />
    </query>

  </class>

</graphtool-config>
The main parts are:
  • The attribute tag specifies the connection_manager which connects the DB to a database.
  • The aggregate tag which will allow you aggregate the results of the same query from multiple database (in case if your data resides in multiple places).
  • The query tag which specifies a query.  Its parts are:
    • Inputs: The inputs tag specifies the inputs that the query receives; multiple children <input> can be specified.  The <input> takes the following attributes:
      • name: Name of input
      • type: type the input should be converted to.  If left blank, it is implicitly a string.  Other types include "timestamp" (integer representing a unix timestamp), "datetime" (python datetime object), "int" (integer), "float", and "eval" (which causes python to safely evaluate the input as an expression).
      • kind: If kind is set to "sql", the input is passed to the SQL query; otherwise, the input is just passed to the GraphTool framework.
      • The value of the input is the default.
    • SQL: Text of the SQL query.  If this query is meant to be abstract, you can specify multiple <slot> tags; these must be filled in by queries which inherit from this one.  This way, you can make the grouping expression and data column as a generic slot so you can reuse the same base query multiple times.
    • Results: The results tag determines how the SQL rows are processed into a python object.  The results tag has several portions:
      • module and function attributes: The "function" attribute specifies the function which will parse the SQL rows returned by the database and convert them into whatever format needed.  The "module" atttribute specifies the module where "function" can be found. 
        Parsing SQL into python data is an important concept in GraphTool, and it is important to understand the different possibilities:
        • module="graphtool.database.query_handler", function="results_parser": Produces data for a "pivot-group" graph (the StackedBarGraph, CumulativeGraph, QualityMap).  This produces a dictionary-of-dictionaries, where the first key which data is grouped by is the "pivot" and the second key that the data is grouped by is the "group".  For example, the CMS PhEDEx graphs are sorted first by "site" (the pivot) then by the "time bin" (the group).  Usually, the group is the time column.
        • module="graphtool.database.query_handler", function="cumulative_pivot_group_parser": Same as the results_parser function, except the cumulative parser will assume that the data is floating-point, grouping is floating-point, and turns the data into a cumulative sum.  This must be used for the the cumulative graph.
        • module="graphtool.database.query_handler", function="simple_results_parser": The simple_results_parser produces a python dictionary from SQL rows and only groups by the "pivot" column.
      • <inputs> tag; this is similar to the Inputs in the <query> tag.  It takes multiple <input> children, which are then passed on as keyword arguments to the results function.  Some of the most common inputs include:
        • pivots: Which column is the pivot column (integer between 0 and (number of columns-1).
        • grouping: Which column is the grouping column.
        • Results: Which column(s) are the results columns.  May be multiple comma-separated integers.
        • pivot_transform: A function which transforms the name of a pivot.  If the function returns None, then the data element will be removed.  The pivot_transform is very powerful if the pivot name needs to be changed in some way not easily done in SQL.  For example, when the CMS PhEDEx data transfer application wanted to evaluate how they were doing with non-regional transfers, the team implemented a new pivot_transform function which looked up if the transfer endpoints were in different regions; if they weren't, it returned None to remove the data.  The "regional information" was not kept in the database.
        • grouping_transform: A function which transforms the grouping.  If the function returns None, the data element will be removed.
        • The pivot_transform and grouping_transform take the pivot and grouping column(s) (respectively) as positional arguments and any keyword data (data passed to SQL or as HTTP arguments) as keyword arguments.  The number of positional arguments should be well-known to the application developer, but the keyword arguments which may be passed are arbitrary.
        • data_transform: A function like the pivot_transform and grouping_transform which alters the data.  Can be used to filter out unwanted data which is otherwise awkward to remove using SQL.
    • Attributes: Each query carries around multiple attributes which will serve as hints to later objects which use the results.  They may include:
      • graph_type: Type of graph.  There is a base set of graph types, but usually the web application mixes them together.
      • pivot_name: Name of the pivot.  Is used in the graph labels.
      • title: The title of the query.  Any variable values will be expanded.  So, if the graph requires the "vo" variable as an input, a possible title might be: "Jobs run by VO $vo".
      • column_name: Comma-separated list (for multiple results) of the column names.  Is used in the graph labels.
      • column_units: The units for the column names.

To see how a graph might inherit from the one above, here is a simple pie chart:
    <query name="facility_hours" base="GratiaGenericQuery.simple_query">
<inputs>
<input kind="sql" type="float" name="min_hours"> 10 </input>
</inputs>
<sql>
<filler name="group"> CETable.facility_name </filler>
<filler name="column"> sum(WallDuration)/3600 as WallHours </filler>
<filler name="having"> HAVING sum(WallDuration)/3600 >= :min_hours </filler>
</sql>
<attribute name="pivot_name"> Facility </attribute>
<attribute name="column_names"> Computation Time </attribute>
<attribute name="column_units"> Hours </attribute>
<attribute name="title"> Computational Hours by Site </attribute>
<attribute name="graph_type"> MyPie </attribute>
</query>
Note that the <slot> takes from above are filled with the <filler> tags in this query.  Also, regardless of the database binding module used, one may specify bind variables within the SQL query which are passed from the combined parent and child <input> tags (where kind="sql" is set) by prefixing the variable name with a ":".  This follows traditional Oracle syntax, but is supported by the GraphTool wrappers over SQLite, Postgres, and MySQL.

Grapher

The grapher simply takes the SqlQueries object and "prepares" it for being graphed on a webpage.  This will cache queries when possible and return either the graph as a PNG or returns the coordinates of the data in the graph (which is how the mouseover is created).  Here is an example grapher object:
<graphtool-config>

<import module="graphtool.graphs.graph"> Grapher </import>
<import file="$CONFIG_ROOT/gridscan_queries.xml"/>

  <class type="Grapher" name="gridscan_grapher">
    <attribute name="display_name"> GridScan Graphs </attribute>
    <queryobj> GridScanQueries </queryobj>
  </class>
</graphtool-config>
Notice you name at least one query object (SqlQueries type) as well as specifying a display_name attribute which will be a hint for creating the web page.

XmlGenerator

The XmlGenerator object takes a SqlQueries object and saves it in an XML form; a XSLT transform comes with GraphTool which converts this XML into HTML on the client side.  Here is an example XmlGenerator object:
<graphtool-config>

  <import file="$CONFIG_ROOT/gridscan_queries.xml" />
  <import module="graphtool.tools.query_output">XmlGenerator</import>

  <class type="XmlGenerator" name="query_xml">
    <queryobj> GridScanQueries </queryobj>
  </class>
</graphtool-config>

WebHost

The WebHost mounts a GraphTool object onto a CherryPy web server.
<graphtool-config>

  <import module="graphtool.web"> WebHost, StaticContent </import>
  <import file="$CONFIG_ROOT/gratia_graphs.xml" />
  <import file="$CONFIG_ROOT/text_queries.xml" />

  <class name="static" type="StaticContent">
    <directory name="content"> $GRAPHTOOL_ROOT/static_content </directory>
  </class>

  <class name="web" type="WebHost">
    <mount location="/gratia/gridscan_graphs" content="image/png"> <instance name="gridscan_grapher" /> </mount>
    <mount location="/gratia/xml" content="text/xml"> <instance name="query_xml" /> </mount>
    <instance name="static" location="/gratia/static" />
    <config>$CONFIG_ROOT/prod.conf</config>
  </class>

</graphtool-config>
The WebHost class takes multiple <mount> and <instance> tags as well as one <config> tag.  The mount tag needs a location attribute and (if the content-type should be hardcoded) a content for the HTTP content type header.  The GraphTool object which will be mounted is specified as the <instance> child of the <mount> tag.  GraphTool objects cannot be mounted directly!  If a object is a CherryPy object, then it can be mounted as an <instance> tag directly.  The <config> tag allows you to pass a CherryPy config file.

We recommend to always include the "static" object as above; this way, the XML output from the XmlGenerator can find the corresponding XSLT transform, CSS, and Javascript files.  The StaticObject allows you to export all the files in a set directory to the web.

Code for the Webapp

The web application often needs some additional code, which is usually put in the src/ directory.  If nothing else, it's often useful to mix together some of the standard graphs together.  For example, any graph which comes from a SQL DB should use the DBGraph mixin.  Any bar graph using data from a time series ought to use the TimeGraph mixin.  For those who just want to use the standard examples, the following code snippet defines some common graphs.  Save it in src/mygraphs.py and make sure that the source directory is on your PYTHONPATH and mygraphs is imported in your XML files.  Here is the code listing:
from graphtool.graphs.graph import DBGraph, TimeGraph, \
PivotGroupGraph, PivotGraph
from graphtool.graphs.common_graphs import StackedBarGraph, BarGraph, \
CumulativeGraph, PieGraph, QualityMap
import types

from graphtool.graphs.graph import prefs

# Uncomment this to set your own custom watermark instead of the CMS one!
#prefs['watermark'] = '$CONFIG_ROOT/osg_logo_4c_white.png'
# Or set it to "False" to turn off the watermark completely.
prefs['watermark'] = 'False'

class MyStackedBar(PivotGroupGraph, TimeGraph, StackedBarGraph):
pass

class MyBar(TimeGraph, BarGraph):
pass

class MyCumulative(PivotGroupGraph, CumulativeGraph):
pass

class MyPie(PivotGraph, TimeGraph, PieGraph):
pass

Make sure to set the graph_type attribute of any query to be one of your custom graphs!

Putting it all together

We have a few more pieces left.  They are:
  • Filesystem layout.
  • Setting up the environment and dependencies.
  • Launching application server.

Filesystem Layout

This is, of course, a suggested filesystem layout.  Feel free to make your own decisions!  In parenthesis, we suggest an environmental variable to point to this directory.
  • GraphTool/ directory ($GRAPHTOOL_ROOT)
  • Top level directory for application ($GRAPHTOOL_USER_ROOT)
    • setup.sh: setup file to configure the environment; see next subsection.
    • config/: XML configuration files ($CONFIG_ROOT)
      • website.xml: Production website configuration (contains the WebHost configuration)
      • website-devel.xml: Development website configuration (contains an alternate WebHost configuration)
      • prod.conf / development.conf:  CherryPy config files for the production and development server, respectively.
      • generic_queries.xml: Abstract queries for the application.
      • <appname>_queries.xml: Concrete queries for the application; I usually spread these out through multiple files, according to the graph type.
      • <appname>_graphs.xml: Grapher objects.  For smaller applications, this can be folded into another file.
      • text_queries.xml: XmlGenerator configuration.  Probably could be folded into the above file.
    • tools/: Small scripts to start application server.  (add to $PATH)
    • src/: Python source code directory for this application (add to $PYTHONPATH)


Setting up the environment

Usually the environment setup is put into the setup.sh, which is put into the top level directory for the application.  Use this to set up the GraphTool environmental variables laid out above and to set up the python dependencies, as necessary.  Here is a sample environment set up for an application called gratia:
#!/bin/bash

export GRAPHTOOL_ROOT=~/projects/GraphTool
export GRAPHTOOL_USER_ROOT=~/projects/GraphUsers/gratia
export CONFIG_ROOT=$GRAPHTOOL_USER_ROOT/config

export PYTHONPATH=$GRAPHTOOL_USER_ROOT/src:$GRAPHTOOL_ROOT/src:$PYTHONPATH
export PATH=$GRAPHTOOL_USER_ROOT/tools:$PATH


Launching the application server

The application web server can be started with a small code snippet.  One must first load the XML config file for the webapp, then start the CherryPy server:
#!/usr/bin/env python

from graphtool.web import WebHost
import cherrypy

if __name__ == '__main__':
  WebHost( file='$CONFIG_ROOT/website.xml' )
  cherrypy.server.quickstart()
  cherrypy.engine.start()

Create the above file, set it as executable and run it.  It will start up the webapp as a test.  Make sure to redirect the output and "nohup" the process to put it in the background.  An example for starting the gratia webapp is:
nohup ./tools/gratia-webapp 2>&1 > output.log &


Powered by Plone, the Open Source Content Management System