
Setting up Alerts

by admin last modified 2006-05-23 13:01

A guide to setting up alerts with the MonaLisa Alerts system

Understanding MonaLisa Alerts

Before configuring your own alerts, it is important to understand how MonaLisa organizes its data.  Each parameter/value pair carries three additional pieces of information: farm, cluster, and node.  This allows data to be sorted by a hierarchy of farm/cluster/node/parameter.  The service host on which you run MonaLisa is the "farm", so if you look at some other farm, you are usually looking at data from another site.  The cluster and node values may correspond to a physical cluster and node (as in Ganglia information), or may not.  For example, at our site we keep track of the dCache system in its own cluster, and the nodes correspond to services provided by dCache.

Because the global MonaLisa system is so complex, it is not possible to give every parameter a unique name, which is why the storage hierarchy was introduced.  It is analogous to the reason you don't keep all the files on your computer in one directory: a single flat collection quickly becomes overwhelming.

The alerts system depends on the PipeResultWriter module of MonaLisa.  In order to get any data to the alerts system, we must first subscribe the PipeResultWriter to that data.  A subscription has the form:

farm/cluster/node/parameter

If a * is given for farm, cluster, node, or parameter, it is a wildcard, which will match anything in the appropriate category.  The subscriptions must be placed in the following file:

$MonaLisa_HOME/Service/usr_code/FilterExamples/PipeResultWriter/conf.properties 

They should be placed on the PREDICATES line, separated by semicolons as in the example below.  At our site, the PREDICATES line looks like this:

PREDICATES=red.unl.edu/*/*/Load5 ; red.unl.edu/Ping/*/LostPackages ; red.unl.edu/Auxiliary Services/*/* ; Nebraska-ML-Test/Ping/*/LostPackages ; Nebraska-ML-Test/MonaLisa/localhost/Load5

It is unfortunate, but for every new alert you add, you will probably have to add a new subscription to the PREDICATES line.  It is NOT recommended that you simply subscribe your system to all the data: doing so could cause thousands of messages per second to hit your system.
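
Before adding an alert, it can help to check whether its data would actually be covered by the current subscriptions. The following is a minimal Python sketch of the wildcard matching described above, not the PipeResultWriter's own code. The helper names (parse_predicates, is_subscribed) and the node name "node01" are hypothetical, and the semicolon separator is an assumption taken from the example PREDICATES line.

```python
# Sketch only: mimics the farm/cluster/node/parameter wildcard rule.
# Assumes subscriptions are separated by ';' as in the example line.

PREDICATES_LINE = (
    "PREDICATES=red.unl.edu/*/*/Load5 ; red.unl.edu/Ping/*/LostPackages ; "
    "red.unl.edu/Auxiliary Services/*/* ; Nebraska-ML-Test/Ping/*/LostPackages ; "
    "Nebraska-ML-Test/MonaLisa/localhost/Load5"
)


def parse_predicates(line):
    """Split a PREDICATES line into (farm, cluster, node, parameter) tuples."""
    _, _, value = line.partition("=")
    predicates = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        parts = tuple(part.strip() for part in entry.split("/"))
        if len(parts) != 4:
            raise ValueError(f"expected farm/cluster/node/parameter, got {entry!r}")
        predicates.append(parts)
    return predicates


def matches(predicate, farm, cluster, node, parameter):
    """A '*' field matches anything; any other field must match exactly."""
    data = (farm, cluster, node, parameter)
    return all(p == "*" or p == d for p, d in zip(predicate, data))


def is_subscribed(predicates, farm, cluster, node, parameter):
    """True if at least one subscription covers the given data point."""
    return any(matches(p, farm, cluster, node, parameter) for p in predicates)


predicates = parse_predicates(PREDICATES_LINE)
# "node01" is a made-up node name used only for illustration.
print(is_subscribed(predicates, "red.unl.edu", "Ping", "node01", "LostPackages"))
print(is_subscribed(predicates, "red.unl.edu", "Ping", "node01", "Load1"))
```

If the check reports that a data point is not covered, the alert for it will never see any data until a matching subscription is added.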

Alert.xml Configuration File

The configuration file for the alerts system is /etc/Alert.xml.  At a very minimum, it must contain an XML declaration and a "filters" tag:

<?xml version="1.0" encoding="UTF-8" ?>
<filters></filters>
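
Since a malformed file is easy to produce by hand, a quick well-formedness check before loading it may save some debugging. The sketch below uses Python's standard xml.etree.ElementTree parser and is independent of the alerts system itself. The function name check_alert_config is hypothetical, and the check only looks at the root tag and counts "alert" entries.

```python
import xml.etree.ElementTree as ET


def check_alert_config(xml_bytes):
    """Return the number of <alert> entries under <filters>.

    Raises ValueError if the text is not well-formed XML or if the
    root element is not <filters>.
    """
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        raise ValueError(f"not well-formed XML: {err}") from err
    if root.tag != "filters":
        raise ValueError(f"root element is <{root.tag}>, expected <filters>")
    return len(root.findall("alert"))


MINIMAL = b'<?xml version="1.0" encoding="UTF-8" ?>\n<filters></filters>'
print(check_alert_config(MINIMAL))
```

To check the real file, read it in binary mode and pass the bytes in, for example check_alert_config(open("/etc/Alert.xml", "rb").read()).
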
Starting from this empty file, we can add various alerts (along with other tags, as covered in other documentation pages).  Here is an example of a basic configuration file with one alert:
<?xml version="1.0" encoding="UTF-8" ?>
<filters>
  <alert name="Aux Service alert" farm="red.unl.edu" cluster="Auxiliary Services">
    <trigger>  $PARAM > 0 </trigger>
    <alertAction>
      <webalert name="add_alert_py">Auxiliary service $PARAMNAME down</webalert>
    </alertAction>
    <updateAlert> 
      <webalert name="add_alert_py">Auxiliary service $PARAMNAME down</webalert>
    </updateAlert>
    <endAlertAction>
      <webalert name="remove_alert_py"></webalert>
    </endAlertAction>
  </alert>
</filters>
Notice that we have one alert tag, and four tags inside that.  We have not specified either parameter or node in the example above, so a wildcard (*) is implied for both.  In general, alerts follow this format:
<alert name="Alert Name" farm="My Farm Name" cluster="Some Cluster" node="Some node">
  <param> My Parameter </param>
  <trigger> $PARAM > 7 </trigger> <!-- Can be any valid Python expression -->
  <alertAction>
    <!-- Fill with actions -->
  </alertAction>
  <updateAlert>
    <!-- Fill with actions -->
  </updateAlert>
  <endAlertAction>
    <!-- Do some action -->
  </endAlertAction>
</alert>
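
To try out a trigger expression before putting it in the file, something like the following Python sketch may be useful. This page does not describe how the alerts system evaluates triggers internally, so the approach here is an assumption: $PARAM is replaced by the latest value and the result is evaluated as a Python expression. The function name trigger_fires is hypothetical.

```python
from string import Template


def trigger_fires(trigger_text, value):
    """Evaluate a trigger such as '$PARAM > 7' for one numeric value.

    Assumption: $PARAM is substituted textually and the result is then
    evaluated as an ordinary Python expression.
    """
    # Parenthesize the value so negative numbers substitute cleanly.
    expression = Template(trigger_text.strip()).substitute(PARAM=f"({value!r})")
    # Empty builtins keep the sketch tidy; this is NOT a security sandbox.
    return bool(eval(expression, {"__builtins__": {}}, {}))


print(trigger_fires(" $PARAM > 7 ", 9.5))
print(trigger_fires(" $PARAM > 7 ", 3))
```

Only use this with trigger text you wrote yourself, since eval runs whatever expression it is given.
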
Note that in the trigger expression, I write '$PARAM > 7' instead of '7 < $PARAM', as a literal '<' causes an XML parsing error.  Once the configuration is loaded, if the trigger is ever true, all the actions listed under "alertAction" will be done.  Every new piece of data received will cause the "updateAlert" actions to be executed.  Finally, if the trigger has not been true for some period of time, all the "endAlertAction" entries are executed.  By default, an alert will be triggered no more than once an hour, and an alert ends after 20 minutes have passed since the last trigger.
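
The life cycle described above can be pictured as a small state machine. The Python sketch below is one possible reading of those rules, not the alerts system's actual implementation, and the class name AlertState is hypothetical. It makes three assumptions: the first true trigger runs only "alertAction", later data points run "updateAlert" while the alert is active, and the 20-minute timeout is checked only when new data arrives.

```python
ALERT_INTERVAL = 60 * 60  # seconds: at most one alertAction per hour
END_AFTER = 20 * 60       # seconds: end 20 minutes after the last true trigger


class AlertState:
    """Tracks one alert and reports which action group each data point causes."""

    def __init__(self):
        self.active = False
        self.last_alert = None  # time of the most recent alertAction
        self.last_true = None   # time the trigger was last true

    def on_data(self, now, trigger_true):
        """Return the action groups to run for a data point at time `now` (seconds)."""
        if trigger_true:
            self.last_true = now
            may_alert = (
                self.last_alert is None
                or now - self.last_alert >= ALERT_INTERVAL
            )
            if not self.active and may_alert:
                self.active = True
                self.last_alert = now
                return ["alertAction"]
        if self.active:
            if now - self.last_true >= END_AFTER:
                self.active = False
                return ["endAlertAction"]
            return ["updateAlert"]
        return []


state = AlertState()
for now, fired in [(0, True), (60, False), (120, True), (1320, False)]:
    print(now, state.on_data(now, fired))
```

Under this reading, a trigger that becomes true again shortly after an alert has ended is ignored until the hour has elapsed.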

The various possible actions are documented elsewhere.

