Setting up Alerts
A guide to setting up alerts with the MonaLisa Alerts system
Understanding MonaLisa Alerts
Before configuring your own alerts, it is important to understand how MonaLisa organizes data. Each parameter/value pair carries three additional pieces of information: farm, cluster, and node. This allows data to be sorted by the hierarchy farm/cluster/node/parameter. The service host on which you run MonaLisa is the "farm", so if you look at some other farm, you are usually looking at data from another site. The cluster and node values may correspond to a physical cluster and node (as in Ganglia information), or may not. For example, at our site we keep track of the dCache system in its own cluster, and the nodes correspond to services provided by dCache.
Because the global MonaLisa system is so complex, it is not possible to give every parameter a unique name, which is why the storage hierarchy exists. It is analogous to the reason why you don't keep all the files on your computer in one directory - a single flat collection of names would be overwhelming.
The alerts system depends on the PipeResultWriter module of MonaLisa. In order to get any data to the alerts system, we must first subscribe the PipeResultWriter to it. A subscription has the form:
farm/cluster/node/parameter
If a * is given for farm, cluster, node, or parameter, it is a wildcard, which will match anything in the appropriate category. The subscriptions must be placed in the following file:
$MonaLisa_HOME/Service/usr_code/FilterExamples/PipeResultWriter/conf.properties
They should be placed on the PREDICATES line, separated by semicolons as in the example below. At our site, the PREDICATES line looks like this:
PREDICATES=red.unl.edu/*/*/Load5 ; red.unl.edu/Ping/*/LostPackages ; red.unl.edu/Auxiliary Services/*/* ; Nebraska-ML-Test/Ping/*/LostPackages ; Nebraska-ML-Test/MonaLisa/localhost/Load5
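To make the wildcard rule concrete, here is a minimal Python sketch of how a farm/cluster/node/parameter subscription could be matched against incoming data. It is only an illustration, not the PipeResultWriter's actual code: the function names are hypothetical, the `;` separator is taken from the example line above, and the node and parameter names in the sample lookups ("srm", "status", "node01", "Load1") are made up.

```python
def parse_predicates(line):
    """Split a PREDICATES line into (farm, cluster, node, parameter) tuples."""
    _, _, value = line.partition("=")
    predicates = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Names may contain spaces (e.g. "Auxiliary Services"), so only
        # the slash is treated as a field separator.
        parts = [p.strip() for p in item.split("/")]
        if len(parts) != 4:
            raise ValueError(f"expected farm/cluster/node/parameter, got {item!r}")
        predicates.append(tuple(parts))
    return predicates


def matches(predicate, key):
    """True if each predicate field is '*' or equals the matching key field."""
    return all(p == "*" or p == k for p, k in zip(predicate, key))


def is_subscribed(predicates, key):
    """True if at least one subscription covers the given data key."""
    return any(matches(p, key) for p in predicates)


LINE = ("PREDICATES=red.unl.edu/*/*/Load5 ; red.unl.edu/Ping/*/LostPackages ; "
        "red.unl.edu/Auxiliary Services/*/*")
preds = parse_predicates(LINE)

# Hypothetical data keys, used only to exercise the matcher.
print(is_subscribed(preds, ("red.unl.edu", "Auxiliary Services", "srm", "status")))
print(is_subscribed(preds, ("red.unl.edu", "Ping", "node01", "Load1")))
```

The first lookup is covered by the "Auxiliary Services" subscription; the second is not covered by any of the three, which is the situation where a new subscription has to be added before an alert can see the data.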
It is unfortunate, but for every new alert you add, you will probably have to add a new subscription to the PREDICATES line. It is NOT recommended that you simply subscribe your system to all the data - you could be causing thousands of messages a second to hit your system.
Alert.xml Configuration File
The configuration file for the alerts system is /etc/Alert.xml. At a minimum, it must contain an XML header and a "filters" tag:
<?xml version="1.0" encoding="UTF-8" ?>
<filters></filters>
Starting from this empty file, we can add various alerts (along with other tags, as covered in other documentation pages). Here is an example of a basic configuration file with one alert:
<?xml version="1.0" encoding="UTF-8" ?>
<filters>
  <alert name="Aux Service alert" farm="red.unl.edu" cluster="Auxiliary Services">
    <trigger> $PARAM > 0 </trigger>
    <alertAction>
      <webalert name="add_alert_py">Auxiliary service $PARAMNAME down</webalert>
    </alertAction>
    <updateAlert>
      <webalert name="add_alert_py">Auxiliary service $PARAMNAME down</webalert>
    </updateAlert>
    <endAlertAction>
      <webalert name="remove_alert_py"></webalert>
    </endAlertAction>
  </alert>
</filters>
Notice that we have one alert tag, and four tags inside that. We have not specified either parameter or node in the example above - a wildcard (*) subscription is implicit. In general, alerts follow this format:
<alert name="Alert Name" farm="My Farm Name" cluster="Some Cluster" node="Some node">
  <param> My Parameter </param>
  <trigger> $PARAM > 7 </trigger> <!-- Can be any valid Python expression -->
  <alertAction>
    <!-- Fill with actions -->
  </alertAction>
  <updateAlert>
    <!-- Fill with actions -->
  </updateAlert>
  <endAlertAction>
    <!-- Do some action -->
  </endAlertAction>
</alert>
Note that in the trigger expression, I write '$PARAM > 7' rather than '7 < $PARAM', because an unescaped '<' causes an XML parsing error. Once the configuration is loaded, if the trigger is ever true, all the actions listed under "alertAction" will be done. Every new piece of data received will cause the "updateAlert" actions to be executed. Finally, if the trigger has not been true for some period of time, all the "endAlertAction" entries are executed. By default, an alert will be triggered no more than once an hour, and an alert ends once 20 minutes have passed since the last trigger.
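The alert lifecycle described above (alertAction, then updateAlert, then endAlertAction) can be pictured as a small state machine. The Python sketch below is one possible reading of that description, not the alerts system's implementation. It assumes that "updateAlert" runs only for data arriving while an alert is active, that the once-an-hour limit suppresses a new "alertAction" within an hour of the previous one, and that the 20-minute timeout is only checked when new data arrives. The class name and fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MIN_RETRIGGER_S = 60 * 60  # default: trigger no more than once an hour
END_AFTER_S = 20 * 60      # default: end 20 minutes after the last trigger


@dataclass
class AlertState:
    trigger: Callable[[float], bool]    # stands in for the <trigger> expression
    active: bool = False
    last_alert: Optional[float] = None  # time alertAction last ran
    last_true: Optional[float] = None   # time the trigger was last true
    log: List[str] = field(default_factory=list)

    def on_data(self, now: float, value: float) -> None:
        """Process one data point: timestamp in seconds and parameter value."""
        if self.trigger(value):
            self.last_true = now
            if not self.active:
                too_soon = (self.last_alert is not None
                            and now - self.last_alert < MIN_RETRIGGER_S)
                if not too_soon:
                    self.active = True
                    self.last_alert = now
                    self.log.append("alertAction")
                    return
        if self.active:
            # last_true is always set before an alert becomes active.
            if now - self.last_true >= END_AFTER_S:
                self.active = False
                self.log.append("endAlertAction")
            else:
                self.log.append("updateAlert")


state = AlertState(trigger=lambda v: v > 7)  # same condition as "$PARAM > 7"
for t, v in [(0, 9), (60, 10), (600, 3), (1300, 2), (1400, 9), (3700, 9)]:
    state.on_data(t, v)
print(state.log)
```

In this run the value at t=1400 satisfies the trigger but does not start a new alert, because less than an hour has passed since the first one. The value at t=3700 does.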
The various possible actions are documented elsewhere.
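Because a stray '<' in a trigger is enough to break parsing, it can help to check that a configuration is well-formed XML before loading it. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. It checks only well-formedness and the "filters" root element, not whether the alerts system will accept the contents. The function name and the sample alert are made up for illustration.

```python
import xml.etree.ElementTree as ET

GOOD = b"""<?xml version="1.0" encoding="UTF-8" ?>
<filters>
  <alert name="Example" farm="red.unl.edu" cluster="Auxiliary Services">
    <trigger> $PARAM > 0 </trigger>
  </alert>
</filters>
"""

# Same file, but with the comparison written using a bare '<'.
BAD = GOOD.replace(b"$PARAM > 0", b"0 < $PARAM")


def check_alert_config(data):
    """Return (ok, detail): alert names on success, an error message on failure."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return False, str(exc)
    if root.tag != "filters":
        return False, f"root element is <{root.tag}>, expected <filters>"
    return True, [alert.get("name") for alert in root.iter("alert")]


print(check_alert_config(GOOD))
print(check_alert_config(BAD)[0])
```

To check a real file, read it in binary mode (for example `open("/etc/Alert.xml", "rb").read()`) and pass the result to the same function.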