Monitoring
A central repository for getting the most out of MonaLisa
MonaLisa is a wonderful infrastructure to build a site's monitoring system upon. However, out of the box, many people find this complex system bewildering. The point of this website is to make getting the most out of MonaLisa a much more pleasant experience.
The target audience for this webpage is an OSG or CMS system administrator. It will be assumed throughout that you already have a basic MonaLisa service up and working.
Phedex Monitoring:
Some Phedex monitoring can be done easily through MonaLisa; no effort on your part is required. Currently, the monitoring is available here.
Recipes:
Adding a reliable module may take a bit of Java coding, but setting up some ad-hoc monitoring on your own can be pretty easy! The recipes below get useful information into the system. The Java client can then be used to view the results and get the information back out.
- mlmetric.py: A simple script used in all our recipes.
- Enabling ApMon in your service
- dCache space information
- dCache GridFtp door monitoring; will be updated later
- Simple one-liners
- Phedex monitoring; will be filled in after the SC4 release.
- Setting up ping tests
- Multiple Ganglia clusters
MonaLisa Alerts System
Here at Nebraska, we have put together an XML-based alerts system which uses MonaLisa as a backend. Adding or removing alerts is as simple as editing one XML file. It is documented here.
- Setting up and installing XmlFilter.
- Setting up alerts.
- Adding new filters.
- Adding heartbeat triggers.
- Possible actions.
- Loading custom modules.
- UNL's site configuration
Desired Services:
Information services we are interested in seeing in the short term, and some comments on them:
- Phedex transfers (monitor successful transfers, not just network flow).
  - Phedex people have been resistant to integrating MonaLisa into the official scripts. However, this can be done at sites which are interested in seeing the information.
  - They have made available the monitoring data collected in their databases as a CSV-formatted file which can be fed into MonaLisa. I'm waiting for the SC4 release, since the old release only has a life of about 2 weeks and the file formats differ.
- Running CRAB jobs (ML monitoring is enabled, but not collected on the OSG).
  - We can see the CRAB jobs in the ARDA group on MonaLisa. The work just needs to be done to separate the local site data out of the global data.
- Available disk space on various NFS servers.
  - This can be done using Ganglia or MLMetric; doing it through Ganglia requires patching 2 lines of ML source code.
- "Problematic servers": I want to see load graphs for servers with load > 5.0.
  - This could be done with a repository. Example graphs are on this website.
- Batch slots free for PBS.
  - Would have to either fix up the existing module or write a new one. We should coordinate with Iosif on this. Because each site operates differently, we probably want them to provide a script which feeds the correct information into MonaLisa.
- Application-level connectivity (ping/ssh) to various servers.
  - This is tied in to the alerts system for our website.
  - Perhaps ssh connectivity could be done with simple changes to monPing? We'll have to investigate.
- Online/offline status for auxiliary services (myproxy, website, slapd, GUMS, Phedex agents, etc.).
  - Each one of these could simply be done with a cronjob and mlmetric. Some examples of monitoring auxiliary services can be found in the "recipes" section above.
  - We additionally monitor GridCat locally, and it is tied into our alerts system.
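To make the "cronjob and mlmetric" idea a little more concrete, here is a minimal sketch of the collection half only: a script that cron could run every few minutes to work out an up/down value for a few services and the free space on one filesystem. Everything named in it is an assumption for illustration: the helper names tcp_service_up and disk_free_gb, the metric names, the hosts, ports and mount point are all hypothetical. How the values are handed to mlmetric.py is deliberately left out, since that interface is covered in the recipes rather than here.

```python
import shutil
import socket


def tcp_service_up(host, port, timeout=5.0):
    """Return 1.0 if a TCP connection to host:port succeeds, else 0.0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1.0
    except OSError:
        # Covers refused connections, timeouts and name-lookup failures.
        return 0.0


def disk_free_gb(path):
    """Return the free space, in GB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    # Hypothetical hosts, ports and mount point; replace with your own.
    services = {
        "website_up": ("localhost", 80),
        "slapd_up": ("localhost", 389),
        "myproxy_up": ("localhost", 7512),
    }
    metrics = {name: tcp_service_up(host, port)
               for name, (host, port) in services.items()}
    metrics["root_free_gb"] = disk_free_gb("/")

    # One "name value" pair per line; forwarding these on to MonaLisa
    # (for example through mlmetric.py) is not shown here.
    for name, value in sorted(metrics.items()):
        print(f"{name} {value:.2f}")
```

Note that a successful TCP connect only shows that something is listening on the port, not that the service behind it is healthy; a check that actually speaks the service's protocol would be a stronger test. For NFS servers, the free-space call would need to be pointed at wherever the export is mounted on the machine running the script.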