23 may 2014 the old monitoring system "nagios" was put to sleep, and "circonus" was given production status.
The new monitoring system is sponsored by circonus and most of the monitoring as well as the central database runs on www.circonus.com. The infrastructure team have built and deployed logic around the standard circonus system:
- A private broker, to monitor internal services without exposing them on internet
- A dedicated broker (inhouse development) that monitor special ASF systems (like svn compare US - EU)
- A configuration system, that are based on svn.
- A new status page status.apache.org
- A new team structure (all committers with sudo karma on a vm, get an email when something happens with the vm)
The new system is a lot faster and we can therefore offer projects monitoring of project URLs, of course the project also need to have a team that handles the alerts.
The current version has approx. the same facilities as Nagios, but we are planning (and actively programming) a version.2 that will allow us to better predict problems before they occur.
Some of the upcoming features are:
- disk monitoring
- vital data statistic from core system (like size of mail queues)
The change of monitoring system is a vital component in our transition to automate services and thereby enable infra to more effectively secure the stability of the infrastructure as well as make early detection of potential problems.
The system was presented in Apachecon denver 2014, slides can be found here. We hope to present the live version at apachecon budapest 2014.
On behalf of the infrastructure team