
discuss - Re: [opennic-discuss] OpenNIC Infrastructure Monitoring

discuss AT lists.opennicproject.org

Subject: Discuss mailing list




  • From: Falk Husemann <josen AT paketsequenz.de>
  • To: <discuss AT lists.opennicproject.org>
  • Subject: Re: [opennic-discuss] OpenNIC Infrastructure Monitoring
  • Date: Sat, 19 May 2012 22:51:40 +0200

Moin Niels,

This and some other (from my view significant) points are the reason
why I would recommend Nagios as an availability monitor and management
system.

This possible scalability problem is not inherent to Smokeping, it's
inherent to the person setting it up using "dig" ;) There's a Perl module
to do it way faster, but it has to be modified to include/link to the
T2 checks from t2log to be valuable. Also keep in mind, I just want to
get a feel for what is a useful metric and what can be collected without
disturbing the T1/T2 admins.
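
To illustrate why spawning "dig" per probe is the slow part: a minimal
sketch, in Python rather than the Perl module mentioned above, that
builds one DNS query by hand and times the round trip in-process. The
function names, the fixed query ID and the timeout are my own
assumptions for illustration; none of this is taken from Smokeping or
t2log.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (class IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server, name, timeout=2.0):
    """Return the round-trip time in seconds, or None on timeout/error.

    Simplification: the reply is not parsed or matched against the
    query ID, so this only measures that *something* came back.
    """
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server, 53))
            sock.recvfrom(512)
        except OSError:
            return None
        return time.monotonic() - start
```

Calling time_query() with a server address and a name to resolve gives
one latency sample without starting a new process for every probe.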


Besides offering proven high scalability by design, it offers
availability management in addition to monitoring, i.e. escalation
plans, contacting the responsible admins / techs and real service- or
product-based availability monitoring. It also offers an information
base for users on different levels about what is (not) working, when
and why, planned downtimes, repairs etc.

To make it short: It's a beast and can do everything on multiple
layers (strategic, tactical and operational management), given enough
admin time. It's really not a bad solution, but...


Especially for admins, who are usually very busy, it is important that
a monitoring solution only contacts them if there is a really urgent
situation - Nagios can intelligently "filter" short network outages
or network failures that are not important enough to alarm anyone.
The Nagios admin can e.g. give each T1 (and possibly T2) admin his
own account on Nagios, showing him just the resources that concern him.
Last but not least - there are nice Android and other clients
available to stay informed on the go.

There's even an iPhone app (TouchMon) for it and you can get push
notifications instantly (but I don't possess a mobile^^)...


Nagios is very professional software, yet small and fast enough
to monitor the whole of OpenNIC down to the T2s on a small virtual
machine or on a multi-purpose server alongside other stuff - or even
alongside other networks to monitor. Users can see if someone got
informed, has acked or is still working on a machine / service and/or
if it is a planned downtime etc. It may be useful to integrate
important single points of failure in the network, like routers, into
the topology too.

Yes, it's certainly very professional...


Maybe it is worth taking a look at Nagios for you / OpenNIC.

We have one here at my local cs admin group.


There are several nice interfaces available to turn monitoring
data from Nagios into maps, graphs etc.

Which have to be piggy backed, because rrd is nothing Nagios Core
can do by default, IIRC.


There are even modules / extensions to get some nice performance
charts out (which is usually secondary in availability monitoring).

With the right config it is typically easy to write some automation
scripts, e.g. to manage the Nagios configuration for special OpenNIC
purposes (e.g. self-registration or importing the T1 / T2 lists).

That's what I have in mind and have done for Smokeping so far. It
shouldn't be too hard to output Nagios config files. If you're
interested, post some :)
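
Roughly what I mean by outputting Nagios config files: a minimal
sketch, assuming the server list is already available as (name,
address) pairs. The function names, the sample hosts and the
"generic-host" template name are placeholders of mine; use whatever
template your own Nagios setup defines.

```python
def nagios_host(name, address, template="generic-host"):
    """Render one Nagios host object definition as text."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def nagios_config(servers):
    """Render host definitions for an iterable of (name, address) pairs."""
    return "\n".join(nagios_host(name, address) for name, address in servers)


if __name__ == "__main__":
    # Hypothetical entries; the real list would come from the T1 / T2 lists.
    servers = [
        ("ns1.example", "192.0.2.1"),
        ("ns2.example", "192.0.2.2"),
    ]
    print(nagios_config(servers))
```

The output can be written to a file that the main Nagios config pulls
in; service checks would need their own "define service" blocks.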


If someone needs some help setting up Nagios, I'm open to getting
involved and helping here too.

Certainly! Contact me off-list or drop me a query on IRC.



I don't know, honestly, if Nagios (aka the all-singing, all-dancing)
is the right tool for what I'm trying to measure. I'm just after a rough
insight into general availability and security using (don't laugh)
relative point-scale metrics.

The reasoning behind this is KISS.

So for starters (or newbies like me) I'd just like to come up with a
bottom-up solution. The generated RRDs certainly could be integrated
into a bigger monitoring installation, but I don't have the spare time
for that. Maybe you do?

Currently I'm trying to hack some rough scripts together to get a
better feeling for how the parts here work together, nothing more. If
you like, add some items to the AuditinWG wiki page :)

Greets,
Falk


