discuss - [opennic-discuss] Policy proposal for removal of non-responding T2 servers

  • From: Jeff Taylor <shdwdrgn AT sourpuss.net>
  • To: OpenNIC discussion <discuss AT lists.opennicproject.org>
  • Subject: [opennic-discuss] Policy proposal for removal of non-responding T2 servers
  • Date: Tue, 14 Aug 2012 22:16:02 -0600

It seems that occasionally we get new folks who sign up to run a public
server, then fall short on the commitment of actually running the
service. There is such a case right now where someone created a new
server which went offline again shortly afterwards. There was no
notification of trouble on the mailing lists, and the person has failed
to respond to personal emails.

In the past, one of us has simply removed such servers from the listing
after some arbitrary period, once we finally noticed the outage.
However, with all the other automated tools we have been developing, it
seemed appropriate to create an official policy regarding the forced
removal of these servers. When someone checks the wiki pages to choose
which public servers they wish to use, we want those choices to reflect
servers that are generally reliable.

There are three situations that should be addressed. The first is when
the server simply stops responding to DNS queries, and appears to have
gone offline completely. Of course not everyone monitors their servers
on a daily basis, and there could be situations such as when the admin
goes on vacation. There could also be issues with their internet
provider. In most cases, I would expect the admin to at least notify
the mailing list if there is an extended problem that they are trying to
fix. However, when there are no notifications and the admin cannot be
contacted, I would like to propose that their server be automatically
removed after 14 days.

The second situation is when the server is online, but failing some of
the zone tests. Again there are a lot of factors to consider, but the
concern is that their server is not responding reliably to all queries,
and users of that server will not be able to reach all OpenNIC domains.
This situation has more pitfalls, but in the end it comes down to making
sure the users get the answers that they expect. So I would also
propose a 14-day grace period for this situation.
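
To make the 14-day rule for the first two situations concrete, here is a
rough sketch of how the check might look. Nothing like this exists yet;
the function name, the failing_since value and the admin_notified flag
are all hypothetical, and I am assuming that "after 14 days" means 14 or
more days of continuous failure, and that a server whose admin has been
in touch is left for manual review.

```python
from datetime import date, timedelta

# Proposed grace period for both the "offline" and "failing zone tests" cases.
GRACE_PERIOD = timedelta(days=14)


def should_remove(failing_since, today, admin_notified=False):
    """Return True once a server has failed continuously past the grace period.

    failing_since  -- date of the first failure in the current unbroken run
                      of failures, or None if the server is passing now
                      (hypothetical field, not part of any existing tool).
    admin_notified -- True if the admin has posted to the list or answered
                      email; assumption: such servers are never auto-removed.
    """
    if failing_since is None or admin_notified:
        return False
    return today - failing_since >= GRACE_PERIOD
```

The same function would serve both situations; only the test that sets
failing_since differs (no DNS response at all versus one or more failed
zone tests).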

The third situation is overall reliability. If a server only answered
queries 50% of the time, you wouldn't want to use it. Because we are
recording test results for each server, we can create a historical
profile to rate the reliability. I think the easiest way to score a
server would be to check the percentage of passes over the last X days.
So how many days do we want to look at, and at what percentage do we
consider the server unreliable? As a starting point (and because
conflicting rules between the first and second situations would make
programming tricky), I am going to suggest removal if a server's pass
rate drops below 66.7% over the last 60 days. That is an extremely
lenient threshold, but it would actually remove at least seven tier-2
servers immediately.
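
As an illustration of the scoring idea, the percentage of passes over
the last X days could be computed along these lines. This is only a
sketch: I am assuming the recorded test results can be read as
(date, passed) pairs, which may not match how they are actually stored,
and the names below are made up for the example.

```python
from datetime import date, timedelta

WINDOW_DAYS = 60        # suggested look-back period
MIN_PASS_RATE = 0.667   # suggested threshold (66.7%)


def pass_rate(results, today, window_days=WINDOW_DAYS):
    """Fraction of passed tests within the last window_days days.

    results -- iterable of (date, passed) pairs; this layout is an
               assumption about how test results are recorded.
    Returns None when there are no results inside the window.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [passed for day, passed in results if cutoff < day <= today]
    if not recent:
        return None
    return sum(recent) / len(recent)


def is_unreliable(results, today):
    """True if the server's pass rate in the window is below the threshold."""
    rate = pass_rate(results, today)
    return rate is not None and rate < MIN_PASS_RATE
```

A server with no results in the window is not flagged here, on the
assumption that a newly listed server should not be removed before it
has any history.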

There is currently no code in place for automated removal of dead
servers, but if we can create a policy for their removal, it would
provide a guideline for the admins to manually trim the list down.
