discuss - Re: [opennic-discuss] Policy proposal for removal of non-responding T2 servers

Subject: Discuss mailing list

  • From: Peter Green <peter AT>
  • To: discuss AT
  • Subject: Re: [opennic-discuss] Policy proposal for removal of non-responding T2 servers
  • Date: Wed, 10 Oct 2012 19:09:55 +0100

I agree: to be seen as a serious organization, OpenNIC needs to offer
serious reliability.

Whilst I admire volunteers' efforts, I suggest not volunteering if
you can't meet the mark.


On 10/10/12 19:01, Jamyn Shanley wrote:
> Given how critical DNS is to both the end-user experience and
> general net functionality, I don't understand why non-responsive
> servers aren't removed from the zonefiles within 15 minutes of a
> problem. There's no reason why they couldn't be put back in
> rotation within an hour or two of being 100% functional again, but
> I gotta say if my local ISP had a policy that allowed them 7 days
> to get one of their DNS servers fixed (and also left the
> problematic server listed on their website/documentation) I'd be
> ... disappointed in their professionalism.
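Jamyn's suggestion (out of rotation within 15 minutes of a problem, back within an hour or two of being healthy) can be sketched as a pair of consecutive-check counters. This is a minimal illustration, not OpenNIC's code: it assumes a hypothetical check every 5 minutes (the thread gives no polling interval), so 15 minutes becomes 3 consecutive failures and 2 hours becomes 24 consecutive passes. `ServerState` and `record_check` are made-up names.

```python
from dataclasses import dataclass

# ASSUMPTION: hypothetical polling interval; the thread does not say how often servers are checked.
POLL_MINUTES = 5
# Jamyn's suggested windows, converted to consecutive check counts.
FAIL_CHECKS_TO_REMOVE = 15 // POLL_MINUTES          # 15 minutes -> 3 checks
PASS_CHECKS_TO_RESTORE = (2 * 60) // POLL_MINUTES   # 2 hours -> 24 checks


@dataclass
class ServerState:
    """Hypothetical per-server record of zonefile membership and current streaks."""
    in_rotation: bool = True
    consecutive_failures: int = 0
    consecutive_passes: int = 0


def record_check(state: ServerState, passed: bool) -> ServerState:
    """Update the streak counters after one check; flip rotation status once a streak is long enough."""
    if passed:
        state.consecutive_passes += 1
        state.consecutive_failures = 0
        if not state.in_rotation and state.consecutive_passes >= PASS_CHECKS_TO_RESTORE:
            state.in_rotation = True
    else:
        state.consecutive_failures += 1
        state.consecutive_passes = 0
        if state.in_rotation and state.consecutive_failures >= FAIL_CHECKS_TO_REMOVE:
            state.in_rotation = False
    return state
```

Requiring a streak in both directions means a single lucky or unlucky check does not flip the listing, which is the same stability concern Jeff raises about restoring a server too early.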
> On Wed, Oct 10, 2012 at 11:52 AM, Jeff Taylor
> <shdwdrgn AT <mailto:shdwdrgn AT>> wrote:
> While finishing up the code, I decided that what makes the most
> sense is to take a server offline based on a value of <days>, but
> then to bring it back online again based on a value of <hours>.
> The offline status is really just an extension of the temp-outage
> status, but this step gets a server removed from the public
> listings. I certainly don't want this status to be viewed as a
> 'punishment' to the admins involved; rather, it should be considered
> a notice to the users that there is an extended problem occurring.
> It is interesting that the two replies so far have suggested
> opposite extremes for bringing a server back into the pool. My
> feeling on this is that since the code will automate the process,
> we can keep the time fairly short; however, the server was marked
> offline for a reason, so we want to make sure it is running
> smoothly for long enough that we can be sure it is stable again.
> For this reason, I think 48 hours would be a reasonable period. We
> should probably get some more opinions on this matter.
> We all seem to be in agreement that 7 days is a good length of time
> to wait for issues to be resolved before marking a server offline,
> so I'll stick with that value while moving forward.
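The asymmetric rule described above (offline after 7 days of problems, back online after a shorter clean period, with 48 hours proposed) amounts to hysteresis on uninterrupted failure and success durations. A rough sketch, assuming one pass/fail result per test run; `Tier2Status` is a hypothetical name, and the intermediate temp-outage status mentioned above is not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional

# Values from the thread: 7 days of problems before going offline,
# and a proposed 48 hours of clean results before coming back.
OFFLINE_AFTER = timedelta(days=7)
RESTORE_AFTER = timedelta(hours=48)


class Tier2Status:
    """Hypothetical per-server tracker; not OpenNIC's actual implementation."""

    def __init__(self) -> None:
        self.online = True
        self.failing_since: Optional[datetime] = None  # start of current failure streak
        self.passing_since: Optional[datetime] = None  # start of current success streak

    def update(self, now: datetime, passed: bool) -> bool:
        """Record one test result taken at `now`; return whether the server stays listed."""
        if passed:
            self.failing_since = None
            if self.passing_since is None:
                self.passing_since = now
            if not self.online and now - self.passing_since >= RESTORE_AFTER:
                self.online = True
        else:
            self.passing_since = None
            if self.failing_since is None:
                self.failing_since = now
            if self.online and now - self.failing_since >= OFFLINE_AFTER:
                self.online = False
        return self.online
```

Because `failing_since` resets on any passing test and `passing_since` on any failure, only an uninterrupted stretch counts toward either threshold; letting intermittent failures accumulate would be a separate policy choice not covered by this sketch.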
> On 10/08/2012 11:20 PM, Jeff Taylor wrote:
>> Regarding the previous discussion about automating the removal of
>> dead or failing Tier-2 servers...
>> First off, a big thanks to Brian for getting the administrative
>> tools created so we can better manage the status of these
>> servers! We now have the tools in place to mark servers as
>> offline or deleted, and handle each case appropriately. Please
>> note that if your server is marked offline and you are able to
>> repair it, you can contact Brian or myself to re-enable your
>> server on the wiki.
>> I am currently testing some new code which will automatically
>> move failing servers to an offline status (and remove them from
>> the zone file). Servers that are marked offline will continue to
>> be tested for functionality, and could potentially be
>> automatically changed to an online status when they resume
>> service. In looking through this thread, it does not appear we
>> ever really established a policy that I could put into the code,
>> so I would like to take a quick vote to see what everyone thinks
>> would be best...
>> Policies for marking servers as offline:
>> 1) Testing fails for more than (7, 14, 28) days
>> 2) Connection fails for more than (2, 3, 7, 14) days
>> Policies for marking an offline server as functional again:
>> 3) Passes all tests for at least (1, 2, 7) days
>> My thoughts on this are that connection failures are more serious
>> than testing failures, and should be given stricter criteria.
>> Also note that I *can* resolve the test times in hours rather
>> than days, but at the moment it seems best to work on a
>> day-by-day basis to give admins time to fix problems with their
>> systems. Please let me know what values you think are best for
>> the three questions above, and I'll tally up the results in a
>> couple of days and start implementing the new automation.
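The three ballot questions map onto two offline predicates with separate limits (connection failures given the stricter one, per the suggestion above) and one recovery predicate. The limits below are arbitrary picks from the listed options for illustration only, not the outcome of the vote, and the function names are hypothetical.

```python
from datetime import timedelta

# ASSUMPTION: example picks from the ballot options; the thread had not settled these values.
TEST_FAILURE_LIMIT = timedelta(days=7)        # question 1 options: 7, 14, 28 days
CONNECTION_FAILURE_LIMIT = timedelta(days=3)  # question 2 options: 2, 3, 7, 14 days
RECOVERY_PERIOD = timedelta(days=2)           # question 3 options: 1, 2, 7 days


def should_go_offline(test_failing_for: timedelta, connection_failing_for: timedelta) -> bool:
    """Questions 1 and 2: either failure mode lasting MORE than its own limit marks the server offline."""
    return (test_failing_for > TEST_FAILURE_LIMIT
            or connection_failing_for > CONNECTION_FAILURE_LIMIT)


def should_come_back(passing_all_tests_for: timedelta) -> bool:
    """Question 3: an offline server returns after passing all tests for AT LEAST the recovery period."""
    return passing_all_tests_for >= RECOVERY_PERIOD
```

Expressing the limits as `timedelta` values keeps hour-level resolution available, so moving from days to hours would only mean changing the constants.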
>> -------- You are a member of the OpenNIC Discuss list. You may
>> unsubscribe by emailing
> discuss-unsubscribe AT
> <mailto:discuss-unsubscribe AT>

