- From: Jeff Taylor <shdwdrgn AT sourpuss.net>
- To: discuss AT lists.opennicproject.org
- Subject: Re: [opennic-discuss] Grep.geek offline for maintenance
- Date: Thu, 04 Oct 2012 02:10:14 -0600
The database is back online again, sorry for the delay in notifying the
list. Part of the delay was that copying the old data took far longer
than I expected, and this is in fact due to the database for grep.geek
spiraling out of control. The database is currently sitting at 118 GB
and contains information for over 24 million pages.
Why is it so huge? Because a lot of folks create OpenNIC domains with
no purpose other than to mirror an ICANN website. Personally I have no
issues with this... it's your domain, use it how you want to... however
I would like to call into question the intent behind different types of
mirrors.
1) You create a full mirrored backup of another website. When it looked
like WikiLeaks could be shut down, a number of people created mirrored
copies of the data and hosted them themselves. This is fantastic because
you are actually replicating the data and hosting it from a
geographically different location.
2) The homepage of your domain is a basic 301 redirect to another
domain. There are quite a number of pages that simply redirect to
Wikipedia or The Pirate Bay. I believe the intent was to create new
gateways to reach The Pirate Bay because some countries are attempting
to block the site. Unfortunately it does not work that way. If I try to
browse to piratebay.free and I am sent a 301 redirect to piratebay.se,
my browser still has to make the direct connection to the original
website. I do not magically hop through your local connection to bypass
my country's blockades. Your good intentions are certainly appreciated
in the fight to keep the internet 'free', however when you want to try
and beat someone else's measures, you should probably post your
intentions on the mailing list for a public discussion as to whether
your idea will even work.
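
To make the redirect point concrete, here is a minimal sketch in Python
that simulates which hosts a browser ends up contacting. Nothing here
touches the network: the RESPONSES table, the hosts_contacted function
and the "blocked" set are all made up for illustration, and the two
hostnames are simply the ones from the example above.

```python
from urllib.parse import urlsplit

# Hypothetical canned responses standing in for real servers:
# URL -> (HTTP status, Location header or None)
RESPONSES = {
    "http://piratebay.free/": (301, "http://piratebay.se/"),
    "http://piratebay.se/": (200, None),
}

REDIRECT_CODES = {301, 302, 303, 307, 308}


def hosts_contacted(url, responses, blocked=frozenset(), max_hops=5):
    """Return the (host, outcome) pairs a client would see, in order.

    'blocked' models hosts the client's own network refuses to reach.
    """
    contacted = []
    for _ in range(max_hops):
        host = urlsplit(url).hostname
        if host in blocked:
            # The client itself must connect here, so the block applies.
            contacted.append((host, "BLOCKED"))
            return contacted
        status, location = responses[url]
        contacted.append((host, status))
        if status not in REDIRECT_CODES or location is None:
            return contacted
        # A redirect only tells the client where to go next; the
        # redirecting server does not relay any traffic.
        url = location
    return contacted


print(hosts_contacted("http://piratebay.free/", RESPONSES))
print(hosts_contacted("http://piratebay.free/", RESPONSES,
                      blocked={"piratebay.se"}))
```

In the second call the redirect is received just fine, but the follow-up
connection to the original site is still made by the client and so still
runs into the block.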
Regardless of the intentions, my poor little servers are not up to the
task of indexing Google. Therefore I am implementing some new code to
reject the indexing of any pages that appear to be redirects to another
site. I will index the home page of your domain, but that is all. This
should be enough to get your website listed in grep.geek and have it
appear in general searches, but will not chew up large portions of my
storage drives to retain multiple copies of the same websites, not to
mention the bandwidth required to actually crawl these sites multiple
times. Once the redundant data has been removed from the database,
grep.geek should also respond much faster to queries.
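
For anyone curious what such a rule might look like, below is a rough
sketch in Python. It is not the actual grep.geek code, which is not
shown in this message; the function names, the return values and the
"different hostname means another site" test are all assumptions made
for illustration.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_offsite_redirect(url, status, location):
    """True if the response redirects to a different hostname."""
    if status not in REDIRECT_CODES or not location:
        return False
    # Location may be relative, so resolve it against the request URL.
    target = urljoin(url, location)
    return urlsplit(target).hostname != urlsplit(url).hostname


def crawl_decision(url, status, location=None):
    """Decide what a crawler does with one fetched URL.

    Returns one of (hypothetical labels):
      "index-and-follow" - normal page or same-site redirect
      "index-only"       - home page that redirects off-site:
                           list the domain, but do not crawl the target
      "skip"             - any other page that redirects off-site
    """
    if not is_offsite_redirect(url, status, location):
        return "index-and-follow"
    if urlsplit(url).path in ("", "/"):
        return "index-only"
    return "skip"


print(crawl_decision("http://piratebay.free/", 301, "http://piratebay.se/"))
print(crawl_decision("http://piratebay.free/browse", 301,
                     "http://piratebay.se/browse"))
print(crawl_decision("http://example.geek/old", 301, "/new"))
```

Note that a plain hostname comparison would also treat a redirect from
example.geek to www.example.geek as off-site; a real crawler would
probably want to normalize that case first.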
On 10/03/2012 07:53 PM, Jeff Taylor wrote:
> Sorry for the late notice, but I decided to do some maintenance on the
> server that handles the database for grep.geek tonight. The website
> will still respond, however search queries will not work until the
> database comes back up.
>
> Estimated downtime is around one hour, but I will send another email
> once I'm up and running again.
>
>
> --------
> You are a member of the OpenNIC Discuss list.
> You may unsubscribe by emailing discuss-unsubscribe AT lists.opennicproject.org