discuss - Re: [opennic-discuss] Grep.geek offline for maintenance

discuss AT lists.opennicproject.org

Subject: Discuss mailing list

  • From: mike <mike AT pikeaero.com>
  • To: discuss AT lists.opennicproject.org
  • Subject: Re: [opennic-discuss] Grep.geek offline for maintenance
  • Date: Thu, 04 Oct 2012 03:48:43 -0500
  • Envelope-to: discuss AT lists.opennicproject.org


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Since grep.geek is such an important part of OpenNIC and is running
into scalability issues, would it be appropriate to discuss some sort
of distributed architecture, so that others could pool resources
toward grep.geek and gain both redundancy and more storage capacity?

In other words, would a discussion around laying the foundation for a
scalable grep.geek make any sense at this time?

- --Mike

On 10/04/2012 04:10 AM, Jeff Taylor wrote:
> The database is back online again, sorry for the delay in notifying
> the list. Part of the delay was that copying the old data took far
> longer than I expected, and this is in fact due to the database for
> grep.geek spiraling out of control. The database is currently
> sitting at 118 GB and contains information for over 24 million
> pages.
[clip]
>
> Regardless of the intentions, my poor little servers are not up to
> the task of indexing Google. Therefore I am implementing some new
> code to reject the indexing of any pages that appear to be
> redirects to another site. I will index the home page of your
> domain, but that is all. This should be enough to get your website
> listed in grep.geek and have it appear in general searches, but
> will not chew up large portions of my storage drives to retain
> multiple copies of the same websites, not to mention the bandwidth
> required to actually crawl these sites multiple times. Once the
> redundant data has been removed from the database, grep.geek should
> also respond much faster to queries.
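
For illustration only, the rule described above (skip pages that
redirect to another site, but still index the domain's home page) could
be sketched roughly as follows. This is not grep.geek's actual code;
the function names and the idea that the crawler reports the final URL
after redirects are assumptions made for the example.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def should_index(requested_url: str, final_url: str) -> bool:
    """Decide whether a crawled page should be indexed.

    requested_url: the URL the crawler asked for.
    final_url: the URL the fetch ended on after following redirects
               (hypothetical input; assumed to be supplied by the crawler).
    """
    requested = urlsplit(requested_url)

    # Pages that stay on the same host are indexed as usual.
    if host_of(requested_url) == host_of(final_url):
        return True

    # The page redirects to another site: keep only the home page,
    # so the domain is still listed without mirroring the target site.
    return requested.path in ("", "/") and not requested.query
```

Comparing hostnames exactly is a simplification: a redirect from
example.geek to www.example.geek would count as "another site" here,
so a real crawler would probably want a looser notion of "same site".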


- --
Regards,

Mike Sharkey, CEO
Pike Aerospace Research Corp. (Pike Aero)
420 Cross Street
Sudbury, Ontario
Canada P3E-3W1

P:1+(705)586-2255
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQbU1mAAoJEA7EcEr0emgfhkwIAJqJono5HINVLv0KjIGrgzZ1
14mFHLTAg0uECAHECtU3iM3sKz3+qC7yCUbyYpoCGWeMvoJMOIHT/M/h5oLiGe2y
DRdC7/tZJd8Ap/nTEQW/qXME+Rdc/ueJCGw4MUro24rxLZqzhWeV4Rer/7MUKDEh
+f9XEwMkyd6PwfsCMON+VIX5rFYfh6YLZt4VKXVhM4Hm2WX/NFSlTR+3Te6zYdoy
tXgwtuXGS8opA79JtDoIxWVZJwWTumFwp/kBuzz5d4WfRY/pLy24c2hqPvRw6eU8
x/C7EP2a4ODHEn46D1/lrtEp1cGcdHM23ipkc8rKcHsqSI2x9Chdc20Q0cgWbS4=
=y5cC
-----END PGP SIGNATURE-----


