- From: Jeff Taylor <shdwdrgn AT sourpuss.net>
- To: discuss AT lists.opennicproject.org
- Subject: Re: [opennic-discuss] New search engine being populated
- Date: Tue, 24 May 2011 20:36:12 -0600
- List-archive: <http://lists.darkdna.net/pipermail/discuss>
- List-id: <discuss.lists.opennicproject.org>
Most sites could probably be determined by a visual inspection (obvious signs like labeling for the ICANN domain), but I'm not sure if there's an easy way to programmatically determine the correlation. On top of that, I know that some of the largest-content sites are also the ones which have been mirrored repeatedly through a number of OpenNIC domains (the stampsx sites being the primary offender).
I am still trying to find a way of determining when sites are exact mirrors. It would be great if the indexer code was able to understand this concept and flag sites as such, but I may have to get brutal with the settings and simply blacklist the mirrors, leaving a single domain to be indexed.
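One possible way to spot exact mirrors, sketched here in Python and not taken from the indexer itself: hash every crawled page, combine the per-page hashes into one fingerprint per domain, and group the domains whose fingerprints match. The function names and the `crawl` structure (domain name mapped to a dict of URL path to raw page bytes) are assumptions made for illustration. The comparison is byte-for-byte, so a mirror whose pages embed its own domain name in absolute links would not be caught by this approach.

```python
import hashlib
from collections import defaultdict


def site_fingerprint(pages):
    """Return a hex digest summarising a site's crawled pages.

    `pages` (hypothetical structure) maps a URL path to the raw page body
    as bytes. Paths are processed in sorted order so that crawl order does
    not change the result.
    """
    digest = hashlib.sha256()
    for path in sorted(pages):
        body_hash = hashlib.sha256(pages[path]).hexdigest()
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(body_hash.encode("ascii"))
        digest.update(b"\n")
    return digest.hexdigest()


def group_exact_mirrors(crawl):
    """Group domains whose crawled pages are byte-for-byte identical.

    `crawl` maps a domain name to its `pages` dict. Returns a sorted list
    of sorted domain lists, one per group of two or more domains.
    """
    by_fingerprint = defaultdict(list)
    for domain, pages in crawl.items():
        by_fingerprint[site_fingerprint(pages)].append(domain)
    return sorted(sorted(g) for g in by_fingerprint.values() if len(g) > 1)


def pick_canonical(group):
    """Keep the alphabetically first domain; the rest are blacklist candidates."""
    return group[0], group[1:]


if __name__ == "__main__":
    # Made-up domains and content, purely for demonstration.
    crawl = {
        "one.geek": {"/": b"<h1>hi</h1>", "/about": b"about"},
        "two.geek": {"/about": b"about", "/": b"<h1>hi</h1>"},
        "other.geek": {"/": b"different"},
    }
    for group in group_exact_mirrors(crawl):
        keep, drop = pick_canonical(group)
        print("index:", keep, "skip:", drop)
```

Choosing the alphabetically first domain as the one to index is arbitrary; any stable rule would do, as long as the remaining domains in each group end up on the blacklist.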
This still leaves the issue of pages that are just mirrors of an ICANN site. It's safe to assume that the majority of the content on OpenNIC domains is simply mirrored copies. I'm not sure there's a problem with that, or a reason not to index the pages completely. It's still content available within our realm, and perhaps it might inspire more unique content as we progress?
On 05/24/2011 06:34 PM, Brian Koontz wrote:
Jeff--
Not to throw water on your excellent efforts, but I wonder: How many
of these pages actually reflect content unique to the OpenNIC
namespace (as opposed to, say, content in ICANN-space that is simply
pointed to by an OpenNIC A record)? I don't know if there's a way to
differentiate the two, but I think results from sites that are simply
mirrors of ICANN-space sites should be flagged as such, or maybe
displayed only on a "top-level" basis (i.e., the initial entry page
cataloged).
--Brian