
discuss - Re: [opennic-discuss] New search engine & article

discuss AT lists.opennicproject.org

  • From: Jeff Taylor <shdwdrgn AT sourpuss.net>
  • To: discuss AT lists.opennicproject.org
  • Subject: Re: [opennic-discuss] New search engine & article
  • Date: Tue, 28 Jan 2014 19:35:27 -0700

Martin, something that might help you out: I run all my spiders in 3-hour increments. I let them run for a while, then signal them to wrap up and close out cleanly, and restart them at the 3-hour mark. Not only does this help flush out any memory leaks, but if a spider crashes for some reason, it is restarted within a few hours anyway.
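The restart-on-a-schedule approach can be sketched in Python as follows. This is a minimal sketch, not Jeff's actual setup. It assumes the spider runs as a separate command that saves its progress and exits when it receives a termination request. `supervise` and its parameters are hypothetical names. Each increment runs in a fresh process, and that is what releases any leaked memory.

```python
import subprocess
import time


def supervise(cmd, increment_s, grace_s, rounds):
    """Run the spider command `rounds` times, each for at most `increment_s` seconds.

    Hypothetical helper: assumes `cmd` saves its state and exits cleanly when it
    receives a termination request. Returns the exit code of each run.
    """
    codes = []
    for _ in range(rounds):
        start = time.monotonic()
        proc = subprocess.Popen(cmd)  # fresh process, so leaked memory is released
        try:
            proc.wait(timeout=increment_s)  # finished (or crashed) on its own
        except subprocess.TimeoutExpired:
            proc.terminate()  # ask it to wrap up (SIGTERM on POSIX)
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                proc.kill()  # did not close out in time; force it
                proc.wait()
        codes.append(proc.returncode)
        # A run that ended early (e.g. a crash) is restarted at the next mark.
        remaining = increment_s - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)
    return codes
```

For example, `supervise(["php", "spider.php"], 3 * 60 * 60, 60, rounds=8)` would cover one day in 3-hour increments. Here `spider.php` is a placeholder name and the 60-second grace period is an arbitrary choice.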

On 01/28/2014 06:10 PM, Martin C wrote:
Hi Calum,
I've been working on a search engine too; I started it just before writing the article that was discussed here two weeks ago.

> The search system is http://www.sphider.eu/ - a PHP/MySQL search
> system that is a bit outdated. In the future I plan to move to
> something a bit more sophisticated, but at the moment it seems to do
> the job.

Mine has an initial search page in HTML, which sends the query to the main search engine, written from scratch in C and called via CGI. I currently use SQLite3 for data storage.

> The database kept crashing during indexing, but after adding
> some swap space to the server, the problem seems to be fixed.

My spider is written in PHP, and I noticed that after indexing for a long time, PHP would report a memory error. I suspect a small memory leak somewhere, and I think I need to tweak php.ini a little. I intend to rewrite the spider in C in the next few months.
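Martin's query path is an HTML search page that hands the query to a CGI program, which reads SQLite3. It can be sketched in Python as shown below. This is not his C code. The sketch assumes the form submits with GET, so the query arrives in `QUERY_STRING`. It also assumes a hypothetical `pages` table with `url`, `keywords` and `description` columns. The search terms are passed as bound parameters rather than pasted into the SQL string, which avoids SQL injection.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def search(conn: sqlite3.Connection, terms: list, limit: int = 20) -> list:
    """Return (url, description) rows whose keywords or description contain every term."""
    if not terms:
        return []
    clauses, params = [], []
    for term in terms:
        clauses.append("(keywords LIKE ? OR description LIKE ?)")
        pattern = f"%{term}%"  # SQLite LIKE is case-insensitive for ASCII
        params += [pattern, pattern]
    sql = ("SELECT url, description FROM pages WHERE "
           + " AND ".join(clauses) + " ORDER BY url LIMIT ?")
    return conn.execute(sql, params + [limit]).fetchall()


def cgi_response(conn: sqlite3.Connection, query_string: str) -> str:
    """Build the CGI output (header, blank line, body) for e.g. 'q=free+dns'."""
    q = parse_qs(query_string).get("q", [""])[0]
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(url)}</a> {html.escape(desc)}</li>\n'
        for url, desc in search(conn, q.split())
    )
    return "Content-Type: text/html\r\n\r\n<ul>\n" + items + "</ul>\n"
```

Run as a real CGI script, this would end with something like `print(cgi_response(sqlite3.connect("index.db"), os.environ.get("QUERY_STRING", "")), end="")`, where `index.db` is a placeholder path.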

I have over 80,000 sites currently indexed, and the system is OpenNIC-aware. It concentrates on the META tags (keywords and description), falling back to the first 300 characters of a page's content if it can't find a META description.
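The extraction rule described above could be written with Python's standard `html.parser`. The rule is to use the META keywords and description, and otherwise fall back to the first 300 characters of the page text. This is a minimal sketch, not Martin's implementation. `MetaExtractor` and `summarize` are hypothetical names. Two choices are assumptions about what counts as "content": ignoring text inside `<script>`, `<style>` and `<title>`, and collapsing whitespace.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "title"}  # assumed: not part of the page "content"


class MetaExtractor(HTMLParser):
    """Collects META keywords/description and the page's visible text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name = (a.get("name") or "").lower()
            content = (a.get("content") or "").strip()
            if name in ("keywords", "description") and content:
                self.meta.setdefault(name, content)  # keep the first occurrence
        elif tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def summarize(page_html, limit=300):
    """Return (keywords, description), falling back to the first `limit` characters of text."""
    parser = MetaExtractor()
    parser.feed(page_html)
    parser.close()
    description = parser.meta.get("description")
    if not description:
        text = " ".join(" ".join(parser.text_parts).split())  # collapse whitespace
        description = text[:limit]
    return parser.meta.get("keywords", ""), description
```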

Who knows, maybe we can combine resources and know-how?

Thanks to Peter for the lead-in to my reply; he has been testing my search engine nearly since the start.

Let me know if you'd like a link to it to have a look-see.

Martin.



--------
You are a member of the OpenNIC Discuss list. 
You may unsubscribe by emailing discuss-unsubscribe AT lists.opennicproject.org



