Internet search engines tend to be perfect examples of the proverb “To them that have shall be given.” (I guess this is a Biblical quote. The “hath” suggests it anyway.)
Get a top ranking on Google and you can guarantee your site will get loads of hits. Which will up your ranking. Which will get you more hits. And so ad infinitum.
Which must be great if you are the website equivalent of Coca Cola. But is a bit of an obstacle when you are Joe Nobody’s Homemade Dandelion and Burdock Drink.
So it’s good that an open source Wikia Search project is slowly being brought into existence. The idea is that an open source search algorithm will inspire more confidence in the results. At the least, it will let website owners know what the goalposts are.
New Scientist of 12th June 2007 (Yes, I know, it obviously takes me a while to process information) described the Wikia search project as the project of a “rebellious group of software engineers” determined to topple Google.
Apparently, one of the biggest problems is the shortage of mountains of cash to set up global data centres to match those of Google and Microsoft. According to New Scientist, one possible solution is to use a grid computing model, along the lines of SETI, with the search processing distributed around the world on volunteer’s PCs.
Most of the stuff on the Wikia site at the moment is concerned with the project itself. There is an about page . It looks as if development has stalled a bit since the initial start push in 2004, though. (Which suggests that New Scientist is even slower than me at processing information.)
Here’s an extract from Wikia Search on some of the ranking problems they intend to address:
Several other strategies to cheat or game the search engines are based on the fact that many search engines consider a hyperlink to a site to be a ‘vote’ for that site or measure of popularity. The use of hyperlinks as an indicator of website ‘quality’ led to link exchanges, link farms, bulletin board spam and other strategies to boost sites. Search engines responded by attempting to algorithmically evaluate the quality of each page, and discount links on sites or pages of little real value. While these algorithms to assess quality have neutralized millions of web pages, they have not (and cannot?) objectively determine the value and context of all the links on the web. The number of links to a page remains one of the biggest factors in how a page ranks in conventional search engines, and remains a prime area of interest for black-hat and grey-hat SEO.
Anything that can cut down the number of pointless spam sites that can clutter up the first few dozen pages of search results from standard search engines will be a big step forward.
I hope they solve the problems and this idea takes off. I’d volunteer my puny computing power and some of my bandwidth. Persuading ISPs not to do the choking-at-peak-times thing that they have started sneaking in through “Fair use” policies might be an obstacle though.