Pages tagged ""

Bodiam Castle? Google Is Your Friend…

Posted on 19th March, 2008 by TW

I have been looking through the website logs to see just what it is that drives people to this site and, while lacking in raw comedy value (unlike some), it has been interesting.

Running a combination of Firestats, Feedburner and Google Analytics, it seems this blog is getting around 400 visits a day. Of these, around 80% are new (which shows just what a non-loyal readership we hold…) and of those around 70% come here from a search engine - nearly all from Google. For the numbers-fans, this translates to about 200 hits a day from Google searches. Given the insanely varied nature of topics here, you would be excused for thinking this was reflected in the search stats. Not so.
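For anyone who wants to check the sums, here is a minimal sketch of the arithmetic in Python. The figures are the rounded ones quoted above and the variable names are invented for illustration; the result comes out a little over the "about 200" quoted, which is consistent with not quite all search traffic coming from Google.

```python
# Rounded figures quoted in the post; variable names are illustrative only.
daily_visits = 400
new_visitor_share = 0.80     # "around 80% are new"
search_engine_share = 0.70   # "of those around 70% come here from a search engine"

# New visitors per day, then the subset arriving from a search engine.
new_visits = daily_visits * new_visitor_share
search_visits = new_visits * search_engine_share

print(f"New visits per day: {new_visits:.0f}")
print(f"Search-engine visits per day: {search_visits:.0f}")
```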

Of the top ten search terms used to come here, seven are image searches, and this accounts for about 90 of the incoming hits. Even stranger, of these over a third are all searching for images of Bodiam Castle.

Now, Bodiam Castle is a gorgeous, fourteenth-century fairytale castle in East Sussex, run by the National Trust, so I can understand why people are interested in it. In fact, I understand this well enough to have uploaded another photo!

Bodiam Castle

If you have come here searching for Bodiam Castle, I hope you like this, and you can even see more on Flickr. It has been a long time since I have been to Bodiam so please, forgive me for the photos being out of date now. If you have links to other pictures of this gorgeous castle, please let me know and I will be more than happy to link to them from here.

Back onto the search topic, there is the determination issue to consider now. Will my posting of a new Bodiam article increase the number of hits I get for this? Are people massively disappointed when the Mighty Google sends them here rather than elsewhere? Why don’t people use Yahoo to search for Bodiam?

The other common terms people use for an “images search” are:

  • Schwarzenegger
  • Nice Art
  • Fine Houses
  • Holy Wafer
  • Jesus Toast (around 5 people a day come here using that search term… MADNESS)
  • Future Castles

Now, some make more sense than others, but I can only guess at the disappointment people must feel when their searches lead them here.

For completeness, the most common search terms that bring people to this site are:

  • HDR How To (use Photomatix)
  • Cool Viking Names (well all of them)
  • Bad Journalist (again, all of them)
  • Firefox Memory Hog (it is)
  • Pipex Download Speeds (almost non-existent)
  • McCanns Blog (wrong place, I didn’t even know they had one)

One last point: a bit of an oddity is a search term Feedburner has identified as leading some poor unfortunate here: “blog: I cannot read, feel distracted” - I have no idea what this blog has to offer this poor person.

Popularity: 81% [?]


Wikia search project

Posted on 17th August, 2007 by Heather

Internet search engines tend to be perfect examples of the proverb “To them that hath shall be given.” (I guess this is a Biblical quote. The “hath” suggests it anyway.)

Get a top ranking on Google and you can guarantee your site will get loads of hits. Which will up your ranking. Which will get you more hits. And so ad infinitum.

Which must be great if you are the website equivalent of Coca Cola. But is a bit of an obstacle when you are Joe Nobody’s Homemade Dandelion and Burdock Drink.

So it’s good that an open source Wikia Search project is slowly being brought into existence. The idea is that an open source search algorithm will inspire more confidence in the results. At the least, it will let website owners know what the goalposts are.

New Scientist of 12th June 2007 (Yes, I know, it obviously takes me a while to process information) described the Wikia search project as the project of a “rebellious group of software engineers” determined to topple Google.

Apparently, one of the biggest problems is the shortage of mountains of cash to set up global data centres to match those of Google and Microsoft. According to New Scientist, one possible solution is to use a grid computing model, along the lines of SETI, with the search processing distributed around the world on volunteers’ PCs.

Most of the stuff on the Wikia site at the moment is concerned with the project itself. There is an about page. It looks as if development has stalled a bit since the initial push in 2004, though. (Which suggests that New Scientist is even slower than me at processing information.)

Here’s an extract from Wikia Search on some of the ranking problems they intend to address:

Several other strategies to cheat or game the search engines are based on the fact that many search engines consider a hyperlink to a site to be a ‘vote’ for that site or measure of popularity. The use of hyperlinks as an indicator of website ‘quality’ led to link exchanges, link farms, bulletin board spam and other strategies to boost sites. Search engines responded by attempting to algorithmically evaluate the quality of each page, and discount links on sites or pages of little real value. While these algorithms to assess quality have neutralized millions of web pages, they have not (and cannot?) objectively determine the value and context of all the links on the web. The number of links to a page remains one of the biggest factors in how a page ranks in conventional search engines, and remains a prime area of interest for black-hat and grey-hat SEO.
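To make the “links as votes” idea in that extract concrete, here is a toy sketch in Python. It is not how Wikia Search, Google or any real engine ranks pages; the site names, the link graph and both function names are invented for illustration. It simply counts inbound links, then recounts after discarding links from sources judged to be of little value.

```python
from collections import Counter

# Hypothetical link graph: each key is a site that links to the sites in its list.
links = {
    "blog-a.example": ["homemade-drinks.example"],
    "blog-b.example": ["homemade-drinks.example", "big-cola.example"],
    "farm-1.example": ["spam-shop.example"],
    "farm-2.example": ["spam-shop.example"],
    "farm-3.example": ["spam-shop.example"],
}


def inbound_votes(graph):
    """Count each hyperlink to a site as one 'vote' for that site."""
    votes = Counter()
    for source, targets in graph.items():
        for target in set(targets):  # a source votes at most once per target
            votes[target] += 1
    return votes


def discounted_votes(graph, low_value_sources):
    """Same count, but ignoring links from sources judged to be of little value."""
    trusted = {s: t for s, t in graph.items() if s not in low_value_sources}
    return inbound_votes(trusted)


farms = {"farm-1.example", "farm-2.example", "farm-3.example"}
print(inbound_votes(links).most_common())
print(discounted_votes(links, farms).most_common())
```

With the raw count, the link farm pushes the spam shop to the top; once the farm links are discounted it drops out entirely. The hard part, as the extract says, is deciding which sources belong in the low-value set.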

Anything that can cut down the number of pointless spam sites that can clutter up the first few dozen pages of search results from standard search engines will be a big step forward.

I hope they solve the problems and this idea takes off. I’d volunteer my puny computing power and some of my bandwidth. Persuading ISPs not to do the choking-at-peak-times thing that they have started sneaking in through “Fair use” policies might be an obstacle though.

Popularity: 26% [?]


ShopWiki, DoubleClick

Posted on 14th April, 2007 by TW

I was reading a post on Matt Mullenweg’s blog (PhotoMatt), titled “DoubleClick and Kevin Ryan”, which talks about Google having bought DoubleClick and Kevin Ryan (co-founder) having moved on to a new start-up called ShopWiki. (I am not going to link to them, though.)

Basically, ShopWiki sends bots out to trawl the web and find products at the best price for you. You may think this is a wonderful thing, and it may well be. I am somewhat intrigued, though, as to why this site (notable for its abject lack of sellable items) has been getting hammered by the ShopWiki bot for most of the last two days (until it got the .htaccess treatment). As far as I can see, the bot ignored the robots.txt entry I put in for it (although my track record with this file is poor).
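Given that track record, one way to sanity-check a robots.txt entry is to run it through Python’s standard `urllib.robotparser`. This is only a sketch: the rules below are a made-up example rather than the entry actually used here, the “ShopWiki” user-agent token is an assumption, and the helper name `is_allowed` and the example.org URL are invented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the "ShopWiki" user-agent token is an assumption.
ROBOTS_TXT = """\
User-agent: ShopWiki
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a well-behaved crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ShopWiki", "http://www.example.org/contact/"))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://www.example.org/contact/"))
```

This only shows what a compliant crawler ought to do with the file. A bot that ignores robots.txt altogether still has to be blocked at the server, which is where the .htaccess treatment comes in.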

I think the idea behind ShopWiki seems sound and I am sure it is a wonderful new idea. But I have to question the validity of the data it has collected, given the time and effort it spent looking round the contact pages here. In a spambot-like fashion, the ShopWiki bot seems to have concentrated on pages which made reference to emails and the like.

Time may modify my point of view, but for now I think of this as a Bad Shop.

Popularity: 35% [?]


Tagging the untagged

Posted on 25th February, 2007 by Heather

This blog has been going through some traumatic changes to its functionality.

It doesn’t look much different because most of the changes to its appearance were repellent in IE6 and earlier browsers, although they looked great in IE7, so it’s temporarily reverted to a look which it’s had for… oh, I don’t know… all of about 6 weeks.

The main difference for visitors is that you can find much more by tags, as if the blog was trying to be a mini-Technorati. You can open the Tag Archive page and search on several tags. (These are even presented in a tag cloud.)

The big difference for us is that we can tag things by just clicking on them. Adding tags used to be like pulling teeth. It probably contributed to my blogs being unfeasibly long because I couldn’t bear to have to go through the tagging process again (like a graffiti artist with a sore arm?). So the outcome should be less blog words, more tag words. Or at least, more tag words.

However, we don’t have full tagging liftoff yet. The older posts either don’t have any tags or only have WordPress category tags. By older, I mean “up to January 2007”. So that’s nearly all of them. As the posts here go back over a year, it’s an arduous task to add tags and it’s getting done piecemeal. All the same, it should be possible to find most of what we have for most of the topics.

And by the way, why do people keep typing “none” into the search bit in the header? This is just bizarre. It’s not when people click on the search box without putting anything in, because that brings up a blank page.

Popularity: 30% [?]


Web traffic analysis=nonsense

Posted on 18th February, 2007 by Heather

What is it with search engines and web-traffic rankers?

This blog has done enough whining about Technorati’s randomness. It’s well overdue to say that it’s probably working far more consistently and reliably than most of the facilities that claim to find Internet resources. (On a note that shows how shamelessly susceptible to flattery we are at whydontyou.org.uk - others please take note - it puts this blog at under 60,000 in the blogosphere which is almost beyond its wildest dreams.)

As an experiment, look up your blog in a few search engines. See if you can find any points in common between them.

Here’s one of my favourites, in that I suspect they must actually use a randomiser to generate web traffic numbers and links. Pick a blog, look at it in Technorati’s blog directory.

Go to the traffic rank bit and click on it. You will find yourself in the realm of Alexa. This will probably show you that the traffic isn’t really counted because the blog isn’t in the top 100,000. The daily page views are shown as a percent of people using the whole Internet, i.e., if the site isn’t in the top 100,000 sites in the world, you won’t get any figures. (If you come in at a newbie 5,195,452 - as this blog does - you may wonder if you are even reading the blog yourself.)

100,000 sounds like a lot of sites. However, if you consider global players (like Google or Microsoft), then big online retailers (like Tescos and Dell), then news sites (CNN, BBC) and national government information sites, you can see it must be pretty difficult to get into the club.

Beneath this blank chart, you will see “Percent of Internet users who visit this site” with a fraction of a percent if it’s anything like this one. (Maybe you’re Microsoft, in which case I guess it will be higher. Will check shortly.)
Then “average number of pages visited” and “3 months average traffic rank” (risibly low) and average page views per visitor (1). (1 :-) Do you suspect that’s hard-coded?)

But the next bit is what creases me up for its randomness. People who visit this site come from (in order of most visits):

United States 40.0% (fair enough, the blog’s in English. Most English-speaking Internet users are in the USA)
France 20.0%
India 20.0%
Costa Rica 10.0%
United Kingdom 10.0%

Whydontyou.org.uk traffic rank in other countries: (These seem to be the same countries to me)
Costa Rica 46,349
India 167,900
France 170,280
United States 658,841
United Kingdom 703,872

Come on…. To what do we owe this unprecedented popularity in Costa Rica? India? France? This is a UK-based blog. Most of the stuff we witter on about, apart from atheism and technology, relates to the UK.

It’s not that I don’t want to believe it. A central American flavour to its posts would make this blog much more interesting. I just think the figures have been made up.

OK, let’s look at the sites that link here, according to Alexa. These are so out of date, that it’s obviously not been updated since the blog was a couple of months old. In fact, until I submitted a more recent image, Alexa had a screen shot of the blog that was well over a year old. (Yes, I know, that’s like saying “We don’t get enough spam here, please deluge us with as much as you can possibly manage”.) Maybe because of their age, the sites listed in some of these links are unrecognisable. In fact none of the blog links would be counted by Technorati, being over a year old, but then, it shows no links that Technorati counts (under 90 days.)

Let’s search for this blog on Google. Here, it’s weirder. There are few points of comparison between different Google results, if you repeat the search over a day or so. Maybe it’s just how Google treats blogs, but the post that comes up first is always the same one from a few months ago. Other posts can only be seen by asking for similar results, excluded the first time for being the same. Well, guess what, Google, every post is different. It’s a blog. Lots of the other Google results for the blog are bits of the RSS feed. I’d like to think that lots of people are devouring the RSS feed, but, unfortunately, these tend to be link farms. In fact, lots of obscure references to the blog on link-farm sites turn up on Google, most being complete news to us. Real human-created references to the blog don’t turn up as often as they actually happen.

I could go on to the point where I was boring even myself.

None of this would matter if getting seen and indexed correctly wasn’t crucial to getting any visitors. I know that indexing engines and search engines are bombarded with spammers trying every trick there is to get high on the first results page. The search engines have algorithms that are supposed to penalise sites and blogs that don’t match their definition of legitimate - density of keywords, number of inbound links, and so on. I believe that not only are these not working, they are often acting in exact reverse to their intentions.

Content from blogs gets scraped and put into blag sites that exist just to spew out other people’s content. Google then decides the original source site has “duplicate” content and downranks it. How do you stop this without stopping legitimate blogs from commenting on your posts?

Keywords in the metatags don’t match the keywords in the text? Well, duh, normal human beings aren’t thinking only of page rank. So they put keywords in their metatags then write content, without remembering to keep changing the metatags. Only people obsessed with search engine rankings do that and, of course, a fair percentage of them aren’t just bloggers or normal website owners.

It’s not just a question of getting visitors. Anyone who wants to bring in revenue from their site or blog by displaying adverts gets judged by these bizarre standards. Some schemes base what they send you on your Alexa rating, which is itself derived from Google’s well-nigh arbitrary page rank. If you’ve ever tried to have GoogleAds on a site, you’ll see how abstract the GoogleAds process is. In fact, visitors who think they’re helping you pay for the site, and so click a few times on your ads every time they visit, will get you disqualified. Ditto your rivals… (It seems as if you get automatically disqualified anyway, at the very point that you might actually receive any revenue.)

I know it must be well nigh impossible to filter the enormous volume of material in the Internet, especially in the face of the number of spammers there are. However, there must be better ways of doing it. I am always amazed when people find things here and comment or email us about them. How do they manage to find it?

So here is an unaccustomed prop for Technorati (unaccustomed for this blog, anyway, which has done its fair share of ranting about it). For all the irritating Technorati monster error messages and totally inconsistent service, Technorati remains the best performing indexing service that I’ve come across yet. The tags are really helpful when they work. You can still find an interesting read on someone’s first post. And Technorati isn’t yet totally under the sway of the giant players. The fabled Web 2.0 stuff really does still have something going for it.

Popularity: 29% [?]


Technorati … again

Posted on 28th January, 2007 by TW

Just when you may have thought Technorati was approaching normal behaviour, this happens:

Technorati Screenshot

I would say it is getting repetitive but that is, surely, stating the obvious. Despite there being a positive number of blog posts each day (chart) of the last 30 days, Technorati claims to have no posts. It is doing this an awful lot at the moment.

Before this blog creates the impression it just doesn’t like Technorati (which is close to the truth now), I just want to highlight the importance of an “open standard” for things like this. People writing blog posts have no real way of knowing if their comments are getting picked up by Technorati - and if you don’t appear in the three posts listed on that page, people are very unlikely to ever read your posts. Even Google is more open and honest about how it indexes pages.

On its own this would be bad enough, but it could be argued that blog creators will blog no matter who reads it. The bigger problem is for people searching with Technorati. The results you get from a search are almost randomly arbitrary. When you search, you have no idea if you are getting the latest posts, most relevant posts or anything. It is madness.

Now I actually don’t want the likes of Google to take over as the Blog search engine of choice (it has just as many flaws but different ones), however as Technorati seems to be spectacularly dropping the ball this may be inevitable.

So much for weblogs being the “great publishing revolution” that allows the masses to become journalists. Unless you get millions of links you won’t show up on Google, and your chances of showing up on Technorati seem to depend on you having a MySpace blog or some other covert whim. Does this need new software to solve it? Are search engines like IceRocket better? At the moment I don’t think so, but times change…

Popularity: 30% [?]


Technorati - tech support needed

Posted on 28th November, 2006 by Heather

My PC is often eccentric and Internet Explorer sometimes seems to have its own agenda, but the way Technorati has been behaving in the past few days defies even my capacity for denial.

I have managed to get it to behave normally about one in twenty tries. Almost all of the rest of the time it just dies on any search, giving the sort of useless error messages that might as well say “It’s broken. I have no idea either, sorry.” A few times, it manfully tried to give me search results but couldn’t sustain the effort beyond the first page and belatedly did the dying thing again.

Popularity: 23% [?]


Blog Search with Blogger

Posted on 30th September, 2006 by TW

Well, I used to think Technorati was bad at searching the blogosphere for posts and the like. Then I discovered Blogger’s search tool. Wow. It is bad. It really is that bad.

Now, in the past we here at WhyDontYou have ranted about the problems with blog indexes like Technorati (mainly that unless you are a major company or can public-relations yourself into a billion back links you will never show up), but Blogger seems to be drowned under the weight of spam.

Out of curiosity, I did a search for “Open University” Technology Web Design, as I know there is a fairly good blog on TT380 and wondered if there were any more.

Sadly, if I only had Blogger search I would never know. I wouldn’t even know the TT380 blog existed. At the time of my search, the first page of results was almost entirely spam. There are a couple of “real posts” (not exactly blogs though) followed by pages like this - http://bonjourarraonsons77327. blogspot.com/ 2006/09/ soccer-team-strategies-with-3-4-3.html - pure garbage which appears to be there for the sole reason of getting links to other spam sites indexed by Google.

It is insane. Blogger is powered by Google. Given the difficulty in getting first page results on Google, you would think this applied to Blogger as well. Obviously not.

Still, in my opinion Google is getting less and less relevant now. I find http://uk.yahoo.com produces more worthwhile search results, and pages I design get indexed there faster. Is this the dawn of a new Yahoo?

Anyway, on a more serious note - the TT380 site is excellent. Well worth a visit, and a shame it isn’t higher placed on the Search Engine.

Popularity: 19% [?]


Is Technorati Pointless?

Posted on 19th June, 2006 by TW

Well, this is an interesting one. Following on from the CoFaud problems mentioned in the last post here, where I was trying to search Google for a Windows service called CoFaud - but got no hits at all, I tried the search again today (after the web server reported Google had indexed the page) and still no hits.

Out of curiosity, I did a Technorati search for the word CoFaud. Now, I know that the blog entry was in Technorati as it was there at the top of the list.

However, the search produced no hits. Even when I searched through my favourite blogs only, I still got no hits on CoFaud. When I went to the Why Dont You Blog entry page though, the CoFaud article was there at the top. It seems that for some reason, Technorati is not showing the proper results. Either its search engine is broken or they are manipulating the results in an underhand manner.

Whichever it is, it is wrong and it goes a long way to reinforcing my belief that this obsession people seem to have with Web 2.0 and the “social net” is not a good thing.

Popularity: 14% [?]


CoFaud - Weird WinXP service

Posted on 18th June, 2006 by TW

I have been looking through the services tab on a Windows XP machine and I came across a reference to a service called “CoFaud” (currently disabled). There were no clues as to the path for this service or what its function was.

Out of curiosity (and obviously to make sure it wasn’t a trojan/virus etc), I ran a Google search - http://www.google.co.uk/search?hl=en&q=Cofaud+windows+service - but this returned exactly ZERO hits.

Even a search for the term “cofaud” only returns 30 hits - almost all names and nothing to do (as far as I could tell) with a potentially rogue Windows service. Does anyone have any idea what this service is?

Popularity: 15% [?]
