ShopWiki, DoubleClick

I was reading a post on Matt Mullenweg’s blog (PhotoMatt), titled “DoubleClick and Kevin Ryan”, which talks about Google having bought DoubleClick and Kevin Ryan (co-founder) having moved on to a new start-up called ShopWiki. (I am not going to link to them though.)

Basically, ShopWiki sends bots out to trawl the web and find products at the best price for you. You may think this is a wonderful thing, and it may well be. I am somewhat intrigued, though, as to why this site (notable for its abject lack of sellable items) has been getting hammered by the ShopWiki bot for most of the last two days (until it got the .htaccess treatment). As far as I can see, the bot ignored the robots.txt entry I put in for it (although my track record with this file is poor).
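For anyone wondering what a robots.txt entry is supposed to achieve, here is a rough Python sketch of the check a well-behaved crawler makes before fetching a page. The user-agent token "ShopWiki" and the example.org URLs are assumptions for illustration only; I don't know what name the real bot announces, and this is not the file actually used on this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot, allow everyone else.
# "ShopWiki" is an assumed user-agent token, not a confirmed one.
ROBOTS_TXT = """\
User-agent: ShopWiki
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ShopWiki", "SomeOtherBot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "http://example.org/contact")
        print(agent, "may fetch /contact:", allowed)
```

The catch is that robots.txt is only a polite request. A bot that never runs a check like this carries on regardless, which is why the .htaccess treatment ends up being the real answer.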

I think the idea behind ShopWiki seems sound and I am sure it is a wonderful new idea. But I have to question the validity of the data it has collected, given the time and effort it spent looking round the contact pages here. In a spambot-like fashion, the ShopWiki bot seems to have concentrated on pages which made reference to emails and the like.

Time may modify my point of view, but for now I think of this as a Bad Shop.

National ID database

For this, go to the source and read it. “No more secrets” by Steve Boggan is a very, very disturbing account of how “joined-up government” and national ID documents will mean the end of anything resembling privacy.

The blurb on the printed page says:

“Tony Blair insists his government is not building a Big Brother-style super-database. But all the talk of ‘perfectly sensible’ reforms and ‘transformational government’ masks a chilling assault on our privacy”

Brilliant article. It’s almost too much to take in and it might leave you feeling very depressed. But, really, if you live in the UK, you should read it.

Tagging the untagged

This blog has been going through some traumatic changes to its functionality.

It doesn’t look much different because most of the changes to its appearance were repellent in IE6 and earlier browsers, although they looked great in IE7, so it’s temporarily reverted to a look which it’s had for .. oh, I don’t know… all of about 6 weeks.

The main difference for visitors is that you can find much more by tags, as if the blog was trying to be a mini-Technorati. You can open the Tag Archive page and search on several tags. (These are even presented in a tag cloud.)

The big difference for us is that we can tag things by just clicking on them. Adding tags used to be like pulling teeth. It probably contributed to my blogs being unfeasibly long because I couldn’t bear to have to go through the tagging process again (like a graffiti artist with a sore arm?). So the outcome should be fewer blog words, more tag words. Or at least, more tag words.

However, we don’t have full tagging liftoff yet. The older posts either don’t have any tags or only have WordPress category tags. By older, I mean “up to January 2007”. So that’s nearly all of them. As the posts here go back over a year, it’s an arduous task to add tags and it’s getting done piecemeal. All the same, it should be possible to find most of what we have for most of the topics.

And by the way, why do people keep typing “none” into the search bit in the header? This is just bizarre. It’s not when people click on the search box without putting anything in, because that brings up a blank page.

Global greetings

Hola, Namasthe, Bonjour.

(Greetings in order of this blog’s ranking by nationality of visitors.)

This blog has finally decided to take Alexa seriously, so it’s greeting all you devoted Costa Rican, Indian and French readers. Please reveal yourselves to us.

Web traffic analysis=nonsense

What is it with search engines and web-traffic rankers?

This blog has done enough whining about Technorati’s randomness. It’s well overdue to say that it’s probably working far more consistently and reliably than most of the facilities that claim to find Internet resources. (On a note that shows how shamelessly susceptible to flattery we are at whydontyou.org.uk – others please take note – it puts this blog at under 60,000 in the blogosphere which is almost beyond its wildest dreams.)

As an experiment, look up your blog in a few search engines. See if you can find any points in common between them.

Here’s one of my favourites, in that I suspect they must actually use a randomiser to generate web traffic numbers and links. Pick a blog and look at it in Technorati’s blog directory.

Go to the traffic rank bit and click on it. You will find yourself in the realm of Alexa. This will probably show you that the traffic isn’t really counted because the blog isn’t in the top 100,000. The daily page views are shown as a percentage of people using the whole Internet, i.e., if the site isn’t in the top 100,000 sites in the world, you won’t get any figures. (If you come in at a newbie 5,195,452, as this blog does, you may wonder if you are even reading the blog yourself.)

100,000 sounds like a lot of sites. However, if you consider global players (like Google or Microsoft), then big online retailers (like Tescos and Dell), then news sites (CNN, BBC) and national government information sites, you can see it must be pretty difficult to get into the club.

Beneath this blank chart, you will see “Percent of Internet users who visit this site” with a fraction of a percent if it’s anything like this one. (Maybe you’re Microsoft, in which case I guess it will be higher. Will check shortly.)
Then come “average number of pages visited”, “3 months average traffic rank” (risibly low) and “average page views per visitor” (1 🙂 Do you suspect that’s hard-coded?)

But the next bit is what creases me up for its randomness. People who visit this site come from (in order of most visits):

United States 40.0% (fair enough, the blog’s in English. Most English-speaking Internet users are in the USA)
France 20.0%
India 20.0%
Costa Rica 10.0%
United Kingdom 10.0%
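
One mundane reading of those shares, offered as a guess rather than anything Alexa documents: percentages that all land on multiples of 10% are exactly what you would get from a sample of about ten visitors. A small Python sketch of the arithmetic follows; the function names are made up for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# The shares exactly as reported on the Alexa page quoted above.
REPORTED_SHARES = {
    "United States": "40.0",
    "France": "20.0",
    "India": "20.0",
    "Costa Rica": "10.0",
    "United Kingdom": "10.0",
}


def smallest_sample(percentages):
    """Smallest visitor count for which every share is a whole number of people."""
    denominators = [(Fraction(p) / 100).denominator for p in percentages]
    # Least common multiple of all the denominators.
    return reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)


def visitors_per_country(shares, sample):
    """Head count per country implied by the shares for a given sample size."""
    return {country: int(Fraction(p) / 100 * sample) for country, p in shares.items()}


if __name__ == "__main__":
    n = smallest_sample(REPORTED_SHARES.values())
    print("Smallest consistent sample:", n)
    print(visitors_per_country(REPORTED_SHARES, n))
```

If the sample really is that small, a single reader in Costa Rica is enough to account for the whole 10%. That is still only an assumption about how the numbers are produced.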

Whydontyou.org.uk traffic rank in other countries: (These seem to be the same countries to me)
Costa Rica 46,349
India 167,900
France 170,280
United States 658,841
United Kingdom 703,872

Come on…. To what do we owe this unprecedented popularity in Costa Rica? India? France? This is a UK-based blog. Most of the stuff we witter on about, apart from atheism and technology, relates to the UK.

It’s not that I don’t want to believe it. A Central American flavour to its posts would make this blog much more interesting. I just think the figures have been made up.

OK, let’s look at the sites that link here, according to Alexa. These are so out of date that the list has obviously not been updated since the blog was a couple of months old. In fact, until I submitted a more recent image, Alexa had a screenshot of the blog that was well over a year old. (Yes, I know, that’s like saying “We don’t get enough spam here, please deluge us with as much as you can possibly manage”.) Maybe because of their age, the sites listed in some of these links are unrecognisable. In fact, none of the blog links would be counted by Technorati, being over a year old, but then Alexa shows none of the links that Technorati does count (those under 90 days old).

Let’s search for this blog on Google. Here, it’s weirder. There are few points of comparison between different Google results if you repeat the search over a day or so. Maybe it’s just how Google treats blogs, but the post that comes up first is always the same one from a few months ago. Other posts can only be seen by asking for similar results, which were excluded the first time for being the same. Well, guess what, Google: every post is different. It’s a blog. Lots of the other Google results for the blog are bits of the RSS feed. I’d like to think that lots of people are devouring the RSS feed but, unfortunately, these tend to be link farms. In fact, lots of obscure references to the blog on link-farm sites turn up on Google, most being complete news to us. Real human-created references to the blog don’t turn up as often as they actually happen.

I could go on to the point where I was boring even myself.

None of this would matter if getting seen and indexed correctly wasn’t crucial to getting any visitors. I know that indexing engines and search engines are bombarded with spammers trying every trick there is to get high on the first results page. The search engines have algorithms that are supposed to penalise sites and blogs that don’t match their definition of legitimate: density of keywords, number of inbound links, and so on. I believe that not only are these not working, they are often acting in exact reverse to their intentions.

Content from blogs gets scraped and put into blag sites that exist just to spew out other people’s content. Google then decides the original source site has “duplicate” content and downranks it. How do you stop this without stopping legitimate blogs from commenting on your posts?

Keywords in the metatags don’t match the keywords in the text? Well, duh, normal human beings aren’t thinking only of page rank. So they put keywords in their metatags then write content, without remembering to keep changing the metatags. Only people obsessed with search engine rankings do that and, of course, a fair percentage of them aren’t just bloggers or normal website owners.

It’s not just a question of getting visitors. Anyone who wants to bring in revenue from their site or blog by displaying adverts gets judged by these bizarre standards. Some schemes base what they send you on your Alexa rating, which is itself derived from Google’s well-nigh arbitrary page rank. If you’ve ever tried to have GoogleAds on a site, you’ll have seen how abstract the GoogleAds process is. In fact, visitors who think they’re helping you pay for the site, and so click a few times on your ads every time they visit, will get you disqualified. Ditto your rivals... (It seems as if you get automatically disqualified anyway, at the very point that you might actually receive any revenue.)

I know it must be well-nigh impossible to filter the enormous volume of material on the Internet, especially in the face of the number of spammers there are. However, there must be better ways of doing it. I am always amazed when people find things here and comment or email us about them. How do they manage to find it?

So here is an unaccustomed prop for Technorati (unaccustomed for this blog, anyway, which has done its fair share of ranting about it). For all the irritating Technorati monster error messages and totally inconsistent service, Technorati remains the best-performing indexing service that I’ve come across yet. The tags are really helpful when they work. You can still find an interesting read on someone’s first post. And Technorati isn’t yet totally under the sway of the giant players. The fabled Web 2.0 stuff really does still have something going for it.

New Code Required

It strikes me more and more that Technorati is not really “cutting the mustard” with regard to how it aggregates blog posts and how it tries to represent the blogosphere. This is not a bad thing as such. It is more a case that Technorati seem to have bitten off a lot more than they can chew and it certainly is (as previously mentioned) time for a new site to take over.

Once upon a time Yahoo was the dominant search engine on the Internet, then after a while it bogged itself down and people migrated to the sleek newcomer of Google. Can Google do the same with blogs? Personally I hope not, but then I feel that Google is starting to fall behind in the search engine stakes (poor quality search results for example), so they may be better off concentrating on that more than anything else.

As an example of Technorati’s oddness, while I was trying to see if it was ever going to realise new posts had been made here, I was refreshing the page about this blog and I noticed the “posts per day” in the corner. The really odd thing was, each refresh made it alternate between two graphs that bore almost no relation to reality (as well as the most recent posts changing to be either days or hours old). Below you can see the first and second versions. Do they look the same? (I am aware the scales are different.)

[Image: Version 1 of the Posts Per Day] [Image: Version 2 of the Posts Per Day]

For example, how many posts were made on 14 Jan? (Hint: 2.) How many were made on 17 Jan? (Hint: not 19 yet!)

Will someone PLEASE come up with a site which does it better than Technorati?