Pages tagged ""

ShopWiki, DoubleClick

Posted on 14th April, 2007 by TW

I was reading a post on Matt Mullenweg’s blog (PhotoMatt), titled “DoubleClick and Kevin Ryan”, which talks about Google having bought DoubleClick and about Kevin Ryan (co-founder) having moved on to a new start-up called ShopWiki. (I am not going to link to them though.)

Basically ShopWiki sends bots out to trawl the web and find products at the best price for you. You may think this is a wonderful thing, and it may well be. I am somewhat intrigued though as to why this site (notable for its abject lack of sellable items) has been getting hammered by the ShopWiki bot for most of the last two days (until it got the .htaccess treatment). As far as I can see, the bot ignored the robots.txt entry I put in for it (although my track record with this file is poor).
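Given my poor track record with robots.txt, it is worth checking that an entry really says what you think it says before blaming the bot. The following is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ShopWiki` user-agent token and the example.org URL are all assumptions for illustration, not the actual file or bot name seen here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the "ShopWiki" user-agent token is an assumption,
# not necessarily the name the real bot announces.
ROBOTS_TXT = """\
User-agent: ShopWiki
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-following crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; substitute a page from your own site.
    page = "http://example.org/contact/"
    print("ShopWiki allowed:", is_allowed(ROBOTS_TXT, "ShopWiki", page))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

This only shows what a well-behaved crawler ought to do with the file. robots.txt is a request, not an enforcement mechanism, which is why a bot that ignores it ends up getting the .htaccess treatment instead.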

I think the idea behind ShopWiki seems sound and I am sure it is a wonderful new idea. But I have to question the validity of the data it has collected, given the time and effort it spent looking round the contact pages here. In a spambot-like fashion, the ShopWiki bot seems to have concentrated on pages which made reference to emails and the like.

Time may modify my point of view, but for now I think of this as a Bad Shop.

Popularity: 35% [?]



National ID database

Posted on 27th February, 2007 by Heather

For this, go to the source and read it. No more secrets by Steve Boggan is a very very disturbing account of how “joined-up government” and national ID documents will mean the end of anything resembling privacy.

The blurb on the printed page says:

“Tony Blair insists his government is not building a Big Brother-style super-database. But all the talk of ‘perfectly sensible’ reforms and ‘transformational government’ masks a chilling assault on our privacy”

Brilliant article. It’s almost too much to take in and it might leave you feeling very depressed. But, really, if you live in the UK, you should read it.

Popularity: 44% [?]



Tagging the untagged

Posted on 25th February, 2007 by Heather

This blog has been going through some traumatic changes to its functionality.

It doesn’t look much different because most of the changes to its appearance were repellent in IE6 and earlier browsers, although they looked great in IE7, so it’s temporarily reverted to a look which it’s had for .. oh, I don’t know… all of about 6 weeks.

The main difference for visitors is that you can find much more by tags, as if the blog was trying to be a mini-Technorati. You can open the Tag Archive page and search on several tags. (These are even presented in a tag cloud.)

The big difference for us is that we can tag things by just clicking on them. Adding tags used to be like pulling teeth. It probably contributed to my blogs being unfeasibly long because I couldn’t bear to have to go through the tagging process again (like a graffiti artist with a sore arm?) So the outcome should be less blog words, more tag words. Or at least, more tag words.

However, we don’t have full tagging liftoff yet. The older posts either don’t have any tags or only have WordPress category tags. By older, I mean “up to January 2007”. So that’s nearly all of them. As the posts here go back over a year, it’s an arduous task to add tags and it’s getting done piecemeal. All the same, it should be possible to find most of what we have for most of the topics.

And by the way, why do people keep typing “none” into the search bit in the header? This is just bizarre. It’s not when people click on the search box without putting anything in, because that brings up a blank page.

Popularity: 30% [?]



Global greetings

Posted on 18th February, 2007 by Heather

Hola, Namasthe, Bonjour.

(Greetings in order of this blog’s ranking by nationality of visitors.)

This blog has finally decided to take Alexa seriously, so it’s greeting all you devoted Costa Rican, Indian and French readers. Please reveal yourselves to us.

Popularity: 13% [?]



Web traffic analysis=nonsense

Posted on 18th February, 2007 by Heather

What is it with search engines? And web-traffic rankers?

This blog has done enough whining about Technorati’s randomness. It’s well overdue to say that it’s probably working far more consistently and reliably than most of the facilities that claim to find Internet resources. (On a note that shows how shamelessly susceptible to flattery we are at whydontyou.org.uk - others please take note - it puts this blog at under 60,000 in the blogosphere which is almost beyond its wildest dreams.)

As an experiment, look up your blog in a few search engines. See if you can find any points in common between them.

Here’s one of my favourites, in that I suspect they actually must use a randomiser to generate web traffic numbers and links. Pick a blog and look at it in Technorati’s blog directory.

Go to the traffic rank bit and click on it. You will find yourself in the realm of Alexa. This will probably show you that the traffic isn’t really counted because the blog isn’t in the top 100,000. The daily page views are shown as a percent of people using the whole Internet, i.e., if the site isn’t in the top 100,000 sites in the world, you won’t get any figures. (If you come in at a newbie 5,195,452 - as this blog does - you may wonder if you are even reading the blog yourself.)

100,000 sounds like a lot of sites. However, if you consider global players (like Google or Microsoft), then big online retailers (like Tesco and Dell), then news sites (CNN, BBC) and national government information sites, you can see it must be pretty difficult to get into the club.

Beneath this blank chart, you will see “Percent of Internet users who visit this site” with a fraction of a percent if it’s anything like this one. (Maybe you’re Microsoft, in which case I guess it will be higher. Will check shortly.)
Then come “average number of pages visited”, “3 months average traffic rank” (risibly low) and average page views per visitor (1). (1 :-) Do you suspect that’s hard-coded?)

But the next bit is what creases me up for its randomness. People who visit this site come from (in order of most visits):

United States 40.0% (fair enough, the blog’s in English. Most English-speaking Internet users are in the USA)
France 20.0%
India 20.0%
Costa Rica 10.0%
United Kingdom 10.0%

Whydontyou.org.uk traffic rank in other countries: (These seem to be the same countries to me)
Costa Rica 46,349
India 167,900
France 170,280
United States 658,841
United Kingdom 703,872
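One hedged reading of the visitor shares above: every figure is a multiple of 10%, which is what you would expect if the percentages were worked out from a very small sample rather than invented outright. The sketch below computes the smallest number of visitors that could produce a given set of shares. It assumes the published percentages are exact rather than rounded, and nothing here is known about how Alexa actually calculates them.

```python
from functools import reduce
from math import gcd


def smallest_sample(percentages):
    """Smallest whole number of visitors that could yield these exact
    whole-number percentage shares (assumes no rounding was applied)."""
    # Each share must be a whole number of visitors out of n, so n must be
    # a multiple of 100 divided by the common divisor of all the shares.
    step = reduce(gcd, percentages, 100)
    return 100 // step


if __name__ == "__main__":
    # United States, France, India, Costa Rica, United Kingdom
    shares = [40, 20, 20, 10, 10]
    print(smallest_sample(shares))
```

For the figures listed here the answer is ten visitors (four, two, two, one and one). That would make a single Costa Rican reader worth 10% of the traffic.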

Come on…. To what do we owe this unprecedented popularity in Costa Rica? India? France? This is a UK-based blog. Most of the stuff we witter on about, apart from atheism and technology, relates to the UK.

It’s not that I don’t want to believe it. A central American flavour to its posts would make this blog much more interesting. I just think the figures have been made up.

OK, let’s look at the sites that link here, according to Alexa. These are so out of date, that it’s obviously not been updated since the blog was a couple of months old. In fact, until I submitted a more recent image, Alexa had a screen shot of the blog that was well over a year old. (Yes, I know, that’s like saying “We don’t get enough spam here, please deluge us with as much as you can possibly manage”.) Maybe because of their age, the sites listed in some of these links are unrecognisable. In fact none of the blog links would be counted by Technorati, being over a year old, but then, it shows no links that Technorati counts (under 90 days.)

Let’s search for this blog on Google. Here, it’s weirder. There are few points of comparison between different Google results, if you repeat the search over a day or so. Maybe it’s just how Google treats blogs, but the post that comes up first is always the same one from a few months ago. Other posts can only be seen by asking for similar results, excluded the first time for being the same. Well, guess what Google, every post is different. It’s a blog. Lots of the other Google results for the blog are bits of the RSS feed. I’d like to think that lots of people are devouring the RSS feed, but, unfortunately, these tend to be link farms. In fact, lots of obscure references to the blog on linkfarm sites turn up on Google, most being complete news to us. Real human-created references to the blog don’t turn up as often as they actually happen.

I could go on to the point where I was boring even myself.

None of this would matter if getting seen and indexed correctly wasn’t crucial to getting any visitors. I know that indexing engines and search engines are bombarded with spammers trying every trick there is to get high on the first results page. The search engines have algorithms that are supposed to penalise sites and blogs that don’t match their definition of legitimate - density of keywords, number of inbound links, and so on. I believe that not only are these not working, they are often acting in exact reverse to their intentions.

Content from blogs gets scraped and put into blag sites that exist just to spew out other people’s content. Google then decides the original source site has “duplicate” content and downranks it. How do you stop this without stopping legitimate blogs from commenting on your posts?
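To see why this is hard, here is a minimal sketch of one textbook way to spot near-duplicate text: compare overlapping word triples ("shingles") using Jaccard similarity. This is an assumption for illustration only. How Google really detects duplicate content is not public, and the sample sentences are made up.

```python
def shingles(text, size=3):
    """Set of overlapping word sequences of the given size, lower-cased."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b, size=3):
    """Share of shingles two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0  # texts too short to compare
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Made-up example texts.
    original = ("search engines are supposed to penalise spam "
                "but often punish the original author instead")
    scraped = original  # a scraper site republishing the post wholesale
    comment = ("I agree that search engines often punish "
               "the original author instead of the scraper")

    print("original vs scraped:", jaccard(original, scraped))
    print("original vs comment:", jaccard(original, comment))
    # The score is the same whichever text is passed first.
    print("symmetric:", jaccard(original, scraped) == jaccard(scraped, original))
```

A wholesale copy scores as identical, while a blog that quotes a line and adds its own comment scores much lower, so telling scrapers from commenters is the easy part. The score is symmetric, though, so on its own it says nothing about which copy came first, and that is the part that hurts the original author.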

Keywords in the metatags don’t match the keywords in the text? Well, duh, normal human beings aren’t thinking only of page rank. So they put keywords in their metatags then write content, without remembering to keep changing the metatags. Only people obsessed with search engine rankings do that and, of course, a fair percentage of them aren’t just bloggers or normal website owners.

It’s not just a question of getting visitors. Anyone who wants to bring in revenue from their site or blog by displaying adverts gets judged by these bizarre standards. Some schemes base what they send you on your Alexa rating, which is itself derived from Google’s well-nigh arbitrary page rank. If you’ve ever tried to have GoogleAds on a site, you’ll see how abstract the GoogleAds process is. In fact, visitors who think they’re helping you pay for the site, and so click a few times on your ads every time they visit, will get you disqualified. Ditto your rivals… (It seems as if you get automatically disqualified anyway, at the very point that you might actually receive any revenue.)

I know it must be well nigh impossible to filter the enormous volume of material in the Internet, especially in the face of the number of spammers there are. However, there must be better ways of doing it. I am always amazed when people find things here and comment or email us about them. How do they manage to find it?

So here is an unaccustomed prop for Technorati (unaccustomed for this blog, anyway, which has done its fair share of ranting about it). For all the irritating Technorati monster error messages and totally inconsistent service, Technorati remains the best performing indexing service that I’ve come across yet. The tags are really helpful when they work. You can still find an interesting read on someone’s first post. And Technorati isn’t yet totally under the sway of the giant players. The fabled Web 2.0 stuff really does still have something going for it.

Popularity: 29% [?]



Technorati … again

Posted on 28th January, 2007 by TW

Just when you may have thought Technorati was approaching normal behaviour, this happens:

Technorati Screenshot

I would say it is getting repetitive but that is, surely, stating the obvious. Despite there being a positive number of blog posts each day (chart) of the last 30 days, Technorati claims to have no posts. It is doing this an awful lot at the moment.

Before this blog creates the impression it just doesn’t like Technorati (which is close to the truth now), I just want to highlight the importance of an “open standard” for things like this. People writing blog posts have no real way of knowing if their comments are getting picked up by Technorati - and if you don’t appear in the three posts listed on that page, people are very unlikely to ever read your posts. Even Google is more open and honest about how it indexes pages.

On its own this would be bad enough but it could be argued that blog creators will blog no matter who reads it. The bigger problem is for people searching with Technorati. The results you get from a search are almost random. When you search, you have no idea if you are getting the latest posts, most relevant posts or anything. It is madness.

Now I actually don’t want the likes of Google to take over as the Blog search engine of choice (it has just as many flaws but different ones), however as Technorati seems to be spectacularly dropping the ball this may be inevitable.

So much for the weblogs being the “great publishing revolution” that allows the masses to become journalists. Unless you get millions of links you won’t show up on Google, and your chances of showing up on Technorati seem to depend on you having a MySpace blog or some other covert whim. Does this need new software to solve it? Are search engines like IceRocket better? At the moment I don’t think so, but times change…

Popularity: 30% [?]



New Code Required

Posted on 17th January, 2007 by TW

It strikes me more and more that Technorati is not really “cutting the mustard” with regards to how it aggregates blog posts and how it tries to represent the blogosphere. This is not a bad thing as such - it is more a case that Technorati seem to have bitten off a lot more than they can chew and it certainly is (as previously mentioned) time for a new site to take over.

Once upon a time Yahoo was the dominant search engine on the Internet, then after a while it bogged itself down and people migrated to the sleek newcomer of Google. Can Google do the same with blogs? Personally I hope not, but then I feel that Google is starting to fall behind in the search engine stakes (poor quality search results for example), so they may be better off concentrating on that more than anything else.

As an example of Technorati’s oddness, while I was trying to see if it was ever going to realise new posts had been made here, I was refreshing the page about this blog and I noticed the “posts per day” in the corner. The really odd thing was, each refresh made it alternate between two graphs that bore almost no relation to reality (as well as the most recent posts changing to be either days or hours old). Below you can see the first and second versions. Do they look the same? (I am aware the scales are different).

Version 1 of the Posts Per Day
Version 2 of the Posts Per Day

For example, how many posts were made on 14 Jan? (Hint: 2.) How many were made on 17 Jan? (Hint: not 19 yet!)

Will someone PLEASE come up with a site which does it better than Technorati?

Popularity: 15% [?]



Technorati losing ground?

Posted on 1st January, 2007 by TW

It seems the Why Don’t You…? opinion that Technorati is mad is more widely shared than we had previously realised. (“Technorati Suffering?,” “Technorati - Tech Support Needed” and “Technorati Oddities - Again” are three recent examples.)

Reading through the Register’s RSS feeds I found this today “Google overtakes Technorati,” in which the Register outlines research from HitWise inc. This research seems to show that Google’s blog search is vastly outstripping Technorati in results returned and usability. The article also shows screen shots from a search for Dr.Who in Google and Technorati. In the article, Google returns over 40,000 hits, while Technorati shows zero.

Just to confirm (and in the interests of scientific repeatability), we have run the tests again here with almost identical results. You can see for yourself: http://technorati.com/search/dr.who or http://www.google.co.uk/blogsearch?hl=en&q=dr.who. (It may change now this blog exists though!)

In the interests of fair play, we must point out that with different search terms you get different results (for example “Richard Dawkins” produces almost the same number in both engines). However the Google interface is much faster than Technorati and seems (this is currently totally unconfirmed) to be more up to date.

Can Technorati survive this? Time will tell. (Will it ever be possible to use “Google Tags?”)

Popularity: 16% [?]



Technorati - tech support needed

Posted on 28th November, 2006 by Heather

My PC is often eccentric and Internet Explorer sometimes seems to have its own agenda, but the way Technorati has been behaving in the past few days defies even my capacity for denial.

I have managed to get it to behave normally about one in twenty tries. Almost all of the rest of the time it just dies on any search, giving the sort of useless error messages that might as well say “It’s broken. I have no idea either, sorry.” A few times, it manfully tried to give me search results but couldn’t sustain the effort beyond the first page and belatedly did the dying thing again.

Popularity: 23% [?]



Search engine complaint

Posted on 28th November, 2006 by Heather

This blog complained in January 2006 about how bad search engines are. This post will raise that one by about a grand. If anything, they seem to be getting worse.

I had offered to try to find someone’s email address online. Assuming the person was too canny to put their real name (to avoid spam) but might give some signs of their presence in forums and so on, I tried various search methods. The first thing that I discovered was that there seem to be no legitimate directories in which you can find people. Where there used to be White pages and People finders, there are basically none worth using. I can see that spam has made people unwilling to leave their email addresses ripe for the plucking but this seems ridiculous.

I did straight searches for the name (quite an unusual one) and found one forum post containing this name in Google. I continued searching using other search engines and what you would assume to be more productive versions of the name, (such as just the first initial and surname) and actually found that even the forum post that I had found the first time wasn’t brought up by any other searches.

So experimentally, I tried searching for other names, including the name of someone who I know was found through the Internet by an old school friend a couple of years ago, when the internet was clearly a much more naive and open place. No results. I then tried searching for a name that had appeared in this blog. I found this blog but only a cached version. I did not find the article to which the blog had linked, although this is still available online.

So, to test Google, I searched for the headline of the article to which I had referred. I enclosed the text in quotes to stop it from bringing up its first choices - a string of web addresses where any of the words appeared anywhere in any order. (The blog article had come up on page 2.) No results, this time, except for where the headline was quoted in this blog - cache version only.

Not believing my eyes, given that I had the article open in front of me in Internet Explorer, I assume that the site for which I was searching is just not indexed by Google. It is a local newspaper site for a pretty sizeable UK city. It gets public funding. Can it really have been so inept in its SEO practice that Google can’t see it? Are googlebots so inadequate that they can’t see a site which supplies many GB of text?

The article is nearly a year old. I thought that maybe Google feels impelled to cache anything this old to save its search time. However, this doesn’t explain why most of the presented results went back 7 years and came from very obscure rural journals, when I put a couple of phrases from the headline in quotes.

There are lots of sources online that claim to have some idea about the logic that underlies Google (et al)’s search methods. Bullsh. There is no logic to it, as far as I can see, after empirical testing.

Tragically, search engines are not just getting poorer at delivering meaningful results, they are increasingly clones of each other, so that you get the same garbage, in the same order, from half a dozen. There must be a solution?

Popularity: 25% [?]

