El Reg has an interesting article on Google and how its ranking system is, effectively, a black art. For a company whose motto is “don’t be evil”, it is bizarre how closed it keeps its methods – surely shining the light of openness on how they work would be the “good” thing to do. While it might increase the risk of black hat SEO working, surely it would make life easier for everyone else as well. Why does the Google Search ranking algorithm have to be a secret? Read more at: http://www.theregister.co.uk/2009/11/19/google_hand_of_god/
Moving away from the Jamie Whyte article and the inevitable Christian wackeroony response brings me to something that annoys me just as much as the blatant idiocy of the religious.
Once upon a time, the BBC was a bastion of the English language and a resource you could look on as “reliable and trustworthy.” At some point in the recent past, all this changed. Things have been bad for a while, and lately they have reached a new low in the erratic, random headlines they use for articles. On the whole, you wouldn’t care what the headline is, as you can read the article to find out more – however, on the Internet the title is the link. It is the hook that gets you to read the article, and it is often all people will read, in the belief that they can get the news one sentence at a time. Sadly, in this task the BBC fails massively.
Take these examples from today’s news articles. Have a look and see what you think the article is going to say, then visit the news item and see if it matches:
- Fewer teachers aim for principals (link)
- Brown makes justice deadline call (link)
- England ‘most crowded in Europe’ (link)
- Boys jailed for tram stop killing (link)
- Cancer woman stranded by XL (link)
- Review ordered into cancer move (link)
- Man tells police of woman’s body (link)
- Father’s rape quash bid rejected (link)
Now, admittedly, some are easier to work out than others, and for most you can get a good idea after a few moments’ thought about what they are trying to say.
But that is my point.
These are headlines so desperate to get keywords in (and possibly do a bit of SEO for the BBC) that they sacrifice readability and legibility.
Why on Earth has the BBC stooped this low? Are people in the UK so ignorant, uneducated and time-short that they need this sort of nonsense?
What is it with search engines and web-traffic rankers?
This blog has done enough whining about Technorati’s randomness. It’s well overdue to admit that it probably works far more consistently and reliably than most of the services that claim to find Internet resources. (On a note that shows how shamelessly susceptible to flattery we are at whydontyou.org.uk – others please take note – it ranks this blog under 60,000 in the blogosphere, which is almost beyond its wildest dreams.)
As an experiment, look up your blog in a few search engines. See if you can find any points in common between them.
Here’s one of my favourites, since I suspect they must actually use a randomiser to generate web traffic numbers and links. Pick a blog and look at it in Technorati’s blog directory.
Go to the traffic rank bit and click on it. You will find yourself in the realm of Alexa. This will probably show you that the traffic isn’t really counted because the blog isn’t in the top 100,000. The daily page views are shown as a percentage of everyone using the Internet, i.e., if the site isn’t in the top 100,000 sites in the world, you won’t get any figures. (If you come in at a newbie 5,195,452 – as this blog does – you may wonder whether even you are reading the blog.)
100,000 sounds like a lot of sites. However, if you consider the global players (like Google or Microsoft), then the big online retailers (like Tescos and Dell), then news sites (CNN, BBC) and national government information sites, you can see it must be pretty difficult to get into the club.
Beneath this blank chart, you will see “Percent of Internet users who visit this site”, with a fraction of a percent if it’s anything like this one. (Maybe you’re Microsoft, in which case I guess it will be higher. Will check shortly.)
Then come “average number of pages visited”, “3 months average traffic rank” (risibly low) and average page views per visitor: 1. (Just 1 🙂 Do you suspect that’s hard-coded?)
But the next bit is what creases me up for its randomness. People who visit this site come from (in order of most visits):
United States 40.0% (fair enough, the blog’s in English. Most English-speaking Internet users are in the USA)
Costa Rica 10.0%
United Kingdom 10.0%
Whydontyou.org.uk traffic rank in other countries: (These seem to be the same countries to me)
Costa Rica 46,349
United States 658,841
United Kingdom 703,872
Come on… To what do we owe this unprecedented popularity in Costa Rica? India? France? This is a UK-based blog. Most of the stuff we witter on about, apart from atheism and technology, relates to the UK.
It’s not that I don’t want to believe it. A Central American flavour to its posts would make this blog much more interesting. I just think the figures have been made up.
OK, let’s look at the sites that link here, according to Alexa. These are so out of date that the list has obviously not been updated since the blog was a couple of months old. In fact, until I submitted a more recent image, Alexa had a screenshot of the blog that was well over a year old. (Yes, I know, that’s like saying “We don’t get enough spam here, please deluge us with as much as you can possibly manage”.) Maybe because of their age, the sites behind some of these links are unrecognisable. In fact, none of the blog links would be counted by Technorati, since they are over a year old; then again, Alexa shows none of the links that Technorati does count (those under 90 days old).
Let’s search for this blog on Google. Here, it’s weirder. Repeat the search over a day or so and the results have few points in common. Maybe it’s just how Google treats blogs, but the post that comes up first is always the same one from a few months ago. Other posts only appear if you ask Google to repeat the search with the results it omitted the first time for being too similar. Well, guess what, Google: every post is different. It’s a blog. Lots of the other Google results for the blog are bits of the RSS feed. I’d like to think that lots of people are devouring the RSS feed, but, unfortunately, these tend to be link farms. In fact, lots of obscure references to the blog on link-farm sites turn up on Google, most of them complete news to us. Genuine, human-created references to the blog turn up far less often than they actually occur.
I could go on to the point where I was boring even myself.
None of this would matter if getting seen and indexed correctly weren’t crucial to getting any visitors. I know that indexing engines and search engines are bombarded with spammers trying every trick there is to get high on the first results page. The search engines have algorithms that are supposed to penalise sites and blogs that don’t match their definition of legitimate – density of keywords, number of inbound links, and so on. I believe that not only are these not working, they often do the exact opposite of what was intended.
Content from blogs gets scraped and put into scraper sites that exist just to spew out other people’s content. Google then decides the original source site has “duplicate” content and downranks it. How do you stop this without stopping legitimate blogs from commenting on your posts?
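Google doesn’t publish how it detects duplicate content, so what follows is only a sketch of one textbook technique (word shingling with Jaccard similarity), not Google’s actual method. It shows why a scraped copy and the original look the same to a machine: the comparison sees near-identical text and has no notion of which page came first. The function names, the shingle length of 3 and the 0.8 threshold are all assumptions for illustration.

```python
# A sketch of near-duplicate detection using word shingles and Jaccard
# similarity. This is a textbook technique, NOT Google's (unpublished) method;
# the shingle length and the 0.8 threshold are arbitrary assumptions.

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_duplicate(page_a: str, page_b: str, threshold: float = 0.8) -> bool:
    """Flag two pages as near-duplicates if their shingle sets overlap heavily.

    The comparison is symmetric: nothing here says which page is the
    original and which is the scraped copy.
    """
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold


original = ("Content from blogs gets scraped and put into sites that exist "
            "just to spew out other people's content")
scraped = original + " Buy pills"  # a scraper appends its own junk
different = "A completely different post about BBC headlines and Technorati rankings"

print(looks_duplicate(original, scraped))    # True
print(looks_duplicate(scraped, original))    # True: no idea which came first
print(looks_duplicate(original, different))  # False
```

Because the similarity score is symmetric, a detector like this can flag the pair but cannot, on its own, tell the victim from the scraper; that has to come from other signals, which is exactly where the blog’s complaint bites.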
Keywords in the metatags don’t match the keywords in the text? Well, duh, normal human beings aren’t thinking only of page rank. So they put keywords in their metatags, then write content, without remembering to keep changing the metatags. Only people obsessed with search engine rankings keep them in step and, of course, a fair percentage of those people aren’t just bloggers or ordinary website owners.
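The search engines don’t publish their formulas either, so this is only a hypothetical sketch of the two measures mentioned here: keyword density (how often a word appears as a share of all words) and how many meta-tag keywords still turn up in the body text. The function names and the simple word-splitting rule are assumptions for illustration; no real engine is known to use exactly this.

```python
import re
from collections import Counter

# Hypothetical sketch: these functions do not reflect any real search
# engine's published algorithm. They only illustrate the two measures above.

def words(text: str) -> list[str]:
    """Lowercase the text and split it into simple alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_density(text: str, keyword: str) -> float:
    """Occurrences of a single-word keyword as a fraction of all words."""
    tokens = words(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[keyword.lower()] / len(tokens)


def meta_overlap(meta_keywords: str, body: str) -> float:
    """Fraction of comma-separated meta keywords that appear in the body text."""
    keywords = [k.strip() for k in meta_keywords.split(",") if k.strip()]
    if not keywords:
        return 0.0
    body_words = set(words(body))
    hits = 0
    for k in keywords:
        parts = words(k)
        # A multi-word keyword counts only if every one of its words is in the body.
        if parts and all(w in body_words for w in parts):
            hits += 1
    return hits / len(keywords)


meta = "atheism, technology, uk politics"  # set once, never updated
post = "Today the BBC headlines annoyed me again. The BBC should write better headlines."

print(keyword_density(post, "bbc"))  # 2 of 13 words
print(meta_overlap(meta, post))      # 0.0: the post moved on, the tags did not
```

An ordinary blogger who set the metatags once and then wrote about whatever annoyed them that day scores zero on the overlap check, which is the mismatch described above.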
It’s not just a question of getting visitors. Anyone who wants to bring in revenue from their site or blog by displaying adverts gets judged by these bizarre standards. Some schemes base what they offer you on your Alexa rating or on Google’s well-nigh arbitrary page rank. If you’ve ever tried to have GoogleAds on a site, you’ll see how opaque the GoogleAds process is. In fact, visitors who think they’re helping you pay for the site by clicking on your ads a few times every time they visit will get you disqualified. Ditto your rivals… (It seems as if you get automatically disqualified anyway, at the very point that you might actually receive any revenue.)
I know it must be well-nigh impossible to filter the enormous volume of material on the Internet, especially given the number of spammers out there. However, there must be better ways of doing it. I am always amazed when people find things here and comment or email us about them. How do they manage to find it?
So here is an unaccustomed prop for Technorati (unaccustomed for this blog, anyway, which has done its fair share of ranting about it). For all the irritating Technorati monster error messages and totally inconsistent service, Technorati remains the best-performing indexing service that I’ve come across yet. The tags are really helpful when they work. You can still find an interesting read in someone’s first post. And Technorati isn’t yet totally under the sway of the giant players. The fabled Web 2.0 stuff really does still have something going for it.