Spam Avalanche

I am not sure if it was a special event, but for some reason on 04 Feb 2009 this blog was inundated with spam comments.

Now, as any blogger will know, blogs get spam comments. We get a fair few, most of which (99.85% if you believe the Akismet stats) are caught by the anti-spam filter. It is, rightly or wrongly, one of the prices you pay for having a blog. It is slightly amusing that around a third of the spam comments advertise spam-commenting systems, but most are tediously repetitive. Every now and then Heather gets it into her head to read, and subsequently rant about, some of them, but generally we are happy to ignore them.

However, on Wednesday we were flooded with spam comments. According to the Akismet stats (which broadly match my recollection), we had 3.5 times as many spam comments as the previous peak (09 Jan 09) and a massive 16 times the average. We had more spam in that 24-hour period than we’d had in the whole of August and September last year. Fortunately Akismet caught the lot, but it was bizarre. In the time it took to click on “delete all spam now”, another 50-odd messages arrived. Equally odd, few were “normal” spam advertising something; most were just strings of random letters and URLs pointing to random-letter domains. I really have no idea what the spammers hoped to achieve, unless it was an attempt to overwhelm Akismet worldwide…

Anyway, the main point is that the volume of spam meant there was no way we were going to read through it to see if any legitimate messages had been trapped. In the massively unlikely event that you had a message deleted, this is why.

If anyone knows why 4 Feb was World Spam Day, please let me know.

Stupidity and lies for Jesus

Always willing to flog a dead horse, I’ve stumbled across more mind-bending nonsense on the crazy-fest that is Yahoo! Answers. As I mentioned previously, it (*) is a haven for the weird and wonderful ideas people can come up with. Sadly, in the best of Web 2.0 traditions, idiocy, bad education and lies rise to the surface while real education gets drowned under the stupidity of the commons. I honestly think that if a good answer ever turned up, it would be buried under the idiocy (and get so many thumbs down) that it would quickly flee for its life.

The most recent idiocy to draw my attention is a month-old question titled “Do fossils of now extinct creatures such as dinosaurs prove evolution?” (see original)

At first sight this looks like a legitimate question. It is the sort of question you would expect inquisitive school children to ask. It gives the chance for a well thought out answer about the nature of fossils, what evolutionary theory is and how scientific proofs work. You can imagine it being the sort of question a teacher would set a class to see what research they carry out. Well, Toutatis forbid they type the question into a search engine. The results are shocking. To an otherwise ignorant person seeking to improve their education, this search would be disastrous. Anyway, back to the question.

After an innocent start (obviously to trick the unwary), the question continues:

The fact that dinosaurs once lived and are now extinct is no proof of evolution. Such fossils merely show us that certain species once living were destroyed and became extinct. Theorists have been able to reach no general agreement on the cause or causes of extinction. The theories on this subject are numerous and sometimes very imaginative. Since most fossils are found in sedimentary rocks and show signs of catastrophic burial, they seem to point to a global flood as the principal cause of extinction. They must have lived on earth at the same time, just as the Bible implies.

Oh dear Belenus! It is true that the fact that dinosaurs once lived and no longer do is not proof of evolution. But after a promising start, the question crashes down into a pile of blithering idiocy. So far, so uneducated. Next we get:

If the flood-geology interpretation of geological strata is correct, all or most dinosaurs became extinct at the time of the flood. Until that time, then, man and dinosaurs lived on the earth at the same time.

It’s good that he starts with an “if” there. I agree that if the flood-geology interpretation were correct, dinosaurs died in the flood. However, it isn’t. It isn’t even close. Man and Dino did not live on Earth at the same time. It really is that simple.

So far this is just standard creationist idiocy. It is the sad product of poor education, poor understanding and religious doctrine combining. As always, though, the monumental lack of evidence to support creationism causes problems and the TRUE BELIEVER© is forced to lie for Jesus. It happens all the time, in all types of debate. The stronger the person’s faith, the more willing they seem to lie for their deity. I find the irony very entertaining. Here we have:

Is there any EVIDENCE outside of the Bible to support this view? Yes, there is. It is well known that along the Paluxy River in Texas many dinosaur footprints have been found in limestone strata classified as Cretaceous. Not so well known is the fact that for about fifty years human footprints have been reported in the same strata.

Taranis give me strength. Don’t you just love it when someone asks a question and then answers it themselves? Yes. (All puns intended.) The only evidence to support humans and dinosaurs co-existing is in the minds of creationists. It isn’t even in the Bible. It is pure fiction. The Flintstones is not real. Lying for Jesus is still lying. The crazy questioner finishes off with his bit of conspiracy-theory-for-Jesus nonsense:

Source: Footprints in Stone (color-sound film)
But since the concept that man lived with dinosaurs is incompatible with the theory of evolution, many Scientists dismiss this documentary for the persuasive evidence unfolded.

Man living with Dino is not incompatible with evolution. The “documentary” evidence he cites is not dismissed for that reason, and it really is not persuasive…

The screaming stupidity that is Yahoo! Answers comes out in the “best answer” chosen by the “asker.” As is so often the case, the person chooses a best answer that restates whatever idiocy they agree with. This is no different:

I do agree with you to some extent. It is impossible for humans to prove the actual “age” of the extinct dinosaur remains. When scientists try to “determine” the age of the dinosaur remains by soil composition and “carbon dating” etc, I just shake my head. Anybody can make an assumption about life that way. It is also impossible for humans to determine exactly how old the history of mankind is as well. Remember, in the early days of creation, people lived much longer then we do now. Of course they did. Adam lived for 930 years, and his son Seth lived for 912 years. Before the flood, many people lived well into their hundreds. There was a wonderful balance of nature then. No pollution or anything “man-made” existed to destroy that balance. God knew what he was doing right from the very beginning. His creation and existence is perfection in itself – he is the superb mastercraftsman! I bow to his absolute genius…

It is mind-numbing in its stupidity. What on Earth is age doing in quotes? What is the idiot trying to say? Putting determine in sneer quotes – what is that all about? The whole answer manages to be so far from the truth it is almost beyond belief. It isn’t even internally consistent. Even in Biblical terms there were lots of man-made things before the flood – the Ark for example…

The wonders of the internet (and specifically Web 2.0) push this stupidity to the top of a search engine query. The miracle of Web 2.0 lets the asker give prominence to whichever piece of madness they want to be seen as the answer. Yes, if you scroll down you can find better answers, but not everyone is going to do that and, crucially, once their reasoning has been tainted by the first two bits, they will be more sceptical of the truth than of the idiocy.

Web 2.0 is not about empowerment and it certainly is not about the shared wisdom of the masses. The tragedy of the commons seems so much more appropriate.

* I suppose this may be a problem specific to the Religion and Spirituality section of Yahoo! Answers, but the other sections seem to be riddled with nutjob answers too…

Fame and Fortune

Status

Well, actually maybe neither fame nor fortune, but I have just realised I am a “featured photographer” on Flickr now! (Check out the Strangford pages; you may have to scroll down a bit though…) I am sure this is of little interest to anyone who is not in my immediate family, but I couldn’t resist 🙂 [edited to add the Newtownabbey pages as well! Wow!]

Success breeds Success

Abandoned Millstone at Prudhoe,
originally uploaded by etrusia_uk.

There is an interesting problem with the world of “Web 2.0” (and I hate that buzzword; I will make sacrifices to Odin as an apology): one of the main promises it makes is let down by the fundamental way its processes work.

The web was touted as a way to democratise the world, giving even the most insignificant person the ability to have their message heard across the globe. This was taken to a new level with the advent of Web 2.0 applications, and now everyone is supposed to be able to get video, audio or text out into the world with ease.

In part, this is true. In the sense that pretty much anyone with a computer and net access can make a blog or upload video or audio tracks, the ideals of the Web/Web 2.0 are sound. The problem lies in the finer details of how they are implemented.

Think about your own browsing habits. Think about what blogs you read on a regular basis, what videos you watch on YouTube and what websites you visit. Think about how you find new things (Google? Yahoo?) and you can see that the vast majority of things will be the “highest ranked” for a given genre or search term. If you do a Google search for something you are interested in, what are the odds you will delve to the seventeenth page of results and look at the sites there, much less link to them from your own sites and improve their page rank? Blogs are the same: Technorati rank can spell a death sentence for a blog or, in equal and opposite measure, propel it to server-breaking hits. Digg, Reddit and the like are all similar.

The basic flaw is a catch-22-like situation. Until your website/blog/whatever becomes popular, no one can find it, but it won’t become popular until people can find it. Certainly there are workarounds – for example, when Technorati indexed the Atheist blogroll it was a surefire way for otherwise low-profile atheist blogs to get disproportionately high rankings – but these are far from certain. However, once a blog or site (or whatever; I will use the terms interchangeably to mean generic things on the web) gets that high ranking, the success will breed itself.

This crops up in many areas. For example, on Technorati the most popular blog is Engadget, so more people read its posts and more people favourite it or link to it, which makes it more popular still. Moving away from blogs themselves, you get situations like the “Top Five / Most Emailed” on ScienceBlogs: these posts get more exposure to the general public and, as a result, get more hits and remain in the top five. It gets to the stage where a “Popular” item can be an order of magnitude ahead of the “normal” items, simply because its success breeds more success.
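For anyone who wants to watch the effect rather than take my word for it, here is a toy simulation in Python. It is a sketch under loud assumptions, not a model of how Technorati, ScienceBlogs or Google actually rank anything: 20 hypothetical sites start with one visit each, and every new visitor picks a site with probability proportional to the visits it already has (a Pólya urn, a close cousin of the “preferential attachment” models used for link networks). The function name and the numbers are purely illustrative.

```python
import random


def simulate_rich_get_richer(num_sites: int, num_visitors: int, seed: int = 0) -> list[int]:
    """Toy 'success breeds success' model (a Polya urn).

    Every site starts with one visit. Each new visitor chooses a site with
    probability proportional to the visits it already has, then adds one more.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    visits = [1] * num_sites
    for _ in range(num_visitors):
        # choices() with weights picks an index in proportion to current popularity
        winner = rng.choices(range(num_sites), weights=visits, k=1)[0]
        visits[winner] += 1
    return visits


if __name__ == "__main__":
    ranked = sorted(simulate_rich_get_richer(num_sites=20, num_visitors=10_000, seed=42), reverse=True)
    print("visits per site, most popular first:", ranked)
    print(f"most/least popular ratio: {ranked[0] / ranked[-1]:.1f}")
```

Try a few different seeds: the sites are identical apart from luck, yet the gap between the most and least visited can easily be an order of magnitude or more. Replace the weights with equal ones and the gap mostly disappears.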

Pinhole Effect on Farmhouse

Turning to the picture that starts this meandering: Flickr has a variety of ways in which you can rank your pictures – interestingness, most comments, most views or the number of times they have been made a favourite. The picture from Prudhoe Castle (above) is the winner of the “interestingness” stakes. Despite being on Flickr for months, it has only generated 98 views, 2 favourites and 5 comments, but this is enough to top the polls over pictures which have had more views, more favourites and more comments. So, in the interests of breeding further success, I am posting the picture here to see if it gets more comments, more views or more favourites. To see if the effect is repeatable, I am also including the picture which currently has the most comments (and a lot of views, but nowhere near the most).

Whatever happens, on the web as in real life, it seems that the more successful you are, the more successful you become. Breaking into that “winners’ circle” is not easy. Despite the golden hopes that the web would be a great democratiser, the reality is that (with the support of Google et al.) the web is concentrating the provision of information into small, partisan groups.

Is this a good thing?

[tags]Democracy, Web, Web 2.0, Internet, Philosophy, Society, Culture, Success, Random Thoughts, Prudhoe Castle, Prudhoe, Castle, Photo, Flickr, Photographs, Technology[/tags]

When Technology Goes Bad

Comically, it seems Technorati has died a death again this weekend. This is becoming a regular occurrence now (read through some of the posts here about it) and, given the nature of the industry in which Technocrappy wants to compete, I really did think they would try harder. It seems they don’t.

Still, I am as fickle as the next person and more than happy to bask in the temporary brilliance of their current mistake. It seems that, today, this blog is ranked Number 1 in the world. If you don’t believe me, have a look at this screenshot:

Technorati Screenshot - Taken 04 Aug 07

As you can see, this is recent and we are, indeed, number 1 in the world 🙂 Sadly, I am not convinced this will last for long… (Check out the ranking yourself and see if it has reverted to our normal, low, position.)

[tags]Technorati, Page Rank, Technorati Rank, Technorati Monster, Technology, Bad Technology, Web Service, Web 2.0, Social Web, Blogs, Blog Aggregators[/tags]

CAPTCHA – Work of the Devil

Everyone knows that allowing bots to post things on people’s behalf is a bad thing. I mean, it contributes to spam comments on blogs – which no one likes. Obviously anything which works against this evil is a GoodThing®?

Well, no. I don’t agree. First off, there are better ways to prevent things like automated signups, automated submissions and spam bots. More importantly, CAPTCHAs are so annoying that I can’t for one second believe they don’t drive visitors, subscribers and commenters away. Now, I would love to see the business model of a website (especially a “Web 2.0” one) which is happy to drive a percentage (however small) of its customers away.

Now, I am healthy, have good eyesight and fully functional manual control – and I have a hard enough time getting round some of the CAPTCHAs out there. I dread to think what it is like for people who have even slight visual impairments or motor co-ordination issues. Over the last few weeks, I have suffered numerous, infuriating, problems with CAPTCHAs on sites which really should know better.

Uninspiring .Net

I have tried to hold off commenting on this month’s issue of .Net magazine. In recent months, the magazine has been showing signs of greatness and some of the recent articles have been inspirational and educational.

Not this time.

Generally speaking, the May 2007 issue (number 162) is completely dull. The cover articles range from the potentially interesting “The Power Of Type” to ones you just know will be dull, namely “Can the Web save the world?” Forgive me, I never realised I’d bought the Economist by mistake…

The saving-the-world article is about the OLPC (One Laptop Per Child) project. This is a project to get laptops to children in the third world. I am going to steer clear of any potentially dangerous topics, but I can’t help but think that giving them food, water, shelter and the like would be a lot better. Giving them laptops (and, I assume, net access) is not going to feed them. I hope they are English speakers as well…

For a while I thought there were some web design links to the OLPC, but as the site appears to be unavailable, I can’t confirm this. Suffice it to say, it struck me as three pages of filler content.

The filler content thing seems prevalent this month. Reading the magazine, I got the definite impression that, although an issue had to be published, they had nothing to say. Every one of the articles is excessively wordy, and the use of page after page of graphics has reached new highs. The “advice” section is pretty poor; for example, the graphics tutorial teaches you how to design a typeface. This basically consists of: write the text you want, scan it in and use it… Seriously (and it runs to 4 pages). In the “Expert Advice” section there is a box-out titled Understanding ID and Classes. I defy anyone who doesn’t already understand them to understand them after reading this…

All in all, this is certainly not an issue of the magazine which you read and then run to the computer, fire up Dreamweaver (or Bluefish) and get coding. Even the reviews section is sparse. If I wasn’t a subscriber, I wouldn’t have bought this in the shop.

Themes and Upgrades

Well, it seems my hopes that the last theme I tried out would be the “be-all and end-all” theme for the blog were dashed against the rocks of reality.

It seems that something in the theme “Cleaker 2.1” was quite badly broken when viewed in IE6. This is a big shame because I really did like that theme. However, more than 35% of the hits this site gets are from IE6 (with almost another 5% coming from IE versions older than 6), so this is not a problem we can ignore.

Screenshot Showing the Site Theme

There is now a new theme (minor additional changes may have taken place) and the image you see here shows how it is expected to look. If you are seeing something radically different from this, can you please let us know?

Although I am not as enamoured with this theme as I was the previous one, it appears to work even in old versions of IE so it may be kept for a while.

This leads me on to another important (to me) issue. If you are using IE 6 or older – UPGRADE! Please, for the love of Tim Berners-Lee, get a more modern browser. I am loath to say IE7, but it is better than IE6. For the 0.4% of you who insist on using IE 4 or older, you really are missing out on a lot of what the internet has to offer. I mean, people talk about Web 2.0 and there are still around 5 people a day who come here using Web 0.1beta browsers…

Download Firefox, Opera, Mozilla, SeaMonkey or even (gasp) IE7. They are all free!

Well, at least I have got that off my chest.

(p.s. before any Apple / Linux / BSD etc people pipe up – Windows accounts for over 75% of the traffic to this site)

Tagging the untagged

This blog has been going through some traumatic changes to its functionality.

It doesn’t look much different: most of the changes to its appearance looked great in IE7 but were repellent in IE6 and earlier browsers, so it has temporarily reverted to a look it’s had for… oh, I don’t know… all of about 6 weeks.

The main difference for visitors is that you can find much more by tags, as if the blog were trying to be a mini-Technorati. You can open the Tag Archive page and search on several tags. (These are even presented in a tag cloud.)
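For the curious, both features are simple to sketch. This is a minimal Python illustration under assumptions, not the code of the tagging plugin this blog actually uses: it assumes a multi-tag search means an AND search (only posts carrying every selected tag), and that the tag cloud scales each tag’s font size by the logarithm of how many posts use it. The post-to-tag mapping is a small subset of the tags on two posts here; the function names and pixel sizes are hypothetical.

```python
import math

# A small subset of the tags on two posts here; the mapping is illustrative only.
POSTS = {
    "Success breeds Success": {"Web 2.0", "Flickr", "Technology"},
    "When Technology Goes Bad": {"Technorati", "Web 2.0", "Technology", "Blogs"},
}


def posts_with_all_tags(posts: dict[str, set[str]], wanted: list[str]) -> list[str]:
    """AND search: titles whose tag set contains every requested tag."""
    required = set(wanted)
    return sorted(title for title, tags in posts.items() if required <= tags)


def tag_cloud_sizes(posts: dict[str, set[str]], min_px: int = 10, max_px: int = 28) -> dict[str, int]:
    """Font size per tag, scaling log(post count) linearly between min_px and max_px."""
    counts: dict[str, int] = {}
    for tags in posts.values():
        for tag in tags:
            counts[tag] = counts.get(tag, 0) + 1
    if not counts:
        return {}
    lo, hi = math.log(min(counts.values())), math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # every tag used equally often: avoid dividing by zero
    return {tag: round(min_px + (math.log(n) - lo) / span * (max_px - min_px))
            for tag, n in counts.items()}


print(posts_with_all_tags(POSTS, ["Web 2.0", "Technology"]))
print(tag_cloud_sizes(POSTS))
```

The log scale is a common choice because a handful of very popular tags would otherwise shrink everything else to the minimum size.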

The big difference for us is that we can tag things by just clicking on them. Adding tags used to be like pulling teeth. It probably contributed to my blogs being unfeasibly long, because I couldn’t bear to go through the tagging process again (like a graffiti artist with a sore arm?). So the outcome should be fewer blog words, more tag words. Or at least, more tag words.

However, we don’t have full tagging liftoff yet. The older posts either don’t have any tags or only have WordPress category tags. By older, I mean “up to January 2007”. So that’s nearly all of them. As the posts here go back over a year, it’s an arduous task to add tags and it’s getting done piecemeal. All the same, it should be possible to find most of what we have for most of the topics.

And by the way, why do people keep typing “none” into the search box in the header? This is just bizarre. It isn’t what happens when people click on search without typing anything, because that brings up a blank page.

Web traffic analysis=nonsense

What is it with search engines and web-traffic rankers?

This blog has done enough whining about Technorati’s randomness. It’s well overdue to say that it’s probably working far more consistently and reliably than most of the facilities that claim to find Internet resources. (On a note that shows how shamelessly susceptible to flattery we are at whydontyou.org.uk – others please take note – it puts this blog at under 60,000 in the blogosphere which is almost beyond its wildest dreams.)

As an experiment, look up your blog in a few search engines. See if you can find any points in common between them.

Here’s one of my favourites, in that I suspect they must actually use a randomiser to generate web traffic numbers and links. Pick a blog and look at it in Technorati’s blog directory.

Go to the traffic rank bit and click on it. You will find yourself in the realm of Alexa. This will probably show you that the traffic isn’t really counted because the blog isn’t in the top 100,000. The daily page views are shown as a percentage of people using the whole Internet, i.e., if the site isn’t in the top 100,000 sites in the world, you won’t get any figures. (If you come in at a newbie 5,195,452 – as this blog does – you may wonder if you are even reading the blog yourself.)

100,000 sounds like a lot of sites. However, if you consider the global players (like Google or Microsoft), then the big online retailers (like Tesco and Dell), then news sites (CNN, BBC) and national government information sites, you can see it must be pretty difficult to get into the club.

Beneath this blank chart, you will see “Percent of Internet users who visit this site” with a fraction of a percent if it’s anything like this one. (Maybe you’re Microsoft, in which case I guess it will be higher. Will check shortly.)
Then come “average number of pages visited”, “3 months average traffic rank” (risibly low) and average page views per visitor (1). (1? 🙂 Do you suspect that’s hard-coded?)

But the next bit is what creases me up for its randomness. People who visit this site come from (in order of most visits):

United States 40.0% (fair enough, the blog’s in English. Most English-speaking Internet users are in the USA)
France 20.0%
India 20.0%
Costa Rica 10.0%
United Kingdom 10.0%

Whydontyou.org.uk traffic rank in other countries: (These seem to be the same countries to me)
Costa Rica 46,349
India 167,900
France 170,280
United States 658,841
United Kingdom 703,872

Come on… To what do we owe this unprecedented popularity in Costa Rica? India? France? This is a UK-based blog. Most of the stuff we witter on about, apart from atheism and technology, relates to the UK.

It’s not that I don’t want to believe it. A central American flavour to its posts would make this blog much more interesting. I just think the figures have been made up.
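There is one arithmetic clue that the breakdown may rest on a tiny sample. Every figure is a whole multiple of 10%, and together they add up to exactly 100%. Assuming the percentages are exact rather than rounded (an assumption: I don’t know how Alexa rounds them), the smallest number of visits that could produce them is the lowest common multiple of the denominators of p/100, which a few lines of Python will work out. The function name is mine.

```python
from fractions import Fraction
from math import lcm


def smallest_consistent_sample(percentages: list[str]) -> int:
    """Smallest visit count N for which every exact percentage is a whole number of visits.

    A share of p% of N visits is (p/100) * N, which must be an integer, so N must be a
    multiple of the denominator of p/100 in lowest terms; the answer is their LCM.
    """
    denominators = [(Fraction(p) / 100).denominator for p in percentages]
    return lcm(*denominators)


# Alexa's split for this blog: US, France, India, Costa Rica, UK
alexa_split = ["40.0", "20.0", "20.0", "10.0", "10.0"]
print(sum(Fraction(p) for p in alexa_split))    # 100
print(smallest_consistent_sample(alexa_split))  # 10
```

Ten visits would mean four from the US, two each from France and India, and one each from Costa Rica and the UK. If the figures were rounded this proves nothing, but it would explain how a single stray visitor could put Costa Rica on the map.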

OK, let’s look at the sites that link here, according to Alexa. These are so out of date that the list obviously hasn’t been updated since the blog was a couple of months old. In fact, until I submitted a more recent image, Alexa had a screenshot of the blog that was well over a year old. (Yes, I know, that’s like saying “We don’t get enough spam here, please deluge us with as much as you can possibly manage”.) Maybe because of their age, some of the sites listed in these links are unrecognisable. In fact, none of the blog links would be counted by Technorati, being over a year old, but then, Alexa shows none of the links that Technorati does count (those under 90 days old).

Let’s search for this blog on Google. Here, it’s weirder. If you repeat the same search over a day or so, the results have few points in common. Maybe it’s just how Google treats blogs, but the post that comes up first is always the same one from a few months ago. Other posts can only be seen by asking for similar results, which are excluded the first time round for being the same. Well, guess what, Google: every post is different. It’s a blog. Lots of the other Google results for the blog are bits of the RSS feed. I’d like to think that lots of people are devouring the RSS feed but, unfortunately, these tend to be link farms. In fact, lots of obscure references to the blog on link-farm sites turn up on Google, most of them complete news to us. Real human-created references to the blog don’t turn up as often as they actually happen.

I could go on to the point where I was boring even myself.

None of this would matter if getting seen and indexed correctly weren’t crucial to getting any visitors. I know that indexing engines and search engines are bombarded with spammers trying every trick there is to get to the top of the first results page. The search engines have algorithms that are supposed to penalise sites and blogs that don’t match their definition of legitimate: density of keywords, number of inbound links, and so on. I believe that not only are these not working, they are often acting in exact reverse to their intentions.
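“Density of keywords” sounds mysterious, but the textbook definition is simply the number of occurrences of a word divided by the total number of words. Search engines don’t publish their actual formulas or thresholds, so treat this Python sketch as an illustration of the idea only; the function name and the crude tokenising rule are my own assumptions.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Share of words in `text` equal to a single-word `keyword`, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # crude tokeniser: runs of letters, digits, apostrophes
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


print(keyword_density("Spam, spam, eggs and spam", "spam"))  # 0.6
```

The catch is that a number like this cannot tell a post that is genuinely about spam from a page a spammer has stuffed with the word.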

Content from blogs gets scraped and put into blag sites that exist just to spew out other people’s content. Google then decides the original source site has “duplicate” content and downranks it. How do you stop this without stopping legitimate blogs from commenting on your posts?
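Nobody outside Google knows exactly how it spots duplicates, but a standard textbook approach is to split each page into overlapping runs of words (“shingles”) and compare the two sets with the Jaccard similarity (size of the overlap divided by size of the union). The Python sketch below shows that generic technique, not Google’s algorithm, and the names are hypothetical. It also shows the nub of the problem: the score is symmetric, so it can tell you two pages are near-copies but not which one came first.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("shingles") in lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of intersection / size of union: 1.0 for identical sets, 0.0 for disjoint ones."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "Content from blogs gets scraped and put into sites that spew out other people's content."
scraped = original + " Cheap pills here!"
score = jaccard(shingles(original), shingles(scraped))
# Same score either way round: similarity alone cannot say which page is the original.
assert score == jaccard(shingles(scraped), shingles(original))
print(f"similarity: {score:.2f}")
```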

Keywords in the metatags don’t match the keywords in the text? Well, duh, normal human beings aren’t thinking only of page rank. So they put keywords in their metatags, then write content without remembering to keep changing the metatags. Only people obsessed with search engine rankings do that and, of course, a fair percentage of them aren’t just bloggers or normal website owners.
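The mismatch test is easy to automate, which is exactly why it misfires on ordinary people. This Python sketch (standard library only; the class and function names are hypothetical, and it is not any search engine’s real rule) lists the meta keywords that never appear as whole words in a page’s text. An honest page whose keywords were written for an earlier draft fails it just as a spammer’s page would.

```python
import re
from html.parser import HTMLParser


class KeywordCollector(HTMLParser):
    """Collects <meta name="keywords"> entries and all text content of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: list[str] = []
        self.text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "keywords":
            content = attr_map.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]

    def handle_data(self, data):
        # Note: this also picks up <title> and any <script>/<style> text; fine for a sketch.
        self.text_parts.append(data)


def stale_meta_keywords(html: str) -> list[str]:
    """Meta keywords that never appear as whole words in the page text."""
    parser = KeywordCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.text_parts).lower()
    return [k for k in parser.keywords if not re.search(r"\b" + re.escape(k) + r"\b", text)]


page = """<html><head><meta name="keywords" content="Atheism, Technorati, Costa Rica"></head>
<body><p>Technorati is down again.</p></body></html>"""
print(stale_meta_keywords(page))  # ['atheism', 'costa rica']
```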

It’s not just a question of getting visitors. Anyone who wants to bring in revenue from their site or blog by displaying adverts gets judged by these bizarre standards. Some schemes base what they send you on your Alexa rating or on Google’s well-nigh arbitrary page rank. If you’ve ever tried to have GoogleAds on a site, you’ll know how opaque the process is. In fact, visitors who think they’re helping you pay for the site by clicking on your ads a few times every time they visit will get you disqualified. Ditto your rivals… (It seems as if you get automatically disqualified anyway, at the very point that you might actually receive any revenue.)

I know it must be well-nigh impossible to filter the enormous volume of material on the Internet, especially in the face of the number of spammers there are. However, there must be better ways of doing it. I am always amazed when people find things here and comment or email us about them. How do they manage to find it?

So here is an unaccustomed prop for Technorati (unaccustomed for this blog, anyway, which has done its fair share of ranting about it). For all the irritating Technorati monster error messages and totally inconsistent service, Technorati remains the best-performing indexing service I’ve come across yet. The tags are really helpful when they work. You can still find an interesting read on someone’s first post. And Technorati isn’t yet totally under the sway of the giant players. The fabled Web 2.0 stuff really does still have something going for it.