
Captcha

Posted on 28th August, 2008 by Heather

“Captcha is the bane of the internet,” says Matt Mullenweg, who runs the massively popular blogging site Wordpress.com. “I can’t figure them out myself half the time!” (from the Guardian technology page today)

This is from a Guardian piece discussing how captchas are well and truly broken, by algorithms and by cheap human labour, thus increasing the volume of blog comment spam. The writer suggests Akismet or the type of non-machine-readable questions that you find on ApathySketchpad as viable alternatives.

I’m comment-impaired at the best of times. I’ll try and comment on a blog and find that my comment just disappears. Granted, this suggests the universe has an innate capacity for mercy. But, just occasionally, the words that disappear into the net’s black hole were comments that I really wanted to make. So, I’ll try and rewrite it, in a half-hearted fashion. It will disappear again. I’ll have a final stab at writing. And sending. But by this time, it’s incoherent garbage, sent only to show the comment-eating demon who’s boss.

And then the captcha is there mocking you. Matt Mullenweg is so right, except, on his own proud boast, at least he gets them right half of the time. Falling foul of captcha is a daily occurrence here at WhyDontYou Towers. And a score of 50% correct is just a fond dream.

The idea is that only humans can read the things. A reverse Turing test. This whole concept falls down because any shapes that are too unlike characters to be read by a souped-up OCR-style algorithm are also much too unlike letters or numbers for human beings to interpret.

Even when you can distinguish those shapes that are meant to be characters from the deliberately inserted wavy lines, you face something like:

oo9I0g

There is no way to reliably distinguish between 9 and g, 0 and O, 1 and l and I.

So you type in zero zero nine one zero g, on the off chance. It rejects you. You don’t get another shot at the ambiguous letters.

Oh no. A fresh bleeding captcha. This time you find you have to choose between identifying a letter as either a very thin letter j or the letter i with a slight curve at the bottom. Failed again.

Next time it’s either an l with a slight curve at the top or an anorexic letter c. Ok, got the c right but then you thought that oddly shaped capital A was a 4, didn’t you? Robotic fool.

By this time, the human-detector software has often decided you are a bot cos you couldn’t even guess one out of 3. So your comment is bounced anyway.

If you’ve ever thought that you might as well go for the disabled option, don’t bother. That’s not worth it either. Captchas that claim to be for the disabled are actually even harder to use than their able-bodied comrades. Different experiences you can have with the accessibility captcha include:

  • A long silence. So you think it’s not working and cancel a fraction of a second after it kicks in.
  • So much feedback and weird background noise (to simulate the visual noise on the visual captcha) that you couldn’t work out what it’s saying even if you had a comic-book aural discrimination superpower.
  • Voices so bizarrely accented and echoey that you are stunned by the novelty that this is supposed to represent speech. So you don’t notice, let alone memorise, the content as it races past you in a jumble of syllables.
  • The disabled version sometimes matches the written one and sometimes doesn’t. Which one do you try? The wrong one, of course.

The whole concept of the disabled one seems stupid to me. You are assumed to be too blind to see the captcha image. So how do you see the captcha box and spot where the disabled button is? Are the blind fitted with memory enhancement chips that let them translate a string of meaningless letters and numbers from the native gibberese AND remember them long enough for their screen reader to kick in and tell them where to type?

Comment spam up by 76 percent

Posted on 10th April, 2008 by Heather

I made up the number. Spurious statistics are so convincing.

Spam is definitely up though, as you know very well if you have a blog. If it weren’t for Akismet, this blog would be buried under the weight of it. A year or so ago, a few comment spams would be waiting in the Delete queue every couple of days. Now there are about 60 a day. And the buggers are growing in length. There are single spams with lists of keywords and links long enough to fill a few sides of A4. (Letter for those used to US paper sizes.)

Calculated across the whole lifetime of this blog, there have been 9 comment spams to every post. (That’s a real statistic. I didn’t just make it up, honest. I even used Calculator.) Given that Akismet wasn’t installed for many months and that most of these spams have arrived in the past few months, the current ratio of spam to posts is very much higher.

My plan was to list the most ludicrous. But they aren’t even funny. They offer porn, online medicines, cars, loans, yada, yada, yada. I imagine that even someone who is desperate to buy any of these would think twice about clicking on a link in a spam comment. In fact, is it even remotely possible that someone without an attested mental illness has ever clicked on one of these links in blogspam?

More sophisticated spams aim to pass a cursory blog-entry Turing test by using stock human phrases, often in a mechanical “translated-from-the-Finnish-using-Babelfish” way. E.g. two of this evening’s crop are “very true statement, we have gotten in much trouble on that notion historically.” and “Hi! Without taking into account the issue of establishing a stone by God, which he won’t be able to pick up, how do you think, may be something in this world, what can God never see?”

What? The characters come from the Western European standard character set; the words are in an English dictionary; the sentences have nouns and verbs and punctuation, generally including a liberal use of the exclamation mark!!! But the phrases might as well be in a management report for all the sense they convey.

Some comments fake having read a blog post, with generic comments that could apply to any post - “Interesting post on *name of blog* today” - or claims that they haven’t quite understood what you were saying but want to know more. Well, they’re bots, ffs. Of course they haven’t understood your post. You were addressing a mammalian readership.

Others shamelessly flatter your writing style or your blog in general. (“Good portal!” “I like this work!”) The idea must be that the recipient is so blinded by recognition of their innate genius that they fail to notice it’s a spam and let the comment through. My head is at least as easily turned by dumb admiration as anyone’s, but even I have to pass this unsolicited admiration through reality filters.

In fact, these spams really annoy me because sometimes I do just want to comment on someone’s blog to say “Good post.” I’ve got nothing witty or pertinent to contribute. I just want to let the writer know I enjoyed it. But, the fact that it makes me seem like a comment spammer puts a stop to that.

A major irritation caused by spams is that we often accidentally delete real comments that have ended up in the spam pile. If you are commenting from an academic IP, it’s pretty certain that your institution’s email has been used to pour out spam, so Akismet is likely to block you. For a blogger, it’s sometimes too much effort to pore through 40 comments on the off chance that one is from a real person. So real comments get thrown out with the bathwater.

Comment spam costs pretty well nothing to create, so whatever the producers charge their customers must be pure profit. Bah. Some bugger is making money while you’re wasting your precious life-force deleting the latest missive from “daniel@msn.com” (a regular spam commenter whom we’ll probably all recognise).

Akismet does a fair job of dealing with it. I don’t know what other solution there is - or if it even matters as more than a stupid waste of bandwidth.

I obsessively look up the IPs and locations of the worst ones. Toutatis only knows why. (Most of the IPs will be spoofed anyway.) But I can glare at Riga or direct withering scorn at Hong Kong on Google Earth and feel that “I’ve got your number”. That must count for something…
