El Reg has an interesting article on Google and how its ranking system is, effectively, a black art. For a company which claims to “do no evil” it is bizarre how closed they keep their methods – surely shining the light of openness on how they work would be the “good” thing to do. While it might increase the risk of black hat SEO working, surely it would make it easier for everyone else as well. Why does the Google Search Ranking algorithm have to be a secret? Read more at: http://www.theregister.co.uk/2009/11/19/google_hand_of_god/
While I have been busy over the last few weeks I’ve been unable to spend time looking at the stats for my Flickr images – this is unusual because all of us at WhyDontYou towers are somewhat stats obsessed. However, I found time to catch up today and was a bit surprised.
Now there is a fairly consistent number of views on the photos in my Flickr stream with predictable changes when I add new photos or put some effort into getting them more visibility. Unsurprisingly, almost all my visitors come from Flickr with a rare few being driven from various blogs (hardly ever here…shame on you all) or other sites. Today this consistency was there, with one exception – this image:
For some reason, this image has been getting more traffic than any other image over the last month, and almost all this traffic is coming from google images. Over the last month, 91% of the traffic to this image has come from google images (almost all using “Animals” as the search term) while a further 8% has come from Yahoo Images (even more bizarrely, this is normally from a search for “Boxing”…) with almost none coming from organic Flickr searches. I have tried both google and Yahoo searches and I can’t make this image appear in the first 10 pages or so of either engine.
I find it a bit strange that enough people are searching for animal and / or boxing images to wade through pages of results before deciding to visit this one. I find it equally strange that, given the number of animal images I have on flickr, this effect seems isolated to this image. It has gone from relative obscurity (around 1000 views in total) to one of the most viewed images in my photostream – currently 5,556 views in just over a month. (Not that I am complaining but none of these people even leave comments!)
If anyone has any insight as to what may be causing this, I’d love to hear it.
Come back flatterspam, all is forgiven.
For the past few days, the blog has been getting gibberish comment-spam, in oddly large numbers, almost at DDOS attack levels. (OK, I exaggerate but there were over 380 yesterday, 51 today.) Some of these comment spams are particularly weird, in that even the URLs are gibberish.
It’s not as if the random word generators have generated text in any known human language, that could trick the unwary into clicking on a link to onlinefakemeds.com or whatever. The URLs themselves are also random letter collections, with names like Mr._Mxyzptlk but less meaningful.
Charitably assuming that spammers have not completely taken leave of their senses, I guess that these suidfiojdfolsrkl.com-style links go to redirects and do eventually take the unwary URL-clicker somewhere. (Obviously, I’m not going to try them out. I’m enough of a sucker for any worm or trojan anyway.)
But still, what is the point? It seems even less likely that people would click on a gibberish link in a mound of gibberish than that they would believe that a complete stranger in Africa would pay them ten percent for the assistance in transferring 64 million dollars.
A few more blog-related odds and ends, now I’m on the subject:
Apologies to anyone who expects to get email alerts about new posts here. This plug-in has just stopped working. We don’t know what happened so we have less than no idea how to change it.
The Atheist blogroll got broken so long ago, it’s almost a distant memory. Again apologies. We threw it away a few months after it got stuck permanently showing last August’s posts of about ten blogroll members. (Or something like that.)
Other things just randomly break anyway. For instance, there was a link to the Convention on Modern Liberty that only lasted a week or so.
Plus, this blog can load so slowly (even on my allegedly very fast connection) that it’s hard to see why anyone bothers waiting for it.
Except for all those visitors who are looking for Schwarzenegger, 5 fruit and veg, funny magic the gathering cards, Bodium castle, fairytale castles, fine art or morris dancing. These are the top search terms that consistently bring people here from Google. Every day.
Now, I am all for giving the public what they want, but there’s only so much that I have to say on any of these topics. So, most of these visitors must leave a little disappointed, to put it mildly.
This blog needs a serious “REDO FROM START.” It should happen soon…..
Everyone is scared about malware and hacking on the web. There is nothing wrong with this and there really is a genuine threat out there. People need to make sure that their browsing is as safe as possible. For most people, unless you are running a high volume internet banking transaction server, this can be simply done by getting a good anti-virus (AVG Free is cost effective) and a firewall (Windows’ own, Zone Alarm or one on your router).
Despite this, a lot of online organisations feel the need to join in and help out. Most modern browsers have built in “phishing filters” and will try to alert you when you click on what they think is an untoward link. This is all well and good and there are only minimal privacy implications.
Equally, search engines are doing the same thing now. When you google a search term, you get links with any potentially harmful ones highlighted. Just in case you ignore google’s advice, they have a blocking page pretty much ensuring you can’t click through to malware from google. Again, this may seem all well and good but there are even more issues. For a start, it is down to google to decide what is, or isn’t malware. They may be correct 99% of the time, but what about the other 1%? It becomes the responsibility of the website owner to discover they have been flagged as “malware” by google and then jump through google’s hoops to clear their name. This is wrong.
More importantly, who is responsible when there is a problem with google? A sensible hacker could target google’s servers and create the illusion that certain companies are full of malware. It would take a brave person to ignore the warnings and keep going through to a site that is so heavily flagged on the search page.
Do you think this is unrealistic? Here are the results of a search I did today on www.google.co.uk – imaginatively I searched for “Google”:
The whole internet is infected with malware. Every link is flagged with the dire warning it may harm your computer. I am not alone in discovering this… (PCPlus simply suggests using another search engine for the afternoon, Neowin is more informative) Google isn’t hacked (this time), it’s just broken. The effect is the same though. Any attempt to search meets with this warning and google’s intervention means you can’t ignore it and click on. Well done Google – you have borked searching… Amazing.
This is (IMHO of course) the problem with allowing web services to have more and more control over our daily lives. It is bad enough that the most popular search engine on the internet suffers a glitch like this, but imagine if you were using Google to host your remote office systems – an outage can be crippling. Cloud computing may be in vogue, but it is fundamentally a bad idea. You cannot delegate your responsibilities to unaccountable groups – you are responsible for making sure no malware gets on your PC, so why does google feel the need to intervene?
I decided to have a cup of coffee rather than randomly searching Google for a few minutes. For the good of the planet.
The Sunday Times reported that 2 Google searches have the same carbon footprint as boiling water for a cup of tea. (I am hoping the same applies to coffee but I’m erring on the side of caution by forsaking half a dozen notional searches.)
These statistics aren’t completely convincing, being generated, as they were, by a guy who’s set up a website to sell a clean conscience to websites.
People want websites they visit to be eco-friendly. CO2Stats helps you attract and retain those visitors.
CO2Stats is the only service that automatically calculates your website’s total energy consumption, helps to make it more energy efficient, and then purchases audited renewable energy from wind and solar farms to neutralize its carbon footprint – all for a flat, affordable monthly fee. (from co2stats)
The estimated carbon footprint of your search varies wildly between
[Wissner-Gross’s] research indicates that viewing a simple web page generates about 0.02g of CO2 per second. This rises tenfold to about 0.2g of CO2 a second when viewing a website with complex images, animations or videos. (from the Sunday Times)
So, “stick to really dull webpages and don’t visit YouTube or sites that use Flash” sounds more immediately effective advice than buying spurious energy credits.
In any case, this turns out to be at the low-end of the carbon footprint estimates:
….. carbonfootprint.com, a British environmental consultancy, puts the CO2 emissions of a Google search at between 1g and 10g, depending on whether you have to start your PC or not. Simply running a PC generates between 40g and 80g per hour, he says. Chris Goodall, author of Ten Technologies to Save the Planet, estimates the carbon emissions of a Google search at 7g to 10g (assuming 15 minutes’ computer use).
Nicholas Carr, author of The Big Switch, Rewiring the World, has calculated that maintaining a character (known as an avatar) in the Second Life virtual reality game, requires 1,752 kilowatt hours of electricity per year. That is almost as much used by the average Brazilian.
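Wait, if using a PC at all emits ~60g an hour, ie, 1g a minute, how does viewing a complex website manage to generate 0.2g a second, ie, 12g a minute? Even the simple web page works out at 1.2g a minute, which is more than the ~1g a minute for the PC it is being viewed on.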
And that bit about “depending on whether you have to switch your PC on” is really confusing. (When I work out how to use my PC without switching it on, I’ll post the information here.)
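For anyone who wants to check the sums, here is a rough sketch that puts the figures quoted above on the same per-minute footing. The numbers are only the ones quoted in the articles; the function and variable names are made up for illustration, and nothing here is a measurement.

```python
# All figures are the ones quoted in the post; nothing here is measured.
SIMPLE_PAGE_G_PER_SECOND = 0.02   # Sunday Times: simple web page
COMPLEX_PAGE_G_PER_SECOND = 0.2   # Sunday Times: images, animation, video
PC_G_PER_HOUR = (40, 80)          # carbonfootprint.com: simply running a PC
AVATAR_KWH_PER_YEAR = 1752        # Nicholas Carr: one Second Life avatar

HOURS_PER_YEAR = 365 * 24


def per_second_to_per_minute(grams_per_second):
    """Convert a grams-per-second rate to grams per minute."""
    return grams_per_second * 60


def per_hour_to_per_minute(grams_per_hour):
    """Convert a grams-per-hour rate to grams per minute."""
    return grams_per_hour / 60


def kwh_per_year_to_watts(kwh_per_year):
    """Convert yearly energy use to the equivalent constant power draw."""
    return kwh_per_year * 1000 / HOURS_PER_YEAR


simple_page = per_second_to_per_minute(SIMPLE_PAGE_G_PER_SECOND)
complex_page = per_second_to_per_minute(COMPLEX_PAGE_G_PER_SECOND)
pc_low, pc_high = (per_hour_to_per_minute(g) for g in PC_G_PER_HOUR)
avatar_watts = kwh_per_year_to_watts(AVATAR_KWH_PER_YEAR)

print(f"simple page:  {simple_page:.2f} g CO2 per minute")
print(f"complex page: {complex_page:.2f} g CO2 per minute")
print(f"running a PC: {pc_low:.2f} to {pc_high:.2f} g CO2 per minute")
print(f"one avatar:   {avatar_watts:.0f} W, all day, every day")
```

Run as written, the complex-page figure comes out well above the whole range quoted for running the PC, and the avatar works out at a steady 200 watts.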
I am sure that computer use is mostly a waste of energy. I am sure that big powerful servers are even greedier than my PC.
However, I’m not convinced by the idea that you can buy your way out of responsibility for ecological damage. Paying to generate some less-polluting-energy doesn’t mean that the more-polluting-energy you used before suddenly disappears.
Congestion charges, aviation carbon taxes and so on. They all suggest that you won’t cause ecological damage if you can afford to pay for it. It’s like buying and selling medieval indulgences.
This would be great if the Earth was susceptible to bribery. I think these schemes are usually just ways for us to avoid taking any real steps to stop destroying the Earth. In some ways, they are worse than doing nothing, because they give us the illusion that we are taking serious steps to save the environment and that we can do this without any major inconveniences.
And they give the climate-change deniers some pretty obvious strawmen to direct their denying at. For example, here are some of the comments on the Times article:
When does this global warming hysteria end. It seems like all these die-hard environmentalists would like us all living in huts with no electricity, comforts, or heating. Especially considering this freezing winter (against all predictions), I’d like to see them go first.
Like a mouse climbing up the leg of an elephant with rape on its mind. Global warming at/isn’t going to happen
I call for a moratorium on publishing articles like this one. The amount of CO2 generated when my head starts to steam is much higher than a Google search. Multiply that by the millions of sane people who agree with me that GW is a crock and GW might actually come true.
(Replace the misused “sane people” with a more accurate “Americans” and you get the flavour of a lot of these comments. What is it about living in the USA that makes some people unable to see beyond their own carports?)
The calculations are ridiculous and blatantly misleading.
But no surprise, it appears that this will be another cold year and the “environmentalists” are running up and down in a total panic that they failed to fully socialize the world while for a few years was a bit warmer.
And why should we care how much energy Google uses…because of the myth of Global Warming that is being forced down our throats.
2007 was the warmest year on record, no wait, we were wrong about that, the warmest year was 1945. Artic sea ice will be gone soon, no wait, we were wrong about that
It looks as if even people who are too monumentally stupid to see that a cold year doesn’t in itself invalidate climate change are still bright enough to see that these figures are a bit bogus.
Why give them ammunition? The idea of a “carbon footprint” as an individual moral issue, susceptible to individual guilt and contrition is just mistaken. It’s obviously good to do whatever we can as individuals, but it’s a social and political issue, which needs serious social and political solutions.
(end opinionated rant.)
There is a new Google enterprise to get searchable digitised newspaper archives online. A great idea. (I’ve already had loads of educational fun with the Times archive and the Victorian British press archive that went subscriber only, just when it had completely engrossed me.)
The Google blog page has a link to Google’s press archive search but there’s a warning that you won’t find everything indexed. They suggest some searches.
Not every search will trigger this new content, but you can start by trying queries like [Nixon space shuttle] or [Titanic located]. Stories we’ve scanned under this initiative will appear alongside already-digitized material from publications like the New York Times as well as from archive aggregators, and are marked “Google News Archive.”
This instantly arouses my vapourware bullshit detector. Hmm. Space shuttle. The Titanic. First man on the moon… Maybe they’ve just stuck together a few very standard searches and plan to add lots more information as it becomes popular….. I feel impelled to test it a bit more rigorously.
I try a few off-the-wall searches. I pick the topics solely on the randomish basis that somebody’s mentioned the words to me in conversation today:
- “Dolph Lundgren” – 4,370 articles
- “Japanese swearword” – 279 articles
- “linear algebra” – 3,520 articles
- “Large Hadron Collider” – 3,370 articles.
- “Frozen vegetables” – 236,000 articles
Blimey. This actually works really well. I can’t claim to have clicked on more than a handful of links but the ones I did click on were legit. It’s definitely not vapourware. It’s already damn good.
So, the big test, then. I’m going for my favourite indicator that a human twat-a-tron is at work. “Political correctness gone mad” gets 3,240 print archive hits.
Wait. I run it again, to see if the British press is represented. Just because I suspect that it must appear several times a day, so 3,240 seems a relatively small total. (It’s outnumbered by all the phrases above except “Japanese swearwords” and the consensus of press opinion seems to be that these don’t really exist.)
This time I get a mere 1,550 hits. Bloody inconsistent Google. Plus, the timeline is bizarre to say the least. It claims the first mention was between 1880 and 1559. The next was in 1782, then there’s one from 1805. … I think not. They are making these up. The 1958 ones look like a mistake as well.
Closer inspection reveals that the “dates” have leaked in from elsewhere in an article. Most examples are huddled around the last 8 years. In fact there’s barely an instance of political correctness gone mad until 1998. It’s only in the past couple of years that the full flowering of the phrase has taken off.
“The PC brigade” (h/t Alun) got 467. Ignoring the dating oddities, these are also clustered around the turn of the century, with a linguistic take-off from 2000.
These numbers are tiny. Ah ha. Google hasn’t archived the Daily Mail. 🙂 (No hits for “the Daily Mail is shit”, h/t Tom Donald)
Look, if they are only going to index serious newspapers, there is going to be no fun in this.
However, they must have archived a fair bit of newsprint crap, because “the Rapture” brings back a stunning 18,300 reports.
First mention is 0 AD 😀
Cuil, Cuil ffs? Repress a shudder at the name. It’s a (relatively) new search engine. It’s good, although it’s had a bit of a critical drubbing. It’s much prettier than google. Its results make a lot of sense. It’s not stuffed with sponsored links or spam links or dominated by top-ten-authority corporate results. So I think I like it, although I’ve only used it on a test basis.
I also really like Ubuntu. Of course, any Linux version is admirable, and Ubuntu is more admirable than most.
I am just going to have a pointless rant about the branding – calling things ethnic-sounding names to make perfectly good and worthy things sound just that bit more credible.
The wikipedia entry doesn’t do much to dispel any impulse to sneer at the Cuil name:
The Irish ancestry of Anna Patterson’s husband Tom Costello sparked the name Cuil, which the company states is taken from a series of Celtic folklore stories involving a character called Finn McCuill. The company says that Cuil is Irish for knowledge and hazel.
That’s “Irish ancestry” in the sense of “American Irish”, then? (One Irish great-great grandparent and an Irish surname qualify any American as Irish. Although I remain to be convinced that Costello really counts, here….)
Wikipedia does some serious undercutting of the legitimacy of the Irish ethnic explanation for the brandname, from a standpoint of linguistics. Which feeds my instinctive prejudice against the word, the spelling and its supposed “cool” pronunciation.
I used to get riled every time I saw claims that Ubuntu was the “African word for” something, as if Africa didn’t have more languages than any other continent in the world.
Ubuntu is an African word meaning ‘Humanity to others’, or ‘I am what I am because of who we all are’. The Ubuntu distribution brings the spirit of Ubuntu to the software world. (from Ubuntu.com)
I have to turn my pedantry against myself. That said “An African word for” not “The African word for”. Maybe I have misjudged Ubuntu. I do a cuil search for “ubuntu is african for.” The first page is a whole string of official ubuntu links, none of which say it is the African word for anything. In fact, many of the definitions that turn up are reasonably precise, a Zulu word and a South African philosophy.
My bad. I must have imagined the “African word for” phrase, misremembering the blurb from the old distro I have somewhere.
But google and cuil do both unveil an apparent subgenre of geek humour based on the misremembered “Ubuntu is African for”
Ubuntu is African for ‘Can’t configure Debian’. (typical link: Ubuntu forum post)
Indeed. ubuntu is african for ” I CANT CONFIGURE SLACKWARE”
(typical link: Another forum)
ubuntu is African for “time sucker”, right? (link: I-phone blog forum)
Ubuntu is African for “struggles to install mouses”. (from information rain)
Most off-the-wall is
Ubuntu is African for sharks with freaking laser beams on its head. (from animetro)
Am I beginning to see a pattern, here? I’ll have to try it.
Cuil is Irish for “excuse to use a disgustingly lame pun in a blog title”