
Comments Spam

I’ve been getting a lot of comments spam in the past few days trying to improve the pagerank of some online pharmacy. Firewall rules haven’t been that effective, since these appear to be sent by robots running on zombied machines across a number of IP nets. Turning on moderation was also not particularly effective, in the sense that the robots are already running and aimed at my site, and I don’t want to wade through all the comments moderation email.

The best thing I could think of was a captcha, which requires an extra field entry before the comment can be approved. The font is actually a bit hard to read, even for humans, though. I found the hack at Gudlyf’s World, after following a pointer from a general WordPress anti-spam page. The only modifications I had to make had to do with how authimage.php was being referenced, i.e., a URL path issue.
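For the curious, the server-side half of a captcha check can be sketched roughly as follows. This is only an illustration in Python, not what authimage.php or the Gudlyf hack actually does: the names `new_challenge`, `check_challenge`, and the in-memory `pending` store are all made up for the example, and rendering the code as a hard-to-read image is left out entirely.

```python
import secrets
import string

# Characters used for the challenge code (an assumption for this sketch).
ALPHABET = string.ascii_uppercase + string.digits

# Hypothetical in-memory store: challenge id -> expected code.
# A real blog would keep this in a session or a database table.
pending = {}

def new_challenge(length=5):
    """Create a random code and remember it under a fresh challenge id."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    challenge_id = secrets.token_hex(8)
    pending[challenge_id] = code
    # The id goes into the form; the code would be drawn into an image.
    return challenge_id, code

def check_challenge(challenge_id, answer):
    """Accept the comment only if the typed answer matches; one use per id."""
    expected = pending.pop(challenge_id, None)
    return expected is not None and answer.strip().upper() == expected
```

Popping the entry on the first check means a robot can't replay one solved challenge for a whole batch of comments.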

My site isn’t sufficiently interesting for anyone to go through the effort, but I recalled reading a few months ago that captcha techniques had already been circumvented, or at least defeated in theory. Basically, spammers have harnessed the power of porn on the Internet to defeat captcha: as I understand it, the captcha image is relayed to a visitor of a free porn site, who solves it in exchange for access. Ingenious. Evil, but ingenious: it’s a simple idea, obvious when you hear about it, that defeats any sort of captcha performed on the Net. A further implication, which I’m sure has already been brought up elsewhere, is that, for certain purposes, the Internet can be considered a cyborg, a mixture of organic and machine. As a cyborg, the Internet displays sufficient (collective, human) intelligence to pass Turing tests, or exhibit encyclopedic knowledge about obscure technical questions, like the capabilities of an IBM Selectric, circa 1972. The trick is to be able to harness this potential.

Posted on Wednesday, September 29th, 2004 at 2:23 pm and filed under Ideas, Tech, The Blog Itself.

5 Responses to “Comments Spam”

Sigg3 Says:
December 14th, 2004 at 5:53 am
You could check out the anti-spam for b2, and see whether the Avert_Spam could be re-written for WP.

Check: http://www.sigg3.net/cafelog/

Cheng Says:
December 14th, 2004 at 6:40 am
That’s actually a decent idea: a hidden variable with a value related to the browser’s IP address (possibly MD5 hashed for obscurity) in the comments form, and then a check for that value in the PHP script that processes the form. Since the spam robots generally just invoke the processing script, it should work reasonably. It should also be relatively simple to modify for WP, though I probably won’t look into doing so myself.

Some drawbacks that I can see off-hand: some big ISPs (such as AOL) use web proxies that rotate IP addresses, so it’s possible for a user on one of these ISPs to present two different IPs between seeing the comments form and the actual posting. A second problem would be that this relies on the non-standardness of the hack. The spam robot authors are presumably unaware of this hidden field, but if this hack becomes popular/official, it would be easy to circumvent, though at some cost as the robot would have to make two different requests from the server (cost to the robot, cost to your server, too). The nice thing about captcha methods is that they are simple Turing tests, and are relatively difficult to code for by the robot authors.
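A minimal sketch of the hidden-field idea, in Python rather than the PHP a WP version would need. This is not the Avert_Spam code: `make_token`, `is_valid`, and the server-side `SECRET` are hypothetical names, and mixing in a secret is my own assumption (a bare MD5 of the IP could be computed by the robot itself).

```python
import hashlib
import hmac

# Hypothetical server-side secret; not part of the hack as described.
SECRET = b"change-me"

def make_token(ip: str) -> str:
    """Value for the hidden form field, derived from the visitor's IP."""
    return hashlib.md5(SECRET + ip.encode("utf-8")).hexdigest()

def is_valid(ip: str, submitted: str) -> bool:
    """Run by the script that processes the form: recompute and compare."""
    expected = make_token(ip).encode("utf-8")
    return hmac.compare_digest(expected, submitted.encode("utf-8"))

# Token written into the form when the page is served.
token = make_token("203.0.113.7")

print(is_valid("203.0.113.7", token))  # same IP on view and post
print(is_valid("203.0.113.8", token))  # proxy rotated the IP in between
print(is_valid("203.0.113.7", ""))     # robot invoking the script directly
```

The second call is the AOL-style proxy drawback: a legitimate commenter whose IP changes between loading the form and posting fails the check just like a robot does.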

Sigg3 Says:
January 24th, 2005 at 1:56 pm
Yup, can see that’n.
Still, with the md5 variable and a word verify (like yours) I haven’t gotten _any_ spam at all…

Cheng Says:
January 24th, 2005 at 5:43 pm
I haven’t gotten any spam with just the captcha, and I don’t believe I’m keeping people behind multiple-IP proxies from commenting.

CJC.org Says:
February 1st, 2005 at 7:35 am
Trackback Spam
Well, I got bombed last night with some 300 trackbacks advertising an online poker site. WordPress handles trackbacks processing differently from comments processing (even though they all wind up in the same database table), so this spam robot escaped…