The stupidity of spambots
Since day 1, I’ve been using content moderation for any comments posted to this site. This basically means I have to manually approve a comment before it shows on the site. I get a nice email as soon as someone leaves a comment, asking me to either approve or deny it. It does introduce a delay, but since almost nobody reads this site, and even fewer people leave a comment, it works for me, and it’s a failsafe method to keep the spam away.
What I don’t get is how stupid and inefficient the spambots are. If I were to build a spambot, I would add code that checks back at several points in time to see whether my spam actually got published. If it was, I would continue using the site; if it wasn’t published after a period X, I would remove the site from my datasource. Of course, you would check back for the first time after a minute or so, to make sure you don’t remove a working site just because some happy admin happened to be online and quickly removed the spam comments again. And of course, I would add a failure counter, and not remove the site from my datasource until it had failed to publish my spam N times in a row.
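The bookkeeping described above is small enough to sketch. This is only an illustration of the "N failures in a row" rule, not code from any real bot; the names `SiteRecord`, `record_check` and `max_failures` are made up for the example, and how the check itself is done is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    """One entry in the (hypothetical) datasource of sites."""
    url: str
    consecutive_failures: int = 0
    active: bool = True


def record_check(site: SiteRecord, published: bool, max_failures: int = 3) -> SiteRecord:
    """Update a site after checking whether the last post became visible.

    A success resets the counter; the site is only dropped once it has
    failed max_failures times in a row (the N from the text).
    """
    if published:
        site.consecutive_failures = 0
    else:
        site.consecutive_failures += 1
        if site.consecutive_failures >= max_failures:
            site.active = False
    return site
```

Resetting the counter on every success is what makes it "in a row": a single quick-fingered admin does not get a working site removed.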
Such a modification would make a spambot a lot more efficient, which would basically get me more bang for my buck. I would only publish spam to sites that are known to be good, or are first tries. Let’s say that in an hour I can spam 3600 sites (one per second). If I had a publish-success rate of 90+ percent, I could ask a lot more money to have the spam posted for my client than I could if I had no publish-success data at all, or a really low percentage. If 80% of the sites I was spamming to were using some kind of antispam technique that I could not bypass, whether it be human verification (impossible to break) or some kind of CAPTCHA, I would be spamming 2880 sites per hour knowing that the spam would not arrive. Sure, I could ignore that number and pretend I get a 100% success rate, but that doesn’t make the number any lower. Converted to minutes, I would be wasting 48 minutes out of every 60 trying something that is known to be impossible (whether it is for the time being or not).
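For anyone who wants to play with those numbers, here is the same arithmetic as a few lines of Python. The function name `wasted_effort` is just something picked for this sketch; the inputs are the approximations from the paragraph above (one post per second, 80% of sites blocking).

```python
def wasted_effort(posts_per_hour: int, blocked_percent: int) -> tuple[int, int]:
    """Return (wasted posts per hour, wasted minutes per hour).

    Integer arithmetic keeps the result exact for whole percentages.
    """
    wasted_posts = posts_per_hour * blocked_percent // 100
    wasted_minutes = 60 * blocked_percent // 100
    return wasted_posts, wasted_minutes


posts, minutes = wasted_effort(3600, 80)
print(posts, minutes)  # 2880 48
```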
Now give me the name of one businessman (or woman) who really wants to make money and would accept wasting time at such a rate. I can’t believe any of them would want to pay for that. Of course, my numbers are approximations, but just think of it: how many blogs do you still know that immediately publish your comment and don’t have any anti-spam technique at all?
It is generally said that spamming is a high-tech business. Always bleeding edge, always on the lookout for yet another way to get the message through. Whether the viewer wants to see it or not, you will make sure (s)he sees it, one way or the other. Yet on the other hand, efficiency is nowhere to be found. Of course, when you find that new bleeding-edge technique that will force your message to be read by billions, it will make you famous, but c’mon, it’ll last for a week at most. Eventually, the anti-spam community will come up with a way to block your technique, and a certain percentage of people will start using that anti-technique. Do you write off months of work after a week of success? This is not Formula 1; you want your technique to last longer than that. If you checked your success rate, you could be using your technique for months, maybe even years. There are always people around who don’t care (or don’t want to care) about using the latest anti-techniques, and who will either let you freely distribute your content or only remove it once it’s found by a human.
Hey, and on the other hand, you will make the life of the old-style admins a bit easier. You know, those guys/gals who manually moderate their message queues. They really get tired of seeing the same content coming in every week at an almost fixed time. And no, they won’t publish it because you are so consistent and persistent in trying…
~RW
Released: on Mar 18, 2008 under a Creative Commons Attribution-NoDerivs (CC-BY-ND) license

It may be that the publish rate is not the metric they are using. Maybe 20 minutes on a high-profile blog is worth more to them than 10 weeks on a low-profile blog.
You would think that this would apply to email spammers as well. Surely it would be worthwhile removing email addresses that bounce or ones that go to low click-rate people such as abuse@example.com and postmaster@example.com. These email addresses are not only likely to not result in any clicks on your spam but they are also likely to submit your spam to a clearing house, submit your IP address to a DNS block list and manually block the email from reaching all of the users of the system they administer.
Interestingly, I have also seen spammers repeatedly submitting to the wrong page on my blog, but because of the rewrite rules they were still getting an HTTP 200 response. I changed the rules so these guys got a 302 instead, and they immediately stopped attempting the spam. I think at least some spammers are looking at the HTTP response code to determine how successful they were at posting their spam. I wonder whether a 403 response would send them away, never to come back, or whether it would just make them try harder not to be detected.
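To make the idea concrete, here is a rough sketch of the decision on the server side, written as a plain Python function rather than as rewrite rules. It is an assumption about how one might do it, not my actual setup: `status_for_comment_post` and `known_post_paths` are invented names, and it shows the 403 variant I was wondering about rather than the 302 I actually used.

```python
from http import HTTPStatus


def status_for_comment_post(path: str, known_post_paths: set[str]) -> HTTPStatus:
    """Pick the status code for a comment submission.

    Submissions to a real comment endpoint get 200; anything else gets
    403 instead of being silently answered with 200 by a catch-all rule.
    """
    if path in known_post_paths:
        return HTTPStatus.OK
    return HTTPStatus.FORBIDDEN
```

The point is only that a bot watching the status code gets an honest "no" for a page that was never going to accept the comment.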
Maybe the spamming bots are written by one group of people and operated by another. The operators may not have the expertise to actually optimise their lists of blogs/emails and the writers don’t have any incentive to add this feature to their bots. It would be nice if they did however… then they’d stop spamming my blog after noticing that there hasn’t been one successful spam comment in the two years since I allowed comments.