I’ve been nailed by a bastard spammer for some time at another blog I happen to run in Spanish. What he usually does is try to create hundreds of comments pointing to some kind of crappy web site I don't intend to visit at all. All I know is that he uses the words casino and grants a lot.
Investigating a bit, I learned he uses the UserAgent AIRF string, which happens to be used by this program: roboform, a program for automating the filling of forms, somehow a paradise for spammers :-(
As it says in the FAQ, newer versions let you disable that UserAgent string, which makes it virtually impossible to filter those comments from the web server's point of view. My spammer must also use many open proxies, as all the requests seem to come from different IPs. Very annoying.
While thinking about the different possibilities of castration ;-) if I could grab the spammer, I thought I could implement a human verification process to post comments, the same way Gmail and others implement it, using a distorted image to display a number which you have to type manually into the form. I am not a very experienced Python programmer, so I took it as an exercise to see what I could come up with. I spent a whole evening reading part of the pybloxsom code, especially the comments.py plugin.
So this is what I did to implement the logic of the verification (I have yet to think about the generation of the images, or let others do it ;-)).
I modified the comment-form.html template and added a couple of variables, like this:
<input type="hidden" name="hash" value="$hash"> Secret Number ($snumber): <input maxlength="5" name="snumber" size="6" type="text" value=""/>
So, the idea is this: each time an entry is displayed, I generate a text file containing a random number, in a directory the web server can read and write. The file name is an MD5 hash followed by “.txt”. I put this hash in the form so that it reaches the routines that process the POST action, along with the number the poster must have typed beforehand (eventually read from the image, though for now it is shown as plain text). The POST routine gets the hash from the hash variable, builds the file name and looks the file up on disk in the directory where it was written which, by the way, is outside the web tree. It can then retrieve the random number and compare it with what the user typed. This way, I believe, I can tell whether there is a bot or a human behind the comment.
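The file-creation step described above could look roughly like the sketch below. This is not the code from the plugin: the function name `create_challenge`, the `CHALLENGE_DIR` location and the way the MD5 seed is built are all assumptions made for illustration.

```python
import hashlib
import os
import random
import tempfile
import time

# Hypothetical location outside the web tree; an assumption for this sketch.
CHALLENGE_DIR = os.path.join(tempfile.gettempdir(), "comment_challenges")


def create_challenge(challenge_dir=CHALLENGE_DIR):
    """Write a random number to <md5>.txt and return (hash, number)."""
    os.makedirs(challenge_dir, exist_ok=True)
    # Five digits, to match maxlength="5" in the form field.
    number = random.randint(10000, 99999)
    # Mix time, process id and a random value so names rarely collide.
    seed = "%s-%s-%s" % (time.time(), os.getpid(), random.random())
    digest = hashlib.md5(seed.encode("utf-8")).hexdigest()
    path = os.path.join(challenge_dir, digest + ".txt")
    with open(path, "w") as handle:
        handle.write(str(number))
    return digest, number
```

The returned pair would fill the `$hash` and `$snumber` template variables. Note that the `random` module is not meant for security purposes, so a hardened version would probably want a stronger source of randomness.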
    if form.has_key("title") and form.has_key("author") and \
       form.has_key("body") and form.has_key("hash") and \
       form.has_key("snumber"):
        hash = form["hash"].value
        diskHash = getHashNumber(hash)
        try:
            form_snumber = int(form["snumber"].value)
        except ValueError:
            # The user did not enter a number
            form_snumber = -1
        if diskHash == form_snumber:
This, and a couple of auxiliary functions make all the magic, and it looks like it works :-)
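For illustration, one of those auxiliary functions, `getHashNumber`, might be written along these lines. Only the name comes from the snippet above; the body, the extra `challenge_dir` parameter and the `CHALLENGE_DIR` default are my guesses at a reasonable shape, not the actual source.

```python
import os
import re
import tempfile

# Hypothetical location outside the web tree; an assumption for this sketch.
CHALLENGE_DIR = os.path.join(tempfile.gettempdir(), "comment_challenges")

# An MD5 hex digest is exactly 32 lowercase hexadecimal characters.
HASH_PATTERN = re.compile(r"\A[0-9a-f]{32}\Z")


def getHashNumber(hash_value, challenge_dir=CHALLENGE_DIR):
    """Return the number stored for hash_value, or None if there is none."""
    # The hash comes from the form, so refuse anything that is not a
    # plain digest (this keeps "../" tricks out of the file name).
    if not HASH_PATTERN.match(hash_value):
        return None
    path = os.path.join(challenge_dir, hash_value + ".txt")
    try:
        with open(path) as handle:
            number = int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None
    # Delete the file so the same hash/number pair cannot be replayed.
    os.remove(path)
    return number
```

Returning `None` rather than `-1` for a missing file matters here: the form code falls back to `-1` when the user types something that is not a number, so a `-1` from both sides would compare equal and let the comment through.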
As I said before, I am not an expert in Python, HTTP or web programming, and I don't know whether this approach has severe flaws, whether it is a bottleneck or whether it is just fine (it looks like it works to me). So I would really appreciate your comments on the topic, especially on whether this is a valid starting point to build a more secure comment system in pybloxsom.
I believe there are other aspects to work on, like the removal of old files from the temp directory and the possibility, although highly unlikely, of having two identical file names. Another is the dynamic generation of the images. I found some scripts, but in Perl [Sorry, the link is in Spanish, but you get the idea].
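As a rough idea of how the removal of old files could work, here is a small sketch. The function name `remove_stale_challenges` and the one-hour default are assumptions of mine; it could be called from a cron job or every so often from the plugin itself.

```python
import os
import time


def remove_stale_challenges(challenge_dir, max_age_seconds=3600):
    """Delete .txt files older than max_age_seconds; return how many."""
    cutoff = time.time() - max_age_seconds
    removed = 0
    for name in os.listdir(challenge_dir):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(challenge_dir, name)
        try:
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed += 1
        except OSError:
            # The file vanished in the meantime (another request); skip it.
            continue
    return removed
```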
The source code is here.
There is a follow-up to this article at [http://notreally.org/blog/devel/Python/pybloxsomnospam2](http://notreally.org/blog/devel/Python/pybloxsomnospam2)