Adding human verification to comments

Posted by admin on August 04, 2004

I’ve been nailed by a bastard spammer for some time at another blog I
happen to run in Spanish. What he usually does is try to create hundreds
of comments pointing to some crappy web site I don’t intend to visit at
all. All I know is that he uses the words “casino” and “grants” a lot.

Investigating a bit, I learned that he sends the User-Agent string AIRF,
which happens to be used by RoboForm, a program that automates filling
in web forms and is therefore something of a paradise for spammers :-(

As its FAQ says, newer versions no longer send that User-Agent string,
which makes it virtually impossible to filter those comments on the web
server side. My spammer must also be using many open proxies, since all
the requests seem to come from different IPs. Very annoying.
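For the older RoboForm versions that still send it, a server-side check could look like the minimal sketch below. It assumes a CGI-style environment where the request's User-Agent header is available as HTTP_USER_AGENT, and that the token "AIRF" appears somewhere in that header; the function name looks_like_roboform is hypothetical. As the FAQ point above implies, this catches nothing once the header is gone.

```python
# Hypothetical marker: the token older RoboForm versions put in the User-Agent.
SPAM_UA_TOKEN = "AIRF"


def looks_like_roboform(environ):
    """Return True if the request's User-Agent contains the AIRF token.

    `environ` is a CGI/WSGI-style mapping (for example os.environ under CGI).
    A missing HTTP_USER_AGENT is treated as an empty string, i.e. "not spam".
    """
    user_agent = environ.get("HTTP_USER_AGENT", "")
    return SPAM_UA_TOKEN in user_agent
```

Under CGI one would call `looks_like_roboform(os.environ)` before accepting a comment and answer with a 403 when it returns True.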

While pondering the various possibilities for castration ;-) should I
ever catch the spammer, I thought I could implement a human verification
step for posting comments, the same way Gmail and others do: a distorted
image displays a number that you have to type into the form by hand. I
am not a very experienced Python programmer, so I took it as an exercise
to see what I could come up with. I spent a whole evening reading part
of the PyBlosxom code, especially the comments.py plugin.

So this is what I did to implement the verification logic (I have yet
to think about generating the images, or maybe I'll leave that to
others ;-)

I modified the comment-form.html template and added a couple of
variables, like this:

<input type="hidden" name="hash" value="$hash">
Secret Number ($snumber):
<input maxlength="5" name="snumber" size="6" type="text" value=""/>

So, the idea is this: each time an entry is displayed, I generate a
text file containing a random number, in a directory the web server can
read and write. The file name is an MD5 hash followed by “.txt”. I pass
this hash through the form to the routines that process the POST, along
with the number the poster has typed (eventually read from the image,
but not yet). The POST routine takes the hash from the hash variable,
builds the file name, and finds the file on disk in the directory where
it was written, which, by the way, is outside the web tree. This way it
can retrieve the random number and compare it with what the user typed,
and, I believe, tell whether there is a bot or a human behind it.
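The generation step described above can be sketched roughly as follows, in modern Python 3. Several details are assumptions, not taken from the post: the function name create_challenge is hypothetical, the MD5 is computed over fresh random bytes (the post doesn't say what gets hashed), and the number always has five digits to fit the form's maxlength="5".

```python
import hashlib
import os
import secrets


def create_challenge(directory):
    """Write a random number to <directory>/<md5>.txt and return (hash, number).

    `directory` should be writable by the web server and live outside the
    web tree. Hashing random bytes means the file name reveals nothing
    about the number stored inside it.
    """
    number = 10000 + secrets.randbelow(90000)  # always five digits
    while True:
        digest = hashlib.md5(secrets.token_bytes(16)).hexdigest()
        path = os.path.join(directory, digest + ".txt")
        try:
            # O_EXCL makes creation fail instead of silently overwriting,
            # so two challenges can never end up sharing one file.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # extremely unlikely name clash: pick another hash
        with os.fdopen(fd, "w") as f:
            f.write(str(number))
        return digest, number
```

The returned hash would fill $hash in the template and, until images exist, the number would fill $snumber.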

if ("title" in form and "author" in form and "body" in form
        and "hash" in form and "snumber" in form):
    # Look up the number that was stored on disk under this hash
    hash = form["hash"].value
    diskHash = getHashNumber(hash)
    try:
        form_snumber = int(form["snumber"].value)
    except ValueError:
        # The user did not enter a number
        form_snumber = -1
    if diskHash == form_snumber:
        # The numbers match, so accept the comment (rest of the handler omitted)
        ...

This, and a couple of auxiliary functions make all the magic, and it
looks like it works :-)
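The auxiliary functions aren't shown in the post, so here is a hypothetical get_hash_number that does what the lookup paragraph describes: build the file name from the hash and read the number back. Two behaviours are my own additions, not the author's: the hash is validated before it is used in a path, since it arrives straight from the POSTed form and could contain "../"; and the file is deleted after reading, so a bot cannot reuse one solved hash/number pair forever.

```python
import os
import re

# An MD5 hex digest is exactly 32 lowercase hex characters.
_HASH_RE = re.compile(r"[0-9a-f]{32}")


def get_hash_number(directory, form_hash):
    """Return the number stored for `form_hash`, or None if there is none.

    The file is removed after a successful read, so each pair works once.
    """
    if not _HASH_RE.fullmatch(form_hash):
        return None  # reject anything that could escape `directory`
    path = os.path.join(directory, form_hash + ".txt")
    try:
        with open(path) as f:
            content = f.read().strip()
        os.remove(path)
    except OSError:
        return None  # unknown, already used, or already cleaned-up hash
    try:
        return int(content)
    except ValueError:
        return None
```

Returning None on any problem means a comparison such as `diskHash == form_snumber` simply fails, because None never equals an int.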

As I said before, I am not an expert in either Python or HTTP and web
programming, and I don’t know whether this approach has serious flaws,
whether it is a bottleneck, or whether it is just fine (it seems to work
for me). So I would really appreciate your comments on the topic,
especially on whether this is a valid starting point for building a more
secure comment system in PyBlosxom.

I believe there are other aspects to work on, such as removing old files
from the temp directory and the possibility, although highly unlikely,
of two files having identical names. Another is generating the images
dynamically. I found some scripts, but they are in Perl [sorry, the link
is in Spanish, but you get the idea].
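For the removal of old files, one simple option is to delete challenge files by age. This is a sketch under assumptions: the function name is hypothetical, and the one-hour default lifetime is an arbitrary choice. It could run from cron, or at the start of each request that generates a new challenge.

```python
import os
import time


def remove_stale_challenges(directory, max_age_seconds=3600, now=None):
    """Delete *.txt challenge files older than `max_age_seconds`.

    Returns the number of files removed. `now` defaults to the current
    time and can be passed in explicitly for testing.
    """
    if now is None:
        now = time.time()
    removed = 0
    for name in os.listdir(directory):
        if not name.endswith(".txt"):
            continue  # leave anything that isn't a challenge file alone
        path = os.path.join(directory, name)
        try:
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed += 1
        except OSError:
            pass  # file vanished meanwhile (e.g. a comment was just posted)
    return removed
```

Using the file's modification time avoids keeping any separate index of when each challenge was issued.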

The source code is here.

There is a followup of this article at
http://notreally.org/blog/devel/Python/pybloxsomnospam2
