Creating images on the fly (human verification)

Posted by admin on August 05, 2004

OK, so yesterday I implemented all the logic behind an anti-spam system for pybloxsom
comments. The only thing missing was generating the images on the
fly, showing the secret number in a way that humans can read while
making it weird enough to confuse an OCR system.

So, I spent some time investigating PIL, the Python Imaging Library, a set of Python modules to manipulate and create images.

Basically, what I do is create a small image using PIL and draw the number and a small grid on it in two very close shades of grey. You can
have a look at the result below on this page.

To display the image, I just use a Python CGI script that generates it from the parameter passed to it, like this:

<img src="/snumber.png?hash=eef334ab8..." />

hash being the filename I was talking about yesterday.

Generating the image is pretty simple. The code looks like this:

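from PIL import Image, ImageDraw, ImageFont

# fontPath, fontSize, imageSize, bgColor, gridInk, fontInk, xstep and ystep
# are module-level settings defined elsewhere in the script.

def generateImage(number):
    font = ImageFont.truetype(fontPath, fontSize)
    im = Image.new("RGB", imageSize, bgColor)
    draw = ImageDraw.Draw(im)

    xsize, ysize = im.size

    # Do we want the grid to start at 0,0 or do we want some offset?
    x, y = 1, 1

    # Newer PIL/Pillow releases no longer have setink(), so the ink is
    # passed to each drawing call as fill= instead.
    # Vertical grid lines
    while x <= xsize:
        draw.line(((x, 0), (x, ysize)), fill=gridInk)
        x = x + xstep
    # Horizontal grid lines
    while y <= ysize:
        draw.line(((0, y), (xsize, y)), fill=gridInk)
        y = y + ystep

    # The secret number goes on top of the grid
    draw.text((3, 2), number, font=font, fill=fontInk)

    return im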

which creates the image. The rest of the implementation consists of writing the image to stdout (the web), right after the HTTP headers.
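
That last step could be sketched roughly as follows. This is only an illustration under a few assumptions, not the actual snumber.png source: `write_png_response` is a made-up name, the image is assumed to be already encoded as PNG bytes (with PIL, saving the image into an `io.BytesIO` with `im.save(buffer, "PNG")` would produce them), and the output is assumed to be a binary stream such as `sys.stdout.buffer` on Python 3.

```python
import io


def write_png_response(png_bytes, out):
    """Write a minimal CGI response carrying a PNG image to a binary stream.

    Hypothetical helper for illustration; not part of the original script.
    """
    # The CGI headers come first; a blank line separates them from the body.
    out.write(b"Content-Type: image/png\r\n")
    out.write(b"Content-Length: " + str(len(png_bytes)).encode("ascii") + b"\r\n")
    out.write(b"\r\n")
    # Then the raw image bytes follow.
    out.write(png_bytes)
    out.flush()
```

In a real CGI script the call would be something like `write_png_response(buffer.getvalue(), sys.stdout.buffer)`, where `buffer` is the `io.BytesIO` the image was saved into. Writing bytes rather than text matters here, because any newline translation would corrupt the PNG data.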

Well, this is a little hack that does the trick. I believe this code can be further enhanced by, for example, creating a class for generating this
kind of image.

You can find the source code here for snumber.png. Just rename it to
a name of your choice and make it executable as a CGI. It should work.

This and yesterday’s work are a quick hack to avoid spam. It can surely be enhanced, and I intend to do so in the coming weeks, when I have free time. Feel free to send comments or tell me if you are using it on your system (and if it works). Suggestions, patches and criticism
are welcome ;-)

Adding human verification to comments

Posted by admin on August 04, 2004

I’ve been nailed by a bastard spammer for some time at another blog
I happen to run in Spanish. What he usually does is
try to create hundreds of comments pointing to some kind of
crappy web site I don’t intend to visit at all. All I know is that he
uses the words casino and grants a lot.

Investigating a bit, I learned he uses the UserAgent string AIRF,
which happens to be used by this program: roboform, a program for
automating the filling of forms, something of a paradise for spammers :-(

As it says in the FAQ, newer versions
disable the use of that UserAgent string, which makes it virtually
impossible to filter those comments from the web server's point of view. My
spammer must also use many open proxies, as all the requests seem to come
from different IPs. Very annoying.

While thinking about the different possibilities of castration ;-) if I
could grab the spammer, I thought I could implement a human verification
process for posting comments, the same way gmail and others do, using a
distorted image to display a number which you have to type manually into
the form. I am not a very experienced Python programmer, so I took it as
an exercise to see what I could come up with. I spent a whole evening
reading part of the pybloxsom code, especially the comments.py
plugin.

So this is what I did to implement the logic of the verification
(I have yet to think about the generation of the images, or let
others do it ;-)

I modified the comment-form.html template and added a couple of
variables, like this:

<input type="hidden" name="hash" value="$hash">
Secret Number ($snumber):
<input maxlength="5" name="snumber" size="6" type="text" value=""/>

So, the idea is this: each time an entry is displayed, I generate a
text file containing a random number, in a directory readable and
writable by the web server. The name of the file is an md5
hash followed by “.txt”. I use this hash in the form to pass it to the
routines in charge of processing the POST action, along with the number
the poster must have typed beforehand (supposedly read from the image,
although not at the moment). The POST routine gets the hash from the hash
variable, builds the file name and locates the file on the hard disk in
the directory it was written to which, by the way, is outside the web tree.
It can then retrieve the random number and compare it to what the
user typed. This way, I believe, I can tell whether there is a bot or a
human behind the comment.

if ("title" in form and "author" in form and "body" in form
        and "hash" in form and "snumber" in form):

    hash = form["hash"].value
    # Number stored on disk for this hash
    diskHash = getHashNumber(hash)
    try:
        form_snumber = int(form["snumber"].value)
    except ValueError:
        # The user did not enter a number
        form_snumber = -1
    if diskHash == form_snumber:
        # The numbers match: carry on and store the comment

This, plus a couple of auxiliary functions, does all the magic, and it
looks like it works :-)
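
The auxiliary functions are not shown in the snippet, so here is a rough idea of what they could look like. Treat it as a sketch under assumptions, not as the real plugin code: `create_challenge` and `get_hash_number` are invented names (the latter loosely mirrors `getHashNumber`, but takes the directory as an extra argument), the number is assumed to have five digits to match the form field, and the `secrets` module is a modern convenience that did not exist when this was written.

```python
import hashlib
import os
import re
import secrets

# Exactly what md5().hexdigest() returns: 32 lowercase hex characters.
_HASH_RE = re.compile(r"[0-9a-f]{32}")


def create_challenge(directory):
    """Store a random number in <directory>/<md5 hash>.txt.

    Returns (file_hash, number). Hypothetical helper for illustration.
    """
    number = 10000 + secrets.randbelow(90000)  # always 5 digits
    file_hash = hashlib.md5(secrets.token_bytes(16)).hexdigest()
    with open(os.path.join(directory, file_hash + ".txt"), "w") as handle:
        handle.write(str(number))
    return file_hash, number


def get_hash_number(file_hash, directory):
    """Return the number stored for file_hash, or None if there is none."""
    # The hash arrives from the form, so refuse anything that is not a
    # plain md5 hex digest before using it to build a file path.
    if not _HASH_RE.fullmatch(file_hash):
        return None
    try:
        with open(os.path.join(directory, file_hash + ".txt")) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable or corrupt file: there is no valid number.
        return None
```

The check on the hash is the one design choice worth pointing out: since the value is taken straight from a hidden form field, a crafted hash such as `../../something` could otherwise be used to read files outside the intended directory.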

As I said before, I am not an expert in Python, HTTP or web
programming, and I don’t know whether this approach has severe flaws,
whether it is a bottleneck or whether it is just fine (it looks like it works to me).
So I would really appreciate your comments on the topic, especially on
whether this is a valid starting point for building a more secure comment system in
pybloxsom.

I believe there are other aspects to work on, like the removal of old
files from the temp directory and the possibility, although highly
unlikely, of ending up with two identical file names. Another is the dynamic
generation of the images. I found some scripts, but in
perl [Sorry, the link
is in Spanish, but you get the idea].
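
The removal of old files, at least, looks easy enough to sketch. The following is only an assumption about how it might be done, with an invented function name (`remove_stale_files`) and an age limit chosen by the caller; nothing like it exists in the code yet.

```python
import os
import time


def remove_stale_files(directory, max_age_seconds, now=None):
    """Delete *.txt files in directory older than max_age_seconds.

    Returns the names of the removed files. Hypothetical helper.
    """
    if now is None:
        now = time.time()
    removed = []
    for name in os.listdir(directory):
        # Only touch the number files, nothing else in the directory.
        if not name.endswith(".txt"):
            continue
        path = os.path.join(directory, name)
        try:
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed.append(name)
        except OSError:
            # Another request may have deleted the file already; skip it.
            continue
    return removed
```

It could be called from a cron job, or every so often from the plugin itself, for example `remove_stale_files(directory, 3600)` to drop anything older than an hour.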

The source code is here.

There is a follow-up to this article at
http://notreally.org/blog/devel/Python/pybloxsomnospam2