Speeding up trac’s response time
I’ve been trying to speed up an installation of trac over the last few days. The web interface took ages to display each of the directories or files within the subversion repository, even though the repository wasn’t very big. The only recent change to the repository was that we started using a vendor branch, imported into our main repository using svm.
So, after a few hours trying different solutions and reading trac’s source code, I think I found where the bottleneck was. Well, it was sqlite (http://www.sqlite.org/download.html) that was causing it. Trac uses a CachedRepository object to access the repositories. Whenever we want to get the changesets, a function to synchronize the repository is called:
    class CachedRepository(Repository):
        def get_changeset(self, rev):
            if not self.synced:
                self.sync()
                self.synced = 1
            return CachedChangeset(self.repos.normalize_rev(rev), self.db,
                                   self.authz)
and that method, sync(), makes a call to:
    youngest_stored = self.repos.get_youngest_rev_in_cache(self.db)
which is all this:
    def get_youngest_rev_in_cache(self, db):
        """Get the latest stored revision by sorting the revision strings
        numerically
        """
        cursor = db.cursor()
        cursor.execute("SELECT rev FROM revision ORDER BY -LENGTH(rev), rev DESC LIMIT 1")
        row = cursor.fetchone()
        return row and row[0] or None
And that SQL query was taking around 1-2 seconds each time it was executed. It turned out that we were running an old version of sqlite and pysqlite, so a ./configure && make && make install following the recommended installation saved my day :-)
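If you want to check whether the same query is slow on your own setup, something like the following rough sketch may help. It is not trac code: it uses Python's standard sqlite3 module instead of pysqlite, and the one-column revision table, build_demo_db and youngest_rev are simplified stand-ins I made up for the example. It also shows why the query sorts by length first: the revisions are stored as strings, so a plain string sort would put '999' ahead of '1000'.

```python
import sqlite3
import time

# Same query as in trac's get_youngest_rev_in_cache()
QUERY = "SELECT rev FROM revision ORDER BY -LENGTH(rev), rev DESC LIMIT 1"


def build_demo_db(n_revs):
    """Hypothetical helper: an in-memory table with revisions '1'..'n_revs'.

    The real trac table has more columns; one is enough for this test.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE revision (rev TEXT PRIMARY KEY)")
    db.executemany("INSERT INTO revision (rev) VALUES (?)",
                   [(str(i),) for i in range(1, n_revs + 1)])
    db.commit()
    return db


def youngest_rev(db):
    """Run the query and return the youngest revision string, or None."""
    cursor = db.cursor()
    cursor.execute(QUERY)
    row = cursor.fetchone()
    return row[0] if row else None


db = build_demo_db(1000)

start = time.perf_counter()
rev = youngest_rev(db)
elapsed = time.perf_counter() - start

print("youngest revision: %s (%.6f seconds)" % (rev, elapsed))
print("sqlite library version: %s" % sqlite3.sqlite_version)
```

Pointing sqlite3.connect() at a copy of your real trac database instead of the in-memory one should give a more realistic timing.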
Hope this is useful to somebody if it gets indexed by Google.
Adding human verification to comments
I’ve been nailed by a bastard spammer for some time at
another blog I happen to run in Spanish. What he usually does is
try to create hundreds of comments pointing to some kind of
crappy web site I don’t intend to visit at all. All I know is that he
uses the words casino and grants a lot.
Investigating a bit, I learned he uses the UserAgent AIRF
string, which happens to be used by this program:
roboform, a program for automating
the filling of forms, somehow a paradise for spammers :-(
As it says in the
FAQ, newer versions
disable the use of that UserAgent string, which makes it virtually
impossible to filter those comments from a web server point of view. My
spammer must also use many open proxies, as all the requests seem to come
from different IPs. Very annoying.
While thinking about the different possibilities of castration ;-) if I
could grab the spammer, I thought I could implement a human verification
process to post comments, the same way
gmail and others implement it, using a
distorted image to display a number which you have to manually type in
the form. I am not a very experienced python programmer, so I took it as
an exercise to see what I could come up with. I spent a whole evening
reading part of the pyblosxom code, especially the comments.py
plugin.
So this is what I did to
implement the logic of the verification (I have yet to think about the
generation of the images, or let others do it ;-)
I modified the comment-form.html template and added a couple of
variables, like this:
    <input type="hidden" name="hash" value="$hash">
    Secret Number ($snumber): <input maxlength="5" name="snumber" size="6" type="text" value=""/>
So, the idea is this: each time an entry is displayed, I generate a
text file containing a random number, in a directory readable and
writable by the web server. The name of the file is an md5
hash followed by “.txt”. I use this hash in the form to pass it to the
routines in charge of processing the POST action, along with the number
the poster must have typed beforehand (supposedly from the image, though
not yet at the moment). The POST routine gets the hash from the hash
variable, constructs the file name and locates the file on disk in
the directory it was written to which, by the way, is outside the web tree.
It can then retrieve the random number and compare it to what the
user typed. This way, I believe, I can tell whether there is a bot or a
human behind the comment.
    if (form.has_key("title") and form.has_key("author")
            and form.has_key("body") and form.has_key("hash")
            and form.has_key("snumber")):
        hash = form["hash"].value
        diskHash = getHashNumber(hash)
        try:
            form_snumber = int(form["snumber"].value)
        except ValueError:
            # The user did not enter a number
            form_snumber = -1
        if diskHash == form_snumber:
This, and a couple of auxiliary functions, make all the magic, and it
looks like it works :-)
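To give an idea of what those auxiliary functions could look like, here is a minimal, self-contained sketch of the scheme described above. It is not the code from the plugin: create_challenge and get_hash_number are names made up for this example (the second one plays the role of getHashNumber), the directory is passed in explicitly, and deleting the file after the first lookup is my own assumption so that a hash can only be used once.

```python
import hashlib
import os
import random
import tempfile
import time

HEX_DIGITS = "0123456789abcdef"


def create_challenge(directory):
    """Store a random 5-digit number in <md5>.txt; return (hash, number)."""
    number = random.randint(10000, 99999)
    # The seed only has to make the file name hard to guess and unlikely
    # to repeat; it is not meant to be cryptographically strong.
    seed = "%s-%s-%s" % (time.time(), os.getpid(), random.random())
    digest = hashlib.md5(seed.encode("utf-8")).hexdigest()
    path = os.path.join(directory, digest + ".txt")
    # Mode "x" fails if the file already exists, so two identical
    # file names would raise an error instead of silently overwriting.
    with open(path, "x") as challenge_file:
        challenge_file.write(str(number))
    return digest, number


def get_hash_number(directory, digest):
    """Return the number stored for this hash, or None if there is none."""
    # Accept only something that looks like an md5 hex digest, so that a
    # forged form value cannot point to a file outside the directory.
    if len(digest) != 32 or any(c not in HEX_DIGITS for c in digest):
        return None
    path = os.path.join(directory, digest + ".txt")
    try:
        with open(path) as challenge_file:
            number = int(challenge_file.read().strip())
    except (OSError, ValueError):
        return None
    os.remove(path)  # assumption: each hash is valid for one POST only
    return number


# Usage: one call when the entry is displayed, one when the form is posted
challenge_dir = tempfile.mkdtemp()
form_hash, secret = create_challenge(challenge_dir)
print("hash=%s snumber=%d" % (form_hash, secret))
print("matches:", get_hash_number(challenge_dir, form_hash) == secret)
```

Returning None for a missing or malformed hash means it can never equal the -1 used above for a non-numeric answer, so a failed lookup always fails the comparison.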
As I said before, I am not an expert in Python, HTTP or web
programming, and I don’t know if this approach has severe flaws or not,
if it is a bottleneck or if it is just fine (it looks like it works to me).
So I would really appreciate your comments on the topic, especially on
whether this is a valid starting point to build a more secure comment
system in pyblosxom.
I believe there are other aspects to work on, like the removal of old
files from the temp directory and the possibility, although highly
unlikely, of having two identical file names. Another is the dynamic
generation of the images. I found some scripts, but in
perl [Sorry, the link
is in Spanish, but you get the idea].
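For the removal of old files, one possible approach is to delete every “.txt” file older than a given age, either from a cron job or at the start of each request. This is only a sketch under my own assumptions: remove_stale_challenges and its max_age_seconds parameter are hypothetical, and the one hour default is an arbitrary choice.

```python
import os
import tempfile
import time


def remove_stale_challenges(directory, max_age_seconds=3600, now=None):
    """Delete .txt files older than max_age_seconds; return their names."""
    if now is None:
        now = time.time()
    removed = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(directory, name)
        try:
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed.append(name)
        except OSError:
            # The file vanished or is not accessible; skip it
            continue
    return removed


# Usage: one fresh file and one that pretends to be two hours old
demo_dir = tempfile.mkdtemp()
old_file = os.path.join(demo_dir, "old.txt")
new_file = os.path.join(demo_dir, "new.txt")
for path in (old_file, new_file):
    open(path, "w").close()
two_hours_ago = time.time() - 7200
os.utime(old_file, (two_hours_ago, two_hours_ago))

removed = remove_stale_challenges(demo_dir)
print("removed:", removed)
```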
The source code is here.
There is a followup of this article at
http://notreally.org/blog/devel/Python/pybloxsomnospam2