We're currently doing a big overhaul of our Iceni Server product. This is basically a pluggable architecture that allows us to offer various modules, such as web filtering, mail serving/filtering, telephone exchanges, etc.
Today's job has been to work on the logging subsystem within the web filter. Up until now, the web filter has logged information about web accesses to a text file, and then we have a little daemon sitting there reading the text log and stuffing the data into a PostgreSQL database so that it can be searched and analysed. One of the efficiency improvements I wanted to make was to get rid of the daemon and have the web filter process talk to the database directly.
This isn't entirely straightforward. For one thing, the web filter is massively multithreaded, and the old text file system worked by having each thread do its own log writing. That wasn't going to work for a database - communication with the database has more latency than writing to a file, and we could end up with a massive locking overhead slowing the whole thing down. So much of the existing logging code has been reorganised: the individual threads now just add log entries to an internal queue, and a single dedicated thread handles both writing the traditional text file and talking to the database. I got all that stuff working yesterday, and it seems to be pretty good.
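To illustrate the queue-plus-single-writer arrangement, here is a minimal Python sketch. It is not the web filter's actual code: the `LogWriter` class, its method names and the idea of passing the outputs in as plain callables ("sinks") are all assumptions made for the example.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to exit


class LogWriter:
    """Hypothetical single-writer logger: many producers, one consumer."""

    def __init__(self, sinks):
        # sinks: callables taking one log entry, e.g. a text-file writer
        # and a database writer (both are stand-ins in this sketch).
        self._queue = queue.Queue()
        self._sinks = list(sinks)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, entry):
        # Called from worker threads; only enqueues, so it never waits
        # on file or database I/O.
        self._queue.put(entry)

    def close(self):
        # Entries queued before the sentinel are still written out.
        self._queue.put(_STOP)
        self._thread.join()

    def _run(self):
        # The only thread that ever touches the sinks.
        while True:
            entry = self._queue.get()
            if entry is _STOP:
                break
            for sink in self._sinks:
                sink(entry)
```

A worker thread would just call `writer.log(entry)` and carry on. Because only one thread ever touches the sinks, the database connection needs no locking of its own; the queue is the only shared structure.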
A second problem, one we've already had with the old system, is that there's a *lot* of log data going into the database - a quick check on a server at one of my busier customers shows the database is over half a terabyte. With this number of records, things can get decreasingly speedy, so we implemented database partitioning quite a long time ago to limit the amount of data in each table. This works by having a set of master tables that remain permanently empty, and then every so often creating a bunch of sub-tables using PostgreSQL's inheritance system and adding check constraints to them. This massively speeds up both inserts and searches. So I've been porting the database partitioning idea to the new logging code. This has taken some thought in order to deal with a few corner cases that have caught us out before - things like when the server clock is wrong and gets changed to an earlier time, which means the new database records start failing some of the check constraints on the live tables.
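As a rough sketch of the partitioning scheme, the Python below generates the DDL for one sub-table that inherits from an empty master table and carries a check constraint on its date range. The table name `web_log`, the column name `access_time` and the weekly interval are all assumptions for illustration; the real schema and partition period aren't shown here. Picking the sub-table from each record's own timestamp, rather than from "whichever table is currently live", is one possible way to cope with the clock being set backwards, not necessarily the approach taken in the real code.

```python
from datetime import datetime, timedelta


def week_start(day):
    """Return the Monday of the week containing the given date."""
    return day - timedelta(days=day.weekday())


def partition_name(master, ts):
    """Name of the sub-table that should hold a record stamped ts."""
    return f"{master}_{week_start(ts.date()):%Y%m%d}"


def partition_ddl(master, ts, column="access_time"):
    """Build the CREATE TABLE statement for the sub-table covering ts.

    The master table stays empty; each sub-table inherits its columns
    and is restricted to one week by a check constraint.
    """
    start = week_start(ts.date())
    end = start + timedelta(days=7)
    name = partition_name(master, ts)
    return (
        f"CREATE TABLE IF NOT EXISTS {name} (\n"
        f"    CHECK ({column} >= DATE '{start}' AND {column} < DATE '{end}')\n"
        f") INHERITS ({master});"
    )


if __name__ == "__main__":
    # A record stamped mid-week lands in that week's sub-table.
    print(partition_ddl("web_log", datetime(2024, 3, 6, 14, 30)))
```

With this arrangement, a record whose timestamp falls in an earlier week (say, after the clock has been corrected backwards) maps to the earlier sub-table's name, so it never gets inserted into a table whose check constraint it would fail.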
Not quite got that all finished today, but there's not much left on the logging side of things so that should get done tomorrow...