Friday, 25 October 2013
LinkedIn Intro
I've just been pointed at http://www.bishopfox.com/blog/2013/10/linkedin-intro/ - who on earth thinks these things up?!
Monday, 7 October 2013
WTF Npower?!
Sorry, this is going to be a bit of a rant...
In November I switched my energy supply from Scottish Power to Npower using Uswitch. For whatever reason (whether that was Scottish Power's or Npower's fault, I don't know), it took until mid-January for the switch to actually happen.
I switched onto the Go Save tariff - this was a pretty straightforward tariff where you pay per kWh with no standing charge. Anyway, I received no further information from Npower beyond a letter telling me they had run out of "welcome packs" and would send me one in due course, which of course they didn't.
Seven months on I receive my first bill... telling me I had underpaid by about £400. There appeared to be two reasons for this: firstly, they had estimated that during the summer I would probably use about three times as much energy as I did in the winter (WTF? But that bit was easy to sort out). Secondly, they had apparently signed me up for the far more expensive "Go Save S" tariff, which charges about 55p/day as a standing charge.
So I phone them up to complain. Initially they tell me there's nothing they can do about it because they abolished all the no-standing-charge tariffs a few months ago, and that I should have received a letter telling me I would be switched to Go Save S... they then confirm that they never sent me that letter because they had already erroneously signed me up for Go Save S in the first place. I pointed out that they were committing fraud by charging me on a tariff I never agreed to; they insisted that since the tariff I agreed to doesn't exist any more, there's nothing they can do about it. I explained that the internal workings of their systems didn't concern me and didn't absolve them of fraud... so they logged a complaint and told me I would hear back in 10 days. They also agreed that I could cancel my direct debit so as not to be charged incorrect amounts.
It's now over a month since I made the complaint, and still no word about a resolution. I do, however, have a letter dated October 2nd 2013 telling me to "Relax, you don't need to do anything. Your new tariff takes effect from 22nd January 2013", detailing that I "will" be on the "Go Save" tariff (yay!) and showing that I "will" be paying the 55p/month standing charge (errm... huh?!). I also have a nastygram complaining that I had cancelled my direct debit (which, you might recall, I did with their agreement).
So another phone call to their complaints line...
"I was told I'd hear back in 10 days, why haven't I?"
"We're still investigating, we'll get back to you when we've finished."
"Why do I have this nutty letter dated October which tells me that I will be changing tariff last January?"
"That's just showing that we've backdated the tariff for you"
"Yes, but it's the wrong tariff..."
"Uhh..."
Also, they are still insisting that "it isn't fraud because the tariff you signed up for no longer exists"... Sorry, but agreeing to certain charges and then deciding to increase them and charging me *without telling me* is fraud.
Oh yes, and apparently I can't switch suppliers because the contract I signed up to (which they are changing the terms of without my agreement) has a 12 month term...
They have at least agreed to put my account on hold so that the non-payment (which they agreed to!) doesn't get passed to the debt collectors. We'll see if they manage that bit without screwing it up, since they can't seem to manage much else competently...
I can see this ending up in the small claims court. :(
As a bit of an aside, I do wonder how much legal weight these contracts have, given that I've never actually signed *anything* saying that I agree to Npower's T&Cs - I bunged my details into Uswitch, told them to switch my supply, and Uswitch sent me an email detailing the tariff. Npower have never sent me anything showing what tariff I'm on or asking me to sign a contract, so I wonder how they can legally enforce any of it? Would it stand up in court if I said "show me a contract with my signature on it proving I agreed to this"? I know they have no such contract...
Thursday, 26 September 2013
Problems with the CBL
We've recently started having a lot of problems with the Composite Blocking List (CBL). This is supposed to be a list of IP addresses that are known to be sending spam emails, so it's employed by people running email servers to automatically reject connections from these senders to reduce the volume of spam email they are getting. The contents of the CBL are also aggregated into other third-party block lists, such as Spamhaus's Zen list.
Block lists are a pretty fundamental part of most anti-spam systems, and in general they are a good idea. Unfortunately, the way IP addresses are added to the CBL seems very questionable to me - they run "honeypot" servers, and if you're caught connecting to one of these servers then you get added to the block list. This makes a lot of sense when the honeypot is just detecting people sending spam email. Unfortunately the CBL's honeypot also looks for people making web requests to it, and this is the problem.
I'll give an example of a couple of typical small office networks:
1. An email smarthost and web proxy are operated on the same server. All the workstations are firewalled off from direct internet access, so they have to use the (authenticated) smarthost to send email and the proxy to access the web.
or:
2. Everything on the network sits behind a single router that does NAT. The router is set to firewall off SMTP so the only machine that can send email is the mail server, but the workstations either have unrestricted web access, or go via a proxy server that also sits behind the same router.
In both of these example networks, the outgoing email and the web traffic come from the same IP address... And I'm sure the problem is immediately obvious: someone plugs a virus-infected machine into the network, it starts making web requests, the IP address ends up on the CBL, and suddenly no one in the office can send email. So anyone using the CBL to reject email is rejecting email from any network that has had a virus infection, irrespective of whether that infection could actually have sent spam. Anyone who has run a network for a while (especially one full of Windows laptops) knows that virus infections happen all the time.
Recently we seem to be having a lot of customers being hit by the ZBot virus, and ending up with all their email being blocked by people using the CBL because of this.
One solution is to move the email traffic onto a different IP address to the web traffic. In some cases this isn't too hard, but in others the customer may be using an ISP who will only provide them with a single IP address, so implementing this would mean changing ISP.
We could reconfigure everyone to use their ISP's email smarthosts for outgoing email, but we don't routinely do this because they add another possible point of failure, seem to be forever getting blacklisted themselves (since you're sharing them with potentially badly-behaved people), and in my experience ISPs frequently configure them in crazy ways that cause unexpected breakage.
I've been in contact with the CBL, suggesting that they could make a list of the honeypot domains available to us. This would allow us to set our customers' proxy servers to block the connections (avoiding being added to the CBL), and also automatically alert the administrator to the virus infection. Unfortunately they say they can't do this due to the rapidly changing nature of the honeypot servers. This seems like a solved problem to me though - they could easily distribute a rapidly changing list of domains using DNS.
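To illustrate what I mean: DNS block lists already solve the "rapidly changing data" problem with low TTLs, and a proxy could check each requested domain against such a zone before letting the request through. The zone name, return-code convention, and resolver interface below are all hypothetical - the CBL offers no such service - this is just a sketch of how little machinery it would need:

```python
# Hypothetical DNSBL-style lookup for honeypot domains. A proxy would
# query <requested-domain>.<zone> and treat a 127.0.0.x answer as
# "listed", following the usual DNSBL convention. Nothing here is a
# real CBL service - the zone name is made up for illustration.

HONEYPOT_ZONE = "honeypots.cbl.example"  # hypothetical zone

def honeypot_query_name(domain: str) -> str:
    """Build the DNS name a proxy would look up before allowing a request."""
    return f"{domain.rstrip('.').lower()}.{HONEYPOT_ZONE}"

def is_honeypot(domain: str, resolve) -> bool:
    """'resolve' is any callable returning an A-record string or None.

    A 127.0.0.x answer conventionally means 'listed', as with DNSBLs;
    NXDOMAIN (None here) means the domain is not a honeypot.
    """
    answer = resolve(honeypot_query_name(domain))
    return answer is not None and answer.startswith("127.0.0.")
```

With that in place, the proxy blocks the request (and flags the client machine as infected) whenever `is_honeypot()` returns true - no connection to the honeypot ever happens, so the office IP never gets listed.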
Friday, 20 September 2013
Shiny new templates!
As I've mentioned before, our Iceni server product is getting a big redesign at the moment - most of that work is centred around the user interface, but there are a few fundamental changes to the way some of the backend services work too.
Anyway, up until now I've been working to a UI design that I threw together - I'm completely unartistic, so that wasn't great, but it did allow me to get on with writing the new code. The new UI is built using the Smarty templating engine, so I expected to be able to apply a better design at a later date without too much trouble. I didn't realise how little work it would actually be though - it's only taken me about two days to completely replace the temporary interface with Sarah's new design.
She gave me the new design as a flat graphic, so the first job was to reproduce this as a web page, using HTML 5 and CSS. A web page needs to be able to re-flow all the text and graphics to fit different-sized web browsers, etc., which obviously the flat graphic prototype doesn't do, so it isn't a dead-simple job. It also needs to be flexible enough to allow the layout to adjust to display different types of data. It took about a day to figure out all the HTML and CSS, which I thought was pretty good going.

Today I've been going through the Iceni user interface code and replacing the old templates with the new design - a surprising amount of that is actually just a straight replacement. Of course, a few bits and pieces needed tweaking to make them fit in with the new stuff, but I'm honestly quite impressed that I've got all that done today.
Here's a sneak peek - there are a few missing graphics and some of the colours need fixing to properly integrate, but all in all I think it's come along well.
Tuesday, 17 September 2013
Filtering Google Searches
Google have now, seemingly without warning, started redirecting people to their HTTPS website. They've been doing this with logged-in users since forever, but now they've started sending people who aren't logged in to the encrypted version of the search engine too.
Today we caught wind of one local authority who have notified all their schools that they are no longer able to filter Google searches because of this change. This is interesting because none of this affects our filtering systems at all - our Iceni servers are able to force users into the unencrypted version of Google and therefore filter it. We can also turn SafeSearch on for all users automatically. In fact, our systems have been able to do this for years, which is quite reassuring - even though Google seems to have caught some filtering providers off guard, our software has proven robust in this respect and we've not had to do anything.
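The general shape of the technique is simple enough to sketch. A filtering proxy that sits in the traffic path can rewrite Google search requests onto HTTP (where they can be inspected) and pin SafeSearch on via the `safe` query parameter. This is only an illustration of the idea, not our actual Iceni implementation, and Google's URL parameters can of course change under us:

```python
# Sketch: rewrite a Google search URL so a proxy can filter it.
# Forces the scheme to http and sets safe=active (SafeSearch on).
# Illustrative only - not the real Iceni code, and Google's
# parameters are subject to change.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def rewrite_google_search(url: str) -> str:
    parts = urlsplit(url)
    if not parts.netloc.endswith("google.com") or parts.path != "/search":
        return url  # leave non-search traffic alone
    query = dict(parse_qsl(parts.query))
    query["safe"] = "active"  # force SafeSearch
    return urlunsplit(("http", parts.netloc, parts.path,
                       urlencode(query), parts.fragment))
```

In practice there's more to it (Google serve search from many country domains, and you have to stop clients simply going back to HTTPS themselves), but the rewrite itself is the easy part.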
Monday, 16 September 2013
New web filtering module
Today started with fixing a few bugs in the new web filter code: firstly, it wasn't always telling Squid to decrypt SSL connections when necessary, and secondly the overrides didn't appear to be working at all. Both of these were logic errors - the first one actually already existed in the old code too, although it doesn't seem to manifest itself there. The overrides problem was down to setting the initial filter state incorrectly - previously it hadn't shown up because we didn't aggregate the states of the individual filters until after we'd run all of them at least once (which set the states correctly), but now we can aggregate the states before running some of the filters, so the incorrect initial states broke things.
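The initial-state pitfall is generic enough to show in miniature. The state names and aggregation rule below are hypothetical stand-ins, not Iceni's actual code:

```python
# Miniature of the initial-state bug described above (hypothetical
# states and aggregation rule, not the real Iceni filter code).
from enum import Enum

class State(Enum):
    UNKNOWN = 0   # filter hasn't run yet
    ALLOW = 1
    BLOCK = 2

def aggregate(states):
    """BLOCK wins; any UNKNOWN means we can't decide yet."""
    if State.BLOCK in states:
        return State.BLOCK
    if State.UNKNOWN in states:
        return State.UNKNOWN
    return State.ALLOW

# If a filter's initial state were wrongly set to ALLOW instead of
# UNKNOWN, aggregating before that filter had run would yield a
# final ALLOW - so an override applied by that filter would appear
# not to work, which is exactly the symptom described above.
```

As long as aggregation only ever happened after every filter had run, the wrong initial value was always overwritten before anyone looked at it - which is why the bug stayed hidden in the old code.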
I've also been adding a new filtering module to the web filter - this will scan content for URIs and look them up in our URI categorisation database. The idea is that websites often link to similar sites, so aggregating some of the information we have about linked sites should allow better categorisation. Most of the web filter is very modular, and adding a new type of filter should be easy. However, the content filter code is quite old and the newer, more modular design has kind of been shoe-horned around it. Eventually that code will need refactoring to bring it in line with the rest of the filter, but that's a lot of work for very little gain, so I'm not going to get into doing that just yet. The new filtering module needs access to all the buffers the content filter currently maintains, so for now it is going to be integrated into that existing code. It's half-done now anyway, and happily extracting URIs from the page content, so it shouldn't be too hard to look those URIs up and feed that data into the categorisation engine.
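The URI-scanning part is conceptually simple; something along these lines, though the real filter works on streamed buffers rather than a complete page string, and this regex is a deliberately rough stand-in:

```python
# A simple take on the URI-scanning idea: pull absolute http(s) URIs
# out of page content so they can be looked up in a categorisation
# database. Illustrative sketch only - the real module works on the
# content filter's streamed buffers, not a complete string.
import re

URI_RE = re.compile(r"""https?://[^\s"'<>)]+""")

def extract_uris(content: str) -> list:
    """Return de-duplicated URIs in order of first appearance."""
    seen, out = set(), []
    for uri in URI_RE.findall(content):
        if uri not in seen:
            seen.add(uri)
            out.append(uri)
    return out
```

Each extracted URI then gets looked up in the categorisation database, and the categories of the linked sites feed into the score for the page being filtered.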
While all this development has been going on, we've also been trying to tackle an odd issue with Apple's "Find my iPhone" service. We've got customers using this service with no problem, but it just doesn't seem to be working on one customer's network. The odd thing is that we can see regular requests to gs-loc.apple.com from the "locationd" service on these devices, and they seem to be succeeding just fine. But whilst Find my iPhone says the devices are online, it says it has no location for them. It's quite frustrating, and of course Apple don't provide any information on their protocols, so debugging the issue is a case of reverse engineering it all. So that's another one to investigate some more tomorrow.
Tuesday, 10 September 2013
Auto-updates
As mentioned in the previous post, we currently use a system built on top of Subversion to deliver filtering-criteria updates to clients every night. This is being replaced by a PostgreSQL-backed system. We have quite a lot of criteria that the filters use to classify content, so it isn't sensible for all the clients to download the whole lot every night. Instead, we produce deltas - each update is tagged with a revision number, and when a client wants new data it tells us what revision it's already got and the server just sends the changes. Today I've been trying to figure out the best way to generate deltas from the new database tables.
Initially I set about trying to build SQL that would automatically generate a list of additions and deletions between any two revisions. However, this became unreasonably complex: we produce some of the filtering criteria in-house, and bolster them with data from some third parties. Internally, we have to keep the data from all these sources separate so that we can apply updates to them, but it all gets consolidated for the clients. The various sources can give us duplicate data, and ensuring the deltas behaved sensibly in the face of the duplicates was problematic.
So my second method to try involved running SQL queries to build tables of the "old" data and the "new" data, and then join them together in such a way as to produce the delta. This did work for small amounts of data, but once I loaded all of our standard filtering criteria into the database it ground to a halt and an analysis of how the database was handling the query suggested it was unfixable, so another idea was needed.
In the end, I've used two completely separate SQL queries - one to generate the "old" data and one to generate the "new" data. These are sorted by ID, and then the PHP code loops through the two result sets in parallel, comparing the IDs to see where records have been removed and added. Although I was hesitant to shift so much data between PostgreSQL and PHP, it does actually seem to work well. I suspect I could write some PL/pgSQL code to do the same job on the PostgreSQL side, but honestly I don't think it's necessary with the performance I've seen so far.
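The parallel walk is the classic sorted-merge pattern. Here's the shape of it (in Python rather than PHP, and over plain ID lists rather than database rows, so purely illustrative):

```python
# Sketch of the parallel walk described above: given two lists of
# record IDs sorted ascending, work out which IDs were added and
# which removed between the "old" and "new" query results.
# Illustrative only - the real code walks PostgreSQL result sets.

def delta(old_ids, new_ids):
    """old_ids and new_ids must be sorted; returns (added, removed)."""
    added, removed = [], []
    i = j = 0
    while i < len(old_ids) and j < len(new_ids):
        if old_ids[i] == new_ids[j]:
            i += 1; j += 1              # record unchanged, skip it
        elif old_ids[i] < new_ids[j]:
            removed.append(old_ids[i]); i += 1   # gone from "new"
        else:
            added.append(new_ids[j]); j += 1     # not in "old"
    removed.extend(old_ids[i:])         # trailing deletions
    added.extend(new_ids[j:])           # trailing additions
    return added, removed
```

Because both sides are pre-sorted by the database, this is a single linear pass - which is presumably why it performs so much better than asking the database to join the two revisions together itself.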
Now, the filtering criteria are bundled up into a gzipped XML file, which makes an initial "from-clean" download of around 8MB (obviously the deltas thereafter are much smaller). This is a lot better than the 40-odd MB download the old system does on a fresh install!
The next job is to add some extra types of data that we have to auto-update, which should be relatively trivial since I can just reuse the library code I've now written. Then I need to build the client-side updater to interface with all this. Once all that's done, I'll have our full filtering database on my test machine and can really start to test the new web filter code to make sure it's stable.