Steve's Blog: Adventures in broken apps

Its been a week of frustration, but also success. We always have a perennial problem of broken apps, which work ok on the average home network, but as soon as you try to secure things you hit problems. Although there is a good argument for keeping school networks fairly permissive, realistically schools can't just turn off their firewall and let everyone at it - not only would the school be failing in their duty of care, but it would be a security nightmare too.

We're frequently tasked with getting some badly behaved app to work on a school network. Unfortunately when something works elsewhere but breaks when connected to the school network, the firewall/web filter is often regarded by staff to be "broken", even when we can clearly see that the app is the one doing broken things. Still, we like to keep our customers happy and be as helpful as possible.

We routinely spend a lot of time diagnosing problems and sending debugging to the app vendors and there are a few vendors that are thankful for our input and will work with us to improve their software. However, I think its fair to say that the vast majority of app vendors are completely uninterested in fixing bugs in their software. This attitude is unfortunately prevalent across all kinds of suppliers - from small suppliers, right up to the likes of Microsoft, Apple and Google. In fact, we no longer submit bug reports to Apple because collecting data for them uses a huge amount of our engineering time and they have never fixed any of the bugs we've reported. A good example of this recently was CloudMosa, who responded within 24 hours of our bug report, explaining that they weren't going to fix the bugs that we reported in their Puffin Academy app. WhatsApp have been similarly unhelpful with problems we reported, stating "WhatsApp is not designed to be used with proxy configurations or restricted networks, and we cannot provide support for these network configurations." What a cop-out!

So with the app developers refusing to properly support their own software, our customers have nowhere left to turn and it is often down to us to do our best to work around the flaws in the applications. A lot of this comes down to collecting as much information as possible about each connection so we can automatically turn security features on and off on our systems to work around incompatibilities.

We do things like snooping on TLS handshakes - when a device sets up an encrypted connection, it is supposed to include information such as a "server name indication" and we can spot that and use the presented server name to decide whether or not its safe to intercept and analyse the encrypted data. Some apps aren't compatible with interception, so when we see known problem apps we avoid intercepting their connections. Unfortunately, every so often you find an app that doesn't bother to include this data, and there's no way for the system to know if it's ok to intercept the connection.

In the second half of this week we developed some new code for the web filter to actively callout to the remote server in these situations. When the web filter sees an encrypted connection that has no server name indication, it can now connect to the web server, retrieve the certificate and use information in it to figure out what to do next. We're expecting this to help a lot with the problem apps. The results of each callout are cached to reduce the impact on the web servers. This is currently going through testing to make sure it won't cause any problems,

Another frustration has always been Skype - this has always been a real pain to make it work reliably and securely. We've spent a lot of time this week pouring over network traffic dumps and testing. There are numerous problems with the Skype protocol, which boil down to:

It makes connections on TCP port 443 (and therefore look like HTTPS), which aren't actually HTTPS, or even TLS. These connections can go to any IP address, so we can't trivially poke holes in the firewall for them. They get picked up by the transparent proxy, which treats them as encrypted HTTPS connections and therefore fails to handle them since they aren't actually HTTPS.
It makes real HTTPS connections carrying WebSockets requests. Unfortunately we don't yet support WebSockets and as Skype doesn't bother to include a server name indication we can't pre-emptively decide not to intercept them.
It sends peer-to-peer voice and video media over UDP using any port numbers between 1024 and 65535. Since it's peer-to-peer, this traffic can be directed at any IP address on the internet. Official advice is to just allow that through your firewall - if you do that you may as well not even bother to have a firewall in the first place!
All of Skype's traffic is encrypted so it's almost impossible to figure out what it's actually trying to do and what went wrong when it fails.
If something goes wrong, Skype just breaks in one way or another and provides no indication what actually went wrong. The Android version of Skype could output some debugging data to Android's standard debugging log, but it doesn't. The PC version of Skype can be told to produce a debug log, but the log is encrypted so that only Microsoft developers can read it (gee, thanks for nothing Microsoft!)

Fortunately, it turns out that the not-HTTPS-that-looks-like-HTTPS (1) traffic isn't needed if it can successfully set up the peer-to-peer UDP connections (3), so we think we can ignore that problem.

It doesn't actually seem to matter too much if the WebSockets connections (2) fail, but this should be handled by the web filter's new TLS callouts system described above.

So we're left with the UDP traffic, which can be to an IP address on any port (3). This one is a real problem - blindly allowing all of this traffic would also allow a whole load of other stuff such as VPNs, games, etc. So we've been playing with the nDPI deep packet inspection library and nDPI-Netfilter.

Normally, firewalling is done based on just the information in the packet headers, such as the source and destination addresses. Deep packet inspection examines all of the data associated with the connection, including the payload, in an attempt to identify what protocol is being used. We seem to have got this working pretty reliably now. The sticking point is that the deep packet inspection system needs to see a few packets before it can identify the protocol - usually you'd allow or refuse the connection immediately, but for DPI to work you have to allow all connections for a while and then terminate any that you don't want to allow. We're finding that allowing the first 10 kilobytes seems to work reasonably well - after that we chop any connections that haven't been identified as Skype.

Of course, all this was massively complicated by the fact that, unbeknown to us, Skype had a bug which made video unreliable - we found that out on Wednesday when Microsoft released a new version to address the problem. But not before spending a lot of time trying to figure out what was going wrong (did I mention that Skype problems are almost impossible to debug because absolutely everything, including the debug log, is encrypted so you can't examine it?)

The original intention was to implement deep packet inspection in the new firewall system which we are developing, but by popular demand we've backported this to the existing firewall. There is currently no user interface to set up the Skype DPI rules, but we can manually set them up for customers on demand for the time being.

Anyway, a moderately successful week - we're still testing the Skype rules, but they should be available Real Soon Now™.

Steve's Blog

Friday, 15 July 2016

Adventures in broken apps

No comments:

Post a Comment