Sunday, 24 July 2016

Wifi doorbell project

We've got a wireless doorbell which is fairly unreliable.  Also, I can't hear it from my office, and the 433MHz transmitter built into the doorbell button doesn't have enough range for me to put a sounder in the office.  The unreliability problem could probably be fixed by just buying a new wireless doorbell, but that probably wouldn't help with the range problem.

The advent of the ESP8266 microcontrollers opened up a more interesting approach.  These are programmable microcontrollers with a built in wifi interface.  They are tiny and also only cost about a pound each.  So my plan was: build a new doorbell push from scratch, which talks to the wifi network.  The old doorbell has an electromechanical sounder, which I'm reusing by replacing the driver circuit.

Doorbell push

ESP-01 on stripboard
The first job was the doorbell push.  I'm using an ESP-01 module for this because it's tiny, and its running off a pair of AAA cells, which will provide a 3.0v supply when new.  In theory, I could use a buck-boost regulator to maintain a 3.3v supply to the ESP throughout the life of the batteries, but the quiescent current draw of the regulator would probably drain the battery fairly quickly since this circuit is going to spend most of its life asleep.  Although the ESP8266 is supposed to run on 3.3v, they are apparently ok down to around 2.3v, so I'm running it directly off the batteries with no regulator.  The ESP8266's wifi radio has some reasonably high power demands, so there's a 100µF capacitor across the supply rails to absorb any high current bursts which the batteries may not be up to supplying.

The theory of operation is as follows: the ESP-01 module will be put in deep sleep mode when idle.  The button is wired between the ESP's reset pin and ground, so when someone pushes the button to ring the bell, it pulls reset low and wakes the device from deep sleep.  The ESP-01 has an on board power LED which is always on and would drain the batteries quite quickly, so I've cut the track supplying power to that LED - the module now draws 20µA when in deep sleep mode, so I expect the batteries to last a while.  When the ESP resets, it immediately associates with the wifi network and makes two HTTP requests - one directly to the sounder (more on this later) and the other to my home server.  A php script on the server sends out "doorbell rang" instant messages to both myself and Mel via XMPP - this solves the problem of hearing the bell in the office, since the XMPP notification pops up on my workstation and phone.

Everything just fits!
The button I bought contains an LED ring which I'm using to give the person ringing the bell feedback - The ring lights up when the wifi connection has been established (including DHCP, etc.) and flashes after the HTTP requests have successfully completed.  There's a minor complication that the ESP-01 only makes 2 GPIO pins available, they both have to be pulled high when the ESP boots up, and in fact the ESP-01 module has integrated pullups for this reason.  So the LED is wired between the positive power rail and GPIO-0.  This results in inverted logic - to turn the LED on GPIO-0 has to be set low, to turn it off GPIO-0 has to be either high or in high impedance (input) mode.  Once the HTTP requests have completed and the LED has been flashed, the ESP goes back into deep sleep mode.

The web requests that the button makes include the current battery voltage, so I can figure out when to change the batteries.  Its now been in use for about 15 weeks and the reported battery voltage has fallen from about 3.182v to 3.053v, which I think is pretty reasonable.

The finished doorbell push On the door frame

Sounder

As mentioned, I'm re-purposing the old electromechanical sounder by replacing the old circuit board and driving the solenoid directly.  The old sounder was powered by 3 C cells.  Unlike the button, the sounder will need to remain connected to the wifi all the time, so batteries aren't really an option.  I've opted to using a USB power supply, and with no need for the batteries there's now a lot of space inside the sounder for my new circuit.

I'm using an ESP-12F in the sounder - it's a bit overkill really, but I wanted to have an output that didn't have to be pulled up on boot.  The mini-USB port is connected to an LM1117T-3.3 linear regulator to provide the 3.3v supply voltage.  The solenoid is driven by an RFD14N05L field effect transistor and I've included a fairly chunky capacitor to help with the high power requirements of the solenoid.  Of course there's also a reverse biassed diode in parallel with the solenoid to absorb the back-EMF (although no magic smoke was released before I bothered adding it).

The whole thing is soldered up onto stripboard and just pushed into the battery compartment of the sounder (having removed the battery contacts).

The sounder connects to the wifi network as soon as it is powered up and sits there waiting for an HTTP request.  Making a request to "/" returns a status page, requesting "/ring" causes it to fire the striker and make the classic "ding-dong".
The old electromechanical sounder with an ESP-12F installed

Problems to be aware of

Receiving a notification on your phone to say the doorbell is being rung when you're a few hundred miles away is infuriating because you know you're going to have to drive up to the postal depot to collect a parcel when you get home. :)

Source code

Source code for the project is available in Subversion:
https://subversion.nexusuk.org/projects/doorbell/

Friday, 15 July 2016

Adventures in broken apps

Its been a week of frustration, but also success.  We always have a perennial problem of broken apps, which work ok on the average home network, but as soon as you try to secure things you hit problems.  Although there is a good argument for keeping school networks fairly permissive, realistically schools can't just turn off their firewall and let everyone at it - not only would the school be failing in their duty of care, but it would be a security nightmare too.

We're frequently tasked with getting some badly behaved app to work on a school network.  Unfortunately when something works elsewhere but breaks when connected to the school network, the firewall/web filter is often regarded by staff to be "broken", even when we can clearly see that the app is the one doing broken things.  Still, we like to keep our customers happy and be as helpful as possible.

We routinely spend a lot of time diagnosing problems and sending debugging to the app vendors and there are a few vendors that are thankful for our input and will work with us to improve their software.  However, I think its fair to say that the vast majority of app vendors are completely uninterested in fixing bugs in their software.  This attitude is unfortunately prevalent across all kinds of suppliers - from small suppliers, right up to the likes of Microsoft, Apple and Google.  In fact, we no longer submit bug reports to Apple because collecting data for them uses a huge amount of our engineering time and they have never fixed any of the bugs we've reported.  A good example of this recently was CloudMosa, who responded within 24 hours of our bug report, explaining that they weren't going to fix the bugs that we reported in their Puffin Academy app.  WhatsApp have been similarly unhelpful with problems we reported, stating "WhatsApp is not designed to be used with proxy configurations or restricted networks, and we cannot provide support for these network configurations."  What a cop-out!

So with the app developers refusing to properly support their own software, our customers have nowhere left to turn and it is often down to us to do our best to work around the flaws in the applications.  A lot of this comes down to collecting as much information as possible about each connection so we can automatically turn security features on and off on our systems to work around incompatibilities.

We do things like snooping on TLS handshakes - when a device sets up an encrypted connection, it is supposed to include information such as a "server name indication" and we can spot that and use the presented server name to decide whether or not its safe to intercept and analyse the encrypted data.  Some apps aren't compatible with interception, so when we see known problem apps we avoid intercepting their connections.  Unfortunately, every so often you find an app that doesn't bother to include this data, and there's no way for the system to know if it's ok to intercept the connection.

In the second half of this week we developed some new code for the web filter to actively callout to the remote server in these situations.  When the web filter sees an encrypted connection that has no server name indication, it can now connect to the web server, retrieve the certificate and use information in it to figure out what to do next.  We're expecting this to help a lot with the problem apps.  The results of each callout are cached to reduce the impact on the web servers.  This is currently going through testing to make sure it won't cause any problems,

Another frustration has always been Skype - this has always been a real pain to make it work reliably and securely.  We've spent a lot of time this week pouring over network traffic dumps and testing.  There are numerous problems with the Skype protocol, which boil down to:
  1. It makes connections on TCP port 443 (and therefore look like HTTPS), which aren't actually HTTPS, or even TLS.  These connections can go to any IP address, so we can't trivially poke holes in the firewall for them.  They get picked up by the transparent proxy, which treats them as encrypted HTTPS connections and therefore fails to handle them since they aren't actually HTTPS.
  2. It makes real HTTPS connections carrying WebSockets requests.  Unfortunately we don't yet support WebSockets and as Skype doesn't bother to include a server name indication we can't pre-emptively decide not to intercept them.
  3. It sends peer-to-peer voice and video media over UDP using any port numbers between 1024 and 65535.  Since it's peer-to-peer, this traffic can be directed at any IP address on the internet.  Official advice is to just allow that through your firewall - if you do that you may as well not even bother to have a firewall in the first place!
  4. All of Skype's traffic is encrypted so it's almost impossible to figure out what it's actually trying to do and what went wrong when it fails.
  5. If something goes wrong, Skype just breaks in one way or another and provides no indication what actually went wrong.  The Android version of Skype could output some debugging data to Android's standard debugging log, but it doesn't.  The PC version of Skype can be told to produce a debug log, but the log is encrypted so that only Microsoft developers can read it (gee, thanks for nothing Microsoft!)
Fortunately, it turns out that the not-HTTPS-that-looks-like-HTTPS (1) traffic isn't needed if it can successfully set up the peer-to-peer UDP connections (3), so we think we can ignore that problem.

It doesn't actually seem to matter too much if the WebSockets connections (2) fail, but this should be handled by the web filter's new TLS callouts system described above.

So we're left with the UDP traffic, which can be to an IP address on any port (3).  This one is a real problem - blindly allowing all of this traffic would also allow a whole load of other stuff such as VPNs, games, etc.  So we've been playing with the nDPI deep packet inspection library and nDPI-Netfilter.

Normally, firewalling is done based on just the information in the packet headers, such as the source and destination addresses.  Deep packet inspection examines all of the data associated with the connection, including the payload, in an attempt to identify what protocol is being used.  We seem to have got this working pretty reliably now.  The sticking point is that the deep packet inspection system needs to see a few packets before it can identify the protocol - usually you'd allow or refuse the connection immediately, but for DPI to work you have to allow all connections for a while and then terminate any that you don't want to allow.  We're finding that allowing the first 10 kilobytes seems to work reasonably well - after that we chop any connections that haven't been identified as Skype.

Of course, all this was massively complicated by the fact that, unbeknown to us, Skype had a bug which made video unreliable - we found that out on Wednesday when Microsoft released a new version to address the problem.  But not before spending a lot of time trying to figure out what was going wrong (did I mention that Skype problems are almost impossible to debug because absolutely everything, including the debug log, is encrypted so you can't examine it?)

The original intention was to implement deep packet inspection in the new firewall system which we are developing, but by popular demand we've backported this to the existing firewall.  There is currently no user interface to set up the Skype DPI rules, but we can manually set them up for customers on demand for the time being.

Anyway, a moderately successful week - we're still testing the Skype rules, but they should be available Real Soon Now™.

Wednesday, 18 May 2016

Queen's Speech

I'm reading through the BBC's summary of the Queen's speech.  I have to say a lot of this seems a bit ill thought out...

Digital Economy Bill (UK-wide)

  • Every UK household will have legal right to a fast broadband connection
  • Minimum speed of 10Mbps to be guaranteed through Broadband Universal Service Obligation
  • Properties in the "remotest areas" may have to contribute to cost of installation
Starts off sounding ok, but then we get to the last point and I wonder how this is much different from the current situation.  Telcos are already building out fast broadband in non-remote areas, and if you're in a remote area you can already pay (sometimes a lot) to have fast broadband installed.

I wonder whether they are putting cost restrictions on that or accelerating the time scales, because on the surface this doesn't seem to change much...
  • Right to automatic compensation when broadband service goes down
Great, but who's going to be made to pay the compensation?  If ISPs utilising BT Wholesale's infrastructure are expected to hand out compensation for BT's problems, that seems grossly unfair unless BT are made to reimburse those ISPs.  BT needs to be incentive to fix problems with their network, rather than punishing the (often quite small) ISPs who rely on it.
  • Companies must get consent before sending promotional spam emails, with fines for transgressors
I really don't understand this one - we have already had exactly this law for 13 years, in the form of the The Privacy and Electronic Communications (EC Directive) Regulations 2003 ("PECR").  The problem is that the regulator rarely does much to enforce the law.  Passing a new law that basically says exactly what an existing law already says isn't going to help anything - either the regulator needs to be incentivised to take action against companies who are in breach of the regulations, or the recipients of spam need to be empowered to sue the spammers themselves.
The current situation is that recipients of spam who sue spammers have to argue that the spam has caused them a material loss which must be reimbursed rather than being able to impose a punitive fine.  Spammers will often argue in court that the financial cost of a single spam is trivial and the courts have sometimes agreed and let them off the hook.  This could easily be fixed by implementing legislation that awards the recipient a fixed amount per spam.
  • All websites containing pornographic images to require age verification for access
This strikes me as a waste of time and will end up much like the pointless cookie law.  Any website with user-submitted content (i.e. anything that *could* be pornographic) will implement an annoying popup age verification page.  And of course, anyone under age will just click through it anyway.

Education for All Bill (Mainly England only)

  • Powers to convert under-performing schools in "unviable" local authorities to academies
  • Goal of making every school an academy but no compulsion to do so
...Despite there being no evidence that academies do any better than other schools...

Counter-Extremism and Safeguarding Bill (England and Wales)

 We can't have an official government announcement without the usual "scare everyone shitless so they let us do what we like" section, but...
  • Ofcom to have power to regulate internet-streamed material from outside EU
How's that going to work?  Are Ofcom going to be able to force ISPs to censor traffic without a court order?  Sounds concerning.  To be clear, ISPs are already required to censor stuff if a court tells them to.  So this just sounds like its designed to short-circuit due process and being justified with "otherwise the terrorists will kill you all" and "think of the children" as usual.

Intellectual Property Bill (UK-wide)

  • Exempting lawyers and other professional advisers from liability for threatening legal action in certain cases
Allowing people to bully folks with empty threats sounds like a bad plan to me - we already see this stuff time and time again with US patent law, so removing penalties for doing so in the UK doesn't seem smart.  Call me cynical maybe this is because the government is run by lawyers?

Investigatory Powers Bill

  • Overhaul of laws governing how the state gathers and retains private communications or other forms of data to combat crime
  • Broadband and mobile phone providers will be compelled to hold a year's worth of communications data
  • Creation of new Investigatory Powers Commissioner
I'm not sure anything more needs to be said on this - the IP bill has been in discussion for ages and basically works by treating everyone in the UK as a criminal and spying on everything they do on the off-chance that a crime needs to be pinned on them later on.  The government has received an overwhelming amount of evidence from experts explaining why the bill is largely unworkable, will massively erode the liberties of law abiding citizens and undermine the British software industry, whilst doing precisely nothing to help the authorities stop criminals; But the evidence has been completely ignored and the bill steam-rollered through regardless, on a fast track designed to reduce debate.

Bill of Rights (Subject to consultation)

  • Plans for a British bill of rights to replace the Human Rights Act will be published in "due course" and subject to consultation
Replacing the Human Rights Act just scares the crap out of me.  The only reason for doing so is if you want to avoid giving some people their human rights...

Friday, 22 January 2016

Performance improvements

I've been doing quite a lot of work on improving the performance of our Iceni web filter.  An increasing number of schools are now getting internet connections exceeding 100Mbps, and there was a noticeable drop-off of performance at higher throughputs.

We had previously identified a number of areas which could have been acting as performance bottlenecks- amongst these were reduction of the number of memory copies by replacing linear buffers with ring buffers (using mmap() tricks to present the rings as linear memory) and replacement of some of the content analysis code.


However, in testing, we found that the CPU didn't seem to be the limiting factor.  Even when going flat-out, there was plenty of spare CPU time, and we just weren't seeing the performance we expected.  This was unexpected - we had thought that performance would be harmed by inefficiencies, but if that were the case, we'd expect to see all of the CPUs pegged.

Eventually this was narrowed down to a locking bug.  The software is multithreaded - this means that there are effectively multiple copies (threads) of the program running at the same time, all accessing the same data in memory.  The data is "locked" while a thread is accessing it so that another thread doesn't come along and change it.

Its safe to have lots of threads reading a piece of data at the same time, but we definitely don't want a thread to change that data while other threads are accessing it.  So threads can either make a "nonexclusive lock" or an "exclusive lock" - multiple threads can hold nonexclusive locks at the same time, but if a thread holds an exclusive lock, no other thread can acquire a lock (either exclusively or nonexclusively).

So when a thread asks for an exclusive lock, two things have to happen:
  1. It has to wait for all of the other locks (exclusive and nonexclusive) to be released.
  2. No new locks must be acquired by other threads while it waits.
In our code, when a thread is waiting for an exclusive lock to be acquired, it sets an "exclusive" flag, and once the lock has been acquired, that flag is cleared.  Threads trying to acquire nonexclusive locks check this flag and the nonexclusive locks are therefore inhibited.

The problem arose when multiple threads were waiting for exclusive locks at the same time - they would all set the "exclusive" flag, but the first one to successfully get the lock would clear it again, even though other threads were still waiting for exclusive locks.  Nonexclusive locks where therefore no longer inhibited, and the threads trying to get exclusive locks would be competing with them.  The result was that it could take an extremely long time (frequently hundreds of milliseconds) to acquire an exclusive lock!

The fix was simple - the "exclusive" flag was replaced with a counter, which is incremented when a thread is waiting for an exclusive lock and decremented again when it acquires the lock.

This fix has been rolled out to a number of customers and the user experience improvement has been striking - not only can significantly higher throughput be achieved, but the responsiveness of websites is noticably much better, even at low throughputs.

Going forward

We've already done a lot of the content analysis code improvements, although there are still some more to come.  The main thing now is replacing the linear buffers with ring buffers, which will reduce the amount of memory copying needed.  This is involving a rip-out and replace job on the ICAP protocol interface and finite state machine.  The new code is looking a lot neater and much easier to understand, and is giving us the opportunity to better optimise memory usage based on what we've learnt in the years since it was originally written.

The new work revolves around a neat ring buffer library I wrote a few months back - using mmap() tricks, the buffer is presented to the rest of the application as linear memory.  Not having to worry about data crossing the start/end of the ring simplifies things a lot, and this library has already been used in anger elsewhere in Iceni, so it can be treated as well tested.