Monday 21 December 2015

Finding memory over-usage

Memory leaks are a common problem when writing software - this is where the software has asked the operating system (OS) for a lump of memory, but has forgotten to tell the OS when it has finished using that memory.

Tracking down memory leaks can be a real pain, but software such as Valgrind Memcheck helps - Valgrind lets you run the software (very very slowly!) and keeps track of when memory is allocated and freed.  When the software exits, Valgrind gives you a summary of memory that still hasn't been freed.
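For example, a typical end-of-run check might look something like this (some_executable stands in for whatever program you're debugging):
valgrind --tool=memcheck --leak-check=full ./some_executable
When the program exits, Memcheck prints a leak summary listing the allocations that were never freed, along with the call stacks that allocated them.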

However, there is another problem very similar to a memory leak - memory that has been allocated but is then kept in use by your software for far longer than necessary.  Eventually the software will release the memory again, so technically this isn't a "leak", but the effect is very similar.  Since the software is holding onto memory allocations for a long time, it may require a lot more of the computer's memory than you would expect.  On shut down, the software would usually release all of these allocations, so the usual method of tracking down leaks (i.e. see what hasn't been released at the end) isn't going to work.
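One way to spot this kind of over-usage in the first place is simply to watch the process's resident memory while it runs.  For example (assuming a single running instance of some_executable):
watch -n 5 'grep VmRSS /proc/$(pidof some_executable)/status'
If the resident size keeps climbing even though the workload hasn't changed, it's worth digging into what the program is holding onto.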

But we can still use Valgrind, and rather than wait until the program exits we can get a list of currently allocated memory at any point while the program is running.  Obviously this includes everything that has been allocated, not just the "problem" allocations.  However, if the problem allocations amount to a very significant amount of memory, this tends to stick out like a sore thumb.

First of all, run the program to be debugged inside Valgrind with the --vgdb=full command-line argument:
valgrind --vgdb=full --track-fds=yes --tool=memcheck --leak-check=full --show-reachable=yes --leak-resolution=high --num-callers=40 ./some_executable

In another window, we can then fire up the gdb debugger:
gdb some_executable
At the gdb prompt, the following will connect to the running process and obtain the list of allocations:
target remote | vgdb
set logging file /tmp/allocations.txt
set logging on
monitor leak_check full reachable any
The memory allocations will be logged to /tmp/allocations.txt.  You can do this several times while the program is running and then compare the outputs. 
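For example, to take a second snapshot to a separate file (the file names here are just examples), back at the gdb prompt:
set logging off
set logging file /tmp/allocations2.txt
set logging on
monitor leak_check full reachable any
Then, from a shell, a simple diff shows what has changed between snapshots:
diff /tmp/allocations.txt /tmp/allocations2.txt
Allocations whose totals keep growing from one snapshot to the next are the ones worth a closer look.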

[Edit:  You can use vgdb directly, instead of gdb: "vgdb leak_check full reachable any"]
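For example, something like this should capture a snapshot straight to a file (add --pid=<pid> if you have more than one program running under Valgrind):
vgdb leak_check full reachable any > /tmp/allocations.txt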

Wednesday 2 December 2015

Debugging IPSEC

Today I had occasion to do some IPSEC debugging.  There were multiple tunnels between two endpoints (using Openswan's "leftsubnets={...}" and "rightsubnets={...}" declarations) and one of the tunnels was reported as being dead.  The remaining tunnels were carrying a lot of traffic and couldn't be shut down.
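For reference, a rough ipsec.conf-style sketch of that sort of setup (the connection name, authentication settings and the extra subnets here are made up):
conn office
    left=198.51.100.1
    leftsubnets={10.0.0.0/24 10.0.1.0/24}
    right=203.0.113.1
    rightsubnets={192.168.0.0/24 192.168.1.0/24}
    authby=secret
    auto=start
Each left/right subnet pairing gets its own tunnel (and its own pair of SPIs), even though they all run between the same two endpoints.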

Usually I'd ping some stuff on the other side of the VPN and tcpdump the connection to see if the pings are being encapsulated.  But with multiple tunnels between the same endpoints, just filtering the traffic by source and destination address doesn't really help - you'd end up capturing traffic for all of the tunnels.
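For example, the obvious filter (using the endpoint address and interface name from the example below; protocol 50 is ESP) would match the encapsulated traffic for every tunnel between the two hosts at once:
tcpdump -n -i internet 'host 203.0.113.1 and ip proto 50'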

Each tunnel has its own pair of identifiers (SPIs) - one for each direction.  So the first thing to do is run "ip xfrm policy show", and you get something like:
# ip xfrm policy show
src 10.0.0.0/24 dst 203.0.113.1
    dir out priority 2343 ptype main
    tmpl src 198.51.100.1 dst 203.0.113.1
        proto esp reqid 16389 mode tunnel
src 192.168.0.0/24 dst 10.0.0.0/24
    dir fwd priority 2343 ptype main
    tmpl src 203.0.113.1 dst 198.51.100.1
        proto esp reqid 16389 mode tunnel
src 192.168.0.0/24 dst 10.0.0.0/24
    dir in priority 2343 ptype main
    tmpl src 203.0.113.1 dst 198.51.100.1
        proto esp reqid 16389 mode tunnel
(The output will actually be much longer if you've got multiple tunnels).  From this we can extract the reqid for the tunnel we care about - 16389 in this case.
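If the output runs to many screens, something like this (adjusting the subnet to the tunnel you care about) narrows it down:
ip xfrm policy show | grep -A 3 'src 10.0.0.0/24'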

Now we do "ip xfrm state show":
# ip xfrm state show
src 203.0.113.1 dst 198.51.100.1
    proto esp spi 0x01234567 reqid 16389 mode tunnel
    replay-window 32 flag 20
    auth hmac(sha1) 0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    enc cbc(des3_ede) 0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src 198.51.100.1 dst 203.0.113.1
    proto esp spi 0x89abcdef reqid 16389 mode tunnel
    replay-window 32 flag 20
    auth hmac(sha1) 0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    enc cbc(des3_ede) 0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


(Again, the output will be much longer than this).  Now look up the reqid and make a note of the associated SPIs (0x01234567 and 0x89abcdef).
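Again, grep can dig these out of a long listing:
ip xfrm state show | grep -B 1 'reqid 16389'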


Finally, we can ask tcpdump to only show us traffic that matches those SPIs.  The SPI is the first four bytes of the ESP header, and with no IP options the ESP header starts at offset 20 in the IPv4 packet - which is what the ip[20:4] expression picks out:
# tcpdump -n -i internet 'ip[20:4] = 0x01234567 or ip[20:4] = 0x89abcdef'
In this case I found that the encapsulated traffic was being transmitted ok, but we weren't receiving any from the other end.  Turns out the other side's firewall was dropping their traffic.