author: melinte
title: On memory leaks and finding them
The GNU C library comes with built-in functionality to help detect memory issues. What follows is a look at one such facility, mtrace(): how to use it and how to build upon it.
The malloc implementation in the GNU C library provides a simple but powerful way to detect memory leaks and to gather enough information to locate where they occur, with only a minimal speed penalty for the program.
Getting started is as simple as it can be:
    #include <mcheck.h>
    ...
    21 mtrace();
    ...
    25 std::string* pstr = new std::string("leak");
    ...
    27 char *leak = (char*)malloc(1024);
    ...
    32 muntrace();
    ...
Under the hood, mtrace() installs four hooks for malloc(), free(), realloc() and memalign(). The information collected through the hooks is written to a log file.
Note: there are other ways to allocate memory, notably mmap(). These allocations will not be reported, unfortunately.
Next, run the program with MALLOC_TRACE pointing at a log file, then feed that log to the mtrace script:
    $ MALLOC_TRACE=logs/mtrace.plain.log ./dleaker
    $ mtrace dleaker logs/mtrace.plain.log > logs/mtrace.plain.leaks.log
    $ cat logs/mtrace.plain.leaks.log

    Memory not freed:
    -----------------
       Address     Size     Caller
    0x081e2390      0x4  at 0x400fa727
    0x081e23a0     0x11  at 0x400fa727
    0x081e23b8    0x400  at /home/amelinte/projects/articole/memtrace/memtrace.v3/main.cpp:27
One of the leaks (the malloc() call) was traced to the exact file and line number. The other two leaks, which originate at line 25, were detected, but the report does not tell us where they occur: the two memory allocations for the std::string are buried deep inside the C++ library. We would need the stack trace for these two leaks to pinpoint the place in our code.
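As an aside, the summary is derived from the raw trace log by pairing every allocation record with a later free of the same address. The sketch below illustrates that matching; it is not the actual mtrace script. It assumes each record starts with `@`, followed by a location and then `+ addr size` (malloc), `- addr` (free), `< addr` (realloc, old block) or `> addr size` (realloc, new block). The function name `find_leaks` is made up for this example.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Returns address -> size for every block that was allocated but never freed.
// Assumes well-formed hexadecimal fields; std::stoull throws otherwise.
std::map<std::uint64_t, std::uint64_t> find_leaks(std::istream& log)
{
    std::map<std::uint64_t, std::uint64_t> live;
    std::string line;
    while (std::getline(log, line)) {
        // Split the record on whitespace.
        std::istringstream in(line);
        std::vector<std::string> tok;
        for (std::string t; in >> t; ) tok.push_back(t);

        // Only "@ ..." lines are records; skip "= Start", "= End" and the like.
        if (tok.empty() || tok[0] != "@") continue;

        // Find the operation marker; the block address follows it.
        for (std::size_t i = 1; i + 1 < tok.size(); ++i) {
            const std::string& op = tok[i];
            if (op != "+" && op != "-" && op != "<" && op != ">") continue;

            const std::uint64_t addr = std::stoull(tok[i + 1], nullptr, 16);
            if (op == "+" || op == ">") {
                // New block: remember its size if the record carries one.
                if (i + 2 < tok.size())
                    live[addr] = std::stoull(tok[i + 2], nullptr, 16);
            } else {
                // "-" frees a block, "<" retires the old block of a realloc.
                live.erase(addr);
            }
            break;
        }
    }
    return live;
}
```

Feeding it an open `std::ifstream` on the file named by MALLOC_TRACE yields the same address and size pairs as the "Memory not freed" table, without the caller column.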
We can use GDB to get the allocations' stacks:
    $ gdb ./dleaker
    ...
    (gdb) set env MALLOC_TRACE=./logs/gdb.mtrace.log
    (gdb) b __libc_malloc
    Make breakpoint pending on future shared library load? (y or [n])
    Breakpoint 1 (__libc_malloc) pending.
    (gdb) run
    Starting program: /home/amelinte/projects/articole/memtrace/memtrace.v3/dleaker
    Breakpoint 2 at 0xb7cf28d6
    Pending breakpoint "__libc_malloc" resolved

    Breakpoint 2, 0xb7cf28d6 in malloc () from /lib/i686/cmov/libc.so.6
    (gdb) command
    Type commands for when breakpoint 2 is hit, one per line.
    End with a line saying just "end".
    >bt
    >cont
    >end
    (gdb) c
    Continuing.
    ...
    Breakpoint 2, 0xb7cf28d6 in malloc () from /lib/i686/cmov/libc.so.6
    #0 0xb7cf28d6 in malloc () from /lib/i686/cmov/libc.so.6
    #1 0xb7ebb727 in operator new () from /usr/lib/libstdc++.so.6
    #2 0x08048a14 in main () at main.cpp:25    <== new std::string("leak");
    ...
    Breakpoint 2, 0xb7cf28d6 in malloc () from /lib/i686/cmov/libc.so.6
    #0 0xb7cf28d6 in malloc () from /lib/i686/cmov/libc.so.6
    #1 0xb7ebb727 in operator new () from /usr/lib/libstdc++.so.6    <== mangled: _Znwj
    #2 0xb7e95c01 in std::string::_Rep::_S_create () from /usr/lib/libstdc++.so.6
    #3 0xb7e96f05 in ?? () from /usr/lib/libstdc++.so.6
    #4 0xb7e970b7 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.6
    #5 0x08048a58 in main () at main.cpp:25    <== new std::string("leak");
    ...
    Breakpoint 2, 0xb7cf28d6 in malloc () from /lib/i686/cmov/libc.so.6
    #0 0xb7cf28d6 in malloc () from /lib/i686/cmov/libc.so.6
    #1 0x08048a75 in main () at main.cpp:27    <== malloc(1024);
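It would be good to have mtrace() itself dump the allocation stack and dispense with GDB. The modified mtrace() would have to supplement each log record with the stack trace of the allocation, with demangled function names and, ideally, with file and line information.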
Additionally, we can put the code in a library, so that the program itself no longer needs to be instrumented with mtrace() calls. In this case, all we have to do is interpose the library when we want to trace memory allocations (and pay the performance price).
Note: getting all this information at runtime, particularly in a human-readable form, will have a performance impact on the program, unlike the plain vanilla mtrace() supplied with glibc.
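A minimal sketch of such an interposable library follows. It only switches the stock tracing on and off around the program's lifetime and does not add the stack information discussed here. The `constructor` and `destructor` attributes are GCC/Clang extensions, and the names `start_tracing`, `stop_tracing` and `mtrace_sketch_active` are hypothetical.

```cpp
#include <mcheck.h>   // mtrace(), muntrace(): glibc-specific
#include <cstdlib>    // std::getenv

// Records whether the constructor below switched tracing on.
static bool g_tracing = false;

// Runs when the shared object is loaded, before main().
__attribute__((constructor)) static void start_tracing()
{
    // mtrace() does nothing without MALLOC_TRACE, so check for it up front.
    if (std::getenv("MALLOC_TRACE") != nullptr) {
        mtrace();
        g_tracing = true;
    }
}

// Runs when the shared object is unloaded or the process exits normally.
__attribute__((destructor)) static void stop_tracing()
{
    if (g_tracing) {
        muntrace();
        g_tracing = false;
    }
}

// Hypothetical query so that a program can tell whether the tracer is active.
extern "C" bool mtrace_sketch_active()
{
    return g_tracing;
}
```

Built with something like `g++ -shared -fPIC`, the resulting object can be interposed through LD_PRELOAD without touching the traced program. On recent glibc releases (2.34 and later, to the best of my knowledge) the tracing code lives in a separate libc_malloc_debug.so.0 that also has to be preloaded.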
A good start would be to use another API function: backtrace_symbols_fd(). This would print the stack directly to the log file. Perfect for a C program but C++ symbols are mangled:
    @ /usr/lib/libstdc++.so.6:(_Znwj+27)[0xb7f1f727] + 0x9d3f3b0 0x4
    **[ Stack: 8
    ./a.out(__gxx_personality_v0+0x304)[0x80492c8]
    ./a.out[0x80496c1]
    ./a.out[0x8049a0f]
    /lib/i686/cmov/libc.so.6(__libc_malloc+0x35)[0xb7d56905]
    /usr/lib/libstdc++.so.6(_Znwj+0x27)[0xb7f1f727]
    ./a.out(main+0x64)[0x8049b50]
    /lib/i686/cmov/libc.so.6(__libc_start_main+0xe0)[0xb7cff450]
    ./a.out(__gxx_personality_v0+0x6d)[0x8049031]
    **] Stack
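The call sequence behind a stack listing like this one can be sketched as follows. The helper name `dump_stack` and the depth of 32 frames are arbitrary choices for the example; backtrace() and backtrace_symbols_fd() are the glibc functions declared in execinfo.h.

```cpp
#include <execinfo.h>  // backtrace(), backtrace_symbols_fd(): glibc-specific
#include <unistd.h>    // STDERR_FILENO

// Writes the caller's stack to the given file descriptor, one frame per line.
// Returns the number of frames captured.
int dump_stack(int fd)
{
    void* frames[32];
    const int depth = backtrace(frames, 32);   // collect the return addresses

    // Unlike backtrace_symbols(), this variant does not call malloc(),
    // which makes it the safer choice from inside an allocation hook.
    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}
```

Calling `dump_stack(STDERR_FILENO)` prints the current stack. Symbol names from the executable itself only show up when it is linked with `-rdynamic`; otherwise those frames appear as bare addresses.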
For C++ we would have to get the stack (backtrace_symbols()), resolve each address (dladdr()) and demangle each symbol name (abi::__cxa_demangle()).
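The resolve and demangle steps could look roughly like the sketch below. It is not the code of the library used in this article; `demangle` and `describe` are hypothetical helper names, and the output format is only an approximation. dladdr() is a GNU extension (g++ defines _GNU_SOURCE by default), and older glibc versions need `-ldl` at link time.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle
#include <dlfcn.h>    // dladdr(), Dl_info
#include <cstdlib>    // std::free
#include <string>

// Returns the demangled form of a C++ symbol name, or the name unchanged
// when it is not a mangled name or cannot be demangled.
std::string demangle(const char* name)
{
    if (name == nullptr) return "??";
    if (name[0] != '_' || name[1] != 'Z') return name;   // plain C symbol

    int status = 0;
    char* text = abi::__cxa_demangle(name, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return name;

    std::string result(text);
    std::free(text);   // the buffer was allocated with malloc()
    return result;
}

// Describes one address as "symbol (object file)".
std::string describe(void* address)
{
    Dl_info info;
    if (dladdr(address, &info) == 0 || info.dli_fname == nullptr)
        return "?? (unknown)";

    // dli_sname is null when no exported symbol covers the address.
    const std::string symbol = info.dli_sname ? demangle(info.dli_sname) : "??";
    return symbol + " (" + info.dli_fname + ")";
}
```

For instance, `demangle("_Znwj")` gives `operator new(unsigned int)`. Applying `describe()` to each address returned by backtrace() produces one readable line per frame.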
Let's try again with our new library:
    $ MALLOC_TRACE=logs/mtrace.stack.log LD_PRELOAD=./libmtrace.so ./dleaker
    $ mtrace dleaker logs/mtrace.stack.log > logs/mtrace.stack.leaks.log
    $ cat logs/mtrace.stack.leaks.log

    Memory not freed:
    -----------------
       Address     Size     Caller
    0x08bf89b0      0x4  at 0x400ff727
    0x08bf89e8     0x11  at 0x400ff727
    0x08bf8a00    0x400  at /home/amelinte/projects/articole/memtrace/memtrace.v3/main.cpp:27
Apparently, not much of an improvement: the summary still does not get us back to line 25 in main.cpp. However, if we search for address 8bf89b0 in the trace log, we find this:
    @ /usr/lib/libstdc++.so.6:(_Znwj+27)[0x400ff727] + 0x8bf89b0 0x4
    **[ Stack: 8
    [0x40022251] (./libmtrace.so+40022251)
    [0x40022b43] (./libmtrace.so+40022b43)
    [0x400231e8] (./libmtrace.so+400231e8)
    [0x401cf905] __libc_malloc (/lib/i686/cmov/libc.so.6+35)
    [0x400ff727] operator new(unsigned int) (/usr/lib/libstdc++.so.6+27)    <== was: _Znwj
    [0x80489cf] __gxx_personality_v0 (./dleaker+27f)
    [0x40178450] __libc_start_main (/lib/i686/cmov/libc.so.6+e0)
    [0x8048791] __gxx_personality_v0 (./dleaker+41)
    **] Stack
This is good, but having file and line information would be better.
Here we have a few possibilities:
The third solution, resolving the addresses in-process with libbfd, could look something like:
    #define _GNU_SOURCE
    /* Newer binutils refuse to compile bfd.h unless these are defined. */
    #define PACKAGE "resolve"
    #define PACKAGE_VERSION "0"
    #include <stdio.h>
    #include <stdlib.h>
    #include <dlfcn.h>
    #include <execinfo.h>
    #include <signal.h>
    #include <bfd.h>
    #include <unistd.h>

    /* globals retained across calls to resolve. */
    static bfd* abfd = 0;
    static asymbol **syms = 0;
    static asection *text = 0;

    static void resolve(char *address)
    {
        if (!abfd) {
            char ename[1024];
            /* leave room for the terminator: readlink() does not add one */
            ssize_t l = readlink("/proc/self/exe", ename, sizeof(ename) - 1);
            if (l == -1) {
                perror("failed to find executable");
                return;
            }
            ename[l] = 0;

            bfd_init();
            abfd = bfd_openr(ename, 0);
            if (!abfd) {
                bfd_perror("bfd_openr failed");
                return;
            }

            /* oddly, this is required for it to work... */
            bfd_check_format(abfd, bfd_object);

            long storage_needed = bfd_get_symtab_upper_bound(abfd);
            if (storage_needed <= 0)
                return;   /* no symbol table, e.g. a stripped binary */
            syms = (asymbol **) malloc(storage_needed);
            if (!syms)
                return;
            bfd_canonicalize_symtab(abfd, syms);

            text = bfd_get_section_by_name(abfd, ".text");
        }

        /* an earlier call may have failed half way through the setup */
        if (!syms || !text)
            return;

        long offset = ((long)address) - text->vma;
        if (offset > 0) {
            const char *file;
            const char *func;
            unsigned line;
            if (bfd_find_nearest_line(abfd, text, syms, offset, &file, &func, &line) && file)
                printf("file: %s, line: %u, func %s\n", file, line, func ? func : "??");
        }
    }
The downside is that it takes quite a heavy toll on the performance of the program.