Monday, November 11, 2013

Enabling core dumps in Linux


It is useful to configure a Linux system to write core dumps when an application process crashes. It is not unusual for a badly written C/C++ application to crash and vanish silently. In such a case, to debug what happened, we would like to view the stack trace of the application at the point of the crash, which may give us a clue about what went wrong.

Many Linux distributions do not write core dumps to the file system by default, because the core file size limit is set to zero. This is usually done to prevent applications from writing huge core dumps of hundreds of GB, which would eat up valuable disk space. Although enabling core dumps on production systems is not recommended, it is beneficial for developers to have them enabled on development machines.

The question is how we turn it on. OK, it's quite simple.

Before that, let's check the current core dump limit. Enter the following command.
$ ulimit -c
0
You will probably get zero. This means the system doesn't write any core files when your application crashes. Let's test this with a sample program that is set to crash intentionally. Type the following into a file named Crash.cpp.
#include <iostream>

using namespace std;

class Crash
{
public:
 Crash() {
     cout << "Constructing " << endl ;
     p = new int ;
 }
 ~Crash() {
     cout << "Destructing" << endl ;
     delete p ;
 }
private:
 int *p ;
};

int main()
{
 Crash *pCrash = new Crash ;
 delete pCrash ;   // first delete: valid
 delete pCrash ;   // second delete of the same pointer: undefined behaviour, crashes here
 return 0;
}

In this program, the dynamically allocated object pointed to by pCrash is deleted twice, which makes the program crash at the second delete (a double delete is undefined behaviour; here glibc detects it and aborts). Compile the program by entering the following.
$ g++ -o crash Crash.cpp
Now let's execute the program as below. You can tell that the program crashed from the backtrace printed on the terminal. But there will be no core file in the execution directory or in the configured directory (see below), since the ulimit is 0.
 [shazni@wso2-ThinkPad-T530 Crash]$ ./crash
Constructing
Destructing
Destructing
*** Error in `./crash': free(): invalid pointer: 0x0000000000ddb020 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7fa5bd0bba46]
./crash[0x400b2f]
./crash[0x400a3a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fa5bd05cea5]
./crash[0x400929]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:04 14552625                           /home/shazni/ProjectFiles/Test/NormalTest/Crash/crash
00600000-00601000 r--p 00000000 08:04 14552625                           /home/shazni/ProjectFiles/Test/NormalTest/Crash/crash
00601000-00602000 rw-p 00001000 08:04 14552625                           /home/shazni/ProjectFiles/Test/NormalTest/Crash/crash
00ddb000-00dfc000 rw-p 00000000 00:00 0                                  [heap]
7fa5bcd36000-7fa5bce39000 r-xp 00000000 08:03 4341456                    /lib/x86_64-linux-gnu/libm-2.17.so
7fa5bce39000-7fa5bd039000 ---p 00103000 08:03 4341456                    /lib/x86_64-linux-gnu/libm-2.17.so
7fa5bd039000-7fa5bd03a000 r--p 00103000 08:03 4341456                    /lib/x86_64-linux-gnu/libm-2.17.so
7fa5bd03a000-7fa5bd03b000 rw-p 00104000 08:03 4341456                    /lib/x86_64-linux-gnu/libm-2.17.so
7fa5bd03b000-7fa5bd1fa000 r-xp 00000000 08:03 4341459                    /lib/x86_64-linux-gnu/libc-2.17.so
7fa5bd1fa000-7fa5bd3f9000 ---p 001bf000 08:03 4341459                    /lib/x86_64-linux-gnu/libc-2.17.so
7fa5bd3f9000-7fa5bd3fd000 r--p 001be000 08:03 4341459                    /lib/x86_64-linux-gnu/libc-2.17.so
7fa5bd3fd000-7fa5bd3ff000 rw-p 001c2000 08:03 4341459                    /lib/x86_64-linux-gnu/libc-2.17.so
7fa5bd3ff000-7fa5bd404000 rw-p 00000000 00:00 0
7fa5bd404000-7fa5bd418000 r-xp 00000000 08:03 4329143                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fa5bd418000-7fa5bd618000 ---p 00014000 08:03 4329143                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fa5bd618000-7fa5bd619000 r--p 00014000 08:03 4329143                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fa5bd619000-7fa5bd61a000 rw-p 00015000 08:03 4329143                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fa5bd61a000-7fa5bd6ff000 r-xp 00000000 08:03 2891772                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7fa5bd6ff000-7fa5bd8fe000 ---p 000e5000 08:03 2891772                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7fa5bd8fe000-7fa5bd906000 r--p 000e4000 08:03 2891772                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7fa5bd906000-7fa5bd908000 rw-p 000ec000 08:03 2891772                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7fa5bd908000-7fa5bd91d000 rw-p 00000000 00:00 0
7fa5bd91d000-7fa5bd940000 r-xp 00000000 08:03 4330760                    /lib/x86_64-linux-gnu/ld-2.17.so
7fa5bdb18000-7fa5bdb1d000 rw-p 00000000 00:00 0
7fa5bdb3b000-7fa5bdb3f000 rw-p 00000000 00:00 0
7fa5bdb3f000-7fa5bdb40000 r--p 00022000 08:03 4330760                    /lib/x86_64-linux-gnu/ld-2.17.so
7fa5bdb40000-7fa5bdb42000 rw-p 00023000 08:03 4330760                    /lib/x86_64-linux-gnu/ld-2.17.so
7fffcccd9000-7fffcccfa000 rw-p 00000000 00:00 0                          [stack]
7fffccd66000-7fffccd68000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted
If ulimit -c printed a non-zero value instead of 0, that is the maximum core file size permitted. If the dump is smaller than that, you get the full dump. Otherwise the dump is cut short, and a truncated core is usually useless for debugging. The core file may not be truncated to exactly the limit. Note also that ulimit -c with no other option displays the soft limit.
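To see both limits at once, here is a small sketch. The function name show_core_limits is my own invention for this example; ulimit -S and ulimit -H are the shell's options for the soft and hard limits. The soft limit is the one actually enforced, and a normal user can raise it only as far as the hard limit.

```shell
# Report the soft and hard core-file size limits of the current shell.
# A number is a block count; "unlimited" means no cap.
# show_core_limits is a hypothetical helper name, not a standard command.
show_core_limits() {
    printf 'soft: %s\n' "$(ulimit -S -c)"
    printf 'hard: %s\n' "$(ulimit -H -c)"
}

show_core_limits
```

If the hard limit itself is 0, raising the soft limit as an ordinary user will fail, and the limit has to be raised by whoever starts your session.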

To set the limit to a specific size, issue the following command
$ ulimit -c 45  # sets the limit to 45 blocks (1024 bytes each in bash, 512 bytes in POSIX mode)
To remove the limit altogether, issue the following command
$ ulimit -c unlimited
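If you only want core dumps for a single run, a subshell is one way to do it. This is a minimal sketch, and with_core is a made-up helper name, not a standard command.

```shell
# Run one command with core dumps enabled, without changing the calling shell.
# The parentheses start a subshell, so the raised limit disappears with it.
# with_core is a hypothetical helper name.
with_core() {
    (
        # Raising the soft limit fails if the hard limit is lower; warn and carry on.
        ulimit -S -c unlimited 2>/dev/null ||
            echo "with_core: could not raise the core limit" >&2
        "$@"
    )
}
```

For example, with_core ./crash would run the crash program with dumps enabled, while ulimit -c in your terminal still prints its old value afterwards.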
Now run the crash program again. You will see the same output, but there will be an additional file named 'core' in the current directory (if you still don't find it, read ahead), which is the core dump of the process. OK, are we done yet? You have probably guessed 'No', because you can see a lot more written below in the post. Good guess!!!
The problem here is that the setting you made is only temporary and applies to the current terminal session. If you open another terminal and check the ulimit, it will still be the initial value (probably zero) that you saw earlier. We need our setting to persist across sessions and reboots. OK... profile files come to mind now. Use either .bashrc in your home directory, if you want the setting to affect only the current user, or /etc/profile if you need the setting to apply to all users. I'll use my .bashrc for this.
$ vi ~/.bashrc
Add the following line to the end of your profile.
ulimit -c unlimited >/dev/null 2>&1
This sets the ulimit to unlimited and throws away whatever it would print to the screen. To apply it, source the .bashrc file with the command below, or close the terminal and open a new one. From now on, in every terminal you open as your user, the ulimit should be shown as unlimited.
$ source ~/.bashrc
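If you would rather script this step than open an editor, the following sketch appends the line only when it is not already there, so running it twice does no harm. append_once is a hypothetical helper name of my own.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name.
append_once() {
    file=$1
    line=$2
    # -x: match whole lines, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (commented out so that pasting this block changes nothing):
# append_once "$HOME/.bashrc" 'ulimit -c unlimited >/dev/null 2>&1'
```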
OK, now even if you reboot the system and run the above program, you should get a core file (if you still don't find one, read ahead). Well, we can be a little more fancy as well. We can give our core files a name pattern. Currently the core dump is named 'core', unless someone has changed the pattern in the file /proc/sys/kernel/core_pattern. This is where you specify where the core dump is written and its file name pattern. If you didn't see the core dump earlier in the directory from which you ran the program, look at the path in this file. On my Ubuntu, it is 'core' as shown below.
$ cat /proc/sys/kernel/core_pattern
core
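How the kernel treats the pattern depends on its first character: a leading | means the dump is piped to a program rather than written to a file, a leading / means an absolute path, and anything else is a file name relative to the working directory of the crashing process. The following sketch spells that out. core_destination is just a name I picked, and the function only classifies a string; it changes nothing on the system.

```shell
# Say where the kernel would send a core dump for a given core_pattern value.
# core_destination is a hypothetical helper name.
core_destination() {
    case $1 in
        '|'*) printf '%s\n' "piped to a program: ${1#?}" ;;   # ${1#?} drops the leading |
        /*)   printf '%s\n' "file at an absolute path" ;;
        *)    printf '%s\n' "file in the crashing process's working directory" ;;
    esac
}

# On Linux, classify the live setting (the file does not exist on other systems).
if [ -r /proc/sys/kernel/core_pattern ]; then
    core_destination "$(cat /proc/sys/kernel/core_pattern)"
fi
```

If this reports a pipe, the 'core' file will not show up next to your program no matter what the ulimit is, because the dump goes to that program instead.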
We can embed certain information in the core file name itself. You can write to this file directly, or you can edit /etc/sysctl.conf. I'll edit the configuration file, since the setting then survives reboots and I can add more system parameters there if needed.
$ sudo vi /etc/sysctl.conf
And add the following to the end of the file
#core file settings
kernel.core_uses_pid=1
kernel.core_pattern=core.%e.%s.%p.%t
fs.suid_dumpable=2
The first line enables appending the process ID to the core file name (it only has an effect when the pattern itself contains no %p).
The more interesting line is the second one. This is the core file name pattern. Each letter following a % sign has a special meaning. Some of the possible specifiers and their meanings are as follows.

%e - Application name
%s - Signal number that caused the crash
%p - Process ID of the application when it crashed
%t - Time at which the dump is written (in seconds since the Unix epoch)
%u - Real UID (user ID) of the dumped process
%g - Real GID (group ID) of the dumped process
%h - Hostname
%% - A literal % sign

The third line concerns setuid and setgid programs. Unix/Linux systems provide a facility to run a process under a different user ID or group ID than that of the user/group starting the process, by means of the setuid and setgid bits. Such processes may deal with sensitive data that should not be accessible to the invoking user/group. Since core dumps may expose such internal data, Unix/Linux systems by default do not write core dumps for these processes. If we still want core dumps to be written for them, we need the third line.

To make the settings take effect, enter the following command.
$ sudo sysctl -p
Now if you run the above application with the above pattern, you will get a core file name similar to core.crash.6.18056.1384147041
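With the pattern core.%e.%s.%p.%t, the file name can be read back field by field. Here is a rough sketch that does so, assuming the application name contains no dots. describe_core is a hypothetical helper of mine; kill -l turns the signal number into its name.

```shell
# Split a core file name produced by the pattern core.%e.%s.%p.%t into its fields.
# Assumes the application name (%e) contains no dots and a POSIX sh or bash shell.
# describe_core is a hypothetical helper name.
describe_core() {
    name=${1##*/}            # strip any directory part
    old_ifs=$IFS
    IFS=.
    set -- $name             # split the name on dots into $1..$5
    IFS=$old_ifs
    [ "$#" -eq 5 ] && [ "$1" = core ] || {
        echo "unexpected core file name: $name" >&2
        return 1
    }
    printf 'app=%s signal=%s (%s) pid=%s time=%s\n' \
        "$2" "$3" "$(kill -l "$3")" "$4" "$5"
}
```

For the file name above it reports signal 6, which is ABRT: the abort raised when the double delete was detected, matching the 'Aborted' line on the terminal.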

OK. Going a little further now. We can permit core files to be written by all the applications on the system, including daemons. In Red Hat based distributions like RHEL or Fedora, this can be achieved by adding the following line to /etc/sysconfig/init
DAEMON_COREFILE_LIMIT='unlimited'
You may need to restart the system for this setting to take effect.

If instead you want to allow only certain daemons to write dumps, in Red Hat based distributions add the following lines to /etc/init.d/functions unless they are already there
# make sure it doesn't core dump anywhere unless it's requested
corelimit="ulimit -S -c ${DAEMON_COREFILE_LIMIT:-0}"
Now add the following line to the init script of your daemon in /etc/init.d/{your service}
DAEMON_COREFILE_LIMIT='unlimited'
Since this is the Red Hat way of setting the core file limit, in Ubuntu you need to add the following lines to the service's init script in /etc/init.d instead
ulimit -c unlimited >/dev/null 2>&1
echo /tmp/core.%e.%s.%p.%t > /proc/sys/kernel/core_pattern
Now start your service by invoking the following command
$ sudo service {your service script} start
Then find your service's process ID using
$ ps aux | grep {your service}
And let's kill it
$ sudo kill -s SIGSEGV {your service's PID}
Now if you go ahead and look in /tmp, you should find a core dump for your service.
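To pick up the newest dump and look at it, something like the following could be used. This is only a sketch: newest_core is a made-up helper, and it assumes the core file names start with 'core.' as in the pattern above. gdb is not covered in this post, but opening the core in gdb together with the executable and typing bt is the usual way to see the stack trace.

```shell
# Print the most recently modified core file in a directory (default /tmp).
# Assumes file names start with "core." and contain no newlines.
# newest_core is a hypothetical helper name.
newest_core() {
    dir=${1:-/tmp}
    # ls -t sorts by modification time, newest first
    ls -t "$dir"/core.* 2>/dev/null | head -n 1
}

# Example (the executable path is a placeholder for your own service):
# gdb /path/to/your/service "$(newest_core /tmp)"
# then type "bt" at the (gdb) prompt to print the stack trace.
```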

Great!!! Here we are. Now we have set up our Linux box to write core dumps and can debug any application crash.
