NDK Crash Handling

NDK is scary. Code compiled with the NDK doesn’t run in a VM and can do wild things that are otherwise impossible. This has both benefits and drawbacks. One thing that is not straightforward in native code is crash handling. Unlike errors inside a VM, which the runtime turns into exceptions, faulting machine instructions interrupt our process and hand the recovery work to the underlying OS kernel. Luckily, Android, being a Linux-like system (some may argue the opposite), provides the building blocks we need to fill in the missing pieces.

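What we want to achieve

When native code crashes, for example because of an uncaught C++ exception, we want to catch the resulting signal, log a human-readable message (including the exception’s type and what() text) to logcat, and then let the app crash as it normally would.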

Prepare an Activity with JNI bindings

Before everything else, create an empty NDK project with Android Studio’s templates or integrate NDK into an existing project with this guide.

Then create an activity with a single button in the center. Our button will trigger a crash to showcase our crash handler.

package com.testfairy.ndkplayground;

import android.app.Activity;
import android.os.Bundle;
import android.view.View;
import android.widget.Button;

public class MainActivity extends Activity {

	// Used to load the 'native-lib' library on application startup.
	static {
		System.loadLibrary("native-lib");
	}

	@Override
	protected void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		setContentView(R.layout.activity_main);

		// Tapping the button calls a native method that crashes the app
		Button crashButton = findViewById(R.id.crash_button);
		crashButton.setOnClickListener(new View.OnClickListener() {
			@Override
			public void onClick(View v) {
				crashAndGetExceptionMessage();
			}
		});
	}

	@Override
	protected void onResume() {
		super.onResume();

		initSignalHandler();
	}

	@Override
	protected void onPause() {
		super.onPause();

		deinitSignalHandler();
	}

	/**
	 * Initialize native signal handler to catch native crashes.
	 */
	public native void initSignalHandler();

	/**
	 * Deinitialize native signal handler to leave native crashes alone.
	 */
	public native void deinitSignalHandler();

	/**
	 * A native method that is implemented by the 'native-lib' native library,
	 * which is packaged with this application. It throws an uncaught C++ exception
	 * that aborts the process; our signal handler logs its message before the crash.
	 */
	public native void crashAndGetExceptionMessage();
}

For this new MainActivity to work as expected, we must implement the native methods it declares in a language that can export JNI functions. In a standard NDK project, C/C++ compilers already come with the build tools downloaded by the SDK Manager. You can also use another language, as long as it can export C-compatible functions and compile for the ARM and x86 ABIs that Android devices use. We will use C++ to keep things simple.

Create a C++ source file at src/main/cpp/native-lib.cpp and add it as the source of the native-lib library in your src/main/cpp/CMakeLists.txt build configuration.

cmake_minimum_required(VERSION 3.4.1)

add_library( # Sets the name of the library.
             native-lib

             # Sets the library as a shared library.
             SHARED

             # Provides a relative path to your source file(s).
             native-lib.cpp )

find_library( # Sets the name of the path variable.
              log-lib

              # Specifies the name of the NDK library that
              # you want CMake to locate.
              log )

target_link_libraries( # Specifies the target library.
                       native-lib

                       # Links the target library to the log library
                       # included in the NDK.
                       ${log-lib} )

Native Library

Unless specified otherwise, all components of an Android application, including all of its activities, run in a single process. These are the same POSIX processes you are familiar with from the Linux world. When the user launches an app, or an Intent is delivered to an app that isn’t running yet, Android starts a new process for it, named after the package by default. When its work is done, the process returns a single integer representing its exit status. If its work is interrupted by an external event or by a fatal condition inside the process (e.g., an invalid memory access, an illegal instruction or an uncaught C++ exception), the process receives a signal whose number identifies the cause. If no handler is installed for that signal, its default action applies, which for crash signals means terminating the process. If a handler is installed, it runs first, and what happens next depends on what the handler does.
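To see the "handler runs first" behavior in isolation, here is a minimal sketch that installs a handler with sigaction(), raises a harmless signal and then restores the previous disposition. It assumes a POSIX system such as Linux or Android; raiseAndObserve and recordingHandler are hypothetical names, and SIGUSR1 stands in for a real crash signal so the process survives.

```cpp
#include <csignal>
#include <cstring>
#include <signal.h>

// Hypothetical demo state: the last signal number our handler observed.
static volatile std::sig_atomic_t g_last_signal = 0;

// SA_SIGINFO-style handler; it only stores the signal number, which is async-signal-safe.
static void recordingHandler(int signo, siginfo_t* /*info*/, void* /*ucontext*/) {
    g_last_signal = signo;
}

// Installs recordingHandler for signo, raises the signal, then restores the previous
// disposition. Returns the signal number the handler saw, or -1 if installation failed.
int raiseAndObserve(int signo) {
    struct sigaction handler_action;
    std::memset(&handler_action, 0, sizeof(handler_action));
    sigemptyset(&handler_action.sa_mask);
    handler_action.sa_flags = SA_SIGINFO;
    handler_action.sa_sigaction = recordingHandler;

    struct sigaction previous_action;
    if (sigaction(signo, &handler_action, &previous_action) != 0) {
        return -1;
    }

    g_last_signal = 0;
    std::raise(signo);  // raise() does not return until the handler has run

    sigaction(signo, &previous_action, nullptr);  // put the old disposition back
    return g_last_signal;
}
```

For example, raiseAndObserve(SIGUSR1) returns SIGUSR1, because POSIX guarantees that raise() returns only after the handler has finished.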

In our native library, we include headers to communicate with the OS, and with the JVM via JNI. We also need some utilities to work with signals and strings. Finally, we need the C++ ABI’s demangler (from cxxabi.h) to turn the type names seen by our crash handler into readable ones. Without demangling, type names would show up in their mangled form in our logs, for example 11MyException instead of MyException.

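// Java
#include <jni.h>

// C++
#include <csignal>
#include <cstdio>
#include <cstdlib>   // malloc(), free()
#include <cstring>
#include <exception>
#include <memory>
#include <typeinfo>  // std::type_info

// C++ ABI
#include <cxxabi.h>

// Android
#include <android/log.h>
#include <unistd.h>  // getpid(), gettid(), syscall(), _exit()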
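A quick way to see what demangling buys us: typeid(...).name() returns the mangled name, and abi::__cxa_demangle (abi is the alias for __cxxabiv1 that cxxabi.h defines) turns it back into source form. This sketch assumes an Itanium C++ ABI toolchain, such as the NDK’s Clang or desktop GCC/Clang; demangleTypeName is a hypothetical helper.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <exception>
#include <string>
#include <typeinfo>

// Same shape as the article's test exception.
class MyException : public std::exception {
public:
    const char* what() const noexcept override {
        return "This is a really important crash message!";
    }
};

// Returns the human-readable form of a mangled type name, or the input unchanged
// if it cannot be demangled.
std::string demangleTypeName(const char* mangled) {
    int status = -1;
    // With a null output buffer, __cxa_demangle allocates the result with malloc();
    // the caller owns it and must free() it.
    char* demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && demangled != nullptr) ? demangled : mangled;
    std::free(demangled);  // free(nullptr) is a no-op
    return result;
}
```

With such a toolchain, typeid(MyException).name() is "11MyException", and demangleTypeName turns it into "MyException".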

In C and C++, computing the number of elements of a fixed-size array with sizeof is verbose. We will shorten that with this utility macro.

/// Helper macro to get the number of elements of a fixed-length array at compile time
#define sizeofa(array) (sizeof(array) / sizeof((array)[0]))
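Since the project targets C++14, a constexpr function template is a type-safe alternative to the macro: it only accepts real arrays, so accidentally passing a pointer fails to compile instead of silently returning sizeof(pointer) / sizeof(element). arraySize and EXAMPLE_SIGNALS are hypothetical names; C++17 ships the same idea as std::size.

```cpp
#include <csignal>
#include <cstddef>

// Hypothetical C++14 alternative to sizeofa(): N is deduced from a reference to an
// array, so passing a pointer does not compile.
template <typename T, std::size_t N>
constexpr std::size_t arraySize(const T (&)[N]) noexcept {
    return N;
}

// Example input mirroring some of the crash signals the article catches.
static constexpr int EXAMPLE_SIGNALS[] = {SIGABRT, SIGBUS, SIGFPE, SIGSEGV};

// Like the macro, the result is a compile-time constant.
static_assert(arraySize(EXAMPLE_SIGNALS) == 4, "four example signals");
```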

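On some older systems, signal handlers swallow crashes entirely. For those rare cases, we need to re-send the signal after our own handling is done so that the process still crashes properly. Checking the Linux kernel sources, we learnt that the tgkill system call, which sends a signal to one specific thread, will do the trick. Its number differs between CPU architectures (for example, 270 on 32-bit x86 but 131 on arm64), so rather than hardcoding it, we let sys/syscall.h define __NR_tgkill for whichever ABI we are compiling. This finalizes our imports.

/// Provides __NR_tgkill (the tgkill syscall number) for the ABI being compiled
#include <sys/syscall.h>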

Next, we define a singleton, process-level context that stores the handlers we replace, so that we can restore them later. An array of the signal numbers we care about lets us register all handlers in a loop.

/// Caught signals
static const int SIGNALS_TO_CATCH[] = {
        SIGABRT,
        SIGBUS,
        SIGFPE,
        SIGSEGV,
        SIGILL,
        SIGSTKFLT,
        SIGTRAP,
};

/// Signal handler context
struct CrashInContext {
    /// Old handlers of signals that we restore on de-initialization, indexed by signal number.
    /// Entries for signals we don't catch stay zero-initialized.
    struct sigaction old_handlers[NSIG];
};

/// Crash handler function signature (siginfo_t is the portable name of the siginfo type)
typedef void (*CrashSignalHandler)(int, siginfo_t*, void*);

/// Global instance of context. Since an app can't crash twice in a single run, we can make this singleton.
static CrashInContext* crashInContext = nullptr;

Let’s loop over our array of caught signals to register or unregister all handlers in bulk.

/// Register signal handler for crashes
static bool registerSignalHandler(CrashSignalHandler handler, struct sigaction old_handlers[NSIG]) {
    struct sigaction sigactionstruct;
    memset(&sigactionstruct, 0, sizeof(sigactionstruct));
    sigemptyset(&sigactionstruct.sa_mask);
    sigactionstruct.sa_flags = SA_SIGINFO;
    sigactionstruct.sa_sigaction = handler;

    // Register new handlers for all signals, saving the previous ones so we can restore them
    for (size_t index = 0; index < sizeofa(SIGNALS_TO_CATCH); ++index) {
        const int signo = SIGNALS_TO_CATCH[index];

        if (sigaction(signo, &sigactionstruct, &old_handlers[signo])) {
            return false;
        }
    }

    return true;
}

/// Unregister already registered signal handlers
static void unregisterSignalHandler(struct sigaction old_handlers[NSIG]) {
    // Restore the saved handler for every signal we caught. Don't skip entries with a null
    // handler: SIG_DFL is itself a null handler, and skipping it would leave ours installed.
    for (size_t index = 0; index < sizeofa(SIGNALS_TO_CATCH); ++index) {
        const int signo = SIGNALS_TO_CATCH[index];
        sigaction(signo, &old_handlers[signo], nullptr);
    }
}
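To check that a register/restore pair really puts the previous handlers back, including a saved SIG_DFL, here is a self-contained sketch of the same bulk pattern. It assumes Linux or Android, uses SIGUSR1 and SIGUSR2 instead of crash signals so it can run without crashing, and all names (DEMO_SIGNALS, demoHandler, registerDemoHandlers, restoreDemoHandlers, demoHandlerInstalled) are hypothetical.

```cpp
#include <csignal>
#include <cstddef>
#include <cstring>
#include <signal.h>

// Hypothetical demo signal set; the article's SIGNALS_TO_CATCH holds crash signals instead.
static const int DEMO_SIGNALS[] = {SIGUSR1, SIGUSR2};
static const std::size_t DEMO_SIGNAL_COUNT = sizeof(DEMO_SIGNALS) / sizeof(DEMO_SIGNALS[0]);

// Does nothing; it only needs a recognizable address.
static void demoHandler(int, siginfo_t*, void*) {}

// Installs demoHandler for every demo signal, saving the previous actions by signal number.
bool registerDemoHandlers(struct sigaction saved[NSIG]) {
    struct sigaction action;
    std::memset(&action, 0, sizeof(action));
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_SIGINFO;
    action.sa_sigaction = demoHandler;

    for (std::size_t i = 0; i < DEMO_SIGNAL_COUNT; ++i) {
        const int signo = DEMO_SIGNALS[i];
        if (sigaction(signo, &action, &saved[signo]) != 0) {
            return false;
        }
    }
    return true;
}

// Restores every saved action without a null check, so a saved SIG_DFL comes back as well.
void restoreDemoHandlers(const struct sigaction saved[NSIG]) {
    for (std::size_t i = 0; i < DEMO_SIGNAL_COUNT; ++i) {
        const int signo = DEMO_SIGNALS[i];
        sigaction(signo, &saved[signo], nullptr);
    }
}

// Returns true if demoHandler is currently installed for signo.
bool demoHandlerInstalled(int signo) {
    struct sigaction current;
    if (sigaction(signo, nullptr, &current) != 0) {
        return false;
    }
    return (current.sa_flags & SA_SIGINFO) != 0 && current.sa_sigaction == demoHandler;
}
```

Restoring from the list of caught signals, rather than skipping entries whose handler is null, is what lets a saved SIG_DFL come back, because SIG_DFL is a null handler.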

Now we can use this register/unregister pair to manage our singleton context. Our main activity will invoke them through one-line JNI wrappers.

/// Defined further below; declared here so that initializeNativeCrashHandler() can take its address
static void nativeCrashSignalHandler(int signo, siginfo_t* siginfo, void* ctxvoid);

/// like TestFairy.stop() but for crashes
static bool deinitializeNativeCrashHandler() {
    // Check if already deinitialized
    if (!crashInContext) return false;

    // Unregister signal handlers
    unregisterSignalHandler(crashInContext->old_handlers);

    // Free singleton crash handler context
    free(crashInContext);
    crashInContext = nullptr;

    __android_log_print(ANDROID_LOG_INFO, "NDK Playground", "%s", "Native crash handler successfully deinitialized.");

    return true;
}

/// like TestFairy.begin() but for crashes
static void initializeNativeCrashHandler() {
    // Check if already initialized
    if (crashInContext) {
        __android_log_print(ANDROID_LOG_INFO, "NDK Playground", "%s", "Native crash handler is already initialized.");
        return;
    }

    // Initialize singleton crash handler context (zeroed, so unused old_handlers entries stay empty)
    crashInContext = static_cast<CrashInContext*>(malloc(sizeof(CrashInContext)));
    if (!crashInContext) {
        __android_log_print(ANDROID_LOG_ERROR, "NDK Playground", "%s", "Native crash handler initialization failed.");
        return;
    }
    memset(crashInContext, 0, sizeof(CrashInContext));

    // Trying to register signal handler.
    if (!registerSignalHandler(&nativeCrashSignalHandler, crashInContext->old_handlers)) {
        deinitializeNativeCrashHandler();
        __android_log_print(ANDROID_LOG_ERROR, "NDK Playground", "%s", "Native crash handler initialization failed.");
        return;
    }

    __android_log_print(ANDROID_LOG_INFO, "NDK Playground", "%s", "Native crash handler successfully initialized.");
}

Let’s wrap these in JNI exports to expose them to Java world.

/// Jni bindings

extern "C" JNIEXPORT void JNICALL
Java_com_testfairy_ndkplayground_MainActivity_initSignalHandler(
        JNIEnv* env,
        jobject /* this */) {

    initializeNativeCrashHandler();
}

extern "C" JNIEXPORT void JNICALL
Java_com_testfairy_ndkplayground_MainActivity_deinitSignalHandler(
        JNIEnv* env,
        jobject /* this */) {
    deinitializeNativeCrashHandler();
}

/// Our custom test exception. Anything "publicly" inheriting std::exception will work
class MyException : public std::exception {
public:
    const char* what() const noexcept override {
        return "This is a really important crash message!";
    }
};

extern "C" JNIEXPORT void JNICALL
Java_com_testfairy_ndkplayground_MainActivity_crashAndGetExceptionMessage(
        JNIEnv* env,
        jobject /* this */) {
    throw MyException(); // This can be replaced with any foreign function call that throws.
}

//////////////////////////////////////////////////////

We can now define our handler function. In the previous snippets, we left one function undefined: nativeCrashSignalHandler, whose address we pass to registerSignalHandler() during initialization.

// Trying to register signal handler.
    if (!registerSignalHandler(&nativeCrashSignalHandler, crashInContext->old_handlers)) { ...

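The handler can do whatever suits your use case. We will log a human-readable crash message to logcat; you could also write crash report files or notify other processes. Keep in mind that POSIX only guarantees async-signal-safe functions to work inside a signal handler. malloc() and Android’s logging functions are not on that list, so treat this handler as a best-effort demo.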

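/// Defined further below; declared here because the handler calls it
static const char* createCrashMessage(int signo, siginfo_t* siginfo);

/// Main signal handling function.
static void nativeCrashSignalHandler(int signo, siginfo_t* siginfo, void* ctxvoid) {
    // Restore the old handler so the built-in Android crash mechanism runs next.
    // If we were already de-initialized, fall back to the default action instead.
    if (crashInContext) {
        sigaction(signo, &crashInContext->old_handlers[signo], nullptr);
    } else {
        signal(signo, SIG_DFL);
    }

    // Log crash message
    __android_log_print(ANDROID_LOG_ERROR, "NDK Playground", "%s", createCrashMessage(signo, siginfo));

    // Faults raised by the CPU (si_code > 0) typically re-trigger on their own: returning
    // re-executes the faulting instruction, which now reaches the restored handler. Signals
    // sent from user space (si_code <= 0, e.g. by abort() or kill()) must be re-sent explicitly.
    if (siginfo->si_code <= 0 || signo == SIGABRT) {
        if (syscall(__NR_tgkill, getpid(), gettid(), signo) < 0) {
            _exit(1);
        }
    }
}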
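The restore-then-re-send pattern above can be tried safely with a non-fatal signal. In this sketch (assumes Linux or Android; every name is hypothetical, and SIGUSR1 stands in for a crash signal), previousHandler plays the role of the platform’s own handler and chainingHandler plays ours. Because a signal is blocked while its handler runs, the tgkill call only marks it pending, and it is delivered to the restored handler as soon as chainingHandler returns.

```cpp
#include <csignal>
#include <cstring>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical demo state, written only from signal handlers.
static volatile std::sig_atomic_t g_steps = 0;            // handlers run so far
static volatile std::sig_atomic_t g_previous_ran_at = 0;  // step at which previousHandler ran
static struct sigaction g_saved_action;                   // action installed before ours

// Stands in for a pre-existing handler, e.g. the platform's own crash reporter.
static void previousHandler(int, siginfo_t*, void*) {
    g_steps = g_steps + 1;
    g_previous_ran_at = g_steps;
}

// Restore the saved action, then re-send the signal to this thread with tgkill.
static void chainingHandler(int signo, siginfo_t*, void*) {
    g_steps = g_steps + 1;
    sigaction(signo, &g_saved_action, nullptr);
    syscall(__NR_tgkill, static_cast<long>(getpid()), syscall(__NR_gettid), static_cast<long>(signo));
}

// Installs previousHandler, layers chainingHandler on top, raises signo, and reports
// whether both handlers ran in order: ours first, then the restored one.
bool chainDemo(int signo) {
    struct sigaction action;
    std::memset(&action, 0, sizeof(action));
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_SIGINFO;

    struct sigaction original;
    action.sa_sigaction = previousHandler;
    if (sigaction(signo, &action, &original) != 0) return false;

    action.sa_sigaction = chainingHandler;
    if (sigaction(signo, &action, &g_saved_action) != 0) return false;

    g_steps = 0;
    g_previous_ran_at = 0;
    raise(signo);

    sigaction(signo, &original, nullptr);  // leave the process as we found it
    return g_steps == 2 && g_previous_ran_at == 2;
}
```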

Our logger line used a function named createCrashMessage() to figure out what caused the crash. Before defining it, let’s discuss how a crash can occur in native code.

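In C/C++, a faulting machine instruction makes the kernel send a signal to the process: a null pointer dereference or other invalid memory access raises SIGSEGV, a bus or alignment error SIGBUS, an illegal instruction SIGILL, and an arithmetic error such as integer division by zero SIGFPE (on CPUs that trap on it; x86 does, while ARM64 returns zero instead).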

In C++, an uncaught exception ends in a signal as well: the runtime calls std::terminate(), whose default handler calls abort(), which raises SIGABRT. The signal itself says nothing about the exception, though. Without the C++ ABI libraries, we have no way of knowing which exception caused the abort.
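You can watch this chain from outside on a desktop Linux machine. The sketch below assumes POSIX fork() and waitpid(), and all function names are hypothetical: it lets an exception escape in a child process and reports which signal killed the child. With no handler installed, the answer is SIGABRT, and nothing in that signal tells the parent what was thrown.

```cpp
#include <csignal>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Throws an exception that nothing will catch.
static void throwRuntimeError() {
    throw std::runtime_error("uncaught on purpose");
}

// An exception escaping a noexcept function always ends in std::terminate().
static void callWithoutCatching() noexcept {
    throwRuntimeError();
}

// Runs callWithoutCatching() in a child process and returns the signal that killed the
// child, or -1 if fork/waitpid failed or the child was not killed by a signal.
int signalFromUncaughtException() {
    const pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        callWithoutCatching();  // std::terminate() -> abort() -> SIGABRT
        _exit(0);               // not reached
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child’s default terminate handler may also print the exception type to stderr before aborting, but that text is not part of the signal the parent observes.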

In createCrashMessage(), we ask the C++ ABI runtime whether a C++ exception is currently being handled and, if so, what its type and message are. If not, we fall back to the signal number and code reported by the OS. Either way, we try our best to construct a useful abort message.

/// Create a crash message using whatever is available, such as the signal or the current C++ exception.
/// Note: uses malloc() and libc++abi helpers, which are not async-signal-safe; acceptable for a demo.
static const char* createCrashMessage(int signo, siginfo_t* siginfo) {
    // libc++abi extensions: the exception currently being handled (std::terminate() marks an
    // uncaught exception as caught before aborting) and its type
    void* current_exception = __cxxabiv1::__cxa_current_primary_exception();
    std::type_info* current_exception_type_info = __cxxabiv1::__cxa_current_exception_type();

    const size_t buffer_size = 1024;
    char* abort_message = static_cast<char*>(malloc(buffer_size));
    if (!abort_message) return "Terminating due to a crash (could not allocate a message buffer)";

    if (current_exception && current_exception_type_info) {
        const char* exception_name = current_exception_type_info->name();

        // Try demangling exception name. With a null buffer, __cxa_demangle malloc()s the result.
        int status = -1;
        char* demangled_name = __cxxabiv1::__cxa_demangle(exception_name, nullptr, nullptr, &status);

        if (status != 0 || !demangled_name) {
            // Couldn't demangle, go with exception_name
            snprintf(abort_message, buffer_size, "Terminating with uncaught exception of type %s", exception_name);
        } else if (strstr(demangled_name, "nullptr") || strstr(demangled_name, "NULL")) {
            // A null was thrown (e.g. `throw nullptr`), so there is no what() to call
            snprintf(abort_message, buffer_size, "Terminating with uncaught exception of type %s", demangled_name);
        } else {
            // Could demangle; rethrow to include exception.what() if it is a std::exception
            try {
                __cxxabiv1::__cxa_rethrow_primary_exception(current_exception);
            } catch (std::exception& e) {
                // Include message from what() in the abort message
                snprintf(abort_message, buffer_size, "Terminating with uncaught exception of type %s : %s", demangled_name, e.what());
            } catch (...) {
                // Just report the exception type since it is not an std::exception
                snprintf(abort_message, buffer_size, "Terminating with uncaught exception of type %s", demangled_name);
            }
        }

        free(demangled_name);
        return abort_message;
    }

    // Not a C++ exception: assume a C crash and report the signal number and code
    snprintf(abort_message, buffer_size, "Terminating with a C crash %d : %d", signo, siginfo->si_code);
    return abort_message;
}
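If you want to try the same technique on a desktop toolchain, the sketch below is a portable approximation of the exception branch of createCrashMessage(). It swaps libc++abi’s __cxa_current_primary_exception and __cxa_rethrow_primary_exception for the standard std::current_exception and std::rethrow_exception, and keeps abi::__cxa_current_exception_type and abi::__cxa_demangle, which both libc++abi and libstdc++ provide. describeCurrentException is a hypothetical name, and it only reports something useful while an exception is being handled, i.e. inside a catch block or a terminate handler.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <exception>
#include <string>
#include <typeinfo>

// The article's test exception, repeated so this sketch is self-contained.
class MyException : public std::exception {
public:
    const char* what() const noexcept override {
        return "This is a really important crash message!";
    }
};

// Hypothetical helper: describes the exception currently being handled, in the same
// format as createCrashMessage().
std::string describeCurrentException() {
    std::exception_ptr current = std::current_exception();
    std::type_info* type = abi::__cxa_current_exception_type();
    if (!current || type == nullptr) {
        return "No exception is being handled";
    }

    // Demangle the type name; __cxa_demangle mallocs the result when given a null buffer.
    int status = -1;
    char* demangled = abi::__cxa_demangle(type->name(), nullptr, nullptr, &status);
    const std::string name = (status == 0 && demangled != nullptr) ? demangled : type->name();
    std::free(demangled);

    // Rethrow to find out whether it is a std::exception that carries a what() message.
    try {
        std::rethrow_exception(current);
    } catch (const std::exception& e) {
        return "Terminating with uncaught exception of type " + name + " : " + e.what();
    } catch (...) {
        return "Terminating with uncaught exception of type " + name;
    }
}
```

One way to use it without a signal handler is std::set_terminate(): the C++ standard treats an uncaught exception as handled once std::terminate() is entered, so a terminate handler can log describeCurrentException() before calling std::abort().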

Testing the app

Configure your project to use C++14 and build it. Tapping the button in the main activity should create log messages like this in the logcat before the app crashes.

E/NDK Playground: Terminating with uncaught exception of type MyException : This is a really important crash message!

Working Project

For convenience, you may clone this branch of our NDK Playground repo to see it in action.