Friday, September 21, 2007

The taming of the thread

A process-driven approach to avoid thread death

Threads can be nasty beasts, partly because of their delicate nature: threads can die. If the cause of a thread's death lies outside your code, MutableThread may keep your code running. This article explores a solution using an object-oriented, problem-solution method.

The problem

The first question to ask in the problem-solution method is, of course: What is the problem? In an existing system, the problem statement is usually easy to write, because the problem is what is causing the trouble. In this case, the problem is: The threads die, and the application stops running. No exception is thrown, since the cause of thread death is external to the application.

The what and the how

It is extremely important to isolate the objective from the technical aspects of a possible solution. If design concerns influence the requirements early on, they can rule out creative solutions. Designers naturally think about how to do everything, and that makes it tempting to dodge a challenge that might lead to a superior solution by declaring it problematic, too ambitious, or even impossible.

The it-would-be-wonderful-if statement is a liberating way to imagine an ideal world. It produces the ideal what without regard to design limitations.

Let's define our what: Objects do not depend upon particular threads. If a thread should fail, the thread is recreated, and the object continues to run. This should be as unobtrusive as possible.

Scenarios can clarify the requirement and save time

After reviewing the what, run through some scenarios that the particular what must handle. This first test can be done without spending a dime on design.

To generate some scenarios, simply imagine the potential objects created by the what and consider the possibilities that could happen. Try to think outside the box.

So, let's say we have an object that runs in a thread. We know the thread might die, and we need to deal with the consequences.

We must provide memory management to prevent so-called memory leaks. However, memory management has nothing directly to do with our thread-oriented requirement—it's beyond the scope of the what.

The crucial issue is that we want to maintain thread operation. There are a few possible subscenarios:

1. Sunny day: All is well. Just restart the thread and keep running.
2. Rainy day: Bad things happen. The object is somehow corrupted.
3. Typhoon: Really bad things happen. The JVM may be unstable.

On closer inspection, Scenario 2 (corrupted data) and Scenario 3 (an unstable JVM) might not affect this application; here, the worst that can happen is that a thread fails. Other applications may need to validate their data or even the integrity of the JVM, but for our simple case the sunny-day scenario suffices. This design therefore includes neither data checking nor recreating objects in a new JVM. A more robust design might refresh the object's persisted state to validate its data, and in the worst case the entire JVM should shut down and restart.

The how: Identify objects and create a design

We must identify the objects involved in the design. First, there should be a detection object, which we can call the watchdog. In our design, the watchdog watches all the other threads. It runs in its own thread and holds a collection of references to the other threads so it can monitor them and make sure they are all alive. Conveniently, the Thread class provides an isAlive() method that tells whether a thread is alive, and the watchdog calls it on each thread in its collection. If a thread has died, it's the watchdog's responsibility to report it.
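To make the detection step concrete, here is a minimal sketch of a watchdog that keeps keyed references to plain Thread objects and polls isAlive() on each. The class SimpleWatchDog, its put(String, Thread) and checkOnce() methods, and the polling period are hypothetical names chosen for illustration, not the API of the article's ThreadWatchDog.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the watchdog's detection step: it keeps named
 * references to the threads it monitors and uses Thread.isAlive() to find dead ones.
 */
class SimpleWatchDog implements Runnable {
    private final Map<String, Thread> watched = new LinkedHashMap<>();
    private final long periodMillis;

    SimpleWatchDog(long periodMillis) { this.periodMillis = periodMillis; }

    /** Registers a thread under a key; the key is used when reporting. */
    synchronized void put(String key, Thread thread) { watched.put(key, thread); }

    /** One pass over the collection: returns the keys of threads that are no longer alive. */
    synchronized List<String> checkOnce() {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Thread> e : watched.entrySet()) {
            if (!e.getValue().isAlive()) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Report only; a dead thread is reported again on every pass.
                for (String key : checkOnce()) {
                    System.out.println("WATCHDOG: thread " + key + " is dead");
                }
                Thread.sleep(periodMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop watching
        }
    }
}
```

Keeping checkOnce() separate from the polling loop makes a single pass easy to test without any timing. A real watchdog would probably also avoid repeating the same report on every pass once a thread has been flagged.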

For more robustness, this design includes a second "dog," the beta dog (the watchdog is the alpha dog). The beta dog's only purpose is to check that the alpha dog is alive; the alpha dog, in turn, checks the beta dog.
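The alpha/beta arrangement can be sketched as two threads that each poll the other's isAlive(). This is a minimal illustration under stated assumptions: the Dog class, its watch() method, the Consumer<String> reporter, and the use of interrupt() to "kill" a dog are all hypothetical, and in the article the alpha dog also watches the application threads, which this sketch omits. Both dogs are started before they are paired, so neither mistakes the other's not-yet-started thread for a dead one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: a dog thread that periodically checks whether its partner is alive. */
class Dog extends Thread {
    private volatile Thread partner;         // the dog this one watches
    private final long periodMillis;
    private final Consumer<String> reporter; // where failures are reported

    Dog(String name, long periodMillis, Consumer<String> reporter) {
        super(name);
        this.periodMillis = periodMillis;
        this.reporter = reporter;
        setDaemon(true); // do not keep the JVM alive just to watch
    }

    void watch(Thread partner) { this.partner = partner; }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                Thread p = partner;
                if (p != null && !p.isAlive()) {
                    reporter.accept(getName() + " reports " + p.getName() + " is dead");
                    return; // this sketch reports once and stops
                }
                Thread.sleep(periodMillis);
            }
        } catch (InterruptedException e) {
            // an interrupt ends this dog's run, which is how the demo "kills" a dog
        }
    }
}

class DogDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch reported = new CountDownLatch(1);
        Consumer<String> report = msg -> { System.out.println(msg); reported.countDown(); };
        Dog alpha = new Dog("alpha", 20, report);
        Dog beta = new Dog("beta", 20, report);
        alpha.start();     // start both dogs before pairing them,
        beta.start();      // so neither sees the other as "not yet alive"
        alpha.watch(beta); // alpha watches beta ...
        beta.watch(alpha); // ... and beta watches alpha
        alpha.interrupt(); // simulate the alpha dog dying
        reported.await(2, TimeUnit.SECONDS);
    }
}
```

Running DogDemo should print "beta reports alpha is dead": the alpha dog's run() ends when it is interrupted, and the beta dog notices on its next pass.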

The ThreadWatchDog is a MutableThread that monitors other threads; each monitored thread must be a MutableThread or a Thread (or a subclass of either). The watchdog runs through its collection of threads, calling isAlive() on each. When it finds a dead thread, it calls reCreate() to recreate it if it is a mutable thread; otherwise, it simply reports the failure.
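A single watchdog pass over a mixed collection might look like the following sketch. It assumes a hypothetical Recreatable interface standing in for MutableThread: only getName(), isAlive(), and reCreate() are modeled, and only reCreate() is named in the article. A dead Recreatable is recreated, while a dead plain Thread is only reported, because a terminated Thread cannot be started again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for MutableThread: something that reports liveness and can be recreated. */
interface Recreatable {
    String getName();
    boolean isAlive();
    void reCreate();
}

/** Hypothetical sketch of one watchdog pass over a collection of mixed thread types. */
class WatchPass {
    /** Returns report lines; dead Recreatables are recreated, dead plain Threads only reported. */
    static List<String> check(List<Object> watched) {
        List<String> reports = new ArrayList<>();
        for (Object o : watched) {
            if (o instanceof Recreatable) {
                Recreatable m = (Recreatable) o;
                if (!m.isAlive()) {
                    reports.add("Mutable Thread " + m.getName() + " is dead; recreating");
                    m.reCreate();
                }
            } else if (o instanceof Thread) {
                Thread t = (Thread) o;
                if (!t.isAlive()) {
                    reports.add("Thread " + t.getName() + " is dead");
                }
            }
        }
        return reports;
    }
}
```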

Here's how this looks (in the ThreadWatchDog test program associated with this article):

if (lTestThread instanceof MutableThread
        && !((MutableThread) lTestThread).isAlive()) {
    // dead mutable thread: report it and attempt a restart
}

This checks whether the mutable thread is alive. If it is not, the watchdog reports an exception and then attempts to restart the thread, if possible, with the following code:

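ReportingExceptionHandler.processException(new ReportingException("Mutable Thread " + lThreadKey + " is dead"));
try {
    ((MutableThread) lTestThread).reCreate(); // attempt to restart the thread by clearing and restarting it
} catch (Exception e) {
    ReportingExceptionHandler.processException(new ReportingException("Could not restart Mutable Thread " + lThreadKey));
}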

For a MutableThread, the thread key (lThreadKey) is the name assigned to the thread when it is created. The application sets up the threads and assigns them to the watchdog on startup, as follows:

// Create the threads to be monitored
TestThread threadOne = new TestThread();
TestThread threadTwo = new TestThread();
TestMutableThread threadMutable = new TestMutableThread();
System.out.println("TEST: Thread One started");
System.out.println("TEST: Thread Two started");
// Obtain the watchdog; threads are registered with it through put()
MutableThread lWatchDog = ThreadWatchDog.getInstance();
System.out.println("TEST: Starting the watchdog");

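This starts up the watchdog(s), and thread monitoring is now active. Note that the threads should be started before the watchdogs are initialized; otherwise things get really confusing, because isAlive() returns false for a thread that has not been started yet, so the watchdog cannot tell it apart from a dead one.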
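The following small demonstration shows why start order matters: isAlive() returns false both before start() and after run() returns. A watchdog that needed to tell the two cases apart could consult Thread.getState(), a standard java.lang.Thread method since Java 5; the article does not say whether ThreadWatchDog does so, and IsAliveDemo is an illustrative name.

```java
/** Shows that isAlive() is false both before start() and after run() returns. */
class IsAliveDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> { }, "worker");
        System.out.println(t.isAlive() + " " + t.getState()); // false NEW
        t.start();
        t.join(); // wait until run() has returned
        System.out.println(t.isAlive() + " " + t.getState()); // false TERMINATED
    }
}
```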

The put() method adds a thread to the ThreadWatchDog's collection. It is overloaded as put(MutableThread mutableThread) because a MutableThread isn't really a Thread; rather, it implements the MutableRunnable interface, much as Thread implements the Runnable interface.
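Here is a minimal sketch of the overloading, assuming a hypothetical MutableRunnable that extends Runnable and exposes getName() and isAlive(); the real interface's methods are not shown in the article, and WatchRegistry is an illustrative name. Keeping the two kinds in separate maps lets the watchdog later decide, per kind, whether to recreate or just report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the article's MutableRunnable interface. */
interface MutableRunnable extends Runnable {
    String getName();
    boolean isAlive();
}

/** Hypothetical sketch of a registry whose put() is overloaded for both kinds of watched object. */
class WatchRegistry {
    private final Map<String, Thread> threads = new LinkedHashMap<>();
    private final Map<String, MutableRunnable> mutables = new LinkedHashMap<>();

    /** Plain threads are keyed by their Thread name. */
    synchronized void put(Thread thread) { threads.put(thread.getName(), thread); }

    /** Mutable threads are keyed by the name they were given at creation. */
    synchronized void put(MutableRunnable mutableThread) {
        mutables.put(mutableThread.getName(), mutableThread);
    }

    synchronized int size() { return threads.size() + mutables.size(); }
}
```

The overloads stay unambiguous only because a MutableThread is not a Thread: for an argument whose type both extended Thread and implemented MutableRunnable, neither put() would be more specific, and the call would fail to compile.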

The MutableThread holds a handle to the actual Thread. That thread can be recreated, which replaces the thread owned by the mutable thread:

public void createThread() {
    // Wrap this Runnable in a fresh Thread that carries the saved name
    mThisThread = new Thread(this, mThreadName);
}

Note that mThisThread is created by passing this to the Thread constructor. That makes the current object the Runnable of the new thread, so the new thread executes this object's run() method.

The actual thread is encapsulated within the MutableThread and can be recreated and restarted.
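The reason for recreating rather than restarting is a rule of java.lang.Thread: start() may be called only once on a Thread object, and calling it again, even after the thread has terminated, throws IllegalThreadStateException. This sketch demonstrates the rule and the workaround a MutableThread relies on, which is to hand the same Runnable to a fresh Thread. RestartDemo and tryStartAgain() are illustrative names.

```java
/** Shows that a terminated java.lang.Thread cannot be started again. */
class RestartDemo {
    /** Attempts start(); returns false if the Thread object has already been started. */
    static boolean tryStartAgain(Thread t) {
        try {
            t.start();
            return true;
        } catch (IllegalThreadStateException e) {
            return false; // start() may be called only once per Thread object
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> System.out.println("working in " + Thread.currentThread().getName());
        Thread first = new Thread(work, "worker");
        first.start();
        first.join();
        System.out.println("restart same Thread? " + tryStartAgain(first)); // false
        Thread second = new Thread(work, "worker"); // same Runnable, fresh Thread
        second.start();
        second.join();
    }
}
```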

The deprecated Thread.stop() method is used in the test program to show what happens when threads die:

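System.out.println("TEST: Stopping threadOne");
threadOne.stop(); // deprecated Thread.stop(), used here only to simulate thread death
System.out.println("TEST: Stopping threadMutable");
// threadMutable's internal thread is killed the same way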
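Thread.stop() has been deprecated since Java 1.2 because it is inherently unsafe, and in JDK 20 and later it throws UnsupportedOperationException, so this part of the test program will not work on a current JVM. A hedged alternative for simulating thread death is to give the test thread a loop that ends when interrupted: returning from run() ends the thread just as surely, and the watchdog sees the same result from isAlive(). KillableTestThread is a hypothetical stand-in for the article's TestThread.

```java
/** Hypothetical stand-in for the test program's TestThread: loops until interrupted. */
class KillableTestThread extends Thread {
    KillableTestThread(String name) { super(name); }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                Thread.sleep(10); // simulated work
            }
        } catch (InterruptedException e) {
            // fall through: returning from run() ends the thread
        }
    }
}

class KillDemo {
    public static void main(String[] args) throws InterruptedException {
        KillableTestThread threadOne = new KillableTestThread("threadOne");
        threadOne.start();
        System.out.println("TEST: Stopping threadOne");
        threadOne.interrupt(); // instead of the deprecated threadOne.stop()
        threadOne.join();
        System.out.println("alive after interrupt? " + threadOne.isAlive()); // false
    }
}
```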

Later in the test program, we even kill the alpha watchdog to make sure the failure is detected and reported. Unlike the application threads, the watchdogs are named internally, so they need no names from the caller.

More on MutableThread

The MutableThread class does not use thread groups, but that support could be added. The thread owned by a MutableThread is not accessible to any other class, because it should not be referenced anywhere else; users of the class should not be affected when the thread is replaced. The class is abstract because it does not implement the run() method, which every subclass of MutableThread must implement.

Note that MutableThread implements the Runnable interface, so the internal thread created with new Thread(Runnable, String) invokes the MutableThread's own run() method; the String argument is the thread's name. The MutableThread retains all necessary attributes, so the thread name and priority are assigned to the new thread whenever createThread() is invoked.
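Pulling these pieces together, here is a minimal sketch of a class in the spirit of MutableThread. It is not the code in mutablethread.jar: MiniMutableThread, the mThreadPriority field, and the exact behavior of start(), isAlive(), and reCreate() are assumptions; only the mThisThread and mThreadName names and the new Thread(this, mThreadName) call come from the article. The thread stays private, run() is left abstract, and the saved name and priority are applied to every replacement thread.

```java
/**
 * Hypothetical sketch (not the article's downloadable MutableThread): an abstract
 * Runnable that owns a private Thread and can replace it after the thread dies.
 */
abstract class MiniMutableThread implements Runnable {
    private final String mThreadName;
    private final int mThreadPriority;
    private Thread mThisThread; // never exposed, so callers cannot hold on to a stale thread

    protected MiniMutableThread(String name, int priority) {
        mThreadName = name;
        mThreadPriority = priority;
        createThread();
    }

    /** Builds a fresh Thread that runs this object, with the saved name and priority. */
    public final synchronized void createThread() {
        mThisThread = new Thread(this, mThreadName);
        mThisThread.setPriority(mThreadPriority);
    }

    public synchronized void start() { mThisThread.start(); }

    public synchronized boolean isAlive() { return mThisThread.isAlive(); }

    public String getName() { return mThreadName; }

    /** Replaces a dead thread with a new one and starts it; does nothing while it is alive. */
    public synchronized void reCreate() {
        if (!mThisThread.isAlive()) {
            createThread();
            mThisThread.start();
        }
    }

    /** Subclasses supply the work, as the article requires. */
    @Override
    public abstract void run();
}
```

Declaring createThread() final keeps the constructor from calling a method that a subclass could override.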

Explore mutation

That's about it. Download the source code file that accompanies this article and extract various bits to explore the notion of mutable threads and watchdogs on your own.

mutablethread.jar contains all the code for MutableThread and the watchdogs. I've included a Windows batch file, run.bat, to help you invoke the test program: simply type run in the directory that contains mutablethread.jar and run.bat. The code includes a logger class that logs to the console, but it can easily be adapted to a logging system such as log4j. There is also a ReportingException class that supports nested exceptions and is used to report that a thread has died.

Finally, don't forget to summarize requirements and separate the requirements from design concerns. Following this analysis and design approach can be a big advantage in creating more reliable and extensible systems.
