About takt

What's the Problem?

Developing distributed systems is difficult for the following reasons:

  • Concurrency bugs are difficult to reproduce
  • Writing test cases for integration testing is difficult

When debugging a distributed system, we usually try to reproduce a bug by running the system again and again while changing system parameters or inserting print statements. However, the execution results of a distributed system can vary from run to run, because they depend on timing factors that we cannot control, such as process (thread) scheduling, network speed, and machine performance.

To improve system stability, it is important to test the system under various conditions. Unit testing has become popular in recent years, and various tools and frameworks such as xUnit are available. In addition to unit testing, integration (or system-level) testing is also important for distributed systems, because the system consists of multiple units of processing (i.e., servers, processes, and threads). However, integration testing of distributed systems is very difficult, since the system behavior is non-deterministic due to the uncontrollable timing factors. How do you define the expected results of system behavior for an integration test case when the behavior can differ on every run?


What takt Does

To overcome the problems mentioned above, takt provides two important features:

  • Deterministic process (thread) execution
  • Generation of various scenarios

In takt, all timing factors related to the execution of a distributed system are controlled in a centralized manner, and all processes (threads) are scheduled and executed deterministically. Therefore, if you find a bug using takt, you can reproduce the same bug as many times as you need.

The pattern of process (and thread) scheduling is varied by changing the random seed parameter given to takt. This lets you easily generate various test scenarios with different coordination patterns among processes and threads. takt also provides functionality for injecting various faults into the target system, such as network disconnections, packet loss, and system call errors.
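
The idea of seed-controlled scheduling and fault injection can be illustrated with a small Python sketch. This is not takt's implementation or API; the names worker, run_scenario, and fault_rate are hypothetical and exist only for this illustration. Two toy "threads" (generators) are interleaved by a single seeded random number generator, which also decides where faults are injected, so every timing decision comes from one controllable source.

```python
import random


def worker(name, steps):
    """A toy 'thread': yields once per step so the scheduler can interleave it."""
    for i in range(steps):
        yield f"{name}{i}"


def run_scenario(seed, fault_rate=0.2):
    """Run two toy threads under one seeded scheduler and return the event trace.

    Hypothetical helper for illustration only; it is not part of takt.
    """
    rng = random.Random(seed)  # single source of every "timing" decision
    runnable = {"A": worker("A", 3), "B": worker("B", 3)}
    trace = []
    while runnable:
        # Scheduling decision: pick the next thread to run.
        name = rng.choice(sorted(runnable))
        try:
            event = next(runnable[name])
        except StopIteration:
            del runnable[name]  # this thread has finished
            continue
        # Fault-injection decision: drawn from the same seeded generator.
        if rng.random() < fault_rate:
            event += "!fault"
        trace.append(event)
    return trace


# Run the same scenario with five different seeds, then replay one of them.
traces = {seed: run_scenario(seed) for seed in range(5)}
replay = run_scenario(3)
```

Here replay is identical to traces[3], because the same seed drives the same sequence of decisions, while other seeds generally yield different interleavings and fault placements.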


Architecture

takt consists of three main components:

  • takt Core: realizes deterministic process (thread) scheduling and virtually executes the services provided by the OS (i.e., system calls)
  • Test Scenario Library and Tools: facilitate writing various test scenarios
  • Visualizer: visualizes process (thread) behavior and the progression of exhaustive testing

The following figure shows the difference between the ordinary execution of distributed systems and the deterministic execution using takt.

In takt, each process is attached to a Syscall Trap, which catches every system call invoked by the process, sends the type and parameters of the call to the takt Core, and receives the result of the processed call from the takt Core.

The takt Core schedules the invocation of system calls deterministically according to a virtual clock and executes each system call virtually, without interacting with the OS.
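
As a rough illustration of scheduling by virtual clock, the following Python sketch keeps trapped calls in a priority queue ordered by virtual completion time. It is a simplified model under stated assumptions, not the takt Core itself: the class VirtualCore, its methods trap and run, the cost parameter, and the call names are all hypothetical.

```python
import heapq


class VirtualCore:
    """Toy stand-in for a central core that orders trapped calls by virtual time."""

    def __init__(self):
        self.now = 0      # virtual clock, in arbitrary ticks
        self._queue = []  # heap of (due_time, sequence_no, pid, call, args)
        self._seq = 0     # tie-breaker: equal due times complete in arrival order
        self.log = []

    def trap(self, pid, call, args, cost):
        """Record a call trapped from process `pid`; it completes `cost` ticks from now."""
        heapq.heappush(self._queue, (self.now + cost, self._seq, pid, call, args))
        self._seq += 1

    def run(self):
        """Complete queued calls in virtual-time order without touching the real OS."""
        while self._queue:
            due, _, pid, call, args = heapq.heappop(self._queue)
            self.now = due  # the clock jumps forward; nothing actually sleeps
            self.log.append((self.now, pid, call, args))
        return self.log


core = VirtualCore()
core.trap(pid=1, call="sleep", args=(5,), cost=5)
core.trap(pid=2, call="write", args=("fd1", "hello"), cost=1)
core.trap(pid=3, call="write", args=("fd1", "world"), cost=1)
log = core.run()
```

Because the order depends only on the virtual due time and the arrival sequence number, the two writes complete at tick 1 in arrival order and the sleep completes at tick 5, regardless of how fast the host machine is.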


Writing Test Scenarios

You can write various test scenarios for distributed systems using the test scenario library (called libtakt). The following figure shows sample code for testing a client-server system. In this code, the client and server processes are executed on the same virtual node, and the scenario is executed 5 times with different scheduling patterns.


takt Visualizer

Thread behavior is visualized using the takt Visualizer. The following example shows the behavior of seven threads (five belong to the server process and the other two belong to the client process). Each timeline corresponds to one thread, and the system calls invoked, faults injected, and errors detected in that thread are displayed as colored circles along its timeline.



The progression of exhaustive testing is also visualized using the takt Visualizer. The arrangement of colored circles in 2D space represents the variety of system behaviors observed while executing the various test cases.
