Technical Information for Developers

Tilman Mehler und Dino Midzic

v0.2 Jan 2006
In the following, we give some detailed information about the implementation of StEAM. It is meant to make it easier for developers to get known to the source package of the model checker.

1. What's new

2. IVM Source Files

3. Additional Source files

4. Further Notes

5. Related Links and Literature


1. What's new

2. IVM Source Files

The original package of ivm includes the virtual machine and a modified version of the gcc compiler igcc, which produces the ELF executables to be interpreted by ivm. As stated before, the approach of StEAM does not require changes to igcc, so only the source files for the virtual machine are of relevance. Moreover, only the files located in the sub folder ivm/vm need to be regarded. These include:

2.1 icvmvm.c

In the original package, this file contained the loop which reads and executes the program's opcodes. For StEAM, this was replaced by a loop, which expands program states according to the chosen algorithm. The original program execution is only done until the main-thread registers to the model checker. At this point, the contents of the stack, the machine, the global variables and the dynamic memory are memorized to build the initial state (i.e. the root state in the search tree).

The source file also holds the functions getNextState, expandState and iterateThread. The first function chooses the next open state to be expanded according to the used search algorithm. The second function expands a given state and adds the generated successors to the open list. The third function iterates a given thread one atomic step by executing the corresponding machine instructions.

2.2 icvmsup.c

This file implements the instructions of the virtual machine in a huge collection of mini-function. Due to its size of more than 140000 lines, it is compiled to 14 separate object files (icvmsup.o, icvmsup1.o, ... ,icvmsup13.o) before linking. Compiling icvmsup takes quite long (several minutes on a 1.8GHz machine). This is quite inconvenient in cases, where a make clean needs to be done (e.g. when a constants in a header file were changed), as the original makefile will also delete the icvmsup object files, even if the machine instructions are not affected by the change. In this case it may be convenient to move the object icvmsup object file to a temporary folder before cleaning up and moving them back before the subsequent make all.

In order to do so, you can type make major_update instead of make clean; make all. This will do it automaticly.

The source file icvmsup.c was not modified during the development of StEAM. However, the file is useful to find out about the meaning of a certain instruction. In icvmsup.c, an instruction can be located through its corresponding opcode. For example, if we disassemble the dining philosophers from User Manual with

iobjdump -d philosophers | less

we find a fraction of machine code:



                 .
                 .
                 .

20:       0200 7c2c 0000  bsrl 2c9c <___do_global_ctors>
26:       a8eb 0800       movel (8,%r2),-(%sp)
2a:       a8eb 0400       movel (4,%r2),-(%sp)

                 .
                 .
                 .

We may want to find out about the instruction brsl <x> (branch sub routine long). Note that iobjdump displays opcodes according to the hi/low order of the underlying machine. In our case, we have an Intel-processor with Little-Endian notation, which means that the corresponding opcode is 0x2 rather than 0x200. In icvsump.c we find a line starting with:

_If1(_sCz2,2,s32) ...

The instructions for ivm are generated through a macro, whose first parameter is the opcode with a leading \_sCz. Note that the opcode is appended without any leading zeros. We can easily interpret the implementation of the instruction, even without knowing how the macro translates to compilable c-code:

{{(R_SP-=8,WDs32(R_SP,R_PC+6,0));R_PC=(is32(2)+R_PC);}

First, the stack-pointer (R\_PC) is decremented to reserve space for the return-address of the called sub routine. Then, that address is stored at the new position of stack-pointer through the macro WDs32 (write signed 32-bit data). The return address corresponds to the current program counter plus 6 - the length of the bsrl instruction. Afterwards, the program counter is set to the the current position plus a 32-bit offset that follows the opcode. Apparently, the macro is32(y) reads a 32-bit parameter from address of the program counter plus y.

2.3 cvm.h

This header defines the basic data types needed for the virtual machine, such as the machine registers. The macros used by the mini-functions in icvmsup.c - such as WDs32 - are also defined here. As a big convenience, all memory read- and write-accesses are encapsulated through the macros of this header. This enabled us to record all such accesses by simply enhancing the macros. On the one hand, this allow us to determine changes made by a state transition without fully comparing the pre- and successor state. On the other hand, we can capture illegal memory accesses before they actually happen.

3. Additional Source files

The following source files were added for StEAM.

3.1 avltree.h|.c queue.h|.c

The name is somewhat misleading, as this folder also contains .c files. More precisely, it contains the AVL-Tree package used for the lock- and memory pool of states in StEAM. Using the AVL trees may not have been the best choice for storing the pools, as their handling is tedious and a simple hash table may be faster in many cases. Developers are hereby encouraged to replace the AVL trees with a more suitable data structure.

3.2 IVMThread.h|.cc

This is the class definition for threads in StEAM. All new thread classes must be derived from IVMThread. A thread registers to the model checker by executing a TREG command-pattern in its start-method. A second TREG statement tells StEAM, it has reached the run()-method of the thread.

3.3 mfgen.c

This is the source code of the tool, which is parameterized with the names of all involved source files of the checked program. In turn generates a makefile for building the ivm-executable that can be model checked by StEAM.

3.4 extcmdlines.cpp

This is the source code annotation tool. Before the investigated program is compiled, its source is annotated with line number information and information about the name of the source file through command patterns. The annotation is automatically done in the makefile generated by mfgen and, hence, remains transparent to the user. It must be assured though, that extcmdlines.cpp is compiled to an executable called extcml. It must be in the system path.

The source annotation tool is actually a workaround, as StEAM does currently not use the debug information that can be compiled into ELF files (e.g. using the -g option on gcc). Writing a parser for this information would have cost too much time considering the manpower available for the project. Developers are encouraged to add this functionality to StEAM, as it would make the annotation redundant and speed up the entire model checking process.

3.5 pool.h|.c

These files define the functions needed for the lock- and memory-pool of StEAM's state description.

3.6 icvm_verify.h|.c

These source files are the core of StEAM as they define the command patterns, data structures and functions needed by the model checker. A command pattern is generated by the macro INCDECPATTERN, which translates into a senseless sequence of increment and decrement instructions. The parameters of a pattern are realized by assigning their value to a local variable. After parsing the command pattern, StEAM can obtain the values of the parameters directly from the stack.

Most functions defined in icvm_verify.c relate to program states. This includes, cloning of a state, comparison of two states, deletion of states etc. Moreover, icvm_verify.c implements the functions of StEAM's internal memory management. The management is encapsulated through the functions gmalloc(int s) and gpmalloc(int s, char * p), which allocate memory of a given size or of a given size plus a given purpose. When StEAM is invoked with the parameter -memguard, all memory regions allocated with gmalloc or gpmalloc are stored in a private memory pool. NOTE: This refers to the memory pool of the model checker and must not be confused with that of the investigated program. It is possible to print the contents of the memory pool using the function dumpMemoryPool().

StEAM's internal memory management is useful to e.g. detect memory leaks in the model checker. For instance, this was important during the implementation of state-reconstruction. Here, states are frequently deleted after being replaced by a corresponding mini-state. If a few bytes of the state are not freed, the resulting leaks will quickly exceed the available memory.

The source file icvm_verify.c also implements the currently supported heuristics through the function getHeuristicEstimate. The latter function is called by getNextState() in icvmvm.c, which chooses the state to be expanded next according to its heuristic value. Note that for simplicity undirected search is also implemented through heuristic values. For BFS, the heuristic estimate of a state corresponds to its depth in the search tree. For DFS the estimate is determined by subtracting the depth from a constant which is greater than any search depth we may ever encounter.

3.7 steam.c

This is the frontend of the model checker, which calls the executable of the modified virtual machine. Environment variables are used to pass the parameters.

3.8 hash.h|.c

These files implement the (non-)incremental hashing in StEAM. The hash function is parameterized with a bitmask, which determines the hashed components of the state description. The frontend of steam currently only gives two choices: Either only the cpu-registers are hashed (default) or the entire state (-fullhash). The cpu registers were chosen for the partial hash option, since they constitute the state component that is most likely to be changed by a state transition. Developers are encouraged to try other subsets of components.

4. Further Notes

4.1 Changes in icvmvm

The functions getNextState, expandState, iterateThread and printTrail actually do not belong in the file icvmvm.c and it would be a good idea to move them to icvm_verify.c.

When StEAM is started with the -scrtrl option, the error trail is printed as the actual source lines of the checked program (rather than just the line numbers). For this, StEAM makes use of the Unix-commands cat and grep. Hence, the option will not work on systems where these commands are not available. It would be desirable to rewrite the printTrail function such that it no longer requires any external programs.

As a definite drawback, states must currently be fully expanded. The problem is, that it is not a-priori known, if during thread iteration a non-deterministic statement is encountered. In the latter case, execution of a thread yields more than one successor. For this reason state reconstruction is currently not available for programs involving non-deterministic statements. Also the full expansion is disadvantageous for depth-first variation, such as DFS and IDA*, since at depth d the model checker needs to memorize d o states instead of d - where o is the outgoing degree of a node in the search tree.

Hence, a big improvement would be to add the possibility to directly generate the i-th successor of a state.

5. Related Links and Literature

5.1 Links

Internet C/C++ Virtual Machine - Internet C++/Internet Virtual Machine is a high-speed, open-source alternative to Java and .NET. Applications written in standard languages, such as C and C++. OpenGL 1.2 support brings portable high-speed 3D graphics and games.

StEAM User Manual

5.2 Literature

A. Groce and W. Visser
Model Checking Java Programs using Structural Heuristics.
In International Symposium on Software Testing and Analysis (ISSTA), pages 12-21, 2002.

T. Mehler and P. Leven.
Introduction to StEAM - an assembly-level software model checker.
Technical report, University of Freiburg, 2003.