Formal Verification: From Programs to Formulas

Convert programs to logical formulas and prove properties about them

Imagine being able to prove that, no matter what inputs your program receives, certain properties or invariants will always hold by the end of the execution. This is what formal verification allows us to do, and in this article, we'll explore how to convert a program into logical formulas that we can then solve to create proofs about it.

💡
If you have a background in blockchain security, we'll be implementing something like Halmos or HEVM, but for a very simple imperative language instead of EVM programs.

The general steps involved in this process are:

  1. Implementing an interpreter for a simple imperative language (IMP).

  2. Implementing a parser that turns IMP programs into ASTs.

  3. Converting IMP ASTs into logical formulas (SMT formulas), also represented as an AST.

  4. Embedding the properties we want to prove into the logical formulas.

  5. Passing the logical formulas to an SMT solver (Z3), which will determine whether the properties we specified hold true for every possible input to the program.

In this article, we'll see how to implement all the previous steps intuitively. Moreover, you'll find a link to the code on GitHub at the end of the article. If there's enough interest, I'll create a follow-up article going through that code step by step. Let's begin!

Prerequisites

If you are reading this, I assume you already have a good understanding of general programming concepts. In addition, I suggest having a basic understanding of what a parser and an AST are before reading this article.

Converting the program into an AST

First, we need to define the structure of the IMP language that we are going to be working with. Here's an example program written in the IMP language:

balance := 0;
bonus := 10;

if (1000 <= deposit) {
    balance := balance + deposit * bonus
} else {
    balance := balance + deposit
}

This program receives a deposit amount, and adds it to the balance. If the deposit is greater than 1000, then it gets multiplied by a bonus factor. We'll see how we can formally verify invariants such as balance >= 0, for every possible deposit amount.

To convert the program into logical formulas, we first need to convert it from text to a representation in which we can manipulate it better: an AST. Let's see a simple example on how a program is represented as an AST:

a := 1;

if (a == 1) {
    a := 2
} else {
    a := 3
}

IMP AST

We can see the AST is just a tree with different types of nodes. For instance:

  • Seq represents a sequence of two statements. Thus, it has two child nodes.

  • Set represents a variable assignment and holds the variable name. It has one child node that represents the value of the variable.

  • Lit represents an integer literal. It doesn't have any child nodes, so we call it a leaf of the tree.

  • If represents an if-then-else statement, and has three child nodes: the condition, the statement in the if branch, and the statement in the else branch.

Transforming the IMP AST into a logical formula

Now that we have an IMP AST, we have to convert it to a logical formula, which can be also represented as an AST. We'll call the latter the Z3 AST since we'll use a tool called Z3 to solve the formulas (we'll see exactly what solving the formulas means in the next sections).

💡
Z3 is an open-source solver developed by Microsoft, and it's the industry standard. However, there are other solvers that could be used.

Let's see how the IMP AST we saw before translates to a Z3 AST:

Z3 AST

Although this doesn’t look exactly as an AST, the structure is fairly similar to the one of the IMP AST. The main differences are:

  • We use node types provided by the Z3 API.

    • The bv prefix stands for bit-vectors, which are used to represent numbers as an array of bits. In this case, we are using arrays of 32 bits for variables and literals.

      💡
      There are different types of SMT formulas, which are called theories. In this case, we are using the theory of fixed-width bit-vectors.
  • Each statement converts to a separate tree; that's why we see one tree for the initial variable assignment and one "tree" for the if-then-else (ite).

    • All trees, however, are part of the same Z3 computation, which represents the logical formula that will be solved at the end.
  • Variable assignments are represented with a combination of Z3.eq(var, value) and Z3.assert(condition).

    • For example, the assignment a := 1 translates to Z3.assert(Z3.eq("a", 1)).

      💡
      Z3.assert adds a constraint to the Z3 computation, which links the variable with its corresponding value.
  • Each time a variable is assigned to, we create a fresh variable with the same name, and Z3 assigns it an incremental id that differentiates it from the variable before the assignment.

    💡
    This is known as the Static Single-Assignment form (SSA).
  • The variable a!3 represents the value of a after the if-then-else, and encodes all possible values of a considering both execution paths (the one where the if branch is executed, and the one where the else branch is executed instead).

    💡
    In the example program we are considering, we already know what branch will be executed since there are no free variables (i.e. all variables have a fixed initial value). However, we'll then see an example in which the branch to be executed is unknown until runtime, so the solver has to consider all possible outcomes to determine whether a property holds true for every initial state of the program.

Adding constraints to the logical formula

Once we have the Z3 AST (i.e. the logical formula that encodes the IMP program), we can add aditional constraints to it. These constraints will represent the properties or invariants that we want to formally prove about our program.

a := 1; // a!0

if (a == 1) {
    a := 2 // a!1
} else {
    a := 3 // a!2
}

// a!3

In the example program we are considering, we could add a constraint on the final value of a. For this, we would have to constrain the a!3 variable, as we saw in the Z3 AST before. For instance, we could add the constraint a!3 == 2, which should hold true given the logic of the program.

Now we are finally ready to provide the Z3 solver with the Z3 AST plus the final constraints we added. The solver's job will be to find one possible assignment to all the variables in the program (namely a!0, a!1, a!2, a!3) that satisfies all the constraints (i.e. the ones in the Z3 AST, and the ones we added at the end).

  • If the solver finds an assignment for all the variables that satisfies all constraints, it will output those assignments.

  • If it can't find an assignment that satisfies all constraints, it will tell us that the logical formula is unsatisfiable.

Running the solver to prove properties

Let's go back to the first example program we saw to put everything together:

balance := 0;
bonus := 10;

if (1000 <= deposit) {
    balance := balance + deposit * bonus
} else {
    balance := balance + deposit
}

There are three variables in this program: balance, bonus and deposit. Note that deposit doesn't have an initial value (i.e. it's a free variable), so we'll let Z3 explore all possible initial values to prove some properties about this program.

Generating counterexamples

First, we'll want to prove that for every non-negative deposit amount, after the program executes, the balance is always non-negative as well. For this, we need to impose two constraints: one for the initial value of deposit, and one for the final value of balance. We can do so by using Z3.assert as we saw before:

Representing properties as constraints to the Z3 computation

  • The box on the right is the Z3 AST representation of the program along with its corresponding constraints, which is generated with the process we discussed before.

  • We add two additional constraints (the two boxes on the left) that encode the property we want to prove about the program:

    1. We constrain the initial value of deposit (i.e. deposit!0) to be greater than or equal to 0, using Z3.bvSge (signed >= for bit-vectors).

    2. We constrain the final value of balance (i.e. balance!k, where k is just the internal id Z3 will assign to that final variable) to be less than or equal to -1, using Z3.bvSle (signed <= for bit-vectors).

      💡
      Note that we want to prove that balance >= 0, but we are actually adding the constraint balance <= -1, which is the opposite. This is because if the solver finds an initial value for deposit that satisfies balance <= -1 at the end of the execution, we want it to output it as a counterexample. If the solver can't find such initial value, it will output Unsatisfiable and we'll know there's no initial value for deposit that makes the balance negative, which is what we want to prove.

If we finally run the Z3 solver passing it the final logical formula we just built, it will output the following:

deposit!0 -> #x7fffffff

deposit!8 -> #x7fffffff
bonus!7 -> #x0000000a
balance!6 -> #xfffffff6

balance!5 -> #x7fffffff

balance!4 -> #xfffffff6

bonus!3 -> #x0000000a
balance!2 -> #x00000000

It gave us a counterexample! This means the property balance >= 0 does not hold for every possible initial deposit. Looking at the value the Z3 solver assigned to the deposit!0 variable (i.e. the initial value for deposit), we can see it's the maximum signed integer value using 32 bits (i.e. 2147483647). That value will create an overflow on the balance variable on the following line:

balance := balance + deposit * bonus

This will make the balance negative, satisfying the constraint we imposed.

Let's also understand what the rest of the variable assignments actually are:

  • balance!2 and bonus!3 are the variables that represent the initial balance and bonus at the start of the program:

      balance := 0;
      bonus := 10;
    
      bonus!3 -> #x0000000a
      balance!2 -> #x00000000
    
  • balance!4 and balance!5 represent the balance within the if and else branches, respectively.

      if (1000 <= deposit) {
          balance := balance + deposit * bonus
      } else {
          balance := balance + deposit
      }
    
      balance!5 -> #x7fffffff
      balance!4 -> #xfffffff6
    
    💡
    Note that balance!4 is where the overflow happens, since 0xfffffff6 == -10 in signed two's complement.
  • deposit:8, bonus!7 and balance:6 represent the final state of the program, after the if-then-else statement.

      deposit!8 -> #x7fffffff
      bonus!7 -> #x0000000a
      balance!6 -> #xfffffff6
    

    Both deposit and bonus will keep their original value since they are constants, and balance will be assigned the value of balance!4, since the if branch will be taken in this particular case.

💡
Remember that the Z3 solver's job is to create an assignment to all the intermediate variables such that all constraints are satisfied. From Z3's perspective, bonus!3 and bonus!7 are completely different variables; what ends up making them have the same value at the end are the constraints created during the Z3 AST construction.

Proving a property for all possible inputs

Let's conclude with an example where the Z3 solver cannot find a variable assignment to satisfy all constraints in the Z3 computation.

balance := 0;
bonus := 10;

if (1000 <= deposit) {
    balance := balance + deposit * bonus
} else {
    balance := balance + deposit
}

We now want to prove that, no matter what the deposit amount is, it's impossible for the balance to be 1000 at the end of the program's execution.

Using the same technique as before, we now impose the constraint balance == 1000.

Running the Z3 solver, we'll now get:

Unsat

This means that it's impossible to create an assignment for all the variables in the Z3 computation such that all constraints are satisfied at the same time.

Therefore, the property balance != 1000 holds true for every possible deposit amount.

Conclusion

In this article, we saw how to turn an imperative program into a logical formula (encoded using Z3's API), and then prove properties about that program by encoding these properties as constraints to the Z3 solver.

Let me know if you'd be interested in a follow-up article explaining the actual code that implements all of this from scratch. Otherwise, the code is available on GitHub for you to see on your own. Although it's implemented in Haskell (just like HEVM), it shouldn't be hard to understand it at a high level if you understood the process we went through in this article.

Until the next research project!

References