# Formal Verification: From Programs to Formulas

## Convert programs to logical formulas and prove properties about them

Imagine being able to prove that, no matter what inputs your program receives, certain properties or invariants will always hold by the end of the execution. This is what formal verification allows us to do, and in this article, we'll explore how to convert a program into logical formulas that we can then *solve* to create proofs about it.

The general steps involved in this process are:

Implementing an interpreter for a simple imperative language (IMP).

Implementing a parser that turns IMP programs into ASTs.

Converting IMP ASTs into logical formulas (SMT formulas), also represented as an AST.

Embedding the properties we want to prove into the logical formulas.

Passing the logical formulas to an SMT solver (Z3), which will determine whether the properties we specified hold true for every possible input to the program.

In this article, we'll see how to implement all the previous steps intuitively. Moreover, you'll find a link to the code on GitHub at the end of the article. If there's enough interest, I'll create a follow-up article going through that code step by step. Let's begin!

# Prerequisites

If you are reading this, I assume you already have a good understanding of general programming concepts. In addition, I suggest having a basic understanding of what a parser and an AST are before reading this article.

# Converting the program into an AST

First, we need to define the structure of the IMP language that we are going to be working with. Here's an example program written in the IMP language:

```
balance := 0;
bonus := 10;
if (1000 <= deposit) {
balance := balance + deposit * bonus
} else {
balance := balance + deposit
}
```

This program receives a `deposit`

amount, and adds it to the `balance`

. If the `deposit`

is greater than `1000`

, then it gets multiplied by a `bonus`

factor. We'll see how we can formally verify invariants such as `balance >= 0`

, for every possible `deposit`

amount.

To convert the program into logical formulas, we first need to convert it from text to a representation in which we can manipulate it better: an AST. Let's see a simple example on how a program is represented as an AST:

```
a := 1;
if (a == 1) {
a := 2
} else {
a := 3
}
```

We can see the AST is just a tree with different types of nodes. For instance:

`Seq`

represents a sequence of two*statements*. Thus, it has two child nodes.`Set`

represents a variable assignment and holds the variable name. It has one child node that represents the value of the variable.`Lit`

represents an integer literal. It doesn't have any child nodes, so we call it a*leaf*of the tree.`If`

represents an`if-then-else`

statement, and has three child nodes: the condition, the statement in the`if`

branch, and the statement in the`else`

branch.

# Transforming the IMP AST into a logical formula

Now that we have an IMP AST, we have to convert it to a logical formula, which can be also represented as an AST. We'll call the latter the *Z3 AST* since we'll use a tool called Z3 to *solve* the formulas (we'll see exactly what *solving* the formulas means in the next sections).

Let's see how the IMP AST we saw before translates to a Z3 AST:

Although this doesn’t look exactly as an AST, the structure is fairly similar to the one of the IMP AST. The main differences are:

We use node types provided by the Z3 API.

The

`bv`

prefix stands for*bit-vectors*, which are used to represent numbers as an array of bits. In this case, we are using arrays of`32`

bits for variables and literals.💡There are different types of SMT formulas, which are called*theories*. In this case, we are using the theory of fixed-width bit-vectors.

Each

*statement*converts to a separate tree; that's why we see one tree for the initial variable assignment and one "tree" for the`if-then-else`

(ite).- All trees, however, are part of the same Z3 computation, which represents the logical formula that will be solved at the end.

Variable assignments are represented with a combination of

`Z3.eq(var, value)`

and`Z3.assert(condition)`

.For example, the assignment

`a := 1`

translates to`Z3.assert(Z3.eq("a", 1))`

.💡`Z3.assert`

adds ato the Z3 computation, which links the variable with its corresponding value.*constraint*

Each time a variable is assigned to, we create a fresh variable with the same name, and Z3 assigns it an incremental

`id`

that differentiates it from the variable before the assignment.💡This is known as the Static Single-Assignment form (SSA).The variable

`a!3`

represents the value of`a`

after the`if-then-else`

, and encodes all possible values of`a`

considering both execution paths (the one where the`if`

branch is executed, and the one where the`else`

branch is executed instead).💡In the example program we are considering, we already know what branch will be executed since there are no*free*variables (i.e. all variables have a fixed initial value). However, we'll then see an example in which the branch to be executed is unknown until runtime, so the solver has to consider all possible outcomes to determine whether a property holds true for every initial state of the program.

# Adding constraints to the logical formula

Once we have the Z3 AST (i.e. the logical formula that encodes the IMP program), we can add aditional constraints to it. These constraints will represent the properties or invariants that we want to formally prove about our program.

```
a := 1; // a!0
if (a == 1) {
a := 2 // a!1
} else {
a := 3 // a!2
}
// a!3
```

In the example program we are considering, we could add a constraint on the final value of `a`

. For this, we would have to constrain the `a!3`

variable, as we saw in the Z3 AST before. For instance, we could add the constraint `a!3 == 2`

, which should hold true given the logic of the program.

Now we are finally ready to provide the Z3 solver with the Z3 AST plus the final constraints we added. The solver's job will be to find **one** possible assignment to all the variables in the program (namely `a!0`

, `a!1`

, `a!2`

, `a!3`

) that satisfies all the constraints (i.e. the ones in the Z3 AST, and the ones we added at the end).

If the solver finds an assignment for all the variables that satisfies all constraints, it will output those assignments.

If it can't find an assignment that satisfies all constraints, it will tell us that the logical formula is

*unsatisfiable*.

# Running the solver to prove properties

Let's go back to the first example program we saw to put everything together:

```
balance := 0;
bonus := 10;
if (1000 <= deposit) {
balance := balance + deposit * bonus
} else {
balance := balance + deposit
}
```

There are three variables in this program: `balance`

, `bonus`

and `deposit`

. Note that `deposit`

doesn't have an initial value (i.e. it's a free variable), so we'll let Z3 explore all possible initial values to prove some properties about this program.

## Generating counterexamples

First, we'll want to prove that for every non-negative `deposit`

amount, after the program executes, the `balance`

is always non-negative as well. For this, we need to impose two constraints: one for the initial value of `deposit`

, and one for the final value of `balance`

. We can do so by using `Z3.assert`

as we saw before:

The box on the right is the Z3 AST representation of the program along with its corresponding constraints, which is generated with the process we discussed before.

We add two additional constraints (the two boxes on the left) that encode the property we want to prove about the program:

We constrain the initial value of

`deposit`

(i.e.`deposit!0`

) to be greater than or equal to 0, using`Z3.bvSge`

(signed`>=`

for bit-vectors).We constrain the final value of

`balance`

(i.e.`balance!k`

, where`k`

is just the internal id Z3 will assign to that final variable) to be**less than or equal to**`-1`

, using`Z3.bvSle`

(signed`<=`

for bit-vectors).💡Note that we want to prove that`balance >= 0`

, but we are actually adding the constraint`balance <= -1`

, which is the opposite. This is because if the solver finds an initial value for`deposit`

that satisfies`balance <= -1`

at the end of the execution, we want it to output it as a*counterexample*. If the solver can't find such initial value, it will output`Unsatisfiable`

and we'll know there's no initial value for`deposit`

that makes the`balance`

negative, which is what we want to prove.

If we finally run the Z3 solver passing it the final logical formula we just built, it will output the following:

```
deposit!0 -> #x7fffffff
deposit!8 -> #x7fffffff
bonus!7 -> #x0000000a
balance!6 -> #xfffffff6
balance!5 -> #x7fffffff
balance!4 -> #xfffffff6
bonus!3 -> #x0000000a
balance!2 -> #x00000000
```

It gave us a counterexample! This means the property `balance >= 0`

does not hold for every possible initial `deposit`

. Looking at the value the Z3 solver assigned to the `deposit!0`

variable (i.e. the initial value for `deposit`

), we can see it's the maximum signed integer value using 32 bits (i.e. `2147483647`

). That value will create an overflow on the `balance`

variable on the following line:

```
balance := balance + deposit * bonus
```

This will make the `balance`

negative, satisfying the constraint we imposed.

Let's also understand what the rest of the variable assignments actually are:

`balance!2`

and`bonus!3`

are the variables that represent the initial`balance`

and`bonus`

at the start of the program:`balance := 0; bonus := 10;`

`bonus!3 -> #x0000000a balance!2 -> #x00000000`

`balance!4`

and`balance!5`

represent the`balance`

within the`if`

and`else`

branches, respectively.`if (1000 <= deposit) { balance := balance + deposit * bonus } else { balance := balance + deposit }`

`balance!5 -> #x7fffffff balance!4 -> #xfffffff6`

💡Note that`balance!4`

is where the overflow happens, since`0xfffffff6 == -10`

in signed two's complement.`deposit:8`

,`bonus!7`

and`balance:6`

represent the final state of the program, after the`if-then-else`

statement.`deposit!8 -> #x7fffffff bonus!7 -> #x0000000a balance!6 -> #xfffffff6`

Both

`deposit`

and`bonus`

will keep their original value since they are constants, and`balance`

will be assigned the value of`balance!4`

, since the`if`

branch will be taken in this particular case.

`bonus!3`

and `bonus!7`

are completely different variables; what ends up making them have the same value at the end are the constraints created during the Z3 AST construction.## Proving a property for all possible inputs

Let's conclude with an example where the Z3 solver cannot find a variable assignment to satisfy all constraints in the Z3 computation.

```
balance := 0;
bonus := 10;
if (1000 <= deposit) {
balance := balance + deposit * bonus
} else {
balance := balance + deposit
}
```

We now want to prove that, no matter what the `deposit`

amount is, it's impossible for the `balance`

to be `1000`

at the end of the program's execution.

Using the same technique as before, we now impose the constraint `balance == 1000`

.

Running the Z3 solver, we'll now get:

```
Unsat
```

This means that it's impossible to create an assignment for all the variables in the Z3 computation such that all constraints are satisfied at the same time.

Therefore, the property `balance != 1000`

holds true for every possible `deposit`

amount.

# Conclusion

In this article, we saw how to turn an imperative program into a logical formula (encoded using Z3's API), and then prove properties about that program by encoding these properties as constraints to the Z3 solver.

Let me know if you'd be interested in a follow-up article explaining the actual code that implements all of this from scratch. Otherwise, the code is available on GitHub for you to see on your own. Although it's implemented in Haskell (just like HEVM), it shouldn't be hard to understand it at a high level if you understood the process we went through in this article.

Until the next research project!