<![CDATA[Research by Ezequiel]]>https://eperez.bloghttps://cdn.hashnode.com/res/hashnode/image/upload/v1718524689931/Y-jljpdR-.pngResearch by Ezequielhttps://eperez.blogRSS for NodeWed, 06 Nov 2024 15:01:52 GMT60<![CDATA[Formal Verification: From Programs to Formulas]]>https://eperez.blog/formal-verification-from-programs-to-formulashttps://eperez.blog/formal-verification-from-programs-to-formulasWed, 18 Sep 2024 15:42:35 GMT<![CDATA[<p>Imagine being able to prove that, no matter what inputs your program receives, certain properties or invariants will always hold by the end of the execution. This is what formal verification allows us to do, and in this article, we'll explore how to convert a program into logical formulas that we can then <em>solve</em> to create proofs about it.</p><div data-node-type="callout"><div data-node-type="callout-emoji">💡</div><div data-node-type="callout-text">If you have a background in blockchain security, we'll be implementing something like <a target="_blank" href="https://github.com/a16z/halmos">Halmos</a> or <a target="_blank" href="https://github.com/ethereum/hevm">HEVM</a>, but for a very simple imperative language instead of EVM programs.</div></div><p>The general steps involved in this process are:</p><ol><li><p>Implementing an interpreter for a simple imperative language (IMP).</p></li><li><p>Implementing a parser that turns IMP programs into <a target="_blank" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">ASTs</a>.</p></li><li><p>Converting IMP ASTs into logical formulas (<a target="_blank" href="https://en.wikipedia.org/wiki/Satisfiability_modulo_theories">SMT</a> formulas), also represented as an AST.</p></li><li><p>Embedding the properties we want to prove into the logical formulas.</p></li><li><p>Passing the logical formulas to an SMT solver (<a target="_blank" href="https://github.com/Z3Prover/z3">Z3</a>), which will determine whether the properties we specified hold true for every possible input to the 
program.</p></li></ol><p>In this article, we'll see how to implement all the previous steps intuitively. Moreover, you'll find a link to the code on GitHub at the end of the article. If there's enough interest, I'll create a follow-up article going through that code step by step. Let's begin!</p><h1 id="heading-prerequisites">Prerequisites</h1><p>If you are reading this, I assume you already have a good understanding of general programming concepts. In addition, I suggest having a basic understanding of what a parser and an AST are before reading this article.</p><h1 id="heading-converting-the-program-into-an-ast">Converting the program into an AST</h1><p>First, we need to define the structure of the IMP language that we are going to be working with. Here's an example program written in the IMP language:</p><pre><code class="lang-c">balance := <span class="hljs-number">0</span>;
bonus := <span class="hljs-number">10</span>;
<span class="hljs-keyword">if</span> (<span class="hljs-number">1000</span> <= deposit) {
    balance := balance + deposit * bonus
} <span class="hljs-keyword">else</span> {
    balance := balance + deposit
}</code></pre><p>This program receives a <code>deposit</code> amount, and adds it to the <code>balance</code>. If the <code>deposit</code> is at least <code>1000</code>, then it gets multiplied by a <code>bonus</code> factor. We'll see how we can formally verify invariants such as <code>balance >= 0</code>, for every possible <code>deposit</code> amount.</p><p>To convert the program into logical formulas, we first need to convert it from text to a representation that is easier to manipulate: an AST. 
Let's see a simple example of how a program is represented as an AST:</p><pre><code class="lang-c">a := <span class="hljs-number">1</span>;
<span class="hljs-keyword">if</span> (a == <span class="hljs-number">1</span>) {
    a := <span class="hljs-number">2</span>
} <span class="hljs-keyword">else</span> {
    a := <span class="hljs-number">3</span>
}</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718603344576/90e90358-12ae-4ccf-9efe-aa4991c31fb2.png" alt="IMP AST" class="image--center mx-auto" /></p><p>We can see the AST is just a tree with different types of nodes. For instance:</p><ul><li><p><code>Seq</code> represents a sequence of two <em>statements</em>. Thus, it has two child nodes.</p></li><li><p><code>Set</code> represents a variable assignment and holds the variable name. It has one child node that represents the value of the variable.</p></li><li><p><code>Lit</code> represents an integer literal. It doesn't have any child nodes, so we call it a <em>leaf</em> of the tree.</p></li><li><p><code>If</code> represents an <code>if-then-else</code> statement, and has three child nodes: the condition, the statement in the <code>if</code> branch, and the statement in the <code>else</code> branch.</p></li></ul><h1 id="heading-transforming-the-imp-ast-into-a-logical-formula">Transforming the IMP AST into a logical formula</h1><p>Now that we have an IMP AST, we have to convert it to a logical formula, which can also be represented as an AST. We'll call the latter the <em>Z3 AST</em> since we'll use a tool called <a target="_blank" href="https://github.com/Z3Prover/z3">Z3</a> to <em>solve</em> the formulas (we'll see exactly what <em>solving</em> the formulas means in the next sections).</p><div data-node-type="callout"><div data-node-type="callout-emoji">💡</div><div data-node-type="callout-text">Z3 is an open-source solver developed by Microsoft, and it's one of the most widely used SMT solvers. 
However, there are <a target="_blank" href="https://smt-lib.org/solvers.shtml">other solvers</a> that could be used.</div></div><p>Let's see how the IMP AST we saw before translates to a Z3 AST:</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719181170857/31d1a2d8-1077-4ad0-bedc-1cd568165011.png" alt="Z3 AST" class="image--center mx-auto" /></p><p>Although this doesn't look exactly like an AST, its structure is fairly similar to that of the IMP AST. The main differences are:</p><ul><li><p>We use node types provided by the <a target="_blank" href="https://z3prover.github.io/api/html/group__capi.html">Z3 API</a>.</p><ul><li><p>The <code>bv</code> prefix stands for <em>bit-vectors</em>, which are used to represent numbers as an array of bits. In this case, we are using arrays of <code>32</code> bits for variables and literals.</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">There are different types of SMT formulas, which are called <a target="_blank" href="https://en.wikipedia.org/wiki/Satisfiability_modulo_theories#Decidable_theories"><em>theories</em></a>. 
In this case, we are using the theory of fixed-width bit-vectors.</div> </div></li></ul></li><li><p>Each <em>statement</em> converts to a separate tree; that's why we see one tree for the initial variable assignment and one "tree" for the <code>if-then-else</code> (ite).</p><ul><li>All trees, however, are part of the same Z3 computation, which represents the logical formula that will be solved at the end.</li></ul></li><li><p>Variable assignments are represented with a combination of <code>Z3.eq(var, value)</code> and <code>Z3.assert(condition)</code>.</p><ul><li><p>For example, the assignment <code>a := 1</code> translates to <code>Z3.assert(Z3.eq("a", 1))</code>.</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text"><a target="_blank" href="https://z3prover.github.io/api/html/group__capi.html#gaa4ab09a1b7e3ee6e578cd33e67cbf894"><code>Z3.assert</code></a> adds a <strong><em>constraint</em></strong> to the Z3 computation, which links the variable with its corresponding value.</div> </div></li></ul></li><li><p>Each time a variable is assigned to, we create a fresh variable with the same name, and Z3 assigns it an incremental <code>id</code> that differentiates it from the variable before the assignment.</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">This is known as the <a target="_blank" href="https://en.wikipedia.org/wiki/Static_single-assignment_form">Static Single-Assignment form</a> (SSA).</div> </div></li><li><p>The variable <code>a!3</code> represents the value of <code>a</code> after the <code>if-then-else</code>, and encodes all possible values of <code>a</code> considering both execution paths (the one where the <code>if</code> branch is executed, and the one where the <code>else</code> branch is executed instead).</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div 
data-node-type="callout-text">In the example program we are considering, we already know what branch will be executed since there are no <em>free</em> variables (i.e. all variables have a fixed initial value). However, we'll later see an example in which the branch to be executed is unknown until runtime, so the solver has to consider all possible outcomes to determine whether a property holds true for every initial state of the program.</div> </div></li></ul><h1 id="heading-adding-constraints-to-the-logical-formula">Adding constraints to the logical formula</h1><p>Once we have the Z3 AST (i.e. the logical formula that encodes the IMP program), we can add additional constraints to it. These constraints will represent the properties or invariants that we want to formally prove about our program.</p><pre><code class="lang-c">a := <span class="hljs-number">1</span>; <span class="hljs-comment">// a!0</span>
<span class="hljs-keyword">if</span> (a == <span class="hljs-number">1</span>) {
    a := <span class="hljs-number">2</span> <span class="hljs-comment">// a!1</span>
} <span class="hljs-keyword">else</span> {
    a := <span class="hljs-number">3</span> <span class="hljs-comment">// a!2</span>
}
<span class="hljs-comment">// a!3</span></code></pre><p>In the example program we are considering, we could add a constraint on the final value of <code>a</code>. For this, we would have to constrain the <code>a!3</code> variable, as we saw in the Z3 AST before. For instance, we could add the constraint <code>a!3 == 2</code>, which should hold true given the logic of the program.</p><p>Now we are finally ready to provide the Z3 solver with the Z3 AST plus the final constraints we added. The solver's job will be to find <strong>one</strong> possible assignment to all the variables in the program (namely <code>a!0</code>, <code>a!1</code>, <code>a!2</code>, <code>a!3</code>) that satisfies all the constraints (i.e. 
the ones in the Z3 AST, and the ones we added at the end).</p><ul><li><p>If the solver finds an assignment for all the variables that satisfies all constraints, it will output those assignments.</p></li><li><p>If it can't find an assignment that satisfies all constraints, it will tell us that the logical formula is <em>unsatisfiable</em>.</p></li></ul><h1 id="heading-running-the-solver-to-prove-properties">Running the solver to prove properties</h1><p>Let's go back to the first example program we saw to put everything together:</p><pre><code class="lang-c">balance := <span class="hljs-number">0</span>;
bonus := <span class="hljs-number">10</span>;
<span class="hljs-keyword">if</span> (<span class="hljs-number">1000</span> <= deposit) {
    balance := balance + deposit * bonus
} <span class="hljs-keyword">else</span> {
    balance := balance + deposit
}</code></pre><p>There are three variables in this program: <code>balance</code>, <code>bonus</code> and <code>deposit</code>. Note that <code>deposit</code> doesn't have an initial value (i.e. it's a free variable), so we'll let Z3 explore all possible initial values to prove some properties about this program.</p><h2 id="heading-generating-counterexamples">Generating counterexamples</h2><p>First, we'll want to prove that for every non-negative <code>deposit</code> amount, after the program executes, the <code>balance</code> is always non-negative as well. For this, we need to impose two constraints: one for the initial value of <code>deposit</code>, and one for the final value of <code>balance</code>. 
We can do so by using <code>Z3.assert</code> as we saw before:</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719181136738/4f953c8f-9eb5-4217-9f53-f14e55cc6520.png" alt="Representing properties as constraints to the Z3 computation" class="image--center mx-auto" /></p><ul><li><p>The box on the right is the Z3 AST representation of the program along with its corresponding constraints, which is generated with the process we discussed before.</p></li><li><p>We add two additional constraints (the two boxes on the left) that encode the property we want to prove about the program:</p><ol><li><p>We constrain the initial value of <code>deposit</code> (i.e. <code>deposit!0</code>) to be greater than or equal to 0, using <code>Z3.bvSge</code> (signed <code>>=</code> for bit-vectors).</p></li><li><p>We constrain the final value of <code>balance</code> (i.e. <code>balance!k</code>, where <code>k</code> is just the internal id Z3 will assign to that final variable) to be <strong>less than or equal to</strong> <code>-1</code>, using <code>Z3.bvSle</code> (signed <code><=</code> for bit-vectors).</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">Note that we want to prove that <code>balance >= 0</code>, but we are actually adding the constraint <code>balance <= -1</code>, which is its negation. This is because if the solver finds an initial value for <code>deposit</code> that satisfies <code>balance <= -1</code> at the end of the execution, we want it to output it as a <em>counterexample</em>. 
If the solver can't find such an initial value, it will output <code>Unsatisfiable</code> and we'll know there's no initial value for <code>deposit</code> that makes the <code>balance</code> negative, which is what we want to prove.</div> </div></li></ol></li></ul><p>If we finally run the Z3 solver passing it the final logical formula we just built, it will output the following:</p><pre><code class="lang-plaintext">deposit!0 -> #x7fffffff
deposit!8 -> #x7fffffff
bonus!7 -> #x0000000a
balance!6 -> #xfffffff6
balance!5 -> #x7fffffff
balance!4 -> #xfffffff6
bonus!3 -> #x0000000a
balance!2 -> #x00000000</code></pre><p>It gave us a counterexample! This means the property <code>balance >= 0</code> does not hold for every possible initial <code>deposit</code>. Looking at the value the Z3 solver assigned to the <code>deposit!0</code> variable (i.e. the initial value for <code>deposit</code>), we can see it's the maximum signed integer value using 32 bits (i.e. <code>2147483647</code>). That value will create an overflow on the <code>balance</code> variable on the following line:</p><pre><code class="lang-c">balance := balance + deposit * bonus</code></pre><p>This will make the <code>balance</code> negative, satisfying the constraint we imposed.</p><p>Let's also understand what the rest of the variable assignments actually are:</p><ul><li><p><code>balance!2</code> and <code>bonus!3</code> are the variables that represent the initial <code>balance</code> and <code>bonus</code> at the start of the program:</p><pre><code class="lang-c">balance := <span class="hljs-number">0</span>;
bonus := <span class="hljs-number">10</span>;</code></pre><pre><code class="lang-plaintext">bonus!3 -> #x0000000a
balance!2 -> #x00000000</code></pre></li><li><p><code>balance!4</code> and <code>balance!5</code> represent the <code>balance</code> within the <code>if</code> and <code>else</code> branches, respectively.</p><pre><code class="lang-c"><span class="hljs-keyword">if</span> (<span 
class="hljs-number">1000</span> <= deposit) {
    balance := balance + deposit * bonus
} <span class="hljs-keyword">else</span> {
    balance := balance + deposit
}</code></pre><pre><code class="lang-plaintext">balance!5 -> #x7fffffff
balance!4 -> #xfffffff6</code></pre> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">Note that <code>balance!4</code> is where the overflow happens, since <code>0xfffffff6 == -10</code> in signed <a target="_blank" href="https://en.wikipedia.org/wiki/Two%27s_complement">two's complement</a>.</div> </div></li><li><p><code>deposit!8</code>, <code>bonus!7</code> and <code>balance!6</code> represent the final state of the program, after the <code>if-then-else</code> statement.</p><pre><code class="lang-plaintext">deposit!8 -> #x7fffffff
bonus!7 -> #x0000000a
balance!6 -> #xfffffff6</code></pre><p> Both <code>deposit</code> and <code>bonus</code> will keep their original value since neither branch reassigns them, and <code>balance</code> will be assigned the value of <code>balance!4</code>, since the <code>if</code> branch will be taken in this particular case.</p></li></ul><div data-node-type="callout"><div data-node-type="callout-emoji">💡</div><div data-node-type="callout-text">Remember that the Z3 solver's job is to create an assignment to all the intermediate variables such that all constraints are satisfied. 
From Z3's perspective, <code>bonus!3</code> and <code>bonus!7</code> are completely different variables; what ends up making them have the same value at the end are the constraints created during the Z3 AST construction.</div></div><h2 id="heading-proving-a-property-for-all-possible-inputs">Proving a property for all possible inputs</h2><p>Let's conclude with an example where the Z3 solver cannot find a variable assignment to satisfy all constraints in the Z3 computation.</p><pre><code class="lang-c">balance := <span class="hljs-number">0</span>;
bonus := <span class="hljs-number">10</span>;
<span class="hljs-keyword">if</span> (<span class="hljs-number">1000</span> <= deposit) {
    balance := balance + deposit * bonus
} <span class="hljs-keyword">else</span> {
    balance := balance + deposit
}</code></pre><p>We now want to prove that, no matter what the <code>deposit</code> amount is, it's impossible for the <code>balance</code> to be <code>1000</code> at the end of the program's execution.</p><p>Using the same technique as before, we now impose the constraint <code>balance == 1000</code>.</p><p>Running the Z3 solver, we'll now get:</p><pre><code class="lang-c">Unsat</code></pre><p>This means that it's impossible to create an assignment for all the variables in the Z3 computation such that all constraints are satisfied at the same time.</p><p>Therefore, the property <code>balance != 1000</code> holds true for every possible <code>deposit</code> amount.</p><h1 id="heading-conclusion">Conclusion</h1><p>In this article, we saw how to turn an imperative program into a logical formula (encoded using Z3's API), and then prove properties about that program by encoding these properties as constraints to the Z3 solver.</p><p>Let me know if you'd be interested in a follow-up article explaining the actual code that implements all of this from scratch. 
Otherwise, the code is available <a target="_blank" href="https://github.com/EperezOk/formal-verification-imp">on GitHub</a> for you to see on your own. Although it's implemented in Haskell (just like <a target="_blank" href="https://github.com/ethereum/hevm">HEVM</a>), it shouldn't be hard to understand it at a high level if you understood the process we went through in this article.</p><p>Until the next research project!</p><h1 id="heading-references">References</h1><ul><li><a target="_blank" href="https://github.com/EperezOk/formal-verification-imp/tree/main">GitHub Repository</a></li></ul><ul><li><p><a target="_blank" href="https://youtu.be/ruNFcH-KibY?si=e3AHsP5a6dzHoroa">[Talk] Analyzing Programs with Z3, Tikhon Jelvis</a></p></li><li><p><a target="_blank" href="https://www.cs.colostate.edu/~cs440/spring19/slides/z3-tutorial.pdf">Z3 - a Tutorial, Leonardo de Moura and Nikolaj Bjørner</a></p></li></ul>]]><![CDATA[<p>Imagine being able to prove that, no matter what inputs your program receives, certain properties or invariants will always hold by the end of the execution. 
This is what formal verification allows us to do, and in this article, we'll explore how to convert a program into logical formulas that we can then <em>solve</em> to create proofs about it.</p><div data-node-type="callout"><div data-node-type="callout-emoji">💡</div><div data-node-type="callout-text">If you have a background in blockchain security, we'll be implementing something like <a target="_blank" href="https://github.com/a16z/halmos">Halmos</a> or <a target="_blank" href="https://github.com/ethereum/hevm">HEVM</a>, but for a very simple imperative language instead of EVM programs.</div></div><p>The general steps involved in this process are:</p><ol><li><p>Implementing an interpreter for a simple imperative language (IMP).</p></li><li><p>Implementing a parser that turns IMP programs into <a target="_blank" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">ASTs</a>.</p></li><li><p>Converting IMP ASTs into logical formulas (<a target="_blank" href="https://en.wikipedia.org/wiki/Satisfiability_modulo_theories">SMT</a> formulas), also represented as an AST.</p></li><li><p>Embedding the properties we want to prove into the logical formulas.</p></li><li><p>Passing the logical formulas to an SMT solver (<a target="_blank" href="https://github.com/Z3Prover/z3">Z3</a>), which will determine whether the properties we specified hold true for every possible input to the program.</p></li></ol><p>In this article, we'll see how to implement all the previous steps intuitively. Moreover, you'll find a link to the code on GitHub at the end of the article. If there's enough interest, I'll create a follow-up article going through that code step by step. Let's begin!</p><h1 id="heading-prerequisites">Prerequisites</h1><p>If you are reading this, I assume you already have a good understanding of general programming concepts. 
In addition, I suggest having a basic understanding of what a parser and an AST are before reading this article.</p><h1 id="heading-converting-the-program-into-an-ast">Converting the program into an AST</h1><p>First, we need to define the structure of the IMP language that we are going to be working with. Here's an example program written in the IMP language:</p><pre><code class="lang-c">balance := <span class="hljs-number">0</span>;
bonus := <span class="hljs-number">10</span>;
<span class="hljs-keyword">if</span> (<span class="hljs-number">1000</span> <= deposit) {
    balance := balance + deposit * bonus
} <span class="hljs-keyword">else</span> {
    balance := balance + deposit
}</code></pre><p>This program receives a <code>deposit</code> amount, and adds it to the <code>balance</code>. If the <code>deposit</code> is at least <code>1000</code>, then it gets multiplied by a <code>bonus</code> factor. We'll see how we can formally verify invariants such as <code>balance >= 0</code>, for every possible <code>deposit</code> amount.</p><p>To convert the program into logical formulas, we first need to convert it from text to a representation that is easier to manipulate: an AST. Let's see a simple example of how a program is represented as an AST:</p><pre><code class="lang-c">a := <span class="hljs-number">1</span>;
<span class="hljs-keyword">if</span> (a == <span class="hljs-number">1</span>) {
    a := <span class="hljs-number">2</span>
} <span class="hljs-keyword">else</span> {
    a := <span class="hljs-number">3</span>
}</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718603344576/90e90358-12ae-4ccf-9efe-aa4991c31fb2.png" alt="IMP AST" class="image--center mx-auto" /></p><p>We can see the AST is just a tree with different types of nodes. For instance:</p><ul><li><p><code>Seq</code> represents a sequence of two <em>statements</em>. 
Thus, it has two child nodes.</p></li><li><p><code>Set</code> represents a variable assignment and holds the variable name. It has one child node that represents the value of the variable.</p></li><li><p><code>Lit</code> represents an integer literal. It doesn't have any child nodes, so we call it a <em>leaf</em> of the tree.</p></li><li><p><code>If</code> represents an <code>if-then-else</code> statement, and has three child nodes: the condition, the statement in the <code>if</code> branch, and the statement in the <code>else</code> branch.</p></li></ul><h1 id="heading-transforming-the-imp-ast-into-a-logical-formula">Transforming the IMP AST into a logical formula</h1><p>Now that we have an IMP AST, we have to convert it to a logical formula, which can also be represented as an AST. We'll call the latter the <em>Z3 AST</em> since we'll use a tool called <a target="_blank" href="https://github.com/Z3Prover/z3">Z3</a> to <em>solve</em> the formulas (we'll see exactly what <em>solving</em> the formulas means in the next sections).</p><div data-node-type="callout"><div data-node-type="callout-emoji">💡</div><div data-node-type="callout-text">Z3 is an open-source solver developed by Microsoft, and it's one of the most widely used SMT solvers. However, there are <a target="_blank" href="https://smt-lib.org/solvers.shtml">other solvers</a> that could be used.</div></div><p>Let's see how the IMP AST we saw before translates to a Z3 AST:</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719181170857/31d1a2d8-1077-4ad0-bedc-1cd568165011.png" alt="Z3 AST" class="image--center mx-auto" /></p><p>Although this doesn't look exactly like an AST, its structure is fairly similar to that of the IMP AST. 
The main differences are:</p><ul><li><p>We use node types provided by the <a target="_blank" href="https://z3prover.github.io/api/html/group__capi.html">Z3 API</a>.</p><ul><li><p>The <code>bv</code> prefix stands for <em>bit-vectors</em>, which are used to represent numbers as an array of bits. In this case, we are using arrays of <code>32</code> bits for variables and literals.</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">There are different types of SMT formulas, which are called <a target="_blank" href="https://en.wikipedia.org/wiki/Satisfiability_modulo_theories#Decidable_theories"><em>theories</em></a>. In this case, we are using the theory of fixed-width bit-vectors.</div> </div></li></ul></li><li><p>Each <em>statement</em> converts to a separate tree; that's why we see one tree for the initial variable assignment and one "tree" for the <code>if-then-else</code> (ite).</p><ul><li>All trees, however, are part of the same Z3 computation, which represents the logical formula that will be solved at the end.</li></ul></li><li><p>Variable assignments are represented with a combination of <code>Z3.eq(var, value)</code> and <code>Z3.assert(condition)</code>.</p><ul><li><p>For example, the assignment <code>a := 1</code> translates to <code>Z3.assert(Z3.eq("a", 1))</code>.</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text"><a target="_blank" href="https://z3prover.github.io/api/html/group__capi.html#gaa4ab09a1b7e3ee6e578cd33e67cbf894"><code>Z3.assert</code></a> adds a <strong><em>constraint</em></strong> to the Z3 computation, which links the variable with its corresponding value.</div> </div></li></ul></li><li><p>Each time a variable is assigned to, we create a fresh variable with the same name, and Z3 assigns it an incremental <code>id</code> that differentiates it from the variable before the assignment.</p> <div 
data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">This is known as the <a target="_blank" href="https://en.wikipedia.org/wiki/Static_single-assignment_form">Static Single-Assignment form</a> (SSA).</div> </div></li><li><p>The variable <code>a!3</code> represents the value of <code>a</code> after the <code>if-then-else</code>, and encodes all possible values of <code>a</code> considering both execution paths (the one where the <code>if</code> branch is executed, and the one where the <code>else</code> branch is executed instead).</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">In the example program we are considering, we already know what branch will be executed since there are no <em>free</em> variables (i.e. all variables have a fixed initial value). However, we'll later see an example in which the branch to be executed is unknown until runtime, so the solver has to consider all possible outcomes to determine whether a property holds true for every initial state of the program.</div> </div></li></ul><h1 id="heading-adding-constraints-to-the-logical-formula">Adding constraints to the logical formula</h1><p>Once we have the Z3 AST (i.e. the logical formula that encodes the IMP program), we can add additional constraints to it. 
These constraints will represent the properties or invariants that we want to formally prove about our program.</p><pre><code class="lang-c">a := <span class="hljs-number">1</span>; <span class="hljs-comment">// a!0</span>
<span class="hljs-keyword">if</span> (a == <span class="hljs-number">1</span>) {
    a := <span class="hljs-number">2</span> <span class="hljs-comment">// a!1</span>
} <span class="hljs-keyword">else</span> {
    a := <span class="hljs-number">3</span> <span class="hljs-comment">// a!2</span>
}
<span class="hljs-comment">// a!3</span></code></pre><p>In the example program we are considering, we could add a constraint on the final value of <code>a</code>. For this, we would have to constrain the <code>a!3</code> variable, as we saw in the Z3 AST before. For instance, we could add the constraint <code>a!3 == 2</code>, which should hold true given the logic of the program.</p><p>Now we are finally ready to provide the Z3 solver with the Z3 AST plus the final constraints we added. The solver's job will be to find <strong>one</strong> possible assignment to all the variables in the program (namely <code>a!0</code>, <code>a!1</code>, <code>a!2</code>, <code>a!3</code>) that satisfies all the constraints (i.e. 
the ones in the Z3 AST, and the ones we added at the end).</p><ul><li><p>If the solver finds an assignment for all the variables that satisfies all constraints, it will output those assignments.</p></li><li><p>If it can't find an assignment that satisfies all constraints, it will tell us that the logical formula is <em>unsatisfiable</em>.</p></li></ul><h1 id="heading-running-the-solver-to-prove-properties">Running the solver to prove properties</h1><p>Let's go back to the first example program we saw to put everything together:</p><pre><code class="lang-c">balance := <span class="hljs-number">0</span>;
bonus := <span class="hljs-number">10</span>;
<span class="hljs-keyword">if</span> (<span class="hljs-number">1000</span> <= deposit) {
    balance := balance + deposit * bonus
} <span class="hljs-keyword">else</span> {
    balance := balance + deposit
}</code></pre><p>There are three variables in this program: <code>balance</code>, <code>bonus</code> and <code>deposit</code>. Note that <code>deposit</code> doesn't have an initial value (i.e. it's a free variable), so we'll let Z3 explore all possible initial values to prove some properties about this program.</p><h2 id="heading-generating-counterexamples">Generating counterexamples</h2><p>First, we'll want to prove that for every non-negative <code>deposit</code> amount, after the program executes, the <code>balance</code> is always non-negative as well. For this, we need to impose two constraints: one for the initial value of <code>deposit</code>, and one for the final value of <code>balance</code>. 
We can do so by using <code>Z3.assert</code> as we saw before:</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719181136738/4f953c8f-9eb5-4217-9f53-f14e55cc6520.png" alt="Representing properties as constraints to the Z3 computation" class="image--center mx-auto" /></p><ul><li><p>The box on the right is the Z3 AST representation of the program along with its corresponding constraints, which is generated with the process we discussed before.</p></li><li><p>We add two additional constraints (the two boxes on the left) that encode the property we want to prove about the program:</p><ol><li><p>We constrain the initial value of <code>deposit</code> (i.e. <code>deposit!0</code>) to be greater than or equal to 0, using <code>Z3.bvSge</code> (signed <code>>=</code> for bit-vectors).</p></li><li><p>We constrain the final value of <code>balance</code> (i.e. <code>balance!k</code>, where <code>k</code> is just the internal id Z3 will assign to that final variable) to be <strong>less than or equal to</strong> <code>-1</code>, using <code>Z3.bvSle</code> (signed <code><=</code> for bit-vectors).</p> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">Note that we want to prove that <code>balance >= 0</code>, but we are actually adding the constraint <code>balance <= -1</code>, which is its negation. This is because if the solver finds an initial value for <code>deposit</code> that satisfies <code>balance <= -1</code> at the end of the execution, we want it to output it as a <em>counterexample</em>. 
If the solver can't find such an initial value, it will output <code>Unsat</code> and we'll know there's no initial value for <code>deposit</code> that makes the <code>balance</code> negative, which is what we want to prove.</div> </div></li></ol></li></ul><p>Finally, if we run the Z3 solver on the logical formula we just built, it outputs the following:</p><pre><code class="lang-plaintext">deposit!0 -> #x7fffffff
deposit!8 -> #x7fffffff
bonus!7 -> #x0000000a
balance!6 -> #xfffffff6
balance!5 -> #x7fffffff
balance!4 -> #xfffffff6
bonus!3 -> #x0000000a
balance!2 -> #x00000000</code></pre><p>It gave us a counterexample! This means the property <code>balance >= 0</code> does not hold for every possible initial <code>deposit</code>. Looking at the value the Z3 solver assigned to the <code>deposit!0</code> variable (i.e. the initial value for <code>deposit</code>), we can see it's the maximum 32-bit signed integer (i.e. <code>2147483647</code>). Multiplying that value by the <code>bonus</code> of <code>10</code> overflows the 32-bit <code>balance</code> variable on the following line:</p><pre><code class="lang-c">balance := balance + deposit * bonus</code></pre><p>The result wraps around to <code>-10</code>, which makes the <code>balance</code> negative and satisfies the constraint we imposed.</p><p>Let's also understand what the rest of the variable assignments actually are:</p><ul><li><p><code>balance!2</code> and <code>bonus!3</code> are the variables that represent the initial <code>balance</code> and <code>bonus</code> at the start of the program:</p><pre><code class="lang-c">balance := <span class="hljs-number">0</span>;
bonus := <span class="hljs-number">10</span>;</code></pre><pre><code class="lang-plaintext">bonus!3 -> #x0000000a
balance!2 -> #x00000000</code></pre></li><li><p><code>balance!4</code> and <code>balance!5</code> represent the <code>balance</code> within the <code>if</code> and <code>else</code> branches, respectively.</p><pre><code class="lang-c"><span class="hljs-keyword">if</span> (<span 
class="hljs-number">1000</span> <= deposit) {
    balance := balance + deposit * bonus
} <span class="hljs-keyword">else</span> {
    balance := balance + deposit
}</code></pre><pre><code class="lang-plaintext">balance!5 -> #x7fffffff
balance!4 -> #xfffffff6</code></pre> <div data-node-type="callout"> <div data-node-type="callout-emoji">💡</div> <div data-node-type="callout-text">Note that <code>balance!4</code> is where the overflow happens, since <code>0xfffffff6 == -10</code> in signed <a target="_blank" href="https://en.wikipedia.org/wiki/Two%27s_complement">two's complement</a>.</div> </div></li><li><p><code>deposit!8</code>, <code>bonus!7</code> and <code>balance!6</code> represent the final state of the program, after the <code>if-then-else</code> statement.</p><pre><code class="lang-plaintext">deposit!8 -> #x7fffffff
bonus!7 -> #x0000000a
balance!6 -> #xfffffff6</code></pre><p>Both <code>deposit</code> and <code>bonus</code> keep their original values, since the program never reassigns them after they are first set, and <code>balance</code> is assigned the value of <code>balance!4</code>, since the <code>if</code> branch is taken in this particular case.</p></li></ul><div data-node-type="callout"><div data-node-type="callout-emoji">💡</div><div data-node-type="callout-text">Remember that the Z3 solver's job is to find an assignment for all the intermediate variables such that all constraints are satisfied. 
From Z3's perspective, <code>bonus!3</code> and <code>bonus!7</code> are completely different variables; what makes them end up with the same value is the set of constraints created during the Z3 AST construction.</div></div><h2 id="heading-proving-a-property-for-all-possible-inputs">Proving a property for all possible inputs</h2><p>Let's conclude with an example where the Z3 solver cannot find a variable assignment that satisfies all constraints in the Z3 computation.</p><pre><code class="lang-c">balance := <span class="hljs-number">0</span>;
bonus := <span class="hljs-number">10</span>;
<span class="hljs-keyword">if</span> (<span class="hljs-number">1000</span> <= deposit) {
    balance := balance + deposit * bonus
} <span class="hljs-keyword">else</span> {
    balance := balance + deposit
}</code></pre><p>We now want to prove that, no matter what the <code>deposit</code> amount is, it's impossible for the <code>balance</code> to be <code>1000</code> at the end of the program's execution.</p><p>Using the same technique as before, we now impose the constraint <code>balance == 1000</code>.</p><p>Running the Z3 solver, we'll now get:</p><pre><code class="lang-plaintext">Unsat</code></pre><p>This means that it's impossible to create an assignment for all the variables in the Z3 computation such that all constraints are satisfied at the same time.</p><p>Therefore, the property <code>balance != 1000</code> holds true for every possible <code>deposit</code> amount.</p><h1 id="heading-conclusion">Conclusion</h1><p>In this article, we saw how to turn an imperative program into a logical formula (encoded using Z3's API), and then prove properties about that program by encoding these properties as constraints to the Z3 solver.</p><p>Let me know if you'd be interested in a follow-up article explaining the actual code that implements all of this from scratch. 
In the meantime, the code is available <a target="_blank" href="https://github.com/EperezOk/formal-verification-imp">on GitHub</a> for you to explore on your own. Although it's implemented in Haskell (just like <a target="_blank" href="https://github.com/ethereum/hevm">HEVM</a>), it shouldn't be hard to follow at a high level if you understood the process we went through in this article.</p><p>Until the next research project!</p><h1 id="heading-references">References</h1><ul><li><p><a target="_blank" href="https://github.com/EperezOk/formal-verification-imp/tree/main">GitHub Repository</a></p></li><li><p><a target="_blank" href="https://youtu.be/ruNFcH-KibY?si=e3AHsP5a6dzHoroa">[Talk] Analyzing Programs with Z3, Tikhon Jelvis</a></p></li><li><p><a target="_blank" href="https://www.cs.colostate.edu/~cs440/spring19/slides/z3-tutorial.pdf">Z3 - a Tutorial, Leonardo de Moura and Nikolaj Bjørner</a></p></li></ul>]]>