Functional Verification of High Performance Adders in Coq

Addition arithmetic design plays a crucial role in high performance digital systems. This paper proposes a systematic method to formalize and verify adders in the Coq proof assistant. The proposed approach succeeds in formalizing the gate-level implementations and verifying the functional correctness of the most important adders of interest in industry, in a faithful, scalable, and modularized way. The methodology can be extended to other adder architectures as well.


Introduction
Demonstrating the functional correctness of an arithmetic implementation is a challenging problem that has been studied for several decades. Testing and simulation, the traditional methods, have earned a good reputation and are employed extensively in industry. When dealing with large scale designs, these methods may find counterexamples but cannot establish that a design is correct, because exhaustive coverage is impractical.
As an alternative, formal methods have been increasingly adopted to validate arithmetic implementations. A main branch of formal methods is model checking, which is known for its automation and has succeeded in numerous industrial applications. However, the inherent state explosion problem prevents it from scaling to large designs.
Another branch of verification is theorem proving, which, unlike model checking, testing, and simulation, is not restricted by the scale of the design. The main obstacle to the widespread use of theorem proving is that it requires a strong background in logic and heavy user interaction. Nevertheless, there are quite a few successful applications using different theorem provers. With Boyer-Moore, a microprocessor is verified in [1], and an N-bit comparator as well as mean-value circuits are verified in [2]. With HOL, a ripple carry adder and a sequential device are verified in [3], and an ATM switch fabric is verified in [4]. With Coq, a sequential multiplier is verified in [5], and an asynchronous transfer mode switch fabric is verified in [6].
The main contribution of this work is a holistic methodology to formalize and verify adders in Coq [7]. Adders are chosen because they are the most fundamental arithmetic units and are widely employed in advanced digital systems, such as IBM POWER6, whose correctness depends significantly on the correctness of their addition subcomponents. This methodology provides a uniform way to formalize and verify various implementations of arithmetic addition, and it is applied in this work to formalize and verify primary and high speed adders of interest in industry, including Carry Look-ahead Adder (CLA), Ling Adder (LA), and Parallel Prefix Adder (PPA).
Benefiting from the techniques of Coq, the methodology offers the following features.
(i) Scalability: the formalization of an adder is parameterized by a natural number (named length), and the correctness proof applies to any length.
(ii) Modularization: the verified adders are encapsulated as instances of an abstract module, which provides a uniform way to reuse them in advanced arithmetic units. The formalization and verification of an advanced arithmetic unit can be built up from verified units while ignoring their detailed implementations.
(iii) Fidelity: the adders are formalized by (recursive) functions, which correspond clearly to the gate-level implementations of circuits. The addends and sum of an adder are formalized as vectors, which are a faithful model of arrays and additionally provide type checking that prevents potential misuse of inputs.
The rest of the paper is organized as follows. Related work is introduced in Section 2. To our knowledge, we verify not only most adders appearing in the literature, but also some for the first time by theorem proving. Section 3 explains our methodology in detail using the example of the ripple carry adder. Preliminaries are also introduced as needed. Some definitions and most proofs are not presented in this paper, but they are available on the author's webpage (http://superwalter.github.io/dev/veriadder.zip). Sections 4 and 5 are devoted to LA and PPA, respectively.

Related Work
Compared to their extensive applications, verifications of primary adders by theorem proving are not readily available. In particular, the formalization and verification of the Ling adder cannot be found in the literature. Reference [8] proves the correctness of RCA by formalizing adders with dependent types in Coq. Reference [9] proves the correctness of RCA in higher-order logic with a reusable library for formalizing circuits. Reference [2] verifies RCA written in VHDL, as well as other circuits, in higher-order logic. Reference [10] develops semiformal correctness proofs of CLA and PPA. Reference [11] shows a pencil-and-paper proof of the general prefix adders, as well as the proof of the related RCA. Furthermore, [12] formalizes and verifies these adders in Coq. By rewriting and induction, [13] provides the verification of PPA using powerlists. An algebraic formalization of PPA and its correctness proof are presented in [14]. Besides being applied to formalize and verify most primary adders, our methodology also provides features that appear only partially in other works and, to our knowledge, have never been integrated in any previous work.

A Holistic Methodology
Various kinds of adders are designed to perform well under different circumstances, while all implementing the same addition functionality. A holistic methodology is proposed in this work to capture all of these adders and to provide the desired features.

3.1. A Unified Proof Structure.
Basically, the methodology answers four questions: (i) how to formalize the related data types; (ii) which method is used to formalize an adder; (iii) what should be proved; (iv) how to organize formalizations and verifications for different adders.
Lines 1 and 2 answer the first two questions. In line 1, n is a parameter (named length) indicating the inherent nature of an adder: how many bits it can process. The input carry-in and output carry-out are formalized as Booleans (bit). The input addends and the returned sum are formalized as vectors of Booleans (data n), which are dependent types depending on the length n. hyp n is another dependent type standing for a tuple of a bit and an n-bit vector, which is used in line 2 for combining the carry-out and the sum. Thus, an adder is formalized as a function taking two addends and a carry-in as inputs and returning a tuple of carry-out and sum. This function is normally defined recursively, as shown later.
Lines 3, 4, and 5 answer the third question. The correctness of an adder is ensured by proving that the natural number denotation of its output equals the sum of the natural number denotations of its inputs. In line 5, |·| applied to a bit gives its natural number denotation, while |[V]| and |(·)| give the natural number denotations of a vector V and of a result tuple, respectively. Big endian is chosen to implement these two functions.
Lines 6-10 answer the last question. A general adder is formalized as an abstract module, in which the specification is declared and the correctness proof is required. A verified adder, such as a Ripple Carry Adder (RCA), should be an instance of this module.
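The structure described above can be sketched in Coq roughly as follows. This is a simplified illustration and not the paper's listing: it assumes plain `list bool` in place of the dependent vector type data n, and the names `bit_val`, `bits_val`, `res_val`, and `GenAdderSketch` are hypothetical.

```coq
Require Import List.
Import ListNotations.

(* Natural number denotation of a single bit. *)
Definition bit_val (b : bool) : nat := if b then 1 else 0.

(* Big-endian denotation of a bit list: the head is the most significant bit. *)
Fixpoint bits_val (v : list bool) : nat :=
  match v with
  | [] => 0
  | b :: v' => bit_val b * Nat.pow 2 (length v') + bits_val v'
  end.

(* Denotation of a (carry-out, sum) pair: the carry-out is the top bit. *)
Definition res_val (r : bool * list bool) : nat :=
  bits_val (fst r :: snd r).

(* Hypothetical counterpart of the abstract adder module: any instance
   must supply an adder function together with its correctness proof. *)
Module Type GenAdderSketch.
  Parameter adder : list bool -> list bool -> bool -> bool * list bool.
  Axiom adder_correct : forall (X Y : list bool) (c : bool),
    length X = length Y ->
    res_val (adder X Y c) = bits_val X + bits_val Y + bit_val c.
End GenAdderSketch.

(* Sanity check of the big-endian denotation: 101 denotes 5. *)
Example bits_val_101 : bits_val [true; false; true] = 5.
Proof. reflexivity. Qed.
```

Because the correctness statement is a field of the module type, a client that receives any instance of `GenAdderSketch` can rely on the addition property without inspecting how the adder is implemented.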

An Example Explaining the Methodology.
Carry Look-ahead Adder (CLA) improves on RCA by computing all the carries in advance in order to reduce the carry propagation delay. In the formalization, this is represented by extending the general module with abstract functions (P, G, and a third function for the carries), which are supposed to compute all the propagated carries, generated carries, and carries, respectively, according to the inputs.
(1) Module Type LookAheadAdder <: GenAdder.

The <: symbol in line 1 states that this module should be an instance of the general verified adder. RCA is formalized according to the carry equations, numbered (1). The carry into each bit $i+1$ in CLA is computed by iteratively unfolding $c_i$ in (1) until $c_0$, which is the overall carry-in bit. This process, as well as the definitions of P and G, is formalized as follows:

(1) Definition P n (X Y: data n):= X ⊕ Y.
(2) Definition G n (X Y: data n):= X ∧ Y.

The ⊕ and ∧ in lines 1 and 2, and the ∨ used later, are extensions of the Boolean operations ⊕, ∧, and ∨ to vectors, applying the operation to the elements at the same position of the two vectors. The + symbols in lines 5 and 6 mark the start of the two branches of the recursion, for length zero and for a successor length. The ⊳ operator in line 5 returns the leftmost element of a vector. Correspondingly, the ⊲ operator in line 6 returns the rightmost n elements of an (n+1)-bit vector. [b] is a vector with a single bit b. The projections with indices 1 and 2 return the first and second components of a tuple, respectively. The ⋈ operator in line 9 joins a bit and an n-bit vector to form an (n+1)-bit vector.
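For reference, the carry recurrence that look-ahead adders unfold can be written in its standard textbook form. The bit names $x_i$ and $y_i$ are labels assumed here for the addend bits, and the equations are unnumbered because they are not claimed to reproduce the paper's own numbered equations.

```latex
\begin{align*}
p_i &= x_i \oplus y_i, \qquad g_i = x_i \wedge y_i,\\
c_{i+1} &= g_i \vee (p_i \wedge c_i),\\
c_2 &= g_1 \vee (p_1 \wedge c_1)
     = g_1 \vee (p_1 \wedge g_0) \vee (p_1 \wedge p_0 \wedge c_0).
\end{align*}
```

The last line is the unfolding for $i = 1$: after substituting $c_1 = g_0 \vee (p_0 \wedge c_0)$, the carry $c_2$ depends only on the per-bit signals and the overall carry-in $c_0$.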
The adder is defined as follows, and its correctness is proved by induction on the length, reusing the correctness result of the full adder:

(1) Definition adder: forall n, mbadder n.
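The recursive definition style and the reuse of the full adder can be illustrated with a small list-based ripple carry adder. This is a sketch under simplifying assumptions, not the paper's definition: it uses `list bool` (head as most significant bit) instead of dependent vectors, and `bit_val`, `full_adder`, and `rca` are hypothetical names.

```coq
Require Import Bool List.
Import ListNotations.
Open Scope bool_scope.

Definition bit_val (b : bool) : nat := if b then 1 else 0.

(* One-bit full adder returning (carry-out, sum). *)
Definition full_adder (x y c : bool) : bool * bool :=
  ((x && y) || (xorb x y && c), xorb (xorb x y) c).

(* Correctness of the full adder by exhaustive case analysis. *)
Lemma full_adder_correct : forall x y c,
  2 * bit_val (fst (full_adder x y c)) + bit_val (snd (full_adder x y c))
  = bit_val x + bit_val y + bit_val c.
Proof. intros x y c; destruct x, y, c; reflexivity. Qed.

(* Ripple carry adder on big-endian bit lists (head = most significant bit).
   The carry ripples from the tail (least significant bit) towards the head. *)
Fixpoint rca (X Y : list bool) (c : bool) : bool * list bool :=
  match X, Y with
  | x :: X', y :: Y' =>
      let (c', rest) := rca X' Y' c in
      let (cout, s) := full_adder x y c' in
      (cout, s :: rest)
  | _, _ => (c, [])
  end.

(* 3 + 1 = 4: carry-out 1, sum bits 00. *)
Example rca_3_plus_1 :
  rca [true; true] [false; true] false = (true, [false; false]).
Proof. reflexivity. Qed.
```

A general correctness proof would proceed by induction on the list, using `full_adder_correct` in the inductive step; it is omitted here to keep the sketch short.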

Features Provided by the Methodology.
There are several benefits to the use of this methodology for the verification of adders.

Scalability. The formalization and verification of an adder scale to any data width, because the length parameter can be instantiated with an arbitrary natural number. A 4-bit CLA can be obtained as follows:

(1) Definition CLA4:= CLA 3.

Notice that a 4-bit CLA is CLA 3, because we require that the addends of the adders have at least one bit. The correctness proof of a CLA with a specified length follows directly from the proof for CLA with arbitrary length.

Modularization. Some high speed adders divide the input addends into different groups. Each group is computed independently by a Carry Select Adder (CSA), and the results of the groups are concatenated in order. Since a CSA needs the input carry-in only at its final step, such designs have a shorter carry propagation delay and thus higher performance. We formalize an abstract architecture for this kind of design, which illustrates the modularization of our method and may also help to verify complex adders in the future.
CSA takes an abstract verified adder as a parameter and is also an instance of the general verified adder.

(3) intros X Y c.

Lines 2 to 10 define CSA. Two adders compute the sum and the carry-out for the two possible values of the carry-in in lines 4 and 5, respectively. The multiplexer chooses the real sum and carry-out according to the actual carry-in in lines 6 and 7, which is the only point at which the input carry is required. The function used in line 6 applies a given function to each element of a vector. The correctness of CSA holds because the addition units are correct; thus, CSA is an instance of the general adder. The parameterized module can be instantiated by any verified adder. Line 13 defines a CSA whose addition units are instantiated with CLA.
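The idea of a module that is parameterized by an adder and is itself an adder can be sketched with a Coq functor. This is an illustration under simplifying assumptions: the interface `ADDER` omits the correctness field, data are `list bool` rather than dependent vectors, and all names (`ADDER`, `CarrySelect`, `select_agrees`, `rca`, `Rca`, `CsaRca`) are hypothetical.

```coq
Require Import Bool List.
Import ListNotations.
Open Scope bool_scope.

(* Hypothetical interface of an adder unit (correctness field omitted). *)
Module Type ADDER.
  Parameter adder : list bool -> list bool -> bool -> bool * list bool.
End ADDER.

(* Carry select: run the unit for both carry-in values, then multiplex. *)
Module CarrySelect (A : ADDER) <: ADDER.
  Definition adder (X Y : list bool) (c : bool) : bool * list bool :=
    let r0 := A.adder X Y false in
    let r1 := A.adder X Y true in
    if c then r1 else r0.

  (* The selected result always agrees with the underlying unit. *)
  Lemma select_agrees : forall X Y c, adder X Y c = A.adder X Y c.
  Proof. intros X Y c; destruct c; reflexivity. Qed.
End CarrySelect.

(* A ripple carry unit on big-endian bit lists, used only as an instance. *)
Fixpoint rca (X Y : list bool) (c : bool) : bool * list bool :=
  match X, Y with
  | x :: X', y :: Y' =>
      let (c', rest) := rca X' Y' c in
      ((x && y) || (xorb x y && c'), xorb (xorb x y) c' :: rest)
  | _, _ => (c, [])
  end.

Module Rca <: ADDER.
  Definition adder := rca.
End Rca.

(* Instantiating the functor yields a carry select adder built from rca. *)
Module CsaRca := CarrySelect Rca.

Example csa_3_plus_1 :
  CsaRca.adder [true; true] [false; true] false = (true, [false; false]).
Proof. reflexivity. Qed.
```

The lemma `select_agrees` is proved once inside the functor, for an arbitrary unit, so every instantiation inherits it without a new proof.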
The formalization and verification of this adder are quite complex due to the problems with dependent types described in [15,16]; therefore, the unimportant details are omitted. The parameter in line 2 is a partition of the addends. This partition should be valid, which means that its elements are strictly ordered and do not exceed the total data width. Lines 3 to 12 define the adder recursively by combining one adder with another, which is itself the combination of the remaining groups of adders obtained by recursion. One function in line 9 performs the combining operation. Another function in line 9 converts an adder of one length into an adder of another length, taking a proof that the two lengths are equal as an argument. The initial values of this recursive function are specified in line 13. The correctness is proved by induction on the length of the partition, using the correctness result for combining correct adders.
The parameterized module can be instantiated by any verified adder. If it is instantiated by CSA, it yields a verification of many popular high speed adders.

Fidelity.
There are normally two ways to formalize the addends and sum of an adder in Coq: by a dependent vector type, as in [6,8] and this work, or by a nondependent list type, as in [12]. Both [6] and [8] explain why a dependent type is more suitable for the verification of adders. Generally speaking, a nondependent list is more suitable for formalizing a linked list, whose length is obtained by computation, while a dependent vector is more suitable for formalizing an array, whose length is an intrinsic property. The functionality of adders is formalized by interactively defined (recursive) functions, which correspond clearly to the gate-level description of circuits.
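The difference between the two representations can be seen in a small experiment. The sketch below assumes the standard library type `Vector.t` as a stand-in for the paper's data n; the names `P`, `P_list`, `a2`, `b2`, and `b3` are hypothetical, and `P` mirrors only the shape of the bitwise XOR definition.

```coq
Require Import Bool List.
Import ListNotations.
Require Vector.

(* Bitwise XOR on vectors: both arguments must have the same length n. *)
Definition P {n : nat} (X Y : Vector.t bool n) : Vector.t bool n :=
  @Vector.map2 bool bool bool xorb n X Y.

Definition a2 : Vector.t bool 2 := Vector.of_list [true; false].
Definition b2 : Vector.t bool 2 := Vector.of_list [false; true].
Definition b3 : Vector.t bool 3 := Vector.of_list [false; true; true].

Example P_a2_b2 : Vector.to_list (P a2 b2) = [true; true].
Proof. reflexivity. Qed.

(* Mixing lengths is rejected by the type checker, so this command fails. *)
Fail Definition bad := P a2 b3.

(* The list version accepts mismatched lengths and silently truncates. *)
Definition P_list (X Y : list bool) : list bool :=
  map (fun xy => xorb (fst xy) (snd xy)) (combine X Y).

Example P_list_truncates :
  P_list [true; false] [false; true; true] = [true; true].
Proof. reflexivity. Qed.
```

With vectors, a width mismatch is a type error at definition time; with lists, the same mistake produces a shorter result and would have to be excluded by explicit length hypotheses in every lemma.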

Ling Adder
The Ling Adder (LA) was proposed in [17]. Instead of computing all the carries in advance as CLA does, LA computes all the pseudo carries, whose propagation has fewer fan-ins and fan-outs. With a proper grouping of the input addends, LA needs fewer levels of gates and consequently has better performance.
Similar to the propagated and generated carries, LA has a new complementing signal and a previous stage propagate, which are defined in (4) and (5), respectively. The pseudo carries are defined recursively. To our knowledge, [17] and other materials about LA define the pseudo carries without considering the case $i = 0$, as this paper does in (6b). Without this case, the two default values at index $-1$ are both false, which is equivalent to our definition under the assumption that the carry-in is always false. More intuitively, that algorithm does not consider the carry-in to the least significant bit, which restricts it to some special applications, such as the addition of two registers. We generalize it to provide the general functionality of an adder. The sum is defined similarly so as to take the carry-in to the least significant bit into account. The abstract module of Ling extends the general one by adding the signatures of three further functions.
(1) Module Type LingAdder <: GenAdder.

To compute the $i$th pseudo carry of H, the $i$th bit of G and the $(i-1)$th bit of T are needed. Therefore, the two parameters of H stand for the vector G and a left shift of T. H, assuming the correctness of its parameters, is defined recursively. Line 3 deals with the base case. Lines 4 and 5 deal with the recursive case, in which the remaining bits are obtained by recursion and the leftmost bit of the shifted vector (extracted with ⊳) supplies the previous stage propagate.
LA is defined according to (7a) and (7b) using the definition of H.
Since the $i$th bit of the sum depends on the $(i-1)$th bits of H and T, they are shifted in lines 5 and 7. The reason why the carry-in is shifted into H is explained above; the bit shifted into T is chosen so that the conjunction of the two shifted-in bits yields the required initial value. The carry-out of LA is (TXY ⊳) ∧ (Hc ⊳), which is equivalent to $c_{out}$ as shown in (8). The formalization of (8) is complicated, but the proof is straightforward by induction and case analysis. The correctness of LA follows by proving a lemma stating that the outputs of CLA and LA are the same for arbitrary identical inputs. This lemma is proved by induction using the result of (8).
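The Boolean identity behind pseudo carries can be checked mechanically at the bit level. The sketch below uses the standard textbook form, which may differ in indexing from the paper's equations: with generate g = x ∧ y and OR-propagate t = x ∨ y, the carry g ∨ (t ∧ c) factors as t ∧ (g ∨ c), and the second factor plays the role of the pseudo carry. The lemma names are hypothetical.

```coq
Require Import Bool.
Open Scope bool_scope.

(* The carry computed with XOR-propagate equals the one with OR-propagate. *)
Lemma carry_xor_or : forall x y c,
  (x && y) || (xorb x y && c) = (x && y) || ((x || y) && c).
Proof. intros x y c; destruct x, y, c; reflexivity. Qed.

(* Ling's factorization: with g = x && y and t = x || y,
   g || (t && c) = t && (g || c); the factor g || c is the pseudo carry. *)
Lemma ling_factor : forall x y c,
  (x && y) || ((x || y) && c) = (x || y) && ((x && y) || c).
Proof. intros x y c; destruct x, y, c; reflexivity. Qed.
```

Both lemmas are proved by enumerating the eight input combinations, which matches the remark that the proof is a matter of case analysis once the statement is formalized. The factorization is consistent with a carry-out expressed as the conjunction of a propagate bit and a pseudo carry bit.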
Reference [18] proposed an extension of Ling's adder, given by equation (11), which is expressed in terms of the group propagated and generated carries defined later in Section 5. Equation (11) is also proved in this work.

Parallel Prefix Adder
CLA improves on RCA by computing all the carries in advance, as shown in (4). However, a large fan-in and fan-out result if all the carries $c_i$ are computed in this way, especially when $n$ is large. Parallel Prefix Adder (PPA) avoids this by divide-and-conquer, which provides an efficient way to compute all the carries in parallel. The basic definitions are the group propagated and generated carries, given in (14) and (15). Due to the similarity between (14) and (15), only the formalization of (15) is shown. An auxiliary function, defined recursively on the difference of $i$ and $j$, has to be introduced to define it in Coq.
In line 1, the parameters P and G stand for the propagated and generated carry vectors. Another parameter is the difference of $i$ and $j$. The function nth i V returns the $i$th element of V, counting from the leftmost bit, which has index 0. pred n computes the predecessor of n.
To compute all the carries in parallel in advance, the carry $c_{i+1}$ should not depend on any $c_j$, where $i \ge j > 0$, but only on $c_0$, which is the overall carry-in. Therefore, the carries of PPA are computed according to a variation of (12), given as (16), and different PPAs employ different parallel prefix methods to compute the group carries $P_{i:0}$ and $G_{i:0}$, for all $n \ge i \ge 0$, for the sake of high performance. To capture various PPAs in a uniform framework, an abstract module, which describes this method abstractly as a function, is employed as follows:

(1) Module Type GroupCarries.

The type in line 3 is the dependent type of a tuple of vectors whose lengths are both n. Therefore, the parameter of the group-carry function stands for the vectors of propagated and generated carries, as shown in line 6. Line 4 states the assumption that the group-carry function is correct. The correctness is expressed as an extensional equality between this function and another function known to be correct. In line 7, the reference function is the correct function computing the group carries according to (14) and (15). Its correctness is established by first computing all the carries $c_i$ with this function and then proving that these $c_i$ are equal to the carries of CLA. The operation in line 2 is a composition that first iterates Equation (16) over all the $P_{i:0}$ and $G_{i:0}$, which are stored in the vectors given by the first and second projections of V, and then shifts in the overall carry-in $c_0$ to obtain all the carries. Since the computation of $G_{i:j}$ depends on the subgroups of the group propagated carries $P_{i:j}$, the fundamental carry operator "∘" as in [19] is used to compute the group propagated and generated carries simultaneously in the reference function, and it should be used in all implementations of the group-carry function:

$$(p, g) \circ (p', g') = (p \wedge p',\; g \vee (p \wedge g')). \tag{17}$$

The reference function can be taken as an instance of the group-carry function, and it is only one particular verified implementation. There are many other implementations of the group-carry function based on the lemmas in (18), which are proved by induction on the difference between $i$ and $j$, using (14) and (15). Equation (18) can be rewritten with the ∘ operator as a single equation. For all $j < k \le i$,

$$(P_{i:j}, G_{i:j}) = (P_{i:k}, G_{i:k}) \circ (P_{k-1:j}, G_{k-1:j}). \tag{19}$$
Equation (19) shows clearly that the carries of any group can be computed from its concatenated (or even overlapping) subgroups. A proper division of the bits of the input addends can therefore implement the group-carry function with high performance. PPA is such a family of adders, differing only in the computation of the group-carry function; thus, a general PPA can be formalized and parameterized by the module GroupCarries.
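The role of the carry operator in splitting a group into subgroups can be sketched in Coq on lists of per-bit (propagate, generate) pairs. This is an illustration only: the names `op`, `group`, `op_assoc`, and `group_app` are hypothetical, the list is big-endian, and `group` is a plain serial computation rather than any particular parallel prefix network.

```coq
Require Import Bool List.
Import ListNotations.
Open Scope bool_scope.

(* Fundamental carry operator on (propagate, generate) pairs;
   the left argument is the more significant group. *)
Definition op (hi lo : bool * bool) : bool * bool :=
  let (p, g) := hi in
  let (p', g') := lo in
  (p && p', g || (p && g')).

(* Associativity is what allows the prefix computation to be regrouped. *)
Lemma op_assoc : forall a b c, op (op a b) c = op a (op b c).
Proof.
  intros [p1 g1] [p2 g2] [p3 g3].
  destruct p1, g1, p2, g2, p3, g3; reflexivity.
Qed.

(* Group (P, G) of a big-endian list of per-bit pairs, computed serially. *)
Fixpoint group (l : list (bool * bool)) : bool * bool :=
  match l with
  | [] => (true, false)          (* neutral element of op *)
  | a :: l' => op a (group l')
  end.

(* A group can be computed from two adjacent subgroups. *)
Lemma group_app : forall l1 l2, group (l1 ++ l2) = op (group l1) (group l2).
Proof.
  induction l1 as [| a l1 IH]; intros l2.
  - change (group l2 = op (true, false) (group l2)).
    destruct (group l2) as [p g]; reflexivity.
  - change (op a (group (l1 ++ l2)) = op (op a (group l1)) (group l2)).
    rewrite IH. symmetry. apply op_assoc.
Qed.

(* Per-bit pairs for the addends 11 and 01: the group generates a carry. *)
Example group_example : group [(true, false); (false, true)] = (false, true).
Proof. reflexivity. Qed.
```

The lemma `group_app` is the list-level analogue of splitting a group at an arbitrary position; different prefix networks correspond to different ways of choosing the split points, and associativity guarantees that they all compute the same group carries.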
A Kogge-Stone adder is obtained by combining the general PPA module with a specific module implementing the Kogge-Stone method for computing all the group carries, which provides not only the computation method but also the correctness proof.

Conclusion and Future Work
In this work, we proposed a holistic methodology to formalize and verify primary adders (RCA, CLA, LA, and PPA) in the Coq theorem prover. They are formalized using dependent types, higher-order recursion, and the module system in order to provide fidelity, scalability, and modularization.
In particular, PPA is a family of adders sharing the same structure and differing only in the method of parallel prefix computation. We provide a novel way to describe the general PPA and show how to use this general module to verify a specific PPA, exemplified by the Kogge-Stone adder.
Other advanced arithmetic designs can be verified by reusing the formalizations and verifications of this work in a compositional way, as we illustrate with the example of carry select adders.
All the work has been carried out in Coq. The whole development contains around 2,000 lines of Coq scripts. This is only about one third of the size of [12], another work dedicated to verifying addition designs in Coq. This work thus uses fewer lines of script but verifies more addition designs than [12].
This work can be continued in two directions. Advanced arithmetic designs, such as those in IBM POWER6, can be verified incrementally on the basis of these verified adders. Since a constructive formalization corresponds clearly to gate-level descriptions of circuits, HDL code can be generated from the verified designs, which may provide an alternative way of designing correct arithmetic implementations.