FPGA Implementation of Reconfigurable Finite State Machine with Input Multiplexing Architecture Using Hungarian Method

The mathematical model for designing a complex digital system is a finite state machine (FSM). Applications such as digital signal processing (DSP) and built-in self-test (BIST) require specific operations to be performed only at particular instances. Hence, the optimal synthesis of such systems requires a reconfigurable FSM. The objective of this paper is to create a framework for a reconfigurable FSM with input multiplexing and state-based input selection (Reconfigurable FSMIM-S) architecture. The Reconfigurable FSMIM-S architecture is constructed by combining the conventional FSMIM-S architecture and an optimized multiplexer bank (which defines the mode of operation). For this, the descriptions of a set of FSMs are taken for a particular application. The problem of obtaining the required optimized multiplexer bank is transformed into a weighted bipartite graph matching problem where the objective is to iteratively match the descriptions of the FSMs in the set with minimal cost. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. The experimental results from MCNC FSM benchmarks demonstrate a speed improvement of 30.43% as compared with the variation-based reconfigurable multiplexer bank (VRMUX) and of 9.14% in comparison with the combination-based reconfigurable multiplexer bank (CRMUX) during field programmable gate array (FPGA) implementation.


Introduction
Designing a complex digital system requires an efficient method that includes modeling a control unit (i.e., a controller). The operational speed of such systems depends on the speed of their controllers. The mathematical model for designing a controller for applications such as microprocessor control units, circuit testing, and digital signal processing (DSP) is a finite state machine (FSM). Consequently, designing such systems requires an efficient synthesis technique for high-speed FSMs [1,2]. Applications such as DSP [3,4] and built-in self-test (BIST) [5] require specific operations to be performed only at particular instances. Different control units are required to complete each operation. Hence, to perform these operations optimally, a single control unit is defined which can configure itself depending upon the applied mode of operation; it is also known as a reconfigurable FSM [1]. The mode of operation for such an FSM is controlled by a counter, timer, or any user-defined control signals based on the application requirements. An example of a reconfigurable FSM is given in [1] as a test chip for a wireless sensor network. In this example, the Transition-Based Reconfigurable FSM (TR-FSM) [1] is configured into one of the MCNC FSM benchmark circuits (i.e., dk15, s386, or cse) at different instances. Moreover, any application which requires sequential processing can be broken down into a series of instances (i.e., multistage reconfigurable signal processing) where at each instance only a particular operation is performed [3]. Hence, for such applications, efficient architectures can be created using a reconfigurable FSM. These emerging trends in research necessitate a framework for optimal synthesis of high-speed reconfigurable FSMs.
International Journal of Reconfigurable Computing
Conventional LUT-based architectures have been used for FSM implementation on an FPGA platform [6]. Similarly, ROM-based architectures are investigated for FSM implementations. Due to their area and speed advantages, they act as an excellent alternative to their conventional LUT-based counterparts [7]. In such implementations, a considerable reduction in power consumption is obtained by disabling embedded memory blocks (EMBs) during the idle states [8,9]. The fundamental framework for FSM with input multiplexing (FSMIM) is established in [7], whose prime objective is to shorten the depth of the ROM memory. In that approach, an input selector (which consists of a multiplexer bank) is used. The basic idea is to select only a specific set of inputs for a particular state. FSMIM with state-based input selection (FSMIM-S) is proposed in [10], which further reduces the ROM memory size.
Another approach for implementation of reconfigurable FSMs is RAM-based architectures. In the literature, there are two underlying RAM-based architectures, that is, the variation-based reconfigurable multiplexer bank (VRMUX) and the combination-based reconfigurable multiplexer bank (CRMUX) [11]. The RAM-based architectures do not serve as a novel tool for implementation of complicated FSM structures such as parallel hierarchical finite state machines (PHFSM) [12] or reversible FSMs [13]. Due to the significant advantages of the FSMIM-S architecture over other architectures, it is used to create a framework for the high-speed Reconfigurable FSMIM-S architecture.
The Reconfigurable FSMIM-S architecture is constructed by combining the conventional FSMIM-S architecture [10] and an optimized multiplexer bank (which defines the mode of operation). For this, the descriptions of a set of FSMs are taken for a particular application. Hence, the problem is to obtain the optimized multiplexer bank for the given set of FSMs. It can be solved by mapping all the FSMs into one large FSM (called base ckt) in that set. The objective of this process is to perform optimal matching between base ckt and the other FSMs in the set so that a minimum number of bits are changed by changing the mode of operation. This situation (i.e., performing one-to-one mapping) transforms the problem into a weighted bipartite graph matching problem where the objective is to match the descriptions of the FSMs in the set to base ckt with minimal cost [14]. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. In this algorithm, the weights are assigned based on the input combinations, state code, and the output combinations to form a cost matrix. A cost matrix reduction based technique, that is, the Hungarian algorithm [15,16], is used for matching. A greedy based heuristic (GBH) search technique [17] is combined with the Hungarian algorithm to optimize the augmenting path search. At every iteration, descriptions of two FSMs (i.e., base ckt and one of the FSMs in the set) are taken as inputs. The algorithm produces the modified descriptions of the FSMs, of the same dimension, as outputs. At the end of the algorithm, a mutual XOR operation is performed among the modified descriptions, which provides the required optimized multiplexer bank.
The experimental results from MCNC FSM benchmarks illustrate the advantages of the proposed architecture as compared with VRMUX [11], as operating speed is enhanced by an average of 30.43% and LUT consumption is reduced by an average of 5.16% in FPGA implementation. They also show that the operating speed is improved by an average of 9.14% in comparison with CRMUX [11] during FPGA implementation. The limitation of the proposed technique is its higher LUT requirement, as it requires an average of 88.65% more LUTs in comparison with CRMUX [11] during FPGA implementation.
The rest of the paper is outlined as follows. Section 2 presents the Reconfigurable FSMIM-S architecture and the proposed iterative greedy heuristic based Hungarian algorithm. The experimental evaluation of the proposed algorithm, implementation of the Reconfigurable FSMIM-S architecture, and comparison with other proposals from the literature are presented in Section 3. The concluding remarks are given in Section 4.

Proposed Method
As most FPGA platforms use synchronous EMBs, Mealy machines with synchronous outputs are used in this paper. Let a Mealy FSM be described by the following columns: $a_m$ is the current state ($a_m \in A$, where $A = \{a_1, \ldots, a_M\}$ is the set of states); $K(a_m)$ is the code of state $a_m \in A$; $h$ is the number of transitions per state ($h \in H$, where $H = \{H_1, \ldots, H_M\}$ is the set of numbers of transitions per state corresponding to $A$); $a_s$ is the state of transition (the next state); $K(a_s)$ is the code of state $a_s \in A$; $X = \{x_1, \ldots, x_L\}$ is the set of input variables; $Y = \{y_1, \ldots, y_N\}$ is the set of output variables; and $D = \{D_1, \ldots, D_R\}$ is the set of excitation functions for the flip-flops, where $R$ is the number of flip-flops (i.e., the number of bits in internal state codes), $R \in \{\lceil \log_2 M \rceil, M\}$.
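The column-wise description above can be pictured as a small transition table. The following Python sketch is only an illustration under stated assumptions: the `Transition` record, the toy two-state machine, and the helper names are hypothetical and are not taken from the paper or from the MCNC benchmarks. It shows how the number of transitions per state and a minimum-width binary state code could be derived from such a table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    current: str  # current state
    inputs: str   # input cube, '-' marks a don't care
    nxt: str      # next state (state of transition)
    outputs: str  # output vector


# Hypothetical two-state Mealy machine used only for illustration.
TOY_FSM = [
    Transition("s0", "0-", "s0", "0"),
    Transition("s0", "1-", "s1", "1"),
    Transition("s1", "-1", "s0", "1"),
]


def transitions_per_state(fsm):
    """Count the transitions leaving each state."""
    counts = {}
    for t in fsm:
        counts[t.current] = counts.get(t.current, 0) + 1
    return counts


def binary_codes(fsm):
    """Assign minimum-width binary codes in order of first appearance."""
    states = []
    for t in fsm:
        for s in (t.current, t.nxt):
            if s not in states:
                states.append(s)
    width = max(1, (len(states) - 1).bit_length())
    return {s: format(i, f"0{width}b") for i, s in enumerate(states)}
```

Here `binary_codes` uses ceil(log2 M) bits for M states, which corresponds to the lower bound on the number of flip-flops mentioned above.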
The descriptions of a set of FSMs are taken for a particular application. The fundamental idea is to obtain the description of a single FSM by mapping all the FSMs into one large FSM (called base ckt) in that set. The inputs, states, and outputs of an FSM in the set are mapped into base ckt in their respective order. To perform such mapping, the mode bits are applied through a 2 × 1 multiplexer in those positions where the polarity of the bit differs (i.e., 1 in place of 0 and vice versa). Hence, the resultant FSM operates in two modes, where base ckt mode is the default mode of operation. Similarly, all other FSMs in the set are mapped into base ckt. In this way, a single FSM (i.e., base ckt) combined with a multiplexer bank (which defines the mode of operation) acts as a reconfigurable FSM. It can be configured into a particular FSM in the set by applying the specific mode bits. Due to the numerous advantages mentioned in the literature, the FSMIM-S architecture [10] is chosen to implement the FSM (i.e., base ckt) part. Therefore, the Reconfigurable FSMIM-S architecture is constructed by combining the conventional FSMIM-S architecture [10] and a multiplexer bank for mode based reconfiguration as shown in Figure 1.
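The role of the mode bit can be sketched in a few lines of Python. This is a behavioural illustration only, assuming that each description is flattened into a bit string of equal length; the function names `mux_positions` and `select` are hypothetical and do not come from the paper.

```python
def mux_positions(base_bits, recon_bits):
    """Indices where the two descriptions differ, i.e. where a 2x1 mux is needed."""
    if len(base_bits) != len(recon_bits):
        raise ValueError("descriptions must have the same length")
    return [i for i, (b, r) in enumerate(zip(base_bits, recon_bits)) if b != r]


def select(base_bits, recon_bits, mode):
    """Emulate the mux bank: mode 0 gives base ckt bits, mode 1 gives recon ckt bits."""
    diff = set(mux_positions(base_bits, recon_bits))
    out = []
    for i, b in enumerate(base_bits):
        if i in diff and mode == 1:
            out.append("1" if b == "0" else "0")  # polarity flipped by the mode bit
        else:
            out.append(b)
    return "".join(out)
```

For example, with `base_bits = "1010"` and `recon_bits = "1001"`, only the last two positions need a multiplexer; the fewer such positions, the smaller the bank, which is what the matching procedure aims for.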
This direct mapping encounters the following two major difficulties:
(i) The complexity of the resultant multiplexer bank is very high.
(ii) It becomes difficult to define the dummy states and dummy transitions. Dummy states and dummy transitions are states and transitions which are not present in base ckt but exist in the other FSMs in the set, and vice versa. These states and transitions lead the system to failure.
As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. In this algorithm, the descriptions of a set of FSMs (i.e., $[h, X, a_m, a_s, Y]$) are taken as inputs. It provides the optimized multiplexer bank for mode based reconfiguration as output. It also provides the updated description (i.e., the description without dummy states and dummy transitions) of base ckt, which is used to construct the conventional FSMIM-S part of the proposed architecture. Let the set contain $(B + 1)$ FSMs for a particular application. Based on the complexity of the description of the FSMs, the largest FSM is selected from the set. It is called base ckt. The rest of the FSMs are called recon ckt 1, recon ckt 2, ..., recon ckt $B$, respectively.
Each input, state, or output of a recon ckt $b \in$ {recon ckt 1, recon ckt 2, ..., recon ckt $B$} can be mapped into any one of the inputs, states, or outputs, respectively, of base ckt; that is, there exists a one-to-one mapping. These mappings cannot be performed independently because the inputs, states, and outputs of an FSM are interdependent. Consequently, mapping an input or state of recon ckt b into base ckt is transformed into a weighted bipartite graph matching problem or linear assignment problem (LAP) [14] as shown in Figure 2. In this LAP, the weights are assigned based on the input combinations, state code, and the output combinations to form a cost matrix. The objective of this process is to perform matching with a minimal cost so that a minimum number of bits are changed by changing the mode of operation. Therefore, the complexity of the multiplexer bank is reduced.
In the literature, the following approaches are proposed to solve a LAP:
(i) Modified Hungarian algorithm [16]
(ii) Simple greedy heuristic based algorithm [17]
(iii) Evolutionary heuristic algorithm [18]
The maximum number of inputs or states does not exceed 100 in MCNC FSM benchmarks or in FSMs used in real-world applications. So, the number of vertices used in the resultant weighted bipartite graph is always low, which results in a small LAP. However, the number of LAPs formed in this process is enormous because input matching and state matching are performed together as shown in Figure 2. Hence, the primary requirement of the algorithm used to solve the LAP is fast convergence. Therefore, a cost matrix reduction based technique, that is, the Hungarian algorithm [15,16], is used for matching. A greedy based heuristic (GBH) search technique [17] is combined with the Hungarian algorithm to optimize the augmenting path search. The pseudocode of this technique is summarized in Algorithm 1. (Note: subscripts "base" and "recon" denote the parameters of base ckt and recon ckt, respectively, throughout the paper.) At every iteration $b \in \{1, \ldots, B\}$, descriptions of two FSMs, that is, base ckt and recon ckt b, are taken as inputs. The major contributing factors for power consumption and LUT requirement in an FSM are the number of inputs and the internal states [8,19]. In any FSM, input variables and states are interdependent. Thus, input and state matching are performed together between base ckt and recon ckt.
If $L_{\text{base}} \geq L_{\text{recon}}$, then $P = {}^{L_{\text{base}}}\mathrm{P}_{L_{\text{recon}}}$ combinations of input lines for base ckt are generated to match with the input lines of recon ckt b. The remaining $(L_{\text{base}} - L_{\text{recon}})$ input lines act as don't cares while the system operates in recon ckt b mode. Otherwise, $P = {}^{L_{\text{recon}}}\mathrm{P}_{L_{\text{base}}}$ combinations of input lines for recon ckt b are generated to match with the input lines of base ckt. In this case, $(L_{\text{recon}} - L_{\text{base}})$ input lines act as don't cares while the system operates in base ckt mode. Now, for each combination of input lines, state matching is performed (Algorithm 2). This situation can be seen as a LAP where the objective is to match the states of recon ckt b to the states of base ckt with minimal cost [14,17]. All LAP solving algorithms require a cost matrix as an input to perform an optimal assignment. So, to form a cost matrix for this problem, a procedure named weight assignment is proposed.
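The number of input-line arrangements described above grows as a permutation count. The sketch below, in Python, enumerates such arrangements for small, hypothetical line counts; the function name `input_line_arrangements` is an assumption made for this illustration and the paper's own implementation (in MATLAB) may differ.

```python
from itertools import permutations
from math import perm


def input_line_arrangements(l_base, l_recon):
    """Ordered selections of the larger machine's input lines,
    one selected line per input line of the smaller machine."""
    big, small = max(l_base, l_recon), min(l_base, l_recon)
    return list(permutations(range(big), small))


# Hypothetical sizes: 4 input lines in one machine, 2 in the other.
arrangements = input_line_arrangements(4, 2)
```

Each tuple in `arrangements` pairs the input lines of the smaller machine, in order, with distinct lines of the larger one; the unselected lines are the ones that act as don't cares. Since state matching is repeated for every arrangement, this count drives the overall run time.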
In this procedure, the combinations of input lines, $a_m$ and $h$, for base ckt and recon ckt b are taken as inputs. The basic idea is as follows: (i) replace the recon ckt b state with the base ckt state sequentially in the recon ckt array; (ii) evaluate the weight by performing a Bitwise-XOR operation (i.e., transition matching) for that particular replacement; (iii) then, construct the cost matrix.
For each transition in recon ckt array (i.e., $h_{\text{recon}} \in \{1, 2, \ldots, H_m^{\text{recon}}\}$), transition matching is performed. This situation can be seen as a LAP where the objective is to match the transitions of recon ckt b to the transitions of base ckt with minimal cost [14,17]. For this, the number of transitions for the particular state is equalized in both FSMs. Therefore, if $H_m^{\text{base}} \geq H_m^{\text{recon}}$, then $(n - H_m^{\text{recon}})$ dummy transitions, where $n = H_m^{\text{base}}$, are added to the recon ckt array. Otherwise, $(n - H_m^{\text{base}})$ dummy transitions, where $n = H_m^{\text{recon}}$, are added to the base ckt array. Thus, for each transition in base ckt array (i.e., $h_{\text{base}} \in \{1, 2, \ldots, H_m^{\text{base}}\}$), a Bitwise-XOR operation is performed between the arrays for that particular transition. The total number of 1's in the Bitwise-XOR operations is counted to create a cost matrix for transition matching. Then, optimal assignment of transitions is performed by the greedy based heuristic Hungarian algorithm (GBH hungarian algorithm) between base ckt array and recon ckt array. Let match count be a variable defined as
$$\text{match count} = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}, \quad (1)$$
where $c_{ij}$ ← cost matrix and $x_{ij}$ ← decision variable. In this way, by using match count (from (1)), the cost matrix formation to map recon ckt b states into base ckt states is completed. The pseudocode of the procedure, weight assignment, is summarized in Algorithm 5.
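The XOR-and-count step used to build the transition cost matrix can be sketched as follows. This Python fragment is a minimal illustration, not the paper's weight assignment procedure: it assumes that every transition array is a fully specified bit string (no don't cares) and that dummy transitions are encoded as all-zero rows, neither of which is stated in the text. The names `xor_count` and `cost_matrix` are hypothetical.

```python
def xor_count(a, b):
    """Number of 1's in the bitwise XOR of two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def cost_matrix(base_rows, recon_rows, pad_bit="0"):
    """Square cost matrix (rows: base ckt, columns: recon ckt).

    The shorter list is padded with dummy rows made of pad_bit
    (an assumption of this sketch) so that both sides have n entries.
    """
    n = max(len(base_rows), len(recon_rows))
    width = len((base_rows or recon_rows)[0])
    dummy = pad_bit * width
    base = list(base_rows) + [dummy] * (n - len(base_rows))
    recon = list(recon_rows) + [dummy] * (n - len(recon_rows))
    return [[xor_count(b, r) for r in recon] for b in base]


# Hypothetical state with 3 base transitions and 2 recon transitions.
example = cost_matrix(["1100", "0011", "1111"], ["1101", "0111"])
```

Each entry counts the bit positions that would need a mode multiplexer if that pair of transitions were matched, so a lower assignment cost corresponds to a simpler multiplexer bank.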
Let $V$ and $U$ represent the sets of vertices (i.e., transitions or states) for recon ckt and base ckt, respectively. $G = (V \cup U, E)$ is defined as a balanced weighted bipartite graph, where $|V| = |U| = n$. $C$ is the cost matrix. A number $c_{ij} \geq 0$ for each edge $[i, j] \in E$ is called the cost (or weight) of the edge $[i, j]$.
In GBH hungarian algorithm, the cost matrix  is taken as input.It provides an optimal assignment between  and  as output.GBH in [17] is an iterative cost matrix reduction based approach to solve the LAP.At each iteration, a single vertex is eliminated from either  or  until the advent of some stopping conditions.Let  be the last iteration (whereas  is a positive integer).Therefore, either  or ( − 1) vertices are eliminated from  at the last iteration.
Let $V_k \subseteq V$ and $U_k \subseteq U$ be the subsets of the remaining vertices in $V$ and $U$, respectively, at iteration $k$. At the first iteration, that is, $k = 1$, $V_1 = V$ and $U_1 = U$. The objective of the LAP is to assign $n$ resources to $n$ tasks in such a way that the optimal total cost is obtained for the assignment. The LAP can be mathematically formulated as follows:
$$\min z = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}, \quad (2)$$
subject to
$$\sum_{j=1}^{n} x_{ij} = 1, \quad \forall i = 1, \ldots, n; \quad (3)$$
$$\sum_{i=1}^{n} x_{ij} = 1, \quad \forall j = 1, \ldots, n; \quad (4)$$
$$x_{ij} \in \{0, 1\}, \quad \forall i, j = 1, \ldots, n. \quad (5)$$
Equation (2) represents the objective function for the LAP. If resource $i$ is allocated to task $j$, then the decision variable $x_{ij} = 1$, and 0 otherwise, as depicted in (5). One-to-one mapping should be practiced between resources and tasks. Equations (3) and (4) ensure these criteria.
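For very small instances, the LAP stated above can be checked by exhaustive search. The Python sketch below is not the Hungarian algorithm nor the GBH technique of [17]; it simply enumerates every one-to-one assignment and keeps the cheapest, which is only practical for a handful of vertices. The function name `solve_lap` and the example matrix are hypothetical.

```python
from itertools import permutations


def solve_lap(cost):
    """Exhaustively find the one-to-one assignment with minimum total cost.

    Returns (total_cost, assignment) where assignment[i] is the column
    matched to row i. Runs in O(n!) time, so it is a reference check only.
    """
    n = len(cost)
    best_cost, best_perm = None, None
    for p in permutations(range(n)):
        total = sum(cost[i][p[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, p
    return best_cost, list(best_perm)


# Hypothetical 3 x 3 cost matrix.
best = solve_lap([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
```

Each permutation corresponds to a 0/1 decision matrix with exactly one 1 per row and per column, so the one-to-one constraints hold by construction. A polynomial-time solver would replace the loop in practice, which matters here because a very large number of small LAPs must be solved.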
At each iteration, there are two options to eliminate a vertex, that is, from either  or .For each  ∈ V  and  ∈   , the following parameters are defined to select one of the above options: In ( 6),    and    can act as "potential cost contribution" [17] of vertices  ∈ V  and  ∈   to   in ( 2).Thus, the potential cost contribution is evaluated for the vertices, and if it exceeds the corresponding removal cost, then such vertices are eliminated.
If    ≤    , then an attempt is made to remove one of the vertices from V  ⊆ .From (7), if    ≤  −1 , that is, the objective function value is improved by eliminating   , then V +1 is set to V  and the next iteration is executed.
Otherwise, one of the vertices from   ⊆  is eliminated.From (8), if    ≤  −1 , that is, the objective function value is improved by eliminating   , then  +1 is set to   and the next iteration is executed.
In this case, when the objective function value is not improved by eliminating either   or   , then algorithm halts and the obtained solution is  −1 .Furthermore, if    >    , then the above steps are repeated in the opposite order.The pseudocode of this approach is devised in Algorithm 6.
Therefore, after obtaining the cost matrix from weight assignment for state matching, GBH hungarian algorithm is applied to obtain the assignment cost and the total cost for each combination of input lines, as defined in (9). Thus, all the recon ckt b states are replaced by their assigned base ckt states, and all the complete arrays of recon ckt b are arranged corresponding to the $a_m^{\text{base}}$ order. Hence, from (9), the combination of input lines is selected with min{assignment cost$_1$, ..., assignment cost$_P$} & min{total cost$_1$, total cost$_2$, ..., total cost$_P$}. Now, binary state codes $K(a_m)$ and $K(a_s)$ are applied in base ckt and recon ckt b. As this changes the weights of the cost matrix, weight assignment is applied again to construct a modified cost matrix. In this case, arrays are created by combining [selected input combination, $K(a_m)$, $K(a_s)$]. Dummy states are replaced in matched states of base ckt and recon ckt b by using Propositions 1 and 2. Then, dummy transitions are replaced by using Proposition 1. The dummy replacement algorithm is shown in Algorithm 3.

Proposition 1. Dummy transitions in a matched state of base ckt or recon ckt b should be replaced with one of the existing transitions in that particular state with a minimum cost.
Proof. For each matched state (or assigned state after matching) in recon ckt b, if $H_m^{\text{base}} \geq H_m^{\text{recon}}$, then $(H_m^{\text{base}} - H_m^{\text{recon}})$ dummy transitions are present in the recon ckt b state.
Hence, there are $(H_m^{\text{base}} - H_m^{\text{recon}})$ transitions present in the corresponding state of base ckt which are unassigned. These unassigned transitions in base ckt will lead the system to failure while operating in recon ckt b mode. As a solution, these unassigned transitions of base ckt are assigned to the existing transitions of recon ckt b with the least cost by looking at the particular column of the modified cost matrix.
Similarly, for each matched state (or assigned state after matching) in recon ckt b, if $H_m^{\text{base}} < H_m^{\text{recon}}$, then $(H_m^{\text{recon}} - H_m^{\text{base}})$ dummy transitions are present in the base ckt state. Hence, there are $(H_m^{\text{recon}} - H_m^{\text{base}})$ transitions present in the corresponding state of recon ckt b which are unassigned. These unassigned transitions in recon ckt b will lead the system to failure while operating in base ckt mode. As a solution, these unassigned transitions of recon ckt b are assigned to the existing transitions of base ckt with the least cost by looking at the particular row of the modified cost matrix.
Let $C_{n \times n}$ represent the modified cost matrix for a matched state, where rows ($r_i$) and columns ($c_j$) denote the base ckt and recon ckt b transitions, respectively. Thus, the unassigned transitions in the base ckt state can be assigned by (10) as follows:
$$\text{unassigned } r_i \rightarrow c_j : \min(c_{i1}, c_{i2}, c_{i3}, \ldots, c_{in}). \quad (10)$$
Similarly, the unassigned transitions in the recon ckt b state can be assigned by (11).

Proof (Proposition 2). In an FSM, splitting a state with a high number of transitions results in low power consumption [8,19]. It also improves the operating speed [2,20]. If $M_{\text{base}} > M_{\text{recon}}$, then there are $(M_{\text{base}} - M_{\text{recon}})$ states present in base ckt which are unassigned. These unassigned states in base ckt will lead to failure in the system while operating in recon ckt b mode. As base ckt is the largest FSM in the collection and its transitions per state are greater than those of recon ckt b, splitting recon ckt b states is insignificant for the system performance. So, these unassigned states of base ckt are assigned using Proposition 1.
If  base <  recon, then ( recon −  base) dummy states are replaced by splitting the matched state in base ckt.Let Ψ(  base ) = (  base −   recon ), where  is a positive integer.Only the states for which |Ψ(  base ) > 1| can be split [19].Each state can be split into nonoverlapping subsets of (  base −  recon ) transitions.Algorithm 7 is proposed to split a base ckt state.
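The replacement rule of Proposition 1 can be illustrated with a short Python sketch. It assumes a square cost matrix whose rows are base ckt transitions and whose columns are recon ckt b transitions, with the dummy transitions occupying the last columns; this layout, the function name `replace_dummy_columns`, and the example data are assumptions made here, not details given in the paper.

```python
def replace_dummy_columns(cost, assignment, n_real_cols):
    """Reassign rows matched to dummy columns to their cheapest real column.

    cost         -- square cost matrix (rows: base ckt, columns: recon ckt b)
    assignment   -- assignment[i] is the column matched to row i by the LAP
    n_real_cols  -- columns with index >= n_real_cols are dummy transitions
    """
    fixed = list(assignment)
    for row, col in enumerate(assignment):
        if col >= n_real_cols:  # this base transition was matched to a dummy
            fixed[row] = min(range(n_real_cols), key=lambda c: cost[row][c])
    return fixed


# Hypothetical state: 3 base transitions, 2 real recon transitions, 1 dummy.
fixed = replace_dummy_columns([[1, 3, 2], [3, 1, 2], [1, 1, 4]], [0, 1, 2], 2)
```

After the replacement, several base ckt transitions may share one recon ckt b transition, so the mapping is no longer one-to-one, but no transition is left undefined in either mode.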
At this stage, the states and the input lines of both FSMs are completely matched and fixed. Hence, the output matching is performed by applying a Bitwise-XOR operation and selecting the combination with the least count of 1's. If $N_{\text{base}} \geq N_{\text{recon}}$, then $Q = {}^{N_{\text{base}}}\mathrm{P}_{N_{\text{recon}}}$ combinations of output lines for base ckt are generated to match with the output lines of recon ckt b. Otherwise, $Q = {}^{N_{\text{recon}}}\mathrm{P}_{N_{\text{base}}}$ combinations of output lines for recon ckt b are generated to match with the output lines of base ckt. Then, for each combination of output lines, a Bitwise-XOR operation is performed between the corresponding output lines of base ckt and recon ckt b. Let XOR count$_q$, where $q \in \{1, 2, \ldots, Q\}$, represent the total number of 1's in the Bitwise-XOR operation for a particular combination of output lines. Therefore, the combination of output lines with min{XOR count$_1$, ..., XOR count$_Q$} is selected.
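Output matching as described above amounts to trying each arrangement of output lines and keeping the one with the lowest XOR count. The following Python sketch covers only the case in which base ckt has at least as many output lines as recon ckt b, and it assumes that the transitions of both machines are already matched and listed in the same order; the function name `match_outputs` and the data are hypothetical.

```python
from itertools import permutations


def match_outputs(base_out, recon_out):
    """Pick the arrangement of base output lines with the smallest XOR count.

    base_out / recon_out hold one output bit string per matched transition.
    Returns (xor_count, lines) where lines[k] is the base output line
    matched to recon output line k.
    """
    n_base, n_recon = len(base_out[0]), len(recon_out[0])
    if n_base < n_recon:
        raise ValueError("sketch covers only the case N_base >= N_recon")
    best = None
    for lines in permutations(range(n_base), n_recon):
        count = sum(
            b[line] != r[k]
            for b, r in zip(base_out, recon_out)
            for k, line in enumerate(lines)
        )
        if best is None or count < best[0]:
            best = (count, lines)
    return best


# Hypothetical outputs for two matched transitions.
result = match_outputs(["101", "011"], ["01", "11"])
```

In this toy case, base output lines 1 and 2 reproduce the recon outputs exactly, so no mode multiplexer would be needed on the output side.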
At the end of every iteration, the description of base ckt is updated to operate on the next iteration. At the end of the $B$th iteration, for each recon ckt $b \in$ {recon ckt 1, recon ckt 2, ..., recon ckt $(B - 1)$}, replacement of dummy transitions and states is performed, and the updated descriptions of recon ckt 1, recon ckt 2, ..., recon ckt $(B - 1)$ are obtained. In this way, the descriptions of all FSMs are optimally matched and have the same dimension. Therefore, a mutual (i.e., pairwise) Bitwise-XOR operation between the updated descriptions of the FSMs is conducted, which provides the optimized multiplexer bank for mode based reconfiguration.
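The final mutual XOR step can be pictured with the Python sketch below. It assumes that each updated description is flattened into a bit string of the same length and that the multiplexer bank is needed exactly at the positions where at least one pair of descriptions differs; the second point is an interpretation made for this illustration, and the name `mux_bank_positions` is hypothetical.

```python
from itertools import combinations


def mux_bank_positions(descriptions):
    """Bit positions that differ in at least one pair of descriptions."""
    positions = set()
    for a, b in combinations(descriptions, 2):
        if len(a) != len(b):
            raise ValueError("descriptions must have the same dimension")
        positions.update(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
    return sorted(positions)


# Hypothetical updated descriptions of three matched FSMs.
bank = mux_bank_positions(["1100", "1101", "0101"])
```

Positions outside the returned list carry the same bit in every mode and can be hard-wired, which is why a well-matched set of FSMs leads to a small multiplexer bank.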

Experimental Evaluation
Experiments have been conducted to illustrate the advantages of the proposed architecture using the FSM benchmark circuits from MCNC/LGSynth [21] as shown in Table 1.
The proposed iterative greedy heuristic based Hungarian algorithm has been implemented in the MATLAB (2016b) environment. The MATLAB HDL Coder tool is used to generate the Verilog HDL code for the multiplexer bank for mode based reconfiguration. The Reconfigurable FSMIM-S architecture is described in Verilog HDL and implemented on a Xilinx xc6vlx75t Speed Grade-3 device (Virtex-6) by using Xilinx ISE 14.6 [15]. All computations are performed using a computer with an Intel(R) Core(TM) i5, 8 GB RAM, and 2.67 GHz CPU.
Let $x_1, x_2, x_3, \ldots$ be the input lines, $y_1, y_2, y_3, \ldots$ be the output lines, and 1, 2, 3, ... be the states of an FSM. In the proposed algorithm, at the first stage, input matching is performed along with the state matching; after that, dummy states and transitions are replaced. Then, output matching is performed (Algorithm 4).
When the number of inputs or outputs exceeds 8, matching requires the generation of more than ${}^{8}\mathrm{P}_{8} = 40320$ combinations, which exhausts the simulation resources. Hence, the excess input lines, namely those containing the maximum number of don't cares over all transitions, are discarded from input matching. Similarly, the excess output lines, namely those containing the minimum number of 1's over all transitions, are discarded from output matching. Therefore, the complexities of the input selector bank and the group encoder are reduced because the information content of these lines is minimal.
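The discarding rule for excess input lines can be sketched as a simple ranking by don't-care count. The Python fragment below is an illustration under assumptions: input cubes are given as strings with '-' for a don't care, ties are broken by line index, and the function name `lines_to_keep` and the data are hypothetical.

```python
def lines_to_keep(transitions, limit=8):
    """Keep the `limit` input lines with the fewest don't cares ('-').

    transitions -- one input cube per transition, all of the same width
    Returns (kept_line_indices, dont_care_count_per_line).
    """
    n_lines = len(transitions[0])
    dont_cares = [sum(t[i] == "-" for t in transitions) for i in range(n_lines)]
    ranked = sorted(range(n_lines), key=lambda i: dont_cares[i])
    return sorted(ranked[:limit]), dont_cares


# Hypothetical machine with 4 input lines, of which only 2 may be matched.
kept, counts = lines_to_keep(["0-1-", "1--0", "-01-"], limit=2)
```

Lines that are don't cares in almost every transition carry little information, so excluding them from matching keeps the number of arrangements bounded at a small cost in matching quality.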
The FSM "s1494" has been considered as base ckt, because it consists of 48 states, 8 inputs, 19 outputs, and 250 transitions, which are higher values than those of any other FSM in the collection. Hence, "s1494" is considered as the FSM included in the design at the 0th iteration. In this case, state splitting is never used for dummy state replacement, because base ckt contains the highest number of states. All dummy states and transitions are replaced by using Proposition 1. For output matching, $y_1$, $y_2$, $y_3$, $y_4$, $y_5$, $y_6$, $y_7$, $y_9$, $y_{10}$, $y_{16}$, and $y_{18}$ are discarded because they contain 62, 12, 13, 8, 6, 6, 38, 4, 16, 46, and 78 instances of 1's, respectively, out of a total of 250 transitions.
In the 1st iteration, an FSM, "sand," is included in the design. For input matching, $x_6$, $x_7$, and $x_9$ are discarded because they contain 178, 150, and 182 don't cares, respectively, out of a total of 184 transitions. All states are matched with the [...] requirement and operating speed are presented in Figures 3 and 4, respectively.
The operating speed of the proposed system is maximum (i.e., 810.17 MHz) and its LUT requirement is minimum (i.e., 42 LUTs) in the 0th iteration. The operating speed is reduced, and the LUT requirement is increased successively by adding an FSM at each iteration as shown in Table 4. Therefore, the proposed architecture acts as an ideal candidate for such applications where the similarity between the sets of FSMs is high (i.e., fewer differences in their descriptions).

The Reconfigurable FSMIM-S architecture leads to the problem of defining the optimized multiplexer bank for mode based reconfiguration for the set of FSMs in a particular application. This situation transforms the problem into a weighted bipartite graph matching problem where the objective is to match the descriptions of the FSMs in the set with minimal cost. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed, which provides the required optimized multiplexer bank. By using the proposed architecture, operating speed is enhanced by an average of 30.43% and LUT consumption is reduced by an average of 5.16% in FPGA implementation in comparison with VRMUX [11]. It has also been shown that the operating speed is improved by an average of 9.14% as compared with CRMUX [11]. The only trade-off of the proposed technique is that it requires 88.65% more LUTs compared with CRMUX [11] during FPGA implementation. Future improvement of this work will focus on reducing the LUT requirement of the proposed architecture. In this study, a binary state encoding is used, and the next state function is partially included in matching. However, evolutionary state encoding algorithms such as [23] or [24] can be used to reduce the increased LUT requirement.

Figure 2: Flow chart for iterative greedy heuristic based Hungarian algorithm.
It provides the cost matrix to map recon ckt b states into base ckt states. An array is created at each transition in both base ckt and recon ckt b by combining [input combination $\in \{x_1, x_2, \ldots, x_L\}$, $a_m$].

Figure 3: Comparison of hardware requirements during FPGA implementation.

Figure 4: Comparison of operating speeds during FPGA implementation.
Algorithm 1: Iterative greedy heuristic based Hungarian algorithm.
Input. The descriptions of the FSMs (i.e., $[h, X, a_m, a_s, Y]$)
Output. The optimized multiplexer bank for mode based reconfiguration
begin
  select the largest FSM from the set based on the description;
  base ckt ← largest FSM;
  recon ckt 1, recon ckt 2, ..., recon ckt $B$ ← rest of the FSMs in the set;
  for each recon ckt b ∈ {recon ckt 1, recon ckt 2, ..., recon ckt $B$} do
    if ($L_{\text{base}} \geq L_{\text{recon}}$) then /* performing the input matching */
      generate $P \leftarrow {}^{L_{\text{base}}}\mathrm{P}_{L_{\text{recon}}}$ combinations of input lines for base ckt to match with input lines of recon ckt b;
      go to state matching; /* calling the function "state matching" */
    else if ($L_{\text{base}} < L_{\text{recon}}$) then
      generate $P \leftarrow {}^{L_{\text{recon}}}\mathrm{P}_{L_{\text{base}}}$ combinations of input lines for recon ckt b to match with input lines of base ckt;
      go to state matching; /* calling the function "state matching" */
    end
    select combinations of input lines with min{assignment cost$_1$, ..., assignment cost$_P$} & min{total cost$_1$, total cost$_2$, ..., total cost$_P$};
    perform binary state assignment in base ckt & recon ckt b, i.e., apply $K(a_m)$ & $K(a_s)$;
    weight assignment( ); /* creating arrays by [selected input combination, $K(a_m)$, $K(a_s)$] */
    go to dummy replacement; /* calling the function "dummy replacement" */
    if ($N_{\text{base}} \geq N_{\text{recon}}$) then /* performing the output matching */
      generate $Q \leftarrow {}^{N_{\text{base}}}\mathrm{P}_{N_{\text{recon}}}$ combinations of output lines for base ckt to match with output lines of recon ckt b;
      go to output matching; /* calling the function "output matching" */
    else if ($N_{\text{base}} < N_{\text{recon}}$) then
      generate $Q \leftarrow {}^{N_{\text{recon}}}\mathrm{P}_{N_{\text{base}}}$ combinations of output lines for recon ckt b to match with output lines of base ckt;
      go to output matching; /* calling the function "output matching" */
    end
    select combinations of output lines with min{XOR count$_1$, ..., XOR count$_Q$};
    update the description of base ckt;
  end
  for each recon ckt b ∈ {recon ckt 1, recon ckt 2, ..., recon ckt $(B - 1)$} do
    go to dummy replacement; /* calling the function "dummy replacement" */
    update the description of recon ckt b;
  end
  perform a mutual (i.e., pairwise) Bitwise-XOR operation between the updated descriptions of FSMs;
  obtain the optimized multiplexer bank for mode based reconfiguration;
end

Algorithm 2: State matching (partial listing).
Input. Combinations of input lines, the descriptions of base ckt & recon ckt b (i.e., $[h, X, a_m, a_s, Y]$)
Output. Assignment cost$_p$, total cost$_p$, modified description of recon ckt b
begin
  for all (combinations of input lines) do
    if ($M_{\text{base}} \geq M_{\text{recon}}$) then /* equating the number of states in both the FSMs */
      add ($M_{\text{base}} - M_{\text{recon}}$) dummy states in recon ckt b;

Algorithm 3: Dummy replacement (partial listing).
Input. The descriptions of base ckt & recon ckt b (i.e., $[h, X, a_m, a_s, Y]$)
Output. The descriptions of base ckt & recon ckt b without dummy states and dummy transitions
begin
  if ($M_{\text{base}} \geq M_{\text{recon}}$) then /* replacing the dummy states */
    replace dummy states in recon ckt b by Proposition 1; /* by considering states in place of transitions in the modified cost matrix */
  else if ($M_{\text{base}} < M_{\text{recon}}$) then
    replace dummy states in base ckt by Proposition 2;
  end
  for each (matched state, $a_m^{\text{recon}}$ ∈ recon ckt b) do /* replacing the dummy transitions */
    for each (transition in recon ckt b, $h_{\text{recon}} \in \{1, 2, \ldots, H_m^{\text{recon}}\}$) do
      if ($H_m^{\text{base}} \geq H_m^{\text{recon}}$) then
        replace dummy transitions in recon ckt b by Proposition 1;
      else if ($H_m^{\text{base}} < H_m^{\text{recon}}$) then

Table 3: Output matching among FSMs.

Algorithm 5: Weight assignment (partial listing).
Input. Combination of input lines for base ckt, $a_m^{\text{base}}$, $h_{\text{base}}$, combination of input lines for recon ckt b, $a_m^{\text{recon}}$, $h_{\text{recon}}$
Output. Cost matrix $C$ /* cost matrix formation to map recon ckt b states into base ckt states */
begin
  base ckt array ← create an array at each transition in base ckt by combining [input combination $\in \{x_1^{\text{base}}, \ldots, x_L^{\text{base}}\}$, $a_m^{\text{base}}$];
  recon ckt array ← create an array at each transition in recon ckt b by combining [input combination $\in \{x_1^{\text{recon}}, \ldots, x_L^{\text{recon}}\}$, $a_m^{\text{recon}}$];
  for each (state, $a_m^{\text{recon}}$ ∈ recon ckt array) do

Table 4: Iterative implementation of the Reconfigurable FSMIM-S architecture on Virtex-6. Note. #LUTs denotes the number of LUTs in ISE.