VLSI Architectures for Sliding-Window-Based Space-Time Turbo Trellis Code Decoders

The VLSI implementation of SISO-MAP decoders used for traditional iterative turbo coding has been investigated in the literature. In this paper, a complete architectural model of a space-time turbo code receiver that includes elementary decoders is presented. These architectures are based on newly proposed building blocks such as a recursive add-compare-select-offset (ACSO) unit, A-, B-, and Γ-metric calculation modules, and an LLR output calculation module. Measurements of the complexity and decoding delay of several sliding-window-based MAP decoder architectures, together with a proposed parameter set, lead to defining equations and a comparison between those architectures.


Introduction
The ability of turbo codes (TCs) to achieve a very low bit-error rate (BER) that approaches the Shannon limit is very attractive. These channel-capacity-approaching codes were proposed by Berrou et al. [1,2]. Iterative turbo decoding operates on the parallel concatenation of two or more component codes, which operate on the data and exchange information in order to progressively reduce the error rate. The exchanged information takes the form of log-likelihood ratio (LLR) soft decisions and is passed between elementary decoders, which apply either the soft-output Viterbi algorithm (SOVA) [3,4] or the maximum a posteriori (MAP) probability algorithm [5].
The principles of iterative turbo decoding can be combined with those of space-time coding, resulting in a bandwidth-efficient, low-error-rate channel coding scheme named space-time turbo trellis coding (STTuTC) [6-10]. The latter scheme combines the impressive coding gain of turbo codes with the diversity gain of space-time codes to obtain a very power-efficient system. These power gains are very important for high-performance communication systems, particularly in scenarios where operation at low signal-to-noise ratio is required.
Despite complexity reductions, iterative turbo decoders still have a prohibitively high implementation complexity and suffer from large decoding delays, motivating researchers to seek efficient implementations [11-19]. The sliding window technique (SWT) presented in [17-21] enables the early acquisition of state metric values without having to scan the whole trellis frame in one direction before scanning it in the other; this results in reduced elementary decoder latency as well as smaller state metric memory requirements.

Turbo Transmitter.
A space-time turbo trellis code is a parallel concatenation scheme using two identical elementary encoders that operate on an information message C = [C_0, C_1, ..., C_t, ..., C_{P−1}], where each symbol consists of b bits [8,9]. The same message is sent to both component encoders, but each one receives the information in a different order through an interleaver.
Elementary encoders using recursive space-time trellis coding (STTC) benefit from interleaver gain and iterative decoding [6-8]. These encoders add redundancy to the encoded messages, and the second sequence is deinterleaved back to its original order. The two sequences of codewords are multiplexed and punctured to preserve bandwidth efficiency and form a single length-P sequence denoted by X = [X_0, X_1, ..., X_t, ..., X_{P−1}], where each space-time vector X_t = [x_{t,p}^1, x_{t,p}^2, ..., x_{t,p}^i, ..., x_{t,p}^K]^T consists of K b-bit modulated symbols. In the vector X, half of the symbols (odd locations) come from component encoder 1 (p = 1) and the other half (even locations) come from component encoder 2 (p = 2) [6]. The sequence is forwarded for carrier modulation and transmission, since baseband modulation occurs within the encoders. The structure results in a long block code built from small-memory convolutional codes, a fact that enables the turbo code to approach the Shannon limit.
Note that the recursive STTCs are trellis based, meaning that during the encoding process a trellis path is created in each elementary encoder [7]. Let S = [s_0, s_1, ..., s_t, ..., s_{P−1}, s_P] be a state row vector containing all the states in the trellis path chosen by the information message. The combined trellis paths are decomposed and exploited at the receiver side to decode the message.

Proposed Turbo Receiver.
Consider the baseband MIMO receiver shown in Figure 1; the received frame T can be used to estimate the original sequence C. Using the baseband signal representation, the received noisy signal at antenna j at time t over a MIMO (possibly fading, when h_t^{j,i} ≠ 1) Gaussian channel is given by

r_t^j = Σ_{i=1}^{K} h_t^{j,i} x_t^i + n_t^j = V_t^j + n_t^j,  (1)

where V_t^j represents the noiseless version of the r_t^j signal and n_t^j is the additive noise sample [7-9]. The receiver, after estimating the channel fading coefficients, separates the signals using maximal ratio receiver combining (MRRC), which also equalises the signals by removing the effect of the channel. With MRRC, costly joint detection of space-time symbols is avoided.
The equalised frame symbols are delivered to a soft demapper to calculate the soft channel output given in (2). The received symbols have already been prescaled by −E_s/2σ² by an SNR scaler in the analogue domain [12]. A frame formation block (FBB) separates the even and odd symbols and inserts zeros, so that each log-MAP decoder core is assured of receiving information only from the corresponding component encoder.
At the algorithmic level, the STTuTC decoder consists of two soft-in soft-out (SISO) component decoders, which exchange soft information and, progressively through many iterations, produce a better estimate of the values of the information symbols [11,12]. Since the two component decoders do not operate on the data at the same time, a single silicon IP core can be used to implement both. As illustrated in Figure 1, the decoder contains a symbol-based SISO component core that applies the log-MAP algorithm to a nonbinary trellis. It also uses two memories, one for storing the exchanged soft information and another to perform symbol interleaving/deinterleaving.

MAP and Log-MAP Algorithms.
Maximum likelihood decoding using the Viterbi algorithm determines the most likely path through the trellis and, from that path, the information symbol sequence, whereas MAP decoding determines the latter sequence by considering each information symbol c_t independently, given the channel observations R. The MAP algorithm computes the a posteriori LLR soft decisions (3) for the value of the symbols in the information message [6,8]:

λ_f(u_t | R) = ln( Pr(u_t = f | R) / Pr(u_t = 0 | R) ),  (3)

where f = 0, 1, 2, ..., 2^b − 1. This equation expresses, in the logarithmic domain, the excess probability of a symbol u_t being equal to f over the probability of it being equal to the reference symbol 0 [11,13]. The log-MAP decoder calculates the LLRs of all possible modulation symbols in the constellation diagram using the forward A_t(s) = ln(α_t(s)) and backward B_t(s) = ln(β_t(s)) state metric probabilities, as well as the branch metric probabilities Γ_t^f(s', s) = ln(γ_t^f(s', s)), in the logarithmic domain:

A_t(s) = max*_{s'} ( A_{t−1}(s') + Γ_t(s', s) ),
B_{t−1}(s') = max*_{s} ( B_t(s) + Γ_t(s', s) ).  (4)

The A_t(s) metric represents the logarithmic probability that state s at stage t has been reached from the beginning of the trellis. Similarly, the B_{t−1}(s') metric represents the logarithmic probability that state s' at stage t − 1 has been reached from the end of the trellis [13,15]. The branch metric probabilities can be computed using

γ_t^f(s', s) = C_1 · Pr_t(f) · exp( −(1/2σ²) Σ_j | r_t^j − V_t^j |² ),  (5)

where C_1 = 1/2πσ². The above equation gives the probability of a transition at time t from state s' to state s in the trellis diagram [13]. Translating (5) into the logarithmic domain and normalising with respect to the reference symbol [15] gives

Γ_t^f(s', s) = C̃_1 + H_t(s', s) + ln( Pr_t(f) / Pr_t(0) ),  (6)

where C̃_1 = ln(C_1) is a constant, the H_t(s', s) term is the soft channel output, and the ln(Pr_t(f)/Pr_t(0)) term is the a priori information coming from the other component decoder. Finally, (7) gives the elementary decoder output, the a posteriori probability LLR in terms of logarithmic-domain state and branch metrics [13]:

λ_f(u_t | R) = max*_{(s',s): u_t = f} [ A_{t−1}(s') + Γ_t^f(s', s) + B_t(s) ] − max*_{(s',s): u_t = 0} [ A_{t−1}(s') + Γ_t^0(s', s) + B_t(s) ].  (7)
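As a concrete illustration of these recursions, the sketch below implements the max*() operation and the forward/backward passes in Python on a toy two-state trellis. The trellis size and branch metric values are illustrative assumptions, not values from the paper.

```python
import math

def max_star(a, b):
    """Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a-b|))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

P = 4                    # frame length in trellis stages (illustrative)
NEG = -1e9               # practical stand-in for minus infinity
# Gamma[t][s_prev][s]: branch metrics, values chosen arbitrarily for the demo
Gamma = [[[0.1, -0.3], [0.2, 0.4]] for _ in range(P)]

# Forward recursion: A_t(s) = max* over s' of (A_{t-1}(s') + Gamma_t(s', s))
A = [[NEG, NEG] for _ in range(P + 1)]
A[0][0] = 0.0            # trellis assumed to start in state 0
for t in range(P):
    for s in range(2):
        acc = NEG
        for sp in range(2):
            acc = max_star(acc, A[t][sp] + Gamma[t][sp][s])
        A[t + 1][s] = acc

# Backward recursion: B_{t-1}(s') = max* over s of (B_t(s) + Gamma_t(s', s))
B = [[0.0, 0.0] for _ in range(P + 1)]   # continuous mode: no termination assumed
for t in range(P, 0, -1):
    for sp in range(2):
        acc = NEG
        for s in range(2):
            acc = max_star(acc, B[t][s] + Gamma[t - 1][sp][s])
        B[t - 1][sp] = acc
```

Because max*(a, b) equals ln(e^a + e^b) exactly, both passes compute true log-domain probabilities; hardware approximates only the correction term.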

Soft-In Soft-Out-MAP Component Decoder
3.1. Decoder Module Architecture. The proposed elementary decoder core is illustrated in Figure 2. The main modules within this decoder are the branch metric calculation module (BM-CM), the A- and B-state metric calculation modules (A-SM-CM and B-SM-CM, resp.), the LLR calculation module (LLR-CM), and a large A-metric RAM to temporarily store metric values.
The BM-CM calculates all possible normalised branch metric values (see the following subsection) and stores them in a register file, which consists of M = 2^b registers holding the M possible branch metric values. These metrics are fed to the A-SM-CM, B-SM-CM, and LLR-CM. The A-SM-CM scans the entire trellis diagram in the forward direction, starting from stage zero and ending at stage P, calculating all A-state metrics and storing them in the A-metric RAM.
After this first trellis pass, the B-SM-CM and the LLR-CM simultaneously compute the B-state metrics and the LLR outputs, respectively. The B-SM-CM scans the entire trellis in the backward direction, from stage P to stage 0. The B-metrics do not need to be stored in memory, since every time a B-metric is calculated the corresponding A-metric is extracted from the A-metric RAM and the LLR calculation is performed immediately.
Therefore, only the A-state metrics are stored in a RAM, the size of which must be (P + 1)·2^G·w_L bits for continuous-mode decoders. In a decoder operating in the terminated mode, where the trellis path begins and ends at state s(0), that is, s_0 = s(0) and s_P = s(0), the memory requirement is slightly reduced to [(P − 3)·2^G + 2·2^b + 2]·w_L bits, whereas in the truncated operating mode (i.e., s_0 = s(0) must hold for every frame, but there is no termination condition) the storage requirement lies between these two values. Here, w_L denotes the word length used to represent the internal data of the MAP decoder.
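The two memory figures above can be checked with a small calculator; the parameter values below (P, G, b, w_L) are arbitrary illustrations, not figures from the paper.

```python
def mem_continuous(P, G, wL):
    # continuous mode: (P + 1) * 2^G * w_L bits of A-metric storage
    return (P + 1) * (2 ** G) * wL

def mem_terminated(P, G, b, wL):
    # terminated mode: [(P - 3) * 2^G + 2 * 2^b + 2] * w_L bits, since the
    # known start and end states shrink the storage of the edge stages
    return ((P - 3) * (2 ** G) + 2 * (2 ** b) + 2) * wL

# Illustrative: P = 1024 symbols, 2^G = 8 states, b = 2 bits/symbol, w_L = 10 bits
cont = mem_continuous(1024, 3, 10)     # 82000 bits
term = mem_terminated(1024, 3, 2, 10)  # 81780 bits
```

For realistic frame lengths the terminated-mode saving is marginal, which is why the sliding window techniques of Section 4 are the real lever on memory.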
Table 1 shows the complexity and decoding delay of a log-MAP decoder operating in the different modes. The metrics are calculated using data flow units (DFUs) that operate in decoding steps. As can be seen, the log-MAP decoder results in very high memory requirements and long elementary decoder latencies.

Branch Metric Calculation Module.
The branch metric calculation module computes the normalised branch metrics Γ_t^f(s', s). The branch metric constant appears in both the numerator and denominator of the a posteriori LLR formula, so any constant cancels out and plays no role in the calculation:

Γ_t^f(s', s) = H_t(s', s) + ln( Pr_t(f) / Pr_t(0) ).  (8)

The module receives as input the soft channel output H_t(s', s) and the a priori information from the other component decoder, and computes all possible values of the branch metrics, as shown in Figure 3.
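In software, the normalised branch metric above is just one addition per symbol. The soft channel outputs and a priori values below are made-up illustrative numbers.

```python
def branch_metrics(H, apriori):
    """Gamma_t^f(s', s) = H_t(s', s) + ln(Pr_t(f) / Pr_t(0)) for f = 0..M-1.

    H[f]       -- soft channel output for each of the M = 2^b symbols
    apriori[f] -- a priori term ln(Pr_t(f)/Pr_t(0)) from the other decoder
    """
    return [h + a for h, a in zip(H, apriori)]

# M = 4 symbols (b = 2); values are illustrative only
H = [0.0, 1.2, -0.4, 0.7]
apriori = [0.0, -0.1, 0.3, 0.0]   # the reference symbol 0 has a priori LLR 0
gammas = branch_metrics(H, apriori)
```

The hardware counterpart is simply M parallel adders feeding the M-entry register file.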

State Metric Calculation Module.
In the literature, the proposed architectures that recursively calculate the state metrics are based on the so-called add-compare-select-offset (ACSO) processing unit, equivalent to the add-max*() unit presented in [18]. It is responsible for the computation of a state metric based on the previous-stage state metrics and the branch metrics of the transitions arriving at the state in question.
Several different but similar architectures have been proposed. The diagram in Figure 4 suggests a new type of ACSO unit which, instead of employing additional multiplexers to cope with the more than two input state metrics of nonbinary trellises, recursively accepts the input state and branch metrics and calculates the output state metric over a number of cycles. A look-up table can be used to store precalculated correction factors; its quantisation is discussed in [18,22].
This module requires M clock cycles to calculate the output state metric. In the first clock cycle, the feedback register is reset, A_{t−1}(s_1) and Γ_t^0(s_1, s) are applied at the inputs, and the SW1 switch is connected to the zero register. The result is that a_1 = A_{t−1}(s_1) + Γ_t^0(s_1, s) is stored in the feedback register. In the remaining cycles, the LUT output is connected through SW1 to the output adder. In the second cycle, a_2 = A_{t−1}(s_2) + Γ_t^1(s_2, s) is applied at the input, so max*(a_1, a_2) is stored in the feedback register. Similarly, max*(a_3, max*(a_1, a_2)) is stored in the third cycle, whereas the output state metric, equal to max*(a_4, max*(a_3, max*(a_1, a_2))), is ready at the output port during the fourth clock cycle. A-metric calculation is illustrated, but the architecture works equally well for B-metric calculation in a backward recursion. This unit will be referred to as the recACSO unit.
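The cycle-by-cycle behaviour of the recACSO unit can be mimicked in software: one feedback accumulator folds the M (state, branch) metric pairs with max*(), and the result equals a direct log-sum-exp over all incoming transitions. The metric values below are illustrative assumptions.

```python
import math

def max_star(a, b):
    # add-compare-select-offset: max() plus a LUT-style correction term
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def rec_acso(prev_metrics, branch_metrics):
    """Fold M (A_{t-1}(s_m), Gamma_t^m(s_m, s)) pairs through one feedback
    register, one pair per clock cycle, as in the proposed recACSO unit."""
    acc = None
    for A_prev, G in zip(prev_metrics, branch_metrics):
        a = A_prev + G
        # cycle 1: SW1 selects the zero register, so the sum is latched as-is;
        # later cycles: the LUT correction is applied through max*()
        acc = a if acc is None else max_star(acc, a)
    return acc

# M = 4 incoming transitions of a nonbinary (b = 2) trellis state, toy values
A_prev = [0.5, -0.2, 0.1, 0.3]
G_in = [0.0, 0.4, 0.2, -0.1]
A_out = rec_acso(A_prev, G_in)
```

Since chained max*() equals ln Σ e^{a_m}, the M-cycle recursion loses nothing relative to a flat M-input unit, up to the LUT quantisation of the correction term.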
The forward/backward recursion computations may lead to an overflow: if a state metric value grows steadily, the finite word length w_L will not be sufficient to hold it. Since the max*() operation is linear and shift invariant, a global shift of the state metric values does not change the LLR output value; it is the difference between the state metrics, not their absolute values, that matters. Rescaling of the state metrics can therefore be readily performed to avoid overflow. The rescaling technique is the same as that used in other SISO log-MAP implementations [11,18,23].
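A minimal sketch of the rescaling, assuming the common "subtract the maximum" variant: a global shift leaves every state metric difference, and hence the LLR, unchanged. The metric values are illustrative.

```python
def rescale(metrics):
    """Shift all state metrics so the largest becomes 0, keeping them
    inside the finite word length w_L; pairwise differences are preserved."""
    m = max(metrics)
    return [x - m for x in metrics]

# metrics drifting upward after many recursion stages (illustrative values)
A = [1050.3, 1047.1, 1049.8, 1046.0]
A_scaled = rescale(A)
```

In hardware the shift is typically applied every stage (or every few stages), so the subtracted value never needs to leave the state metric calculation module.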
The required precision of the state metrics, which depends on their dynamic range, determines the word length w_L.
Thus, 2^G recACSO units can be employed to calculate all the states of a trellis stage in one step, as depicted in Figure 5 for a 2^G-state trellis. This module receives the 2^G possible branch metrics and the previous state metrics and computes all the state metrics of the current stage. Configuring the module this way speeds up computation, since it avoids reading the previous-stage state metrics from the RAM. At the input of the module, two selectors (a branch metric selector and a state metric selector) are used to drive the recACSO units with the appropriate inputs.

LLR Output Calculation Module.
The LLR-CM depicted in Figure 6 is responsible for computing the output reliability estimates λ_f(u_t | R), given the state and branch metric calculations, according to

λ_f(u_t | R) = max*_{(s',s): u_t = f} [ A_{t−1}(s') + Γ_t^f(s', s) + B_t(s) ] − max*_{(s',s): u_t = 0} [ A_{t−1}(s') + Γ_t^0(s', s) + B_t(s) ],  (9)

where f = 0, 1, ..., 2^b − 1. A careful observation of (9) reveals a close relation to the recACSO unit calculation. The LLR computation has two terms that can be computed separately and then subtracted. If A_{t−1}(s') and B_t(s) are added in advance, the recACSO unit can be used to calculate each max*() term. Because the max*() term for f = 0 has to be subtracted from all the other LLRs, an inverter is included to negate it.
As already mentioned, λ_0(u_t | R) need not be calculated, since zero is the reference symbol and its LLR is therefore zero. The remaining 2^b − 1 log-likelihood ratios are computed by the proposed architecture and output by the symbol-based log-MAP component decoder.
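A software sketch of the LLR-CM data path: pre-add A + Γ + B per transition, fold each symbol's terms with a recACSO-style max*() reduction, and subtract the negated f = 0 term. The per-transition sums below are illustrative toy values.

```python
import math

def max_star(a, b):
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def llr_outputs(terms):
    """terms[f] -- pre-added A_{t-1}(s') + Gamma_t^f(s', s) + B_t(s) sums over
    all transitions labelled with symbol f.  Returns the 2^b - 1 LLRs for
    f = 1..M-1, each relative to the reference symbol f = 0 (whose LLR is 0)."""
    def fold(vals):                      # recACSO-style max*() reduction
        acc = vals[0]
        for v in vals[1:]:
            acc = max_star(acc, v)
        return acc
    ref = fold(terms[0])                 # negated by the inverter in Figure 6
    return [fold(terms[f]) - ref for f in range(1, len(terms))]

# M = 4 symbols, two transitions per symbol in this toy nonbinary trellis
terms = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.0], [0.1, 0.1]]
llrs = llr_outputs(terms)
```

The fold over each symbol's transitions is exactly the recursion the recACSO unit performs, which is why the paper reuses that unit inside the LLR-CM.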

SWT-Based Architectures for the SISO-MAP Decoder
4.1. The Concept. SISO-MAP decoders suffer from very long latency and high memory requirements. A different organisation (scheduling) of the computation, together with exploitation of the convergence property of the decoder [24-26], leads to reductions in the latency and in the number of state metrics that need to be stored. For this reason, the sliding window technique (SWT) has been proposed in the literature [18,19]. SWTs are divided into single-flow structure (SFS), double-flow structure (DFS), and pointer-based (PNT) techniques. In these techniques, the trellis (or transmitted frame) is divided into windows of size L, where L is the convergence length [25]. Care must be taken to ensure that the whole frame can be divided exactly into an integer number of windows. The MAP decoder convergence property allows the recursive calculation of a state metric to yield a good approximation of the actual metric value after L recursion steps, even if the initial data is a dummy value [8,9].
Three important metrics can be considered for the comparison of MAP decoders: (i) decoding delay (latency), (ii) memory requirements (or storage lifetime), and (iii) computational complexity (number of recursion modules). Several different schedulings of the MAP algorithm computation reveal trade-offs between these quantities. Since the window size is L recursion steps, a DFU as proposed in Section 3 can be employed to calculate all the state metric vectors in a window in L steps. A DFU can perform forward, backward, or dummy metric calculations. Defining the parameter set (technique, Π, π, shift, ε) assists in the analysis of SWT-based MAP decoders. The "technique" element is the type of computation organisation, with possible values SFS, DFS, SFS/PNT, and DFS/PNT. "Π" is the relative position between the coupled forward (A-) and backward (B-) metric recursive calculations and therefore determines when the B-metric calculation begins relative to the A-metric calculation. It is a continuous-valued variable, but only three values are meaningful choices: Π = 0 means that the backward calculation starts after the forward one ends, Π = 2L means that the forward calculation starts after the backward one ends, and Π = L means that the two calculations take place simultaneously and switch roles in the middle of the convergence block calculation.
"π" is the ratio of invalid (dummy) metric recursive calculations over valid A- and B-metric calculations. To compute the backward metrics before the end of the whole frame, a dummy metric calculation is required to estimate an intermediate value using the convergence property. Typically, one dummy module is required for every A- and B-metric pair, but this can vary depending on the technique.
The computation can take place in two directions (two flows) simultaneously, each flow calculating half of the frame. The "shift" is the time shift, in decoding steps, between the two flows. In the pointer technique, pointers are used to reduce the memory requirements; "ε" is the number of pointers, and its presence indicates a pointer-based (PNT) technique.
This set of SWT-based architectures has been fully investigated, and some results are given in Table 2, which indicates the hardware requirements and decoding delay of some common implementation structures. In the data flow graph (DFG) of Figure 7, the horizontal axis represents time, quantised in symbol periods, whereas the vertical axis represents the symbol number within the frame. In this specific diagram, a frame of P = 4L symbols decoded on a trellis of P + 1 = 4L + 1 stages is presented, where L is the convergence length of the decoder.
Three data flow units can be seen in the diagram. The dashed arrow represents the L recursions of a forward DFU calculating A-metrics, which are stored as they are calculated. The continuous arrow represents the L recursions of a backward DFU calculating B-metrics; this calculation is done simultaneously with the LLR soft output calculation. The dotted arrow represents the L recursions of a dummy metric DFU.
In the diagrams that follow, the dotted arrow always represents dummy metric recursive use of DFUs, whereas for forward and backward metric DFUs the arrow is continuous or dashed when the metrics are not stored or are stored in memory, respectively. The dummy metric DFU yields a valid metric after its L backward recursions.
The fact that the dummy metric DFU works backwards can be seen from its projection on the vertical axis, which points downwards; DFUs whose projection on the vertical axis points upwards operate on the data in a forward manner. Also note that the maximum number of arrows a vertical line can cross in the diagram gives the number of DFUs employed in the structure [21].
Storage is also represented, by the shaded rectangular areas. Considering one of these areas (they are all the same), its projection on the vertical axis gives the number of state metric vectors to be stored [18,21], whereas its projection on the horizontal axis gives the time for which the vectors must be stored.
Decoding delay (or latency) is the horizontal distance between the acquisition curve (always y_1 = t) and the decoding curve in the DFG. Since the decoding curve in this structure is at y_2 = t − 4L, the decoding delay is y_1 − y_2 = t − (t − 4L) = 4L symbol periods. The symbol period is denoted by t_S.
DFGs and tile graphs model the same thing: the resource-time scheduling of the recursions of the algorithm. A diagram is a DFG when viewed as a concrete graph, or a tile graph when viewed as a tile repetition [17]. On the right of Figure 7, a tile consisting of 3 DFUs is illustrated. This tile is repeated as many times as required to form the complete DFG.
Let us concentrate on the scheduling of the operations described by the DFG in Figure 7. During symbol periods 2L to 3L − 1, the dummy metric DFU (dotted arrow 1), starting from a zero vector assigned to state metric vector B_{2L}, computes invalid metrics from B_{2L−1} down to B_L. This calculation results in a valid state metric vector B_L, because convergence has been reached; all other metrics calculated during these L recursions are invalid. At the same time, the forward DFU (dashed arrow 2) calculates and stores a sequence of L A-metrics, from A_0 to A_{L−1}.
In the next L symbol periods, from 3L to 4L − 1, the backward DFU (continuous arrow 3) comes into play to calculate B_{L−1} to B_0. During the backward calculation, the A-state metrics from A_{L−1} to A_0 are extracted from the memory and, together with the B-state metrics, are used to calculate the first L soft outputs.
Thus, a set of 3 recursion units (called a "tile" in a tile diagram, as indicated in the DFG) results in the calculation of the L A- and B-metrics required to produce L soft output LLR values. Note that the LLR values are calculated in reverse order. At symbol periods 4L to 5L − 1, the order of the soft output values is restored [18]. In a turbo coding scheme, the interleaver reordering can be exploited to perform this reversal. If this last LLR reversing step is done by the MAP decoder, the latency of the core is 4L symbol periods; if the reversal is done by the interleaver, the decoding delay is 3L symbol periods.
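The quantities this tile schedule yields can be written down directly. The helper below encodes them for an SFS with Π = 0 and π = 1 (three DFUs per tile), under the assumption stated in the text that the frame divides into whole windows.

```python
def sfs_parameters(P, L):
    """Resource/latency figures for a single-flow structure (SFS, Pi = 0,
    pi = 1) as read off the tile schedule: one dummy, one forward, and one
    backward DFU per tile, and one window of A-metric vectors in RAM."""
    assert P % L == 0, "frame must divide into an integer number of windows"
    dfus = 3
    stored_vectors = L                  # peak A-metric vectors held in RAM
    delay_decoder_reversal = 4 * L      # LLR order restored inside the MAP core
    delay_interleaver_reversal = 3 * L  # reversal folded into the interleaver
    tiles = P // L
    return (dfus, stored_vectors, delay_decoder_reversal,
            delay_interleaver_reversal, tiles)

params = sfs_parameters(128, 32)   # e.g. P = 4L frame with L = 32
```

For P = 4L = 128 this gives 3 DFUs, 32 stored metric vectors, and a delay of 128 or 96 symbol periods, matching the 4L/3L figures derived above.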
The above process (tile) is repeated until the end of the frame. The set of tiles in the tile graph (or DFG) represents a single flow passing through the data of a frame (or trellis); hence, the structure is designated a single-flow structure (SFS). Structures where the above process occurs in both directions are called double-flow structures (DFSs) and are discussed later in this paper.
In single-flow structures, the benefit of structures with Π = 0 compared to Π = L is the use of a single RAM memory. Figure 13 depicts a DFS with Π = L, π = 1/2, and shift = L/4. Note that for RAM sharing between the two data flows a shift of L/4 is required in this case, as opposed to the L/2 shift used in other DFSs; the appropriate shift is therefore not constant but depends on the other parameters of the structure. The diagram clearly indicates that two flows are operating, splitting the 8L frame into two 4L-symbol blocks. The first flow uses 4 DFUs: one forward, one backward, and two dummy metric DFUs. The same resources are employed by the second flow, so in total 8 DFUs are required.
Double-flow structures lead to efficiency only for Π = L, since only for this value of Π can memory sharing take place. For all other values of Π, the memory requirements and the number of DFUs double, with the added advantage of doubling the MAP decoder speed, since the frame decoding is split between the two flows. For (DFS, Π = L, π, shift) decoders, the storage requirements are given in Table 2.
Setting π above 1 results in a prohibitively large decoding delay. Even if π is only 3, the decoding delay is (4π − 3/2)L = 10.5L, or (3π − 3/2)L = 7.5L with interleaver LLR reversing, which is unacceptably large; therefore, only architectures with π ≤ 1 are of interest. In all of the above structures, all the necessary metrics are stored in the RAM memories. Another idea is to selectively store some of the state metrics and recompute all the others. This idea of selective recomputation leads to reduced storage requirements.
The pointers are nothing other than actual state metric values, which are stored in and extracted from a storage device whenever necessary. The values to be stored as pointers are depicted with small white circles in Figure 11.
In this structure, during time t = 2L·t_S to (3L − 1)·t_S, after L dummy metric recursion steps, the obtained valid B_L value is stored in a register. This value is also used to initialise a backward recursion, which operates during symbol periods 3L to 4L − 1. As it progresses, this backward B-metric DFU stores the values B_{3L/4} and B_{L/2} in registers, and the last L/4 state metric values, from B_{L/4−1} down to B_0, in the RAM. Thus, only L/4 state vectors are stored in RAM. An extra backward DFU then extracts the pointer B_{L/2} to recalculate the next L/4 B-metric values (B_{L/2−1} down to B_{L/4}) and store them in the memory. This procedure is repeated until all L B-metric vectors in the window have been calculated. As the backward metrics are calculated, a forward DFU computes the corresponding A-state metrics.
As can be observed from Figure 11, at any given time only L/4 state metric vectors, of size 2^G·w_L bits each, are stored in the RAM. This means the storage requirements of this structure are a RAM of size (L/4)·2^G·w_L bits and 3 pointer vectors of 2^G·w_L bits each.
For a pointer-based SFS with Π = 0, the storage of state metrics starts right after the dummy metric calculation has produced a valid metric, so there is no time window for pointers. Π = 2L works equally well as Π = L with regard to pointers, in the sense that the trade-offs mentioned before are not affected.
For (SFS/PNT, Π, π = 1, shift, ε) structures, the number of DFUs is 4 (12b). Increasing ε (the number of pointers) reduces the RAM memory but has no effect on the number of DFUs. The size of the required memory is (L/(ε + 1))·2^G·w_L (12a). Of course, some registers need to be allocated for the pointers, each of size 2^G·w_L. The decoding delay is affected by the number of pointers only if interleaver LLR reversing is employed; in this case the decoding delay is 4L − (1/(ε + 1))L (12c), which holds unless π > ε.
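These expressions can be collected into a small helper; the values of L, G, w_L, and ε below are illustrative, and interleaver LLR reversing with π ≤ ε is assumed for the delay term.

```python
def sfs_pnt(L, G, wL, eps):
    """Memory and delay of a pointer-based SFS with eps pointers, per the
    expressions in the text (interleaver LLR reversing assumed, pi <= eps)."""
    ram_bits = (L // (eps + 1)) * (2 ** G) * wL   # only L/(eps+1) vectors in RAM
    pointer_bits = eps * (2 ** G) * wL            # one register vector per pointer
    delay = 4 * L - L // (eps + 1)                # symbol periods
    return ram_bits, pointer_bits, delay

# eps = 3 pointers: RAM shrinks to 25% of the no-pointer SFS figure
ram, ptr, delay = sfs_pnt(64, 3, 10, 3)
no_pointer_ram = 64 * (2 ** 3) * 10
```

With ε = 3 the RAM holds L/4 vectors, i.e. a 75% reduction versus the pointer-free SFS, at the cost of 3 pointer register vectors and one extra DFU.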
If the structures (SFS/PNT, Π, π = 1, shift = −, ε = 3) with Π = 0, L, 2L are compared with the corresponding SFS structures that use no pointers, it is observed that using three pointers and one more DFU results in a 75% reduction in the required RAM memory. The decoding delay of the pointer-based structures when interleaver LLR reversing is used is increased by 0.75L for the first two cases, Π = 0 and Π = L.
For π ≠ 1, if mod(ε + 1, 1/π) ≠ 0, the pointer-based techniques (single flow and double flow) are not applicable. If mod(ε + 1, 1/π) = 0, the resultant architecture does not reduce the memory size but has higher throughput (computation speeds up π times). In this case, (12a) and (12c) hold as they are, but the number of DFUs in (12b) is given by 1/π + 3. Π = 2L is the best choice for a pointer-based SFS, whereas Π = L is a better choice for a pointer-based DFS. Again, note that increasing ε reduces the storage requirement but not the data flow unit count of the structures. Figure 13 shows a complete SISO-MAP decoder VLSI architecture using a double flow with Π = L, π = 1, 3 pointers, and shift = L/8. This architecture requires 8 DFUs (2 dummy, 2 pointer-saving, 2 forward, and 2 backward) and two RAM memories of (L/8)·2^G·w_L bits each, and it has a decoding delay of 4L. With selective recomputation, only a fraction of the required state metric vectors are stored in the RAM memories: if ε pointers are used, only L/(ε + 1) of the state metrics need to be stored in the RAM, and the remaining L − L/(ε + 1) are recalculated in blocks of L/(ε + 1).

Conclusions
The VLSI architectures of space-time turbo trellis code decoders, as well as of a set of SISO-MAP component channel decoders used in turbo coding, have been proposed and investigated. As opposed to binary turbo codes, the space-time turbo code receiver is based on nonbinary trellises, which imposes a number of differences. Apart from the fact that channel estimation and MRRC combining are needed to cope with the multiple diversity-received symbols, the frame formation block is different from that of a traditional binary turbo receiver: in the latter, systematic and parity redundant information can be separated and stored in separate memory banks, whereas in the former it cannot, so the whole symbol is stored in a memory bank. In the STTuTC receiver, the equalised symbols are demultiplexed and stored in memory banks, zeros are inserted at even or odd locations, and one memory bank is sent to the decoder at a time. This ensures that the SISO-MAP decoder accepts the information from the corresponding encoder. The symbols are demapped by a hard or soft symbol demapper.
The proposed STTuTC architectures are based on a different ACSO unit than binary turbo decoders use; this ACSO unit handles more than two state and transition metric pairs iteratively. In nonbinary trellises the state and transition metric pairs exceed two, because of the increased number of transitions between states, so the ACSO unit must either grow or work iteratively. Many ACSO units can be used to calculate all state metrics in one step, without the need to store those values in a state metric memory. Thus, the state metric calculation modules are considerably different from those of binary turbo decoders. The LLR calculation module is also different, because more than one LLR value needs to be calculated in every decoding step: "f" appropriately connected ACSO units, rather than two, are required to calculate the LLR values for all possible symbols. This also means that the number of soft LLR outputs the SISO-APP demapper delivers each time depends on the number of possible values of "f"; that is, f − 1 LLRs must be output, so the LLR memory size differs from the binary turbo decoder case.
The parameter set (technique, Π, π, shift, ε) assisted in the comparison and led to defining equations for many different cases: single-flow, double-flow, and pointer-based techniques. Table 3 gives a list of formulae that determine the memory size, the number of deployed DFUs, and the decoding delay of the most efficient techniques among those discussed in this paper.

Figure 12: State metric calculation modules for single-flow structures using the pointer technique.

Figure 13: VLSI architecture for an SWT-based MAP decoder using the double-flow structure and the pointer technique.

Table 1: Complexity and decoding delay of log-MAP decoders.

Table 3: Complexity and decoding delay of the best SW techniques.