FPGA Implementation of a Single Step MFCV Estimator Based on EMG in Diabetic Neuropathy

This paper details the design and hardware implementation of an FPGA-based real-time diagnostic system for muscle fibre conduction velocity (MFCV) estimation. The MFCV is considered a principal monitoring index for diabetic neuropathy (DPN), and it is also used in muscle fatigue assessment to evaluate the muscle fibre status. The FPGA platform evaluates the MFCV during dynamic contractions (e.g., gait) by exploiting a multichannel sensing system composed of 4 wireless surface EMG electrodes, placed in pairs on each leg. Raw data are digitized and binarized to create two bitstreams for each monitored limb. Then, the two bitstreamed EMGs extracted from the same leg are compared. The comparison, from which the MFCV is extracted, exploits a computationally light version of the cross-correlation method. The overall architecture, implemented and validated on an Altera Cyclone V FPGA, is HPS-free and exploits 22.5% of the ALMs, 10,874 ALUTs, 9.81% of the registers, 3.36% of the block memory, and <2.7% of the total wires available on the platform. The FPGA was chosen as computing system because it makes it possible to determine the resource utilization and the related timing constraints for a future real-time ASIC implementation in wearable applications. From the actual muscle contraction during gait (cyclical starting point of the computing), the system spends about 316 ms to acquire useful data and 47.5 ms (on average) to process the signal and provide the output, dynamically dissipating 28.6 mW. The accuracy of the tool has been evaluated by proving the repeatability of the measurements in in vivo tests. In this context, 1250 contractions from each subject involved in a protocolled 10-meter walk have been acquired (n = 10 subjects evaluated). On average, the same MFCV estimation has been extracted on 1184/1250 contractions (standard deviation of 11 contractions), reaching an accuracy of 94.7%. These estimations fully match the physiological value range reported in literature.


Introduction
Diabetic peripheral neuropathy (DPN) is typical of patients with type 2 diabetes [1]. Clinically, DPN in these patients starts with sole or predominantly sensory dysfunction and disturbance [1]. Several methods currently in use for DPN assessment, such as the nerve conduction velocity (NCV) analysis [2], the laser-evoked potential (LEP) [3], and the Semmes-Weinstein monofilament test (SWT) [4], require expensive and cumbersome equipment [3], limiting the subjects' comfort and freedom of movement. Moreover, major drawbacks are the invasiveness [2,4] and the need for expert support in handling the data. The NCV is an indirect measure of the motor unit potential propagation speed on nerve tissue, obtained by assessing the electromyographic (EMG) patterns [2]. The method requires needles as EMG electrodes, to be positioned inside the skin: one stimulates the muscle under test with an electrical signal (electrostimulation), and another collects the induced response at a suitable different point. The time between the stimulation and the response collection identifies the NCV. This practice excludes the possibility of self-positioning of the electrodes.
The LEP test is used in neuropathy analysis on small nerve fibres. It is based on the detection and evaluation of the evoked potential latencies and amplitudes, obtained by laser stimulations in different measurement experiments [3]. Although the test meets the noninvasiveness requirement, the need to involve specialized medical personnel and the restriction of the patient's movement during the test remain insurmountable problems. The SWT [4] consists of placing a nylon filament in contact with the foot surface. The force exerted by the filament and the information provided by the patient are converted into a measure of the sensory capacity of the subject. This test is qualitative and does not offer an objective measure.
Since its first definition in 1943 by Denslow and Hasset, the muscle fibre conduction velocity (MFCV) has gained more and more importance in the neurophysiology of healthy muscles [5]. It has taken on a relevant role in gait analysis, rehabilitation, and prosthetic scenarios [6,7], as well as in the clinical investigation of polyneuropathies, such as DPN [1]. Recently, it has been proved that MFCV can be used to detect muscle fibre denervation atrophy, as an early sign of motor axonal loss [2]. Indeed, abnormalities in MFCV are a sign of the impairment of the motor unit in early diabetic polyneuropathy, which slows the velocity in the conducting fibres. Thanks to its noninvasiveness and the possibility of realizing reliable portable solutions, the MFCV is a valid candidate for early DPN recognition and remote patient monitoring.
Indeed, the MFCV can be evaluated by using surface electrodes that acquire the EMG signals along the muscle fibre with the aim of defining their propagation velocity [5]. In this case, the MFCV is derived from the ratio between the interelectrode distance (Δd) and the propagation time θ measured between two EMG electrodes placed along the same muscle fibre [6]. Despite its clinical applicability, the literature does not propose solutions that allow a real-time assessment of the MFCV and fatigue in ordinary life movements [7][8][9][10]. In most cases, these solutions require a cumbersome static structure, such as an ergometer cycle [9,10], or force the subject to hold the contraction throughout the measurements [7][8][9][10], limiting the subject's freedom of movement.
Aiming to bridge this gap, an FPGA-based cyber-physical platform for remote monitoring of MFCV in everyday life (i.e., gait) is proposed here. Since the limbs primarily affected by DPN are the lower ones [4], the platform extracts the MFCV by analyzing data passively acquired (no electrostimulation is required) from 4 wireless surface EMG electrodes, which are positioned on the patient's legs. The muscles selected for the acquisitions are the lateral gastrocnemii, and an appropriate electrode positioning is guaranteed by an embedded positional tool. The MFCV estimation exploits a 2-electrode time-domain comparison approach. Since the velocity assessment between two points along the same fibre is a linear problem, it is solved here by a simplified digital version of the classic cross-correlation method between two time-shifted signals. The overall platform realizes a portable diagnostic tool that provides a stable estimation of the MFCV. In view of a future ASIC implementation, the architecture has been validated on an FPGA (here an Altera Cyclone V) that will be the embedded core of the final wearable device. The paper is structured as follows. Section 2 introduces the state of the art for the MFCV extraction, discusses the algorithm on which the system is based, and finally presents the FPGA implementation. Section 3 outlines the experimental results from testing and validation, with special attention to the resource utilization, timing, and measurement repeatability.

Materials and Methods
2.1. MFCV Estimation: State of the Art. In the last ten years, different methodologies for the MFCV evaluation have been proposed in the literature [6]. Typically, they differ from each other in the analysis domain: frequency or time [6].
The method proposed in this work for the MFCV evaluation operates in the time domain and can use just 2 electrodes in the minimal configuration. Considering a muscular fibre, the two electrodes are typically placed along a specific muscle path [11] with minimum interelectrode distance (Δd), which allows considering the fibre as a linear motor unit potential conductor.
Under this hypothesis, the MFCV can be computed from the classic algebraic relationship that defines the transmission speed in a straight direction:
$$v = \frac{\Delta d}{\theta} \tag{1}$$
where v is the conduction velocity (MFCV), Δd is the minimum interelectrode distance that satisfies the abovementioned hypothesis, and θ represents the delay between the signals collected at the two electrodes.
If the EMG signal moves from the position of the first electrode, d_A, to the position of the second electrode, d_B, the interdistance is given by Δd = d_B − d_A. The MFCV is indirectly measured from the delay, θ. Mathematically, once acquired in two points along the same muscle fibre, the time-discrete signals can be described as follows [6]:
$$x_A[n] = \alpha[n] + \varepsilon_A[n], \qquad x_B[n] = \alpha[n-\theta] + \varepsilon_B[n] \tag{2}$$
where x_A[n] and x_B[n] represent the n-th sample of the signal detected on the electrodes A and B, respectively. The terms α[n] and α[n − θ] define, respectively, the useful components of the signal and their θ-shifted version. The components ε_A[n] and ε_B[n] are the noise on the two channels under a linear superposition hypothesis. The noise ε is considered independent, white, and Gaussian, with the same variance on both channels. These assumptions make it possible to evaluate θ by comparing the time distance between two peaks of the EMG signal at the two different electrode positions [6]. The use of the temporal peak distance between the EMGs to evaluate θ and, thus, the MFCV is strongly dependent on the signal/time resolution and can be improved by using a cross-correlation approach between the acquisitions. The cross-correlation time lag associated with the maximum cross-correlation value identifies θ. Farina and Merletti [6] also propose the signal evaluation in the frequency domain, studying the peaks and dips in the EMG spectrum. These methods extract the phase shifts to determine the time delay, θ. These approaches are computationally heavier than the time-domain ones and are strongly affected by the noise.
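As a minimal sketch of the cross-correlation approach described above, the following Python snippet builds two synthetic channels according to the signal model (a white Gaussian useful component plus independent noise) and recovers the delay as the lag of the maximum cross-correlation. The function name `xcorr_delay`, the signal length, the noise level, and the 6-sample true delay are illustrative assumptions, not values taken from the measurements.

```python
import random


def xcorr_delay(x_a, x_b, max_lag):
    """Return the lag (in samples) at which x_b best matches a delayed x_a."""
    best_lag, best_val = 0, float("-inf")
    n = len(x_a)
    for lag in range(max_lag + 1):
        # Correlate x_a[k] with x_b[k + lag]: channel B is assumed to lag channel A.
        val = sum(x_a[k] * x_b[k + lag] for k in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


random.seed(0)
FS_HZ = 2000      # sampling frequency used by the platform
TRUE_LAG = 6      # hypothetical delay in samples (3 ms at 2 kHz)
N = 400           # hypothetical number of samples per channel

alpha = [random.gauss(0.0, 1.0) for _ in range(N + TRUE_LAG)]
# Channel A sees the useful component TRUE_LAG samples before channel B.
x_a = [alpha[n + TRUE_LAG] + random.gauss(0.0, 0.1) for n in range(N)]
x_b = [alpha[n] + random.gauss(0.0, 0.1) for n in range(N)]

estimated_lag = xcorr_delay(x_a, x_b, max_lag=20)
theta_s = estimated_lag / FS_HZ
print(estimated_lag, theta_s)
```

The search is restricted to non-negative lags because the propagation direction (A to B) is known in advance; a general estimator would scan both signs.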

The System
Overview. An overview of the implemented platform is shown in Figure 1; it comprises the setup for the data acquisition and the main blocks that constitute the architecture. The main computing blocks, which are physically implemented on the FPGA, are shown inside a grey box and consist of (i) the bitstream generator, (ii) the θ (delay) computing block, and (iii) the MFCV estimation block. An accessory block is the electrode positional scanner, which overcomes the practical problem of subjective EMG electrode positioning. The acquisition interface, in Figure 1(b), includes 4 wireless smart EMG electrodes (2 for each leg). The sensing nodes transfer the data to a gateway [12,13], which is directly connected to the FPGA evaluation board (DE1-SoC) that embeds the elaboration unit (grey box in Figure 1(b)).
At first, the acquired EMGs are processed by a computing block (bitstream generator in Figure 1) that digitizes the muscular signals [13]. This block reduces the data flow towards the subsequent blocks while preserving the information useful for the MFCV computing [14]. The derived bitstreams (one for each monitored channel) feed the θ computing unit, devoted to the conduction velocity estimation. The θ computing block compares the bitstreams provided by two EMGs along the same muscle fibre, extracting the action potential propagation time θ.

EMG Positional Scanning.
A reference study on MFCV evaluation [11] highlights that the most critical point in the MFCV estimation is the optimal positioning of the surface EMG electrodes along the same muscle fibre. Typically, this step requires a good understanding of the muscle anatomy, because the electrodes cannot be placed on tendinous zones (TZ) or innervation zones (IZ) [11]. Given these constraints, the positioning is always entrusted to specialized medical staff.
In order to realize a completely automatic tool for the electrode positioning, the influence of the muscle length and the mutual electrode locations in EMG acquisitions during dynamic tasks was evaluated. The study in [11] accurately details the behaviour of each lower limb muscle during gait, also considering skin and fat layers. It shows that the only muscle that does not present electrode shift phenomena is the gastrocnemius. This muscle also presents a stable IZ position during gait. Both gastrocnemii are therefore considered in this context for the MFCV evaluation. Figure 1(a) shows a lateral gastrocnemius, highlighting the prohibited areas (TZ and IZ in red) and, on the available zones, two virtually traced matrices. The EMG electrodes should be placed in the areas between TZ and IZ, with a specific interelement distance (Δd).
In the EMG placement context, the guidelines in [15] explain how to identify these muscle areas. The optimal interelectrode distance (and thus the lattice interelement one) can also be defined according to [15].
The action potential propagates along the muscle fibre with the same waveform but reduced amplitude [15]. Exploiting this concept, the optimal electrode placement can be realized by using a covariance-based algorithm. The covariance, unlike the cross-correlation, quantifies the degree of similarity between the EMG signals from the same muscle fibre in terms of both waveform and magnitude [10]. In the available regions, the covariance is higher than in the IZ/TZ zones. An optimal positioning returns a maximum degree of covariance.
The proposed system implements an algorithm that compares the EMG signals from two contiguous electrode sites of the same column: 1 and 2 from Figure 1(a).
The signals acquired by these sites are named EMG_1 and EMG_2. For the positioning phase, the subject under test is asked to keep a static contraction of the gastrocnemius for 2 s, while the electrodes are positioned in the proper sites with the prescribed interdistance [15], ensuring a good linearization of the muscle fibre. During this first contraction phase, the EMG samples acquired on each channel are used to define 4 threshold values (one per channel). These thresholds (Thr) correspond to 80% of the squared value of the EMG amplitude, computed over the entire duration of the contraction (i.e., 2 s).
Once the thresholds are defined, the subject is asked to carry out another contraction of the same duration. Referring to a single leg and, thus, to the pair {EMG_1, EMG_2}, due to the propagation direction (from top to bottom), only the top electrode EMG_1 is initially monitored by the system.
When an EMG_1 sample satisfies the conditions in (3), among them EMG_1 > 0, the system starts an iterative procedure to find the maximum value of the signal on EMG_1 (EMG_1MAX). Once EMG_1MAX is found, the system acquires the EMG_2 samples for a physiological time range (e.g., 4 ms [16]). Over this time span, the system extracts the EMG_2MAX value. Once both EMG_1MAX and EMG_2MAX are defined, the relative error δ between them is extracted as in (4). A low δ value corresponds to a high degree of similarity between the signals (high covariance); a position that returns δ < 20% can therefore be considered an optimal electrode position. When the optimal positioning is found, the system provides feedback to the user.
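The positioning check above can be sketched in a few lines of Python. The form of δ used here, the peak mismatch normalized by the upstream peak EMG_1MAX, is an assumption about the relative error, and the global-maximum search replaces the threshold-triggered iterative search for brevity. The function names and the sample values are hypothetical.

```python
def relative_error(emg1_max, emg2_max):
    """Assumed form of delta: peak mismatch normalized by the upstream (EMG_1) peak."""
    return abs(emg1_max - emg2_max) / abs(emg1_max)


def scan_pair(emg1, emg2, fs_hz=2000, window_s=0.004, delta_max=0.20):
    """Return (delta, is_optimal) for one pair of contiguous electrode sites."""
    # Simplification: take the global maximum of EMG_1 instead of the
    # threshold-triggered iterative search described in the text.
    i1 = max(range(len(emg1)), key=lambda k: emg1[k])
    span = int(round(window_s * fs_hz))  # 4 ms at 2 kHz -> 8 samples
    # Look for the EMG_2 peak only inside the physiological time range.
    window = emg2[i1:i1 + span + 1]
    delta = relative_error(emg1[i1], max(window))
    return delta, delta < delta_max


# Hypothetical samples: the downstream peak is delayed and slightly attenuated.
emg1 = [0, 1, 5, 10, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0]
emg2 = [0, 0, 0, 0, 1, 4, 9, 3, 1, 0, 0, 0, 0, 0]
delta, ok = scan_pair(emg1, emg2)
print(delta, ok)
```

Restricting the search for EMG_2MAX to the 4 ms window keeps unrelated activity from being mistaken for the propagated peak.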

The Bitstream Generator.
The bitstream generation block exploits the core theory of a binary signal description algorithm, treated in [17,18], in order to produce the EMG bitstreams to be compared. It converts the 16-bit EMG signal into a 1-bit equivalent one, here named bEMG (as shown in Figure 1). This "dimensionality reduction" exploits a dynamic-thresholding approach, in which the EMG signal is squared and stored in an M-sample shift register. The mean value of all the M samples is used as a dynamic threshold, while a second average is computed on the last N samples (N ≪ M), yielding what we call the "instantaneous" magnitude of the signal.
The resulting bEMG is "1" if the N-sample-based magnitude is higher than the dynamic threshold, otherwise "0." The platform operates with a sampling frequency, f_s, of 2 kHz; M = 1024 samples then corresponds to an acquisition of ~500 ms, and N = 8 samples to ~4 ms. This choice allows a fine bitstream description of the raw EMG signals. Figure 2 shows the bEMG signals associated with EMG_A and EMG_B (only the raw data from the EMG_A channel is shown in Figure 2) acquired during in vivo measurements. The bEMGs in Figure 2 show the presence of a time shift θ between the signal on electrode B and the one on electrode A.
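The dynamic-thresholding rule can be illustrated with a short Python sketch. It follows the description above (squared samples, mean over M as threshold, mean over the last N as instantaneous magnitude), but the zero-initialized windows, the function name `bitstream`, and the synthetic test signal are assumptions made for the example.

```python
from collections import deque


def bitstream(emg, m=1024, n=8):
    """Convert integer EMG samples into a 1-bit stream (bEMG) by dynamic thresholding."""
    long_win = deque([0] * m, maxlen=m)    # last M squared samples (assumed zero at start)
    short_win = deque([0] * n, maxlen=n)   # last N squared samples
    bits = []
    for sample in emg:
        power = sample * sample
        long_win.append(power)
        short_win.append(power)
        threshold = sum(long_win) / m       # dynamic threshold
        instantaneous = sum(short_win) / n  # "instantaneous" magnitude
        bits.append(1 if instantaneous > threshold else 0)
    return bits


# Hypothetical signal: low-amplitude baseline followed by a short burst.
baseline = [1 if k % 2 == 0 else -1 for k in range(1200)]
burst = [10 if k % 2 == 0 else -10 for k in range(40)]
bemg = bitstream(baseline + burst)
print(sum(bemg[1100:1200]), sum(bemg[1200:1240]))
```

With zero-initialized windows, the first M samples act as a warm-up in which the threshold is still building; the sketch recomputes both sums at every sample for clarity rather than efficiency.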

The θ Computing Block.
Once the two bEMGs (e.g., bEMG_A and bEMG_B) have been created, they undergo an iterative comparison stage, which extracts the dynamic degree of resemblance between the two binary signals [14]. An observation window of J samples is defined on both the bEMGs, starting from the first activation of bEMG_A (when the signal on A identifies a contraction). In our work, the system operates with a window of J = 602 samples [19] (i.e., 301 ms).
The two binary sequences of J samples are stored in two shift registers and compared bit by bit through an XNOR gate, defining the signal x ∈ ℝ^J. Then, the identical bit pair number (IBPN) is extracted as follows:
$$IBPN = \sum_{j=1}^{J} x_j, \qquad x_j = \overline{bEMG_{A,j} \oplus bEMG_{B,j}} \tag{5}$$
Every J comparisons (i.e., at the n·J-th comparison, with n = 1, 2, …, 602), a "0" is appended in the LSB of the shift register that contains bEMG_A (i.e., element bEMG_A[0]). In this way, a delayed version of the original signal is created (shifted by n × 0.5 ms). The θ-shifted signal (bEMG_B) present on electrode B is left in its original time position. The comparison is repeated until bEMG_A is a null vector. In this way, the IBPN becomes a vector IBPN ∈ ℝ^J. The index of the maximum value of IBPN, which represents the highest degree of similarity between the bEMGs, returns the estimated θ. The MFCV is then derived from this estimation according to (1).
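The iterative comparison can be sketched in Python as follows. Lists are ordered in time (index 0 is the oldest sample), shift index 0 is taken to mean "no shift", and delaying bEMG_A by one sample is modelled as prepending a "0" and dropping the newest bit; these conventions, the function names, and the 16-sample vectors are assumptions chosen for readability, not the register-level behaviour.

```python
def ibpn_vector(bemg_a, bemg_b):
    """Return the IBPN for every shift of bemg_a (0 .. J-1 samples of delay)."""
    j = len(bemg_a)
    shifted_a = list(bemg_a)
    ibpn = []
    for _ in range(j):
        # XNOR: a pair contributes 1 when the two bits are identical.
        x = [1 if a == b else 0 for a, b in zip(shifted_a, bemg_b)]
        ibpn.append(sum(x))
        # Delay A by one sample, filling with "0"; B is left untouched.
        shifted_a = [0] + shifted_a[:-1]
    return ibpn


def estimate_shift(bemg_a, bemg_b):
    """Index (in samples) of the maximum IBPN, i.e. the estimated delay."""
    ibpn = ibpn_vector(bemg_a, bemg_b)
    return max(range(len(ibpn)), key=ibpn.__getitem__)


# Hypothetical bitstreams: B is A delayed by 3 samples.
bemg_a = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
bemg_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
shift = estimate_shift(bemg_a, bemg_b)
print(shift)
```

Because the XNOR also counts matching zeros, a sparse bEMG_B yields a high baseline IBPN even when bEMG_A has been shifted out entirely; the estimate relies on the aligned position still standing above that baseline.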

2.3. The FPGA Implementation. The presented algorithm has been implemented on an Altera Cyclone V SE 5CSEMA5F31C6N, by using the Altera Quartus Prime Lite Edition 17.0 development environment. The processor system-level design features of the implemented system are summarized in Table 1, which identifies the inputs, the outputs, and the global signals (SYS) that ensure the system operation. General enable and reset signals were excluded from the table. Figure 3 shows a simplified block schematic of the whole implemented system. Each functional subsystem is identified by a dotted grey line. It is important to note that only one bitstream generator block is reported on the schematic (EMG_A to bEMG_A), but in our implementation, 4 identical branches operate in parallel (two for each leg).
The bitstream generator is built around two finite-state machines (FSM): the dynamic threshold FSM and the instantaneous power FSM (Figure 3). The operation principle of the FSMs, presented in Section 2.2.2, is the same for both, but they exploit two block RAMs of distinct size (RAM-M and RAM-N, of M = 1024 samples and N = 8 samples, with 32-bit words) in which the new squared EMG samples are written in FIFO mode [20,21]. When a new EMG sample enters (on the rising edge of Clk_2kHz), it is put at address 0, and the last sample of the RAM (the 1023rd for RAM-M and the 7th for RAM-N) is sent out to the FSM. This latter sample is subtracted, and the new sample is added to refresh the overall power, overwriting the RAM word with the new data. To cyclically refresh these two magnitudes (i.e., threshold and instantaneous value), the two FSMs divide the sum of the RAM-N elements by 8 [22] for the instantaneous power and the sum of the RAM-M ones by 1024 for the dynamic threshold [23].
The divisions are realized with a 3-bit right shift in the former case and a 10-bit right shift in the latter. A 64-bit discriminator (represented as ">" in Figure 3) compares the powers, generating the 1-bit EMG (bEMG_A in Figure 3).
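A software analogue of the running-sum update (subtract the outgoing sample, add the incoming one, divide by a power of two with a right shift) is sketched below. It uses a circular buffer rather than physically moving the RAM contents, which is an implementation choice of this example; the class name `MovingPower` and the test values are hypothetical.

```python
class MovingPower:
    """Running mean of squared samples over a power-of-two window, integer only."""

    def __init__(self, depth_log2):
        self.shift = depth_log2          # 3 for N = 8, 10 for M = 1024
        self.depth = 1 << depth_log2
        self.ram = [0] * self.depth      # stands in for RAM-N / RAM-M
        self.head = 0                    # position of the oldest sample
        self.total = 0                   # running sum of the stored squares

    def push(self, sample):
        squared = sample * sample
        oldest = self.ram[self.head]
        # Refresh the sum: drop the outgoing sample, add the incoming one.
        self.total += squared - oldest
        self.ram[self.head] = squared
        self.head = (self.head + 1) % self.depth
        # Division by the window length as a right shift (truncating).
        return self.total >> self.shift


threshold_fsm = MovingPower(10)   # dynamic threshold over M = 1024
instant_fsm = MovingPower(3)      # instantaneous power over N = 8


def bemg_bit(sample):
    """One bEMG bit per incoming sample, mirroring the '>' discriminator."""
    threshold = threshold_fsm.push(sample)
    instantaneous = instant_fsm.push(sample)
    return 1 if instantaneous > threshold else 0


demo = MovingPower(3)
rising = [demo.push(4) for _ in range(8)]    # window fills with 4^2 = 16
falling = [demo.push(0) for _ in range(8)]   # window empties again
print(rising, falling)
```

The update costs one subtraction and one addition per sample regardless of the window length, which is why the same structure serves both the 8-sample and the 1024-sample averages.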

The IBPN Evaluation Block.
Downstream of the bEMG generation, the architecture implements a subsystem that evaluates the degree of resemblance between bEMG_A and bEMG_B. When a contraction occurs, bEMG_A rises to "1," setting to "1" the enable signal upstream of the switch control FSM. The system is thus enabled to acquire and store data in the shift registers electr. A and B (Figure 3) for J = 602 sClk falling edges (301 ms of acquisition). When the shift registers are full, the switch control FSM sets its output (ENA_Switch) to "1," enabling a pure feedback on shift register B and, through a DFF, a delayed version of the stored vector on shift register A. This keeps the θ-shifted signal on B unaltered and shifts the reference signal on A to the right in time. To realize this shift, the DFF is reset at the 601st sClk falling edge, so that a "0" is appended. Each "0" represents a temporal shift of 0.5 ms. As shown in Figure 3, the implemented algorithm analyzes the last bits of the shift registers, comparing them through an XNOR gate. A dedicated clock (sClk) drives the shift of the vectors for the comparison. The binary waveform x, defined according to (5), enables the downstream counter (ENA in Figure 3), which counts the number of sClk falling edges that occur while ENA = "1." This value is the IBPN of the i-th temporal shift. On the (J−2)th sClk falling edge (PI clock), the i-th IBPN is fixed, and on the (J−1)th one (PO clock), it is sent to the θ estimation block. The system repeats the above-described comparison J−1 times. After all the iterations, shift register A is a null vector and shift register B holds the same content as at the first acquisition.
For the sake of clarity, Figure 4 shows a functional testbench generated considering J = 5 and making the Clk_8MHz clearly visible. bEMG_B is right-shifted by 1 bit w.r.t. bEMG_A, so the maximum IBPN is expected at this shift. At the end of the acquisition, the stored bEMG_A is [10110], while bEMG_B is [01011]. It is possible to note that bEMG_A is right-shifted by 1 bit after J comparisons (SR bEMG_A). The maximum value of x is reached after 1 shift, as expected. Here, 5 bits out of 5 are identical, and thus IBPN = 4.

The Glitch-Free Multiplexer.
As shown in Figure 4, the sClk signal on which the IBPN evaluation and the θ estimation blocks are based is obtained by combining Clk_2kHz and Clk_8MHz. In particular, sClk ≡ Clk_2kHz during the whole acquisition stage and the shift register filling, while sClk ≡ Clk_8MHz when the system starts the cyclic comparisons. The glitch-free MUX is driven by ENA_Switch, which goes to "1" when the registers are full. At that moment, the system passes from Clk_2kHz to Clk_8MHz. The MUX architecture adopted here (dotted gray line in Figure 4) eliminates the glitch phenomena during the sClk commutations [24].

2.3.4. The θ Estimation Block. The time delay (θ) estimation block has been realized by a dedicated VHDL-based FSM. This unit iteratively compares the IBPN values, updating a temporary memory with the maximum value and the associated index (number of signal right shifts). After J comparisons, the maximum is defined together with the index i_max that contains this value, so that
$$\theta = \frac{i_{max}}{f_s}$$
where f_s is the adopted sampling frequency and θ is the expected time delay. According to (1), the MFCV is derived from the ratio between the interelectrode distance Δd and θ.
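As a worked example of this last step, the Python sketch below converts a shift index into θ and then into a velocity through (1). The 2 kHz sampling frequency is the one used by the platform; the 23 mm distance is the value imposed in the positional-scanning validation, and using it here for the gait measurements is an assumption. The function name is hypothetical.

```python
def mfcv_from_shift(shift_index, fs_hz=2000.0, delta_d_m=0.023):
    """Convert the index of the maximum IBPN into a conduction velocity in m/s."""
    if shift_index <= 0:
        # A zero shift would mean an infinite velocity, which is not physiological.
        raise ValueError("shift index must be a positive number of samples")
    theta_s = shift_index / fs_hz   # each shift is one sampling period (0.5 ms)
    return delta_d_m / theta_s      # v = delta_d / theta


v6 = mfcv_from_shift(6)   # theta = 3.0 ms
v7 = mfcv_from_shift(7)   # theta = 3.5 ms
print(round(v6, 2), round(v7, 2))
```

Under these assumptions, 6 and 7 shifts give 7.67 m/s and 6.57 m/s. Since θ is quantized in 0.5 ms steps, the velocity estimate is quantized as well, and adjacent shift indexes map to clearly separated velocity values.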

Results and Discussion
This section is dedicated to the implementation and testing of the proposed FPGA-based MFCV extractor in the context of walking assessment. The system has been implemented on the Altera Cyclone V FPGA and tested on 10 subjects (age: 24 ± 4 years, height: 1.77 ± 0.12 m, weight: 82 ± 4 kg) by wirelessly capturing the data from 4 surface EMG electrodes on both gastrocnemii. Each task consists of a protocolled 10-meter walk in which the subjects were asked to cover the 10 m distance at a comfortable walking speed, for a total of 25 steps per leg (50 dynamic contractions in total). The test has been repeated 5 times each day, for 5 days, for a total of 1250 dynamic contractions for each subject. The validity of the algorithm with respect to the clinical literature has been analysed in our previous work [19]; this section therefore focuses on the performance of the FPGA-implemented algorithm, in terms of resource utilization, operation timing, power consumption, and measurement reliability. In addition, this section covers the positional scanning performance.
The overall MFCV estimation system uses 7214.5 logic elements out of 32,070 available (22.5%), 136,638/4,065,280 RAM memory elements (3.36%), and 6296/64,140 registers (9.81%). Table 2 breaks down the resource consumption by subsystem. In the table, the ADC controller, the bitstream generators, the switch control units, the IBPN evaluation blocks, the θ estimation blocks, and the 2 MFCV computing blocks are evaluated. The bitstream generator field embeds 4 identical blocks, while each of the latter 4 listed fields comprises two identical processing chains, one for each leg. The field "Other" covers the surrounding circuitry. It is important to note that the positional scanner is not enabled during the normal operation stage of the system, which considerably reduces the functional resource utilization (ALMs: −34.1%, ALUTs: −29%, registers: −11.5%, memory bits: −1.2%).
3.2. FPGA Timing Requirements. From the actual contraction to the MFCV generation, the overall processing stage takes about 363.5 ± 0.25 ms, of which 301 ms are for the useful signal acquisition and 62.5 ± 0.25 ms for the effective computing, matching the time requirements for real-time applications. In detail, the wireless recording system introduces a nonnegligible latency for data digitalization [26,27] (which is multiplexed) and transmission, amounting respectively to 1 ms and 14 ms [28]. Starting from the FPGA-embedded 50 MHz internal clock (Clk_50MHz), a 20 MHz clock (Clk_20MHz) has been derived by using a PLL to control the ADC. The implemented system requires 2500 Clk_20MHz pulses (~125 μs) to multiplex among the monitored channels (n = 4), so that every channel is sampled once every 10,000 cycles at 20 MHz (Clk_2kHz). When the new-data flag occurs, the system sends the 16-bit sample to the bitstream generator FSMs, which are driven by Clk_8MHz. This clock is also derived from a PLL, realizing Clk_8MHz = 8.19209 MHz. The block needs about 10 μs and seven Clk_8MHz pulses to generate the corresponding bit. The shift register filling operates on the Clk_2kHz falling edges, avoiding the 10 μs delay in the bitstream generation. All the filling operations last 301 ms (602 cycles at 2 kHz), while, downstream of the XNOR gate, the 602 iterative comparisons × 602 temporal shifts require 362,404 Clk_8MHz pulses in the best case, and 362,403 cycles at 8 MHz plus one Clk_2kHz falling edge in the worst one. Experimentally, the system spends 47.49 ms to complete the computing. Finally, 34 cycles at 8 MHz are required to generate the MFCV value according to (1). The FPGA implementation has been optimized in terms of the on-board position of the logic array block (LAB) combinational cells and registers, as shown in Figure 5. With the clocks adopted here (Clk_2kHz, Clk_8MHz), the design presents a worst setup slack of +54.094 ns (data arrive sufficiently early at the destination LAB) and a worst hold slack of +1.03 ns [27]. The positive hold slack ensures that the data input of any given memory element remains stable long enough after the clock edge to be reliably stored. This distribution uses <2.7% of the total available wires. The on-chip system area occupancy is shown in Figure 5(a) (Chip Planner tool), while the wire utilization is mapped in color (Figure 5(b)).
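The timing figures above can be cross-checked with simple arithmetic. The Python sketch below recomputes them from the stated clock frequencies and window length; the grouping of the 1 ms and 14 ms latencies with the measured 47.49 ms into the overall budget is inferred from the reported totals, and the variable names are hypothetical.

```python
FS_HZ = 2000              # Clk_2kHz, sampling clock
ADC_CLK_HZ = 20e6         # Clk_20MHz, ADC control clock
FAST_CLK_HZ = 8.19209e6   # Clk_8MHz as realized by the PLL
J = 602                   # observation window in samples

# One multiplexing slot and one full sampling period, in microseconds.
mux_slot_us = 2500 / ADC_CLK_HZ * 1e6          # 125 us per channel
sample_period_us = 10_000 / ADC_CLK_HZ * 1e6   # 500 us, i.e. 2 kHz

# Acquisition: J samples at 2 kHz.
acquisition_ms = J / FS_HZ * 1e3

# Comparison stage: J bit comparisons for each of the J temporal shifts.
COMPARISON_CYCLES = J * J
ideal_compare_ms = COMPARISON_CYCLES / FAST_CLK_HZ * 1e3

# Overall budget using the measured computing time (47.49 ms) and the
# digitalization (1 ms) and transmission (14 ms) latencies.
measured_compute_ms = 47.49
latency_ms = 1 + 14
total_ms = acquisition_ms + latency_ms + measured_compute_ms

print(mux_slot_us, sample_period_us, acquisition_ms)
print(COMPARISON_CYCLES, round(ideal_compare_ms, 2), round(total_ms, 2))
```

The ideal cycle count alone corresponds to roughly 44.2 ms at 8.19209 MHz, so the measured 47.49 ms leaves a few milliseconds for overhead that this simple count does not model.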

Power Consumption.
The power consumption figures reported here have been provided by the PowerPlay Power Analyzer tool of the Altera Quartus Prime Lite Edition 17.0, with a "high" power estimation confidence. The overall implementation consumes 453.24 mW without heat sink and with still air, of which 416.64 mW is the core static power dissipation (P_ST).
P_ST is the power statically dissipated on chip, which is independent of the user clocks. It includes the leakage power from all the FPGA functional blocks, except for the I/O DC bias. The I/O management statically dissipates 8 mW with V_DDIO = 2.5 V. The ADC (LTC2308) operating at 2 kHz has a consumption of about 1.25 mW. Considering 1 s of operation, the bitstream generator operates for 100% of the time, the IBPN evaluation block for 34.5% of the time, and the θ estimation block and MFCV computing together for <0.001% of the considered time. Thus, the power dynamically dissipated by signal transitions is P_DYN = 28.60 mW, considering the two adopted clocks (Clk_2kHz, Clk_8MHz), as shown in Figure 6. P_DYN can be divided into 3.60 mW for the I/O, 1.04 mW for the register cells, and 0.04 mW for the combinational ones; the 10-kbit memory blocks (M10K) dissipate 11.12 mW; and finally, the PLL unit consumption is 11.55 mW.
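The power breakdown can be checked by summing the quoted contributions, as in the Python sketch below. Note that the five on-chip dynamic terms add up to 27.35 mW, and the quoted 28.60 mW is reproduced only if the 1.25 mW of the ADC is counted as well; this grouping is an inference from the numbers, not something stated explicitly. The dictionary keys are hypothetical labels.

```python
# All values in mW, as quoted in the text.
dynamic_on_chip = {
    "io": 3.60,
    "registers": 1.04,
    "combinational": 0.04,
    "m10k_memory": 11.12,
    "pll": 11.55,
}
ADC_MW = 1.25          # LTC2308 at 2 kHz
STATIC_CORE_MW = 416.64
STATIC_IO_MW = 8.0

on_chip_dyn = round(sum(dynamic_on_chip.values()), 2)
# Assumption: the reported P_DYN also includes the ADC contribution.
p_dyn = round(on_chip_dyn + ADC_MW, 2)
total = round(STATIC_CORE_MW + STATIC_IO_MW + p_dyn, 2)

print(on_chip_dyn, p_dyn, total)
```

Under this assumption the three groups (core static, I/O static, dynamic) account for the whole 453.24 mW, with the static core alone contributing about 92% of it.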

Positional Scanning Performance.
To validate the implemented electrode positioning scanner, a comparison between the cross-covariance (xcov) of the EMG signals from contiguous sites and the δ values extracted by the FPGA has been carried out. According to (4), a specular behaviour between xcov and δ is expected. As stated in Section 2.2.1, for the positioning phase, the subject under test is asked to keep a static contraction of the gastrocnemius for 2 s. Then, the EMG signal acquired from a specific electrode (from 1 to 7 in Figure 7) is compared with the one detected on the contiguous site (from 2 to 8 in Figure 7).
For the validation, a whole column of the positioning matrix (Figure 1(a)) has been considered, as shown in the measurement setup in Figure 6. Here, the imposed Δd (interelectrode distance) is 23 mm. The blue diagram in Figure 7 shows the relative cross-covariance coefficients between the electrode pairs (e.g., 1-2 and 2-3). The red diagram shows the δ values extracted by the FPGA on the same pairs. As expected, a specular behaviour is found. The electrode combinations with the minimum value of cross-covariance (poor similarity) are linked to δ values above the threshold (δ > 0.17). The results identify 2-3 as the optimal position, as theorized in literature [15].

MFCV Tracking: In Vivo Measurements. This section is dedicated to the assessment of the measurement repeatability of the tool. To this aim, a total of 12,500 dynamic contractions (1250 for each subject) have been acquired during natural walks along two 10 m long lines. We extracted 25 blocks of 25 steps for each leg and subject (i.e., 50 steps on both legs per block). Table 3 summarizes the worst, best, and typical cases in terms of MFCV estimation occurrence during the single runs (50 steps) of the 10 subjects' dynamic contractions. The typical case shows that 7.67 m/s is returned, on average, in 94% of the cases. The value 6.57 m/s, which covers 4% of the cases, is still a physiological value as in [16] but diverges from the expected one. The best case shows 100% of the estimations within the physiological range [16]. Finally, out of 1250 dynamic contractions, 1184 ± 11 contractions return the same conduction velocity estimation (94.72%).
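The repeatability figure used here is simply the share of contractions that return the most frequent estimate. A minimal Python sketch follows; the function name is hypothetical, and the run below is a synthetic illustration of the typical case (47 estimates of 7.67 m/s and 2 of 6.57 m/s out of 50), in which the remaining value, 5.75 m/s, is a made-up placeholder.

```python
from collections import Counter


def repeatability(estimates):
    """Return the most frequent estimate and the fraction of runs that returned it."""
    value, count = Counter(estimates).most_common(1)[0]
    return value, count / len(estimates)


# Synthetic 50-step run mimicking the typical case; 5.75 is a placeholder value.
run = [7.67] * 47 + [6.57] * 2 + [5.75]
mode_value, share = repeatability(run)

# Aggregate figure reported over the whole protocol for one subject.
overall_share = 1184 / 1250

print(mode_value, share, overall_share)
```

On the aggregate numbers, 1184 matching contractions out of 1250 correspond to 94.72%, in line with the typical single-run case.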

Conclusions
In this paper, we presented the FPGA (Altera Cyclone V) implementation and validation of a real-time MFCV estimator, usable in cyclic dynamic contractions such as an ordinary gait. The paper described the implemented 2-electrode comparative MFCV measurement, which exploits the signals from 4 surface EMG electrodes positioned on the gastrocnemii of both legs. The comparison between the signals is entrusted to a custom implementation of a cross-correlation algorithm that compares two 1-bit equivalent signals on each leg. The FPGA implementation of the algorithm has been optimized: it exploits only 22.5% of the ALMs, 10,874 ALUTs, 9.81% of the registers, 3.36% of the block memory, and less than 2.7% of the total wires available. The tool is able to match the real-time requirements, with a processing latency of just 47.49 ± 0.25 ms.
In a short-time test (50-step walk), the system returns the same value in 47/50 cases, and 96% of the estimations are compatible with the medical literature [16], showing repeatability and accuracy and representing a promising m-health solution for the diagnosis and monitoring of DPN.
Finally, Table 4 compares the most important and recent MFCV extraction solutions in terms of the used computing dedicated platform, the applicability in the ordinary life, number of electrodes, easiness of installation, usage, and computing performance.

Figure 1: Overall system. (a) EMG positioning lattice with detailed EMG acquisition on 7 selected sites. (b) Block diagram of the architecture.

Figure 2: bEMG extraction from in vivo measurements from A and B electrodes of the left lateral gastrocnemius during a walking step.

Figure 3: Schematic of the architecture implemented on FPGA. The bitstream generator concerns only a single processing chain (bEMG_A).

Figure 4: ModelSim functional testbench of the proposed system, with a comparison window length of J = 5.

Figure 5: (a) On-chip area distribution in terms of LABs and registers by Chip Planner tool. (b) Total wire utilization.

Figure 6: Compendium of the system power consumption with detailed P_DYN.

Table 1: Processor-system level design features.

Table 4: State of the art solutions comparison table.