Software-Defined Radio FPGA Cores: Building towards a Domain-Specific Language

. This paper reports on the design and implementation of an open-source library of parameterizable and reusable Hardware Description Language (HDL) Intellectual Property (IP) cores designed for the development of Software-Defined Radio (SDR) applications that are deployed on FPGA-based reconfigurable computing platforms. The library comprises a set of cores that were chosen, together with their parameters and interfacing schemas, based on recommendations from industry and academic SDR experts. The operation of the SDR cores is first validated and then benchmarked against two other cores libraries of a similar type to show that our cores do not take much more logic elements than existing cores and that they support a comparable maximum clock speed. Finally, we propose our design for a Domain-Specific Language (DSL) and supporting tool-flow, which we are in the process of building using our SDR library and the Delite DSL framework. We intend to take this DSL and supporting framework further to provide a rapid prototyping system for SDR application development to programmers not experienced in HDL coding. We conclude with a summary of the main characteristics of our SDR library and reflect on how our DSL tool-flow could assist other developers working in SDR field.


Introduction
Software-Defined Radio (SDR) approaches for rapid prototyping of radio systems using reconfigurable hardware platforms offer significant advantages over traditional analog and hardware-centered methods.In particular, time and cost savings can be achieved by reusing tested design artefacts.For example, a reconfigurable computer coupled to a commercial off-the-shelf (COTS) Radio Frequency (RF) daughterboard can reduce development time and lower costs in comparison to a custom-built PCB approach.A broad variety of SDR prototyping platforms is available, such as the USRP and Microsoft Sora [1], and rapid prototyping tools such as National Instruments LabView and GnuRadio [2].The choice of SDR platform and components used to develop a complex SDR system is typically based on a variety of many interrelated selection decisions [3].Important selection decisions during development are influenced by the developers' familiarity with processor and platform architectures, coding languages, design approaches, support for legacy systems, and familiar design tools, among other factors.Development tool-chains and programming languages are highly influential in terms of developer productivity.Similarly, familiarity with the tools can impact the quality and reusability of the designs [3].In this paper we consider a standard VHSIC Hardware Description Language (VHDL) approach to SDR prototyping, for which we develop a reusable library of Intellectual Property (IP) cores for use in Software-Defined Radio (SDR) application prototyping.We test the system using a legacy FPGA-based platform and recent add-on COTS daughterboard as a case study around which we build and test this library.
The aim of our investigation is to establish an effective selection of IP cores for baseline SDR applications, which can 2 International Journal of Reconfigurable Computing improve developer productivity using cores that can be further built upon to realize more specialized application needs.The Reconfigurable Hardware Interface for computatioN and radiO (RHINO) platform [4] is used as a case study for testing the library.In our effort to improve SDR productivity and reduce design complexity, we also propose a tool-flow that uses a Domain-Specific Language (DSL) as its entrypoint to describe algorithms and automate the generation of HDL code.It exploits the parameters of SDR cores in order to integrate these existing library cores in the compilation flow hence enabling rapid prototyping of FPGA-based SDR at high-level of design abstraction.
The SDR IP core library comprises Digital Signal Processing (DSP) cores and input/output (I/O) interface cores.The library, which we are referring to as "SDR IP cores" library, is available for free under the General Public License (GPL) on https://github.com/lekhobola/Rhino-Processing-Blocks.It is designed around use by both the novice and experienced lowlevel HDL developers, providing novice users with experience of using IP cores that support open bus interfaces in order to exploit System-On-Chip (SoC) design without commercial, parameter, and bus compatibility limitations.The provided modules will be of particular benefit to the novice developers in providing ready-made examples of processing blocks, as well as parameterization settings for the interfacing cores and associated RF receiver side configuration settings.DSP cores can be used in any FPGA platforms whereas porting the I/O interface cores requires replacing Spartan-6 clock management and interconnecting libraries with new targetspecific platform libraries.
The DSP cores are realized with fundamental DSP algorithms: Finite Impulse Response (FIR), Infinite Impulse Response (IIR), Fast Fourier Transform/Inverse Fast Fourier Transform (FFT/IFFT), and Digital Down Converter (DDC) algorithms.These DSP cores are accompanied by a description of how they can be integrated into a common Open Standard Interconnection Bus, namely, Wishbone.Furthermore, the I/O interface cores realize the interface control logic for Gigabit Ethernet (Gbe) and 4DSP FMC150 Analog-to-Digital Converter/Digital-to-Analog Converter (ADC/DAC) daughter board, both being part of RHINO.The Gbe interface core uses UDP protocol to enable high-speed data transfer between RHINO and external devices while FMC150 ADC/DAC provides an air interface for RHINO at high sampling rates.A Frequency Modulation (FM) receiver is then built from the IP cores to demonstrate the importance and reusability of the library of IP cores in the real world context of SDR.
The remainder of this paper is organized as follows.In Section 2, we provide a background to the SDR, FPGAs, and reconfigurable computing (RC).Next, the existing IP libraries are reviewed in Section 3.This is followed by the methodology used in developing and evaluating the library of SDR IP cores in Section 4. We continue with the design and implementation of a library of SDR IP cores in Section 5.In Section 6 we perform validation testing for SDR IP cores and results are compared to ideal Matlab simulation results.As a case study for SDR IP cores, we continue the testing by building an FM receiver in Section 7.This is followed by benchmarking all the SDR cores including an FM receiver in Section 8. Finally, we propose a tool-flow that enables highlevel design using SDR IP cores in Section 9 while Section 10 covers conclusion.

Background
The ever increasing popularity and evolution of wireless communication technologies and standards are changing the manner in which wireless services and applications are used [6].The demand and usage of these services by users are growing rapidly and are constantly pushing designs to their limits.Wireless devices are becoming more common and users are demanding the convergence of multiple services and technologies [7] in a single device.These lead to potential challenges in areas of equipment design, wireless service provision, security, and regulation [8].

SDR Systems.
Configurable technologies are a solution to today's increasing user needs for wireless services and applications.These types of technologies are upgradable, reconfigurable, and adaptable to changes in technology standards and need [9].One such technology that offers all these features is SDR.SDR is defined as radio in which hardware components or physical layer functions of a wireless communications system are all implemented in software [10].
SDR prototyping has opened doors to many possibilities in the field of radio communications.Owing to its rapid growth in recent years, it has gained popularity and has also found wide adoption in the analysis and implementation of many wireless communications systems.Traditional systems are now replaced by SDR systems because of their high reconfigurability and increased capabilities which suit modern wireless communications technology [9,10].
SDR relies on a general purpose hardware that is easy to program and configure in software to enable a radio platform to adapt to multiple forms of operation such as multiband, multistandard, multimode, multiservice, and multicarrier [6,10].A typical SDR transceiver is depicted in Figure 1.The analog RF front-end converts RF signals to Intermediate Frequency (IF) signals in the receiver chain while the transmitter converts IF signals to RF signals.This is also where signal preconditioning and postconditioning using analog functions such as amplification and heterodyne mixing prior to ADC and after the DAC take place [6,8].The DSP performance largely depends on the digital computing hardware device used.Furthermore, improved and higher sampling ADCs and DACs are pushing the tasks traditionally performed in analog closer towards the antenna, hence allowing them to be processed digitally using processors or reconfigurable devices [11].However, a drawback is that the ADCs and DACs are usually costly, and achieving high sampling rates (over millions of samples per second) remains a limitation in SDR [6]; this is a motivating factor for reusable SDR platforms for prototyping to share the cost of the same platform across multiple projects.has revolutionized the field of SDR.FPGAs are made of highly reconfigurable and multiple logic blocks and cells together with switch matrix to route signals between them [12].Their flexibility and speed have made them popular and are preferred to lay a general purpose hardware platform for SDR.The reconfigurable and parallel characteristics of FPGAs enable computationally intensive and complex tasks to be processed in real time with better performance and flexibility.These features have seen them gaining popularity over traditional general purpose processors (GPPs) and DSP processors [10].For these reasons, they are used in RC as their structure can be reconfigured during start-up or runtime to perform advanced computations [13].

Design for Reuse.
FPGAs have led to the concept of design for reuse which is a driving factor in enhancing the productivity and improving the system-level design in SDR applications.A library of parameterizable FPGA cores makes a design for reuse effective [14].The timing, area, and power configurations are the key to SoC success as they allow mixand-match of different IP cores so that the designer can apply the trade-offs that best suit the needs of the target application [15].

Review of Existing IP Libraries
The continuous design and implementation of a library of HDL cores, called IP cores in this paper, is increasingly driven by the desire to meet shortest possible time-to-market.This has led to greater demands of minimal development and debugging time [14,16].Many of the IP libraries have one or more of the characteristics listed below [14,[16][17][18][19]: Hardware designers are relying on predesigned IP cores from the IP libraries to increase productivity and reduce design time.However, many of the FPGA vendors and third-party IP libraries are static [18].A static IP does not allow high performance to be achieved even when hardware resources or power budget is available nor does it achieve better performance to save both size and power consumption [18].Integrating the third-party IPs can also be a challenge.It is often time-consuming and error-prone [19].The IP libraries developed by private vendors are expensive and prohibitive to low-cost prototyping [20].
All the above shortcomings of private vendor IP libraries have led to new open-source hardware development models where reusable IPs are developed and made freely available to the public.Two examples of communities supporting open IP cores are OpenCores and GRLIB.OpenCores has the considerable number of IPs as well as Wishbone bus and its cores are accessible for free; however, OpenCores IPs are not parameterizable [20].Likewise, GRLIB has many IP cores, interconnected by AMBA-2.0AHB/APB bus on a SoC design.But a drawback of using GRLIB is that not all the IP cores are free [19].The first step involved consultation with SDR experts.We interviewed and corresponded with these experts involved in FPGA-based design and implementation, both from industry and academia.Members of the engineering team at the Square Kilometre Array (SKA) were corresponded with in order to gain suggestions and feedback related to designs, processing requirements, and interfacing techniques; a total of three staff members contributed insights.We also met and corresponded with researchers involved with FPGA work at the University of Cape Town, including research scientists, postgraduate researchers, research officers, and academic staff; insights from four senior researcher staff and three postgraduate researchers were obtained at the university.The insights gained from this process were then used to prepare the design for the SDR core library and to decide the parameters that the cores should provide.The SDR cores were then coded using VHDL.Section 5 presents the subsequent design of the SDR IP cores library, starting by explaining the generic architecture that the cores fit into, then discussing the choice of SDR cores and how these were divided into DSP cores and I/O interface cores, and then explaining structure, parameters, and interface for each of the cores provided.

Methodology
During the second step, testing of the individual cores was done to validate their operation; this was done using simulation and test vectors and by running the cores on a hardware platform.As part of this step, a reconfigurable hardware platform was chosen on which to test the cores.The chosen platform contained an FPGA that connected directly to a high-speed sampling card and to an Ethernet port for sending data.The results of this testing are shown in Section 6.
In the third step, we developed a representative SDR application that used the cores to confirm their operation and adequate performance when integrated as part of a complete SDR system.This application involved the development of an FM receiver for which digitally downconverted data was transferred over Ethernet to a host computer for demodulation and playback.
In the fourth step, our SDR cores were benchmarked against alternate cores available from other libraries.This benchmarking was done to confirm that the cores did not utilize an excessive number of logic elements compared to alternate solutions and that the operational clock rates of our cores were at adequate speed.Section 8 reports the benchmarking results.
In the final step, we propose a DSL to support rapid integration of the SDR cores for prototyping FPGA-based SDR applications without the programmer needing to have experience with the use of HDL coding.The proposed DSL and supporting tool-chain is presented in Section 9.

Design of SDR IP Cores
This section discusses the design of a library of SDR IP cores which is divided into DSP cores and I/O interface cores.For the design of DSP cores, we follow a modular coding approach using technology-independent logic elements which result in simple and reusable functional blocks.Likewise, for the design of I/O interface cores, the coding style is still modular but it comprises both technologyindependent and technology-dependent functional blocks.Many commercial cores are closed source, licensed modules provided as monolithic hardware routing implementations optimized for and compatible with specific FPGA chips.A benefit of our cores is the availability of the underlying source code and therefore can be customized and be optimized further to meet the design requirements.
Although the SDR cores were tested on RHINO platform, the generic design of DSP cores using modular FPGA elements makes their portability possible without any changes or optimizations in the design, whereas porting of I/O interface cores to a wider range of platforms would still need an additional logic description or replacement of platformspecific elements in modules composed of such elements.Novice developers wanting to reuse these processing cores would consequently be advised to review the theoretical operation of the cores, possibly trying them in Octave or Matlab to gain a practical understanding of their behavior or limits, whereafter they would be familiar with the parameters concerned and be well prepared for moving to the FPGA, RHINO-based context of application of making these processing operations work in real time.these DSP algorithms have been used by both commercial and open-source IP designers to implement their IP cores; however obtaining optimal results depends on the RTL coding style at a low level of design abstraction; thus the commercial solutions, in particular, are likely to have significant optimization performed for their proprietary implementations.Commercial IP cores are typically optimized for deployment on specific platforms whereas the opensource community hardly follows similar levels of consistency and standardization needed to implement such high-quality designs.In this work, we pay much attention to good RTL design conventions and practices typically obtained from consultation with experts and our own experience in FPGA design.The examples of technical recommendations that help in optimization of the cores during synthesis include (1) using modular coding style, (2) using a synchronous design approach, (3) avoiding latches, long combinatorial loops, and paths, (4) synchronizing resets at each clock domain, (5) using compatible platform-specific hardware resources such as memory, clock, and I/O management IP libraries, (6) isolating clock gating and switching, and (7) using clock PLLs and management blocks.All these design practices and many others not mentioned above make the optimization by lowlevel synthesis tools easier and effective.We also parameterize the cores to make it possible for future integration in the highlevel synthesis tools.
The general structure of the design of the DSP cores is shown in Figure 2.These cores are implemented using fixed-point arithmetic [21] and are designed to operate with up to 130 MHz clock frequency.The configuration of core parameters such as data width, a number of filter coefficients, and FFT/IFFT length is performed through VHDL generics, hence providing a user with a wide range of options during the design process.

Selection of the DSP Cores.
In this project, only restricted selection of IP cores was developed due to time constraint.These were chosen specially in consideration of common SDR processing needs and was also based on a thorough literature review as well as establishing priorities on what was needed for RHINO platform.The following processing DSP IP cores were chosen for inclusion into the library: FIR filter, IIR filter, FFT/IFFT modules, and a parameterized and highly scalable DDC core.Furthermore, I/O interface cores, namely, FMC150 core and UDP/IP core, were developed for their importance in providing high-speed data communication with external devices.

Connecting the DSP Cores.
As shown in Figure 2, the DSP cores can be interconnected with other cores using high-speed parallel interface at operating frequency of up to 130 MHz.The cores are also designed around a common SoC interface, namely, Wishbone, whose purpose is to further improve the reusability of these cores on a SoC design.The Wishbone slave control logic manages read-write operations of the slave registers while the first in first out (FIFO) memory stores incoming input data and outgoing processed data.

FIR IP Core.
The FIR IP core is designed to enable modularity and scalability of SDR applications with the assurance of maximum attainable clock speed.With the support of five different FIR structures, the user has a wide range of choices to synthesize efficient FIR filter that meets the design needs under consideration.The top-level block diagram of the FIR IP core is depicted in Figure 3.
The proposed FIR core realizes a number of structures which include transposed parallel FIR structure, averaging FIR filter, and two optimized realizations, namely, even and odd symmetric parallel FIR filters [22,23].Parallel FIR architectures are designed around low-order, high-performance applications while optimized architectures are to be used in high-order, applications and where resources are limited.
The FIR core operation depends mainly on the structure chosen by the designer and the diagram that summarizes the operational flow based on the selected FIR structure is shown in Figure 4 (odd symmetric lter) M = ＝？ＣＦ(ＨＯＧ＜？Ｌ_Ｉ＠_＝Ｉ？ffＭ/2) (even symmetric lter) other filter structures use coefficients stored in the distributed RAM or rather load coefficients from an external source.The user decides whether to use distributed RAM coefficients or to load them from an external memory.The FIR core does not begin filtering process until the coefficients loading is finished.If internal coefficients are used, filtering occurs immediately without waiting for loading to happen.

IIR IP Core.
The core is built from a basic structure of a second-order IIR filter also known as biquad of Direct Form I [22].IIR core allows cascading of the biquads to build higher order IIR filters without experiencing coefficient-sensitivity problems.This IIR structure with a cascade of biquads is called Second-Order Sections (SOS).The block diagram of IIR core designed is shown in Figure 5.The IIR core recursive and nonrecursive coefficients for each biquad are configured by the user.[24] is exploited to implement a complex pipelined R-2 2 SDF architecture of the FFT on an FPGA.The high-level block diagram of the designed FFT IP core is shown in Figure 6.The implemented FFT core is further used to implement an IFFT core.The procedure is straightforward as the IFFT is computed by conjugating the twiddle factors of the corresponding forward FFT output [25].Some benefits of using R-2 2 to design the FFT core are that its FFT architecture has simple pipeline control and reduced multipliers by a factor of ( − 1)/2 compared to Radix-2 and Radix-4 which are used to design an FFT for Xilinx IP Cores Library [26].The designed FFT/IFFT core length can be configured to 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096 by the user.However, larger size FFTs can be implemented using a Matlab script accompanying a core.

FFT/IFFT IP
An example of 32-point R-2 2 SDF FFT is illustrated in Figure 7. Data arrives at the input in a sequential order and output data leaves the core in a bit-reversed order.For an FFT with  points, a complete stage consists of two butterflies, namely, BFI and BFII, delay feedback shift register, and a twiddle factor complex multiplier.On the other hand, half a stage only has a single butterfly which is BFI.

DDC IP Core.
The developed DDC core is highly configurable and can be tailored easily to meet many SDR multirate applications needs.Figure 8 illustrates the top-level block diagram of the DDC IP core.This can be used in SDR applications to perform the first processing after ADC.The DDC performs the tasks such as frequency downconversion, sample rate reduction, and high-speed filtering [27].
The DDC structure is realized as shown in Figure 9.The structure is composed of Numerically Controlled Oscillator (NCO), digital mixer; Cascaded Integrator Comb (CIC) and FIR filter were all designed to complete the structure of the DDC.

FMC150 Interface
Core.The FMC150 is designed with TI's ADS62P49/ADS4249 dual-channel 14-bit 250 MSPS ADC and TI's DAC3283 dual-channel 16-bit 800 MSPS DAC.The TI's CDCE72010 PLL is the clock distribution device that provides a clock to drive the DAC and ADC.The internal clock source can optionally be locked to onboard 100 MHz or external reference clock [28].
The FMC150 core presented in this section provides Low-Voltage Differential Signaling (LVDS) interface to a 4DSP FMC150 daughter card as depicted in Figure 10.A design example in Figure 10 is configured on ADC sampling rate of 61.44 MSPS and sends digital samples to DAC at 61.44 MSPS rate.The maximum sampling rates that FMC150 core were tested on using RHINO are 163.84MSPS and 245.76 MSPS for ADC and DAC, respectively.monitor, and control the Ethernet interface.This section presents a design of a UDP/IP core based on the combination of Internet Protocol version-4 (IPv4) and User Datagram Protocol (UDP) in order to provide a high-speed and efficient solution for communication over a Gbe.

UDP/IP
FPGA devices require Ethernet Media Access Controller (EMAC) to interface with the physical layer (PHY) chip on the board [29].RHINO uses an integrated Marvell 88E111 PHY chip.The PHY is needed for the FPGA to connect with external devices.The user logic can be deployed to configure the EMAC physical interface [29] in a form of wrapper files.In our case, the wrapper files configure the OpenCores Trimode MAC [5] which is published under the GNU Lesser General Public License (LGPL).This is a very cost-effective and nonrestrictive solution in comparison with proprietary Media Access Controllers (MACs) such as Xilinx's Trimode Ethernet Media Access Controller (TEMAC) [30] which is costly.Furthermore, the OpenCores Trimode MAC IP core supports data rates of 10, 100, and 1000 Mbps and is compliant with IEEE 802.3 specification [5].Our UDP/IP core is only configured on 1000 Mbps speed.
The architecture of the UDP/IP core is illustrated in Figure 11.Address Resolution Protocol (ARP) is used to resolve the sender and receiver MAC addresses before packet data communication.An OpenCores Trimode MAC [5] is responsible for delivering data over a shared physical channel.The MAC consists of two user interfaces that simplify the connection to a PHY.It encodes/decodes packet data  Figure 11: Structure of UDP/IP core for interfacing and control of a Gbe (uses OpenCores Trimode Ethernet MAC [5] to interface with a PHY).
to/from PHY during transmission and reception of data using Gigabit Media-Independent Interface (GMII) between a MAC and a PHY.The second interface is serial and it is called Management Data Input/Output (MDIO) bus.It transfers configuration data to a PHY and it is also used to read PHY status registers.The IPv4 is used by the designed UDP/IP core to deliver messages between the RHINO and a destination device.The IP addresses are configured statically and they must be in the same subnetwork for successful communication to happen.UDP is chosen as a transport layer protocol.It is used in this design for its simplicity and the fact that it supports high-speed and real-time data transfers [31,32].

Testing and Results
In order to verify the functionality and correctness of these cores, testing which involved behavioral and functional simulation was performed.Each DSP core was successfully synthesized on Spartan-6 of RHINO and tested from input data generated using Matlab.After the core had processed the data, the results were stored in an output file as a vector of samples.Matlab scripts were used to plot graphs and perform further signal processing of the results for analysis.The general experimental setup for DSP cores is shown in Figure 13.The operating frequencies of 100 MHz were used when testing the FIR, IIR, and FFT cores.For a DDC core, 122.88 MHz of clock frequency was used.The RHINO platform was designed to be a combination of an education and training platform for learning about reconfigurable computing and as a research and prototyping platform for studies related to SDR [4,33].The two main processing elements of RHINO include ARM processor and Spartan-6 FPGA as shown in Figure 12.The computationally intensive functions are processed by the FPGA while the ARM processor provides configuration, control, and interface function with FPGA through Berkeley Operating System for Reprogrammable Hardware (BORPH) [4,34].BORPH is an extended Linux kernel that allows control of FPGA resources as if they were native computational resource [34].This, as a result, allows users to program the FPGA with a given design or configuration and run it as software process within Linux.
Other building blocks of RHINO include FMC connectors which enable interface with ADC, DAC, and mixed  signal daughter cards, supporting sample rates over 1GS/s [35].The 1/10 Gbe connectors provide a high-speed network connection between the FPGA and remote devices using standard TCP or UDP transport layer protocols to convey packets of data.In comparison to the FIR core results, the IIR core is highly selective and uses fewer coefficients leading to better results.

6.4.
Testing FFT/IFFT Core.This testing involved generating an input vector (length = 1024) of a rectangular pulse as shown in Figure 16(a) and processing it with a 1024-point FFT/IFFT core.This was used at the input of the core operating in FFT mode.The output of the FFT core as shown in Figure 16(c) was later used as an input data to the IFFT core.The FFT core yielded the sinc waveform in Figure 16(c) which was the expected Fourier transform of a pulse waveform.This also matched with the Matlab generated FFT of the pulse wave shown in Figure 16(b).As expected, the IFFT core produced the original rectangular pulse waveform which is illustrated in Figure 16(d

Validation: Development of FM Receiver
This section reports on the development of a wideband digital FM receiver.This is used for validation of the proposed SDR IP cores library.Using the SDR IP cores which incorporate DSP cores and I/O interface cores, this prototype serves as a proof of concept that the cores can be used not only in this FM receiver design but also to prototype other real-time SDR applications.The complete design of FM receiver comprises an analog RF front-end circuitry and digital receiver which forms the largest part of the FM receiver processing.
where  fm-signal is FM signal power in dBm and  fmc150-ADC is ADC input signal power in dBm.precision ADC.The ADC sampling rate   is determined by using a bandpass sampling criterion [37] in (2) where  falls in a range 1 to 5. We choose  = 2 and this results in a wide frequency range of 108 MHz-176 MHz valid for ADC bandpass sampling.We choose 122.88 MSPS as the ADC bandpass sampling speed.The chosen ADC sample rate results in the digitized FM signal downshifted from 88-108 MHz band to 14.88-34.88MHz band.
where  is given by 1 ≤  ≤ (  /(  −   )),   is high frequency, and   is low frequency.The 14-bit samples received from the ADC are extended to 16-bit signed words which are directed into the DDC core input.To generate a complex baseband I/Q signal, the sine and cosine waveforms are generated using the NCO core and then multiplied with the FM signal using digital quadrature mixer.The frequency of the NCO output is equal to the frequency of a sampled FM signal and it ranges between 14.88 MHz and 34.88 MHz.
After mixing down the FM signal using the quadrature mixer, the image and mixer products are eliminated by the CIC filter which uses zero multipliers in its implementation.This CIC filter also decimates the 122.88 MSPS ADC rate to 960 kSPS by decimation ratio of 1 : 128.Despite its low cost and efficient and simple implementation, the CIC filter introduces undesirable droop in its filter response passband [38].To correct this nonflat response in the passband of the CIC filter, the compensation FIR (C-FIR) filter is used.
The CIC and C-FIR filter specification parameters and their respective filter responses are shown in Figure 25.In this same figure, the total filter response is shown which results after decimation and compensation.
International Journal of Reconfigurable Computing

Data Packetization.
A data packet that is composed of 32-bit samples needs to be generated before the actual construction of a UDP packet.Each 32-bit is decomposed into 16-bit I/Q data samples.Generation of data packets is facilitated by a length-N double buffer that lies between a DDC core and UDP/IP core as shown in Figure 24.In this example, we choose  = 33.As illustrated in Figure 26, a double buffer accepts samples from a DDC core and writes them to BUFFER 1 buffer.The UDP/IP core then reads samples from BUFFER 2 as a unit to generate a data packet that is padded to a data field of a UDP packet.Furthermore, double buffering also solves a producer-consumer problem during concurrent read and write process by DDC core and UDP/IP core, respectively.Each data packet generated forms part of the UDP data field where each 32-bit sample in a packet is represented by 16-bit I/Q samples as shown in Figure 27.performed by tuning to a local radio station at 94.5 MHz (K-FM).The final output of the FPGA processing was complex I/Q samples centered at DC as shown in Figure 29(a).These samples were then demodulated in Matlab using arctan/differentiation FM demodulator at the PC end.The output of the FM demodulator is a real-valued signal and is shown in Figure 29(b).The results clearly show the mono audio, pilot tone, stereo audio, and RBDS spectral components; however, stereo audio and RBDS are not distinguished due to a weak FM signal received by the ADC.The ADC tends not to be sensitive to signals with power way below 10 dBm.
Increasing the analog RF front-end gain will improve results.

Benchmark Results
We benchmark our IP cores using Xilinx ISE v14.7, targeting the Spartan-6 xc6slx150t FPGA found on RHINO platform.We do the same with cores found in Xilinx DSP core library and OpenCores where they both represent commercial and open-source cores, respectively.The benchmark results  shown in Table 1 use metrics of FPGA resource utilization and the maximum clock speed that can safely be used to execute each core.The key parameter values for each of the SDR cores are as follows: (1) The number of coefficients for the FIR core is 21 with data width set to 16 bits.
(2) The IIR core is benchmarked on 16 stages with 16-bit data.
International Journal of Reconfigurable Computing   (3) The number of FFT core points is set to 1024 with data width set to 16 bits.
(4) The DDC core is configured to operate on 122.88 MSPS of an RF signal and output 1.28 MSPS of a baseband signal.
The Xilinx library does not have an IIR core while OpenCores does not have a DDC core needed to perform benchmarking.
The Xilinx DSP cores exhibit more performance and use the fewest FPGA resources overall.The static nature of the Xilinx cores allows easy access to generic parameters through a core generator wizard which makes it very complex to even modify the generated code.Our cores have performance that is slightly less than Xilinx cores but with comparable FPGA resource utilization.They also expose easy interface for configuration of generic parameters of the cores.The OpenCores cores have the lowest performance and the largest resource occupation on the FPGA.Furthermore, they have other constraints such as limiting the number of FIR core coefficients to less than 22 and the FFT points are limited to 1024.
The results have shown that our SDR cores achieve significant performance and use less resources while exposing design parameters to the user in the VHDL code.Further benchmark tests for a Gbe core, FMC150 interface core, and the FM receiver application are performed and results shown in Table 1 show a satisfactory performance needed for SDR.

Proposed DSL and Tool-Flow for SDR
We briefly introduce a new SDR high-level synthesis, namely, SdrHls which enables the FPGA design of SDR applications using a Domain-Specific Language (DSL).Instead of translating a DSL into new FPGA functionality, SdrHls maps user design specifications in DSL onto parameterizable      SDR IP cores and stitches the cores together to yield the desired FPGA design.It achieves this by adopting a modelbased design approach employing a Synchronous Dataflow (SDF) [39] while also leveraging the high-level topological design patterns of a DSL that are based on a functional language called Scala.There are other dataflow-based HLS tools for fast prototyping of DSP applications such as Ptolemy [40], LabView [41], and Simulink [42].These tools provide intuitive methodologies to specify, simulate, and/or execute DSP applications.However, they strive to provide generic solutions in all areas of DSP field thereby resulting in compromised efficiency of produced designs.SdrHls is a domainspecific tool optimized for producing efficient FPGA-based SDR designs and uses idioms and notations familiar to domain users of SDR.An overview of our design flow for SdrHls is illustrated in Figure 30 and its description is backed by a typical SDR design example of the FM receiver discussed in Section 7.

A DSL and High-Level Compilation.
We begin with an application specification that represents an SDR algorithm to be translated into an FPGA design in VHDL.This is specified using a DSL that employs a dataflow-based approach as shown Listing 1 and the corresponding SDF diagram is illustrated in Figure 31.The expressiveness and conciseness of the rich DSL syntax enable the intuitive description of a system, therefore, raising the low-level FPGA design abstraction to a higher level of design abstraction.This makes it easy for domain users with limited or no hardware design skills to generate hardware design and for skilled users to improve productivity.Depending upon the system requirements, the user selects from the existing library of SDR cores the components needed to construct a complete system.The parameters for components can optionally be configured using a DSL and the SdrHls will assign default values for the unset parameters during high-level synthesis.Furthermore, the parameters set in a DSL are later mapped to VHDL generics in the final hardware design.Such parameters can be defined as static values in a DSL or read from data files stored in a local memory.Most importantly, a DSL allows parameter values to be dynamically generated using a DSL itself and the compiler will assign static values for the parameters.A typical example is generating the filter coefficients for a FIR core.
The first six lines in Listing 1 include the SdrHls DSL compiler library and the Delite library and define an object for running the main SdrHls main application method.The fourth line generates a list of coefficients for a compensating filter of the CIC filter in Figure 24.The compensator constructor takes in parameters set with values as follows: sample rate change = 128, a number of stages = 10, and differential delay = 1.This is followed by the configuration of the parameters which correspond to VHDL generics of the IP cores.In this example, the ADC core takes no parameters as denoted by Nil, and the DDC core parameters set include data width and filter coefficients with other parameters not shown to make the code brief.The Gbe is set with 64 bytes of transmitted payload which is calculated using (3).The Component object is used to define the IP core and it takes in the VHDL component name of the IP core together with its generic parameters.The Chain object is a topological design pattern that creates a cascade of component objects connected to each other using FIFO channels.The square brackets after each of the components define the consumption and production rates of each component or actor in an SDF dataflow.The rate of zero denotes nonexisting input or output channel while the existence of a channel is denoted by the rate greater than zero.Lastly, The DSL application (Scala code) is input into a Delite framework compiler framework [43] which runs a scalac to convert the DSL into Java bytecode.The produced Java bytecode is then executed (staged) to create Intermediate Representation (IR) and perform optimizations.The Delite IR comprises useful artefacts, namely, dependency graph and a sea of nodes representing DSL operations which are transformed into a dataflow model for the system.Delite is a highly extensible compiler framework and runtime environment developed by Stanford PPL.It provides developers with reusable components like parallel patterns, optimizations, and automatic code generators to facilitate construction of parallel embedded DSLs.The current support of resultant languages includes C++, CUDA, OpenCL, and Scala.Support for configuration of FPGAs and deployment of Delitegenerated executables onto FPGA platforms for deployment onto Altera  Hardware Description Language (DHDL) and generating hardware code for Maxeler platform in MaxJ [44,45].However, DHDL only generates MaxJ for Maxeler platforms other than VHDL or Verilog which are used universally.DHDL is also targeted for applications in domains of machine learning, image processing, financial analytics, and internet search, all of which are not naturally related to SDR applications.In this work, we intend to improve productivity and performance of reconfigurable designs for SDR while also increasing portability using platform independent synthesizable VHDL.

System
Modeling.The application model is specified using a Dataflow Interchange Format (DIF) Language [46] shown in Listing 2. A DIF specification formally captures the dataflow semantics of various dataflow models and performs analysis of topological information contained in a dataflow.We use SDF in our design for its straightforward static dataflow scheduling and analysability of throughput and buffer requirements.The DIF performs analysis, scheduling, and functional verification of the system modeled in SDF.When all these functions are complete, the application modeled in SDF is now ready for mapping onto a hardware.9.3.Code Generation.This process involves generation of the VHDL code shown in Listing 3 which represents the design described using a DSL.The SDF dataflow-based design is converted into hardware description using SDR IP cores and FIFO buffers.The SDR IP cores become SDF actors and FIFO buffers act as SDF channels.We use VHDL Manipulation and Generation Interface (vMagic) [47] to read IP core library as well as writing the VHDL code of FPGA FM receiver system.The final step is the conversion of VHDL code into a bitstream using a third-party tool called Xilinx ISE.The bitstream is then loaded on the FPGA device for execution.

Conclusion
In this paper, we presented the design of a modular, reusable, and parameterizable library of SDR HDL cores.These cores provide both wishbone-compatible interfaces and direct parallel interfaces.DSP cores for processing and I/O cores for connecting to an ADC sampling daughterboard are provided together with an Ethernet data streaming core for sending data from the FPGA to a host computer.Functional validation testing was done for each core using a reconfigurable computing platform, namely, the RHINO platform (see Section 6.1), and the cores were tested working together in a representative SDR application, namely, an FM receiver shown in Section 7.4.In order to link the SDR processing cores to an input stream of sampled data, the FMC150-ADC core would need to be customized to be compatible with the sampling hardware.The test also demonstrated how the cores and their parameterizability allowed for rapid assembling of the SDR system.The SDR cores were benchmarked in
Ability to consume fewer FPGA resources

Figure 6 :
Figure 6: An architecture of the FFT IP core.

Figure 8 :
Figure 8: An architecture of the DDC IP core.

Figure 9 :
Figure 9: A structure of the Digital Down Converter.

Figure 13 :
Figure 13: Process of configuring a DSP core.
Core.The FIR core was verified by designing length  = 95 FIR bandpass filter to specifications shown in Figure14.The resulting Parks-McClellan optimal FIR coefficients of 16-bit width are illustrated in Figure14(b) in a form of filter frequency response.The filter was tested on an input signal consisting of a sum of sinusoids at frequencies 440, 800, 2200, and 2500 Hz as shown in Figure14(a) and only a 2200 Hz was isolated by a bandpass filter.The results shown in Figure14(d) closely match with the results of the ideal filter in Matlab shown in Figure14(c); however, the Signal-to-Noise Ratio (SNR) has slightly decreased due to quantization and round-off errors.6.3.Testing IIR Core.Testing the IIR filter was similar to FIR testing in Section 6.2 except for filter response shown in Figure 15(b) which was designed with Chebyshev Type I filter to specifications shown in Figure 15.The results of the IIR core obtained are shown in Figure 15(d) which closely match the ideal Matlab results shown in Figure 15(c).

7. 1 .
Analog Front-End.The block diagram of the analog RF front-end design is shown in Figure23and is sensitive to −65 dBm FM signal.The indoor FM antenna with a variable gain of 36 dB receives the FM signal in the frequency range of 88-108 MHz.The front-end also provides a bandpass filtering of FM band and a total gain of 75 dB to make the FM signal compatible with the input swing of the ADC.The overall gain is determined as Total Front-End Gain (dB) =  fm-signal +  fmc150-ADC = 10 + 65 = 75 dB,

Figure 17 :
Figure 17: Results of DDC core and FM demodulator.(a) FM-modulated signal generated in Matlab, (b) magnitude spectrum of mixer, (c) magnitude spectrum of CIC-1, and (d) magnitude spectrum of a C-FIR filter.

7. 4 .
Final Test: FM Receiver.The block diagram showing the experiment setup is shown in Figure 28.The test was

Figure 18 :
Figure 18: FM demodulator output showing the demodulated signal with transients and after removing the transients.(a) Magnitude spectrum of FM-demodulated signal, (b) FM-demodulated signal; (c) magnitude spectrum of FM-demodulated signal without transients, (d) FM-demodulated signal without transients.

Figure 19 :
Figure 19: Experimental setup for a streaming core using FMC150 ADC core and Gbe core.

Figure 20 :Figure 21 :Figure 22 :Figure 23 :
Figure 20: 20 MHz tone ADC output streamed using UDP.(a) ADC input of 20 MHz tone generated with function generator, (b) FFT for 20 MHz ADC signal captured on a PC after streaming.

Figure 24 :Figure 25 :
Figure 24: An architecture of the digital FM receiver. 0

Figure 26 :
Figure 26: Double buffering used between the ADC or DSP output and Gbe input.(a) Writing DSP samples to BUFFER 1 and creating a data packet by concurrent reading of BUFFER 2 samples.(b) Writing DSP samples to BUFFER 2 and creating a data packet by concurrent reading of BUFFER 1 samples.

Figure 29 :
Figure 29: The results of FM receiver when tuning to 94.5 MHz radio station.(a) Spectrum of 94.5 MHz FM station baseband signal which is the output of a DDC core, (b) spectrum of FM-demodulated 94.5 MHz FM station in Matlab; the signal is real-valued after demodulation.

Figure 30 :Figure 31 :
Figure 30: A high-level synthesis design flow for SDR.
. Except for a moving average FIR filter, all International Journal of Reconfigurable Computing Cores.This section presents a development of FMC150 ADC/DAC interface core and a UDP/IP core for Gbe which are designed to be operational on RHINO. ).
sample rate as shown in Figure17(c).The nonideal response of the CIC filter was corrected by introducing a compensation FIR filter in the final stage of the DDC and its output is shown in Figure17(d).After the digital downconversion, the FM demodulator was used to demodulate the FM signal.The magnitude

Table 1 :
IP core benchmark results for Xilinx, OpenCores, and SDR cores.
Figure 27: UDP frame data field format.Data field is composed of 33 samples collected from a DDC core output.