Open-Source Ethernet MAC IP Cores for FPGAs: Overview and Evaluation



1. Introduction
Today, Ethernet is by far one of the most important, if not the most important, computer network technologies [1,2]. It was developed in 1973 at the Xerox Palo Alto Research Center (PARC) and was approved as the IEEE 802.3 standard in 1983. Since then, the original Ethernet technology has been developed further to a great extent, and today the IEEE 802.3 standard includes numerous supplementary sections resulting in thousands of pages of documentation. During the last decades, Ethernet has become the dominant LAN technology for interconnecting computers in, e.g., homes, office buildings, or university campuses. Over the years, and apart from its intended purpose, Ethernet is increasingly used in other application fields such as telecommunications [3], the automotive sector [4], industrial automation [5], and even avionics [6]. The efforts to make use of Ethernet in industrial applications by replacing traditional fieldbus technologies are commonly summarized under the term "Industrial Ethernet" [7]. However, this term does not refer to a single standard but rather to various approaches aiming to introduce determinism and real-time behavior, rugged connectors, or networking infrastructure with an extended temperature range in order to work in harsh environments. Examples of Industrial Ethernet protocols are PROFINET, EtherCAT [7], and EtherNet/IP (where "IP" refers to "Industrial Protocol," see [8]).
Ethernet basically covers Layer 1 and Layer 2 of the Open Systems Interconnection (OSI) model, including specifications of the communication media (see Figure 1). The classic Ethernet implementation made use of a coaxial cable, while for recent variants, twisted pair and fiber optic links are the most common types of media. Typical transmission speeds in today's Ethernet implementations are 10, 100, and 1000 Mbit/s. However, bitrates of 10, 40, and 100 Gbit/s have also been standardized for several years, and the IEEE 802.3bs physical layer specification (adopted in 2017) supports a transmission speed of up to 400 Gbit/s. The physical layer of Ethernet is often abbreviated as Physical Layer (PHY), while the data link layer, which includes both the Medium Access Control (MAC) and the Logical Link Control (LLC) sublayer, is commonly referred to as Medium Access Control (MAC). Both PHY and MAC are implemented in a hardware device, while upper protocol layers sitting on top of Ethernet (e.g., TCP, UDP, IP, ARP, and ICMP) are most often implemented as a software stack running on a CPU or microcontroller (however, hardware implementations of these upper protocol layers are also available, e.g., for applications with demands for high throughput and/or low latency [9]). The Ethernet PHY, the MAC, and the CPU (processing the upper protocol layers) can be either separate devices or a single-chip solution (e.g., a microcontroller that integrates both the PHY and the MAC on-chip). If the PHY and the MAC are individual devices, they are interconnected over the so-called Media Independent Interface (MII), of which different variants exist, such as the Reduced Media Independent Interface (RMII) or the Gigabit Media Independent Interface (GMII). The physical medium is connected to the PHY over the Media Dependent Interface (MDI).
Due to the widespread use of Ethernet, there is consequently a high demand for Ethernet implementations. When an Ethernet interface is required for a Field-Programmable Gate Array (FPGA) or Application-Specific Integrated Circuit (ASIC) design, a common method is to integrate the MAC and the CPU (that processes, e.g., a TCP/IP stack) on-chip. For this reason, Ethernet MACs are available from FPGA vendors, Intellectual Property (IP) providers, and other companies in the form of an IP core. However, existing commercial solutions come with some limitations: (i) IP cores from FPGA vendors are often technology-dependent, which makes it difficult to port the design to devices from other FPGA vendors. (ii) IP cores that are provided as a netlist cannot be modified by the user, making it impossible to change the existing design (in order to adjust the receive/transmit buffer sizes, replace the on-chip bus interface, etc.), add additional features (e.g., to move functionality to dedicated offload engines), or fix bugs. (iii) Finally, license fees have to be paid for most commercial IP cores.
Fortunately, the open-source idea, known from the software domain for a long time, has also become popular in the world of hardware design in recent years [10]. Thus, the use of an open-source Ethernet MAC IP core can be a solution to overcome the limitations of commercial IP cores mentioned previously, which was ultimately the motivation for the authors of this work. The goal of this paper is to survey available open-source Ethernet MAC IP cores, to evaluate existing designs in terms of performance, resource utilization, code quality, and maturity, and to present and summarize the evaluation results herein. The remainder of this paper is structured as follows. Section 2 provides an overview of existing work related to the scope of this publication. Section 3 presents the results of our survey on open-source Ethernet MAC IP cores, including information such as the Internet source and license model for each IP core, design language, supported bitrates, PHY and application interfaces, available documentation and testbenches, existing reference implementations, and features like support for DMA transfers, VLAN tagging, or the Precision Time Protocol (PTP). Furthermore, the consumed FPGA resources are compared in this section in terms of logic resources, memory, and other technology-specific building blocks (Phase-Locked Loops (PLLs), delay elements, etc.). Here, the synthesis and linting reports are discussed extensively since they reflect the code quality of the IP cores. Eight projects have been selected for a closer evaluation; details of the selection process are outlined in Section 4. In Section 5, the evaluation and measurement setup for the selected projects (FPGA platforms, used wrappers and hardware environment to embed the IP cores, Ethernet tools, etc.) is presented. Finally, the results of the various measurements (network throughput, latency, and packet loss at different frame lengths) are provided in Section 6. The paper concludes with a discussion of the evaluation results and a brief outlook on our future research work in Section 7.

2. Related Work
Existing publications that are related to this work cover, e.g., Ethernet MAC designs for FPGAs, hardware implementations of network protocols such as IP, UDP, or ARP, and papers where Ethernet MACs only act as a use case to investigate other research topics. For example, Qian et al. introduced a Verilog implementation of a 10/100 Mbit/s Ethernet controller in a short paper [11]. Unfortunately, numbers concerning network performance or resource consumption are missing. Yi et al. presented an implementation of a Ten Gigabit Ethernet MAC in [12]. Few details on the actual implementation of the MAC are given; the focus of the paper is instead on a new CRC calculation method. Another 10 Gbit/s Ethernet MAC, implemented on a Xilinx Virtex-6 FPGA device, was described in [13] by Xiao et al. In the bachelor thesis [14] from the University of Ilmenau, Germany, the author Kerling provided many implementation details of an Ethernet MAC coded in VHDL and targeted at FPGAs, supporting link speeds of 10, 100, and 1000 Mbit/s. The design is publicly available under an open-source license. A few Ethernet MACs are listed in the thesis under related work, but they are all commercial IP cores. A number of existing publications propose hardware implementations of higher-level protocols such as IP or UDP built on top of Ethernet. In [15], the Universidad Autónoma de Madrid and ETH Zürich introduced "Limago," an FPGA-based open-source implementation of a TCP/IP stack operating at 100 Gbit/s which, according to the authors, is the first complete description of an FPGA-based TCP/IP stack at this bitrate. The design is based on Vivado-HLS and makes use of a commercial Ethernet MAC. The paper includes results from performance measurements of the network stack as well as the resource consumption for different configurations of the framework. A similar TCP/IP implementation for FPGAs called SiTCP was presented by the University of Tokyo in [16]. In order to be vendor-independent, the author argues against using a hard macro for the Ethernet MAC and therefore makes use of a custom-built MAC. The Technical University of Munich presented a UDP/IP core for FPGAs based on a hard-wired Ethernet MAC from Xilinx in [17]. The authors performed measurements concerning network throughput and packet loss. Moreover, a comparison of the consumed FPGA resources with a UDP/IP stack from Löfgren et al. [18] was done. In [19], Sütő and Oniga presented a custom-built Ethernet MAC with low resource usage that includes hardware implementations of ARP and DHCP. Here, the intended use case is the communication of sensor values from an embedded sensor node, as a contribution to the "Internet of Things." The "Corundum" project by the University of California is an open-source FPGA-based prototyping platform for network interface development at up to 100 Gbit/s and beyond [20]. The platform has an even broader focus than the previously mentioned work since it includes 10 G/25 G/100 Gbit/s Ethernet MACs, PCI Express Gen 3, a custom PCIe DMA engine, and high-precision IEEE 1588 PTP timestamping. It makes use of the Xilinx Ethernet CMAC hard core for 100 G Ethernet and its own FPGA-based implementation for <100 Gbit/s (which is not described in detail by the authors). The conference paper [21] from Santos et al. has yet another scope and describes the FPGA-based architecture of a modified Ethernet switch providing real-time communication based on the Flexible Time-Triggered paradigm.
It utilizes the Xilinx Tri-Mode Ethernet MAC soft core, which can operate at 10/100/1000 Mbit/s. Other publications such as [22,23] implement an Ethernet MAC only as a use case, while the focus of research is verification.
In summary and to the best of our knowledge, no publication could be found that compares and evaluates available open-source Ethernet MACs on a large scale. That, as well as providing general insights into the potential usefulness of open-source IP cores, was our primary motivation to write this survey and evaluation paper.

3. Overview of Open-Source Ethernet MAC IP Cores
The first step of the overview and evaluation of open-source Ethernet MAC IP cores described in this paper is to gain a comprehensive overview of the open-source projects available in this context. Three major sources for finding these projects can be mentioned: (1) dedicated IP core repositories such as opencores.org, (2) public source code hosting platforms such as github.com, and (3) projects that are only available from the website or private source code hosting instance of the respective author or core vendor. Identification of the projects in the last group relies wholly on the indexing of an Internet search engine or on mention of the location in a publication (e.g., [14]).

3.1. Identified Projects.
The projects that had been identified at the time of writing using the sources mentioned above are listed in Table 1 in alphabetical order. As some of the identified projects are quite popular, often not only the original repository of a core appears in search results but also other projects that include these cores in their designs. Such projects are not shown in Table 1, which aims to show the original set of Ethernet MAC cores found. In addition to the identified projects, Table 1 also provides the version of each IP core (git commit, SVN revision, or version number) as well as the release date of the version evaluated in this work. Furthermore, basic information about each core and its features is provided.
The context in which an IP core can be used in a digital design is related to the license under which the source code is released. While most of the identified cores are released under a traditional open-source software license such as the GNU General Public License (GPL) or the Berkeley System Distribution (BSD) license, in recent years licenses especially suitable for open-source hardware, such as the CERN Open Hardware License (OHL), the Solderpad License, or the NetFPGA Hardware-Software license, have become available. The license, for example, determines whether the core can be used commercially at all and which parts of the source code (if any) need to be published if it is used in a commercial product. A discussion of licenses for open-source hardware can be found in [42].
The language used to describe the hardware of a core impacts the ease of integration into the context of a larger project. Most identified cores are described in a "traditional" hardware description language such as VHDL or Verilog. The two exceptions are the projects An Ethernet Controller and Litex Liteeth. The former is described in Chisel, a hardware description language based on the Scala programming language. Chisel is, for example, used in the Rocket Chip Generator (https://github.com/chipsalliance/rocket-chip, accessed: May 5, 2023) to describe the RISC-V Rocket CPU core. Using the Scala Build Tool (sbt), a synthesizable Verilog representation of the design is generated. In contrast, Litex Liteeth is described in Migen, a hardware description system and core library written in Python that also generates synthesizable Verilog. Apart from the identified Ethernet IP core, the Litex project (https://github.com/enjoy-digital/litex, accessed: May 5, 2023) provides a System-on-Chip (SoC) build system and core library (e.g., DRAM, PCIe, and SATA cores) written in Migen.
Furthermore, the supplementary material provided in a core's repository is detailed in the columns "Testbench," "Documentation," and "Reference Implementation" of Table 1.
Providing a testbench with a core allows the potential user to quickly bring up and confirm the functionality of the core in simulation. Furthermore, it may serve as an indicator that some thought has been given to the verification of the core by the author(s).
If a project includes reference implementations for one or more FPGA development boards, this can serve as an indicator that the project is indeed synthesizable and was at some point tested in actual hardware by the author(s). Furthermore, important implementation details such as how to integrate the core into a functional system and which parts of the project need to be ported for a specific FPGA technology can be learned from such an implementation.
The documentation a core provides has been classified into three categories: (i) Code comments (CC) document the source code itself inline. (ii) Readme (R) files are often short text files describing the most important aspects of a core (e.g., intended application, FPGA family, and build system). In projects hosted on github.com or a similar system, these files are rendered as a project's "landing page." (iii) Some projects also provide long-form (LF) documentation, either in the form of a user manual, specification document(s), or both.
Availability of high-quality documentation significantly reduces the time needed until a project can be used productively. Otherwise, this information needs to be extracted from example implementations, testbenches, or the source code of the core itself.
The principal set of features along which the identified IP cores are classified is shown in Table 2.
Concerning the supported communication speed, three classes were introduced: (1) 10/100 Mbit/s, the "traditional" Ethernet speeds, (2) 1 Gbit/s, a standard speed nowadays, and (3) >1 Gbit/s, e.g., 10, 25, or more Gbit/s as fast communication speeds. While some cores only support one speed class, others support multiple standards. The supported speed is closely related to the supported MII variant used to interface to an Ethernet PHY. For example, the original MII (4 bit parallel data, 25 MHz clock frequency) was introduced in the 100 Mbit/s Fast Ethernet standard. RMII falls into the same Ethernet speed class but doubles the frequency to 50 MHz in order to halve the number of required data signals. Gigabit Ethernet requires a different MII variant, such as GMII (8 bit parallel data, 125 MHz clock frequency), the double-data-rate Reduced Gigabit Media Independent Interface (RGMII) (4 bit parallel data, 125 MHz clock frequency, one nibble transferred per clock edge), or the 625 MHz double-data-rate Serial Gigabit Media Independent Interface (SGMII). Even faster Ethernet speeds require even more complex interfaces such as the 156.25 MHz double-data-rate 32-bit parallel Ten Gigabit Media Independent Interface (XGMII). An exception to the support of one or more MII variants is the WhiteRabbit project, which only supports a direct connection to a Physical Coding Sublayer (PCS) core.

Furthermore, the interface that a core provides to access its features is an important detail to consider when integrating the core into a planned or existing system. Two kinds of tasks for the core's interface have been considered in this work: the control part of the interface that is used to configure the core or alter its behavior (e.g., setting MAC address filtering), and the data part of the interface that is used to transfer data to be sent into the core and data received out of the core. In some cases, both of these parts are integrated into one single interface. This is, for example, the case for a core that provides a single memory-mapped address space which allows access to configuration registers and data buffers via an on-chip bus system. In other cases, a separate interface exists for each of the two tasks. In the "Interface" columns of Table 2, apart from standard on-chip bus interfaces such as Wishbone, Advanced Microcontroller Bus Architecture (AMBA) Advanced High-Performance Bus (AHB), AMBA Advanced Peripheral Bus (APB), AMBA Advanced eXtensible Interface Bus (AXI), and Open Core Protocol (OCP), non-standard interfaces also had to be considered. In this context, Register Transfer Level (RTL) stands for discrete ports that must be driven by external logic but do not follow a well-defined interface. A core is classified as having an "External FIFO Interface" if it exposes a read/write request signal along with a read/write data signal for connection of an external FIFO, while it is classified as having an "Internal FIFO Interface" if it exposes these signals for external access to a FIFO contained in the core. Additionally, a core is classified as having an "other address/data bus" if it does not use a standard on-chip bus but exposes a generic address and data bus as well as control signals for external access. Finally, some cores are able to retrieve frames to be sent, and store received frames, in external memory acting as a bus master, i.e., they provide Direct Memory Access (DMA) capabilities.
In these cases, the control interface is used to establish DMA descriptors that the core then uses to access the correct memory locations. These cores are described as having a "master" interface.
While a standardized bus interface may be preferred when integrating a core into a complex or CPU-centered SoC, other interfaces such as a plain FIFO or streaming interface may be easier to drive from arbitrary logic that would otherwise potentially be required to generate sequences of bus transactions just to bring up the core for transmission or reception.
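To illustrate the difference, the following is a minimal, hypothetical sketch of user logic streaming dummy frames into a byte-wide, valid/ready FIFO-style transmit interface; no bus transactions are involved. All signal and module names are illustrative and do not belong to any specific core.

    // Hypothetical sketch: user logic driving a FIFO/streaming-style
    // MAC transmit interface, one payload byte per accepted clock cycle.
    module tx_stream_example (
        input  wire       clk,
        input  wire       tx_ready,   // MAC can accept a byte
        output reg        tx_valid,   // byte on tx_data is valid
        output reg  [7:0] tx_data,    // payload byte
        output reg        tx_last     // marks the final byte of a frame
    );
        localparam FRAME_LEN = 64;    // minimum Ethernet frame length
        reg [10:0] byte_cnt = 11'd0;

        initial tx_valid = 1'b0;      // FPGA power-up value; reset omitted

        always @(posedge clk) begin
            // advance when idle or when the current beat was accepted
            if (!tx_valid || tx_ready) begin
                tx_valid <= 1'b1;
                tx_data  <= byte_cnt[7:0];                 // dummy payload
                tx_last  <= (byte_cnt == FRAME_LEN - 1);
                byte_cnt <= (byte_cnt == FRAME_LEN - 1) ? 11'd0
                                                        : byte_cnt + 11'd1;
            end
        end
    endmodule

With a memory-mapped core, the same task would instead require a sequence of register writes (and possibly descriptor setup) for every frame.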
The (technology) Primitives column of Table 2 shows for which FPGA vendor, if any, a core instantiates technology primitives. If such primitives are instantiated in the RTL description of a core, they need to be replaced when porting the core to another FPGA family or vendor. For some technology-dependent components (e.g., BRAM), current synthesis tools are able to infer these components from an RTL description. However, when using more complex components such as Double Data Rate (DDR) components or the transceivers necessary for high-speed Ethernet standards, direct instantiation of these primitives becomes necessary.
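As a generic illustration (not taken from any of the analyzed cores), the following simple dual-port RAM is coded in a style from which synthesis tools can infer a block RAM, keeping the description portable. DDR IO cells or transceivers, in contrast, cannot be inferred this way and must be instantiated as vendor primitives.

    // Simple dual-port RAM that synthesis tools can map to a BRAM.
    module inferred_bram #(
        parameter ADDR_W = 10,
        parameter DATA_W = 8
    ) (
        input  wire              clk,
        input  wire              we,
        input  wire [ADDR_W-1:0] waddr, raddr,
        input  wire [DATA_W-1:0] wdata,
        output reg  [DATA_W-1:0] rdata
    );
        reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

        always @(posedge clk) begin
            if (we)
                mem[waddr] <= wdata;
            rdata <= mem[raddr];  // synchronous read enables BRAM mapping
        end
    endmodule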
Table 3 provides further insight into the technical features the cores provide. It can be seen that almost all cores provide checking and insertion of the Ethernet Frame Check Sequence (FCS) as well as First-In-First-Out (FIFO) buffers for decoupling the MAC from the network and/or from the user logic.

While often required, an implementation of Management Data Input/Output (MDIO) for configuration and monitoring of the PHY is only provided by a subset of the identified cores. If MDIO is not present, the user must either supply an external implementation or use the PHY's power-up defaults, which may not be possible in every case.
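For illustration, the following is a minimal, hypothetical sketch of an IEEE 802.3 Clause 22 MDIO write-frame serializer. All names are illustrative; a real implementation additionally needs MDC generation, read frames, tristate control of the bidirectional MDIO pin, and typically updates MDIO on the falling MDC edge.

    // Sketch only: serializes one Clause 22 write frame, MSB first.
    module mdio_write_sketch (
        input  wire        mdc,       // management clock (<= 2.5 MHz)
        input  wire        start,     // pulse to launch a write frame
        input  wire [4:0]  phy_addr,
        input  wire [4:0]  reg_addr,
        input  wire [15:0] wr_data,
        output wire        mdio_out,  // serial data output
        output reg         busy
    );
        reg [63:0] shift   = {64{1'b1}};  // idle high
        reg [6:0]  bit_cnt = 7'd0;

        initial busy = 1'b0;

        assign mdio_out = shift[63];

        always @(posedge mdc) begin
            if (start && !busy) begin
                // 32-bit preamble, ST=01, OP=01 (write), addresses,
                // TA=10, then 16 data bits: 64 bits in total
                shift   <= {32'hFFFF_FFFF, 2'b01, 2'b01,
                            phy_addr, reg_addr, 2'b10, wr_data};
                bit_cnt <= 7'd64;
                busy    <= 1'b1;
            end else if (busy) begin
                shift   <= {shift[62:0], 1'b1};   // shift out, idle high
                bit_cnt <= bit_cnt - 7'd1;
                busy    <= (bit_cnt != 7'd1);
            end
        end
    endmodule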
Several cores provide DMA support to offload from the CPU the tasks of fetching frames to be transmitted and writing received frames into memory. In these cases, the cores often require the setup of elaborate DMA descriptor systems that point to the locations in memory where the core should place received frames or fetch frames to be transmitted from. Further offloading is provided by those cores that allow filtering incoming frames for specific MAC addresses or allow insertion of the host's MAC address into transmitted frames.
As an alternative to DMA, some cores include internal memory-mapped RAM buffers that can be read from and written to via a bus interface.
A small subset of the cores provides special features such as support for VLAN tagging or PTP for clock synchronization.
While the Ethernet standards for transmission speeds lower than 10 Gbit/s include half-duplex operation, only three cores support this mode of operation. This includes handling the carrier sense (CRS) and collision (COL) signals generated by the PHY, as well as implementing Carrier Sense Multiple Access/Collision Detection (CSMA/CD). However, as the vast majority of Ethernet networks nowadays employ switches and full-duplex operation, these features were not evaluated in hardware.
Finally, the (default) width of the data path of the analyzed cores is provided in Table 3. The majority of the analyzed cores use a data path width of 32 or 64 bit, geared towards modern on-chip bus systems and CPU interfaces. As an exception, P. Kerling's MAC and some variants of the Verilog-Ethernet project's MAC provide byte-wise access to the transmitted or received data. The default network-side interface supported by the WhiteRabbit wr-endpoint core is a 16-bit variant of PCS. Finally, the LMAC3 project, geared towards data rates of up to 100 Gbit/s, provides a 256-bit wide interface.

3.2. Out-of-Context Synthesis.
In order to confirm the completeness and basic synthesizability of the identified IP core projects, the cores were synthesized using Xilinx's Vivado 2019.1 FPGA implementation tool. For cores that allow significantly different variants (e.g., different configurations, speeds, and interfaces), multiple synthesis runs were performed. These synthesis runs were done in Out-of-Context (OOC) mode, which is Vivado's term for performing a synthesis run only, with no required physical (e.g., pin) constraints and without insertion of IO buffers. The result of such a synthesis run is a technology-dependent gate-level netlist that can, in an actual design flow, be instantiated in a hierarchical design. The OOC synthesis runs allow judging whether a core is synthesizable as-is or, alternatively, which components of the core need to be ported to the specific FPGA technology in order to be synthesizable. Furthermore, the basic resource usage after synthesis allows a comparison between the cores and the identification of possible bugs (e.g., unintended instantiation of latches). Finally, it provides access to Vivado's linting capabilities. For example, its (critical) synthesis warnings can be analyzed and used as an indicator of basic code quality. In the context of Ethernet, the results of Vivado's Clock Domain Crossing (CDC) analysis are also relevant because most cores incorporate CDCs from the interface clock domain to the MII clock domain and vice versa.
As a target technology for the preliminary OOC synthesis, the widely used Artix 7 device family by Xilinx was chosen. This technology provides 6-input fracturable Look-Up Tables (LUTs), 18 Kbit and 36 Kbit memory blocks, and 25 × 18 bit DSP blocks.

3.2.1. Synthesized Cores and Variants.
In order to perform OOC synthesis for each of the cores listed in Table 1, Hardware Description Language (HDL) wrappers that instantiate the respective core and set top-level parameters were implemented (a generic sketch of such a wrapper is shown after the following list). The parameter values were chosen to reflect the default values set either in the core's top-level module or mentioned in the documentation. In those cases where a core was available in different variants (e.g., different interface types or Ethernet speeds), multiple variants were synthesized. This concerns the following cores: (i) LeWiz's LMAC cores provide a native interface to internal receive and transmit FIFOs as well as an AXI-Stream interface. The LMAC1 core was thus synthesized in the native variant, referred to as LMAC_CORE_TOP after the top-level module that was synthesized, and in the LMAC1_CORE_AXIS variant, which contains the AXI-Stream interface mentioned above. Additionally, LeWiz's LMAC cores contain a FIFO implementation inferred from HDL. However, it was seen during synthesis that this description is not understood as intended by Xilinx Vivado, which implements the FIFO's logic in flip-flops and LUTs instead of more suitable memory resources. Thus, each of the LMAC1 variants mentioned above was synthesized in two versions: one with the original inferred FIFO implementation and one where this implementation has been replaced by a macro provided by Xilinx (using the Xilinx Parameterized Macro (XPM)-FIFO core).
The remaining cores provided by LeWiz (LMAC2 and LMAC3) were only synthesized in variants that use the native interface and the XPM-based FIFO implementation. (ii) Litex's Liteeth provides several different interfaces to PHYs (and thus also different Ethernet speeds) as well as different application interfaces. Both synthesized variants provide internal, memory-mapped data buffers for received and transmitted Ethernet frames that can be read and written via Wishbone. (iii) The 10 Gbit/s project NFMAC10G provides a bare variant that places tight constraints on the interface to external logic, as well as a more comfortable user interface. According to NFMAC10G's documentation, this interface allows for flow control on the receive side and more flexible interfaces on the transmit side. Additionally, this interface also filters out received frames with invalid FCSs. The bare variant is referred to as nfmac10g, and the variant with the more convenient interface is referred to as nfmac10g_user_intf (see Table 4).
(iv) In a similar way, P. Kerling's Ethernet MAC provides a bare variant that requires external FIFOs and one where these FIFOs are implemented internally, with their read and write ports exposed on the interface. (v) The Verilog-Ethernet project provides a large variety of MACs supporting different PHY interfaces. Five different variants, instantiating a subset of these interfaces, were synthesized. All of the variants provide AXI-Stream access to internal receive and transmit FIFOs. (vi) WhiteRabbit is a complete system for highly accurate clock synchronization for data transfer and control at CERN. The WhiteRabbit code repository provides a variety of different cores to implement this system, among others a complete SoC implementing a Network Interface Controller (NIC), containing elaborate filtering and even a Lattice Mico32 CPU core. Only the MAC implementation of this project was synthesized for this work (wr_endpoint).
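The following is an illustrative example of the kind of wrapper used for the OOC runs; some_ethernet_mac, its parameters, and its ports are hypothetical stand-ins for an actual core. The wrapper pins down the top-level parameters so that one well-defined configuration of the core is synthesized.

    // Hypothetical OOC synthesis wrapper (names are illustrative).
    module mac_ooc_wrapper (
        input  wire       rx_clk,
        input  wire       tx_clk,
        input  wire       user_clk,
        input  wire       rst,
        input  wire [7:0] gmii_rxd,
        input  wire       gmii_rx_dv,
        input  wire       gmii_rx_er,
        output wire [7:0] gmii_txd,
        output wire       gmii_tx_en,
        output wire       gmii_tx_er
        // application-side ports omitted for brevity
    );
        some_ethernet_mac #(
            .DATA_WIDTH     (8),   // values taken from the core's defaults
            .ENABLE_PADDING (1),
            .MIN_FRAME_LEN  (64)
        ) dut (
            .rx_clk     (rx_clk),
            .tx_clk     (tx_clk),
            .user_clk   (user_clk),
            .rst        (rst),
            .gmii_rxd   (gmii_rxd),
            .gmii_rx_dv (gmii_rx_dv),
            .gmii_rx_er (gmii_rx_er),
            .gmii_txd   (gmii_txd),
            .gmii_tx_en (gmii_tx_en),
            .gmii_tx_er (gmii_tx_er)
            // remaining ports tied off or exported as needed
        );
    endmodule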
In addition, synthesis constraint files (.xdc files) for Xilinx Vivado were implemented that define clocks for the top-level clock inputs in order to enable Vivado to perform CDC analysis. To that end, except where noted otherwise, each clock input was considered to drive a clock that is asynchronous to all other clock inputs. Additionally, these constraint files allow constraining clock multiplexers. Several of the analyzed cores (cores capable of 10/100 Mbit/s and 1 Gbit/s) use clock multiplexers to select between the 25 MHz MII transmission clock generated by the PHY in 10/100 Mbit/s mode and the 125 MHz GMII gigabit transmission clock generated by the MAC in 1 Gbit/s mode. Vivado requires manual constraining of clock multiplexers in order to perform correct case analysis of the propagated clocks [43].

Furthermore, some cores need manually ported components (e.g., IO components such as DDR inputs and outputs or FIFO buffers), which were added in a preliminary form to allow synthesis. While these implementations were not verified in simulation or hardware, they should nevertheless provide information about the approximate relative resource usage of the cores.

3.2.2. Resource Results.
The primary output of the OOC synthesis process is a technology-dependent netlist of the synthesized cores and thus information about the amount of FPGA resources needed to implement them. Basic resource usage data for the synthesized variants can be found in alphabetical order in Table 4, and the usage of additional, more specialized resources can be found in Table 5. These results were obtained by running Vivado's OOC synthesis (synth_design -mode out_of_context) with default options and one pass of optimization (opt_design) afterwards. The latter command performs basic optimizations such as propagating constants and removing nets and cells with no fan-out [44].
Due to missing VHDL package files (containing, e.g., type definitions), the project Opencores Gbiteth was not synthesizable and thus could not be analyzed in this and the following steps.
At first glance, the dramatic resource usage of the LMAC1 variants that use inferred FIFOs (obviously in a way not correctly recognized by Vivado) stands out in Table 4. If RAM-based FIFOs are instantiated in these cases, more sensible resource results are produced.
Furthermore, LeWiz's LMAC3 core consumes a large number of LUTs even in the XPM-FIFO variant compared to the other analyzed MACs. Analyzing the hierarchical resource results reveals that the majority of the additional resource usage compared to LMAC2 (more than 30000 LUTs and 7000 flip-flops) stems from the implementation of the receive and transmit CRC blocks. As LMAC3 uses a 256 bit wide data path and supports data rates of up to 100 Gbit/s, a high-performing but less resource-efficient CRC implementation may have been chosen. An increase in resource usage when increasing the transmission speed above 1 Gbit/s can also be seen when comparing the Verilog-Ethernet 1 Gbit/s variants to the 10 Gbit/s variant.
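To illustrate why wide data paths make the FCS logic expensive, the following generic byte-serial Ethernet CRC-32 update is shown (reflected polynomial 0xEDB88320); this is an illustration, not LeWiz's code.

    // Generic byte-serial Ethernet CRC-32 update. The running CRC is
    // initialized to 32'hFFFFFFFF and inverted after the last byte.
    function [31:0] crc32_update (
        input [31:0] crc,    // current CRC state
        input [7:0]  data    // next payload byte
    );
        integer i;
        reg [31:0] c;
        begin
            c = crc ^ {24'h0, data};
            for (i = 0; i < 8; i = i + 1)
                c = (c >> 1) ^ (c[0] ? 32'hEDB88320 : 32'h0);
            crc32_update = c;
        end
    endfunction

A MAC with a 256-bit data path must implement the logic equivalent of 32 such byte steps, fully unrolled and flattened into one clock cycle (including handling of frames that do not end on a 256-bit boundary), which plausibly explains the observed LUT growth.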
Apart from the outliers mentioned above, it can be seen that the analyzed cores span a wide range of sizes. While some implementations (e.g., the 100 Mbit/s and 1 Gbit/s variants of Verilog-Ethernet) use close to 500 LUTs and FFs, apparently more complex implementations such as the LMAC variants and, to a limited extent, also GRETH, Opencores Ethernet Tri Mode, and Opencores Ethmac consume thousands of each. When comparing the resource usage of the different cores, the vastly different amount of functionality provided by the individual cores must be taken into account. For example, the MACs provided by the Verilog-Ethernet project are relatively bare-bones: these cores allow placing frames to be sent into a FIFO and reading received ones from another FIFO, with little additional functionality apart from checking FCS validity on incoming frames and inserting the FCS in outgoing ones. On the other hand, for MACs such as Gaisler's GRETH or Ethmac, the focus seems to be on CPU-based systems, with these cores providing a large number of configuration registers, interrupt circuitry, and DMA functionality. Furthermore, the width of the data path may play a role in the number of LUTs and FFs used.
Different sizes of the provided FIFOs and buffers explain the differences in LUT-based RAM (LUTRAM) and Block RAM (BRAM) usage seen among the different cores. For example, both Litex variants and Ariane-Ethernet provide internal memory-mapped receive and transmit RAM buffers, in contrast to Gaisler GRETH and Opencores Ethmac, which use DMA to write to external memory.
Another important insight provided by the resource counts is the number of instantiated latches (in contrast to the number of instantiated flip-flops). Latches inferred by the synthesis tool instead of flip-flops are often the result of incorrect descriptions of combinational or sequential logic in an HDL (sometimes called "unintended latches"), as they may, among other problems, complicate correct static timing analysis [45]. If latches are instantiated, this prompts an analysis of the responsible sections of the hardware description in order to verify whether the latch was actually intended. In Table 4, the only cores that instantiate latches are LeWiz's LMAC cores. Part of the latches instantiated by Vivado may be caused by the incompatible description of the FIFO inferred from HDL. However, the variants that use an instantiated XPM FIFO also contain at least one latch. Analysis of the source code of LMAC1 and LMAC2 revealed that the latch in both cases is instantiated in the design unit tcore_rx_xgmii. Here, the signal pre_pkt_we_wire is assigned in a way that requires implementation as a latch, and it thus must be counted as an intended latch. Listing 1 shows an excerpt of the responsible Verilog code in LMAC2. LMAC3 applies similar patterns in its design units tcore_rx_cgmii and eth_crc32_gen. The larger number of latches in the LMAC3 case stems from the CRC generator because the affected signal in this case is 32 bits wide, and the design unit is instantiated once in the receive and once in the transmit path. Furthermore, as these latches fan out to a large number of cells, the synthesis tool replicates each latch two or three times.
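For illustration, the following generic fragments (not taken from the LMAC sources; all names are made up) show the typical cause of an unintended latch, a combinational process without a complete assignment, and how a default assignment avoids it.

    // Latch inferred: 'we' must hold its value when state != ST_DATA.
    always @(*) begin
        if (state == ST_DATA)
            we = 1'b1;
        // missing else branch -> latch
    end

    // Latch-free version: assign a default value in every evaluation.
    always @(*) begin
        we = 1'b0;
        if (state == ST_DATA)
            we = 1'b1;
    end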
Most cores do not require FPGA resources beyond LUTs, flip-flops, and block memory to implement their functionality. Only the subset of cores listed in Table 5 requires additional, more specialized resources. Among the required resources are the following IO- and clock-related FPGA resources: (i) OBUF and IBUF elements, ordinary buffers for input and output signals in Xilinx FPGAs, that have been explicitly instantiated in the RTL description. (ii) ODDR and IDDR elements that implement drivers or receivers of double-data-rate signals. These elements either receive a DDR data signal and clock and generate two single-data-rate signals from them, or vice versa [46]. This is needed for DDR PHY interfaces such as RGMII.
(iii) IDELAYE2 elements that allow delaying signals (either coming from a pin or from the FPGA fabric) by a configurable duration [46]. These elements are used in some cores to, for example, delay incoming data and control signals relative to their clock to improve signal stability when sampled. In order to function properly, either the core itself (as in the case of Ariane-Ethernet) or the instantiating logic must instantiate an IDELAYCTRL element that calibrates the IDELAYE2's delay taps to a reference clock. (iv) BUFIO elements that implement clock buffers that can drive global clock nets from an input pin [47]. (v) PLLE2_ADV elements that instantiate one of the FPGA's PLLs for clock synthesis, skew compensation, and phase shifting [47].

3.2.3. Linting Results.
Important side effects of the OOC synthesis experiments are the automated linting checks that are performed during the synthesis process. Generally speaking, Xilinx Vivado's synthesis process is rather sensitive when generating warnings, warning about both minor imperfections in the input HDL and potential bugs. Two kinds of warnings are generated: Warnings, for situations that may lead to suboptimal results and where user action thus may be taken, and Critical Warnings, for constructs Vivado deems "outside the best practices for an FPGA family" and for which it thus recommends user action [44]. Table 6 shows a summary of the number of warnings generated during OOC synthesis of the analyzed cores. The generated warnings have been grouped into the following categories: (i) Warnings concerning constraints, for example, clocks. (ii) Warnings concerning the generation of (potentially unintended) latches. (iii) Linting warnings that contain information about benign imperfections of the HDL code. (iv) Warnings that describe which parts of the design are trimmed or optimized away. (v) Simulation mismatch warnings that refer to constructs that may lead to different behavior in hardware and in logic simulation.
(vi) Warnings concerning the structure of the design.
(vii) As their own category, due to their observed number: warnings that inform about internally unconnected ports. Vivado reacts rather sensitively to this condition, generating warnings of this kind even if not all bits of a vector are used in a module that is driven by this vector.
Vivado limits the reporting of each individual warning message to 100 occurrences. In these cases, a warning count of "100+" is shown in Table 6. The actual warning IDs generated by Vivado that have been subsumed into the categories described above are listed in Table 7.
Of these categories, we consider warnings that fall into the constraints, latches, simulation mismatch, and structural categories to be especially serious.
Most of the analyzed cores and their variants generate relatively few serious warnings. The variants of LMAC1 that have been synthesized using the original FIFO inferred from HDL are the candidates that produce the most of these warnings. Once the problematic FIFO descriptions are replaced with vendor-defined instantiated ones, most of these warnings disappear. However, in all LMAC variants, as also seen in Section 3.2.2, some latches are generated. In addition, Vivado warns about a latch being generated in the WhiteRabbit core that appears to be optimized away later, as the final resource count in Table 4 shows no latches instantiated for this core. Listing 2 shows the VHDL code that causes Vivado to infer a latch. The conditional assignment statement is missing an else case, requiring consistency_match to hold its value when the condition is not met. As the responsible VHDL description never lets the signal return to zero and the signal is not initialized at declaration, this may constitute a bug resulting in an unintended latch description.
Few cores, namely, Gaisler GRETH, Opencores Ethernet Tri Mode, Opencores Ethmac, and Opencores XGE LL MAC, cause warnings in the simulation mismatch category. As seen in Table 7, warnings in this category concern (1) the sensitivity lists of processes and (2) the description of the reset behavior of flip-flops. In the case of Gaisler GRETH, one signal that is read in a combinational process is not part of the process sensitivity list. This may cause a logic simulator not to reevaluate the process when only this signal of the process's inputs changes. The same issue is present for one signal in the Opencores Ethernet Tri Mode project. The second kind of simulation mismatch warning concerns the description of the reset behavior of flip-flops (see Table 7). If the input RegInit is tied to a constant in an instantiation, Vivado infers flip-flops with asynchronous reset for the 0-bits and flip-flops with asynchronous set for the 1-bits. In the RTL description, all RegInit inputs are tied to constants at instantiations of RegCPUData. If they, however, were not tied to constants, which is apparently assumed by Vivado during OOC synthesis at first, the value that is asynchronously loaded into each of the flip-flops making up RegOut would depend on a non-constant value. As this is not supported by the flip-flops provided by Xilinx's 7 Series FPGAs, this would need to be implemented using additional logic, which Synth 8-5788 warns about.
Finally, some cores leave (parts of) signals unassigned, i.e., with no driver, resulting in Synth 8-3848 warnings, which are classified as structural. This is (relatively) benign when the respective signals are not used but can evolve into a bug if they drive logic in the future. If they are used, the synthesis tool assumes a value for the concerned signal (e.g., constant zero), which may or may not behave as intended.
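A minimal constructed example of this condition (not taken from any of the cores): only the lower nibble of the output is ever assigned, so the upper bits remain undriven and would trigger a warning of this kind.

    module undriven_example (
        input  wire       clk,
        input  wire [3:0] status,
        output reg  [7:0] dbg_reg   // bits [7:4] have no driver
    );
        always @(posedge clk)
            dbg_reg[3:0] <= status; // upper bits left unassigned
    endmodule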
In addition to the Ethernet MAC functionality, Gaisler GRETH optionally implements a UDP-to-AHB bridge referred to as the Ethernet Debug Communication Link (EDCL). The top level of GRETH provides a secondary AHB master interface for EDCL that operates in parallel to the primary AHB master, the bus interface of GRETH's DMA engine. If EDCL is not used, the outputs of this secondary AHB master are left undriven. No immediate effects on the design due to these unconnected signals are to be expected as long as the secondary AHB interface is also left unconnected when the core is instantiated.
In the case of nfmac10g, the two concerned undriven signals are outputs of a module that are not used at instantiation. Thus, no immediate effects on the functionality of the core are to be expected in this case. In the case of Opencores Ethmac, this concerns a single "debug" signal that can be read from the Wishbone-mapped "Debug" register. Opencores Triple Speed Ethernet, however, does not drive several nets that feed into the MIIM (i.e., MDIO) part of the design.

As mentioned before, in addition to the "normal" warnings discussed above, Vivado also generates Critical Warnings for constructs deemed especially dangerous. Only three kinds of these critical warnings were observed in the cores that were subjected to the OOC synthesis process, as shown in Table 8.

CDCs are considered a critical part of any digital design because improper handling may lead to timing (setup/hold) violations at runtime, leading to unwanted behavior due to metastability. Thus, there are some "best-practice" accepted design patterns for dealing with CDCs of different types, such as (i) synchronizer chains of multiple flip-flops for single-bit signals (a minimal sketch is shown below), (ii) synchronizer chains for control logic that controls consistent sampling of multi-bit signals, (iii) encoding multi-bit signals using Gray code, and (iv) the use of dual-clocked technology elements such as dual-clock FIFOs and dual-port BRAMs.

CDC analysis is a state-of-the-art verification technique provided by tools such as Mentor Graphics Questa CDC (https://eda.sw.siemens.com/en-US/ic/questa/design-solutions/clock-domain-crossing/, accessed: May 5, 2023) and Synopsys Spyglass CDC (https://www.synopsys.com/verification/static-and-formal-verification/spyglass/spyglass-cdc.html, accessed: May 5, 2023). Xilinx Vivado also provides some support for structural CDC analysis in the form of the report_cdc command. Vivado's CDC analysis identifies paths crossing from one clock domain to another. It then tries to identify "best-practice" CDC structures according to vendor-defined guidelines [48]. If such "safe" structures cannot be determined or unsafe structures are detected, Vivado generates CDC warnings. As these warnings might be overly sensitive, they have to be reviewed by a designer. The result of this review can then either be to fix the identified error or to document why the particular CDC is safe in a way that is not understood by the analysis tool.
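The following minimal sketch shows pattern (i), a two-stage flip-flop synchronizer for a single-bit signal, including the ASYNC_REG property that is discussed further below.

    // Two-stage synchronizer for a single-bit CDC. The ASYNC_REG
    // attribute tells the tool to keep both registers as dedicated,
    // closely placed flip-flops instead of, e.g., absorbing them into
    // an SRL-based shift register.
    module sync_2ff (
        input  wire dst_clk,   // destination clock domain
        input  wire async_in,  // signal from the source clock domain
        output wire sync_out
    );
        (* ASYNC_REG = "TRUE" *) reg stage1 = 1'b0;
        (* ASYNC_REG = "TRUE" *) reg stage2 = 1'b0;

        always @(posedge dst_clk) begin
            stage1 <= async_in;  // may go metastable; resolved by stage2
            stage2 <= stage1;
        end

        assign sync_out = stage2;
    endmodule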
There are some cores where a CDC analysis is not applicable as they do not include clock domain crossings. The number of clocks, the asynchronous clock pairs with actual paths between the source and destination clock, and the results of the CDC analysis performed using Vivado after OOC synthesis can be found in Table 9. The table shows the total count of warnings in the CDC report, the number of Critical CDC warnings, and the warnings triggered by each core (CDC-1, CDC-2, etc.). Critical warnings, i.e., the warnings that Vivado classified as especially critical and requiring user intervention, have been marked with an asterisk. A description of these warning IDs is shown in Table 10.
Of the warnings shown in Tables 9 and 10, at least CDC-15 can be considered informational only. This warning is generated by Vivado when a clock-enable controlled CDC structure is detected. In this CDC structure, a control signal (e.g., a "valid" signal) is synchronized to the destination clock using a synchronizer chain and often converted to a pulse. This signal is then used as the clock-enable signal of flip-flops that sample a multi-bit signal into the destination clock domain without any other synchronizing logic (see the sketch below). The correct operation of this structure depends on the surrounding logic to ensure that the multi-bit signal does not change during sampling (e.g., by employing a handshake pattern). Thus, the CDC-15 warning flags the considered CDC for review. It does, however, not indicate the detection of a potentially dangerous design pattern. In the same way, CDC-2 can be considered (relatively) benign. Xilinx suggests informing the implementation tool of a register used in a synchronizer by setting the ASYNC_REG property of the corresponding RTL signal. This prevents, e.g., absorbing these flip-flops into non-CDC-capable FPGA resources such as LUT-based shift registers (SRL16 and SRL32). A missing ASYNC_REG property indicates that this might happen in future implementation runs, even if it did not happen in the current run and the CDC was correctly detected.
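A minimal sketch of such a clock-enable controlled crossing is shown below (names are illustrative; the source-domain logic that keeps data_src stable while the pulse propagates is not shown).

    // Clock-enable controlled CDC: only the single-bit control signal
    // is synchronized; the multi-bit data is sampled unsynchronized,
    // gated by the synchronized pulse. This is the structure behind
    // the informational CDC-15 warning.
    module ce_controlled_cdc #(parameter W = 16) (
        input  wire         dst_clk,
        input  wire         valid_src,  // level set in the source domain
        input  wire [W-1:0] data_src,   // held stable by source logic
        output reg  [W-1:0] data_dst
    );
        (* ASYNC_REG = "TRUE" *) reg v1 = 1'b0, v2 = 1'b0;
        reg v3 = 1'b0;
        wire load = v2 & ~v3;           // one-cycle pulse in dst domain

        always @(posedge dst_clk) begin
            v1 <= valid_src;
            v2 <= v1;
            v3 <= v2;
            if (load)
                data_dst <= data_src;   // clock-enable controlled sampling
        end
    endmodule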
More serious, although noncritical, warnings are CDC-5 and CDC-6. These warnings are generated if a bus is synchronized using a separate synchronizer for each bit. If the bits of the bus are not required to be consistent in each clock cycle after synchronization, this may be acceptable. The warnings indicate that the respective RTL should be reviewed to ensure that this is the case. The difference between CDC-5 and CDC-6 is that CDC-5 additionally warns about missing ASYNC_REG properties.
Finally, CDC-26 is a rather technology-specific warning. It indicates that the write port of a LUTRAM and the flip-flop latching its output are clocked with two different clocks, effectively forming a CDC. As reading of the LUTRAM in 7 Series FPGAs happens asynchronously [49], there exists a path that crosses directly from the write to the read clock domain in the case that the read and write addresses are set to the same value. If the user logic ensures that this never happens, a LUTRAM can be used safely for a CDC [48]. Warning CDC-26 thus informs the user to review that this is always the case.
If the analysis tool cannot match the logic on a path between different clock domains to a known "safe" CDC pattern or if it recognizes an "unsafe" pattern, Critical CDC warnings are generated. These are marked in Table 9 with an asterisk following the warning ID.
If the logic on a CDC path cannot be matched to a single- or multiple-bit synchronizer and also does not match an enable-, multiplexer-, or BRAM-based design pattern, the Critical warnings CDC-1 (for single-bit signals) and CDC-4 (for multi-bit signals) are generated. If the asynchronous reset or set signal of a flip-flop is concerned instead of its data (D) or clock enable (CE) input, the CDC-7 Critical warning is generated.
Critical warnings CDC-10, CDC-11, and CDC-12 inform about synchronizers that have been recognized but violate "best practices," e.g., that the input of a synchronizer should directly originate from a flip-flop in the source clock domain and not from combinational logic, as the latter may introduce glitches into the synchronizer, reducing the Mean Time Between Failures (MTBF).
Finally, CDC-13 and CDC-14 are technology-specific critical warnings, informing the user that a CDC is present between a flip-flop of the source clock domain and another technology element that is clocked by the destination clock. This may, for example, happen if a synchronizer chain (with no reset of the flip-flops) is not constrained with the ASYNC_REG attribute. In this case, the chain of flip-flops may be interpreted as a shift register by the implementation tool and mapped to a LUT-based shift register (SRL16 or SRL32 [49]), especially if more than two flip-flops are used. This is also the case if a flip-flop in the source clock domain directly feeds into a port of a memory block clocked by the destination clock.
The warnings generated by Vivado's CDC analysis indicate potential problems in the circuit; however, not all warnings correspond to actually critical circuitry. Thus, besides the actual count of warnings in the respective category, an in-depth analysis of the reported paths is necessary to judge the quality of the implemented CDCs. In the following, a brief description of the sources of the CDC warnings is provided for each analyzed core or core variant.
(i) Ariane Ethernet causes relatively few CDC warnings to be generated during OOC synthesis. The CDC-7 and CDC-10 warnings concern 3-stage reset synchronizers, which contain combinational logic to OR multiple reset sources either before the first stage or between the first and the second stage. The CDC-13 warning refers to an intended synchronizer that is, however, not constrained as such. This register chain is mapped by Vivado to a LUT-based shift register, which is not recommended by Xilinx for implementing synchronizers. Finally, the tool warns about multi-bit signals that are synchronized from one clock domain into another; here, this concerns Gray-coded pointers in the RX and TX FIFOs, a commonly used design pattern (a minimal sketch is shown at the end of this section). (ii) Gaisler's GRETH employs clock-enable controlled structures for its many clock domain crossings. Here, a "valid" signal from the RX or TX clock domain is intended to be the signal that controls the transfer of wider buses in a "CE-controlled" fashion. However, the synchronizer is coded at RTL in a way that prevents Vivado from inferring a clean chain of flip-flops. Rather, it instantiates flip-flops with combinational logic in between that are capable of setting the individual stages to a particular reset value. This leads to the intended synchronizer not acting as such and prompts the reported CDC-1 warnings. Furthermore, as the control signal is now not synchronized properly, the "CE-controlled" pattern is also not recognized as such, and the respective paths are reported as critical CDC-4 warnings. Overall, there are nevertheless very few paths criticized by Vivado's CDC analysis. (vi) There are several clocking issues with the Opencores Ethernet Tri Mode project. Firstly, the design is constructed in a way that both the MII clock and a clock with half its frequency are needed. The provided design unit for dividing this clock does so using a flip-flop. It is noted in a comment that this unit is intended "for simulation only" and needs to be "replaced according to technology." However, the supplied example implementation for Virtex 5 also uses this exact design unit. Due to the clock skew incurred by this clock divider, the two clocks are considered asynchronous to each other. As these clocks are mutually exclusive, however, this leads to no additional clock domain crossing warnings.
Another potential problem is the use of Xilinx's BUFGMUX primitive as a clock multiplexer. On 7 Series FPGAs, this uses the BUFGCTRL resource with the clock-enable pins (CE0 and CE1) used to select between the two source clocks. However, using the CE pins on BUFGCTRL may cause glitches on the output clock if the signal driving the CE inputs violates the setup/hold time of either source clock [47]. This leads to clock domain crossings between the clock driving the select inputs and both input clocks to be selected using the clock multiplexer. These are reported as critical CDC-13 warnings. As an alternative, the BUFGMUX_CTRL macro switches the clocks glitch-free using the S0 and S1 inputs [47]. However, this alternative can only be used if both input clocks are free-running, which may not be applicable in this situation (when switching over from MII to GMII, the MII TX clock may already be disabled).

One general pattern that can be found in all of the Verilog-Ethernet variants is that some reset synchronizers either are driven by combinational logic that ORs multiple reset sources or use combinational logic between the first and second stage of the usually three-stage synchronizers, explaining the CDC-7 and CDC-10 warnings. The multi-bit CDCs reported (as CDC-5 and CDC-6 warnings) for these variants are generally either Gray-coded pointers in asynchronous FIFOs or two status signals (error_bad_fcs and error_bad_frame). In the latter case, these are directly connected to top-level outputs, shifting the responsibility to deal with potential inconsistencies between the two signals during one interface clock cycle to the user. The CDC-12 warning in the GMII variant of Verilog-Ethernet is caused by a reset synchronizer that receives ORed reset signals from the TX and user clock domains, while the CDC-13 warning is once again the result of using the asynchronously switching BUFGMUX clock multiplexer. Finally, the CDC-26 warnings in the MII, GMII, and RGMII variants are generated for paths from reset input pins to the reset inputs of flip-flops of the respective reset synchronizers. These warnings are only generated by Vivado when "false-path" constraints are set for the reset inputs of these flip-flops, which is done by the constraint files for the respective PHY interfaces supplied by the Verilog-Ethernet project. When the set_false_path constraints are removed, the CDC-26 warnings are no longer generated. As described earlier, the CDC-26 warning informs the designer to review a certain path that uses LUTRAM as a synchronization element, which is only valid if the surrounding logic prohibits the read and write addresses of the LUT from carrying the same value. As no LUTRAM is involved in the paths for which the CDC-26 warnings are generated (in fact, no LUTRAM is used in any of the three designs, as can be seen in Table 4) and the warnings vanish together with the "false-path" constraints, these particular warnings appear to be spurious.

In summary, three groups of cores can be identified based on the analysis of their clock domain crossings: (1) Cores that seem to correctly handle the CDCs present in their respective designs, leading to very few to no problems during CDC analysis. If there are (critical) warnings at all, they concern (relatively) benign patterns like missing ASYNC_REG constraints (which are solvable without changing the source code by the use of a constraint file) or warnings concerning combinational logic feeding into reset synchronizers (e.g., ORing multiple reset inputs).
Ariane Ethernet, LeWiz LMAC1 (AXI variant), Litex Liteeth, NFMAC10G, Opencores Minimac, Opencores XGE_MAC, and all variants of the Verilog-Ethernet project fall into this category. (2) Projects that contain best-practice clock domain crossing logic that is, however, not correctly implemented by the synthesis tool due to missing constraints or potentially due to coding style. These problems may be fixed by additional constraints via constraint files but may also require restructuring the HDL code to make the intended CDC pattern clear to the synthesis tool. Gaisler's GRETH falls into this category. (3) Finally, some projects contain a large number of paths between clock domains, and at least some of these paths have been shown not to include (correct) synchronization logic. Some of these paths may have been considered "safe" by the authors, as they, for example, "change only seldomly" (code comment in P. Kerling's Ethernet MAC) or "[are] available long time before its actual use" (in the "Ethernet IP Core Design Document" for Opencores Ethmac). This should then, however, be documented clearly and on a per-path basis, either in the form of constraints (analysis tools often allow "waiving" CDC warnings on specific paths) or in long-form documentation. In some cases, clearer documentation of which clock inputs are considered to be in phase with each other would also be desirable. It might even be necessary, if a core in this category is to be used, to review the identified potentially unsafe CDCs in detail and patch the core's HDL code with safer CDC patterns. Opencores Ethernet Tri Mode and Ethmac, as well as P. Kerling's Ethernet MAC, WGE 100, and WhiteRabbit, fall into this category.
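As referenced above, several of the analyzed FIFOs cross their read and write pointers between clock domains using Gray coding. The following minimal sketch (illustrative names, not from any specific core) shows why this is safe: consecutive Gray codes differ in exactly one bit, so a pointer sampled mid-change is either the old or the new value, never an inconsistent mix.

    // Gray-coded pointer transfer between two clock domains.
    module gray_ptr_sync #(parameter W = 4) (
        input  wire         src_clk, dst_clk,
        input  wire [W-1:0] ptr_bin,      // binary pointer, source domain
        output wire [W-1:0] ptr_bin_dst   // pointer usable in dst domain
    );
        reg [W-1:0] ptr_gray = 0;
        (* ASYNC_REG = "TRUE" *) reg [W-1:0] g1 = 0, g2 = 0;

        always @(posedge src_clk)
            ptr_gray <= ptr_bin ^ (ptr_bin >> 1);   // binary -> Gray

        always @(posedge dst_clk) begin
            g1 <= ptr_gray;                         // per-bit 2-FF sync
            g2 <= g1;
        end

        // Gray -> binary: each bit is the XOR of all higher Gray bits
        genvar i;
        generate
            for (i = 0; i < W; i = i + 1)
                assign ptr_bin_dst[i] = ^(g2 >> i);
        endgenerate
    endmodule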
Finally, while the analysis presented above may be used to guide the selection of a MAC IP core, it is certainly beneficial for the user of any of these cores to perform a CDC analysis of the entire design, which also helps to identify problems that may arise at the interface between the user logic and the selected core.

4. Evaluation Scope
In addition to the analysis of the OOC synthesis results (i.e., resource, linting, and CDC analysis results) discussed in Section 3, a subset of the identified MAC cores was also subjected to an evaluation in physical hardware. This allows testing the interoperability with known-working network hardware (i.e., PC Ethernet interfaces) and measuring the performance of the evaluated cores. In terms of performance, the receive and transmit latency of the cores as well as their supported throughput were measured.
This evaluation was carried out on a subset of the cores shown in Table 4. The subset was selected based on the following criteria: (i) C1 (Ethernet speed): Due to their ubiquity in today's network infrastructure, especially in the context of embedded systems, the cores to be evaluated shall support 10/100 Mbit/s or 1 Gbit/s Ethernet speeds. This precludes MACs from evaluation that only support Ethernet speeds above 1 Gbit/s (e.g., 2.5 Gbit/s, 10 Gbit/s, and higher), such as LMAC2 and LMAC3, as well as NFMAC10G, XGE_(LL)_MAC, and the 10 Gbit/s variant provided by Verilog-Ethernet. (ii) C2 (documentation): The evaluated MACs shall provide sufficient documentation and/or example code that provides guidance on how to port the core to a new hardware platform. Both Ariane-Ethernet and An Ethernet Controller provide little to no documentation or implementation examples at all, and thus these cores were not evaluated. Furthermore, a detailed analysis of the source code of Ariane-Ethernet revealed that its Ethernet component heavily draws on an older version of Verilog-Ethernet's source code and basically only adapts its AXI-Stream interface to a memory-mapped control/status register and frame buffer interface. (iii) C3 (porting effort): The porting process to a "standard" 10/100 Mbit/s or 1 Gbit/s Ethernet platform shall involve no extensive implementation work other than (1) porting memory-based components (e.g., FIFOs) as well as clock and I/O components to the target FPGA and (2) the development of an interface module that allows the MAC to loop back received data with slight changes (e.g., inverting every received byte). This precludes three cores or variants from evaluation; among them, the variants of LeWiz's LMAC that use an inferred FIFO lead to massive usage of FPGA resources, and thus only the variants in which these FIFOs were replaced by an instantiated XPM FIFO were considered.
The selection process can be seen in Table 11. Thus, we arrive at eight projects to be evaluated in total (in ten variants; two each for Litex Liteeth and Verilog-Ethernet). Of these eight projects, a subset of seven projects was evaluated on a 100 Mbit/s Ethernet capable platform. Another subset of six projects was evaluated on a 1 Gbit/s capable platform, with an overlap of five projects (Litex Liteeth, Opencores Ethernet Tri Mode, P. Kerling's Ethernet MAC, Verilog-Ethernet, and WGE 100) that were evaluated on both platforms.

Evaluation Setup
In order to perform the function, throughput, and latency tests in a hardware implementation of the cores selected in Section 3, an evaluation platform is needed. The main task of this platform is to instantiate the respective core under test, supply it with the required clock signals, configure the core via its control interface, and finally operate its data interface. Furthermore, the platform needs to adapt the individual MAC's PHY interfaces to the PHY available on the board to be used. In this section, Section 5.1 describes the general setup of the evaluation platform, and Section 5.2 details the MAC-specific hardware needed to operate each core, as well as adaptations made in order to be operable with the test platform.

Approach and Evaluation Platforms.
Evaluations were carried out on two different Xilinx-based platforms: Cores capable of 100 Mbit/s Ethernet speed were evaluated on a Digilent BASYS3 development board, which is based on an Artix 7 FPGA. As this board does not provide Ethernet connectivity by itself, an external RMII PHY, a Microchip LAN8720 [50], was connected to it. Cores capable of 1 Gbit/s Ethernet speed were additionally evaluated on a Kintex Ultrascale-based AVNET KU040 development board. This board provides two Texas Instruments DP83867 Gigabit-capable RGMII PHYs [51].
The main functionality of the harness implemented on both platforms is to loop back received frames on the user-side interface, allowing frames received from an external Ethernet device to be sent back. As each MAC provides a different user interface (some provide AXI-Stream or FIFO interfaces, others access to memory-mapped buffers, and others require DMA buffers), this module needs to be implemented for each MAC individually. In order to allow differentiation between frames sent by the external device and those looped back by the MAC under test in a packet capture file, this interface block negates each data word in the loopback process. This furthermore has the effect that the MAC has to calculate a new FCS, so this functionality is also covered by the performed evaluations.
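As an illustration, the following minimal Verilog sketch shows the idea of such a loopback block for a core with AXI-Stream user interfaces (the port names are generic assumptions; the actual interface blocks differ per MAC and, depending on the MAC, may additionally require an elastic buffer so that the TX path is not starved mid-frame):

    // Illustrative loopback block: returns each received frame with all
    // data bits inverted, so looped-back frames are distinguishable in a
    // packet capture and the MAC must compute a fresh FCS.
    module axis_invert_loopback #(
        parameter W = 8
    )(
        // RX stream from the MAC (received frames)
        input  wire [W-1:0] rx_tdata,
        input  wire         rx_tvalid,
        input  wire         rx_tlast,
        output wire         rx_tready,
        // TX stream back into the MAC (frames to be transmitted)
        output wire [W-1:0] tx_tdata,
        output wire         tx_tvalid,
        output wire         tx_tlast,
        input  wire         tx_tready
    );
        assign tx_tdata  = ~rx_tdata;   // bitwise inversion of each data word
        assign tx_tvalid = rx_tvalid;
        assign tx_tlast  = rx_tlast;
        assign rx_tready = tx_tready;
    endmodule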
Additionally, our platform supports the intended latency measurements by taking timestamps at certain points in the receive and transmit paths of the respective cores, as well as the measurement of the received Inter-Frame Gap (IFG). Measuring the received IFG allows a rough estimate of the throughput achieved by the control PC and is done by counting clock cycles between deassertion and re-assertion of the respective RMII or RGMII valid signal. If the PC achieves the maximum bandwidth specified by the respective Ethernet standards, the expected IFG is 960 ns for 100 Mbit/s Ethernet and 96 ns for 1 Gbit/s Ethernet; if the achieved bandwidth is lower due to factors influenced by the packet generator software, operating system, or network interface hardware, a longer received IFG is expected. For latency measurements, timestamps are taken in the receive path at the reception of special Ethernet frames on the PHY interface and at the first indication of the core's user-side interface that a new frame is available. In the transmit path, timestamps are taken when a frame to be transmitted is placed into the responsibility of the MAC and at the time the frame becomes visible on the PHY interface. The exact point where the timestamp for received and transmitted frames is taken differs between the RMII (100 Mbit/s) and RGMII (1 Gbit/s) versions. In the latter case, the timestamp is taken after the conversion of RGMII to GMII. This is done because RGMII requires special DDR I/O resources that are directly connected to an FPGA pin and thus cannot fan out to a second receiver. Furthermore, the PHY employed in the RGMII variant requires some configuration that is sent via MDIO. This is not necessary with the RMII PHY used.
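A minimal Verilog sketch of such an IFG counter could look as follows (signal names are generic assumptions; at the 50 MHz RMII reference clock, the nominal 960 ns IFG corresponds to 48 cycles, and at the 125 MHz clock of the RGMII/GMII path, 96 ns corresponds to 12 cycles):

    // Illustrative IFG measurement: count clock cycles while the PHY-side
    // "data valid" signal is deasserted; latch the count on re-assertion.
    module ifg_counter (
        input  wire        clk,             // PHY-side clock
        input  wire        data_valid,      // e.g., RMII CRS_DV or GMII RX_DV
        output reg  [31:0] last_ifg_cycles = 0
    );
        reg        dv_q = 1'b0;
        reg [31:0] cnt  = 0;

        always @(posedge clk) begin
            dv_q <= data_valid;
            if (!data_valid)
                cnt <= cnt + 1;                  // gap in progress
            else if (!dv_q) begin                // rising edge: gap has ended
                last_ifg_cycles <= cnt;          // latch measured gap length
                cnt             <= 0;
            end
        end
    endmodule

In the actual platform, such per-gap values would additionally be averaged and made available to the control PC via the serial interface.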
A final task of our hardware platform implemented on the target FPGA is to adapt the PHY interface provided by each core to the PHY interface provided by the board (RMII for 100 Mbit/s and RGMII for 1 Gbit/s) in case the respective core does not support the interface supplied by the board. A block diagram of our platform can be found in Figure 2.
The function, throughput, and latency tests are carried out by a host PC (3.2 GHz Intel i5-3470U CPU with 4 cores, 8 GB RAM, Intel 82571EB/GB PCIe Gigabit Ethernet NIC) running Debian GNU/Linux 11 (Kernel 5.10.0-14). In order to disable automatic IFG adaptation and to ensure an IFG that does not fall below the one specified by the relevant Ethernet standards, a patched e1000e Linux driver was used for the NIC.
Function as well as throughput is tested by transmitting 100000 Ethernet frames of various sizes (48 bytes (padded to 64 bytes by the PC's network stack according to the Ethernet standard), 64, 128, 256, 512, 1024, 1400, and 1518 bytes (the maximum standard Ethernet frame length)) from the PC to the MAC under test. This is done with each of the packet generator programs packETH (command-line version) (https://github.com/jemcek/packETH, accessed: May 5, 2023) and trafgen (part of the netsniff-ng toolkit, see https://netsniff-ng.org/, accessed: May 5, 2023) at the maximum rate the host PC can achieve. Concerning throughput, it must be noted that not only the throughput of the MAC but also that of the MAC-specific interface hardware modules implemented for this work is tested. Thus, if frame loss does occur, the evaluated MAC might not be the single culprit.
For transmission with packETH, first a PCAP file containing the frame to be sent is generated by a Python script using the Scapy module. This frame contains the Ethernet header to be sent and a payload that brings the entire frame up to the size to be tested (see above). The payload is initialized with pseudo-random data. Subsequently, this frame is handed to packETH, which replaces the first bytes of the payload with a sequence number, incremented for each sent frame, and a short ASCII text containing the name of the MAC.

Figure 2: Block diagram of evaluation platform.

Testing with trafgen is done to corroborate the results obtained with packETH. Here, the same frame sizes are sent, however, with a payload consisting of a repetition of the same byte (0x4C) and without counters. This would, in theory, allow higher transmission rates. As trafgen by default uses multiple CPU threads to generate frames, this would impact the CPU time available for the packet capture tool. In order to minimize frames dropped due to high CPU utilization, trafgen was restricted to using only one thread.
The MACs, with the help of the purpose-built interface hardware, loop back each received frame to the PC. Each data byte is inverted in the interface hardware in order to allow differentiation between frames sent by the PC and those sent by the MAC. In parallel, the IFG between incoming frames is sampled and can be queried by the PC via the serial interface. This is done so that the average throughput actually achieved by the PC during testing is known.
The PC, in turn, records every returning frame using the packet capture program tcpdump into a PCAP file. This file can subsequently be analyzed to check whether (1) the appropriate number of frames has been sent back (or frame loss occurred, for example, due to limited throughput in the MAC or erroneous FCS calculation), (2) each frame has the expected frame length, and (3) each frame contains the expected data. One difference between the results for tests with packETH and trafgen is that only with the former can outgoing as well as incoming frames be captured using tcpdump. While both tools use a raw socket in the PF_PACKET protocol family, packETH uses the sendto(2) system call to send data, whereas trafgen places frames directly into a buffer shared with the kernel. Thus, only incoming frames show up in tcpdump's PCAP files when transmitting with the latter packet generator.
For the latency tests, 128 individual frames for each frame size are sent using Scapy. The RX frame detector monitors for the first 24 bytes of a special Ethernet frame and takes a timestamp when it is detected. Subsequently, the MAC interface block takes timestamps when a frame becomes available on the MAC's RX interface as well as when the frame is placed into the TX interface. Finally, another TX frame detector monitors for the start of the looped-back special frame and takes a timestamp when it is observed. These timestamps are continuously transmitted by the MicroBlaze CPU to the host PC, which monitors them on a serial interface.
A sequence of events for the reception of a frame can be seen in Figure 3, and that for the transmission of a frame can be seen in Figure 4. The reported reception latency for each MAC is calculated as the difference between the interface timestamp and the MII timestamp, minus the transmission duration that remains after the frame has been detected by the RX frame detector. In a similar way, the transmission latency is calculated from the individual timestamps.
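Expressed compactly (notation ours, not taken from the cited figures), the reported receive latency is

    latency_RX = (t_IF,RX - t_MII,RX) - t_residual

where t_MII,RX is the timestamp taken by the RX frame detector on the PHY side, t_IF,RX is the timestamp taken at the core's user-side interface, and t_residual is the portion of the frame's transmission time that remains after the detector has matched the first 24 bytes. The transmit latency is computed analogously from the TX interface and TX frame detector timestamps.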

Evaluated Cores.
In the following, the cores selected for evaluation as shown in Table 11 are described in further detail. In addition, the MAC interface module developed for each core and the adaptations necessary to port the core to the target platforms are described. Finally, information about each core's receive and transmit sequence is provided, along with information about the points where the timestamps for the latency measurements are taken.
The following MACs were evaluated on the 100 Mbit/s Ethernet platform (BASYS3, RMII, and LAN8720). All MACs were instantiated in their MII variant (even if others were available) and interfaced to the RMII LAN8720 PHY using Xilinx's MII to RMII converter core.
(i) Gaisler's GRETH MAC is part of Gaisler's IP core library GRIP. It provides an AMBA AHB/APB memory-mapped interface on the user side and multiple MII variants on the PHY side. An overview of the wrapper can be seen in the block diagram in Figure 5. This MAC handles receiving and transmitting frames via DMA. It requires the user to set up DMA descriptors for received and transmitted frames, containing the memory locations where received frames are to be written and from where frames to be transmitted are to be read. Thus, in addition to this setup, which is done via an APB slave interface, the core also requires access to system memory via its DMA port, implemented as an AHB master. This memory contains both the stored frame data and the DMA descriptors. Furthermore, a mechanism is required that monitors the status of the RX and TX DMA descriptors in order to perform the intended loopback functionality in hardware. This mechanism must transfer the addresses and frame lengths of RX DMA descriptors, once they hold a received frame, to TX DMA descriptors and must re-enable the RX and TX descriptors when they are no longer used. The configuration of the MAC's APB registers is done via a state machine. For holding the DMA descriptors as well as the frame data, a custom AHB module is employed. This module infers memory from VHDL for the storage of the data elements discussed above. In addition, however, the AHB bus is monitored: when an RX descriptor is written back with the frame length and status value, the corresponding address is additionally written into a FIFO, and the next available RX buffer (of 8 available buffers) is enabled. The FIFO of frames available to be transmitted back is queried every time the core requests a new TX descriptor. Timestamps are taken in hardware when (1) the DMA engine writes back an RX descriptor with status and length, indicating that a new RX frame is available in system memory, and (2) a new TX descriptor is enabled, indicating that a frame to be transmitted has been placed in the responsibility of the DMA engine. Furthermore, an informational timestamp is taken when the TX descriptor is written back, indicating that the transmission is done.
(ii) Litex Liteeth is a MAC written in Migen, a Python-based environment for describing hardware, which can be exported to VHDL and Verilog. Different user interfaces and capabilities are provided by the project, among others a Wishbone memory-mapped interface that provides access to the received and transmitted Ethernet frames, as well as an Etherbone [52] compatible interface that provides a UDP-to-Wishbone bridge. For this evaluation, the Wishbone-based bare-metal Ethernet variant was used. This interface provides access to a set of control and status registers as well as internal RAM buffers for two received frames and two frames to be transmitted. This means that for looping back a received frame, it has to be copied from the RX to the TX buffer.
Handling the control interface, as well as copying the frames to be sent back, is done by a Finite State Machine (FSM). It first sets up the control registers (configuring the current slot to receive and enabling reception) and then waits for an RX interrupt to occur. Subsequently, the state machine reads the frame length, copies the data to be sent back, and finally writes the TX frame length and enables transmission. The presence of two buffers means that while one frame is being transmitted, another one can be received at the same time. An overview of the wrapper can be seen in Figure 6. The interface RX timestamp is taken when an RX interrupt is received by the core, indicating that a new frame is available in the corresponding buffer. The TX timestamp is taken when enabling the "SRAM Reader," i.e., the component that reads the frame to be transmitted from the internal transmit buffer.
(iii) The control interface provided by the Opencores Ethernet Tri Mode MAC is a memory-mapped parallel interface to control/status registers, while the received and transmitted frames are exchanged over a FIFO interface, i.e., an interface consisting of a read/write signal, a 32-bit data bus, a "byte-enable" signal that indicates how many bytes on this bus are valid, and a set of three status signals (start-of-frame, end-of-frame, and data available/free). Concerning the configuration, the high and low watermark values for the receive and transmit FIFOs need to be set via the register interface, as well as the Ethernet speed. This is done by a state machine before operation, after which frames are simply looped back by connecting the TX FIFO interface back-to-back with the RX FIFO interface. The RX timestamp is taken on the rising edge of the RX FIFO's "data available" signal, indicating that a received frame is present on the FIFO interface. In turn, the TX timestamp is taken on the rising edge of the end-of-frame signal that feeds into the TX FIFO interface.
(iv) Opencores Ethmac is controlled by a Wishbone slave interface that allows access to a bank of configuration registers. Furthermore, this slave interface includes a memory-mapped buffer area into which DMA descriptors are written. Received frames are stored in external memory using a Wishbone master, which also fetches the frames to be sent from external buffers. A principle similar to Gaisler's GRETH is followed for looping back frames received via this core: a state machine configures the MAC initially (i.e., it sets the configuration registers and configures an initial RX DMA descriptor). Subsequently, the interrupt line is monitored until a new frame is received. The state machine then reacts by (1) setting the pointer in the RX DMA descriptor to the next buffer segment and re-enabling it so that the next frame can be received and (2) storing the buffer address and frame length of the currently received frame in a FIFO. Once the RX DMA descriptor has been enabled, the state machine checks whether the MAC's transmitter is ready to send a new frame. If this is the case, the next buffer address and frame length are retrieved from the FIFO and configured as a TX DMA descriptor. Once this descriptor has been updated and enabled, the MAC transmits the stored frame back to the PC. For this purpose, an XPM FIFO has been instantiated in the wrapper module, and a Wishbone-compatible byte-enabled RAM buffer (for storing received frames) has been inferred. An overview of the wrapper can be seen in Figure 7. The receive timestamp is taken when the state machine handles an RX interrupt, while the transmit timestamp is taken when the handing off of a frame to the transmitter is finished (by loading the next TX DMA descriptor).
(v) P. Kerling's Ethernet MAC needs minimal configuration (e.g., the Ethernet speed if auto-negotiation via MDIO is not used) on some RTL ports. In the MAC-with-FIFOs variant, the data interface is a relatively straightforward FIFO interface consisting of conventional read/write, 8-bit wide data, and empty/full signals. However, as the frame length is stored in the same FIFO, there is a protocol to be followed when reading a received frame or writing a frame to be sent: the first two bytes contain the frame length; subsequently, the frame content is stored. This protocol is implemented in a state machine that communicates with the core via these FIFO ports (a minimal sketch of such a protocol is given after this list).
Several components needed to be ported to the Artix 7 based target platform in order to evaluate this core: (1) For triple-speed operation, the core uses clock multiplexers to select between the MII and GMII clock signals. As it is used in single-speed mode for our evaluations, this construct was removed. (2) The core provides FIFO components as Xilinx ISE IP core files. These needed to be upgraded to be usable with Vivado for Artix 7.
(3) Some I/O components needed to be removed or deactivated. The original core is intended to directly interface to a GMII PHY. It directly instantiates I/O buffers in an attempt to guarantee a certain clock-to-input/output delay and to make sure that some registers are packed directly into the I/O resources. However, as the core is instantiated in a wrapper and communicates with the PHY via an MII-to-RMII converter, these direct instantiations of I/O resources are no longer needed and are in fact not accepted by the implementation tool.
The RX timestamp is taken when the state machine waits for the RX FIFO to indicate that it is no longer empty, while the TX timestamp is taken when writing the last byte to be sent to the TX FIFO.
(vi) All MACs provided by the Verilog-Ethernet project allow only minimal configuration (the IFG to be transmitted, set via an RTL port). They provide two AMBA AXI-Stream interfaces for received frames and frames to be transmitted. These interfaces include the standard AXI-Stream TLAST signal to indicate the end of a frame, while the beginning of a frame is assumed once data first become available after a reset and after the end of the previous frame has been indicated. On the RX interface, data only become available once the frame checksum has been checked. Thus, once the core signals valid data on its RX AXI-Stream interface, the frame is already known to carry a correct FCS.
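To illustrate the length-prefixed FIFO protocol of P. Kerling's Ethernet MAC described in item (v), the following minimal Verilog sketch shows a possible write side of such a protocol (port names, the byte order of the length field, and handshake details are assumptions for illustration; the core's actual ports differ):

    // Illustrative writer for a length-prefixed TX FIFO: the first two
    // bytes written carry the frame length, the payload bytes follow.
    module tx_fifo_writer (
        input  wire        clk,
        input  wire        rst,
        input  wire        start,          // request to queue one frame
        input  wire [10:0] frame_len,      // payload length in bytes
        input  wire [7:0]  payload_byte,   // next payload byte (external)
        output reg         payload_rd = 0, // advance to next payload byte
        output reg  [7:0]  fifo_wr_data = 0,
        output reg         fifo_wr_en   = 0,
        input  wire        fifo_full
    );
        localparam IDLE = 2'd0, LEN_HI = 2'd1, LEN_LO = 2'd2, DATA = 2'd3;
        reg [1:0]  state     = IDLE;
        reg [10:0] remaining = 0;

        always @(posedge clk) begin
            fifo_wr_en <= 1'b0;
            payload_rd <= 1'b0;
            if (rst)
                state <= IDLE;
            else case (state)
                IDLE:   if (start) begin
                            remaining <= frame_len;
                            state     <= LEN_HI;
                        end
                LEN_HI: if (!fifo_full) begin
                            fifo_wr_data <= {5'b0, remaining[10:8]}; // high byte
                            fifo_wr_en   <= 1'b1;
                            state        <= LEN_LO;
                        end
                LEN_LO: if (!fifo_full) begin
                            fifo_wr_data <= remaining[7:0];          // low byte
                            fifo_wr_en   <= 1'b1;
                            state        <= DATA;
                        end
                DATA:   if (!fifo_full) begin
                            fifo_wr_data <= payload_byte;
                            fifo_wr_en   <= 1'b1;
                            payload_rd   <= 1'b1;
                            remaining    <= remaining - 1;
                            if (remaining == 1)
                                state <= IDLE;
                        end
            endcase
        end
    endmodule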
Both Litex Liteeth and Verilog-Ethernet provide support for RGMII, and thus no adapter was needed between the MAC and the PHY for these cores. However, the other MACs only support GMII. These cores were adapted to the available RGMII PHY by using the applicable converter provided by the Verilog-Ethernet project.
Furthermore, for the operation of the 1 Gbit/s PHY, implementing an MDIO interface was required in order to configure the PHY to 1 Gbit/s Ethernet speed. This was done for all evaluated MACs using code provided in an example from the Verilog-Ethernet project.
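For reference, a Clause 22 MDIO write transaction consists of a 32-bit preamble, start and opcode fields, the PHY and register addresses, a turnaround field, and 16 data bits, shifted out with the data changing on the falling MDC edge. The following is a minimal, hypothetical write-only master (not the Verilog-Ethernet code; a real implementation additionally needs a tristate MDIO pin for read access and the DP83867-specific register values):

    // Illustrative write-only MDIO master (IEEE 802.3 Clause 22).
    module mdio_write #(
        parameter DIV = 32             // MDC = clk/(2*DIV); keep MDC <= 2.5 MHz
    )(
        input  wire        clk,
        input  wire        start,
        input  wire [4:0]  phy_ad,     // PHY address
        input  wire [4:0]  reg_ad,     // register address
        input  wire [15:0] wr_data,
        output reg         mdc  = 1'b0,
        output reg         mdio = 1'b1,
        output reg         busy = 1'b0
    );
        reg [63:0] shreg;
        reg [6:0]  bits = 0;
        reg [7:0]  div  = 0;

        always @(posedge clk) begin
            if (start && !busy) begin
                // 32x'1' preamble, ST=01, OP=01 (write), PHYAD, REGAD, TA=10, data
                shreg <= {32'hFFFF_FFFF, 2'b01, 2'b01, phy_ad, reg_ad, 2'b10, wr_data};
                bits  <= 7'd64;
                busy  <= 1'b1;
                div   <= 0;
            end else if (busy) begin
                if (div == DIV - 1) begin
                    div <= 0;
                    mdc <= ~mdc;
                    if (mdc) begin               // falling MDC edge: shift next bit
                        mdio  <= shreg[63];
                        shreg <= {shreg[62:0], 1'b0};
                        bits  <= bits - 1;
                        if (bits == 1)
                            busy <= 1'b0;
                    end
                end else begin
                    div <= div + 1;
                end
            end
        end
    endmodule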
The following significant adaptations were needed for performing the evaluations on the 1 Gbit/s Ethernet platform (KU040, RGMII): (i) LeWiz's LMAC1 was only evaluated on the 1 Gbit/s Ethernet platform. Thus, an additional interface module tailored to the core's interface had to be implemented. The configuration of the core (i.e., Ethernet speed, local MAC address, and support for broadcast frames and promiscuous mode) is done via RTL ports. As a data interface on the user side, this MAC provides a FIFO interface. However, it provides not only RX and TX FIFOs but also a FIFO that contains the packet lengths of received frames. For reception, the "frame length" FIFO must first be read out, followed by the appropriate number of words from the data FIFO. For transmission, first the frame length needs to be written to the TX FIFO, followed by the frame content. Handling the two receive FIFOs and the transmit FIFO was done using a state machine. This state machine also detects when the frame length and data FIFOs are inconsistent (when more data should be available in the data FIFO according to the frame length read earlier, but the data FIFO indicates emptiness). In this case, a reset of the core is triggered. Furthermore, while the core's documentation claims that GMII is supported, the core actually implements an XGMII-like interface. In contrast to standard XGMII, which is 32 bits wide, this interface is only 8 bits wide. It does, however, require start-of-frame and end-of-frame symbols indicated using a single "control" signal (standard XGMII has four control signals). This interface is adapted to GMII using a purpose-built interface module that inserts control symbols into received GMII transfers and removes them from outgoing transfers (a minimal sketch of the receive direction is given after this list). The RX timestamp is taken when the RX data FIFO's "empty" signal is deasserted, while the TX timestamp is taken by the state machine when all data have been written to the TX FIFO. An overview of the wrapper can be seen in Figure 8.
In addition, the inferred FIFOs provided by the core had to be replaced by instantiated XPM FIFOs, as was done in the OOC synthesis experiments. (ii) Litex Liteeth: As the RGMII interface provided by this core uses an IDELAY element, the instantiation of an IDELAYCTRL primitive is required somewhere in the design. The IDELAYCTRL primitive calibrates the delay taps of the FPGA's delay elements against a reference clock. This primitive is not instantiated by the core, but its presence is enforced by Vivado's design rule checks, and thus the interface module performs this instantiation. (iii) WGE 100: The NGC netlists for the FIFOs required by the core are not compatible with the Ultrascale design flow, as reported by Xilinx Vivado. In order to continue using the netlists, and thereby avoid the need to re-generate these IP cores, the NGC netlists were converted to Verilog netlists using Xilinx ISE 14.7.
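To illustrate the adaptation of GMII to the 8-bit, single-control-signal interface described for LMAC1 in item (i), the following minimal Verilog sketch shows the receive direction. The start, terminate, and idle symbol values (0xFB, 0xFD, and 0x07) are borrowed from XGMII and are an assumption for illustration, as are the port names:

    // Illustrative GMII-to-"XGMII-like" adapter (receive direction): delays
    // the data stream by one cycle and inserts start/terminate control
    // symbols at frame boundaries, flagged by a single control signal.
    module gmii_to_xgmii8 (
        input  wire       clk,             // 125 MHz GMII RX clock
        input  wire [7:0] gmii_rxd,
        input  wire       gmii_rx_dv,
        output reg  [7:0] x_data = 8'h07,  // idle symbol (assumed)
        output reg        x_ctrl = 1'b1    // 1 = control symbol, 0 = data
    );
        reg [7:0] d_q   = 8'h00;   // one-cycle data delay line
        reg       dv_q  = 1'b0;
        reg       dv_qq = 1'b0;

        always @(posedge clk) begin
            d_q   <= gmii_rxd;
            dv_q  <= gmii_rx_dv;
            dv_qq <= dv_q;
            if (gmii_rx_dv && !dv_q) begin      // frame start: emit start symbol
                x_data <= 8'hFB; x_ctrl <= 1'b1;
            end else if (dv_q) begin            // frame body: emit delayed data
                x_data <= d_q;   x_ctrl <= 1'b0;
            end else if (dv_qq) begin           // frame end: emit terminate symbol
                x_data <= 8'hFD; x_ctrl <= 1'b1;
            end else begin                      // otherwise: idle
                x_data <= 8'h07; x_ctrl <= 1'b1;
            end
        end
    endmodule

The inserted symbols stretch the stream by two cycles per frame, which fits into the inter-frame gap.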

Evaluation Results
As mentioned in Section 5.1, measuring the received IFG may serve as a rough quantification of the throughput achieved by the PC. These values were measured in hardware by counting the number of RMII or RGMII clock cycles between the deassertion and subsequent re-assertion of the respective "data valid" signal. Table 12 shows the average Inter-Frame Gaps (IFGs) and their standard deviations generated by the PC in the 100 Mbit/s experiments when using each frame generator program. It can be seen that (1) the generated IFGs are longer than the minimum transmitted IFG of 960 ns defined in IEEE 802.3-2018 and (2) there seems to be no significant difference in the average output data rate across frame lengths or between the two frame generators. Table 13 shows similar data for the 1 Gbit/s experiments. The IFGs here are once more longer than the minimum transmitted IFG of 96 ns defined in IEEE 802.3-2018. Furthermore, when the PC sends short frames (especially with packETH), the generated IFGs are significantly longer than when it sends frames in the range of 128 to 1400 bytes. Additionally, the variance is increased when sending especially short frames (48 and 64 bytes) compared to longer ones. At the maximum frame length (1518 bytes), longer average delays between frames have been measured. It can be seen that the maximum theoretical throughput allowed by the relevant Ethernet standards was not achieved during these tests. However, the throughput experiments can nevertheless yield information about the ability of the evaluated cores to sustain this "realistic" throughput, in addition to providing an overall function test against reference "known-good" Ethernet equipment.
For these throughput tests, as discussed in Section 5.1, 100000 Ethernet frames of varying sizes were sent to the MAC, which looped them back to the PC, where they were recorded for checking. When testing with packETH, the frames include an incrementing sequence number as well as pseudo-random data in the remaining payload. As outgoing as well as incoming frames can be captured when transmitting with packETH, the generated PCAP file could then be analyzed to verify that all sent frames were correctly looped back by the MAC. This was done by verifying the sequence number and the remaining payload against the sent data.
When transmitting data using trafgen, the outgoing data cannot be recorded in the same way using tcpdump. With trafgen, only the content of the received frames and their absolute count could be verified (100000 sent frames should result in 100000 frames looped back by the MAC).
Testing the selection of MACs for 100 Mbit/s (Opencores Ethernet Tri Mode, Opencores Ethmac, Gaisler GRETH, Litex Liteeth, P. Kerling, WGE 100, and Verilog-Ethernet) yielded the following results: (i) With packETH as the frame generator, all of the checks described above were successful for all of the MACs tested at that speed and for all frame sizes. All 100000 pairs of frames transmitted by the PC were correctly looped back by the MACs under test, and thus 200000 frames could be captured in total. (ii) With trafgen, only the received frames coming from the MAC could be captured. However, by summing up the frames that could be captured successfully by tcpdump, it was verified that all MACs transmitted back 100000 frames.
Considering the selection of MACs tested at 1 Gbit/s (Opencores Ethernet Tri Mode, LeWiz LMAC1, Litex Liteeth, P. Kerling, WGE 100, and Verilog-Ethernet), all MACs correctly looped back the frames transmitted to them, with the exception of LeWiz LMAC1. When testing LeWiz LMAC1, both with packETH and with trafgen, some frame loss could be observed. This can be seen in (a) and (b) in Table 14. Here, on the one hand, the expected number of captured frames (200000 for packETH and 100000 for trafgen) can be seen. On the other hand, these tables show the frames actually captured by tcpdump, revealing a considerable discrepancy. The packet capture software tcpdump did not report any dropped frames, neither by the kernel ("Dropped-K") nor by the interface ("Dropped-IF"). As this effect occurred at every frame size and occurred only with this MAC, it is assumed that the frames are dropped either by the MAC-specific loopback logic implemented for LMAC1 or due to problems internal to the core ("Dropped-Core").
Latency results for the 100 Mbit/s experiments are plotted in Figure 9. While there exists some variation between the exact latencies of the cores, the cores evaluated at 100 Mbit/s exhibit both receive and transmit latencies in the same order of magnitude. The latency variations between the cores may also be influenced by the choice of signals in the design at which the receive and transmit interface timestamps were taken. Thus, no actual "fastest" core can be selected on the basis of these data. The results, however, allow the conclusion that all cores seem to be suitable for efficient 100 Mbit/s operation. Figure 10 shows the latency results for the cores evaluated at 1 Gbit/s Ethernet speed. Concerning transmit latency, all evaluated cores are fairly consistent in the time it takes to begin sending a frame, give or take a few clock cycles. The only major difference between the cores can be seen in the receive latency: while five of the six cores tested at 1 Gbit/s forward a received frame to the interface within 20 clock cycles or fewer, it takes more than 200 clock cycles for LeWiz LMAC1.

Discussion and Future Work
In the era of the Internet of Things, many of today's electronic devices implement some kind of network interface, with Ethernet being one of the most widely used network standards. There is consequently a high demand for available Ethernet implementations for FPGA platforms. Existing commercial solutions for Ethernet MAC IP cores may come with limitations such as technology dependencies, license fees, and the inability to perform design changes or to add special features. Here, the use of an open-source IP core can be a solution to overcome these drawbacks. Since, to the best of our knowledge, no publication could be found that compares the available open-source Ethernet MACs on a large basis (see Section 2), we wanted to provide an overview of existing open-source IP cores, including an evaluation in terms of performance, resource utilization, and code quality.
During our survey, 18 open-source projects could be found at Internet sources such as opencores.org or github.com; these are listed in Table 1. Concerning code quality, the availability of a reference implementation may be a first indicator of the maturity of an IP core (see Table 1 and the discussion in Section 3.1). Next, the reports of a logic synthesis tool typically provide important information in that context. One of the 18 projects shown in Table 1 (Opencores Gbiteth) was not synthesizable at all and was thus excluded from further evaluation. The synthesis warnings for the remaining 17 cores are summarized in Table 6. Here, especially warnings in the categories "constraints," "latches," "simulation mismatch," and "structural" have been considered serious (see the discussion in Section 3.2.3). However, except for the variants of the LeWiz LMAC1 core, most of the analyzed IP cores show relatively few of those serious warnings. Furthermore, in every digital design, special attention should be paid to signals that cross clock domains, since improper handling may lead to metastability effects and faults that can be extremely hard to debug. Based on our analysis, the cores Gaisler GRETH, Opencores Ethernet Tri Mode, Opencores Ethmac, P. Kerling's Ethernet MAC, WGE 100, and White Rabbit may be problematic in that context, and therefore the signal paths mentioned in Section 3.2.4 should be closely examined and improved before one of those IP cores is used in a project.
Some of the 17 synthesizable cores shown in Table 1 have not been evaluated further due to insufficient documentation or because the porting effort to a specific FPGA technology has been considered too high (for details, see Section 4). Moreover, only Ethernet MACs with support for network speeds of either 10/100 Mbit/s or 1 Gbit/s have been selected for a prototype evaluation, since these bitrates are the most widely used today. This results in a number of eight projects (see Table 11) that have been closely examined on an FPGA-based prototype platform (described in Section 5). As the measurement results in Section 6 show, all of the eight Ethernet MACs could be successfully operated on a real-world hardware platform. However, packet loss could be observed for the LeWiz LMAC1 core (see (a) and (b) in Table 14). As shown in Figure 9, the receive and transmit latencies for the cores evaluated at 100 Mbit/s are in the same order of magnitude, while for the cores that have been operated at 1 Gbit/s, the receive latency of the LeWiz LMAC1 MAC is much higher than that of the other cores (see Figure 10).
In summary, 16 out of the 18 Ethernet MAC IP cores shown in Table 1 can in principle be recommended for use without larger re-design work (Opencores Gbiteth is not synthesizable, and for LeWiz LMAC1, a number of serious synthesis warnings and packet loss during prototyping could be observed). However, the integration and porting efforts for the IP cores An Ethernet Controller and Ariane-Ethernet may be higher than those for other cores, while it is strongly suggested to improve the clock domain crossings of the cores Gaisler GRETH, Opencores Ethernet Tri Mode, Opencores Ethmac, P. Kerling's Ethernet MAC, WGE 100, and White Rabbit before these IP cores are used in a project.
Concerning the remaining Ethernet MACs and based on our evaluations, there is no single "best" IP core. Instead, parameters such as the supported network bitrate (10/100 Mbit/s, 1 Gbit/s, >1 Gbit/s, etc.), the PHY interface (MII, GMII, RGMII, etc.), or the application interface (AXI, Wishbone, proprietary interfaces, etc.) may heavily influence whether an IP core is suitable for a specific application (see Table 2).
The same applies to special features such as support for DMA transfers, VLAN tagging, or PTP (see Table 3). This is also reflected in the FPGA resources consumed by an IP core, which largely depend on the implemented features (see Section 3.2.2 and Tables 4 and 5). Besides, the license model may impact whether the core can be used commercially at all and which parts of the source code need to be disclosed if it is used in a commercial product. On the other hand, the design language (shown in Table 1) should not be an issue, since all surveyed cores are available in VHDL, Verilog, or SystemVerilog, and modern FPGA tools can typically handle all of these languages. However, for two cores (An Ethernet Controller and Litex Liteeth), a special "build tool" must be used in order to generate synthesizable HDL code.
As already mentioned, the selection of an Ethernet MAC highly depends on the intended use case. Let us focus, for example, on applications with requirements for high bandwidth and/or low latency. Such requirements exist, e.g., in the telecommunications sector, where the demand for bandwidth is ever increasing due to developments such as High Definition (HD) and Ultra High Definition (UHD) video streaming. Also, in the automotive area, where Ethernet has been used for quite some time as an in-vehicle network, bandwidth requirements are constantly growing. With the advent of Advanced Driver-Assistance Systems (ADASs) and automated driving, substantially more data need to be exchanged, with sensor data originating from different places and having to be distributed to various locations [4]. For data centers, which are used for large-scale computation or to host services such as Internet search engines (e.g., Google, Bing) and video streaming platforms (YouTube, Netflix, etc.), the bandwidth requirements on the network infrastructure are enormous [53]. When such high-performance computing clusters are used to process, e.g., artificial intelligence applications, latencies are important as well: due to the huge network traffic, the network latency heavily influences the time needed for the distributed calculations [54]. In industrial communication systems, low latencies are also often a must in order to minimize response times [7]. The same applies to control networks in avionics [6].
Based on our measurements described in Section 6, a ranking of the evaluated 1 Gbit/s Ethernet MACs in terms of network speed and latency can be derived, as shown in Table 15. This table shows the cores ranked by their average measured transmit latency (in clock cycles at 125 MHz), along with the average latency measurements and standard deviations for both received and transmitted frames. This ranking may help designers select an open-source Ethernet core when a high-speed/low-latency MAC is needed for a particular application. Of course, the ranking can be completely different if requirements other than speed or latency are the focus of an application.