Assessing the Static and Dynamic Sensitivity of a Commercial Off-the-Shelf Multicore Processor for Noncritical Avionic Applications

The present work assesses the radiation sensitivity of an affordable, high-performance COTS multicore processor for noncritical avionic applications. The target device is the Epiphany E16G301 multicore, manufactured in 65 nm CMOS, which integrates 16 processor cores. This device was selected for its high performance, low power consumption, and affordability, which make parallel computing accessible to the general public. Additionally, the E16G301 is the parallel-processing coprocessor of the Parallella platform, which was considered by NASA researchers for onboard health management of the DragonEye UAS. The device is evaluated using quantitative theory by means of radiation experiments with 14 MeV neutrons, which emulate the effects of the high-energy neutrons present at avionic altitudes. Static and dynamic cross sections are obtained to evaluate the intrinsic sensitivity of the device as well as its dynamic response. Results show that the failure rate of the E16G301 running a matrix multiplication application reaches level D of the DO-178B/C guideline, making the device well suited to avionic applications with minor failure conditions.


Introduction
Multicore processors are a suitable solution for achieving high performance and reliability without significantly increasing power consumption. Their processing capacity and redundancy capabilities make them appropriate devices for implementing fault-tolerant mechanisms [1,2]. Hence, the avionics industry is interested in incorporating these devices in its systems [3]. However, the high degree of miniaturization (nanometer scale) of multicores increases their vulnerability to the effects of natural radiation.
This radiation may result in transient and permanent failures called single event effects (SEEs). Among them, the single event upset (SEU) is the most representative, since it may modify the content of a memory cell [4]. For this reason, manufacturers are enhancing fabrication processes and architectural designs. Silicon-on-Insulator (SOI) is a clear example of a technology improvement, introduced to address the drawbacks of traditional bulk CMOS [5,6]. Radiation hardening by design (RHBD) techniques are also used to mitigate the consequences of SEUs [7].
The implementation of error correcting codes (ECCs) and parity to protect the internal memory of processors is useful but not sufficient in the presence of multiple bit upsets (MBUs). Another well-known RHBD technique is triple modular redundancy (TMR), which significantly improves the reliability of the system. Nevertheless, more robust or dedicated components imply a considerable increase in cost. Consequently, an important challenge for the aircraft industry is the integration of commercial-off-the-shelf (COTS) multicore processors, driven by budget and availability constraints [8]. The current work assesses the effects of neutron radiation on a multicore processor that does not implement protection mechanisms in its internal memories. This is achieved by means of two accelerated radiation experiments. The first one evaluates the device (hardware) sensitivity, while the second one evaluates the application (software) sensitivity. Part of the results of this research has been presented in [9].

Related Work
There are some works in the literature regarding accelerated radiation experiments on multi-/manycore processors. The most representative ones are described below.
Stolt and Norman established a dynamic cross section model for a multicore server based on quad-core processors built in 45 nm bulk CMOS technology. The target multicore was an HP c7000 BladeSystem designed for aircraft altitudes. Radiation experiments on the multicore server were conducted with 14 MeV neutrons in order to simulate the effects of the high-energy particles present at avionic altitudes.
The server is composed of six Intel X5570-based HP server blades and six interconnect modules. For the test, it is possible to select the operating system, the BIOS settings, the processor, and the input/output utilization. Results estimate that the cross section per bit for 45 nm CMOS technology under 14 MeV neutrons is $1 \times 10^{-14}$ cm$^2$/bit [10].
Guertin presents radiation experiments on the 49-core Maestro processor, a radiation-hardened-by-design (RHBD) device for space applications based on the Tilera TILE64 processor. This 90 nm manycore was produced under the Onboard Processing Expandable Reconfigurable Architecture (OPERA) program and built by Boeing Solid State Electronics Development (SSED). Experimental tests were conducted at the Texas A&M University (TAMU) cyclotron facility using 15 and 25 MeV ions. During the tests, internal registers as well as the L1 and L2 cache memories of the tile cores were targeted. The main observed events were upsets in the L1 and L2 caches, which were handled by the effective error detection and correction (EDAC) included in the Maestro design [11].
Santini et al. proposed a generic metric (mean workload between failures) to evaluate the reliability of an embedded processor intended to execute safety-critical applications.
This study considers both cross section and exposure time to demonstrate that, on modern embedded processors, enabling the cache memories may benefit critical systems in terms of reliability. This is possible because the larger exposed sensitive area may be compensated by a shorter exposure time of the application. The proposed metric is validated through extensive radiation test campaigns targeting a 28 nm COTS ARM-based SoC.
The experiments were performed at the Los Alamos Neutron Science Center (LANSCE) of the Los Alamos National Laboratory (LANL) with a white neutron source that emulates the energy spectrum of the atmospheric neutron flux.
The failure probability of a bare-metal application decreases when the L1 cache is enabled. Consequently, the cross section alone is not enough to assess reliability [12].
Oliveira et al. presented a radiation sensitivity evaluation of the cache memories and internal resources of modern graphics processing units (GPUs) designed in a 28 nm technology node. In addition, several hardening strategies based on duplication with comparison (DWC) to reduce GPU radiation sensitivity are presented and validated through radiation experiments. The device under test was the NVIDIA K20, which contains a compute unified device architecture (CUDA)-based GPU. The cross sections per bit of the L2 cache and shared memories were obtained experimentally at the Los Alamos facility using 14 MeV energy neutrons. Three different DWC strategies were designed to mitigate radiation-induced effects on GPUs used in safety-critical and high-performance computing (HPC) applications. The efficiency of the proposed strategies was experimentally evaluated and compared with the chip's ECC protection mechanism. It was demonstrated that DWC strategies can be more effective than ECC when input data are duplicated [13].
Ramos et al. reported radiation experiments on a quad-core processor built in 45 nm SOI. The target device was the Freescale QorIQ P2041 processor, a high-performance device designed for communications. Experimental tests were conducted at the GENEPI2 (GEnerator of NEutron Pulsed and Intense) particle accelerator located in Grenoble, France. The results show that the SOI technology is between three and five times less sensitive to SEE than its CMOS counterpart.
The dynamic asymmetric multiprocessing (AMP) tests demonstrated that, in spite of the parity and ECC protection mechanisms, errors occurred in the results of the application. In addition, the dynamic sensitivity of the device strongly depends on the implemented multiprocessing mode [14].
Vargas et al. evaluated the SEE static and dynamic sensitivity of a manycore processor built in 28 nm CMOS.
The target device was the Kalray MPPA-256 processor, a power-efficient device implementing a clustered architecture with 16 compute clusters, each with 16 processing elements. Radiation experiments were conducted at the GENEPI2 particle accelerator located in Grenoble, France.
The evaluation of the device's dynamic response shows that, by enabling the cache memories, it is possible to improve application performance without compromising reliability. Additionally, the results suggest that the ECC and interleaving implemented in the static memories of the targeted clusters are very effective in mitigating SEUs, since all detected events were corrected [15].

Methodology and Materials
Accelerated radiation ground testing allows the radiation sensitivity of electronic devices to be analyzed in artificial radiation environments. It is the fastest way to obtain statistically meaningful data in a short period of time, since the more particles hit the component, the more SEEs are observed [16]. The reproducibility of the experiment is another major advantage of this strategy. Consequently, this work considers two types of test for evaluating the sensitivity of a multicore processor: a static test to obtain the intrinsic sensitivity of the device's memory cells and a dynamic test to evaluate the dynamic response of the implemented application [17]. Figure 1 illustrates the proposed methodology.

Journal of Nanotechnology
In this work, experimental tests were conducted with 14 MeV neutron radiation to emulate the effects of the high-energy neutrons present at avionic altitudes, since neutrons are the most representative particles in the Earth's atmosphere. Reference [18] discusses the relevance of using the 14 MeV neutron test to characterize the SEU sensitivity of digital devices. Sections 3 and 6 of the JEDEC standard JESD89A were used as the base protocol for the experimental tests [19].

Identification of the Variables.
Radiation tests are experiments that can be addressed using quantitative theory. Consequently, the first task is to identify the variables involved in the experiment. Table 1 lists the independent and dependent variables for the static and dynamic tests. Note that the dynamic test also depends on the system configuration and the implemented application.
The independent variables are divided into two groups: variables depending on the system (exclusive to the dynamic test) and variables to be manipulated during the radiation experiments.
Dependent variables represent the errors observed during the tests. They can be classified into single errors, multiple errors, and sequence interruption errors. It is important to consider that, depending on the memory architecture of the multicore, single errors can be corrected and double errors detected by the protection mechanisms.

Static Test.
This test aims at estimating the intrinsic SEE sensitivity of the memory cells of a processor. The device under test (DUT) is placed facing the center of the target, perpendicular to the beam axis, at a distance that depends on the required radiation flux. Typically, the method consists in writing a predefined pattern to the memory and accessible registers of the processor via the instruction set (load and store). Once the initialization has finished, the DUT is irradiated and the program periodically checks the memory locations throughout the radiation test to detect upset events. If an upset is detected, the program rewrites the correct pattern in the affected memory location and logs the result to an external host via Ethernet. During the static test, all the sensitive zones are exposed to radiation at the same time, which does not represent the real behavior of the circuit, since not all the memory resources are used simultaneously when an application is executed. For this reason, the static test provides a worst-case estimation of the device sensitivity [20].
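The scan-and-rewrite loop described above can be sketched in a few lines. The following Python fragment is only a host-side simulation under stated assumptions: the buffer size (8192 words, i.e., one 32 KB local memory), the function name `scan_memory`, and the injected bit-flips are hypothetical and are not taken from the authors' test program, which runs against the real device. The pattern value is the one reported for the static test.

```python
PATTERN = 0x55AA55AA   # test pattern reported for the static test
WORD_MASK = 0xFFFFFFFF  # 32-bit words


def scan_memory(memory, pattern=PATTERN):
    """Compare each 32-bit word with the pattern, log upsets, rewrite the pattern."""
    events = []
    for address, word in enumerate(memory):
        diff = (word ^ pattern) & WORD_MASK      # set bits mark flipped cells
        if diff:
            events.append({
                "address": address,
                "read": word,
                "flipped_bits": bin(diff).count("1"),  # multiplicity within the word
            })
            memory[address] = pattern            # restore the correct pattern
    return events


# Hypothetical buffer standing in for one 32 KB local memory (8192 words).
memory = [PATTERN] * 8192
memory[10] ^= 1 << 3       # simulated single-bit flip
memory[200] ^= 0b11 << 8   # simulated two-bit flip in one word

events = scan_memory(memory)
for event in events:
    print(f"addr={event['address']} read=0x{event['read']:08X} "
          f"bits={event['flipped_bits']}")
```

Because every detected word is rewritten immediately, a second scan of the same buffer reports no events, which is what keeps upsets from being counted twice between read operations.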
As a result of this test, the static cross section ($\sigma_{\mathrm{STATIC}}$) of the device is obtained. It is defined as the number of detected upset events divided by the fluence, which is the neutron flux integrated over time, as expressed in the following equation:

$$\sigma_{\mathrm{STATIC}} = \frac{\text{number of upsets}}{\text{fluence}} \tag{1}$$

The elementary data pattern for memory circuits is a logical checkerboard [19]. All zeros and all ones are also common patterns in radiation tests. However, some memories such as DRAMs usually have a preferred failure direction, either 0->1 or 1->0. For this reason, when there is no a priori information about the component, the test pattern has to balance the number of 0's and 1's. Thus, the selected pattern for the static test was 0x55AA55AA. Regarding the exposure time of the device to radiation, it is important to consider that the occurrence of upset events during a given period of time is a stochastic process that follows a Poisson distribution. Thus, the waiting time between read operations in the static test can be validated by analyzing the distribution of the number of events per unit of time. If the obtained distribution does not follow the Poisson law, the waiting time should be adjusted.
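One simple way to check whether event counts look Poisson-like is the index of dispersion (sample variance divided by mean), which is close to 1 for Poisson data. The sketch below illustrates that check only; it is not necessarily the validation the authors applied, and both the counts per read interval and the function name `dispersion_index` are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance over mean; values near 1 are compatible with Poisson counts."""
    average = mean(counts)
    if average == 0:
        raise ValueError("no events recorded, the index is undefined")
    return variance(counts) / average


# Hypothetical number of upsets seen in each read interval of a static run.
counts_per_interval = [0, 1, 0, 2, 1, 0, 1, 3, 0, 1, 2, 1]

index = dispersion_index(counts_per_interval)
print(f"index of dispersion = {index:.2f}")
```

An index well above 1 would suggest bursts of events (for example, a waiting time so long that several upsets pile up between reads), which is the kind of situation in which the text recommends adjusting the waiting time.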

Dynamic Test.
The goal of this test is to estimate the SEE dynamic response of an application running on a processor. As a result of the experiment, the dynamic cross section ($\sigma_{\mathrm{DYN}}$) is obtained. Unlike the static test, it evaluates only the memory cells used by the application. The method consists in periodically executing an application while the processor is irradiated to induce SEEs. After each execution of the program, the results are compared with a set of previously obtained correct values in order to detect errors.
The experiment is launched and monitored using a host computer located outside the armored chamber.
Neutrons are emitted omnidirectionally with an average energy of 14 MeV. The DUT is set directly facing the target at a distance that depends on the required neutron flux. Experimental radiation campaigns consider that only neutrons emitted fully forward will impact the DUT. To protect the readout electronics other than the DUT, dedicated neutron shielding is used.
A new T target providing a maximum neutron flux of $4.5 \times 10^{7}$ n·cm$^{-2}$·s$^{-1}$ was installed in 2015, aiming at increasing the neutron production while improving the accelerator reliability. The major modification consists in replacing the current deuterium ion source with a new one, based on the electron cyclotron resonance (ECR) technique, that delivers a higher beam intensity. Figure 2 illustrates the GENEPI2 particle accelerator.

Device under Test.
The selected device for this study was the Adapteva Epiphany E16G301, a 16-core processor that performs the parallel computing on the Parallella board. This board is a high-performance computing platform based on a dual-core ARM A9 processor, used as host, and the Epiphany E16G301, used as coprocessor. The Epiphany is a scalable multicore architecture with up to 4095 processors sharing a common 32-bit memory space. It defines a parallel computing fabric comprising a 2D array of processor nodes connected by a low-latency mesh network-on-chip. The E16G301, which is based on the 3rd generation of the Epiphany multicore IP, is a 16-core system-on-chip implemented in 65 nm CMOS technology [22]. Each processor core is a 32-bit superscalar floating-point RISC CPU, capable of performing two floating-point operations and one integer calculation per clock cycle. The device has a peak performance of 32 Gflops (2 Gflops per core). The maximum chip power consumption is less than 2 W. Each CPU has an efficient general-purpose instruction set that excels at compute-intensive applications while being efficiently programmable in C/C++. Figure 3 shows the implementation of the E16G301 architecture.
The memory architecture of the E16G301 multicore is based on a flat shared memory map. Each compute core has up to 1 MB of local memory as a unique addressable part of the total 32-bit address space. Each core can access its own local memory as well as the memory of other cores by means of standard load/store instructions. The local memory comprises 4 independent banks of 8 KB each, for a total of 32 KB per CPU core. For the particular case of the Epiphany E16G301, which implements 16 cores, the chip has 512 KB of distributed shared memory [23]. This multicore processor does not implement any protection mechanisms in its internal memory.

Benchmark Application.
A standard 45 × 45 matrix multiplication (MM), which is a memory-bound application, was selected as the benchmark throughout this work. It was chosen because matrix multiplication is one of the most essential algorithms in numerical algebra as well as in distributed, scientific, and high-performance computing [24]. In avionic applications, MM is used for image processing, filtering, adaptive control, and navigation and tracking. The input matrix A was filled with the decimal number 5, while matrix B was filled with the number 6; thus the expected result was 1350 for all the elements of the resulting matrix C. The total number of variables used in the implementation of the matrix multiplication is 6078: 4050 input variables, 2025 output variables, and 3 loop indexes. Each variable was implemented in 32 bits, so the targeted sensitive area is about 24 KB, which fits in the 32 KB local memory of each core. The size of the matrix was selected so that the data occupy as much memory space as possible while leaving enough space for the program code.
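The benchmark and its golden-value check are easy to reproduce off-line. The sketch below is a plain Python illustration, not the C code that runs on the Epiphany cores: the use of integers, the function name `matmul`, and the error-list format are assumptions made for the example. It also recomputes the variable count and memory footprint quoted above.

```python
N = 45  # matrix dimension used in the article

A = [[5] * N for _ in range(N)]  # input matrix A, filled with 5
B = [[6] * N for _ in range(N)]  # input matrix B, filled with 6


def matmul(a, b):
    """Naive triple-loop product of two square matrices."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


EXPECTED = N * 5 * 6  # every element of C should equal 1350

C = matmul(A, B)
# Golden comparison: any element that differs from the expected value is an error.
errors = [(i, j, C[i][j]) for i in range(N) for j in range(N) if C[i][j] != EXPECTED]

variables = 2 * N * N + N * N + 3   # inputs + outputs + 3 loop indexes
footprint_bytes = variables * 4     # 32-bit variables

print(EXPECTED, len(errors), variables, footprint_bytes)
```

Filling both inputs with constants means a single expected value covers the whole output matrix, so the comparison step stays very short relative to the multiplication itself.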

Results and Discussion
Radiation experiments performed on the Epiphany E16G301 are of particular interest compared with similar works targeting other multicore processors, since errors produced by SEEs are clearly identified: they are not masked by protection mechanisms such as ECC or parity. This allows a better analysis of the behavior of the device in the presence of SEEs.
Concerning the limitations of the experiments, there are two points to consider:
(i) The E16G301 processor does not have direct access to a printf function for logging results. For this reason, it has to write the information about observed events to the external DDR memory of the board. This information is logged by the host processor (ARM).
(ii) The physical distance between the E16G301 multicore and the host processor on the Parallella board is less than one centimeter. It was thus necessary to limit the neutron flux to prevent particles from affecting the host processor and other circuitry.

Experimental Setup.
The DUT was placed at a distance of 38.5 ± 0.5 cm from the target. The neutron beam energy was 14 MeV, with an estimated flux of $(7.2 \pm 0.1) \times 10^{4}$ n·cm$^{-2}$·s$^{-1}$. Special attention was required to protect the rest of the platform components from radiation. To that end, the E16G301 multicore was irradiated through a small window in a 5 cm thick polypropylene block intended to protect the readout platform. The power supply of the multicore platform was monitored using a camera available in the casemate of the accelerator facility. In this way, the voltage and current parameters were checked.

Intrinsic Sensitivity.
This evaluation was performed by targeting the internal memory and accessible registers of each core of the E16G301 processor. The host processor of the Parallella board was in charge of filling the internal memory and registers of the multicore with a predefined pattern using the Epiphany SDK utilities E-READ and E-WRITE [25]. In this manner, the whole internal memory of the E16G301 multicore could be targeted. Table 2 summarizes the sensitive zones of the multicore processor. Three static tests were performed with an exposure time of 1 hour each, providing a fluence of $6.82 \times 10^{8}$ n·cm$^{-2}$ as per the neutron facility records. During the tests, 69 single bit upsets (SBUs) and 7 multiple cell upsets (MCUs) producing bit-flips were detected. No errors were observed in the processor's registers. In addition, 5 single event functional interrupts (SEFIs) that caused hangs were observed. Table 3 summarizes the test results. Note that the subscript number following MCU represents the multiplicity of the upset.
Table 4 shows a sample of data containing bit-flips caused by SBU and MCU produced in the local memory and logged during the experiment.
At the end of the experiment, the static cross section $\sigma_{\mathrm{STATIC}}$ was estimated by applying (1). It provides the worst-case sensitivity of the device built in 65 nm CMOS technology.
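As a rough cross-check, (1) can be evaluated with the figures reported for the static campaign. The sketch below assumes that all three event categories (69 SBU, 7 MCU, and 5 SEFI, 81 events in total) enter the numerator; this is an assumption, although it is consistent with the 95% limits quoted for this test. The variable names are illustrative, and the bit count is rebuilt from the memory organization described for the device (16 cores, 4 banks of 8 KB per core).

```python
SBU, MCU, SEFI = 69, 7, 5      # events reported for the three static runs
FLUENCE_STATIC = 6.82e8        # n/cm^2, from the facility records

# Assumption: every observed event counts toward the static cross section.
events = SBU + MCU + SEFI

sigma_static = events / FLUENCE_STATIC       # cm^2 per device, equation (1)

# Tested memory area: 16 cores x 4 banks x 8 KB, expressed in bits.
TOTAL_BITS = 16 * 4 * 8 * 1024 * 8
sigma_per_bit = sigma_static / TOTAL_BITS    # cm^2 per bit (point estimate)

print(f"sigma_static  = {sigma_static:.2e} cm^2/dev")
print(f"sigma_per_bit = {sigma_per_bit:.2e} cm^2/bit")
```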
Due to the scarcity of experimental data, 95% confidence intervals were applied to this result. In this case, the most accurate way to calculate the uncertainty margins is to use the relationship between the cumulative distribution functions of the Poisson and chi-squared distributions, as described in [26]. Then, the lower and upper limits for the static cross section are

$$0.94 \times 10^{-7}\ \mathrm{cm^2/dev} < \sigma_{\mathrm{STATIC}} < 1.48 \times 10^{-7}\ \mathrm{cm^2/dev} \tag{3}$$

Since the tested memory area of the multicore processor represents 4194304 bits, the 95% confidence interval for the static cross section per bit is obtained by dividing the limits in (3) by this number of bits, which gives approximately

$$2.24 \times 10^{-14}\ \mathrm{cm^2/bit} < \sigma_{\mathrm{STATIC}} < 3.53 \times 10^{-14}\ \mathrm{cm^2/bit} \tag{4}$$

The dynamic evaluation was carried out to obtain the dynamic cross section ($\sigma_{\mathrm{DYN}}$) of an application running on the multicore processor. Three dynamic tests were performed with an exposure time of 1 hour each, providing a total fluence of $6.95 \times 10^{8}$ n·cm$^{-2}$ as per the accelerator records. The comparison between the obtained and correct results was performed inside the multicore processor.
The duration of the matrix multiplication is 27597 µs, and the comparison time is 1.44 µs. Therefore, in one hour (3600 s), the application executes 130449 times, which multiplied by the comparison time gives 0.19 s (0.0052%) of lost exposure time, which is negligible. Table 5 summarizes the results of the dynamic radiation campaign.
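The exposure-time loss follows directly from the two timings. The short calculation below reproduces the figures in the text; the variable names are illustrative, and the number of runs is rounded to the nearest integer to match the value quoted above.

```python
MM_TIME_US = 27597    # duration of one matrix multiplication, in microseconds
CMP_TIME_US = 1.44    # duration of one result comparison, in microseconds
HOUR_S = 3600         # length of one test run, in seconds

# Executions per one-hour run, rounded to the nearest integer.
runs = round(HOUR_S * 1e6 / MM_TIME_US)

lost_s = runs * CMP_TIME_US / 1e6    # time spent comparing instead of computing
lost_pct = 100 * lost_s / HOUR_S     # share of the exposure time

print(f"runs per hour: {runs}")
print(f"lost exposure: {lost_s:.2f} s ({lost_pct:.4f} %)")
```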
From the results presented in Table 5, only erroneous results, time-outs, and hangs were taken into account to calculate the dynamic cross section as follows:

$$\sigma_{\mathrm{DYN}} = \frac{27}{6.95 \times 10^{8}} = 3.88 \times 10^{-8}\ \mathrm{cm^2/dev} \tag{5}$$

Silent errors were observed by reading the input matrices in order to detect corrupted data caused by upset events that do not produce errors in the resulting matrix. They are reported to show the total number of SEUs that occurred in the dynamic test. As in the static case, uncertainty margins were added to the results. Then, the lower and upper limits for the dynamic cross section for a 95% confidence interval are as follows:

$$2.56 \times 10^{-8}\ \mathrm{cm^2/dev} < \sigma_{\mathrm{DYN}} < 5.65 \times 10^{-8}\ \mathrm{cm^2/dev} \tag{6}$$

Among the consequences observed in the dynamic tests, erroneous results are the most critical, since the program considers them valid results, which dramatically affects the reliability of the application. The reliability of the device can be evaluated by means of its failure rate. The failure rate ($\lambda$) of the device is estimated by extrapolating the dynamic cross section to avionic altitude (35,000 feet), where the neutron flux ($\phi$) is about $2.99 \times 10^{3}$ n·cm$^{-2}$·h$^{-1}$, by applying the following equation:

$$\lambda = \sigma_{\mathrm{DYN}} \times \phi \tag{7}$$

As the reliability of multi-/manycore processors strongly depends on the implemented application (software), the failure rate of the device can be classified within the DO-178B/C (Software Considerations in Airborne Systems and Equipment Certification).
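The quoted intervals and the failure-rate extrapolation can be cross-checked numerically. The sketch below computes exact Poisson limits by bisection on the Poisson cumulative distribution, which is equivalent to the chi-squared relationship but needs only the standard library; it is not the authors' code. Two assumptions are labeled in the comments: the static interval is taken to rest on 81 events (69 SBU + 7 MCU + 5 SEFI), and the failure rate is taken as the product of the dynamic cross section and the neutron flux at 35,000 feet.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def solve_increasing(f, lo, hi, steps=100):
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_limits(n, confidence=0.95):
    """Exact two-sided limits on the Poisson mean given n observed events."""
    alpha = 1 - confidence
    top = n + 10 * math.sqrt(n) + 10   # generous upper bracket for the search
    lower = 0.0 if n == 0 else solve_increasing(
        lambda mu: (1 - poisson_cdf(n - 1, mu)) - alpha / 2, 0.0, top)
    upper = solve_increasing(
        lambda mu: alpha / 2 - poisson_cdf(n, mu), 0.0, top)
    return lower, upper


FLUENCE_STATIC = 6.82e8    # n/cm^2
FLUENCE_DYNAMIC = 6.95e8   # n/cm^2
FLUX_35000_FT = 2.99e3     # n/cm^2/h

# Assumption: 81 static events (69 SBU + 7 MCU + 5 SEFI); 27 dynamic events.
static_ci = tuple(x / FLUENCE_STATIC for x in poisson_limits(81))
dynamic_ci = tuple(x / FLUENCE_DYNAMIC for x in poisson_limits(27))

# Assumption: failure rate = dynamic cross section x flux at altitude.
failure_rate = (27 / FLUENCE_DYNAMIC) * FLUX_35000_FT   # failures per hour

print("static  95% CI:", [f"{x:.2e}" for x in static_ci])
print("dynamic 95% CI:", [f"{x:.2e}" for x in dynamic_ci])
print(f"failure rate: {failure_rate:.2e} per hour")
```

Under these assumptions, the bisection results agree with the limits quoted in the text for both campaigns, which supports reading the static interval as being based on all 81 events.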
The DO-178B/C is a guideline used as the de facto standard for developing avionic software systems [27]. Table 6 shows the levels of failure condition.
The failure conditions for avionic systems are described as follows:
(i) Catastrophic: Failure may cause a crash. Error or loss of a critical function required to safely fly and land the aircraft.
(ii) Hazardous: Failure has a large negative impact on safety or performance, or reduces the ability of the crew to operate the aircraft due to physical distress or a higher workload, or causes serious or fatal injuries among the passengers (safety-significant).
(iii) Major: Failure is significant, but has a lesser impact than a hazardous failure (e.g., passenger discomfort) or significantly increases crew workload (safety related).
(iv) Minor: Failure is noticeable, but has a lesser impact than a major failure (e.g., passenger inconvenience or a routine flight plan change).
(v) No Effect: Failure has no impact on safety, aircraft operation, or crew workload.
Results show that the failure rate of the multicore executing a matrix multiplication application falls in level D of the DO-178B/C. Therefore, the device is suitable for minor failure conditions, which include several applications involving data and image processing. In fact, the NASA report "Intelligent Hardware-Enable Sensor and Software ..." shows the use of the Parallella board containing the Epiphany multicore for unmanned aircraft systems [30]. The current article is relevant since it supports the use of the Parallella board in aircraft applications by presenting experimental data concerning its radiation sensitivity.

Conclusions
Radiation experiments performed with 14 MeV neutrons are a useful technique for evaluating the intrinsic sensitivity of the multicore and the dynamic response of the application.
Results demonstrate that the Epiphany E16G301 multicore is suitable for embedded systems performing noncritical avionic applications. The fact that the Epiphany does not implement protection mechanisms has allowed a true estimation of the error rate, confirming that protection mechanisms affect testing, as stated in [31].
Although the maximum flux provided by the radiation facility is $4.5 \times 10^{7}$ n·cm$^{-2}$·s$^{-1}$, the applied neutron flux was limited to $7.2 \times 10^{4}$ n·cm$^{-2}$·s$^{-1}$ in order to avoid perturbations in the circuitry of the Parallella board. However, the applied flux is still almost five orders of magnitude greater than the flux at avionic altitudes (about $2.99 \times 10^{3}$ n·cm$^{-2}$·h$^{-1}$), so that each hour of irradiation corresponds to almost 10 years of exposure of the device to neutron radiation at 35,000 feet.
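The equivalence between beam time and flight time can be checked from the two flux values quoted in the article. The sketch below is only an arithmetic illustration; the variable names are hypothetical, and a year is taken as 365 days.

```python
BEAM_FLUX_PER_S = 7.2e4       # applied flux, n/cm^2/s
AVIONIC_FLUX_PER_H = 2.99e3   # flux at 35,000 feet, n/cm^2/h

# Ratio between the beam flux and the natural flux, both expressed per hour.
acceleration = BEAM_FLUX_PER_S * 3600 / AVIONIC_FLUX_PER_H

# Years of flight at 35,000 feet represented by one hour under the beam.
years_per_beam_hour = acceleration / (24 * 365)

print(f"acceleration factor: {acceleration:.3g}")
print(f"one beam hour is about {years_per_beam_hour:.1f} years at altitude")
```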
Despite the efforts to protect the rest of the components of the platform, the SD card containing the Linux OS was corrupted in one of the experiments.
This was solved by replacing the corrupted SD card with a new one and repeating the test.
In future work, the use of the Adapteva Epiphany multicore processor for image processing in a military aircraft will be proposed. In parallel, another module containing the Epiphany processor will execute a memory-bound application in order to detect SEUs produced in a real operating environment.

Table 1: Independent and dependent variables for radiation tests.

Table 2: Sensitive zones of the Epiphany E16G301 multicore processor.

Table 4: Example of the obtained results in the static tests.

Table 5: Results of the dynamic radiation test campaigns.