The Internet of Things is one of the hottest topics in communications today, with current revenues of $151B, around 7 billion connected devices, and unprecedented growth expected in the coming years. A massive number of sensors and actuators are expected to emerge, requiring new wireless technologies that can extend their battery life and cover large areas. LoRaWAN is one of the most prominent technologies that fulfill these demands, attracting the attention of both academia and industry. In this paper, the design of a LoRaWAN testbed to support critical situations, such as emergency scenarios or natural disasters, is proposed. This self-healing LoRaWAN network architecture provides resilience when part of the core network equipment becomes faulty. This resilience is achieved by virtualizing and properly orchestrating the different network entities. Different options have been designed and implemented as real prototypes. Based on our performance evaluation, we claim that microservice orchestration with several replicas of the LoRaWAN network entities and a load balancer produces an almost seamless recovery, which makes it a suitable solution for recovering from a system crash caused by a catastrophic event.
The Internet of Things (IoT) is one of the hottest topics in communications today. Although previous forecasts may have overestimated the growth of connected IoT devices, it is clear that the current and short-term market revenues are impressive: from $151B in 2018 up to $1,567B by 2025. Current estimates of connected IoT devices vary from 6 to 9 billion (e.g., the number of IoT devices in 2018 is 7 billion, according to IoT Analytics [
Many IoT services fall into the category of massive Machine-Type Communications (mMTC), one of the three major 5G use cases (together with enhanced mobile broadband and ultrareliable low-latency MTC). Since mMTC assumes a massive number of devices, in most cases battery-powered and spread over many locations and environments, its main requirements are low-power communication and wide-range coverage.
Two main families of technologies that fulfill these requirements are being used for these applications: cellular evolution and Low Power Wide Area Networks (LPWAN).
Regarding cellular evolution, the Third Generation Partnership Project (3GPP) has adapted the existing mobile standards to the requirements of IoT devices. In this way, cellular IoT standards utilize the existing mobile network infrastructures in an effort to integrate both worlds. Some of the main cellular technologies for IoT are Extended Coverage Global System for Mobile communications (EC-GSM), LTE Cat-0 (a new low-complexity Long Term Evolution device category, defined in 3GPP Release 12), LTE-M, and narrowband IoT (NB-IoT). NB-IoT addresses the specific requirements of mMTC but, unlike LTE Cat-0 and LTE-M devices, requires a specific frequency band different from those used for LTE or LTE-Advanced.
In contrast to cellular technologies, LPWAN technologies were designed with IoT requirements in mind from the very beginning. Previous attempts such as local or mesh networks can accommodate some IoT requirements, such as low battery consumption and optimization for low data rates, but are not intended for wide-area coverage. Some of the most popular LPWAN technologies are LoRaWAN, SigFox, RPMA, and NWave. They offer long range (up to several tens of kilometers), very low power consumption (years of battery operation), and very low bandwidth (tens of kbps), and they utilize license-exempt frequency bands. Another advantage of LPWANs is that they require a much lower investment than mobile networks, allowing new players to compete with current Mobile Network Operators (MNOs). For this reason, many MNOs (e.g., KPN, Orange, SK Telecom, Bouygues Telecom, Swisscom, and SoftBank) have started to deploy LoRaWAN (Long-Range Wide Area Network) to complement their current cellular network deployments.
In this article, we propose the design of a LoRaWAN testbed for supporting critical situations, i.e., an IoT testbed that shall be able to automatically recover if part of its network infrastructure is destroyed. Since, in the case of LoRaWAN, the radio equipment is cheap and can be easily replaced, we will focus on the core network infrastructure.
This testbed will be integrated with the demonstrator from the 5G-City [
Testbed of the 5G-City research project [
One of the objectives of the 5G-City research project is the design of a virtualized 5G network for massive IoT and broadband experience. Within the 5G context, different wireless technologies will coexist as technological alternatives for the interconnection between information producers and consumers.
In the case of IoT, one of the wireless technologies that will be included in the 5G-City demonstrator will be the LoRaWAN network prototype proposed in this paper. For that purpose, this prototype will be integrated with the 5G-City 4G/5G network demonstrator following our previous work in [
The literature defining the state of the art on the evaluation of LoRaWAN and LoRa testbeds in real deployments is rich in both quality and quantity. A number of papers have been devoted to measuring LPWAN performance metrics in both indoor and outdoor deployments, as well as in rural, urban, and suburban scenarios.
Almost all of these works focus on coverage measurements. In particular, different evaluations are reported, ranging from covered distance [
In [
However, none of the reported papers evaluates the resilience dimension of LoRaWAN under a critical situation, i.e., the time needed to recover and the packet losses incurred after a system crash. Note that these quantitative evaluations help determine the suitability of LoRaWAN for IoT deployments under critical circumstances.
The main objective of this paper is the proposal of a self-healing LoRaWAN network architecture that provides resilience under critical situations such as earthquakes, fires, or hurricanes. Under such conditions, part of the network equipment may become faulty. By virtualizing the different entities in the core network, i.e., converting them into VNFs (Virtual Network Functions), we are able to reduce costs, increase flexibility, and provide resilience. We have implemented different options for the virtualization of the LoRaWAN core network entities, which will be compared in terms of recovery time, packet losses, and resource usage.
For this objective, the rest of the article is organized as follows. Section
LoRaWAN [
The spreading factor (SF) is defined as SF = log2(Rc/Rs), where Rc is the chip rate and Rs is the symbol rate; i.e., every symbol is encoded with 2^SF chips. Increasing the SF improves the receiver sensitivity, and hence the range, at the cost of a lower data rate.
LoRaWAN is an open standard managed by the LoRa Alliance. LoRaWAN defines the Medium Access Control (MAC) layer on top of the LoRa physical layer. It also defines the system architecture.
The MAC layer utilizes a duty cycle to reduce the probability of collisions in a simple and hardware-efficient manner. Depending on regional regulations [
In the case of European regulations, the duty cycle is 1%, and the combinations of SFs and bandwidths produce the different data rates (DR) included in Table
Parameters for the different LoRaWAN DRs.
DR | SF | BW (kHz) | Data rate (bps) | ToA (ms) | Time between frames (s)
---|---|---|---|---|---
0 | SF12 | 125 | 250 | 1482.8 | 148.3
1 | SF11 | 125 | 440 | 823.3 | 82.3
2 | SF10 | 125 | 980 | 411.6 | 41.2
3 | SF9 | 125 | 1760 | 205.8 | 20.6
4 | SF8 | 125 | 3125 | 113.2 | 11.3
5 | SF7 | 125 | 5470 | 61.7 | 6.2
6 | SF7 | 250 | 11000 | 30.8 | 3.1
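The ToA and time-between-frames values above can be reproduced with the standard LoRa time-on-air formula. The following sketch assumes a 25-byte PHY payload (12 application bytes plus 13 bytes of LoRaWAN overhead), an 8-symbol preamble, explicit header, CRC enabled, coding rate 4/5, and low-data-rate optimization for SF11/SF12 at 125 kHz; these parameter values are our assumptions, not stated in the table.

```python
import math

def lora_toa_ms(payload_bytes: int, sf: int, bw_hz: float,
                preamble_len: int = 8, coding_rate: int = 1,
                explicit_header: bool = True, crc: bool = True) -> float:
    """Time on air (ms) of one LoRa frame, following Semtech's formula.

    coding_rate=1 means CR 4/5; low-data-rate optimization (DE) is
    enabled for SF11/SF12 at 125 kHz, as LoRaWAN mandates.
    """
    t_sym = (2 ** sf) / bw_hz * 1000.0                    # symbol duration (ms)
    de = 1 if (bw_hz == 125e3 and sf >= 11) else 0        # low-data-rate optimization
    h = 0 if explicit_header else 1
    num = 8 * payload_bytes - 4 * sf + 28 + 16 * crc - 20 * h
    n_payload = 8 + max(math.ceil(num / (4 * (sf - 2 * de))) * (coding_rate + 4), 0)
    t_preamble = (preamble_len + 4.25) * t_sym
    return t_preamble + n_payload * t_sym

# 12 application bytes + 13 bytes of LoRaWAN overhead = 25-byte PHY payload
for sf, bw in [(12, 125e3), (11, 125e3), (10, 125e3), (9, 125e3),
               (8, 125e3), (7, 125e3), (7, 250e3)]:
    toa = lora_toa_ms(25, sf, bw)
    gap = toa * 100 / 1000   # 1% duty cycle -> minimum time between frames (s)
    print(f"SF{sf}/{bw / 1e3:.0f} kHz: ToA = {toa:.1f} ms, min gap = {gap:.1f} s")
```

Running the sketch yields, for instance, 1482.8 ms for SF12/125 kHz and 61.7 ms for SF7/125 kHz, matching the table.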
The architecture of a LoRaWAN network is based on a star topology, as shown in Figure
LoRaWAN network architecture.
All the frames forwarded from the gateways to the network server are integrity protected (thanks to a Message Integrity Code (MIC) generated with a network session key,
In order to exchange the required session keys (
In the first case, the developer shall include this information in both the nodes (i.e., stored in their firmware) and the servers. Thus, no signalling is needed. In the second case, the node shall send a
LoRaWAN activation types.
LoRaWAN allows nodes to have bidirectional, although asymmetric, communication with gateways, since uplink transmissions (from nodes to gateways) are strongly favored. Three types of devices are defined (classes A, B, and C) with different capabilities. Class A is the most energy-efficient and must be supported by all nodes. Class A nodes use pure ALOHA for uplink access, and they can only receive a downlink frame after a successful uplink transmission. This class is intended for battery-operated sensors. Class B nodes utilize beacons sent from the gateway to determine whether they have to receive downlink frames, using receive windows scheduled at predictable times without the need for successful uplink transmissions. This class is intended for battery-operated actuators. Finally, class C nodes are always listening to the radio interface except when they are transmitting. Due to their power consumption, class C devices are intended for mains-powered actuators. As mentioned, class A is mandatory for all LoRaWAN nodes, and the three classes may coexist in the same network.
This section presents the two options that have been implemented for the virtualization and automatic orchestration of a LoRaWAN network. As mentioned above, the first proposal is based on microservices using the Kubernetes platform. The second implementation is based on virtual machines, using OpenStack along with its modules for the automatic deployment of the LoRaWAN services.
Due to the architecture of a LoRaWAN network and the lightweight functionality of its different entities, these can be deployed as microservices. With the success of containerization technologies, such as Docker [
In order to implement a LoRaWAN network that supports autorecovery in the case of an emergency such as an earthquake, fire, hurricane, or any other situation that may destroy part of the core network infrastructure, a microservice orchestration platform suits these requirements well due to its efficiency in terms of CPU, memory, and storage consumption compared to virtual machines [
In particular, we propose to utilize the Kubernetes platform [
Proposed network architecture based on microservices.
The LoRaWAN network and application servers are based on the LoRaWAN Server Project [
Kubernetes pod (group of colocated containers that are tightly coupled and need to share resources) for LoRaWAN deployment.
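As an illustration, such a pod can be declared through a Kubernetes Deployment. The sketch below is an assumption for a modern cluster (apps/v1; the Kubernetes 1.5 cluster used in this testbed would use extensions/v1beta1), image names are illustrative, and only three of the pod's containers are shown. Setting replicas to 2 corresponds to the two-replica configuration evaluated later.

```yaml
# Illustrative sketch; API version and image names are assumptions, and only
# three of the containers of the LoRaWAN pod are shown.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lorawan
spec:
  replicas: 2                 # two replicas enable (almost) seamless failover
  selector:
    matchLabels:
      app: lorawan
  template:
    metadata:
      labels:
        app: lorawan
    spec:
      containers:
        - name: lora-gateway-bridge
          image: loraserver/lora-gateway-bridge    # assumed image
          ports:
            - containerPort: 1700
              protocol: UDP
        - name: loraserver
          image: loraserver/loraserver             # assumed image
        - name: lora-app-server
          image: loraserver/lora-app-server        # assumed image
```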
For the implementation using virtual machines, we have opted for OpenStack due to its popularity, large community, wide availability of modules, and open-source nature. Our OpenStack testbed is based on the Rocky release and has been installed using the DevStack scripts.
Apart from the primary OpenStack services (Keystone for the identity service, Glance for the image service, Nova for the provision of compute instances or virtual machines, Neutron for network connectivity, and Horizon for the dashboard user interface), Heat and Ceilometer have been installed for the orchestration and the telemetry service. The chosen hypervisor is KVM (Kernel-based Virtual Machine) using QEMU Copy-on-write (qcow2) as the virtual machine image format.
For comparison purposes, the virtual machine images are based on CentOS 7 cloud images, similarly to the Kubernetes deployment. For the same reason, the LoRaWAN network and application servers have also been installed using the LoRaWAN Server Project [
Network architecture based on OpenStack and virtual machines.
For our OpenStack testbed, we have developed a module which automatically starts the provisioning of resources for a new instance, which is then launched. This procedure is triggered once the original instance is destroyed due to, e.g., a catastrophic event. The module utilizes the API provided by the Heat orchestrator and the metrics from Ceilometer.
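To make the Heat side concrete, a minimal Heat Orchestration Template (HOT) of the kind such a module can instantiate might look as follows; the template version, image, flavor, and network names are assumptions, not the exact artifacts of the testbed.

```yaml
# Illustrative sketch; image, flavor, and network names are assumptions.
heat_template_version: rocky
description: Relaunch the virtual machine hosting the LoRaWAN servers
resources:
  lorawan_server:
    type: OS::Nova::Server
    properties:
      name: lorawan-server
      image: centos7-lorawan      # assumed qcow2 image with the LoRaWAN stack
      flavor: m1.medium           # assumed flavor
      networks:
        - network: private        # assumed Neutron network
```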
This section describes both the hardware and software used for the design of our LoRaWAN network prototype.
The radio access network of our LoRaWAN network prototype is composed of 5 gateways and 12 nodes. The gateways are LiteGateways from iMST [
LoRaWAN gateways used in the proposed testbed.
The nodes are TTGO-LoRa32 devices (see Figure
LoRaWAN nodes used in the proposed testbed.
Our core network prototype is composed of two servers with an Intel Core i7-7820X CPU (8 cores operating at 3.6 GHz) and 32 GB of RAM located at the University of Granada (UGR). These two servers act as the master node of the Kubernetes cluster and the first worker node (minion-1). In addition, we have two other servers located in Barcelona at Universitat Politècnica de Catalunya (UPC) (based on a six-core Intel i7-5820K operating at 3.3 GHz) and Fundació i2CAT (based on an Intel Xeon E312xx (Sandy Bridge) with 32 cores operating at 2.5 GHz), which act as the second and third worker nodes (minion-2 and minion-3), respectively.
The IP network is a direct Ethernet connection between the gateways and the master server located at the University of Granada, which also implements the frontend using an NGINX ingress controller.
The nodes are programmed using the Arduino framework, based on the IBM’s LMIC library [
Figure
Detail of a LoRaWAN node.
To prevent the connection between the nodes and the experiment manager from impacting the results of the experiments, e.g., due to additional delays, all the nodes connect to the Wi-Fi network only when they are switched on; i.e., there are no connection establishments during the experiments. The mean response time between the request from the node and the response from the server has been measured to be around 85 ms, which is much lower than the time between LoRaWAN frames (which has a minimum value of 6.2 seconds according to Table
As previously stated, the gateways are based on the Raspberry Pi platform with the iC880A concentrator. The software is based on the reference gateway implementation from TTN-ZH (Zurich community of The Things Network) [
Our Kubernetes cluster is based on Kubernetes version 1.5.2. For portability and reproducibility purposes, both master and worker nodes have been virtualized and executed using VirtualBox version 5.2.22. The host operating system is Ubuntu Server 16.04.05 (64 bits), and the guest operating system is CentOS 7.5.1804 (64 bits) running with 1 core and 2 GB of RAM. We utilize Vagrant version 2.1.5 to automate the deployment of the virtual machines, and Ansible version 2.6.4 to automate the installation and configuration of the required packages.
In order to simplify the requirements for connecting the worker and master nodes, a VPN was created with OpenVPN version 2.4.6 using client certificate authentication. In this way, only the master node is required to have a public IP address; in addition, this eliminates potential issues due to firewall rules. In our VPN, the master node acts as the OpenVPN server, whereas the worker nodes act as OpenVPN clients. The workers also collect tcpdump traces on the TCP/UDP ports used by services related to the LoRaWAN deployment, which are later sent to the experiment manager.
The Docker containers that implement the LoRaWAN deployment are also connected to the experiment manager, which starts/stops the services depending on the experiment and collects the required logs and stats.
The experiment manager connects to the different entities using SSH connections, which enable us to execute commands (e.g., to start or stop a particular service), to upload files (e.g., a configuration file), and to download files (e.g., logs, tcpdump traces, or stats). In the case of the LoRaWAN nodes, they connect to the experiment manager after transmitting one LoRaWAN frame in order to request the transmission parameters for the next frame. If the experiment manager commands the node not to transmit, it will ask again after 10 seconds. Figure
Experimentation testbed.
The NGINX ingress controller has been configured with the default values for load balancing the different external services, i.e., the UDP port 1700 (which is used by the lorawan-gateway-bridge container), the TCP port 1883 (which is used by the MQTT broker), and the TCP port 443 (which is used for the HTTPS-based GUI). The main parameters are
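For illustration, exposing non-HTTP services through the NGINX ingress controller is typically done with dedicated ConfigMaps mapping external ports to internal services; in the sketch below, the namespaces and service names are assumptions.

```yaml
# Illustrative sketch; namespaces and service names are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: udp-services
  namespace: ingress-nginx
data:
  "1700": "default/lorawan-gateway-bridge:1700"   # LoRa packet-forwarder traffic
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "1883": "default/mosquitto:1883"                # MQTT broker
```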
Based on this experiment manager, we have developed a framework based on scripting to generate different scenarios for both the radio access network and the core network and to automatically collect statistics, which will be used in the next section for the experimental evaluation.
Considering the two options that we have followed for the virtualization of the LoRaWAN network entities, i.e., using microservices (Kubernetes) and virtual machines (OpenStack), the following use cases have been tested:
The reason for selecting these four use cases is twofold. First, we want to test and compare different microservice configurations, which have proved suitable for the deployment of LoRaWAN servers: Kubernetes with default values (UC1, with a timeout of 300 seconds), two replicas in order to achieve a solution (almost) without service interruption (UC3), and an intermediate configuration (UC2). Second, we want to compare Kubernetes (with containers, UC1) and OpenStack (with virtual machines, UC4) under their default configurations.
Since we want to simulate a high-loaded IoT scenario, nodes will transmit 12-byte frames using SF7 and 125 kHz, leading to a minimum time between frames of 6.2 seconds due to the duty cycle (see Table
With these values, the average time between frames is 15 seconds, i.e., leading to an aggregated load of 0.8 frames per second for the 12 nodes.
In the proposed use cases, the main performance indicators are the recovery time, i.e., the time that elapses from the failure of one worker node until another worker node executes the pod with the LoRaWAN deployment, and the frames lost during that recovery. It should be noted that the lost frames depend on the gateway load, and the results shown in this section are given for the aforementioned load of 0.8 frames/sec. Additionally, we also want to show the different requirements, in terms of CPU and memory, of a Kubernetes cloud versus OpenStack.
Figures
Recovery time after server failure.
Frames lost during recovery.
As shown, UC1 takes between 5 and 6 minutes (with an average of 321 seconds and a standard deviation of 13.6) to recover due to the default value of the pod eviction timeout (300 seconds). In the case of UC2, we have reduced this timeout to 30 seconds, leading to a recovery time of around one minute (an average of 64 seconds with a standard deviation of 14.7). To conclude with the Kubernetes-based use cases, UC3 has an almost negligible recovery time. This is because two replicas are already running, and the second one takes over when the first one fails. Since we utilized the default values for the NGINX frontend, only three packets are required to switch from one replica to the other. The rate of these packets depends on the transmissions from the LoRaWAN nodes and some periodic packets, resulting in a recovery time of 4 seconds with a standard deviation of 2.4. As expected, the number of lost frames is approximately proportional to the recovery time.
In the OpenStack use case (UC4), the developed module waits 5 minutes (matching the default timeout value of Kubernetes) before provisioning and launching the new instance, leading to a total recovery time of around six minutes (an average of 358.1 seconds with a standard deviation of 0.72). As in the previous use cases, the number of lost frames is almost proportional to the recovery time.
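Since the reported losses are roughly proportional to the outage, a first-order estimate of the frames lost in each use case is simply the offered load multiplied by the average recovery time; the sketch below encodes that assumption using the average recovery times reported above.

```python
def estimated_lost_frames(load_fps: float, recovery_s: float) -> float:
    """First-order estimate: frames lost ~ offered load x outage duration."""
    return load_fps * recovery_s

LOAD_FPS = 0.8  # aggregated load from the 12 nodes (frames per second)

# Average recovery times reported for each use case (seconds)
recovery = {"UC1": 321.0, "UC2": 64.0, "UC3": 4.0, "UC4": 358.1}
for uc, t in recovery.items():
    print(f"{uc}: ~{estimated_lost_frames(LOAD_FPS, t):.0f} frames lost")
```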
Next, we compare the usage of resources of both options. For Kubernetes, we employed cAdvisor [
Usage of resources for the containers of the LoRaWAN deployment.
Container | CPU % | MEM % | MEM (MiB)
---|---|---|---
lora-app-server | 0.00 | 0.60 | 11.3
loraserver | 0.00 | 0.40 | 7.6
lora-gateway-bridge | 0.00 | 0.20 | 4.0
mosquitto | 0.00 | 0.00 | 1.5
postgres | 0.00 | 0.70 | 13.0
redis | 0.00 | 0.00 | 1.6
The results from Table
Figure
Total CPU usage including all Kubernetes processes.
In terms of memory, around 1.6 GB is used by the worker node. The processes that reserve the most memory are also related to Kubernetes (java with 392 MB, kubelet with 75.9 MB, dockerd-current with 51.5 MB, kube-proxy with 39.9 MB, and flanneld with 26.8 MB).
In the case of OpenStack, the total memory used by the worker node is 8.45 GiB when no instances are deployed, plus 288 MiB more when one virtual machine with the LoRaWAN deployment is executed. In terms of CPU, the worker node consumes 1.28 CPU cores. This means that, in our scenario, OpenStack requires more than 5 times the memory needed by Kubernetes and more than 18 times the CPU.
In this paper, we propose the usage of a microservices platform such as Kubernetes for the deployment of a LoRaWAN network infrastructure. Based on its orchestration capabilities, the proposed framework is able to withstand catastrophic situations and to rapidly recover from equipment failures in the core network. To evaluate the performance of this solution, a prototype testbed of a complete LoRaWAN network has been implemented. By using an experiment manager, we have been able to automate the node traffic generation, the collection of stats, and the injection of failures. We have evaluated our implementation in terms of time to recover, lost frames, and resource usage. After the conducted evaluation, we claim that the usage of several replicas of the LoRaWAN core network entities, together with a load balancer that automatically switches between servers in a fast and efficient way, produces an almost seamless recovery, which makes it a suitable solution to recover from a system crash caused by a catastrophic event.
For future work, based on our previous analysis [
The data has been generated from live tests in our LoRaWAN testbed. Logs and any other information are available upon request from the authors.
The authors declare that they have no conflicts of interest.
This work is partially supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (Projects TEC2016-76795-C6 and EQC2018-004988-P).