A New Approach Customizable Distributed Network Service Discovery System

Computer systems and applications on the internet provide services to outsiders; at the same time, their vulnerabilities may be exploited by attackers to leak sensitive private information. To collect and monitor the service information exposed by network environments such as the IoT (Internet of Things), vehicular networks, cloud computing, and cloud storage, it is particularly important to have a system that can quickly discover and identify specific network services. Current service discovery systems mainly rely on port scanning tools, including Nmap, Zmap, and Masscan. However, these tools hard code the service features and support only common services, so they cannot cope with network services that are updated and changed in real time. To solve these problems, this paper proposes a customizable distributed network service discovery system based on the stateless scanning technology of Masscan, together with a customizable interactive pattern set syntax. The system uses random destination address technology to scan the allocated IPv4 address space and adopts a distributed deployment scheme. Experimental results show that the system achieves high scanning speed and adapts well to new and special services.


Introduction
With the growth of internet devices and applications, large-scale cyberattacks continue to emerge, and the number of internet vulnerabilities is also surging [1,2]. Despite the recent growth in computer networking best practices, the continual improvement in Internet-based services has presented new challenges in maintaining security and preserving privacy [3,4]. Even though some enterprises have discovered vulnerabilities and released patches, many users still do not update, which leaves potential security threats and gives attackers opportunities to attack. At the same time, many web apps and services are installed on devices that host a web client and expose a user control interface through open ports, where security and privacy are critical issues [5,6]. Security supervision requires knowledge of these risks, that is, counting and monitoring the service information in a large-scale network.
The IoT, vehicular networks, cloud computing, cloud storage, and other environments provide users with flexible and convenient service access [7,8]. While they greatly improve the convenience of life, the privacy issues caused by security problems are becoming more and more serious [9,10]. For example, in vehicular networks, security plays a dominant role because applications based on vehicular networks usually affect passengers' safety (e.g., self-driving) and private information (e.g., driving history) [11]. Network security should therefore be one of the most important issues in the coming years. Searching for and gathering specific information about devices on the internet provides data for analyzing vulnerabilities, which can enhance a system's security and preserve privacy [4,12]. A common tool for this problem is port scanning, but current scanning tools have two disadvantages. On the one hand, the supported services are mostly hard coded in the tool, so for less common or newer services, users need to wait for the developer to add support. This gives poor scalability, as evidenced by the well-known Masscan, which supports only HTTP, SSL, and other common protocols but ignores industrial network protocols and instant messaging protocols. On the other hand, traditional scanning methods perform noninteractive detection, so they fail to identify services that require multiple interactions.
To solve the above problems, we designed a customizable distributed network service discovery system (CDNSDS) in this paper. The main contributions of this work are as follows:
(i) We designed a system architecture that includes three subsystems: the central control subsystem (CCS), the schedule control subsystem (SCS), and the scanning proxy subsystem (SPS). The CCS is the brain; it receives the user's instructions and manages and assigns tasks to the SCS. The SCS is the bridge connecting the CCS and multiple SPS. The last subsystem is the SPS, the key factor for performance. We optimized Masscan, currently the most efficient scanning tool, and used a distributed deployment to improve concurrency.
(ii) To make the system customizable, we compiled a pattern set of syntax conventions. The syntax conventions convert the user's customized service description, including interactive services, to a standard syntax that is accepted by the scanning tool in the SPS.
The rest of the paper is organized as follows: related work is described in Section 2; Section 3 elaborates the proposed system CDNSDS; experimental results follow in Section 4. Finally, we summarize the research in Section 5 with a discussion as well as future work.

Related Work
In the study of empirical security, fast Internet-scale network service discovery has opened a new avenue, and scanning technology plays an important role in it. One of the earlier scanning tools is Nmap [13], which maintains full connection state to track the hosts that have been scanned and to handle timeouts and retransmissions. With this state, unanswered requests cost too much time; it takes several weeks or many machines for Nmap to scan the public address space. To overcome this efficiency issue, Zakir et al. [14] designed the scanning tool Zmap, which keeps no per-connection state. Zmap does not need to track connection timeouts; during scanning, it accepts response packets that carry the correct state fields. This approach is similar to SYN cookies. Compared to Nmap, with the same accuracy, Zmap is capable of scanning the IPv4 public address space in under 45 minutes on a single machine [15], which is over 1300 times faster than the most aggressive Nmap default settings. Further, drawing on the data collected by Zmap from ongoing Internet-wide scanning, Zakir et al. [16] designed a public search engine named Censys, which supports full-text searches on protocol banners and queries over a wide range of derived fields. Censys makes it simple for researchers to answer security-related questions.
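The stateless validation described above can be illustrated with a minimal sketch. It is not Zmap's or Masscan's actual code: the function names, the HMAC-SHA256 construction, and the `SECRET` key are assumptions chosen for illustration. The idea is that the probe's initial sequence number is derived from its addressing fields, so a SYN-ACK (which acknowledges that number plus one) can be checked without any per-target state.

```python
import hashlib
import hmac
import ipaddress

SECRET = b"per-scan secret"  # hypothetical key, regenerated for each scan


def syn_cookie(dst_ip: str, dst_port: int, src_port: int,
               secret: bytes = SECRET) -> int:
    """Derive the 32-bit initial sequence number of a probe from its addressing fields."""
    msg = (ipaddress.ip_address(dst_ip).packed
           + dst_port.to_bytes(2, "big")
           + src_port.to_bytes(2, "big"))
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def is_valid_reply(reply_src_ip: str, reply_src_port: int, reply_dst_port: int,
                   ack_number: int, secret: bytes = SECRET) -> bool:
    """Check a SYN-ACK without stored state: it must acknowledge our cookie plus one."""
    # The reply's source is the probed target; its destination port is our source port.
    cookie = syn_cookie(dst_ip=reply_src_ip, dst_port=reply_src_port,
                        src_port=reply_dst_port, secret=secret)
    return ack_number == (cookie + 1) & 0xFFFFFFFF
```

A reply whose acknowledgment number does not match is simply dropped, which is why no timeout or retransmission bookkeeping is needed in this style of scanning.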
Although Zmap greatly improved performance, scanning technology continues to advance. Masscan [17], an open source project and the fastest internet port scanner, takes only six minutes to scan the IPv4 public address space, transmitting 10 million packets per second. Masscan achieves this performance in three ways. First, like Zmap, it keeps no per-connection state. Because the number of connections that Masscan maintains simultaneously is set by the program itself, the number can be very large, so its scanning speed is much higher than that of other scanners. Second, Masscan uses a custom TCP/IP stack, for which a designated network device and the PF_RING DNA driver are necessary conditions. With this lightweight protocol stack, the underlying packet processing, connection control, etc. bypass the operating system protocol stack, so protocol processing is simpler and performance increases substantially. Third, the configuration of Masscan is more flexible: it is not limited to single-port probing, and a user can specify a port range. Through a target address randomization algorithm, it randomizes the order in which hosts in the target range are probed, which helps evade detection by an Intrusion Detection System (IDS).
Besides the above well-known scanners, a number of research efforts focus on empirical security. To scan anonymously, Rodney et al. [18] performed scanning through Tor, which hides the source's IP address from the target. Andrei et al. [19] proposed a public, large-scale analysis of firmware images, which supports a global understanding of embedded systems' security. The Heartbleed vulnerability is measured and analyzed in [20]. In weak key detection, researchers [21][22][23] reported that they had computed the RSA private keys for HTTPS hosts on the internet and traced the underlying issue to widespread random number generation failures on network devices. Arzhakov et al. [24] proposed a multithreaded network scanner with a very flexible architecture that parallelizes the process of sending requests to and receiving responses from remote hosts. Focusing on automated web scanners, Fang et al. [25] gave a new direction for fingerprint detection, using a finite state machine to abstract the differences between scanners.

Service Discovery System
3.1. System Architecture. Traditional service discovery systems may cause issues such as triggering IDS alarms and the poor performance of single-node detection. In this paper, we design and implement the CDNSDS; its architecture is shown in Figure 1.
(i) Central control subsystem (CCS): it is the brain, which receives a user's instructions and manages and assigns tasks to the SCS. Users can obtain the task progress and results and manage the attributes and state of the scanning nodes. In this subsystem, we design a pattern set of syntax conventions to support customizability.
(ii) Schedule control subsystem (SCS): it is a bridge connecting the CCS and multiple SPS. It provides task division, scheduling management, and temporary storage of results.

Wireless Communications and Mobile Computing
(iii) Scanning proxy subsystem (SPS): it is the key factor for performance. The SPS consists of several distributed agent modules. Each agent is a scan node with an optimized Masscan that performs scan tasks from the SCS in real time.

Central Control Subsystem (CCS).
The CCS provides users with customizable service probe interfaces. A pattern set of syntax conventions is defined for customization as follows.
We use $s$ for a send state and $r$ for a receive state. Denote the instructions as $D = \{s_i, r_j\}$, where $s_i$ is the $i$th send state and $r_j$ is the $j$th receive state. The attributes of different states are separated by the character '.'. The property set of a send state $s$ is denoted $P_s$ (equation (1)), and the property set of a receive state $r$ is
$$P_r = \{\text{isbanner}, \text{patterns}, \text{len}, \text{goto}\}. \quad (2)$$
Table 1 lists the property descriptions of $P_s$ and $P_r$. An example state transition diagram is shown in Figure 2.
In this example, there are three kinds of state node: (1) the green solid nodes s0, s1, and s2 are send states; (2) the hollow blue nodes r1, r2, and r3 are receive states without banner output, meaning that the property isbanner equals false; and (3) the solid nodes r0, r4, and r5 are receive states with banner output, meaning that the property isbanner equals true.
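To make the send/receive state idea concrete, the following is a minimal sketch of how such a pattern set could be interpreted. It is not the paper's syntax or implementation: the receive properties follow $P_r$ (isbanner, patterns, len, goto), but their exact semantics, the send property `payload`, the `run_probe` function, and the toy protocol are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union


@dataclass
class SendState:
    payload: bytes  # hypothetical send property: bytes written to the target
    goto: str       # receive state entered after sending


@dataclass
class RecvState:
    isbanner: bool               # True: keep the response as banner output
    patterns: List[bytes]        # assumed: regexes, at least one must match
    len: int                     # assumed: number of leading bytes inspected
    goto: Optional[str] = None   # next send state; None ends the probe


State = Union[SendState, RecvState]


def run_probe(states: Dict[str, State], start: str,
              exchange: Callable[[bytes], bytes],
              max_steps: int = 32) -> Optional[List[bytes]]:
    """Walk the state graph; return the banners, or None if a receive state mismatches."""
    banners: List[bytes] = []
    name: Optional[str] = start
    response = b""
    for _ in range(max_steps):  # guard against cyclic goto chains
        if name is None:
            return banners
        state = states[name]
        if isinstance(state, SendState):
            response = exchange(state.payload)  # one send/receive round trip
        else:
            data = response[: state.len]
            if not any(re.search(p, data) for p in state.patterns):
                return None
            if state.isbanner:
                banners.append(data)
        name = state.goto
    return None


# Toy two-round interaction against a fake in-memory service (hypothetical protocol).
EXAMPLE_STATES: Dict[str, State] = {
    "s0": SendState(b"HELLO\r\n", "r0"),
    "r0": RecvState(False, [rb"^200 "], 64, "s1"),
    "s1": SendState(b"VERSION\r\n", "r1"),
    "r1": RecvState(True, [rb"demo/\d+\.\d+"], 64, None),
}


def fake_service(payload: bytes) -> bytes:
    replies = {b"HELLO\r\n": b"200 ready\r\n", b"VERSION\r\n": b"demo/1.2\r\n"}
    return replies.get(payload, b"")
```

Calling `run_probe(EXAMPLE_STATES, "s0", fake_service)` walks s0, r0, s1, r1 and returns the single banner captured at r1; a service that answers the first request differently is rejected at r0.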

Schedule Control Subsystem (SCS).
The SCS aims to build an efficient and reliable communication environment between the CCS and the SPS, while providing intermediate data storage and a high-speed read service.
The SCS contains three modules: the status management module, the message queue module, and the cache module.
(i) Status management module: to track the survival status and scanning progress of each scanning agent node, the SCS is logically responsible for building the communication environment between the CCS and each scanning node. At the same time, to deal with problems such as downtime of the CCS and changes of the server address, the SCS also provides an interface to dynamically manage the connection configuration of the scanning nodes to ensure normal operation.
(ii) Message queue module: it contains task queue management and result queue management. The detection tasks issued by the CCS are split into slices of a specified size for finer-grained detection. Each scanning agent node consumes only one slice at a time, and these task slices are handed over to the task queue management. From the scanning agent's point of view, each slice represents a scanning task, and the result of the task may be success, fail, or timeout. The task queue manages the various results that may occur after each task slice is received. A slice that fails to scan is re-enqueued for other scanning nodes to probe again. When a slice is scanned successfully, the result data are passed back from the scanning node to the result queue, which stores that result slice temporarily for the CCS.

Similarly, the result queue will also manage the status of the results processed by the central control system as described above.
(iii) Cache module: after the scanning node successfully detects the target address set, it will pass the result slice back to the result queue and then open the next detection task. Due to the large number of potential detection nodes, if each slice's results are stacked in the CCS, it will increase the pressure on its storage and processing. Therefore, the module provides a dumping service to the cache and notifies the CCS to consolidate, deduplicate, and persist the results after receiving. Thus, the cache module provides memory-level high-speed data processing functions.
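The slice-and-requeue behaviour of the message queue module can be sketched as follows. This is an illustrative single-process model, not the system's implementation: the names `split_task` and `run_slices`, the `max_attempts` limit, and the choice to treat a timeout like a failure are assumptions.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Tuple

Slice = Tuple[int, int]  # [start, end) indices into the serialized scan range


def split_task(total: int, slice_size: int) -> List[Slice]:
    """Split a scan range of `total` targets into slices of at most `slice_size`."""
    return [(start, min(start + slice_size, total))
            for start in range(0, total, slice_size)]


def run_slices(slices: List[Slice],
               scan: Callable[[Slice], Tuple[str, Any]],
               max_attempts: int = 3):
    """Consume slices one at a time; re-enqueue any slice that did not succeed.

    `scan` stands in for a scanning agent and returns (status, result), where
    status is "success", "fail", or "timeout".
    """
    task_queue: Deque[Tuple[Slice, int]] = deque((s, 1) for s in slices)
    result_queue: List[Tuple[Slice, Any]] = []  # held temporarily for the CCS
    abandoned: List[Slice] = []
    while task_queue:
        task, attempt = task_queue.popleft()
        status, result = scan(task)
        if status == "success":
            result_queue.append((task, result))
        elif attempt < max_attempts:
            # Assumption: a timeout is retried exactly like a failure.
            task_queue.append((task, attempt + 1))
        else:
            abandoned.append(task)
    return result_queue, abandoned
```

In a real deployment the two queues would live in a shared broker so that a re-enqueued slice can be picked up by a different node; the in-memory deque here only shows the control flow.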
For each packet received, the SCS will determine whether it is a task slice, heartbeat data, or task result.
(i) Task slice packet: it is handed over to the "task queue" to manage and monitor the execution (dispatch) result of this slice.
(ii) Heartbeat packet: it is forwarded to the central control system to update the survival and progress status.
(iii) Task slice result data: it is temporarily stored in the cache module, which then reminds the CCS to integrate it.

Scanning Proxy Subsystem (SPS).
A SYN probe sent to a target port has four possible outcomes:
(i) SYN-ACK: the destination port is open.
(ii) RST: the target is up, but the destination port is closed.
(iii) ICMP unreachable: the target is unreachable.
(iv) No response: the connection timed out.
Obviously, probing can continue only in the first case; the other cases can be abandoned directly.
Normally, scanners based on the half-open (SYN) state send an RST to close the connection after receiving a SYN-ACK. Such a scheme is not suitable for interactive detection. This issue can be resolved by two possible solutions.
(i) Use the operating system protocol stack to reestablish a connection to the open target port for deep probing.
(ii) Send an ACK instead of an RST to finish the three-way handshake.
The first solution theoretically provides a reliable connection, but the number of connections is limited. The second solution requires a user-mode protocol stack and is more efficient than the former. Fortunately, Masscan already provides this functionality. Therefore, the second solution is adopted in this paper.
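The response cases and the second solution above can be summarized in a short sketch. The TCP flag bit values are the standard ones; the `Outcome` enum, the function names, and the handling of unexpected flag combinations are assumptions made for illustration and do not reflect Masscan's internals.

```python
from enum import Enum
from typing import Optional

# Standard TCP header flag bits.
SYN, RST, ACK = 0x02, 0x04, 0x10


class Outcome(Enum):
    OPEN = "SYN-ACK"
    CLOSED = "RST"
    UNREACHABLE = "ICMP unreachable"
    TIMEOUT = "no response"


def classify(tcp_flags: Optional[int] = None,
             icmp_unreachable: bool = False) -> Outcome:
    """Map the reply to a SYN probe onto one of the four outcomes."""
    if icmp_unreachable:
        return Outcome.UNREACHABLE
    if tcp_flags is None:
        return Outcome.TIMEOUT
    if tcp_flags & RST:
        return Outcome.CLOSED
    if tcp_flags & (SYN | ACK) == (SYN | ACK):
        return Outcome.OPEN
    # Assumption: any other flag combination is treated as no usable response.
    return Outcome.TIMEOUT


def next_flags(outcome: Outcome) -> Optional[int]:
    """Flags of the next packet to send, or None when the target is abandoned.

    For an open port the handshake is completed with ACK (rather than torn
    down with RST) so that interactive probing can continue.
    """
    return ACK if outcome is Outcome.OPEN else None
```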
To record all active connections, a TCP connection table is needed to manage the Transmission Control Blocks (TCBs), each of which contains all the important information about a connection, as shown in Figure 3.
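As an illustration of such a connection table, the sketch below keys each TCB by the connection's address 4-tuple. The fields of Figure 3 are not reproduced here, so the fields of this `TCB` class (sequence numbers, a state label, and a pointer into the pattern set) and the `ConnectionTable` methods are assumptions rather than the system's actual layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (local ip, local port, remote ip, remote port)
ConnKey = Tuple[str, int, str, int]
MASK32 = 0xFFFFFFFF


@dataclass
class TCB:
    seq_no: int                 # next sequence number we will send
    ack_no: int = 0             # next byte expected from the remote side
    state: str = "SYN_SENT"
    pattern_state: str = "s0"   # hypothetical: current state in the pattern set


class ConnectionTable:
    """User-mode table of active connections, one TCB per 4-tuple."""

    def __init__(self) -> None:
        self._table: Dict[ConnKey, TCB] = {}

    def __len__(self) -> int:
        return len(self._table)

    def open(self, key: ConnKey, isn: int) -> TCB:
        """Register a connection when its SYN is sent with initial sequence number `isn`."""
        tcb = TCB(seq_no=isn & MASK32)
        self._table[key] = tcb
        return tcb

    def on_syn_ack(self, key: ConnKey, remote_isn: int) -> Optional[TCB]:
        """Advance a connection to ESTABLISHED; unknown connections are ignored."""
        tcb = self._table.get(key)
        if tcb is None:
            return None
        tcb.seq_no = (tcb.seq_no + 1) & MASK32   # our SYN consumed one sequence number
        tcb.ack_no = (remote_isn + 1) & MASK32   # so did the remote SYN
        tcb.state = "ESTABLISHED"
        return tcb

    def close(self, key: ConnKey) -> None:
        self._table.pop(key, None)
```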
The interactive service detection hierarchy is displayed in Figure 4. Through asynchronous threads, the sending and receiving are separated.
During service discovery, packets need to bypass the original system stack; otherwise, the original system stack will send an RST packet because it has no record of the connections. This paper proposes two solutions.
(i) ARP spoofing: send the router an ARP packet that announces an unused IP address in the same subnet.
(ii) Modify Linux iptables: drop the traffic on a specified port.

Randomize Target Address.
The CCS delivers tasks in the form of fragments. Under the premise of address randomization, to avoid overlapping detection intervals among the nodes, the system serializes the scan range by setting its size to $S_{range} = N_{hosts} \times N_{ports}$. Denote the IP segment as $A = \{A_1, A_2, \cdots, A_n\}$ and the port segment as $B = \{B_1, B_2, \cdots, B_m\}$. The IP-port data are built from $A$ and $B$, and a scan range mapping set is defined over them. Using this mapping set, the conversion from the index of the range to host addresses and ports can be achieved according to equations (5) and (6). We use this conversion to find the IP and port of the $i$th scan task.
After serialization, suppose the fragment range set is $r$; its randomization is a set $r'$. Because the $r'$ that satisfies the condition is not unique, the degree of randomization is judged by counting the elements that $r$ and $r'$ have in common at the same position: the fewer such elements, the higher the degree. In the scan module, the target address is randomized using generalized Feistel [26] encryption to achieve $k \times M \to M$, where $k$ is any number and $M$ is the target host range. With the intervention of $k$, a mapping process achieves randomization over the same range. This method is a modification of Feistel encryption; the function is $Fe[r, a, b]$, where $r$ is the number of rounds, $a, b \in N$, and $ab \ge k$. The original address is recovered from a randomized one by inverting this mapping.
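A minimal sketch of both steps follows, under stated assumptions. Equations (5) and (6) are not reproduced above, so the index convention in `index_to_target` (host index is the remainder, port index is the quotient) is an assumption, as are all function names, the SHA-256 round function, and the `key` parameter. Here $k$ is taken to be the size of the range being permuted, consistent with the constraint $ab \ge k$; the permutation is a generalized Feistel network combined with cycle walking, which may differ in detail from the system's own variant.

```python
import hashlib
import ipaddress
import math
from typing import List, Tuple


def index_to_target(i: int, network: str, ports: List[int]) -> Tuple[str, int]:
    """Map an index in [0, N_hosts * N_ports) to an (IP, port) pair (assumed convention)."""
    net = ipaddress.ip_network(network, strict=False)
    n_hosts = net.num_addresses
    if not 0 <= i < n_hosts * len(ports):
        raise IndexError("index outside the serialized scan range")
    return str(net.network_address + i % n_hosts), ports[i // n_hosts]


def _round(j: int, value: int, key: int) -> int:
    """Keyed round function (hypothetical choice: SHA-256 of key, round, and value)."""
    digest = hashlib.sha256(f"{key}:{j}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def feistel(x: int, rounds: int, a: int, b: int, key: int) -> int:
    """Generalized Feistel permutation Fe[r, a, b] of the integers in [0, a*b)."""
    left, right = x % a, x // a
    for j in range(1, rounds + 1):
        modulus = a if j % 2 == 1 else b
        left, right = right, (left + _round(j, right, key)) % modulus
    return a * left + right if rounds % 2 == 1 else a * right + left


def shuffle_index(i: int, k: int, key: int = 0, rounds: int = 4) -> int:
    """Permute index i within [0, k) by cycle walking over a domain of size a*b >= k."""
    if not 0 <= i < k:
        raise IndexError("index outside [0, k)")
    a = math.isqrt(k - 1) + 1   # ceil(sqrt(k))
    b = -(-k // a)              # ceil(k / a), so that a * b >= k
    x = feistel(i, rounds, a, b, key)
    while x >= k:               # re-encrypt until the value falls back into range
        x = feistel(x, rounds, a, b, key)
    return x
```

Because each index is permuted independently, a node can compute its next target from a counter alone, for example `index_to_target(shuffle_index(i, k), network, ports)`, without storing a shuffled list of the whole range.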

Experimental Environment and Deployment.
To support cluster operation and distributed scheduling, the system adopts Docker SwarmKit and is deployed on 5 nodes. In Docker Swarm Mode, the CCS and SCS are deployed on the admin node, and five SPS are deployed on the other nodes. The system deployment diagram is shown in Figure 5.

System Testing in Single Node.
In this test, we choose a single node to probe the HTTP service; the target host segment is 169.54.23.0/16, and the probe port is 80. Each reported value is the average of three test runs. The testing results are shown in Table 2.

Table 2: Single-node testing results by sending rate (pkt/s).
-    3190  3154  178  150
1k   3239  3240   95   79
2k   3151  3157   52   46
3k   3033  3080   38   34
5k   2189  2251   26   25
10k  1218  1054   17   17

From these data, we obtain the following charts. As Figures 6 and 7 show, when the sending rate is below 3000 pkt/s, the results are generally flat, but above that rate they decrease significantly. The reason is that as the sending rate increases, the total scanning time shrinks, which leaves less time to wait for service probe responses or SYN-ACKs. As a result, some response packets arrive after the scanner has shut down. In this test, 3000 pkt/s is a stable sending rate.

System Testing in Multinode.
There are a total of five nodes for testing; the pattern set comes from the analysis of the protocol icoco on port 80. To compare the distributed platform with a single node, we test the following two aspects.
(i) The scanning range is fixed, and the sending rate changes.
(ii) The sending rate is fixed, and the scanning range changes.
For the first aspect, the target host segment is 169.54.23.0/10 and the port is 80. The results are shown in Table 3, and the trend is shown in Figure 8. In Table 3, the results are the number of icoco services found.
For the second aspect, the sending rate is fixed at 1000 pkt/s; the results for different ranges are shown in Table 4, and the trend is shown in Figure 9.
As can be seen from Figure 9, the accuracy between multinode and single-node is similar. However, as the scanning range increases, the multinode shows better performance.
Consider how the ratio between the single-node and multinode cases changes at the same sending rate, as shown in Figure 10. As can be seen from Figure 10, whether the ratio reaches the ideal value of 5 depends on the packet rate and the range of the probed hosts. The following conclusions can be drawn:
(i) If the detection range is fixed, the lower the sending rate, the easier it is to approach the ideal ratio.
(ii) If the sending rate is fixed, the larger the detection range, the easier it is to approach the ideal ratio.
Based on these conclusions, for better detection results, the system can be tuned through three factors: the number of nodes, the range of detected hosts, and the network bandwidth.

Conclusions
Current service discovery systems cannot deal with network services that are updated and changed in real time. Existing scanners only support the probing of common public protocols. This paper designed a Customizable Distributed Network Service Discovery System (CDNSDS) to solve this issue. CDNSDS consists of three subsystems: CCS, SCS, and SPS. In the CCS, a pattern set of syntax conventions is defined to help users customize scan features. The SPS provides an efficient Masscan-based scanning module; for the SPS, we describe the interactive detection technology, including TCP connection management and target address randomization, in detail. Finally, Docker Swarm Mode is used for distributed container orchestration, and the experiments show that the CDNSDS has high efficiency and accuracy, especially for industrial control protocols.
As future work, the syntax of the pattern set should be extended so that it adapts better to changing protocols, for example by dynamically constructing the packet to send according to the reply packet. It is also necessary to derive mathematically the relationship between the transmission rate, the transmission range, and the optimal ratio, so that the distributed nodes can be arranged to conduct detection with higher timeliness.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.