Research Article A Covert-Aware Anonymous Communication Network for Social Communication

To effectively protect the communication content and communication behavior of social networks, anonymous communication technologies are widely used. However, anonymous communication networks such as Tor and I2P lack covertness in their control plane design, which exposes important user behavioral characteristics when users access these networks. Network monitors can therefore analyze users' communication behavior by tracking these characteristics. In this paper, the concept of covert measurement is proposed. On this basis, a software-defined anonymous communication network architecture is presented that considers the covertness of both the control plane and the data plane. Theoretical analysis and experimental results show that the proposed architecture offers better anonymity and usability than traditional anonymous communication networks such as Tor.

measurement method based on Shannon entropy [8] in information theory was proposed in 2002, many different measurement methods have been derived from information theory and entropy, such as normalized entropy [16], Rényi entropy [17], and conditional entropy [18]. Then, some scholars proposed methods based on time [11], game theory [19], and differential privacy [20] to measure the anonymity of anonymous communication networks. In 2018, Das et al. [21] proposed the anonymity trilemma, showing that only two of the three properties, strong anonymity, low delay overhead, and low traffic overhead, can be achieved at the same time, and used their evaluation model to evaluate anonymous communication networks based on onion routing.

Covert Access and Detection.
Covert access relies on obfuscation, encryption, and fields disguised as normal protocols to eliminate the traffic characteristics of software access. This paper introduces the techniques used in common anonymous communication networks such as Tor.
1. Meek: its principle is to use a nonprohibited protocol as a tunnel and carry Tor traffic inside that tunnel. It uses domain fronting [22], relying on HTTPS and CDNs to bypass censorship. Meek detection is mainly based on machine learning. Shahbar and Zincir-Heywood [23] trained a decision tree classifier on features such as the time span, the number and repeatability of connections, the amount of data transmitted, and the number of connections established, or alternatively the packet size, the number of bytes sent, and the maximum packet size; after training, the traffic can be identified. Qureshi et al. [24] showed that normal HTTPS and meek differ both in the total duration of TCP connections and in the distribution of TCP payload lengths. Zhao et al. [25] studied Tor traffic classification with state-of-the-art algorithms, including J48, J48Consolidated, BayesNet, JRip, OneR, and REPTree. In addition, the entropy of Meek traffic is also a useful feature for machine-learning-based detection.
2. Obfs: the Obfs family includes obfs2, obfs3, obfs4, and ScrambleSuit [26]. At present, obfs4 is the most commonly used. Its principle is to encrypt the traffic so that it looks like random bytes, which defeats blacklist-based fingerprint detection. Obfs4 resists active probing attacks [11] through key negotiation, preventing censors from discovering bridges by probing connections. In terms of detection, Wang et al. [27] found that combining entropy-based detection with simple heuristics (such as length checks) can identify Obfs traffic. Other detection methods include packet length detection and truncated sequential probability ratio testing.
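Entropy-based detection, mentioned above for both Meek and Obfs, can be illustrated with a short sketch. The Python below computes the Shannon entropy of a payload in bits per byte and combines it with a simple length check, in the spirit of the joint entropy-plus-heuristic detection attributed to Wang et al. [27]. The function names and both thresholds (7.0 bits per byte, 128 bytes) are illustrative assumptions, not values taken from the cited work.

```python
import math
from collections import Counter

def shannon_entropy(payload: bytes) -> float:
    """Empirical Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # iterating bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_obfs(first_payload: bytes,
                    entropy_threshold: float = 7.0,  # hypothetical threshold
                    min_len: int = 128) -> bool:     # hypothetical length heuristic
    """Flag a payload that is long enough and whose bytes look uniformly random."""
    return (len(first_payload) >= min_len
            and shannon_entropy(first_payload) >= entropy_threshold)
```

Short payloads have a low empirical entropy even when their bytes are random (at most log2 of the length), which is one reason such detectors pair the entropy test with a length condition.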

Covert Traffic Detection.
At present, traffic detection techniques [28] can be divided into four categories: (1) semantic-based detection, (2) entropy-based detection, (3) machine-learning-based detection [25], and (4) combined detection [29] using DPI and firewalls, which can reconstruct the complete traffic, analyze specific protocols, identify packet keywords, and actively probe suspicious servers to avoid false positives.
The development of software-defined networking also provides a new optimization scheme for traditional anonymous communication networks, such as deploying software-defined, network-layer anonymous communication protocols on the autonomous-system routers of network service providers (e.g., LAP [30] and PHI [31]). These new anonymous communication networks separate the control plane from the forwarding plane, making the information transmission path programmable.

Threat Models
This paper assumes that ISPs in the network are subject to large-scale supervision and that the monitoring platform of each cloud platform server is accessible to a supervisor. However, supervisors do not have direct control over individual hosts. Nodes on the Internet are therefore divided into four types: client nodes, which users use to access the network; nodes that forward information in the anonymous communication network (hereafter, controlled nodes); nodes controlled by supervisors (hereafter, malicious nodes); and dazed third-party nodes.
For a client node, in the environment described in this paper, the supervisor can only see the traffic entering and leaving the node and cannot obtain control of the node. The supervisor therefore tries to detect whether users use anonymous communication networks and to obtain users' communication relationships and other information through wiretapping, recording, replay, traffic analysis, and other means.
Because controlled nodes are widely distributed, this paper assumes that the supervisor can only monitor and analyze the state of controlled nodes, and their incoming and outgoing traffic, within a certain physical area; it cannot obtain the states of all nodes in the anonymous communication network. Nevertheless, the supervisor can add malicious nodes under its control to the anonymous communication network to mount a man-in-the-middle attack or a Sybil attack.
For malicious nodes, this paper considers only the case in which a large number of malicious nodes controlled by supervisors are concentrated in one physical region and a small number are distributed in other regions. Beyond that, supervisors can only access data from some Internet infrastructure providers. Because anonymous communication networks are distributed all over the world, their design rules require that no two nodes in a path be located in the same country or region; the case in which all traffic along a transmission path is tracked is therefore not considered.
In this paper, dazed third-party nodes mainly refer to Internet infrastructure platforms with massive numbers of users and data files, such as web storage for media files, social platforms for publishing information, and Git repositories for hosting code. This paper assumes that the supervisor has the same access rights to these dazed third parties as an ordinary user and cannot access users' usage records on them.

Tor.
Tor is a widely deployed and popular anonymous communication network whose main purpose is to prevent attackers from identifying communicating parties or linking communications to a single user. Tor is based on a P2P network architecture and uses the onion routing protocol. Its data are transmitted through a series of volunteer-operated nodes on the Internet that no single party controls; that is, the Tor network contains controlled nodes, malicious nodes, and dazed third-party nodes.
Tor works as follows: a client builds a circuit by selecting entry, middle, and exit nodes. The Tor client obtains the current network consensus file from Tor's directory authorities. This file contains basic information about each relay in the current Tor network, such as its IP address, bandwidth, location, and supported services, and it is updated every hour. The client selects three nodes from those listed in the consensus file, and the probability of selecting a node is approximately proportional to its bandwidth weight. When circuits are created and used, the layered encryption of onion routing ensures that each relay knows only the previous hop and the next hop, so no single relay can link the client to the destination [32].
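The bandwidth-weighted choice of three distinct relays described above can be sketched as follows. This is a simplified illustration: the `Relay` type and its weights are hypothetical, and real Tor clients additionally apply position-specific weights, flag requirements (such as Guard and Exit), and family and subnet restrictions that are omitted here.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Relay:
    nickname: str
    bandwidth_weight: int  # consensus bandwidth weight (illustrative values)

def pick_weighted(relays, rng, exclude=()):
    """Pick one relay not in `exclude`, with probability proportional to its weight."""
    candidates = [r for r in relays if r not in exclude]
    weights = [r.bandwidth_weight for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def build_circuit(relays, rng=None):
    """Choose entry, middle, and exit relays without repetition."""
    rng = rng or random.Random()
    path = []
    for _ in range(3):
        path.append(pick_weighted(relays, rng, exclude=path))
    return tuple(path)
```

A relay with weight 0 is never chosen, and a relay with twice the weight of another is, for a single draw, twice as likely to be picked.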
To improve the security and anonymity of the service, Tor clients can use different protection mechanisms when accessing the Tor network, such as bridge nodes [33], Meek covert channels, Obfs obfuscation, and FTE (format-transforming encryption), to protect user traffic from supervision during access. The protection mechanism used during access is chosen by the user, and the probability that a candidate node is selected for circuit establishment is proportional to its bandwidth [34]. To become a candidate node, a relay must meet several criteria that ensure good performance and raise the cost of attacks. First, the relay must be measured by Tor's bandwidth measurement system, which takes two weeks [23]. Second, the relay must have enough bandwidth for its weight to reach at least 2000; according to measurements by Gerry Wan et al. [35], this corresponds to approximately 35.5 Mbit/s. Third, the relay must have stayed online long enough to be considered stable. Fourth, the relay must remain online long enough to be considered familiar.
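The four eligibility criteria can be expressed as a simple filter. The field names of the hypothetical `RelayStatus` record are assumptions made for illustration; in Tor itself these conditions are encoded as consensus flags and measured weights rather than as a record like this one. The threshold of 2000 is the weight quoted in the text.

```python
from dataclasses import dataclass

MIN_WEIGHT = 2000  # minimum bandwidth weight quoted in the text

@dataclass
class RelayStatus:
    nickname: str
    measured: bool   # measured by the bandwidth measurement system
    weight: int      # consensus bandwidth weight
    stable: bool     # online long enough to be considered stable
    familiar: bool   # online long enough to be considered familiar

def eligible(r: RelayStatus) -> bool:
    """A relay qualifies only if all four criteria hold."""
    return r.measured and r.weight >= MIN_WEIGHT and r.stable and r.familiar

def eligible_relays(relays):
    return [r.nickname for r in relays if eligible(r)]
```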

Anonymous Communication Network Based on Software Definition.
This paper proposes a software-defined anonymous communication network that achieves good covertness. The network architecture is shown in Figure 1. In the user access stage, the network uses an access method based on public Internet services. In the data transmission stage, the data pass through two parts: an isolated network and a core network. The isolated network consists of a control center and multiple Internet infrastructure platforms (dazed third parties). The core network consists of a control center and several controllable forwarding nodes distributed all over the world.

User Access.
This stage is jointly completed by the client, the anonymous communication network (controller and access agent), and the dazed third party. It establishes an implicit communication relationship with the dazed third party and retrieves response data files while the client and the servers of the anonymous communication network remain unaware of each other. The specific process is shown in Figure 2.
The working mechanisms of the client, the access agent, and the control center are described by the following algorithms, beginning with Algorithm 1.
The client in Figure 2 obtains the temporary address of the registry through an out-of-band channel, such as SMS or key information hidden in TikTok content [36], and, before accessing the communication network, obtains its identifier, a public/private key pair, and the list of reachable access agents from the registry. After receiving the response from the access agent, it downloads the real resource.
The access agent relays messages between the client and the controller (Algorithm 2).
When the controller receives a request from an access agent, it verifies the message and uploads the real resource to a web storage; the address of this storage is then sent to the access agent (Algorithm 3).
Before accessing the anonymous communication network, the client obtains the temporary address of the registry out of band and obtains its identifier, a public/private key pair, and the list of reachable access agents from the registry. To access the network, the client sends an access request message to an access agent, which forwards it to the control center server. The control center server then packages the control information corresponding to the request and saves it to a third-party web storage. After receiving the reply from the control center server, the access agent returns it to the client and notifies the client to read the control information from the specified third-party storage node. The client then reads the control information from that node.
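The access exchange can be sketched as an in-memory simulation. The controller function follows the steps of Algorithm 3 (decrypt, verify, pick the first available web storage) and then uploads the control information, which is the controller behavior described in the text. Everything else is an assumption for illustration: `WebStorage` stands in for a dazed third-party platform, decryption and signature verification are passed in as callables instead of being implemented, and the message fields `file_id` and `resource` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class WebStorage:
    """Hypothetical in-memory stand-in for a dazed third-party web storage."""
    name: str
    available: bool = True
    files: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> str:
        self.files[key] = data
        return f"{self.name}/{key}"

    def get(self, key: str) -> bytes:
        return self.files[key]

def controller_handle(request, decrypt: Callable, verify: Callable,
                      storages, resources) -> Optional[str]:
    """Controller side: decrypt, verify, choose an available storage, upload."""
    message = decrypt(request["data"])      # dec(request.data, SK_controller)
    if not verify(message):                 # verify(message, PK_AA)
        return None                         # Algorithm 3 returns an error here
    for storage in storages:
        if storage.available:               # checkAvailable(webStorage)
            return storage.put(message["file_id"], resources[message["resource"]])
    return None                             # no storage is reachable

def access_agent_forward(request, controller) -> Optional[str]:
    """Access agent: relays the request and the controller's reply unchanged."""
    return controller(request)

def client_fetch(address: str, storages) -> bytes:
    """Client: read the control information from the indicated storage."""
    name, key = address.split("/", 1)
    store = next(s for s in storages if s.name == name)
    return store.get(key)
```

Passing `decrypt` and `verify` as parameters keeps the sketch runnable without committing to a particular cryptographic library; a real deployment would plug in public-key decryption and signature verification there.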

Isolated Network.
By exchanging files rather than data streams, with anonymity provided by the file naming and content protection provided by file encryption, the isolated network transmits data over basic public Internet services. It protects anonymous communication users through asynchronous communication, fragmentation, and flow-screening mechanisms, and it ensures the covertness of the client's traffic before that traffic enters the anonymous communication network.
At this stage, the data sent by the client are transmitted to the dazed third-party platforms as files, and the corresponding A-nodes in the core switching network then retrieve the files from these platforms. Because the third-party platforms are public services used by many people, a regulator cannot easily identify the controlled nodes or the transmitted traffic.
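Fragmenting a file and scattering the pieces over several public platforms, under names that only key holders can compute, can be sketched as follows. The naming rule (a truncated HMAC-SHA256 of a transfer identifier and chunk index under a shared key), the round-robin placement, and the use of plain dictionaries as stand-ins for the third-party platforms are all assumptions. Content encryption is deliberately left out; in practice each chunk would be encrypted, for example with an authenticated cipher, before upload.

```python
import hashlib
import hmac

def chunk_name(key: bytes, transfer_id: str, index: int) -> str:
    """Hypothetical naming rule: unlinkable to the transfer without the shared key."""
    msg = f"{transfer_id}:{index}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()[:32]

def scatter(data: bytes, key: bytes, transfer_id: str, platforms: list, chunk_size: int) -> int:
    """Split data into chunks and place them round-robin on the platforms."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for i, chunk in enumerate(chunks):
        platforms[i % len(platforms)][chunk_name(key, transfer_id, i)] = chunk
    return len(chunks)

def gather(key: bytes, transfer_id: str, platforms: list, count: int) -> bytes:
    """Recompute the names, fetch the chunks in order, and reassemble the file."""
    parts = []
    for i in range(count):
        parts.append(platforms[i % len(platforms)][chunk_name(key, transfer_id, i)])
    return b"".join(parts)
```

The receiving A-node needs the shared key, the transfer identifier, and the chunk count; in the described system these would arrive as control information.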

Core Network.
The core network borrows the idea of software definition and uses a controller together with controlled forwarding nodes to make both the nodes and the forwarding paths programmable. To keep network communication covert, the system uses file exchange instead of message exchange, which makes the communication asynchronous. After leaving the isolated network, the data to be transmitted are synchronized to each intermediate node as files, and the intermediate nodes deliver the data to the receiver along the specified forwarding path. The two communicating parties never exchange encrypted traffic directly.
(1) Architecture. The system proposed in this paper consists of N nodes and K console servers, as shown in Figure 3.
As a communication user, a node also provides file storage and forwarding services for the anonymous communication of other nodes. Each node maintains N folders and N − 1 backup files, where the N folders correspond to the nodes i. The console server is the core of the system: it controls the IP addresses of all nodes and determines whether each node participates in the communication process. The sender can therefore set the forwarding route through the console server before communication. The console server can also control whether a node performs file synchronization. Because the controller handles a large volume of traffic, the system uses multiple controllers so that the supervisor cannot trace the source.
[Figure: Anonymous User; Internet User; Mix Network; Access Network]
Security and Communication Networks
The nodes running on the Internet are silent at first and can be activated by the controller server when they are needed for the communication process.
(2) Route Selection. The route is chosen either by the controller, which selects m controlled nodes (m < N − 1), or by the sender, which selects m nodes; together they form the path R:
$$R = (n_1, n_2, \ldots, n_m), \qquad 3 \le m < N - 1.$$
In addition, the route must satisfy the following constraints: it must pass through different countries, it must pass through different VPS providers, and it must traverse at least three controlled nodes. The control center server then sends synchronization configuration commands to each node, as depicted in Figure 4.
(3) Information Transmission. As shown in Figure 5, node A synchronizes information, for example, the exchanged keys, to the nodes in the forwarding path in the form of a file protected with wildcarded identity-based encryption [37].
The nodes in the path synchronize the information in turn until node B receives the file, and node B returns a receipt identifier in the same way. During this process, what the supervisor observes is node A communicating with some node C and node B communicating with some node D.
Finally, A and B complete the communication process within the core switching network. Throughout the process, both A and B synchronize files with multiple nodes, which masks the real traffic transfer information, so third parties cannot track specific data flows.
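Route selection under the stated constraints, followed by the hop-by-hop file synchronization and the returning receipt, can be sketched as below. Reading "different countries" and "different VPS providers" as pairwise distinct within the route is an assumption (the stricter interpretation), as is the rejection-sampling strategy; the `Node` type and the function names are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    country: str
    provider: str  # VPS provider hosting the node

def select_route(nodes, m, rng, attempts=1000):
    """Pick m controlled nodes whose countries and VPS providers are pairwise distinct."""
    if m < 3:
        raise ValueError("a route must pass at least three controlled nodes")
    for _ in range(attempts):  # rejection sampling (hypothetical strategy)
        route = rng.sample(nodes, m)
        if (len({n.country for n in route}) == m
                and len({n.provider for n in route}) == m):
            return route
    raise RuntimeError("no route satisfies the constraints")

def sync_hops(route, sender, receiver):
    """File-synchronization hops from sender to receiver, and the receipt path back."""
    names = [sender, *(n.name for n in route), receiver]
    forward = list(zip(names, names[1:]))
    receipt = [(b, a) for a, b in reversed(forward)]
    return forward, receipt
```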

Security Analysis.
The proposed system is an anonymous communication system built at the application layer from controllable nodes in an uncontrolled network. It shields all information below

Algorithm 3 (controller).
Input: request, SK_controller, PK_AA, webStorageList
Output: address
message ← dec(request.data, SK_controller)
K ← message.K
if verify(message, PK_AA) == False then
    return error
address ← null
for webStorage in webStorageList do
    if checkAvailable(webStorage) then
        address ← webStorage
        break
…

(1) Security. In terms of security, this paper considers three common attacks: Sybil attacks, man-in-the-middle attacks, and DoS attacks. In a Sybil attack, an attacker obtains multiple false identities in a P2P network, so that a few entities control the majority of nodes and the network is no longer peer-to-peer. In this system, because the console server is trusted, the Sybil attack scenario is that a node is controlled by the attacker, who then obtains all synchronized files. What distinguishes this system from other P2P networks is that the console server is a trusted central control node that can monitor abnormal traffic on all nodes and notify node users when an anomaly occurs. Abnormal nodes are quickly separated from the network, ending the Sybil attack. In a man-in-the-middle attack, the attacker intercepts and forwards the information of the communicating parties. However, this system not only encrypts traffic at the transport layer with TLS 1.3 but also applies digital signatures and encryption to the payload at the application layer, so only the receiver can decrypt it, which defeats man-in-the-middle attacks.
According to Abhishta [38], a DoS attack in this system would occur when a node is maliciously controlled during the communication process and, before the console server takes it offline, sends a large amount of malicious data to other nodes, occupying network bandwidth so that normal forwarding services cannot be performed.
In the information transmission step of Section 3.1, the system sets a maximum time limit during precommunication. Therefore, when the sender node does not receive the flag information returned by the receiving node within this limit, the console server instructs all nodes to discard the malicious data, thereby preventing DoS attacks. In addition, cost and effectiveness are two goals of any communication network. Naiwei Liu [39] proposed a method for TrustZone, and a similar approach could be used to evaluate the cost and effectiveness of our nodes.
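The timeout rule can be sketched as a console-server object that records when each transfer was sent and orders a discard for any transfer whose flag has not returned within the maximum time limit. The class, its method names, and the representation of a broadcast discard order as a list entry are assumptions for illustration; the current time is passed in explicitly so that the logic is deterministic.

```python
from dataclasses import dataclass, field

@dataclass
class ConsoleServer:
    max_wait: float                              # maximum time limit from precommunication (s)
    pending: dict = field(default_factory=dict)  # transfer id -> time sent
    discard_orders: list = field(default_factory=list)

    def register(self, transfer_id: str, now: float) -> None:
        self.pending[transfer_id] = now

    def acknowledge(self, transfer_id: str) -> None:
        # The flag came back in time, so nothing needs to be discarded
        self.pending.pop(transfer_id, None)

    def sweep(self, now: float) -> list:
        """Order all nodes to drop data for transfers whose flag is overdue."""
        expired = [t for t, sent in self.pending.items() if now - sent > self.max_wait]
        for t in expired:
            del self.pending[t]
        self.discard_orders.extend(expired)  # stands in for a broadcast to every node
        return expired
```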
(2) Antitraceability. Because this system implements routing through software-based file forwarding in an uncontrolled network environment, it can better resist traditional network-level traceability attacks, both passive and active. Passive traceability attacks are all based on correlation: the attacker controls nodes of the anonymous channel and observes its ingress and egress traffic at the same time, compares the traffic packets and their order within a certain time delay, and analyzes the corresponding information to trace the communication. A correlation attack requires visibility of both ends of the communication; in large-scale network confrontations, however, the networks where the sender and receiver are located are within the supervisor's reach, and network-layer traffic is monitored and correlated by the supervisor. In our system, the sender and the receiver each appear only once in the application-layer point-to-point communication, the payload they carry is small, and most of the data are encrypted and transmitted through other nodes. Therefore, within a given time delay, the supervisor cannot associate the traffic of the two communicating parties amid the massive background traffic, which guarantees noncorrelation. Active traceability is mainly based on network watermarking: when the traffic enters the anonymous channel, the supervisor inserts specific watermark information into it and, when the traffic reaches the receiver, correlates the two, thereby destroying anonymity. Depending on the watermark carrier, such attacks fall into four types: content-based, delay-based, packet-length-based, and rate-based. What these attacks have in common is that they target a network flow.
Consequently, any watermark is inevitably lost during asynchronous forwarding across multiple nodes in different physical environments. The supervisor also cannot obtain the relationship between a node and the console server, which guarantees the noncorrelation of the system. In addition, because a different path is used to forward valid data each time, the supervisor cannot identify the real recipient, which preserves anonymity.

The Modeling Method
The goal is to quantify the covertness of an anonymous communication network, which consists of the covertness of client access and the covertness of traffic within the network. Both the client program and the traffic must therefore be examined. The covertness detection model can be represented by a covertness block diagram, similar to the malware detection model [40]. This block diagram is a logical, graphical description method: it determines the probability of covert behavior by probabilistic analysis of each available piece of data and derives a relative covertness score from that probability. Therefore, once a standard set of characteristic data to collect is specified, the covertness of the client access stage and the data transmission stage of any anonymous communication network can be evaluated.
In the general model, the detection program collects all commonly available data. Under the threat model of this paper, the data collected by the detection program cannot be tampered with by attackers.

Covertness Block Diagram.
The covertness block diagram is defined as a path from left (start state) to right (end state). Each node in the path corresponds to a condition from which a probability P_i can be determined. Thus, the probability of each node i on the path is as follows:
The nodes are ordered along the path from the smallest to the largest judgment probability of the collected data.

Access Covertness.
To observe users accessing anonymous communication networks, observers first need to observe egress traffic. As the related work in Section 2 shows, mainstream domain fronting now relies mainly on large Internet cloud service providers such as Microsoft, Amazon, and Cloudflare. Therefore, the domain names, DNS query records, and IP addresses contacted by outbound traffic have become critical observability indicators. Second, to check further whether the host running the client exhibits covert access behavior, the detection program periodically samples and scans the node within a time window T after detecting such egress traffic, to see whether there is other traffic to the same destination. If there is, P_i indicates that the node has no covert access.
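The follow-up check described above (does traffic to the same fronting destination recur within the window T?) can be sketched over a list of captured flows. The `Flow` record, the assumption that each destination's owning cloud provider is already known (for example from published IP ranges), and the function name are hypothetical; how the result is turned into P_i follows the text and is not modeled here.

```python
from dataclasses import dataclass

# Providers named in the text as the main domain-fronting carriers
FRONTING_PROVIDERS = {"microsoft", "amazon", "cloudflare"}

@dataclass(frozen=True)
class Flow:
    timestamp: float   # seconds since the start of the capture
    destination: str   # destination IP address or domain name
    provider: str      # owning cloud provider (assumed to be resolvable)

def same_destination_within(flows, window):
    """For each destination owned by a fronting provider, report whether a later flow
    to the same destination is seen within `window` seconds (the time T in the text)."""
    flows = sorted(flows, key=lambda f: f.timestamp)
    seen = {}
    for i, f in enumerate(flows):
        if f.provider not in FRONTING_PROVIDERS:
            continue
        repeat = any(g.destination == f.destination
                     and g.timestamp - f.timestamp <= window
                     for g in flows[i + 1:])
        seen[f.destination] = seen.get(f.destination, False) or repeat
    return seen
```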

Covertness of Transmission.
In the transmission stage, the traffic and the performance status of each node before and after user data enter the anonymous communication network should be considered together. The specific parameters involved in access to and data transmission through an anonymous communication network are listed in Table 1.
Based on the above methods, the model constructed in this paper is displayed in Figure 6.
The corresponding equation is as follows: (3)

Experiment
By building and deploying the system, this section presents its basic performance, including the response time of the control center, the forwarding delay, and the throughput of the core network. In addition, the covertness of the system and of Tor is measured.

Response Time of the Control Center.
The response of the control center mainly includes the flow-table switching delay, the response time of distribution, and the response time of node state acquisition. The data collected in this section are the differences between the time a database record is read and the time the web system executes the operation. The test results show that the response time of these activities is less than 5 seconds and, in most cases, no more than 3 seconds, which can be regarded as a real-time response for control purposes.

Forwarding Delay.
The forwarding delay of the core network is the time from when the sender sends data until the receiver receives it. The sender splits the original data into slices and sends them to the receiver, which restores the original data. The test results are shown in Table 2. As the number of slices increases, the delay decreases considerably. This is because the scheme transmits data in file units: when the file size is smaller than the amount of data that can be transmitted per unit time, the transmission queue can maintain fast parallel transmission, whereas a file that is too large is transmitted in a single queue. Therefore, when a business scenario requires lower delay, the fragmentation can be customized accordingly.
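Slicing, parallel transfer, and in-order reassembly can be sketched with a thread pool. `transfer_slice` is a hypothetical placeholder for uploading one slice as a file and having the receiver fetch it; it simply returns the slice, so the example demonstrates only the split, concurrency, and reassembly logic, not the delays reported in Table 2.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data: bytes, slices: int) -> list:
    """Cut data into at most `slices` contiguous pieces of nearly equal size."""
    size = max(1, -(-len(data) // slices))  # ceiling division, at least 1 byte
    return [data[i:i + size] for i in range(0, len(data), size)]

def transfer_slice(item):
    # Hypothetical placeholder: upload one slice as a file and let the receiver fetch it
    index, chunk = item
    return index, chunk

def send_in_parallel(data: bytes, slices: int, workers: int = 4) -> bytes:
    parts = list(enumerate(split(data, slices)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        received = list(pool.map(transfer_slice, parts))
    received.sort(key=lambda item: item[0])  # receiver restores the original order
    return b"".join(chunk for _, chunk in received)
```

The explicit sort by slice index matters once the transport can deliver files out of order; `pool.map` alone already preserves submission order.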

Forwarding Throughput of the Core Network.
The forwarding throughput of the core network is the volume of data forwarded by each node from sending to receiving. The sender splits the original data into a fixed number of pieces and sends them to the receiver over multiple links, and the receiver restores the original data. The test results are shown in Table 3.
The experiment shows that as the number of links increases, the load is distributed relatively evenly across links and receivers. High concurrency can therefore be achieved through multiple links, and when a business scenario requires higher transmission efficiency, this can be met by increasing the number of routes.
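Distributing a fixed set of pieces over several links so that their loads stay balanced can be sketched with a greedy longest-piece-first assignment. This is one possible strategy chosen for illustration, not necessarily the one used in the experiments; piece sizes are in arbitrary units.

```python
def assign_pieces(piece_sizes, links):
    """Greedy balancing: give each piece, largest first, to the least-loaded link."""
    loads = [0] * links
    plan = [[] for _ in range(links)]
    for idx, size in sorted(enumerate(piece_sizes), key=lambda p: -p[1]):
        target = loads.index(min(loads))  # lowest-index link among the least loaded
        loads[target] += size
        plan[target].append(idx)
    return plan, loads
```

With this rule, the difference between the most and least loaded links never exceeds the size of the largest piece, which matches the intuition that adding links spreads the load evenly.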

Security Lower Bound Assessment.
Because the anonymous communication network described in this paper adopts the idea of software definition, it can be customized for different scenarios. This section therefore obtains different levels of covertness score by adjusting the number of nodes and tests them experimentally. Specifically, a 10 MB file is forwarded from an overseas node to a domestic node. While ensuring that each data exchange in the transmission stage takes place between nodes in two different countries, the covertness score is calculated for each configuration by manually adjusting the number of nodes traversed by the client traffic, using the covertness test method described above (Figure 7).
As expected, the data transmission delay increases linearly with the number of relay nodes. From a global perspective, the more nodes there are on the link, the more likely it is that the forwarding behavior shares the same characteristics.
As a result, once the number of nodes exceeds 5, the covertness score no longer increases substantially. In the current scenario, the security lower bound for the number of relay nodes in this scheme is therefore 5.

Covertness Comparison.
The anonymous communication network described in this paper transmits data as files, while Tor and other anonymous communication systems transmit data as streams. Because Tor's circuit selection cannot be set manually and locally, this paper simulates the Tor network with Shadow on a server during the covertness comparison and runs the client of the proposed network on a CentOS 7 virtual machine following Tor's simulated communication log, reproducing the communication relationships in that log. The data of both communication processes are then collected and their covertness measured. To broaden the comparison, several systems that can be simulated on an intranet were also selected. Freenet, an anonymous file-sharing system with high latency, can be deployed through Docker. In addition, PrivaTegrity, which is based on cMix [41], and DiceMix could both be built and evaluated.
In this experiment, the probability of detecting characteristic traffic in the egress traffic is adjusted by changing the number of redundant segments (1, 2, 4, or 8) that the client program sends. At the same time, curl is used to send requests to the domain names of major Internet cloud service providers at different ratios, which sets the probability of the other indicators. The resulting data are plotted at the end of this experiment.
When using the anonymous communication network described in this paper, we send the same picture to another server located abroad through the designated client.
We run Wireshark on the host of the CentOS 7 virtual machine to capture the traffic of the virtual machine and its network card, simulating a cloud server operator and a defender inspecting its egress traffic and service status. We run tcpdump on the controlled nodes to simulate the supervisor monitoring each node.
Finally, by repeatedly replaying the communication behavior in the simulation log, the covertness scores of the proposed network and Tor are compared, as shown in Figure 8. The figure shows that the covertness evaluation method can assess the covertness of both onion-routing and mix-based anonymous communication networks. In addition, in some specific network scenarios, the software-defined anonymous communication network obtains higher covertness scores than Tor. Thanks to its high latency, Freenet obtains the highest covertness score.

Conclusions and Discussions
This paper focuses on building an anti-eavesdropping anonymous communication network system in environments where various traffic-characteristic detection mechanisms operate outside the user's control, and it analyzes the problems at every stage of anonymous communication, from access to data transmission. It proposes an anonymous communication network system based on the idea of software definition. Because there is currently no good quantitative method for evaluating these problems, this paper proposes a method to measure covertness and uses it to compare the proposed system with Tor, demonstrating the system's effectiveness in terms of covertness.
Many details still need to be improved. Beyond information at the transport or application layer, the optimized firewall anomaly resolution proposed by Fulvio Valenza [42] could allow specific rules to be changed more quickly. The number of channels could also be increased; for example, Sherifdeen Lawa [43] introduced micro-frontends, which could be used to deploy microservices faster and more flexibly [14].

Data Availability
All relevant data used to support the findings of the study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.