VPN Traffic Detection in SSL-Protected Channel

network. The proposed system analyzes the communication between the user and the server, extracts features from the unencrypted parts of the network, transport, and application layers, and classifies the incoming traffic as either malicious (i.e., VPN traffic) or standard traffic. Network traffic is analyzed and classified using DNS (Domain Name System) packets and HTTPS (Hypertext Transfer Protocol Secure) traffic. Once traffic is classified, each connection is analyzed using the server's IP address, the TCP port connected, the domain name, and the server name inside the HTTPS connection. This helps verify legitimate connections and flags VPN-based traffic. We studied the top five freely available VPN services and analyzed their traffic patterns; the results show successful detection of the VPN activity performed by the user. We analyzed the activity of five users inside the network, each using some VPN service for their Internet activity. Out of a total of 729 connections made by these users, 329 were classified as legitimate activity and the remaining 400 as VPN-based connections. The proposed system is lightweight, keeps overhead minimal in both network and resource utilization, and requires no specialized hardware.


Introduction
To enable communication between computers, the TCP/IP stack was implemented. The stack was implemented without considering the security of the information being transferred [1]. This issue raised many security concerns, which are constantly managed by different security services [2]. Secure Sockets Layer (SSL) is commonly used to provide authentication and encryption services in the TCP/IP stack [3].
The share of encrypted traffic in the network has increased greatly in the last decade due to security concerns in general and privacy concerns in particular [4]. Encryption has provided many benefits for the user, ensuring end-to-end secrecy and data confidentiality. At the same time, the need to inspect the traffic originating from or destined for the organization's network has increased immensely for many security reasons. One of the reasons may be simply to validate the parties involved in the communication [5].
Simple firewalls are generally not equipped with SSL inspection or off-loading, so encrypted traffic passes without any inspection [6]. This allows malicious traffic inside the network over covert channels that are not inspected by the firewall [7]. There is a dire need to distinguish legitimate from illegitimate traffic with minimal network overhead and overall system cost. This will allow organizations of any scale to better enforce their organizational policies.
A virtual private network (VPN) service may be used to hide the real traffic in the network, which may otherwise not be allowed or may be monitored [8]. A user of a VPN service connects to a VPN server outside the network using a normal Transport Layer Security (TLS) connection. Once connected, the client requests the website or service from the VPN server [9,10]. The VPN server originates the request to the requested server on behalf of the user. The encrypted response is sent to the user on the already established channel; as a result, the whole activity passes any filter on the network firewall.
Such techniques may be used by users who aim to hide their Internet activity from the organization or to deceive it [9]. This paper proposes a novel technique to detect VPN traffic inside a network. The proposed technique extracts network traffic features and classifies the traffic as legitimate or not. Key features are extracted from the network traffic and compared against the already identified features of traffic found to be illegitimate or VPN traffic. The system is also able to detect traffic that does not follow the pattern of normal traffic or normal user activity and flags that particular traffic stream as invalid. We tested our system against five well-known, freely available web-based VPN service providers; the proposed system was able to classify all of them correctly. More traffic-characterizing features may be added to identify more applications.

Related Work and Comparison
VPN services such as TOR [11], Hotspot Shield, and others have unique fingerprints, and not all of them can be distinguished using the same criterion. Yamada et al. discussed a technique that uses statistical analysis of encrypted traffic [12]. The scheme uses the data size of network packets and performs timing analysis on the received packets to detect malicious traffic inside an encrypted channel. This technique is very useful for Web service providers to analyze the traffic coming to their servers and detect any malicious activity coming from outside the network.
A study on Android-based applications that use VPN services [13] shows that these VPN services may use third-party trackers to track user behavior, and some may be used to bypass the Android sandbox environment. Once malware or a virus is delivered to a device inside the network, the whole network is vulnerable to attacks [14].
VPN clients inside the network act as a proxy that connects to the respective VPN server. Once the connection is established, the VPN service provider is able to change or eavesdrop on the information and network traffic as required [15,16]. This attracts many third-party advertisement or tracking entities [17,18]. Any malicious entity can read, save, and/or modify the user's request and the related information to and from the destined service.
VPN services can change the data, as they are in control of incoming and outgoing traffic between the network and the device. VPN services are also able to perform TLS interception [19] by using their own certificates, which must be trusted locally by the system for the VPN service to work properly. This leads to a potentially riskier situation when the connected device contains sensitive data [13,20]. One of the countermeasures to this issue is certificate pinning [13,21]. Detecting such VPN services inside the network can therefore prevent large losses of information.
Goh et al. [22] propose a man-in-the-middle approach to detect VPN traffic in the network. The article puts forward a solution that uses a secret-sharing scheme, which involves a massive key management overhead using the public key infrastructure (PKI) technique. The paper assumes that the traffic coming to the system is unencrypted and that the data are available in plain form for the system to analyze and detect VPN traffic. This is achieved by using an application layer proxy, which generates a copy of the unencrypted traffic for each connection and sends it to the system for further analysis. This technique approximately doubles the network traffic and the computational load on the existing system, while increasing the memory required to decrypt and re-encrypt the web traffic.
Another solution that uses Deep Packet Inspection technique [23] uses multiple sensors throughout the network to get the unencrypted traffic from the end hosts and send it back to snort-based IDS [24] to detect unusual behavior in traffic. It increases the overall network traffic because a sensor is to be installed on each network machine to be able to detect any unusual activity. Another technique is to copy the entire connection traffic and use preshared secret to analyze any malicious traffic [25].
To identify applications being run inside the network, network analysis is used extensively. The work discussed by He et al. [26] uses a basic yet highly effective and widely used technique in network traffic analysis for traffic classification. Based on five-tuple connection classification, the technique uses connection characteristics such as packet size, interarrival time, and the direction and order of the packets to identify the network signature of any Android application.
The scheme provides a basic understanding of traffic classification. However, network traffic generated by web-based VPN services has no major difference from, or identifying characteristics relative to, a standard HTTPS connection.
The use of unencrypted traffic to manage, analyze, and categorize encrypted traffic is an exciting concept, discussed by Niu et al. [27]. The scheme uses a labelled DNS-based data set to identify malicious command and control traffic and labels the traffic as suspicious or normal. The concept provides a unique perspective for analyzing network traffic beyond the five-tuple/current-connection technique discussed previously [26]. Table 1 provides basic attributes of the techniques already discussed. These techniques pave the way for our proposed scheme.
Our proposed system analyzes DNS records to identify malicious or illegitimate VPN server names. Connection features are extracted using the five-tuple approach, which classifies each new connection by the five attributes listed below:
(i) Source IP
(ii) Destination IP
(iii) Protocol (TCP/UDP)
(iv) Source port
(v) Destination port
DNS-based traffic analysis and connection management have previously been done using five-tuple techniques; our proposed system goes a step further and analyzes the HTTPS handshake. This is done to verify the server name used in the connection against the DNS activity that the user has generated. By managing a connection using the activity preceding it, this novel approach lets us detect and identify VPN traffic inside the network.
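As an illustration of the five-tuple idea described above, the following minimal Python sketch keys a connection table on the five attributes so that the first packet of a new connection can be recognized. The names FiveTuple and ConnectionTable are hypothetical and are not taken from the proposed system.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """The five attributes that identify a connection."""
    src_ip: str
    dst_ip: str
    protocol: str   # "TCP" or "UDP"
    src_port: int
    dst_port: int


class ConnectionTable:
    """Hypothetical table that counts packets per five-tuple."""

    def __init__(self):
        self._packets = {}

    def observe(self, key: FiveTuple) -> bool:
        """Count one packet; return True if the connection was not seen before."""
        is_new = key not in self._packets
        self._packets[key] = self._packets.get(key, 0) + 1
        return is_new
```

Only connections for which observe returns True would need to go through feature extraction and classification; later packets of the same connection can reuse the earlier verdict.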

Forensic Analysis of VPN Services Client
To detect the network activity of VPN services, we carried out a forensic analysis of their clients. For this purpose, we chose the top five freely available web-based VPN services listed below:
(i) Hotspot Shield
(ii) ZenMate
(iii) TOR Browser
(iv) Browsec VPN
(v) Hoxx VPN
For each of these VPN services, we analyzed the network traffic generated by its client installed on a user PC. The initial analysis was performed using Wireshark [28] and NetworkMiner [29]. A detailed analysis of each VPN service is given below.

Hotspot Shield. Hotspot Shield [30], developed by AnchorFree, is one of the leading free VPN services. We tested two versions of it: (i) Client application for Windows desktop (ii) Firefox add-on

Client Application for Windows Desktop.
In the client version of this VPN service, we observed that, once enabled, the service uses the standard port 443 for HTTPS connections but generally connects to only one server. All traffic, even traffic to multiple sites, uses the same active connection. Figure 1 shows the connection details for the current user activity with Hotspot Shield. Hotspot Shield uses a fake, well-known server name in the SSL certificate to get traffic past any server-name-based filters on the network, as shown in Figure 2.
It can be seen that the server name used is twitter.com. The client does not generate any DNS entry for this server name. The NetworkMiner tool shows the connection details in Figure 3. We can see that eight unique connections were made; in this case, it generally means eight unique web pages were open. Requests for all these web pages were handled by the server whose IP is 136.0.99.219. The certificate details received from this server IP can also be seen. In total, 20,708 packets were sent in this activity, and 11,684 packets were received. Figure 4 shows all the DNS queries generated by the user while using the Hotspot Shield client; no DNS activity for this host name was found during the communication.

Firefox Add-On.
The Hotspot Shield add-on uses the standard HTTPS port along with standard DNS queries. The only way to detect Hotspot Shield inside the network is to identify the domain names it uses. Figure 5 shows the network traffic generated by Hotspot Shield, captured using Wireshark.
It can be seen in Figure 6 that the domain name is ext-miex-nl-ams-pr-p-1.northghost.com for which the connection is established.
We observed that a Hotspot Shield domain name consists of two main parts: (i) Server identifier (ii) Domain name. This can also be seen in the certificate details in Figure 7, analyzed by the NetworkMiner tool. The domain name is clearly *.northghost.com, and the other part is a server identifier, as it may change once the connection is reinitiated. The connections for Hotspot Shield were established against only one server, with IP address 216.162.47.67. A total of 35 connections were established; 20,708 packets were sent in this activity, and 11,684 packets were received.
The add-on also generates standard DNS activity, as shown in Figure 8.
Changing the VPN locations from add-on's option has no effect on the server being connected by the client as the server identifier in the same activity does not change.

ZenMate.
ZenMate [31], developed by ZenGuard, is also a very popular free VPN service. We analyzed the Chrome-based add-on of ZenMate. It uses the standard HTTPS port along with standard DNS queries. The only way to detect ZenMate inside the network is to identify the domain names used by ZenMate VPN. Figure 9 shows the network traffic generated by ZenMate VPN, captured using Wireshark.
It can be seen in Figure 10 that the domain name is 63.ayala-maroon.ga for which the connection is established.

Security and Communication Networks 3
Like Hotspot Shield, ZenMate's domain name also consists of two main parts: (i) Server identifier (ii) Domain name. This can also be seen in the certificate details in Figure 11, analyzed by the NetworkMiner tool. The domain name is clearly *.ayala-maroon.ga, and the number part is a server identifier. ZenMate is unique among the VPN services as it constantly changes the servers to which a user is connected. So, any suspicious or long activity with one server cannot be identified by automated tools. As seen in Figure 11, multiple host names under the same domain are listed in the SSL certificate provided by the VPN server. These servers/hosts may be used randomly to request multiple resources over the Internet. The figure clearly shows that the number of connections against this server is only five, which is fewer than for the other VPN servers discussed in the paper. Another unique feature of ZenMate is that it also changes the domain name once the location of the VPN server is changed from the add-on's settings. As shown in Figure 12, the server name changes to 34.lutz-obrien-olive.ga once the user has changed the location.

Table 1: Attributes of the techniques discussed.
Research technique | Strengths | Limitations
NIDS-based technique [22] | (1) Complete architecture to handle encrypted traffic-based intrusion detection (2) Protection against remote access and evasion techniques | (1) Multiple devices to be added in the network (2) Increased bandwidth inside the network due to traffic duplication
DNS-based technique [27] | (1) Introduces the concept of DNS scoring and analysis. Helpful in detecting malicious CNC based on DNS | (1) All CNC may not use only DNS-based implementation
Connection-based technique [26] | (1) Five-tuple-based connection management. Helpful in identifying different protocol and application behavior | (1) Traffic generated by HTTPS-based VPN will generally look like standard HTTPS streams
ZenMate changes domain names according to the region selected by the user; for the same region, the server identifier of the domain name may change, but the domain remains the same. If a user keeps changing locations, then once all available locations are exhausted, the domains for each location can be identified. As shown in Figure 13, the ZenMate domains used by this user are as follows:
(i) lutz-obrien-olive.ga
(ii) ayala-maroon.ga
(iii) hall-silver.ga
(iv) young-purple.ga
This information can now be used to prepare a filter to identify ZenMate VPN inside the network. One can also notice that the last part of the domain is always a color and that the domain ends with .ga. So, if we receive a DNS request or response whose domain name ends with .ga and contains "-" (dash), the name can be split on "-". If the last string after splitting contains any well-known color name, we can classify it as a ZenMate DNS server. Figure 14 shows the domain name analysis done by NetworkMiner, where the same pattern can be seen.

TOR Browser. TOR Browser [11] is generally used by users to hide their Internet activity and to access resources on the dark web. It uses the concept of onion routing to hide the user's activity. We installed TOR Browser to analyze the network traffic it generates. It uses a nonstandard port for communication over the Internet: initially, HTTPS over TCP port 9001 for the circuit connection.
After the circuit connection is established, TOR may use port 443 for normal Internet traffic, or any other port as configured. TOR will generally not generate any DNS traffic. A normal TOR stream viewed in Wireshark is shown in Figure 15.
Opening each website may create a new connection to a server; the server names and their IP addresses are communicated to TOR Browser during the circuit establishment process and are encrypted. Figure 16 shows a TOR-based TCP stream analyzed in Wireshark.
Connection details of a TOR connection analyzed by NetworkMiner are shown in Figure 17. It shows that, against server IP 5.9.42.230, a total of 639 packets were sent and 586 packets were received by the user.
The complete activity of the user for the session being discussed is also shown in Figure 18. It is interesting to mention here that no DNS activity was found for TOR Browser.

Browsec VPN. Browsec VPN [32] is another freely available VPN. We used it as a Firefox add-on. It uses the standard HTTPS port along with standard DNS queries. The only way to detect Browsec VPN inside the network is to identify the domain names used by it. Figure 19 shows the network traffic generated by Browsec VPN, captured using Wireshark.
It can be seen in Figure 20 that the domain name for which the connection was established is nl30.tcdn.me. Like those of other VPN services, the domain name of Browsec VPN can be further divided for better analysis. It consists of three main parts, which can also be seen in the certificate details in Figure 21, analyzed by the NetworkMiner tool:
(i) Country code
(ii) Server identifier
(iii) Domain name
The domain name is clearly *.tcdn.me, and the other part consists of a server identifier and a location identifier. In Figure 21, the location identifier is nl. Like ZenMate VPN, Browsec VPN also changes its DNS information when the location is changed, but unlike ZenMate, the domain name does not change; only the server qualifier changes. Figure 23 shows the DNS traffic generated by the user's activity.

Hoxx VPN. Hoxx VPN [33] is another freely available VPN. We used it as a Firefox add-on. It uses the standard HTTPS port along with standard DNS queries. We can detect Hoxx VPN inside the network by identifying the domain names used by the VPN service. Figure 24 shows the network traffic generated by Hoxx VPN, captured using Wireshark.
It can be seen in Figure 25 that the domain name for which the connection is established is dyn-146-185-141-219-5871-b377a.klafive.com. Like those of other VPN services, the domain name of a Hoxx VPN server can be further divided for better analysis. It consists of two main parts: (i) Server identifier (ii) Domain name. This division can also be seen in the certificate details in Figure 26, analyzed by the NetworkMiner tool. The domain name is clearly *.klafive.com, and the other part is a server identifier. Figure 27 shows the DNS traffic generated by the user's activity.

Proposed System
The proposed system distinguishes the normal flow of an Internet activity or session from an abnormal one. Normally, when a user wants to connect to a website, a DNS request is made to translate the web name to an IP address [34]. After successful name resolution, a TCP (Transmission Control Protocol) session is initiated against that IP, and the required security associations are established. This behavior may be used to monitor and analyze different features of network traffic [35][36][37].
The proposed system classifies incoming data into multiple categories depending on the current state of the connection; in addition, the Internet activity preceding the connection is monitored to identify the traffic as VPN or simple Internet traffic. The process of detecting illegitimate traffic is divided into two main processes: (i) Feature extraction (ii) Traffic classification

Feature Extraction.
To classify traffic as normal or VPN, we have to extract different traits of the network traffic. Most of these traits can be found in the current traffic stream, while some are collected before the actual stream starts. Figure 28 shows the basic flow of the network traffic feature extraction module of the system. The analyzer extracts the following information to be used for traffic categorization.
This information is extracted from the IPv4 protocol fields, source IP and destination IP [38]. Depending upon the transport layer protocol, the source and destination ports are also extracted [39].
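The address and port extraction described above can be sketched in Python as follows. This is a minimal illustration that assumes the input buffer starts at the IPv4 header (no link-layer framing) and that only TCP and UDP are of interest; the function name extract_five_tuple is hypothetical.

```python
import socket
import struct


def extract_five_tuple(packet: bytes):
    """Return (src_ip, dst_ip, protocol, src_port, dst_port), or None.

    Assumes `packet` begins with an IPv4 header.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None                      # too short or not IPv4
    ihl = (packet[0] & 0x0F) * 4         # IPv4 header length in bytes
    proto = packet[9]                    # 6 = TCP, 17 = UDP
    if proto not in (6, 17) or len(packet) < ihl + 4:
        return None
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # TCP and UDP both start with source port, then destination port.
    src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    name = "TCP" if proto == 6 else "UDP"
    return (src_ip, dst_ip, name, src_port, dst_port)
```

Reading the header length from the IHL field, rather than assuming 20 bytes, keeps the port offsets correct when IPv4 options are present.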

Domain Name Server Analysis.
Unencrypted traffic information is as important as encrypted traffic for traffic characterization and for analyzing user behavior. For any web request generated by a user, the user's browser initiates a DNS request for the IP information of the server name. The DNS server sends the user a response containing the IP information of the server [34]. This information is stored by our system so that the DNS server name can be checked against the server name in the HTTPS certificate for inconsistencies.
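One way to store the observed DNS information is a table from server name to the set of IP addresses seen in responses. The Python sketch below is only an illustration: the class name DnsTable and its methods are hypothetical, DNS packet decoding is assumed to happen elsewhere, and the address in the usage lines is a documentation-range placeholder.

```python
from collections import defaultdict


class DnsTable:
    """Hypothetical store of DNS answers observed on the network."""

    def __init__(self):
        self._ips_by_name = defaultdict(set)

    @staticmethod
    def _normalize(name: str) -> str:
        # DNS names are case-insensitive and may carry a trailing dot.
        return name.lower().rstrip(".")

    def record_response(self, name: str, ips) -> None:
        """Remember the addresses returned for `name`."""
        self._ips_by_name[self._normalize(name)].update(ips)

    def has_name(self, name: str) -> bool:
        """True if any DNS response for `name` was observed."""
        return self._normalize(name) in self._ips_by_name

    def resolves_to(self, name: str, ip: str) -> bool:
        """True if `ip` was returned in a DNS response for `name`."""
        return ip in self._ips_by_name.get(self._normalize(name), ())


table = DnsTable()
table.record_response("nl30.tcdn.me", ["198.51.100.7"])  # placeholder address
print(table.resolves_to("nl30.tcdn.me", "198.51.100.7"))
```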

HTTPS Protocol Detection.
Incoming traffic is then passed to the HTTPS detection module. The system looks for HTTPS on ports other than 443. This is done by looking for HTTPS headers on TCP-based streams whose server port number is other than 443. Many applications and services change the server port in this way.
This allows them to pass through the network firewall without being labelled as encrypted payload.
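A simple way to recognize HTTPS on a nonstandard port is to test the first bytes of the client's TCP payload for a TLS record header carrying a ClientHello. The Python sketch below illustrates this under the assumption that the payload passed in is the first data segment of the stream; the function names are hypothetical and the check is a heuristic, not the detection module of the proposed system.

```python
def looks_like_tls_client_hello(payload: bytes) -> bool:
    """Heuristic test for a TLS record that carries a ClientHello."""
    return (
        len(payload) >= 6
        and payload[0] == 0x16      # record content type: handshake
        and payload[1] == 0x03      # record version major (SSL 3.0 / TLS 1.x)
        and payload[2] <= 0x04      # record version minor
        and payload[5] == 0x01      # handshake message type: ClientHello
    )


def is_nonstandard_https(server_port: int, payload: bytes) -> bool:
    """True when a TLS handshake is seen on a server port other than 443."""
    return server_port != 443 and looks_like_tls_client_hello(payload)
```

For example, a ClientHello observed toward TCP port 9001, as in the TOR analysis above, would be reported by is_nonstandard_https, while the same bytes toward port 443 would not.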

SSL Analysis.
The proposed system decodes SSL certificates [40] once HTTPS is detected. There are four basic types of messages in SSL: (i) Handshake (ii) Change Cipher Spec (iii) Alert (iv) Application Data. From the Handshake messages, we extract server information such as the name of the server to which the connection is made. This is used to verify the server name against the DNS activity.
These features, once extracted, are used by the traffic classifier to classify each connection as VPN or normal traffic. The extracted features are stored for every new connection. Once the connection is established, it is classified as legitimate or VPN traffic based on the extracted features of the previous network traffic and of the new connection. The proposed scheme classifies incoming connections as shown in Figure 29 and discussed below.
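The server name requested by the client travels unencrypted in the server_name extension of the ClientHello handshake message. The following Python sketch shows one way to read it; it assumes the whole ClientHello fits in a single TLS record held in one buffer and returns only the first name in the extension. The function name extract_sni is hypothetical.

```python
import struct


def extract_sni(record: bytes):
    """Return the server name from a TLS ClientHello record, or None."""
    # Record header: type (1), version (2), length (2); type 22 = handshake.
    if len(record) < 9 or record[0] != 22 or record[5] != 1:
        return None                           # not a ClientHello
    pos = 5 + 4                               # skip record and handshake headers
    pos += 2 + 32                             # client version and random
    try:
        pos += 1 + record[pos]                # session id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]   # cipher suites
        pos += 1 + record[pos]                # compression methods
        ext_total = struct.unpack_from("!H", record, pos)[0]
        pos += 2
        end = min(len(record), pos + ext_total)
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:                 # server_name extension
                # list length (2), name type (1), name length (2), name
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                name = record[pos + 5:pos + 5 + name_len]
                return name.decode("ascii", "replace")
            pos += ext_len
    except (IndexError, struct.error):
        return None                           # truncated or malformed hello
    return None
```

The returned name is the value that the classifier compares with the DNS activity; a return value of None means that no name could be read from the handshake.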

IP-Based Classification.
The server IP of each new connection is looked up in an already populated IP-based hash table. This hash table contains the IP list of TOR's exit nodes [11] along with the server IPs that were previously classified by the system as VPN servers. This minimizes the resources spent on already classified VPN servers. If the server IP of the current connection is found in this IP-based hash table, the traffic is classified as VPN traffic.
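The IP-based lookup can be sketched with a Python set, which gives constant-time membership tests like the hash table described above. The function names are hypothetical, and the list format (one address per line, with "#" comments) is an assumption made for the example.

```python
import ipaddress


def build_vpn_ip_table(lines):
    """Build a hash set of addresses from an iterable of text lines."""
    table = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue                      # skip blank lines and comments
        table.add(ipaddress.ip_address(entry))
    return table


def classify_by_ip(server_ip: str, vpn_ips):
    """Return "VPN" if the server address is in the table, else None."""
    if ipaddress.ip_address(server_ip) in vpn_ips:
        return "VPN"
    return None


# Sample entries reuse server addresses reported in Section 3.
vpn_ips = build_vpn_ip_table(["# known VPN servers", "5.9.42.230", "136.0.99.219"])
print(classify_by_ip("5.9.42.230", vpn_ips))
```

A result of None means the connection is passed on to the server name-based checks; newly classified VPN servers can be added to the same set so that later connections to them are matched here.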

Server Name-Based Classification.
If the connection is not classified by the VPN IP-based hash table, the server name specified in the HTTPS Client Hello message is used to classify the connection. In normal TCP/IP-based communication, whenever a service or website needs to be accessed, its domain name is first converted into an IP address so that the resource can be reached over the Internet [41]. An IP address at a given time is bound to a specific domain. Using this, we separate normal domains from the domains responsible for VPN services. This classification is further divided into two steps.

No Server Name Analysis.
For the server name extracted from the current connection, we look up our self-maintained DNS list, populated from network traffic. If no DNS entry is present for that server name in the list, or the server IP of the connection is not associated with the given server name, the traffic is classified as VPN traffic. Typically, these IPs are shared with the client application inside the initial SSL-protected connection to the VPN server, to avoid any DNS-based filtering.
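The two conditions above can be written as a small Python check. This is a sketch under the assumption that the self-maintained DNS list is available as a dictionary from lower-case server names to sets of IP addresses; the function name and the sample entries are hypothetical.

```python
def check_dns_consistency(server_name: str, server_ip: str, dns_seen: dict):
    """Return "VPN" when the connection is not backed by DNS activity.

    `dns_seen` maps lower-case server names to the set of IPs observed
    in DNS responses for that name (assumed structure).
    """
    ips = dns_seen.get(server_name.lower().rstrip("."))
    if ips is None:
        return "VPN"      # no DNS entry for this server name
    if server_ip not in ips:
        return "VPN"      # name was resolved, but not to this server
    return None           # consistent; continue with server name analysis


# Placeholder DNS list; the twitter.com case mirrors the Hotspot Shield
# desktop client, which presented that name without any DNS query.
dns_seen = {"example.org": {"203.0.113.10"}}
print(check_dns_consistency("twitter.com", "136.0.99.219", dns_seen))
```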

Server Name Analysis.
The server name or domain name of the current connection is looked up in a list of well-known VPN servers' domain names; if it is found, the connection is classified as a VPN-based connection. The list is generated from the traffic analysis of these VPN servers, from which unique strings specific to each VPN service are extracted, as discussed previously in Section 3.
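Using the domain patterns reported in Section 3, the server name lookup can be sketched in Python as below. The suffix table and the ZenMate rule (a .ga domain whose dash-separated label ends in a color) follow the observations in the text; the color set contains only the four colors seen in the analysis and would need to be extended in practice. All names in the code are hypothetical.

```python
# Domain suffixes observed in Section 3 for three of the services.
KNOWN_SUFFIXES = {
    "northghost.com": "Hotspot Shield",
    "tcdn.me": "Browsec VPN",
    "klafive.com": "Hoxx VPN",
}

# Colors seen in the ZenMate domains; assumed to be incomplete.
COLORS = {"maroon", "olive", "silver", "purple"}


def classify_server_name(name: str):
    """Return the VPN service matching `name`, or None if none matches."""
    host = name.lower().rstrip(".")
    for suffix, service in KNOWN_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return service
    labels = host.split(".")
    # ZenMate: <id>.<words>-<color>.ga
    if len(labels) >= 2 and labels[-1] == "ga" and "-" in labels[-2]:
        if labels[-2].rsplit("-", 1)[1] in COLORS:
            return "ZenMate"
    return None


for host in ("63.ayala-maroon.ga", "nl30.tcdn.me", "example.org"):
    print(host, classify_server_name(host))
```

Matching on whole-label suffixes rather than on substrings avoids flagging unrelated domains that merely contain one of the strings.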

System Evaluation
The deployment of our proposed solution, if used only for detection, can also be passive. Passive deployment results in lower latency, as the traffic is mirrored by the switch or gateway itself. For passive deployment, all the traffic destined outside the network, as well as DNS traffic, must pass through the tapped interface, as shown in Figure 30.
We analyzed the traffic patterns of well-known, available VPN services that use the HTTPS protocol for communication. These services are listed below:
(i) TOR browser
(ii) Hotspot Shield free
(iii) Browsec VPN
(iv) ZenMate VPN
(v) Hoxx VPN
The traffic of these VPN services was analyzed, and selection criteria were built based on the patterns emerging from the analysis. The key features of each VPN service are shown in Table 2. In the case of TOR, we see nonstandard HTTPS behavior, which means that it may not be on the default port 443. We can also detect TOR using the TOR node list populated and updated by the community.
In case of Hotspot Shield, we tested two variants of its client. One was the add-on of Firefox web browser, and the other client was desktop application. In case of web browser extension or add-on, Hotspot Shield uses special domain names which are used to uniquely classify the service. In case of desktop application, the client uses nonstandard port for HTTPS with no DNS activity. Browsec and Hoxx VPNs both were tested as add-on to the browser, and they are uniquely classified using the domain names the servers use.
All three services discussed above use the same type of domain names across multiple geolocations; e.g., any traffic may be classified as Hoxx VPN traffic if its domain name matches *.klafive.com.
This is not the case for ZenMate VPN. It changes domain names with respect to the geolocation chosen by the user. The list of these domain names is communicated during initial connection setup and is updated frequently. This allows VPN services like ZenMate to work over a network that uses DNS-based filters, if these filters are not updated frequently.

Traffic Generation.
Clients of the abovementioned VPN services were installed and configured on multiple systems inside the network. These clients were enabled, and network activity was generated by surfing the Internet. The activity was monitored by the VPN detector, and alerts were generated once VPN activity was detected.

Traffic Classification Alert.
The alerts generated for the different VPN services were of different types, depending upon the activities performed by the users. The alerts generated by five of these users are shown in Table 3, which gives the traffic classification of each type of VPN service used with respect to its unique characteristics as listed in Table 2. Mostly, VPNs may be classified with the help of the DNS activity that enables the user to access such services. The results in Table 3 show that the system classified 400 out of 729 active connections as potential VPN connections. Once the system is deployed, any new connection activity in the network is monitored. Each system connected to the Internet manages its own DNS cache to reuse DNS information. If a new connection is made and no DNS activity is present in the system for the server, the system will flag it as potential VPN traffic. To improve the system's precision, the system ignores already established connections.
VPN classification based on IP and DNS activity may need periodic updates to the lists maintained by the system. Updating this information will increase the overall accuracy of the system and result in fewer false positives and negatives. Our test shows that, in case of TOR IP analysis, the IP
Figure 29: Traffic classification.
[Figure: deployment diagram with elements labelled Internal network, External network, and VPN analyzer.]

Conclusion
A VPN service inside an organization may generally be used by an individual to hide the real communication.
This communication may harm or damage the organization, and the organization may not allow such communication over its monitored network. An organization may not be able to invest heavily in SSL-based proxies to manage its network.
This paper proposes a lightweight approach to detect and block unwanted VPN clients inside the organizational network that are responsible for illegitimate activity.
Our proposed technique focuses on the information available in plain form, which means there is no need to decrypt or decode any network communication. This keeps resource utilization low. The proposed solution not only focuses on the current connection but also keeps track of the network activity responsible for this communication, i.e., DNS activity. Such mapping of DNS activity to the stream that follows it helps identify the normal behavior of the TCP/IP network stack. If no domain name information is available for the current connection, it may not be a normal traffic flow. The scheme also analyzes nonstandard use of HTTPS and detects this anomaly, as it is largely used to hide such communication from HTTPS-based filters in the firewall.
Results show that our proposed system is able to identify such trends in network traffic and classify the traffic accordingly. The analysis of the VPN services listed in Table 2 is crucial to detecting these services. These service providers keep changing the traffic characteristics of their services. Active analysis of these services must be carried out to keep the VPN detector up to date with the latest traffic trends.

Data Availability
The data used to support the findings of this study are provided within the article.