GroupTracer: Automatic Attacker TTP Profile Extraction and Group Cluster in Internet of Things



Introduction
The Global Research and Analysis Team (GReAT) at Kaspersky points out that Advanced Persistent Threat (APT) activity has become increasingly complex and destructive [1], since APT groups launch targeted attacks on critical infrastructure and attempt to compromise central networks. Meanwhile, the Internet of things has become the no. 1 security threat to personal privacy, corporate information security, and even critical infrastructure, since IoT devices are inherently risky and easy to exploit while being heavily exposed to the Internet. What is worse, attackers can use open-source tools to quickly assemble malware that can scan, penetrate, and control IoT devices. Skilled attackers can take down millions of IoT devices in a short time. Once IoT botnets are formed, attackers can launch an APT attack to endanger the Internet infrastructure and cause network disconnections (e.g., the Dyn cyberattack [2] and the VPNFilter event [3]). Because the number of attacks on IoT devices, which are perfect tools for APT attacks, has risen dramatically, the emerging challenge is how to observe and predict attacks on IoT devices by individuals or even attacker groups.

Describing Individual Behavior.
While behavior detection methods for attacks are mostly based on Indicators of Compromise (IOCs) extracted from rule-based methods or traditional blacklists, the information conveyed by such IOCs is not enough to describe the abundant and varied network security environment due to the following reasons: (i) IOC is unstable and is easily changed by attackers.
For example, if adversaries are leveraging an anonymous proxy service like Tor, they may change IPs quite frequently with little effort and never be noticed. (ii) IOC cannot express how the attacker interacts with the victim system, and the process of the attack cannot be represented. (iii) Redundancy occurs when IOC is used to express an attack. In other words, more IOCs do not necessarily lead to a better description.
Bianco proposed the Pyramid of Pain [4], in which each level of the pyramid represents a different type of attack indicator leveraged to detect the activities of the adversary, and the most valuable attack indicator is attacker TTPs. The TTP profile [5] describes the flow that adversaries go through to accomplish their mission, from initial access to impact and every step in between, and is rich enough to support a comprehensive analysis of the aggressive behaviors of individuals or attack groups. Meanwhile, defense is shifting from vulnerability-centric to threat-centric, and a flexible and efficient security architecture can only be constructed with a sufficient understanding of the threats to critical assets, which depends on an overall comprehension of the attack tactics, techniques, and behavior patterns (i.e., TTPs). However, at this stage, there is no mature method to normalize the description of attacks on IoT devices and map them to the analysis model. A method for automatic TTP profile extraction of IoT device attacks is therefore needed.

Clustering Attackers into Groups.
With the rapid growth of APT activities, the threat landscape has evolved from single hackers to well-organized attack actor groups (e.g., Darkhotel [6] and Turla [7]). How to find and depict the behavior of an attack actor group among an ocean of attacks becomes a challenge. Behavioral analysis in sandboxes [8,9] and binary analysis [10,11] seem like promising approaches, since they can match malicious samples used by attackers to known or novel malicious families and capture their behaviors to observe the similarities between these attackers. However, malicious families and attack groups have a many-to-many relationship, and we cannot rely only on the analysis of malicious samples to find the group behind attacks. Considering the excellent performance of the data-driven approach in the field of network security [12][13][14], we try to tackle the challenges from a data-driven perspective.
Given the challenges presented above, this paper aims to develop mapping knowledge bases from attacker payloads to the ATT&CK framework in order to extract the TTP profile and generate behavior fingerprints for attackers, so as to discover the groups behind active campaigns. The ultimate purpose is to observe the behavior of attack actor groups and predict attacks in the Internet of things.

Contributions.
Three critical contributions of the paper are as follows:
(i) Comprehensive Description of Attacker Behavior. GroupTracer leverages four feature groups (TTP profile, Time, IP, and URL) that are derived from log data to characterize different actions of attackers, which addresses the emerging challenge of observing and predicting attacks on IoT devices by individuals. The TTP profile depicts the technique, tactic, and procedure of the attacker. The Time feature group provides statistical characteristics based on attack duration, number of attacks, and time zone of the attacker. The IP and URL feature groups both involve the type of IP/URL and the malicious index, while the latter also analyzes the downloaded file.
(ii) Automatic TTP Profile Extraction. Considering that the data source is honeypot log data, which collects payloads utilized by attackers, we construct the 1st and 2nd knowledge bases, which store the mappings between commands and TTPs. By using these knowledge bases, GroupTracer maps commands derived from payloads to the ATT&CK framework to extract the TTP profile, which bridges the gap between cyber threat intelligence (CTI) and the attacker.
(iii) Group Cluster and Attack Tree Generation. GroupTracer uses the four feature groups and a hierarchical clustering algorithm to build an attack group cluster model, which aims at finding the potential groups behind complex attacks. In order to better understand each attack group's behaviors, GroupTracer also introduces an attack tree construction method in which nodes describe attack commands and edges represent command sequences. The evaluation result shows that GroupTracer can achieve excellent performance as the Calinski-Harabasz index reaches 3416.93.
The remainder of this paper is organized as follows: Section 2 explains the related work about the fundamental techniques used in our framework. Section 3 presents the data collection, flow of feature processing, application of the clustering algorithm, and attack tree creation in GroupTracer. The entire experiment and evaluation process is elaborated in Section 4. Finally, the conclusion and future work are discussed in Section 5.

Application of IoT Honeypot.
To study cyberattacks and defend against them, tools for proactive defense have been proposed. For instance, honeypots, which can capture attacks, document intrusion information about the instruments and behaviors of hackers, and prevent attacks from spreading beyond the compromised system [15], have been widely leveraged in cybersecurity. Due to the vulnerable and destructive nature of IoT devices [16][17][18], the number of IoT honeypots based on different protocols is rapidly increasing, and several IoT honeypots already exist [19]. The work in [20] utilizes IoTPOT, a novel honeypot that simulates Telnet-enabled IoT devices, handles commands sent by attack actors, analyzes malicious families on different CPU architectures, and provides an in-depth analysis of ongoing attack behavior. However, IoTPOT focuses on observing the characteristics of malicious families (e.g., spread tendency and ultimate goal) and the relationships between these families. It does not employ existing data to analyze in detail the behavior of the aggressors behind attacks and the associations between them, which is the center of cyber threat intelligence. A honeypot that emulates the ZigBee gateway and aims at assessing ZigBee attack intelligence and IoT cyberattack behavior is proposed in [21]. Although that paper analyzes the commands in the honeypot data at great length and classifies them into six categories of attacks, it does not mine the TTPs of these attacks, which would help analysts in threat modeling. Heo and Shin [22] analyze connection-level log data to study Telnet service scanning and provide solid evidence for the existence of IoT botnets. However, their dataset contains only connection metadata, so there is no way to analyze the payload in packets, which is critical evidence of an attack, and the conclusions are thus not entirely convincing.
In conclusion, most published methodologies have focused on a single service such as Telnet or ZigBee and analyzed features of malicious families. GroupTracer applies more broadly to protocols where command execution vulnerabilities occur and depicts the characteristics of attack behaviors. Besides, most of the previous studies have not analyzed payloads at the TTP level, and some have not analyzed payloads at all. GroupTracer extracts attack techniques, tactics, and procedures (TTPs) from payloads and utilizes payloads to build attack trees for potential attack groups to demonstrate their attack behaviors more specifically.

Cyber Threat Intelligence and TTP Extraction.
Gartner defines cyber threat intelligence (CTI) as evidence-based knowledge, which can be utilized to inform decisions concerning the subject's response to menace or compromise [23]. With the rapid evolution of the cyber threat landscape, the demand for high-quality and fast CTI exchange that allows organizations to respond to emerging threats at the tactical level is becoming increasingly urgent. TTP describes the techniques, tactics, and attack patterns used by the adversary and can be presented in structured text formats that meet this demand. Husari et al. [24] develop TTPDrill, which achieves automatic and context-aware analysis of CTI to generate TTPs precisely. Their work bridges the gap between unstructured cyber threat intelligence and structured techniques, tactics, and procedures. Nevertheless, their data source is cyber threat intelligence, which means that TTPDrill can construct a complete attack pattern only after CTI is produced. Our work aims at decreasing the time-to-defend even more.

Group Cluster.
Clustering and correlating have been studied extensively and are employed in a multitude of data-driven domains, including security and privacy areas [25]. In a similar direction to this paper, the work in [26] applies an unsupervised method to characterize and classify security-related anomalies and attacks that exist in honeypots without a learning phase, labeled traffic, or an attack signature database. Cho et al. [27] compare the similarity of the distributed domain to predict the same group, which provides the possibility of response to future attacks. Azevedo et al. correlate IOCs from different OSINT feeds and cluster them to obtain enriched IOCs. This work allows the identification of attacks that would be impossible to identify by analyzing IOCs individually. One work that inspires us comes from Ghiëtte et al. [28]. They dissect the SSH protocol to fingerprint tools based on cipher suites and SSH version strings, employing key exchange algorithms and SSH banners to cluster similar tool usage into collaborating individuals and even campaigns. However, as noted in [4], adversaries can employ or create another tool that has the same capability to evade detection.
By comparison, GroupTracer employs honeypot log data, in which timestamps, IP addresses, and payloads sent by attackers are usually recorded, and considers four different perspectives (e.g., TTPs and Time) to generate feature groups. For example, it generates TTP profiles by mapping payloads to the ATT&CK framework based on command characteristics. Then, it clusters similar adversaries' behaviors into groups based on these features. Our work draws on the strengths of the mentioned studies and improves on their weaknesses.

Framework
The ultimate goal of this paper is to automatically extract TTP profiles and cluster attack groups in the Internet of things. Figure 1 overviews the flow of GroupTracer. Firstly, it captures attacks, generates raw data, and extracts features from specific fields (e.g., timestamp, payload, and timezone). We deploy numerous honeypots on the Internet to capture attacks. Secondly, it enriches these features. To generate the TTP profile feature group, it cuts the payload into commands, maps these commands to the ATT&CK framework, and then generates the Abstract Syntax Tree of the commands for a second mapping to techniques and tactics. After generating all feature groups, encoding and the TF-IDF algorithm are utilized to vectorize these string-type features.
Thirdly, it combines all feature vectors and leverages the hierarchical clustering algorithm to cluster these attackers into groups. Finally, attack trees for each group, where nodes are commands and edges are command sequences, are created from their payloads to characterize attack profiles.

Raw Data Collection.
The log format of the open-source honeypot contains general fields and particular fields. There are 12 general fields common to all honeypots. src_ip is the source IP, and src_port is the source port number. sensor_ip and dst_port represent the IP and destination port number of the honeypot, respectively. timestamp represents the time when the event occurred. protocol represents the underlying protocol used by the honeypot. geoip describes the current geographic location of the IP and related information such as time zone, which is usually obtained through the external API GeoLite [29]. pot_version is added by the honeypot developer to indicate the current version of the honeypot. fingerprint is mostly a hash value generated from a string of src_ip, timestamp, and a honeypot-specific field. log_type depicts the type of event recorded in the log, and honeypot_type describes the type of honeypot. container_id represents the ID of the Docker container. The particular fields are determined by the specific protocol used by the IoT honeypot. GroupTracer mainly leverages three general fields, namely, the src_ip, timestamp, and geoip fields in the honeypot log data, as these three fields provide the information needed to generate the Time and IP feature groups. Given that the payload may appear in different fields, GroupTracer locates the corresponding fields through string matching. To ensure the universality of the framework and the integrity of the payload, the contents of all fields that contain payloads are spliced.
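As a rough illustration of this step, the sketch below pulls the three general fields and splices the payload-bearing fields from one log entry. It assumes a JSON-lines log, a `timezone` key nested under `geoip`, and the payload field names `body`, `query_string`, and `data`; all of these are hypothetical choices made for the example, not the honeypot's documented schema.

```python
import json

# Hypothetical names of fields that may carry a payload (assumption for this sketch).
PAYLOAD_FIELDS = ("body", "query_string", "data")


def extract_record(log_line: str) -> dict:
    """Return src_ip, timestamp, time zone, and the spliced payload of one entry."""
    entry = json.loads(log_line)
    # Splice every payload-bearing field that is present and non-empty.
    payload = " ".join(
        str(entry[name]) for name in PAYLOAD_FIELDS if entry.get(name)
    )
    geoip = entry.get("geoip") or {}
    return {
        "src_ip": entry.get("src_ip"),
        "timestamp": entry.get("timestamp"),
        "timezone": geoip.get("timezone"),
        "payload": payload,
    }
```

Grouping the resulting records by `src_ip` would then give the per-IP input for the feature groups described next.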

Feature Extraction.
The following subsections detail how GroupTracer converts these fields into four feature groups, namely, the TTP profile, Time, IP, and URL. The src_ip field is considered to be the primary key among all fields because the probability of an IP being used by multiple groups is minimal, even if the individual IP is assigned dynamically.

TTP Profile Feature Group Generation.
The TTP profile consists of the tactics and techniques used by attack actors. A tactic depicts the common strategy of a threat action (e.g., execution and defense evasion). The ATT&CK framework provides 12 categories for the corresponding techniques, and GroupTracer uses the names of these categories as the tactics in TTP profiles. A technique describes how attack actors carry out a specific tactic. For example, defense evasion is a tactic that can be performed by a technique named clear command history.
GroupTracer produces the TTP profile primarily from a command execution perspective. There is a crucial trade-off to be balanced. On the one hand, the classification of commands should be as accurate as possible. On the other hand, command parameter values sometimes affect the classification needlessly. The following two commands illustrate both cases. Both of these commands can be classified as technique file deletion. (i) can be further subdivided into technique clear command history, whereas (ii) can only be classified as technique file deletion. For (i), the whole statement is a valid classification basis, while for (ii), only rm can serve as a valid classification basis, and all other parts are redundant.
To resolve this dilemma, the quadratic mapping method is proposed. Figure 2 illustrates the process of quadratic mapping. There are two knowledge bases in GroupTracer. One saves the mapping of the entire command statement to tactics and techniques for the first mapping, and the other stores the mapping of the abstract command structure to tactics and techniques for the second mapping. In the first mapping, GroupTracer splits the original payload into several commands and looks these commands up in the 1st knowledge base to obtain some of the TTP profiles. As shown in Figure 3, GroupTracer generates the Abstract Syntax Tree [30] for each payload to obtain command nodes and extract abstract command structures. Then, these abstract structures are used as input to the 2nd knowledge base to acquire the remaining TTP profiles. Table 1 shows only some mapping samples due to space limitations. After obtaining the TTP profiles, GroupTracer encodes the corresponding strings into a numeric feature vector. The final data structure can be described as follows: where there are variable-length techniques and tactics.
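The two-stage lookup can be sketched as follows. This is a minimal illustration under stated assumptions: the knowledge-base entries are hypothetical examples rather than the contents of Table 1, commands are split on common shell separators with a regular expression, and the abstract structure is reduced to the command name instead of being read from a real Abstract Syntax Tree.

```python
import re
import shlex

# Hypothetical 1st knowledge base: whole command statement -> (tactic, technique).
KB1 = {
    "rm ~/.bash_history": ("defense evasion", "clear command history"),
}
# Hypothetical 2nd knowledge base: abstract command structure -> (tactic, technique).
KB2 = {
    "rm": ("defense evasion", "file deletion"),
}


def split_commands(payload: str) -> list[str]:
    """Cut a payload into commands on ';', '&&', '||' and '|' (ignores quoting)."""
    parts = re.split(r"\s*(?:;|&&|\|\||\|)\s*", payload)
    return [p.strip() for p in parts if p.strip()]


def abstract_structure(command: str) -> str:
    """Stand-in for the AST step: keep only the command name."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes in attacker input
        tokens = command.split()
    return tokens[0] if tokens else ""


def map_payload(payload: str) -> list[tuple[str, str]]:
    """Return the (tactic, technique) pairs found by the two mappings."""
    profile = []
    for command in split_commands(payload):
        hit = KB1.get(command)  # first mapping: the whole statement
        if hit is None:
            hit = KB2.get(abstract_structure(command))  # second mapping
        if hit is not None and hit not in profile:
            profile.append(hit)
    return profile
```

Trying the exact statement first keeps the finer-grained label when one exists, and falling back to the abstract structure prevents parameter values from blocking a match.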

IP and URL Feature Groups Generation.
The features related to the IP and URL feature groups are shown in Table 2, the first three of which are common to both feature groups, and the last one is unique to the URL feature group. The country can be obtained through Ipdb [31]. VirusTotal [32] provides multiple antivirus scanning engines (e.g., Kaspersky URL advisor, Malware Domain Blocklist [33], and Dr.Web Link Scanner [34]) used for URL scanning. GroupTracer first utilizes VirusTotal to scan unknown IPs/URLs and then regards the number of antivirus engines that return malicious results as the malicious index for those IPs/URLs. According to their purpose, there are seven types of IP addresses, shown in Table 3.
This framework uses the RTBAsia API [35] to get the classification of IP types. Download is an optional feature of the URL feature group, depending on the type of vulnerability in the honeypot, because in some cases the download installs a backdoor for subsequent manipulation, while in others it provides tools for privilege escalation under certain circumstances. When attacking different honeypots, hackers in the same group are likely to download files for different purposes, which would greatly degrade clustering performance if clustering relied only on the downloaded file names. After obtaining these feature groups, GroupTracer encodes the corresponding strings into a numeric feature vector.
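The malicious index and the string encoding can be illustrated with a small sketch. The verdict dictionary below is a simplified, hypothetical shape (engine name mapped to a verdict string), not the actual VirusTotal response format, and the integer encoding by order of first appearance is one possible encoding rather than the one the paper necessarily uses.

```python
def malicious_index(engine_verdicts: dict[str, str]) -> int:
    """Count the engines whose verdict is 'malicious'."""
    return sum(1 for verdict in engine_verdicts.values() if verdict == "malicious")


def encode_labels(values: list[str]) -> list[int]:
    """Map each distinct string to an integer in order of first appearance."""
    codes: dict[str, int] = {}
    return [codes.setdefault(value, len(codes)) for value in values]
```

For example, an IP flagged by two of three engines would receive a malicious index of 2, and a country column such as `["CN", "US", "CN"]` would be encoded as `[0, 1, 0]`.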

Time Feature Group Generation.
As attacker groups tend to use a tool framework to attack specified targets, there is a corresponding regularity in the IP time zone, attack duration, number of attacks, etc. Algorithm 1 describes the generation of the basic time-series features for a given T_ip, which represents the set of timestamps for a specific IP. T is a collection of T_ip used as input to generate the start time, the attack duration, and the number of attacks in each duration. In addition, GroupTracer reuses this code to select the final threshold. When selecting the threshold, we nest a for loop in the outermost layer, in which the threshold value increases by 1. Meanwhile, we count the number of time intervals in each T_ip and assign their sum to the variable num_interval. If num_interval has not changed compared to the previous loop, we jump out of the loop and output the final threshold. The threshold, which indicates how small the time gap between two attacks must be for them to be considered part of the same attack period, is used to divide the attack durations. Partitioning each attack period and characterizing the behavior (e.g., duration and number of attacks) of each attack consistently can be the key to the reliability of time analysis, since the members of a group may be similar in these respects. To select a number as the initial threshold, GroupTracer maps all the time interval values into the following 5 time buckets: <1 s, [1 s, 1 min], (1 min, 1 h], (1 h, 1 day], >1 day. The results show that 99.91% of time gaps fall into the first four buckets, so the initial threshold is set to 1 h. Then, GroupTracer employs Algorithm 1 to accomplish the threshold selection procedure. It first counts the total number of attack durations for each IP and then adjusts the threshold slowly until the number of attack periods for most IPs is almost unchanged. If multiple thresholds give the same result, the smaller one is chosen.
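The segmentation and threshold search described above can be sketched as follows. This is an approximation under stated assumptions rather than a transcription of Algorithm 1: timestamps are numbers in seconds, the gap is measured between consecutive timestamps, the last period is always emitted (including single-event periods), and the names `split_attack_periods` and `select_threshold` as well as the `step` and `max_rounds` parameters are introduced only for this example.

```python
def split_attack_periods(timestamps, threshold):
    """Group the timestamps of one IP into attack periods.

    Two consecutive events belong to the same period when their gap
    is at most `threshold`.
    """
    ts = sorted(timestamps)
    if not ts:
        return []
    periods = []
    start = end = ts[0]
    cnt = 1
    for t in ts[1:]:
        if t - end <= threshold:
            end = t
            cnt += 1
        else:
            periods.append({"start": start, "duration": end - start, "cnt": cnt})
            start = end = t
            cnt = 1
    periods.append({"start": start, "duration": end - start, "cnt": cnt})
    return periods


def select_threshold(timestamps_by_ip, initial, step=1, max_rounds=1000):
    """Raise the threshold until the total number of periods stops changing."""
    threshold = initial
    previous = None
    for _ in range(max_rounds):
        num_interval = sum(
            len(split_attack_periods(ts, threshold))
            for ts in timestamps_by_ip.values()
        )
        if num_interval == previous:
            # Both thresholds give the same result, so keep the smaller one.
            return threshold - step
        previous = num_interval
        threshold += step
    return threshold - step  # last threshold that was actually evaluated
```

The start time, duration, and count of each returned period are the basic time-series features from which the statistical features are later derived.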
After we get the final threshold and the basic time-series features, namely, the start time, the attack duration, and the number of attacks in each attack duration, we extract the statistical features from them. GroupTracer draws a new time series for the relevant features of each IP, taking the start time as the independent variable and the attack duration and number of accesses as the dependent variables. Several features that proved at an early stage to be significant time-series characteristics, shown in Table 4, are applied. After generating the time series, GroupTracer automatically calculates these selected features to produce the final feature vectors.
By encoding the timezone field appearing in the data, GroupTracer turns the string-type features into integer vectors. Eventually, the statistical features from the timestamps and the encoded time zone constitute the Time feature group.

Group Clustering Algorithm.
As hacker groups tend to utilize customized frameworks, features like time gap and TTP profiles have specific patterns. Therefore, attacker behavior naturally forms clusters. This paper aims at identifying such natural clusters to dig out potential groups behind active campaigns.
We employ the well-known hierarchical clustering algorithm [36] as it captures the hierarchical structure between clusters, which helps security experts to observe the relationship between clusters and subclusters. Moreover, hierarchical clustering is suitable for arbitrary shape clustering and is insensitive to the input order of samples. Figure 4 depicts a bottom-up approach to perform hierarchical clustering. Each sample represents a unique cluster in the beginning, and then we choose Euclidean distance to calculate the similarity between each pair of clusters and merge these clusters successively. The threshold applies when forming the final flat clusters. The Euclidean distance is computed as follows [37]:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ are two points in Euclidean n-space.
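A bottom-up merge with a distance threshold can be sketched in a few lines. The paper does not state which linkage criterion is used, so the average linkage below is an assumption, and the naive search over all cluster pairs is chosen for readability rather than speed; a library implementation would normally be preferred for real data.

```python
import math
from itertools import combinations


def cluster_distance(a, b, points):
    """Average-linkage distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(points, threshold):
    """Merge the two closest clusters until the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        (i, j), dist = min(
            (
                ((i, j), cluster_distance(clusters[i], clusters[j], points))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break  # remaining clusters are the final flat clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters
```

Each returned cluster is a list of row indices into `points`, so the flat clusters can be mapped back to source IPs through the order of the feature matrix.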

Attack Tree Creation Method.
After digging out potential groups, GroupTracer gathers all payloads and generates attack trees for each cluster to embody and better understand group behaviors. Algorithm 2 processes all payloads from a cluster. P_T represents all payloads collected from a given cluster T.
In the directed graph, nodes are command names and edges represent the command sequence between two commands. The out-degree of each command determines the size of each node. If a command exists only at the end of the payload, its size can also depend on the in-degree. The width of each edge is decided by the weight, which describes the frequency of occurrence of the command sequence. For instance, we have some payloads from a cluster, whose command sequences shown in Table 5 are obtained after command segmentation and abstraction. GroupTracer runs Algorithm 2 to generate the attack tree, as illustrated in Figure 5. Lines 3-13 generate a list that stores two-step command sequences. For example, it turns cd chmod ./t into [cd chmod, chmod ./t]. Lines 14-16 count the number of occurrences of all two-step command sequences.
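The two-step sequence extraction and edge counting can be sketched as follows. This is an illustrative version, not Algorithm 2 itself: payloads are assumed to be already segmented into lists of abstracted commands, and node size is taken to be the weighted out-degree with a fallback to the weighted in-degree, which is one reading of the sizing rule described above.

```python
from collections import Counter


def two_step_sequences(commands):
    """Turn ['cd', 'chmod', './t'] into [('cd', 'chmod'), ('chmod', './t')]."""
    return list(zip(commands, commands[1:]))


def build_attack_tree(payloads):
    """Return (edge weights, node sizes) for the payloads of one cluster."""
    edges = Counter()
    for commands in payloads:
        edges.update(two_step_sequences(commands))
    out_degree = Counter()
    in_degree = Counter()
    for (src, dst), weight in edges.items():
        out_degree[src] += weight
        in_degree[dst] += weight
    nodes = set(out_degree) | set(in_degree)
    # Commands that only ever end a payload have no outgoing edge,
    # so their size falls back to the in-degree.
    sizes = {node: out_degree[node] or in_degree[node] for node in nodes}
    return edges, sizes
```

The edge counter gives the edge widths directly, and the size dictionary can be passed to any graph-drawing tool to render the tree.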

Dataset.
In this section, we first describe the datasets obtained from two kinds of IoT honeypots: the UPnP-SOAP multiport honeypot and the Netis router backdoor honeypot. The time of data collection, the number of IPs, and the number of log entries are shown in Table 6.

UPnP-SOAP Multiport Honeypot.
In the UPnP service, the SOAP protocol assists in defining device types and other related information [38]. Therefore, there are a huge number of IoT devices that provide SOAP services, some of which do not require authentication. The honeypot simulates the behavior of the 11 most frequently scanned ports (e.g., 52881, 5500, and 2048) and the relevant SOAP service paths and returns the corresponding information after being scanned. Six nodes are deployed on the Internet, averaging about 700 log entries per day. The dataset contains 153,413 log entries from 2,652 IPs over 196 days in 2019 (Table 6). Each log entry is identified by src_ip, timestamp, timezone, body, and query string. The src_ip in our datasets is globally unique.

Netis Router Backdoor Honeypot.
The Netis router listens on port 53413 (UDP) by default. After sending a specific string to it, the attacker can gain root login and then execute the corresponding commands to perform a series of malicious behaviors. The honeypot has seven nodes deployed on the Internet, averaging about 300 log entries per day. Our Netis dataset contains 241,593 log entries from only 373 IPs over 279 days in 2019 (Table 6). Unlike the UPnP-SOAP honeypot, the log entry in Netis is characterized by src_ip, timestamp, timezone, and data.
Given that both types of honeypots simulate command execution vulnerabilities, we can leverage the commands executed by attackers to discover their techniques and tactics. Therefore, GroupTracer can be utilized to analyze the data of these two honeypots at the same time. Our datasets contain ten types of techniques that can be grouped into six tactics. These tactics are as follows [39]:
(iv) Execution: running malicious code. The purpose of adversary-controlled code might be communicating with C&C servers or stealing data.
(v) Impact: expanding the impact of the intrusion on the victim system.
(vi) Collection: gathering information related to the adversary's objectives.
Table 7 shows the techniques corresponding to each tactic and the commands mapped to these techniques in our datasets.

ALGORITHM 1: Basic time-series feature generation for a given T_ip.
(7)          if delta ≤ threshold then
(8)              end = T_ip[i + 1]
(9)              cnt = cnt + 1
(10)         else
(11)             unit = {'start': start, 'duration': end − start, 'cnt': cnt}
(12)             time_interval.append(unit)
(13)             start = T_ip[i + 1]
(14)             end = start
(15)             cnt = 1
(16)         end if
(17)     end for
(18)     if start ≠ end then
(19)         unit = {'start': start, 'duration': end − start, 'cnt': cnt}
(20)         time_interval.append(unit)
(21)     end if
(22)     time_intervals.append(time_interval)
(23) end for

Clustering Performance Evaluation.
After clustering, we need measurement indicators to evaluate the effect. In general, the measurement of the quality of a clustering algorithm can be categorized into two kinds of criteria [40]: internal validation and external validation. External criteria are based on previous knowledge about the data and require that ground truth labels are known. However, the labels of samples in this paper are not available. Thus, internal validation is more suitable for our evaluation. More specifically, three internal indexes are utilized in this evaluation. These indexes measure whether clusters are compact and well separated.
(i) Calinski-Harabasz (CH): The larger value of CH indicates a better clustering solution.
(ii) Silhouette Coefficient [42]: The Silhouette Coefficient s is composed of two scores: a is the mean distance between a sample and all other samples in the same cluster, and b is the mean distance between a sample and all samples in the next nearest cluster. For a single sample i, s can be computed by

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

and s for all samples is given as the mean of the Silhouette Coefficient of each sample. The Silhouette Coefficient index ranges from −1 to 1; −1 represents a weak clustering effect, and 1 means a good clustering effect. 0 indicates overlapping clusters. A higher s indicates a better clustering quality.
(iii) Davies-Bouldin (DB) [43]: DB can be measured as follows:

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{d(X_i) + d(X_j)}{d(k_i, k_j)}$$

where k denotes the number of clusters, and i and j represent different cluster labels (j ≠ i). $d(X_i)$ and $d(X_j)$ are the average distances from all samples in clusters i and j to their respective cluster centroids, and $d(k_i, k_j)$ is the distance between these centroids. A smaller DB value means a better clustering result.
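The Silhouette Coefficient described above can be computed directly from its definition for a small labeled sample. The sketch below is a plain-Python illustration using Euclidean distance; it assigns a score of 0 to samples in singleton clusters and when only one cluster exists, which is a common convention adopted here as an assumption rather than something the paper specifies.

```python
import math


def silhouette_samples(points, labels):
    """Return the Silhouette Coefficient of every sample."""
    scores = []
    for i, p in enumerate(points):
        # Distances from sample i to every other sample, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or a single cluster overall
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def silhouette_score(points, labels):
    """Mean Silhouette Coefficient over all samples."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)
```

Two tight, well-separated clusters give a score close to 1, whereas swapping labels across the two clusters drives the score negative.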

Experiment Design.
In the following, we evaluate GroupTracer by examining the cluster quality, i.e., how well clusters capture similar attack actors. e primary evaluation is for the group clustering of GroupTracer. We compare the evaluation results of GroupTracer based on the three indicators we mentioned above with the performance of group clustering based on the other two baseline algorithms to conclude that GroupTracer has excellent performance.

Comparison Baselines.
Meanshift [44] and K-means [45] clustering algorithms are chosen to be the baselines.
Meanshift is a centroid-based algorithm that requires an iterative step. It continuously calculates the expected moving distance of the center point and moves until the final condition is reached. For a given sample set, K-means divides it into K clusters, minimizing a criterion (e.g., within-cluster sum-of-squares). K is a positive integer and must be predefined. We first extract the required features according to Section 3.2 and convert them into usable feature matrices. Before performing the final group clustering, we normalize the feature matrices. The reason is that clustering algorithms all use a distance measure to determine if object i is more likely to belong to the same cluster as object j than to the same cluster as object k. These distance measures are affected by the scale of the variables. By putting all variables into the same range, we can weigh all variables equally, especially when the feature vectors are generated differently. The main idea of normalization is to calculate the p-norm of each sample and then divide each element in the sample by the norm. The result of this process is that the p-norm of each processed sample is equal to 1. In the experiment, p is set to 2, and the L2-norm of a vector $x = (x_1, \ldots, x_n)$ can be defined as [37]

$$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

For the purpose of eliminating the influence of irrelevant variables as much as possible, we leverage all data in the dataset for the experiments of each algorithm. For GroupTracer, we iterate the threshold value to pick the threshold with the highest clustering quality. The number of iterations is 20.
Then we generate multiple versions of K-means clustering to attain the best clustering solution (Calinski-Harabasz index). Meanwhile, we run the Meanshift algorithm with varying window sizes to get the best effect.
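The L2 normalization applied to the feature matrices before clustering can be sketched as follows. The function name `l2_normalize` is introduced for this example, and leaving all-zero rows unchanged is an assumption made to avoid division by zero.

```python
import math


def l2_normalize(rows):
    """Scale every sample (row) so that its L2-norm equals 1."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # An all-zero row has no direction, so it is returned unchanged.
        normalized.append([x / norm for x in row] if norm else list(row))
    return normalized
```

After this step every non-zero sample lies on the unit sphere, so differences in how the feature groups were generated no longer dominate the Euclidean distances.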

Experiment Result and Discussion.
The evaluations of GroupTracer and the two comparison baselines using the metrics mentioned above are shown in Table 8. When running each algorithm, changing the independent variables (e.g., threshold and quantile) brings the most significant change to the Calinski-Harabasz index, so this index is considered the primary reference index when evaluating each algorithm. As a result, the Calinski-Harabasz index of GroupTracer reaches 3416.93 when the threshold is set to 10, which is about three times that of the K-means model and 1.5 times that of the Meanshift model. Taking the Silhouette Coefficient index as the definitive reference, the performance of GroupTracer is still the best. Although our algorithm's performance is not the best when the Davies-Bouldin index is taken into account, the gap is within the acceptable range. GroupTracer generates 4 clusters, while the K-means model generates 3 clusters. In the same way, the Meanshift model only generates 2 clusters for all data. The evaluation of GroupTracer using the three metrics is illustrated in Figure 6, where the value of the threshold ranges from 1 to 20. The Silhouette Coefficient index is on the rise until the threshold reaches 9. When the threshold value ranges from 10 to 17, the index stabilizes at a relatively high level in Figure 6(a).
The Davies-Bouldin index shows a downward trend as a whole until the threshold reaches 10, where it starts to stabilize at the lowest point (0.7367). After that, the index starts to rise rapidly in Figure 6(b). When the threshold value is 10-17, the Calinski-Harabasz index is maintained at the highest level (3416.93) in Figure 6(c), and the number of clusters is also kept at 4. In summary, all three indicators show that when the number of clusters is 4, the cluster quality becomes the highest and reaches a stable state.
In Figure 7, when the CH index reaches the highest point, the Silhouette Coefficient and the DB index both take their worst values. Similarly, in Figure 8, when the CH index reaches the highest level, the DB index is in a lower position. It can be seen in these two figures that the changing trends of these indicators do not match, which means that the two baselines are not well suited to our datasets.

Limitation and Future Work
Our research proposes a framework that can dig out potential groups behind active campaigns. This new technique can make full use of information from attack campaigns. However, our current design is preliminary. We only focus on the IPs used by the potential groups and do not go any further to track which specific groups were involved, that is, we do not try to match them to real groups [46]. Moreover, GroupTracer can only deal with honeypot logs that contain attack payloads.
In future work, we expect to combine NLP techniques with cyber threat intelligence to precisely match these potential groups to real-world APT groups, for which the attack tree may be helpful in extracting a group profile. Moreover, an experiment to prove the effectiveness of the attack tree should also be carried out. Further, we expect to apply more data sources, such as system logs and network traffic, to expand our knowledge base and perform more comprehensive analysis.

Conclusion
In this work, we propose GroupTracer, a framework for clustering attack actors from IoT honeypot logs. GroupTracer is aimed at extracting the TTP profile automatically and digging out potential groups behind active campaigns. By mapping payloads to the ATT&CK framework, GroupTracer can effectively extract structured TTP profiles using two knowledge bases. Besides, this framework leverages four feature groups (namely, Time, TTPs, IP, and URL), a total of 18 characteristics, derived from log entries to capture the natural hierarchical structures of attacker groups. Finally, GroupTracer constructs attack trees for each cluster to embody the group actions. In the experiment, we compare our algorithm with two baseline algorithms. The evaluation of 395,006 log entries from 3,025 IPs reveals the high performance of GroupTracer, in which the Calinski-Harabasz index reaches 3416.93. Moreover, our proposed framework is generalizable as it works from a log accounting perspective, so its application is not limited to the IoT honeypot.

Data Availability
The research data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.