LogPal: A Generic Anomaly Detection Scheme of Heterogeneous Logs for Network Systems

As a key resource for diagnosing and identifying problems, network syslog contains vast quantities of information and is the main source of data for system anomaly detection. Syslog is large in scale, diverse in type and source, noisy, and quick to evolve, which makes existing detection methods insufficiently generic. To effectively address the problem of log anomaly labelling caused by massive heterogeneous logs, we propose LogPal, a generic anomaly detection scheme of heterogeneous logs for network systems, which combines template sequences and raw log sequences to construct log pattern events. By improving the self-attention mechanism of the transformer, LogPal synthesizes self-attention and handles the two parts of each log pattern event differently. The model makes full use of log template and sequence semantic information by automatically becoming aware of the pattern of logs. We conducted experiments to evaluate the performance of LogPal on publicly available datasets, and the results show that LogPal automatically adapts to log type changes and improves precision, recall, and F1 score to 99%.


Introduction
When a system is running, syslog records its runtime state and events, including anomalies. As the most reliable source of information for monitoring the health of a system, syslog contains massive amounts of information and is the main source of data for anomaly detection in the system [1]. For traditional standalone systems, developers write specific rules based on domain knowledge or manually check logs to detect system anomalies.
However, modern information systems usually adopt a distributed architecture, so syslog is multisourced and heterogeneous. It usually originates from multiple subsystems with various types, structures, implementations, versions, and deployment environments [2, 3]. Anomaly detection that relies heavily on manual log checking is almost unworkable for large-scale systems. Moreover, developers usually use free text to record system time for convenience and flexibility. Examples of heterogeneous logs are shown in Table 1.
More importantly, like any other software artifact under maintenance, syslog is constantly evolving. Developers may frequently modify the source code, including logging statements. This can create new log patterns that have not appeared before and affect the results of anomaly detection. As Kabinna et al. [4] observed in their research project, about 20%∼45% of the logging statements changed during their lifecycle. Many new log events and log sequences are generated by dynamic logging statements.
Therefore, many automated log-based anomaly detection methods have been proposed in recent years, and these methods are mainly classified into unsupervised learning and supervised learning. Unsupervised learning methods usually use machine learning techniques such as clustering and PCA [5-8], but they tend to be less accurate than supervised learning methods. Supervised learning methods generally learn the anomaly patterns of logs from anomaly labels, and they usually use deep learning models such as LSTM and CNN [9-12]. Although some of the above methods can effectively detect anomalies, log sequence anomaly detection faces the following challenges:

(1) It is rather difficult to achieve a balance between learning log templates and raw log semantic information. Thanks to the rapid development of natural language processing and deep learning, some methods build log anomaly detection models directly on raw log sequences when solving the heterogeneous log anomaly labelling problem. These methods hardly parse the raw log sequences, making it difficult for models to fully learn log word vector semantics or patterns. Other approaches parse the raw logs by extracting log sequence templates and use the log templates as input to build template-based anomaly detection network models. However, these approaches simply use log template sequences as training data to obtain word vectors for the templates, ignoring the key textual information specific to the raw logs, which can lead to more serious consequences. For example, removing the critical variable part can map two or more normal and anomalous log sequences to the same template, so the model "considers" log sequences with different labels as the same input, which is fatal for anomaly detection. Therefore, how to make the model understand the log patterns more easily while retaining all the information of log semantics becomes one of the key issues for log-based anomaly detection.
(2) There is a large amount of noise in log data. A certain level of noise is inevitably introduced in the collection and preprocessing of log data [13]. Log data are derived from various events that occur on distributed hardware and software systems. These events include both events that characterize the system as anomalous, such as DDoS attacks, storage failures, anomalous system behavior, and network jitter, and events that characterize the system as normal, such as successful ping sessions, successful subsystem startups, and file reads and writes. Since logs are usually generated by multiple processes or threads of the system, a log sequence often contains both normal and anomalous logs. As a result, an anomalous log sequence is often interspersed with one or more normal logs, presenting a significant challenge for log sequence anomaly detection. In addition, in large-scale systems, many logs are generated individually by geographically distributed components and then uploaded to a centralized location for further analysis. This collection process can lead to missing, duplicated, or disordered log sequences (e.g., due to network errors, limited system throughput, or storage issues) [14]. A McKinsey network survey [15] found that 80% to 98% of logs are just noise, which makes processing and analyzing log data tricky. Noise in log data hinders the effectiveness of existing log-based anomaly detection methods.

(3) Precision and recall are still difficult to balance. In anomaly detection based on heterogeneous logs, precision refers to the proportion of truly anomalous logs among those predicted to be anomalous, and recall refers to the proportion of logs predicted to be anomalous among all truly anomalous logs. The two typically trade off against each other: as one rises, the other falls, so it is an uphill battle to achieve both high precision and high recall. A system can generate hundreds of millions of logs in just a few months, among which the anomalous logs can reach hundreds of thousands; even a 1% error in precision may produce thousands of false positives, which is a great burden for operations staff. Likewise, a 1% error in recall means that thousands of anomalous logs are ignored, some of which may be caused by fatal failures and lead to serious losses. How to balance and improve the two is one of the most important challenges for researchers today.
To solve the above key challenges, we propose a generic anomaly detection mechanism for heterogeneous logs, called LogPal. LogPal filters the raw system logs, uses the FT-tree method to parse the log templates, and then splices the templates with the raw logs to generate log pattern events, thus realizing the automatic parsing of heterogeneous logs. Moreover, based on the semantic similarity of the anomalous sequences of heterogeneous logs, we combine natural language processing and deep learning methods to improve the transformer model so that it learns log patterns more adaptively and effectively for anomaly detection of heterogeneous logs. The contributions of this study can be summarized as follows:

(1) To address the difficult problem of balancing log templates and all semantic information of the raw logs, LogPal splices the parsed templates with the filtered raw logs to generate log pattern events.

(2) Because a log pattern event consists of two different parts, after the pattern events are embedded into log pattern vectors, a synthetic attention approach is used to improve the transformer model so that it processes the two parts differently, building a pattern-aware learning model for heterogeneous logs.

(3) To address the large amount of noise present in log sequences, the synthetic attention part balances the model's capability and computational complexity through the relative deviations of different tokens. Each input token attends to the tokens around it in a fine-to-coarse fashion, coarsening tokens at larger deviations, as a way to reduce or even ignore noise in the log sequence.
The rest of the study is organized as follows. Section 2 analyzes the work related to log-based anomaly detection. Section 3 introduces the framework of LogPal and the workflow of log parsing and anomaly detection in detail. Section 4 describes the experimental environment and datasets, evaluation indicators, experimental results, and the corresponding analysis. Section 5 concludes the study and looks forward to future work.

Related Work
Traditional machine learning approaches play an influential role in log anomaly detection. For example, Bodik et al. [16] use regression-based analysis techniques to automatically classify and identify performance crises by constructing a new representation of data center state, called a fingerprint, which is built by statistical selection and summarization of the hundreds of performance metrics typically collected on such systems. It can be used to detect specific performance crises that have been seen before but has limited effect on new, unseen performance crises.
Chen et al. [17] proposed a decision tree learning method to diagnose failures in large Internet sites, which is the first application of decision trees to anomaly detection. The method records the runtime attributes of each request and applies automated machine learning and data mining techniques to determine the cause of failure. The algorithm successfully identified 13 of the 14 true causes of failure, achieving a 93% identification rate.
Although effective, traditional machine learning methods often require manual extraction of features from the raw logs, and the model output depends heavily on the extracted features. In addition, traditional machine learning methods cannot effectively address the heterogeneity and evolution of logs, which limits the accuracy of anomaly detection based on them. With the rapid development of deep learning and natural language processing, research has focused on the application of sequence-based models [9-12, 18-21]. Du et al. [9] designed the DeepLog framework using LSTM neural networks to realize online anomaly detection on system logs. DeepLog uses not only log keys but also metric values in log entries to detect anomalies, and it relies only on a small training dataset consisting of "normal log entries." The LogMerge anomaly detection method proposed by Zhang et al. [13] combines LSTM and CNN methods to effectively extract the backward and forward dependencies of log sequences while significantly reducing the impact of noise in log sequences. LogMerge learns the semantic similarity of multisyntax logs, which enables the migration of log anomaly patterns across log types and greatly reduces the anomaly annotation overhead. LSTM with an attention mechanism has also been used to improve the performance of complex sequence modeling tasks, as in the anomaly detection method LogRobust proposed by Zhang et al. [14]. LogRobust extracts semantic information of log events and represents them as semantic vectors. Then, it detects anomalies using an attention-based bi-LSTM model that captures contextual information in log sequences and automatically learns the importance of different log events. In this way, LogRobust can identify and handle unstable log events and sequences and is robust to unstable log data, but when the log sequences span a long range and the network is deep, the computational cost increases greatly. These works explore log sequence anomaly detection with LSTM, but further improvements are needed in detection accuracy and computational overhead.
Transformer [22] is a state-of-the-art NLP architecture based on self-attention. It removes the limitation that LSTM models cannot be computed in parallel, and its self-attention mechanism is more interpretable and has achieved many impressive results on natural language processing tasks. In recent years, more and more researchers have applied this model to the field of log anomaly detection. For example, Nedelkoski et al. [18] proposed Logsy, a classification-based method that learns log representations to distinguish between normal system log data and anomaly samples from auxiliary log datasets, which are easily accessible via the Internet. The idea behind Logsy is that the auxiliary dataset is sufficiently informative to enhance the representation of the normal data, yet diverse enough to regularize against overfitting and improve generalization. Steverson et al. [19] detect attacks on an enterprise network by applying NLP techniques to Windows Event Logs (WELs), using transformer models and self-supervised training methods. They construct a self-supervised anomaly detection model by combining deep learning methods, traditional machine learning, and natural language processing. The model filters each log into a series of words with a few simple steps. However, the model does not perceive templates in its input and generalizes poorly to unseen logs of the same template; in addition, the simple filtering of logs makes it difficult to eliminate the effect of log noise and may even make the log data noisier. Le and Zhang [23] proposed NeuralLog, a novel log-based anomaly detection approach that does not require log parsing. NeuralLog extracts the semantics from raw log sequences and represents them as semantic vectors. These representation vectors are then used to detect anomalies using a transformer-based classification model.
There are other deep learning methods for log anomaly detection. Qi et al. [24] proposed a novel log-based anomaly detection method called Adanomaly, which uses the BiGAN model for feature extraction and an ensemble approach for anomaly detection. Han et al. [25] proposed a data augmentation strategy that generates a set of anomalous sequences by negative sampling so that practitioners can use the observed normal sequences and the generated anomalous sequences to train a binary classification model.

Classification-Based Log Anomaly Detection
3.1. Framework. To address the challenges brought by the heterogeneity, evolution, and data noise of logs, we propose LogPal for generic anomaly detection on heterogeneous logs under massive noise. LogPal automatically parses heterogeneous logs and improves the accuracy of syslog anomaly detection by combining the templates with the raw logs to obtain the final log pattern events, and it senses the log patterns through an improved transformer model to achieve anomaly detection. This section describes the overall framework of LogPal and the details of each part.
Figure 1 shows the overall framework of LogPal, which is divided into two modules: the offline training module and the online detection module. In the offline training module, LogPal first uses the FT-tree method to extract templates from the raw logs, combines the templates with the raw logs to form new log pattern events, and constructs pattern vectors based on the log pattern events. LogPal inputs the pattern vectors into the synthetic attention transformer deep neural network and trains a general anomaly detection model for heterogeneous logs. In the online detection module, LogPal maps online log sequences to pattern vectors in the same way, judges whether an online log sequence is anomalous according to the trained anomaly detection model, and generates an alarm if it is.

Pattern Vector Construction.
Syslog is usually unstructured natural language text written by different developers and often needs to be parsed by log parsers before it can be effectively used for anomaly detection based on machine learning, deep learning, and other methods. Currently, it is common practice to parse syslog by extracting templates from it. A template is usually the invariant part of a log that represents the general type and meaning of the event expressed by the log sequence, and similar log sequences can be represented by the same template; e.g., "* * startup succeeded" is a template for "syslog: klogd startup succeeded." Compared with the raw log, the template removes the variable part "syslog: klogd" and keeps the main part of the event, i.e., "a process or port started successfully." This template can represent not only the log sequence "syslog: klogd startup succeeded" but also other log sequences that describe the same event, such as "syslog: syslogd startup succeeded."
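The way one template covers several raw logs can be illustrated with a minimal sketch. It assumes whitespace tokenization and that each "*" stands for exactly one token; the function name `matches_template` is hypothetical and is not part of any log parser described in this paper.

```python
def matches_template(template: str, log_line: str) -> bool:
    """Return True if every non-wildcard template token equals the log token."""
    t_tokens = template.split()
    l_tokens = log_line.split()
    if len(t_tokens) != len(l_tokens):
        return False
    # "*" is a wildcard for one variable token; all other tokens must match.
    return all(t == "*" or t == l for t, l in zip(t_tokens, l_tokens))


template = "* * startup succeeded"
print(matches_template(template, "syslog: klogd startup succeeded"))
print(matches_template(template, "syslog: syslogd startup succeeded"))
print(matches_template(template, "syslog: klogd startup failed"))
```

Both "startup succeeded" logs from the text match the template, while a log whose fixed part differs does not.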
We use the FT-tree template parser [26] for template extraction. FT-tree is an extended prefix tree structure based on the idea that the fixed part of a log sequence is usually the longest combination of frequently occurring words. Therefore, extracting templates is equivalent to identifying the longest combination of frequently occurring words in the logs. Numerous experiments based on production environment logs show that FT-tree supports incremental learning with high accuracy and high template matching efficiency. However, simply taking log template sequences as training data and constructing template vectors from them, although effective, ignores key textual information peculiar to the raw logs: once the critical variable parts are removed, two or more normal and anomalous log sequences can produce the same template. This makes the model "think" of log sequences with different labels as the same input, which is fatal for the log anomaly detection model.
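The idea that frequently occurring words form the fixed part of a log can be sketched as follows. This is a deliberately simplified stand-in, not FT-tree itself: it builds no prefix tree and simply masks words that occur fewer than `min_count` times. The function name `extract_templates` and the threshold are assumptions made for illustration.

```python
from collections import Counter


def extract_templates(logs, min_count=2):
    """Mask every word seen fewer than min_count times with '*'."""
    # Word frequencies over the whole corpus approximate "frequently occurring words".
    freq = Counter(word for line in logs for word in line.split())
    templates = []
    for line in logs:
        masked = [w if freq[w] >= min_count else "*" for w in line.split()]
        templates.append(" ".join(masked))
    return templates


logs = [
    "syslog: klogd startup succeeded",
    "syslog: syslogd startup succeeded",
]
print(extract_templates(logs))
```

On this two-line corpus both logs collapse to the same template because only the process name differs. The same collapse is what hides the difference between a normal and an anomalous log when the distinguishing word happens to be a rare, variable one.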
In the end, we adopt the frequently used text preprocessing library torchtext to filter the abundant numbers and special-character noise in the raw log sequences and apply case conversion, and then use FT-tree for template extraction. The extracted log templates are encoded as natural numbers from 1 to n, and each number represents one template type. At this point, the raw log sequences have been transformed into template tag sequences; finally, new textual token sequences are generated by combining these tags with the raw syslog. A combined pattern event is composed of two parts (the template number and the filtered syslog). The new textual token sequences not only abstract the main part of each log sequence but also fully retain all the key information of the variable part. In addition, to preserve the semantics of the two parts of log pattern events and reduce or even eliminate the impact of log heterogeneity on anomaly detection, LogPal uses all log pattern event tokens (template numbers arranged before syslog sequences) as training data to obtain word vectors of template words and raw syslog sequences and constructs pattern vectors based on them. GloVe [27] integrates latent semantic analysis based on singular value decomposition and the word2vec algorithm by introducing a co-occurrence probability matrix, which uses both global statistical features of the corpus and local context features. GloVe uses the lexical co-occurrence statistics to weight the terms in the objective function J, which is specified as follows:

$$J = \sum_{i,j=1}^{N} f(X_{ij}) \left( v_i^{\top} v_j + b_i + b_j - \log X_{ij} \right)^2, \tag{1}$$

where $v_i$ and $v_j$ are the word vectors of words i and j, $b_i$ and $b_j$ are two bias terms, $X_{ij}$ is the number of co-occurrences of words i and j, f is the weight function, and N is the size of the vocabulary (the co-occurrence matrix has dimension N × N). The pattern vectors of log pattern events can be obtained by using GloVe. Figure 2 shows the process of transforming the raw logs into new pattern vectors.
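The construction of a pattern event (template number placed before the filtered log tokens) can be sketched as below. This is a minimal illustration under stated assumptions: filtering is done with the standard `re` module rather than torchtext, the templates are supplied by hand rather than produced by FT-tree, the third log line is invented for the example, and the helper names `filter_log` and `build_pattern_events` are hypothetical.

```python
import re


def filter_log(raw: str) -> str:
    """Lowercase the log and drop digits and special characters."""
    text = raw.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def build_pattern_events(raw_logs, templates):
    """Prefix each filtered log with the 1-based number of its template."""
    template_ids = {}
    events = []
    for raw, template in zip(raw_logs, templates):
        # Templates are numbered 1..n in order of first appearance.
        number = template_ids.setdefault(template, len(template_ids) + 1)
        events.append([str(number)] + filter_log(raw).split())
    return events


raw_logs = [
    "syslog: klogd startup succeeded",
    "syslog: syslogd startup succeeded",
    "sshd[2211]: Connection closed by 10.0.0.5",  # made-up log line
]
templates = [
    "* * startup succeeded",
    "* * startup succeeded",
    "* connection closed by *",
]
for event in build_pattern_events(raw_logs, templates):
    print(event)
```

The first two events share template number 1 but keep their distinct process names, so the variable part that a template-only representation would discard remains available to the embedding step.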

Synthetic Attention Transformer.
LogPal is modeled by a transformer encoder with multihead attention and takes the constructed log pattern vectors as input. This input differs from that of a traditional transformer in that each pattern vector contains two parts of tokens (the template number and the filtered real log). Therefore, an improved transformer with synthetic attention is designed to learn the constructed log pattern vectors more efficiently. Synthetic attention is represented by a synthetic attention matrix, which is divided into global attention and sparse attention. Global attention is applied to the log template, and sparse attention is applied to the log sequence. The log template attends to every token of the log pattern, including the log template itself, because the template alone can sometimes directly determine the anomaly.
However, not every token needs a full contextual representation. In the typical self-attention mechanism, every token attends to all other tokens; however, for a trained transformer, the learned attention matrix is usually very sparse at most data points. Therefore, the computational complexity can be reduced by incorporating structural biases that limit the number of query-key pairs per query. For a given input token, we group its context into nonoverlapping spans of different sizes, where the size of a span increases with its relative distance. That is, the input token attends to each token, processing the spans at different distances from it in a fine-to-coarse fashion. To obtain the synthetic attention matrix, the template token attention and the sparse log token attention are constructed successively.

Template Global Attention.
Global attention is used for the template token of the constructed log vector; "global" means that the template token attends to all other tokens and all other tokens attend to it. The attention formula is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V, \tag{2}$$

where Attention(Q, K, V) is the value of attention, Q, K, and V are the query vector matrix, key vector matrix, and value vector matrix, respectively, and $d_k$ is the dimension of the key vectors. Every row of these three matrices represents a vector corresponding to a token, and we need to calculate a Score matrix for the template vector before calculating the template attention:

$$\mathrm{Score} = \frac{QK^{\top}}{\sqrt{d_k}}.$$

Templates are very important for anomaly detection, so global attention is applied to templates. That is, only the attention between the template token and other tokens and the attention of other tokens to the template token are calculated. Figure 3 illustrates this process.
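The global attention pattern for the template token can be sketched in plain Python. The sketch assumes the template token sits at index 0 and uses standard scaled dot-product attention with a boolean mask; the function names `global_attention_mask` and `masked_attention` are hypothetical, and this is not the authors' implementation.

```python
import math


def global_attention_mask(n_tokens, template_index=0):
    """mask[i][j] is True when query i may attend key j under global attention."""
    return [[i == template_index or j == template_index for j in range(n_tokens)]
            for i in range(n_tokens)]


def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention restricted to the pairs allowed by mask."""
    d_k = len(K[0])
    output = []
    for i, q in enumerate(Q):
        # Disallowed pairs get a score of -inf, so their softmax weight is 0.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
                  if mask[i][j] else -math.inf
                  for j, k in enumerate(K)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        output.append([sum(w / total * v[c] for w, v in zip(weights, V))
                       for c in range(len(V[0]))])
    return output


Q = K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = global_attention_mask(3)
print(mask)
print(masked_attention(Q, K, V, mask))
```

Under this mask alone, the template row attends to every token, while each log token attends only to the template; the attention among log tokens is supplied separately by the sparse part of the synthetic attention matrix.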

Log Sparse Attention.
Unlike templates, each token of a log sequence is processed from the center toward both ends. Every token pays more attention to the log sequence tokens that are closer to itself and less attention to more distant tokens, and this sparsity significantly decreases the number of parameters in subsequent computation. The following Q matrix, K matrix, and Score matrix describe only the raw log sequence tokens, without the template tokens.
In the K matrix, we take a heterogeneous log sequence with m tokens as an example. For a given token i, the distance deviation of every raw log token j (without considering the template token) from token i is calculated as follows:

$$\mathrm{Deviation}_j = |i - j|, \quad j = 0, 1, \ldots, m - 1,$$

and then we input the m deviations into the minimum heap:

$$\mathrm{MinHeap}_{\mathrm{Deviation}} = \{\mathrm{Deviation}_0, \mathrm{Deviation}_1, \ldots, \mathrm{Deviation}_{m-1}\}.$$

By inputting the deviations into the minimum heap, we ensure that the next selected token has the minimum deviation from token i. LogPal then selects several groups of tokens from the minimum heap; the numbers of tokens in the groups are $2^0, 2^1, 2^2, \ldots$, for a total of N tokens. The vectors in each group are reduced to their maximum value, and this maximum vector is used to calculate the Score vector. Each token i is then processed in sequence to obtain the sparse matrix $W_{\mathrm{Score}}$. The entry in the ith row and jth column of the raw log vectors (excluding the template vector) is the sparse attention value of token i with itself and with other tokens, and it is finally passed through equation (2) to obtain the fine-to-coarse sparse attention matrix. This is based on the assumption that, for any token, the nearest tokens require more attention, while distant tokens have little impact on it. This reduces the effect of noise from distant tokens and also decreases the number of parameters in subsequent calculations. Finally, the sparse attention matrix obtained in this second part is spliced to the lower right of the template global attention matrix.
Figure 4 illustrates the mapping process from the Q matrix and K matrix to the Score matrix, taking a raw log sequence of 15 tokens as an instance. The cases of token 7 and token 8 are shown in Figure 4.
The pseudocode of the sparsity algorithm is shown in Algorithm 1.
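The fine-to-coarse grouping can be sketched with Python's `heapq` module. This is one reading of the description in this section, not the authors' code: ties between equally distant tokens are broken by token index, the "maximum value" of a group is taken element-wise over its key vectors, and the names `fine_to_coarse_groups` and `pooled_scores` are hypothetical.

```python
import heapq


def fine_to_coarse_groups(i, m):
    """Group the m log-token indices around token i into spans of size 1, 2, 4, ...

    Tokens are polled from a min-heap keyed by the deviation |i - j|, so the
    nearest tokens land in the smallest (finest) groups.
    """
    heap = [(abs(i - j), j) for j in range(m)]
    heapq.heapify(heap)
    groups, size = [], 1
    while heap:
        take = min(size, len(heap))
        groups.append([heapq.heappop(heap)[1] for _ in range(take)])
        size *= 2  # group sizes follow 2^0, 2^1, 2^2, ...
    return groups


def pooled_scores(query, keys, groups):
    """Score the query against the element-wise maximum of each group's keys."""
    scores = []
    for group in groups:
        pooled = [max(keys[j][c] for j in group) for c in range(len(query))]
        scores.append(sum(q * p for q, p in zip(query, pooled)))
    return scores


groups = fine_to_coarse_groups(7, 15)
print(groups)
keys = [[float(j), 1.0] for j in range(15)]
print(pooled_scores([1.0, 0.0], keys, groups))
```

For a 15-token sequence and token 7, the 15 keys are reduced to four pooled keys (groups of 1, 2, 4, and 8 tokens), so the number of scores per query grows roughly logarithmically with the sequence length instead of linearly.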

Parameter Setting.
In the online detection module, every pattern token in the log pattern events is mapped to a 300-dimensional vector in the same way. Multihead attention uses 4 heads, a cross-entropy function is used as the loss function to train the LogPal neural network, a Dropout layer is used to prevent overfitting, and a sigmoid layer is employed to output the classification. We use a weight decay factor of 0.001, the initial learning rate of the Adam optimizer is set to 0.001, and the final number of training epochs is set to 10. In addition, the random seed can be initialized to a fixed value to ensure that the experimental results can be reproduced. Our model is implemented using PyTorch and trained on an NVIDIA GeForce RTX 3090 GPU.
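The reported settings can be collected in one place as a small configuration sketch. The class name `TrainingConfig`, the field names, and the seed value 42 are assumptions for illustration; the dropout rate is not stated in the text and is therefore left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    embedding_dim: int = 300      # dimension of each pattern token vector
    num_heads: int = 4            # heads in multihead attention
    weight_decay: float = 0.001   # weight decay factor
    learning_rate: float = 0.001  # initial learning rate for Adam
    epochs: int = 10              # final number of training epochs
    seed: int = 42                # assumed value; the text only says "a fixed value"


config = TrainingConfig()
random.seed(config.seed)  # fix the seed so runs are repeatable

# The embedding dimension must divide evenly across the attention heads.
assert config.embedding_dim % config.num_heads == 0
print(config.embedding_dim // config.num_heads)
```

With 300-dimensional vectors and 4 heads, each head works on a 75-dimensional slice. In a PyTorch training script, the framework's own seed would also need to be fixed for full reproducibility.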

Experiments
To quantify the performance of LogPal, we conducted various experiments. We compare this method with four existing baselines on two real-world syslog datasets. We describe the main information in the datasets, discuss the experimental settings and evaluation indicators, and give the results.
4.1. Datasets. We evaluate the proposed method on the following three open log datasets: the BGL dataset [28], the HDFS dataset [7], and the Thunderbird dataset [28]. A brief summary is shown in Table 2, and the details are as follows.
The BGL dataset is an open dataset of logs collected from a BlueGene/L supercomputer system at Lawrence Livermore National Labs (LLNL) in Livermore, California, with 131,072 processors and 32,768 GB memory. The log contains alert and nonalert sequences identified by alert category tags. In the first column of the log, "−" indicates nonalert sequences while others are alert sequences. The label information is amenable to alert detection and prediction research. It has been used in several studies on log parsing, anomaly detection, and failure prediction.
The HDFS dataset is generated in a private cloud environment using benchmark workloads and manually labeled through handcrafted rules to identify the anomalies. The logs are sliced into traces according to block IDs, and each block sequence is marked as normal or anomalous. The HDFS dataset consists of 11,175,629 logs collected in 38.7 hours on more than 200 Amazon EC2 nodes. There are 575,061 log blocks in the dataset, of which 16,838 are marked as "exception" by Hadoop experts.
The Thunderbird dataset is an open dataset of logs collected from a Thunderbird supercomputer system at Sandia National Labs (SNL) in Albuquerque, with 9,024 processors and 27,072 GB memory. The log contains alert and nonalert sequences identified by alert category tags. In the first column of the log, "−" indicates nonalert sequences, while others are alert sequences.

Experimental Setup.
The experimental setup for this study is explained as follows.

Algorithm 1: Sparsity algorithm.
(1) … and the sum equals the number of tokens;
(2) Min-Heap for offering and polling Deviation;
(3) for row i in W_Q do
(4)     for row j in W_K do
(5)         Min-Heap add |i − j|;
(6)     end for;
(7)     for N_k in Array do
(8)         while not end of N_k do
(9)             ListKey_k add (Min-Heap poll);
(10)    end for;
(11)-(13) …

Precision: it is the percentage of true anomalies among all anomalies detected by the approach:

$$\mathrm{Precision} = \frac{TP}{TP + FP}.$$

Recall: it is the percentage of anomalies in the dataset that are detected:

$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$

F1 score: it is the harmonic mean of precision and recall:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

TP is the number of anomalous log sequences correctly detected by the model, FP is the number of normal log sequences incorrectly identified as anomalies by the approach, and FN is the number of anomalous log sequences that are not detected by the approach. The F1 score is used as a metric that considers both precision and recall; it does not favor one metric over the other and does not lose scientific validity due to the imbalance of the dataset.
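The three indicators can be computed directly from the counts TP, FP, and FN. The following is a minimal sketch; the function name `precision_recall_f1` and the example counts are made up for illustration and are not results from the experiments.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 90 anomalies caught, 10 false alarms, 30 anomalies missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a detector cannot reach a high F1 score by maximizing only precision or only recall.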

Experimental Results
First, we compare LogPal with the transformer on the BGL dataset and the HDFS dataset. We convert the log sequences of the datasets into log pattern vectors in the same way and input these pattern vectors into the transformer model, with parameter settings consistent with those of LogPal. We conducted the comparison experiment by varying the ratio of the training set to the test set. Figures 5 and 6 show the comparative results of the experiment.
The horizontal axis of Figures 5 and 6 represents the ratio of the training set to the test set, and the vertical axis represents the F1 score of the different anomaly detection models. When the training set ratio is large, the best F1 score of LogPal is 99% and that of the transformer model is 98%, so LogPal has only a slight advantage; when the training set ratio is small, the performance advantage of LogPal is clearer. This shows that even if only a small amount of training data is obtained from the target syslog, LogPal can extract the key information that distinguishes normal from anomalous log sequences and can produce accurate predictions even on unseen samples. When the ratio of training sets is 2 : 1, LogPal's anomaly detection rate is 90%, while the transformer's is 86%, so LogPal increases the F1 score by 4.7%. Even with a large training set, the F1 score improves by more than 1%. LogPal uses synthetic attention to perceive the template vector and the raw log vector differently and obtains a better F1 score than the transformer model. The transformer model does not consider the special role of the log template and only considers the self-attention relationship matrix of the raw log sequence itself, so a lot of important information is ignored, which may lead to false alarms. LogPal can quickly perceive and learn the semantic information of the pattern vectors to improve the accuracy of anomaly detection. In addition, LogPal adopts a sparse attention method for the raw log sequence tokens, which reduces the impact of noise even in long log sequences, and it adopts a general and unified pattern event extraction method to embed the pattern vectors, which, as we expected, enables robust representation and accurate anomaly detection even for heterogeneous logs or unseen new logs.
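As a quick check on how the 90%, 86%, and 4.7% figures relate, and assuming the reported 4.7% is a relative rather than an absolute gain (the text does not say which), the improvement works out as:

$$\frac{0.90 - 0.86}{0.86} \approx 0.0465 \approx 4.7\%,$$

whereas the absolute difference between the two scores is 4 percentage points.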
To further evaluate the performance of LogPal, we also evaluated LogPal and the baselines on the BGL dataset, the HDFS dataset, and the Thunderbird dataset. The overall performance of LogPal compared with the baselines is shown in Table 3 and Figures 7-9. Across the three datasets, LogPal generally has the best performance, and all evaluation indicators are close to 99%. LogPal filters massive heterogeneous logs to generate pattern vectors and perceives log templates and log sequences separately through synthetic attention. PCA and DeepLog use the index of the log template to learn anomalous and normal patterns. They ignore the meaning of log sequences and words, and their actual performance is not high. Although InterpretableSAD performs well on the Thunderbird dataset, where all indicators are balanced and reasonably high, the method does not perform as well on the BGL dataset and the HDFS dataset, which may indicate that the method is not universal and is better suited to certain datasets.
It is worth noting that comparing SwissLog and HitAnomaly, SwissLog has a precision of 97% and a recall of 100% in the experiments on the BGL dataset and the HDFS dataset, while the two indicator values of HitAnomaly are exactly the opposite, it has a precision of 100% and a recall of Security and Communication Networks 97% in the experiments.Tis means that SwissLog is more inclined to sequences as anomalous, in other words, it is more capable of uncovering anomalous logs in the logs.Although the recall rate is satisfactory, it is clear from the precision rate that SwissLog mistakes some log sequences that are really normal as anomalous.Tus, generating a large number of false positive predictions, and if a log anomaly detection method generates too many false alarms, which will consume energies of O&M staf to verify the system condition and add a lot of unnecessary work; in  Te bold values show the best performances of method based on the experimental results with the specifc indicator and dataset.
Security and Communication Networks contrast, HitAnomaly can be very precise in logs identifed as anomalous, without so much false positive predictions.But it misses some anomalous logs, thus generating a large number of false negative predictions, which may be a more serious problem if it fails to detect system anomaly.System failures may not be resolved in a timely manner for a long time, which will cause serious losses.Generally speaking, LogPal is able to balance precision and recall and has an improved overall performance.Overall, compares favorably to baselines, LogPal not only learns the semantic information of log word vectors but also focuses diferently on the attention relation between template tokens and log sequence tokens.Finally, LogPal achieves an excellent performance on the BGL dataset, the HDFS dataset, and the Tunderbird dataset.Compared with existing methods, LogPal improves the F1 score by 1% on the HDFS dataset.Besides, LogPal improves the precision and F1 score by 1% on the Tunderbird dataset.
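The trade-off between SwissLog and HitAnomaly can be made concrete with the standard F1 definition, the harmonic mean of precision and recall. The sketch below uses only the precision and recall values quoted above; the helper `f1_score` is written here for illustration and is not part of any of the compared systems.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the text for the BGL and HDFS experiments.
swisslog = f1_score(precision=0.97, recall=1.00)
hitanomaly = f1_score(precision=1.00, recall=0.97)

print(f"SwissLog   F1 = {swisslog:.3f}")
print(f"HitAnomaly F1 = {hitanomaly:.3f}")

# Share of alarms that are false (1 - precision) and
# share of true anomalies that are missed (1 - recall).
print(f"SwissLog false-alarm share: {1 - 0.97:.2f}")
print(f"HitAnomaly missed-anomaly share: {1 - 0.97:.2f}")
```

Because F1 is symmetric in its two arguments, both methods obtain the same F1 score even though their errors differ in kind: roughly 3% of SwissLog's alarms are false, whereas HitAnomaly misses roughly 3% of the true anomalies. This is why precision and recall need to be read alongside F1 when comparing detectors.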

Conclusion
As a kind of data reflecting system status and events, syslog provides important support for detecting various software and hardware system anomalies. Many log-based methods have been proposed to detect anomalies in large-scale software and hardware systems. However, existing methods struggle to deal effectively with the labelling problems in heterogeneous logs. To overcome these problems, this study proposes a generic anomaly detection mechanism for heterogeneous logs, called LogPal. The model uses a synthetic attention transformer encoder network, which proactively sparsifies the semantics of log sequences and weakens the influence of noise. Compared with other methods, it achieves better generalization ability on multisource and heterogeneous samples. Experiments based on public datasets show that the overall performance of LogPal is better than that of current machine learning and deep learning methods. In future work, we will further improve the accuracy of anomaly detection by introducing a weight coefficient to learn the degree of contribution of the template and the raw log tokens. In addition, we will explore the synthesis strategy of synthetic attention to reduce the computational complexity and improve the early warning speed of anomaly detection.

Figure 2: Examples of mapping raw logs to pattern vectors.

Figure 5: Comparisons of different ratios on the BGL dataset.

Table 1: Examples of heterogeneous logs.

Table 2: A brief summary of datasets.

Table 3: Comparison of methods on three datasets.