An Intelligent Detection Method of Personal Privacy Disclosure for Social Networks

With the growing number of users on current social network platforms (taking WeChat as an example), personal privacy security has become an important issue. This paper proposes an intelligent detection method for personal privacy disclosure in social networks. Firstly, we propose and construct the eigenvalues (feature items) of the social platform. Secondly, by calculating the value of user account assets, we use the eigenvalues to calculate the possibility of threat occurrence and the impact of the threat. Thirdly, we analyse the situations in which the user may leak private information and assign a score. Finally, the SVM algorithm is used to classify the results, and suggestions for warning and modification are put forward. Experiments show that this intelligent detection method can effectively analyse the privacy leakage of individual users.


Introduction
Today's society is developing rapidly, and with the popularity of smartphones, the amount of private information they generate is increasing. Following incidents such as "PRISM" and the leak of Facebook users' personal information, the issue of privacy security has begun to attract public attention.
In recent years, WeChat has been one of the most popular apps in China; as of June 2019, the number of monthly active accounts on WeChat reached 1.13 billion. Such a huge user base carries a large amount of private user information. However, most users lack the relevant expertise or neglect information management. Therefore, when using WeChat, they do not pay attention to protecting private information. Our investigation and analysis show that account passwords disclosed during chats, add-friend settings, location information, and similar data pose a threat to the user if leaked and may cause economic loss or even personal harm. These problems have prompted related studies. Reference [1] conducted a comprehensive evaluation of the apps in the mobile app market but did not consider the risk of social platforms. Reference [2] collected a large amount of Facebook user data and analysed it using a questionnaire, but the number of WeChat users in China far exceeds that of Facebook. Reference [3] summarized abnormal account detection schemes based on four aspects: behaviour characteristics, content, graphs, and unsupervised learning. The detection schemes are rich, but few feature values are involved, so the data analysis is insufficient and is not adjusted for personal social accounts. Reference [4] fully considered the issue of trajectory privacy leakage and protected it with a prefix tree, but it did not consider all aspects of personal social account security. Reference [5] gave a formal description of the malicious use of the address book matching function and proposed corresponding protection measures, but its coverage was also incomplete. In [6], many malicious and risky programs on Android were analysed in depth, but only the leakage of information through phone calls and text messages was analysed; it did not analyse social platforms and failed to give an intuitive evaluation system. In [7], scientifically rigorous static analysis, dynamic analysis, and a network data model were used for multidimensional analysis. However, only 30 apps were tested, leading to the conclusion that software in the current application market leaks user privacy; no risk assessment was performed for individual users. In [8], the risk analysis was based on association rules and game theory, but the selected feature values were few and were not targeted at social platforms. Moreover, the privacy leak detection systems currently available are oriented to companies and enterprises and are not suitable for analysing individual users. The innovations proposed in this article are as follows: (1) Through the investigation and analysis of WeChat, eight characteristic items in social networks are proposed and constructed: account passwords during chats, WeChat wallet consumption records (non-friends), WeChat wallet transfer records (friends), Moments settings for strangers, settings for nearby people, settings for adding friends, Moments settings for friends, and information acquisition by mini-programs. The intelligent detection system uses these filtered feature items to calculate the risk value more efficiently and accurately. (2) The operations of asset identification, threat identification, and vulnerability analysis are used to calculate the comprehensive threat value. (3) An intelligent detection method based on SVM (Support Vector Machine) is proposed to divide the data more accurately. (4) Our investigation found that most existing detection software with similar functions is oriented to enterprises; this system is one of the few intelligent detection systems on the market aimed at individual users.

Principles of Intelligent Detection System
The intelligent detection method of personal privacy leakage for social networks proposed in this article is intended for individual users, who always face some risk of personal privacy leakage. After the user's WeChat settings are obtained, asset identification, threat identification, and vulnerability analysis are performed, and the results are looked up in two-dimensional matrices to obtain the risk value of each security event. The information leakage risk coefficient is then calculated from these risk values according to their weights. References [9][10][11][12] pointed out that machine learning has been widely applied in fields such as healthcare and cybersecurity owing to its powerful data mining capabilities, and SVM is one of the most popular machine learning algorithms; therefore, the SVM algorithm is used to classify the information leakage risk coefficient and produce a final evaluation.

Risk: Risk Is the Effect of Uncertainty on a Goal.
The risks explored in this article are information security risks: human or natural threats exploit vulnerabilities in information systems and their management systems, causing security incidents that affect the organization. In the current environment of high information transparency, private information cannot be in a state of zero risk [8].

Assets and Their Value.
Assets refer to any information or resources that are valuable to the unit. The value of assets does not refer to the economic value of the information system but is closely related to the business work of the organization. Asset value is the importance and sensitivity of assets and the main content of asset identification.

Asset Identification.
Asset identification includes two steps: "asset classification" and "asset assignment." This article explores the classification of application software. Based on asset classification, a further semiqualitative and semiquantitative analysis of assets is performed; that is, assets are valued so as to reach a scientific and rational understanding of asset value. Asset assignment is broken down into three security attribute assignments: "confidentiality assignment," "integrity assignment," and "availability assignment."

Confidentiality.
It is the feature that prevents information from being leaked to unauthorized individuals, entities, or processes, or that makes the information useless to them.

Integrity.
It protects the accuracy and completeness of information and processing methods.

Availability.
It is the feature that information can be accessed and used by authorized entities whenever it is needed [13].

Threat Identification
(1) Threat: A potential cause of an accident that may cause damage to assets or units. (2) Threat identification: The process of analysing the potential cause of an accident. Threat identification is divided into "threat classification" and "threat assignment" [13].

Vulnerability Analysis
(1) Vulnerability: A weakness in an asset or group of assets that can be exploited by a threat. Threats are the external cause of risk, and vulnerability is the internal cause of risk; the two together form a risk. (2) Vulnerability identification: The process of analysing and measuring the weak links of assets that may be exploited by threats [13].

Basic Introduction of SVM Algorithm
SVM refers to the support vector machine, a common discriminative method. In the field of machine learning, it is a supervised learning model that is usually used for pattern recognition, classification, and regression analysis. The main idea of SVM can be summarized in two points: (1) It analyses linearly separable cases. For linearly inseparable cases, a nonlinear mapping transforms the linearly inseparable samples from a low-dimensional input space into a high-dimensional feature space in which they become linearly separable, so that a linear algorithm in the feature space can analyse the nonlinear features of the samples. (2) It constructs the optimal hyperplane in the feature space based on structural risk minimization theory, so that the learner is globally optimized and the expected risk over the entire sample space satisfies a certain upper bound with a certain probability [14].
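The first point can be made concrete with a toy example. The sketch below is only an illustration under assumed data: the seven one-dimensional points and the map phi(x) = (x, x^2) are hypothetical and are not taken from the system described here. It shows that classes which no single threshold can separate in the input space become separable by a threshold (a linear boundary) on one coordinate of the mapped space.

```python
# Hypothetical 1-D samples: class +1 lies in the middle, class -1 on both sides,
# so no single threshold on x separates the two classes.
points = [(-2.0, -1), (-1.5, -1), (-0.5, 1), (0.0, 1), (0.5, 1), (1.5, -1), (2.0, -1)]


def separable_by_threshold(values_with_labels):
    """Return True if a single threshold splits the two classes perfectly."""
    labels = [label for _, label in sorted(values_with_labels)]
    # Perfect separation means the sorted labels switch class at most once.
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1


def feature_map(x):
    """Assumed nonlinear map phi(x) = (x, x^2) into a 2-D feature space."""
    return (x, x * x)


in_input_space = separable_by_threshold(points)
# In the feature space, a threshold on the second coordinate (x^2) is a
# linear boundary, and it separates the classes.
in_feature_space = separable_by_threshold(
    [(feature_map(x)[1], label) for x, label in points]
)
print(in_input_space, in_feature_space)
```

A kernel function such as the RBF kernel plays the role of `feature_map` implicitly, without ever computing the mapped coordinates.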

Basic Architecture of Intelligent Detection Model
This intelligent detection model is divided into a data source layer, an analysis layer, and a calculation layer, as shown in Figure 1. After the data source layer obtains the user's WeChat data, eight characteristic values are selected for analysis and calculation. The analysis layer performs asset identification, threat identification, and vulnerability analysis on the characteristic values in turn, each producing a calculation table. Asset identification takes the three security attributes of confidentiality, integrity, and availability, calculates the asset value, and divides the asset value into five levels to obtain a quantitative asset value table.
Threat identification classifies threats into five levels based on their frequency to obtain a threat frequency table. Vulnerability analysis calculates the basic measurement group, time measurement group, and environmental measurement group in turn to obtain a vulnerability calculation table. At the calculation layer, the three calculation tables from the analysis phase are looked up in the two-dimensional security event matrices to obtain, from each eigenvalue's data, the risk value of the security event.
The weighted sum of these risk values then gives the overall risk value, which is passed to the corresponding SVM classifier to obtain the final result.

Eigenvalue Construction.
Based on the investigation and analysis of WeChat, we selected the following conditions as the eigenvalues. The intelligent detection system uses these filtered feature items to calculate the risk value more efficiently and accurately:

Account Password in the Chat Process.
The account and password are mentioned directly during the chat. If the chat history is stolen, the account and password information is leaked, and the entire account may be lost and used for further illegal acts.

WeChat Wallet Consumption Records (Non-Friends).
These records involve exchanging money with another party without knowing much about that party's identity; such transactions lack security protection and may cause economic losses.

WeChat Wallet Transfer Records (Friends).
Transfers between friends are more secure than transfers between non-friends, but if a friend is impersonated, the true identity of the transfer counterparty is unknown, so even transfers between friends carry risk.

Moments Settings for Strangers.
The Moments visibility setting for strangers has three options: invisible to strangers, ten photos visible to strangers, and unlimited. If an attacker keeps collecting the user's Moments information over a long period, showing ten photos to strangers differs little from unlimited, and a large amount of the user's information will be leaked.

Settings for Nearby People.
If the People Nearby function is not turned off, the real-time location of the user will be exposed and can be exploited by criminals.

Add Friend Settings.
The related settings include whether verification is required when someone adds the user as a friend. The ways to search for the user are WeChat ID, mobile phone number, and QQ number, in addition to business card. Granting too many permissions here increases the possibility of being disturbed by strangers.

Location of Moments.
An attacker can commit further crimes based on the location information obtained, threatening the user's personal safety.

Mini-Program Information Acquisition.
Mini-programs usually obtain user information. If a mini-program is exploited by criminals, the user information it collects arbitrarily will be leaked.

Asset Identification.
Assets have security attributes such as confidentiality, integrity, and availability, which reflect the characteristics of the asset in different aspects. By quantifying the three security attributes, one can calculate a value that reflects the asset [15].
Here, Conf represents the confidentiality assignment, Int the integrity assignment, and Avail the availability assignment; INT denotes rounding to an integer. Each of the three security attributes is divided into 5 levels; the higher the level, the greater the impact on assets.
Corresponding to the 5 levels of the security attributes, the asset value is also divided into 5 levels; the higher the level, the more important the asset.
It can be seen from Table 1 that disclosing the account password during a chat leads to the loss of the entire account, so its three assignments and the calculated asset value are all 5, the highest level. WeChat wallet consumption with non-friends requires more accurate and complete protection of information and processing methods than WeChat wallet transfers between friends; that is, its integrity assignment is relatively high. For feature items that are likely to involve contact with strangers (Moments permission settings for strangers, People Nearby settings, add-friend settings, and mini-program use), we have assigned more average values. The location of Moments is mostly visible only to friends, so its value is lower.

reat Identification.
According to the frequency of threats, the possibility of threats is defined and divided into 5 levels. The higher the level, the higher the probability of threats.
It can be seen from Table 2 that each disclosure of an account password during a chat and each WeChat wallet consumption record with a non-friend has a greater impact per leak, so the frequency intervals assigned to the different threat levels for these items are narrower. The remaining eigenvalues use wider intervals or are assigned according to the specific WeChat settings.

Vulnerability Analysis.
This paper uses the Common Vulnerability Scoring System (CVSS). The CVSS evaluation system consists of three measurement groups: the basic measurement group, the time measurement group, and the environment measurement group [15].
Basic metric = round_to_1_decimal(10 * access vector * access complexity * authentication * ((confidentiality impact * confidentiality impact weight value) + (integrity impact * integrity impact weight value) + (availability impact * availability impact weight value))).
The values in Table 3 were selected according to Table 4 [16]. Since personal privacy leaks are based on local information, the access vector is set to local for all items. WeChat has official protection measures, so the access complexity is set to high for all items. Authentication refers to verifying whether the user has the right to access the system; it is required only for special operations, so authentication is set to not required for all items. If the account password disclosed in a chat is leaked, the user loses the entire account, so the confidentiality impact, integrity impact, and availability impact of this feature item are all set to complete; the remaining items are set to complete or to a lower value according to their impact. The confidentiality impact weight value, integrity impact weight value, and availability impact weight value are assigned according to the proportion in which each characteristic item is affected by the three attributes. Finally, the basic metric value is calculated.
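The basic metric formula above can be sketched as a small function. This is a minimal illustration, not the system's implementation: the function and parameter names are hypothetical, and the numeric inputs in the example are placeholders chosen for readability, not the entries of Table 3 or Table 4.

```python
def basic_metric(access_vector, access_complexity, authentication,
                 conf_impact, integ_impact, avail_impact,
                 conf_weight, integ_weight, avail_weight):
    """Basic metric as given in the text, rounded to one decimal place."""
    weighted_impact = (conf_impact * conf_weight
                       + integ_impact * integ_weight
                       + avail_impact * avail_weight)
    raw = 10 * access_vector * access_complexity * authentication * weighted_impact
    # Note: Python's round() is used here as a stand-in for round_to_1_decimal.
    return round(raw, 1)


# Placeholder inputs (assumptions, NOT the values from Table 3 or Table 4).
example_score = basic_metric(
    access_vector=0.7, access_complexity=0.8, authentication=1.0,
    conf_impact=1.0, integ_impact=1.0, avail_impact=1.0,
    conf_weight=0.5, integ_weight=0.25, avail_weight=0.25,
)
print(example_score)
```

Because the three weight values scale the three impact values, a feature item whose weights sum to 1 and whose impacts are all at their maximum is limited only by the access vector, access complexity, and authentication factors.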
Time metric = round_to_1_decimal(basic metric * exploitability * remediation level * report confidence).
The values in Table 5 were selected according to Table 6 [16]. A leaked account password from a chat is the most likely to be exploited, so the exploitability of this feature item is set to high. Transfer records have low exploitability, so unconfirmed is selected. The exploitability of Moments location is set to theoretically proven, and that of the remaining feature items to practical and feasible. The remediation level is assigned to each feature item according to whether it is easy to recover after a leak. Transfer records and Moments location recover better than the other feature items; therefore, high and theoretical are selected for them, and unconfirmed is selected for the rest. The report confidence is set to the same value for all feature items.
Environmental metric value = round_to_1_decimal((time metric score + ((10 - time metric score) * incidental loss impact)) * target distribution).
The values in Table 7 were selected according to Table 8 [16]. The loss of transfer records or Moments location has a small impact, so their incidental loss impact is set to medium and low, and that of the rest is set to high. The target distribution is assigned according to how widely each feature item is distributed: it is set to high for the account and password in the chat and to low or medium for the rest. Finally, the environmental metric value is calculated.
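The time and environmental metrics chain onto the basic metric, which a short sketch makes explicit. The function and parameter names are hypothetical, and the numbers in the example are placeholders, not the entries of Tables 5 to 8.

```python
def time_metric(basic_score, exploitability, remediation_level, report_confidence):
    """Time metric: the basic metric scaled by three factors, one decimal place."""
    return round(basic_score * exploitability * remediation_level * report_confidence, 1)


def environmental_metric(time_score, incidental_loss_impact, target_distribution):
    """Environmental metric: moves the time score toward 10 in proportion to the
    incidental loss impact, then scales the result by the target distribution."""
    adjusted = time_score + (10 - time_score) * incidental_loss_impact
    return round(adjusted * target_distribution, 1)


# Placeholder inputs (assumptions, NOT the values from Tables 5 to 8).
time_score = time_metric(5.6, exploitability=1.0, remediation_level=1.0,
                         report_confidence=1.0)
environment_score = environmental_metric(time_score, incidental_loss_impact=0.5,
                                         target_distribution=1.0)
print(time_score, environment_score)
```

With an incidental loss impact of 0 the environmental metric reduces to the time score times the target distribution, and a target distribution of 0 yields 0 regardless of the other inputs.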

Risk Calculation: The Calculation of Risk Is as Follows.
After asset identification, threat identification, and vulnerability identification are complete, an appropriate model can be used to calculate the risk value of a security event in which a threat exploits a vulnerability. This article adopts the risk calculation model in Chinese National Standard GB/T 20984, "Information Security Technology, Information Security Risk Assessment Specification". The formula is risk value = R(A, T, V) = R(L(T, V), F(A, V)), where R is the security risk calculation function, A is the value of the asset, T is the threat, V is the vulnerability, L is the possibility that the threat exploits the vulnerability of the asset to cause a security event, and F is the loss caused by the security event.
In the specific calculation of risk, there are three key calculation links.

Calculate the Probability of a Security Incident.
According to the frequency of threats and the vulnerability, calculate the probability that a threat will exploit the vulnerability to cause a security event, that is, the probability of a security event = L(frequency of threats, severity of vulnerability) = L(T, V).
This system uses a two-dimensional matrix method to calculate the probability of a security event, as shown in Table 10 [15].

Calculate Losses Caused by Security Incidents.
According to the value of the asset and the severity of the vulnerability, calculate the loss caused by the security event once it occurs, that is, the loss caused by the security event = F(asset value, severity of vulnerability) = F(A, V).
This system uses a two-dimensional matrix method to calculate the loss of security events, as shown in Table 11 [15].

Calculating the Value at Risk.
According to the calculated probability of the security event and the loss caused by the security event, calculate the risk value, that is, risk value = R(the probability of the security event, the loss caused by the security event) = R(L(T, V), F(A, V)).
The system uses the two-dimensional matrix method to calculate the risk value of security events, as shown in Table 12 [15].
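The three matrix lookups above compose as R(L(T, V), F(A, V)). The sketch below illustrates only that composition. It assumes that asset value, threat, and vulnerability have each already been reduced to a level from 1 to 5, and the matrix entries are placeholders generated by an arbitrary rule; the real entries are those of Tables 10, 11, and 12, which are not reproduced here.

```python
LEVELS = range(1, 6)


def placeholder_matrix():
    """Build an illustrative 5x5 table keyed by (row level, column level).

    ASSUMPTION: the rule below (average of the two levels, rounded up) is
    purely illustrative and does NOT reproduce Tables 10, 11, or 12.
    """
    return {(r, c): (r + c + 1) // 2 for r in LEVELS for c in LEVELS}


PROBABILITY_MATRIX = placeholder_matrix()  # stands in for L(T, V)
LOSS_MATRIX = placeholder_matrix()         # stands in for F(A, V)
RISK_MATRIX = placeholder_matrix()         # stands in for R(L, F)


def lookup(matrix, row_level, col_level):
    """Read one cell of a two-dimensional matrix, rejecting invalid levels."""
    if (row_level, col_level) not in matrix:
        raise ValueError("levels must be integers from 1 to 5")
    return matrix[(row_level, col_level)]


def risk_value(asset, threat, vulnerability):
    """Compose the three lookups: R(L(T, V), F(A, V))."""
    probability = lookup(PROBABILITY_MATRIX, threat, vulnerability)
    loss = lookup(LOSS_MATRIX, asset, vulnerability)
    return lookup(RISK_MATRIX, probability, loss)


example = risk_value(asset=5, threat=4, vulnerability=3)
print(example)
```

Keeping the three tables as data rather than code means the published matrices can replace the placeholders without touching `risk_value`.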

Sum Based on Weights
The risk value of each data security event is obtained from Table 12; each risk value is multiplied by its weight value from Table 13, and the products are summed to obtain the final risk value.
The comprehensive value of a single threat to an information asset is calculated as $t = T_s + T_i$.
Here, $t$ is the comprehensive value of a single threat, $T_s$ is the threat source value, defined as a value between 1 and 5, and $T_i$ is the impact degree value, also defined as a value between 1 and 5.
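The two calculations in this subsection are simple enough to sketch directly. The function names are hypothetical, and the risk values and weights in the example are placeholders rather than the entries of Table 12 or Table 13.

```python
def single_threat_value(threat_source, impact_degree):
    """Comprehensive value of a single threat: t = T_s + T_i, each in 1..5."""
    for value in (threat_source, impact_degree):
        if not 1 <= value <= 5:
            raise ValueError("T_s and T_i must lie between 1 and 5")
    return threat_source + impact_degree


def weighted_risk(risk_values, weights):
    """Final risk value: sum of each event's risk value times its weight."""
    if len(risk_values) != len(weights):
        raise ValueError("one weight is needed per risk value")
    return sum(r * w for r, w in zip(risk_values, weights))


# Placeholder inputs (assumptions, NOT the values from Table 12 or Table 13).
t = single_threat_value(threat_source=3, impact_degree=4)
final_risk = weighted_risk([4, 2], [0.5, 0.5])
print(t, final_risk)
```

Since both $T_s$ and $T_i$ lie between 1 and 5, the comprehensive value of a single threat always lies between 2 and 10.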

SVM Algorithm Application.
The intelligent detection system uses the SVM algorithm to divide the comprehensive threat value (as shown in Figure 2) and thereby divides the risk level of the user account more accurately. The specific process is as follows. The risk value is calculated from the comprehensive threat value described above. The resulting risk values are divided into two categories: scores of 1 to 40 form the low-risk area, and scores of 41 to 100 form the high-risk area. Within the low-risk area, 1 to 20 is defined as safe and 21 to 40 as basic safety; within the high-risk area, 41 to 59 is defined as higher risk and 60 to 100 as high risk. The feature quantities of samples whose risk values are defined as safe or basic safety are recorded in initial feature vector set 1, and those of samples whose risk values are defined as higher risk or high risk are recorded in initial feature vector set 2.
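The score bands and the routing to a feature vector set can be expressed as a small lookup. This is a sketch with hypothetical names; it assumes integer scores and treats 40 as the top of the low-risk area, because 21 to 40 is defined as basic safety and 41 to 59 as higher risk.

```python
def risk_band(score):
    """Map a risk score (1..100) to its band and its feature vector set (1 or 2)."""
    if not 1 <= score <= 100:
        raise ValueError("score must lie between 1 and 100")
    if score <= 20:
        return ("safe", 1)
    if score <= 40:
        return ("basic safety", 1)
    if score <= 59:
        return ("higher risk", 2)
    return ("high risk", 2)


for sample_score in (15, 26, 41, 75):
    band, vector_set = risk_band(sample_score)
    print(sample_score, band, vector_set)
```

Set 1 feeds the classifier for the low-risk area and set 2 the classifier for the high-risk area, so the band boundaries decide which model sees a given sample.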
Normalize the feature data items and remove the extreme data. Convert the two processed data sets into an input format acceptable to the SVM classifier (class vector Y, feature vector Xi). Classifier 1 is trained using the data defined as safe and basic safety as training samples, and classifier 2 is trained using the data defined as higher risk and high risk as training samples.
Set the SVM parameters and use the K-fold cross-validation algorithm to find the optimal parameters. Perform asset identification, vulnerability analysis, and threat identification on the characteristic values read from the user. After the risk calculation, determine from the score whether the sample belongs to the low-risk area or the high-risk area, and enter it into the corresponding classifier as a test sample. The SVM classification model then performs the classification judgment. Substitute the results obtained by the SVM into the Naive Bayes formula to obtain the security risk probability, and send the final results back to the user.
<index> is the integer index of each of the 8 feature quantities input to the algorithm.
<value> is the integer value of the feature code for that item.
SVM_train trains on the training samples to obtain the SVM model.
SVM classification predicts the class of a data set according to the trained model.
Use SVM_train to train on the input training data set to obtain the SVM model file. The SVM algorithm maps each input training sample, that is, an n-dimensional vector, into a high-dimensional space, forming scattered points. It approximates the classification hyperplane from the regions in which the points aggregate, continuously corrects it using newly input training samples, generates the template file, and records the classification features.
In this paper, the K-fold cross-validation method is used to obtain the optimal parameters by verifying the accuracy of the results. The main purpose of the verification algorithm is to divide data set A into a training set B and a test set C. When the sample size is small, data set A can be randomly divided into k packets, with one packet used as the test set each time and the remaining k-1 packets used as the training set. The cross-validation method is used to prevent overfitting caused by an overly complicated model [18]. By repeatedly varying two important SVM parameters, the penalty factor C and the kernel function parameter g, the optimal parameters C = 2048 and g = 0.0078 are determined.
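The packet splitting described above can be sketched without any SVM library. The code below only produces the train/test index splits and a candidate parameter grid; the function names, the value k = 5, and the power-of-two grid are assumptions, since the text does not state the value of k or the grid that was searched. The reported optimum coincides with 2^11 = 2048 and, to the stated precision, 2^-7 = 0.0078125, which is consistent with (but does not prove) a power-of-two grid.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k randomly formed packets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k packets of nearly equal size.
    packets = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = packets[i]
        train = [idx for j, packet in enumerate(packets) if j != i for idx in packet]
        yield train, test


def parameter_grid(c_exponents, g_exponents):
    """Candidate (C, g) pairs on an assumed power-of-two grid."""
    return [(2.0 ** c, 2.0 ** g) for c in c_exponents for g in g_exponents]


splits = list(k_fold_splits(n_samples=149, k=5))
grid = parameter_grid(range(-5, 16, 2), range(-15, 4, 2))
print(len(splits), len(grid))
```

In a full search, each (C, g) pair would be scored by training on every `train` split and measuring accuracy on the matching `test` split, and the pair with the best average accuracy would be kept.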

SVM Algorithm Processing
SVM classifier 1:
Input: $x = \{a_1, a_2, \ldots, a_m\}$, $y \in \{y_0, y_1\}$, where x is the feature value set of each sample in the test sample and y takes the categories 0 and 1, which represent safety and basic safety, respectively.
Output: The user's security risk probability; a value less than or equal to 50% is classed as safe, and a value greater than 50% as basic safety.
Step 1: Normalize the feature data
Step 2: Convert the processed feature data into an input format acceptable to the classifier (feature vector x, category vector y) to obtain training samples
Step 3: Set the SVM type to 0-SVM and the kernel function type to radial basis function (RBF)
Step 4: Set the penalty factor C and kernel function parameter g
Step 5: Set the K value of the K-fold cross-validation algorithm
Step 6: Use the SMO algorithm to find the support vectors
Step 7: Build a hyperplane model from the training samples
Step 8: Enter the test samples for classification, and get the classification result y
Step 9: Calculate $P(a_i \mid y)$, the conditional probability of each feature attribute given the resulting class y
Step 10: Calculate $P(y)$, the probability of category y appearing
Step 11: Calculate $P(a_i)$, the probability of each feature attribute
Step 12: Substitute into the formula $P(y \mid x) = \dfrac{P(x \mid y)\,P(y)}{P(a_1)\,P(a_2) \cdots P(a_m)}$
Step 13: Return $P(y \mid x)$

SVM classifier 2:
Input: $x = \{a_1, a_2, \ldots, a_m\}$, $y \in \{y_0, y_1\}$, where x is the feature value set of each sample in the test sample and y takes the categories 0 and 1, which represent high risk and higher risk, respectively.
Output: The user's security risk probability; a value less than or equal to 50% is classed as high risk, and a value greater than 50% as higher risk.
The process of SVM classifier 2 is the same as that of SVM classifier 1.
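Steps 9 to 13 and the output rule of classifier 1 can be sketched as follows. The sketch assumes the usual Naive Bayes factorization, in which P(x|y) is the product of the per-attribute terms P(a_i|y) computed in Step 9; the function names and the probabilities in the example are hypothetical.

```python
from math import prod


def posterior(conditional_probs, prior, attribute_probs):
    """Steps 9-13: P(y|x) = prod(P(a_i|y)) * P(y) / prod(P(a_i)).

    ASSUMPTION: P(x|y) factorizes as the product of P(a_i|y) (Naive Bayes).
    """
    if len(conditional_probs) != len(attribute_probs):
        raise ValueError("one P(a_i|y) is needed per P(a_i)")
    return prod(conditional_probs) * prior / prod(attribute_probs)


def classifier1_label(probability):
    """Output rule of SVM classifier 1: <= 50% is safe, > 50% is basic safety."""
    return "safe" if probability <= 0.5 else "basic safety"


# Placeholder probabilities (assumptions, not estimated from the questionnaire data).
p = posterior(conditional_probs=[0.5, 0.5], prior=0.5, attribute_probs=[0.5, 0.5])
print(p, classifier1_label(p))
```

One caveat of this form is that the denominator is a product of marginal attribute probabilities rather than the exact joint probability of x, so with estimated inputs the result is a score that may need clipping to stay within 0 and 1.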

Environmental Configuration and Data Acquisition.
This test system is designed to run on the Android platform. During the test phase, Android Studio is used to simulate the Android platform for various tests.
Due to the inconvenience of directly obtaining the personal privacy data of a user's WeChat, a questionnaire was used at this stage to collect the WeChat usage of 149 users as training samples for the SVM classifier. The specific content of the questionnaire is given in the Appendix in the Supplementary Materials.

Functional Test.
The user test assignment table is shown in Table 14. For the functional test of this system, we first obtained the WeChat-related records of one user for testing, as sample 1. The comprehensive threat value calculated for this user is 26. After the comprehensive threat value is obtained, the sample is converted into an input format acceptable to the classifier. Because the user's comprehensive threat value is 26, the sample is assigned to SVM classifier 1. The input line is 0 1:3 2:1 3:2 4:1 5:1 6:1 7:4 8:2, and processing of sample 1 is complete.
After the extreme data are removed from the remaining samples, they are processed by the same steps and sent to the corresponding SVM classifier. The training samples are used to build a hyperplane model. When the system detects the privacy leakage risk probability, it automatically obtains the feature quantities, calculates the comprehensive threat value, and sends the sample to the corresponding SVM classifier to obtain the final security risk probability.
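The input line for sample 1 follows a sparse `<label> <index>:<value>` text layout, with one pair per feature item. A minimal sketch of writing and reading such a line is shown below. The helper names are hypothetical, and the text does not name the SVM library, so treating this as the layout used by LIBSVM-style tools is an assumption.

```python
def to_svm_line(label, features):
    """Format a class label and feature values as '<label> <index>:<value> ...'.

    Indices start at 1, matching the eight feature items.
    """
    pairs = " ".join(f"{index}:{value}" for index, value in enumerate(features, start=1))
    return f"{label} {pairs}"


def parse_svm_line(line):
    """Inverse of to_svm_line for dense integer features listed in index order."""
    label, *pairs = line.split()
    features = [int(pair.split(":")[1]) for pair in pairs]
    return int(label), features


# Feature values of sample 1 as given in the text.
sample_line = to_svm_line(0, [3, 1, 2, 1, 1, 1, 4, 2])
print(sample_line)
print(parse_svm_line(sample_line))
```

Writing one such line per questionnaire response produces a training file in which the first token is the class and the remaining tokens are the eight feature assignments.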
According to this method, we processed the results of the 149 user questionnaires and counted the number of scores in each segment. The results are shown in Table 15.
From Table 15, we can see that a total of 89 users are in the low-risk area and 34 users are in the risk area, of which 26 users are in the security zone. This shows that security awareness education has been effective and that people have realized the importance of personal privacy. However, many users remain in high-risk areas, indicating that efforts to expand coverage and raise everyone's security awareness still need to continue.

Performance Testing.
User WeChat-related information was obtained through a questionnaire. Using it as the sample, we tested the personal privacy leak detection value of each user's social network account and gave warnings or suggestions. This yields the percentage of people at each risk level and, from that, a picture of how well people understand and practise privacy protection, which aspects they value, and which aspects they ignore, providing direction for promoting privacy protection awareness in the future. The findings are shown in Figure 4.
At the same time, we counted the number of occurrences of high threats for each feature item (that is, the number of times a value of 4 or 5 was assigned).
In Figure 5, we can see that most people have some awareness of protecting their own privacy, but many ignore the "People Nearby" function and allow strangers to view Moments, where private information may be leaked. A system that can protect the user's privacy is therefore essential.
Through this intelligent detection system for personal privacy leaks in social networks, users can clearly understand their oversights when using WeChat and correct them, preventing problems before they occur.

Concluding Remarks
The system proposed in this article reads multiple characteristic values of a personal WeChat account and establishes a model based on three aspects: asset identification, threat identification, and vulnerability analysis. Following the risk calculation models and methods in the national standard for information security risk assessment, two-dimensional matrix tables are used to calculate the possibility of security events, the loss caused by security events, and the risk value of security events. The system determines the risk level according to the magnitude of the risk, evaluates personal privacy leakage in the user's online social software, gives a score, and informs the user of the sources of risk.
This article covers only the scoring function of the system and the function of displaying the risk sources of personal privacy leakage. In the future, more functions will be added to improve the entire system, which will make the judgment more accurate for individual users and create a safe environment for using social networks.

Data Availability
Due to the inconvenience of directly obtaining the personal privacy data of a user's WeChat, a questionnaire was used at this stage to collect the WeChat usage of 149 users as training samples for the SVM classifier. The specific content of the questionnaire is given in the Supplementary Materials.

Conflicts of Interest
The authors declare that they have no conflicts of interest.