Behaviors of High-Frequency Subscribers in Cellular Data Networks

Copyright © 2018 Jingtao Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Cellular networks operate under tight resource constraints, including radio channel capacity and network processing capability. The tremendous growth in cellular data network usage confronts operators with unprecedented signaling overloads and threatens the stability of the network. High-frequency subscribers, who produce low data volume but cause high signaling overhead, are extremely resource-inefficient. Subscribers who activate more than 5 sessions per hour account for only 1.19% of all subscribers and generate about 3.81% of the data traffic, yet they consume roughly 19.46% of the signaling resources, so their signaling and data bandwidth consumption are badly mismatched. Understanding the characteristics of these users is important for capacity design and optimal resource allocation; a lack of understanding of this active group can lead to low network performance and security threats. In this paper, we perform the first city-wide, large-scale investigation of high-frequency subscribers. By applying a set of novel approaches, such as pattern extraction and user behavior rebuilding, we observe that high-frequency subscribers correspond to a lower percentage of none-pattern traffic, showing a positive correlation between access regularity and session activation frequency. In addition, we find that a considerable number of high-frequency subscribers exhibit abnormal behaviors, resulting in unwanted signaling loads. We demonstrate that our findings have significant implications for network optimization.


Introduction
The Internet is going wireless and mobile, and cellular networks are becoming the preferred way to access it. However, in spite of a concerted effort to support packet-switched traffic, cellular data networks are still, in essence, circuit-switched systems. Because of this inflexibility, the tremendous growth in cellular data network usage confronts operators with unprecedented signaling overloads and threatens the stability of the network.
High-frequency subscribers, a particular type of subscriber who accesses the cellular network frequently and is more active than others, are extremely resource-inefficient because they combine high signaling overhead with low data transmission volume. This leads to the following potential threats. First, high-frequency subscribers produce little data traffic but incur disproportionately high signaling overhead. Second, high signaling resource consumption with few data transfers is unfair to other subscribers, and it puts heavy signaling pressure on network operators while generating little revenue.
The problem caused by these users is not simply one of user behavior but a broad problem that affects network management, security, performance, and so on. Therefore, it is important to characterize their behaviors in order to balance resource usage and guarantee network performance. In addition, characterizing their behaviors fills a gap in the analysis of such users and deepens our understanding of real-world traffic in cellular networks.
In this paper, based on real-world traces collected from a commercial cellular network in China in 2010, we present the first in-depth, city-wide measurement of high-frequency subscribers. We quantitatively study the following characteristics at the data session level and through application-level semantics [1], where a data session is a period of continuous activity that lasts from the allocation to the release of network resources and contains control-plane and data-plane messages:
(i) The impact of high-frequency subscriber sessions on the signaling resources of commercial cellular networks.
(ii) The regularity of subscribers' high session activation behaviors.
(iii) The correlation between session activation frequency and periodicity.
(iv) The correlation between periodicity and abnormal behaviors.
In summary, our key findings are as follows:
(i) Inconsistent signaling-data bandwidth consumption: by producing low data volume but causing heavy signaling load, high-frequency subscribers make resource allocation unfair and cause losses for network operators. Furthermore, this may decrease network performance.
(ii) Positive correlation between regularity and activation frequency: higher frequency corresponds to a lower percentage of none-pattern traffic. More than 40% of the subscribers who activate at least 10 sessions per hour follow Pattern A (sessions activated at a fixed interval), and among those who activate more than 20 sessions per hour, almost 70% are Pattern A subscribers.
(iii) A considerable share of periodical subscribers behave abnormally: by correlating cross-layer information, we identify that nearly half of Pattern A users' behaviors are abnormal. The top four types are Periodical PDP Context Activation, Network-Side Termination, Periodical UDP Packets, and Privacy Information Uploading. These behaviors produce low data volume while quietly causing harm such as heavy signaling load and privacy leaks, so they endanger security and decrease network performance.
Paper organization. Section 2 provides the background and dataset. Section 3 presents our research motivation and methodology. We then characterize high-frequency subscribers and three kinds of association in the next three sections: the quantitative relationship between data transfer and signaling consumption in Section 4, the correlation between frequency and periodicity in Section 5, and the relevance between periodicity and abnormal behaviors in Section 6. Next, we propose some real-life implications of our findings in Section 7. Section 8 summarizes related work and Section 9 concludes the paper.

UMTS and Dataset
Our traces were collected from the Gn interface (the data transmission link between the SGSN and the GGSN) in a cellular operator's core UMTS network, which serves a large metropolitan area in China with a population of more than one million. Figure 1 plots its architecture. We collected both types of data traffic: PDP control messages between the SGSN and the GGSN for the initiation, termination, and updating of data sessions, and tunneled IP packets between mobile terminals and the GGSN. The collection lasted for three days in January 2010.
In our work, each data session is logged as a behavior record that reflects a subscriber's continuous activities and is indexed by a session activation timestamp and an anonymized subscriber identifier (IMSI, the International Mobile Subscriber Identity). Each record contains the following key information: control messages from which the signaling exchanges during the session can be reconstructed, such as the messages for establishing and terminating the session; IP packets during the holding time of the session, used to extract key fields such as IP addresses, ports, and compressed payload; and application identification based on port information, payload signatures, and other heuristics, which are detailed in [2].

Methodology
Phenomenon Driven.
High-frequency subscribers are extremely resource-inefficient because of two phenomena: low data volume and high signaling overhead. For example, subscribers who activate at least 5 sessions per hour generate about 3.81% of the data traffic but consume roughly 19.46% of the signaling resources.
From the network operator's perspective, many more policies must be applied to deal with this large inconsistency, resulting in heavy management overhead and poor network performance.
High-frequency network access is not a simple, specific user behavior but a broad and complex problem that affects network management, security, performance, and so on. Therefore, we set out to dissect these specific subscribers.
By focusing on session-level features, however, we observed that the inconsistency between data volume and signaling consumption covers the entire spectrum of frequencies. To dissect this inconsistency further, we then correlate frequency with access regularity at the user access level, and we examine the very high-frequency and extremely high-frequency subscribers to extract pieces of their behavior from application-level semantics.

Why Session Centric?
Unlike prior work, which analyzed traffic at the packet or flow level or used an idle time (e.g., 5 minutes) to approximate the termination of a session, we study traffic from the session perspective for the following reasons:
(i) Data session maintenance is the basic goal of network signaling, while a flow is just one piece of a session. A data session not only contains the total information of its flows but also includes additional information such as the session activation time.
(ii) To some extent, data sessions can capture the resource usage behaviors of mobile subscribers through semantic analysis, while flows cannot.
(iii) One of our purposes is to extract high-frequency users' characteristics comprehensively and to quantify signaling consumption, and this information is shown most clearly at the data session level.
Although it is much harder to rebuild sessions than flows, analyzing session signaling consumption at the macroscopic level is more useful than analyzing individual flows. In this paper, we reconstruct more than 910K complete data sessions and develop a novel pattern identification method to study their characteristics.

How to Extract Sessions.
As mentioned above, unlike prior work, which terminated a session after a long idle time without data transfer, we accurately extract each data session from the raw packet traces according to the following steps:
(i) Extract the session control messages, including the PDP Context Create, Update, and Delete messages, and rebuild the session structure based on the TEID Control option (the Tunnel Endpoint Identifier for control-plane messages).
(ii) Extract the session data messages and join them with the session structure based on the TEID Data field, which identifies the data communication tunnel.
(iii) For a particular subscriber, correlate sessions based on the IMSI option, since the IMSI identifies all data sessions activated by the same subscriber.
The methods proposed in [3,4] extract sessions relying only on IP packets in the data plane; that is, they assume that a session has terminated if no packets from the private IP address of that session are observed for a threshold time. They mainly focus on estimating the signaling overhead brought by RRC state promotions and demotions within a single session, which are caused by the intervals between consecutive IP packets. By correlating the control plane and the data plane, however, we can accurately identify the beginning and the end of a session. In addition, we can connect all the sessions belonging to one subscriber, which we use to analyze subscribers' session-level behaviors. Note that private IP addresses are dynamically allocated, and a subscriber normally gets different IP addresses when connecting to the network twice, so methods based on IP addresses cannot be used to connect sessions.
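The three extraction steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event dictionaries, their field names (`kind`, `ts`, `imsi`, `teid_control`, `teid_data`), and the `Session` class are assumptions made for illustration, and a real Gn trace would also need GTP parsing, Update handling, and the mapping between each peer's TEIDs.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    imsi: str
    teid_control: int
    teid_data: int
    start: float                      # time of the Create message
    end: Optional[float] = None       # time of the Delete message, if seen
    packets: list = field(default_factory=list)


def rebuild_sessions(events):
    """Rebuild per-subscriber sessions from a mixed list of event dicts."""
    by_control, by_data = {}, {}      # open sessions indexed by TEID
    per_subscriber = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["kind"] == "create":    # step (i): open a session structure
            s = Session(ev["imsi"], ev["teid_control"], ev["teid_data"], ev["ts"])
            by_control[s.teid_control] = s
            by_data[s.teid_data] = s
            per_subscriber[s.imsi].append(s)   # step (iii): group by IMSI
        elif ev["kind"] == "delete":  # step (i): close it on Delete
            s = by_control.pop(ev["teid_control"], None)
            if s is not None:
                s.end = ev["ts"]
                by_data.pop(s.teid_data, None)
        elif ev["kind"] == "data":    # step (ii): join packets via TEID Data
            s = by_data.get(ev["teid_data"])
            if s is not None:
                s.packets.append(ev)
    return dict(per_subscriber)


# Hypothetical events for one subscriber "A".
events = [
    {"kind": "create", "ts": 0.0, "imsi": "A", "teid_control": 1, "teid_data": 11},
    {"kind": "data", "ts": 1.0, "teid_data": 11, "size": 120},
    {"kind": "delete", "ts": 5.0, "teid_control": 1},
    {"kind": "create", "ts": 60.0, "imsi": "A", "teid_control": 2, "teid_data": 12},
    {"kind": "delete", "ts": 61.0, "teid_control": 2},
]
sessions = rebuild_sessions(events)
```

Because the session boundaries come from the Create and Delete messages rather than from an idle timeout, the second session above is recorded even though it carries no IP packets.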

Cross-Layer Analysis.
To understand high-frequency subscribers' behaviors in depth, we need to know what they have done, how they have performed, and when and where they have communicated. However, this user information is hidden in various layers: access time information in the session layer, transport protocol information in the flow layer, and application preference information in the application layer.
To restore a complete picture of a real user from disorganized data, we must accurately extract fragments from these perspectives, correlate them, and quantitatively analyze subscribers' behaviors.
In this paper, we first quantify the influence of the inconsistency between data volume and signaling consumption, which covers the entire spectrum of frequencies, by focusing on session-level features. We then move to the user access level to correlate frequency with access regularity. Finally, we examine the very high-frequency and extremely high-frequency subscribers to extract pieces of their behavior from application-level semantics.

Inconsistent Signaling-Data Bandwidth Consumptions
In this section, we quantify the signaling resource impact of the measured high-frequency subscribers in order to understand user behavior and the correlation between traffic volume and signaling consumption.

The Phenomenon.
We use two indicators for session clustering through most of the following analysis: (a) session activation frequency, defined as the average number of session activations per hour for each subscriber; (b) SAI (Session Activation Interval), defined as the interval between two adjacent session activation requests initiated by the same subscriber. Figure 2 plots the CDFs of sessions and subscribers over these metrics. We observed that the majority of subscribers (93%) have a session activation frequency of less than 3 but account for only roughly 70% of the total sessions, which suggests that the others (7%) activate sessions at high frequency and account for approximately 30%. Furthermore, we observed that the top 1% of the subscribers create nearly 20% of the total data sessions, which suggests that a few subscribers activate far more sessions than others. This shows a significant imbalance of network usage among subscribers, with a few subscribers hogging much of the network resources, which makes resource sharing unfair.
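As a small illustration of the two indicators, the following Python sketch computes them from one subscriber's session activation timestamps. The function names are hypothetical, and the sketch assumes that the hourly average is taken over the whole observation window; the paper does not state which denominator it uses.

```python
def session_activation_intervals(timestamps):
    """SAIs: gaps (in seconds) between adjacent session activations."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def activation_frequency(timestamps, observation_hours):
    """Average number of session activations per hour (assumed denominator)."""
    return len(timestamps) / observation_hours


# Hypothetical subscriber: four activations, 5 minutes apart, within one hour.
activations = [0, 300, 600, 900]
sais = session_activation_intervals(activations)
freq = activation_frequency(activations, observation_hours=1)
```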

Quantitative Analysis.
Given a subscriber set "U", we use two metrics to profile its characteristics: (a) traffic volume (labeled V), defined as the number of bytes above the transport layer consumed by all subscribers in the set; (b) signaling overhead (labeled S), defined as the total number of signaling messages involved in creating and deleting sessions by all subscribers in the set. The signaling overhead consists of two parts. The first part is the overhead for radio resource control, for which the number of signaling messages is estimated from the signaling exchanges caused by RRC state transitions [3]. The second part is the overhead of PDP context control, for which the number of messages is counted directly from our dataset; this part was not taken into consideration by previous work [3,4].
As shown in Table 1, we focus on five sets: $U_0$ contains all subscribers in the entire dataset, and $U_1$ to $U_4$ correspond to the subscribers whose session activation frequency is larger than 5, 10, 15, and 20, respectively. For a subscriber set $U_i$, $V(U_i)$ (or $S(U_i)$) represents the total traffic volume (or signaling overhead) consumed by these subscribers, as defined above.
For each set $U_i$, we first calculated the subset $P_i$ of subscribers in $U_i$ that were detected in the real trace as having periodic session activation behaviors. We then calculated the ratio of the traffic volume of $U_i$ to that of $U_0$ and the ratio of the traffic volume of $P_i$ to that of $U_0$ as
$$\Delta_V(U_i) = \frac{V(U_i)}{V(U_0)} \quad \text{and} \quad \Delta_V(P_i) = \frac{V(P_i)}{V(U_0)}. \tag{1}$$
The ratio of the signaling overhead brought by sessions of $U_i$ to that brought by sessions of $U_0$, and the ratio of the signaling overhead brought by sessions of $P_i$ to that brought by sessions of $U_0$, are calculated as
$$\Delta_S(U_i) = \frac{S(U_i)}{S(U_0)} \quad \text{and} \quad \Delta_S(P_i) = \frac{S(P_i)}{S(U_0)}. \tag{2}$$
We observed that as the session activation frequency grows, the traffic volume of high-frequency subscribers accounts for a smaller proportion of the total. Removing these subscribers, however, would reduce the signaling overhead considerably. This trend is more obvious for periodically activated sessions. Clearly, there is a tremendous disparity between the traffic volume and the resource consumption of highly frequent or periodical session activations, which indicates that such activations are extremely resource-inefficient. For example, in $U_2$ high-frequency subscriber sessions are responsible for only 0.55% of the traffic volume, but their signaling overhead impact ($\Delta_S$) is 8 times higher. For periodic sessions in $U_2$, the impact is nearly 20 times higher.
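To make the two kinds of ratio concrete, here is a minimal Python sketch that computes the traffic-volume share and the signaling share of a subscriber subset relative to all subscribers. The per-subscriber totals are invented toy numbers, and the helper name `delta` and the record layout are assumptions, not part of the paper.

```python
def delta(records, subset, key):
    """Share of `key` ("bytes" or "signaling") contributed by `subset`,
    relative to the total over all subscribers in `records`."""
    total = sum(r[key] for r in records.values())
    part = sum(records[u][key] for u in subset)
    return part / total


# Toy per-subscriber totals (hypothetical values).
records = {
    "u1": {"bytes": 9000, "signaling": 40},
    "u2": {"bytes": 900, "signaling": 40},
    "u3": {"bytes": 100, "signaling": 120},   # frequent, tiny sessions
}
high_freq = {"u3"}

dv = delta(records, high_freq, "bytes")       # volume share of the subset
ds = delta(records, high_freq, "signaling")   # signaling share of the subset
disparity = ds / dv                           # how many times larger
```

In this toy case the subset carries 1% of the bytes but 60% of the signaling messages, which is the kind of disparity the ratios are meant to expose.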
To quantify the disparity between the traffic volume and the resource consumption of high-frequency subscribers, we introduce the metric "payload density". For each subscriber, we take the average payload size per session and the number of signaling messages per session, and we define the payload density as the former divided by the latter. Payload density is essentially a measure of the effective data transferred per signaling message. Figure 3 plots the normalized payload density distributions of subscribers with different session activation frequencies. We use the maximum payload density as the basis for normalization; that is, the normalized payload density of a subscriber is that subscriber's payload density divided by the maximum payload density. We observed that some active subscribers have an extremely high session activation frequency but a small payload density. The session activation frequency is negatively correlated with the payload density.
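The payload density and its normalization can be sketched in a few lines of Python. The sketch assumes each session is summarized as a `(payload_bytes, signaling_messages)` pair and interprets "signaling messages per session" as a per-session average; both are assumptions, as are the function names and sample values.

```python
def payload_density(sessions):
    """Average payload per session divided by average signaling messages
    per session, for one subscriber's (payload_bytes, signaling_msgs) pairs."""
    n = len(sessions)
    avg_payload = sum(p for p, _ in sessions) / n
    avg_msgs = sum(m for _, m in sessions) / n
    return avg_payload / avg_msgs


def normalize(densities):
    """Divide every subscriber's density by the maximum density."""
    peak = max(densities.values())
    return {u: d / peak for u, d in densities.items()}


# Hypothetical subscribers: u1 moves real data, u2 mostly signals.
densities = {
    "u1": payload_density([(4000, 8), (6000, 12)]),   # 5000 / 10 = 500
    "u2": payload_density([(40, 8), (60, 8)]),        # 50 / 8 = 6.25
}
normalized = normalize(densities)
```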
From the operator's perspective, charging is based only on the traffic that subscribers generate, so the preferred situation is low signaling cost and high traffic volume; those active subscribers, however, generate little traffic and cause significant signaling load. From the perspective of other subscribers, those subscribers consume more resources to generate the same amount of data traffic, which is a significant unfairness in resource consumption. Our estimation method has some limitations because it neglects the signaling resources consumed during the session holding time. For example, RRC state transitions may still occur during the session holding time, each bringing a few round-trips of signaling messages. However, our method does take into account the signaling resources consumed in RRC connection setup and RRC connection release. A recent cellular network measurement study [3] has demonstrated that the signaling messages of RRC connection setup and release account for more than 60% of the total.
In summary, some high-frequency users produce low data volume but cause heavy signaling load, which makes resource allocation unfair and causes losses for network operators. Moreover, this phenomenon becomes more pronounced as the activation frequency increases.

Positive Correlation between Periodicity and Frequency
In this section, we focus on the session activation characteristics of the users described in the previous section.

Detecting Periodicity in Session Activations.
To detect the regularity of subscribers' session activation, we use the DBSCAN algorithm [5] and a novel classification method to analyze subscribers' data sessions from a time perspective. Define the set of high-frequency subscribers as $U = \{u_1, u_2, \ldots, u_n\}$, where each element of $U$ represents a high-frequency subscriber. Here we use $u_i$ as an example to illustrate our method.
(i) Reconstruct all data sessions initiated by $u_i$ in chronological order and model them as a sequence of sessions, as shown in Figure 4.
(ii) Let $T_i = \{t_1, t_2, \ldots, t_m\}$ be the SAIs of $u_i$. Cluster them and assign letters of the lowercase alphabet to the clusters in order. This gives a cluster symbol series $C_i = c_1 \ldots c_k$, where $c_j$ represents the symbol of the $j$th cluster of $u_i$; i.e., $c_1 = a$ and $c_2 = b$.
(iii) Let $Q_i = \{S(t_1), S(t_2), \ldots, S(t_m)\}$, where the mapping $S$ takes $t_j$ to the symbol of the cluster to which $t_j$ belongs. Then extract all subsequences of $Q_i$ to find the subscriber's self-patterns. For example, if a subsequence is "dbcb", we retrieve all occurrences of "dbcb" in $Q_i$ and calculate its ratio. If the ratio exceeds the sequence fraction threshold (here set to 65%), the self-pattern is "dbcb"; otherwise we consider the other subsequences.
(iv) To normalize the self-activation pattern, we map the cluster symbols of a particular pattern to the capital alphabet in order. For example, the two self-patterns "abca" and "dbcd" are both mapped to "ABCA", which indicates that they have the same normalized regularity.
We set the sequence fraction threshold for the following reason. For each user, multiple applications can access the cellular network at any time, so a data session may contain the data transfers of several applications, and each application may trigger a session activation if no session exists. In this case, the access behavior of one application, such as its periodicity, may be hidden by that of others and become harder to detect, as shown in Figure 5.
Using our periodicity detection method, we tag two session types: periodical sessions, which follow a specific pattern, and normal sessions, whose behaviors are disorganized. Each user may have one or both session types. In the following sections, we sometimes analyze the two types separately.

The Phenomenon.
Using the periodicity detection method from the previous section, we obtain tens of session activation patterns. We choose those that account for the largest proportions and illustrate them in Figure 6. We use "Pattern X" to denote the regularity of session activations extracted by our periodicity detection method. A subscriber's session activation behavior follows Pattern A if more than 50% of the SAIs are in accordance with one periodicity (T in Figure 5), and it follows Pattern AB if more than 50% of the SAIs are in accordance with one or two periodicities (2T in Figure 5), as shown in (3). "Others" means that there are more than two periodicities in these sessions. Some other subscribers follow no pattern and tend to activate sessions frequently but irregularly, which is denoted Pattern N.
We observed a positive correlation between the frequency of subscribers' session activation and the periodicity of the SAI. Higher frequency corresponds to a lower percentage of none-pattern traffic, which means that the more frequently subscribers activate their sessions, the more likely they are to activate sessions periodically. More than 40% of the subscribers who activate at least 10 sessions per hour follow Pattern A, and among those who activate more than 20 sessions per hour, almost 70% are Pattern A subscribers.
Figure 7 plots the CDF of Pattern A subscribers over SAI. The key observation is that several particular values dominate the intervals. We notice multiple small clusters, such as at most 3 minutes and 4.5 to 6 minutes. Such values are likely to be set by mobile application developers in an ad hoc manner.
Comparing the two curves, we find that periodical sessions have a smaller mean SAI.

Explanation (Application-Level Semantics).
To find a reasonable explanation, we focus on the applications that those subscribers used, e.g., Facebook. To analyze their application-level features, we consider separately the Pattern A subscribers, who trigger periodic session activations, and the Pattern N subscribers, who generate nonperiodic sessions.
To carry out a deep investigation, we use not only the categories of network traffic characterized by port number but also application-layer headers to distinguish the traffic of different applications [2].

Identifying Subscriber Behavior.
A data flow, a specific data service between a mobile device and a fixed server, is typically identified by a five-tuple: source and destination addresses, source and destination ports, and protocol. For example, a TCP flow is represented by (*, *, *, *, TCP).
Each session contains multiple (≥ 0) flows, and the flows within a session share the same local address (or protocol) but have different server addresses. In this section, we identify each subscriber's application behavior as follows. Assume that each subscriber $u$ has $n$ sessions and is assigned a session vector $S_u = \{s_1, s_2, \ldots, s_n\}$, and that each session $s_j$ has $m$ flows. The application types of the $m$ flows are stored in a vector $A(s_j) = (a_1, \ldots, a_m)$ ordered by flow index. Subscriber $u$ then has many application type vectors, and each vector may have duplicate entries. By merging the vectors and removing duplicates, the $K$ application types that $u$ used are stored in a vector $A_u = \{a_1, a_2, \ldots, a_K\}$.
Here we define $p_{u,k}$ as the fraction of the $k$th application type for user $u$, and we write $P_u$ for the vector of these fractions. When $p_{u,k}$ is much larger than the other fractions and $A_u$ is small, the main application type of $u$ is $a_k$. For example, if $u$ has three application types, the last of which is game, and $P_u = \{0.05, 0.02, 0.93\}$, then its type is "game". By the same token, a larger $A_u$ and lower fractions make a subscriber a multiple-application type. For example, if $u$ has five application types and $P_u = \{0.2, 0.2, 0.2, 0.2, 0.2\}$, then its type is "multiple-application". For reference, Table 2 lists important definitions used in this paper.
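The behavior identification can be illustrated with a short Python sketch. It assumes that the fraction of an application type is its share of the subscriber's flows, and it uses a hypothetical dominance threshold of 0.9 to decide between a single main type and "multiple-application"; neither choice is stated in the paper, and the application names other than "game" are invented.

```python
from collections import Counter


def application_fractions(sessions):
    """sessions: one list of flow application types per session.
    Returns the share of flows per application type (assumed definition)."""
    counts = Counter(app for flows in sessions for app in flows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {app: c / total for app, c in counts.items()}


def main_type(fractions, dominance=0.9):
    """Label a subscriber by the dominant application type, if any.
    The dominance threshold is a hypothetical parameter."""
    if not fractions:
        return "unknown"
    app, share = max(fractions.items(), key=lambda kv: kv[1])
    return app if share >= dominance else "multiple-application"


# Flow-level example: three sessions, one of them with two flows.
fractions = application_fractions([["game", "game"], ["game"], ["web"]])

# The two worked examples from the text, with invented names for the
# application types that are not named there.
single = main_type({"web": 0.05, "mail": 0.02, "game": 0.93})
multi = main_type({"a": 0.2, "b": 0.2, "c": 0.2, "d": 0.2, "e": 0.2})
```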

Application-Level Origin.
As shown in Figure 8, we observed that most sessions have at most one flow, which means that, to some extent, a flow can represent a session. Moreover, most periodical high-frequency users tend to activate zero-flow or one-flow sessions. By calculating each user's application behavior with the method above, we identify the behaviors of more than 90% of Pattern A users. Figure 9 plots the distribution of behaviors after removing unknown types.
We observed that most Pattern A users are of the SingleApp type, which means that their features are mostly influenced by one specific application. By correlating the information of multiple users of the same application, we list the features of some top behaviors in Table 4.

Abnormal Behaviors
The Phenomenon.
We find a certain correlation between subscribers' periodic session activations and abnormal behaviors (listed in Table 3). An overview of these applications and behaviors is given in Figure 10. We observed that Pattern A subscribers are significantly more likely to exhibit abnormal behaviors (45.38% in total) than others (9.95% in total).

Periodical PDP Context Activation.
Some subscribers periodically initiate a PDP context at a fixed time interval and then delete the PDP context soon afterwards. Figure 11 (red line) plots the CDF over SAI of the Pattern A subscribers who perform this behavior. We observed that the SAI values are extremely fixed: one cluster ranges from 0 min to 1 min, and another lies around 5 min.
The absence of data-plane transmission (no IP packets), the short session duration, and the fixed Session Activation Interval are enough to mark this behavior as suspicious. It only causes the network to exchange signaling messages continuously, wasting a large amount of signaling resources without any actual utility (effective data transmission). In addition, because accounting is based on data traffic, these subscribers are not charged any fees in this case.
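The three indicators named above (no IP packets, short duration, fixed SAI) suggest a simple screening heuristic, sketched below in Python. This is only an illustration: the session tuple layout, the function name, and both thresholds (`max_duration` in seconds and `max_cv`, the allowed coefficient of variation of the SAIs) are assumptions, not values from the paper.

```python
from statistics import mean, pstdev


def looks_like_empty_periodic_activation(sessions, max_duration=10.0, max_cv=0.1):
    """sessions: time-ordered (start_ts, end_ts, n_ip_packets) tuples for
    one subscriber. Returns True if all three indicators hold."""
    if len(sessions) < 3:
        return False                      # too few sessions to judge regularity
    if any(n > 0 for _, _, n in sessions):
        return False                      # some data-plane traffic exists
    if any(end - start > max_duration for start, end, _ in sessions):
        return False                      # a session was held for a long time
    starts = [start for start, _, _ in sessions]
    sais = [b - a for a, b in zip(starts, starts[1:])]
    avg = mean(sais)
    if avg <= 0:
        return False
    return pstdev(sais) / avg <= max_cv   # nearly constant interval


# Hypothetical subscriber activating an empty session every 5 minutes.
suspicious = [(0, 2, 0), (300, 302, 0), (600, 603, 0), (900, 902, 0)]
flagged = looks_like_empty_periodic_activation(suspicious)

# The same timing, but the sessions carry packets, so it is not flagged.
normal = [(0, 2, 5), (300, 302, 3), (600, 603, 8), (900, 902, 4)]
not_flagged = looks_like_empty_periodic_activation(normal)
```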

Network-Side Automatic Termination.
Some compromised mobile devices repeatedly attempt to access certain premium or special websites. As shown in Figure 12, we observe that these data flows are successfully recognized by middleboxes between the GGSN and the Internet, such as the firewall and the OCS (Online Charging System), where all traffic is composed of IP packets. Because the firewall detects these accesses as abnormal for lacking the necessary permissions, the connections are deleted by the GGSN automatically. Although these methods protect the target hosts and the core network from invalid access, the overconsumption of signaling resources cannot be avoided. Since the automatic termination looks like a technical error to the subscribers, abnormal subscribers continuously reactivate the PDP context and try to access again, while the GGSN deletes these PDP contexts immediately. This back-and-forth consumes a large amount of signaling resources.
Figure 11 (black line) plots the CDF over SAI of the Pattern A subscribers who perform this behavior. We observed that most of this behavior recurs at intervals of less than 1 minute. Its short period, high network resource consumption, and simple operation make this behavior harmful to the network and degrade other subscribers' experience.

Periodical UDP Packets.
As shown in Figure 11 (green line), after creating a PDP session, the subscriber sends UDP packets to the target host at a certain period (from 1 s to more than 30 s, or from 5 min to 6 min), and these sessions use self-defined ports such as 8888 and 12345. Sessions of this type often have a relatively long duration; however, the payload size and content of their packets are mostly fixed and smaller than a few dozen bytes. Furthermore, we find some abnormal UDP broadcast behavior with small, identical content and a high sending frequency. As predicted in [6], periodical UDP packets may provide a way for attackers to drain the battery of subscribers' mobile devices by exploiting PDP context retention and the paging channel.

Privacy Information Uploading.
Some subscribers continuously initiate PDP contexts and upload privacy information [7], such as the IMEI and phone numbers. After the upload finishes, they disconnect and, after a while (the SAI distribution is plotted in Figure 11 (blue line)), repeat these operations. These behaviors may threaten privacy and at the same time waste signaling resources.
By identifying and clustering URLs, we model several typical requests, as shown in Table 5.
Among these three model types, we observed that most servers use the first type, such as "S2"; some UDP-based servers use the third; and only a few servers obtain users' privacy information through the second, as in the real request "http://S4:8080/?HOST=S2&R=/bs/&Phone-Type=**&PhoneNumber=%2B861345&Version=1.6".
As studied in the previous section, most sessions have just one flow and most Pattern A users are of the "SingleApp" type. In this section, we therefore focus on the flow level, since a flow represents a one-time data transfer between a mobile device and a server.
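As an illustration of how privacy fields can be spotted in such a request, the Python sketch below parses the query string of the example URL with the standard library. The set of parameter names treated as privacy-related (`PRIVACY_KEYS`) is a hypothetical choice for this example, not the list used in the paper.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names treated as privacy-related (lowercase).
PRIVACY_KEYS = {"phonenumber", "imei", "imsi"}


def privacy_fields(url):
    """Return the query parameters of `url` whose names look privacy-related."""
    query = parse_qs(urlsplit(url).query)
    return {k: v[0] for k, v in query.items() if k.lower() in PRIVACY_KEYS}


url = ("http://S4:8080/?HOST=S2&R=/bs/&Phone-Type=**"
       "&PhoneNumber=%2B861345&Version=1.6")
found = privacy_fields(url)   # {'PhoneNumber': '+861345'}
```

Note that `parse_qs` percent-decodes the value, so the encoded `%2B` prefix is recovered as the `+` of the phone number.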
Figures 13 and 14 plot the features of the servers that obtain users' personal information (we use "S1-22" to denote these servers).
We observed that dozens of servers obtain users' personal information, and most servers obtain just one or two privacy types. However, we also found that a few servers, such as Server2 and Server4, simply request the information multiple times in each flow and do nothing else, resulting in sustained privacy loss.

Real-Life Implications of the Findings
It is necessary to reduce or manage the impact of high-frequency subscribers. Based on our study of a real trace from an operational cellular network, we found that high-frequency subscribers can be extremely signaling resource-inefficient, as they activate data sessions with high frequency but transfer little data in each session. As a result, cellular network operators need to monitor subscribers' session activation behaviors. We believe that this kind of solution will be critical for cellular service providers to improve the performance of resource allocation. Subscribers with a high session activation frequency and a low payload density deserve close attention.

Related Work
Servicing as the interface of network access, cellular network systems have experienced lots of versions to balance datavoice services and improve performance.However, the nature (or bottleneck) of these systems themselves has not changed, which means subscribers should consume network signaling resources for new network connections before access network, resulting in signaling overhead and even signaling storm [8].
In recent years, the characteristics of cellular data traffic and its impact on capacity planning, signaling cost and data transmission have attracted attention in the industrial circle.Previous studies can be classified approximately into several categories.
Network Architecture and Resource Management. RRC, which manages the handset radio interface, is the key coupling factor bridging application traffic patterns and lower-layer protocol behaviors [9]. Previous studies [10,11] examined the RRC state machine and its interaction with cellular traffic. Other efforts [4,12–17] measured various network performance metrics, investigating aspects such as crowded events and queue delay. In addition, Xu et al. characterized 3G data network infrastructure and found that the routing of cellular data traffic was quite restricted [18]. Wang et al. unveiled cellular carriers' NAT and firewall policies [19]. In this paper, by characterizing user behaviors, we took advantage of existing RRC state promotion, evaluated the impact of high-frequency access behavior on signaling allocation and release, and found that improper deployment and configuration of middleboxes generates unwanted signaling traffic.
Mobile Behavior. Traffic and application characteristics have recently received much attention from the research community, including traffic dynamics [20,21], geospatial dynamics [22], behavior patterns [23], mobility [24], and application usage patterns [25]. There are also several subscriber behavior studies based on deploying a custom logger on smartphones [26,27]. In addition, by analyzing the periodicity of data transfer at the IP packet level, [4] studied periodic data transfer and its impact on resource consumption. In contrast, we analyzed the periodicity of session activation at the session level and performed the first multi-angle investigation of high-frequency subscribers by rebuilding sessions, identifying session activation patterns, and analyzing application behavior in anonymized traces of a city-wide operational 3G network.
Unwanted Traffic, Detection, and Prevention. Complex and heavy signaling procedures render Internet-connected cellular networks vulnerable to a variety of abnormal data traffic [28–30]. Many studies have focused on various types of abnormal data, including viruses [31,32], spam [33], DoS attacks [34–36], phishing [37], and charging [38,39]. In response, significant work has been undertaken to detect [40,41], model [42,43], and defend against [41] such problems. The goal of these studies is to design or model abnormal traffic that prevents legitimate use of data services [28,30]. Unfortunately, few of these solutions have been widely deployed. In this paper, by collecting real-network data, modeling, and extracting forensic evidence, we observe that some behaviors of high-frequency subscribers can lead to unwanted heavy signaling overloads. In addition, prior efforts [4,13,44,45] have shown that unwanted traffic can cause large-scale wastage of logical resources in cellular networks through various causes, such as crowded events and periodic transfer. In this paper, we propose "high-frequency" traffic as a novel traffic type and use it to verify and enrich this conclusion.

Conclusion
In this paper, we comprehensively characterized the impact and application-level origin of high-frequency subscribers in an operational cellular network in China. These subscribers consume much more signaling resources but achieve lower utilization. We reached this conclusion through the following contributions:
(i) Inconsistent signaling-data bandwidth consumption: high-frequency subscribers (at least 5 session activations per hour) generate 3.81% of the data traffic but consume more than 19.46% of the total signaling resources, causing unfairness in charging.
(ii) Positive correlation between periodicity and frequency. Higher frequency corresponds to a lower percentage of none-pattern traffic.
(iii) Periodic subscribers tend to use just one behavior or application, and a number of them exhibit abnormal behaviors, such as periodic PDP context activation, network-side automatic termination, and privacy information uploading; the payload density of these applications is extremely low.
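The imbalance in contribution (i) can be made concrete with a short calculation. The sketch below uses only the shares reported in this paper (1.19% of subscribers, taken from the abstract, 3.81% of data traffic, and 19.46% of signaling resources). The helper name `share_ratio` is hypothetical, and 19.46% is treated as a point value even though the text reports it as a lower bound.

```python
# Shares reported in the paper for high-frequency subscribers (percent).
SUBSCRIBER_SHARE = 1.19   # share of all subscribers (from the abstract)
DATA_SHARE = 3.81         # share of total data traffic
SIGNALING_SHARE = 19.46   # share of total signaling resources


def share_ratio(numerator_pct, denominator_pct):
    """Ratio between two percentage shares of the same population."""
    if denominator_pct <= 0:
        raise ValueError("denominator share must be positive")
    return numerator_pct / denominator_pct


# Signaling consumed relative to the data actually carried.
signaling_per_data = share_ratio(SIGNALING_SHARE, DATA_SHARE)
# Signaling consumed relative to the group's size.
signaling_per_subscriber = share_ratio(SIGNALING_SHARE, SUBSCRIBER_SHARE)
# Data carried relative to the group's size.
data_per_subscriber = share_ratio(DATA_SHARE, SUBSCRIBER_SHARE)

print(f"signaling share / data share:       {signaling_per_data:.2f}")
print(f"signaling share / subscriber share: {signaling_per_subscriber:.2f}")
print(f"data share / subscriber share:      {data_per_subscriber:.2f}")
```

Under these figures, the group's signaling share is about 5.1 times its data share and about 16.4 times its share of the subscriber population, while its data share is only about 3.2 times its population share.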

Future Work
There are several directions for further research. First, we did not consider the impact of different kinds of applications. Second, how to reduce or manage the impact of high-frequency subscribers remains an open question. We believe our findings on the session patterns of high-frequency subscribers have direct implications for solutions to some of these issues.

Figure 2: CDF of subscribers and sessions over session activation frequency.

Figure 7: CDF of Pattern A subscribers over SAI.

Figure 8: CDF of Sessions over the number of flows per session.

Figure 9: Distribution of Pattern A users over behavior type.

Figure 10: Distribution of high-frequency subscribers over behavior type.

Figure 11: CDF of Pattern A subscribers over SAI.

Figure 12: How OCS notifies GGSN to delete a PDP context.

Figure 13: Distribution of Pattern A PIU users' flows over server.

Figure 14: Analysis of Pattern A PIU users over request times per flow and personal key count.

Table 1: Impact of high-frequency subscriber sessions.

Table 2: Important definitions about user behavior.

Table 4: Features of application behaviors.

Table 5: Typical privacy information obtain model.