Data Traffic Offload from Mobile to Wi-Fi Networks: Behavioural Patterns of Smartphone Users

This paper presents a model for defining the behavioural patterns of smartphone users when offloading data from mobile to Wi-Fi networks. The model was generated through analysis of individual characteristics of 298 smartphone users, based on data collected via online survey as well as the amount of data offloaded from mobile to Wi-Fi networks as measured by an application integrated into the smartphone. Users were segmented into categories based on data volume offloaded from mobile to Wi-Fi networks, and numerous user characteristics were explored to develop a model capable of predicting the probability that a user with given characteristics will fall into a given category of data offloading. This model may prove useful for analysing smartphone user behaviour when offloading data.


Introduction
The number of smartphone subscription connections at the global level has reached 3 billion, and in the last five years, data traffic has increased more than 40-fold [1][2][3][4]. According to a Cisco report [5], smartphone-based data traffic is expected to exceed 80% of total data traffic generated on mobile networks by 2020. The increased use of smartphones, adaptation of price plans, and user demands for entertainment services and video content are accelerating the increase in data traffic offloading, and this growth is expected to continue [6,7]. Data traffic offloading refers to carrying traffic over a complementary access network technology (e.g., Wi-Fi, WiMAX) instead of over mobile networks [2,8,9]. This paper considers only offloading involving Wi-Fi networks [10], which account for a substantial proportion of data traffic offloading. More than 180 million Wi-Fi access points are spread around the world [11]. Wi-Fi technology is popular because it does not require licensed frequency spectrum and allows users to move within the local coverage area [12]. The quantity of data traffic that is currently being offloaded to Wi-Fi networks, primarily because of Wi-Fi access points in households, already exceeds the total amount of data traffic in mobile networks, and this proportion is expected to keep growing [6,13]. A Cisco report [5] concludes that the amount of smartphone-based data traffic offloading to Wi-Fi networks will be 56% of total data traffic by the year 2020, and studies have claimed that 65% [14] or more than 70% [15] of smartphone data traffic is being offloaded to Wi-Fi networks. One study of smartphone use found that the ratio of data traffic generated using Wi-Fi and mobile networks was greater than 4:1 [16]. At the global level, data traffic on mobile devices occurs via Wi-Fi networks and mobile networks in the ratio 80%/20% [17]. Smartphones and phablet devices will continue to dominate the generation of mobile data traffic [5].
Mobile network operators are aware that the use of Wi-Fi networks is a key element in smartphone users' behaviour. Measuring such use is challenging in part because of the fragmentation of Wi-Fi networks. As a result, mobile network operators cannot gain a detailed view of smartphone connections with Wi-Fi access points or of data traffic offloading.
Using applications integrated into the smartphone itself offers the possibility of measuring data traffic offloaded onto Wi-Fi networks, which would provide, for the first time, the opportunity for mobile network operators to measure this aspect of user behaviour. This could lead to insights that could guide the design of strategic actions, optimisation of operations (e.g., personalisation of services and price plans), market segmentation, user targeting, and planning of telecommunication networks based on potential implementation of complementary Wi-Fi access networks.

Overview of Previous Studies
Many authors have analysed the use of mobile phones within various contexts, but few have examined the reasons for the substantial differences among users in the amount of smartphone data traffic offloaded [20][21][22].
Several studies have considered the generation of data traffic exclusively involving mobile networks, and this literature has established connections among user behaviour, smartphones, and particular information-communication services. Alwahaishi and Snášel [23] analysed user behaviour, characteristics, and intentions regarding the use of mobile Internet. Falaki et al. [24] analysed in detail patterns and volumes of mobile data traffic generated by smartphones during the use of various information-communication services and applications, such as Internet searching, instant messaging, and use of navigational maps. Baghel et al. [25] analysed the volume of data traffic generated by various smartphone apps. Ghose and Han [26] surveyed a large sample of users to derive a detailed view of their behaviour during the use of information-communication services and applications. Similarly, He et al. [27] analysed patterns of data traffic generated with various smartphones, in this case with the goal of achieving desired performance levels in mobile networks. Yang et al. [28] established a link among the volume of data traffic generated, smartphone-based apps, and user mobility. Binde and Fuksa [29] examined the development of mobile communication networks and information-communication services as factors affecting mobile Internet use in Latvia.
Other studies demonstrate possibilities for measuring data traffic generation on mobile phones or smartphones.
One study [20] illustrated various approaches to such measurement, each with its advantages and drawbacks. Another study [30] laid out in detail research methods for such measurement and described their applicability to given research objectives. That work explored the concept of measurement points inside telecommunication networks as well as the possibility of predicting smartphone user behaviour and use of information-communication services.
Soikkeli [31] analysed periods of mobile phone use based on measurements collected on the mobile devices themselves. The data were collected by traffic-monitoring software previously installed on the devices of test users, and they captured the context in which the mobile phones were used, contributing to the understanding of user behaviour. Another study [32] also collected data on the use of mobile services using an application on the mobile device itself, which were complemented with data collected from a user survey. An app-based study analysed the distribution of smartphone-based data traffic on mobile and Wi-Fi networks across different times of day and on different days of the week [33], providing insights into working habits and behavioural patterns of smartphone users.
Few studies have examined how smartphone users access Wi-Fi networks, how they would want to use them in the future, and what makes them prefer to connect their device to Wi-Fi rather than to the mobile network [1]. One study [34] used an application for iOS and Android operating systems to analyse how the users of smartphones access networks (3G, 4G, and Wi-Fi access). Its results suggest that users select the network based on their access to newer communication technologies, the data transfer speeds available to them, and their economic circumstances. Smartphone user behaviour changes depending on the access network selected [16], and mobile network subscribers often choose to access Wi-Fi rather than a 3G or 4G mobile network [15]. This has given rise to the offloading "blackhole": when mobile subscribers switch from their mobile network to Wi-Fi, most operators can no longer track their activities, their data traffic amount, the quality of their user experience, or the security of their Internet connection [35]. This blackhole represents a large proportion of mobile subscriber behaviour: one study [34] indicates that users increase their data traffic when they move from a mobile to a Wi-Fi network, and more than 40% of users connect to more than one Wi-Fi access point in a single day. This data traffic offloading creates several challenges for telecommunication operators, for which some authors have already proposed solutions [8]. Bakhit et al. [36] concluded that offloading to Wi-Fi networks is an optimal offload option since it takes advantage of Wi-Fi availability and throughput capacity. Indeed, the data traffic offload from mobile to Wi-Fi networks has been stimulated by "free" availability of Wi-Fi access at homes, simplicity of automatic selection of the Wi-Fi network by smartphones, quality of experience (QoE) on mobile networks, and price structure of data services of mobile networks [16]. Hetting [37] states that the majority of users consider Internet access via Wi-Fi networks as less expensive, simpler, faster, and more reliable than access via mobile networks.
One approach to researching the blackhole of data traffic offloading is to use applications installed on the mobile device itself to measure the traffic (e.g., [38]). Taylor et al. [1] analysed user behaviour, smartphone use, and interest in accessing the Internet via Wi-Fi networks. Kaisar [39] compared data traffic generated via mobile or Wi-Fi networks with an application that collected data on user characteristics such as network, location, and time. A potentially more informative approach is to supplement measurements taken on users' mobile devices themselves with data from surveys of the same users [32,40].
Previous studies have documented the diversity of Internet access and of data traffic generation and offloading when using smartphones with mobile and Wi-Fi networks. Some studies have also examined user characteristics with respect to data traffic offload. Nevertheless, how user behavioural patterns relate to amounts of data traffic offload is poorly understood. The present study addresses this question by segmenting users according to their level of data traffic offload. This work relied on a mixture of data from measurements taken by an application integrated into the user's mobile device, as well as data from a user survey.

Research Methodology
The methodology for defining behavioural patterns of smartphone users according to data traffic offload from mobile to Wi-Fi networks is presented in Figure 1.

Identification of Factors and Determination of Preferences.
The first phase of the model development methodology presented in Figure 1 includes the following:
(a) Identification of relevant factors that affect the generation of smartphone data traffic.
(b) Determination of user preferences for data traffic offload from mobile networks to Wi-Fi networks.
Relevant factors and user preferences identified in the first phase formed the basis for distinguishing user characteristics directly related to the generation and offload of data traffic. Husnjak et al. [18] identified several factors affecting the generation of smartphone data traffic, including communication technology of the mobile network, size and resolution of the device screen, device operating system, and price plan and options. Those authors then evaluated each factor using an appropriate mathematical method, and those results were incorporated into the present work together with the user preferences that Husnjak et al. [19] determined to influence smartphone data traffic offload from mobile to Wi-Fi networks; those preferences include data offload speed of the Wi-Fi network, availability of the Wi-Fi network, and price structure of access to that network. The factors and user preferences identified in these two previous studies served as the basis for preparing the online survey in the present work (see Section 3.2).

Measurement, Collection, and Processing of Data.
The second phase of the methodology is collection of data about smartphone data traffic offloading and relevant characteristics of the smartphone users. Data were collected from two sources:
(1) An application integrated into user smartphones provided data about data traffic generated on mobile and Wi-Fi networks.
(2) An online survey provided data on relevant characteristics of the smartphone users.
Figure 2 represents these two flows of data collection, together with the subsequent phase of data anonymisation and combination. Data were collected during three weeks from 298 participants who responded to the online survey and whose smartphones provided data on data traffic offloading. Data were collected from the smartphones using an embedded (integrated) application compatible with all Android operating systems that provided access to historical information on the amounts of data traffic generated. Both sets of data were collected from most participants at a single interview (Figure 2), when they were first asked to fill out a paper-based form about data traffic based on the data provided by the integrated application. Then, participants received a link to an online survey, which they completed on their smartphone. Some participants filled in the survey later using a computer or another electronic device. The integrated application provides an overview of the amount of data traffic generated via mobile and Wi-Fi networks (Figure 3), as well as an overview of the data traffic generated by specific applications. Although the graphical interface differed depending on the Android version, the functionalities were the same. Data traffic was measured over a 28-day period.

Wireless Communications and Mobile Computing
Questions on the survey, which were adapted from our previous work [18,19], took into account various contexts and ecosystems of smartphone use, as well as user behavioural patterns (Table 1). The survey included questions such as "What is the size of the screen on your smartphone?," "How much data traffic is included in your mobile plan?," and "Do you have access to a Wi-Fi network at home?"

Combining and Anonymisation of Data.
The anonymity of data was guaranteed by a unique identification key issued to each user. This identifier was on the form filled out by the user based on the data measured by the integrated application. It was also the identifier linked to the online survey filled out by the user (Figure 2).
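The combination step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`user_key`, `wifi_gb`, `mobile_gb`, `wifi_at_home`) and the sample values are hypothetical and do not come from the study; the only element taken from the text is that both data sources carry the same anonymous identification key.

```python
def combine_records(measurements, survey_responses):
    """Join measurement forms and survey answers on the shared anonymous key."""
    # Index the survey answers by the anonymous key for direct lookup.
    survey_by_key = {row["user_key"]: row for row in survey_responses}
    combined = []
    for measurement in measurements:
        answers = survey_by_key.get(measurement["user_key"])
        if answers is None:
            continue  # no matching survey: the user cannot be included
        # Merge both sources into one record per user.
        combined.append({**answers, **measurement})
    return combined


# Hypothetical sample data, for illustration only.
measurements = [
    {"user_key": "A001", "wifi_gb": 12.5, "mobile_gb": 1.2},
    {"user_key": "A002", "wifi_gb": 40.0, "mobile_gb": 3.4},
]
survey_responses = [
    {"user_key": "A001", "wifi_at_home": 1},
    {"user_key": "A003", "wifi_at_home": 0},
]
combined = combine_records(measurements, survey_responses)
```

Only users present in both sources end up in the combined data set, which mirrors the requirement that each participant supplied both a measurement form and a survey.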

Definition of Model Variables
In this paper, the dependent variable was the categorical (ordinal) variable defined according to the amount of smartphone data traffic offloaded from mobile to Wi-Fi networks. The elements and types of variables necessary for the model development are presented in Table 2.
After defining these variables, a database was formed [41] by entering numerical variables as their actual values and categorical variables as codes. For instance, the attribute "gender" can be coded such that 1 denotes male and 2 denotes female. In the present study, the dependent variable was categorised according to the amount of data traffic offloaded. Independent variables were taken from the online survey. Both types of variables were brought together to develop a model as depicted in Table 3.
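As a minimal sketch of this coding step, the Python snippet below stores a numerical value unchanged and replaces a categorical value with a code. The gender coding (1 for male, 2 for female) is the example given in the text; the field name `wifi_gb` and the raw record layout are assumptions made for illustration.

```python
# Coding taken from the example in the text: 1 denotes male, 2 denotes female.
GENDER_CODES = {"male": 1, "female": 2}


def encode_row(raw):
    """Turn one raw record into a database row of numbers and codes."""
    return {
        "gender": GENDER_CODES[raw["gender"]],  # categorical -> code
        "wifi_gb": float(raw["wifi_gb"]),       # numerical -> value as measured
    }


row = encode_row({"gender": "female", "wifi_gb": "12.5"})
```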

Determining the Dependent Variables in the Model.
In a logit model, independent variables can be continuous or categorical, but the dependent variable is always categorical. Previous authors have pointed out that the categorisation of smartphone users according to data traffic varies from market to market and often reflects the availability and types of price plans [42]. Unfortunately, we could not define our user categories simply based on price plans or on previous studies because of the extensive differentiation of price plans in the Croatian telecommunication market in terms of included data traffic, and also because the data traffic included in a given plan relates only to traffic over the mobile network and not over Wi-Fi networks, which is precisely the focus of the present study. Given the lack of consensus among researchers about how to categorise smartphone users according to data traffic offloading, we decided to categorise users into five types (Table 4). Five is often the number of groups chosen in user segmentation research (e.g., [42]). Additionally, according to [41], the user segmentation can be defined depending on the needs of the researcher. The category boundaries for the amount of data traffic offloading (Table 4) cannot be compared with other research, since we found no study that analyses smartphone data traffic offloading from mobile to Wi-Fi networks and defines user categories.
These categories were used to classify all users in our sample (Table 5). The total data traffic offloading in the sample varied from 0 to 215 GB.
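The classification step can be illustrated with a short Python sketch. The category boundaries below are hypothetical placeholders, not the values of Table 4, which are not reproduced in the text; only the number of categories (five, coded 0 to 4 from the lowest to the highest offloaded volume) follows the paper.

```python
from bisect import bisect_left

# HYPOTHETICAL upper bounds (GB per 28 days) of the four lower categories.
# The real boundaries are those of Table 4 and must be substituted here.
HYPOTHETICAL_UPPER_BOUNDS_GB = [5.0, 15.0, 30.0, 60.0]


def offload_category(wifi_gb):
    """Return the ordinal category (0 = lowest, 4 = highest) for a volume."""
    # bisect_left treats each bound as inclusive: 5.0 GB still falls in category 0.
    return bisect_left(HYPOTHETICAL_UPPER_BOUNDS_GB, wifi_gb)


categories = [offload_category(v) for v in (0.0, 5.0, 5.1, 215.0)]
```

Four boundaries yield five categories, which matches the later observation that the number of cut values is one smaller than the number of categories.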

Determining the Independent Variables in the Model.
The possible independent variables were user characteristics derived from the online survey (Table 6).
As in linear regression, logistic regression involves generating potential models and selecting the one that best explains the given data. The model should have as few variables as possible, since increasing the number of variables increases the standard errors of parameters, rendering the model numerically unstable. One-dimensional analysis followed by multidimensional analysis of the coefficients of independent variables was used to define the best model. Independent variables with a $z$-test significance of $P > |z|$ below 0.25 in the one-dimensional analysis were retained in the multidimensional analysis. The final model contained six independent variables (Table 7).
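The screening rule of the one-dimensional analysis can be sketched as a simple filter. The variable names and p-values below are invented for illustration; only the 0.25 threshold comes from the text.

```python
def screen_variables(p_values, threshold=0.25):
    """Keep variables whose one-dimensional p-value is below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values from one-dimensional fits, for illustration only.
example_p_values = {"x1": 0.62, "x3": 0.04, "x9": 0.18, "x12": 0.25}
retained = screen_variables(example_p_values)
```

The retained variables would then enter the multidimensional analysis, where non-significant ones can still be dropped.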

Definition of Behavioural Patterns by Determining Probabilities of Event Occurrence
Logistic regression computes a score $S_j$, which is a linear function of the independent variables and the associated logistic regression coefficients for a single user $j$:

$$S_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \dots + \beta_k x_{kj}, \quad (1)$$

where $S_j$ is the score for one selected user $j$; $\beta_0, \beta_1, \dots, \beta_k$ are the values of the coefficients of the analysed independent variables; and $x_{1j}, x_{2j}, \dots, x_{kj}$ are the analysed independent variables for single user $j$. For the model developed in this paper, the score becomes

$$S_j = \beta_0 + \beta_3 x_{3j} + \beta_9 x_{9j} + \beta_{10} x_{10j} + \beta_{13} x_{13j} + \beta_{16} x_{16j} + \beta_{17} x_{17j}, \quad (2)$$

where $S_j$ is the result for one selected user $j$, and $x_{3j}, x_{9j}, x_{10j}, x_{13j}, x_{16j}, x_{17j}$ are the levels of the independent variables determined as significant in the developed model for single user $j$.
Consequently, the predicted probability of a given category of the dependent variable in ordinal logistic regression is the probability that $S_j + u_j$ lies within the interval between the estimated points of intersection (cut points) [43,44]. Here, $u_j$ represents the random error. Therefore, the predicted probability of a given category of the dependent variable for a single user is calculated as follows:

$$P(Y = 0) = P(S_j + u_j \le \mathrm{cut}_1) = \frac{1}{1 + e^{S_j - \mathrm{cut}_1}}. \quad (3)$$

This holds for the category of light users, who offloaded the smallest quantity of data traffic from mobile to Wi-Fi networks. The probability of belonging to the next higher category, in this case medium users, is

$$P(Y = 1) = P(\mathrm{cut}_1 < S_j + u_j \le \mathrm{cut}_2) = \frac{1}{1 + e^{S_j - \mathrm{cut}_2}} - \frac{1}{1 + e^{S_j - \mathrm{cut}_1}}. \quad (4)$$

In general, the probability of belonging to a certain category $i$ of the dependent variable is the difference of the cumulative probabilities $P(Y \le i)$, as follows:

$$P(Y = i) = P(Y \le i) - P(Y \le i - 1). \quad (5)$$

This equation was used to calculate the probabilities that a smartphone user with given characteristics belongs to a certain category of the dependent variable according to the quantity of data traffic offloaded.
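The calculation of category probabilities from a score and a set of cut points can be sketched in Python as follows. This is a minimal illustration assuming the standard ordered logit form, in which the cumulative probability is $P(Y \le i) = 1 / (1 + e^{S_j - \mathrm{cut}_{i+1}})$; the score and cut values used in the example are invented and are not the estimates reported in the paper.

```python
import math


def category_probabilities(score, cuts):
    """Return P(Y = i) for i = 0..len(cuts) under an ordered logit model."""
    # Cumulative probabilities P(Y <= i) for every category except the highest.
    cumulative = [1.0 / (1.0 + math.exp(score - cut)) for cut in cuts]
    cumulative.append(1.0)  # the highest category closes the distribution
    probabilities = []
    previous = 0.0
    for value in cumulative:
        # Each category probability is a difference of cumulative probabilities.
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical score and four cut points (five categories), for illustration.
probs = category_probabilities(score=0.5, cuts=[-1.0, 0.5, 2.0, 3.5])
```

The function returns one probability per category, and the values sum to 1 because the differences of cumulative probabilities telescope.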

Results and Discussion
Data Analysis.
A total of 298 participants filled out the form requesting data traffic generation based on measurements made by the integrated application on their smartphones, and they completed the online survey about user preferences. Of this total, 221 (74.16%) were men and 77 (25.84%) were women. According to the survey, 98.9% of participants use Wi-Fi networks, and 74.16% reported using Wi-Fi networks most often when accessing the Internet and when uploading or downloading data. Most participants (77.85%) reported using only Wi-Fi networks when updating applications on their smartphones, while 14.43% reported using both mobile and Wi-Fi networks to do so. An example of a 28-day history of data traffic via mobile and Wi-Fi networks for a single study participant is shown in Table 8. For the purposes of the present study, the amount of data traffic generated via Wi-Fi networks was taken to be the same as the amount of data traffic offloaded from mobile to Wi-Fi networks. This is because the main focus of this study was data offloading.
Table 8 shows a large difference in data traffic between mobile and Wi-Fi networks, which was the case for our study population overall (Figure 4). The data traffic offloaded onto Wi-Fi networks by all 298 users involved a total of 564 applications used on Wi-Fi networks (Table 9).
Analysis of individual application data in Figure 4 and Table 9 shows that YouTube was associated with the largest amount of data offloaded to Wi-Fi (1151.85 GB), followed by Facebook (450.51 GB) and supplementary apps (341.48 GB). This last category includes myriad applications used by the user, such as applications for weather forecasting, health/fitness, and web searching. The music streaming application Deezer was associated with the smallest amount of offloaded data (14.94 GB).
Analysis of application categories shows that the categories associated with the largest amounts of offloaded data were video players and editors (1182.50 GB), social (712.07 GB), and communication (348.80 GB). These amounts of offloaded data accounted for 73% of total data traffic offloaded onto Wi-Fi networks, which is consistent with previous studies [5,42]. The categories associated with the smallest proportions of data offloading were other (80.99%), social (82.34%), and communication (82.85%). These low proportions of offloaded data likely reflect the importance of these applications to our primarily millennial users for staying in touch with their social contacts and with the latest trending content. In addition, applications such as Facebook and Instagram operate continuously in the background and are accessed occasionally by the user, regardless of whether the network being used is a mobile network or a Wi-Fi network. In contrast, 97.81% of all data traffic in the entertainment category occurred via Wi-Fi offload, probably reflecting the notion that users did not consider these applications as constant needs and so chose to use them nearly exclusively when a Wi-Fi network was available.
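The proportions quoted above can be read as the share of a category's total traffic that was carried over Wi-Fi. The sketch below assumes exactly that definition (Wi-Fi volume divided by the sum of Wi-Fi and mobile volume); the input volumes are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def wifi_share_percent(wifi_gb, mobile_gb):
    """Share of total traffic carried over Wi-Fi, in percent."""
    total = wifi_gb + mobile_gb
    if total == 0:
        return 0.0  # no traffic at all: report a share of zero
    return 100.0 * wifi_gb / total


# Hypothetical volumes: 80 GB over Wi-Fi and 20 GB over the mobile network.
share = wifi_share_percent(80.0, 20.0)
```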

Model Analysis.
The model was generated with the independent variables $x_3$, $x_9$, $x_{10}$, $x_{13}$, $x_{16}$, and $x_{17}$, for which the respective coefficients ($\beta$) of ordinal logistic regression are shown in Table 10.
The coefficient $\beta$ measures the direction and magnitude of the change in the log odds that the user would fall into a higher rather than a lower category of data offloading following a 1-unit increase in the independent variable. Thus, the independent variables $x_3$, $x_{10}$, and $x_{16}$ reduced the likelihood that the user would fall into a higher data offloading category, while $x_9$, $x_{13}$, and $x_{17}$ increased this likelihood.
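The interpretation of a coefficient can be made explicit with a short derivation. It assumes the standard ordered logit cumulative form $P(Y \le i) = 1 / (1 + e^{S_j - \mathrm{cut}_{i+1}})$ and uses $\beta_k$ and $x_{kj}$ as generic symbols for one coefficient and its variable. The odds of falling above category $i$ are then

$$\frac{P(Y > i)}{P(Y \le i)} = \frac{e^{S_j - \mathrm{cut}_{i+1}} / (1 + e^{S_j - \mathrm{cut}_{i+1}})}{1 / (1 + e^{S_j - \mathrm{cut}_{i+1}})} = e^{S_j - \mathrm{cut}_{i+1}},$$

so increasing $x_{kj}$ by one unit raises $S_j$ by $\beta_k$ and multiplies these odds by $e^{\beta_k}$ at every cut point. A negative $\beta_k$ therefore gives a factor below 1 (lower odds of a higher category), and a positive $\beta_k$ gives a factor above 1.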
To use the model to calculate the probabilities that a given user would fall into each of the offloading categories, concrete values of the independent variables (such as for the example shown in Table 11) were entered into (1). For example, the probability that the individual in Table 11 is a light user ($Y = 0$) is

$$P(Y = 0) = \frac{1}{1 + e^{S_j - \mathrm{cut}_1}}, \quad (6)$$

where $S_j$ is defined in expression (2) and is evaluated for $j = 1071194$ (the unique identifier of the selected user).
The value $\mathrm{cut}_1$ in expression (6) represents the first point of intersection, which is estimated by the Stata software. The number of cut values is always one smaller than the number of categories of the dependent variable, so there were four cut values for the five categories in our study. For the user in Table 11, $P(Y = 0) = 0.7582$. In other words, the probability that the user in Table 11 is a light user for data traffic offloading from mobile to Wi-Fi networks is 75.82%. The probability that the same user is a medium user ($Y = 1$) is

$$P(Y = 1) = \frac{1}{1 + e^{S_j - \mathrm{cut}_2}} - \frac{1}{1 + e^{S_j - \mathrm{cut}_1}},$$

where $\mathrm{cut}_2$ is the second point of intersection. This process can be repeated for all the categories (Table 12). This approach can be used to explore what behavioural patterns are consistent with high probabilities of belonging to certain categories of data traffic offloading, as illustrated in the subset of study participants shown in Table 13. This analysis for the entire study sample is shown in Annex 1.
Further analysis of individual coefficient values suggests factors that may help classify users according to their offloading behaviour. For example, users with their own income during studies ($x_3$) are less likely to belong to a higher category of data offloading than users without their own income. This is understandable, since those with their own income are less concerned with offloading because they can afford to purchase mobile plans that include a larger data transfer amount. Consistent with this idea, users with sufficient data transfer allowance in their mobile plan ($x_{10}$) are less likely to belong to a higher data offloading category. Users who update their applications exclusively on Wi-Fi networks ($x_{16}$) are nevertheless willing to transfer data over mobile networks as well, which is reflected in their lower likelihood of falling into a higher offloading category.
Users who have Wi-Fi access at home ($x_{13}$) are much more likely to fall into higher offloading categories, as are users who define their application settings to operate only on Wi-Fi networks ($x_{17}$). Interestingly, users with larger data allowances in their mobile plans ($x_9$) are more likely to fall into higher data offloading categories than users with smaller data allowances. This may reflect the fact that most users with larger data allowances generate a relatively large amount of data traffic, regardless of the network involved.
Useful insights could be gained by incorporating additional information into the model, such as the timing of data offloading and the types of Wi-Fi access points at which users offloaded their data, broken down by the applications installed on the smartphone. This would require a third-party application capable of measuring such phenomena, since the capabilities of the integrated application used in this research have already been fully exploited. This falls outside the scope of the present study, which focused on the volume of data offloaded from mobile to Wi-Fi networks.

Conclusions
This study has demonstrated the possibility of using ordinal logistic regression to correlate user preferences or characteristics with the likelihood of offloading different amounts of data from mobile to Wi-Fi networks. This model provides one of the most extensive analyses of smartphone user behavioural patterns and how that behaviour may relate to data offloading amount. This model, and the approach of collecting data from user surveys as well as from applications installed on their smartphones, may be useful for mobile network operators and service providers for understanding customer behaviour, segmenting their markets, and differentiating their price plans. In addition, the results and approach may be useful for Wi-Fi network operators and service providers, who control an ever-growing share of data traffic but who currently lack market analytics to analyse, for example, the increasingly important "blackhole" of data traffic offloading. The modelling of offloading described here will help various stakeholders better exploit the revenue potential of the offloading market. This model can help network operators and service providers offer special options or differentiated price plans to take advantage of the tendency of certain users to use Wi-Fi networks. For instance, operators may offer unlimited email for roaming services or a data package that includes free access to popular social networks. The model presented here creates new potential for strategic action, rational optimisation of services and price plans, and personalisation of services that can explicitly take into account the increasingly important Wi-Fi network ecosystem.

Figure 1: Methodology for the development of the model in the paper.

Figure 2: Flow of data collection and processing.

Figure 4: Amount of data traffic offloading from mobile to Wi-Fi networks for different categories of applications (as defined in Google Play Store) for all 298 users, based on data provided by the integrated application on users' smartphones.

Table 1: Previously identified factors and user preferences [18,19] that served as the basis for the online survey in the present work.

Table 2: Elements of models and types of variables applied to the model in this paper.

Table 3: Presentation of the elements of the base of collected data for the development of the model and examples of possible values of variables.

Table 4: User segmentation based on data traffic offload.

Table 5: Classification of users into predefined categories based on data traffic offload.

Table 6: Database formed from user characteristics.

Table 7: Independent variables that emerged as significant from multidimensional analysis.

Table 8: Example of 28-day data traffic for a single smartphone user in our study.

Table 9: Categories and types of smartphone applications generating the highest levels of data traffic offloaded from mobile to Wi-Fi networks, based on data from all 298 users.

Table 10: Significant independent variables and values of logistic coefficients in the model.

Table 11: Example of one smartphone user's values for the independent variables in the model.

Table 12: Probabilities that the individual in Table 11 belongs to each of the categories of data traffic offloading.

Table 13: Subset of study participants showing various behavioural patterns and corresponding variation in the probability of belonging to each category of data traffic offloading.