Analysis of Individual’s Activity Space Based on the Cellular Signaling Data

In the overall planning of a city, formulating a reasonable urban spatial structure requires extensive research support. One such support is an understanding of the relationship between the urban built environment and human behavior, which has been of interest to the field of urban transportation planning. The essential element in this research field is the development of appropriate measures of an individual's activity space based on the collected data. This study introduces a new dataset, the cellular signaling data (CSD), and corresponding measures to analyze the relationship between the urban built environment and individuals' activity spaces. The CSD have more detailed time-space stamps of individuals' activities than traditional surveys, questionnaires, and even call detail record (CDR) data. The individual's activity space is defined based on anchor point theory, and a convex polygon approach is used to describe its geometric shape. The proposed methodology was verified with CSD collected in Shanghai. The results show that the cellphone users investigated in this study can be categorized into three groups with specific activity space characteristics. The results may help urban agencies implement customized policies for transportation demand management.


Introduction
Urban strategic spatial policy raises a number of questions; for example, during the outline stage of urban master planning, how should the urban spatial structure be developed? We know that what we need is not only healthy modes of transportation but also a healthy spatial structure. Bus-priority traffic patterns cannot take the place of a reasonable urban spatial structure, which should also be a focus of strategic control. Under these circumstances, it is necessary to discuss the relationship between the urban built environment (BE) and individuals' activity spaces. This raises two related issues: how the activity spaces of the same kind of residents differ under different built environments, and how the activity spaces of different kinds of residents differ under the same built environment. The relationship between the urban built environment and individuals' activity space has been of interest to the field of urban transportation planning [1-3]. For example, recent studies [4,5] have focused on the impacts of land use and design policies on the usage of different transport modes, such as transit, walking, and bicycling. Urban planners can then use the results on such relationships to evaluate policies that guide human travel activities. The essential element in this research field is the development of appropriate measures of individuals' activity space based on the collected data. In other words, there are two keys in this research field: the data and the corresponding measures.
Previous studies on the link between the urban built environment and individuals' activity space rely mostly on traditional traffic surveys and questionnaires, with corresponding measures including travel behavior patterns [6], human mobility patterns [7], and activity patterns [8]. However, traditional traffic data collection is time consuming and expensive, and the results of traffic surveys and questionnaires are specific to a certain study area and time period. These shortcomings of traditional data collection restrict research on the temporal and spatial patterns of individuals' activities.
Recently, widely deployed technologies, for example, GPS, GIS, the Internet, and especially mobile phones, offer a great opportunity to investigate the detailed relationship between the urban built environment and individuals' activity space.
There are generally two kinds of data generated by mobile phones. One is call detail record (CDR) data, which record the time and location when an individual makes a call or sends a message. This dataset has been used in several studies [9-12]. However, because individuals make calls or send messages at irregular times, the major drawback of CDR data is that they are not generated uniformly, so the sample size of CDR data may not be sufficient to analyze the temporal and spatial distribution of individuals' activity spaces. The other kind is cellular signaling data (CSD), which are generated not only when people make phone calls or send messages but also when the location of the device changes, e.g., when it moves from the coverage of the current base station to an adjacent station (see the Dataset section for details). Thus, compared with CDR data, CSD contain many more temporal and spatial records of individuals. Yet how CSD can be used to study the link between the urban built environment and individuals' activity space remains largely unexplored.

Literature Review
In the era of big data, the research emphasis has shifted from "OD-based" analysis to "individuals' regular activity patterns shaped by the external environment." Describing individuals' activity spaces effectively and comprehensively has therefore become a necessity. There are two main tasks: to describe the activity space and to label category attributes according to activity patterns. Data collected from mobile phones have emerged as a widely used source in current studies, for example, for estimating road travel speed [13], acquiring OD matrices [9,14], and traffic prediction and path selection on urban road networks [15]. Meanwhile, much work based on mobile phone data has been done for urban and traffic planning (such as decision support systems [10]) and for transportation construction and management [16]. Besides cell phone data, traditional surveys have contributed to research on travel behavior, such as pedestrian behavior, driving behavior [6,7,17], and consumer trips [18].
Some studies have attempted to uncover human mobility patterns or to determine to what extent human mobility can be predicted. By measuring different entropies of individuals' trajectories, the distribution of actual entropy was captured, revealing a 93% potential predictability in user mobility (the distribution is narrowly peaked), which conflicts with the intuition that the relative regularity of users who travel the most is higher than that of others [19]. The radius of gyration was calculated to characterize each user's typical travel distance; its distribution follows a truncated power law, as does the travel step size. Furthermore, after removing the anisotropy and rescaling the trajectories, all human mobility collapsed into the same pattern, and researchers found that individual trajectories can be characterized by the same radius-of-gyration-independent two-dimensional probability distribution [11,20]. To verify results derived from mobile phone data, studies found that trajectories estimated by models are similar to the real ones and that the radius of gyration is an appropriate measure of human mobility [12].
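The radius of gyration mentioned above is commonly defined as the root-mean-square distance of a user's recorded locations from their center of mass. The sketch below is illustrative only: it assumes the locations have already been projected to planar coordinates in kilometers and that every recorded location is weighted equally.

```python
from math import sqrt

def radius_of_gyration(points: list[tuple[float, float]]) -> float:
    """Root-mean-square distance of the points from their center of mass.

    Assumes planar (x, y) coordinates, e.g. kilometers after projecting
    lon/lat, with every recorded location weighted equally.
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one location is required")
    xc = sum(x for x, _ in points) / n  # center of mass, x
    yc = sum(y for _, y in points) / n  # center of mass, y
    return sqrt(sum((x - xc) ** 2 + (y - yc) ** 2 for x, y in points) / n)

# Four corners of a 2 km x 2 km square: each corner is sqrt(2) km from the center.
print(radius_of_gyration([(0, 0), (2, 0), (2, 2), (0, 2)]))  # 1.4142135623730951
```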
At a larger scale, human mobility patterns and the corresponding travel behavior form individuals' activity spaces. Researchers have described activity space from different aspects, mostly relying on anchor point theory [21]. To obtain human activity space, trips were reconstructed, and the distributions of the reconstructed starting times and durations of activities were discussed [22]. Other methods study the spatial density and distribution of individuals' activities [23] or the intensity and entropy of activity [24]. To describe the geometric shape of activity space, the standard deviational ellipse (SDE) technique was used [25]. However, the SDE technique overestimates the spatial spread. In this paper, we use a convex polygon to address this problem; like the SDE, it can also indicate direction. Furthermore, geographers have tried to explain human activity space from the perspective of time geography [26,27], and human mobility patterns have been visualized in 2D and 3D [28,29]. Table 1 summarizes typical articles on these issues that use new data sources.

Dataset.
The data we used are cellular signaling data (CSD) recorded from September 1st, 2011, to September 30th, 2011; the detailed format of the data is shown in Table 2. The column "MSID" is the unique, encrypted identifier of the mobile phone user. "Date Time" is the timestamp of the signal. The combination of "LAC" and "CI" identifies the base station, through which we can locate the mobile phone user (that is, obtain latitude and longitude coordinates). The great advantage of CSD is that they contain not only CDR data but also location update data and handover data. The location update data include normal location updates, periodic location updates, and IMSI attach events (triggered when the cell phone is powered on). The handover data include the CDR data and the switching data between base stations. More generally, the following activities generate CSD: the cell phone is powered on; the user makes a call or sends a message; and the user moves from the coverage of one base station into that of another (Figure 1). Thus, we can obtain detailed information about an individual's trajectory and cell phone usage.
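The log layout described above can be sketched as a small record type plus a lookup that turns the (LAC, CI) pair into coordinates. This is a minimal sketch under stated assumptions: the field types, the class name `CsdLog`, the function `locate`, and the base-station table `stations` are all hypothetical illustrations, not the schema of Table 2.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CsdLog:
    msid: str            # encrypted identifier of the mobile phone user
    date_time: datetime  # timestamp of the signal
    lac: int             # location area code
    ci: int              # cell identity; (lac, ci) identifies the base station

def locate(logs: list[CsdLog],
           stations: dict[tuple[int, int], tuple[float, float]]
           ) -> list[tuple[str, datetime, float, float]]:
    """Attach (lon, lat) to each log via a base-station lookup table.

    `stations` is a hypothetical table keyed by (LAC, CI); logs whose
    station is missing from the table are skipped.
    """
    located = []
    for log in logs:
        coords = stations.get((log.lac, log.ci))
        if coords is not None:
            located.append((log.msid, log.date_time, *coords))
    return located

# Made-up example: the second log refers to a station not in the table.
logs = [CsdLog("u1", datetime(2011, 9, 1, 8, 0), 1, 10),
        CsdLog("u1", datetime(2011, 9, 1, 8, 5), 2, 20)]
print(locate(logs, {(1, 10): (121.47, 31.23)}))
```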
Two datasets were used in this study. The first dataset (D1) has a total of 6,441,389 logs collected from 1,500 people randomly sampled in Shanghai. Figure 2 shows the spatial distribution of D1. The second dataset (D2) consists of 18,844 people who lived in three communities (Jingan, Dahua, and Gucun) (see Figure 2(c)) along Shanghai Metro Line 7. These three communities are located around the inner, central, and outer rings, respectively; their other characteristics are shown in Table 3. Dataset D2 is used as a comparison group for dataset D1. A detailed description is given in the Methodology section.

Data Preprocessing.
The raw CSD cannot be used directly in this study because of data noise and the structure of the data. Thus, a three-stage framework was proposed to preprocess the raw CSD, as follows; a detailed description of the data preprocessing can be found in our previous study [30].
Step 1. Binning method: Because the coverage areas of adjacent base transceiver stations overlap, frequent handovers may occur when a mobile phone enters the overlap between the serving cell and adjacent cells, which introduces noise into the data. To clean this noise, the binning method was used to smooth the location information and reduce the volume of data. First, we distributed the chronologically sorted logs into bins of equal width (10 minutes); this threshold removes most of the noise while having little effect on data quality. Second, we replaced all logs in each bin by one equivalent log, whose location is the weighted average of the original coordinates in that bin.
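The binning step can be sketched as follows. The text does not say how the weights of the weighted average are chosen, so the sketch accepts optional per-log weights and falls back to equal weights; the function name `bin_logs` and the `(timestamp, lon, lat)` input layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BIN_WIDTH = timedelta(minutes=10)

def bin_logs(logs, origin, weights=None):
    """Collapse one user's logs into one equivalent log per 10-minute bin.

    logs: list of (timestamp, lon, lat); origin: start of the observation window.
    weights: optional per-log weights (hypothetical; equal weights if omitted).
    Returns a time-ordered list of (bin_start, lon, lat).
    """
    if weights is None:
        weights = [1.0] * len(logs)
    acc = defaultdict(lambda: [0.0, 0.0, 0.0])  # bin -> [sum w*lon, sum w*lat, sum w]
    for (t, lon, lat), w in zip(logs, weights):
        k = (t - origin) // BIN_WIDTH  # integer bin index
        acc[k][0] += w * lon
        acc[k][1] += w * lat
        acc[k][2] += w
    return [(origin + k * BIN_WIDTH, s_lon / s_w, s_lat / s_w)
            for k, (s_lon, s_lat, s_w) in sorted(acc.items())]

origin = datetime(2011, 9, 1)
logs = [(datetime(2011, 9, 1, 8, 1), 121.40, 31.20),
        (datetime(2011, 9, 1, 8, 5), 121.42, 31.22),   # same bin as the first log
        (datetime(2011, 9, 1, 8, 12), 121.50, 31.25)]  # next bin
for row in bin_logs(logs, origin):
    print(row)
```

Besides smoothing, the step shrinks the data to at most 144 logs per user per day.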
Step 2. Raster data structure: There were 23,918 base transceiver stations in total, distributed unevenly and irregularly throughout Shanghai, so the raw data structure was not suitable for temporal and spatial analysis. In this study, we constructed a raster covering the city territory of Shanghai. The raster cells were delimited by latitude and longitude at fixed intervals, so every cell is determined by the coordinates of its centroid (lon_c, lat_c) and its side lengths Δlon and Δlat, as shown in Figure 3. Given the radio coverage of the base transceiver stations, the cell size was set to 500 meters by 500 meters, and the base transceiver stations in the same cell were replaced by one equivalent base transceiver station at the cell centroid.
Figure 1: If a user "A" moves from the coverage of base station "X" into the coverage of base station "Y", a log is generated in the CSD.
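A minimal sketch of the raster, assuming a spherical-Earth conversion of 500 m into degrees at a reference latitude of about 31.2°N (roughly that of Shanghai). The grid origin used in the example, the class `Raster`, and the helper `merge_stations` are hypothetical; the text only fixes the 500 m cell size and the move of stations to the cell centroid.

```python
import math

# Approximate meters per degree of latitude on a spherical Earth (an assumption).
M_PER_DEG_LAT = 111_320.0

class Raster:
    """Regular lon/lat grid whose cells are roughly cell_m x cell_m meters."""

    def __init__(self, lon0: float, lat0: float, cell_m: float = 500.0, ref_lat: float = 31.2):
        self.lon0, self.lat0 = lon0, lat0  # south-west corner of the grid
        self.dlat = cell_m / M_PER_DEG_LAT  # cell height in degrees (Δlat)
        self.dlon = cell_m / (M_PER_DEG_LAT * math.cos(math.radians(ref_lat)))  # cell width (Δlon)

    def cell_of(self, lon: float, lat: float) -> tuple[int, int]:
        """Column/row index of the cell containing (lon, lat)."""
        return (math.floor((lon - self.lon0) / self.dlon),
                math.floor((lat - self.lat0) / self.dlat))

    def centroid(self, cell: tuple[int, int]) -> tuple[float, float]:
        """(lon_c, lat_c) of a cell."""
        i, j = cell
        return (self.lon0 + (i + 0.5) * self.dlon, self.lat0 + (j + 0.5) * self.dlat)

def merge_stations(stations: dict[str, tuple[float, float]], raster: Raster) -> dict[str, tuple[float, float]]:
    """Replace every station's (lon, lat) by the centroid of its raster cell."""
    return {sid: raster.centroid(raster.cell_of(lon, lat)) for sid, (lon, lat) in stations.items()}

raster = Raster(120.85, 30.67)  # hypothetical origin south-west of Shanghai
print(merge_stations({"s1": (121.4700, 31.2300), "s2": (121.4705, 31.2302)}, raster))
```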
Step 3. Identification of activity points: We are interested in the points where people stay rather than the points they merely pass through. We therefore defined activity points as locations where a person stays continuously for more than 30 minutes.
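Step 3 can be sketched on the output of Steps 1 and 2, assuming each 10-minute bin has already been assigned to a raster cell. The duration rule used here (number of consecutive bins × 10 minutes, with a gap in the data ending a stay) is an assumption, as is the function name `activity_points`.

```python
from datetime import datetime, timedelta

BIN_WIDTH = timedelta(minutes=10)
MIN_STAY = timedelta(minutes=30)

def activity_points(binned):
    """Return (cell, start, end) for every continuous stay longer than MIN_STAY.

    binned: time-ordered list of (bin_start, cell) for one user, one entry per
    10-minute bin. A stay is a run of consecutive bins in the same cell; its
    duration is taken to be (number of bins) x 10 minutes.
    """
    runs = []  # each run: [cell, first_bin_start, last_bin_start]
    for t, cell in binned:
        if runs and runs[-1][0] == cell and t - runs[-1][2] == BIN_WIDTH:
            runs[-1][2] = t            # extend the current stay
        else:
            runs.append([cell, t, t])  # new stay: different cell or a gap in the data
    return [(cell, start, last + BIN_WIDTH)
            for cell, start, last in runs
            if last + BIN_WIDTH - start > MIN_STAY]

t0 = datetime(2011, 9, 1, 8, 0)
binned = ([(t0 + i * BIN_WIDTH, "A") for i in range(4)]       # 40 minutes in A
          + [(t0 + i * BIN_WIDTH, "B") for i in range(4, 7)])  # 30 minutes in B
print(activity_points(binned))  # only the stay in A exceeds 30 minutes
```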

Identification of Anchor Points.
Anchor point theory was proposed by Golledge to explain the formation of the activity space: an individual in a specific space first identifies the primary nodes (such as home or workplace), and then secondary nodes and the roads connecting the nodes are recognized. These hierarchical roads and nodes form the individual's activity space. The primary nodes are defined as anchor points to distinguish them from other activity points (secondary nodes). However, because the data lack individuals' personal attributes, we cannot recognize home and work locations directly. Therefore, in this study, an activity point "S" is treated as an anchor point if it satisfies the following rules:
(1) In one day, the individual stays at activity point "S" for more than 4 hours.
(2) During the 30 days of observation, (1) holds on more than 10 days, which means more than 40 hours in total.
(3) The distance between any two anchor points "S1" and "S2" of the same individual is more than 3 kilometers, which is about the maximum activity radius of Shanghai residents [31].

Method to Describe Activity Space.
An important parameter of human activity space is its geometric shape. The widely used method [25, 32-34] is the standard deviational ellipse (SDE) technique; however, the SDE tends to overestimate the activity space.
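For comparison with the convex polygon, the SDE is often computed from the eigen-decomposition of the 2×2 covariance matrix of the points: the eigenvalues give the squared semi-axes and the leading eigenvector gives the orientation. This is a sketch of that common formulation only; SDE conventions differ (some software scales the axes by √2 or uses two standard deviations), so the scale factor `k` is an assumption, and planar coordinates are assumed.

```python
import math

def sde(points, k=1.0):
    """Standard deviational ellipse from the covariance matrix of planar points.

    Returns (center, semi_major, semi_minor, angle_rad, area). k scales both
    axes (k = 1 gives a one-standard-deviation ellipse).
    """
    n = len(points)
    xc = sum(x for x, _ in points) / n
    yc = sum(y for _, y in points) / n
    sxx = sum((x - xc) ** 2 for x, _ in points) / n
    syy = sum((y - yc) ** 2 for _, y in points) / n
    sxy = sum((x - xc) * (y - yc) for x, y in points) / n
    mean = (sxx + syy) / 2
    half_diff = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = mean + half_diff, max(mean - half_diff, 0.0)  # eigenvalues, lam1 >= lam2
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the major axis
    a, b = k * math.sqrt(lam1), k * math.sqrt(lam2)
    return (xc, yc), a, b, angle, math.pi * a * b

# Points spread mostly along the x-axis give an ellipse aligned with x (angle 0).
print(sde([(-2, 0), (2, 0), (0, 0.5), (0, -0.5)]))
```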
In this study, a convex polygon method was proposed to solve this problem. Over the 30-day period, each individual had several activity points, each represented by a pair of coordinates on the map. We chose the outermost points to form a convex polygon, following the arrows (from 1 to 7) in Figure 4, and then calculated the area of the polygon. Furthermore, to represent the sprawl direction of the activity space, we defined the major-minor axis ratio = major axis / minor axis, where the major axis is the longest distance between activity points (the length of AB in Figure 4) and the minor axis is the sum of the longest distances from the points on either side of the major axis to the major axis (CD + EF in Figure 4). The specific method is as follows:
(1) The outermost stay points of the resident are selected, and the distances between pairs of them are calculated; the longest distance, AB, is the major axis of the convex polygon.
(2) Starting from the leftmost point A, the stay points are enclosed in a convex polygon traversed counterclockwise, and its area is calculated.
(3) The distances from the outermost stay points to the major axis AB are calculated. The sum of the maximum distance on the upper side, EF, and the maximum distance on the lower side, CD, is the length of the minor axis, and the ratio of the major axis to the minor axis is used as the major-minor axis ratio to describe the spatial orientation of the resident's activity space.
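Steps (1)-(3) can be sketched as below. The hull is built with Andrew's monotone chain (our choice; the text does not name an algorithm), which lists vertices counterclockwise from the leftmost point, and the area uses the shoelace formula. Points are assumed to be in planar coordinates (e.g., kilometers after projecting lon/lat); all function names are illustrative.

```python
from itertools import combinations
from math import hypot

Point = tuple[float, float]  # planar (x, y), e.g. km after projecting lon/lat

def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain: hull vertices counterclockwise from the leftmost point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(hull: list[Point]) -> float:
    """Shoelace formula for a simple polygon."""
    n = len(hull)
    return abs(sum(hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
                   for i in range(n))) / 2

def shape_measures(points: list[Point]) -> tuple[float, float, float, float]:
    """Return (area, major axis AB, minor axis CD + EF, major-minor axis ratio)."""
    hull = convex_hull(points)
    if len(hull) < 2:
        return 0.0, 0.0, 0.0, float("nan")
    # Major axis: the longest distance between two hull vertices (segment AB).
    a, b = max(combinations(hull, 2),
               key=lambda pq: hypot(pq[1][0] - pq[0][0], pq[1][1] - pq[0][1]))
    major = hypot(b[0] - a[0], b[1] - a[1])
    # Signed perpendicular distances of hull vertices to the line through A and B.
    d = [cross(a, b, p) / major for p in hull]
    minor = max(max(d), 0.0) + max(-min(d), 0.0)  # farthest vertex on each side
    ratio = major / minor if minor > 0 else float("inf")
    return polygon_area(hull), major, minor, ratio

# A 4 x 1 rectangle plus an interior point (ignored by the hull): area 4.0, ratio 17/8.
print(shape_measures([(0, 0), (4, 0), (4, 1), (0, 1), (2, 0.5)]))
```

Because the diameter of a point set is always attained between two hull vertices, searching only the hull vertices for AB is sufficient.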

Distribution of Anchor Points.
The proposed methodology was used to obtain the distribution of anchor points in D1. The results show that 803 people (54%) have only one anchor point (group 1, G1), 301 people (20%) have two anchor points (group 2, G2), and 20 people (1%) have three anchor points (group 3, G3). There are also 376 people with no anchor point, mainly including tourists or business travelers, elderly people who rarely used cell phones, and others with missing data.
This group is very heterogeneous, so we do not consider it in this paper.
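Counting anchor points per user, as done for D1 above, amounts to applying rules (1)-(3) from the anchor-point section to each user's activity points. A minimal sketch follows. The input layout (hours stayed per day at each activity point), the haversine distance on a 6371 km sphere, and the tie-breaking choice of keeping the more frequently visited point when two candidates are closer than 3 km are all assumptions; the text does not say how such conflicts were resolved.

```python
import math

Point = tuple[float, float]  # (lon, lat) of an activity point

def haversine_km(p: Point, q: Point) -> float:
    """Great-circle distance in km on a spherical Earth (R = 6371 km)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def anchor_points(stays: dict[Point, dict[int, float]],
                  min_hours: float = 4.0, min_days: int = 10,
                  min_sep_km: float = 3.0) -> list[Point]:
    """Apply rules (1)-(3) to one user's activity points.

    stays maps each activity point to {day_of_month: hours stayed that day}.
    """
    candidates = []
    for point, hours_by_day in stays.items():
        days = sum(1 for h in hours_by_day.values() if h > min_hours)  # rule (1)
        if days > min_days:                                            # rule (2)
            candidates.append((days, point))
    # Assumption: among candidates closer than min_sep_km, keep the more frequent one.
    candidates.sort(key=lambda c: c[0], reverse=True)
    anchors: list[Point] = []
    for _, point in candidates:
        if all(haversine_km(point, a) > min_sep_km for a in anchors):  # rule (3)
            anchors.append(point)
    return anchors

# Made-up user: "near" is under 1 km from "home", "rare" qualifies on too few days.
home, near, work, rare = (121.47, 31.23), (121.48, 31.23), (121.60, 31.23), (121.30, 31.10)
stays = {home: {d: 10.0 for d in range(1, 21)}, near: {d: 6.0 for d in range(1, 16)},
         work: {d: 8.0 for d in range(1, 13)}, rare: {d: 9.0 for d in range(1, 6)}}
print(anchor_points(stays))  # home and work: a two-anchor (G2-type) user
```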
Group 1, which contains 54% of the people, consists of individuals with one anchor point. However, this does not mean that all of them live and work at the same place. This group includes (1) people without work, such as elderly people and young children; (2) people without a fixed workplace, such as delivery workers and drivers, who move within a region during working hours; (3) people who work at home; and (4) people who have two cell phones and leave one at home when they go to work.
Group 2, with 20% of the people, consists of individuals with two anchor points. This group includes people who have a fixed workplace or two homes. Group 3 (1%) consists of people with three regular activity places, such as home, workplace, and a leisure or entertainment place. The spatial distribution of the anchor points of each group is shown in Figure 5, and samples from the three groups illustrate the geometry and distribution of their activity spaces: group 1 converges around one center, group 2 is distributed around two centers, and group 3 is more scattered and covers a larger range.
As for the individuals in dataset D2, we considered them to have one anchor point because their homes were in the three large residential communities along Line 7.
It can be seen from Figure 5 that the individual with one anchor point moves around the anchor point. e individual with two anchor points has a relatively flat and long activity space, and other activity points gather around the two anchor points. An individual with three anchor points has a triangular activity space.

The Area and Ratio of Activity Space.
After applying the convex polygon method to the three groups in dataset D1, we obtained the corresponding distributions of area and major-minor axis ratio as follows:
(1) The distribution of the polygon's area (S) can be approximated by an exponential function (Figure 6), with parameters α|G1 = 0.075, β|G1 = 0.02, a|G1 = 0.032, and b|G1 = 0.003 for group 1 and α|G2 = 0.033, β|G2 = 0.004, a|G2 = 0.005, and b|G2 = 0.00009 for group 2. Group 3 contains only 20 individuals, which is not enough to fit a proper distribution. Nevertheless, a clear trend in area is visible from group 1 to group 3: the probability of a large area increases. This is consistent with the actual situation: people with more anchor points are more likely to have a larger activity space.

Mathematical Problems in Engineering
(2) The major-minor axis ratio represents the ductility (elongation) of the activity space. Intuitively, the activity spaces of individuals in group 2 should be flatter and longer than those in group 1, so the probability of a large ratio should be higher for group 2. In fact, however, we found that people in group 1 share the same ratio distribution as people in group 2, which can be well approximated by a two-parameter Weibull model (R denotes the major-minor axis ratio).
Figure 6: Distributions of area and major-minor axis ratio (D1).
The fitted Weibull parameters are a = 0.12 and b = 2.2. This means that groups 1 and 2 have the same potential for spatial anisotropy, that is, for a sprawl direction. For group 3, the parameters differ: a = 0.09 and b = 0.018.
This indicates a higher probability of a round activity space for group 3. Overall, however, most individuals' activity spaces are either round (14%) or elliptical (57%, with a length-width ratio of 2 or 3); that is, people tend to travel in several directions rather than a single one. The distributions of activity space size for the three groups of residents are very similar, all approximately following a long-tailed distribution. Residents whose activity space is within 30 square kilometers account for 62.9% of group 1 and 59.6% of group 2 (Figure 7 shows the distribution of activity space area for the samples). The vast majority of residents have activity spaces within 100 square kilometers (that is, roughly 10 km × 10 km), accounting for 81.2% of group 1 and 78.5% of group 2. For transportation and urban planning, the travel needs of residents within this range must be considered at the planning stage. For example, in subway planning, one should consider not only the influence of station locations on residents near the stations but also the travel needs of residents across the region and how to give them a better travel experience. The same is true for road network planning: beyond expressways, the question is how the other roads in the network can effectively support residents' middle-distance travel.
For the whole sample, activity space area follows the same long-tailed distribution: 60% of individuals have activity spaces within 30 square kilometers and 79.4% within 100 square kilometers.
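Shares such as those reported above (for example, the fraction of individuals whose activity space is within 30 or 100 km²) are points on the empirical cumulative distribution of area. The sketch below shows the computation on hypothetical areas; the numbers are made up and are not data from this study, and the same function applies to major-minor axis ratios with thresholds such as 5 or 10.

```python
def share_at_most(values: list[float], threshold: float) -> float:
    """Fraction of individuals whose value (e.g. activity-space area in km^2) is <= threshold."""
    if not values:
        raise ValueError("empty sample")
    return sum(v <= threshold for v in values) / len(values)

# Hypothetical activity-space areas (km^2) for eight users, not data from this study.
areas = [5.0, 12.0, 28.0, 45.0, 90.0, 150.0, 300.0, 20.0]
print(share_at_most(areas, 30.0), share_at_most(areas, 100.0))  # 0.5 0.75
```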
Dataset D2 serves as a comparison group for G1. Because the individuals in D2 live in the three communities along Line 7, they have a single anchor point: home. Their distributions of area and ratio are also consistent with those of G1: an exponential with close parameters α|D2 = 0.06, β|D2 = 0.022, a|D2 = 0.016, and b|D2 = 0.004, and a Weibull model with close parameters a = 0.11 and b = 2.3. These parameters are very close to those of G1 (Figure 8).
This similarity between G1 and D2 indicates consistency in human activity space: these people share the same characteristics in the spatial activity dimension. It also indirectly supports the rule used to identify anchor points.

Directional Distribution of Individuals' Activity Space.
Like the distribution of activity space size, the directional distribution of activity space differs little between groups. As Figure 9 shows, the major-minor axis ratio of most individuals' activity spaces is within 5: 66.3% in group 1 and 67.4% in group 2. Very few residents have a major-minor axis ratio greater than 10.
This shows a regularity of the activity space: most residents' activity spaces are elongated in one direction only moderately (major-minor axis ratio below 5), and only a small number of residents have activity spaces that extend in a single direction and not in others.
This is of great significance for transportation planning. First, individuals' daily travel presents a flat spatial structure, so in traffic corridors and in directions where many individuals travel actively, transportation support such as subway or expressway construction is needed for their movement. Second, long-distance travel is more random, which makes transportation forecasting harder.

The Distribution of Travel Distance.
It is known that people's travel distance, or displacement Δd, can be well approximated by a truncated power-law distribution [11, 20], characterized by a power-law exponent β and a cut-off value κ. These values differ slightly across groups of people (Figure 10). One might expect the three groups separated by the number of anchor points to behave differently; however, all three groups collapse onto almost the same distribution. The only difference is the maximum distance: 105 km for G1, 72 km for G2, and 54 km for G3. The maximum distance for dataset D2 is 97 km, which is close to that of G1. Thus, people usually travel with the same mobility pattern, except that people with fewer anchor points occasionally travel farther.
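To make the truncated power law concrete, the sketch below assumes the common form P(Δd) ∝ Δd^(-β) exp(-Δd/κ) (some studies add an offset Δd0 to Δd) and normalizes it numerically on a finite distance range. The values β = 1.75 and κ = 80 km are placeholders, not the fitted values of this study; only the 105 km upper bound is taken from the G1 maximum above.

```python
import math

def tpl_unnormalized(d: float, beta: float, kappa: float) -> float:
    """Assumed truncated power-law form: d^(-beta) * exp(-d / kappa)."""
    return d ** (-beta) * math.exp(-d / kappa)

def tpl_pdf(beta: float, kappa: float, d_min: float, d_max: float, n: int = 10_000):
    """Normalize the assumed form on [d_min, d_max] (trapezoidal rule on a log-spaced grid)."""
    grid = [d_min * (d_max / d_min) ** (i / n) for i in range(n + 1)]
    vals = [tpl_unnormalized(d, beta, kappa) for d in grid]
    z = sum((grid[i + 1] - grid[i]) * (vals[i] + vals[i + 1]) / 2 for i in range(n))

    def pdf(d: float) -> float:
        return tpl_unnormalized(d, beta, kappa) / z if d_min <= d <= d_max else 0.0
    return pdf

# Placeholder parameters; d_max = 105 km mirrors the G1 maximum distance.
pdf = tpl_pdf(beta=1.75, kappa=80.0, d_min=0.5, d_max=105.0)
print([round(pdf(d), 6) for d in (1.0, 10.0, 100.0)])
```

The exponential factor is what bends the log-log plot downward near κ, which is how a cut-off shows up in fitted displacement distributions.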

Conclusions
In the era of big data, understanding the relationship between the urban built environment and individuals' activity spaces is necessary for urban strategic spatial policy. However, because of the particular nature of the data, the representation of individuals' activity space needs appropriate modification in big data analysis; effective indicators are needed to describe individuals' activity space based on spatial subordination relationships. This study proposed a methodology to investigate individuals' activity patterns based on cellular signaling data (CSD). Compared with call detail record (CDR) data, CSD contain more detailed information about an individual's activity trajectory. Anchor point theory was used in this study to investigate individuals' activity spaces because the data lack personal attributes and workplaces cannot be identified directly. A convex polygon method was proposed to describe the shape of the activity space, with area representing its size and the major-minor axis ratio representing its ductility. Together, the distributions of area and ratio describe human activity space comprehensively. As an important index of human mobility patterns, travel distance should also be examined separately for different groups. Individual activity space area is very important for traffic and urban planning, yet previous research and traditional traffic models have difficulty describing it. In this study, mobile phone data were used to calculate the extent and form of residents' activity spaces, which can effectively describe the current state of the city at the initial stage of urban traffic planning and provide strong data support for decision makers and urban planners. The proposed methodology was tested with sample datasets collected in Shanghai. By setting a rule to recognize anchor points, we separated the sample (D1) into three groups, each with particular characteristics.
Group 1 has the same distribution of major-minor axis ratio as group 2, but the probability of a large activity space is higher for group 2. Group 3 tends to have a larger, flatter, and more elongated activity space than the other two groups. People from D2 are very consistent with people from group 1 in every aspect discussed in this paper, so these people can be said to belong to the same category from the perspective of activity space and human mobility.
This study is only a first step toward using CSD to describe individuals' activity patterns. In future work, a more rigorous rule is needed to distinguish anchor points from ordinary activity points; the rule should include not only a time threshold but also the frequency of contact between activity points. The present results indicate that individuals with different numbers of anchor points have different mobility patterns and activity spaces.
This will give us a better understanding of individuals' intrinsic temporal and spatial characteristics in transportation. Follow-up work will include the following: the sample will be expanded to the whole of Shanghai to study the morphology and spatial structure of the entire city; and the relationship between the built environment and the living space will be considered, discussing the differences in activity space between different residents under the same built environment and between residents of the same type under different built environments.

Data Availability
The data used to support the findings of this study are restricted in order to protect user privacy. Data are available from Shaofei Song (1610764@tongji.edu.cn) for researchers who meet the criteria for access to confidential data.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.