Delineating Mixed Urban “Jobs-Housing” Patterns at a Fine Scale by Using High Spatial Resolution Remote-Sensing Imagery

The spatial distribution pattern of jobs and housing plays a vital role in urban planning and traffic construction. However, obtaining the jobs-housing distribution at a fine scale (e.g., the perspective of individual jobs-housing attribute) presents difficulties due to a lack of social media data and useful models. With user data acquired from a location-based service provider in China, this study employs a deep bag-of-features network (BagNet) to classify remote-sensing (RS) images into various jobs-housing types. Considering Wuhan, one of the fastest developing cities in China, as a case study area, three jobs-housing types (i.e., only working, only living, and both working and living) at the land-parcel level are obtained. We demonstrate that the multiscale random sampling method can reduce the influence of image noise, increase the utilization of training data, and reduce network overfitting. By altering the network structure and the training strategy, BagNet achieved excellent fitting accuracy for identifying each jobs-housing type (overall accuracy > 0.84 and kappa > 0.8). For the first time, we demonstrate that urban socioeconomic characteristics can be obtained from high-resolution RS images using deep learning techniques. Additionally, we conclude that the total level of mixing within Wuhan is not high at present; however, Wuhan is continuously improving the mixture of jobs and housing. This study has reference value for extracting urban socioeconomic characteristics from RS images and could be used in urban planning as well as government management.


Introduction
Since the reform and opening up of China's housing system, a large number of urban residents have chosen to purchase newly built commercial houses [1][2][3], which has caused the collapse of the urban jobs-housing space structure of the planned economy period [4]. The freedom of residence facilitated by the housing system reform has made the separation of mixed jobs-housing patterns increasingly common in China [5].
Many studies have shown that the separation of the urban jobs-housing spatial structure is conducive to the effective concentration of businesses and has the advantages of an agglomeration economy [6][7][8][9]. However, this separation has also produced many urban issues, such as excessive commuting time, transportation costs, and increased environmental burden [10][11][12][13][14][15][16]. The increasingly severe urban traffic congestion and environmental degradation have become urgent problems. Therefore, a study of urban mixed jobs-housing patterns can explain the internal residential space structure and provide a reference for understanding urban complexity and optimizing urban spatial layouts.
Some scholars have researched the jobs-housing spatial structure based on census or household interview data [4,17,18]. With household interviews, Wang and Chai [4] conducted a study of changes in the jobs-housing relationship and the traditional unit housing system in Beijing. Zhou et al. [18] investigated changes in the jobs-housing space and commuting structure in Xi'an with sampling survey data. The spatial resolution of these studies is limited to the basic units of the census, such as administrative districts and streets. Thus, the generated results cannot reflect the distribution of jobs and housing at a fine scale, which means mixed jobs-housing patterns cannot be distinguished from the perspective of individual attributes due to a lack of social media data.
Because of the popularity of location-based services (LBS), many spatiotemporal data sets have evolved. These data sets record the trajectory of human activities and can be applied to describe and understand the urban jobs-housing space structure [19,20]. For example, a city smart card system with location information is considered to be an effective way to analyze the personal data required for city commuting, and it has already been extensively employed to explain a city's jobs-housing space structure and commuting trajectory [21]. In addition to the city smart card system, signaling data from mobile phones are also used in urban jobs-housing research [22]. These studies show that it is possible to employ LBS data to understand the jobs-housing space structure; however, a useful model that makes full use of these data to identify the mixed jobs-housing pattern is still lacking.
In recent years, scholars have begun to apply deep learning (DL) models to remote-sensing (RS) images for the extraction of economic activity characteristics. In RS images, the object scale variation can lead to weak feature representation for some scenes, influencing the classification results. To solve the problem of multiscale effects, Zhong et al. [23] combined a multiscale random sampling method with the large patch convolutional neural network (LPCNN) model for land use classification and obtained highly accurate land use classification results. Jean et al. [24] proposed an economic situation simulation model in an impoverished African country with data deficits using convolutional neural networks (CNNs), revealing that economic activity features can be extracted from RS images and applied to describe economic situations.
DL models can effectively mine social media data. Yao et al. [25] employed the word2vec model for extracting features in point-of-interest (POI) data. Incorporating the random forest model, they explored the spatial distribution of urban land use. Compared with the spatial density of RS image data, which covers an entire urban space, the distribution of social media data is sparse [26], which raises the question of what spatial scale should be considered when utilizing the data to analyze socioeconomic phenomena.
Studies also exist on extracting geometric information from multiple sources of data. Zhang and Du [27] found that information on urban scenes can be obtained automatically from high-resolution RS images. Liu et al. [28] introduced a probabilistic model to integrate multisource and geospatial big data to characterize urban mixed-use buildings. Song et al. [29] combined POIs with social properties and very high spatial resolution RS imagery with natural attributes to identify urban functions. Chen et al. and Shi et al. [30,31] found that CNNs could be applied to extract building geometric information from high-resolution RS images. Nevertheless, these urban structure studies cannot explain urban land use at a very fine scale and thus cannot delineate mixed jobs-housing patterns.
However, the above studies cannot present the urban structure from the perspective of individual attributes, especially in obtaining jobs-housing distribution because of the lack of media data and suitable models. In order to solve this problem, based on these studies, we conclude that the social and economic features of a city can be obtained from social media data and high-resolution RS images via DL models. This study addresses the question of whether this data combination can further explore urban mixed jobs-housing patterns.
That is, do high-resolution RS images indicate high-level socioeconomic characteristics that reflect a mixed jobs-housing pattern?
We introduce a DL model to perform detailed simulations of the urban mixed jobs-housing pattern. The overall accuracy (OA) and kappa coefficient are used to evaluate the reliability of the model, and several case areas are chosen to analyze its reasonability. A comparison is conducted between the proposed method and several state-of-the-art DL models. As a result, false-color RGB images and the entropy index (EI) are used to visualize the global fitting map of jobs-housing types (only working, only living, and both working and living) in Wuhan, China. In this study, we explore whether there is a relationship between high-resolution RS images and mixed jobs-housing patterns.

Study Area and Data
The study area, Wuhan (Figure 1), the provincial capital of Hubei Province, is located in central China, and it has a total area of 8,494.41 square kilometers and 11.081 million residents. In 2018, the regional GDP was 1,448.729 billion Yuan (http://www.wh.gov.cn/). The downtown area of Wuhan includes Jiang'an, Jianghan, Qiaokou, Hanyang, Wuchang, Qingshan, and Hongshan. The GDP of Jianghan, Wuchang, and Jiang'an exceeds 100 billion Yuan. The most critical data employed in this study comprise LBS user data and geographic location information. The data set is provided by one of the largest Internet companies in China, with a maximum user penetration rate of more than 85%. In large cities, such as Beijing, Shanghai, Guangzhou, Shenzhen, and Wuhan, the penetration rate exceeded 90%.
This data set contains the trajectory information of approximately 500,000 random anonymous users within three months (2018.3.1-2018.5.30) in 2626 communities of Wuhan and represents the working and living locations of these users. The trajectory information is obtained from anonymous users who have granted permission to collect Global Positioning System (GPS) data while using the LBS application. The main area of human activities during the 3 months is located, with a buffer zone set as 500 meters in width [22,32]. We focus on occupants in an age range of 18 to 65 years old (excluding students, freelancers, and retirees). The geographic location information was obtained from the user data set, which is shown in Table 1. Particular attention should be paid to the student population: in China, the permanent home for students is at school, which leads to the result that 90% of the population of schools has the attribute of only living at the school location. Hence, to analyze the distribution of faculty, the student population is excluded from the data set. The user data set estimates the proportion of three types of jobs-housing attribute populations at the land-parcel scale. This data set includes the rates of the attributes only working (OW), only living (OL), and both working and living (WL) together with the location information. In the original data, the sum of the OW, OL, and WL rates within each parcel is not equal to 1. The potential reason is that some users do not enable GPS while using the LBS application. We transform these three rates of jobs-housing attributes so that they sum to 1, which reveals the relationship between the three attributes and is convenient for later calculation and visualization.
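The rescaling of the three rates can be sketched as follows. The text does not state how the transformation is performed, so this sketch assumes a simple proportional rescaling; the function name `normalize_rates` and the example values are hypothetical.

```python
def normalize_rates(ow, ol, wl):
    """Rescale the OW, OL, and WL rates of one parcel so that they sum to 1.

    Assumption: proportional rescaling (each rate divided by the total).
    """
    total = ow + ol + wl
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    return ow / total, ol / total, wl / total


# Hypothetical parcel whose raw rates sum to 0.8 rather than 1.
ow, ol, wl = normalize_rates(0.2, 0.5, 0.1)
```

Proportional rescaling preserves the ratios between the three attributes, which is the property needed for the entropy calculation and the RGB visualization used later.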
WL residents represent people who work and live in a parcel with a buffer zone set to 500 meters, including self-employed residents, operators of small-scale companies, and industrial park dormitory workers. The distributions of the three rates of jobs-housing attributes are shown in Figure 2.
We also collect the land-parcel data of Wuhan from Gaode Maps, which is one of the largest map service companies in China, as the basis for the division of land parcels in the study area. The urban area of Wuhan contains 8,257 land parcels with an average area of 100 square meters per parcel. All of these land parcels are urban functional zones that do not include vegetation, water bodies, soils, or roads. The spatial distribution of land parcels is shown in Figure 3. Figure 1 shows an RS image downloaded from Google Earth at zoom level 16, which contains RGB bands with a size of 32,512 × 32,768. According to the research of Yao et al. [25], the shadows in the background have little influence on extracting functional zones from Google Earth RS imagery, and the slight date difference between geospatial data and the RS image is tolerable for integration. User data can be related to the RS image via geographic location information. Based on the latitude and longitude of each land parcel, we cut the RS image into small pieces (estimated as 100 × 100 pixels) and obtained the RS image data set. The RS images of three typical jobs-housing types in Wuhan are shown in Figure 4.

Methodology
The workflow of this study, as shown in Figure 5, can be summarized as follows: (1) The user data set is classified to form the multiscale spatial data set by a multiscale sampling method after data preprocessing. (2) A state-of-the-art DL model is adopted for identifying the relationship between the spatial data set and the mixed jobs-housing index. (3) With the trained model, three distributions of mixed jobs-housing types within Wuhan were estimated and evaluated by the OA, kappa coefficient, and entropy index.

Data Preprocessing.
In this study, because the sum of three jobs-housing attribute rates in the original data is not equal to 1, we cannot apply one model to predict three attributes. Classification is an application of CNN; therefore, we choose to build three CNN models for WL, OL, and OW separately; however, the user data in this study are a continuous data set and need to be discretized [24]. Additionally, the user data are transformed to a normal distribution by oversampling [33]. Referring to the discretization process of Yao et al. [34], this study calculates the mean μ and standard deviation σ of WL, OL, and OW. The user data are discretized in the range [μ − 3σ, μ + 3σ], with steps of 0.5σ. Based on the studies of Ren et al. [35], the surrounding environments have a potential influence on the function of a community. As described above, the user data are obtained by setting a buffer zone at 500 meters, which accounts for the influence of surrounding environments. In this study, we therefore segment the land parcels so that they contain both the communities and their surrounding environments.
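The discretization over [μ − 3σ, μ + 3σ] with steps of 0.5σ can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `discretize` is hypothetical, and clipping values outside the range to the outermost bins is an assumption, since the text does not say how such values are treated.

```python
import numpy as np


def discretize(values, step=0.5, span=3.0):
    """Map continuous rates to class labels over [mu - 3*sigma, mu + 3*sigma].

    Returns the integer labels (0..11) and the center value of each bin.
    """
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()
    if sigma == 0:
        raise ValueError("values must not all be identical")
    # 13 edges at mu - 3*sigma, mu - 2.5*sigma, ..., mu + 3*sigma give 12 bins.
    edges = mu + sigma * np.arange(-span, span + step / 2, step)
    # Assumption: values beyond the range are clipped into the outermost bins.
    clipped = np.clip(values, edges[0], edges[-1])
    labels = np.digitize(clipped, edges[1:-1])
    centers = (edges[:-1] + edges[1:]) / 2
    return labels, centers


# Synthetic rates used only to exercise the function.
rng = np.random.default_rng(0)
labels, centers = discretize(rng.normal(0.4, 0.1, size=1000))
```

With these settings each attribute yields 12 classes, and the bin centers can later serve as the per-class means when class probabilities are converted back into a continuous rate.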
Studies have shown that multiscale problems may occur in RS images due to variations in the resolution of the RS images [36,37]. In this study, although the multiscale problem is not a large issue, it still influences fitting accuracy [23]. The small and imbalanced sample size of the original RS image data set in this study may induce overfitting issues during the process of training. Therefore, we employ the multiscale random sampling method proposed by Zhong et al. [23]. The method is as follows: first, according to the user data, the latitude and longitude can be obtained to locate the parcel of the user data on the RS image; second, the length of the sampling window is set to W (W is set as the size that ensures the parcel of communities can be completely covered); then, based on the parcel of user data, a certain number of samples are randomly drawn with length s (0.75 W ≤ s ≤ W). Sampling each parcel in this way ensures that a sufficient number of multiscale spatial data sets are obtained. Figure 6 shows an example of multiscale random sampling, which is automatically obtained from the RS images based on location information.
This study combines oversampling and multiscale random sampling to build the training data set.
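The sampling procedure described above can be sketched as follows. This is an illustration under stated assumptions: the function name, the pixel coordinates of the parcel center, and the rule that each s × s crop is placed uniformly at random inside the W × W window are all hypothetical, and the window size of 108 pixels is used only as an example value.

```python
import numpy as np


def multiscale_random_samples(image, center_row, center_col, window, n_samples, rng):
    """Draw square crops of side s, with 0.75 * W <= s <= W, from the
    W x W sampling window centered on a parcel.

    Assumption: the window lies fully inside the image.
    """
    half = window // 2
    top0, left0 = center_row - half, center_col - half
    s_min = int(np.ceil(0.75 * window))
    crops = []
    for _ in range(n_samples):
        s = int(rng.integers(s_min, window + 1))
        # Random offset chosen so that the crop stays inside the window.
        dr = int(rng.integers(0, window - s + 1))
        dc = int(rng.integers(0, window - s + 1))
        crops.append(image[top0 + dr:top0 + dr + s, left0 + dc:left0 + dc + s])
    return crops


# Blank stand-in for an RS image tile.
image = np.zeros((300, 300, 3), dtype=np.uint8)
crops = multiscale_random_samples(
    image, 150, 150, window=108, n_samples=5, rng=np.random.default_rng(0)
)
```

In practice the crops would be resized to the fixed input size of the network before training; that step is omitted here.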

Mixed Jobs-Housing Pattern Extraction Based on BagNet.
BagNet is a CNN model proposed by Brendel and Bethge [38] that can fully utilize each part of an image and obtain complete information about the image. The structure of BagNet is shown in Figure 7. BagNet crops an input image into blocks with specific pixel dimensions and then uses a 1 × 1 convolutional layer on each image block to obtain a class vector. All class vectors of the image blocks are summed together, and predictions are made based on the most significant class in the summed vector.
The BagNet structure differs from the structure of traditional CNN models, which always use an entire image to calculate a class vector without obtaining the summation. In this study, superfluous information exists in the borders of an image, which does not contain a parcel of communities. But
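The aggregation idea described above (one class vector per image block, summed over all blocks) can be illustrated with the following toy sketch. It is not the BagNet architecture: flattened raw pixels stand in for the learned convolutional features, and the function name, random weights, and stride are hypothetical choices made only for illustration.

```python
import numpy as np


def bag_of_patches_predict(image, weights, bias, patch=33, stride=8):
    """Sum per-block class evidence and return (predicted class, summed scores)."""
    h, w, _ = image.shape
    total = np.zeros(bias.shape[0])
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            block = image[top:top + patch, left:left + patch, :]
            # Assumption: a linear map on flattened pixels replaces the
            # learned feature extractor and the 1 x 1 convolution.
            total += block.reshape(-1) @ weights + bias
    return int(np.argmax(total)), total


rng = np.random.default_rng(0)
image = rng.random((100, 100, 3))
weights = rng.normal(size=(33 * 33 * 3, 3)) * 0.01
bias = np.zeros(3)
label, scores = bag_of_patches_predict(image, weights, bias)
```

Because the final score is a plain sum, the contribution of every block to the decision can be inspected separately, which is the property that makes this family of models less sensitive to uninformative blocks.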

Geometry Group, University of Oxford, and Google DeepMind. VGGNet explores the relationship between the depth of CNNs and their performance by repeatedly using 3 × 3 convolution kernels and a 2 × 2 max-pooling layer, and it successfully constructs a 16- to 19-layer CNN [39]. ResNet is built from residual blocks and can effectively solve the problem of gradient disappearance [40]. The study employed a cross-entropy loss function, which is a common loss function for classification (equation (1)). In equation (1), x represents the input category, the label is the index value of the actual category, and N represents the number of categories. Based on the control variable method, this study adjusts the segmentation window, batch size, and optimizer and obtains the best-performing model. By comparing the results of the BagNet model and other DL models, the effectiveness of BagNet in this experiment was verified.
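For reference, a standard form of the softmax cross-entropy loss that is consistent with the symbols defined for equation (1) is given below. It is written under the assumption that x denotes the vector of N per-category scores produced by the network; the indexing notation x[j] is likewise an assumption.

```latex
\operatorname{loss}(x, \mathrm{label})
  = -\log\frac{\exp\bigl(x[\mathrm{label}]\bigr)}{\sum_{j=1}^{N}\exp\bigl(x[j]\bigr)}
  = -x[\mathrm{label}] + \log\sum_{j=1}^{N}\exp\bigl(x[j]\bigr)
\tag{1}
```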

Accuracy Evaluation and Urban Mixed Functional Pattern Analysis.
In the evaluation of RS image classification, a confusion matrix (Table 2) is usually applied to determine the accuracy and reliability of the classification [41]. In this study, the classification results were evaluated by using the overall accuracy (OA) and kappa coefficient. The OA is expressed as the percentage of the total number of correct predictions, that is, the sum of all values of the diagonal elements in the confusion matrix, divided by the total of all samples (equation (2)). In the 1960s, Fleiss et al. [42] proposed the kappa indicator as an indicator of the extent to which the classification results outperform a random classification (equation (3)). Kappa falls between 0 and 1, and a higher kappa value indicates better classification results.
In these equations, n is the number of categories, N is the total number of samples summed over all categories, X_ii is a diagonal element of the confusion matrix, X_i+ is the sum of the columns of a category, and X_+i is the sum of the rows of a category.
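The commonly used forms of the overall accuracy and the kappa coefficient that match these symbol definitions are sketched below. They are a reconstruction under the assumption that N denotes the total number of samples in the confusion matrix.

```latex
\mathrm{OA} = \frac{\sum_{i=1}^{n} X_{ii}}{N}
\tag{2}

\kappa = \frac{N\sum_{i=1}^{n} X_{ii} - \sum_{i=1}^{n} X_{i+}\,X_{+i}}
              {N^{2} - \sum_{i=1}^{n} X_{i+}\,X_{+i}}
\tag{3}
```

The second sum in equation (3) is the agreement expected from a random classification with the same marginal totals, so kappa measures the improvement over chance.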
This study refers to the calculation of the entropy index in the landscape pattern index [43,44]. The entropy index is used to quantitatively measure the mixing degree of each parcel of user data. This value is calculated by equation (4), and the value of the mixed entropy falls in (0, 1). The higher the mixed entropy value, the higher the mixing degree of the land parcel.
In this equation, n is the total number of categories, and p_i is the proportion of attribute i in the parcel.
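A minimal sketch of the entropy index follows, assuming the normalized Shannon form EI = −Σ p_i ln(p_i) / ln(n). This form is an assumption, but it reproduces the value EI = 0.1095 reported for the university case area (WL = 0.0124, OL = 0.9781, OW = 0.0095). The function name `entropy_index` is hypothetical.

```python
import math


def entropy_index(proportions):
    """Normalized Shannon entropy of the attribute proportions of one parcel."""
    n = len(proportions)
    if n < 2:
        raise ValueError("at least two categories are required")
    # Categories with zero proportion contribute nothing to the sum.
    h = -sum(p * math.log(p) for p in proportions if p > 0)
    return h / math.log(n)


# WL, OL, and OW proportions of the university case area.
ei = entropy_index([0.0124, 0.9781, 0.0095])
```

Dividing by ln(n) bounds the index at 1, which is reached when the three attributes are equally represented.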

Parameter Sensitivity Analysis.
In this study, the sampling window needs to be set to ensure that the parcel can be entirely contained, and the multiscale sampling method is used to obtain the spatial data in the sampling window. The data set is recorded as D and contains a total of 26,260 sets of data. This study randomly selects 80% of the data and augments it by combining oversampling and multiscale random sampling to form the training data D_TR, with 10% of the data as validation data D_V and 10% of the data as test data D_TE. In the network training process, D_TR is used for training and the size of the training data set is around 200,000, D_V is used for fine-tuning the parameters, and D_TE is used to evaluate the final result.
As shown in Table 3, this step applied 9 × 9, 17 × 17, and 33 × 33-pixel image blocks to train the BagNet model. SGD was selected as the optimizer algorithm, the batch size was set to 32, and the learning rate was set to 0.01. According to these results, the larger the size of an image block, the better the accuracy results since a larger amount of information about the images is obtained.
As shown in Table 4, we set the batch size to 8, 16, and 32 to train the BagNet model. The image block was set to 33 × 33 pixels, SGD was selected as the optimizer algorithm, the learning rate was set to 0.01, and the dropout rate was set to 0.4. Properly setting the batch size decreases the use of computer memory and accelerates training. According to these results, setting the batch size to 16 can decrease the training time and maintain satisfactory results.
As shown in Table 5, we used SGD, Momentum, and Adam for training the BagNet model. The image block was set to 33 × 33 pixels, the batch size was set to 16, the learning rate was set to 0.01, and the dropout rate was set to 0.4. According to this result, the accuracy of SGD is better than that of Adam. After adding Momentum, SGD could obtain a better accuracy result than other optimizers.
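The 80/10/10 partition of D can be sketched as follows. The function name `split_indices` and the random seed are hypothetical; the augmentation of the training portion by oversampling and multiscale random sampling is not shown.

```python
import numpy as np


def split_indices(n_items, rng, train=0.8, val=0.1):
    """Randomly partition item indices into training, validation, and test sets."""
    order = rng.permutation(n_items)
    n_train = int(round(train * n_items))
    n_val = int(round(val * n_items))
    return (
        order[:n_train],
        order[n_train:n_train + n_val],
        order[n_train + n_val:],
    )


train_idx, val_idx, test_idx = split_indices(26260, np.random.default_rng(42))
```

Splitting before augmentation keeps crops of the same parcel from appearing in both the training and test sets, which would otherwise inflate the reported accuracy.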

Comparing with Several State-of-the-Art CNN-Based Models.
The VGGNet, ResNet, and BagNet network models were selected as base models for the experiment. The training strategy was set to a dropout rate of 0.4, a batch size of 16, and a learning rate of 0.01, and SGD + Momentum was selected as the optimizer. After training, neither VGGNet nor ResNet converged, but the BagNet training converged. An analysis of the original image data set revealed that each image contained edge noise. VGGNet used 3 × 3 convolution and 2 × 2 max-pooling throughout, which meant that every part of the image was involved in the training. However, the large amount of noise data on the edges of images interferes with the process of adjusting parameters in training. Although ResNet increased the depth of the network, it could not solve the problem of noise interference. When a large amount of interference information is confused with useful information, the training cannot converge.
BagNet does not consider the spatial ordering of an image, which means that BagNet focuses on each part of an image instead of the overall image [38]. BagNet classifies images according to small local features of the images. The constraints on local features can directly determine how each part of the image affects classification, which enables the algorithm to fully utilize the total information of the image and reduce the weight of useless information obtained from noise data.
This means BagNet could obtain useful information from the center and borders of an image and reduce the influence of noise, because the final result of the vote depends on the majority of the image parts. Table 6 shows a comparison of the three models.

Mixed Jobs-Housing Pattern in the Case Study Area.
Based on the previous comparison and analysis, this study adopted an improved model of BagNet-33 for the experiment. The training strategy was set to a dropout rate of 0.4, a batch size of 16, and a learning rate of 0.01, and SGD + Momentum was selected as the optimizer. The loss function during the training process of BagNet is shown in Figure 8. After obtaining the classification of the three jobs-housing attributes, we sum the products of the mean and the probability of each category to estimate the fitting result. In this study, the global fitting results of three types of mixed attributes and typical plots using RGB synthesis were visualized [45]. The red band indicates WL, the green band indicates OL, and the blue band indicates OW. The spatial distribution of the resulting population composition ratio is shown in Figure 9. The entropy calculation was performed according to the WL, OL, and OW attributes of each parcel, and the distribution result is shown in Figure 10. The average mixed jobs-housing entropy of Wuhan is 0.1982. Wuhan has a typical large central group structure, which indicates that more resources are focused on the central area for the development of the economy. The city has a center-focused developmental spatial structure [46]. The working attributes gradually weaken from the central area to the surrounding area, while the residential attributes strengthen and tend to slowly become mixed work and residential attributes. The working centers of several remote urban areas, such as Caidian, Jiangxia, Huangpi, Xinzhou, Dongxihu, and Hannan, are located in the area close to the central city. In general, Wuhan's development is focused on the central urban area. Few areas exist with a single working or living attribute. The jobs-housing properties of most areas are mixed, which indicates that Wuhan is developing toward increasing the level of mixed land use [47].
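The fitting step described at the start of this section (summing the products of the category means and the predicted probabilities) and the RGB composition can be sketched as follows. The function names, the example probabilities and bin means, and the linear scaling of rates to 0-255 band values are assumptions made for illustration.

```python
import numpy as np


def expected_rate(probabilities, bin_means):
    """Continuous rate estimate: sum of category mean times predicted probability."""
    return float(np.dot(probabilities, bin_means))


def to_rgb(wl, ol, ow):
    """Map the WL, OL, and OW rates to the red, green, and blue bands.

    Assumption: rates in [0, 1] are scaled linearly to 8-bit band values.
    """
    rates = np.clip([wl, ol, ow], 0.0, 1.0)
    return tuple(int(round(255 * float(r))) for r in rates)


# Hypothetical classifier output over three categories and their mean rates.
rate = expected_rate([0.1, 0.7, 0.2], [0.2, 0.4, 0.6])

# Rates of the university case area mapped to a color.
color = to_rgb(0.0124, 0.9781, 0.0095)
```

A parcel dominated by the OL attribute therefore appears almost pure green in the composite map, while parcels with balanced attributes appear in mixed colors.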
This study selects three typical cases to prove the reliability of the analysis. Figures 9(A) and 10(A) are typical university education areas in Wuhan, including residential buildings and related living facilities, such as Nanwang Villa and the Sunshine Community. The residential area accounts for a relatively high proportion of residents; the OL attribute of the area is significantly higher than that of other regions, and the jobs-housing mixing level is relatively low (WL = 0.0124, OL = 0.9781, OW = 0.0095, EI = 0.1095).
Figures 9(B) and 10(B) show typical working areas that are famous scenic spots in Wuhan and compose a mixed administrative, medical, and cultural area. The OW attribute of this area is significantly higher than that of most other areas, and the jobs-housing mixing level is very high

Discussion
This study determines whether a correlation exists between RS images and an urban jobs-housing pattern on a relatively fine scale. We combine user data and RS image data and employ a multiscale random sampling method to address the multiscale issues in the image and the limited data problem. This study segments the land parcels containing the communities and surrounding environments instead of using the border of communities as a sample, which accounts for the potential influence of environments on the function of land parcels. This study introduces the BagNet model and adjusts the parameters by selecting an appropriate segmentation window size and applying the dropout mechanism, which effectively improved the fitting accuracy. The BagNet-33 model in this study produced excellent fitting results, which indicated that the DL model can be effectively applied to the analysis of urban mixed land use. The CNN derivative model BagNet is innovatively introduced to improve the accuracy of the results. This attempt was effective in applying the DL method to the analysis of mixed urban jobs-housing patterns. Compared with VGGNet and ResNet, BagNet is more suitable for extracting socioeconomic information from RS images, namely, the spatial distribution of jobs-housing patterns.
This study identified a strong correlation between high-resolution RS images and urban jobs-housing patterns. Using DL to mine high-level semantic information in high-resolution RS image data, this study revealed a strong relationship between this semantic information and urban socioeconomic features. The mixed jobs-housing pattern was obtained with the constructed fitting model, showing that two different modes of observation, namely, "bottom-up" (social perception) [26] and "top-down" (satellite remote-sensing) [48], are effective in representing urban socioeconomic characteristics.
In this study, the fitting results of user jobs-housing data at the parcel scale in Wuhan were obtained from the BagNet model and analyzed with an entropy calculation. The jobs-housing mixing level in OL areas is low, while that in OW and WL areas is high. Moreover, the rationality of the fitting results was demonstrated in the case area analysis. As the level of mixed land use is closely related to urban development [47], further development at the level of mixed land use is needed to promote economic growth. Furthermore, this model could also be applied to analyze the distribution of residents and the user portrait of communities, which would be helpful in urban planning and urban design.
Despite the strategic contribution to supporting urban development, this study has areas that can be improved. The data employed in this study comprise RS images and a user data set. Studies have suggested that urban socioeconomic information can be explored by coupling multisource social media data [49]. In the future, we could potentially improve the ability to infer mixed jobs-housing patterns by coupling multisource social media data. Also, during the training process, the effect of each parameter on the results is not quantifiable. The training strategy is designed based on experience and repeated experiments. Moreover, the computation time of training the BagNet model is around 20 hours, and this study does not compare the computation time of the tested models because there is no time requirement. Future work will focus on improving efficiency.

Conclusions
This study designed a DL model based on accurate mining of semantic information in high-resolution RS images, which reflected the mixed spatial distribution of a city. We determined that social perception data and RS images can be combined to reflect urban socioeconomic characteristics and further obtain a mixed jobs-housing pattern. Considering Wuhan as the study area, we show that the mixture level is relatively low. The government should plan and construct additional mixed functional areas to increase the level of mixed land use and to stimulate economic development.
This study is conducive to understanding urban complexity and optimizing the urban spatial structure and could be used for urban planning and governmental management.

Data Availability
Data are available on request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.