Study on a Stratified Sampling Investigation Method for Resident Travel and the Sampling Rate

This paper first analyzes the relationship between average travel frequency, trip mode structure, and the characteristics of residential areas. The results show that stratifying a resident travel survey by the characteristics of residential areas yields samples with much smaller within-layer differences and thus reduces the required sampling rate. Accordingly, a new resident travel investigation method is proposed. The formulas for the overall sampling rate and the sampling rate of each layer are derived in detail using probability theory and mathematical statistics. Finally, for the main urban area of Kunshan City in Jiangsu Province, China, we discuss reasonable values of the parameters in the formulas and obtain sampling rates for travel surveys in different residential areas. The theory and the case study illustrate the operability of this method and its advantages over random sampling.


Introduction
Resident travel investigation is one of the core topics of urban transportation planning. The main items of inquiry are individual basic information (e.g., gender, age, job category, income, and car ownership) and travel characteristics (trip origin and destination, departure time, arrival time, travel purpose, and trip mode). In 1944, the United States carried out home-interview travel surveys in middle-sized cities such as Tulsa, New Orleans, and Memphis, and later estimated the total travel volume and origin-destination (OD) flows of each city through sample expansion [1]. These surveys marked the start of resident travel investigation and analysis. In 1962, the US Federal Highway Act stipulated that cities with more than 50,000 residents must formulate comprehensive transportation plans on the basis of a comprehensive urban traffic survey before they could obtain federal subsidies for highway construction; the comprehensive urban traffic survey was thus established in law, which promoted the development of resident travel investigation and research. Since then, the United Kingdom, Japan, and other countries have also carried out resident travel surveys in urban and metropolitan areas. Countries in Europe, as well as the United States and Japan, have accumulated a wealth of experience in sampling rate design through this practice and have formed quantitative guidance on the sampling rate of resident travel investigations [2,3]. At present, they have empirical formulas consistent with their national conditions and recommended sampling rates for cities of different sizes (see Table 1), which provide a basis for practice.
In the 1960s, a few experimental resident travel investigations were carried out in some cities in mainland China under the influence of the former Soviet Union. However, they were not widely applied because of the limits of socioeconomic and educational levels and the lack of a complete and mature theoretical system. China did not start to attach importance to resident travel investigations until 1979, when Mr. Zhang Qiu, a famous Chinese American scholar, came to China to explain and publicize modern urban transportation planning theory, including a foundational resident travel investigation method and the "four-stage" prediction method. In the past 30 years, many cities in China have carried out resident travel investigations more than once. For example, Guangzhou has carried out citywide resident travel investigations four times.
Resident travel investigation is increasingly widely applied; at the same time, however, theoretical studies on investigation methods are still lacking. The postcard survey and telephone inquiry methods used in developed countries can hardly be adapted to China's national conditions. In the past 30 years, researchers in mainland China have always used the family visit survey method. The survey process consists of determining the investigation scope, dividing the survey area into several traffic analysis zones, randomly drawing a certain number of families from each zone according to a preset sampling rate, giving questionnaires to all members of the drawn families who meet the conditions (i.e., are able to travel independently and are more than 6 years old), and finally collecting and sorting the questionnaires. Almost all surveys are random sampling investigations, and the sampling rate is chosen rather arbitrarily, generally in the range of 2-4% (see Table 1). Many experts have from time to time doubted the scientific basis of such sampling rates; however, in practice, few people are willing to use the relatively high values recommended in Europe and America, especially in cities with populations of less than one million.
It should be noted that travel behavior differs greatly between regions with different economic levels and development phases. In a developing country such as China, with rapid economic growth and continuous expansion of urban land use, living standards are far from equal and travel characteristics vary widely; thus, in theory, a relatively high sampling rate is needed. However, if we can find characteristic parameters with significant differences and classify or stratify the population by them, the sampling rate can be reduced correspondingly, which lowers the high investigation cost. It is therefore urgent that scholars clearly put forward sampling rates for cities of different population sizes that balance scientific quality and investigation cost. Since the family visit survey method is still commonly used in China, this paper proposes a stratified sampling method based on differences in the characteristics of residential areas, against the background of rapid urbanization in China, and examines the selection and setting of the sampling rate. The results are expected to improve the present situation, in which mainland China lacks methods for determining the sampling rate of resident travel surveys.

Main Factors Affecting Size of the Sampling Rate.
The sample size is mainly determined by the following [6,7]: (1) the degree of variation of the survey objects; (2) the allowable size of error, that is, the accuracy requirement; (3) the required confidence coefficient, which is generally taken as 95%; (4) the population size; and (5) the sampling method. The more complex the studied question and the larger the degree of variation, the larger the sample size must be. The higher the required accuracy, the larger the sample size should be. The larger the population, the larger the corresponding sample size should be, although the relationship is not linear. The sampling method also affects the sample size. Commonly used methods include random sampling and stratified sampling.
The former needs no explanation. Stratified sampling means dividing the parent population into several types or layers and then sampling randomly within each layer, rather than sampling randomly from the parent population directly. Its advantage is that classification narrows the differences between individuals within each layer, which helps to extract representative samples and to reduce the sample size. Therefore, compared with random sampling, stratified sampling has marked advantages, provided that an appropriate classification can be found [8,9].

Proposal of Stratified Sampling Method.
This paper stratifies by the characteristics of residential areas; the discussion below therefore focuses on the link between the characteristics of residential areas and resident travel characteristics.
Resident travel investigations include many items, but the most important data for transportation planning are the average number of trips per capita (or per family, if all members of a family are investigated), the trip mode structure, and the resident travel OD matrix. If the whole region is surveyed at the same sampling rate, the all-day OD matrix is the surveyed OD matrix divided by the sampling rate. If different traffic analysis zones are surveyed at different rates, the all-day OD matrix is obtained by dividing the OD data surveyed in each zone by that zone's sampling rate and then combining the results for all zones; thus, the complete OD matrix can still be obtained. Therefore, the complete OD matrix does not depend on whether the same sampling rate is used everywhere, and the past practice of using a single sampling rate for the whole region is perhaps too "rigid." Is it then practicable to select, for each zone, a sampling rate that meets the accuracy conditions according to the regional characteristics of that zone? To answer this, we further analyze two important indicators: the frequency of trips per capita (or per family) and the trip mode structure.
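The expansion of zone-level survey data into a full OD matrix can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `expand_od`, the two zones, their surveyed trip counts, and their sampling rates are all hypothetical, and each zone's survey is assumed to be stored as a full OD matrix of the trips reported by that zone's sampled residents.

```python
import numpy as np

def expand_od(surveyed_by_zone, sampling_rates):
    """Divide each zone's surveyed OD counts by that zone's sampling rate and sum."""
    total = None
    for zone, od in surveyed_by_zone.items():
        rate = sampling_rates[zone]
        if not 0 < rate <= 1:
            raise ValueError(f"sampling rate for zone {zone!r} must be in (0, 1]")
        scaled = np.asarray(od, dtype=float) / rate  # expand sample to population
        total = scaled if total is None else total + scaled
    return total

# Hypothetical data: trips reported by sampled residents of zones A and B,
# as 2x2 OD matrices (rows = origin zone, columns = destination zone).
surveyed = {
    "A": [[4, 6], [2, 0]],
    "B": [[0, 3], [5, 10]],
}
rates = {"A": 0.04, "B": 0.05}  # hypothetical zone-specific sampling rates

full_od = expand_od(surveyed, rates)
print(full_od)
```

Because each zone is expanded with its own rate before the matrices are combined, the estimate of the full matrix does not require the rates to be equal across zones.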
In China, newly developed housing is usually equipped with sufficient parking, and house prices are relatively high, so both the household income and the car ownership rate of the families moving in are generally high. Therefore, the proportion of car travel in such residential areas is relatively high. In addition, because of the high cost of living and busy work schedules of such families, the number of home-based work trips is low, whereas the numbers of nonhome-based work trips and of leisure and entertainment trips are relatively high. In general, the daily trip frequency is higher. Normally, there are two other types of residential areas in a city. One is multistory residential buildings built some time ago (such as those built in the 1980s and 1990s), and the other is one-story houses or neighborhoods built even earlier (generally located in the city's old town, such as the hutongs of Beijing). In the former, residents' desire to buy cars is much weaker; thus, the proportion of trips by car is not high, and the low car ownership rate is an important reason for the relatively low elastic travel frequency; as a whole, the average trip frequency is lower than that in newly developed residential areas. In the latter, parking is more difficult and most residents are elderly; thus, the proportion of trips by car is extremely low. Some residents only stroll around and walk within the neighborhood for shopping. The travel times of such walking and cycling movements do not meet the basic conditions of the formal definition of a trip, so the average trip frequency there is relatively low.
From the above analysis, it is clear that the average trip frequency and the trip mode structure of these three types of residential areas (i.e., newly developed areas, multistory residential buildings built some time ago, and old-town neighborhoods) differ markedly; thus, it is appropriate to divide residential areas according to these differences and to use stratified sampling for resident travel investigations. This paper proposes a sampling investigation method for resident travel based on stratification by residential area. Because respondents within each layer are still selected by random sampling after classification, the following sections take the variance of random sampling within each layer, together with other parameters, into account when studying the stratified sampling rate.

Study on the Sample Variance of Each Layer.
Obviously, the sampling rate of each layer is related to the variance of the variable within that layer. Next, we discuss the variance under random sampling within each layer. To study the sampling rate with the methods of mathematical statistics, we need to introduce a random variable; here, we use a relatively simple and intuitive one, the daily average frequency of trips. We use the capital letter $Y$ and the lowercase letter $y$ to stand for the parent population and the sample, respectively. $\bar{Y} = (1/N)\sum_{i=1}^{N} Y_i$ is the population mean, and $\bar{y} = (1/n)\sum_{i=1}^{n} y_i$ is the sample mean, where $N$ and $n$ are the population size and the sample size. For random sampling, when no population information is available, we can take $\bar{y}$ as the estimate of $\bar{Y}$. To obtain the variance formula of $\bar{y}$, we introduce the following two lemmas (due to limited space, see [10] for the reasoning process).
With the above two lemmas, we can prove that, for random sampling, the variance of $\bar{y}$ is as follows:
$$V(\bar{y}) = \frac{1-f}{n} S^2, \tag{1}$$
where $f = n/N$ is the sampling rate and $S^2$ is the population variance of $Y$, which can be replaced by the sample variance $s^2$ during calculation:
$$v(\bar{y}) = \frac{1-f}{n} s^2. \tag{2}$$
To prove this [13], we introduce the random variable $a_i$ (with the same meaning as above). Thus,
$$V(\bar{y}) = V\left(\frac{1}{n}\sum_{i=1}^{N} a_i Y_i\right) = \frac{1-f}{n} S^2. \tag{3}$$
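As a small numerical illustration of the variance of the sample mean under random sampling without replacement, the sketch below assumes the standard textbook estimator $v(\bar{y}) = \frac{1-f}{n}s^2$ with $f = n/N$. The function name `srs_mean_variance`, the ten trip counts, and the population size of 5000 are hypothetical values chosen only for illustration.

```python
import statistics

def srs_mean_variance(sample, population_size):
    """Estimate the variance of the sample mean, with finite population correction."""
    n = len(sample)
    if n < 2 or n > population_size:
        raise ValueError("need 2 <= n <= population_size")
    f = n / population_size            # sampling rate
    s2 = statistics.variance(sample)   # sample variance (n - 1 denominator)
    return (1 - f) / n * s2

# Hypothetical daily trip counts reported by ten sampled residents.
trips = [2, 3, 2, 4, 1, 3, 2, 2, 3, 4]
v = srs_mean_variance(trips, population_size=5000)
print(f"mean = {statistics.mean(trips):.2f}, estimated variance of the mean = {v:.4f}")
```

The factor $(1-f)$ matters little at sampling rates of a few percent, but it is kept here because it appears in the stratified formulas that follow.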

Study on Stratified Survey Sampling Rate.
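We continue to take the daily average frequency of trips as the random variable. According to formula (2), we can obtain the following variance estimator of the population mean [14-16]:
$$v(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \frac{1-f_h}{n_h} s_h^2, \tag{4}$$
where $v(\bar{y}_{st})$ is the variance estimator of the population mean $\bar{Y}$ (st stands for stratified); $L$ is the number of layers; $N_h$ is the total number of units of layer $h$; $\bar{y}_h$ is the sample mean of layer $h$, so that $\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$; $n_h$ is the sample size extracted from layer $h$; $s_h^2$ is the sample variance of layer $h$, which estimates the layer variance $S_h^2$; and
$$W_h = \frac{N_h}{N}, \tag{5}$$
$$f_h = \frac{n_h}{N_h} \tag{6}$$
are the weight and the sampling rate of layer $h$, respectively.
The mathematician Feller proved the following [17,18]: for any infinite population, the distribution of the sample mean tends to a normal distribution as $n$ increases, as long as its standard deviation is finite; thus,
$$d = u_\alpha \sqrt{V(\bar{y}_{st})} \tag{7}$$
was derived, where $d$ is the allowable absolute error and $u_\alpha$ is the two-sided $\alpha$ quantile of the standard normal distribution. We rewrite formula (7) as $V(\bar{y}_{st}) = (d/u_\alpha)^2$, and we can obtain the following formula by combining formulas (7) and (4):
$$\left(\frac{d}{u_\alpha}\right)^2 = \sum_{h=1}^{L} W_h^2 \frac{1-f_h}{n_h} S_h^2, \tag{8}$$
where the layer variance $S_h^2$ is estimated by $s_h^2$ in practice. We usually use Neyman allocation (or optimum allocation) to determine the population sampling rate and the stratified sampling rates [19]. The cost of a resident travel investigation follows $C = c_0 + \sum_{h=1}^{L} c_h n_h$, where $c_h$ is the average investigation cost per sample unit of layer $h$ and $c_0$ is the fixed cost. Then, the optimum allocation of the sample size to each layer is
$$n_h = n \frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{L} W_h S_h / \sqrt{c_h}}. \tag{9}$$
The optimum allocation $n_h$ is directly proportional to $W_h$ and $S_h$ and inversely proportional to $\sqrt{c_h}$. Thus, the sample size in a layer should be larger when the layer contains more units, the degree of variation in the layer is larger, or sampling from that layer costs less. In the investigation of resident travel, the cost per sample is almost the same in every layer, so formula (9) can be simplified to
$$n_h = n \frac{W_h S_h}{\sum_{h=1}^{L} W_h S_h}. \tag{10}$$
According to formulas (5) and (6), we can obtain
$$f_h = f \frac{S_h}{\sum_{h=1}^{L} W_h S_h}, \tag{11}$$
where $f = n/N$ is the population sampling rate. The population sampling rate is obtained by combining formulas (8) and (11):
$$f = \frac{\left(\sum_{h=1}^{L} W_h S_h\right)^2}{N (d/u_\alpha)^2 + \sum_{h=1}^{L} W_h S_h^2}. \tag{12}$$
If the $W_h$ and $S_h^2$ of each layer are known, the population sampling rate can be found from formula (12) for a given accuracy (i.e., $d$ and $u_\alpha$); the sampling rate of each layer can then be found from formula (11).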
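The step from the precision condition and the equal-cost Neyman allocation to a closed form for the population sampling rate can be worked through as follows. This is a sketch under the standard textbook forms of both results; the symbols $d$ (allowable absolute error), $u_\alpha$ (normal quantile), $W_h = N_h/N$, $S_h$, $L$, and $f = n/N$ are assumed notation.

```latex
% Precision condition for the stratified mean, with f_h = n_h / N_h:
\left(\frac{d}{u_\alpha}\right)^2
  = \sum_{h=1}^{L} W_h^2 \,\frac{1-f_h}{n_h}\, S_h^2
  = \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{n_h} - \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{N_h}
  = \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{n_h} - \frac{1}{N}\sum_{h=1}^{L} W_h S_h^2 ,
  \qquad \text{since } \frac{W_h^2}{N_h} = \frac{W_h}{N}.

% Equal-cost Neyman allocation, substituted into the first sum:
n_h = n\,\frac{W_h S_h}{\sum_{k=1}^{L} W_k S_k}
\;\Longrightarrow\;
\sum_{h=1}^{L} \frac{W_h^2 S_h^2}{n_h}
  = \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_h\right)^{2}.

% Solve for n and divide by N to get the overall rate; divide n_h by N_h for the layer rate:
f = \frac{n}{N}
  = \frac{\left(\sum_{h=1}^{L} W_h S_h\right)^{2}}
         {N\,(d/u_\alpha)^2 + \sum_{h=1}^{L} W_h S_h^2},
\qquad
f_h = \frac{n_h}{N_h} = f\,\frac{S_h}{\sum_{k=1}^{L} W_k S_k}.
```

A useful consistency check is that the layer rates average back to the overall rate: $\sum_h W_h f_h = f$.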

Example
We illustrate the operability of the method with the example of the main urban area of Kunshan City, Jiangsu Province, China. The main urban area is the center of Kunshan City and the city's business service center, and it serves important functions in education, health, culture, finance, administration, and so forth. The total area is approximately 6.1 square kilometers, with a population of approximately 134 thousand. We determine the population sampling rate and the stratified sampling rates according to the resident travel investigation method based on stratifying residential areas.
4.1. Investigation Method. The main urban area includes 24 communities with an average population of approximately 5500 (see Figure 1). At present, the residential areas in the main urban area of Kunshan City can be classified into three categories: newly developed areas (first category), residential buildings built 20 to 30 years ago (second category), and old neighborhood-type residential areas of one-story houses (third category). Based on a field survey, we classify the 24 communities as follows: Gaobanqiao, Xinkun, Yufeng, Daximen, Baimajing, and Cangjijie belong to the first category; Chaoyangmen and Zhengyang Road belong to the third category; and the others belong to the second category. We use the family visit survey method and distribute questionnaires with the community as the unit. Questionnaires must be distributed evenly across all plots and residential buildings to improve representativeness and to ensure that each part of the parent population is included uniformly in the sample.

4.2. Determining the Sampling Rate. According to formulas (11) and (12), $d$, $u_\alpha$, $W_h$, and $S_h^2$ are the key parameters for the calculation of the population sampling rate and the stratified sampling rates. These parameters are discussed one by one in the following paragraphs.
The relationship between the allowable absolute error $d$ and the relative error $r$ is $d = r\bar{y}$, where $\bar{y}$ stands for the sample mean of the control index. For different control indexes, the sampling rate is usually different. For the daily average trip frequency selected above, the relative error $r$ is generally less than 1%. The daily average frequency of trips is usually between 2.4 and 2.8, so $d$ can be taken from 0.024 to 0.028. $u_\alpha$ is the two-sided $\alpha$ quantile of the standard normal distribution, which depends on the confidence coefficient. The confidence coefficient is usually 95%; thus, $u_\alpha$ can be taken as 1.96. Based on the population statistics of each community, we obtain the layer weights of the three types of residential area, namely, $W_1 = 0.22$, $W_2 = 0.68$, and $W_3 = 0.10$.
There are two common ways to determine $S_h^2$. One is to carry out a preinvestigation in a small area and obtain the variance from the preinvestigation data; the other is to obtain the variance from existing resident travel data for similar cities. The two methods are the same in essence; the former increases the investigation cost, and the latter requires a large amount of existing data. The stratified sampling investigation method is rarely used in mainland China. Moreover, information about the characteristics of residential areas may not have been collected during the investigations in similar cities; thus, it is difficult to use the second method. We chose one community from each of the three types of residential areas in the main urban area of Kunshan City and carried out a preinvestigation; we distributed 810 questionnaires in total and received 689 valid forms. Statistical analysis showed that the three variances are 0.85, 1.4, and 1.0, respectively.
Using formula (12), we obtain the population sampling rate $f = 5.30\%$. According to formula (11), the sampling rates for the three types of residential areas are 4.41%, 5.66%, and 4.80%, respectively. Of course, form omissions, work faults, and so forth during sampling, form collection, and data entry may reduce the actual sampling rate; thus, the designed population sampling rate and stratified sampling rates should be increased appropriately. If we account only for the 15% loss observed in the preinvestigation (689 valid forms out of 810), the population sampling rate becomes 6.24%. The sampling rates for the three types of residential areas become 5.19%, 6.66%, and 5.65%, and the numbers of people investigated are 1522, 6069, and 767, respectively.
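The calculation for Kunshan can be approximated with a short script. This is a sketch, not the authors' computation: it assumes the equal-cost Neyman allocation forms $f = (\sum W_h S_h)^2 / (N (d/u_\alpha)^2 + \sum W_h S_h^2)$ and $f_h = f S_h / \sum W_h S_h$, a population of exactly 134,000, and $d = 0.025$ (a value inside the 0.024 to 0.028 range stated above, chosen here for illustration). Because these inputs are rounded, the results land close to, but not exactly on, the reported rates.

```python
from math import sqrt

def stratified_sampling_rates(weights, variances, population, d, u_alpha=1.96):
    """Return (overall rate f, per-layer rates f_h) under equal-cost Neyman allocation."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("layer weights must sum to 1")
    std_devs = [sqrt(v) for v in variances]
    sum_ws = sum(w * s for w, s in zip(weights, std_devs))    # sum of W_h * S_h
    sum_ws2 = sum(w * v for w, v in zip(weights, variances))  # sum of W_h * S_h^2
    f = sum_ws ** 2 / (population * (d / u_alpha) ** 2 + sum_ws2)
    layer_rates = [f * s / sum_ws for s in std_devs]
    return f, layer_rates

weights = [0.22, 0.68, 0.10]   # layer weights from the text
variances = [0.85, 1.4, 1.0]   # preinvestigation variances from the text
f, layer_rates = stratified_sampling_rates(
    weights, variances,
    population=134_000,        # assumed exact value of "approximately 134 thousand"
    d=0.025,                   # assumed allowable absolute error
)

loss = 0.15                    # share of forms lost, as in the preinvestigation
adjusted_f = f / (1 - loss)
adjusted_layers = [rate / (1 - loss) for rate in layer_rates]

print(f"overall rate: {f:.2%}, after loss adjustment: {adjusted_f:.2%}")
for h, (rate, adj) in enumerate(zip(layer_rates, adjusted_layers), start=1):
    print(f"layer {h}: {rate:.2%}, after loss adjustment: {adj:.2%}")
```

Under these assumed inputs the overall rate comes out at roughly 5.3%, with the second layer (largest variance) receiving the highest rate and the first layer (smallest variance) the lowest, which matches the ordering of the reported values.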

Conclusions
Comparing the sampling rate for the main urban area of Kunshan City with Table 1, we find that the method proposed in this paper greatly reduces the investigation workload compared with the sampling rates required in North America. However, we can also see that certain cities in mainland China were somewhat careless in determining the investigation sampling rate. For instance, Taicang City, which is of a similar size, used a random sampling rate smaller than the stratified sampling rate calculated for Kunshan City in this paper. The main advantages of stratified sampling are that parameter estimates can be obtained for each layer; that the stratified sample is more representative than a random sample, which improves the accuracy of parameter estimation; and that the required sample size is greatly reduced compared with random sampling. Both random sampling and stratified sampling require a certain amount of preinvestigation to obtain the variance and then determine the sampling rate.
The resident travel survey is the most basic and most important investigation in urban transportation planning. Its main aim is to obtain individual travel characteristics in order to understand the laws of travel activity. Resident travel surveys require a great deal of manpower, physical resources, and financial resources. The investigation cost per capita in mainland China is approximately RMB 60 to 80. It is much higher in 14 cities in America, including Los Angeles, San Francisco, and Chicago, where it costs up to 90 US dollars per capita [20,21]. Completing the investigation with minimum workload and cost while satisfying the required accuracy is the desire of every person in charge of travel surveys and planning projects. Based on differences in the characteristics of residential areas, this paper puts forward the concrete idea of stratified sampling and provides methods of calculation for the population sampling

Table 1: Comparison of the sampling rates of urban resident travel surveys in China and abroad [4,5].
Guangzhou's four citywide investigations, in 1984, 1998, 2003, and 2005, reflect the attention that large cities and megacities have paid to this work. On the one hand, resident travel investigations are integral to urban comprehensive transportation planning; on the other hand, some cities, such as Beijing, Shanghai, and Nanjing, have paid growing attention to the evolution of urban transportation supply and demand and stress the preparation of an annual report on urban transportation development. Therefore, many cities need resident travel investigations of a certain scale every year. Urban master planning now also uses resident travel investigations as the basis for dedicated transportation studies.