Capturing the Characteristics of Car-Sharing Users: Data-Driven Analysis and Prediction Based on Classification

This work explores the usage behaviour of station-based car-sharing users based on actual operation data from a car-sharing company in Gansu, China. We analyse the characteristics of the users' demands, such as usage frequency and order quantity, over a day divided into 24 one-hour intervals. Results show that most car-sharing users are young and middle-aged men with a low reuse rate. The distribution of users' usage during weekdays shows noticeable morning and evening peaks. We define two attributes, namely, the latent ratio and persistence ratio, as classification indicators to understand user diversity and heterogeneity thoroughly. We apply the k-means clustering algorithm to group the users into four categories, namely, lost, early loyal, late loyal, and motivated users. The usage characteristics of lost users, including maximum rental time and travel distance, minimum percentage of same pickup and return station, and low percentage of locals, differ noticeably from those of the other users. Late loyal users have lower rental time and travel distance than the other users. This finding is in line with the design concept of car-sharing, namely short-term leasing of shared cars for short- and medium-distance travel. We also propose a decision-tree-based model that predicts the user cluster. Numerical tests indicate that the accuracy is 91.61% when the user category is predicted four months in advance using an observation-to-judgment period ratio of 3:1. The results of this study can support enterprises in user management.


Introduction
The increasing number of vehicles has caused congestion and environmental deterioration in many large cities worldwide. With the development of mobile internet technology, a new, convenient, and energy-saving transportation mode, car-sharing with fully electric fleets, has emerged to alleviate these traffic issues [1]. Car-sharing is an autonomous car rental mode in which users can use cars for a short period of time by borrowing them from operators instead of acquiring ownership [2]. Car-sharing users only need a smartphone to complete self-registration, unlocking, payment, and other rental procedures at a fixed site. Accordingly, car-sharing can satisfactorily meet users' temporary personal demands. The car-sharing mode has remarkable social benefits. For instance, it can improve the utilisation rate of vehicles and effectively decrease the number of idle resources; moreover, this mode is conducive to the reduction of car ownership to a certain extent [3]. Martin and Shaheen [4] found that each shared car could replace six to seven private cars on average. Nijland and Meerkerk [5] found that car ownership among car-sharing users had decreased by more than 30%. The car-sharing travel mode can reduce users' demand for parking; thus, it can alleviate the shortage of public resources, cut user travel costs, and reduce unnecessary vehicle travel distance [6]. Wang et al. [3] found that more than 70% (50% on average) of car-sharing users cut their journeys with a shorter travel distance. Nijland and Meerkerk [5] found that the vehicle travel distance of private car owners decreased by 15%-20% after the introduction of the car-sharing system. Loose [7] showed that the car-sharing service could reduce vehicle mileage by 28%-45%. Energy costs and emissions will be greatly reduced because car-sharing operators commonly deploy clean-energy vehicles.
Loose [7] reported that each car-sharing user reduces carbon dioxide emissions by 39%-54% on average. Nijland and Meerkerk [5] found that users' carbon dioxide emissions decreased by 13%-18%. The site-based car-sharing model is popular in many areas and is convenient for operation and management; many studies on car-sharing are also based on the site-based model, where the users' ages are mostly distributed in the range of 20 to 30 years [8]. Some scholars found that incorporating electric and nonelectric vehicles into a fleet increases users' interest and participation because most users prefer hybrid electric vehicles under the same conditions [9]. A study found that car-sharing users prefer environmentally friendly vehicles [10] when selecting their own car. This finding indicates that developing car-sharing promotes the purchase of new energy vehicles to a large extent. Long-distance travellers do not opt for electric vehicles or plug-in hybrid ones because of the mileage limitation of new energy vehicles [11], and 83% of travellers use car-sharing for short-distance travel [9].
Some scholars have analysed the factors that affect the demand and willingness of car-sharing on the basis of the user model; numerous studies have been carried out to develop various logit models, for example, the multiple logistic regression model [12], ordered logit model [13], and binomial logit model with sequence correlation (Greeks. [14]).
These studies, which focused on the impact of demographic characteristics and travel attributes on the willingness to use shared cars, reported that environmentally conscious low- and middle-income people are willing to use car-sharing. The usage characteristics and demand of car-sharing users vary across different regions [10]. Car-sharing demand is concentrated in specific time periods and regions [8]. Accordingly, parking demand variations are related to geographical areas and parking types [15]. Regional attributes greatly affect the user's demand. Urban users have lower car ownership, and suburban ones have more trips than nonusers.
Describing users' usage behavioural patterns is a key issue in the car-sharing field; scholars can divide all users into active and sporadic ones based on their usage activity [16]. Users can also be divided into five clusters by combining user attributes [17], wherein most long-term users go on temporary short- or middle-distance travel. The usage of sporadic users is more likely to occur on weekends, with long travel distances and rental times. Hui et al. [18] clustered travel chains by using the travel distance, home-based travel chain, and parking time in a certain place. The authors found that users have varied travel purposes under different travel chain modes. Predicting the dependent variables effectively and analysing the influence of various attribute factors also have important practical significance. Habib et al. [19] proposed an econometric model of user behaviour. This model can predict the duration of users' continuous usage, determine the month in which users become active members, and estimate the number of uses by active members each month. Some researchers have analysed users' vehicle selection behaviour and the influence of various attribute factors and proposed vehicle selection models, such as the multiple discrete continuous extremum model [20], accelerated failure model, and space hazard-based model [21].
In the free-floating car-sharing mode, the distance between the user and the vehicle influences the possibility of selecting a vehicle [22]. The reasonable layout of charging stations is also a research focus. Schussler and Bogenberger [23] investigated the charging behaviour of different user groups and provided a strategy to determine the locations of public charging stations. Space attributes are emphasised in the cluster analysis of usage patterns because spatial location is an important factor when users consider using shared cars [24]. Although most car-sharing organisations have used hybrid and low-emission vehicles, several users are unaffected by the mileage limitations of battery electric vehicles (BEVs) because BEVs can meet most of their travel demands (80% of the travel distances are less than 20 km) [22]. Accurate prediction of order quantity has practical significance for operation. Müller et al. [25] developed a negative binomial statistical model to predict the reservation quantity. The related influencing factors must be determined to predict car-sharing demand accurately. Wang et al. [26] recently discovered that three factors, namely, selection behaviour of the car-sharing mode, maximum acceptable price for car-sharing, and willingness to give up buying a car, influenced individual user acceptance in China. The car-sharing mode has been developed for more than 20 years in some countries. Nevertheless, car-sharing has emerged in China only in recent years. To date, this mode is officially operated in only a little more than 10 large- and medium-sized cities in China. Car-sharing is a newly developed transportation mode that is beneficial in solving traffic-related problems. This mode also brings huge opportunities for car-sharing companies [27,28]. The car-sharing mode in China is still in its initial developing stage; thus, many issues exist and should be addressed.
In particular, most car-sharing companies cannot capture people's usage demands accurately. Consequently, these companies cannot identify potential users and retain high-value ones. The widespread geographical distribution of residents in China, which is characterised by differences in urban development and cultures, results in distinct characteristics of users' travel behaviour. Consequently, the characteristics of car-sharing users in different regions of China must be deeply understood. However, only a few studies have addressed this issue. To this end, this study analyses the pattern of users' demand over time based on the actual operational data of a car-sharing company in Gansu. The demand of different user categories is examined thoroughly. A category prediction model is developed to predict user categories accurately and in advance. For the first time, this work divides the data into observation and judgment periods to find a balance between predictability (a longer time span between the current and the forecast points) and accuracy. This model can provide data support for operators' dynamic resource management. The rest of this work is organised as follows. In Section 2, we clean the acquired data and explore the pattern of car rental and car return on a 24-h time axis. In Section 3, we use k-means clustering to divide all users and comparatively analyse the usage attributes of different user categories to determine their usage behaviour. In Section 4, we use the C5.0 decision tree to develop the user classification prediction model. This model can predict the user category in advance based on the user's partial usage attributes during the observation period. Finally, in Section 5, we present the conclusions.

Car-Sharing Data Analysis
The data are from a car-sharing company in Gansu, China, which was established in 2017 and provides a one-way station-based car-sharing service. A user can return a rented car to any car-sharing station, which need not be the origin station. Users only need to register their personal information and pay a deposit once in the company's application. Thereafter, they can complete a series of rental and return operations at a fixed site without staff assistance. As of September 2018, the company had a total of 1272 car-sharing stations and 655 shared cars, comprising 5 types of pure electric vehicles. The vehicle types E200, ZHIDOU2, EC200, and Lease Edition are economy models, and E5 is a comfort model. The data contain the company's car-sharing rental order information from May 2017 to September 2018, with a total of 290,266 transactions. Table 1 shows the attributes of the acquired order data.
We first clean the data to ensure their authenticity by deleting duplicate orders and pre-operation test data with an actual travel distance of less than 1 km, which were created before actual operation began. In total, 18,501 records are deleted, and the remaining data contain 271,765 records. A total of 10,345 car-sharing users are present in the data from May 2017 to September 2018.
A user may have several orders. Therefore, we conduct the analysis from the perspectives of both the order and the user. First, the analysis is performed on the individual order data. We analyse the usage pattern in different time periods of one day and observe whether peak periods of renting and returning shared cars exist. Second, we analyse individual characteristics in depth from the user's perspective. Specifically, we analyse the multidimensional usage characteristics presented by the same user through multiple orders. The usage attributes of each category of users are compared in detail after the users are classified.
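The cleaning rule described above can be sketched as follows. This is only an illustration: the field names `order_id` and `distance_km` are hypothetical, and the sketch assumes that test records are identified solely by a travel distance below 1 km, which simplifies the description in the text.

```python
def clean_orders(orders, min_distance_km=1.0):
    """Drop duplicate orders and records shorter than min_distance_km.

    `orders` is a list of dicts; the keys `order_id` and `distance_km`
    are hypothetical names chosen for this sketch.
    """
    seen_ids = set()
    cleaned = []
    for order in orders:
        # Skip any order whose identifier has already been encountered.
        if order["order_id"] in seen_ids:
            continue
        seen_ids.add(order["order_id"])
        # Skip records treated as pre-operation tests (distance below 1 km).
        if order["distance_km"] < min_distance_km:
            continue
        cleaned.append(order)
    return cleaned


sample = [
    {"order_id": 1, "distance_km": 12.0},
    {"order_id": 1, "distance_km": 12.0},  # duplicate
    {"order_id": 2, "distance_km": 0.3},   # test record
    {"order_id": 3, "distance_km": 5.5},
]
print(len(clean_orders(sample)))
```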

Car-Sharing Order Data Analysis
2.2.1. Analysis of User Information. We use the registration information of the 10,345 car-sharing users to analyse their attribute characteristics. Only age and gender are selected in the following analysis because of data limitations. Figure 1 shows the distribution of the users' ages. The users' ages span a wide range. Most users are in the age range of 22-38 years, with a proportion of 78.90%. Furthermore, only 1% of users are older than 57 years. Young and middle-aged people have a high tendency to use shared cars. Figure 2 shows the gender ratio of the users. The proportion of male users is 81.25%, whereas that of female users is 18.75%. The number of male users is almost 4.3 times that of female users. Therefore, men are more likely to use shared cars than women.

Order Quantity.
We draw a trend chart of the cumulative quantity of daily orders to understand the usage of car-sharing and to gauge the market prospects accurately (Figure 3). The company's cumulative daily order quantity remains basically stable before February 2018. A trough is observed during the Spring Festival in February. Starting from March 2018, the order quantity increases rapidly until June 2018 and then remains basically stable again. The data indicate that the order quantity displays periodic fluctuations, and the two peak points correspond to Saturday and Sunday. To this end, the subsequent analyses are based on a weekly cycle.
We focus on the variation of demand within a day. At present, we have 16 months of order data, which include 74 weeks. Considering the weekly periodicity of order demand, we aggregate the data for each day of the week from Monday to Sunday and calculate the correlation coefficients of car rentals and of car returns between these days (Table 2). In Table 2, the number after the slash is the similarity coefficient for the number of car returns. The results show that the similarity coefficient between weekdays, between weekend days, and between weekdays and weekend days is as high as 0.99, 0.9, and approximately 0.85, respectively. Evidently, the distribution from Monday to Thursday is similar; hence, we regard the average of weekdays as one research object and the average of weekends as another. Therefore, we take the mean values for subsequent analysis, and a day is divided into 24 one-hour time periods. Figure 4 shows the variations of rental and return demand during weekdays and weekends. The demand for car rental on weekdays has two evident peaks. The morning peak occurs at 7:00 am, whereas the evening peak occurs at 5:00 pm. The car rental demand during the morning peak on weekends is approximately 1.8 times that on weekdays. The high level of car rental during weekends continues until 5:00 pm and then declines. No rapid growth occurs during the evening peak. Thus, users have greater and more dispersed demand for car-sharing during weekends than during weekdays. The car return demand during weekdays also has two evident peaks and lags 1 h behind the car rental demand. On weekends, return demand increases from the morning until its peak at 6:00 pm and then begins to decline. The cumulative quantity of returned cars after 4:00 pm on weekends is higher than the car rental demand. This time period may mark the end of users' short weekend trips.
On the basis of the abovementioned analysis, we can clearly recognise the peak periods of car rental and return demand in one day. Operators can dispatch vehicles reasonably by using the peak periods of renting and returning shared cars at each station together with the number of parking spaces and other information. Users can understand the usage demand of other users in advance, which helps them plan their rental time.
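The day-to-day similarity comparison can be illustrated with a short sketch. The text does not state which correlation coefficient is used, so the Pearson coefficient is assumed here; the function names and the representation of each order by its integer rental hour (0 to 23) are also assumptions made for illustration.

```python
from math import sqrt


def hourly_profile(rental_hours):
    """Count orders in each of the 24 one-hour periods of a day."""
    counts = [0] * 24
    for hour in rental_hours:
        counts[hour] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy rental hours for two days; real profiles would come from the order data.
monday = hourly_profile([7, 7, 8, 17, 17, 18])
tuesday = hourly_profile([7, 8, 8, 17, 18, 18])
print(round(pearson(monday, tuesday), 3))
```

Applying `pearson` to every pair of day-of-week profiles would give a matrix of the kind summarised in Table 2, once for rentals and once for returns.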

Orders of Each Car-Sharing User.
Considering that car-sharing is a new type of transportation mode, numerous issues still need to be addressed carefully, particularly regarding the characteristics of users. To this end, this section analyses the characteristics of user demand in terms of the transactions of each car-sharing user. Figure 5 shows the frequency distribution of the number of orders per user. Users who have used shared cars only once account for a large proportion (nearly 2000 people). The number of users gradually decreases as the number of orders increases. In summary, 20% of the users use shared cars only once, 50% of the users have more than seven orders, 30% of the users have more than 20 orders, and only 10% of the users have more than 72 orders. A small proportion of the users have a high usage frequency. Nearly one-third of the users have only one to two orders, and these individuals are regarded as small-value users. Targeted activities for such users should be considered to stimulate user spending and promote the popularisation of car-sharing.

Clustering of Users
Given that diversity is a basic characteristic of users, this section classifies users based on their attribute data. We use the k-means clustering method to classify all users into different categories according to the two proposed user usage indicators. On this basis, we further analyse the representative usage attributes of the different user categories. This analysis is expected to help operators design service strategies that fit the usage habits of different user categories.

Cluster Algorithm.
Clustering is an important method in data mining. This method is a process of grouping physical or abstract objects into clusters. The objects within a cluster have high similarity, and those in different clusters have low similarity. K-means clustering is an effective and simple method for data mining. Therefore, this study uses the k-means clustering method to classify car-sharing users. The k-means algorithm, also known as the fast clustering method, has good scalability and efficiency and is, thus, appropriate for processing large datasets. The specific steps of the k-means algorithm are explained in detail in the Appendix section.
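Because the Appendix is not reproduced in this section, the following is a minimal sketch of the standard k-means (Lloyd) iteration and not necessarily the authors' exact procedure. The function name `kmeans`, the two-dimensional point format, and the caller-supplied initial centroids are assumptions made for illustration.

```python
def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iteration on 2-D points from given initial centroids.

    Returns (labels, centroids). Empty clusters keep their previous centroid.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    (
                        sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members),
                    )
                )
            else:
                updated.append(old)
        # Stop once the centroids no longer move.
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Toy (indicator 1, indicator 2) pairs; values are illustrative only.
toy_points = [(0.1, 0.1), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
toy_labels, toy_centroids = kmeans(toy_points, [(0.0, 0.0), (1.0, 1.0)])
print(toy_labels)
```

In practice the initial centroids are usually chosen at random or by a seeding heuristic; they are passed in explicitly here so that the sketch is deterministic.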

User Clustering.
This study performs clustering to classify users with different levels of loyalty effectively. Operators can then determine usage patterns, which helps them improve the loyalty of users, prolong their period of use, and create high value.
The current user attributes cannot measure users' loyalty. Therefore, we cluster all users with the k-means method using two proposed indicators that measure user loyalty. Table 3 summarises the user attributes used in the definitions below. The first indicator is the latent rate (LR), which measures the point in time at which a user enters the car-sharing market. A large value means that the user starts using shared cars late. LR is calculated as follows:

$$\mathrm{LR} = \frac{\mathrm{LP}}{\mathrm{LC}}, \tag{1}$$

where LP is the latent period, which refers to the time span from the statistical start time to the first use, during which the user has yet to use a shared car; and LC is the life cycle, a fixed value that measures the time span of the statistics. These two quantities are calculated as follows:

$$\mathrm{LP} = (\text{first time} - \text{start time}) + 1, \qquad \mathrm{LC} = (\text{end time} - \text{start time}) + 1. \tag{2}$$

The second indicator is the persistence rate (PR). This indicator represents the proportion of time during which a user remains in the car-sharing market. It can be used to measure the loyalty of users and the persistence of their demands as follows:

$$\mathrm{PR} = \frac{\mathrm{DA}}{\mathrm{SP}}, \tag{3}$$

where DA is the duration, which represents the time span from a user's first use of a shared car to the last use; and SP is the sustainability period, which is the time span from the first use until the statistical end time. These two quantities are calculated as follows:

$$\mathrm{DA} = (\text{last time} - \text{first time}) + 1, \qquad \mathrm{SP} = (\text{end time} - \text{first time}) + 1. \tag{4}$$

The values of the two proposed indicators lie within the range [0, 1], which avoids the problem caused by a difference in magnitude between the two indicators. These two indicators are used directly as input attributes of the k-means clustering algorithm. The final number of categories is determined using the Davies-Bouldin index (DBI), which measures the quality of the classification. The final aim is to achieve high similarity within classes and low similarity between classes. The specific calculation methods of the indicators are shown in the Appendix.
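The two indicators can be computed per user as in the sketch below. It assumes that LR is the ratio LP/LC and PR is the ratio DA/SP, as the definitions of the four time spans suggest, and that time is measured in days; the unit and the function name `loyalty_indicators` are assumptions of this sketch.

```python
from datetime import date


def loyalty_indicators(first_use, last_use, start, end):
    """Return (latent rate, persistence rate) for one user.

    All arguments are dates; `start` and `end` bound the statistical period.
    Day-level resolution is an assumption of this sketch.
    """
    latent_period = (first_use - start).days + 1      # LP
    life_cycle = (end - start).days + 1               # LC
    duration = (last_use - first_use).days + 1        # DA
    sustainability_period = (end - first_use).days + 1  # SP
    return latent_period / life_cycle, duration / sustainability_period


# Toy example: a 10-day statistical period, first use on day 3, last on day 6.
lr, pr = loyalty_indicators(
    first_use=date(2018, 1, 3),
    last_use=date(2018, 1, 6),
    start=date(2018, 1, 1),
    end=date(2018, 1, 10),
)
print(lr, pr)
```

A user who first rents on the first day and keeps renting until the last day has a small LR and a PR of 1, which corresponds to the early loyal pattern.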
Table 4 shows the indicator results under different numbers of user categories. The results in Table 4 indicate that the users can be divided into four categories (Table 5) with a low DBI value. For convenience, the four clusters are named lost users, early loyal users, late loyal users, and motivated users. Table 5 shows that the clusters differ substantially. Their characteristics are discussed as follows:
(i) The users of Cluster 1 have relatively small latent and persistence rates. This result indicates that these users entered the market early but did not continue to use shared cars in the later part of the statistical period. Specifically, the users no longer use shared cars after a short period of concentrated usage. Therefore, we define such individuals as lost users with short-term demand or just-for-trial usage. Lost users account for only 19.79% of all users. Their total cost is the second lowest, accounting for 11.07%, so they are low-value users.
(ii) Cluster 2 is named early loyal users. The users of Cluster 2 have the maximum persistence rate of 0.8796 and a small latent rate of 0.1605. Thus, these users began to use shared cars early and have a long usage duration. Early loyal users account for 19.00% of the total users. Their cost proportion is 49.02%, so they are high-value users.
(iii) The users of Cluster 3 enter the market late, with a high latent rate of 0.7474. However, their persistence rate is as high as 0.8457, second only to that of the early loyal users, so this category maintains high demand. We therefore define these individuals as late loyal users. Late loyal users account for 26.93% of the total users. Their cost accounts for 32.57%, so they are high-value users.
(iv) The users of Cluster 4 have the highest latent rate and the smallest persistence rate. Because the users of Cluster 4 are the last to enter the market, their usage characteristics are only partially displayed. Operators can encourage these users to become highly loyal by offering incentives that stimulate them to use shared cars. Accordingly, these individuals are defined as motivated users, and their number is the largest amongst the four user categories. They make up 34.28% of the total users but account for only 7.34% of the total cost. The value created by motivated users is low because of their short duration in the car-sharing market.
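The role of the DBI in choosing the number of clusters can be illustrated with the standard Davies-Bouldin definition, in which a lower value indicates compact and well-separated clusters. The Appendix is not reproduced here, so this sketch assumes the standard Euclidean form; the function name and input format are illustrative.

```python
from math import dist


def davies_bouldin(points, labels, centroids):
    """Standard Davies-Bouldin index; lower values indicate better clustering.

    Assumes every cluster has at least one member and that no two
    centroids coincide.
    """
    k = len(centroids)
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(dist(p, centroids[j]) for p in members) / len(members))
    # For each cluster, take the worst similarity ratio against the others.
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Two toy clusters with scatter 1 each and centroids 4 apart.
toy_points = [(0, 0), (0, 2), (4, 0), (4, 2)]
toy_labels = [0, 0, 1, 1]
toy_centroids = [(0, 1), (4, 1)]
print(davies_bouldin(toy_points, toy_labels, toy_centroids))
```

Running a clustering for several candidate values of k and comparing the resulting index values is one common way to select the number of categories.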

Characteristic Analysis for Users of Various Clusters.
This section analyses and compares various attributes to understand the characteristics of different user categories in depth. These attributes include average order quantity, average rental time, average travel distance, percentage of same pickup and return station, percentage of locals, and working day ratio.
To draw all the attributes on one graph and compare them across user categories, we need to eliminate the problem caused by their differences in magnitude. Thus, we normalise these indicators by adopting the min-max method:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{5}$$

where x, x_max, and x_min are the attribute, maximum, and minimum values, respectively. Consequently, the normalised data are within the range [0, 1]. Figure 6 shows the distribution of multiple usage attributes for different users. The user categories rank in the same order for average order quantity as for percentage of locals. Loyal users generate more orders as the percentage of locals increases. The percentage of locals among lost users is relatively low, that is, non-local users are likely to be lost. The rental time and travel distance of the lost users are higher than those of the other users. In combination with the aforementioned analysis, the lost users place fewer orders, but their demand for time and distance is high. Late loyal users tend to travel for short times and distances. The short-term and short-distance rental characteristics of car-sharing are gradually being accepted by people. The analysis of users' rental and return stations indicates that loyal users tend to rent and return vehicles at the same station. By contrast, lost users tend to rent and return vehicles at different stations. The working day ratio of motivated users is lower than that of the three other user categories, indicating that such users are likely to use shared cars during weekends.
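As a small worked example of the normalisation step, the sketch below rescales one attribute across the user categories to the range [0, 1]. The function name and the handling of a constant attribute (mapped to 0) are assumptions of this sketch.

```python
def min_max(values):
    """Rescale a list of numbers linearly so that min -> 0 and max -> 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant attribute carries no contrast; map it to 0 by convention.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Toy attribute values for four user categories (illustrative numbers only).
print(min_max([2, 4, 6, 3]))
```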

Prediction Model for User Classification
Car-sharing is an emerging transportation mode, and operators should pay close attention to the early prediction of user categories and to targeted measures. To this end, this section develops a decision-tree-based prediction model that captures the characteristics of a user in advance.

Decision Tree Prediction Model.
This study develops a prediction model for user classification based on the decision tree. The decision tree is a classical nonparametric classification model that predicts new samples by summarising and refining the rules contained in an existing training set. This model is characterised by good robustness to extreme values. The decision tree has excellent data analysis capabilities and an intuitive graphical display. The amount of data used in this work is large. Some of the variables are nonlinear, and some user attributes have outliers. Therefore, we use the decision tree model, which has good tolerance and interpretability. We use the C5.0 algorithm to develop the decision tree. Formula (6) presents the expression of the developed decision tree model.

$$\mathrm{class}(i) = f_{\mathrm{DT}}(x_i), \tag{6}$$

where i represents user i, x_i represents the usage attributes of user i, f_DT represents the decision tree algorithm, and class(i) represents the user's final predicted category. The C5.0 algorithm introduces the adaptive boosting technique. This technique iteratively generates a series of decision trees by increasing the sampling weight of the samples misclassified by the previous decision tree. Finally, all the decision tree models are combined for classification prediction. This algorithm can improve the accuracy and enhance the robustness of the model. The growth algorithm of C5.0 adopts a branching rule based on the information gain ratio to find the optimal grouping variables and split points.
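The branching criterion can be illustrated with the textbook information gain ratio for a categorical split. This is a sketch of the general criterion only, not of the C5.0 implementation, whose handling of continuous attributes, boosting, and pruning is not shown; the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(feature_values, labels):
    """Information gain of splitting on a feature, divided by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information penalises features with many small branches.
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


# Toy data: a feature that separates the two classes perfectly.
feature = ["early", "early", "late", "late"]
classes = ["loyal", "loyal", "lost", "lost"]
print(gain_ratio(feature, classes))
```

At each node, a tree-growing procedure of this kind would evaluate the candidate splits and keep the one with the highest ratio.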

Performance Indicators.
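The model is evaluated by the prediction accuracy of each user category and by the overall prediction accuracy:

$$\mathrm{Accuracy}_{\mathrm{class}(m)} = \frac{N(m, m)}{N(m)}, \qquad \mathrm{Accuracy}_{\mathrm{model}} = \frac{\sum_{m=1}^{k} N(m, m)}{\sum_{m=1}^{k} N(m)}, \tag{7}$$

where Accuracy_model is the prediction accuracy of the model, Accuracy_class(m) is the prediction accuracy for the users whose category is m, k is the number of user categories, N(m) represents the number of users whose actual category is m, and N(m, m) represents the number of users whose actual category is m and who are also predicted to be m.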
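The two performance indicators can be computed from paired lists of actual and predicted categories, as sketched below. The sketch assumes that the per-category accuracy is N(m, m)/N(m) and that the model accuracy is the share of all users predicted correctly, which is what the symbol definitions suggest; the function name is illustrative.

```python
def class_accuracies(actual, predicted):
    """Return ({category: accuracy}, overall accuracy)."""
    totals = {}  # N(m): users whose actual category is m
    hits = {}    # N(m, m): users of category m predicted as m
    for a, p in zip(actual, predicted):
        totals[a] = totals.get(a, 0) + 1
        if a == p:
            hits[a] = hits.get(a, 0) + 1
    per_class = {m: hits.get(m, 0) / totals[m] for m in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_class, overall


# Toy example with three of the four categories.
actual = ["lost", "lost", "early loyal", "late loyal"]
predicted = ["lost", "early loyal", "early loyal", "late loyal"]
print(class_accuracies(actual, predicted))
```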

Period Definition and Division.
When the starting point of the car-sharing life cycle and the prediction point are fixed, a longer data acquisition period for user attributes yields a higher final prediction accuracy. However, in reality, we aim to judge the user categories early. In this case, we cannot obtain long-term usage data, that is, high predictability corresponds to few data. Therefore, we must determine how many months in advance user categories should be predicted to achieve the optimal balance between predictability (short data cycle) and accuracy. This problem is investigated in this section.
Before the user classification prediction model is developed, the datasets should be preprocessed, and the observation and judgment periods should be divided reasonably. The observation period is the period in which the users have already used shared cars. The judgment period is assumed to be the future time relative to the observation period. We predict the users' categories as they stand at the end of the judgment period. Specifically, the final proposed model predicts the users' category in the judgment period by using the multiple usage attributes of the user during the observation period as input. In this section, we need to find a balance between the observation and the judgment periods to provide practical support for the construction of future user prediction models.

Input Attributes.
This study develops the prediction model with judgment periods from 1 month to 8 months by inputting user attributes. The users' data are divided into 70% training and 30% testing sets to obtain the optimal time division rule according to prediction accuracy. A total of 10 decision trees are constructed and combined with adaptive boosting technology.
In the initial analysis, the input attributes of the model only include two indicators, namely, the latent ratio and persistence ratio. These two indicators are used in the clustering method. In the next analysis, we add the users' usage attributes to obtain the optimal model.

Output Category.
The final model output is the predicted user category. This output consists of four categories, namely, lost, early loyal, late loyal, and motivated users.

Framing and Testing.
Pruning branches and leaves is a method used to overcome noise effectively. This method simplifies the decision tree and makes its structure easy to understand. It can also improve classification efficiency and accuracy. Both underpruning and overpruning reduce prediction accuracy. Therefore, the key to constructing a reasonable and efficient decision tree model is to select a suitable degree of pruning through the pruning severity setting. In this study, we select 75 as the centre point and set different degrees of pruning with a standard deviation of 10. The optimal pruning degree is the one at which the prediction accuracy of the model is highest. The boosting technology can improve the prediction accuracy of the model effectively. However, such technology can also cause the model to overfit. We use the optimal time division as the model basis for determining the optimal number of boosting iterations and construct models with different numbers of iterations. The optimal number of iterations is the one at which the classification prediction accuracy is highest. Figure 7 shows the trend of the model accuracy with different user attributes under various judgment periods. When numerous usage attributes are used as input, the model accuracy is high. The attributes include average order quantity, total cost, average rental time, average travel distance, percentage of same pickup and return station, percentage of locals, working day ratio, average time-to-cost ratio, maximum rental time, and maximum travel distance. These attributes are analysed in detail in Section 3. If the length of the statistical period changes, then the values of these attributes will change in magnitude. These input attributes are therefore normalised by adopting the min-max method to make the model universal and usable.
Each prediction point in the graph is the result of the optimal model under the corresponding condition. The results show that the prediction accuracy improves greatly when user categories are predicted four months in advance and increases only gradually thereafter. Predicting user categories four months in advance therefore achieves an optimal balance between data volume and prediction accuracy. At this point, the observation-to-judgment period ratio is 3:1.
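The 3:1 ratio can be checked with a line of arithmetic. The sketch below assumes that the observation and judgment periods together span the 16 months of order data mentioned earlier; that total, and the function name, are assumptions of this sketch.

```python
from fractions import Fraction


def period_ratio(total_months, judgment_months):
    """Return (observation months, observation-to-judgment ratio)."""
    observation_months = total_months - judgment_months
    return observation_months, Fraction(observation_months, judgment_months)


# Judgment periods of 1 to 8 months, as considered in the text.
for judgment in range(1, 9):
    observation, ratio = period_ratio(16, judgment)
    print(judgment, observation, ratio)
```

Under this assumption, a four-month judgment period leaves a 12-month observation period, which gives the stated ratio of 3:1.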
After the model is run according to the classification principle of the decision tree algorithm, the importance of each attribute of the optimal decision tree to the classification prediction result can be obtained and ranked. The mean and variance of the six most important user attributes are calculated for each user category and denoted by M and S, respectively (Table 6). We use unprocessed data for this statistical calculation to display the actual values of the input attributes fully. Thus, the most important attributes in the model construction are the persistence ratio and latent ratio, followed by the average order quantity, the maximum travel distance, the maximum rental time, and the average travel distance.
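The per-category statistics M and S described above can be computed along the following lines. This is a sketch under stated assumptions: the record layout, the function name `summarise`, and the sample values are hypothetical, and the population variance is used because the text does not state whether the population or sample variance was reported.

```python
from collections import defaultdict
from statistics import mean, pvariance


def summarise(records, attribute):
    """Return {category: (M, S)} for one attribute, with S as population variance."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["category"]].append(record[attribute])
    return {cat: (mean(vals), pvariance(vals)) for cat, vals in grouped.items()}


# Hypothetical unprocessed records; the values are illustrative only.
records = [
    {"category": "lost", "average_order_quantity": 1.0},
    {"category": "lost", "average_order_quantity": 3.0},
    {"category": "early loyal", "average_order_quantity": 8.0},
    {"category": "early loyal", "average_order_quantity": 12.0},
]
stats = summarise(records, "average_order_quantity")
print(stats["lost"])         # (2.0, 1.0)
print(stats["early loyal"])  # (10.0, 4.0)
```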
In the final optimal model, the training set has a prediction accuracy rate of 90.87%, and the test set has an accuracy rate of 91.61%. Table 7 shows the prediction accuracy of the four user categories on the training and test sets.
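Overall and per-category accuracy of the kind reported above can be derived from true and predicted labels as in the following sketch. The helper names and the toy labels are hypothetical and are not the study's data; per-category accuracy is taken here as the share of users in a true category that the model assigns to that same category, which is an assumption about how the table was built.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Share of users whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_category_accuracy(y_true, y_pred):
    """For each true category, the share of its users predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels for six users (illustrative only).
y_true = ["lost", "lost", "early loyal", "late loyal", "motivated", "motivated"]
y_pred = ["lost", "motivated", "early loyal", "late loyal", "motivated", "motivated"]

print(overall_accuracy(y_true, y_pred))
print(per_category_accuracy(y_true, y_pred))
```

Comparing these figures between the training and test sets gives a quick check for overfitting: a test accuracy close to the training accuracy, as reported here, suggests that the model generalises.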
On the basis of the aforementioned research, the multidimensional usage attributes can be used as the input to construct a classification prediction model that predicts user categories four months in advance with high accuracy. Setting the ratio of observation to judgment periods to 3:1 not only achieves early prediction of user categories but also ensures high accuracy. In this work, the observation and judgment periods are defined for the first time, and the mechanism of dividing them is analysed. The final conclusion can support the choice of the prediction time point and the definition of an observation period that yields sufficient effective data for the prediction. The model can provide a quantitative decision reference for the operation and management of car-sharing.
This basis helps operators conduct scientific and reasonable user management to retain users and encourage continuous usage effectively.

Conclusions
The data analysis confirms that the quantity of car-sharing orders maintains a high growth rate. Therefore, the car-sharing market has good developmental prospects. However, the proportion of users who repeatedly utilise car-sharing is small, and almost 50% of these users use car-sharing fewer than six times. Morning and evening peaks can be observed in the distribution of car rental and return on weekdays. On weekends, orders are primarily concentrated between 10:00 am and 7:00 pm without a demand peak.
We use the k-means clustering method to divide all users into four categories, namely, lost, early loyal, late loyal, and motivated users, by combining two indicators, namely, the latent ratio and persistence ratio. Lost users' rental time and travel distance tend to be higher than those of the other users. The lost users account for 19.79% of the total users and create 11.07% of the total cost; therefore, they are considered low-value users. Early loyal users account for 19% of the total users and create 49.02% of the total cost; thus, they are considered high-value users. Motivated users have a high likelihood of becoming loyal users. Therefore, operators should take measures to extend these users' duration of use and thereby promote substantial economic benefits. Late loyal users have the lowest rental time and travel distance. This observation is consistent with the conclusion drawn by some Chinese scholars. The use of car-sharing for short-time travel will also become a major trend.
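The value comparison above can be made explicit by dividing each group's share of the total cost by its share of the total users. The following minimal sketch uses only the percentages quoted in the text; the function name `cost_per_user_share` is illustrative, and reading a ratio above 1 as above-average spending per user is an interpretation rather than a metric defined in this study.

```python
def cost_per_user_share(cost_share_pct, user_share_pct):
    """Ratio of a group's share of total cost to its share of total users."""
    return cost_share_pct / user_share_pct


lost = cost_per_user_share(11.07, 19.79)
early_loyal = cost_per_user_share(49.02, 19.0)
print(f"lost: {lost:.2f}, early loyal: {early_loyal:.2f}")
# lost: 0.56, early loyal: 2.58
```

Under this reading, an early loyal user contributes roughly 4.6 times the cost of a lost user on average (2.58 / 0.56).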
On the basis of the user classification with the k-means clustering method, the C5.0 decision tree classification prediction model takes the user's multidimensional usage attributes as the input. The model predicts the user category four months in advance with an accuracy of 91.61%. Accordingly, the optimal balance between earliness and accuracy of prediction is achieved. The prediction model can predict the user category in advance according to a person's attributes accumulated while using the car-sharing service over a period of time. The prediction effect reaches a relatively good level when the ratio of observation to judgment periods is 3:1. The prediction helps operation managers apply targeted measures to different user categories and arrange resource delivery rationally. This approach can also provide basic research for the operation scheduling and site layout of upcoming car-sharing operations.
In this research, we use data from a company in the early stage of car-sharing development. Therefore, further studies need to be carried out. We can consider the influence of other external factors, such as weather and incentives, on car-sharing usage. We can also analyse the data of different regions and identify the developmental rules and usage characteristics. In this manner, we can perform early estimation of user characteristics and rationally arrange vehicle resources in diverse regions [29].

A. Basic Principle and Algorithm Steps of K-Means Clustering Algorithm
K-means, an unsupervised clustering method, is commonly used to partition samples into k clusters automatically. This clustering method aims to assign all N samples to k clusters by minimising the sum of point-to-centre distances as follows:

$$\min_{C} \sum_{i=1}^{k} \sum_{\vec{x} \in C_i} \left\| \vec{x} - \vec{\mu}_i \right\|^2,$$

where $C = \{C_1, C_2, \ldots, C_k\}$ indicates the k clusters; the samples $\vec{x}$ form an $N \times R$ feature matrix, in which each row is a single observation or sample and R represents the number of dimensions; and $\vec{\mu}_i$ indicates the cluster centre of the ith cluster. The detailed process of this clustering method is presented as follows:
Step 1: initialisation. Randomly select k cluster centres from all feature samples.
Step 2: assignation. Assign each sample to the nearest cluster centre by measuring the distance between the sample and each centre as follows:

$$C_i^{t} = \left\{ \vec{x}_j : \left\| \vec{x}_j - \vec{\mu}_i^{t} \right\|^2 \le \left\| \vec{x}_j - \vec{\mu}_l^{t} \right\|^2 \ \text{for all } l,\ 1 \le l \le k \right\}.$$

Step 3: update. Find all samples in each cluster and determine the new cluster centre using

$$\vec{\mu}_i^{t+1} = \frac{1}{N_i^{t}} \sum_{\vec{x}_j \in C_i^{t}} \vec{x}_j,$$

where $N_i^{t}$ is the sample number of the ith cluster at the tth iteration.
Step 4: repeat. Repeat Steps 2 and 3 until the cluster centres remain unchanged or the function converges.
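The four steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the function name `kmeans`, the `max_iter` and `seed` parameters, and the toy two-dimensional samples are hypothetical, squared Euclidean distance is assumed, and the example uses k = 2 for brevity, whereas the study groups users into four categories.

```python
import random


def kmeans(samples, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: initialisation, pick k distinct samples as the initial centres.
    centres = rng.sample(samples, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assignation, nearest centre by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centres[i])))
            for x in samples
        ]
        # Step 3: update, each centre becomes the mean of its members.
        new_centres = []
        for i in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == i]
            if not members:
                # Assumption: an empty cluster keeps its previous centre.
                new_centres.append(centres[i])
                continue
            dim = len(members[0])
            new_centres.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        # Step 4: stop when the centres no longer change.
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


# Toy samples: two well-separated groups of three points each.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, labels = kmeans(samples, k=2)
print(sorted(centres))
print(labels)
```

Which cluster receives index 0 depends on the random initialisation, so only the grouping of the samples, not the label values, should be compared between runs.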

B. Evaluation Index of K-means Clustering Algorithm
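The clustering result is divided into clusters $C = \{C_1, C_2, \ldots, C_k\}$, and the following attributes are defined:

$$\mathrm{avg}(C) = \frac{2}{|C|\,(|C| - 1)} \sum_{1 \le i < j \le |C|} \mathrm{dist}(x_i, x_j), \tag{B.1}$$

$$d_{\mathrm{cen}}(C_i, C_j) = \mathrm{dist}(\mu_i, \mu_j), \tag{B.2}$$

where $\mathrm{dist}(\cdot, \cdot)$ is used to calculate the distance between two samples; $\mu$ represents the central point of cluster C, $\mu = (1/|C|) \sum_{1 \le i \le |C|} x_i$; $\mathrm{avg}(C)$ corresponds to the average distance between samples in cluster C; and $d_{\mathrm{cen}}(C_i, C_j)$ corresponds to the distance between the centre points of clusters $C_i$ and $C_j$. The two indexes for measuring clustering performance can be deduced on the basis of formulas (B.1) and (B.2). The specific calculation is presented as follows:

Davies-Bouldin index (DBI):

$$\mathrm{DBI} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \ne i} \left( \frac{\mathrm{avg}(C_i) + \mathrm{avg}(C_j)}{d_{\mathrm{cen}}(C_i, C_j)} \right).$$

Data Availability
The data used in this paper were provided by a Gansu travel-sharing company.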

Conflicts of Interest
The authors declare that they have no conflicts of interest.