Analysis of Electric Vehicle Charging Behavior Patterns with Function Principal Component Analysis Approach

This manuscript focused on analyzing electric vehicles' (EV) charging behavior patterns with a functional data analysis (FDA) approach, with the goal of providing theoretical support for EV infrastructure planning and regulation, as well as power grid load management. Five years of real-world charging log data from a total of 455 charging stations in Kansas City, Missouri, were used. The focus was placed on analyzing the daily usage occupancy variability, daily energy consumption variability, and station-level usage variability. Compared with the traditional discrete-based analysis models, the proposed FDA modeling approach had unique advantages in preserving the smooth functional behavior of the data, bringing more flexibility to the modeling process with few required assumptions and little background knowledge on independent variables, as well as the capability of handling time series data with different lengths or sizes. In addition to the patterns revealed in the EV charging stations' occupancy and energy consumption, the differences between EV drivers' charging time and parking time were analyzed and pointed to the need for parking regulation and enforcement. The different usage patterns observed at charging stations located on different land-use types were also analyzed.


Introduction
Electric vehicles (EVs) produce fewer emissions that contribute to climate change and smog than conventional vehicles and help the United States achieve a greater diversity of fuel choices available for transportation. The evolution of EVs has advanced from models best suited for commuting or traveling short distances to vehicles that can travel more than 200 or even 300 miles per charge.
Proper planning of the EV charging infrastructure and scientific determination of charging station locations are critical to promoting EV ownership and usage. Modeling efforts can be found in the literature, such as the electric vehicle infrastructure projection (EVI-Pro) model developed by the National Renewable Energy Laboratory to address the fundamental question of how much charging infrastructure is needed in the United States to support plug-in EVs (PEVs) [1]. The model generated a quantitative estimate for a US network of nonresidential (public and workplace) electric vehicle supply equipment (EVSE) that would be needed to support broader PEV adoption. He et al. studied how to optimally locate public charging stations on a road network, considering drivers' spontaneous adjustments and interactions of travel and recharging decisions [2]. A bilevel programming model that considers the EV's driving range was proposed in [3]: the upper level optimized the position of charging stations so as to maximize the path flows that used the charging stations, while the lower level formulated the user equilibrium of route choice under the EV's driving range constraint. Other research on EV charging station locations can also be found in [4,5] and many others.
Another approach to supporting the planning of charging infrastructure was to perform analysis of EV-related data, with the goal of identifying charging behavior patterns and inferring the scenarios of when and where people need to charge their vehicles. For example, driving data in Denmark were analyzed to extract driving distances and driving time periods, which were used to represent the driving requirements and the EV unavailability. The Danish National Transport Survey data were used to implement the driving data analysis [6]. An analysis of charge event data in Ireland for public charging infrastructure, including data from fast-charging infrastructure and a limited quantity of household data, was performed in [7]. Sun et al. studied drivers' charging timing decisions, applying a mixed logit model with unobserved heterogeneity to panel data extracted from a two-year field trial on battery electric vehicle usage in Japan [8]. Analyses of real-world datasets can also be found in [9,10] and others.
This manuscript focused on performing analysis over the 5-year real-world charging event log data from a total of 455 charging stations in Kansas City, Missouri (KCMO), with a functional data analysis (FDA) approach. The EV charging equipment recorded which vehicle was charged at which charging station, and on what day and at what time. Such charging event log data contained many significant pieces of information for understanding EV charging patterns and user behavior. The goal of this research was to provide theoretical support for EV infrastructure planning and regulation, as well as power grid load management. We argue that, compared with the existing research on real-world charging event data, the proposed FDA modeling approach had many unique advantages over the prevailing discrete-based analysis models and led to some important insights that were difficult to model or discover with the other approaches.
Commonly, time series data (such as the EV charging log data used in this research) were treated as multivariate data because they were given as a finite discrete time series [11][12][13].
This usual multivariate approach completely ignored important information about the smooth functional behavior of the generating process that underpins the data [14]. For example, in our context, the vehicles' charging process was continuous and so was the time-dependent occupancy of a particular charging station. Additionally, in the previous research, performance measurements needed to be defined by the researchers to extract useful information from the raw dataset before any meaningful analysis could be performed. However, they were usually defined arbitrarily, based on the researcher's experience in the field. Instead of assuming a variety of explanatory variables, which are difficult or even impossible to enumerate and collect data for, FDA is much more flexible, with few required assumptions and little background knowledge on independent variables. Last but not least, time series data often have different time intervals or different lengths, which are hard to deal with using other tools. In our context, some charging stations were more frequently used and might have thousands of charging records, while others might only have a few hundred. It was thus impossible to apply principal component analysis (PCA) to the charging log dataset directly because of the dimension inconsistency. The basic idea behind FDA is to express discrete observations arising from time series in the form of a function (i.e., to create functional data) that represents the entire measured function as a single observation and then to draw the modeling and/or prediction information from a collection of functional data by applying statistical concepts from multivariate data analysis [15].
With this said, this manuscript first represented the EV charging dataset in a continuous functional form, then performed functional principal component (FPC) analysis to identify the main contributing principal components (PCs), and analyzed the dataset from different perspectives to understand EV owners' charging behavior patterns.
This research aimed to provide theoretical support for EV infrastructure planning and regulation, as well as power grid load management. To achieve such goals, the focus was placed on three aspects. (1) The first aspect is the variability analysis of the daily usage patterns of all EV charging stations, in which the 24-hour occupancy of all charging stations in one day was treated as one continuous curve. Such analysis can provide insights and directly support the planning of new EV charging infrastructure. (2) The second aspect is the variability analysis of the daily energy consumption of all EV charging stations, in which the total energy consumption of all charging stations in one day was treated as one continuous curve. Such analysis was important from the power grid load management perspective. (3) At the station level, the usage pattern variabilities were analyzed, in which one station's usage over the entire observation period was treated as a continuous curve. This analysis revealed insights on the usage pattern differences at the station level and was combined with the land-use information for better EV charging infrastructure planning and management purposes.
The remaining part of this paper is organized as follows. The charging event log dataset used in this research is first presented in Section 2. Section 3 presents the analysis methodology, including the data smoothing, variable calculation, and the functional principal component analysis. The analysis results are shown and compared in Section 4. Section 5 concludes this research.

Data
This section presents the real-world charging event log data used in this research. The data were collected from 455 charging stations between January 2014 and November 2019 in Kansas City, Missouri (KCMO). The dataset included a total of 226,652 charging records from 4,921 users. Most of the stations were concentrated in the downtown area of KCMO.
The spatial distribution of charging stations was shown in Figure 1, in which Figure 1(a) showed an overview and Figure 1(b) zoomed in to the downtown area.
In the collected dataset, each row contained the information of one charging event and had a total of 30 columns (attributes). Table 1 showed sample data from the dataset, in which only the most critical and relevant information was displayed. The complete dataset included information in the following three categories:
(1) Charging station information: a unique station ID, station name, address and zip code where the station was located, MAC address, latitude and longitude of the station, and type of the charging ports, which included level 1, level 2, and DC fast charge.
(2) Electric vehicle attributes: a unique ID of the electric vehicle and the zip code where this electric vehicle was registered (which is usually the zip code of the driver's home).
(3) Charging event data: the start date and time of the charging event, end date and time of the charging event, charging time (equal to the end time minus the start time), total duration (which included not only the time spent on charging but also the time spent on parking afterward), start state of charge (SOC), end SOC, energy charged, greenhouse gas (GHG) saving, and information on how the charging event ended (e.g., terminated by customer or server).
Duration is the total time that a station is occupied, which is one of the most significant properties we are interested in.

Methodology
This section presents the analysis methodology used in this manuscript, including a brief overview of the functional data analysis approach, charging pattern definition, and functional principal component analysis.

FDA Method Overview.
To process "curve-like" data that are continuous in nature, such as the time-dependent charging station usage rates in this manuscript, one advanced and popular method is functional data analysis [16]. Unlike the commonly seen multivariate data analysis approaches, the proposed FDA approach considered EV charging usage as a function of time; thus, all the EV charging events, which were sampled at different scales, from charging stations built in different time periods and used with different frequencies and data sizes, were modeled uniformly by functions. In other words, under the functional data analysis approach, each charging pattern to be defined in Section 3.2 was treated as one functional datum. By applying basis expansion techniques such as the B-spline expansion [17,18], each charging pattern can be modeled and expressed in a functional form:
$$y(t) = \sum_{j=1}^{M} \beta_j \varphi_j(t),$$
where $y(t)$ is the original function, $\varphi_j(t)$ is the $j$th basis function, and $\beta_j$ is the coefficient of the corresponding basis function.
With such a data analysis approach, all charging patterns that were sampled at different scales can be uniformly expressed in the same functional form. Additional benefits of such an approach also included the reduction of unnecessary noise in the raw data by basis expansion smoothing. Based on this model, all the information from the raw data can be projected onto the $M$ basis coefficients $\beta_j$. Obtaining the basis coefficients can be done through an ordinary least squares (OLS) regression.
This process is also known as B-spline smoothing. Section 3.3 will further describe the basis expansion and modeling process.
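As an illustration of B-spline smoothing by OLS, the following is a minimal sketch rather than the authors' implementation. It assumes NumPy, a clamped cubic knot vector on a 24-hour domain, and a made-up occupancy curve; the names `bspline_basis`, `fit_bspline`, `knots`, and `y` are hypothetical.

```python
import numpy as np


def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions at t with the Cox-de Boor recursion.

    Returns an array of shape (len(t), len(knots) - degree - 1).
    """
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [knots[j], knots[j+1]).
    B = np.zeros((t.size, knots.size - 1))
    for j in range(knots.size - 1):
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    # Close the last non-empty interval on the right so t == knots[-1] is covered.
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    B[t == knots[-1], last] = 1.0
    # Raise the degree one step at a time; terms with a zero denominator are dropped.
    for d in range(1, degree + 1):
        B_next = np.zeros((t.size, knots.size - d - 1))
        for j in range(knots.size - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_next[:, j] += (t - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_next[:, j] += (knots[j + d + 1] - t) / right_den * B[:, j + 1]
        B = B_next
    return B


def fit_bspline(t, y, knots, degree):
    """OLS estimate of the basis coefficients beta in y(t) = sum_j beta_j * phi_j(t)."""
    Phi = bspline_basis(t, knots, degree)
    beta, *_ = np.linalg.lstsq(Phi, np.asarray(y, dtype=float), rcond=None)
    return beta


# Hypothetical hourly occupancy for one day (illustrative values only).
hours = np.arange(24.0)
y = 0.05 + 0.25 * np.exp(-((hours - 12.0) / 4.0) ** 2)

# Assumed clamped cubic knot vector on [0, 24] with interior knots every 6 hours (M = 7).
knots = np.array([0, 0, 0, 0, 6, 12, 18, 24, 24, 24, 24], dtype=float)
Phi = bspline_basis(hours, knots, 3)
beta_hat = fit_bspline(hours, y, knots, 3)
y_hat = Phi @ beta_hat  # smoothed curve evaluated on the hourly grid
```

The 24 raw points are summarized here by 7 coefficients; the number and placement of knots control how much smoothing is applied and would need to be tuned for real data.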

Charging Pattern Definition.
This subsection defines the three charging patterns to be analyzed, corresponding to the three analyses performed in the numeric analysis section.

Daily Usage Occupancy.
Daily usage occupancy was defined to measure the 24-hour time-dependent usage occupancy within a single day, by aggregating the charging events at all charging stations.
A binary variable $u_{j,d}(t)$ was first defined to denote the usage condition of charging station $j$ in hour $t$ on day $d$. If the charging station was used at least once in that hour, $u_{j,d}(t) = 1$; otherwise, it was 0:
$$u_{j,d}(t) = \begin{cases} 1, & \text{the station was in use}, \\ 0, & \text{otherwise}. \end{cases}$$
Next, the 24-hour time-dependent occupancy for each day can be calculated by aggregating all charging stations' usage, so that in the end, one curve was generated to represent the daily usage occupancy of each day:
$$u_d(t) = \frac{1}{J} \sum_{j=1}^{J} u_{j,d}(t),$$
where $u_d(t)$ means the average occupancy at time $t$ on day $d$, and $J$ means the total number of stations on day $d$.

Daily Energy Consumption.
As shown in Table 1, the energy consumption $e_{i,d}$ associated with each charging event $i$ on day $d$ was recorded and thus was directly available. First, $e_{i,d}$ was proportionally assigned to each hour, so that
$$e'_{i,d}(t) = e_{i,d} \, \frac{l_{i,d}(t)}{l_{i,d}},$$
in which $e'_{i,d}(t)$ was the portion of the energy consumption $e_{i,d}$ in hour $t$, $l_{i,d}$ was the duration of charging event $i$, and $l_{i,d}(t)$ was the portion of $l_{i,d}$ in hour $t$.
Next, the 24-hour time-dependent energy consumption for each day can be calculated by aggregating over all charging stations, so that in the end, one curve was generated to represent the daily energy consumption of each day:
$$e_d(t) = \sum_{i} e'_{i,d}(t).$$
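The two daily patterns can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the event tuples, station list, and function names below are hypothetical, times are fractional hours within a single day, and every event is assumed to have a positive duration.

```python
from collections import defaultdict

# Hypothetical charging events for one day: (station_id, start_hour, end_hour, energy_kwh).
events = [
    ("A", 8.5, 10.0, 6.0),
    ("A", 13.0, 13.5, 2.0),
    ("B", 9.0, 9.75, 3.0),
]
stations = ["A", "B", "C"]  # J = 3 stations on this day (station C is never used)


def hourly_overlap(start, end, hour):
    """Length in hours of the event interval that falls inside [hour, hour + 1)."""
    return max(0.0, min(end, hour + 1) - max(start, hour))


def daily_occupancy(events, stations):
    """u_d(t): share of stations used at least once in each hour t = 0..23."""
    used = defaultdict(set)  # hour -> stations whose binary indicator equals 1
    for station, start, end, _ in events:
        for hour in range(24):
            if hourly_overlap(start, end, hour) > 0:
                used[hour].add(station)
    return [len(used[hour]) / len(stations) for hour in range(24)]


def daily_energy(events):
    """e_d(t): hourly energy, each event split in proportion to its time in the hour."""
    energy = [0.0] * 24
    for _, start, end, kwh in events:
        duration = end - start  # assumed to be positive
        for hour in range(24):
            energy[hour] += kwh * hourly_overlap(start, end, hour) / duration
    return energy


occupancy = daily_occupancy(events, stations)
energy = daily_energy(events)
```

Because the split is proportional to time, the hourly energy values always sum to the total energy of the day's events.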

Station-Level Occupancy.
Similar to the daily usage occupancy calculation, to analyze the difference between stations, aggregations can be performed over the days. For each station $j$, its aggregated occupancy at time $t$ was denoted as $u_j(t)$ and calculated as follows:
$$u_j(t) = \frac{1}{T} \sum_{d=1}^{T} u_{j,d}(t),$$
where $u_{j,d}(t)$ was the binary usage indicator defined for the daily usage occupancy, and $T$ denoted the total number of days in the analyzed time period. In the end, one curve was generated to represent the aggregated usage occupancy of each charging station.
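A short sketch of this station-level averaging, assuming the binary hourly indicators for one station are already available as one 24-element list per day (the function name and the sample data are hypothetical):

```python
def station_occupancy(usage_by_day):
    """u_j(t): average of the binary indicators u_{j,d}(t) over the T observed days."""
    n_days = len(usage_by_day)
    return [sum(day[hour] for day in usage_by_day) / n_days for hour in range(24)]


# Hypothetical station observed for T = 4 days: used at 9 am on three days
# and at 6 pm on one day.
days = [[0] * 24 for _ in range(4)]
for d in (0, 1, 3):
    days[d][9] = 1
days[2][18] = 1

u_j = station_occupancy(days)
```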

Data Smoothing.
This subsection focuses on how to represent the charging patterns defined above as curves. Since $u_d(t)$, $e_d(t)$, and $u_j(t)$ are all time-dependent, they can be represented by the discrete points $(t, u_d(t))$, $(t, e_d(t))$, and $(t, u_j(t))$. Based on B-spline expansion, these discrete points can be modeled by continuous functions:
$$u_{d_i}(t) = \alpha_i^T \Phi(t), \quad e_{d_i}(t) = \beta_i^T \Phi(t), \quad u_{j_i}(t) = \gamma_i^T \Phi(t),$$
where $\Phi(t) = [\varphi_1(t), \ldots, \varphi_M(t)]^T$ collects the $M$ B-spline basis functions. An example of B-spline expansion was depicted in Figure 2, where a smoothed function (solid black curve) was represented as a summation of B-spline basis functions (dashed black curves) to model the raw daily usage occupancy data (red diamonds). The heights of these basis functions were determined by the basis coefficients $\alpha_i^T$, $\beta_i^T$, and $\gamma_i^T$. Such a basis expansion method was advantageous in terms of transferring a high volume of data points into several basis functions' coefficients without losing the original pattern [11].
To obtain the basis coefficients $\alpha_i^T$, $\beta_i^T$, and $\gamma_i^T$, the least squares regression model was constructed as follows. $u_d(t)$ was used as an example to avoid repetition, but the method presented hereinafter was directly applicable to $e_d(t)$ and $u_j(t)$ as well.
$$\min_{\alpha_i} \sum_{k=1}^{K_i} \left[ u_{d_i}(t_k) - \alpha_i^T \Phi(t_k) \right]^2.$$
To simplify the notation for the least squares model, some matrix-formed data were introduced as follows:
$$u_{d_i} = \left[ u_{d_i}(1), u_{d_i}(2), \ldots, u_{d_i}(K_i) \right]^T, \quad \Phi_i = \left[ \Phi(t_1), \Phi(t_2), \ldots, \Phi(t_{K_i}) \right]^T,$$
where $u_{d_i}$ was a vector that contained the raw data points in day $i$, and $\Phi_i$ was a $K_i \times M$ matrix in which each column was one basis function evaluated at all the time points. By reconstructing these usage occupancy data, the least squares model can be rewritten in a simple quadratic form:
$$\min_{\alpha_i} \left( u_{d_i} - \Phi_i \alpha_i \right)^T \left( u_{d_i} - \Phi_i \alpha_i \right).$$
Thus, the basis coefficients for day $i$ can be estimated by
$$\hat{\alpha}_i = \left( \Phi_i^T \Phi_i \right)^{-1} \Phi_i^T u_{d_i}.$$
Through the B-spline model and least squares regression, all three charging patterns defined above were converted into basis coefficients. The functions can be obtained by
$$\hat{u}_{d_i}(t) = \hat{\alpha}_i^T \Phi(t).$$
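For readers who want the intermediate step, the coefficient estimate follows from the quadratic form by the usual normal-equation argument. The derivation below is a sketch in the notation of this subsection (coefficient vector $\alpha_i$, design matrix $\Phi_i$, data vector $u_{d_i}$) and assumes that $\Phi_i^T \Phi_i$ is invertible, which requires at least as many well-placed time points as basis functions.

$$
\begin{aligned}
S(\alpha_i) &= \left( u_{d_i} - \Phi_i \alpha_i \right)^T \left( u_{d_i} - \Phi_i \alpha_i \right), \\
\frac{\partial S}{\partial \alpha_i} &= -2\, \Phi_i^T \left( u_{d_i} - \Phi_i \alpha_i \right) = 0
\;\Longrightarrow\; \Phi_i^T \Phi_i \, \hat{\alpha}_i = \Phi_i^T u_{d_i}
\;\Longrightarrow\; \hat{\alpha}_i = \left( \Phi_i^T \Phi_i \right)^{-1} \Phi_i^T u_{d_i}.
\end{aligned}
$$

Since $S$ is convex in $\alpha_i$, this stationary point is the minimizer.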

Functional Principal Component Analysis.
After data smoothing, functional PCA was enabled as a powerful tool of the FDA approach to explore the curves' underlying features. In multivariate data analysis, PCA was commonly used to convert a large number of variables into a few comprehensive variables that are far fewer in number but account for most of the variability. The mathematical solution of this problem amounted to solving an eigenvalue problem, and in the functional setting the new variables were the functional principal components (FPCs).
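In the FDA approach, the analyzed function contained information on a set of specific variables at an enormous number of time points in a time interval. As a result, the work was confronted with the curse of dimensionality if time was seen as the independent variable in the functional case. Consequently, the functional PCA method can be applied for the purpose of dimension reduction. In [19,20], FPCA was employed as a data dimensionality reduction technique in the modeling of traffic flow patterns, which exhibit functional characteristics similar to those observed in EV charging. The approach was similar to the multivariate case. The dependent variable $x_i(s)$ $(s \in T)$ corresponded to $x_{ij}$ in the multivariate case, and the principal component score of curve $i$ was
$$f_i = \int \beta(s) x_i(s) \, ds,$$
where $\beta(s)$ was the weight function and $\beta_k(s)$ denoted the weight function of the $k$th principal component. With the curves $x_i$ centered by subtracting the mean function, the variance of the scores can be represented as
$$\frac{1}{N} \sum_{i=1}^{N} f_i^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \int \beta(s) x_i(s) \, ds \right)^2.$$
To calculate the first principal component, we just need to solve the following optimization problem:
$$\max_{\beta_1} \; \frac{1}{N} \sum_{i=1}^{N} \left( \int \beta_1(s) x_i(s) \, ds \right)^2 \quad \text{s.t.} \quad \int \beta_1(s)^2 \, ds = 1,$$
and the $k$th principal component can be calculated by the following optimization problem:
$$\max_{\beta_k} \; \frac{1}{N} \sum_{i=1}^{N} \left( \int \beta_k(s) x_i(s) \, ds \right)^2 \quad \text{s.t.} \quad \int \beta_k(s)^2 \, ds = 1, \quad \int \beta_k(s) \beta_m(s) \, ds = 0, \; m = 1, \ldots, k-1.$$
The covariance of $x(s)$ and $x(t)$ can be calculated by
$$v(s, t) = \frac{1}{N} \sum_{i=1}^{N} x_i(s) x_i(t).$$
The weight function of the functional principal components $\beta(s)$ needs to satisfy the following eigenequation:
$$\int v(s, t) \beta(t) \, dt = \lambda \beta(s),$$
where $\lambda$ was the eigenvalue and $\lambda_k / \sum_{m=1}^{N-1} \lambda_m$ meant the proportion of variability that the $k$th principal component accounted for. The left side of the eigenequation was an integral transform $V$ of the weight function $\beta$ with the kernel $v$, defined by
$$V\beta = \int v(\cdot, t) \beta(t) \, dt.$$
The covariance operator was denoted by $V$. Therefore, the eigenequation can be expressed as
$$V\beta = \lambda \beta.$$
This eigenequation can be solved through several methods, and the FPC score $f_i$ can then be calculated from its definition above.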
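One common way to solve the eigenequation numerically is to discretize the curves on a common grid, which turns the integral operator into a matrix eigenproblem. The sketch below takes that route with NumPy; it is not necessarily the method used in this study. The function name `fpca`, the uniform grid spacing `dt`, the 1/N normalization, and the synthetic curves are all assumptions made for illustration.

```python
import numpy as np


def fpca(curves, dt):
    """Discretized functional PCA on a uniform grid.

    curves: array of shape (N, K), N curves sampled at K points spaced dt apart.
    Returns the mean curve, eigenvalues (descending), weight functions (one per
    row, normalized so that sum(beta**2) * dt == 1), scores, and the proportion
    of variability explained by each component.
    """
    X = np.asarray(curves, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                # centered curves x_i(s)
    V = Xc.T @ Xc / X.shape[0]                   # covariance v(s, t) on the grid
    # The integral of v(s, t) * beta(t) dt becomes the matrix product (V * dt) @ b.
    eigvals, eigvecs = np.linalg.eigh(V * dt)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    betas = eigvecs[:, order].T / np.sqrt(dt)     # unit L2 norm as functions
    scores = Xc @ betas.T * dt                    # f_i = integral of beta(s) * x_i(s) ds
    explained = eigvals / eigvals.sum()
    return mean, eigvals, betas, scores, explained


# Synthetic curves (illustrative only): a strong daytime mode, a weaker second
# mode, and a small amount of noise.
rng = np.random.default_rng(0)
grid = np.arange(24.0)
curves = (
    rng.normal(0.0, 1.0, size=(60, 1)) * np.sin(np.pi * grid / 24)
    + rng.normal(0.0, 0.3, size=(60, 1)) * np.cos(2 * np.pi * grid / 24)
    + rng.normal(0.0, 0.05, size=(60, 24))
)
mean, eigvals, betas, scores, explained = fpca(curves, dt=1.0)
```

With this normalization, the mean squared score of each component equals its eigenvalue, which is what makes the eigenvalue ratios interpretable as shares of variability.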

Numeric Analysis
In this section, the numeric analysis was performed with the goal of understanding EV owners' charging behavior patterns. The focus was placed on three aspects: (1) variability analysis of the daily usage patterns of all EV charging stations, (2) variability analysis of the daily energy consumption of all EV charging stations, and (3) analysis of the usage pattern differences at the station level.

Daily Usage Pattern Variability Analysis.
To analyze the time-dependent usage pattern variabilities, the time-dependent occupancy of each day was calculated by aggregating all charging stations, so that in each year, a total of 365 curves were obtained, with each curve representing the occupancy of a day. Functional PCA was then applied to extract the FPCs from the dataset. It was observed that FPC1 accounted for 94% of the variance, and FPC2 accounted for 3%. Combined, they reflected 97% of the data's variability and were kept for further analysis. Figure 3 showed a way to look at the two FPCs and how they supported the unique analysis that FDA enabled. The X-axis represented the time (0-24 hours in a day), and the Y-axis represented the percentage of charging stations that were occupied at that time. The blue curve in both subfigures ((a) for FPC1 and (b) for FPC2) stood for the mean occupancy of all charging stations, while the green and red curves stood for the functions obtained by adding and subtracting one functional principal component. For example, in Figure 3(a), the green curve was generated by adding one FPC1 to the mean function represented by the blue curve, and the red curve was generated by subtracting one FPC1 from the mean function.
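The construction of the green and red curves can be sketched as follows. The text does not state how much of FPC1 is added or subtracted, so the scaling below (one standard deviation of the FPC1 scores, i.e., the square root of the eigenvalue) is an assumption, and the mean curve, weight function, and eigenvalue are made-up values.

```python
import numpy as np

hours = np.arange(24)

# Hypothetical mean occupancy curve and first weight function on an hourly grid.
mean_curve = 0.10 + 0.08 * np.exp(-((hours - 12) / 4.0) ** 2)
fpc1 = np.sin(np.pi * hours / 24)
fpc1 = fpc1 / np.sqrt(np.sum(fpc1 ** 2))  # unit norm on the grid (dt = 1 hour)
eigenvalue1 = 0.01                         # hypothetical variance of the FPC1 scores

# Assumed scaling: one standard deviation of the scores along FPC1.
scale = np.sqrt(eigenvalue1)
upper = mean_curve + scale * fpc1          # counterpart of the green curve
lower = mean_curve - scale * fpc1          # counterpart of the red curve
```

Plotting `mean_curve`, `upper`, and `lower` against `hours` shows at which times of day the component moves the curve most.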
The first principal component focused on the daytime between 7 am and 5 pm, which corresponded to the time that public charging stations were busiest in the day, especially on workdays. Therefore, the first FPC essentially distinguished between working days and nonworking days. This observation was directly supported by Figure 4, in which almost all weekdays (blue dots) were located on the right-hand side of the plot, indicating a higher FPC1 score (X-axis), while almost all weekend days (red dots) were located on the left-hand side of the plot with lower FPC1 scores. A few exceptions were identified in the plot and turned out to be holidays, such as Labor Day and Independence Day, so these were nonworking days as well.
The second FPC accounted for only 3% of the variability and mainly captured the variance in the nighttime, from midnight to 6 am and again from 6 pm to midnight. The days with higher usage after 6 pm and before 6 am and with slightly less or average usage in the daytime would receive higher scores. However, due to the dominance of FPC1, the effect of FPC2 was rather limited. Figure 5 presented the monthly and yearly charging usage patterns. The X-axis was the score of FPC1 and the Y-axis was that of FPC2. Figure 5(a) showed the monthly pattern with the colors standing for the 12 months, respectively. No clear monthly pattern was observed. Figure 5(b) showed the yearly pattern with the colors standing for the years from 2014 to 2019. Dots in 2014 were almost invisible due to the small data size and overlap with the pink color. The observation led to a clear pattern that as time went by, the scores of both FPC1 and FPC2 increased significantly.
That meant that for the days with a higher FPC1 score, the occupancy continued to increase at an ever-increasing speed, while for the days with higher FPC2 scores, the morning and evening usage also increased significantly.
This interpretation was in line with the rapid increase of EV ownership in Kansas City at a 78% year-over-year growth rate [21] and emphasized the need for more charging infrastructure in the region. Figure 6(a) shows the clustering result of the data. To make sure that similar data sizes are studied, data from 2016 to 2018 are selected for clustering. Compared with Figure 6(b), the result indicates that the data points in 2015 have lower FPC1 and FPC2 scores and are clearly separated from the other data points. However, the difference between 2017 and 2018 is less significant, which means that they have a similar occupancy pattern.

Daily Energy Consumption Variability Analysis.
This section aimed to analyze the energy consumption variability caused by EV charging, which had a significant impact on the power grid and was helpful for grid load management. Similar to the analysis above in Section 4.1, the time-dependent energy consumption of each day was calculated by aggregating all charging stations, so that in each year a total of 365 curves were obtained, with each curve representing the energy consumption of a day. Functional PCA was then applied to extract the FPCs from the dataset. It was observed that FPC1 accounted for 81% of the variance, and FPC2 accounted for 5%. Combined, they reflected 86% of the data's variability and were kept for further analysis. The results were shown in Figure 6. The FPCA of energy consumption resulted in some very different patterns when compared with daily occupancy. FPC1 mostly captured the variance of energy consumption during the morning peak between 7 am and 11 am. During this time range, the green curve increased dramatically, representing the days with a higher FPC1 score, for which the energy required to charge EVs in the morning would be higher. On the other hand, FPC2 mostly captured the variance of energy consumption during the evening peak between 4 pm and 9 pm. In other words, if a day was observed to have a higher FPC2 score, its impact on the power grid in the evening hours would be significantly increased.
A comparison between Figures 3 and 6 led to some interesting conclusions. While Figure 3 indicated that from an occupancy perspective, the peak period during the day started from as early as 7 am and did not end until 5 pm, Figure 6 suggested that the impact on the power grid became low after 11 am. This suggested that some vehicles did not leave the charging stations after they were fully charged, in which case the charging stations continued to be occupied (and thus unavailable to other EV drivers) but, from a power grid perspective, did not draw any energy. To validate this interpretation, the team went on to compare the charging event duration and the time the EV actually spent on charging. The finding was as follows: while 40% of EVs left the charging stations within 1 minute after they were charged, the remaining 60% of EVs continued to park at the charging stations for various durations, and among them, two-thirds (or 40% of the entire population) occupied the charging stations for at least another hour. While the discrepancy between EV owners' daily activities and the time needed for charging was understandable, the longer-than-reasonable parking behavior effectively reduced the availability of charging stations to other EV drivers and, in our view, called for parking regulation and enforcement.
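The comparison between total duration and charging time reduces to a simple calculation on the event log. The sketch below uses made-up session values; the tuple layout, the function name `idle_shares`, and the exact thresholds (idle time of at most 1 minute, or at least 60 minutes) are assumptions chosen to mirror the description above.

```python
# Hypothetical (total_duration_min, charging_time_min) pairs, illustrative only.
sessions = [(120, 119.5), (95, 30), (45, 44.8), (240, 60), (61, 60.5)]


def idle_shares(sessions, quick_exit_min=1.0, long_stay_min=60.0):
    """Share of sessions that end right after charging and share that overstay."""
    idle = [total - charging for total, charging in sessions]  # parked, not charging
    n = len(idle)
    left_quickly = sum(1 for minutes in idle if minutes <= quick_exit_min) / n
    stayed_long = sum(1 for minutes in idle if minutes >= long_stay_min) / n
    return left_quickly, stayed_long


left_quickly, stayed_long = idle_shares(sessions)
```

Applied to a real event log, the same two shares would correspond to the 40% and 40% figures reported in the text.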

Station Occupancy Variability Analysis.
Different from the above analyses performed from the daily perspective, the analysis in this section examined the occupancy at the station level. Each curve represented one charging station's 24-hour occupancy rates with all days aggregated, and a total of 455 curves (representing a total of 455 charging stations) were derived. Functional PCA was then applied to extract the FPCs from the dataset. It was observed that FPC1 accounted for 85% of the variance, and FPC2 accounted for 8%. Combined, they reflected 93% of the data's variability and were kept for further analysis. The results were shown in Figure 7.
In the morning before 6 am and in the evening after 5 pm, stations with higher FPC1 scores were utilized more often than average, while in the daytime, their utilization rates were lower. On the other hand, stations with higher FPC2 values were utilized more often than average in the first half of a day (before noon) but were used less often in the second half of a day (afternoon).
An intuitive guess was that these patterns might be attributed to differences in land use. As such, all 455 charging stations were mapped to five categories of land-use types: (1) recreational, which was meant to be used for the enjoyment of the people who used it, such as arts centers and theaters; (2) commercial, which was designated for businesses, warehouses, shops, and any other infrastructure related to commerce, such as plazas, hotels, and hospitals; (3) transport, which was built for the structures that help people get from one destination to another, such as airports; (4) industrial, such as plants and industrial parks; and (5) residential, such as apartments and condominiums. Then, the scores of FPC1 and FPC2 were plotted for each station. No clear patterns can be found in Figure 9(a), in which charging stations of all land-use types were plotted together. However, when they were separated, some conclusions can be drawn. (1) For charging stations that were built in residential (Figure 9(c)), transport (Figure 9(e)), and recreational (Figure 9(f)) areas, the majority of the dots in those subfigures had positive FPC1 scores. In other words, charging stations in these three categories shared a common pattern: they were used more often in the evening than in the daytime. Considering the nature of activities happening at these locations, this interpretation was consistent with what was observed in real life. (2) FPC1 values for commercial (Figure 9(b)) and industrial (Figure 9(d)) areas were mixed, and thus no clear pattern could be identified. (3) Charging stations in recreational areas, in general, had negative FPC2 scores, meaning that they were used more in the second half of the day than before noon. Again, this seemed to be in line with our understanding of human behavior patterns.

Conclusions
In this manuscript, the focus was placed on analyzing electric vehicles' charging behavior patterns with a functional data analysis approach, specifically, functional principal component analysis. Compared with the traditional discrete-based analysis models, the proposed FDA modeling approach had unique advantages in preserving the smooth functional behavior of the data, bringing more flexibility to the modeling process with few required assumptions and little background knowledge on independent variables, as well as the capability of handling time series data with different lengths or sizes. Five years of real-world charging event log data from a total of 455 charging stations in Kansas City, Missouri (KCMO), were used. The daily usage variability, daily energy consumption variability, and station-level usage variability were analyzed, with the goal of providing theoretical support for EV infrastructure planning and regulation, as well as power grid load management. In addition to the patterns revealed in the EV charging stations' occupancy and energy consumption, the differences between EV drivers' charging time and parking time were analyzed and pointed to the need for parking regulation and enforcement. The different usage patterns associated with charging stations of different land-use types were also analyzed.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure
This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.