CIMA: A Novel Classification-Integrated Moving Average Model for Smart Lighting Intelligent Control Based on Human Presence

Smart lighting systems utilize advanced data, control, and communication technologies and allow users to control lights in new ways. However, achieving user comfort, which should be the focus of smart lighting research, is challenging. One cause is that the passive infrared (PIR) sensor used to control artificial lighting detects human presence inaccurately. We propose a novel classification-integrated moving average (CIMA) model to solve the problem. The moving average (MA) increases the Pearson correlation (PC) coefficient of motion sensor features with human presence. The classification model provides smart lighting intelligent control based on these features. Several classification models are proposed and compared, namely, k-nearest neighbor (KNN), support vector machine (SVM), decision tree (DT), naïve Bayes (NB), and ensemble voting (EV). We build an Internet of things (IoT) system to collect movement data. It consists of a PIR sensor, a NodeMCU microcontroller, a Raspberry Pi-based platform, a relay, and LED lighting. With a sampling interval of 10 seconds and a collection period of 7 days, the system collected 56852 data records. In the PC test, movement data from the PIR sensor has a correlation coefficient of 0.36 with attendance, while the MA correlation with attendance can reach 0.56. In an exhaustive search for an optimum classification model, KNN has the best and the most robust performance, with an accuracy of 99.8%. It is more accurate than direct light control decisions based on motion sensors, which reach 67.6%. Our proposed method can increase the correlation of movement features with attendance. At the same time, an accurate and robust KNN classification model is applicable to human presence-based smart lighting control.


Introduction
Smart lighting systems utilize advanced data, control, and communication technologies and allow users to control lights in new ways [1]. Smart lighting products are already on the market, with global revenue of up to US$600 million in 2020 [2]. The main issue of smart lighting research is energy efficiency: up to 2021, 232 out of 384 papers on smart lighting tried to solve this problem [3]. The main targets for smart lighting installations are roads, offices, and housing [4]. Given the needs of such targets, user comfort and security also become important in smart lighting. However, achieving user comfort is still challenging because the passive infrared (PIR) sensor, a low-price movement sensor, inaccurately detects human presence to control artificial lighting [5].
A smart thing device such as smart lighting should be able to co-operate with its users and environment intelligently [6]. Gartner stated that intelligence is one of five key factors in smart lighting. Activity recognition is an example of intelligence implementation, where it detects human activity based on machine learning applications on several types of sensors [7]. Intelligence can also be applied to improve uncertainty problems in conventional control systems, hence creating an intelligent control system [8].
Several previous studies have tried to overcome the problem of motion sensors to improve accuracy for smart lighting intelligent control based on human presence. Jin et al. [9] used a time-series-artificial neural network (TS-ANN) on historical PIR sensor data and got up to 97% accuracy in predictive control based on human presence. Fakhruddin et al. [10] used activity recognition to detect five activities using four PIR sensors installed in a house with the principal component analysis-k-nearest neighbor (PCA-KNN) method and obtained an accuracy of 94%. Lupion et al. [11] conducted another activity recognition study that applies feature extraction from sliding windows to various sensor data and reaches 99.26% accuracy in detecting 14 activities using the random forest classification method. Park et al. [12] used reinforcement learning (RL) on the PIR sensor and several other sensors to obtain smart lighting that is adaptive to user needs and also energy-efficient.
Reconsidering [9,11], we can think of human presence as a type of activity. On the other hand, we can also consider historical data as a sliding window feature extraction. Following this intuition, a moving average (MA) can substitute for the sliding window feature extraction method. Usually, MA is a method for smoothing fluctuating data and, among others, can be used as a noise filtering method for time-series data [13]. In some research, MA is used to increase the Pearson correlation (PC) coefficient of machine learning features [14]. Furthermore, we can conduct a comprehensive test to find the optimum classification model. Several studies use well-known classical machine learning methods such as KNN, support vector machine (SVM), decision tree (DT), and naïve Bayes (NB) to train the classification model [15]. Other research also uses ensemble learning methods such as ensemble voting (EV) to improve the performance of the existing classical machine learning methods [16].
We propose a novel classification-integrated MA (CIMA) model to solve the problem. The MA increases the correlation of motion sensor features with human presence, while the classification model provides smart lighting intelligent control based on these features. We train the proposed classification model with KNN, SVM, DT, and NB. We also use ensemble learning methods such as EV to improve classical machine learning performance. An Internet of things (IoT) system is built on a test-bed environment to retrieve movement data from the PIR sensor. At the end-device layer, the microcontroller used is NodeMCU. We build a Node-Red server on the Raspberry Pi at the platform layer. It stores the movement data log in a comma-separated value (CSV) file. We use test parameters such as accuracy, precision, recall, and F1-score to discover the optimal classification model. In addition, to check the robustness of the model, the cross-validation method is used. The main contributions of our work are listed below:
(1) Increasing the correlation between movement data and human presence through the MA method.
(2) A novel classification model with significant accuracy compared with state-of-the-art research, utilizing the MA of movement data as a feature.
(3) An accurate yet low-price solution for human presence-based smart lighting control because of the utilization of motion sensors.
The remainder of this document is organized as follows: Section 2 presents works related to the research undertaken, Section 3 describes methods used in this research, Section 4 gives the results of the tests conducted, Section 5 reports the results and compares them with state-of-the-art studies while highlighting the contributions provided by our work, and finally, Section 6 emphasizes the important findings of this study.

Related Works
Several studies have discussed automatic smart lighting control using the PIR sensor. Jin et al. [9] aimed to improve the accuracy of PIR sensors using the time-series-artificial neural network (TS-ANN) method and compared several features such as time, occupied ratio, time steps, and historical occupied state data. The study showed that the proposed method can provide up to 97% accuracy for the intelligent control. Putrada et al. [17] used a hierarchical hidden Markov model (HHMM) to classify five different types of activities from four PIR sensors to control smart lighting in offices. The HHMM model tested is better than the hidden Markov model (HMM), NB, and KNN methods and has an accuracy of 87.6%. Ramadhan et al. [5] also used the HHMM method on 14 different activities from five PIR sensors. The accuracy of HHMM was 93%, and the method was superior to HMM. Fakhruddin et al. [10] used activity recognition to detect five activities using four PIR sensors installed in a house with the principal component analysis-k-nearest neighbor (PCA-KNN) method and obtained an accuracy of 94%. Each study investigates a different number of activities and obtains varying performance. There is an opportunity to find a correlation between the number of activities and performance in using PIR sensors for activity recognition in smart lighting. Other factors are also opportunities for investigation.
Furthermore, other studies also conducted smart lighting control but with devices or sensors other than the PIR sensor. Dai et al. [18] used five low-resolution cameras to detect nine activities in a smart lighting environment. The study provided a solution that ensures privacy even when using a camera, while the accuracy is up to 89.6%. Chun et al. [19] used a depth camera to detect four human activities in a room. The proposed method provided 100% accuracy for the location where people are and 78.3% accuracy for the type of performed activity. Lupion et al. [11] used PIR sensors and also several different sensors including smartwatches and real-time location systems. The research produced 99.26% accuracy in detecting 14 activities using a random forest classification method. Park et al. [12] used a light sensor and actuators such as Switchmate. Switchmate consists of a motor and a position sensor to control and monitor a conventional light switch. The average light utility ratio (LUR) of the research is 67%. The performance of the studies mentioned varies from inadequate to highly adequate. However, the equipment used is expensive when compared to the PIR sensor, which costs around US$1. There is an opportunity to implement an accurate and low-priced solution using CIMA and PIR sensors.
Several previous studies have applied MAs for smoothing and increasing the PC between two variables.

Complexity
Husnayain et al. [20] used MA to increase the correlation between the incidence of dengue fever and Google search activities for dengue and found that the correlation between the two was very high. Hu et al. [21] utilized MAs to reduce noise in water pH and water temperature data to improve the correlation of the two with other water quality data and provide better performance in mariculture water quality forecasting. Peng et al. [22] used MA to increase the PC between drought and flood to predict the occurrence of these two disasters in China. Badr et al. [23] showed that the correlation between the mobility ratio and growth rate ratio increased as the MA window size increased but slowly decreased when the window size was too large. Singh et al. [24] used MAs to refine CO2 sensor readings and improved the correlation of sensor data with respiratory rate and Hjorth activity in a cardiorespiratory assessment. The results of the mentioned studies show that there is an opportunity to apply MA to movement data to increase the PC coefficient for an accurate classification model.

Research Methodology.
This section discusses the research methodology, from how the test data were collected to how we obtained the final model. The methodology for developing a classification model to predict human presence is shown in Figure 1.
The PIR sensor is one of the most utilized sensors in smart lighting control [25]. We build an IoT system with PIR sensors to collect human movement data. Each movement record is labeled according to whether people are present at that moment. The system stores the data in a CSV file for further analysis. The next step is to apply the MA and observe the PC coefficient. The data is then prepared for classification training with the KNN, SVM, DT, and NB methods. The possibility of applying EV to improve the performance of the classification model is analyzed later. The last step is to analyze the most optimum model and perform cross-validation to check for possible overfitting.

Smart Lighting IoT System.
The IoT architecture of the smart lighting system for automatic light control based on human presence is shown in Figure 2. We chose a living room as a test-bed environment to implement the proposed architecture.
In the proposed IoT architecture, there are three main layers, namely, the end-device layer, platform layer, and application layer [26]. Then, there are additional communication protocols and gateways that connect the three layers. At the end-device layer, the layer directly related to the IoT hardware, the three main devices are PIR sensors, NodeMCU, and relays. The PIR sensor detects human movement [27]. NodeMCU has a system on chip (SoC), ESP8266, which includes a microcontroller and WiFi communication [28]. WiFi is used for communication between the end-device layer and platform layer [29]. The relay is an actuator connected to the LED light [30]. Its function is to turn the LED on and off like a switch controlled via the microcontroller.
We build the platform layer on a Raspberry Pi (Raspi), an open-source mini-personal computer (mini-PC) running the Raspbian operating system (OS) [31]. We use Node-Red for web service functions. Node-Red is also an open-source web service based on Node.js, which has a special add-on for IoT systems [32].
Node-Red dumps the movement sensor data log to a CSV file used for training the classification model. Raspi can also be used to run Python functions [33]. Hence, the classification model running in Python can be executed on this server. The application layer is concerned with the interaction between the system and the user. Users can use the Python-based graphical user interface (GUI) to set the light status manually or automatically. Especially for testing, the user can also choose to control lights with the novel method or the conventional method, which allows comparing the comfort of the new system with that of the legacy system. The platform layer links to the application and end-device layers via the Internet and the hypertext transfer protocol (HTTP) application programming interface (API).
The device is a single set and detects the presence of one person at one location in one room. A chart depicting the placement of devices in a room is shown in Figure 3. The PIR sensor, NodeMCU, and relay are on the ceiling as part of the end-device. The PIR sensor is placed approximately above where humans conduct activities, for example, working. The end devices, especially the relay, are connected to the LED light. The LED light is on the ceiling in the middle of the room. The NodeMCU receives motion sensor data, controls the LED light via relays, and communicates with the IoT platform via WiFi. A wall-mounted WiFi-4G router connects the WiFi network with the Internet.
The motion detection distance from the sensor is 10 meters forward. In addition, the PIR sensor has a capture range as wide as 110°. Figure 4 shows the coverage area of the PIR sensor when placed on the ceiling. If, for example, the room's height is 2.4 m, with the range described previously, then the coverage area will form a cone with a base diameter of 5 m and a base radius of 2.5 m. Hence, the area of the cone base is approximately 20 m². The proposed smart lighting system hypothetically considers a person present if the person is within that space.

Moving Average.
As the name suggests, MA is a method of averaging time-series data in which a certain period of data (called data points) is averaged continuously and moves along the data series [34]. The number of data points is denoted as $N$. Applying the MA results in a smoother data series [35]. Due to this nature, scientists and analysts utilize MAs in cases involving fluctuating data such as financial data, stock predictions, and signal filters [36,37]. The MA formula for $N$ values is as follows:

$$\mathrm{MA}(n) = \frac{1}{N} \sum_{i=n-N+1}^{n} p_i, \tag{1}$$

where $p_i$ is the $i$th value of the data series in the range $n - N + 1$ to $n$ and $\mathrm{MA}(n)$ is the MA at $p_n$. The MA for $N$ values at the following data point $(n + 1)$ can use the following formula:

$$\mathrm{MA}(n+1) = \mathrm{MA}(n) + \frac{p_{n+1} - p_{n-N+1}}{N}. \tag{2}$$

We also introduce a novel theorem for $\mathrm{MA}(n - 1)$. The description is given in Theorem 1. This theorem is a low-level solution that simplifies the complexity of our real-time system in calculating the MA. It considers the property of the data structure in use.
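The windowed average and its one-step update can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample readings are hypothetical, and the update step is the standard sliding-window form (add the newest sample, drop the oldest, divide the difference by N), which is what the recursive formula for MA(n + 1) is assumed to express.

```python
from collections import deque


def moving_average_direct(p, n, N):
    """Mean of the N values p[n-N+1] .. p[n] (0-based n), as in the MA definition."""
    window = p[n - N + 1 : n + 1]
    return sum(window) / N


def moving_average_stream(p, N):
    """Yield MA(n) for n = N-1 .. len(p)-1 using the one-step update."""
    window = deque(p[:N], maxlen=N)      # the N most recent samples
    ma = sum(window) / N                 # first full window, computed directly
    yield ma
    for new in p[N:]:
        oldest = window[0]               # sample about to leave the window
        window.append(new)               # maxlen makes the deque drop `oldest`
        ma += (new - oldest) / N         # MA(n+1) = MA(n) + (new - oldest) / N
        yield ma


readings = [0, 1, 1, 0, 0, 1, 0, 0]      # made-up binary PIR samples
print(list(moving_average_stream(readings, N=4)))
```

The streaming variant keeps only N samples in memory and does constant work per new reading, which is the kind of saving a real-time system on a small device would care about.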

Theorem 1.
If $\mathrm{MA}(n + 1)$ is given by equation (2), then $\mathrm{MA}(n - 1)$ is given by the following formula:

$$\mathrm{MA}(n-1) = \mathrm{MA}(n) - \frac{p_n - p_{n-N}}{N}.$$

Proof. Consider a signal $p$, for which $\mathrm{MA}(n + 1)$ is given by equation (2). Suppose $n = m - 1$; substituting $n$ with $m - 1$ in equation (2) yields the following formula:

$$\mathrm{MA}(m) = \mathrm{MA}(m-1) + \frac{p_m - p_{m-N}}{N}.$$

Then, the formulas yield

$$\mathrm{MA}(m) - \frac{p_m - p_{m-N}}{N} = \mathrm{MA}(m-1).$$

Moving the term $\mathrm{MA}(m - 1)$ to the left side of the equation and substituting back $m$ with $n$ yield

$$\mathrm{MA}(n-1) = \mathrm{MA}(n) - \frac{p_n - p_{n-N}}{N}. \qquad \square$$

Classification Models.
Assuming our hypothesis on MA is correct, we carry out a comprehensive test to find the optimum classification model for determining attendance based on the novel movement data. The classification methods used are KNN, SVM, DT, and NB. The ensemble learning method can also improve the performance of conventional classification methods. Here we propose EV to combine several classical classification models.
KNN is a type of supervised machine learning that makes decisions based on the $k$ training examples closest to a data point whose class is unknown [38]. One way to measure the distance between a data point and the training dataset is the Euclidean distance. The formula for calculating the distance in KNN with Euclidean distance is as follows:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},$$

where $x$ is a training example, $y$ is the data to be classified, and $n$ is the number of features in the dataset.

A data structure holds the distances between $y$ and all training examples $x$. The $k$ training examples closest to $y$ are moved to a new data structure. From these $k$ training examples, the algorithm chooses the most frequent class (calculated with a mode function) as the class of $y$. Varying the $k$ value influences the KNN model performance. Hence, a further test finds the optimum $k$ value.
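The procedure just described (compute all distances, keep the k closest, take the mode of their classes) can be sketched as follows. The function names and the toy training set of (payload, MA) pairs are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_x, train_labels, y, k=1):
    """Classify y by the most frequent label among its k nearest training examples."""
    # Distance from y to every training example, paired with that example's label.
    distances = sorted(
        (euclidean(x, y), label) for x, label in zip(train_x, train_labels)
    )
    nearest = [label for _, label in distances[:k]]   # labels of the k closest
    return Counter(nearest).most_common(1)[0][0]      # mode of those labels


# Made-up (payload, MA) features with presence labels.
train_x = [(0, 0.02), (0, 0.05), (1, 0.40), (0, 0.35), (1, 0.50)]
train_labels = [0, 0, 1, 1, 1]

print(knn_predict(train_x, train_labels, (0, 0.33), k=1))
print(knn_predict(train_x, train_labels, (0, 0.03), k=3))
```

A brute-force scan like this costs one distance computation per training example for every prediction; library implementations usually add an index structure, which this sketch leaves out.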
SVM is an example of supervised machine learning that uses margins to classify [39]. The classification method is to create a hyperplane to separate the different classes in the dataset [40]. Several kernels determine which hyperplanes can be created, including polynomial, radial basis function (RBF), and sigmoid. Polynomial kernels can have different degrees. The linear kernel is considered a first-degree polynomial kernel. The formula for the SVM polynomial kernel of degree $d$, including the linear kernel, is as follows:

$$K(x, x') = \left(x^{\top} x' + r\right)^{d},$$

where $x$ and $x'$ are vectors in the input space and $r$ is a free parameter. The RBF kernel is one of the most used kernels [41]. The kernel's formula for the two vectors $x$ and $x'$ is as follows:

$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right),$$

where $\lVert x - x' \rVert^{2}$ is the squared Euclidean distance between $x$ and $x'$ and $\sigma$ is a free parameter [42]. The sigmoid kernel formula for the two vectors $x$ and $x'$ is as follows:

$$K(x, x') = \tanh\left(\gamma\, x^{\top} x' + c\right),$$

where $\gamma$ is a free parameter with a value greater than 0 and $c$ is a free parameter with a value less than 0. If the dataset is linearly separable, then the suitable kernel is a linear kernel. However, if the dataset is not linearly separable, the solution is whichever of the polynomial (several degrees $d$ are usable), RBF, or sigmoid kernels fits best.
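For readers who prefer code to notation, the three kernels can be written as plain Python functions. This is a sketch that assumes the common textbook forms (polynomial with offset r and degree d, Gaussian RBF with width sigma, and a tanh-based sigmoid with a slope named `gamma` here and an offset c); the default parameter values are arbitrary.

```python
import math


def dot(x, x2):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, x2))


def polynomial_kernel(x, x2, d=3, r=1.0):
    """(x . x' + r) ** d; d=1 with r=0 reduces to the linear kernel."""
    return (dot(x, x2) + r) ** d


def rbf_kernel(x, x2, sigma=1.0):
    """exp(-||x - x'||^2 / (2 sigma^2)); equals 1 when x == x'."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x2))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def sigmoid_kernel(x, x2, gamma=0.5, c=-1.0):
    """tanh(gamma * x . x' + c), with gamma > 0 and c < 0."""
    return math.tanh(gamma * dot(x, x2) + c)


u, v = (1.0, 2.0), (3.0, 4.0)
print(polynomial_kernel(u, v, d=1, r=0.0), rbf_kernel(u, v), sigmoid_kernel(u, v))
```

Each function returns a similarity score for a pair of points; an SVM only ever sees the data through such scores, which is why swapping the kernel changes the shape of the decision boundary without changing the training procedure.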
The SVM classification function is as follows:

$$f(x) = \operatorname{sgn}\left(\sum_{i} a_i\, y_i\, K(x_i, x) + b\right),$$

where $a_i$ is the Lagrange multiplier, $y_i$ is the class label of training example $x_i$, and $b$ is the intercept.
The DT is a classification model which is essentially a binary tree, where each branch in the tree is an ordinary if-else decision [43]. However, the if-else decision comes from a training process through several stages [44]. The two most common types of DTs are iterative dichotomiser 3 (ID3) and classification and regression tree (CART) [45]. The main difference between the two is that ID3 can only be used for classification, while CART can be used for classification as well as regression [46]. The CART formation uses a calculation of the Gini index of each feature.
The Gini index describes the inequality value of a feature [47]. The lower the Gini index value, the better the feature is for making decisions. The Gini index formula is as follows:

$$\mathrm{Gini}(p) = 1 - \sum_{i=1}^{J} p_i^{2},$$

where $p$ is the feature index, $p_i$ is the fraction of the feature $p$ with the label $i$, and $J$ is the number of labels present. If, after the decision, the resulting class is still not uniform, then the process of calculating the Gini index for that branch is repeated for other features. The process is iterative until all branches produce a uniform class or have reached the max depth limit. Max depth is the farthest distance from the root to a leaf. The max depth value is usually limited to prevent overfitting.
NB classifies with the concept of the Bayes theorem, which estimates the probability of a hypothesis for events that have not yet been observed [48]. NB is an efficient algorithm because each variable is treated as independent. The following is the formula used for the classification of NB:

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)},$$

where $x$ is the data to be classified, $c$ is the hypothesis that the data belongs to a class, and $P(c \mid x)$ is the a posteriori probability of $c$ given $x$.
Ensemble learning is a method of combining several learning models where the results are usually better than if only one of its members is used [49]. The downside of ensemble learning is that the algorithm is usually more computationally heavy [50]. EV is a type of ensemble learning in which, by utilizing several models from several different methods, EV selects the answer given by the largest number of models [51]. EV can exploit the peculiarity of each member's classification model so that the advantages of each model can be seen in the results of the ensemble [52]. In hard EV, the formula used is as follows:

$$z = \operatorname{mode}\{X_1(y), X_2(y), \ldots, X_a(y)\},$$

where $z$ is the classification result of the EV, $X$ is each classification model, $a$ is the number of classification models used, and $y$ is the data to be classified.
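As a small worked example of the Gini index, the sketch below computes it for a list of class labels and for a candidate binary split. The helper names are hypothetical, and the size-weighted average over the two branches is the usual CART convention, assumed here rather than taken from the paper.

```python
from collections import Counter


def gini_index(labels):
    """1 minus the sum of squared class fractions; 0 means a uniform class."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(left_labels, right_labels):
    """Gini index of a binary split, weighting each branch by its size."""
    n = len(left_labels) + len(right_labels)
    return (
        len(left_labels) / n * gini_index(left_labels)
        + len(right_labels) / n * gini_index(right_labels)
    )


print(gini_index([0, 1, 0, 1]))           # evenly mixed branch
print(weighted_gini([0, 0], [1, 1]))      # split that separates the classes
```

A tree builder would evaluate `weighted_gini` for every candidate split and keep the one with the lowest value, repeating on each branch until the branches are uniform or the max depth is reached.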
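Hard voting itself is only a mode over the members' answers, as the following sketch shows. The three threshold "models" are made-up stand-ins for trained classifiers, and the rule for breaking ties (the label seen first wins) is an assumption that matters only when the number of members is even.

```python
from collections import Counter


def hard_vote(models, y):
    """Return the label predicted by the largest number of models for input y."""
    votes = [model(y) for model in models]
    # most_common keeps first-seen order among equal counts, so ties go to
    # the label voted by the earliest model in the list.
    return Counter(votes).most_common(1)[0][0]


# Hypothetical members: each maps a record to 1 (present) or 0 (absent).
models = [
    lambda y: int(y["ma"] > 0.10),
    lambda y: int(y["ma"] > 0.20),
    lambda y: int(y["payload"] == 1),
]

print(hard_vote(models, {"payload": 0, "ma": 0.25}))
print(hard_vote(models, {"payload": 1, "ma": 0.05}))
```

In the first call the two MA-based members outvote the raw motion reading; in the second they outvote a single spurious motion event, which is the kind of complementary behavior voting is meant to exploit.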

Evaluation Metrics.
PC measures the linear correlation between two datasets [53]. The usual denotation for PC is the letter $r$, and the PC formula between data $x$ and data $y$ is as follows:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}},$$

where $n$ is the number of records in the dataset and $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$ [54]. The range of the calculated values for the PC formula is -1 to 1. There are several interpretations of the results of the PC. A negative result means that the $x$ and $y$ datasets have a negative correlation, whereas if the result is positive, the $x$ and $y$ data have a positive correlation. If $0.5 < |r| < 1.0$, then there is a moderate to strong correlation between $x$ and $y$. If $0.0 < |r| < 0.5$, there is no correlation, there is a non-linear correlation, or there is a low correlation between $x$ and $y$ [55].
PC is useful for feature selection in machine learning. Features with a moderate to strong correlation with the label usually pass the selection and continue to the training stage of machine learning. Features that have no correlation or low correlation are eliminated and do not continue to the training stage [56].
The confusion matrix forms a quadrant for models with binary classification, which only involves two output values. In that quadrant, the rows hold data with actual positive output and data with actual negative output. The columns hold data with predicted positive output and data with predicted negative output. Each cell in the quadrant is an intersection between the sets of each row and each column, resulting in four possible outcomes: True Positive (TP), False Negative (FN), True Negative (TN), or False Positive (FP). The confusion matrix results show a model's predictive ability and strengthen the explanation of its accuracy, precision, recall, and F1-score results.
Accuracy is the ability of a model to predict data correctly. The accuracy formula is as follows:

$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}.$$

Accuracy can only measure the ability of a model to predict the correct data but cannot describe the specific capabilities of a model in making predictions. Therefore, other metrics such as precision, recall, and F1-score are used.
Precision shows the ability of a model to sort the negative class from the positive class. The precision formula is as follows:

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}.$$

Recall shows the ability of a model to predict the positive class. In some cases, accuracy is mistaken for recall, whereas in imbalanced data, recall gives a true picture of the model's ability to predict positive classes. The recall formula is as follows:

$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}.$$

F1-score is a value that describes a combination of precision and recall capabilities. The F1-score is different from the ordinary average because it uses the concept of a harmonic average. Even though it combines precision and recall, the F1-score value is usually different from accuracy. The F1-score formula is as follows:

$$\mathrm{F1} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

Sometimes a model can experience overfitting, which is a condition when the model produces good performance on training but poor performance on validation [57]. The characteristic of an overfitting model is that it has high variance and low bias [58]. High complexity is another trait of an overfitting model. The cross-validation method can examine models with high complexity. In K-fold cross-validation, the method divides training data into several random subsamples of the same size. A fold is the term for each subsample, where $K$ is the number of subsamples. After division, the method performs $K$ iterations. It uses one different fold as validation data in each iteration and the rest as train data. In each iteration, accuracy or another performance metric evaluates the model. At the end of execution, the average accuracy over all iterations becomes the final result of the cross-validation evaluation [59]. The complete process of K-fold cross-validation is given in Algorithm 1.
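The PC coefficient is straightforward to compute directly. The sketch below uses the mean-centered form of the Pearson formula; the function name and the sample values are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either sequence is constant (r is undefined).
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))    # perfectly aligned series
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))    # perfectly opposed series
```

Applied to a sensor feature and a 0/1 presence label, the same function gives the kind of coefficient the interpretation bands above refer to.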
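Counting the four confusion matrix cells, and deriving the four metrics this section uses from them, takes only a few lines. The sketch assumes labels are encoded as 1 (present) and 0 (absent); the function names and the five sample predictions are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FN, TN, FP) for binary labels encoded as 1 and 0."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1          # present, predicted present
        elif a == 1 and p == 0:
            fn += 1          # present, predicted absent
        elif a == 0 and p == 0:
            tn += 1          # absent, predicted absent
        else:
            fp += 1          # absent, predicted present
    return tp, fn, tn, fp


def scores(tp, fn, tn, fp):
    """Accuracy, precision, recall and F1-score from the four cell counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
print(confusion_counts(actual, predicted))
print(scores(*confusion_counts(actual, predicted)))
```

The sketch does not guard against empty denominators (for example, a model that never predicts the positive class), which a production version would need to handle.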
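The K-fold procedure can be sketched independently of any particular classifier. In the code below, `train` and `predict` are caller-supplied functions, and the threshold learner and the ten sample values are made-up placeholders used only to make the example run; the interleaved fold assignment is one simple way to obtain folds of near-equal size.

```python
import random


def k_fold_accuracy(features, labels, K, train, predict, seed=0):
    """Average validation accuracy over K folds."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)              # random subsamples
    folds = [indices[k::K] for k in range(K)]         # K near-equal folds
    accuracies = []
    for k in range(K):
        held_out = folds[k]                           # validation fold
        held_out_set = set(held_out)
        rest = [i for i in indices if i not in held_out_set]   # train data
        model = train([features[i] for i in rest], [labels[i] for i in rest])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / K


def train_threshold(xs, ys):
    """Toy learner: midpoint between the class means of a single feature."""
    ones = [x for x, y in zip(xs, ys) if y == 1]
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


ma_values = [0.0, 0.02, 0.05, 0.03, 0.4, 0.5, 0.45, 0.35, 0.01, 0.42]
presence = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(k_fold_accuracy(ma_values, presence, 5, train_threshold, predict_threshold))
```

Because every record is used for validation exactly once, a large gap between training accuracy and this average is a practical sign of the overfitting described above.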

IoT Implementation and Data Collection.
With the IoT architecture as described in Subsection 3.2, we implement the proposed human presence-based smart lighting control. Parts of the implementation are shown in Figure 5. The main parts of the implementation are PIR sensors, NodeMCU, 4G-WiFi router, Raspberry Pi, Google Sheets, LED lighting, and relays. Google Sheets is used to monitor the sensed movement data. Moreover, the Raspberry Pi saves a CSV file containing movement data.
Data collection begins after the smart lighting system with the PIR sensor is successfully implemented. Movement data is sampled every 10 seconds and collected for seven days in one test area. During that period, the system collected 56852 data records. The data consists of movement data with binary values. A value of 1 means the PIR sensor detects movement, while 0 means there is no movement. Each record is labeled manually. The label describes the presence of people in the room and is filled in based on the presence of a subject in the room. The specification of movement data collection is given in Table 1.
A line plot can visualize how sensor data capture human movement and how the data looks compared to the actual human presence in the room. A partial snippet of the movement dataset with attendance labels is shown in Figure 6. The snippet shows data from two days out of seven days of data collection. The line plot shows that the PIR sensor often reports 0 even though a subject is present. This is not because the PIR sensor is inaccurate but because PIR sensors can only detect movement, and subjects are not always moving while they are present. If a smart lighting system directly uses the PIR sensor results for light control, the lights will turn on and off while people are still present, which disturbs people's comfort.

Moving Average Application.
The intuition is that applying MA to the movement data results in a curve that correlates more strongly with human presence than the raw movement data does. Visualization in the form of a line plot can help illustrate this intuition. The line plot of MA results, movement data, and human presence is shown in Figure 7. Movement data of Day 1 goes through an MA with N = 200. The plot shows that the MA curve rises when people are present and approaches 0 otherwise. However, it does not fully resemble human presence. The PC, calculated according to equation (16), quantifies the closeness of the MA value to human presence data. We create five MA curves with different N values and observe which curve has the strongest PC coefficient. A matrix showing the PC of movement, five types of MAs, and human presence is shown in Figure 8. Payload is the feature name for movement data. The last row of the matrix shows the PC coefficient of presence with each feature. The highest value is 0.56, which is the MA at N = 200. Based on the interpretation, the curve has a moderate positive correlation with human presence. In comparison, the payload correlation is 0.36, which classifies as a low positive correlation.
A line plot can illustrate the trend of the PC coefficient as N increases. The line plot is shown in Figure 9. The green line is the growth of the PC based on the increasing N values of the MA, while the red line is the PC of raw movement data. The MA method can increase the PC coefficient of movement features. However, using too large a number of data points will decrease the correlation. The optimum value is 200.
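The window sweep described above can be reproduced in outline on synthetic data. The sketch below is not the paper's dataset or code: the presence pattern, the rule that motion fires on every seventh sample while someone is present, the choice to treat samples before the start of the series as 0, and the window sizes are all made up for illustration, so the printed coefficients will not match the reported 0.36 and 0.56.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def trailing_ma(p, N):
    """MA over the last N samples; samples before the series start count as 0."""
    out, running = [], 0
    for i, value in enumerate(p):
        running += value
        if i >= N:
            running -= p[i - N]       # drop the sample that left the window
        out.append(running / N)
    return out


# Synthetic stand-in data: absent, present, absent.
presence = [0] * 200 + [1] * 300 + [0] * 200
payload = [1 if present and i % 7 == 0 else 0 for i, present in enumerate(presence)]

print(f"raw    PC={pearson(payload, presence):.2f}")
for N in (5, 20, 50, 400):
    print(f"N={N:<4} PC={pearson(trailing_ma(payload, N), presence):.2f}")
```

A short window fills the gaps between sparse motion events, while a window longer than the presence period smears the signal into the following absence, which mirrors the rise-then-fall trend discussed for Figure 9.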

Training Classification Models.
Because the MA of movement data has a moderate positive correlation with human presence, training a machine learning method with the new feature can hypothetically result in a model with good performance. In the model to be trained, the proposed input features are motion sensor data and some MA curves with different N values. The output class is human presence, with labels 1 for a human being present and 0 for no human present. It means that the type of classification is binary classification. We carry out an exhaustive test to find the optimum classification model for human presence based on movement data. The classification methods used are KNN, SVM, DT, and NB. EV is also applied to improve the performance of some of the mentioned methods. The training process uses 50% of the dataset, while the testing stage uses the rest. It means there are 23436 training data and 23436 testing data. The dataset is shuffled prior to the data split to prevent uneven distribution. The test metrics are accuracy, precision, recall, and F1-score. Table 2.
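The shuffle-then-split step can be sketched as below. The function name, the fixed seed, and the ten placeholder records are assumptions for illustration; the paper does not state how its shuffle was seeded or implemented.

```python
import random


def shuffled_half_split(records, seed=42):
    """Shuffle a copy of the records, then split it 50/50 into train and test."""
    shuffled = list(records)                 # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed keeps the split repeatable
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


records = list(range(10))                    # placeholder for (features, label) rows
train_part, test_part = shuffled_half_split(records)
print(len(train_part), len(test_part))
```

Shuffling before splitting matters for time-ordered sensor logs: a plain first-half and second-half split would put different days, and possibly very different presence patterns, into the two sets.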
In KNN, k describes the number of neighbors involved in calculating the closest distance between test and training data. A test of varying k finds the optimum KNN model. Changes in the value of accuracy, precision, recall, and F1-score with the increase of k in KNN training are shown in Figure 10. The graph compares the performance of the KNN model from k = 1 to k = 5. The values of precision and recall fluctuate, while the F1-score and accuracy values have a decreasing trend. Based on these tests, we conclude that k = 1 is the value for the optimum KNN model.
In SVM, the right type of kernel provides the optimum model. The kernel types compared are the linear kernel, 2nd-degree polynomial, 3rd-degree polynomial, RBF, and sigmoid. A comparison of the performance of the SVM classification model with the five kernels is shown in Figure 11. The bar chart compares four performance values: accuracy, precision, recall, and F1-score. In all four metrics, sigmoid has the lowest performance. The 2nd-degree polynomial has the highest recall but not the highest F1-score. The highest F1-score and accuracy go to the 3rd-degree polynomial and RBF. However, the precision of the 3rd-degree polynomial is lower than that of RBF. Hence, the RBF kernel provides the optimum SVM model.
Choosing the right model depends on understanding the data [60]. Section 3.4 explained that the selection of the SVM kernel depends on whether the data is linearly separable. The amount and type of data also affect the selection of the model. A scatter plot matrix, which is often used as a tool for understanding high-dimensional data [61], helps to understand the data better. A visualization of the dataset as a scatter plot matrix is shown in Figure 12. It shows that no pair of features is linearly separable, which explains why the linear kernel does not produce an optimum SVM model. Moreover, if the data is not linearly separable and the RBF kernel performs better than the other kernels, then the data is radially separable. In two dimensions, the binary classes form a doughnut shape, with one of the classes in the doughnut hole [62].
Algorithm 1: K-fold cross-validation.
(1) Divide data into K equal folds
(2) for k = 1 to K do
(3)   R ← fold k of data
(4)   T ← data \ R
(5)   Train the model on T
(6)   $\mathrm{Acc}_k$ ← evaluate R with the trained model
(7) end for
(8) Acc ← $\frac{1}{K} \sum_{k=1}^{K} \mathrm{Acc}_k$
A max depth value that is too high in training can result in an overfitting DT model. The symptom of overfitting is that, as the depth grows, the accuracy on the training data keeps increasing, whereas the cross-validation accuracy decreases or stagnates. Pruning is a solution to prevent overfitting. With the early stop method of pruning, adding depth to the tree stops when the cross-validation value starts to drop [63]. The effect of increasing DT max depth on the accuracy of training data and validation data is shown in Figure 13. The orange line in the graph is the model's accuracy on the training data, while the blue line is the average accuracy from cross-validation. Beyond max depth = 12, the cross-validation accuracy decreases, so max depth = 12 is considered to provide the optimum DT model.
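As a rough illustration of the early-stop rule, the sketch below grows the max depth of a toy decision tree and stops when the mean K-fold cross-validation accuracy first drops. Everything here is an assumption made for the example: the one-feature greedy tree, the synthetic noisy data, the choice K = 5, the depth limit of 8, and all helper names. The paper's DT and dataset are not reproduced.

```python
import random
from collections import Counter


def majority(rows):
    """Most frequent label in rows of (x, label)."""
    return Counter(label for _, label in rows).most_common(1)[0][0]


def errors(rows):
    """Mistakes made when predicting the majority label for every row."""
    return len(rows) - Counter(label for _, label in rows).most_common(1)[0][1]


def build_tree(rows, max_depth, depth=0):
    """Greedy one-feature tree: split at the threshold with the fewest errors."""
    xs = sorted({x for x, _ in rows})
    if depth == max_depth or errors(rows) == 0 or len(xs) < 2:
        return ("leaf", majority(rows))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [r for r in rows if r[0] <= thr]
        right = [r for r in rows if r[0] > thr]
        cost = errors(left) + errors(right)
        if best is None or cost < best[0]:
            best = (cost, thr, left, right)
    _, thr, left, right = best
    return ("node", thr,
            build_tree(left, max_depth, depth + 1),
            build_tree(right, max_depth, depth + 1))


def predict(tree, x):
    """Walk from the root to a leaf and return its label."""
    while tree[0] == "node":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]


def accuracy(tree, rows):
    return sum(predict(tree, x) == label for x, label in rows) / len(rows)


def k_fold_accuracy(rows, K, max_depth):
    """Mean held-out accuracy over K folds."""
    folds = [rows[i::K] for i in range(K)]
    fold_acc = []
    for k in range(K):
        held_out = folds[k]
        training = [r for j, fold in enumerate(folds) if j != k for r in fold]
        fold_acc.append(accuracy(build_tree(training, max_depth), held_out))
    return sum(fold_acc) / K


def early_stop_depth(rows, K, depth_limit):
    """Increase max depth until the cross-validation accuracy first drops."""
    chosen, prev = 1, k_fold_accuracy(rows, K, 1)
    for depth in range(2, depth_limit + 1):
        acc = k_fold_accuracy(rows, K, depth)
        if acc < prev:  # validation accuracy starts to drop: stop growing
            break
        chosen, prev = depth, acc
    return chosen, prev


rng = random.Random(1)  # fixed seed so the sketch is repeatable


def make_row():
    x = rng.randint(0, 40)      # hypothetical motion count in a window
    label = int(x >= 12)        # "present" when the count is high
    if rng.random() < 0.1:      # label noise gives deep trees something to overfit
        label = 1 - label
    return x, label


rows = [make_row() for _ in range(200)]
depth, cv_acc = early_stop_depth(rows, K=5, depth_limit=8)
train_acc = [accuracy(build_tree(rows, d), rows) for d in range(1, 9)]
print("chosen max depth:", depth, "CV accuracy:", round(cv_acc, 3))
print("training accuracy by depth:", [round(a, 3) for a in train_acc])
```

The training accuracy can only stay level or rise as the depth grows, which is why the stopping decision is taken on the cross-validation accuracy instead.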
NB is a machine learning method that is more suitable for text-based analysis than for classification on sensor data [64,65]. This is also seen in this case when comparing the confusion matrices of KNN, SVM, DT, and NB, which are shown in Figure 14. In the comparison, NB has the lowest performance. However, the FN and FP values of SVM, DT, and NB are worth observing. The FP value of SVM is higher than its FN, whereas the opposite holds for NB. These are peculiarities that EV can exploit, so we build it from the four previous models. The test results complete the confusion matrix comparison. The accuracy, precision, recall, F1-score, and cross-validation results show that EV has a better confusion matrix than SVM and NB. In addition, the EV model has the lowest FP compared to SVM, DT, and NB. As a result, EV is a model that improves on the NB model.
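The way voting can cancel opposite error tendencies can be illustrated with a small hard-voting sketch. The prediction lists below are hypothetical, chosen so that the SVM stand-in leans toward FP and the NB stand-in toward FN. The tie rule (`tie_label=0`, meaning an even split counts as absent) is also an assumption, since the text does not state how EV breaks ties among four voters.

```python
from collections import Counter


def hard_vote(predictions, tie_label=0):
    """Majority vote per sample; an even split falls back to tie_label."""
    voted = []
    for votes in zip(*predictions):
        ranked = Counter(votes).most_common()
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        voted.append(tie_label if tied else ranked[0][0])
    return voted


def confusion(y_true, y_pred):
    """Return (TN, FP, FN, TP) with 1 (present) as the positive class."""
    pairs = Counter(zip(y_true, y_pred))
    return pairs[(0, 0)], pairs[(0, 1)], pairs[(1, 0)], pairs[(1, 1)]


# Hypothetical ground truth and per-model predictions (1 = present).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
models = {
    "KNN": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "SVM": [1, 1, 1, 1, 1, 1, 0, 0, 1, 0],  # two false positives
    "DT":  [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],  # one false negative
    "NB":  [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # four false negatives
}

ev = hard_vote(list(models.values()))

for name, y_pred in models.items():
    print(name, "(TN, FP, FN, TP) =", confusion(y_true, y_pred))
print("EV", "(TN, FP, FN, TP) =", confusion(y_true, ev))
```

In this toy case the vote removes both SVM false positives and three of the four NB false negatives, while the one sample that splits two against two falls back to the tie rule.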

Performance Evaluation and Cross-Validation.
Four optimum classification models can be compared: KNN with k = 1, SVM with the RBF kernel, DT with max depth = 12, and EV built from KNN, SVM, DT, and NB. The performance comparison of the four classifiers is shown in Figure 15 as a bar chart, in which KNN is the blue bar, SVM is the orange bar, DT is the green bar, and EV is the red bar. Four metrics test the models: accuracy, precision, recall, and F1-score. Of the four models, SVM has the lowest performance in all four metrics. Among the three remaining models, EV is the only one with a recall value below 0.99. KNN excels in all four metrics, even compared to DT. The optimum classification model for human presence based on movement data is therefore KNN with k = 1.
The robustness of each model is also measured. We set SVM aside and compare only the three best-performing models from the previous tests, namely, KNN, DT, and EV. K-fold cross-validation measures the robustness of each model. The K values tested are 2, 5, and 10, as they are commonly used [58]. Accuracy is measured at each cross-validation iteration. The accuracy comparison of the three models during cross-validation is shown in Figure 16 as a box plot, which compares both the average accuracy and the accuracy variance of each case. For each model, the average accuracy increases as the number of folds increases. For EV specifically, however, the variance also increases. EV has the lowest average accuracy for every K value. At K = 2, DT has the highest accuracy variance. For all K values, KNN has the lowest variance and the highest average. We conclude that KNN is the most robust model in addition to having the best performance.
The KNN model with k = 1 can still be optimized. In machine learning, not all features are related to the output class. If an irrelevant feature enters the training process, the result is garbage in, garbage out, and the performance of the model drops [66]. Hence, at this stage, feature selection is carried out based on the PC values previously calculated in Section 4.2. Assuming that increasing the number of uncorrelated features reduces the performance of the classification model, several scenarios are made based on the PC value and compared.
If two features are highly correlated and one of them is not excluded, the performance of the model becomes poor, especially for a linear regression model [67]. For example, revisiting the PC matrix in Figure 8, MA (N = 600) is highly correlated with MA (N = 800). We investigate this by applying the moving PC to the time-series dataset.
A visualization of the application of the moving PC with N = 6000 to the dataset can be seen in Figure 18. We take a snapshot of two different cases. The upper part of the image shows a situation with little fluctuation in attendance. In this situation, MAs with high N have a high correlation. The bottom part of the image shows a situation with much fluctuation in human presence. In this situation, the MA with low N has a high correlation. This explains why the 5-feature model has the best performance.
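A moving PC can be sketched as a Pearson coefficient recomputed over a trailing window. The example below is a minimal illustration on synthetic data. The trailing-window definition, the small window lengths (`SHORT_N`, `LONG_N`, `WINDOW`), the generated `presence` and `motion` series, and the convention of returning `None` for a constant window are assumptions, not details taken from the paper.

```python
import math
import random


def moving_average(series, n):
    """Trailing mean over the last n samples (fewer at the start)."""
    out, running = [], 0.0
    for i, value in enumerate(series):
        running += value
        if i >= n:
            running -= series[i - n]  # drop the sample that left the window
        out.append(running / min(i + 1, n))
    return out


def pearson(x, y):
    """Pearson correlation coefficient; None when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def moving_pearson(x, y, n):
    """Pearson correlation over each full trailing window of n samples."""
    return [pearson(x[i - n:i], y[i - n:i]) for i in range(n, len(x) + 1)]


SHORT_N, LONG_N, WINDOW = 10, 100, 200  # hypothetical, far smaller than in the paper

rng = random.Random(2)  # fixed seed so the sketch is repeatable
presence, state = [], 0
for _ in range(1200):
    if rng.random() < 0.01:  # the occupant arrives or leaves now and then
        state = 1 - state
    presence.append(state)
# A present person triggers the sensor only some of the time.
motion = [int(rng.random() < (0.4 if p else 0.02)) for p in presence]

ma_short = moving_average(motion, SHORT_N)
ma_long = moving_average(motion, LONG_N)
print("raw PC:", pearson(motion, presence))
print("MA short PC:", pearson(ma_short, presence))
print("MA long PC:", pearson(ma_long, presence))

rolling_short = moving_pearson(ma_short, presence, WINDOW)
rolling_long = moving_pearson(ma_long, presence, WINDOW)
print("windows evaluated:", len(rolling_short))
```

Comparing `rolling_short` and `rolling_long` window by window is one way to see which MA length tracks presence better in calm and in busy periods.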
We use the test data to measure how lighting control would perform if it used the PIR sensor movement data directly. We call this the raw method. The significance of the presence classification model (the proposed method) over the raw method appears in a side-by-side comparison, which is shown in Figure 19 as a bar plot. It shows accuracy, precision, recall, and F1-score, where the proposed method is the blue bar and the raw method is the orange bar. The two largest differences are in accuracy and recall: 99.7% versus 67.8% and 99.8% versus 62.6%, respectively.
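For reference, the four metrics in this comparison can be written in terms of the confusion matrix counts. These are the standard definitions; treating "present" as the positive class is an assumption here, since this section does not restate it.

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall} &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{align}
```

Read this way, a recall of 62.6% for the raw method would mean that, among the samples in which a person was actually present, the PIR reading indicated presence in 62.6% of them and missed the remaining 37.4%.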

Visualization can showcase the performance of the KNN model in predicting human presence. The visualization compares the actual time-series attendance with the predicted time-series attendance. The comparison of the two, together with the movement data from the sensor, is shown in Figure 20.
The top part of the image is the time series of the PIR sensor measurements. The middle part of the image is the actual attendance time series. The bottom part of the image is the time series of the KNN model predictions. Compared with the movement data from the PIR sensor, the presence data predicted by the KNN model is more in line with the actual presence data.

Discussion
In the test results, applying MAs to the movement data from the motion sensor increases the PC of the features with respect to the actual presence of humans in the room. This is in accordance with existing studies, namely, [20][21][22][23] and [24]. The related studies use MA to increase the correlation of regression and classification features for purposes such as noise reduction and forecasting. KNN can be better than SVM, DT, NB, and EV because KNN is by nature robust against noisy data [68]. SVM with the RBF kernel is indeed good for radially separable data. However, high variance in the data possibly affects the performance of both SVM and DT in separating the data.
We make direct comparisons of our proposed method with related studies to emphasize its contribution and novelty. The related studies are human presence-based smart lighting control using other equipment.
The comparison is given in Table 3. The superior values of each column are shown in bold. Compared to the benchmarked studies, our proposed method has the best performance, 99.8%. The study in [11] has a comparable result of 99.3%. It uses a concept similar to an MA, namely, a sliding window that calculates several statistical features such as mean, standard deviation, and max. A random forest (RF) model is the optimum model applied to the sliding window features in that study. However, it uses several different sensors, some of which are expensive, such as smartwatches and real-time location systems. Other studies that use expensive devices for activity recognition are [27], which uses a depth camera; [18], which uses five monochrome cameras; and [12], which uses Switchmate. The latter study does not measure performance with accuracy but with LUR, its proposed metric that describes the ratio between the time the lights are on and the time someone is present in the room. We assume that LUR is equivalent to accuracy. Our proposed method thus offers both the best performance and a low-cost solution.
Moreover, we also investigate the factors that influence performance in studies of PIR sensors for human presence-based smart lighting control. The comparison of these related works is shown in Table 4. Based on our proposed method and [9], there seems to be a negative relationship between the number of activities and performance. However, [5], which has 14 activities, performs better than [10,17], which have only five activities. In addition, our proposed method and [9] are location-based methods: a person's presence is determined by whether the person is under the sensor. Meanwhile, [5,10] and [17], which have significantly lower performance, are not location-based. These methods define an activity independently of its location. Further research can investigate the performance of PIR sensor-based activity recognition in determining location-based activities.
In future work, the direction of this research is to increase the user comfort of smart lighting, so that people do not feel discomfort when the automatic light control is carried out based on their presence. In the long term, the research aims to measure user comfort when users use smart lighting that applies the novel method of this research. The proposed user comfort measurement is itself novel in being a quantitative method. Furthermore, the next future work is to use this novel method to monitor the movement of people in the house.
This achievement leads to a novel predictive control of lights based on user movement. The benefit of this proposal is that automatic light control can occur without the user being aware of it. For example, the lights are already on before people enter the room. This will further increase user comfort while still maintaining energy efficiency.
Figure 18: Visualization of the application of the moving PC (N = 6000) to the dataset.

Conclusions
This paper proposes CIMA, a novel classification-integrated moving average model for smart lighting intelligent control based on human presence. A smart lighting system based on the Internet of things (IoT) applies the proposed method. It uses passive infrared (PIR) sensors, light-emitting diode (LED) lights, relays, NodeMCU, Raspberry Pi, and supporting software. In the PC test, the movement data from the PIR sensor has a correlation of 0.36 to attendance, while the moving average (MA) correlation to human presence can reach 0.56. In exhaustive testing of machine learning classification methods, k-nearest neighbor (KNN) is the model with the best and most robust performance with an accuracy value of 99.8%. It is more accurate than direct light control decisions based on motion sensors with 67.6%. We conclude that our proposed method can increase the correlation value of movement features on attendance. At the same time, an accurate and robust KNN classification model is applicable for human presence-based smart lighting intelligent control.

Conflicts of Interest
The authors declare no conflicts of interest.