Waste Management System Using IoT-Based Machine Learning in University

Along with the development of the Internet of Things (IoT)


Introduction
The Internet of Things (IoT) is a new and promising technology, which has the potential to globally change human life in a positive way, thanks to its diverse connectivity. IoT provides exchange and linkage between low-energy devices and interactions through the Internet. Many applications around the world have been implementing different operations, based on the background of IoT, to offer novel services for smart cities [1][2][3] and optimize energy efficiency. For example, energy efficiency was considered in [4,5], cars were connected by IoT methods in [6][7][8], and water management in smart agriculture was investigated in [9][10][11][12][13], among a long list of other IoT application areas.
One important application is that IoT technology has become a practical and efficient tool to build smart cities. According to [14], a critical issue for a smart city is the increase of waste generation with accelerated population growth in cities. Two significant challenges for waste management are waste collection and collection-path planning. First, waste collection is a daily task in urban areas entailing the planning of waste truck routes, in which environmental, economic, and social factors must be considered. Second, the path length should be shortened by applying graph theory, in order to avoid high fuel costs and reduce the amount of work [14][15][16][17]. Some solutions have introduced the use of IoT devices to estimate the fill level of bins and send this data over the Internet to a server for decision-making [18][19][20][21][22][23].
Machine learning (ML) provides effective solutions, such as regression, classification, clustering, and association rule mining [24,25], for IoT-based waste management. There are three main reasons for this. Firstly, in IoT applications, all of the devices are connected, and an immense amount of data is collected every day. Furthermore, the devices may be programmed to trigger some events, based either on some predefined conditions or on feedback from the collected data. Secondly, computer systems can learn to perform certain tasks, such as classification, clustering, prediction, and pattern recognition. These systems are trained using numerous algorithms and statistical models to analyze sample data. Thirdly, measurable characteristics (called features) usually characterize the sample data; some ML algorithms attempt to find correlations between the features and some output values (called labels). The information obtained through training is then used to identify patterns or make decisions based on new data.
However, human intervention is typically required to analyze the collected data, extract meaningful information, and create smart applications. IoT devices must not only collect data and transfer it to other devices but also be self-sufficient. They must be able to make context-based decisions and learn from their collected data [25]. Hence, waste management systems using IoT-based machine learning should consider more variables for task prediction. Regression, in turn, is a mathematical model which can represent or recognize the connection between two or more variables. The dependent or response variable is the system's output. By applying a regression model to a system, the relationships between the dependent variables and the independent ones can be detected. Logistic regression (LR) is a kind of regression analysis used to explain the connection between a dependent variable and one or more independent variables. LR is appropriate when the dependent variable takes binary values. As such, it is suitable for prediction tasks in ML and IoT applications [24][25][26][27][28][29].
IoT-based waste management models perform a vital function in improving the standard of living and human well-being by increasing energy efficiency, enhancing governance, and reducing cost. In Vietnam, Ton Duc Thang University has set a goal to become an elite research university among the world's top 500 universities. The application of IoT technology to waste management is one aspect of this model of a top Vietnamese university. This article presents a novel platform for smart trash control at the university, which is able to keep the university clean at low cost and with low labour requirements. A novel microcontroller system is designed with a sensor module for measuring the filling height of garbage using ultrasound and for geolocating the collected data based on LoRa technology. Furthermore, the paper presents a new method for predicting the probability of the filling level of each trash bin by applying LR in ML. In addition, a graph theory-based optimization solution is proposed to compute the paths of waste collection with different time schedules in order to minimize the environmental and socio-economic impacts, as well as to support the workers of the university. The contributions of our work are as follows:
(i) Previous articles have mostly evaluated results on a test board; our work introduces the design of a single microcontroller board, which is low-cost and straightforward, with an ultrasound sensor which can measure the filling height of a garbage can and send information using LoRa E32 technology.
(ii) We present a novel IoT-based machine learning method, which is employed to predict the probability of collecting waste in the real environment based on the historical input data.
(iii) Our article is the first to propose the use of the sigmoid function for predicting the probability of waste collection and to apply Dijkstra's algorithm to optimize the path for waste collection from trash bins.
(iv) The critical success of our article is in testing the performance of our system in the realistic environment of Ton Duc Thang University (Vietnam). Therefore, the advantages of employing smart trash bins for the trash collection task can be illustrated. Our algorithm uses the results to apply various filling-height thresholds for deciding when a garbage can should be collected, increasing profits and optimizing the number of workers needed.
The paper is organized as follows: Section 2 summarizes the related literature. Section 3 presents all components and the hardware design. Section 4 explains the developed algorithm. Section 5 tests the prototype in a real environment to perform different operations. Finally, Section 6 concludes and outlines future work.

Related Works
Considering the advantages of IoT technologies, many researchers have investigated and developed new applications for smart cities, especially for waste management. To reduce power consumption and maximize operational time, a simple system that identifies the fullness of trash bins was presented, which collected data and delivered it through a wireless mesh network [5]. However, some aspects of that system remain ambiguous. To improve waste management, platform software for smart cities was introduced in [14]; however, the authors concentrated only on the collection of data, and their platforms were composed of technologies from other companies. On the other hand, some approaches have developed waste management strategies based on optimization to achieve an efficient system. In [15], the authors presented a waste control and management platform to be applied in rural areas using LoRaWAN technology and route optimization. Additionally, an implementation based on IoT was set up, but the system did not clearly describe the communication and optimization for all trash bins in the system. Based on logistic regression (LR) and genetic algorithm (GA) methods, the authors in [16] presented a new method to check the status of smart trash bins and select a collection path in Philadelphia, USA. However, they did not provide any technologies for the transmission of data from the trash bin to the other devices in the system. In particular, optimization algorithms have been clearly defined for IoT-based waste management, such as the nearest neighbour search, ant colony optimization, genetic algorithm, and particle swarm optimization methods [17,18]. In [19], the authors proposed a solution to manage a garbage system integrated with IoT technology, which was an autonomous line-following vehicle with a robotic hand for garbage collection; however, they did not apply any algorithms to optimize the waste collection.
In [20], an IoT platform for an automated waste collection system allowed real-time monitoring of, and interfacing with, the system. However, the aim of that paper was to present an IoT cloud solution combining device connection, data processing, and control, rather than the design and optimization of waste collection. A food waste collection method was presented in [21], where the information was collected using radio frequency identification (RFID) technology and transmitted using a wireless mesh network. However, this technology has severe disadvantages over long ranges, especially considering that a smart city must be managed over a large area. Finally, the results of the optimization algorithm were too vague to be applied to a real system, such as a city.
For practical waste management, an impressive architecture was proposed for a sensor node in [22,23], which used a microcontroller (ATMega328P), an ultrasonic sensor (SRF05), and a LoRa E32 TTL-100 433 MHz module [30][31][32]. Nonetheless, the authors only evaluated a test board as a platform to provide sensor nodes and did not apply any methods for waste management in a smart city, such as optimized waste collection. While the target of that article was to propose an IoT application, the microcontroller board in their design was very complicated; moreover, certain performances and functionalities are needed for each particular application. Our article applies machine learning and graph theory to optimize the waste collection processes, avoid overfilled bins, and reduce the workload.
In this paper, we consider algorithms, based on heuristic models or graph theory, from which we can find ways to minimize the distance of waste collection. The primary purpose is to reduce the total cost of transport and transfer, save labour, and reduce the dependency on vehicles, while maximizing service quality and improving general quality of life. The other algorithm optimizes waste collection, rather than considering only low cost and energy savings, and supports the university's waste management network in performing with high efficiency by applying ML and graph theory methods. We summarize the features of the algorithms considered in this part and compare them with our proposal, as shown in Table 1.

System Architecture
The system under consideration consists of smart trash bins with a real-time monitoring system which integrates several modules, such as an ultrasonic distance sensor, along with a LoRa E32 TTL-100 433 MHz transmission module. Low energy use was considered throughout the design process. Each node is consequently supposed to be powered by multiple sources; for instance, solar energy or batteries. For flexibility, we have designed hardware that can use either energy source, as trash bins are often put in places where direct sunlight is not available. The selection of the best electronic components, their interconnection, and energy efficiency strategies during operation were all considered. Furthermore, technical solutions will enable turning off nodes (or parts of them) when they do not work. Once the principal components of the node had been chosen, the overall design was targeted at their integration. The proposed system architecture is shown in Figure 1, where the data collected are sent using LoRa to a server, where they are stored and processed. The data are used to track and predict the status of the trash over each period. Furthermore, they are used to calculate the optimal path accordingly. The predicted state of each trash bin can be examined, based on assigned training data. It is then reviewed to refresh the appropriate waste fill level, which is an essential input parameter of the optimal path algorithm.
Besides these main targets, low cost and high efficiency were secondarily considered. In smart cities, the requirement of low power consumption stems from the fact that nodes have no connection to the electrical network. The most critical component affecting the energy consumption of a sensor node is the transmitting module. So, the design of such an architecture must focus not only on the structure of the sensor node, but also on the overall system structure. Finally, an efficient technology for transferring information over large areas should be adequately considered. In addition, since an IoT node has to estimate the volume of waste in the trash bins, the proper sensor for this measurement must be determined. A sensor that measures the weight of the trash bin was not selected, as a high weight does not imply a high trash level, for the following reasons: first, metallic materials can be heavy, yet leave the trash bin mostly empty, while a large quantity of paper can fill the trash bin but weigh little [23]. Second, the authors in [16] recommended that, in some cases, the weight of the trash bin should not be considered alone; instead, the volume data and weight need to be combined. Given all of the abovementioned specifications, the SRF05 was selected as the ultrasound sensor [30].

Wireless Communications and Mobile Computing
LoRa is designed to operate in the license-free band appropriate to each country or region. Depending on the physical layer used, a LoRa module can be classified as working in the 433, 868, or 915 MHz frequency bands [33]. In Asia, only the 433 MHz bands can be used [34]. We used the LoRa E32 TTL-100 433 MHz module, which aims at providing communication over a range of up to 2 km, low latency, mobility support, multiyear battery life, AES 128-bit shared key encryption support, and a data rate up to 167 kbit/s [35]. LoRaWAN is defined as the MAC layer protocol and network system architecture based on the LoRa technology. The topology of the LoRaWAN network architecture is a star, where the end devices can only communicate with LoRaWAN gateways and not directly with each other [33].
The LoRaWAN gateways are responsible for forwarding raw data packets from end nodes towards the network server. In our network architecture, we use the LoRaWAN MAC layer, class A, which provides the medium access control mechanism that enables communication between multiple devices and the LoRaWAN gateway.
Based on previous solutions, in our proposal, firstly, the IoT node architecture, shown in Figure 2(a), is composed of three components: the ATmega328P, the LoRa E32 TTL-100 433 MHz module, and the SRF-05. Energy specifications of the main active components of the board are listed in Table 2, along with the price (which is below $20) for each IoT node. Secondly, the communication between the IoT gateway node using the LoRa E32 TTL-100 433 MHz module and the ESP8266 module is shown in Figure 2(b). Through the LoRa module and the ESP8266, data collected from the IoT node are assigned timestamps and sent to the cloud. The flow description in Figure 3 summarizes the main interactions step by step. The IoT nodes are connected directly to a gateway through the LoRa MAC protocol, which forwards their data. The IoT gateways contain the communication modules, which not only send the data to the server by applying the Firebase API/Host for the ESP8266 but also send them to users/applications. The IoT gateway node is responsible for establishing communications with all the nodes over 24 hours. Its hardware requirements are simple, since it only has to be able to host and send the data to the server and offer a communication interface with the node. Besides the hardware previously mentioned, the most important part is the power system. For flexibility, the nodes can be powered with Li-ion 18650 batteries or solar energy, as shown in Table 3, such that the input voltage is within an acceptable range for operating the LoRa E32 TTL-100 433 MHz module.
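As an illustration of the node-to-gateway data described above, the following minimal sketch converts an ultrasonic distance reading into a filling level and attaches a timestamp. It is not the authors' firmware: the names `BinReading`, `fill_percent`, and `make_reading`, the bin depth, and the assumption that the sensor is mounted at the top of the bin pointing down are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class BinReading:
    node_id: str         # hypothetical identifier of the IoT node
    fill_percent: float  # filling level derived from the ultrasonic distance
    timestamp: str       # ISO 8601 timestamp assigned when the record is built


def fill_percent(distance_cm: float, bin_depth_cm: float) -> float:
    """Convert the sensor-to-waste distance into a filling level in percent.

    Assumes the sensor sits at the top of the bin and looks straight down.
    """
    if bin_depth_cm <= 0:
        raise ValueError("bin_depth_cm must be positive")
    level = (bin_depth_cm - distance_cm) / bin_depth_cm * 100.0
    return max(0.0, min(100.0, level))  # clamp noisy readings to [0, 100]


def make_reading(node_id: str, distance_cm: float, bin_depth_cm: float) -> BinReading:
    """Build a timestamped record, as a gateway might before forwarding it."""
    now = datetime.now(timezone.utc).isoformat()
    return BinReading(node_id, fill_percent(distance_cm, bin_depth_cm), now)
```

For example, `make_reading("A", 30.0, 120.0)` describes a bin whose waste surface is 30 cm below the sensor in a 120 cm deep bin, which corresponds to a 75% filling level.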
Considering the capacity of four Li-ion 18650 batteries or a battery and solar panel kit, with settings as in Tables 2 and 3, we can estimate the lifetime of the batteries as

\[ t = \frac{W_t}{W}, \quad (1) \]

where \(t\) is the lifetime, \(W_t\) is the power throughout the node (W/h or W/day), and \(W\) is the total power (W/h or W/day), which is calculated by the equation \(W = (1/T)(W_1 T_1 + W_2 T_2)\), with \(W_1\), \(T_1\) and \(W_2\), \(T_2\) being the power and time duration (s) for the ON and SLEEP modes, respectively. The lifetime approximately corresponds to 39 days for the Li-ion 18650 batteries and 13 days for the solar panel and battery kit. Moreover, as the average cost of one node is around $20, it is well suited to the university setting.
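The lifetime estimate above can be sketched numerically as follows. This is only an illustration: it assumes that the period T equals T1 + T2 and that the stored energy is expressed in watt-hours, and the function names are hypothetical. The values of Tables 2 and 3 are not reproduced here, so the sketch will not by itself yield the 39-day and 13-day figures.

```python
def average_power_w(p_on_w, t_on_s, p_sleep_w, t_sleep_s):
    """Duty-cycle average W = (W1*T1 + W2*T2) / T, assuming T = T1 + T2."""
    period_s = t_on_s + t_sleep_s
    return (p_on_w * t_on_s + p_sleep_w * t_sleep_s) / period_s


def lifetime_days(capacity_wh, avg_power_w):
    """Lifetime = stored energy / average power, converted from hours to days."""
    return capacity_wh / avg_power_w / 24.0


# Hypothetical example: 10 s awake at 0.5 W, then 590 s asleep at 0.01 W,
# powered from a 37 Wh battery pack.
avg = average_power_w(0.5, 10.0, 0.01, 590.0)
print(round(avg, 4), "W average,", round(lifetime_days(37.0, avg), 1), "days")
```

The sketch makes the design trade-off visible: because the SLEEP duration dominates the period, lengthening the interval between transmissions is the most direct way to extend the lifetime.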

After reviewing the components of the system, they were embedded in a board and tested in a real environment to perform different experiments (see the next section).

Optimal Path Planning Algorithm for Waste Collection
In this section, we discuss the waste collection data, along with their states, positions, and system design, and test real data to verify the output result. First, the obtained information is transmitted through the communication link to the server, where data are processed, saved, and forwarded to the cloud, as studied in the next section. Second, we apply Algorithm 1 to predict the status and route of each trash bin daily, which will be explained in Section 4.3. Note that we consider the aggregate waste collection location of each building, instead of each small trash bin on each floor of the university campus; furthermore, the system will enable a node that does not work to be turned off.

[…] classes with study hours. Additionally, the weight of waste increases when there are more students in classes. We conclude that the waste dataset depends on the number of students. Therefore, we put one large waste bin on the ground floor of each of the 13 buildings (nodes) in the university campus area.

Logistic Regression Model.
A description of logistic regression can start with an expression of the regular logistic function used for classification problems. It is a predictive analysis algorithm, based on the idea of probability. The logistic function is a sigmoid function, with \(s \in \mathbb{R}\) and an output in (0, 1) for any value. The function, called \(f(s)\), is given by

\[ f(s) = \frac{1}{1 + e^{-s}}. \quad (2) \]

In this subsection, Figure 4(a) describes the relationship between classes and the status of the trash bin. The green line indicates the direction of the trash bin's level. Taking into consideration the classes, if the green line is near to the state 1, the level of the trash bin will update to 1. Figure 4(b) displays the key algorithm employed in this article, logistic regression.
The primary reasons we decided to use this model are its characteristics:
(i) The yellow line represents linear regression, which fails to represent the true state, as it can take a value greater than 1 or less than 0; this is not possible under the hypothesis of logistic regression, so it is not a sensible choice for our solution.
(ii) The red line is the hard threshold (which resembles the activation function of the perceptron learning algorithm (PLA), but with the two classes being 0 and 1, instead of -1 and 1). The PLA does not work for this problem, as the data are not linearly separable.
(iii) The blue and green lines fit our problem better and have the following essential properties: (a) The function is continuous and real-valued, with range (0, 1). (b) Taking 0.5 as the threshold, the further a point lies to the left, the closer its value is to 0; and the further a point lies to the right, the closer its value is to 1. This is in accordance with the observation that the more classes there are, the more likely it is that waste must be collected, and vice versa.
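The three kinds of curves compared above can be contrasted with a small sketch. The slope and offset of the linear function are arbitrary illustration values, not the fitted lines of Figure 4, and the function names are hypothetical.

```python
import math


def sigmoid(s: float) -> float:
    """Logistic function f(s) = 1 / (1 + exp(-s)); the output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def hard_threshold(s: float) -> int:
    """PLA-style step activation with the two classes 0 and 1."""
    return 1 if s >= 0 else 0


def linear(s: float) -> float:
    """Unbounded linear output, which can leave the interval [0, 1]."""
    return 0.5 + 0.25 * s  # arbitrary slope and offset, for illustration only


for s in (-4.0, 0.0, 4.0):
    print(s, round(linear(s), 3), hard_threshold(s), round(sigmoid(s), 3))
```

At the extremes the linear output falls outside [0, 1], the hard threshold jumps abruptly between 0 and 1, and only the sigmoid yields a value that can be read as a probability.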

Optimal Path Planning Algorithm for Waste Collection.
The collection paths are the moving sequences, including all of the trash bins within the Tan Phong campus, Ton Duc Thang University. The optimization of specific sequences is a combinatorial optimization problem. Considering several paths, we use Dijkstra's algorithm [36], which is effective at finding near-optimal solutions. As the employees need time to check all of the trash bins, it is very helpful if the percentage filling level of waste is predicted. Therefore, at the next prediction, the system suggests which bins should be collected, so that overfilling can be checked. In this article, the LR algorithm is used to predict the status of each trash bin, based on its actual data.
The overall system is presented in Algorithm 1, and the settings are described in Table 5. For each hour t in a working day, an LR algorithm is applied to predict the status of each bin. If the predicted probability for a trash bin is higher than a given threshold τ, the filling height of the trash bin is checked by the ultrasonic sensor. If the filling level is higher than 50%, the status is updated to 1. Then, Dijkstra's algorithm is applied to find the shortest distance for collecting waste from all full trash bins and returning to the offices. Consequently, the system detects the optimal paths within the university, which helps the employees to collect the waste more efficiently.
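The two-stage status check described above (LR probability against τ, then ultrasonic filling level against 50%) might be sketched as follows. This is not Algorithm 1 itself: the function names and dictionary layout are hypothetical, the coefficients are those reported for the fitted model later in the paper, and the routing step with Dijkstra's algorithm is left out.

```python
import math

# LR coefficients (intercept, slope) reported later in the paper.
W0, W1 = -2.20619801, 0.05095177


def collection_probability(num_classes: float) -> float:
    """Sigmoid of the linear score w0 + w1 * (number of classes)."""
    return 1.0 / (1.0 + math.exp(-(W0 + W1 * num_classes)))


def bins_to_collect(classes_per_bin, fill_percent_per_bin, tau=0.5, fill_limit=50.0):
    """Return the bins whose status becomes 1 under the two-stage check.

    classes_per_bin:      dict bin_id -> number of classes (LR input)
    fill_percent_per_bin: dict bin_id -> ultrasonic filling level in percent
    """
    selected = []
    for bin_id, n_classes in classes_per_bin.items():
        if collection_probability(n_classes) <= tau:
            continue  # predicted probability is not above the threshold
        if fill_percent_per_bin[bin_id] > fill_limit:
            selected.append(bin_id)  # status updated to 1
    return selected
```

Checking the sensor only for bins that pass the probability test mirrors the order given in the text; the bins returned here would then be passed to the shortest-path step.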

Operation Tests
Our proposal was tested at Ton Duc Thang University with the following three considerations. First, we provide and test the algorithm to find the shortest distance in simulation. Second, we show how the fitted coefficients are entered into the logistic regression equation to predict the probability of collecting the waste in the trash bins. Finally, we design a mobile application to show the effectiveness of the sensor for analyzing the filling level, with the data delivered over a wireless communication link. The mobile application also shows the probability prediction and the optimal collection path.

Testing of the Algorithm to Find the Shortest Distance.
The nodes are generated as uniformly distributed points using the three steps below. Note that generating points with a distance uniformly distributed between 0 and R and an angle uniformly distributed between 0 and 2π would cause the points to be denser closer to the origin.
Step 1. Generate a random position inside the circle with radius R (where R is the coverage of the LoRa channel and the co-ordinate of the center is (0, 0)) with polar co-ordinates (r Node , θ Node ).
Step 2. The point with polar co-ordinates \((r_{Node}, \theta_{Node})\) is converted to Cartesian co-ordinates by

\[ x_{Node} = r_{Node} \cos\theta_{Node}, \qquad y_{Node} = r_{Node} \sin\theta_{Node}. \]

Step 3. Go to Step 1 to define the next node position.

After the distributed node positions have been generated, the optimal routes are found using Dijkstra's algorithm, based on the filling level and co-ordinates of the trash bins. To understand the operation of Dijkstra's algorithm, we give an example under the assumptions in Table 6, with 6 nodes and 11 edges. The numbers given are the distances between two sensor nodes. The relationship between sensor nodes is drawn in Figure 5. Consider that the source node is node 1 and the destination node is node 2, as shown in Figure 5.
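The node-generation steps above can be sketched as follows. The sketch assumes the standard remedy for the clustering problem mentioned earlier, namely drawing the radius as R times the square root of a uniform variate; the paper's own formula for Step 1 is not reproduced here, and the function names are hypothetical.

```python
import math
import random


def random_node(radius, rng):
    """Draw one point uniformly from the disc of the given radius centred at (0, 0)."""
    # The square root keeps the density uniform over the area; drawing r
    # directly from U(0, R) would cluster points near the origin.
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    # Polar-to-Cartesian conversion (Step 2).
    return r * math.cos(theta), r * math.sin(theta)


def generate_nodes(n, radius, seed=0):
    """Repeat the draw n times (Step 3) with a reproducible generator."""
    rng = random.Random(seed)
    return [random_node(radius, rng) for _ in range(n)]
```

With uniform area density, roughly one quarter of the points should fall within half the radius, which gives a quick sanity check of the generator.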
There are many ways from node 1 to node 2. However, when minimizing the path distance between node 1 and node 2, the shortest path from node 1 to node 2 is 1 ⟶ 6 ⟶ 2. To find this shortest path, Dijkstra's algorithm is used. The operation is shown in Table 7.
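A minimal sketch of Dijkstra's algorithm on a graph with 6 nodes and 11 edges is given below. The edge weights are hypothetical (the values of Table 6 are not reproduced here); they were chosen only so that the shortest route from node 1 to node 2 is 1 ⟶ 6 ⟶ 2, as in the example above.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency map from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph, source, target):
    """Return (distance, path) for a shortest path; weights must be non-negative."""
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in visited:
            continue  # stale queue entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical distances between sensor nodes (not the values of Table 6).
EDGES = [(1, 2, 9), (1, 6, 2), (6, 2, 3), (1, 3, 4), (2, 3, 6), (2, 4, 5),
         (3, 4, 8), (3, 6, 3), (4, 5, 2), (5, 6, 6), (4, 6, 7)]
GRAPH = build_graph(EDGES)
print(dijkstra(GRAPH, 1, 2))
```

The priority queue always expands the closest unvisited node first, so the direct edge 1 ⟶ 2 (length 9) is superseded by the detour through node 6 (length 2 + 3 = 5) before node 2 is finalized.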

Predicted Probabilities of Each Node.
The problem considered is to build a model to assess the likelihood of garbage collection, based on the number of classes held during the day in different buildings. The number of classes per day at each building in the university campus is given in Table 4. From the table, it can be seen that the more classes were held, the higher the number of students was and, consequently, the more likely it was that the accumulated waste would lead to increased waste collection, although there were some cases where there were many classes and the waste was still below the threshold. Nonetheless, there was no threshold on the number of classes which accurately distinguished waste collection, and so, the threshold was set as 0.5. Therefore, we can predict the probability of waste collection based on the number of classes.
Considering Table 4, the number of classes in each building had a corresponding status for the trash bins as collected (1) or not collected (0). The LR model was chosen as the predictive modelling algorithm to be applied, as the output variable was binary; that is, the status of the trash bin: collected (1) or not collected (0). We aimed to define a mathematical equation which can be utilized to predict the probability of the case collected (1). Once the equation is estimated, it can be applied to predict the output variable when only the input data are known.
The sigmoid function is shown in equation (2).
Applying equations (2) and (A.11) in Appendix A, the solution of the logistic regression was \(w = (-2.20619801,\ 0.05095177)\). The output indicates that classes are significantly associated with the probability of collecting the waste:

\[ P(\text{collected} \mid x) = \frac{1}{1 + e^{-(-2.20619801 + 0.05095177\,x)}}, \quad (5) \]

where \(x\) is the number of classes. Now, we apply equation (5) to Table 4. For example, for building A, which had 91 classes on Monday, substituting the value into equation (5) gives an estimated probability for waste collection of 0.91. Similarly, for building I, which had 62 classes, the estimated probability of waste collection was 0.72. Another example is building F, which had 117 classes on Tuesday, for which the estimated probability of waste collection was 0.97. In general, the higher the number of classes, the higher the chance of collecting waste. Equation (5) also indicates that the probability of waste collection increases as the number of classes increases. As the sigmoid is a monotonically increasing function, Table 4 shows the probability of collecting waste for several values of the number of classes. Based on these predicted probability values in Table 4, the threshold and false positive rate (FPR)/true positive rate (TPR) of the Receiver Operating Characteristic (ROC) curve are estimated in Table 8.
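The worked examples above can be checked with a few lines of code. The sketch assumes that the first reported coefficient is the intercept and the second is the weight on the number of classes, which is the reading that reproduces the quoted probabilities; the function name is hypothetical.

```python
import math

W0, W1 = -2.20619801, 0.05095177  # coefficients reported in the text


def collection_probability(num_classes: float) -> float:
    """Sigmoid of the fitted linear score w0 + w1 * (number of classes)."""
    return 1.0 / (1.0 + math.exp(-(W0 + W1 * num_classes)))


for building, n in (("A", 91), ("I", 62), ("F", 117)):
    print(building, n, round(collection_probability(n), 3))
```

Under this reading the three probabilities come out at approximately 0.919, 0.722, and 0.977, which agrees with the quoted 0.91, 0.72, and 0.97 if those were truncated to two decimals. The probability crosses the 0.5 threshold between 43 and 44 classes.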
[Notation fragment: optimal path of trash bin \(i\) in the system; \(D = \{D_i \mid i \in (1, n)\}\): data set from …]

Figure 6(a) shows that the Area Under the Curve (AUC) was 0.99, using threshold = 0.5, FPR = 0.1, and TPR = 1; see Table 8. The closer the ROC curve lies to a TPR of 1 and the larger the AUC is, the more effective the model is. For each threshold value in 7, we thus get a pair (FPR, TPR), which represents a point on the graph as the threshold changes from 0 to 1. Note that the thresholds do not necessarily range from 0 to 1 in general problems, and it should be ensured that the TPR/FPR reaches the largest or smallest value it can achieve. A model is effective when it has a low FPR and a high TPR, which means that there exists a point on the ROC curve close to the point with coordinates (0, 1) on the graph (upper left corner). The graph in Figure 6(b) shows the loss function; if the values of the loss function are small, then the evaluation provides useful results. In a practical classification problem, it is understandable that a few data points are misclassified. Our results in the figure are, thus, consistent with the loss function model. The loss function and its optimization are presented in Appendix A.1 and A.2, respectively.
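How the (FPR, TPR) pairs and the AUC arise from predicted probabilities can be sketched as follows. The labels and scores are hypothetical, not the data of Tables 4 and 8, and the sketch assumes that both classes are present in the labels.

```python
def roc_points(labels, scores):
    """Return the (FPR, TPR) points obtained by sweeping the threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Predict 1 when score >= threshold, for each distinct score from high to low.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical statuses (1 = collected) and predicted probabilities.
LABELS = [1, 1, 1, 0, 1, 0, 0, 0]
SCORES = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
print(roc_points(LABELS, SCORES))
print(auc(roc_points(LABELS, SCORES)))
```

In this toy data set one negative example scores higher than one positive example, so 15 of the 16 positive-negative pairs are ranked correctly and the AUC is 15/16 = 0.9375; a perfectly separating model would reach 1.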

Test in a Real Environment.
This section explains the network structure that was designed and implemented on the actual campus of Ton Duc Thang University, over an area of 30 hectares with buildings (see Figure 7). The number of classes in each building is described in Table 4. In each building, a centralized waste storage area was set up with a gateway and data collection. The actual structure had three levels for each trash bin: HIGH, MIDDLE, and LOW. As the threshold was 0.5, if the trash bin's level is HIGH (alternatively, MIDDLE), its filling level is 1; otherwise, it is 0. Based on the status, the employees need the optimal paths to be updated every day in order to reach the high-level bins. By applying Algorithm 1, based on the input parameters, the system can determine the probability of waste collection based on the logistic regression function and optimize the choice of moving between buildings within the campus. Figure 8 depicts the results of the class data for the Monday of one week in the second semester of the 2018-2019 academic year. We can also determine the path for waste collection with the optimal route. The color set for the percentages in the mobile application in Figure 8 is as follows: 0-20% for placegreen, 21-40% for lime, 41-60% for yellow-green, 61-80% for ochre, and 81-100% for rust.

Conclusions and Future Works
In this work, an optimal algorithm combining graph theory and LR has been described, with the possibility of assessing the probability of a trash bin being full, based on the number of classes in the university. This algorithm presents many advantages, as compared with the old waste collection methods. In addition, this study also provides improvements over classical algorithms. The algorithm is integrated into the system with a low-cost circuit design and LoRa technology, enabling its application in practical use-cases, in which changing the sensor components can be done quickly. This study also presented three experiments: first, a test was conducted by simulating the location of arbitrary trash bins and finding the shortest path between the trash bins. Second, the logistic regression equation was applied to estimate the probability of collecting the waste. Finally, a practical use case of the waste collection process at Ton Duc Thang University (Vietnam) was tested. The algorithm used the database of classes held in each university building and data received from the smart bins (e.g., occupancy rate) as input data. In summary, this system provides better operations for optimizing employee use, saving operating costs, and collecting data on time. The system can be built cheaply, simply, and effectively and will be extensively applied in all campuses of Ton Duc Thang University, a Smart University, to unify the measurement of the filling level of the trash bins by using an ultrasonic sensor. In future work, machine learning will again be used, in particular multiclass classification methods.

[Figure caption: ROC for output logistic regression of Table 4.]
This equation is equivalent to equations (5) and (A.1); since \(y_i = 0\), the first component is 1. To make the model fit the data, we need to find \(w\) such that the probability is maximized.
Consider all training data with the data matrix \(X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}\) and the corresponding label vector \(y = [y_1, y_2, \ldots, y_N]\). Then, we need to solve the optimization problem

\[ w = \arg\max_{w} p(y \mid X; w). \quad (A.4) \]

Assuming that the training data were generated independently, we can write the likelihood of the parameters as \(p(y \mid X; w) = \ldots\)

A.2. Optimizing the Loss Function.
The problem of optimizing the loss function in logistic regression can be solved using stochastic gradient descent (SGD). In each iteration, \(w\) is updated using one randomly chosen data point. The loss function of logistic regression for one point \((x_i, y_i)\) is

\[ J(w; x_i, y_i) = -\bigl(y_i \log a_i + (1 - y_i) \log(1 - a_i)\bigr). \quad (A.7) \]

Assume that we need to find the function \(a = f(z)\). As … where \(\eta\) is the learning rate. This completes the proof.
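For reference, the gradient and update rule that follow from the loss (A.7) can be obtained by the standard derivation sketched below. It assumes \(z_i = w^{T} x_i\) and \(a_i = f(z_i)\), with \(f\) the sigmoid of equation (2); it is the textbook argument and is not necessarily identical, step by step, to the appendix equations that are missing from this text.

```latex
\begin{align*}
f'(z) &= f(z)\bigl(1 - f(z)\bigr), \\
\frac{\partial J(w; x_i, y_i)}{\partial w}
  &= -\left(\frac{y_i}{a_i} - \frac{1 - y_i}{1 - a_i}\right) a_i (1 - a_i)\, x_i
   = (a_i - y_i)\, x_i, \\
w &\leftarrow w - \eta\, (a_i - y_i)\, x_i .
\end{align*}
```

The update moves \(w\) in proportion to the prediction error \(a_i - y_i\), so a correctly and confidently classified point changes the weights very little.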

Data Availability
(1) The data were collected on the TDT University campus according to the number of classes and their corresponding days. (2) The survey data used to support the findings of this study are included within the supplementary information file(s). (3) There are no restrictions on data access. Requests for data, 6 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.