The Fusion Model of Multidomain Context Information for the Internet of Things

The Internet of Things aims to provide users with deeply adaptive intelligent services according to their personalized characteristics. Most of these characteristics are expressed as high-level context, but the Internet of Things often lacks methods for obtaining high-level context information directly. In this paper, to derive high-level context information from the low-level multidomain context obtained directly by different sensors in the Internet of Things, we present a machine learning method that constructs a context fusion model from a feature selection algorithm and a multiclassification algorithm. First, we propose a wrapper feature selection method based on the genetic algorithm, with a suitable fitness function and convergence condition, to obtain a smaller and more informative subset of context features from the low-level multidomain context. Then, we use the decision tree algorithm, which is a multiclassification algorithm, to determine which high-level context a record of low-level context information belongs to, based on the rules learned by training on the selected subset of context features. Experiments confirm that the model achieves higher classification accuracy without significantly more time consumption.


Introduction
The Internet of Things is a network that extends the Internet to everyday objects, and its main function is to connect these objects [1]. It not only greatly improves the convenience of networks but also meets needs that people could not imagine before [2,3]. The Internet of Things is full of communication, sensing, and computing information that can be used to provide the user with more intelligent services [4]. The context data produced by sensors and intelligent devices while providing these services is massive and valuable, because it can affect the user's service experience in many different ways. "Context" is not a new concept, but so far no unified definition exists either theoretically or practically, because researchers have put forward definitions based on different backgrounds, understandings, and perspectives [5]. Schilit and Theimer first proposed the concept of "context" [6]; in their view, context information includes the user's position, the user's identity, the physical objects around the user, and the interaction state of the devices used by the user. Subsequently, many researchers [5][6][7][8][9] proposed definitions of context information based on their own research fields and perspectives. On the whole, these definitions follow the traditional "user-centered" viewpoint, which consists of three basic elements: human, machine, and environment. For the Internet of Things, however, context information should also cover the coordination between two humans, between a human and an object, and between two objects. We adopt the definition in [10], which regards context information as the interaction information between human, object, machine, and environment in the Internet of Things.
It contains both the preset static information and the dynamic information caused by the interaction. In order to adapt to the user's personalized needs, the provided service should have distinct, personalized characteristics that adapt to the user's context features [11]. Usually, during the lifetime of a service in the Internet of Things, when the user's low-level multidomain context (such as location, temperature, or illumination) changes, the high-level context that reflects the user's personalized characteristics may change as well [12]. It is therefore vital to obtain the high-level context in a timely manner by fusing the low-level context information of the user's multidomain environment [13].
Wireless Communications and Mobile Computing
Context fusion is the process of obtaining the high-level context by processing the multidomain low-level context (monitoring data, sensing data, and so on) with certain methods and prior knowledge. At present, most studies adopt rule-based reasoning. For example, the Context Toolkit middleware [14] supports context reuse and customization through abstract representation. By encoding logical rules in programs, Context Toolkit achieves conflict detection and reasoning with OOPS (Organized Option Pruning System). Gaia [15] adopted CORBA (Common Object Request Broker Architecture), a distributed component architecture, and followed the design ideas of an operating system to achieve efficient reasoning with a first-order predicate logic model. Gaia also introduced probabilistic and fuzzy logic to deal with uncertainty. The PACE (Pervasive, Autonomic Context-aware Environments) project introduced three-valued logic to handle uncertain information, based on the graphical Context Modeling Language (CML) [16]. SOCAM (Service-Oriented Context-Aware Middleware) [17], which is based on OSGi (Open Service Gateway Initiative), and the agent-based context middleware CoBrA (Context Broker Architecture) [18] both use OWL ontologies for reasoning. Rule-based reasoning is direct, unified, and accurate in its expression. It is well suited to small-scale datasets but struggles with complex systems and large-scale datasets: in large-scale systems, the relationships between the observed symptoms and the corresponding diagnoses are more complex, so it is difficult to derive effective rules from expert experience. This paper therefore introduces a machine learning method to construct a context fusion model that realizes the fusion of large-scale, multidomain context information.

The Classification of the Context
There is no unified standard for the classification of context. Researchers have proposed different partitions based on their own research backgrounds, applications, and requirements. Dey et al. divided context information into location, identity, activity, and time. Kaltz et al. [19] defined context as user and role, process and task, position, time, and equipment, in order to cover a wide range of mobile and network contexts [20]. Schmidt et al. [21] divided context information into human factors and the physical environment. Luo et al. [22] made a more detailed distinction according to the source, purpose, and rate of change of the context information.
Based on the needs of the user in the Internet of Things, we analyzed the context information that may be produced in the new networking environment and in the processes of service registration, service discovery, service selection, and service composition. We divide the context information into four types: the environment context information, the device context information, the user context information, and the calculation context information. Details are as follows.

The Environment Context Information.
It describes the context of the service environment in the Internet of Things [23], including the specific environment, the scene environment, and the temporal environment. The specific environment refers to measurable environmental conditions (such as temperature, humidity, sound level, and light level). The scene environment refers to a relatively stable setting (such as a conference room or a café). The temporal environment includes temporal objects (such as time points and time periods) and the relations among them [24].

The Device Context Information.
In this paper, it mainly refers to the capabilities and profile of a device. The device is the carrier of the interaction between the service and the user. Its context includes the static device context and the dynamic device context. The static device context includes the device type and the display performance. The dynamic device context includes the signal strength, the moving speed, and the remaining battery power of the device.

The User Context Information.
In this paper, it mainly refers to the capabilities and profile of the user. The user is the entity that can initiate a service demand in the Internet of Things (a person, an ordinary object, etc.). It includes the static user context, the dynamic user context, and the historical user context. The static user context includes the user's identification, identity, and preferences. The dynamic user context includes the user's position, posture, and permission. The position includes the geometric position (specific latitude, longitude, and altitude obtained through GPS, etc.) and the relative position ("end of corridor," "the east side of the classroom," etc.).

[Figure 1: Structure of the context fusion model. Recoverable labels: High-level context category 2, ..., High-level context category t; n < m.]

[...] Some context information can be obtained directly (e.g., light, remaining battery); these are called the low-level contexts. The original acquisition is the process of obtaining various types of context information (or data) directly from a variety of context information sources (such as sensors, RFID readers, and cameras). Other context information must be derived by integrating the low-level context information obtained from the information sources; this is the high-level context information (scenes, relative position, etc.). For example, the scene that the user is in cannot be obtained directly. It must be inferred through a comprehensive analysis of the low-level context, such as the temperature, the humidity, the speed, the position, and the direction (in the corridor, upstairs, etc.). In the Internet of Things, the perceived context information is often complex and multidomain, and it is difficult to derive effective rules from expert experience, so rule-based reasoning is not suitable. This paper introduces a machine learning method and gives a context fusion model based on the genetic algorithm and the decision tree. The model structure is shown in Figure 1.
First, a feature selection method is adopted to reduce the input dimension. Then, a classification algorithm fuses the low-level context to determine the kind of high-level context. This method obtains inference rules by training on a large number of samples, and it does not require human intervention. Although the inference result is not perfectly accurate, the method is easy to implement and to apply to large-scale information, so it is feasible when the requirements on identification accuracy are not very strict.
The model contains two parts: feature selection and classification. The purpose of feature selection is to decrease the dimension of the training samples by deleting features that are unrelated or only weakly related to the task, in order to obtain a simple but important feature subset [25]. The samples are then trained on this feature subset to obtain classification rules, and, according to these rules, a set of low-level contexts can be classified as a kind of high-level context. In this model, the dozens of dimensions of the low-level context information (recorded as $m$) constitute the original input space, and the high-level context categories corresponding to the low-level context information (recorded as $t$) constitute the output space. A feature selection algorithm (filter, wrapper, or mixed mode) applied to the original input space yields a compact feature subset of $n$ ($n < m$) dimensions. The dimensions of the test data are then reduced according to this feature subset. Finally, the classification and recognition of each sample is completed by a multiclassification algorithm (Bayes, decision tree, or SVM).
Bayes classification follows Bayes' theorem, which gives a method for calculating the posterior probability. It provides a way to combine practical learning algorithms with prior knowledge and observed data, and it offers a useful perspective for understanding and evaluating many learning algorithms. Naive Bayes is the most commonly used Bayes classification method. The naive Bayes classifier is a very simple probabilistic classification algorithm that applies Bayes' theorem under the assumption that the predictors are independent of one another. These independence assumptions often do not hold in reality, which is why the classifier is called naive. A naive Bayes model is easy to build, requires no complicated iterative parameter estimation, and is therefore very useful for large datasets. Despite its simplicity, the naive Bayes classifier is widely used, because it usually performs well and can outperform more sophisticated classification methods.
The support vector machine (SVM) is a set of supervised learning methods, derived from statistical learning theory, that is used for classification, regression analysis, and outlier detection. It often yields good classification results on complex and noisy data. SVM is mainly used for two-class classification problems. Although some papers mention that combinations of support vector machines can be used to solve multiclass classification problems, this process requires some caution.
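For reference, the naive Bayes decision rule described above can be written out as follows. The symbols are notation introduced here for illustration and do not appear in the original text: $c$ is a candidate high-level context class, $x_1, \ldots, x_n$ are the feature values of one record, and $\hat{c}$ is the predicted class.

```latex
\begin{align}
P(c \mid x_1, \ldots, x_n)
  &= \frac{P(c)\, P(x_1, \ldots, x_n \mid c)}{P(x_1, \ldots, x_n)}
  && \text{(Bayes' theorem)} \\
P(x_1, \ldots, x_n \mid c)
  &= \prod_{i=1}^{n} P(x_i \mid c)
  && \text{(naive independence assumption)} \\
\hat{c}
  &= \arg\max_{c}\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
\end{align}
```

The denominator $P(x_1, \ldots, x_n)$ does not depend on $c$, so it can be dropped when only the most probable class is needed.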
Among classification techniques, the decision tree is a powerful method. It determines which features are used to split the training data, which can lead to good generalization. The decision tree algorithm handles binary and multiclass problems naturally, and its leaf nodes can refer to any of the classes concerned.
Through extensive experiments, we find that the decision tree classification algorithm combined with the wrapper feature selection method based on the genetic algorithm achieves good classification results on context information for the Internet of Things. The experiments are presented in Section 4. Next, we introduce the feature selection method based on the genetic algorithm and the multiclassification method based on the decision tree.

Feature Selection Based on the Genetic Algorithm.
Generally, there are three modes of feature selection, namely, the filter mode, the wrapper mode, and the mixed mode. The filter mode uses properties of the data itself as the evaluation index of a feature subset, whereas the wrapper mode uses the correct classification rate of the machine learning algorithm as the evaluation index [26]. In general, filter feature selection is faster. However, its selection process is independent of the machine learning algorithm, so the resulting feature subset may not suit the particular algorithm and is not necessarily the optimal one. The wrapper mode is slower than the filter mode, because it requires cross-validation and more complex calculation, but the feature subset is adapted to the classification algorithm, so the selection result is generally better. The mixed mode performs feature selection in two steps, so its computation time is very large while its accuracy is not improved significantly; for this reason it is not commonly used. This paper chooses the wrapper mode for feature selection. The wrapper works by packaging the data into different feature subsets according to their dimensions and choosing among them by the correct classification rate. We therefore need to search the whole space of feature subsets.
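The wrapper evaluation described above can be sketched as follows: a candidate feature subset is scored by the cross-validated correct classification rate of a classifier trained only on those features. This is a minimal sketch under stated assumptions, not the paper's implementation. The names `project`, `nearest_neighbor_predict`, and `subset_accuracy`, the 1-nearest-neighbour stand-in classifier, the choice of five folds, and the toy data are all hypothetical; the paper evaluates subsets with Bayes, decision tree, and SVM classifiers.

```python
import random


def project(row, mask):
    """Keep only the features whose mask bit is 1."""
    return [value for value, keep in zip(row, mask) if keep]


def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in classifier (assumption): 1-nearest neighbour, squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def subset_accuracy(X, y, mask, k=5, seed=0):
    """k-fold cross-validated correct classification rate of one feature subset."""
    if not any(mask):
        return 0.0  # an empty subset cannot classify anything
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        train_X = [project(X[i], mask) for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            predicted = nearest_neighbor_predict(train_X, train_y, project(X[i], mask))
            if predicted == y[i]:
                correct += 1
    return correct / len(X)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
toy_X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [0.05, 3.0], [0.15, 7.0],
         [1.0, 5.1], [1.1, 1.2], [0.9, 9.3], [1.05, 3.1], [0.95, 7.2]]
toy_y = ["indoor"] * 5 + ["outdoor"] * 5

informative = subset_accuracy(toy_X, toy_y, [1, 0])
noisy = subset_accuracy(toy_X, toy_y, [0, 1])
```

Because the score depends on the classifier, the same data can yield different preferred subsets for different classifiers, which is consistent with the text's remark that the wrapper subset is adapted to the classification algorithm.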
Search strategies are generally divided into three types, namely, exhaustive search, heuristic search, and nondeterministic (random) search [27]. The exhaustive search strategy examines all possible feature subsets, so it is guaranteed to find the optimal subset, but it is impractical for a large number of features because its cost in space and time is too high. The heuristic search strategy searches for feature subsets according to a certain heuristic rule [28]. Its cost is lower, but it is liable to fall into a local optimum and miss the global optimum. The nondeterministic search strategy, of which the genetic algorithm is an example, is a balance between these two kinds of search. We use the genetic algorithm in this paper. Figure 2 shows the wrapper mode based on the genetic algorithm in the feature selection process. Because context classification is a multiclassification problem, the typical methods of Bayes, decision tree, and SVM are chosen as the evaluation functions of the feature subset. The decision tree is the main method used in this paper; Bayes and SVM are used for comparative experiments.
In Figure 2, the parameters are initialized first, which is a key step of the genetic algorithm. In this paper, the feature subset is encoded as the only parameter. The model then carries out the generation of feature subsets, the evaluation of feature subsets, the stopping test, and the verification of the result. An initial population of feature subsets is generated randomly according to the initial parameters. The feature subsets of each subsequent generation are produced by the genetic operators according to the relevant parameters. The fitness function is defined as follows: [...] (1). In (1), the fitness of an individual (a feature subset) of a given generation is computed from CR, the correct classification rate of that subset; the number of subsets in the generation; and $\mathrm{CR}_{\max}$, $\mathrm{CR}_{\min}$, and $\mathrm{CR}_{\mathrm{avg}}$, the maximum, minimum, and average classification accuracy of the subsets in that generation.
Genetic operators are the key to implementing the optimal search. There are three kinds of operators, namely, selection, crossover, and mutation [29]. The selection operator uses a certain strategy, based on the evaluation of individual fitness, to select the individuals that are passed on to the next generation [30]. The crossover operator randomly selects two chromosomes according to a certain crossover probability and exchanges some of their genes to form two new individuals. The mutation operator supplements the search capability [31] of the genetic algorithm: it is an auxiliary method for generating new individuals, and it determines the local search ability of the genetic algorithm. The cooperation between the crossover operator and the mutation operator accomplishes both the global search and the local search of the search space, which allows the genetic algorithm to complete the search process with good performance [32]. In this paper, two convergence conditions are designed for the algorithm: one is that the subset has become stable, and the other is that the number of generations has exceeded a threshold.
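The search loop described above (selection, crossover, mutation, and the two convergence conditions) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names, the binary tournament selection, the single-point crossover, the elitism step, and all default parameter values are hypothetical. Because equation (1) is not reproduced in the text, the `fitness` argument is left as an arbitrary callable; in a wrapper setting it would return a score derived from the correct classification rate of the subset.

```python
import random


def tournament(population, scores, rng):
    """Selection operator: return a copy of the fitter of two random individuals."""
    a, b = rng.randrange(len(population)), rng.randrange(len(population))
    return list(population[a] if scores[a] >= scores[b] else population[b])


def genetic_feature_selection(n_features, fitness, pop_size=20,
                              max_generations=50, stable_generations=5,
                              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a feature mask (list of 0/1 bits) that maximizes `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best, best_score, stable = None, float("-inf"), 0

    for _ in range(max_generations):          # condition 2: generation threshold
        scores = [fitness(mask) for mask in population]
        top = max(range(pop_size), key=lambda i: scores[i])
        if scores[top] > best_score:
            best, best_score, stable = list(population[top]), scores[top], 0
        else:
            stable += 1
        if stable >= stable_generations:      # condition 1: best subset is stable
            break

        next_population = [list(best)]        # keep the best subset (elitism)
        while len(next_population) < pop_size:
            child_a = tournament(population, scores, rng)
            child_b = tournament(population, scores, rng)
            # Crossover: exchange the tails of the two chromosomes.
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)
                child_a, child_b = (child_a[:cut] + child_b[cut:],
                                    child_b[:cut] + child_a[cut:])
            # Mutation: flip each bit with a small probability.
            for child in (child_a, child_b):
                for i in range(n_features):
                    if rng.random() < mutation_rate:
                        child[i] = 1 - child[i]
                next_population.append(child)
        population = next_population[:pop_size]

    return best, best_score


# Toy fitness (hypothetical): reward agreement with a known "useful" mask.
TARGET = [1, 0, 1, 0, 0, 1, 0, 0]


def toy_fitness(mask):
    return sum(1 for bit, wanted in zip(mask, TARGET) if bit == wanted) / len(TARGET)


best_mask, best_score = genetic_feature_selection(len(TARGET), toy_fitness)
```

The "stable" condition is interpreted here as the best score not improving for `stable_generations` consecutive generations; the paper does not state its exact stability test, so this is one possible reading.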

Decision Tree Classification Algorithm.
Decision tree learning is one of the most commonly used methods in data mining. Its goal is to create a model that predicts the value of a target variable from several input variables. Each internal node corresponds to an input variable and has an edge to a child for each possible value of that variable. Each leaf represents a value of the target variable given the values of the input variables along the path from the root to the leaf. The top node of the tree is the root. The decision tree is built and refined by training on samples, so the building process of the decision tree is a machine learning process [33].
Currently, there are many classic decision tree algorithms, such as the ID3 algorithm, the C4.5 algorithm, and the CART algorithm. The ID3 algorithm can only deal with discrete data. The C4.5 algorithm improves on the ID3 algorithm by using the information gain ratio to select the test attributes. Its other improvements include handling both continuous and discrete attributes, handling training data with missing attribute values, handling attributes with differing costs, and pruning trees after creation [33]. The CART algorithm cannot efficiently handle large-scale training sample data. Based on the above analysis, we choose the C4.5 algorithm in this paper.

The Calculation of the Information Gain.
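If the values of a property $A$ of the samples divide the sample set $S$ into $h$ subsets, namely, $S_1, S_2, \ldots, S_h$, the formula for calculating the information gain is as follows:

$$\mathrm{gain}(A) = \mathrm{info}(S) - \sum_{i=1}^{h} \frac{|S_i|}{|S|}\,\mathrm{info}(S_i). \tag{2}$$

In (2), $|S|$ is the number of samples in dataset $S$; $|S_i|$ is the number of samples in subset $S_i$; $\mathrm{info}(S)$ is calculated as follows:

$$\mathrm{info}(S) = -\sum_{j=1}^{k} \frac{\mathrm{freq}(C_j, S)}{|S|} \log_2 \frac{\mathrm{freq}(C_j, S)}{|S|}, \tag{3}$$

where $\mathrm{freq}(C_j, S)$ is the frequency of category $C_j$ in the sample data $S$ and $k$ is the number of categories of the sample $S$.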
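The information gain calculation above can be illustrated with a short sketch. The function names `info`, `partition`, and `gain`, the dictionary-based record layout, and the toy records are assumptions made for this example.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy info(S) of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(rows, labels, attribute):
    """Split the labels into subsets S_1..S_h by the value of one attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    return subsets


def gain(rows, labels, attribute):
    """Information gain: info(S) minus the weighted entropy of the subsets."""
    total = len(labels)
    remainder = sum(len(subset) / total * info(subset)
                    for subset in partition(rows, labels, attribute).values())
    return info(labels) - remainder


# Toy records (hypothetical): "light" determines the scene, "motion" does not.
rows = [{"light": "bright", "motion": "still"},
        {"light": "bright", "motion": "moving"},
        {"light": "dark", "motion": "still"},
        {"light": "dark", "motion": "moving"}]
labels = ["outdoor", "outdoor", "indoor", "indoor"]

gain_light = gain(rows, labels, "light")
gain_motion = gain(rows, labels, "motion")
```

In this toy set the two classes are balanced, so the entropy of the whole set is 1 bit; splitting on "light" removes all of it, while splitting on "motion" removes none.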

The Calculation of the Information Gain Ratio.
One has

$$\mathrm{gainratio}(A) = \frac{\mathrm{gain}(A)}{\mathrm{splitinf}(A)}. \tag{4}$$

In (4), $\mathrm{splitinf}(A)$ represents the split information, which is the potential information generated when the sample set $S$ is divided by the property $A$ into $h$ parts $S_1, S_2, \ldots, S_h$; the formula is as follows:

$$\mathrm{splitinf}(A) = -\sum_{i=1}^{h} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}. \tag{5}$$

Building the Decision Tree.
The method of building a decision tree is as follows:
S1: Create the node $N$ and start building the decision tree from this node.
S2: If all the samples are in the same class, the node becomes a leaf node. Label the node with this class.
S3: Otherwise, for each property whose data is continuous, discretize the data.
S4: Calculate the information gain ratio for each property; then select the property with the highest information gain ratio and label the node with it.
S5: For each value of the selected property, produce a branch that contains the samples sharing that value.
S6: Let $B$ be the set of training samples on a branch. If $B$ is empty, add a leaf node and mark it with the class.
S7: If $B$ is not empty, go to S4.
To prevent the tree from overfitting the training samples and to improve the speed and accuracy of subsequent classification, a pruning strategy is usually needed.
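Steps S1 to S7 can be sketched as a short recursive procedure. This is a simplified illustration, not a full C4.5 implementation, and it rests on several assumptions: all properties are already discrete (step S3 is taken as done), branches are created only for values that actually occur (so the empty-branch case of S6 is handled by a majority-class default for unseen values), each recursive call first rechecks the single-class condition of S2, and no pruning is performed. The function names and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attribute):
    """Information gain divided by split information for one property."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    gain = info(labels) - sum(len(s) / total * info(s) for s in subsets.values())
    split_info = -sum(len(s) / total * log2(len(s) / total) for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or a dict describing an internal node."""
    if len(set(labels)) == 1:                 # S2: single class, make a leaf
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:                        # nothing left to test
        return majority
    # S4: choose the property with the highest information gain ratio.
    best = max(attributes, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:   # no property separates the classes
        return majority
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:  # S5: one branch per observed value
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs],
                                     [l for _, l in pairs], remaining)  # S7
    return {"attribute": best, "branches": branches, "default": majority}


def classify(tree, row):
    """Follow the branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


# Toy records (hypothetical): "light" determines the scene, "motion" does not.
rows = [{"light": "bright", "motion": "still"},
        {"light": "bright", "motion": "moving"},
        {"light": "dark", "motion": "still"},
        {"light": "dark", "motion": "moving"}]
labels = ["outdoor", "outdoor", "indoor", "indoor"]

tree = build_tree(rows, labels, ["light", "motion"])
```

Dividing the gain by the split information penalizes properties that fragment the data into many small subsets, which is the motivation for using the gain ratio instead of the raw gain.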
Pruning is a machine learning technique that reduces the size of a decision tree by removing sections of the tree that provide little power to classify instances. Pruning not only reduces the complexity of the final classifier but also improves predictive accuracy by reducing overfitting. Decision trees are commonly pruned on the basis of the classification error rate and the encoding length of the tree [34]. For each nonleaf node, the pruning method calculates the expected classification error rate that would result if the node were pruned. It also calculates the expected classification error rate if the node is not pruned, from the classification error rate and the weight of each branch. If pruning would increase the expected error rate, the pruning is abandoned and every branch of the node is retained; otherwise, the branches of the node are pruned. During pruning, an independent test dataset is needed to evaluate the classification accuracy of the pruned tree, so that the pruned decision tree with the minimum expected error rate is retained.
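The pruning decision for a single nonleaf node can be sketched as a comparison of two error rates. This is a minimal sketch under stated assumptions: the function names are hypothetical, the class counts and branch error rates are assumed to come from the independent test dataset mentioned above, a pruned node is assumed to be labeled with its majority class, and each branch's weight is taken to be its share of the samples reaching the node. The numbers in the example are invented for illustration.

```python
def leaf_error_rate(class_counts):
    """Expected error rate if the node is pruned to a majority-class leaf."""
    total = sum(class_counts.values())
    return 1.0 - max(class_counts.values()) / total


def subtree_error_rate(branches):
    """Expected error rate if the node is kept.

    `branches` is a list of (n_samples, error_rate) pairs, one per branch;
    each branch is weighted by its share of the samples.
    """
    total = sum(n for n, _ in branches)
    return sum(n / total * error for n, error in branches)


def should_prune(class_counts, branches):
    """Prune unless pruning would increase the expected error rate."""
    return leaf_error_rate(class_counts) <= subtree_error_rate(branches)


# Hypothetical node reached by 10 test samples: 8 "indoor" and 2 "outdoor".
counts = {"indoor": 8, "outdoor": 2}

# Branches that classify well: pruning would raise the error, so keep them.
keep_case = should_prune(counts, [(6, 0.0), (4, 0.25)])

# Branches that classify poorly: the single leaf is no worse, so prune.
prune_case = should_prune(counts, [(6, 0.5), (4, 0.25)])
```

In the first case the leaf would misclassify 20% of the samples against 10% for the subtree, so the branches are retained; in the second case the subtree's weighted error is 40%, so the node is replaced by a leaf.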

Experiments and Results Analysis
In this paper, we use a context information dataset, "Sensor Signal Dataset for Exploring Context Recognition of Mobile Devices," proposed in [35]. The dataset has thirty-two columns of sensor information [25]. The first and second columns give the sequence number of the test scheme and the repetition number of the test; the third column gives the time context; the fourth to ninth columns give the device direction context; the tenth and eleventh columns give the device stability context; the twelfth column shows whether the device is in the hand; the thirteenth to nineteenth columns give the illumination context; the twentieth to twenty-third columns give the temperature context; the twenty-fourth to twenty-sixth columns give the humidity context; the twenty-seventh to twenty-ninth columns give the noise context; and the thirtieth to thirty-second columns give the action context of human behavior. The 10,470 records from the third column to the thirty-second column are chosen in this experiment to constitute the original context dataset (10,470 × 30). According to the image materials given by the authors [28], we organize [...]
In the first experiment (E1), 2,960 records are selected randomly as the feature selection dataset. The remaining data is used in the classification experiment. The Bayes algorithm and the SVM algorithm are used as comparison algorithms. The former uses the representative naive Bayes method. The latter uses an SVM multiclassification algorithm based on a voting mechanism, with the C-SVM algorithm and an RBF kernel function for each binary classification. The test results are as follows.
The results of feature subset selection for the three algorithms are shown in Table 2. According to the table, the SVM algorithm and the decision tree algorithm each select 13 features, whereas the Bayes algorithm selects 12 features. The features in the third and tenth columns appear to be more important, because all three algorithms select them. Table 3 shows the time consumed by the three algorithms for training and testing. The Bayes algorithm takes the least time, the decision tree algorithm takes more time than Bayes, and the SVM consumes more training and testing time than both Bayes and the decision tree.
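The data preparation for experiment E1 can be sketched as follows: columns 3 to 32 of each record are kept, and 2,960 of the 10,470 records are drawn at random for feature selection while the rest are used for classification. This is only an illustration of the split described in the text. The function names, the fixed random seed, and the assumption that each record is a list of 32 values in column order are hypothetical, and no actual dataset is loaded.

```python
import random


def context_columns(record):
    """Keep columns 3..32 (1-based) of a 32-column record, i.e. 30 values."""
    return record[2:32]


def split_records(n_records, n_selection, seed=0):
    """Randomly split record indices into a feature selection set and the rest."""
    rng = random.Random(seed)
    selection = sorted(rng.sample(range(n_records), n_selection))
    chosen = set(selection)
    classification = [i for i in range(n_records) if i not in chosen]
    return selection, classification


selection_idx, classification_idx = split_records(10470, 2960)

# A placeholder record whose values are simply its 1-based column numbers.
example = context_columns(list(range(1, 33)))
```

With these sizes, 7,510 records remain for the classification experiment.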
A comparison of the classification accuracy of the three algorithms is shown in Figure 3. As can be seen, the highest is the decision tree whose classification accuracy rate is more than 95%. The second is the SVM with nearly 89%. The lowest is the Bayes method with only approximately 57%.
To verify the adaptability of the proposed fusion model to different context data, we carried out another experiment (E2) using another 980 random records from the dataset. We again used the Bayes algorithm and the SVM algorithm as comparison algorithms. The results of feature subset selection for the three algorithms are shown in Table 4. The classification accuracy of the three algorithms is shown in Figure 4.
The experimental results show that the multiclassification algorithm based on the decision tree achieves higher classification accuracy than the classical Bayes and SVM classification algorithms. It is suitable for the fusion of large-scale multidomain low-level context information, so the high-level context class represented by the low-level context information can be obtained quickly and accurately. In addition, because the random forest is an ensemble extension of the decision tree, replacing the decision tree with a random forest in the context fusion model may also achieve good results.

Conclusion
This paper has proposed a context fusion model based on machine learning to achieve the fusion of multidomain low-level context information in the Internet of Things. First, the dimensions of the original data are reduced by the wrapper feature selection method based on the genetic algorithm. Then, the decision tree classification algorithm classifies each record of low-level context information to determine which kind of high-level context it belongs to. The experimental results confirm the validity of the proposed model. If the requirements on recognition accuracy are not particularly strict, the model is feasible for large-scale context information. Further research is needed on the optimal selection of some related parameters in the algorithm.

Conflicts of Interest
The authors declare that they have no conflicts of interest.