Rough Set Approach toward Data Modelling and User Knowledge for Extracting Insights

Information is a major asset of an organization. As technology advances, the level of knowledge increases over time, and information grows in volume, velocity, and variety. Extracting meaningful insights from such information and knowledge is a pressing need. Visualization is a key tool and has become one of the most significant platforms for interpreting, extracting, and communicating information. The current study applies a rough set approach to data modelling and user knowledge for extracting meaningful insights. The experimental setup uses different rough set algorithms: K-nearest neighbours (KNN), decision rules (DR), decomposition tree (DT), and local transfer function classifier (LTF-C). The accuracy of the approach is evaluated for data modelling and user knowledge. The experimental setup of the proposed method is validated using a dataset available in the UCI web repository. Results of the proposed study show that the model is effective and efficient, with an accuracy of 96% for KNN, 87% for decision rules, 91% for decision trees, 85.04% for the cross validation architecture, and 94.3% for the local transfer function classifier. The validity of the proposed classification algorithms is tested using different performance metrics: F-score, precision, accuracy, recall, specificity, and misclassification rate. On all these performance metrics, the KNN classifier outperformed the other classifiers, and this high performance shows the applicability of the KNN classifier to the proposed problem.


Introduction
With the passage of time, information and user knowledge keep increasing. This is due to the advancements and rapid development in technology. Users need essential information in their daily life, which requires the support of advanced tools such as Hadoop, Tableau, and Informatica PowerCenter. Data and knowledge exist in diverse forms, both structured and unstructured. Structured data are mostly easy to understand and manage, while extracting meaningful insights from unstructured data has become a challenging task. According to the IDC report [1], about 1.8 ZB of data had been created by late 2011. Globally, approximately 1.2 ZB (on the order of 10^21 bytes) of electronic data are generated per year by diverse sources [2]. By 2020, 40 ZB of data were expected [3]. Human beings are always interested in capturing knowledge in an easy and effective way. This ease comes from translating data and knowledge into graphs or maps that users can understand.
The role of visual context is evident: patterns are identified from huge volumes of data and conveyed through graphics and visualizations. Conclusions are drawn from the data by collecting, modelling, and processing the data and finally plotting the derived results. From interrelated perspectives, data, knowledge, and information are all used in visualization. The aim of visualization is to gain meaningful insights from the data [4]. Users can interact with the data using visualization techniques and analyze the data and knowledge. Data visualization allows one to communicate effectively and easily, both for conveying a message and for technical drawing for scientific purposes.
To support data modelling and user knowledge, the contribution of this research is a rough set approach toward user knowledge and data modelling for extracting insights. Different rough set algorithms, namely KNN, decision rules, decomposition tree, and LTF-C, were used for the experimental setup. The dataset used for the experimental setup of the proposed method is available in the UCI web repository [5]. KNN has been suggested for different problems such as text recognition [6]. The organization of the paper is as follows. Section 2 presents the related work on user knowledge, data modelling, and visualization. Section 3 describes the research method and modelling of the proposed study, with details of the visualization of the dataset. Section 4 gives the results and discussion. The paper is concluded in Section 5.

Related Work
Researchers are trying to use different approaches, tools, and techniques in order to analyze user knowledge, data modelling, and visualization. Table 1 shows the brief descriptions of the existing approaches available in the literature.

Rough Set Approach toward Data Modelling and User Knowledge for Extracting Insights
Machine learning algorithms play an important role in different areas of research [16, 25-30]. In this paper, a rough set approach is used for data modelling and user knowledge to extract meaningful insights. The rough set approach works well in situations of uncertainty by using lower and upper approximations. The resulting rough set model consists of "IF THEN" rules. The rough set was presented by Pawlak in 1982 [31]. It has a specific lower approximation, upper approximation, and boundary area. Lowering the degree of precision in the data makes the data pattern clearer. Rough sets and boundaries can be presented mathematically [32] in terms of two possibilities: the element certainly belongs to the set, or the element possibly belongs to the set. Figure 1 shows the concept of a rough set. Figure 2 represents the workflow of the rough set theory application. The main parts of the workflow are explained in this section. The experimental process shown in Figure 2 has been implemented using RSES [33]. Rough set and fuzzy rough set theories are based on some preliminary parts [34]. The rough set approach was selected for the proposed research because it works very well in situations of uncertainty and vagueness. The following main parts were considered for the experimental setup:
(i) Decision/information table
(ii) Indiscernibility, reduct, and core
(iii) Cut and discretization
(iv) Rules generation

3.1. Classification Measurements. Various measures have been formulated for classification. The formulation of the coverage measure is given below:

$$\text{coverage} = \frac{\text{no. of cases satisfying condition and decision}}{\text{no. of cases satisfying decision}}. \tag{2}$$
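As a minimal sketch of the ideas in this section, the following Python example computes the lower and upper approximations of a decision class from indiscernibility classes and evaluates the coverage measure of equation (2) for one rule. It follows the standard Pawlak definitions: the lower approximation is the union of indiscernibility classes fully contained in the target set, and the upper approximation is the union of classes that intersect it. The toy table, its attribute values, and all function names are hypothetical illustrations; they are not taken from the UCI dataset or from RSES.

```python
from collections import defaultdict

# Toy decision table (hypothetical values, not the UCI dataset used in the paper).
# Each row: (condition attribute values, decision value).
TABLE = [
    (("low", "low"), "beginner"),
    (("low", "low"), "beginner"),
    (("low", "high"), "beginner"),
    (("low", "high"), "expert"),
    (("high", "high"), "expert"),
    (("high", "low"), "expert"),
]


def indiscernibility_classes(table):
    """Group object indices that share identical condition attribute values."""
    classes = defaultdict(set)
    for index, (conditions, _decision) in enumerate(table):
        classes[conditions].add(index)
    return list(classes.values())


def approximations(table, decision):
    """Return (lower, upper) approximations of the objects labelled `decision`."""
    target = {i for i, (_c, d) in enumerate(table) if d == decision}
    lower, upper = set(), set()
    for block in indiscernibility_classes(table):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


def rule_coverage(table, condition, decision):
    """Coverage as in equation (2): cases matching condition and decision,
    divided by cases matching the decision."""
    both = sum(1 for c, d in table if c == condition and d == decision)
    decided = sum(1 for _c, d in table if d == decision)
    return both / decided if decided else 0.0


lower, upper = approximations(TABLE, "beginner")
boundary = upper - lower  # objects that only possibly belong to the class
coverage = rule_coverage(TABLE, ("low", "low"), "beginner")
```

In this toy table, objects 2 and 3 share the same condition values but carry different decisions, so they fall in the boundary region, and the rule "IF (low, low) THEN beginner" covers two of the three beginner cases.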

Results and Discussion
Different rough set algorithms were applied for the experimental setup of the proposed research. These algorithms include KNN, decision rules, decomposition tree, and LTF-C. Figure 3 shows the knowledge level of the user along with the number of decision instances. Figure 4 shows the algorithms along with the number of rules for the given decision instances.
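For readers unfamiliar with KNN, the following is a generic sketch of the technique in Python: a query point is assigned the majority label among its k nearest training points under Euclidean distance. It is not the RSES implementation used in the experiments; the training points, labels, the value k = 3, and the function name are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical training data: (feature vector, knowledge-level label).
TRAIN = [
    ((0.10, 0.20), "low"),
    ((0.20, 0.10), "low"),
    ((0.15, 0.30), "low"),
    ((0.80, 0.90), "high"),
    ((0.90, 0.80), "high"),
    ((0.70, 0.85), "high"),
]


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training pairs by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _features, label in neighbours)
    return votes.most_common(1)[0][0]


prediction_a = knn_predict(TRAIN, (0.20, 0.20))
prediction_b = knn_predict(TRAIN, (0.85, 0.80))
```

Choosing k is a trade-off: a small k follows local structure closely but is sensitive to noise, while a larger k smooths the decision at the cost of blurring class boundaries.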
Different performance metrics, namely specificity, accuracy, precision, F-score, recall, and misclassification rate, are used to check the validity of the proposed model based on different classification algorithms. These algorithms include KNN, the cross validation/k-fold mechanism, decision rules, decomposition trees, and the local transfer function classifier. The accumulated results are discussed below in detail.

Table 1: Brief descriptions of the existing approaches available in the literature.

| Reference | Year | Method | Description |
|---|---|---|---|
| [7] | 2020 | Knowledge transfer by the domain-independent user latent factor | The study proposed an approach of knowledge transfer by the domain-independent user latent factor for cross-domain recommender systems. The method uses tri-factorization. |
| [8] | 2020 | Assessment of linked data visualization tools | The study analyzed state-of-the-art tools for the visualization of linked data. A list of 77 linked data visualization tools, drawn from previous research and integrating newly published tools, is given. The visualization tools are compared and described based on their usability and features. |
| [9] | 2020 | Role of media in user participation | The study considered the effects of media usage in online commentaries on knowledge creation. The user groups were divided into three categories: passive participants, active participants, and bystanders. The experimental results reveal that active participants largely tend to use tablets, PCs, and smartphones for knowledge creation in online spaces. |
| [10] | 2020 | Entrepreneurs' advantages from user knowledge to create innovation in the digital sector | The authors focused on the value of user knowledge to entrepreneurs and addressed the gap in the literature concerning entrepreneurs' activities and user knowledge in digital services. A framework of innovation opportunity space is proposed and applied to giffgaff, a UK-based mobile telephony supplier, to examine the issues faced in applying user knowledge to digital services. |
| [11] | 2019 | Visualization of knowledge and nanocrystal modelling geometry | The study extracted important insights from crystal geometry and physical properties for creating new structures according to the methodology of knowledge and visualization. |
| [12] | 2019 | User choice of interactive data visualization format | The authors investigated how cognitive style, task difficulty, and spatial ability affect the choice and preference of visualization format, and how the selected visualization affects confidence and decision accuracy. |
| [13] | 2019 | Architecture and optimization of data mining modelling for visualization of knowledge extraction | Gebremeskel and Biazen designed a system capable of analyzing and handling large-scale data. |
| [14] | 2019 | TrajAnalytics | The study presented TrajAnalytics, open source software for modelling, transforming, and visualizing urban trajectory data for urban and transportation studies. The approach allows practitioners to understand population mobility data and discover knowledge. A conceptual data model is presented which integrates geostructures with trajectory data with the help of different data access queries. |
| [15] | 2019 | Visualization and analysis of schemas and instances of ontologies for improving user tasks and knowledge discovery | The authors proposed a visual analytics solution that uses several coordinated views to describe diverse aspects of an ontology and the degree-of-interest technique to reduce complexity in the visual representation of the ontology. |
| [16] | 2018 | Interactive machine learning by visualization | The research presented a visual analytics approach for visual data mining and interactive machine learning. In the approach, multidimensional data visualization techniques are applied to facilitate user interaction with the machine learning and data mining process. |
| [17] | 2018 | Making graph visualization a user-centred process | The study explored a cognitive approach for following a user-centred process in graph visualization. A graph-based visualization model is proposed as a two-stage conceptualized assessment cycle. |
| [18] | 2018 | A user-based taxonomy for deep learning visualization | Yu and Shi presented a minisurvey consisting of a user-based taxonomy that covers state-of-the-art works in the field. |

Complexity 3
(ii) Cross validation: the results of the cross validation-based classification architecture are depicted in Figure 6. From the figure, it is concluded that cross validation provides good results, but its performance is not as good as that of the KNN-based model. It generates comparatively larger misclassification rates than the KNN-based model and smaller percentage values for the other performance measures. These lower accuracy values and higher misclassification rates show that the cross validation mechanism is not well suited to the proposed problem.
(iii) Decision rules: the results of the decision rules-based classification architecture are depicted in Figure 7. Compared to both the KNN and cross validation models, its accuracy results are too small and its misclassification rate is very high. This low performance reflects the inability of the decision rule-based architecture in the proposed field.
(iv) Decomposition tree: the results of the decomposition tree-based classification architecture are depicted in Figure 8. Compared to the KNN, cross validation, and decision rules-based models described above, its accuracy results are too small and its misclassification rate is very high. This low performance reflects the inability of the decomposition tree-based architecture in the proposed field.

| Reference | Year | Method | Description |
|---|---|---|---|
| [19] | 2017 | SemUI | The authors proposed a SemUI tool-based solution as a multitiered method consisting of (a) a semantic layer, which incorporates data through the notion of real-world entities and groups them based on their differences and similarities, and (b) a visualization layer, which concurrently shows several views based on entity properties. |
| [20] | 2017 | Visualization of multidimensional resource space | The study proposed a multidimensional interface for adopting the resource space model and presented its advantages in a property letting application. |
| [21] | 2014 | Model of knowledge generation for visual analytics | The authors proposed a knowledge generation model for visual analytics to tie together different frameworks. |
| [22] | 2014 | CoDe modelling | The study presented a methodology for exploiting the visual language CoDe based on a logic paradigm. CoDe gives a structure for organizing visualization through the CoDe model and graphically represents the relationships between information items. |
| [24] | 2012 | Graphical representation and exploratory visualization for decision trees in the KDD process | The authors presented a representation approach and an exploratory visualization scheme for decision trees in the knowledge discovery in databases (KDD) process for data mining. |

Figure 1: Concept of a rough set, showing the lower approximation and the upper approximation.

(v) LTF-C-based results: the results of the LTF-C-based classification architecture are depicted in Figure 9. For some keywords, it generates the optimum results, but for some instances, it generates high misclassification rates. For two objects, it generates misclassification rates greater than 60% and 17%, which can lead to vague results. In a recognition task, vague results are never acceptable, and this ultimately reflects the nonapplicability of the LTF-C-based architecture in the proposed model.
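The performance metrics used throughout this section can all be derived from a confusion matrix. The sketch below shows the standard formulas for the binary (one-class-versus-rest) case in Python. The function name and the example counts are hypothetical and do not reproduce the paper's results; for a multiclass problem such as user knowledge levels, the same computation would be repeated per class, which is an assumption about how the per-class figures could be obtained.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard classification metrics from confusion matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    true negatives for one class treated as "positive".
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    misclassification_rate = (fp + fn) / total if total else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "misclassification_rate": misclassification_rate,
        "f_score": f_score,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, fn=3, tn=47)
```

Accuracy and misclassification rate always sum to one, so a classifier with a high misclassification rate on some objects necessarily shows a correspondingly low accuracy on them.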

Conclusion
With the enhancement of technology, the level of user knowledge is increasing day by day. This increase of information is in volume, velocity, and variety. Extracting meaningful insights from such information and knowledge is a pressing need. Visualization is a key tool and has become one of the most significant platforms for interpreting, extracting, and communicating information. The current study applies the rough set approach to data modelling and user knowledge for extracting meaningful insights. The experimental setup uses different rough set algorithms: KNN, decision rules, decomposition tree, and LTF-C. The accuracy of the approach is evaluated for data modelling and user knowledge. The experimental setup of the proposed method is validated using a dataset available in the UCI web repository. The KNN algorithm shows the best accuracy among the algorithms used for the experimental setup of the proposed research. The results show an accuracy of 96% for KNN, 87% for decision rules, 91% for decision trees, 85.04% for the cross validation architecture, and 94.3% for LTF-C. The validity of the proposed classification algorithms is tested using different performance metrics: F-score, precision, accuracy, recall, specificity, and misclassification rate.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.