A Management Specification for Data Sharing Security in the System Construction of Smart Mine

With the development of Internet of Things technology and the informatization of the coal industry, various intelligent applications have emerged in the process of the system construction of the smart mine. During this process, data sharing is essential to the effective use of data resources in the smart mine. In order to improve the protection of coal mine data, this study proposes a set of management specifications for data sharing applied to the system construction of a smart mine to unify security standards in data storage and sharing. It standardizes the processes of data collection, transmission, and storage. We design three sub-specifications for these processes, namely, data source specification, data quality specification, and data storage specification. The data source specification specifies the data collection and transmission standards to improve the security and timeliness of data sharing. The data quality specification sets three evaluation criteria of integrity, accuracy, and timeliness according to the characteristics of each business system and data. The system ensures data quality during data sharing by governing and recording the data failing to meet the criteria. The data storage specification specifies the data storage protocol, data label, and data set restrictions. Only authorized platforms and users can share data and make use of data labels to search data efficiently. Finally, we constructed a coal mine data collection and analysis system. It can collect, manage, store, and safely share the real measured data from a certain colliery according to the specifications.


Introduction
With the rapid development of intelligence in the coal industry, the Internet of Things (IoT) has become the key technical support for the construction of intelligent mines. The innovative development of coal resources relies on big data technology. Coal mine data reflect the overall production process, production indicators, safety status, and other production information of the coal mine. With the application of big data technology in smart mines, data have become one of the most important resources. Colliery data have the following characteristics: first, the scale of the data is large; second, the data collection speed is fast; third, the value density of the data is low; and fourth, the data need high accuracy and strong timeliness. Data sharing is essential to the efficient use of data resources in smart mines. One of the biggest challenges of data sharing is to safely transmit the increasing amount of data. Data sharing is often accompanied by extraction, transformation, and loading processes. This means that data quality, data governance, and data security are particularly important. Data standardization has therefore become a challenge that must be met to advance intelligent coal mine construction.
In the operation of a smart mine, there are three different types of data sharing processes, as shown in Figure 1. First, all kinds of automation systems collect data and store them in the data storage platform. Second, the intelligent coal mine application extracts data from the storage platform and analyzes them. Finally, each coal mine aggregates the data collected by the automation systems or taken from the data storage platform and submits them to higher management.
Since the various automation systems belong to different businesses and have different functions, the types and granularity of the data they manage vary. Therefore, the data formats transmitted to the storage platform by each automation system are different. At the same time, there is a lack of uniform transmission protocols and standards for data sharing at all levels. As a result, storing and using coal mine data often runs into problems such as low data accuracy, unguaranteed timeliness, data leakage, and difficulties in accountability. Specifically, there are four common problems:
(1) The format of the data sets is fragmented, which reduces data sharing security and timeliness and makes the standardization workload heavy. The data accessed by the intelligent coal mine big data platform are scattered across multiple devices. When automation systems transfer data to the storage platform and the upper-level management department, the file format and data format description of the data sets sent are confusing and nonstandard. The storage platform and upper management are unable to verify the integrity of the data, which makes data inconsistency caused by secondary delivery likely. Before using data from disparate business systems, the receiving unit must restandardize the data according to business demands. This restandardization has resulted in an enormous workload for those responsible for data collection. It has also led to delays in data collection, and eventually no one wants to take on the job.
(2) The lack of clear data quality descriptions leaves the storage platform unable to guarantee the correctness of shared data. The coal mine automation system cannot describe data quality clearly when uploading data. The storage platform is therefore unsure of the quality of the data, which makes it difficult to ensure correctness and impossible to guarantee data standardization. This poses a potential hazard to future data applications. Problems with data quality can lead the system to collect data repeatedly, which reduces data sharing and makes it difficult to form a virtuous circle.
(3) Coal mines have diverse production environments and equipment, and there is no uniform reference specification for data governance, especially for specific types of data. Coal mines have different business systems, and each business system corresponds to a variety of equipment. It is common for automated systems to generate abnormal data. The storage platform and data collection unit need to manage the erroneous data submitted by each automation system and perform different correction and annotation operations according to the type of error. There is no dedicated governance for characteristic-type data and sensitive data, which can easily lead to leakage of sensitive information. It is difficult for desensitized data to maintain data consistency and business relevance. Data governance is primarily done by people who do not have mining expertise. Because there are no reference specifications describing specialized governance methods for abnormal, irregular, or sensitive data, data governance cannot achieve the expected results.
(4) The platform does not have clear storage specifications, erroneous data are difficult to trace, and there are data sharing security issues. From production to the presentation of the final results of intelligent coal mine applications, coal data often go through multiple processes such as extraction, conversion, mapping, and reorganization. The platform does not select appropriate data storage according to business characteristics, and systems can easily access data at different levels. In these processes, security risks and errors such as data leakage, data tampering, data loss, data inaccuracy, data redundancy, and data expiration often occur. The lack of requirements for data retention times and for the relationships between records makes it difficult to investigate errors and improve processes when these problems occur.
Because of the abovementioned problems, this study designs a specification for data processing in coal mines to unify security standards in data storage and sharing and to improve the protection of coal mine data. This specification covers data collection, transmission, and storage in the process of unified data management in intelligent mines. It specifically includes a data source specification, a data quality specification, and a data storage specification. It has the following attributes and functions:
(1) A data source specification: It describes the format of data transmission and file storage in the data collection process. It reduces the workload of coal mine workers, improves efficiency while reducing personnel, and reduces pretreatment work in the subsequent stage. The intention is to improve the security and timeliness of data sharing.
(2) A data governance specification: It helps software developers realize data governance without professional knowledge of coal mining and data mining. Desensitization rules are designed according to the data needs of different business units to improve the security of sharing special data.
(3) A data storage specification: It defines the data retention period of data storage and the mapping format between recorded data. This specification makes it easier to track problems and improve the system when errors occur in production. Systems share data securely based on access rights for data.
This paper is arranged as follows. In Section 2, we review some relevant work. In Section 3, we design the top-level structure of the data management specification model. Then, three submodels are proposed in Sections 4, 5, and 6, respectively: the data source specification model, the data quality specification model, and the data storage specification model. In Section 7, we create a coal mine big data system to validate the utility of the specification and demonstrate the implementation of this data management specification. The study is concluded in Section 8.

Related Work
The authors in [1] provided a digital construction plan for coal mine big data based on life cycle management, which included technological approaches such as digital data collection, processing, and storage. It can also be used in other industries. The authors in [2,3] used Internet of Things (IoT) technology to create a smart mining architecture. Their architecture includes data collection, data transfer, data storage, and intelligent applications. The authors in [4] presented a data platform system that combines digital technology, big data, and artificial intelligence. This data platform system can collect, transmit, store, and process smart mining data over the network. However, the main issues encountered in the development of smart mines, such as data transmission and storage efficiency, data quality, and data traceability, cannot be fully addressed in a single system.
Coal businesses employ IoT technology to construct smart mines in order to boost mine production and better manage coal mine big data. However, transferring the huge amounts of data created by end devices has become an important issue that must be addressed. Edge computing is currently a representative solution for reducing IoT data transmission delay [5][6][7][8]. The studies in [9,10] propose techniques for task assignment in edge computing systems and carefully examine the trade-off between data transmission and computing resource allocation. Based on multihop vehicle computation resources, the authors in [11] suggested an adaptive offloading algorithm. The aim of these algorithms is to reduce task delays. Another approach, which addresses the low-quality intelligent analysis findings produced by data noise in large data sets, is to effectively minimize the data set size [12,13]. The authors in [14] created an edge-computing-based system to handle data anomaly detection and analysis in underground mining. The edge devices performed the anomaly detection jobs, which increased efficiency. The authors in [15] presented a study based on edge computing technologies that offered intelligent video surveillance for coal mines. FL-YOLO, an algorithm built on depthwise separable convolution and downsampling inverted residual blocks, was used on edge devices to identify security incidents. The authors in [16] developed a task offloading method that took into account network latency, wireless communication air rate, and computing resource consumption. To find the best option, they employed particle swarm optimization. The authors in [17] used federated learning in wireless edge networks to safeguard the privacy of user data, improving the performance of federated learning by jointly optimizing local accuracy and various resource allocation strategies. The authors in [18,19] provided algorithms for the nodes of an IoT system. They evaluate the social relationships between nodes and partition the nodes to increase the information transmission efficiency and network performance of the IoT. Our specification is based on IoT devices and edge computing. It standardizes data processing and designs anomalous data detection and governance standards to improve data transmission speed and quality.
Various data standards are used in different fields to describe and manage data storage and transmission. For instance, in the field of geographical information, the International Organization for Standardization [20] provides a structure that describes the various steps involved in data description, management, transmission, and sharing. In order to identify the types of errors in metadata elements, the authors in [21] presented a method based on ISO 19157:2013 that can be used to improve data quality. In the field of biology, the authors in [22] proposed a data specification known as BIND to describe and store biomolecular information. In the field of medicine, various medical decision-making systems are based on data collected and stored by multiple sources [23]. To improve the efficiency of telemedicine services, the authors in [24] developed a framework that standardizes the four processes of data collection, analysis, transmission, and decision-making. Due to the inconsistent nature of data specifications in materials science, it is difficult to use the data in deep learning. To address this issue, the authors in [25] created a data specification that is flexible, searchable, and formal. In the smart city, the data collected by sensors need to be stored and analyzed to improve the efficiency of operations. The authors in [26] proposed an attribute-based specification that can be used to find and analyze the data. We proposed a data specification for the coal mining industry in [27], but it is still not comprehensive, and this study extends that work.

Data Management Specification Model
This paper mainly discusses the specification of the data processing stage in coal mines, and the general framework is shown in Figure 2. The figure describes the direction of data flow. Data source access standardizes the data collection behavior of each automation system at the source end, including data source, data format, and equipment information. Data transmission is a standard constraint on the transmission stage between different levels, mainly the specification of data transmission mode, transmission protocol, and data governance. Data storage standardizes data storage locations and storage media.

Security and Communication Networks
This section defines a data management specification model using the unified modeling language (UML) based on the data source access, data transmission, and data storage parts of the abovementioned framework, as shown in Figures 3-6. It covers all data processing stages of smart coal mining, including data characteristics, data transmission, data quality, and data storage. The top structure of the data management specification model is depicted in Figure 3. It consists of three models: data source specification, data quality specification, and data storage specification.
The three models are connected as follows. First, the data source specification governs the data collection and transmission method. This corresponds to the data source access, transmission method, and transport protocol. Second, the data quality specification outlines the data inspection and data governance procedures to be followed during the transmission process. This corresponds to the data governance module of the framework for data transfer. Third, the processed data are saved following the data storage specification. This requirement corresponds to the data storage module in the framework. Finally, intelligent coal mine applications retrieve and exploit the data.
(1) Data Source Specification Model. The data source specification model provides a full description of the data source. It specifies a hierarchical classification of the data. Some information must be recorded during the data collection and transmission procedure. The standard mandates documentation of the data source system and associated sources, as well as other pertinent and essential information, to permit traceability of problem data and accountability.
(2) Data Quality Specification Model. The data quality specification model is used to establish the data quality standards and assessment criteria for more advanced intelligent applications. Using pertinent details such as the data source system and source description, one can examine data integrity, correctness, and timeliness efficiently. Furthermore, problematic or nonstandard data can be recognized, repaired, and handled by professional experts to raise the level of data quality.
(3) Data Storage Specification Model. The data storage specification model provides a detailed definition of the record information required to transmit data to the storage platform. It specifies the data storage location, medium, and life cycle. Recording the storage location helps make the data flow clear: applications for coal mine intelligence can discover the source of the information, and practitioners can more easily analyze the data lineage. The life cycle helps avoid data duplication, improves the use of storage space, and provides greater support for intelligent applications.

Data Source Specification Model
The data source specification model standardizes the process of data source acquisition and transmission. It unifies the data access process, classifies data hierarchically, and improves data sharing security. Figure 4 shows the data source specification model. During data collection, the following details must be set at the same time: data source system, data source description, data transfer, identification, contact information, and references.
(1) Data source system gives specific information about the system from which the data are collected. It is used to clarify the scope of business scenario requirements for data sharing and to ensure that data usage does not go beyond the authorized scope. For systems containing sensitive information, a database encryption system can be deployed: the information is stored in the database in encrypted form, and an independent permission control system controls access to sensitive data to ensure its security. We define five business layers for coal mine data: mine system, subsystem, device, subdevice, and measurement point. The data source system should include two of these layers: mine system and subsystem. For instance, common coal mine systems are the coal mining system, excavation system, drainage system, ventilation system, transportation system, and electromechanical system. Subsystems are divided by area or function. For example, the subsystems of the main drainage system are the central pump house, the pump house below the adit, and the pump house in the 121-panel. In practice, mine systems and subsystems need to be modified according to the characteristics of the colliery. To standardize data processing across the data sources of multiple business systems, each business system and its equipment are named uniformly through the master data naming specification. The naming rule for the full name of a mine is: abbreviation of the group company, full name of the branch (optional), scope of mining rights-coal mine. The naming rule for a working face is: working face number-function-working face. A specific example is 123 coal mining face.
(2) Data source description is a list of measurement points under the devices and their subdevices that identifies the source of the data. As an example, consider the measurement points of a drive motor. The subdevices of the drive motor are the motor, reducer, inverter, and high voltage switchgear. The measurement points of the motor are A phase winding temperature, B phase winding temperature, C phase winding temperature, motor front shaft temperature, motor rear shaft temperature, etc. For data from different automated systems, we use a data access method based on multiple data sources. A mapping relationship is established between the source and target data to achieve unified naming and a standardized description of the data set. Mapping the source data into a standardized format avoids repetitive manual standardization work and improves the speed of data standardization. The source of the data can be located by keeping log records while the data are being sent. As a result, personnel can inspect the associated data collection devices and measurement points to resolve issues such as data errors or inaccuracies when they arise.
(3) Data transmission describes the necessary information for data transfer and storage. It consists of the required field data and the retention period. To prevent data loss and retransmission, the data source system needs to retain the data after successful transmission. Therefore, the retention time of the data source data is the length of time that the data need to be retained in the data source system after transmission. The system needs to record the time before and after data transmission to calculate the transmission delay and verify the data retention time. In the data transmission information, the transmission method must be either online or offline. The online transmission information includes five fields: connection mode, encoding format, transmission protocol, connection address, and encryption mode. From the receiver's point of view, the connection mode is either active or passive. In the active mode, the data source opens a query port; in the passive mode, the data source system sends the data directly to the receiver. If the offline transmission mode is selected, the offline period and offline storage media need to be recorded. That is, the system records the frequency of offline transmission and the media used, such as weekly or monthly transmission using a hard disk. Encryption methods can be selected from one-way encryption, symmetric encryption, hash function, and digital signature. Users can select reasonable transmission modes and encryption methods according to data characteristics to enhance the security of sharing sensitive data. Recording the whole process of data flow helps to improve the data sharing log.
Figure 4: Data source specification model.
(4) Identification records information about the data format. Specifying the format of data sets and packets ensures that the computer can parse the data. The data set restrictions limit the scope and manner in which the data set can be used. This field ensures that only compliant personnel have access to authorized mine data, improving data security. The identification also requires the data set to follow certain format specifications, which reduces data parsing errors and improves efficiency through a unified file naming format. The file head should be named "coal name; system name; data upload time." Here, the data upload time refers to the time at which the data set file was generated. The file body is a collection of measuring point data, and the data format is "measuring point; unit; upper range; lower range; upper alarm range; lower alarm range; data time."
(5) Contact records the contact person and contact unit of the data source system, specifying the institutional departments and responsibilities related to data sharing. If there is a problem with the source data, managers can quickly seek help through the contact information.
(6) Reference records the industry management methods, professional theoretical knowledge of the coal industry, and relevant technical indicators related to this specification.
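The file head and file body formats quoted in item (4) can be checked mechanically. The following is a minimal sketch in Python, assuming that each entry is a single semicolon-separated line and that range and alarm values are numeric; the names `PointRecord`, `parse_head`, and `parse_body_line` and the sample values are hypothetical and are not part of the specification.

```python
from dataclasses import dataclass


@dataclass
class PointRecord:
    """One file-body entry; field order follows the format quoted in the text."""
    point: str
    unit: str
    upper_range: float
    lower_range: float
    upper_alarm: float
    lower_alarm: float
    data_time: str


def split_fields(line: str, expected: int) -> list:
    """Split a semicolon-separated line and check the field count."""
    parts = [p.strip() for p in line.split(";")]
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}: {line!r}")
    return parts


def parse_head(line: str) -> dict:
    """Parse 'coal name; system name; data upload time'."""
    coal, system, uploaded = split_fields(line, 3)
    return {"coal_name": coal, "system_name": system, "upload_time": uploaded}


def parse_body_line(line: str) -> PointRecord:
    """Parse one measuring point entry of the file body."""
    point, unit, up, low, up_alarm, low_alarm, when = split_fields(line, 7)
    return PointRecord(point, unit, float(up), float(low),
                       float(up_alarm), float(low_alarm), when)


# Hypothetical sample content, not taken from the mine's data.
head = parse_head("Wangjialing; main ventilation monitoring system; 2022-01-01 08:00:00")
record = parse_body_line(
    "A phase winding temperature; degC; 150; 0; 120; 5; 2022-01-01 07:59:58")
print(head["system_name"], record.upper_alarm)
```

Rejecting entries with the wrong number of fields at parse time is one way to surface the parsing errors that the unified format is meant to reduce.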
Data specification information can be set using XML files. Instance 1 shows a simple example of the data source specification model in XML format. It only gives partial information about the data source specification model.
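For illustration only, the sketch below builds a small XML fragment of the kind such a file could contain, using Python's standard `xml.etree.ElementTree`. It is not a reproduction of Instance 1: all element names (`DataSourceSpecification`, `DataSourceSystem`, `Online`, and so on) and all field values are assumptions chosen to mirror the fields described in this section.

```python
import xml.etree.ElementTree as ET


def build_data_source_spec() -> ET.Element:
    """Assemble a partial data source specification (hypothetical tag names)."""
    root = ET.Element("DataSourceSpecification")

    # Data source system: the mine system and subsystem layers.
    system = ET.SubElement(root, "DataSourceSystem")
    ET.SubElement(system, "MineSystem").text = "drainage system"
    ET.SubElement(system, "Subsystem").text = "central pump house"

    # Data transmission: retention period plus the five online fields.
    transfer = ET.SubElement(root, "DataTransmission")
    ET.SubElement(transfer, "RetentionPeriod").text = "1 week"
    online = ET.SubElement(transfer, "Online")
    online_fields = (
        ("ConnectionMode", "active"),
        ("EncodingFormat", "UTF-8"),
        ("TransmissionProtocol", "TCP"),
        ("ConnectionAddress", "192.0.2.10:9000"),  # documentation-only address
        ("EncryptionMode", "symmetric encryption"),
    )
    for tag, value in online_fields:
        ET.SubElement(online, tag).text = value
    return root


spec = build_data_source_spec()
xml_text = ET.tostring(spec, encoding="unicode")
print(xml_text)
```

A real deployment would also carry the data source description, identification, contact, and reference modules; they are omitted here to keep the sketch short.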

Data Quality Specification Model
The coal data quality standard is described and evaluated using the data quality specification model, which also ensures the accuracy and consistency of the shared data. Figure 5 shows the data quality specification model, including the following modules.
Data quality is the specific criterion for describing and evaluating data. The specific content needs to be developed based on the advice of business experts and combined with the actual situation of the coal mine and the classification of data, taking into account the three attributes of integrity, accuracy, and timeliness.
To examine data integrity at the business level, it is necessary to consider the overall and local aspects separately. System-level integrity requirements describe the specific business information contained in the entire automation system, including system information, subsystem information, and equipment information. For example, the ventilation system includes 3 ventilators, 2 vertical air gates, and 2 fan oil stations. The parameter-level integrity describes all data measurement points in the system. For example, the fan needs to measure the fan blade angle, the winding temperature and bearing temperature related to the fan motor, the wind speed and efficiency of the fan, etc.
In addition, the accuracy requirements define the corresponding data type, data range, and data length of the measured point data. The data type ensures the accuracy requirements of the data. The data range allows the reasonableness of the data to be evaluated. For example, if the data type is Boolean, then the data have no data range. If the data are of other types with a clear threshold range, then the data range needs to be specified according to the actual situation. The data range can be developed in a variety of ways, including the parameters of the equipment itself and the expert's estimate of the safety situation in the coal mine. For example, the upper threshold for the pool water level of the gas drainage system is 1.9 m and the lower threshold is 0.8 m. For data with timeliness characteristics, the timeliness requirements are used to judge the quality of the data. Timeliness means that data will be recorded in chronological order and conform to certain rules of change.
Data governance consists of a mandatory label and a mandatory governance standard. These mandatory governance standards come from integrity, accuracy, and timeliness. The label indicates whether the data have been modified and what specific modifications have been made. Data governance method, ignore, retransmission, and warning are four common mandatory fields covered by each governance standard. When the data are modified, the data governance method must be recorded. Data governance methods need to refer to expert experience and commonly used methods in related industries. Ignore implies that the inaccuracy is acceptable and should be disregarded. An essential field that contains an unfixable mistake has to be resent. If the data deviate greatly from the normal range, the warning field will be enabled to inform the responsible person that there are security vulnerabilities in the system.
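The accuracy check and the four governance fields can be sketched as follows. This is a minimal Python illustration rather than the system's implementation: the class names, the `tolerance` parameter, and the policy that small excursions are ignored, large ones raise a warning, and type mismatches trigger retransmission are all assumptions made for the example. Only the 0.8 m and 1.9 m thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccuracyRule:
    data_type: type                 # expected type of the measured value
    lower: Optional[float] = None   # None for types without a range, e.g. bool
    upper: Optional[float] = None


@dataclass
class GovernanceLabel:
    modified: bool = False    # whether governance changed the data
    method: str = ""          # governance method, recorded when modified
    ignore: bool = False      # inaccuracy is acceptable
    retransmit: bool = False  # unfixable error, data must be resent
    warning: bool = False     # large deviation, notify the responsible person


def type_matches(value, expected: type) -> bool:
    """Type check that keeps bool and numeric values apart."""
    if expected is bool:
        return isinstance(value, bool)
    if expected is float:
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, expected)


def check_accuracy(value, rule: AccuracyRule, tolerance: float = 0.1) -> GovernanceLabel:
    """Assumed policy: wrong type -> retransmit; small excess -> ignore; large -> warning."""
    if not type_matches(value, rule.data_type):
        return GovernanceLabel(retransmit=True)
    if rule.lower is None or rule.upper is None:
        return GovernanceLabel()  # no range defined, e.g. Boolean data
    if rule.lower <= value <= rule.upper:
        return GovernanceLabel()
    excess = rule.lower - value if value < rule.lower else value - rule.upper
    if excess <= tolerance * (rule.upper - rule.lower):
        return GovernanceLabel(ignore=True)
    return GovernanceLabel(warning=True)


pool_level = AccuracyRule(float, lower=0.8, upper=1.9)  # thresholds from the text, in metres
for reading in (1.2, 1.95, 3.0, "high"):
    print(reading, check_accuracy(reading, pool_level))
```

In practice the tolerance and the choice of action would be set per measurement point by business experts, as the specification requires.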
In the creation of specifications, many methods can be utilized to guarantee the data governance process. For sensitive data, common data desensitization methods are adopted for governance. For ordinary data, we use a density-based outlier identification approach to ensure accuracy. For time-series data, we use a time-series model to ensure timeliness.
The density-based outlier identification method is as follows. First, the data are grouped using the fast-search density peak clustering (DPC) algorithm [28]. Then, any points whose distance from each center is greater than or equal to the radius in the clustering procedure are picked as the outlier candidate set. Finally, an enhanced local outlier factor (LOF) [29] is employed to find outliers in the candidate set.
Formula (1) is used to locate cluster centers. Two concepts are defined. One is the local density of sample $i$, denoted $\rho_i$. The other is the shortest distance between sample $i$ and any sample with a higher local density, denoted $\delta_i$.
$$\rho_i = \sum_{j \neq i} \chi\left(\mathrm{dist}(i, j) - d_c\right) \tag{1}$$
Here, $\mathrm{dist}(i, j)$ is the Euclidean distance between samples $i$ and $j$, $d_c$ is a hyperparameter expressing the cutoff distance, and $\chi(x)$ is an indicator function: $\chi(x) = 1$ when $x < 0$, else $\chi(x) = 0$.
$$\delta_i = \min_{j:\, \rho_j > \rho_i} \mathrm{dist}(i, j) \tag{2}$$
Here, for the sample with the highest local density, $\delta_i$ is the largest distance among all samples. Those samples with larger $\rho_i$ and $\delta_i$ are selected as cluster centers.
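The two quantities can be computed directly from their definitions. The sketch below is a plain-Python illustration under stated assumptions: the local density counts the other samples closer than the cutoff distance, and the sample with no denser neighbor receives its largest distance to any sample. The function name `dpc_density_and_delta` and the sample points are hypothetical.

```python
import math


def dpc_density_and_delta(points, dc):
    """Return (rho, delta) for each sample, following the definitions in the text."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other samples closer than the cutoff distance dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

    delta = []
    for i in range(n):
        # Distances to all samples with a strictly higher local density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A sample with no denser sample takes its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical 2-D samples: two small groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (5, 5)]
rho, delta = dpc_density_and_delta(points, dc=1.5)
for p, r, d in zip(points, rho, delta):
    print(p, r, round(d, 3))
```

Samples that combine a large density with a large distance are the candidates for cluster centers; how many to keep is a tuning decision that this sketch leaves open.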
In the second step, the improved LOF model $f(i)$ is utilized to calculate the degree to which a sample is an outlier:
$$f(i) = \frac{\dfrac{1}{|N_k(i)|} \sum_{j \in N_k(i)} \rho_j}{\rho_i} \tag{3}$$
Here, $\rho_i$ is the local density of sample $i$, and $N_k(i)$ is the set composed of all samples in the $k$-neighborhood of sample $i$. Formula (3) measures the extent of outlying. For example, if $f(i)$ is greater than 1, the point $i$ is located in a sparse area and is an outlier. Otherwise, if $f(i)$ is less than 1, the local density of sample $i$ is higher than that of its neighbors, which is the normal case. The aforementioned approach may be used to score the samples in the outlier candidate set $S$. After the set is sorted, data governance may be applied to the first $n$ outliers.
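A density-ratio score with the behavior described above can be sketched as follows. The exact form of the improved LOF is not reproduced here; as an assumption, the score is taken to be the mean local density of the $k$ nearest neighbors divided by the sample's own local density, which is greater than 1 in sparse areas and less than 1 when the sample is denser than its neighbors. For brevity the sketch scores every sample rather than only a candidate set, and all names and sample points are hypothetical.

```python
import math


def local_density(points, dc):
    """Count, for each sample, the other samples closer than the cutoff dc."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) < dc)
        for i, p in enumerate(points)
    ]


def outlier_factor(points, rho, k):
    """Assumed score: mean density of the k nearest neighbors over own density."""
    factors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        neighbours = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        mean_rho = sum(rho[j] for j in neighbours) / len(neighbours)
        # A sample with zero density is treated as maximally outlying.
        factors.append(mean_rho / rho[i] if rho[i] > 0 else math.inf)
    return factors


# Hypothetical samples: a unit square of four points and one point further away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3)]
rho = local_density(points, dc=3.0)
factors = outlier_factor(points, rho, k=2)

# Sort by score and keep the first n samples for governance (n = 1 here).
ranked = sorted(range(len(points)), key=lambda i: factors[i], reverse=True)
print(rho, factors, ranked[:1])
```

Ranking the scores and governing only the first few samples, as the text describes, avoids treating every score slightly above 1 as an outlier.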
For a normal, stationary, and zero-mean time series $x_t$, if $x_t$ is connected to the values and excitations of the preceding steps, there is a general ARMA model (formula (5)) [30]. The ARMA model comprises an autoregressive model (AR) and a moving average model (MA).
$$x_t = \sum_{i=1}^{n} \alpha_i x_{t-i} + n_t - \sum_{j=1}^{m} \beta_j n_{t-j} \tag{5}$$
Here, $n$ and $m$ are the orders of the autoregressive process and the moving average process, respectively. The ARMA model is denoted as ARMA$(n, m)$. If $n = 0$, the ARMA model becomes the MA model. If $m = 0$, this model is an AR model. $\alpha_i \in \mathbb{R}$ is termed the autoregressive coefficient and $\beta_j \in \mathbb{R}$ is the moving average coefficient. The series $n_t$ is the white noise sequence. The Akaike information criterion (AIC) [31] is used to determine the order of the ARMA$(n, m)$ model. The representation of AIC is given by the following formula:
$$\mathrm{AIC} = N \ln \sigma^2 + 2u$$
Here, $u = n + m + 1$ specifies the number of model parameters, $N$ is the sample size, and $\sigma^2$ is the error variance of the model. The ARMA$(n, m)$ model with the smallest AIC value is the most effective model for forecasting the time series.
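Order selection by AIC reduces to comparing one number per candidate model. The sketch below assumes the common form $\mathrm{AIC} = N \ln \sigma^2 + 2u$ with $u = n + m + 1$, and it takes the error variance of each fitted candidate as given; fitting the ARMA models themselves is outside the sketch. The function names and the variance values are hypothetical.

```python
import math


def aic(sample_size: int, error_variance: float, n: int, m: int) -> float:
    """AIC of an ARMA(n, m) model, assuming AIC = N * ln(sigma^2) + 2 * (n + m + 1)."""
    u = n + m + 1
    return sample_size * math.log(error_variance) + 2 * u


def best_order(sample_size: int, variances: dict) -> tuple:
    """Pick the (n, m) pair whose fitted model has the smallest AIC."""
    return min(variances, key=lambda order: aic(sample_size, variances[order], *order))


# Hypothetical error variances of already-fitted candidate models, N = 200.
candidates = {(1, 0): 1.30, (1, 1): 1.05, (2, 1): 1.045, (2, 2): 1.044}
for order, sigma2 in candidates.items():
    print(order, round(aic(200, sigma2, *order), 3))
print("selected order:", best_order(200, candidates))
```

The example shows the trade-off the criterion encodes: the higher-order candidates reduce the error variance slightly, but not enough to offset the penalty for their extra parameters.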
Identification describes the maintenance information of data quality. The maintenance personnel shall check and update the data quality requirements according to the maintenance and update frequency. When new equipment is deployed, they need to update the data accuracy and timeliness requirements promptly according to the information of the new equipment. Contact records the business experts consulted when setting up coal data quality and data governance methods. Reference mainly records the international standards referred to when specifying data quality standards and the instructions for equipment-related parameters.

Data Storage Specification Model
Te data storage specifcation model describes the storage requirements in colliery data management, as shown in Figure 6. Tis facilitates the use of technologies such as storage encryption and backup to protect hierarchically classifed data. It includes fve modules: data storage, data label, identifcation, contact, and reference.
(1) Data storage defines data storage information. It covers the data storage location, medium, and retention time, among other things. The data storage location and the data storage medium are two required parameters: the location stores the precise URL path of the data for improved traceability, and the medium records the storage media used. Capturing the location of data helps the system identify the precise flow of the various business data. For sensitive data, a suitable data storage medium needs to be selected. The specification also provides effective technology and management tools for data storage media to prevent data leakage due to improper use of media and to improve the security of data sharing. If the data must be retained for a certain period, the data retention period field needs to be set. This is an optional field, set according to the actual requirements. The data overdue processing field specifies how the data are handled once they are past due. Descriptions of the data source and destination show the flow of data. The data source description specifies the system that produces the data and identifies the department responsible for the data. The data destination description specifies the access rights of each platform to different business data. Each platform should apply to the data management department for data use according to its authority in order to obtain data use authorization, which improves data sharing security. Data governance labels and data labels differ in purpose: data governance labels highlight the governance mechanism, whereas data labels primarily record the business system to which the data belong.
(2) Data label includes a business label, an application label, and a timestamp. A typical mine industry business label can be divided into four layers: mine system, subsystem, equipment, and subequipment. Application labels include worker type labels, device labels, disaster labels, operation labels, region labels, and system labels. Each piece of data can correspond to only one business label but can correspond to multiple application labels, and it carries a timestamp. Coal industry practitioners and IT industry practitioners can use business labels and application labels to query data rapidly.
(3) Identification shows the coding format, data packet format, and data encryption method for colliery data storage. Appropriate data encryption methods protect the security of data sharing. To increase data security, data set constraints restrict the use of data sets and the personnel who may access them. They specify the scope of data sharing scenarios and the rights holders of data sharing.
(4) Contact records the person responsible for storing the data. When data are lost, the relevant person in charge can be found quickly to follow up on the situation.
(5) Reference includes the documents referenced in the process of formulating the colliery storage standards.
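The data storage and data label modules described above can be sketched as a small record type. This is a minimal illustration in Python, not the schema used by the system: the class and field names are hypothetical, the storage medium value is a placeholder, and the rule that a retention period should be accompanied by an overdue processing method is our reading of the text rather than a stated requirement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataLabel:
    # Business label: the four layers named in the text (exactly one per record).
    mine_system: str
    subsystem: str
    equipment: str
    subequipment: str
    timestamp: int
    # Zero or more application labels (worker type, device, disaster, ...).
    application_labels: List[str] = field(default_factory=list)


@dataclass
class StorageRecord:
    location: str                         # required: URL path of the stored data
    medium: str                           # required: storage medium
    label: DataLabel
    retention_days: Optional[int] = None  # optional retention period
    overdue_action: Optional[str] = None  # handling once data are past due

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record is acceptable."""
        problems = []
        if not self.location:
            problems.append("storage location is required")
        if not self.medium:
            problems.append("storage medium is required")
        # Assumption: a retention period only makes sense with an overdue action.
        if self.retention_days is not None and self.overdue_action is None:
            problems.append("overdue action is needed when a retention period is set")
        return problems


# Hypothetical example values for illustration only.
label = DataLabel(
    mine_system="main ventilation monitoring system",
    subsystem="mine ventilation room",
    equipment="fan",
    subequipment="motor",
    timestamp=182984608582,
    application_labels=["device label", "region label"],
)
record = StorageRecord(
    location="https://ip:9000/data/WJH-MVMS",
    medium="disk",
    label=label,
    retention_days=30,
)
print(record.validate())
```

Keeping the business label as fixed fields and the application labels as a list mirrors the one-to-one and one-to-many relationships described in item (2).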

Experiment
To demonstrate the validity and usefulness of the specification given in this work, we used data from the Wangjialing coal mine to construct a coal mine data collection and analysis system. The system requires three identical computers to form a cluster, and the experimental settings are presented in Table 1. We obtained data from the Wangjialing coal mine's IoT devices with the coal company's permission. The data access process in the system is designed based on the data source specification, as illustrated in Figure 7.
According to the data source specification model, the data access process must save data source information, data transmission, identification, contact, and reference information. This contributes to data traceability and ensures the security of data sharing. In Figure 7, more detailed information on the data source system can be viewed by clicking on the description. For example, clicking on the task whose id is 3 shows that the coal mine is the Wangjialing coal mine, the system is the main ventilation monitoring system, and the subsystem is the mine ventilation room. The data retention period at the data source is one week. The pretransmission timestamp is 182984608043, and the transfer completion timestamp is 182984608582. The data are transferred online, and the connection method is active access. The equipment contact information contains the equipment manufacturer, phone number, and so on. Because the data sources are wide-ranging, including databases, files, and sensors, data naming is not uniform. To unify the field names of coal mine data and record the data source system according to the data source specification, the data access process needs to implement a data mapping function. The data mapping function of the system is shown in Figure 8. After the data source file has been analyzed, the user selects the data source file and the corresponding coal mine, system, subsystem, equipment, and field so that they correspond one to one. Once these steps are complete, a mapping relationship is generated between the source data and the target data, that is, a new data access task is created. The target data thereby follow a uniform naming standard. The mapped data are encrypted with the Data Encryption Standard (DES) algorithm and then securely transferred to the storage platform.
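The one-to-one mapping between source fields and unified target fields can be illustrated with a short Python sketch. The function names, the source field names, the equipment name, and the dotted naming scheme are all hypothetical; the paper does not specify how the system stores its mapping, and the encryption step is omitted here.

```python
from typing import Dict, Tuple

# Target of a mapping: (coal mine, system, subsystem, equipment, unified field name).
Target = Tuple[str, str, str, str, str]


def build_mapping(pairs: Dict[str, Target]) -> Dict[str, Target]:
    """Check that no two source fields share a target, then return the mapping."""
    targets = list(pairs.values())
    if len(set(targets)) != len(targets):
        raise ValueError("two source fields map to the same target field")
    return dict(pairs)


def apply_mapping(row: Dict[str, float], mapping: Dict[str, Target]) -> Dict[str, float]:
    """Rename the source fields of one row to unified dotted names."""
    unified = {}
    for source_name, value in row.items():
        if source_name not in mapping:
            raise KeyError(f"no mapping for source field {source_name!r}")
        unified[".".join(mapping[source_name])] = value
    return unified


# Hypothetical source field names "T1" and "T2" from a vendor-specific file.
mapping = build_mapping({
    "T1": ("Wangjialing", "main ventilation monitoring system",
           "mine ventilation room", "pressure fan", "exhaust_temperature"),
    "T2": ("Wangjialing", "main ventilation monitoring system",
           "mine ventilation room", "pressure fan", "shaft_temperature"),
})
unified_row = apply_mapping({"T1": 41.5, "T2": 60.2}, mapping)
print(unified_row)
```

Rejecting duplicate targets at build time is one way to enforce the one-to-one correspondence before any access task is created.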
In terms of data quality, data governance is performed as the data are being transferred. Data quality criteria for each type of data are defined based on expert knowledge and the device specifications. The data quality criteria for the 10 kV incoming cabinet, which take all factor types into account, are shown in Figure 9. The figure is a screenshot of the data quality standards in the coal mine data collection and analysis system and shows the data quality specification model for the 10 kV incoming cabinet, covering data quality, identification, and reference information. Completeness indicates whether the data of a measurement point of this device are present. Accuracy includes the data type and data range. For the 10 kV incoming cabinet, every measurement point has only an upper bound. The threshold range and reasonable range of the data determine how the data are processed, specifying whether the data should be governed, ignored, or raise an alarm. The version shown in the figure maintains the reference information of the data quality specification model. Each modification by the user updates the version number of the data quality standard and records the updated range description, the modifier, and the date of modification. Clicking on data governance shows the corresponding governance methods and the reference information cited when the standard was set. The data governance process is the next phase and is determined by the data quality requirements. We have created corresponding standards based on the features of the different parameters. To verify the reliability of the data governance methods mentioned above, we use the data of the 10 kV incoming cabinet and the motor as examples. The size of the dataset is 1000. For generic data (e.g., shaft temperature of the motor), we use the aforementioned density-based outlier identification approach to find outliers.
The results are shown in Figure 10. The data points in the red circles in the figure are outliers that need to be handled. These data are then adjusted using the mean, the median, or other methods. For common time-series data such as current and voltage, the autoregressive moving average (ARMA) time-series model is used. Figure 11 illustrates the results of residual analysis on line voltage data. The standardized residuals show no systematic variation over time. The autocorrelation function (ACF) of the residuals suggests no autocorrelation. The Q-Q plot, a normal probability plot, indicates that the residuals conform to a normal distribution. This analysis indicates that the ARMA model is suitable for the voltage data.
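The outlier handling for generic data can be illustrated with a simple density-style score. The sketch below is a stand-in and not the density-based algorithm used in the experiment, whose details are not given in this section: it scores each point by the mean distance to its k nearest neighbours, flags points whose score is far above the typical score, and replaces them with the median of the remaining points. The parameters `k` and `factor` and the sample temperatures are assumptions for illustration.

```python
from statistics import median
from typing import List


def knn_distance(values: List[float], index: int, k: int) -> float:
    """Mean distance from values[index] to its k nearest neighbours.

    A large value means the point lies in a sparse (low-density) region.
    """
    distances = sorted(abs(values[index] - v) for j, v in enumerate(values) if j != index)
    return sum(distances[:k]) / k


def flag_outliers(values: List[float], k: int = 3, factor: float = 3.0) -> List[bool]:
    """Flag points whose k-NN distance exceeds `factor` times the median k-NN distance."""
    scores = [knn_distance(values, i, k) for i in range(len(values))]
    typical = median(scores)
    return [score > factor * typical for score in scores]


def repair(values: List[float], flags: List[bool]) -> List[float]:
    """Replace flagged points with the median of the unflagged points."""
    fill = median(v for v, flagged in zip(values, flags) if not flagged)
    return [fill if flagged else v for v, flagged in zip(values, flags)]


# Hypothetical shaft temperature readings with one implausible spike.
temps = [60.1, 60.4, 59.8, 60.0, 95.0, 60.3, 59.9, 60.2]
flags = flag_outliers(temps)
repaired = repair(temps, flags)
print(flags)
print(repaired)
```

With these sample values only the 95.0 reading is flagged, and it is replaced by the median of the other seven readings. The pairwise distance computation is quadratic in the number of points, which is acceptable for a dataset of the size reported here but would need an index structure for larger volumes.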
Figure 12 illustrates some data governance results for the exhaust temperature of the pressure fan. The results show that the specification can ensure data quality in the big data system. The governed data and associated information are kept. The database records the data storage location, retention period, overdue processing method, data destination description, identification information, contact information, and reference information. For example, the data storage location is https://ip:9000/data/WJH-MVMS. The data retention period is one month. The data destination description is an algorithm platform. During data sharing, only the algorithm platform and its users are authorized to access and use the data of this subsystem. The system also tags the data with a data governance label, a data label, and an application label, recording the governance technique, business system, and data category. In addition, the system includes a security access control function. This module authenticates the user's operation rights, and only users with login rights can access and operate on the data, which achieves secure data sharing.

Figure 10: For the measurement point of the electrical machinery shaft temperature, a density-based outlier identification approach was used to identify the outliers, which are marked with red circles in the scatter diagram of the shaft temperature.
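The destination-based authorization described in the experiment, where only the algorithm platform and its users may access the data of a subsystem, can be sketched as a lookup. This is a simplified illustration and not the system's access control module: the registry contents, the user names, and the function name are hypothetical, and a real deployment would also authenticate the user before this check.

```python
from typing import Dict, Set

# Hypothetical registry: subsystem code -> platforms named in its data destination description.
DESTINATIONS: Dict[str, Set[str]] = {"WJH-MVMS": {"algorithm platform"}}

# Hypothetical user directory: user with login rights -> platform the user belongs to.
USER_PLATFORM: Dict[str, str] = {
    "alice": "algorithm platform",
    "bob": "reporting platform",
}


def can_access(user: str, subsystem: str) -> bool:
    """Allow access only to known users whose platform is an authorized destination."""
    platform = USER_PLATFORM.get(user)
    if platform is None:  # unknown user: no login rights
        return False
    return platform in DESTINATIONS.get(subsystem, set())


print(can_access("alice", "WJH-MVMS"))
print(can_access("bob", "WJH-MVMS"))
```

Defaulting to an empty set for unknown subsystems means that data without a recorded destination description are not shared with anyone.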

Conclusions
The area of smart coal mining is growing quickly, and the enormous growth in data size creates a great demand for data management and data sharing. The lack of data standards causes inefficient use of computer and human resources and high costs. To address these challenges, we have carried out the following work. First, we offer a set of data specifications for data collection, transmission, and storage for big data practices in the coal mining sector. To improve the generality of data and the security of data sharing, our specification provides a complete data model that fills the gaps in data collection, transmission, governance, sharing, and storage according to the characteristics of the mining sector. Data are divided into business and application categories, and data tags identify the category to which the data belong, clarifying the scope of data sharing. Practitioners in both the coal mining industry and the IT industry can easily find the data they need and thus benefit from the specification. The standard sets up business-level classifications that allow employees to quickly trace the source of anomalous data. Special governance rules for sensitive data ensure the security of sensitive data in the sharing process. Appropriate data encryption algorithms and transmission methods are selected according to the data transmission needs of different platforms to ensure the security of data sharing. Second, for the specification, we designed a short XML example for the data source model. All data criteria can be set and imported into the system based on this example. Third, we constructed a coal mine data collection and analysis system based on the standards of the data specification. The access, mapping, governance, sharing, and storage processes of data processing were implemented in the system.
Experimental results show that the system verified the validity, usefulness, and security of the specification. In the future, we will try to combine microservices, privacy computing, and other technologies based on this specification to design multi-source data security sharing solutions that can meet cross-industry requirements.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.