Feature Extraction of National Physical Fitness Data Based on Data Mining

To better understand and improve national physical fitness, this paper proposes a data-mining-based method for extracting the change features of national physical fitness data. Decision tree and association rule algorithms are applied to the national physical fitness data collected in recent years, a database is constructed for effective data management, and data mining is used to build an evaluation index for physical fitness change features. Experiments confirm that the proposed method is effective in practical application: it helps reveal the trend of national physique change and supports targeted suggestions for optimizing national physical health.


Introduction
National physique is crucial to the future of the country, social progress, and personal development. If the promulgation and implementation of the outline of the national fitness plan are regarded as the start of a new era of mass sports in China, then the "several opinions on accelerating the development of sports industry and promoting sports consumption" (hereinafter referred to as "opinions") issued by the State Council are more like a milestone event. The "opinions" not only mark that mass sports have entered a new development period, but also raise national fitness to a national strategy for the first time and take strengthening people's physique and improving health levels as the fundamental goal of sports development [1]. As the country pays more and more attention to national fitness work, national fitness plans and policies are constantly introduced and improved. At the same time, the rapid development of mass sports in China has gradually begun [2]. Data preprocessing technology is an important means of improving data quality. Combining data preprocessing technology with national physique monitoring, and selecting the preprocessing method according to the characteristics of national physique monitoring and the objectives of each stage, not only ensures the authenticity and reliability of the data in the national physique monitoring database [3], but also objectively reflects the quality level of that database. In addition, comparing the national physique monitoring data of provinces (regions) and cities before and after data preprocessing can serve as a standard for judging the quality of national physique monitoring in those provinces (regions) and cities [4]. The application of this technology improves and ensures the data quality of national physique monitoring and puts forward new requirements and directions for the development of national physique monitoring in the next stage.

Feature Extraction of National Physical Fitness Data
National physique monitoring analyzes the country's physique state and provides data to aid in the creation of government policies. Its data quality must be demonstrated and approved by specialists at all levels, which means that a number of data preparation tasks must be completed before the national physique monitoring database is created. According to an analysis of how the database has been constructed over time, data preparation for the national physique monitoring database generally goes through several steps: determination of warehousing samples, entry of relevant data, data integration, data cleaning, and data unification [5]. These steps clean and convert the coding structure and variable attributes of each part in accordance with the theme of the data warehouse, in order to meet the overall design of the database and to serve users' needs effectively and quickly [6]. As indicated in Figure 1, the major goal of data integration and data unification is to combine datasets from several sources into a single database.
Data integration is the process of logically or physically gathering database tables with different sources, formats, and characteristics and establishing a data warehouse. There are many database tables after data cleaning [7]. In order to reduce the number of database tables and improve the convenience of data analysis and the consistency of analysis results, multiple database tables must be integrated. In particular, the existing data of the questionnaire database and the measurement table are merged into the same database, so that the two kinds of data can be checked together [8].
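The merge of questionnaire and measurement data can be illustrated with a minimal sketch. It assumes that both tables share a unique user key; the key name `user_id` and the field names in the usage note are hypothetical and do not come from the paper.

```python
def integrate(questionnaire_rows, measurement_rows, key="user_id"):
    """Merge rows from two tables into one record per key value.

    Assumption: both tables identify a person by the same key column.
    A person present in only one table is kept with the fields available.
    """
    merged = {}
    for row in list(questionnaire_rows) + list(measurement_rows):
        # Create the record on first sight, then add this row's fields to it.
        merged.setdefault(row[key], {}).update(row)
    # Return the records in a stable order so that results are reproducible.
    return [merged[k] for k in sorted(merged)]
```

For instance, a questionnaire row `{"user_id": 1, "occupation": "teacher"}` and a measurement row `{"user_id": 1, "height_cm": 170}` would yield one integrated record holding both fields. A production system would more likely perform this step as a join inside the database.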
The national physique data collection and management process is shown in Figure 2.
Data entry includes completing questionnaires, digital entry, and other input activities within the allotted time and in accordance with the established standards and quantities. Currently, national physical fitness monitoring data are mainly entered manually by professional data entry staff. As illustrated in Figure 3, double-blind data entry is used from the start of data entry through the creation of the database to guarantee consistency between the entered data and the questionnaire data [9]. After the data are entered, the national physical fitness monitoring center randomly picks 1% of the records to compare and recheck against the paper questionnaires [10]. Following the recheck, the data entry company sends a copy of the original input data sheet, which serves as the database's original data.
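The two checks described here, comparing two independent entry passes and drawing a 1% sample for recheck against paper, can be sketched as follows. This is an illustration under assumptions: the record layout (a dict of fields per record ID) and both function names are hypothetical, and the paper does not describe how its entry software performs these steps.

```python
import math
import random


def double_entry_mismatches(first_pass, second_pass):
    """Return {record_id: [field, ...]} for fields where two entry passes disagree."""
    mismatches = {}
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        # A field counts as a mismatch if it is missing or different in one pass.
        fields = sorted(f for f in set(a) | set(b) if a.get(f) != b.get(f))
        if fields:
            mismatches[record_id] = fields
    return mismatches


def recheck_sample(record_ids, fraction=0.01, seed=None):
    """Pick a random fraction of records (at least one) for manual recheck."""
    ids = sorted(record_ids)
    if not ids:
        return []
    size = max(1, math.ceil(len(ids) * fraction))
    return random.Random(seed).sample(ids, size)
```

Records flagged by `double_entry_mismatches` would go back to the entry staff, while the IDs returned by `recheck_sample` would be compared with the paper questionnaires.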
After the company's original data sheet has been obtained and entered, data screening begins. Data screening is an important means of ensuring data integrity, accuracy, credibility, and interpretability, and it is also an important means of evaluating data quality. First, data cleaning is carried out on the original input data, mainly to detect noisy data, identify outliers, and correct inconsistencies. Noisy data and outliers are first flagged by professional software and then confirmed through manual screening [11]. The original data table is imported into the "national physique monitoring data screening software" for deviation detection on the monitoring indexes and for logical judgment. Data that pass the deviation inspection and conform to the logical judgment are included directly in the final logical screening database for data synchronization. In the application management of national physical fitness monitoring, data synchronization software is used to keep the provincial data center and each ground station database consistent [12]. The data synchronization application is installed on each ground station server and uses the Internet to link the provincial data center and the ground station database. It monitors data changes on both sides on a regular basis and synchronizes modified data to the other party's database in a timely manner to maintain data integrity and consistency. The data synchronization service is Web-based and runs its tasks periodically as a Windows service [13]. To determine whether a record has been synchronized, three fields (modification time, synchronization time, and data source) are added to each table to be synchronized, and data management is implemented across the central database and each ground station database. Each record of each table is given a unique ID to ensure that synchronization works normally.
After the synchronization service is started, it first checks the network condition. When the network is available, it finds the records in the ground station database whose modification time is later than the synchronization time and synchronizes the user data, physical examination data, and prescription data to the provincial data center in turn [14]. After the synchronization succeeds, the synchronization time is set to the current time. Then, it finds the records in the provincial data center whose modification time is later than the synchronization time and synchronizes them to the ground station database. After successful synchronization, the synchronization time is again set to the current time [15]. When the network is blocked or synchronization fails, the service continues to check the network condition and retries the synchronization process after an interval of two minutes. The flow chart of the synchronization program is shown in Figure 4.
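The two-way flow above can be sketched with in-memory dictionaries standing in for the two databases. This is an illustration only, not the service described in the paper: the table names, the field names `modified_at` and `synced_at`, and the function names are all assumptions.

```python
from datetime import timedelta

RETRY_INTERVAL = timedelta(minutes=2)  # retry delay stated in the text
# Hypothetical table names, processed in the order given in the text.
TABLES = ("user", "physical_exam", "prescription")


def push_changes(source, target, now):
    """Copy records modified after their last synchronization to the target."""
    pushed = 0
    for table in TABLES:
        for record_id, record in source.get(table, {}).items():
            if record["modified_at"] > record["synced_at"]:
                # The copy is marked as synced so it is not pushed straight back.
                target.setdefault(table, {})[record_id] = dict(record, synced_at=now)
                record["synced_at"] = now
                pushed += 1
    return pushed


def sync_once(station_db, center_db, network_ok, now):
    """One pass: ground station to center, then center to ground station.

    Returns None when the network is down; the caller is expected to wait
    RETRY_INTERVAL and call again.
    """
    if not network_ok():
        return None
    uploaded = push_changes(station_db, center_db, now)
    downloaded = push_changes(center_db, station_db, now)
    return uploaded, downloaded
```

Because each record keeps its unique ID on both sides, a pushed record overwrites the stale copy in the target. Conflict handling, where the same record is edited on both sides between passes, is deliberately left out of this sketch.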
To ensure normal data synchronization, each table and record in the central database and each ground station database must be uniquely identified. The implementation must identify records belonging to the same user as well as fraudulent use of a user name; that is, it must resolve conflicts and handle the merging and auditing of user accounts [16]. The data synchronization mechanism ensures the eventual consistency of data, lets the provincial data center and each ground station database back each other up, and thereby provides an overall remote backup function.

Evaluation Algorithm of National Physique Data Characteristics
Physique data analysis is the core function of the model developed in this paper. This section is divided into two parts: the first covers general statistics over the large set of ordinary users' physique data collected by the physique identification and health management program, and the second covers association rule mining of the data set to identify, for people of different ages, genders, occupations, and regions, the distribution of physique types and the patterns in their health conditioning needs [17]. Python and the Flask framework are used to construct the display layer and business logic layer of the whole data analysis function module. The flow chart of data analysis is shown in Figure 5.
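As an illustration of association rule mining on such records, the sketch below computes support and confidence for rules with one item on each side. It is a simplified stand-in, not the algorithm used in the paper: the function name, the thresholds, and the item labels in the usage note are assumptions, and full miners such as Apriori also handle larger itemsets.

```python
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return sorted (lhs, rhs, support, confidence) rules between single items."""
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # share of records containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

Each record would be encoded as a set of labels, for example `{"age:20-24", "bmi:normal"}`, so that a rule such as `bmi:normal -> age:20-24` reads as "records with a normal BMI mostly fall in the 20-24 age group".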
The national physical fitness test database contains a large amount of data, which requires data cleaning to remove noise and correct inconsistencies. The main preprocessing steps are data cleaning and digitization [18]. After years of research, China has implemented and revised the national physical fitness measurement standards many times. The data are processed according to these standards to meet data mining needs. First, the data are cleaned: missing values are eliminated, outliers are identified and removed, and errors in the data are checked and corrected. This paper uses the Data Mining Add-ins for Office 2017 to browse the data and remove outliers in Excel 2017, and preliminarily processes the irregular original data through accumulation generation, subtraction generation, or weighted accumulation generation, so as to turn them into a more regular generated sequence.
This paper uses cumulative generation to preprocess the original data. Cumulative generation takes the data of the series as the original data sequence (1). In order to analyze and compare the correlation information as a whole and to weaken the dispersion of the correlation coefficients ε_i, the information must be processed centrally, for example by averaging; that is, the average of the correlation coefficients between each comparison series and the reference series over the n time points is calculated to reflect the degree of correlation between that comparison series and the reference series.
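A minimal sketch of these operations follows. It assumes that cumulative generation means the usual first-order running sum, that subtraction generation is its inverse, and that the correlation degree is the plain mean of the pointwise coefficients. The function names are hypothetical, and the paper's own equation (1) is not reproduced here.

```python
from itertools import accumulate


def cumulative_generation(x0):
    """Running sums: x1[k] = x0[0] + ... + x0[k]."""
    return list(accumulate(x0))


def subtraction_generation(x1):
    """Inverse of cumulative generation: recover x0 from its running sums."""
    if not x1:
        return []
    return [x1[0]] + [later - earlier for earlier, later in zip(x1, x1[1:])]


def correlation_degree(coefficients):
    """Average the pointwise correlation coefficients of one comparison series."""
    return sum(coefficients) / len(coefficients)
```

For a non-negative series, accumulation produces a monotonically non-decreasing sequence, which is why it makes irregular raw data look more regular.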
Based on the above algorithm, the variables of physical fitness are mainly set by three-level indicators. The weight coefficients of physical fitness indicators of two and gender groups in the same year are shown in Table 1.
The weight coefficients of the secondary physical fitness indicators for people of different ages and genders are shown in Table 2. The variable equation is now built on the body shape "sub", with the three-level indices set as the variables. Because the indicators have inconsistent dimensions and inconsistent directions of change, the original test value of each indicator must first be converted; that is, a score is assigned according to the national physical fitness measurement standard, and the original variable is replaced by a score variable, written as "s + three-level index name"; for example, "s body fat rate" represents the score of the body fat rate [19]. For each three-level indicator, the conditional parameters of "s + three-level indicator name" are set per age group, which can be expressed with "if then" statements in STELLA software. Taking the physical condition of young people aged 20-24 as the evaluation standard, the scores of the physical indicators are assigned as follows:
\[
\mathrm{SBMI} =
\begin{cases}
3, & \mathrm{BMI} < 18.5,\\
4, & \mathrm{BMI} \in (18.5, 23.9),\\
2, & \mathrm{BMI} \in (24, 26.9),\\
1, & \mathrm{BMI} \ge 27.
\end{cases}
\]
When BMI is less than 18.5, 3 points are obtained; when BMI is between 18.5 and 23.9, 4 points are obtained; when BMI is between 24 and 26.9, 2 points are obtained [20]; when BMI is greater than or equal to 27, 1 point is obtained. Flexibility is expressed by flexibility, and the score function SFlexibility is as follows: when flexibility is less than 411, 1 point is obtained; when flexibility is between 411 and 693, 2 points are obtained; when flexibility is between 41.5 and 69.3, 3 points are obtained; when flexibility is between 13.14 and 15.96, 4 points are obtained [21]; when flexibility is greater than or equal to 1596, 5 points are obtained. Torso strength is expressed by sit-ups, and the score function is as follows: when sit-up is less than 17, 1 point is obtained; when sit-up is between 18 and 24, 2 points are obtained; when sit-up is between 25 and 38, 3 points are obtained; when sit-up is between 39 and 45, 4 points are obtained; when sit-up is greater than or equal to 45, 5 points are obtained. The body fat rate is expressed as fat%, and the score function SFAT% is as follows: when fat% is less than 12, 3 points are obtained; when fat% is between 12.1 and 18, 4 points are obtained; when fat% is between 18.1 and 24, 2 points are obtained [22,23]; when fat% is greater than or equal to 24, 1 point is obtained.
The systolic blood pressure is expressed by BPH, and its score function is SBP-H. For SBMD, when BMD is less than 0.37, 1 point is obtained; when BMD is between 0.38 and 0.52, 2 points are obtained; when BMD is between 0.53 and 0.67, 3 points are obtained; when BMD is greater than or equal to 0.67, 4 points are obtained.
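The "if then" score bands for BMI, sit-ups, body fat rate, and BMD can be written as a small lookup, sketched here in Python rather than STELLA. Two assumptions are made and should be checked against the national standard: values that fall in the small gaps between the printed bands (for example a BMI of 23.95) are assigned to the lower band, and where two bands share an endpoint (45 sit-ups) the higher band wins. The function names are hypothetical.

```python
from bisect import bisect_right


def score_by_bands(value, lower_bounds, scores):
    """scores[0] applies below lower_bounds[0]; scores[i + 1] applies from
    lower_bounds[i] (inclusive) up to the next bound."""
    return scores[bisect_right(lower_bounds, value)]


def score_bmi(bmi):
    return score_by_bands(bmi, [18.5, 24, 27], [3, 4, 2, 1])


def score_situps(count):
    return score_by_bands(count, [18, 25, 39, 45], [1, 2, 3, 4, 5])


def score_fat_percent(fat):
    return score_by_bands(fat, [12, 18.1, 24], [3, 4, 2, 1])


def score_bmd(bmd):
    return score_by_bands(bmd, [0.38, 0.53, 0.67], [1, 2, 3, 4])
```

Keeping the thresholds in lists rather than in nested conditionals makes it straightforward to supply a different set of bounds for each age group. The flexibility bands are omitted because their printed thresholds are not consistent enough to encode.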

Implementation of Feature Extraction of National Physical Fitness Data
National fitness monitoring application management adopts a mature monitoring scheme based on Zabbix 4 software [24]. Zabbix is open source monitoring software with rich community support and a rich set of plug-ins. Compared with other monitoring tools such as Nagios and Cacti, Zabbix offers more extensive development documentation and better Chinese support. After a thorough comparison, this paper uses Zabbix to build a monitoring system that meets the requirements of server monitoring and application failure alerting. Monitoring falls into two types: server host monitoring and application failure monitoring. Host monitoring tracks the characteristics of the host's hardware and services, whereas application fault monitoring tracks potential problems in the programs. The monitoring section is made up of four modules: server failure monitoring, application failure monitoring, stability guarantee monitoring, and alarm monitoring. These four modules address a wide range of monitoring needs in national fitness monitoring application management, and high availability is achieved through the all-round monitoring of these four components. As illustrated in Figure 6, the physical fitness identification and data analysis model is built based on the monitoring data.
The physique identification and data analysis model mainly involves three functions: physique identification, physique data analysis, and visual display of analysis results. In addition, it needs some basic management functions. The overall functional structure of the model is shown in Figure 7.
There are many user roles involved in the whole model. To facilitate function design and authority management for users with different roles, the users in the model are divided into four categories: (1) ordinary users, that is, users who use the model for physique identification and self-health management; such users use the physique identification function; (2) data analysis users, that is, users who use the data analysis function in the model to analyze the physique identification results; such users use the physique data analysis and visual result display functions; (3) ordinary administrators, that is, administrators who publish and maintain the contents of the scale and the materials related to the 9 kinds of physique; such users use the management function; (4) administrators, that is, administrators who manage the basic information, roles, and permissions of the other types of accounts; these users also use the management function.
The database table structures should satisfy the third normal form as far as possible, while retaining appropriate data redundancy in exchange for better time efficiency. Table 3 and Table 4 show the primary table structures of the database. These data tables are implemented using MySQL as the database management system.
The comprehensive report service is a physical health description document provided to users by national physical fitness monitoring application management. The user's physical examination data are uploaded through the ground station and synchronized to the provincial data center through the data synchronization service. The user's test data are then compared with official health standard data, and the diagnosis, risk status, and proposed management goals for the user's health status are provided. Users may log in to their own page to view their comprehensive analysis report and individual analysis reports and to download and save the reports in PDF format.
The comprehensive analysis report contains the user's comprehensive analysis, risk stratification, proposed management objectives, and a detailed description of each diagnosis result, covering lung function, cardiovascular function, bone mineral density, body composition, balance function, constitution rating, cardiopulmonary endurance, rowing power, and other items. There are seven single analysis reports: the cardiovascular function test report, the body composition analysis and improvement recommendations report, the bone mineral density test report, the lung function test report, the balancing machine test report, the treadmill test report, and the rowing machine test report. The comprehensive analysis report includes the health signs report and the exercise ability report. The health signs report consists of the main body of the comprehensive report (cardiovascular module, body composition module, bone mineral density module, lung function module, balance ability module, and nutrition and metabolism module) and the corresponding subreports; the exercise ability report consists of the main body of the comprehensive report (cardiopulmonary endurance module and rowing power module) and the corresponding subreports. The function structure of comprehensive report generation is shown in Figure 8. The flow chart of comprehensive report generation for the user's data characteristics is shown in the figure. When generation of the comprehensive report starts, the FTP information, database connection information, and management target service information are first read from the configuration file in order to connect to the database; then, the service checks whether the comprehensive report has already been generated. If it has, the report is downloaded directly from FTP and displayed. Otherwise, report generation continues. In the process of generating the report, the data of the first page and of each subreport must be obtained.
The data of the first page include the user's basic information, diagnosis results, cardiovascular exercise risk stratification, and suggested management objectives. The data for each subreport are filled in selectively based on the parameter type; the information acquired is the key indicator data of each module over the last five years. The comprehensive report entity is filled in with the relevant data to produce the comprehensive report body; the corresponding itemized PDF reports are obtained from FTP according to the path given in the configuration file for each itemized report; the generated comprehensive report body and the obtained itemized PDF reports are then combined into a single PDF file of the comprehensive analysis report, and the combined PDF file is uploaded to the FTP path given in the configuration file. The generated comprehensive report file can then be viewed on request.
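The control flow of report generation (reuse an existing report, otherwise build, combine, and upload) can be sketched as below. Everything concrete here is an assumption made for illustration: a plain dictionary stands in for the FTP server, a dictionary stands in for the combined PDF, and the path layout and the callables `load_first_page` and `load_subreport` are hypothetical.

```python
def get_comprehensive_report(user_id, store, load_first_page, load_subreport, modules):
    """Return the user's comprehensive report, generating it only if missing."""
    report_path = f"comprehensive/{user_id}.pdf"  # hypothetical path layout
    if report_path in store:
        # Already generated: fetch it from storage and display it as it is.
        return store[report_path]

    # Build the report body from the first page and one subreport per module.
    body = {
        "first_page": load_first_page(user_id),
        "modules": {name: load_subreport(user_id, name) for name in modules},
    }
    # Collect the itemized reports that already exist in storage.
    itemized_paths = [f"itemized/{user_id}/{name}.pdf" for name in modules]
    itemized = [store[path] for path in itemized_paths if path in store]

    # Combine body and itemized reports, then upload the result for later reuse.
    report = {"body": body, "itemized": itemized}
    store[report_path] = report
    return report
```

Checking for an existing file first means the comparatively expensive database queries and PDF merging run once per report instead of on every page view.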

Analysis of Experimental Results
The 11-item data acquisition program of national physique is used to collect 11 basic data items from test users: grip strength, vital capacity, height, weight, sitting forward flexion, reaction time, vertical jump, push-ups (male), one-leg standing with eyes closed, step test, and sit-ups (female). Collecting a user's 11 national data items through the acquisition program is divided into three steps: card opening, acquisition, and card reading. The IC card model used is the AT24C16, which has an internal capacity of 16 Kbit, that is, 2048 bytes, and can store the personal information (Table 5). If P is the national physical fitness test result, t is the test time, X is the number of nationals, and N is the number of experiments, then the national physical quality can be judged according to the calculation formula. The statistical results of the physical fitness model, together with the logarithmic curve and the real curve of the general physical fitness model, are generated and displayed with a Python package. The package provides rich interactive statistical charts such as relationship charts, and also a download function for the generated charts, which can be saved to the computer as pictures. The general statistical results of the change data are shown in Figure 9.
Further, the traditional method and the feature extraction method proposed in this paper are compared. The above figure serves as the reference sample for judging the accuracy of the extraction results of the two methods. The results are shown in Table 6.
It can be seen from the table that the feature extraction effect of this method is better than the traditional feature extraction method in performance, and the extraction accuracy is higher. Further, compare the traditional feature mining method with the method in this paper in terms of time consumption, as shown in Figure 10.
It can be seen from the figure that the traditional feature mining method consumes significantly more time than the proposed method. It is concluded that the proposed method is more accurate, less time-consuming, and faster at mining than the traditional method, and is therefore feasible.

Conclusion
Data mining is applicable to the study of national physique. Through the data mining of physical fitness test data, we find some rules that are confirmatory and contain new knowledge, which shows that the data mining tool is suitable for physical fitness data analysis and serves the field of physical health. According to the mining results, different physical exercise methods can be adopted for different populations.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.