Research on Sports and Health Intelligent Diagnosis Based on Cluster Analysis

To explore the relationship between sports and health and to make sports practice more scientific, this paper takes a cluster analysis algorithm as its basis, adopts an entropy estimation method for small sample sets to estimate information entropy, and improves mutual information estimation by proposing a mutual information estimation method based on entropy estimation. Moreover, this paper combines a clustering algorithm with the requirements of intelligent sports and health diagnosis to construct a system structure. The system recommends suitable sports suggestions according to the user's physical condition, makes sports plans according to the user's health, and can also analyze the user's sports process. In addition, on the basis of demand analysis, this paper designs experiments to test the performance of the constructed system. The experimental statistics show that the system can basically meet the actual needs of intelligent sports and health diagnosis. At the same time, the results indicate that there is a strong correlation between sports and health.


Introduction
In modern cities, people do less and less physical labor while mental work and work pressure keep increasing, so physical function gradually declines under the current way of life. Therefore, the defensive ability of the whole body, including the muscles, bones, internal organs, and circulatory system, should be improved through sports. It is thus necessary to study the benefits of aerobic metabolism sports for the health of menopausal women.
Exercise therapy is the application of sports in medicine. It is a treatment method based on kinematics, biomechanics, and neurodevelopment, and its main goal is to improve the physical, physiological, psychological, and spiritual dysfunction, and its main factor is force and reaction. Exercise therapy includes both active physical activity training and passive physical activity training, and its functions include improving blood circulation, metabolism, and nerve control of sports tissues (muscles, bones, joints, and ligaments), promoting neuromuscular function, improving muscle strength, endurance, cardiopulmonary function, and balance function, and alleviating abnormalities [1].
Moderate and active sports can make people feel comfortable, relaxed, and happy, thus weakening the pressure brought by fast-paced modern life. After sports, people can also relax because muscle contraction has ended or because of hormone secretion. Although sports cannot completely eliminate the source of pressure, and people still face pressure and tension after sports, it can temporarily reduce pressure and remove substances that are harmful to the human body. When people face pressure again in a comfortable and happy mood, they meet the challenge with a positive and confident attitude [2]. The idea that exercise is medicine (EIM) has existed for a long time. In 2007, the American College of Sports Medicine formally proposed EIM as a health promotion action to solve global public health problems, and it has been widely taken up and promoted in many countries and regions.
The aim of EIM is that doctors, fitness professionals, and other health care providers apply scientific evidence and policies on the relationship between sports and health in practice, and apply evidence from sports science research to public health practice in the form of laws or non-legal policies, so as to promote the globalization of EIM. EIM suggests that the level of physical activity should be treated as a basic vital sign and included in the content of doctor consultations. Clinicians and health managers should actively participate in preventive health services to raise the level of physical activity and advocate a positive and healthy lifestyle. EIM is an evidence-based global public health practice.
Based on the above analysis and a cluster analysis algorithm, this paper constructs an intelligent diagnosis system for sports and health, studies the relationship between sports and health, and verifies the performance of the system.

Related Work
By studying the influence of Taijiquan Exercise on human physical health, literature [3] found that different frequencies of Taijiquan sports have a positive impact on the human respiratory system and cardiovascular system and can also improve the flexibility of the human body.
Through a study of the influence of sports on human physical health, literature [4] found that long-term sports has a good effect on body shape, especially on the weight index, and can contribute to weight loss. The research of literature [5] shows that the effect of sports on vital capacity differs between boys and girls: it significantly improves the vital capacity of boys but only maintains the original vital capacity of girls. The research of literature [6] shows that sports significantly improves the upper limb strength of boys, whereas its effect on the upper limb strength of girls is present but not obvious, and that sports can improve speed and endurance. In a study of the effect of aerobic sports on human cardiopulmonary function, literature [7] found that aerobic sports has a beneficial effect on cardiopulmonary function: it improves myocardial function, smooths blood circulation, increases vital capacity, improves the aerobic metabolic capacity of the body, improves the contractility of the heart and the relaxation tension of the blood vessels, enhances vascular elasticity, promotes blood lipid metabolism, and effectively prevents arteriosclerosis, thereby reducing resting blood pressure and improving resting cardiovascular function. By studying the influence of different elective courses on human physical health, literature [8] found that different elective courses influence physical health to different degrees.
There are internal and external factors, as well as subjective and objective factors, that affect human physical health. The internal and external factors are mainly congenital inheritance and acquired conditions, and the subjective factors mainly include a person's own personality characteristics and cognition of physical health [9]. By studying the relationship between subjective factors and human physical health, literature [10] concluded that the main subjective factor is the cognition of physical health, and that the degree of this cognition is an important factor affecting physical health. In addition, people's own personality characteristics also have a great impact on physical health. The physical condition of extroverted people is much better than that of introverted people. Literature [11] held that the physical health status of people with a lively, extroverted personality and high enthusiasm is significantly better than that of people with an introverted personality and low enthusiasm. Through an investigation of people's sports lifestyle, literature [12] found that sports lifestyle and habits are important causes of differences in physical health and concluded that a good sports lifestyle is an important factor affecting people's physical condition. Literature [13] analyzed the relationship among obesity rate, family income, and physical health status through research on the influence of obesity rate on physical health status in colleges and universities. The results show that family economic status has a significant and roughly proportional impact on male obesity, while female obesity is less affected by family income, with an effect that is neither significant nor linear. The analysis of literature [14] shows that many factors affect the physical fitness of the masses in China. Among them, the main social reason is online games, which weaken people's awareness of sports. In addition, schools neglecting the importance of fitness is another important factor: the campus cannot form a good fitness atmosphere, resulting in poor physical health of students.
With the development of social science and technology, Internet technology has been widely used in society, which provides a new platform for sports health control and management system [15]. We can install a control terminal on the detection equipment and use Internet technology to control the terminal to collect the data of the detection equipment and upload the data to the background cloud computing server of the sports health control and management system, so as to avoid the inconvenience of user manual input and improve the efficiency [16].

Information Entropy Estimation of Sports and Health
In the estimation of information entropy, the general method is to use the discrete probability of variables and usually use the histogram method to estimate the probability density of discrete random variables. When the sample set is large enough, the estimated value of information entropy tends to its true value. However, in the case of small sample size, if we still use the simple method to replace the probability in the histogram with the observed frequency, the statistical fluctuation will make the distribution look less uniform, which will lead to the underestimation of information entropy. In this case, the estimation of information entropy will be biased, which will affect the accuracy of information entropy calculation. At present, a lot of research work has put forward relevant suggestions on how to estimate the deviation in order to reduce the deviation of information entropy estimation. Similarly, in the estimation of continuous and discrete variables, there is the problem of estimation bias.

Scientific Programming
In the histogram method, a given discrete random variable $X$ takes $N$ distinct values, $X = \{X_1, X_2, \ldots, X_N\}$, and the corresponding numbers of occurrences in a sample of size $n$ are $n_1, n_2, \ldots, n_N$, with $i \in [1, N]$, as expressed in formula (1), from which formula (2) is derived. Formula (2) is called the naive entropy estimation: the information entropy is estimated by counting the frequency of each discrete value in the sample set. The relevant literature explains the deviation of the naive estimation method. If the real entropy value of the random variable is assumed to be $H$, then the expected value of the entropy estimated using formula (2) is given by formula (3) [17]. Miller proposed a correction term for the first-order deviation $(N - 1)/2n$, which is called the Miller correction, and designed an entropy estimation method that takes this deviation into account.
The Miller-adjust entropy estimation method is defined by formula (4) [18]. Formula (4) corrects the first-order deviation of the naive estimation method, so that the estimated entropy is closer to the true entropy and the accuracy of the entropy estimation improves. The Miller-adjust method only corrects the deviation of the first-order term in formula (3). The quadratic and higher-order terms in formula (3) all depend on the true probability values, so they are usually difficult to estimate and correct.
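The displayed formulas referred to above are not reproduced in this text. As a sketch only, and assuming that formula (1) is the usual frequency estimate of the probabilities, the standard forms that agree with the surrounding definitions ($n$ samples, $N$ distinct values, counts $n_i$) would read as follows. The symbols $\hat{p}$ and $\hat{H}_{\text{naive}}$ are notation introduced here for illustration.

```latex
% Frequency estimate of the probabilities (assumed form of formula (1))
\hat{p}(x_i) = \frac{n_i}{n}, \qquad n = \sum_{i=1}^{N} n_i

% Naive (plug-in) entropy estimate (assumed form of formula (2))
\hat{H}_{\text{naive}} = -\sum_{i=1}^{N} \frac{n_i}{n} \log \frac{n_i}{n}

% Expected value of the naive estimate: the first-order bias is (N - 1)/(2n)
% (assumed form of formula (3), higher-order terms collected in O(n^{-2}))
\mathbb{E}\bigl[\hat{H}_{\text{naive}}\bigr] = H - \frac{N - 1}{2n} + O\!\left(n^{-2}\right)
```

The negative sign of the first-order term is what makes the naive estimate too low on average for small $n$.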
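The naive estimate and the Miller correction can be illustrated with a short sketch. It assumes natural logarithms and that the correction simply adds $(N - 1)/(2n)$ to the naive estimate; the function names `naive_entropy` and `miller_entropy` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def naive_entropy(samples):
    """Plug-in (naive) entropy in nats: observed frequencies replace probabilities."""
    n = len(samples)
    counts = Counter(samples).values()
    return -sum((c / n) * math.log(c / n) for c in counts)


def miller_entropy(samples):
    """Naive estimate plus the first-order Miller correction (N - 1) / (2n)."""
    n = len(samples)
    num_values = len(set(samples))  # N: number of distinct observed values
    return naive_entropy(samples) + (num_values - 1) / (2 * n)


# Illustrative data only: six observations of a three-valued variable.
samples = ["a", "b", "a", "c", "b", "a"]
print(naive_entropy(samples), miller_entropy(samples))
```

Because the correction shrinks as $1/n$, the two estimates converge as the sample grows, which matches the remark that the bias matters mainly for small sample sets.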
Hierarchical clustering of mutual information based on the Grassberger entropy estimation method is as follows.
Grassberger proposed a series of entropy estimation methods that take the deviation in entropy estimation into account. They all share the basic form of formula (5) but differ in the expression of the $\Phi(x_i)$ term [19].
If we approximate $\Phi(x_i)$ in the formula by $\log n_i$, formula (5) is transformed into formula (2); that is, substituting formula (6) turns the above entropy estimation method into the naive entropy estimation method. Grassberger assumes that each probability $p(x_i)$ is much less than 1, so that the counts in the sample set approximately follow a Poisson distribution. On this basis, the $q$-order Rényi entropy is defined by formula (7). When $q$ tends to 1, formula (7) reduces to the Shannon entropy. The term $p^q(x_i)$ in formula (7) can be estimated by formula (8), in which $n$ represents the number of samples and $N$ represents the number of distinct values taken by the random variable in the sample set. In a study by Grassberger, the $\Gamma$ function is used to replace the factorial operation in formula (8), and a corresponding formula is used to define the Shannon entropy [20]. When $p(x_i)$ is small, this method is more accurate than the Miller entropy estimation.
Grassberger proposed a more effective method in subsequent entropy estimation research, in which the term $G(n_i)$ is given by formula (11), where $\psi$ represents the digamma function. From formula (11), it can be concluded that when $n_i$ approaches infinity, $G(n_i)$ approaches the logarithm function, so the method turns into the naive entropy estimation method and the entropy estimate approaches the theoretical value. In actual estimation, however, $n_i$ stays small because the sample size is limited. Therefore, the Grassberger entropy estimation method is more accurate than the naive entropy estimation method in the case of small samples.
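The reduction from formula (5) to formula (2) can be checked in one line. The block below is a sketch under the assumption that the basic form is $\log n$ minus a count-weighted average of $\Phi$, which is the form used in Grassberger's estimators; the Rényi entropy and the Poisson factorial-moment estimate are the standard textbook expressions and are given here only as assumed forms of formulas (7) and (8), with the latter written for integer $q \le n_i$.

```latex
% Assumed basic form (formula (5))
\hat{H} = \log n - \frac{1}{n} \sum_{i=1}^{N} n_i \, \Phi(x_i)

% Substituting \Phi(x_i) = \log n_i and using \sum_i n_i = n recovers the naive estimate
\hat{H} = \log n - \frac{1}{n} \sum_{i=1}^{N} n_i \log n_i
        = -\sum_{i=1}^{N} \frac{n_i}{n} \log \frac{n_i}{n}

% Renyi entropy of order q and its Shannon limit (assumed form of formula (7))
H_q = \frac{1}{1 - q} \log \sum_{i=1}^{N} p^{q}(x_i), \qquad
\lim_{q \to 1} H_q = -\sum_{i=1}^{N} p(x_i) \log p(x_i)

% Poisson factorial-moment estimate of p^q (assumed form of formula (8)),
% with the factorials rewritten through the Gamma function
\widehat{p^{q}}(x_i) = \frac{n_i!}{(n_i - q)! \, n^{q}}
                     = \frac{\Gamma(n_i + 1)}{\Gamma(n_i - q + 1) \, n^{q}}
```

The last expression relies on the fact that, for a Poisson count with mean $n\,p(x_i)$, the $q$-th factorial moment equals $(n\,p(x_i))^q$.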
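A minimal sketch of this estimator follows. It assumes that $G(n_i)$ is the function from Grassberger's later work, $G(n) = \psi(n) + \tfrac{1}{2}(-1)^n[\psi(\tfrac{n+1}{2}) - \psi(\tfrac{n}{2})]$, and evaluates it through the equivalent recurrence $G(1) = -\gamma - \ln 2$, $G(2k) = G(2k-1) + 2/(2k-1)$, $G(2k+1) = G(2k)$, so that only the standard library is needed. The names `grassberger_g` and `grassberger_entropy` are hypothetical.

```python
import math
from collections import Counter

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def grassberger_g(count):
    """G(n) by recurrence; assumed to match the digamma expression in the text."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    g = -EULER_GAMMA - math.log(2.0)  # G(1)
    # Each even index 2k adds 2 / (2k - 1); odd indices repeat the previous value.
    for m in range(2, count + 1, 2):
        g += 2.0 / (m - 1)
    return g


def grassberger_entropy(samples):
    """Entropy estimate in nats: log n minus the count-weighted mean of G(n_i)."""
    n = len(samples)
    counts = Counter(samples).values()
    return math.log(n) - sum(c * grassberger_g(c) for c in counts) / n


# G(n) approaches log(n) for large counts, as stated in the text.
print(grassberger_g(10000), math.log(10000))
```

For large counts the printed values nearly coincide, while for counts of 1 or 3 `grassberger_g` lies clearly below the logarithm, which is where the small-sample correction comes from.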

In the research, this article first obtains the estimation formula of mutual information through the relationship between the concepts of information entropy. Then, the Grassberger entropy estimation formula is substituted into the mutual information estimation formula, which yields the mutual information estimation method based on Grassberger entropy estimation. Because Grassberger entropy estimation can estimate information entropy accurately when the sample size is small, its deviation is smaller than that of the naive entropy estimation method. Nowozin also showed that Grassberger entropy estimation can effectively reduce the deviation of information entropy estimation compared with the Miller entropy estimation method. Therefore, the mutual information estimation method based on Grassberger entropy estimation deduced in this paper can effectively estimate the mutual information between random variables in a finite sample set, making the estimate more accurate and more widely applicable than one obtained with ordinary methods.
In this paper, the derived mutual information estimation method based on Grassberger entropy estimation is used in agglomerative hierarchical clustering analysis, and a new mutual information hierarchical clustering algorithm based on Grassberger entropy estimation (G-MIHC) is proposed. The basic idea of the G-MIHC algorithm is to first treat each sample point as a cluster, so that each cluster contains only one sample point, and then use the similarity metric based on mutual information to calculate the distance between the clusters and build a distance matrix. In this process, the mutual information estimation method based on Grassberger entropy estimation is used to calculate the mutual information value. Second, the algorithm finds the two closest clusters according to the minimum distance criterion, merges them into a new cluster, and updates the distance matrix: the distances between the two original clusters and the other clusters are deleted, and the mutual information distance between the new cluster and every other cluster is calculated and inserted. The distances between the other clusters remain unchanged during the update. Finally, the algorithm judges whether the number of clusters meets the termination condition. If not, it continues to merge clusters and update the distance matrix. If the termination condition is reached, it stops clustering and returns the clustering result. The algorithm flow chart of G-MIHC is shown in Figure 1.
There are usually two ways to set the number of clusters. First, when the number of clusters is small, the experiment usually enumerates 1 to N and then uses an evaluation index to identify the best setting. Second, when the number of clusters is relatively large, the algorithm can use a learning method that adjusts the number of clusters in the current state based on the previous state. Because the biomedical dataset used in this article contains prior category information, this experiment directly uses the number of dataset categories as the number of clusters.
When estimating information entropy, the general method is to substitute the discrete probabilities of the variable into the entropy formula, and the histogram method is usually used to estimate the probability distribution of the discrete random variable. When the sample set is large enough, the estimated value of information entropy approaches its true value. However, with a small sample size, if we still use the naive method and simply replace the probabilities in the histogram with the observed frequencies, statistical fluctuation makes the distribution look less uniform, which leads to an underestimation of information entropy. This biases the estimation of information entropy and thereby affects the accuracy of the calculation. Much research has proposed ways to estimate this deviation in order to reduce it. The same problem of estimation bias appears in the estimation of both continuous and discrete variables.
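The substitution described above can be sketched as follows, assuming the entropy relationship used is the standard identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$ and that each entropy is replaced by a Grassberger-type estimate $\log n - \frac{1}{n}\sum_i n_i G(n_i)$. The function names are hypothetical, and the recurrence used for $G$ is an assumed equivalent of the digamma expression in the text.

```python
import math
from collections import Counter

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def grassberger_g(count):
    """G(n) via G(1) = -gamma - ln 2 and G(2k) = G(2k-1) + 2/(2k-1), G(2k+1) = G(2k)."""
    g = -EULER_GAMMA - math.log(2.0)
    for m in range(2, count + 1, 2):
        g += 2.0 / (m - 1)
    return g


def grassberger_entropy(samples):
    """Entropy estimate in nats for a list of hashable observations."""
    n = len(samples)
    counts = Counter(samples).values()
    return math.log(n) - sum(c * grassberger_g(c) for c in counts) / n


def grassberger_mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), each term estimated with grassberger_entropy."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired observations of equal length")
    joint = list(zip(xs, ys))  # paired observations form the joint variable
    return (grassberger_entropy(xs) + grassberger_entropy(ys)
            - grassberger_entropy(joint))


# Illustrative data only: a fully dependent pair and an unrelated pair.
xs = [0, 0, 1, 1]
print(grassberger_mutual_information(xs, xs))
print(grassberger_mutual_information(xs, [0, 1, 0, 1]))
```

Because each entropy term is corrected separately, the estimate can turn negative for very small samples, as the second printed value does; this is a property of the sketch, not a statement about the results reported in this paper.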
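The merge loop described above can be sketched with the distance left as a pluggable function. In G-MIHC that distance would be derived from the Grassberger-based mutual information; the text does not fix how the distance between multi-point clusters is computed, so the average of pairwise point distances (average linkage) is an assumption made here, and `agglomerative_clusters` and its parameters are hypothetical names.

```python
from itertools import combinations


def agglomerative_clusters(points, distance, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Cluster-to-cluster distance is the mean pairwise point distance
    (average linkage), an assumption not fixed by the text.
    """
    if not 1 <= num_clusters <= len(points):
        raise ValueError("num_clusters must be between 1 and len(points)")

    def linkage(a, b):
        total = sum(distance(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    # Step 1: every sample point starts as its own cluster.
    clusters = {i: [i] for i in range(len(points))}
    # Distance matrix stored as a dict keyed by pairs of cluster ids.
    dist = {(a, b): linkage(clusters[a], clusters[b])
            for a, b in combinations(clusters, 2)}
    next_id = len(points)

    while len(clusters) > num_clusters:
        # Step 2: pick the closest pair and merge it.
        a, b = min(dist, key=dist.get)
        merged = clusters.pop(a) + clusters.pop(b)
        # Delete distances that involve the two merged clusters ...
        dist = {pair: d for pair, d in dist.items()
                if a not in pair and b not in pair}
        # ... and add distances from the new cluster to all remaining ones.
        for other, members in clusters.items():
            dist[(other, next_id)] = linkage(members, merged)
        clusters[next_id] = merged
        next_id += 1

    # Step 3: termination condition reached, return point indices per cluster.
    return sorted(sorted(members) for members in clusters.values())


# Illustrative data only: absolute difference stands in for the MI-based distance.
points = [0.0, 0.1, 5.0, 5.2, 9.9]
print(agglomerative_clusters(points, lambda p, q: abs(p - q), 3))
```

Entries of the distance dictionary that do not involve the merged clusters are carried over untouched, mirroring the statement that distances between the other clusters remain unchanged during the update.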

Sports and Health Intelligent Diagnosis System Based on Cluster Analysis
According to the design concept of cluster analysis, the flow chart of the push mechanism for sports health management messages is shown in Figure 2. The system algorithm works mainly from the user's physical condition to recommend suitable diet and sports suggestions. Because professional, targeted advice cannot be given to every user one by one, we collect relevant information about healthy sports to customize the algorithm, which processes the user's recorded data to give reasonable diet and sports recommendations as feedback. The user data analysis process is shown in Figure 3.
According to the judgment of the user's recent sports and diet situation, the corresponding decision is selected from a decision table of situations preset in the system and recommended to the user. The corresponding process is shown in Figure 4.
Similar functions should be placed in one large module as far as possible, the classification should be reasonable, the main functions should be prominent, and the hierarchy should not be too deep, preferably no more than four levels. The layout of the functions should also be considered from the user's perspective. Following the previous analysis and research, the final system module structure is shown in Figure 5.
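A decision-table lookup of this kind can be sketched as a dictionary keyed by the judged situation. The text does not list the situations or the decisions held in the system, so every category label and recommendation string below is a hypothetical placeholder, as are the names `DECISION_TABLE` and `recommend`.

```python
# All keys and messages are hypothetical placeholders, not the system's actual table.
DECISION_TABLE = {
    ("insufficient", "excessive"): "Increase activity gradually and reduce intake.",
    ("insufficient", "balanced"): "Add light aerobic sessions during the week.",
    ("adequate", "excessive"): "Keep the sports plan and adjust the diet.",
    ("adequate", "balanced"): "Maintain the current sports and diet routine.",
}

DEFAULT_DECISION = "No preset decision matches; keep recording data."


def recommend(sports_status, diet_status):
    """Return the preset decision for the judged sports and diet situation."""
    return DECISION_TABLE.get((sports_status, diet_status), DEFAULT_DECISION)


print(recommend("insufficient", "balanced"))
```

Keeping the table as data rather than as branching code means new situations can be added without changing the lookup logic.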
Based on the analysis of functional requirements and the overall structure design, the health and sports management system is divided into functional modules, as shown in Figure 6.
When users enter the health sports management system for the first time, they need to set some basic personal information so that later use is more convenient. In this way, data suited to each user can be pushed more accurately in future use. Figure 7 is the operation flowchart for a user entering the system.

Sports and Health Intelligent Diagnosis Analysis Based on Cluster Analysis
This article uses cluster analysis to construct a sports and health intelligent diagnosis and analysis system, which can make sports plans according to the user's health and can also analyze the user's sports process. The performance of the system is then verified through experiments. This paper selects users from society, connects the system with the users through wearable devices, and obtains 81 sets of valid data through health diagnosis. The scores of the health diagnosis effect of the system are shown in Table 1 and Figure 8.
From the above analysis, it can be seen that the sports and health intelligent diagnosis system based on cluster analysis constructed in this paper has a good effect on human health diagnosis. On this basis, this article studies the effects of the system constructed in this article on human health and sports recommendation. e results are shown in Table 2 and Figure 9.
From the above analysis, it can be seen that the sports and health intelligent diagnosis system based on cluster analysis constructed in this article meets the expected functional requirements, can make sports plans based on human health, and also proves that there is a clear correlation between sports and health.

Conclusion
Scientific sports can prevent many diseases and can also be used to treat some diseases. What is needed is a reasonable and targeted sports guidance method; otherwise, sports will often produce unfavorable effects. Moreover, personal mistakes or unreasonable sports are not only harmful to the physical condition but may also aggravate some health risks. Analyzing basic personal body data together with sports data, in order to provide users with suitable health plans and sports programs and to guide people toward more reasonable fitness, benefits health and disease prevention.
This article uses a clustering analysis algorithm to construct a sports and health intelligent diagnosis system, sets the functional modules of the health diagnosis system based on actual needs, and verifies the performance of the system. The research results show that the system constructed in this paper has a certain effect.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.