Service-Life Study of Polycarbonate Outdoors Using Python with Incomplete Data

The deterioration of polycarbonate (PC) depends on various environmental factors. Meanwhile, the complexity of the related weathering processes hinders the prediction of service life based on the environmental factors. To elucidate the nonlinear correlation between PC weathering and the environmental factors, three-year-long natural weathering tests were conducted at eight experimental stations in China. The relationship between tensile-property data of PC and environmental and pollutant data is analyzed by extra-trees and multilayer perceptron networks implemented in Python. The results indicated that (1) the degradation of PC tensile properties is mainly affected by the experimental period (76.37%), whilst the effect of the environmental or pollutant factors on the degradation is less pronounced (23.63%); (2) the classification accuracy of the trained model on the training set is 91% (91/100), and on the testing set is 72.13% (44/61); and (3) it is inferred from the error analysis of the classification results that the performance change of polycarbonate in Qionghai and Wuhan is characterized by an initial reduction followed by a slight improvement. Overall, the proposed method performs well, especially for areas where only incomplete data are available.


Introduction
Polycarbonate (PC) is a widely used engineering plastic owing to its excellent mechanical properties and low specific gravity. However, the deterioration of PC materials is inevitable and largely depends on the ambient factors in their application, e.g., solar radiation, temperature, water exposure, and atmospheric pollution [1][2][3][4]. Beyond the material's own physical and chemical properties, the weathering of PC results from the combined action of various environmental factors. The interplay between these factors is so intricate that predicting the service life of PC products across different environments is extremely challenging. Hulme and Cooper [5] summarized the difficulties of predicting the service life of a polymer as follows: (1) polymer properties are time-, temperature-, environment-, and stress-dependent; (2) the limits of the various properties at which a polymer fails are often unknown; (3) the service conditions generally vary and often include fault situations; and (4) for complex applications, it is impossible to fully replicate the service conditions in accelerated tests. However, when information about the time, temperature, environmental factors, and mechanical properties of the polymer can be collected and analyzed on a large scale, these challenges can be mitigated.
Artificial intelligence (AI) has developed rapidly in recent years, and software supporting various AI tasks is continuously being made available. However, only a few researchers use this software to apply the latest machine learning methods to the study of material weathering mechanisms. The limited information available about new algorithms and the lack of the skills required to use them may be among the reasons. Nevertheless, with a few additional basic tools, state-of-the-art AI algorithms can be deployed through Python, a high-level language suitable for scientific and engineering applications. The use of Python enables the rapid and flexible development of AI applications, which can be further enhanced with additional extensions [6]. In addition, Python has established itself as one of the most popular languages for scientific computing [7]. Thanks to its high-level interactive nature and its maturing ecosystem of scientific libraries, Python is an appealing choice for algorithmic development and exploratory data analysis [8,9]. Python is also easy to learn and convenient to apply. Hence, in the present work, two common machine learning methods (extra-trees and multilayer perceptron networks) were used to analyze various environmental factors and the mechanical properties of PC materials.
In fact, the use of Python (or other tools) to integrate machine learning methods into scientific applications has been gaining attention. Ong et al. [10] developed the Python Materials Genomics (pymatgen) library, a robust, open-source Python library for materials analysis. Nevertheless, few researchers have applied Python as the main means of studying weathering mechanisms, even though machine learning can help uncover hidden connections in large datasets.
The study of weathering mechanisms remains a significant and valuable research subject. Many researchers have studied weathering mechanisms through various laboratory-based methods, at both the macro and micro levels [11][12][13][14]. However, for enhanced material protection in practical applications, it is necessary to study the weathering mechanisms of materials outdoors. Liu et al. [15] developed an outdoor weathering-life prediction system for PC based on an artificial neural network (ANN).
In this work, a three-year-long PC natural weathering test was conducted at eight exposure stations in China. After analyzing the tensile strength and elongation at break of the weathered PC, we computed the frequency distribution of the values of all environmental factors in a large-scale data analysis to characterize each region. Subsequently, we separated the most important factors with an extra-trees algorithm to reduce the disturbance introduced by unrelated factors. Lastly, a multilayer perceptron neural network was constructed based on the relationships observed among the characteristic environmental parameters, tensile property variation parameters, and service lifetime. We introduced a guiding action into the model to study weathering mechanisms. This method has two main advantages. On the one hand, it maximizes the information extracted from the collected data even if the original data do not have a uniform scale and are incomplete. On the other hand, it can identify macroscopic laws based on large-scale data analysis.

Materials and Methods
2.1. Materials and Sample Preparation. Raw PC material (K1300, Teijin Limited) was purchased. Standard dumbbell tensile samples (150 mm gauge length, 4 mm × 10 mm cross-section) of pure PC were injection molded on a UA120A injection-molding machine (Yizumi, China). The injection temperature, mold temperature, injection pressure, packing pressure, and pressure-holding time were 190°C, 40°C, 700 bar, 150 bar, and 10 s, respectively.

Outdoor Weathering Experiments.
According to the ISO 877 standard, the exposure tests were conducted at eight natural exposure stations in China. The stations cover different climate types. The eight stations are located in Wuhan (WH, subtropical zone, humid urban climate type), Lhasa (LS, warm temperate zone, plateau rural climate type), Wanning (WN, torrid zone, marine climate type), Dunhuang (DH, warm temperate zone, dry and hot desert climate type), Shenyang (SY, warm temperate zone, humid urban climate type), Jiangjin (JJ, subtropical zone, suburban acid rain climate type), Guangzhou (GZ, subtropical zone, humid urban climate type), and Qingdao (QD, temperate zone, marine climate type) (Figure 1). The environmental factors considered include monthly mean, high, and low temperature (T); monthly mean, high, and low relative humidity (RH); rainfall duration (RD); precipitation (P); sunshine duration (S); total solar radiation (G); infrared radiation (IR); ultraviolet radiation (UV); sulfur dioxide (SD); hydrogen chloride (HC); nitrogen dioxide (ND); and hydrogen sulfide (HS), among others. For outdoor weathering, the PC samples were fixed at both ends on aluminum alloy frames, tilted at 45° from the horizontal, and directly exposed facing south without any backing.

Characterization Methods.
Dumbbell specimens were used for tensile tests according to the ISO 527-2 standard, using a universal material testing machine (CMT 6503, MTS Systems Corporation) at a stretching rate of 20 mm/min.

Data Source.
All environmental-factor data were collected from the website http://data.ecorr.org, which was built for public research. The original data were recorded in Excel tables such as the ones shown in Tables 1 and 2. We focused on five areas whose environmental factors were fully documented. We will refer to these areas as well data-accumulated areas (Guangzhou, Qingdao, Shenyang, Wanning, and Wuhan). The remaining three areas will be referred to as incomplete data areas (Dunhuang, Jiangjin, and Lhasa).
The extra-trees algorithm was developed from random decision trees, a classic machine learning method. A traditional decision tree divides all objects into different branches according to whether their characteristics fit the filtering conditions of the individual branches. The basic process of traditional decision trees is shown in Algorithm 1.
Geurts et al. [16] developed the extra-trees algorithm by adding the following process, which strongly increases the randomness of the decision trees.
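For regression problems [16], the score of a split $s$ on the sample set $S$ is the relative variance reduction

$$\mathrm{Score}_R(s, S) = \frac{\operatorname{var}\{y \mid S\} - \frac{|S_l|}{|S|}\operatorname{var}\{y \mid S_l\} - \frac{|S_r|}{|S|}\operatorname{var}\{y \mid S_r\}}{\operatorname{var}\{y \mid S\}},$$

where $\operatorname{var}\{y \mid S\}$ refers to the variance of the output $y$ in sample set $S$, and the subscripts $r$ and $l$ refer to the right and left branches of the node, respectively. Moreover, Geurts et al. [16] showed that the extra-trees learning algorithm can provide near-optimal accuracy and good computational complexity, especially on classification problems.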
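As an illustration of the regression score described above, the following minimal sketch computes the relative variance reduction of a split, i.e., the variance of the output in the node minus the size-weighted variances in the left and right branches, divided by the variance in the node. The function names and the toy data are hypothetical and are not taken from the original study. The random cut-point helper assumes the extra-trees convention of drawing the threshold uniformly between the minimum and maximum of an attribute [16].

```python
import numpy as np


def variance_reduction_score(y, go_left):
    """Relative variance reduction of a split (hypothetical helper).

    y       : outputs of the samples that reach the node
    go_left : boolean mask, True for samples sent to the left branch
    """
    y = np.asarray(y, dtype=float)
    go_left = np.asarray(go_left, dtype=bool)
    total_var = y.var()
    if total_var == 0.0:
        return 0.0  # a pure node cannot be improved by any split
    weighted = 0.0
    for part in (y[go_left], y[~go_left]):
        if len(part) > 0:  # an empty branch contributes nothing
            weighted += len(part) / len(y) * part.var()
    return (total_var - weighted) / total_var


def random_split(x, rng):
    """Draw a cut-point uniformly between min(x) and max(x)."""
    x = np.asarray(x, dtype=float)
    cut = rng.uniform(x.min(), x.max())
    return cut, x < cut


# Toy example (made-up numbers).
y_toy = np.array([1.0, 1.0, 5.0, 5.0])
perfect = variance_reduction_score(y_toy, [True, True, False, False])
useless = variance_reduction_score(y_toy, [True, False, True, False])

rng = np.random.default_rng(0)
x_toy = np.array([0.2, 0.4, 0.9, 1.3])
cut, mask = random_split(x_toy, rng)
```

A split that separates the two output levels perfectly scores 1, whereas a split that leaves both branches as mixed as the parent scores 0.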
The criterion for selecting the optimal decisive attributes is the key to successful sample classification. Conversely, if applying a given attribute as a decisive attribute significantly improves the classification accuracy, that attribute is important for describing the weathering mechanisms of the samples. Multiple methods are available for selecting the optimal and decisive attributes.
Geurts et al. [17] described variable importance in forests of randomized trees. They demonstrated that the Mean Decrease Impurity (MDI) importance computed by totally randomized trees and extra-trees exhibits desirable properties for assessing the relevance of a variable: it is equal to zero if and only if the variable is irrelevant, and it depends only on the relevant variables.
We used this method to find the important factors which can contribute more significantly to the weathering process of PC materials than other factors.

Weathering-Life Prediction Model with Multilayer Perceptron Networks.
With the development of artificial intelligence in recent years, ANNs have gained widespread popularity for data processing in various industries. An increasing number of convenient software packages facilitate the implementation and application of ANNs. For example, in Python, which was used in the current work, a corresponding model can be set up by specifying only a few parameters.
The structure and basic principles of multilayer perceptron networks, one of the first ANN models, have already been described by many researchers [18]. The multilayer perceptron network shown in Figure 2 was implemented in Python.
Besides its basic use, the sklearn library provides alternative activation functions and solving algorithms (the solvers for weight optimization) that suit different situations. For example, "lbfgs" is an optimizer from the family of quasi-Newton methods; "sgd" refers to stochastic gradient descent; and "adam" refers to the stochastic gradient-based optimizer proposed by Kingma and Ba [19]. In addition, the sklearn documentation notes that the default solver "adam" works well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, "lbfgs" can converge faster and perform better. Hence, we chose the "lbfgs" solver in this work.
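The solver choice discussed above can be illustrated with a minimal sketch that trains the same small network with each of the three solvers offered by sklearn's MLPClassifier. The data are synthetic placeholders generated with make_classification, not the measurements of this study. The 100/61 split, the (20, 20) hidden-layer structure, and the standardization of the inputs are assumptions made only for this example.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data: 14 inputs and 3 classes (not the study's data).
X, y = make_classification(n_samples=161, n_features=14, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = X[:100], X[100:], y[:100], y[100:]

# Standardization is an assumption of this sketch; fit it on training data only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Small networks on small data may stop before full convergence.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

test_accuracy = {}
for solver in ("lbfgs", "sgd", "adam"):
    clf = MLPClassifier(hidden_layer_sizes=(20, 20),  # placeholder structure
                        activation="relu", solver=solver,
                        max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)
    test_accuracy[solver] = clf.score(X_test, y_test)

for solver, acc in test_accuracy.items():
    print(f"{solver}: {acc:.3f}")
```

On a dataset of this size, the three solvers can be compared directly on held-out accuracy, which is the kind of evidence behind preferring "lbfgs" for small datasets.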

Activation functions
The sklearn library provides four types of activation function: relu, tanh, logistic, and identity.

Some data were not recorded due to reasons beyond the experimental control. Although more accurate conclusions could be drawn from a complete dataset, a method that is robust against small numbers of missing values was needed.
Firstly, frequency distribution graphs were constructed (Figures 3 and 4) for the whole dataset, i.e., for each factor, so that the data could be intuitively presented. A frequency distribution chart of all environmental factors was drawn to observe the distribution of different factors in each region, and to determine whether median values could be used to suitably characterize and simulate the actual environment.
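A minimal sketch of this characterization step is given below: the range of a factor is divided into 15 equal-width intervals, the share of observations in each interval is counted, and each region is then summarized by the median of every factor, with missing values ignored. The column names (T_mean, RH_mean), the region codes, and all numbers are synthetic placeholders, not the data of this study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic monthly records for two regions (placeholder values only).
records = pd.DataFrame({
    "region": ["GZ"] * 36 + ["QD"] * 36,
    "T_mean": np.concatenate([rng.normal(22, 6, 36), rng.normal(13, 9, 36)]),
    "RH_mean": np.concatenate([rng.normal(78, 5, 36), rng.normal(70, 8, 36)]),
})
records.loc[[3, 40], "RH_mean"] = np.nan  # simulate unrecorded months


def frequency_distribution(values, n_bins=15):
    """Relative frequency of a factor over n_bins equal-width intervals."""
    values = values.dropna().to_numpy()
    counts, edges = np.histogram(values, bins=n_bins)  # spans min..max
    return counts / counts.sum(), edges


gz_rh = records.loc[records["region"] == "GZ", "RH_mean"]
freq, edges = frequency_distribution(gz_rh)

# Median of every factor per region; NaN entries are skipped by default.
medians = records.groupby("region")[["T_mean", "RH_mean"]].median()
print(medians)
```

Because the median ignores the missing entries and is insensitive to a few extreme months, it remains usable when some records are absent.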
The approach is similar to studying the impact of individual key factors on performance data. For the calculation, the range between the minimum and maximum values of each factor was divided uniformly into 15 intervals, and the frequency was determined from the number of data points falling in each interval relative to the total number of data points of the factor. According to the frequency distribution graphs, the data of most factors were concentrated in a specific interval that depended on the region. The temperature and relative humidity appear relatively noisy, as these two climatic factors vary periodically with the seasons. However, there is a distinct difference between the curve fluctuations, suggesting that this noise would not affect the distinction between the environments of different regions. Hence, the median values of every factor were selected to characterize the regions, as shown in Tables 3 and 4.

3.2. Importance of Factors.
The ExtraTreesClassifier class of the sklearn library (Python 3.6) offers a convenient approach to determining the importance of each factor, as shown in Tables 5 and 6. When using the algorithm, we set the environmental factors, including the pollutant factors and the experimental time periods, as the cause and the elongation at break of the materials as the result. Subsequently, the ExtraTreesClassifier operation was repeated 100 times. The mean and the variance of the 100 results were used to determine the importance of each factor with statistical significance.
Tables 5 and 6 suggest that the experimental period is the most important factor, as it exhibited an importance of 0.7637, which is one or two orders of magnitude larger than that of the other factors. Hence, the experimental period contributes most significantly to the tensile property degradation of PC materials (76.37%). This is in line with most laboratory findings. The remaining 23.63% of the importance can be attributed to the environmental factors. These relationships are shown in Figure 5.
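The repeated importance estimate can be sketched as follows. The example fits sklearn's ExtraTreesClassifier 100 times with different random seeds and reports the mean and variance of the impurity-based importance of each factor. The data are synthetic placeholders, the factor names are hypothetical, and the number of trees per forest is an arbitrary choice for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic placeholder data: 8 factors and 3 output classes.
X, y = make_classification(n_samples=160, n_features=8, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
feature_names = [f"factor_{i}" for i in range(X.shape[1])]  # hypothetical names

n_repeats = 100
importances = np.empty((n_repeats, X.shape[1]))
for seed in range(n_repeats):
    forest = ExtraTreesClassifier(n_estimators=20, random_state=seed)
    forest.fit(X, y)
    # Impurity-based importances; each vector sums to 1.
    importances[seed] = forest.feature_importances_

mean_importance = importances.mean(axis=0)
var_importance = importances.var(axis=0)

# Rank the factors from most to least important.
ranking = sorted(zip(feature_names, mean_importance, var_importance),
                 key=lambda row: row[1], reverse=True)
for name, mean, var in ranking:
    print(f"{name}: mean={mean:.4f}, variance={var:.2e}")
```

Averaging over many randomized forests reduces the run-to-run scatter of the importances, and the variance indicates how stable each ranking position is.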
According to existing research, high temperature can affect the mechanical properties of polymer materials. Hence, all factors with an importance larger than that of the monthly maximum temperature were chosen for the next step. Table 7 shows the data of the chosen factors in regions with partially missing data. Eventually, the data was combined

Training and Testing.
For the training dataset, to predict the outdoor service life of PC materials, we set the experimental period as the output and 14 factors (the 11 environmental factors in Table 7 and 3 mechanical properties: tensile strength, yield strength, and elongation at break) as the input. To obtain the best performance, every combination of the hidden-layer structure and the activation function within reasonable limits was explored. Figure 6 shows the results. Figure 6(a) shows the identity activation function; the best hidden-layer structure is 27-38, and the achieved accuracy is 62.30%. Figure 6(b) shows the logistic activation function; the best hidden-layer structure is 12-23, and the achieved accuracy is 70.49%. Figure 6(c) shows the relu activation function; the best hidden-layer structure is 15-16, and the achieved accuracy is 72.13%. Figure 6(d) shows the tanh activation function; the best hidden-layer structure is 10-31, and the achieved accuracy is 70.49%.
Overfitting is a common problem in machine learning models, so the accuracy achieved on the training dataset cannot serve as the reference standard for a model's performance. Figure 6 therefore shows the testing-set accuracy of the four activation functions for the explored hidden-layer structures, which measures the models' performance more reliably. The figure shows that the optimal combination uses the "relu" activation function.
The neural network with two hidden layers with 15 and 16 neurons, respectively, combined with the "relu" activation function provided the best classification accuracy: 91% on the training set and 72.13% on the testing set.
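The exhaustive search over activation functions and hidden-layer structures can be sketched as below. The data are synthetic placeholders with 14 inputs and a 100/61 split chosen only to mirror the sample counts reported here. The coarse grid (10, 30, 50 neurons per layer) is an assumption that keeps the example fast, whereas the study explored all sizes between 10 and 50. Standardizing the inputs is also an assumption of this sketch.

```python
import itertools
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (not the study's measurements).
X, y = make_classification(n_samples=161, n_features=14, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = X[:100], X[100:], y[:100], y[100:]
scaler = StandardScaler().fit(X_train)  # fitted on the training data only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

warnings.filterwarnings("ignore", category=ConvergenceWarning)

activations = ("identity", "logistic", "relu", "tanh")
sizes = range(10, 51, 20)  # coarse grid: 10, 30, 50 neurons per layer

results = {}
for activation, n1, n2 in itertools.product(activations, sizes, sizes):
    clf = MLPClassifier(hidden_layer_sizes=(n1, n2), activation=activation,
                        solver="lbfgs", max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    # Score on held-out data, since training accuracy rewards overfitting.
    results[(activation, n1, n2)] = clf.score(X_test, y_test)

best = max(results, key=results.get)
print("best combination:", best, "accuracy:", round(results[best], 4))
```

When enough data are available, selecting the combination on a separate validation split and reporting the testing accuracy only once would give a less optimistic estimate of the final performance.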

Discussions
The model trained on 100 observations can recognize the training data with a 91% accuracy (91/100) and the test data with a 72.13% accuracy (44/61). Moreover, additional information can be extracted from the specifics of the recognitions. Tables 8 and 9 show the classification accuracy for the various well data-accumulated areas and incomplete data areas, respectively. Tables 10 and 11 show the classification accuracy for the various experimental periods in the well data-accumulated areas and the incomplete data areas, respectively.
In terms of the recognition of the training data, the observations in Qingdao and Shenyang (40 observations) were all accurately classified, whilst a single false recognition (8/9) was observed for Wanning and Wuhan (Table 8). The single false recognition resulted from a confusion between 12 months and 36 months, whilst no recognition was confused between 12 months and 24 months, including in the incomplete data regions (Table 11). Therefore, it can be inferred that the outdoor mechanical degradation of polycarbonate is characterized by an initial deterioration followed by a slight improvement (as shown in Figure 7), assuming that the systematic error generated during data collection is the same for the three experimental periods.
There is a significant deterioration of the mechanical properties between the samples exposed to the environment for 12 months and those exposed for 24 months. Hence, no false recognition was observed between them. Almost all misclassifications (25/26) are related to the 36-month samples. Hence, it is probable that the mechanical properties of the samples exposed for 36 months fall between those of the samples exposed for 12 months and 24 months, which makes the 36-month samples difficult to classify accurately. Whether a 36-month sample was misclassified as a 12-month sample or as a 24-month sample depended on whether its mechanical properties were closer to those of the samples exposed for 12 months or to those of the samples exposed for 24 months. The statistics obtained from every sample shown in Figure 7 support these findings.

Figure 6: Performance of the explored neural network structures with the number of neurons in the two hidden layers varying between 10 and 50.

Table fragment: Non-water-soluble dust (g/m²), 0.0059862, 2.42E-05.
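This kind of error analysis can be sketched with a confusion matrix over the three exposure periods, which shows directly which periods are mistaken for each other. The labels below are made up for illustration and are not the classification results of this study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

periods = [12, 24, 36]  # exposure periods in months

# Made-up true and predicted labels for ten samples.
y_true = np.array([12, 12, 12, 24, 24, 24, 36, 36, 36, 36])
y_pred = np.array([12, 12, 36, 24, 24, 24, 12, 36, 24, 36])

# Rows are true periods, columns are predicted periods.
cm = confusion_matrix(y_true, y_pred, labels=periods)
per_period_accuracy = cm.diagonal() / cm.sum(axis=1)

# Share of misclassifications that involve the 36-month class.
wrong = y_true != y_pred
involves_36 = wrong & ((y_true == 36) | (y_pred == 36))
share_36 = involves_36.sum() / wrong.sum()

# Confusions between 12 and 24 months, in either direction.
confused_12_24 = cm[0, 1] + cm[1, 0]

print(cm)
print("accuracy per period:", per_period_accuracy)
print("share of errors involving 36 months:", share_36)
```

The off-diagonal entries play the same role as the per-period tables: a zero count between two periods indicates that their mechanical states are clearly separated.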
Considering the results shown in Figure 8, the black line is more likely to coincide with the blue line than with the red one. Moreover, the black line intersects the blue line at 73%. The crossover point at 73% also divides the three lines into two parts: on the right side of the crossover point, all three lines exhibit the same trend, reaching their maxima at 100%; on the left side of the crossover point, the red line differs from the other two by exhibiting a semi-circle-shaped peak. This suggests that the samples exposed for three years (three-year samples) have a breaking elongation value distribution similar to that of the one-year samples. However, the breaking elongation values of the three-year samples will partially shift to the left as a result of the weathering and the degradation of the mechanical properties of the materials. Therefore, a higher proportion of the three-year samples reached an elongation before break above 73% and a lower proportion below 73%. Similarly, the two-year samples have the same value distribution and a significant and concentrated left shift, especially in the part below 73%. Jiang et al. [20] researched the weathering mechanism of bisphenol A polycarbonate, and their results support our findings. They showed that this phenomenon is a weathering-induced ductile-brittle-ductile transition, which partially results from the competition between oxidation-induced chain scission and chain crosslinking. Hence, it is inferred that our polycarbonate samples exhibit the same weathering mechanism.
Table 9 shows that the classification accuracy in the incomplete data areas is lower than that in the well data-accumulated areas by 18.87%. There are three possible reasons: (1) The median characterization does not perform well because of the smaller amount of data available in the incomplete data areas; errors tend to be more pronounced with smaller amounts of data. (2) Statistically, the data range of some factors in the incomplete data areas is far beyond that in the well data-accumulated areas. Hence, the difficulty of recognizing the test data is beyond the capacity of the model trained on the limited training data. (3) Fundamentally, different regional environmental characteristics led to essential differences in the property degradation of the polycarbonate between the training data areas and the test data areas.

Conclusions
It was shown that the integrated tools of Python make it possible to conveniently analyze data with state-of-the-art mathematical methods. The important climatic and pollutant factors affecting the elongation at break identified by the extra-trees algorithm present high stability and interpretability. In addition, the important parameters guided a more reasonable use of the data in the subsequent process and improved the performance of the multilayer perceptron model. If limited amounts of data are available, looping through all possible combinations with high computing performance is a reliable way to find the optimal hyperparameters of a machine learning model. The model obtained through this method could recognize the experimental periods with relatively high accuracy. This provides an important reference for the study of weathering processes and of appropriate protection measures for polycarbonate in atmospheric environments. According to the error analysis, from the macroscopic point of view, the outdoor mechanical properties of polycarbonate deteriorate first and then rise slightly. This suggests that the outdoor weathering process of polycarbonate is a ductile-brittle-ductile transition.
It is feasible to predict the weathering periods of samples in incomplete data areas with samples in the well data-accumulated areas, although with modest errors. Moreover, it is more accurate to predict the service life of certain samples from the data obtained from well data-accumulated areas.

Data Availability
The raw data on the mechanical properties of the materials required to reproduce these findings cannot be shared at this time, as the data also form part of an ongoing study. The environmental data can be found at the China Gateway to Corrosion and Protection (http://data.ecorr.org/), and a part of them is in the attachment.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.