Research on Ancient Artifact Identification Methods under Intelligent Perception and Recognition Technology



Introduction
China is one of the ancient civilizations with the longest continuous history in the world. Artifacts are among the most significant heritages left by ancient people. Ancient Chinese artifacts are considered one of the most important carriers of Chinese culture and hence play a significant role in Chinese history. At the same time, science has developed rapidly, and manual recognition methods are now rarely used. Instead, AI-based recognition systems are used, which are incorporated into human verification processes and offer new feature identification abilities. Many scientific research institutions have become involved in the study of ancient artifacts, using technological techniques to determine the areas that were enriched [1]. The study of the chemical element composition of artifacts has become significant in this field. X-ray fluorescence spectroscopy is one analytical technique for the study of ancient artifacts. Owing to its low detection limit, low cost, fast analysis speed, and simple sample preparation, it has become a significant technique for characterizing ancient artifacts. With respect to product information, artifact types show different features across regions and periods, so these features play a significant role in appraisal [2]. When determining the features, digital artifact records aggregate the core features of artifacts from various historical periods and are used to evaluate the rules of artifact identification, which act as a guide for determining the authenticity and dating of ancient relics. The conversion of two-dimensional image information into three dimensions using machine learning and wavelet transformation techniques appears to be the latest research path for ancient relics [3]. This paper discusses the critical factors and aspects of machine learning techniques for the identification of ancient artifacts.
The rest of the paper is organized as follows. Section 2 presents a literature review. Section 3 describes the algorithms used in the study. Section 4 analyzes the experimental details. Finally, Section 5 presents the conclusion.

Literature Review
In order to understand the issue, a thorough literature review has been conducted. In an ecological and cultural context, it is essential to distinguish real from fake ancient artifacts, which requires modern technology and has social significance. In recent years, there have been two types of ancient artifact detection techniques, namely, traditional and modern. The traditional detection method is the long-established practice of examining ancient artifacts with the naked eye [4]. It is highly dependent on the inspector: different inspectors may reach different conclusions about the same artifact or work of art based on their personal experience. Due to the advancement of technology, different types of detection methods are now used in identifying ancient ceramics and pottery fragments [5]. They are the element identification method and the aging identification method. The element identification method relies on measuring a small number of chemical elements. This method is applied to the exploration of pottery fragments [6]. When this method is used to identify ancient ceramics, the content of each element in the ceramic is measured and its proportion is obtained. The aging identification technique is the process of identifying various types of aging traces on ancient ceramics, such as patch traces, pit traces, exfoliation traces, linear traces, and others, which are used to evaluate the authenticity and the age of ancient artifacts. To estimate the production age accurately, aging identification methods such as thermoluminescence dating, devitrification structure analysis, and accuracy testing are used. New approaches still need to be identified for scientific identification. Ancient artifacts are historical relics that hold a significant position in the traditional culture of China [7]. Archaeology is a complex way of knowing human history. In addition, many discoveries have been made through archaeological work, influencing and supporting other intellectual disciplines such as demography, geography, and more [8]. In some ways, archaeology generates the historical context that several fields rely on. In recent years, technology has paved the way for new research directions in ancient ceramics, such as restoring two-dimensional images into three-dimensional models using machine vision technologies [9]. With growing interest in traditional culture, the ecological and cultural focus on traditional art seems to be vital. People have higher requirements and expectations for the quality of ancient ceramics [10]. They expect scientific testing techniques to be applied to ancient ceramics and want scientific and comprehensive information about them. This has resulted in the development of computer vision technology for ceramics.
Artificial intelligence technology is creating a new revolution in the field of archaeology [11]. AI plays a significant role in archaeology and is among its most promising applications. However, it also significantly affects the human workforce as it transforms our lives. AI uses large amounts of data to produce results in a short period of time and to uncover hidden patterns that are extremely useful. Artificial intelligence is a significant tool in the archaeologist's arsenal that brings history back to life. Until the past few decades, neural networks and ML algorithms were unknown concepts in archaeology. Moreover, AI supports archaeology departments in analyzing complicated, highly specialized, subjective, and time-consuming projects, even when they involve a vast amount of data [12].
To determine the locations or origins of objects, today's technology employs a geographic information system (GIS). GIS has been used to redistribute and organize objects into subclasses [13]. Artifact ID numbers sped up the process, allowing more data tables to be produced. The link between artifact classes could be tested by creating shape files from the artifact classes. The location of all the ceramic bowls in the cave has to be determined. It is then possible to evaluate the correlation between artifact classes and bowls [14]. The buffer function helps to evaluate whether specific types of cave features were selected for ritual activities. The geomorphic feature coverage was developed so that the number of artifacts falling within each zone could be generated. Earlier, LIDAR (Light Detection and Ranging) was used to identify artifacts. It creates spatial information using light. These devices compute the distance to objects on the ground by measuring the time it takes for the reflected laser pulse to return to the device. In [15], a 2D Siamese neural network is used to determine matching probabilities. This network predicts the absence or existence of a match. Another study used the first ancient text restoration model, called Pythia, which restores missing characters in damaged text using deep neural networks. Its architecture is designed to handle long-term context information and corrupted word or character representations [16].
In Greece, ancient texts inscribed on stone tablets have provided valuable findings about the histories of past civilizations. Although some of these inscriptions date back 2,600 years, their restoration saw little technological advancement while they suffered the ravages of time. These priceless texts were damaged in a number of ways, including cracks and chips, and sometimes entire fragments of the scripts are missing. Many of the fragmented texts have considerable missing sections, and historians depend on the discipline of epigraphy to study them. Epigraphy uses layout and shape, historical context, grammatical and linguistic considerations, and textual parallels to examine missing texts on damaged tablets. It is a complex and time-consuming process. Hence, Pythia has been developed to restore missing historical texts or missing characters in damaged text. It is one of the first ancient text restoration models. The researchers who trained the system converted PHI, the largest digital corpus of ancient Greek inscriptions and texts, into a form that a machine learning algorithm can use reliably [16]. The model proposes the missing text or character sequences of a damaged text in a ranked list of possible solutions and assigns a confidence score to each suggestion. Human experts then examine all the options step by step and make their own judgments to arrive at a reliable restoration of the damaged characters.
Wireless Communications and Mobile Computing
Machine learning and 3D modelling are revolutionising the field of archaeology: they are used for detecting burial sites in satellite images, classifying ancient pottery fragments, identifying ancient artifacts, developing 3D digital reconstructions of historical sites, and identifying artifacts that are sold illegally on the web [17]. One study used predictive modelling in geochemistry to predict regional background soils [18]. In [19], a proof of concept (POC) was used to demonstrate the technical possibility and viability of classifying objects from a limited number of pages. Researchers introduced a novel machine learning approach to identify pottery fragments [20]. In addition, a proof of concept of deep learning methods was used to obtain archaeological information from historical maps more effectively. Another study implemented deep learning technology for predicting archaeological sites [21]. Researchers also deployed convolutional neural networks to identify tombs [22].
When archaeologists attempt to reconstruct damaged artifacts, they face three major challenges: abrasion, color fading, and continuity. Abrasion creates gaps between pieces, making it difficult to match adjacent fragments. Color fading makes it difficult to distinguish between gradients and the true edges of the fragmented image, and it frequently produces false edges. Finally, the undefined shape of the fragmented pieces creates continuity issues that are very difficult to resolve, because there may be one or more possible ways to arrange them [23]. To find the exact neighbouring pieces, the algorithm first scans the fragments in detail and extrapolates each of them, which then helps to predict the surrounding fragments. The special characteristics of this process are taken into consideration, such as color fading, the lengths of matched boundaries, spurious edges, gaps between pieces, and exact transformations. The algorithm computes the dissimilarity of each pair of fragments and produces confidence scores to obtain exact matches. This type of algorithm was tested on different real archaeological objects, including frescoes from churches and the British Museum, to reassemble broken artifacts. Another study used a multiautonomous underwater vehicle motion planning method for underwater archaeological sites [24]. Throughout this literature review, it is observed that burial sites and ancient texts have been identified through computer vision, 3D modelling, and machine learning approaches. No existing study was found on the present research topic using machine learning technology. This study has been conducted in order to go beyond the conventional techniques. The methods used for identifying ancient artifacts are described in the next section.

Materials and Methods
A classification approach is used in this study to classify the data. Classification is a predictive modelling problem that consists of two steps, namely, learning and prediction [25]. In the learning step, the classification model is developed from the given training data. In the prediction step, this model is used to predict the response for new data [26]. For this study, a decision tree algorithm is used to classify the cities by features such as ancient harbours, battles, temples, modern cities, and location in South China. The study used the Python programming language for implementing the machine learning algorithms.

Decision Tree Algorithm.
The decision tree algorithm is used for solving classification and regression problems. It belongs to the family of supervised learning algorithms. The objective of the decision tree is to develop a training model that can be used to predict the value or class of a target variable by learning decision rules obtained from training data. In this algorithm, the prediction of the class label starts from the root of the tree [27]. A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, such as resource costs, utility, and chance event outcomes. In classification trees, the target variable has discrete values, whereas in regression trees it has continuous values [28]. Each interior node corresponds to an input variable. The edges lead to child nodes and represent the possible values of that input variable. Each leaf indicates the value of the target variable given the values of the input variables along the path from the root to the leaf.
Figure 1 represents the decision tree used for this study. It indicates the classification process that took place in this study. By using this approach, we can identify the locations where ancient artifacts were found. In order to evaluate information gain, entropy and the Gini index are calculated. The decision tree algorithm uses information gain to split a node. Both of these functions are used to evaluate the impurity of a node.
3.1.1. Gini Index. In general, the Gini index is used as an impurity function. It can be used to determine the dispersion of the population, and it can be used for decision tree learning. It is determined by the following equation:
\[ \mathrm{Gini}(P) = 1 - \sum_{i=1}^{n} p_i^{2}, \tag{1} \]
where \(P = (p_1, \dots, p_n)\), \(p_i\) denotes the probability of an element being classified into class \(i\), and \(n\) represents the number of classes. \(\mathrm{Gini}(P)\) is zero if one \(p_i = 1\) and all the others are zero, and it is greatest when all \(p_i\) are equal. It evaluates the expected error if we select a single record at random and use its Y value as the predictor. Equation (1) gives the Gini values for the given decision tree.
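The Gini index described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the class probabilities are estimated from label counts; the function name `gini_index` is hypothetical and not taken from the study's code.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity 1 - sum(p_i^2), with p_i estimated from label counts."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())
```

A node containing a single class gives 0, and a two-class node split evenly gives 0.5, the largest value possible for two classes.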
3.1.2. Entropy Function. The decision tree uses a top-down approach, in which the tree is built from a root node and the data are divided into subsets with similar values. In this case, the decision tree algorithm measures the sample's homogeneity using entropy. If the sample is homogeneous, the entropy is zero.
To create a decision tree, the entropy needs to be calculated. In this case, the entropy from the frequency table of a single attribute is computed by using the equation below:
\[ E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i. \tag{2} \]
In the above equation, \(S\) represents the split data, \(p_i\) denotes the probability of class \(i\) in \(S\), and \(c\) is the number of classes. In this study, the class label takes values in \(\{0, 1\}\).
The information gain is the decrease in entropy obtained when the dataset is split on an attribute. The aim is to find the attributes that return the highest information gain. In order to evaluate the information gain accurately, entropy functions have been used in this study.
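The relationship between entropy and information gain can be sketched as follows. This is a minimal illustration, assuming base-2 logarithms and rows stored as dictionaries of feature values; the names `entropy` and `information_gain` and the feature name `harbour` in the usage note are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on feature.

    rows is a list of dicts mapping feature names to values.
    """
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature], []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted
```

For example, if a binary feature such as `harbour` separates the two classes perfectly, the gain equals the entropy of the labels; if the class mix is identical on both sides of the split, the gain is zero.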

Implementing Decision Tree.
This study uses decision tree and gradient boosting techniques to identify the location of ancient artifacts. Gradient boosting is a class of machine learning algorithms that can be used for classification and regression predictive modelling problems. It is also known as gradient tree boosting, and as stochastic gradient boosting when random sampling is used. The ensembles are created from decision tree models. Trees are added to the ensemble one at a time, and each is fit to correct the prediction errors made by the prior models. This form of ensemble machine learning is known as boosting. The models are fit with the help of a gradient descent optimization algorithm and an arbitrary differentiable loss function. The loss gradient is minimized as the model is fit, much as in a neural network. A weighted combination of classifiers can thus be optimized by gradient descent in function space [28]. There are some improvements to basic gradient boosting that can enhance performance, as follows:
(i) Weighted Updates. This refers to the learning rate used to restrict how much each tree contributes to the ensemble.
(ii) Tree Constraints. This refers to the number of trees and the depth of the trees used in the ensemble.
(iii) Random Sampling. This refers to fitting trees on random subsets of samples and features [29].
When random sampling is used, the algorithm is called stochastic gradient boosting. Instead of the full sample, a randomly selected subsample is used to fit the base learner. Gradient boosting is an efficient machine learning algorithm that is often used to win machine learning competitions on tabular and similarly structured datasets [30].
In order to create a gradient boosting model, the weak models should be combined optimally. The steps of gradient boosting are as follows:
(1) With the help of a data sample, we train a weak model.
(2) We increase the weights of the samples that were misclassified in step 1 and reduce the weights of the samples that were classified correctly.
(3) Using the new weight distribution of the samples from step 2, we train the next weak model.
Gradient boosting thus concentrates on the data that were difficult to learn in previous rounds, and the result is an ensemble of models. In gradient boosting, the training of the model is performed by a function called trainingModel().
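The idea of adding weak models one at a time, each correcting the errors of the ensemble so far, can be sketched as follows. This is a minimal illustration of the residual-fitting form of gradient boosting under a squared-error loss, assuming binary (0/1) features and depth-one trees (stumps) as weak models. All function names are hypothetical; this is not the study's trainingModel() implementation.

```python
def fit_stump(rows, residuals):
    """Choose the binary feature whose split gives the lowest squared error."""
    best = None
    for j in range(len(rows[0])):
        left = [r for x, r in zip(rows, residuals) if x[j] == 0]
        right = [r for x, r in zip(rows, residuals) if x[j] == 1]
        if not left or not right:
            continue  # this feature does not split the data
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((r - mean_l) ** 2 for r in left)
        sse += sum((r - mean_r) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, j, mean_l, mean_r)
    if best is None:
        # No usable split: fall back to predicting the mean residual.
        mean = sum(residuals) / len(residuals)
        return (0, mean, mean)
    return best[1:]


def stump_predict(stump, x):
    """Return the stump's value for one feature vector."""
    j, mean_l, mean_r = stump
    return mean_r if x[j] == 1 else mean_l


def gradient_boost(rows, y, n_rounds=20, alpha=0.5):
    """Fit stumps to the residuals y - y_hat, one round at a time."""
    base = sum(y) / len(y)
    y_hat = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, y_hat)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        # Move each prediction a step of size alpha toward the target.
        y_hat = [pi + alpha * stump_predict(stump, x)
                 for pi, x in zip(y_hat, rows)]
    return base, alpha, stumps


def boosted_predict(model, x):
    """Base value plus the alpha-weighted sum of all stump outputs."""
    base, alpha, stumps = model
    return base + alpha * sum(stump_predict(s, x) for s in stumps)
```

With a learning rate `alpha` below 1, each round removes only part of the remaining error, which is the role of the weighted updates mentioned earlier.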
Algorithm 1 has been implemented in the program to obtain the prediction results. This algorithm is used for computing the output \(\hat{y}\).

Algorithm 1
1. Define a set of predictions \(\hat{y}[i]\).
2. The error in the prediction is \(J(y, \hat{y})\). For MSE: \(J(\cdot) = \sum_i (y[i] - \hat{y}[i])^2\).
3. We can adjust \(\hat{y}\) to try to minimize the error: \(\hat{y}[i] = \hat{y}[i] + \alpha f[i]\); for MSE, \(f[i] \approx -\nabla J(y, \hat{y}) \propto (y[i] - \hat{y}[i])\).
4. Each learner estimates the gradient of the loss, \(f'_n\).
5. Gradient descent takes a sequence of steps to minimize \(J\).
6. The sum of predictors is weighted by the step size \(\alpha\).

[31]. The Shaolin temple has been depicted in TV series and movies. Historical sites include the Potala Palace and its Buddhist temples and monasteries. The imperial garden city has gardens that are hundreds of years old, such as the Humble Administrator's Garden. Macau was one of the first cities in contact with the West during the age of exploration. Considering all these historic places, the study focused on major cities in China. For this study, eleven cities in China have been chosen to analyze ancient artifacts found in the country. Locations such as Beijing, Wuhan, Nanjing, Yunnan, Yangtze, Canton, the Han dynasty, Great Yunnan, and so on have been taken for this study. In total, 23 provinces and their ancient artifact cities have been taken for analysis. In order to identify each city with its location, we used the latitude, longitude, and altitude of the mentioned cities. These data were obtained from the Google search engine. The cities are classified based on the significant features. The first feature is "temple," which indicates that the location is near an ancient Chinese temple. These temples may be different from the Shaolin or Buddhist temples in Beijing. In the ancient period, temples attracted commerce and pilgrims, so proximity to temples suggests that archaeological findings may be important. The second feature is "battle," which indicates that the location is close to an ancient battle site. Several wars were fought in China. The third feature is "ancient harbour," indicating that the location is close to ancient harbours. Some ancient harbours have been identified from ancient literature. The landscape of ancient cities consisted of land used for hunting, commerce, farming, and mining. Because these areas around cities saw human activity, it is fair to expect that humans left traces. The fourth feature is "South China," indicating that the location is in South China. South China has a large number of artifacts and cultural heritage sites that are mentioned in ancient literature. The final feature is "modern city," indicating that the location is close to a modern city. Using these features, we analyze the data with decision tree and gradient boosting algorithms.
3.4. Analyzing the Data. Because all of the features are binary, the mean of each column describes the data distribution. The mean for the feature "near South China" is 0.82, indicating that 82% of the data points are located near South China. Because this proportion is so large, the influence of this feature is limited. In addition, other features such as battles, temples, and modern cities are distributed evenly.
The classification column has a mean of 0.54, indicating that positive and negative training examples are roughly equally distributed. Following that, it is important to compare the means of the features against the classification column. In this way, we can see how each significant feature differs between the two classes. If the classification column is 0, no object was found. If the value is 1, an object has been discovered. It is critical to remove any unnecessary columns from the data frame.
The "harbour nearby" feature and the "South China nearby" feature are shown in Figure 2 as essential features for the algorithm. We can see from this graph that the number of archaeological sites or artifacts is high in the harbour and South China areas. Nonartifact locations are closer to battles, modern cities, and temples.
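The comparison of feature means against the classification column can be sketched as follows. This is a minimal illustration in plain Python; the rows in `sample_rows` are hypothetical values made up for the example and are not the study's data, and the function names are assumptions.

```python
def column_means(rows, columns):
    """Mean of each named column; for 0/1 columns this is the share of ones."""
    return {col: sum(row[col] for row in rows) / len(rows) for col in columns}


def means_by_class(rows, columns, target="classification"):
    """Column means computed separately for each value of the target column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[target], []).append(row)
    return {label: column_means(group, columns)
            for label, group in groups.items()}


# Hypothetical rows for illustration only (not the study's data).
sample_rows = [
    {"harbour": 1, "south_china": 1, "classification": 1},
    {"harbour": 1, "south_china": 1, "classification": 1},
    {"harbour": 0, "south_china": 1, "classification": 0},
    {"harbour": 0, "south_china": 0, "classification": 0},
]
features = ["harbour", "south_china"]
overall = column_means(sample_rows, features + ["classification"])
per_class = means_by_class(sample_rows, features)
```

A feature whose mean differs strongly between the class 1 and class 0 groups is a candidate for an informative split.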

Results
The information gain has now been calculated using the decision tree algorithm. Following classification, the model must be trained with gradient boosting. Gradient boosting combines several weak predictive models into an ensemble [32]. The existing logistic regression model was unsuitable for assessing the study's accuracy. Mean squared error (MSE) is used in decision trees to divide a node into two or more subnodes. For each subset, the MSE is computed [33,34]. If the MSE value is zero, the tree will overfit on other data. If the dataset is large, the MSE value will rise. Figure 3 depicts the MSE decision tree construction. In this case, only three samples were taken to build the tree. The data are divided into two subsets, and the division continues until the MSE becomes small. The tree prefers the split that results in the smallest MSE. To construct the tree, the best split is found by trying every variable and every possible value of that variable. Splitting stops when a stopping condition is reached. If a leaf node contains only one sample, no further split is possible; the MSE will then be zero, but the tree will overfit on other datasets. We must assess the accuracy of the tree once it has been built. Furthermore, the results are significant when compared to other methods. The results of the logistic regression algorithm show that its hyperplane is plotted below; as a result, it is inappropriate for this dataset. The hyperplane will be above if a decision tree-based gradient boosting algorithm is used. We showed mathematically that the value of information gain with entropy is 98%, so the problem is underfitting. Additionally, the precision and recall metrics have been assessed. Precision is the percentage of the returned results that are relevant. Recall is the percentage of all relevant results that the algorithm correctly classified [35].
Entropy is then calculated using Equation (2). In general, accuracy is heavily dependent on the dataset. Table 1 shows the ten samples that were used to calculate entropy.
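Precision and recall as described here can be computed directly from true and predicted labels. The sketch below is a minimal illustration, assuming binary labels where 1 means an artifact was found; the function name `precision_recall` is hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when there are no positive predictions
    # or no positive examples.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Precision falls when the model flags sites where nothing is found, while recall falls when it misses sites that do contain artifacts.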
To compute the entropy, we take ten sample data points.
Here, 7 and 3 are the numbers of locations where artifacts were found and not found, respectively, and the entropy is calculated over the yes and no classes. The value reported in Equation (5) is 98.0%; hence, the entropy function for the given dataset is 98%. It is predicted that the accuracy of the result will be the same.
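As a worked check of Equation (2), a minimal calculation for a split of 7 found and 3 not found out of ten samples is given here, assuming the base-2 logarithm. With these counts the formula evaluates to about 0.881; a value near 0.98 would correspond to a more balanced split of roughly 58% to 42%.

```latex
\begin{align*}
E(S) &= -\tfrac{7}{10}\log_2\tfrac{7}{10} - \tfrac{3}{10}\log_2\tfrac{3}{10} \\
     &\approx 0.7 \times 0.5146 + 0.3 \times 1.7370 \\
     &\approx 0.360 + 0.521 \approx 0.881
\end{align*}
```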
From Table 2, the scores obtained on the training data and the test data were 0.75 and 0.98, respectively.
Figure 4 represents the precision and recall graph of the given data. It indicates that recall and precision are positively correlated, as the model predicts the right values. The algorithm's output is depicted in Figure 5. We use the SKLearn function "predict" in this case, which requires the features x as input and returns the output ŷ. The code receives the input features, and the model provides a prediction as output. If the search location is appropriate, it returns the phrase "this is a good site to investigate." According to the findings, the most important features for identifying ancient artifacts in China were South China and the harbour. We can see from the mean and classification that the means of South China and harbours were both high. As a result, there is a good chance that ancient artifacts will be discovered in these areas. This research model provides a good indicator for locating archaeological sites.
The prediction is defined as a response in the SKLearn function and can be written as ŷ. In terms of output, the model provides accurate prediction results. We use the ROC (receiver operating characteristic) curve, a metric that compares true positives to false positives, to assess the quality of the model. The code retrieves the data and converts it to a data frame. In this study, we develop a gradient boosting classifier to predict whether or not a site is suitable for archaeological investigation. The data frame is used as input to the training model, with the X values as features and the Y values as the target. It is critical in this case to train the gradient boosting classifier from the SKLearn library, which provides its own implementation of the gradient boosting algorithm. We must compute the ROC value in order to evaluate the false and true positive rates. The ROC curve helps in determining the performance of a machine learning classifier. Feature importance is significant in this study, and the features (X) are critical for predicting the outcome. The analysis gives the true and false positive rates of gradient boosting. The accuracy of gradient boosting increases with the size of the dataset. If the AUC (area under the curve) is 1, the classifier can correctly distinguish between all positive and negative classes. If the value is 0, the classifier predicts all positives to be negatives and all negatives to be positives. In this case, the AUC is 0.84, indicating that the classifier can effectively perform its function. Furthermore, the ROC curve rate is 0.98, indicating that the model works well in predicting and evaluating accuracy.
The ROC curve of the chosen algorithms is depicted in Figure 6. The area under the ROC curve is 0.98, indicating that this model is performing extremely well. It is critical to evaluate the individual characteristics. Also, the AUC of 0.84 indicates that the decision tree is considerably effective. Once the gradient boosting model has been trained, new predictions are required. The blue line in this graph represents the difference between true positive and false positive rates, while the green line represents the difference between false negative and true negative rates. The results show that there is a high correlation between the true positive and false positive rates, indicating that the model correctly predicts values.
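The meaning of the AUC values discussed here can be illustrated with a small sketch. It uses the pairwise-ranking definition of the area under the ROC curve, namely the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` is hypothetical, and this is not the scikit-learn routine used in the study.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))
```

A score of 1 means every positive is ranked above every negative, 0 means the ranking is completely reversed, and 0.5 corresponds to random ordering.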

Conclusion
Ancient artifacts are significant for learning about ancient people and their culture. Researchers also track the lifestyles and practices of more recent periods. Archaeologists are involved in the identification and exploration of relics and artifacts for future generations. Identification methods for ancient artifacts are crucial for scholars and archaeologists as they adopt modern methods. The development of technology has paved the way for exploring ancient artifacts in a simpler manner. This study focused on identifying ancient artifacts using machine learning algorithms. It demonstrated the implementation of decision tree and gradient boosting algorithms in classifying cities around China. The study found that most of the archaeological sites were located near ancient harbours and in South China. The results implied that these algorithms achieved 98% accuracy. For future research, it is suggested to implement artificial neural networks to predict potential artifact sites using satellite images.

Figure 2: Means of features to identify the ancient object.

Figure 3: Decision tree construction with mean squared error.
For over a thousand years, Beijing has been the greatest historic city in China. The Forbidden City has been the imperial palace complex for the last six hundred years, and the Great Wall of China lies nearby. Xian is the historical city that has the most ancient atmosphere and character. Nanjing was one of the capitals of the Jin and other dynasties as well as of the modern Republic of China. Luoyang was the major capital of ancient China, where different dynasties, from the Eastern Zhou to the Han, ruled. The Longmen Grottoes are remnants of the imperial era, where a number of historical and Buddhist figures were carved. Hangzhou is the imperial landscaped garden and retreat for Beijing emperors. Anyang is one of the ancient capitals of China and the earliest known capital. Anyang was situated between the middle reaches of the Yangtze and the lower reaches of the Yellow River. At the Shang dynasty ruins, artifacts from the excavations are displayed in the museum in the park area. Zhengzhou was the capital of the Shang dynasty

Table 1: Sample data for entropy calculation.