Analysis of the Application of Deep Learning in Model Reconstruction of Ancient Buildings

With the rapid development of interactive 3D graphics technology and the growing demand for virtual reality, digital urbanization, and digital cultural heritage protection, time-consuming and inefficient manual building modeling methods fall far short of the market's need for fast, intelligent modeling, and automatic architectural modeling methods are receiving more and more attention. Architectural modeling is an application-oriented, comprehensive research field whose methods span many technical fields and disciplines, depending on the application scenario. This paper introduces a method of modeling ancient buildings using depth image estimation, spherical projection mapping, a 3D generative adversarial network, and other techniques. First, the characteristics of architectural modeling methods are discussed from different disciplinary and technical perspectives. Second, the three major schools of architectural modeling technology, namely procedural modeling, image-based modeling, and point cloud-based modeling, are summarized in detail, together with inverse procedural modeling, which has attracted much attention and posed many challenges in recent years. Then, the problems and challenges of building modeling technology are analyzed, and future development trends are predicted.


Introduction
Ancient architecture is one of the most important intangible cultural heritages in China, reflecting traditional culture through architectural design. Although some ancient buildings are still preserved today, most have been seriously damaged. With marketization, urbanization, and modernization, some ancient buildings in city centers have had to be demolished or rebuilt, and some on the edges of cities have lost their former splendor with the passage of time. In the course of urbanization, large-scale real estate and commercial development has damaged ancient buildings. To resolve the conflict between historical culture and social modernization, the relevant units have begun to pay attention to the transformation of ancient building forms.
However, traditional model reconstruction technology has low precision, so the resulting models struggle to reflect the specific condition of ancient buildings accurately [1].
In recent years, researchers in the field of computer vision and machine learning have made remarkable progress in the reconstruction of ancient architectural models [2]. Digital technologies keep emerging, developing and improving the means of preserving ancient architectural information.
The main digital technologies available for the restoration of ancient buildings are shown in Figure 1. According to the reconstruction method, they are divided into traditional reconstruction and deep learning-based reconstruction.
The software package has rich 3D image processing capabilities and can easily draw 3D images. However, because this approach is fixed in the model design phase, it is difficult to generalize to other types of objects, and the resulting model varies only within the specific type. In research on ancient architecture modeling, there are more and more ways to reconstruct from a traditional single image. On this basis, a deep convolutional network is used to extract depth data, a bivariate distribution over two-dimensional volume elements is used for three-dimensional geometric modeling, and the shape type is predicted. Some results have also been obtained for 3D reconstruction using unknown voxels.
Because the model is high-dimensional, it is difficult for the network to learn useful information during training [3]. Most previous studies used different loss functions and added prior knowledge to the shape prediction, or used additional supervised training, to bring the predicted ancient building model closer to the actual shape. Figure 1 shows the results of training and testing the two models. The results show that the 3D models produced by both are good when the input image belongs to the seat class. However, when the input image does not come from the seat training set, neither algorithm gives a correct prediction; both only return results similar to the training data. In other words, current 3D reconstruction techniques rely on the 3D data they were trained on and select the 3D model in the dataset nearest to the input image, which gives poor generalization performance.
Based on the above research, we find that most deep learning-based reconstruction methods generalize poorly.
To address this problem, this paper proposes a method to reconstruct an ancient architecture model by depth image estimation, spherical projection mapping, and a 3D generative adversarial network [4]. The algorithm model has the following characteristics: (1) A modular design forces each module to model the shape from the input itself instead of simply memorizing the shapes in the training dataset; the prediction of each module has the same size as the input image, so a more regular mapping can be obtained. (2) Because input images often have low resolution, existing 3D reconstruction methods have difficulty reconstructing a 3D model with high accuracy. The algorithm model established in this paper addresses the low accuracy of ancient building models by introducing a super-resolution module, which has important practical significance for the reconstruction and protection of ancient buildings.

Overview of the Status Quo of Reconstruction Technology of Ancient Architectural Models.
Due to the lack of construction drawings of ancient buildings, BIM-based digital protection of ancient buildings faces great difficulties [5]. The combination of modern surveying and mapping technology with BIM technology is a comprehensive method for the digital protection of ancient building information. It provides a new way to reconstruct 3D models in the digital protection of cultural relics and historic sites.
In the field of ancient architecture, parametric design has been studied from the perspectives of the parametric information model, parametric structure library, parametric software platform, and so on, and some research results have been obtained. It has been widely used in cultural relic protection, archaeological surveying and mapping, reverse engineering, and so on. Wang Jianmin et al. took the Flying Bridge Across Fish Ponds at Jinci Temple, Shanxi Province, as the research object and obtained detailed data through field investigation, mapping, laser point cloud surveying, and other methods. Revit was then used to restore the structure of the bridge, and the result was used as a family library of ancient architectural components, so as to maximize its value for ancient architecture.

BIM Technology.
Due to the development of information technology, the traditional CAD-based production method can no longer meet the requirements of informatization. BIM technology establishes and uses the business model and data model of an enterprise to provide information for the design, construction, operation, and management of engineering projects, so as to achieve integration, sharing, and collaboration. BIM, also known as the "building information model," describes buildings through object-oriented three-dimensional models and can express content that is otherwise difficult to visualize in architectural engineering. It presents all the information of a design project in digital form and supports engineering quality management through a virtual model. In this mode, project managers can share and analyze project information at different stages.

Three-Dimensional Laser Scanning Technology.
3D laser scanning technology has in recent years found new applications in geology, mining, aerospace, and other high-tech fields. In architectural design, it is used to obtain point cloud data, construct the basic environmental model of the site, and complement the design. As a measurement technology, 3D laser scanning overcomes the low accuracy of conventional GPS single-point measurement. The main research contents of ancient building model reconstruction based on BIM technology and 3D laser scanning are given in Table 1.

Classification of the 3D Laser Scanning System.
The 3D laser scanner is a high-technology measuring instrument [6]. To meet different user needs, scanning equipment with a variety of functional characteristics has come onto the market, and users must select an appropriate scanner according to engineering needs and actual conditions. This article therefore summarizes the categories of 3D laser scanners, which can be classified in several ways.
(1) According to the scanning distance, scanners are divided into short-range, medium-range, and long-range systems.
(2) According to the mode of operation, they are divided into laser scanning systems, ground 3D laser scanning systems, and handheld 3D laser scanners.
(3) According to the ranging principle, they are divided into triangulation laser scanning systems, pulse ranging laser scanning systems, phase ranging laser scanning systems, and combined pulse-phase ranging systems [7].

State of the Art
The parameterized 3D component library enables effective management of model components and sharing of information resources. The characteristics of BIM technology, such as informatization, integration, and visualization, make parameterized design possible. On this basis, the models in the component library are modified and the related data are updated and improved, so that the component library can be called from the parameterized information model platform. The system not only creates favorable conditions for the automatic assembly of ancient building models but also improves the protection of cultural relics, laying a foundation for the implementation of digital protection technology.

Network Model Structure.
By learning a suitable function, a reconstruction algorithm can map 2D images to 3D shapes [8]. To address the poor generalization ability of current 3D reconstruction technology, an improved algorithm model based on the MarrNet model is proposed. Regularization parameters are introduced into the discriminator module to improve the generalization ability of the model.

Depth Image Estimator.
This section estimates the depth image of a single image using an encoder-decoder (autoencoder, AE) network structure, a type of neural network trained to learn a mapping that minimizes the difference between its input and output. The depth estimator module consists of a super-resolution reconstruction network followed by the encoder-decoder structure.
Through the encoding and decoding process, the model learns the distribution and features of the data, captures the effective features of the image, and extracts a better depth image.
(1) Super-Resolution Reconstruction Module. This paper carries out 3D modeling from images of ancient architecture. External factors such as poor focus and camera shake make the images insufficiently sharp, which degrades feature extraction and results in poor 3D reconstruction. Adding a super-resolution module at the front end of the network is a good solution. Some scholars have previously compared the characteristics of different network models and found that the ESRGAN network obtains the minimum perceptual index (PI) value. This is shown in Figure 2.
Unlike other super-resolution modules, ESRGAN removes the batch normalization (BN) layers from the residual block. Figure 3 compares the traditional residual block with the residual block in ESRGAN.
(2) Encoder-Decoder. ResNet solves the problem of performance degradation as network depth increases and makes complex feature extraction and classification feasible.
Based on ResNet, this paper introduces a nested U-shaped network structure (nested U-Net architecture, U-Net++) in the middle of the network. Its additional skip connections combine image information better for segmentation [9].
Equation (1) describes the network structure, where H(·) denotes a convolution followed by an activation function, U(·) denotes an upsampling layer, and [·] denotes a concatenation layer. For example, $x^{1,2}$ is obtained by concatenating $x^{1,0}$, $x^{1,1}$, and the upsampled $x^{2,1}$ and then applying convolution and rectified linear unit (ReLU) activation.
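Equation (1) itself did not survive in this copy. As a hedged reconstruction, the node definition from the original U-Net++ formulation matches the symbols H, U, and [ ] described above; the indices i (depth along the encoder) and j (position along the skip pathway) are assumptions taken from that formulation rather than from this paper.

```latex
x^{i,j} =
\begin{cases}
H\left(x^{i-1,j}\right), & j = 0 \\
H\left(\left[\left[x^{i,k}\right]_{k=0}^{j-1},\; U\left(x^{i+1,j-1}\right)\right]\right), & j > 0
\end{cases}
```

With i = 1 and j = 2, this gives $x^{1,2} = H([x^{1,0}, x^{1,1}, U(x^{2,1})])$, which is the example mentioned in the text.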

Spherical Projection Module.
To recover the surface shape of a 3D object, a depth map-based method is adopted that obtains the surface through a projection map. Using the camera parameters, the depth information is transformed into a point cloud, and the surface information is then derived using a stereo geometry method. Assuming P(x, y, z) is any point in a cube, the function value at this point is obtained by linear interpolation, as given in formula (2), where the coefficients $a_i$ ($i = 0, 1, \ldots, 7$) are the function values at the eight vertices of the cube. If the isosurface threshold is C, combining it with formula (2) gives the system of equations (3), from which the isosurface and its intersection with the boundary of the cube can be calculated. The obtained surface information is then projected toward the center of the unit sphere along each U and V coordinate to generate a spherical representation. The whole process is not differentiable.
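Formulas (2) and (3) are not preserved in this copy, so the following is only a minimal Python sketch of the interpolation step, assuming the usual trilinear form in which each of the eight vertex values is weighted by the product of per-axis linear weights. The function names `trilinear` and `edge_crossing` and the unit-cube parameterization are illustrative assumptions, not the authors' implementation.

```python
from itertools import product


def trilinear(corner_values, x, y, z):
    """Interpolate a scalar field inside the unit cube.

    corner_values maps each corner (i, j, k), with i, j, k in {0, 1},
    to the field value at that corner. (x, y, z) must lie in [0, 1]^3.
    """
    total = 0.0
    for i, j, k in product((0, 1), repeat=3):
        # The weight of a corner is the product of per-axis linear weights.
        wx = x if i else 1.0 - x
        wy = y if j else 1.0 - y
        wz = z if k else 1.0 - z
        total += corner_values[(i, j, k)] * wx * wy * wz
    return total


def edge_crossing(v0, v1, c):
    """Parameter t in [0, 1] where the value along a cube edge equals c.

    v0 and v1 are the field values at the two ends of the edge.
    Returns None when the edge does not cross the isosurface.
    """
    if v0 == v1 or (v0 - c) * (v1 - c) > 0:
        return None
    return (c - v0) / (v1 - v0)
```

Along a single cube edge the interpolation is linear, which is why the isosurface crossing for threshold C reduces to the one-line solve in `edge_crossing`.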

Identification Module.
Because of the limitations of spherical projection, surface information is lost where the object occludes itself, and a 3D model generated from the depth image alone loses a lot of detail.
This problem can be solved well by introducing an identification module (discriminator) into the 3D generative adversarial network [10]. Equation (4) is the loss function of the discriminator. In this loss function, $P_g$ and $P_x$ represent the three-dimensional shapes generated from different image datasets, $P_r$ represents the corresponding actual three-dimensional shapes, D is the discriminator, and λ is the coefficient of the penalty term. Training with this loss function lets the identification module yield more detailed reconstruction results.
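Equation (4) is missing from this copy. The symbols described above ($P_g$, $P_x$, $P_r$, D, and a penalty weighted by λ) match the Wasserstein discriminator loss with gradient penalty, so one plausible reconstruction, offered as an assumption rather than as the authors' exact formula, is:

```latex
L_{D} = \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]
      - \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
      + \lambda \, \mathbb{E}_{\hat{x} \sim P_x}\left[ \left( \left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - 1 \right)^2 \right]
```

In this reading, $P_x$ is the distribution of samples at which the gradient penalty is evaluated; the paper describes $P_x$ only loosely, so that interpretation should be treated with caution.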

Ancient Building Model Reconstruction Based on BIM Technology and 3D Laser Scanning.
The flowchart for reconstructing the ancient building model based on BIM technology and 3D laser scanning is shown in Figure 4. Following Figure 4, the reconstruction is realized in three steps.

Three-Dimensional Modeling of Ancient Buildings Based on BIM Technology.
BIM technology is adopted in this paper for the first step [11]. It is used to carry out a preliminary restoration of the ancient building and is applied to the preliminary design and the construction effect drawings.

Using 3D Laser Scanning to Obtain 3D Point Cloud Data of the Model.
On the basis of the BIM model, 3D point cloud data are obtained with a 3D laser scanner, which further improves the accuracy of modeling. The ancient building is scanned by field measurement, noise is eliminated, and accurate three-dimensional point cloud data are obtained.
This process can be represented by a formula. Let the objective function be $T_{edge}$, as given in formula (5), where $d_{gray}$ is the similarity used for filtering high-density mixed noise and $d_{grad}$ is the distance of the noise gradient. Using formula (5), noise reduction is carried out on the scanned points of the ancient building. In actual scanning, because the scanning angle is limited, the building must be scanned from multiple angles to ensure that a complete set of 3D point cloud data is obtained.

Reconstruction of Ancient Building Models.
The 3D point cloud data obtained above are imported into the BIM model, and details such as model texture, murals, and carvings are rendered with UV mapping technology [12]. The combination of BIM technology and the 3D laser scanner achieves a realistic design effect and gives the designer a sense of immersion. At the same time, designers can use CAD spatial tools to retrofit ancient buildings that have suffered damage.

Key Technologies.
The main technologies involved in this paper are virtual environment construction technology, role control technology, and collision detection technology. The main architecture of the system is built on these three technologies [13].

Virtual Environment Construction Technology.
The so-called virtual environment is a three-dimensional model that is built and run in the Unity3D development software.
There are two methods to establish a virtual environment. One is to build the 3D model in modeling software such as 3DS MAX and import it into Unity3D in FBX format. The other is to model directly inside Unity3D. This chapter uses the first method, modeling in 3DS MAX and exporting to Unity3D in FBX format. The process is more complex, but it is crucial to the accuracy of the 3D model: as professional modeling software, 3DS MAX offers much higher modeling accuracy than Unity3D.

Role Control Technology.
A character controller in virtual reality is a virtual "person" that the user controls through input devices. The key to character control technology lies in the principles of physics and of visual imaging.
(1) Principles of Physics. The basic principle is that the motion of a character must conform to Newtonian kinematics [14]. A first-person character controller is essentially a matter of camera placement, and it can be implemented with three scripts. The CharacterMotor.cs script initializes character parameters such as height, stride size, and movement speed. The mouselook.cs script provides free rotation of the viewing angle. The firstPersonController.cs script maps the W, A, S, and D keys to the character's movement.
(2) Principle of Visual Imaging. Image formation includes three steps: projection transformation, view clipping, and viewport transformation [15]. Projection transformation is of two kinds: orthogonal projection and perspective projection. View clipping automatically removes whatever lies outside the field of view (FOV). Viewport transformation converts the clipped stereoscopic image into a two-dimensional image and displays it on the computer screen.

Collision Detection Technology.
A slope limit is set as a collision trigger to simulate the real world [16]. With collision detection in place, the character can climb steps and slopes up to a certain limit; without it, the character may "walk through a wall" or even "fall through the ground." In Unity3D, rigid bodies and colliders assigned to characters and scenes are commonly used to detect whether a collision occurs. If a collision occurs, the character cannot move forward; for example, when the character collides with a building, it cannot pass through.
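The collision and slope behavior described above can be illustrated with a small, engine-independent Python sketch. It is not Unity3D code: the `Box` type, the function names, the y-up convention, and the 45-degree default limit are all assumptions chosen for illustration, and axis-aligned boxes stand in for whatever collider shapes the actual system uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box given by its min and max corners."""
    min_corner: tuple
    max_corner: tuple


def boxes_overlap(a, b):
    """True when two axis-aligned boxes intersect on every axis."""
    return all(
        a.min_corner[i] <= b.max_corner[i] and b.min_corner[i] <= a.max_corner[i]
        for i in range(3)
    )


def slope_angle_deg(surface_normal):
    """Angle between a surface and the horizontal plane, in degrees (y is up)."""
    nx, ny, nz = surface_normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    cos_up = max(-1.0, min(1.0, ny / length))
    return math.degrees(math.acos(cos_up))


def can_walk_on(surface_normal, slope_limit_deg=45.0):
    """The character may climb a surface only if it is no steeper than the limit."""
    return slope_angle_deg(surface_normal) <= slope_limit_deg


def try_move(character, step, obstacles):
    """Return the moved box, or the original box if the move would collide."""
    moved = Box(
        tuple(c + d for c, d in zip(character.min_corner, step)),
        tuple(c + d for c, d in zip(character.max_corner, step)),
    )
    if any(boxes_overlap(moved, o) for o in obstacles):
        return character  # blocked: the character cannot go forward
    return moved
```

Rejecting the whole step on overlap is the simplest policy that prevents "walking through a wall"; a real engine would typically slide the character along the obstacle instead.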

Experimental Preparation.
Based on BIM technology combined with 3D laser scanning technology, this paper models an ancient building [17]. The test target was a 158-year-old building that had undergone routine inspection, refurbishment, and repair over the years but was still damaged, so its model needed to be reconstructed. The test takes the accuracy of model reconstruction as the main index, because it reflects how completely the details of the ancient building are reproduced: the higher the reconstruction accuracy, the more complete the details.
First, BIM technology and 3D laser scanning technology were used to reconstruct the ancient building (the experimental group), and Matlab software was used to compute the reconstruction accuracy of the model. Conventional methods were then used to reconstruct the same building as the control group, and the two sets of results were compared.
Several main structures, namely the foundation, wall, column, beam, arch, and roof, were tested [18].

Experimental Results and Conclusions.
The comparison obtained from the experimental data is shown in Table 2.
As can be seen from Table 2, the reconstruction accuracy of the experimental group is more than 95%, while that of the control group is only 70%. The accuracy of the experimental model is thus at least 25 percentage points higher than that of the control group, which indicates high design integrity and an accurate reproduction of the ancient architectural model.

Depth Estimator.
As evaluation indexes, the two-dimensional intersection over union (IoU) and the mean square error (MSE) between the real image and the output image are used. They are calculated by formulas (6) and (7), where A represents the predicted image and B represents the real image. The higher the IoU and the lower the MSE, the better the prediction.
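The two formulas are not preserved in this copy. Assuming the standard definitions, IoU is the size of the intersection of A and B divided by the size of their union, and MSE is the mean of the squared per-pixel differences. A minimal pure-Python sketch follows; the function names and the list-of-rows image layout are illustrative assumptions.

```python
def iou_2d(pred_mask, true_mask):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match by convention here.
    return inter / union if union else 1.0


def mse(pred, true):
    """Mean squared error between two equally sized images (lists of rows)."""
    total = 0.0
    count = 0
    for pred_row, true_row in zip(pred, true):
        for p, t in zip(pred_row, true_row):
            total += (p - t) ** 2
            count += 1
    return total / count
```

For depth images, a binary mask for the IoU would first have to be derived from the depth values, for example by marking foreground pixels; how the paper does this is not stated.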

Index of 3D Reconstruction.
Equation (8) gives the voxel intersection over union between the reconstructed 3D voxel model and the actual 3D model, where i, j, and k denote the voxel position, I(·) is the indicator function, and t is the voxelization threshold.
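The equation is missing from this copy, so the sketch below assumes the common form of the voxel IoU: a predicted occupancy probability at position (i, j, k) counts as occupied when it exceeds the threshold t, and the IoU is the number of voxels occupied in both grids divided by the number occupied in either. The function name and the nested-list grid layout are illustrative assumptions.

```python
def voxel_iou(pred_prob, true_occ, t=0.5):
    """IoU between a predicted occupancy grid and a ground-truth grid.

    pred_prob: nested lists [i][j][k] of occupancy probabilities.
    true_occ:  nested lists [i][j][k] of 0/1 ground-truth occupancy.
    t:         voxelization threshold applied to the prediction.
    """
    inter = union = 0
    for pred_plane, true_plane in zip(pred_prob, true_occ):
        for pred_row, true_row in zip(pred_plane, true_plane):
            for p, y in zip(pred_row, true_row):
                p_on = p > t      # indicator I(p > t)
                y_on = bool(y)    # indicator I(y)
                inter += p_on and y_on
                union += p_on or y_on
    return inter / union if union else 1.0
```

Raising t makes the prediction sparser, so the reported IoU depends on the chosen threshold and should always be quoted together with it.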
Figure 4: Flowchart of reconstruction of the ancient building model.
Since neither depth images nor spherical maps can reflect the internal shape of objects, evaluating the reconstruction solely by IoU is one-sided, so the chamfer distance is taken as a second evaluation measure [19]. Equation (9) gives the chamfer distance, where $S_1$ and $S_2$ represent the predicted and actual three-dimensional point sets; it measures the mean distance between predicted points and actual points.
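Equation (9) is not preserved here. A minimal sketch of one common symmetric form of the chamfer distance is given below, assuming unsquared Euclidean distances averaged over each set; some papers use squared distances or different normalization, so the exact variant used by the authors is an open assumption. The brute-force search is quadratic in the number of points and is meant only to make the definition concrete.

```python
import math


def _mean_nearest(src, dst):
    """Average distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(s1, s2):
    """Symmetric chamfer distance between two non-empty 3D point sets."""
    return _mean_nearest(s1, s2) + _mean_nearest(s2, s1)
```

Because each point is matched only to its nearest neighbor, a prediction that resembles some training shape can score well without matching the true object, which is relevant to the AtlasNet comparison discussed later.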
To detect stable keypoints in scale space effectively, the difference-of-Gaussians (DoG) function is convolved with the image to construct a DoG pyramid. By examining how the image gray level varies across the group of DoG images, feature points are taken to be the points at which the variation is greatest [20]. Note that a DoG image expresses the contours and intensity changes of the target; it is obtained as the difference between two images smoothed by Gaussians of adjacent scales.
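The DoG expression itself is missing from this copy. In its standard scale-space form, which is assumed here, it can be written as follows; the symbols I (input image), G (Gaussian kernel), L (Gaussian-smoothed image), σ (scale), and k (constant ratio between adjacent scales) are the conventional ones and are not taken from this paper.

```latex
D(x, y, \sigma) = \bigl( G(x, y, k\sigma) - G(x, y, \sigma) \bigr) * I(x, y)
                = L(x, y, k\sigma) - L(x, y, \sigma),
\qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{x^{2} + y^{2}}{2\sigma^{2}} \right)
```

Here $*$ denotes convolution, so each DoG image is simply the difference of two neighboring levels of the Gaussian pyramid.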

Training Dataset.
This paper uses the ShapeNet dataset, which covers 55 common object categories and 51,300 3D models, for training and testing. To these data we added real depth images from several categories for depth estimation. For training, we picked three categories: cars, seats, and planes. The trained model is then evaluated on other categories.

Depth Estimator.
This paper tests images at resolutions of 72 and 300 across four categories of images. As can be seen from Table 3, at low resolution the IoU of this model is 0.189 higher than that of the MarrNet model. At high resolution, the depth images obtained by this method are also better than those of MarrNet. The experimental results show that the depth estimator module extracts depth more effectively. Figure 5 shows the MSE values and the prediction results for each category. Although the depth estimator is trained only on categories such as cars and planes, it extends to depth images of new categories: the MSE values on the new test categories and on the training categories do not differ significantly, which demonstrates the good generalization performance of the depth estimator.

3D Model Reconstruction.
To test the generalization performance of the model, only three categories (plane, car, and seat) are used for training. The final test has two parts: one evaluates the model on the training categories, and the other evaluates its generalization on categories outside the training set. Table 4 shows the chamfer distance on the training categories. As can be seen from the table, the chamfer distance of the AtlasNet model on the training categories (chair, car, and plane) is better than that of the model in this paper. However, judging from the actual reconstructed models, AtlasNet does not predict the real shape well.
Analysis of the chamfer distance (CD) values on the training set shows why: AtlasNet selects the 3D shape in the training library closest to the input image, and because CD measures the average distance from predicted points to actual points, such a retrieved shape scores well. Nevertheless, some of AtlasNet's predictions are inconsistent with reality and have no reference value.
On this basis, the proposed model is compared with the main existing models. As can be seen from Table 5, for categories outside the training set, the chamfer distance of the proposed model is better than that of the comparison models in every category. Analysis of the 3D shapes produced by each model shows that, outside the training set, the existing models cannot perform 3D reconstruction well: the reconstructed shapes differ greatly from the actual shapes, and generalization is poor. Compared with the existing models, this model reconstructs better. Although the details need further improvement, the 3D shapes it produces are the closest to reality, which shows that the model generalizes well.

Analysis and Design of the Virtual Reality System.
The system requirements can be divided into nonfunctional requirements and functional requirements. Nonfunctional requirements include usability requirements, performance requirements, operating environment requirements, and so on. Functional requirements are divided into those of visitors and those of management personnel. The functional requirements of the visitor mode are role control, map navigation, and information retrieval; those of the administrator mode are similar, with the addition of a real-time parametric model construction function.
Nonfunctional requirements:
(1) Performance: after loading a large amount of 3D model data, the system still responds smoothly.
(2) Ease of use: the user interface is attractive, concise, and simple to operate.
(3) Real-time interaction: the user's commands receive real-time feedback.
(4) Operating environment: once developed, the system can run on the PC, WebGL, Android, iOS, and other platforms, to meet the needs of different types of customers.
The functional requirements of the visitor mode are as follows:
(1) Virtual 3D scene: a virtual three-dimensional scene is created from the 3D model described in the previous chapter.
(2) Roaming role control: the character can be freely controlled to roam in the scene.
(3) Collision detection: when characters walk in the virtual scene, they collide with objects and cannot pass through walls, which enhances fidelity and the user's experience.
(4) Information query: the user can browse the introduction, dimensions, and other information of the relevant ancient buildings.
(5) Measurement: the size of any object can be measured and is not limited to the data entered by the administrator.
(6) Map guide: a minimap shows the user's position at the Chengyang Wind and Rain Bridge.
Functional requirements of the management mode: real-time parametric modeling includes the modification of

Conclusion
Single-view 3D reconstruction generalizes poorly, and when the resolution of the input image is low, a great deal of detail is lost in the reconstruction process.
Using the super-resolution module and the autoencoder network, the depth image can be extracted effectively. On this basis, spherical projection and identification modules are added to reduce the dependence on data and overcome the problem of overfitting. The experimental results show that, compared with current mainstream 3D model reconstruction algorithms, the proposed algorithm significantly improves detail reconstruction and generalization. Even at low resolution, the method still produces good depth images and realistic 3D reconstructions. However, because 3D model databases are incomplete and 3D models are very high-dimensional and therefore difficult to train on, the method has so far been trained and verified only on simple object categories. How to reconstruct a complex scene from a single image is an urgent problem that we need to explore further in the future. Finally, the model is verified on a concrete engineering example, which illustrates the feasibility of the scheme. Therefore, we have good reason to believe that the method proposed in this paper can be used to solve the reconstruction problem of traditional ancient architectural models.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.