Three-Dimensional Animation Space Design Based on Virtual Reality

3D animation stereo space design processes and analyzes collected target image information on a computer to finally obtain a 3D model that represents the corresponding structure. As a branch of computer vision, 3D animation stereo space design is an important part of the key technology of environment perception. What humans see of the three-dimensional world is a two-dimensional picture; combined with the visual mechanism and prior information about objects, this enables perception of the environment and brings convenience to life and production. Games, movies, virtual humans, and any product that uses 3D technology require the support of a 3D engine. The 3D engine encapsulates hardware operations and graphics algorithms and can manage a large number of texture and model resources to construct complex 3D scenes. To facilitate research related to virtual humans, this paper, through the study of graphics-related mathematics and engine architecture technologies, implements an intelligent virtual animation generation engine based on DirectX12, controls three-dimensional characters through related components, and tests and verifies the function of each module. Test cases were designed to exercise the various functions of the engine; the results show that the design goals set in the requirements analysis have been met and that a lightweight three-dimensional animation stereo space model has been realized.


Introduction
The emergence of digital technology has widened the expression space of animation art with its own advantages and has brought new opportunities to animation art and scene design. According to the current state of research on scene design, in the conception of animation scene design people have formed thinking methods such as establishing an overall modeling consciousness, grasping the theme, determining the tone, and exploring unique and appropriate modeling forms. In terms of the expression of space in the scene, there are specific methods for the elements and classification of animation scenes and for shaping the scene, such as using the sense of gravity, strengthening the depth of field, and using character scheduling. In traditional scene design, there is also a certain theoretical basis for light-and-shadow modeling and color creation in the scene. Light and shadow play an important role in animation film scene design, such as shaping space, creating atmosphere, and creating suspense. Nowadays, the design and study of furnishings and props are also an important aspect of animation scene design. In a film, they play an important role in explaining a character's identity, shaping the psychological space, portraying the characters, and setting off their emotions. With the rapid development of the animation industry, the term "scene design" is gradually becoming known, but judging from past development, film and television animation scene design has often been ignored by the animation industry and animation theory circles. From the perspective of professional training and theoretical research, it is basically in its infancy. For digital scene design, there is still room for further research on the characteristics and performance methods of scene design.
There are only a few books on scene design on the market, and they often discuss scene design from the perspective of two-dimensional animation or basic theory without touching on the characteristics of the digital level. This paper attempts to build a lightweight 3D rendering platform that can quickly combine intelligent algorithms with character control through an intelligent control framework, realizing an interactive 3D animation space based on a natural user interaction interface [1][2][3][4][5][6][7][8][9].

Related Works
Since the 1970s, a great deal of research on graphics and engines has been carried out abroad. After half a century of development, a number of excellent 3D software companies have been born, including Discreet (later merged into Autodesk), Epic Games, Unity Technologies, id Software, etc. These companies have played a great role in the development of computer graphics and the promotion of 3D engines and have produced rendering engines such as 3ds Max, Maya, Unreal, Unity, and Blender, which are widely used to this day. Later, more specialized 3D engines were born: ZBrush, developed by Pixologic, is a 3D engine for sculpting and modeling; Houdini, developed by Side Effects Software, focuses on special-effects design and terrain editing; Substance Painter and Substance Designer, developed by Allegorithmic, are mainly used for texture and material design; Marvelous Designer, developed by CLO Virtual Fashion, realizes clothing tailoring and cloth simulation; and Daz 3D, produced by Daz Productions, can edit high-quality character models. The current hot direction in the generation of 3D human animation is to reduce cost while increasing the fidelity of the animation. Abroad, Ryo et al. proposed a method to create a spatial 3D model using the Kinect sensor: the 3D information of the space is obtained from the Kinect sensor, and the 3D model is created by synthesizing the images with the obtained 3D information. Kunlin et al. parameterized a standard model according to anthropometric parameters; then the Kinect depth map of the human body model was optimized by processing and matching the point cloud data using the PCL library, and quick integration with the PCL library yields a realistic human model. A new method for 3D human modeling based on a single Kinect was obtained by using an iterative closest point algorithm to register the captured upper-body 3D point cloud data against standard reference human data. Zhang et al.
proposed a highly personalized human modeling and analysis method based on a single Kinect. First, a high-precision human point cloud based on a single Kinect is obtained; then, while ensuring the accuracy of the head of the point cloud, the point cloud information is preprocessed. Finally, by using hierarchical compactly supported radial basis functions (CS-RBFs), a 3D human body model is obtained by fitting the sampled point cloud to an existing human body. At present, there are few related papers on DirectX12 and Vulkan, though some articles clearly describe the modules of a 3D engine. Compared with the above research, this research leans more toward extending from 3D animation design, from the basic data structure design to the implementation of related technologies [10][11][12][13][14][15][16][17][18][19].

Key Technologies of 3D Animation Stereo Space
3.1. Scenario Management Technology. 3D characters and their environments can be collectively referred to as 3D scenes. The 3D engine needs to fully describe the scene and each object in it in order to integrate the various model data and present them on the computer screen. According to different application requirements, scene management technology in an engine can be divided into two categories: (1) Scene management based on space division: this technology includes a variety of space conversion and division algorithms, including the quadtree, octree, and BSP tree. The core is to establish a hierarchical structure according to the spatial distribution of objects; the nodes in the tree represent regions of space, and objects are stored on leaf nodes. This type of scene management mainly uses the tree structure to quickly locate scene objects and to optimize collision detection and view-frustum culling [20]. (2) Scene management based on parent-child relationships: this management technology is also called a "scene graph." The core is to establish a hierarchical tree structure based on the parent-child affiliation of objects. The nodes in the tree are objects with local transformation attributes, and the world transformation is obtained by passing the local transformation from parent to child. This type of scene management mainly records the direction of transformation transfer through a tree structure so as to perform overall transformations of multicomponent objects. In the space-partitioning technology, nonleaf nodes represent spatial regions and are mainly used for optimizing the efficiency of internal computations.
In the scene graph, nodes represent objects with local transformation properties, which is convenient for batch manipulation of scene objects. Testing in many 3D software packages shows that scene-graph technology is very suitable for developers creating and designing 3D scenes. Since this article focuses on the display of 3D characters and their environments, it will concentrate on the technical details of the latter.
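The parent-child transformation transfer described above can be sketched in a few lines. The following C++ snippet is illustrative only (the names `SceneNode`, `localPos`, and `worldPos` are not from the paper's engine), and it uses pure translations in place of full local transformation matrices:

```cpp
#include <cassert>
#include <vector>

// Minimal scene-graph sketch: each node stores a local translation
// (a full engine stores a 4x4 local matrix); the world transform is
// accumulated from the root down through the parent chain.
struct Vec3 { float x, y, z; };

struct SceneNode {
    Vec3 localPos{0, 0, 0};
    SceneNode* parent = nullptr;
    std::vector<SceneNode*> children;

    void addChild(SceneNode* c) { c->parent = this; children.push_back(c); }

    // World position = parent's world position + local offset.
    Vec3 worldPos() const {
        Vec3 p = parent ? parent->worldPos() : Vec3{0, 0, 0};
        return {p.x + localPos.x, p.y + localPos.y, p.z + localPos.z};
    }
};
```

Moving a parent node automatically moves every descendant, which is exactly the batch-manipulation property that makes scene graphs convenient for scene authoring.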

Character Animation Technology.
3D animation is one of the core technologies of virtual characters. Excellent animation performance can not only increase the authenticity of the characters but also enrich their interactivity. Animation technology has undergone a long evolution, but the most widely used animation technologies currently fall into the following two categories: (1) Vertex animation and vertex animation texture (VAT) technology: vertex animation directly records the spatial position of each vertex of the model mesh, which can simulate very complex animation scenarios. This technology is often used to simulate very realistic object systems such as cloth and fluid. However, early on, because of the large storage space required for vertex animation and the difficulty of blending and reuse, it was rarely used in real-time systems. With the development of GPU hardware, vertex animation texture (VAT) technology was born. For an animation with n vertices and a length of f frames, the coordinates of the vertices at each moment are stored in a texture of size n * f; the animation texture is sampled in the vertex shader, where the pose transformation of the vertices is performed. Animation texture technology is often used when thousands of identical animated objects appear in the picture and can greatly improve the efficiency of animation playback [21]. (2) Skeletal animation consists of two parts of data: the skinned mesh and the skeleton. The skinned mesh records the indices of the bones affecting each vertex, and the skeleton records the parent-child relationships between bones as well as the local coordinate transformation curve of each bone.
Through the local coordinate system transfer technique in the scene graph, the bone pose at a specific moment is calculated, and the bone animation is then blended according to the bone indices recorded by the vertices of the skinned mesh to obtain the final vertex coordinates, as shown in Figure 1.
In the bind pose, let the palm bone's world transformation matrix be WorldMat_Bind_Bone and the thumb mesh's world transformation be WorldMat_Bind_Finger; the offset matrix is then

OffsetMat = WorldMat_Bind_Finger * (WorldMat_Bind_Bone)^(-1).    (1)

The original coordinates of the mesh vertices are multiplied by the offset matrix to obtain the mesh vertex coordinates in the bone coordinate system and then multiplied by the skeletal animation matrix at a given time t to obtain the mesh vertex coordinates in the world coordinate system:

V_world = V_mesh * OffsetMat * WorldMat_Bone(t).    (2)
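The offset-matrix and skinning transforms described above can be illustrated concretely. In this sketch the bind transforms are restricted to pure translations so the arithmetic is easy to follow by hand; a real engine uses full 4x4 matrices with rotation and scale, and the names `offsetMat` and `skin` are illustrative, not the paper's API:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Stand-in for a world matrix, restricted to translation.
struct Translation {
    Vec3 t;
    Vec3 apply(Vec3 v) const { return {v.x + t.x, v.y + t.y, v.z + t.z}; }
    Translation inverse() const { return {{-t.x, -t.y, -t.z}}; }
    Translation then(Translation o) const {   // this followed by o
        return {{t.x + o.t.x, t.y + o.t.y, t.z + o.t.z}};
    }
};

// OffsetMat = WorldMat_Bind_Finger * (WorldMat_Bind_Bone)^-1:
// moves a mesh vertex from its local space into the bone's space.
Translation offsetMat(Translation bindFinger, Translation bindBone) {
    return bindFinger.then(bindBone.inverse());
}

// V_world = V_mesh * OffsetMat * WorldMat_Bone(t):
// into bone space first, then through the animated bone transform.
Vec3 skin(Vec3 vMesh, Translation offset, Translation boneNow) {
    return boneNow.apply(offset.apply(vMesh));
}
```

For example, with the palm bone bound at (1, 0, 0) and the thumb mesh bound at (2, 0, 0), the offset is a translation of (1, 0, 0); when the bone animates to (1, 1, 0), the mesh origin lands at (2, 1, 0), i.e. it follows the bone while keeping its bind-pose offset.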

Rendering Technology.
The quality and efficiency of 3D rendering are closely related to the effect and fluency of character display. When choosing a rendering technology, it is necessary to satisfy complex materials and rich picture content while also maintaining the frame rate as much as possible in extreme cases. Today's rendering technologies fall mainly into two categories: (1) Ray tracing/path tracing: the core idea of ray tracing is to shoot multiple rays from the camera toward the screen pixels, use scene management based on space division to accelerate intersection and reflection tests against scene objects, and compute the contributions of direct and indirect light sources to each pixel to obtain its actual brightness. (2) Rasterization: rasterization maps the triangles of all mesh models to screen space through perspective projection, disassembles them into pixels, and obtains the actual brightness of each pixel according to the scene lighting information. Ray tracing is a rendering framework that can solve global illumination and can produce quite realistic lighting effects, but because it requires a large number of ray calculations, it is often only suitable for offline rendering. For real-time rendering, most engines still use a rasterization architecture. To improve the real-time performance of interactive characters, this article also uses rasterization as the rendering core.
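The per-pixel shading step of rasterization can be illustrated with the textbook Lambert diffuse term, where brightness is proportional to max(0, N·L). This is a standard formula used here for illustration, not the paper's actual shader code:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Diffuse brightness for a pixel with surface normal n lit from direction l:
// proportional to the cosine of the angle between them, clamped at zero
// so back-facing light contributes nothing.
float lambert(Vec3 n, Vec3 l) {
    float d = dot(normalize(n), normalize(l));
    return d > 0.0f ? d : 0.0f;
}
```

In a real rasterizer this computation runs in the pixel shader after the triangle has been projected and broken into fragments.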
According to the different processes of rasterization rendering, it can be divided into the following two categories: forward rendering and deferred rendering.

Interactive Technology.
Interactive intelligent characters are often able to respond to external input, as shown in Figure 2. The technology by which machines process external input and feed back responses is called human-computer interaction. Human-computer interaction technology can be divided into three categories according to the development period.

In this working mode, the component data pointed to by a pointer is discrete: when the System operates on a large number of entities, the cache hit rate is low, causing frequent page replacement and even page "thrashing." To store component data in contiguous space as much as possible, all components of all entities are placed in one contiguous memory block. In this version, the data between entities is still discrete, which reduces cache hit efficiency when operating on a large number of entities. Since the separation of data and operation functions has been achieved, the data should be stored contiguously, as shown in Figure 4.
As shown in Figure 4, the same components of all entities are stored together; however, Entity2 does not contain the HealthCmpt component, while Entity1 and Entity3 do not contain the AttackCmpt component. When all entity components are stored together, gaps appear, making the location relationship between entities and their component data hard to manage. Therefore, a better method is to put the component data of entities with "similar components" into an Archetype, as shown in Figure 4. The Entity is only used as an index pointing into the Archetype to indicate the location of the entity's data in memory. The Archetype contains a memory-pool implementation, and each block of memory is divided into fixed sizes, which facilitates memory alignment and improves the efficiency of reading contiguous data, as shown in Figure 5. Component addition and deletion are important functions of ECS; the operation flow is shown in Figure 6.
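The archetype idea above can be sketched as follows. This is a deliberately minimal illustration, not the engine's real API: it keeps a single archetype for entities that own both HealthCmpt and AttackCmpt, stores each component type in its own contiguous array, and uses the entity ID only as an index into that storage:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct HealthCmpt { int hp; };
struct AttackCmpt { int damage; };

// One archetype = one component set {Health, Attack}; each component
// type lives in its own contiguous column, so iterating a System over
// all entities of this archetype is cache-friendly.
struct Archetype {
    std::vector<HealthCmpt> health;
    std::vector<AttackCmpt> attack;

    // Returns the row index of the new entity inside this archetype.
    std::size_t add(HealthCmpt h, AttackCmpt a) {
        health.push_back(h);
        attack.push_back(a);
        return health.size() - 1;
    }
};

struct World {
    Archetype archetype;                                      // one archetype for brevity
    std::unordered_map<std::uint32_t, std::size_t> entityRow; // entity id -> row
    std::uint32_t nextId = 0;

    std::uint32_t createEntity(int hp, int damage) {
        std::uint32_t id = nextId++;
        entityRow[id] = archetype.add({hp}, {damage});
        return id;
    }
};
```

A full ECS keeps one such storage block per distinct component set and moves an entity between archetypes when a component is added or removed, which is the operation flow Figure 6 depicts.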

Local Coordinate System Transfer Design.
Many 3D animation engines, including 3ds Max, Unity, and Unreal, build scenes through a SceneGraph. A SceneGraph is a multiway tree that records the parent-child relationships between scene nodes. In the engine, the way changes to a parent node's transform property affect its child nodes is called parent-child pose transfer. The effect of a translation operation on descendant nodes is very intuitive: the translation transformation of the parent node is applied directly to the child nodes, as shown in Figure 7.
But when it comes to rotation and scaling of parent and child, the child object can take on a surprising shape, as shown in Figure 8.
It can be seen that the scaling of the parent is not applied directly to the child objects but produces a "non-equal scaling" effect, as shown in Figure 9.
Unequal scaling with an angle can be split into two rotations and one scaling. For example, doubling along an axis at a 45-degree angle is equivalent to the following: first rotate the coordinate system counterclockwise by 45 degrees to obtain the blue coordinate system shown in Figure 5-13, stretch the new coordinate system by a factor of two along the Y-axis, and then rotate the coordinate system 45 degrees clockwise back to the original black coordinate system.
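The decomposition above can be checked numerically with 2x2 matrices: the composite R(45°) * S_y(2) * R(-45°) should double vectors along the rotated Y axis and leave the perpendicular axis unchanged. The matrix helpers below are illustrative scaffolding, not engine code:

```cpp
#include <cassert>
#include <cmath>

struct Mat2 { float a, b, c, d; };   // row-major: [a b; c d], v' = M * v

Mat2 mul(Mat2 m, Mat2 n) {
    return {m.a * n.a + m.b * n.c, m.a * n.b + m.b * n.d,
            m.c * n.a + m.d * n.c, m.c * n.b + m.d * n.d};
}

Mat2 rot(float rad) {                 // counterclockwise rotation
    float s = std::sin(rad), c = std::cos(rad);
    return {c, -s, s, c};
}

Mat2 scaleY(float k) { return {1, 0, 0, k}; }

// Apply matrix m to the column vector (x, y), writing (ox, oy).
void apply(Mat2 m, float x, float y, float& ox, float& oy) {
    ox = m.a * x + m.b * y;
    oy = m.c * x + m.d * y;
}
```

With θ = 45°, the composite works out to [[1.5, -0.5], [-0.5, 1.5]], which is exactly a doubling along one diagonal direction and identity along the other, matching the rotate-scale-rotate recipe in the text.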
In order to store and load scenes, the basic data needs to be serialized and deserialized. Serialization and deserialization mean that data of all types is uniformly converted into binary form and then stored or transmitted over the network as binary. Conversely, when a file is read or data is received from the network, the binary stream is restored to the corresponding data structures in the order in which it was written.
A large amount of contiguous data of the same type, such as meshes and animations, can be read and written in binary using traditional data structures such as Vector. But sometimes it is necessary to record not only the values of the data but also the dependencies between them. The SceneGraph that represents the scene structure is this kind of data with dependencies, including the parent-child relationships between scene nodes and the mount relationships between entities and components. This kind of dependency obviously cannot be recorded with traditional flat data structures. The data structure relationships designed in this paper are shown in Figure 10. This kind of containment relationship is very suitable for storage as JSON. JSON is a typical nestable data structure and is very suitable for recording complex hierarchical data relationships; many games on the market use JSON to store complex character attributes and level progress. Similarly, JSON can also be used to store the hierarchical relationships of the SceneGraph. Another advantage of the JSON data structure is that it is convenient to add properties to a JsonNode in Key-Value form, which is well suited to storing resource indices in material components and mesh components. There are many C++ JSON open-source libraries on GitHub; this article uses the cJSON open-source library to convert the scene structure to JSON. The size of the JSON data changes with the depth of the SceneGraph, so it is dynamic data, similar to the Vector and String formats; when serializing JSON data, the size must also be recorded for reading and writing.
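To show the shape of the nested data without depending on the cJSON library the paper uses, here is a library-free sketch that serializes a parent-child hierarchy to a JSON string by hand. The `Node` structure and the `"name"`/`"children"` field names are illustrative assumptions, not the paper's actual schema:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A scene node reduced to its hierarchy: a name plus child nodes.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Recursively emit {"name": ..., "children": [...]} for the subtree,
// mirroring how a SceneGraph's parent-child relations nest in JSON.
std::string toJson(const Node& n) {
    std::string out = "{\"name\":\"" + n.name + "\",\"children\":[";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i) out += ",";
        out += toJson(n.children[i]);
    }
    return out + "]}";
}
```

With cJSON the same structure would be built with `cJSON_CreateObject`/`cJSON_AddItemToObject` calls and printed with `cJSON_Print`, but the resulting nesting is the same.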
Through the serialization of static data and dynamic data, this study designed the following binary file structure. By storing and reading binary files of this structure, the storage and restoration of 3D rendering scenes are basically realized, as shown in Figure 11.

GBuffer Design.
The first texture of the GBuffer can be used to record the normal information of the object. The normal may be the built-in normal of the object's vertices or the sampled normal computed from the normal map and the surface tangent. The three-dimensional normal vector is compressed into two dimensions using the "normal compression formula" mentioned earlier in the text. The second texture of the GBuffer can be used for the diffuse reflection color of the object; the diffuse color is obtained by sampling the diffuse texture of the object's material in the GBuffer shader. The first three of the four RGBA channels are occupied, and the remaining channel can be used to store other data. The third texture of the GBuffer can be used to store special rendering information, such as roughness (Roughness), metalness (Metalness), and ambient occlusion (AO), which may be statically baked for PBR material rendering. The last channel is used to store the RenderType, and the subsequent lighting stage decides which lighting model to call for rendering based on the RenderType of the pixel. Since an object is basically never rendered by two different lighting models at the same time, the meaning of each channel can change with the RenderType; for example, when performing NPR rendering, coefficients such as Emissive and Shadow may be written into the MetalRoughAo texture. The sizes of the GBuffer parts, totaling 80 bytes, are shown in Table 1.
Scientific Programming 7
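The paper's exact normal compression formula is not reproduced above, so as an illustration here is one common two-component scheme, spherical coordinates: a unit normal is stored as two angles in the GBuffer and reconstructed in the lighting pass. The function names are illustrative:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

// Encode a unit-length normal as two angles (azimuth, inclination):
// only two GBuffer channels are needed instead of three.
Vec2 encodeNormal(Vec3 n) {
    return {std::atan2(n.y, n.x), std::acos(n.z)};
}

// Reconstruct the unit normal from the two stored angles.
Vec3 decodeNormal(Vec2 e) {
    float s = std::sin(e.v);
    return {s * std::cos(e.u), s * std::sin(e.u), std::cos(e.v)};
}
```

Production engines often prefer octahedral encoding for better precision distribution, but any invertible 3D-to-2D mapping of unit vectors serves the same purpose: freeing a GBuffer channel for other data.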

Construction of Deferred Rendering Pipeline.
First, we set the rendering target to a screen-sized texture, determine the lighting model to call according to the RenderType stored in the A channel of the GBuffer, read the other required parameters from the GBuffer, and perform the lighting calculations. Forward rendering and image postprocessing can subsequently be performed to handle objects with transparency; however, transparent objects cannot be rendered correctly by the deferred path itself, which is a well-known disadvantage of deferred rendering [22].

Animation System Design.
VideoPose3d is an open-source 3D pose estimation library that takes video input, unlike approaches that use the Kinect depth camera for 3D pose estimation. The library needs only video data from an ordinary (non-depth) camera and processes the input video through a neural network to obtain coordinate estimates of the 3D joint points. This chapter attempts to combine this neural network with 3D character animation so that 3D characters gain the interactive ability of action imitation. The VideoPose3d model was presented by Facebook as a paper at the 2019 Conference on Computer Vision and Pattern Recognition (CVPR) and has been open-sourced on GitHub. Unlike traditional learning models based on joint-point detection, this model uses the two-dimensional key points of video frames as the input stream and performs temporal convolution, training the network to recognize the three-dimensional pose in the video; it was validated on the Human3.6M dataset with good results. Another reason for choosing VideoPose3d is that the model provides a semisupervised learning method for unlabeled video: it predicts the 2D key points of the unlabeled video, estimates the 3D pose from them, and projects the 3D pose back to a 2D pose using the camera parameters.

Test Environment and Tools.
This article uses Windows 10 as the test operating system and Microsoft's C++ development environment, Visual Studio 2019, for engine development. VS 2019 provides a wealth of extension plug-ins; for example, an HLSL syntax-highlighting tool can be downloaded to improve the efficiency of shader programming. In order to use the DirectX12 graphics interface, the DirectX12 SDK needs to be installed, but most current Windows 10 SDKs integrate the DirectX12 development environment; if the operating system includes it, code can be written directly without additional configuration. Part of the interface is written with Qt 5.12.2. Qt's .ui files need to be precompiled by Qt's built-in compiler, so the corresponding version of Qt must be installed on the computer. The entire project is compiled and built with CMake, which makes it convenient to adapt to different versions of the Visual Studio development environment.

3D Animation Stereo Space Test.
Take the FBX file below as a test example, with reference to the running action around frame 180. The FBX is imported into the engine, and the material is set to apply the texture to the model; the effect is as shown in Figure 13.
Comparing Figures 13 and 14 shows that the engine basically restores the character's appearance using the material functionality. Next, the timeline of the Cmpt_SkinRoot component on the character's root node is adjusted to around frame 180, and the following effects are obtained.
It can be seen from Figure 15 that the animation system in this study parses the FBX animation data correctly: the transformation of the local coordinate system of the bones is normal, and the animation effect in 3ds Max is restored. The end result demonstrates the skeletal skinning technique and reproduces the same poses as 3ds Max's internal animation system.

Conclusion
The advent of the digital age has affected all aspects of our lives to varying degrees, and it has had a great impact on art and even on animation scene design. From the many breathtaking special effects in film and television to digital 3D animations that rely entirely on digital media, the powerful influence of digital technology is vividly displayed. The core purpose of this paper is to build a 3D character display platform based on DirectX12. In order to realize the display of 3D characters, the basic data required for 3D character display is first analyzed, and then the programming of DirectX12 is carried out around the acquisition and conversion of these data. In developing the engine, the functions and architectures of many existing and open-source engines were referenced in designing the engine's modules.
Data Availability
The dataset can be accessed upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.