Artificial Intelligence Assistive Technology in Hospital Professional Nursing Technology

Global aging is accelerating, and the problem of caring for the elderly will become increasingly serious. This article designs a control system with ATmega128 as the main controller based on the functions of a multifunctional nursing robot. It uses a convolutional neural network to estimate the positions of 3D human joints: the joint coordinates of the colour map are mapped to the depth map based on the parameters of the two cameras, 15 joint heat maps are constructed with the joint depth map coordinates as their centres, and the joint heat maps are bound with the depth map as the input of a second-level neural network. The position of the user's armpit is then predicted with image processing techniques. We compare this method with other pose prediction methods to verify its advantages. The research is carried out against the background of global aging in the 21st century.


Introduction
In recent years, the walking and mobility problems of the elderly, the disabled, and other groups have received growing attention from researchers, and many countries have carried out research in this field. Most studies on human pose recognition focus on estimating 2D human joint coordinates from colour maps. Deep learning methods based on large datasets have shown excellent results in detecting human joint points in colour images [1]. However, these algorithms cannot directly provide the position of the human body in the global coordinate system for the transfer and transportation nursing robot.
To ensure the accuracy of human body posture recognition for the transfer and transportation nursing robot, this paper uses a colour map human joint detection model as the first-level neural network to calculate the pixel coordinates of human joints in the colour map. The article maps the joint coordinates of the colour map to the depth map based on the parameters of the two cameras. At the same time, 15 joint heat maps are constructed with the joint depth map coordinates as their centres, and the joint heat maps and the depth map are bound as the input of the second-level neural network. Finally, this paper obtains the global coordinates of 15 joints. Artificial intelligence is a technology developed to simulate, extend, and expand human intelligence. Figure 1 shows the transfer and transportation nursing robot "Baize" developed by the team. The human-like back-carrying robot uses sound source localization, visual recognition, and other means to locate and recognize the posture of the person being cared for. Autonomous obstacle avoidance and autonomous navigation are realized through perception of the surrounding environment [2]. Tactile sensors installed on the robot's arms, chest rest, and seat sense the user's back-hug status in real time. The real-time safety guarantee module in the core control system adjusts the robot's movements so that it can safely and comfortably complete the actions of lifting, carrying, transferring, and placing. These designs provide intelligent transport services for groups with lower limb impairments and illustrate the application of artificial intelligence assistive technology in hospital professional nursing.

Transfer and Transportation Nursing Robot
To achieve the above functions, the team designed the transfer and transportation nursing robot (Figure 2). The robot first uses the microphone array and the sound processing module to recognize the user's voice and calculates the user's angle relative to the robot in order to turn toward the user. Guided by the human body gesture recognition module and the path planning module, the robot then moves to a position a short distance in front of the user. Finally, the robot drives to the destination and places the user according to the guidance of the path plan.

Level 1 Convolutional Neural Network
According to the needs of the transfer and transportation nursing robot, this article defines the human body as 15 joint points (Figure 3). The 3D joint position prediction is divided into a two-level network [3]. In the first-level network, we use PAF (part affinity fields) to predict the state of the human body in the colour map. After calculating the human joint likelihood value of each pixel, we take the coordinate of the maximum as the joint coordinate.

2D Joint Point Detection.
The PAF method has high precision and high robustness and can effectively detect the 2D posture of the human body from an RGB image. This method uses a 2D human joint detection model to generate a multichannel score heat map S_est ∈ R^(n×m×15), S_est = (S_0, S_1, ..., S_i, ..., S_14). We set p_i = argmax_(u_s, v_s) S_i(u_s, v_s), where (u_s, v_s) are the coordinates of the heat map and p_i is the coordinate of the maximum score in the heat map S_i, that is, the predicted pixel coordinate of joint i. Then, we get the predicted coordinates p_est = (p_0, p_1, ..., p_i, ..., p_14), p_est ∈ R^(2×15), of the 15 joints. To further improve the adaptability of the PAF method to the working environment of nursing robots, we use the weights provided by OpenPose as the initial weights and perform transfer learning on the network with a family environment dataset [4].
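The per-joint argmax above can be sketched as follows. The function name `extract_joint_coords` and the NumPy representation of the score maps are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def extract_joint_coords(score_maps: np.ndarray) -> np.ndarray:
    """Per-joint argmax over an (H, W, J) score heat map stack.

    Returns a (J, 2) array of (u, v) pixel coordinates, one row per joint,
    mirroring p_i = argmax_(u_s, v_s) S_i(u_s, v_s) in the text.
    """
    h, w, n_joints = score_maps.shape
    coords = np.empty((n_joints, 2), dtype=np.int64)
    for i in range(n_joints):
        flat_idx = np.argmax(score_maps[:, :, i])
        v, u = np.unravel_index(flat_idx, (h, w))  # row index = v, column = u
        coords[i] = (u, v)
    return coords
```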

Loss Function.
To predict the joint pixel positions, we define a 15-channel ground-truth heat map during model training. S_gt consists of 15 Gaussian images centred on the real joint pixel positions. We minimize the mean square error between S_gt and S_est as the loss function for model optimization:
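The equation itself did not survive extraction; based on the definitions of S_gt and S_est above, it presumably takes the standard mean-square-error form:

```latex
L_1 = \frac{1}{15}\sum_{i=0}^{14}\sum_{(u_s, v_s)} \left( S^{i}_{gt}(u_s, v_s) - S^{i}_{est}(u_s, v_s) \right)^{2}
```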

Joint Heat Map
This paper constructs a multichannel joint heat map centred on the pixel coordinates of the 15 human joint points in the depth map. We first calculate the mapping relationship between the colour map and the depth map to obtain the joint depth coordinates. The article obtains the denoised depth map by median filtering, where (u_depth, v_depth) are the pixel coordinates of the depth map, (k, l) are the offsets in the template W, and med stands for the median operation. Considering that the pixel coordinates of the depth map can be mapped to the camera coordinate system, this paper maps the I_depth pixels to the colour map I_color to obtain the pixel coordinate registration relationship map M_reg between the two images. In this way, the joint depth map coordinates can be calculated in reverse [5]. The process is as follows.
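The median-filtering equation is missing from the text; from the symbols defined above ((u_depth, v_depth), the template W, and the med operator) it presumably reads:

```latex
\tilde{I}_{depth}(u_{depth}, v_{depth}) = \operatorname{med}_{(k,l)\in W}\; I_{depth}(u_{depth}+k,\; v_{depth}+l)
```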
Step 1. The article obtains I_depth, I_color, and M_reg at the same time. Two channels are added to the colour image I_color to store the corresponding depth map pixel coordinates. Define the coordinate of the pixel x_depth in I_depth as (u_depth, v_depth).
Step 2. Construct a 3-dimensional vector p_depth = (u_depth, v_depth, z) for pixel x_depth, where z is the pixel value of I_depth at x_depth.
Step 3. Calculate the coordinate p̃_depth of x_depth in the depth camera coordinate system through the depth camera intrinsic matrix H_depth and p_depth.
Step 4. The article uses the spatial coordinate transformation to convert p̃_depth into the coordinate p̃_color = R·p̃_depth + T in the colour camera coordinate system, where R and T are the rotation matrix and translation matrix between the two cameras, respectively.
Step 5. Calculate the corresponding coordinate p_color of x_depth in the colour image coordinate system through the colour camera intrinsic matrix H_color and the colour camera coordinate p̃_color.
Step 6. Repeat the above calculation for each pixel in I_depth in turn to obtain the coordinates of the mapped colour map. In this way, the depth map pixel coordinates corresponding to each colour map pixel can be obtained in reverse. Finally, we get the registration relationship map M_reg.
Step 7. Find the joint depth map coordinates p_est_depth corresponding to the joint colour map coordinates p_est according to M_reg. We define the multichannel joint heat map H = (H_0, H_1, ..., H_i, ..., H_14), with i as the joint index [6]. Each heat map H_i has the same dimensions as I_depth. p_h = (u_h, v_h) is the pixel coordinate of the heat map. We use the Gaussian function h(·) to calculate H_i centred on the joint depth map coordinate p_est_depth:
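The heat map construction can be sketched as below. The paper describes h() as a 1-dimensional Gaussian; this sketch uses an isotropic 2-D Gaussian over pixel distance, and the bandwidth `sigma` and all function names are assumptions for illustration.

```python
import numpy as np

def joint_heat_map(shape, center, sigma=7.0):
    """Build one channel H_i: a Gaussian bump centred on the joint's
    depth-map coordinate p_est_depth = (u_c, v_c)."""
    h, w = shape
    u_c, v_c = center
    v, u = np.mgrid[0:h, 0:w]  # v indexes rows, u indexes columns
    return np.exp(-((u - u_c) ** 2 + (v - v_c) ** 2) / (2.0 * sigma ** 2))

def build_heat_maps(shape, centers, sigma=7.0):
    """Stack the per-joint channels into H = (H_0, ..., H_14)."""
    return np.stack([joint_heat_map(shape, c, sigma) for c in centers], axis=-1)
```

Each channel peaks at 1 exactly at the joint's depth-map coordinate and decays with pixel distance, so binding it with the depth map gives the second-level network a spatial prior for each joint.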

Level 2 Convolutional Neural Network
In the second stage, we use the depth image bound with the multichannel joint heat map as input and further optimize the 2D joint detection results through a convolutional neural network to obtain the 3D human joint pose. The algorithm flow is shown in Figure 4. A real-time human posture recognition algorithm based on a 3D convolutional network relies on a high-performance GPU (graphics processing unit), which is unsuitable for the transfer and transportation care robot systems of ordinary households [7]. Therefore, to meet the accuracy requirements of joint prediction, especially near-range joint prediction, this paper uses a compromise 2D convolution method instead of 3D convolution. Based on the VGG16 network structure and the spatial pyramid pooling (SPP) method, we design a convolutional neural network that is not limited by the size of the input image (Figure 5). We replace the final 1000-unit fully connected output layer of VGG16 with a 45-unit fully connected layer, which estimates the u-v-z coordinates of the 15 joints. The global coordinate system is defined to coincide with the depth camera coordinate system. The depth map camera coordinates p^i_est_depth are converted to the global coordinates p^i_est_w by formula (7), giving p^i_est_w = (x^i_w, y^i_w, z^i_w) and then p_est_w = (p^0_est_w, ..., p^i_est_w, ..., p^14_est_w).
Among them, f_x, f_y, u_cam, and v_cam are the camera intrinsic parameters, and x_w, y_w, z_w are the coordinate values in the global coordinate system.
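Formula (7) is not reproduced in the text; under the standard pinhole camera model with the intrinsics f_x, f_y, u_cam, v_cam defined above, the depth-pixel-to-global conversion presumably reduces to x_w = (u − u_cam)·z/f_x, y_w = (v − v_cam)·z/f_y, z_w = z, as in this sketch (the function name is hypothetical):

```python
import numpy as np

def pixel_to_global(u, v, z, fx, fy, u_cam, v_cam):
    """Back-project a depth-map pixel (u, v) with depth z into the global
    (depth-camera) frame using the standard pinhole model."""
    x_w = (u - u_cam) * z / fx
    y_w = (v - v_cam) * z / fy
    return np.array([x_w, y_w, z], dtype=float)
```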
In this paper, the network is trained on the ITOP (invariant-top view) dataset augmented with data from the application scenarios of the transfer and transportation nursing robot.

Loss Function.
We use the mean square error between the predicted joint global coordinates and the actual joint global coordinates as the loss function of the second-level neural network, where i is the joint index, p^i_est_w is the predicted joint position, and p^i_gt_w is the actual joint position.
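The formula itself is missing from the text; with the symbols defined above it presumably takes the form:

```latex
L_2 = \frac{1}{15}\sum_{i=0}^{14}\left\| p^{i}_{est\_w} - p^{i}_{gt\_w} \right\|_2^{2}
```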

Estimation of Underarm Points
Placing the robot's two arms under the user's armpits is the key to successfully picking up the user. This paper delineates an ROI (region of interest) in the depth image based on the given depth image joint coordinates [8]. Then, we perform threshold segmentation in the ROI to obtain the underarm area and determine the target point where the robot arm is placed. To ensure that the armpit area obtained by image segmentation does not contain the human body, we map the grey pixel values in the human body area to 1/3 of the original range [9]. We calculate the maximum depth j_max and minimum depth j_min among the four joint positions of the left (right) shoulder joint, left (right) elbow joint, left hip joint, and right hip joint in each ROI in turn. According to the pixel value mapping function shown in formula (9), the ROI is converted to grayscale.
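The exact form of formula (9) is not reproduced in the text. The following is a minimal sketch of one plausible mapping, assuming a piecewise linear function that compresses the human body depth band [j_min, j_max] into the lower third of the 8-bit gray range and pushes everything else (the background seen through the armpit gap) to the top, so the thresholding step can separate the two; the function name and the choice of 255 for non-body pixels are assumptions.

```python
import numpy as np

def roi_to_gray(roi_depth, j_min, j_max):
    """Convert an ROI depth patch to 8-bit grayscale, compressing the body
    band [j_min, j_max] into [0, 85] (one third of 0..255) and mapping
    everything outside the band to 255."""
    roi = roi_depth.astype(float)
    gray = np.empty_like(roi)
    body = (roi >= j_min) & (roi <= j_max)
    span = max(j_max - j_min, 1e-6)     # guard against a degenerate band
    gray[body] = (roi[body] - j_min) / span * 85.0
    gray[~body] = 255.0
    return np.clip(gray, 0, 255).astype(np.uint8)
```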

Estimation of Axillary Points.
We first perform 5 × 5 median filter denoising on the depth map. Considering that the background and foreground depth values vary with the scene, this paper uses the Otsu segmentation algorithm, based on maximizing the between-class variance, to obtain the mask image M of the armpit area [10]. We then calculate the pixel coordinate p_est_sp of the armpit centre point by the following formula:
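The Otsu segmentation and the centroid computation can be sketched as below. Whether the armpit gap falls in the bright or dark class depends on the grayscale mapping of formula (9); this sketch assumes the bright class, and the function names and the centroid form (mask mean) are illustrative assumptions.

```python
import numpy as np

def otsu_threshold(gray):
    """Maximum between-class variance threshold on an 8-bit image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = cum[t] / total             # weight of the dark class
        w1 = 1.0 - w0                   # weight of the bright class
        if w0 == 0.0 or w1 == 0.0:
            continue
        m0 = cum_mean[t] / cum[t]                                  # dark-class mean
        m1 = (cum_mean[-1] - cum_mean[t]) / (cum[-1] - cum[t])     # bright-class mean
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def armpit_center(gray):
    """Otsu-segment the ROI and return the mask centroid as (u, v)."""
    mask = gray > otsu_threshold(gray)
    v_idx, u_idx = np.nonzero(mask)
    return float(u_idx.mean()), float(v_idx.mean())
```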

Experiments and Results
The PC used in this experiment has an Intel Core i7-6700HQ CPU, 8 GB of RAM, and a 4 GB NVIDIA GeForce GTX 950M GPU. The colour map and depth map are captured with a Microsoft Kinect 2 sensor. For the lower limb impairment group, we selected 1,100 human body images in the home environment as the test dataset.

Real-Time Evaluation.
The calculation speeds of commonly used gesture recognition methods are shown in Table 1. The comparison shows that a single 3D joint position estimate with the 3D convolution method used in [10] takes more than 2 s, which cannot meet the real-time requirements of the transfer and transportation nursing robot; therefore, it is excluded from the follow-up experiments. The Kinect SDK v2.0 method can complete about 30 joint predictions per second, the best real-time performance of the group. The method in [9] and the method in this paper take similar times for a single calculation, about 200 ms, which demonstrates the good real-time performance of the method in this paper.

Average Accuracy Evaluation.
We use the 10 cm rule as the evaluation criterion for joint prediction. The article adopts the average accuracy of joint prediction (mAP) to measure the accuracy of the methods. With the distance between the user and the camera ranging from 550 mm to 3500 mm, we collect 600 images of actual work scenes of the transfer and transportation nursing robot. The results of our evaluation of Kinect SDK v2.0, the method of Fujisawa et al. [9], and this paper's method are shown in Figure 6.
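The 10 cm rule and the per-joint averaging can be sketched as follows; the function name, the array layout, and the use of Euclidean error in millimetres are illustrative assumptions.

```python
import numpy as np

def joint_map_10cm(pred, gt, threshold=100.0):
    """Per-joint accuracy under the 10 cm rule and their mean (mAP).

    pred, gt: (N, 15, 3) arrays of joint positions in millimetres.
    A joint is counted correct when its Euclidean error is below `threshold`.
    """
    err = np.linalg.norm(pred - gt, axis=-1)        # (N, 15) per-sample errors
    per_joint = (err < threshold).mean(axis=0)      # accuracy per joint
    return per_joint, per_joint.mean()
```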
It can be seen from Figure 6 that, across the joints of the human body, the Kinect SDK v2.0 method has the lowest average estimation accuracy. The direct mapping method used in [9] has good environmental adaptability and high accuracy. However, because joint estimation on the colour map itself has certain errors, this method easily fails on relatively slender parts such as hands, elbows, and knees. In addition, the method is strongly affected by invalid values and noise in the depth map, which reduces its accuracy. Compared with the above methods, the method in this paper has the highest accuracy, with an average joint accuracy of 91.5%. We further evaluate the close-range robustness of the method in this paper under short-distance conditions with an operating range of 550 mm-800 mm. This article evaluates the near-distance joint estimation of each method based only on the coordinate predictions of the head, neck, left and right hips, left and right elbows, and left and right shoulders. It can be seen from Figure 7 that all three methods reach their highest estimation accuracy on the head. The Kinect SDK v2.0 and [9] methods have the lowest estimation accuracy at the hip joint, while the method in this paper has the lowest estimation accuracy at the elbow joint. Overall, the method in this paper has the highest estimation accuracy at every joint among the three methods, with an average joint accuracy of 90.3%. When the distance is between 550 mm and 800 mm, the human body information collected by the depth camera is incomplete. Compared with the average accuracy evaluation experiment, the accuracy of Kinect SDK v2.0 drops sharply in the close-range experiment: the algorithm misjudges joint positions and even identifies joints that do not exist in the figure, which is unacceptable for the transfer and transportation nursing robot.

Accuracy Evaluation at Close Range
On the other hand, a depth camera based on the ToF (time of flight) principle is more sensitive to the wrinkles and colours of clothes under close-range conditions, so it collects invalid values more easily, which increases the probability that the direct mapping method [9] maps invalid values and raises its recognition error rate. In comparison, the two-stage network method proposed in this paper has better short-range adaptability.

Evaluation of the Accuracy of the Axillary Point Prediction.
This article defines the 3 cm non-background rule as the basis for evaluating the accuracy of the axillary points: a prediction is regarded as accurate if the predicted axillary point is not located on the human body and its distance from the real axillary point in the X-Y plane is no more than 3 cm. Before calculating the user's armpit position, the transfer and transportation care robot moves to face the front of the human body. We collect 150 frontal human body images at a distance of 550 mm-800 mm, and the experimental results of the evaluation of the method in this paper are shown in Figure 7. After calculation, the accuracy of this paper's method for predicting the axillary points reaches 91.3%. This is because, in transfer and transportation care applications, most of the subjects are elderly people with mobility impairments: the human body is mostly in a passive state without many complicated movements, and the body's posture is relatively simple. In addition, the method in this paper obtains the ROI based on human posture detection, so the prediction accuracy of the axillary points is directly related to the recognition of the left (right) shoulder, left (right) elbow, and left (right) hip joint coordinates. Therefore, when the camera is too close to collect certain joint information and a joint position is incorrectly recognized, the armpit point estimation cannot be completed reliably.

Conclusion
This paper designs a 3D human pose estimation system for home care robots. The system uses Kinect 2 as the RGB-D data acquisition device and realizes human joint position inference based on 2D human body pose estimation on the colour map.
Through image processing technology, the position of the user's axillary points is further predicted. We compare this method with other posture prediction methods to verify its advantages and to illustrate the applicability of artificial intelligence assistive technology in hospital professional nursing [11,12].
Data Availability
The human body image data used to support the findings of this study are restricted by the Ethical Care Committee of Qiqihar Medical University in order to protect patient privacy. Data are available from Zhangbo Xiao (e-mail address: xzbmd@qmu.edu.cn) for researchers who meet the criteria for access to confidential data.

Conflicts of Interest
The authors declare that they have no conflicts of interest.