A Graph Optimization-Based Acoustic SLAM Edge Computing System Offering Centimeter-Level Mapping Services with Reflector Recognition Capability

Robots can use echo signals for simultaneous localization and mapping (SLAM) services in unknown environments where their own cameras are not available. In current acoustic SLAM solutions, the time of arrival (TOA) in the room impulse response (RIR) needs to be associated with the corresponding reflecting wall, which leads to an echo labelling problem (ELP). The position of a wall can be derived from the TOA associated with it, but most current solutions ignore the effect of cumulative error in the measurement of the robot's moving state on the wall position estimation. In addition, the estimated room map contains only the shape of the room and lacks position information such as the locations of doors and windows. To address these problems, this paper proposes a graph optimization-based acoustic SLAM edge computing system offering centimeter-level mapping services with reflector recognition capability. A robot equipped with a sound source and a four-channel microphone array travels around the room, collects the room impulse response at different positions, and extracts the RIR cepstrum feature from it. The ELP is solved by using the RIR cepstrum to identify reflectors with different absorption coefficients. Then, the similarity of the RIR cepstrum vectors is used for closed-loop detection. Finally, this paper proposes a method to eliminate the cumulative error of robot movement by fusing IMU data and acoustic echo data using graph-optimized edge computation. The experiments show that the acoustic SLAM system in this paper can accurately estimate the trajectory of the robot and the positions of doors, windows, and so on in the room map. The average self-localization error of the robot is 2.84 cm, and the mapping error is 4.86 cm, which meet the requirement of centimeter-level map service.


Introduction
With the arrival of the intelligent society, mobile robots have been widely used in people's life and work, which they greatly facilitate. In an unknown indoor space, a robot that provides positioning and navigation services needs to know the map of the surrounding environment and its own position in that map. However, the robot does not know the indoor map in advance. Simultaneous Localization and Mapping (SLAM) is the service of detecting and sensing the map (contour) of the surrounding environment for a moving subject in an unknown environment, relying only on mounted sensors, while determining its own position in the map [1]. After decades of continuous development, SLAM technology has been widely used in mobile robotics [2], virtual/augmented reality [3], autonomous driving [4], and so on. The current mainstream SLAM techniques are classified by the sensors used and can be divided into LIDAR SLAM and visual SLAM. Sensors such as LIDAR and cameras have the advantages of accuracy and high resolution, but they also have disadvantages: LIDAR is a very expensive sensor and poses health and safety issues in operation [5]. Cameras, although their cost keeps falling, require high processing power and suffer from low signal-to-noise ratios in low-light environments [6]. In addition, the above systems are computationally complex and usually rely on cloud-based processing, which is costly and raises privacy and security concerns. In contrast, acoustic sensors, which are standard on mobile robots, can be used in low-light and dark environments [7]. Map reconstruction can be achieved using the robot's own computing power, which not only has a significant cost advantage but also allows privacy-aware services to be offloaded to MEC (mobile edge computing), avoiding the leakage of private information such as indoor images. Therefore, researchers have begun to explore acoustic SLAM.
In indoor environments, the propagation of acoustic signals is obstructed and reflected by buildings, resulting in multipath effects, which to some extent reflect information about the room arrangement and geometry and can be used to estimate environmental maps. Researchers have therefore started to estimate room shapes from the acoustic room impulse response (RIR). Labelling the first-order echoes in the RIR with the walls that generate them is the key to room shape estimation. Since the RIR itself does not specify which reflections come from which walls, there is an echo labelling problem (ELP) [8]. References [9, 10] solved the ELP using the properties of the Euclidean distance matrix (EDM), but the algorithm must traverse the TOA combinations of all echoes, which has a high computational complexity. References [8, 11, 12] improved the computational complexity of EDM-based work using graph theory, subspace filtering, and greedy iteration, respectively, but the overall complexity is still large.
There is also a class of methods that solve the ELP using elliptic constraints. References [13, 14] solved for the series of reflection points of the walls based on an elliptic constraint model and finally used the Hough transform to estimate each reflecting wall.
This method needs many anchor nodes to obtain enough data, which is complex to implement and extremely computationally intensive. Reference [15] proposed an algorithm based on elliptic constraints and iterative echo marking that reduces the computational complexity. The above methods use a stationary distributed microphone array, which requires the sound source and microphone array to be arranged in the room in advance and is therefore not applicable to the practical application scenario of SLAM.
In addition to the static deployment of sources and microphones in the above work, another scheme embeds acoustic sensors on a mobile robot, which is more in line with the practical needs of SLAM and is more relevant for research. Depending on whether the robot is equipped with an acoustic source, such schemes can be classified as active acoustic SLAM or passive acoustic SLAM. References [16, 17] used robots equipped with multichannel microphones for self-localization as well as localization of acoustic sources by sensing the ambient sound sources around them. However, suitable detectable sound sources are not always available in the actual environment. Another option is to use a robot equipped with both sound sources and microphones for simultaneous localization and mapping. In [18-20], a mobile robot with a collocated acoustic source and single-channel microphone was used to collect first-order echoes. Due to the weak spatial perception of the single-channel microphone, the robot had to move at least three times for each reflecting-wall estimate, and the movement error may affect the accuracy of the estimation. Multichannel microphones have better spatial perception than single microphones and can obtain more information about the interior geometry for the same number of measurements. Reference [21] achieved room shape estimation using a robot equipped with an acoustic source and a four-channel microphone array by estimating the location of the first-order image source through clustering, but the robot needs to collect a large amount of RIR data at each movement. All the above acoustic SLAM schemes can estimate the room shape but ignore the effect of cumulative errors in the measurement of the robot's moving state on the reconstruction of the room contour.
Moreover, the resulting room map contains only room shape information and lacks the location information of doors and windows, because these methods cannot distinguish between different reflective materials.
In this paper, an active sounding scheme with a robot equipped with both a sound source and a microphone array is used for simultaneous localization and mapping. To address the ELP mentioned above and the cumulative error of robot movement measurement that affects the accuracy of map building, this paper proposes a graph optimization-based acoustic SLAM edge computing system that can identify room detail information. The main contributions of this paper are as follows: (1) a robot system prototype that can be used for acoustic SLAM is designed and implemented; (2) a first-order echo labelling algorithm based on the RIR cepstrum is proposed, which solves the ELP by distinguishing different reflective materials; (3) a graph optimization-based method is proposed for correcting the pose estimated by the trapezoidal constraint. The sections of this paper are organized as follows. Section 1 introduces the background and current research status of acoustic SLAM, and Section 2 describes the problem setup and the architecture of the graph optimization-based acoustic SLAM system. In Section 3, a prototype of our robotic system for acoustic SLAM is presented. Section 4 introduces the related acoustic SLAM methods. Section 5 presents the simulations and experiments, and Section 6 concludes the paper.

Problem Description and System Structure
In an indoor environment, the sound signal received by a microphone consists of the direct sound from the source and the sound reflected by the walls. In the image source model [22, 23], the reflected sound from the actual sound source is replaced by the direct sound from an image source, as shown in Figure 1(a). For the kth reflecting wall, described by the unit normal n_k and an arbitrary wall point p_k, the first-order image source s_{1,k} of the real sound source s is calculated according to s_{1,k} = s + 2⟨p_k − s, n_k⟩ n_k. The sound propagation process is described by the room impulse response (RIR), which consists of a series of Dirac pulses δ:

h_j(t) = Σ_i α_i δ(t − τ_i) + ε(t),    (1)

where ε(t) is the noise and α_i is the amplitude of the received pulse, whose magnitude depends on the absorption coefficient of the wall and the distance from the image source to the microphone [18]. τ_i is the arrival time of the corresponding pulse, which is proportional to the distance from the image source to the microphone r_j. The room impulse response can thus be represented as a dataset data_j = {(α_i, τ_i), i = 1, 2, ..., n}. The ELP is to find the data (α_i, τ_i) associated with the first-order image source s_{1,k} in the dataset data_j. The label i is defined as the label corresponding to the data (α_i, τ_i), and the data association process for a first-order image source s_{1,k} can be represented by a labelling function L(α_i, τ_i). The distance from the first-order image source s_{1,k} to the microphone r_j can be obtained from the marked first-order echo data. For a single-channel microphone, as shown in Figure 1(a), the image source s_{1,k} lies on a circle C_{1,k} centered at the microphone position r_1 with radius d_{j,k}, and its exact position on the circle cannot be determined due to the lack of spatial information.
For multichannel microphones, as shown in Figure 1(b), the position of the image source s_{1,k} is at the intersection of the circles C_{j,k}. In 2D space, it is known from the TOA localization algorithm [24] that the uniqueness of the image source location can be guaranteed when the number of microphones is greater than 3. The midpoint of the line connecting the real source s and the image source s_{1,k} gives the location of the reflecting wall.
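As a concrete sketch of the two operations above (mirroring the source across a wall, and intersecting the distance circles of a multichannel array), the following NumPy fragment may help. The function names and the linearized least-squares trilateration are our own illustration, not the paper's implementation:

```python
import numpy as np

def first_order_image_source(s, p_k, n_k):
    """Mirror the real source s across the wall through point p_k
    with unit normal n_k (image source model)."""
    s, p_k, n_k = (np.asarray(v, float) for v in (s, p_k, n_k))
    return s + 2.0 * np.dot(p_k - s, n_k) * n_k

def locate_image_source(mics, dists):
    """Linearized least-squares TOA trilateration: find the point whose
    distance to each microphone r_j equals d_{j,k}, referencing mic 0."""
    mics = np.asarray(mics, float)
    d = np.asarray(dists, float)
    r0, d0 = mics[0], d[0]
    m = mics[1:] - r0                          # offsets to reference mic
    A = 2.0 * m
    b = d0**2 - d[1:]**2 + np.sum(m**2, axis=1)
    y, *_ = np.linalg.lstsq(A, b, rcond=None)  # solve 2 m_j . y = b_j
    return r0 + y

def wall_point(s, s_img):
    """The reflecting wall passes through the midpoint of s and s_{1,k}."""
    return 0.5 * (np.asarray(s, float) + np.asarray(s_img, float))
```

With four microphones the linear system is overdetermined but consistent in the noise-free case, so the least-squares solution recovers the image source exactly.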
In this paper, an omnidirectional sound source and an omnidirectional four-channel microphone array are installed on the robot; the location of the microphone array relative to the sound source is shown in Figure 2(a). The robot travels around the room in a circle, and for each step the robot takes, the sound source generates a pulse while the microphone array records the echoes. For descriptive simplicity, the room is modelled as a 2D polygon, and the approach in this paper can easily be extended to 3D. The robot can estimate the position of each wall relative to itself using the echo information, as shown in Figure 2(b). Based on the above description, the difficulty of first-order image source position estimation is the echo labelling problem (ELP) [8]. Taking advantage of the strong spatial perception of multichannel microphones, this paper proposes an RIR cepstrum feature that can distinguish reflecting walls with different absorption coefficients and, based on this feature, a first-order echo labelling solution. The robot travels around the room and uses the echo information from different locations to estimate its distance to each wall for room shape estimation, as shown in Figure 2(b). Since there is a cumulative error in the robot position estimated from IMU data, the wall position estimates can become inaccurate. To solve this problem, a graph optimization-based acoustic SLAM method is proposed, which uses graph optimization to fuse the robot pose estimated from the acoustic echo signals with the pose estimated by the IMU to eliminate the cumulative error; the system block diagram of this method is shown in Figure 3. Referring to other graph optimization structures [25-27], the graph optimization system in this paper is also divided into two parts: the front end and the back end.
The front end establishes the graph vertices and the positional constraint relations between them based on the IMU and acoustic sensor data. The back end optimizes the pose graph based on the closed-loop constraints added by closed-loop detection and the constraint relations between graph vertices, finally obtaining globally consistent robot trajectories and indoor room maps.

Acoustic SLAM Method
According to the system framework of acoustic SLAM, the acoustic SLAM method can be divided into two parts: echo labelling and pose correction. Echo labelling extracts the first-order echo signals from the received echoes and then estimates the position of the first-order image source based on them. Pose correction eliminates the errors accumulated during the robot's movement globally using graph optimization methods.

Echo Labelling Based on the RIR Cepstrum Feature.
The ELP is often solved using the Euclidean distance matrix approach, which requires traversing all possible combinations of echoes and therefore has high complexity. In this section, an echo labelling method based on the RIR cepstrum is proposed, which achieves fast and accurate echo labelling.

RIR Cepstrum.
Multichannel microphones are more spatially aware than single-channel microphones, and a spatial cepstrum feature was proposed in [28] that can represent the relative position of the sound source in the room. Inspired by this, the room impulse response (RIR) cepstrum feature is proposed here, which can be used for first-order echo labelling and closed-loop detection.
Suppose the robot is equipped with M omnidirectional microphones and an omnidirectional sound source s. As shown in Figure 4, the robot moves N steps from x_1 to x_N; each time it moves, the sound source generates a pulse while the microphones acquire the room impulse responses at the current position. The robot thus acquires M room impulse responses h_{i,j}(n), j = 1, 2, ..., M, at x_i. According to equation (1), h_{i,j}(n) consists of a series of pulses, and the average energy feature r_{i,j,k} of the kth pulse is extracted as

r_{i,j,k} = (1/(2L+1)) Σ_{n=−L}^{L} h_{i,j}^2(τ_{i,j,k} + n),

where τ_{i,j,k} is the TOA value of the kth pulse in h_{i,j}(n) and 2L+1 is the width of the rectangular window. The time-delay feature of the kth pulse in h_{i,j}(n) is d_{i,j,k} = c · τ_{i,j,k}, where c is the speed of sound propagation in air. Log operations are performed on the two features separately to obtain the log-energy vector p_{i,k} and the log-time-delay vector q_{i,k}. As the robot moves from x_1 to x_N, N > M, the matrix A_k of the logarithm of the average amplitude of the room impulse response and the matrix B_k of the logarithm of the arrival distance are assembled by stacking p_{i,k} and q_{i,k}, i = 1, ..., N, row by row. As in the method for extracting the spatial cepstrum [29], we use PCA instead of DFT or DCT; the covariance matrix R_{A_k} can be obtained from A_k.
Since R_{A_k} is a symmetric matrix, its eigenvalue decomposition can be expressed as R_{A_k} = E_{A_k} D_{A_k} E_{A_k}^T, where E_{A_k} is the eigenvector matrix and D_{A_k} is the diagonal matrix whose diagonal elements are the eigenvalues in descending order. After PCA dimensionality reduction, the data d_{A_k} can be expressed as the projection of A_k onto E_{A_k}. Similarly, PCA dimensionality reduction is performed on the matrix B_k.
The principal components with the largest eigenvalues, d_{a_k} and d_{b_k}, are selected from d_{A_k} and d_{B_k}, and together they form the room impulse response cepstrum d_{h_k},
where d_{h_k} is a matrix defined as the room impulse response cepstrum, d_{a_k} is the amplitude cepstrum, and d_{b_k} is the distance cepstrum. The amplitude cepstrum corresponds to the average amplitude of the pulses observed by the microphone array. When the pulses observed by the microphone array are consistent, i.e., the first-order echoes come from the same reflecting wall, the magnitude of the amplitude cepstrum is inversely proportional to the distance of the robot from the reflecting wall. Similarly, when the pulses observed by the microphone array are consistent, the magnitude of the distance cepstrum is proportional to the distance of the robot from the reflecting wall. As shown in Figure 5(a), in a 6 m × 6 m rectangular room, the robot moves from x_1 to x_20, and to ensure consistent observation by the microphone array, the reflection coefficients of walls w1-w4 were set to 0.8, 0, 0, and 0. The robot can then only receive the echo from wall w1. We select the second pulse in the room impulse response, extract the RIR cepstrum d_{h_2} according to the above method, and represent it in a two-dimensional Cartesian coordinate system, as shown in Figure 5(b). We can observe that when the pulses observed by the microphone array are consistent, the components d_{a_k} and d_{b_k} of the room impulse response cepstrum d_{h_k} approximately satisfy a linear relationship. When the RIR cepstrum of a pulse combination of the microphone array lies in the vicinity of the straight line corresponding to a reflecting wall, then, combined with the size of the microphone array, it can be determined whether the observations of the microphone array are consistent.
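A minimal sketch of the PCA step that replaces the DFT/DCT is given below. It assumes the log-energy matrix A_k and log-distance matrix B_k (one row per robot position, one column per microphone) have already been assembled from the windowed pulses; the helper names are hypothetical:

```python
import numpy as np

def pca_first_component(X):
    """Project rows of X onto the eigenvector of the covariance matrix
    with the largest eigenvalue (PCA used in place of DFT/DCT)."""
    Xc = X - X.mean(axis=0)
    R = Xc.T @ Xc / len(Xc)       # covariance matrix, e.g. R_{A_k}
    w, E = np.linalg.eigh(R)      # eigenvalues in ascending order
    return Xc @ E[:, -1]          # principal component scores

def rir_cepstrum(A_k, B_k):
    """RIR cepstrum d_{h_k}: column 0 is the amplitude cepstrum d_{a_k},
    column 1 is the distance cepstrum d_{b_k}."""
    return np.column_stack([pca_first_component(A_k),
                            pca_first_component(B_k)])
```

Note that PCA leaves a sign ambiguity in each component, so the straight-line relationship between d_{a_k} and d_{b_k} holds up to reflection.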

Echo Labelling Based on the RIR Cepstrum Feature.
Based on the feature that the one-dimensional component of the RIR cepstrum approximately satisfies a linear relationship with the two-dimensional component when the observations are consistent, a method for first-order echo labelling is proposed. According to the image source model, the 2nd and 3rd pulses received by a microphone must be first-order echoes reflected from walls. The robot follows the trajectory in Figure 4, moving from x_1 to x_N. Since the robot moves along the wall and the spacing between the microphones is small, it can be guaranteed that the 2nd and 3rd pulses observed by the microphones mostly come from the same image source, i.e., the observations are consistent. Assuming that the acoustic reflection coefficients of the four walls of the room in Figure 4 are all different, the robot takes the 2nd and 3rd pulses received by the microphones from x_1 to x_N and computes the cepstrum of the room impulse response; the matrix of the logarithm of the room impulse response amplitude A_{2,3} and the matrix of the logarithm of the arrival distance B_{2,3} are assembled as in the previous section. PCA operations are then performed on A_{2,3} and B_{2,3} to obtain the eigenvector matrices E_{A_{2,3}} and E_{B_{2,3}}, respectively, and the RIR cepstrum after PCA dimensionality reduction is d_{h_{2,3}}. According to the characteristics of the RIR cepstrum feature, d_{h_{2,3}} can be fitted as straight lines l_i, each corresponding to a reflecting wall i with a different reflection coefficient.
When the index k of a pulse observed by the microphone array is greater than 3, it cannot be determined directly whether that pulse is a first-order echo. The first-order echo candidates from wall i can be obtained from the TOA values of the first and second echoes and the geometric relationship between the microphone positions. These candidates can be combined into new pulse combinations, each with a corresponding RIR cepstrum d_τ. If a new combination consists of first-order echoes from wall i, the corresponding cepstrum d_τ should lie near the straight line l_i; that is, the distance from d_τ to l_i satisfies dist(d_τ, l_i) < Δε, where Δε is the Euclidean distance threshold. Following this method, the first-order echoes of different walls can be distinguished, and thus the location of the image source can be estimated.
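The labelling test can be sketched as a line fit plus a distance threshold. The paper does not specify the fitting procedure, so a total least-squares fit is assumed here as one reasonable choice:

```python
import numpy as np

def fit_line(points):
    """Total least-squares line through 2-D cepstrum points; returns
    (unit normal n, offset c) such that n . x = c on the line l_i."""
    pts = np.asarray(points, float)
    centroid = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - centroid)
    n = Vt[-1]                    # direction of least variance
    return n, float(n @ centroid)

def is_first_order(candidate, line, eps):
    """Accept a candidate pulse combination as a first-order echo from
    wall i when its cepstrum lies within eps of the fitted line l_i."""
    n, c = line
    return abs(float(n @ np.asarray(candidate, float)) - c) < eps
```

Each wall i gets its own line l_i fitted from the known-consistent 2nd/3rd pulses, and later pulse combinations are assigned to the wall whose line they fall closest to, subject to the threshold Δε.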

Pose Correction Based on Graph Optimization

The pose correction method consists of three parts: closed-loop detection, pose estimation, and pose correction. In this paper, the poses at different locations of the robot are used as nodes of a graph. The constraint relationships between graph nodes are established using closed-loop detection and pose estimation. Finally, the graph optimization method is used to correct the robot's pose.

Closed-Loop Detection.
Closed-loop detection determines whether the robot has returned to a previously visited position, and it is extremely important for back-end optimization.
After the above echo labelling, the robot at position x_i can obtain the RIR cepstra consistent with the observations of the k walls, and these cepstra can be combined into a 2k-dimensional RIR cepstrum vector X_i = (x_{i,1}, y_{i,1}, ..., x_{i,k}, y_{i,k}), where (x_{i,k}, y_{i,k}) is the RIR cepstrum from wall k. Similarly, the robot obtains the RIR cepstrum vector X_j at x_j. The Euclidean distance between the vectors X_i and X_j expresses their spatial similarity. When x_i and x_j coincide or are close to each other, the distance between the two vectors should satisfy ‖X_i − X_j‖ < δ, where δ is the Euclidean distance threshold.
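A minimal sketch of this similarity test, assuming the 2k-dimensional cepstrum vectors have already been assembled (the function names are our own):

```python
import numpy as np

def loop_closure(X_i, X_j, delta):
    """True when the Euclidean distance between two RIR cepstrum
    vectors falls below the threshold delta."""
    return float(np.linalg.norm(np.asarray(X_i) - np.asarray(X_j))) < delta

def detect_loops(cepstrum_vectors, delta):
    """Compare the newest cepstrum vector against all earlier ones and
    return the indices of positions flagged as revisited."""
    X_new = cepstrum_vectors[-1]
    return [i for i, X in enumerate(cepstrum_vectors[:-1])
            if loop_closure(X, X_new, delta)]
```

Each detected match contributes a loop-closure edge to the pose graph described in the following subsections.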

Trapezoidal Constraint-Based Pose Estimation.
In indoor space, the actual position of the robot and the positional movement of the image source satisfy the isosceles trapezoidal constraint [18, 29], as shown in Figure 6. Based on this constraint, a robot pose estimation method is proposed.
In world coordinates, let the robot's pose at x_{i−1} be X_{i−1} = (x_{i−1}, y_{i−1}, θ_{i−1}) and the robot's pose at x_i be X_i = (x_i, y_i, θ_i). The change in pose dX_i of the robot moving from x_{i−1} to x_i is expressed in polar coordinates as dX_i = (r cos α, r sin α, θ).
In equation (21), r is the displacement, α is the angle between the displacement direction and the X-axis of the world coordinate system, and θ is the rotation angle of the robot coordinate system. The coordinates of the image source at x_{i−1} and x_i are expressed in polar coordinates in the robot coordinate system. The robot rotation angle θ is related only to the angular component of the image source's polar coordinates.
The estimated value of the robot rotation angle follows directly. As the robot moves from x_{i−1} to x_i, the polar radius of the image source changes accordingly. Letting s = (r, α), a vector function d(s) relating the pose change to the image-source distances can be obtained. The distances d estimated by the acoustic sensor have a random error ζ with variance σ².
The weighted error function is ε(s) = (d − d(s))^T Φ_n^{−1} (d − d(s)), where Φ_n = σ² I and I is the identity matrix. The objective optimization function is obtained by minimizing the weighted error function ε(s).
The Levenberg-Marquardt algorithm is used to solve equation (29).
The error function ε(s) is linearized by expanding d(s) in a first-order Taylor series:

Security and Communication Networks
where J(s) is the Jacobian matrix. Substituting equation (30) into equation (28), the extremum of the weighted error function ε(s) can be solved. According to the Levenberg-Marquardt method, the damped normal equations (H + λI)Δs = g are solved, where H = J(s)^T J(s) and g = J(s)^T (d − d(s)). This yields the step size Δs_k for each iteration. The value s_0 = (r_0, α_0) estimated by the IMU is used as the initial value of the iteration.
where n is the number of iterations. According to the above method, the acoustic signal can be used to accurately estimate the pose change between different positions of the robot.
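The iteration above can be sketched with a generic Levenberg-Marquardt loop. The measurement model f(s) passed in below is a placeholder for the paper's wall-geometry-specific d(s), and a numerical Jacobian is used instead of the closed-form J(s); both substitutions are our own simplification:

```python
import numpy as np

def levenberg_marquardt(f, d, s0, iters=50, lam=1e-2):
    """Minimize ||d - f(s)||^2 starting from the IMU guess s0 = (r0, a0).
    f maps the pose change s to the predicted image-source distances."""
    s = np.asarray(s0, float)

    def jac(s, h=1e-6):
        # central-difference numerical Jacobian J(s)
        J = np.empty((len(d), len(s)))
        for k in range(len(s)):
            e = np.zeros_like(s); e[k] = h
            J[:, k] = (f(s + e) - f(s - e)) / (2 * h)
        return J

    for _ in range(iters):
        r = d - f(s)
        J = jac(s)
        H = J.T @ J + lam * np.eye(len(s))     # damped normal matrix
        step = np.linalg.solve(H, J.T @ r)     # step size Delta s_k
        if np.linalg.norm(step) < 1e-10:
            break
        if np.sum((d - f(s + step)) ** 2) < np.sum(r ** 2):
            s, lam = s + step, lam * 0.5       # accept, reduce damping
        else:
            lam *= 2.0                         # reject, increase damping
    return s
```

The damping parameter λ interpolates between gradient descent (large λ) and Gauss-Newton (small λ), which is what makes the method robust to a rough IMU initialization.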

Pose Correction Based on Graph Optimization.
According to the above method, we can construct the graph. Every time the robot moves a certain distance or rotates through a certain arc, a vertex is added to the graph, and the constraint relationships between the vertices are established according to the pose estimation algorithm. The structure of the graph is shown in Figure 7.
Let x = (x_1, x_2, ..., x_T)^T be a vector of parameters, where x_i describes the pose of node i. As the robot moves from pose node x_i to pose node x_j, z_ij is the pose transformation estimated by the IMU and ẑ_ij is the pose transformation observed by the acoustic sensor. Let e(x_i, x_j) be the error function from x_i to x_j, which is the difference between the robot's predicted observation ẑ_ij and the actual observation z_ij.
The dashed box in Figure 7 shows the constraint relationship between node x_2 and node x_p.
Let C be the set of constraint pairs of nodes in the graph, where the nodes are the trajectory points x of the robot. The goal of the maximum likelihood method is to find the configuration of nodes x* that minimizes the negative log-likelihood F(x) of all observations, F(x) = Σ_{(i,j)∈C} e_ij^T Ω_ij e_ij,
where Ω_ij is the measurement information matrix of the error function e_ij. The objective optimization function is x* = argmin_x F(x). The error function e_ij(x_i, x_j) is then linearized using a first-order Taylor expansion.
The error function e_ij(x_i, x_j) is related only to x_i and x_j, so its Jacobian matrix J_ij is nonzero only in the blocks corresponding to x_i and x_j. A nonlinear optimization algorithm is used to iteratively solve for the minimum of F(x); the step size of each iteration can be solved using equation (32),
where H = Σ_{(i,j)∈C} J_ij^T Ω_ij J_ij and b = Σ_{(i,j)∈C} J_ij^T Ω_ij e_ij. The optimal robot trajectory point x* is obtained by iterative calculation,
where x_0 is the initial trajectory point of the robot estimated by the IMU and n is the number of iterations. The robot travels around the room once, constructs the vertices and edges of the graph according to the method in this paper, and solves equation (36) using a nonlinear optimization algorithm. The optimal robot movement trajectory is obtained by optimizing the edges of the graph so as to minimize F(x).
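A reduced sketch of the back-end solve is given below. For brevity it optimizes translation-only 2D poses (the rotation component and the paper's specific error function are omitted), so the problem is linear and a single Gauss-Newton step is exact; node 0 is anchored to remove the gauge freedom:

```python
import numpy as np

def optimize_pose_graph(n_nodes, constraints, x_init):
    """Minimize sum ||(x_j - x_i) - z_ij||^2_Omega over 2-D node
    positions. constraints: iterable of (i, j, z_ij, Omega_ij)."""
    dim = 2
    H = np.zeros((n_nodes * dim, n_nodes * dim))
    b = np.zeros(n_nodes * dim)
    x = np.asarray(x_init, float).copy()
    for i, j, z, omega in constraints:
        e = (x[j] - x[i]) - np.asarray(z, float)   # error e_ij
        Om = np.asarray(omega, float)
        # Jacobians: d e_ij / d x_i = -I, d e_ij / d x_j = +I
        for a, Ja in ((i, -np.eye(dim)), (j, np.eye(dim))):
            b[a*dim:(a+1)*dim] += Ja.T @ Om @ e
            for c, Jc in ((i, -np.eye(dim)), (j, np.eye(dim))):
                H[a*dim:(a+1)*dim, c*dim:(c+1)*dim] += Ja.T @ Om @ Jc
    # anchor node 0 to fix the gauge freedom
    H[:dim, :] = 0.0; H[:, :dim] = 0.0
    H[:dim, :dim] = np.eye(dim); b[:dim] = 0.0
    dx = np.linalg.solve(H, -b)
    return x + dx.reshape(n_nodes, dim)
```

With the full SE(2) poses of the paper, e_ij is nonlinear in the node variables and the same H, b accumulation is simply repeated inside an iteration loop, as in equation (36).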

System Implementation and Experimental Verification
This section introduces our self-designed robot prototype and then experimentally verifies the performance of the acoustic SLAM method proposed in this paper.

System Implementation.
Our robot is based on the TurtleBot3 Waffle Pi, a small, low-cost, fully programmable, ROS-based mobile robot, as shown in Figure 8(b). The TurtleBot3 consists mainly of a Raspberry Pi 3 and an OpenCR control board (with a built-in IMU sensor). In this system, the OpenCR board is responsible for collecting the built-in IMU sensor data and the sound data as well as driving the robot, while the Raspberry Pi 3 is responsible for processing and computation; the Raspberry Pi 3 is connected to the OpenCR board via USB 2.0. The system architecture diagram of the robot is shown in Figure 8(a). Since the sound source is close to the microphone array, which may result in a large direct wave that interferes with the reception of the reflected waves, a sound insulation panel was designed between the microphone array and the sound source to isolate the direct sound. The panel also isolates the reflected waves from the ceiling and floor. The physical diagram of the robot is shown in Figure 8(c). The robot uses an active acoustic scheme for self-localization and room contour estimation. To prevent the sound signals emitted during the robot's operation from disturbing people's life and work, this paper uses sound pulse signals in the pseudoultrasonic band (16-24 kHz), which have wavelengths between 1.4 cm and 2.1 cm and can therefore be received by a small microphone array; sound signals in this band are insensitive to the human ear but can be picked up by the robot's acoustic sensors. The sound source emits a 16 kHz-20 kHz chirp pulse signal, which avoids the leakage of sensitive information such as indoor human voices and thus protects privacy; however, most speakers do not support signals in this band, so the speaker must be selected carefully. Huawei Sound was chosen as the sound source.
The Huawei Sound is a 360-degree omnidirectional speaker with a frequency response range of 55 Hz-40 kHz and supports 3.5 mm wired audio input. The process of acquiring the sound pulse signal on the robot is the process of converting the sound signal from a mechanical wave into a digital signal. The sound signal is first converted into a voltage signal by a microphone; since this voltage signal is usually small, it is amplified by an amplifier. The amplified signal is then passed through a 15-24 kHz band-pass filter to remove noise outside the pseudoultrasonic band. Finally, the filtered signal is converted to a digital signal by an ADC.
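The probe signal and the band-limiting step can be sketched digitally as follows. The robot's filter is part of the analog front end; a windowed-sinc FIR is assumed here purely for illustration, and the sample rate of 96 kHz is our own choice to keep 24 kHz below Nyquist:

```python
import numpy as np

def chirp(f0, f1, duration, fs):
    """Linear chirp sweeping f0 -> f1 Hz: the pseudoultrasonic probe pulse."""
    t = np.arange(int(duration * fs)) / fs
    phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration * t ** 2)
    return np.sin(phase)

def bandpass_fir(lo, hi, fs, taps=255):
    """Windowed-sinc band-pass FIR, built as the difference of two
    Hamming-windowed low-pass filters (cutoffs hi and lo)."""
    n = np.arange(taps) - (taps - 1) / 2
    def lowpass(fc):
        h = 2 * fc / fs * np.sinc(2 * fc / fs * n)
        return h * np.hamming(taps)
    return lowpass(hi) - lowpass(lo)

# Usage sketch: probe = chirp(16e3, 20e3, 0.01, 96000)
#               clean = np.convolve(received, bandpass_fir(15e3, 24e3, 96000), 'same')
```

Audible-band content (speech, ambient noise) falls deep in the stopband below 15 kHz, which is the privacy property the paper relies on.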
Based on the acquisition process of the sound signal, the microphone array signal acquisition board for the robot was designed.
The microphone on the acquisition board is a 130F22 omnidirectional microphone from PCB, which has a frequency response range of 10 Hz-20 kHz; each microphone has an SMB connector and plugs vertically into the board. The acquisition board hosts a four-channel microphone array, with the microphones distributed at equal intervals on a circle with a radius of 0.14 m. The physical diagram of the acquisition board is shown in Figure 9.

Experimental Verification.
The experiments are divided into echo labelling experiments, pose correction and room shape estimation experiments, and real-room experiments.

Echo Labelling Algorithm Performance Simulation.
For the echo labelling experiments, simulations were conducted in rooms of three different shapes: square, rectangular, and pipeline. The room shapes are shown schematically in Figure 10; w1, w2, w3, and w4 denote the four walls of the room, and their corresponding reflection coefficients are [α_1, α_2, α_3, α_4]. The radius of the robot's microphone array is 0.2 m. A location in the room is randomly selected, the RIR of each microphone at that location is simulated using the image source method, and the echoes are labelled using the method in Section 3. To verify the robustness of the algorithm to noise, Gaussian white noise was added to the propagation distance of the signal: d̃ = d + ε, where d is the true propagation distance and ε is additive noise whose standard deviation σ varies from 0.01 to 0.05 in steps of 0.005. Similar to [15], the F1-score was used to evaluate the quality of the echo labels. Reference [15] solved the ELP by means of elliptical iterations; for comparison, experiments were performed in a square room with wall reflection coefficients of [0.8, 0.8, 0.8, 0.8]. 300 points were randomly chosen for echo labelling experiments with the method of this paper and the method of [15]. The experimental results are shown in Figure 11(b), where method 1 is the method of [15] and method 2 is the method of this paper. Then, the F1-score of echo labelling was simulated for robots with different numbers of microphones in each of the three rooms in Figure 10, as shown in Figure 12. The experiments show that the F1-score can be maintained above 95% in all three rooms under low-noise conditions (σ ≤ 0.03 m).
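The F1-score used here can be computed as below, treating each pulse-to-wall assignment as a positive prediction and encoding unlabelled pulses as -1 (this encoding is our assumption; the paper does not state one):

```python
import numpy as np

def f1_score(true_labels, pred_labels):
    """F1-score of echo labelling. Labels are wall indices; -1 marks a
    pulse left unlabelled (e.g. rejected higher-order echoes)."""
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    tp = np.sum((p != -1) & (p == t))   # correctly labelled echoes
    fp = np.sum((p != -1) & (p != t))   # wrongly labelled echoes
    fn = np.sum((p == -1) & (t != -1))  # missed first-order echoes
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)
```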

Simulation of Pose Correction and Room Shape Estimation.
Simulation experiments were conducted on robot self-localization and room position estimation in a 7 m × 7 m room with a wooden door and glass windows. The sound reflection coefficient of the wooden door is 0.7, that of the glass window is 0.8, and that of the walls is 0.9. The robot's microphone array is a four-channel array with a radius of 0.2 m. The robot travels around the room along the walls, and every 0.4 m it actively emits sound and the RIR at the current position is simulated (σ = 0.05 m). The blue diamond line in Figure 13(b) shows the trajectory of the robot without pose correction, and the green line is the closed loop detected by the method in this paper. Figure 13(c) shows the trajectory after graph-optimization-based pose correction; the optimized trajectory is essentially consistent with the real trajectory. Figure 13(d) shows the reflector locations estimated from the optimized path, where the green "×" marks the estimated location of the wall, the red "×" the location of the glass window, and the pink "×" the location of the door. The position error Err = √(X_err² + Y_err²) was used to measure the robot's self-localization error and the mapping error, where X_err is the X-axis coordinate error and Y_err is the Y-axis coordinate error. Figure 14(a) shows the average self-localization error and mapping error statistics for the robot traveling through some of the position points along the route in Figure 13(a). The self-localization error of the robot is less than 3.18 cm with 60% probability, and the mapping error is less than 4.86 cm with 58% probability. A comparison experiment between the method in this paper and an EKF-based filtering method was performed, in which the robot drove once around the above room along a random path, and 100 Monte Carlo experiments were simulated for each method.
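The error metric above is simply the planar Euclidean distance between an estimated position and its ground truth. A minimal sketch (the trajectory points below are illustrative, not values from the paper):

```python
import math

def position_error(est, true):
    """Err = sqrt(X_err^2 + Y_err^2): Euclidean distance (m) between an
    estimated (x, y) position and the ground-truth position."""
    x_err = est[0] - true[0]
    y_err = est[1] - true[1]
    return math.hypot(x_err, y_err)

# Illustrative trajectory points (not measured data)
est_path  = [(0.02, 0.01), (0.41, 0.03), (0.79, 0.02)]
true_path = [(0.00, 0.00), (0.40, 0.00), (0.80, 0.00)]
errors = [position_error(e, t) for e, t in zip(est_path, true_path)]
mean_err_cm = 100 * sum(errors) / len(errors)
```

The same metric is applied both to the robot's self-localization (estimated vs. true robot position) and to the mapping result (estimated vs. true reflector position).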
The average self-localization and mapping errors over these experiments are shown in Table 1, where method 1 is the method without path optimization, method 2 is path optimization by EKF filtering, and method 3 is the method of this paper.
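The principle behind the graph-optimized path correction (method 3) can be illustrated with a toy example: a 1-D pose graph with odometry constraints between consecutive poses and a single loop-closure constraint, minimized by gradient descent. This is only a sketch of the idea under simplified assumptions; the paper's actual optimizer works on 2-D poses and fuses IMU data with acoustic echo data:

```python
def optimize_poses(odom, loop_value=0.0, iters=2000, lr=0.1):
    """Toy 1-D pose graph: poses x_0..x_n with odometry constraints
    x_{i+1} - x_i = odom[i] and one loop-closure constraint
    x_n - x_0 = loop_value. Minimizes the sum of squared residuals
    by gradient descent, with x_0 anchored at the origin."""
    n = len(odom)
    x = [0.0] * (n + 1)
    # Initialize by dead reckoning (this accumulates drift)
    for i, u in enumerate(odom):
        x[i + 1] = x[i] + u
    for _ in range(iters):
        grad = [0.0] * (n + 1)
        for i, u in enumerate(odom):          # odometry residuals
            r = (x[i + 1] - x[i]) - u
            grad[i + 1] += 2 * r
            grad[i] -= 2 * r
        r = (x[n] - x[0]) - loop_value        # loop-closure residual
        grad[n] += 2 * r
        grad[0] -= 2 * r
        for i in range(1, n + 1):             # x_0 stays anchored
            x[i] -= lr * grad[i]
    return x

# Four noisy odometry steps whose true net motion is zero:
# dead reckoning ends 0.02 m from the start; the loop closure
# redistributes that residual over the whole trajectory.
optimized = optimize_poses([1.02, 0.99, -1.03, -0.96])
```

The loop-closure constraint plays the role of the RIR-cepstrum closed-loop detection in the paper: it tells the optimizer that the final pose should coincide with an earlier one, which pulls the drifted trajectory back into consistency.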

Real Room Experiments.
To verify the stability of the self-designed robot and the practical performance of the method in this paper, experiments were conducted in a real room of 4.3 m × 5.5 m × 3 m (room dimensions were obtained with total station measurements). The total station was placed at the doorway and used to measure the actual positions of the robot and the walls. The positions of the total station and the robot in the room are shown in Figure 15(a). Due to the height limitation of the robot, it can only measure the reflecting wall below the red line in the right-hand image of Figure 15(a). The robot travels around the room close to the walls; each time it moves (a distance of less than 0.5 m), it actively emits sound once and records the RIR at the current position. After every move, the real position of the robot is measured with the total station and recorded. The RIRs obtained by the four microphones are shown in Figure 15(b) (the ambient temperature during the experiment was 30°C, corresponding to a sound speed of 349.75 m/s). The red marker points are first-order echoes from the walls, and marker points of the same shape are first-order echoes from the same wall. Since sound insulation panels were placed between the speaker and the microphone receiver board and the speaker radiates through 360 degrees in the horizontal plane, there are no first-order reflection echoes from the ceiling or floor. The final experimental results are shown in 2D in Figure 15(c). The overall average self-localization error of the robot is 2.84 cm, and the average mapping error is 4.86 cm.
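The quoted sound speed can be reproduced with a common linear temperature approximation for air. The paper does not state which formula it uses, so the coefficients below are an assumption, chosen because they match the stated 349.75 m/s at 30°C:

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in air (m/s) via the common linear
    model c = 331.45 + 0.61 * T. This is an assumed formula: it
    reproduces the paper's 349.75 m/s at 30 degrees C."""
    return 331.45 + 0.61 * temp_c

def echo_distance(toa_s, temp_c=30.0):
    """One-way propagation distance (m) for a given time of arrival (s),
    as used when converting RIR echo peaks into wall distances."""
    return speed_of_sound(temp_c) * toa_s
```

Converting each first-order echo's TOA into a propagation distance this way is what makes the temperature (and hence the sound speed) a necessary calibration input for the mapping step.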

Conclusion
This study introduces a graph optimization-based acoustic SLAM edge computing system and method that provide new ideas for solving the acoustic SLAM problem. With the proposed solution, a robot can use acoustic signals to achieve self-localization and centimeter-level room map construction services containing door and window information. The current method performs best in an empty room. In the future, acoustic SLAM research will be extended to more complex indoor spaces.
Data Availability
The simulation and experimental data used to support the findings of this study have not been made available because this paper is funded by the Guangxi Science and Technology Plan Project (No. AD18281044). The grant is still in the research phase, and all research data are currently restricted to disclosure within the project team.

Conflicts of Interest
The authors declare that they have no conflicts of interest.