Gait and Trajectory Optimization by Self-Learning for Quadrupedal Robots with an Active Back Joint

This paper presents an efficient technique for self-learning a dynamic walk for a quadrupedal robot. The cost function for such a task is typically complicated, and the number of parameters to be optimized is high. Therefore, a simple optimization technique is important. We apply a genetic algorithm (GA) which uses real experimental data rather than simulations to evaluate the fitness of a tested gait. The algorithm actively optimizes 12 of the robot's dynamic walking parameters. These include the step length and duration and the bending of an active back. To this end, a simple quadrupedal robot was designed and fabricated in a structure inspired by small animals. The fitness function was then computed based on experimental data collected from a camera located above the scene coupled with data collected from the actuators' sensors. The experimental results demonstrate how walking abilities improve in the course of learning and indicate that including an active back should be considered to improve walking performance.


Introduction
The main advantage of legged robots over their wheeled counterparts is their ability to overcome obstacles such as stairs and hard outdoor terrains. For example, in [1], the authors presented a continuous free gait generation method for quadrupedal robots walking on rough terrains, based on the CoG trajectory planning method. Upright robots can manipulate objects and interact in a human environment using their hands [2]. Nevertheless, dynamic walking gaits are harder to achieve for these robots than for quadrupedal robots (see [3,4] for a survey on robot learning from demonstrations). Also, note that there is a clear distinction between bipedal and quadrupedal mechanisms in the literature. The term gait, which is often used in the characterization of legged robot walking patterns, refers to the patterns of the limbs during locomotion on a solid surface.
Running and walking gaits differ by the interaction time with the ground. For walking, the interaction period with the ground is more than 50% of the entire gait cycle [5]. Quadruped and hexapod robots have stability and speed advantages over bipedal robots (see, for example, [6], where Juarez-Campos et al. implemented a Peaucellier-Lipkin mechanism for a hexapedal robot). Moreover, trafficability of these robots increases with their degrees of freedom [7][8][9].
An important approach in designing a biped walking gait is the zero moment point (ZMP) [10]. It is defined as a point on the floor where equilibrium is established, while the horizontal portion of the reaction moments vanishes. The main idea is that when the ZMP is within the convex hull of the contact points between the feet and the floor (support polygon), stable walking may be maintained. For example, see [11], which used particle swarm optimization (PSO) in order to maintain stable walking, and [12], which used PSO for quadruped gait adaptation using terrain classification and gait optimization. The reader is referred to [13] for some additional stabilization methods. Controlling a real-life walking robot according to its dynamic model alone is difficult (see [14]). The associated dynamics is complicated, with a multidimensional state vector that is basically nonlinear and time-variant. Moreover, uncertainties in the robot's model and state add complexity (see [15]).
A considerable portion of research in the field aims to ease calculations and enhance the robustness of walking.
Explicitly, improvement is aimed at [16] (1) trajectory planning; (2) achieving ZMP optimal walking gait; (3) calculating the ZMP's position feedback-force system; and (4) planning reference trajectories of the center of gravity (CoG) of the body. Many researchers apply reinforcement learning to calculate the CoG of the robot, such as [17], which used reinforcement learning for posture stabilizing enhancement by exerting random disturbances. In [18], the researchers implemented a hybrid adaptive fuzzy dynamic evolutionary neural network technique. Their performance is demonstrated in simulations.
A genetic algorithm (GA) is yet another approach for optimizing a set of parameters according to an objective function. For biped motion control, Taherkhorsandi et al. [19] presented a sliding control which was used to optimize an adaptive robust hybrid PID controller while a GA was applied to select the controller's coefficients (from the Pareto front). Kato et al. [20] presented a research study where they used a GA in a simulation for gait optimization and implemented their results in a real model. Some gaits generated by the GA in the simulated environment performed well on the real model, though not all of them did. Therefore, it is preferable to apply a GA on real-world experiments. This is the main aim of this paper.
In real implementations, it is common to close the loop using sensors. Loffler et al. [21] used IMU sensors and encoders for every motor shaft and also applied a 6-axis force-torque sensor placed in the robot's foot. As expected, they reported an overall improvement in walking and jogging. The same strategy is used for recent quadrupedal robots such as HyQ2Max and HyQ2Centaur [22] or Cheetah 2 [23] or the well-known BigDog [24], LittleDog [25], and WildCat [26]. These robots also map out their environment by using infrared cameras, retroreflective markers, and range sensors.
Cameras may be used for position and velocity sensing. A camera may be placed on the robot or, alternatively, used as an external sensor, and data on the robot's performance are gathered during its motion. Obviously, using cameras as sensors in the gait optimizing procedure requires real-time techniques for analyzing the video stream (see [27,28]). The back joint (spine) was investigated in [29], where the authors presented simulations showing that an active back is a key driving factor for the improvement of speed and cost-of-transport in quadrupeds. In [30], Khoramshahi et al. presented a simplified (wheeled) locomotion system which is behaviorally and structurally similar to a galloping quadruped and showed that fast locomotion requires a flexible spine. The examples above present sophisticated robots in terms of their sensory equipment and mechanical structure (at least 3 degrees of freedom per leg), which lead to impressive trafficability and load-bearing abilities. In this work, we use a machine learning approach based on a genetic algorithm (GA) for improving a four-legged robot's walking abilities. Such an approach was introduced in [31] for quadrupedal robots with a rigid back, where the researchers optimized the forward speed as their objective function. Here, we extend their work to the case where an active back is present by optimizing "straightness" of movement coupled with the power efficiency. After optimizing these gaits, the robot will have the ability to track a predefined path such as in [32].
This paper is organized as follows: in Section 2, we introduce the mechanical concept of our quadrupedal robot. In Section 3, we specify the initial gaits for the first generation and explain the genetic algorithm which we use. Section 4 provides a description of the experimental setup and the optimization results. We conclude the paper in Section 5.

Mechanical Design
Nine servo motors were incorporated into the robot's structure (Figure 1): two MG90S microservos per leg and one VEX-EDR-393 for the back joint (see Figure 2(b)). The robot's body was designed and built as a two-dimensional platform of approximately 200 mm length made of 4 mm thick Perspex. Each leg was connected to the robot body by a shaft with torsion springs (see Figure 2(a)). These are used for maintaining elasticity during the robot's movement. The robot's body consists of two parts connected by a shaft actuated by a servo motor that enables the back curvature (see Figure 3). The leg was designed as a five-bar mechanism (see Figure 2(b)), that is, a two-DOF leg.
The advantage of such a structure is that the motors are mounted on the robot's skeleton rather than on its joints (see also [33]). This design reduces the leg's physical volume and mass. In addition, such a design enables scaling the motors without changing the leg's structure and its moment of inertia [34]. To enable a proper foot path, x(t), y(t), where t ∈ [0, 1] (Figure 2(b)), one solves the inverse-kinematics problem. This yields the motors' angles θ_1(t), θ_2(t) for a single cycle duration.
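The inverse-kinematics step can be illustrated with a minimal sketch for a symmetric planar five-bar leg. The geometry here is an assumption made only for illustration and is not taken from the robot: the two motors sit at (-d/2, 0) and (+d/2, 0), each drives a proximal link of length l1, and two distal links of length l2 meet at the foot. The function name, the default dimensions, and the choice of elbow branch are all hypothetical.

```python
import math


def five_bar_ik(x, y, d=20.0, l1=30.0, l2=50.0):
    """Return motor angles (theta1, theta2) in radians for a foot at (x, y).

    Assumed geometry (hypothetical, not from the paper): motors at
    (-d/2, 0) and (+d/2, 0), proximal links of length l1, distal links
    of length l2 joined at the foot.
    """
    angles = []
    # sign selects the elbow branch; for a foot below the motors (y < 0)
    # this choice bends the left elbow to the left and the right one to the right.
    for x_motor, sign in ((-d / 2.0, -1.0), (d / 2.0, +1.0)):
        dx, dy = x - x_motor, y
        r = math.hypot(dx, dy)  # motor-to-foot distance
        if r == 0.0 or r > l1 + l2 or r < abs(l1 - l2):
            raise ValueError("foot position is out of reach")
        base = math.atan2(dy, dx)  # direction from motor to foot
        # Interior angle at the motor from the law of cosines.
        c = (l1 * l1 + r * r - l2 * l2) / (2.0 * l1 * r)
        inner = math.acos(max(-1.0, min(1.0, c)))
        angles.append(base + sign * inner)
    return tuple(angles)


# Example: sample a hypothetical straight foot stroke at t in [0, 1].
stroke = [five_bar_ik(-10.0 + 20.0 * t / 10.0, -60.0) for t in range(11)]
```

Evaluating the function along a sampled foot path x(t), y(t) gives the two angle profiles that would be sent to the leg's servos over one cycle.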

Genetic Algorithm for Self-Learning
The set of parameters controlling the robot's motion is given by the gait parameter vector G, equation (1). The parameter T is the time duration of a single cycle. The rest of the parameters are dimensionless and are given with respect to T. So, Tφ_i indicates the i-th leg's time phase from the cycle's beginning, and Tψ_i indicates the time duration for the i-th leg (i = 1, 2, 3, 4) to complete its individual cycle. Tφ_b and Tψ_b indicate the time phase of the back joint and its time duration, respectively (see Figure 4).
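One possible reading of these timing parameters is sketched here. It assumes that each leg starts its individual cycle Tφ_i after the gait cycle begins, traverses its foot path within Tψ_i, and then holds the end pose until the next cycle. The hold behavior and the function name are assumptions for illustration only; the text does not specify what a leg does outside its active window.

```python
def leg_path_parameter(t, T, phi, psi):
    """Map time t to a foot-path parameter s in [0, 1] for one leg.

    Assumption (not from the paper): outside its active window of length
    T*psi, the leg holds the end of its path (s = 1).
    """
    if T <= 0.0:
        raise ValueError("cycle duration T must be positive")
    if not 0.0 < psi <= 1.0:
        raise ValueError("psi must lie in (0, 1]")
    # Fraction of the gait cycle elapsed since this leg's start phase.
    u = (t / T - phi) % 1.0
    return u / psi if u < psi else 1.0


# Example: a leg with phase 0.25 and duration 0.5 in a 2-second cycle.
samples = [leg_path_parameter(0.1 * k, 2.0, 0.25, 0.5) for k in range(21)]
```

The returned value s can be fed to the foot path x(s), y(s) of the leg; the same mapping would apply to the back joint with φ_b and ψ_b.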
We initiate the algorithm by selecting the gait's parameter vector (1). To do so, we manually generated the following gaits: walk, rack, amble, canter, trot, and gallop. Their values were extracted from [35]; see Figure 4. Additionally, 4 initial gaits were randomly chosen.
In general, the GA searches for an optimal gait by applying genetic operators to a population of gaits [36]. Gaits that perform well are rewarded and proliferate through the population, whereas gaits that perform poorly are removed. In our GA, from the second generation onward, we used a random operator function to generate random new gaits in order to avoid a local stationary solution. Our GA maintains a population of the 6 best gaits and uses mutation and pairing to manipulate the gaits in the population. Each generation of tests consists of a random function (20%), mutation (40%), and pairing (40%).
The random operator randomly generates a new gait within UB and LB, the upper and lower parameter bounds, respectively. The quadrupedal feet are impacted by the floor during the gait. So, when Tψ_i is too short, the robot's limbs jerk, which may harm the mechanism. On the other hand, long time periods are not desirable either (see, for example, …).
The mutation operator acts on the gait by randomly altering its parameters with some predefined probability ρ, equation (2). Here, we used ρ = 0.2.
The pairing operator generates a new gait from two "parent" gaits by performing multipoint recombination of their sets of parameters. The gaits chosen for "reproduction" were (1) the gait G_best, which gained the best score, paired with (2) a random gait G_rand chosen from the top eight. We define the prioritizing weight α = score(G_best)/(score(G_best) + score(G_rand)). The offspring gait is then given by equation (3).
3.1. The GA Weight Function
The weight function (also known as the evaluation or cost function) evaluates how close a given solution is to the optimum solution of the desired problem. Each solution is given a score, specifying its fitness to the desired solution. We are interested in a smooth trajectory that requires minimal power consumption while maximizing the resulting velocity.
The efficiency is the ratio between the output power and the power consumption (i.e., the ratio between the average kinetic energy ∼ v^2 and the input power W; compare with [29]). We also include an additional dimensionless ratio d/T which measures the "straightness" of the path (compare with [37]). The fitness function used for scoring the robot's gait combines these terms with the constants k_w, k_d, and k_v. Here, v, d, and T are the average velocity, distance, and overall trajectory length, respectively (this T is a length and should not be confused with the cycle duration). To comply with the dimensions indicated above, we set k_w = k_d = 1, k_v = 2, though these may be chosen differently for other purposes.
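The exact fitness expression is not reproduced here, so the following is only a sketch of one form consistent with the description: the constants are treated as exponents, giving score = v^k_v · (d/T)^k_d / W^k_w. That product form is an assumption. The sketch also assumes that d is the straight-line distance between the first and last tracked positions, that the trajectory length is the summed length of the tracked path, and that v is d divided by the elapsed time. The function and argument names are hypothetical.

```python
import math


def gait_score(track, W, k_v=2.0, k_d=1.0, k_w=1.0):
    """Score one trial from a camera track and a power measure.

    track: list of (time, x, y) samples of the robot's position.
    W:     positive power-consumption measure for the trial.
    Assumed form (not confirmed by the source):
        score = v**k_v * (d / path_length)**k_d / W**k_w
    """
    if len(track) < 2:
        raise ValueError("need at least two track samples")
    if W <= 0.0:
        raise ValueError("power measure W must be positive")
    # Overall trajectory length: sum of segment lengths along the track.
    path_length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(track, track[1:])
    )
    (t_start, xs, ys), (t_end, xe, ye) = track[0], track[-1]
    d = math.hypot(xe - xs, ye - ys)  # net distance covered
    if path_length == 0.0 or t_end <= t_start:
        return 0.0  # a non-moving trial scores zero
    v = d / (t_end - t_start)  # average velocity (assumed definition)
    return (v ** k_v) * ((d / path_length) ** k_d) / (W ** k_w)
```

With this form, a perfectly straight run has d/T = 1, and any meandering lowers the score even if the same distance is covered with the same power.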
The power consumption W of the motors is calculated by summing the current sensor's readings, which are taken at constant time intervals. Since the voltage is maintained constant, the current summation suffices. The power consumption was extracted from an INA219 high-side DC current sensor placed on the robot that sampled the current consumption of the motors during walking. In addition, an IR LED was mounted on the robot's back, facing upward. A webcam with an IR filter was positioned above the experimental area in order to detect the robot's location. The images received from the camera underwent image processing for identifying the locomotion of the robot on the experimental surface.
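Returning to the operators of this section, one generation of the GA might be sketched as follows. The 20%/40%/40% split, ρ = 0.2, and the weight α come from the text; the rest is assumed for illustration. In particular, the mutation rule (resampling a parameter uniformly within its bounds) and the pairing rule (taking each parameter from G_best with probability α) stand in for equations (2) and (3), which are not reproduced here. Because the text mentions both a population of 6 and a top eight, the pool size is left as the parameter `keep`. All function names are hypothetical.

```python
import random


def random_gait(lb, ub, rng):
    """Random operator: draw every parameter uniformly within its bounds."""
    return [rng.uniform(lo, hi) for lo, hi in zip(lb, ub)]


def mutate(gait, lb, ub, rng, rho=0.2):
    """Mutation (assumed rule): resample each parameter with probability rho."""
    return [rng.uniform(lo, hi) if rng.random() < rho else g
            for g, lo, hi in zip(gait, lb, ub)]


def pair(g_best, s_best, g_rand, s_rand, rng):
    """Pairing (assumed rule): take each parameter from g_best with probability alpha."""
    total = s_best + s_rand
    alpha = s_best / total if total > 0 else 0.5  # prioritizing weight
    return [a if rng.random() < alpha else b for a, b in zip(g_best, g_rand)]


def next_generation(scored, lb, ub, rng, size=10, keep=6):
    """Build `size` new gaits from a list of (score, gait) pairs."""
    top = sorted(scored, key=lambda sg: sg[0], reverse=True)[:keep]
    s_best, g_best = top[0]
    n_rand = round(0.2 * size)        # random operator share
    n_mut = round(0.4 * size)         # mutation share
    n_pair = size - n_rand - n_mut    # pairing share
    new = [random_gait(lb, ub, rng) for _ in range(n_rand)]
    new += [mutate(rng.choice(top)[1], lb, ub, rng) for _ in range(n_mut)]
    for _ in range(n_pair):
        s_r, g_r = rng.choice(top)
        new.append(pair(g_best, s_best, g_r, s_r, rng))
    return new
```

In an experimental loop, each returned gait would be run on the robot, scored from the camera and current data, and appended to `scored` before the next call.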

Experimentation and Results
The experimental setup (see a short movie in [38]) included a 400 mm × 600 mm horizontal surface (Figure 5). In order to avoid skidding during the experiment, the surface and the robot's feet were covered with fabric. The computation lasted for fifteen generations in which the algorithm tested a total of 146 different gaits and converged to a solution. It began with 10 initial gaits followed by 14 generations of 10 gaits each. During the experiments, the trajectories' length and smoothness improved as the number of generations increased (Figures 6 and 7). Each experiment begins by placing the robot at a predefined starting point. The experiment ends after 7T (seven cycle periods) or, alternatively, when a nonmotion scenario occurs. Each gait was repeated 3 times, and the results were averaged.
The results show that both the mutation and the random functions helped the algorithm to converge. The authors believe that the random procedure was required in order to move "freely" in the solution space at the first stages of the algorithm, preventing convergence to local minima (compare with a simulated annealing approach). The pairing function had little effect on finding the optimal gait. These insights were manually examined in the course of the experimentation by tracking the convergence rate.
The back joint was found to be significant. We performed two sets of experiments:
(1) The back joint was activated in accordance with the parameters Tφ_b and Tψ_b, indicating the time phase of the back joint and its time duration, respectively.
(2) The back joint was fixed to a flat angle during the optimization.
Figure 8 depicts the corresponding results of these two setups. At the first stages of the optimization, the back-joint activation did not play a significant role. A possible explanation for this is that, in these early stages, robot locomotion was far from optimal, and thus, improvements due to any of the parameters were of the same importance.
Nevertheless, in generations 7 to 10, where fine tuning of the parameters took place, having an additional joint that completely changes the robot's dynamics, such as the back joint, is expected to be of importance, and indeed it was. The final walking gait which includes the back parameters was found to be G = (79, 58, 56, 13, 30, 13, 43, 20, 52, 8, 50). The final walking gait which ignores the back parameters was found to be G = (95, 61, 46, 31, 42, 11, 0, 26, 96, 0, 0).

Conclusions
We introduced a real-world set of experiments for optimizing the robot's walking gait using a genetic algorithm. In order to demonstrate our solution, a low-cost mechanical model of a quadrupedal robot was designed and fabricated. Our results show that, after 15 generations, the robot's trajectory improved significantly relative to the first-generation gaits. The improvement was shown in terms of the robot's velocity, trajectory length, and smoothness, as well as power consumption. The authors believe that the concept described in this paper may be implemented in a totally autonomous experimental system, which removes the need to lift the robot to the start point at each test and enables a larger number of generations. In addition, as shown in the literature [39], the back structure has a significant function in the gait quality. Here, we implemented an active back to examine these advantages.
Future work will include a mechanism having the ability to perform a large number of experiments without a human in the loop. Moreover, we shall examine, in addition to the active back, the qualities of an active tail.

Data Availability
The authors confirm that the data supporting the findings of this study are available within the article.

Conflicts of Interest
No potential conflicts of interest were reported by the authors.