Modelling Hysteresis in Shape Memory Alloys Using LSTM Recurrent Neural Network



Introduction
Shape memory alloys (SMAs) are a class of materials that exhibit the unique shape memory effect (SME) due to their crystalline structure. The SME allows the alloy to recover its strain when its temperature is changed, which induces a phase change between the austenite and martensite crystal structures [1]. This shape memory property, along with a high force-to-mass ratio, biocompatibility, and silent operation, makes SMAs ideal for various applications requiring significant force and movement [2,3].
SMAs exhibit nonlinear dynamics coupled with hysteresis behavior, resulting in complex material characteristics. The complexity is further heightened by the dependence of this behavior on factors such as applied stress, transformation temperature, the percentage of martensite and austenite at any given moment, and the constituent elements of the alloy, which makes modeling such alloys even more challenging. Researchers have introduced various modeling approaches to predict SMA behavior, including constitutive models, hysteresis models, and models trained by machine learning (ML) methods.
Constitutive models of SMAs attempt to describe the behavior of these alloys as a function of variables including stress, strain, temperature, and their time rates of change. The Tanaka model, introduced in 1986 [4], is one of the earliest constitutive models proposed for SMAs. In this model, strain, temperature, and the volume fraction of the martensite phase are considered as state variables, and stress is calculated as a function of these variables. Additionally, the phase transformation kinetics is expressed exponentially and is a function of stress and temperature. Liang and Rogers [5] built upon Tanaka's research to introduce a novel set of empirical equations for the phase transformation kinetics. Their approach involves a simplified kinetic relation, represented by a cosine function, to describe the martensitic phase fraction. While the Tanaka and Liang-Rogers models successfully describe the phase transformations between martensite and austenite, they do not account for the detwinning of martensite that produces the SME at lower temperatures [6]. The Brinson model [7] was developed to address this limitation. In the Brinson model, the martensite volume fraction is separated into stress-induced and temperature-induced components. There are also modified versions of the Brinson model [8], but the complexity of the equations increases along with the model's accuracy in predicting SMA behavior.
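As a rough illustration of a cosine-type kinetic relation of the kind attributed to Liang and Rogers above, the following sketch evaluates the martensite fraction during heating at zero stress. The function name, the initial fraction `xi_0`, and the transformation temperatures `A_s` and `A_f` used in the example are illustrative assumptions, not values from this study.

```python
import math


def martensite_fraction_heating(T, A_s, A_f, xi_0=1.0):
    """Cosine-type kinetics for the martensite-to-austenite transformation.

    Zero applied stress is assumed. A_s and A_f are the austenite start and
    finish temperatures, and xi_0 is the martensite fraction before heating.
    """
    if T <= A_s:
        return xi_0  # transformation has not started yet
    if T >= A_f:
        return 0.0  # fully austenite
    a_A = math.pi / (A_f - A_s)
    return 0.5 * xi_0 * (math.cos(a_A * (T - A_s)) + 1.0)


# Hypothetical transformation temperatures in degrees Celsius.
for T in (50.0, 70.0, 90.0):
    print(T, martensite_fraction_heating(T, A_s=60.0, A_f=80.0))
```

The fraction falls smoothly from `xi_0` at `A_s` to zero at `A_f`; the stress dependence and the reverse (cooling) branch are omitted here for brevity.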
Hysteresis in SMAs refers to the phenomenon where the material exhibits different behaviors during loading and unloading cycles, so the material's response is history-dependent: it depends not only on the current input but also on its previous states. Hysteresis models can be classified into two main categories: operator-based models such as the Preisach [9], Krasnoselskii-Pokrovskii [10], and Prandtl-Ishlinskii [11] models, which superpose elementary hysteresis operators (play operators in the Prandtl-Ishlinskii case), and differential equation-based models [12,13]. Operator-based models can accurately predict hysteresis but require complex computations. Differential equation-based models are simpler but less flexible in modeling complex hysteresis.
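To make the operator-based idea concrete, here is a minimal sketch of a discrete play (backlash) operator and a Prandtl-Ishlinskii-style weighted sum of such operators. The thresholds, weights, and input sequence are arbitrary illustrative choices, and this is not the identified model of any cited work.

```python
def play_operator(inputs, r, y0=0.0):
    """Discrete play (backlash) operator with threshold r and initial state y0."""
    y = y0
    outputs = []
    for x in inputs:
        # The output only moves when the input pushes against the dead band.
        y = max(x - r, min(x + r, y))
        outputs.append(y)
    return outputs


def prandtl_ishlinskii(inputs, thresholds, weights):
    """Weighted superposition of play operators."""
    branches = [play_operator(inputs, r) for r in thresholds]
    return [
        sum(w * branch[k] for w, branch in zip(weights, branches))
        for k in range(len(inputs))
    ]


u = [0.0, 1.0, 2.0, 1.0, 0.0]  # input rises, then falls
print(play_operator(u, r=0.5))  # [0.0, 0.5, 1.5, 1.5, 0.5]
print(prandtl_ishlinskii(u, thresholds=[0.0, 0.5], weights=[1.0, 1.0]))
```

For the same input value of 1.0, the play operator returns 0.5 on the rising branch and 1.5 on the falling branch, which is the history dependence that defines hysteresis.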
Constitutive models for SMAs use complex equations to describe hysteresis and nonlinear stress-strain-temperature relationships. Determining these equations is time-consuming. On the other hand, hysteresis models only consider one input parameter, neglecting others that constitutive models include. To overcome these challenges, ML methods have been proposed as an alternative to model SMA behavior.
In recent years, ML methods have been applied to various real-life applications. In the health care field, ML algorithms have been instrumental in diagnosing diseases and predicting patient outcomes with greater accuracy [14,15]. In the finance industry, ML has enabled professionals to predict financial parameters with precision, leading to better investment decisions and risk management [16]. In the field of geology, ML enables professionals to analyze vast amounts of geological and spatial data to make informed decisions on environmental planning [17][18][19] and urban and rural development [20].
Neural networks (NNs), as a subset of ML, have proven effective in representing hysteresis characteristics. NNs are a viable alternative to traditional modeling approaches for capturing the complex behavior of SMAs. In a 2003 study [21], researchers used a shallow NN as an open-loop controller for tracking control of an SMA actuator. The inputs to the NN were the desired outputs and a label indicating whether the system was in the heating or cooling state. In a 2010 study [22], researchers used a shallow NN to estimate the strain of an SMA wire. The inputs to this NN were the resistance of the wire at each moment and binary values indicating whether the system was in the heating or cooling state, but this approach requires the SMA to remain on major hysteresis loops, meaning the SMA must be fully expanded or fully contracted. This NN was then used in a proportional-integral-derivative (PID) control algorithm to estimate the displacement and consequently eliminate the need for a displacement sensor. In 2011, Zakerzadeh et al. used an NN to approximate functions that determined the hysteretic behavior of a numerical Preisach model. The results demonstrated that NNs for numerical function approximation provide higher accuracy in predicting hysteresis behavior compared to the classical Preisach model and numerical approaches [23]. In 2013, Wang and Song introduced a new type of recurrent neural network (RNN) that can predict the hysteresis behavior of an SMA wire at different frequencies. The output of this NN was the strain of the wire at the next moment, and its inputs included the previous output values of the NN and the present value of the electrical current [24]. In a 2018 study [25], researchers [...] current.
In 2020 [26], researchers used two different NNs to predict the temperature of an SMA wire. The first NN was a feedforward network with two hidden layers of 32 and 16 neurons, and its inputs were the differential resistance value, four current values, and a label determining whether the input voltage was increasing or decreasing. The second NN was a long short-term memory (LSTM) network, and its inputs were the current values and the differential resistance values up to three previous moments. The second NN achieved significantly higher accuracy. In 2022, researchers used an innovative NN to estimate the displacement of an SMA actuator consisting of a pair of antagonistic SMA wires [27]. The NN used in that research consisted of three parts. In the first part, an LSTM neural network was used, with the input being the differential resistance values of the wire in the last 50 moments and the output being the temperature values of the wire in the last 50 moments. In the second part, a feedforward network was used to model the static relationship between the temperature value and the martensitic volume fraction of the SMA wire. In the third part, similar to the first part, an LSTM network was used, with the inputs being the martensitic volume fraction values in the last 50 moments and the output being the displacement value at the current moment. The results were compared with those of a 2-layer LSTM neural network, demonstrating that the designed network provides better results.
The aim of this research is to construct a model for an SMA-actuated rotary actuator using LSTM neural networks. In contrast to previous works that utilized NNs, using an LSTM network eliminates the need for a separate label indicating whether the SMA wire is in a loading or unloading state. Furthermore, LSTM networks demonstrate the capability to model both major and minor hysteresis loops [21,22]. Additionally, owing to the time series nature of the LSTM network, there is no requirement for supplemental information such as the frequency of the input signal [25]. Using an LSTM network also allows a simpler architecture, circumventing the need for multiple feedback loops to capture historical relations in hysteresis loops [24].
The paper is structured as follows. Section 2 presents the experimental setup of the SMA-actuated rotary actuator and introduces the input signals used to obtain training data for the LSTM model. In Section 3, the proposed LSTM model is introduced. Section 4 presents the performance of the proposed model and compares the results with a rate-dependent Prandtl-Ishlinskii (RDPI) hysteresis model. Finally, in Section 5, we summarize the outcomes and goals of the research.

Experimental Setup
The test setup shown in Figure 1 consists of a pulley with a radius of 2 cm and a mass of 0.05 kg, actuated by two antagonistic SMA wires. The SMA wires are connected directly from the pulley to a fixed base. At each moment, one of the SMA wires is heated through a voltage applied to its terminals, while the other wire is initially contracted and serves as a spring, generating an opposing moment against the first wire.
The SMA wires used in this research are of the Flexinol type, with a diameter of 0.008 inches and a length of 50 cm. The pulleys can withstand temperatures up to approximately 200 degrees Celsius. To prevent damage, the temperature of the SMA wires never exceeds 160 degrees Celsius. A 3600-pulse rotary encoder (Autonics E50S Series) measures the rotational angle of the pulley. To apply current to the SMA wires, we use an Arduino control board, a single-channel power supply with a maximum output of 32 volts and 3 amps, and a motor driver (LMD5560) that regulates and switches the current from the power supply to the SMA wires.
The input to the system is a pulse-width modulation (PWM) voltage signal applied to the SMA wire's terminals. The output of the system is the angle of rotation of the pulley, sampled at a rate of 20 Hz. To better model the behavior of the SMA wire, we use two types of inputs to obtain the required NN training data. In the first type (Equation (1)), the input value reaches zero in each cycle, while in the second type (Equation (2)), the input value is nonzero in each cycle. The two input types are as follows:
[Equations (1) and (2)]
The input signals have a sinusoidal form that decreases in amplitude over time with a decay time constant of $\tau = 0.008$. In this research, parameter $A$ in Equation (1) and Equation (2) is set to 6 V, so the maximum value of the signals $v_1(t)$ and $v_2(t)$ does not exceed 12 V. Therefore, the PWM duty cycle ranges over 0-100% for 0-12 V. The frequency ($f$) of the input signal varies from 0.03 to 0.07 Hz across trials. Figures 2(a) and 2(b) display examples of input signals for Equation (1) and Equation (2), respectively, both with a frequency of 0.03 Hz. Figures 2(c) and 2(d) then show the corresponding system response to these inputs. Finally, Figures 2(e) and 2(f) illustrate the system response versus duty cycle for Equation (1) and Equation (2) when using the 0.03 Hz input signals.
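The mapping between commanded voltage and PWM duty cycle, together with the 20 Hz sampling grid, can be sketched as follows. This is a minimal sketch assuming a linear 0-12 V to 0-100% mapping as stated above; the function names are hypothetical and the signal equations themselves are not reproduced.

```python
def voltage_to_duty(v, v_max=12.0):
    """Map a commanded voltage (V) to a PWM duty cycle in percent.

    Assumes a linear mapping; values outside 0..v_max are clipped.
    """
    v = min(max(v, 0.0), v_max)
    return 100.0 * v / v_max


def sample_times(duration_s, rate_hz=20.0):
    """Time stamps (s) of samples taken at rate_hz over duration_s seconds."""
    n = int(round(duration_s * rate_hz))
    return [k / rate_hz for k in range(n)]


print(voltage_to_duty(6.0))         # 50.0
print(len(sample_times(1 / 0.03)))  # samples in one period of a 0.03 Hz input
```

At 20 Hz, one period of the slowest input (0.03 Hz, about 33.3 s) spans roughly 667 samples.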

Modelling
The output of a system that exhibits hysteresis behavior depends not only on the current input but also on previous inputs; in other words, the system has a form of memory. As shown in Figure 3, RNNs have a feedback loop, where the network's output is fed back into the network along with the next input. This allows RNNs to retain information about previous inputs in their internal memory, which can then be used to process sequential inputs. In essence, the feedback loop in RNNs gives them a form of memory derived from persisting previous inputs.
Journal of Applied Mathematics
3.1. Recurrent Cell. RNNs are often built using standard recurrent cells, such as sigmoid and tanh units. Figure 4 shows a diagram of a typical recurrent sigmoid unit. The mathematical equation defining this standard recurrent cell is
$$h_t = \sigma\left(W_h h_{t-1} + W_x x_t + b\right),$$
where $x_t$ and $h_t$ are the input and recurrent information at time $t$, respectively, $W_h$ and $W_x$ are weights, and $b$ is the bias (for a tanh unit, $\sigma$ is replaced by $\tanh$).
Standard RNNs with conventional recurrent units struggle with long-term dependencies; the larger the gap between relevant inputs, the harder it is to learn connections between them. Analyses by Hochreiter and Schmidhuber [28] and Bengio et al. [29] identified key reasons for this long-term dependency problem: error signals propagating backward through time tend to either explode or vanish.
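The recurrent cell described above can be sketched in a few lines of plain Python. This is a minimal sketch assuming a sigmoid activation and list-based vectors; the weight values in the example are arbitrary, not trained parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def recurrent_step(h_prev, x_t, W_h, W_x, b):
    """One step of a standard recurrent cell: sigmoid(W_h h_prev + W_x x_t + b)."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        z += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        h_t.append(sigmoid(z))
    return h_t


def run_sequence(xs, W_h, W_x, b):
    """Run the cell over a sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    for x_t in xs:
        h = recurrent_step(h, x_t, W_h, W_x, b)
    return h


# Arbitrary illustrative weights for a one-dimensional cell.
W_h, W_x, b = [[0.5]], [[1.0]], [0.0]
h_up = run_sequence([[1.0], [0.0]], W_h, W_x, b)
h_down = run_sequence([[-1.0], [0.0]], W_h, W_x, b)
print(h_up, h_down)
```

Both sequences end with the same input (0.0), yet the final hidden states differ because the cell carries information about the earlier input, which is the memory property that hysteresis modeling relies on.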
The original LSTM was proposed by Hochreiter and Schmidhuber in 1997 [28] to address the problem of learning long-term dependencies in sequence data that arises in standard recurrent cells. The main difference between an LSTM and a standard RNN is the structure of the LSTM cell. Figure 5 illustrates an LSTM cell, which contains an input gate, an output gate, and a forget gate. The gates act as pathways to control the flow of information, allowing only relevant data to pass through.
The mathematical expressions that define the LSTM cell, as shown in Figure 5, are as follows: where $X_t$ is a vector consisting of the duty cycle of the voltage applied to the SMA wire ($v_t$) and the angle of the pulley ($\theta_t$).
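For reference, an LSTM cell with input, forget, and output gates is commonly written in the following standard form. The symbol names ($f_t$, $i_t$, $o_t$, $\tilde{c}_t$, $c_t$, and the weight and bias subscripts) are assumptions chosen to match the $W_h$, $W_x$, $b$ notation of the recurrent cell, and may differ from the notation used for Figure 5.

```latex
\begin{align}
f_t &= \sigma\left(W_{fh} h_{t-1} + W_{fx} X_t + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_{ih} h_{t-1} + W_{ix} X_t + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_{ch} h_{t-1} + W_{cx} X_t + b_c\right) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
o_t &= \sigma\left(W_{oh} h_{t-1} + W_{ox} X_t + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{align}
```

Here $\odot$ denotes element-wise multiplication. The additive update of $c_t$ is what lets error signals flow across many time steps without vanishing as quickly as in the standard recurrent cell.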
To determine the optimal time window size, $d$, the NN has been trained using different values of $d$. Prediction errors on evaluation data have been analyzed, as shown in Table 1.
The results indicate that as $d$ increases from 1 to 3, the prediction error decreases significantly. However, increasing $d$ further no longer reduces the error and in fact slightly increases it. Given these observations, we conclude that the most suitable time window size is $d = 3$. The architecture of the proposed model is presented in Figure 6.
At each time step, the model takes the current and previous 2 time steps of data on the pulley's rotational angle as well as the duty cycle of the input voltage as input. The input data passes through 3 LSTM layers, with the output of the last LSTM layer being the hidden state vector $h_t$. This vector $h_t$ is then fed into a fully connected layer with 64 neurons. The output from this fully connected layer gives the predicted rotational angle for the next time step. The network was trained offline using the neural network toolbox in MATLAB software. The training and validation losses are shown in Figure 7, while the remaining hyperparameters of the network can be found in Table 2.
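The input windows described above can be assembled from the recorded duty cycle and angle sequences as in the following sketch. The function name, the `[duty, angle]` feature order, and the sample values are assumptions made for illustration; the original training pipeline was implemented in MATLAB and is not reproduced here.

```python
def build_windows(duty, angle, d=3):
    """Build (inputs, targets) pairs for one-step-ahead training.

    Each input is a window of d consecutive [duty, angle] pairs ending at
    time t; the matching target is the angle at time t + 1.
    """
    if len(duty) != len(angle):
        raise ValueError("duty and angle must have the same length")
    inputs, targets = [], []
    for t in range(d - 1, len(angle) - 1):
        window = [[duty[s], angle[s]] for s in range(t - d + 1, t + 1)]
        inputs.append(window)
        targets.append(angle[t + 1])
    return inputs, targets


# Made-up samples, only to show the shapes.
duty = [10, 20, 30, 40, 50]
angle = [0, 1, 2, 3, 4]
X, y = build_windows(duty, angle, d=3)
print(X[0], y[0])  # [[10, 0], [20, 1], [30, 2]] 3
```

With `d = 3`, a recording of N samples yields N - 3 training pairs.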
Once the training of the LSTM model with the data specified in Section 2 is completed, its performance is assessed by evaluating it on 5 sets of test data that were not used for training. The inputs for the test sets are generated using Equations (12)-(16):
[Equations (12)-(16)]
Furthermore, a signal with a variable frequency has been used.
Table 3 shows the model's root mean square error (RMSE) values for one-step-ahead predictions across the five input groups $I_1$ to $I_5$.
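The RMSE metric used for evaluation can be computed as in the short sketch below. The function name and the sample values are illustrative only and are not taken from Table 3.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up angles in degrees.
print(rmse([10.0, 12.0, 15.0], [10.5, 11.5, 15.0]))
```

Because the errors are squared before averaging, a few large deviations (for example at the peaks of the response) weigh more heavily than many small ones.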

Multi-Step-Ahead Prediction. In this section, we use the model to predict the system's response without having its actual values at each moment. For this purpose, the model's predictions are fed back as inputs to the model for subsequent time steps. In this case, a measured angle of zero degrees is used as the initial value. The model is used in this setup to predict all five groups of test data given in Equations (12)-(16). The results are presented in Figure 9. The model's accuracy is lower for multi-step-ahead predictions than for one-step-ahead predictions, as seen in Figure 9. This outcome is expected because error accumulates over time when the model recursively predicts multiple steps into the future. Table 4 shows the model's RMSE values for multi-step-ahead predictions across the five input groups $I_1$ to $I_5$.
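The recursive scheme described above can be sketched as follows. Here `model` stands for any callable that maps a window of `[duty, angle]` pairs to the next angle; it is a hypothetical stand-in for the trained LSTM. Filling the first window with the initial angle is an assumption made for this sketch.

```python
def rollout(model, duty, initial_angle=0.0, d=3):
    """Multi-step-ahead prediction that feeds predictions back as inputs.

    Only the duty cycle sequence and an initial angle are required; no
    measured angles are used after the start.
    """
    if len(duty) < d:
        raise ValueError("need at least d duty cycle samples")
    # Assumption: the first d angles are set to the initial angle.
    angles = [initial_angle] * d
    for t in range(d - 1, len(duty) - 1):
        window = [[duty[s], angles[s]] for s in range(t - d + 1, t + 1)]
        angles.append(model(window))  # the prediction becomes a future input
    return angles


def toy_model(window):
    """Toy stand-in model: last angle plus a fraction of the last duty cycle."""
    last_duty, last_angle = window[-1]
    return last_angle + 0.1 * last_duty


print(rollout(toy_model, duty=[1.0, 1.0, 1.0, 1.0, 1.0]))
```

Any error in one prediction enters the next window, which is why accuracy in this configuration is expected to be lower than in the one-step-ahead case.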

Comparison of the Proposed Model with RDPI Model
The rate-dependent Prandtl-Ishlinskii (RDPI) model is often used to model hysteresis behavior in SMAs and other smart actuators [11,30]. This model can also incorporate the effect of excitation frequency in its equations. In this study, we used the same data that was used to train the NN to identify the coefficients of the Prandtl-Ishlinskii model presented in a 2019 paper [11]. The obtained coefficients for the constructed model are shown in Table 5. Furthermore, the results of this RDPI model for the five test data groups are compared with those of the proposed model in the multi-step-ahead prediction configuration in Figure 10, and the corresponding errors are presented in Table 6.
The results demonstrate that the RDPI model is unable to accurately capture the system's response at peak values, as seen in Figure 10. In contrast, the proposed LSTM model successfully models these points in its multi-step-ahead prediction mode, providing significantly more accurate results. Furthermore, when comparing the errors of the two models in Table 6, it is evident that the LSTM model's errors are approximately 70% lower on average than those of the RDPI model.
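One plausible way to arrive at an "approximately 70% lower on average" figure is to average the per-input relative error reductions, as sketched below. The function name is hypothetical, the numbers are placeholders chosen for illustration and are not the values in Table 6, and the averaging convention is an assumption.

```python
def mean_relative_reduction(baseline_errors, model_errors):
    """Average percentage by which model errors fall below baseline errors."""
    reductions = [
        (base - new) / base for base, new in zip(baseline_errors, model_errors)
    ]
    return 100.0 * sum(reductions) / len(reductions)


# Placeholder RMSE values, NOT taken from Table 6.
rdpi_rmse = [10.0, 8.0]
lstm_rmse = [3.0, 2.4]
print(mean_relative_reduction(rdpi_rmse, lstm_rmse))  # about 70
```

An alternative convention compares the mean errors of the two models directly; the two conventions generally give slightly different percentages.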

Conclusion
Modeling the behavior of SMAs is challenging due to their nonlinear dynamics and hysteresis. This research is aimed at creating a model for predicting the pulley rotational angle in an SMA wire-driven rotational actuator. To capture the hysteresis behavior of the SMA, which depends on current and previous inputs to the SMA wire, we employ an LSTM recurrent neural network capable of retaining previous input information.
The LSTM model developed in this research effectively predicts the nonlinear hysteretic behavior of the SMA wire actuator with high accuracy. The model takes as input the SMA wire voltages and pulley angles at the current and two previous time steps and predicts the pulley angle at the next time step. In the online configuration, where encoder data is available, the LSTM model generates accurate one-step-ahead predictions. In offline mode, without live encoder data, the LSTM model uses its own predictions as inputs for subsequent time steps. In this configuration, the results of the LSTM model were compared to those of a rate-dependent Prandtl-Ishlinskii model, highlighting the LSTM's superior accuracy. The success of the LSTM model in accurately capturing the hysteretic behavior of the SMA wire actuator is the key outcome of this research.

Figure 1: SMA and rotary encoder connected to the pulley.

Figure 6: Architecture of the proposed RNN model.

Figure 7: Loss of LSTM network during training.

Figure 8: Actual and predicted angle by LSTM model in one-step-ahead prediction configuration. (a) $I_1$ as input signal. (b) $I_2$ as input signal. (c) $I_3$ as input signal. (d) $I_4$ as input signal. (e) $I_5$ as input signal.

Figure 9: Actual and predicted angle by LSTM model in multi-step-ahead prediction configuration. (a) $I_1$ as input signal. (b) $I_2$ as input signal. (c) $I_3$ as input signal. (d) $I_4$ as input signal. (e) $I_5$ as input signal.

Figure 10: Actual and predicted angle by LSTM model in multi-step-ahead prediction configuration compared with RDPI model. (a) $I_1$ as input signal. (b) $I_2$ as input signal. (c) $I_3$ as input signal. (d) $I_4$ as input signal. (e) $I_5$ as input signal.

Table 1: Errors of LSTM model for different values of $d$.

Table 3: Errors of LSTM model in one-step-ahead prediction configuration for five types of inputs.

Table 4: Errors of LSTM model in multi-step-ahead prediction configuration for five types of inputs.

Table 6: Comparison of prediction errors for LSTM and RDPI models across 5 input types.