Exploring Physics-Informed Neural Networks for the Generalized Nonlinear Sine-Gordon Equation



Introduction
Differential equations provide a powerful framework for describing a wide range of engineering, mathematical, and scientific phenomena. They are particularly valuable in capturing heat transfer processes, fluid dynamics, wave propagation in electronic circuits, and the mathematical modeling of chemical reactions. One notable example of a nonlinear hyperbolic partial differential equation (PDE) is the nonlinear sine-Gordon equation (NLSGE), which dates back to the nineteenth century and originally emerged in the study of surfaces with constant negative curvature [1][2][3]. This equation finds extensive application in simulating and describing various physical phenomena across engineering and scientific disciplines, including nonlinear waves, the propagation of fluxons in Josephson junctions, and the dislocation behavior of metals [4][5][6][7][8][9].
The NLSGE has found numerous applications in various scientific and engineering domains. In the field of condensed matter physics, this equation has been used to study phenomena such as solitons and topological defects [10]. In the realm of nonlinear optics, the equation is used to model the propagation of optical pulses in nonlinear media, particularly in the context of optical fibers [11]. Furthermore, in the study of superconductivity, the NLSGE is used to describe the behavior of Josephson junctions, which are key components in superconducting devices [12]. The equation has also found application in surface science, where it describes the dynamics of atoms and molecules on surfaces, including the propagation of surface waves [13]. In addition, the NLSGE has been applied in biophysics to model phenomena such as nerve impulse propagation and protein dynamics [14]. These are just a few examples of the wide-ranging applications of the NLSGE in diverse scientific and engineering problems. Readers interested in additional information should consult the monographs [15][16][17][18][19][20][21][22][23][24].
The NLSGE has recently been the subject of extensive computational and analytical analysis due to its significance in nonlinear physics. For example, Babu and Asharaf [25] used a differential quadrature technique based on a modified set of cubic B-splines to numerically solve nonlinear SGEs in one and two dimensions, as well as their coupled form. The modification employed in this approach achieves optimal accuracy of order four in the spatial domain. Spatial derivatives are approximated using the differential quadrature technique, and weight coefficients are calculated using the set of modified cubic B-splines. In a different study, Shiralizadeh et al. [26] implemented the numerical method of rational radial basis functions to solve the perturbed and unperturbed NLSGEs with Dirichlet or Neumann boundary conditions. This method is particularly suitable for cases where the solution exhibits a steep front or sharp gradients. Furthermore, Babu and Asharaf [27] employed the Daftardar-Gejji and Jafari method to obtain an approximate analytical solution for the NLSGE. They compared the obtained solution with the variational iteration method to assess its accuracy.
In 2022, Deresse [28] successfully combined the double Sumudu transform with an iterative approach to obtain an approximate analytical solution for the one-dimensional coupled NLSGE. The double Sumudu transform alone is insufficient to solve this particular equation. As a result, the linear component of the problem was addressed using the double Sumudu transform, while the nonlinear part was handled through an additional iterative approach. The two-dimensional stochastic time-fractional NLSGE was investigated by the authors of [29] in 2023. To find the numerical solution, they employ the clique polynomial approach, in which the clique polynomial serves as a basis function for the operational matrices. For more details, refer to the following references: [30][31][32][33][34][35][36].
These recent developments highlight the growing interest in tackling the challenges posed by the NLSGE, and researchers use various numerical and analytical techniques to explore its solutions and properties. This paper aims to introduce a deep learning-based method, the physics-informed neural network (PINN), to obtain the solution of the NLSGE with Dirichlet and Neumann boundary conditions. PINNs are a scientific machine learning technique used to solve problems involving PDEs [37]. By training an artificial neural network (ANN) to minimize a loss function, PINNs approximate the solution of a PDE. This loss function incorporates several terms, including the initial and boundary conditions along the boundary of the space-time domain, as well as the PDE residual evaluated at specific points within the domain, known as collocation points. This approach allows PINNs to capture the essential physics of the problem and provide accurate solutions throughout the domain [38][39][40].
An ANN is a parallel information-processing system that shares similarities with certain brain functions. Composed of neurons and synaptic weights, an ANN learns to perform complex computations [41]. By emulating the functioning of the human brain, the network receives inputs from various sources, combines them, applies nonlinear operations, and produces an output [42][43][44]. The architecture of an ANN consists of three types of layers: input, hidden, and output, with neurons (units) in each layer [45][46][47]. The architecture of the ANN processor is scalable, allowing in principle for an arbitrary number of layers and neurons in each layer, and it can implement both feedforward and dynamic recurrent networks [46,48].
Approximating highly nonlinear functions has become an attractive application of NNs due to their inherent capabilities. However, in low to moderate dimensions, PDE solvers based on NNs or deep NNs typically fall short when compared to classical numerical solution methods. This is primarily because solving an algebraic equation is generally easier than dealing with the highly nonlinear, large-scale optimization problems associated with NN training [49,50].
Furthermore, traditional numerical approaches have developed sophisticated error analysis techniques, an area where NN-based solvers currently lag. Consequently, specialized techniques have emerged over time to tackle specific issues, often incorporating constraints or underlying physical assumptions directly into the approximations [51]. One notable technique in this domain is PINNs, which have gained popularity for rapid prototyping when efficiency and very high accuracy are not the primary concerns. PINNs can be applied to virtually any differential equation, making them versatile tools for approximation [52].
The authors of [53] demonstrated promising results indicating that PINNs can achieve good prediction accuracy, provided that the given PDE is well posed and a sufficient number of collocation points are available. PINNs seek to identify an NN within a specific class of NNs that minimizes the loss function, resulting in an approximation of the PDE's solution [53]. Unlike the classic variational concept, which minimizes an energy functional, PINNs modify this approach. A notable distinction between PINNs and variational methods is that not all PDEs satisfy a variational principle. The formulation of PINNs, however, allows their application to a wide range of PDEs, regardless of whether the PDE possesses a variational principle [54].
In their work, Shin et al. [54] provide a theoretical justification for PINNs in the context of linear second-order elliptic and parabolic-type PDEs. They demonstrate that the sequence of minimizers strongly converges to the PDE solution in the set of continuous functions. Moreover, they argue that when each minimizer satisfies the initial/boundary conditions, the convergence mode becomes the Sobolev space of order one.
Recently, the number of scientific publications on PINNs has increased rapidly, which confirms their effectiveness. For example, Beck et al. [55] obtained solutions of stochastic differential equations and Kolmogorov PDEs that suffer from the curse of dimensionality. In [37], the authors introduced an innovative approach that combines the power of NNs with knowledge of physics to tackle complex problems related to nonlinear PDEs. The authors propose a framework in which NNs are trained to approximate the solution of these equations while incorporating physical principles as constraints. This approach enables the accurate and efficient solution of both forward and inverse problems, offering great potential for applications in various scientific and engineering fields. The study contributes to the growing field of physics-informed machine learning, providing a promising avenue for advancing the understanding and solution of nonlinear systems.
Blechschmidt and Ernst [40] provided a comprehensive overview of recent approaches to solving PDEs using NNs. They discuss the taxonomy of informed deep learning, present a literature review of the field, and highlight the potential of machine learning frameworks to accelerate numerical computations of time-dependent PDEs. The authors used a PINN to solve a high-dimensional linear heat equation as an illustration and suggested that PINNs can offer attractive approximation capabilities for highly nonlinear and high-dimensional problems.
In [56], the authors presented a novel approach to solving PDEs in complex geometries using deep feedforward NNs. The paper explores the application of deep NNs in approximating solutions to PDEs and demonstrates their effectiveness in solving systems of ordinary differential equations. The authors provide insights into the architecture of the NN and discuss the weight connections between the neurons in different layers. The research contributes to the field of computational mathematics by introducing a unified framework that combines deep learning techniques with the solution of PDEs, paving the way for more accurate and efficient numerical methods in complex geometries. To effectively solve differential equations, the authors of [57] presented DeepXDE, a powerful deep learning library that combines the advantages of deep NNs and PINNs. Furthermore, Schäfer [58] applied Dirichlet boundary conditions to a PINN solution of the one-dimensional heat equation. To solve a single instance of the PDE, the authors compared a PINN to an NN with prescribed initial and boundary conditions. It turned out that PINNs are more accurate than plain NNs for a limited number of training samples. However, it should be noted that a PINN uses more computation time than an NN because each iteration includes a gradient evaluation. As the runtime grows exponentially with an increasing number of input features, this can be a serious bottleneck for higher-dimensional problems.
More recently, [59] presented two novel PINN architectures that satisfy various invariance conditions for constructing robust and efficient deep learning-based subgrid-scale turbulence models for use in the large eddy simulation procedures widely used in fluid engineering applications. The first architecture is called the tensor basis neural network (TBNN), and the second is a Galilean invariance embedded neural network (GINN), which incorporates Galilean invariance and takes as input the independent components of the integrity basis tensors in addition to the invariant inputs in a single input layer. A deep learning-accelerated computational framework based on PINNs is presented by the authors of [60] for the solution of the linear continuum elasticity equations. The authors proposed a multi-objective loss function that included terms fitting data-driven physical knowledge across randomly chosen collocation points in the problem domain, constitutive relations derived from the governing physics, terms corresponding to the residual of the governing PDE, and different boundary conditions. In a different study, a multi-objective loss function-based PINN is used by the authors of [61] to obtain the solution of a data-driven elastoplastic solid mechanics problem.
Even though many studies use PINNs to solve a variety of problems, most focus on elliptic and parabolic differential equations. There are very few research papers on the use of PINNs to solve hyperbolic PDEs. This is because hyperbolic PDEs such as the NLSGE involve both second-order time derivatives and spatial derivatives. Such a problem contains an initial condition involving a time derivative, which adds an extra layer of complexity to the solution process, as the solution must satisfy the dynamics of the PDE while also matching the specified initial data. In [62], the PINN method was used to solve linear hyperbolic PDEs, considering both forward and inverse problems. The examples considered by the author are homogeneous linear wave equations. The author did not, however, investigate the PINN for nonlinear, inhomogeneous hyperbolic PDEs. In the present work, we use PINNs to solve the NLSGE (1), an inhomogeneous nonlinear class of hyperbolic wave equation containing a second-order derivative in time, taking inspiration from [62]. We focus on two categories of boundary conditions: Dirichlet and Neumann. To minimize the loss function built from the residuals of the governing equation, the initial conditions, and the boundary conditions, a PINN technique with a multi-objective loss function is employed. In addition, we conducted experimental simulations to assess the impact of different neural architectures on the performance of the model. Subsequently, we implement the developed algorithm using the Python-based software library DeepXDE as a computational tool [57].
The remaining parts of this manuscript are organized as follows: In Section 2, the governing problem is presented with some preliminary descriptions. Fundamental ideas, theorems, definitions, and an algorithm for PINNs are addressed in Section 3 for the specified problems. The method is validated in Section 4 using numerical experiments for Dirichlet and Neumann boundary conditions, and finally, concluding remarks are drawn in Section 5.

The Governing Equation
The generalized Cauchy-type NLSGE employed in this paper is given by [63]:

u_tt + β u_t − α Δu + φ(x) sin(u) = f(x, t), x ∈ R^n, t > 0. (1)

Here, Δ represents the Laplacian operator and n the dimension of the space variable x. The function φ(x) can be interpreted as the Josephson current density, while the parameters α and β are real numbers with α, β ≥ 0. The dissipative term, with coefficient β, characterizes the presence of damping in the equation. When β > 0, (1) is the damped SGE, while for β = 0, equation (1) reduces to the undamped SGE

u_tt − α Δu + φ(x) sin(u) = f(x, t). (2)

If f(x, t) = 0, the undamped SGE (2) conserves the energy defined by

E(t) = (1/2) ∫ [u_t² + α|∇u|² + 2φ(x)(1 − cos u)] dV,

which is not valid for the damped system (1) [64]. Here dV = d^n x is the Euclidean n-dimensional volume differential.
In the case of n = 1, with Δu = ∂²u(x, t)/∂x² = u_xx, (1) represents the NLSGE in one dimension. The equation is subject to the initial conditions:

u(x, 0) = g₁(x), u_t(x, 0) = g₂(x), x ∈ [a, b],

along with either Dirichlet boundary conditions:

u(a, t) = h₁(t), u(b, t) = h₂(t), t ∈ [0, T],

or Neumann boundary conditions:

u_x(a, t) = h₁(t), u_x(b, t) = h₂(t), t ∈ [0, T].

In this study, our aim is to address the solution of this equation using PINNs [37]. PINNs employ NNs specifically designed for solving PDEs by minimizing a loss function that incorporates the given PDE and both the initial and boundary conditions. We develop a PINN algorithm and implement it using the Python-based software library DeepXDE. Additionally, we conduct various experiments to identify the optimal neural architecture for our purposes.

The Mathematical Description of Neural Networks
Definition 1 (see [65,66]). Let d ∈ N. We define an artificial neuron v: R^d → R as a mapping with weight w ∈ R^d, bias b ∈ R, and activation function σ: R → R. The neuron's output is given by the expression

v(x) = σ(w · x + b).

The role of the activation function σ is to produce the output from the set of input values fed to a node (or a layer). Each activation function has benefits and drawbacks, and there is no set rule for selecting an activation function for a particular task. In machine learning, the activation functions most commonly used with PINNs are the sigmoid function σ(x) = 1/(1 + e^(−x)), the hyperbolic tangent function σ(x) = tanh(x), and the ReLU function σ(x) = max(0, x).

Definition 2. A deep feedforward neural network is defined as a function of the form

Ŷ = σ^(L)(W^(L) · σ^(L−1)(W^(L−1) ⋯ σ^(1)(W^(1) X + b^(1)) ⋯ + b^(L−1)) + b^(L)),

consisting of multiple layers, where each layer is a semi-affine function incorporating a univariate, continuous nonlinear activation function σ^(l). The weight matrices W = (W^(1), . . ., W^(L)) and the offsets (biases) b = (b^(1), . . ., b^(L)) define the parameters of the network. This deep feedforward NN processes input data X and produces the output Ŷ, representing the predictions or results of the network computation [66].
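The artificial neuron of Definition 1 and the three activation functions named above can be sketched in a few lines of plain Python. This is a minimal illustration of the definitions, not the paper's implementation:

```python
import math

# A single artificial neuron v(x) = sigma(w . x + b), as in Definition 1,
# together with the sigmoid, tanh, and ReLU activation functions.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def neuron(x, w, b, sigma):
    """Output of an artificial neuron with weights w, bias b, activation sigma."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # affine part w . x + b
    return sigma(z)
```

For example, `neuron([1.0, 2.0], [0.5, -0.25], 0.1, relu)` first forms the affine combination 0.5·1.0 − 0.25·2.0 + 0.1 = 0.1 and then applies the activation.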

The PINNs Algorithm for 1D NLSGE with Dirichlet BCs.
Applied Computational Intelligence and Soft Computing
In this subsection, we present the PINN approach for approximating the solution u: [0, T] × Ω → R of the one-dimensional problem (10) with Dirichlet boundary conditions. The problem can be stated as follows:

u_tt + β u_t − α u_xx + φ(x) sin(u) = f(x, t), (x, t) ∈ Ω × (0, T],

subject to the conditions:

u(x, 0) = g₁(x), u_t(x, 0) = g₂(x), x ∈ Ω,
u(x, t) = h(x, t), (x, t) ∈ ∂Ω × [0, T],

where Ω ⊂ R represents a bounded domain and T denotes the final time. The PINN method combines the given PDE with physical constraints placed on the network to ensure that the solution respects the physics of the problem. In the PINN method, an NN is used to approximate the solution, and the equations are imposed in the least-squares sense at a set of nodal points.
(i) Construct an ANN û(x, t; P) to serve as an approximation of the true solution u(x, t).
(ii) Set up a training set that will be used to train the NN.
(iii) Formulate an appropriate loss function that considers the residuals of the PDE, initial, boundary, and final conditions.
(iv) Train the NN by minimizing the loss function established in the previous step.

3.3. Step 1: Deep Neural Network. We employ the following notations: the superscript (i) denotes the i-th data point (collocation point) or training example, while the superscript (l) represents the l-th layer in the network. The input size is denoted as n_x and the output size as n_y. Additionally, n_l refers to the number of neurons in the l-th layer, and L signifies the total number of layers in the network. The input is denoted by X, the set of collocation points comprising points from the interior and the boundary of the domain. The weight matrix for the l-th layer is denoted as W^(l) ∈ R^(n_(l+1) × n_l), and the bias vector in the l-th layer is represented as b^(l) ∈ R^(n_(l+1)). The predicted output vector is denoted as û ∈ R^(n_y), or equivalently written as a^(L), where L indicates the total number of layers in the network. Figure 1 displays a sketch of a deep NN diagram. The structure shown is an advancement of the NN structure of [48,71], designed for systems of ordinary differential equations.
To solve the one-dimensional NLSGE, our input data have the form (x_i, t_i) ∈ R^(1+1). That is, according to the notations described above, n_x = 2. Furthermore, n_y = 1, since we have only one network output û(x, t; P), where P represents the parameters consisting of the weights and biases. We selected the DNN scheme to have two nodes in the input layer and one node in the output layer, which contains the value of û(x, t), to generate the u(x, t) that solves (7) using the PINN. There were four hidden layers in the structure, and each layer contained fifty units (neurons). We consider a deep feedforward NN, whose main objective is to approximate a function, in this case u(x, t) for any input (x, t).
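The forward pass of the architecture just described (two inputs, four hidden layers of fifty tanh units, one output) can be sketched in plain Python. The paper builds its network with DeepXDE; this sketch only illustrates the layer structure, with small random weights standing in for trained parameters:

```python
import math
import random

# Forward evaluation of a [2, 50, 50, 50, 50, 1] feedforward network:
# input (x, t), four hidden tanh layers of 50 units, one linear output.

LAYERS = [2, 50, 50, 50, 50, 1]

def init_params(layers, seed=0):
    """Small random weights and zero biases for each layer (placeholder only)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layers[:-1], layers[1:]):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        params.append((W, b))
    return params

def forward(params, x, t):
    """Evaluate u_hat(x, t) by propagating (x, t) through all layers."""
    a = [x, t]
    for l, (W, b) in enumerate(params):
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        # tanh on hidden layers, identity on the output layer
        a = z if l == len(params) - 1 else [math.tanh(v) for v in z]
    return a[0]  # scalar network output u_hat(x, t)
```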
In our case, the solution û(x, t; P), which corresponds to the output of the NN, is constructed as described in [72], namely as the composition of layers a^(L) ∘ a^(L−1) ∘ ⋯ ∘ a^(1), where:
(i) a^(l): R^(d_in) → R^(d_out) is the l-th layer with n_l nodes,
(ii) W^(l) ∈ R^(n_l × n_(l−1)) and b^(l) ∈ R^(n_l) are the weights and the biases, and θ = {W^(l), b^(l)}_(l=1)^(L−1) are the parameters of the NN, and
(iii) σ is an activation function which acts componentwise.

3.4. Step 2: Training Dataset. When using a PINN to solve a PDE, it is important to split the collocation points into two disjoint sets, training data and test data, to ensure accurate model evaluation [73]. The training data are used to train the PINN, while the test data are used to evaluate the model's performance. In machine learning, these data are typically split in a ratio of 80% for training and 20% for testing [74]; this division is sometimes referred to as the 80-20 rule. In this study, we used 500 points for training and 125 for testing. The training data X ⊂ Ω̄ is the union of the set X_Ω ⊂ Ω, which contains points selected from the interior of the domain, and the set X_Γ, which contains points taken from the boundary. The general training set X of the PINN model for the initial/boundary value problem is the union of these sets.
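The 80-20 split described above can be sketched as follows. The point counts (500 training, 125 test) follow the text; the sampling domain (0, 1) × (0, 2) is the one used in the Dirichlet example later, and the uniform sampling strategy here is one simple choice, not necessarily the paper's:

```python
import random

# An 80-20 train/test split of collocation points (the "80-20 rule").

def sample_points(n, seed=0):
    """Draw n collocation points uniformly from (0, 1) x (0, 2)."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 2.0)) for _ in range(n)]

def train_test_split(points, train_frac=0.8, seed=1):
    """Shuffle and split the points into disjoint training and test sets."""
    pts = points[:]
    random.Random(seed).shuffle(pts)
    k = int(len(pts) * train_frac)
    return pts[:k], pts[k:]

points = sample_points(625)             # 625 collocation points in total
train, test = train_test_split(points)  # 500 for training, 125 for testing
```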
3.5. Step 3: Loss Function. The total loss function J(X; P) combines the contributions of the losses due to the residual of a given NN approximation û:
(i) differences of the network approximation from the initial conditions at the initial collocation points,
(ii) differences of the network approximation from the boundary conditions at the boundary collocation points, and
(iii) the residual of the PDE at the interior collocation points.
Similar to the approach originally proposed by the authors of [37], the PINN approach for the solution of the initial and boundary value problem now proceeds by minimizing the loss function of the parameters P given by (15), so that the optimal parameters P* of the network minimize J(X; P). We apply the loss function given by (15) on the training samples (parts of the domain and the boundary; see Figure 2) and obtain the blue line in Figure 3, which shows that the training loss decreases with training time. At the same time, we calculate the loss function on the test samples.
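The multi-objective loss built from the three contributions above can be sketched as a weighted sum of mean-squared terms. Here `residual`, `ic_error`, and `bc_error` are assumed callables (hypothetical names, for illustration only) returning the pointwise errors of the current network approximation:

```python
# A minimal sketch of the multi-objective PINN loss: mean-squared terms
# for the PDE residual, the initial conditions, and the boundary conditions.

def mse(values):
    """Mean of the squared entries of a list."""
    return sum(v * v for v in values) / len(values)

def pinn_loss(residual, ic_error, bc_error,
              X_interior, X_initial, X_boundary, weights=(1.0, 1.0, 1.0)):
    """Total loss J = w_r * MSE(residual) + w_0 * MSE(IC) + w_b * MSE(BC)."""
    w_r, w_0, w_b = weights
    J_r = mse([residual(x, t) for (x, t) in X_interior])  # PDE residual term
    J_0 = mse([ic_error(x) for x in X_initial])           # initial-condition term
    J_b = mse([bc_error(t) for t in X_boundary])          # boundary-condition term
    return w_r * J_r + w_0 * J_0 + w_b * J_b
```

With equal unit weights this reduces to the plain sum of the three mean-squared errors, which is the simplest common choice.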

The Combined Adam and L-BFGS-B Optimization Algorithms.
Like that of NNs, the training process for PINNs corresponds to the minimization problem min_P J(X; P). Training of the network parameters P is carried out using a gradient-based approach such as Adam [75] or L-BFGS-B (the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints) [76]. However, the required number of iterations depends highly on the problem (e.g., on the smoothness of the solution); see [57]. The partial derivatives are necessary at every stage of the training process. Therefore, it is computationally expensive to calculate the PINN loss in each iteration if the interior domain contains a significant number of points. Lu et al. [57] proposed a method called residual-based adaptive refinement to increase the effectiveness of the training procedure. To validate the efficacy of these optimization techniques and enable their reuse, we conduct three separate experiments in this paper: one for the Adam optimization algorithm, one for L-BFGS-B optimization, and a final one for the combination of both Adam and L-BFGS-B.
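For reference, the first of the two optimizers can be sketched as a minimal single-parameter Adam update (L-BFGS-B is typically taken from an existing library, e.g. SciPy, and is not re-implemented here). The hyperparameter values are the common defaults, with the learning rate of 0.001 used in this paper:

```python
import math

# One Adam update of a scalar parameter p, given the gradient of the loss.

def adam_step(p, grad, m, v, k, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return (new_p, new_m, new_v) after one Adam step (iteration k >= 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** k)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** k)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# Minimizing the toy loss J(p) = p^2 (gradient 2p) for a few thousand steps:
p, m, v = 1.0, 0.0, 0.0
for k in range(1, 2001):
    p, m, v = adam_step(p, 2 * p, m, v, k)
```

In practice one runs Adam first to get near a minimum and then switches to the quasi-Newton L-BFGS-B for fast local convergence, which is the combination compared in the experiments.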

Weight Initialization.
Due to the randomness of the initial weight state in deep learning, each training run can produce a distinct set of outcomes. If the weights are set too close to zero, the variance of the input signal decreases as it moves through each layer of the network. If the weights are excessively large, the network either approaches a vanishing gradient problem or the variance of the signal tends to amplify as it moves through the network layers. Therefore, choosing weights that are either too large or too small is not a feasible initialization, since in both circumstances the initialization lies outside the basin of attraction of the optimization procedure. Several well-known randomized weight initialization techniques have been developed over time, including uniform, Gaussian, Glorot uniform, and Glorot normal initialization. When used in conjunction with symmetric activation functions, the Glorot uniform weight initializer offers a systematic method of weight initialization that can aid training stability, gradient flow, and convergence in NNs [77,78]. Taking this into account, Glorot uniform initialization was used for the purposes of this article, with a learning rate of 0.001.
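The Glorot (Xavier) uniform initializer used here draws the weights of a layer with fan_in inputs and fan_out outputs uniformly from [−limit, limit] with limit = sqrt(6 / (fan_in + fan_out)), which keeps the signal variance roughly constant across layers. A minimal sketch:

```python
import math
import random

# Glorot uniform initialization of a fan_out x fan_in weight matrix.

def glorot_uniform(fan_in, fan_out, seed=0):
    limit = math.sqrt(6.0 / (fan_in + fan_out))  # scale balancing in/out variance
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

W = glorot_uniform(50, 50)  # hidden-to-hidden weights for the 50-unit layers
```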

Weaknesses and Limitations of the PINN Model.
The PINN model, while powerful, has several limitations [79]. A fair weakness of the PINN model is the requirement of a large amount of labeled data for training. To enforce physical constraints, PINNs usually rely on solving PDEs, which calls for a good understanding of the underlying physics. However, it can be difficult to obtain labeled data that faithfully capture the physical system, particularly in situations where access to experimental data is expensive or limited. This restriction may make the PINN model less useful and less generalizable [80]. To address this weakness, one possible improvement is to incorporate transfer learning techniques: through transfer learning, performance on a target task with limited data can be improved by utilizing models pre-trained on related tasks or domains. Explicitly integrating domain knowledge into the model design is another way to enhance PINNs; one can direct the model to produce more accurate predictions by feeding it prior knowledge in the form of physical principles, equations, or constraints. Additionally, an ensemble-based approach can be used to enhance the predictive capacity of PINNs: instead of relying on a single neural network, multiple networks with diverse architectures or initializations can be trained. In this paper, we also consider various networks with distinct architectures to effectively solve the NLSGE using the PINN algorithm, as presented in Algorithm 1.

Implementation
In the following section, we use Python code to build the PINN algorithm to solve the NLSGE (1) in one dimension. As an illustration, we consider both Dirichlet and Neumann boundary conditions to validate the effectiveness of the models.

1D NLSGE with Dirichlet BCs. Consider the following one-dimensional NLSGE on (x, t) ∈ (0, 1) × (0, 2]:

u_tt − u_xx + sin(u) = f(x, t), with f(x, t) = (π² − 1)cos(πx)cos(t) + sin(cos(πx)cos(t)),

with Dirichlet boundary conditions

u(0, t) = cos(t), u(1, t) = −cos(t), t ∈ [0, 2],

and initial conditions

u(x, 0) = cos(πx), u_t(x, 0) = 0, x ∈ [0, 1].

The exact solution of the IBVP is given by u(x, t) = cos(πx)cos(t) [32].
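A quick numerical sanity check confirms that u(x, t) = cos(πx)cos(t) satisfies the stated problem. This sketch assumes the undamped form of (1) with α = φ = 1 and β = 0, and approximates the second derivatives by central finite differences:

```python
import math

# Finite-difference check of the residual u_tt - u_xx + sin(u) - f(x, t) = 0
# for the exact solution u(x, t) = cos(pi x) cos(t) and the source term
# f(x, t) = (pi^2 - 1) cos(pi x) cos(t) + sin(cos(pi x) cos(t)).

def u(x, t):
    return math.cos(math.pi * x) * math.cos(t)

def f(x, t):
    return (math.pi ** 2 - 1) * u(x, t) + math.sin(u(x, t))

def residual(x, t, h=1e-4):
    """PDE residual at (x, t) with second-order central differences."""
    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h ** 2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return u_tt - u_xx + math.sin(u(x, t)) - f(x, t)
```

The residual should be zero up to the O(h²) discretization error at any interior point, and u reproduces the boundary and initial data directly.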

The PINNs Algorithm
(1) Step 1: Neural Network. To obtain the u(x, t) that solves (21) using the proposed method, we chose the structure of the NN to have two nodes in the input layer (x, t) and one node in the output layer, which contains the prediction for the value of u(x, t). The structure had four hidden layers, each of which contained 50 nodes (neurons).
(2) Step 2: Training Dataset. The general training set X of this model is selected in the interior domain X_Ω ⊂ (0, 1) × (0, 2) and on the boundaries. The training set we used consisted of 500 samples {(x_i, t_i); u(x_i, t_i)}_(i=1)^500, where u(x_k, t_k) is the solution of (21) at (x_k, t_k). Of these, 300 training samples were chosen from (0, 1) × (0, 2), and the rest were taken from the boundary of the domain (see Figure 2).
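The sampling in Step 2 can be sketched as follows: 300 interior collocation points from (0, 1) × (0, 2) and 200 points on the space-time boundary (the initial line t = 0 and the two spatial boundaries x = 0 and x = 1). Uniform random sampling is one simple choice; the actual distribution over the three boundary pieces is an assumption here:

```python
import random

# Sampling the 500-point training set: 300 interior + 200 boundary points.

def training_points(n_interior=300, n_boundary=200, seed=0):
    rng = random.Random(seed)
    interior = [(rng.uniform(0, 1), rng.uniform(0, 2))
                for _ in range(n_interior)]
    boundary = []
    for _ in range(n_boundary):
        side = rng.choice(("initial", "left", "right"))
        if side == "initial":
            boundary.append((rng.uniform(0, 1), 0.0))   # initial line t = 0
        elif side == "left":
            boundary.append((0.0, rng.uniform(0, 2)))   # spatial boundary x = 0
        else:
            boundary.append((1.0, rng.uniform(0, 2)))   # spatial boundary x = 1
    return interior, boundary

interior, boundary = training_points()  # 500 training points in total
```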
(3) Step 3: Loss Function. The loss function used to train the PINN with the parameters P is given by (15). Algorithm 1 requires as input the training data, the collocation points X = X_Ω ∪ X_Γ ∪ X_0, containing interior and boundary points. The number of steps in Figure 3 (also known as the number of epochs) indicates the number of iterations used to train the model, and thus the number of times the weights of the network are updated. In our case, we used 15,000 epochs, meaning that the NN was trained for 15,000 passes over the training dataset. The train and test losses decrease as the number of epochs increases, as the figure illustrates. Thus, using more training iterations results in smaller train and test losses, indicating that the suggested strategy produced a better solution. Additionally, the L-BFGS-B optimization algorithm produces smaller train and test losses than the Adam optimization algorithm, and combining the two optimization algorithms results in smaller losses still. Therefore, it is preferable to use both optimizers together rather than either of them alone.
Figures 4-6 present the exact solution and the result for problem (21) using the suggested method. The graphs of the 2D and 3D solution plots for the model optimizations proposed in Step 4 of Subsection 4.1 allow a comparison of the two solutions. Furthermore, Figure 7 and Table 1 are used to compare the estimated solution error for the Adam, L-BFGS-B, and combined Adam and L-BFGS-B optimization algorithms.
The solution of the NLSGE (16), depicted in the 3D plots of Figures 4-6, shows that there is not much difference between the exact solution and the solution produced using the suggested PINN technique. However, the result obtained using the L-BFGS-B optimization algorithm is relatively better than that obtained using the Adam optimization algorithm, and the result obtained using the Adam and L-BFGS-B combination is better than that obtained with either optimizer, as we can observe from Figure 7.
The 2D line plot in Figure 8 compares the solution of the suggested method with the exact solution at x = 0.5, together with the corresponding absolute errors for the different optimization algorithms. As we can see in Figures 8(a), 8(c), and 8(e), the line plots of the two solutions overlap, suggesting that they agree closely. Observing the results for the selected optimization algorithms in Figures 8(b), 8(d), and 8(f), the result obtained using the L-BFGS-B optimization approach is relatively more accurate than the one obtained using the Adam optimization technique, and the result produced using the combination of Adam and L-BFGS-B is more accurate than both.
The exact solution and the suggested method are compared in Table 1, and the results are reported using the L2, L∞, relative, and mean square errors. This comparison also shows that the PINN approach with the L-BFGS-B optimization algorithm yields a better solution than the one with the Adam optimization algorithm, and that the solution resulting from the combination of Adam and L-BFGS-B is better than either individual algorithm, with the least absolute error. However, the model takes longer to compile when both techniques are used together. The training error varies as the number of training samples increases, as seen in Figure 9. It shows that the training error increases for the first few training samples before gradually decreasing for the remaining training trials. This finding indicates that using few samples results in high error rates, and that using more training samples is preferable to obtain good results with low error rates.
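The error measures reported in Table 1 can be computed as follows. This is a minimal sketch operating on lists of point values of the approximate and exact solutions:

```python
import math

# The error measures of Table 1: L2, L-infinity, relative L2, and mean
# square error between an approximate solution u_hat and the exact u,
# both given as lists of values at the same evaluation points.

def l2_error(u_hat, u):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u_hat, u)))

def linf_error(u_hat, u):
    return max(abs(a - b) for a, b in zip(u_hat, u))

def relative_l2_error(u_hat, u):
    return l2_error(u_hat, u) / math.sqrt(sum(b * b for b in u))

def mean_square_error(u_hat, u):
    return sum((a - b) ** 2 for a, b in zip(u_hat, u)) / len(u)
```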

Error Analysis and Computational Time
(2) Error on a Validation Set. The training error is important for finding out whether our model can be applied to any input data and still produce accurate results, even if it performs exceptionally well on the training data (i.e., the error is small). According to this method, τ = {(x_i, t_i; u_i)} should be randomly divided into two disjoint sets, a training set and a validation set, where τ represents the set of all available data.
Figure 10 illustrates how, for a given number of training samples, the error initially decreases. We find that even for a relatively modest collection of additional training examples, the error is close to 0. As the training set size grows, the error remains consistently insignificant.
(3) Computational Time. Costly computations are involved when the number of training samples is increased. When a large number of training samples was taken into account, code execution was extremely slow. This is shown in Figure 11.
(4) Test Error vs. Computational Time. The plot depicting the dependence of the test error on the required computational time lets us examine the performance of our machine learning model in further depth. Our goal is to create a model that performs well (the test loss is small) and that trains in a reasonable amount of time.
The decrease in test loss is initially accompanied by an increase in processing time, as seen in Figure 12. Even when the model takes longer to run, we see that this pattern breaks down and the test loss becomes essentially constant.
(5) Discussion on the Number of Nodes in the Neural Network. We investigate how the size of the NN affects our model's performance by using five different NN layouts in the model construction and collecting the test loss. We fix our NN structure at four hidden layers and conduct experiments with 30, 50, 100, 150, and 200 nodes per layer. The test loss vs. computing time for these NN structures for the NLSGE Dirichlet BCs example is depicted in Figure 13. The graph shows how the test error changes as the number of iterations (i.e., the processing time) rises for these five different NN settings. Figure 14 illustrates the absolute errors between the results of the proposed model and the exact solution for the various NN structures. The error for the NN architecture with 50 nodes is very close to zero relative to the others, indicating that our model showed the greatest performance improvement when the number of nodes is 50. Furthermore, a comparison between the solution produced using the suggested method and the exact one, based on the L∞, relative, and mean square errors for the five distinct node counts, is presented in Table 2.
As the table shows, NNs with node counts of 30, 100, 150, and 200, along with the corresponding hidden layers, exhibit a nearly uniform pattern, whereas the NN with 50 nodes gives smaller L∞, relative, and mean squared errors, indicating that the suggested approach is most efficient for the NN architecture with 50 nodes.
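As a rough guide to how the model size grows across these five layouts, the parameter count of a fully connected network with layer sizes [2, n, n, n, n, 1] can be computed as follows (a back-of-the-envelope sketch, not part of the reported experiments):

```python
def param_count(layers):
    """Total weights plus biases of a fully connected network
    with the given layer sizes."""
    return sum(m * n + n for m, n in zip(layers[:-1], layers[1:]))

# Four hidden layers, as in the experiments; width n varies over the tested values
counts = {n: param_count([2] + [n] * 4 + [1]) for n in (30, 50, 100, 150, 200)}
```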
(6) Discussion on the Selection of Activation Function. When using PINNs to solve PDEs, the choice of activation function has an impact on the performance and convergence of the model. We compared a few well-known activation functions to determine which one best minimizes the loss function of our suggested model.
In Table 3, the approximation error of the proposed model, based on the L∞, relative, and mean square errors, is presented for the tanh, sigmoid, and ReLU activation functions. According to the findings, applying the sigmoid activation function yields a better result than the ReLU activation function. However, compared to the others, the hyperbolic tangent function (tanh) gives the best approximation with the least error. Figure 15 compares the approximate-error line plots for these three activation functions. The graph shows that the error of the tanh activation function is nearly zero compared to the other lines, which again indicates that the hyperbolic tangent function is appropriate for our proposed model.
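The three candidate activations can be written down directly. One plausible reason for tanh's advantage, noted in the comments below, is smoothness: the PDE residual requires second derivatives of the network output, and ReLU's second derivative vanishes almost everywhere (this is a common explanation, not a claim established by the experiments here):

```python
import numpy as np

def tanh(z):
    return np.tanh(z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

# tanh and sigmoid are smooth (infinitely differentiable), which matters for
# PINNs: the loss differentiates the network output twice for the u_tt and
# u_xx terms, while ReLU is piecewise linear with zero second derivative
# almost everywhere.
z = np.linspace(-3.0, 3.0, 7)
outputs = {"tanh": tanh(z), "sigmoid": sigmoid(z), "relu": relu(z)}
```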

1D NLSGE with Neumann Boundary Conditions
with the Neumann boundary conditions and initial conditions. The function u(x, t) = cos(πx) cos(t) satisfies (31) and conditions (26)-(29) [32]. The NN, with two nodes in the input layer (x, t), one node in the output layer (the value of u(x, t)), and four hidden layers of 50 nodes each, produces the approximation u(x, t) that solves (25) for a given input (x, t). We set this model's epoch count to 15000.
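The stated exact solution can be checked against homogeneous Neumann conditions at the endpoints of the spatial domain (assuming, as in the training setup, the domain (0, 1) × (0, 2)): u_x(x, t) = −π sin(πx) cos(t) vanishes at x = 0 and x = 1. A quick numerical check:

```python
import numpy as np

def u(x, t):
    """Exact solution of the Neumann benchmark: u(x, t) = cos(pi x) cos(t)."""
    return np.cos(np.pi * x) * np.cos(t)

def u_x(x, t):
    """Spatial derivative of the exact solution."""
    return -np.pi * np.sin(np.pi * x) * np.cos(t)

# Homogeneous Neumann conditions hold at both spatial endpoints for all t
t = np.linspace(0.0, 2.0, 9)
left = u_x(0.0, t)
right = u_x(1.0, t)
```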

Training Dataset.
The training set we used in this example consisted of 500 samples {(x_i, t_i); u(x_i, t_i)}, i = 1, ..., 500, where u(x_k, t_k) is the solution of (31) at (x_k, t_k) found by the PDE solver that the Python package DeepXDE offers. 300 training samples were chosen from (0, 1) × (0, 2), and the rest were taken from the domain boundary.
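A hand-rolled sketch of this sampling (DeepXDE handles it internally; this illustrative version simply draws 300 interior points uniformly and 200 points spread over the four boundary segments):

```python
import numpy as np

rng = np.random.default_rng(42)

# 300 collocation points from the interior of (0, 1) x (0, 2)
x_in = rng.uniform(0.0, 1.0, 300)
t_in = rng.uniform(0.0, 2.0, 300)

# 200 points on the domain boundary: x = 0, x = 1, t = 0, or t = 2
side = rng.integers(0, 4, 200)  # which boundary segment each point lands on
x_b = np.where(side == 0, 0.0,
      np.where(side == 1, 1.0, rng.uniform(0.0, 1.0, 200)))
t_b = np.where(side == 2, 0.0,
      np.where(side == 3, 2.0, rng.uniform(0.0, 2.0, 200)))
```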

Loss Function.
Similarly to the above example, the loss function is expressed as the sum of the squares of the differences corresponding to each of the equations in (31). The loss function used to train the PINN with the parameter P is given by (15). Since we have previously demonstrated for the first example that the combined Adam and L-BFGS-B optimization algorithm is the optimal choice for our model, we employed this mixed optimization technique to minimize the loss function of problem (31). Similarly, we used tanh as the activation function to predict the solution of the suggested instance given by (31) by PINN.
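Given residual arrays evaluated at the collocation, initial, and boundary points, the multi-objective loss reduces to a sum of mean squared residuals; a minimal sketch with placeholder residual values:

```python
import numpy as np

def pinn_loss(r_pde, r_ic, r_bc):
    """Multi-objective PINN loss: the sum of mean squared residuals for the
    PDE at interior collocation points, the initial condition, and the
    boundary condition."""
    return np.mean(r_pde**2) + np.mean(r_ic**2) + np.mean(r_bc**2)

# Placeholder residuals, as would come from evaluating the network
r_pde = np.array([0.1, -0.2, 0.05])
r_ic = np.array([0.0, 0.01])
r_bc = np.array([0.02, -0.02])
loss = pinn_loss(r_pde, r_ic, r_bc)
```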
Figures 17 and 18 show the exact solution and the resulting PINN solution, with the corresponding absolute error, for problem (31). The 2D and 3D solution plots for the combined Adam and L-BFGS-B model optimization allow a comparison of the two solutions.
The difference between the exact and PINN solutions is shown in Figure 17(c); we observe that this difference is mostly zero, which suggests a close match between the two solutions. The overlap between the line plots of the exact and predicted solutions in Figure 18(a) indicates that our suggested model provides an excellent approximation with minimal error, as demonstrated by the corresponding error plots in Figure 18(b).

Conclusions and Outlook
In this paper, we have presented a deep learning framework-based approach known as PINNs for the solution of the nonlinear SGE with source terms. To solve the proposed problem efficiently, we provided the PINN with a multi-objective loss function that incorporates the initial condition, Dirichlet/Neumann boundary conditions, and the governing PDE residual over randomly selected collocation points in the problem domain. We used a feedforward deep neural network with two nodes in the input layer, four hidden layers, and one node in the output layer to train the PINN model. The weights of the feedforward NN were initialized using Glorot uniform initialization, also called uniform Xavier initialization, which is most appropriate when employing a symmetric activation function such as tanh or sigmoid. We considered the NLSGE with Dirichlet and Neumann boundary conditions as benchmark examples to demonstrate how well the suggested model performed. We conducted several experiments and used graphs and tables to present the results, simulated with the Python DeepXDE software module. The PINN model's train and test losses for both the Dirichlet and Neumann boundary conditions decrease with the training iterations, which suggests that the model makes progress on the given problem by improving its approximation of the NLSGE solution. The experiment on choosing the optimal optimization method for the proposed problem shows that the L-BFGS-B optimization algorithm yields better results than the Adam optimization strategy. Integrating the two gives the best result, but compiling the model takes more time. Furthermore, three activation functions, ReLU, sigmoid, and the hyperbolic tangent (tanh), were examined to determine the best choice of activation function for the suggested model. Results indicate that the tanh activation function produces the most accurate results, whereas the ReLU activation function produces the least accurate results (see Table 3 and Figure 15). Graphs and tables depict the simulations comparing the exact solution with the PINN-predicted solution. The results show that the method can accurately capture the solution of the NLSGE, with the difference being extremely close to zero. To further strengthen the foundation of PINNs for solving different classes of physical phenomena involving PDEs, further investigation must be performed in future work. More research is required to examine the stability, convergence, and robustness of the suggested method for solving the NLSGE. Furthermore, investigating higher-order and multidimensional variants of the SGE can improve PINNs' ability to represent the complex dynamics and behavior of nonlinear waves. Moreover, real-time simulations and greatly increased computational efficiency can be attained via the implementation of adaptive and parallelizable PINN architectures, such as Extended PINNs, Bayesian PINNs, Multi-fidelity PINNs, and Adaptive PINNs.
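The Glorot (Xavier) uniform initialization mentioned above draws each weight from U(−ℓ, ℓ) with ℓ = √(6 / (fan_in + fan_out)); a minimal NumPy sketch for one 50 × 50 hidden-layer weight matrix, as used in the experiments:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot (Xavier) uniform initialization: W ~ U(-limit, limit)
    with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(1)
W = glorot_uniform(50, 50, rng)  # one hidden-to-hidden weight matrix
```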

Figure 2: Illustration of the training dataset.

Figure 3: Train and test loss of the PINN process for 15000 epochs (training iterations) for the Dirichlet BCs case using (a) Adam, (b) L-BFGS-B, and (c) combined Adam and L-BFGS-B optimization algorithms.

Step 4: Training Process. With the training samples, we apply the loss function (10) to obtain the blue line in Figure 3, which shows that the training loss decreases with the number of model training iterations. At the same time, we calculate the loss function on the test samples using (20).

(1) Training Error. The training error provides insight into how well the predicted outputs fit the training outputs, i.e., how the model performs on the training set.
below (time is given in seconds). We consider seven different training sets, with sizes 5, 15, 25, 55, 90, 185, and 350. The relationship between the size of the training set and the time needed for compilation or model training is depicted in Figure 11: the time required increases with the number of training samples, implying that the two quantities are positively correlated.

Figure 4: 3D plots of the (a) PINN solution and (b) true solution for the Dirichlet BCs using the Adam optimization algorithm.

Figure 6: 3D plots of the (a) PINN solution and (b) true solution for the Dirichlet BCs using combined Adam and L-BFGS-B optimization algorithms.

Figure 7: 3D plots of the PINN point-wise absolute error (the difference between the exact and PINN solutions) for the Dirichlet BCs using (a) Adam, (b) L-BFGS-B, and (c) combined Adam and L-BFGS-B optimization algorithms.

Figure 8: Line plots comparing the PINN-predicted solution with the exact solution, and the corresponding absolute error, for the NLSGE Dirichlet BCs case at t = 0.5 using the Adam optimization algorithm (first row, (a) and (b)), the L-BFGS-B optimization algorithm (second row, (c) and (d)), and the combined Adam and L-BFGS-B optimization algorithms (third row, (e) and (f)).

Figure 15: Comparison of PINN approximation errors for the Dirichlet BCs case for the tanh, sigmoid, and ReLU activation functions at t = 0.5.

Figure 17: 3D plots of the (a) PINN solution, (b) true solution, and (c) PINN point-wise absolute error (the difference between the exact and PINN solutions) for problem (25) with the Neumann BCs.

Applied Computational Intelligence and Soft Computing
… employing deep learning. The authors derived and proposed a numerical approximation method that aims to overcome the related drawbacks. They solved examples including the heat equation, the Black-Scholes model, the stochastic Lorenz equation, and the Heston model, and showed that the proposed approximation algorithm is effective in high dimensions in terms of both accuracy and speed.
Illustration of a fully connected deep feedforward neural network with two nodes, (x_i, t_i), in the input layer and one node, u(x_i, t_i), in the output layer. Layer 0 is the input layer; layers l − 1 and l are hidden layers with associated weight matrices w^(l) and bias vectors b^(l−1) and b^(l); and layer L is the output layer. Input data passes through the network following (8).

Table 1: Comparison of error approximations for different model optimizers.

Table 2: Comparison of PINN approximation errors for different node counts.

Table 3: Comparison of PINN error approximations for different activation functions.