This paper deals with the use of an alternative tool for symbolic regression, analytic programming, which is able to solve various problems from the symbolic domain, as genetic programming and grammatical evolution can. The paper describes the setting of an optimal trajectory for a robot (originally designed as an artificial ant on the Santa Fe trail) solved by means of analytic programming. Firstly, the main principles of analytic programming are described and explained. The second part shows, step by step, how analytic programming was used to find a suitable trajectory. Because analytic programming needs an evolutionary algorithm for its run, three evolutionary algorithms were used (the self-organizing migrating algorithm, differential evolution, and simulated annealing) to show that any of them can be used. The total number of simulations was 150, and the results show that the first two algorithms were more successful than the less robust simulated annealing.

The term “symbolic regression” represents a process during which measured data is fitted and a suitable mathematical formula is obtained in an analytical way. This process is well known to mathematicians. It is used when a mathematical model of unknown data is needed. For a long time, symbolic regression was a domain of humans, but in the last few decades computers have come to the foreground of interest in this field. The idea of symbolic regression done by means of a computer was first proposed by Koza in genetic programming (GP) [

Genetic programming was the first tool for symbolic synthesis of so-called programs done by means of a computer instead of humans. The main idea comes from genetic algorithms (GAs) [

The other tool is grammatical evolution (GE), which was developed in the last decade of the 20th century by Conor Ryan. Grammatical evolution has one advantage compared to GP, namely the ability to use an arbitrary programming language, not only LISP as in the case of the canonical version of GP. In contrast to other evolutionary algorithms, GE was used with a few search strategies with a binary representation of the populations [

This contribution demonstrates the use of a method which is independent of computer platform and programming language and can use any evolutionary algorithm (as demonstrated in [

The basic principles of AP were developed in 2001. Until that time, mainly GP and GE existed. GP uses genetic algorithms, while AP can be used with any evolutionary algorithm, independently of individual representation. To avoid any confusion based on the use of names according to the used algorithm, the name analytic programming was chosen, because AP stands for synthesis of an analytical solution by means of evolutionary algorithms [

AP was inspired, in general, by numerical methods in Hilbert spaces and by GP. Principles of AP [

functions: sin, tan, And, Or, and so forth,

operators: +, −, *, /, dt, and so forth,

terminals: 2.73, 3.14,

All these “mathematical” objects create a set from which AP tries to synthesize the appropriate solution. The set of mathematical objects consists of functions, operators, and so-called terminals (usually constants or independent variables). All these objects are mixed together as shown in Figure

General function set (GFS).

This nested structure is necessary so that the main principle of AP can work without any difficulties. The core of AP is based on discrete set handling (DSH), proposed in [

Discrete set handling.

Briefly, DSH works with integer indexes which represent numerical or nonnumerical expressions (operators, functions, etc.) in a discrete set. Each index then serves as a pointer into the discrete set. Based on that, appropriate objects are chosen for cost function evaluation [
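The index-as-pointer idea can be illustrated with a minimal Python sketch. The contents of `DISCRETE_SET`, the names `dsh_lookup` and `decode`, and the modulo wrap-around for indexes beyond the end of the set are all assumptions made for illustration; they are not taken from the original implementation.

```python
# Hypothetical discrete set mixing nonnumerical and numerical objects.
DISCRETE_SET = ["sin", "cos", "+", "-", 3.14, "x"]


def dsh_lookup(index, discrete_set=DISCRETE_SET):
    """Return the object the integer index points to.

    Wrapping an oversized index with modulo is an assumption of this sketch.
    """
    return discrete_set[index % len(discrete_set)]


def decode(individual, discrete_set=DISCRETE_SET):
    """Translate an individual (a list of integer indexes) into objects."""
    return [dsh_lookup(index, discrete_set) for index in individual]


print(decode([0, 4, 7]))
```

The evolutionary algorithm only ever manipulates the integers; the objects themselves are looked up when the cost function is evaluated.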

The presence of the nested structure in GFS is vitally important for AP. It is used to avoid synthesis of pathological programs, that is, programs containing functions without arguments, and so forth. The performance of AP is, of course, improved if the functions of GFS are expertly chosen based on experience with the solved problem.

An important part of AP is a sequence of mathematical operations which are used for program synthesis. These operations transform an individual of a population into a suitable program. Mathematically, this is a mapping from the individual domain into the program domain. The mapping consists of two main parts. The first part is called discrete set handling (DSH), and the second comprises security procedures which do not allow pathological programs to be synthesized.

Discrete set handling proposed in [

Analytic programming is basically a series of function mappings. Figure

Main principles of AP.

To avoid synthesis of pathological functions, a few security “tricks” are used in AP. The first one is that GFS consists of subsets containing functions with the same number of arguments. The existence of this nested structure is used in a special security subroutine which measures how far the end of the individual is and, according to that, selects objects from different subsets to avoid pathological function synthesis. Precisely, if more arguments are desired than possible (the end of the individual is near), the function is replaced by another function with the same index pointer from a subset with a lower number of arguments. For example, it may happen that the last argument for a one-argument function will not be a terminal (zero-argument function). If the pointer is bigger than the length of the subset, that is, the pointer is 5 and is used

GFS need not be constructed only from pure mathematical functions as demonstrated; other user-defined functions can also be used, for example, logical functions, functions which represent elements of electrical circuits, or robot movement commands.

Today, AP exists in three versions:

Security procedures (SPs) are used in AP, as well as in GP, to avoid various critical situations. In the case of AP, security procedures were not developed for AP purposes after all, but they are mostly integrated parts of AP. Sometimes, however, they have to be defined as a part of the cost function, depending on the kind of situation (e.g., situations 2, 3, and 4; see what follows). Critical situations include

(1) pathological function (e.g., without arguments, self-looped),

(2) functions with imaginary or real part (if not expected),

(3) infinity in functions (e.g., dividing by 0),

(4) “frozen” functions (e.g., extremely long time to get a cost value: hours).

Simply put, an SP can be regarded here as a mapping from an integer individual to a program which checks how far the end of the individual is and, based on this information, redirects the sequence of mappings into a subset with a lower number of arguments. This ensures that no pathological function will be generated. Other activities of SPs are integrated parts of the cost function, to handle items 2–4, and so forth.
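The redirection into lower-arity subsets can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the original AP implementation: the GFS contents, the function names `pick`, `synthesize`, and `is_well_formed`, the prefix (depth-first) token order, and the modulo wrap-around of index pointers are all hypothetical choices.

```python
# Hypothetical GFS, split into subsets by number of arguments (arity).
GFS = {
    2: ["+", "-", "*"],
    1: ["sin", "cos"],
    0: ["x", "1"],  # terminals
}
GFS_ALL = GFS[2] + GFS[1] + GFS[0]
ARITY = {name: arity for arity, names in GFS.items() for name in names}
MAX_ARITY = max(GFS)


def pick(index, allowed_arity):
    """Select an object by index; redirect to a lower-arity subset if needed."""
    candidate = GFS_ALL[index % len(GFS_ALL)]
    if ARITY[candidate] > allowed_arity:
        subset = GFS[allowed_arity]
        candidate = subset[index % len(subset)]  # same index pointer, smaller subset
    return candidate


def synthesize(individual):
    """Map a list of integers to a complete program in prefix notation."""
    tokens = []
    open_slots = 1  # arguments that still have to be filled
    for position, index in enumerate(individual):
        if open_slots == 0:
            break  # program is complete; remaining genes are unused
        remaining = len(individual) - position - 1
        # Largest arity that can still be closed before the individual ends.
        allowed = min(MAX_ARITY, remaining - (open_slots - 1))
        name = pick(index, allowed)
        tokens.append(name)
        open_slots += ARITY[name] - 1
    return tokens


def is_well_formed(tokens):
    """True if every function in the prefix token list has all its arguments."""
    open_slots = 1
    for name in tokens:
        if open_slots == 0:
            return False
        open_slots += ARITY[name] - 1
    return open_slots == 0


print(synthesize([0, 0, 0]))
```

In this sketch the individual `[0, 0, 0]` would naively decode to three `+` operators with no arguments; the distance check instead redirects the second and third pointers into the terminal subset, so the program closes exactly at the end of the individual.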

Because AP was partly inspired by GP, there are some differences as well as some logical similarities between AP, GP, and GE. A few of the most important ones are as follows.

I. Similarity

Synthesized programs: AP, as well as GP and GE, is able to do symbolic regression from a general point of view. It means that the output of AP is, according to all important simulations [

Functional set:

II. Differences

Individual coding: the coding of an individual is different. Analytic programming uses an integer index instead of direct representation as in canonical GP. Grammatical evolution uses a binary representation of an individual, which is consequently converted into integers for mapping into programs by means of BNF [

Individual mapping: AP uses discrete set handling, [

Constant handling: GP uses a randomly generated subset of numbers (constants), GE utilises user-determined constants, and AP uses only one constant

Security procedures: to guarantee synthesis of nonpathological functions, procedures are used in AP which redirect the flow of mapping into subsets of a whole set of functions and terminals according to the distance to the end of the individual. If a pathological function is synthesized in GP, then synthesis is repeated. In the case of GE, when the end of an individual is reached, then mapping continues from the individual beginning, which is not the case of AP. It is designed so that a nonpathological program is synthesized before the end of the individual is reached (maximally when the end is reached).

During AP development and research simulations, many different kinds of programs have been synthesized. In (

Random synthesis of functions from GFS, repeated 1000 times: the aim of this simulation was to check whether a pathological function can be generated by AP. In this simulation, randomly generated individuals were created, consequently transformed into programs, and checked for their internal structure. No pathological program was identified [

sin(

Solving of ordinary differential equations (ODE):

Solving of ODE: (4 +

Boolean even and symmetry problems according to [

Sextic and Quintic problems [

Simple neural network synthesis by means of AP: synthesis of a simple NN with a few layers was tested by AP [

Such elementary objects are usually simple mathematical operators (+, −, *,

The Santa Fe trail, demonstrated in Figure

Santa Fe trail.

The aim of the task is that an artificial ant should go through a defined trail and eat all the food which is there. From a simple point of view, it can be looked at as the movement of a robot along a trail. Setting a robot trajectory is, of course, a very complex task, but more complex behaviour can be added later in further simulations.

The Santa Fe trail is defined as a

In the real world, robots encounter obstacles while moving. Therefore, such an approach was also chosen in this case. The first problem which the ant has to overcome is the simple hole (position (8,27) in Figure

The set of functions used for movements of the ant is as follows. As a set of variables

The set consists of

Left: function for turning around in the anticlockwise direction,

Right: function for turning around in the clockwise direction,

Move: function for moving straight; if bait is in the field to which the ant moves, it is eaten.
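The three movement functions can be sketched as methods of a small simulator. This is an illustrative sketch only: the class name `Ant`, the default grid size, wrap-around at the grid edges, the initial heading, and the choice to count turns as steps are assumptions that the text above does not specify.

```python
from dataclasses import dataclass

# Headings as (d_row, d_col), listed clockwise with rows growing downward:
# east, south, west, north.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


@dataclass
class Ant:
    food: set          # set of (row, col) fields that contain bait
    size: int = 32     # grid side length (assumed)
    row: int = 0
    col: int = 0
    heading: int = 0   # index into HEADINGS; starts facing east (assumed)
    eaten: int = 0
    steps: int = 0

    def left(self):
        """Turn anticlockwise without changing position."""
        self.heading = (self.heading - 1) % 4
        self.steps += 1

    def right(self):
        """Turn clockwise without changing position."""
        self.heading = (self.heading + 1) % 4
        self.steps += 1

    def move(self):
        """Move one field straight ahead and eat the bait there, if any."""
        d_row, d_col = HEADINGS[self.heading]
        self.row = (self.row + d_row) % self.size  # wrap-around is assumed
        self.col = (self.col + d_col) % self.size
        self.steps += 1
        if (self.row, self.col) in self.food:
            self.food.remove((self.row, self.col))
            self.eaten += 1


ant = Ant(food={(0, 1), (1, 1)})
ant.move()
ant.right()
ant.move()
print(ant.eaten, ant.steps)
```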

This set of functions is not enough to accomplish the desired task successfully. More functions are necessary, then a

The aim of the ant is to eat all food on the way. There are 89 baits. This is the so-called raw fitness, and the value of cost function (

The aim is to find a formula whose cost value is equal to zero. To obtain an appropriate solution, two constraints should be set up in the cost function. One is a limit on the number of steps. It is not desirable for the ant to go field by field through the grid; the fastest and most effective way is required. The limit of steps was therefore set to 600. According to the original assignment, 400 steps should be sufficient, but as the work in [

The second constraint could concern the length of the list of commands for the ant. A longer list can cause more steps to be needed before all food is eaten. In this preliminary study, this constraint was not set up, but in further studies a penalization related to this constraint will be used.
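A cost function of this kind can be sketched as follows, using the 89 baits and the limit of 600 steps stated above. How exactly the step limit enters the cost function is not specified in the text, so the rule used here (bait eaten after the limit does not count) is an assumption, as are the function name `cost_value` and its input format.

```python
TOTAL_BAITS = 89
STEP_LIMIT = 600


def cost_value(bait_eaten_at_step, total_baits=TOTAL_BAITS, step_limit=STEP_LIMIT):
    """Return the number of baits left uneaten within the step limit.

    bait_eaten_at_step: one boolean per executed step, True if the ant
    ate a bait in that step. A cost of 0 means all food was eaten.
    """
    eaten = sum(
        1
        for step, ate in enumerate(bait_eaten_at_step, start=1)
        if ate and step <= step_limit
    )
    return total_baits - eaten


# An ant that eats a bait on each of its first 89 steps reaches cost 0.
print(cost_value([True] * 89))
```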

In this paper, the self-organizing migrating algorithm (SOMA), differential evolution (DE), and simulated annealing (SA) were used as evolutionary algorithms. For detailed information, see [

Differential evolution is a population-based optimization method that works on real-number-coded individuals [

Differential evolution is robust, fast, and effective with global optimization ability. It does not require that the objective function is differentiable, and it works with noisy, epistatic, and time-dependent objective functions.
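One generation of differential evolution can be sketched as below. The text does not say which DE strategy was used, so the common DE/rand/1/bin scheme is assumed here; the function name `de_generation` and its signature are hypothetical. The defaults `f=0.8` and `cr=0.2` mirror the F and CR values in the settings table of this paper.

```python
import random


def de_generation(population, cost, f=0.8, cr=0.2, rng=random):
    """Return the next generation (assumed DE/rand/1/bin scheme).

    population: list of real-valued vectors, at least 4 individuals.
    cost: function mapping a vector to a value to be minimized.
    """
    dim = len(population[0])
    new_population = []
    for i, target in enumerate(population):
        others = [p for j, p in enumerate(population) if j != i]
        r1, r2, r3 = rng.sample(others, 3)  # three distinct other individuals
        forced = rng.randrange(dim)         # at least one mutated dimension
        trial = [
            r1[d] + f * (r2[d] - r3[d])
            if (rng.random() < cr or d == forced)
            else target[d]
            for d in range(dim)
        ]
        # Greedy selection: the trial replaces the target only if not worse.
        new_population.append(trial if cost(trial) <= cost(target) else target)
    return new_population


def sphere(vector):
    """Toy objective used only for demonstration."""
    return sum(x * x for x in vector)


rng = random.Random(0)
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
for _ in range(20):
    population = de_generation(population, sphere, rng=rng)
print(min(sphere(p) for p in population))
```

Because selection is greedy, the best cost value in the population can never get worse from one generation to the next; no derivative of the objective is needed at any point.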

SOMA is a stochastic optimization algorithm that is modeled on the social behaviour of cooperating individuals [

The novelty of this approach is that the PRT Vector is created before an individual starts its journey over the search space. The PRT Vector defines the final movement of an active individual in search space.

The randomly generated binary perturbation vector controls the allowed dimensions for an individual. If an element of the perturbation vector is set to zero, then the individual is not allowed to change its position in the corresponding dimension.
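The effect of the perturbation vector can be sketched as follows. The sketch assumes the commonly published SOMA movement rule, in which an individual samples positions along the line towards the leader in increments of Step up to PathLength, and in which an element of the PRT Vector is 1 when a uniform random number is below the PRT parameter. The function names are hypothetical; the defaults follow the SOMA settings table of this paper.

```python
import random


def prt_vector(dim, prt=0.21, rng=random):
    """Binary perturbation vector: 1 allows movement in a dimension, 0 freezes it."""
    return [1 if rng.random() < prt else 0 for _ in range(dim)]


def soma_positions(individual, leader, prt_vec, path_length=3.0, step=0.22):
    """Positions visited while jumping towards the leader (assumed rule)."""
    positions = []
    for k in range(1, int(path_length / step) + 1):
        t = k * step
        positions.append(
            [x + (l - x) * t * p for x, l, p in zip(individual, leader, prt_vec)]
        )
    return positions


# The second dimension is frozen by the zero in the PRT Vector.
for position in soma_positions([0.0, 0.0], [1.0, 1.0], [1, 0])[:3]:
    print(position)
```

In a full migration loop, the cost function would be evaluated at each visited position and the individual would move to the best one; that part is omitted here.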

An individual will travel a certain distance (called the PathLength) towards the leader in

Simulated annealing is one of the older algorithms compared to SOMA and DE. It was first introduced by Kirkpatrick et al. [

This approach, including its terminology, was used in the case of simulated annealing. The algorithm starts from a randomly selected point. Then a certain number of points (chosen by the user) is generated in its neighbourhood. The point with the best cost value is selected as the centre of the new neighbourhood (the start point for a new loop). However, it is also possible to accept a worse value of the cost function. The acceptance is based on a probability which decreases with the number of iterations. If the best cost value is at the start point, that point is chosen for the next loop. This approach is basic, and other improvements of this algorithm have been made during research.
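The loop described above can be sketched in Python. The text only says that the acceptance probability decreases with the number of iterations; the Metropolis rule with geometric cooling used below is an assumption, as are all parameter names and default values, which are illustrative and not the settings used in the simulations.

```python
import math
import random


def simulated_annealing(cost, start, neighbours=10, iterations=200,
                        temperature=1.0, cooling=0.95, radius=0.5, rng=random):
    """Minimize cost from a start point; returns (best_point, best_cost)."""
    current, current_cost = list(start), cost(start)
    best, best_cost = current, current_cost
    for _ in range(iterations):
        # Generate points in the neighbourhood and keep the best of them.
        candidates = [
            [x + rng.uniform(-radius, radius) for x in current]
            for _ in range(neighbours)
        ]
        candidate = min(candidates, key=cost)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Accept improvements always; accept worse points with a probability
        # that shrinks as the temperature falls (assumed Metropolis rule).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        temperature *= cooling
    return best, best_cost


def sphere(vector):
    """Toy objective used only for demonstration."""
    return sum(x * x for x in vector)


point, value = simulated_annealing(sphere, [3.0, -2.0], rng=random.Random(0))
print(value)
```

Since the search always continues from a single point, a run that starts in the basin of a poor local optimum has limited means of leaving it once the temperature is low, which is one plausible reason for weaker results on multimodal cost functions.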

The main idea is to show that SOMA, DE, and SA are able to solve such problems of symbolic regression (setting a trajectory) under analytic programming.

Fifty simulations were carried out for each algorithm (i.e., 150 simulations in total). SOMA and DE gave positive results in almost all simulations; only one run of each algorithm did not reach the extreme. SA was less successful, with only 14 positive results. To show that AP is able to work with arbitrary evolutionary algorithms, we intend to carry out simulations with genetic algorithms (GAs) and other algorithms; parallel computing is also intended in this field. Data from all simulations were processed and visualised in [

In the simulations made for the purposes of this article, the following settings were used to run SOMA, DE, and SA, according to Tables

Setting of SOMA.

Parameter | Value |
---|---|
PathLength | 3 |
Step | 0.22 |
PRT | 0.21 |
PopSize | 200 |
Migrations | 50 |
MinDiv | -0.1 |
Individual length | 50 |

Setting of DE.

Parameter | Value |
---|---|
NP | 200 |
F | 0.8 |
CR | 0.2 |
Generations | 700 |
Individual length | 50 |

Setting of SA.

Parameter | Value |
---|---|
10 000 | |
0.000 01 | |
0.986 | |
MaxIter | 1 500 |
MaxIterTemp | 93 |
Individual length | 50 |

Firstly, the results show the numbers of cost function evaluations. This parameter shows the good performance of analytic programming. As can be seen in Table

Cost function evaluation for SOMA, DE, and SA.

Cost function evaluations | SOMA | DE | SA |
---|---|---|---|
Minimum | 3 396 | 4 030 | 2 697 |
Maximum | 134 114 | 136 011 | 98 241 |
Average | 61 966 | 66 620 | 50 142 |

Figure

Graphical representation of minimal, maximal, and average values of cost function evaluation for SOMA, DE, and SA.

The second indicator is a histogram of successful hits and the number of cost function evaluations for each hit (see Figures

Histogram of SOMA algorithm.

Histogram of DE algorithm.

Histogram of SA algorithm.

Histograms can also be constructed from the point of view of the number of cases (axis

Histogram of SOMA algorithm: the number of cases in specific intervals of cost function values.

Histogram of DE algorithm: the number of cases in specific intervals of cost function values.

Histogram of SA algorithm: the number of cases in specific intervals of cost function values.

The next point of interest was the number of commands for the ant and the number of steps required to eat all baits (Tables

Number of commands.

Number of leaves (commands) | SOMA | DE | SA |
---|---|---|---|
Minimum | 11 | 11 | 15 |
Maximum | 50 | 50 | 50 |
Average | 32 | 32 | 26 |

Number of steps.

Number of steps | SOMA | DE | SA |
---|---|---|---|
Minimum | 396 | 367 | 406 |
Maximum | 606 | 604 | 605 |
Average | 547 | 540 | 535 |

Sorted numbers of steps and commands for all algorithms.

SOMA, by steps: steps | SOMA, by steps: commands | SOMA, by commands: steps | SOMA, by commands: commands | DE, by steps: steps | DE, by steps: commands | DE, by commands: steps | DE, by commands: commands | SA, by steps: steps | SA, by steps: commands | SA, by commands: steps | SA, by commands: commands |
---|---|---|---|---|---|---|---|---|---|---|---|
396 | 49 | 594 | 11 | 367 | 49 | 599 | 11 | 406 | 25 | 577 | 15 |
399 | 36 | 596 | 11 | 387 | 49 | 592 | 12 | 406 | 25 | 592 | 16 |
409 | 21 | 568 | 14 | 390 | 50 | 564 | 13 | 409 | 23 | 605 | 16 |
409 | 22 | 594 | 14 | 409 | 18 | 542 | 14 | 503 | 22 | 592 | 17 |
409 | 23 | 594 | 14 | 409 | 18 | 568 | 14 | 503 | 22 | 537 | 19 |
421 | 37 | 577 | 15 | 409 | 50 | 577 | 14 | 537 | 19 | 503 | 22 |
456 | 50 | 544 | 16 | 421 | 50 | 581 | 14 | 577 | 15 | 503 | 22 |
489 | 17 | 590 | 16 | 475 | 16 | 581 | 14 | 577 | 49 | 409 | 23 |
521 | 50 | 594 | 16 | 496 | 50 | 583 | 15 | 592 | 16 | 406 | 25 |
532 | 50 | 606 | 16 | 509 | 21 | 594 | 15 | 592 | 17 | 406 | 25 |
533 | 20 | 489 | 17 | 516 | 46 | 475 | 16 | 592 | 50 | 594 | 34 |
533 | 27 | 544 | 17 | 517 | 49 | 533 | 16 | 594 | 34 | 594 | 34 |
537 | 34 | 583 | 17 | 519 | 49 | 409 | 18 | 594 | 34 | 577 | 49 |
540 | 27 | 576 | 18 | 525 | 38 | 409 | 18 | 605 | 16 | 592 | 50 |
542 | 27 | 533 | 20 | 533 | 16 | 533 | 18 | | | | |
544 | 16 | 550 | 20 | 533 | 18 | 568 | 18 | | | | |
544 | 17 | 409 | 21 | 533 | 20 | 584 | 19 | | | | |
548 | 30 | 589 | 21 | 533 | 32 | 604 | 19 | | | | |
548 | 50 | 409 | 22 | 541 | 49 | 533 | 20 | | | | |
550 | 20 | 409 | 23 | 542 | 14 | 550 | 20 | | | | |
551 | 43 | 559 | 24 | 550 | 20 | 509 | 21 | | | | |
551 | 50 | 584 | 24 | 551 | 50 | 581 | 22 | | | | |
559 | 24 | 583 | 26 | 557 | 31 | 596 | 23 | | | | |
562 | 50 | 533 | 27 | 562 | 29 | 562 | 29 | | | | |
568 | 14 | 540 | 27 | 564 | 13 | 557 | 31 | | | | |
572 | 34 | 542 | 27 | 568 | 14 | 533 | 32 | | | | |
574 | 27 | 574 | 27 | 568 | 18 | 525 | 38 | | | | |
576 | 18 | 548 | 30 | 572 | 50 | 599 | 42 | | | | |
577 | 15 | 537 | 34 | 573 | 49 | 516 | 46 | | | | |
581 | 49 | 572 | 34 | 577 | 14 | 581 | 47 | | | | |
581 | 50 | 399 | 36 | 581 | 14 | 367 | 49 | | | | |
583 | 17 | 421 | 37 | 581 | 14 | 387 | 49 | | | | |
583 | 26 | 551 | 43 | 581 | 22 | 517 | 49 | | | | |
584 | 24 | 603 | 47 | 581 | 47 | 519 | 49 | | | | |
589 | 21 | 396 | 49 | 583 | 15 | 541 | 49 | | | | |
590 | 16 | 581 | 49 | 584 | 19 | 573 | 49 | | | | |
592 | 50 | 596 | 49 | 588 | 50 | 589 | 49 | | | | |
594 | 11 | 604 | 49 | 589 | 49 | 591 | 49 | | | | |
594 | 14 | 606 | 49 | 591 | 49 | 595 | 49 | | | | |
594 | 14 | 456 | 50 | 592 | 12 | 597 | 49 | | | | |
594 | 16 | 521 | 50 | 594 | 15 | 601 | 49 | | | | |
594 | 50 | 532 | 50 | 595 | 49 | 390 | 50 | | | | |
596 | 11 | 548 | 50 | 595 | 50 | 409 | 50 | | | | |
596 | 49 | 551 | 50 | 596 | 23 | 421 | 50 | | | | |
601 | 50 | 562 | 50 | 597 | 49 | 496 | 50 | | | | |
603 | 47 | 581 | 50 | 599 | 11 | 551 | 50 | | | | |
604 | 49 | 592 | 50 | 599 | 42 | 572 | 50 | | | | |
606 | 16 | 594 | 50 | 601 | 49 | 588 | 50 | | | | |
606 | 49 | 601 | 50 | 604 | 19 | 595 | 50 | | | | |

Figure

Santa Fe Trail overcome by ant found by DE.

This contribution deals with an alternative algorithm for symbolic regression. The study shows that this algorithm is suitable not only for mathematical regression but also for setting an optimal trajectory for an artificial ant, which can be replaced by robots in the real world, in industry.

In comparison with standard GP, it can be stated on the basis of the aforementioned results that AP can solve this kind of problem in a shorter time when cost function evaluations are counted.

The aim of this study was not to show that AP is better or worse than GP (or GE when compared), but that AP is also a powerful tool for symbolic regression with support of different evolutionary algorithms.

The main objective of this paper was to show that symbolic regression done by AP is also able to solve cases in which linguistic terms are used, for example, commands for the movement of an artificial ant or of robots in the real world. Here, simulations for 3 algorithms (SOMA, DE, and SA) were carried out. As the figures showed, SOMA and DE were more successful than SA. This shows that good performance of AP depends on the choice of a suitable, robust, and powerful evolutionary algorithm.

During the simulations carried out for this problem, the following results were reached:

50 simulations for each algorithm, that is, 150 in total for all 3 algorithms.

Positive results:

49 from 50 simulations for SOMA,

49 from 50 for DE,

and 14 from 50 for SA,

which accomplished the required task; thus analytic programming is able to solve this kind of problem in symbolic regression. This result also says that the basic version of simulated annealing used here is not as powerful a tool as the other two evolutionary algorithms. It is supposed that the cost function is very complicated, with quite a lot of local optima, and that simulated annealing was therefore not as successful as SOMA or DE.

Solutions which fulfil the conditions laid down by Koza [

Future research is a key activity in this field. The next steps are to finish the simulations with GA and other evolutionary algorithms and to try other classes of problems, to show that analytic programming is as powerful a tool as genetic programming or grammatical evolution.

This work was supported by Grant no. MSM 7088352101 of the Ministry of Education of the Czech Republic and by grants of the Grant Agency of the Czech Republic GACR 102/09/1680.