Research on Community Competition and Adaptive Genetic Algorithm for Automatic Generation of Tang Poetry

Traditional Tang poetry has been studied extensively, and the automatic generation of Tang poetry has attracted great interest in recent years. This study presents a community-based competition and adaptive genetic algorithm for automatically generating Tang poetry. Community-based competition is added to maintain the diversity of genes during evolution, while adaptation means that the crossover and mutation probabilities vary with the fitness values of the poems, which prevents premature convergence and generates better poems more quickly. Analysis of the experimental results shows that the improved algorithm is superior to the conventional method.


Introduction
Tang poetry has been one of the most precious cultural heritages in China for over a thousand years and is of great research value. A large number of scholars have studied it in different ways [1]. For most modern people, composing a classical poem is still a great challenge [2]. In recent years, artificial intelligence has come to play an important role in daily life and has become a hot topic in many fields [3]. Computers can now be regarded as artists: with the help of computer programs, an autogenerating system can compose beautiful Tang poems [4]. Such a system is therefore well worth realizing [5]. In particular, Zhou et al. designed a system that composes high quality TLSD (The Lyrics of the Song Dynasty) with an evolution strategy [6]. However, the automatic generation of Tang poetry has not yet been accomplished. To design a suitable algorithm for generating Tang poetry, the first task is to grasp the characteristics of Tang poetry, and several factors must be taken into account [7]. For instance, Tang poetry has several styles, such as JUEJU (a poem of four lines) and LVSHI, which has a strict tonal pattern and rhyme scheme.
Genetic algorithms (GA) employ a random search to locate the globally optimal solution [8,9]. GA has been studied and improved continuously since it was put forward by Holland in 1975 [3]. After nearly 40 years of continued research, GA now has a well-established theoretical system [10][11][12]. It is used in many fields, such as production scheduling, combinatorial optimization, automatic control, robotics, and image processing [13]. GA is a core technique of artificial intelligence, and in recent studies the adaptive genetic algorithm (AGA) [14] has become the mainstream: the crossover and mutation probabilities are adjusted according to the fitness function, which can effectively prevent premature convergence.
Therefore, applying GA in an autogenerating system for Tang poetry is reasonable and feasible. However, several problems still await resolution. For example, (1) it takes too long to compose a poem; (2) the parameters, such as the crossover and mutation probabilities, are mostly selected by experience rather than by extensive verification and are not flexible; (3) premature convergence occurs: a population of poems converges too early, is easily trapped in a local optimum, and yields a suboptimal result.
To solve these problems, this study has developed an Automatic Tang Poems Generating System (ATPGS), which has the following two advantages. First, community competition divides the population into groups when the initial population is formed, which not only shortens the running time effectively but also helps avoid premature convergence. Second, AGA makes the crossover and mutation probabilities in this system more reasonable. Furthermore, a coding scheme based on level and oblique tones is proposed, and the elitism algorithm and roulette wheel selection are adopted.
Overall, this work represents progress in the field of autogenerating systems for Chinese poetry. The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 describes the implementation details of the system and the optimization method and presents the main framework of the algorithm; Section 4 analyzes the improved performance of the system through verification experiments; Section 5 provides the conclusions.

Related Work
Tang poetry is regarded as a pearl in the long history of China, and there is no lack of formal research on computer-assisted poetry generation. Automatically generating Tang poetry is a complex task, as it involves several levels of language, such as phonetics, lexical choice, syntax, and semantics, and usually requires a considerable amount of input knowledge [1,2]. This study describes only the process of generating Tang poetry with genetic algorithms. It does not detail the branch studies, such as thesaurus generation, semantic relatedness, and poetry participle (word segmentation).

About Semantic Relatedness.
The semantic relatedness of words can be studied with various linguistic resources, such as Wikipedia or large scale text corpora, using methods like Latent Semantic Analysis (LSA). Computing the degree of semantic relatedness of words in Tang poetry is key to many applications, such as searching, clustering, and the automatic generation of poems. To increase the efficiency and accuracy of computing semantic relatedness, the LSA process was improved by representing word semantics with "words-by-poetry categories" instead of "words-by-poems." Segmented words were obtained from more than 40000 poems, and relatedness is computed as the cosine value calculated from the cooccurrence matrix decomposed with the Singular Value Decomposition (SVD) method.

About Poetry Participle.
For poetry participle (word segmentation), ATPGS adopts the Nash Equilibrium Theorem, which comes from another research area. The results of that research provide a more appropriate thesaurus, and, once a Tang poem has been composed, the system checks its quality automatically.

About Adaptive Genetic Algorithm.
After considering a variety of adaptive genetic algorithms, this study selects the method most suitable for the characteristics of Tang poetry [15,16].
The main formulas for the crossover probability use the following parameters: $p_c$ is the crossover probability; $p_{c\max}$ is the maximum crossover probability; $p_{c\min}$ is the minimum crossover probability; $f_{\max}$ is the maximum fitness value; $f_{\mathrm{avg}}$ is the average fitness value; $f$ is the fitness value of the individual. The significant findings are as follows: when the fitness of an individual is greater than the average value, the greater its fitness, the smaller the corresponding crossover probability. When the fitness is less than the average value, the individual receives the greater crossover probability.
The mutation probability uses the following parameters: $p_m$ is the mutation probability; $p_{m\max}$ is the maximum mutation probability; $p_{m\min}$ is the minimum mutation probability; $f_{\max}$ is the maximum fitness value; $f_{\mathrm{avg}}$ is the average fitness value; $f$ is the fitness value of the individual. When the fitness is greater than the average value, the greater the fitness, the bigger the mutation probability. When the fitness is less than the average, the mutation probability is smaller.
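As a hedged illustration, one piecewise-linear form that is consistent with the crossover behaviour described above is given below. It is an assumption for illustration and not necessarily the exact formula used in ATPGS; the symbol $f$ for fitness and the linear interpolation are choices made here.

```latex
p_c =
\begin{cases}
p_{c\max} - \dfrac{(p_{c\max} - p_{c\min})\,(f - f_{\mathrm{avg}})}{f_{\max} - f_{\mathrm{avg}}}, & f \ge f_{\mathrm{avg}},\\[2ex]
p_{c\max}, & f < f_{\mathrm{avg}}.
\end{cases}
```

Under this form, an individual with $f = f_{\max}$ receives $p_{c\min}$ and an individual with $f = f_{\mathrm{avg}}$ receives $p_{c\max}$. The case $f_{\max} = f_{\mathrm{avg}}$ (a fully converged population) has to be handled separately, for example by using $p_{c\max}$.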

Concrete Steps to Achieve
3.1. Coding Scheme.
Choosing a suitable coding scheme is very important in this system. Currently, there is no sufficiently accurate and efficient encoding scheme for the automatic generation of Tang poetry. Therefore, following the earlier work of Professor Zhou, we propose a coding scheme for the autogenerating system [17,18]. Based on the patterns of level and oblique tones in Tang poetry, we convert each pattern into a binary code: "0" stands for a level tone, "1" stands for an oblique tone, and "*" stands for "both are acceptable."
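A minimal sketch of how such a code might be checked is shown below. It assumes that each verse has already been converted to a string of "0" and "1" tones; the function name and the five-character pattern are hypothetical and serve only as an illustration.

```python
def matches_tonal_pattern(tones: str, pattern: str) -> bool:
    """Return True if a verse's tone string satisfies a tonal pattern.

    tones   -- one character per syllable: "0" (level) or "1" (oblique)
    pattern -- same length; "0", "1", or "*" (either tone is acceptable)
    """
    if len(tones) != len(pattern):
        return False
    return all(p == "*" or p == t for t, p in zip(tones, pattern))


# Hypothetical five-character pattern, for illustration only.
PATTERN = "*0011"
print(matches_tonal_pattern("10011", PATTERN))  # True
print(matches_tonal_pattern("10111", PATTERN))  # False
```

The wildcard "*" is treated as matching either tone, so only the fixed positions constrain a verse.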

Generation of Initial Population.
Words are chosen randomly, guided by the preexisting verses of Tang poetry, to generate sentences that are stored in a standby database. The main steps are as follows:
(1) Find words related to the keywords to form a candidate space.
(2) Starting from the previous word in the verse, find other words with somewhat lower relatedness and add them to the candidate space in turn. Repeat until the candidate space is large enough.
(3) Place the words randomly in appropriate positions.
(4) Find words in the candidate space that satisfy the rhyme and fill them into the relevant positions.
(5) Repeat steps (3) and (4) to generate the population groups for community competition. This helps both to reach the global optimum and to reduce the calculation time.
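The steps above can be sketched roughly as follows. The relatedness table, the rhyme set, and all function names are toy assumptions introduced for illustration. As a simplification, every line ends with a rhyming word, and tonal checks are omitted.

```python
import random

# Hypothetical toy data: relatedness lists and a rhyme set (assumptions).
RELATED = {
    "moon": ["night", "frost", "light"],
    "night": ["star", "dream", "light"],
    "frost": ["autumn", "cold"],
    "light": ["lamp", "bright"],
}
RHYMING = {"light", "night", "bright"}


def build_candidate_space(keyword, min_size):
    """Steps (1)-(2): start from the keyword's related words, then keep
    adding words related to the words already collected."""
    space = list(RELATED.get(keyword, []))
    i = 0
    while len(space) < min_size and i < len(space):
        for word in RELATED.get(space[i], []):
            if word not in space:
                space.append(word)
        i += 1
    return space


def make_verse(space, length, rng):
    """Steps (3)-(4): fill positions randomly, then put a rhyming word last."""
    verse = [rng.choice(space) for _ in range(length)]
    rhymes = [w for w in space if w in RHYMING]
    if rhymes:
        verse[-1] = rng.choice(rhymes)
    return verse


def make_communities(keyword, n_communities, size, rng, lines=4, length=5):
    """Step (5): repeat generation to fill every community with poems."""
    space = build_candidate_space(keyword, min_size=8)
    return [
        [[make_verse(space, length, rng) for _ in range(lines)]
         for _ in range(size)]
        for _ in range(n_communities)
    ]


communities = make_communities("moon", n_communities=3, size=4,
                               rng=random.Random(0))
print(len(communities), len(communities[0]), communities[0][0][0])
```

Each community is a list of poems, each poem a list of lines, and each line a list of words, so the later operators can work on whole lines or on single words.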

Fitness Function.
The calculation of the fitness function is very important in a genetic algorithm: it influences the crossover probability and the mutation probability, as well as the condition for stopping the iteration [4]. The fitness function should try to meet the following conditions: (1) being single-valued, continuous, nonnegative, and maximized; (2) reasonability and consistency; (3) low computational cost; (4) strong versatility.
The fitness function we set is based mainly on criteria such as level and oblique tones, antithetical parallelism, and the prevalence of the words.
Level and oblique tones: they are the most basic requirement for judging a good poem. The system excludes any verse that does not meet this requirement as soon as it is generated. The tonal pattern is therefore one of the requirements for judging a poem, but not a parameter of the fitness function.
Antithetical parallelism: in Tang poetry, JUEJU generally do not require antithetical parallelism, but LVSHI must use it, which makes writing Tang poetry a demanding task for modern people. As antithetical parallelism can improve the mood of a poem, verses generated with antithetical parallelism receive a larger parameter value than sentences without it.
The frequency of words: for a poem with a specific theme, some words appear with high frequency. If an automatically generated poem contains words that appear with high frequency, it receives a relatively high fitness value. On the contrary, if the poem is composed of rare words, its fitness value is on the low side.
Pattern matching: a database is built of different kinds of verses that belong to specific themes, with about 200-300 verses stored for each theme. The result of pattern matching reflects the relevance of a generated verse to the verses of the corresponding theme. If the matching rate is over 70%, the generated poem is considered fit on this criterion; otherwise, its fitness value is reduced.
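A minimal sketch of a fitness function combining these criteria is given below. The weights, the penalty factor, the definition of the matching rate as a word-overlap fraction, and the `has_parallelism` flag (assumed to be computed elsewhere) are all assumptions made for illustration; only the 70% threshold comes from the text.

```python
from collections import Counter


def word_frequency_score(poem, theme_counts):
    """Average relative theme frequency of the poem's words (rare words score low)."""
    words = [w for line in poem for w in line]
    total = sum(theme_counts.values()) or 1
    return sum(theme_counts.get(w, 0) / total for w in words) / max(len(words), 1)


def pattern_match_rate(poem, theme_verses):
    """Fraction of the poem's words that occur in the stored theme verses."""
    vocab = {w for verse in theme_verses for w in verse}
    words = [w for line in poem for w in line]
    return sum(w in vocab for w in words) / max(len(words), 1)


def fitness(poem, theme_verses, has_parallelism,
            w_freq=1.0, w_par=0.5, penalty=0.5, threshold=0.7):
    """Nonnegative score: frequency term, parallelism bonus, match penalty."""
    theme_counts = Counter(w for verse in theme_verses for w in verse)
    score = w_freq * word_frequency_score(poem, theme_counts)
    if has_parallelism:
        score += w_par          # verses with antithetical parallelism score higher
    if pattern_match_rate(poem, theme_verses) < threshold:
        score *= penalty        # below the 70% threshold the fitness is reduced
    return score


# Toy theme database and poems (hypothetical data).
theme = [["moon", "light"], ["frost", "night"]]
print(fitness([["moon", "light"]], theme, has_parallelism=False))
print(fitness([["dragon", "gate"]], theme, has_parallelism=False))
```

Tonal correctness is deliberately absent from the score, since verses that violate the tonal pattern are excluded before fitness is evaluated.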

Selection Operator.
The selection operator, also called the copy operation, selects the most adaptable individuals from the population according to their fitness function values. Generally speaking, selection allows individuals with high fitness to reproduce more offspring, while the other individuals produce fewer offspring or are even eliminated [19]. We adopt the elitism algorithm, the roulette algorithm, and parent-offspring competition as the selection method. In the standard genetic algorithm, every individual of the next generation is born from the crossover and mutation of individuals in the mating pool. In real life, however, each group includes individuals from both the older generation and the new generation. Moreover, replacing all parents with new individuals can eliminate the genes of well-adapted parents and slow down the optimization process. The competition mechanism preserves excellent individuals, helps to keep the superior genes, and improves the convergence speed. If the fitness of a parent and its offspring is equal, the offspring has priority to enter the next generation.
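The three selection mechanisms can be sketched as follows. This is an illustrative outline under assumed function names, not the ATPGS implementation; individuals can be any Python objects, and `fitness` is any callable returning a nonnegative number.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)   # degenerate case: choose uniformly
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running > pick:
            return individual
    return population[-1]               # guard against rounding at the upper end


def compete(parent, child, fitness):
    """Parent-offspring competition: the fitter survives; ties favor the child."""
    return child if fitness(child) >= fitness(parent) else parent


def elites(population, fitness, k):
    """Return the k fittest individuals, to be carried over unchanged."""
    return sorted(population, key=fitness, reverse=True)[:k]


rng = random.Random(1)
print(roulette_select(["x", "y"], [0.0, 1.0], rng))  # "y": "x" has zero weight
print(elites([3, 1, 2], lambda v: v, k=2))           # [3, 2]
```

The tie rule in `compete` mirrors the statement that offspring take priority when parent and offspring fitness are equal.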

Crossover and Mutation Operator.
In this paper we use the adaptive genetic algorithm, in which the crossover and mutation probabilities change according to the fitness. ATPGS uses heuristic crossover with the whole poem as the swap space, so that superior individuals are preserved and inferior individuals are eliminated [15]. The greater the crossover probability, the faster new individuals are generated. However, when the crossover probability is too large, the genetic pattern is more likely to be destroyed, and high quality individual structures are soon lost. If the crossover probability is too small, the search process becomes slower and may even stagnate. When the fitness of the individuals in the population tends to converge to a local optimum, the crossover probability is increased [14,19].
The mutation operator follows the principle of genetic variation in biogenetics: it changes the value of one or more genes in an individual's coding string with a certain probability. If the mutation probability is too small, new individual structures are unlikely to be produced; if it is too large, the genetic algorithm degenerates into a purely random search. When the fitness of the individuals in the population tends to converge to a local optimum, the mutation probability is increased, and when the fitness of the group is scattered, the mutation probability is decreased [14,19].
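A hedged sketch of these operators is given below. The piecewise-linear adaptation rule is an assumed form for the crossover probability only; how the mutation probability is adapted is left open here, so `mutate` takes `p_m` as an argument. The paper's heuristic crossover is replaced by a plain one-point crossover over whole lines, and all names are hypothetical.

```python
import random


def adaptive_probability(f, f_avg, f_max, p_min, p_max):
    """Below-average individuals get p_max; above average, the probability
    falls linearly to p_min at f == f_max (assumed adaptation rule)."""
    if f < f_avg or f_max <= f_avg:
        return p_max
    return p_max - (p_max - p_min) * (f - f_avg) / (f_max - f_avg)


def crossover(poem_a, poem_b, rng):
    """One-point crossover over whole lines: swap the tails of two poems."""
    point = rng.randrange(1, len(poem_a))
    return (poem_a[:point] + poem_b[point:],
            poem_b[:point] + poem_a[point:])


def mutate(poem, candidates, p_m, rng):
    """Replace each word by a random candidate word with probability p_m."""
    return [[rng.choice(candidates) if rng.random() < p_m else word
             for word in line]
            for line in poem]


rng = random.Random(0)
a = [["a1"], ["a2"], ["a3"], ["a4"]]
b = [["b1"], ["b2"], ["b3"], ["b4"]]
p_c = adaptive_probability(f=8.0, f_avg=5.0, f_max=10.0, p_min=0.6, p_max=0.9)
if rng.random() < p_c:
    a, b = crossover(a, b, rng)
print(p_c, mutate(a, ["x", "y"], p_m=0.1, rng=rng))
```

Operating on whole lines keeps each line's internal tonal pattern intact during crossover, while mutation works on single words and would need a tonal recheck afterwards.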

Community Competition.
Community competition is essentially a form of parallel computing. It can effectively mitigate the "premature convergence" issue that appears in the genetic algorithm: because several communities evolve independently, the search is less likely to stay trapped in a local optimum and more likely to reach the global optimum, and, because the communities can be computed in parallel, a great deal of computing time can be saved. Its main operation is to apply all the steps mentioned above to each community, so that each of the 10 communities composes a best poem of its own. These poems then compete one more time, and the poem with the highest fitness value survives.
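The overall scheme can be sketched as below. The toy individuals (bit lists), the toy fitness, and the toy evolution step are stand-ins chosen so that the example runs on its own; they are assumptions, not the poem representation used by ATPGS. The communities are evolved sequentially here for simplicity, although they are independent and could be evolved in parallel.

```python
import random


def evolve_community(population, fitness, step, generations):
    """Run the GA loop inside one community and return its best individual."""
    for _ in range(generations):
        population = step(population)
    return max(population, key=fitness)


def community_competition(communities, fitness, step, generations):
    """Evolve every community independently, then let the champions compete."""
    champions = [evolve_community(c, fitness, step, generations)
                 for c in communities]
    return max(champions, key=fitness), champions


# Toy stand-ins (assumptions): individuals are bit lists, fitness counts ones.
rng = random.Random(0)


def toy_fitness(bits):
    return sum(bits)


def toy_step(population):
    """Flip one random bit per individual; keep the child if it is no worse."""
    next_gen = []
    for parent in population:
        child = parent[:]
        i = rng.randrange(len(child))
        child[i] ^= 1
        keep_child = toy_fitness(child) >= toy_fitness(parent)
        next_gen.append(child if keep_child else parent)
    return next_gen


communities = [[[rng.randint(0, 1) for _ in range(8)] for _ in range(5)]
               for _ in range(10)]
initial_best = max(toy_fitness(ind) for c in communities for ind in c)
best, champions = community_competition(communities, toy_fitness, toy_step,
                                        generations=20)
print(toy_fitness(best), len(champions))
```

With 10 communities, the final round compares 10 champions, matching the description that each community composes a best poem of its own before the last competition.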

Experimental Results
In this section we provide the experimental results of the community competition and adaptive genetic algorithm. First, experiments are designed to find reasonable settings for the implementation framework presented in the previous sections by tuning parameters such as the mutation probability, crossover probability, and elitist probability [20,21]. The elitist probability allows the best individual of the current generation to carry over into the next generation, which guarantees that the solution quality obtained by the GA does not decrease from one generation to the next. Second, experiments are designed to demonstrate how the number of evolution times affects the results: the crossover probability and the mutation probability are held constant while the number of evolution times is varied. For accuracy of reporting, we set different constraint conditions. The experimental results show that when the number of evolution times exceeds 40, the composing time exceeds 60 s. Since spending more than 60 s to produce a poem is unfriendly to users, more than 40 evolution times are not considered in the experiments. Finally, we compare the effectiveness of four genetic algorithms: noncommunity competition and nonadaptive genetic algorithm (NCNA), noncommunity competition and adaptive genetic algorithm (NCA), community competition and nonadaptive genetic algorithm (CNA), and community competition and adaptive genetic algorithm (CA).

[Flowchart: community competition. The population is divided into groups 1, 2, 3, ..., p. In group 1, the individual fitness values are analyzed and new individuals are generated; the same operations are applied in every other group, and each group produces its own optimal individual.]

Tuning Crossover Probability and Elitist Probability to Find Reasonable Settings.
It is worth tuning parameters such as the mutation probability, crossover probability, and elitist probability to find reasonable settings.
To study the effect of the crossover probability under the community competition and adaptive genetic algorithm, we set the elitist probability to 0.01 and the number of evolution times to 20 and vary the crossover probability over a range. The statistics are shown in Table 1.
Reorganizing the data gives a more intuitive picture. Under the same conditions, when the crossover probability is 0.7, 0.8, or 0.9, the generated poems are likely to reach high scores.
Based on Table 1, 0.7, 0.8, and 0.9 are chosen as crossover probabilities to examine whether the elitist probability changes the result. The comparison in Table 2 shows a clear trend. On closer comparison, when the elitist probability is 0.01 and the number of evolution times is 40, the scores are almost always higher than the others. The results indicate that 0.01 is the best elitist probability for composing a high quality poem when the number of evolution times is sufficient.

Influence of Different Evolution Times.
We find that the number of evolution times can affect the result, so we hold the crossover probability and the mutation probability constant to show how the score changes with different evolution times. For accuracy of reporting, we set different constraint conditions. The scores are shown in Table 3.
These data show that the quality of the generated poetry improves as the number of evolution times increases. In this system, a poem with a score over 2000 points can be accepted as a high quality poem. We can conclude that when the number of evolution times is about 30, the convergence of the genetic algorithm is more reasonable.

Comparison of Different Evolution Types.
In this part, we compare the effectiveness of the four evolution types. With the mutation and crossover probabilities held the same, we examine the connection between the number of evolution times and the quality of the automatically generated poems. The scores of the poems are listed in Table 4.
Table 4 shows that the quality of the poems generated by the community competition and adaptive genetic algorithm is better than that of the other three genetic algorithms.
Figure 2 plots a concise summary of the different evolution types as the average score over three kinds of parameters. The CA genetic algorithm can not only generate a good poem quickly but also improve poem quality steadily as the number of evolution times increases. It can also be seen from Figure 2 that the adaptive method plays an important role in the early stage, but as the number of evolution times increases, its role becomes more and more limited.

Conclusions and Future Work
Poetry composition is a difficult task in the field of language generation. The highlights of ATPGS are community competition and the adaptive crossover and mutation operations. With abundant lexical resources, an autogenerating system for Tang poetry becomes feasible. Poems composed by our system are of high quality in both rhyme and semantics. Furthermore, the generated sentences exhibit antithesis, which is handled much better than by the online websites for generating Tang poetry. We conclude that the community-based competition and self-adaptive genetic algorithm achieves better results in the automatic generation of Tang poems. In the future, research will focus on quantitative style evaluation criteria, so that computers can automatically label poems with their distinctive bold-and-unrestrained or graceful-and-restrained values. We plan to incorporate more criteria into the constrained summarization framework, which should consider aspects like the structure, novelty, and semantics of poems.

Figure 2: The average score of different evolution types.

Table 1: Experiments based on crossover probability.

Table 2: Experiments based on elitist probability.

Table 3: Experiments based on different evolution times.

Table 4: Experiments based on mutation probability and elitist probability.