A Comparative Dynamical Analysis of Hebrew Texts

A. Boyarsky and P. Góra

Three Hebrew texts, one of them the Hebrew Bible, are investigated using dynamical analysis. First, the average mutual information for each of the texts is determined. The first minimum occurs at T = 3 in all three cases, suggesting that 3-letter words are sufficient to study the dynamical properties of the texts. Using 3-letter words as the state space for each of the texts, we construct a Markov chain model, compute the relative measure-theoretic entropy for each text, and use this tool as a means of comparing the information content of the three texts.


INTRODUCTION
The Hebrew language consists of 22 letters and a space that separates words. Five of the letters also have final forms, which are used at the end of words. Some vowels in the middle of a word are omitted in the old language but are included in modern texts. We removed these vowels from all the texts in order to obtain texts that can be compared more uniformly. The vowels we ignored inside a word, in order to produce more uniform word data in all three texts, are: alef, vav, yud, ayin.
Let X denote the space of 28 basic letters (Fig. 1).
To study the statistical dynamics of a text we consider a transformation T that takes the first 3 letters of a word to the first 3 letters of the next word. If a word has only 2 letters, the space following the word is appended to the word itself. If a word has only one letter, we append two spaces after it to create a triple representing this word.
In this way we obtain a data set consisting of a sequence, each element of which is a 3-letter sequence. Depending on the text under consideration, there are approximately 7000-8000 such admissible 3-letter sequences (the maximum is 27 x 28 x 28).
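As a concrete sketch of the construction above, the mapping from words to 3-letter states can be written as follows (hypothetical helper code, not the authors'; Latin letters stand in for the Hebrew alphabet):

```python
def word_to_state(word: str) -> str:
    """First 3 letters of `word`, padded with trailing spaces to length 3,
    as described above for 1- and 2-letter words."""
    return word[:3].ljust(3)

def text_to_states(text: str) -> list[str]:
    """Split a text on whitespace and map each word to its 3-letter state."""
    return [word_to_state(w) for w in text.split()]
```

Applied to a whole text, this yields the sequence of states on which the transformation T acts.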
As shown below, the average mutual information between the first and the T-th letter of a word decreases for T = 2, 3 and then starts increasing, attaining its minimum at T = 3. This shows that the fourth letter of a word is more dependent on the first letter than the third letter is. Hence the fourth letter provides less information about the text than does the third letter.

STATE SPACE
We use 3-letter sequences because that size presents a respectable indication of word structure without creating an overly large space. Four-letter sequences would create a space with hundreds of thousands of elements, making it difficult to analyze even with the most powerful computers. Since the average word length is between 3 and 4 letters, 3-letter words give a good indication of the word itself.
We suggest a number of measures of the "richness" of a text. One such measure is based simply on the average attainable set of 3-sequences. Let xyz represent a valid 3-letter sequence in a Hebrew text. We compute all the 3-letter sequences that are attainable from xyz in the given text. We repeat this for all possible 3-letter sequences and then compute the average number of attainable sequences over all admissible 3-letter sequences. A rich text will have a relatively large such average.
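Reading "attainable from xyz" as "observed immediately after xyz somewhere in the text", this richness measure could be sketched as follows (an illustrative interpretation, not the authors' code):

```python
from collections import defaultdict

def average_branching(states: list[str]) -> float:
    """Average, over the 3-letter states occurring in the text, of the
    number of distinct states observed immediately after each state."""
    followers = defaultdict(set)
    for cur, nxt in zip(states, states[1:]):
        followers[cur].add(nxt)
    if not followers:
        return 0.0
    return sum(len(s) for s in followers.values()) / len(followers)
```

A rich text, in this reading, is one whose states have many distinct successors on average.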
The following argument supports our choice of a 3-letter representation of words. For all three texts, we calculated the average mutual information AMI(T) [1] between the first letter x(1) and the T-th letter x(T) of a word, for T = 2, ..., 6:

    AMI(T) = Σ_{x(1), x(T)} P(x(1), x(T)) log2 [ P(x(1), x(T)) / ( P(x(1)) P(x(T)) ) ],

where P(x(1)), P(x(T)) and P(x(1), x(T)) are the probabilities (frequencies) of individual letters and of pairs of letters in the text, estimated over all words of the text (approximately 80,000). For all texts the first minimum of AMI(T) occurs at T = 3. We interpret this fact as follows: the dependence of the letter x(T) on the letter x(1) of the word decreases for T = 2, 3 and then increases again.
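A minimal sketch of the AMI(T) estimate described above, with probabilities replaced by observed frequencies (illustrative code, not the authors'; as a simplifying assumption, only words of length at least T contribute):

```python
import math
from collections import Counter

def ami(words: list[str], T: int) -> float:
    """Average mutual information between the 1st and T-th letters of words."""
    pairs = [(w[0], w[T - 1]) for w in words if len(w) >= T]
    n = len(pairs)
    joint = Counter(pairs)                # frequencies of letter pairs
    first = Counter(x for x, _ in pairs)  # frequencies of first letters
    tth = Counter(y for _, y in pairs)    # frequencies of T-th letters
    # sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with p's as count ratios
    return sum((c / n) * math.log2(c * n / (first[x] * tth[y]))
               for (x, y), c in joint.items())
```

AMI is zero when the T-th letter is independent of the first, and grows with dependence.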
ENTROPY FOR A MARKOV CHAIN

We model a text by a Markov chain on 3-letter sequences. (Textual analysis is what originally motivated Markov to introduce Markov chains in 1911 [2], and it was a source of examples in Shannon's work on communication theory [3]. See also [4,5].) Motivated by the results of Section 1, we use only the first 3 letters of each word (adding extra spaces for 1- and 2-letter words). Each word is considered a state, and the transition probabilities are calculated from the flow of the text. We describe this in detail now.

Let N denote the number of words in the text. Our texts contain around 80,000 words each; in general, we assume that the number of words is "large". For any admissible triple of letters uvw, let N_uvw denote the number of words represented by uvw in the text, not counting the last word, i.e., considering only words 1, 2, ..., N-1. Similarly, let N'_uvw denote the number of words represented by uvw, not counting the first word, i.e., considering only words 2, ..., N. The numbers N_uvw and N'_uvw are equal for all but at most two admissible triples, and for those two they differ by at most 1. We can artificially add one word at the end of the text to make them always equal. In the sequel we assume that they are equal, as we are interested only in the ratios N_uvw/N.
To calculate the transition probabilities, for any uvw we count the number N_{uvw,xyz} of consecutive pairs of words represented by uvw followed by xyz, and we set

    P_{uvw,xyz} = N_{uvw,xyz} / N_{uvw},    (1)

for any admissible uvw and xyz.
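The transition probabilities just defined can be estimated from the state sequence as follows (a sketch under the same notation, not the authors' code):

```python
from collections import Counter, defaultdict

def transition_probs(states: list[str]) -> dict:
    """P[uvw][xyz] = N_{uvw,xyz} / N_{uvw}, estimated from the text."""
    pair_counts = Counter(zip(states, states[1:]))  # N_{uvw,xyz}
    out_counts = Counter(states[:-1])               # N_{uvw}, over words 1..N-1
    P: dict = defaultdict(dict)
    for (u, x), c in pair_counts.items():
        P[u][x] = c / out_counts[u]
    return dict(P)
```

Each row of the resulting matrix sums to 1, as required for a Markov chain.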
We will show that the numbers P_uvw = N_uvw/N are stationary probabilities for this Markov chain, if we assume N_uvw/N = N'_uvw/N for all uvw.

PROPOSITION If N_uvw/N = N'_uvw/N for all uvw, then the numbers P_uvw = N_uvw/N are stationary probabilities for the Markov chain with transition probabilities given by (1).
Proof We have to show that, for any xyz,

    P_xyz = Σ_{uvw} P_uvw P_{uvw,xyz}.    (2)

Obviously, we have N'_xyz = Σ_{uvw} N_{uvw,xyz} for any xyz. If N_uvw/N = N'_uvw/N for all uvw, we can rewrite this as

    N_xyz / N = Σ_{uvw} (N_uvw / N) (N_{uvw,xyz} / N_{uvw}),

which is equivalent to (2).

Now the entropy of the Markov chain is given by the formula

    entropy = - Σ_{uvw} Σ_{xyz} P_uvw P_{uvw,xyz} log(P_{uvw,xyz}),

where the logarithm is to the base 2.
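The entropy formula can be evaluated directly from the transition probabilities and the stationary probabilities P_uvw (a sketch; the dict-of-dicts layout is an assumption, not the authors' representation):

```python
import math

def markov_entropy(P: dict, pi: dict) -> float:
    """entropy = -sum_{uvw} sum_{xyz} pi[uvw] * P[uvw][xyz] * log2 P[uvw][xyz]."""
    return -sum(pi[u] * p * math.log2(p)
                for u, row in P.items()
                for p in row.values())
```

For a uniform chain on two states this gives 1 bit, the maximal value log2(2).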

RESULTS
In this section we present the numerical results obtained from the analysis of the texts. The first table gives, for each text, the average number of possible different following words and the standard deviation of these numbers. The second table gives the entropy and the maximal possible entropy (as defined in [3]) for the three texts. The maximal possible entropy is the logarithm (to base 2) of the number of states in the Markov chain. The numbers of admissible triples differ among the three texts.
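For example, with the roughly 7000-8000 admissible triples quoted earlier, the maximal possible entropy log2(number of states) is about 12.8 to 13.0 bits:

```python
import math

def max_entropy(num_states: int) -> float:
    """Maximal possible entropy (in bits) of a Markov chain with the
    given number of states: log2 of the number of states."""
    return math.log2(num_states)
```

The measured entropy of a text is necessarily below this bound, and the gap indicates how far the text is from a maximally "random" use of its state space.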

[Table: text, entropy, number of states, maximal entropy]

The texts used in the analysis were: the Koren text of the Hebrew Bible, and two books obtained in electronic form from Bar Ilan University.

Hebrew text 1: "Hitganvut Yehidim" (Heart Murmur) by Joshua Kenaz, published by Am Oved, Tel Aviv, ISBN 965-13-0413-8.
Hebrew text 2: "An Autobiography" by N. Lorekh, now a historian, formerly a military officer and diplomat.
