Fractals and Hidden Symmetries in DNA

This paper deals with the digital complex representation of a DNA sequence and the analysis of existing correlations by wavelets. The symbolic DNA sequence is mapped into a nonlinear time series. By studying this time series, the existence of fractal shapes and symmetries will be shown. As a first step, the indicator matrix enables us to recognize some typical patterns of nucleotide distribution. The DNA sequence of the influenza virus A (H1N1) is investigated by using the complex representation, together with the corresponding walks on DNA; in particular, it is shown that DNA walks are fractals. Finally, by using wavelet analysis, the existence of symmetries is shown.


Introduction
The main task of this paper is to show the existence of hidden geometries which underlie the structure of a DNA sequence. Moreover, it will be shown that this geometry is fractal. In order to achieve this goal, the fundamental steps are (1) the choice of the digital representation of the symbolic DNA sequence, (2) the definition of the indicator matrix, (3) the construction of walks on DNA, and (4) the cluster analysis of wavelet coefficients.
In this paper it will be shown that the distribution of nucleotides A, C, G, T along the sequence must fulfill some hidden geometrical rules, which suggests that the biological activity depends on these geometrical rules. The understanding of the underlying biological function from a possible interpretation of the given sequence of nucleotides [1-6] is still under investigation.
The existence of hidden laws, periodicities, and statistical correlations [2, 7-9] might help us to characterize each DNA sequence in order to construct a possible functional classification.
From a mathematical point of view, the DNA sequence is a symbolic sequence of nucleotides with some empty spaces (noncoding regions). In order to extract numerical information from this sequence, it must be transformed into a digital sequence. When the symbolic sequence of A, C, G, T is digitized into one or more sequences of digits, one can apply statistical analysis to the resulting time series, so that the genome can be characterized by classical statistical parameters, like variance and standard deviation, or by nonclassical ones, like complexity, fractal dimension, or long-range dependence.
It follows that the symbolic sequence is transformed into a very large time series, ranging from about half a million digits for the most primitive organisms to billions of digits for mammals (the human genome contains roughly 3 billion nucleotides). However, these large sequences look like random sequences, from which it seems almost impossible to single out any correlation (see, e.g., [8] and references therein).
In any case, choosing the digital time series (discrete-time signal) that represents the symbolic genome sequence, so that the representation is the most suitable for statistical and mathematical analysis, is an arbitrary and difficult task; it was approached, with some interesting preliminary results, by using a complex representation [10, 11].
The simplest mathematical model for transforming a symbolic string into a numerical string is based on the Voss indicator function [12, 13], which is a discrete binary function. In the following, a suitable generalization is given, and it will be shown that its graphical representation gives rise to some characteristic patterns. The existence of patterns and symmetries is also shown through the cluster analysis of the wavelet coefficients [14, 15].
The analysis of DNA by wavelets [7, 9, 16] (see also [9, 16-19]) is an expedient tool for singling out local behavior, for characterizing singularities such as local spikes and jumps [7, 14], or for expressing the scale invariance of the coefficients [20] and thus the multifractal nature of the time series [21-23]. However, as shown below, the wavelet transform also decorrelates the sequence, so that the basic rules of the uncorrelated sequence can emerge. We will see that the wavelet coefficients of the short Haar wavelet transform are quantized. This is achieved by decomposing the sequence into short segments of equal length and applying a wavelet transform to each segment.
The long-range correlation in the digital representation of the DNA sequence [12, 13, 24-34] is a fundamental problem in DNA analysis. Correlation in a digital signal can be roughly linked with the concept of (statistical) dependence between elements that are far apart from each other. The existence of correlation in DNA has been explained by the so-called duplication-mutation process. According to [29, 35], in this evolutionary model the present DNA sequence results from an original short chain that was repeatedly duplicated, with some pieces of the sequence being modified. This process gives rise to the characteristic 1/f power-law decay [13, 26, 27].
The power law of long-range correlations expresses a scaling law, revealing self-similar structures like those studied in the physics of fractals. The long-range correlation, which can be detected by the autocorrelation function [12, 13, 30-32, 36, 37], implies scale independence (scale invariance), which is typical of fractals. However, the preliminary results on this topic were disputed [26, 27, 38] because of the limited amount of available data and because of the different approaches to the analysis. On the other hand, the existence of patchiness and correlation would give important insight into DNA organization. Therefore, in the following we discuss the correlation of a viral sequence of roughly 2000 base pairs (bp), with the understanding that a well-defined concept of correlation holds only for long sequences, of the order of $10^5$ bp or more. However, the most important properties are the fractal properties and symmetries, which are well defined even for short sequences. The identification and classification of these patches could be the key to understanding the large-scale structure of DNA.
Due to the recent outbreak of 2009 H1N1 flu virus (swine flu) in humans, this paper focuses on the influenza virus DNA (for similar results on a mammal and a fungus, see [10, 11, 20]), with particular regard to the A (H1N1) variant provided by the National Center for Biotechnology Information [3-6]. It will be shown that, as for dog and Candida DNA, the influenza virus is characterized by DNA walks with a fractal shape. Most remarkably, the same symmetries seen in the wavelet coefficients of the DNA walks for dog and Candida also hold for viruses. The fractal dimension of the indicator matrix for this virus is 2.03, much higher than that of the dog [11]; moreover, some differences in the DNA walks for segment 4 will be highlighted.
This paper is organized as follows. Section 2 gives some preliminary remarks on flu epidemiology; DNA, its representation, and the indicator matrix are given in Section 3, where the global fractal estimate is computed and the existence of fractal patterns is shown. The complex cardinal representation is given in Section 4, and the DNA complex walks are analysed in Section 5, where it is shown that DNA complex walks are fractals and they are compared with walks on pseudorandom and deterministic complex sequences. Section 6 deals with the correlation, power spectrum, and complexity of DNA. Sections 7 and 8 deal with wavelet analysis and show the existence of symmetries in the wavelet coefficients.

Flu Epidemiology
Flu epidemics cause morbidity and mortality worldwide. Each year, in the USA alone, more than 200000 patients are hospitalized because of influenza, and there are approximately 36000 deaths due to the influenza virus. Of the three types of influenza virus (A, B, and C), types A and B can cause flu epidemics. Influenza A virus is found not only in humans but also in many other animals. There are many subtypes of influenza A virus. All subtypes have been detected in wild birds, which are considered the source of influenza A viruses in all other animals. For example, pigs may be infected with influenza A viruses from different species (e.g., ducks and humans) at the same time, which may allow the genes of these viruses to mix, creating new variants of the hemagglutinin and/or neuraminidase proteins on the surface of the virus (antigenic shift). If these variants spread to humans, they are not recognized by the immune system and so can cause flu epidemics. In addition, influenza viruses undergo mutations as they spread from place to place, thereby introducing gradual changes in the hemagglutinin and/or neuraminidase proteins (antigenic drift). It will be shown, however, that even if there are some variants of the same virus at different places, the DNA structure remains the same (at least in the indicator matrix, see below), without significant variations. In other words, the DNA sequences might show some apparent differences, but when we pass to the digital representation and the indicator matrix, these differences vanish.
Each year, it is essential to identify new flu virus variants and to produce vaccines against them in order to prevent flu epidemics. Therefore, the investigation of the DNA sequences of variants might help to better understand the intrinsic nature of variation.
The Centers for Disease Control and Prevention (CDC) and other health organizations are actively investigating the recent outbreak of 2009 H1N1 flu virus (swine flu) in humans. The first cases were reported in the spring of 2009. CDC has determined that this swine influenza A (H1N1) virus is contagious and is spreading from human to human. Swine influenza is a respiratory disease of pigs (swine), caused by type A influenza virus, that regularly causes outbreaks of flu in pigs. Like all influenza viruses, swine flu viruses change constantly. Pigs can be infected by avian and human influenza viruses as well as by swine influenza viruses. When influenza viruses from different species infect pigs, the viruses can reassort (i.e., swap genes), and new viruses that are a mix of swine, human, and/or avian influenza viruses can emerge. There are four main influenza type A virus subtypes that have been isolated in pigs: H1N1, H1N2, H3N2, and H3N1, but most of the recently isolated influenza viruses from pigs have been H1N1 viruses. While swine flu viruses do not normally infect humans, sporadic human infections with swine flu have occurred. Most commonly, these cases occur in persons with direct exposure to pigs; human-to-human transmission of swine flu can also occur, as is the case with the 2009 outbreak.
An influenza A virion is composed of the nucleocapsid, a surrounding layer of the matrix protein M1, and the membrane envelope. The envelope contains two major surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA), and a minor membrane protein, M2. The nucleocapsid consists of individual ribonucleoproteins (vRNPs). Each vRNP contains one of the 8 genomic negative-sense RNA segments (vRNAs), multiple copies of the major structural protein NP, and a few copies of the RNA-dependent RNA polymerase complex. All 8 vRNA species must be present in an infectious virion.
A virion attaches to the host cell membrane via HA and enters the cytoplasm by receptor-mediated endocytosis, thereby forming an endosome. A cellular trypsin-like enzyme cleaves HA into the products HA1 and HA2. HA2 promotes fusion of the virus envelope and the endosome membrane. The minor virus envelope protein M2 acts as an ion channel, thereby making the inside of the virion more acidic. As a result, the major envelope protein M1 dissociates from the nucleocapsid, and the vRNPs are translocated into the nucleus via interaction between NP and the cellular transport machinery. In the nucleus, the viral polymerase complexes transcribe and replicate the vRNAs. Newly synthesized mRNAs migrate to the cytoplasm, where they are translated. Posttranslational processing of HA, NA, and M2 includes transport via the Golgi apparatus to the cell membrane. NP, M1, NS1 (nonstructural regulatory protein), and NEP (nuclear export protein, a minor virion component) move to the nucleus, where they bind freshly synthesized copies of vRNAs. The newly formed nucleocapsids migrate into the cytoplasm in an NEP-dependent process and eventually interact via M1 with a region of the cell membrane where HA, NA, and M2 have been inserted. Then the newly synthesized virions bud from the infected cell. NA destroys the sialic acid moiety of cellular receptors, thereby releasing the progeny virions.

Patterns on the Indicator Matrix
The DNA of each organism of a given species is a long sequence of a specific large number of base pairs (bp). The size of the DNA might range from $10^5$ to $10^9$ base pairs. Each base pair is defined on the 4-element alphabet of nucleotides: let $\mathcal{A} = \{A, C, G, T\}$ be the finite alphabet of nucleotides and $x \in \mathcal{A}$ any member of the alphabet. A DNA sequence is the finite symbolic sequence $\mathcal{S} = \{x_h\}_{h=1,\ldots,N}$, $x_h \in \mathcal{A}$, where $x_h$ is the value $x$ at the position $h$.

Indicator Matrix
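The 2D indicator function, based on the 1D definition given in [12], is the map $u: \mathcal{S} \times \mathcal{S} \to \{0, 1\}$ such that

$$u(x_h, x_k) = \begin{cases} 1, & x_h = x_k, \\ 0, & x_h \neq x_k. \end{cases} \qquad (3.7)$$

According to (3.7), the indicator of an N-length sequence can be represented by the N × N sparse symmetric matrix of binary values {0, 1} given by the indicator matrix

$$u_{hk} \stackrel{\text{def}}{=} u(x_h, x_k), \qquad h, k = 1, \ldots, N. \qquad (3.8)$$

This square matrix can be plotted in 2 dimensions by putting a black dot where $u_{hk} = 1$ and a white spot where $u_{hk} = 0$ (Figure 1).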
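The indicator matrix (3.7)-(3.8) is straightforward to compute. The following minimal Python/NumPy sketch builds $u_{hk}$ for a nucleotide string by broadcasting an equality test. The function name `indicator_matrix` is illustrative only, and any symbol other than A, C, G, T (for example N) is simply treated as a fifth letter. The resulting 0/1 array can be passed to any image-plotting routine to obtain a picture like Figure 1.

```python
import numpy as np

def indicator_matrix(seq: str) -> np.ndarray:
    """Return the N x N binary matrix u[h, k] = 1 if seq[h] == seq[k], else 0."""
    # One byte per nucleotide; lowercase input is normalised first.
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    # Broadcasting compares every position h with every position k.
    return (codes[:, None] == codes[None, :]).astype(np.uint8)

u = indicator_matrix("ACGTTGCA")
density = u.mean()  # fraction of "black dots" in the plot
```

For a sequence with equal nucleotide frequencies the density is close to 1/4, which is why the plots contain more white than black.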

Indicator Matrix for the Influenza Virus A H1N1
The data under investigation refer to influenza virus A (H1N1) from Russia (Tver, Novosibirsk), Japan (Nagasaki), America (Mexico City, Rio Grande do Sul, Rio de Janeiro), and Turkey (Ankara).
The plots of the indicator matrix (Figure 1) show that (1) there are some motifs which are repeated at different scales, as in a fractal; (2) empty spaces are more widespread than filled spaces, in the sense that the matrix $u_{hk}$ is sparse (it has more zeros than ones); (3) there seem to be some square-like islands where black spots are more concentrated; (4) the indicator matrices can be grouped into different sets: (a) Tver, Novosibirsk; (b) Nagasaki, Rio Grande do Sul, Rio de Janeiro, Ankara; and (c) Mexico City.
From the analysis of Figure 1 we can also notice that, even for geographically distant places, the corresponding indicator matrices hardly change (Nagasaki, Rio de Janeiro, and Ankara are nearly the same).

Fractal Dimension
From the indicator matrix we can get an idea of the "fractal-like" distribution of nucleotides. The fractal dimension of the graphical representation of the indicator matrix can be computed from the average of the number p(n) of "1"s in randomly taken n × n minors of the N × N correlation (indicator) matrix $u_{hk}$. The fractal dimension of the influenza virus A (H1N1) is 2.30 ± 0.1, while that for the dog and Candida DNA was 1.66 ± 0.01 (see, e.g., [11]). Moreover, some interesting coinciding values can be observed in the following table, where the fractal dimension, to two decimal places, shows the same groups already seen in the matrix shapes of Figure 1.
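The explicit formula for this estimate is not given in this section. As a hedged illustration only, the sketch below adopts a mass-scaling reading of the description: p(n) is taken as the mean number of ones in randomly placed contiguous n × n blocks, and the dimension is the least-squares slope of log p(n) against log n. The function name `mass_dimension` and its parameters are hypothetical, and this is not necessarily the estimator that produced the values quoted above.

```python
import numpy as np

def mass_dimension(u: np.ndarray, sizes, samples: int = 200, seed: int = 0) -> float:
    """Slope of log p(n) versus log n, where p(n) is the mean number of ones
    in randomly placed contiguous n x n blocks of the binary matrix u."""
    rng = np.random.default_rng(seed)
    N = u.shape[0]
    log_n, log_p = [], []
    for n in sizes:
        # Random top-left corners such that the n x n block fits inside u.
        rows = rng.integers(0, N - n + 1, size=samples)
        cols = rng.integers(0, N - n + 1, size=samples)
        counts = [u[r:r + n, c:c + n].sum() for r, c in zip(rows, cols)]
        log_n.append(np.log(n))
        log_p.append(np.log(np.mean(counts)))  # requires p(n) > 0
    slope, _intercept = np.polyfit(log_n, log_p, 1)
    return float(slope)
```

A completely filled matrix gives p(n) = n² and hence a slope of exactly 2; block sizes should be chosen large enough that no p(n) is zero.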

Complex Representation
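The digital representation of a DNA sequence is defined as a map of $\mathcal{S}$ into $\mathbb{R}^\nu$, $\nu \geq 1$. The embedding space of the representation is based on 4 vectors in the real space $\mathbb{R}^\nu$, or almost equivalently in the complex space $\mathbb{C}^\nu$, so that $\mathbf{X}_x \equiv (X_x^1, \ldots, X_x^\nu)$ is a $\nu$-tuple associated with the symbol $x \in \mathcal{A}$.
It follows that the basic elements of the representation are the four vectors $\mathbf{X}_A$, $\mathbf{X}_C$, $\mathbf{X}_G$, $\mathbf{X}_T$. The digital representation in $\mathbb{R}^\nu$ (or $\mathbb{C}^\nu$) of an N-length DNA sequence is the map $\mathcal{R}: \mathcal{S} \to \mathbb{R}^\nu$, $x_h \mapsto \mathbf{Y}_h$, defined as follows. Each element of the DNA sequence can be considered [39] as the linear combination

$$\mathbf{Y}_h = u_A(x_h)\,\mathbf{X}_A + u_C(x_h)\,\mathbf{X}_C + u_G(x_h)\,\mathbf{X}_G + u_T(x_h)\,\mathbf{X}_T, \qquad (4.6)$$

where $u_x(x_h) = 1$ if $x_h = x$ and $u_x(x_h) = 0$ otherwise is the 1D (Voss) indicator. Let $\mathcal{G}$ be the graph of $\{\mathbf{Y}_h\}$. If we define the nucleotide counts

$$a_n = \sum_{h=1}^{n} u_A(x_h), \quad c_n = \sum_{h=1}^{n} u_C(x_h), \quad g_n = \sum_{h=1}^{n} u_G(x_h), \quad t_n = \sum_{h=1}^{n} u_T(x_h), \qquad (4.7)$$

then

$$a_n + c_n + g_n + t_n = n, \qquad (4.8)$$

so that, as a consequence of (4.6) and definition (4.7), the following identity holds:

$$\sum_{h=1}^{n} \mathbf{Y}_h = a_n \mathbf{X}_A + c_n \mathbf{X}_C + g_n \mathbf{X}_G + t_n \mathbf{X}_T.$$

We have a degeneracy (or a loop, circuit, or periodicity; see, e.g., [40]) if $\sum_{h=1}^{n} \mathbf{Y}_h = \mathbf{0}$ for some $n$ or, equivalently, if $a_n \mathbf{X}_A + c_n \mathbf{X}_C + g_n \mathbf{X}_G + t_n \mathbf{X}_T = \mathbf{0}$. If there is no degeneracy, then there is a one-to-one correspondence between the DNA sequence $\mathcal{S}$ and $\mathcal{G}$, that is, $\mathcal{S} \leftrightarrow \mathcal{G}$.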

Cardinal Complex Representation
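In the remaining part of the paper we consider the cardinal representation [11] in $\mathbb{C}^1$, so that the DNA digital representation is the N-length one-dimensional complex signal $\{Y_n\}_{n=0,\ldots,N-1}$. In this case, according to (4.6), each nucleotide is represented by one of the four cardinal directions $\pm 1$, $\pm i$ of the complex plane, so that the representation is a map $\mathcal{S} \to \mathbb{C}^1$ and the time series $Y_n$ is a sequence of complex numbers:

$$\{Y_n\}_{n=0,\ldots,N-1}, \qquad Y_n = \xi_n + \eta_n i. \qquad (4.14)$$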
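To make the cardinal representation concrete, the sketch below maps a nucleotide string to a complex signal. The specific assignment of A, C, G, T to the directions 1, i, -1, -i is not stated in this section, so the dictionary `CARDINAL` is a hypothetical choice (complementary bases point in opposite directions); only the structure, one unit complex number per nucleotide, follows the text.

```python
import numpy as np

# Hypothetical cardinal assignment (an assumption for illustration): each
# nucleotide is mapped to one of the four unit directions 1, i, -1, -i,
# with complementary bases pointing in opposite directions.
CARDINAL = {"A": 1 + 0j, "T": -1 + 0j, "G": 1j, "C": -1j}

def cardinal_representation(seq: str) -> np.ndarray:
    """Map a nucleotide string to the complex signal Y_n = xi_n + i * eta_n."""
    return np.array([CARDINAL[x] for x in seq.upper()], dtype=complex)

y = cardinal_representation("ACGT")
xi, eta = y.real, y.imag  # the two real time series in (4.14)
```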

DNA Walks
The DNA walk is defined as the series of cumulative sums of the DNA sequence representation:

$$Z_n = \sum_{k=0}^{n} Y_k, \qquad n = 0, \ldots, N-1.$$

Taking into account (4.7) and (4.11), for the complex cardinal representation the DNA walk is the complex-valued signal $\{Z_n\}_{n=0,\ldots,N-1}$, whose real and imaginary parts are linear combinations of the coefficients $a_n$, $c_n$, $g_n$, $t_n$ given by (4.7), which fulfil condition (4.8).
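Since the DNA walk is a cumulative sum, it can be sketched in a few lines. The example uses a hypothetical cardinal assignment (an assumption, not necessarily the one of [11]) and checks numerically that the walk equals the count-weighted sum of the basis values, in the spirit of the identity stated after (4.8); here the counts run over the first n + 1 symbols to match the 0-based index of $Z_n$.

```python
import numpy as np

# Hypothetical cardinal assignment (assumption for illustration only).
CARDINAL = {"A": 1 + 0j, "T": -1 + 0j, "G": 1j, "C": -1j}

def dna_walk(seq: str) -> np.ndarray:
    """Cumulative sum Z_n = Y_0 + ... + Y_n of the complex representation."""
    y = np.array([CARDINAL[x] for x in seq.upper()], dtype=complex)
    return np.cumsum(y)

def walk_from_counts(seq: str) -> np.ndarray:
    """The same walk rebuilt from running nucleotide counts:
    Z_n = a_n X_A + c_n X_C + g_n X_G + t_n X_T (counts over symbols 0..n)."""
    s = seq.upper()
    counts = {x: np.cumsum([ch == x for ch in s]) for x in "ACGT"}
    return sum(counts[x] * CARDINAL[x] for x in "ACGT")

z = dna_walk("ATGCGCATTA")
points = np.column_stack([z.real, z.imag])  # coordinates for a plot like Figure 2
```

With this assignment the segment ACGT brings the walk back to the origin, which is an example of the degeneracy (loop) discussed in Section 4.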
The DNA walk (DNA series) on a complex cardinal representation is a complex series as well. If we plot the points whose coordinates are the real and imaginary parts of each term of the DNA walk, we obtain a cluster showing some patches or some kind of self-similarity (Figures 2 and 3). Both figures for the influenza virus show a fractal behavior of the walk on the DNA sequence. Moreover, focusing on some segments of the DNA walks, it can be seen that there are some characteristic patterns (see, e.g., Figure 3, for the base pairs between 200 and 500).
Let us now compare the DNA walks with walks on pseudorandom and deterministic sequences.
A pseudorandom white-noise complex sequence (5.6), similar to the cardinal complex representation (4.11), can be defined in terms of random integers $r_n$, $s_n$. Its random walk is obtained by cumulative summation, as for the DNA walk, and a deterministic walk can be obtained in the same way from a deterministic sequence (Figure 4).
Figure 4 shows that the fractal shape of the DNA walk is completely different from that of the corresponding walks on random and deterministic sequences.
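The comparison walks can be generated in the same way as the DNA walk. Because (5.6) is only described in words here, the sketch makes two labeled assumptions: the pseudorandom sequence draws each term uniformly from the four cardinal values 1, i, -1, -i (a white-noise analogue of the representation), and the deterministic sequence is the periodic sequence $i^n$. Both function names are hypothetical.

```python
import numpy as np

DIRECTIONS = np.array([1, 1j, -1, -1j])  # the four cardinal values

def pseudorandom_cardinal(N: int, seed: int = 0) -> np.ndarray:
    """Hypothetical white-noise analogue: i.i.d. uniform picks among 1, i, -1, -i."""
    rng = np.random.default_rng(seed)
    return DIRECTIONS[rng.integers(0, 4, size=N)]

def deterministic_cardinal(N: int) -> np.ndarray:
    """Hypothetical deterministic analogue: the periodic sequence Y_n = i**n."""
    return DIRECTIONS[np.arange(N) % 4]

random_walk = np.cumsum(pseudorandom_cardinal(2000))
deterministic_walk = np.cumsum(deterministic_cardinal(2000))
```

The deterministic walk simply cycles through the four corners of a unit square, while the pseudorandom walk wanders like a Brownian path; neither shows the patchy structure of Figures 2 and 3.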

Statistical Correlations in DNA
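For a given sequence $\{Y_0, Y_1, \ldots, Y_{N-1}\}$ the variance is

$$\sigma^2 = \frac{1}{N}\sum_{n=0}^{N-1}\left|Y_n - \bar{Y}\right|^2, \qquad \bar{Y} = \frac{1}{N}\sum_{n=0}^{N-1} Y_n, \qquad (6.1)$$

together with the variance at the distance N − k (6.2). From the variance, the standard deviation immediately follows: $\sigma = \sqrt{\sigma^2}$ (6.3).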
A simplified definition of correlation on a fragment of the sequence has been given in [41] (6.4), in terms of the indicator (3.7).
The power spectrum can be computed as the Fourier transform of $c_k$:

$$S_k = \sum_{n=0}^{N-1} c_n\, e^{-2\pi i n k/N}.$$
If $c_k = 0$, there is no linear correlation; $c_k > 0$ indicates a (strong) linear correlation and $c_k < 0$ an anticorrelation, while $c_0 = 1$ does not give any information about correlations. A truly random process has a vanishing correlation, $c_h = \delta_{0h}$, and its power spectrum $S_h$ is constant. Its integral gives Brownian motion (a random walk), whose power spectrum is proportional to $1/k^2$.
It has been shown [24, 34, 42] that correlations in DNA are linear. However, the main problem with this measure is that it strongly depends on the representation and on the length of the sequence and, for nonbinary representations, it is affected by spurious results [36]. Moreover, definition (6.4) holds only for real values of the representation.
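Correlation and power spectrum can be illustrated numerically. The simplified correlation of [41] is not restated here, so the sketch substitutes a standard circular, mean-removed autocorrelation normalized to $c_0 = 1$ (computed through the FFT, by the Wiener-Khinchin relation), and then takes its discrete Fourier transform as in the formula for $S_k$. In line with the remark that the definition holds only for real values, the functions expect a real series; for the complex representation they would be applied to the real and imaginary parts separately.

```python
import numpy as np

def autocorrelation(y) -> np.ndarray:
    """Circular, mean-removed autocorrelation normalised so that c[0] == 1.
    (A constant series has zero variance and cannot be normalised.)"""
    d = np.asarray(y, dtype=float) - np.mean(y)
    # Wiener-Khinchin: the inverse FFT of |FFT|^2 is the circular autocovariance.
    acov = np.fft.ifft(np.abs(np.fft.fft(d)) ** 2).real / len(d)
    return acov / acov[0]

def power_spectrum(c: np.ndarray) -> np.ndarray:
    """S_k = sum_n c_n exp(-2*pi*i*n*k/N): the DFT of the correlation."""
    return np.fft.fft(c)
```

A delta correlation $c_h = \delta_{0h}$ yields a flat spectrum, as stated in the text for a truly random process.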

Mathematical Problems in Engineering
The power spectrum of the sequence $\{Y_n\}_{n=0,\ldots,N-1}$, that is, the mean square fluctuation, is defined as in [43].
The power spectrum of a stationary sequence gives an indirect measure of the autocorrelation. A long-range correlation can be detected if the fluctuations can be described by a power law of the distance, with exponent α > 1/2. The fluctuation exponent α characterizes a sequence as uncorrelated (α = 1/2), long-range correlated (α > 1/2), or anticorrelated (α < 1/2). For human DNA, a long-range correlation was observed [44] only for coding regions, with α = 0.61. However, the same value can also be seen for dog and Candida DNA [11] for the complete sequence (coding and noncoding regions), even if, by including the noncoding regions, this value is a little higher, being α ∼ 0.65 for the dog and α ∼ 0.62 for Candida DNA, respectively. For the influenza virus A, instead, α = 0.02 (Figure 5).
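The fluctuation exponent α can be estimated from a walk by a straight-line fit in log-log scale. The sketch assumes that the fluctuation of [43] is the root-mean-square fluctuation of the increments $Z_{n+\ell} - Z_n$ (the usual DNA-walk definition, taken here as an assumption) and fits $\log F(\ell)$ against $\log \ell$. For a walk with uncorrelated steps the slope should be close to 1/2. Function names are hypothetical.

```python
import numpy as np

def fluctuation(z: np.ndarray, ell: int) -> float:
    """Root-mean-square fluctuation F(ell) of the increments z[n+ell] - z[n]."""
    dz = z[ell:] - z[:-ell]
    # Variance of the increments; abs() also handles complex-valued walks.
    return float(np.sqrt(np.mean(np.abs(dz) ** 2) - np.abs(np.mean(dz)) ** 2))

def fluctuation_exponent(z: np.ndarray, lags) -> float:
    """Least-squares slope alpha of log F(ell) against log ell."""
    x = np.log(lags)
    y = np.log([fluctuation(z, ell) for ell in lags])
    return float(np.polyfit(x, y, 1)[0])

rng = np.random.default_rng(1)
walk = np.cumsum(rng.choice([1.0, -1.0], size=2**16))  # uncorrelated steps
alpha = fluctuation_exponent(walk, [4, 8, 16, 32, 64, 128])  # expected near 1/2
```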
When the power spectrum is a power-law function, it is scale invariant (like fractals), that is, $f(\lambda x) = \lambda^H f(x)$. It can be shown (see, e.g., [29]) that, for a power-law functional dependence, in the $N \to \infty$ limit the correlation function and the power spectrum are related by (6.11), with $k = 1 - a$ and $a$ related to the so-called Hurst exponent. In other words, for a power-law function the power spectrum is scale invariant (like a fractal).
In particular, from Equation (6.11) it follows that when $b = 1$ and $a = 0$, the correlation function decays slowly to zero and the spectrum is more properly called 1/f noise (or pink noise). This spectrum appears in many natural phenomena (noise in electronic devices, traffic flow, signals, radio antennas, turbulence).
The 1/k power spectrum has been observed in DNA sequences [30, 37]; however, it is not yet clear what form the correlation function should take (step function, power-law decay, white noise), and in particular whether there exists a single length scale or multiple length scales [29]. This scale invariance, or self-similarity, of DNA cannot yet be explained from a biological point of view. A possible explanation could be the dynamic process of evolution or, perhaps, the functional activity inside constrained domains (like the fractal shape of the brain, lungs, etc.), which might have some influence on the spatial geometry [38].
Long-range correlations might be explained biologically by the heterogeneity of DNA (i.e., the different density distribution of bases). The main open question concerns the power-law spectrum: $1/k$ [35] or $1/k^2$ [29]. Indeed, it has been observed that the power spectrum is nearly flat at low and high frequencies and has a power-law decay only in the central part.
However, according to [29, 30], the existence of long-range correlation in DNA should be understood in a statistical sense: base pairs that are far apart tend to have similar variation. In other words, this correlation should be understood as a periodic distribution of base pairs, without a causal law between base pairs located in segments far away from each other.

Complexity
The existence of repeating motifs, periodicity, and patchiness can be considered a simple behavior of a sequence, while nonrepetitiveness or singularity is taken as a characteristic feature of complexity. In order to measure the complexity of an n-length sequence, the index K has been proposed in [44]. By using a sliding n-window [44] over the full DNA sequence, one can visualize the distribution of complexity over partial fragments of the sequence. For the whole sequence, the asymptotic constant value is (Figure 6)

$$K \sim 1.3, \qquad (6.14)$$

which has also been observed for other DNA sequences [11]. Moreover, as can be seen from Figure 6, the Tver and Novosibirsk DNA initially show a lower complexity (see also Figure 1).
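The complexity index of [44] is not written out in this section. As a hedged illustration only, the sketch below uses one common measure of this kind, the per-symbol log-multinomial complexity (of the type introduced by Wootton and Federhen), $K = \frac{1}{n}\ln \frac{n!}{a_n!\,c_n!\,g_n!\,t_n!}$, evaluated on sliding n-windows. This choice is an assumption and may differ from the index used for Figure 6. Under this assumed form, with the natural logarithm, K cannot exceed $\ln 4 \approx 1.386$, the value approached by long windows with equal nucleotide frequencies.

```python
import math
from collections import Counter

def complexity(window: str) -> float:
    """Assumed index K = (1/n) * ln(n! / (a! c! g! t!)), via log-gamma."""
    n = len(window)
    counts = Counter(window.upper())
    log_multinomial = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts.values())
    return log_multinomial / n

def sliding_complexity(seq: str, n: int) -> list[float]:
    """Complexity of every length-n window of seq (step 1)."""
    return [complexity(seq[i:i + n]) for i in range(len(seq) - n + 1)]
```

A repetitive window such as AAAA has K = 0, whereas a balanced window has the largest value for its length.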

Wavelet Analysis
Wavelet analysis can be considered a good tool [19, 24, 29, 42] for studying the heterogeneity of a time series and, in particular, of a DNA sequence. Heterogeneity can be briefly described as follows: in some fragments of DNA there is a higher concentration of the nucleotides C, G, with a poor distribution of A, T, while, on the contrary, other fragments are richer in A, T and poorer in C, G (see Figure 1). Thus a fundamental problem is to partition a DNA sequence into homogeneous segments. This segmentation can be done by minimizing the variance or maximizing the entropy [36].
The wavelet transform expresses the signal in terms of translated and dilated instances of the wavelet basis functions. If we call $W f(x_0)$ the wavelet transform of the signal $f(x)$ computed at $x = x_0$ at the scale $2^{-n}$, and $h(x_0)$ the local Hölder exponent, then [24] $W f(x_0) \sim 2^{-n h(x_0)}$. Therefore, the wavelet transform is one of the most expedient tools for detecting singularities. It can be used to define a generalization of the box-counting method, the so-called wavelet transform modulus maxima method, in order to focus on scaling behavior [24] and to visualize the multifractal property.
In this section some fundamentals on Haar wavelet theory will be given and applied to the analysis of DNA sequences.

Haar Wavelet Basis
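The Haar scaling function $\varphi(x)$ is the characteristic function on $[0, 1)$; its family of translated and dilated scaling functions is defined as

$$\varphi^n_k(x) = 2^{n/2}\,\varphi(2^n x - k). \qquad (7.1)$$

The Haar wavelet family $\{\psi^n_k(x)\}_{n,k \in \mathbb{Z}}$, with

$$\psi^n_k(x) = 2^{n/2}\,\psi(2^n x - k), \qquad \psi(x) = \begin{cases} 1, & 0 \leq x < 1/2, \\ -1, & 1/2 \leq x < 1, \\ 0, & \text{elsewhere}, \end{cases}$$

is an orthonormal basis for the $L^2(\mathbb{R})$ functions [45].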

Discrete Haar Wavelet Transform
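Let $Y = \{Y_i\}_{i=0,\ldots,N-1}$, with $N = 2^M$, be a time series of values $Y_i = f(x_i)$ of a function $f$ on $K$ (a real field), sampled at the dyadic points $x_i = i/(2^M - 1)$ in the interval restricted, for convenience and without loss of generality, to $\Omega = [0, 1]$. The discrete Haar wavelet transform is the N × N matrix $W_N$ which maps $Y$ into the vector of its wavelet coefficients.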

Let the direct sum of matrices A, B be defined as

$$A \oplus B = \begin{pmatrix} A & \mathbf{0} \\ \mathbf{0} & B \end{pmatrix},$$

with $\mathbf{0}$ the matrix of zero elements. The N × N matrix $W_N$ can be computed by the recursive product [14, 15] of the direct sums of the following elementary matrices.
(3) Lattice matrices derived from the recursive inclusion formulas (see, e.g., [14, 15]). For example, with N = 4, M = 2, assuming the empty set $I_0 \stackrel{\text{def}}{=} \emptyset$ as the neutral term for the direct sum, so that $A \oplus I_0 = I_0 \oplus A = A$, the matrix $W_4$ follows from (7.5).

Haar Wavelet Coefficients and Statistical Parameters
From (7.3), with M = 2 and N = 4, the wavelet coefficients can be computed explicitly (see also [10, 11, 20]).
When the wavelet coefficients are given, the above equations can be solved with respect to the original data (for example, with M = 2, N = 4).
Thus the first wavelet coefficient α represents the average value of the sequence, and the other coefficients β represent finite differences. The wavelet coefficients β, also called detail coefficients, are strictly connected with the first-order properties of the discrete time series.
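For N = 2^M the coefficients can be obtained by repeated pairwise averaging and differencing. The sketch assumes the unnormalized convention suggested by the text, in which α is the mean of the sequence and each β is a halved difference (coarsest scale first); the normalization and ordering used in (7.3) may differ. The inverse recovers the data from the coefficients, e.g. $Y_0 = \alpha + \beta^0_0 + \beta^1_0$ for N = 4. Function names are hypothetical.

```python
import numpy as np

def haar_transform(y) -> np.ndarray:
    """Unnormalised discrete Haar transform of a real length-2**M vector.
    Returns [alpha, beta_0^0, beta_0^1, beta_1^1, ...]: alpha is the mean,
    the betas are halved differences, coarsest scale first."""
    a = np.asarray(y, dtype=float)
    details = []
    while len(a) > 1:
        even, odd = a[0::2], a[1::2]
        details.insert(0, (even - odd) / 2)  # finer scales end up last
        a = (even + odd) / 2                 # pairwise averages
    return np.concatenate([a] + details)

def inverse_haar(w) -> np.ndarray:
    """Rebuild the data from [alpha, betas...] by undoing each averaging step."""
    w = np.asarray(w, dtype=float)
    a, pos = w[:1], 1
    while pos < len(w):
        d = w[pos:pos + len(a)]
        out = np.empty(2 * len(a))
        out[0::2], out[1::2] = a + d, a - d
        a, pos = out, pos + len(d)
    return a

coeffs = haar_transform([4, 2, 5, 5])  # alpha = 4, beta_0^0 = -1, beta_0^1 = 1, beta_1^1 = 0
```

For complex data, such as the cardinal representation, the transform is applied separately to the real and imaginary parts.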

Hurst Exponent
Concerning the variance, its expression in terms of wavelet coefficients follows from definition (6.1) by a direct computation.
It has been observed [25] that, for scale-invariant functions, the variance $\sigma_n^2$ (the square of the standard deviation (6.3)) at the scale n is $\sigma_n^2 = 2^{n(H-1)}\sigma_0^2$ (7.17), with H being the Hurst exponent, so that in a log-log plot

$$\log_2 \sigma_n^2 = n(H-1) + \log_2 \sigma_0^2 \qquad (7.18)$$

we obtain a straight line whose slope, H − 1, gives an estimate of H. The Hurst exponent, in terms of wavelet coefficients, can be evaluated by the following result [11].
Theorem 7.1.The Hurst exponent is given by
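Independently of the closed form in Theorem 7.1, the straight-line estimate based on (7.18) can be sketched as follows: given variances $\sigma_n^2$ at the scales n = 0, 1, 2, ..., fit $\log_2 \sigma_n^2$ against n and take H = 1 + slope. How the variances are obtained at each scale is left open, and the function name is hypothetical; the synthetic check simply builds variances that satisfy the scaling law exactly.

```python
import numpy as np

def hurst_from_variances(variances) -> float:
    """Estimate H from sigma_n^2 at scales n = 0, 1, 2, ..., using
    log2(sigma_n^2) = n * (H - 1) + log2(sigma_0^2), so H = 1 + slope."""
    n = np.arange(len(variances))
    slope, _intercept = np.polyfit(n, np.log2(variances), 1)
    return float(1.0 + slope)

# Synthetic check: variances built exactly from the scaling law with H = 0.7.
synthetic = [3.0 * 2 ** (n * (0.7 - 1)) for n in range(6)]
H = hurst_from_variances(synthetic)
```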

Algorithm of the Short Haar Discrete Wavelet Transform
In order to reduce the computational complexity of the wavelet transform (7.3), (7.5), the sequence Y can be sliced into subsequences of equal length, and the wavelet transform is applied to each slice. With the reduced Haar transform [10, 11, 20] it is possible to reduce the number of basis functions and the computational complexity.

For example, the reduced wavelet transform $W_{4,2}$ can be compared with $W_8$.
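The short transform amounts to slicing the sequence and applying a small Haar matrix to every slice. The sketch below assumes the unnormalized averaging-and-differencing convention and reads $W_{4,2}$ as $W_4$ applied to each of two slices of an 8-point vector, i.e. the direct sum $W_4 \oplus W_4$; this reading and the function names are assumptions. With dense matrix products the total cost is about N p operations instead of N² for the full $W_N$.

```python
import numpy as np

def haar_matrix(p: int) -> np.ndarray:
    """p x p matrix (p a power of 2) of the unnormalised Haar transform:
    first row = mean, remaining rows = halved differences, coarsest first."""
    if p == 1:
        return np.ones((1, 1))
    avg = np.kron(np.eye(p // 2), [[0.5, 0.5]])    # pairwise averages
    diff = np.kron(np.eye(p // 2), [[0.5, -0.5]])  # halved pairwise differences
    # Recurse on the averages; the finest details go to the bottom rows.
    return np.vstack([haar_matrix(p // 2) @ avg, diff])

def short_haar(y, p: int) -> np.ndarray:
    """Apply the p x p Haar matrix to each consecutive length-p slice of y.
    Returns shape (len(y) // p, p): one row of coefficients per slice."""
    y = np.asarray(y, dtype=float)
    segments = y[: len(y) - len(y) % p].reshape(-1, p)  # drop an incomplete tail
    return segments @ haar_matrix(p).T

W_4_2 = np.kron(np.eye(2), haar_matrix(4))  # block-diagonal W_4 (+) W_4
```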

Clusters of Wavelet Coefficients
Significant information on a time series can be derived not only from the wavelet coefficients but also from clusters of wavelet coefficients. For the $N = 2^M$-length real vector $Y$, the wavelet transform $W_N Y$ represents a point in the N-dimensional Euclidean space $\mathbb{R}^N$. For the $N = 2^M$-length complex vector $Y$, the wavelet transform is applied to the real part, $W_N \Re(Y)$, and to the imaginary part, $W_N \Im(Y)$, and gives either one point in $\mathbb{R}^{2N}$ or a cluster of N points in the product of 2-dimensional spaces $\mathbb{R}^2_0 \times \mathbb{R}^2_1 \times \cdots \times \mathbb{R}^2_{N-1}$, each point pairing a wavelet coefficient of $\Re(Y)$ with the corresponding starred coefficient, where the star denotes the wavelet coefficients of $\Im(Y)$. In each 2-dimensional phase space $\mathbb{R}^2_i$ there is only one point, and these single points do not give any significant information about the existence of some autocorrelation of the data. By using, instead, the p-parameter short Haar wavelet transform, we can analyse, in each phase plane, the cluster of points obtained from all the segments. For a complex sequence $\{Y_k\}_{k=0,\ldots,N-1} = \{x_k + i y_k\}_{k=0,\ldots,N-1}$ we can consider the correlations (if any) between the wavelet coefficients of the real part $\{x_k\}_{k=0,\ldots,N-1}$ and those of the imaginary part $\{y_k\}_{k=0,\ldots,N-1}$. This can be realized by the cluster algorithm of Table 1.
This algorithm enables us to construct clusters of wavelet coefficients and to study the correlation between the real and imaginary coefficients of the DNA representation and DNA walk, as given in the following section.
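Table 1 itself is not included in this section, so the following is only a plausible sketch of the clustering step described in the text, under the assumption that clusters are formed by pairing, segment by segment, each short-Haar coefficient of the real part with the same coefficient of the imaginary part. The function names are hypothetical and the Haar convention is the unnormalized one (mean first).

```python
import numpy as np

def haar_rows(segments: np.ndarray) -> np.ndarray:
    """Row-wise unnormalised Haar transform (mean first, finest details last)."""
    a, details = segments.astype(float), []
    while a.shape[1] > 1:
        even, odd = a[:, 0::2], a[:, 1::2]
        details.insert(0, (even - odd) / 2)
        a = (even + odd) / 2
    return np.hstack([a] + details)

def wavelet_clusters(y, p: int = 4) -> np.ndarray:
    """Pair the short-Haar coefficients of Re(y) and Im(y).
    Returns shape (p, S, 2): for each coefficient index j, a cluster of
    S points (coefficient of Re y, coefficient of Im y), one per segment."""
    y = np.asarray(y, dtype=complex)
    seg = y[: len(y) - len(y) % p].reshape(-1, p)
    re, im = haar_rows(seg.real), haar_rows(seg.imag)
    return np.stack([re.T, im.T], axis=-1)

clusters = wavelet_clusters([1, -1j, 1j, -1, 1, 1, 1, 1], p=4)
```

Each `clusters[j]` can then be scatter-plotted in its own phase plane to obtain pictures of the kind shown in Figures 7 and 8.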

Cluster Analysis of the Wavelet Coefficients of the Complex DNA Representation
The cluster algorithm of Table 1, applied to the complex representation sequence (4.11), which has the form $Y_n = \xi_n + \eta_n i$, shows that the values of the wavelet coefficients belong to some discrete finite sets (Figure 7). For each complex DNA representation there are 2 sets of wavelet coefficients, which correspond to the real and imaginary parts of the complex values of (5.1) and (5.4). Moreover, even if the real and imaginary parts of the DNA walk show some nonlinear patterns (Figures 2 and 3), the detail coefficients range over some discrete sets of values. It can be seen by direct computation that the jumps from one value to another also belong to some discrete sets (see, e.g., Figure 7).
If we compare the clusters of Figure 7 with the clusters (Figure 8) of the pseudorandom sequence (5.6), which is similar to the above sequence, we can see that the set of wavelet coefficients is larger (though still discrete) than the set for the DNA, although the detail coefficients take more or less the same values.
As can be seen from Figure 7, the real and imaginary coefficients of the complex DNA representation increase according to a given law, and the distribution of the nucleotides must follow this rule. Moreover, it should be noticed that all wavelet coefficients are distributed on symmetric grids (Figure 7). Even if the DNA representation looks like the pseudorandom sequence (5.6), the wavelet detail coefficients are quantized and symmetrically distributed, in the sense that the detail coefficients of both the representation and the DNA walks (see [11]) take discrete finite values (Figure 7). This is not true for the pseudorandom series: although the wavelet coefficients of the sequence are still quantized (Figure 8), the wavelet coefficients of the corresponding random walk are randomly distributed in the phase plane (Figure 10). It is also very interesting to compare the DNA walk (Figure 9) with the random walk (Figure 10) and with the walk on a deterministic sequence: the DNA walk shows a clear symmetry which is missing in the others.
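One simple mechanism behind the discrete, symmetric sets of detail coefficients of the representation can be checked exhaustively. Assuming the four cardinal values 1, i, -1, -i and the unnormalized 4-point Haar transform (both assumptions of this sketch), the loop enumerates all 256 possible segments and collects the detail coefficients of their real and imaginary parts. They form the finite grid of multiples of 1/4 between -1 and 1, symmetric about zero. This illustrates why the representation's coefficients are quantized; it does not by itself explain the particular patterns of Figure 7.

```python
import itertools
import numpy as np

def haar4(v) -> np.ndarray:
    """Unnormalised 4-point Haar transform: [mean, beta_0^0, beta_0^1, beta_1^1]."""
    y0, y1, y2, y3 = v
    return np.array([(y0 + y1 + y2 + y3) / 4, (y0 + y1 - y2 - y3) / 4,
                     (y0 - y1) / 2, (y2 - y3) / 2])

# Every length-4 segment over the four cardinal values 1, i, -1, -i.
detail_values = set()
for seg in itertools.product([1, 1j, -1, -1j], repeat=4):
    arr = np.array(seg, dtype=complex)
    for part in (arr.real, arr.imag):
        # Keep only the detail coefficients (skip the mean alpha).
        detail_values.update(float(b) for b in haar4(part)[1:])
```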

Conclusion
In this paper some fractal shapes and symmetries in DNA sequences and DNA walks have been shown and compared with random and deterministic complex series. DNA sequences are structured in such a way that some fractal behavior can be observed both in the correlation (indicator) matrix and in the DNA walks. Wavelet analysis confirms, through the symmetric clustering of wavelet coefficients, the existence of scale symmetries.

Figure 2: DNA walk of the influenza virus A.

Figure 5: Power spectrum for influenza virus A DNA walk.

Figure 6: Complexity for the first 100 base pairs of influenza virus A DNA.