A Class of New Metrics for n-Dimensional Unit Hypercube

Introduction
As early as 1971, Zadeh introduced a geometric interpretation of fuzzy sets by stating that they can be represented as points in the unit hypercube [1]. Many years later, his idea was taken up by Kosko, who built a promising fuzzy-theoretical framework and geometry thereon [2, 3]. This geometry of fuzzy sets was used in [4] to develop the fuzzy polynucleotide space, in which a polynucleotide molecule is represented as a point in an n-dimensional unit hypercube. This approach enables quantitative studies such as the measurement of distances, similarities, and dissimilarities between polynucleotide sequences. The n-dimensional unit hypercube equipped with a metric $d$ is named the fuzzy polynucleotide space $(I^n, d)$, with $I = [0, 1] \subset \mathbb{R}$, and it is a metric space. Torres and Nieto [5, 6] considered the frequencies of the nucleotides at the three base sites of a codon in the coding sequence as fuzzy sets to give an example on $I^{12}$. Later, Dress, Lokot, and Pustyl'nikov pointed out that the metric is under the $L_1$-norm and showed the metric properties [7].
Because the fuzzy sets that come from polynucleotide molecules reflect the information of those sequences, we may introduce related concepts from information theory to measure the differences between polynucleotide sequences. In information theory, the relative entropy (also called the Kullback-Leibler divergence) is the most common measure of the difference between two probability distributions. But it is not a metric, because it satisfies neither symmetry nor the triangle inequality [8]. Many studies [9–15] have sought to improve the relative entropy. Among them, the Jensen-Shannon divergence has received much attention as an improvement of the relative entropy. In this paper, a class of new metrics inspired by the Jensen-Shannon divergence is introduced in the n-dimensional unit hypercube. These metrics, which carry the information-theoretical property of the logarithm, can replace the former metric $d$ in the fuzzy polynucleotide space.

Preliminaries
Let $X = \{x_1, x_2, \ldots, x_n\}$ be a fixed set; a fuzzy set in $X$ is defined by
$$A = \{(x, \mu_A(x)) \mid x \in X\},$$
where
$$\mu_A : X \to [0, 1].$$
The number $\mu_A(x)$ denotes the membership degree of the element $x$ in the fuzzy set $A$. We can also use the unit hypercube $I^n = [0, 1]^n$ to describe all the fuzzy sets in $X$, because a fuzzy set $A$ determines a point $P = (\mu_A(x_1), \mu_A(x_2), \ldots, \mu_A(x_n)) \in I^n$. With the metric $d$ defined, the fuzzy polynucleotide space is constructed.
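Let $Y$ be a discrete random variable with alphabet $\mathcal{Y} = \{y_1, y_2, \ldots, y_n\}$, and let $P$, $Q$ be two probability distributions of $Y$. Then, the relative entropy between $P$ and $Q$ is defined as
$$D(P \,\|\, Q) = \sum_{y \in \mathcal{Y}} P(y) \ln \frac{P(y)}{Q(y)}.$$
Here, $\ln$ denotes the natural logarithm for convenience. Furthermore, the Jensen-Shannon divergence is defined by
$$JD(P, Q) = \frac{1}{2} D(P \,\|\, M) + \frac{1}{2} D(Q \,\|\, M),$$
where $M = (1/2)(P + Q)$.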
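The two definitions above can be illustrated with a minimal Python sketch. The function names `relative_entropy` and `jensen_shannon` and the sample distributions are illustrative choices, not taken from the paper. The sketch assumes the usual conventions $0 \ln 0 = 0$ and $D(P \,\|\, Q) = \infty$ when $Q(y) = 0 < P(y)$, and the common normalization of the Jensen-Shannon divergence with the factor 1/2.

```python
import math

def relative_entropy(p, q):
    """Relative entropy D(P||Q) with natural logarithm.

    Assumes the convention 0*ln(0) = 0 and returns infinity when
    q has a zero where p is positive.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * ln(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def jensen_shannon(p, q):
    """Jensen-Shannon divergence with M = (P + Q) / 2 (assumed 1/2 weighting)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)

# Hypothetical sample distributions on a three-letter alphabet.
P = [0.5, 0.5, 0.0]
Q = [0.25, 0.25, 0.5]

print(relative_entropy(P, Q), relative_entropy(Q, P))  # the two directions differ
print(jensen_shannon(P, Q), jensen_shannon(Q, P))      # the two directions agree
```

The example shows the asymmetry of the relative entropy directly: one direction is finite and the other is infinite, while the Jensen-Shannon divergence is the same in both directions.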
The Jensen-Shannon divergence is obviously nonnegative and symmetric and vanishes for $P = Q$, but it does not fulfill the triangle inequality. Moreover, a point in the n-dimensional hypercube $I^n$ is not, in general, a probability distribution. In view of the foregoing, the concept of the Jensen-Shannon divergence should be generalized. If $P$, $Q$ are two points in $I^n$, the function $FD_\alpha(P, Q) = (FD(P, Q))^\alpha$ is studied, where $FD(P, Q)$ denotes the extension of the Jensen-Shannon divergence to points of $I^n$ and $\alpha \in \mathbb{R}$. In the following sections, we discuss this function for all $\alpha \in \mathbb{R}$ and obtain the class of new metrics.
For all $\alpha \in \mathbb{R}$, we ask whether the function $(FD(P, Q))^\alpha$ can be a metric on the space $I^n$.
Lemma 2. $FD(P, Q) \geq 0$, with equality only for $P = Q$.
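As a hedged illustration of the generalized function, the sketch below assumes that the Jensen-Shannon expression is applied coordinate-wise to points of the unit hypercube, $FD(P, Q) = \sum_i \left[ p_i \ln \frac{2 p_i}{p_i + q_i} + q_i \ln \frac{2 q_i}{p_i + q_i} \right]$, and that $FD_\alpha = FD^\alpha$. The exact normalization constant is not preserved in the text, so the constant used here is an assumption; a positive constant factor does not affect whether the function is a metric. The names `fd` and `fd_alpha` are hypothetical.

```python
import math

def fd(p, q):
    """Assumed coordinate-wise Jensen-Shannon-type sum on [0, 1]^n.

    Uses the convention 0*ln(0) = 0; the normalization constant is an
    assumption, not taken from the paper.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        s = pi + qi
        if pi > 0.0:
            total += pi * math.log(2.0 * pi / s)
        if qi > 0.0:
            total += qi * math.log(2.0 * qi / s)
    return total

def fd_alpha(p, q, alpha):
    """FD raised to the power alpha (intended here for alpha > 0)."""
    # Clamp tiny negative rounding errors so the power stays real.
    return max(fd(p, q), 0.0) ** alpha

# Two hypothetical points of the unit square.
P = [1.0, 0.0]
Q = [0.0, 1.0]
print(fd(P, Q))             # equals 2*ln(2) under the assumed form
print(fd_alpha(P, Q, 0.5))  # its square root
print(fd_alpha(P, P, 0.5))  # 0.0 for identical points
```

Note that the points need not sum to 1, which is the reason the divergence has to be generalized beyond probability distributions.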
Proof.From (7) Using the standard inequality we find The equality holds if and only if  = .So   () ≥ 0,  is convex function, and the function () gets the minimum 0 when  =  for   () = 0.
Thus, the lemma holds.
To sum up the theorems and corollary above, we can obtain the main theorem.

Metric Property of $FD_\alpha(P, Q)$
In this section, we mainly prove the following theorem.
Theorem 11. The function $FD_\alpha(P, Q)$ is a metric on the space $I^n$ if and only if $0 < \alpha \leq 1/2$.
From what has been discussed above, the conclusion in the theorem is obtained.
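The statement of the theorem can be probed numerically. The following sketch is not the paper's proof; it only checks the triangle inequality on one hand-picked triple in $I^1$ and on random triples in $I^3$. It assumes the coordinate-wise form of $FD$ described in the docstring (normalization constant assumed), and the names `fd`, `fd_alpha`, and `triangle_holds` are hypothetical.

```python
import math
import random

def fd(p, q):
    """Assumed form: sum_i p_i ln(2p_i/(p_i+q_i)) + q_i ln(2q_i/(p_i+q_i))."""
    total = 0.0
    for pi, qi in zip(p, q):
        s = pi + qi
        if pi > 0.0:
            total += pi * math.log(2.0 * pi / s)
        if qi > 0.0:
            total += qi * math.log(2.0 * qi / s)
    return total

def fd_alpha(p, q, alpha):
    # Clamp rounding noise below zero before taking the power.
    return max(fd(p, q), 0.0) ** alpha

def triangle_holds(p, q, r, alpha, tol=1e-12):
    """Check FD_alpha(p, q) <= FD_alpha(p, r) + FD_alpha(r, q)."""
    return fd_alpha(p, q, alpha) <= fd_alpha(p, r, alpha) + fd_alpha(r, q, alpha) + tol

# Hand-picked triple in I^1: the midpoint between the two endpoints.
P, Q, R = [1.0], [0.0], [0.5]
print(triangle_holds(P, Q, R, 1.0))  # False: the inequality fails for alpha = 1
print(triangle_holds(P, Q, R, 0.5))  # True: it holds for alpha = 1/2

# Random triples in I^3 for alpha = 1/2 (a sanity check, not a proof).
random.seed(0)
triples = [[[random.random() for _ in range(3)] for _ in range(3)]
           for _ in range(2000)]
all_ok = all(triangle_holds(p, q, r, 0.5) for p, q, r in triples)
print(all_ok)
```

The hand-picked triple already separates the two regimes: the midpoint makes the detour strictly shorter than the direct distance when $\alpha = 1$, but not when $\alpha = 1/2$.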

Comparison between $FD_{1/2}$ and $d$
Following [5, 6], we focus on the RNA alphabet {U, C, A, G}. Code U as (1, 0, 0, 0): the 1 shows that the first letter U is present, and the three 0s show that the second letter C, the third letter A, and the fourth letter G do not appear.
Likewise, C is represented as (0, 1, 0, 0), A as (0, 0, 1, 0), and G as (0, 0, 0, 1). So any codon corresponds to a fuzzy set, that is, to a point in the 12-dimensional fuzzy polynucleotide space $I^{12}$. For example, the codon CGU would be recorded as
$$(0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0) \in I^{12}. \quad (49)$$
However, there exist cases in which there is not sufficient knowledge about the chemical structure of a particular sequence. One may therefore deal with base sequences that are not necessarily at a corner of the hypercube, so that some components of the fuzzy set are neither 0 nor 1. For example,
$$(0.3, 0.4, 0.1, 0.2, 0, 1, 0, 0, 0, 0, 0, 1) \in I^{12} \quad (50)$$
expresses a codon XCG. In this case, the first letter X is unknown and corresponds to U to an extent of 0.3, C to an extent of 0.4, A to an extent of 0.1, and G to an extent of 0.2.
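The encoding described above can be sketched in a few lines of Python. The helper names `encode_site` and `encode_codon` are hypothetical; the base order U, C, A, G follows the text, and an uncertain site is passed as a dictionary of membership degrees.

```python
BASES = "UCAG"  # coordinate order used in the text

def encode_site(site):
    """Encode one base site as four membership degrees.

    A known base is a one-letter string; an uncertain base is a dict
    mapping letters to membership degrees in [0, 1].
    """
    if isinstance(site, str):
        if site not in BASES or len(site) != 1:
            raise ValueError(f"unknown base: {site!r}")
        return [1.0 if b == site else 0.0 for b in BASES]
    return [float(site.get(b, 0.0)) for b in BASES]

def encode_codon(sites):
    """Encode three base sites as a point of the 12-dimensional hypercube."""
    if len(sites) != 3:
        raise ValueError("a codon has exactly three base sites")
    point = []
    for site in sites:
        point.extend(encode_site(site))
    return point

cgu = encode_codon("CGU")
xcg = encode_codon([{"U": 0.3, "C": 0.4, "A": 0.1, "G": 0.2}, "C", "G"])
print(cgu)  # the corner point in (49)
print(xcg)  # the interior point in (50)
```

A string such as `"CGU"` works directly because iterating over it yields its three letters; mixed input uses a list so that a dictionary can stand in for the unknown site.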
It is easy to obtain that $FD_\alpha$ is closer to 1 and also larger than $d$ when $0 < \alpha < 1/2$.
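A comparison of this kind can be reproduced on small examples. The sketch below rests on two assumptions that the text does not spell out: the former metric is taken to be $d(p, q) = \sum_i |p_i - q_i| / \sum_i \max\{p_i, q_i\}$, the form used for the fuzzy polynucleotide space in [5, 6], and $FD$ is taken to be the coordinate-wise Jensen-Shannon-type sum with an assumed normalization constant. The numbers it prints therefore illustrate the procedure and need not match the paper's tables. All function names are hypothetical.

```python
import math

BASES = "UCAG"

def encode_codon(codon):
    """One-hot encode a three-letter RNA codon as a point of [0, 1]^12."""
    return [1.0 if b == letter else 0.0 for letter in codon for b in BASES]

def d_metric(p, q):
    """Assumed former metric: sum |p_i - q_i| / sum max(p_i, q_i)."""
    den = sum(max(a, b) for a, b in zip(p, q))
    if den == 0.0:
        return 0.0  # both points are the origin
    return sum(abs(a - b) for a, b in zip(p, q)) / den

def fd(p, q):
    """Assumed coordinate-wise Jensen-Shannon-type sum (0*ln 0 = 0)."""
    total = 0.0
    for a, b in zip(p, q):
        s = a + b
        if a > 0.0:
            total += a * math.log(2.0 * a / s)
        if b > 0.0:
            total += b * math.log(2.0 * b / s)
    return total

def fd_alpha(p, q, alpha=0.5):
    return max(fd(p, q), 0.0) ** alpha

cgu, cgc, aaa = (encode_codon(c) for c in ("CGU", "CGC", "AAA"))
for name, other in (("CGC", cgc), ("AAA", aaa)):
    print("CGU vs", name, d_metric(cgu, other), fd_alpha(cgu, other))
```

Codons that differ at one base site differ in two coordinates, and codons that differ at all three sites differ in six, which is what drives both distances in these corner cases.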

Concluding Remarks
From the discussion in the above sections, we come to the main conclusion: when $P$, $Q$ are two points in the n-dimensional unit hypercube $I^n$, $FD_\alpha(P, Q)$ is a metric if and only if $0 < \alpha \leq 1/2$.
In Section 4, the method in case (iii) can also be used to prove that the triangle inequality (36) does not hold in case (ii). But the method in case (ii) is intuitive, and it yields one determinate point rather than mere existence. So we adopt the method in case (ii) when $\alpha \geq 1$.
In this paper, we extend the method in [2, 4–6] to discuss the new fuzzy polynucleotide space. By considering all the possible values of the parameter $\alpha$, we obtain the new class of metrics in the space. Finally, we numerically compare the new metrics $FD_\alpha$ to the former metric $d$ by computing some basic examples of codons. This shows the improvement is comprehensive.
With $0 < \alpha \leq 1/2$, the metric space $(I^n, FD_\alpha)$ can also be studied in the future using the theory of metric spaces, such as the Pythagoras theorem, the isometric property, the isomorphism property, and the limit property. We think the new metrics can capture more of the biological significance of polynucleotide sequences and be useful in bioinformatics.