Communication P Systems on Simplicial Complexes with Applications in Cluster Analysis

The purpose of this paper is to propose a new kind of P system on simplicial complexes. We present the basic discrete Morse structure, membrane structures on complexes, and communication rules. A new grid-based clustering technique based on these P systems is described. Examples are given to show the effect of the algorithm. The new P systems provide an alternative to traditional membrane computing.


Introduction
Membrane computing is a branch of natural computing initiated by Păun et al. at the end of 1998 as an attempt to formulate models from the functioning of living cells [1], just as DNA computing comes from genes [2-4]. The advantage of these methods lies in their huge inherent parallelism, which has drawn great attention from the scientific community. In recent years, many different models of P systems have been proposed, such as cell-like P systems, tissue-like P systems, and spiking neural P systems [5-9]. The resulting computing systems are powerful enough to be equivalent to Turing machines [10], even when using restricted combinations of features, and they are also computationally efficient. A number of applications have been reported in areas such as biology, biomedicine, linguistics, computer graphics, economics, approximate optimization, and cryptography.
Traditionally, Morse theory is a subject of differential topology and differential geometry, where the topological spaces in question are smooth manifolds. To study discrete problems, one uses combinatorial complexes rather than manifolds. Along this line, discrete Morse theory has been developed [11, 12]. Recently, discrete Morse theory has attracted many researchers because it has found applications in triangulations and graphics. In fact, the simplicial complex, the basic data structure in discrete Morse theory, will prove to be an important data structure besides trees and graphs.
Spatial cluster analysis is a traditional problem in knowledge discovery from databases [13]. It has wide applications, since increasingly large amounts of data obtained from satellite images, X-ray crystallography, or other automatic equipment are stored in spatial databases. The most classical spatial clustering technique is due to Han and Kamber [13], who developed a variant of the PAM algorithm called CLARANS, while new techniques are continuously proposed in the literature, aiming to reduce the time complexity or to fit more complicated cluster shapes. Other clustering-like problems include impulsive cluster anticonsensus of discrete multiagent linear dynamic systems [14] and driving general complex networks into prescribed cluster synchronization patterns by using pinning control [15]. In another study, the authors introduce a cooperative artificial bee colony algorithm for solving clustering problems [16]. A DNA-based clustering method using the Adleman-Lipton model has also been proposed [17]. For related research, one can also refer to [18-20].

In medical analysis, data clustering problems of various types often appear. The Wisconsin Breast Cancer Showhouse was founded in 1998 as an all-volunteer 501(c)(3) charitable organization by Nance Kinney, a breast cancer survivor (http://www.breastcancershowhouse.org/wbcs2012/index.html). Its mission is to support breast cancer and prostate cancer research at the Medical College of Wisconsin. The breast cancer database was obtained from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison. Many computing methods have been applied to this data set and, in fact, the Wisconsin Breast Cancer data set is becoming an important testing benchmark for soft computing.
Inspired by the above research, this paper focuses on the joint study of discrete Morse theory and membrane computing. Our purpose is to propose a P system on a simplicial complex. To our knowledge, this is the first paper to extend membrane computing to complexes. We then use membrane computing in cluster analysis, providing a new approach to data mining. We first propose a discrete Morse structure as a candidate for a class of new P systems. Then we describe, for the first time, a communication P system on simplices. Next we propose a new method for cluster analysis by simplicial P systems. Finally, we present the Wisconsin Breast Cancer analysis.

Discrete Morse Structure
In this section we present some general discrete models which will form the basis of membrane structures. The main idea comes from [11, 21]. To do this, we need some basic topological concepts. For simplicity we always assume that we are working in a Euclidean space $\mathbb{R}^n$.

Simplex without Orientation
A $k$-simplex cell $\sigma$ is the convex hull of $k+1$ affinely independent points. More precisely, suppose $a_0, a_1, \dots, a_k$ are affinely independent, that is, $a_1 - a_0, \dots, a_k - a_0$ are linearly independent. Then $\sigma$ is defined as the set of points of the form $x = \sum_{i=0}^{k} \lambda_i a_i$, where $\sum_{i=0}^{k} \lambda_i = 1$ and $\lambda_0, \lambda_1, \dots, \lambda_k \ge 0$. We call $k$ the dimension of the simplex and write $\dim \sigma = k$, while $a_0, a_1, \dots, a_k$ are called the vertices of the simplex. A simplex is uniquely determined by its vertices and hence is expressed as $\sigma = [a_0, a_1, \dots, a_k]$, or simply $\sigma = a_0 a_1 \cdots a_k$, and will be called a cell in this paper.
A face $\tau$ of a simplex $\sigma$ is defined as a simplex generated by a nonempty subset of its vertices. We write $\tau < \sigma$. A face $\tau$ is called a hyperface of $\sigma$ if $\dim \tau = \dim \sigma - 1$, denoted by $\tau \prec \sigma$. In this case, $\sigma$ is called the parent of $\tau$. Two cells $\tau_1$ and $\tau_2$ are called incident if $\tau_1, \tau_2 \prec \sigma$, and $\sigma$ is called the coface of $\tau_1$ and $\tau_2$. Two cells $\sigma_1, \sigma_2$ are called neighbors if they share a common hyperface. The cone from a vertex $x$ to a $k$-simplex $\sigma$ is the convex hull of $x$ and $\sigma$, which yields a $(k+1)$-simplex $x\sigma$ provided $x$ is not an affine combination of the vertices of $\sigma$.

A simplicial complex $K$ is a finite collection of nonempty simplices for which $\sigma \in K$ and $\tau \prec \sigma$ imply $\tau \in K$, and $\sigma_1, \sigma_2 \in K$ implies that $\sigma_1 \cap \sigma_2$ is either empty or a face of both. The underlying space of $K$ is the union of its simplices, $|K| = \bigcup_{\sigma \in K} \sigma$. $K_p$ is the subset of $K$ containing the simplices of dimension $p$.
For a $k$-simplex $\sigma$, define $K$ to be the collection of $\sigma$ and all its faces. Then it is clear that $K$ is a simplicial complex. We will call this simplicial complex a simple complex or, simply, a complex.

Now we consider some properties of the incidence and neighborhood relations described above. First suppose $\tau_1, \tau_2 \prec \sigma$ are incident and $\dim \sigma \ge 2$. Then $\tau_1, \tau_2, \sigma$ are faces of a simplex. Define $\psi = \tau_1 \cap \tau_2$; we will show that $\psi \ne \emptyset$. This is evident because if [...] By removing one vertex we obtain $\tau_1$ and $\tau_2$, each with $p$ vertices remaining. Since $p \ge 2$, there exists at least one common vertex of $\tau_1$ and $\tau_2$. By the definition of $K$ we get $\psi \in K$ and consequently $\tau_1, \tau_2$ are neighbors.
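The face and hyperface relations above can be checked on a small case. The following is a minimal sketch, assuming cells are represented as `frozenset`s of vertex labels (the names `simple_complex` and `hyperfaces` are illustrative and do not come from the paper). It builds the simple complex of a tetrahedron and confirms that two incident hyperfaces share a common hyperface, so they are neighbors.

```python
from itertools import combinations

def simple_complex(vertices):
    """All nonempty faces of the simplex spanned by `vertices`, as frozensets."""
    vs = list(vertices)
    return {frozenset(c) for r in range(1, len(vs) + 1) for c in combinations(vs, r)}

def hyperfaces(sigma):
    """Faces of dimension dim(sigma) - 1, obtained by dropping exactly one vertex."""
    return {sigma - {v} for v in sigma} if len(sigma) > 1 else set()

# A 3-simplex (tetrahedron) and its simple complex.
sigma = frozenset(["a0", "a1", "a2", "a3"])
K = simple_complex(sigma)

# Two incident cells: both are hyperfaces of the same parent sigma.
t1, t2 = sorted(hyperfaces(sigma), key=sorted)[:2]
psi = t1 & t2  # their intersection, which is again a cell of K

print(len(K), sorted(psi))
```

For the tetrahedron the complex has 2^4 - 1 = 15 cells, and the intersection of two triangular hyperfaces is their shared edge.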
Conversely, if $\tau_1, \tau_2 \succ \psi$ are neighbors, they need not be incident, as shown in Figure 1. In the case when they are in the same simplex, however, this is true. In fact, suppose [...] Then we can assume that [...] Since both cells are in the same simplex, define [...]

By the definition of neighborhood, if $\sigma_1, \sigma_2$ are neighbors and their common hyperface is $\tau$, then clearly $\tau = \sigma_1 \cap \sigma_2$ and, consequently, this common hyperface is unique. However, there exist cells with nonempty intersection that are not neighbors.

Now we consider incident cells $\tau_1, \tau_2 \prec \sigma$. We will show that their coface is also unique. First, if $\dim \tau_1 = \dim \tau_2 = 0$ [...] Therefore $\sigma$ is the edge joining the two vertices $\tau_1, \tau_2$ and hence is unique. Next, if $\dim \tau_1 = \dim \tau_2 \ge 1$, then as described previously we define $\psi = \tau_1 \cap \tau_2$. Then [...]

Simplex with Orientation
Simplicial complexes with orientation are important tools in the study of topological properties of discrete data structures. The concepts concerning discrete Morse functions listed below follow Robin [11]. For a simplex $\sigma = [a_0, a_1, \dots, a_k]$, there are two orientations, and the opposite orientation is denoted by $-\sigma$. Figure 2 shows the orientation of a three-dimensional complex.
In the following we will use $[a_0, a_1, \dots, \hat{a}_i, \dots, a_k]$ to denote the face of $\sigma$ in which the vertex $a_i$ is eliminated. The following chain is defined as the boundary of the complex:

$$\partial \sigma = \sum_{i=0}^{k} (-1)^i [a_0, a_1, \dots, \hat{a}_i, \dots, a_k].$$

If the $g_i$ are integers, then $\sum_i g_i \sigma_i$ is called a chain, where $\dim \sigma_i$ remains the same within a chain. The boundary operator extends to chains naturally. An important property of the boundary operator is that $\partial \circ \partial = 0$. For a $(k-1)$-dimensional simplex $\tau^{k-1}$ and a $k$-dimensional simplex $\sigma^k$, the relationship operator is defined to take the value $1$ if $\tau \prec \sigma$ with the same orientation and $-1$ if $\tau \prec \sigma$ with the opposite orientation. (2.2)
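The property $\partial \circ \partial = 0$ can be verified numerically. The sketch below assumes an oriented simplex is stored as a tuple of vertices in increasing order and a chain as a dictionary from simplices to integer coefficients; these representations and the function names are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

def boundary_simplex(sigma):
    """Boundary of an oriented simplex: the i-th face gets sign (-1)**i."""
    chain = defaultdict(int)
    for i in range(len(sigma)):
        face = sigma[:i] + sigma[i + 1:]  # drop vertex a_i
        chain[face] += (-1) ** i
    return dict(chain)

def boundary(chain):
    """Extend the boundary operator linearly to a chain {simplex: coefficient}."""
    out = defaultdict(int)
    for sigma, g in chain.items():
        if len(sigma) == 1:
            continue  # the boundary of a vertex is zero
        for face, sign in boundary_simplex(sigma).items():
            out[face] += g * sign
    return {f: c for f, c in out.items() if c != 0}

tetra = {(0, 1, 2, 3): 1}
print(boundary(tetra))            # four oriented triangles
print(boundary(boundary(tetra)))  # empty chain: every edge cancels
```

Each edge of the tetrahedron appears in exactly two triangles with opposite signs, which is why the second boundary vanishes.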
On a simplicial complex we can define Morse functions, which are a tool for optimization.

Definition 2.3. Let $K$ be a simplicial complex. A function $f : K \to \mathbb{R}$ is a discrete Morse function if for every $\sigma^p \in K_p$ the following two statements are true: [...]

A discrete gradient can also be defined on the complex [...] Here the inner product $\langle \cdot, \cdot \rangle$ is the obvious inner product on oriented chains with respect to which the oriented simplices are orthonormal. It is easy to see that, if the edge $e = vu$, then $V(v) = -e$. Generally speaking, the discrete gradient is a mapping [...]

Traditional P Systems
A membrane is a structure serving as a protected reactor. We will identify a membrane $m$ with the space it delimits. When we speak of inclusion for membranes, it is always strict inclusion. We now list some elementary concepts concerning the basic operations of membranes:
(i) $m, m'$ are vicinal if $m \subset m'$ and there is no $m''$ such that $m \subset m'' \subset m'$;
(ii) elementary membrane: a membrane with no lower vicinal membranes; skin membrane: a membrane with no upper vicinal membranes (we assume there is always a unique skin);
(iii) degree: the number of membranes;
(iv) sibling membranes $m, m'$: there is an $m''$ which is upper vicinal for both $m$ and $m'$.
Parentheses expressions are often used to describe membrane structures. For example, the membrane structure shown in Figure 3 has the following parentheses expression: [...] (3.1)
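To illustrate the notions of vicinal membranes, degree, and parentheses expressions, here is a small sketch. The five-membrane structure below is hypothetical (it is not the structure of Figure 3), and the labeled-bracket output format is an assumed convention in which each closing bracket is followed by its membrane label.

```python
# Hypothetical membrane structure: label -> upper vicinal membrane (None for the skin).
parent = {1: None, 2: 1, 3: 1, 4: 3, 5: 3}

def children(m):
    """Lower vicinal membranes of m."""
    return sorted(c for c, p in parent.items() if p == m)

def expression(m):
    """Parentheses expression of the substructure rooted at m."""
    return "[" + "".join(expression(c) for c in children(m)) + "]" + str(m)

skin = next(m for m, p in parent.items() if p is None)
degree = len(parent)                                  # number of membranes
elementary = [m for m in parent if not children(m)]   # no lower vicinal membranes

print(expression(skin))
print(degree, elementary)
```

Here membranes 4 and 5 are siblings because membrane 3 is upper vicinal for both.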
For a set $U$, a multiset over $U$ is a mapping $M : U \to \mathbb{N}$, where $\mathbb{N}$ is the set of nonnegative integers. For $a \in U$, $M(a)$ is the multiplicity of $a$ in $M$. Suppose the set of objects is $O$, with a subset $E$ such that objects from $E$ are available in the environment in arbitrary multiplicities, that is, its multiset is $M : E \to \{\infty\}$. A P system with symport/antiport rules of degree $m \ge 1$ is a construct [...] where $O$ is the alphabet, $T \subset O$ is the alphabet of terminal objects, $\mu$ is a membrane structure of degree $m$, $\omega_1, \dots, \omega_m$ are the multisets of objects associated with the $m$ regions of $\mu$, $R_1, \dots, R_m$ are finite sets of symport and antiport rules associated with the $m$ membranes of $\mu$, and $i_0$ is the input/output region.
A P system is called stable if, even if some rules are still applicable, their application changes neither the string/object content of the membrane structure nor the membrane structure itself. For a subalphabet $W \subset O$, we call a system stable over $W$ if the projection over $W$ of the string/object content remains unchanged, even if some rules are still applicable. If $W = \{a\}$, we say the system is stable over $a$.
If $R$ is a subset of rules, we call a P system stable with respect to the rules $R$ if the P system with rules $R$ is stable, that is, applications of rules in $R$ do not change the string/object content of the system's membranes [19].

Membrane Structures on Simplices
Now we describe membrane structures on simplicial complexes. First we assume that the complex is a simple complex, that is, a simplex with all its faces, including vertices. In Figure 4 a simple three-dimensional example is presented with 15 membranes. In the general case, when there is a complex $\{a_0, a_1, \dots, a_n\}$ in $\mathbb{R}^n$, the number of simplices is

$$\sum_{k=1}^{n+1} \binom{n+1}{k} = 2^{n+1} - 1.$$

The boundary relations of simplices are shown in Figure 5, where the arrows point to the boundary cells.

Now we consider the general case where $K$ is a simplicial complex in $\mathbb{R}^n$. For the example of $K$ in $\mathbb{R}^3$ shown in Figure 6, the total number of cells is $6 + 19 + 21 + 9 = 55$. For example, the three-dimensional simplices are

$$a_0a_1a_4a_3,\ a_3a_1a_4a_2,\ a_3a_4a_2a_7,\ a_1a_5a_2a_4,\ a_4a_5a_6a_2,\ a_4a_6a_7a_2. \quad (3.4)$$

Generally speaking, a simplicial complex $K$ is denoted by a set of vertices $\{a_0, a_1, \dots, a_k\}$ in $\mathbb{R}^n$. A simplex $\sigma \in K$ is called a membrane. A membrane is called a maxsimplex if it is not a face of another simplex in $K$. A simplex is denoted by its vertices. Evidently, vertices are zero-dimensional cells and hence are elementary membranes. If $\tau \prec \sigma$ is a hyperface of $\sigma$, then we say that $\sigma$ is the parent of $\tau$. Figure 5 shows the parent and neighborhood relations of the complex in Figure 4. For the more general simplicial complex in Figure 6, the network model is shown in Figure 7.
If $\sigma_1, \sigma_2$ are incident, we say there is an upper link channel between $\sigma_1$ and $\sigma_2$. If $\sigma_1, \sigma_2$ are neighbors, we say there is a lower link between them. An upper link is denoted by $(\sigma_1, \sigma_2)^{\tau}$, where $\tau$ is their common parent, while a lower link is written as $(\sigma_1, \sigma_2)_{\tau}$, where $\tau$ is their common hyperface. An upper link is also written as $(\sigma_1, \sigma_2)_U$ or, simply, $(\sigma_1, \sigma_2)$, while a lower link is denoted by $(\sigma_1, \sigma_2)_L$. Links have no direction, so $(\sigma_1, \sigma_2)$ and $(\sigma_2, \sigma_1)$ are identical. We will specify one of these two links, and only one is allowed.
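The two kinds of links can be enumerated directly from the incidence and neighborhood definitions. The sketch below assumes cells are `frozenset`s of vertex labels and that the complex is given by its maxsimplices; the function names and the two-triangle example are illustrative and not from the paper.

```python
from itertools import combinations

def closure(max_simplices):
    """Simplicial complex generated by the given maxsimplices (all nonempty faces)."""
    K = set()
    for s in max_simplices:
        vs = sorted(s)
        for r in range(1, len(vs) + 1):
            K.update(frozenset(c) for c in combinations(vs, r))
    return K

def links(K):
    """Return (upper, lower): unordered cell pairs mapped to common parent / hyperface."""
    upper, lower = {}, {}
    for s1, s2 in combinations(K, 2):
        if len(s1) != len(s2):
            continue
        pair = frozenset([s1, s2])          # links have no direction
        union, inter = s1 | s2, s1 & s2
        if len(union) == len(s1) + 1 and union in K:
            upper[pair] = union             # incident: common parent
        if len(s1) > 1 and len(inter) == len(s1) - 1:
            lower[pair] = inter             # neighbors: common hyperface
    return upper, lower

# Two triangles glued along the edge bc.
K = closure([{"a", "b", "c"}, {"b", "c", "d"}])
upper, lower = links(K)
tri_pair = frozenset([frozenset("abc"), frozenset("bcd")])
print(len(K), len(upper), len(lower), tri_pair in lower, tri_pair in upper)
```

The two triangles are neighbors (they share the edge bc) but are not incident, since the complex contains no tetrahedron having both as hyperfaces.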
Definition 3.1. A P system on a simplicial complex $K$, called a simplicial P system, with antiport and symport rules is a construct [...] where $m$ is the number of cells, labeled with $1, 2, \dots, m$; $O$ is the alphabet; $E$ is the set of objects with unlimited multiplicity in the environment; $\omega_1, \dots, \omega_m$ are the initial multisets of objects over $O$; $R_1, \dots, R_m$ are the sets of symport and antiport rules associated with the $m$ membranes; $ch \subseteq \{(i, j)_R, (i, j)_L : i, j \in \{1, \dots, m\}\}$ is the set of links; and $F(i, j)$ is a finite set of antiport and/or symport rules associated with the link $(i, j) \in ch$.
Ceterchi and Martin-Vide [19] proposed a new type of communication P system with priority relations. They introduced a promoter, which makes a rule active, and an inhibitor, which makes it inactive. Inspired by their idea, we present an ordered system in this section.
An antiport rule $(u, \mathrm{out}; v, \mathrm{in})_i$ in $R_i$ exchanges the multiset $u$ inside membrane $i$ with the multiset $v$ outside it. A symport rule $(x, \mathrm{out})_i$ or $(x, \mathrm{in})_i$ sends out (respectively, takes in) the multiset $x$ with respect to membrane $i$. For a specific membrane $i$, the rules in $R_i$ are totally ordered as $r_1 > r_2 > \cdots$. The rule $r_{n+1}$ is applicable if and only if the system has reached a stable configuration with respect to the rules $r_1 > \cdots > r_n$. We can use a queue structure to represent this process.
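The queue-based ordering of rules can be sketched as follows. This is a simplified illustration under stated assumptions: multisets are Python `Counter`s, a single membrane exchanges objects with its outside region, and "stable with respect to a rule" is approximated by "the rule is no longer applicable". The function names and the sample rules are hypothetical.

```python
from collections import Counter, deque

def contains(bag, sub):
    """True if multiset `bag` includes multiset `sub`."""
    return all(bag[a] >= n for a, n in sub.items())

def apply_antiport(inside, outside, u, v):
    """(u, out; v, in): send u out and bring v in, if both are available."""
    if not (contains(inside, u) and contains(outside, v)):
        return False
    inside.subtract(u); outside.update(u)
    outside.subtract(v); inside.update(v)
    return True

def run_ordered(inside, outside, rules, max_steps=1000):
    """Apply rules in priority order; the next rule starts once the current one is exhausted."""
    queue, trace = deque(rules), []       # highest priority at the front
    while queue and len(trace) < max_steps:
        name, u, v = queue[0]
        if apply_antiport(inside, outside, u, v):
            trace.append(name)
        else:
            queue.popleft()               # simplification: inapplicable counts as stable
    return trace

inside, outside = Counter(a=2), Counter(b=5, c=1)
rules = [("r1", Counter(a=1), Counter(b=1)),   # antiport: a out, b in
         ("r2", Counter(b=1), Counter(c=1))]   # antiport: b out, c in
print(run_ordered(inside, outside, rules))
```

A symport rule fits the same function by passing an empty `Counter()` for the side that does not move; the `max_steps` guard is only there to stop rules that would otherwise apply forever.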

Rules in Simplicial P Systems
Now we describe the communication rules in simplicial P systems. For our purpose in this paper, there are mainly four types of communication rules in a simplicial P system. Each type of rule may have operators such as out, in, up, and down.
First suppose $(\tau_1, \tau_2)^{\sigma}$ is an upper link. A rule like $(x, y, \mathrm{up})_U$ means that the multisets $x$ and $y$ from $\tau_1$ and $\tau_2$ transform into $z$ and go up to their parent $\sigma$. For two cells $\tau \prec \sigma$, the antiport rule $(u, \mathrm{out}; v, \mathrm{in} \mid \tau \prec \sigma)$ in $\tau$ means exchanging the multiset $u$ inside membrane $\tau$ with the multiset $v$ outside it (in $\sigma$). The symport rule $(u, \mathrm{out} \mid \tau \prec \sigma)$ sends the multiset $u$ outside $\tau$. The symport rule $(u, \mathrm{in} \mid \tau \prec \sigma)$ works similarly.

A lower link rule in $R_{\sigma}(\tau_1, \tau_2)$ may have the following forms: [...] (3.9)

Equation (3.6) means moving $x$ from cell $\tau_1$ and $y$ from $\tau_2$, as $z$, to the parent $\sigma$ and simultaneously moving $\gamma$ from $\sigma$, as $\alpha$ into $\tau_1$ and as $\beta$ into $\tau_2$. Equation (3.8) means moving $x$ from cell $\tau_1$ and $y$ from $\tau_2$, as $z$, to the hyperface $\sigma$ and simultaneously moving $\gamma$ from $\sigma$, as $\alpha$ up to $\tau_1$ and as $\beta$ up to $\tau_2$.

Configuration and Computation
Now we describe the configuration and computation of simplicial P systems. For our purpose, changes of the membrane structure are not involved. A configuration of a simplicial P system is the state of the system described by specifying the objects and rules associated with each membrane. The initial state is called the initial configuration. Therefore, the multisets represented by $\omega_i$, $1 \le i \le m$, in $\Pi$ constitute the initial configuration of the system.
The system evolves by applying rules in membranes, and this evolution is called a computation. The computation starts with the multisets specified by $\omega_1, \dots, \omega_m$ in the $m$ cells. In each time unit, rules are used in a cell. If no rule is applicable for a cell, then no object changes in it. The system evolves synchronously for all cells.
When the system has reached a configuration in which no rule is applicable any longer, we say that the computation halts. A configuration is stable if, even if some rules are still applicable, their application does not change the object content of the membranes. The computation is successful if and only if it halts or it is stable. The result of a halting/stable computation is the number given by the multiplicity of objects present in the cell $i_0$ in the halting/stable configuration.
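The synchronous evolution and halting condition can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: every rule is a symport-like move of one copy of an object between two cells, each rule fires at most once per time unit, and all moves of a step are decided on the configuration at the start of that step. The cell labels, rule format, and function names are hypothetical.

```python
from collections import Counter

def step(config, rules):
    """One synchronous time unit; returns False if no rule was applicable (halting)."""
    available = {c: Counter(bag) for c, bag in config.items()}  # snapshot of the step
    moves = []
    for src, dst, obj in rules:
        if available[src][obj] > 0:
            available[src][obj] -= 1      # reserve the object so rules do not conflict
            moves.append((src, dst, obj))
    for src, dst, obj in moves:           # apply all chosen moves together
        config[src][obj] -= 1
        config[dst][obj] += 1
    return len(moves) > 0

def run(config, rules, output_cell, obj, max_steps=1000):
    """Evolve until halting; the result is the multiplicity of obj in the output cell."""
    steps = 0
    while steps < max_steps and step(config, rules):
        steps += 1
    return steps, config[output_cell][obj]

config = {1: Counter(a=3), 2: Counter(), 3: Counter()}
rules = [(1, 2, "a"), (2, 3, "a")]        # move a from cell 1 to 2, and from 2 to 3
steps, result = run(config, rules, output_cell=3, obj="a")
print(steps, result)
```

In this run the computation halts after four time units with all three copies of `a` in the output cell.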

Problem Setting and Algorithm
Now suppose the data set to be clustered is [...] We now construct a uniform grid in $\mathbb{R}^n$ as follows. Choose integers $n_i > 0$, let $\delta_i = (Q_i - P_i)/n_i$, and divide the interval $[P_i, Q_i]$ [...], where $N_0$ is the number of cells that do not contain data: [...] Then we can define a new data set $\Omega = \{x_1, \dots, x_N\}$, where each $x_i$ is a grid point $x_i = (x_i^1, \dots, x_i^n)$: [...] Define a weight function on $\Omega$ by $m(x) = \#\{z_i \in \text{the cell corresponding to the data point } x\}$. In this way, the original data set is transformed into a new data set $\Omega$ with weights equal to the original numbers of points. From now on we always work on the data set $\Omega$.

For simplicity, we consider a density-based clustering technique. For two points $x, y \in \Omega$, the similarity is defined by [...] where $d(x, y)$ is the topological distance between the two points. That is, if $x, y$ are incident, then $d(x, y) = 1$; otherwise $d(x, y)$ is the minimal number of edges forming a path connecting $x$ and $y$. For two subsets $C_1, C_2 \subset X$, the similarity is defined by [...] The clustering is implemented by a hierarchical method as follows. At first, each point of the data set $X$ forms a singleton cluster. Then each data point tries to connect to another data point in its neighborhood. After this step, a cluster can be found as a set of connected points.

Now we construct a simplicial complex $K$ corresponding to the data set $\Omega$. On the basis of the rectangles as in (4.1), we add some hyperplanes to form a triangulation. Each rectangle is then decomposed into several simplices. The hyperplanes can be chosen such that the set of simplices satisfies the definition of a simplicial complex, and $K$ is defined as the union of such simplices. The three-dimensional case is shown in Figure 8.

Now we show that such a triangulation exists. In fact, we need only consider an inner cell $C = \prod_{i=1}^{n} [p_i, q_i]$, where $r_i < p_i < q_i < s_i$. There are $2^n$ vertices for this cell [...] There are in total $3^n - 1$ surrounding cells

$$\Bigl\{ \prod_{i=1}^{n} [x_i, y_i] \Bigm| [x_i, y_i] \in \{[r_i, p_i], [p_i, q_i], [q_i, s_i]\} \Bigr\} \setminus \{C\}. \quad (4.6)$$
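The grid construction and the weight function $m(x)$ described above can be sketched in a few lines. The code assumes the data lie in the box $[P_1, Q_1] \times \cdots \times [P_n, Q_n]$ and identifies each nonempty grid cell by its integer index; the function name `grid_weights` and the convention that points on the upper boundary fall into the last cell are assumptions made for this illustration.

```python
from collections import Counter

def grid_weights(points, P, Q, n_cells):
    """Count the data points falling in each grid cell; returns {cell index: weight}."""
    delta = [(q - p) / n for p, q, n in zip(P, Q, n_cells)]   # cell widths delta_i
    weights = Counter()
    for z in points:
        idx = tuple(
            min(int((zi - p) / d), n - 1)   # clamp the boundary value Q_i into the last cell
            for zi, p, d, n in zip(z, P, delta, n_cells)
        )
        weights[idx] += 1
    return weights

points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.0, 1.0)]
m = grid_weights(points, P=(0.0, 0.0), Q=(1.0, 1.0), n_cells=(2, 2))

total_cells = 2 * 2
N = len(m)               # nonempty cells, i.e., points of the reduced data set
N0 = total_cells - N     # cells that contain no data
print(dict(m), N, N0)
```

Only nonempty cells appear in the result, so the reduced data set has as many points as there are nonempty cells, each weighted by the number of original points it absorbed.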
First we consider the triangulation of the cell $C$. We choose one vertex $v_0 = (z_1, \dots, z_n)$ of the cell $C$, where $z_i$ can be $p_i$ or $q_i$. Now we denote [...] Consider the vertex set [...] Then $\# S_j = 2^{n-1}$. Clearly $C$ can be decomposed into disjoint cones, $C = \bigcup_{j=1}^{n} v_0 S_j$. Notice that $S_j$ is an $(n-1)$-dimensional rectangle, which means that, for any triangulation of $S_j$, we can join its simplices with $v_0$ to form a triangulation of $v_0 S_j$. Therefore, by induction, we get the construction of the triangulation.

[...] (4.9)

Clearly $\varphi(1) \ge 0$. Let $y_J - (1 - t_0) z_J - t_0 a_J = 0$. Then $x \in v_0 S_J$.
By the above discussion, we also have another lemma.
Lemma 4.2. Suppose the triangulation of $S_j$ in $\mathbb{R}^{n-1}$ is $\{T_j^s\}$. Then the set $\bigcup_{j=1}^{n} \bigcup_{s} \{v_0 T_j^s\}$ forms a triangulation of $C$.
Finally we obtain a theorem.
Theorem 4.3. A simplicial complex $K$ exists corresponding to the grid $G$.
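The inductive cone construction behind Lemma 4.2 can be sketched for the unit cube. The code below assumes the cell has been rescaled so that its vertices are 0/1 tuples, and it always takes $v_0$ to be the lexicographically smallest vertex of the current face; that choice of apex, the representation, and the function name are assumptions for illustration only.

```python
from itertools import product

def triangulate(face):
    """Triangulate a face of the n-cube (a frozenset of 0/1 vertex tuples) by coning.

    Returns a list of simplices, each a tuple of vertices.
    """
    if len(face) == 1:
        return [tuple(face)]                 # a single vertex is a 0-simplex
    v0 = min(face)                           # apex of the cones (assumed choice)
    simplices = []
    for j in range(len(v0)):
        if len({v[j] for v in face}) == 2:   # coordinate j varies within this face
            # S_j: the opposite facet, where coordinate j differs from that of v0
            S_j = frozenset(v for v in face if v[j] != v0[j])
            simplices += [(v0,) + T for T in triangulate(S_j)]
    return simplices

cube = frozenset(product((0, 1), repeat=3))
T = triangulate(cube)
print(len(T), T[0])
```

Each level of the recursion cones $v_0$ over the facets not containing it, so an $n$-dimensional cell is split into $n!$ simplices with $n + 1$ vertices each.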
For each node $x \in \Omega$, define its neighborhood as $N(x) = \{y \in \Omega \mid y \text{ is incident with } x \text{ in } K\}$. For a subset $C \subset \Omega$, define its neighborhood $N(C)$ as $N(C) = \bigcup_{x \in C} N(x)$. Now we propose an algorithm for our clustering problem. This is a self-joining cluster technique. Initially, each data point in $\Omega$ forms a singleton cluster. Then each node searches its neighborhood. If there exists a neighboring node which is similar enough, a link between the two nodes is activated. The final clusters are the sets of linked nodes. To define the meaning of "similar enough", we need a parameter $\delta_0$: two nodes $x, y$ are similar enough if $f(x, y) > \delta_0$. Putting everything together, we get the algorithm shown in Algorithm 1.
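The self-joining idea can be sketched as edge activation followed by a connected-component search. The code is not the paper's Algorithm 1: the similarity `f` is passed in as a parameter, the sample similarity (the smaller of the two node weights) is purely hypothetical, and treating nodes without any active edge as outliers is an assumption.

```python
def cluster(nodes, neighbors, f, delta0):
    """Activate edges with f(x, y) > delta0; clusters are the linked node sets."""
    active = {x: set() for x in nodes}
    for x in nodes:
        for y in neighbors.get(x, ()):
            if f(x, y) > delta0:
                active[x].add(y)
                active[y].add(x)
    clusters, outliers, seen = [], [], set()
    for x in nodes:
        if x in seen:
            continue
        if not active[x]:                 # no active link: treated as an outlier (assumption)
            outliers.append(x)
            seen.add(x)
            continue
        comp, stack = set(), [x]          # depth-first search over active edges
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(active[u] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters, outliers

# Hypothetical weights and incidence relation on five grid nodes in a row.
weight = {"A": 5, "B": 4, "C": 1, "D": 3, "E": 3}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
f = lambda x, y: min(weight[x], weight[y])   # stand-in similarity, not the paper's f

clusters, outliers = cluster(list(weight), neighbors, f, delta0=2)
print(clusters, outliers)
```

Raising `delta0` deactivates more edges, which splits clusters and produces more outliers; lowering it merges everything into fewer clusters.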

Design of a P System
We have already defined the simplicial complex $K$ as the membrane structure, with $M$ membranes in total. Next we need to specify the alphabet and the rules to be used. First we design a binary coding scheme for the weight function $m(\cdot)$ and the distance function $d(x, y)$. Suppose the length of the coding is $L$. In this way, the weight function and the distance function take binary strings as values. Now we suppose $m(x_i) = m_{i1} \cdots m_{iL}$, where $m_{ij} \in \{0, 1\}$. We also need an integer $H$ such that if the coding of $\delta_0$ is $d$ [...] Suppose $\mu \subset ch$ is a subset of links. Then we can define a P system as follows: [...] The working alphabet is [...] (4.12)
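A fixed-length binary coding of the weights, as described above, might look like the following sketch. The helper names `encode` and `decode` are illustrative, and the most-significant-bit-first order is an assumption, since the paper's exact bit order is not recoverable here.

```python
def encode(value, L):
    """Binary string m_1 ... m_L of a nonnegative integer, most significant bit first."""
    if not 0 <= value < 2 ** L:
        raise ValueError(f"{value} does not fit in {L} bits")
    return format(value, f"0{L}b")

def decode(bits):
    """Inverse of encode: integer value of a binary string."""
    return int(bits, 2)

L = 9                      # coding length; 2**9 = 512 distinct values
code = encode(5, L)
print(code, decode(code))
```

The coding length bounds the largest representable weight, so `L` must be chosen with $2^L$ larger than the maximum cell weight.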
Let $v_i = x_i$, $i = 1, \dots, N$, be the vertices. We will use [...] The initial multiset $\omega_i$ stands for the multiset in the membrane $\sigma_{v_i}$: [...] (4.13)
For $(i, j) \in \mu$, the rules are $R(i, j) = \{A^s_{ij}, B_{ij} : s = 1, \dots, L\}$, with the order as follows: [...] (4.14)
Rules on edges are [...]
We now define $L = 9$. Then $2^L = 512$ and $N = 100 - 25 = 75$. Now we choose $\delta_0 = 1$. The final clustering is shown in Figure 11. We find five clusters and 31 outliers. Next we choose attributes 1 and 2, and then the data matrix is [...]

Now we analyze the effect of the parameter $\delta_0$. First we choose $\delta_0 = 0$. Then the clustering result is one cluster containing all the data, and the number of ill-clustered points is 699. Next let $\delta_0 = 1$. Then we get two clusters and some outliers. However, in this case the red data are all ill-clustered, and hence the error number is 241. Now choose $\delta_0 = 2$. In this case we find four clusters. The number of outliers is 46, while the number of ill-clustered points is $65 + 74 = 139$. Then [...] As a result, we see that the best parameter is $\delta_0 = 2$.

Figure 1: An attractor model on a simplicial complex.

Figure 5: A network model for membrane structure.

Figure 6: A simplicial complex model with all cells.

Figure 7: A network model for the simplicial complex model in Figure 6.

Lemma 4.1. The decomposition $C = \bigcup_{j=1}^{n} v_0 S_j$ is valid, with the property that the sets $\mathrm{Int}(v_0 S_j)$ are pairwise disjoint, where $\mathrm{Int}(C)$ is the set of interior points in $\mathbb{R}^n$.

Figure 10: Simplices of the Wisconsin breast cancer data.

Figure 11: Cluster result of the Wisconsin breast cancer data.

Figure 12: Error number with respect to the parameter δ 0 .
[...] $\tau_2$ are incident. Putting everything together, we get a theorem.
Algorithm 1
Inputs: $\Omega = \{x_1, x_2, \dots, x_N\}$; $\delta_0$: similarity threshold value; $N_i = \{y \mid y, x_i \text{ are incident}\}$.
Outputs: $C = \{C_1, C_2, \dots, C_t\}$: set of clusters; $t$: number of clusters; $\Omega_0$: outliers.
Begin
  Set $\Omega_1 = \Omega_0 = \emptyset$, $C_1 = \emptyset$, $t = 1$.
  for $i = 1$ to $N$
    If $N_i = \emptyset$, then add $x_i$ into the set $\Omega_0$. Otherwise add $x_i$ into the set $\Omega_1$.
    For each $x_j \in N_i$, $j > i$, calculate the similarity measure $f(x_i, x_j)$. If $f(x_i, x_j) > \delta_0$, then set the edge $\sigma_{x_i x_j}$ as active. In this case, set $C_1 = \{x_i\}$ if $C_1$ is still an empty set.
  end
  while $\Omega_1 \ne \emptyset$ do
    (1) For each $x \in \Omega_1$, if there exists an active edge $\sigma_{xy}$ with $x \ne y$, $y \in C_t$, then add $x$ into $C_t$ and remove $x$ from $\Omega_1$.

Proof. Suppose $x = (a_1, \dots, a_n) \in C$ and $x \ne v_0$. Consider the point $x_t = (1 - t) v_0 + t x = ((1 - t) z_1 + t a_1, \dots, (1 - t) z_n + t a_n)$, where $t \ge 1$. Let $t_0$ be the first zero point of the following function: [...]