Solving the Maximum Weighted Clique Problem Based on Parallel Biological Computing Model

1 School of Information Sciences, Shanghai Ocean University, Shanghai 201306, China
2 Guangxi Institute of Water Resources Research, Nanning 530023, China
3 State Key Laboratory of Simulation and Regulation of River Basin Water Cycle, China Institute of Water Resources and Hydropower Research, Beijing 100048, China
4 Department of Civil Engineering, Xi'an University of Architecture & Technology, Xi'an 710055, China


Introduction
DNA computing is an interdisciplinary field that uses DNA biotechnology to solve complex practical engineering problems. In 1994, Adleman [1] used DNA molecular operations to solve the Hamiltonian path problem with $n$ vertices in $O(n)$ time complexity, and in doing so demonstrated the strong parallelism of DNA computing. In 1995, Lipton [2] solved the NP-complete satisfiability problem using Adleman's biochemical experiments. Since then, DNA biological computing has attracted growing interest from scholars in different disciplines. It has three advantages: high parallelism, low energy consumption, and large memory capacity. By designing DNA procedures and algorithms, many researchers have succeeded in solving various complicated NP-complete problems [3-21], which has promoted the development of DNA computing. To apply DNA computing theory to a broader range of practical engineering problems, it is worth trying to solve more intractable problems with DNA molecular computing. Furthermore, most previous work on DNA computing focused on path search problems, whose solutions are edge or vertex sets ligated head to tail, so that the candidate solutions can be represented by DNA strands relatively easily. Some practical engineering problems, such as the maximum weighted clique problem, are instead discrete set problems without a sequentially connected path. How to represent discrete data on DNA strands is therefore key to expanding the applied scope of DNA computing.
The maximum weighted clique problem has a wide range of applications in engineering optimization and computational mathematics. In this paper, a DNA algorithm built on the work of Adleman [1] and Lipton [2] is used to solve the maximum weighted clique problem. The rest of the paper is organized as follows.
In Section 2, the parallel biological computing model is introduced in detail. Section 3 uses the DNA molecular algorithm to solve the maximum weighted clique problem. Section 4 proves the correctness and feasibility of the DNA algorithm and derives its computational complexity. We draw conclusions in Section 5.

The Parallel Biological Computing Model
DNA, the material basis of biological heredity, is a chain of deoxyribonucleotides. It is built from four kinds of bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The arrangement of the bases stores genetic information. An important feature of DNA is that two single strands can form a double strand through complementary base pairing. Moreover, the pairing is highly specific: A can only pair with T, and C can only pair with G. The length of a DNA single strand is counted by its number of bases. For example, a single strand with 5 bases is called a 5-mer.
Based on the research of Adleman [1] and Lipton [2], the DNA biological operations are described as follows. These operations can be used to solve the maximum weighted clique problem. In the parallel biological computing model, the following operations can be performed on given tubes, each of which contains a set of DNA strands.
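The pairing rule can be illustrated with a short Python sketch. The helper names `complement`, `reverse_complement`, and `can_anneal`, as well as the sample 5-mer, are illustrative assumptions and are not taken from the paper.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement of a single strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complementary strand, read in the opposite (antiparallel) direction."""
    return complement(strand)[::-1]


def can_anneal(s1: str, s2: str) -> bool:
    """True if two single strands are fully complementary over their whole length."""
    return len(s1) == len(s2) and s2 == reverse_complement(s1)


sample = "ATTCG"  # hypothetical 5-mer, chosen only for illustration
print(len(sample), complement(sample), reverse_complement(sample))
```

The sketch treats a strand as a plain string and only models perfect, full-length pairing; partial hybridization, which matters in real experiments, is deliberately left out.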
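(1) Copy($T_1$, $T_2$): given a test tube $T_1$, it produces another test tube $T_2$ with the same strands as $T_1$.
The model further provides (2) Merge, (3) Annealing, (4) Separation, (5) Ligation, (6) Sort, (7) Denaturation, (8) Read, (9) Append-tail, and (10) Discard.
Since the above operations are realized through a limited number of biological experimental procedures on DNA strands [18], we can reasonably conclude that each operation has $O(1)$ time complexity.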

Biological Algorithm for the Maximum Weighted Clique Problem
In what follows, the symbols used for each vertex $v_k$ ($k = 1, 2, \ldots, n$), together with 1, 0, and #, are encoded by distinct single strands of the same length $t$. It is best to choose $t$ as a small integer determined by the scale of the problem. In the following algorithms, each vertex $v_k$ is indicated by one of two single-strand symbol blocks: the block containing 1 means that $v_k$ is in the vertex subset, and the block containing 0 means that it is not. The # symbols mark the boundaries between different vertex subsets. A further single strand encodes the weight of each vertex $v_k$, and its length equals the weight value. To distinguish which edges belong to the graph $G$, we also design DNA strands in a tube for every edge $e_{i,j} \in E$.
(1) For an $n$-vertex graph, every vertex subset can be expressed by an $n$-bit binary value. The $k$th bit set to 1 means that the vertex $v_k$ is in the subset; conversely, the $k$th bit set to 0 means that the vertex $v_k$ is not in the subset. Taking Figure 1 as an example, the vertex subset $\{v_2, v_3, v_5\}$ can be expressed by the binary value 01101. In the same way, we can represent the vertex subsets of an $n$-vertex simple graph as a series of $n$-bit binary numbers.
After the above six manipulations, the single strands in tube $T_1$ represent all possible vertex subsets. For example, in Figure 1, one such single strand denotes the vertex subset $\{v_1, v_2, v_4, v_5\}$, corresponding to the binary value 11011. This step executes in $O(1)$ time complexity, since every operation can be finished in $O(1)$.
(2) Every strand in tube $T_1$ denotes one vertex subset. A solution of the maximum weighted clique problem is a vertex subset in which any two vertices are connected by an edge of the graph $G$. Therefore, we check whether each vertex subset in $T_1$ meets this condition. If $e_{i,j} \notin E$, we discard the strands indicating that both vertices $v_i$ and $v_j$ are in the same subset. For example, in Figure 1, a single strand whose subset contains both $v_2$ and $v_4$ should be discarded, because graph $G$ does not include the edge $e_{2,4}$ connecting vertices $v_2$ and $v_4$. In this way we select all possible vertex cliques in the graph.
For $i = 1$ to $i = n$,
    End for
End for
Through the above operations, all the single strands in tube $T_1$ represent vertex cliques.
Because the algorithm contains two nested "For" loops and each operation can be finished in $O(1)$, this step executes in $O(n^2)$ time complexity.
(3) The solution of the maximum weighted clique problem is a vertex clique of maximal weight, in which any two vertices are linked by an edge of the graph $G$. So we select the maximal-weight subset from all vertex cliques. If the vertex $v_k$ is included in the vertex subset, we append an additional weight strand for $v_k$ at the end of the subset strand in order to find the optimum solution strand. For example, to the single strand representing the vertex subset $\{v_2, v_3, v_5\}$ we append the weight strands of $v_2$, $v_3$, and $v_5$. This step can be carried out as follows.

End for
This step includes one "For" loop; thus it can be finished in $O(n)$ time complexity.
(4) We select the single strands with the longest length from $T_1$; they represent the solutions of the maximum weighted clique problem. For example, in Figure 1, the longest single strands in $T_1$ yield the solution of the maximum weighted clique problem: the vertex subset $\{v_3, v_4, v_5\}$ with weight sum 15.
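Steps (1) to (4) can be mimicked in software as a sanity check. The following is a minimal sketch under stated assumptions, not the authors' laboratory procedure: each strand is simplified to `#`, one bit per vertex, and a closing `#`; a weight tail is written as the letter `w` repeated $w_k$ times; and the 5-vertex graph with its weights is hypothetical (it is not Figure 1, whose edge list is not given in the text). The names `solve_by_strand_simulation` and `decode` are invented for illustration.

```python
from itertools import product


def solve_by_strand_simulation(n, edges, weights):
    """Simulate Steps (1)-(4) on strings; returns the longest strands."""
    # Step (1): one strand per vertex subset, written as '#' + n bits + '#'.
    tube = ["#" + "".join(bits) + "#" for bits in product("01", repeat=n)]
    # Step (2): for every non-adjacent pair (v_i, v_j), discard the strands
    # that mark both vertices as present (position k holds the bit of v_k).
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if frozenset((i, j)) not in edges:
                tube = [s for s in tube if not (s[i] == "1" and s[j] == "1")]
    # Step (3): append a tail of length w_k for every vertex v_k in the subset.
    for k in range(1, n + 1):
        tube = [s + "w" * weights[k] if s[k] == "1" else s for s in tube]
    # Step (4): the longest strands encode the heaviest cliques.
    longest = max(len(s) for s in tube)
    return [s for s in tube if len(s) == longest]


def decode(strand, n):
    """Recover the vertex indices and the total weight from a simulated strand."""
    vertices = [k for k in range(1, n + 1) if strand[k] == "1"]
    return vertices, len(strand) - (n + 2)


# Hypothetical instance (an assumption, not the graph of Figure 1).
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]}
WEIGHTS = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6}

for strand in solve_by_strand_simulation(5, EDGES, WEIGHTS):
    print(decode(strand, 5))
```

On a sequential machine every filtering round still touches up to $2^n$ strings, so the sketch runs in exponential time. The polynomial operation count of the biological algorithm relies on each tube operation acting on all strands in parallel.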

The Feasibility and Computational Complexity of the Parallel Biological Computing Algorithm
Theorem 1. The maximum weighted clique problem for an $n$-vertex graph can be solved by the biological computing algorithm.
Proof. After Step (1), the test tube contains every possible vertex subset. For the maximum weighted clique problem, if $e_{i,j} \notin E$, vertices $v_i$ and $v_j$ must not be in the same subset. Therefore, basic biological manipulations remove the illegal combinations and keep the legal ones.
In conclusion, we can get the solutions of the maximum weighted clique problem with $n$ vertices in $O(n^2)$ time complexity.
Theorem 3. Solution strands to the maximum weighted clique problem with $n$ vertices can be found in a finite length range.
Proof. After Step (1), the single strands in tube $T_1$ denote all possible vertex subsets. At Step (3), whether the weight strand of vertex $v_k$ is appended is decided by whether the preceding strand carries the information that $v_k$ is in the subset. Since the number of vertices in a subset is between 1 and $n$, the strands after the "append" operation also lie in a finite length range. For the maximum weighted clique problem, the length of the solution strands is between $(3n+2)t$ and $(3n+2)t + \sum_{k=1}^{n} w_k$, where $t$ is the common symbol length and $w_k$ is the weight of vertex $v_k$. Therefore, we can get the solution in the appropriate length range at Step (4).
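The length argument can be written out explicitly. The following is a sketch under assumptions read off the encoding described in this paper: each vertex contributes three symbols, two # delimiters enclose the strand, every symbol has length $t$, and the tail appended for a chosen vertex $v_k$ has length $w_k$. The names $S$ (a vertex subset) and $L(S)$ (the length of its strand after Step (3)) are introduced here only for illustration.

```latex
L(S) = (3n+2)\,t + \sum_{v_k \in S} w_k,
\qquad
(3n+2)\,t \;\le\; L(S) \;\le\; (3n+2)\,t + \sum_{k=1}^{n} w_k .
```

Because the term $(3n+2)t$ is the same for every strand, comparing strand lengths is equivalent to comparing subset weights, which is why selecting the longest strands at Step (4) returns the heaviest cliques.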
Table 2: Sequences chosen to represent all kinds of vertex subsets for the example of Figure 1.

The Detailed Approach and Walkthrough of the Biological Computing Algorithm
Taking Figure 1 as an example, we describe the result of each step. Because the biological computing algorithm depends on basic biochemical reactions of DNA molecules, which may introduce errors, it is important to make biological computing more reliable through DNA sequence design. To obtain better performance in hybridization reactions, we follow [22] for the sequence design. For the problem of Figure 1, the program generates 3-base random sequences to represent the symbol # and the vertex symbols. If a generated DNA sequence fails any of the constraints, the program regenerates a new DNA sequence; if the constraints are satisfied, the new sequence is accepted. Once all the DNA strands satisfy the constraints, the program has succeeded and outputs these sequences. The corresponding vertex symbol sequences are shown in Table 1.
In accordance with the above design, we obtain the symbol representations of all vertex subsets in Table 2 after Step (1). Step (2) discards the inappropriate vertex combinations and retains the vertex clique sequences in Table 3. At Step (3), we append the corresponding weight sequences, which are shown in Table 4. Through the "Sort" operation at Step (4), we find the optimal solution to the maximum weighted clique problem of Figure 1 in Table 5.

Table 3: Symbols sequences chosen to represent the different vertex cliques for the example of Figure 1 (columns: vertex cliques, symbols sequence).
Table 4: Symbols sequences chosen to represent the vertex weighted cliques for the example of Figure 1 (columns: vertex set, symbols sequence).
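The generate-and-test loop described above can be sketched as follows. The actual constraints of [22] are not listed in this paper, so the two constraints used here (a bound on the number of G/C bases and a minimum Hamming distance to every previously accepted sequence) are assumptions chosen only for illustration, as are the function names, the parameter values, and the `max_tries` safeguard.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_sequences(count, length, min_gc, max_gc, min_distance, rng,
                     max_tries=100_000):
    """Draw random sequences and accept those that pass every constraint."""
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == count:
            return accepted
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        gc = sum(base in "GC" for base in candidate)
        # Assumed constraint 1: bounded G/C count.
        gc_ok = min_gc <= gc <= max_gc
        # Assumed constraint 2: far enough from every accepted sequence.
        distinct = all(hamming(candidate, s) >= min_distance for s in accepted)
        if gc_ok and distinct:
            accepted.append(candidate)  # constraints satisfied: accept
        # otherwise the candidate is dropped and a new one is generated
    if len(accepted) == count:
        return accepted
    raise RuntimeError("constraints too strict for the given number of tries")


# Hypothetical parameters: six 10-base symbols, 4 to 6 G/C bases each,
# pairwise Hamming distance of at least 4.
symbols = design_sequences(6, 10, 4, 6, 4, random.Random(0))
print(symbols)
```

A seeded `random.Random` keeps the run reproducible. A real design program would add further checks, which are omitted here because the source does not specify them.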

Conclusions
In this paper, we present a parallel computing algorithm, based on biological operations, to solve the maximum weighted clique problem. Because DNA biological computing offers high parallelism, low energy consumption, and large memory capacity, in contrast to the lower speed and limited memory of electronic computers, DNA computing has attracted more and more attention. Compared with previous algorithms, our proposed algorithm has the following features: (1) we use fixed-length DNA strands to generate the solution strands of the problem, so the algorithm has a lower error rate in hybridization operations; (2) the time cost of the algorithm and the length of the solution strands increase in linear proportion with the instance scale. For an undirected simple $n$-vertex graph, the parallel biological computing algorithm solves the maximum weighted clique problem in $O(n^2)$ time complexity, which is lower than the exponential complexity of previous algorithms. Although the operations in this paper rest on a theoretical model, the capacity to execute complicated operations in the algorithm can help us understand more about the nature of computing and promote better and faster development of biocomputing, which in turn helps us solve complex practical engineering problems.
(2) Merge($T_1$, $T_2$): given two test tubes $T_1$ and $T_2$, it pours the strands of $T_2$ into $T_1$, so that $T_1$ holds the strands of both tubes and $T_2$ is left empty.
(3) Annealing($T$): given a test tube $T$, it generates all feasible double strands in $T$ by annealing. The products and residues are still stored in $T$ after annealing.
(4) Separation($T_1$, $X$, $T_2$): given a test tube $T_1$ and a set of strands $X$, it removes all single strands in $X$ from $T_1$ and produces another tube $T_2$ with the removed strands.
(5) Ligation($T$): given a tube $T$, it ligates together the strands in $T$.
(6) Sort($T_1$, $T_2$, $T_3$): it moves the shortest strands from tube $T_1$ into tube $T_2$ and the longest strands into $T_3$; the remaining strands are kept in $T_1$.
(7) Denaturation($T$): given a test tube $T$, it dissociates every double strand in $T$ into a pair of single strands.
(8) Read($T$): given a tube $T$, it describes each single strand in $T$.
(9) Append-tail($T$, $x$): given a test tube $T$ and a single strand $x$, it appends $x$ to the back of each strand in the tube $T$.
(10) Discard($T$): given a test tube $T$, it discards the strands in tube $T$ and leaves $T$ empty.
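The set-like tube operations can be modeled in software to make their semantics concrete. The following is a minimal sketch, assuming a tube is a multiset (`collections.Counter`) of strings; the function names are invented for illustration. Annealing, Ligation, Denaturation, and Read are biochemical steps and are not modeled here.

```python
from collections import Counter


def copy_tube(t1):
    """Copy(T1, T2): return a new tube T2 with the same strands as T1."""
    return Counter(t1)


def merge(t1, t2):
    """Merge(T1, T2): pour T2 into T1 and leave T2 empty."""
    t1.update(t2)
    t2.clear()


def separation(t1, x):
    """Separation(T1, X, T2): move every strand of T1 that is in X into T2."""
    t2 = Counter()
    for strand in list(t1):
        if strand in x:
            t2[strand] = t1.pop(strand)
    return t2


def sort_tube(t1):
    """Sort(T1, T2, T3): shortest strands to T2, longest to T3, rest stay in T1.

    If all strands have the same length they all end up in T2.
    """
    t2, t3 = Counter(), Counter()
    if t1:
        lengths = [len(s) for s in t1]
        shortest, longest = min(lengths), max(lengths)
        for strand in list(t1):
            if len(strand) == shortest:
                t2[strand] = t1.pop(strand)
            elif len(strand) == longest:
                t3[strand] = t1.pop(strand)
    return t2, t3


def append_tail(t, x):
    """Append-tail(T, x): append strand x to the back of every strand in T."""
    items = list(t.items())
    t.clear()
    for strand, count in items:
        t[strand + x] += count


def discard(t):
    """Discard(T): throw away all strands and leave T empty."""
    t.clear()
```

Each function loops over the strands one by one, whereas the model assumes that a tube operation acts on all strands at once; the sketch reproduces the results of the operations, not their $O(1)$ cost.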
An undirected simple graph $G = (V, E, W)$ consists of a vertex set $V = \{v_1, v_2, \ldots, v_n\}$ with corresponding positive vertex weights $\{w_1, w_2, \ldots, w_n\}$ and an edge set $E = \{e_{i,j} \mid 1 \le i < j \le n\}$. For a vertex subset $V_1 \subseteq V$, if every pair $v_i, v_j \in V_1$ is linked by an edge $e_{i,j}$ in the graph, then $V_1$ is called a clique of the graph $G$, and the clique weight is the sum of the vertex weights in $V_1$. The maximum weighted clique problem seeks a vertex clique of graph $G$ with maximal weight sum. For example, the undirected simple graph in Figure 1 defines an instance of the maximum weighted clique problem.
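The definition translates directly into a reference check. The following is a minimal sketch with invented function names and a hypothetical graph (not the graph of Figure 1); the exhaustive search is a plain sequential baseline and not the DNA algorithm of this paper.

```python
from itertools import combinations


def is_clique(subset, edges):
    """True if every pair of vertices in the subset is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(subset, 2))


def clique_weight(subset, weights):
    """Sum of the vertex weights in the subset."""
    return sum(weights[v] for v in subset)


def max_weighted_clique(n, edges, weights):
    """Exhaustive search over all nonempty vertex subsets (exponential time)."""
    best, best_weight = (), 0
    for size in range(1, n + 1):
        for subset in combinations(range(1, n + 1), size):
            if is_clique(subset, edges):
                weight = clique_weight(subset, weights)
                if weight > best_weight:
                    best, best_weight = subset, weight
    return best, best_weight


# Hypothetical 5-vertex instance, used only to exercise the definition.
edges = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]}
weights = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6}
print(max_weighted_clique(5, edges, weights))
```

Such a baseline is useful mainly for checking small instances against the output of any other method.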

Table 1: Sequences chosen to represent the symbol # and the indexed vertex symbols ($k = 1, 2, \ldots, n$) in the example of Figure 1.
Step (2) keeps the legal combinations from the solution-space strands. At Step (3), we append a "tail" to each strand for every vertex $v_k$ included in its vertex subset. Because the length of each tail equals the weight of its vertex, the longest strands in the pool represent the solutions of the maximum weighted clique problem. We can then search for and obtain the solution at the last step.
Theorem 2. The maximum weighted clique problem for an $n$-vertex graph can be solved in $O(n^2)$ time complexity using DNA molecular computing.
Proof. The parallel biological computing algorithm executes in finite time: Steps (1) and (4) take $O(1)$, Step (2) takes $O(n^2)$, and Step (3) takes $O(n)$.
At Step (2), the single strands in $T_1$ represent all possible vertex cliques. We design the strands of # and the vertex symbols with the same fixed length $t$.

Table 5: DNA sequences chosen to represent the solution of the maximum weighted clique problem, $\{v_3, v_4, v_5\}$.