Partitionable Bus-based String-matching Algorithm for Run-length Coded Strings with VLDCs

The string matching (SM) problem is to find the occurrences of a pattern within a text. A variable length don't care (VLDC) is a special symbol, not belonging to a finite alphabet Σ but in Σ*. Each VLDC in the pattern can match any substring in the text. Given a run-length coded text of length 2n over Σ and a run-length coded pattern of length 2m over Σ*, this paper first presents an O(1) time parallel SM algorithm for run-length coded strings with VLDCs on a reconfigurable mesh (RM) using O(nm) processors. To account for the hardware limitation in VLSI implementation and to suit VLSI modular implementation, a partitionable parallel algorithm on the RM with a limited number of processors is further presented. For N < n and M < m, the SM problem for run-length coded strings with VLDCs can be solved in O(X̂Ŷ) time on the RM using O(NM) (= O((nm)/(X̂Ŷ))) processors, where X̂ = ⌈(n − 1)/(N − 1)⌉ and Ŷ = ⌈(m − 1)/(M − 1)⌉.


INTRODUCTION
String matching (SM) is a basic search operation on patterns. In many applications, using a special encoding method for representing strings is important and advantageous for saving storage and manipulating them. One well-known method that has been widely used in many fields and has played a valuable historical role in the development of data compression is run-length coding.
The basic idea of this method is to replace sequences of identical consecutive symbols with the representative symbol and its multiplicity. For example, the run-length coded representation of the string aaaabbbaaacccc is a^4 b^3 a^3 c^4, i.e., (a, 4) (b, 3) (a, 3) (c, 4), and the length is reduced from 14 to 8. A variable length don't care (VLDC) is a special symbol, not belonging to Σ but in Σ*. Each VLDC in the pattern can match any substring in the text (possibly of zero length). For example, given a text string 'cccaaaaabbaaabbbccccddaabb' and a pattern string 'aabb,cccddaa', where ',' is the VLDC, the two matched positions are from 7 to 24 and from 12 to 24. The SM problem for run-length coded strings with VLDCs can be viewed as an extension of the classical SM problem and has many important applications [4], such as editing operations, pattern recognition, file retrieval, and DNA matching.
*Corresponding author, e-mail: klchung@cs.ntit.edu.tw. This research was supported in part by the National Science Council of R.O.C. under contracts NSC85-2121-E011-009, NSC85-2213-M011-002, NSC86-2213-E011-010 and NCHC86-08-015. Authors: Hsiu-Niang Chen and Kuo-Liang Chung.
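The two examples above can be reproduced with a minimal sequential sketch in Python. It is only a reference check on uncoded strings, not the parallel algorithm of this paper. The function names `rle` and `vldc_matches`, the use of ',' to stand for the VLDC, and the choice to report the earliest ending position for each start are assumptions made for illustration.

```python
from itertools import groupby


def rle(s):
    """Run-length encode s as a list of (symbol, multiplicity) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(s)]


def vldc_matches(text, pattern, vldc=","):
    """Return 1-indexed (start, end) pairs where pattern matches text.

    Each VLDC may absorb any substring, including the empty one.
    For every start, the earliest possible end is reported (an assumption).
    """
    segments = pattern.split(vldc)
    matches = []
    for start in range(len(text)):
        if not text.startswith(segments[0], start):
            continue
        pos = start + len(segments[0])
        for seg in segments[1:]:
            found = text.find(seg, pos)  # leftmost placement after the VLDC
            if found < 0:
                break
            pos = found + len(seg)
        else:
            # Every segment was placed, so the whole pattern matches.
            matches.append((start + 1, pos))
    return matches


print(rle("aaaabbbaaacccc"))
print(vldc_matches("cccaaaaabbaaabbbccccddaabb", "aabb,cccddaa"))
```

The first call yields the four (symbol, multiplicity) pairs of the run-length example, and the second yields the two matched position ranges quoted in the text.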
The reconfigurable mesh (RM) is a very promising platform for high performance bus-based VLSI architectures due to its simplicity and regularity. Many efficient algorithms and simulations on RMs [5-13] have been developed. Except for the results in [8, 9], the number of processors used in most of the published results depends on the problem size. Assume the text (pattern) with length t (p) has been compressed into a run-length coded representation with length 2n (2m), where 2n < t and 2m < p. Previously, Chen [1] presented an O(1) time parallel SM algorithm on an RM with O(t^2) processors. Later, Chung [2] presented an O(1) time parallel SM algorithm on an RM with O(tp) processors. Recently, Chung [3] presented an O(1) time SM algorithm for strings with VLDCs on an RM with O(mn) processors. The number of processors used in the above three parallel SM algorithms on RMs also depends on the problem size, so they are not suitable for VLSI modular implementation. The main motivation of this research is to design a partitionable parallel SM algorithm for run-length coded strings with VLDCs that is suitable for VLSI modular implementation. To the best of our knowledge, this is the first time such a partitionable parallel SM algorithm on RMs has been proposed in the literature.
This paper first presents an O(1) time parallel SM algorithm for run-length coded strings with VLDCs on the RM using O(nm) processors. Then, a partitionable parallel algorithm on the RM with limited processors is further presented for solving the same problem. For N < n and M < m, the SM problem for run-length coded strings with VLDCs can be solved in O(X̂Ŷ) time on the RM using O(NM) (= O((nm)/(X̂Ŷ))) processors, where X̂ = ⌈(n − 1)/(N − 1)⌉ and Ŷ = ⌈(m − 1)/(M − 1)⌉.
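The identity O(NM) = O((nm)/(X̂Ŷ)) can be checked with a short calculation. The following derivation is only a sketch: it assumes 2 ≤ N < n and 2 ≤ M < m, and the constant 9 is a by-product of this particular argument, not a bound stated in the paper.

```latex
% Requires amsmath. Assumes 2 <= N < n and 2 <= M < m,
% so that N - 1 <= n - 2 and \hat{X} <= n - 1.
\begin{align*}
\hat{X} &= \left\lceil \frac{n-1}{N-1} \right\rceil < \frac{n-1}{N-1} + 1
  \quad\Longrightarrow\quad (N-1)\hat{X} < (n-1) + (N-1), \\
N\hat{X} &= (N-1)\hat{X} + \hat{X} < (n-1) + (N-1) + \hat{X} \le 3n - 4 < 3n, \\
M\hat{Y} &< 3m \quad \text{(by the same argument)}, \\
NM &< \frac{9\,nm}{\hat{X}\hat{Y}} = O\!\left(\frac{nm}{\hat{X}\hat{Y}}\right).
\end{align*}
```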
The remainder of the paper is arranged as follows. Section 2 introduces our computational model, the RM. In Section 3, we present the O(1) time parallel SM algorithm for run-length coded strings with VLDCs on the RM with O(mn) processors. In Section 4, the partitionable parallel SM algorithm for solving the same problem is presented. Some concluding remarks are included in Section 5.

THE COMPUTATIONAL MODEL: RM
The parallel computational model used is the 2-D RM. A reconfigurable bus system is a bus system whose configuration can be dynamically changed.
An m × n RM consists of m × n (m rows and n columns) identical processors arranged in a 2-D rectangular array with a reconfigurable bus system. The processor located in row i and column j, for 1 ≤ i ≤ m and 1 ≤ j ≤ n, is referred to as PE(i, j). Every processor has four ports denoted by N (north), S (south), E (east), and W (west), respectively. For example, Figure 1 shows an RM of size 3 × 5, where the RM contains 15 processors and the four black dots on each processor represent the four ports.
In each processor, ports can be dynamically connected in pairs to fit computational needs. As shown in Figure 2, each processor connects its W and E ports. Of course, all processors can also connect their own W and S ports, N and S ports, N and E ports, etc.
We assume that the RM is operated in the SIMD (single instruction multiple data) model. All processors work synchronously, and setting internal connections, performing an arithmetic or boolean operation, broadcasting a value on a bus, or receiving a value from a specific bus each takes O(1) time. If there exists a bus between two ports, we assume that it takes O(1) time to broadcast the data from one port to the other. In addition, at any given time only one value can be sent on a bus, and processors read the bus if instructed to do so.
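The way local port connections combine into buses can be illustrated with a small sequential sketch in Python. It models every port as a node and uses union-find to group ports that end up on the same bus. The helper name `bus_components`, the `local` callback, and the tuple encoding of ports are all assumptions made for this illustration; they are not part of the RM model itself.

```python
from itertools import product

PORTS = "NSEW"


def bus_components(rows, cols, local):
    """Group the ports of a rows x cols RM into buses with union-find.

    local(i, j) returns the port pairs fused inside PE(i, j),
    e.g. [("W", "E")]. Indices are 1-based, as in PE(i, j).
    Returns a function mapping a port (i, j, p) to its bus representative.
    """
    cells = list(product(range(1, rows + 1), range(1, cols + 1)))
    parent = {(i, j, p): (i, j, p) for i, j in cells for p in PORTS}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for i, j in cells:
        # Fixed mesh links between neighboring processors.
        if j < cols:
            union((i, j, "E"), (i, j + 1, "W"))
        if i < rows:
            union((i, j, "S"), (i + 1, j, "N"))
        # Reconfigurable connections chosen inside the processor.
        for a, b in local(i, j):
            union((i, j, a), (i, j, b))
    return find


# Every processor of a 3 x 5 RM connects its W and E ports (row buses).
find = bus_components(3, 5, lambda i, j: [("W", "E")])
print(find((1, 1, "W")) == find((1, 5, "E")))  # ports on the same row bus
print(find((1, 1, "W")) == find((2, 1, "W")))  # ports on different row buses
```

With every processor connecting W and E, each row forms one horizontal bus, so the first comparison holds and the second does not.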

THE O(1) TIME PARALLEL ALGORITHM
We take the text string 'cccaaaaabbaaabbbccccddaabbccccddaaa' and the pattern string 'aabb,cccddaa' with one VLDC to demonstrate our O(1) time parallel algorithm. The run-length coded text becomes T = c^3 a^5 b^2 a^3 b^3 c^4 d^2 a^2 b^2 c^4 d^2 a^3; it has twelve entries. The run-length coded pattern becomes P = a^2 b^2 , c^3 d^2 a^2; it has six entries. In this example, the first matched substring in T starts from the location (2, 4) (the fourth position in the second entry) and ends at (8, 2); the second matched substring in T starts from the location (4, 2) and ends at (8, 2); the third matched substring starts from the location (8, 1) and ends at (12, 2).
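The three reported locations can be cross-checked sequentially. The sketch below, in Python, maps each (entry, offset) location to an absolute text position with prefix sums and then tests the delimited substring against the pattern. The helper name `position` and the use of the regular expression `.*` to model the VLDC are assumptions made for this check only.

```python
import re
from itertools import accumulate

# Run-length coded text of the example as (symbol, multiplicity) pairs.
T = [("c", 3), ("a", 5), ("b", 2), ("a", 3), ("b", 3), ("c", 4),
     ("d", 2), ("a", 2), ("b", 2), ("c", 4), ("d", 2), ("a", 3)]

text = "".join(sym * cnt for sym, cnt in T)
# ends[j] is the number of symbols covered by the first j entries.
ends = [0] + list(accumulate(cnt for _, cnt in T))


def position(entry, offset):
    """Map a 1-indexed location (entry, offset) to a 1-indexed position."""
    return ends[entry - 1] + offset


# The VLDC is modelled by ".*", which matches any substring.
pattern = re.compile(r"aabb.*cccddaa")

reported = [((2, 4), (8, 2)), ((4, 2), (8, 2)), ((8, 1), (12, 2))]
for start, end in reported:
    lo, hi = position(*start), position(*end)
    ok = pattern.fullmatch(text[lo - 1:hi]) is not None
    print(start, end, "-> positions", lo, hi, "match:", ok)
```

The first two locations correspond to positions 7 to 24 and 12 to 24, which agree with the uncoded example in the Introduction, whose text is a prefix of this one.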
On the m × n RM, our O(1) time parallel SM algorithm for run-length coded strings with VLDCs is presented in the following nine steps. Initially, suppose the run-length coded text T has been fed into row 1 of the RM and the run-length coded pattern P has been fed into column 1. That is, processor PE(1, j) stores the data (T(1, j), T(2, j)), the symbol and the multiplicity of the jth text entry, for 1 ≤ j ≤ n, and processor PE(i, 1) stores the data (P(1, i), P(2, i)) for 1 ≤ i ≤ m. For simplicity, the initial data allocation on the m × n RM is illustrated in Figure 3. Here, n = 12 and m = 6.

Algorithm_1
Step 1 Establish a vertical bus system for each column. This configuration can be built by connecting the N and S ports of each processor. Then, each processor PE(1, j) for 1 ≤ j ≤ n broadcasts its own data T(1, j) and T(2, j) to the others in the same column via the vertical bus system.
Step 2 Establish a horizontal bus system for each row. This configuration can be built by connecting the W and E ports of each processor. Then, each processor PE(i, 1) for 1 ≤ i ≤ m broadcasts its own data P(1, i) and P(2, i) to the others in the same row via the horizontal bus system.
FIGURE 3 The initial configuration of the 6 × 12 RM.
Step 3 Each processor first disconnects its vertical and horizontal connections. Then, each processor holding the symbol ',' sends a special symbol, say '+', to its south neighbor and north neighbor. Figure 4 illustrates only the special symbols, ',' and '+', in each processor.
Step 4 Except for the processors in the first row, each processor connects its N and E ports.
Step Each processor in the first row, the last row, and the rows holding '+' connects its W and S ports when P(1, i) = T(1, j) and P(2, i) ≤ T(2, j). Otherwise it does nothing.
Step Each processor PE(i, j) holding ',' connects its W and S ports and connects its W and E ports. Figure 5 illustrates the configuration of the RM after performing this step.
Step 8 Each processor PE(m, j), 1 ≤ j ≤ n, with the connection linking the W and S ports, first sends a symbol, say '!', from the S port to processor PE(1, k) for 1 ≤ k ≤ n. If a processor holding the symbol ',' receives the symbol '!' from its S port, then it disconnects its W and E ports, disconnects its N and E ports if they are connected, and connects its N and S ports. This avoids the negative length effect. Figure 6 illustrates the configuration of the RM after performing this step.
Step 9 Along the corresponding stairlike bus system, each processor PE(m, j), 1 ≤ j ≤ n, with the connection linking the W and S ports sends the data (j, P(2, m)) from the S port to processor PE(1, k) for 1 ≤ k ≤ n. Finally, each processor PE(1, k) reports the data (k, T(2, k) − P(2, 1) + 1) as the matched starting location and (j, P(2, m)) as the matched ending location if the W port of that processor receives the data. Note that if P(1, m) = ',', then only PE(m, n) sends the data (n, T(2, n)) to PE(1, k). If P(1, 1) = ',', each processor PE(1, k) reports the data (1, 1) as the matched starting location and the received data as the matched ending location if the S port of that processor receives the data. Figure 7 illustrates the configuration of the RM after performing this step. In our example, the processor PE(1, 2) reports the data (2, 4) as the matched starting location and (8, 2) (see Fig. 7) as the matched ending location; PE(1, 4) reports (4, 2) as the matched starting location and (8, 2) (see Fig. 7) as the matched ending location; PE(1, 8) reports (8, 1) as the matched starting location and (12, 2) (see Fig. 7) as the matched ending location.
Since each step in Algorithm_1 takes O(1) time, we have the following result.
THEOREM The SM problem for run-length coded strings with VLDCs can be solved in O(1) time on the m × n RM.

THE PARTITIONABLE PARALLEL ALGORITHM
Consider the case when the number of processors available is not enough. Given an RM consisting of M × N (M rows and N columns) processors, if the run-length coded text (pattern) is of length 2n (2m), where N < n or M < m, this section proposes a partitionable strategy to overcome this hardware limitation. Without loss of generality, we focus on the following two cases: (1) N < n; (2) N < n and M < m.
A. Case 1: N < n
Assume the RM consists of 6 × 7 processors and the run-length coded strings are the same as in Section 3. In our example, the initial data allocation on the m × N (= M × N = 6 × 7) RM is illustrated in Figure 8, where in fact PE(1, 7) holds only one copy of d^2.
FIGURE 8 The initial configuration of the 6 × 7 RM.
LEMMA 2 Given a run-length coded text of length 2n, where n > 1, and an RM with N columns, the number of pipes, say X̂, is equal to ⌈(n − 1)/(N − 1)⌉.
Proof Since two pipes have one overlapping entry, three pipes have two overlapping entries, and so on, X̂ pipes have X̂ − 1 overlapping entries. We then have n + X̂ − 1 ≤ N X̂, i.e., X̂ ≥ (n − 1)/(N − 1), and the smallest such integer is ⌈(n − 1)/(N − 1)⌉. Q.E.D.
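The pipe count of Lemma 2 can be checked numerically. The following Python sketch compares the closed form with a direct count in which each pipe after the first contributes N − 1 new entries. The function names `num_pipes` and `brute_force_pipes` are hypothetical, and the requirement N > 1 is an assumption added so that the overlap is well defined.

```python
def num_pipes(n, N):
    """Smallest X with X*N - (X - 1) >= n, i.e. ceil((n - 1) / (N - 1))."""
    if n <= 1 or N <= 1:
        raise ValueError("expects n > 1 (Lemma 2) and N > 1 (assumption)")
    return -(-(n - 1) // (N - 1))  # integer ceiling division


def brute_force_pipes(n, N):
    """Count pipes directly: each pipe after the first adds N - 1 entries."""
    pipes, covered = 1, N
    while covered < n:
        covered += N - 1
        pipes += 1
    return pipes


# Example of this section: a text of 12 entries on an RM with 7 columns.
print(num_pipes(12, 7), brute_force_pipes(12, 7))
```

For the example of this section, both functions give two pipes, which matches the single shared entry held by PE(1, 7).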
By Lemma 2, we first partition the run-length coded text into X̂ pipes, and each pipe shares one overlapping entry with the preceding pipe. In Figure 8, the last entry in the first pipe is shared with the last entry in the second pipe. In general, the last (first) entry in the odd (even) pipe is shared with the last (first) entry in the next pipe. Besides this entry-sharing feature, the text is arranged in a snakelike row-major order. For clarity, let N = 5; the text a^1 b^1 c^1 d^1 e^1 f^1 g^1 h^1 i^1 j^1 k^1 l^1 m^1 can be arranged into the following snakelike row-major order:
a^1 b^1 c^1 d^1 e^1
i^1 h^1 g^1 f^1 e^1
i^1 j^1 k^1 l^1 m^1
Our parallel algorithm processes these pipes from the X̂th pipe to the first pipe successively, and processing each pipe is similar to the parallel algorithm described in Section 3. Our partitionable parallel algorithm for this case, i.e., N < n, is described below.
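The snakelike arrangement with shared entries can be sketched in a few lines of Python. The function name `snake_pipes` and the use of `None` to pad a short final pipe are assumptions for illustration; multiplicities are omitted because every entry of the example text has multiplicity 1.

```python
def snake_pipes(entries, N):
    """Split entries into pipes of N columns that overlap by one entry.

    Even-numbered pipes (2nd, 4th, ...) are stored right to left, which
    gives the snakelike row-major order. Short pipes are padded with None.
    """
    if N < 2:
        raise ValueError("N must be at least 2 so that pipes can overlap")
    rows = []
    start = 0
    while True:
        pipe = list(entries[start:start + N])
        pipe += [None] * (N - len(pipe))
        if len(rows) % 2 == 1:      # this is an even-numbered pipe
            pipe.reverse()
        rows.append(pipe)
        if start + N >= len(entries):
            break
        start += N - 1              # the next pipe re-uses the last entry
    return rows


for row in snake_pipes(list("abcdefghijklm"), 5):
    print(" ".join("-" if x is None else str(x) for x in row))
```

For the 13-entry example with N = 5 this produces three pipes, in which entry e sits in the last column of pipes 1 and 2 and entry i sits in the first column of pipes 2 and 3.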

Algorithm_2
Step 1 Each processor PE(1, j) for 1 ≤ j ≤ N broadcasts all of its own data to the other processors in the same column via the vertical bus system. It takes O(X̂) time.
Here, we assume that each broadcast of one data item takes O(1) time.
Step 2 and Step 3 These two steps are the same as Step 2 and Step 3 described in Algorithm_l, respectively.
For X = X̂ to 1 /* we process these X̂ pipes from the last one to the first one */
begin
Step 4 Each processor first disconnects all of its connections. Then, except for the processors in the first row, each processor connects its N and E (W) ports for odd (even) X.
B. Case 2: N < n and M < m
The initial data allocation on the M × N (= 5 × 5) RM is illustrated in Figure 10.
By Lemma 2, it follows that the numbers of pipes for the text and the pattern are X̂ = ⌈(n − 1)/(N − 1)⌉ for n > 1 and Ŷ = ⌈(m − 1)/(M − 1)⌉ for m > 1, respectively. Therefore, we partition the run-length coded text (pattern) into X̂ (Ŷ) pipes, and each pipe shares one overlapping entry with the preceding pipe.
Our algorithm processes these Ŷ pipes from pipe Ŷ to pipe 1 for each fixed pipe X, X̂ ≥ X ≥ 1, successively; it is similar to Algorithm_2, but with the roles of text and pattern exchanged.
Our partitionable parallel algorithm for Case 2 is described below.

Algorithm_3
Step 1 Each processor PE(1, j) for 1 ≤ j ≤ N broadcasts all of its own data to the others in the same column via the vertical bus system. It takes O(X̂) time.
Step 2 Each processor PE(i, 1) for 1 ≤ i ≤ M broadcasts all of its own data to the others in the same row via the horizontal bus system. It takes O(Ŷ) time.
Otherwise do nothing. Each processor PE(i, j) holding ',' connects its W and E ports; it connects its W and S (N) ports for odd X and odd (even) Y, and connects its E and S (N) ports for even X and odd (even) Y.
Step 8
Case X is odd: With the connection linking the W and S (N) ports for odd (even) Y, each processor PE(i, j) holding (P(1, m), P(2, m)), processor PE(M, j) (PE(1, j)) holding the data sent from pipe Y + 1, and processor PE(i, N) holding the data sent from pipe X + 1, where 1 ≤ j ≤ N and 1 ≤ i ≤ M, first send the symbol '!' from the S (N) port to the processor PE(1, k) (PE(M, k)) for 1 ≤ k ≤ N or PE(i, 1). If a processor holding the symbol ',' receives the symbol '!' from its S (N) port, then it disconnects its W and E ports, disconnects its N (S) and E ports if they are connected, and connects its N and S ports.
Case X is even: With the connection linking the E and S (N) ports for odd (even) Y, each processor PE(i, j) holding (P(1, m), P(2, m)), processor PE(M, j) (PE(1, j)) holding the data sent from pipe Y + 1, and processor PE(i, 1) holding the data sent from pipe X + 1, where 1 ≤ j ≤ N and 1 ≤ i ≤ M, first send the symbol '!' from the S (N) port to the processor PE(1, k) (PE(M, k)) for 1 ≤ k ≤ N or PE(i, N). If a processor holding the symbol ',' receives the symbol '!' from its S (N) port, then it disconnects its W and E ports, disconnects its N (S) and W ports if they are connected, and connects its N and S ports.
Step 9
Case X is odd: Along the corresponding stairlike bus systems, with the connection linking the W and S (N) ports for odd (even) Y, each processor PE(i, j) holding (P(1, m), P(2, m)), processor PE(M, j) (PE(1, j)) holding the data sent from pipe Y + 1, and processor PE(i, N) holding the data sent from pipe X + 1, where 1 ≤ j ≤ N and 1 ≤ i ≤ M, send the data (2⌊X/2⌋(N − 1) + j, P(2, m)) if they hold (P(1, m), P(2, m)), or the received data, from the S (N) port to the processor PE(1, k) (PE(M, k)) for 1 ≤ k ≤ N or PE(i, 1). Finally, PE(1, k) reports the data (2⌊X/2⌋(N − 1) + k, T(2, 2⌊X/2⌋(N − 1) + k) − P(2, 1) + 1) as the matched starting location.
Case X is even: Along the corresponding stairlike bus systems, with the connection linking the E and S (N) ports for odd (even) Y, each processor PE(i, j) holding (P(1, m), P(2, m)), processor PE(M, j) (PE(1, j)) holding the data sent from pipe Y + 1, and processor PE(i, 1) holding the data sent from pipe X + 1, where 1 ≤ j ≤ N and 1 ≤ i ≤ M, send the data (X(N − 1) − j + 2, P(2, m)) if they hold (P(1, m), P(2, m)), or the received data, from the S (N) port to the processor PE(1, k) (PE(M, k)) for 1 ≤ k ≤ N or PE(i, N). Finally, PE(1, k) reports the data (X(N − 1) − k + 2, T(2, X(N − 1) − k + 2) − P(2, 1) + 1) as the matched starting location and the received data as the matched ending location if the E port of that processor receives the data and it holds (P(1, 1), P(2, 1)); otherwise, PE(1, k) (PE(M, k)) and PE(i, N) keep the received data if the E port of those processors receives the data.
end
It is observed that, from Step 4 to Step 9, the major difference when compared with Algorithm_1 is to replace the E (W) port by the W (E) port when it runs an even pipe of the run-length coded text, and to replace the S (N) port by the N (S) port when it runs an even pipe of the run-length coded pattern.
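The two index expressions used in Step 9 translate a local column of pipe X into a global text entry. The Python sketch below evaluates them and checks that consecutive pipes agree on their shared entry. The function name `global_index` is hypothetical, and the formulas are taken as they appear in Step 9 above.

```python
def global_index(X, j, N):
    """Text entry held in column j (1-indexed) while pipe X is processed."""
    if X % 2 == 1:
        # Odd pipe, stored left to right: 2*floor(X/2)*(N - 1) + j.
        return 2 * (X // 2) * (N - 1) + j
    # Even pipe, stored right to left: X*(N - 1) - j + 2.
    return X * (N - 1) - j + 2


N = 5
layout = [[global_index(X, j, N) for j in range(1, N + 1)] for X in (1, 2, 3)]
print(layout)

# Shared entries: last column for (odd, next) pairs, first column otherwise.
print(global_index(1, N, N) == global_index(2, N, N))
print(global_index(2, 1, N) == global_index(3, 1, N))
```

With N = 5, the three rows correspond to entries 1 to 5, then 9 down to 5, then 9 to 13, which is the snakelike order with one shared entry between consecutive pipes.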

CONCLUSIONS
The significance of string matching for run-length coded strings with VLDCs is due to its popular use in pattern recognition, edit operations, and so on. Given a run-length coded text (pattern) of length 2n (2m), the main contributions of this paper are twofold: first, we have presented an O(1) time parallel SM algorithm for run-length coded strings with VLDCs on the RM with O(nm) processors, and this result generalizes the results in [2, 3]; second, and most importantly, we have presented a partitionable parallel SM algorithm for solving the same problem, and this partitionable parallel algorithm on RMs is very suitable for VLSI modular implementation.
[Figure: Some snapshots for simulating Algorithm_3. Panels include X = 1 and Y = 1, and (f) X = 2 and Y = 2.]
FIGURE 9 Some snapshots for simulating Algorithm_2.

FIGURE 10 The initial configuration of the 5 × 5 RM.