This paper proposes using the statistics of similarity values to evaluate the clusterability, or structuredness, associated with a cell formation (CF) problem. Typically, the structuredness of a CF solution cannot be known until the CF problem is solved. In this context, this paper investigates the similarity statistics of machine pairs to estimate the potential structuredness of a given CF problem without solving it. One key observation is that a well-structured CF solution matrix has a relatively high percentage of high-similarity machine pairs. Histograms are then used as a statistical tool to study the distributions of similarity values. This study leads to the development of the U-shape criteria and a criterion based on the Kolmogorov-Smirnov test. Accordingly, a procedure is developed to classify whether an input CF problem can potentially lead to a well-structured or ill-structured CF matrix. In the numerical study, 20 matrices were initially used to determine the threshold values of the criteria, and 40 additional matrices were used to verify the results. Further, these matrix examples show that a genetic algorithm cannot effectively improve the well-structured CF solutions (of high grouping efficacy values) obtained by hierarchical clustering (as one type of heuristic). This result supports the relevance of similarity statistics for preexamining an input CF problem instance and suggesting a proper approach for problem solving.
The research in this paper lies at a crossroads of manufacturing systems and computer science. Based on our disciplinary background, we initially studied the cell formation (CF) problem, which seeks a clustering of similar machines and parts to support mass customization in [
In the domain of computer science, the notion of structuredness broadly corresponds to the clusterability concept [
Notably, the measure of clusterability remains an open topic in computer science. Ackerman and Ben-David [
Returning to the context of the CF problem, and in response to the CDNM thesis, we also observed that a heuristic approach (e.g., HC in our case) can yield satisfactory results. To further utilize this observation in practice, this research develops criteria that assess the potential structuredness (corresponding to clusterability in computer science) of a given CF problem and suggest either HC or a genetic algorithm (GA) for problem solving. To verify the development, we have applied numerical examples to examine the results of the structuredness criteria and the quality of the CF solutions obtained via HC and GA.
Though it was developed independently, we want to acknowledge that our approach to evaluating the structuredness criteria is similar to the statistical approach by Ackerman et al. [
Notably, this paper was extended from our conference paper [
The rest of this paper is organized as follows. Section
In the design of a cellular manufacturing system, one early and important decision is the formation of machine groups and part families, and it is often referred to as the cell formation (CF) problem. A simple CF problem can be compactly captured by a machine-part incidence matrix. Let
When incidence matrices are used to represent CF solutions (i.e., block-diagonal matrices), the solutions can be roughly classified into two types: well-structured and ill-structured matrices [
Comparison of well-structured and ill-structured matrices.
To quantify the structuredness of a CF matrix solution, we use the traditional grouping efficacy (denoted as
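Although the paper's equation is not reproduced here, the traditional grouping efficacy is commonly computed as (a − e)/(a + v), where a is the total number of 1s in the incidence matrix, e is the number of exceptional elements (1s outside the diagonal blocks), and v is the number of voids (0s inside the blocks). A minimal Python sketch under that standard definition (function and argument names are ours):

```python
import numpy as np

def grouping_efficacy(A, machine_cells, part_cells):
    """Grouping efficacy = (a - e) / (a + v), where a is the total
    number of 1s, e is the number of exceptional elements (1s outside
    the diagonal blocks), and v is the number of voids (0s inside)."""
    A = np.asarray(A)
    # inside[i, j] is True when machine i and part j are in the same cell
    inside = np.equal.outer(np.asarray(machine_cells), np.asarray(part_cells))
    a = A.sum()
    e = A[~inside].sum()                             # 1s outside the blocks
    v = np.count_nonzero(inside) - A[inside].sum()   # 0s inside the blocks
    return (a - e) / (a + v)
```

A perfect block-diagonal matrix (no exceptional elements, no voids) yields a grouping efficacy of 1; exceptional elements and voids each pull the value down.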
Yet, not all incidence matrices can be converted to a well-structured matrix, due to the original complex interdependency of the production requirements among machines and parts. This situation cannot be resolved by advanced optimization techniques, as the root cause stems from the original inputs of the CF problem. However, we cannot practically know whether a given CF problem will yield a well-structured matrix until we actually solve it. In this context, the purpose of this paper is to assess the structuredness of a given CF problem by analyzing the similarity of machines, without actually solving the problem. In the traditional CF notion, two machines are said to be similar if they are mainly required to produce a subset of common parts. In this work, the Jaccard similarity coefficient is applied [
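As a concrete reading of this coefficient, the Jaccard similarity of two machines is the number of parts that require both machines divided by the number of parts that require at least one of them. A small sketch (the helper name is ours), assuming a 0/1 incidence matrix with machines as rows:

```python
import numpy as np
from itertools import combinations

def jaccard_similarities(A):
    """Jaccard similarity for every machine pair of a 0/1 incidence
    matrix (rows = machines, columns = parts): parts requiring both
    machines over parts requiring either one."""
    A = np.asarray(A, dtype=bool)
    sims = {}
    for i, j in combinations(range(A.shape[0]), 2):
        inter = np.count_nonzero(A[i] & A[j])   # parts shared by i and j
        union = np.count_nonzero(A[i] | A[j])   # parts needing i or j
        sims[(i, j)] = inter / union if union else 0.0
    return sims
```

For 30 machines, this produces the 435 pairwise values whose statistics are analyzed throughout the paper.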
After specifying the notion of machine similarity, let us revisit the two examples in Figure
- Well-structured matrix: 81 (out of 435) machine pairs have similarity values higher than or equal to 0.80.
- Ill-structured matrix: 4 (out of 435) machine pairs have similarity values higher than or equal to 0.50.
In this illustration, it is roughly identified that a well-structured matrix can have quite a different statistical distribution of machine similarity values from an ill-structured matrix. This observation raises the question of the statistical conditions under which a well-structured matrix can be identified, and this investigation is the focus of this paper. By knowing such statistical conditions, engineers designing cellular manufacturing systems can initially assess their production requirements via the statistics of machine similarity. If the statistical data show unfavorable results (i.e., a low chance of getting a well-structured matrix), they can either modify the production requirements (e.g., buy more machines) or consider other manufacturing systems. Such an initial assessment can save the effort of solving the CF problem. Also, this paper will show that a well-structured matrix can be obtained satisfactorily by some less time-consuming heuristics (where complex optimization methods may not bring additional benefits).
To investigate the statistical conditions of the structuredness of a CF solution, this section will discuss the three properties of a well-structured matrix. These three properties include
The original formulation of the grouping efficacy (GE) in (
Based on its definition, a well-structured matrix should have few exceptional elements and voids, leading to a high value of GE. While GE is effective in indicating the structuredness of a CF solution (high value
Compared to the property of high grouping efficacy, it is less obvious that a well-structured matrix has a high percentage of high-similarity machine pairs. In view of the Jaccard similarity coefficient in (
In literature, the notion of similarity has been applied for many years to address the CF problem, and the Jaccard similarity coefficient is one of the early applications [
While similarity coefficients have been studied extensively for CF problems, the statistical distribution of similarity values of a CF problem has, to our understanding, not been investigated thoroughly. Notably, these similarity values can be computed without solving the CF problem. Then, if we know the relation between the statistical distribution of similarity values and the GE measure, we can use this distribution to assess the potential of yielding a well-structured matrix for a CF problem. This is the major aim of this paper.
At this point, we may wonder why it is important to know the potential of yielding a well-structured matrix before solving the CF problem. First of all, it has been recognized that a CF problem is an NP-hard problem [
In contrast to metaheuristic algorithms, heuristic algorithms are easier to implement, but the quality of their solutions is often criticized [
As its third property, it is observed that a well-structured matrix can be obtained relatively easily by a heuristic approach (referred to specifically as HC in this paper), where a metaheuristic approach does not necessarily have an advantage in obtaining higher-quality solutions. Conversely, the advantage of the metaheuristic approach is observed more often in the case of ill-structured matrices. As discussed before, a well-structured matrix demonstrates sharp differences between similar and dissimilar machine pairs. This feature suits the “greedy” nature of the heuristic approach, which can easily distinguish high-similarity pairs in the progressive grouping process. In contrast, an ill-structured matrix has more machine pairs with middle-similarity values, so some borderline cases can lead to solutions of lower quality. While this third property may not be obvious, more verifying examples will be reported later in Section
Given this third property of a well-structured matrix, the statistical analysis of similarity values leads to another application, i.e., supporting the choice of the algorithmic approach for solving CF problems. If the statistical analysis shows a high potential to obtain a well-structured matrix, we can choose a heuristic approach to solve the CF problem. Conversely, if it indicates a high chance of getting an ill-structured matrix, we may consider revising the input incidence matrix (e.g., adding more machines or changing some part requirements). Also, we can prepare to use a metaheuristic approach to seek high-quality solutions. In sum, the statistical analysis can preliminarily probe the structure of a given CF problem in order to determine the next problem-solving step.
In view of the three properties of a well-structured matrix discussed above, the research and development questions are set as follows. What are the criteria, related to the statistics of similarity values, for assessing the potential of getting a well-structured matrix? How do we decide whether to use a metaheuristic or a heuristic approach for solving a CF problem?
To address the first question, this paper will utilize two statistical tools: the histogram and the Kolmogorov-Smirnov (K-S) test. Histograms will be used to analyze the distribution of machine similarity values of a given CF problem, and twenty CF solutions will be set to investigate the threshold values for informing the potential structuredness of a matrix. The K-S test will be used to assess the normality of the distribution of machine similarity values. That is, if the set of similarity values roughly follows a normal distribution, many machine pairs have near-average similarity values, implying a low proportion of high-similarity values (i.e., an ill-structured matrix).
Based on the investigation using the histogram and the K-S test, we will develop a procedure that probes the structure of a given CF matrix and suggests whether to use a metaheuristic or a heuristic for problem solving (i.e., addressing the second question). In this paper, we have implemented a genetic algorithm (GA) and hierarchical clustering (HC) as the metaheuristic and heuristic approaches, respectively, for solving the CF problems. To verify the procedure, forty additional CF matrices will be set. These CF matrices will be solved by HC and then the genetic algorithm to observe the relation between a matrix’s structuredness and the utility of the metaheuristic approach for obtaining better CF solutions.
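To illustrate the heuristic side, machines can be grouped by hierarchical clustering on the dissimilarity 1 − s(i, j). The average-linkage rule below is an assumption, since the paper's exact HC variant is not restated here:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hc_machine_groups(A, n_cells):
    """Cluster machines by average-linkage hierarchical clustering on
    the distance 1 - Jaccard similarity. A sketch only; the linkage
    rule and stopping criterion are our assumptions."""
    A = np.asarray(A, dtype=bool)
    m = A.shape[0]
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            union = np.count_nonzero(A[i] | A[j])
            inter = np.count_nonzero(A[i] & A[j])
            sim = inter / union if union else 0.0
            D[i, j] = D[j, i] = 1.0 - sim
    # condensed distance matrix -> dendrogram -> cut into n_cells groups
    Z = linkage(squareform(D), method="average")
    return fcluster(Z, t=n_cells, criterion="maxclust")
```

The greedy, bottom-up merging of the highest-similarity pairs is what the U-shape property favors: sharp similarity contrasts leave few borderline merges.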
In this study, histograms are used to report the frequency distribution of machine similarity values with an increment of 0.1. Figure
Histograms of well-structured and ill-structured matrices.
From these two histograms, it is observed that a well-structured matrix tends to yield a U-shape histogram, i.e., relatively high counts of extreme similarity values. The right peak of the U-shape can be explained by the property of a high percentage of high-similarity machine pairs discussed in Section
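The 0.1-increment binning behind such histograms can be sketched as follows (the helper name is ours):

```python
import numpy as np

def similarity_histogram(sim_values):
    """Frequency counts of similarity values with bin width 0.1:
    [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0] (the last bin is closed,
    so a similarity of exactly 1.0 is counted)."""
    counts, edges = np.histogram(sim_values, bins=np.linspace(0.0, 1.0, 11))
    return counts, edges
```

A U-shape then shows up as large counts in the first and last bins relative to the middle ones.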
Since the frequency distribution of a histogram is not altered by the order of a matrix’s rows and columns, we can set CF solution matrices with known structuredness and then observe their histograms to develop the structuredness criteria. In this investigation, twenty 30×40 solution matrices (i.e., 30 machines and 40 parts) with three cells (or blocks) are set. These matrices are varied by two factors into five configurations: Case A, Case B, Case C, Case D, and Case E.
Besides, four cases are set below to characterize the structuredness of matrices via the control of the numbers of exceptional elements and voids.
- Case I (well-structured): few exceptional elements and no voids
- Case II (well-structured): no exceptional elements and few voids
- Case III (well-structured): few exceptional elements and few voids
- Case IV (ill-structured): substantial numbers of exceptional elements and voids
The resulting 20 matrices are shown in Figure
The resulting matrices of 20 benchmark cases.
To inform the matrix’s structuredness, two conditions, serving as the U-shape criteria, are set on the regions of low- and high-similarity values. Let
Figure
Histograms of 20 benchmark matrices.
Concerning the region of low-similarity values (i.e., the left side of the U-shape), it is found that both well-structured (i.e., Cases I, II, and III) and ill-structured (i.e., Case IV) matrices have high proportions, because in both cases machines that are not in the same cell have few common parts to work with. As a result, the proportions of low-similarity values from a well-structured matrix can be statistically less discernible. Thus, we choose to investigate the extreme value where the similarity values equal zero, i.e.,
Number of machine pairs with similarity values equal to zero.
| | Case I | Case II | Case III | Case IV |
|---|---|---|---|---|
| A | 34 | 300 | 110 | 32 |
| B | 7 | 225 | 83 | 23 |
| C | 23 | 240 | 121 | 57 |
| D | 22 | 225 | 116 | 15 |
| E | 11 | 252 | 73 | 14 |
Concerning the region of high-similarity values (i.e., the right side of the U-shape), as discussed earlier, not all well-structured matrices have high proportions of high-similarity values at the rightmost region. By inspecting the histograms in Figure
Number of machine pairs with similarity values greater than or equal to 0.5.
| | Case I | Case II | Case III | Case IV |
|---|---|---|---|---|
| A | 135 | 106 | 109 | 38 |
| B | 200 | 104 | 194 | 4 |
| C | 136 | 91 | 194 | 38 |
| D | 139 | 203 | 158 | 19 |
| E | 182 | 175 | 103 | 10 |
The Kolmogorov-Smirnov (K-S) test is a type of hypothesis test in statistics (Corder and Foreman) [
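A sketch of this K-S usage, assuming the hypothesized normal CDF takes its mean and standard deviation from the sample itself (our assumption of the setup):

```python
import numpy as np
from scipy import stats

def ks_normality(sim_values):
    """K-S statistic and p-value of the similarity values against a
    normal distribution fitted by the sample mean and standard
    deviation. Note: fitting parameters from the same sample makes
    the nominal p-value approximate, which is acceptable here since
    it is used only as a relative indicator."""
    x = np.asarray(sim_values, dtype=float)
    return stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
```

A U-shaped (bimodal) set of similarity values deviates strongly from the fitted normal CDF, giving a large K-S statistic and a small p-value, whereas a single-peak set stays close to it.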
Figure
CDFs of similarity values (solid line: empirical CDF; dashed line: hypothesized normal CDF).
Panels: a single-peak histogram; a U-shape histogram; CDFs of the single-peak histogram; CDFs of the U-shape histogram.
The
Notably, the purpose of using the K-S test in this work is not hypothesis testing per se; we only use its
By knowing the property of the trend associated with
The upper bound of
In the normalization process, we can first identify the size and the number of nonzero entries of a given matrix. Let
- Number of machines: from 10 to 50 machines
- Number of parts: from 10 to 110 parts (with an increment of 10)
- Number of even-size cells: from 2 to 14 cells (also restricted by the matrix’s size to avoid extremely large and small cells)
Further details of the setup of these perfect matrices can be found in Zhu [
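Under this reading of perfect block-diagonal matrices, only cross-cell machine pairs have zero similarity, so the count of zero-similarity pairs is C(M, 2) − Σk C(mk, 2) for cell sizes mk. A sketch (the function name is ours):

```python
from math import comb

def zero_similarity_pairs_perfect(cell_sizes):
    """In a perfect block-diagonal matrix, machines in different cells
    share no parts, so exactly the cross-cell machine pairs have zero
    similarity. Returns (zero-similarity pairs, total pairs)."""
    m = sum(cell_sizes)                              # total machines M
    total = comb(m, 2)                               # C(M, 2) pairs
    within = sum(comb(k, 2) for k in cell_sizes)     # same-cell pairs
    return total - within, total
```

For three even cells of 10 machines, this gives 300 zero-similarity pairs out of 435, consistent with the Case A, Case II count reported earlier.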
The setting of the ratio threshold for
| | Case I | | | Case II | | |
|---|---|---|---|---|---|---|
| | Observed (%) | Upper bound (%) | Ratio | Observed (%) | Upper bound (%) | Ratio |
| Case A | 43.44 | 62.39 | 0.70 | 69.58 | 85.40 | 0.81 |
| Case B | 22.03 | 31.35 | 0.70 | 44.35 | 72.93 | 0.61 |
| Case C | 14.80 | 70.55 | 0.21 | 42.57 | 97.73 | 0.44 |
| Case D | 13.13 | 75.90 | 0.17 | 43.58 | 96.69 | 0.45 |
| Case E | 23.06 | 39.67 | 0.58 | 53.56 | 65.65 | 0.82 |

| | Case III | | | Case IV | | |
|---|---|---|---|---|---|---|
| | Observed (%) | Upper bound (%) | Ratio | Observed (%) | Upper bound (%) | Ratio |
| Case A | 39.67 | 76.05 | 0.52 | 10.77 | 73.53 | 0.15 |
| Case B | 26.84 | 54.38 | 0.49 | 3.77 | 72.19 | 0.05 |
| Case C | 23.06 | 88.52 | 0.26 | 7.69 | 86.15 | 0.09 |
| Case D | 18.77 | 91.49 | 0.21 | 3.13 | 83.92 | 0.04 |
| Case E | 17.56 | 67.29 | 0.26 | 1.97 | 67.14 | 0.03 |
Yet, when we examine the extreme situations, the lowest ratio of the well-structured cases is 0.17 (i.e., Case D-I, bold in Table
This section provides a four-step procedure below to assess the potential structuredness of an incidence matrix using the histogram-based U-shape criteria and the criterion based on the
Procedure to assess the potential structuredness of an incidence matrix.
By receiving an incidence matrix as an input, the similarity values of machine pairs are first determined based on (
This represents the preliminary check based on the frequencies of having high and low-similarity values. If either one of the criteria
The dataset of similarity values is treated as the input to determine the
With the values of
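Putting the four steps together, the assessment can be sketched as follows; the three threshold values are illustrative placeholders, not the calibrated values developed from the 20 benchmark matrices:

```python
import numpy as np
from scipy import stats

def assess_structuredness(sims, left_min=0.3, right_min=0.2, p_max=0.05):
    """Sketch of the four-step assessment procedure. The thresholds
    (left_min, right_min, p_max) are illustrative assumptions only.
    Returns a suggested solution approach for the CF problem."""
    x = np.asarray(sims, dtype=float)
    left = np.count_nonzero(x == 0.0) / x.size    # left arm of the U
    right = np.count_nonzero(x >= 0.5) / x.size   # right arm of the U
    # K-S p-value against a normal fitted from the sample
    p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue
    u_shape = (left >= left_min) and (right >= right_min)
    if u_shape and p <= p_max:  # U-shaped and far from normal
        return "well-structured: heuristic (HC) should suffice"
    return "ill-structured: consider a metaheuristic (GA)"
```

In this sketch, a problem is routed to HC only when both U-shape conditions hold and the K-S p-value indicates a clearly non-normal similarity distribution; otherwise, GA is prepared for.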
To examine the statistical analysis of similarity values for CF problems, another 40 matrices (in addition to the earlier 20 benchmark matrices, making a total of 60) will be generated and applied in this section. These 60 matrices will be used to examine the following two issues. First, given the three criteria for assessing the potential structuredness of a matrix, we will use these 60 matrices to examine their effectiveness in distinguishing well-structured from ill-structured matrices. Second, while Property III (i.e., the relative ease of obtaining satisfactory CF solutions) of a well-structured matrix has been discussed in Section
The strategy to generate the 60 matrices extends the setup of the 20 benchmark matrices in Section
- In addition to the 30×40 matrix size, another size of 40×100 is set.
- We add cases with more cells (from 3 to 6, 8, and 12 cells).
- The evenness of cell sizes is also varied for each case.
Table
Setup of incidence matrices.
| | Matrix size | No. of cells | Cell sizes |
|---|---|---|---|
Case A | 30×40 | 3 cells | (10×13) (10×13) (10×14) |
Case B | 30×40 | 3 cells | (20×26) (5×7) (5×7) |
Case C | 30×40 | 3 cells | (20×7) (5×26) (5×7) |
Case D | 30×40 | 3 cells | (20×5) (5×17) (5×18) |
Case E | 30×40 | 3 cells | (14×18) (14×19) (2×3) |
Case F | 30×40 | 6 cells | (5×7) (5×7) (5×7) (5×7) (5×7) (5×5) |
Case G | 30×40 | 6 cells | (15×20) (3×4) (3×4) (3×4) (3×4) (3×4) |
Case H | 30×40 | 6 cells | (15×4) (3×20) (3×4) (3×4) (3×4) (3×4) |
Case I | 30×40 | 6 cells | (15×5) (3×7) (3×7) (3×7) (3×7) (3×7) |
Case J | 40×100 | 8 cells | (5×13) (5×13) (5×13) (5×13) (5×12) (5×12) (5×12) (5×12) |
Case K | 40×100 | 8 cells | (8×8) (8×8) (4×14) (4×14) (4×14) (4×14) (4×14) (4×14) |
Case L | 40×100 | 8 cells | (7×18) (7×18) (7×18) (7×17) (6×17) (2×4) (2×4) (2×4) |
Case M | 40×100 | 12 cells | (3×8) (3×8) (3×8) (3×8) (3×8) (3×8) (3×8) (3×8) (4×9) (4×9) (4×9) (4×9) |
Case N | 40×100 | 12 cells | (5×5) (5×5) (4×5) (4×5) (3×10) (3×10) (3×10) (3×10) (3×10) (3×10) (2×10) (2×9) |
Case O | 40×100 | 12 cells | (4×12) (4×12) (4×12) (4×12) (4×11) (4×11) (4×11) (4×11) (2×2) (2×2) (2×2) (2×2) |
To evaluate the effectiveness of the criteria to assess the structuredness of the matrices, we have evaluated the criteria values for the 60 matrices. The results are provided in Table
Results of the criteria values.
| | Case I | | | Case II | | | Case III | | | Case IV | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 0.08 | | | | | | 0.25 | | | 0.08 | 0.09 | 0.15 |
B | 0.02 | | | | | | 0.19 | | | 0.05 | 0.01 | 0.05 |
C | 0.05 | | | | | | 0.28 | | | 0.13 | 0.09 | 0.09 |
D | 0.05 | | 0.17 | | | | 0.27 | | | 0.03 | 0.04 | 0.04 |
E | 0.03 | | | | | | 0.17 | | | 0.03 | 0.02 | 0.03 |
F | | 0.14 | | | 0.11 | | 0.43 | 0.10 | | 0.26 | 0.03 | 0.14 |
G | 0.18 | | | | 0.14 | | 0.26 | | | 0.18 | 0.07 | 0.14 |
H | 0.34 | 0.14 | | | | | 0.43 | 0.18 | | 0.23 | 0.03 | 0.08 |
I | 0.18 | 0.11 | | | | | 0.34 | 0.15 | | 0.19 | 0.01 | 0.06 |
J | 0.29 | 0.10 | | | 0.09 | | | 0.06 | | 0.14 | 0.00 | 0.10 |
K | 0.21 | 0.08 | | | 0.08 | | 0.17 | 0.03 | | 0.08 | 0.00 | 0.07 |
L | 0.17 | 0.13 | | | 0.12 | | 0.30 | 0.04 | | 0.10 | 0.02 | 0.16 |
M | 0.37 | 0.04 | | | 0.05 | | 0.41 | 0.02 | | 0.27 | 0.00 | 0.13 |
N | 0.40 | 0.04 | | | 0.06 | | 0.42 | 0.02 | | 0.24 | 0.00 | 0.08 |
O | 0.30 | 0.06 | | | 0.06 | | 0.48 | 0.04 | | 0.14 | 0.00 | 0.08 |
In view of the effectiveness of individual criteria, it is observed that
By comparison, the ratio criterion (i.e.,
Recall from Section
Table
Grouping efficacy results with the two-stage solution process.
| | Case I | | | Case II | | |
|---|---|---|---|---|---|---|
| | HC | HC+GA | % Improve | HC | HC+GA | % Improve |
Case A | 0.7817 | 0.7817 | 0 | 0.7550 | 0.7550 | 0 |
Case B | 0.8859 | 0.8859 | 0 | 0.6542 | 0.6542 | 0 |
Case C | 0.7587 | 0.7587 | 0 | 0.7180 | 0.7180 | 0 |
Case D | 0.7514 | 0.7514 | 0 | 0.8218 | 0.8218 | 0 |
Case E | 0.8590 | 0.8590 | 0 | 0.8302 | 0.8302 | 0 |
Case F | 0.8230 | 0.8230 | 0 | 0.7800 | 0.7800 | 0 |
Case G | 0.8511 | 0.8511 | 0 | 0.6861 | 0.6861 | 0 |
Case H | 0.7059 | 0.7059 | 0 | 0.7500 | 0.7500 | 0 |
Case I | 0.6475 | 0.6475 | 0 | 0.7444 | 0.7444 | 0 |
Case J | 0.7645 | 0.7645 | 0 | 0.7600 | 0.7600 | 0 |
Case K | 0.7106 | 0.7106 | 0 | 0.7241 | 0.7241 | 0 |
Case L | 0.7561 | 0.7561 | 0 | 0.8122 | 0.8122 | 0 |
Case M | 0.6720 | 0.6720 | 0 | 0.7798 | 0.7798 | 0 |
Case N | 0.6327 | 0.6327 | 0 | 0.8484 | 0.8484 | 0 |
Case O | 0.6644 | 0.6644 | 0 | 0.8854 | 0.8854 | 0 |
| | Case III | | | Case IV | | |
|---|---|---|---|---|---|---|
| | HC | HC+GA | % Improve | HC | HC+GA | % Improve |
Case A | 0.7627 | 0.7627 | 0 | 0.5776 | 0.5959 | 3.17% |
Case B | 0.7961 | 0.7961 | 0 | 0.4650 | 0.4713 | 1.35% |
Case C | 0.8142 | 0.8142 | 0 | 0.5842 | 0.5884 | 0.72% |
Case D | 0.7290 | 0.7290 | 0 | 0.4938 | 0.5129 | 3.87% |
Case E | 0.6898 | 0.6898 | 0 | 0.4927 | 0.5065 | 2.80% |
Case F | 0.7097 | 0.7097 | 0 | 0.4689 | 0.5240 | 11.75% |
Case G | 0.7990 | 0.7990 | 0 | 0.5711 | 0.5747 | 0.63% |
Case H | 0.6667 | 0.6667 | 0 | 0.4134 | 0.4884 | 18.14% |
Case I | 0.6481 | 0.6481 | 0 | 0.4067 | 0.4281 | 5.26% |
Case J | 0.6995 | 0.6995 | 0 | 0.3785 | 0.4483 | 18.44% |
Case K | 0.6035 | 0.6035 | 0 | 0.3819 | 0.4369 | 14.40% |
Case L | 0.6079 | 0.6094 | 0.25% | 0.4657 | 0.4898 | 5.18% |
Case M | 0.5753 | 0.5765 | 0.21% | 0.3738 | 0.4586 | 22.69% |
Case N | 0.5891 | 0.5903 | 0.20% | 0.3556 | 0.3986 | 12.09% |
Case O | 0.6673 | 0.6673 | 0 | 0.4150 | 0.4494 | 8.29% |
Figure
Percentage of solution improvement versus grouping efficacy.
This paper has explored the statistics of similarity values to investigate the structuredness of cell formation (CF) matrix solutions. Using grouping efficacy (
While the worst-case computational complexity of clustering problems (e.g., NP hardness) is well recognized, the CDNM thesis (discussed in Section
The matrix data used to support the findings of this study are included within the supplementary information file (pictorial illustrations). Other data formats (e.g., Excel file) can be available from the corresponding author upon request.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported by the NSERC Discovery Grants, Canada.
One file of supplementary materials is included with the manuscript. This file contains the images of the matrix data for the problems presented in Section