Classification of Cancer Recurrence with Alpha-Beta BAM María

Bidirectional Associative Memories (BAMs) based on the first model proposed by Kosko do not have perfect recall of the training set, and their algorithm must iterate until it reaches a stable state. In this work, we use the Alpha-Beta BAM model to automatically classify cancer recurrence in female patients with a previous breast cancer surgery. Alpha-Beta BAM presents perfect recall of all the training patterns and has a one-shot algorithm; these advantages make Alpha-Beta BAM a suitable tool for classification. We use data from the Haberman database, and the leave-one-out method was applied to analyze the performance of our model as a classifier. We obtain a classification percentage of 99.98%.


Introduction
Breast cancer is a prevalent disease worldwide and a cause of death among women. Women who have suffered from breast cancer and have overcome it are at risk of relapse; therefore, they have to be monitored after the tumor has been removed.
The prediction of recurrent cancer in women with previous surgery has high monetary and social costs; as a result, many researchers working in Artificial Intelligence (AI) have been attracted to this problem, and they have used many AI tools, among others, for breast cancer prediction. Some of these works are described as follows.
Many AI methods have shown better results than those obtained by experimental methods; for example, in 1997 Burke et al. [1] compared the accuracy of the TNM staging system with that of a multilayer backpropagation Artificial Neural Network (ANN) for predicting the 5-year survival of patients with breast carcinoma. The ANN increased the prediction capacity by 10%, obtaining a final result of 54%. They used the following parameters: tumor size, number of positive regional lymph nodes, and distant metastasis. Domingos [2] used a breast cancer database from the UCI repository to classify the survival of patients using the unification of two widely used empirical approaches: rule induction and instance-based learning.
In 2000, Boros et al. [3] used the Logical Analysis of Data method to predict the nature of the tumor: malignant or benign. The Breast Cancer Wisconsin database was used, and the classification capacity was 97.2%. This database was also used by Street and Kim [4], who combined several classifiers to create a high-scale classifier, and by Wang and Witten [5], who presented a general modeling method for optimal probability prediction over future observations and obtained a classification rate of 96.7%.
K. Huang et al. [6] constructed a classifier with the Minimax Probability Machine (MPM), which provides a worst-case bound on the probability of misclassification of future data points based on reliable estimates of the means and covariance matrices of the classes from the training data points. They used the same database utilized by Domingos. The classification capacity was 82.5%.
In other types of breast cancer diagnosis, C.-L. Huang et al. [7] employed the Support Vector Machine method to predict a breast tumor from the information of five DNA viruses.
In the last two decades, the impact of breast cancer in Mexico has increased [8]. Every year 3500 women die due to breast cancer, making it the first cause of death and the second most frequent type of tumor. Therefore, we applied Associative Models to classify cancer recurrence.
The area of Associative Memories, as a relevant part of Computing Sciences, has acquired great importance and dynamism in the activity of international research teams, specifically those who research topics related to the theory and applications of pattern recognition and image processing. Classification is a specific task of pattern recognition because its main goal is to recognize some features of patterns and assign these patterns to the corresponding class.
Associative Memories have been developed alongside Neural Networks, from the first model of the artificial neuron [9] to neural network models based on modern concepts such as mathematical morphology [10], passing through the important works of the pioneers of perceptron-based neural networks [11-13].
In 1982 Hopfield presented his associative memory; this model is inspired by physical concepts and has the particularity of using an iterative algorithm [14]. This work has great relevance because Hopfield proved that interactions of simple processing elements similar to neurons give rise to collective computational properties, such as memory stability.
However, the Hopfield model has two disadvantages: firstly, the associative memory shows a low recall capacity, $0.15n$, where $n$ is the dimension of the stored patterns; secondly, the Hopfield memory is autoassociative, which means that it is not able to associate different patterns.
In 1988, Kosko [15] developed a heteroassociative memory from two Hopfield memories to overcome the second disadvantage of the Hopfield model. The Bidirectional Associative Memory (BAM) is based on an iterative algorithm, as is the Hopfield model. Many later models were based on this algorithm and replaced the original learning rule with an exponential rule [16-18]; other models used a multiple training method and dummy addition [19] to make more pairs of patterns stable states and, at the same time, to eliminate spurious states. Linear programming techniques [20], the gradient descent method [21, 22], genetic algorithms [23], and delayed BAMs [24, 25] have been used with the same purpose. There are other models which are not based on Kosko's, so they are not iterative and have no stability problems: Morphological [26] and Feedforward [27] BAMs. All these models appeared to overcome the low recall capacity shown by the first BAM; however, none of them has been able to recover all training patterns. Besides, these models require the patterns to satisfy certain conditions involving Hamming distance, orthogonality, linear independence, and linear programming solutions, among others.
The bidirectional associative memory model used in this work is based on Alpha-Beta Associative Memories [28]; it is not an iterative process and does not have stability problems. Alpha-Beta BAM recall capacity is maximum: $2^{\min(n,m)}$, where $n$ and $m$ are the dimensions of the input and output patterns, respectively. This model always shows perfect recall without any condition. The perfect recall of Alpha-Beta BAM has a mathematical foundation [29]. It has been demonstrated that this model has a complexity of $O(n^2)$ (see Section 2.4). Its main application is pattern recognition, and it has been applied as a translator [30] and as a fingerprint identifier [31].
Because Alpha-Beta BAM shows perfect recall, it is used as a classifier in this work. We used the Haberman database, which contains data from cancer recurrence patients, because it has been included in several works to test other classification methods such as Support Vector Machines (SVMs) combined with Cholesky Factorization [32], Distance Geometry [33], the Bagging technique [34], Model-Averaging with Discrete Bayesian Networks [35], the in-group and out-group concept [36], and fuzzy ARTMAP neural networks [37]. Alpha-Beta BAM aims to surpass the previous results; we observe that none of the aforementioned works has used associative models for classification.
In Section 2 we present basic concepts of associative models along with the description of Alpha-Beta associative memories, Alpha-Beta BAM, and its complexity. Experiments and results are shown in Section 3, along with the analysis of our proposal with the leave-one-out method.

Alpha-Beta Bidirectional Associative Memories
In this section the Alpha-Beta Bidirectional Associative Memory is presented. However, since it is based on the Alpha-Beta autoassociative memories, a summary of this model will be given before presenting our model of BAM.

Basic Concepts
Basic concepts about associative memories were established three decades ago in [38-40]; nonetheless, here we use the concepts, results, and notation introduced in [28]. An associative memory $M$ is a system that relates input patterns and output patterns as follows: $x \to M \to y$, with $x$ and $y$ being the input and output pattern vectors, respectively. Each input vector forms an association with a corresponding output vector. For $k$ a positive integer, the corresponding association will be denoted as $(x^k, y^k)$. Associative memory $M$ is represented by a matrix whose $ij$th component is $m_{ij}$. Memory $M$ is generated from an a priori finite set of known associations, known as the fundamental set of associations.
If $\mu$ is an index, the fundamental set is represented as $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$, with $p$ being the cardinality of the set. The patterns that form the fundamental set are called fundamental patterns. If it holds that $x^\mu = y^\mu$ for all $\mu \in \{1, 2, \ldots, p\}$, $M$ is autoassociative; otherwise it is heteroassociative, and in this case it is possible to establish that $\exists \mu \in \{1, 2, \ldots, p\}$ for which $x^\mu \neq y^\mu$. A distorted version of a pattern $x^k$ to be recovered will be denoted as $\tilde{x}^k$.
If, when feeding a distorted version of $x^\omega$ with $\omega \in \{1, 2, \ldots, p\}$ to an associative memory $M$, the output corresponds exactly to the associated pattern $y^\omega$, we say that recall is perfect.

Alpha-Beta Associative Memories
Among the variety of associative memory models described in the scientific literature, there are two models that, because of their relevance, are important to emphasize: morphological associative memories, which were introduced by Ritter et al. [39], and Alpha-Beta associative memories. Because of their excellent characteristics, which allow them to be superior in many aspects to other models of associative memories, morphological associative memories served as the starting point for the creation and development of the Alpha-Beta associative memory.
The Alpha-Beta associative memories are of two kinds and are able to operate in two different modes. The operator $\alpha$ is useful at the learning phase, and the operator $\beta$ is the basis for the pattern recall phase. The heart of the mathematical tools used in the Alpha-Beta model is a pair of binary operators designed specifically for these memories. These operators are defined as follows: first, we define the sets $A = \{0, 1\}$ and $B = \{0, 1, 2\}$, and then the operators $\alpha$ and $\beta$ are defined in Tables 1 and 2, respectively.
The sets $A$ and $B$, the $\alpha$ and $\beta$ operators, along with the usual $\wedge$ (minimum) and $\vee$ (maximum) operators, form the algebraic system $(A, B, \alpha, \beta, \wedge, \vee)$, which is the mathematical basis for the Alpha-Beta associative memories.
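Tables 1 and 2 are not reproduced in this text. The following minimal sketch assumes the usual definitions of the two operators from the Alpha-Beta literature, $\alpha : A \times A \to B$ and $\beta : B \times A \to A$; the dictionary and function names are illustrative only.

```python
# Assumed operator tables (Tables 1 and 2 are not shown in this text).
# alpha: A x A -> B, with A = {0, 1} and B = {0, 1, 2}
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}
# beta: B x A -> A
BETA = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1, (2, 0): 1, (2, 1): 1}


def alpha(x, y):
    """Operator used in the learning phase."""
    return ALPHA[(x, y)]


def beta(x, y):
    """Operator used in the recall phase."""
    return BETA[(x, y)]


# Under these tables, beta undoes alpha: beta(alpha(x, y), y) == x.
for x in (0, 1):
    for y in (0, 1):
        assert beta(alpha(x, y), y) == x
```

With these tables, $\alpha(x, y) = x - y + 1$, and $\beta(x, y) = 1$ exactly when $x + y \ge 2$.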
Below are shown some characteristics of Alpha-Beta autoassociative memories.
(3) The memory is a square matrix, for both modes, $V$ and $\Lambda$. If $x^\mu \in A^n$, then the components $v_{ij}$ and $\lambda_{ij}$ are given by (2.1), and according to $\alpha : A \times A \to B$, we have that $v_{ij} \in B$ and $\lambda_{ij} \in B$, for all $i \in \{1, 2, \ldots, n\}$ and for all $j \in \{1, 2, \ldots, n\}$.
In the recall phase, when a pattern $x^\mu$ is presented to memories $V$ and $\Lambda$, the $i$th components of the recalled patterns are given by (2.2).
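Equations (2.1) and (2.2) are not reproduced in this text. The sketch below assumes the usual Alpha-Beta autoassociative definitions: $v_{ij} = \bigvee_{\mu} \alpha(x_i^\mu, x_j^\mu)$ and $\lambda_{ij} = \bigwedge_{\mu} \alpha(x_i^\mu, x_j^\mu)$ for learning, with recall $\bigwedge_j \beta(v_{ij}, x_j)$ for the max memory and $\bigvee_j \beta(\lambda_{ij}, x_j)$ for the min memory. Function names are hypothetical.

```python
def alpha(x, y):
    return x - y + 1               # assumed table: values in B = {0, 1, 2}


def beta(x, y):
    return 1 if x + y >= 2 else 0  # assumed table: values in A = {0, 1}


def learn(patterns):
    """Return (V, L): the max-type and min-type memories of binary patterns."""
    n = len(patterns[0])
    V = [[max(alpha(x[i], x[j]) for x in patterns) for j in range(n)]
         for i in range(n)]
    L = [[min(alpha(x[i], x[j]) for x in patterns) for j in range(n)]
         for i in range(n)]
    return V, L


def recall_max(V, x):
    # i-th component: minimum over j of beta(v_ij, x_j)
    n = len(x)
    return [min(beta(V[i][j], x[j]) for j in range(n)) for i in range(n)]


def recall_min(L, x):
    # i-th component: maximum over j of beta(lambda_ij, x_j)
    n = len(x)
    return [max(beta(L[i][j], x[j]) for j in range(n)) for i in range(n)]


patterns = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
V, L = learn(patterns)
# Every fundamental pattern is recalled exactly by both memories.
assert all(recall_max(V, x) == x and recall_min(L, x) == x for x in patterns)
```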

Alpha-Beta BAM
Generally, any bidirectional associative memory model appearing in the current scientific literature can be drawn as Figure 1 shows. A general BAM is a "black box" operating in the following way: given a pattern $x$, the associated pattern $y$ is obtained, and given the pattern $y$, the associated pattern $x$ is recalled. Besides, if we assume that $\tilde{x}$ and $\tilde{y}$ are noisy versions of $x$ and $y$, respectively, it is expected that the BAM could recover the corresponding noise-free patterns $x$ and $y$.
The model used in this paper has been named Alpha-Beta BAM since Alpha-Beta associative memories, both max and min, play a central role in the model design. However, before going into detail about the processing of an Alpha-Beta BAM, we will define the following.
In this work we will assume that Alpha-Beta associative memories have a fundamental set denoted by $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$, with $x^\mu \in A^n$ and $y^\mu \in A^m$, $A = \{0, 1\}$, $n \in \mathbb{Z}^+$, $p \in \mathbb{Z}^+$, $m \in \mathbb{Z}^+$, and $1 < p \le \min(2^n, 2^m)$. Also, it holds that all input patterns are different; that is, $x^\mu = x^\xi$ if and only if $\mu = \xi$. If for all $\mu \in \{1, 2, \ldots, p\}$ it holds that $x^\mu = y^\mu$, the Alpha-Beta memory will be autoassociative; if, on the contrary, the former affirmation is negative, that is, $\exists \mu \in \{1, 2, \ldots, p\}$ for which it holds that $x^\mu \neq y^\mu$, then the Alpha-Beta memory will be heteroassociative.
Definition 2.1 (One-Hot). Let the set $A$ be $A = \{0, 1\}$ and $p \in \mathbb{Z}^+$, $p > 1$, $k \in \mathbb{Z}^+$, such that $1 \le k \le p$. The $k$th one-hot vector of $p$ bits is defined as the vector $h^k \in A^p$ for which it holds that the $k$th component is $h^k_k = 1$ and the rest of the components are $h^k_j = 0$, for all $j \neq k$, $1 \le j \le p$.
Remark 2.2. In this definition, the value $p = 1$ is excluded since a one-hot vector of dimension 1, given its essence, has no reason to be.

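Definition 2.3 (Zero-Hot). Let the set $A$ be $A = \{0, 1\}$ and $p \in \mathbb{Z}^+$, $p > 1$, $k \in \mathbb{Z}^+$, such that $1 \le k \le p$. The $k$th zero-hot vector of $p$ bits is defined as the vector $\bar{h}^k \in A^p$ for which it holds that the $k$th component is $\bar{h}^k_k = 0$ and the rest of the components are $\bar{h}^k_j = 1$, for all $j \neq k$, $1 \le j \le p$.
Remark 2.4. In this definition, the value $p = 1$ is excluded since a zero-hot vector of dimension 1, given its essence, has no reason to be.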
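As an illustration of Definitions 2.1 and 2.3, the following short sketch builds both kinds of vectors; the function names are hypothetical, and $k$ is 1-based as in the definitions.

```python
def one_hot(k, p):
    """k-th one-hot vector of p bits: a single 1 at position k (1-based)."""
    if p <= 1 or not 1 <= k <= p:
        raise ValueError("requires p > 1 and 1 <= k <= p")
    return [1 if j == k else 0 for j in range(1, p + 1)]


def zero_hot(k, p):
    """k-th zero-hot vector of p bits: the negation of the k-th one-hot vector."""
    return [1 - bit for bit in one_hot(k, p)]


print(one_hot(2, 4))   # [0, 1, 0, 0]
print(zero_hot(2, 4))  # [1, 0, 1, 1]
```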
Definition 2.5 (Expansion vectorial transform). Let the set $A$ be $A = \{0, 1\}$ and $n \in \mathbb{Z}^+$ and $m \in \mathbb{Z}^+$. Given two arbitrary vectors $x \in A^n$ and $e \in A^m$, the expansion vectorial transform of order $m$, $\tau^e : A^n \to A^{n+m}$, is defined as $\tau^e(x, e) = X \in A^{n+m}$, a vector whose components are $X_i = x_i$ for $1 \le i \le n$ and $X_i = e_{i-n}$ for $n + 1 \le i \le n + m$.
Definition 2.6 (Contraction vectorial transform). Let the set $A$ be $A = \{0, 1\}$ and $n \in \mathbb{Z}^+$ and $m \in \mathbb{Z}^+$ such that $1 \le m < n$. Given one arbitrary vector $X \in A^{n+m}$, the contraction vectorial transform of order $m$, $\tau^c : A^{n+m} \to A^m$, is defined as $\tau^c(X, m) = c \in A^m$, a vector whose components are $c_i = X_{i+n}$ for $1 \le i \le m$.
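The two transforms amount to appending components to a vector and keeping its trailing components. A minimal sketch with hypothetical function names follows.

```python
def expand(x, e):
    """Expansion: X_i = x_i for the first n entries, then the m entries of e."""
    return list(x) + list(e)


def contract(X, m):
    """Contraction of order m: keep the last m components, c_i = X_{i+n}."""
    if not 1 <= m < len(X):
        raise ValueError("requires 1 <= m < len(X)")
    n = len(X) - m
    return list(X[n:])


X = expand([1, 0, 1], [0, 1])
print(X)               # [1, 0, 1, 0, 1]
print(contract(X, 2))  # [0, 1]
```

Contracting an expanded vector by the same order returns the appended vector $e$.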
In both directions, the model is made up of two stages, as shown in Figure 2. For simplicity, we first describe the process necessary in one direction, in order to later present the complementary direction which gives bidirectionality to the model (see Figure 3).
The function of Stage 2 is to offer a $y^k$ as output ($k = 1, \ldots, p$) given an $x^k$ as input. Now we assume that as input to Stage 2 we have one element of a set of $p$ orthonormal vectors. Recall that the Linear Associator has perfect recall when it works with orthonormal vectors. In this work we use a variation of the Linear Associator in order to obtain $y^k$, starting from a one-hot vector $h^k$ in its $k$th coordinate.
For the construction of the modified Linear Associator, its learning phase is skipped and a matrix $M$ representing the memory is built. Each column in this matrix corresponds to each output pattern $y^\mu$. In this way, when matrix $M$ is operated with a one-hot vector $h^k$, the corresponding $y^k$ will always be recalled.
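A minimal sketch of the modified Linear Associator described above: the matrix is assembled directly from the output patterns, one per column, so multiplying it by the $k$th one-hot vector selects column $k$. Function names are hypothetical.

```python
def build_linear_associator(outputs):
    """Return an m x p matrix whose k-th column is the output pattern y^k."""
    m = len(outputs[0])
    return [[y[i] for y in outputs] for i in range(m)]


def apply_associator(M, h):
    """Matrix-vector product M . h."""
    return [sum(row[k] * h[k] for k in range(len(h))) for row in M]


ys = [[1, 0], [0, 1], [1, 1]]      # p = 3 output patterns of dimension m = 2
M = build_linear_associator(ys)
print(M)                                # [[1, 0, 1], [0, 1, 1]]
print(apply_associator(M, [0, 0, 1]))   # [1, 1], the third output pattern
```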
The task of Stage 1 is the following: given an $x^k$ or a noisy version of it, $\tilde{x}^k$, the one-hot vector $h^k$ must be obtained without ambiguity and with no condition. In its learning phase, Stage 1 has the following algorithm (see (2.3)).
(5) Create the modified Linear Associator.
The recall phase is described through the following algorithm.
(1) Present, at the input to Stage 1, a vector from the fundamental set $x^\mu \in A^n$, for some index $\mu \in \{1, \ldots, p\}$.
(2) Build vector:
If $r$ is a one-hot vector, it is assured that $k = \mu$, and then $y^\mu = LAy \cdot r$. STOP.

Mathematical Problems in Engineering
Else:
(6) For $1 \le i \le p$: $w_i = u_i - 1$.
(7) Do expansion: $G = \tau^e(x^\mu, w) \in A^{n+p}$.
(8) Obtain a vector: $S = \Lambda \nabla_\beta G \in A^{n+p}$.
(9) Do contraction: $s = \tau^c(S^\mu, n) \in A^p$.
(10) If $s$ is a zero-hot vector, then it is assured that $k = \mu$ and $y^\mu = LAy \cdot \bar{s}$, where $\bar{s}$ is the negated vector of $s$. STOP.
Else:
(11) Do the operation $r \wedge \bar{s}$, where $\wedge$ is the symbol of the logical AND operator, so $y^\mu = LAy \cdot (r \wedge \bar{s})$. STOP.
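The recall procedure in the direction from $x$ to $y$ can be sketched end to end as follows. Several steps are not reproduced in this text, so the sketch fills them with assumptions: the $\alpha$ and $\beta$ tables are the usual ones, $V$ is learned from the input patterns expanded with one-hot vectors, $\Lambda$ from the input patterns expanded with zero-hot vectors, and $u$ is the sum of all one-hot vectors (a vector of ones). All names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
def alpha(x, y):
    return x - y + 1               # assumed: alpha maps A x A into B = {0, 1, 2}


def beta(x, y):
    return 1 if x + y >= 2 else 0  # assumed: beta maps B x A into A = {0, 1}


def one_hot(k, p):
    return [1 if j == k else 0 for j in range(p)]  # k is 0-based here


def learn(xs, ys):
    """Build V (max type), L (min type) and the modified Linear Associator."""
    p = len(xs)
    X = [x + one_hot(k, p) for k, x in enumerate(xs)]            # one-hot expansion
    Xbar = [x + [1 - b for b in one_hot(k, p)]                   # zero-hot expansion
            for k, x in enumerate(xs)]
    size = len(X[0])
    V = [[max(alpha(v[i], v[j]) for v in X) for j in range(size)]
         for i in range(size)]
    L = [[min(alpha(v[i], v[j]) for v in Xbar) for j in range(size)]
         for i in range(size)]
    LAy = [[y[i] for y in ys] for i in range(len(ys[0]))]        # column k holds y^k
    return V, L, LAy


def recall(x, V, L, LAy):
    p = len(LAy[0])
    size = len(x) + p
    F = x + [1] * p                                   # expansion with u (all ones)
    R = [min(beta(V[i][j], F[j]) for j in range(size)) for i in range(size)]
    r = R[-p:]                                        # contraction
    if sum(r) != 1:                                   # r is not a one-hot vector
        G = x + [0] * p                               # w_i = u_i - 1
        S = [max(beta(L[i][j], G[j]) for j in range(size)) for i in range(size)]
        s_bar = [1 - b for b in S[-p:]]               # negation of contracted s
        if sum(s_bar) == 1:                           # s is a zero-hot vector
            r = s_bar
        else:
            r = [a & b for a, b in zip(r, s_bar)]     # logical AND of r and s_bar
    return [sum(row[k] * r[k] for k in range(p)) for row in LAy]


xs = [[1, 1, 1], [0, 1, 1]]        # the second pattern is contained in the first
ys = [[0, 1, 1], [1, 0, 0]]
V, L, LAy = learn(xs, ys)
assert [recall(x, V, L, LAy) for x in xs] == ys
```

In this small example the first pattern makes the max memory return a vector that is not one-hot, so the min memory branch (steps 6 to 10) is what resolves it.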
The process in the opposite direction, which consists of presenting pattern $y^k$ ($k = 1, \ldots, p$) as input to the Alpha-Beta BAM and obtaining its corresponding $x^k$, is very similar to the one described above. The task of Stage 3 is to obtain a one-hot vector $h^k$ given a $y^k$. Stage 4 is a modified Linear Associator built in similar fashion to the one in Stage 2.

The Alpha-Beta BAM Algorithm Complexity
An algorithm is a finite set of precise instructions for the realization of a calculation or to solve a problem [41]. In general, it is accepted that an algorithm provides a satisfactory solution when it produces a correct answer and is efficient. One measure of efficiency is the time required by the computer in order to solve a problem using a given algorithm. A second measure of efficiency is the amount of memory required to implement the algorithm when the input data are of a given size.
The analysis of the time required to solve a problem of a particular size implies finding the time complexity of the algorithm. The analysis of the memory needed by the computer implies finding the space complexity of the algorithm.

Space Complexity
In order to store the $p$ input patterns $x$, a matrix is needed. This matrix will have dimensions $p \times (n + p)$. Input patterns and the added vectors, both one-hot and zero-hot, are stored in the same matrix. Since the components of $x$ take values in $\{0, 1\}$, these values can be represented by character variables, taking 1 byte each. The total amount of bytes will be $\mathrm{Bytes}_x = p(n + p)$.
A matrix is needed to store the $p$ output patterns $y$. This matrix will have dimensions $p \times (m + p)$. Output patterns and the added vectors, both one-hot and zero-hot, are stored in the same matrix. Since the components of $y$ take values in $\{0, 1\}$, these values can be represented by character variables, taking 1 byte each. The total amount of bytes will be $\mathrm{Bytes}_y = p(m + p)$.
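As a quick arithmetic check of the two pattern-storage formulas above, the sketch below evaluates $p(n + p)$ and $p(m + p)$ at one byte per component. The values of $n$ and $m$ used here are hypothetical placeholders, not figures from the experiments.

```python
def pattern_storage_bytes(p, n, m):
    """Bytes for the input and output pattern matrices, one byte per component."""
    bytes_x = p * (n + p)   # p rows, each an input pattern plus a p-bit added vector
    bytes_y = p * (m + p)   # p rows, each an output pattern plus a p-bit added vector
    return bytes_x, bytes_y


# Hypothetical sizes: p = 286 stored patterns, n = 20 input bits, m = 2 output bits.
bx, by = pattern_storage_bytes(p=286, n=20, m=2)
print(bx, by)   # 87516 82368
```

Because each row carries a $p$-bit one-hot or zero-hot vector, both totals grow quadratically with the number of stored patterns $p$.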
During the learning phase, 4 matrices are needed: two for the Alpha-Beta autoassociative memories of type max, $V_x$ and $V_y$, and two more for the Alpha-Beta autoassociative memories of type min, $\Lambda_x$ and $\Lambda_y$. $V_x$ and $\Lambda_x$ have dimensions of $(n + p) \times (n + p)$, while $V_y$ and $\Lambda_y$ have dimensions $(m + p) \times (m + p)$. Given that these matrices hold only small nonnegative integers (the values 0, 1, and 2 of the set $B$), the values of their components can be represented with character variables of 1 byte each. The total amount of bytes will be $\mathrm{Bytes}_{V_x \Lambda_x} = 2(n + p)^2$ and $\mathrm{Bytes}_{V_y \Lambda_y} = 2(m + p)^2$. A vector is used to hold the recalled one-hot vector, whose dimension is $p$. Since the components of any one-hot vector take the values 0 and 1, these values can be represented by character variables, occupying 1 byte each. The total amount of bytes will be $\mathrm{Bytes}_{vr} = p$.

Experiments and Results
The number of instances was reduced to 287 because some records appeared duplicated or, in some cases, records were associated with a same class. Of the 287 records, 209 belonged to class 1 and the remaining 78 belonged to class 2.
The implementation of Alpha-Beta BAM was carried out on a Sony VAIO laptop with a Centrino Duo processor, and the programming language was Visual C++ 6.0.
The leave-one-out method [43] was used to carry out the performance analysis of Alpha-Beta BAM classification. This method operates as follows: a sample is removed from the total set of samples and the remaining 286 samples are used as the fundamental set; that is, these samples are used to create the BAM. Once Alpha-Beta BAM had learnt, we proceeded to classify the 286 samples along with the removed sample; this means that we presented to the BAM every sample belonging to the fundamental set as well as the removed sample.
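The evaluation loop described above can be sketched as follows. Note that each run scores all samples, both the fundamental set and the removed sample. The `train` and `classify` callables are hypothetical, and the nearest-neighbour classifier is only a stand-in so that the sketch runs; it is not Alpha-Beta BAM.

```python
def leave_one_out(samples, labels, train, classify):
    """Return one accuracy per run; each run scores every sample."""
    accuracies = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]      # fundamental set for this run
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        hits = sum(classify(model, x) == y for x, y in zip(samples, labels))
        accuracies.append(hits / len(samples))
    return accuracies


# Stand-in classifier (NOT Alpha-Beta BAM): nearest neighbour by Hamming distance.
def train_nn(xs, ys):
    return list(zip(xs, ys))


def classify_nn(model, x):
    nearest = min(model, key=lambda pair: sum(a != b for a, b in zip(pair[0], x)))
    return nearest[1]


samples = [[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 0]]
labels = [1, 1, 2, 2]
print(leave_one_out(samples, labels, train_nn, classify_nn))  # [1.0, 1.0, 1.0, 1.0]
```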
The process was repeated 287 times, which corresponds to the number of records. Alpha-Beta BAM had the following behavior: in 278 runs it classified the excluded sample correctly, and in the remaining 9 runs it failed to classify it correctly. It must be emphasized that incorrect classification appears only with the excluded sample, because for all samples belonging to the fundamental set Alpha-Beta BAM shows perfect recall. Therefore, in 278 runs the classification percentage was 100%, and in the remaining 9 runs it was 99.65% (286 of 287 samples). Averaging the classification percentage over the 287 runs, we observed that Alpha-Beta BAM classification was 99.98%.
Table 3 compares the results of several classification methods: SVM-Bagging, Model-Averaging, the in-group/out-group method, the fuzzy ARTMAP neural network, and Alpha-Beta BAM. The methods presented in [24, 25] do not show classification results; they just indicate that their algorithms are used to accelerate the method performance.
Alpha-Beta BAM exceeds the other methods by 9.98%, and none of these algorithms uses an associative model.
We must mention that the Haberman database has records that are very similar to each other, and this feature could complicate the performance of some BAMs, due to their restrictions on data characteristics, for example, Hamming distance or orthogonality. However, Alpha-Beta BAM does not present these kinds of data limitations, as the obtained results show.
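The arithmetic behind the reported average can be reproduced from the counts given above (287 runs, 9 of which misclassify only the excluded sample). The last line is a derived figure not stated in the text: the share of excluded samples that were classified correctly.

```python
runs, failures = 287, 9

per_run_fail = (runs - 1) / runs     # 286 of 287 samples correct in a failing run
average = ((runs - failures) * 1.0 + failures * per_run_fail) / runs
held_out = (runs - failures) / runs  # excluded samples only: 278 of 287

print(f"{100 * per_run_fail:.2f}")   # 99.65
print(f"{100 * average:.3f}")        # 99.989, reported in the text as 99.98
print(f"{100 * held_out:.2f}")       # 96.86
```

The 99.98% figure therefore averages over fundamental-set samples as well as excluded ones, which is worth keeping in mind when comparing it with rates computed on held-out samples only.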

Conclusions
The use of bidirectional associative memories as classifiers on the Haberman database has not been reported before. In this work we use the Alpha-Beta BAM model to classify cancer recurrence.
Our model presents perfect recall of the fundamental set, in contrast with Kosko-based models or morphological BAM; this feature makes Alpha-Beta BAM a suitable tool for pattern recognition and, particularly, for classification.
We compared our results with the following methods: SVM-Bagging, Model-Averaging, the in-group/out-group method, and the fuzzy ARTMAP neural network, and we found that Alpha-Beta BAM is the best classifier when the Haberman database is used, because the classification percentage was 99.98%, which exceeds the other methods by 9.98%.
These results show that Alpha-Beta BAM not only has perfect recall but can also recall most of the records not belonging to the training patterns.
Even though the patterns are very similar to each other, Alpha-Beta BAM was able to recall many of the data, so it could perform as a good classifier. Most Kosko-based BAMs have low recall unless the patterns meet conditions on features such as Hamming distance, orthogonality, and linear independence; however, Alpha-Beta BAM does not impose any restriction on the nature of the data.
The next step in our research is to test Alpha-Beta BAM as a classifier using other databases, such as Breast Cancer Wisconsin and Breast Cancer Yugoslavia, and standard databases such as Iris Plant or MNIST; in this way we can obtain the general performance of our model. However, we have to take into account the "no free lunch" theorem, which asserts that an algorithm may be the best in one type of problem but the worst in other types. In our case, our results showed that Alpha-Beta BAM is the best classifier when the Haberman database is used.

Figure 1: General scheme of a Bidirectional Associative Memory.

Figure 3: Schematics of the process done in the direction from x to y. Here, only Stage 1 and Stage 2 are shown. Notice that h k k

Table 3: Results of the classifications with different methods using Haberman database.