An Implicit Memory-Based Method for Supervised Pattern Recognition



Introduction
Pattern recognition methods can be divided into two categories: two-stage and end-to-end. Most traditional pattern recognition methods are two-stage: feature extraction followed by pattern classification [1]. Feature extraction reduces the number of resources required to describe a large amount of raw data. The first step is to identify the measurable quantities that make the training sets X_1, . . . , X_l distinct from each other. The measurements used for classification, such as the mean value and the standard deviation, are known as features. In general, several features constitute a feature vector. Because feature extraction is not a lossless compression approach, some information is lost, and the lost information cannot be used for pattern recognition. Therefore, how to generate features is a fundamental issue. In feature vector selection, two crucial issues are the best number of features and the classifier design [2]. For example, feature selection plays an important role in text classification [3]. The complexity of practical data makes it arduous to use two-stage methods.
Nowadays, deep neural networks can be trained end-to-end [4]. Raw data can preserve all information about the pattern. Inspired by the biological neural networks that constitute animal brains, the artificial neural network (ANN) is applied to image classification [5], speech separation [6], forest fire prediction [7], etc. However, a major drawback of neural networks is their black-box character. Explaining why a neural network makes a particular decision is formidable. The knowledge representation of a neural network is unreadable to humans [8]. The exact reason why trained deep neural networks can implement recognition remains an open question [9]. The training algorithm does not specify the way to recognize, and a causal model is formidable to build or acquire. End-to-end learning relies on data for the cognitive task [10, 11], but the data cannot tell the reasons [12].
In a supervised pattern recognition task, a set of training data (the training set) is used to train a learning procedure. A training set is a set of instances that are properly labeled by hand with the correct labels. The learning procedure attempts to recognize the instances as accurately as possible. The goal of the learning procedure is to minimize the error rate on a test set. The question arising in the recognition task is why a new data instance can be classified as a particular category. The problem of supervised pattern recognition can be stated as follows. Assume that the training set X_i only contains instances with label i, where i ∈ L ≜ {1, . . . , l}, and X_i ∩ X_j = ∅ for any i ≠ j. Given l training sets X_1, . . . , X_l, the question is how to label a new instance x.
A memory system is involved in the process of recognition. Jacoby and Kelly posit that memory can serve as both storage and a tool [13]. Memory is treated as storage in a recall: in this case, the focus is on the past, and memory is used like computer storage. Meanwhile, memory (from experience) can be used as a tool to perceive and interpret present events. Implicit memory is acquired and used unconsciously and can serve as a tool/function [14–16]. In one experiment, two groups of people are asked several times to solve a Tower of Hanoi puzzle. One group consists of amnesic patients with heavily impaired long-term memory, and the other is composed of healthy subjects. The first group shows the same improvements over time as the second one, even though some participants claim that they do not even remember seeing the puzzle before.
These findings strongly suggest that procedural memory is completely independent of declarative memory [17]. Given a game state, implicit memory is trained to output an operation; memorizing the solving steps is not necessary. When the state appears again, the input can evoke the trained operator [18].
Usually, humans implement recognition processes unconsciously, and the focus is on the currently given images. So memory works as a tool while implementing the recognition processes. The recognition process depends on a similarity comparison between the current input and the labeled instances: the more similar they are, the more likely they are in the same class and have the same label. However, the way to compare similarities without memorizing any labeled instances is not evident.
This paper proposes an implicit memory-based method for supervised pattern recognition. The method does not memorize or recall any labeled instances and belongs to neither the two-stage nor the end-to-end category. The proposed method has interpretability since similarity criteria are used in the process of recognition: a new instance is recognized as a particular class because the instance appears similar enough to the training data of that class. Compared with the k-nearest neighbors algorithm [19], the proposed method does not need to recall and iterate through the training sets. The process is consistent with the human ability of pattern recognition: people may forget most of the training instances, but they can still recognize a new instance. The Mixed National Institute of Standards and Technology (MNIST) database (general site for the MNIST database: http://yann.lecun.com/exdb/mnist) is used to verify the proposed method. The rest of this paper is organized as follows. First, a model is built to describe implicit memory. Second, with the implicit memory model, a recognition algorithm is proposed. Then an application and the analysis of experimental results are given.
Notations: |X| expresses the cardinality of the set X; {0, 1}^n is an n-dimensional space constructed by 0 and 1; d is a metric on {0, 1}^n; the distance between any point z of {0, 1}^n and any nonempty subset X of {0, 1}^n is defined as d(z, X) = inf_{x∈X} d(z, x). Given two n-dimensional vectors a and b, the element-wise product of a and b, written a∘b, is the vector c with elements given by c_i = a_i b_i; |x|_{l1} expresses the number of 1s in the binary vector x; the probability of X = x is written as Pr{X = x}; P ≜ {1, . . . , p}. An inverter (NOT circuit) can be expressed as follows: if the input variable is called Z and the output variable is called Y, then Y = Z̄; if Z = 0, then Y = 1, and if Z = 1, then Y = 0. A 2-input AND gate can be expressed in equation form as Y = Z_1 Z_2; the output Y of an AND gate is 1 only when both inputs are 1. A 2-input OR gate can be expressed as Y = Z_1 + Z_2; the output Y of an OR gate is 1 when any one or more of the inputs are 1.
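The gate conventions above can be checked exhaustively with a short sketch (the function names `NOT`, `AND`, and `OR` are ours, not the paper's):

```python
# Truth-table check of the basic gates described in the Notations paragraph:
# NOT: Y = Z'    AND: Y = Z1 Z2    OR: Y = Z1 + Z2 (logical OR)
def NOT(z):
    return 1 - z

def AND(z1, z2):
    return z1 & z2

def OR(z1, z2):
    return z1 | z2

for z in (0, 1):
    print(f"NOT({z}) = {NOT(z)}")
for z1 in (0, 1):
    for z2 in (0, 1):
        print(f"AND({z1},{z2}) = {AND(z1, z2)}   OR({z1},{z2}) = {OR(z1, z2)}")
```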

Model of Implicit Memory
In this section, a model is built to describe implicit memory. From one given input, implicit memory can give an output. Both the input and output of implicit memory are actual signals. The signals have explicit physical meaning in the real world, such as sound, light, electricity, etc. The types of the input and output signals can be the same or different. Meanwhile, the input and output signals can be measured and represented as binary vectors. The action of implicit memory can be represented as a function ψ: Z ⟶ Y. The domain Z is a set of binary vectors, and an element of Z represents an input signal of implicit memory. The codomain Y is also a set of binary vectors, and an element of Y represents an output signal of implicit memory. The set of all ordered pairs (z, y) represents the training result of implicit memory, where z is an element of Z and y is an element of Y. An example of the function is given as follows: the domain is chosen to be the set {000, 001, 100, 110}, and the codomain is the set {01, 10}. Figure 1 shows these mapping relationships.
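The example mapping ψ can be sketched as a finite function. The particular input–output pairing below is illustrative only, since the pairing in Figure 1 is not reproduced here:

```python
# The mapping psi from the example: domain {000, 001, 100, 110},
# codomain {01, 10}. The specific pairing is a hypothetical stand-in
# for the one shown in Figure 1.
psi = {
    (0, 0, 0): (0, 1),
    (0, 0, 1): (1, 0),
    (1, 0, 0): (0, 1),
    (1, 1, 0): (1, 0),
}

# A trained input evokes its output directly, with no search over the pairs.
print(psi[(0, 0, 1)])
```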
A computer can store all ordered pairs (z, y) in a database. Given an input, the computer can search the database for the output. With the database, the computer can simulate the external behaviors of implicit memory. However, this method needs to recall the ordered pairs stored in the database, and this internal process is different from the process of implicit memory. Implicit memory does not need to execute memorizing or searching. Implicit memory is similar in operation to a high-speed logic circuit: from one given input, implicit memory gives its output without a moment's hesitation.

Discrete Dynamics in Nature and Society

A logic circuit and a database have different implementations. In the preparation phase, the database stores the input-output pairs on a hard disk, whereas the logic circuit connects logic gates to implement the input-output maps. At run time, the database searches the hard disk for the given input to find an output; in the logic circuit, by contrast, the input goes through the logic gates to get the output. The complexity of practical data also makes it arduous to design logic circuits manually. However, an ANN can simulate the actions of implicit memory automatically. The capacities of the implicit memory can be represented as a set X = {(z_1, y_1), . . . , (z_n, y_n)}, where z_i and y_i are binary vectors. Without loss of generality, assume that all of z_1, . . . , z_n have the same length and all of y_1, . . . , y_n have the same length. Then the operation of an implicit memory can be expressed with a table. The table lists all allowed input vectors with the corresponding outputs, as illustrated in Table 1 for an example. The table shows the output for each allowed input. Only the first bit of the output signal is considered, since the other bits can be implemented in the same way. The table can serve as a truth table.
Truth tables are widely used to describe the operation of logic circuits. With a truth table, the sum-of-products (SOP) expression can be written. The Boolean SOP expression can be obtained from the truth table by ORing the product terms for which Y_1 = 1: Y_1 = Z_1 Z̄_2 Z̄_3 + Z_1 Z_2 Z_3. The first term in the expression is formed by ANDing the three variables Z_1, Z̄_2, and Z̄_3. The second term is formed by ANDing the three variables Z_1, Z_2, and Z_3. Logic gates can be used to implement the expression, constructed as follows: two inverters to form the Z̄_2 and Z̄_3 variables; two 3-input AND gates to form the terms Z_1 Z̄_2 Z̄_3 and Z_1 Z_2 Z_3; and one 2-input OR gate to form the final output function. Figure 2 shows the logic diagram. Therefore, a logic circuit can simulate the actions of the implicit memory.
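The SOP expression can be evaluated over all eight input combinations. Note that the complemented variables are inferred here from the two inverters in the logic diagram; the original truth table is not reproduced in the text:

```python
# Evaluate the reconstructed SOP expression Y1 = Z1*~Z2*~Z3 + Z1*Z2*Z3.
def Y1(z1, z2, z3):
    term1 = z1 & (1 - z2) & (1 - z3)   # 3-input AND using the two inverters
    term2 = z1 & z2 & z3               # 3-input AND on the plain variables
    return term1 | term2               # 2-input OR forms the output

# Print the full truth table of the expression.
for z1 in (0, 1):
    for z2 in (0, 1):
        for z3 in (0, 1):
            print(z1, z2, z3, "->", Y1(z1, z2, z3))
```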

ANN-Based Boolean Operation

A NAND gate is a universal gate because it can be used to produce the NOT, the AND, the OR, and the NOR functions [20]. With NAND gates and appropriate connections, all logic circuits can be built.
Suppose there is a sigmoid neuron with two inputs, x_1 and x_2. The sigmoid neuron has a weight for each input, w_1 and w_2, and an overall bias, b. The output of the sigmoid neuron is σ(w_1 x_1 + w_2 x_2 + b), where σ is called the sigmoid function and is defined by σ(z) = 1/(1 + e^{−z}). When w_1 = w_2 = −20 and b = 30, the sigmoid neuron is shown in Figure 3. Then the input 00 produces output 1 since σ(−20 · 0 − 20 · 0 + 30) = σ(30) ≈ 1. Similar calculations show that the inputs 01 and 10 produce output 1. But the input 11 produces output 0 since σ(−20 · 1 − 20 · 1 + 30) = σ(−10) ≈ 0. Therefore, the sigmoid neuron can implement a 2-input NAND gate.
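The NAND calculation above can be reproduced directly with the weights and bias given in the text:

```python
import math

def sigmoid(z):
    """The sigmoid function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def nand_neuron(x1, x2, w1=-20.0, w2=-20.0, b=30.0):
    """A sigmoid neuron with the weights and bias from the text."""
    return sigmoid(w1 * x1 + w2 * x2 + b)

# Rounding the sigmoid output reproduces the 2-input NAND truth table.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", round(nand_neuron(x1, x2)))
```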

Simulation of Implicit Memory

According to De Morgan's laws, a Boolean expression can be resolved into 2-input NAND and 1-input NOT operations; the SOP expression above, for example, can be rewritten using only NAND and NOT gates.

[Table 1: Example of listing the operation of an implicit memory — each allowed input vector, its output, and the product term formed by a 3-input AND gate.]
A neural network of five layers can implement expression (4), as shown in Figure 6. This method of constructing the network has generality: an ANN can be configured to execute an arbitrary map. Therefore, an ANN can simulate the actions of the implicit memory.
Sometimes a situation arises in which some input variable combinations are not allowed. Because these unallowed states never occur in an application, they can be treated as "do not care" terms. For these "do not care" terms, either a 1 or a 0 may be assigned to the output. The "do not care" terms can be used to simplify an expression. Table 2 shows that, for each "do not care" term, a χ is placed in the output. As indicated in Figure 7, when grouping the 1s on the Karnaugh map, the χs can be treated as 1s to make a larger grouping, or as 0s if they offer no advantage. The larger the group, the simpler the resulting term [20]. With the "do not care" terms, the five-layer neural network can be simplified to a connection line between Y_1 and Z_1.
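A "do not care" simplification can be checked in miniature. The specified rows below are hypothetical, since Table 2 is not reproduced in the text; the point is that a simplified candidate only needs to agree with the table on the specified rows, with the χ rows free:

```python
# Specified rows of a hypothetical truth table; every other input
# combination is a "do not care" term (chi) and may take any value.
specified = {          # (Z1, Z2, Z3) -> required Y1
    (1, 0, 0): 1,
    (1, 1, 1): 1,
    (0, 0, 0): 0,
    (0, 1, 1): 0,
}

def simplified(z1, z2, z3):
    """Candidate expression after Karnaugh grouping: Y1 = Z1."""
    return z1

# The simplification is valid if it matches every specified row.
ok = all(simplified(*z) == y for z, y in specified.items())
print("Y1 = Z1 is consistent with the specified rows:", ok)
```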
Using the ordered pair set X as a training set, the ANN can be trained to simulate the implicit memory. The optimization process of training the ANN has some properties similar to a Boolean expression simplification process.
Their purpose is to implement the required functionality with minimal resources, such as logic gates and activated neurons. The brain might have less than 1% of its neurons active at any given time [21–23]. If an input vector is not included in the training set, then the ANN can treat the input as a "do not care" term; the ANN can output either a 1 or a 0 when the input is a "do not care" term.

[Figure 6: An ANN can implement the Boolean expression. The neurons are configured to implement the NOT, the NAND, and the connecting line. In general, the neurons have more inputs; inputs whose weight is 0 do not affect the outputs and are not drawn.]

[Figure 7: Example of the use of "do not care" conditions to simplify an expression. Without the "do not care" terms the expression is larger; with them, Y_1 = Z_1, which shows the advantage of using "do not care" terms to get the simplest expression.]

Suppose a supervised learning algorithm trains an ANN. When the given input is in the training set, the output of the trained ANN can be expected. Otherwise, the output of the trained ANN cannot be expected; only by actual measurement can the output be known. The measurement process is like sampling from a statistical population. The trained ANN model can be expressed as a function f with the following properties: (i) the output of the function is a specified value when the input is in the training set, that is, if (z, y) is in the training set, then f(z) = y; (ii) otherwise, the output can be assigned by generating a sample from a statistical population.
The following section proposes a pattern recognition algorithm with the above function f.

Recognition Based on the Implicit Memory Model

The principle of recognition is based on the intuitive assumption that examples in the same class are closer/more similar [24, 25]. Therefore, a recognition algorithm is proposed to estimate the similarity via the implicit memory model.
When a function f can precisely predict any masked part of the signal x, distinguishing between the signal x and any other signal x̃ that differs from x is feasible. The following theorem describes how to recognize the signal x.
If an input z ∈ {0, 1}^n is not an evoked point of x (defined below), then the output of the function f is assigned by generating a signal from a random number generator. The generator can produce binary digits, 0 and 1, through equal-probability sampling.

Let x̃ = [x̃_1, . . . , x̃_n] be a new given signal. Theorem 1 describes how to recognize the signal x: if there exists a mask m ∈ {0, 1}^n that can make f(x̃∘(1 − m)) = x̃∘m, then Pr{x̃ = x} = 1 − (1/2^n). The proof of Theorem 1 is given in the Appendix. According to equations (7)–(9), recognizing the signal x with the function f is feasible. When a point can help the function f to retrieve the signal x, the point is called an evoked point. Let D ≜ {x∘(1 − m) : m ∈ {0, 1}^n}. There are |D| evoked points, where |D| = 2^{|x|_{l1}} = 2^{x_1 + x_2 + ··· + x_n}. If Pr{x_i = 1} = 1/2, then the expected value of |D| is 2^{n/2}. The function f can retrieve the signal x from an evoked point h = x∘(1 − m) in the current input signal, since the signal x can be expressed as h + f(h). By comparing f(h) with x̃∘m, identifying whether x̃ is the same as x or not is feasible.

In a similar way to recognizing the signal x (Theorem 1), identifying whether the new given signal x̃ is in a set X ≜ {x_1, x_2, . . . , x_p} or not is also feasible (Theorem 2). Let B_i ≜ {x_i∘(1 − m) : m ∈ {0, 1}^n, f(x_i∘(1 − m)) = x_i∘m} for i ∈ P. The cardinality bound on B_i, together with the proof of Theorem 2, is given in the Appendix. Assume that any signal in the set X does not equal any other signal in X. When a new signal x̃ appears, it is possible to identify whether x̃ is in the set X or not with a function f satisfying f(x_i∘(1 − m)) = x_i∘m for any x_i∘(1 − m) ∈ B_i, i ∈ P. If an input z ∈ {0, 1}^n is not included in B_1 ∪ · · · ∪ B_p, then the output of the function f is assigned by generating a signal from a random number generator. If there exists a mask m ∈ {0, 1}^n that can make f(x̃∘(1 − m)) = x̃∘m, then Pr{x̃ ∈ X} = 1 − (1/2^n). If x̃ ∈ X (for example, x̃ = x_j), then there exists at least one mask m ∈ {0, 1}^n that can make f(x̃∘(1 − m)) = x̃∘m, since |B_j| ≥ 1.
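As a small check of the evoked-point count, the sketch below enumerates h = x∘(1 − m) over every mask and compares the number of distinct values with 2^{|x|_{l1}} (the example signal x is arbitrary):

```python
from itertools import product

# Count the distinct evoked points h = x o (1 - m) over all 2^n masks m.
# The count should equal 2 ** (number of 1s in x), as stated in the text.
x = (1, 0, 1, 1, 0)                    # example binary signal, n = 5
n = len(x)

evoked = set()
for m in product((0, 1), repeat=n):    # every mask in {0, 1}^n
    h = tuple(xi * (1 - mi) for xi, mi in zip(x, m))
    evoked.add(h)

ones = sum(x)
print(len(evoked), 2 ** ones)          # both equal 8 for this x
```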
To identify whether a new given signal x̃ has the same label as X, the distance/similarity between x̃ and X can be used, such as d(x̃, X) = inf_{x∈X} d(x̃, x). Without memorizing any element in X, it is also possible to estimate the similarity by using the function f. Let B ≜ B_1 ∪ · · · ∪ B_p. The evoked points and the masks can be used to retrieve x_i: with the function f, x_i can be obtained as x_i∘(1 − m) + f(x_i∘(1 − m)) for any x_i∘(1 − m) ∈ B_i. The similarity between x̃ and X can then be estimated from the distances between x̃ and the retrieved signals. The process of similarity estimation is influenced by evoked points with x̃∘(1 − m) ∉ B; if x̃ has the same meaning as an element of X, then x̃ has to overcome this influence. The drawback of this estimation is that the algorithm might not traverse all elements in X: if D ∩ B_i = ∅, then there are no evoked points that can help us to retrieve x_i. However, the advantage is that intentionally recollecting all elements in X is not necessary.
By training l functions f_1, . . . , f_l to predict the masked parts of the elements in their respective instance sets, i.e., X_1, . . . , X_l, recognizing a new given signal x̃ as one category is feasible.
Let B_{i,j} ≜ {x_{i,j}∘(1 − m) : m ∈ {0, 1}^n, f_i(x_{i,j}∘(1 − m)) = x_{i,j}∘m}, where j ∈ {1, . . . , p_i}; the function f_i satisfies f_i(x_{i,j}∘(1 − m)) = x_{i,j}∘m for any x_{i,j}∘(1 − m) ∈ B_{i,j}, j ∈ {1, . . . , p_i}. If an input z ∈ {0, 1}^n is not included in B_{i,1} ∪ · · · ∪ B_{i,p_i}, then the output of the function f_i is assigned by generating a signal from a random number generator.
When a signal x̃ is to be labeled, the similarity rule is a natural choice. To reduce the influence on the analysis of similarity, replace the infimum with the average. The similarity between x̃ and X_i can be estimated by

ς_i = s_i(x̃) = (1/|M′|) Σ_{m∈M′} d(x̃∘m, f_i(x̃∘(1 − m))∘m),   (23)

where M′ is a set of masks. The smaller this value, the more similar the signals are, and the more likely they are in the same category and have the same label. The above recognition model is presented in Algorithm 1, which is called the Implicit Recognition Model.
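The averaged similarity estimate can be sketched as follows. The function f is backed by a lookup for illustration only (the paper trains an ANN instead), Hamming distance stands in for the metric d, and all names are hypothetical:

```python
import random

n = 6
stored = (1, 1, 0, 0, 1, 0)            # a single "memorized" signal

def f(h):
    """Predict the masked part from the visible part h = x o (1 - m)."""
    # If h is consistent with the stored signal, answer with the stored
    # signal (from which x o m can be read off); otherwise answer randomly,
    # mimicking property (ii) of the trained ANN.
    if all(hi == 0 or hi == si for hi, si in zip(h, stored)):
        return stored
    return tuple(random.randint(0, 1) for _ in range(n))

def similarity(x, masks):
    """Average, over the masks, the distance d(x o m, f(x o (1-m)) o m)."""
    total = 0
    for m in masks:
        h = tuple(xi * (1 - mi) for xi, mi in zip(x, m))
        pred = tuple(pi * mi for pi, mi in zip(f(h), m))
        masked = tuple(xi * mi for xi, mi in zip(x, m))
        total += sum(a != b for a, b in zip(pred, masked))  # Hamming distance
    return total / len(masks)

random.seed(0)
masks = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(30)]
print(similarity(stored, masks))       # 0.0: the stored signal matches itself
```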
In summary, the Implicit Recognition Model uses similarity comparison to do the recognition, but intentionally recalling any labeled signals is not necessary. The focus is not on the past but on the current input. The evoked points are objective features, which can help us to retrieve labeled signals from the current input. Both the evoked points and the retrieved signals have explicit physical meaning in the real world, and they are objective pieces of evidence to support the judgment.

Experiment
In the application, the Mixed National Institute of Standards and Technology (MNIST) database is used to verify the Implicit Recognition Model. The MNIST database is one of the best-known image classification benchmarks [26]. There are 60,000 instances in the training set and 10,000 instances in the testing set. The database was created by remixing the samples of NIST's original data sets. The black-and-white images from NIST were normalized to fit into a 28 × 28 pixel bounding box and antialiased, which introduces gray-scale levels [27]. The first 500 elements of the training set are used for training. All of the testing instances are used for testing.
X_i is labeled i and only contains training images with label i, where i = 0, 1, . . . , 9. Among the first 500 training images, there are p_0 = 62 training images labeled 0, p_1 = 51 training images labeled 1, and so on; |X_0| + · · · + |X_9| = 500. All 500 images are different, and X_i ∩ X_j = ∅ for any i ≠ j.
In the process of recognition, an image in the testing set is recognized as the class of highest similarity. To execute Algorithm 1, the distance function d is redefined first. Cosine distance is a usual measure of similarity in machine learning [28, 29]. Given two vectors a and b, the cosine similarity cos(θ) is represented by

cos(θ) = (Σ_i a_i b_i) / (sqrt(Σ_i a_i²) · sqrt(Σ_i b_i²)),

where a_i and b_i are the components of the vectors a and b, respectively. The angle θ ∈ [0, π] is used as the distance between two images in this application; therefore, d(a, b) = θ = arccos(cos(θ)). With the artificial neural network, the implicit memory is simulated to execute the Implicit Recognition Model (Algorithm 1). The main process of Algorithm 1 is to train 10 approximators of the functions f_0, f_1, . . . , f_9. Then, f_i can be used to retrieve images in X_i from the evoked points of the current input image x̃ and to estimate the similarity between x̃ and X_i.
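The angle distance can be computed directly from the cosine similarity; a minimal sketch (the function name is ours):

```python
import math

def angle_distance(a, b):
    """Distance between two vectors: theta = arccos(cosine similarity)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    cos_theta = max(-1.0, min(1.0, dot / (na * nb)))  # clamp rounding error
    return math.acos(cos_theta)                       # theta in [0, pi]

print(angle_distance((1, 0), (1, 0)))   # 0.0: identical direction
print(angle_distance((1, 0), (0, 1)))   # pi/2: orthogonal vectors
```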
To imitate the human recognition process, 10 ANNs are trained in two steps. The first step is to train a base neural network to complete all training images by filling in missing regions of a rectangular shape. This step does not use labels. To finally generate the specific function f_i, the second step is to add routing layers to the trained base neural network. With supervision, the routing layers are trained on the instance set X_i. The routing neural networks are used to approximate the specific functions.

Architecture of ANN.
A fully convolutional neural network is modified to be the base neural network. Refer to [30] for more details of the original fully convolutional neural network. The modified network parameters are given in Tables 3-6. Behind each convolution layer, except the last one, there is a Rectified Linear Unit (ReLU) layer. The output layer consists of a convolutional layer with a sigmoid function instead of a ReLU layer to normalize the output to the [0, 1] range. "Outputs" refers to the number of output channels of the layer (Tables 3-6). Fully connected (FC) layers refer to standard neural network layers. The output layer of the discriminator consists of a fully connected layer with a sigmoid transfer layer. The discriminator outputs the probability that an input image came from real images rather than from the completion network.
Routing layers are added to the trained base neural network to generate an approximator of the specific function f_i. Figure 8 shows the architecture of a routing neural network. Three routing layers are inserted, each in front of a network layer. A routing layer performs element-wise multiplication between the input x = [x_1, . . . , x_n] and the routing weights w = [w_1, . . . , w_n], which can be represented as x∘w = [x_1 w_1, . . . , x_n w_n].
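The routing layer's operation is a plain element-wise product; a minimal sketch:

```python
# The routing layer: element-wise product of the layer input with trainable
# routing weights, x o w = [x1*w1, ..., xn*wn].
def routing_layer(x, w):
    return [xi * wi for xi, wi in zip(x, w)]

x = [0.5, 1.0, 0.0, 2.0]
w = [1.0, 1.0, 1.0, 1.0]        # initial routing weights: all ones
print(routing_layer(x, w))      # identical to x at initialization
```

At initialization the routing layer is the identity, so the untrained routing network behaves exactly like the base network.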
The initial value of the routing weight w is set to 1 = [1, . . . , 1]. The role of the routing layers is to keep f_i(x_{i,k}∘(1 − m))∘m + x_{i,k}∘(1 − m) similar to x_{i,k}, while the distance between f_j(x_{i,k}∘(1 − m))∘m + x_{i,k}∘(1 − m) and x_{i,k} grows with training, for any j ≠ i.

Algorithm 1: Implicit Recognition Model.
Input: X_i = {x_{i,1}, x_{i,2}, . . . , x_{i,p_i}} is a training instance set and y_i is the known label of X_i; f_i, labeled y_i, is an approximator and can be trained; i ∈ L; x̃ is a testing instance.
*Cognitive Process*
(1) for each i ∈ L do
(2)   repeat
(3)     Observing a signal in X_i;
(4)     Training the prediction function f_i to predict the masked parts of the signal;
(5)   until, for each signal x_{i,k} ∈ X_i and any j ≠ i, s_i(x_{i,k}) < s_j(x_{i,k}).
(6) end for
(7) return prediction functions f_1, . . . , f_l labeled with y_1, . . . , y_l, respectively.
*Recognition Process*
(1) for each i ∈ L do
(2)   Estimating the similarity between the testing signal x̃ and the instance set X_i;
(3)   Let ς_i = s_i(x̃);
(4) end for
(5) Assigning the image x̃ to the class of its highest similarity:
(6)   l̂ = argmin_i ς_i;
(7) return y_{l̂}.
All the 500 training images (denoted X) are used to train the base neural network.
When the training of the base neural network is finished, routing layers are added to generate an approximator of the specific prediction function f_i. While the routing layers are being trained, the parameters of the base neural network remain unchanged. Ten digit labels/categories correspond to 10 approximators of prediction functions. Each prediction function f_i has a unique set of routing parameters. The training method for the routing layers is the same as for the base neural network. The routing neural network is also trained to complete images by filling in masked rectangular regions. However, only the instance set X_i is used to train the routing parameters of the approximator of f_i. The three routing layers are trained one after another: while one routing layer is being trained, the other two remain unchanged. When s_i(x_{i,k}) becomes less than s_j(x_{i,k}) for each signal x_{i,k} ∈ X_i and any j ≠ i, the cognitive process finishes. The trained routing neural network is an approximator of the specific prediction tool/function f_i.

Experimental Results.
The neural network models are created with TensorFlow 1.8.0. The learning rate of the Adam optimizer is 10^{-3}. A batch size of one image is used for training. First, the base neural network is trained for 100 iterations.
Then both the discriminator and the base neural network are jointly trained. In each iteration, all 500 training images are shuffled and traversed. For each image, a masked rectangular region is randomly selected to train the neural networks. By 1000 iterations, the base neural network can predict the masked part (Figure 9). Then, keeping the base neural network unchanged, routing layers are added. The first routing layer is trained for 260 iterations and then remains unchanged. The second routing layer is also trained for 260 iterations and then remains unchanged. The approximation of a specific prediction function finishes when the third routing layer has also been trained for 260 iterations.
A set of routing parameters corresponds to a specific prediction function. Each approximator of the functions f_0, . . . , f_9 corresponds to a routing neural network with a particular set of routing parameters. The approximator of function f_i can only work on the training set X_i; Figure 10 gives an example. The mean distance d_{i,f_i} is less than the mean distance produced by any other function f_j, j ≠ i (Figure 11). With the approximator of the prediction function f_i, the algorithm can estimate the similarity between a testing image x̃ and the instance set X_i. A subset of M′ (denoted M_r) is used to estimate the similarity; equations (23) and (25) are modified accordingly by replacing M′ with M_r. Thirty masks are randomly generated, so |M_r| = 30. The masks remain unchanged while the algorithm calculates ς_0, . . . , ς_9. Finally, the image x̃ can be recognized as the digit l̂ satisfying ς_{l̂} ≤ ς_i for any i ∈ {0, 1, . . . , 9}. The masked part is the key to the similarity comparison: if the mask's size is too small, x̃ and x̃∘(1 − m) + f(x̃∘(1 − m))∘m are mostly the same.
There is then not enough data to demonstrate the similarity. The main cause of high similarity should be the accuracy of the prediction (i.e., f(x̃∘(1 − m))∘m is similar to x̃∘m), not a small mask window (i.e., a small |m|_{l1}). An appropriate size can ensure that the estimation results are more comparable and credible. For a similarity comparison, the mask's window size cannot be too small. Meanwhile, the size cannot be too large, or the ANN becomes hard to train: the bigger the size, the harder the masked parts are to predict. Using the Implicit Recognition Model (Algorithm 1), the correct recognition probability approaches 100% on the training set (500 training images). "Digit 4" and "digit 9" are easily confused. 80.44% of the recognition results of the Implicit Recognition Model are the same as the labels, and the confusion matrix is shown in Figure 12. The experimental results show the efficiency of the proposed Implicit Recognition Model. The proposed model recognizes image x̃ as digit l̂ because x̃ is most similar to the instances in X_{l̂}. As the similarity becomes lower, the recognition precision of both the Implicit Recognition Model and the one-nearest-neighbor algorithm (Explicit Recognition Model) decreases while rotating the testing images (Figure 13). Even using the testing image x̃ itself for the rotation similarity comparison, the distance d(x̃, r(x̃, α)) still grows (Figure 14), where r(x̃, α) represents image x̃ rotated by α degrees. When the rotation angle is around 90 or 270 degrees, the distance approaches 1.22 radians, which is the mean distance between a noise image and the testing image. The pixels of a testing image take integer values between 0 and 255. The pixels of a noise image obey a discrete uniform distribution on the integers 0, 1, 2, . . . , 255.
In Figures 13 and 14, the curves show W-shapes and M-shapes because the images labeled 1 are similar to their rotated images when the rotation angle is around 180 degrees. Rotation of an image can cause similarity changes; therefore, a rotated image can also be used to define a new category if the similarity change is big enough. For example, the main difference between "W" and "M" is the direction of the opening (Figures 13 and 14). Furthermore, the proposed algorithm is robust when random noise n is added to the testing images. Suppose that n = [n_1, . . . , n_784], where each n_i is an independent random variable. Each noise point n_i obeys a discrete uniform distribution on the integers β, β + 1, β + 2, . . . , γ, where β and γ restrict the value range (0 ≤ β ≤ γ ≤ 255). The pair {β, γ} represents the noise level: the bigger the values, the higher the noise level. The testing images x̃ = [x̃_1, . . . , x̃_784] are disturbed by the noise n (Figure 15) to test the robustness of the algorithms. The image in noise is denoted as x̃_noise = [T(x̃_1 + n_1), . . . , T(x̃_784 + n_784)], where T truncates its argument to the valid pixel range [0, 255]. Using PyTorch 1.5.1, traditional neural network models are modified to do the recognition, too. Two linear layers are added after the output layer; in front of each linear layer, there is a Rectified Linear Unit (ReLU) layer. The testing images are resized and repeated along the channel dimension. Then, the stochastic gradient descent optimizer and a batch size of 32 images are used for training.
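The additive-noise disturbance can be sketched as follows. The clipping behavior of T is an assumption here, since its exact definition is not fully recoverable from the text:

```python
import random

def T(v):
    """Assumed definition: clip a pixel value back into [0, 255]."""
    return min(v, 255)

def disturb(image, beta, gamma, rng):
    """Add noise uniform on the integers beta..gamma to every pixel."""
    return [T(p + rng.randint(beta, gamma)) for p in image]

rng = random.Random(0)
image = [0, 100, 200, 255]
noisy = disturb(image, 102, 255, rng)   # the {102, 255} noise level
print(all(0 <= p <= 255 for p in noisy))  # pixels stay in the valid range
```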
By training for 1000 iterations, the traditional neural network models can work when the noise level is lower than {0, 127}. However, the performance is not stable as the noise level increases. The recognition precision of many traditional neural networks drops sharply even when the change in the noise level is not big, as shown in Figure 16. However, the performance curves of AlexNet and VGG are smooth, as shown in Figure 17. Overfitting cannot explain the reasons behind the phenomenon, since the neural networks are black boxes. A human can recognize the digits even when the noise level is {229, 255}. Therefore, the straighter the performance curve, the closer it is to human capability. When the noise level is {102, 255}, the recognition accuracy of the traditional neural networks is less than 20%, but the accuracy of the Implicit Recognition Model is about 50%. The Implicit Recognition Model improves robustness significantly.

Conclusion
Scientists have long known of the existence of implicit memory. This paper establishes a mathematical model of implicit memory and explains how an ANN can simulate the model from the viewpoint of digital logic circuit design. A trained ANN can be expressed as a function; when a given input is not in the training set, the output of the ANN is hard to control. With this function, the paper proposes a new pattern recognition method, the Implicit Recognition Model.
The Implicit Recognition Model works under the similarity rule and has interpretability. Compared to the one-nearest-neighbor algorithm (Explicit Recognition Model), the Implicit Recognition Model makes similarity comparisons without recalling any instances. The experimental results show the efficiency of the Implicit Recognition Model.