Handwritten Geez Digit Recognition Using Deep Learning

Amharic is the second most widely spoken Semitic language after Arabic. In Ethiopia and neighboring countries, more than 100 million people speak Amharic. Many historical documents are written in the Geez script. Digitizing historical handwritten documents and recognizing handwritten characters is essential to preserving these valuable documents, and handwritten digit recognition is one of the tasks involved in digitizing handwritten documents from different sources. Currently, there is very little research on handwritten Geez digit recognition, and no organized dataset is publicly available to researchers. Convolutional neural networks (CNNs) are well suited to pattern recognition tasks such as handwritten document recognition because they extract features from different styles of writing. In this work, we propose a CNN model to recognize Geez digits. Deep neural networks, which have recently shown exceptional performance in numerous pattern recognition and machine learning applications, are used here to recognize handwritten Geez digits, which has not previously been attempted for Ethiopic scripts. Our dataset, which contains 51,952 images of handwritten Geez digits collected from 524 individuals, is used to train and evaluate the CNN model. The CNN significantly improves on the performance of several machine learning classification methods. Our proposed CNN model has an accuracy of 96.21% and a loss of 0.2013. Compared with earlier research on Geez handwritten digit recognition, the study attains higher recognition accuracy using the developed CNN model.


Introduction
Amharic is the only African language with its own alphabet and writing system, whereas most other African languages use the Latin or Arabic alphabet [1]. The Federal Democratic Republic of Ethiopia and some regional states use Amharic as their official working language. It is the mother tongue of over 50 million people and a second language for over 100 million people in Ethiopia [1]. Arabic is the only Semitic language with more speakers than Amharic. Amharic is also spoken by some people in neighboring countries such as Eritrea, Djibouti, and Somalia. Many historical documents written in the Geez script are found in Ethiopia. Around 80 different languages are spoken in Ethiopia, with up to 200 dialects, and some of them use the Geez alphabet as their writing system. Amharic, Geez, and Tigrinya are the most widely spoken languages in Ethiopia that use the Geez alphabet [1].
The Geez script consists of 265 characters, including 27 labialized characters (characters representing two sounds), 20 symbols for numerals, and 8 punctuation marks [2]. Our research focuses only on the Geez digits. Geez numerals have been used in Ethiopian calendars, Geez Bibles, and historical documents. Unlike the Latin numeral system, Geez has no symbol for 0. Twenty independent symbols represent the values 1-9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, and 10000, as shown in Figure 1, and all other numbers are written as combinations of those twenty symbols. Each digit symbol has a dash (horizontal line) above and below the digit character.
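The twenty numeral symbols listed above can be written down as a class table. The following is a minimal sketch, not the authors' labeling scheme: it relies on the Unicode Ethiopic block, where the code points U+1369 to U+137C encode the values 1-9, 10-90, 100, and 10000, and the class indices and the names GEEZ_CLASSES and value_of are assumptions made for illustration.

```python
# Unicode code points for the Ethiopic (Geez) numerals:
# U+1369..U+1371 encode 1-9, U+1372..U+137A encode 10-90,
# U+137B encodes 100, and U+137C encodes 10000.
GEEZ_VALUES = list(range(1, 10)) + list(range(10, 100, 10)) + [100, 10000]
GEEZ_SYMBOLS = [chr(cp) for cp in range(0x1369, 0x137D)]

# Hypothetical label scheme: class index -> (symbol, numeric value).
GEEZ_CLASSES = {
    index: (symbol, value)
    for index, (symbol, value) in enumerate(zip(GEEZ_SYMBOLS, GEEZ_VALUES))
}


def value_of(symbol: str) -> int:
    """Return the numeric value of a single Geez numeral symbol."""
    return GEEZ_VALUES[GEEZ_SYMBOLS.index(symbol)]


if __name__ == "__main__":
    for index, (symbol, value) in GEEZ_CLASSES.items():
        print(index, symbol, value)
```

A table of this kind gives a classifier with 20 output classes a fixed mapping between class indices and numeral values.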
Handwritten character and digit recognition has been studied in different languages to improve recognition efficiency when digitizing historical and handwritten documents [4]. Digit recognition is a well-known problem that has been used for document indexing based on dates such as the document date, birth date, marriage date, and death date [5].
Digit recognition and detection have been used in a variety of applications, including automated reading of numbers on bank cheques, postal numbers and codes, tax forms, and document indexing based on dates [6]. There are two strategies for recognizing handwritten digit strings: segmentation-based and detection-free recognition [7]. In a segmentation-based system, the numerical string, which may contain multiple digits, is detected first, and it is then split to isolate each digit before recognition [8,9]. A detection-free approach, in contrast, recognizes each digit without any detection or splitting preprocessing [10].
Random forest, SVM, KNN, and other machine learning techniques have been used to recognize handwritten digits. Deep learning methods such as CNNs achieve the highest accuracy when compared with the most commonly used machine learning algorithms for handwritten digit recognition [11,12]. CNNs are used for both pattern recognition and large-scale image classification. Handwritten character recognition is a research field within computer vision, artificial intelligence, and pattern recognition [1]. A computer application that performs handwriting recognition can acquire and recognize characters in photographs, paper documents, and other sources and convert them to electronic or machine-encoded form. Deep learning is a popular field of machine learning that uses hierarchical structures to learn high-level abstractions from data. According to [13,14], the factors behind deep learning's success include the availability of hardware such as CPUs, GPUs, and hard drives, machine learning algorithms, and large datasets such as the MNIST handwritten digit dataset and ImageNet. Deep learning is applied in areas such as handwritten digit recognition, facial recognition, computer vision, audio and visual signal analysis, voice recognition, disaster recognition, and automated language processing [15].
Deep learning is becoming a popular technique for extracting and recognizing patterns, including deeply nested ones, in a given dataset. A wide range of libraries is available for extracting patterns from images and classifying them. Among deep learning algorithms, the CNN is efficient and performs well at image classification, image recognition, pattern recognition, and feature extraction.

Related Works
Kusetogullari et al. [5] introduced a deep learning architecture known as DIGITNET to detect and recognize English handwritten digits found in historical documents in Sweden. The authors also created a large-scale public handwritten digit dataset known as DIDA. The data were collected from Swedish handwritten historical documents written by different priests in the nineteenth century, and the dataset consists of 100,000 handwritten digit images. DIGITNET consists of two architectures: DIGITNET-dect, which detects digit strings in handwritten documents, and DIGITNET-rec, which recognizes the handwritten digits. The authors used a deep learning approach to train both models and used regression-based deep CNN methods to detect the digits. The authors used a YOLOv3-based design to detect and classify digits in an image. In the recognition phase, the authors proposed three different CNN architectures, each of which includes convolutional, batch normalization, max-pooling, fully connected, and SoftMax layers. The work has some limitations: some of the image data have a high resolution, which increases the computational cost of training the model, and some digits are not labeled because of their poor appearance. Low digit detection accuracy caused by negative sampling is a further limitation.
Chen et al. [16] compared five machine learning classification models for offline handwritten digit recognition: KNN, neural network, random forest, decision tree, and bagging with gradient boost. A total of 70,000 digit images were used to develop the classifier models. The KNN and the neural network show better accuracy than the other classifiers, and the KNN is 10 times faster than the neural network model. Preprocessing is a crucial part of a handwritten recognition system, and the authors used several preprocessing techniques to enhance the data. They used normalization to give equal weight to each attribute and then a median filter for noise reduction. Image sharpening and image attribute reduction are the other preprocessing steps. The work has some limitations: the bewilder tool is not effective for preprocessing handwritten image data, the authors did not find a threshold value for binarization and therefore skipped it, and the image is blurred after the median filter and sharpening steps.
Beyene [3] proposed a multilayer feed-forward ANN for offline recognition of handwritten and machine-printed Amharic (Geez) numbers. The author collected only 560 samples for the model, using 460 for training and 100 for testing. The data were collected manually because there is no public dataset of handwritten Geez digits. The overall classification accuracy is 89.88%, which is poor because a very small amount of data was used to develop the model [3]. Little research has been carried out in the specific area of handwritten digits of the ancient Semitic language Geez. Some other researchers have worked on recognition of all Geez characters, but the author of [3] focused specifically on Geez digits. The work has some limitations: a small amount of data is used to train the algorithm, no information is given about the preprocessing technique, and the accuracy of the proposed model is low.
Applied Computational Intelligence and Soft Computing
Hossain and Ali [17] proposed handwritten digit recognition using a CNN on the MNIST handwritten digit dataset. The authors used MatConvNet to speed up building the proposed model. MatConvNet is a MATLAB toolbox that supports efficient computation on CPU and GPU, allowing complex models to be trained on large datasets such as ImageNet ILSVRC. However, the work gives no information about the preprocessing technique, and the number of hidden convolution layers in the proposed model is small.
Demilew and Sekeroglu [1] proposed an ancient Geez script recognition model based on deep learning. The authors developed a deep CNN model to recognize ancient Ethiopian Geez characters found in historical documents. Their architecture recognizes only Geez characters, not words or full sentences. The dataset contains a total of 22,913 images collected from libraries, private books, and the Ethiopian Orthodox Tewahedo Church. Their recognition system recognizes only twenty-six base characters. The Geez script has around 265 characters and 34 base characters, but the authors classified each character into its base character class rather than as its specific character. Each base class contains 7 characters, including the base character itself. One of the challenges in recognizing handwritten Geez script is the similarity between characters in the same base class; by assigning all seven characters of a class to one base class, the authors left this difficult task out of their model, which is a limitation of the work. In addition, the image quality is low, the number of instances is not balanced across characters, and the work does not mention the methods used for character detection.
Gondere et al. [2] designed a handwritten Geez character recognition system using a CNN. The authors used multitask learning to enhance the model by exploiting the relationships between characters. They ran the experiments with the following CNN hyperparameters: a batch size of 100, a dropout keep probability of 0.3, a learning rate of 0.0001, and an L2 regularization factor of 0.01. They organized a dataset from different previous research works. However, the work has some limitations: the authors used a unique handwritten dataset that affected the performance of the models, and the work does not mention the preprocessing technique.
Ali et al. [18] proposed a model to recognize handwritten digits. The authors developed the model using a CNN implemented with deeplearning4j. The CNN performs two main tasks. The first is feature extraction: each layer takes the output of the previous layer as input and forwards its own output to the next layer. The second is feature classification, in which a classification unit generates the predicted output. The authors used the MNIST dataset, with 60,000 handwritten digit images for training and testing the model. The work has some limitations: the proposed model uses a large kernel size in the convolution layer and therefore takes longer to train, and the work gives no detailed information about the preprocessing technique.
Most researchers have worked on digit recognition for English numerals and have achieved high performance using different methods. For English handwritten digits, many resources and datasets are readily available to the research community, which encourages researchers to focus on that area. For Geez handwritten digits, however, no organized public data are available for researchers working on handwritten digit recognition. Some researchers have worked on Geez character recognition for machine-printed and handwritten characters, but they did not focus on digits, especially handwritten ones. The author of [3] is the first researcher to work on recognizing handwritten Geez digits, but the dataset he used was very small and the resulting performance was low.

Data Collection Method
For this study, handwritten data were collected from a variety of people with various writing styles. Instead of manual feature extraction, which is difficult for humans to do, deep learning models are used, which extract features efficiently and with high accuracy. A data-gathering form was created for this purpose and designed to make preprocessing easier. The form is an A4 sheet that contains a box with the symbols of all 20 Geez numerals in 2 rows and 10 columns, followed by empty boxes of the same size repeated 5 times, as shown in Figure 2. Each individual therefore handwrites 100 instances of digits. The data were collected from 524 different individuals, each of whom gave 100 instances, so 52,400 instances are nominally obtained. People from many demographic groups participated in the data collection: elementary pupils, high school students, high school staff members, university students, and university academic staff (lecturers). The largest group was university students, roughly 250 of them, at Adama Science and Technology University.
The data collection at the university was conducted with the help of members of the Computer Science and Engineering Club ASTU (CSEC-ASTU). The club had 100 members at the time of data collection; the data were gathered from them and through their connections on campus. Of the 250 university students who provided data, 150 are male and 100 are female. After collection, the data must be converted from paper to digital format before they can be processed. The forms were scanned using a TECNO mobile phone with a 50-megapixel camera and the CamScanner app. The advantage of CamScanner is that it detects the paper and outputs only the paper region as an image, removing the background and reducing noise.
Python's OpenCV library was used to extract the data during preprocessing. The extraction program takes one partition of the form as input and outputs the digit images extracted from it. Once the program works for one partition, the same procedure is applied to the others.
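The text does not describe how a partition is split into digits, so the following is only a sketch of one plausible approach. It assumes that a partition is an image of one box laid out as a uniform grid of 2 rows and 10 columns, and it represents the image as a plain list of pixel rows; the function names grid_cells and crop and the margin parameter are hypothetical. With OpenCV, where images are NumPy arrays, the same crop would be the slice img[y0:y1, x0:x1].

```python
def grid_cells(width, height, rows=2, cols=10, margin=0):
    """Return (row, col, x0, y0, x1, y1) boxes for a uniform rows x cols grid.

    The margin trims each cell on every side so that printed box borders
    are left out of the crop (an assumption about the form layout).
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            x0 = c * width // cols + margin
            x1 = (c + 1) * width // cols - margin
            y0 = r * height // rows + margin
            y1 = (r + 1) * height // rows - margin
            boxes.append((r, c, x0, y0, x1, y1))
    return boxes


def crop(image, box):
    """Crop a 2-D image stored as a list of pixel rows."""
    _, _, x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


if __name__ == "__main__":
    # A blank 200 x 40 partition stands in for a scanned box.
    partition = [[255] * 200 for _ in range(40)]
    cells = [crop(partition, box) for box in grid_cells(200, 40, margin=2)]
    print(len(cells), len(cells[0]), len(cells[0][0]))
```

Because row r, column c of the grid corresponds to a fixed numeral on the form, the grid position can also serve as the class label of each cropped cell.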

Data Preprocessing
The second phase of the proposed model is preprocessing, which takes place after the digital image has been produced. The digitized image is first checked for skew and then preprocessed to reduce noise. Preprocessing is necessary to create data that are easy for handwritten digit recognition systems to recognize; the goal is to reduce background noise, enhance the image's region of interest, and produce a clear distinction between foreground and background. The study uses the Python OpenCV library for preprocessing.

Resize Image.
Because the images come in a range of sizes, they must be resized to fit the network's input size. All images are resized to 32 × 32 pixels in this work. This scaling is important for reducing computational complexity and for concentrating on the region of interest by cropping it.
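As an illustration of the resizing step, the sketch below scales a grayscale image to 32 × 32 pixels with nearest-neighbor sampling. It is not the authors' code: the interpolation method is an assumption, since the text does not name one, and the function name resize_nearest is hypothetical. In OpenCV, the cv2.resize function performs this operation with a choice of interpolation methods.

```python
def resize_nearest(image, out_h=32, out_w=32):
    """Resize a 2-D image (list of pixel rows) by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


if __name__ == "__main__":
    # A synthetic 64 x 48 gradient stands in for a cropped digit image.
    image = [[(x + y) % 256 for x in range(48)] for y in range(64)]
    small = resize_nearest(image)
    print(len(small), len(small[0]))
```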

RGB to Grayscale Conversion.
The simplest color model is grayscale, which specifies colors using only one component: lightness. The amount of brightness is defined by a value ranging from 0 (black) to 255 (white). All the original images in the dataset are in RGB color format. Converting RGB to grayscale reduces the three color channels to one, which reduces the computational complexity compared with RGB color images. Because the proposed model takes grayscale input, the original images are converted to the grayscale format.
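The conversion can be sketched per pixel as a weighted sum of the three channels. The weights below (0.299, 0.587, 0.114) are the ITU-R BT.601 luma coefficients, which OpenCV's cv2.cvtColor documents for its RGB-to-gray conversion; the text does not state which weights the authors used, so this is an assumption, and the function names are hypothetical.

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel with 0-255 channels to a 0-255 gray level."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def image_to_gray(image):
    """Convert an image stored as rows of (R, G, B) tuples to grayscale."""
    return [[rgb_to_gray(pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    row = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
    print(image_to_gray([row]))
```

Each pixel then stores one value instead of three, which is where the reduction in input size comes from.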

Color Inversion.
The dominant color of the original image is white, which has a value of 255. For models trained on grayscale images, changing the dominant color to black is preferable because black pixels have a value of 0, so most of the inputs to the convolution operations are zero, which simplifies the arithmetic. Figure 3 shows the preprocessing techniques used on our dataset. As shown in Figure 3(d), the dominant part of the image is the background. In the color inversion step, the background is converted from white to black, as shown in Figure 4.
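For 8-bit grayscale images, color inversion amounts to replacing each pixel value p with 255 - p. The sketch below shows this on a plain list of pixel rows; the function names are hypothetical, and the tiny image is illustrative data rather than a sample from the dataset. For 8-bit images in OpenCV, cv2.bitwise_not gives the same result.

```python
def invert(image):
    """Invert an 8-bit grayscale image stored as a list of pixel rows."""
    return [[255 - p for p in row] for row in image]


def zero_fraction(image):
    """Return the fraction of pixels that are exactly 0 (black)."""
    pixels = [p for row in image for p in row]
    return sum(1 for p in pixels if p == 0) / len(pixels)


if __name__ == "__main__":
    # White background (255) with a short dark stroke (0).
    image = [[255, 255, 255, 255], [255, 0, 0, 255]]
    print(zero_fraction(image), zero_fraction(invert(image)))
```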

Proposed Model
The convolutional neural network (CNN) is the proposed model for Geez handwritten digit recognition. To recognize the digits, a CNN-based digit classifier is used. Six different CNN-based handwritten digit classifiers are built, each consisting of a number of layers such as convolutional, max-pooling, dropout, flatten, fully connected, and SoftMax layers, to achieve high recognition accuracy. Training was performed using backpropagation with stochastic gradient descent.
Finally, the best model for recognizing digits is chosen based on the evaluation metrics. Each classifier is constructed with a different number of convolutional layers, kernel sizes, and filters. The parameters applied in all six classifiers are summarized in Table 1. Model 6, for example, shown in Figure 5, has 8 convolutional layers, 4 max-pooling layers, 3 dropout layers, 2 fully connected layers, and 20 output classes. The kernel size, stride, and number of filters in the first convolutional layer are 3 × 3, 1, and 32 (3 × 3@1@32), respectively. The second and third convolutional layers are identical to the first. After these three convolutional layers, a max-pooling layer (2 × 2@2@32) is applied. The fifth layer is a convolutional layer (3 × 3@1@64) with 64 filters, a kernel size of 3 × 3, and a stride of 1. The following two layers are convolutional layers with the same hyperparameters as the fifth layer. A max-pooling layer (2 × 2@2@64) is applied as the eighth layer, followed by dropout. A convolutional layer (3 × 3@1@64) with 64 filters, a kernel size of 3 × 3, and a stride of 1 is applied next, and a max-pooling layer (2 × 2@2@64) is the next hidden layer.
After this max-pooling layer, dropout is applied. A convolutional layer (3 × 3@1@128) with 128 filters, a kernel size of 3 × 3, and a stride of 1 is applied next. A max-pooling layer with a dropout layer is used before the fully connected layers, the first of which consists of 128 nodes. ReLU is used as the activation function in the convolutional and fully connected layers. SoftMax is used as the last layer to compute the probabilities of the output classes, and the class with the highest probability is the predicted digit. The model is trained for 30 epochs with a batch size of 32. The other five classifiers have varying numbers of convolutional and fully connected layers, as well as different layer organizations. In all cases, the first fully connected layer contains 128 neurons and the second contains 20 neurons.
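The layer sequence of Model 6 can be checked by tracing the tensor shape through the network. The sketch below does this in plain Python rather than in a deep learning framework. It rests on assumptions that the text does not state: a 32 × 32 single-channel input, convolutions with "same" padding (with unpadded 3 × 3 convolutions, a 32 × 32 input would shrink to nothing before the fourth pooling layer), and bias terms in every convolutional and fully connected layer. The names MODEL_6 and trace are hypothetical.

```python
# Model 6 as described in the text: 8 conv, 4 max-pool, 3 dropout, 2 dense.
MODEL_6 = [
    ("conv", 32), ("conv", 32), ("conv", 32), ("pool",),
    ("conv", 64), ("conv", 64), ("conv", 64), ("pool",), ("dropout",),
    ("conv", 64), ("pool",), ("dropout",),
    ("conv", 128), ("pool",), ("dropout",),
    ("flatten",), ("dense", 128), ("dense", 20),
]


def trace(layers, height=32, width=32, channels=1, kernel=3):
    """Return (rows, total_params); each row is (layer, output shape, params).

    ASSUMPTIONS: "same" padding and stride 1 for convolutions (spatial size
    unchanged), 2 x 2 pooling with stride 2, and a bias per filter or unit.
    """
    rows, total, features = [], 0, None
    for layer in layers:
        kind, params = layer[0], 0
        if kind == "conv":
            filters = layer[1]
            params = kernel * kernel * channels * filters + filters
            channels = filters
        elif kind == "pool":
            height, width = height // 2, width // 2
        elif kind == "flatten":
            features = height * width * channels
        elif kind == "dense":
            units = layer[1]
            params = features * units + units
            features = units
        # Dropout changes neither the shape nor the parameter count.
        shape = (features,) if features is not None else (height, width, channels)
        rows.append((kind, shape, params))
        total += params
    return rows, total


if __name__ == "__main__":
    rows, total = trace(MODEL_6)
    for kind, shape, params in rows:
        print(f"{kind:8s} {str(shape):16s} {params}")
    print("total trainable parameters:", total)
```

Under these assumptions, the four pooling layers reduce the 32 × 32 input to 2 × 2, the flatten layer yields 512 features, and the network has 290,196 trainable parameters. With different padding or input channels the numbers would differ.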

Result and Discussion
The CNN classifiers are compared to observe differences in accuracy across the handwritten Geez digit models. Training and validation accuracy were measured over 30 epochs for various combinations of hidden convolution layers, using a batch size of 32 in all cases. Figures 6, 7, 8, 9, 10, and 11 illustrate the accuracy of the CNN, and Figures 12, 13, 14, 15, 16, and 17 show the loss of the CNN for the various convolution and hidden layer combinations. Table 1 shows the maximum and minimum training and validation accuracies of the CNN determined experimentally for the six cases with different hidden layers, and Table 2 shows the maximum and minimum training and validation losses of the CNN in the various cases for the recognition of Geez handwritten digits. Table 3 describes the CNN configuration and parameters for the six cases. The models have varying numbers of convolutional and fully connected layers, as well as different layer organizations. In all cases, the first fully connected layer contains 128 neurons and the second contains 20 neurons.
In the first case, presented in Figures 6 and 12, the first hidden layer is convolutional layer 1, which is used for feature extraction. It has 32 filters with a kernel size of 3 × 3 pixels and uses ReLU as the activation function. The next hidden layer is convolutional layer 2, which consists of 32 filters with a kernel size of 3 × 3 pixels and ReLU. To reduce the spatial size of the convolution output, pooling layer 1 is defined with max-pooling and a pool size of 2 × 2 pixels. The next layers are two convolutional layers with 64 filters each, a kernel size of 3 × 3 pixels, and the ReLU activation function. Max-pooling layer 2 is applied after convolution layer 4. After pooling layer 2, a dropout regularization layer is used to reduce overfitting.
Figures 7 and 13 correspond to case 2, where the first hidden layer is convolutional layer 1, which is used for feature extraction. It has 32 filters with a kernel size of 3 × 3 pixels and uses ReLU as the activation function. The next hidden layer is convolutional layer 2, which consists of 32 filters with a kernel size of 3 × 3 pixels and ReLU. To reduce the spatial size of the convolution output, pooling layer 1 is defined with max-pooling and a pool size of 2 × 2 pixels. The next layers are two convolutional layers with 32 filters each, a kernel size of 3 × 3 pixels, and the ReLU activation function. Max-pooling layer 2 is applied after convolution layer 4. The next two hidden layers are convolution layers with 64 filters and a kernel size of 3 × 3 pixels, followed by max-pooling and dropout layers. The next two layers are convolution layers with 64 channels, followed by a max-pooling layer. The next hidden layer is convolution layer 9, with 128 filters and a 3 × 3 kernel size, followed by a max-pooling layer with dropout. Rectified linear units (ReLU) are used as the activation function in all convolution layers. The dimensions and hyperparameters used in this and the following cases are the same as those used in case 1. The overall test accuracy is 94.71%. The minimum training and validation accuracies occur at epoch 1 and are 85.01% and 89.00%, respectively. The highest training accuracy occurs at epoch 28 and the highest validation accuracy at epoch 20; the maximum training and validation accuracies are 98.74% and 94.99%, respectively. The total model loss is approximately 0.2928.
In case 3, shown in Figures 8 and 14, two convolution layers with 32 filters and a kernel size of 3 × 3 are applied one after the other, followed by a max-pooling layer. Two further convolution layers with the same parameters as the first two are applied before a max-pooling layer and a dropout layer. The next layers are three consecutive convolution layers with 64 filter channels and a 3 × 3 kernel size, followed by a max-pooling layer. Before the flatten layer, two convolutional layers, a max-pooling layer, and a dropout layer are applied. The two convolution layers have 64 and 128 kernel channels, respectively, and both have a kernel size of 3 × 3. The flatten layer is followed by the two fully connected layers.
The overall test accuracy is 94.98%. At epoch 1, the minimum training accuracy is 85.96% and the minimum validation accuracy is 89.63%. The maximum training and validation accuracies are 98.63% and 95.28%, found at epochs 26 and 20, respectively. The total model loss is approximately 0.2908.
For case 4, shown in Figures 9 and 15, three consecutive convolution layers are applied one after the other, each with 32 channels and a kernel size of 3 × 3. A max-pooling layer is applied after the three convolutional layers. The max-pooling layer is followed by three convolution layers with 64 kernel channels and a 3 × 3 kernel size, which are followed by a max-pooling layer with a [...]
Case 5 is shown in Figures 10 and 16. In this case, three consecutive convolution layers are applied one after the other, each with 32 kernel channels and a kernel size of 3 × 3. A max-pooling layer is applied after the three convolutional layers. After the pooling layer, a dropout regularization layer is used to reduce overfitting by randomly eliminating 20% of the neurons in the layer. The next layers are three convolution layers followed by a max-pooling layer and a dropout layer. The flatten layer is followed by the two fully connected layers.
The overall test accuracy is 94.42%. At epoch 1, the minimum training accuracy is 87.41% and the minimum validation accuracy is 89.90%. The highest training accuracy occurs at epoch 29 and the highest validation accuracy at epoch 27; the maximum training and validation accuracies are 99.77% and 94.84%, respectively. The total test loss of the model is 0.5504. The validation loss increases as training proceeds, which shows that the model overfits the training data. Of the six cases, this case has the highest model loss, and the lowest model accuracy also occurs in case 5. This shows that overfitted models give a high loss and low accuracy on a new test dataset.
Finally, in case 6 (Figures 11 and 17), three convolution layers with 32 kernel channels are applied one after the other, followed by a pooling layer. Three convolution layers with 64 kernel channels are next, followed by a max-pooling layer. After pooling layer 2, a dropout regularization layer is applied to reduce overfitting by randomly eliminating 20% of the neurons in the layer. Convolutional layer 7, which has 64 kernel channels, is the next hidden layer, followed by a max-pooling layer and a dropout layer. The next layer is convolution layer 8, with 128 channels and a kernel size of 3 × 3. All convolution layers have the same kernel size. Max-pooling layer 4 with dropout is applied after convolution layer 8, followed by the flatten layer and two fully connected layers. The overall test accuracy is 96.21%. At epoch 1, the minimum training and validation accuracies were found to be 88.77% [...]
The training loss decreases as the number of epochs increases, whereas the validation loss fluctuates for 10 epochs and then remains constant for the remaining epochs.
By varying the hidden layers, the changes in accuracy for handwritten digits were observed over 30 epochs in the experiment. Accuracy curves for the six cases were generated using the handwritten Geez digit dataset. The six cases behave differently because of their different combinations of hidden layers. The maximum and minimum accuracies for the hidden layer variations were recorded using a batch size of 32. As shown in Figure 18, the highest test accuracy among all the observations is 96.21% over 30 epochs, obtained in case 6 (Conv1, Conv2, Conv3, pool1, Conv4, Conv5, Conv6, pool2 with dropout, Conv7, pool3 with dropout, Conv8, pool4 with dropout, flatten layer, 2 fully connected layers).
Accuracy at this level makes Geez handwritten digit recognition by machine more effective. The lowest test accuracy among all observations, 94.42%, is obtained in case 5 (Conv1, Conv2, Conv3, pool1, Conv4, Conv5, Conv6, pool2, flatten layer, and 2 fully connected layers). Furthermore, the highest total model loss, 0.5504, occurs in case 5, while the lowest, around 0.2013, occurs in case 6 with dropout (Figure 19). With this minimal loss, the CNN is better able to cope with variations in image quality and noise. From the observed results, the study chooses the model with the highest test accuracy and the lowest test loss among the six cases. The case 6 model, with the highest accuracy of 96.21% and the lowest loss of 0.2013, is therefore the proposed model for this research work.
The previous work on Geez handwritten digit recognition is that of the author of [3], who achieved 89.88% accuracy using an ANN model. This study evaluates CNN models with different layers and different hyperparameters. Compared with the previous work, the study improves the recognition accuracy from 89.88% to 96.21% by using a CNN, increasing the dataset size, and enhancing the quality of the images through preprocessing of the dataset.
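The selection rule described above (highest test accuracy, lowest test loss) can be sketched as follows. Only the cases whose test accuracy and loss are stated in the text of this section are included; the values for the remaining cases are not reproduced here. The tie-breaking order and the names RESULTS and best_case are assumptions made for illustration.

```python
# (test accuracy in %, test loss) for the cases reported in the text.
RESULTS = {
    "case 2": (94.71, 0.2928),
    "case 3": (94.98, 0.2908),
    "case 5": (94.42, 0.5504),
    "case 6": (96.21, 0.2013),
}


def best_case(results):
    """Pick the case with the highest accuracy, breaking ties by lowest loss."""
    return max(results, key=lambda name: (results[name][0], -results[name][1]))


if __name__ == "__main__":
    name = best_case(RESULTS)
    accuracy, loss = RESULTS[name]
    print(f"{name}: accuracy {accuracy}%, loss {loss}")
```

For these values the two criteria agree: the case with the highest accuracy is also the one with the lowest loss.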

Conclusion and Future Scope
In this research work, a convolutional neural network was used to recognize Geez handwritten digits in 20 digit classes. CNNs are the current state-of-the-art algorithm for classifying image data and are widely used. A large number of Geez handwritten digits were collected from individuals on a form prepared for data collection. The handwritten forms were scanned and preprocessed to obtain 32 × 32-pixel digit images. The study offers a new public dataset of Geez handwritten digits, which is open to all researchers. Among deep learning approaches, a CNN architecture was used to develop a Geez handwritten digit recognition system, and the network configuration was tuned through extensive trial and error to find the best-fitting CNN-based architecture. Compared with earlier research on Geez handwritten digit recognition, the study achieves higher recognition accuracy using the developed CNN model. The proposed model achieved an accuracy of 96.21% and a model loss of 0.2013. Although much work has been done on recognizing handwritten digits in English, only a small amount has been done for Amharic. Because of this lack of research, obtaining datasets for Amharic is a major challenge. The collected data are sufficient to train the model, but the dataset is not large, and students dominate the respondents. Because most respondents are students, the model performs well on students' handwriting and less well on that of other groups. The dataset does not include images of historical documents and manuscripts; the data were collected only from individuals and not from other sources. The dataset developed in this research can be used by other researchers in the future.
In the future, the dataset will also include historical data. The current work supports only a single handwritten Geez digit, and support for multi-digit numbers will be added in the future.

Data Availability
The data used to support the findings of this study are available at https://drive.google.com/file/d/1abJWvSYSyw8mLQ5Blg_lYAJng1K3LtGS/view?usp=sharing.

Conflicts of Interest
The authors declare that they have no conflicts of interest.