Image-Based Arabic Sign Language Recognition System Using Transfer Deep Learning Models

Sign language is a unique communication tool helping to bridge the gap between people with hearing impairments and the general public. It holds paramount importance for various communities, as it allows individuals with hearing difficulties to communicate effectively. In sign languages, there are numerous signs, each characterized by differences in hand shapes, hand positions, motions, facial expressions, and body parts used to convey specific meanings. The complexity of visual sign language recognition poses a significant challenge in the computer vision research area. This study presents an Arabic Sign Language recognition (ArSL) system that utilizes convolutional neural networks (CNNs) and several transfer learning models to automatically and accurately identify Arabic Sign Language characters. The dataset used for this study comprises 54,049 images of ArSL letters. The results of this research indicate that InceptionV3 outperformed other pretrained models, achieving a remarkable 100% accuracy score and a 0.00 loss score without overfitting. These impressive performance measures highlight the distinct capabilities of InceptionV3 in recognizing Arabic characters and underscore its robustness against overfitting. This enhances its potential for future research in the field of Arabic Sign Language recognition.


Introduction
Sign language (SL) is a nonverbal and natural language with the same functions as spoken language [1]. Deaf and hard-of-hearing individuals use SL to interact with others through a vocabulary of signs and gestures [2]. In the past, people with disabilities did not receive global attention. However, today's technologies offer tools designed to enhance the quality of life for individuals with disabilities [3]. Recognizing Arabic Sign Language (ArSL) is a significant area of research due to its complex nature. Moreover, sign language recognition has become an essential application of deep learning and artificial intelligence [4]. In this study, we aim to develop an Arabic Sign Language Identification System (ArSL) using deep convolutional neural networks (CNNs) to assist deaf and hard-of-hearing people. Sign language serves the same communicative role as spoken language [5]; it is used by those who cannot speak or hear, as it relies on hand gestures with specific movements [6]. The signs differ for each letter of the alphabet, and further movements are combined to form sentences [7].
Recent advances in deep learning (DL) and computer vision have shown great promise in the field of gesture recognition, significantly improving communication between individuals who use sign language and those who do not [8, 9]. Furthermore, hand shape features can be detected using many approaches, such as CNNs [10, 11] and histogram of oriented gradients feature extraction [12]. Sign language employs signals and body dialects such as hand shapes, facial expressions, and lip patterns to communicate meaning [13]. It consists of manual gestures, represented by hand position, direction, form, and path, and nonmanual gestures, represented by facial expressions and body movement [14]. However, most researchers focus on hand signals because they contain the raw information [15]. There are two main approaches to sign language recognition (SLR) systems: image-based and sensor-based. The first approach relies on camera images of the signs, movements, and marks [16], while the second, instead of relying on cameras, uses sensor-equipped gloves to capture the signs through attached probes [16].
This study develops an Arabic Sign Language Identification System (ArSL) using six different architectures with pretrained weights: MobileNetV2, VGG16, InceptionV3, ResNet50V2, ResNet152, and Xception. Experimentally, to enhance the robustness and effectiveness of the pretrained models, we employed early stopping [17] and data augmentation techniques. These practices are essential to facilitate better generalization of the model on unseen data. Striking the appropriate balance through repeated experimental iterations is a crucial step to fine-tune the models and mitigate overfitting.
The sections of this paper are organized as follows. Section 2 provides a comprehensive overview of existing research in the field. Section 3 explains the aim of the study. Section 4 illustrates the materials and methods proposed in this research. Moreover, Section 5 presents various experiments and their results, while the final section, Section 6, provides the conclusion of this paper.

Related Work
Nowadays, the power of deep learning technologies is applied in the field of sign language to improve the quality of life for people with disabilities. Many works have been proposed to enhance sign language recognition systems in different languages using diverse techniques [8, 18]. Several surveys provide a comprehensive overview of sign language recognition systems utilizing deep learning [19]. The survey in [20] reviewed sign language recognition in general and ArSL in particular; it evaluated various classifiers and their respective performances across different sign languages, ultimately reporting the most effective classifier for each specific sign language. In this section, we provide an overview of the most pertinent research related to Arabic Sign Language recognition systems. Table 1 provides a summary of the prior research discussed in this study.
Saleh and Issa [24] proposed models that match the VGG16 and ResNet152 structures and employed transfer learning and fine-tuning of deep convolutional neural networks (CNNs) to enhance the accuracy of recognizing 32 hand signs from Arabic Sign Language. The proposed method was applied to 2D images of diverse Arabic Sign Language data, achieving an impressive validation accuracy of 99.6% for ResNet152 and 99.4% for VGG16. ElBadawy et al. [32] employed a deep behavior-based feature extractor to effectively capture the finer details of Arabic Sign Language. A 3D convolutional neural network (CNN) was also utilized for the recognition of 25 gestures from the Arabic Sign Language dictionary. The recognition system was fed with data obtained from depth maps and demonstrated an accuracy rate of 98% for observed data and 85% for new data.
In [31], Hayani et al. proposed a new approach based on convolutional neural networks, fed with a real dataset, to automatically recognize the numbers and letters of Arabic Sign Language. Then, a comparative study was conducted to demonstrate the effectiveness and robustness of the proposed approach compared to traditional models, particularly K-nearest neighbors (KNN) and support vector machines (SVMs). The recognition rate of the proposed system is 90.02%, surpassing both SVM at 88% and KNN at 66%. Kamruzzaman [25] introduced a vision-based approach utilizing convolutional neural networks (CNNs) for recognizing Arabic hand sign-based letters and translating them into spoken Arabic. The accuracy achieved by this approach equals 90%, demonstrating that the system is highly reliable and efficient. Almasre and Al-Nuaim [26] developed a dynamic prototype model (DPM) utilizing Kinect to recognize specific dynamic words in Arabic Sign Language (ArSL). In this work, the DPM integrated eleven predictive models employing three machine learning models (SVM, RF, and KNN) with varying parameter configurations. The results demonstrated that the SVM models utilizing a linear kernel with a cost parameter of 0.035 achieved the highest accuracy rates in recognizing the dynamic words.
Elatawy et al. [27] introduced a novel approach employing the neutrosophic technique [33] and fuzzy c-means for the detection and recognition of the Arabic Sign Language alphabet. The system employed a Gaussian filter to eliminate noise and prepare the input image for further processing. Then, images were transformed into the neutrosophic domain, and features were extracted to commence the classification stage. Experimental results showed the system's commendable performance, achieving a total classification accuracy of 91%. The study in [28] proposed a new framework for signer-independent sign language recognition, leveraging a combination of deep learning architectures. The proposed framework encompasses hand semantic segmentation, hand shape feature representation, and a deep recurrent neural network. The framework was evaluated on a challenging Arabic Sign Language database encompassing 23 isolated words recorded from three different users. The experimental results demonstrated that the applied framework significantly outperforms other state-of-the-art methods under signer-independent testing strategies, with an accuracy of 89.5% using DeepLabv3+ semantic segmentation of the hand.
Alnahhas et al. [29] introduced an approach for recognizing words in Arabic Sign Language utilizing the Leap Motion device. The device facilitates the creation of a 3D model of the human hand through infrared technology. The proposed methodology analyzes mathematical features derived from the Leap Motion controller. The gesture is also represented as a series of frames to reflect its temporal nature, using an LSTM layer-based neural network classifier to encode the sequence and find the matching gesture. The highest accuracy was 89% for one-handed gestures and 96% for two-handed gestures. The study [30] proposed an affordable smart glove system capable of recognizing hand gestures in Arabic Sign Language. The proposed approach integrated flex sensors and a tilt-sensing module for both the right and left hands. Additionally, an Android application called "Smart Glove" was developed to translate gestures into text. The glove system was designed to accommodate both word level and sentence level and showed an impressive 90% recognition rate.
The study [34] also proposed a framework based on a variety of deep learning models for the automatic recognition of Arabic Sign Language, specifically using AlexNet, VGGNet, and GoogLeNet/Inception models in training, and evaluating the effectiveness of shallow learning techniques using nearest neighbors and SVM algorithms as baselines. The suggested algorithm provided encouraging results, detecting Arabic Sign Language with a 97% accuracy rate. A recent fully labeled dataset of Arabic Sign Language images was used to evaluate the proposed models. The goal of the work in [35] is to solve the recognition problem for Arabic Sign Language while assuring a trade-off between improving classification performance and condensing the deep network's design to lower computational costs. To categorize Arabic Sign Language motions, AlKhuraym et al. specifically modified EfficientNet models and created lightweight deep learning algorithms. In addition, a real dataset of hand motions for thirty distinct Arabic alphabet letters, recorded by numerous signers, was developed. The classification results generated by the suggested lightweight models were then evaluated using appropriate performance indicators. Mahmoud et al. [23] developed an architecture that integrates transfer learning (TL) models and recurrent neural network (RNN) models for ArSL recognition. The results achieved in this work have a peak recognition accuracy of 93.4%.
The work in [36] reviewed the literature on deep learning techniques used for Arabic POS tagging during the previous two decades. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology was used to perform the review. To extract all DL methods used to create POS taggers for the Arabic language, more than 4,000 publications were examined. Twelve articles were chosen for thorough examination after numerous exclusion procedures. According to the reviewed publications, long short-term memory (LSTM) and Bi-LSTM models are the most popular DL approaches for Arabic POS tagging and produce the best results. On the other hand, in the work [37] on Arabic Sign Language detection, the images went through a number of preprocessing and data augmentation procedures. Tests were run on the ArASL dataset using a variety of pretrained models. Most of them performed only moderately, and in the last stage of the analysis, the EfficientNetB4 model was determined to be the best fit. Models other than EfficientNetB4 performed poorly given the complexity of the dataset, owing to their lightweight construction; EfficientNetB4 is a heavyweight architecture with a higher level of complexity. The best model achieved a 98% training accuracy and a 95% testing accuracy.
In paper [38], El Zaar et al. introduced a highly efficient CNN-based deep learning architecture. The recognition of sign language is one of the most crucial tasks transforming the lives of deaf people, making daily life and social inclusion easier. The suggested architecture is effective because it can recognize and analyze various sign language datasets with a high degree of accuracy. It was trained and tested on datasets for American Sign Language (ASL), Irish Sign Alphabet (ISL), and Arabic Sign Language Alphabet (ArASL), and it beats state-of-the-art methods, with a recognition rate of 99% for ASL and ISL and 98% for ArASL. The study in [22] provided a dataset of 20 Arabic words and proposed a deep learning architecture that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). On the supplied dataset, the suggested architecture achieved a 98% accuracy rate, and its top-1 accuracy on the UCF-101 dataset was reported to be 98.8%. Aldhahri et al. [21] employed convolutional neural networks to construct a model aimed at recognizing Arabic alphabet signs. The study utilized the Arabic Alphabets Sign Language Dataset (ArASL2018). The results from this model showed a recognition accuracy of 94.46%.
Prior research has explored various approaches to sign language recognition systems, aiming to facilitate effective communication for individuals with hearing and speech impairments. In this context, our study stands out by focusing on the development of an Arabic Sign Language Identification System (ArSL) using six distinct pretrained architectures: MobileNetV2, VGG16, InceptionV3, ResNet50V2, ResNet152, and Xception. A critical aspect of our study is to clearly distinguish our proposed model from existing ones. We thoroughly evaluate and compare the performance of these pretrained models, highlighting the superior accuracy achieved by ResNet50V2 and InceptionV3, both reaching 100%, the highest accuracy achieved. This distinction allows us to emphasize the uniqueness and effectiveness of our approach in the realm of Arabic Sign Language recognition.

Materials and Methods
In this section, we present the dataset, the utilized neural network models, and the data preparation and processing.

Adopted Methodology. The adopted methodology serves as a guide for how this work was carried out, encompassing the entire process from data collection to the production of the study findings. We will delve into the major steps of the methodology as depicted in the flowchart, providing detailed explanations. Additionally, we will provide brief explanations of the pretrained models that were utilized in our study.
As depicted in Figure 2, we first import the essential packages and libraries, including Keras, Pandas, and Matplotlib. Then, the ArASL image data were loaded directly from the Kaggle website. As mentioned earlier, the dataset contains 54,049 images covering 32 Arabic Sign Language characters. The first step after loading the dataset is to prepare the data for the model by implementing some preprocessing steps. The dataset is imbalanced, meaning that each category holds a different number of images, which may result in biased detection outcomes; to avoid such inconsistencies and biases in the testing results, we allocated a fixed number of samples to each category. Moreover, to complete the preparation of the dataset, another preprocessing step is performed: image resizing. The ArASL images come in different sizes, so all images were resized to a standard resolution of 64 × 64 pixels. In addition, image normalization is conducted to make the images more consistent in terms of contrast, color, and brightness. After that, data augmentation was applied: the process of generating new data from existing data to increase the dataset's size and variety, thereby achieving better results. In our study, we implemented different augmentation techniques, including rescaling, zooming, flipping, and shifting.
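A minimal sketch of this loading and augmentation stage, assuming a Keras ImageDataGenerator pipeline over a local copy of the ArASL images with one folder per class (the directory name, augmentation magnitudes, and batch size are illustrative assumptions, not values taken from the paper):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

DATA_DIR = "arasl"   # hypothetical layout: arasl/<class_name>/*.jpg
IMG_SIZE = (64, 64)  # all images are resized to 64 x 64 pixels
BATCH_SIZE = 32

# Rescaling normalizes pixel values to [0, 1]; zoom, flip, and shift
# transforms generate new variants of the existing training images.
train_aug = ImageDataGenerator(
    rescale=1.0 / 255, zoom_range=0.1, horizontal_flip=True,
    width_shift_range=0.1, height_shift_range=0.1,
    validation_split=0.3)  # 70% training / 30% testing
val_plain = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.3)

train_data = train_aug.flow_from_directory(
    DATA_DIR, target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical", subset="training")
val_data = val_plain.flow_from_directory(
    DATA_DIR, target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical", subset="validation")
```

Note that only the training subset is augmented; the validation generator applies rescaling alone, which is the usual practice.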
Moving on to model development, the dataset was divided into training and testing sets, with 70% for training and 30% for testing. The training set was fed into six chosen pretrained models, leveraging their efficiency and robustness in extracting complex patterns from data. These models are MobileNetV2, VGG16, InceptionV3, ResNet50V2, ResNet152, and Xception. The models' weights are initialized from ImageNet, and a prediction layer using the softmax activation function is added after the last fully connected layer. We then fine-tuned these models under various settings by adjusting hyperparameters, including different learning rates and different numbers of epochs. After fine-tuning, we validated the effectiveness of the models on the validation set by measuring the accuracy score and visualizing the results for better understanding. Finally, we chose the best model.
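As a rough sketch of this model development step, the following assembles one such transfer model in Keras: an ImageNet-initialized backbone with a new softmax prediction layer on top. ResNet50V2 is used here because it accepts 64 × 64 inputs; the head size and the choice to freeze the base initially are our assumptions:

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import ResNet50V2
from tensorflow.keras.optimizers import Adam

NUM_CLASSES = 32  # 32 ArSL characters

# Load the backbone with ImageNet weights, dropping its original classifier.
base = ResNet50V2(weights="imagenet", include_top=False,
                  input_shape=(64, 64, 3), pooling="avg")
base.trainable = False  # assumption: freeze the pretrained layers first

# Add the prediction layer with softmax after the last fully connected layer.
x = layers.Dense(256, activation="relu")(base.output)  # head size is assumed
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(base.input, outputs)

model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
```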

Models. Pretrained models have found extensive application in the field of computer vision [40] due to their remarkable capacity to uncover hidden patterns and generalize effectively, even with small datasets and limited resources. In this section, we explain the pretrained models utilized in our methodology; a short sketch showing how all six backbones are instantiated follows the list.

(i) VGG16: The VGG16 neural network achieves an accuracy of around 70.5% and is computationally more expensive than many other neural networks [41]. The VGG16 network consists of 16 layers, of which 13 are convolutional layers stacked with 3 × 3 filters; this design increases the network depth, improves performance to a certain extent, and reduces the number of weight parameters [41]. It also uses 2 × 2 max-pooling layers. Between these layers, the ReLU activation function is applied. Next, three fully connected layers contain most of the network parameters. Finally, the softmax function produces the probabilities for each category [41].

(ii) InceptionV3: InceptionV3, also known as Inception-v3, represents the third version of Google's convolutional neural network, which was showcased during the ImageNet recognition contest. GoogLeNet is particularly well-suited for processing extensive data, especially in scenarios where there are constraints on memory or computing resources. It excels in tasks such as image analysis, object detection, and object classification [42]. InceptionV3 consists of 48 layers, and the network's image input size is 299 × 299 pixels. It incorporates numerous enhancements, including label smoothing, factorized 7 × 7 convolutions, and an auxiliary classifier to propagate label information throughout the network, with batch normalization applied in the auxiliary branches [43].

(iii) ResNet50V2: The ResNets [44] are modular structures that stack building blocks of the same shape. Inception-ResNet-v2 is a further improvement, a convolutional neural network that builds on these foundation models but incorporates residual connections, replacing the filter concatenation stage of the foundational architecture [45].

(iv) ResNet152: One year after the construction of VGGNet, the Residual Network (ResNet) emerged. The ResNet model was developed with various depths, ranging from 32 layers to 152 layers [46]. ResNet152, a deep network comprising up to 152 layers, learns residual representation functions instead of directly learning the signal representation. It is eight times deeper than the VGG networks while maintaining lower complexity. The ResNet team also achieved an error rate of 3.57% on the ImageNet test set, securing first place in the ILSVRC 2015 classification challenge [46].

(v) Xception: Xception [47] expands upon the Inception architecture by replacing the standard Inception modules with depthwise separable convolutions. The original depthwise separable convolution consists of a depthwise convolution followed by a pointwise convolution, while the separable convolution used here starts with a pointwise convolution followed by a depthwise convolution. This modification is introduced in the starting module of InceptionV3, where a (1 × 1) convolution precedes any (n × n) spatial convolutions. As a result, Xception differs slightly from the original Inception architecture. Notably, the Xception architecture maintains the same number of parameters as InceptionV3, aiming for improved performance through more effective utilization of the model's parameters rather than merely increased capacity [47].

(vi) MobileNetV2: MobileNetV1 [48] emerged as a family of computer vision neural networks designed to support classification and detection, built primarily for mobile devices. Running these networks on mobile devices enhances user experiences by providing benefits such as always-on access, privacy, security, and power efficiency. Subsequently, MobileNetV2 was introduced to power the next generation of mobile computer vision applications. MobileNetV2 represents a significant improvement over MobileNetV1 and incorporates the latest technology for mobile visual recognition, including support for various convolutional neural network applications such as object detection, classification, and semantic segmentation [49]. Released as part of the TensorFlow-Slim image classification library, MobileNetV2 builds on ideas from MobileNetV1 [49], using depthwise separable convolutions as efficient building blocks. Additionally, MobileNetV2 introduces new architectural features, including linear bottlenecks between layers and shortcut connections between bottlenecks.
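As a brief illustrative sketch, all six backbones can be instantiated uniformly from `keras.applications` (the input size and pooling are assumptions; note that the Keras implementations of InceptionV3 and Xception require inputs of at least 75 × 75 and 71 × 71, respectively, so the 64 × 64 images would need upsampling for those two):

```python
from tensorflow.keras import applications

# The six pretrained backbones compared in this study.
BACKBONES = {
    "MobileNetV2": applications.MobileNetV2,
    "VGG16": applications.VGG16,
    "InceptionV3": applications.InceptionV3,  # requires inputs >= 75 x 75
    "ResNet50V2": applications.ResNet50V2,
    "ResNet152": applications.ResNet152,
    "Xception": applications.Xception,        # requires inputs >= 71 x 71
}

def build_backbone(name, input_shape=(64, 64, 3)):
    """Instantiate one backbone with ImageNet weights and no classifier head."""
    return BACKBONES[name](weights="imagenet", include_top=False,
                           input_shape=input_shape, pooling="avg")

for name in ("MobileNetV2", "VGG16", "ResNet50V2", "ResNet152"):
    print(name, f"{build_backbone(name).count_params():,} parameters")
```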

Experiments and Results
Due to the remarkable success of convolutional neural networks (CNNs) in the field of sign language recognition, we conducted a comprehensive study to compare the performance of several pretrained models. Our goal was to determine the most effective model for recognizing signs using transfer learning. We used the ArASL dataset [39] in the training and validation phases, consisting of a substantial 54,049 images, each depicting one of 32 Arabic signs. Our proposed technique comprised several key steps:

(i) Preprocessing: we initiated the process by carefully preprocessing the images.

(ii) Fine-tuning: the pretrained models underwent a fine-tuning process using the preprocessed images.

(iii) Data augmentation: to improve the models' generalization and mitigate overfitting, we applied data augmentation techniques.

(iv) Monitoring performance: at the end of each epoch, we assessed the performance of each network using accuracy as a key metric.

(v) Varied experiments: we conducted multiple experiments, exploring different numbers of epochs, batch sizes, and learning rates to comprehensively evaluate each model's performance.

(vi) Early stopping: to prevent overfitting, we implemented early stopping strategies during training (see the sketch after this list).
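A minimal sketch of steps (iv)–(vi), reusing `model`, `train_data`, and `val_data` from the earlier sketches (the monitored metric and patience value are illustrative assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation accuracy stops improving, keeping the
# weights from the best epoch seen so far.
early_stop = EarlyStopping(monitor="val_accuracy", patience=2,
                           restore_best_weights=True)

history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=10,              # varied across experiments: 3, 6, and 10
    callbacks=[early_stop],
)
print(f"best validation accuracy: {max(history.history['val_accuracy']):.4f}")
```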
The dataset was divided into two subsets: a validation set comprising 30% of the data and a training set with the remaining 70%. The results of our evaluation revealed that ResNet50V2 and InceptionV3 outperformed the other models; both achieved an exceptional accuracy rate of 100%, with an error rate of 0%. ResNet50V2 was trained for 10 epochs and InceptionV3 for 6 epochs, both with a batch size of 32. Our application of early stopping and data augmentation techniques contributed to preventing overfitting and enhancing the models' ability to generalize. In summary, the experiments indicated that InceptionV3, ResNet50V2, MobileNetV2, Xception, and VGG16 exhibited superior performance when compared across various hyperparameters and network settings. This comparison revealed significant improvements in the models' speed and accuracy. Furthermore, we fine-tuned these models for 3 epochs, 6 epochs, and 10 epochs. Table 2 shows the results of these models after 3 epochs.
Table 2 shows the differences in the results after the training of these models finished with three epochs. As we can see, ResNet50V2 and Xception show the highest accuracy scores, equal to 98% and 97%, with losses equal to 0.01 and 0.03, respectively. However, for ResNet50V2 we used an adaptive learning rate (decreasing the value of the LR every three epochs), while the Xception model used a 0.001 learning rate. Figure 3 gives an overview of the accuracy scores after three epochs for all models.
Table 3 shows the differences in the models' performance at 6 epochs; the primary observation is that InceptionV3 and ResNet50V2 achieved a 100% accuracy score with a 0.01 loss score. These two models are optimized with the Adam optimizer and a batch size of 32. In addition, setting an adaptive learning rate, e.g., by reducing the learning rate (LR) value after a certain number of epochs, improves the performance of the models, as in ResNet50V2, where the LR is reduced from 0.001 to 0.0005 at epoch 2.
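Such an adaptive schedule can be sketched with a Keras LearningRateScheduler callback (the decay factor and step length below are assumptions illustrating the "decrease every few epochs" idea, not the paper's exact schedule):

```python
from tensorflow.keras.callbacks import LearningRateScheduler

def step_decay(epoch, lr):
    """Halve the learning rate every three epochs (factor is an assumption)."""
    if epoch > 0 and epoch % 3 == 0:
        return lr * 0.5
    return lr

lr_schedule = LearningRateScheduler(step_decay, verbose=1)

# Passed alongside early stopping when fitting, e.g.:
# model.fit(train_data, validation_data=val_data, epochs=10,
#           callbacks=[early_stop, lr_schedule])
```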
Table 4 shows the differences in model performance over 10 epochs; the highest accuracy was again achieved by ResNet50V2, with an accuracy score of 100% and a loss score of 0.01. As we can see in the table, Xception and VGG16 achieved less accuracy in this iteration. The lowest error rates were achieved by VGG16, InceptionV3, and ResNet50V2 compared to the other models. Figure 4 visualizes the accuracy for these six models after 10 epochs; for the best model (ResNet50V2), it shows the test accuracy through the epochs on the left and the loss on the right.
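Accuracy and loss curves of this kind can be plotted directly from the training history, as in this short sketch (assuming the `history` object returned by the fitting sketch above):

```python
import matplotlib.pyplot as plt

# Accuracy across epochs on the left, loss on the right.
fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(history.history["accuracy"], label="train")
ax_acc.plot(history.history["val_accuracy"], label="validation")
ax_acc.set(xlabel="epoch", ylabel="accuracy")
ax_acc.legend()

ax_loss.plot(history.history["loss"], label="train")
ax_loss.plot(history.history["val_loss"], label="validation")
ax_loss.set(xlabel="epoch", ylabel="loss")
ax_loss.legend()
plt.tight_layout()
plt.show()
```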
Figure 5 shows the best results of the models across different numbers of epochs and different batch sizes; InceptionV3 achieved 100% accuracy at 6 epochs with a loss value equal to 0.00.

The primary contribution of this study lies in the exceptional performance demonstrated by ResNet50V2 and InceptionV3 in fitting our model to our dataset. These models achieved outstanding results with 100% accuracy and zero errors, showcasing their remarkable ability to classify sign language images into Arabic letters effectively.
Throughout the training phase of InceptionV3, we diligently applied early stopping mechanisms and, when applicable, data augmentation techniques. These strategies played a pivotal role in enhancing the model's generalization to previously unseen data. Our approach focused on achieving the right balance through iterative experimentation, ensuring the model was finely tuned and effectively mitigated overfitting.
The variance in the number of epochs required for convergence comes from several factors related to the model's training, including the initialization conditions, hyperparameter adjustments, and the differences in model complexity. During our experimentation, we adjusted different hyperparameters, such as the learning rate, early stopping criteria, and batch size. These adjustments impact convergence directly. Moreover, each model may exhibit different convergence behavior depending on its complexity; more complex models require a larger number of epochs to fine-tune their high number of parameters and reach convergence. In conclusion, it is evident that InceptionV3 outperformed the other pretrained models in our comparison.

Conclusion
In our research, our goal was to accurately classify Arabic Sign Language images using advanced artificial intelligence techniques. We achieved this by harnessing the power of pretrained models, including MobileNetV2, VGG16, InceptionV3, ResNet50V2, ResNet152, and Xception. The dataset comprised a vast collection of Arabic Sign Language images. To ensure a fair evaluation, we split the dataset into two portions: 70% for training and 30% for validation. After conducting extensive experiments, particularly fine-tuning of the pretrained models, we observed exceptional performance from ResNet50V2 and InceptionV3. These models achieved an impressive 100% accuracy with zero errors. The training processes involved 10 and 6 epochs, respectively, with a batch size of 32. To further enhance the models' performance and prevent overfitting, we applied techniques like early stopping and data augmentation. In summary, InceptionV3 consistently outshone the other pretrained models across experiments with different numbers of epochs. Notably, it achieved this remarkable accuracy without falling into the trap of overfitting. This highlights the effectiveness of incorporating techniques like early stopping and data augmentation, which played a crucial role in enabling InceptionV3 to generalize exceptionally well while maintaining its high accuracy.

Figure 1: Samples of ArSL letters from the training data.
Figure 2: Flowchart of the adopted methodology.

Figure 5: Optimal performance of models across varied epochs and batch sizes.

Table 1: Summary of related work on ArSL recognition research.
Aim of the Study

This study aims to advance Arabic Sign Language recognition utilizing state-of-the-art transfer deep learning techniques, with a focus on improving various research domains. The objective is to develop an accurate Arabic Sign Language Identification System (ArSL) leveraging deep neural networks. The motivation behind this work is to distinguish Arabic Sign Language by subjecting a neural network to the diverse orientations and lighting conditions associated with images of hand gestures. The goal is to achieve higher accuracy compared to existing techniques, reduce training time with fewer epochs, and effectively handle images of varying sizes. To evaluate and identify the most effective approach, we employ six distinct pretrained architectures: MobileNetV2, VGG16, InceptionV3, ResNet50V2, ResNet152, and Xception. The aim is to attain the highest accuracy possible, ultimately assisting individuals who are "deaf and mute" and ensuring the removal of the communication barriers they often encounter. The model utilizes a dataset composed of Arabic sign images for training, translating each sign image to an Arabic letter, making interaction with the broader population more accessible. The proposed model's key contribution lies in its automatic recognition of Arabic letters in sign language. The dataset utilized for this purpose is the Arabic Alphabets Sign Language Dataset (ArASL), comprising 32 labels: 28 for the letters and 4 for standard Arabic signs. This dataset comprises 54,049 images of ArSL letters contributed by over 40 individuals, encompassing the full spectrum of standard Arabic Sign Language. In the pursuit of creating an efficient ArSL system, it is vital to distinguish our proposed model from existing ones. This involves highlighting the unique features, advantages, and outcomes achieved through the utilization of our carefully selected pretrained architectures. Importantly, we conduct a comparative analysis of our model's performance against other established models, underscoring the efficacy and relevance of our approach.