An Efficient Classification of Neonates Cry Using Extreme Gradient Boosting-Assisted Grouped-Support-Vector Network

The cry is a loud, high-pitched vocal communication of infants. A neonatal cry is characterized by a very high fundamental frequency and resonance frequency, with sudden variations. Furthermore, even within a short, single utterance, the cry signal contains both voiced and unvoiced features. Infants communicate with their caretakers mostly through cries, and it is sometimes difficult for caretakers to understand the reason behind a newborn's cry. This research therefore proposes a novel method for classifying newborn cries into three groups: hunger, sleep, and discomfort. For each crying frame, twelve features are extracted through acoustic feature engineering, and variable selection using random forests is used to select the most discriminative features among the twelve time- and frequency-domain features. Subsequently, the extreme gradient boosting-powered grouped-support-vector network is deployed for neonate cry classification. The empirical results show that the proposed method can effectively classify neonate cries into the three groups. The best experimental results showed a mean accuracy of around 91% for most scenarios, which demonstrates the potential of the proposed extreme gradient boosting-powered grouped-support-vector network in neonate cry classification. The proposed method also identifies these emotional cries quickly, with a recognition time of 27 seconds.


Introduction
Crying is the primary mode of communication by which infants make their caregivers aware of their physiological and psychological needs. It is also the first expression of life at birth. The reasons behind infant crying can be numerous. A crying infant attracts the attention of the caregiver and signals that the baby needs an interaction of some sort. Infant cries carry a great deal of information in their sound wave, and the sound often provides insight into the reason for and severity of the cry. Infant cry is an important indicator of various types of information: emotion, gender, maturity at birth, first cry, health status, and sleep pattern of the infant. The activity of crying is controlled by the brain and is triggered when an exceptional event disturbs the normal functioning of the infant's body. It acts like an alarm that signals any altered event in the functioning of the body. The event of crying encompasses sequences of motor skill performances along with acoustic expressions such as vocalization, coughing-choking, constrictive silence, and various combinations of these manifestations.
Babies born before 37 weeks of gestation are preterm. Neonates are newborn babies born at full term, having completed the gestational period. Preterm babies are susceptible to various health issues and need intensive care during the early developmental stage. Neonates similarly need care to ensure that growth is normal and that the chances of health complications are minimized. The pitch of the cry in preterm babies and neonates contributes significantly to analyzing the signs of a problem so that remedial steps can be taken at the earliest. In preterm babies especially, the sound and frequency of the cry provide earlier information on deeper problems than other diagnostic tests, with which pathological diseases in infants take almost seven months to a year to be detected. Early detection of these diseases provides opportunities for versatile treatment applications and medical therapies. An infant cry wave contains information on the physical state and physical pathologies of babies. Processing this information is a pattern recognition task, and it has high potential as a noninvasive complementary tool for detecting infant health issues and planning preventive steps. Figure 1 depicts the waveform and spectrogram of an infant cry. Figures 2-4 illustrate the waveforms and spectrograms of sleep, hunger, and discomfort cries, respectively.
It is pertinent to mention that infant cries fall within the most sensitive range of human hearing. Infant cries are initiated by events occurring in the respiratory and nervous systems. The sound is generated by the vocal cords and vocal tract, with a frequency range of 250 Hz to 600 Hz. The first shrill cry provides significant information for the APGAR score, which categorizes a newborn baby as healthy, weak, or sick. The vocalizations, time variance, and limb movements associated with infant cry provide insight into neurological aspects. As already mentioned, analysis of this sound is important in the detection of health hazards, and various studies have been conducted in this direction. Originally, the sound spectrograph was the primary tool for the analysis of crying sounds in the 1960s and 1970s. The spectrograph was an analog device that plotted time on the x axis and frequency on the y axis, with signal intensity depicted by the darkness of the lines. The progression of the technology led to the present-day use of pitch frequency, cross-correlation, Mel-frequency cepstral coefficients, and various automatic classification methodologies. These traditional methodologies emphasized analyzing infant cry sounds based on features derived from fundamental frequency contours, pitch contours, signal energy in different frequency subbands, and unvoicing. Moreover, the frequency of crying and the formants of the cry signals have been analyzed. Attempts have been made to classify infant cries based on their root causes: pain, sadness, hunger, and fear. The pitch characteristics of cry signals have been used to categorize them as urgent, sick, and various others. Pitch detection algorithms have been implemented to calculate the instantaneous fundamental frequency (F0), wherein the first three formants along with F0 have been used to analyze the sounds.
Machine learning enables systems to automatically learn and build an analytical model from their experience [1-14]. The work in [15] presented an extensive review of research focusing on the analysis and classification of infant cries. The review covered various aspects such as data acquisition techniques, cross-domain signal processing techniques, and various machine learning classification techniques. The contribution of preprocessing techniques in describing diversified features, namely, Mel-Frequency Cepstral Coefficients (MFCC), spectrogram, and fundamental frequency, is discussed. It is observed that acoustic and prosodic features extracted from various domains have the potential to separate frame-based signals from one another and are thus used for training machine learning classifiers. The study discusses traditional machine learning techniques along with more recent deep learning architectures and highlights future directions of research in data processing, feature extraction, and neural networks for understanding, interpreting, and processing infant cry signals.
Although various studies have been conducted on infant cry data analysis, they lack a comprehensive methodology for the analysis and recognition of signals to achieve optimized decision making. The unique contributions of the present study are as follows: (1) Exhaustive preprocessing is conducted in seven steps, namely, normalization, framing, end-point detection, cry unit detection, preemphasizing, windowing, and fast Fourier transformation. (2) Features are extracted using acoustic feature engineering, and variable selection using random forests (VSURF) is used to select the most discriminative feature set. (3) A grouped-support-vector network is implemented for efficient infant cry classification, with a fast emotional cry recognition time of 27 seconds. (4) The proposed model yields effective results in high-dimensional spaces and is applicable in scenarios wherein the dimension count is larger than the sample count. (5) Boosting creates the classifiers of the ensemble by training each classifier on a random redistribution of the training dataset obtained through resampling. (6) The best experimental results exhibited a mean accuracy of about 91% for most cases, which demonstrates the potential of the extreme gradient boosting-powered grouped-support-vector network in neonate cry classification.
The remaining sections of this paper are organized as follows: Section 2 presents a detailed literature review, Section 3 illustrates the proposed methodology, Section 4 presents the results and discussions, and Section 5 provides the conclusion of the study.

Literature Review
Numerous studies have been conducted pertinent to the analysis and interpretation of infant cry. Some of the significant and recent studies conducted on infant cry classification are discussed in this section. Table 1 presents a consolidated review of techniques and challenges in infant cry classification.
This section highlights some of these studies and their observations. The study in [24] emphasized the importance of automatic recognition of infant cries for developing an application that would improve the quality of life of infants and their parents. The study generated real-time datasets from infant cries and selected the most relevant sound attributes that affected the experimental results and helped monitor infants. The framework automatically detects instances of discomfort signals, which are common among 25 percent of infants, using machine learning techniques. The study included an ensemble technique through which low-level audio features were selected from labelled precry recordings, together with high-level features relevant to the envelope of crying. The inclusion of precry signals helped understand infant needs better, providing the opportunity to develop higher-quality baby monitors. The study in [25] performed an experimental analysis using two ensemble models to classify infant cry. The two models used are a boosting ensemble of artificial neural networks and a boosting ensemble of support vector machines. The study highlighted the superiority of the neural network-based ensemble model in the classification of infant cry. The challenges of the study included the difficulty of collecting cry samples from normal babies without any pathology, pain, or hunger issues. A larger number of samples would help generalize the results achieved and justify the possibility of real-time application.
There have been immense advancements in perinatology and neonatology, which have had a positive impact on the survival of preterm and low-weight neonates. The study in [26] highlighted the importance of infant cry analysis as a noninvasive complementary tool for the assessment of the neurological conditions of premature neonates. The study emphasized identifying the distinctions between full-term and preterm neonatal cry using automatic acoustical analysis in association with various data mining techniques. Crying is the sole method for infants to communicate with their environment and inform it of their needs and problems. These audio signals require thorough analysis and extraction of features with the help of expert knowledge. Deep learning does not require much preprocessing and is capable of extracting important features automatically from the datasets. The study in [27] implements a deep learning-based feature extraction technique followed by machine learning algorithms for the classification of infant cry. An audio signal of 4-second duration was transformed into a spectrogram image and then fed into a deep convolutional neural network (DCNN) for feature extraction. The extracted features were further classified using ML algorithms, namely, SVM, Naïve Bayes, and KNN. The framework was evaluated with the Bayesian hyperparameter optimization technique. The results highlighted the superiority of SVM in the classification of infant cry due to hunger, pain, or sleepiness. The study in [28] implemented machine learning techniques to analyze distress calls among infant chimpanzees. The exemplars were extracted from the distress call episodes, and the external events that caused such calls and the distance from the mother were analyzed to identify any correlations.
The results revealed that such distress calls could provide information on discrete problems faced by the infants and on their distance from the mother. These factors would act as a guide for maternal decision making. However, the role of acoustic cues in this regard has remained a topic for future research. The study of infant cry recognition helps caregivers identify the typical needs of infants. Different cry sounds convey different meanings and help the caregivers respond appropriately, which further influences the infants' emotional, behavioral, and relationship development. The recognition of infant cries is more difficult than the understanding of adult speech due to the absence of verbal language-based information. The study in [29] analyzes different types of emotional needs communicated by infants through their cry, namely, hunger, sleepiness, stomachache, uneasiness, and the need to burp. A combination of CNN and RNN is used for feature extraction and classification. The CNN-RNN method implemented in the study outperforms the traditional methods, with an accuracy of up to 94.97%. The use of IoT and smart devices has helped develop state-of-the-art infant incubators that enable caregivers to respond quickly to the specific needs of the infants. The baby voices are classified using machine learning with an open voice database.
Journal of Healthcare Engineering
The sensor-based incubator proposed in the study [30] would help in reporting the infant's condition. The use of IoT technology would enhance the function of the actuators inside the incubator. Finally, the combination of historical data and the live data collected by the sensors would provide extensive information on the infant's condition and the environment. The study in [31] used neural networks to analyze the source of infant cries. The work combined genetic algorithms and ANN with linear predictive coding (LPC) and MFCC for the classification of infant cries. The results justified the superiority of the proposed method over other traditional approaches. The study in [32] used a Gaussian mixture model-universal background model for the recognition and analysis of infant cry signals even in the presence of channel imbalances and corrupted signals. The results proved to be much superior to those obtained with high-order spectral features, ensuring enhanced accuracy. The study in [16] highlighted the importance of explicit and relevant feature representation of infant cry signals, which are significantly different from speech signals. The study proposed the use of unsupervised auditory filterbank learning implemented with a convolutional restricted Boltzmann machine (ConvRBM) model. The model was able to distinguish between normal and pathological crying signals. Compared with the Mel-frequency cepstral coefficient (MFCC) model, it performed better in terms of accuracy. However, the model was not evaluated against any model other than MFCC. The study in [17] presented a review of various techniques adopted in infant cry analysis and classification. The review focused primarily on different aspects of data acquisition, signal processing techniques across various domains, and different machine learning-based classification techniques.
The paper discussed diversified features, namely, MFCC, fundamental frequency, and spectrograms. The study also highlighted the use of traditional machine learning models such as KNN, GMM, and SVM and more recent models such as CNN and RNN in infant cry identification, analysis, classification, and detection. The scalability of datasets, the unavailability of skilled labor for collecting data, and the lack of collaboration between medical professionals and researchers were identified as the challenges in infant cry research. The study in [18] proposed a CAD system that helped differentiate healthy and unhealthy infant cry signals.
The system consisted of four stages. Firstly, the cry signals were preprocessed to remove background noise, and signal segmentation was carried out. Then, the preprocessed signal was analyzed to obtain its cepstrum. Thirdly, the cepstrum coefficients were fed into a deep feed forward neural network (DFNN) for training and classification. Finally, the system was evaluated against the standard classification performance metrics. The study did not include classification of deep features and nonlinear statistical features.
The study in [19] implemented a convolutional neural network (CNN) for classifying infant vocal sequences. The classes identified were "crying," "fussing," "babbling," "laughing," and "vegetative vocalization." The audio segments were represented as spectrograms and fed into the conventional CNN. The accuracy achieved was quite balanced. The model, however, was not evaluated against any other models. The study in [20] developed a smart cradle that operated based on the sounds of the infant. The infant sounds were classified using a support vector classifier (SVC) with a radial basis function (RBF) kernel based on 18 features extracted from the infant sounds. The system was evaluated against linear and polynomial SVC functions and other traditional classification models such as Decision Tree, Random Forest, and Naïve Bayes algorithms. The proposed system was found to achieve higher accuracy than the other traditional approaches. The study was based on only 4 types of sound; hence, the inclusion of more sound categories would help validate the accuracy of the model. The study in [21] performed infant cry classification based on different features extracted by processing a speech and auditory dataset. The model was first trained using the individual features. Then, the most significant features were selected, and the model was retrained on the combination of these selected features. SVM, KNN, logistic regression, and random forest models were used for classification. The proposed approach could be further evaluated and justified using a more extensive dataset.
It was emphasized in [22] that the duration and frequency of infant crying are important indicators of child health condition. Manual monitoring and observation in this regard are extremely taxing and prone to erroneous results. The study thus focused on developing a smartphone-based framework to automatically detect infant crying. Datasets of infant crying clips were collected from different online sources, and the audio features were extracted from these clips using the OpenSMILE software. The random forest algorithm was used to classify the crying and noncrying audio clips, and the model was then evaluated using real-time audio clip recordings. Motorola G5/G6 phones were used for experimentation; hence, the performance of the model on other smartphones remains uncertain. The study in [23] classified infant crying sounds using Higuchi fractal dimensions. The KNN and SVM algorithms were used for classification. Both algorithms were evaluated, and the results revealed that SVM performed better than KNN in terms of accuracy. The study could be further improved by including more types of infant cries and a more extensive dataset, which would help justify the superiority of the proposed approach.
Figure 5 represents the flow diagram of the proposed neonate cries classification system. The second step in the proposed framework involves preprocessing of the dataset, which helps eliminate the interferences and disturbances present in the cry data. An exhaustive preprocessing is performed, which includes seven steps, namely, normalization, framing, end-point detection, cry unit detection, preemphasizing, windowing, and fast Fourier transformation. The preprocessing is followed by feature extraction (acoustic feature engineering), which helps eliminate irrelevant features from the dataset and retain the features that contribute significantly to the overall output of the classification.
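Four of the seven preprocessing steps named above (normalization, framing, preemphasizing, and windowing) can be illustrated with a minimal sketch. The paper does not state its parameter values, so the pre-emphasis coefficient of 0.97, the Hamming window, the frame length, the hop size, and the synthetic test tone are all assumptions chosen for illustration; end-point detection, cry unit detection, and the FFT are omitted.

```python
import math

def normalize(signal):
    """Peak-normalize so the largest absolute sample equals 1."""
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal] if peak > 0 else list(signal)

def frame_signal(signal, frame_len, hop):
    """Split into overlapping frames; an incomplete tail frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def pre_emphasize(frame, coeff=0.97):
    """First-order high-pass filter y[n] = x[n] - coeff * x[n-1] (coeff is assumed)."""
    return frame[:1] + [frame[n] - coeff * frame[n - 1] for n in range(1, len(frame))]

def hamming_window(frame):
    """Taper the frame edges with a Hamming window (window type is assumed)."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

# Hypothetical input: a 400 Hz tone sampled at 8 kHz, standing in for a cry recording.
sample_rate = 8000
signal = [0.5 * math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(800)]

normalized = normalize(signal)
frames = frame_signal(normalized, frame_len=200, hop=100)
processed = [hamming_window(pre_emphasize(f)) for f in frames]
print(len(processed), len(processed[0]))
```

Each processed frame would then be passed to the FFT and to the feature extractors described in the following subsections.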
Classifying infant cry modes directly from the raw audio signals would involve a large amount of data that is difficult to interpret. Hence, the input signal is converted into a relatively concise feature vector, and the characteristic parameters that represent the cry signal are extracted. In time-domain analysis, which requires no conversion, relatively few features are extracted, but the original signal is not damaged or lost. In frequency-domain analysis, by contrast, relatively more features are extracted without much damage to or loss of the original signal, although the audio signal must be converted from the time domain to the frequency domain. Thus, in this study, features are extracted from both the time domain and the frequency domain to achieve enhanced identification of infant cry modes. Table 2 lists the twelve features extracted using acoustic feature engineering. The variables used in the study are as follows: The magnitude of an audio signal is the measure of how far, regardless of direction, its quantitative value differs from zero.

Zero Crossing Rate. The rate at which the signal changes from positive to zero to negative or vice versa is termed the zero crossing rate. This value is immensely helpful in the recognition of audio signals, being a key feature for the classification of percussive sounds.
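As a minimal sketch of this definition, the zero crossing rate of one frame can be computed by counting sign changes between adjacent samples. Treating a sample of exactly zero as non-negative is a convention assumed here; the paper does not specify one.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    A sample equal to zero is treated as non-negative (an assumed convention).
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

print(zero_crossing_rate([0.3, -0.2, 0.1, -0.4]))  # sign flips between every pair
print(zero_crossing_rate([0.1, 0.2, 0.3]))         # no sign change
```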
3.1.5. Bandwidth. Bandwidth is the difference between the upper and lower frequencies of a continuous band of frequencies.

Valley.
$$\mathrm{FValley}_{j,k} = \log\left(\frac{1}{\alpha N}\cdots\right)$$

3.1.8. Pitch. Pitch is the perceived frequency of a sound, that is, whether an acoustic signal is heard as low or high, and it is analogous to the concept of fundamental frequency (f0). The basic methodology for measuring f0 is to study the waveform in the time domain. Autocorrelation was used to extract the pitch of the neonate cry signal in the present study.
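The autocorrelation approach to pitch can be sketched as follows: the lag at which a frame best matches a shifted copy of itself is taken as the pitch period. This is an illustrative sketch, not the authors' implementation. The search range of 250 Hz to 600 Hz is borrowed from the cry frequency range quoted in the Introduction, and the function name, sampling rate, and test tone are hypothetical.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min=250.0, f_max=600.0):
    """Estimate f0 as sample_rate / lag, where lag maximizes the autocorrelation.

    Only lags corresponding to f_min..f_max are searched (assumed range).
    Returns None if no lag gives a positive autocorrelation.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else None

# Hypothetical test tone: 400 Hz sampled at 8 kHz (period of 20 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(800)]
f0 = autocorr_pitch(tone, sample_rate)
print(f0)
```

Restricting the lag range keeps the estimator from locking onto multiples of the true period, a common failure mode of unconstrained autocorrelation.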

Formant. A formant is defined as a range of frequencies of a complex sound in which there exists an absolute or relative maximum in the acoustic spectrum. The term can refer either to a resonance or to the spectral maximum generated by the resonance. Formants are usually measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer. The first six formants are extracted from the signal frame, with the characteristic parameters represented as F1∼F6.

LPCC. LPCC [27] is predominantly employed in various audio recognition applications. LPCC involves modeling the human vocal tract using a digital all-pole filter. There are p LPCCs, which are grouped together to form one feature vector for a specific neonatal cry signal frame. In the present study, p is fixed at 12, and the twelve extracted features are represented as LPCC1∼LPCC12.

MFCC. In audio recognition systems, MFCC [26] is one of the most frequently adopted feature extraction techniques. The feature vectors are extracted from the frequency spectrum of the windowed neonatal cry signal frames. Assuming that p is the order of the Mel scale spectrum, the feature vectors are obtained from the first p DCT coefficients. In the present study, p is fixed at 12, and the twelve extracted features are represented as MFCC1∼MFCC12.

ΔMFCC. The objective of using ΔMFCCs is to enhance the capability to recognize cry signals. This is achieved through a better understanding of the dynamics of the power spectrum, i.e., the trajectories of MFCCs over time.
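One common way to capture these trajectories is the regression-based delta, in which each frame's delta is a weighted slope of the coefficients in neighboring frames. The paper does not say which delta formula it uses, so the formula, the window half-width of 2, and the edge handling (repeating the first and last frames) are assumptions of this sketch.

```python
def delta(coeffs, width=2):
    """Regression-based deltas over a window of +/- `width` frames (assumed formula).

    coeffs: list of frames, each a list of cepstral coefficients.
    Frames beyond either end are replaced by the nearest edge frame.
    """
    num_frames = len(coeffs)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(len(coeffs[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = coeffs[min(t + n, num_frames - 1)][k]
                prv = coeffs[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        deltas.append(row)
    return deltas

# Hypothetical single-coefficient track that rises by 1 per frame.
track = [[float(t)] for t in range(6)]
d = delta(track)
print(d[2][0], d[3][0])  # interior frames recover the slope of the ramp
```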

Feature Selection-Variable Selection Using Random Forests. The feature selection technique used in the present study is variable selection using random forests (VSURF), which is implemented using the "VSURF package" in the R environment. The steps involved in the VSURF algorithm are presented in this section. Firstly, the variables are ranked on the basis of their importance measure, and unimportant ones are eliminated from consideration. Secondly, two different subsets are obtained, either by considering a collection of nested RF models and selecting the variables of the most accurate model or by introducing the sorted variables sequentially. It is important to mention that each RF is typically built using ntree = 2000 trees. There are 12 features extracted in each frame. It is cumbersome and time consuming to use the 12 features directly to train a classifier. Hence, to reduce the computational time, the selection of discriminative features is necessary to achieve the optimum level of accuracy. As part of this study, five discriminative features, namely, peak, pitch, MFCCs, ΔMFCCs, and LPCC, are selected and used as features for training on the cry signals.

Grouped-Support-Vector Network. Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection in an n-dimensional hyperspace. The benefits of SVMs include their ability to perform effectively in high-dimensional spaces and in cases where the number of dimensions is greater than the number of samples. Another significant advantage is that the decision function uses only a subset of the training points, termed support vectors, which makes SVMs memory efficient. Furthermore, SVMs can simultaneously minimize estimation errors and model dimensions. The experimental analyses were implemented in the R open source platform. The SVM models in this work were built with the "e1071 package" in the R library. The major advantage of using this library is that it permits the alteration of the traditional SVM classification model and makes it possible to implement multiclass classification. To increase the computational speed, the "doMC package" and "foreach package" were used to permit the parallel development of the modules in the grouped model. The "doMC package" is a "parallel backend" for the "foreach package." It provides the mechanism required to execute "foreach" loops in parallel.
The "foreach package" provides a new looping construct for executing R code repeatedly. The main reason for using the "foreach package" is its support for parallel execution: in combination with "doMC," it can execute such repeated operations on multiple cores of the workstation. Executing this operation in parallel on multiple cores reduced the execution time back down to minutes. Furthermore, the final classification outcome of the grouped classifier is the combination of the predictions of the individual classifiers. Another vital point to note is that the individual SVM classification members are diverse and each has its own precise performance, which makes it easier for the grouped-support-vector network to reach a more accurate prediction. Figure 6 shows the infant cry classification using the grouped-support-vector network.

Boosting. Bootstrap aggregating, which is also known as bagging, is a predominantly used sampling technique. It is an ensemble method in which each classifier is trained on a random redistribution of the training dataset obtained by resampling. It thus incorporates the best of both bootstrap and aggregating techniques. In bootstrapping, "r" samples are chosen out of the "p" available samples with replacement. The learning algorithm is then run on each of these samples. The point of sampling with replacement is to ensure that the resampling performed is random in the truest sense. If sampling were performed without replacement, each sample drawn would depend on the previous ones and hence would not be random. The predictions from these models are aggregated to reach the final combined prediction. The aggregation can be based either on the predictions made or on the probabilities of the predictions made by the individual bootstrapped models. The random sampling method based on the bagging technique is applied repeatedly to obtain a group of member classifiers. Resampling the training subset of each classifier within the ensemble by bootstrapping helps achieve diversity. In such circumstances, a distinct training subset is drawn from the original training set by "resampling with replacement," thereby generating "m" subsets. Each of these generated subsets is then used to train a classifier within the ensemble.
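The resampling step described above can be sketched in a few lines: each of the "m" subsets is built by drawing "r" items from the available samples with replacement. The function name, the seed, and the toy labels are hypothetical and serve only to illustrate the idea.

```python
import random

def bootstrap_subsets(samples, m, r, seed=0):
    """Draw m training subsets of size r from `samples`, with replacement.

    Because each draw is independent, a sample may appear several times
    in one subset and not at all in another.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [[rng.choice(samples) for _ in range(r)] for _ in range(m)]

# Hypothetical pool of ten labelled cry frames.
pool = [f"cry_{i}" for i in range(10)]
subsets = bootstrap_subsets(pool, m=3, r=10)
for s in subsets:
    print(len(s), len(set(s)))  # subset size and number of distinct samples drawn
```

Each subset would then be used to train one member classifier of the ensemble.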
This ensemble is then used to predict a subset of unseen testing data, wherein the outputs of the classifiers inside the ensemble are combined using weighted majority voting. When the classifiers in the ensemble do not achieve similar prediction accuracy, giving more voting weight to the classifiers with higher accuracy is considered the best fit. This approach is known as weighted majority voting. Combining bootstrapping with the weighted majority voting aggregation method leads to a new category of ensemble-based systems called "boosting." The objective is to assign higher weights to classifiers that have high accuracy during the training process and lower weights to classifiers that have lower accuracy, which increases the probability of a correct final output of the ensemble model.
Extreme gradient boosting (XGBoost) is implemented using the "xgboost package." It is a scalable and efficient implementation of the gradient boosting framework, which performs parallel computation automatically. In XGBoost, the decision trees are built sequentially. Weights play an essential role in XGBoost: they are assigned to the independent variables, which are then fed to the decision tree that predicts the results. If the tree predicts inaccurately for some variables, their weights are increased, and the variables are then fed into a second decision tree. These individual classifiers are then combined into an ensemble to develop a stronger and more accurate model. The ensemble SVM model is initially optimized by identifying the best regularization parameter "C" for the training criterion and the bandwidth "γ" for the Gaussian kernel. To this end, a grid search is implemented using the parameters "C" = [1, 3,33] and "γ" = [0.1, 0.2, 0.3, ..., 5]. The entire dataset TR is first split into a training subset TR_m and a testing subset VA_m in an 80 : 20 ratio.
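The weighted majority voting described above can be illustrated with a short sketch in which each member classifier casts a vote for a class label and the vote counts in proportion to that classifier's weight. Using training accuracy directly as the weight is an assumption made for illustration; the paper does not give its exact weighting formula, and the labels and weights below are hypothetical.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per member classifier.
    weights: one non-negative weight per member classifier
             (here assumed to be its training accuracy).
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Hypothetical votes from three member SVMs for one cry frame.
votes = ["hunger", "sleep", "sleep"]
print(weighted_majority_vote(votes, [0.95, 0.40, 0.40]))  # one accurate member outweighs two weak ones
print(weighted_majority_vote(votes, [1.0, 1.0, 1.0]))     # equal weights reduce to plain majority voting
```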
The radial basis function (RBF) kernel is used in the development of the SVM regression models. The final SVM classifier is then built using these parameters before being added to the ensemble. This procedure is repeated 300 times until the entire ensemble is generated. The final ensemble classification is then calculated using the weighted majority voting approach.

Figure 7 illustrates the infant cry acquisition setup. The dataset used in this study consisted of 258 sleepy cries, 372 hunger cries, and 372 discomfort cries taken from 12 female and 17 male infants aged between one and ten days. These newborn babies were located at the Department of Obstetrics and Gynecology, National Taiwan University Hospital Yunlin Branch, Taiwan. The newborn babies were normal and had no pathological background. The infant cries were classified into three categories, namely, sleepy, hunger, and pain-induced cries. During cry acquisition, the infants were placed in the semisupine position: facing upward with the head resting on a cradle and the neck in a neutral position. The infant's arms were in a neutral thumb position. A Sony HDR-PJ10 High Definition Camcorder was used to record the infant cries at a 44.1 kHz sampling rate with 16-bit resolution. The distance between the infant's mouth and the camcorder microphone was around 40 cm. The recorded infant cries were between 10 and 60 seconds long.

This research was approved by the National Cheng Kung University Human Research Ethics Committee. The parents of the infants provided their written consent for participation in this study.

All experimental results were validated using 10-fold cross-validation. The benefit of this process is that all data samples are used in both training and validation, while each data sample is used only once for validation. Table 3 lists the dataset used for training and testing.
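The property noted above, that every sample is used for validation exactly once, can be seen in a minimal sketch of 10-fold index splitting. The function name `k_fold_indices`, the shuffling seed, and the way folds are formed are assumptions made for illustration; the study does not describe how its folds were constructed. The sample count of 1002 is the sum of the 258, 372, and 372 cries reported above.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training_indices, validation_indices) pairs for k-fold cross-validation."""
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffle once before splitting

    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 258 sleepy + 372 hunger + 372 discomfort cries = 1002 samples.
splits = list(k_fold_indices(1002, k=10))
```

Each of the ten iterations trains on nine folds and validates on the remaining one, so the validation sets are disjoint and together cover the whole dataset.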

Infant Cry Classification Based on the 12 Extracted Features.
The twelve extracted features were used in this experiment. Eighty percent of the cry signals were used for training, and twenty percent were used for testing. Table 3 collates the number of cries used in the training and testing of the dataset.
In total, there are 372 hunger cries, 258 sleep cries, and 372 discomfort cries. Table 4 presents the classification accuracy based on the 12 extracted features. The mean cry classification accuracy is around 91 percent, and the discomfort cries have the highest classification accuracy at 95 percent. Figure 8 shows the receiver operating characteristic curve of the proposed model with 12 features. The accuracy is computed from the following expression:

$$\text{accuracy} = \frac{\text{number of correctly predicted cries}}{\text{total number of cry predictions}}.$$
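The accuracy expression above, together with the per-class accuracy reported for each cry type, can be computed as in the following minimal sketch. The function names and the four example labels are hypothetical toy values chosen for illustration; they are not results from the study.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one prediction is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately over the samples of each true class."""
    totals, correct = {}, {}
    for t, p in zip(true_labels, predicted_labels):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            correct[t] = correct.get(t, 0) + 1
    return {label: correct.get(label, 0) / totals[label] for label in totals}


# Toy example (made-up labels): one of the two hunger cries is misclassified.
y_true = ["hunger", "hunger", "sleep", "discomfort"]
y_pred = ["hunger", "sleep", "sleep", "discomfort"]
overall = accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)
```

In this toy case three of the four predictions are correct, so the overall accuracy is 0.75, while the per-class values show that the error is confined to the hunger class.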

Infant Cry Classification Based on the Selected 5 Features.
In this experiment, the dataset exemplified in Table 2 was employed. Table 5 presents the infant cry classification accuracy based on the selected 5 features, namely, Peak, Pitch, MFCCs, ΔMFCCs, and LPCC. The discomfort cries have the highest classification accuracy of 96 percent, and the mean cry classification accuracy is around 95 percent. Figure 9 compares the classification accuracy with and without feature selection. Figure 10 depicts the receiver operating characteristic curve of the proposed model with 5 features.

Comparison of Infant Cry Classification between Male and Female Babies.
In experiment 3, the classification scenario of experiment 2 is used to examine the variation in cries between male and female infants. A total of 422 cries, comprising 193 male and 229 female infant cries, was tested. The classification accuracy for each gender is displayed in Table 6. The classification accuracies for male and female infant cries are 93.78% and [...].

Table 7 compares the proposed model with other models on different infant cry datasets. The proposed grouped-support-vector network provides better mean classification accuracy than Chang et al. [36] for hunger, sleep, and discomfort cries. When there is a clear margin of separation between the emotional cry classes, the proposed extreme gradient boosting-powered grouped-support-vector network works very well and is effective in high-dimensional spaces.

Conclusion
We studied numerous details about infants using newborn cry signals. Several studies in the literature address infant cry classification using different approaches. This research proposed an extreme gradient boosting-powered grouped-support-vector network for infant cry classification, with a mean accuracy of about 91% for the majority of the experimental scenarios. Initially, 12 features were extracted using acoustic feature engineering, and variable selection using random forests (VSURF) was used to select the highly discriminative features. The dataset used in this study comprised 258 sleepy cries, 372 hunger cries, and 372 discomfort cries taken from 12 female and 17 male infants aged between one and ten days. The newborn babies were normal and had no pathological background. The empirical results show that the proposed method provides a mean classification accuracy of around 91%, and the approach captures even subtle changes in the neonate cry signals with a fast recognition rate of 27 seconds. Although the proposed method performs reasonably well for neonate cries with minimal noise, its performance could be held back for cries with high noise levels. Therefore, in the present dataset, the unwanted noise signals were removed during the preprocessing stage. For future work, we plan to deploy advanced deep learning and optimization approaches to achieve improved performance.

Data Availability
The original contributions generated for this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.