Predicting Pulsars from Imbalanced Dataset with Hybrid Resampling Approach



Introduction
A pulsar is a stellar remnant, often formed from the remains of a collapsed giant star. Usually a neutron star, a pulsar is small in size but contains a large amount of mass. Despite being uncommon, pulsars are very important for scientists studying nuclear physics, general relativity, gravitational waves, and the factors leading to collisions of black holes. In 1967, Jocelyn Bell and Antony Hewish accidentally discovered a pulsar while they were studying distant galaxies [1]. Looking at a particular point through the telescope, they noticed radiation pulses and named the source little green men 1 (LGM-1). Later, these unidentified objects were termed pulsars because their emission arrives as pulses. They are now designated as pulsating sources of radiation (PSR), and the designation PSR B1919+21 shows the position of the pulsar in the sky [2]. The emission pattern of each pulsar varies with each rotation, so it is averaged over several rotations to decide whether a source is a pulsar candidate. Without enough radiation, it is very difficult to detect a true pulsar. However, detection is possible under certain conditions, such as when the beam is angled towards Earth or when X-ray bursts are produced by the detonation known as a supernova.
Pulsars are rapidly rotating astronomical objects, detected as neutron stars, that emit radiation at a rate of 100,000 km/s to 150,000 km/s with regular intervals and patterns. Through these rays, pulsars emit electromagnetic power, which gradually slows them down, and pulsars become quiet within ten to a hundred million years. According to the Australia Telescope National Facility (ATNF) catalogue, around 2801 pulsars have been identified [3,4], while an estimated 20,000 to 100,000 pulsars are present in our galaxy, indicating that about 90% of the pulsars are yet to be identified [5]. Detecting a true pulsar is not a trivial task because the pulsar must be found in noisy time series data. Each pulsar produces a slightly different signal pattern that distinguishes it from other signals, and these patterns are called pulsar profiles. In practice, pulsar detection is hampered by radio frequency interference, which makes the identification of legitimate signals very hard. The signals that fulfill the criteria for pulsars are termed "candidates" and may turn out to be new pulsars.
Several automated and human-based methods are used to identify the legitimate candidates for pulsars, and this process is known as "candidate selection" [6]. Until the 2000s, candidates were selected manually, which generally requires 1-300 s for inspecting each observation [7]. Therefore, manual inspection of 1 million candidates needs up to roughly 80,000 person-hours, which makes manual classification of pulsar candidates impractical. Consequently, other techniques, such as graphical and automated methods, have been developed for pulsar candidate identification. However, these techniques are computationally expensive, and much work is still required to improve the speed and sensitivity of the algorithms [8].
Over time, algorithms reduced the share of noise in pulsar signals, and the signal-to-noise ratio (SNR) became an important factor for pulsar detection. In pulsar astronomy, another important feature, the dispersion measure (DM) of the pulsar, is also used [9]. The delay of the pulse is associated with the DM and the radio frequency and has been regarded as an important feature for finding pulsars. Both supervised and unsupervised approaches can be used to perform pulsar detection. For example, unsupervised approaches can group the pulsar data into different clusters, whereby the features of each cluster can be further analyzed to select pulsar candidates. This approach is particularly useful for large amounts of unlabeled data. For the HTRU2 dataset, the labels are added by experts, so supervised machine learning models seem appropriate. One major limitation of recent works on pulsar detection is the use of imbalanced data. HTRU2 contains a large number of non-pulsar samples and very few pulsar samples, which affects the performance of the classification models.
This imbalanced dataset can lead to model overfitting on majority class data. For such models, even though high accuracy is reported, the F1 score is significantly lower than the accuracy. Despite the proposal of several automated approaches for finding pulsars, the gap between the provided and the desired accuracy and sensitivity demands further research in this domain. To this end, this study proposes an automated approach for true pulsar prediction using supervised machine learning algorithms and makes the following contributions:
(i) This study devises a methodology for automatic detection of pulsars using supervised machine learning algorithms. For this purpose, the performance of several well-known machine learning algorithms is analyzed, namely random forest (RF), extra tree classifier (ETC), gradient boosting classifier (GBC), and logistic regression (LR). In addition, a multilayer perceptron (MLP) is included in the study.
(ii) The HTRU2 dataset is used for conducting experiments, and the influence of dataset imbalance is extensively investigated. Three resampling approaches, synthetic minority oversampling technique (SMOTE), adaptive synthetic (ADASYN), and cluster centroids (CC), are studied for their efficacy in data balancing. Ultimately, a hybrid data resampling approach, combined resampling (CR), is proposed to solve the data imbalance problem of the HTRU2 dataset.
(iii) Extensive experiments are performed to analyze the effect of data balancing with SMOTE, ADASYN, CC, and CR on pulsar detection accuracy. Experimental results and a performance comparison with state-of-the-art approaches show that the CR approach performs better than the other resampling approaches.
The rest of the paper is arranged in the following manner. Research papers related to the current study are discussed in Section 2. Section 3 describes the dataset, the machine learning algorithms used for experiments, the resampling approaches, and the details of the proposed hybrid resampling. Results and discussions are presented in Section 4, while Section 5 provides the conclusion.

Related Work
Due to the importance of the detection task for true pulsar stars, several automated approaches have been proposed. These approaches can be broadly categorized into three groups: machine learning approaches, deep learning approaches, and approaches focusing on feature importance. Due to the success of machine learning approaches for various tasks such as classification, object detection, and text analysis, a large number of machine learning-based methods are available in general [10]. However, the pulsar detection domain is not extensively studied and lacks the desired accuracy. The authors present a machine learning-based approach for pulsar selection in [7]. It deals with 16 million pulsar candidates obtained from the reprocessing of the Parkes multibeam survey dataset. A radio transient detection method that combines V-FASTR with random forest is proposed in [11]. It can automatically sift through known event types with 98.6% accuracy on the training data and 99% on the test data. The authors of [12] utilize six different models to classify dispersed pulse groups using a single-pulse search framework. The dataset used in that research contains 300 pulsar examples and 9600 non-pulsar examples. Several datasets have been generated using different imbalance treatments. Experimental results show that the multiclass ensemble tree learner has high performance and a low false positive rate when used with oversampled data. The study [13] used different machine learning algorithms, such as GBC, AdaBoost, and XGBoost, for the classification of pulsar candidates. To deal with the data imbalance problem, SMOTE is used for oversampling the minority class in the dataset. Several important features from each algorithm are determined for pulsar classification. The major issue with this technique is that the accuracy of radio frequency interference classification is very sensitive to feature selection.
The authors present a hybrid machine learning model, random tree boosting voting classifier (RTB-VC), in [14] for pulsar star prediction. RTB-VC combines tree-based classifiers for training on the HTRU2 dataset. RTB-VC uses various combinations of hard voting, soft voting, and weighted voting to obtain high accuracy. A 98.3% F1 score is reported using the proposed RTB-VC model.
Due to the deployment of deep learning approaches in diverse fields for classification and their high accuracy, several deep learning-based models have been adopted for pulsar detection and classification. For example, the authors of [15] used a convolutional neural network (CNN) based on the ResNet model in the PCIS algorithm for pulsar detection. On the GBNCC dataset, the proposed system achieved 96% accuracy. Similarly, the research [16] uses an artificial neural network (ANN) for finding true pulsar stars in the HTRU dataset. It detects 85% of the pulsars in a blind analysis and also dismisses 99% of noisy candidates. Both studies greatly improved recall and decreased the false positive rate. However, the feature selection method they use is simple, based on hypotheses, and subject to experience. Human errors are therefore easily introduced, which readily affects the performance of these approaches.
The study [17] focused on pulsar classification using a hierarchical deep neural network (DNN). To reduce the training time of the DNN, pseudoinverse learning (PIL) is preferred over the gradient descent (GD) method. The proposed model provides 94.65% and 87.66% F1 scores for the HTRU Medlat and PMPS-26k datasets, respectively. Despite the lower F1 score compared to CNN + BPNN, the training time for the proposed model is five times lower than that of traditional CNN models. A swift model for the elimination of radio frequency interference (RFI) in pulsar data was proposed in [22]. For learning RFI signatures of real pulsars, a PIL-based single hidden layer autoencoder (AE) was used. Results indicate that the AE is more robust in learning RFI signatures and can be used to remove them from fast-sampled spectra. As a result, the signals from real pulsars can be obtained. The study [20] investigated pulsar classification using three datasets: the HTRU mid-latitude dataset, the MNIST dataset, and the CIFAR-10 dataset. In the first stage, strong representations for the pulsar candidate are developed in the image domain by extracting deep features with a deep convolutional generative adversarial network (DCGAN). During the second stage, an MLP-based classifier is defined using a pseudoinverse learning autoencoder (PILAE). For data imbalance, the SMOTE oversampling technique is used. The achieved accuracy on the HTRU dataset with different data splitting ratios is 100%. On the MNIST dataset, 97.50% accuracy is achieved, while CIFAR-10 shows an accuracy of 100%.
The study [6] extracted eight unbiased statistical features, including mean, kurtosis, variance, and skewness, from the DM curve and the pulse profile curve and designed the Gaussian-Hellinger fast decision tree for imbalanced data. Using the statistical features on two datasets, HTRU-1 and LOTAAS, 92.8% recall is achieved with a false positive rate of only 0.5%. The research discovered 20 new pulsars from the LOTAAS dataset using the same strategy. A hierarchical candidate sifting model (HCSM) was proposed in [18], where the cost of incorrect prediction of positive samples is emphasized and multiple classifiers are assembled. Handcrafted features from three datasets, HTRU, HTRU-1, and LOTAAS, are used to train three classifiers, which collectively make up the ensemble classifier. Emphasizing the positive examples and assigning higher weights to them produces better results with the proposed model. HCSM achieves a recall value of 97.49% for the HTRU dataset, 84.52% for the HTRU-1 dataset, and 100% for the LOTAAS dataset. A summary of the discussed research works is presented in Table 1.

Dataset.
On account of the importance of pulsar detection, several datasets have been provided for this task over the years. For the current study, the HTRU2 dataset from Kaggle is used, which was collected during the High Time Resolution Universe (HTRU) survey [6,23,24]. The Pulsar Feature Lab tool is used to extract pulsar feature data from candidate files [25]. Table 2 shows the number of samples for the pulsar and non-pulsar classes, while Table 3 describes the features of the dataset.

Problem Statement.
Keeping in view the results of the related studies discussed in the previous section, it is clear that the datasets used for experiments are not balanced. In particular, the most commonly used dataset, HTRU2, is highly imbalanced. Only 1,639 samples belong to the pulsar class out of the total 17,898 samples. The class imbalance results in model overfitting because machine learning models tend to give higher weight to the class with the higher number of samples. As a result, the F1 score is affected despite good accuracy results from the machine learning models. This study aims at solving this problem by proposing a hybrid resampling approach to achieve high pulsar detection accuracy.

Data Resampling for Imbalanced Dataset.
Looking at the statistics of the dataset given in Table 2, only 1,639 out of 17,898 examples are pulsars while 16,259 are non-pulsars. This is roughly a 1 : 10 ratio, which makes the dataset highly imbalanced because the class distribution is skewed towards a specific class. Data imbalance degrades classification performance because machine learning classifiers tend to favor the majority class during training. Several approaches can be utilized to deal with the data imbalance. For the present study, three existing data resampling approaches are adopted.
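To make the effect of this skew concrete, the short sketch below works through the arithmetic using only the class counts quoted above. The "always predict non-pulsar" baseline is a hypothetical illustration, not a model evaluated in the study.

```python
# Class counts for HTRU2 as stated in the text.
N_PULSAR = 1_639
N_NON_PULSAR = 16_259
N_TOTAL = N_PULSAR + N_NON_PULSAR

# Imbalance ratio (non-pulsar : pulsar).
imbalance_ratio = N_NON_PULSAR / N_PULSAR

# Hypothetical baseline: label every candidate "non-pulsar".
tp, fp = 0, 0                        # nothing is ever predicted as pulsar
fn, tn = N_PULSAR, N_NON_PULSAR

baseline_accuracy = (tp + tn) / N_TOTAL
baseline_recall = tp / (tp + fn)
# Precision is undefined (0/0) here; F1 is conventionally taken as 0.
baseline_f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

print(f"total samples     : {N_TOTAL}")
print(f"imbalance ratio   : 1 : {imbalance_ratio:.2f}")
print(f"baseline accuracy : {baseline_accuracy:.4f}")
print(f"baseline recall   : {baseline_recall:.4f}")
print(f"baseline F1       : {baseline_f1:.4f}")
```

A classifier that never finds a pulsar still scores about 91% accuracy on these counts, which is why accuracy is read alongside the F1 score throughout the study.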

Synthetic Minority Oversampling Technique.
SMOTE is a widely used oversampling technique to manage imbalanced data [26]. When class distributions are skewed towards a specific class, an imbalanced data problem arises. SMOTE increases the number of data instances by generating random synthetic data of the minority class from its nearest neighbors using Euclidean distance. The newly generated instances are very similar to the original data because they are derived from the original features [27]. SMOTE is not the best option for high-dimensional data because it can create additional noise, but this is not the case with the HTRU2 dataset used in the current study. SMOTE is adopted based on the results reported in [12,28], where the data have a ratio of 1 : 10, just like the current study. By generating samples for the minority class using SMOTE, we get a 1 : 1 ratio of pulsar and non-pulsar samples, as shown in Table 4. For the SMOTE implementation, we used an open-source Python toolbox called imbalanced-learn, which builds on Scikit-learn, SciPy, and NumPy.
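The interpolation step described above can be sketched from scratch as follows. This is a minimal illustration in plain Python rather than the imbalanced-learn implementation used in the study; the function name `smote`, the toy coordinates, and the choice of k are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy 2-D data (hypothetical values, not HTRU2): 4 minority vs 10 majority.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(i), 5.0) for i in range(10)]

new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))
```

Because every synthetic point lies on a segment between two real minority samples, the new instances stay inside the region already occupied by the minority class.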

Adaptive Synthetic Resampling.
ADASYN is used for upsampling the minority class samples in an imbalanced dataset [29,30]. Being an enhanced form of SMOTE, ADASYN has been regarded as superior to SMOTE. ADASYN generates synthetic alternatives for observations of the minority class. The number of synthetic observations generated for a minority observation depends on how difficult that observation is to learn. An observation is "hard to learn" if several observations in the majority class have features similar to those of the minority class observation. When plotted in the feature space, such an observation is surrounded by majority class instances, which makes it harder for the models to learn. Due to its efficiency and reliability, ADASYN is widely used in applications such as cancer detection and credit card fraud detection.
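The "hard to learn" weighting can be illustrated with a small sketch that only computes how many synthetic samples each minority point would receive; the samples themselves would then be interpolated as in SMOTE. The function name `adasyn_allocation`, the toy 1-D data, and k = 3 are assumptions for illustration and do not reproduce the library implementation.

```python
import math


def adasyn_allocation(minority, majority, n_new, k=5):
    """Return how many synthetic samples to request per minority point:
    proportional to the share of majority points among its k nearest
    neighbours in the full dataset."""
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    hardness = []
    for x in minority:
        nearest = sorted(
            (item for item in labelled if item[0] is not x),
            key=lambda item: math.dist(x, item[0]),
        )[:k]
        hardness.append(sum(1 for _, label in nearest if label == 0) / k)
    total = sum(hardness)
    if total == 0:  # no minority point has majority neighbours
        return [0] * len(minority)
    return [round(h / total * n_new) for h in hardness]


# Toy 1-D data (hypothetical): one minority point sits inside the majority.
majority = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (0.5,)]
minority = [(0.25,), (5.0,), (5.1,), (5.2,)]

counts = adasyn_allocation(minority, majority, n_new=6, k=3)
print(counts)
```

In this toy case the minority point embedded in the majority cluster receives the largest share of new samples, while the well-separated points receive fewer.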
3.6. Cluster Centroids.
Besides the SMOTE and ADASYN oversampling approaches, this study utilizes the cluster centroid undersampling approach to downsize the majority class. During this process, clusters of the majority class are formed and each whole cluster is replaced with its centroid to undersample it. For this purpose, the current study uses the K-means algorithm to find the clusters of the majority class.
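A from-scratch sketch of this idea is given below: the majority class is clustered with a plain K-means loop and replaced by as many centroids as there are minority samples. The helper name `kmeans_centroids`, the toy points, and the fixed iteration count are assumptions for illustration; the study itself relies on a library implementation.

```python
import math
import random


def kmeans_centroids(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[c]  # keep old centroid if cluster empty
            for c, cluster in enumerate(clusters)
        ]
    return centroids


# Toy data (hypothetical): 12 majority points, 3 minority points.
majority = [
    (x + 0.1 * j, y) for x, y in [(0, 0), (5, 5), (10, 0)] for j in range(4)
]
minority = [(2.0, 8.0), (2.5, 8.5), (3.0, 8.0)]

# Replace the majority class by as many centroids as minority samples.
undersampled_majority = kmeans_centroids(majority, k=len(minority))
print(len(undersampled_majority), len(minority))
```

The undersampled majority class then consists of centroids rather than original records, which is what distinguishes this approach from random undersampling.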

Supervised Machine Learning Models.
Several types of machine learning models are available for classification. The open-source library Scikit-learn helps researchers to solve classification problems using machine learning and ensemble learning [31]. Instead of devising new models, well-known and established models are selected on the basis of their reported performance, and their performance is optimized by tuning several hyperparameters. The machine learning models used in this research are RF, LR, GBC, ETC, and MLP. The list of tuned parameters is provided in Table 5.

Random Forest.
RF is a tree-based ensemble learning model that produces accurate predictions by combining many weak learners [32]. The bagging technique is used, where a variety of decision trees are trained on different bootstrap samples [33]. A bootstrap sample is derived by sampling the training dataset with replacement, where the size of the sample is the same as that of the training dataset. RF uses decision trees for the prediction process, and a major issue in the construction of decision trees is identifying the attribute for the root node at each level. This process is termed attribute selection. In ensemble classification, several classifiers are trained and their results are pooled through a voting process. Many researchers have proposed ensemble learning approaches [34-36]. The most widely used ensemble learning methods are bagging [37] and boosting [38,39]. In the bagging (or bootstrap aggregating) technique, classifiers are trained on bootstrap samples to minimize the variance of classification. RF has the following mathematical form:

$$p = \operatorname{mode}\{T_1(y), T_2(y), \ldots, T_m(y)\},$$

where $p$ is the final prediction by the majority of decision trees and $T_1(y), T_2(y), \ldots, T_m(y)$ are the $m$ decision trees taking part in the prediction process.
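The two ingredients described above, bootstrap sampling and majority voting, can be sketched in a few lines. The helper names and the example votes are hypothetical; no actual trees are trained here.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Sample len(data) items with replacement, as in bagging."""
    return [rng.choice(data) for _ in data]


def majority_vote(tree_predictions):
    """Return the most frequent class label among the tree predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
training_set = list(range(10))       # stand-in for 10 training records
sample = bootstrap_sample(training_set, rng)

# Hypothetical votes of m = 5 trees for one candidate (1 = pulsar).
votes = [1, 0, 1, 1, 0]
p = majority_vote(votes)
print(len(sample), p)
```

A bootstrap sample has the same size as the training set but typically contains repeated records, so each tree sees a slightly different view of the data.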
3.9. Gradient Boosting Classifier.
In GBC, several weak learning classifiers work together to create a strong learning model. Gradient boosting is time-consuming and computationally expensive because it builds many trees one after another, each fitted to the errors of the previous ones. Gradient boosting has been previously used by several studies in astronomy [24]. For example, study [40] uses GBC for photometric classification of supernovae, while study [41] uses GBC for the detection and classification of galaxies using the Galaxy Zoo catalogue. Mean square error (MSE) is used as the loss in GBC,

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(h_i - h_i^{p}\right)^2,$$

and the predictions are updated in steps scaled by the learning rate $r$, where $h_i$ is the actual value, $h_i^{p}$ is the predicted value, and $\sum_i (h_i - h_i^{p})$ is the sum of all residual values. Training drives the residuals towards 0, so the predicted values end up very close to the actual values.
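The residual-fitting loop behind gradient boosting can be sketched on a toy regression problem. This is a simplified illustration with one-split regression stumps and squared error, not the Scikit-learn classifier used in the study; the data, the learning rate r = 0.5, and the helper names are assumptions.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def mse(targets, predictions):
    """Mean of squared residuals (h_i - h_i^p)^2."""
    return sum((h - hp) ** 2 for h, hp in zip(targets, predictions)) / len(targets)


# Toy 1-D data (hypothetical): targets h_i for inputs x_i.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hs = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

r = 0.5                                   # learning rate
preds = [sum(hs) / len(hs)] * len(hs)     # start from the mean target
history = [mse(hs, preds)]
for _ in range(10):
    residuals = [h - hp for h, hp in zip(hs, preds)]
    stump = fit_stump(xs, residuals)      # each new learner fits the residuals
    preds = [hp + r * stump(x) for hp, x in zip(preds, xs)]
    history.append(mse(hs, preds))

print([round(v, 4) for v in history[:4]])
```

Each round fits a new weak learner to the current residuals and adds a fraction r of its output, so the MSE shrinks step by step.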

Extra Tree Classifier.
ETC, also known as extremely randomized trees, is a meta-estimator that fits a number of randomized decision trees on various subsamples of the dataset. It uses averaging to improve the accuracy and to control overfitting of the model. ETC works similarly to RF, but the difference lies in the construction of the trees in the forest. In ETC, each tree is built from the original training sample. At each node, a random sample of K features is considered, and the Gini index is used to select the best feature to split the data in the tree. ETC has been utilized to perform various tasks in astronomy. For example, study [42] uses the ETC model for neutrino detection from a point-like source in collaboration with KM3NeT, the cubic kilometer neutrino telescope.

3.11. Logistic Regression.
LR is a statistical method used to deal with classification problems. LR analyzes the data to estimate the probability of class membership. For classification problems where the target variables are categorical, LR is a common first choice. It models the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using the logistic function. A logistic curve or logistic function is a common "S"-shaped or sigmoid curve and is defined as

$$f(v) = \frac{L}{1 + e^{-m(v - v_0)}},$$

where $e$ is Euler's number, $v_0$ is the x-value of the sigmoid midpoint, $L$ is the curve's maximum value, and $m$ is the steepness of the curve. LR works well on binary classification and shows good performance for text classification as well [43,44].
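The logistic function is straightforward to evaluate directly. The sketch below uses the symbols L, m, and v0 from the definition; the 0.5 decision threshold and the helper `predict_label` are assumptions added for illustration.

```python
import math


def logistic(v, L=1.0, m=1.0, v0=0.0):
    """Logistic curve with maximum L, steepness m, and midpoint v0."""
    return L / (1.0 + math.exp(-m * (v - v0)))


def predict_label(score, threshold=0.5):
    """Map a probability-like score to a class (1 = pulsar, 0 = non-pulsar)."""
    return 1 if score >= threshold else 0


for v in (-4.0, 0.0, 4.0):
    prob = logistic(v)
    print(v, round(prob, 3), predict_label(prob))
```

At the midpoint v = v0 the function returns exactly L/2, which for L = 1 corresponds to a probability of 0.5.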

Multilayer Perceptron.
An MLP consists of one or more layers of neurons. It is a feed-forward neural network model that maps a set of input data to a set of appropriate outputs, and every layer is fully connected. Data are fed into the input layer and pass through one or more hidden layers. The hidden layers provide the levels of abstraction, and predictions are made on the visible or output layer [45]. Multiple neurons can be stacked in one layer, and multiple layers give better predictive capacity. The MLP model used here consists of three layers: one input layer, one hidden layer, and one output layer. We used 32 neurons in the input layer with the ReLU activation function, 64 neurons in the hidden layer, and one neuron with a sigmoid activation function in the output layer. The rate used for the dropout layer is 0.2. For compilation, we used the Adam optimizer and the binary_crossentropy loss function, and the model is trained for 100 epochs.
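The layer sizes above can be checked with a plain-Python forward pass. This is only a shape-level sketch with random, untrained weights: the input width of 8 features, the ReLU activation in the 64-unit layer, and the weight initialization are assumptions, and training with Adam and dropout is not reproduced.

```python
import math
import random

LAYER_SIZES = [8, 32, 64, 1]  # 8 inputs is an assumption about the feature count


def init_layer(n_in, n_out, rng):
    """Random small weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W x + b) for every output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
layers = [init_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def forward(features):
    """Inference-time pass; dropout (rate 0.2) is only active in training."""
    h = dense(features, *layers[0], relu)      # 32 units, ReLU (from the text)
    h = dense(h, *layers[1], relu)             # 64 units, ReLU is an assumption
    return dense(h, *layers[2], sigmoid)[0]    # 1 unit, sigmoid


candidate = [0.5] * 8   # hypothetical, already-scaled feature vector
print(n_params, forward(candidate))
```

Under these assumptions the network has 2,465 trainable parameters, and the sigmoid output can be read as the predicted probability of the pulsar class.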

Proposed Resampling Approach.
This study proposes a data resampling approach called combined resampling (CR). CR is a resampling technique that concatenates the results of three resampling techniques, SMOTE, ADASYN, and CC, to enhance the prediction results. The results of all three resampling techniques are concatenated record-wise (along axis 0), which increases the size of the data. CR is defined as

$$\mathrm{HTRU2}_{(\text{pulsar},\,\text{non-pulsar})} = \text{pulsar and non-pulsar examples},$$

$$\mathrm{ADASYN}_{(i,j)} = \mathrm{ADASYN}(\mathrm{HTRU2}), \quad \mathrm{SMOTE}_{(u,v)} = \mathrm{SMOTE}(\mathrm{HTRU2}), \quad \mathrm{CC}_{(p,q)} = \mathrm{CC}(\mathrm{HTRU2}),$$

$$\mathrm{CR}_{(m,n)} = \mathrm{concat}\left(\mathrm{ADASYN}_{(i,j)},\ \mathrm{SMOTE}_{(u,v)},\ \mathrm{CC}_{(p,q)}\right).$$

Here, $\mathrm{ADASYN}_{(i,j)}$ refers to the output data after balancing the target ratio using the ADASYN technique; similarly, $\mathrm{SMOTE}_{(u,v)}$ and $\mathrm{CC}_{(p,q)}$ are the data outputs after SMOTE and CC are applied to the original HTRU2 dataset. The indices $i$, $u$, $p$ represent the number of features/attributes, and $j$, $v$, $q$ represent the number of records. $\mathrm{CR}_{(m,n)}$ is the concatenation result of these three resampling techniques, where $m = i = u = p$ is the number of attributes and $n = j + v + q$ is the number of records. Figure 1 illustrates the proposed CR approach to perform resampling from the original dataset, where RS1, RS2, and RS3 are resampled instances of data from the three different techniques, which are combined to make the new sampled dataset.
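The concatenation step can be sketched as a record-wise stacking of three resampled datasets. The rows below are hypothetical stand-ins for the ADASYN, SMOTE, and CC outputs, and `combined_resampling` is an illustrative helper name rather than code from the study.

```python
def combined_resampling(*resampled_sets):
    """Stack resampled datasets record-wise; every set must share the same
    number of attributes."""
    widths = {len(row) for rows in resampled_sets for row in rows}
    if len(widths) != 1:
        raise ValueError("all resampled sets must have the same attributes")
    combined = []
    for rows in resampled_sets:
        combined.extend(rows)
    return combined


# Hypothetical stand-ins: each row is (feature_1, feature_2, label).
adasyn_out = [(0.1, 0.2, 1), (0.3, 0.1, 0), (0.2, 0.2, 1), (0.9, 0.8, 0)]
smote_out = [(0.1, 0.2, 1), (0.3, 0.1, 0), (0.15, 0.25, 1), (0.9, 0.8, 0)]
cc_out = [(0.1, 0.2, 1), (0.6, 0.45, 0)]

cr = combined_resampling(adasyn_out, smote_out, cc_out)
n = len(cr)          # total records: sum of the three record counts
m = len(cr[0])       # attribute count is unchanged
print(m, n)
```

Note that records shared by the three outputs, such as original pulsar samples, appear more than once in the concatenated result.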

Proposed Methodology for Pulsar Detection.
For detecting pulsars, the current study leverages the supervised machine learning approach. Ensemble and hybrid approaches are very popular in machine learning, and a number of studies leverage hybrid and ensemble models for a variety of tasks in several domains such as image processing, classification, and text analysis [46,47]. For example, study [48] uses a stacked generalization technique and an ensemble learning approach for pulsar prediction. Similarly, ensemble approaches are used for predicting the numeric scores for Google apps in [49]. Hybrid or ensemble approaches are also used for text analysis [50]. The results reported for hybrid approaches provide the motivation to utilize a hybrid approach for the task at hand. The flow of the proposed methodology is shown in Figure 2. As the first step, the HTRU2 dataset is obtained from Kaggle. The HTRU2 dataset contains pulsar and non-pulsar examples in an unequal ratio, with non-pulsar examples as the majority class and pulsar examples as the minority class. Because data imbalance influences the performance of the classifiers, this problem is addressed using the proposed approach. To analyze the influence of data splitting on the prediction accuracy, two orderings are evaluated: splitting before resampling and resampling before splitting, with a 70 : 30 split in both cases. When data are split before resampling, resampling is applied only to the training set. For data balancing, the CC, SMOTE, ADASYN, and CR techniques are used. Table 6 shows the count of pulsar and non-pulsar samples when resampling is performed before splitting, and Table 7 shows the count of pulsar and non-pulsar samples when resampling is applied only to the training set. After data splitting and resampling, the machine learning models, RF, ETC, GBC, LR, and MLP, are trained using 70% of the data. The remaining 30% is used to evaluate the trained models. The evaluation is performed using accuracy, precision, recall, and F1 score.
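The "split before resampling" ordering can be sketched as follows. The stratified 70 : 30 split matches the ratio stated above, but the random-duplication balancer is only a stand-in for SMOTE, ADASYN, CC, or CR, and all names and toy data are assumptions.

```python
import random


def stratified_split(rows, test_fraction=0.3, seed=0):
    """70:30 split that keeps the class ratio in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({row[-1] for row in rows}):
        group = [row for row in rows if row[-1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def oversample_minority(rows, seed=0):
    """Stand-in balancer: duplicate random minority rows until classes match."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[-1] == 1]
    neg = [r for r in rows if r[-1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


def count_label(rows, label):
    return sum(1 for r in rows if r[-1] == label)


# Toy data (hypothetical): 10 pulsars (label 1) and 90 non-pulsars (label 0).
data = [(float(i), 1) for i in range(10)] + [(float(i), 0) for i in range(90)]

train, test = stratified_split(data)
balanced_train = oversample_minority(train)   # the test set is left untouched

print(count_label(balanced_train, 1), count_label(balanced_train, 0))
print(count_label(test, 1), count_label(test, 0))
```

With this ordering the model is trained on balanced data but evaluated on a test set that keeps the original class ratio.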

Performance Evaluation Metrics.
Several performance evaluation methods are used to evaluate the machine learning models. A blend of different evaluation tools is helpful to determine the efficacy of an approach [51]. Therefore, in this research, four well-known metrics are used: accuracy, precision, recall, and F1 score. In addition, the confusion matrix shows the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), which are used to calculate the values for accuracy, precision, recall, and F1 score. These metrics are calculated using the following equations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Recall} = \frac{TP}{TP + FN},$$

$$\text{F1 score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}.$$
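As a worked example, the four metrics can be computed directly from confusion-matrix counts. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; the helper name is an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical confusion-matrix counts, chosen only for illustration.
metrics = classification_metrics(tp=90, tn=870, fp=10, fn=30)
for name, value in metrics.items():
    print(f"{name:9s} {value:.4f}")
```

In this example the accuracy is 0.96 while the F1 score is about 0.82, the kind of gap that appears when the positive class is rare.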

Results and Discussion
This study performs experiments on a 7th-generation Core i7 machine running Windows 10. The machine learning algorithms are implemented in Python using Jupyter Notebook.

Results without Resampling.
The performance of machine learning models without data resampling is shown in Table 8. RF performs best among the models, with scores of 0.980 and 0.887 for accuracy and F1 score, respectively. The performance of LR is marginally lower, with 0.980 accuracy and 0.885 F1 score. A noteworthy point is the difference between the prediction accuracy and the F1 score. Such a difference is often caused by data imbalance. Models overfit due to the high number of samples in the majority class and make false predictions for the minority class, leading to the gap between the prediction accuracy and the F1 score.

Results Using CC Undersampling.
To improve the performance of machine learning models, data resampling is carried out using the CC technique. The CC technique is used for data balancing and reduces the chances of model overfitting. It is an undersampling approach that reduces the number of samples of the majority class by replacing clusters of majority records with their centroids, thus making the number of samples of the majority and minority classes equal.
Results given in Table 9 indicate that the difference between the prediction accuracy and the F1 score is reduced after applying the resampling. Using an equal number of samples for training reduces the probability of model overfitting and narrows the gap between accuracy and the other performance evaluation metrics. On the other hand, the overall performance of the machine learning models is reduced as well. The primary reason for this drop in performance is the size of the data used for training the models. Being a data undersampling approach, CC reduces the size of the data, which affects model training and leads to performance degradation. Despite the decrease in the performance of the different models, RF shows the best performance with the undersampled data and achieves an accuracy score of 0.943 and an F1 score of 0.940. The performance of the other classifiers is similar, except for MLP, which achieves an accuracy of 0.905 and an F1 score of 0.898.

Results Using SMOTE Oversampling.
The performance of machine learning models after data oversampling is shown in Table 10. Results indicate that the performance of the machine learning models improves when they are trained on data oversampled with SMOTE. Oversampling increases the size of the data, which provides a larger training set for the models and boosts their prediction accuracy. Among the machine learning models, ETC outperforms all others with an accuracy of 0.982 and an F1 score of 0.982. All other models also show improvement in their performance with the SMOTE oversampling technique. The performance of RF is slightly lower than that of ETC, with an accuracy of 0.976. Overall, tree-based models show prominent performance as compared to linear and neural network models. Tree-based models perform significantly better due to their ensemble architecture. ETC, RF, and GBC combine several decision trees in their learning and prediction procedures and perform better on the HTRU2 dataset.

Results after Applying ADASYN Sampling.
For the current study, ADASYN oversampling is also used to balance the dataset. The performance of machine learning models using the ADASYN oversampled data is shown in Table 11. Results suggest that the performance of the machine learning models improves when they are used with ADASYN oversampled data. Tree-based models again outperform the linear model and MLP and achieve good scores for the performance evaluation metrics. For example, ETC achieves the highest accuracy score of 0.981 and an F1 score of 0.982. The performance of the linear model LR and the neural network model MLP dropped when used with ADASYN resampling because of the feature correlation among the newly generated samples.

Results with Proposed Combined Resampling.
For the proposed approach, resampled data from SMOTE, ADASYN, and CC are concatenated along axis 0, which increases the size of the data and leads to significant improvement in the performance of machine learning models. Results in Table 12 indicate that machine learning models perform better with the proposed CR sampling approach. Both ETC and RF achieve >99% accuracy with the CR technique and a similar F1 score, which indicates that the models do not experience overfitting when trained with CR resampled data. The elevated performance is due to the concatenation of resampled data from different sampling approaches. It provides the models with different variations of samples to learn from, making the training data more informative than that from an individual data resampling technique. As a result, the performance of machine learning models is significantly improved.
ETC outperforms the other models with all resampling techniques, most significantly with the proposed CR resampling approach, as shown in Figure 3. Figure 4 shows the confusion matrix of the best performer, ETC, with all resampling approaches. The confusion matrix shows that, with CR resampling, ETC makes 20,327 correct predictions out of 20,448 total predictions with only 121 false predictions. On the other hand, when ETC is used with SMOTE, 166 predictions are false and 9,590 predictions are correct out of 9,756 total predictions. Out of the 166 false predictions, 101 are made on the resampled data, which indicates that the data generated by SMOTE to balance the dataset lead to false predictions. For the ADASYN case, the ETC model performs slightly worse than with SMOTE, as it makes 9,543 correct and 166 false predictions out of 9,709 total predictions. In the case of the CC undersampling technique, the performance is not good enough due to the reduced number of samples used for training. ETC gives 921 correct and 63 false predictions out of 984 total predictions. In light of the discussed results, the performance of machine learning models when used with the proposed CR resampling approach is better than that with either the oversampling or the undersampling approaches.
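The accuracy implied by each confusion matrix can be recomputed from the prediction counts quoted above. The sketch below uses only those counts; the dictionary layout is an illustrative choice.

```python
# (correct, false) prediction counts for ETC as quoted in the text.
etc_counts = {
    "CR": (20_327, 121),
    "SMOTE": (9_590, 166),
    "ADASYN": (9_543, 166),
    "CC": (921, 63),
}

accuracy = {}
for name, (correct, false) in etc_counts.items():
    total = correct + false
    accuracy[name] = correct / total
    print(f"{name:7s} total={total:6d} accuracy={accuracy[name]:.4f}")
```

The totals reproduce the 20,448, 9,756, 9,709, and 984 predictions stated above, and the implied accuracies are roughly 0.994 for CR, 0.983 for SMOTE and ADASYN, and 0.936 for CC.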

4.6. Results with Resampling on Training Set. Because ETC achieves the highest performance with all the resampling approaches used in the current study, it is used for further analysis. For this purpose, only the training dataset is balanced: ETC is trained on the balanced data and tested on the imbalanced data. Results given in Table 13 show that ETC outperforms all other models with this approach as well. ETC achieves its highest accuracy of 0.981 with the proposed CR resampling approach. However, the overall performance of the model is lower with this approach, and the values for accuracy and F1 score differ sharply from those obtained in the previous approach.
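The evaluation protocol described here, in which only the training split is balanced, can be sketched as follows. This is a minimal illustration under stated assumptions: the 9:1 label vector is made up, the 80/20 split ratio is arbitrary, and simple random oversampling stands in for the CR resampling used in the study.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def oversample_minority(indices, labels, rng):
    """Stand-in resampler: repeat random minority rows until classes are equal."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


rng = random.Random(0)
labels = [0] * 90 + [1] * 10  # hypothetical imbalanced labels (9:1)

# Split first, then resample the training indices only.
train_idx, test_idx = stratified_split(labels, 0.2, rng)
train_balanced = oversample_minority(train_idx, labels, rng)

print("train:", Counter(labels[i] for i in train_balanced))
print("test: ", Counter(labels[i] for i in test_idx))
```

Splitting before resampling keeps the test set at its original class ratio and ensures that no resampled copy of a test sample appears in the training data.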

Results with Deep Learning Models.
This study also deploys state-of-the-art deep learning models for pulsar detection. Customized architectures of long short-term memory (LSTM), deep neural network (DNN) [10], and gated recurrent unit (GRU) models are used [52]. Architectural details and the values of the parameters used are provided in Table 14.
Deep learning models are compiled with the binary cross-entropy loss and the Adam optimizer and are trained for 100 epochs. The performance of LSTM, GRU, and DNN is measured in terms of accuracy, precision, recall, and F1 score. Performance results given in Table 15 indicate that the three deep learning models achieve the same accuracy, with marginal differences in precision, recall, and F1 score. Considering the F1 score, LSTM and GRU each achieve 0.94, which is better than the DNN model. The results indicate that the optimized machine learning models perform better than the deep learning models. Deep learning models require thousands of samples to fit well; consequently, their performance is slightly lower than that of the machine learning models due to the small size of the dataset.
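For reference, the binary cross-entropy loss has the following standard form for N training samples. The symbols are notation introduced here rather than taken from the study: y_i is the true label of sample i (assumed to be 1 for a pulsar and 0 otherwise), and \hat{p}_i is the predicted probability of the pulsar class.

```latex
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y_i \log \hat{p}_i + (1 - y_i) \log\left(1 - \hat{p}_i\right) \right]
```

The loss is small when confident predictions agree with the labels and grows without bound as a confident prediction approaches the wrong class.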

Results Using 10-Fold Cross-Validation.
To corroborate the significance of the proposed resampling approach and the performance of the machine learning models, 10-fold cross-validation is used, and results are given in Table 16. All models are evaluated with each data sampling approach. Results indicate that the highest accuracy is obtained by ETC with the proposed hybrid sampling approach, which shows the superiority of the proposed approach over the other data sampling approaches.
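The fold construction behind 10-fold cross-validation can be sketched with the standard library alone. Stratifying the folds by class is an assumption made here, since the text does not state whether the folds were stratified; the label vector and the function name `stratified_k_fold` are hypothetical.

```python
import random


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the k folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield train_idx, test_idx


labels = [0] * 90 + [1] * 10  # hypothetical imbalanced labels (9:1)
splits = list(stratified_k_fold(labels, k=10))

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}, "
          f"positives in test={positives}")
```

In each round a model would be trained on nine folds and scored on the held-out fold, and the ten scores would then be averaged.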

Comparison with the State-of-the-Art Studies.
To evaluate the efficacy of the proposed approach, its performance is compared with that of previous similar approaches. To this end, approaches that utilize the HTRU2 dataset have been selected. For example, study [6] conducted experiments on the same dataset using its proposed GH-VFDT model. Similarly, study [14] performed experiments on the same dataset using its proposed RTB-VC for pulsar prediction. The T-test shows that the results of tree-based models RF, ETC, and GBC with CR techniques accept the null hypothesis and reject the alternative hypothesis which means that these tree-based models are statistically significant with the CR technique as compared to all other resampling techniques.

Conclusion
Pulsar detection is a significant task for studying several phenomena of nuclear physics, and automatic detection of pulsars from the collected data is a topic of significant importance in this regard. Due to the imbalanced nature of the HTRU2 dataset, the prediction accuracy is not up to the standard. This study proposes a concatenated resampling (CR) approach for data balancing and a methodology that uses the proposed CR for pulsar prediction with high accuracy. For this purpose, the performance of several machine learning algorithms is investigated and analyzed. Experimental results indicate that the oversampling approaches SMOTE and ADASYN perform better than the undersampling cluster centroid approach. The increased feature vector for the oversampled data tends to boost the performance of the machine learning classifiers, especially ETC, which achieves the highest accuracy with all resampling approaches. Performance evaluation metrics are much better for ETC when used with the proposed CR approach, with an accuracy of 0.993. Combining multiple resampling approaches elevates the performance of machine learning classifiers and reduces the influence of data imbalance. Results show that tree-based classifiers perform better than linear classifiers. Regarding the use of deep learning models, LSTM and GRU provide better F1 scores than DNN. Performance comparison with state-of-the-art approaches indicates that the proposed approach outperforms them and achieves higher accuracy.
This study leverages the supervised approach by optimizing several well-known machine learning models. However, the use of unsupervised models is expected to provide interesting results. Important observations can be made by clustering the HTRU dataset into groups, and analysis can be performed to highlight the features of probable candidates for pulsars.

Table 14 (partial): layers of the three deep learning models, one row per layer position.
(32) | Dense (64, activation = "relu") | GRU (64, return_sequences = True)
Dropout (0.2) | Dropout (0.2) | SimpleRNN (32)
Dense (64, activation = "relu") | Dense (64, activation = "relu") | Dense (32)
Dropout (0.2) | Dropout (0.2) | Dropout (0.2)
Dense (2, activation = "softmax") | Dense (2, activation = "softmax") | Dense (16)
- | - | Dense (2, activation = "softmax")
Loss = "binary_crossentropy", optimizer = "Adam", epochs = 100

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.

Authors' Contributions
Ernesto Lee and Furqan Rustam contributed equally to this study.