Every day, millions of source code files are written in different languages around the world. A deep neural network-based intelligent support model for source code completion would therefore be a great asset in the software engineering and programming education fields. Source codes continue to contain vast numbers of syntax, logical, and other critical errors that cannot be detected by normal compilers, so the development of an intelligent evaluation methodology that does not rely on manual compilation has become essential. Even experienced programmers often find it necessary to analyze an entire program to find a single error and are thus forced to waste valuable time debugging their source codes. With this in mind, we propose an intelligent model for source code completion based on long short-term memory (LSTM) combined with an attention mechanism. The proposed model can detect source code errors with their locations and then predict the correct words; in addition, it can classify source codes as erroneous or not. We trained the proposed model on source codes and then evaluated its performance. All of the data used in our experiments were extracted from the Aizu Online Judge (AOJ) system. The experimental results show that the error detection and prediction accuracy of our proposed model is approximately 62% and its source code classification accuracy is approximately 96%, outperforming a standard LSTM and other state-of-the-art models. Moreover, compared with state-of-the-art models, our proposed model achieved a notable level of success in error detection, prediction, and classification when applied to long source code sequences. Overall, these experimental results indicate the usefulness of our proposed model in the software engineering and programming education arenas.

Programming is one of mankind’s most creative and effective endeavors, and vast numbers of studies have been dedicated to improving the modeling and understanding of software code [

In this paper, we present an intelligent support model for source code completion that was designed using an LSTM in combination with an attention mechanism (hereafter, LSTM-AM), which improves performance over a standard LSTM model. The attention mechanism is a useful technique that takes the results of all past hidden states into account for prediction and can improve the accuracy of neural network-based intelligent models. We trained LSTM, RNN, CNN, and LSTM-AM networks with different numbers of hidden units (neurons), namely 50, 100, 150, and 200, using a set of source codes taken from the AOJ system. Erroneous source codes were then input into all models to determine their relative capabilities for predicting and detecting source code errors. The results obtained show that our LSTM-AM network extends the capabilities of the standard LSTM network in detecting and predicting such errors correctly. Some source codes contain logical and other critical errors that cannot be detected by conventional compilers, whereas our proposed intelligent support model can detect these errors. Additionally, the LSTM-AM network can retain longer sequences of source code inputs and thus generate more accurate output than the standard LSTM and other state-of-the-art networks. In addition, we experimented with different settings and hidden unit counts to create the most suitable model for our research in terms of cross-entropy, training time, accuracy, and other performance measurements. The proposed model can also classify source codes based on the defects they contain. We expect that our proposed model will be useful for students, programmers, developers, and professionals, as well as others involved in programming education and other aspects of SE.

The main contributions of this study are described as follows:

The proposed intelligent support model can help students and programmers with source code completion

The intelligent support model detects errors, such as logical errors, that cannot be identified using conventional compilers

The proposed intelligent support model achieves an accuracy of approximately 62%, outperforming other benchmark models

Our proposed model can classify source codes based on the detected errors. The classification accuracy is approximately 96%, much better than that of other models

The proposed model highlights defective spots in source codes with their locations/line numbers

The proposed model helps learners fix errors in source code more easily by using the reported locations/line numbers

The remainder of the paper is structured as follows. In Section

Modern society is flourishing due to advancements in the wide-ranging fields of information and communication technology (ICT), where programming is a crucial aspect of many developments. Millions of source codes are being created every day, most of which are tested through manual compiling processes. As a result, an important research field that has recently emerged involves the use of AI systems for source code completion during development rather than manual compiling processes. More specifically, artificial neural network-based models are being used for source code completion in order to achieve more humanlike results. Numerous studies have been completed and a wide variety of methods have been proposed regarding the use of AI in SE fields, some of which are reviewed below.

In [

In [

White et al. [

In [

In [

Pedroni and Meyer [

In [

Rahman et al. [

In [

Bahdanau et al. [

Li et al. [

Li et al. [

Dam et al. [

Pham et al. [

In [

In summary, a wide variety of methods and techniques have been proposed in various studies, most of which used RNN, LSTM, or convolutional neural network (CNN) models for source code manipulation and other applications. It is difficult to say which of these approaches is superior to the others. RNNs perform comparatively better than conventional language models, but RNNs have limited ability to handle long source code inputs [

The resources of natural text corpora are being enriched daily by the accumulation of text from multiple sources, and the success of natural language processing rests on this rich text corpus. For this reason, and because of their simplicity and scalability,

The Markov assumption, under which the probability of a word depends solely upon the previous word, is described in

Thus, the general equation of an

In practice, the maximum likelihood can be estimated by many smoothing techniques [
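The maximum likelihood idea above can be sketched with a minimal add-k smoothed bigram estimator. The helper below is illustrative only, not the exact estimator or smoothing technique used in this work:

```python
from collections import Counter

def bigram_prob(corpus, w_prev, w, k=1):
    """Add-k smoothed bigram probability P(w | w_prev).

    `corpus` is a list of tokens; `k` is the smoothing constant.
    This is a hypothetical helper for illustration only.
    """
    vocab = set(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * len(vocab))

# A toy "source code" token stream
tokens = "int i ; int j ; int k ;".split()
p = bigram_prob(tokens, "int", "i")   # (1 + 1) / (3 + 1 * 5) = 0.25
```

With k = 0 this reduces to the plain maximum likelihood estimate; smoothing simply keeps unseen bigrams from receiving zero probability.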

Cross-entropy is measured to validate the predictive quality of a language model [
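As a quick illustration, cross-entropy over a token sequence can be computed as the average negative log probability the model assigns to the observed tokens (a minimal sketch, independent of any particular model):

```python
import math

def cross_entropy(probs):
    """Average negative log2 probability of the observed tokens.

    `probs` holds the model's probability for each actual next token;
    a lower value indicates a better-fitting language model.
    """
    return -sum(math.log2(p) for p in probs) / len(probs)

# A model that assigns probability 0.5 to every token scores 1 bit per token
ce = cross_entropy([0.5, 0.5])
```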

An RNN is a neural network variant that is frequently used in natural language processing, classification, regression, and similar tasks. In a traditional neural network, inputs are processed through multiple hidden layers and then emitted via the output layer. In the case of sequentially dependent input, such a general neural network cannot produce accurate results. For example, in the case of the dependent sentence “

A simple RNN structure.

Mathematically, an RNN can be represented as h_t = f(h_{t−1}, x_t), where h_t is the current state, h_{t−1} is the previous state, x_t is the current input, and f is the activation function.

Applying the tanh activation function, this can be written as h_t = tanh(W_hh h_{t−1} + W_xh x_t), where W_hh is the recurrent weight matrix and W_xh is the input weight matrix.

Finally, the output function can be written as y_t = W_hy h_t, where y_t is the output.

An RNN supports multiple input and output configurations, such as one-to-one, one-to-many, many-to-one, and many-to-many. Despite these advantages, the RNN is susceptible to the major drawbacks of gradient vanishing and exploding.
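The recurrence described above can be sketched as a plain NumPy forward pass; the weight shapes, initialization, and sequence length below are illustrative assumptions, not the configuration used in the experiments:

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, W_hy, b_h, b_y):
    """One RNN time step: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h),
    y_t = W_hy h_t + b_y."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x + b_h)
    y = W_hy @ h + b_y
    return h, y

rng = np.random.default_rng(0)
H, X, Y = 4, 3, 2  # hidden, input, and output sizes (toy values)
params = [rng.normal(size=s) * 0.1
          for s in [(H, H), (H, X), (Y, H), (H,), (Y,)]]
h = np.zeros(H)
for x in rng.normal(size=(5, X)):   # unroll over a 5-step input sequence
    h, y = rnn_step(h, x, *params)
```

Note how the same weight matrices are reused at every time step; this sharing is exactly what makes backpropagation through time multiply the same Jacobian repeatedly, as discussed next.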

In this section, we discuss the RNN gradient vanishing and exploding problems. Training an RNN seems simple, but it is actually hard because of its recurrent connections. In forward propagation, the weight matrices are repeatedly multiplied together, and a similar procedure applies in backpropagation; there, the signal may become too strong or too weak, which causes the gradient to explode or vanish. A vanishing gradient makes it difficult to determine the direction in which the model parameters should move to improve the loss function, while an exploding gradient makes learning unstable. The hidden states of the RNN are trained through different time steps using backpropagation, and the total error gradient is equal to the sum of the distinct gradients at every time step. The error can be expressed by considering total time steps

Now, we apply the chain rule to calculate the overall error gradients:

The term

The term

Now, by applying eigendecomposition to the Jacobian matrix
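The eigenvalue argument above can be demonstrated with a toy experiment: backpropagating through the same matrix many times shrinks or blows up the gradient norm depending on whether the eigenvalues are below or above 1. The diagonal matrix here is a simplified stand-in for the recurrent Jacobian, not an actual trained network:

```python
import numpy as np

def gradient_norm(scale, steps=50):
    """Norm of a gradient after backpropagating `steps` times through
    the same weight matrix (a toy stand-in for the recurrent Jacobian)."""
    W = scale * np.eye(4)          # eigenvalues are all equal to `scale`
    g = np.ones(4)
    for _ in range(steps):
        g = W.T @ g
    return float(np.linalg.norm(g))

vanish = gradient_norm(0.9)    # eigenvalues < 1: the gradient decays
explode = gradient_norm(1.1)   # eigenvalues > 1: the gradient blows up
```

After 50 steps the gradient norm scales by 0.9^50 ≈ 0.005 in the first case and 1.1^50 ≈ 117 in the second, which is the vanishing/exploding behavior the LSTM is designed to mitigate.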

An LSTM neural network is a special kind of RNN that is often used to process long inputs. An LSTM is not limited to single inputs but can also process complete input sequences. Usually, an LSTM unit is structured with a forget gate, an input gate, an output gate, and a cell state. Each component has a separate role: the cell state keeps complete information about the input sequence, while the gates manage the input and output activities. Figure

Internal structure of an LSTM unit.

At the very beginning, processing starts with the forget gate, which determines what information is to be discarded from (or retained in) the cell state C_{t−1}. The forget gate can be expressed as f_t = σ(W_f · [h_{t−1}, x_t] + b_f), where h_{t−1} is the previous hidden state and x_t is the current input. The output (between 0 and 1) of the forget gate is produced through a sigmoid function: if the result is 1, we keep the data in the cell state; otherwise, we discard the data.

The input gate determines which cell state values should be updated when new data appear: i_t = σ(W_i · [h_{t−1}, x_t] + b_i). Through the tanh function, the candidate value C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C) is produced.

Now, we update the old cell state C_{t−1} to the new cell state C_t by combining the forget and input gates: C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t.

The filtered version of the cell state is then output as h_t via the sigmoid-gated output gate, o_t = σ(W_o · [h_{t−1}, x_t] + b_o) and h_t = o_t ∗ tanh(C_t), and the weights are updated accordingly.
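The gate equations above can be combined into a single forward step. The sketch below is a minimal NumPy rendering with illustrative parameter shapes and random weights, not the trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, W, b):
    """One LSTM step following the gate equations above.

    W maps the concatenated [h_{t-1}, x_t] to the four gate
    pre-activations; shapes are illustrative, not the paper's setup.
    """
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
    c = f * c_prev + i * np.tanh(g)                # cell state update
    h = o * np.tanh(c)                             # filtered output
    return h, c

H, X = 4, 3  # toy hidden and input sizes
rng = np.random.default_rng(1)
W = rng.normal(size=(4 * H, H + X)) * 0.1
b = np.zeros(4 * H)
h, c = lstm_step(np.zeros(H), np.zeros(H), rng.normal(size=X), W, b)
```

Because the cell state c is updated additively rather than through repeated matrix multiplication, gradients can flow across many time steps without necessarily vanishing.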

Recognizing the strength of LSTM, we were motivated to apply this network model to error detection, prediction, correction, and classification in source codes.

Our proposed LSTM-AM network has an effective deep learning architecture that serves as an intelligent support model for source code completion. Accordingly, we trained our model using correct source codes and then used it successfully to detect errors and predict correct words in erroneous source codes based on the trained corpus. Moreover, the proposed model can classify source codes using the prediction results. Our model generates a complete feedback package for each examined source code, from which learners and professionals can benefit. The workflow of our proposed model is depicted in Figure

The main framework of our model: (a) conversion of source code to token IDs, (b) model training using token IDs, and (c) results produced by the softmax function.

Over the years, attention mechanisms have been adapted to a wide variety of diverse tasks [

With this point in mind, we incorporated an attention mechanism into a standard LSTM to make LSTM-AM, as shown in Figure

An architecture of the proposed LSTM-AM network.

For our attention mechanism, we take the past hidden states as external memory, compute attention weights over them with respect to the current hidden state h_t at time step t, and form the context vector c_t as their weighted sum.

To predict the next word at time step t, the model relies not only on the hidden state h_t but also on the context vector c_t. At that point, the focus turns to the vocabulary space, where the softmax function produces the final probability distribution over the output vector y_t.

Based on the above aspects, the attention mechanism helps to extract the relevant features from input sequences effectively. As such, the use of LSTM-AM increases the capability of our model.
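One common way to realize such an attention step is sketched below with simple dot-product scoring over past hidden states. The scoring function is an assumption for illustration; the exact alignment variant is not pinned down above:

```python
import numpy as np

def attention_context(h_t, states):
    """Dot-product attention over past hidden states.

    `states` has shape (T, H); the softmax weights form a distribution
    over the T past time steps, and the context vector is their
    weighted sum.
    """
    scores = states @ h_t                    # alignment score per past state
    alpha = np.exp(scores - scores.max())    # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ states, alpha             # context vector c_t, weights

states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
c_t, alpha = attention_context(np.array([1.0, 1.0]), states)
```

Here the third past state aligns best with the query and so receives the largest weight; the context vector c_t would then be combined with h_t before the softmax over the vocabulary.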

An online judge (OJ) system is a web-based programming environment that compiles and executes submitted source codes and returns judgments based on test data sets. An OJ system is an open platform for programming practice as well as competition. To conduct our experimental work, we collected source codes from the AOJ system [

Input and output sample of GCD and PN problems.

In contrast, the total number of correct source codes for the PN problem is 1538 and the overall solution success rate is 30.8%. In the PN problem description, the first line contains an integer

Before we conducted training, the raw source codes were filtered by removing unnecessary elements. To accomplish this, we followed the procedure applied in [

A partial list of IDs for characters, special characters, numbers, and keywords.

| Words | IDs |
|---|---|
| auto, break, char, case, const, continue, do, default, double, enum, else, extern, for, float, goto, int, if, long, return, register, signed, short, sizeof, struct, static, switch, typedef, unsigned, union, volatile, void, while | 30–45 |
| A to Z and a to z | 96–121 and 127–152 |
| 0–9 | 80–89 |
| =, ∗, +, −, /, %, \|, <, >, (, ), {, }, [, ] | 94, 74, 75, 76, 79, 69, 155, 92, 93, 72, 73, 153, 155, 122, 124 |
| @, #, $, %, & | 95, 67, 68, 69, 70 |
| ! (not), ? (question), ' (single quote), " (double quote), . (dot), ; (semicolon), : (colon), , (comma) | 63, 64, 71, 66, 78, 90, 91, 76 |

The flowchart of the training and evaluation process of our proposed model.

At the early stage of the training phase, the source codes were first converted into word sequences and then encoded into token IDs as shown in Figure

Source code embedding and tokenization process.
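The tokenization step can be sketched as a lookup against an ID table. The mapping below is a hypothetical fragment in the spirit of the ID table above (the exact IDs and the handling of unknown tokens are assumptions):

```python
# Assumed ID table fragment; keyword IDs fall in the 30-45 range and
# symbol IDs follow the table above, but the exact values are illustrative.
TOKEN_IDS = {"int": 35, "if": 36, "return": 38, "=": 94,
             "(": 72, ")": 73, ";": 90}

def encode(tokens, unk_id=0):
    """Map a source-code token sequence to token IDs (unknowns -> unk_id)."""
    return [TOKEN_IDS.get(t, unk_id) for t in tokens]

ids = encode(["int", "=", ";"])
```

The resulting ID sequences are what the embedding layer consumes during training.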

Upon completion of the embedding and tokenization process, we trained our proposed model and other related state-of-the-art models with the correct source codes of IS, GCD, and PN problems. The simple training process of an LSTM-based language model is shown in Figure

Training process of an LSTM language model.

At the end of the training process, the next step was to check the performance of the model on the source code completion task: how accurately does it identify errors and predict corrections? Our proposed model produces a probability for each word, and we considered a word to be an error candidate when its probability was below 0.1. The model takes a word sequence [w_1, w_2, w_3, w_4, …, w_n] and returns the probabilities [p_1, p_2, p_3, p_4, …, p_n], as defined in the following equation:

Cross-entropy is an effective performance indicator for probability-based models; a low cross-entropy value indicates a good model.
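The error-candidate rule described above (flagging words whose probability falls below 0.1) can be sketched as follows; the tokens and probabilities are invented for illustration:

```python
def error_candidates(tokens, probs, threshold=0.1):
    """Flag tokens whose model probability falls below the threshold,
    returning (position, token, probability) triples as error candidates."""
    return [(i, t, p)
            for i, (t, p) in enumerate(zip(tokens, probs), start=1)
            if p < threshold]

# Hypothetical per-token probabilities from a trained language model
cands = error_candidates(["int", "i", ";", "0"], [0.9, 0.4, 0.8, 0.0005])
```

Only the fourth token falls below the threshold here, so it would be highlighted as an error candidate with its position.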

A simple example of the prediction process used by our model is shown in Figure

LSTM-AM network prediction process.

In the present research, we tuned several experimental hyperparameters in order to obtain better results. To avoid overfitting, a dropout ratio of 0.3 was used for our proposed model. The LSTM network was optimized using Adam, which is a stochastic optimization method [
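As a sketch of the two training ingredients just mentioned, the following shows inverted dropout and a single Adam update step. These are illustrative NumPy versions of the standard formulations, not the framework internals used in the experiments:

```python
import numpy as np

def dropout(a, rate=0.3, rng=None):
    """Inverted dropout: zero a fraction `rate` of activations during
    training and rescale the rest so the expected activation is unchanged."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates,
    bias correction, then a per-parameter adaptive step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, np.array([0.5, -0.5, 0.0]), m, v, t=1)
```

After one bias-corrected step, each parameter with a nonzero gradient moves by roughly the learning rate in the direction opposite its gradient sign.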

Our proposed intelligent support model can be useful for source code completion. It is also a general model that can be adapted to any source code corpus for training and testing. In our proposed model, we defined a minimum probability value by which the model can identify error-candidate words based on the training corpus. Accordingly, we randomly chose some incorrect IS, GCD, and PN source codes and used them to evaluate the models' performance levels. Here, we should note that all of our research work and language model training were performed on a personal computer with an Intel® Core™ i7-5600U central processing unit (CPU) clocked at 2.60 GHz and 8 GB of RAM, running the 64-bit Windows 10 operating system.

We used several hidden unit counts, namely 50, 100, 150, and 200, to train our proposed LSTM-AM and other state-of-the-art models. In training, the correct source codes of the IS, GCD, and PN problems were used both separately and in combination. The number of source codes for each type of problem is listed in Table

Number of source codes of each problem.

| Problem type | Number of source codes |
|---|---|
| Greatest common divisor (GCD) | 964 |
| Insertion sort (IS) | 1518 |
| Prime numbers (PN) | 972 |

We trained our proposed LSTM-AM and different state-of-the-art models using correct source codes. Table

Cross-entropy comparison using PN source codes.

| Model | 50 units | 100 units | 150 units | 200 units |
|---|---|---|---|---|
| LSTM-AM | 4.35 | 3.90 | 2.87 | 2.23 |
| LSTM | 4.75 | 3.31 | 2.37 | 2.02 |
| RNN | 6.35 | 4.72 | 4.21 | 3.95 |

Tables

Cross-entropy comparison using GCD source codes.

| Model | 50 units | 100 units | 150 units | 200 units |
|---|---|---|---|---|
| LSTM-AM | 2.22 | 1.80 | 1.75 | 1.30 |
| LSTM | 2.56 | 1.91 | 1.80 | 1.39 |
| RNN | 5.11 | 4.36 | 3.50 | 3.23 |

Cross-entropy comparison using IS source codes.

| Model | 50 units | 100 units | 150 units | 200 units |
|---|---|---|---|---|
| LSTM-AM | 3.12 | 1.55 | 1.40 | 1.27 |
| LSTM | 3.26 | 1.63 | 1.48 | 1.26 |
| RNN | 4.99 | 3.78 | 2.89 | 3.11 |

To evaluate the efficiency of the proposed model, the epochwise cross-entropy of the 200-unit model during training was calculated, as depicted in Figure

Epochwise cross-entropies of 200-unit model using (a) IS, (b) PN, and (c) GCD source codes.

As mentioned above, the efficiency of a model strongly depends upon the value of cross-entropy. During training, the 200-unit model produced the lowest cross-entropy using each type of problem set. The cross-entropy of the 200-unit model using IS, PN, and GCD problems is shown in Figure

Cross-entropies of the 200-unit model using IS, GCD, and PN problems.

We aimed to find the best-suited number of hidden units for our LSTM-AM network and the other state-of-the-art models. In this regard, we put together all the source codes (about 3442) to train the proposed and other state-of-the-art models. The cross-entropies and total training times recorded at the last epoch of each model are presented in Table

Cross-entropy comparison of different models using all source codes.

| Model | 50 units | 100 units | 150 units | 200 units |
|---|---|---|---|---|
| LSTM-AM | 3.96 | 2.99 | 2.75 | 2.17 |
| LSTM | 3.89 | 3.26 | 2.50 | 2.31 |
| RNN | 5.11 | 4.36 | 3.87 | 3.53 |

Based on the above, it is clear that the 200-unit model provides the best results because its cross-entropy is the lowest among all the unit counts; thus, we selected the 200-unit model for the LSTM-AM network and the other state-of-the-art networks.

In our evaluations, we tested LSTM-AM and other state-of-the-art models using erroneous source codes. Probable error locations were marked by changing the text color and underlining the suspected erroneous portions. The proposed model also outputs the erroneous words and the probabilities of the predicted words. Since both the standard LSTM and LSTM-AM networks identified source code errors quite well compared with the RNN and other networks when 200 units were used, the 200-unit model was selected for all of our empirical experiments.

An erroneous source code sequence evaluated by the standard LSTM network is shown in Figure

Erroneous source code with colored fonts and underlines evaluated by the standard LSTM.

LSTM network error detection and predictions.

| Erroneous words | Erroneous word's probability | Location | Predicted words | Probability |
|---|---|---|---|---|
|  | 0.000496462 | 2 | ) | 0.6243539 |
| if | 0.014852316 | 6 | else | 0.5808078 |
| " | 0.029112648 | 15 | space | 0.6583209 |
|  | 8.5894303 × 10⁻¹⁰ | 16 |  | 0.9261215 |

The same incorrect source code was then evaluated by the LSTM-AM network, as shown in Figure

Erroneous source code with colored fonts and underlines evaluated by the LSTM-AM.

LSTM-AM network error detection and predictions.

| Erroneous words | Erroneous word's probability | Location | Predicted words | Probability |
|---|---|---|---|---|
|  | 0.000309510 | 2 | ) | 0.5722154 |
| " | 0.045484796 | 15 | space | 0.7051629 |
|  | 2.838025 × 10⁻⁷ | 16 |  | 0.9863272 |

Another interesting erroneous source code, which contains some logical errors, was evaluated by the standard LSTM network, as shown in Figure

Logical erroneous source code with colored fonts and underlines assessed by the standard LSTM.

Error words detection and prediction using standard LSTM.

| Erroneous words | Erroneous word's probability | Location | Predicted words | Probability |
|---|---|---|---|---|
| for | 0.049593348 | 31 |  | 0.1376049 |
|  | 0.02372846 | 38 |  | 0.6147145 |
| } | 0.0470908 | 39 | return | 0.95013565 |

Similarly, the same erroneous source code was tested by the LSTM-AM network, as shown in Figure

Logical erroneous source code with colored fonts and underlines assessed by the LSTM-AM.

Error words’ detection and prediction using the LSTM-AM network.

| Erroneous words | Erroneous word's probability | Location | Predicted word | Probability |
|---|---|---|---|---|
|  | 0.034574546 | 12 |  | 0.9269715 |
| = | 0.012642788 | 13 |  | 0.9468921 |
|  | 0.03553478 | 23 | } | 0.6362259 |
| for | 0.037460152 | 31 | while | 0.5292723 |
|  | 0.025345348 | 35 |  | 0.8597483 |

We evaluated our proposed LSTM-AM and other benchmark models using both clean and erroneous source codes. For extensive experiments, we selected several benchmark models to compare classification results such as (i) Random Forest (RF) [

The precision, recall, and

Under normal circumstances, our proposed language model detects all possible errors in source codes, but not all detected errors are true errors (TE). So, we considered only TE for the classification process. A detected error is considered a TE when the predicted probability is more than 0.90. We aligned the term true positive
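Given TE-based counts, the precision, recall, and F1-score follow the standard definitions. The sketch below uses hypothetical counts purely for illustration:

```python
def classification_scores(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive,
    and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 97 correctly flagged codes, 1 false alarm, 3 misses
p, r, f1 = classification_scores(tp=97, fp=1, fn=3)
```

The F1-score is the harmonic mean of precision and recall, which is why the third column in the tables below always lies between the first two.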

Classification performance comparison using insertion sort (IS) source codes.

| Models | Precision | Recall | F1-score |
|---|---|---|---|
| LSTM-AM | 0.99 | 0.97 | 0.97 |
| LSTM | 0.90 | 0.88 | 0.88 |
| RNN | 0.82 | 0.79 | 0.80 |
| CNN | 0.85 | 0.80 | 0.82 |
| RF | 0.62 | 0.55 | 0.58 |
| RF + RBM | 0.66 | 0.65 | 0.65 |
| RF + DBN | 0.71 | 0.66 | 0.68 |

Classification performance comparison using the greatest common divisor (GCD) source codes.

| Models | Precision | Recall | F1-score |
|---|---|---|---|
| LSTM-AM | 0.98 | 0.95 | 0.96 |
| LSTM | 0.87 | 0.89 | 0.87 |
| RNN | 0.80 | 0.81 | 0.80 |
| CNN | 0.83 | 0.84 | 0.83 |
| RF | 0.64 | 0.59 | 0.61 |
| RF + RBM | 0.70 | 0.63 | 0.66 |
| RF + DBN | 0.75 | 0.80 | 0.77 |

Classification performance comparison using prime number (PN) source codes.

| Models | Precision | Recall | F1-score |
|---|---|---|---|
| LSTM-AM | 0.95 | 0.94 | 0.94 |
| LSTM | 0.88 | 0.86 | 0.86 |
| RNN | 0.76 | 0.79 | 0.77 |
| CNN | 0.80 | 0.82 | 0.80 |
| RF | 0.59 | 0.60 | 0.59 |
| RF + RBM | 0.63 | 0.62 | 0.62 |
| RF + DBN | 0.65 | 0.66 | 0.65 |

In the classification process, the

To assess our proposed intelligent support model, we defined three performance measurement indices, namely error prediction accuracy (EPA), error detection accuracy (EDA), and model accuracy (MA), shown in equations (

In most cases, the proposed model detects potential errors in the code. Among these detected errors, a few are actual errors, which we call true errors (TE). Similarly, among the total predicted words, those that match the original correct words are called true correct words (TCW).
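Under these definitions, the evaluation indices can be computed as follows. The formulas are reconstructed from the worked examples reported later (e.g., TE = 1 out of TDE = 4 detected errors giving EDA = 25%, with MA the mean of EDA and EPA), so they should be read as an interpretation rather than the paper's exact equations:

```python
def evaluation_indices(te, tde, tcw, tpw):
    """EDA = TE/TDE, EPA = TCW/TPW, MA = their mean, all in percent.

    TE: true errors among TDE total detected errors;
    TCW: true correct words among TPW total predicted words.
    Reconstructed from the worked evaluation examples, not quoted.
    """
    eda = 100.0 * te / tde
    epa = 100.0 * tcw / tpw
    return eda, epa, (eda + epa) / 2

# The standard-LSTM worked example: 1 true error out of 4 detected,
# 1 true correct word out of 4 predicted
eda, epa, ma = evaluation_indices(te=1, tde=4, tcw=1, tpw=4)
```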

In the evaluation process, we discarded the RNN and other benchmark models because they obtained high cross-entropies, whereas the standard LSTM achieved very low cross-entropies. Therefore, we validated both the standard LSTM and LSTM-AM networks using several randomly chosen erroneous source codes. Figure

The evaluation results of erroneous source code using standard LSTM.

| Evaluation indices | Results (%) | Descriptions |
|---|---|---|
| EDA | 25 | TE = 1, TDE = 4 |
| EPA | 25 | TCW = 1, TPW = 4 |
| MA | 25 | EDA = 25, EPA = 25 |

In Figure

The evaluation results of erroneous source code using LSTM-AM.

| Evaluation indices | Results (%) | Descriptions |
|---|---|---|
| EDA | 33.33 | TE = 1, TDE = 3 |
| EPA | 33.33 | TCW = 1, TPW = 3 |
| MA | 33.33 | EDA = 33.33, EPA = 33.33 |

To further evaluate the performance of our proposed model, we then took a somewhat larger and more complex erroneous source code and verified it using both the standard LSTM and LSTM-AM networks, as shown in Figures

Evaluation results by the standard LSTM and LSTM-AM.

| Model | EDA (%) | EPA (%) | MA (%) |
|---|---|---|---|
| LSTM | 66 | 30 | 48 |
| LSTM-AM | 90 | 72 | 81 |

In addition to the abovementioned source code evaluations and examples, we evaluated about 300 randomly chosen erroneous source codes using the LSTM and LSTM-AM models and found that their average accuracy values were approximately 31% and 62%, respectively. The detailed statistics are shown in Table

Overview of the average evaluation statistical results using various erroneous source codes.

| Name | No. | LSTM EDA | LSTM EPA | LSTM MA | LSTM-AM EDA | LSTM-AM EPA | LSTM-AM MA |
|---|---|---|---|---|---|---|---|
| GCD | 100 | 34.27 | 32.13 | 33.2 | 65.47 | 59.04 | 62.26 |
| PN | 100 | 28 | 31 | 29.5 | 64.6 | 57.3 | 60.95 |
| IS | 100 | 31.4 | 29.8 | 30.6 | 63.6 | 61 | 62.3 |

Unlike the examples used in this study, program lengths can vary widely, with many programs containing 500 to 1000 lines of source code or more. One thing they all have in common is that, when writing a program, numerous variables and functions may be declared many lines earlier. Therefore, an attention mechanism is needed to capture long-term source code dependencies and to evaluate source code errors correctly. Our experimental results show that the LSTM-AM model was much more successful on longer source code sequences than the standard LSTM model, as shown in Figure

Accuracy of standard LSTM and LSTM-AM networks.

Additionally, some syntax and logical errors in source codes cannot be identified by traditional compilers. In such cases, our proposed LSTM-AM-based language model can provide meaningful feedback to learners and professionals that can be used in the source code debugging and refactoring process. This can be expected to save time when detecting errors in thousands of lines of source code, as well as to limit the area that must be searched to find the errors. Furthermore, this intelligent support model can help learners and professionals find the logical and other critical errors in their source codes more easily. Moreover, the classification accuracy of our proposed model is much better than that of the other state-of-the-art models. The average precision, recall, and

In the present research, we proposed an AI-based model to assist students and programmers with source code completion. Our proposed model is expected to be effective in providing end-to-end solutions for programming learners and professionals in the SE fields. The experimental results obtained in this study show that the accuracy of error detection and prediction using our proposed LSTM-AM model is approximately 62%, whereas the standard LSTM model's accuracy is approximately 31%. In addition, our approach provides location numbers for the predicted errors, which effectively limits the area that must be searched to find errors, thereby reducing the time required to fix large source code sequences. Furthermore, our model generates probable correction words for each error location and detects logical and other errors that cannot be recognized by conventional compilers. The LSTM-AM model also shows greater success in source code classification than other state-of-the-art models. As a result, it is particularly suitable for application to long source code sequences and can be expected to contribute significantly to the source code debugging and refactoring process. Despite the abovementioned advantages, our proposed model also has some limitations. For example, error detection and prediction are not always perfect, and the model sometimes fails to capture the semantic meaning of the source code, which produces incorrect detections and predictions. Thus, our future work will use a bidirectional LSTM neural network to improve this intelligent support model for source code completion.

We acquired all the training and test source codes from the AOJ system. Resources are accessed from the following websites through the API:

The authors declare that they have no conflicts of interest.

This research was funded by the JSPS KAKENHI (grant no. 19K12252).