Hybrid Deep Neural Network Scheduler for Job-Shop Problem Based on Convolution Two-Dimensional Transformation

In this paper, a hybrid deep neural network scheduler (HDNNS) is proposed to solve job-shop scheduling problems (JSSPs). To mine the state information of the scheduling process, a JSSP is divided into several classification-based subproblems, and a deep learning framework is used to solve these subproblems. HDNNS applies the convolution two-dimensional transformation method (CTDT) to transform irregular scheduling information into regular features, so that the convolution operations of deep learning can be applied to JSSP. The simulation experiments designed to test HDNNS cover JSSPs with different numbers of machines and jobs as well as different processing time distributions. The results show that on the ZLP dataset, the MAKESPAN index of HDNNS is 9% better than that of HNN and 4% better than that of ANN. With the same neural network structure, the training time of HDNNS is substantially shorter than that of DEEPRM. In addition, the scheduler generalizes well: it can address large-scale scheduling problems with only small-scale training data.


Introduction
The job-shop scheduling problem (JSSP) [1] is one of the best-known problems in industrial production, and it belongs to the large class of intractable problems known as NP-hard [2]. The solution space of an m × n JSSP (where m is the number of machines and n is the number of jobs) is $(n!)^m$ [3].
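To make the growth of this space concrete, the short Python sketch below evaluates $(n!)^m$ for the 6 × 8 example used in Section 5. The count assumes one independent job permutation per machine, so it also includes sequence combinations that violate job precedence; it bounds the number of distinct schedules rather than counting feasible ones. The function name is illustrative.

```python
from math import factorial


def jssp_solution_space(m: int, n: int) -> int:
    """Candidate sequence combinations (n!)^m: one job permutation per machine."""
    return factorial(n) ** m


size = jssp_solution_space(m=6, n=8)  # the 6 x 8 example of Section 5
print(f"(8!)^6 = {size:.3e}")
```

For m = 6 and n = 8 this is $40320^6$, roughly $4.3 \times 10^{27}$ combinations, which is why exhaustive enumeration is hopeless even for small instances.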
As discussed in Section 2, many scholars have tried to solve this type of problem with population-based methods [4], gene-based methods [5], and heuristic methods [6]. However, for large-scale problems, these methods offer no clear advantage in response speed. Much recent research shows that data mining and machine learning methods have great potential in both effectiveness and efficiency [7]. In this paper, a hybrid deep neural network scheduler (HDNNS) is proposed to improve scheduling capability. In addition, a convolution two-dimensional transformation (CTDT) is developed to convert JSSP state information into regular data that a convolutional network can process.
The contributions of HDNNS are as follows:
(i) Building on the work of Weckman [3], Metan et al. [8], and Paolo et al. [9], HDNNS transforms JSSP into several classification subproblems. Its main innovation is to classify the processing order of each job on each machine; this more precise classification makes HDNNS more effective on large-scale problems.
(ii) A convolution two-dimensional transformation (CTDT) is proposed. CTDT converts irregular scheduling data into regular multidimensional data in the form of a Cartesian product, which deep convolutional networks can process effectively.
(iii) HDNNS uses a hybrid neural network that combines a deep convolutional network [10] and a BP neural network [11]. In the first half of the network, the convolutional network and the BP network process the structural features and the irregular features, respectively. After a certain number of layers, HDNNS merges the two networks with a flattening operation for further feature extraction.
Our experimental results show that HDNNS outperforms many learning-based methods (ANN, HNN, and reinforcement learning methods), traditional classification methods (SVM, GOSS, and others), and attribute-oriented induction methods (AOI) [9] on the MAKESPAN index. HDNNS can also hold an advantage over the population-based method (GA) and the optimization method (BBM): GA and BBM are computationally time-consuming, so the tests do not negate the value of HDNNS. Moreover, unlike GA and BBM, HDNNS generalizes strongly; our experiments confirm that a model trained on data from small-scale JSSPs can address a large-scale one.
Although training the model requires extra time, training can be completed in advance, and when the application environment remains stable, the model may not need further updates. This characteristic increases the practical value of the model to some extent. Training can also be accelerated significantly with appropriate hardware such as GPUs and FPGAs.

The structure of this paper is as follows. Section 2 reviews the most closely related work on solution methods for JSSP, including neural network and other approaches in the literature. Section 3 presents the mathematical model of JSSP. Section 4 introduces the HDNNS framework, including the scheduler structure, the convolution two-dimensional transformation, and the basics of the deep neural network. Section 5 uses a 6 × 8 JSSP example to explain the method. Section 6 presents six experiments that test the effectiveness and generalization performance of the proposed method.

Population-Based and Gene-Based Methods for JSSP.
Over the last decades, JSSP has attracted much attention in academia, and a wide range of approaches have been developed for it. Recently, population-based and gene-based methods have been investigated to find optimal or near-optimal solutions.
Zhao et al. [12] proposed an improved particle swarm optimization with a decline disturbance index to improve the ability of particles to explore global and local optima and to reduce the probability of particles being trapped in a local optimum. Peng et al. [13] combined a tabu search procedure with path relinking and showed that their method performed well on benchmark problem instances. Asadzadeh [14] improved the efficiency of the genetic algorithm for JSSP by parallelizing populations and using an agent-based approach. Kurdi et al. [15] presented a modified island model genetic algorithm (IMGA) for JSSP, in which a nature-inspired evolutionary method and a migration selection mechanism were added to the classical IMGA to improve diversification and delay premature convergence. Park et al. [16] studied a dynamic JSSP and applied genetic programming-based hyper-heuristic methods with ensemble combination schemes to solve it. The investigated schemes were majority voting, linear combination, weighted majority voting, and weighted linear combination, and the experiments showed that linear combination outperformed the other schemes for the dynamic JSSP. Jiang et al. [17] employed grey wolf optimization (GWO) for two combinatorial optimization problems in manufacturing: job-shop and flexible job-shop scheduling. Their discrete GWO algorithm was compared with other published algorithms on both cases, and the experimental results showed that it outperformed the other algorithms on the problems studied. Fu et al. [18] proposed a fireworks algorithm with special strategies to solve the flow-shop scheduling problem with multiple objectives, time-dependent processing times, and uncertainty. Sharma et al. [19] developed a variant of the ABC algorithm inspired by the beer froth decay phenomenon to deal with job-shop scheduling problems.
There is no doubt that population-based and gene-based strategies are effective for solving JSSPs. However, for large-scale problems, the repeated iterations and update operations often take a long time. Therefore, it is of great value to study a learning-based scheduler with a fast response.

Learning-Based and Neural Network-Based Methods for Solving JSSP.
With the further development of machine learning, some scholars have tried to solve JSSPs with learning-based methods. Research in this field can be divided into two categories.
In the first category, learning methods are used to improve population-based and gene-based methods: they guide how solutions are updated and thus improve the efficiency of optimization. Yang and Lu et al. [20] proposed a hybrid dynamic preemptive and competitive NN approach called the advanced preventive competitive NN method, in which a competitive NN (CNN) classified the system conditions into 50 groups and determined the current system status group for each production interval. Shiue et al. [21] extended the previous work by considering both input control and dispatching rules, such as those in a wafer fabrication environment. In a recent work, Mirshekarian and Sormaz [22] conducted a statistical study of the relationship between JSSP features and the optimal MAKESPAN. Ramanan et al. [23] proposed an artificial neural network-based heuristic method that used an ANN to generate a JSSP solution and then took it as the initial sequence for a heuristic proposed by Suliman. Adibi et al. [24] used a trained artificial neural network (ANN) to update the parameters of a metaheuristic method at any rescheduling point in a dynamic JSSP according to the problem condition. Maroosi et al. [25] proposed an approach that uses parallel membrane computing and harmony search to solve flexible job-shop problems; information from the best solutions was used to speed up convergence while preventing premature convergence to a local minimum.
In the second category, a reinforcement learning or machine learning framework is applied to build a learning-based model (ANN [11], SVM [26], CNN [27], or others), which is then trained to master scheduling rules and complete scheduling tasks automatically. Weckman et al. [3] developed a neural network (NN) scheduler for JSSP, in which a genetic algorithm generated optimal or near-optimal solutions for a benchmark problem instance and an NN then captured the predictive knowledge about the sequence of operations. Chen et al. [28] proposed a rule-driven dispatching method based on data envelopment analysis and reinforcement learning for the multiobjective scheduling problem. Mao et al. [29] presented the deep reinforcement learning method DEEPRM, which translates the problem of packing tasks with multiple resource demands into a learning problem; this work offers important inspiration for solving JSSP. Their initial results show that DEEPRM performs comparably to state-of-the-art heuristics, adapts to different conditions, converges quickly, and learns strategies that are sensible in hindsight. Shahrabi et al. [30] proposed reinforcement learning (RL) with a Q-factor algorithm to enhance the performance of a scheduling method for dynamic JSSP with random job arrivals and machine breakdowns. Nasiri et al. [31] used discrete event simulation and a multilayer perceptron artificial neural network to solve the open-shop scheduling problem. Mohammad et al. [9] proposed a data mining-based approach to generate an improved initial population for population-based heuristics solving JSSP. This method combined "attribute-oriented induction" and "association rule mining" techniques to extract the rules behind optimal or near-optimal JSSP schedules.
Finally, their experiments verified that the proposed approach saves a significant number of function evaluations (FEs) and is superior to the method of Koonce and Tsai [32].
According to the reviewed literature, no previous study has directly applied deep learning frameworks to JSSP. This paper proposes a convolution two-dimensional transformation and designs a network structure to solve JSSP.

Mixed Integer Programming Model of JSSP
The job-shop scheduling problem (JSSP) can be described as a mixed integer programming (MIP) problem. The mathematical description is [33]

$$\min \; C_{\max} \tag{1}$$

subject to

$$\sum_{j \in J} x_{ijk} = 1, \quad \forall i \in M,\ k \in \{1, \dots, n\} \tag{2}$$

$$\sum_{k=1}^{n} x_{ijk} = 1, \quad \forall i \in M,\ j \in J \tag{3}$$

$$h_{ik} + \sum_{j \in J} p_{ij}\, x_{ijk} \le h_{i,k+1}, \quad \forall i \in M,\ k \in \{1, \dots, n-1\} \tag{4}$$

$$h_{ik} + p_{ij} \le h_{i'k'} + V\,(1 - x_{ijk}) + V\,(1 - x_{i'jk'}), \quad \forall j \in J,\ k, k' \in \{1, \dots, n\},\ l \in \{1, \dots, m-1\} \text{ with } r_{ijl} = r_{i'j,l+1} = 1 \tag{5}$$

$$C_{\max} \ge h_{in} + \sum_{j \in J} p_{ij}\, x_{ijn}, \quad \forall i \in M \tag{6}$$

$$h_{ik} \ge 0, \quad \forall i \in M,\ k \in \{1, \dots, n\} \tag{7}$$

$$x_{ijk} \in \{0, 1\}, \quad \forall i \in M,\ j \in J,\ k \in \{1, \dots, n\} \tag{8}$$

The decision variables are defined as follows:
(i) $x_{ijk}$ is equal to 1 if job j is scheduled at the k-th position on machine i, and 0 otherwise
(ii) $h_{ik}$ denotes the start time of the job at the k-th position of machine i
(iii) $C_{\max}$ denotes the MAKESPAN
The parameters are defined as follows:
(i) J is the set of jobs, and M is the set of machines
(ii) n is the number of jobs, and $n = \text{card}(J)$
(iii) m is the number of machines, and $m = \text{card}(M)$
(iv) $p_{ij}$ is a non-negative integer that represents the processing time of job j on machine i
(v) $r_{ijk} = 1$ if the k-th position of job j requires machine i
The objective function is (1). Constraint (2) ensures that each position on each machine is assigned to exactly one job. Constraint (3) ensures that each job gets exactly one position on each machine. Constraint (4) states that the start time of a job on a machine must be no earlier than the completion time of the job scheduled at the previous position. Constraint (5) is the precedence constraint: it ensures that all operations of a job are executed in the given order. In (5), $V = \sum_{j \in J}\sum_{i \in M} p_{ij}$, since the completion time of any operation cannot exceed the sum of the processing times of all operations. Constraint (6) ensures that the MAKESPAN is at least the largest completion time of the last job on all machines. Constraint (7) ensures that the start times of the jobs at all positions are greater than or equal to 0, and (8) restricts $x_{ijk}$ to binary values.

Hybrid Deep Neural Network Scheduler
The structure of the scheduler is shown in Figure 1. HDNNS is divided into two sections: a training section and a scheduling section.
The training section has six steps (Step 1.1-Step 1.6). First, in Step 1.1, a large number of JSSPs are generated according to the JSSP description, which includes the number of machines m, the number of jobs n, and the distribution function of processing time F(p). Next, in Step 1.2, the generated problems are solved by state-of-the-art methods (BBM or GA in this paper) to produce the corresponding scheduling results. In Step 1.3, each JSSP is divided into several subproblems, each described by the features of a job's processing and its priority on the machine. In Step 1.4, the job-processing features generate the 1D and 2D input data with CTDT. In Step 1.5, the priority on the machine generates one-hot target data. Finally, the scheduler is trained in Step 1.6.
The scheduling section has five steps (Step 2.1-Step 2.5). Step 2.1 starts when a new JSSP needs to be scheduled. Then, the 1D and 2D inputs are produced by the subproblem generation operation (same as Step 1.3) and the convolution two-dimensional transformation operation (same as Step 1.4). In Step 2.4, the trained neural network takes the two groups of inputs and outputs the priority of each operation of each job. In Step 2.5, a complete scheduling result is assembled from all the priority results.
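Step 1.1 only fixes m, n, and F(p). The Python sketch below shows one way such a generator could work. It assumes that every job visits every machine exactly once in a random order and that F(p) is an integer uniform distribution U(a, b), as in the later experiments; the function name and the output layout (mirroring Tables 2 and 3) are illustrative choices, not the authors' code.

```python
import random


def generate_jssp(m, n, a, b, seed=None):
    """Generate a random JSSP with m machines and n jobs (hypothetical helper).

    Assumptions: each job visits every machine exactly once, in a random order,
    and every processing time is an integer drawn uniformly from [a, b].
    Returns (times, machines): entry [j][k] is the processing time and the
    required machine of job j's k-th operation, like Tables 2 and 3.
    """
    rng = random.Random(seed)
    times = [[rng.randint(a, b) for _ in range(m)] for _ in range(n)]
    machines = [rng.sample(range(m), m) for _ in range(n)]  # random route per job
    return times, machines


times, machines = generate_jssp(m=6, n=8, a=1, b=10, seed=0)
```

Fixing the seed makes the generated training set reproducible; Step 1.2 would then pass each (times, machines) pair to an exact or metaheuristic solver to obtain the labels.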

Mathematical Representation of Standard Solver and Division of Subproblems.
Combined with the MIP description of JSSP in Section 3, all solvers are abstracted as follows:

$$(X, H) = S(P, R) \tag{9}$$

In (9), X is the set of 0-1 decision variables $x_{ijk}$, H is the set of integer decision variables $h_{ik}$, P is the set of processing time data $p_{ij}$, and R is the set of operation requirement data $r_{ijk}$. S(·) can be any JSSP scheduler, such as the genetic algorithm (GA) [14], the branch and bound method (BBM) [34], or the tabu search algorithm [13].
To improve the generalization performance of the model, HDNNS divides a complete JSSP into several subproblems. Specifically, each subproblem determines the priority category of a job's processing on a machine:

$$A_{ij} = \hat{S}(F^{*}_{ij}), \quad i \in M,\ j \in J \tag{10}$$

In (10), $F^{*}_{ij}$ is the processing feature of job j's k-th position on machine i, where the relationship between the job's position and the machine is given by R. $\hat{S}(\cdot)$ is a subproblem scheduler derived from S(·) in (9), and $A_{ij} \in \{1, 2, \dots, n\}$ is the integer priority of the job on the machine: if job j is processed at the k-th position on machine i in schedule result X (that is, $x_{ijk} = 1$), then $A_{ij} = k$.
The generation of $F^{*}_{ij}$ and $A_{ij}$ is introduced in Sections 4.3 and 4.4. The subproblem generation process is shown in Figure 2.
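Each (machine i, job j) pair therefore yields one subproblem, so an m × n instance gives m·n labelled examples. The Python sketch below shows how the labels $A_{ij}$ could be read off a solved schedule; representing X as one job sequence per machine (instead of the 0-1 array $x_{ijk}$) is an assumption made for brevity, and the function name is hypothetical.

```python
def priority_labels(machine_orders):
    """Convert per-machine job sequences into the labels A_ij of (10).

    machine_orders[i] lists 0-based job indices in processing order on
    machine i. The result A[i][j] is the 1-based position k of job j on
    machine i, i.e. A_ij = k exactly when x_ijk = 1.
    """
    m = len(machine_orders)
    n = len(machine_orders[0])
    A = [[0] * n for _ in range(m)]
    for i, order in enumerate(machine_orders):
        for k, j in enumerate(order, start=1):
            A[i][j] = k
    return A


# Toy schedule with 2 machines and 3 jobs: 2 * 3 = 6 subproblems.
A = priority_labels([[2, 0, 1], [0, 1, 2]])
# A == [[2, 3, 1], [1, 2, 3]]
```

For the 6 × 8 example in Section 5, the same mapping produces the 48 subproblems mentioned there.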

Definition of One-Dimensional Features.
This paper designs a convolution two-dimensional transformation (CTDT) to extract scheduling features. Convolution operations are commonly used to extract features in artificial intelligence and image processing [35,36], and many scholars regard deep convolution as an effective way to extract complex combined features [37]. CTDT is proposed to transform the irregular data of the scheduling process (which cannot be convolved directly) into regular data in the form of a Cartesian product.
First, we define the one-dimensional relative machine processing time matrix $P^l$ from P as

$$P^l = \left[T_{1,1}, T_{1,2}, \dots, T_{1,j_2}, \dots, T_{1,n}, \dots, T_{j_1,j_2}, \dots, T_{n,n}\right] \tag{11}$$

In (11), $T_{j_1,j_2}$, $j_1, j_2 \in J$, is the ratio of the processing time of job $j_1$ to that of job $j_2$, as given in (12). $P^l$ provides the scheduler with relative information about the processing times of jobs. Then, we define the one-dimensional earliest-start-time matrix $E^l$ from P and R as

$$E^l = \left[e_{1k}, e_{2k}, \dots, e_{jk}, \dots, e_{nk}\right] \tag{13}$$
In (13), $e_{jk}$, $j \in J$, $k \in \{1, 2, \dots, n\}$, is the earliest start time of job j's k-th position, as given in (14). $E^l$ provides urgency information about the jobs. Similarly, we define the other one-dimensional features $F^l_{ij}$ as

$$F^l_{ij} = \left[f^{*}_{ij,1}, f^{*}_{ij,2}, \dots, f^{*}_{ij,N_f}\right] \tag{15}$$

In (15), $F^l_{ij}$ consists of features that are important in the literature and in applications. $N_f$ is the number of features; in this paper, $N_f = 10$.
The features are given in Table 1, where the variables are defined as follows:

$$T_{\text{total}} = \sum_{i \in M}\sum_{j \in J} p_{ij} \tag{16}$$

$$T^{cmp}_{i} = \sum_{j \in J} p_{ij} \tag{17}$$

$$T^{cjp}_{j} = \sum_{i \in M} p_{ij} \tag{18}$$

In (16)-(18), $T_{\text{total}}$ is the total processing time, $T^{cmp}_i$ is the processing time of machine i, and $T^{cjp}_j$ is the processing time of job j.
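The totals (16)-(18) and the ratio features named in Table 1 are straightforward to compute. The NumPy sketch below also builds $P^l$ and $E^l$, but the exact forms of $T_{j_1,j_2}$ in (12) and $e_{jk}$ in (14) are assumed here (ratio of times on machine i, and the sum of the job's earlier operation times), so treat those two functions as hypothetical illustrations rather than the paper's definitions.

```python
import numpy as np


def processing_totals(p):
    """Totals (16)-(18); p[i, j] is the processing time of job j on machine i."""
    t_total = p.sum()        # (16) total processing time
    t_cmp = p.sum(axis=1)    # (17) processing time of each machine i
    t_cjp = p.sum(axis=0)    # (18) processing time of each job j
    return t_total, t_cmp, t_cjp


def relative_times(p, i):
    """P^l for machine i, flattened to length n*n.
    ASSUMPTION: T_{j1,j2} = p[i, j1] / p[i, j2] (times measured on machine i)."""
    row = p[i].astype(float)
    return (row[:, None] / row[None, :]).ravel()


def earliest_starts(p, machines, k):
    """E^l for the k-th operation (0-based) of every job.
    ASSUMPTION: e_jk is the sum of job j's earlier operation times, ignoring
    machine contention. machines[j][q] is the machine of job j's q-th operation."""
    n = p.shape[1]
    return np.array([sum(p[machines[j][q], j] for q in range(k)) for j in range(n)],
                    dtype=float)


def ratio_features(p, i, j):
    """Ratio features named in Table 1 for the subproblem (machine i, job j), 0-based."""
    m, n = p.shape
    t_total, t_cmp, t_cjp = processing_totals(p)
    return np.array([
        (i + 1) / m,         # machine index ratio i/m (1-based index)
        (j + 1) / n,         # job index ratio j/n (1-based index)
        p[i, j] / t_total,   # p_ij / T_total
        p[i, j] / t_cmp[i],  # p_ij / T^cmp_i
        p[i, j] / t_cjp[j],  # p_ij / T^cjp_j
        t_cmp[i] / t_total,  # T^cmp_i / T_total
        t_cjp[j] / t_total,  # T^cjp_j / T_total
    ])


p = np.array([[3, 2],        # machine 0: job 0 takes 3, job 1 takes 2
              [1, 4]])       # machine 1: job 0 takes 1, job 1 takes 4
machines = [[0, 1], [1, 0]]  # job 0 visits machine 0 then 1; job 1 the reverse
```

Because every feature is a ratio or a normalized index, its range does not depend on the absolute problem size, which is consistent with the generalization argument made later for HDNNS.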

Convolution Two-Dimensional Transformation and Definition of Two-Dimensional Matrix.
The Cartesian product operation can combine linear features and convert one-dimensional feature data into two-dimensional feature data. This paper designs the convolution two-dimensional transformation (CTDT) based on the Cartesian product; the transformation T(·) is described in (19). In (19), $m^l_1$ and $m^l_2$ are two one-dimensional features and × denotes the Cartesian product, whose mathematical definition is

$$A \times B = \{(a, b) \mid a \in A,\ b \in B\} \tag{20}$$

α and β are the parameters of this transformation, and sigmoid(·) is a nonlinear activation function. Together with the model parameters, this function extracts new features at different scales. The sigmoid function is

$$\text{sigmoid}(x) = \frac{1}{1 + e^{-x}} \tag{21}$$

In (21), x is a matrix, and the function is applied element-wise. An example of the T(·) function is shown in Figure 3.
In Figure 3, $m^l_1$ and $m^l_2$ are two one-dimensional vectors such as $P^l$, $E^l$, and $F^l_{ij}$ in Section 4.3.1, and their Cartesian product is $m^l_1 \times m^l_2$. Three sets of parameters are used to normalize $m^l_1 \times m^l_2$ in Figure 3. Different parameters make the model attend to different data scales, which helps it discover characteristics at different scales.
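The exact form of T(·) is given in (19). As a rough illustration only, the NumPy sketch below assumes that each pair in the Cartesian product $m^l_1 \times m^l_2$ is turned into a number by multiplying its two values, and that α and β act as a scale and an offset before the sigmoid (21); both choices are assumptions, as are the function names. Several (α, β) pairs then yield several same-sized 2D maps, which is what lets 3 × 3 convolutions operate on otherwise unordered scheduling data.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, as in (21)."""
    return 1.0 / (1.0 + np.exp(-x))


def ctdt(m1, m2, alpha, beta):
    """Hypothetical CTDT form: pair every element of m1 with every element of m2
    (Cartesian product), score each pair (a, b) by m1[a] * m2[b], then rescale
    with alpha and beta and squash with the sigmoid. Shape: (len(m1), len(m2))."""
    pairs = np.outer(m1, m2)  # entry (a, b) combines m1[a] and m2[b]
    return sigmoid(alpha * pairs + beta)


def ctdt_channels(m1, m2, params):
    """Stack one 2D map per (alpha, beta) pair, mimicking the three scales of Figure 3."""
    return np.stack([ctdt(m1, m2, a, b) for a, b in params], axis=0)


m1 = np.array([0.2, 0.5, 1.0])
m2 = np.array([1.0, 2.0])
maps = ctdt_channels(m1, m2, params=[(1.0, 0.0), (5.0, -2.0), (0.1, 0.0)])
print(maps.shape)  # (3, 3, 2)
```

A large α spreads small differences across the sigmoid's steep region, while a small α compresses them near 0.5, so each parameter set emphasizes a different data scale.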

Training Labels.
HDNNS transforms the scheduling problem into classification problems, so this paper uses one-hot encoding [38] to define the training labels.
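For job j on machine i, the one-hot priority label $o_{ij} = [o^1_{ij}, \dots, o^n_{ij}]$ is defined as

$$o^k_{ij} = \begin{cases} 1, & k = A_{ij} \\ 0, & \text{otherwise} \end{cases}, \quad k \in \{1, 2, \dots, n\} \tag{22}$$

Three examples are given in Figure 4. In (22), $A_{ij}$ is the position of job j on machine i, as defined in Section 4.2.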
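A minimal NumPy sketch of the one-hot label just described; the 1-based position convention follows Section 4.2, and the function name is hypothetical.

```python
import numpy as np


def onehot_label(a_ij, n):
    """Length-n target vector with a 1 at position A_ij (1-based) and 0 elsewhere."""
    if not 1 <= a_ij <= n:
        raise ValueError("A_ij must lie in {1, ..., n}")
    label = np.zeros(n, dtype=int)
    label[a_ij - 1] = 1
    return label


print(onehot_label(3, 8))  # [0 0 1 0 0 0 0 0]
```

Since every machine has exactly n positions, the label length depends only on n, which matches the n-neuron softmax output of the network.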

Structure of Hybrid Deep Neural Network Scheduler.
In this section, an innovative hybrid deep neural network structure for JSSP is introduced.
As shown in Figure 1, the inputs of the hybrid deep neural network scheduler are the one-dimensional input Input1, the two-dimensional input Input2, and the target input Target, as expressed in (23). The general structure of the network is shown in Figure 5.
In Figure 5, the left side of the structure diagram is the input part of the network.
For Input1, HDNNS uses L1 fully connected layers [39] (FCL in the figure) to preliminarily extract one-dimensional features. As shown in Figure 5, the output of the g-th layer is denoted $D^A_g$, and the output of the final layer is $D^A_{L1}$. The fully connected layer is a standard building block of deep convolutional networks [39].
For Input2, HDNNS uses L1 convolutional layers [39] (CL in the figure) to preliminarily extract two-dimensional features. The size of the convolution kernel [39] is set to 3 × 3. As shown in Figure 5, the outputs of the g-th layer for the different features in (23) are denoted $D^B_g$ to $D^G_g$, and the weights of the g-th layer are denoted $W^B_g$ to $W^G_g$.

Table 1 (one-dimensional features $F^l_{ij}$; feature, followed by its formula):
Position order [3]; k/n
Ratio of machine index i to machine number m [23]; i/m
Ratio of job index j to job number n [23]; j/n
Remaining processing time of job j
Ratio of operation processing time $p_{ij}$ to total processing time [23]; $p_{ij}/T_{\text{total}}$
Ratio of operation processing time $p_{ij}$ to processing time of machine i [23]; $p_{ij}/T^{cmp}_i$
Ratio of operation processing time $p_{ij}$ to processing time of job j [11]; $p_{ij}/T^{cjp}_j$
Ratio of processing time of machine i to total processing time [11]; $T^{cmp}_i/T_{\text{total}}$
Ratio of job j's processing time to total job processing time; $T^{cjp}_j/T_{\text{total}}$
Ratio of job j's processing time to processing time of …
References indicate that this feature has been used in the corresponding literature.

At the (L1 + 1)-th layer of the network, the one-dimensional and two-dimensional features are combined by a flattening operation in the flattened layer [40], as described in (24). In (24), $W^A_{L1,q}, W^B_{L1,q}, \dots, W^G_{L1,q}$ are the network weights of layers FCL.L1, CL1.L1, ..., CL6.L1, and q is a neuron index.
After L2 fully connected layers, the feature passes through a Softmax layer [39] containing only n neurons.
This layer converts the feature signal into a probability description $\hat{o}_{ij}$, which has the same shape as the target $o_{ij}$. However, $\hat{o}_{ij}$ is not a 0-1 variable but a continuous quantity that satisfies $\sum_{p=1}^{n} \hat{o}^{p}_{ij} = 1$ with $\hat{o}^{p}_{ij} \ge 0$; $\hat{o}^{p}_{ij}$ can be interpreted as the probability of selecting priority p.
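To make the data flow of Figure 5 concrete, the NumPy sketch below runs one untrained forward pass: an FCL branch for Input1, one stack of 3 × 3 convolutions per two-dimensional map in Input2, a flattening merge, L2 fully connected layers, and an n-neuron softmax. Random weights, ReLU activations, the layer width, the absence of bias terms, and all function names are assumptions for illustration; a real implementation would use a deep learning framework and train the weights with backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    """Softmax output layer: non-negative entries that sum to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def conv2d_valid(x, k):
    """Single-channel 'valid' 2D convolution (cross-correlation, as in DL libraries)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out


def hybrid_forward(input1, input2, n, L1=2, L2=2, width=16):
    """Untrained forward pass with random weights, showing only the data flow."""
    a = input1
    for _ in range(L1):                                # FCL branch for Input1
        a = relu(rng.normal(size=(width, a.size)) @ a)
    branches = [a]
    for fmap in input2:                                # one CL branch per 2D map
        b = fmap
        for _ in range(L1):                            # L1 conv layers, 3x3 kernels
            b = relu(conv2d_valid(b, rng.normal(size=(3, 3))))
        branches.append(b.ravel())                     # flattening operation
    z = np.concatenate(branches)                       # merged feature vector
    for _ in range(L2):                                # L2 fully connected layers
        z = relu(rng.normal(size=(width, z.size)) @ z)
    return softmax(rng.normal(size=(n, z.size)) @ z)  # n-neuron softmax layer


n = 8
input1 = rng.random(30)          # stand-in 1D features; the length is arbitrary here
input2 = rng.random((6, n, n))   # six 2D CTDT maps, one per CL branch
o_hat = hybrid_forward(input1, input2, n)
```

Note that each 3 × 3 "valid" convolution shrinks a map by 2 in each dimension, so the input map size bounds how many convolutional layers L1 a branch can have.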
After defining the structure of the neural network, we use the error backpropagation (BP) method [39] to train the network parameters.
A trained neural network can be described as a function mapping in the scheduling section of Figure 1:

$$\hat{o}_{ij} = \text{DNNS}(\text{Input1}, \text{Input2}) \tag{25}$$

Figure 3: Concise sketch map of linear structure.
In (25), DNNS(·) is the deep neural network scheduler, whose inputs are Input1 and Input2 in Figure 1 and (23). $\hat{o}_{ij}$ gives the probability that the current subproblem belongs to each priority.
There are n priorities, so $\hat{o}_{ij}$ has n elements:

$$\hat{o}_{ij} = \left[\hat{o}^{1}_{ij}, \hat{o}^{2}_{ij}, \dots, \hat{o}^{n}_{ij}\right] \tag{26}$$

4.6. Scheduling Sequence Generation Method.
In (10), the whole problem is decomposed into several subproblems. In this part, a scheduling sequence generation algorithm combines the solutions of the subproblems into a complete JSSP solution. The pseudocode of the method is shown in Algorithm 1.
In Algorithm 1, each iteration of the loop over i determines the scheduling order of one machine, and each iteration of the loop over j determines the position of one job on machine i.
When determining the order of jobs, in Step 7 the algorithm first chooses the most confident judgment of the neural network scheduler, that is, the output probability closest to 1. In Steps 8 and 9, once job $I_j$ is selected for priority $I_p$, the remaining entries of job $I_j$ and priority $I_p$ are set to 0, in accordance with constraints (2) and (3), to avoid conflicts in later iterations. Finally, the algorithm records the assignment in the output matrix X.
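The following NumPy sketch implements the greedy selection of Algorithm 1 as described above. It assumes O is stored as an (m, n, n) array of job-by-position probabilities, one n × n block per machine; the layout and function names are illustrative. One deliberate deviation: used rows and columns are marked with -1 instead of 0, so an already-assigned cell can never win a tie against genuinely zero probabilities.

```python
import numpy as np


def generate_schedule(O):
    """Greedy decoding in the spirit of Algorithm 1.

    O[i, j, p] is the network's probability that job j takes position p on
    machine i. Returns X of the same shape with exactly one 1 per row and per
    column of every machine slice, as constraints (2) and (3) require.
    """
    m, n, _ = O.shape
    X = np.zeros((m, n, n), dtype=int)
    for i in range(m):                      # one machine per outer iteration
        temp = O[i].astype(float).copy()
        for _ in range(n):                  # one job per inner iteration
            # Step 7: most confident remaining judgment (largest probability).
            I_j, I_p = np.unravel_index(np.argmax(temp), temp.shape)
            X[i, I_j, I_p] = 1              # record the assignment in X
            temp[I_j, :] = -1.0             # Step 8: job I_j is now placed
            temp[:, I_p] = -1.0             # Step 9: position I_p is now taken
    return X


def machine_orders_from(X):
    """Job sequence on each machine: for every position, the job assigned to it."""
    return [np.argmax(X[i], axis=0).tolist() for i in range(X.shape[0])]


O = np.array([[[0.7, 0.2, 0.1],    # job 0
               [0.6, 0.3, 0.1],    # job 1
               [0.1, 0.3, 0.6]]])  # job 2
X = generate_schedule(O)
```

In this toy input, job 1 also prefers the first position (0.6), but job 0's judgment (0.7) is more confident, so job 0 takes it and job 1 falls back to the second position.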

Generalization Performance of HDNNS.
HDNNS generalizes reliably. Specifically, the training results for smaller-scale JSSPs (with few machines) can easily be extended to solve larger-scale JSSPs (with many machines). This characteristic gives HDNNS a unique advantage: when a large-scale problem is difficult to solve with existing methods, HDNNS can train the network on solutions of small-scale problems and then use the trained model to schedule the large-scale problem.
In (25), the inputs of the trained scheduler consist of two sets of data: one-dimensional data and two-dimensional data generated by CTDT. As the number of machines increases, the input and output structures of the neural network do not change.
Although the absolute values of the parameters change, the correlations between them remain, and the scheduler uses these related features to complete the scheduling. Naturally, the larger the gap between the scale of the training data and that of the actual scheduling data, the larger the error of the results; this is discussed in the experiments.

An Example of HDNNS
In this section, we illustrate HDNNS with an example (m = 6, n = 8).
In the training section of Figure 1, we generate a series of JSSPs and solve them to obtain training data.
An algorithmically generated example is described in Tables 2 and 3, which define a 6 × 8 JSSP; Figure 6 shows the Gantt chart of its optimal solution (obtained with BBM). In Table 2, the number in row j and column k is the time required for job j's k-th position. In Table 3, the number in row j and column k is the machine required for job j's k-th position.
In Figure 6, the horizontal axis is time and the vertical axis lists the machines. Each block represents one operation, and different colors represent different jobs. The label j·k in a block means that job j's k-th position starts at the left edge of the block and ends at its right edge. Then, 48 subproblems (6 machines × 8 jobs) are generated according to (10). One-dimensional and two-dimensional features are extracted for each subproblem, and training data in the form of (23) are generated, as listed in Table 4.
Six groups of two-dimensional matrices generated by CTDT are visualized in Figure 7. Panels (a), (c), and (e) have high priority, and the other three have low priority.
In this extreme case of the highest and lowest priorities, images with the same priority clearly have much in common. In general, the hue of matrix $D^{2d}$ … The remaining three matrices describe the whole problem rather than the subproblem; therefore, the same images appear in different subproblems.
Although identifying similar priority categories is more difficult for human beings, our deep learning-based scheduler can effectively extract the priority information.
After training the network with the data in Table 4, we obtain a scheduler that responds quickly. When a new scheduling problem arrives, the scheduler processes it according to (25) and produces the priority matrix O. For this problem, an example of the output matrix O is shown in Table 5.
Finally, the scheduling sequence generation algorithm processes the output matrix to obtain the scheduling order X and the timing result shown in Figure 6.
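Once the job sequence on every machine is fixed, start times (and hence a Gantt chart like Figure 6) follow by starting each operation as early as its job predecessor and machine predecessor allow. The Python sketch below is one way to compute this, not necessarily the authors' procedure; the function name is hypothetical, and machine_orders[i] is assumed to be the job sequence of machine i read from X. It also reports sequences that conflict with the job routes, since independently decoded machine orders are not guaranteed to be feasible.

```python
def semi_active_schedule(times, machines, machine_orders):
    """Start times and MAKESPAN for fixed machine sequences.

    times[j][k], machines[j][k]: processing time and machine of job j's k-th
    operation (the layout of Tables 2 and 3). machine_orders[i]: job sequence
    on machine i. Each operation starts when both its job predecessor and its
    machine predecessor have finished.
    """
    n, m = len(times), len(machine_orders)
    next_op = [0] * n      # next unscheduled operation of each job
    next_pos = [0] * m     # next unscheduled position on each machine
    job_ready = [0] * n    # completion time of each job's last scheduled operation
    mach_ready = [0] * m   # completion time of each machine's last operation
    start = {}             # (job, operation index) -> start time
    remaining = sum(len(ops) for ops in times)
    while remaining:
        progressed = False
        for i in range(m):
            if next_pos[i] == len(machine_orders[i]):
                continue
            j = machine_orders[i][next_pos[i]]
            k = next_op[j]
            if k < len(machines[j]) and machines[j][k] == i:  # job j waits for machine i
                s = max(job_ready[j], mach_ready[i])
                start[(j, k)] = s
                job_ready[j] = mach_ready[i] = s + times[j][k]
                next_op[j] += 1
                next_pos[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("machine sequences conflict with job routes (deadlock)")
    return start, max(job_ready)


start, cmax = semi_active_schedule(
    times=[[3, 2], [2, 4]], machines=[[0, 1], [1, 0]], machine_orders=[[0, 1], [1, 0]])
# cmax == 7
```

In the toy instance, job 1's second operation must wait until machine 0 finishes job 0 at time 3, so it ends at 3 + 4 = 7, which is the MAKESPAN.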

Parameters and Effect Experiment.
In this part, the training process of HDNNS and the influence of different parameters on HDNNS are discussed.
This experiment trains the scheduler on the first 1500 problems and their labels and then tests it on the last 500 problems. The learning rate of the network is 0.01. The training curves are plotted in Figures 8-10.
In Figures 8-10, the horizontal axis is the number of training loops, and the vertical axes are classification accuracy, classification loss, and MAKESPAN [22] (completion time of processing), respectively. Curves of different colors represent results obtained with different model parameters L1 and L2. The loss evaluation index is calculated as

$$L(o, \hat{o}) = -\sum_{c}\sum_{k=1}^{n} y_{ck} \log \hat{o}_{ck} \tag{27}$$

Algorithm 1: Scheduling sequence generation algorithm.
Require: (1) priority matrix O; (2) number of jobs n; (3) number of machines m
Ensure: scheduling output matrix X
(4) Initialize the scheduling output matrix: X ← zeros((m, n, n))
(5) For i = 0; i < m; i++ do, with Temp ← O_i (the n × n output block of machine i)
(6) For j = 0; j < n; j++ do
(7) Find the most confident judgment of the neural network on machine i and get the indices I_j and I_p: I_j, I_p ← findMaxNumber(Temp)
(8) Set row I_j of Temp to zero: Temp[I_j, k] ← 0, k ∈ {1, 2, ..., n}
(9) Set column I_p of Temp to zero: Temp[k, I_p] ← 0, k ∈ {1, 2, ..., n}
(10) X[i, I_j, I_p] ← 1
(11) End for
(12) End for
(13) Return X

Computational Intelligence and Neuroscience
In (27), o is the target output, $\hat{o}$ is the probabilistic description of the current features belonging to the various classes, and L(·) is the loss function. $y_{ck}$ is a Boolean value indicating whether the target class of instance c (with input features Input1 and Input2) is k, and $\hat{o}_{ck}$ is the probability, predicted by the model, that instance c belongs to class k. There is a one-to-one correspondence between $\hat{o}_{ck}$ and $\hat{o}^{k}_{ij}$ in (26).
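The description of (27) matches the standard categorical cross-entropy. A NumPy sketch follows; averaging over instances c and clipping probabilities away from 0 are assumptions (common conventions), and the function name is illustrative.

```python
import numpy as np


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Categorical cross-entropy between one-hot targets y_ck and predicted
    probabilities o_ck, averaged over the instances c (rows)."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))


y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_prob = np.array([[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]])
loss = cross_entropy(y_true, y_prob)  # = -(ln 0.8 + ln 0.6) / 2
```

Only the probability assigned to the true class enters each term, so the loss falls toward 0 as the network becomes confident in the correct priority.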
The three figures show that the model's performance improves gradually as the number of training cycles increases, until the classification accuracy exceeds 90% and the model loss falls below 1. Moreover, the gap between the scheduling result and the optimal solution falls below 5%.
The above experiments show that HDNNS can effectively train a scheduler to complete the JSSP scheduling task.
Seven groups of model parameters were selected and tested. The results show that, among all the settings, L1 = 3 and L2 = 12 give the best results: the classification accuracy on the test set exceeds 93%, the loss is below 85%, and the MAKESPAN gap is below 4%.

Confusion Matrix of the Result.
In order to measure the effectiveness of HDNNS, this paper compares it with the classical ANN [3] method.
The ZLP dataset (7 × 7) [41] is used in this part to train both kinds of neural networks.
This experiment trains the scheduler on the first 1500 problems and their labels and then tests it on the last 500 problems. For HDNNS, the size of the convolution kernel is 3 × 3, L1 = 3, L2 = 12, and the learning rate is 0.01. For the ANN method, the ANN structure is 11-12-10-7 and the learning rate is 0.01. The classification confusion matrices of ANN and HDNNS (the output of Step 2.4 in Figure 1) are shown in Tables 6 and 7.
In Tables 6 and 7, the line i and column j is the number of times that the job with the ith position of the machine has been assigned to the jth position of the machine. e priority of job 2 in machine 1 is 2, meaning that this job is in the second position of machine 1's processing. If a scheduler classifies the location of job 2 in the machine 1 as 2, one will be added to the second row and the second column of the confusion matrix. If a scheduler classifies the location of job 2 in the machine 1 as 3, one will be added to the second row and the third column of the confusion matrix. erefore, the larger the number on the diagonal line, the higher the accuracy of the model. e bar figure of the confusion matrix is shown in Figures 11 and 12. Tables 6 and 7 and Figures 11 and 12 show that the classification performance of HDNNS is better than ANN. On the stability of classification, two methods can classify the highest and lowest priority jobs more accurately because the boundary of classification will introduce less noise interference. However, the classification accuracy of each priority of the HDNNS method is more stable. e accuracy of classification results of the HDNNS method fluctuates between 88% and 98%. In terms of classification accuracy, HDNNS can achieve 90% classification accuracy. It is better than 60% of the ANN method. e essence of the learning-based method is to estimate the probability from input to output by finding the implicit relationship between them. Because the ANN method does

/T total has a more significant impact on the final output. The traditional neural network does not have a strong ability to deal with combined features. Thus, ANN is hard to train effectively because of the vanishing gradient problem [36]. In this paper, a deep convolutional network is introduced into the scheduling problem to learn from combined features, which improves the accuracy of network classification.
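The counting rule described for Tables 6 and 7 can be sketched as follows. It assumes that positions are numbered from 1, that rows hold the true position and columns the assigned position, and that `confusion_matrix`, `accuracy` and `per_class_accuracy` are hypothetical helper names. The toy data reproduce the job 2 on machine 1 case: one correct classification as position 2 and one misclassification as position 3.

```python
def confusion_matrix(true_pos, pred_pos, n_positions):
    """Entry [i][j] counts jobs whose true position is i + 1 and whose
    assigned (predicted) position is j + 1."""
    cm = [[0] * n_positions for _ in range(n_positions)]
    for t, p in zip(true_pos, pred_pos):
        cm[t - 1][p - 1] += 1
    return cm

def accuracy(cm):
    """Overall accuracy: diagonal (correct) counts over all classifications."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def per_class_accuracy(cm):
    """Accuracy for each true position (row), as compared across priorities."""
    return [cm[i][i] / sum(cm[i]) if sum(cm[i]) else 0.0 for i in range(len(cm))]

# Job 2 on machine 1 (true position 2) is classified once as 2 and once as 3;
# two other jobs with true positions 1 and 3 are classified correctly.
cm = confusion_matrix([2, 2, 1, 3], [2, 3, 1, 3], 3)
print(cm)                      # [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
print(accuracy(cm))            # 0.75
print(per_class_accuracy(cm))  # [1.0, 0.5, 1.0]
```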

MAKESPAN and Time Consumption Comparisons in ZLP Dataset.
JSSP scheduling methods can be divided into two categories: population-based (gene-based) methods and learning-based methods. A population-based (gene-based) method obtains a near-optimal solution by iteratively updating a set of solutions. Its scheduling results are often better than those of the other methods, but the iterations incur a large time cost. Therefore, this subsection does not compare population-based (gene-based) methods.
This subsection discusses the performance of the HDNNS algorithm from two aspects: MAKESPAN and time consumption. It tests deep reinforcement learning (DEEPRM) [29,42], deep Q-learning (DQN) [43], artificial neural network (ANN) [3], Hopfield neural network (HNN) [44], stochastic processing time (SHPT) [45], and shortest processing time (SPT) [46] methods.
Confusion matrix (rows: true priority; columns: assigned priority)
              1     2     3     4     5     6     7
Priority 1  621    66    11     0     0     0     2
Priority 2   35   543    85    12     5     1    19
Priority 3   30    70   436    96    17    21    30
Priority 4    5    14   153   357    74    24    73
Priority 5    2     6    11   198   363    62    58
Priority 6    7     1     4    34   176   457    21
Priority 7    0     0     0     3    65   135
Eight groups of JSSPs are tested in Table 8. The first four groups are 8 * 8 JSSPs, and the last four groups are 13 * 13 JSSPs. Each problem is generated by a random function, and its processing time follows the uniform distribution p ∼ U(a, b). For HDNNS and ANN, the first 1500 JSSPs are used for training, and the last 500 JSSPs are used in the scheduling phase to test the performance of the model. The DEEPRM method is trained by interacting with the JSSP model. The size of the convolution kernel is 3 * 3, L1 = 3, and L2 = 12. The learning rate of HDNNS, ANN, and DEEPRM is 0.01, and the number of learning epochs is 100. For HNN, SHPT, and SPT, performance is tested directly on the last 500 problems. The experimental environment is a Lenovo k4450 running Ubuntu 16 with a CPU i4700 at 2.1 GHz, using Python and TensorFlow. The results of the experiment are shown in Table 8, which reports four indexes: average MAKESPAN, scheduling score, scheduling time, and training time. The scheduling score is calculated according to (28).
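To make the MAKESPAN index concrete, the sketch below generates random JSSPs with processing times drawn from U(a, b). It splits 2000 instances into the first 1500 and the last 500, then evaluates per-machine job sequences. It assumes that a position (priority) assignment on each machine is decoded into a semi-active schedule, where every operation starts as soon as both its job predecessor and its machine predecessor finish. The names `random_jssp`, `makespan` and `spt_sequences` are hypothetical. The integer-valued times simplify U(a, b), and the SPT rule shown is a simple dispatching variant that is not necessarily identical to the SPT method of [46].

```python
import random

def random_jssp(m, n, a, b, rng):
    """Hypothetical generator: job j visits every machine once in the random order
    routes[j]; times[j][k] is the processing time of its k-th operation, an
    integer drawn uniformly from [a, b] (a discrete stand-in for U(a, b))."""
    routes = [rng.sample(range(m), m) for _ in range(n)]
    times = [[rng.randint(a, b) for _ in range(m)] for _ in range(n)]
    return routes, times

def makespan(routes, times, machine_seq):
    """machine_seq[mc] lists jobs in processing positions 1, 2, ... on machine mc.
    Builds the semi-active schedule and returns its MAKESPAN."""
    n, m = len(routes), len(machine_seq)
    next_op = [0] * n     # index of each job's next unscheduled operation
    job_ready = [0] * n   # finish time of each job's last scheduled operation
    mach_ready = [0] * m  # finish time of each machine's last operation
    mach_pos = [0] * m    # next position to process on each machine
    remaining = sum(len(r) for r in routes)
    while remaining:
        progressed = False
        for mc in range(m):
            if mach_pos[mc] == len(machine_seq[mc]):
                continue
            job = machine_seq[mc][mach_pos[mc]]
            k = next_op[job]
            # Schedulable only if this machine is the job's next machine.
            if k < len(routes[job]) and routes[job][k] == mc:
                end = max(job_ready[job], mach_ready[mc]) + times[job][k]
                job_ready[job] = mach_ready[mc] = end
                next_op[job] += 1
                mach_pos[mc] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("infeasible machine sequences (deadlock)")
    return max(job_ready)

def spt_sequences(routes, times, m):
    """Simple SPT dispatching: repeatedly dispatch, among the jobs' next
    operations, the shortest one (ties go to the lower job index)."""
    next_op = [0] * len(routes)
    seq = [[] for _ in range(m)]
    while True:
        ready = [(times[j][next_op[j]], j) for j in range(len(routes))
                 if next_op[j] < len(routes[j])]
        if not ready:
            return seq
        _, j = min(ready)
        seq[routes[j][next_op[j]]].append(j)
        next_op[j] += 1

rng = random.Random(0)
problems = [random_jssp(8, 8, 1, 20, rng) for _ in range(2000)]
train, test = problems[:1500], problems[1500:]  # first 1500 train, last 500 test
avg = sum(makespan(r, t, spt_sequences(r, t, 8)) for r, t in test) / len(test)
print(f"average SPT MAKESPAN on the test split: {avg:.1f}")
```

Because the SPT sketch dispatches operations in job order, its machine sequences can never deadlock. Sequences decoded from independently predicted positions can, which is why `makespan` checks for it.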
For the learning-based methods (HDNNS, DQN, DEEPRM, and HNN), time consumption is divided into training time and scheduling time. The training time is the total time needed for 100 epochs of model training, and the scheduling time is the total time needed to test the 500 JSSPs.
The ANN method is tested in two cases. ANN (1D), proposed in [3], uses only one-dimensional features as input, and ANN (ALL) uses the flattened one-dimensional and two-dimensional features as input. ANN (1D) has a smaller network structure, so it trains and schedules faster. Although the scheduling results of ANN (ALL) are better than those of ANN (1D), its training time increases significantly with the JSSP scale. This is because the ANN structure has no advantage in dealing with complex scheduling information, and it cannot adequately capture the relationship between the combined features and the output. In general, the scheduling effect of ANN is better than that of the SPT and SHPT methods.
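The difference in input size between ANN (1D) and ANN (ALL) can be illustrated with a small fully connected network. This sketch assumes 11 one-dimensional input features (matching the 11-12-10-7 structure used on the ZLP dataset), a hypothetical 7 * 7 two-dimensional feature matrix, and the same hidden layers for both variants. The weights are random, so the output only demonstrates shapes and parameter counts, not a trained scheduler.

```python
import math
import random

def param_count(sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))

def ann_all_input(features_1d, features_2d):
    """ANN (ALL)-style input: 1D features followed by row-major flattened 2D features."""
    return list(features_1d) + [v for row in features_2d for v in row]

def mlp_forward(x, sizes, rng):
    """Untrained forward pass: ReLU hidden layers, softmax over the output classes
    (here, the 7 possible positions on a machine)."""
    if len(x) != sizes[0]:
        raise ValueError("input length does not match the first layer size")
    h = list(x)
    for layer, (a, b) in enumerate(zip(sizes, sizes[1:])):
        # Each row holds a weights plus one bias (the last entry).
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(a + 1)) for _ in range(a + 1)] for _ in range(b)]
        z = [sum(wij * hj for wij, hj in zip(row, h + [1.0])) for row in w]
        h = [max(0.0, v) for v in z] if layer < len(sizes) - 2 else z
    mx = max(h)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - mx) for v in h]
    total = sum(exps)
    return [v / total for v in exps]

rng = random.Random(0)
x_1d = [rng.random() for _ in range(11)]                     # hypothetical 1D features
x_2d = [[rng.random() for _ in range(7)] for _ in range(7)]  # hypothetical 2D features
x_all = ann_all_input(x_1d, x_2d)                            # 11 + 49 = 60 inputs
print(param_count([11, 12, 10, 7]))           # 351
print(param_count([len(x_all), 12, 10, 7]))   # 939
probs = mlp_forward(x_all, [len(x_all), 12, 10, 7], rng)
print(max(range(7), key=probs.__getitem__) + 1)  # predicted position (untrained)
```

The flattened input nearly triples the parameter count, yet the fully connected layers still treat every input independently. They have no built-in notion of the local, combined structure that a convolution kernel exploits.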
Hopfield neural network (HNN) can also obtain scheduling results effectively. However, unlike ANN, HNN seeks a stable point through its evolution and uses it to produce the schedule. HNN performs well on small-scale problems but struggles with large-scale problems.
DEEPRM and DQN are scheduling methods based on reinforcement learning (DEEPRM's network structure is the same as that of HDNNS, and DQN uses a standard deep network). These methods do not need labeled training data, but they require a great deal of interaction with the scheduling environment during training. In most cases, interactive learning is much slower than learning from training data. For JSSPs, where labeled data are easy to obtain, DEEPRM and DQN are therefore at a disadvantage in training efficiency.
HDNNS is stable under different processing time distributions p ∼ N(a, b) and different problem scales m and n. Moreover, its scheduling ability is maintained at 90% of the optimal solution, which is superior to that of ANN and HNN. Although the training time of HDNNS is longer than that of ANN (1D), this does not affect real-time scheduling in applications because the training phase can be completed beforehand. Considering the scheduling performance of all the algorithms, HDNNS has significant advantages.

MAKESPAN and Time Consumption Comparisons in Traditional Dataset.
This subsection uses the same methods as in Section 6.3 to solve the classical JSSPs ft10 [48], ft20 [48], la24 [49], la36 [49], abz7 [50], and yn1 [51]. The experimental procedure is as follows. First, 2000 JSSPs of the same scale as the JSSP under test are generated. Then, a state-of-the-art method is used to find the optimal (or near-optimal) solutions as training data. The solutions of the smaller JSSPs (ft10, ft20, and la24) are generated by the BBM method [47], and those of the larger JSSPs (la36, abz7, and yn1) are generated by the GA method. The first 1500 JSSPs are used for training, and the last 500 JSSPs are used in the scheduling phase to test the performance of the model. The DEEPRM method is trained by interacting with the JSSP model. The learning rate of HDNNS, ANN, and DEEPRM is 0.01, and the number of learning epochs is 100. The experimental environment is a Lenovo k4450 running Ubuntu 16 with a CPU i4700 at 2.1 GHz.
The test results are shown in Table 9, whose structure is the same as that of Table 8. The first column shows the optimal solution. Six popular JSSPs are tested in Table 9.
The brackets below each JSSP name indicate the size of the problem.
Testing on individual benchmark instances introduces randomness, so we recommend measuring the effectiveness of an algorithm by the average over a large number of test instances (as with the ZLP dataset).

MAKESPAN Comparisons with Traditional Classification Algorithms.
In this subsection, several traditional classification methods are compared with HDNNS. Because HDNNS is essentially a classification-based method, such a comparison is necessary. We replace the deep neural network scheduler in Figure 1 with each of the other classification methods and measure the effect.
In this experiment, we test k-nearest neighbor (KNN) [52], support vector machine (SVM) [26], decision tree (DT) [53], extremely randomized trees (ERT) [54], and Gaussian model (GOSS) [55]. The test dataset is the ZLP dataset [41], which contains 2000 15 * 12 JSSPs (m = 15, n = 12) and 2000 15 * 18 JSSPs (m = 15, n = 18). The solutions of these JSSPs are generated with the GA method. The first 1500 JSSPs are used for training, and the performance of each method is then tested on the last 500 problems. The parameters of the above classification methods are the default parameters of the scikit-learn (sklearn) toolkit for Python 3. The results are shown in Table 10.
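Swapping the deep network for another classifier only requires an object with `fit` and `predict` methods, the convention that scikit-learn estimators such as `KNeighborsClassifier` follow. So that it runs without third-party packages, the sketch below uses a hand-written 1-nearest-neighbor classifier. It also adds a hypothetical decoding step that turns the predicted positions of one machine's jobs into a job sequence by sorting, with ties broken by job index. The paper does not specify this tie-breaking, and the feature vectors and labels are made up.

```python
from typing import List, Protocol, Sequence

class PriorityClassifier(Protocol):
    """Anything with scikit-learn-style fit/predict can stand in for the network."""
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "PriorityClassifier": ...
    def predict(self, X: Sequence[Sequence[float]]) -> List[int]: ...

class NearestNeighbor:
    """1-nearest-neighbor classifier (squared Euclidean distance), a stand-in for KNN."""

    def fit(self, X, y):
        self.X = [list(x) for x in X]
        self.y = list(y)
        return self

    def predict(self, X):
        labels = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in self.X]
            labels.append(self.y[dists.index(min(dists))])
        return labels

def sequence_from_priorities(jobs, predicted_positions):
    """Hypothetical decoding: order one machine's jobs by predicted position,
    breaking ties by job index."""
    return [job for _, job in sorted(zip(predicted_positions, jobs))]

def schedule_machine(clf: PriorityClassifier, jobs, features):
    """Predict a position for each job on one machine, then decode a sequence."""
    return sequence_from_priorities(jobs, clf.predict(features))

# Toy usage with hypothetical 2D feature vectors and position labels 1..3.
clf = NearestNeighbor().fit([[0.0, 1.0], [5.0, 5.0], [9.0, 0.0]], [1, 2, 3])
print(schedule_machine(clf, [0, 1, 2], [[8.5, 0.2], [0.1, 0.8], [4.9, 5.2]]))  # [1, 2, 0]
```

Because independent per-job predictions can collide (two jobs assigned the same position), some decoding rule like the sorting above is needed regardless of which classifier is plugged in.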
The experimental results show that HDNNS has a significant advantage over traditional classification algorithms. Although the traditional methods are more efficient, they reach only about 80% of the quality of the near-optimal solution. Therefore, HDNNS is clearly the stronger choice within the framework of this paper.
6.6. Analysis of Generalization Performance. HDNNS has good scalability: a trained scheduler can be used to solve problems with different numbers of machines m. In other words, models trained on less complex problems can be used to solve more complex problems. On this premise, it is necessary to measure the generalization of the models at different levels of complexity.
This subsection discusses the performance of models trained on small-scale data when solving large-scale problems. In the experiment, 1500 JSSPs (with labels generated by GA) are used for training. Then, groups of larger problems (with larger numbers of machines m) are used to test the scheduling capability of HDNNS. To obtain a credible conclusion, the experiment generates 500 different JSSPs and the corresponding near-optimal solutions for each group.
The box plot of the experiment is shown in Figure 13.
In Figure 13, each box represents the test result of one group. The top and bottom multiplication symbols mark the maximum and minimum values in the test, and the triangles between them mark the 1st and 99th percentiles of the 200 data points. The lower and upper bounds of each box are the 25th and 75th percentiles. The longer horizontal line in the middle of the box is the median, and the shorter horizontal line is the mean. Figure 13 shows that the closer the scale of the test problems is to that of the training problems, the better the performance. The average ratio of the MAKESPAN obtained by HDNNS to that obtained by GA is 0.97, and the MAKESPAN of the scheduling results is also stable. As the number of machines increases, the model's performance gradually decreases, as reflected in both the quality and the stability of the solutions. However, the decline in solving ability is neither rapid nor unacceptable.
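The statistics drawn in each box of Figure 13 can be computed with Python's standard `statistics` module, as in the sketch below. The list of ratios is hypothetical. The percentile interpolation (the default "exclusive" method of `statistics.quantiles`) may differ from the one used to draw the figure.

```python
import statistics

def box_stats(ratios):
    """Summary statistics shown in one box of a Figure 13-style box plot."""
    pct = statistics.quantiles(ratios, n=100)  # 99 cut points: 1st..99th percentiles
    return {
        "min": min(ratios),
        "p1": pct[0],
        "q1": pct[24],   # 25th percentile (lower bound of the box)
        "median": statistics.median(ratios),
        "mean": statistics.fmean(ratios),
        "q3": pct[74],   # 75th percentile (upper bound of the box)
        "p99": pct[98],
        "max": max(ratios),
    }

# Hypothetical per-problem ratios for one test group of 200 problems.
ratios = [0.80 + 0.001 * i for i in range(200)]
for name, value in box_stats(ratios).items():
    print(f"{name:>6}: {value:.4f}")
```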
These results indicate that the scheduler can extract scheduling knowledge from simple JSSPs and apply it successfully to more complex scheduling problems.
Specifically, the average solution quality is greater than 0.86 for all test problems.

Conclusion
A hybrid deep neural network scheduler with offline training and online real-time scheduling is created in this paper. In this scheduler, we present two innovations based on the machine learning framework. One is the convolution two-dimensional transformation (CTDT), which converts the irregular data in the scheduling process into regular data; this enables deep convolution operations to be used to solve JSSPs. The other is a hybrid deep neural network structure including a convolution layer, a fully connected layer, and a flattening layer, which can effectively extract scheduling knowledge. The results show that the MAKESPAN index of HDNNS is 9% better than that of HNN and 4% better than that of ANN on the ZLP dataset. With the same neural network structure, the training time of the HDNNS method is significantly shorter than that of the DEEPRM method. Besides, the scheduler has strong generalization ability and can solve large-scale scheduling problems using small-scale training data.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Computational Intelligence and Neuroscience
Conflicts of Interest
The authors declare that they have no conflicts of interest.