Healthcare Scheduling by Data Mining: Literature Review and Future Directions

This article presents a systematic literature review of the application of industrial engineering methods in healthcare scheduling, with a focus on the role of patient behavior in scheduling. Nine articles that used mathematical programming, data mining, genetic algorithms, and local searches for optimum schedules were obtained from an extensive search of the literature. These methods are new approaches for solving problems in healthcare scheduling. Some are adapted from areas such as manufacturing and transportation. Key findings from these studies include reduced time for scheduling, the capability of solving more complex problems, and the incorporation of more variables and constraints simultaneously than traditional scheduling methods. However, none of these methods modeled patient no-show or walk-in behavior. Future research should include more variables related to the patient and/or the environment.


INTRODUCTION

Background
A healthcare provider operates in a regulated industry in which the quality of its performance is evaluated in terms of the services rendered to the patient and the effectiveness of the provider's process. A measure of process effectiveness is the number of patients that visit the practice per day. Patient satisfaction is an important measure of quality of healthcare. A well-designed scheduling system could increase patient satisfaction, access to care, and the effectiveness of healthcare operations [6, 7 and 8]. Surveys suggest that excessive waiting time is a major reason for patient dissatisfaction [9]. In addition to clinical competence, a reasonable waiting time is expected [10]. Patient satisfaction is influenced by the performance of the scheduling system. The scheduling process is a critical element for patients and practitioners, and is often the first contact between the patient and provider [11]. Patients want to be seen on time by physicians, who in turn want a system to triage patients efficiently [11].
Patients who schedule appointments face direct and indirect waiting times [3]. Indirect waiting time is the time between a patient's request and the actual appointment time. Long indirect waiting times prevent patients with acute needs from being seen by the provider in a timely manner [12]. Direct waiting time is the time between the scheduled appointment time and the time the patient receives care. The scheduling system can affect both direct and indirect waiting times. Whereas direct waiting is an inconvenience to the patient, excessive indirect waiting can pose a serious safety concern [13]. If patients cannot receive healthcare service at the time needed, their condition can deteriorate, perhaps even become life-threatening.
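The two waiting times defined above can be illustrated with a minimal sketch. The function name `waiting_times` and the example timestamps are hypothetical and serve only to show how the two quantities are separated.

```python
from datetime import datetime


def waiting_times(requested, scheduled, seen):
    """Return (indirect, direct) waiting times as timedeltas.

    indirect: from the patient's request to the scheduled appointment time
    direct:   from the scheduled appointment time to the start of care
    """
    indirect = scheduled - requested
    direct = seen - scheduled
    return indirect, direct


# Hypothetical patient: request on 1 August, appointment on 15 August,
# seen 25 minutes after the scheduled time.
requested = datetime(2011, 8, 1, 9, 0)
scheduled = datetime(2011, 8, 15, 10, 0)
seen = datetime(2011, 8, 15, 10, 25)

indirect, direct = waiting_times(requested, scheduled, seen)
print(indirect.days, "days indirect;", direct.seconds // 60, "minutes direct")
```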
In developing patient schedules, there are a number of external factors that a provider cannot control, such as no-show appointments and cancellations, emergency appointments, and walk-ins. These factors are determined by patients' needs. No-shows and walk-ins disrupt resource utilization plans and hence operational efficiency.

Industrial Engineering Methods Applied to Healthcare Scheduling
This paper discusses the application of industrial engineering techniques to improve scheduling in healthcare. We define two categories of healthcare scheduling: work (patient) scheduling and provider resource scheduling. In health systems, the patient is the work. The work/patient can be scheduled for those departments where there are slots available (e.g., outpatient clinic). For other departments, the work scheduling is simply based on first-in first-out or triage assessments (e.g., emergency room). Work scheduling has to be accompanied by resource scheduling, which includes staff, equipment, and room scheduling. This schedule is based on the conditions for each department, such as the number of slots available, the time required for each slot, and equitability among personnel (number of hours to be worked, days in a week, and holidays).

Work Scheduling
Patient appointments are scheduled based on the number of appointment slots available. The number of appointments available is established based on the type of work, such as regular visits, follow-up visits, tests and procedures, and education sessions, and on the number of providers available by hour and day of the week. It is the current practice that new appointments are made over the phone or internet and the patient has to be fit into a pre-existing schedule [6]. Follow-up appointments can be made at the end of the visit or by phone/internet. Appointments are made until all available slots are filled. Scheduling can be performed using a manual system (e.g., appointment book) or a computerized system. Both methods have to search for an available spot based on patient requirements. The manual system requires that the scheduler have specific knowledge and job experience. The computerized system can do the same, albeit electronically.

Staff Scheduling
Staff scheduling can be performed only after the staff requirements are determined. Staff scheduling assigns each staff member to a pattern of work and leave days [14]. We define two types of staff scheduling: cyclical and non-cyclical. Cyclical scheduling assigns the same pattern over a certain number of days or weeks, with the advantage of employees' familiarity with the schedule and the disadvantage of a lack of flexibility in accommodating demand and staff needs. Non-cyclical scheduling generates new schedules over a short period of time (e.g., two weeks) based on the demand and available staff. Although it can accommodate changes in demand and staff needs, non-cyclical scheduling requires more planning time than cyclical scheduling. Cyclical and non-cyclical schedules have been developed using heuristics, trial-and-error, or optimization techniques [14].

An Overview of Healthcare Scheduling Method Literature
In this section, we review the proposed methods used in the healthcare scheduling literature. The early articles [15,16] are based on mathematical programming methods and queuing theory. These studies assume that the conditions in which work is performed are static. Lindley (1952) used the case of a single server and random patient arrivals, and concluded that scheduling at regular intervals improves system performance [15]. Following Lindley's model, more mathematical models were developed in which dynamic conditions were added to simulate the actual environment more closely. The mathematical model proposed by Jansson [17] featured an individual block with constant appointment times. Soriano [18] later modelled multiple-block/fixed intervals. Subsequently, Mercer [19] modelled individual-block and late arrivals, also considering the probability of patient no-show. Later mathematical models use dynamic programming, variable-block/fixed interval [20], multi-server queuing models with nonhomogeneous arrivals [21], a queuing system with multiple doctors and random arrival time [22], and two-stage stochastic linear programming to determine the optimal appointment intervals [23].
Bailey [16] used a simplified queuing technique assuming fairly static conditions, such as appointment times at regular intervals based on the average consultation time, patients arriving punctually for their appointments, patients being seen in the order they arrive, and the provider seeing only one patient at a time. Some variables, such as the number of patients visiting the clinic, the length of the appointment interval, and the number of patients waiting at any time, were varied for comparison. The goal of the model was to reduce patient waiting and consultant idle time. Bailey's model used a Monte Carlo simulation technique and is considered one of the first simulation models. It was further developed to include the scheduling of groups of patients based on appointment length, and break time for consultants [24].
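A Monte Carlo simulation under the static conditions listed above can be sketched as follows. This is an illustration only, not Bailey's original model: the exponential consultation times, the parameter values, and the function names `simulate_clinic` and `monte_carlo` are assumptions made for the example.

```python
import random


def simulate_clinic(n_patients, interval, mean_service, rng):
    """Simulate one clinic session with a single provider.

    Patients arrive punctually at regular intervals and are seen one at a
    time in order of arrival. Consultation times are drawn from an
    exponential distribution (an assumption made for this sketch).
    Returns (mean patient wait, total provider idle time).
    """
    free_at = 0.0      # time at which the provider finishes the current patient
    total_wait = 0.0
    idle = 0.0
    for i in range(n_patients):
        arrival = i * interval
        start = max(arrival, free_at)
        total_wait += start - arrival
        idle += max(0.0, arrival - free_at)
        free_at = start + rng.expovariate(1.0 / mean_service)
    return total_wait / n_patients, idle


def monte_carlo(runs=2000, seed=1, **kwargs):
    """Average the two performance measures over many simulated sessions."""
    rng = random.Random(seed)
    results = [simulate_clinic(rng=rng, **kwargs) for _ in range(runs)]
    mean_wait = sum(w for w, _ in results) / runs
    mean_idle = sum(i for _, i in results) / runs
    return mean_wait, mean_idle


# Appointment interval set equal to the average consultation time.
wait, idle = monte_carlo(n_patients=20, interval=10.0, mean_service=10.0)
print(f"mean wait per patient: {wait:.1f} min, provider idle time: {idle:.1f} min")
```

Varying `interval` or `n_patients` in such a sketch mirrors the kind of comparison described above, trading patient waiting against consultant idle time.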
Based on a mathematical model, Welch [25] concluded that the factors that may influence appointment scheduling are punctuality of medical staff and patients, and appointment intervals (time between appointments).
More simulation models were developed a few years after Bailey's model. Fetter and Thompson [26] added some of the variables that were considered fixed in the previous models, such as patient punctuality, number of appointments, no-show rates, walk-in rates, appointment intervals, and patient loads. Vissers and Wijngaard [27] modelled a system based on five variables, including mean consultation time, patient punctuality, number of appointments, and "system earliness", that is, allowing patients to arrive earlier than the expected moment of treatment. Ho and Lau [28, 29, and 30] developed some of the most comprehensive models by including 50 appointment rules. They concluded that no-shows, service time, and number of patients are the major factors affecting system performance.
Klassen and Rohleder [31,32] simulated patients with different service-time variances under fixed appointment intervals. They concluded that assigning "low-variance" patients (patients with low variance in service time) to the beginning of the appointment session may perform better. They also included two slots for emergencies, and scheduler error when classifying patients.
These techniques showed improved scheduling results but have limitations. The first is a lack of generality: the studies analyzed a specific clinic, accounting for specific environmental conditions that may not apply to other clinics. A second limitation is the use of overly simplistic models, such as single-server, single-phase models with non-random arrival patterns. A third limitation is that none of these models accounted for the effects of walk-ins, no-shows, or emergencies, which are common in real life.
This review paper focuses on analysis of more complex methods used in healthcare scheduling that can improve scheduling results through the inclusion of greater variability and complexity.

Research Question
The specific research question of this literature study is: Do any of these techniques analyze data that can be used to improve scheduling in healthcare?

Search Strategy
To retrieve relevant studies from peer-reviewed literature, the PubMed electronic database was searched. The search included articles published up to August 2011 using the following keywords: MODEL AND (DATA) AND (HEALTHCARE OR MEDICAL) AND (SCHEDULING). In addition to the electronic search, bibliographies in the relevant papers were reviewed to identify additional studies.

Selection Criteria
The studies that were included in our review met all of the following requirements: (a) the paper described the application of a data modeling technique to improve scheduling in healthcare, and the modeling technique had to be a data mining technique; (b) the modeling technique was applied to a real database; and (c) the paper was a full report published in English in a peer-reviewed journal. After article titles were identified from the database, the abstracts were reviewed for possible inclusion. For the abstracts selected for inclusion, or abstracts that provided insufficient information, full papers were retrieved. The same procedures were applied to the articles that were selected from bibliographies. After reading the full article, the articles that met our criteria were selected.

Data Extraction
The following items were documented from the included studies: locations (U.S. versus Non-U.S.), study designs, study periods, number of exposed groups, exposure variables, and outcomes.

Quality Assessment
The Epidemiological Appraisal Instrument (EAI) developed by Genaidy (2004) was used to critically appraise the methodological quality of the studies. The EAI [33] consists of 43 questions grouped into five scales: reporting, subject selection, measurement quality, data analysis, and generalization of results.
Every item in the EAI was rated for a given study using one of a fixed set of response options; items rated "Not applicable" were not included in the score calculation.
The final score for each scale was taken as the average of the values recorded for each item.
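The scale score described above can be sketched as a simple average that skips inapplicable items. The numeric coding is an assumption for this example: item values on a 0 to 2 range (consistent with the maximum of 2.0 reported for the scores in this review) and `None` standing in for "Not applicable". The function name `scale_score` is hypothetical.

```python
def scale_score(item_values):
    """Average the recorded item values for one EAI scale.

    Items marked None ("Not applicable" in this sketch) are excluded from
    the calculation. Returns None when no item on the scale was rated.
    """
    rated = [v for v in item_values if v is not None]
    if not rated:
        return None
    return sum(rated) / len(rated)


# Hypothetical ratings for one scale of one study.
print(scale_score([2, 1, None, 0]))
```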

Identification of Studies
The selection criteria enumerated in Section 2.3 were applied throughout the identification process. Following the search of the electronic database, a total of 2,174 citations were obtained, of which 2,066 citations were excluded for irrelevance to the topic of the review. The remaining 108 abstracts were reviewed, of which 73 were further excluded for not meeting the selection criteria. Among the 35 full papers retrieved, 26 were excluded for not meeting the selection criteria. Nine relevant studies were finally identified. Figure 1 shows a flow diagram of studies accepted and rejected during this identification phase.
We arranged the nine articles in two groups. The first group consisted of six studies that describe techniques for patient scheduling [34, 35, 36, 37, 38 and 39]. The second group consisted of three studies that describe techniques for nurse scheduling [40, 41 and 42].

Description of Evidence
Description of evidence was summarized for all nine studies in terms of intervention, outcomes, study design, and main results. The description of evidence for Group 1, the six studies of patient scheduling, is listed in Table 1. These studies measured outcomes of patient waiting time, idle time for staff or devices, and overall duration. In particular:
1. Whereas Chien et al. [37] applied a genetic algorithm to scheduling, Podgorelec and Kokol [34] introduced a combined genetic algorithm and machine learning approach to solve the scheduling problem.
2. Two of the studies [38,39] introduced mathematical models to schedule patients and personnel in hospital services.
3. Kaandorp and Koole [36] used a local search procedure to find the optimal schedule with a weighted average of expected waiting times for patients, idle time for doctors, and tardiness as objectives.
4. A data mining approach to support simulation modeling of patient flow was introduced by Isken and Rajagopalan [35].

Figure 1. Flow diagram of articles selected in the process of study identification.

Methods
• An evaluation and scaling function is used to assign a fitness value to each chromosome. In this function, $W_{\max}(k)$ is the maximum waiting time for chromosome $k$, $C_{\max}(k)$ is the makespan of chromosome $k$, $P(k)$ is a penalty function of chromosome $k$, $w_w$ is the weight of the maximum waiting time, and $w_c$ is the weight of the makespan.
• A selection mechanism is employed to improve the solution quality.
○ The roulette wheel approach is used for population selection.
• To validate the solution, a mixed integer programming model is used to provide the optimal solution.
• Objectives: to minimize the maximum waiting time and minimize the makespan.
where $x_j$ is a dimensional pattern vector.
• For cluster selection, a measure named cluster purity was used. It uses the number of distinct DRGs appearing in a cluster.

Main results
• Data preparation positively affects the quality of solutions.
• The K-means algorithm completed the task in 3-5 s (when implemented in ClustanGraphics 5).

Table 2 summarizes the description of evidence for the three articles that reported techniques used to improve nurse scheduling (Group 2). These three studies measured the reduction in time necessary to develop the schedule:
1. The studies used three different methods for staff scheduling: an auction-optimization method, a mathematical model, and an indirect genetic algorithm.
2. The outcomes showed reduced time allocated to staff scheduling.
3. All studies used different constraints, such as the number of hours worked in a week and the number of nights worked in a week.
4. All studies were performed in the USA, Canada, and UK.
Additionally, the descriptions of all the models in the studies are summarized in Table 2.

○ Compulsory constraints (e.g., one physician must be assigned to one shift at a time);
○ Ergonomic constraints (e.g., limits on the number of weekly hours of certain types of shifts);
○ Distribution constraints (e.g., seniority); and
○ Goal constraints (e.g., number of worked hours per week).
• Two programs are developed:
○ One to generate the model in a format accessible to the branch-and-bound software, by reading an input file;
○ The other to read the solution, create an output file, and identify violations of the ergonomic rules.

Main results
• This approach can take into account more rules than any human expert.
• It violated ergonomic rules 40% less often than the human expert (111 violations compared with 185 by the human expert).
• It is faster than a manual method.

Intervention
• Use of an indirect genetic algorithm method and a heuristic method in nurse scheduling.

Outcome
• Minimize total preference cost of all nurses

Study population
• 52 real hospital datasets

Methods
• The problem is formulated as an integer linear program.
• In the objective function, $x_{ij}$ is a decision variable equal to 1 if nurse $i$ works shift pattern $j$ and 0 otherwise; $p_{ij}$ is the preference cost of nurse $i$ working shift pattern $j$; $n$ is the number of nurses; and $m$ is the number of shift patterns.
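Read together with the stated outcome (minimizing the total preference cost of all nurses), these definitions suggest an objective of the following form. This is a reconstruction from the definitions alone, not necessarily the authors' exact formulation, which may restrict the inner sum to the shift patterns feasible for each nurse and is accompanied by coverage constraints not shown here.

```latex
\min \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}\, x_{ij},
\qquad x_{ij} \in \{0, 1\}
```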

Table 2. Description of evidence for nurse scheduling (Group 2) (Continued)
Source Description

Aickelin and Dowsland, 2004 [41]
• The genetic algorithm tries to find the best possible ordering of the nurses, and the decoder builds the actual solution.
• The decoder calculates the fitness function, in which $R_{ks}$ is the demand for nurses, $a_{jk}$ and $q_{is}$ are decision variables, and $w_{\text{demand}}$ is a penalty weight.

Main results
• 51 of the 52 datasets are solved to or near optimality, and a feasible solution is found.
• The indirect GA approach is shown to be more flexible and robust than tabu search.

DeGrano et al. 2009 [42]
Intervention
○ For the award step, the objective function maximizes the point value of bids awarded to the candidate winners.
○ For schedule completion, the objective function awards bids which were not selected as candidate winners but can create a feasible assignment.

Main results
• It took 2.073 s to determine the candidate winners and generate the formulation. The award stage took 2 minutes and 55 s. Completing the schedule took 5.74 s.
• Overall, it took about 3 minutes to generate the schedule, which is much faster than the manual method.
• The auction-optimization approach can account for both the nurse preferences and the hospital constraints, and generate a good schedule.
• It fulfilled 98.27% of "on" requests and 95.51% of "off" requests.

Critical Appraisal
Studies were evaluated based on scheduling improvement through minimizing patient waiting time, appointment span, and idle time for doctors. The overall critical appraisal referred to in Figures 2 and 3 applies to the overall quality of a study. For Group 1, the first six studies, Figure 2 shows that the studies ranged from low quality (0.24 out of a maximum of 2.0 for the study by Kaandorp and Koole [36]) to average quality (0.97 for the study by Ogulata et al. [38]). For Group 2, Figure 3 shows that all three studies are of marginal quality (between 0.79 and 0.86 out of a maximum of 2.0).
The study quality, as measured by critical appraisal, was divided into reporting, subject selection, measurement quality, data analysis, and generalization of the study. The details of critical appraisal of the studies in Group 1 are shown in Figure 4. The following information was deduced:
1. All studies described the reporting elements to some extent, with the exception of that by Kaandorp and Koole [36], which scored 0.38 and did not report any use of real data or confounders/covariates.
2. Most studies gave inadequate details about data selection, with the general score ranging from 0 to 1.00. Only the study by Ogulata et al. [38] described the use of patient data.

3. All studies were average in terms of measurement quality (ranging between 0.86 and 1.14 out of a maximum of 2), with the exception of Isken and Rajagopalan [35] and Chakraborty et al. [39], which did not report any measurements.

4. The scores for data analysis methods were quite low, being below average or zero (not reported).

5. The scores of study generalization were 2 for all studies except those by Chakraborty et al. [39] and Isken and Rajagopalan [35], which had a score of zero. This was due to the unreported participation rate. A score of 2 means that the results of the study may be applicable to the eligible population or other relevant groups.
The details of critical appraisal of the studies in Group 2 are shown in Figure 5. The following conclusions were deduced:
1. The scores of reporting elements were below average (0.62-0.77).

3. The measurement quality scores were marginal to average for all studies, between 0.67 and 1.00.

4. The scores for data analysis were low to marginal (0 to 0.67).

5. The scores of study generalization were at the maximum for each study, as all participation rates were reported as being 100%.
The measurement quality was divided into quality exposure, blind measurement, outcome, and observation period. Because the studies were only intervention studies, the answers to the questions dealing with exposure were not applicable, resulting in scores that were not reported for this part. The outcome scores were, on average, marginal (0.67). The observation period had a score of two for all studies, except Kaandorp and Koole [36] (score of 0).

Mathematical Modelling and Optimization
Mathematical modelling is based on the optimization concepts of linear programming. The objective of mathematical modelling is to minimize or maximize the objective function. It has to take into account a number of constraints defined by the problem. In general, optimization techniques include linear programming, integer programming (a linear programming technique that requires the variables to be integers), mixed integer programming, dynamic programming, constructive algorithms, and more. In this review, four studies fall into this category: two that deal with physician and nurse scheduling [40,42] and two that deal with patient/staff scheduling [38,39]. In line with optimization methodology, all studies defined the variables that were the inputs to the model, and the objective functions.

Study quality scores for nurse scheduling studies in Group 2.
Beaulieu et al. [40] defined a multi-objective integer model, in which all solutions had to be integers. The objective function sought to minimize a weighted sum of all deviations.
Ogulata et al. [38] used three hierarchical mathematical models with the objectives of maximizing the number of patients seen in one week, obtaining a balanced distribution of patients among physicians, and minimizing patient waiting time. The mathematical models were applied in three different stages: weekly patient selection, assignment of physiotherapists, and patient scheduling.
The study by Chakraborty et al. [39] introduced sequential clinical scheduling, which can be formulated using dynamic programming. This study was based on an earlier work [43] whose objective function was the maximization of the expected revenue for patients seen minus the cost for patient waiting and staff revenue, and which included no-show appointments in the scheduling model. This approach was further adopted in subsequent research [44,6] featuring the same objective function and constraints but a different distribution of the number of patient no-shows, whereas the study by Chakraborty et al. [39] considered the distribution of patient no-shows to be homogeneous.
DeGrano et al. [42] took a different approach in their optimization model. In the initial stage of the model, there was an auction in which nurses could bid on the shifts they desired. In the second stage, the model awarded the shifts and did the scheduling. The objective for the award model was to maximize the point value of bids awarded to the candidate winners. The objective for the assignment model was to award bids which were not selected as candidate winners but could create a feasible assignment.

Genetic Algorithms Used in Patient and Nurse/Physician Scheduling
Genetic algorithms (GAs) are heuristic search methods used in solving complex search and optimization problems. GAs are based on the mechanism of natural selection and genetics. Often, they are able to find the optimal solution in more complex search spaces (the set of possible solutions) and can present significant benefits over other search and optimization techniques [34]. Three studies used genetic algorithms for scheduling patients or nurses: Podgorelec and Kokol [34]; Chien et al. [37]; and Aickelin and Dowsland [41]. Genetic algorithms typically involve six steps: 1) selection of initial population, 2) reproduction, 3) crossover, 4) mutation, 5) evaluation, and 6) selection of solution. All three studies followed this general GA scheme, as described below:

• The initial population can be selected randomly or through a seeded population. Podgorelec and Kokol [34] used a seeded population by filling the table with therapies in a random time order. Chien et al. [37] used a local search heuristic to derive the sequence of patients and therapies that represented the chromosome. Aickelin and Dowsland [41] built the chromosome as permutations of individual nurses.

• The reproduction step is explained in only two articles. Podgorelec and Kokol [34] assigned negative points to each individual selected from the initial population based on the values of some parameters. The fewer negative points an individual had, the better its chances of being selected for crossover. In Chien et al. [37], chromosomes were randomly selected.

• The crossover step was described in Podgorelec and Kokol [34] and Chien et al. [37]. The first study used the method of cutting chromosomes into different parts along the timeline and then randomly putting them together. The second study used a modified version of precedence-preserving order-based crossover. After individuals were selected from the initial population, all precedence-dependent therapies were identified. These therapies were copied into the offspring chromosome at the same position. The remaining therapies were copied from parent 2 into the offspring following the same order.

• The mutation step was described in Podgorelec and Kokol [34] as a mutation between two random activities. In Chien et al. [37], the mutation procedure was performed by randomly selecting a therapy from a parent chromosome. The leftmost and rightmost positions were searched to determine an interval within which the selected therapy could mutate without violating the precedence constraint. The therapy was then inserted into the selected position.

• The evaluation method was different in each study, but all used a fitness score to evaluate the chromosomes. Podgorelec and Kokol [34] and Chien et al. [37] used a combination of maximum waiting time and the makespan (the total duration of all services for a day) as the evaluation function. Aickelin and Dowsland [41] used the total preference cost of all nurses as an evaluation function.

• The selection of a solution was made based on the evaluation function. In the two studies involving patient scheduling [34,37], the solution with the minimum waiting time and the lowest makespan was chosen. In the study involving nurse scheduling [41], the solution with the minimum total preference cost was chosen.
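The six steps above can be sketched as a small permutation-based GA. This is a simplified illustration, not the implementation of any reviewed study: it uses a random initial population, roulette-wheel selection, a plain order crossover, and swap mutation, without the precedence handling of Chien et al. [37]. The job data, the weights, and all function names are hypothetical. The cost combines the maximum waiting time and the makespan as a weighted sum.

```python
import random


def evaluate(order, jobs, w_wait=1.0, w_span=1.0):
    """Weighted sum of maximum waiting time and makespan for one chromosome."""
    clock, max_wait = 0.0, 0.0
    for idx in order:
        ready, duration = jobs[idx]
        start = max(clock, ready)            # one shared resource, no overlap
        max_wait = max(max_wait, start - ready)
        clock = start + duration
    return w_wait * max_wait + w_span * clock


def roulette(population, costs, rng):
    """Roulette-wheel selection; a lower cost gets a larger slice of the wheel."""
    weights = [1.0 / (1.0 + c) for c in costs]
    return rng.choices(population, weights=weights, k=1)[0]


def order_crossover(p1, p2, rng):
    """Copy a slice from parent 1, then fill the gaps in parent 2's order."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(fill) for g in child]


def mutate(order, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen positions."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def genetic_schedule(jobs, pop_size=30, generations=100, seed=0):
    """Run the GA and return the best order found and its cost."""
    rng = random.Random(seed)
    base = list(range(len(jobs)))
    population = [rng.sample(base, len(base)) for _ in range(pop_size)]
    best = min(population, key=lambda o: evaluate(o, jobs))
    for _ in range(generations):
        costs = [evaluate(o, jobs) for o in population]
        children = []
        for _ in range(pop_size):
            p1 = roulette(population, costs, rng)
            p2 = roulette(population, costs, rng)
            children.append(mutate(order_crossover(p1, p2, rng), rng))
        population = children
        best = min(population + [best], key=lambda o: evaluate(o, jobs))
    return best, evaluate(best, jobs)


# Hypothetical therapies: (time the patient is ready, duration), in minutes.
jobs = [(0, 30), (10, 20), (5, 15), (40, 10), (20, 25)]
best_order, best_cost = genetic_schedule(jobs)
print(best_order, best_cost)
```

Keeping the best chromosome seen so far across generations is a design choice of this sketch; it guarantees that the returned solution never gets worse as the search proceeds.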

Local Search and Data Mining Models
Two articles dealt with the use of local searches and data mining techniques for patient and nurse scheduling: Kaandorp and Koole [36] and Isken and Rajagopalan [35]. The local search method used by Kaandorp and Koole [36] started with a feasible solution and searched for a better solution in its neighbourhood until a local minimum was found. A local minimum is not necessarily a global minimum, but with a well-chosen neighbourhood, it was possible to find the global minimum. To find a feasible solution, the authors used a mathematical scheduling model that calculated mean patient waiting time, physician idle time, and tardiness. The objective was to minimize the weighted sum of the three variables. The number of solutions given by this model was so large that a search algorithm was needed to select the best one.
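The local search idea can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure of Kaandorp and Koole [36]: the schedule is a tuple giving the number of patients booked into each slot, a neighbour moves one patient to another slot, and the stand-in objective uses a fixed (deterministic) service time instead of expected values. All names and parameter values are hypothetical.

```python
def cost(x, slot=10.0, service=8.0, w_wait=1.0, w_idle=1.0, w_tardy=1.0):
    """Stand-in objective: weighted sum of waiting, idle time, and tardiness."""
    free_at, wait, idle = 0.0, 0.0, 0.0
    for t, count in enumerate(x):
        arrival = t * slot
        for _ in range(count):
            start = max(arrival, free_at)
            wait += start - arrival
            idle += max(0.0, arrival - free_at)
            free_at = start + service
    tardy = max(0.0, free_at - len(x) * slot)   # time worked past session end
    return w_wait * wait + w_idle * idle + w_tardy * tardy


def neighbours(x):
    """All schedules obtained by moving one patient to a different slot."""
    for i in range(len(x)):
        if x[i] == 0:
            continue
        for j in range(len(x)):
            if i != j:
                y = list(x)
                y[i] -= 1
                y[j] += 1
                yield tuple(y)


def local_search(x, objective):
    """Move to a strictly better neighbour until none exists (local minimum)."""
    x = tuple(x)
    current = objective(x)
    improved = True
    while improved:
        improved = False
        for y in neighbours(x):
            value = objective(y)
            if value < current:
                x, current, improved = y, value, True
                break
    return x, current


# Start from a feasible schedule that books all four patients at time zero.
schedule, value = local_search((4, 0, 0, 0), cost)
print(schedule, value)
```

Whether the local minimum reached is also global depends on the neighbourhood and the objective, which is the point made in the text about choosing the neighbourhood well.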
Isken and Rajagopalan [35] used a data mining technique, specifically clustering with methods such as K-means. The authors started with the concept that patients have different individual needs, such as different types of treatments or sequences of treatments, and consequently, different resources need to be allocated to individual patients. This would be too large a problem to solve, so the solution would be to classify patients into groups that have the same needs.
To classify patients into groups, Isken and Rajagopalan [35] used the number of total hours spent in different categories of hospital units, and the path and associated lengths of stay as input variables. The authors also used Diagnosis Related Groups (DRGs) and Clinical Classification Software (CCS), which provided information about the diagnoses and procedures. The K-means clustering method was employed to classify patients. K-means is a clustering algorithm applied in unsupervised classification problems to find k optimal clusters in a data set. The algorithm proceeds iteratively, repeating its steps until some convergence criterion is met.
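The standard K-means iteration (often called Lloyd's algorithm) can be sketched in plain Python as follows. This is the textbook formulation, which may differ in detail from the variant and software used by Isken and Rajagopalan [35]. The patient data (hours spent in two categories of hospital units) and the function names are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (labels, centroids). Stops when the assignment no longer
    changes or after `iterations` rounds.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)       # initial centroids: k random points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:            # convergence criterion
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical patients: (hours in intensive care, hours in general ward).
patients = [(2, 40), (3, 38), (1, 45), (30, 5), (28, 8), (33, 4)]
labels, centroids = kmeans(patients, k=2)
print(labels)
```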

DISCUSSION
A key point in improving healthcare delivery is the improvement of process performance. Part of process performance is balancing demand and resources. In the healthcare system, the demand is the number of patients to be seen every day and the resources are the nurses, physicians, rooms, and instrumentation that are available. Balancing demand and resources leads to the problem of scheduling patients and resources. In this paper, we focused exclusively on methods for patient and staff scheduling. All of the computerized methods developed in recent decades try not only to improve the quality of service, but also to help schedulers do their work faster [37]. The new methods focus not only on staff scheduling, but equally on patient scheduling. These methods aim at reducing patient waiting time (indirect and direct) and the total time spent in the medical facility.
As Podgorelec and Kokol [34] noted, regardless of the methods used, there are basic rules that must be followed for successful development of a qualitative and effective automated scheduling system, such as feasibility of all obtained solutions and fulfillment of all constraints. Additionally, an adequate solution must be found in a reasonable amount of time. An indispensable property of the scheduling technique is the capability of solving general and independent problems, i.e., applicability to various situations, especially unpredicted ones. For example, it should allow the system to search for solutions even when activities already scheduled are cancelled. One objective of these methods is to reduce the time and effort in schedule construction [40,41]. Achieving these objectives will result, indirectly, in an improvement of healthcare quality and a reduction in cost, parameters that are essential to the wellbeing of the healthcare system. These models take into account a number of variables such as the number of open slots per day, number of days in the schedule, length of each appointment, number of patients to be scheduled each day, number of patients assigned to each staff member, overall duration of activities, process starting time and completion time, earliest starting time and latest completion time, number of shifts available during the schedule period, and resource utilization cost.
The studies included in this review showed that data mining approaches were more successful in achieving the objectives of scheduling than standard (manual) scheduling methods. Besides achieving the objectives listed above, data mining approaches showed new advantages that were not directly related to the main objectives. For example, in the study by DeGrano et al. [42], the auction-optimization approach accommodated the preferences of individual nurses. Furthermore, the schedule could be generated in a reasonable amount of time. In the study by Isken and Rajagopalan [35], the K-means algorithm performed well, completing the task in 3-5 seconds. The study also showed that data preparation is an important step in scheduling and has great implications for the quality of the solution. Some approaches, such as the study by Aickelin and Dowsland [41], show that the use of genetic algorithms can be more flexible in implementation than tabu search.
In addition to the variables used in these models, variables related to patient behavior should be used as model inputs. All the models included in this review assumed that patients and staff were always on time; however, this is not true in real life. Occasionally, physicians are late for their clinics [24] or patients are late for their appointments. Many missed appointments involve patients not showing up without canceling in advance. Reviews of the scientific literature concluded that no-show rates may be around 20% [47], varying between 15-30% in general adult and pediatric clinics [48] and 2-15% in private practices [46]. A missed patient appointment (no-show) has an adverse effect on resource utilization in healthcare services, resulting in under-utilization of clinic capacity [49], loss of revenue, inefficient scheduling, and underutilization of personnel [51]. While previous efforts have identified some relevant elements of these systems, they fail to provide a holistic, quantitative approach combining the organization's scheduling system and patient behavior into a common framework.
Because constraints are specific to each healthcare facility, one may ask the question, "Can these AI models be applied to a wide range of healthcare facilities?" The challenge remains in developing procedures that can resolve the conflicts arising from the use of different sets of rules developed by multiple experts [55].
Other considerations that have to be taken into account relate to patients, such as indirect waiting time, late cancellations and no-shows, emergency walk-ins, and patient preferences [3]. The scheduling system can affect both direct and indirect waiting time. Both direct and indirect waiting times may pose a serious risk to patient safety. Both no-shows and walk-ins influence the scheduling system. No-shows create gaps in the schedule that reduce revenue without reducing cost. Also, by not showing up for an appointment, a patient denies that appointment to another patient and increases indirect waiting time. Walk-ins occur randomly and may increase waiting time for patients and overload nurses and physicians. However, managing no-shows and walk-ins may be a challenging task. Finally, most of the reviewed models do not take patient preferences into account in scheduling. Although preferences differ from one person to another, one model [35] created groups of patients (clusters) with similar preferences and then used these preference groups in developing a schedule.
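The effect of no-shows and walk-ins on a clinic session can be illustrated with a simple expected-value calculation. The sketch below is not taken from any of the reviewed studies; the function name `expected_session_load` and the example figures (20 booked slots, 3 walk-ins on average) are hypothetical, and the 20% no-show rate is used only because it matches the order of magnitude reported above.

```python
def expected_session_load(booked, no_show_rate, mean_walk_ins):
    """Return expected attended appointments, unused slots, and total patients seen."""
    if not 0.0 <= no_show_rate <= 1.0:
        raise ValueError("no_show_rate must be between 0 and 1")
    attended = booked * (1.0 - no_show_rate)  # booked patients expected to arrive
    unused = booked * no_show_rate            # gaps left in the schedule by no-shows
    total_seen = attended + mean_walk_ins     # walk-ins add unplanned demand
    return attended, unused, total_seen

# Hypothetical session: 20 booked slots, a 20% no-show rate, 3 walk-ins on average.
attended, unused, total_seen = expected_session_load(20, 0.20, 3)
print(attended, unused, total_seen)
```

The calculation treats every patient as having the same no-show probability, which is the simplification that patient-level prediction would aim to remove.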

CONCLUSION
The methods covered in this literature review represent new approaches to solving the problem of scheduling in healthcare. They were adopted from other areas, such as manufacturing or transportation. Due to the complex and dynamic healthcare environment, their implementation has to be carefully planned. In the reviewed studies, data mining models improved healthcare scheduling. The advantage of data mining techniques is that they improve the competitiveness of the healthcare scheduling system by discovering patterns and trends in the large data sets generated by healthcare systems and using these patterns and trends in decision making. Data mining is considered to be the best approach so far for dealing with complex healthcare scheduling issues.
One limitation of these scheduling methods was that they were designed for specific providers and thus lacked generality. Another limitation was that each method concentrated on a single measure of performance at a time, such as time-based measures (waiting time, idle time). More external variables, such as patient behavior, should be taken into account in developing improved scheduling methods.
One of the problems yet to be solved is the inclusion of no-shows, cancellations, and walk-ins in a scheduling system. Accurately predicting no-shows, cancellations, and walk-ins could significantly improve scheduling performance and consequently increase provider revenue. Future research should forecast no-shows, cancellations, and walk-ins and suggest techniques for incorporating these forecasts into the scheduling system.
The use of data mining techniques to assist scheduling in healthcare can have a great impact on the scheduling system. This impact can be attributed to: (a) the large number of factors that can be taken into account; (b) the weighting of the influence of each factor; (c) the quantification and processing of a large number of variables; and (d) the reduction of the time necessary for scheduling.

Figure 3. Overall critical appraisal scores for nurse scheduling studies in Group 2.

Figure 4. Study quality scores for patient scheduling studies in Group 1.

Figure 5. Study quality scores for nurse scheduling studies in Group 2.

Table 1. Description of evidence for patient scheduling (Group 1)

Source: Chakraborty et al., 2010 [39]
Description:
• The objective function is unimodal for general service time, and the unimodality is independent of the type of the service time distribution.
• Develops a special case of gamma service times which requires significantly less computation.
Main results:
• It shows how the computational needs can be reduced significantly when service times are approximated by a gamma distribution.
• The value of the maximum profit decreases with increasing variance of the service time.

Source: Chien et al., 2008 [37]
Description:
○ A set of initial solutions is encoded as a set of chromosomes called a population.
○ Crossover and mutation are applied to the population to generate offspring from parents.

Table 1. Description of evidence for patient scheduling (Group 1) (Continued)

Source: Ogulata et al., 2008 [38]
Description:
○ 2nd stage: assignment to physiotherapist.
○ 3rd stage: patient scheduling; each patient is assigned to one time interval in a working day.
• Using the stage 1 model, 54 patients were selected based on their priority. Of these, 44% had high priority, 49% medium, and 7% low.
• In stage two, the patients were assigned to physiotherapists as follows: 14 patients to the 1st therapist, 14 to the 2nd, 13 to the 3rd, and 13 to the last one.
Main results:
• The proposed algorithm scheduled 6 more patients than the scheduling method used in the hospital, and scheduled 24 patients with high priority compared with 15 scheduled with the existing method.
• The proposed algorithm decreased patient waiting time.
• Limitations: the algorithm does not take into account patient preference for day of the week.

○ Minimize $w_w W_{\max} + w_c C$
○ $\alpha_W W(x) + \alpha_I I(x) + \alpha_L L(x)$, where $\alpha_W$, $\alpha_I$ and $\alpha_L$ are the weights for mean waiting time, idle time, and tardiness.
• Since the number of solutions is too large, a neighborhood is defined.

Table 1. Description of evidence for patient scheduling (Group 1) (Continued)

Source: Podgorelec and Kokol, 1997 [34]
Description:
• A number of random schedules were generated.
• Each individual was evaluated for its fitness score. The individuals with the best scores were selected.
• The crossover procedure was applied until the new individuals filled the complete population. Each time an individual was added, another individual (the one with the lowest fitness score) was eliminated.
• The mutation operator was applied with some probability.
• All phases were repeated until an acceptable solution evolved.
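The genetic algorithm steps listed for Podgorelec and Kokol [34] can be sketched on a toy problem. The code below is an illustrative steady-state variant under stated assumptions, not the authors' implementation: the chromosome encoding (one time slot per patient), the fitness measure (number of slot conflicts), the function names `conflicts` and `evolve`, and all parameter values are hypothetical.

```python
import random

def conflicts(schedule):
    """Count patients who share a time slot with an earlier patient (lower is better)."""
    return len(schedule) - len(set(schedule))

def evolve(n_patients, n_slots, pop_size=20, generations=200, mutation_rate=0.1, seed=0):
    """Toy steady-state GA; requires n_patients >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    # Generate a number of random schedules (gene i is the slot of patient i).
    population = [[rng.randrange(n_slots) for _ in range(n_patients)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate each individual and rank the population, best first.
        population.sort(key=conflicts)
        if conflicts(population[0]) == 0:
            break  # an acceptable (conflict-free) schedule has evolved
        # Select two parents from the better half of the population.
        mother, father = rng.sample(population[: pop_size // 2], 2)
        # One-point crossover produces a new individual.
        cut = rng.randrange(1, n_patients)
        child = mother[:cut] + father[cut:]
        # Mutation is applied with some probability.
        if rng.random() < mutation_rate:
            child[rng.randrange(n_patients)] = rng.randrange(n_slots)
        # The new individual replaces the one with the lowest fitness.
        population[-1] = child
    return min(population, key=conflicts)

best = evolve(n_patients=6, n_slots=8)
print(best, conflicts(best))
```

Because each new individual only ever replaces the worst one, the best schedule found so far is never lost. A realistic fitness function would also have to encode the feasibility and constraint requirements discussed in the text.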

Table 2. Description of evidence for nurse scheduling (Group 2)

Source: Beaulieu et al., 2000 [40]
Overall critical appraisal scores for patient scheduling studies in Group 1.
Studies shown: Aickelin and Dowsland, 2004 [41]; Beaulieu et al., 2000 [40]; DeGrano et al., 2009 [42].

The articles included in this review presented different methods for patient and nurse/physician scheduling: mathematical modeling/optimization, genetic algorithm, local search, and data mining. All these methods had common objectives: