Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

1 Informatics Methodology Department, Institute of Mathematics and Informatics, Vilnius University, Akademijos Street 4, LT-08663 Vilnius, Lithuania
2 Department of Software Development, Faculty of Electronics and Informatics, Vilniaus Kolegija University of Applied Sciences, J. Jasinskio Street 15, LT-01111 Vilnius, Lithuania
3 Department of Didactics of Mathematics and Informatics, Faculty of Mathematics and Informatics, Vilniaus University, Naugarduko Street 24, LT-03225 Vilnius, Lithuania
4 Operational Research Sector at System Analysis Department, Institute of Mathematics and Informatics, Vilniaus University, Akademijos Street 4, LT-08663 Vilnius, Lithuania


Background and Introduction
The teaching of scientific computing, parallel computing, and advanced programming receives permanent attention from scientists and educators. Different approaches, models, and solutions for teaching, assessment, and evaluation have been proposed. The constructivist model for advanced programming education is one of these approaches [1,2]. In this model, a learner constructs the relevant knowledge, experiments with the provided environment, observes the results, and draws conclusions. The constructivist approach incorporates model-centered learning as well as comparative programming teaching methods. By experimenting with the provided model, "turning" it and observing it from different sides, comparing different solutions, and analysing the outcomes and drawing conclusions, the learner improves the relevant knowledge and competences.
Modern technologies rely widely on parallel computing, and plenty of scientific and industrial applications use parallel programming techniques. Teaching parallel computing is one of the most important and challenging topics in scientific computing and advanced programming education. In this research, we present a methodology for the introduction to scientific and parallel computing. The methodology builds on the constructivist technology and uses a model-centered approach and learning by comparison. The paper proposes a set of programming models based on stochastic simulations of a multiphase queueing system. The multiphase queueing system is selected because of the simplicity of its primary definitions and the wide possibilities for parallelization. We implement different parallelization techniques for programming the model. This allows us to carry out a series of experiments with different programming models, compare the results, and investigate the effectiveness of parallelization and of the different parallelization methods. These methods include shared memory, distributed memory, and hybrid parallelization and are implemented with the MPI and OpenMP APIs. Figure 1 presents the model-centered approach to the introduction to scientific computing and parallel programming.
The paper continues the authors' earlier research, presented in [3]. A possible application scope of this research is a second-level course in scientific computing and programming with an emphasis on pipeline and parallel programming and modelling.

Scientific Computing in Science and Engineering Education.
Scientific computing plays an important role in science and engineering education. World-leading universities and organizations pay increasing attention to the curriculum and to educational methods. Allen et al. report on a new graduate course in scientific computing taught at Louisiana State University [4]: "The course was designed to provide students with a broad and practical introduction to scientific computing which would provide them with the basic skills and experience to very quickly get involved in research projects involving modern cyber infrastructure and complex real world scientific problems." Shadwick stresses the importance of more comprehensive teaching of scientific computing [5]: "...computational methods should ideally be viewed as a mathematical tool as important as calculus, and receive similar weight in curriculum."
The Scope of Scientific Computing Education. One of the tasks in scientific computing education is to provide a general understanding of solving scientific problems. Heath writes, "...try to convey a general understanding of the techniques available for solving problems in each major category, including proper problem formulation and interpretation of results..." [6]. He offers a wide curriculum to be studied, including systems of linear equations, eigenvalue problems, nonlinear equations, optimization, interpolation, numerical integration and differentiation, partial differential equations, the fast Fourier transform, random numbers, and stochastic simulation. All these topics require a large amount of computation and may require parallelization to be solved. Karniadakis and Kirby II state that "scientific computing is the heart of simulation science" [7]: "With the rapid and simultaneous advances in software and computer technology, especially commodity computing, the so-called supercomputing, every scientist and engineer will have on her desk an advanced simulation kit of tools consisting of a software library and multi-processor computers that will make analysis, product development, and design more optimal and cost-effective." The authors suggest integrating the teaching of MPI tools into the educational process. A large number of MPI implementations are currently available, each of which emphasizes different aspects of high-performance computing or is intended to solve a specific research problem. Other implementations deal with grid, distributed, or cluster computing, solving more general research problems, but such applications are beyond the scope of this paper. Heroux et al. describe scientific computing as "...a broad discipline focused on using computers as tools for scientific discovery" [8]. The authors claim, "The impact of parallel processing on scientific computing varies greatly across disciplines, but we can strongly argue that it plays a vital role in most problem domains and has become essential in many."
Teaching of Parallel Computing. The NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing (PDC), Core Topics for Undergraduates, contains comprehensive research on the curriculum for parallel computing education [9]. The authors suggest including the teaching of PDC: "In addition to enabling undergraduates to understand the fundamentals of 'von Neumann computing,' we must now prepare them for the very dynamic world of parallel and distributed computing." Zarza et al. report, "high-performance computing (HPC) has turned into an important tool for modern societies, becoming the engine of an increasing number of applications and services. Along these years, the use of powerful computers has become widespread throughout many engineering disciplines. As a result, the study of parallel computer architectures is now one of the essential aspects of the academic formation of students in Computational Science, particularly in postgraduate programs" [10]. The authors notice significant gaps between theoretical concepts and practical experience: "In particular, postgraduate HPC courses often present significant gaps between theoretical concepts and practical experience." Wilkinson et al. offer "...an approach for teaching parallel and distributed computing (PDC) at the undergraduate level using computational patterns. The goal is to promote higher-level structured design for parallel programming and make parallel programming easier and more scalable" [11]. Iparraguirre et al. share their experience from a practical PDC course for Argentinian engineering students [12]. One of their suggestions is that "Shared memory practices are easier to understand and should be taught first."
Constructivist and Model-Centered Learning. R. N. Caine and G. Caine in their research [13] propose the main principles of constructivist learning. One of the most important principles is as follows: "The brain processes parts and wholes simultaneously." Under this approach, a well-organized learning process should provide details as well as underlying ideas. In his research [1], Ben-Ari develops a constructivist methodology for computer-science education. Wulf [2] reviews "the application of constructivist pedagogical approaches to teaching computer programming in high school and undergraduate courses." The model-centered approach can enhance constructivist learning by providing a tool for study and experimentation. Using model-centered learning, we first present the goal of the research and then provide a model for simulation experiments. This allows us to analyze the results and to draw the relevant conclusions. Gibbons introduced model-centered instruction in 2001 [14]. The main principles are as follows: (i) the learner's experience is obtained by interacting with models; (ii) the learner solves scientific and engineering problems using simulation on models; (iii) problems are presented in a constructed sequence; (iv) specific instructional goals are specified; (v) all the necessary information within a solution environment is provided.
Xue et al. [15] introduce "teaching reform ideas in the 'scientific computing' education by means of modeling and simulation." They suggest "...the use of the modeling and simulation to deal with the actual problem of programming, simulating, data analyzing..."

Parallel Computing for Multiphase Queueing Systems

Multiphase Queueing Systems and Stochastic Simulations.
A general multiphase queueing system consists of a number of servicing phases that serve arriving customers. The arriving customers move through the phases step by step from entrance to exit. If a servicing phase is still busy with the previous customer, the current customer waits in the queue in front of that phase.
The extended Kendall classification of queueing systems uses six symbols, A/B/C/D/E/F, where A is the distribution of intervals between arrivals, B is the distribution of service duration, C is the number of servers, D is the queueing discipline (omitted for FIFO, first in, first out), E is the system capacity (omitted for unlimited queues), and F is the number of possible customers (omitted for open systems) [16,17].
For example, the M/M/1 queue represents a model with a single server, where arrivals are determined by a Poisson process, service times have an exponential distribution, and the population of customers is unlimited. The interarrival and service times are independent random variables. We are interested in the sojourn time of a customer in the system and its distribution. The general schema of the multiphase queueing system is presented in Figure 2.
We consider both the interarrival and the service times to be exponentially distributed random variables. Depending on the parameters of the exponential distributions, we distinguish different traffic conditions for the observed queueing system: ordinary traffic, critical traffic, and heavy traffic. We are interested in investigating the distribution of the sojourn time for these different cases [18-20], and we will use Monte-Carlo simulations for collecting the relevant data.

Recurrent Equation for Calculating the Sojourn Time.
In order to design a modelling algorithm for the queueing system described above, some additional mathematical notation is needed. Our aim is to calculate and investigate the distribution of the sojourn time of the N-th customer in a multiphase queueing system with M phases. We can prove the following recurrent equation [19]. Let t_j denote the arrival time of the j-th customer, let S_j^(i) denote the service time of the j-th customer at the i-th phase, and let τ_j = t_j − t_{j−1}, for j = 1, 2, ..., N and i = 1, 2, ..., M. Then the following recurrence for the sojourn time T_{i,j} of the j-th customer at the i-th phase is valid:

T_{i,j} = max(T_{i−1,j}, T_{i,j−1} − τ_j) + S_j^(i),  T_{i,0} = 0 ∀i,  T_{0,j} = 0 ∀j.  (1)
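This recurrence can be implemented directly as a small dynamic program over phases and customers. The following is a minimal stand-alone C sketch (the array layout and function name are ours; the paper's own code is in the appendices):

```c
/* Sojourn times T[i][j] of customer j at phase i, from the recurrence
     T(i,j) = max(T(i-1,j), T(i,j-1) - tau_j) + S(i,j),
   with boundary conditions T(0,j) = T(i,0) = 0.
   M phases, N customers; tau has N+1 entries (tau[0] unused);
   S is (M+1)*(N+1) entries in row-major order, S[i*(N+1)+j].
   Returns the sojourn time of the last customer at the last phase. */
double sojourn_last(int M, int N, const double *tau, const double *S)
{
    double T[M + 1][N + 1];                 /* C99 variable-length array */
    for (int i = 0; i <= M; i++) T[i][0] = 0.0;
    for (int j = 0; j <= N; j++) T[0][j] = 0.0;
    for (int i = 1; i <= M; i++)
        for (int j = 1; j <= N; j++) {
            double busy  = T[i][j - 1] - tau[j];  /* phase still occupied? */
            double start = busy > T[i - 1][j] ? busy : T[i - 1][j];
            T[i][j] = start + S[i * (N + 1) + j];
        }
    return T[M][N];
}
```

For instance, with one phase, service time 2 for both customers, and the second customer arriving one time unit after the first, the second customer waits one unit and its sojourn time is 3.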
For the shared memory decomposition, we use tasks and the dynamic decomposition technique in the case of the pipeline (transversal) decomposition, and we use the loop decomposition technique in the case of the threads (longitudinal) decomposition. For the distributed memory decomposition, we use the standard message passing MPI tools. For the hybrid decomposition, we use shared memory (loop decomposition) for the longitudinal decomposition and MPI for the pipeline decomposition.

Parallelization for Multiphase Queueing Systems.
Statistical sampling for modelling the sojourn time distribution (Figure 3) constitutes the general schema of the simulation experiment on the queueing system.
The programming model of the multiphase queueing system is based on the recurrent equation presented above. The Monte-Carlo simulation method is used to obtain the statistical sample for modelling the sojourn time distribution at the exit of the system.

Stochastic Simulations and Longitudinal Decomposition.
One solution is to use the longitudinal decomposition along the statistical trials axis and to parallelize the Monte-Carlo trials. We can map a single trial or a group of trials to each process or thread, depending on the preferred granularity and the total number of trials. The schema of the longitudinal decomposition is presented in Figure 4. A three-dimensional model of the longitudinal decomposition is presented in Figure 5.

Pipelining and Transversal Decomposition.
In the other dimension (customers), the parallelization technique is not as straightforward as in the previous case of parallelizing the statistical trials dimension. A difficulty arises because the algorithm has a pipeline structure: as a customer moves from the previous servicing phase to the current one, all the data from the previous stage are needed for the calculations in the current stage. We use the transversal decomposition, and the number of customers in each chunk depends on the preferred granularity and the total number of customers. Figure 6 presents the decomposition in the case of the customers' dimension.
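The dependence structure of the transversal decomposition can be illustrated as a wavefront over a phases-by-chunks grid, where cell (i, c) needs cells (i−1, c) and (i, c−1), exactly as in the sojourn-time recurrence. The sketch below (names ours) encodes this ordering with OpenMP task dependences; the grid update itself is only a stand-in for the real per-chunk computation. Without OpenMP the pragmas are ignored and the loops already run in a dependence-respecting order, producing the same result:

```c
#include <string.h>
#include <stddef.h>

/* Wavefront over an (M+1) x (C+1) grid stored row-major in T:
   cell (i,c) depends on (i-1,c) and (i,c-1).  With OpenMP enabled,
   independent anti-diagonal cells run in parallel as tasks. */
void wavefront(int M, int C, double *T)
{
    memset(T, 0, sizeof(double) * (size_t)((M + 1) * (C + 1)));
    #pragma omp parallel
    #pragma omp single
    for (int i = 1; i <= M; i++)
        for (int c = 1; c <= C; c++) {
            double *up   = &T[(i - 1) * (C + 1) + c];
            double *left = &T[i * (C + 1) + (c - 1)];
            double *cur  = &T[i * (C + 1) + c];
            /* stand-in for processing one chunk of customers at phase i */
            #pragma omp task depend(in: up[0], left[0]) depend(out: cur[0])
            *cur = (*up > *left ? *up : *left) + 1.0;
        }
}
```

Because each cell adds 1 to the maximum of its two predecessors, cell (i, c) ends up holding i + c − 1, the length of the longest dependence path leading to it, which makes the ordering easy to check.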

Distributed Memory Implementation.
The distributed memory implementation is based on MPI tools. In both cases, that is, the longitudinal and the transversal decomposition, the message passing interface provides synchronization tools, and there is no need for additional programming constructions.
Hybrid Models and HPC. Hybrid models provide a natural solution for computational platforms based on high-performance computer clusters. They use MPI tools to perform the decomposition and OpenMP tools for multithreading.

Dynamic and Static Scheduling. Pipelining requires dynamic scheduling, since there are dependencies between the nodes of the pipeline. In the shared memory case, we must take care of the scheduling ourselves. If we use the OpenMP tasking model, two approaches are relevant. The first is to obtain dynamic scheduling by monitoring a critical shared memory resource and using a task-yield construct. The second is to use the dynamic task creation technique. In the case of MPI, synchronization is performed by the interface system, so there is no need for additional programming constructions.

Sequential Programming Model
The flowchart of the sequential program model is presented in Figure 7. The algorithm uses the recurrent equation and loops over the queueing system phases, customers, and statistical trials. The program model of the sequential approach uses the programming language C and the GSL (GNU Scientific Library); it is presented, with comments, in Appendix A of this paper.

Programming Model for Distributed Memory Parallelization
The programming model for the distributed memory longitudinal decomposition is based on MPI tools, and it is well suited to the multicore/multinode computer architecture. All the processes receive a full copy of the program code, and the routing is done by using the process number. The flowchart of the longitudinal decomposition is presented in Figure 8. The programming language C model, with comments, is presented in Appendix B.

Transversal Decomposition.
The flowchart for the transversal decomposition is presented in Figure 9. Processes are attached to the customers' axis, which is divided into chunks. We use a mutual message passing technique, and MPI is responsible for the scheduling. Statistical trials are divided into the relevant chunks, which provides the desired granularity.
The programming language C model, with comments, is presented in Appendix C.

Longitudinal Decomposition. Each chunk of statistical trials is attached to the relevant thread. The flowchart of the shared memory model is presented in Figure 10.

Programming Model for Shared Memory Parallelization
The programming model for the shared memory longitudinal decomposition uses the programming language C, the GSL (GNU Scientific Library), and the OpenMP library. The model and comments are presented in Appendix D of this paper.

Transversal Decomposition.
One of the most comprehensive programming models is the model of shared memory pipelining. We use the transversal decomposition to construct this model. The model uses the OpenMP tasking technique and dynamic scheduling of tasks. The scheduler plays the central role in the model: it is responsible for creating new tasks and for finishing the program after all tasks have completed. Each task uses its own random generator, which makes it possible to avoid time-consuming critical sections. The flowchart is presented in Figure 11. The programming model for the shared memory transversal decomposition uses the C programming language, the GSL (GNU Scientific Library), and the OpenMP library. The model and comments are presented in Appendix E of this paper.

Programming Model of Hybrid Parallelization
The MPI transversal decomposition model can be transformed into a hybrid model by adding OpenMP threads along the Monte-Carlo trials axis. The flowchart of the hybrid model is presented in Figure 12. The programming model, with comments, is presented in Appendix F.

Theoretical and Programming Models:
The Basis of the Model-Centered Approach. The paper provides a number of programming models for the introduction to scientific and parallel computing: sequential, distributed memory, distributed memory pipelining, shared memory, shared memory pipelining, and hybrid models.
Implementation. The shared memory implementation is based on the OpenMP tools. The loop decomposition technique is applied along the statistical trials axis.

Figure 5: Three-dimensional model of the longitudinal decomposition.
Taking into account the above two cases, we finally obtain the proposition. Such a platform allows us to study different parallelization techniques and to implement shared memory, distributed memory, and hybrid solutions. The main goal of parallelization is to reduce the program execution time by using the multiprocessor cluster architecture. It also enables us to increase the number of Monte-Carlo trials during the statistical simulation of the queueing system and to achieve more accurate results in the experimental construction of the sojourn time distribution. To implement parallelization, we use OpenMP tools for the shared memory model, MPI tools for the distributed memory model, and the hybrid technique for the hybrid memory model.
Proof. It is true that if τ_j + T_{i−1,j} ≥ T_{i,j−1}, the waiting time of the j-th customer in the i-th phase is 0. In the case τ_j + T_{i−1,j} < T_{i,j−1}, the waiting time of the j-th customer in the i-th phase is w_{i,j} = T_{i,j−1} − T_{i−1,j} − τ_j, and T_{i,j} = T_{i−1,j} + w_{i,j} + S_j^(i).

Theoretical Background: Parallel Computing. In this research, we emphasize the multiple instruction, multiple data (MIMD) parallel architecture and presume a high-performance computing (HPC) cluster as the target platform for the calculations.
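The two cases of the proof collapse into the single max-form recurrence, and this equivalence is easy to confirm numerically. A small check (function and parameter names are ours; T_prev_phase stands for T_{i−1,j} and T_prev_cust for T_{i,j−1}):

```c
/* The proof's case analysis:
     if tau + T_prev_phase >= T_prev_cust : wait = 0
     else                                 : wait = T_prev_cust - T_prev_phase - tau
   and T = T_prev_phase + wait + S. */
double by_cases(double T_prev_phase, double T_prev_cust, double tau, double S)
{
    double wait = (tau + T_prev_phase >= T_prev_cust)
                      ? 0.0
                      : T_prev_cust - T_prev_phase - tau;
    return T_prev_phase + wait + S;
}

/* The closed form: T = max(T_prev_phase, T_prev_cust - tau) + S. */
double by_max(double T_prev_phase, double T_prev_cust, double tau, double S)
{
    double b = T_prev_cust - tau;
    return (T_prev_phase > b ? T_prev_phase : b) + S;
}
```

In the first case the second argument of the max is dominated, so both forms give T_prev_phase + S; in the second case both reduce to T_prev_cust − τ + S.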