High Performance Numerical Computing for High Energy Physics: A New Challenge for Big Data Science

Modern physics is based on both theoretical analysis and experimental validation. Complex scenarios such as subatomic dimensions, high energy, and low absolute temperature are frontiers for many theoretical models. Simulation with stable numerical methods is an excellent instrument for high-accuracy analysis, experimental validation, and visualization. High performance computing makes it possible to run simulations at large scale and in parallel, but the volume of data generated by these experiments creates a new challenge for Big Data Science. This paper presents existing computational methods for high energy physics (HEP), analyzed from two perspectives: numerical methods and high performance computing. The computational methods presented are Monte Carlo methods and simulations of HEP processes, Markovian Monte Carlo, unfolding methods in particle physics, kernel estimation in HEP, and Random Matrix Theory used in the analysis of particle spectra. All of these methods produce data-intensive applications, which introduce new challenges and requirements for ICT systems architecture, programming paradigms, and storage capabilities.


Introduction
High Energy Physics (HEP) experiments are probably the main consumers of High Performance Computing (HPC) in the area of e-Science, considering numerical methods in real experiments and assisted analysis using complex simulation. From the discovery of quarks in the last century to the Higgs boson in 2012 [1], all HEP experiments were modeled using numerical algorithms: numerical integration, interpolation, random number generation, eigenvalue computation, and so forth. Data collection from HEP experiments generates a huge volume of data, with high velocity, variety, and variability, and exceeds the common upper bounds used to classify data as Big Data. The numerical experiments using HPC for HEP represent a new challenge for Big Data Science.
Theoretical research in HEP is related to matter (fundamental particles and the Standard Model) and to basic knowledge about the formation of the Universe. Beyond this, practical research in HEP has led to the development of new analysis tools (synchrotron radiation, medical imaging or hybrid models [2], computational aspects of wavelets [3]), new processes (cancer therapy [4], food preservation, or nuclear waste treatment), or even the birth of a new industry (Internet) [5].
This paper analyzes two aspects: the computational methods used in HEP (Monte Carlo methods and simulations, Markovian Monte Carlo, unfolding methods in particle physics, kernel estimation, and Random Matrix Theory) and the challenges and requirements for ICT systems to deal with processing of Big Data generated by HEP experiments and simulations.
The motivation for using numerical methods in HEP simulations comes from special problems that can be formulated using integral or integro-differential equations (or systems of such equations), such as the quantum chromodynamics evolution of parton distributions inside a proton, which can be described by the Gribov-Lipatov-Altarelli-Parisi (GLAP) equations [6]; the estimation of the cross section for a typical HEP interaction (a numerical integration problem); and data representation using histograms (a numerical interpolation problem). Numerical methods used for solving differential equations or integrals are based on classical quadratures and Monte Carlo (MC) techniques. These allow generating events in terms of particle flavors and four-momenta, which is particularly useful for experimental applications. For example, MC techniques for solving the GLAP equations are based on simulated Markov chains (random walks), which have the advantage of filtering and smoothing the state vector for estimating parameters.
In practice, several MC event generators and simulation tools are used. For example, the HERWIG project (http://projects.hepforge.org/herwig/) considers an angular-ordered parton shower and cluster hadronization (the tool is implemented in Fortran); the PYTHIA project (http://www.thep.lu.se/~torbjorn/Pythia.html) is oriented toward a dipole-type parton shower and string hadronization (the tool is implemented in Fortran and C++); and SHERPA (http://projects.hepforge.org/sherpa/) considers a dipole-type parton shower and cluster hadronization (the tool is implemented in C++). An important tool for MC simulations is GATE (GEANT4 Application for Tomographic Emission), a generic simulation platform based on GEANT4. GATE provides new features for nuclear imaging applications and includes modules developed to meet specific requirements encountered in SPECT (Single Photon Emission Computed Tomography) and PET (Positron Emission Tomography).
The main contributions of this paper are as follows: (i) introduction and analysis of the most important modeling methods used in High Energy Physics; (ii) identification and description of the computational numerical methods for High Energy Physics; (iii) presentation of the main challenges for Big Data processing.
The paper is structured as follows. Section 2 introduces the computational methods used in HEP and describes the performance evaluation of parallel numerical algorithms. Section 3 discusses the new challenge for Big Data Science generated by HEP and HPC. Section 4 presents the conclusions and general open issues.

Computational Methods Used in High Energy Physics
Computational methods are used in HEP in parallel with physical experiments to generate particle interactions that are modeled using vectors of events. This section presents the general approach of event generation, simulation methods based on Monte Carlo algorithms, Markovian Monte Carlo chains, methods that describe unfolding processes in particle physics, Random Matrix Theory as support for particle spectrum analysis, and kernel estimation, which produces continuous estimates of the parent distribution from the empirical probability density function. The section ends with a performance analysis of parallel numerical algorithms used in HEP. Figure 2 presents the general approach of event generation, detection, and reconstruction. The physical model is used to create a simulation process that produces different types of events, clustered in vectors of events (e.g., the fourth type of events in LHC experiments).

General Approach of Event Generation
In parallel, the real experiments are performed. The detectors identify the most relevant events and, based on reconstruction techniques, a vector of events is created. The detectors can be real or simulated (software tools), and the reconstruction phase combines real events with events detected in simulation. At the end, the final result is compared with the simulation model (especially with the generated vectors of events). The model can be corrected for further experiments. The goal is to obtain high accuracy and precision of measured and processed data.
Software tools for event generation are based on random number generators. There are three types of random numbers: truly random numbers (from physical generators), pseudorandom numbers (from mathematical generators), and quasirandom numbers (special correlated sequences of numbers, used only for integration). For example, numerical integration using quasirandom numbers usually gives faster convergence than the standard integration methods based on quadratures. In event generation, pseudorandom numbers are used most often.
The most popular HEP applications use the Poisson distribution combined with a basic normal distribution. The Poisson distribution can be formulated as
$$P(k; \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots,$$
with mean $\lambda > 0$. The art of event generation is to use appropriate combinations of various random number generation methods in order to construct an efficient event generation algorithm that solves a given problem in HEP.
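As an illustration of drawing Poisson-distributed counts from a pseudorandom generator, the following minimal sketch uses Knuth's multiplication method on uniform random numbers. This is a swapped-in textbook technique, not the paper's Algorithm 1 (which is described as using normally distributed numbers); the function names, the seed, and the value $\lambda = 4$ are assumptions made only for this example.

```python
import math
import random

def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Uniform numbers are multiplied until the product drops to exp(-lam)
    or below; the number of extra factors needed is the Poisson variate.
    """
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def sample_mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

rng = random.Random(12345)          # fixed seed (assumption) for reproducibility
draws = [poisson_knuth(4.0, rng) for _ in range(20000)]
mean, var = sample_mean_and_variance(draws)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

For a Poisson distribution both the mean and the variance should be close to $\lambda$, which gives a quick sanity check on the generator.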

Monte Carlo Simulation and Markovian Monte Carlo Chains in HEP
In general, a Monte Carlo (MC) method is any simulation technique that uses random numbers to solve a well-defined problem, $P$. If $F$ is a solution of the problem $P$ (e.g., $F \in \mathbb{R}^n$ or $F$ has a Boolean value), we define $\hat{F}$, an estimation of $F$, as $\hat{F} = f(\{r_1, r_2, \ldots, r_n\}, \ldots)$, where $\{r_i\}_{1 \le i \le n}$ is a random variable that can take more than one value and whose value cannot be predicted in advance. If $\rho(r)$ is the probability density function, $\rho(r)\,dr = \Pr[r < r' < r + dr]$, the cumulative distribution function is
$$C(r) = \int_{-\infty}^{r} \rho(x)\,dx, \qquad \rho(r) = \frac{dC(r)}{dr}.$$
$C(r)$ is a monotonically nondecreasing function with all values in $[0, 1]$. The expectation value is
$$E(f) = \int f(r)\,dC(r) = \int f(r)\,\rho(r)\,dr,$$
and the variance is
$$V(f) = E\big[(f - E(f))^2\big] = E(f^2) - E^2(f).$$
In MC integration, we choose $n$ numbers $r_i$ randomly, with a probability density function that is uniform on a specific interval $(a, b)$, each $r_i$ being used to evaluate $f(r_i)$. For large $n$ (consistent estimator),
$$\frac{1}{n} \sum_{i=1}^{n} f(r_i) \longrightarrow E(f) = \frac{1}{b - a} \int_a^b f(r)\,dr.$$
The properties of an MC estimator are as follows: it is normally distributed (with Gaussian density); its standard deviation is $\sigma = \sqrt{V(f)/n}$; it is unbiased for all $n$ (the expectation value is the real value of the integral); it is consistent if $V(f) < \infty$ (the estimator converges to the true value of the integral for very large $n$); a sampling phase can be applied to compute the estimator if we do not know anything about the function $f$; and it is suitable only for integration. The sampling phase can be expressed, in a stratified way, as
$$\int_a^b f(r)\,dr = \int_a^{c} f(r)\,dr + \int_{c}^{b} f(r)\,dr, \qquad a < c < b.$$
MC estimations and MC event generators are necessary tools in most HEP experiments, being used at all steps: experiment preparation, simulation running, and data analysis.
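The MC integration estimator and its standard deviation $\sqrt{V(f)/n}$ can be sketched in a few lines. The integrand $f(r) = r^2$ on $(0, 1)$, the sample size, and the function name `mc_integrate` are assumptions chosen only because the exact integral, $1/3$, is known.

```python
import math
import random

def mc_integrate(f, a, b, n, rng):
    """Estimate the integral of f over (a, b) from n uniform samples.

    Returns (estimate, standard_error), where
    standard_error = (b - a) * sqrt(V(f) / n) uses the sample variance of f.
    """
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    width = b - a
    return width * mean, width * math.sqrt(variance / n)

rng = random.Random(2014)           # fixed seed (assumption) for reproducibility
estimate, error = mc_integrate(lambda r: r * r, 0.0, 1.0, 100000, rng)
print(f"integral of r^2 on (0, 1): {estimate:.5f} +/- {error:.5f}")
```

The reported error shrinks as $1/\sqrt{n}$, so reducing it by a factor of 10 requires roughly 100 times more samples.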
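An example of MC estimation is the Lorentz invariant phase space (LIPS), which describes the cross section for a typical HEP process with $n$ particles in the final state. Consider
$$\sigma \sim \int |M|^2\,dR_n,$$
where $M$ is the matrix element describing the interaction between particles and $dR_n$ is the element of LIPS. We have the following estimation:
$$R_n(P; p_1, \ldots, p_n) = \int \delta^{(4)}\Big(P - \sum_{k=1}^{n} p_k\Big) \prod_{k=1}^{n} \delta(p_k^2 - m_k^2)\,\Theta(p_k^0)\,d^4 p_k,$$
where $P$ is the total four-momentum of the $n$-particle system; $p_k$ and $m_k$ are the four-momenta and masses of the final state particles; $\delta^{(4)}(P - \sum_{k=1}^{n} p_k)$ expresses total energy-momentum conservation; and $\delta(p_k^2 - m_k^2)$ is the on-mass-shell condition for the final state system. Based on the integration formula
$$\int \delta(p_k^2 - m_k^2)\,\Theta(p_k^0)\,d^4 p_k = \int \frac{d^3 p_k}{2 p_k^0},$$
we obtain the iterative form for the cross section:
$$R_n(P; p_1, \ldots, p_n) = \int R_{n-1}(P - p_n; p_1, \ldots, p_{n-1})\,\frac{d^3 p_n}{2 p_n^0},$$
which can be numerically integrated by using the recurrence relation. As a result, we can construct a general MC algorithm for particle collision processes.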
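Example 1. Let us consider the interaction
$$e^{+}(p_1)\,e^{-}(p_2) \rightarrow \mu^{+}(q_1)\,\mu^{-}(q_2),$$
where the Higgs boson contribution is numerically negligible. Figure 3 describes this interaction ($\Phi$ is the azimuthal angle, $\theta$ the polar angle, and $p_1, p_2, q_1, q_2$ are the four-momenta of the particles). The cross section is
$$\frac{d\sigma}{d\Omega} = \frac{\alpha^2}{4s}\left[W_1(s)\,(1 + \cos^2\theta) + W_2(s)\,\cos\theta\right],$$
where $d\Omega = d\cos\theta\,d\Phi$, $\alpha = e^2/4\pi$ (fine structure constant), $s = (p_1^0 + p_2^0)^2$ is the center of mass energy squared, and $W_1(s)$ and $W_2(s)$ are constant functions. For pure processes we have $W_1(s) = 1$ and $W_2(s) = 0$, and the total cross section becomes
$$\sigma = \int_0^{2\pi} d\Phi \int_{-1}^{1} d\cos\theta\,\frac{d\sigma}{d\Omega} = \frac{4\pi\alpha^2}{3s}.$$
We introduce the following notation:
$$\rho(\cos\theta, \Phi) = \frac{d\sigma}{d\Omega},$$
and let us consider $\tilde{\rho}(\cos\theta, \Phi)$ an approximation of $\rho(\cos\theta, \Phi)$. Then $\tilde{\sigma} = \iint d\Phi\,d\cos\theta\,\tilde{\rho}$. Now, we can compute
$$\sigma = \iint d\Phi\,d\cos\theta\,\rho = \iint d\Phi\,d\cos\theta\,w\,\tilde{\rho} = \tilde{\sigma}\,\langle w \rangle_{\tilde{\rho}},$$
where $w(\cos\theta, \Phi) = \rho(\cos\theta, \Phi)/\tilde{\rho}(\cos\theta, \Phi)$ and $\langle w \rangle_{\tilde{\rho}}$ is the estimation of $w$ based on $\tilde{\rho}$. Here, the MC estimator is
$$\langle w \rangle_{MC} = \frac{1}{n} \sum_{i=1}^{n} w_i,$$
and the standard deviation is
$$s_{MC} = \sqrt{\frac{1}{n(n-1)} \sum_{i=1}^{n} \big(w_i - \langle w \rangle_{MC}\big)^2}.$$
The final numerical result based on the MC estimator is
$$\sigma_{MC} = \tilde{\sigma}\,\langle w \rangle_{MC} \pm \tilde{\sigma}\,s_{MC}.$$
As shown, the principle of a Monte Carlo estimator in physics is to simulate the cross section in interaction and radiation transport, knowing the probability distributions (or an approximation) governing each interaction of elementary particles.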
Based on this result, the Monte Carlo algorithm used to generate events is as follows. It takes as input $\tilde{\rho}(\cos\theta, \Phi)$ and in a main loop considers the following steps: (1) generate a $(\cos\theta, \Phi)$ pair from $\tilde{\rho}$; (2) compute the four-momenta $p_1, p_2, q_1, q_2$; (3) compute the weight $w = \rho/\tilde{\rho}$. For weighted events the loop can be stopped after a single pass; for unweighted events we stay in the loop until a candidate is accepted with probability proportional to $w$. As output, the algorithm returns the four-momenta and an array of weights for weighted events, and only the four-momenta for unweighted events (whose weight is 1). The main issue is how to initialize the input of the algorithm. Based on the $d\sigma/d\Omega$ formula (for $W_1(s) = 1$ and $W_2(s) = 0$), we can consider as input $\tilde{\rho}(\cos\theta, \Phi) = (\alpha^2/4s)(1 + \cos^2\theta)$. Then $\tilde{\sigma} = 4\pi\alpha^2/3s$.
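The event generation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the final-state particles are treated as massless in the center-of-mass frame, the differential cross section is taken as $(\alpha^2/4s)[W_1(1 + \cos^2\theta) + W_2\cos\theta]$ with constant $W_1, W_2$, natural units are used, and all function names, the seed, and the numerical inputs ($s = 100$, $W_1 = 1$, $W_2 = 0.5$) are hypothetical.

```python
import math
import random

ALPHA = 1.0 / 137.035999  # fine structure constant (approximate value)

def sample_direction(rng):
    """Draw (cos_theta, phi) from rho_tilde ~ 1 + cos^2(theta), flat in phi."""
    while True:
        cos_theta = rng.uniform(-1.0, 1.0)
        # Rejection against the constant envelope 2 >= 1 + cos^2(theta).
        if 2.0 * rng.random() <= 1.0 + cos_theta ** 2:
            return cos_theta, rng.uniform(0.0, 2.0 * math.pi)

def four_momenta(s, cos_theta, phi):
    """Massless four-momenta (E, px, py, pz) in the center-of-mass frame."""
    e = 0.5 * math.sqrt(s)
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    nx, ny, nz = sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta
    p1, p2 = (e, 0.0, 0.0, e), (e, 0.0, 0.0, -e)
    q1, q2 = (e, e * nx, e * ny, e * nz), (e, -e * nx, -e * ny, -e * nz)
    return p1, p2, q1, q2

def generate_weighted_events(s, w1, w2, n_events, rng):
    """Steps (1)-(3): direction from rho_tilde, four-momenta, weight rho/rho_tilde."""
    events = []
    for _ in range(n_events):
        c, phi = sample_direction(rng)
        weight = w1 + w2 * c / (1.0 + c * c)
        events.append((four_momenta(s, c, phi), weight))
    return events

def cross_section(s, events):
    """Return (sigma_MC, its error, <w>_MC) with sigma_tilde = 4 pi alpha^2 / (3 s)."""
    sigma_tilde = 4.0 * math.pi * ALPHA ** 2 / (3.0 * s)
    weights = [w for _, w in events]
    n = len(weights)
    mean_w = sum(weights) / n
    s_mc = math.sqrt(sum((w - mean_w) ** 2 for w in weights) / (n * (n - 1)))
    return sigma_tilde * mean_w, sigma_tilde * s_mc, mean_w

def unweight(events, w_max, rng):
    """Accept-reject step: keep each event with probability w / w_max."""
    return [momenta for momenta, w in events if rng.random() * w_max <= w]

rng = random.Random(7)              # fixed seed (assumption) for reproducibility
events = generate_weighted_events(s=100.0, w1=1.0, w2=0.5, n_events=20000, rng=rng)
sigma_mc, sigma_err, mean_w = cross_section(100.0, events)
unweighted = unweight(events, w_max=1.25, rng=rng)   # max of w is w1 + |w2| / 2
print(f"<w> = {mean_w:.4f}, sigma_MC = {sigma_mc:.3e} +/- {sigma_err:.1e}")
```

Because the $\cos\theta$ term is odd, it integrates to zero, so the mean weight should stay close to $W_1$; with $W_1 = 1$ and $W_2 = 0$ every weight equals 1 and the events are unweighted from the start.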
In HEP, theoretical predictions used for modeling particle collision processes (as shown in the presented example) should be provided in terms of Monte Carlo event generators, which directly simulate these processes and can provide unweighted (weight = 1) events. A good Monte Carlo algorithm should be used not only for numerical integration [7] (i.e., to provide weighted events) but also for efficient generation of unweighted events, which is a very important issue for HEP.

Markovian Monte Carlo Chains
A classical Monte Carlo method estimates a function $F$ with $\hat{F}$ by using a random variable. The main problem with this approach is that we cannot predict any value in advance for a random variable. In HEP simulation experiments the systems are described in states [8]. Let us consider a system with a finite set of possible states $S_1, S_2, \ldots$, and $X_t$ the state at the moment $t$. The conditional probability is defined as
$$\Pr\big(X_t = S_j \mid X_{t_1} = S_{i_1}, \ldots, X_{t_n} = S_{i_n}\big),$$
where the mappings $(t_1, i_1), \ldots, (t_n, i_n)$ can be interpreted as the description of the system evolution in time, specifying a state for each moment of time.
The system is a Markov chain if the distribution of $X_t$ depends only on the immediate predecessor $X_{t-1}$ and is independent of all previous states:
$$\Pr\big(X_t = S_j \mid X_{t-1} = S_{i_{t-1}}, \ldots, X_1 = S_{i_1}\big) = \Pr\big(X_t = S_j \mid X_{t-1} = S_{i_{t-1}}\big).$$
To generate the time steps $(t_1, t_2, \ldots, t_N)$ we use the probability of a single forward Markovian step given by $p(t \mid t_n)$, with the property
$$\int_{t_n}^{\infty} p(t \mid t_n)\,dt = 1.$$
The main result of Algorithm 3 is that the number of steps generated up to $t_{\max}$ follows a Poisson distribution. We can consider the 1-dimensional Monte Carlo Markovian algorithm as a method used to iteratively generate the system's states (codified as a Markov chain) in simulation experiments. According to the Ergodic Theorem for Markov chains, the chain defined has a unique stationary probability distribution [9,10].
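The Poisson behaviour of forward Markovian steps can be checked numerically with a small sketch. It assumes an exponential step density $p(t \mid t_n) = e^{-(t - t_n)}$ for $t > t_n$, which is an assumption made for illustration and is not stated in the text; under it, the number of steps falling below $t_{\max}$ should be Poisson distributed with mean $t_{\max} - t_0$. The function name and the parameter values are hypothetical.

```python
import random

def markov_time_steps(t0, t_max, rng):
    """Generate t_1 < t_2 < ... with p(t | t_n) = exp(-(t - t_n)) for t > t_n.

    The chain stops at the first step that reaches t_max; that step is discarded.
    """
    steps = []
    t = t0
    while True:
        t += rng.expovariate(1.0)   # single forward Markovian step
        if t >= t_max:
            return steps
        steps.append(t)

rng = random.Random(99)             # fixed seed (assumption) for reproducibility
counts = [len(markov_time_steps(0.0, 5.0, rng)) for _ in range(20000)]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean steps = {mean_count:.3f}, variance = {var_count:.3f}")
```

Mean and variance of the step count agreeing with each other (and with $t_{\max} - t_0 = 5$) is the signature of a Poisson distribution.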
Figures 4 and 5 present the running of Algorithm 3. For 1000 iterations, the results differ strongly with the value of the parameter used to generate the next step. In Figure 4, a parameter value of 1 gives a noise-like profile. For values of 10, 100, and 1000 the profile looks as if some of the information is filtered and lost. The best results are obtained for values of 0.01 and 0.1, and the generated values can be easily accepted for MC simulation in HEP experiments.

Advances in High Energy Physics
Algorithm 3 (surviving steps): (1) generate $t_1$ according to $p(t_1)$; discard all generated and computed data; the algorithm ends here.
Figure 5 shows the acceptance rate of the values generated for each parameter value used in the algorithm; the parameter values are the same as in Figure 4. The results in Figure 5 show that the acceptance rate decreases rapidly as the parameter increases. The conclusion is that the parameter must be kept small to obtain meaningful data. A correlation with the normal distribution is evident, showing that a small value for the mean square deviation provides useful results.
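The dependence of the acceptance rate on the step parameter can be reproduced qualitatively with a random-walk Metropolis sampler. This is a hedged sketch and not necessarily the paper's Algorithm 3: it assumes that the parameter is the standard deviation `sigma` of a Gaussian proposal and that the target is a standard normal density; both are assumptions, as are the function name and the seed.

```python
import math
import random

def metropolis_acceptance(sigma, n_steps, rng):
    """Random-walk Metropolis on a standard normal target.

    Proposals are x + N(0, sigma); returns the fraction of accepted proposals.
    """
    x = 0.0
    accepted = 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, sigma)
        # Log of the target density ratio pi(proposal) / pi(x).
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
    return accepted / n_steps

rng = random.Random(4)              # fixed seed (assumption) for reproducibility
rates = {sigma: metropolis_acceptance(sigma, 1000, rng)
         for sigma in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0)}
for sigma, rate in rates.items():
    print(f"sigma = {sigma:>7}: acceptance rate = {100.0 * rate:.1f}%")
```

Small proposal widths are almost always accepted, while very large ones are almost always rejected, which matches the qualitative trend described for Figure 5.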

Performance of Numerical Algorithms Used in MC Simulations
Numerical methods used to compute the MC estimator use numerical quadratures to approximate the value of the integral of a function $f$ on a specific domain by a linear combination of function values and weights $\{w_i\}_{1 \le i \le m}$ as follows:
$$\int_a^b f(r)\,dr \approx \sum_{i=1}^{m} w_i\,f(r_i).$$
We can consider a consistent MC estimator as a classical numerical quadrature with all $w_i = 1$. The efficiency of integration methods for 1 dimension and for $d$ dimensions is presented in Table 1. We can conclude that quadrature methods are difficult to apply in many dimensions for arbitrary integration domains (regions), and the integral is not easy to estimate. As a practical example, in a typical high-energy particle collision there can be many final-state particles (even hundreds). If we have $n$ final-state particles, we face a $d = 3n - 4$ dimensional phase space. As a numerical example, for $n = 4$ we have $d = 8$ dimensions, which is very difficult for classical numerical quadratures.
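The numerical example can be made concrete with a short worked calculation. The choice of $m = 10$ quadrature points per axis is an assumption used only for illustration; the MC error formula is the standard deviation of the MC estimator, $\sqrt{V(f)/N}$ for $N$ samples.

```latex
\begin{aligned}
d &= 3n - 4 = 3 \cdot 4 - 4 = 8 && (n = 4 \text{ final-state particles}),\\
N_{\text{grid}} &= m^{d} = 10^{8} && (m = 10 \text{ points per axis, assumed}),\\
\sigma_{\text{MC}} &= \sqrt{V(f)/N} && (\text{no explicit dependence on } d).
\end{aligned}
```

A product quadrature rule therefore needs $10^8$ function evaluations for even a coarse grid in 8 dimensions, while the MC error decreases as $1/\sqrt{N}$ regardless of the dimension.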

Unfolding Processes in Particle Physics and Kernel Estimation in HEP.
In particle physics analysis we have two types of distributions: the true distribution (considered in theoretical models) and the measured distribution (considered in experimental models, which are affected by the finite resolution and limited acceptance of existing detectors). A HEP interaction process starts with a known true distribution and generates a measured distribution, corresponding to an experiment of a well-confirmed theory. An inverse process starts with a measured distribution and tries to identify the true distribution. These unfolding processes are used to identify new theories based on experiments [11].

Unfolding Processes in Particle Physics
The theory of unfolding processes in particle physics is as follows [12]. For a physics variable $t$ we have a true distribution $f(t)$, mapped into an $n$-vector of unknowns $x$, and a measured distribution $g(s)$ (for a measured variable $s$), mapped into an $m$-vector of measured data $y$. A response matrix $A \in \mathbb{R}^{m \times n}$ encodes a kernel function $K(s, t)$ describing the physical measurement process [12][13][14][15]. The direct and inverse processes are described by the Fredholm integral equation [16] of the first kind, for a specific domain $\Omega$:
$$\int_{\Omega} K(s, t)\,f(t)\,dt = g(s).$$
In particle physics the kernel function $K(s, t)$ is usually known from a Monte Carlo sample obtained from simulation. A numerical solution is obtained using the following linear equation: $Ax = y$. Vectors $x$ and $y$ are assumed to be 1-dimensional in theory, but they can be multidimensional in practice (considering multiple independent linear equations). In practice, the statistical properties of the measurements are also well known, and often they follow Poisson statistics [17]. To solve the linear systems we have different numerical methods. The first method is based on the linear transformation $x = A^{\#} y$. If $m = n$ then $A^{\#} = A^{-1}$ and we can use direct Gaussian methods, iterative methods (Gauss-Seidel, Jacobi, or SOR), or orthogonal methods (based on the Householder transformation, Givens methods, or the Gram-Schmidt algorithm). If $m > n$ (the most frequent scenario) we construct the matrix $A^{\#} = (A^T A)^{-1} A^T$ (called the Moore-Penrose pseudoinverse). In these cases the orthogonal methods offer very good and stable numerical solutions.
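For the overdetermined case $m > n$, the orthogonal approach can be sketched with a modified Gram-Schmidt QR factorization, which gives the same solution as the pseudoinverse without forming $(A^T A)^{-1}$ explicitly. The $3 \times 2$ response matrix and the true vector below are made-up illustration values, the measured vector is noise-free, and the function name is hypothetical; real unfolding problems usually also need regularization, which this sketch omits.

```python
def qr_least_squares(a, y):
    """Solve min ||A x - y||_2 for an m x n matrix A (m >= n, full column rank).

    Uses modified Gram-Schmidt to factor A = Q R, then back-substitutes
    R x = Q^T y. Equivalent to x = (A^T A)^{-1} A^T y.
    """
    m, n = len(a), len(a[0])
    q = [[a[i][j] for i in range(m)] for j in range(n)]   # columns of A
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(v * v for v in q[j]) ** 0.5
        q[j] = [v / r[j][j] for v in q[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(u * v for u, v in zip(q[j], q[k]))
            q[k] = [v - r[j][k] * u for u, v in zip(q[j], q[k])]
    qty = [sum(u * v for u, v in zip(q[j], y)) for j in range(n)]
    x = [0.0] * n
    for j in reversed(range(n)):                          # back substitution
        x[j] = (qty[j] - sum(r[j][k] * x[k] for k in range(j + 1, n))) / r[j][j]
    return x

# Hypothetical response matrix: 2 true bins smeared into 3 measured bins.
response = [[0.8, 0.1],
            [0.2, 0.7],
            [0.0, 0.2]]
x_true = [100.0, 50.0]
y_measured = [sum(row[j] * x_true[j] for j in range(2)) for row in response]
x_hat = qr_least_squares(response, y_measured)
print("unfolded:", [round(v, 6) for v in x_hat])
```

With noise-free input the unfolded vector reproduces the true one; with Poisson fluctuations in `y_measured` the same call returns the least-squares estimate.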

Random Matrix Theory
The numerical methods used for eigenvalue computation are the QR method and power methods (direct and inverse). The QR method is a numerically stable algorithm and the power method is an iterative one. Random Matrix Theory (RMT) can be used for many-body systems, quantum chaos, disordered systems, quantum chromodynamics, and so forth.
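The direct power method mentioned above can be sketched as follows. The $2 \times 2$ matrix is an arbitrary illustration value with known eigenvalues 5 and 2, and the function name, start vector, and iteration count are assumptions; in an RMT study the input would be a large random matrix and a full spectrum solver such as QR would be used instead.

```python
def power_method(a, iterations=200):
    """Approximate the dominant eigenvalue and eigenvector of a square matrix.

    Repeatedly applies the matrix to a vector and normalizes; the eigenvalue
    is read off from the Rayleigh quotient v^T A v with ||v|| = 1.
    """
    n = len(a)
    v = [1.0] + [0.0] * (n - 1)     # start vector (assumption)
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(x * y for x, y in zip(v, av))
    return eigenvalue, v

matrix = [[4.0, 1.0],
          [2.0, 3.0]]               # eigenvalues 5 and 2 (trace 7, determinant 10)
dominant, vector = power_method(matrix)
print(f"dominant eigenvalue ~ {dominant:.6f}")
```

Convergence is geometric with ratio $|\lambda_2/\lambda_1|$, so the method slows down when the two largest eigenvalues are close in magnitude.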

Kernel Estimation in HEP.
Kernel estimation is a very powerful and relevant method for HEP when it is necessary to combine data from heterogeneous sources, such as MC datasets obtained by simulation and Standard Model expectations obtained from real experiments [19]. For a set of data $\{x_i\}_{1 \le i \le n}$ with a constant bandwidth $h$ (the difference between two consecutive data values), called the smoothing parameter, we have the estimation
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$
where $K$ is an estimator. For example, a Gauss estimator with mean $\mu$ and standard deviation $\sigma$ is
$$K(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
and has the following properties: it is positive definite and infinitely differentiable (due to the exp function), and it can be defined on an infinite support ($x \to \infty$). The kernel is a nonparametric method, which means that $h$ is independent of the dataset, and for a large amount of normally distributed data we can find a value for $h$ that minimizes the integrated squared error of $\hat{f}(x)$. This value for the bandwidth is computed as
$$h^{*} = \left(\frac{4}{3}\right)^{1/5} \sigma\,n^{-1/5}.$$
The main problem in kernel estimation is that the set of data $\{x_i\}_{1 \le i \le n}$ is not normally distributed, and in real experiments the optimal bandwidth is not known. An improvement of the presented method is the adaptive kernel estimation proposed by Abramson [20], where $h_i = h/\sqrt{f(x_i)}$ and $\sigma$ are considered global qualities of the dataset. The new form is
$$\hat{f}_1(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i} K\!\left(\frac{x - x_i}{h_i}\right)$$
and the local bandwidth value that minimizes the integrated squared error of $\hat{f}_1(x)$ is
$$h_i^{*} = \left(\frac{4}{3}\right)^{1/5} \sqrt{\frac{\sigma}{\hat{f}_0(x_i)}}\,n^{-1/5},$$
where $\hat{f}_0$ is the normal estimator. Kernel estimation is used from event selection to confidence level evaluation, for example, in Markovian Monte Carlo chains or in the selection of neural network output used in experiments for the reconstructed Higgs mass. In general, the main usage of kernel estimation in HEP is searching for new particles by finding relevant data in a large dataset.
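Fixed and adaptive kernel estimation can be sketched as below. The bandwidth uses the normal-reference rule $h = (4/3)^{1/5}\,\sigma\,n^{-1/5}$, a standard result assumed here to be the bandwidth formula the text refers to, and the adaptive variant uses local bandwidths $h_i = h/\sqrt{\hat{f}_0(x_i)}$ with the fixed-bandwidth estimate as pilot. The synthetic Gaussian dataset, the seed, and all function names are assumptions for illustration.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def optimal_bandwidth(data):
    """h = (4/3)^(1/5) * sigma * n^(-1/5), sigma being the sample standard deviation."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return (4.0 / 3.0) ** 0.2 * sigma * n ** -0.2

def kde_fixed(x, data, h):
    """Fixed-bandwidth estimate (1 / (n h)) * sum K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def local_bandwidths(data, h):
    """Local bandwidths h_i = h / sqrt(f0(x_i)), f0 being the fixed estimate."""
    return [h / math.sqrt(kde_fixed(xi, data, h)) for xi in data]

def kde_adaptive(x, data, local_h):
    """Adaptive estimate (1 / n) * sum (1 / h_i) K((x - x_i) / h_i)."""
    total = sum(gaussian_kernel((x - xi) / hi) / hi for xi, hi in zip(data, local_h))
    return total / len(data)

rng = random.Random(2020)           # fixed seed (assumption) for reproducibility
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
h = optimal_bandwidth(data)
local_h = local_bandwidths(data, h)
print(f"h = {h:.3f}, fixed f(0) = {kde_fixed(0.0, data, h):.3f}, "
      f"adaptive f(0) = {kde_adaptive(0.0, data, local_h):.3f}")
```

The local bandwidths are computed once per dataset because each one requires a full pass over the data; evaluating the adaptive estimate afterwards costs the same as the fixed one.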
A method based on kernel estimation is the graphical representation of datasets using the averaged shifted histogram (ASH) algorithm. This is a numerical interpolation for large datasets whose main aim is to create a set of $m$ histograms $H = \{H_k\}$ with the same bin width $h$. Algorithm 4 presents the steps of histogram generation, starting with a specific interval $[a, b]$, a number of points $n$ in this interval, a number of bins, and a number of values used for kernel estimation. Figure 6 shows the results of kernel estimation of the function $f(x) = e^{-(1/2)x^2}$ on $[0, 1]$ and the graphical representation with different numbers of bins. The values on the vertical axis are aggregated in step 17 of Algorithm 4 and increase with the number of bins.
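The idea of averaging several shifted histograms can be sketched with Scott's averaged shifted histogram formula. This is a textbook formulation and not a transcription of the paper's Algorithm 4; the function name, the uniform test data on $[0.3, 0.7)$, and the choices of 10 bins and 5 shifts are assumptions.

```python
import random

def averaged_shifted_histogram(data, a, b, n_bins, m):
    """Average m histograms of bin width h = (b - a) / n_bins, shifted by h / m.

    Returns density values on the fine grid of n_bins * m cells of width h / m,
    using triangular weights (1 - |i| / m) on the neighbouring fine-cell counts.
    """
    h = (b - a) / n_bins
    delta = h / m
    n_cells = n_bins * m
    counts = [0] * n_cells
    for x in data:
        if a <= x < b:
            counts[min(int((x - a) / delta), n_cells - 1)] += 1
    n = len(data)
    density = []
    for k in range(n_cells):
        total = 0.0
        for i in range(1 - m, m):
            if 0 <= k + i < n_cells:
                total += (1.0 - abs(i) / m) * counts[k + i]
        density.append(total / (n * h))
    return density

rng = random.Random(11)             # fixed seed (assumption) for reproducibility
sample = [0.3 + 0.4 * rng.random() for _ in range(1000)]
density = averaged_shifted_histogram(sample, 0.0, 1.0, n_bins=10, m=5)
cell_width = (1.0 - 0.0) / (10 * 5)
print(f"integral of the ASH estimate = {sum(density) * cell_width:.6f}")
```

When all data lie well inside $[a, b]$ the estimate integrates to 1; data close to the interval edges lose some weight because the shifted bins extend past the boundary.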
(i) The speed-up, $S(p)$, measures how much faster a parallel algorithm is than the corresponding sequential algorithm. The speed-up is defined as $S(p) = T_1(n)/T_p(n)$.
There are special bounds for the speed-up [23]: $S(p) \le p\,\bar{p}/(p + \bar{p} - 1)$, where $\bar{p} = T_1/T_\infty$ is the average parallelism (the average number of busy processors given an unbounded number of processors). Usually $S(p) \le p$, but under special circumstances the speed-up can be $S(p) > p$ [24]. Another upper bound is established by Amdahl's law: $S(p) = (f + (1 - f)/p)^{-1} \le 1/f$, where $f$ is the fraction of a program that is sequential. The upper bound is reached when the parallel fraction takes zero time.
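The two speed-up bounds can be evaluated directly. The following sketch only encodes the formulas quoted above; the function names and the sample values (a 10% sequential fraction, 8 processors) are assumptions for illustration.

```python
def speedup(t_sequential, t_parallel):
    """S(p) = T_1(n) / T_p(n)."""
    return t_sequential / t_parallel

def amdahl_speedup(f, p):
    """Amdahl's law: S(p) = 1 / (f + (1 - f) / p), f being the sequential fraction."""
    return 1.0 / (f + (1.0 - f) / p)

def average_parallelism_bound(p, p_bar):
    """Upper bound S(p) <= p * p_bar / (p + p_bar - 1), p_bar = T_1 / T_infinity."""
    return p * p_bar / (p + p_bar - 1.0)

for p in (1, 10, 100, 1000):
    print(f"p = {p:>4}: Amdahl speed-up with f = 0.1 is {amdahl_speedup(0.1, p):.3f}")
print(f"bound for p = 8, average parallelism 8: {average_parallelism_bound(8, 8.0):.3f}")
```

With a 10% sequential fraction the speed-up saturates below $1/f = 10$, however many processors are added.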
(iii) The isoefficiency is the growth rate of the workload $W_p(n) = p\,T_p(n)$ in terms of the number of processors needed to keep the efficiency fixed. If we consider $W_1(n) - E\,W_p(n) = 0$ for any fixed efficiency $E$ we obtain $p = p(n)$. This means that we can establish a relation between the needed number of processors and the problem size. For example, for the parallel sum of $n$ numbers using $p$ processors we have $n \approx E(n + p \log p)$, so $n = \Theta(p \log p)$.

Table 2: Isoefficiency for a hypercube architecture: $n = \Theta(p \log p)$ and $n = \Theta(p (\log p)^2)$. We marked with (*) the limitations imposed by Formula (33). Columns: Scenario; Architecture size ($p$); $n = \Theta(p \log p)$; $n = \Theta(p (\log p)^2)$.

Numerical algorithms are implemented on a hypercube architecture. We analyze the performance of different numerical operations using the isoefficiency metric. For the hypercube architecture a simple model for intertask communication considers $t_{com} = t_s + L\,t_w$, where $t_s$ is the latency (the time needed by a message to cross the network), $t_w$ is the time needed to send a word ($1/t_w$ is called the bandwidth), and $L$ is the message length (expressed in number of words). The word size depends on the processing architecture (usually it is two bytes). We define $t_c$ as the processing time per word for a processor. We have the following results.
For each operation, the optimality is computed from the isoefficiency relation. (ii) Scalar product (inner product): $x^T y = \sum_{i=1}^{n} x_i y_i$. (iii) Matrix-vector product: $y = Ax$, $y_i = \sum_{j=1}^{n} a_{ij} x_j$. Table 2 presents the amount of data that can be processed for a specific architecture size. The cases that meet the bound $n \ge 1.05 \times 10^6$ are marked with (*). To keep the efficiency high for a specific parallel architecture, HPC algorithms for particle physics introduce upper limits for the amount of data, which means that we also have an upper bound for the Big Data volume in this case.
The factors that determine the efficiency of parallel algorithms are task balancing (workload distribution between all the processors used in a system, to be maximized); concurrency (the number or percentage of processors working simultaneously, to be maximized); and overhead (extra work introduced by parallel processing that does not appear in serial processing, to be minimized).
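The isoefficiency relation for the parallel sum discussed in this section can be turned into a small calculator. It assumes the workload model $W_1 = n$ and $W_p = n + p \log_2 p$ (the base-2 logarithm is an assumption suited to a hypercube) and the communication model $t_s + L\,t_w$; the function names and sample values are hypothetical.

```python
import math

def parallel_sum_efficiency(n, p):
    """E = W_1 / W_p with W_1 = n and W_p = n + p * log2(p)."""
    return n / (n + p * math.log2(p))

def isoefficiency_problem_size(efficiency, p):
    """Problem size n that keeps the given efficiency on p processors.

    Solving n = E * (n + p log2 p) gives n = E / (1 - E) * p * log2(p).
    """
    return efficiency / (1.0 - efficiency) * p * math.log2(p)

def communication_time(t_s, t_w, length):
    """Simple hypercube message model: latency plus per-word transfer time."""
    return t_s + length * t_w

for p in (4, 64, 1024):
    n = isoefficiency_problem_size(0.8, p)
    print(f"p = {p:>4}: n = {n:>8.0f} keeps E = {parallel_sum_efficiency(n, p):.2f}")
```

The required problem size grows as $p \log p$, faster than the processor count itself, which is why a fixed dataset eventually stops benefiting from additional processors.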

New Challenges for Big Data Science
Many applications generate Big Data, such as social networking profiles, social influence, SaaS and Cloud apps, public web information, MapReduce scientific experiments and simulations (especially HEP simulations), data warehouses, monitoring technologies, and e-government services. Data grow rapidly, since applications continuously produce increasing volumes of both unstructured and structured data. The impact on data processing, transfer, and storage is the need to reevaluate existing approaches and solutions to better answer the users' needs [25]. In this context, scheduling models and algorithms for data processing have an important role, becoming a new challenge for Big Data Science. HEP applications consider both experimental data (applications with terabytes of valuable data) and simulation data (generated using MC based on theoretical models). The processing phase is represented by modeling and reconstruction in order to find the properties of the observed particles (see Figure 8). Then, the data are analyzed and reduced to a simple statistical distribution. The comparison of the results obtained validates how realistic a simulation experiment is and qualifies it for use in other new models.
Since we face a large variety of solutions for specific applications and platforms, a thorough and systematic analysis of existing solutions for scheduling models, methods, and algorithms used in Big Data processing and storage environments is needed. The challenges for scheduling impose specific requirements in distributed systems: the claims of the resource consumers, the restrictions imposed by resource owners, the need to continuously adapt to changes in resource availability, and so forth. We will pay special attention to Cloud Systems and HPC clusters (datacenters) as reliable solutions for Big Data [26]. Based on these requirements, a number of challenging issues arise: maximization of system throughput, sites' autonomy, scalability, fault tolerance, and quality of service.
When discussing Big Data we have in mind the 5 Vs: Volume, Velocity, Variety, Variability, and Value. Many organizations, companies, and researchers clearly need to deal with Big Data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. For these examples, a popular data processing engine for Big Data is Hadoop MapReduce [27]. The main problem is that data arrives too fast for optimal storage and indexing [28]. There are several other processing platforms for Big Data: Mesos [29], YARN (Hortonworks, Hadoop YARN: A next-generation framework for Hadoop data processing, 2013 (http://hortonworks.com/hadoop/yarn/)), Corona (Corona, Under the Hood: Scheduling MapReduce jobs more efficiently with Corona, 2012 (Facebook)), and so forth. A review of various parallel and distributed programming paradigms, analyzing how they fit into the Big Data era, is presented in [30]. The challenges for Big Data Science on the modern and future Scientific Data Infrastructure (SDI) are presented in [31]. That paper introduces the Scientific Data Lifecycle Management (SDLM) model, which includes all the major stages and reflects specifics of data management in modern e-Science, and proposes the SDI generic architecture model, which provides a basis for building interoperable data-centric or project-centric SDI using modern technologies and best practices. This analysis highlights at the same time the performance and limitations of existing solutions in the context of Big Data. Hadoop can handle many types of data from disparate systems: structured, unstructured, logs, pictures, audio files, communications records, emails, and so forth. Hadoop relies on an internal redundant data structure with cost advantages and is deployed on industry standard servers rather than on expensive specialized data storage systems [32]. The main challenges for scheduling in Hadoop are to improve existing algorithms for Big Data processing: capacity scheduling, fair scheduling, delay scheduling, longest approximate time to end (LATE) speculative execution, deadline constraint scheduler, and resource aware scheduling.
Data transfer scheduling in Grids, Clouds, P2P systems, and so forth represents a new challenge in the context of Big Data. In many cases, depending on the application architecture, data must be transported to the place where tasks will be executed [33]. Consequently, scheduling schemes should consider not only the task execution time but also the data transfer time to find a more convenient mapping of tasks [34]. Only a handful of current research efforts consider the simultaneous optimization of computation and data transfer scheduling. The big-data I/O scheduler [35] offers a solution for applications that compete for I/O resources in a shared MapReduce-type Big Data system [36]. The paper [37] reviews Big Data challenges from a data management perspective and addresses Big Data diversity, Big Data reduction, Big Data integration and cleaning, Big Data indexing and query, and finally Big Data analysis and mining. On the other side, business analytics, occupying the intersection of the worlds of management science, computer science, and statistical science, is a potent force for innovation in both the private and public sectors. The conclusion is that the data is too heterogeneous to fit into a rigid schema [38].
Another challenge is the scheduling policies used to determine the relative ordering of requests. Large distributed systems with different administrative domains will most likely have different resource utilization policies. For example, a policy can take into consideration deadlines and budgets, as well as dynamic behavior [39]. HEP experiments are usually performed in private Clouds, considering dynamic scheduling with soft deadlines, which is an open issue.
The optimization techniques for the scheduling process represent an important aspect because scheduling is a main building block for making datacenters more available to user communities, being energy-aware [40] and supporting multicriteria optimization [41]. An example of optimization is multiobjective and multiconstrained scheduling of many tasks in Hadoop [42] or optimizing short jobs [43]. The cost effectiveness, scalability, and streamlined architectures of Hadoop represent solutions for Big Data processing. Considering the use of Hadoop in public/private Clouds, a challenge is to answer the following questions: what type of data/tasks should move to the public Cloud in order to achieve a cost-aware Cloud scheduler? And is a public Cloud a solution for HEP simulation experiments?
The activities for Big Data processing vary widely in a number of issues, for example, support for heterogeneous resources, objective function(s), scalability, coscheduling, and assumptions about system characteristics. The current research directions are focused on accelerating data processing, especially for Big Data analytics (frequently used in HEP experiments), complex task dependencies for data workflows, and new scheduling algorithms for real-time scenarios.

Conclusions
This paper presented general aspects of the methods used in HEP: Monte Carlo methods and simulations of HEP processes, Markovian Monte Carlo, unfolding methods in particle physics, kernel estimation in HEP, and Random Matrix Theory used in the analysis of particle spectra. For each method the proper numerical method was identified and analyzed. All of the identified methods produce data-intensive applications, which introduce new challenges and requirements for Big Data systems architecture, especially for processing paradigms and storage capabilities. This paper puts together several concepts: HEP, HPC, numerical methods, and simulations. HEP experiments are modeled using numerical methods and simulations: numerical integration, eigenvalue computation, solving systems of linear equations, multiplying vectors and matrices, and interpolation. HPC environments offer powerful tools for data processing and analysis. Big Data was introduced as a concept for a real problem: we live in a data-intensive world, we produce huge amounts of information, and we face upper bounds introduced by theoretical models.

Figure 2: General approach of event generation, detection, and reconstruction.

Figure 5: Analysis of acceptance rate for the 1-dimensional Monte Carlo Markovian algorithm for different parameter values.

Figure 7: Residual check analysis for solving the $Ax = b$ system in HPL 2.0 using single and double precision representation.
Algorithm 1: Random number generation for Poisson distribution using many random generated numbers with normal distribution (RND).
Algorithm 3 (surviving steps): generate $t_i$ according to $p(t_i \mid t_{i-1})$; if $t_i < t_{\max}$, generate a new state and a new probability.

Table 1: Efficiency of integration methods for 1 dimension and for $d$ dimensions.