A Parallel Encryption Algorithm Based on Piecewise Linear Chaotic Map

Xizhong Wang and

We introduce a parallel chaos-based encryption algorithm that takes advantage of multicore processors. The chaotic cryptosystem is generated by the piecewise linear chaotic map (PWLCM). The parallel algorithm is designed with a master/slave communication model using the Message Passing Interface (MPI). The algorithm is suitable not only for multicore processors but also for single-processor architectures. The experimental results show that the chaos-based cryptosystem possesses good statistical properties. The parallel algorithm performs much better than the serial one and would be useful for encrypting and decrypting large files or multimedia data.


Introduction
There is no question that multicore processors have become mainstream. The principal microprocessor manufacturers, Advanced Micro Devices (AMD) and Intel, have released multicore chips for PCs, laptops, and servers. The main reason is the need to reduce heat and improve energy efficiency. Earlier, the fastest chips were heating up faster than an average fan could cool them down. Single-core chips rely on tightly packed, power-hungry transistors to get the job done, whereas a multicore chip is easier to cool because its cores are simpler and use fewer transistors. Thus, multicore chips use less power and dissipate less heat overall. Moreover, each core can work on a different task at the same time. However, most multicore systems require new tools, new algorithms, and a new way of thinking about programming.
With the rapid development of the Internet, a wide variety of digital documents, such as text, images, video, and audio, travel from one destination to another over the network. Some of these documents are sensitive or confidential and therefore need to be protected. The most effective method is to encrypt the information so that only authorized users holding the key can decrypt it.
In recent years, a large number of chaos-based cryptographic schemes have been proposed [1][2][3][4][5]. There is an interesting relationship between chaos and cryptography: properties of chaotic systems can be linked to those discussed in Shannon's classic paper on cryptography [6], for example, ergodicity, sensitivity to initial conditions or control parameters, deterministic dynamics, and structural complexity [7][8][9][10]. Unfortunately, widely used traditional cryptographic algorithms, such as the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES), as well as some chaotic cryptosystems, are designed for single-core processors. Such sequential algorithms run their tasks serially on one processor. How to take advantage of multicore processors therefore remains a challenge.
The main goal of this paper is to introduce a parallel chaotic cryptographic scheme using the piecewise linear chaotic map and the Fibonacci sequence. From the viewpoint of implementation, the PWLCM is the simplest kind of chaotic map, and it has many desirable dynamical properties, such as a uniform invariant density and good correlation functions [11]. In fact, the PWLCM has been widely used in chaotic cryptosystems [12][13][14][15][16][17][18][19][20][21]. However, dynamical degradation destroys the uniform distribution of the key stream generated from the chaotic iterations of the PWLCM. When chaotic systems are realized on digital computers with finite precision, many of their dynamical properties differ from those of the continuous system. Quantization errors are introduced into the evolution of a digital chaotic system at every discrete step, which makes the computed pseudo-orbit depart from the real orbit of the continuous system. Because chaotic systems are sensitive to initial conditions and control parameters, the finite-precision pseudo-orbit becomes distinguishable from the theoretical orbit after only a few iterations. For this reason, Li et al. pointed out that Zhou's chaotic cryptosystem was not secure enough from a strict cryptographic viewpoint [22]; the cause lies in the dynamical degradation of the computerized PWLCM. The dynamical degradation of digital chaotic systems thus reduces the security of the cryptosystems built on them. To overcome this problem, random perturbation-based approaches have been used in some applications [23,24]. In this paper, we use the Fibonacci sequence to convert values of the piecewise linear chaotic map into secret keys.
A plaintext, viewed as a sequence, is divided into blocks or fragments, which are encrypted or decrypted on multicore processors. In the parallel system, the master process assigns tasks to slave processes, and the slave processes encrypt or decrypt the blocks. The master and slave processes exchange data using the Message Passing Interface (MPI).
As mentioned above, traditional encryption algorithms have difficulty using the computing resources of multicore processors to handle large files and multimedia. Moreover, a new algorithm should be suitable not only for traditional single processors but also for multicore processors, because some users may encrypt information on a single processor while the information is decrypted on a multicore processor. Based on this observation, we propose a parallel chaotic cryptosystem. Our major contributions are highlighted below.
(1) The algorithm can fully exploit the computing resources of multicore processors, because it is designed with master/slave communication using MPI.
(2) MPI can be implemented on parallel computers with both distributed memory and shared memory. Therefore, the algorithm is general-purpose with respect to the parallel platform.
(3) To the best of our knowledge, this is the first report of a chaotic cryptosystem that combines the piecewise linear chaotic map with the Message Passing Interface in a parallel model.
The rest of this paper is organized as follows. Section 2 presents the chaotic cryptographic scheme. Section 3 describes the overall framework of the proposed parallel encryption algorithm. Section 4 evaluates the performance of the parallel algorithm. The last section concludes the paper and gives some remarks.

Chaotic Cryptosystem
Chaos is the complex outward behavior produced by the intrinsic randomness of a nonlinear deterministic system; its motion is pseudorandom, although it looks like a random process. In particular, many pseudorandom number generators are based on chaotic maps, which are used to produce pseudorandom sequences.
A piecewise linear map is a map composed of multiple linear segments, with a finite number of breakpoints allowed. A typical example of a piecewise linear map is the skew tent map:
$$x_{n+1} = F(x_n) = \begin{cases} \dfrac{x_n}{p}, & 0 \le x_n \le p, \\[4pt] \dfrac{1 - x_n}{1 - p}, & p < x_n \le 1, \end{cases}$$
where $p \in (0, 1)$ is the control parameter. For any control parameter $p \in (0, 1)$, this piecewise linear map has a positive Lyapunov exponent and thus is always chaotic [7,25].
Figure 1 shows the skew tent map with its control parameter $p$. As long as the control parameter does not destroy the onto property of each linear segment (each segment maps its subinterval onto the whole interval $[0, 1]$), the resulting chaotic map is suitable for use in chaotic cryptosystems.
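A minimal C sketch of the skew tent map follows, assuming the two-segment form given above with the breakpoint x = p assigned to the first segment. The function names skew_tent and skew_tent_iterate are hypothetical, chosen for illustration only.

```c
#include <assert.h>

/* One iteration of the skew tent map with control parameter p in (0, 1).
 * Points in [0, p] are stretched onto [0, 1] by x / p; points in (p, 1]
 * are folded back onto [0, 1) by (1 - x) / (1 - p). */
double skew_tent(double x, double p)
{
    assert(p > 0.0 && p < 1.0);   /* the map is chaotic for any p in (0, 1) */
    assert(x >= 0.0 && x <= 1.0); /* the map acts on the unit interval */
    return (x <= p) ? x / p : (1.0 - x) / (1.0 - p);
}

/* Iterate the map n times starting from x0 and return x_n. */
double skew_tent_iterate(double x0, double p, unsigned n)
{
    double x = x0;
    for (unsigned i = 0; i < n; ++i)
        x = skew_tent(x, p);
    return x;
}
```

For example, skew_tent_iterate(x0, 0.527, 50) discards the first 50 iterates, in line with the requirement in Section 4.1 that the initial iteration number exceed 40.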
The Fibonacci sequence is employed to convert values of the skew tent map into integers. The Fibonacci sequence is defined by
$$F_n = F_{n-1} + F_{n-2}, \quad n \ge 3,$$
where $F_1$ and $F_2$ are positive integers.
Cryptosystems are typically divided into two generic types: symmetric-key and asymmetric-key. A symmetric-key cryptosystem uses the same secret key for both encryption and decryption; it is very fast and appropriate for handling large amounts of data. The proposed cryptosystem is a symmetric-key cryptosystem composed of two parts: key generation and the encryption/decryption process. A simplified diagram of the cryptosystem is shown in Figure 2.
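The Fibonacci recurrence can be computed iteratively, as in the sketch below. It uses uint64_t rather than the long integer type mentioned in Section 4.1; this is our assumption, made because long is only 32 bits on Windows and, with F_1 = F_2 = 1, the term F_47 already exceeds 2^31 - 1. Unsigned arithmetic wraps modulo 2^64 instead of overflowing. The function name fibonacci is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return F_n of the sequence F_n = F_{n-1} + F_{n-2} (n >= 3) with
 * positive seeds F_1 = f1 and F_2 = f2. Unsigned 64-bit arithmetic wraps
 * modulo 2^64 instead of overflowing, so F_n stays defined for large n. */
uint64_t fibonacci(uint64_t f1, uint64_t f2, unsigned n)
{
    assert(n >= 1 && f1 > 0 && f2 > 0);
    if (n == 1)
        return f1;
    uint64_t prev = f1, curr = f2;   /* F_{k-1} and F_k, starting at k = 2 */
    for (unsigned k = 3; k <= n; ++k) {
        uint64_t next = prev + curr; /* F_k = F_{k-1} + F_{k-2} */
        prev = curr;
        curr = next;
    }
    return curr;
}
```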
The secret key is described as follows:

Parallel Algorithm
The parallel algorithm is implemented in a client-server paradigm. The tasks are allocated to a group of slave processes by a master process, which may also perform some of the tasks. Figure 3 shows how the client-server paradigm can be implemented on multicore processors. A plaintext is divided into blocks or fragments, which are then encrypted or decrypted. A parallel client-server implementation can simply run multiple copies of the code in the slave processes after the master process assigns different blocks to each slave process. The assignment is dynamic: the master process assigns a new block to a slave process as soon as that slave process finishes its current task.
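The block partitioning and the dynamic assignment described above can be simulated in plain C, without MPI, as a minimal sketch. The per-block costs, the function names, the limit of 64 slaves, and the tie-breaking rule (lowest slave index first) are illustrative assumptions, and communication time is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocks of blk bytes needed to cover a file of len bytes. */
size_t num_blocks(size_t len, size_t blk)
{
    assert(blk > 0);
    return (len + blk - 1) / blk;
}

/* Length of block i (0-based); only the last block may be shorter. */
size_t block_len(size_t len, size_t blk, size_t i)
{
    size_t start = i * blk;
    assert(start < len);
    return (len - start < blk) ? len - start : blk;
}

/* Simulate dynamic assignment: each block goes to the slave that becomes
 * free first (lowest index on ties). cost[i] is a hypothetical processing
 * time for block i; owner[i] receives the 0-based slave index. Returns the
 * time at which the last block finishes (communication is ignored). */
double simulate_self_scheduling(const double *cost, size_t nblocks,
                                int nslaves, int *owner)
{
    assert(nslaves >= 1 && nslaves <= 64);
    double free_at[64] = {0.0}; /* time at which each slave is next idle */
    double makespan = 0.0;
    for (size_t i = 0; i < nblocks; ++i) {
        int s = 0;
        for (int k = 1; k < nslaves; ++k)
            if (free_at[k] < free_at[s])
                s = k;
        owner[i] = s;
        free_at[s] += cost[i];
        if (free_at[s] > makespan)
            makespan = free_at[s];
    }
    return makespan;
}
```

With costs {3, 1, 1, 1} and two slaves, slave 0 is busy with the first block while slave 1 processes the remaining three, so all blocks finish at time 3.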
The parallel algorithm is designed with a master/slave communication model in which a designated master process controls, partitions, and distributes data to the slave processes. We assume that the parallel system contains $N_{\mathrm{bce}}$ base core equivalents (BCEs), where a single BCE represents one real CPU core of the multicore processor. Let $N_{\mathrm{pvp}}$ be the number of parallel virtual processes (PVPs), which include one master process and some slave processes. The master process must initialize the parameters of the parallel system, for example, the control parameter of the chaotic map, the initial values of the Fibonacci sequence and the chaotic map, the block length, and the number of PVPs, before assigning workload to the slave processes. Furthermore, the master process is responsible for file input and output: it reads the original file from disk and writes the encrypted/decrypted file to disk.
There are three cases: $N_{\mathrm{bce}} > N_{\mathrm{pvp}}$, $N_{\mathrm{bce}} = N_{\mathrm{pvp}}$, and $N_{\mathrm{bce}} < N_{\mathrm{pvp}}$. First, when $N_{\mathrm{bce}} = N_{\mathrm{pvp}}$, one of the BCEs runs the master process and acts as the master processor, while the other BCEs run slave processes and act as slave processors. In Figure 4, the parallel system has four parallel virtual processes: the master processor and three slave processors. That is to say, each BCE acts as an individual processor. The master and slave processors communicate through the two MPI functions MPI_Send and MPI_Recv, which are the basic point-to-point communication routines in MPI. For communication to occur, the sending processor must call MPI_Send and the receiving processor must call MPI_Recv. Note that when $N_{\mathrm{bce}} > N_{\mathrm{pvp}}$, some BCEs in the parallel system are idle. When $N_{\mathrm{bce}} < N_{\mathrm{pvp}}$, some BCEs run several slave processes; in other words, one or more slave processors are located on the same BCE, so their tasks run one after another.
In Figure 5, the single-core processor hosts three PVPs: a master processor and two slave processors. The tasks of the slave processors run serially after the master processor has assigned them. Hence, the proposed algorithm can run not only on traditional single-core machines but also on multicore processors.
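The three cases can be made concrete with a small helper that counts how many PVPs share each BCE. The round-robin placement assumed here is for illustration only; in practice, the MPI launcher and the operating system decide where processes run. The function names are hypothetical.

```c
#include <assert.h>

/* Number of PVPs that land on core `core` when n_pvp processes are spread
 * round-robin over n_bce cores (ASSUMED placement, for illustration). */
int pvps_on_core(int n_pvp, int n_bce, int core)
{
    assert(n_pvp >= 1 && n_bce >= 1 && core >= 0 && core < n_bce);
    return n_pvp / n_bce + (core < n_pvp % n_bce ? 1 : 0);
}

/* Number of idle cores, which is nonzero only when N_bce > N_pvp. */
int idle_cores(int n_pvp, int n_bce)
{
    assert(n_pvp >= 1 && n_bce >= 1);
    return n_bce > n_pvp ? n_bce - n_pvp : 0;
}
```

For the single-core configuration of Figure 5, pvps_on_core(3, 1, 0) returns 3, so all three PVPs share one core; with four PVPs on four BCEs, every core hosts exactly one process.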
The parallel algorithm is implemented in the C programming language with the MPI library.
Algorithm 1. The encryption process in the slave process consists of the following steps.
Step 1. Receive the plaintext block from the master process (MPI_Recv).
Step 3. Encrypt the plaintext block.
Step 4. Copy the encrypted block to the data buffer.
Step 5. Send the encrypted block to the master process (MPI_Send).
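The algorithm above leaves the cipher operation of Step 3 abstract, so the following is only a hypothetical sketch of a stream-cipher-style block encryption. The key-byte rule k_n = (floor(x_n * 2^32) + F_n) mod 256 and the XOR combination are our assumptions, not the scheme's actual secret-key definition, and the function name xor_block is hypothetical. Because XOR is its own inverse, calling the same function again with the same parameters decrypts the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL block encryption for a slave process. Key byte rule
 * (an assumption, not the paper's key equation):
 *   k_n = (floor(x_n * 2^32) + F_n) mod 256,
 * where x_n is the skew tent orbit and F_n the Fibonacci sequence.
 * The key byte is XORed into the data, so the same call also decrypts. */
void xor_block(uint8_t *buf, size_t len, double x0, double p,
               uint64_t f1, uint64_t f2, unsigned warmup)
{
    assert(p > 0.0 && p < 1.0 && x0 >= 0.0 && x0 <= 1.0);
    double x = x0;
    uint64_t fa = f1, fb = f2; /* two consecutive Fibonacci terms */
    for (size_t i = 0; i < warmup + len; ++i) {
        x = (x <= p) ? x / p : (1.0 - x) / (1.0 - p); /* skew tent step */
        uint64_t fn = fa + fb;                        /* Fibonacci step */
        fa = fb;
        fb = fn;
        if (i < warmup)
            continue;                                 /* discard warm-up iterates */
        uint8_t k = (uint8_t)(((uint64_t)(x * 4294967296.0) + fb) & 0xFFu);
        buf[i - warmup] ^= k;
    }
}
```

If every slave started from the same parameters, all blocks would be encrypted with the same keystream; a real implementation has to give each block a distinct keystream position, for example through a per-block warm-up offset (again an assumption).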

Mathematical Problems in Engineering
Algorithm 2. The master process consists of the following steps.
Step 1. Initialize MPI and the parameters, which include the initial value $x_0$ and the control parameter $p$ of the skew tent map, the Fibonacci sequence, the block length, and so on.
Step 2.1. If the processor is the master, send the parameters $x_0$, $p$, $F_1$, $F_2$, the block length, the initial iteration numbers of the chaotic map and the Fibonacci sequence, and so on to the slaves (MPI_Send).
Step 2.2. If the processor is a slave, receive the parameters from the master (MPI_Recv).
Step 3. Initialize the chaotic map and the Fibonacci sequence, and create the data buffer.
Step 4.1. If the processor is the master, read a block of the original file from disk.
Step 4.2. Send the block to the slave processors (MPI_Send).
Step 4.3. Create a data buffer and receive the encrypted block from the slave processors (MPI_Recv).
Step 4.4. Write the encrypted block to disk.
Step 4.5. If the file pointer has reached the end of the file, end the encryption process; otherwise, go to Step 4.1.
Step 5. If the processor is a slave, execute the encryption process (Algorithm 1).
The master process can determine the number of parallel virtual processes with the MPI function MPI_Comm_size, and each processor can determine its rank with the MPI function MPI_Comm_rank.

Experimental Analysis
4.1. Chaotic Map
Digital chaotic systems generally suffer from dynamical degradation, which lowers the security of the resulting chaotic ciphers. In the C programming language, the two commonly used floating-point types are float (single precision) and double (double precision).
On typical platforms, a float occupies 4 bytes and a double occupies 8 bytes. To reduce the degradation caused by finite computing precision, the double type is used for computing values of the skew tent map, and the long integer type is used for calculating the Fibonacci sequence. To determine the initial iteration number of the skew tent map, we simulate chaotic sequences with slightly different initial values. Let $n$ be the iteration number.
Figure 6(a) shows the difference between the sequences generated from the initial values $x_0$ and $x_0 + 10^{-12}$ for $p = 0.527$. Figure 6(b) shows the difference caused by a perturbation of $10^{-12}$ for the initial value $x_0$. The horizontal axis shows the iteration number $n$ (up to $n = 100$), and the vertical axis shows the difference between the generated sequences. The results show that the initial iteration number should be larger than 40.
The number of initial iterations determines the key sensitivity. The skew tent map is sensitive to changes in the control parameter $p$, so small variations of the keys produce large changes after the map has been iterated enough times; increasing the number of iterations therefore makes the cipher harder to break. If the number of iterations is small, however, slightly different keys produce nearly identical sequences, which is undesirable for key generation.
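The experiment behind Figure 6(a), and the effect of floating-point precision discussed at the start of this section, can be reproduced in outline with the two helpers below. Both are hypothetical sketches: the function names, the starting value 0.3, and the divergence threshold 0.01 are our own choices, not values from the paper. sensitivity_iterations counts how many iterations it takes for two double-precision orbits started delta apart to differ by more than a threshold; precision_divergence does the same for one orbit computed in float and in double.

```c
#include <assert.h>

/* One step of the skew tent map in double precision. */
static double tent_d(double x, double p)
{
    return (x <= p) ? x / p : (1.0 - x) / (1.0 - p);
}

/* One step of the same map in single precision. */
static float tent_f(float x, float p)
{
    return (x <= p) ? x / p : (1.0f - x) / (1.0f - p);
}

/* First iteration at which the orbits from x0 and x0 + delta differ by
 * more than tol (both in double precision), or -1 within max_iter. */
int sensitivity_iterations(double x0, double delta, double p,
                           double tol, int max_iter)
{
    assert(p > 0.0 && p < 1.0);
    double a = x0, b = x0 + delta;
    for (int n = 1; n <= max_iter; ++n) {
        a = tent_d(a, p);
        b = tent_d(b, p);
        double d = (a > b) ? a - b : b - a;
        if (d > tol)
            return n;
    }
    return -1;
}

/* First iteration at which the float and double orbits from the same
 * x0 and p differ by more than tol, or -1 within max_iter. */
int precision_divergence(double x0, double p, double tol, int max_iter)
{
    assert(p > 0.0 && p < 1.0);
    float xf = (float)x0, pf = (float)p;
    double xd = x0;
    for (int n = 1; n <= max_iter; ++n) {
        xf = tent_f(xf, pf);
        xd = tent_d(xd, p);
        double d = ((double)xf > xd) ? (double)xf - xd : xd - (double)xf;
        if (d > tol)
            return n;
    }
    return -1;
}
```

For p = 0.527, the separation between two orbits can grow by at most a factor 1/(1 - p), about 2.11, per iteration, so sensitivity_iterations(0.3, 1e-12, 0.527, 0.01, 200) cannot return fewer than 31 iterations, which is consistent with discarding more than 40 initial iterations.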
The original Lena image and its histogram are shown in Figure 7, and the encrypted Lena image and its histogram are shown in Figure 8.
From Figure 8, we can see that the grayscale histogram of the encrypted image is well balanced (nearly uniform), which makes the encrypted image resistant to statistical attacks based on its histogram.
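The histogram comparison can be quantified with a chi-square statistic against the uniform distribution. This measure is not used in the text; it is a common complementary check, shown here as a sketch with hypothetical function names. For 8-bit grayscale data there are 255 degrees of freedom, so values near 255 are typical of uniformly distributed bytes, while much larger values indicate an unbalanced histogram such as that of a natural image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count occurrences of each byte value (e.g. each gray level 0..255). */
void byte_histogram(const uint8_t *buf, size_t len, size_t hist[256])
{
    for (int v = 0; v < 256; ++v)
        hist[v] = 0;
    for (size_t i = 0; i < len; ++i)
        hist[buf[i]]++;
}

/* Chi-square statistic of the byte histogram against a uniform
 * distribution with len / 256 expected counts per value. */
double chi_square_uniform(const uint8_t *buf, size_t len)
{
    assert(len > 0);
    size_t hist[256];
    byte_histogram(buf, len, hist);
    double expected = (double)len / 256.0;
    double chi2 = 0.0;
    for (int v = 0; v < 256; ++v) {
        double d = (double)hist[v] - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```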

4.2. Performance of the Parallel Algorithm
Amdahl's law is useful for analysing the performance of a system whose execution combines two different rates, such as serial and parallel operations. Amdahl's law assumes that the problem size does not change when running on enhanced machines; in other words, the problem size remains the same when parallelized [26].
Amdahl's law states the following:
$$S = \frac{1}{(1 - P) + \dfrac{P}{N}},$$
where $P$ is the proportion of a program that can be made parallel, $(1 - P)$ is the proportion that cannot be parallelized (remains serial), and $N$ is the number of processors. The speedup of a program on multiple processors is therefore limited by the time needed for the sequential fraction of the program. The time $T(n)$ an algorithm takes to finish when executed on $n$ threads is
$$T(n) = T(1)\left(B + \frac{1 - B}{n}\right),$$
where $n$ is the number of threads and $B$ is the fraction of the algorithm that is strictly serial.
The theoretical speedup that can be obtained by executing a given algorithm on a system capable of executing $n$ threads is
$$S(n) = \frac{T(1)}{T(n)} = \frac{1}{B + \dfrac{1 - B}{n}}.$$
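As a quick numerical check of the formula, the following C function evaluates S(n) for a given serial fraction B. The function name amdahl_speedup is hypothetical.

```c
#include <assert.h>

/* Amdahl's law: speedup on n threads when a fraction B of the work is
 * strictly serial, S(n) = T(1) / T(n) = 1 / (B + (1 - B) / n). */
double amdahl_speedup(double serial_fraction, int n)
{
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0 && n >= 1);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n);
}
```

For instance, on a dual-core machine (n = 2), a program with B = 0.1 can reach at most 1/0.55, about 1.82, and no number of threads can push the speedup beyond 1/B.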
In fact, the execution time $T$ of a program in a parallel system includes the initialization time $T_{\mathrm{init}}$, the computing time $T_{\mathrm{comp}}$, the communication time $T_{\mathrm{comm}}$, and the system synchronization time $T_{\mathrm{para}}$:
$$T = T_{\mathrm{init}} + T_{\mathrm{comp}} + T_{\mathrm{comm}} + T_{\mathrm{para}},$$
where $T_{\mathrm{init}}$ consists of the time for the initial iterations of the chaotic map and the Fibonacci sequence, initializing data buffers, and so on. We do not take into account the time for reading and writing files and blocks, because different disks have different access times. The communication time and the system synchronization time depend on the parallel system and might change when running on enhanced machines. Following Amdahl's law, which assumes a fixed problem size, our analysis of the execution time therefore focuses on the initialization time and the computing time.
The memory used by the algorithm is $M = M_{\mathrm{lenFile}} + M_{\mathrm{sysExe}}$, where $M_{\mathrm{sysExe}}$ is the memory for executing the algorithm, for example, to compute the chaotic map and obtain keys, and $M_{\mathrm{lenFile}}$ is the memory occupied by the file being encrypted or decrypted. On multicore processors, the memory architecture follows the shared-memory model. The file being encrypted (decrypted) accounts for the main part of the memory used: the longer the file, the more memory is used.
In our experiments, the proposed algorithm was tested on a system equipped with a Pentium Dual-Core CPU at 2.60 GHz, 2 GB RAM, and MS-Windows XP Professional. In Figure 9, the horizontal axis shows the length of a file, and the vertical axis shows the speedup obtained for file lengths of 100, 1000, 10000, and 1000000 bytes. For each measurement, ten trial runs were conducted, and the run time was obtained as their average. As can be seen, for a 100-byte file the speedup increases only slightly. This is because the computing time is a small part of the execution time of the program; in other words, the parallel proportion of the program is small. As the file size grows, the parallel proportion of the program increases.
The results show that, for larger files, the parallel algorithm is more effective than the serial one.
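The measurement procedure (ten trial runs, averaged) can be sketched with a small timing harness based on the C11 function timespec_get. The workload tent_workload and the other function names are hypothetical; in an MPI program, MPI_Wtime would be the natural clock instead. TIME_UTC is a wall clock that the system may adjust, so a monotonic clock is preferable where one is available.

```c
#include <assert.h>
#include <time.h>

/* Current wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical workload: iterate the skew tent map *(unsigned long *)arg times. */
void tent_workload(void *arg)
{
    unsigned long iters = *(unsigned long *)arg;
    volatile double x = 0.3; /* volatile keeps the loop from being optimized away */
    for (unsigned long i = 0; i < iters; ++i)
        x = (x <= 0.527) ? x / 0.527 : (1.0 - x) / (1.0 - 0.527);
}

/* Run fn(arg) `trials` times and return the mean run time in seconds. */
double mean_runtime(void (*fn)(void *), void *arg, int trials)
{
    assert(trials > 0);
    double total = 0.0;
    for (int t = 0; t < trials; ++t) {
        double start = now_seconds();
        fn(arg);
        total += now_seconds() - start;
    }
    return total / trials;
}

/* Measured speedup: mean serial time divided by mean parallel time. */
double measured_speedup(double t_serial, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}
```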
In [27], a parallel algorithm based on the logistic map is proposed. Figure 10 shows that our algorithm performs better than that algorithm for different file lengths.

Conclusions
In this paper, we describe the parallelization of a chaos-based encryption algorithm. The algorithm is implemented in a client-server paradigm and is suitable not only for multicore systems but also for traditional single-core processors. The experiments show that applying the parallel algorithm on multicore computers considerably reduces the time of data encryption and decryption. We have confirmed that the parallel algorithm performs better than the sequential one when encrypting or decrypting large files, such as video, audio, and images.

Figure 1: Skew tent map with control parameter $p$.