Parallel MapReduce: Maximizing Cloud Resource Utilization and Performance Improvement Using Parallel Execution Strategies

MapReduce is the preferred cloud computing framework for large data analysis and application processing. Existing MapReduce frameworks suffer performance degradation because they adopt sequential processing approaches with little modification, and they therefore underutilize cloud resources. To overcome this drawback and reduce costs, we introduce a Parallel MapReduce (PMR) framework in this paper. We design a novel parallel execution strategy for Map and Reduce worker nodes. The strategy further improves performance and cloud resource utilization through parallel execution of the Map and Reduce functions on the multicore environments available with computing nodes. We explain in detail the makespan modeling and working principle of the PMR framework. The performance of PMR is compared with Hadoop through experiments on three biomedical applications. Experiments conducted for the BLAST, CAP3, and DeepBind biomedical applications report makespan reductions of 38.92%, 18.00%, and 34.62% for the PMR framework against the Hadoop framework. The experimental results indicate that the proposed PMR cloud computing platform is robust, cost-effective, and scalable and that it supports diverse applications on public and private cloud platforms. Overall, the results indicate a good match between the theoretical makespan model presented and the experimental values observed.


Introduction
Delivering data intensive applications and services on cloud platforms is the new paradigm. The scalable storage and computing capabilities of cloud platforms support such delivery models in various ways. The cloud is maintained using distributed computing frameworks capable of handling and processing large amounts of data. Of all cloud frameworks available [1][2][3][4][5], Hadoop MapReduce is the most widely adopted [6,7] owing to its ease of deployment, scalability, and open-source nature.
The Hadoop MapReduce model predominantly consists of the following phases: Setup, Map, Shuffle, Sort, and Reduce, as shown in Figure 1. The Hadoop framework consists of a master node and a cluster of computing nodes. Jobs submitted to Hadoop are further distributed into Map and Reduce tasks. In the Setup phase, the input data of a job to be processed (generally residing on the Hadoop Distributed File System (HDFS)) is logically partitioned into homogeneous volumes called chunks for the Map worker nodes. Hadoop divides each MapReduce job into a set of tasks where each chunk is processed by a Map worker. The Map phase takes a key/value pair $(k_1, v_1)$ as input and generates a list of intermediate key/value pairs $(k_2, v_2)$, from which the Reduce phase produces the final output $(k_3, v_3)$.
The Hadoop MapReduce platform suffers from a number of drawbacks. The preconfigured memory allocator for Hadoop jobs leads to issues of buffer concurrency amongst jobs and heavy disk read seeks. These memory allocator issues increase makespan time and induce high input/output (I/O) overheads [5]. The jobs scheduled on Hadoop cloud environments do not consider parameters such as memory requirement and multicore environment for linear scalability, which seriously affects performance [8]. In Hadoop, the Reduce tasks are started only after completion of all Map tasks. Hadoop assumes homogeneous Map execution times considering homogeneously distributed data, which is not realistic [9]. The assumed homogeneous Map execution times and the serial execution strategy leave unutilized those Map workers (and their resources) that have completed their tasks and are waiting for the other Map workers to complete theirs [10]. In cloud environments where organizations/users are charged according to the (storage, computation, and communication) resources utilized, these issues add to costs in addition to affecting performance [11]. Hadoop platforms do not support flexible pricing [12]. Scalability is an issue owing to the cluster based nature of Hadoop platforms. Processing of streaming data is also an issue with Hadoop [10]. To overcome these drawbacks, researchers have adopted various techniques.
In [5], the authors addressed the issues related to Hadoop memory management by adopting a global memory management technique. They proposed a prioritization model of memory allocation and revocation by adopting a rule based heuristic approach. A multithreaded execution engine is used to achieve global memory management. To address the garbage collection issue of the Java virtual machine (JVM) and to improve the data access rate in Hadoop, they adopted a multicache mechanism for sequential and interleaved disk access. Their model improves memory utilization and balances the performance of I/O and CPU. However, the authors did not take network I/O performance into consideration.
In [8], a GPU based model to address the linear scalability issue of Hadoop is presented. The authors addressed the research challenges of integrating Hadoop and GPUs and described how a MapReduce job can be executed using CUDA based GPUs. In the Hadoop MapReduce framework, jobs run inside a JVM. Managing, creating, and executing jobs incur computation overhead, and the short-lived nature of jobs in the JVM reduces the efficiency of Just-In-Time (JIT) compilation. To overcome this, they adopted GPU based job execution approaches such as JNI, JCuda, Hadoop Pipes, and Hadoop Streaming, and they analyzed and compared the pros and cons of each approach in detail.
To address issues related to sequential execution, a Cloud MapReduce (CMR) framework is discussed in [13]. Here, the authors developed a parallelized model by adopting a pipelining execution approach to process streaming and batch data. Their cloud based MapReduce model supports parallelism between the Map and Reduce phases and also among individual jobs.
The increased demand for data analytics in scientific/bioinformatics computing has been accompanied by growth in the size of bioinformatics data. Computing and storing these huge data require a huge infrastructure. Adopting a cloud platform for bioinformatics applications is a viable option for analyzing the genomic structure and evolutionary patterns of the large bioinformatics data [14][15][16][17][18] generated by Next Generation Sequencing (NGS) technologies. Various cloud based bioinformatics applications have been developed to process large bioinformatics data, such as CloudAligner [18], CloudBurst [19], Myrna [20], and Crossbow [21]. Cloud technologies allow users to run bioinformatics applications and charge them based on their usage. Reducing the computation cost in such environments therefore needs to be considered when designing a bioinformatics computation model.
Reducing execution times and utilizing resources effectively at minimal cost are always desired features of cloud computing frameworks. To achieve this goal, a Parallel MapReduce (PMR) framework is proposed in this paper. PMR adopts a parallel execution strategy similar to the technique presented in [13]. In conventional MapReduce systems, the Map phase is executed first, and Reduce phase execution is considered only afterwards. In the proposed PMR, Reduce phase execution is initiated in a parallel fashion as soon as two or more Map worker nodes have completed their tasks. Adopting such an execution strategy reduces unutilized worker resources. To further reduce makespan, parallel execution of the Map and Reduce functions is adopted, utilizing the multicore environments available with nodes. A makespan model describing the operation of PMR is presented in later sections. Bioinformatics applications are synonymous with big data, and processing of such computationally heavy applications on cloud platforms is investigated in [22,23]. Performance evaluation of the PMR framework is therefore carried out using bioinformatics applications. The major contributions of this paper are the parallel execution strategy of PMR, the makespan model describing it, and the experimental comparison of PMR against Hadoop using bioinformatics applications.
Besides the data management and computing issues, there exist numerous challenges in provisioning security in cloud computing environments and in ensuring ethical treatment of biomedical data. When MapReduce is carried out in distributed settings, users maintain very little control over the computations, which raises several security and privacy concerns. MapReduce activities may be subverted or compromised by malicious or cheating nodes. Such security issues have been discussed and highlighted by many researchers, as in [24][25][26]. However, addressing security issues is beyond the scope of this paper.
The paper organization is as follows: In Section 2, the related works are discussed. In Section 3, the proposed framework is presented. The results and the experimental study are presented in the penultimate section. The concluding remarks are discussed in the last section.

Literature Review
D. Dahiphale et al. [13] presented a cloud based MapReduce model to overcome the following shortcomings of the Hadoop MapReduce model: Hadoop processes the Map and Reduce phases in a sequential manner, scalability is not efficient due to the cluster based computing mechanism, processing of stream data is not supported, and flexible pricing is not supported. To overcome the issue of sequential execution, they proposed a cloud based Parallel MapReduce model where the tasks are executed by Amazon EC2 instances (virtual machine workers). To process stream and batch data in a parallel manner, a pipelining model is adopted, and flexible pricing is provided by using Amazon cloud Spot Instances. Experimental results show that the CMR model processes tasks in a parallel manner, improves the throughput, and shows a speedup improvement of 30% over the Hadoop MapReduce model for larger datasets.
X. Shi et al. [5] presented a framework for memory intensive computing to overcome the shortcomings of the Hadoop MapReduce model. In Hadoop, tasks are executed based on the available CPU cores and memory is allocated based on a preset configuration, which leads to a memory bottleneck due to buffer concurrency and heavy disk seeks; the resulting I/O wait occupancy further increases the makespan time. To address this, they presented a rule based heuristic model to prioritize memory allocation and revocation for global memory management. They presented a multithreaded approach, for which they developed disk access serialization and a multicache technique for efficient garbage collection in the JVM. The experimental study shows that the execution time of memory intensive computation is improved by 40% over the Hadoop MapReduce model.
Babak Alipanahi et al. [27] presented a model that adopts deep learning techniques for pattern discovery in DNA- and RNA-binding proteins. The specificity of a protein is generally described using position weight matrices (PWMs), and learning sequence specificity from high throughput data poses the following challenges. Firstly, the data from different sources vary in nature. For example, chromatin immunoprecipitation provides a ranked list of putatively bound sequences of varying length, RNAcompete assays and protein binding microarrays provide a specificity coefficient for each sequence, and HT-SELEX produces a set of very highly similar sequences. Secondly, each data source has its own biases, artifacts, and limitations, which must be accounted for to identify the pertinent specificity. Lastly, the data are huge in size, which requires a computation model that integrates all data from different sources. To overcome these challenges, they presented a model, namely, DeepBind, whose characteristics are as follows. It is applicable to both sequence and microarray data and works well across different technologies without correcting for technology-specific biases. Sequences are processed in parallel on a graphics processing unit (GPU); the predictive model can be trained automatically and can withstand a modest degree of noisy and incorrectly labeled training data. Experiments conducted on in vitro data for both training and testing show that DeepBind is a scalable, modular pattern discovery technique based on deep learning which does not depend on application specific heuristics such as "seed finding."
K. Mahadik et al. [28] presented a parallelized BLAST model to overcome the following issues of mpiBLAST. mpiBLAST segments the database and processes each short query in parallel, but the rapid growth of NGS has increased the size of query sequences (long query sequences), which can comprise millions of protein/nucleotide sequences; this limits mpiBLAST, resulting in scheduling overhead and increased makespan time. In mpiBLAST, short queries complete faster than long queries, which creates load imbalance among nodes. To address this, they presented a parallel model of BLAST, namely, ORION, which splits individual queries into overlapping fragments to process large query sequences on the Hadoop MapReduce platform. Experimental outcomes show that their model achieves a speedup of 12.3x over mpiBLAST without compromising accuracy.
J. Ekanayake et al. [29] evaluated cloud based MapReduce models, namely, Microsoft DryadLINQ and Apache Hadoop, for bioinformatics applications and compared them with the existing MPI framework. Pairwise Alu sequence alignment and the CAP3 [30] application are considered. To evaluate the scheduling performance of these frameworks, an inhomogeneous dataset is considered. Their outcomes show that the two cloud frameworks have a significant advantage over MPI in terms of fault tolerance, parallel execution through the MapReduce framework, robustness, and flexibility, since MPI is memory based whereas the DryadLINQ and Hadoop models are file oriented. Experimental analysis is conducted for varied sequence sizes, and the results show that Hadoop performs better than DryadLINQ on inhomogeneous data for both applications.
Y. Wu et al. [31] presented an outlier based execution strategy for computation intensive applications in order to reduce the makespan and computation overhead; many existing approaches that adopt a MapReduce framework are suited to data intensive applications since their scheduler state is defined by I/O status. They designed a framework for computation intensive tasks that adopts instrumentation to detect task progress and an automatic instrument point selector to reduce overhead, and it adopts K-means for outlier detection without resorting to biased progress calculation. The framework is evaluated using the CAP3 and ImageMagick applications on both a local cluster and a cloud environment. Their threshold based outlier model improves the task completion time by 25% with minimal overhead.

The Proposed PMR Framework
The PMR framework incorporates functions similar to those available in conventional MapReduce frameworks. Accordingly, the Map, Shuffle (including Sort), and Reduce phases exist in PMR. For the sake of representation simplicity, the Shuffle and Reduce phases are cumulatively considered as the Reduce phase. The Map phase takes input data for processing and generates a list of key/value pairs as a result, $(k_1, v_1) \rightarrow (k_2, v_2)$. Each generated key $k_2$ and its list of different values are grouped together and passed to a reducer function. The reducer function takes the intermediate key $k_2$, processes its values, and generates a new set of values $(v_3)$.
The job execution is performed on multiple virtual machines forming a computing cluster, where one is a master node and the others are worker nodes/slave nodes. The master node distributes and monitors tasks among worker nodes. Worker nodes periodically send their resource utilization details to the master node. The master node schedules tasks based on the availability of worker resources.
To minimize the makespan of job execution and maximize utilization of the cloud resources available with worker nodes, the proposed PMR adopts a parallel execution strategy; i.e., Reduce phase execution is initiated in a parallel fashion as soon as two or more Map worker nodes have completed their tasks. The worker nodes are considered to have more than one computing core, and PMR executes the Map and Reduce functions in parallel by utilizing these multicore environments. A makespan model of the proposed PMR is described in Section 3.1. The PMR function is a combination of the Map task and the Reduce task. The input dataset is split into uniform block sized data called chunks, which are distributed among the computing nodes. In PMR, the chunk obtained is further split to parallelize execution of the user defined Map and Reduce functions. The user defined Map function is applied on the input, and the intermediate output generated is the input data for the Reduce task. The Reduce stage is a combination of two phases, Shuffle and Reduce. The output of the Map task is fed as input to the Shuffle phase, where the output of already completed Map tasks is shuffled and then sorted. The sorted data is then fed into the user defined Reduce function, and the generated output is written back to cloud storage.
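The execution strategy described above can be illustrated with a minimal Python sketch. It is not the PMR implementation: threads stand in for worker nodes, word counting stands in for the user defined Map and Reduce functions, and the names `map_chunk`, `reduce_partial`, `run_parallel`, and `start_after` are hypothetical. The sketch only shows the ordering of work, in which reducing begins once two Map tasks have finished rather than after all of them.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def map_chunk(chunk):
    """Illustrative Map function: count the words in one chunk of text."""
    return Counter(chunk.split())


def reduce_partial(total, partial):
    """Illustrative Reduce function: merge one Map output into the result."""
    total.update(partial)
    return total


def run_parallel(chunks, workers=4, start_after=2):
    """Run Map tasks in parallel and start reducing once `start_after`
    Map tasks have finished, instead of waiting for all of them."""
    total = Counter()
    pending = []      # finished Map outputs that are not yet reduced
    finished = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(map_chunk, c) for c in chunks]
        for future in as_completed(futures):
            pending.append(future.result())
            finished += 1
            if finished >= start_after:
                # Reduce what is available while other Map tasks still run.
                while pending:
                    total = reduce_partial(total, pending.pop())
    # Handles jobs with fewer than `start_after` chunks.
    for partial in pending:
        total = reduce_partial(total, partial)
    return total


chunks = ["a b a", "b c", "a c c", "d"]
result = run_parallel(chunks)
```

Because the merge is commutative, the final counts do not depend on the order in which Map tasks complete; a Reduce function without that property would need the Shuffle and Sort step before merging.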
A Map function, in terms of computation time and input/output data dependencies, can be represented as the tuple
$$\langle D_M^{\rightarrow},\ M^{\downarrow},\ M^{\rightarrow},\ M^{\uparrow},\ O_M \rangle,$$
where $D_M^{\rightarrow}$ is the average input data processed by each Map worker. Variables $M^{\downarrow}$, $M^{\rightarrow}$, and $M^{\uparrow}$ represent the minimum, average, and maximum computation time of the Map function. The output of the Map function, stored in the cloud to be processed by Reduce workers, is represented as the ratio $O_M$ between output and input data.
Similarly, the Reduce function is represented as
$$\langle O_M,\ R^{\downarrow},\ R^{\rightarrow},\ R^{\uparrow},\ O_R \rangle,$$
where $O_M$ is the output data of Map functions stored in the cloud (represented as a ratio). The output of the Reduce function, or of the task assigned to PMR, is represented as $O_R$ (ratio of Reduce output to input). The minimum, average, and maximum computation times of the Reduce function are $R^{\downarrow}$, $R^{\rightarrow}$, and $R^{\uparrow}$. The Reduce stage incorporates the Shuffle and Sort operations.
Reducing execution time and minimizing the cost of cloud usage are always desirable attributes. In this paper, a makespan model to describe the operation of the PMR framework is presented. Obtaining actual makespan times is complex and always a challenge, because a number of dependencies exist, such as hardware parameters, network conditions, cluster node performance, cloud storage parameters, and data transfer rates. The makespan model of PMR described below considers only the functional changes incorporated to improve performance over conventional MapReduce frameworks. The modeling described below is based on the work presented in [32].

Preliminaries and Makespan Bound Establishment.
The makespan function is computed as the time required to complete a job $J$ of input data size $D$ with $x$ resources allocated to PMR. Let us consider a job $J$ to be executed on the cloud platform considering data $D$. Let the cloud platform have $x + 1$ nodes/workers. Each worker is said to have $c$ cores that can be utilized for computation. One worker node acts as the master node, leaving $x$ workers to perform Map and Reduce computation. Job $J$ is considered to be distributed and computed using $q$ Map and Reduce tasks. The data $D$ is accordingly split into chunks of size $d$. In conventional MapReduce platforms, $d = D/q$, and PMR considers a similar approach for computing $d$. The times taken to complete the $q$ tasks are represented as $T_1, T_2, \ldots, T_q$. In the proposed PMR framework, the chunks are further split into $d_c = d/c$ for parallel execution. In PMR, the execution time of the $i$-th task is $T_i = T_i^{\uparrow} = \max\{t_1, \ldots, t_c\}$, where $t_j$ represents the execution time of task $i$ on the $j$-th core considering the corresponding data $d_c$. The average ($\mu$) and maximum ($\lambda$) time taken by the tasks to complete job $J$ can be represented as
$$\mu = \frac{1}{q}\sum_{i=1}^{q} T_i, \qquad \lambda = \max_{1 \le i \le q} T_i. \quad (3)$$
Let us consider an optimistic scenario in which the $q$ tasks are uniformly distributed among the $x$ worker nodes (the minimum time taken to process $(q \times \mu)$ work). The overall time taken to compute these tasks is $(q \times \mu)/x$, and it is the lower bound time.
To compute the upper bound time, a pessimistic scenario is considered, where the longest processing task $\overleftarrow{T} \in (T_1, T_2, \ldots, T_q)$ with makespan $\lambda$ is the last processed task. The time taken before the last task $\overleftarrow{T}$ is scheduled is upper bounded as follows:
$$\frac{1}{x}\sum_{i=1}^{q-1} T_i \le \frac{(q-1) \times \mu}{x}.$$
Therefore, the overall timespan including this longest task $\overleftarrow{T}$ is upper bounded as
$$\frac{(q-1) \times \mu}{x} + \lambda. \quad (6)$$
The probable job makespan range due to nondeterminism and scheduling is obtained as the difference between the lower bound and the upper bound. This is a key factor when the time taken by the longest task is trivial as compared to the overall makespan; i.e., $\lambda \ll (q \times \mu)/x$.
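As a worked illustration of the two bounds, the short Python sketch below computes the lower bound (number of tasks times average task time, divided by the number of workers) and the upper bound (the same expression for one task fewer, plus the longest task time). The per-core timings and the function names `task_time` and `makespan_bounds` are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def task_time(core_times):
    """Time of one PMR task: the slowest of its per-core sub-tasks."""
    return max(core_times)


def makespan_bounds(task_times, workers):
    """Return (lower, upper) makespan bounds for tasks run on `workers` nodes."""
    q = len(task_times)
    mu = sum(task_times) / q     # average task time
    lam = max(task_times)        # longest task time
    lower = q * mu / workers
    upper = (q - 1) * mu / workers + lam
    return lower, upper


# Hypothetical per-core times in seconds for 4 tasks on 4-core workers.
per_core = [[10, 12, 11, 9], [8, 8, 9, 7], [14, 13, 15, 16], [10, 10, 10, 10]]
times = [task_time(t) for t in per_core]
lower, upper = makespan_bounds(times, workers=2)
```

With these assumed timings the task times are 12, 9, 16, and 10 seconds, so the average is 11.75 seconds and the bounds on two workers are 23.5 and 33.625 seconds.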
Applying the lower bound to the Map phase of PMR, with $q$ Map tasks executed on $S_M^J$ Map worker nodes, the minimum execution time of the Map phase is $T_M^{low} = (q \times M^{\rightarrow})/S_M^J$. Similarly, the upper bound $T_M^{up}$, or the maximum execution time of the Map phase in PMR, using (6) is defined as $T_M^{up} = ((q-1) \times M^{\rightarrow})/S_M^J + M^{\uparrow}$. The makespan of the Map phase in PMR and the average makespan of each Map worker node are computed from the lower ($T_M^{low}$) and upper ($T_M^{up}$) bounds. The makespan of the PMR Map phase consisting of $S_M^J$ worker nodes is shown in Figure 2, which also shows how the makespan bounds $T_M^{low}$ and $T_M^{up}$ are ascertained.
The Reduce workers are initiated when at least two Map worker nodes have finished their computational tasks. The Reduce phase is initiated at the $(T_M^{up} - T_M^{low})$ time instance. Intermediate data generated by Map worker nodes is processed using the Shuffle, Sort, and Reduce functions defined. The average execution time $R^{\rightarrow}$ and maximum execution time $R^{\uparrow}$ of the Reduce phase considering $S_R^J$ workers are derived using (3). The makespan bounds of the Reduce phase, with the lower bound represented as $T_R^{low}$ and the upper bound as $T_R^{up}$, are computed in the same manner from $R^{\rightarrow}$, $R^{\uparrow}$, and $S_R^J$.
The makespan of the $J$-th job on the PMR framework is the sum of the time taken to execute Map tasks and the time taken to execute Reduce tasks. Considering the best case scenario (lower bound), the minimum makespan observed is
$$T_J^{low} = T_M^{low} + T_R^{low}, \quad (13)$$
which is simplified in (14). Considering the worst computing performance, the upper bound or maximum makespan observed is $T_J^{up} = T_M^{up} + T_R^{up}$, which is simplified in (16). The makespan of job $J$ on the PMR framework is then defined using (14) and (16), giving (18).

Modeling Data Dependency on Makespan.
According to [30,31], data dependency can be modeled using linear regression. A similar approach is adopted here. The average makespan of the $i$-th Map worker node is defined as a linear function of the data it processes, where the coefficients $V_*$ represent variables that are application specific; i.e., they are dependent on the user defined Map function. The average makespan of the $i$-th Reduce worker node is defined in the same way, where $V_*$ represent variables specific to the user defined Reduce function and the data is the intermediate output obtained from the Map phase. For parallel execution and to utilize all resources, this intermediate data is further split in the same way as in the Map phase. On similar lines, the average and maximum execution times of Map and Reduce workers are computed. The data dependent computations of $M^{\rightarrow}$, $M^{\uparrow}$, $R^{\rightarrow}$, and $R^{\uparrow}$ are used in (14), (16), and (18) to compute the makespan of the $J$-th job on the PMR framework considering data $D$. Additional details of data dependency modeling using linear regression, together with the proof of the model, are presented in [33].
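The regression step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the model of [33]: it assumes a single predictor (chunk size) and a straight-line relation between chunk size and average worker time, the profiling values are invented, and the names `fit_line`, `predict`, `v0`, and `v1` are hypothetical.

```python
def fit_line(sizes, times):
    """Ordinary least squares fit of times = v0 + v1 * sizes."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    v1 = sxy / sxx
    v0 = mean_y - v1 * mean_x
    return v0, v1


def predict(v0, v1, size):
    """Estimated average worker time for a chunk of the given size."""
    return v0 + v1 * size


# Invented profiling runs: chunk size (MB) against average Map time (s).
sizes = [64, 128, 256, 512]
times = [5.0, 9.0, 17.0, 33.0]
v0, v1 = fit_line(sizes, times)
estimate = predict(v0, v1, 1024)
```

The invented points lie exactly on the line 1.0 + 0.0625 × size, so the fit recovers those two coefficients and extrapolates to 65 seconds for a 1024 MB chunk. Real profiling data would be noisy, and the fitted coefficients would differ per application, as the text notes.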

Performance Evaluation
Experiments conducted to evaluate the performance of PMR are presented in this section. The performance of PMR is compared with the state-of-the-art Hadoop framework. Hadoop is the most widely adopted MapReduce platform for computing in cloud environments [34]; hence, it is considered for comparison. The PMR framework is developed using VC++, C#, and Node.js and deployed on the Azure cloud. Hadoop 2 (version 2.6) is used and deployed on the Azure cloud using HDInsight. The PMR framework is deployed with one master node and 4 worker nodes. Each worker node is deployed on an A3 virtual machine instance. Each A3 VM instance consists of 4 virtual computing cores, 7 GB of RAM, and 120 GB of local hard drive space. The Hadoop platform deployed for evaluation likewise consists of one master and 4 worker nodes in the cluster. A uniform configuration of the PMR and Hadoop frameworks on the Azure cloud is considered.
Biomedical applications characterized by processing of massive amounts of genetic data are considered in the experiments for performance evaluation. Three computationally heavy biomedical applications, namely, BLAST [35], CAP3 [30], and the recent state-of-the-art DeepBind [27], are adopted for evaluation. All the genomic sequences considered for the experimental analyses are obtained from the publicly available NCBI database [36]. For comprehensive performance evaluation, the authors have considered various application scenarios. In the experiments conducted using BLAST, both the Map and Reduce phases are involved. In the CAP3 application, the Map phase plays a predominant role. In DeepBind, the Reduce phase is critical for analysis.

BLAST.
Gene sequence alignment is a fundamental operation adopted to identify similarities that exist between a query sequence (protein, DNA, or RNA) and a database of sequences. Sequence alignment is computationally heavy, and its computation complexity is proportional to the product of the lengths of the two sequences being analyzed. The massive volume of sequences maintained in the database to be searched induces an additional computation burden. BLAST is a widely adopted bioinformatics tool for sequence alignment which performs faster alignments at the expense of accuracy (possibly missing some potential hits) [35]. The drawbacks of BLAST and its improvements are discussed in [28]. For evaluation here, the improved BLAST algorithm of [28] is adopted. To improve computation time, a heuristic strategy is used that compromises accuracy minimally. In the heuristic strategy, an initial match is found and is later extended to obtain the complete matching sequence.
A three-stage approach is adopted in BLAST for sequence alignment. The query sequence is represented as q and the reference sequence as r. Sequences q and r are said to consist of $k$-length subsequences known as $k$-mers. In the initial stage, also known as the $k$-mer match stage, BLAST considers each of the $k$-mers of q and r and searches for $k$-mers that match in both. This process is repeated to build a scanner of all $k$-letter words in query q. Then, BLAST searches the reference genome r using the scanner built to find $k$-mers of r that match query q; these matches are seeds of potential hits. In the second stage, also known as the ungapped alignment stage, every seed identified previously is extended in both directions to include matches and mismatches. A match is found if the nucleotides in q and r are the same. A mismatch occurs if different nucleotides are observed in q and r. A mismatch reduces and a match increases the score of the candidate sequence alignment. The present score $s$ of the sequence alignment and the highest score $s^{\uparrow}$ obtained for the present seed are retained. The second stage is terminated if $s^{\uparrow} - s$ is higher than the predefined X-drop threshold $th_x$, and it returns the highest alignment score of the present seed. The alignment is passed to stage three if the returned score is higher than the predefined ungapped threshold $th_u$. The predefined thresholds establish the accuracy of alignment scores in BLAST. Computational optimization is achieved by skipping seeds already covered by previous alignments. The initial two stages of BLAST are executed in the Map workers of PMR and Hadoop. In stage three, gapped alignment is performed in the left and right directions, where deletions and insertions are performed during extension of alignments. As in the previous stage, the highest alignment score $s^{\uparrow}$ is kept, and if the present score is lower than $s^{\uparrow}$ by more than the X-drop threshold, the stage is terminated and the corresponding alignment outcome is obtained. The gapped alignment operation is carried out in the Reduce phase of the PMR and Hadoop frameworks. The schematic of the BLAST algorithm on the PMR framework is shown in Figure 3.
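The first two stages can be sketched as follows. This is a simplified illustration and not the BLAST or ORION code: the scoring values (+1 for a match, -1 for a mismatch), the X-drop value, the example sequences, and the function names `find_seeds`, `extend`, and `ungapped_alignment` are all assumptions made for the example.

```python
def find_seeds(q, r, k):
    """k-mer match stage: yield (i, j) where q[i:i+k] equals r[j:j+k]."""
    index = {}
    for i in range(len(q) - k + 1):
        index.setdefault(q[i:i + k], []).append(i)
    for j in range(len(r) - k + 1):
        for i in index.get(r[j:j + k], []):
            yield i, j


def extend(q, r, i, j, step, x_drop, match=1, mismatch=-1):
    """Extend from (i, j) in one direction and return the best score seen.
    Stops when the score falls more than x_drop below that best score."""
    score = best = 0
    while 0 <= i < len(q) and 0 <= j < len(r):
        score += match if q[i] == r[j] else mismatch
        if score > best:
            best = score
        elif best - score > x_drop:
            break
        i += step
        j += step
    return best


def ungapped_alignment(q, r, i, j, k, x_drop=2):
    """Ungapped score of the seed at (i, j) extended in both directions."""
    right = extend(q, r, i + k, j + k, +1, x_drop)
    left = extend(q, r, i - 1, j - 1, -1, x_drop)
    return k + left + right      # the k seed positions are all matches


q = "ACGTACGT"
r = "TTACGTACCT"
seeds = list(find_seeds(q, r, k=4))
best = max(ungapped_alignment(q, r, i, j, k=4) for i, j in seeds)
```

For these toy sequences, five seeds are found and the best ungapped score is 6, obtained on the diagonal where q aligns with r shifted by two positions and a single mismatch is tolerated by the X-drop rule. The gapped third stage is omitted from the sketch.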
Experiments conducted to evaluate the performance of PMR and Hadoop considered the Drosophila database as the reference database. The query genomic sequences of varied sizes are taken from Homo sapiens chromosomal sequences and genomic scaffolds. A total of six different query sequences are considered, similar to [28]. The configuration of each experiment is summarized in Table 1. All six experiments are conducted using the BLAST algorithm on the Hadoop and PMR frameworks. All observations are retrieved from a set of log files generated during the Map and Reduce phases of Hadoop and PMR and are stored for further analysis. Using the log files, the total makespan, Map worker makespan, and Reduce worker makespan of Hadoop and PMR are noted for each experiment. It must be noted that the initialization time of the VM cluster is not considered in computing the makespan, as it is uniform in PMR and Hadoop owing to similar cluster configurations. Individual task execution times of Map worker and Reduce worker nodes observed for each BLAST experiment executed on the Hadoop and PMR frameworks are graphically shown in Figure 4. Figure 4(a) represents results obtained for Hadoop and Figure 4(b) represents results obtained for PMR. The parallel execution strategy of PMR is visible in Figure 4(b); in other words, Reduce workers are initiated as soon as two or more Map worker nodes have completed their tasks. The execution time of Reduce worker nodes in PMR is marginally higher than that in Hadoop. Waiting for all Map worker nodes to complete their tasks is a primary reason for the marginal increase in Reduce worker execution times in PMR. Sequential processing of worker nodes in the Hadoop framework, i.e., Map workers first and then Reduce worker execution, is evident from Figure 4(a).
The total makespan of PMR and Hadoop is dependent on the task execution time of worker nodes during the Map phase and Reduce phase. The total makespan observed in BLAST sequence alignment experiments executed on the Hadoop and PMR frameworks is shown in Figure 5.

CAP3.
DNA sequence assembly tools are used in bioinformatics for gene discovery and for understanding genomes of existing/new organisms. CAP3 is one such popular tool used to assemble DNA sequences. DNA assembly is achieved by performing merging and aligning operations on smaller sequence fragments to build complete genome sequences. CAP3 eliminates poor sections observed within DNA fragments, computes overlaps amongst DNA fragments, identifies and eliminates false overlaps, accumulates fragments of one or multiple overlapping DNA segments to produce contigs, and performs multiple sequence alignments to produce consensus sequences. CAP3 reads multiple gene sequences from an input FASTA file and generates output consensus sequences, which are written to multiple files and also to standard output.
The CAP3 gene sequence assembly working principle consists of the following key stages. Firstly, the poor regions at the 3′ (three-prime) and 5′ (five-prime) ends of each read are identified and eliminated, and false overlaps are identified and eliminated. Secondly, to form contigs, reads are combined based on overlapping scores in descending order. Further, forward-reverse constraints are adopted to incorporate modifications to the contigs constructed. Lastly, multiple sequence alignments of reads are constructed per contig, resulting in consensus sequences characterized by a quality value for each base. Quality values of consensus sequences are used in the construction of multiple sequence alignments and also in the computation of overlaps. The operational steps of the CAP3 assembly model are shown in Figure 7. A detailed explanation of CAP3 gene sequence assembly is provided in [30].
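The contig forming stage, in which reads are combined in descending order of overlap, can be illustrated with a toy greedy merger in Python. This is not CAP3: it uses exact suffix/prefix overlaps only and omits quality trimming, false overlap removal, forward-reverse constraints, and consensus building. The reads and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest exact overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break                # no overlaps left: contigs stay separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [s for k, s in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


contigs = greedy_assemble(["ACGTAC", "TACGGA", "GGATTC"])
```

The three toy reads overlap pairwise by three bases and merge into the single contig ACGTACGGATTC. The quadratic pair search is acceptable for an illustration but would not scale to real read sets.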
In the experiments conducted, CAP3 gene sequence assembly is directly adopted in the Map phase of PMR and Hadoop. In the Reduce phase, result aggregation is considered. For performance evaluation of CAP3 execution on the PMR and Hadoop frameworks, Homo sapiens chromosome 15 is considered as a reference. Genome sequences of various sizes are considered as queries and submitted to the Azure cloud platform in the experiments. Query sequences for the experiments are considered in accordance with [30]. The CAP3 experiments conducted with query genomic sequences (BAC datasets) are summarized in Table 2. All four experiments are conducted using the CAP3 algorithm on the Hadoop and PMR frameworks. Observations are retrieved from a set of log files generated during Map and Reduce phase execution on Hadoop and PMR. Using the log files, the total makespan, Map worker makespan, and Reduce worker makespan of Hadoop and PMR are noted for each experiment. It must be noted that the initialization time of the VM cluster is not considered in computing the makespan, as it is uniform in PMR and Hadoop owing to similar cluster configurations.
Task execution times of Map and Reduce worker nodes observed for CAP3 experiments conducted on the Hadoop and PMR frameworks are shown in Figure 8. In PMR, the Reduce workers are initiated as soon as two or more Map worker nodes have completed their tasks, which is visible from Figure 8(b). The sequential processing strategy (i.e., Map workers first and then Reduce workers) of worker nodes in the Hadoop framework is evident from Figure 8(a). Execution time of Reduce worker nodes in PMR is marginally higher, by about 15.42%, than that of Hadoop. Waiting for all Map worker nodes to complete their tasks is a primary reason for the marginal increase in Reduce worker execution times in PMR. The total makespan observed in CAP3 experiments executed on the Hadoop and PMR frameworks is presented in Figure 9. Superior performance in terms of Reduce makespan times of PMR is evident when compared to Hadoop. Though a marginal increase in Reduce worker execution time is reported, the overall execution time, i.e., the total makespan of PMR, is less than that of Hadoop. A reduction of 18.97%, 20%, 15.03%, and 18.01% is reported for the four experiments executed on the PMR framework when compared to similar experiments executed on the Hadoop framework. Average reduction of the total makespan across all experiments is 18%, demonstrating superior performance of PMR when compared to the Hadoop framework. Makespan time for experiment 2 is greater than for the other experiments, as the number of differences considered in CAP3 is 17, larger than the values considered in the other experiments. A similar nature of execution times is reported in [29], validating the CAP3 execution experiments presented here.
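The average reduction quoted above follows directly from the four per-experiment reductions. The short check below reproduces that arithmetic. The helper `makespan_reduction` is a hypothetical name introduced here, and the two makespan values passed to it are illustrative placeholders, not measurements from the experiments.

```python
def makespan_reduction(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Per-experiment total makespan reductions reported for CAP3 (percent).
reductions = [18.97, 20.00, 15.03, 18.01]
average = sum(reductions) / len(reductions)
print(f"average reduction: {average:.2f}%")

# Illustrative placeholder makespans (not measured values): a run that
# takes 100 time units on the baseline and 82 on the improved framework.
print(f"example reduction: {makespan_reduction(100, 82):.2f}%")
```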
Theoretical makespan of PMR for all four CAP3 experiments is computed using (18). Comparison between theoretical and experimental makespan values is presented in Figure 10. Only minor differences are reported between practical and theoretical makespan computations, supporting the correctness of the makespan modeling presented. The results presented in this section show that CAP3 sequence assembly execution on the PMR cloud framework developed exhibits superior performance when compared to similar CAP3 experiments executed on the existing Hadoop cloud framework.

DeepBind Analysis to Identify Binding Sites.
In recent times, deep learning techniques have been extensively used for various applications. Deep learning techniques are adopted predominantly when large amounts of data are to be processed or analyzed. To meet the large computing needs of deep learning techniques, GPUs are used. Motivated by this, we consider the execution of the recent state-of-the-art "DeepBind" biomedical application on a cloud platform. To the best of our knowledge, no such attempt to consider cloud platforms for DeepBind execution has been reported.
Biomedical operations such as alternative splicing, transcription, and gene regulation are dependent on DNA- and RNA-binding proteins. DNA- and RNA-binding proteins described using sequence specificities are critical in identifying diseases and deriving models of regulatory processes that occur in biological systems. Position weight matrices are used in characterizing the specificities of a protein. Binding sites on genomic sequences are identified by scanning position weight matrices over the genomic sequences considered. DeepBind is used to predict sequence specificities. DeepBind adopts deep convolutional neural networks to achieve accurate prediction. Comprehensive details and the sequence specificity prediction accuracy of the DeepBind application are available in [27].
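The classical position weight matrix scan mentioned above can be sketched in a few lines. This illustrates only the scanning idea, not DeepBind, which replaces hand-built matrices with learned convolutional filters. The function names `log_odds_pwm` and `scan_pwm`, the uniform background of 0.25 per base, and the toy motif probabilities are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pwm(probabilities, background=0.25):
    """Convert per-position base probabilities into log-odds weights."""
    return [{b: math.log2(col[b] / background) for b in BASES}
            for col in probabilities]


def scan_pwm(sequence, pwm):
    """Slide the matrix over the sequence; return (offset, score) pairs."""
    width = len(pwm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        scores.append((start, score))
    return scores


def column(consensus):
    """Toy column: 0.7 for the consensus base, 0.1 for each other base."""
    return {b: (0.7 if b == consensus else 0.1) for b in BASES}


# Toy motif "TATA" (hypothetical probabilities, not a curated matrix).
pwm = log_odds_pwm([column(b) for b in "TATA"])
scores = scan_pwm("GGTATAGG", pwm)
best_offset, best_score = max(scores, key=lambda item: item[1])
print(best_offset, round(best_score, 3))
```

Offsets whose score exceeds a chosen threshold would be reported as candidate binding sites; the threshold is application dependent and is left out of this sketch.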
DeepBind is developed using a two-phase approach: a training phase and a testing phase. Training phase execution is carried out using Map workers in the Hadoop and PMR frameworks. The trained weights are stored in the cloud memory for further processing. The testing phase of DeepBind is carried out at the Reduce stage in the Hadoop and PMR frameworks. The execution strategy of the DeepBind algorithm on the PMR framework is shown in Figure 11. The DeepBind application is developed using the code provided in [27]. For performance evaluation on Hadoop and PMR, only the testing phase is discussed (i.e., Reduce only mode). A custom cloud cluster of one master node and six worker nodes is deployed for DeepBind performance evaluation. A similar cloud cluster for the Hadoop framework is considered. The experiment conducted to evaluate the performance of DeepBind on the Hadoop and PMR frameworks considers a set of six disease-causing genomic variants obtained from [27]. The disease-causing genomic variants to be analyzed using DeepBind are summarized in Table 3. DeepBind analysis is executed on the Hadoop and PMR frameworks deployed on a custom cloud cluster. Log data generated is stored and used in further analysis.
The results obtained to demonstrate the performance of the six worker cluster nodes of Hadoop and PMR during Map and Reduce phase execution are shown in Figure 12. Performance is presented in terms of task execution times observed per worker node. For Hadoop, the execution time of each worker node during the Map and Reduce phases is shown in Figure 12(a). The execution time observed for each PMR worker node during the Map and Reduce phases is shown in Figure 12(b). In the Map phase execution, the genomic variants to be analyzed are obtained from the cloud storage and are accumulated based on their identities defined in [27]. The query sequences of disease-causing genomic variants to be analyzed are split for parallelization. In the Reduce phase, the split query sequences are analyzed and the results obtained are accumulated and stored in the cloud storage. Map workers in PMR exhibit better performance, and an average execution time reduction of 48.57% is reported when compared to Hadoop Map worker nodes. Execution time of the six Reduce worker nodes in Hadoop and PMR is greater than that of the Map workers, as DeepBind analysis and identification of potential binding sites are carried out during this phase. The parallel execution strategy of Reduce worker nodes is clear from Figure 12(b). The Reduce phase in PMR commences after 5 seconds, once Map worker node 1 (MW1) and Map worker node 5 (MW5) have completed their tasks. In Hadoop, which adopts a sequential approach, the Reduce phase is initiated after all Map worker nodes have completed their tasks. Parallel execution of DeepBind analysis utilizing all 4 computing cores available with Reduce worker nodes, together with parallel initiation of the Reduce phase in PMR, enables an average Reduce execution time reduction of 22.22% when compared to Hadoop Reduce worker nodes. The total makespan observed for DeepBind experiment execution on the Hadoop and PMR cloud computing platforms is shown in Figure 13.
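The effect of starting the Reduce phase before every Map worker has finished can be illustrated with a toy timeline model. This is not the makespan model of (18); it assumes a single Reduce worker that spends a fixed `chunk_cost` on the output of each Map worker and can process an output only after that Map worker has finished. The function names, the `start_after` parameter, and the Map finish times are hypothetical values invented for illustration.

```python
def sequential_makespan(map_finish, chunk_cost):
    """Reduce starts only after every Map worker has finished."""
    return max(map_finish) + chunk_cost * len(map_finish)


def overlapped_makespan(map_finish, chunk_cost, start_after=2):
    """Reduce starts once `start_after` Map workers are done, then
    consumes each Map output as soon as it becomes available."""
    finish = sorted(map_finish)
    reduce_start = finish[start_after - 1]
    clock = reduce_start
    for ready in finish:
        # Wait for this Map output if it is not ready yet, then process it.
        clock = max(clock, ready) + chunk_cost
    return clock, clock - reduce_start


# Invented Map finish times (arbitrary time units) for six Map workers.
map_finish = [4, 6, 8, 10, 12, 14]
chunk_cost = 1

seq_total = sequential_makespan(map_finish, chunk_cost)
par_total, par_reduce_span = overlapped_makespan(map_finish, chunk_cost)
print("sequential total makespan:", seq_total)
print("overlapped total makespan:", par_total)
print("overlapped Reduce span:", par_reduce_span)
```

In this toy setting the overlapped Reduce worker is active for longer than in the sequential case, because it idles while waiting for late Map outputs, yet the total makespan is shorter. This mirrors the qualitative behaviour reported in the experiments, without reproducing their numbers.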
A total makespan reduction of 34.62% is achieved using the PMR framework when compared to the Hadoop framework. Analysis results similar to [27] are reported for DeepBind analysis on the Hadoop and PMR frameworks. The theoretical makespan computed using (18) for PMR is compared with the practical value observed in the experiment, and the results obtained are shown in Figure 14. A minor variation between theoretical and practical values is observed. The variation observed is predominantly due to application-dependent multiple cloud memory access operations. Based on the results obtained for DeepBind analysis, it is evident that performance on the PMR framework is superior to that on the existing Hadoop framework.
On the basis of the biomedical applications considered for performance evaluation and the results obtained, it is evident that the proposed PMR framework exhibits superior performance when compared to its existing Hadoop counterpart. In BLAST, both the Map and Reduce phases are utilized. In the CAP3 application, the Map phase plays a predominant role. In the DeepBind application, analysis is carried out in the Reduce phase. The proposed PMR cloud computing framework is robust and is capable of handling the diverse biomedical application scenarios presented, with PMR deployed on public and custom cloud platforms. In addition, PMR exhibits low execution times and enables effective cloud resource utilization. Low execution times enable cost reduction, which is always a desired feature.

Conclusion and Future Work
The significance of cloud computing platforms is discussed. The working of the commonly adopted Hadoop MapReduce framework is presented along with its drawbacks. To lower execution times and enable effective utilization of cloud resources, this paper proposes the PMR cloud computing platform. A parallel execution strategy of the Map and Reduce phases is considered in the PMR framework. The Map and Reduce functions of PMR are designed to utilize the multicore environments available with worker nodes. The paper presents the proposed PMR framework architecture along with makespan modeling. Performance of the PMR cloud computing framework is compared with the Hadoop framework. For performance evaluation, computationally heavy biomedical applications, namely BLAST, CAP3, and DeepBind, are considered. Average overall makespan reductions of 38.92%, 18.00%, and 34.62% are achieved using the PMR framework when compared to the Hadoop framework for the BLAST, CAP3, and DeepBind applications, respectively. The experiments presented demonstrate the robustness of the PMR platform, its capability to handle diverse applications, and its ease of deployment on public and private cloud platforms. The results of the experiments conducted demonstrate the superior performance of PMR against the Hadoop framework. Good matching is reported between the theoretical makespan of the PMR framework presented and the experimental values observed. In addition, adopting the PMR cloud computing framework also enables cost reduction and efficient utilization of cloud resources.
A performance study considering cloud clusters with many nodes, additional applications, and security provisioning for the PMR cloud computing framework is considered as the future work of this paper.

Data Availability
The data is available at the National Center for Biotechnology Information. (2015). [Online]. Available: http://www.ncbi.nlm.nih.gov/