The cloud computing paradigm has brought the benefits of utility computing to a global scale and has gained paramount attention in recent years. Companies are seriously considering adopting this new paradigm, expecting to receive significant benefits. In fact, cloud computing is not a revolution in terms of technology; it has been established on the solid ground of virtualization, distributed systems, and web services. To comprehend cloud computing, its foundations and technological landscape need to be adequately understood. This paper provides a comprehensive review of the building blocks of cloud computing and the relevant technological aspects, focusing on four key areas: architecture, virtualization, data management, and security.
Cloud computing technology has attracted significant attention from both academia and industry in recent years. It is perceived as a shift in the computing paradigm in which computing services are offered and acquired on demand over a global-scale network [
Cloud computing has the potential to transform the IT industry and change the way enterprise IT is operated [
However, cloud services and the security concerns inherited from their underlying technologies might negatively impact an enterprise if they are not properly managed. Technical and security risks identified in this context include (a) data lock-in and system lock-in; (b) unreliable system performance due to many uncontrollable factors such as network traffic, load balancing, and context switching cost; (c) decreased performance due to virtualization; (d) complex integration between legacy and cloud-based systems; (e) incompatibility of user behaviors and enterprise processes with a new version of cloud software, as software upgrades are controlled by the provider; (f) information leakage in a multitenant model; (g) data interception during transfer over public networks; and (h) security breaches in the virtualization monitoring layer [
One of the core aspects of cloud computing is that it hides IT complexity under its service offering models. While it is evident that the components in the building blocks of cloud services affect service characteristics to a certain degree, it is less evident how those impacts differ across configurations. This knowledge is essential for service consumers. Software developers working on cloud platforms need it to apply appropriate software designs and properly configure their deployment environments, in order to ensure certain characteristics of the resulting software. Enterprise consumers need it during service level agreement (SLA) negotiation and to determine the line of responsibility. End users also need it to adjust their usage behavior.
This paper aims to provide a better understanding of cloud computing technology and its associated foundations. This knowledge serves as a basis for in-depth analysis and assessment of cloud services. For software developers, the paper adds new aspects to consider when developing software in-the-cloud and for-the-cloud. For researchers, it identifies the landscape of the underlying technology of cloud computing, especially virtualization, data management, and security.
The remainder of the paper is organized into five sections. In the next section, we explain what cloud computing is and what it is not through reviewing various definitions, service models, deployment models, and relevant concepts. An in-depth comparison between grid and cloud computing is also presented in this section. Section
The semantics of cloud computing are identified by its definition, service models, and deployment models.
The standard definition of “cloud computing” has yet to reach consensus [
Foster et al. provide a definition of cloud computing as they compare cloud with grid computing. According to their definition [
Vaquero et al. identify more than 20 existing definitions of cloud computing and propose their own. As perceived at the end of 2008 [
The most cited definition of cloud computing is the one proposed by the US National Institute of Standards and Technology (NIST). NIST provides the following definition [
These definitions reveal three characteristics of clouds. First, cloud services are massively scalable, and their acquisition and release can be done dynamically with minimal operational support. Second, the cost is charged on a usage basis, and the quality of service is guaranteed by the provider based on a service level agreement. Last, the quality of cloud services, such as security and performance, relies primarily on the availability of the Internet and on how the underlying resources are managed and distributed to clients.
A service model determines the types of computer resources offered to consumers. Three main types of cloud services are infrastructure (IaaS), platform (PaaS), and software (SaaS). However, new service models are continuously emerging.
An IaaS provider offers a virtual infrastructure in which computing resources, including processing units, storage, and network, can be provisioned in order to set up a deployment environment for a software system. Customers have the flexibility to manage and control the deployed software stack, ranging from the operating system through middleware to applications. Examples of IaaS are Amazon Elastic Compute Cloud (EC2) (
PaaS provides customers with the capability to develop and deploy applications based on tools and programming languages supported by the providers. This hosted platform is configurable in a limited manner based on a provided set of APIs. Examples of this class of services include Google AppEngine (
SaaS provides the capability to use applications that run on cloud infrastructure. These applications are accessible through standard interfaces such as a web browser or an email client. SaaS offers the experience of working with applications and data from anywhere, at any time, using various forms of devices. Examples of widely used SaaS are Facebook, Gmail, and OfficeLive (
HuaaS relies on information aggregation techniques to extract meaningful information or prediction from massive-scale data [
The computing paradigm is moving toward an XaaS concept in which everything can be acquired as a service. Cloud computing and the aforementioned service models are in support of XaaS [
These examples are fairly well-known cloud services. OpenCrowd taxonomy [
Different deployment models are designed to support a variation of consumers’ privacy requirements for cloud adoption. NIST defines cloud deployment models as public, private, community, and hybrid [
Cloud computing introduces a shift in a computing paradigm. Voas and Zhang in their article [
The concept of cloud computing is not a revolution. In fact, it overlaps with many concepts and technologies, such as grid computing, utility computing, and virtualization. This subsection gives an overview of these concepts and points out the common angles each of them shares with cloud computing.
Grid computing is a distributed computing paradigm that enables resource sharing among multiple virtual organizations in order to solve a common computational problem [
Utility computing presents a model that enables the sharing of computing infrastructure in which resources from a shared pool are distributed to clients upon request, and the cost of acquiring services is charged based on usage [
Virtualization is an abstraction of a computing system that provides interfaces to hardware, including a processing unit and its registers, storage, and I/O devices [
Architectural patterns are used to create designs that are standardized, well understood, and predictable. These patterns are proven solutions to recurring common problems in software design [
Varieties of architectures could be derived from SOA. Erl, in his book “SOA design patterns” [
SOA is considered a basis for cloud computing, as it provides a software architecture that addresses many quality attributes required for cloud services, such as component composability, reusability, and scalability. The concept of SOA is leveraged to construct extensible cloud solution architectures, standard interfaces, and reusable components [
Grid and cloud computing have been established on a common ground of distributed and parallel computing, targeting the common goal of resource sharing. Both technologies offer a similar set of advantages, such as the flexibility of acquiring additional resources on demand and optimizing infrastructure usage. However, they differ, especially from the point of view of resource management. This section provides such comparisons based on the work of Foster et al. presented in [
A business model captures the flow in which services are created and delivered to clients, in order to generate income for the service provider and to serve clients’ requirements. In cloud computing, the role of providers is to offer computing services as a basic utility. Clouds generally offer a fixed or on-demand pricing model. In either case, clients benefit from a reduction in upfront IT investment and from flexibility in scaling their applications. In grid environments, the incentive to join the community is access to additional computing utilities, at the price of sharing one’s own resources. The whole community benefits from the optimization.
In terms of infrastructure, a cloud is designed to serve as an Internet-scale pool of resources. The whole infrastructure is managed by a single provider; thus, every unit of resource conforms to a common governance model. In contrast, grids have been developed to integrate distributed, dynamic, and heterogeneous resources. A set of open standard protocols and facilities is established to allow those resources to interoperate. The differences are reflected in their reference models (Figure
An architecture comparison: (a) grid model that provides connectivity to heterogeneous resources and (b) cloud model that manages a pool of resources by means of virtualization [
Resource management targets the mechanisms to control resource pooling, in order to achieve effective resource allocation and a satisfactory level of service quality. It covers four main areas: computing model, data model and locality, virtualization, and monitoring.
A computing model concerns how resources are distributed for computational work. Cloud resources are distributed in units of virtual machines. A new instance of a virtual machine is created and placed at a physical location unknown to clients. The placement algorithm is customized to maintain a balance of platform utilization, relevant costs, and guaranteed quality of service. In contrast, grids use a queuing system to manage jobs and resource allocation. A job stays in the queue until the required amount of resources is available. Once allocated, the resources are dedicated only to that job. Due to this scheduling policy, interactive applications that require short latency cannot operate natively on the grid.
In both grid and cloud, data are distributed and replicated over a number of nodes to minimize the cost of communication between data and processors. Clouds use a MapReduce framework to exploit data locality. MapReduce runs on top of a file system in which data files are partitioned into chunks and replicated on many nodes. When a file needs to be processed, the storage service schedules a processor at the node hosting each chunk of the data to process the job. In grids, however, data locality cannot be easily exploited, as resources are allocated based on availability. One technique to tackle this issue is to consider data locality information when a processor is scheduled for computation. This approach is implemented in a data-aware scheduler.
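As an illustration, the locality-aware scheduling described above can be sketched in a few lines of Python. The cluster layout, chunk placement, and word-count job below are entirely hypothetical; production frameworks such as Hadoop implement the same idea at far greater scale and with fault tolerance.

```python
from collections import Counter, defaultdict

# Hypothetical cluster: each chunk of a file is replicated on several nodes.
chunk_locations = {
    "chunk0": ["node1", "node2"],
    "chunk1": ["node2", "node3"],
    "chunk2": ["node1", "node3"],
}
chunk_data = {
    "chunk0": "cloud grid cloud",
    "chunk1": "grid cloud virtualization",
    "chunk2": "cloud virtualization",
}

def schedule(chunk_locations, load):
    """Place each map task on the least-loaded node that already hosts the chunk."""
    placement = {}
    for chunk, nodes in chunk_locations.items():
        node = min(nodes, key=lambda n: load[n])
        placement[chunk] = node          # data locality: computation moves to the data
        load[node] += 1
    return placement

def map_phase(placement):
    # Each node counts the words in its local chunk (the "map" step).
    return [Counter(chunk_data[chunk].split()) for chunk in placement]

def reduce_phase(partials):
    # Merge the partial counts (the "reduce" step).
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)

load = defaultdict(int)
placement = schedule(chunk_locations, load)
result = reduce_phase(map_phase(placement))
print(result)   # {'cloud': 4, 'grid': 2, 'virtualization': 2}
```

Note that every map task runs on a node that already holds a replica of its chunk, so no chunk has to cross the network before processing, which is exactly the saving MapReduce exploits.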
Virtualization is a mechanism to provide abstraction to resources in the fabric layer, allowing administrative work (e.g., configuring, monitoring) to be performed more effectively in the cloud. Grid does not rely on virtualization as much as cloud does. This is due to the scale and the fact that each organization in grid community has ultimate control over their resources.
In a cloud environment, a client’s capability for monitoring is restricted by the type of service employed. A model that provides infrastructure-as-a-service (IaaS) gives clients more flexibility to monitor and configure lower-level resources and middleware. Monitoring inside a grid environment can be done in a more straightforward manner through the user’s credential, which defines the user’s right to access resources at different grid sites.
On top of Google’s MapReduce, a number of programming models have been created to facilitate the development of distributed and parallel programs [
In brief, cloud and grid share similarities in terms of their goal and the underlying technologies that serve as their building blocks. They differ from the point of view of resource management. These differences are caused by the facts that (1) clouds and grids are built upon resources of a different nature and (2) clouds are operated for a larger target group and serve a wider range of applications, which put emphasis on different aspects of service quality.
The most respected definition of the cloud is the one given by NIST. Clouds embrace the following characteristics: (a) they provide on-demand computing capability accessible through the Internet; (b) the computing capability can be provisioned and scaled with minimal operational effort; (c) resource usage is metered and charged accordingly; (d) the provider’s resources are pooled to serve multiple clients; (e) they inherit the benefits and risks of IT outsourcing. In terms of computing paradigm, clouds are considered an evolution of grid computing. Both technologies share the common goal of optimizing resource usage and offer a similar set of advantages. However, clouds and grids differ significantly from the perspective of resource management.
Cloud computing architecture is partly represented by the service and deployment models described in the previous section. A complete architecture must capture the relationships and dependencies among the relevant entities and activities in that environment.
A reference model is an abstract view of an environment of interest. It presents the relationships and dependencies among entities in that environment, while abstracting away the standards, technologies, and implementations underneath. It is particularly useful for identifying an abstract solution to a given issue and for determining the scope of influence.
A report by the US National Institute of Standards and Technology (NIST) gathers cloud reference architecture models proposed by known organizations [
This subsection summarizes the main components of the reference models listed in the NIST report and of models proposed in the literature.
DMTF (
IBM’s view on cloud management architecture combines actor’s roles, services, virtualized infrastructure, and provider management platforms [
CSA (
Instead of focusing on a service model representation, Cisco explicitly puts security architecture and service orchestration into the frame [
OSA (
FCCI (
SNIA (
The authors propose a cloud ontology based on the composability of services [
The authors present a cloud computing open architecture (CCOA) aiming to assist strategic planning and consultancy for cloud computing services [
Table
Cloud computing reference models.
Name | Objective | Key components | References |
---|---|---|---|
Distributed management task force | To achieve interoperability | Actors, service interfaces, and profile | [ |
IBM | General purpose | Actors and roles, cloud services, and management activities to support business-related services and technical-related services | [ |
Cloud security alliance | Security assessment | Stack model, cloud services | [ |
Cisco | General purpose | Stack model for service composition | [ |
Open security architecture | Security assessment | Actors, flow of traffic and information in the cloud, security policy implemented by each actors, and servers to secure cloud operations | [ |
Federal cloud computing initiative | Standard for government clouds | Stack model representing cloud core capabilities and their associated management domain, actors, and cloud services | [ |
Cloud data management interfaces | Standard interfaces for cloud storage | Interfaces to data storage and associated metadata | [ |
Cloud ontology | General purpose | Stack model representing basic cloud resources and services | [ |
Cloud computing open architecture | Open standard and cloud ecosystem | Stack model integrating cloud virtual resources, common reusable services, core services, offerings, unified architecture interfaces, quality and governance, and ecosystem management | [ |
The existence of multiple cloud computing architectures, even though they serve different purposes, reflects a lack of standardization and interoperability in this field. In fact, the views of the cloud represented by the models do not conflict; rather, they reflect cloud environments at different levels of abstraction and focus on different aspects. Nevertheless, a uniform model would enhance collaboration among stakeholders and help prevent the vendor lock-in problem.
The objective of the first focus area is to understand the relationships and dependencies among basic cloud components, actors, and management activities. A multilayer stack model allows us to put in place the technology associated with each layer, without interference from management activities. Figure
Cloud computing stacked models.
4-layer model [
5-layer model [
7-layer model [
Model (a) is introduced by Foster et al. to identify the differences between grid and cloud computing [
The bottom layer consists of physical computing resources, storage, network devices, data centers, and a means to provide access to physical resources from other networks. CSA separates the hardware and facility layers to identify the different kinds of security concerns associated with hardware and data centers. Relevant technologies in this layer include green data centers, distributed systems, cluster systems, and firewalls.
The abstraction layer provides a unified view of distributed and heterogeneous physical resources, generally by means of virtualization. The abstract infrastructure comprises views of servers (processor, memory, and node), storage, network, and other facilities. Relevant technologies include virtualization and the virtual machine monitor.
This layer provides the tools to perform basic cloud operations such as resource provisioning, orchestration, utilization, monitoring, and backup. It allows providers to manage load balancing, resource optimization, and security in multitenant environments. Relevant technologies include resource pooling, multitenancy, distributed storage, NoSQL, virtual machines, virtual networks, load balancing, cloud service buses, and MapReduce.
The API layer provides interfaces for consumers to access, manage, and control their provisioned resources. Relevant technologies include web services, virtual machines, virtual data centers, authentication and authorization mechanisms, multitenancy, and Infrastructure-as-a-Service.
This layer provides a customizable development environment on top of a virtualized platform for the development and deployment of cloud software. Relevant technologies include hardened pared-down operating system, development environment, deployment environment, and Platform-as-a-Service.
This layer offers web applications and services running on cloud infrastructure which are accessible through standard interfaces and devices. Relevant technologies include web services (e.g., WSDL, SOAP, and REST), web technology (e.g., HTML, CSS, JavaScript, DOM, AJAX, and mash-up), authentication and authorization (public-key cryptography), federated identity management (OpenID, OAuth), secured web browsers, data formats (e.g., XML, HTML, and JSON), and Software-as-a-Service.
Four types of cloud actors are defined in the reference models and literature. These include
Several models (e.g., IBM, CSA, NIST, and GSA) identify groups of management activities required to maintain cloud production environments. We derive five cloud management domains based on the management activities outlined in the reviewed reference models.
This domain is primarily used by providers to maintain the physical and virtualized cloud infrastructure. It covers basic operations including resource monitoring, optimization, load balancing, usage metering, and providing isolation in a multitenant environment.
The objective of this domain is to make cloud services and applications available to consumers. It allows services to be found, requested, acquired, managed, and tested.
This domain concerns technical-related services. Its responsibilities are threefold. First, it provides management of service instances, including deployment, configuration, testing, debugging, and performance monitoring. Second, it provides transparency and control over an isolated deployment environment; from a consumer perspective, especially for IaaS and PaaS, the provided information should be sufficient for SLA management, capacity planning, and the analysis of security concerns. Last, it provides management over resources, including provisioning, configuration management, backup, and recovery.
This domain concerns business-related services, for instance, invoicing, billing, and customer management.
Every layer of the cloud stack needs different security mechanisms. The service model determines which actor is responsible for maintaining the security of each layer. Generally, the security of the virtualized infrastructure (facility, hardware, abstraction, and connectivity) is managed by the provider. Consumers of IaaS have to manage the integration of the operating system with the virtualized infrastructure. Consumers of PaaS have to manage the configuration of deployment environments and application security. In SaaS, security concerns, including user authentication and authorization, are all handled by the provider.
Cloud computing is considered as a form of outsourcing where the ultimate management and control over acquired services are delegated to an external provider [
An SLA is defined in terms of quality of service, such as performance and availability. The dynamic nature of clouds, caused by virtualization, resource pooling, and the network, directly impacts service characteristics. The quality attributes relevant to cloud services are given in this subsection.
Scalability refers to the capability to scale computing resources, including processing units, memory, storage, and network, up or down in response to volatile resource requirements [
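A minimal sketch of how such elastic scaling is often driven in practice: a hypothetical threshold rule that adds capacity under heavy load and releases it when utilization drops. The thresholds and instance limits below are illustrative, not taken from any particular provider.

```python
def scale_decision(utilization, instances, low=0.3, high=0.7, min_i=1, max_i=10):
    """Hypothetical threshold autoscaler: one decision per monitoring interval.

    utilization: average resource utilization in [0, 1] over the interval.
    instances:   number of virtual machine instances currently provisioned.
    """
    if utilization > high and instances < max_i:
        return instances + 1   # scale up: acquire one more instance
    if utilization < low and instances > min_i:
        return instances - 1   # scale down: release a pay-per-use instance
    return instances           # within the comfort band: do nothing

# Example decisions under the default thresholds:
print(scale_decision(0.9, 2))   # 3  (overloaded, add an instance)
print(scale_decision(0.1, 3))   # 2  (idle, release an instance)
print(scale_decision(0.5, 3))   # 3  (within band, unchanged)
```

Real autoscalers add cooldown periods and step sizes to avoid oscillation, but the pay-per-use incentive to release idle resources is the same.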
Time behaviors (e.g., performance, response time) are critical for latency-sensitive applications and have a high impact on user experience. As cloud applications operate on a virtual distributed platform, time behaviors touch upon various areas, such as the quality of the network, virtualization, distributed storage, and the computing model. These factors cause unreliable time behavior for cloud services [
Security and trust issues are early challenges for the introduction of any new technology. As cloud infrastructure is built upon several core technologies, security relies on each of these components. Trust is built from positive experiences and the provider’s reputation. Common threats to cloud security include abuse of cloud services, insecure APIs, malicious insiders, shared technology vulnerabilities, data loss and leakage, and service hijacking [
Availability refers to the percentage of time that services are up and available for use. SLA contracts might use a stricter definition of availability by counting only uptime that meets the quality level specified in the SLA [
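An availability percentage maps directly to a downtime budget, which is often how SLA penalties are framed. The short sketch below computes the allowed downtime per month for some common availability levels; the fixed 30-day month is a simplifying assumption.

```python
def monthly_downtime_minutes(availability, minutes_per_month=30 * 24 * 60):
    """Downtime allowed per (assumed 30-day) month at a given availability level."""
    return (1 - availability) * minutes_per_month

for a in (0.99, 0.999, 0.9999):
    print(f"{a:.2%} availability -> {monthly_downtime_minutes(a):.1f} min/month")
# 99.00% availability -> 432.0 min/month
# 99.90% availability -> 43.2 min/month
# 99.99% availability -> 4.3 min/month
```

The jump from "two nines" to "four nines" shrinks the budget a hundredfold, which is why stricter SLA definitions of uptime matter so much in negotiation.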
Reliability is the capability of services to maintain a specified level of performance over time (adapted from ISO/IEC 9126 [ISO/IEC 9126-1]). Reliability is directly influenced by the number of existing faults, the degree of fault tolerance, and the recovery capability of services in case of failure. Cloud infrastructure is built upon clusters of commodity servers and is operated at Internet scale. Partial network failures and system malfunctions need to be taken as the norm, and such failures should not impact the availability of services.
Portability is an ability to move cloud artifacts from one provider to another [
Usability refers to the capability of services to be understood, learned, used, and found attractive by users (adapted from ISO/IEC 9126 [ISO/IEC 9126-1]). IaaS and PaaS providers should offer sufficient APIs to support resource provisioning, management, and monitoring activities. Usability is particularly important for SaaS to retain customers, due to the low cost of switching.
The capability of services to be customized to individual user preferences is important for services that serve Internet-scale users [
Reusability is the capability of software service components to serve in the construction of other software. Cloud computing amplifies the possibility of service reuse through broad Internet access.
The consistency characteristic is relevant for SaaS. If the data is consistent, all clients always get the same data regardless of which replicas they read from. However, it is costly to maintain strong consistency in distributed environments [
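One common way to reason about this trade-off is the quorum rule used in replicated storage systems: a read is guaranteed to observe the latest write when the read and write quorums overlap, i.e., R + W > N. The sketch below encodes just that condition; the replica count and quorum sizes are illustrative.

```python
N = 3  # number of replicas of each data item (illustrative)

def strongly_consistent(read_quorum, write_quorum, n=N):
    """Quorum rule: a read sees the latest write when R + W > N,
    because any read quorum then intersects any write quorum."""
    return read_quorum + write_quorum > n

# Larger quorums buy consistency at the cost of latency:
assert strongly_consistent(2, 2)        # R=2, W=2, N=3 -> overlap guaranteed
assert strongly_consistent(3, 1)        # read-all, write-one
assert not strongly_consistent(1, 1)    # R=1, W=1, N=3 -> stale reads possible
```

Systems that pick R = W = 1 gain low latency and high availability but accept that some replicas may serve stale data, which is exactly the cost of strong consistency the text refers to.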
In this section we reviewed existing cloud computing reference architectures proposed by different enterprises and researchers for different purposes. Some intend to initiate an open standard to enhance interoperability among providers; some aim to understand the dependencies among relevant components for effective security assessment; others are for general purposes. Three entities are generally presented in the reference models: a stack model, actors, and management domains. The most abstract model of the cloud consists of four layers: fabric, unified resource, platform, and application. One of the most descriptive models is proposed by CSA. It separates the fabric layer into facility and hardware sublayers; the unified resource layer into abstraction and core connectivity sublayers; the platform layer into infrastructure API and middleware sublayers; and the application layer into a number of application components. Cloud actors might take more than one role at a time; for instance, an enterprise could be a customer of IaaS provided by Amazon and develop SaaS for its own customers. The responsibilities of each actor are explicitly defined in the service level agreement. The management and operational activities that maintain cloud production environments can be grouped into five domains: physical resources and virtualization, service catalogues, operational supports, business supports, and security. Essential quality attributes relevant to cloud services are availability, security, scalability, portability, and performance. The cloud’s underlying technologies, such as virtualization, distributed storage, and web services, directly affect these quality attributes. Fault tolerance is taken as a mandatory requirement, as partial network failures and occasional crashes of commodity servers are common in systems of Internet scale. Usability is one of the most important characteristics for SaaS, due to the low cost of switching.
Cloud computing services offer different types of computing resources over the Internet as a utility service. A cloud service provider manages clusters of hardware resources and dynamically allocates these resources to consumers in the form of virtual machines. Consumers acquire and release these virtual resources according to the current workload of their applications. The provider ensures a secure compartment for each consumer’s environment, while trying to utilize the entire system at the lowest cost. Virtualization is the enabling technology behind this scenario.
Virtualization is an abstraction of a computer system which allows multiple guest systems to run on a single physical platform [
Virtualization can be done at the process or system level. The software for system virtualization is broadly known as a hypervisor or a virtual machine monitor (VMM). The term VMM is used in this document, but the two terms are semantically interchangeable.
Figure
A comparison of a computer system with and without virtualization [
Three immediate advantages of virtualization that could be derived from this model include: (1)
The evolution of virtualization technology is explained by Rosenblum and Garfinkel [
A reversal took place in the flourishing age of microcomputers and complex OSs. The drop in hardware cost led to a great variety of machines, while overly complex and large operating systems compromised reliability. The OS became so fragile and vulnerable that system administrators deployed one application per machine. Clearly, these machines were underutilized and incurred maintenance overheads. By 2005, virtualization had become a solution to this problem, used as a means to consolidate servers. Nowadays, virtualization is also used for security and reliability enhancements.
Understanding virtualization requires knowledge of computer architecture. An architecture can be seen as a formal specification of the interfaces at a given abstraction level. A set of interfaces provides control over the behavior of the resources implemented at that level, while the implementation complexity is hidden underneath. Smith and Nair describe computer architecture at three abstraction levels: the hardware level, the operating system level, and the operating system library level [
Interfaces of a computer system at different levels of abstraction [
The interfaces at the three abstraction levels of computer systems are defined as follows.
As the interface to hardware, the ISA is the set of low-level instructions that interact directly with hardware resources. It describes the specification of the instructions supported by a certain model of processor, including an instruction’s format, input, output, and the semantics of how the instruction is interpreted. While most of these instructions are visible only to the OS, some can be called directly by applications.
At a higher level, the ABI provides indirect access to hardware and I/O devices through OS system calls. The OS executes the necessary validation and performs the operation on behalf of the caller. In contrast to the ISA, the ABI is platform independent, as the OS handles the actual implementation on different platforms.
At the application level, functionality is provided to application programs in terms of libraries. The API is independent of the platform model and OS, given that the variations are implemented at the lower abstraction levels (ABI and ISA).
As mentioned, a virtualized system contains an additional software layer called the VMM or hypervisor. The main functionality of the VMM is to multiplex hardware resources to support multiple guest VMs and to maintain transparency. To achieve this, the VMM needs to handle the virtualization of the processing unit and registers, the memory space, and I/O devices [
There are several techniques that allow multiple guest systems to share the same processing units. We summarize the three main approaches to CPU virtualization that are most discussed in the literature. The first technique requires hardware support; the second relies on support from the operating system; the last uses a hybrid approach.
A CPU is virtualizable if it supports the VMM’s direct execution [
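The trap-and-emulate behavior behind direct execution can be modeled abstractly: unprivileged instructions run directly on the hardware, while privileged ones trap to the VMM, which emulates them against per-VM virtual state. The instruction names and VM state below are illustrative stand-ins, not any real ISA.

```python
# Toy model of trap-and-emulate CPU virtualization.
PRIVILEGED = {"set_timer", "write_cr3"}   # hypothetical privileged instructions

class VMM:
    def __init__(self):
        self.vm_state = {}                # per-VM virtual privileged state

    def run(self, vm, program):
        trace = []
        for instr, arg in program:
            if instr in PRIVILEGED:
                trace.append(self.trap(vm, instr, arg))   # trap into the VMM
            else:
                trace.append(f"direct:{instr}")           # direct execution
        return trace

    def trap(self, vm, instr, arg):
        # Emulate the privileged instruction against the VM's virtual state,
        # keeping the real hardware under exclusive VMM control.
        self.vm_state.setdefault(vm, {})[instr] = arg
        return f"emulated:{instr}"

vmm = VMM()
trace = vmm.run("vm0", [("add", None), ("set_timer", 100), ("load", None)])
print(trace)   # ['direct:add', 'emulated:set_timer', 'direct:load']
```

The key property is that the guest never touches real privileged state: most instructions pay no overhead, and only the rare privileged ones take the slow trap path.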
Figure
Privileged ring options for virtualizing CPU [
The paravirtualization technique was introduced by Denali [
This technique combines direct execution with on-the-fly binary translation. Typical application code running in the CPU’s unprivileged mode runs using direct execution, while nonvirtualizable privileged code runs under the control of the binary translator. The traditional binary translator was later improved for better performance: instead of translating line by line, it translates privileged code into an equivalent block in which the problematic instructions are replaced, and stores the translated block in a cache for future use.
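The block-based translation with caching can be sketched as follows. The instruction names and the `vmm_` replacement convention are hypothetical stand-ins for what a real translator emits.

```python
# Toy block-based binary translator: privileged instructions in a basic block
# are rewritten into safe calls into the VMM, and translated blocks are cached.
PRIVILEGED = {"popf", "write_cr3"}   # hypothetical nonvirtualizable instructions
cache = {}                           # translated-block cache, keyed by the block

def translate_block(block):
    key = tuple(block)
    if key in cache:                 # reuse: translate each block only once
        return cache[key]
    translated = [f"call vmm_{i}" if i in PRIVILEGED else i for i in block]
    cache[key] = translated
    return translated

block = ["mov", "popf", "add"]
print(translate_block(block))        # ['mov', 'call vmm_popf', 'add']
assert translate_block(block) is cache[tuple(block)]   # second call hits the cache
```

Unprivileged instructions pass through unchanged, so repeated execution of a hot block pays the translation cost only once, which is the performance improvement the text describes.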
A software technique for virtualizing memory is to have the VMM maintain a shadow version of a guest’s page table and force the CPU to use the shadow page table for address translation [
To keep the shadow page table valid, the VMM must track the guest’s page table and propagate corresponding changes to the shadow page table. Several mechanisms are designed to ensure the consistency of page tables [
One technique is to write-protect the physical memory of the guest’s page table. Any modification by the guest to add or remove a mapping thus generates a page fault exception, and control is transferred to the VMM, which emulates the operation and updates the shadow page table accordingly.
Another technique is the virtual TLB. It relies on the CPU’s page fault interrupts to maintain the validity of the shadow page table. The VMM allows new mappings to be added to the guest’s page table without any intervention. When the guest tries to access an address through such a mapping, a page fault interrupt is generated (as the mapping does not yet exist in the shadow page table). The interrupt allows the VMM to add the new mapping to the shadow page table. When a mapping is removed from the guest’s page table, the VMM intercepts the operation and removes the corresponding mapping from the shadow page table.
In addition, the VMM needs to distinguish general page faults from those caused by inconsistency of the shadow page table, which can result in significant VMM overhead.
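The virtual TLB approach can be sketched roughly as follows; all data structures and names here are hypothetical simplifications of real hardware page tables, with Python dictionaries standing in for page table entries.

```python
# Sketch of lazy shadow-page-table maintenance (virtual TLB style):
# the shadow table is filled on demand when a page fault traps into the VMM.

guest_pt = {}                  # guest virtual page -> guest "physical" page
shadow_pt = {}                 # guest virtual page -> host machine page
host_map = {0: 100, 1: 101}    # guest physical -> host machine page (fixed here)

def guest_add_mapping(vpage, gpage):
    # The VMM lets the guest update its own table without intervention.
    guest_pt[vpage] = gpage

def access(vpage):
    if vpage not in shadow_pt:          # hardware page fault traps into the VMM
        if vpage in guest_pt:           # valid in guest table: fill shadow entry
            shadow_pt[vpage] = host_map[guest_pt[vpage]]
        else:                           # a true guest page fault
            raise MemoryError("forward page fault to the guest OS")
    return shadow_pt[vpage]             # CPU translates via the shadow table

guest_add_mapping(7, 0)
assert access(7) == 100   # first access faults; the VMM fills the shadow entry
```

The final branch illustrates the overhead mentioned above: every fault must be classified as either a missing shadow entry (handled silently by the VMM) or a genuine guest fault (reflected back to the guest OS).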
The goal of virtualizing I/O devices is to allow multiple VMs to share a single host’s hardware. Challenges in this area include scalability, performance, and hardware independence [
Through device emulation, the VMM intercepts an I/O operation and performs it on the hardware devices [
Paravirtualization is introduced to reduce the limitations of device emulation. To achieve better performance, a guest operating system or its device drivers are modified to support VMM interfaces and to speed up I/O operations [
With direct access, the intermediate pass through the VMM is bypassed: VMs and I/O devices communicate directly through a separate channel processor, which eliminates much of the virtualization overhead. However, the advantage of hardware independence is lost, as VMs become tied to specific hardware.
When considering computing architecture, virtualization is generally implemented at two levels: at the application level (process virtualization) and at the bottom of software stack (system virtualization). Smith and Nair summarize the mechanism behind these basic types of virtualization [
This process is executed through scheduling and context switching. During a context switch, the OS switches in the CPU register values of the current process, so that it resumes from its previous processing state. The CPU is thus virtualized by the scheduling algorithm and context switching, while memory is virtualized by giving each process its own memory page table. Interactions between a process and virtual resources go through the ABI and API. In short, the goal of process virtualization is to allow multiple processes to run on the same hardware while getting a fair share of CPU time and being protected from interference, such as memory accesses, by other processes.
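The scheduling and context-switching mechanism can be mimicked as a toy model with Python generators, where each yield point stands in for a context switch that saves and restores processor state; this is an illustration of the idea only, not how an OS scheduler is implemented.

```python
# Toy round-robin scheduler: each "process" is a generator, and switching
# between generators plays the role of saving/restoring register state.
from collections import deque

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"       # each yield point == a context switch

ready = deque([worker("A", 2), worker("B", 2)])
trace = []
while ready:
    proc = ready.popleft()        # dispatch the next ready process
    try:
        trace.append(next(proc))  # run one time slice
        ready.append(proc)        # preempt: back to the end of the ready queue
    except StopIteration:
        pass                      # process terminated

# trace == ["A:0", "B:0", "A:1", "B:1"]: a fair share of "CPU time" each
```

The round-robin queue is what gives each process its fair share, and the generator’s saved position is the analogue of the register state restored on the next dispatch.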
In
As mentioned, the VMM provides system virtualization and facilitates the operations of the VMs running above it. Several types of VMM can be found in the literature. In general, they provide similar functionalities, but the implementation details underneath differ, for instance, in how I/O resources are shared or how ISA translations are performed. Different approaches to system virtualization are as follows.
A native VMM runs on bare hardware and provides device drivers to support the multiple VMs placed on top of it. Guest VMs might use the same or a different ISA as the underlying platform. The VMM runs in the highest privileged mode and gains ultimate control over the resources. It needs to remain transparent to guest VMs while providing them with secured compartments. To achieve this, it intercepts and emulates guest privileged (kernel-related) instructions. Most guest applications can run unmodified under this architecture. Examples of traditional VMMs include XEN (
A traditional VMM may virtualize a complete ISA to support guest VMs that use a different ISA than the host platform. XEN uses paravirtualization, while VMware ESX uses binary translation. Other VMMs might use a combination or refinement of both techniques, or neither, depending on the hardware support, the collaboration from OS vendors, and the system requirements [
An alternative to the native VMM places the VMM on top of a host operating system, so that the hosted VMM can be installed like an application. An advantage of the hosted VMM is that it relies on components and services provided by the host OS and virtualizes them to support multiple guest VMs. For instance, in contrast to traditional VMMs, a hosted VMM uses the device drivers of the host OS, which results in a smaller and less complex VMM. However, a hosted VMM does not support guest VMs with a different ISA. Examples of this type of VMM are Oracle VirtualBox (
A codesigned VMM targets improved performance by compromising portability. It implements a proprietary ISA that might be completely new or an extension of an existing ISA. To achieve this performance, a binary translator translates guest instructions into an optimized sequence of host ISA instructions and caches the translations for future use. The codesigned VMM is placed in a hidden part of memory inaccessible to guest systems. Examples include Transmeta Crusoe and the IBM iSeries.
A microkernel is a thin layer over the hardware that provides basic system services, for instance, isolated address spaces to support multiple processes [
The benefits of virtualization could be derived from its usage model. Uhlig et al. identify three main use scenarios of virtualization and their respective benefits as follows.
The main benefits when virtualization is used for workload isolation (Figure
Three usage models of virtualization [
Isolation
Consolidation
Migration
In many cases virtualization is used for server consolidation (Figure
Figure
This section has given a review of virtualization technology, especially system-level virtualization, which is used to manage resource pools in cloud computing environments. Virtualization has been used since the 1960s in mainframe environments for system isolation purposes. Before the introduction of cloud computing, virtualization provided an efficient solution for consolidating underutilized servers. The benefits of server consolidation, including resource optimization, reduced energy consumption, and lower maintenance costs, drove its wide adoption. System virtualization software, known as a virtual machine monitor (VMM), allows multiple guest systems to share the same hardware system. To enable this sharing, three main hardware elements, that is, CPU, memory, and I/O devices, are virtualized. Depending on the architecture and the support from operating system vendors, CPU virtualization can be done in several ways. Its goal is to multiplex CPU cycles while remaining invisible to guest systems. Direct execution requires the VMM to run at a higher privilege level than the guest’s OS and applications, in order to trap guest instructions and emulate CPU responses when necessary. Paravirtualization ports an OS directly to a specific CPU architecture, resulting in better performance. Direct execution can be combined with line-based or block-based binary translation to remove the dependency on OS vendors. I/O virtualization appears to be the most problematic area, causing significant performance overhead.
The goal of this section is to explore cloud storage solutions and to understand the impacts of their design decisions on system characteristics. Scalable storages known as NoSQL are becoming popular, as they solve the performance problems of relational databases when dealing with big data. The differences between NoSQL and relational databases, including their implementation details, are analyzed in this section.
For several decades traditional ways of storing data persistently have been through relational databases or file systems [
Relational database is a common term for relational database management systems (RDBMS). An RDBMS represents a collection of relations, together with mechanisms that force a database to conform to a set of policies such as constraints, keys, and indices. A relation (table) consists of a number of tuples (rows) that share the same set of attributes (columns). In other words, a relation represents a class, and tuples represent a set of objects that belong to the same class. The relation is defined by a data schema, which describes a name and a data type for each attribute.
An RDBMS provides operations to define and adjust the schema of a table and enforces that data stored in a relation strictly conform to the schema. Relations can be modified through
RDBMS was designed primarily for business data processing [
However, strong data consistency and complex queries supported by RDBMS cause several limitations in terms of scalability, performance, and inflexibility of data schema.
As an alternative to RDBMS, NoSQL databases offer a scalable distributed data tier for large-scale data management [
There are two categories of NoSQL databases [
Data are stored as an array of entries, where each entry is identified by a unique key. Common operations are deleting, modifying, or reading the entry for a given key, and inserting a new key with associated data. Key-value stores are implemented as distributed hash tables. Examples include Amazon Dynamo and MemcacheDB.
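This interface can be sketched minimally as follows, with entries partitioned across hypothetical nodes by a hash of the key, as a distributed hash table would; the node count and shard layout are invented for illustration.

```python
# Minimal key-value store sketch: entries are partitioned over nodes by a
# hash of the key, in the spirit of a distributed hash table.

NODES = 4
shards = [dict() for _ in range(NODES)]   # one dict stands in for each node

def shard_for(key):
    return shards[hash(key) % NODES]      # pick the node owning this key

def put(key, value):
    shard_for(key)[key] = value           # insert or modify an entry

def get(key):
    return shard_for(key).get(key)        # read the entry for a given key

def delete(key):
    shard_for(key).pop(key, None)         # delete the entry for a given key

put("user:42", {"name": "Ada"})
assert get("user:42") == {"name": "Ada"}
delete("user:42")
assert get("user:42") is None
```

Because every operation addresses exactly one key, each request touches exactly one shard; this single-key access pattern is what lets key-value stores scale out so easily compared to relational queries that span tables.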
Document-oriented storages represent loosely structured data stores in which no predefined schema or constraints limit databases to conform to a set of requirements. Records can contain any number of attributes, and attributes can be added, removed, and modified flexibly without interrupting ongoing operations. Examples include MongoDB and CouchDB.
NoSQL databases differ significantly at the implementation level, for instance, in their data models, update propagation mechanisms, and consistency schemes. However, several characteristics and features are common to NoSQL systems [
This section covers the concept of database transactions, which is guaranteed through the ACID properties, and the mechanisms that support ACID in distributed systems.
ACID is a fundamental principle of database systems [
Based on the ACID properties, a transaction can terminate in three ways. First, it successfully reaches its commit point, holding all properties true. Second, when bad input or a violation that prevents normal termination is detected, all operations that have been executed are reset. Finally, the transaction is aborted by the DBMS in the case of a session timeout or deadlock.
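The first two termination paths can be illustrated with SQLite, which wraps each update sequence in a transaction: a commit makes all executed operations durable, while a rollback after bad input resets everything executed so far. The table and the funds check are invented for the example.

```python
# Transaction termination sketch with sqlite3: commit on success,
# rollback when a violation is detected mid-transaction.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
db.commit()

def transfer(amount):
    try:
        db.execute("UPDATE account SET balance = balance - ? WHERE id = 1",
                   (amount,))
        if amount > 100:                  # bad input detected mid-transaction
            raise ValueError("insufficient funds")
        db.execute("UPDATE account SET balance = balance + ? WHERE id = 2",
                   (amount,))
        db.commit()                       # (1) normal termination: commit point
    except ValueError:
        db.rollback()                     # (2) executed operations are reset

transfer(150)                             # aborted: the debit is rolled back
assert db.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0] == 100
transfer(30)                              # commits: both updates become durable
assert db.execute("SELECT balance FROM account WHERE id = 2").fetchone()[0] == 30
```

Note that the failed transfer had already debited account 1 before the violation was detected; atomicity is exactly the guarantee that the rollback undoes that partial work.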
Web services are expected to provide strongly consistent data and to be highly available. To preserve consistency they need to behave in a transactional manner; that is, ACID properties are respected [
It is challenging in general to maintain the ACID properties for distributed storage systems, as the data are replicated over geographic distances. In practice, it is impossible to achieve all three desired properties at the same time [
To deal with CAP, the designer has an option of dropping one of three properties from system requirements or improving an architectural design. The scenarios of dropping one of the CAP are, for instance, running all components related to the services on one machine (i.e., dropping partition tolerance); waiting until data of every replicated node become consistent before continuing to provide the services (i.e., dropping availability); or accepting a notion of weaker consistency.
CAP implies that if we want a service to be highly available (minimal latency) and tolerant to network partitions (e.g., lost messages, hardware outages), then there will sometimes be cases in which the values of the data at different nodes are not consistent.
As mentioned, dealing with consistency across replicas is one of the challenges of distributed services. A variety of consistency models have been proposed to support applications that can tolerate different levels of relaxed consistency [
A number of cloud storage systems have emerged in recent years. The lack of a standard benchmark makes it difficult to understand the design tradeoffs and the quality of service each of them provides to support different workloads. To this end, Cooper et al. summarize the main tradeoffs providers face during architectural design, which impact the CAP properties of the system and of the applications relying on it [
Different types of applications (latency-sensitive applications at one end and throughput-oriented applications at the other) need different tradeoffs between optimizing for read and for write operations. Several design decisions exist for these operations. An
Updates can be written to disk before success is returned to the user, or they can be buffered for a group write. In cases where multiple updates can be merged into a single I/O operation, the group write results in lower latency and higher throughput. This advantage comes with the risk of losing recent updates when a server crashes.
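The tradeoff can be sketched as follows; the buffer size and log representation are invented for illustration, with each append to the log standing in for one disk I/O.

```python
# Sketch of group (buffered) writes: updates are batched and flushed in one
# I/O operation; a crash before the flush loses the buffered updates.

log = []        # simulates durable storage: one append == one disk write
buffer = []     # updates acknowledged to the user but not yet durable

def write(update, group=True):
    if not group:
        log.append([update])      # write-through: durable before returning
        return
    buffer.append(update)         # buffered: low latency, but not yet durable
    if len(buffer) >= 3:          # merge several updates into one I/O
        flush()

def flush():
    if buffer:
        log.append(list(buffer))  # one disk operation for the whole group
        buffer.clear()

for u in ["a", "b", "c", "d"]:
    write(u)
# log == [["a", "b", "c"]]; "d" is still buffered and would be lost on a crash
```

Four updates cost only one disk operation here instead of four, which is where the throughput gain comes from; the exposed risk is exactly the contents of the unflushed buffer.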
The purpose of synchronization is to ensure that the data stored in all replicas are up to date and consistent. The algorithm for synchronizing replicas determines the level of consistency, the availability (through a locking mechanism), and the response time.
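One common family of synchronization schemes, used in quorum-based systems, requires W write acknowledgments and R read responses out of N replicas; choosing R + W > N forces every read quorum to overlap the latest write quorum. A simplified sketch, assuming replicas never fail and versions are supplied by the caller:

```python
# Quorum-based replica synchronization sketch with N replicas:
# a write returns after W acknowledgments, a read consults R replicas,
# and R + W > N guarantees the read sees at least one fresh copy.

N, W, R = 3, 2, 2
replicas = [dict() for _ in range(N)]     # each dict stands in for one replica

def write(key, value, version):
    acked = 0
    for rep in replicas:
        rep[key] = (version, value)
        acked += 1
        if acked == W:                    # respond once a write quorum acks;
            return                        # remaining replicas stay stale

def read(key):
    # Consult R replicas; the quorum overlap guarantees at least one of
    # them holds the latest write even though another may be stale.
    answers = [rep[key] for rep in replicas[-R:] if key in rep]
    return max(answers)[1] if answers else None   # newest version wins

write("x", "old", version=1)
write("x", "new", version=2)
assert read("x") == "new"     # the read quorum intersects the write quorum
```

Here the write reaches only replicas 0 and 1, and the read consults replicas 1 and 2; replica 2 is stale, but the intersection at replica 1 plus version comparison still returns the latest value.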
A storage system can be strictly row-based or allow for a certain degree of column storage. Row-based storage is efficient for accessing a few records in their entirety, while column-based storage is efficient for accessing a particular attribute of many records.
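The difference can be sketched by holding the same three records in both layouts; the record contents are invented for illustration.

```python
# Row-based vs. column-based layout for the same three records.
records = [("r1", "Ada", 1815), ("r2", "Alan", 1912), ("r3", "Grace", 1906)]

# Row store: one contiguous tuple per record -- cheap to fetch a whole record.
row_store = {rid: (name, year) for rid, name, year in records}

# Column store: one array per attribute -- cheap to scan a single attribute
# across many records without touching the others.
col_store = {
    "id":   [r[0] for r in records],
    "name": [r[1] for r in records],
    "year": [r[2] for r in records],
}

assert row_store["r2"] == ("Alan", 1912)   # one full record by key
assert max(col_store["year"]) == 1912      # a scan touching one column only
```

The second access pattern is the analytical one: computing an aggregate over one attribute reads a single compact array in the column store, whereas the row store would have to load every record in full.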
The concept of “one-architecture-fits-all” does not suit distributed storage systems. However, the ultimate goal of each design is to maximize the key properties of a storage system: performance, availability, consistency, and durability. Different design decisions are reflected in the features and quality of service provided by the different storage providers discussed in the next section.
Cloud providers offer a number of solutions for very large data storage. It is necessary to understand the mechanisms each solution applies to enforce its system requirements. This includes, for instance, how the data are partitioned across machines (elasticity), how updates are routed (consistency), how updates are made persistent (durability), and which failures are handled and how (availability).
Google Bigtable is a distributed data storage system designed to scale to a very large size, to be fault tolerant, and to support a wide range of applications with different workload and performance requirements. Bigtable is used by more than 60 Google products, such as the search engine, Google Docs, and Google Earth. Chang et al. describe the architecture of Bigtable as summarized in this subsection [
(1)
Bigtable is a sparse, distributed, multidimensional sorted map. The map is indexed by a combination of row key, column key, and timestamp. This design makes it convenient to store a copy of a large collection of web pages in a single table (Figure
An example of a Bigtable that stores webpages [
A read or write of data under a single row key is atomic, regardless of the number of columns involved. A row range of the table is partitioned into a
A column key is formed by a syntax
Each cell of Bigtable can contain multiple versions of the same data, indexed by a timestamp.
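The data model just described can be sketched as a nested map indexed by (row key, column key, timestamp); the row name, the contents: column, and the values below are invented for illustration, following the family:qualifier column syntax.

```python
# Sketch of Bigtable's data model: a map indexed by (row key, column key,
# timestamp), where each cell keeps several timestamped versions.
table = {}

def put(row, column, timestamp, value):
    table.setdefault((row, column), {})[timestamp] = value

def get_latest(row, column):
    versions = table.get((row, column), {})
    return versions[max(versions)] if versions else None  # newest timestamp

# A webpage-style example; "contents:" uses the family:qualifier syntax.
put("com.example/index", "contents:", 1, "<html>v1</html>")
put("com.example/index", "contents:", 2, "<html>v2</html>")
assert get_latest("com.example/index", "contents:") == "<html>v2</html>"
```

Because older versions remain in the cell under their own timestamps, a reader can also ask for the state of the page as of an earlier time, which is exactly what the timestamp dimension is for.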
(2)
Bigtable is built on top of several components of Google infrastructure.
Bigtable uses DFS to store logs and data files. A cluster of Bigtable is controlled by a cluster management system which performs job scheduling, failure handlings, consistency controls, and machine monitoring.
An SSTable is an immutable sorted map from keys to values, used by DFS to store chunks of data files and their indices. It provides lookup services for key/value pairs within a specified key range. Figure
Tablet.
Chubby is a highly available, persistent, distributed lock service. It consists of five active replicas, one of which is assigned to be the master that handles service requests. The main task of Chubby is to ensure that there is at most one active master at a time, which is responsible for storing the bootstrap location of Bigtable data, discovering tablet servers, and storing Bigtable schemas and access control lists.
The structure of Bigtable consists of three main components: a master server, tablet servers, and a client-side library. The master server assigns a range of tablets to be stored at each tablet server, monitors server status, controls load balancing, and handles changes to the table schema. Tablet servers can be added or removed in response to the current workload under the control of the master. Reads and writes of records are performed by the tablet servers, so clients communicate directly with a tablet server to access their data. This architecture eliminates a bottleneck at the master server, as clients do not rely on it for tablet location information; instead, this information is cached in the client-side library.
(3)
A large number of Google products rely on the Google File System (GFS) for storing data and replication. While sharing the same goals as other distributed storage systems (such as fault tolerance, performance, and availability), GFS is optimized for the following use scenarios [
GFS uses a relaxed consistency model which implies the following characteristics [
File namespace mutations are atomic and are handled exclusively by the master server. A global order of operations is recorded in the master’s operation log, enforcing the correct execution order for concurrent operations. The possible states of a mutated file region are summarized in Table
File region state after mutation [
| | Write | Record append |
|---|---|---|
| Serial success | Defined | Defined interspersed with inconsistent |
| Concurrent success | Consistent but undefined | Defined interspersed with inconsistent |
| Failure | Inconsistent | Inconsistent |
A mutation can be a write or a record append. A write mutation writes data at an application-specified file offset. An append mutation appends the data atomically at least once at an offset determined by GFS, and returns that offset to the client. When a write mutation succeeds without interference from concurrent writes, the file region is defined. In the presence of concurrent writes, successful write mutations leave the region consistent but undefined; that is, the data are consistent across replicas, but they might not contain the data written by the last mutation. Failed write mutations leave the file region inconsistent. With regard to an append mutation, GFS guarantees that the data must have been written at the same offset on all replicas before the operation reports success. The region in which the data have been written successfully is defined, while intervening regions are inconsistent.
GFS guarantees that a file region is defined after successful mutations by using the following update sequence: (1) mutations are applied to a file region in the same order on all replicas; (2) stale replicas (those that have missed mutation updates while their chunkserver was down) are not involved in the update, and their locations are not given to clients.
Yahoo PNUTS is a massively parallel distributed storage system that serves Yahoo’s web applications. The data storage is organized as ordered tables or hash tables. PNUTS is designed based on the observation that a web application mostly manipulates a single data record at a time and that activities on a particular record are mostly initiated at the same replica. This design decision is reflected in its guaranteed consistency and its replication process. Cooper et al. describe the architecture of PNUTS as explained in this subsection [
(1)
PNUTS presents a simplified relational model in which data are organized into tables of records (rows) with multiple attributes (columns). In addition to typical data types, it allows arbitrary, possibly large structured data, which is called
The query language of PNUTS is more restrictive than those supported by relational models. It allows selections and projections over a single table, and a primary key is required for updates and deletes of records. In addition, it provides a multiget operation that retrieves up to a few thousand records in parallel based on a given set of primary keys. Currently the system does not support complex multitable queries such as joins and group-by operations.
The PNUTS system is divided into multiple regions, each of which represents a complete system and contains a complete copy of the tables. It relies on a pub/sub mechanism for replication and update propagation. The architecture of PNUTS with two regions is illustrated in Figure
PNUTS’s architecture.
In terms of storage, tables are partitioned horizontally into a number of tablets. Tablet sizes vary from hundreds of megabytes to a few gigabytes, and a server contains thousands of tablets. The assignment of tablets to servers is optimized for load balancing. PNUTS supports two types of storage structures: ordered tables and hash tables.
In order to locate the record to be read or written, a router determines the tablet that contains the requested record and the server that stores that tablet. For ordered tables, the router stores an interval mapping, which defines the boundaries of the tablets, and a map from tablets to storage units. For hash tables, the hash space is divided into intervals, each of which corresponds to a single tablet, and tablet boundaries are defined by a hash function of the primary key.
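The interval mapping for an ordered table can be sketched as a sorted list of boundary keys plus a binary search; the split points and server names below are hypothetical.

```python
# Sketch of ordered-table record routing: tablet boundary keys define
# intervals, and each interval maps to the server storing that tablet.
import bisect

boundaries = ["g", "p"]                       # sorted tablet split points
tablet_servers = ["srv-1", "srv-2", "srv-3"]  # hypothetical storage servers

def route(primary_key):
    # keys < "g" -> tablet 0; "g" <= keys < "p" -> tablet 1; rest -> tablet 2
    tablet = bisect.bisect_right(boundaries, primary_key)
    return tablet_servers[tablet]

assert route("apple") == "srv-1"
assert route("grape") == "srv-2"
assert route("zebra") == "srv-3"
```

A lookup costs O(log n) in the number of boundary keys, so the router can hold the mapping for thousands of tablets in memory and answer in microseconds; hash-table routing is analogous, with intervals of the hash space in place of key ranges.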
Although record localization is performed by the router, the router stores only a cached copy of the mapping. The complete mapping is maintained by the tablet controller, which also controls load balancing and the division of records over tablets. Changes of record locations cause the router to misroute requests and trigger a retrieval of the new mapping.
(3)
The consistency model of PNUTS stems from the observation that web applications often change one record at a time, and different records have activities with different locality. PNUTS proposes
The system currently provides three variations of data accesses:
Amazon has developed a number of solutions for large-scale data storage, such as Dynamo, the Simple Storage Service (S3), SimpleDB, and the Relational Database Service (RDS). We focus on the architecture of Dynamo because it has served a number of Amazon’s core e-commerce services, which need tight control over the tradeoffs between availability, consistency, performance, and cost effectiveness. The architecture of Dynamo is described by DeCandia et al. [
(1)
Dynamo is classified as a distributed key-value storage system optimized for high write availability. It provides read and write operations on an object, which is treated as an array of bytes uniquely identified by a primary key. The write operation requires that a context for the object be specified. The object’s context encodes system metadata, such as its version, which is necessary for validity checks before a write request can be performed.
(2)
Dynamo uses consistent hashing to distribute workloads across multiple storage hosts. Consistent hashing is a solution for distributed systems in which multiple machines must agree on a storage location for an object without communication [
Dynamo replicates objects into multiple nodes. Once a storage node is determined, that node (known as
Partitioning and replication model of Amazon Dynamo.
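Dynamo’s partitioning scheme can be sketched as a consistent-hashing ring together with a preference list; the node names are hypothetical, and real Dynamo also places multiple virtual nodes per host, which is omitted here for brevity.

```python
# Consistent hashing sketch: nodes and keys hash onto a ring; a key is
# stored on the first node clockwise from its position, and the next
# ring nodes hold its replicas (a Dynamo-style preference list).
import bisect
import hashlib

def ring_hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

nodes = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical hosts
ring = sorted((ring_hash(n), n) for n in nodes)    # positions on the ring

def preference_list(key, replicas=3):
    pos = bisect.bisect(ring, (ring_hash(key),))   # first node clockwise
    return [ring[(pos + i) % len(ring)][1] for i in range(replicas)]

owners = preference_list("user:42")
assert len(set(owners)) == 3   # a coordinator plus two successor replicas
```

The key property is that adding or removing one node only remaps the keys falling between it and its ring predecessor, so machines can agree on an object’s location without any communication, as the text above describes.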
(3)
Dynamo provides eventual consistency, as updates are propagated to replicas asynchronously. A write request responds to the caller before the update has been performed at all replicas; as a result, a read request might not return the latest update. However, applications relying on this system can tolerate the cost of relaxed consistency in exchange for better response time and higher availability.
Dynamo uses a vector clock to handle different versions of an object during update reconciliation. A write request is accompanied by the object’s context metadata, which contains the vector clock information received from an earlier read. Read and write operations are handled by a coordinator node. To complete a write operation, the coordinator generates the vector clock for the new version, writes the object locally, and propagates the update to the highest-ranked nodes, which are the preferred locations for storing that object. In addition, the system allows clients to define a minimum number of nodes that must participate in a successful write: each write request succeeds once the coordinator receives at least that minimum number of responses from the target nodes.
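The version comparison at the heart of reconciliation can be sketched with a minimal vector clock; the coordinator names sx and sy are illustrative.

```python
# Minimal vector clock sketch: each write increments the coordinator's
# counter, and two clocks where neither dominates the other signal
# conflicting versions that need reconciliation.

def increment(clock, node):
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    """True if the version with clock a is newer than or equal to b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

v1 = increment({}, "sx")      # {"sx": 1}: first write, coordinated by sx
v2 = increment(v1, "sx")      # {"sx": 2}: descends from v1, can replace it
v3 = increment(v1, "sy")      # {"sx": 1, "sy": 1}: concurrent with v2

assert descends(v2, v1)                                # v2 supersedes v1
assert not descends(v2, v3) and not descends(v3, v2)   # conflict: reconcile
```

When neither clock descends from the other, as with v2 and v3, the system cannot pick a winner mechanically; both versions are kept and handed back to the application (or a last-writer-wins policy) for reconciliation.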
Cloud storage solutions were reviewed in this section. We began by clarifying the fundamental differences between relational and scalable NoSQL databases. The name NoSQL comes from its lack of support for complex query languages and data integrity checks. NoSQL works well with big data because the data are partitioned and stored in distributed nodes. It generally maintains better performance than relational databases by compromising on strong data consistency and complex query support. The CAP theorem explains the dependencies between three desired properties: consistency, availability, and partition tolerance. To deal with CAP, one property needs to be dropped from the system requirements, resulting in CA, CP, or AP as the main architectural driver. Consistency schemes, ordered from strongest to weakest, are strong, timeline, eventual, and optimistic. While strong consistency guarantees that an access to any replica returns the single most recent value, optimistic consistency does not guarantee that an access to a given replica returns an updated value at all. The three scalable storage models used by leading cloud providers, that is, Google Bigtable, Yahoo PNUTS, and Amazon Dynamo, are distributed key-value stores that support relaxed consistency and provide low-level APIs as query interfaces. They are, in fact, significantly different in architecture and implementation details. The diversity of data management interfaces among providers can result in data lock-in, which is one of the primary concerns in cloud adoption.
The objective of this section is to provide insight into the security concerns associated with cloud computing. To achieve this, we first point out the security vulnerabilities associated with the key architectural components of cloud computing. Guidance for reducing the probability that these vulnerabilities will be exploited is then described. Finally, we present a mapping between cloud vulnerabilities and prevention mechanisms.
The architecture of cloud computing comprises an infrastructure and software operating on the cloud. The physical locations of the infrastructure (computing resources, storage, and networks) and its operating protocols are managed by the service provider. A virtual machine serves as the unit of application deployment. Underneath the virtual machines lies an additional software layer, the virtualized hypervisor, which manages a pool of physical resources and provides isolated environments for clients’ virtual machines.
Vulnerability, threat, and risk are common terms in the security context that are often used interchangeably, regardless of their definitions. To remain consistent, we define these terms as follows: (i) Vulnerability is defined by NIST as “
Vulnerabilities in cloud computing arise from flaws or weaknesses in its enabling technologies and architectural components [
Vulnerabilities and security requirements of cloud services have been identified in a number of publications. In the work of Grobauer et al., the influence of cloud computing on established security issues is analyzed [
In his work “Monitoring Cloud Computing by Layer,” Spring presents a set of restrictions and audits to facilitate cloud security [
Iankoulova and Daneva performed a systematic review to address the requirements of SaaS security and propose solutions to deal with them [
In this subsection we briefly review the functionalities and components at each cloud layer and point out a number of vulnerabilities and subvulnerabilities related to them.
At the highest layer, SaaS is offered to consumers based on web technologies. The reliability of web services (backend), web browsers (frontend), and authentication and authorization mechanisms influences service security.
The vulnerabilities of web services are associated with the implementation of session handlers and the mechanisms to detect inputs that create erroneous executions.
Client-side data manipulation vulnerability is caused by inappropriate permissions given to web browser components such as plug-ins and mash-ups, allowing these components to read and modify data sent from the web application to the server [
The vulnerability of authentication and authorization refers to the weaknesses in a credential verification process and the management of credential information from both provider and user sides. A variety of consumer population increases the complexity in managing the controls in this area [
The use of a weak encryption algorithm, an insufficient key length, or an inappropriate key-management process introduces encryption vulnerabilities.
A virtualized cloud platform is composed of a layer of development and deployment tools, middleware, and operating system (OS) which are installed on a virtual machine. The security of platforms relies on the robustness of each component, their integration, the platform management portal, and provided management APIs.
An OS arranges communications (system calls) between applications and hardware; therefore it has access to all data stored in the virtual machine. Data leakage can occur when an OS contains malicious services running in the background. For PaaS, a common practice of providers to secure the cloud OS is to deploy a single, hardened, pared-down OS and to monitor binary changes on the OS image [
PaaS providers must provide a portal which allows system administrators to manage and control their environments. These administrative and management accesses share a similar set of web services vulnerabilities and the vulnerabilities of identity authentication and authorization control [
Cloud infrastructure is delivered to consumers in terms of virtual machines (VMs) that provide computing capability, memory, storage, and connectivity to a global network. Vulnerabilities in cloud infrastructure are caused by traditional VM weaknesses, the management of VM images, untraceable virtual network communications, data sanitization in shared environments, and access to infrastructure administration and management APIs.
Apart from the vulnerability residing in a VM image itself, this issue is concerned with the common practice of a provider to offer cloned VM images for IaaS and PaaS consumers and the management of these images.
Virtual communication channel vulnerability is concerned with the communications between VMs on the same physical resources. Generally a VM provides the capability for users to configure virtual network channels in order to directly communicate with another VM running on the same physical platform. The messages sent through such channels are invisible to network monitoring devices and therefore untraceable [
As previously mentioned, data in use, in transit, and at rest must be secured. Access control and encryption are employed to protect data at rest; secure network protocols, encryption, and public keys are used to protect data in transit. However, the mechanism for freeing resources and removing sensitive data is often overlooked, even though it is equally important in a shared environment.
IaaS must provide interfaces that allow administrators to perform their administrative and management activities. This includes remote access to utilize and configure computing services, storage, and networks. These accesses share a similar set of web services vulnerabilities and the vulnerabilities of identity authentication and authorization control [
Multitenancy is a core mechanism of cloud computing for achieving economies of scale. A virtualized hypervisor is designed to operate multiple VMs on a shared platform, leading to resource optimization, decreased energy consumption, and cost reduction. There are, however, a number of vulnerabilities associated with hypervisors and the multitenancy architecture.
An insecure implementation of a hypervisor causes several vulnerabilities as follows.
The concept of multitenancy holds different definitions depending on the abstraction level to which it is applied [
In IaaS, it is likely that several VM instances share network infrastructure components such as DNS servers and DHCP servers [
This layer is concerned with physical security, including servers, processors, storage, and network devices hosted in a data center.
The physical security of cloud data centers is concerned with the following issues [
A summary of the vulnerabilities associated with each of the cloud enabling technologies is presented in Table
Cloud computing core technologies and associated vulnerabilities.
| Layer | Functionality | Vulnerabilities |
|---|---|---|
| (1) Application | Provides services through web applications and web services. | (v1) Vulnerabilities of web services |
| (2) Platform | Provides programming interfaces and mediates communications between software and the underlying platform. | (v5) Vulnerabilities of a cloud platform |
| (3) Infrastructure | Provides computing and storage capabilities and connectivity to a global network. | (v7) Vulnerabilities of a virtual machine |
| (4) Unified resources | Three main features of hypervisors: operate multitenant virtual machines and the applications built on them; provide isolation to multiple guest VMs; support administrative work to create, migrate, and terminate VM instances. | (v11) Vulnerabilities of a virtualized hypervisor and its interfaces |
| (5) Fabric | Cloud physical infrastructure including servers, processors, storage, and network devices hosted in the data center. | (v14) Vulnerability of physical resources |
Having identified the vulnerabilities associated with cloud computing, we are able to analyze to what extent they cover security breaches that have occurred and the concerns raised by current and prospective cloud consumers. In this section we present a list of threats to cloud security based on the Cloud Security Alliance “
With their simple registration processes and flexible usage models, clouds attract a number of attackers who host and conduct malicious activities there. Several malicious cloud adoptions have been discovered in recent years. Examples include
Clouds provide interfaces for consumers to use services and perform administrative and management activities such as provision, monitoring, and controlling VM instances. These features are generally designed as web services and thus inherit their vulnerabilities. Robust designs must ensure appropriate authentication and authorization mechanisms, a secure key management process, strong encryption, and sufficient monitoring of intrusion attempts.
This threat concerns the considerable damage that malicious insiders could create by accessing or manipulating data in a data center. The centralization of data inside cloud servers itself creates an attractive condition for an adversary to attempt fraud. To reduce the risk, cloud providers must ensure strong and sufficient physical access controls, perform employee background checks, make security processes transparent to consumers, and allow for external audits.
A virtualized hypervisor provides secure compartments and mediates communication between guest systems and the physical resources underneath. Flaws in hypervisors might grant an operating system inappropriate permission to access or modify physical resources that belong to other tenants. The risk could be reduced by monitoring physical resources for unauthorized activities, performing vulnerability scans and configuration audits on a hypervisor, and following best practices during its deployment, installation, and configuration.
Data loss and leakage could be the result of unauthorized access, data manipulation, or physical damage. To protect against unexpected loss and leakage, providers must implement robust access controls and a secure key management process. Backup and recovery mechanisms should also be put in place.
Cloud service models place paramount importance on authentication and authorization, as the security of every service depends on their accountability and robustness. The use of two-factor authentication, a secure key management process, and a careful design of web services helps to reduce the damage of this threat.
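As an illustration of two-factor authentication, the following sketch implements the standard time-based one-time password algorithm (TOTP, RFC 6238, built on RFC 4226 HOTP) using only the Python standard library. The secret and parameters are example values, not those of any specific provider.

```python
import base64
import hashlib
import hmac
import struct
import time

def hotp(secret_b32: str, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over a 64-bit counter, dynamically truncated."""
    key = base64.b32decode(secret_b32)
    msg = struct.pack(">Q", counter)
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low nibble of last byte picks the window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret_b32: str, period: int = 30, at=None) -> str:
    """RFC 6238 TOTP: HOTP driven by the current 30-second time step."""
    counter = int((time.time() if at is None else at) // period)
    return hotp(secret_b32, counter)
```

With the RFC test secret (`12345678901234567890` in Base32), `totp(..., at=59)` reproduces the published RFC 6238 SHA-1 test vector.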
While most of the common threats to cloud security are associated with its underlying technologies (web services and virtualization), some of them (abuse and malicious insiders) could be addressed by better governance. This list is expected to be updated as the usage of cloud computing grows in popularity.
This subsection presents a list of best practices to enhance cloud security as proposed by leading cloud providers and researchers. The practices are organized into five main areas including: identity and access control, data security, application security, security of virtualized environment, and physical and network security.
The provision of IT infrastructure offered by cloud computing extends the focus of identity and access control from the application level to the platform and virtual machine levels. The goal of identity and access control is to ensure that access to data and applications is given only to authorized users. Associated areas include strong authentication, delegated authentication and profile management, notification, and identity federation. The recommended practices below are relevant to both cloud providers and consumers.
The challenges of authentication include the use of strong authentication methods, identity management, and federation.
The goal of authorization is to ensure the adequacy and appropriateness of access control to data and services. It can be achieved by matching a user’s privileges to his/her job responsibilities and to the characteristics of the assets to be accessed, and by maintaining the information necessary for audits.
Strong identity management includes the enforcement of strong passwords, password expiration, and secure password changing and resetting mechanisms [
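A password-strength check of the kind such a policy implies can be sketched as follows. The minimum length and character-class rules below are illustrative assumptions, not values mandated by any standard cited here.

```python
import re

# Illustrative policy threshold; real deployments set this per their policy.
MIN_LENGTH = 12

def password_issues(password: str) -> list:
    """Return the list of policy rules a candidate password violates."""
    issues = []
    if len(password) < MIN_LENGTH:
        issues.append(f"shorter than {MIN_LENGTH} characters")
    if not re.search(r"[a-z]", password):
        issues.append("no lowercase letter")
    if not re.search(r"[A-Z]", password):
        issues.append("no uppercase letter")
    if not re.search(r"\d", password):
        issues.append("no digit")
    if not re.search(r"[^A-Za-z0-9]", password):
        issues.append("no special character")
    return issues  # empty list means the password passes the policy
```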
Cloud providers should implement a notification process to inform users of security breaches that (might) happen [
The security of data and information on the cloud is one of the main hindrances for prospective cloud consumers. ISO 27001 defines information security through the following six perspectives:
The responsibility of cloud providers is to guarantee that security is satisfied at every state of the data life cycle, that is, creation, storage, use, transfer, archival, and destruction. The data security policy and practices adopted by a provider should be explicitly declared and should adhere to the quality of service agreed in the SLA.
This practice area supports authorization. Its challenge is to define an appropriate and practical data classification scheme and access controls for different classes of data.
Encryption is used to ensure confidentiality, integrity, authenticity, and nonrepudiation. It should be applied at all states of the data life cycle when operating in shared environments.
Data availability is guaranteed by an appropriate backup and recovery plan. The challenge is to ensure that backups are performed regularly and that data encryption is applied when required. In addition, it is equally important to perform recovery tests. At Google, backup is supported by a distributed file system, in which the data are segregated into chunks and replicated over many places [
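The chunking-and-replication idea can be illustrated with a minimal sketch. The chunk size, replica count, and round-robin placement below are simplifying assumptions for illustration only; they do not reflect the actual GFS design (GFS, for instance, uses 64 MB chunks and placement policies that account for rack topology).

```python
CHUNK_SIZE = 4   # bytes; tiny for illustration (GFS uses 64 MB chunks)
REPLICAS = 3     # a common default replication factor

def chunk_and_replicate(data: bytes, nodes: list) -> list:
    """Split data into fixed-size chunks and assign each chunk to
    REPLICAS distinct storage nodes using round-robin placement."""
    assert len(nodes) >= REPLICAS, "need at least as many nodes as replicas"
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    plan = []
    for idx, chunk in enumerate(chunks):
        # Rotate the starting node per chunk so load spreads evenly.
        targets = [nodes[(idx + r) % len(nodes)] for r in range(REPLICAS)]
        plan.append((chunk, targets))
    return plan
```

Concatenating the chunks in order recovers the original data, and each chunk survives the loss of up to `REPLICAS - 1` nodes.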
Data sanitization is the process of removing data stored in memory and storage before returning such resources to the shared pool. Sanitization is more difficult in a multi-tenancy environment where physical resources are shared [
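A minimal sketch of sanitizing a file before releasing its storage might look as follows. The pass count is an illustrative assumption, and overwriting at the file level does not defeat all storage-layer remapping (e.g., SSD wear leveling), so it is only a first approximation of proper sanitization.

```python
import os

def sanitize_and_delete(path: str, passes: int = 3) -> None:
    """Overwrite a file's contents with random bytes before deleting it,
    so the freed blocks are less likely to leak data to the next tenant."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())  # force the overwrite to reach the device
    os.remove(path)
```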
People and processes are the main factors in achieving secure software. It is essential to have motivated and knowledgeable team members, to adopt a development process that suits the context, and to have sufficient resources. This practice area aims to ensure that all of these factors are in place to support the development of secure software. We set the primary focus on web applications, which are the general form of SaaS.
Software quality is influenced by the whole software development life cycle. The development team ought to set quality as a priority, apart from scheduling and financial concerns. Compromising on quality impacts a company’s sustainability in the long run, if not worse. The challenge in this area is to put appropriate and sufficient practices in place, to ensure their compliance, to implement an appraisal program, and to provide the relevant knowledge to the people involved. A list of best practices to ensure the security of developed software adopted at Google [
A web service exhibits certain classes of vulnerabilities when it is not appropriately implemented. In the cloud, web services are one of the core technologies on which cloud applications and operations (for instance, administration, provisioning, and service orchestration) rely. It is a challenge for a development team to understand vulnerabilities and possible attack scenarios and to adopt counter mechanisms that ensure all perspectives of security. An example of standard counter mechanisms is
A front-end environment, including web browsers and front-end systems, is one area of the attack surface [
Virtualization covers the layers of a virtualized hypervisor which handles compartmentalization and resource provisions, a virtual machine, and an operating system.
Special care should be given to a hypervisor, as it introduces a new attack surface and its erroneous executions generate cross-tenant impacts [
A VM runs in a compartment managed by a hypervisor. The challenge in securing a VM is to verify the provided security controls and the robustness and authenticity of an acquired VM image.
Compromises to the security of an OS affect the security of all data and processes residing in the system; therefore, an OS should be highly robust and secure. The following practices are recommended to secure an OS.
The centralization of data in a data center poses a significant security concern. Cloud consumers should be aware of the malicious insider threat in a cloud environment, where centralized data amplifies the motivation for malicious attempts [
The objective of this regulation is to ensure the availability of data and services hosted in cloud infrastructures. Consumers should ensure that the provider implements and respects a standard business continuity plan. Examples of such standards are BS25999 and ISO22301 [
In cloud environments a provider is responsible for protecting a customer’s data from access across the Internet. Network borders should be protected by robust mechanisms or devices: firewalls should protect each external interface, only necessary ports should be open, and the default setting should be denial. Intrusion detection and prevention mechanisms should be employed and kept up-to-date [
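The default-deny principle can be sketched as a simple rule matcher: a packet passes only if it matches an explicit allow rule. The rule set below is hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow list: (source network, destination port) pairs.
# Anything not matched here is denied by default.
ALLOW_RULES = [
    (ip_network("10.0.0.0/8"), 22),   # SSH only from the internal network
    (ip_network("0.0.0.0/0"), 443),   # HTTPS from anywhere
]

def is_allowed(src_ip: str, dst_port: int) -> bool:
    """Default-deny packet filter: permit only traffic matching a rule."""
    return any(ip_address(src_ip) in net and dst_port == port
               for net, port in ALLOW_RULES)
```

Real firewalls match on more fields (protocol, direction, connection state), but the design choice illustrated here is the same: the absence of a matching rule means denial.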
The mapping between the cloud vulnerabilities identified in Section 6.1 and the recommended security practices is presented in Table
Mapping of cloud vulnerabilities (column) and recommended security practices (row).
Application | Platform | Infrastructure | Unified resource | Fabric | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Web service v. | Client-side data manipulation v. | Authentication & Authorization v. | Encryption & key management v. | Cloud platform v. | Access to platform admin interface v. | Traditional virtual machine v. | Virtual network communication v. | Data sanitization v. | Access to VM admin interface v. | Virtualized hypervisor & interfaces v. | Multi-tenancy v. | Shared network component v. | Physical v. | |
Identity and access control | ||||||||||||||
Authentication | × | |||||||||||||
Authorization | × | |||||||||||||
Identity management | × | |||||||||||||
Notification | × | |||||||||||||
Data security | ||||||||||||||
Data classification | × | |||||||||||||
Encryption and key | × | × | × | × | × | × | ||||||||
Backup and recovery | × | |||||||||||||
Data sanitization | × | × | ||||||||||||
Application security | ||||||||||||||
Dev. life cycle | × | × | × | × | × | × | × | |||||||
Web service imp. | × | |||||||||||||
Frontend environment | × | × | ||||||||||||
Virtualized environment | ||||||||||||||
Virtualized hypervisor | × | × | × | × | × | |||||||||
Virtual machine | × | × | × | × | ||||||||||
Operating system | × | × | × | |||||||||||
Network and physical security | ||||||||||||||
Robust physical access | × | |||||||||||||
BCP | × | |||||||||||||
Network control | × |
Security is one of the most critical hindrances to nontrivial adoption of new technologies. Cloud computing inherits security issues from its enabling technologies, such as web services, virtualization, multitenancy, and cryptography. A virtualized hypervisor that enables resource sharing leads to economies of scale but introduces a new attack surface. We analyze security in the cloud by understanding the weaknesses associated with the system procedures, design, and implementation of the components of cloud infrastructure. These weaknesses, known as
In this paper we painted a comprehensive picture of cloud computing technology. The view of cloud computing is shaped by its definition, service models, and deployment models. The characteristics of cloud services are shaped by their architecture, service management approach, and underlying technologies. Based on these solid foundations, cloud computing promises significant benefits in enterprise-level cost reduction, increased agility, and better business-IT alignment. Many more benefits arise when other levels or perspectives are taken into consideration. Apart from these benefits, however, cloud services still pose a number of limitations that hinder their wider adoption. Such limitations originate in the very nature of their building blocks. An example is the instability of large-scale networks: limited network capability has become a main obstacle to deploying services that require reliable response times, such as high-performance computing and latency-sensitive applications, in the clouds. Security is another major hindrance to cloud adoption, touching upon many technological factors including multitenancy, virtualization, and federated identity management. It is further aggravated by the fact that service level agreements are generally explicit about placing security risks on consumers. As cloud computing is not a silver bullet for all circumstances, it is necessary for technology adopters to sufficiently understand its concerns and handle them properly. Without comprehending the underlying technology, finding the most appropriate adoption approach seems an impossible mission. Our paper can assist this endeavor by offering a comprehensive review of the fundamentals and relevant knowledge areas of cloud computing.