Cloud computing has attracted increasing attention in recent years, and virtualization is the key technology for deploying infrastructure services in cloud environments. It allows application isolation and facilitates server consolidation, load balancing, fault management, and power saving. Live virtual machine migration can effectively relocate virtual resources and has become an important management technique in clusters and data centers. The existing precopy live migration approach must iteratively copy redundant memory pages, while the postcopy approach leads to many page faults and application degradation. In this paper, we present a novel approach called TSMC (three-stage memory copy) for live virtual machine migration. In TSMC, memory pages are transmitted at most twice, and page faults occur only for a small fraction of dirty pages. We implement TSMC in Xen and compare it with Xen’s original precopy approach. Experimental results under various memory workloads show that TSMC significantly reduces cumulative migration time and total pages transferred while achieving better network I/O performance.
After the wave of pervasive computing and grid computing [
Virtualization technology plays a vital role in resource management for cloud computing and has developed rapidly in recent years. The resources of a single physical machine are divided into multiple isolated virtual resources by virtualization software [
Live migration is a key capability of virtualization technology. It allows VMs to be relocated quickly within a data center with little user-perceptible downtime. Many live migration techniques have been proposed in recent years [
In this paper, we present an optimized memory copy approach for live virtual machine migration that combines the advantages of active pushing and on-demand copy. First, all memory pages are copied to the target while a dirty bitmap is recorded (full memory copy stage); then the VM is suspended and the CPU state and dirty bitmap are transmitted (dirty bitmap copy stage); finally, the new VM is resumed and dirty pages are copied from source to target (dirty page copy stage). We call this approach TSMC (three-stage memory copy). The main goal of TSMC is to minimize total migration time and reduce network traffic: most memory pages are copied only once, in the full memory copy stage, and only dirtied pages must be copied twice. Many approaches have been proposed to evaluate the performance of virtualization [
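As a minimal sketch of the idea (the page counts, the dirty set, and all names below are illustrative assumptions, not the paper's implementation), the three stages can be modelled by counting how many page copies cross the network:

```python
# Hypothetical model of the three TSMC stages, counting page transfers.

def tsmc_migrate(total_pages, dirtied_during_full_copy):
    """Simulate TSMC and return how many page copies cross the network."""
    transfers = 0

    # Stage 1: full memory copy -- every page is sent once while the VM runs,
    # and writes made during this stage are recorded in a dirty bitmap.
    transfers += total_pages
    dirty_bitmap = set(dirtied_during_full_copy)

    # Stage 2: dirty bitmap copy -- the VM is suspended briefly; only the
    # CPU state and the dirty bitmap are sent (no pages), keeping downtime low.

    # Stage 3: dirty page copy -- the VM resumes on the target; pages that
    # were dirtied during stage 1 are sent a second time.
    transfers += len(dirty_bitmap)
    return transfers

# Example: 1000 pages, 50 of them dirtied while stage 1 was running.
print(tsmc_migrate(1000, range(50)))  # 1050: most pages once, dirty pages twice
```

The model makes the cost bound visible: a page is transferred once, or twice if it was dirtied during the full copy, never more.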
This paper is organized as follows. In Section
Precopy [
Postcopy [
Jin et al. [
To overcome the shortcomings of precopy and postcopy approaches, many other live migration methods [
In this section, we introduce the phases of live migration and describe the design of the TSMC approach and its implementation on Xen. The performance of any live virtual machine migration strategy can be gauged by the following metrics.
Efficient synchronization of the memory state is the key issue in live virtual machine migration. Memory transfer can be achieved in the following three phases [
Figure
Timeline for live migration approach.
To address the weaknesses of existing live migration methods, we propose a new approach called three-stage memory copy (TSMC), which combines the three phases of memory transfer. The entire memory synchronization is divided into three stages. Figure
Compared with precopy, three-stage copy avoids iterative copying of dirty pages: most memory pages are copied only once, and only pages dirtied during the full memory copy stage need to be copied twice. This significantly reduces the number of pages transferred and thus the network bandwidth consumed. Meanwhile, only the dirty bitmap and CPU state are transferred in the suspend phase, so VM downtime is also shortened. Although execution may be interrupted by page faults during the dirty page copy stage, three-stage copy transfers only the dirtied pages after resuming, in contrast to postcopy, which must fetch the full memory after the new VM resumes; this significantly reduces the page fault rate, avoids obvious application degradation, and shortens the overall duration of the migration.
There are two methods for transferring dirty pages: on-demand copy and active push. Once the VM is resumed on the target, a page fault occurs whenever it accesses a dirtied page; the fault can be serviced by requesting the referenced page over the network from the source node. However, page faults in the new VM are unpredictable, and on-demand copy alone would lead to a longer resume time, so we combine it with active push, in which the source host periodically pushes dirty pages to the target at a preset interval.
The procedure of TSMC is shown in Figure
Procedure of TSMC.
On-demand copy is the simplest but slowest method. When the VM resumes on the destination, each page fault triggers a request over the network to the source for the corresponding memory page. Although on-demand copy transfers each dirty page only once, it lengthens recovery time and degrades application performance, so transferring memory pages by on-demand copy alone is unacceptable.
Active push reduces recovery time efficiently and shortens the time the source VM’s resources remain occupied. After the new VM resumes, active push sends dirty pages from the source to the destination at a preset interval, avoiding many page faults on the new VM. When a page fault does occur, the page is requested from the source VM by on-demand copy. Combining on-demand copy and active push greatly improves performance.
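The interplay of the two methods can be sketched as follows (the batch size, the fault sequence, and the function name are assumptions made for illustration; real behaviour depends on the workload and the push interval):

```python
# Illustrative model: after resume, faults are serviced on demand while the
# source actively pushes batches of remaining dirty pages between faults.

def drain_dirty_pages(dirty_pages, fault_sequence, batch_size=4):
    """Return (faults serviced on demand, pages delivered by active push)."""
    remaining = set(dirty_pages)   # pages still only on the source
    on_demand = 0
    for page in fault_sequence:
        if page in remaining:          # fault: page not yet on the target,
            remaining.discard(page)    # fetch it over the network (on-demand)
            on_demand += 1
        # between faults, the source pushes a batch of dirty pages unprompted
        for p in sorted(remaining)[:batch_size]:
            remaining.discard(p)
    # leftovers are pushed without ever causing a fault
    pushed = len(set(dirty_pages)) - on_demand
    return on_demand, pushed

# 10 dirty pages, the resumed VM touches pages 0 and 9:
print(drain_dirty_pages(range(10), [0, 9]))  # (2, 8)
```

Only two of the ten dirty pages cause faults in this run; active push delivers the rest in the background, which is the performance gain the combination aims for.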
Prepull was first proposed to predict the recent working set of software based on its execution history. In three-stage copy, prepull is used to predict page faults in the new VM: when a page fault occurs, the pages around the missing page will probably be accessed soon, which would lead to further page faults. Prepull therefore enlarges the memory transfer window; when requesting a page from the source VM, it transfers the surrounding pages along with the requested page, so fewer page faults occur later.
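A minimal sketch of this windowing (the window size and function name are assumptions; a real implementation would bound the window by the transfer unit and the pages actually still dirty):

```python
# Prepull sketch: on a fault, fetch the missing page together with its
# neighbours, betting on spatial locality of memory accesses.

def fetch_with_prepull(faulting_page, remaining_dirty, window=2):
    """Return the set of pages fetched for one fault: the faulting page
    plus any dirty neighbours within `window` pages on each side."""
    wanted = range(faulting_page - window, faulting_page + window + 1)
    fetched = {p for p in wanted if p in remaining_dirty}
    remaining_dirty -= fetched     # these pages can no longer fault
    return fetched

remaining = {3, 4, 5, 6, 20}
print(sorted(fetch_with_prepull(5, remaining)))  # [3, 4, 5, 6]
print(remaining)                                 # {20}
```

One fault on page 5 brings over four dirty pages in a single request, so accesses to pages 3, 4, and 6 no longer fault.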
We implemented TSMC on Xen 4.1.4. The crux of our approach is capturing and recording dirty pages. Shadow page tables are used by Xen’s hypervisor to keep track of the memory state of the guest OS, and they can be used to capture dirty pages. Figure
Shadow page table.
Because all page tables in the guest OS are mapped to read-only shadow page tables, any update to a page table triggers a page fault that is captured by Xen’s hypervisor. Xen checks the PTE access rights of the guest OS and sets the PTE in the shadow page table to writable if the guest OS is permitted to write to it. We can then record the updates in the shadow page tables into a dirty bitmap.
In this way, we can capture the occurrence of dirty pages and obtain a dirty page bitmap. Xen provides the API function xc_shadow_control() to manage shadow page tables. This feature is turned on by calling xc_shadow_control() with the XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY flag before live migration and turned off with the XEN_DOMCTL_SHADOW_OP_OFF flag after migration finishes.
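The log-dirty mechanism can be modelled as follows (a toy sketch, not Xen code: the class and method names are our own, and a real hypervisor would also re-protect pages when the bitmap is harvested):

```python
# Toy model of log-dirty tracking via write protection: every page starts
# read-only; the first write "faults", sets its bit in the dirty bitmap, and
# re-enables writing, mirroring what the shadow page tables do for the guest.

class LogDirty:
    def __init__(self, num_pages):
        self.writable = [False] * num_pages        # all PTEs start read-only
        self.bitmap = bytearray((num_pages + 7) // 8)

    def write(self, page):
        if not self.writable[page]:                # simulated shadow page fault
            self.bitmap[page // 8] |= 1 << (page % 8)
            self.writable[page] = True             # grant write access afterwards

    def dirty_pages(self):
        return [p for p in range(len(self.writable))
                if self.bitmap[p // 8] >> (p % 8) & 1]

log = LogDirty(16)
for p in (3, 3, 7):        # repeated writes dirty a page only once
    log.write(p)
print(log.dirty_pages())   # [3, 7]
```

Note that only the first write to a page pays the fault cost; subsequent writes proceed at full speed, which is why log-dirty tracking is cheap enough to run throughout the full memory copy stage.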
In this section, we present an evaluation of three-stage copy on Xen 4.1.4 and compare it with Xen’s original precopy approach.
We conduct our experiments on two identical server-class machines, each with 2-way quad-core Xeon E5506 2.13 GHz CPUs and 32 GB DDR RAM, connected via a Gigabit Ethernet switch. All VM images are stored on an NFS server. We use Ubuntu 12.04 (Linux version 3.5.0-23) as the guest OS and the privileged domain OS (domain 0). The host kernel is our modified version of Xen 4.1.4. Both the VM in each experiment and domain 0 are configured with two VCPUs. Guest VM sizes range from 128 MB to 1024 MB. We use memtester [
Each experiment is repeated five times, and every reported result is the arithmetic mean of the five runs. During migration, we evaluate the three primary metrics discussed in Section
Figure
Comparison of total migration time, pages transferred, and downtime.
Figure
Evaluation of downtime. Figure
Figure
Comparison of total migration time, pages transferred, and downtime.
Figure
Evaluation in Figure
We focus on the network throughput in the network IO test. As shown in Figure
Comparison of total migration time, pages transferred, and downtime.
This paper presents a three-stage memory copy (TSMC) approach for live virtual machine migration. In TSMC, the entire memory copy is divided into three stages: full memory copy, dirty bitmap copy, and dirty page copy. Most memory pages are copied only once; only dirtied pages need to be copied twice. This significantly reduces the total pages transferred and the cumulative migration time. Furthermore, because TSMC transfers only the dirty bitmap during the stop phase of the virtual machine, downtime is also shortened. Experimental results show that TSMC achieves better performance than Xen’s precopy.
The authors declare that there is no conflict of interest regarding the publication of this paper.