In the late 1990s, when internet consumption was booming and microprocessors were improving in step with Moore’s law, a need to build large-scale systems arose. Traditionally, a single computer system runs a single operating system. Instead of adding more physical machines to run applications, businesses like IBM built virtualization solutions that ran many operating systems on a single powerful mainframe.
Datacenters expanded to comprise thousands of physical machines serving applications on the internet. Instead of providing a single physical machine to each user, running many virtualized systems on each machine offered a cheaper and more efficient solution, which revolutionized the industry. VMware rose to the technological forefront of virtualization with its server and desktop virtualization offerings.
Advantages of Virtualization
Using VMware or VirtualBox software, a desktop computer can run several operating systems at the same time at near-native performance. Developers can keep multiple virtual machines running the various environments required for development and testing.
A developer on Windows can develop Linux applications as if they were running Linux natively. Production environments can be simulated on local systems easily with virtualized systems. Vagrant can automate the setup of virtual machines and share them as declarative Vagrantfiles.
Server virtualization enables datacenters to use hardware resources efficiently by running multiple operating systems, provisioned to many users, on a single machine. Amazon EC2 is the cloud compute service on Amazon Web Services. It uses a modified Xen-based system to provision and manage virtual machines across Amazon’s datacenters efficiently.
Virtualization of Hardware
Various hardware components constitute a computer, and the job of virtualization software is to emulate that hardware identically in software. The emulated computer system is identical to the one running the host and exposes identical interfaces. Virtualization software executes the code of a guest operating system as if it were running on actual hardware, and can therefore run multiple operating systems on the same machine. The difficult problem is to emulate all aspects of the hardware both identically and efficiently.
The CPU, memory, and device I/O are the major components that virtualization software needs to virtualize. This emulation poses several technical challenges in performance, isolation, and security.
Important terms we’ll encounter on the way:
- Hypervisor: Software that manages the virtual machines running on a system.
A hypervisor that runs directly on the hardware, as its own OS, is called a Type 1 hypervisor.
A hypervisor that runs as an application inside another operating system is called a Type 2 hypervisor.
Microsoft’s Hyper-V and the Xen Project are examples of Type 1 hypervisors.
VirtualBox, VMware Workstation, and QEMU are examples of Type 2 hypervisors.
- Host Operating System: The operating system through which the virtual machines are run. For Type 1 hypervisors such as Hyper-V, the hypervisor itself acts as the host OS, scheduling the virtual machines and allocating memory. For Type 2 hypervisors, the OS on which the hypervisor application runs is the host OS.
- Guest Operating System: The operating system that uses the virtualized hardware. It can be either fully virtualized or paravirtualized. An enlightened guest OS knows that it is running on a virtualized system, which can improve performance.
- Virtual Machine Monitor: The VMM is the component that virtualizes hardware for a specific virtual machine and executes the guest OS on that virtualized hardware.
- Full Virtualization: The guest OS is presented with a CPU and hardware identical to those of the original host. This is difficult to achieve on x86 without hardware support because some components, such as the memory management unit, are difficult to simulate.
- Para Virtualization: The code of the guest operating system is modified. The interfaces for user applications don’t change, but the kernel uses modified interfaces to ask the hypervisor for certain functions of the system. This improves virtualization performance.
The VMM provides an emulated CPU that behaves exactly like the host CPU. Programs to be virtualized can be executed directly by the processor, as the guest OS itself is built for x86 hardware. The challenge with direct execution is that guest code given complete access to the CPU and memory could modify the memory of the host itself.
x86 contains various sensitive instructions, some privileged and some unprivileged. The virtualization system must prevent the guest OS from having complete control over the CPU, and so must control which instructions the guest can execute directly.
The Popek-Goldberg virtualization requirements provide a framework for analyzing how efficiently a CPU can be virtualized. The original x86 architecture does not fulfil these requirements because several sensitive instructions are unprivileged: they do not trap when executed outside the most privileged mode, so a VMM cannot intercept them.
Starting in 2005, the introduction of hardware-assisted virtualization extensions (Intel VT-x, AMD-V) allowed x86 processors to fulfil the Popek-Goldberg requirements.
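The Popek-Goldberg condition can be read as a set check: every sensitive instruction must also be privileged, so that it traps to the VMM. The following is a minimal sketch of that check. The `Instruction` type and the instruction table are assumptions made for illustration, and the table is a small subset of commonly cited examples, not a full classification of x86.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    name: str
    sensitive: bool   # reads or changes machine configuration or state
    privileged: bool  # traps when executed outside the most privileged mode


# Illustrative subset only (assumption); not a complete x86 classification.
PRE_VT_X86 = [
    Instruction("ADD", sensitive=False, privileged=False),
    Instruction("MOV CR3", sensitive=True, privileged=True),
    Instruction("LGDT", sensitive=True, privileged=True),
    Instruction("POPF", sensitive=True, privileged=False),
    Instruction("SGDT", sensitive=True, privileged=False),
    Instruction("SMSW", sensitive=True, privileged=False),
]


def violations(isa):
    """Return the names of sensitive instructions that do not trap."""
    return [i.name for i in isa if i.sensitive and not i.privileged]


def is_classically_virtualizable(isa):
    """True when the sensitive instructions are a subset of the privileged ones."""
    return not violations(isa)


if __name__ == "__main__":
    print(is_classically_virtualizable(PRE_VT_X86))
    print(violations(PRE_VT_X86))
```

Any instruction reported by `violations` is one that a guest kernel could run in an unprivileged mode without the VMM noticing, which is why software techniques or hardware extensions are needed to handle it.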
Emulating the CPU
Emulating the CPU means executing the instructions of the guest OS program. This can be done by directly executing the memory that contains these instructions. We need to make sure that the guest OS cannot manipulate the system outside its own regions of memory and cannot modify sensitive parts of the host system such as the segment descriptors and memory management registers. A flaw that allows this is a vulnerability called a VM escape, because it lets the guest OS escape the isolation of the virtual machine.
The first virtualization software was based on Binary Translation (BT). It intercepted the instructions of the guest OS before execution and translated them as required. When the guest needed to execute a sensitive instruction, the translator replaced it with different instructions for actual execution and returned the data as defined by the Virtual Machine Monitor.
The extra overhead of translation, caused by the large amount of context switching between the guest and the host, led to performance degradation in binary translation.
“The Evolution of an x86 Virtual Machine Monitor” is an excellent paper from VMware that highlights the details and challenges of binary translation.
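The translation step described above can be sketched with a toy instruction set. This is only an illustration under stated assumptions: the tuple-based instructions, the `VirtualCPU` class, and the `VMM_CALL` marker are hypothetical, and a real translator works on machine code rather than on Python tuples.

```python
class VirtualCPU:
    """Per-guest CPU state kept by the VMM (hypothetical, heavily simplified)."""

    def __init__(self):
        self.interrupts_enabled = True
        self.acc = 0  # a single general-purpose register for the toy ISA


# Toy guest instructions that must not run directly on the real CPU.
SENSITIVE = {"CLI", "STI"}


def translate(block):
    """Rewrite sensitive instructions into VMM calls; copy the rest unchanged."""
    out = []
    for instr in block:
        op = instr[0]
        if op in SENSITIVE:
            out.append(("VMM_CALL", op))
        else:
            out.append(instr)
    return out


def execute(translated, vcpu):
    """Run a translated block against the guest's virtual CPU state."""
    for instr in translated:
        op = instr[0]
        if op == "VMM_CALL":
            # Emulate the effect on the virtual state only; the host's
            # real interrupt flag is never touched.
            vcpu.interrupts_enabled = instr[1] == "STI"
        elif op == "ADD":
            vcpu.acc += instr[1]
        else:
            raise ValueError(f"unknown instruction: {op}")


if __name__ == "__main__":
    guest_block = [("CLI",), ("ADD", 2), ("ADD", 3), ("STI",)]
    cpu = VirtualCPU()
    execute(translate(guest_block), cpu)
    print(cpu.acc, cpu.interrupts_enabled)
```

Harmless instructions pass through untouched, so only the sensitive ones pay the cost of going through the VMM.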
Hardware Assisted Virtualization
To overcome the poor performance of binary translation, CPU makers added virtualization support to the hardware, providing features such as hardware isolation of virtual machines and hardware paging and memory management for individual virtual machines. This enables a virtual machine to run at near-native speed. Intel VT-x introduces new instructions to x86 that enable virtualization support in the hardware.
This hardware support brings the concepts of hosts and guests into the hardware, enabling the CPU to virtualize components such as the MMU and TLBs for each virtual machine automatically. This makes full virtualization possible, and virtual machines can execute at near-native performance. The amount of context switching between the guest and the host OS decreases, and most of the guest OS executes directly on the CPU rather than via the host OS.
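The reduced switching can be pictured as a run loop in which the guest executes on its own until it needs the VMM. The sketch below is a simulation under stated assumptions, not VT-x code: the guest is modelled as a Python generator, and the exit reasons `io_read` and `halt`, the port number, and the device value are all made up for illustration.

```python
def guest():
    """Stand-in for guest code: yields only when it needs the VMM (a 'VM exit')."""
    total = 0
    for i in range(1000):
        total += i  # ordinary work proceeds without involving the VMM
    port_value = yield ("io_read", 0x60)  # device access needs emulation
    total += port_value
    yield ("halt", total)


def run_vm(guest_gen):
    """Resume the guest until it halts; return (number of exits, guest result)."""
    exits = 0
    reply = None
    while True:
        reason, arg = guest_gen.send(reply)  # 'VM entry': resume the guest
        exits += 1                           # control is back in the VMM
        if reason == "io_read":
            reply = 7  # emulated device returns a fixed, made-up value
        elif reason == "halt":
            return exits, arg
        else:
            raise ValueError(f"unhandled exit reason: {reason}")


if __name__ == "__main__":
    print(run_vm(guest()))
```

The thousand additions never reach the VMM; only the device read and the halt do, which is the property that lets hardware-assisted guests run close to native speed.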
With hardware virtualization, the virtualization extensions themselves can be virtualized, which allows recursive (nested) virtualization. The nested virtual machines run at the same level as the first guest OS rather than being executed inside the guest OS.
Emulating the memory
An operating system uses virtual memory to create address spaces and processes. An address space is a virtual, contiguous range of memory given to every process. It is realized using paging and can be as large as 128 terabytes. Implementing virtual memory purely in software is inefficient, so many CPUs come with a built-in Memory Management Unit (MMU), which assists the operating system by providing hardware page tables and translation lookaside buffers.
The x86 architecture supports virtual memory with an MMU consisting of a TLB and a hardware page table walker. The walker fills the TLB by traversing hierarchical page tables in physical memory.
The Translation Lookaside Buffer (TLB) caches page table entries that resolve to physical addresses. As the TLB fills up with entries, system performance improves, because the CPU less frequently pays the penalty of traversing the page table structures to load an entry into the TLB.
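As a rough sketch of what the walker does, the example below splits a 48-bit x86-64 virtual address into four 9-bit table indices and a 12-bit page offset, then follows the levels one by one. The nested dictionaries standing in for page tables and the function names `split` and `walk` are assumptions made for illustration; real page tables are arrays of 64-bit entries with permission bits.

```python
PAGE_SHIFT = 12   # 4 KiB pages
INDEX_BITS = 9    # 512 entries per table
LEVELS = 4        # PML4, PDPT, PD, PT


def split(vaddr):
    """Return ([pml4, pdpt, pd, pt] indices, page offset) for a 48-bit address."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_SHIFT + level * INDEX_BITS
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


def walk(root, vaddr):
    """Follow the four table levels; return a physical address or None on a fault."""
    indices, offset = split(vaddr)
    node = root
    for idx in indices:
        if idx not in node:
            return None  # entry not present: this would raise a page fault
        node = node[idx]
    return (node << PAGE_SHIFT) | offset  # leaf holds a physical frame number


if __name__ == "__main__":
    tables = {1: {2: {3: {4: 0x55}}}}
    vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
    print(hex(walk(tables, vaddr)))
```

Each translation that misses the TLB costs up to four dependent memory reads in this scheme, which is why caching the result matters.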
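The effect of the TLB can be illustrated with a small cache placed in front of a slow lookup. This is a sketch under assumptions: the `TLB` class is hypothetical, a flat dictionary stands in for the page table walk, and least-recently-used eviction is chosen for simplicity rather than because any particular CPU uses it.

```python
from collections import OrderedDict


class TLB:
    """Small LRU cache of virtual page number -> physical frame number."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # flat dict stands in for the page walk
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vpn):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        frame = self.page_table[vpn]       # slow path: walk the page tables
        self.entries[vpn] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return frame


if __name__ == "__main__":
    tlb = TLB(capacity=2, page_table={0: 10, 1: 11, 2: 12})
    for vpn in [0, 0, 1, 0, 2, 1]:
        tlb.translate(vpn)
    print(tlb.hits, tlb.misses)
```

Repeated accesses to the same pages are served from `entries` and skip the walk entirely.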
The VMM virtualizes the CPU’s MMU in software using the virtual memory mechanisms of the host OS. The guest OS uses the MMU to create virtual memory systems just as it would on a physical machine, and the VMM-managed MMU keeps the memory of the guest OS isolated from the host OS, implicitly providing security and isolation. Virtualizing the MMU in software is a difficult task, because it involves using the host’s virtual memory mechanisms to build a TLB in software. Software TLBs are built using Shadow Page Tables. This approach is inefficient and is one of the reasons why software virtualization performs poorly.
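The idea behind shadow page tables can be sketched as the composition of two mappings: the guest’s own table (guest-virtual to guest-physical) and the VMM’s table (guest-physical to host-physical). The flat dictionaries and the function names below are assumptions made for illustration; a real VMM has to trap guest writes to multi-level page tables and keep the shadow in sync incrementally.

```python
def build_shadow(guest_pt, gpa_to_hpa):
    """Compose guest (GVA page -> GPA page) with VMM (GPA page -> HPA page)."""
    shadow = {}
    for gva_page, gpa_page in guest_pt.items():
        if gpa_page in gpa_to_hpa:  # skip guest-physical pages with no backing
            shadow[gva_page] = gpa_to_hpa[gpa_page]
    return shadow


def guest_writes_pte(guest_pt, gpa_to_hpa, gva_page, gpa_page):
    """A guest page-table update is intercepted so the shadow can be rebuilt."""
    guest_pt[gva_page] = gpa_page
    return build_shadow(guest_pt, gpa_to_hpa)


if __name__ == "__main__":
    guest_pt = {0: 5, 1: 6}
    gpa_to_hpa = {5: 100, 6: 101, 7: 102}
    print(build_shadow(guest_pt, gpa_to_hpa))
    print(guest_writes_pte(guest_pt, gpa_to_hpa, 2, 7))
```

The hardware only ever sees the shadow table, and every guest page-table write costs a trip through the VMM, which is where much of the overhead comes from.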
Hardware virtualization extensions virtualize the hardware MMU and the TLB for the guest OS. Intel VT-x provides this in the form of the Extended Page Table (EPT). Intel x86 CPUs provide the %CR3 register, which contains the pointer to the hardware page table, along with a hardware walker that traverses the page table and reads the required entry. EPT virtualizes the %CR3 register through an EPT Pointer (EPTP) for each virtual machine and provides individual hardware-walked page tables.
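With EPT, translation becomes a two-stage lookup that the hardware performs without VMM involvement. The sketch below models that with flat dictionaries. The `VM` class and its field names are hypothetical, and the model is simplified: on real hardware, each guest page-table pointer read during the walk is itself translated through the EPT.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    guest_cr3: dict = field(default_factory=dict)  # guest table: GVA page -> GPA page
    eptp: dict = field(default_factory=dict)       # this VM's EPT: GPA page -> HPA page


def translate(vm, gva_page):
    """Two-stage lookup; return an HPA page, or a string naming the fault raised."""
    gpa_page = vm.guest_cr3.get(gva_page)
    if gpa_page is None:
        return "guest page fault"  # delivered to the guest OS
    hpa_page = vm.eptp.get(gpa_page)
    if hpa_page is None:
        return "EPT violation"     # causes a VM exit handled by the VMM
    return hpa_page


if __name__ == "__main__":
    vm_a = VM(guest_cr3={0: 1}, eptp={1: 100})
    vm_b = VM(guest_cr3={0: 1}, eptp={1: 200})
    print(translate(vm_a, 0), translate(vm_b, 0))
```

Both guests use the same guest-physical page number, yet each lands on a different host page because each virtual machine has its own EPT, and the guest can edit its own page table freely without the VMM rebuilding anything.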