Is your code complex, with a lot of dependencies? Are you frustrated that your code will not
compile and keeps complaining about version conflicts? This is the situation in which you need the
support of a contained environment.
Containers have a simple goal in mind: to
isolate an application and its dependencies into a self-contained unit that can run anywhere,
regardless of the hardware, shell environment and so on. From this perspective, VMs serve a similar
function, and you may be more familiar with them. Nevertheless, containers are often regarded as
the better approach, because they remove the need to emulate physical hardware and so perform
better.
To understand why containers have this advantage, we need to understand more about the
architecture underlying containers and VMs.
A VM is essentially an emulation of a physical computer, and it executes applications following
the same procedures as a physical computer would. This is because VMs run on top of a physical
machine through a layer called a “hypervisor” or “virtual machine monitor” (VMM). A hypervisor is a
piece of software, firmware or hardware that runs on a physical computer (called the host machine
in this context). The host machine provides physical resources such as RAM and CPU.
A hypervisor presents the guest operating systems with a
virtual operating platform and manages their execution. The
resources provided by the host machine can be divided up, so that multiple instances of
a variety of operating systems share them once they have been virtualised by the
hypervisor. The name hypervisor also reflects its role as a controller of operating
system kernels: a kernel was traditionally called a supervisor, “hyper” is a stronger term
than “super”, and so a hypervisor is the supervisor of the supervisor. (If you
are interested in the origin of the term hypervisor, you may find the following two items
interesting: the original paper by Harry Katzan, 1970, and a book by Clinton McIntosh, 1970.)
A VM contains both the application (i.e. the part you are interested in as a
bioinformatician) and whatever support is required to run that application (e.g. system
binaries and various dependency libraries). In addition, a VM carries an entire
virtualised hardware stack of its own (e.g. virtualised CPU, network adaptors and storage). In
other words, a VM runs a fully fledged guest operating system with its own dedicated resources
(virtualised from the host machine’s share). A VM can run on the host machine in one of two ways:
on a _hosted hypervisor_ or on a _bare-metal hypervisor_. The two are fundamentally different.
A hosted hypervisor runs on top of the operating system of the host machine. Its main benefit is that the underlying hardware matters much less, because the host operating system handles all communication with the hardware. A hosted hypervisor is therefore useful when hardware compatibility is high on the priority list. On the flip side, a hosted hypervisor inevitably inserts an additional layer between the hardware and the application. The consequences are more resource overhead, a resource-hungry application and lower VM performance.
In contrast to a hosted hypervisor, a bare-metal hypervisor improves performance by
running directly on the host machine’s hardware. In turn, a bare-metal hypervisor has to act as its
own operating system, with its own device drivers, to be able to interact with the underlying
hardware (e.g. I/O, kernel processes etc.). A bare-metal hypervisor can therefore perform better,
scale better and be more stable. Nevertheless, this comes at a cost in compatibility, given that
only a limited set of device drivers can be built into the hypervisor.
In summary, although a VM contains everything we need to run an application (a bioinformatics
application, say) and is independent to a certain degree, it has to compromise on either
performance or compatibility.
As opposed to a VM’s virtualisation at the hardware level, containers provide virtualisation at
the operating-system level by abstracting the so-called “user space”. Like a VM, a container has a
private process space, a private network interface and IP address (allowing customised routes and
iptables rules), full control over its mounted file systems, and “root” privileges when executing
shell commands. The one big difference between containers and VMs is that containers use the host
system’s kernel directly.
Given that containers package up only the user space (just the binaries and libraries are
packaged independently), and not the kernel or virtual hardware as a VM does, containers are
typically “lightweight”.
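
One way to see the shared kernel for yourself is sketched below. The snippet reports the kernel release of the system it runs on, using the `platform` module from Python's standard library. The function name `kernel_info` is made up for illustration. Assuming a Linux host, running it on the host and again inside a container on that host should print the same release string, whereas a VM would report whatever kernel its guest operating system booted.

```python
import platform


def kernel_info():
    """Return basic facts about the kernel this process is running on."""
    u = platform.uname()
    return {
        "system": u.system,    # e.g. "Linux"
        "release": u.release,  # kernel release string
        "machine": u.machine,  # CPU architecture
    }


if __name__ == "__main__":
    # Run this on the host and inside a container, then compare "release".
    for key, value in kernel_info().items():
        print(f"{key}: {value}")
```

This is only an illustration of the idea, not a reliable way to detect whether you are inside a container.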
If you are a programmer or a bit of a tech nerd, you may have already heard of Docker. Docker is a
helpful tool for packaging, shipping and running applications within “containers”. It is worth
having a clear understanding of what Docker is. Docker is an open-source project based on Linux containers (like LXC), so it naturally uses Linux kernel features such as namespaces and control groups (cgroups) to create containers.
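
To make namespaces and control groups a little more concrete, here is a rough sketch that inspects the ones the current process belongs to by reading the Linux `/proc` filesystem. The helper names `list_namespaces` and `read_cgroups` are hypothetical, and the sketch assumes a Linux system; anywhere else it simply returns empty results.

```python
import os
from pathlib import Path


def list_namespaces(pid="self"):
    """Map namespace type (e.g. 'net', 'pid') to its identifier for a process."""
    ns_dir = Path("/proc") / str(pid) / "ns"
    namespaces = {}
    try:
        entries = sorted(ns_dir.iterdir())
    except OSError:
        # Not Linux, no such process, or no permission to look.
        return namespaces
    for entry in entries:
        try:
            # Each entry is a symlink whose target names the namespace.
            namespaces[entry.name] = os.readlink(entry)
        except OSError:
            continue
    return namespaces


def read_cgroups(pid="self"):
    """Return the control-group membership lines for a process."""
    try:
        return (Path("/proc") / str(pid) / "cgroup").read_text().splitlines()
    except OSError:
        return []


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"namespace {name}: {ident}")
    for line in read_cgroups():
        print(f"cgroup: {line}")
```

Two processes in the same container should report the same namespace identifiers, while a process on the host would typically report different ones for the namespaces the container isolates.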
Docker containers have several advantages:
In the next post, we will walk through the fundamental Docker concepts and their implementation. Stay tuned.
From Robert