Phoenix@BioHub Home
The place where Biology meets SuperComputers

Containers - to contain your complex environment dependencies

Preface

Is your code complex, with a lot of dependencies? Are you frustrated that it will not compile and keeps complaining about version conflicts? This is exactly the situation where you need the support of a contained environment.
Containers have a simple goal in mind: to isolate an application and its dependencies into a self-contained unit that can run anywhere, regardless of hardware, shell environment and so on. From this perspective, virtual machines (VMs) serve a similar function, and you may be more familiar with them. Nevertheless, containers are often regarded as the superior approach compared to VMs, because they remove the need for a hardware emulation layer and so perform better.

To understand why containers are considered superior, we need to understand more about the architecture underlying containers and VMs.

Virtual Machines

A VM is essentially an emulation of a physical computer, and applications execute in it just as they would on a physical machine. This is because VMs run on top of a physical machine through a layer called a “hypervisor” or “virtual machine monitor” (VMM). A hypervisor is a bundle of software, firmware and hardware emulation that runs on a physical computer (called the host machine in this context). The host machine provides the physical resources such as RAM and CPU. The hypervisor presents the guest operating systems with a virtual operating platform and manages their execution. The resources provided by the host machine can be divided up, and multiple instances of a variety of operating systems can share them once they have been virtualised by the hypervisor. The name hypervisor also reflects its role as a controller of operating system kernels: the kernel was traditionally called the supervisor, “hyper” is a stronger term than “super”, and so the hypervisor is the supervisor of the supervisors. (If you are interested in the origin of the term hypervisor, you may find the following two items interesting: the original paper by Harry Katzan, 1970, and a book by Clinton McIntosh, 1970.)

A VM contains both the applications (i.e. the part you will be interested in as a bioinformatician) and whatever is required to run them (e.g. system binaries and various dependency libraries). In addition, a VM carries an entire virtualised hardware stack of its own (e.g. virtualised CPU, network adaptors and storage). In other words, a VM is a fully fledged guest operating system with its own dedicated resources (virtualised from the host machine’s share). The hypervisor that runs VMs on a host machine comes in one of two forms: a hosted hypervisor or a bare-metal hypervisor. They are fundamentally different.

Hosted Hypervisor

A hosted hypervisor runs on top of the operating system of the host machine. Its benefit is that the underlying hardware largely stops mattering, because the host operating system handles all communication with the hardware. A hosted hypervisor is therefore useful when hardware compatibility is high on the priority list. On the flip side, a hosted hypervisor inevitably inserts an additional layer between the hardware and the application, which means more resource overhead and lower VM performance.

Bare-metal Hypervisor

In contrast to a hosted hypervisor, a bare-metal hypervisor improves performance by running directly on the host machine’s hardware. In turn, a bare-metal hypervisor needs its own operating system layer and its own device drivers to interact with the underlying hardware (e.g. I/O, kernel processes etc.). A bare-metal hypervisor can therefore offer better performance, better scalability and enhanced stability. Nevertheless, this comes at the cost of compatibility, given that only a limited set of device drivers can be implemented at any one time.

In summary, although a VM contains everything we need to run an application (a bioinformatics application, say) and is independent to a certain degree, it has to compromise on either performance or compatibility.

Containers

As opposed to VMs, which virtualise at the hardware level, containers provide virtualisation at the operating-system level by abstracting the so-called “user space”. Like a VM, a container has its own private process space, a private network interface and IP address (allowing customised routes and iptables rules), full control over its mounted file systems, and “root” privilege when executing shell commands. The one big difference between containers and VMs is that containers use the host system’s kernel directly.
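
The private process space, network interface and mounts described above are provided by Linux kernel namespaces. As a rough illustration, the sketch below lists the namespaces the current process belongs to by reading `/proc/self/ns`. The function name `list_namespaces` is made up for this post, and the code assumes a Linux system with `/proc` mounted; anywhere else it simply returns an empty result.

```python
import os
from pathlib import Path


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable.

    Hypothetical helper for illustration; assumes Linux with /proc mounted.
    """
    ns_dir = Path("/proc") / str(pid) / "ns"
    namespaces = {}
    if not ns_dir.is_dir():
        # Not Linux, /proc not mounted, or no such process.
        return namespaces
    try:
        entries = sorted(ns_dir.iterdir())
    except OSError:
        # Listing another user's process may need extra privileges.
        return namespaces
    for entry in entries:
        try:
            # Each entry is a symlink whose target names the namespace
            # type and an identifier number.
            namespaces[entry.name] = os.readlink(entry)
        except OSError:
            continue
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:12s} {ident}")
```

If you run this once on the host and once inside a container, you would expect entries such as `pid`, `net` and `mnt` to show different identifiers in the two places, which is what gives the container its private view of processes, network and file systems.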

Given that containers package up only the user space (just the bins and libs, packaged independently), and not the kernel and virtual hardware as a VM does, containers are typically “lightweight”.
If you are a programmer or a tech nerd, you may have already heard of Docker. Docker is a helpful tool for packaging, shipping and running applications within containers, so it is worth having a working understanding of it. Docker is an open-source project based on Linux containers (like LXC), so it naturally uses Linux kernel features such as namespaces and control groups to create containers.
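
Control groups (cgroups) are the kernel feature that limits and accounts for the resources a group of processes may use. As a small sketch, and assuming a Linux system where `/proc/self/cgroup` is readable, the code below parses that file into its three colon-separated fields (hierarchy ID, controller list, cgroup path). The helper names `parse_cgroup_lines` and `read_cgroups` are invented for this example.

```python
from pathlib import Path


def parse_cgroup_lines(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy_id, controllers, path) tuples."""
    records = []
    for line in text.splitlines():
        # Format per line: hierarchy-ID:controller-list:cgroup-path
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        # The controller list is empty on cgroup v2 (lines look like "0::/path").
        controller_list = controllers.split(",") if controllers else []
        records.append((hierarchy_id, controller_list, path))
    return records


def read_cgroups(pid="self"):
    """Return parsed cgroup membership for a process, or [] if unavailable."""
    cgroup_file = Path("/proc") / str(pid) / "cgroup"
    try:
        return parse_cgroup_lines(cgroup_file.read_text())
    except OSError:
        # Not Linux, or the file is not readable.
        return []


if __name__ == "__main__":
    for hierarchy_id, controllers, path in read_cgroups():
        print(hierarchy_id, ",".join(controllers) or "(unified)", path)
```

The exact paths printed depend on the system and the container runtime, so treat the output as something to explore rather than a reliable way to detect a container.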

Docker containers have several advantages:

  1. Easy implementation.
  2. Speed. Docker containers are very lightweight and lightning fast. A Docker container typically takes only seconds to start, whereas spawning a VM requires a full boot of the virtual operating system every time.
  3. Docker Hub. Thanks to the rich ecosystem of Docker Hub, there is almost certain to be a Docker image that suits your needs.
  4. Modularity and Scalability.
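
To get a feel for the start-up speed mentioned in point 2, you could time a container run yourself. The sketch below is only a rough stopwatch: `time_command` is a helper written for this post, and the demo assumes the `docker` command is installed, the Docker daemon is running, and the small public `hello-world` test image can be pulled. The first run includes the time to download the image, so run it twice for a fairer number.

```python
import shutil
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Baseline: how long it takes to start a plain process on the host.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"plain process: exit {code}, {seconds:.2f} s")

    if shutil.which("docker") is None:
        print("docker not found on PATH; skipping the container timing")
    else:
        # Assumption: the daemon is running and hello-world is available.
        code, seconds = time_command(["docker", "run", "--rm", "hello-world"])
        print(f"docker run:    exit {code}, {seconds:.2f} s")
```

This measures wall-clock time for the whole `docker run` call, including client and daemon overhead, so it is an upper bound on the container start-up itself.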

In the next post, we will be talking through the fundamentals of Docker concepts and implementation. Stay tuned.

From Robert
