Venue: Internet
Jeff Meyerson talks to Idit Levine about unikernels and Unik, a project for compiling unikernels. The Linux kernel contains features that may be unnecessary to many application developers, particularly if those developers are deploying to the cloud. Unikernels allow programmers to specify the minimum set of operating system features they need to deploy their applications. Topics include the Linux kernel, requirements for a cloud operating system, and how unikernels compare to Docker containers.
Show Notes
Related Links
- Twitter https://twitter.com/idit_levine
- Unikernels https://en.wikipedia.org/wiki/Unikernel
- Unik https://github.com/emc-advanced-dev/unik/wiki/UniK:-Build-and-Run-Unikernels-with-Ease
- Idit Levine YouTube Video https://www.youtube.com/watch?v=UC6p_xo1Rt4&list=PLhuMOCWn4P9gGrKEtCBKYpEl5BXGBCsQZ&index=19
- Scott Weiss on Unikernels and unik http://softwareengineeringdaily.com/2016/08/11/unikernels-and-unik-with-scott-weiss/
- Bio: https://www.linkedin.com/in/iditlevine
Transcript
Transcript brought to you by innoQ
This is Software Engineering Radio, the podcast for professional developers, on the web at SE-Radio.net. SE-Radio brings you relevant and detailed discussions of software engineering topics at least once a month. SE-Radio is brought to you by IEEE Software Magazine, online at computer.org/software.
* * *
Jeff Meyerson: [00:01:19.26] Idit Levine is the CTO of the cloud management division at EMC. She has previously worked at several other companies focused on cloud computing and virtualization. Idit, welcome to Software Engineering Radio!
Idit Levine: [00:01:33.03] Thank you for having me, Jeff!
Jeff Meyerson: [00:01:35.19] It’s great to have you. Today, most of our software development takes place on virtualized operating systems. What are the different layers of this virtual stack?
Idit Levine: [00:01:51.04] If you look at this stack today, the first thing you have everywhere is the hardware itself. That’s where the stack starts. Then you have the drivers; on top of that you have something like a hypervisor, whose job is to keep virtual machines separate from each other. Then you have the virtual drivers in the guest operating system, then the kernel of the guest, whose job is to keep the different processes running on the same guest OS separate from each other. On top of that you can put a Docker container, or any container that you want. Then you usually put your libraries, configuration and everything else you need; then you have the application binary. All of this stack, at the end of the day, serves the one thing you really want to do, which is to run your application.
Jeff Meyerson: [00:02:52.10] So we have hardware, we have hardware drivers, we have a hypervisor, virtual hardware drivers, the operating system kernel, a bunch more layers… Why do we have so many layers?
Idit Levine: [00:03:05.02] That’s a good question, and the best answer for that will be evolution. In the beginning, you had only the physical machine, a big computer that the business was running; the mainframe – they were buying it, and it was very expensive. Then they said, “Well, that’s very expensive, we need to run more applications on it. We cannot run only one application on this very expensive piece of physical machine.” The job was to separate between users and applications, to make sure that two applications can run on the same machine, but they cannot harm each other, or kill the machine and then both of the applications will die.
[00:03:52.27] That was very hard. You needed to do time-sharing; several people couldn’t work with the same machine at the same time, and so on. It was very complicated, so people said, “Okay, why don’t we take one physical machine and run one application on it?” In terms of money that’s not the best approach, so then with evolution came virtual machines. With virtual machines, we can create another layer that separates those servers. They give us the same ability to run one application per server, while actually running several applications, in several virtual machines, on one physical server. That was the next layer.
[00:04:38.17] Afterwards they said, “Well, it’s too complicated. There is a lot of configuration management you need to do for these applications. You need to manage them. Very, very hard. Why don’t we find a better way to separate the applications on top of the virtual machine?” This is where containers came in. A container is very good at “Let’s just deploy this application” – it’s like a packaging tool. That’s when the last layer, the container, arrived.
[00:05:08.16] All these layers are basically evolution, but they never stop and say, “Well, does that make sense? Maybe it doesn’t make sense anymore.” No one did that. They continued to develop what existed.
Jeff Meyerson: [00:05:22.20] When we do take a step back and we say, “Do we actually need all of this? Do we need all these layers? Is there unnecessary redundancy in this stack?”, what is the answer to these questions?
Idit Levine: [00:05:34.24] In my opinion, we don’t. All those layers are abstractions. There is a lot of code, a lot of dependencies, a lot of stuff to reason about, and I feel that this is all unnecessary. Today we’re running applications, usually one per server; sometimes we’re running only functions, with Lambda and the other serverless technologies that exist today. You don’t need all of this; it doesn’t make any sense to use it. It’s overkill, and that’s what makes it more complicated, less performant; a lot of storage that you really don’t need, and you still put [unintelligible 00:06:14.00]. I really don’t think we need it.
Jeff Meyerson: [00:06:22.14] The aim of the current stack is typically to run a single application with a single user, on a single server. How does this contrast with how we are actually building and running applications when we think about it from a fundamentals perspective?
Idit Levine: [00:06:44.22] We are eventually running one application on a server. We are doing it. The problem is that this is the end result – this is the interface you put the application into at the end – but there is a very complicated stack underneath, and it limits how performant the application can be. So eventually we are doing it, and we got there by creating more and more layers. Eventually this gives us what we really need, which is to run one application. The thing is, if you took this whole stack and asked, “Wait – what do we really need in order to run this? Can we clean it up, start from a clean slate?”, you probably would not choose this architecture. It works; it’s just not the best, most efficient one.
Jeff Meyerson: [00:07:35.20] You have said that in our current model we are trading off efficiency in favor of compatibility. You’ve touched on this in this conversation already, but I really want to give the listeners a better idea. Describe what you mean by these tradeoffs between efficiency and compatibility in more detail.
Idit Levine: [00:07:59.19] What I mean is that today, when you want to run an application, you usually want to do it on a current machine. You probably don’t want to take your operating system and run it on a ten-year-old Pentium. But if you take the distro, the enormous operating system that you want, and try to do that, you will succeed. You can take a Debian distro (Ubuntu) and put it on a ten-year-old computer. The question is why?
[00:08:39.10] When this evolution of “Let’s make the stack better at running one application, with one user, on a single server” was happening, it was important, in order to make it adoptable, to keep supporting the old architecture. When you do that it works, but you pick up dependencies and limitations; you can’t do it in the best or most efficient way. The operating system community made a choice, and the choice was that it’s much more important to support all the old architecture – probably because it’s easier to drive adoption that way – than to actually start with a clean slate and ask, “What do we really need?”
[00:09:34.19] Maybe the better approach would be “Let’s think about what we really need in order to run the applications of today”, because the requirements have changed.
Jeff Meyerson: [00:09:45.09] In what ways does the Linux kernel have excess complexity? As we’ve built up these additional abstractions, as we’ve built up this additional compatibility, where are the most pertinent places where we have excess complexity?
Idit Levine: [00:10:03.09] First, let’s separate the operating system itself from the levels we put on top of it. The kernel itself ships with a lot of unnecessary drivers. If you’re running in the cloud, you cannot walk up to the machine, but you’re still going to have a USB driver. The same thing with the floppy drive – no one is using a floppy anymore, but the driver is going to be there. There are a lot of other examples of stuff that you don’t really need. Maybe you run an application that’s not even using the network, but you’re still going to have the network driver. Or you don’t need a volume, but you’re still going to have the volume driver.
[00:10:37.24] You have a lot of stuff that is unnecessary. This is the first complexity, because these are lines of code that you need to maintain, make sure work, and support. The other thing is what all these lines of code imply. For instance, before, when you ran on those big machines, you had more than one user. Because of that, you needed permission checks. You need to make sure that I, as a user, am not doing any harm to your application. That is, again, not necessary if you’re the only user. In the cloud today, that’s what we’re doing; we’re running one application. You can have multi-user logic in the application itself, but you don’t really care about it at the level of the operating system. This is also not secure. I’d prefer that no one be able to [unintelligible 00:11:26.13] SSH to a machine; it’s dangerous.
[00:11:32.12] The permission check is taking your performance down, because the operating system needs to check before it’s doing something.
Jeff Meyerson: [00:11:39.27] Can you define what a permission check is, for those who don’t know?
Idit Levine: [00:11:43.13] A permission check asks, “Who is this user?” Second of all, there are two modes in the operating system: what’s called “kernel mode” and “user mode.” When I’m running an application, I’m running it in user mode, and the only thing that runs in kernel mode is the operating system itself, the kernel. If my application makes a call into the operating system… My application will usually use some libraries that make what’s called an API call – a system call – to the operating system. When that happens, execution goes into kernel mode. The kernel catches it and says, “Is that okay? Are we writing to a place you’re allowed to write to? Does this violate any of the permission checks?”
[00:12:34.05] Maybe it’s writing to a place where another application is using that memory. So the kernel just makes sure that this is a safe call. That’s what I mean – first of all the user check, and second of all, separating applications and users from one another. Does that make sense?
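To make the user-mode/kernel-mode boundary concrete, here is a minimal Go sketch (an illustration, not code from the episode) that issues a write system call directly. On a conventional kernel, the call below traps into kernel mode, where the kernel validates the request before touching anything; in a unikernel the same operation is an ordinary function call into a linked-in library, with no mode switch.

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	msg := []byte("hello from user mode\n")

	// On Linux/macOS, syscall.Write traps from user mode into kernel mode.
	// Before doing any I/O, the kernel performs the permission checks
	// discussed above: is fd 1 open and writable for this process, and is
	// the buffer in memory this process is allowed to read?
	n, err := syscall.Write(1, msg) // fd 1 is stdout
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("kernel wrote %d bytes on our behalf\n", n)
}
```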
Jeff Meyerson: [00:12:51.28] Yes, absolutely. Explain why this model of permission checks is somewhat outdated and why it penalizes us so much?
Idit Levine: [00:13:01.25] It’s mainly not needed right now. As I said in the beginning, the purpose is to run one application, so there doesn’t have to be a virtual address space, because there is only one application; the memory is what we have. Therefore, there’s not going to be any problem with the [unintelligible 00:13:18.23], because you don’t care. You only run your application; that’s the only thing on your machine.
[00:13:25.21] If you are the only user, of course you’re allowed to do this stuff. You’re running in privileged mode; you are the only person on the machine and this is your application, so you can do whatever you want. There is a lot of unnecessary stuff. It’s not that it’s very complicated, but it’s unnecessary, and it’s taking CPU cycles that you really don’t need to waste.
Jeff Meyerson: [00:13:48.29] What is a Unikernel? How can we move this conversation towards a discussion of Unikernels?
Idit Levine: [00:13:58.03] The idea with Unikernel is just a change of approach. Until now we went into the evolution and we looked at what happened, how we created this enormous stack. Unikernel is saying, “Let’s start from the top. Let’s understand what we really need to run.” You’re taking your application and you ask yourself what the application really needs – what libraries is it using? Where do you run? Are you running on this hypervisor, on that hypervisor? Are you running in the cloud? All of these questions are important, because I don’t need all the drivers on this machine; I only need the drivers that will actually be used.
[00:14:36.24] They came to a different approach and they said, “Here’s my application, here are only the libraries and the drivers that I need. Now package it and make it a virtual machine, or a bootable image.” It’s a different approach.
Now, from having a Debian distro of 419 million lines of code, suddenly you have a very small one, like two thousand lines of code. When you have that, it’s very easy. The performance is very good, and it’s also very easy to understand what’s going on, very easy to work on, very easy to maintain. That’s the benefit of it, but it’s doing the same thing. At the end of the day, you’re running one application in the cloud, and everybody is using it. It’s that simple.
Jeff Meyerson: [00:15:26.19] When we strip away those 400 million lines of code that we don’t need and we are left with the two thousand lines of code, what are the things that we always need? What are the aspects of that Linux kernel that we’re always going to need, even though we’re running in the Unikernel?
Idit Levine: [00:15:44.01] The first thing that you need is the driver itself, because that’s what will talk to your hardware – again only the drivers that you need; if you don’t need a network, you will not have network drivers. The second thing you need is something to package and manage it. If you look at something like Rumprun, that’s what they’re doing; it’s another layer that will manage that. That can be very thin, because there’s not a lot to manage now.
[00:16:10.03] On top of it, you have to have your libraries that your application needs. Then you need your application code; the application runtime, of course, and that’s it. That’s all you need.
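As a concrete illustration of how little the application layer can be, consider a small Go HTTP service like the sketch below (an illustration, not code from the episode). Packaged by a tool in the Rumprun family, the resulting image would contain roughly this source, the Go runtime, the libraries it pulls in, and the network driver for the target hypervisor – and nothing else.

```go
// A single-purpose service of the kind that fits naturally in a unikernel:
// one process, one listener, no users, no shell, no SSH.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from a (would-be) unikernel")
	})
	// In a unikernel build, the TCP/IP stack behind this call is a library
	// linked into the image rather than a service of a full kernel.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```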
Jeff Meyerson: [00:16:30.11] Is a Unikernel still considered an operating system?
Idit Levine: [00:16:35.00] It’s an operating system, it’s a kernel. People like to say that the Unikernel is the application, but at the end of the day the operating system is something that manages it, and it is being managed. It’s just a very lightweight operating system.
Jeff Meyerson: [00:16:56.02] My application is always going to need these certain specific things – specific language runtimes, specific hardware drivers… How do I determine which of these is going to go into my Unikernel?
Idit Levine: [00:17:13.16] In terms of the drivers, you need to know what you’re running on. For instance, I’m running on Xen, so I’ll need the drivers for Xen. If I’m running on Google Compute Engine, I know that what they have is basically KVM, so I’m going to need those drivers. Or I’m running on ESXi, on bare metal, or Raspberry Pi – you need the driver for that. That’s something that the user needs to do before creating the project.
[00:17:40.02] The person needs to understand, “I want to run there; these are the drivers that I need.” Besides that, there is the wrapper itself. For instance, in the Rump kernel it’s called Rumprun – this is the tool that does this magic. It’s Rumprun there, but there are others, called something else. The idea is that there is something that knows how to build, and what it basically does is package things up with your compiler. At the end of the day, what you want is a binary. If you want to compile a Python application, for instance, the Python toolchain knows which libraries to take. So you take what the compiler gives you, which is a binary, plus the basic libraries that you need, wrapped together with the drivers. At the end of the day, you get the Unikernel itself.
Jeff Meyerson: [00:18:38.26] So I specify the things that my Unikernel is going to need. Describe the process in more detail: after I specify the aspects of the library operating system that my Unikernel needs, how does that get turned into a Unikernel that I can run?
Idit Levine: [00:19:04.10] I can go into as much detail as I can, but the idea is that there are tools for it. You don’t really do the hard work. You run some command; for instance, in Mirage we had “mirage_compile”. Then you give it the drivers that you need – it’s a command-line tool that you run. The magic itself lives down in these tools, and to be fair, this is not something that either my team or I created. We modify those tools, but the tools themselves were not created by my team or me.
Jeff Meyerson: [00:19:36.25] Talk a little bit more about how Unikernels actually work. A Unikernel runs in a single address space – what does that mean?
Idit Levine: [00:19:44.26] Let’s go back to the regular kernel. Let’s say that right now I have two processes running and one physical memory, so the kernel can do a lot of manipulation with it. Because the processes are going to context-switch a lot, what the kernel can do is run more processes, each of which thinks it’s getting real memory of its own, while the kernel actually manages a table that [unintelligible 00:20:18.13] and only gives them what they really use.
If I have two processes and right now I’m running only one – the amount of memory I have doesn’t really matter – I can take another process and make it think that it’s using the same memory and has all the memory it needs, but actually that’s not really true. What the kernel does, if it needs to give memory back to the process, is [unintelligible 00:20:46.11]. This is how it works on a regular machine.
[00:20:54.08] In a Unikernel you only have one process, so all of this [unintelligible 00:20:56.04] with all the pages and so on – you just don’t need it. You have one physical memory, the process uses it, and it really gets all of it.
Jeff Meyerson: [00:21:15.09] Are Unikernels better suited to small, single-purpose servers like DNS, for example? Could you fit an entire Rails app in a Unikernel?
Idit Levine: [00:21:28.25] You can fit everything in a Unikernel. At the end of the day it’s only your binary [unintelligible 00:21:32.03] You can take an old application, and as long as that application is not forking and not doing anything else the Unikernel does not support, you can do that. Whether it makes sense to do that is a different question.
[00:21:44.15] What’s cool about Unikernels is that the boot time is very fast, so you’ll find a lot of use cases, like in the NFV world, for instance. It’s basically a serverless function – there’s something I want to run, it’s very small, and I want it to run only when a user calls it. Because it’s very fast to boot the Unikernel, I’ll do that and then kill it immediately.
Lambda, for instance, would probably be a really good fit; it could be implemented with Unikernels.
Jeff Meyerson: [00:22:19.06] Maybe you can talk about that in a little more detail. Lambda is closely aligned with this idea of serverless architecture. Explain what Lambda is and why that makes sense in the context of Unikernels.
Idit Levine: [00:22:36.21] Let’s say you have an application. You need to write all your code, you need to manage it, you need to write an API for it, for someone [unintelligible 00:22:47.23]. There’s a lot of management that you need to do, and usually you need to package this in a container and deploy it on a machine. But a lot of the time the only thing you want is something simple; you want to run a very small function, and this function is really not doing much.
[00:23:07.21] Let’s say that every time someone puts something in S3, I want to run something. Do I really need to actually manage a server here? Do I really need to take a VM, put on all the right patches, my application, everything that I need, create an API and so on? Maybe not, because it’s kind of simple. AWS was the first that I know of that did this – actually, I [unintelligible 00:23:29.23] – but the idea is that you only spin up this functionality when you actually need it. When no user is requesting it, there’s no infrastructure or anything running.
[00:23:50.02] For instance, say you want to run something every time somebody puts something in S3. What I want is that only when someone puts something in S3, a handler spins up that runs my container or Unikernel, does something, and then dies. That’s the idea of AWS Lambda – don’t manage your server anymore; we’re going to do that for you.
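A hedged sketch of that boot-run-die model (the event-delivery mechanism here, an EVENT_PAYLOAD environment variable, is invented for illustration and is not AWS’s or Unik’s actual interface): a unikernel image whose entire job is to handle one event and exit, relying on fast boot to make per-event instances cheap.

```go
// One-shot event handler: the platform boots the image with the event
// payload, the process handles it and exits, and the VM is reclaimed.
package main

import (
	"log"
	"os"
	"strings"
)

// handle does the hypothetical work, e.g. reacting to "an object was put in S3".
func handle(event string) error {
	log.Printf("processing event: %s", strings.TrimSpace(event))
	return nil
}

func main() {
	// EVENT_PAYLOAD is a made-up stand-in for however the platform would
	// hand the event to the image.
	event := os.Getenv("EVENT_PAYLOAD")
	if err := handle(event); err != nil {
		os.Exit(1) // a non-zero exit lets the platform detect failure and retry
	}
	// Exiting halts the unikernel; the hypervisor destroys the instance.
}
```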
Jeff Meyerson: [00:24:12.07] Let’s talk about that in more detail. If I’m Amazon and I’m thinking about how I want to vend this serverless capability to my customers, should I architect it using Unikernels? Should I use containers? Should I use some combination of the two? What would be the tradeoffs between those different ways of implementing a serverless architecture?
Idit Levine: [00:24:40.11] Today what they’re doing is using containers, and because they want to save the time of actually spinning up a container, they reuse containers. But a Unikernel could be a very good fit for this, because it’s very fast to boot and very light in terms of size. If you know that every time someone puts something in S3 you need to run the same function, you build the Unikernel once and then just use it whenever you want. It will be very fast to boot, and you get the security and all the other benefits of the Unikernel. So that would probably be a better approach, but both of them will work.
Jeff Meyerson: [00:25:24.02] Could you contrast them in a more general context? How do Docker containers compare to Unikernels?
Idit Levine: [00:25:35.13] A Unikernel, first of all, is usually a VM or a bootable image, and a Docker container isn’t; it needs to run on some kernel. The kernel is something you share between all your containers. What you do put in the Docker container is all the dependencies of your application – the libraries and whatever you need to run. In a Unikernel, you also need to put in your drivers, because it’s basically a VM with a very lightweight OS that runs directly on a hypervisor or on bare metal. With Docker, you need to get that kernel from somewhere.
[00:26:17.10] Now let’s compare. Let’s say I’m running on a hypervisor – on AWS, and AWS is running Xen. Now I can either run a Unikernel or I can run a container. If I’m running a Unikernel, I’m just running an operating system, a VM on top of Xen – very easy, right? There’s nothing unnecessary there. If you’re running a container, you need to spin up some Linux machine, some operating system, and put your container on top of it. So you have the Linux kernel with all the drivers and everything, and then you put the container on top of that. That’s the real difference – a full operating system versus a very light one. But Docker will put in only the libraries that your application needs, so…
Jeff Meyerson: [00:27:17.02] How do the performance characteristics differ between Unikernels and Docker containers? Obviously, Unikernels are fast to spin up, Docker containers are fast to spin up. Are there performance differences in terms of startup time and in terms of what’s going on once the container or Unikernel is actually running?
Idit Levine: [00:27:43.06] The right question is, “How are you running the container?” If you’re running the container on bare metal with a very minimal Linux, of course there is less overhead. But most people today are running it on a hypervisor, and underneath you have the full Linux kernel. There you have all the context switches, the permission checks and all the stuff we’ve talked about before. Then you have your guest operating system, which is again another kernel, and on top of it you put your container. Basically, you have two layers of kernel.
[00:28:25.19] If you’re running a Unikernel – again, it depends where you run it. If you’re running on a hypervisor, you have only the hypervisor and then a very thin operating system. In terms of performance, you have fewer layers, so it performs better. IncludeOS has an article about how their Unikernel runs, and you can see a huge difference from a regular VM and even from a container. It performs very well. Again, it depends how many layers you have in the stack, and that’s your choice.
Commercial break: [00:29:11.10]
Jeff Meyerson: [00:29:38.02] Can you describe what happens when I deploy my Unikernel against a hypervisor?
Idit Levine: [00:29:44.23] Not much, it’s a VM. You’re putting it up, that’s it. It’s like a regular VM.
Jeff Meyerson: [00:29:54.11] Could you contrast to what would happen if you deployed a Docker container against a hypervisor?
Idit Levine: [00:30:00.00] First, you need to somehow deploy a guest OS with a kernel; that’s something that has to happen. The kernel will be a full kernel, with all the stuff we talked about that is not really necessary (permission checks etc.), and then you put the Docker container on it. In the Docker container you will have the libraries that you need. At the end of the day, you have another layer there, which is a full Linux kernel that you don’t really need. You don’t have to have it.
Jeff Meyerson: [00:30:34.19] What changes once we have Unikernels? Once we’ve removed all of these permission checks, how does our world change? What speeds up? What are the advantages that we are getting?
Idit Levine: [00:30:53.24] Basically, the operating system starts and then your application runs immediately. Then it’s your application running in kernel mode, which means you can make all the calls that you want to make; there’s no switching between kernel and user mode. There are no context switches, because there are no other processes, and there are no page faults, because there’s one single address space. Basically, all of this unnecessary stuff is not happening; what’s really happening is that your application is running, using whatever it needs, and that’s it. It’s very simple.
Jeff Meyerson: [00:31:29.21] Since the Unikernel can run directly on the hypervisor or the hardware without being in a virtualized operating system, does this mean that we can pack Unikernels more densely on this underlying substrate than we would be able to with virtualization?
Idit Levine: [00:31:49.21] It goes like this. Let’s say I’m creating a bootable image. I can run it directly on a bare-metal machine, but usually bare-metal machines have a lot of resources. The beauty of the Unikernel is that you don’t need those resources; Unikernels are very small. You won’t want to waste the huge server that you bought, so it doesn’t really make sense to run it directly on bare metal without some hypervisor.
[00:32:16.19] If you’re running it on an Internet of Things embedded device, it actually makes sense. First of all, there is not a lot of space, and space is very important; you want to give your space to your application, not to your operating system, so that makes a lot of sense. It makes a lot of sense in the world of the Internet of Things, and what we’re doing is making sure that the Unikernel runs directly on the device. There’s no hypervisor, there’s no other layer; it goes straight onto the embedded device. But if you’re running a server in the cloud, I would argue that you’ll probably want to put in some hypervisor, because otherwise you’re wasting a lot of resources.
Jeff Meyerson: [00:33:00.24] Is there a particular type of hypervisor that the Unikernel typically runs on?
Idit Levine: [00:33:08.27] When we started this process, most of the Unikernels started from a project called Mini-OS, which comes from Xen. Since then, we helped, and a lot of [unintelligible 00:33:19.13] Specifically, we support ESXi, which is the vSphere environment from VMware; we support Xen, we support KVM and QEMU, so basically you can run it everywhere. The only thing we are not working on yet is [unintelligible 00:33:38.11], and the reason is that it’s Windows-based and closed source, so we need to figure out what to do there.
Jeff Meyerson: [00:33:47.29] Are there advantages that particular hypervisors can give to the Unikernel running on it?
Idit Levine: [00:33:56.21] I don’t really think so. At the end of the day, it’s a VM. To the hypervisor, it’s a VM. What’s very important about the Unikernel itself is that you’re getting the performance of a container – sometimes it’s faster to boot than a regular container – but you’re getting the maturity of a VM, which is great. You’re getting the security of the VM and all the management that those hypervisors give you. That makes a very strong case. Also, people don’t need to change anything in their environment, which is a big advantage. Most people today are running on a hypervisor in their data center.
Jeff Meyerson: [00:34:39.29] You’ve mentioned security. What are the security vulnerabilities of the Linux kernel that Unikernels can help give us a better model to deal with?
Idit Levine: [00:34:53.02] I’ll argue that the only one is that because the Linux kernel is big – a lot of lines of code, a lot of functionality – there are a lot of points where you can penetrate it. This is the scariest thing. You can SSH into it, you can try to mimic different drivers to create a lot of mess; there is a lot of dangerous stuff you can do there. By stripping down the operating system and taking out things like SSH capability, you’re basically limiting the attack surface. A good example I like to give is a house with windows and a door: you lock them, so it’s kind of safe, but people can still get in. Compare that to a room without doors and windows – it’s very hard to penetrate. That’s the difference between a Unikernel and a regular Linux operating system.
Jeff Meyerson: [00:35:58.15] I have read that there are certain security vulnerabilities that might be picked up if you are using a Unikernel. For example, because the application is running in the same address space as the kernel, there are potential buffer overflow vulnerabilities. Is this a realistic problem?
Idit Levine: [00:36:19.06] This is funny. We had a long discussion about it, to try to understand exactly the vulnerability. What people need to understand is that eventually your Unikernel is basically a VM that’s running only one application. With this application, if someone manages to somehow get to the Unikernel and sabotage it, the application is not going to work, and the machine is not going to work. I will need to fix it and to spin up a new one, but it’s not that I’m attached to the specific OS, it’s not that I have to go and figure out what’s going on, because nothing will be affected except this process.
[00:37:09.07] A lot of the security work that’s being done in Linux is around the fact that usually you have more than one process, and you need to make sure that if one of them manages somehow to sabotage and get to kernel mode and do a lot of damage, it will not hurt the other, and therefore you need to do the separation between the modes, and so on. That’s not applicable for Unikernel, because there is only one process. Therefore, if this Unikernel is dead, it’s dead. It doesn’t sabotage any other application that’s running somewhere else, and I think this is the beauty of it.
Jeff Meyerson: [00:37:44.07] What are the types of problems in the traditional Linux kernel if you had access to the complete address space by way of this buffer overflow vulnerability? If you had the complete Linux kernel, how would that contrast with the situation that you’ve just described, where in the Unikernel if you can have access to the entire address space you’re saying it doesn’t actually matter, because the Unikernel is so restricted in its functionality? Could you contrast that – why is it a problem in the full Linux kernel and it’s not a problem in Unikernels?
Idit Levine: [00:38:24.10] We can do an exercise to understand what in the Linux kernel actually protects you. The first thing is privilege separation. But again, there is no need for it, because there is only one process running. There’s nothing to separate. The second thing is the protection rings. But again, there is one application. Even if this application for some reason tries to do something very malicious to the VM itself, the machine will die, but again, it’s not going to affect anybody else.
[00:38:59.04] If you’re looking at protected memory space – there’s only one process running inside the Unikernel, and there’s only one single address space. Memory space protection guards one process’s virtual address space from another process, so there is no need for this isolation. If you’re looking at namespaces, again, it’s the same thing – there is a single process, so it doesn’t make any sense. And if you’re looking at fine-grained access control, [unintelligible 00:39:35.10].
[00:39:42.21] What I will argue is that if someone gets inside and really manages to exploit a buffer overflow, the results will be exactly like the results on a regular VM, but not worse than that. You will lose the machine, which means you will lose the application, but you only have one application running, so at least none of the others gets attacked.
We can continue and talk about application execution enforcement… But it just doesn’t make sense when you’re looking at one process running on one machine. All of these things that the kernel gives you are not needed when you’re running only one process in one address space.
Jeff Meyerson: [00:40:22.19] Do you think public clouds (AWS, Google Compute) are going to host Unikernels? How could you see that happening?
Idit Levine: [00:40:34.28] That’s a great question. Today, if I’m running on AWS – Unik, the tool that we’ve created, runs on AWS; it supports it – but [unintelligible 00:40:45.04] the machine itself, the bootable image, the minimum that we can get is 1 GB. This is the minimum that you can get from AWS. Therefore, even if I can run in something like 52 MB, I’ll still need to pay for 1 GB. That doesn’t really make sense, so AWS is right now not the best cloud to run your Unikernel on. There are other clouds that make more sense.
[00:41:12.04] If I remember correctly, in Google there are two ways. You can either do it fixed – fixed price and size – or you can do it custom. EMC has a cloud called [unintelligible 00:41:24.29]; there they have a model called MicroVM, which means you’re paying only for what you’re using.
I’m assuming that when this becomes more popular they will adjust the pricing; they will have to. In the future it will make more sense. Today, if you run on-premise, it will be more usable.
Jeff Meyerson: [00:41:54.01] You mentioned Unik – let’s talk a little bit more about that. What is Unik?
Idit Levine: [00:41:59.19] You remember that we talked about how you actually build a Unikernel… I described it a little bit, but I didn’t go into all the details, because it’s too complicated. The bottom line is that it’s complicated, and we wanted to make sure that if there is a user who has an application he wants to run, and he understands that a Unikernel may be a better fit for it (because it takes less space, because it’s more secure, because it’s performant), it will be very easy for him to do that – because not everybody who writes an application or runs a website knows how to compile drivers.
[00:42:46.05] The idea with Unik was, let’s do what Docker did for Linux containers and make it very easy to work with. If someone wanted to take his application and run it in a container – it’s not that hard to do with Docker – we wanted to give him the same experience for running it in a Unikernel. We felt that if we did that, people would use it. I’m a big believer that this is the architecture of the future cloud and Internet of Things.
Jeff Meyerson: [00:43:14.04] How do developers actually turn their applications into Unikernels today? How does that contrast with how you intend Unik to work?
Idit Levine: [00:43:23.27] How it’s happening without Unik, or how it’s happening with Unik?
Jeff Meyerson: [00:43:27.10] Both questions. How does it contrast in Unik with what’s happening today?
Idit Levine: [00:43:31.24] The work that we did is kind of complicated. We wanted to run on ESXi; you need to figure out which drivers work, and that is not easy work. This work was done by Yuval Kohavi, and that’s what he knows – he’s debugging [unintelligible 00:43:47.22] code and crazy stuff.
You need to understand which drivers you really need there. Our luck was that Rump is based specifically on NetBSD, so since something like ESXi supports NetBSD, we hoped that if we took the drivers that had already been written for NetBSD, they would work for Rump. We did a lot of work on this, but it didn’t always work; we needed to figure out what was missing and which driver depends on which driver – a lot of work. After doing that work, we now know how to do something like this.
[00:44:28.03] What we did is automate that. We said to people, “Bring us your code. You tell us which provider you want to run on, and we’re going to compile your code in the language that you chose, with the drivers that you need, and we will create your bootable image. We will abstract all of this away for you.” But this is not easy work; we worked on it quite a lot.
Jeff Meyerson: [00:44:52.28] There’s been a lot written about how challenging it’s been for Docker to get Docker compatibility on Windows, Mac and everywhere. It sounds like with Unik it’s paradigmatically the same challenge, where you really have to get compatibility with all these different hardware systems, all these different cloud providers, all these different types of Unikernels. How do you address that compatibility? How do you manage that workload?
Idit Levine: [00:45:25.03] When we wrote Unik, the instruction to the team was, “We have to make it very pluggable.” As you said, there are a lot of providers that we want to support and a lot of Unikernels that we want to support. And to be fair, this is such a green area that we don’t know who’s going to win; maybe Mirage, maybe Rump. We don’t know, and therefore we’ve decided not to choose. We’ve built Unik to be very pluggable.
[00:45:58.20] We’ve been inspired by the Kubernetes architecture. We’ve created an interface for providers and an interface for compilers. Now, if you bring me your code, there are three things that I need to know: what type of code you have, which Unikernel you want to run it on (Rump, IncludeOS, Mirage etc.), and which provider you want to run it on. Maybe you want to run this Unikernel on Xen, or maybe you want to run it on AWS, which is also Xen – the compiler will be the same, but the provider will be AWS, because when we’re talking to the API, we talk to AWS; we don’t talk directly to Xen.
[00:46:45.22] We’ve built it in such a way that you can add support for something else. A great example is IncludeOS. IncludeOS is quite unique; I read some article about it, and they said, “How come IncludeOS is not one of the Unikernels that you support?”, so they just added support for it themselves. It was quite easy for them, because all they needed to build was a container that runs the compiler itself, and that’s it – we already had support for QEMU and VirtualBox, which they needed as providers. It’s very easy to do that if you know what you’re doing.
[00:47:24.02] As I said, if you look at the code in Unik and how we’ve built it, you’ll see that it’s very easy to extend it. It’s very easy to add more support. We’ve been doing that constantly; right now we’re working on more and more support.
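To illustrate the pluggable design being described, here is a speculative Go sketch (Unik is written in Go, but these names and signatures are invented for illustration and are not Unik’s actual API): a compiler interface that turns source into a bootable image and a provider interface that stages and runs images, so supporting a new unikernel flavor or a new cloud means implementing one interface.

```go
package plugins

// These interfaces are a hypothetical illustration of the pluggable design
// described above, not Unik's real API.

// RawImage is a compiled, bootable unikernel image on disk.
type RawImage struct {
	Path string // path to the bootable image file
}

// Compiler turns application source into a bootable image for one
// unikernel flavor (e.g. a Rump-based Go compiler, a Mirage OCaml one).
type Compiler interface {
	Name() string // e.g. "rump-go-xen", "includeos-cpp-qemu"
	Compile(sourceDir string) (*RawImage, error)
}

// Provider knows how to stage and run images on one infrastructure
// (e.g. VirtualBox, AWS, vSphere). AWS-on-Xen is a distinct Provider
// from raw Xen, because it is driven through the AWS API.
type Provider interface {
	Name() string
	Stage(image *RawImage) (imageID string, err error)
	Run(imageID string) (instanceID string, err error)
}

// Registry lets new compilers and providers plug in without touching core.
type Registry struct {
	Compilers map[string]Compiler
	Providers map[string]Provider
}
```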
Jeff Meyerson: [00:47:38.26] When you talk about Rump versus Mirage, could you explain what the differences between those two are? What is Rump, what is Mirage? You frame it as if one of these is going to win – explain what you mean by that.
Idit Levine: [00:47:56.18] There is a tradeoff here. Rumprun is a Unikernel that more or less supports POSIX compliance. That means that if your application is running [unintelligible 00:48:07.10] applications that you wrote, or Go, Node.js or any other language that you choose, I basically don’t need to modify the language itself or the code that you wrote in order to run it on Rump.
[00:48:27.25] With Mirage it’s a little bit different, and the same with IncludeOS. The reason it’s called IncludeOS is that you need to include the OS in your code, in the C++ code. That means they’re using different libraries, which are not what a regular application uses when running on Linux, so you will have to modify your application. But because they’ve built it specifically, and [unintelligible 00:48:49.25], you can get better performance.
[00:48:53.24] It’s basically a tradeoff. If you write your application in OCaml, which is the language Mirage supports, you’ll probably get better performance, but you will need to use OCaml, and not a lot of applications today are written in OCaml. It’s a good question what people will prefer. I think they’ll prefer Rump over Mirage, because they don’t need to modify the code. But there is a lot of community around Mirage, and less around Rump, so eventually maybe Mirage is going to end up adding more support for things. We will see, I don’t know.
Jeff Meyerson: [00:49:29.06] Great. Idit, where can people find out more about Unikernels, and about you and your work?
Idit Levine: [00:49:37.05] My team does a lot of things, but this is one of the things we did. If you go to github.com/emc-advanced-dev, you’ll see a project there called Unik. That’s where you’ll find the Unik code, and you can install it and run it. It’s very simple and beautiful. Our e-mails are there, so you can reach out to us.
You can reach out to me on Twitter, @Idit_Levine. You will find me, just google me.
Jeff Meyerson: [00:50:14.28] Great. Why is EMC focused on Unikernel? You work at EMC – what is the business value there?
Idit Levine: [00:50:24.03] I work at EMC, and I’m the CTO of the Cloud Management Division. We’re basically changing the way EMC thinks. We have something called the EMC Dojo, where we’re doing pair programming and test-driven development. We’re focusing a lot on Cloud Foundry and other tools. Inside this group, I picked two developers to work directly for me, and we’re doing advanced development. The Unik project came out of that. Yuval Kohavi, Scott Weiss and myself are responsible for Unik.
[00:51:02.04] It started as an advanced development project, but it succeeded more than we expected. We actually knew that it would succeed, but it succeeded more than EMC probably expected. Now there are a lot of use cases we can think of that EMC can leverage. The main focus was to change perception – to show EMC as an innovative company, head-to-head with Docker and other companies. Besides that, this is cool technology, and this is what we have to do now to lead.
[00:51:35.15] There is no real business value right now, but after we did it we discovered that there are a lot of options to put it inside EMC etc. There are a lot of options that we see.
Jeff Meyerson: [00:51:54.20] Idit, thanks for coming on the show. It’s been a real pleasure talking to you.
Idit Levine: [00:51:59.08] Thank you so much for having me!