Andy Suderman, CTO of Fairwinds, joins host Robert Blumen to talk about standing up a Kubernetes cluster. Their discussion covers build-your-own versus managed clusters provided by cloud services, and how to determine the number of Kubernetes clusters an organization needs. Andy describes best practices for automating cluster provisioning, and offers recommendations about customizations and opinionation of cloud service providers, choice of container registry, and whether you should run complementary services such as CI and monitoring on the same cluster. The episode also examines the day 0/day 1/day 2 lifecycle, cluster auto-scaling at the cloud service level, integrating stateful services and other cloud services into your cluster, and Kubernetes Secrets and alternatives. Finally, they consider the container network interface (CNI), ingress and load balancers, and provisioning external DNS and TLS certificates for cluster services.
This episode is sponsored by Miro.
Show Notes
- Fairwinds
- Andy Suderman blog posts on Fairwinds
- Kubernetes Ingress documentation
- Kubernetes External DNS
- Kubernetes Cert Manager
- AWS EKS Service
- Google GKE
- Azure AKS
- Kyverno Policy Engine
Transcript
Transcript brought to you by IEEE Software magazine and IEEE Computer Society.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Robert Blumen 00:00:19 For Software Engineering Radio, this is Robert Blumen. Today I have with me Andy Suderman. Andy is the CTO of Fairwinds, a Kubernetes service provider. He’s previously held roles as SRE, principal engineer, and director of R&D and technology. He works with infrastructure spanning major cloud providers and verticals. He is a graduate of the Colorado School of Mines. Andy, welcome to Software Engineering Radio.
Andy Suderman 00:00:46 Thanks for having me.
Robert Blumen 00:00:48 And today Andy and I will be talking about setting up and managing a Kubernetes cluster. We’ve done a few episodes on Kubernetes already, 446, 334 and 319, and it was mentioned in 440 on GitOps. We also have some recorded content on Kubernetes coming up that doesn’t have an episode number yet, so we’ve covered it quite a bit. I’d like to just do one background question. If you could give a really brief synopsis of what Kubernetes is and what problem it solves, then we’ll be talking more about how to set it up.
Andy Suderman 00:01:23 Yeah, sure. Happy to. So Kubernetes at its core is a container orchestrator. We use it to run containers across multiple machines and do lots of things with containers. So at its heart, it’s an API that allows us to describe the desired state of containers running across multiple machines. So that’s probably the simplest way to define Kubernetes and how we think about it.
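To make the "desired state" idea concrete, here is a minimal sketch of a Kubernetes Deployment manifest (the names, namespace, and image are hypothetical, not from the episode). You declare how many copies of a container you want, and the control plane works to keep that many running across the cluster's machines.

```yaml
# Hypothetical Deployment: declare the desired state and let Kubernetes reconcile it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: demo
spec:
  replicas: 3                 # desired state: three copies of the container
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25   # any container image
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```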
Robert Blumen 00:01:45 So I wanna start out with, let’s say an organization has decided they want to migrate to Kubernetes or adopt Kubernetes as their orchestration platform. How did that conversation go to get to the point and what alternatives did they consider and rule out?
Andy Suderman 00:02:03 I think it’s a really interesting way to ask that question because most of the time I get asked, what should we think about when we’re moving to Kubernetes? People have already made the decision. I think it’s important to think about the reasons why. So, lots of different alternatives to consider. I think one of the biggest things to think about with moving to Kubernetes is taking on complexity. You’re adding so many layers of complexity to your stack. Do you really need that level of customization? Do you need that level of control? Are you building a platform on top of that? Are you serving multiple teams and multiple apps? If you just have one app and it’s already containerized, and you don’t need a ton of control over how it’s run, maybe don’t use Kubernetes and use something like Cloud Run or Fargate on EKS or one of the many other ways to run containers. So I think thinking about the balance of complexity versus features that you get from running Kubernetes is super important.
Robert Blumen 00:02:59 I’m gonna ask you a question where the answer’s gonna be “it depends,” but do the best you can. A medium-sized organization that has some different products and they want to go all in on Kubernetes: how many clusters are they gonna end up with, and what are the driving factors in determining when you can run certain things on the same cluster and when you need a new cluster? And how much overhead is there for each cluster?
Andy Suderman 00:03:27 Yeah, this is a question we get a lot, and the answer is almost always two. You need one non-production cluster and one production cluster. And beyond that, Kubernetes has so much built-in ability to segment workloads in different ways and control who has access to what that it’s very uncommon, especially in a small to medium-sized organization, to need more than just the non-prod and the prod cluster. You have to have that separation between non-production and production because you need to be able to test changes that are cluster-wide, and you can’t safely do that in production. I have seen companies run massive single clusters for the entire organization, prod and non-prod, and that usually turns into a bit of a disaster. So, things to think about when you’re segmenting workloads: are they particularly noisy in one particular area of resource utilization? There are different ways to segment that out, but sometimes a separate node group is necessary. You should always utilize namespaces as much as possible because they give you a very cheap segmentation line to draw between different areas in your clusters. I think I hit all the points of the question.
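As a sketch of the namespace-based segmentation described here, the following hypothetical manifests create a namespace for one team and cap its total resource consumption with a ResourceQuota (the names and limits are illustrative, not taken from the episode):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments          # hypothetical team namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "10"         # cap total CPU requested in this namespace
    requests.memory: 20Gi
    pods: "50"
```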
Robert Blumen 00:04:28 Yeah. Now, my understanding, and maybe I’m wrong about this, is that Kubernetes is single-region?
Andy Suderman 00:04:35 Generally that is the case. Most implementations of Kubernetes allow you to run multiple availability zones in the same region, but running cross regions is generally not recommended, mostly because of network transit issues and not being able to sort of make the cluster be completely aware of what network topology looks like between different segments of the cluster.
Robert Blumen 00:04:57 If I have a product and I wanna run it in multiple regions, that would imply I’m gonna need one cluster per region. Is that correct?
Andy Suderman 00:05:05 That’s typically how we recommend folks do it. I’ve seen solutions, especially in Google where networking is a little bit flatter, where you can run multi-region clusters, but typically we run one per region.
Robert Blumen 00:05:18 A small company starts because they have one product idea, so you put that out on your Kubernetes cluster. Now take a medium-sized company that has multiple products. Are you going to run several products all on your same prod cluster, or are there other kinds of considerations? Maybe you could include in your answer why you’d need to put each product on its own cluster, or maybe not, maybe not all in one.
Andy Suderman 00:05:45 Yeah, yeah. So typically, like I said earlier, we recommend all prod workloads in a single prod cluster. This is just from a complexity and overhead standpoint, right? Each additional cluster, you have to keep things up to date, you have to update the cluster itself. Now, most of the reasons that I see for segmenting products between clusters are at the business level. I need to maybe keep all of my workloads for one product in a specific AWS account so that I can do much easier billing segmentation and understand which product costs more. And so usually I think about cost allocation and things like that when I think about running multiple clusters, just to simplify that. Now, there are plenty of tools to do that stuff in a single cluster, although it’s much more complex to split a shared cluster up from a cost perspective and from an effort perspective.
Robert Blumen 00:06:34 You have multiple services you’re gonna be running on this cluster that could include things like CI/CD that is deploying things onto the cluster and you’ve got your dashboards and monitoring that monitor the cluster. Do you put it all on your dev cluster? So we’re going to use CI on dev to deploy on dev and monitor it from dev? Or is there ever a reason why you want to put monitoring and alerting or other functions on their own cluster so you can have resiliency or manage things separately?
Andy Suderman 00:07:08 Yeah, it’s an interesting question. I think the first thing that I pick out with that question is the assumption that you’re running your CI/CD and your monitoring in-cluster. I think typically for a small to medium-sized organization, it makes much more sense to pay an outside vendor to do those things for you. So we’re heavy users of Datadog, we’re heavy users of CircleCI, there’s lots of CI/CD systems out there. And so if it’s not your core competency and you don’t wanna have a team that has to manage those things, don’t run them yourself and don’t run them in Kubernetes. Now, if you are gonna run them, there are arguments to be made for running a third sort of management cluster or tooling cluster that will allow you to run those bits in a separate fashion and then just have all the other clusters report up to them, and things like that.
Andy Suderman 00:07:54 CI/CD workloads can be especially troublesome in Kubernetes because they’re short-lived job style workloads that can consume a ton of resources really fast and then go away. So at the very least, a separate node group for those sorts of things. And then the question of prod versus non-prod with your CI/CD system is an interesting one. Typically it’s probably easiest to have one per environment, but then you’ve got the management overhead of running your CI/CD system twice. So what does that look like? Maybe a separate cluster is justified in this case. And as you said earlier, the answer always includes a depends.
Robert Blumen 00:08:31 Absolutely. That’s the catch-all answer for everything. Now I want to move on to some of these strategic decisions, now looking at setting up a cluster. At least two of the options I’m aware of are: you build it yourself, or you use a managed cluster offering from one of the cloud service providers. Amazon and Google, I’m aware, have managed Kubernetes offerings. Is there ever any reason to build your own now, or would you always let somebody else build it for you?
Andy Suderman 00:09:04 The answer is almost always let somebody else build it for you. We’ve run clusters since before EKS existed, and we ran kOps clusters, and that works and it’s fine, but it’s just so much more management overhead. The only time that I say build your own cluster is when you have a really specialized use case that requires you to run a very specific configuration of your control plane. And honestly, those configurations are very rare. I can’t actually think of good examples anymore. There used to be several good examples, but they’ve all been incorporated into the Kubernetes control plane, and there are options that you can just use. You don’t have to enable them specifically. So it’s very rare that I recommend running anything other than your cloud provider’s managed control plane.
Robert Blumen 00:09:51 We recently did episode 571 on multi-cloud governance. The topic discussed there is how the definition of what is the cloud is becoming less clear. There’s the old joke about the T-shirt that says the cloud is someone else’s computer, but there are emerging technologies where you can incorporate hardware you own into one of the cloud service provider’s managed scope. If you are in a situation where you own a bunch of your own on-prem computers, are you now obliged to build your own cluster there or can you get a vendor to manage a cluster for you and you bring your own hardware?
Andy Suderman 00:10:33 That’s a great question. And I’ll be honest, I haven’t done any on-prem hardware in five and a half years, since my last role working at ReadyTalk. But I have heard good things, or interesting things at least, about some of the managed offerings that allow you to incorporate your own hardware into a Kubernetes cluster. And from my perspective as a cloud expert, that feels like the best way to work with an on-prem-to-cloud migration if that’s the long-term goal of that situation. But if you are running your own internal hardware, I know there are other options as well from companies like VMware to run Kubernetes on that hardware. So in general, managed is probably the best way to go. Building your own control plane from scratch is a lot of overhead, frankly.
Robert Blumen 00:11:21 I was surprised when I got exposed to Kubernetes by how much is not in the base layer, how many components you have to add to get to the point where you have a functioning cluster. For some of those components you may not really care that much which one is used, to give one example, which DNS provider, as long as it works. How opinionated are the cloud service providers’ managed offerings? How many decisions do they make for you to get to that point where you have an integrated, workable system?
Andy Suderman 00:11:53 Yeah, so you mentioned the DNS provider. That one’s a little bit interesting because it’s core to Kubernetes. It’s the heart of service discovery in Kubernetes. You can’t really run Kubernetes without a DNS provider. So in that particular instance, the cloud providers are very opinionated. But as soon as you get beyond that point, they become less opinionated. They give you an API and you can run whatever you want on top of that, including different CNIs – container network interfaces – different storage drivers, and different options for just about everything. And so in all of the standard Kubernetes offerings, I’d say they’re not very opinionated in any way. You start getting into things like GKE Autopilot, then you’re allowing the cloud provider to make decisions for you and get opinionated, which for some companies is the right choice in order to reduce that level of complexity. But in general, it’s just an API, a Kubernetes API. And then beyond that, you install the rest of your, we call them add-ons.
Robert Blumen 00:12:49 You said a couple things that I want to follow up on. GKE Autopilot, say more about what that is.
Andy Suderman 00:12:55 So GKE Autopilot is sort of a more locked-down version of GKE. There’s a lot of policy and rules associated with how you can deploy to it. There are limitations on what you’re allowed to deploy. For example, you can’t deploy anything to a GKE Autopilot cluster without a CPU and memory request. And then there are certain rules about how big they have to be, how small they can be. For a long time they didn’t really allow the creation of any CRDs – custom resource definitions. I think that has since changed, but it’s sort of a guardrails-included version of GKE.
Robert Blumen 00:13:29 You mentioned the CNI first. What does that stand for and what is it?
Andy Suderman 00:13:33 Yeah, the container networking interface is the software defined network layer that all of your pods and thus your containers will run inside of. Now what that looks like is very different from CNI to CNI. We’ll take EKS for example, because it’s the one that we use most often. By default you get the AWS VPC CNI, which uses an AWS network interface on each instance for the pods. And so you get actual in VPC routable IP addresses for each pod if you choose to do it that way. And there’s a lot of other examples out there. The original one that most folks are probably familiar with is flannel, and then there’s Calico on top of that and then there’s Cilium, there’s a whole bunch of options out there.
Robert Blumen 00:14:20 If you are running on a cloud service provider, is there ever a situation where you’re gonna want to use a different CNI than the one that is built into the service provider’s managed offering? Or did they pretty much get it right for their situation and you should move on and operate your business?
Andy Suderman 00:14:39 That’s a really tough question to answer. I think in most cases that’s true. There are limitations to all of them. The popular one that folks like to cite on the AWS VPC CNI is that it eats a lot of IP addresses: because you’re giving an IP address to each pod, there’s a lot of IP overhead. And so in an IPv4 space, you can run out of IP addresses in a smaller-sized VPC pretty quickly. So that’s one downside to consider. If you’re running thousands and thousands of small workloads, maybe coming up with an alternative strategy for managing those IP addresses is important. I’d say for the, you know, 85, 90% use case, whatever the cloud provider gives you is going to be the most straightforward, and they’re gonna have the most expertise in it and give you the most support on it. If you go and install Cilium on top of AWS EKS, then a lot of times you’ll go to AWS support and they’ll be like, well, you’re running Cilium, go talk to the Cilium folks. We can’t help you.
Robert Blumen 00:15:34 I’m gonna guess you’ll say yes to this. Should you use the service provider’s container registry as the cluster container registry?
Andy Suderman 00:15:42 I don’t know that that’s necessarily a hard yes. I think it can make things easier for you, for sure. If you have a multi-cloud strategy, definitely not; go with something centralized that you can manage from one place. If you’re already paying Docker, Docker Hub isn’t a terrible option. You get additional benefits from using something like Quay, where you get container scanning, although the cloud providers are starting to add that now too. That’s very much a how-do-you-wanna-store-your-artifacts question and not a Kubernetes question, in my opinion. It’s more of a traditional software question, like where are we gonna keep our artifacts? Do we have an Artifactory instance already? Well, maybe we should use that as our registry. Do we have something else going on that makes more sense? It’s not a horribly complex question because it’s an OCI registry, it’s an artifact store.
Robert Blumen 00:16:32 And if you have Artifactory, are you gonna run that on Kubernetes or where would you run it, if not?
Andy Suderman 00:16:39 Good question. If you have Artifactory, you’re probably already running it somewhere. Maybe it doesn’t make sense to change that. Maybe it makes sense to move it into Kubernetes just from a management perspective: we’re gonna manage all of our things on Kubernetes. There’s a whole slew of articles out there on, you know, should I move everything to Kubernetes or should I not? You’ve got a whole stateful question there with Artifactory: is it keeping its artifacts on disk? And maybe we don’t necessarily wanna run that in Kubernetes. I haven’t run Artifactory in a long time, so I’m not an expert on that specific use case. But questions about storage and things that are typical of running any app in Kubernetes would be applicable.
Robert Blumen 00:17:17 Andy, reading about this space, I see a lot of this day zero, day one, day two. What are those days and what happens on each one?
Andy Suderman 00:17:28 That’s an interesting question. Our marketing folks would tell me to start moving away from that terminology because it’s a little bit antiquated perhaps, but I think the heart of it is really thinking about your level of maturity within Kubernetes, or within any system. The FinOps Foundation likes to use the terminology crawl, walk, run. I think that’s a great way to describe the same thing. Day zero, you don’t have a cluster, you don’t know anything about Kubernetes. Maybe you don’t even have containerized applications, although that’s becoming very rare these days. And so you just need a cluster, and you don’t need all this complexity, you don’t need additional features or things like that. You just need to learn how to get an app into Kubernetes, get it running, and keep it running reliably. When we start talking about day one and day two, which often get lumped together, pretty quickly we start to think about more advanced topics like: how am I enforcing policy in Kubernetes? How am I optimizing resources in Kubernetes? How am I deploying to Kubernetes in a more efficient manner, or am I deploying correctly? And then we start thinking more about security and things like that as well.
Robert Blumen 00:18:30 One of the things that drives the adoption of Kubernetes or any kind of scheduled orchestration is it’s very good at scaling individual services up or down. So you can optimize your resource spend, but if your cluster also could not scale up or down, you might end up with a lot of virtual machines that you’re leasing that aren’t doing any work. Do the managed service providers offer integration with their own VM auto scaling so you can scale the cluster itself up or down?
Andy Suderman 00:19:03 Yes, absolutely. We consider the ability to autoscale the cluster a core ability of Kubernetes, and we run it everywhere that we run Kubernetes. It varies from cloud provider to cloud provider. In EKS, at its heart, the nodes are run as autoscaling groups. So if you’re familiar with those, you can use the sort of standard ASG scaling mechanisms. Those aren’t necessarily aware of Kubernetes in any way. So there’s a couple of other projects on top of that that can work a little bit better. There’s a Kubernetes repo called autoscaler that includes the cluster autoscaler. That is a fairly straightforward add-on that you can run in your cluster. It works with most if not all of the major cloud providers. And what it does is it watches for the need for a new pod. So when you spin up a new pod, the scheduler tries to say this pod goes here in the cluster, based on the resources that it’s requesting.
Andy Suderman 00:19:57 And if it can’t find a node to put that on, then the cluster autoscaler will generate a new one. And likewise, over time it will watch for empty nodes and scale them down. And that’s a fairly simple and unsophisticated, I’m putting air quotes around unsophisticated, it’s relatively complex, but it’s not super aware of the topology of the cluster when it does this. It’s just: do I need a node or do I not? There are other projects out there like Karpenter, which is a newer one, currently for AWS clusters, that sort of replicates the scheduler and runs multiple scenarios to see what type of node it should be adding, and whether it can compact the cluster into a smaller group of nodes. And so that’s a popular one in AWS right now. And then in GKE you get autoscaling for your node groups out of the box. It’s just included. You can turn it on from the console if you want; you can say minimum nodes, maximum nodes, and it works using that similar cluster autoscaler logic that I talked about first. And then the other cloud providers, I’m not intimately aware of their built-in abilities, but the cluster autoscaler works with all of them, and we’ve been using the cluster autoscaler for five or six years now, since the early days of Kubernetes.
Robert Blumen 00:21:08 In your Kubernetes requests, you can say that a particular service needs a certain amount of memory or number of cores, but it can also have specialized requests, like it needs to run on a node that has SSDs or GPUs. Are these cluster autoscalers scheduler-aware, so that you’ll probably get the right kind of nodes you need for the workload that needs to launch?
Andy Suderman 00:21:31 So that’s true of the more modern ones like Karpenter. Karpenter’s very good at this. One of its main advertised features is that it sees all of those various requests about node types and GPUs and things like that, and it will attempt to pick a node for that workload. The traditional cluster autoscaler is not really aware of those, and so you have to be careful about making sure that you’ve organized your node groups in such a way that if I need GPUs, I have a node group that has GPUs available, and I use a node selector that forces the workload to be scheduled on that type of node. And then the cluster autoscaler can scale that group to accommodate more pods. But you have to make sure those nodes are sort of available already, or that node group type is available already. Whereas Karpenter will just pick a new node out of its list of nodes, which by default is every node type in AWS, which you might want to tune a little bit, but it will do just about anything you ask it to. So it’s a little bit more intelligent that way.
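A sketch of the node-selector approach described for the traditional cluster autoscaler: the pod requests a GPU and pins itself to a GPU node group via a node selector, so the autoscaler knows to grow that group. The label key, image, and values are hypothetical, and the nvidia.com/gpu resource assumes the NVIDIA device plugin is installed on those nodes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job                    # hypothetical GPU workload
spec:
  nodeSelector:
    node-group: gpu                     # hypothetical label applied to the GPU node group
  containers:
    - name: train
      image: registry.example.com/train:latest   # hypothetical image
      resources:
        requests:
          cpu: "4"
          memory: 16Gi
        limits:
          nvidia.com/gpu: 1             # requires the NVIDIA device plugin on the node
```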
Robert Blumen 00:22:30 It sounds like for the problem of auto-scaling the cluster, you would really need to autoscale each node group somewhat independently of each other node group, although there may be some services that could run on more than one node group. It sounds like a complicated problem.
Andy Suderman 00:22:48 It definitely is, and that’s why Karpenter was created: to sort of solve a lot of those issues with the original cluster autoscaler and make that process easier.
Robert Blumen 00:23:47 Now let’s say we’re going ahead, we’re gonna have the two clusters you recommend. Maybe we’re multi-region, so maybe we end up with five clusters because prod is in three regions. What kind of tooling are you going to use to spin up the clusters? Do you recommend an infrastructure-as-code approach?
Andy Suderman 00:24:07 Absolutely. Huge advocate of infrastructure as code. We use Terraform, we use Pulumi in some places. I know there’s a bit of drama with a capital D in the Terraform community right now, but infrastructure as code is pretty much an absolute in our world. We typically use the cloud-provider-agnostic tools such as Terraform because we operate across multiple clouds. But I do know some folks that are strictly running in AWS that love CloudFormation. Never been a huge fan personally, but I’m always multi-cloud so I don’t really get a choice.
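The episode recommends Terraform or Pulumi; purely to illustrate what "cluster as code" looks like in YAML, here is a hypothetical eksctl ClusterConfig (a different tool than the ones named above, shown only because it expresses the same declarative idea; names and sizes are placeholders):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-us-east-1        # hypothetical cluster name
  region: us-east-1
managedNodeGroups:
  - name: general
    instanceType: m5.large
    minSize: 2                # bounds the cluster autoscaler can work within
    maxSize: 10
    desiredCapacity: 3
```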
Robert Blumen 00:24:39 I want to talk a little bit more about stateful applications, but let’s assume for the moment you have a stateful application and all your state is in something that’s durable like a database or a storage mount. Do you look at the Terraformed cluster as an ephemeral resource, where you could lose it and then rebuild it from scratch with your Terraform if need be, or if you decide to expand into a new region, you could essentially spin it all up with a minimal amount of work?
Andy Suderman 00:25:10 Yeah, that’s pretty much exactly how we treat our clusters. We typically try to keep state out of them as much as possible, and that’s a very valid DR strategy – a disaster recovery strategy – if you’re not planning to have a warm standby or something like that. If your cluster is completely stateless and you can recreate it from your infrastructure as code in minutes, then having a hot standby cluster or a failover cluster may not be necessary, depending on your disaster recovery needs.
Robert Blumen 00:25:38 Were you ever in a situation where either you lost a cluster and you had to rebuild it or you were doing a DR and you were doing exactly what we just said?
Andy Suderman 00:25:47 We practice that scenario every year. We’re moving towards quarterly, but we do try that scenario out on a regular basis just to validate that we can do it. So I think I’m lucky enough, knock on wood to say that I haven’t had to do it in a live situation before. A full regional outage is a very rare occurrence, thank goodness. So I don’t think I’ve done it on the fly, but we definitely practice it.
Robert Blumen 00:26:12 Did you uncover anything like, oh, there’s that one thing and someone changed it but it didn’t get automated, or something that needs to be changed that’s outside of our automation?
Andy Suderman 00:26:23 That’s exactly why we practice it and why we want to do it every quarter because every time we do it we find some rough edges where the deploy process changed or we missed the spot that we need to change the region or something along those lines. So practicing those DR drills is super important to make sure that you catch those edge cases. Each time we do it, the list gets smaller and we get a little quicker at it. So it definitely takes practice though.
Robert Blumen 00:26:47 I don’t know if you would agree with this, but I read someone’s opinion that Kubernetes was really developed to run stateless applications and the stateful part was a bit of an add-on. It is true that Kubernetes does not have any native method for offering state, so you end up importing something from your cloud service provider. Can you talk about what some of the approaches are for obtaining state from the cloud service?
Andy Suderman 00:27:13 Yeah, definitely and I would totally agree with that. I think Kubernetes was designed originally to run a standard stateless API, your simplest use case is kind of what it was built around and the stateful stuff’s gotten a lot better, but I still generally recommend folks use their cloud provider for maintaining state and that depends on what kind of state you need. In our case it’s mostly databases. And so in that case you’ve got your RDS or your Google Cloud SQL to run your database and then there are best practices around all of those services for running them highly available with backups and snapshots and all of those good things to make sure that you don’t lose data. But then you also have your object stores. So we make heavy use of S3 as well for doing object storage. And then beyond that you’ve got NFS, right? You’ve got your EFS stores that can be beneficial in some ways if you need shared storage, but also performance can be lacking. So there’s a ton of different options for storage from every cloud provider and almost always you can find one that’ll do what you need to do.
Robert Blumen 00:28:18 So you’ve got your cluster up, you’ve got some stuff deployed on it, and you want it to become visible to the outside world so customers can use it. What are the additional steps and add-ons to get to that point? And I should also mention you’re probably running inside a private VPC, so you may need to do things both in Kubernetes and at your cloud service provider level.
Andy Suderman 00:28:41 Yeah, so this is where your add-ons come into play. We call them add-ons; I don’t know if that’s a common term honestly, but I’ve been talking about this topic for a long time. I think one of the earliest blog articles I wrote about Kubernetes was about all the stuff you need to make it run for you. And so there’s this group of applications that I personally call the trifecta, because I love it so much, because I used to have to run all these things manually in a data center and these three things together make all of that go away. And so the three things are external DNS, which is an automation tool for updating your cloud provider’s DNS records to point to your applications in Kubernetes based on the Kubernetes objects themselves. There’s cert-manager, which uses the ACME protocol, and you can hook it up to Let’s Encrypt to do automated certificate generation and rotation.
Andy Suderman 00:29:32 So by default it’ll generate a 90-day certificate for your applications and renew it every 60. And then the third one is an ingress controller of some kind. And so in Kubernetes there’s the concept of an ingress, which is a built-in API object. And that object itself doesn’t do anything unless you have a controller to satisfy it, essentially. And so there’s lots of different ingress controllers out there. Most of them are based on technologies you might be familiar with outside of Kubernetes, like NGINX or HAProxy or Traefik. We typically recommend starting out with the NGINX ingress controller, or the project called ingress-nginx, which is very confusing naming. But essentially what it does is it creates a config for an NGINX proxy that’s running in the cluster to route traffic to your pods based on that ingress definition that you create.
Andy Suderman 00:30:28 And that will also trigger those other two projects to do their work. So essentially the end result of these three products together is that when I create a service in Kubernetes, I write about 20 lines of YAML to define an ingress object that says this is the host name that I want, this is the pod that’s servicing that service. And what you’ll get out of the box is a route through a load balancer, a DNS name, and a certificate to go with it. So it automates all of that extra stuff around deploying a service and making it publicly available that you wouldn’t have had out of the box.
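A sketch of the roughly-20-lines-of-YAML ingress described here, assuming ingress-nginx and cert-manager are installed and external-dns is watching the cluster. The host name, backing service name, and ClusterIssuer name are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # hypothetical ClusterIssuer
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com          # external-dns creates this record
      secretName: my-app-tls       # cert-manager fills this Secret with the key and cert
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app       # the Service fronting the pods
                port:
                  number: 80
```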
Robert Blumen 00:31:04 I want to drill down into some of the components of that response. Let’s start with DNS. You could either have an A record or a CNAME, which is an alias to another DNS name. What does the DNS point at? Because all of your Kubernetes is inside a VPC and it has its own networking. So is that where the load balancer comes in?
Andy Suderman 00:31:28 Yeah, you have to couple that question with the ingress controller, or with a little bit of knowledge of Kubernetes services. So a Kubernetes service is another API object that you create, and if you create it in a certain way, if you give it a certain type, it will have a different external endpoint, or it won’t have an external endpoint at all. So we’ll take the simplest external use case, where you say I want a service of type LoadBalancer. Well, that will trigger Kubernetes to create a load balancer in a public subnet that’s accessible, and then essentially attach that load balancer to your pod. And I don’t know how complex we wanna get with the mechanism of how that works, but essentially what it does is it creates a load balancer that routes traffic to your pod, and then external DNS, if you’re in AWS, will create a CNAME to that load balancer name in your DNS provider of choice. Now, often that’ll be Route 53 if you’re in AWS, but you could also use Cloudflare. You could also use one of many other DNS providers.
Robert Blumen 00:32:29 And who or what is creating that DNS entry? Is that done as part of the orchestration when you request the load balancer service?
Andy Suderman 00:32:38 No, so that’s actually the separate project, external DNS. So that’s actually a thing that you would install in your cluster, and it runs as a service and it watches for these objects to get created. So it’ll watch for a service that has an annotation that says, hey, I need a DNS name. And it’ll say, okay, I see this service, it’s got a load balancer attached. That information is in the status of the actual service in Kubernetes. And so it sees that, and along with its configuration that says this is my DNS provider, it’ll go to the DNS provider and say, okay, I’m gonna put in this DNS name with this CNAME. And then it also uses a TXT record to keep track of which records it has created. So there’s a little bit of a safety mechanism built in there too.
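For the Service path described here, a hypothetical Service of type LoadBalancer carrying the external-dns hostname annotation; external-dns reads the load balancer hostname from the Service status and creates the CNAME. The hostname, selector, and ports are illustrative.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # Record that external-dns should create for this Service's load balancer.
    external-dns.alpha.kubernetes.io/hostname: app.example.com
spec:
  type: LoadBalancer      # the cloud controller provisions an external load balancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```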
Robert Blumen 00:33:20 Got it. So external DNS is a Kubernetes service, and it uses the Kubernetes watch mechanism to be aware of when it needs to either spin up or tear down records in the cloud provider DNS, or whichever DNS you use. Now that leads into a side question which I was gonna ask: your Kubernetes service is able to use certain of the cloud service provider APIs. We’ve talked about requesting a load balancer service and modifying DNS. Cloud service providers have very fine-grained permission models of who exactly can do what. So is there a step when you’re bootstrapping the Kubernetes cluster where you have to decide what permissions the cluster has, and do those permissions then get delegated to specific services that run within the cluster?
Andy Suderman 00:34:10 Yes, there’s definitely, there’s multiple mechanisms by which you can do IAM mappings or permissions mappings to Kubernetes services. The most common one that’s in use now, well, let’s just say back in the day, originally we would give permissions just to the nodes themselves. Now this is a little bit of a security problem because if the whole node has the permissions to act on the cloud provider, then any pod running on that node, regardless of whether it needs it or not, has those permissions. So in the last three or four years we’ve moved to what I refer to as workload identity. Different cloud providers have different names for it. In GKE, it’s actually, I just forgot the name for GKE. In AWS, it’s IRSA, which is IAM Roles for Service Accounts. And so what you do is you create an IAM role that has a certain set of permissions, and then you say this service account in Kubernetes is allowed to assume that role.
Andy Suderman 00:35:07 And then you tell the individual service, hey, this is the role that you should use to do cloud provider actions. So the end result is each pod that is running as part of the external DNS service can only assume the role that we’ve given it for external DNS, which means now, through AWS IAM, I can give it as many or as few permissions as I want. If I only want it to be able to modify a single specific DNS zone, I can restrict it to that. And so you have that fine level of control that you have at the cloud provider level all the way down to the individual pod level in Kubernetes.
Robert Blumen 00:35:43 Okay. So we’re gonna set up a role, let’s call it DNS-record-read-write, and this external DNS service through these bindings will be able to assume that role, and it’s able to create and delete DNS records, but it doesn’t have the ability to create a new database or EBS or any other of the million things you could do in AWS that you don’t want your DNS provider to do.
Andy Suderman 00:36:09 Exactly.
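A sketch of the IRSA wiring just described, assuming the cluster's OIDC provider is configured and an IAM role already exists whose policy grants only DNS record changes. The account ID, role name, and namespace are placeholders.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: external-dns
  annotations:
    # Hypothetical role ARN; the role's policy is scoped to Route 53 record changes only.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/external-dns-route53
```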
Robert Blumen 00:36:10 Great. Now, we’re going through these layers. The load balancer, which is provided by the cloud service provider, then that is going to proxy to the ingress. Is that the next step in the pipeline?
Andy Suderman 00:36:24 Yeah, so in the event of when we’re using an ingress controller, let’s just use NGINX for our example here because it’s the easiest one to talk about. Because a lot of folks are familiar with NGINX outside of Kubernetes, there will be several NGINX pods running in the cluster and they’ll have their own Kubernetes service that is attached to that load balancer. And so all DNS records that point to the ingress that go through the ingress controller will point to that single load balancer. So it’s a nice way to consolidate all of your load balancers into one and then that will feed through NGINX. And so NGINX will have configured a server block that says this host name goes to these pods basically and then it will route the traffic, it will forward the traffic on to that pod.
Robert Blumen 00:37:11 As you just pointed out, you might be running several instances of the NGINX ingress. So the load balancer needs to be up to date on how many instances there are and what their addresses are. Does the load balancer use the overlay network or external IPs? What set of IPs is the load balancer proxying to, to get to the ingress?
Andy Suderman 00:37:38 So in your most standard configuration, in most cases, what will happen is NGINX will be set up as a load balancer service, but underneath that is what’s called a NodePort service. And so this exposes a single high port on every single node in the cluster that routes traffic to that NGINX instance. And so essentially the AWS load balancer will be routing traffic to every single node, or it’ll have in its list every single node on that specific port. And that node list is kept up to date by a Kubernetes control plane component that’s managing the load balancer, called the controller manager.
Robert Blumen 00:38:19 So we’re talking about all the steps that the routing goes through to get from the external world to your Kubernetes cluster. We have the cloud service provider’s load balancer, the NodePort service, which is a type of load balancing, and then it goes to the ingress, which is another load balancer. I count three load balancers. That seems a bit overdone to me. Is this a good solution, or did it have to be done that way because of how the Kubernetes network works?
Andy Suderman 00:38:50 That’s a great question. I’ll start with the first one. Is this a good solution? Likely no. You know, at the end of the day it’s probably not a terrible solution and it does work. I’ll start by saying that a lot of other solutions are out there now that change this behavior, right? That was the default as of, you know, two, three years ago. It’s still the default depending on how you configure things. And so a lot of things have been mitigated. For instance, you can instruct Kubernetes to only let nodes that are running the actual pods for the workload be included in the load balancer. So it’ll actually fail the health checks for the nodes that are not running the actual pods receiving traffic. So that eliminates one potential hop, where you end up on a node that doesn’t have the actual pod running and then it gets forwarded to the other node.
Andy Suderman 00:39:41 So that’s one potential hop removed, and I think that would’ve actually been a fourth in your list there. And then we have things like the AWS VPC CNI, which I talked about earlier, which allows, in newer more advanced configurations, for you to create a target group for a network load balancer that includes just the pods, so it routes directly to the pods, skipping the whole node hop as well. So I do think it was sort of a, maybe not a necessity, but a necessity for keeping things simple and straightforward in the earlier days of Kubernetes and making things work for everyone as much as possible on all the cloud providers. But there’s a lot of different configurations you can introduce now, depending on what cloud provider you’re in or what ingress controller you’re actually using, to simplify these networking scenarios if that’s needed for you.
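One common way the behavior described above (excluding nodes that aren't running the backing pods) is configured is externalTrafficPolicy: Local on the ingress controller's Service; nodes without a local endpoint then fail the load balancer health check. This is a sketch under that assumption, not a configuration quoted from the episode; names and labels follow the usual ingress-nginx conventions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # only nodes with a local ingress-nginx pod pass health checks
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: https
```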
Robert Blumen 00:40:35 The last piece you mentioned was cert-manager. Is that another service that runs on Kubernetes, similar to external DNS, that watches for when there’s a need for a certificate and then obtains it from your CA?
Andy Suderman 00:40:50 Yep, that is exactly what it is. So it watches for different things in the cluster. It has its own custom resource definition. So you can just request a cert as a YAML object. So I can say give me the certificate and depending on how you have it configured, what CA it reaches out to and things like that, it’ll generate a cert. The other thing that it does is what’s called the ingress shim, which is it watches for ingress objects that have a specific annotation and then a TLS configuration within them and it will automatically generate that certificate object and then fulfill it like it would if you created the certificate.
Robert Blumen 00:41:25 Then that last step, did I understand that cert-manager would somehow deploy the private key into your ingress, so the ingress can terminate the TLS?
Andy Suderman 00:41:36 Essentially, yes. What it does is it creates the certificate which then generates the Secret, which contains the key and the cert. And then NGINX ingress will actually pick up that Secret name as this is the cert I’m supposed to use. So the TLS specification in the ingress says what Secret name to use and then cert manager just fulfills that basically.
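For the standalone case mentioned above (requesting a cert as a YAML object), a hypothetical cert-manager Certificate; cert-manager fulfills it by writing the key and certificate into the named Secret, which the ingress controller then picks up. Issuer and DNS names are placeholders.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-example-com
  namespace: default
spec:
  secretName: app-example-com-tls   # Secret that will hold tls.crt and tls.key
  dnsNames:
    - app.example.com
  issuerRef:
    name: letsencrypt-prod          # hypothetical ClusterIssuer
    kind: ClusterIssuer
```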
Robert Blumen 00:42:00 Got it. So it’s handing it off through the Secret rather than going directly from cert-manager to ingress. And on the topic of ingress, I’m aware there are many popular load balancers; NGINX, which you mentioned, is certainly very popular, and you have a bunch of others. If an organization has a preexisting preference for one of the reverse proxies they like, is there likely to be an ingress controller that’s built around that particular reverse proxy?
Andy Suderman 00:42:28 It’s quite possible. I don’t know that I’m up to date on the list of all the possible reverse proxies out there, but it’s quite likely that there may be an ingress controller out there for it.
Robert Blumen 00:42:38 And you also mentioned Secrets, which is an area I wanted to get into. Kubernetes Secrets are not very good; you may decide they’re not secret enough for some security need that you have. What do you think of the built-in Secrets, and what are some options for doing better?
Andy Suderman 00:42:56 I was going to say, I want to start by addressing that statement that Kubernetes Secrets aren’t very good. I think Kubernetes Secrets get a bad rap because by default they’re Base64-encoded, and a lot of folks sort of confuse that for encryption, which hopefully we all know is not encryption; they’re not intended to be encrypted. However, Secrets as an object in Kubernetes are treated by the API with the respect that a Secret should be treated with. They have fine-grained controls over permissions, they are stored in a separate area of the state store, etcd, for your cluster, and they’re not printed in any sort of built-in logging or anything like that. So they’re treated the way that Secrets should be. I think what folks take a little bit of objection with is that they’re not encrypted within etcd.
Andy Suderman 00:43:44 So that’s a question of your risk tolerance and your threat profile, about how much you want to protect the Secrets. etcd itself is probably running on an encrypted-at-rest storage mechanism and maybe encrypted in other ways. And so all of your communication with etcd will be encrypted by default. And so if you don’t have the need to store them encrypted within etcd, if you don’t think your etcd database is gonna get leaked in plain text to the world, then it’s probably overkill to introduce one of these other solutions. That being said, there’s lots of other solutions out there that can make Secrets different or handle them differently. So there’s the ability to encrypt them within etcd using your cloud provider key storage, so KMS in actually all the clouds. I think they all call it KMS because it’s a key management service.
Andy Suderman 00:44:31 And so there’s the ability to run a controller that essentially has AWS or GCP permissions to use that key to encrypt the actual Secret before it goes into etcd, and when you retrieve it. I question the value of this because now you’re just offloading the encryption to a different place in the cloud provider. Is it truly more secure? I’d have to draw that threat model out to really determine, but it always seemed a bit of overkill. If you’re really, really concerned about Secrets management in Kubernetes, what I recommend is just offloading your Secrets into a different place entirely. So using something like HashiCorp’s Vault to store your Secrets, or your AWS Secrets Manager, your GCP Secret Manager, and then referencing that directly from either your application or using a controller in the cluster to give you access to those Secrets on an as-needed basis and with fine-grained IAM permissions.
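One controller of the kind mentioned here is the External Secrets Operator (not named in the episode). A hypothetical ExternalSecret that syncs a value from AWS Secrets Manager into a Kubernetes Secret, assuming a ClusterSecretStore named aws-secrets-manager has already been configured with appropriately scoped IAM permissions:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: my-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager     # hypothetical ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: db-credentials          # Kubernetes Secret to create and keep in sync
  data:
    - secretKey: password
      remoteRef:
        key: prod/my-app/db       # hypothetical Secrets Manager entry
        property: password
```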
Robert Blumen 00:45:24 Okay. So we’ve covered a bunch of pieces in that stack for getting traffic into the cluster. I’m gonna change directions now and talk about some of the security features. Kubernetes does offer role-based access control. Is that gonna be a default setting, or should you turn that on, and should everyone be using it?
Andy Suderman 00:45:47 By default, it is turned on in pretty much every instance of Kubernetes that I’m aware of these days. It’s been around for long enough that it’s pretty much just built in. I’m not even sure you can turn it off at this point, but yes, absolutely everyone should be using it. Most of the services that you deploy to Kubernetes aren’t gonna need Kubernetes permissions themselves. So you know, my web application probably doesn’t need Kubernetes permissions to talk to other stuff in the cluster. And so the service account that that particular pod runs as should have no permissions in the cluster. And then when we talk about users accessing Kubernetes and administrators accessing Kubernetes, using those RBAC roles very heavily is definitely recommended.
Robert Blumen 00:46:33 By Kubernetes permissions, do you mean the service having a permission to talk to some part of the Kubernetes control plane through a Kubernetes API?
Andy Suderman 00:46:43 Correct. Yeah, so some things need that. We talked about controllers like external DNS and cert manager. They need to be able to ask the Kubernetes API about what ingress exists and what annotations do they have, whereas you know, your web application shouldn’t need those permissions to talk to the Kubernetes API.
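A sketch of the fine-grained RBAC described here, giving a controller's service account read-only access to ingresses across the cluster; the names are illustrative, and the subject matches the hypothetical external-dns service account used earlier.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ingress-reader
rules:
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]   # read-only access
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-ingress-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-reader
subjects:
  - kind: ServiceAccount
    name: external-dns              # hypothetical controller service account
    namespace: external-dns
```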
Robert Blumen 00:47:02 So looking at other aspects of security, there are a number of things that have the word policy in the Kubernetes world. We have network policies, namespace policies, node policies; certainly role-based access control can be considered policy, although it doesn’t contain the word. And then there’s another add-on called Kyverno, which is called a policy manager. Are these to some extent completely independent and we need all of them, or are they different solutions to the same problem where you pick what’s appropriate in your situation? How do you navigate through this policy space?
Andy Suderman 00:47:40 That’s a great question. We’ve kind of done ourselves a disservice with the policy word, overloading it in a few places. So the few things that you listed, I think, cover very different areas, and I’ll kind of separate them out. Network policy is its own specific thing because that is a Kubernetes built-in API object, and it specifically dictates what traffic can come in or out. Think of it as a traditional firewall rule, right, for your namespace. And so any pod in that namespace can only talk in or out based on that network policy. And that’s enforced by the container networking interface that we talked about earlier. And so it’s a fairly low-level piece of policy, right? We’re talking about the IP address level; my layers are a little off in my head, it was at layer four. So that’s network policy, and that’s kind of its own category of things.
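A minimal sketch of the namespace-level firewall behavior described here: a hypothetical NetworkPolicy that allows ingress to every pod in a namespace only from the ingress controller's namespace, enforced by the CNI. The namespace names and port are illustrative.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-only
  namespace: my-app                # hypothetical application namespace
spec:
  podSelector: {}                  # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # traffic allowed only from this namespace
      ports:
        - protocol: TCP
          port: 8080
```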
Andy Suderman 00:48:32 When you start talking about Kyverno, and actually I’ll shamelessly plug one of our open source projects, Polaris, we’re talking about policy around what you can and cannot do within the Kubernetes API. It’s sort of a twist on RBAC. RBAC says what you can do: it says that, you know, this entity is allowed to perform these verbs on these nouns in the cluster, right? And it can do those different things. Whereas policy is more saying you can’t do these things. And so typically I think of it as, a lot of times it looks like JSON schema, where you have a specific set of things that are allowed in this structured object, which is the Kubernetes YAML, with loose definitions. And now we restrict that even further to say you can’t do this. So that’s a very abstract way of talking about it. I think an easy way to talk about it is: by default, Kubernetes lets you deploy resources or pods that don’t have a resource request, that just say put me wherever, I’ll figure out how much resources I need later. Well, you can say with policy that that’s not allowed to happen in this cluster. The Kubernetes API may allow it, but now my policy’s further restricting what it can do in Kubernetes.
Robert Blumen 00:49:50 Give an example of, you said one is you can’t deploy a pod without a resource request. Give an example of another policy that you could implement with Kyverno or Polaris, of something you can’t do.
Andy Suderman 00:50:03 So by default, anytime you deploy a container into Kubernetes, it runs as the root user. That’s part of the security context specification of a pod, and that’s something you may not want to allow. So we can restrict that with policy as well. And then there’s privilege escalation that’s built in as well, so the ability to sudo, and then different capabilities that the container might have at the kernel level, like CAP_SYS_ADMIN or things like that. So you can restrict all of those.
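A hedged sketch of the resource-request example as a Kyverno ClusterPolicy, closely following Kyverno's published sample policies rather than anything quoted in the episode (Polaris expresses similar checks in its own configuration format):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests
spec:
  validationFailureAction: Enforce   # reject pods that omit requests
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests are required for every container."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"        # any non-empty value
                    memory: "?*"
```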
Robert Blumen 00:50:31 Andy, in the time we have left, we’ve covered a lot of aspects, decisions that you need to make along the way to get your cluster up and running. Are there any major areas that need to be taken into account that we haven’t covered?
Andy Suderman 00:50:44 That’s a good question. I think we covered a lot of the really foundational stuff, which is good. I think one area that we didn’t talk about much is how to deploy into Kubernetes. You know, you have your Helm charts or your Kustomize, like how you manage the actual YAML that you deploy with, and then how that actually gets deployed into the cluster is another thing to be thinking about as part of your Kubernetes strategy.
Robert Blumen 00:51:07 And what are some of the major options in that area?
Andy Suderman 00:51:10 So Helm’s a very popular way to package up your YAML. It’s a templating language, essentially, that lets you template out YAML, and then it has its own ability to deploy to the cluster via Helm install, and that creates a release object and sort of tracks the lifecycle. That’s one way that’s popular, that we’ve done for a long time. And then the next kind of big category of things is the GitOps tooling space, where we run sort of a long-lived process in the cluster that watches a Git repository full of YAML or Helm charts or however you want to package your YAML, and then keeps the cluster up to date with that repository, so you don’t actually deploy, you just make changes to Git.
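As one example of the GitOps pattern just described (the episode does not name a specific tool), a hypothetical Argo CD Application that keeps a cluster in sync with a Git repository of manifests or Helm charts; the repository URL, path, and namespaces are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/k8s-manifests   # hypothetical repo of YAML/Helm
    targetRevision: main
    path: apps/my-app
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: my-app
  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert manual drift back to the Git state
```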
Robert Blumen 00:51:51 I’ll mention to listeners, we have episode 440 on GitOps and 509 on Helm charts. Andy, so to wrap up, anything you’d like to tell us about Fairwinds?
Andy Suderman 00:52:02 Oh, so many good things to talk about with Fairwinds, but Fairwinds has been running clusters for, I mean, I’ve been here for five and a half years and they were running Kubernetes two years before that, so since pretty much the very beginning of Kubernetes. So our services arm can help you run your clusters and help your team bolster its Kubernetes knowledge, or just run all of your infrastructure for you if that’s something you want. But then we talked about our open source Polaris; we have a lot of other open source: Goldilocks, Pluto, RBAC Manager, Nova, and Gemini. I think that’s most of them. And all of these tools are just ways to help you run Kubernetes better, more reliably, more securely. And then if you’re interested in running our open source at scale, along with other open source including Kyverno, and doing cost management, we have a SaaS product that you can go check out. We have a free trial of it for up to two clusters. So give that a shot at insights.fairwinds.com.
Robert Blumen 00:52:56 Would you like to point listeners toward your presence on the internet anywhere?
Andy Suderman 00:53:02 I’m not super present on the internet. I’m very active in the CNCF, so various areas of the CNCF Slack and the Kubernetes Slack, and then LinkedIn. I am SudermanJr just about everywhere you can find me.
Robert Blumen 00:53:17 Andy Suderman, thank you very much for speaking to Software Engineering Radio.
Andy Suderman 00:53:21 Thank you for having me. It was a great time.
Robert Blumen 00:53:22 This has been Robert Blumen for Software Engineering Radio, and thank you for listening.
[End of Audio]