18 Nov 2024

An Argument For Kubernetes

This post started out as a huge article about why you should pick Kubernetes. But after coming back to it 3 months later, having had a healthy debate at my company about adopting Kubernetes in the interim, I realized that the post missed the mark on the sentiment of the argument.

I’ve worked on some great projects built on top of K8s. Admittedly some of them shouldn’t have been and I have been guilty of over-engineering. But I’ve also built some incredibly robust systems on top of Kubernetes that have saved a lot of engineering hours and on-call.

So from one engineer to another, I’m not going to tell you to use Kubernetes. I’m going to point out some of the good, bad and ugly and encourage you to make the choice that’s right for you, your team and your company. If neither team nor company apply then I’d actually encourage you to adopt and learn it.

The Ugly

Kubernetes is a beast. It’s a complex system that has a steep learning curve. As this shamelessly stolen meme shows (quite accurately actually), there are a lot of footguns in Kubernetes, at both an operator level and a user level.

That said the certifications for Kubernetes are incredibly apt. The CNCF have done a fantastic job of finding paths to learn Kubernetes and they largely fall into operators and users (developers). They’re relatively well priced, have regular sales and are well recognised. I have a generally negative opinion of certifications which I won’t delve into here, but if you’re the same, then I’d encourage you to look at these as an exception to the rule.

The problem also isn’t that bad for users. It’s the equivalent of learning a generic infrastructure domain, much like learning how a for loop works and then understanding how a for loop is constructed in many languages. The concept of a healthcheck probe failing isn’t that hard to understand and is arguably easier than learning 3 different load balancer product names across the big 3. I say this as someone who has had the pleasure of many years in the industry and can appreciate it is a learning curve despite my opinions. The cognitive load does exist, but I think it’s the lesser of evils.
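To show how small the probe concept really is, here’s a minimal sketch of the bookkeeping behind a failing liveness probe. It’s a toy model and not the kubelet’s actual code: `ProbeState` and `count_restarts` are names I’ve made up, and the only real Kubernetes detail I’m leaning on is the probe’s `failureThreshold` field (consecutive failures before the container is restarted).

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Toy model of liveness probe bookkeeping (hypothetical, not kubelet code)."""
    failure_threshold: int = 3   # mirrors the probe field failureThreshold
    consecutive_failures: int = 0
    restarts: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True if it triggers a restart."""
        if healthy:
            # Any success resets the streak of failures.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Threshold reached: count a restart and start counting afresh.
            self.restarts += 1
            self.consecutive_failures = 0
            return True
        return False


def count_restarts(results, failure_threshold=3):
    """Feed a sequence of probe results (True = healthy) through the model."""
    state = ProbeState(failure_threshold=failure_threshold)
    for healthy in results:
        state.record(healthy)
    return state.restarts
```

Two failures followed by a success never restart anything; it takes an unbroken run of failures to hit the threshold. That’s pretty much the whole idea a user has to hold in their head.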

The Bad

We are on v1.31 of Kubernetes at the time of writing. That’s 30 upgrades. A new minor version releases every 4 months and it receives regular patch updates. On one hand.. fuck yeah! On the other.. oh fuck! The monumental effort in releasing these every 4 months and having all the bug and security fixes in between is a testament to the outstanding organisation and community effort.

It’s also a testament to the commitment of operating a Kubernetes cluster. Patches are a little easier to deal with: you can be confident about an apply and go make a coffee. Minor version upgrades require more tact and reading the notes. There can be breaking changes if you jump too many versions. The usual process is they mark something deprecated and then 2 minor versions later it is removed. So keeping an eye on upgrade notes is important.
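If you’ve fallen behind, it helps to write down the hops before you start. Here’s a rough sketch, assuming you step the control plane through one minor version at a time (which is what kubeadm documents, and what managed providers tend to enforce). `parse_minor` and `plan_upgrade` are hypothetical helpers of mine, not part of any Kubernetes tooling.

```python
def parse_minor(version: str) -> int:
    """Return the minor number from a version like 'v1.31' or '1.31.2'."""
    parts = version.lstrip("v").split(".")
    if len(parts) < 2 or parts[0] != "1":
        raise ValueError(f"expected a 1.x version, got {version!r}")
    return int(parts[1])


def plan_upgrade(current: str, target: str) -> list[str]:
    """List each minor version to step through, one hop at a time."""
    start, end = parse_minor(current), parse_minor(target)
    if end < start:
        raise ValueError("downgrades are not planned here")
    # Each entry is a separate upgrade with its own release notes to read.
    return [f"v1.{minor}" for minor in range(start + 1, end + 1)]


hops = plan_upgrade("v1.27", "v1.31")
print(hops)
```

Four hops from 1.27 to 1.31 means four sets of release notes and four rounds of checking for removed APIs, which is the real cost of putting upgrades off.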

The API has become much more stable. Up until about 1.27 it felt like there were major deprecations every other release, but that seems to have stabilised as features are refined for new use-cases.

The application of upgrades is also fairly easy with managed providers. You still have to deal with your own workload upgrades, but the major providers are building tooling around that to help with those too.

The Good

<Fanboy>

I could go on for a while. In fact it’s why I’ve written this post in reverse, to end it on a good note.

Kubernetes is fucking awesome. It’s hands down the most influential piece of software written in the last 10 years. It has influenced our operating systems, it has influenced our cloud providers’ directions, it has influenced how we write and package code. It has influenced an entire ecosystem of tools around it. Most importantly, it has influenced how we think about infrastructure.

k8s came from Google, inspired by their internal system Borg, so the design is very Googley. It’s written in Go and is basically the SRE handbook poster child.

</Fanboy>

So now we’ve got that out of the way, here’s my argument for Kubernetes.

The Scheduler

It’s state of the art. Most of the API changes in the 1.20s were around improvements to the scheduler design. Most people will not need the level of configuration that the scheduler provides, but it’s there if you need it. It’s there because people have needed it.

If you’re looking for stateful workloads (where the state exists in the deployed app itself), then they’re entirely possible but there is an ushering towards ephemeral workloads. This lands in the good for me. It doesn’t stop you from doing complex stateful workloads, but it does encourage you away from them with better patterns.

The API

The API is a thing of beauty. It can feel verbose specifying an API group, version and kind on every object. But objects are well defined and the API is extensible. This has opened the API beyond compute (and all the other bits of infra you love). You may have heard of CRDs (Custom Resource Definitions), or the operator pattern. This is where the API has been extended to allow you to define your own objects and controllers.
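That verbosity buys you a uniform shape. As a small sketch, here are two objects written as plain Python dicts: a built-in Deployment (spec trimmed for brevity, so it isn’t a complete manifest) and a custom resource. The `example.com` group and the `BlogEntry` kind are made up for illustration, as are the helper functions; `apiVersion`, `kind` and `metadata` are the real field names.

```python
def split_api_version(api_version: str) -> tuple[str, str]:
    """Split an apiVersion into (group, version).

    The core group has no prefix, so 'v1' yields an empty group.
    """
    group, _, version = api_version.rpartition("/")
    return group, version


def describe(obj: dict) -> str:
    """One-line identity for any object, built-in or custom."""
    meta = obj["metadata"]
    namespace = meta.get("namespace", "default")
    return f'{obj["apiVersion"]} {obj["kind"]} {namespace}/{meta["name"]}'


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "blog", "namespace": "web"},
    "spec": {"replicas": 2},  # trimmed: a real Deployment also needs selector and template
}

blog_entry = {
    "apiVersion": "example.com/v1alpha1",  # hypothetical CRD group and version
    "kind": "BlogEntry",                   # hypothetical custom kind
    "metadata": {"name": "an-argument-for-kubernetes"},
    "spec": {"title": "An Argument For Kubernetes"},
}

for obj in (deployment, blog_entry):
    print(describe(obj), split_api_version(obj["apiVersion"]))
```

The same two helpers work on both objects without knowing anything about either kind, which is the point: a custom resource is addressed exactly like a built-in one.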

There’s a classic k8s meme about deploying your blog on k8s. Excellent meme, 10/10. But I can’t help but feel that it was made by someone who doesn’t understand the ecosystem. I by no means advocate for deploying your blog on k8s, but I would love to see someone write a CRD for blog entries and have a “serverless” operator watch and deploy those entries as an SSG to a CDN. Not a production use case, but a fun one.
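For the curious, the heart of such an operator would be a reconcile loop: compare what’s declared with what’s actually out there and close the gap. Here’s a toy sketch with no Kubernetes client involved at all. The two dicts stand in for the declared blog entries and the CDN’s contents, and every name in it is hypothetical.

```python
def reconcile(desired: dict[str, str], published: dict[str, str]) -> list[tuple[str, str]]:
    """Drive `published` (slug -> content) toward `desired`.

    Returns the actions taken and mutates `published` in place, the way a
    controller pushes actual state toward declared state.
    """
    actions = []
    for slug, content in desired.items():
        if slug not in published:
            actions.append(("publish", slug))
        elif published[slug] != content:
            actions.append(("update", slug))
        published[slug] = content
    # Anything published that is no longer declared gets removed.
    for slug in list(published):
        if slug not in desired:
            actions.append(("unpublish", slug))
            del published[slug]
    return actions


desired = {"hello-world": "rev 2", "argument-for-k8s": "rev 1"}
cdn = {"hello-world": "rev 1", "old-post": "rev 1"}

print(reconcile(desired, cdn))  # first pass does the work
print(reconcile(desired, cdn))  # second pass finds nothing to do
```

The second call returning no actions is the property that matters: a reconcile loop should be safe to run over and over, because a real operator gets triggered far more often than anything actually changes.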

The Abstraction

Earlier I alluded to k8s being worth learning because it’s a generic infrastructure. It’s a great abstraction. Your compute, storage, networking, security and probably more I’ll have missed over the years of development.. these are all now well defined and well refined. You don’t really need another representation for the same boring infra that every cloud provider has.

It’s represented once and you can run it practically anywhere from your laptop, to your raspberry pi, to your on-prem datacenter, to your cloud provider of choice.

The caveat there is that the operators do have to support those abstraction layers. So a load balancer, for example, can be as easy as installing your cloud provider’s load balancer operator and sticking some credentials in there (usually done for you in a managed service). On-prem becomes a little harder but is totally doable. Local? You probably don’t care about that aspect.

The interface for the user is all the same. They only have to learn the syntax once and the domain once.

The Community

Kubernetes is a passion project for many. It shows. There are so many SIGs (Special Interest Groups) out there for employees or volunteers to get involved in. The software ecosystem around Kubernetes is vast and varied.

Summary

So while I won’t be running my blog on k8s any time soon, I would encourage you to learn it, embrace it and enjoy it.

It’s a fantastic piece of software that has changed the industry for the better. It’s not perfect, but it’s a damn sight better than what we had before.