SE Radio 565: Luca Galante on Platform Engineering

Luca Galante, head of product at Humanitec, joins host Jeff Doolittle for a conversation about platform engineering. They begin by defining platform engineering and its relationship to, and distinction from, DevOps. Tracing platform engineering’s history, Luca describes how internal developer platforms are fundamental, and then explores the goals of addressing complexity and reducing the cognitive load on developers by creating golden paths.

Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Jeff Doolittle 00:00:16 Welcome to Software Engineering Radio. I’m your host, Jeff Doolittle. I’m excited to invite Luca Galante as our guest on the show today for a discussion about platform engineering. Luca leads product at Humanitec, providing the platform orchestrator that lets engineering teams remove bottlenecks by enabling them to build code-based golden paths for developers. He is a core contributor to the platform engineering community. He hosts PlatformCon, the number one platform engineering conference, and writes Platform Weekly, which goes out to 10,000 readers every week. Luca routinely speaks to tens of engineering teams every month. He summarizes his learnings and takeaways from looking at hundreds of DevOps setups into crisp, insightful reads for everyone in the industry, from beginner ops to Cloud experts. Luca, welcome to the show.

Luca Galante 00:01:06 Hey Jeff, thanks for having me.

Jeff Doolittle 00:01:08 Let’s dive right in. What is Platform Engineering?

Luca Galante 00:01:11 Yeah, so platform engineering in my view is the art, because it’s really an art more than a science, of basically taking all the tech and tools that you have floating around in the enterprise today and binding them into a simple golden path that removes cognitive load on the individual contributor, the developer, and enables self-service for the engineers using it. And the superset of these golden paths is what is often referred to as an internal developer platform, or IDP for short, which is effectively the end product of the platform engineering team or platform engineering organization.

Jeff Doolittle 00:01:53 So let’s unpack those a little bit. When you say cognitive load, I’m imagining we’re just trying to make it easier for people to think through complex problems, and in the way you’re describing that, there are a couple of concepts that maybe we can dig into a little bit more. You mentioned self-service; it’d probably be good to dig into that a little bit. And then let’s talk a little bit more about what exactly a golden path is and how it relates to this idea of self-service.

Luca Galante 00:02:14 Yeah, so cognitive load is basically referring to the fact that if you look back 10, 15 years and I just wanted to do a simple deploy to test something in a preview environment, as a developer I would probably have to touch maybe a script and one or two deployment tools, and that was about it. The reality today is a lot more complex, in that you have all these Cloud native technologies and trends that overlap with each other, which makes it quite overwhelming to do simple operations like that. And so developers are often almost scared of going and touching three or four different scripts, that Helm chart, those Terraform modules over there, and so on and so forth, because they’re scared of screwing things up. And so that’s what we mean by cognitive load.

Luca Galante 00:03:12 And so the job of the platform engineering team in that context is basically to listen to developers, figure out what their issues are, and then design a golden path that lets them consume these different tools and scripts, and the underlying infrastructure and technology that they need to do their job, in a way that is easy but specifically that gives them the right level of context and the right level of abstraction that they choose, right? So that’s where it’s really, really important that the platform team has a very tight feedback loop with developers, because you’re going to have different types of developers, different types of users with different preferences, and we can get into that later, but that’s really how you craft those different golden paths. And ultimately that’s also what makes an IDP, an internal developer platform, different from a PaaS, which is a lot less customizable and a lot less flexible for different types of users.

Jeff Doolittle 00:04:15 When you say PaaS, I believe you’re talking about Platform as a Service, which essentially is just a lot of tools and technologies, and you have to understand them all in order to put them together. And it sounds like what these golden paths do is provide maybe more cohesive and clear ways to adopt some of those technologies and services without needing as much detailed understanding of all the complexity of those services. Is that a good way of looking at it?

Luca Galante 00:04:42 No.

Jeff Doolittle 00:04:43 Okay.

Luca Galante 00:04:44 So PaaS is kind of the opposite of that, right? PaaS is actually like an oversimplification, like an extreme abstraction, and I mean the OG PaaS that most people might be familiar with is Heroku, right? Nowadays there are many more that have popped up that are a lot more specialized in different use cases. But the idea of a PaaS is basically that you have the product team behind a PaaS offering that says, hey, I’ve figured out the market and I’ve worked out that this level of abstraction is the right minimum common denominator across all these different engineering organizations. And so off you go, here, you can use this. And I often tell teams, if you are 50 developers or fewer, right? If you’re like 20, 30 engineers and not everybody is an expert in Helm charts, IaC, and so on and so forth, then go use a PaaS, right?

Luca Galante 00:05:43 Like a PaaS is actually what you want to use, because it’s much easier; it can get you up and running without everybody needing to understand everything. The issue with PaaSes is that they don’t really scale at the enterprise size. And so when you cross that kind of 80 to a hundred engineers mark, that’s when you usually need to start crafting those different golden paths that we’ve been talking about for different users. And this will happen at the same time that your Cloud footprint and stack get more complex. And so that’s where you basically want to make sure that you’re transitioning away from a simplified PaaS offering that doesn’t let you customize, because again, the product team of that PaaS says, hey, I’ve figured out the one denominator, but obviously that means you can’t actually tweak it as much. And so then that’s where you move to basically a platform engineering model, where you have an internal platform team that takes the different tools of platform engineering, which can be both open-source tools and commercial offerings, and then ties them into this PaaS-like experience. But that will be really geared towards the different types of users across the different types of teams.

Jeff Doolittle 00:07:01 I think I understand now and I think the Heroku analogy is really helpful. So let me take another stab at summarizing that and maybe I’ll get it right this time. With something like Heroku as an example, that’s a platform as a service, which in a sense is a form of platform engineering. It sounds like you’re saying, here, we’ve made some decisions for you, but the challenge is that doesn’t scale once your team gets beyond a certain size, because the design decisions that went into something like that platform as a service are very constrained. They’re specific maybe to a certain team size, certain use cases; maybe they don’t work at enterprise scale and massive global scale. And at that point you need to start crafting your own platform as a service, but there it may not be a one-size-fits-all for your entire organization. And so, you need to create multiple golden paths, as you’re describing it, so the teams can be successful instead of trying to create the single platform as a service to rule them all. Did I get it right this time?

Luca Galante 00:07:56 You did, and that’s what we call an internal developer platform, right? And that to me is really the key difference, because sometimes people are like, oh, it’s just like a PaaS. I’m like, well, it’s a PaaS-like experience but on top of enterprise tech and tools, right? And that’s really the key difference.

Jeff Doolittle 00:08:10 I see. So why the two terms, platform engineering and internal developer platform? How do those terms relate and how are they different?

Luca Galante 00:08:19 Yeah, so platform engineering is really the discipline that for me stems from DevOps, from the original promise of DevOps, which is you build it, you run it, right? Which became basically impossible to actualize in a world where you have 10,000 tools in the CNCF landscape and everything that we already talked about, right? Where developers are really overwhelmed, which then turns into a ticket ops issue, and then operations are under pressure and there’s just a lot of friction in the system. And so, this great original idea of DevOps, of taking down the barriers between developers and operations, actually becomes a pretty frustrating reality in the vast majority of cases. And so, the solution to that really is platform engineering, right? Platform engineering is, in that context, an evolution of DevOps that actually enables true DevOps, enables true you build it, you run it at the enterprise scale.

Luca Galante 00:09:17 Because again if you go back to the PaaS example at that scale you basically have two options. Let’s say you’re 20 engineers, right? Like either not everybody’s a pro in your infrastructure stack and then you want to go the PaaS way or everybody is a pro and then you can actually do true DevOps, right? You can actually have everybody do everything. The problem is that doesn’t scale, right? It doesn’t scale as your setup gets more complex and especially as you’re adding more people. And so platform engineering is enabling true DevOps in the enterprise and the end product of the platform engineering org is an internal developer platform.

Jeff Doolittle 00:09:52 Okay. Can you give some specific examples of maybe in your experience or teams you’ve seen, what are some specific struggles that they face when they’re trying to have, as you say, if no one’s a pro and you just pick a simple thing and you go with it or everyone’s a pro and the reality is usually far more mixed, right? So, I imagine you have situations where maybe you have one or two really seasoned people and then a bunch of people who aren’t. And how does that kind of play out in struggles so that our listeners can maybe feel this pain themselves and say, ah yes, I’ve experienced that, I’ve seen that myself.

Luca Galante 00:10:24 Yeah, a hundred percent. I mean, one of the most basic examples that we see a lot, especially in fairly large enterprises, is developers requesting a new environment or a new piece of infrastructure like a database, right? And that can literally take up to two weeks sometimes, right? And so the velocity and productivity of the development team drops significantly, and obviously they’re frustrated, right? Cause they’re just like, hey man, I just want a preview environment to test something. Like, how is this so complicated, right? And on the flip side of that is the operations team, who’s getting all these inbound tickets, right? And that’s what we call ticket ops, right? And they’re completely underwater. Cause instead of working on fun problems, like figuring out how to load balance that production over here across all these clusters, or adding new sorts of infrastructure because developers are requesting it, no.

Luca Galante 00:11:23 Instead they’re really bogged down in constantly putting out fires and doing copy-paste work, right? If you talk to a lot of these DevOps teams, and even platform teams that are doing the wrong type of platform engineering, they’re basically just copy-pasting the same stuff over and over again, and they’re usually very well paid for it, but it’s just very unfulfilling work. And what ultimately suffers is not just the organization, in that time to market gets really affected by this, but also overall employee satisfaction; simply put, the happiness of the people working in this setup is decreasing. And so, if you think about how self-service helps in that case, right?

Luca Galante 00:12:15 And, by the way, the first companies that figured out, okay, this thing is a problem, were the very large tech companies, the leading engineering organizations, right? They were the ones that had the most complex setups, which were getting increasingly complex, and they were also the ones onboarding literally hundreds of developers a month, right? And so they quickly saw, hey, this idea of DevOps doesn’t scale. Like, there is no way that I can expect all these new people that I’m hiring to quickly onboard and be a pro at this increasingly complex Cloud native stack.

Jeff Doolittle 00:12:56 Kubernetes, Kafka, Terraform, Helm charts, all this stuff that keeps getting more complex, right? Okay. So hundreds of people onboarding, no way they’re going to understand everything about everything. Something’s got to change, okay.

Luca Galante 00:13:08 Exactly. And they said, we need to build some sort of platform layer here in between the operations side of things and the development side of things so the developer can self-serve. And so, the whole idea is true you build it, you run it: enable developers to configure their workloads, their applications, on their own, and have their workloads consume the right type of infrastructure, the right type of environment, Postgres database, the right type of Helm charts and Terraform files under the hood, right? But removing that complexity away from them, right? So that they can simply declare, for instance, hey, I need my workload to have a DNS and a Postgres database and whatever. And then the platform goes and resolves all of those things automatically, right? And then deploys the application, for instance, in a fresh new preview environment, right? And in all of this, they didn’t have to Slack or talk to an operations colleague, and operations can do their job and do more interesting things with their time.
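
To make the self-service idea above concrete, here is a minimal Python sketch that is not from the episode. It assumes a hypothetical catalogue maintained by the platform team (`PROVISIONERS`) and hypothetical names (`Workload`, `resolve`, the example hostnames). It only illustrates a developer declaring what a workload needs and a platform resolving those needs per environment; a real platform would call Terraform, Helm, or similar tooling behind this step.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Workload:
    """What the developer declares: a name, an image, and abstract needs."""
    name: str
    image: str
    needs: Tuple[str, ...] = ()


# Hypothetical catalogue owned by the platform team: how each abstract need
# is satisfied in each environment. Values are placeholders for illustration.
PROVISIONERS: Dict[Tuple[str, str], str] = {
    ("postgres", "preview"): "in-cluster Postgres container",
    ("postgres", "production"): "managed Postgres instance",
    ("dns", "preview"): "{name}.preview.example.internal",
    ("dns", "production"): "{name}.example.com",
}


def resolve(workload: Workload, environment: str) -> Dict[str, str]:
    """Map each declared need to a concrete resource for one environment."""
    resolved = {}
    for need in workload.needs:
        key = (need, environment)
        if key not in PROVISIONERS:
            raise LookupError(f"no provisioner for {need!r} in {environment!r}")
        resolved[need] = PROVISIONERS[key].format(name=workload.name)
    return resolved


if __name__ == "__main__":
    checkout = Workload(name="checkout", image="checkout:1.0", needs=("dns", "postgres"))
    for env in ("preview", "production"):
        print(env, resolve(checkout, env))
```

The developer-facing part is only the `Workload` declaration; everything in `PROVISIONERS` stays with the platform team, which is the split of responsibilities described in the conversation.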

Jeff Doolittle 00:14:22 So let’s dig into that a little bit. And I imagine there are maturity levels as far as platform engineering goes, but let’s go all the way to the end state, when we’re in the perfect platform engineering world that we all want to be in at some point in the future, which hopefully is sooner rather than later. But I imagine it takes time to get there. How does this self-service play out? Like, what’s the delivery mechanism by which the developers find the appropriate documentation? Are we delivering documentation via websites? Are we giving them portals? Are we giving them CLIs or other things? Talk a little bit about the landscape: if I’m a developer in a platform engineering environment, what does my day-to-day look like in terms of how I’m operating with this internal developer platform?

Luca Galante 00:15:07 That is a great question, and I think it’s probably worth answering with one of the anti-patterns that I see a lot of platform teams falling into, which is kind of like, hey, I am technical management and I’m speaking with my platform team and I’m like, hey guys, we need to improve developer experience, right? And it’s this very top-down thing. We’re going to build a platform, we’re going to improve developer experience, and so on, right? And that’s kind of what you’re asking, like, what does the developer experience look like, right? And then the mistake that a lot of platform initiatives make is they basically look at developer experience from a chronological perspective, right? So they go like, okay, what do developers do? First they create a service, then they add some features, whatever, deploy to dev, then test it, staging, production, and so on, right?

Luca Galante 00:15:53 And so they look at this chronologically and then they stop at the first step, which is service creation, right? And then they decide to look at something that’s very trendy in the market, like Backstage for instance, and build a portal, right? And the problem with that is, how much time are you actually investing in service creation, right? I mean, if you are a crazy high-performing streaming company like Spotify, probably quite a bit, and you have thousands of developers on this and so on. And so that makes a lot of sense, and for some other companies it might as well, but in the vast majority of cases less than 1% of your time is actually spent on service creation. And so all that work that you’re doing in building that portal is providing you with a pretty low ROI, right?

Luca Galante 00:16:50 And, from the perspective of the developer, now you just gave me a UI to click around in once every X weeks really, and that’s pretty much it, right? And that can feel good from a platform initiative perspective, because it feels like you’ve built this platform layer, and now technical management likely has a dashboard to look at, which they always love, but really you haven’t fixed the biggest pain of the developer, right? And so how do you avoid that? And by the way, I just want to make sure that I get this across. I’m not disregarding stuff like Backstage; we’ve worked at Humanitec very closely with the Backstage product team. It’s a great product, I think, but it’s a great product for a very precise set of customers. And a lot of times we see that because Spotify or because Google has done it, then everybody wants to do it. And a lot of the time it’s not the same thing, right?

Jeff Doolittle 00:17:48 You’re not Spotify, you’re not Google, at least for most of our listeners, right?

Luca Galante 00:17:52 Exactly. You don’t have the same level of resources, the same talent, right? Yada, yada, yada. Sorry, you’re just not Google. And so some of the things they do could be great. Kubernetes is a great example, right? But not all of them, right? So really, I think in that sense, be wary of consultants that come in and tell you, hey, yeah, we really need to think about how you get your platform initiative to be like Airbnb’s, and it’s like, not really. So anyways, going back to your question. That’s an example of, for instance, a UI-heavy interface. And that can be a path, but in our experience and from what I’ve seen a lot in the market, it is not the best path. Cause a lot of people will feel abstracted away by using a UI, right?

Luca Galante 00:18:45 And so with this click ops setting, a lot of the time, especially from the more senior backend engineers, you’re going to get huge pushback. They’re really not going to like it. And so that’s where what we were talking about earlier becomes really important, which is, as the platform team, you really need to figure out what’s the right level of abstraction, what’s the golden path, what’s the right interface and interaction method for your developers? And you might have a simple UI for some use cases and for some teams. But what we see in the majority of cases is that code-based is what works best, right? Because it naturally fits into the workflow that developers are used to, how they’re normally used to designing things, configuring things, and provisioning things. And so that’s where we see the high-performing platform teams and some of the most successful platform initiatives. They usually are driven by a code-based interface.

Jeff Doolittle 00:19:47 So how is that delivered exactly, then? Let’s say I’m in Java, my developers are in Java. Are they looking at a GitHub repository that’s shared, or are they getting Maven packages that they’re grabbing and then loading the JAR files in? Again, trying to get back to that day-to-day experience: if I’m the developer who’s leveraging this internal developer platform, how am I in that context interacting with it and being successful using it?

Luca Galante 00:20:16 Well I mean, not to kind of like speak my own game too much, but one of the standards that we’ve open sourced and that has grown a lot in the market in the last three to six months has been Score. And Score is a workload specification that precisely lets you do that, right? It lets you basically use a YAML file, right? So it’s just a YAML file that lets you specify what your workload needs to run, and you need to do it once, and then it’s going to resolve itself depending on the environment that you’re deploying to and the dependencies that you specify in the workload. It’s going to resolve all of that in an environment-specific way every time you deploy. And so from the perspective of the developer, you specify in an environment-agnostic way what your workload needs to run. And then Score plus a platform orchestrator, like Humanitec for instance, does everything else, right? And so in this case it is a simple YAML file. You mentioned CLI; it can also be a CLI, but in this case, to keep it fully code-based, it’d be literally just a Score YAML file that lives in your repo.
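
As an illustration of "specify once, resolve per environment", here is a small Python sketch that is not from the episode. It is only loosely modeled on the idea of an environment-agnostic workload file: the dictionary layout, the `${resource.property}` placeholder syntax, and all hostnames are assumptions made for this example, not the Score schema or Humanitec’s behavior.

```python
import re

# Hypothetical environment-agnostic spec, as it might live in a repo.
# The developer names a resource ("db") but never a concrete host.
SPEC = {
    "name": "checkout",
    "variables": {"DB_URL": "postgres://${db.host}:${db.port}/${db.name}"},
    "resources": {"db": {"type": "postgres"}},
}

# Hypothetical per-environment outputs that a platform would supply
# after provisioning the declared resources.
ENVIRONMENTS = {
    "preview": {"db": {"host": "pg.preview.internal", "port": "5432", "name": "checkout_preview"}},
    "production": {"db": {"host": "pg.prod.internal", "port": "5432", "name": "checkout"}},
}

PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def render(spec, environment):
    """Fill the spec's placeholders with one environment's concrete values."""
    outputs = ENVIRONMENTS[environment]

    def lookup(match):
        resource, prop = match.groups()
        if resource not in spec["resources"]:
            raise KeyError(f"undeclared resource: {resource}")
        return outputs[resource][prop]

    return {key: PLACEHOLDER.sub(lookup, value) for key, value in spec["variables"].items()}


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(env, render(SPEC, env))
```

The same `SPEC` is rendered unchanged for both environments; only the platform-side values differ, which is the property the conversation attributes to a workload specification.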

Jeff Doolittle 00:21:31 So for some of our listeners who may still be in more of a client server environment, which let’s be honest, it may be the 21st century and we may almost be a quarter of the way through it, but there are still a lot of companies that are working to try to move from more of a client server model into more of a Cloud native model. So, explore a little bit more of what you mean when you say a workload just to make sure that for listeners who may be less familiar with things like Kubernetes and Cloud orchestration platforms, that they kind of have a sense of what we’re talking about.

Luca Galante 00:21:58 Absolutely. So a workload in our world is really a representation of an application, effectively, right? And specifically of a containerized application. When we talk Cloud native, we’re usually talking containers, and that’s the vast majority of what we see in terms of platforms being built. Because again, if you think about platform engineering, as we said, it is this enabling of true you build it, you run it; it is really DevOps for the Cloud native era, right? So that’s kind of what we’re talking about here. And so in this context, a workload really is a unit that you specify and that connects to external dependencies. This could be something like a DNS, it could connect to a database, it could connect to an external secrets manager, like a Vault instance by HashiCorp, stuff like that, right? And so that’s really the unit that in the vast majority of cases developers are going to be concerned with, right? And so, when you’re designing your platform, you really want to optimize for this specific use case and for this specific unit, right? Like, what is the best way to let developers just describe their ideal application architecture through this workload specification? Which is ultimately what Score, for instance, is: a workload specification.

Jeff Doolittle 00:23:31 So we’ve explored a little bit about these concepts where you could build a developer portal, like we’ve said you could do that, but maybe that’s a place where the ROI is not there. We’ve talked a little bit about declarative tooling and CLIs. Is there anything else from a developer’s day-to-day experience where they’re going to benefit more from a platform engineering approach? Are there other things, like, you know, we talked about containerization, maybe there are things like SDKs or other things? I just want to give our listeners a flavor of, if I work in a company that’s doing platform engineering, what are other ways this is benefiting me as a developer?

Luca Galante 00:24:07 Totally. I mean, look, I think the main thing here is, in working at a company that is using Cloud native technologies, you already benefit from everything that Cloud native technologies provide out of the box, and we don’t need to cover all of that. But the key thing with a company that is applying good platform engineering practices is you’ll get the benefits of using the Cloud native technologies without the downsides, which can really be a bummer in a lot of cases. And honestly, the easiest way of explaining it is that they translate into waiting times, right? From the developer perspective, I think the biggest pain is literally waiting. You just want to get something started. You want to do a quick experiment, you want to test something, you just want to get a simple database or a simple environment, and you can’t get that, right?

Luca Galante 00:25:01 And that’s really painful. And so, you basically go from waiting, let’s say, an hour, two hours, three days, two weeks in the worst cases I’ve seen, right? To zero. It’s instant. You click a button, you run a command, you deploy a script, and whatever it is that you need, it is just self-served. It just happens, right? And I mean, the other pain point that some people in the audience might relate to is what I call shadow operations, right? And this is a little bit what you were touching on, which is you have the interesting scenario sometimes where you have junior developers and senior developers in product teams, right? And you have operations that’s completely underwater, as we described. And so basically what ends up naturally emerging is that the senior colleagues will start taking over ops, right?

Luca Galante 00:25:57 Because the junior colleagues will be like, hey, can you help me with this? Can you help me with that? And the seniors are the ones that are more familiar with handling the scripts and all these things. And so they’ll end up doing it, which then basically makes them a further bottleneck, right? So now you have operations being a bottleneck on an organizational level, and then you have the senior people being a bottleneck on a team level. And frankly, you have your most experienced and most expensive development resources being blocked and blocking others, which basically just compounds this frustration, right? And so, you can think of what’s in it for you as a developer: frustration disappears, waiting times disappear. And so, you get all the benefits of Cloud native without the nonsense.

Jeff Doolittle 00:26:42 Yeah. That focus on bottlenecks I think is really interesting. And I’ll put a link in the show notes for listeners to a book called The Goal by Eliyahu Goldratt. And there’s a similar book that a lot of us are familiar with called The Phoenix Project, but it’s essentially a redo of The Goal for software engineering. But the concept of bottlenecks I think is really important. And I’ve seen it a million times where you have a team with one senior engineer and they throw five juniors on the team and they think that’s going to work. And the problem is the senior engineer is no longer a senior engineer. They’re basically just constantly interrupted. So, the juniors are waiting on the senior to do things, but the senior’s constantly interrupted by the juniors so they can’t do things, and there you have it.

Luca Galante 00:27:20 Exactly. Yeah, yeah. I call it shadow ops, but yeah, precisely that.

Jeff Doolittle 00:27:25 Yeah. And I guess in this case, shadow in a bad way, not like, it’s cool, I’m in the shadows and I’m incognito. It’s like, no, this is bad. This is like, yeah, someone’s lurking in the alley and they’re going to take you out kind of shadow. Like, this is no good.

Luca Galante 00:27:38 Exactly.

Jeff Doolittle 00:27:40 Yeah. So, let’s switch gears and let’s look at the other side. We’ve kind of looked at, if I’m a developer and I’m working with platform engineering, what is this going to give me? And we’ve talked about some of the things you might do to deliver that and how it can help with reducing bottlenecks, because there are consistent practices, there are declarative models, there’s the ability to basically spin up the infrastructure and the platform tooling that I need much more quickly instead of all the waiting around in the bottlenecks. But on the other side of things, let’s say I’m somebody who’s building an internal developer platform. What does my day-to-day look like? What are the kinds of things that I’m doing to build out this developer experience within our organization?

Luca Galante 00:28:18 Yeah, that’s a great question. So, I think the main thing that you’re doing is listening, right? I think if you were to cluster them, the two key skillsets for a good platform engineer, in my opinion, are product mindset and communication, basically, right? And so let’s break that down. Product mindset is something that we really advocate for a lot in the community. And it’s the whole idea of platform as a product, right? It’s the idea that, hey, if you come from a DevOps background, and we were talking about it earlier before we got on, the usual mindset there is, I’ll build all the systems and I’ll teach them to the developers, right? Instead, you don’t want to do that, because again, as you scale and as you’re adding more systems, that’s just not sustainable.

Luca Galante 00:29:09 And so what you want to do instead is enable them to self-serve whatever it is that they need without having to understand it. Now, if they want to learn, if they want to understand it, you have different knobs that you can sort of turn to lower the abstraction and give them more context, right? Or you can go the other way around, basically. And so a lot of the work really becomes figuring out what’s the right level of abstraction, what’s the right type of golden path for the right type of users. And so let’s get more pragmatic. Let’s say you are working with a senior backend engineer, right? And they really love working with their YAML files and their Helm charts. Now, if you’re going to provide them with a golden path that is basically a very abstracted UI where they can only do a couple of clicks and they can no longer specify the amount of CPU that their pods are consuming.

Luca Galante 00:30:07 And all these things, they’re going to be really mad at you. And the likelihood of your platform engineering initiative working out is getting lower and lower. Similarly though, if you are building an interface and a golden path for, let’s say, your junior frontend engineer who basically just wants to deploy their application, their front end, somewhere to test some changes that they’ve made to the code, they don’t care whether you’re running on EKS or GKE; they probably don’t care whether you’re running on Kubernetes at all. And you could argue they shouldn’t. And so, in that case, giving them access to Helm charts makes absolutely no sense, right? And so, then you might want to give them, whether it’s a UI or, as I said, still even in that case, I think code-based interactions are preferred by all types of developers regardless of seniority.
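
One way to picture "different levels of abstraction for different users" is a golden path with platform defaults plus optional overrides. The following Python sketch is not from the episode; the setting names, default values, and the `effective_config` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical defaults chosen by the platform team. A developer who does not
# care about Kubernetes details never has to see or change these.
PLATFORM_DEFAULTS = {"cpu": "250m", "memory": "256Mi", "replicas": 1}


def effective_config(overrides=None):
    """Merge optional developer overrides onto the platform defaults.

    Only settings the platform exposes may be overridden, so the "knobs"
    stay under the platform team's control.
    """
    overrides = dict(overrides or {})
    unknown = sorted(set(overrides) - set(PLATFORM_DEFAULTS))
    if unknown:
        raise ValueError(f"unsupported settings: {unknown}")
    return {**PLATFORM_DEFAULTS, **overrides}


if __name__ == "__main__":
    # Junior frontend engineer: declares nothing and gets sensible defaults.
    print(effective_config())
    # Senior backend engineer: lowers the abstraction and sets CPU explicitly.
    print(effective_config({"cpu": "2", "replicas": 3}))
```

Both users go through the same code-based path; the difference is only how much of the configuration each chooses to take control of.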

Luca Galante 00:31:03 And so then you might want to do that, right? And so, then your day-to-day as a platform engineer really becomes that, at least for one part, really like listening to developers and like building this feedback loop and figuring out what is the right level of abstraction and context that I need to provide to them. On the flip side of that is then communication, right? And, I often say, hey, if you were to plot on a graph where like the Y axis is communication skills and the X axis is time, right? And you’d start all the way from kind of like your sysadmin to your infrastructure, DevOps, Cloud Ops and then eventually platform engineer, right? It would basically be a curve up and to the right. And the reason is progressively these roles have had to like be better and better communicators.

Luca Galante 00:31:49 And the pinnacle at the moment of that is the platform engineer. Why? Because you need to be a really effective communicator with three different groups of stakeholders. Your developers, and we already covered that, right? Like figuring out the right level of abstraction, sell it to them, make sure that everybody’s happy, make sure that your senior backend people understand that you’re not taking context away from them, that you are avoiding them doing shadow ops eventually and so on and so forth, right? So that’s how you speak to the developer. But the lingo that you use to speak to your C level, which is another stakeholder, is completely different, right? If you talk about like developer experience, look, it’s funny because a lot of times CTOs will tell you, oh, we really care about developer experience. And the reality is, oftentimes they actually don’t. Like what they care about is time to market, is your DORA metrics and yada yada, right?

Luca Galante 00:32:43 And so that’s how you need to sell it to them, right? It’s a completely different language that you need to speak really. And then the third group is operations. And I think this is something that a lot of people get wrong, which is they sort of don’t make very clear from the get-go that the mission and vision of the platform team is that of shipping a product, right? So, reconnecting to the first point we said, and not basically ending up as yet another ops DevOps team, right? And that needs to be made really clear to the developers, right? Because you don’t want them to just like send you ticket requests basically. But especially to other existing operations teams, because again, it’s not like the platform team is replacing the current SRE, you still need an SRE. You might need, in fact, even like multiple SRE and infrastructure teams and even multiple platform teams. That’s something that we see in very large organizations where they compete with each other.

Luca Galante 00:33:38 But that’s a separate thing. And so, you need to speak to them and make sure, hey, there is a clear separation of concerns here. You’re the one like worrying about maintainability, scalability, reliability. I’m the one worrying about building a product which makes your job better and easier, right? And so it’s really complex multi-stakeholder conversations that you constantly need to juggle. And so that’s really what your day-to-day looks like. And this is why by the way, the kind of like role of the platform product manager, I think becomes super crucial, right? Because they’re really going to be the link basically between all these different stakeholders and the platform team, right? And they really need to be the kind of top communicator in that case.

Jeff Doolittle 00:34:26 Yeah. You mentioned listening as an important factor in good communication. It’s like the old school wisdom that you have two ears and one mouth because you should listen twice as much as you speak. Of course, here we’re on a podcast, which is all about us talking and our listeners listening. But what I appreciate about that too is we’ve heard of all these different forms of something driven development. There’s behavior driven development, there’s domain driven development, there’s all these different things. But what I’ve heard some of us joke about in the industry is something called pain driven development. But essentially pain driven development goes back in some ways to that conversation we had before about ROI. And if you can find where the maximum pain is with relatively low effort to resolve the pain, then you’re going to have good ROI. Whereas if you take something that’s not that painful, but you put a bunch of time and effort into it, then effectively you’ve reduced pain, but you haven’t gone for the big guns. I think that’s a lot of what we’re talking about here is where’s the pain, where’s the wait. You mentioned waiting, that’s pain. We mentioned bottlenecks, that’s pain. So it’s like a pain reduction strategy for our engineering teams.

Luca Galante 00:35:31 A hundred percent. I love that framing. And that is exactly how we think about it at Humanitec. We’ve actually surveyed a bunch of like 2000 or so engineering teams of all different sizes. And what we asked them was basically we were trying to think like, okay, how do you identify the pain, right? And so we asked them, okay, out of a hundred deployments, how often do you do these things, right? How often do you create a new service? How often do you spin up a new environment? How often do you provision a database? How often do you do a rollback? How often do you change configs, whatever, all these things, right? And then we asked them to basically attach a time to that, right? Like, so how long it takes on the development side of things and how long it takes on the ops side of things. And so with that matrix in front of you, it’s really easy. Like you’ll really find the pains yelling, screaming at you. It’s like, oh it’s clearly here, here and here. And usually it’s in configuration management, right? In handling application configurations, infrastructure configurations, right? Like dealing with those Helm charts, dealing with those YAML files, dealing with those Terraform and IaC modules. That’s where we see it oftentimes. Like there is the highest amount of fear, cognitive load and pain in the system.
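As an editorial aside, the frequency-times-duration matrix described above can be sketched in a few lines of Python. This is only an illustration: the `Task` class, the `rank_by_pain` helper, and every number below are hypothetical and are not taken from the Humanitec survey.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    per_100_deploys: float  # how often the task occurs per 100 deployments
    dev_hours: float        # time spent on the developer side per occurrence
    ops_hours: float        # time spent on the ops side per occurrence

    @property
    def pain_hours(self) -> float:
        """Total hours consumed per 100 deployments across dev and ops."""
        return self.per_100_deploys * (self.dev_hours + self.ops_hours)


def rank_by_pain(tasks: list[Task]) -> list[Task]:
    """Return tasks sorted from most to least time consumed."""
    return sorted(tasks, key=lambda t: t.pain_hours, reverse=True)


# Hypothetical numbers for illustration only, not survey results.
tasks = [
    Task("create new service", 2, 4.0, 8.0),
    Task("spin up environment", 5, 1.0, 3.0),
    Task("change configs", 30, 0.5, 1.0),
    Task("rollback", 3, 0.5, 0.5),
]

for task in rank_by_pain(tasks):
    print(f"{task.name:<22}{task.pain_hours:6.1f} h per 100 deployments")
```

With these made-up inputs the frequent, individually cheap task (changing configs) comes out on top, which is the kind of result the matrix is meant to surface. Weighting developer and ops hours equally is a simplifying assumption.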

Jeff Doolittle 00:36:48 Absolutely. But what’s interesting about that to me too is every organization is going to have different amounts of pain and different kinds of pain. And so I think what this speaks to is, I’ve seen and talked to platform teams where it sometimes feels like they’re trying to create the next one platform to rule them all. And what we’re talking about here, there’s aspects of this really of servant leadership where if you’re a platform engineer, your job is not to come up with the coolest platform to rule them all and spend all your time in the Cloud working with the tools, and of course you have to do that to understand them. But the reason you’re understanding them is so that you can make developers’ lives easier. And the only way you can do that is by actually understanding what they’re struggling with. You can’t just in a vacuum say, here’s the new cool platform. But does it really solve their problems? And I think that hearkens back to what you say is the product focus. When it’s a product, the customer is the one who’s driving the innovation and driving the areas of pain that need to be reduced, not the person who’s building the implementation or building out the infrastructure.

Luca Galante 00:37:52 A hundred percent. And that’s the thing, the focus needs to be on making something that is 10x better for the developer, not for yourself, not for you to play with like shiny new technologies and like, oh I built this thing using that. It really needs to be better for developers. Because otherwise there’s also not enough incentive for them to move over from the current status quo and setup over to the platform, right? And frankly when we’re talking about the problems that we talked about, like, if you’re waiting a week and now you’re waiting zero, it’s at least 10x better. So it’s not like, because a lot of times like, oh, build a 10x better thing. It’s like, it’s hard, right? Like orders of magnitude are hard. But I think if you understand the starting point, which is usually pretty bad, it’s actually not impossible to achieve, but you need to deliver that, right?

Luca Galante 00:38:38 If you’re delivering something that’s like, meh, better, like it’s just not going to cut it, right? And then sometimes it’s like the most obvious things you say, but still I think sometimes people just don’t, they’re like roll out this platform and they just like boop, drop it on top of the existing setup and then they expect people to use it and there’s no like lighthouse approach and no rollout strategy and sometimes even worse they force developers to use it, so those are all anti patterns that we don’t see too much of to be honest. But it does all fall back to this idea of like platform as a product. And the beautiful thing by the way about treating your platform as a product is that all the learnings and best practice stuff that we’ve had for the last like 30 to 40 years of product management apply.

Luca Galante 00:39:27 So like all these like rollout strategies and this like, find a lighthouse team and then progressively increase users from there, like measure things, get feedback, qualitative, quantitative, blah blah blah. Like all the things that we already know of from building out our products just apply to building your platform. And this is why it’s super important I think, and sometimes here I do see like a lot of people fail, which is like they just take a bunch of DevOps engineers that have like a different type of mindset which is just like building infrastructure, not shipping products really. And it’s like, hey, now you’re our platform team. Go build a platform and then guess what? Like they don’t make that transition, right? Because they don’t have the right PM, they don’t have the right framework around them to succeed. And that’s how sometimes platform initiatives fail.

Jeff Doolittle 00:40:14 Yeah. I think this has been a development in the industry over the last few decades in particular where it feels like in the early stages, and it kind of made sense in the day, it was very much technology focused and we’ve all seen the jokes and the tropes and the memes about the software engineers who only care about tech and they have absolutely zero human skills at all. And a lot of us have grown very weary of that. And I’ve really tried to spend a lot of time encouraging software engineers to hone their soft skills and their human skills. Because a lot of what we’re talking about here is, look, you have to understand the technology. Someone has to, and if you’re going to lower the barrier to adoption for a developer, it behooves you to understand the technology. But then you have to hone your soft skills and your human skills so that you can actually help other people be more productive. And to me, being a senior engineer ideally is less about your technological prowess, as critical as that is. But it’s also about your ability to understand the needs of others and make them successful. To me that’s a true 10x engineer by the way. It’s not somebody who’s 10 times faster than everybody else, it’s somebody who makes everybody else around them more successful.

Luca Galante 00:41:18 Yeah, a hundred percent. And, still, you see so many people thinking that oh my God, my platform initiative’s going to fail because I picked the wrong tech. It’s almost never the case, right? It’s like it’s not because you picked Argo over Flux that you’re going to fail, it’s because you failed at the cultural transformation, right? That’s why org transformations fail. They fail because of culture, not because of technology.

Jeff Doolittle 00:41:42 Yeah. The problem is people, the solution is people and that’s the problem.

Luca Galante 00:41:46 Right, exactly.

Jeff Doolittle 00:41:47 But also the opportunity, and I think there’s the call out to our listeners. So let’s switch gears a little bit here and talk about standards because as we’re talking about this, I’m thinking of this idea in my mind and when I say that, let me frame it a little bit. There’s so many knobs, there’s so many options, there’s so many things you could do. And I imagine there’s some risk here that if you’re trying to do platform engineering, that you can over abstract. We’ve talked about that on the one side of things. Like you just say, everybody gets Heroku and that’s it. Like that. And not to pick on Heroku, that’s great for certain workflows and use cases. But it doesn’t work for everybody. And on the other side of things though, I imagine there’s under abstraction and I imagine you’re going to start there. No matter how hard you try, you’re never going to get the perfect abstraction right the first time. So maybe talk a little bit about, in your experience, if there’s ways that teams can kind of figure out how do we do that? How do we kind of manage the change management of the dynamism of this over time so we don’t have permanent over abstractions or permanent under abstractions of these platform concerns.

Luca Galante 00:42:49 Yeah, it’s a great question. And, just last week I was giving a presentation at Platform Days, Sao Paulo, it is the first event that the community did there. And I was actually talking about the 10 fallacies of platform engineering. And it’s funny you mentioned that because like two of the core ones are what I call the abstraction fallacy and then the freedom fallacy, right? And abstraction fallacy is basically like think about a developer just staring at a wall, and they’re like, I can’t do anything anymore right now. Right? And so really, as we said, you want to build golden paths, not golden cages, right? And abstractions are really good, but they’re only good if you want to use them. And if there is kind of like a side path where you can circumvent them if you need to, right?

Luca Galante 00:43:37 So to be specific, let’s say your IDP spins up a default Terraform module if a developer requests a database, right? And this has been vetted by security, it works in all cases, right? Most of the cases. But then one day comes and one of your developers has a legitimate edge case, right? So what you want your platform to be able to do is that it lets the developer extend that Terraform module on their own according to roles and permissions and so on to respond to that edge case. Now, if the edge case happens more and more, then you realize, hey, this is a new golden path, right? And so then you take it and you formalize it and you standardize it, right? And this is how actually the platform becomes like a listening channel almost, right? To basically capture information from the developers to the platform engineers, right?
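As an editorial aside, the "vetted default plus permitted extension" idea above can be sketched in Python. This is a toy model under stated assumptions: module inputs are plain dictionaries, `GoldenPathRegistry`, the role names, and `PROMOTION_THRESHOLD` are all hypothetical, and nothing here is Terraform's or Humanitec's actual API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

PROMOTION_THRESHOLD = 3  # assumed cut-off for "this edge case keeps happening"


@dataclass
class GoldenPathRegistry:
    defaults: dict[str, dict]            # resource type -> vetted default settings
    roles_allowed_to_extend: set[str]    # roles permitted to override defaults
    override_counts: Counter = field(default_factory=Counter)

    def resolve(self, resource: str, role: str,
                overrides: Optional[dict] = None) -> dict:
        """Return settings to provision, merging permitted overrides onto the default."""
        settings = dict(self.defaults[resource])
        if not overrides:
            return settings  # the golden path: the vetted default, untouched
        if role not in self.roles_allowed_to_extend:
            raise PermissionError(f"role {role!r} may not extend {resource!r}")
        settings.update(overrides)
        # Record which settings were overridden so recurring edge cases become visible.
        self.override_counts[(resource, tuple(sorted(overrides)))] += 1
        return settings

    def promotion_candidates(self) -> list[tuple[str, tuple[str, ...]]]:
        """Override patterns seen often enough to consider standardizing."""
        return [key for key, n in self.override_counts.items()
                if n >= PROMOTION_THRESHOLD]


registry = GoldenPathRegistry(
    defaults={"database": {"engine": "postgres", "storage_gb": 20}},
    roles_allowed_to_extend={"senior-backend"},
)
for _ in range(3):
    registry.resolve("database", "senior-backend", {"storage_gb": 200})
print(registry.promotion_candidates())
```

Counting overrides is one simple way to model the "listening channel": the platform team reviews `promotion_candidates()` and decides whether a recurring extension deserves to become a new standard path.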

Luca Galante 00:44:33 We’ve all seen that picture of the college campus, right? Where you have the regular like pathways and then like all the ones in the middle on the fields. And so the platform in that sense becomes a really good way of like seeing those things form and then formalizing them before they sort of formalize themselves on their own. Because it’s sort of like a truism of platform engineering: if you don’t build your platform, it will build itself, right? The problem is it will build itself in the wrong ways, right? And then so the opposite of that is the freedom fallacy, right? And the freedom fallacy is where you basically have your platform team going, like AWS console access for everybody? And it’s like, hey, oh Helm, you want to use Helm? Great. You want to use Argo, Terraform, you want to know Kubernetes policies?

Luca Galante 00:45:22 Boom, right? And so then you get all the downsides of scale with no benefits, you get shadow ops, productivity drops, all the junk, right? That we talked about. And so again, the solution there is like think really, really hard where additional cognitive load is really necessary. It might be in some cases, right? But, Aaron Erickson, this guy is a core contributor as well to the platform engineering community, a friend, and he built what ended up becoming the adopted platform initiative at Salesforce. He said to run a thousand different services in Kubernetes, you shouldn’t need a thousand Kubernetes experts, right? And so, I think that encapsulates this like freedom fallacy really well, right? So you really want to always optimize for the degree of cognitive load that the team is able to handle and is willing to handle, not more than that, it doesn’t need to. Right? So those are like kind of the two sides of the same coin.

Jeff Doolittle 00:46:17 Yeah. An analogy that comes to my mind is one I use a lot, which is if you’re going to build a house, you could do everything yourself, power generation, water sanitation, water discovery. You could dig a well, you could do everything yourself. And then you could say that anyone who’s going to work on this house needs to know everything about all of that kind of stuff. But at some point we’ve discovered that that’s not very robust. It doesn’t really scale very well. It doesn’t standardize, it’s not repeatable. But, that’s how people used to build houses, right? Power generation was a water wheel, right? I mean these kinds of things. And now what you say is you’re still an expert on how to build the house. We’re not saying you’re not smart, but what we’re saying is you don’t need to be a mechanical engineer, an electrical engineer, a chemical engineer, a civil engineer, you can focus on building a house and platform engineering is going to provide all of those services that you need to generate and distribute and all this kind of stuff. So then you can be expert on the specific domain of how do I build a house? And you don’t have to be an expert on the domain of everything else that gives you the service delivery that you need in order to build that house.

Luca Galante 00:47:18 A hundred percent. Yeah.

Jeff Doolittle 00:47:20 Great. So are there any traditional best practices that we haven’t covered yet that fit well with platform engineering? What are things we want to keep from what we’ve learned from DevOps or from what we’ve learned in the PaaS that we can still continue to leverage? Just to give our listeners a sense of, oh these are good things I’m already doing, I’m already familiar with, let’s keep doing those things in this platform engineering world.

Luca Galante 00:47:41 Yeah, I mean I think this really reconnects to what we were saying earlier, which is the beauty of treating your platform as a product is that all the product best practices that we’ve learned in product management in the last few decades apply. And so I think we’ve covered a few of them. I think another interesting one that I’ve seen as well. And so like starting from kind of like what’s the issue? Like it’s another one of the fallacies, which is the sort of like the loudest voice fallacy. And it’s like a really funny one because it’s like think about your hardcore backend engineer or [inaudible] lover who asked their partner to get married through prompt commands, right? So that person, we know who that person is in our team and they’re usually the loudest voice in your team. And they’re also very articulate and they are going to be the ones that a lot of times are going to be like, oh hey, this platform is abstracting things away, or like this doesn’t work for me, blah, blah, blah.

Luca Galante 00:48:37 Right? And so here’s a really important thing that I think we’ve all learned from the last few decades of shipping products, which is a good product and, in this case, like a good platform is not designed for the strongest link in the chain, but for the weakest one, right? And so when you are designing that, make sure that you ask obviously like a diverse group of people individually though, right? Because if you’re going to put like a hardcore SRE in a room, right, with a couple of junior JavaScript developers, you’ll probably get the wrong input because especially like, I, again, I was speaking about this in Brazil last week and you have certain cultures where this is even more true than others, right? So a culture like Brazil is usually quite sensitive to like the seniority more than others. And so that’s where you’ll see like basically everyone that is like a little bit more junior or something like completely be shut down and shut off by the loudest voice in the room.

Luca Galante 00:49:36 And that can be quite dangerous during this process of you figuring out what’s the right golden path and so on for my org. So that’s just like another example I think of, again, everything that we can leverage from product management best practices and is by the way, why the product management channel in the platform engineering Slack space is I think one of the most interesting ones, right? It was actually the one that was really like started by community members and it’s one of the most fascinating ones for me to look at because it really encapsulates, if you look at all the different threads, the complexity of all these different challenges that we talked about. And it is really fascinating. It’s really interesting. So yeah, I think if people are curious about how all this like product management and everything, what we can apply from what we already know to building a platform, I think looking through the lens of product management is a great way of thinking about it.

Jeff Doolittle 00:50:36 Yeah. And broadly speaking too, a lot of this triggers for me a thought that I first heard about the idea of technical empathy from one of my mentors, Monte Montgomery. And there’s something he says all the time that I think is really pertinent. He says the job of a software architect, although I say broadly, whether you’re a software architect or a software engineer who’s been in the industry for a while, he says, your job is to take a development community of a broad spectrum of acumen and make them productive. And that really touches, I think, a lot on what we just talked about where if you have a single voice that’s having a chilling effect, then you’re going to struggle to make a development community of a broad spectrum of acumen productive. And so you need to find ways to do that. And I think that to me again, is the measure of what I would call a 10x engineer. They also don’t do it alone, right? They’re able to contribute to making that development community productive, even though they have very different skillsets. And, as like Robert Martin has been saying for years, there’s, I think at any given time, half of the engineers in the software development industry have less than five years of experience because there’s so many new ones, right? And so the challenge is not going away. The challenge continues. Yeah.

Luca Galante 00:51:43 Yeah. Yeah. I love that framing.

Jeff Doolittle 00:51:45 So let’s talk a little bit about your company and how Humanitec employs these platform engineering practices and helps companies to be successful with adopting them.

Luca Galante 00:51:55 Yeah, totally. So Humanitec, you were giving this analogy of building a house, right? And you don’t want to invent everything from the developer perspective, but the same thing is true from the platform engineer perspective, right? You don’t want to rebuild the entire stack from scratch, right? And what we believe a good platform team does is they focus on the last mile optimization, right? Because that’s really what’s going to set your platform, your IDP, apart from the rest: a platform team, platform engineers that listen, right? And they really build this platform, they tweak those last details that really make sense for your own org, because otherwise you might as well have a PaaS, right? Like otherwise, what’s the point of having a platform team building an IDP? And I mean, again, PaaS don’t work at the enterprise.

Luca Galante 00:52:49 That’s why sort of like Heroku was really great at the beginning and then kind of like flattened out because they never could figure out enterprise, right? In the enterprise, everyone is a bit of a special snowflake, and they need their own platform team. And so the way we think about our products at Humanitec is empowering these platform teams by giving them an unopinionated toolbox to go and build their own opinionated workflows and own opinionated platforms, right? And specifically that toolbox has a core product, which is the platform orchestrator. And you can think of the platform orchestrator as a core configuration and standardization engine that kind of is at the heart of your platform layer. And then you have drivers, which is an open-source library of connectors that let infrastructure elements communicate with each other and specifically with the platform orchestrator as well.

Luca Galante 00:53:41 And then SCORE, which is kind of what we touched on earlier, which is this workload specification and is the sort of, at this point I would say the interface of reference for developers to interact with their platform layer. And so that’s kind of what we provide. And if you think about it, we integrate with pretty much everything that’s out there. Like the beauty of this model of the platform orchestrator and the drivers is we integrate with any piece of infrastructure pretty much and with anything that you already have, right? So we can deploy, but if you already have Argo CD, we can let Argo CD deploy instead. And so the whole point for us is really around standardization of configurations, both application and infrastructure configurations because as we said earlier, that’s where we see the most pain and that’s what we focus on. And then from a product kind of philosophy perspective, our philosophy is like integrate and embrace because again, enterprise is complex, right? And I think Heroku and other sort of solutions that try to be the one size fits all, DevOps all in one kind of one stop thing, that just doesn’t work at the enterprise. And so we want to really take a lot of the pain away, but be an easy plug and play solution that can work with your existing setup.

Jeff Doolittle 00:55:00 Awesome. Well, before we wrap up, let’s talk a little bit about the PlatformCon conference. What is it, what’s your involvement and why should listeners attend?

Luca Galante 00:55:09 Well, I mean they should attend because Jeff is speaking this year. Oh, okay.

Jeff Doolittle 00:55:13 Well that, I was not asking for you to say that, but.

Luca Galante 00:55:17 So PlatformCon is the number one platform engineering conference. We started doing it last year for the first time and it made a lot of noise in the industry. It got huge response; I think a lot better than what we ever dreamt of originally. We had I think 7,000 signups and about 6,000 just in the day one stream last year. 80 talks, 20+ hours of content. So it’s amazing. It’s virtual, it’s free, so anybody can join. It’s a two-day long conference. It’s going to be on the 9th and 10th of June this year. And PlatformCon ’23, and I mean this year we already see the numbers are going bananas, right? So we already crossed 8,000 signups. So we are already well above what we did in ’22 and we’re still like two, three months away from it. And we have amazing sponsors, people like Atlassian, Docker, Google Cloud, and we are crazy, we were talking about this before jumping on, but we had almost 500 talk submissions, so we could be really picky this year. We picked 130 to 150.

Luca Galante 00:56:25 It’s going to be the final number. And so the quality of the speakers is really going to be amazing. The quality of the content is going to be amazing. So we’re going to have probably like 40, 50 hours of content. And what I’m really excited about is the format. The way you can think about it is it’s completely async, right? So we usually have a kickoff where we welcome everybody that I host both for the European morning and for the Americas morning. And then, that’s like about 20 minutes long. And then off you go, you can take your kids to school, go to work, whatever, and consume content at your own pace. And then what we do is we crunch all the action in a two-hour Q&A on the platform engineering Slack. And this is fantastic because unlike a lot of other virtual events that force you to use this like weird virtual event conference tooling that you kind of like use once and never again.

Luca Galante 00:57:16 This really leverages two existing assets of the community. The Platform Engineering YouTube channel, that’s where we publish all the talks as well as the PlatformCon.com website, and the Platform Engineering Slack where we already have 11,000 members. It’s super active. It’s really like the beating heart of the community. And so there’s no like cold start, awkward, like people don’t talk, like there’s already a ton of action and basically PlatformCon just becomes like a boost on top of it, right? And so what we ask the speakers to do is, hey, when the Q&A starts, the speakers for that day, they start a thread in the Slack basically posting a link to their YouTube talk with the main sort of like takeaways. And then it’s kind of like, hey, ask me anything. And what that does is it starts a frenzy of threads all over the Slack and people keep engaging and you have cross pollinations like speakers jumping on each other’s threads.

Luca Galante 00:58:09 It’s super fun and also super cool to see. Last year, it’s a two-day conference and I thought, okay, it lasts two days, but actually because the speakers hang out on the Slack anyways and people kept watching talks on the YouTube, the threads kept going for like 10 or 12 days, which was really cool to see. So it’s a super fun event. It’s a bit different from a lot of the other virtual conferences that people are used to, I think it’s a lot more fun, it’s less monotone. And we have some of the best platform engineering and DevOps people out there speaking. So yeah, PlatformCon.com, it’s free.

Jeff Doolittle 00:58:45 Great. So if listeners want to find out more about Platform Engineering, they can definitely go to Humanitec’s website. They can go to PlatformCon.com. Where else might they find you out on the interwebs?

Luca Galante 00:58:57 Yeah, platformengineering.org is another great kind of like starting point. And then as for me, Luca underscore Cloud, that’s where I tweet or Platform Weekly. That’s my newsletter platformweekly.com.

Jeff Doolittle 00:59:09 Great. Luca, thank you so much for joining me today on Software Engineering Radio.

Luca Galante 00:59:13 Thank you, Jeff. It was fun.

Jeff Doolittle 00:59:15 All right, this is Jeff Doolittle for Software Engineering Radio. Thanks so much for listening.

[End of Audio]
