
SE Radio 664: Emre Baran and Alex Olivier on Stateless Decoupled Authorization Frameworks

Emre Baran, CEO and co-founder of Cerbos, and Alex Olivier, CPO and co-founder, join SE Radio host Priyanka Raghavan to explore “stateless decoupled authorization frameworks.” The discussion begins with an introduction to key terms, including authorization, authorization models, and decoupled frameworks.

They dive into the challenges of building decoupled authorization, as well as the benefits of this approach and the operational hurdles. The conversation shifts to Cerbos, an open-source policy-based access control framework, comparing it with OPA (Open Policy Agent). They also delve into Cerbos’s technical workings, including specification definitions, GitOps integration, examples of usage, and deployment strategies. The episode concludes with insights into potential trends in the authorization space.

This episode is sponsored by Penn Carey Law School



Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Priyanka Raghavan 00:00:19 Hi everyone, this is Priyanka Raghavan for Software Engineering Radio and today on our show we are going to be discussing the topic “stateless decoupled authorization frameworks.” And for this we have two guests, Alex Olivier and Emre Baran. Emre is an entrepreneur and a software executive with more than 20 years’ experience in B2B and B2C product areas. He’s currently the co-founder and CEO of Cerbos. And before that he co-founded Turkey’s largest social network in the mid-2000s, called yaja.com. And after that, has been in a variety of different organizations — one is, of course, Google. And Qubit. And one of the podcasts he appeared on, they called him a serial entrepreneur. So I’m going to stick with that. And Alex, he’s the CPO and co-founder at Cerbos. He has had a wide variety of roles and experiences — be it engineer, consultant, tech lead, product manager. And there’s also this one line which says, “always an eye on developer experience.” So that’s great for us here at SE Radio. He’s worked at different companies, again, Microsoft, Qubit, and a myriad of startups with a focus on areas such as authorization, data management, and security. So welcome to the show, Emre and Alex.

Emre Baran 00:01:35 Thank you for having us. Yeah.

Priyanka Raghavan 00:01:38 Great. So on SE Radio, we have done a few shows on authorization as well as authentication. In Episode 492, which I just want to call out to the listeners, we had a show on building a consistent authorization service, mainly about the Google Zanzibar project. And then Episode 406 on the Open Policy Agent. We’ve done a few shows on OAuth 2.0 and API authorization. However, since we are exploring this topic again after a gap of about four years, can I pose this question to both of you: what is authorization? So Emre, can I start with you?

Emre Baran 00:02:16 Sure. I want to start by saying what it is not. Authorization usually comes with its twin, authentication. And authentication is the fact of who you are: are you who you say you are, and what roles and what attributes do you have in your directory? That’s authentication. And authorization is the fact that, now that we know who you are, are you allowed to do a certain action or not? And you can think about the application of this in many things in life as well as in software. The fact that you can log in doesn’t really mean you can do every action in any given software. And the control mechanism of what you are allowed to do versus not is authorization.

Priyanka Raghavan 00:02:59 Great.

Alex Olivier 00:02:59 Yeah, I think there’s a really good analogy for anyone that’s taken a flight recently: you’ve got your passport, you fly to some exotic location for your vacation. You get to passport control, they take your passport, they authenticate you by comparing your photo and your biometrics. It’s like, cool, Alex has arrived, this is his document. But the actual decision around whether you’re allowed into the country or not is an authorization decision, which is based upon: have you got the right visa? What’s your immigration status? Have you got the right funds? Those sorts of things. And that’s the check: it knows who you are, but should you be allowed in? That is the difference between authentication and authorization.

Priyanka Raghavan 00:03:33 That’s a great example, and I think maybe Alex, I’ll ask you this question then: in a lot of literature I see there’s this term called an authorization model. Is that something that you can describe for us, and maybe what are the key components?

Alex Olivier 00:03:47 Yeah, so with authorization models, there are various ways you can think about what decides access to a particular system. And the term that I imagine most of this audience would be familiar with is RBAC or Role-Based Access Control, where your authorization — your access — is controlled by whether you have a particular role or not. So you must be an admin to do certain actions. You must be a user to do other actions. You must be a subscriber to do the download action, let’s say. RBAC is one that probably most people are familiar with. ABAC or Attribute-Based Access Control is kind of either the evolution or the superset or the subset — depends on how you look at the world — of that. And that’s about deciding your access based on more than just your role. It’s about deciding access based on attributes. And those could be attributes about who you are, it could be attributes based upon the resource you’re accessing.

Alex Olivier 00:04:35 It could be attributes based on the context. So where did this request come from? Is it from a known IP? Those kinds of rules. And there are lots of different components you could bring in to decide your access. There are other models such as relationship-based access control, where your access is based upon what relations you have with a particular entity or the resource you’re trying to access. So there are different ways of approaching authorization, and there are use cases for all of those. And there are some cases where doing an attribute-based check is more sensible than doing a relationship-based one, or vice versa. And so it really goes back, as always, to looking at your requirements, looking at your use cases and then picking the model that’s best for your system and best for your requirements inside of your application.

Priyanka Raghavan 00:05:15 I think I’m going to come back with a question on that, but I think it’s a good point for me to also discuss a little bit on why you think authorization is important for software engineering teams. So Emre, I’m just going to give it to you because I thought it’d be good for you to explain this, and maybe there’s an example you can relate where things have gone bad because authorization was implemented incorrectly?

Emre Baran 00:05:38 Yeah, you can think of many different examples, but there are also real-life examples of when authorization goes wrong or when authorization isn’t taken seriously. A simple example I can give you is, imagine these neobanks, right? These neobanks give you a bank account that you can actually log into, and suddenly you start using that bank account for your company, and multiple people need access to it to be able to do certain things. But because there are no roles and permissions or limitations built into these user accounts, everybody’s capable of making transfers as large as they want, or everybody’s capable of seeing everything. And certainly as a software builder you don’t want that, right? You want to make sure everybody’s limited to their roles and the limitations of what they should be able to do. If we want to look at a disaster-case scenario, we can take a look at the news from the early days of a very popular ride-share application, where people from the customer service team, people from inside the company who had unfettered access to everything, were able to take a look at some celebrities’ accounts and the trips that they had actually taken.

Emre Baran 00:06:48 In a normal-world scenario, you only want to enable the right person at the right time to be able to look into that trip. But now everybody has access. In the correct implementation, a person should only be able to look at that account if there’s a complaint, if there’s an issue with a payment, or if there’s a complaint from a driver or from the rider. Other than that, nobody should be able to go in and look at that account. And that comes from a lack of properly thinking about authorization requirements and limitations, and not actually implementing them.

Priyanka Raghavan 00:07:22 I think that’s a case where there’s also a term, granular control, in a permission management system. So they didn’t have good granular controls is what I’m hearing.

Emre Baran 00:07:32 Exactly. Probably in that scenario the rule was: customer success employees can look at this information. That’s as coarse as it gets, but what does that mean? They can look at anybody’s information, they can look at any timeframe, any country, anything. So that’s coarse-grained. But a fine-grained rule would be: you can only look at a specific customer that there is a support case open for, or you can take a look at a customer’s trip only if you have been specifically given permission to look at it because of an upstream event that has happened.

Priyanka Raghavan 00:08:12 Okay. I think Alex, based on what Emre said, you talked about the domain model and you explained to us ABAC and RBAC and relationship-based access control. So I was wondering, when you have an authorization model, can you have many types of things? Can you have RBAC, ABAC and also ReBAC in the same model?

Alex Olivier 00:08:32 Yeah, so the way to kind of think about it is less to do with whether it’s ABAC or RBAC or ReBAC, et cetera. It’s more about: is this more of a policy-based model, or is this more of a sort of data-driven model? And what I mean by that is, a policy-based model, which is what Cerbos is, is where you have policies that define here are the different resources, here are the different actions, and here are the conditions under which those actions should be allowed. And it could be that simple RBAC role-based check where you simply say, has this user got this role? Or it could be a finer-grained attribute-based check where you’re looking for individual attributes about the user and the resource they’re trying to access. And that’s defined as a static, versioned, tested, audited policy. But the key thing in that model is there’s no actual user or resource data stored in it; it’s purely the rule set.

Alex Olivier 00:09:14 And then at evaluation time the system or the architecture brings the data to that rule set. That rule set, those policies, will be evaluated, and a simple allow or deny decision comes back, in the kind of primary use case. The other model, the other approach, is kind of where the permission is embedded in the data itself. You mentioned Zanzibar at the start; the Zanzibar white paper outlines the architecture behind Google Drive and Google Docs. And in that world, you are basically storing the data, you’re storing the relationships between resources, inside of this authorization layer itself. So in that world you don’t just store the policies, you’re maintaining the relationships or the permissions between individual resources. And so that requires you to kind of replicate and duplicate and synchronize data into your permission store, versus the policy-based approach.

Alex Olivier 00:10:01 And there, the requirement is that you bring the data to the authorization system when you need to do a check; that way it ensures it’s always up to date and correct, and you always get the answer based on the most relevant data. And so it’s kind of a choice between two approaches, and again it goes back to what makes sense for your architecture, but that policy-driven approach I personally think is kind of the one that gives you the most clarity of exactly what your rules are. And you can inspect, on the side, exactly what’s going to happen inside of the system.

Priyanka Raghavan 00:10:26 When we did the show four years back on building a consistent global authorization service, we talked about the Zanzibar project and then there was a big question there on, they had specific goals on correctness, flexibility, low latency, high availability, and large scale. Obviously, it’s Google. But then I wanted to ask you and I guess this is a question I’ve seen in a lot of other podcasts that people have asked the two of you, where does it make sense to build your own service like Zanzibar and where do you use an off the shelf authorization service? But I’m sorry, I have to ask you the question again. Can you give us some advice?

Alex Olivier 00:11:01 It’s a great question. We get asked this all the time ourselves, and the whole reason we started Cerbos nearly four years ago now is we’ve had to build this ourselves in previous companies. Myself both as a developer and then latterly as a product manager: I’ve been both the guy that had to write the code and the guy that had to write the specification, and the commonality there is, it was never a core functionality of the business we were building this in. I’ve had to build this for supply chain systems, I’ve had to build this for our tech systems, I’ve had to build this for analytics systems, I’ve had to build this for finance systems. And the common thing is those businesses were not authorization systems. We should have been spending our engineering time on delivering the features and the capabilities that our customers wanted.

Alex Olivier 00:11:39 And much like you would never build a database today, you would never build file storage today, you would never go and build an image processing pipeline today; those are the things that you can just pull off the shelf. So apart from edge cases where you do need a very specific system, we’re in a world now where there are amazing open-source projects out there where you can just go and grab one, bring it in, and be off to the races, and not have to spend time working out all the edge cases, working out all the carve-outs, debugging what’s going on inside of some custom code. There’s a rich ecosystem out there around a lot of these projects, including Cerbos, that is making this offering better without you having to dedicate time, effort and engineering resource inside of your own business to go and build things. Now, edge cases excluded, I would take a serious look at: do we really need to be spending our time on this? We’re past the zero interest rate phenomenon of the early 2020s, and we’re now in a world where we need to be really looking at: are we delivering the right value to our customers, are we delivering what our customers need, and are we putting all of our effort and focus on that, rather than these other external things that we can just pick up off the shelf and use?

Priyanka Raghavan 00:12:45 Emre, you want to add anything to it?

Emre Baran 00:12:47 I mean, Alex touched upon an important point: you wouldn’t build your own database, you wouldn’t build your own software infrastructure, unless it’s going to make your software differentiated from your competitors, unless there’s a specific need in there. One other stage of software building that doesn’t need authorization, and for that same reason doesn’t need authentication or many other security features either, is when you are actually building your POC; not even POC, let’s call it POC and POT. You want to make sure your technology can solve a problem in the world, right? And at that point you’re very much focusing on making the machine work to solve the problem. The moment you need to take that solution and actually make it available to your end users, to your customers, that’s the moment when authentication and authorization and everything else is what you need to start thinking about and put those restrictions in place.

Priyanka Raghavan 00:13:45 Great. So I think the next logical question I have is what are the challenges that one would face if you had an external or decoupled authorization? Maybe can you state like three hard challenges?

Alex Olivier 00:13:58 So I guess firstly it’s worth kind of explaining what decoupled or externalized authorization is. If you think of authorization logic, if you were to just do something quick, you will probably end up in a situation where in your code base you would have an if statement somewhere, or a case/switch statement, that says if user role equals admin, let this request go through; if user role equals manager, only allow this request under conditions X, Y, Z. And for those small applications, that’s perfectly fine; it gets you where you need to get to prove the value. Cool, move on. But as your application grows, particularly if your application starts being made of lots of services, and those services might be in different languages, any time you need to evolve or change or update that authorization logic, which, spoilers, will happen, you’re going to have to go touch that code, and that code is going to get more and more fragile as you add more complexity to it.

Alex Olivier 00:14:43 And there are going to be more places you need to update logic, and whenever the business requirement changes, you’re going to have to take that written Jira ticket or whatever and convert that into application code. And that application code might need to be in Go, might be in Java, might be in .NET, depending on what your services are. And then you’re going to have to go and touch and redeploy all your applications, et cetera. The other side of it is from a business awareness perspective: we as developers are happy to write code all day, but those that define the requirements for authorization are more on the business side of things, maybe in a security team, and may not even know code. And if they need to go and look and understand how some logic was implemented, they probably can’t, because they don’t know Java, they don’t know Go.

Alex Olivier 00:15:23 They don’t know X, Y, Z language. So the idea of externalized authorization is you are externalizing, funnily enough, all that logic out into a standalone service or a standalone component in your application stack. And that component has in it the authorization logic, and now, because it’s just another service inside of your setup, your authorization logic can be defined in something that’s maybe a bit easier for someone that isn’t a developer to understand. So it could be policy files, if we’re talking about policy-based access control; it could be lookup tables or data stores if using one of the other models. And that is the central source of truth, the one central place where all that logic is defined. It can be version controlled, it can be tested, it can be fully audited, et cetera. And then in each part of your application architecture where you want to check permissions, rather than having all that logic hard-coded in there, you’re essentially just calling out to that authorization service, and you simply say, okay, here’s the request: here’s the user, here’s the resource, and here’s the action they’re trying to do.

Alex Olivier 00:16:20 And then that gets sent over to that authorization service, which then evaluates its policies and returns back allow or deny. So you no longer need that if/else or case/switch logic littered across your code base. It’s now a simple if statement: if the authorization service says allow, do the action; if not, return some sort of error. And that really gives you two big benefits. One is, whenever you want to change your authorization logic, there’s one place you can do it: you update it once, you make sure your tests still pass, etc., push out that policy change, and then all the different parts of your application architecture that do authorization are now behaving based upon the new logic without you having to touch your application code. And secondly, and for regulated businesses or high-compliance environments this is a really key one, there is a single component in your stack that is doing all the authorization checks. There’s a single point where you can capture an audit log of every decision and every action that was made inside of your application, and that’s going to be consistent, it’s going to be well structured; you don’t have to cobble together logs from different application services, et cetera. And that gets you to a world where this externalized or decoupled authorization model gives you lots of advantages around auditability, visibility and, ultimately, scalability of authorization logic across your application.
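To make that concrete, the call an application makes to an externalized authorization service is typically just a small structured payload naming the principal, the resource and the actions, and the answer that comes back is an allow or deny per action. A rough sketch of such a check request follows, shown in YAML for readability; the resource kind, IDs and attributes are invented for illustration, and Cerbos’s own API takes an equivalent JSON or gRPC payload.

```yaml
# Hypothetical check request sent by the application (the policy
# enforcement point) to the authorization service.
requestId: "req-001"
principal:
  id: "alice"                 # who is asking
  roles: ["user"]
  attr:
    department: "finance"
resources:
  - resource:
      kind: "expense_report"  # what they are acting on
      id: "report-42"
      attr:
        ownerId: "alice"
    actions: ["view", "edit"] # what they want to do
# The service replies with an allow or deny effect per action.
```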

Emre Baran 00:17:35 And on the back of that, if we want to focus on the hard parts of migrating onto this: one, for existing pieces of software, you need to now figure out where you’re doing all these checks and actually replace them; rather than the business logic in there, replace them with an API call or a local library call to Cerbos or to your authorization check system. And the bigger one, I wouldn’t call it a challenge, but the effort that’s required, is looking at your software and trying to centralize, or trying to define, the authorization requirements of your system. How many roles do you have, and what does it mean when you have that role? Which components can that role access? Which actions can they do, under what circumstances? It’s coming up with that meta-understanding of your authorization. Once you understand it, writing that into a policy takes minutes to maybe a couple of hours, but understanding your system and being able to nail down your authorization requirements is the harder part of the process.

Priyanka Raghavan 00:18:41 So what about the challenges now that the authorization has kind of moved out to another place? It almost feels like you’re losing a bit of control, right, if you’re used to having it in your code. I mean, of course it’s great because it’s one less check to do, but what are the challenges if it’s outside? Would there be a latency challenge, or other things, if you have to go to some other place to pick up the decision to allow something?

Alex Olivier 00:19:05 As with kind of everything to do with software architecture, there’s a compromise you need to make, and one of the things that you do run into once you start externalizing authorization is you are going to put another blocking call, essentially, in your request pipeline. Now, depending on what authorization solution you are using, and whether it’s a stateful or a stateless system, that will very much determine what that deployment looks like. What we always say to Cerbos users is make sure you run Cerbos as close to your application as possible. So I’m sure many are familiar with Kubernetes. The way we recommend deploying Cerbos in that environment is you run a Cerbos sidecar in every one of your application pods that needs to do authorization checks. So you’re basically bypassing as much of the network as possible. It’s just a local call at that point. And then your authorization layer itself should be smart enough to figure out how to distribute policies in a sensible, scalable, consistent way across your architecture.

Alex Olivier 00:19:56 And so the actual runtime checks, the lookups and permission checks that are being done, are literally just talking locally inside of its own pod to get a decision. And there are lots of things you can do around choice of APIs, whether you use gRPC or HTTP, those sorts of decisions and options that you should be considering when you are doing a deployment of something like this. But the biggest one that does need some thought is your deployment, to reduce things like latency and the number of hops involved. Do you start doing things at the gateway level? Do you start doing things down at the service level? Do you use authorization just to populate your claims and your token? There are other approaches you can take, still using an authorization service that is managed centrally, to get to where you need to be from a security point of view but also from a performance and an SLO perspective for your system.
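As a rough illustration of that sidecar pattern, a Kubernetes Deployment might run the Cerbos container next to the application container so permission checks stay on localhost. This is only a sketch: the application image, environment variable, ports and config volume are assumptions for illustration, not details from the episode.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels: { app: my-app }
  template:
    metadata:
      labels: { app: my-app }
    spec:
      containers:
        - name: app
          image: example.com/my-app:latest     # hypothetical application image
          env:
            - name: CERBOS_ADDR                # the app talks to the sidecar over localhost
              value: "localhost:3593"          # Cerbos gRPC port (3592 is HTTP)
        - name: cerbos
          image: ghcr.io/cerbos/cerbos:latest
          args: ["server", "--config=/config/config.yaml"]
          volumeMounts:
            - name: cerbos-config
              mountPath: /config
      volumes:
        - name: cerbos-config
          configMap:
            name: cerbos-config                # holds the Cerbos server configuration
```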

Priyanka Raghavan 00:20:42 Okay. So that brings us then to Cerbos, which is a policy-based access control framework. So what inspired the creation of Cerbos, and what’s the gap in the market that you’re trying to fill?

Emre Baran 00:20:54 What inspired the creation was the fact that, as Alex was talking about earlier, in our previous lives we had to build this. I think collectively within our founding team we had to build, or rebuild, or improve this authorization 10 times. And every single time we’ve done it, we’ve always been complaining about why are we still building this? This contributes zero differentiating features to our product, yet it was something that we had to go and build. And at the time, looking at the solutions in the market, none of those things really addressed the challenges that we had. So the gap in the market that we saw was that there wasn’t a good decoupled, or externalized, authorization solution that we could have easily implemented and moved on with our lives. And funny enough, as we were starting Cerbos, that was pretty much the same time when many other decoupled or externalized authorization providers also started doing the same thing, which kind of told us: okay, the market is now ready for this, this is the right time to do it.

Emre Baran 00:21:57 And our goal was always making life easier for software developers so they can just purely focus on what they want to build, what they need to build rather than having to reinvent the wheel when it comes to security. And as we all know, nobody really likes to reinvent the security wheel because it’s hard. It has a lot of loopholes, it has a lot of gotchas, and we wanted to provide developers something robust and safe, secure and fast enough so that they could have one less worry as they were building the product they were building.

Priyanka Raghavan 00:22:32 You talked about Cerbos, the primary users being developers, but are you focused on startups or enterprises or what are the primary users of Cerbos?

Alex Olivier 00:22:42 So the users we see will vary based upon the type of organization. Cerbos at its core is an open-source policy decision point. It’s an open-source project ready to grab off GitHub; go and enjoy it, it’s Apache licensed. But the requirements for authorization, and who’s involved with authorization, will very much depend on what your business is doing. What we see is, for startups earlier on, as I said earlier, you kind of get going and prove the value with something quite simple, and then you might mature into using something like externalized authorization later on. But if you’re working in a regulated industry, finance, medical technology, insurance, those kinds of industries, even as a startup you’re going to have those much stricter requirements around authorization earlier on. And in those types of businesses, the requirement isn’t coming from a developer who’s just trying to get something implemented quickly; the requirements are now actually coming from the whole value of the business. Being, say, a FinTech, you have strict access control requirements you have to implement if you’re going to be a regulated business.

Alex Olivier 00:23:44 So you’re now getting those requirements from the security team, the product team, the compliance team side of the company, and you’ll end up implementing a standardized, externalized authorization system, hopefully, much earlier on in the lifecycle of your business. In terms of who’s involved in authorization, we’ve been talking about developers a lot, and ultimately they’re the ones that are going to have to write the code. But there are other stakeholders here. You have a DevOps or a platform team who will go and deploy the authorization system inside of your environments, inside of your clusters. You’ll have maybe a security or compliance team that is doing the regular audit reviews of your policies and running audit checks, etc. If, as a business, you are getting subject data access requests from users, you need to be able to pull out what they did inside of a system; that’ll be coming from a different part of the team.

Alex Olivier 00:24:27 But there are also teams you may not necessarily think of: your customer support team, who might be handling support tickets about “why can’t I access the system?”, might need some insight into the authorization logic behind it. Even the sales team, if you’re trying to sell software to the world: they’ll come to you saying, we’ve got this customer, they really want to use our system, but they have very fine-grained authorization or permission requirements just due to the nature of their business or their organizational structure. So there are a lot of different parts of a company, and roles within a company, that will have some input into authorization. And as Emre said earlier, the hardest part is getting everyone to agree on what the requirements are and then going off and doing the implementation.

Emre Baran 00:25:03 Yeah, one more thing to add in there is, you might have your standard software, you might have just four roles, and that might actually work. But then you might go sign up a very large customer where they have 5,000 internal users, and those four roles aren’t enough, right? For that customer you need 10 different roles, with regions, etc., various other things, or 20 or 50. Now you might go sign up another enterprise customer which has a different internal structure than the previous one, so they want their roles to be structured differently. So Cerbos in that world allows you to customize your roles and permissions on a per-tenant basis. So suddenly we go away from a one-size-fits-all model, where the product manager of the original product must think very hard about how to get common roles working for all their customers, to a world where every customer can have their own structure within their software.
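One way Cerbos supports that kind of per-tenant customization is with scoped policies, where a tenant-specific policy variant layers on top of the base rules. The following is only a hedged sketch: the tenant identifier, role and action are invented, and the exact scoping behavior should be checked against the Cerbos documentation.

```yaml
# Hypothetical tenant-specific override of a base expense_report policy.
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  resource: "expense_report"
  version: "default"
  scope: "acme_corp"                # invented tenant identifier
  rules:
    - actions: ["approve"]
      effect: EFFECT_ALLOW
      roles: ["regional_manager"]   # a role that only this tenant uses
```

At request time the principal or resource carries its scope and, roughly speaking, the most specific matching scope is consulted first before falling back toward the base policy.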

Priyanka Raghavan 00:26:45 So, one of the things: when I looked at the open-source Git repo, I was also looking at the Open Policy Agent, because we had a show on that as well. How does Cerbos differ from OPA?

Alex Olivier 00:26:57 Yeah, so OPA, Open Policy Agent, is a great CNCF project that is heavily adopted for infrastructure components; Kubernetes, for example, uses OPA inside of it as well. And when we started building out Cerbos, we looked at kind of what OPA was doing, we looked at Rego, its language, as well, and kind of saw, this is the right idea in terms of externalizing and taking a policy-based approach to things. But where we saw there was a bit of a gap is really focusing on these application-layer permissions, because there’s a whole set of things you kind of disregard at that level, and there’s a whole set of capabilities you need on top. And so when we looked at it, we sort of went, okay: policy-based, having a way of declaring your logic in a version-controlled, tested way of doing things, is the right idea.

Alex Olivier 00:27:40 But we really wanted to simplify things down for that application permission use case, that kind of multi-tenancy application use case, and make sure at that level you do have much more involvement from security, from product, from sales, from customer support. How can we bring that kind of same experience, but in a way that those teams and those different parts of the organization can be much more involved with authorization? And the key thing we did there was the actual policy language itself. So Cerbos uses YAML, and there’s no extra language to learn. It’s very parsable and grokable, and you can scan through it and really understand exactly what’s going on. The way we’ve structured things: here are your resource policies, there’s one per resource type in your application, and if you want to say, okay, here’s a variant for a particular customer X, Y, Z, there’s a very clear, differentiated way of explaining and defining the custom rules for that particular customer as well. So we looked at OPA as a great project, we took our interpretation of that and applied our application-level permission lens on top. And that’s how we got to where we are today. Four years later, nearly, Cerbos is being used by, well, you can see in the GitHub stats: tens of thousands of deployments, and GitHub stars and such, of our solution out there in the world. And it’s meeting this requirement of application-level permissions.
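For a sense of what that YAML looks like, here is a minimal sketch of a Cerbos resource policy that combines a role-based rule with an attribute-based condition. The resource name, roles and attributes are invented for illustration; the overall structure follows the documented resource policy format.

```yaml
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  resource: "expense_report"       # one policy per resource type
  version: "default"
  rules:
    - actions: ["view", "edit", "delete"]
      effect: EFFECT_ALLOW
      roles: ["admin"]             # simple RBAC: admins can do everything
    - actions: ["view"]
      effect: EFFECT_ALLOW
      roles: ["user"]
      condition:
        match:
          # ABAC: regular users may only view reports they own
          expr: request.resource.attr.ownerId == request.principal.id
```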

Emre Baran 00:28:51 One thing to add on top of it is, OPA is great; OPA is built for everything, OPA is a very general-purpose one. When we built Cerbos for just the application layer, we were able to reduce the footprint a lot, and we were also able to reduce the response time a lot, because we don’t have to handle a lot of those things. So as a result, Cerbos is a very minimal deployment when you look at the CPU and memory requirements it needs, which makes it a great companion, because it exerts almost zero extra load on your systems, and it gives you this super flexibility and a much faster response time.

Priyanka Raghavan 00:29:32 That’s a very good distinction that you made: OPA is for infrastructure, and general-purpose for a lot of things, and this is more for the application-level authorization. Can you give us a little bit of how it works under the hood? So I’ve got a YAML file, and I can fill that in with all my permissions for a particular project. Then what happens?

Alex Olivier 00:29:52 Yeah, so you go through that policy definition process: working with the different stakeholders inside of your business and on your application, defining your different resources, the different actions, the conditions under which they should be allowed or not. We always recommend users then go through the additional step of writing tests against those. So as well as writing your policies with Cerbos, you can then give example fixtures: here are some example users, here are some example resources, and then define which actions should be allowed or denied for each of those. And so you have a test suite, and then we take a very GitOps-style approach to deployment. So we recommend you go and check those into a GitHub repo. You go and wire up CI, be it something you run yourself or you use Cerbos Hub, which is one of our offerings.

Alex Olivier 00:30:33 And now you have policies that are good and valid and ready to go. For the deployment side of things, you then need to go and run Cerbos, the policy decision point, as a container inside of your infrastructure somewhere. And like I was saying earlier, our recommended approach is to make sure that service is running as close to your application deployments as possible. We keep saying the word stateless, and what we mean in this context is Cerbos itself doesn’t require a database or a data store or anything like that to hold users or resources, etc. Cerbos is purely evaluating requests based upon the context that’s passed to it from the application layer. And that stateless architecture means you can put Cerbos everywhere: you can put it inside of every pod and on every cluster and every deployment, and you can have Cerbos spread out and running everywhere to ensure that every service has a local version of the policies to evaluate against.

Alex Olivier 00:31:18 So you go and deploy your Cerbos instances, and they’re now running inside of your environment. And then the final step is updating your application code to call that Cerbos instance. So we have SDKs and APIs available for pretty much every language and framework now, and you do that one-time process to update the application code and call that Cerbos instance. That Cerbos instance, when you deploy it, you tell it where to get its policy files from, and we support a Git repo, we support a cloud storage bucket, we support just files on disk, and we also support Cerbos Hub, which is our managed control plane; that’s a synchronization layer and CI pipeline that pulls the policies down as well. But ultimately those YAML files end up compiled, tested and distributed out to your environments, and to that local policy decision point running alongside your application you simply say, here’s a user trying to do this action on this resource. It evaluates the current policies, comes out with a decision, creates an audit log of that decision, and then returns it back to your application. So it’s actually a very, very simple interface by design. There’s essentially one API in Cerbos, with a secondary one for a data filtering use case, where you say user, action, resource, and it comes back yes or no. And that’s all you have to worry about from an implementation perspective. And then all the smarts and the rules engine are all part of the open-source project that you get by putting Cerbos into your service architecture.

Priyanka Raghavan 00:32:29 You also have an audit log, is that what you said, for every action? So it’ll be running sort of locally and then it gets synced to some master?

Alex Olivier 00:32:38 Yeah, so every instance of your policy decision point, of your Cerbos container, generates its audit log, and then you have a configurable option of where you want to send it. If you just want to use the open-source project, you can have it just log to standard out and then have your existing logging infrastructure pick it up, or you can tell it to write off to a Kafka topic. A very common setup we see is users running the typical Loki-Grafana type setup, so that will go and pick up the logs and ship them off, or using something like Fluentd and those kinds of tools. We also have a managed log collection system as part of Cerbos Hub, which gives you a nice UI for delving into your authorization logs. And the one thing I will say is audit logs are kind of one of the superpowers, and also almost a bit of a side benefit, of externalizing authorization, not just with Cerbos but generally. Your application logs are going to be spitting out all sorts.

Alex Olivier 00:33:25 You’ll have stack traces and memory dumps and all sorts going on there, and you can have a very large volume of data. But authorization logs, these audit decision logs, are kind of a different type of log that you do need to keep, and you want to have more than a three-month retention on them; you might want to have a three-year retention because of compliance reasons. So being able to send those specifically to a destination that gives tools to your security team, to your compliance team, to your application developers to debug your access control logic is a real advantage, and one of the things you kind of get for free from using an externalized authorization approach. And that will tell you: at this time, this user tried to do this action on this resource, and it was allowed or denied by this particular version of this particular policy. So you get that very granular insight into what’s going on inside of your system without having to necessarily dig through your actual application-level logs.
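Routing those decision logs is a configuration concern on the Cerbos side. As a rough sketch, and assuming the locally stored backend, the audit section of the server configuration looks something like the following; treat the exact keys as illustrative and check the current documentation rather than relying on this verbatim.

```yaml
# Hypothetical audit section of a Cerbos server configuration.
audit:
  enabled: true
  accessLogsEnabled: true      # log every API call received
  decisionLogsEnabled: true    # log every allow/deny decision
  backend: local               # other backends can ship logs elsewhere
  local:
    storagePath: /var/cerbos/audit
```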

Priyanka Raghavan 00:34:17 Absolutely. I can see a use case for that. Yeah, that’s a lot of digging that you need to do.

Alex Olivier 00:34:21 Oh yeah.

Priyanka Raghavan 00:34:22 Also, thinking about where I work, we also have this case where, if you are auditing a database, you always have to decide on what to audit, right? Every action? What should you audit? Because again, the logs can be huge. Do you have to have a similar consideration with your authorization logs, or is that a bit leaner?

Alex Olivier 00:34:41 Yeah, so the logs themselves are a bit leaner, because you’re purely just capturing the decision. You’re not capturing the whole request context, you’re not capturing the whole request pipeline, et cetera. And for authorization logs, particularly for regulated industries where you must maintain a log for X number of years, you do need every single decision captured, because now you’re dealing with the actual actions of individual customers or users or subscribers inside of your system. And you need to be able to pull that out and essentially replay exactly what that person did, particularly if you get a kind of subject access request or a suspected breach; you need to be able to go fetch that. So your security logs are a different type of log concern than kind of the application side of things.

Emre Baran 00:35:24 In the regulated industries, it’s not enough only to know who did what and whether they were allowed to do it or not, but why. Why were they allowed to do that, or why weren’t they? So ultimately there’s that chain of custody of not only what they did, but what the policies said, and who changed the policy that allowed that person to be able to do something. So they need to be able to trace it all the way to the policy and who updated that policy. At the end of the day, let’s not call it finger-pointing, but if there’s an incident, you want to understand the full reason behind it. And Cerbos allows you to do that as well, because not only are all the decisions logged, all the policies and all the different versions of the policies are also logged, with their entire commit log. So you can figure out what in your organization actually caused this incident to happen, so that you can properly prevent it next time.

Priyanka Raghavan 00:36:26 Thanks for that. I think that was a very good discussion we had. And I had a question on the stateless authorization. How does that work? Do you work with standards like, say, JWT tokens or OpenID, and how does it get the context?

Alex Olivier 00:36:40 Yeah, so again, stateless authorization versus stateful authorization. In the stateless model, the authorization layer doesn’t retain any data store of users or resources, versus the stateful model, which would have something like a replica of your data. So the onus is on what’s also referred to as the policy enforcement point, the component which is going to do the check to see whether an action should be allowed or not, to send the state: who the user is, what the resource is, and other context, in the request as it happens, in order for the policy engine to evaluate and come back with a decision. So how you transfer that data: typically it’s just a big JSON object of here are all the details you need, but using standards like JWTs or OAuth 2.0 tokens, those sorts of things, kind of smooths that journey out.

Alex Olivier 00:37:28 So in the case of Cerbos, you can fill in the data yourself, or your application can, or you can just pass the JWT directly to Cerbos, and Cerbos itself can actually go and verify that token if you provide the key set, and then the content of that token is made available inside of the policy. For what we refer to as the principal, or the user component of that, there are defined standards, the OAuth 2.0 work and JWT tokens being the obvious ones there. For the actual resources it is a bit more freeform, because it’s down to what your application’s data model is, so there isn’t a standard to point to for that. But where there is a relevant standard, those are adopted and can then be used inside of Cerbos as well.
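To sketch what that looks like on the Cerbos side, the server configuration can point at a JSON Web Key Set so that incoming tokens are verified before their claims are exposed to policy conditions. The key set name and issuer URL below are invented, and the exact keys should be confirmed against the documentation.

```yaml
# Hypothetical auxData section of a Cerbos server configuration.
auxData:
  jwt:
    keySets:
      - id: my-idp                  # invented key set name
        remote:
          url: https://idp.example.com/.well-known/jwks.json
```

The application then passes the token alongside the check request, and the verified claims become available to the policy rules.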

Alex Olivier 00:38:07 And just on the topic of standards more generally, there is an ongoing effort, of which Cerbos is part, under the OpenID Foundation, called the AuthZEN Working Group, in which we’re active contributors, around standardizing the API interface between applications and policy decision points or authorization services like Cerbos. The first specification has been published; that’s out there and has now been adopted, and we’re getting more application implementers to get the AuthZEN standard implemented inside of their application layers, with which you can then go and plug any policy decision point, like Cerbos, interchangeably into your different systems and applications.

Priyanka Raghavan 00:38:47 Just to build on that: for the decisions to happen, do you rely on an external source? Like, when you’re doing an enforcement of a policy, would you go to a database or an API, or is that what you’re saying is configurable?

Alex Olivier 00:39:00 So we have a pretty strict line on what Cerbos itself, or the policy decision point, should do inside of the system, and one of the things we really design for is predictability in how your policy decision point will behave. So Cerbos is fully stateless in the sense that it doesn’t store state, but it also won’t call out and go and fetch state from other parts of your systems. My background, as well as Emre’s, is from building very high-throughput, low-latency data processing systems. Billions and billions of requests a day was the kind of typical day for us in our previous lives. And so we’ve made kind of every mistake possible when it comes to eventual consistency and scalability and thundering herd problems and all that sort of stuff. And one of the things we decided very early on when defining and specifying Cerbos is that Cerbos itself, when it’s running, once it’s got policies in there, would not do anything else on your system.

Alex Olivier 00:39:50 It is down to the calling application to pass all the state through. And the primary driver for that is, there may be many layers of management and process involved behind someone making a very small change to a policy. And if that policy decision point had the ability to go and fetch state from across your architecture, one small change in a policy somewhere upstream, once it hits your production environment, could result in some massively unexpected load on some other parts of your architecture. Because if that policy now needs to go and fetch some new data point about you from some other system which doesn’t normally get any traffic, you’re going to push this change out, and now suddenly that system is not scaled, it’s not ready; you’re going to add massive latency, or even just request failures because it can’t handle the load, to your system. So we made that call early on from, like I said, being burnt in previous lives, to make sure that Cerbos is extremely predictable in what it will do and what load and performance characteristics it’ll have across your architecture, and it’ll never be in a position where it can start putting unexpected load and traffic onto other parts of your system.

Priyanka Raghavan 00:40:53 So where do you store policies in a stateless decoupled framework, and if something changes, how do you do this policy reloading without disrupting a service in a distributed environment?

Alex Olivier 00:41:05 Yeah, hot reloading and such. Yeah, absolutely. So in distributed environments there’s obviously a challenge of how you get those policy files down to those different instances that are deployed, potentially hundreds if not thousands in some cases, across your architecture. So the way this works is you store your policies centrally; as I mentioned earlier, it could be a GitHub repo, it could be in a storage bucket, it could be an asset stored somewhere inside your stack. And then for each of those Cerbos instances, in the open-source project you configure it to say, go and get the policies from this location. And that is a pull model. So each of those Cerbos instances will go and check on some regular, configurable basis against a Git repo or an S3 bucket or wherever you are storing your policies, and will pull those policies down and hot-swap them in memory, if they’re valid, and start evaluating against them.

Alex Olivier 00:41:51 Now for those of you that have dealt with these kinds of problems before, you kind of immediately run into the problem of, well, if I’ve got a hundred Cerbos instances running and each of them is checking for updates at ten-second intervals, it’s going to take up to 10 seconds, let’s say, for a policy change to apply. That may be okay for your scenario, or it may be a bit of a problem, depending on how fast-moving your policies are. So as part of Cerbos Hub, which is our managed control plane that sits on top of the open-source project, we flip that model around, and it becomes much more of a push model. And so we can coordinate and synchronize the rollout of policy updates across the entire fleet without you having to worry about anything like that. So the policies are still stored in a central location, a Git repo or storage bucket, etc., but the compilation and distribution of those policy updates is now coordinated via the control plane, and that is Cerbos Hub.
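As a rough sketch of how the pull model is configured in the open-source project, the server configuration names a storage driver and, for Git-backed storage, how often to poll for changes. The repository URL, branch and interval here are placeholders, and the exact keys should be checked against the current documentation.

```yaml
# Hypothetical Cerbos server configuration using Git-backed policy storage.
server:
  httpListenAddr: ":3592"
  grpcListenAddr: ":3593"
storage:
  driver: git                    # could also be disk, blob, or hub
  git:
    protocol: https
    url: https://github.com/example-org/policies.git   # placeholder repo
    branch: main
    checkoutDir: /tmp/cerbos/policies
    updatePollInterval: 60s      # how often each instance checks for updates
```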

Priyanka Raghavan 00:42:36 I guess the next question I have is, you talked a little bit about testing that’s offered as a part of Cerbos, so how do you test and validate policies? Do you have some examples that you can talk about? Like, how do you validate a new policy?

Alex Olivier 00:42:51 Yeah, certainly. So there’s a validation step and there’s a testing step. So first off, because Cerbos, as we mentioned earlier, uses YAML as the format for writing policies, there’s a strict schema for that. We publish those schemas publicly, so VS Code, your editor of choice, whatever you may be using these days, will light up and give you validation of the actual structure of the policies themselves, autocomplete, all that sort of fun stuff, as kind of the first step. And then Cerbos itself has a test framework built in as well. Your policy file structure may be valid, but then you want to make sure it’s logically valid as well. So you define those test cases: example users, example resources, expected outcomes for the actions. And then the open-source CLI tool goes through that; it first validates the structure and then also runs all the tests and makes sure that the expected outcomes are as they should be, similar to any sort of test-driven type development. And those same tests can then be run in your CI pipeline, be it something you set up yourself, say GitHub Actions (we publish a GitHub Action for that), or as part of a more managed control plane offering like Cerbos Hub.
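As an illustration of that test format, a small suite pairs example principals and resources with the outcomes you expect. This sketch assumes the hypothetical expense_report policy from earlier; all of the names are invented.

```yaml
# Hypothetical Cerbos test suite (e.g. expense_report_test.yaml).
name: ExpenseReportTestSuite
principals:
  alice:
    id: alice
    roles: ["user"]
resources:
  alices_report:
    kind: expense_report
    id: report-42
    attr:
      ownerId: alice
tests:
  - name: Owners can view but not delete their own reports
    input:
      principals: ["alice"]
      resources: ["alices_report"]
      actions: ["view", "delete"]
    expected:
      - principal: alice
        resource: alices_report
        actions:
          view: EFFECT_ALLOW
          delete: EFFECT_DENY
```

Running the cerbos compile command against the policy directory validates the schemas and executes these tests, which is the same step a CI pipeline would run.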

Priyanka Raghavan 00:43:55 I also wanted to ask you one more question. Everybody’s now at the point where they’re trying to build their own chatbots or LLMs and those models. So when you do this authorization, I feel like a lot of the good practices that we got on, say, these web application-based projects, OWASP and all of that, gave us a lot of checks that were there and are important to do. But with the AI and ML chatbots, some of them are lost. Do you think a different type of framework has to be applied to those kinds of applications, or do we use the same principles?

Emre Baran 00:44:27 Yes and no is the answer; when it comes to software engineering, it’s never a pure yes or a pure no. So if you look back at software development, we’ve spent the last 40 years trying to secure the backend and the frontend and the communication in between them, right? And now, with AI being so advanced, and chatbot technology having been around for a while, when those two married, suddenly we have a third interface where the AI can actually have access to your data. It’s potentially even bypassing your backend, and it has unrestricted access to your data to be able to train the models; and then the same LLM models and the same RAG architecture can give you the answer straight out, right? And it does bypass your entire backend and frontend security that you’ve built in there.

Emre Baran 00:45:17 A classic example of this is, you can think about any analytics system or any HR system where there’s an AI chatbot on top, right? It’s leaking data, because if a CEO asks what the current payroll is, he should get an answer inclusive of the entire company’s information. But if a regional VP asks, hey, what’s the payroll, it shouldn’t give the same answer; it should only give the answer for that given region, et cetera. So we need to now start securing these AI chatbots and AI agents with the limitations of the user. And in order to be able to do that, we need to be able to actually filter the data that comes into these AI models and filter the data that comes out of them. And Cerbos’s authorization-aware data filtering capability, something that Alex talked about earlier, which is the query planning, being able to actually filter the data based on what you should have access to, gives the AI agents the possibility to return only a subset of the data rather than the entirety of it. So there is a use case for AI agents to use this authorization logic as the data is passing through.

Priyanka Raghavan 00:46:34 Great, because I was just thinking, when you were talking about this, of that Chevy chatbot, right? I think they had this case where it was just opened up without any controls, and finally the chatbot ended up agreeing to give someone a Chevy for $1 or something like that, because the person had prompt engineered it.

Emre Baran 00:46:54 There are lots of examples of this, right? There are some in airlines, where there were cheap tickets and refunds being given. At the end of the day, we need to inspect each one of these things that the LLM models are returning as a response and turning into potential API calls, and be able to check if the user is allowed to do certain things.

Priyanka Raghavan 00:47:17 Okay. So then in that case, a policy decision point should also be built on top of those chatbots, is what I’m hearing.

Emre Baran 00:47:26 Absolutely. So the Cerbos policy decision point has two major APIs. One API is a very specific question: can this user (or this subject, or principal, whatever we want to call it) do this action on this resource? It’s a very deterministic question, yes or no. And then the second question is: what resources can this user do this action on? And being able to answer that gives you the ability to filter your data, as it’s coming out of a database, down to only those records that the user has access to.
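That second API is the query-planning one. As a hedged sketch, the request names a single action and a resource kind, but no specific resource instance, and the response describes a filter (always allowed, always denied, or a set of conditions) that the application can translate into a database query. The field values below are invented for illustration.

```yaml
# Hypothetical query-plan request: "which expense reports can alice view?"
requestId: "req-002"
action: "view"
principal:
  id: "alice"
  roles: ["user"]
resource:
  kind: "expense_report"
# The response is a filter such as ownerId == "alice" rather than a yes/no,
# which the caller turns into a WHERE clause or an ORM predicate.
```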

Priyanka Raghavan 00:48:02 Great. So the last question I want to ask you both is, do you see opportunities for say AI or ML to complement stateless frameworks? I was reading this paper a few days back on adaptive authorization and anomaly detection. Is that something that you think will be the future or is it already being done at Cerbos or other places?

Alex Olivier 00:48:24 Yeah, so there are lots of places where I think it makes sense to use this kind of new world. There are also a couple of places where I think you definitely do not want some AI model meddling. And the place where I think it makes sense is at the start of the process, when you’re trying to take those business requirements and convert them to policy. I think that’s a really interesting area for innovation. You can ask ChatGPT or Claude at the moment: here are my requirements, give me a Cerbos policy. And actually, most of them will, and they’ll come up with a pretty good policy these days, which is kind of nice. They’ve clearly read all our documentation, etc. And at the other end of it, once you’ve got that audit log of all the decisions being made, you’ve got that log stream; that’s another area where you could start doing things like anomaly detection, understanding what’s going on, and using these new tools to help you find the signal in the noise.

Alex Olivier 00:49:09 So I think those are two ripe areas for opportunity. Where I strongly think, today at least, AI should not be involved is right in the middle, where the actual decisioning process happens. Authorization is rules; it is business requirements, it is compliance needs, it is regulatory hurdles that must be met, and that needs to be certain to behave in a certain way. You don’t want to be worrying about what the temperature of the model that’s deciding your authorization logic should be. You need to make sure that that middle part, the component, the rules engine, the evaluation engine, is always going to give you the right answer every single time. And that is where good code, efficient code, call it handwritten artisanal code if you want, in the middle, should be the thing driving the system. But certainly this new world of tools can really help us on both the authoring and the understanding side of things.

Emre Baran 00:49:59 The enforcement needs to be deterministic, and you cannot afford to hallucinate even once because that one instance may cause disaster.

Priyanka Raghavan 00:50:09 That’s a nice way to end the show: the policy enforcement must be deterministic. So what’s a good place for our listeners to reach you in cyberspace, Alex and Emre? Would it be LinkedIn, Twitter or X, or anywhere else?

Emre Baran 00:50:27 Absolutely. So our website is Cerbos.dev. All of our resources, all of our products and all our documentation can be found there. If you want to reach us or our teams, we have a Slack community that we are pretty responsive on and we want to help developers adopt externalized authorization as much as they can. And then if you want to reach out to me individually, I’m Emre Baran on LinkedIn and @Emre on Twitter or X.

Alex Olivier 00:50:53 Yeah. And I’m Alex Olivier on LinkedIn and Alex Olivier on Twitter.

Priyanka Raghavan 00:50:56 Great. I’ll make sure to add that to the show notes. This has been a great show. Thanks for coming. This is Priyanka Raghavan for Software Engineering Radio. Thanks for listening.

[End of Audio]
