Shahar Binyamin, CEO and co-founder of Inigo, joins host Priyanka Raghavan to discuss GraphQL security. They begin with a look at the state of GraphQL adoption and why it’s so popular. From there, they consider why GraphQL security is important as they take a deep dive into a range of known security issues that have been exploited in GraphQL, including authentication, authorization, and denial-of-service attacks, with references to the OWASP Top 10 API Security Risks. They discuss some mitigation strategies and methodologies for solving GraphQL security problems, and the show ends with a discussion of Inigo and Shahar’s top three recommendations for building safe GraphQL applications. Brought to you by IEEE Software and IEEE Computer Society.
Show Notes
Related Episodes
References
- LinkedIn: @shacharbinyamin
- GraphQL Vulnerability Analysis: the Top Threats
- GraphQL Is Not Meant To Be Exposed Over the Internet
- Inigo.io Blog
- GraphQL Cheat Sheet
- OWASP Top 10 API Security Risks – 2023
- DevSecOps Must Turn the Tables on GraphQL API Attackers
- GraphQL API Vulnerabilities
Transcript
Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.
Priyanka Raghavan 00:00:19 Hi everyone. I’m Priyanka Raghavan for Software Engineering Radio, and today I’m chatting with Shahar Binyamin, the CEO and co-founder of Inigo, about GraphQL security. Shahar is a software engineer by trade with extensive experience working on many high-profile enterprise applications and security projects. He has written several articles and given talks at technology conferences, all of which I have added to our show notes. So welcome to the show, Shahar.
Shahar Binyamin 00:00:48 Hey, Priyanka, great to be here.
Priyanka Raghavan 00:00:50 We’ve done a deep dive on GraphQL in Episode 530, so listeners can check out that episode to understand the history and basics of GraphQL. Having said that, since we have you on the show and it’s been a while since we did that, I have to ask you to briefly define GraphQL for us. I know it’s tough, but maybe in one or two lines. And then can you tell us about the state of GraphQL adoption? I personally think there’s big adoption in the GraphQL space, but maybe you think otherwise.
Shahar Binyamin 00:01:26 Yeah, absolutely. Great question. GraphQL is merely a spec. It’s an API spec that came out of Facebook, now Meta, back in 2015. It’s a query-based API that was really there to solve some of REST’s limitations around under-fetching and over-fetching. It’s a great tool to expedite frontend developers, letting them move fast and extract any data they want in any hierarchy. Once it came out of Facebook, we saw the open-source communities, specifically developers, really adopt it, and you can find an implementation of GraphQL in any programming language you might think of. Now, when we think about the adoption of GraphQL, I would say it’s like any developer-driven technology: developers bring it into the organization and get all hyped about it. And like any new technology, it’s “let’s put this everywhere.” It doesn’t work everywhere, but it does work really well in the right place, like client-server communication or an open API. So it had a rough patch at the beginning, with lots of excitement and a lot of rejection, and now it’s really finding its place and becoming mainstream in many enterprises.
Priyanka Raghavan 00:02:40 Following up on that question, you had an article on DevZone titled “You Love GraphQL, Now How to Make Sure Your Organization Does Too.” So why do you think GraphQL is popular among developers, and what can they do to bring it into their organizations? Maybe you have a case study of a company you’ve worked with where the developers liked it and successfully brought the rest of the organization along?
Shahar Binyamin 00:03:04 There’s usually one champion, and maybe this person worked with GraphQL at their previous company. That’s the story we see a lot: people move around and say, hey, we know about this new technology. There’s always a new feature, always a new product, and some legacy companies say, okay, it’s time to refresh our stack, and then GraphQL becomes part of the discussion. Can we do this? Frontend developers want to hear about it and say, yes, this is what we want. So now you have the backend champion and the frontend team pushing for the same thing. Usually it starts with a side project, with some exploration. But if you, as the champion, want to push this across the org, you need more supporters. And when it gets to the point that you would like to productize it, what we’re seeing in a lot of technology companies is this new concept of platform teams or API teams; some call them architecture teams or core teams, the names really vary.
Shahar Binyamin 00:04:03 And now you basically need to ask them to own this other type of API as part of their API responsibility. You need a good reason, a good mandate, and a really good set of building blocks to allow an organization to adopt a new technology. Think about the equivalent with Kubernetes: developers brought it in and then told DevOps, hey, now you need to operate this. And maybe two minutes later the security team came and said, hey, how do we secure it? We see a similar story with GraphQL and the platform teams, those API teams. They say, hey, what is this? You’re asking us to do this, but how do we do it? And I’m sure we’ll break it down.
Priyanka Raghavan 00:04:44 That’s very interesting. To summarize what you’re saying, this is probably true for any new technology, right? It goes through a review board and is then passed on to a central team, which allows the new technology to spread across the organization, and then the security teams come in. That probably leads to my next question: why is GraphQL security important?
Shahar Binyamin 00:05:07 Let’s unfold this question. When you adopt a free-form API, which is the nature of GraphQL, it’s super powerful, and in many cases it can get abused, intentionally or unintentionally. Because of its nature, GraphQL opens the door to a new paradigm of attack surfaces. Think about all the attacks you can have with a REST API, and now add a whole new level of tools that are specifically for GraphQL. I see the attack surface in four places. First, the spec itself can be abused, which is interesting: the parser, the logic. And we know that not all GraphQL implementations are the same. Second, resource exhaustion: when you allow clients to ask questions in a hierarchy, what is the cost of such a call? Third, data leakage: are you protecting against it? That surface is much more open.
Shahar Binyamin 00:06:03 If you think about REST, again, taking a step back, it’s a contract between the request and the response: these are the questions, these are the responses. Very strict. GraphQL is free. Not all requests are the same; there’s an unlimited number of possible requests unless you put the right guardrails in place, and the same goes for the responses. And lastly, how do you take all the notions of access control you already have in your system and bring them to GraphQL as well? So it’s a whole new set of challenges, but who owns them? That’s another question.
Priyanka Raghavan 00:06:36 Okay. Wow. That’s very interesting. I also noticed that in the OWASP Top 10, 2023 API edition, almost every category had examples of how it can be abused through GraphQL APIs. I thought we could dive a bit deeper into that in the next section, but what are your thoughts on it?
Shahar Binyamin 00:07:00 Well, absolutely. Anything you can do with REST, you can do with GraphQL, and more. What is sometimes frustrating is that you go to the OWASP website, or to API gateways or WAF tools, and see what they can do for you security-wise, and sometimes it’s very limited, and sometimes it ends with just a list of recommendations. So how do we take these recommendations and put them into play? That is a big question. And it’s great: in the last two years, specifically, there has been a lot of content about GraphQL security, and articles like this are perfect, because before that the information out there was extremely limited. People were talking about height and depth and N+1, which are great examples, but they’re the tip of the iceberg. If you really want to increase education around it, more and more articles, information, and blog posts are coming out, really enriching the community, which is very important because of the variety of GraphQL implementations out there.
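Editor’s sketch (not from the episode): the depth guardrail mentioned above can be checked before any resolver runs. This minimal Python example assumes the query has already been parsed by a GraphQL library and flattened into nested dicts, where each key is a field name and each value is a sub-selection or None for a leaf (fragments inlined). MAX_DEPTH is an illustrative value, not a recommendation from the speakers.

```python
from typing import Optional

MAX_DEPTH = 5  # illustrative limit; tune per schema


def query_depth(selection: Optional[dict]) -> int:
    """Nesting depth of a parsed selection set; a selection of only leaf fields has depth 1."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def within_depth_limit(selection: Optional[dict], limit: int = MAX_DEPTH) -> bool:
    """True if the query may proceed; evaluate this before any resolver runs."""
    return query_depth(selection) <= limit
```

A recursive friends-of-friends query grows one level per nesting and is rejected once it passes the limit. Depth alone does not bound breadth, which is why aliases and cost also matter.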
Priyanka Raghavan 00:08:12 Interesting. So I think as the implementations differ, you’re saying that the documentation also has to keep up to know the different ways it can be exploited.
Shahar Binyamin 00:08:22 Last year, we wanted to be super critical: do these vulnerabilities really exist? So we did two interesting things. First, we searched the CVE database for GraphQL to see what would come up, and we saw a lot of incidents. And remember, it’s important: not all incidents are reported, and not all the reported ones actually get a CVE. But you start to see a trend. Then we tried to analyze where those vulnerabilities really lie, and you see a lot about authorization and a lot about DoS, and then you have the other classics like code execution, injections, and maybe disclosure. We had so much fun doing this exercise that we did the same with HackerOne: we went to the HackerOne vulnerability database and said, let’s try to do the same.
Shahar Binyamin 00:09:19 And actually, the first one dated to 2018, which is even before the first CVE. We noticed bounties ranging from 250 all the way to 20K. And that’s the beginning: all the different implementations are coming out, and companies are going out and saying, hey, can someone help? We don’t even have the tools to try to hack ourselves, and we need help here. This is new territory, new water. On HackerOne, almost 90% was about authorization flaws, which makes sense; you don’t usually go to HackerOne for someone to DoS you. So all those attacks are out there, and it’s interesting to see how hackers do it. We can break it down.
Priyanka Raghavan 00:10:08 Okay. One of the things I should ask you before we move to the next section is why and how attackers hunt GraphQL. Is it just because, like you said at the beginning, the REST contract is very rigid, so GraphQL is easier?
Shahar Binyamin 00:10:27 Okay, so what do hackers do with GraphQL?
Priyanka Raghavan 00:10:30 Yeah, exactly.
Shahar Binyamin 00:10:31 And how they do it. Well, first they detect it, then they fingerprint it, and then they abuse it. So let’s talk about detecting. GraphQL will always return something, and that is one of the biggest blind spots for every GraphQL owner or developer: there’s a big blind spot around the responses and errors, because the errors, the content, and the intent live in the body and not in the HTTP response status. So there’s always a response; even if it says “hey, you’re wrong,” there is a response. And there are very specific conventions for where companies put their GraphQL endpoint. It could be /graphql, /query, /api, /playground, /graphiql; there are specific conventions that are very, very commonly used. So nothing prevents an abuser from going to any domain, appending /graphql, and sending a query.
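Editor’s sketch (not from the episode), for owners auditing their own services: because the outcome lives in the response body, a GraphQL endpoint is recognizable by the shape of that body. Per the GraphQL specification, a response is a map with a "data" and/or "errors" entry. The path list simply repeats the conventions named in the conversation, and the function names are hypothetical.

```python
import json

# Conventional endpoint paths named in the conversation; illustrative, not exhaustive.
COMMON_PATHS = ["/graphql", "/query", "/api", "/playground", "/graphiql"]


def looks_like_graphql(body: str) -> bool:
    """True if a response body has the shape of a GraphQL response.

    A GraphQL response is a JSON object whose top-level keys include "data"
    and/or "errors"; servers typically return one even for a malformed query,
    which is exactly what makes endpoints easy to detect.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(payload, dict) and ("data" in payload or "errors" in payload)


def audit_paths(responses: dict) -> list:
    """Given {path: response_body} collected from your own service, list GraphQL-looking paths."""
    return [path for path, body in responses.items() if looks_like_graphql(body)]
```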
Shahar Binyamin 00:11:32 It doesn’t need to be a valid query; you just send something. You can go over every company, any domain, and say, hey, something will reply at some point, and you can obviously automate this. Once you get a JSON reply, no matter what the reply is, you know there’s a GraphQL server or endpoint out there. So that’s the detection part. Now, the reply itself tells you things, and again, you don’t need introspection and you don’t need to know the schema; you can just send __typename. And again, please don’t use this. It’s very basic once you hear it, but that’s also why it gets used. We don’t want to encourage abuse, but we want to educate the people who actually own GraphQL servers. The responses can be very different between the different implementations.
Shahar Binyamin 00:12:21 One implementation might include the words “syntax error,” and another “syntax GraphQL error.” Based on this, you can fingerprint which implementation is being used, and in some cases even the version. It could be PHP, Golang, JavaScript, Java, Haskell, Rust; they’re all built by different open-source communities, with different levels of effort and dedication. We’ve published some of this, and others have published a whole table identifying the vulnerabilities in each implementation. In Go alone, there are a few implementations of GraphQL servers. So once you can detect the endpoint and fingerprint the implementation, you’re equipped to run the known vulnerabilities of that specific implementation against it. And that’s just spec abuse; it comes even before anything else. So that’s what’s out there. What can you do? For a start, don’t use a classic endpoint that people can guess. And we can talk about what it really means to harden a GraphQL endpoint.
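Editor’s sketch (not from the episode): since implementation-specific error wording is what enables fingerprinting, one common hardening step is to send clients a generic message and keep the detail in server logs. This minimal Python example assumes the response is already a GraphQL-style dict. GENERIC_MESSAGE, the logger name, and the debug switch are hypothetical choices. It also drops error locations and paths, which some clients rely on, so teams often mask only in production.

```python
import logging

logger = logging.getLogger("graphql.errors")  # hypothetical logger name

GENERIC_MESSAGE = "Invalid request"  # hypothetical client-facing text


def mask_errors(response: dict, debug: bool = False) -> dict:
    """Replace implementation-specific error messages with a generic one.

    The original messages are logged server-side so developers keep the detail,
    while clients no longer see wording that hints at the server implementation.
    The input dict is not modified.
    """
    if debug or not response.get("errors"):
        return response
    masked = []
    for error in response["errors"]:
        logger.info("graphql error: %s", error.get("message"))
        # Keep only the generic message; drop locations, path, and extensions.
        masked.append({"message": GENERIC_MESSAGE})
    return {**response, "errors": masked}
```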
Priyanka Raghavan 00:13:39 I think that’s very interesting, and it opens up a whole lot of questions; I hadn’t really thought about the hardening part. So the first big takeaway is that the endpoints shouldn’t be so easy to find, right? That’s the very first thing we should tell people to protect their GraphQL.
Shahar Binyamin 00:14:01 I’m not the biggest supporter of security by obfuscation, but anything you can do to reduce automation against you is just good practice.
Priyanka Raghavan 00:14:12 No, absolutely. Nowadays everybody’s running scripts to automate everything, so it obviously makes sense to add some obfuscation, even though security by obscurity isn’t the most intelligent thing to rely on. I think now would be a good time to dive into some of the issues you talked about. You briefly mentioned that your research has shown authorization errors crop up a lot. One of the things I noticed when reading the OWASP Top 10 API security checklist was broken object level authorization, where you have API endpoints that query something like an ID, and that seems to be a very good attack vector. There was also an example of how this can be abused through GraphQL APIs. Can you briefly talk us through what you’ve seen?
Shahar Binyamin 00:15:18 I’ll break my answer into three: first visibility, then role-based access control, and then attribute-based access control. The first thing, the baseline, is even knowing it’s happening, or being able to know it happened, and that really comes down to deep observability and monitoring tools. That’s still a big gap when it comes to GraphQL: the ability to know when these things happen, in case of an audit or an incident, or just to figure out what went wrong and what can be improved. We’re seeing many, many companies where it’s a massive blind spot. All the GraphQL traffic gets treated with their REST hat on: hey, I’ll just throw everything into my log tool, my SIEM, Splunk, or Datadog. That doesn’t really help, and it’s really, really hard to create insights and understanding of what’s happening.
Shahar Binyamin 00:16:16 So if you’re blind to everything, there’s no point in trying; that’s the first step. The first line of defense is knowing what’s going on; the fight comes later. The second thing is role-based access control. Your schema could be big or small, but that doesn’t mean everyone should see it. The idea is to bring your existing notion of roles in your organization (maybe an anonymous user, a member, a user, an admin) and shape the schema, when someone calls introspection, to only expose what that specific role is allowed to do, and enforce it at that point. So when they call introspection, or when they send queries or mutations, you differentiate who can access what in the schema. And the sooner you do it, before it hits your business logic, the better.
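Editor’s sketch (not from the episode) of role-based schema shaping, assuming a toy schema held as type name to field-name sets. The schema, role names, and permission table are hypothetical, loosely following the roles mentioned above. visible_schema is what a role would see through introspection, and authorize_fields rejects a query before it reaches business logic. Unknown roles see nothing (deny by default).

```python
# Hypothetical schema: type name -> set of field names.
SCHEMA = {
    "Query": {"posts", "comments", "users"},
    "Post": {"id", "title", "body"},
}

# Hypothetical permission table: role -> type name -> fields that role may see.
ROLE_PERMISSIONS = {
    "anonymous": {"Query": {"posts"}, "Post": {"id", "title"}},
    "member": {"Query": {"posts", "comments"}, "Post": {"id", "title", "body"}},
    "admin": SCHEMA,
}


def visible_schema(role: str) -> dict:
    """The subset of the schema a role should see, e.g. when answering introspection."""
    allowed = ROLE_PERMISSIONS.get(role, {})
    return {
        type_name: fields & allowed.get(type_name, set())
        for type_name, fields in SCHEMA.items()
        if fields & allowed.get(type_name, set())
    }


def authorize_fields(role: str, requested: dict) -> list:
    """Return sorted (type, field) pairs the role may NOT access.

    `requested` maps type name -> set of field names the query selects.
    An empty result means the query may proceed to the resolvers.
    """
    shape = visible_schema(role)
    return sorted(
        (type_name, field)
        for type_name, fields in requested.items()
        for field in fields
        if field not in shape.get(type_name, set())
    )
```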
Shahar Binyamin 00:17:11 And lastly, ABAC (attribute-based access control) is where you put the business logic that’s specific to your org about who can access which data fields. Let’s take Twitter, or X, as an example. Maybe anonymous users should not be allowed to read comments. So if they never logged in, if they don’t have a JWT, and they send a query or mutation asking for comments, they’re blocked immediately. But if they are a logged-in user and they try to get the comments field, then you have to write your own unique business logic that says, hey, this user is user seven, they cannot access it. That’s harder work, but both are needed, because you don’t want to trigger that business logic if the user shouldn’t even be allowed to ask for that field. So you need some combination of both. Hope it makes sense.
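Editor’s sketch (not from the episode) of the layering just described: a cheap role check first, then the organization-specific attribute rule only for callers who pass it. The User type, the blocklist, and the “user seven” rule are hypothetical stand-ins for real business logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    role: str  # e.g. "anonymous", "member", "admin"


# Hypothetical attribute rule: specific user IDs barred from reading comments.
COMMENT_BLOCKLIST = {7}


def role_allows_comments(user: Optional[User]) -> bool:
    """RBAC layer: callers without a valid JWT (None) or anonymous users are blocked."""
    return user is not None and user.role != "anonymous"


def attributes_allow_comments(user: User) -> bool:
    """ABAC layer: organization-specific business logic on the user's attributes."""
    return user.id not in COMMENT_BLOCKLIST


def can_read_comments(user: Optional[User]) -> bool:
    # `and` short-circuits, so the business-logic check only runs after the role check passes.
    return role_allows_comments(user) and attributes_allow_comments(user)
```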
Priyanka Raghavan 00:18:10 So what I understand is that you should have various levels of checks: the first level cuts you off here, and then you go one level deeper and have another check, like the ABAC kind of check you described. Okay.
Shahar Binyamin 00:18:23 Mistakes always happen. Changes to fields always happen. We can’t think of the schema as a static thing; developers around the org will continue to change it, adding fields and removing fields. At the end of the day, those vulnerabilities, hacks, data leakage, or data manipulation happen because of a mistake. So guardrails, even as part of the development lifecycle in your CI/CD, are crucial, and the more checkpoints you have, the more protection you’ll have down the road, because schema changes happen all the time, more than you can even imagine, especially in large organizations.
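Editor’s sketch (not from the episode) of one CI/CD guardrail for a constantly changing schema: diff the proposed schema against the current one, fail on removed fields (breaking changes), and flag new fields whose access rules nobody has reviewed. Schemas are assumed to be type name to field-name sets. The reviewed-fields allowlist is a hypothetical policy file, not a feature of any specific tool.

```python
def diff_schema(old: dict, new: dict) -> tuple:
    """Return (removed, added) as sorted lists of "Type.field" strings."""
    def flatten(schema: dict) -> set:
        return {f"{type_name}.{field}" for type_name, fields in schema.items() for field in fields}

    old_fields, new_fields = flatten(old), flatten(new)
    return sorted(old_fields - new_fields), sorted(new_fields - old_fields)


def ci_check(old: dict, new: dict, reviewed_fields: set) -> list:
    """Return CI failure messages; an empty list means the change may merge.

    `reviewed_fields` is a hypothetical allowlist of "Type.field" entries whose
    access rules have been reviewed (e.g. kept in a policy file in the repo).
    """
    removed, added = diff_schema(old, new)
    problems = [f"breaking change: {name} was removed" for name in removed]
    problems += [
        f"no access rule reviewed for new field {name}"
        for name in added
        if name not in reviewed_fields
    ]
    return problems
```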
Priyanka Raghavan 00:19:06 Okay. When you have a lot of checks, do we also need guardrails for performance, or does that suggest a trade-off?
Shahar Binyamin 00:19:19 Absolutely. It’s super important, because it could cause resource exhaustion or DoS. One attack we see is basically a DoS, either from an abuser who has learned your system or, actually, unintentionally: it isn’t intentionally abusive code, it just happens. There’s a change in the backend, and some field that used to cost one millisecond now costs 10 milliseconds because it’s computed on the fly, or whatever it is. Then a big query calls for this field, and suddenly that call becomes super long. Maybe the parser fails, maybe the GraphQL server fails, maybe a timeout happens, and you can cause a DoS, or you just consume so many resources that it becomes very expensive for your org. That goes back to the blind spot: the lack of what we’ve started calling field-level analytics and subgraph visibility, meaning the ability to trace, along the GraphQL journey, what is slow and what is slowing you down, so that platform teams can have a data-driven conversation with those subgraph owners.
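Editor’s sketch (not from the episode) of field-level analytics: time every resolver under a "Type.field" key so that a field quietly drifting from 1 ms to 10 ms shows up at the top of a ranking. The decorator and the in-memory store are hypothetical. A production setup would export these timings to a metrics backend instead.

```python
import time
from collections import defaultdict
from functools import wraps

# "Type.field" -> observed durations in milliseconds (in-memory for the sketch).
FIELD_TIMINGS = defaultdict(list)


def timed_field(field_key: str):
    """Decorator recording the wall-clock duration of a resolver under `field_key`."""
    def decorator(resolver):
        @wraps(resolver)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return resolver(*args, **kwargs)
            finally:
                # Recorded even if the resolver raises, so failures are timed too.
                FIELD_TIMINGS[field_key].append((time.perf_counter() - start) * 1000)
        return wrapper
    return decorator


def slowest_fields(n: int = 5) -> list:
    """Fields ranked by average resolver time, slowest first: (field, avg_ms, calls)."""
    stats = [
        (key, sum(times) / len(times), len(times))
        for key, times in FIELD_TIMINGS.items()
    ]
    return sorted(stats, key=lambda s: s[1], reverse=True)[:n]
```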
Shahar Binyamin 00:20:41 We see this all the time.
Priyanka Raghavan 00:20:45 So, from what you’re saying, the observability piece is very important: knowing which fields get queried gives you insight into your performance.
Shahar Binyamin 00:20:55 Yes. Which fields your users or customers are using to query things. Because if something costs a lot of time and it’s asked for again and again, 10,000 times in one query, and that query is sent a thousand times a second, it’s not hard to predict what’s going to happen.
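Worked through with the figures from this exchange (a field that now costs 10 ms, requested 10,000 times per query, with 1,000 such queries per second), and assuming every field call is resolved separately with no caching or batching:

$$
10{,}000\ \frac{\text{calls}}{\text{query}} \times 1{,}000\ \frac{\text{queries}}{\text{s}} \times 10\ \frac{\text{ms}}{\text{call}} = 10^{8}\ \frac{\text{ms}}{\text{s}} = 10^{5}\ \frac{\text{s of work}}{\text{s}}
$$

At the original 1 ms per call, the same traffic already implies 10^4 seconds of work per second; the regression of a single field multiplies it tenfold.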
Priyanka Raghavan 00:21:16 Yeah. In fact, this was also mentioned a lot in some of the articles and blogs I read, which talked about GraphQL being very susceptible to batching attacks, where people try to brute force through batching and bypass the rate limiting. Could you give us an example, if it’s okay to talk about one you’ve seen? You don’t have to mention companies, just an example of these types of attacks.
Shahar Binyamin 00:21:48 Yeah. It’s hard to admit, but this is where your existing API gateway is failing you. I’m going to give two examples of how rate limiting for GraphQL can hurt you. The existing tools that count API calls don’t work for GraphQL. When you think about rate limiting for GraphQL, you have to stop counting how many calls happened and start counting operations: queries and mutations. I’ll break it into two things to keep in mind. The first is field-level rate limiting, and the classic example is brute force. One API call that flies through all of your existing security tools can contain a thousand or 10,000 login attempts with different passwords. So you have a very easy brute-force attempt that goes undetected without the proper monitoring and rate limiting in place.
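Editor’s sketch (not from the episode) of counting operations instead of HTTP calls: one request can alias the same login mutation a thousand times. This assumes the parsed operation has been reduced to a list of root selections, each with the schema field name and an optional client-chosen alias. RootSelection and the helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class RootSelection:
    field: str                   # schema field being executed, e.g. "login"
    alias: Optional[str] = None  # response key chosen by the client, e.g. "attempt1"


def field_invocations(selections: list) -> Counter:
    """Count executions per schema field, ignoring aliases.

    A per-request limiter sees a single HTTP call; counting by field shows it
    is really many separate login attempts.
    """
    return Counter(sel.field for sel in selections)


def exceeds_field_limit(selections: list, field: str, limit: int) -> bool:
    """True if one request executes `field` more than `limit` times."""
    return field_invocations(selections)[field] > limit
```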
Shahar Binyamin 00:22:46 And you can do the same with the example we talked about before. If we know there’s a very expensive field, an abuser can just hammer it, in one call or repeatedly, with multiple queries or multiple aliases. So that’s field-level rate limiting, and why it’s important to dive in, understand the intent of the request, and tighten the knobs on those guardrails, whether it’s max aliases or max root operations. There are many ways to address this problem, or to rate limit how many times a login can happen. And if you tie it back to what we spoke about before with roles, maybe anonymous users can only do five logins a minute and maybe an admin can do a hundred. So you see how they all start to work together.
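Editor’s sketch (not from the episode): a sliding-window limiter keyed by caller and field, using the per-role numbers from the example above (five logins a minute for anonymous users, a hundred for admins). The count argument lets a single request with many aliased logins be charged as many executions. The class name and in-process storage are assumptions; multiple gateway instances would need shared state.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Per-role limits for one sensitive field, from the example in the conversation.
LOGIN_LIMITS_PER_MINUTE = {"anonymous": 5, "admin": 100}
WINDOW_SECONDS = 60.0


class FieldRateLimiter:
    """Sliding-window limiter keyed by (caller, field), with the limit chosen by role."""

    def __init__(self, limits: dict, window: float = WINDOW_SECONDS):
        self.limits = limits
        self.window = window
        self.events = defaultdict(deque)  # (caller, field) -> execution timestamps

    def allow(self, caller: str, role: str, field: str, count: int = 1,
              now: Optional[float] = None) -> bool:
        """Record `count` executions of `field` (e.g. aliased copies) if within the limit."""
        now = time.monotonic() if now is None else now
        events = self.events[(caller, field)]
        while events and now - events[0] >= self.window:
            events.popleft()  # drop executions that have left the window
        limit = self.limits.get(role, 0)  # unknown roles get no allowance
        if len(events) + count > limit:
            return False
        events.extend([now] * count)
        return True
```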
Shahar Binyamin 00:23:38 The other side of rate limiting for GraphQL is cost-based rate limiting, and that means you have to start processing the response, which again is very unique to GraphQL. Are you counting how many objects have been returned? Why is this important? Because you want to protect against resource exhaustion, and you also want to protect against data scraping. We see this all the time: you have a marketplace, and a competitor trying to figure out your pricing just hammers your listings. By monitoring those heavy objects, you count how many objects are returned per user and per role. And if you want to take this to the extreme, you can also have dynamically assigned weights: if specific queries start to become more expensive because of a field (and again, fields change all the time, databases change all the time), and you have something fully managed that detects that this field is suddenly more expensive, you give it more weight, more credits, arguably, and then you can constantly monitor and keep the health of your GraphQL in good shape.
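Editor’s sketch (not from the episode) of cost-based limiting on the response side: walk the returned data, sum a weight per object, and charge it to the caller’s budget, so a competitor scraping listings runs out of budget even if each request looks harmless. It assumes objects carry __typename (for example because the server adds it). The weights and budget are hypothetical and could be adjusted dynamically, as described above. Window resets are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical weights per object type; could be tuned dynamically as field costs change.
TYPE_WEIGHTS = {"Listing": 5, "Price": 2}
DEFAULT_WEIGHT = 1


def response_cost(node) -> int:
    """Sum the weights of every object in a GraphQL response's data tree.

    Objects without "__typename" (including the top-level data object)
    count with DEFAULT_WEIGHT; scalars cost nothing in this sketch.
    """
    if isinstance(node, list):
        return sum(response_cost(item) for item in node)
    if isinstance(node, dict):
        own = TYPE_WEIGHTS.get(node.get("__typename"), DEFAULT_WEIGHT)
        return own + sum(response_cost(value) for value in node.values())
    return 0


class CostBudget:
    """Per-user budget of response cost for the current window (reset omitted)."""

    def __init__(self, budget: int):
        self.budget = budget
        self.spent = defaultdict(int)

    def charge(self, user: str, data) -> bool:
        """Charge the cost of `data`; return False once the user has exceeded the budget."""
        self.spent[user] += response_cost(data)
        return self.spent[user] <= self.budget
```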
Priyanka Raghavan 00:24:55 Wow. That’s really good, very insightful. There’s also another attack vector called broken object property level authorization, and I thought it would be good to take you through an example. Suppose you have an online marketplace with two types of users: a host who rents out their apartment, and a guest who actually wants to stay there. There’s usually a call like “approve booking” for the person renting out their apartment, and the payload is something legitimate, like approved: true, comment: “check in after 3:00 PM.” One thing I saw in some of the blogs is that an attacker adds an entirely new field to that JSON structure, a malicious payload that modifies the price. What was supposed to be just an approval API call now has an extra field, and suddenly money is gone from this person’s bank account. Can you talk a little bit about this kind of property level authorization that we need to have?
Shahar Binyamin 00:26:13 It’s a great example. Again, I would assume it happened because of efficiency: trying to move fast, trying to empower. Maybe there’s an admin function they use, and suddenly it was exposed; suddenly the tree is exposed. Even if introspection is closed, there are many ways, like fuzzing, to figure out what the operations and options really are, with the mutations or with the queries. The way we think about this at Inigo, it really goes back to enforcing role-based access control, and role-based access control can go very, very deep, all the way to the field level. Do I want to expose an email, a first name, a last name, or an SSN based on the role? It can also apply to an attribute in the mutation arguments. In this example, when someone calls a mutation and tries to inject an argument that’s not supposed to be there, that argument gets rejected first.
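Editor’s sketch (not from the episode) for the booking example. A spec-compliant GraphQL server already rejects arguments the schema doesn’t define; this check matters when the schema does define an argument such as price (say, for an admin path) but the caller’s role must not set it. The mutation name, roles, and allowlist are hypothetical.

```python
# Hypothetical per-role allowlists of arguments for a booking-approval mutation.
ALLOWED_ARGS = {
    ("approveBooking", "host"): {"bookingId", "approved", "comment"},
    ("approveBooking", "admin"): {"bookingId", "approved", "comment", "price"},
}


def disallowed_arguments(mutation: str, role: str, arguments: dict) -> set:
    """Return argument names the role may not set; an empty set means the call may proceed.

    Unknown (mutation, role) pairs allow nothing, so new mutations are deny-by-default.
    """
    allowed = ALLOWED_ARGS.get((mutation, role), set())
    return set(arguments) - allowed
```

A host approving a booking with an extra price argument gets that argument rejected before the resolver runs, while the same call without it passes.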
Shahar Binyamin 00:27:15 And lastly, there’s another common attack we didn’t talk about: you can put injections into any free-form aspect of GraphQL, from the operation name to alias names to variable values. So how do you run validation against every input field? That’s also critical. We’ve been talking for about 30 minutes and we’ve already exposed a lot of different GraphQL vulnerabilities, and the deeper the conversation goes, the more things surface. It’s really challenging to chase them, because a moment ago you, as the champion of GraphQL, thought you were bringing a new technology into your org, and suddenly you’re deep in conversations about how to operate this at scale and how to secure it. And you might not have that expertise in your org. We’ve seen very, very large companies have dedicated GraphQL developer teams, which is very, very expensive when you think about it. But the reality is that the first enterprise movers of GraphQL (we can name Coinbase, Reddit, Wayfair, GitHub), happily or not, had to invest a lot in those building blocks, and they created a lot of innovation, which is phenomenal to see.
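Editor’s sketch (not from the episode) of validating the free-form parts of a request: operation names and aliases must match the Name grammar from the GraphQL specification, /[_A-Za-z][_0-9A-Za-z]*/, and variable strings can be bounded before reaching resolvers. A spec-compliant parser should already reject malformed names; an explicit check still helps wherever names or variables are logged or forwarded before parsing, such as at a gateway. The length limits are hypothetical, and parameterized queries in resolvers remain the real defense against injection.

```python
import re

# Name grammar from the GraphQL specification: /[_A-Za-z][_0-9A-Za-z]*/
GRAPHQL_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")
MAX_NAME_LENGTH = 64               # hypothetical limit
MAX_VARIABLE_STRING_LENGTH = 1024  # hypothetical limit


def invalid_names(names: list) -> list:
    """Operation names or aliases that are not valid GraphQL Names or are too long."""
    return [
        name for name in names
        if len(name) > MAX_NAME_LENGTH or not GRAPHQL_NAME.fullmatch(name)
    ]


def oversized_variables(variables: dict) -> list:
    """Paths of string variable values longer than the limit, searched recursively."""
    found = []

    def walk(value, path):
        if isinstance(value, str) and len(value) > MAX_VARIABLE_STRING_LENGTH:
            found.append(path)
        elif isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}")
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(child, f"{path}[{index}]")

    for name, value in variables.items():
        walk(value, name)
    return found
```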
Priyanka Raghavan 00:28:38 To take it back, that’s a community-led effort behind a lot of this innovation. Can you tell us about any good tools that came out of these community-led efforts to find these GraphQL issues?
Shahar Binyamin 00:28:53 There are a lot of blog posts and a lot of tools. There are a few open-source tools out there; unfortunately, sometimes those tools are tied to specific implementations or to the programming language they were written in. There are some commercial solutions and some free solutions. And this is very, very new, just in the last year or two: we’ve seen a few open-source attempts to create a layer that does some sort of gatekeeping around GraphQL queries. Some of them do a good job, some are still early, but that’s great; it means there’s more education about this. We’re also seeing the communities around GraphQL implementations start to add a few more rules about what is possible and what is not, which I think is phenomenal for those getting started, because they can find something off the shelf that helps them, though it’s questionable whether these are enterprise-grade solutions.
Priyanka Raghavan 00:30:00 So are there any recommended scanners that developers can use?
Shahar Binyamin 00:30:06 I’ll hesitate to answer this question; I don’t want to put scanner examples out there for people to abuse. I would say you can think about a scanner that runs against your GraphQL and finds vulnerabilities. That’s a great option to include as part of your CI/CD. My worry with those scanners is that they need a counterpart: something that actually runs in your production and can enforce real-time protection. That is the key to how we think about GraphQL security: a real-time ability to protect, monitor, and alert. Those scanners are great as part of your development lifecycle, but how do you get a Slack alert when your most critical mutation is failing again and again and again? I think this is the existing pain that GraphQL owners at scale are challenged with. How do they know? They don’t, unless they invest heavily.
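Editor’s sketch (not from the episode) of the detection side of “a Slack alert when your most critical mutation is failing”: keep a sliding window of failures per operation and call a notifier once a threshold is crossed, re-arming after the burst passes. The notify callable stands in for whatever actually posts the message and is hypothetical, as are the threshold and window defaults.

```python
from collections import defaultdict, deque
from typing import Callable


class FailureAlerter:
    """Call `notify` once when an operation fails `threshold` times within `window` seconds."""

    def __init__(self, notify: Callable[[str], None], threshold: int = 10,
                 window: float = 60.0):
        self.notify = notify  # hypothetical hook, e.g. a function posting to a chat webhook
        self.threshold = threshold
        self.window = window
        self.failures = defaultdict(deque)  # operation name -> recent failure timestamps
        self.alerted = set()  # operations already alerted on during the current burst

    def record(self, operation: str, succeeded: bool, now: float) -> None:
        failures = self.failures[operation]
        while failures and now - failures[0] >= self.window:
            failures.popleft()  # forget failures older than the window
        if len(failures) < self.threshold:
            self.alerted.discard(operation)  # burst is over: re-arm the alert
        if succeeded:
            return
        failures.append(now)
        if len(failures) >= self.threshold and operation not in self.alerted:
            self.alerted.add(operation)
            self.notify(f"{operation} failed {len(failures)} times in {self.window:.0f}s")
```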
Priyanka Raghavan 00:31:24 But about the investment: do you have any examples where the investment was worth the price? Is that what stops teams from investing? You gave us some great examples, but they seem to require you to really inspect every field and put a lot of thought into the guardrails behind every field you add. Sometimes that might slow you down, right?
Shahar Binyamin 00:31:48 It does slow you down. It creates some resistance, frustration, and roadmap slowdowns. As engineering leaders face the responsibility of a new stack, what we all do is ask, hey, what do we already have that can work with GraphQL? Oh, I have Apigee, or I have this gateway. What can they do for me with GraphQL? Well, very little. Then they think about the performance monitoring we touched on before, and about observability, and say, oh, I’m just going to send all the logs to Splunk or Datadog. It’s super expensive, not every developer team has access to it, and it doesn’t do a good job of providing field-level insight. And as your graph evolves, your organization will evolve with it and start thinking about composition and a registry.
Shahar Binyamin 00:32:49 And your platform teams will get super frustrated with developers changing things and say, oh, it’s time to enforce linting, time to start enforcing rules. So how do you get all of this into your API management, your developer lifecycle, your CI/CD? How do you get all of these components? Suddenly you have to deal with a lot of things. What’s great to see is companies coming out of the GraphQL space, like Inigo, trying to think about a holistic solution: hey, let’s connect all the dots, let’s sit very close to the GraphQL server, and let’s think about, and this is a new concept, GraphQL management. What does that mean? What does that mean to your org? Can we answer basic questions about our GraphQL APIs? You’ll be surprised that when you ask how many GraphQL API calls they have, or how many unique GraphQL calls, people struggle to answer. It shouldn’t be a hard question to answer.
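To make the “how many calls, how many unique calls” question concrete, here is a rough sketch that groups raw query strings by an approximate signature (literal values masked, comments dropped, whitespace collapsed). It is a lexical approximation under assumed inputs, not a parser-based normalizer; `signature` and `traffic_summary` are hypothetical names, and it requires Python 3.8 or later.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Block strings, strings, comments, and numeric literals (a number must not
# follow a name character, so "a0" or "$v1" are left alone).
_LITERAL = re.compile(
    r'"""[\s\S]*?"""|"(?:\\.|[^"\\\n])*"|#[^\n]*'
    r'|(?<![\w$])-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?'
)


def _mask(match: re.Match) -> str:
    token = match.group()
    if token.startswith("#"):
        return " "   # drop comments
    if token.startswith('"'):
        return '""'  # hide string values
    return "0"       # hide numeric values


def signature(query: str) -> str:
    """Approximate operation signature: literals masked, whitespace collapsed."""
    return " ".join(_LITERAL.sub(_mask, query).split())


def traffic_summary(queries: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (total calls, unique operation signatures, per-signature counts)."""
    counts = Counter(signature(q) for q in queries)
    return sum(counts.values()), len(counts), counts
```

Two requests that differ only in an `id` argument or in formatting collapse to one signature, so the unique count reflects distinct operations rather than distinct payloads.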
Shahar Binyamin 00:33:50 So when you think about recommendations, really ask your existing tools or the vendors you’ve worked with what they can do for you when it comes to GraphQL. What we often see is that not long after trying to force REST tools onto GraphQL, teams see that they don’t work, and that causes a lot of frustration. And now you have to ask yourself, what do I do? In the first years of GraphQL adoption, the first enterprise movers went homegrown: let’s build this ourselves. But you can learn from this, from what we’ve done, and from what can be productized and generalized to actually work for your organization. You might not need all of this, you might only need some of it, but these are conversations to have with your team.
Priyanka Raghavan 00:34:41 One more thing I wanted to ask you before we go into the GraphQL deployment piece: one of the other attack vectors is denial of service, which is something we didn’t talk about much. I think we briefly touched on it in the introduction. Is it really true that GraphQL is more susceptible to denial-of-service attacks?
Shahar Binyamin 00:35:03 It’s more susceptible to attacks, probably, yes. Because it exposes more, there are more steps along the way that could be attacked. The parser itself can be attacked; the business logic can be attacked. And then you have the database connectivity, which can cause data leakage, exposure, or data manipulation, which in some cases is worse. GraphQL responses can leak information even when there are errors: some implementations return the stack trace or hint at PII in the error response. You have to think about whether introspection is a risk or not; there are two schools of thought on that. One classic example we like to give is a spec attack, something that doesn’t exist in REST: the directive attack. What happens if you overload a parser with directives that don’t exist? And I mean thousands of them, from someone who’s trying to be abusive. This can overwhelm the parser and overload the system, and you have a DoS attack before even one line of your own code has executed. So the directive attack is a very common one that we like to share because it’s very easy to replicate. But you also have batching attacks. We mentioned alias attacks that can cause DoS; field duplication can cause DoS; directive overloading; nested queries. Something people don’t talk about a lot is nested fragments, which are super easy to do, and the list goes on. Those are a whole new set of challenges that exist just for GraphQL.
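Because the directive attack happens before your own code runs, one cheap mitigation is a lexical pre-check that rejects documents carrying an excessive number of directives before the full parser builds an AST. The sketch below assumes that approach; the function names and the limit of 50 are illustrative assumptions, and it counts directive usages without validating that they exist in the schema.

```python
import re

# Block strings, ordinary strings, and comments, so that an "@" inside
# them is not mistaken for a directive.
_IGNORED = re.compile(r'"""[\s\S]*?"""|"(?:\\.|[^"\\\n])*"|#[^\n]*')


def count_directives(query: str) -> int:
    """Count directive usages ("@name") in a raw query document."""
    code = _IGNORED.sub(" ", query)
    return len(re.findall(r"@\s*[_A-Za-z]", code))


def reject_directive_overload(query: str, max_directives: int = 50) -> None:
    """Raise ValueError before full parsing if the document carries too many directives."""
    n = count_directives(query)
    if n > max_directives:
        raise ValueError(f"query has {n} directives; the limit is {max_directives}")
```

The scan is a single linear pass over the text, so a payload with thousands of bogus directives is turned away before the expensive parse and validation steps.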
Priyanka Raghavan 00:36:40 Wow. So then I think we should talk a little bit about mitigation strategies. One of the things I learned when I was reading a lot of these blogs on mitigation strategies was to limit the depth of queries. What is your take on that? Can you explain it?
Shahar Binyamin 00:36:55 Limiting depth is just the tip of the iceberg. You should limit it, unless you know someone is using it. So it goes back to: are people, are your frontend developers, actually using it? If you harden it too much, you’re preventing your own developers from using it. So you have to strike a balance by knowing how it’s been used before you limit it. To throw out a few: depth, height, yes. Those I call the knobs, like handles. You want to control how much people can play with aliases, directives, which characters are allowed (think injections), and the operation name. Something we love to do is enforce an operation name, because it can hint at the intent of the sender. This isn’t for intentionally abusive calls, but when an abusive call happens from your own team, if you have the intent, you know who’s doing it. Then there’s max request size, max response size, or time: how long it took. If you see that a specific query took five seconds, maybe next time you don’t want to allow it. So you also have to start relying on historical data to make decisions moving forward. Maybe you want to block GET requests and only allow POST requests. There are a lot of handles. You want to make sure they’re all locked and tight, and you might want a completely different set of rules per role, because you want to allow your admins or developers free access, but in production you want it a little tighter.
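Several of the knobs listed above (POST-only, request size, a required operation name, depth, and alias counts) can be checked on the raw HTTP request. The following is a sketch under stated assumptions: a JSON body with `query` and `operationName` keys, hypothetical default limits in `Limits`, and a lexical depth measure that counts nested selection-set braces (ignoring anything inside argument parentheses) but does not expand fragment spreads, so nested fragments still need an AST-based check.

```python
import json
import re
from dataclasses import dataclass
from typing import List, Optional

_IGNORED = re.compile(r'"""[\s\S]*?"""|"(?:\\.|[^"\\\n])*"|#[^\n]*')


@dataclass
class Limits:
    # Hypothetical defaults; tune them from your own historical traffic.
    max_body_bytes: int = 10_000
    max_depth: int = 8
    max_aliases: int = 15
    require_operation_name: bool = True
    post_only: bool = True


def _outside_args(query: str) -> str:
    """Blank out strings, comments, and everything inside ( ... )."""
    out, parens = [], 0
    for ch in _IGNORED.sub(" ", query):
        if ch == "(":
            parens += 1
        elif ch == ")":
            parens -= 1
        elif parens == 0:
            out.append(ch)
            continue
        out.append(" ")
    return "".join(out)


def selection_depth(query: str) -> int:
    """Deepest nesting of selection-set braces; fragment spreads are NOT expanded."""
    depth = best = 0
    for ch in _outside_args(query):
        if ch == "{":
            depth += 1
            best = max(best, depth)
        elif ch == "}":
            depth -= 1
    return best


def count_aliases(query: str) -> int:
    """Outside argument lists, "name:" can only be an alias."""
    return len(re.findall(r"[_A-Za-z]\w*\s*:", _outside_args(query)))


def check_request(method: str, body: bytes,
                  limits: Optional[Limits] = None) -> List[str]:
    """Return a list of violations; an empty list means the request may proceed."""
    limits = limits or Limits()
    if limits.post_only and method.upper() != "POST":
        return ["only POST is accepted"]
    if len(body) > limits.max_body_bytes:
        return [f"body is {len(body)} bytes; limit is {limits.max_body_bytes}"]
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return ["batched or malformed payload"]
    query = payload.get("query") or ""
    problems = []
    if limits.require_operation_name and not payload.get("operationName"):
        problems.append("operationName is required")
    depth = selection_depth(query)
    if depth > limits.max_depth:
        problems.append(f"depth {depth} exceeds {limits.max_depth}")
    aliases = count_aliases(query)
    if aliases > limits.max_aliases:
        problems.append(f"{aliases} aliases exceed {limits.max_aliases}")
    return problems
```

Malformed JSON raises `json.JSONDecodeError`, which a caller would map to a 400 response; batched array payloads are rejected outright here, which is one (strict) answer to the batching attacks mentioned earlier. Per-role rule sets could be modeled as different `Limits` instances.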
Priyanka Raghavan 00:38:27 That’s really cool. I think these are all very valid mitigation strategies. So being superficial and just limiting the depth is not enough.
Shahar Binyamin 00:38:37 Oh, not at all.
Priyanka Raghavan 00:38:38 Yeah.
Shahar Binyamin 00:38:38 But if you Googled GraphQL attacks two years ago, you’d only see three things: depth, height, and N+1. There’s much more. If we’re talking about mitigations, one way to think about this is in layers of abstraction. First, think about the query coming in; let’s call it query protection. Does it look right? Height, depth, directives: does it meet the standard of queries you’re willing to accept? The second thing: is it accessing the right fields? So think about access control. Can it access these fields based on the role, based on the identity of the sender? Then you start thinking about rate limiting and say, okay, are we going to allow this user to ask so many questions in order to get more information? Or are they doing something we’re not going to allow, like we allow only 10 login attempts and they’re sending 20 login attempts in one call?
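The access-control layer (“can it access these fields based on the role”) reduces to checking each requested schema coordinate against a policy. This sketch assumes the query has already been validated against the schema and flattened into `Type.field` coordinates; the `POLICY` mapping, the roles, and the public-by-default rule are hypothetical choices, and a stricter deployment might deny unlisted fields instead.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical policy: schema coordinates mapped to the roles allowed to read them.
# Coordinates not listed here are treated as public in this sketch.
POLICY: Dict[str, Set[str]] = {
    "User.email": {"admin", "support"},
    "User.ssn": {"admin"},
    "Mutation.deleteUser": {"admin"},
}


def denied_fields(requested: Iterable[str], roles: Set[str],
                  policy: Optional[Dict[str, Set[str]]] = None) -> List[str]:
    """Return the requested coordinates that none of the caller's roles may access."""
    policy = POLICY if policy is None else policy
    return sorted(coord for coord in set(requested)
                  if coord in policy and not (policy[coord] & roles))
```

A caller with only the `support` role asking for `User.email` and `User.ssn` gets `["User.ssn"]` back, so the gateway can reject the request (or null that field) before any resolver runs.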
Shahar Binyamin 00:39:37 And then, once the query is processed, look at the response: what’s in there? Let’s start counting how many objects have been returned, so the next time they reach out to us, we can say, hey, you’ve already asked us too much. Or maybe it’s time to evaluate the internal dynamic cost of the query: how much it actually cost us to respond to it. And if you go to the extreme and you’re in a financial or otherwise heavily governed environment, maybe you should search for PII in the response, either in specific fields or in the error response. This is a very expensive task to do in real time, so again, you have to be very mindful about which security rules you put in place. Now, we are talking about GraphQL in its open form, and this doesn’t apply to everyone. If you don’t need a completely open GraphQL introspection schema, if you have a strict client-server situation, those things should not be open, and you should find a very good implementation of an operation registry. Which is funny: you’re hardening your GraphQL to look like REST. But it makes a lot of sense in a client-server environment that you only allow specific pre-approved queries to come in. That isn’t true for everyone, though, so being aware of both of these options is critical.
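The operation registry Shahar describes, where only pre-approved queries may run, can be sketched as an allow-list keyed by a SHA-256 digest of each document. This is an illustrative assumption, not a particular product’s API: documents are hashed byte-for-byte with no normalization, so clients must send exactly the text that was registered at build time, either in full or by hash.

```python
import hashlib
from typing import Dict, Optional


class OperationRegistry:
    """Allow-list of pre-approved operations, keyed by SHA-256 of the document."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    @staticmethod
    def digest(document: str) -> str:
        return hashlib.sha256(document.encode("utf-8")).hexdigest()

    def register(self, document: str) -> str:
        """Approve a document (e.g. at build time) and return its hash."""
        key = self.digest(document)
        self._docs[key] = document
        return key

    def resolve(self, query: Optional[str] = None,
                sha256: Optional[str] = None) -> str:
        """Return the approved document, or raise PermissionError if it is unknown."""
        key = sha256 if sha256 is not None else self.digest(query or "")
        doc = self._docs.get(key)
        if doc is None:
            raise PermissionError("operation is not in the registry")
        return doc
```

Clients that only send the hash also keep request bodies small, which removes much of the room for the parser-level attacks discussed earlier; the trade-off is that ad hoc queries stop working, which is why this fits the strict client-server case rather than an open API.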
Priyanka Raghavan 00:41:03 Okay, so it almost feels like you started with GraphQL to overcome REST’s challenges, but maybe there are some good things from REST that you also need to bring into GraphQL. Yeah, okay.
Shahar Binyamin 00:41:15 I think the notion is that during the development cycle, everyone should just move as fast as they can. In production, depending on how you interact with your customers or your developers (client-server, app, mobile, website, whatever), you can enforce different rules, strategies, or mitigation approaches. That really depends on how your org is leveraging GraphQL. Some don’t expose GraphQL outside at all and use it as an internal data hub that teams can interact with.
Priyanka Raghavan 00:41:53 Yeah, in fact, based on this, I had a question because I came across an article that said GraphQL is not meant to be exposed over the internet. So what are your thoughts on that?
Shahar Binyamin 00:42:04 It’s a fair argument. If you strip down what it does, it’s basically SQL to the world: anyone can ask you any question, and do you want to allow it or not? We’re seeing many different examples of companies thinking about this. Take the example I gave before, where GraphQL is an internal hub of data: multiple teams get REST requests, but internally everyone sends a GraphQL request to figure out the reply, right? That’s all valid. I don’t think there’s one way to go, and I question people who say there’s only one way to go, because that’s not how you create innovation. That’s not how you move fast. You have to move fast, make mistakes, and learn from them, and learn from examples and tools that exist out there. We don’t need to reinvent the wheel again and again, but you still have to try new stuff. So I’m not saying GraphQL should always be exposed; for some companies it should, and for some it shouldn’t. It really depends.
Priyanka Raghavan 00:43:09 On your particular use case, your schema, what you’re trying to expose, and all of that.
Shahar Binyamin 00:43:14 No, and also on how mature your engineering team is. Some companies are super technical and want to innovate, and some companies move at a different pace and want to try things out in a safer place, with less risk. We’re even seeing legacy companies that used SOAP moving and saying, okay, it’s time to replace our stack. GraphQL is mainstream now, and it’s a very good alternative for their use case. But you still want to move in phases. Maybe not exposing GraphQL to the world is part of your adoption phases, and as your org shifts, down the road you might say, oh, we’re mature enough, we’re ready to expose it. It depends not only on the type of company you are, but also on where you are in your journey of GraphQL adoption. And we meet companies in all sorts of GraphQL adoption phases.
Shahar Binyamin 00:44:10 The way we map it, first you have the ones who explore. They’re like, let’s find our first GraphQL project, our first few implementations, let’s connect this to all these databases. And that’s great. Then you have the second phase: let’s put some safeguards in place for our developers, let’s start talking about schema checks, maybe an operation registry, maybe linting. Then you also have a phase of acceleration, when you really want advanced monitoring, to think about performance, maybe about subgraphs, and about how to make sense of all the errors that happen in your system. Do we have some sense of a health check? From there, they might or might not move to the intelligence phase: is there any BI in all this traffic? Can we have alerts?
Shahar Binyamin 00:45:01 Can we have anomaly detection? Can we identify how the schema changes over time? And you might find your spot; the adoption journey doesn’t mean you have to complete it. Where you are at this stage of your company might be fine. The last one is really governance and compliance. This is a free-form API: do we know who can access what, and during an incident, can we answer who actually accessed what? This might not be a problem for most organizations. So that’s five steps in the adoption journey, which not everyone needs to go through, and not necessarily in this specific order.
Priyanka Raghavan 00:45:40 Okay. There are some interesting takeaways there. Observability, which you said is a big thing, really stands out through a lot of the advice you’re giving: knowing exactly who is using your GraphQL APIs and how they’re being used. The next question I wanted to ask is about GraphQL deployments. Can they be secured? I remember reading an article from you which says DevSecOps must turn the tables on GraphQL API attackers. Can you tell us a little bit about this?
Shahar Binyamin 00:46:16 Let’s go back to the Kubernetes example we started with: it started with developers, it was brought to DevOps teams, and then the security teams realized two minutes later, hey, what is this? How do we secure it? How do we monitor it? How do we gain control over it? What is it exposing? And a whole ecosystem of Kubernetes security came around it. We are at a phase of GraphQL adoption where it does not need to be an afterthought. From day one, even if you don’t have a federated environment, even if you’re just starting, you can already have the knowledge, the education, maybe the tools to set your team up for success. Because the reality is that CISOs don’t know much about GraphQL today. They’re like, do I have a GraphQL server? They’re in a discovery and inventory phase.
Shahar Binyamin 00:47:15 And when there’s a massive GraphQL incident down the road, people will wake up. So as you represent your org, ask at what stage you want your company to be when they come for you, your own DevSecOps or your own security team: what does your GraphQL security posture look like? My goal here is to encourage everyone to have an internal conversation with their own team: what is our posture? That’s it. A conversation like this alone will get you the 80/20; it will take you much further down the road than you are today. So that is the goal of pieces like this blog post we’re releasing: encourage education, encourage discussion. Your org has a lot of smart people, I’m sure of it. Bring them to the table to discuss this new technology we all love to introduce to our orgs.
Priyanka Raghavan 00:48:15 Okay. So, to get back to the deployment piece: you’re saying the maturity should develop the same way the whole Kubernetes ecosystem developed? Is it at that stage, or are you saying that we need to get better?
Shahar Binyamin 00:48:33 Both, I guess. I’m saying there are things everyone can now include in their CI/CD deployment model to harden their deployment, right? You could run a scanner against your GraphQL. You could write QA scripts that try to access fields they’re not supposed to. You should ask yourself whether introspection is open or not. You should ask yourself about rate limiting. And you should be able to identify that a change in the schema is not going to break production, because you removed a field, or because a query used to cost one millisecond and now costs a hundred milliseconds, and catch that as part of your testing and development. All of this ideally needs to happen in a robust deployment model, early, before it hits production, because of all the challenges we talked about. As with anything else, the earlier you find it, the better your posture.
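Two of the CI/CD checks mentioned here, catching a removed field and catching an operation that went from one millisecond to a hundred, can be sketched over simplified inputs. The schema model (type name mapped to a set of field names), the latency dictionaries, and the tenfold threshold are assumptions made for illustration; a real pipeline would diff the actual SDL and use measured test timings.

```python
from typing import Dict, List, Set, Tuple

Schema = Dict[str, Set[str]]  # type name -> field names (simplified model)


def removed_fields(old: Schema, new: Schema) -> List[str]:
    """Coordinates present in the old schema but missing from the new one."""
    gone = []
    for type_name, fields in old.items():
        for field in fields - new.get(type_name, set()):
            gone.append(f"{type_name}.{field}")
    return sorted(gone)


def latency_regressions(before_ms: Dict[str, float], after_ms: Dict[str, float],
                        factor: float = 10.0) -> List[Tuple[str, float, float]]:
    """Operations whose measured latency grew by at least `factor`."""
    return sorted((op, before_ms[op], after_ms[op]) for op in before_ms
                  if op in after_ms and after_ms[op] >= before_ms[op] * factor)
```

A CI step could fail the build whenever either list is non-empty, ideally after checking field-level analytics to see whether any client still requests a removed field.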
Priyanka Raghavan 00:49:32 Okay. I wanted to try to tie together a lot of the things that you said. Do you have any top three recommendations to prevent GraphQL attacks?
Shahar Binyamin 00:49:45 Look, the first one is observability, for sure. You have to elevate field-level analytics, subgraph visibility, and the errors in your system. Do you even know when subgraphs are returning authentication errors? Is someone trying to access a field? Would you notice? So first, remove the blinders and get intimate with your GraphQL traffic. That’s the first thing I would recommend. The second thing I would recommend is bare-bones query protection: put some knobs around what a query should look like and what it does. And the third will most likely be rate limiting. Going back to inbound rate limiting, set rules on how many times specific fields can be requested in a query. That’s a good baseline to start with, and then you can add more advanced rate limiting once you have this in place. But make sure the classic brute-force attempts are prevented per query before you start setting up something like a Redis database to calculate rate limits. Those are more advanced things; you have to start somewhere. So these are the three things I would recommend: know what’s going on, which is observability; put some controls on what queries should look like; and top-level rate limiting at the field level.
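The “classic brute force per query” case, for example one call carrying 20 aliased `login` attempts, can be caught by capping how many times a sensitive field appears in a single document. This is a lexical sketch, not a parser: the `FIELD_LIMITS` caps and field names are hypothetical, names followed by `:` (aliases and argument names) and `$variables` are skipped, and keywords or enum values are also counted but never capped.

```python
import re
from collections import Counter
from typing import Dict, Optional

_IGNORED = re.compile(r'"""[\s\S]*?"""|"(?:\\.|[^"\\\n])*"|#[^\n]*')
# A name that is not a $variable and is not followed by ":" (aliases and
# argument names are followed by ":").
_FIELD_NAME = re.compile(r"(?<!\$)\b[_A-Za-z]\w*\b(?!\s*:)")

# Hypothetical per-request caps for sensitive fields.
FIELD_LIMITS: Dict[str, int] = {"login": 1, "resetPassword": 1}


def field_occurrences(query: str) -> Counter:
    """Lexically count names in field position (not a full parser)."""
    code = _IGNORED.sub(" ", query)
    return Counter(m.group() for m in _FIELD_NAME.finditer(code))


def over_limit(query: str, limits: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Return {field: count} for every capped field requested too often."""
    limits = FIELD_LIMITS if limits is None else limits
    counts = field_occurrences(query)
    return {f: counts[f] for f, cap in limits.items() if counts[f] > cap}
```

This per-document cap needs no shared state, which is why it can come before any Redis-backed, cross-request rate limiter.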
Priyanka Raghavan 00:51:19 Okay. That’s great. So I guess the last question I want to ask you, and I want you to spend some time on it, is what exactly your company does in this space, in the GraphQL ecosystem, and how it is helping adopters of GraphQL.
Shahar Binyamin 00:51:35 When Inigo started, we looked at the GraphQL ecosystem and saw tremendous developer and open-source efforts to help people get started: a lot of GraphQL servers being built, a focus on database connectivity. At the same time, we saw the first enterprise GraphQL movers struggle to put this in production because of a lot of the topics we talked about today: security, observability, maybe schema management. What does it mean to have schema checks and linting, a GraphQL playground, or a good implementation of an operation registry? So we wanted to innovate in that space. We wanted to create the management layer organizations need to feel confident with GraphQL, one that works with any GraphQL implementation. So we built a sort of middleware integration that works with any open-source implementation.
Shahar Binyamin 00:52:39 So whether it’s GraphQL in JavaScript, Yoga, Ruby, or Apollo Server, it really doesn’t matter. Once it’s connected, it provides all the building blocks your platform teams might need to continue their GraphQL adoption journey. Those building blocks free you up even if you need to make changes down the road: if you move from Python to Ruby, you still have a declarative management layer that carries on with you. It’s built as an enterprise-grade solution that can handle tens of billions of monthly calls, with real-time protection that isn’t an afterthought, and it gives you the composition and registry that really empower your organization to move forward. And we love GraphQL. That’s what we do every day; it’s what the team is focused on every day. Our tools are built entirely around GraphQL. And we love the success stories. We love seeing people look at their field-level analytics and say, oh, I like this. Or unique notions like error impact: everyone has a lot of errors in their system, so what’s important is how we can help them prioritize. So we go deep there.
Priyanka Raghavan 00:54:00 Yeah, it’s been very interesting for me to see the amount of analysis you should do to at least secure your GraphQL APIs. It’s been nice, because you’ve literally asked listeners to dig deep, with some great strategies on observability, query protection, and rate limiting. I’ll sleep on those three things tonight. Yeah, this is great. The last thing I need to ask you before I let you go is where can listeners find you in cyberspace? What is your preferred way for people to reach you?
Shahar Binyamin 00:54:33 Well, at [email protected], or you can always go to inigo.io. We have an active Slack channel. You can find me on LinkedIn, find me on Twitter, or you can email me; that’s the best way, I’m very responsive. We love hearing stories, we love learning from others, because people went through a lot, and whatever we can generalize, productize, and give, we do; we have a very generous free tier. So wherever we can give back to those who are starting out, or who have a startup and are growing and want peace of mind when it comes to GraphQL, we love those success stories.
Priyanka Raghavan 00:55:09 Great. I’ll make sure to add that on our show notes, at least your LinkedIn profile as well as the Inigo site, which I already have. This has been great. Thanks for coming on the show, Shahar.
Shahar Binyamin 00:55:20 It was a pleasure, Priyanka, thank you for having me. Really enjoyed the conversation.
Priyanka Raghavan 00:55:26 This is Priyanka Raghavan for Software Engineering Radio. Thanks for listening.
[End of Audio]