Matthew Adams, Head of Security Enablement at Citi, joins SE Radio host Priyanka Raghavan to explore the use of large language models in threat modeling, with a special focus on Matthew's work, STRIDE GPT. The episode kicks off with an overview of threat modeling, its applications, and the stages of the development life cycle where it fits in. They then discuss the STRIDE methodology and STRIDE GPT, highlighting practical examples, the technology stack behind the application, and the tool's inputs and outputs. The show concludes with tips and tricks for optimizing tool outputs and advice on other open source projects that utilize generative AI to bolster cybersecurity defenses. Brought to you by IEEE Computer Society and IEEE Software magazine.
Show Notes
- SE Radio 416: Adam Shostack on Threat Modeling
- mrwadams – Overview
- Stride GPT
- Stride GPT App
- Threat Modeling: Designing for Security by Adam Shostack
- Dummies.com: cissp/security-threat-modeling
- Threat Modeling – OWASP Cheat Sheet Series
- AI Exchange
- GitHub – NVIDIA/garak: the LLM vulnerability scanner – Open source vulnerability assessment tool
- GitHub – mrwadams/attackgen: AttackGen – cybersecurity incident response testing tool that leverages the power of large language models and the comprehensive MITRE ATT&CK framework to generate tailored incident response scenarios based on user-selected threat actor groups and your organization’s details.
Transcript
Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.
Priyanka Raghavan 00:00:18 Hello, this is Priyanka Raghavan for Software Engineering Radio. Today we have Matthew Adams on the show. He's currently the head of Security Enablement at Citi, where his role focuses on driving security innovation across emerging technologies such as Generative AI, blockchain, and quantum computing. He's here on the show because of his interest in finding practical applications for LLMs in cybersecurity, and today we're going to be discussing using LLMs in threat modeling, focusing on his popular work, STRIDE GPT. Welcome to the show, Matt. I'm so excited to have you here.
Matt Adams 00:00:55 Thanks Priyanka. Great to be here.
Priyanka Raghavan 00:00:58 We have done a show, Episode 416, on Threat Modeling with Adam Shostack. That was nearly four years back, so I'm looking forward to your take on this topic, and therefore we'll start with some basic questions. The first one being, what is threat modeling?
Matt Adams 00:01:16 So for me, threat modeling's really about having a structured process for identifying potential security risks and vulnerabilities in systems. And that's a long way of saying let's look at what can go wrong and how we can address those issues to prevent them or at least lessen their impact. And I think for me it's easy to get wrapped up in the complexity of threat modeling, particularly at the enterprise level. But I think it's important to remember that we do threat modeling all the time as humans. So a common example I like to use is if you're driving your car into a rough part of town, you are not going to leave it unlocked when you leave it; you're going to lock the doors, and you might even decide that the level of threat is sufficient for you to buy something that locks the steering wheel in place. Maybe if you live there long enough, you might even have an immobilizer attached to the car. All of these things just kind of fit into your natural threat model of the world, and there's really no reason why we can't apply that similar approach of balancing risks against mitigations when we're building systems as well.
Priyanka Raghavan 00:02:21 Great. That's a super analogy, but one of the things we see is that threat modeling is not done very often, and therefore I just wanted to ask you, why do you think it's important to do threat modeling for, say, software engineers as well as software teams?
Matt Adams 00:02:36 I think probably the primary reason is that it's just much easier and cheaper to find security issues during the design phase, which is where you would typically start your threat modeling process. Earlier in my career, I was involved as a security consultant to the smart metering program in the UK, and threat modeling was a really key part of that program because at the end we were going to take millions of metering devices and install them into people's homes. And so the cost and the impact of getting a design decision wrong at that point, because we hadn't fully considered the threats to that system in the round, would've stretched into billions of pounds and a tremendous amount of disruption. So, I think that's a really good example of why you want to do this as a discipline, and the earlier in the process you do it, the better.
Priyanka Raghavan 00:03:22 When we had Adam Shostack on the show for Episode 416, which was recorded by another host, I found one question very interesting. The question was, what type of applications would benefit from threat modeling? And he replied, the ones that use technology, which is maybe tongue in cheek, but what do you have to say about that? What are your thoughts?
Matt Adams 00:03:42 So I'd agree with that, but I think there's also a cost benefit analysis to be done for threat modeling as well. And while you might want to do threat modeling for all of those systems, I don't think you can always justify the benefit that it brings because of the cost of the exercise, and we'll come onto my tools and the efforts that I'm making in that area to make it simpler and cheaper to do threat modeling. But certainly at this point in time, for most organizations, it's cost prohibitive because of the level of security expertise that you need to go and look at every system that involves technology.
Priyanka Raghavan 00:04:17 Just to follow up on that, you said it’s a bit cost prohibitive. So if I’m a small startup and I am running a whole set of tools like SonarQube for doing my static analysis and then CodeQL or Dependabot for my SCA, which is the software composition analysis and then maybe I use one of these open-source tools like OWASP ZAP and then run my APIs or UIs through that, now if I do all of this, will I still benefit from the threat modeling exercise?
Matt Adams 00:04:46 Yeah, so it sounds like you're building up a really good stack of security tools there, but for me you need to ask the question of where to deploy those tools, where they're going to be effective, and to protect which assets. There's also the question of will they be effective against the types of threats that affect your system? So it's perhaps not good to deploy one type of tool when it's not going to be effective against a particular type of attack. And threat modeling really helps to dig into those questions and give you answers. And that's why I really like it as a discipline, because particularly over my career, I think the focus has moved from just being compliant or just getting that certification against ISO 27001 or SOC 2 or whatever it is, to being able to demonstrate the value of security to the organization. And if you can tie the risks that your organization faces, or that your system faces, to the mitigations that you're deploying and explain that in a logical way, I think that goes an incredibly long way.
Priyanka Raghavan 00:05:48 So then is it something that you just do once, like a high-level activity, or do you do it multiple times?
Matt Adams 00:05:54 So for a lot of organizations, again going back to the constraints and sometimes how difficult it is to do threat modeling, it can be a one-off activity that's done at the inception of a system. That isn't the optimal way to do it, but I fully recognize that for many organizations, time and resource constraints drive that frequency of threat modeling, rather than having set criteria governing when a threat model should be triggered. And I think in an ideal world what you would want to do is get closer to the point where your threat model is triggered by even quite small changes to your application or the code base of that application. You might even think about initiating a mini threat modeling exercise for each commit to a code repo, for example. You could even take it to that level rather than the more traditional whiteboard-driven approach that perhaps many of your listeners are familiar with.
Priyanka Raghavan 00:06:53 So the last question I want to ask you in this context that I'm setting up is, we have a lot of apps right now. Everyone wants to create a Chatbot; they just know a little bit of Python and want to create a little Chatbot. So is there a different kind of threat modeling that you would have to do for applications that are built with or use LLMs, or maybe Gen AI? What should we be thinking about? Is it different or would it follow a similar pattern?
Matt Adams 00:07:22 I think the fundamentals remain the same. There are some subtleties and unique aspects, particularly because of the emergent nature of those technologies. And there are some particular risks around prompt injection, model theft, that I think we’re still trying to come up with good answers and good mitigations to. But in terms of that fundamental approach that I outlined at the start of how your system looks overall and what defenses do you need to put in place to protect it, I think it’s still the same.
Priyanka Raghavan 00:07:53 I was reading this thing called the Threat Modeling Manifesto today on OWASP, just preparing for this show. It says that if you're building an application, you should ask yourself four questions. One is, what are you building? What are the risks for what you're building? How would you mitigate those risks? And are you happy with the analysis? But for a lay person, what are the basic elements of a threat model, and how can I go about answering this?
Matt Adams 00:08:20 Yeah, so if we maybe break that down. In terms of the What there, I think there are a few core elements to almost every threat model. So first, what is it that you're trying to protect? What are the assets in the system? That could be the availability of the systems themselves; it could be the data that's contained within them. Then typically you'd look at defining a trust boundary around those assets. So that's the point at which you consider different parts of the system to be trustworthy, and there might be a difference between the public internet and your own private Cloud type environment. And there may even be separate trust boundaries inside your environment, so a particular enclave for storing secrets or other highly sensitive information. I think you then look at the attack surface across each of those elements, bearing in mind those trust boundaries.
Matt Adams 00:09:13 So how could you be attacked? Is there an obvious ingress point into that environment? Or there might even be several ingress points. We might also think about internal attack surfaces as well. And then that brings me onto threat actors within that system, or even external to it, and the capability that they have to attack that system and the techniques that they might use. I think that's another key component. And you probably think about then the types of attacks that they might use; different frameworks, which perhaps we'll come onto, talk about those different types of attacks. And finally, and this is a key part of the threat modeling process, you then also want to think about the mitigations. So it's not just about enumerating all of the threats; to really deliver the value of the process, we need to really consider the mitigations and how we can deliver those as well.
Priyanka Raghavan 00:10:03 So if I were to summarize it, you need to find your assets and your trust boundaries. Then you talked a little bit about the attack surface, and that gives you your threats and then the threat actors, which is important. And finally mitigations. This is a super way to start off on the definitions of the basic elements of a threat model. And if one were to go and Google threat modeling methodologies, there are scores of them out there. There's something called STRIDE, there's something called PASTA, there's DREAD and so on and so forth. But this methodology called STRIDE, which is what your tool is also built on, seems to be very popular. So could you maybe explain that to us?
Matt Adams 00:10:42 Yeah, and I'd go so far as to say that STRIDE is probably the most well-known threat modeling methodology. The key to that, I think, is that it's been around for a long time. First and foremost, it's had time to get the adoption and the groundswell of fans behind it. And at its heart it's just a really simple framework that I think is readily understood by people of all disciplines. You don't need to be a security expert to understand that it's trying to help you enumerate threats, and different types of threats, to your architecture. And I think that's really where its strength lies.
Priyanka Raghavan 00:11:17 Is that something that you can list out for us? It's an acronym; STRIDE stands for different things. Maybe you could tell us what each one is.
Matt Adams 00:11:25 So the S is for Spoofing, the T is for Tampering, the R is for Repudiation, the I is for … I forget now.
Priyanka Raghavan 00:11:35 Information Disclosure.
Matt Adams 00:11:37 Yeah, thank you. Information Disclosure, Denial of Service. And then you're really testing my knowledge and I've forgotten the E as well.
Priyanka Raghavan 00:11:46 Yeah, yeah, Elevation of Privileges.
Matt Adams 00:11:48 Elevation of Privilege, thank you. Yes. And so I think that really helps to then draw out the different types of threats to the system across quite a wide range. There's also a more recent evolution of it, which is called STRIDE-LM, where in addition to those six you also add Lateral Movement, because of the importance of that vector in terms of threat actors being so successful at compromising and moving quickly through a system. So I think that's also a recent addition that people might want to take a look at.
Priyanka Raghavan 00:12:18 Okay. I'll definitely add that to the show notes. One other thing I wanted to ask you: one observation I noticed is that a lot of these methodologies, whether it's STRIDE or DREAD, seem to be coming out of Microsoft. Any insights on why you think this happened?
Matt Adams 00:12:34 I think that's because Microsoft were really pivotal in terms of bringing security into the software development process. This goes back to the late 1990s and early 2000s, when Bill Gates released a famous, or infamous, memo on Trustworthy Computing across Microsoft that really emphasized his belief that if Windows, and in fact if Microsoft, were going to succeed as a company, then people needed to have trust in their systems. And given the resources that they had, I think they had the technical ability, but also the funding and the opportunity, to go away and come up with some key methodologies such as STRIDE and DREAD, which I think were both of theirs.
Priyanka Raghavan 00:13:20 Interesting. I didn't know that, but I'm going to Google that and maybe add it to the show notes. And now I have to ask: what was your motivation or idea for coming up with your open-source repository, STRIDE GPT? How did you come up with this idea of using LLMs for threat modeling?
Matt Adams 00:13:36 Yeah, so this was around the time of GPT-3.5 and ChatGPT coming out; I think we're pretty much at the two-year anniversary of that now. And what I was looking for was a use case in cybersecurity that I thought would be a good one to apply these new tools and techniques to. Looking back over my career as a security consultant and more recently a security architect, one of the processes that I felt could deliver huge value, as we've talked about, and that I felt people, or lots of teams, really struggled with was threat modeling. And so that's really where the two ideas came together, and yeah, STRIDE GPT was born.
Priyanka Raghavan 00:14:21 Great. And your open-source repository of course is so widely popular with threat modeling enthusiasts, including me. One thing is that it's very simple to use, as one of the users, but another thing that I've also noticed talking to other people who use the tool is that a lot of us have started embracing this more than, say, the traditional tools like the Microsoft Threat Modeling Tool, I think that's the name, right? Yeah. Do you see the same thing when you talk with people who use your tool?
Matt Adams 00:14:51 In terms of why it's perhaps more readily adopted, yes. I think it's because I've tried to make it incredibly simple, and again, drawing on that experience of working with development teams and taking them through threat modeling processes and sitting in half-day workshops more than I would care to remember, there was often that challenge around what we mean by threat modeling as security professionals. You start with a blank sheet of paper and you are then trying to articulate what you mean by a particular type of threat, and you go through that process. And when I was making STRIDE GPT, my focus was really on how to make it as easy for a developer or a non-security professional to use as possible. And I think that's why, in comparison to things like the Microsoft Threat Modeling Tool or OWASP Threat Dragon, which is another threat modeling tool, it's just so much easier to adopt, because the barrier to entry is as low as I can possibly make it. And I've used those other tools, I've been with teams that have used them, and there's definitely a learning curve. It's like any software package that you need to use, but for STRIDE GPT, you just need to describe your application, and a developer or engineer should understand the core elements of the application, the core functionality, and that's all you need to do to get it to produce a threat model.
Priyanka Raghavan 00:16:09 So I think that gives us a good segue to go into STRIDE GPT. The first thing I wanted to ask you is what are the typical inputs that one would give to the tool and what is the output?
Matt Adams 00:16:21 Yeah, so that input, as I said, is really just the application description. That's the starting point for the whole threat modeling process through STRIDE GPT. And you can generate that description either manually, so you can just type it in, you could copy and paste it from a Confluence page or from some existing documentation that you've got, or I've also introduced some features that make it even simpler and quicker to produce that description. So you can upload a diagram, an architecture diagram of your system, and then there will be a separate call to the language model to produce a description of that architecture to then feed into the threat modeling process. So you can even do a threat model without writing a single word of a description. Latterly, I've also added a feature where you can provide the URL of a GitHub repo along with a GitHub API key, and the tool will then go off and just read through the contents of that repo and then generate a description based on the contents of that.
Matt Adams 00:17:25 So that's sort of the key input. You can also then define some additional metadata about the app, and that's really just to help the threat modeling process. So for example, what type of app is it? Is it a Cloud native app? Is it serverless? Is it a standard web application? All of those things I think just help to set the context for the threat modeling process. And then once you've done that and you've provided that description, the outputs are either a basic list of STRIDE threats, so typically maybe two or three for each category within the STRIDE framework. But you can also then go into different tabs. And these aren't strictly part of the STRIDE methodology, but they are part of a broader threat modeling methodology for many teams. And so there are things like attack trees, which present a hierarchical view of different types of attacks.
Matt Adams 00:18:17 Those can be easily generated. I also provide the ability to generate a DREAD assessment, so that takes the threats that have been generated during the first phase and will risk score them and prioritize them based on that risk score. You can of course generate the mitigations; as I mentioned, that's absolutely key to the process. And lastly, you can also generate and output some test cases, which I think is a really interesting part because that then starts to close the loop on the threat modeling process, where not only have you generated your threats and your controls, but you also have a list of tests that you can execute, potentially continuously, to verify that those mitigations are in place.
Priyanka Raghavan 00:19:00 Okay, interesting. So the tests which are generated, are they based on a language or is it more like a test specification?
Matt Adams 00:19:07 So they're written in a syntax called Gherkin. I imagine some of your listeners that are familiar with writing test cases will be familiar with Gherkin, but it uses a structured set of verbs to articulate the test or the scenario that you want to test. And it's just a really nice way of articulating the types of things that you would want to test for. You then need to turn that into a piece of code to go and verify that test. Ideally a piece of code; it could be something else, but ultimately you're probably looking for a way to automate and execute that consistently.
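For readers unfamiliar with the syntax, a generated test case might look something like the following. This is a purely illustrative Gherkin scenario (not actual STRIDE GPT output) for a hypothetical session-expiry mitigation:

```gherkin
Feature: Session management for the customer portal

  Scenario: Expired session tokens are rejected
    Given a user who authenticated more than 30 minutes ago
    And the session token has passed its expiry time
    When the user requests the account settings page
    Then the request is rejected with an authentication error
    And the user is redirected to the login page
```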
Priyanka Raghavan 00:19:43 This is great. So you give it an input and then you get the threats as outputs, as well as the mitigations, and like you said, you also have some test cases available as tests. This is where I wanted to ask you about maybe a specific example, with the teams you've worked with or interacted with that use the tool, where it solved a particular security problem. What I wanted to ask was, say you were building a web application and you had a user interface that talks to a backend-for-frontend that talks to an API, and you want to protect the user interface from the user, so you're putting in a firewall. If you describe the application well, will it give you mitigations like, I can suggest to you that you have a good firewall of this particular type, and therefore that reduces your vulnerability? Would the mitigations be very specific?
Matt Adams 00:20:37 So we can maybe talk about how you can improve the overall performance of the tool, but in general, what the tool is doing is leveraging the base training data from the foundation model that you are using when you are calling the LLM. STRIDE GPT supports a number of leading foundation models: the GPT model series, it will also support Gemini models, and some from Mistral as well. There's even the capability to integrate it with some local models if you're running a tool like Ollama. But really what we're leveraging is the general security knowledge that's trained into those models. So if you want to go really specific, you're going to need to provide some additional context. But even with the fundamental knowledge that's in those models, I've seen teams use it to generate novel, valid threats that hadn't appeared in previous threat models that they'd produced for a system, and they've definitely seen value from that.
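As a hedged illustration of the local-model option mentioned above (this is not STRIDE GPT's own code), Ollama exposes an OpenAI-compatible endpoint, so an OpenAI-style client can be pointed at it and no application description has to leave your machine. The model name below is simply whichever model you have pulled locally:

```python
# Illustrative sketch: calling a locally hosted model through Ollama's
# OpenAI-compatible API. Assumes Ollama is running on its default port
# and the named model has been pulled; the model name is an example.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # any non-empty string; Ollama does not check it
)

response = client.chat.completions.create(
    model="llama3.1",  # example local model
    messages=[
        {"role": "user",
         "content": "List STRIDE threats for a serverless image-resizing API."},
    ],
)
print(response.choices[0].message.content)
```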
Priyanka Raghavan 00:21:39 Would that be by giving a better input?
Matt Adams 00:21:42 No. Even just from running it with the standard description, just describing the application and running the tool. I think people often focus on the inconsistencies of working with large language models, but actually there are lots of inconsistencies in people; sometimes they just have a bad day and they miss something during the threat modeling process. And we found a couple of examples of that with various users of the tool, where they've then gone back and looked at it and gone, yes, of course that is absolutely a valid threat and that's one that we should definitely pick up next time. And that was done even with, yeah, a relatively straightforward description of the application.
Priyanka Raghavan 00:22:21 So I just wanted to talk a little bit about another example, just to kind of bring it to light. I was using the tool on one of my hobby projects, and I was trying to feed that to the STRIDE model. What I have is a web application, and the data is what I would call almost internal data: it's not public, but it's something that I only want my services to know about. The authentication that I used for this was SSO authentication, and because I'd given it the SSO authentication, the tool suggested as a mitigation that I should probably slap MFA on with it, which is good, but knowing my application, I think it was maybe a bit excessive. It's good enough just having the SSO. So the question I'm asking you is, sometimes the tool produces output, but you still need to use your better judgment when using the output, right?
Matt Adams 00:23:24 I think that’s a really good example of where language models don’t necessarily grasp the risk appetite or organizational policy and standards that would influence the controls that you would ordinarily implement. If you can provide that context, then you can address that challenge, but out of the box it won’t understand the broader sort of organizational concerns that you will have or necessarily the need to balance how you would effectively use your security resource. And perhaps having MFA everywhere isn’t something that you necessarily want to consider.
Priyanka Raghavan 00:23:59 So here I wanted to ask you about something you touched on: when we talk about Gen AI, and actually large language models, one of the problems that people often complain about is hallucinations. Are there any tips that you'd suggest for reducing this? Because that can obviously give bad output, and sometimes even when you're giving the same inputs, you might not necessarily get the same output back. Any tips for that?
Matt Adams 00:24:33 As a general rule for working with large language models, the more you put in, the more you'll get out. And where you might particularly see hallucinations is where you use acronyms or other jargon that's very specific to your company and that won't be in the training data. The model can just get a bit confused by what it is that you're asking it to assess. So, you might talk about, for example, a vault service. Now it might interpret that as HashiCorp's Vault, or it might interpret it as a CyberArk vault, which have got slightly different characteristics, or it might interpret it as a physical vault, and it's really the lack of specificity that's tripping that up. Whereas if you can just provide that extra bit of detail around it, say, okay, we're retrieving secrets for machine-to-machine communication and authentication, and you give it the specific name of the service that's doing that, then that goes a long way to addressing that challenge of hallucination.
Priyanka Raghavan 00:25:38 What I'm hearing is that to overcome that hallucination problem, make sure that your inputs are as specific as possible, avoid acronyms, and maybe give a specific example of the tool you are using. For example, it could be a service like HashiCorp Vault or an S3 bucket, et cetera, or even a PostgreSQL database. So the next question I wanted to ask you is, the application is built using Streamlit. Why did you use this to build the application?
Matt Adams 00:26:12 My focus, as I mentioned when thinking about the concept for STRIDE GPT, was to look at how I could apply LLMs to solve problems in cybersecurity. And for me the interaction of the LLM with the use case is more important than the UI, and I think Streamlit has a couple of things that make it really easy to work with in terms of quickly building a UI. I generally work in Python as well, so that's another positive because it's a Python framework, and really it just makes it super simple to get going and to deliver something that's well presented and, I think, pretty readily understandable by users as well. I think it's also got a really nice feature in Streamlit Cloud, where you can host tools for free providing that you don't exceed the container thresholds for memory and compute, which I think are pretty generous and are ideal for POCs or demo tools just like STRIDE GPT.
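To give a flavour of why Streamlit keeps the barrier to entry low, here is a minimal sketch, not STRIDE GPT's actual source, of a Streamlit page that takes an application description and asks an LLM for STRIDE threats. The model name and prompt wording are illustrative assumptions:

```python
# Minimal sketch of a Streamlit UI wrapping a single LLM call.
# Assumes the `streamlit` and `openai` packages are installed and the
# user supplies their own API key at runtime (nothing is persisted).
import streamlit as st
from openai import OpenAI

st.title("Toy threat model generator")

api_key = st.text_input("OpenAI API key", type="password")
app_description = st.text_area("Describe your application")

if st.button("Generate threats") and api_key and app_description:
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works
        messages=[
            {"role": "system",
             "content": "You are a security expert. List STRIDE threats "
                        "for the application described by the user."},
            {"role": "user", "content": app_description},
        ],
    )
    st.markdown(response.choices[0].message.content)
```

Run with `streamlit run app.py`; the whole UI is a few widget calls, which is the point Matt makes about getting something presentable quickly.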
Priyanka Raghavan 00:27:16 Yeah, I was actually going to ask you that. So typically, where do you deploy this? You do it on the Streamlit Cloud, but if someone was to use your tool, is there a danger in deploying it onto Streamlit Cloud if they were to use it for their organization?
Matt Adams 00:27:30 So I wouldn't recommend using Streamlit Cloud for production, definitely not, but just by cloning the repo you can run it from a local machine, and I think most people use it on a local machine. I'm aware of some companies that have used the Docker container image that I provide as well. So that typically is a really good way to get it onto a Docker environment or Kubernetes cluster inside your enterprise, and then I think that gives you obviously much more confidence about the data that's going to the tool and what might be going on within the infrastructure to log the inputs. I do want to say that one of the reasons why I use Streamlit is that it doesn't retain any data. You provide your API key, because I can't pay for everybody's inference bills for OpenAI or Mistral or whichever service you're using, but if you provide your API key, it gets stored in the session for that browser session. If you refresh the page, it's then removed from the session, and there's no logging of the keys or anything else, or the information that you provide, on the backend. And that's another reason that I really like the Streamlit platform, because as a security professional, I don't want to be holding other people's API keys or system descriptions. I just want to give them a proof-of-concept tool that they can experiment with and see the power of applying these large language models to this particularly interesting problem in cybersecurity.
Priyanka Raghavan 00:29:03 And another thing which I liked about Streamlit is also the fact that, since it's built on web sockets, I find the speed of the UI very fast. The next thing I wanted to ask you, which is something that teams ask, is where do you generally store the threat models? Right now the output of these threat models, if you're using the tool, is stored as a markdown (MD) file. Is that something that you would store closer to your code repository, or is that not a good idea?
Matt Adams 00:29:33 I'd say whatever really works for your team and the way that you work. The reason why the version that's on Streamlit produces markdown files is because I think that's just an easy format of output to work with, and you can take that, and there are markdown-to-PDF converters that are readily available. Equally, if you want to take it into some other format, it's very easy to strip out the markdown syntax and work with that. But I'm aware of use cases where people want to take the threats and generate them in a very specific structured format, so that might be semi-structured in terms of CSV or perhaps even JSON. And they would then take that to import into something like a Jira ticketing system, because what they ultimately want is for each threat to be represented as a ticket that they could then assign a corresponding mitigation ticket to. So that's one approach. Some teams also just want really detailed Confluence pages and don't take the ticketing approach, but they want really thorough documentation. So I think really whatever works for you and your team as a workflow, and the important thing is just to get threat modeling done.
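As a rough sketch of the structured-output workflow described above (the field names, example threats, and file formats here are purely illustrative; STRIDE GPT itself emits markdown, so in practice you would parse that first or ask the model for structured output):

```python
# Hypothetical sketch: turning a list of generated threats into JSON and
# CSV files that a ticketing system such as Jira could import.
import csv
import json

threats = [
    {"category": "Spoofing",
     "threat": "An attacker replays a stolen session token against the API",
     "mitigation": "Bind sessions to client context and expire tokens quickly"},
    {"category": "Tampering",
     "threat": "Request payloads are modified in transit between services",
     "mitigation": "Enforce mutual TLS between internal services"},
]

# JSON for programmatic import via an API
with open("threat_model.json", "w") as f:
    json.dump(threats, f, indent=2)

# CSV for a bulk "import issues from CSV" style workflow
with open("threat_model.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["category", "threat", "mitigation"])
    writer.writeheader()
    writer.writerows(threats)
```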
Priyanka Raghavan 00:30:43 I guess the question I was asking was, there are some people who also worry that if somebody gets their hands on a threat model, they can attack the system because they know what the threats are. So should you be taking more steps to RBAC-control the access to that threat model, I guess? That's where I was coming from.
Matt Adams 00:31:03 Right. Okay. Yeah, so I think threat modeling information is certainly some of the more sensitive information that you'll have in an organization. I'm always wary, though, of that security-by-obscurity trap, and I want to say that your system should be designed to defend itself whether or not the threat model is known by an attacker. So yes, just for general in-house management, you would probably want RBAC controls around it to keep it to your threat modeling teams and your developers, but ultimately your system should be robust enough to survive that information being exposed.
Priyanka Raghavan 00:31:41 While you were answering this, I wanted to ask, where would you use the output of the threat model? I think one of the cases is the generation of test cases. Is there anywhere else that you or teams use the threat models? Where do they use the output, apart from of course writing the code based on the mitigations?
Matt Adams 00:32:00 Yeah, I think I've sort of touched on this already, so perhaps in a more mature threat modeling process, you would want that record to go into some sort of ticketing system so that you can identify when you've applied the mitigations and even when you've written the test cases to verify that those mitigations are effective and continue to be effective. I think that's really the key output.
Priyanka Raghavan 00:32:24 You have a lot of other open-source projects which are all built in Python. For example, I have seen this tool called AttackGen, which is an incident response tool. Can you maybe talk a little bit about that?
Matt Adams 00:32:36 So AttackGen really came to me as another way of helping security teams address a problem that I'd seen throughout my career. Listeners that work in financial services in the UK, which is a relatively small community within the global community, will be aware of a program that was driven by the Bank of England around threat-driven red teaming. They would get very specific threat intelligence that related to the UK financial sector, and they would then instruct a consultancy to go and test organizations against those specific threats. And those were really valuable exercises, but because of the time taken to research and plan them, they could only be performed for a very small percentage of the total financial market and sector. But the approach really struck me as a good one, because you were actually testing against, or simulating, a realistic threat to your organization.
Matt Adams 00:33:39 And so when, again with ChatGPT or GPT-3.5, you suddenly get the ability to generate text at an amazing volume, I started to think, well, maybe we could generate incident response testing scenarios using these tools. And that's really what AttackGen is. So you will give it either a particular threat actor group from the MITRE ATT&CK framework, or if you have your own threat intel saying that there are particular techniques and tactics that you want to test, you can then build your own custom scenario effectively just by selecting those techniques. And then the tool will, within 30 seconds, build you a comprehensive incident response testing scenario that covers all of those techniques that you've specified, and it will allow you to then tweak that as well to your specific requirements.
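As a hedged sketch of the general idea (not AttackGen's actual implementation), the workflow amounts to turning a handful of selected ATT&CK techniques into a prompt for a language model. The technique IDs below are real ATT&CK IDs, but the prompt wording, organization, and model choice are illustrative:

```python
# Illustrative sketch: build an incident response exercise prompt from a
# few MITRE ATT&CK technique IDs and ask an LLM to draft the scenario.
from openai import OpenAI

selected_techniques = {
    "T1566": "Phishing",
    "T1078": "Valid Accounts",
    "T1486": "Data Encrypted for Impact",
}

technique_list = "\n".join(
    f"- {tid}: {name}" for tid, name in selected_techniques.items()
)

prompt = (
    "Write an incident response tabletop scenario for a mid-sized retailer. "
    "The attacker uses the following MITRE ATT&CK techniques, in order:\n"
    f"{technique_list}\n"
    "Include injects for each phase and discussion questions for the response team."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
scenario = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(scenario.choices[0].message.content)
```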
Priyanka Raghavan 00:34:33 Impressive. One of the things I did notice in a lot of your projects is that one of the motivations for the open-source LLM security tools seems to be reducing the time taken to do these activities, right? Do you have any numbers, for example? I can speak for myself: I'm quite passionate about threat modeling and I've been doing it for quite some time, using, as you said, a lot of the tools, but the problem was that even for a simple project, by the time I could generate the threats, have a meeting with the team, and then discuss the mitigations, the whole thing would take me about a week. And now I feel that productivity has really increased too.
Priyanka Raghavan 00:35:18 I'm generating the threat model in about five minutes, maximum maybe five to 10 minutes. And then once I have all the inputs, it's just a question of having that one meeting to go over the mitigations. So for me, in terms of productivity, we are also not dependent on a security professional. You can tell the teams, please go ahead and generate this, and more threat modeling gets done. But what would you say, in your experience, are the kinds of numbers you're seeing in reduction of effort?
Matt Adams 00:35:48 I'd say that's pretty representative of what I've seen. For a reasonable-quality threat model of most enterprise systems and applications, it's obviously hugely dependent upon the input scope, but let's say that might take you 20-25 hours to produce and work on. To produce an adequate threat model using something like STRIDE GPT, conservatively you could probably cut that time by 75%, even upwards of that. And that's still allowing for diligently verifying the outputs, the risk ratings, the threats that it generates, so still giving yourself a good maybe half a day to go and do that. But that's a huge win for what are expensive and scarce resources within an organization, if we think of security architects.
Priyanka Raghavan 00:36:43 And is it a similar thing for the incident response tool? Are you seeing similar kinds of numbers?
Matt Adams 00:36:49 That's probably even higher, because threat modeling, as people that have practiced it will know, is a relatively complex task; some of it is intuition, some is experience, and some is just down to the particular style of the architect. It's quite an opinionated practice. Compare that to just generating a scenario based on a list of techniques from the MITRE ATT&CK framework: the input to that is maybe half a dozen to a dozen different techniques from various stages of the kill chain, which again you're probably getting from your threat intel provider if you're paying for a service like that. Once you've fed that into the tool, it will generate the scenario within a minute, even if you're using the most advanced model that I support at the moment, which is the o1-preview model from OpenAI.
Matt Adams 00:37:44 It's quite interesting to see the level of difference in the outputs generated by that model and the additional thought, or the additional reflection, that goes into generating the scenario, because it will also then talk about much broader considerations around the scenario. So not just how you are going to be attacked, but if you are running it as an exercise, which stakeholders you should ensure are in the room, how you should plan your time, those sorts of things, whereas when you are using a smaller, simpler model, it just focuses on generating a realistic scenario based on the tactics. So yeah, I'd say that's probably more like a 90-95% improvement in productivity for that one.
Priyanka Raghavan 00:38:25 This is actually fantastic. Matt, I really feel you're doing a lot for the community, and that's why I wanted to ask you this next question. I've been reading a blog by a person called Ross Haleliuk. I follow him on Twitter and on LinkedIn, and I came across one of his blog posts, which I'm going to put in the show notes, called Lifting the World Out of Cybersecurity Poverty. And he has a quote in his blog, and I'm just going to read that because I don't want to paraphrase it. What it says is: there are about 185,000 small businesses in the United States, which constitute about 99.9% of American businesses, and small businesses pay 39.4% of private sector payroll and generate 32.6% of known export value. Despite all the importance of the small business market, I would estimate that over 90% of the cybersecurity startups are built as enterprise-first solutions. So was your quest for doing a lot of these open-source LLM security tools a way to alleviate cybersecurity poverty and help these SMBs, or small business units?
Matt Adams 00:39:42 I think that concept of cybersecurity poverty is an interesting one, and it certainly applies, probably more so with AttackGen. Having been a consultant and been involved in incident response testing scenarios and the planning for them, I was aware of how costly they are, and also, being on the client side and having to do them as well, knowing that you may only get maybe one or two opportunities a year to do that kind of exercise, given the stakeholders that you want in the room and the amount of preparation that it takes. And then suddenly you have this ability, or it struck me that you would have the ability, to even go to the point of picking the parts of a kill chain that you were concerned about and then testing those, maybe just with your security analysts, running through a very quick scenario. That then became possible using that kind of tool.
Matt Adams 00:40:37 So I think there's a huge role for open source to play in helping to close that poverty gap, and if tools like STRIDE GPT and AttackGen and others that I've released help to do that, then I'm really happy about that. I think, though, that what I'm primarily trying to do with those tools is address those people out there at the moment, and I think they're getting fewer in number as the months go by, who look at large language models, and Generative AI more broadly, and say, there's no value here, I don't see the use case. What I'm trying to show with these tools is that there absolutely is value, and that these are the worst versions of these models that we're ever going to have to work with. And I think, looking back over nearly 20 years, there are so many things in security that I know security professionals want to fix and do better, but we just don't have the time or the resources to do it. And what has emerged over the last couple of years strikes me as an amazing opportunity to go and resolve a large number of those issues that have been dogging us over the last 20-30 plus years.
Priyanka Raghavan 00:41:49 Absolutely. Couldn't have said it better. It's fantastic. I was going to ask you two questions to kind of wrap it up. The first one was, what are the other open-source security projects you follow that we should look at as software engineers or developers or architects or teams? Is there anything that you'd list out that I can add to the show notes?
Matt Adams 00:42:14 More generally, the OWASP projects are amazing projects. There's a huge amount of value there, and I know a number of the project leads who pour an amazing amount of time and dedication into helping those projects succeed. So I'm sure your listeners are aware of many of those, but it's definitely worth a look at the OWASP AI security project that's out there, the Top 10 for Large Language Model Applications, and the newer governance frameworks that have come out for large language models as well. I think more specifically, one of the projects that I really like is a tool called garak, which I'm sure we'll leave a link to in the show notes. That is an open-source LLM vulnerability scanner, and because threats like prompt injection are so new and so novel, I think it's really encouraging to see an open-source project out there addressing some of those absolutely leading-edge challenges that we have as security professionals in securing those systems.
Priyanka Raghavan 00:43:16 I'll make sure to add that to the show notes. I think I found the OWASP AI security page. Fantastic. I've just been reading it the whole day. Yeah, it's useful for anyone who's interested in how to secure systems built using Gen AI or large language models or ML. The last question I wanted to ask you: there's this concept of using AI to fight AI, like good AI versus bad AI, which sounds a bit dramatic. What are your thoughts on that?
Matt Adams 00:43:48 I think we probably will get to that point, if we're not at that point already. You only have to look at the more traditional predictive AI that's been baked into cybersecurity tools for a number of years now. What I think we have seen on the adversarial side is that, just as mediocre developers such as myself have been improved greatly by having access to large language models, so it seems that lesser-skilled threat actors have been able to up their game using large language models, in terms of improving the robustness of their code and the degree of automation that they're able to achieve by stringing different attack types together. What I don't think we've seen yet, though, is nation states using some of their existing exploit data, which I think we can safely assume that they have, to train a model that's specifically designed to leverage very sophisticated malware or cybersecurity attacks against organizations or other nations at scale.
Matt Adams 00:45:00 And I'm thinking there about something like the EternalBlue exploit that was leaked from the NSA. If you take exploits like that and you train models on them, that I think will be a significant day if and when it occurs. And our best bet as defenders is to look at how we can leverage AI for the more positive use cases: covering as many systems as possible, making sure that we've done everything we can to go through the threat modeling process, understanding where we might have issues, and using AI to scale our security processes to make our defenses as robust as they can be.
Priyanka Raghavan 00:45:39 That was really great. Thank you again for coming on the show. I think I'm done with my questions. Where should people find you in cyberspace if they wanted to get in touch with you?
Matt Adams 00:45:51 So if you find me on LinkedIn, my profile is Matthew RW Adams, or you can find me on GitHub. I’m mrwadams on GitHub and you’ll find the links to my repos there. And again, I’m sure we’ll drop those into the show notes as well.
Priyanka Raghavan 00:46:07 Yes, I'll definitely do that. Thanks for listening. This is Priyanka Raghavan for Software Engineering Radio.
[End of Audio]