
SE Radio 680: Luke Hinds on Privacy and Security of AI Coding Assistants

Luke Hinds, CTO of Stacklok and creator of Sigstore, speaks with SE Radio’s Brijesh Ammanath about the privacy and security concerns of using AI coding agents. They discuss how the increased use of AI coding assistants has improved programmer productivity but has also introduced certain key risks. In the area of secrets management, for example, there is the risk of secrets being passed to LLMs. Coding assistants can also introduce dependency-management risks that can be exploited by malicious actors. Luke recommends several tools and behaviors that programmers can adopt to ensure that secrets do not get leaked.





Transcript

Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Brijesh Ammanath 00:00:18 Welcome to Software Engineering Radio. I'm your host, Brijesh Ammanath. Today we'll be discussing privacy and security of AI coding assistants with our guest Luke Hinds. Luke is the CTO of Stacklok. He created Sigstore while a distinguished engineer at Red Hat. He's a security thought leader, engineer, and public speaker who loves building open-source software and communities, as well as leading talented engineering teams to develop innovative, cutting-edge security technologies at scale. Luke, is there anything I missed that you would like to add to your introduction?

Luke Hinds 00:00:49 No, you pretty much caught it all there. Yeah, no, very succinct and on point. Thanks.

Brijesh Ammanath 00:00:54 Luke, if you can start off by explaining the increased role of AI coding agents in improving programmer productivity.

Luke Hinds 00:01:01 Certainly. So this really harks back to 2017 when the paper came out of Google around the Transformer architecture. The paper was called Attention Is All You Need, speaking about the attention mechanism. Prior to then we'd had AI and machine learning around for quite some time, but we hadn't really seen any killer applications of it beyond statistical analysis, and people would use it as a business tool essentially. And so we started to see these sort of general-purpose models come out. And then quite early on there was of course GitHub Copilot. And this is where it was quite astonishing really, even though the quality was not quite there, the ability for these machines to generate code. What was noticed as a very clear emerging application of these models was how good they were at generating code. And initially we started off with a system called "fill in the middle".

Luke Hinds 00:01:59 And "fill in the middle" is auto-completion of code. And what happens is the code is generated based on the prepend and the append, so the content that surrounds the prompt, or effectively where the cursor is situated within an IDE. So the prompt would include the code that's above, say 10 lines, and the 10 lines below the cursor, and then the model would have to predict what is in the middle. So it was called "fill in the middle"; FIM is the term that they use. So I think this was one of the first times that we started to see AI really start to get some traction with software engineers. And you got this magical auto-complete functionality that came around. And that was really where there would've definitely been a marked increase in productivity for people that were using that, because as a software engineer there would be a lot of tasks that you would perform where you wouldn't really need to push your brain to its limits.

Luke Hinds 00:03:02 There'll be things such as hashing out structs and setting up functions and just sort of general housekeeping-type tasks. And AI proved to be very good at doing that. I remember when I started to use Copilot myself, I was quite surprised at how it would understand precisely the response format that I would need to set out for a struct, or it would somehow magically understand what I needed for a JSON structure. And this was where we really started to see, I think, a productivity increase. In a lot of ways it was a developer tool in much the same way as when shells started to become more personal to developers, and you started to get that sort of rich experience within a terminal, and people made them very much their own thing because it improved their productivity essentially. And that was the start of the productivity gain that has happened within software development. I think there are a lot more future applications of AI that will increase productivity and will help a lot to reduce churn. It will take away the grind of a lot of stuff that engineers have had to do, live with, and accept. I think we'll start to see AI factor those away over time as well.
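As an aside, here is a minimal sketch of how a fill-in-the-middle prompt is typically assembled. The sentinel tokens below follow a StarCoder-style convention and are an assumption; the exact tokens vary by model.

    # Minimal sketch of a fill-in-the-middle (FIM) prompt.
    # <fim_prefix>/<fim_suffix>/<fim_middle> are assumed sentinel tokens;
    # real models each define their own.

    def build_fim_prompt(lines_above: list[str], lines_below: list[str]) -> str:
        prefix = "\n".join(lines_above)   # e.g. the 10 lines above the cursor
        suffix = "\n".join(lines_below)   # e.g. the 10 lines below the cursor
        # The model is asked to predict the code that belongs in the middle.
        return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

    prompt = build_fim_prompt(
        ["def add(a, b):"],        # code before the cursor
        ["print(add(1, 2))"],      # code after the cursor
    )
    # `prompt` is what an assistant would send to the model, which then
    # completes the missing middle, here the function body.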

Brijesh Ammanath 00:04:14 Alright, we covered GitHub Copilot in a previous episode, Episode 533; I'll make sure we link to that. Have any studies been done to quantify the improved productivity from using AI coding assistants?

Luke Hinds 00:04:27 Yes, there are. Now people have different views as to how accurate they are, but one of the ones that I heard was it was related to GitHub and it was around the region of 45 to 50%. I cannot remember the exact figure now, but there’s quite a substantial claim that productivity had been significantly raised through the use of AI.

Brijesh Ammanath 00:04:49 That’s quite a large number.

Luke Hinds 00:04:50 Certainly is, certainly is. And we have noticed a marked ability to move more quickly with what we build, thanks to AI. So I'm part of a startup and we use AI very effectively for prototyping. We wouldn't really trust it in a large code base; it just doesn't handle large code bases so well. But when you very quickly want to prototype and build something, that's where AI is exceptionally strong. To move from zero to something, you can act very quickly. And that's very useful in a fast-moving environment such as a startup where you need to quickly validate ideas, you need to connect up and integrate different systems. And AI can be great there because you just want to see something wiggle, you want to see something working, you want to prove the hypothesis to yourself. And that's where I've really embraced AI myself: to do a lot of tasks where I need to move fast, I need to be reactive, I want to try something out, I want to validate something. That is where AI has really come into its own for myself and some of the engineers that I have in my company working on producing software.

Brijesh Ammanath 00:05:58 And what are the key risks introduced due to the reliance on AI coding assistants?

Luke Hinds 00:06:03 So there are many; some of them are old risks in a new world. We looked at coding assistants and did some studies on their security. What are the risks? And I think one of the clear ones that became immediately apparent was data exfiltration, even if unwittingly: the leakage of secrets, tokens, passwords, personal information. Essentially, coding assistants rely on context. Context is really important. Context is essentially snippets of code. They need to understand the entire syntax tree within the code, or at least a section of it, so that they can make changes without breaking functions or methods that belong to a particular class that they're working within. So they like to grab entire files. That's one of the things you'll see quite often when you're working with a coding assistant such as Cursor or Copilot. They will take a whole file via the IDE, they'll load it into a prompt, and then that prompt will be sent to an endpoint where it'll be canonicalized and serialized and put into a format that the LLM can use to make a prediction around the task at hand.

Luke Hinds 00:07:22 It'll start to generate code. And quite often these files can contain cryptographic keys, they could be .env files where people have database passwords, perhaps for a production site, or they could be API tokens. Quite often, as I spoke about earlier, being able to prototype very quickly is one of the useful things about AI. We've all done it as software engineers: you are writing a single script, you want to see an API work, you hard-code the token in because you're not going to push it to GitHub, you're certainly not going to deploy it, but you just want to see the thing work. So you take a token and you hard-code it, and those effectively are captured by the IDE and they're all sent out. And these keys, well, you could argue the receiving party could be trusted; it could be a large corporation, Microsoft Azure or Anthropic or wherever the cloud service is running.

Luke Hinds 00:08:18 They may well be entities that you should never leak keys to, but they're not likely to exploit you by receiving those keys. But I think the other concern is that quite often the data that is received is used to train models. So hypothetically it's not too far-fetched to have a scenario where a large language model may spit out a token that it's learnt during its training phase. It could spit out a password, it could spit out a bit of personal information. You may have a spreadsheet that you by accident put within a folder that contains code, and that could be picked up by the IDE, and it could have payslip information, social security numbers, medical information. And this data can then be used for reinforcement learning; it can be used to train the model, because data is key. Effectively, data is the key value that's derived from a lot of these services.
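As a concrete, hypothetical illustration of the scenario described here: a throwaway prototype script like the sketch below, sitting in the project folder, can be swept into the assistant's context wholesale, hard-coded token and all. The URL and token are made up.

    # quick_test.py -- a throwaway prototype of the kind described above.
    # The token is a placeholder; the point is that the whole file, token
    # included, can end up inside an LLM prompt when the IDE grabs the file.
    import requests

    API_TOKEN = "sk-EXAMPLE-not-a-real-token"   # hard-coded "just to see it work"

    resp = requests.get(
        "https://api.example.com/v1/items",      # hypothetical endpoint
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    print(resp.status_code)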

Luke Hinds 00:09:11 So those are the key risks for developers, I think: it's very easy to leak information. For the actual models themselves, I think some of the other risks are that quite often they have distant training dates. So a model is tuned, and its knowledge is recent at the point where that training happens. Afterwards the knowledge becomes stale, effectively. So it is a snapshot at a certain point in time. And because these models are very expensive to fully train, it's not something that happens every week or even every month. Sometimes months will go by, a model will be in circulation, and then the next new frontier model will come out. And during that time security risks can happen. Packages can become malicious, maintainer accounts can be taken over, vulnerabilities can be discovered, critical vulnerabilities, and the large language model will have no knowledge of that because of its training cutoff date.

Luke Hinds 00:10:13 So there is a chance that a model could recommend dangerous content, effectively. It could introduce security risks into your code because it doesn't have the current knowledge that something is a risk. I guess one way of highlighting this: if we remember Log4j, this was way before the current batch of large language models and the transformer architecture were in public use, but imagine a model being trained, say, a week before Log4j, and then the next new model not coming out until four or five months later. It could quite easily be creating people's dependency files when they want to create a Java project, and in there is Log4j, effectively. So yeah, I'd say stale knowledge cutoffs are a key risk. One of the things that we looked to do in Stacklok was to use an embeddings model and a retrieval-augmented generation system to bring in the latest threat intelligence around what's dangerous, malicious, archived, deprecated, and so forth.

Luke Hinds 00:11:11 So there are ways that you can get fresher, more recent information in there. The other, as I said, is leaking. That's something else that we've also looked to address on the tail end of our research. And in general, just hallucinations. Models will come out with functions that do not exist. A model will be sketching out some code that uses an upstream library, a dependency, and it will just make up names. It will come out with function names that don't exist. And there is a chance that if that's happening at a high frequency, then somebody could attempt to squat on these hallucinated names and weaponize them. So those are some of the key risks. And I mean there are other things such as prompt injections and insecure model formats. Models use PyTorch, and you had Pickle, which is not used so much now but was susceptible to various attacks, and they started to use something called safetensors, which is a lot safer around the execution of code. But I would say those are the key aspects, the key security concerns.

Brijesh Ammanath 00:12:14 Right. Quite a few key points over there, and we'll double-click on two of them specifically in this session: one is secrets management and the other is dependency management. It's a good segue into the next theme of the podcast, which is around secrets management. You've elaborated on what kind of secrets could be leaked, but what risk does the passage of secrets to LLMs pose? Is it primarily that once it's used in training it could be exposed? Or are there other risks because of LLMs having those secrets within their context?

Luke Hinds 00:12:50 I guess it would be who receives the secret. So there are some organizations, corporations, that we're going to feel a level of trust towards, because they're a large business that has a reputation. Why would they want to use an individual's key to exploit a system? It's not very likely to happen. So I would say with these large vendors that have these frontier models, the risk is much more likely going to be that the large language model regurgitates a secret or a key that it's discovered. Now, if you were to use a provider who is less well established, does not have a brand that they're going to be very protective of, does not have a core business that's financially lucrative enough that they don't need to resort to nefarious, malicious actions, then that could be a risk. I mean, this was one of the things that people, rightly or wrongly, inferred about DeepSeek.

Luke Hinds 00:13:47 When they put up an API, I had absolutely no opinion myself there, but people were effectively saying your secrets are being leaked to a company which operates outside of the jurisdiction where most software engineers and companies tend to operate. So I'm actually not insinuating anything at all there myself, and the DeepSeek people seem like very bright, smart people. But there was this element of: if the inference points are much wider and there are many more providers, then obviously your secrets are going to be leaked to many more APIs, many more cloud services; they're going to be stored in many more logs and monitoring systems and so forth. And so somebody with bad intentions could possibly exfiltrate those secrets and use them.

Brijesh Ammanath 00:14:33 And how easy is it for malicious actors to extract secrets or secret data from LLMs? Is it through prompt engineering or are there other methods they use?

Luke Hinds 00:14:46 It would be through prompt engineering. I would say it's not easy. It would probably require a prompt injection attack to get the large language model to break out of its safeguards. So large language models are trained so that they have a set of guardrails: certain dialogue that they will not enter into, which covers physical harm or hate and so forth. And security is another one of them. If you ask a lot of large language models, tell me how to write a Bitcoin miner, or tell me how to write a script that exploits somebody's computer, it will refuse. It has its guardrails, but they can be broken out of. There are sequences of words you can use which will cause them to drop all of that context, that protection. And that is when you could likely get them to reveal some of what they've ingested. I must admit it's not something I've achieved or done myself, but prompt injections are not as difficult as you would imagine. Recently, when Gemini came out, I found one myself that allowed me to break the model completely out of its sort of safe controls. And that is the concern, effectively: people break the system out of its guardrails and then it could start to reveal some of that information.

Brijesh Ammanath 00:16:02 Have there been known cases where secrets have been leaked due to AI generated code? Any examples come to mind?

Luke Hinds 00:16:09 Have been leaked from the large language model or from a user’s environment?

Brijesh Ammanath 00:16:14 Due to use of AI generated code?

Luke Hinds 00:16:16 I don't know of a large language model in the wild generating code that uses a secret like a cryptographic token. I imagine there's probably quite a few passwords out there, but it's not something I've seen myself.

Brijesh Ammanath 00:16:31 , so nothing like the Log4J example, which was before LLMs and before AI generated code. Nothing major has hit the news?

Luke Hinds 00:16:39 Not as yet. But I have a feeling something large will arrive. I don't know of any particular attack that's about to manifest, but I'm pretty sure something will be with us quite soon.

Brijesh Ammanath 00:16:51 Hopefully not. Fingers crossed.

Luke Hinds 00:16:53 Yes. Yeah.

Brijesh Ammanath 00:16:54 Are there best practices developers can follow when using AI coding assistants, beyond just secret scanning and package validation?

Luke Hinds 00:17:03 Yes. So it's very easy to switch off when AI's generating code. Now, people will have a different propensity for how much they do that, but there's this thing they call vibe coding, which is where you essentially let the AI just keep spinning. You auto-approve everything, and you allow an AI to have unfettered access to building an application. I'd say the more tightly you can review the code that AI generates, the better. It's much more likely that if you read the code and scrutinize the code, you're going to notice certain patterns that are insecure, certain ways that code can be generated where, if you really look at it, you'll start to see very simple sort of OWASP Top 10-type risks that could be introduced. Shell execution with unsanitized input is one that I see LLMs generate quite often. So always try to stay close to your code; try not to switch off too much and let it have free rein to generate what it wants.
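A small, hedged example of the pattern mentioned here, shell execution with unsanitized input, next to a safer alternative; the function names are made up for illustration.

    import subprocess

    def risky_grep(user_input: str) -> None:
        # Pattern often seen in generated code: user input interpolated into a
        # shell command. Input like "foo; rm -rf ~" becomes shell injection.
        subprocess.run(f"grep {user_input} app.log", shell=True)

    def safer_grep(user_input: str) -> None:
        # Safer: pass arguments as a list so no shell interprets the input.
        subprocess.run(["grep", "--", user_input, "app.log"], check=False)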

Luke Hinds 00:18:02 Most senior engineers will probably be nodding their heads going, oh yeah, tell me something new, but I think it's something that's very key as newer engineers start to come in and use these tools as more of a first go-to approach. I'd say that's one of the key things. And then there's information-leakage awareness: really be aware that anything that's within that folder is likely to be leaked. So use environment variables, of course, for secrets, or even better, use a proper cryptographic system where you have something like Vault, or even a hardware security module or a YubiKey, so that you start to leverage these technologies and the secrets are not in the code in the first place. And then there's also CodeGate, which is something that we built at Stacklok. It's an open-source project which will provide those protections for you. You won't even know that you've been at risk of leaking something because it will be redacted on the fly. So we use lots of technology to detect secrets leaking, and that's a free-to-use open-source project that anybody can use with their coding assistants.
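A minimal sketch of the "keep secrets out of the code" advice, reading a token from an environment variable instead of a literal; the variable name is hypothetical.

    import os

    # The token never appears in the source file, so there is nothing for an
    # IDE or coding assistant to scoop up into a prompt.
    API_TOKEN = os.environ.get("MY_SERVICE_TOKEN")  # hypothetical variable name
    if API_TOKEN is None:
        raise RuntimeError("Set MY_SERVICE_TOKEN before running this script")

A secrets manager such as Vault, or a hardware-backed key, goes further by keeping the secret out of the environment as well.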

Brijesh Ammanath 00:19:08 Why are existing protections like .gitignore or .cursorignore not enough?

Luke Hinds 00:19:13 I don't know those systems too well, but the truth is that when AI is operating on code, the code is forever changing. A .gitignore can be good to stop you accidentally checking secrets into your GitHub repository. You could put a .env file in there; you can block these things from being pushed to a repo. I'm not too aware of whether the cursor ignore rules allow you to block secrets or not. But quite often it's not very deterministic where you're going to put tokens, where you're going to put keys. You accidentally get a medical report in your Gmail, you do Save As in the browser, you forgot that the last time you used the Save As feature was to download an image that you wanted to use on your website, and oops, your medical report has gone into your code repo folder, and now it's been grabbed by the IDE and sent up to the cloud. I would say things are too arbitrary, really, to rely on having to constantly update a text file on your system.

Brijesh Ammanath 00:20:22 Agreed. So .gitignore our cursor. Ignore is more about stopping the file being checked in and not from…

Luke Hinds 00:20:29 I believe so, yeah. Like I said, I'm not too aware of what the cursor ignore is doing there, but even if it is like .gitignore and it stops the IDE from processing the file, I'd say things are a lot more transient, really. They're not fixed enough that I think that will be a suitable protection system.

Brijesh Ammanath 00:20:46 Right. What techniques does CodeGate use to ensure that secrets are not accidentally leaked?

Luke Hinds 00:20:54 So we do several things. For personal information, we actually use something that has been part of machine learning for a while: named entity recognition. So we have a small model which is able to recognize credit card numbers, medical numbers, social security numbers, personal details, phone numbers, emails, any sort of PII. This model is able to match quite well. We also have forms of sensing technology for keys and passwords. We use entropy detection, different systems there, to see that something is likely to be a password or a cryptographic token or an API key. We leverage these machine learning techniques to be able to pattern-match sensitive material effectively. And then, what we do in CodeGate: CodeGate is effectively an inline system. When you converse with an LLM, you are actually using RESTful APIs, so it's like any other application. Your coding assistant is a client; it will take content, the code, or perhaps something you've typed into a chat box, and it will load it into a JSON payload, which will be posted to a REST API, the payload will be processed, a response will come back, and it's very much that typical RESTful flow that we're all used to.
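A minimal sketch of the entropy-detection idea mentioned here, not CodeGate's actual implementation: long, high-entropy strings are more likely to be keys or tokens than ordinary identifiers. The threshold and the sample token are assumptions.

    import math
    from collections import Counter

    def shannon_entropy(s: str) -> float:
        # Bits of entropy per character, estimated from character frequencies.
        counts = Counter(s)
        return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

    def looks_like_secret(token: str, threshold: float = 4.0) -> bool:
        # Long, high-entropy strings get flagged; the threshold is illustrative.
        return len(token) >= 20 and shannon_entropy(token) >= threshold

    print(looks_like_secret("database_connection"))           # False
    print(looks_like_secret("ghp_9fK2xLq8ZtR1vB7nWm4cY0sD"))  # True (made-up token)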

Luke Hinds 00:22:23 And so what we are able to do is capture this payload as it comes through CodeGate on its way to the LLM inference point. And then we scrutinize it: we check it for secrets, and on the return path we check it for malicious packages, and we are effectively able to trace these things in transit. But the key thing with CodeGate is this happens locally on your machine. So it's not like we are a kind of an EDR endpoint where you are just sending your secrets to someone else. This is processed right on your machine, and that information never leaves your machine. So if we pick up that there is a cryptographic token within the prompt payload, what we do is redact it. We swap that token, let's say it's an API token, for a unique string. And that payload then goes off to the LLM, which may well refactor the code, it could move the lines around, and then it comes back as a response.

Luke Hinds 00:23:25 When it comes back as a response, we receive it, we match the placeholder, the marker of where that secret has been redacted, with this particular randomly generated string that we have, and then we swap it back. We put the secret back in so that when the code appears in your IDE, you can see your API token there. And so what has effectively happened is it's been redacted on the fly. And we do this with all personal information as well. If a file was to be sent to an LLM, accidentally or otherwise, and it contains a credit card number, we would redact it on the fly. Out it goes to the LLM, the LLM does whatever it needs to do, refactors, fixes a bug, the content comes back, and then inline we switch the redacted string back to the cryptographic token or password or credit card number.
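A simplified sketch of that redact-and-restore round trip, not CodeGate's actual code: the secret is swapped for a random placeholder before the prompt leaves the machine and swapped back when the response returns. The token pattern is deliberately simplified.

    import re
    import secrets

    def redact(prompt: str, pattern: str = r"sk-[A-Za-z0-9]{20,}") -> tuple[str, dict]:
        # Replace anything matching a (simplified) token pattern with a placeholder.
        mapping: dict[str, str] = {}

        def _swap(match: re.Match) -> str:
            placeholder = f"__REDACTED_{secrets.token_hex(8)}__"
            mapping[placeholder] = match.group(0)
            return placeholder

        return re.sub(pattern, _swap, prompt), mapping

    def restore(response: str, mapping: dict) -> str:
        # Put the original secrets back so the developer's code is unchanged.
        for placeholder, secret in mapping.items():
            response = response.replace(placeholder, secret)
        return response

    redacted, table = redact('API_KEY = "sk-abc123abc123abc123abc1"')
    # `redacted` is what goes to the LLM; restore(llm_response, table) puts the
    # real key back locally before the code reaches the IDE.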

Luke Hinds 00:24:18 And then it appears back in your IDE, but we will also decorate the prompt so that, via the coding assistant, it will give a warning and say there was a credit card number found in your prompt. We actually protected that for you, so you don't have to take any further action, but you should perhaps be aware of why this is a risk, and here are some best practices that you can implement. We are just using standard machine learning technologies here to provide these protections. We do something similar with malicious packages. We have a database of malicious packages that we've created over the years in Stacklok. We also run our own SEA system, and we have our own threat hunters that report packages as being malicious, deprecated, or archived. And if we see the LLM recommend one of these to the coding assistant, what we do, as soon as we see something that looks like a package name, we have a vectorized database and an embeddings model, and we do a similarity search around that name to see if it matches anything within our vulnerability database.

Luke Hinds 00:25:20 If it does, then we're able to flag that to the user before they run pip install or npm install and end up backdooring their machine. We do a similarity search because sometimes these can be typosquatting attacks as well, and that allows us to sense that those are happening. So yeah, with CodeGate you get all of these controls locally on your machine. You get the ability to redact any personal information or secrets, but your code still comes back refactored. It doesn't disturb your flow; your development flow is unhindered. You don't have to take any action. We just inform you that we've covered your back and that we redacted the secrets and the tokens, and likewise, if the LLM recommends any malicious or suspicious packages, we're able to raise an alert. We provide all these protections inline and within the safe confines of your machine; nothing ever leaves your machine.
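A toy sketch of the embeddings-plus-similarity-search idea described here. A real system would use a trained embeddings model and a vector database; character trigrams stand in for the embeddings only to keep the example self-contained, and the known-bad names are hypothetical.

    import math
    from collections import Counter

    KNOWN_BAD = ["requestz", "reqeusts", "colourama"]   # hypothetical entries

    def trigram_vector(name: str) -> Counter:
        padded = f"  {name}  "
        return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

    def cosine_similarity(a: Counter, b: Counter) -> float:
        dot = sum(a[k] * b[k] for k in a)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def flag_if_suspicious(package: str, threshold: float = 0.7) -> list[str]:
        vec = trigram_vector(package)
        return [bad for bad in KNOWN_BAD
                if cosine_similarity(vec, trigram_vector(bad)) >= threshold]

    print(flag_if_suspicious("requestz"))   # matches the toy database entry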

Brijesh Ammanath 00:26:15 I think that's a very clever way of doing things. Redacting or replacing that secret or personal information with something else, so that the flow of the programmer is not impacted, I think that's very key.

Luke Hinds 00:26:27 That's correct, yeah, very much. Because one of the things I found being a security engineer, somebody that's always been focused on developing security technologies that will be adopted by engineers, is that people want to be productive, and if security is something that hinders them, blocks them, slows them down, they're much less likely to adopt it. So for security to be successful with developers, it's got to be as seamless as possible, because there's always some new shiny framework or API that's got their attention that's going to make them more productive. Whereas with security, you don't want them to have to learn about why they should protect themselves against a particular threat, then have them go through a long, protracted set of steps to implement the security, to then be blocked and want to turn it off because it's frustrating. Making it a difficult experience is just not the right way to get developers to embrace security.

Luke Hinds 00:27:23 If you can make it seamless, they're much more likely to adopt it. I mean, that's one of the things we found with Sigstore when we were trying to get developers to adopt best practices around signing software. Up until that point, it had been quite a painful, frustrating experience. People were worried about how to manage the keys. Did they need specialist hardware? How would they rotate keys? The tools were quite dated, like PGP and GPG, and it was just a frustrating experience for developers. So with Sigstore, what we got right was that we made it very, very easy to sign, effectively. Very minimal effort is required from the developer, so it's almost seamless. And that's why we got the adoption, because when you are trying to convince developers to take on security, it's like trying to encourage your 6-year-old to eat their vegetables.

Luke Hinds 00:28:16 They don't want to; they're not really thinking about their health when they get older. They're thinking about their immediate needs. Another good analogy I've used is that it's like selling insurance. It's protection against something that could possibly happen in the future but might not. Generally, people are going to be more focused on what's going to unlock productivity for them now, what's going to allow them to be more productive and move faster. And so if security is impinging upon that and restricting that, then it's less likely to get adoption with developers.

Brijesh Ammanath 00:28:49 We'll move to the next theme, which I've already touched on, which is around dependency management. Can you start off by giving an example, or any story, where coding assistants have actually introduced dependency-management risks, and how it was found out? What was the mechanism, or what harm could it have caused?

Luke Hinds 00:29:08 Yes. So we run a system internally at Stacklok which scans packages based on certain heuristics that determine they're more likely to be of risk, so they require closer scrutiny. We have this system which monitors packages and dependencies from PyPI for Python, crates for Rust, npm for JavaScript, Maven for Java, and there are probably a few others, Go as well. So we monitor all of these packages, and when we see one with suspicious heuristics, we're able to pull it in and take a closer look. And we've found a good number of malicious packages this way. One that our threat researcher found was, I believe, staged by a North Korean threat actor, possibly government-sponsored. And the package in question was used as part of a series of fake interviews with developers. So they would be contacted on LinkedIn, offered a role with a very high-profile, high-paying company, and told, all you have to do is a coding challenge, here are the instructions.

Luke Hinds 00:30:23 And the instructions were to download this package that we found. And once you download it and execute it by running npm install, it would effectively backdoor your machine. It was predominantly focused on Macs, but your machine would be backdoored and the attackers would start to exfiltrate information to a Telegram server. And this was an attack that we found in the wild. And one of the things we noticed was that we would go to these very, very high-profile, cutting-edge, widely used frontier models months later and say, hey, can you tell me how to use this package? And it would say, sure, this is how to use this package, and it would generate a code snippet. And then it would say, and then run npm install bad-package. I'm obviously replacing the bad package name here.

Luke Hinds 00:31:10 And so it was effectively telling people, yep, go ahead, use this package. In fact, here's how to install the package, which would've resulted in their machines being backdoored. So that was one of the reasons we started to look at introducing this system that could protect people against these malicious packages, because it was quite clear that we were asking some of the top coding assistants, some of the top models, could we use a bad package? And it would say, certainly you can, and here's how to. That was something we saw quite commonly, which was obviously a very, very clear security risk.

Brijesh Ammanath 00:31:47 And I believe the reason for this was the topic you mentioned about training data being stale. So the LLM had not caught on to the fact that this was a malicious package.

Luke Hinds 00:31:59 Yes, very much. And also, a lot of the time when these malicious packages are crafted, what they will do is choose a name that is very close to the name of a popular package. So for example, there's the Python package called requests, which a lot of people use as an HTTP client. It's very, very widely used, millions of downloads a week. And you could take the S and maybe swap it to a Z, requests with a Z. That's called a typosquatting attack. The idea is somebody fumbles on a keyboard, they make a typo, they download and install your package, and they're compromised. Now the thing is, neural networks are a very, very large collection of vectorized weights and biases. And how they work is they take natural language, they tokenize it, and then they perform these matrix multiplications.

Luke Hinds 00:32:56 Various math takes place where they calculate the predicted next token. There are all sorts of things such as cosine distance, and they're effectively taking words, breaking them down into numbers, and then looking for close-proximity patterns within the neural network. That's probably not the right language to use; if somebody listening is a neural network expert, they'll probably be raising their eyebrows, but I'm just trying to make this accessible for all. They find very close proximity matches and then decode that back to natural language. So if you make spelling mistakes with an LLM, it often won't mention them. It won't pull you up on them, because it knows what you're trying to say, because it knows of words that are of close proximity to the word that you've used. And within the wider context, it's able to match similarity.

Luke Hinds 00:33:54 It's able to find similar calculations that produce a natural language response. So you might have "the dog sat on the mat," okay? Mathematically, that's a very close set of words and letters when tokenized into a neural network. And so you can start to see where, if you had said to a large language model, how can I use this package name with a slight typo, it's going to calculate, and calculate is probably not the right word, that you are referring to this particular popular package. And because it's such a popular package, it would've appeared a lot within its training data, so the weights and biases would naturally allow it to match to the popular package. So that's a good example of where large language models are prone to not being aware of typosquatting attacks. There's that one that they do quite often: they ask how many R's are in strawberry, and the LLM is just convinced that it's two, and you try to convince it that it's three, and it just will not have it. It's to do with this vectorization; they're these huge, high-dimensional spaces of floating-point numbers, effectively, and they calculate similarity distances around words. That's how they appear to be human. So that's obviously a key risk as well.
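To make the requests-versus-requestz point concrete, a tiny worked example of how close a typosquatted name sits to the real one. This uses standard-library string similarity rather than an LLM's tokenization, so it is only an analogy for the near-identical-inputs intuition.

    from difflib import SequenceMatcher

    real = "requests"
    squat = "requestz"   # one character swapped, as in the example above

    ratio = SequenceMatcher(None, real, squat).ratio()
    print(f"{real} vs {squat}: {ratio:.2f} similar")   # 0.88: nearly identical strings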

Brijesh Ammanath 00:35:19 On the example that you gave of typosquatting the requests package, won't the LLM naturally pick the package which is most popular? Why would it go and pick a package with a typo or a spelling mistake in it?

Luke Hinds 00:35:34 Well, interestingly, they do. That's the interesting thing. When you give them a word which has a slight character difference, they will still go with that recommendation; "go with that recommendation" is probably not the right way to articulate it when speaking about a neural network, but they do go along with the original word that you gave them. I'm not sure why that is. I mean, with these neural networks, there's a whole study group trying to figure out what's going on inside them, really, because there are aspects of them where we still don't really understand how they operate a certain way. But it's just a pattern that they do follow.

Brijesh Ammanath 00:36:11 Do the security risks differ between companies using proprietary models like GitHub Copilot versus open-source models like DeepSeek or StarCoder?

Luke Hinds 00:36:21 Well, good question. Now, I think we could bring this up a few notches: to trust a model, you need to peel back the layers, effectively. So this is about model provenance, understanding the source of origin of a model. You start with the training data, and if there are any risks that are introduced, it's going to be at that stage. And for the closed-source models, you have no idea of the training data that they use. You have no idea of the weights. So there's very little transparency; if you extend any sort of trust, it's based on them being an established corporation with a brand to protect, like I spoke about earlier. Now, open-source models, well, people call them open-source models, but there is some nuance to this. Effectively, some models have an open, unrestricted license, like Llama, for example, which leads people to label them as open-source, but they're not really truly open-source in a way, because the weights are not open or the data that was used to train the model is not freely available.

Luke Hinds 00:37:28 So for you to be able to really trust a model, whether it's open or closed, and I guess by definition of it being closed this would not be possible, you would need to have reproducibility. You'd need to be able to take the data set, train the model, and end up with an identical model. It's a reproducible training flow; the pipeline is reproducible. And that is really the only way to be able to a hundred percent trust the model: to have that level of provenance, to be able to take all the component parts and end up with the same resulting large language model. So I guess, can you trust open-source models, with there being different variations of open-source models? Perhaps, because there is a bit more transparency; that is certainly a factor. But to truly trust the model, you would need to have the actual training data as well.

Luke Hinds 00:38:20 You would, and you would need the computational resources to perform the training, which, let's be honest, very few of us have. Now, one of the efforts that is looking to solve this is part of the Sigstore project, which I started a few years ago. There is some work around model provenance which has just cut a 1.0 release. And this is being used by various people such as Google and Nvidia and other entities that are starting to either explore or leverage this form of provenance. And this consists of cryptographic signing of the different stages of a training pipeline so that you can then get guarantees around the integrity and the provenance of the model. So there is some good work going in that direction to make these models more accountable and transparent, and for you to have some guarantees around the source of origin, like who actually trained this model?

Luke Hinds 00:39:15 Is it the person or the people that it claims to originate from? With the cryptographic provenance, you can get those assurances that it is a model that comes from the entity it claims to have come from. So again, this is the same problem and the same technology, just rehashed for a new domain. We had this for quite a while with container images and packages: understanding the source of origin, supply chain security, an area that I've worked in for quite a while myself. All of these technologies have been repurposed for the AI age, so it's the old ideas applied to the new ones.
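A conceptual sketch of the provenance idea, hashing each artifact of a training pipeline so a verifier can later check that nothing changed. This is illustrative only, with hypothetical file paths, and is not the Sigstore model-signing API.

    import hashlib
    import json
    from pathlib import Path

    def digest(path: str) -> str:
        # SHA-256 of a pipeline artifact (dataset snapshot, config, weights...).
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    def provenance_record(artifacts: dict[str, str]) -> str:
        # In a real system this record would also be cryptographically signed
        # (for example with Sigstore) so the trainer's identity is attestable.
        return json.dumps({name: digest(p) for name, p in artifacts.items()}, indent=2)

    # Hypothetical paths; a verifier re-computes the digests and compares them
    # against the signed record before trusting the model.
    record = provenance_record({
        "training_data": "data/corpus.jsonl",
        "training_config": "configs/train.yaml",
        "model_weights": "out/model.safetensors",
    })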

Brijesh Ammanath 00:39:53 Right. And we'll make sure we link to Sigstore in our show notes. Before we wrap up, do you have any thoughts on what's next in AI security? What are the emerging risks, and what are the key mitigating solutions that are being worked on?

Luke Hinds 00:40:08 That's a very good question. I think a lot of what I've spoken about today has yet to really escalate into actual problems, so a lot of what we've touched upon today are some of the future risks. But outside of that, I think the thing that is particularly concerning, and this perhaps goes above security as well, is how influential these things are becoming in our lives, okay? Personal counselors, accountants; large language models are becoming deeply ingrained in our lives, and they appear very trustworthy. The guardrails are working well, the models seem safe. But these models can experience pretty dramatic malfunctions. They really can: catastrophic forgetting, overfitting. These aren't things that we've seen play out with widely available frontier models, but there is that intrinsic capacity for a model to really malfunction on a level that could completely cause mayhem if other systems are relying on that model for automation of infrastructure and the services that we rely on every day for society to function, as LLMs start to become more of a guiding factor in our lives.

Luke Hinds 00:41:28 The safety of LLMs becomes absolutely critical. And we spent some time tuning large language models, trying to tune them to be more security-aware, and we experienced them go bad robot, and it really doesn't take much. You overfit them to a certain degree and then suddenly they are just spitting out the same word repeatedly, over and over and over, and you're just trying to kill the process to stop it. It's effectively become completely ineffectual. So I think that is a real concern. AI is creeping more and more into our lives, and we're starting to rely on it. Are you aware of Model Context Protocol? Have you come across Model Context Protocol? So Model Context Protocol is a way of providing an interface for an LLM to control machines. An MCP server effectively becomes a tool that the LLM is aware of, that it can use to interface with other machines and systems.

Luke Hinds 00:42:25 So people are producing these MCP servers where an LLM can control your Google Drive, or it can create issues and pull requests within a GitHub repository, or it could connect to a Jira system and create tickets. And it's a very interesting technology. You can see a lot of utility in this, where LLMs are effectively able to autonomously interact with systems and automate these processes, so they can go beyond just generating code and engaging with a human; they can start to control machines. And that's of course a big concern, because if these models were to malfunction or to hallucinate and they have control of machines, then you can start to see quite obviously where that can be very concerning from a security angle. And these are technologies that are very much hyped at the moment and starting to really get traction, especially Model Context Protocol.

Luke Hinds 00:43:23 There are lots of MCP servers coming online to allow you to interface with and control applications and machines. They're JavaScript-based, and I'm not saying that JavaScript's a bad language, but it's a language where it's a little bit easier to potentially footgun yourself in regard to security, web security and so forth. Like any language, it can be dangerous if it's used in the wrong way, but it's an area that's been known for web exploits quite commonly; it's probably one of the languages that's seen more risks manifest than most, I guess because it's web-facing. And so there's a lot to be concerned about with this ability for large language models to control machines and access APIs and move data around. And then of course you have agents, these autonomous, goal-driven entities that are connected to a large language model and are able to go off with the goal of solving a task. If one of these was to become misaligned or repurposed, it could feed false information, it could perform damaging integrations with other systems. There's a lot that could go wrong. So yeah, it's very much a brave new world that we're entering, and I don't think life is going to get any quieter for us security folks for quite a while.
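A deliberately simplified illustration of the tool-calling idea behind MCP-style servers, not the actual protocol or wire format: the server advertises what it can do, and whatever the model asks for gets executed, which is exactly where the security concern lies. The tool name and schema are made up.

    # Simplified illustration of an MCP-style tool interface (not the real protocol).
    TOOLS = {
        "create_ticket": {
            "description": "Create an issue in the team's tracker",
            "input_schema": {"title": "string", "body": "string"},
        }
    }

    def handle_tool_call(name: str, arguments: dict) -> dict:
        # Whatever the model asks for is executed here, so a hallucinated or
        # prompt-injected call has real-world side effects.
        if name == "create_ticket":
            # ... call the tracker's API with `arguments` ...
            return {"status": "created", "title": arguments["title"]}
        raise ValueError(f"unknown tool: {name}")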

Brijesh Ammanath 00:44:46 Absolutely. Thank you Luke, for coming on the show. It’s been a real pleasure. This is Brijesh Ammanath for Software Engineering Radio. Thank you for listening. [End of Audio]
