Nir Valtman, co-Founder and CEO at Arnica, discusses pipelineless security with SE Radio host Priyanka Raghavan. They start by defining pipelines and then consider how to add security. Nir lays out the key challenges in getting good code coverage with the pipeline-based approach, and then describes how to implement a pipelineless approach and the advantages it offers. Priyanka quizzes him on the concept of “zero new hardcoded secrets,” as well as some ways to protect GitHub repositories, and Nir shares examples of how a pipelineless approach could help in these scenarios. They then discuss false positives and handling developer fatigue in dealing with alerts. The show ends with some discussion around the product that Arnica offers and how it implements the pipelineless methodology.
Show Notes
Previous SE Radio Episodes
- 288 – Francois Reynaud on DevSecOps
- 541 – Jordan Harband and Donald Fischer on Securing the Supply Chain
- 559 – Ross Anderson on Software Obsolescence
- 514 – Vandana Verma on the OWASP Top-10
- 475 – Rey Bango on Secure Coding Veracode
- 498 – James Socol on Continuous Integration and Continuous Delivery
References
- What is pipelineless security? (blog post)
- What is an SBOM, what is it not, and do you need one (blog post)
- How to Reduce Code Risk Using Pipelineless Security
- Arnica's Real-time Code Risk-Scanning Tools Aim to Secure Supply Chain
- What is CI/CD Security?
- https://github.com/arnica-ext/GitGoat
- LinkedIn: valtmanir
Transcript
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Priyanka Raghavan 00:00:16 Hi everyone, I'm Priyanka Raghavan for Software Engineering Radio. Today I'm going to be chatting with Nir Valtmann, who is the co-founder and CEO at Arnica. Nir is an experienced information and application security leader, and he's been at a bunch of companies. I just want to call out, he was a VP of Security at Finastra and also CSO at Kabbage. Apart from that, he's given talks at many different conferences including Black Hat, DEFCON, BSides, and RAC. And today, we're going to be chatting about a concept called pipelineless security. So welcome to the show, Nir. We're really happy to have you on board.
Nir Valtmann 00:00:53 Thanks, Priyanka. It’s really my pleasure to join.
Priyanka Raghavan 00:00:56 Okay, is there anything else in your bio that you would like listeners to know about you before we jump into the show?
Nir Valtmann 00:01:02 I think that put aside the title or maybe my history, I really have that state of mind of a hacker. I like to develop code maybe just to prove a point, but not necessarily to develop a production product. For that, I have my co-founders to help with that. But I really like to code. I’m really obsessed about developer experience and how developers actually perceive security because at the end of the day, if you want to help developers to get to a specific point, whether it’s in security or quality, you really need to understand them first to get there.
Priyanka Raghavan 00:01:35 Yeah, I think that’s really great and wonderful to hear because I think one of the things for people who are building products for developers and their experience, it’s almost like a servant leadership model, right? Because you really need to serve the people who, you know, use your product. So let’s get onto the show, and before we start looking at pipeline-less security, could you explain to our listeners what is a pipeline?
Nir Valtmann 00:01:59 Okay, so think about a pipeline as an automated script that is typically kicked off when a certain event occurs in your source code management system. So for example, it can be when you open up a pull request. In that case, you may just want to build that container that you're going to deploy. Or it can be an event that occurs when you merge a pull request or you make changes to a pull request. And in that case, let's say that you merge the pull request. What is typical to see is that certain tests are executed. So for example, you have a script that builds your software, another script that maybe runs certain integration tests within the environment that you're trying to deploy to, and eventually also runs that deployment script. So that piece of the pipeline is essentially automated based on events, and you have multiple systems with predefined configurations and scripts so that you can actually scale that process in a fairly simple way.
Priyanka Raghavan 00:03:04 Great. And I guess today the problem is: are there any problems that you see with the ability to integrate security into your pipelines?
Nir Valtmann 00:03:13 So I think that there are some challenges when it comes to integrating security into pipelines. First of all, you actually need to work with either the DevOps team, or a center of excellence, or even developers to actually modify the scripts within your pipeline to embed your security tests in them. And let's say that you do have the buy-in, either from management or from developers. Even then it's very siloed. You don't need to go as far as companies with thousands and thousands of repos; think about a company with a hundred repos. How do you go and deploy your security tools where it actually matters, and get that hundred percent coverage where it matters to the business? That is the challenge, because you need to go and work with multiple teams and modify the scripts to meet their scripting standards. Not only that, what happens if, let's say, tomorrow a new team spins up a new repo, maybe they want to develop a new microservice? How do you ensure that you are going to be there as well? So you always need to chase someone to embed your security tools into that pipeline.
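To make the coverage problem concrete, here is a minimal sketch, not from the episode, that lists which repos in a GitHub org have no workflow files at all. It only answers "where is there any pipeline," not whether the pipeline runs security steps; the token and org name are placeholders, and it assumes a personal access token with read access.

```python
"""Sketch: find repos in a GitHub org with no CI workflow files at all."""
import requests

TOKEN = "ghp_placeholder"   # placeholder personal access token
ORG = "example-org"         # placeholder org name
HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}

def org_repos(org):
    """Yield every repo in the org, paging through the REST API."""
    page = 1
    while True:
        resp = requests.get(
            f"https://api.github.com/orgs/{org}/repos",
            headers=HEADERS,
            params={"per_page": 100, "page": page},
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return
        yield from batch
        page += 1

def has_workflows(full_name):
    """True if the repo has at least one file under .github/workflows."""
    resp = requests.get(
        f"https://api.github.com/repos/{full_name}/contents/.github/workflows",
        headers=HEADERS,
    )
    return resp.status_code == 200 and len(resp.json()) > 0

for repo in org_repos(ORG):
    if not has_workflows(repo["full_name"]):
        print(f"no pipeline at all: {repo['full_name']}")
```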
Priyanka Raghavan 00:04:25 To add to that, sometimes, because the task of onboarding people onto the pipelines is so huge, people think that is security: they've got the pipeline, so they've done their security. So that's another challenge I guess, right?
Nir Valtmann 00:04:39 Yes. And it also depends on what you are actually doing with that, because at the end of the day, one of the trends that we see is that many companies utilize open-source tools and embed them into pipelines, and then it becomes more of a, I wouldn't say feature creep, but essentially a vulnerability management challenge that you might have. For example, how do you exclude vulnerabilities that are maybe not applicable, maybe false positives? How do you manage all of that in a centralized way? That is quite challenging with open source. And this is also the reason why you see many commercial products running open source, to provide you that wrapper to actually manage all of those vulnerabilities in a central place.
Priyanka Raghavan 00:05:25 Okay. And before we move on to pipelineless, another question I need to ask you is: what are the top three kinds of risks that you see in your research on repos, the top three security risks?
Nir Valtmann 00:05:38 Yeah, so one of the risks that we see is not necessarily an evil risk, but it's something that just occurs. One of the things that we tested quite widely with our hypothesis initially, and eventually ended up seeing a bit more in the wild, is where access to code equals access to Cloud. Let me just explain what that actually means. Today, there's an increasing trend of something that is called inner sourcing. Inner sourcing is a process that is very similar to how, let's say, Google or Amazon work, where you have write access to all repos. You can contribute to every repo in the company, but for you to actually deploy that code, it needs to go through a pull request. So you can essentially create your own feature branch, you can develop the feature, you can create a pull request, and then only after it's approved does it go and get deployed. What we see is that there's quite a lot of this; and "quite a lot" is not quantifiable, but I would say that we see almost 50% of repos that are misconfigured in some way.
Nir Valtmann 00:06:47 And by "in a way," I mean it can be that they don't have any enforced pull request process. So for example, if you think about GitHub, they have a code owners functionality, right? So you have specific individuals whose approval counts towards the ability to merge, because everyone can approve, but then whose approval counts? In other cases we see misconfigured code owners files. So sometimes you do configure that, but it's not working. So for example, you have maybe a configuration file, or maybe a setting in, let's say, Azure DevOps, just a setting on the branch protection policies. And even then it's optional and it's not enforced. When you have that optional, not enforced setting, this is where access to code means access to Cloud. Obviously, there are some caveats, like whether you can approve your own pull requests or not, but that's the gist of it.
Nir Valtmann 00:07:42 So that's essentially one thing that we see with repos. The other thing that we do see, which is kind of associated with that, is unenforced status checks that are important to the organization. So for example, you might have a status check to validate your software composition analysis issues, or a status check to run whatever linters you want to run. It can be a security issue, it can be a quality issue. But essentially, when you do run checks, in some cases I agree you don't need to enforce them. Maybe some of them are good for informational purposes only, but when it comes to security tools, that's where you do want to enforce it. So for example, one of the things that we see quite widely utilized within our customer base is something that we call a zero new high severity vulnerabilities policy.
Nir Valtmann 00:08:35 Which means that whatever I have in a backlog, I will let everyone handle at their own pace as part of my vulnerability management program. But I want to enforce that all of the changes that were introduced in that particular feature branch will be mitigated prior to the merge of that code. Because the problem space is that security teams chase developers: they have vulnerabilities, they open the Jira tickets, they need to follow up with them, get the commitments, and only then do the fixes actually happen. But in this case, because developers are already in the context of developing within that feature branch, it's very easy to say either, hey, I fixed it, and I focused only on the high severity and above instead of having to see all of the risks; or, I just had a magic wand to dismiss it.
Nir Valtmann 00:09:28 But at the end of the day, you got to the point that you have a clean so-called report that you can actually go and merge with. And that's where we see an additional misconfiguration with those branch protection policies. And I'd say that maybe the last one that we see, and that really depends on whether you're a regulated company or a less regulated company, is quite heavy utilization of misconfigured permissions on the code repos themselves. We see some of that with maybe multiple admins on repos, but in many cases we can also see just excessive write access. The main thing is that you can reduce permissions to minimize the blast radius when credentials are compromised. We've seen that with, for example, LastPass, when a developer left and could clone the repos, maybe find secrets in there. You could actually get to the point where you minimize the impact, not only on the read, which is very hard to do if you do inner sourcing, but if you do inner sourcing, you can at least minimize the impact on the write to resources or to repos. And that's the balance that needs to be struck. I wouldn't say it fits everyone's needs, but these are the three key trends that we see.
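As a rough illustration of the branch-protection misconfigurations Nir describes, here is a hypothetical sketch against the public GitHub REST API; this is not Arnica's implementation, and the repo and token values are placeholders. The protection endpoint returns 404 when no protection is configured at all.

```python
"""Sketch: flag a repo whose branch has weak or missing protection."""
import requests

TOKEN = "ghp_placeholder"   # placeholder personal access token
HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}

def protection_gaps(full_name, branch="main"):
    """Return a list of human-readable gaps for one repo's branch."""
    url = f"https://api.github.com/repos/{full_name}/branches/{branch}/protection"
    resp = requests.get(url, headers=HEADERS)
    if resp.status_code == 404:
        return ["no branch protection configured"]
    resp.raise_for_status()
    protection = resp.json()
    gaps = []
    reviews = protection.get("required_pull_request_reviews")
    if not reviews:
        gaps.append("pull request reviews are not required")
    elif not reviews.get("require_code_owner_reviews"):
        gaps.append("code owner approval is not required")
    if not protection.get("required_status_checks"):
        gaps.append("no required (enforced) status checks")
    return gaps

print(protection_gaps("example-org/example-repo"))
```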
Priyanka Raghavan 00:10:47 Okay, great. So I take it you're going to be providing us the magic wand to get rid of this. Is that what pipelineless security is?
Nir Valtmann 00:10:54 Well, I wouldn't say that it'll fix everything, because obviously there are some places where I think that pipelines actually do a way better job than pipelineless. But at the end of the day, it's maybe worth going through the three different places where you can integrate security tools, and then we can talk about the pros and cons of each one of those, and that will likely emphasize where each one of them can fit.
Priyanka Raghavan 00:11:20 Okay, so what are the three places then?
Nir Valtmann 00:11:23 So the three places where you can integrate security tools are: first, the developer workstation, where you can put it either as, let's say, an IDE plugin, like a Visual Studio plugin, or as something that looks like a pre-commit hook, something that kind of blocks you, or maybe informs rather than blocks, depending on how you look at it, when you actually commit code. Another place would be in the pipeline itself, in that piece of the script, as a step in it. And the last one would be pipelineless, which is some sort of a combination between the two. And maybe we'll go through each one of them and give specific examples. So maybe in the IDE, you can get a linter for your source code. Maybe you can scan for hardcoded secrets in your source code.
Nir Valtmann 00:12:07 Maybe you can scan for software composition analysis issues, such as which third-party packages you're bringing in, or maybe low-reputation packages that you want to bring into the environment. But then there are two challenges that you have with that type of deployment. One, it's extremely hard to take your security controls and actually deploy them across, let's say, a hundred percent of the laptops owned by engineering. Not only that, but keep in mind that in many cases developers work with different types of IDEs. So one can work with Visual Studio Code, another one with Eclipse, different versions, different settings, workspace settings and such. So it's challenging by itself, but it's great because it's really on the left. And another challenge with the same setup is that many tools that run locally don't necessarily have the server-side context that the enterprise wants to enforce.
Nir Valtmann 00:13:01 So for example, let's say that I did a software composition analysis scan. Maybe I look for third-party vulnerabilities in my packages. You'll get a list of all of the vulnerabilities and vulnerable packages that you have: low risk, medium risk, high, critical and such. Okay, what is important? How does the developer know what will actually break the merge later on? So there's additional context that you need to inject for that to be effective, because the moment that you integrate a tool that scans source code on the left side, you get the challenge of overwhelming developers, plus it'll scan everything. What about the changes that I made versus everything? I care about the changes that I made at this point, because I'm developing in my feature branch, not everything. So there's that balance that needs to be made on the workstation. If you look on the other side, on the pipeline, you have a very similar challenge with coverage, to make sure that you're going to have the security tools embedded in all of the pipelines.
Nir Valtmann 00:14:00 And if a new repo pops up, as I mentioned, you want to make sure that you have it there. But also, what we noticed in some of the companies that we're integrated with is that when they utilized more legacy scanners, they actually ran the scanners after the build process. And sometimes the build process doesn't take modern timeframes like minutes. Sometimes a build process can take two days. So why would you run a scanner after two days and then fail it? The only conclusion out of this is that security makes you look bad, right? It's not the right way to do that. And another challenge that we see with pipelines is that at the end of the day, when you do run a pipeline, you see the name of the person that introduced that break, the break of the build, the break of the pipeline and such. And that is more of a psychological issue, because now you blame the developer. Maybe blaming, maybe shaming, but essentially, I see my name next to a red icon. I don't like that. Okay, now let's go to pipelineless.
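For the workstation option described above, a pre-commit hook is one common way to catch hardcoded secrets before they ever leave the laptop. A minimal illustrative sketch follows; the patterns are examples only, not any vendor's rule set, and the hook only looks at what is staged for the current commit.

```python
#!/usr/bin/env python3
"""Sketch of a local pre-commit hook that blocks obvious hardcoded secrets.

Illustrative only: save as .git/hooks/pre-commit and make it executable.
"""
import re
import subprocess
import sys

PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),
    re.compile(r"(?i)(api|secret)[_-]?key\s*[:=]\s*['\"][A-Za-z0-9/+=]{20,}"),
]

# Look only at the staged diff, not the whole repo.
diff = subprocess.run(
    ["git", "diff", "--cached", "--unified=0"],
    capture_output=True, text=True, check=True,
).stdout

hits = [
    line for line in diff.splitlines()
    if line.startswith("+") and not line.startswith("+++")
    and any(p.search(line) for p in PATTERNS)
]

if hits:
    print("Possible hardcoded secret in staged changes:")
    for line in hits:
        print("  " + line[:80])
    sys.exit(1)  # a non-zero exit blocks the commit
sys.exit(0)
```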
Priyanka Raghavan 00:15:04 Before that, I need to ask you, because you really got my goat with the two-day build process. Is that because it's a monolith?
Nir Valtmann 00:15:11 It can be a monolith, or it can be just a build that maybe comes from multiple repos or multiple sources, or it can be just a legacy product that has so much code. Even a single service can have very heavy functionality. Think about a single service that runs fraud prevention: a single service that is quite heavy to build. So those are the complexities that you might end up with.
Priyanka Raghavan 00:15:35 Okay, yeah. So I think you've clearly laid out the problem. So what are the choices that we have? Just to recap, I think you talked about: you have a choice of putting it in the IDE, or you can have a pre-commit hook, or you can put it in the pipelines. And we've seen the various problems in these different areas, because I think it's always a trade-off in the balance. So yeah, now if we come to pipelineless, what are the solutions we can have?
Nir Valtmann 00:15:59 So pipelineless is an approach. You can do pipelineless security, as well as pipelineless quality. Pipelineless is just the concept of how you should execute the tools. The pipelineless security approach that we're taking is essentially all event based. So what we see as more beneficial with a pipelineless approach is that we are actually listening to not only pull request events, but also code push events. So think about developers: as they work on their code, they kind of save their changes when they push code, right? That's where we provide them the feedback. And the beautiful thing about pipelineless is that you can apply a lot of logic associated with that particular change, because if you look only at those changes to the code, you can run very specific scanners to identify specific risks.
Nir Valtmann 00:16:54 So for example, if you look at software composition analysis, this is easy. If you look at static code analysis, again, it's fairly easy to run that. But if you look at dynamic testing, you can't really do that before you have maybe a built or deployed artifact, right? Which is where pipelineless works: it works with code, something that you can scan, rather than a live environment. Now when you set that up and you scan all those code changes, think about me as a developer: I just wrote a piece of code that maybe has a hardcoded secret in it. If it has a hardcoded secret in it and you already managed to push that code, regardless of which controls you have in the system, now you want to route that message first to the developer and ask, hey, is it a real thing?
Nir Valtmann 00:17:41 Or maybe you can say to the developer, I know it's a real thing: you put in AWS credentials with root access. Like, it's a big no-no, right? So you can have that balance, but you can send the message directly to the developer. And because it's not associated with any pipelines, you can make changes in the code itself. So you can either maybe overwrite the secret or maybe create another branch for the developers and do other things there. But at the end of the day, because it's not associated with that particular event, you can actually go make the changes, maybe scrub the secret, and send the developer a direct message. In our implementation that happens via Slack or Teams. Say, hey Priyanka, we just detected that you pushed a secret, we fixed it for you. Click here maybe to merge the changes into your own personal or your own feature branch, for example.
Nir Valtmann 00:18:34 That's one example. Another example would be, let's say that you just pushed code with a new SQL injection, whatever. In that case, because we ran the scan, maybe static code analysis on that, we identified that you have a SQL injection in it. Again, we'll send a message directly to you, say, hey Priyanka, we did identify a new risk that you introduced in that code push. This is SQL injection; this is how you fix it. And then you as a developer are empowered to either say, I'm on it. Yeah, good catch, thank you, I'll fix it. Or you can say, you know what? Dismiss. And dismiss because, no, there's no way to control whatever parameter comes in there; this is indeed a static code analysis issue, but it cannot be exploited. And therefore you may either dismiss the item, or maybe suggest a dismissal, and that will go to an AppSec team to approve that you're good with that.
Nir Valtmann 00:19:31 The whole premise here is that you pushed code, it's real time. You got the feedback; you got the opportunity to respond to that before you open a pull request. Now when you go to the pull request, then you can run the same scan again, but this time, anything that you fixed or dismissed will no longer appear as a failed status check, because you either resolved it, or you got the approval, or you dismissed it, and now you have all the green lights to go and either review the pull request or even merge the code. So that's how it becomes a blameless and shameless process, in which all the feedback works in a private message to the developer until everything that is important for the business is resolved. And again, it's not all issues, it's only things that you introduced, and maybe you'll focus only on the high and critical severity issues first and then you'll start blocking others. And that gets you to very minimal work before you can actually pass the security gates.
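The "only things that you introduced" logic can be illustrated with a small sketch: compare the findings on the base branch with the findings on the feature branch, and block only on newly introduced high or critical issues. The Finding type and the sample data below are hypothetical stand-ins for whatever scanner actually produced the results.

```python
"""Sketch of a "zero new high-severity findings" gate (illustrative only)."""
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    rule_id: str
    file: str
    fingerprint: str  # stable hash of the finding, not a line number
    severity: str

def new_blocking_findings(base, head):
    """Ignore the existing backlog; block only on new high/critical issues."""
    return {f for f in head - base if f.severity in ("high", "critical")}

# Findings already on the base branch (the backlog) vs. on the feature branch.
base_findings = {Finding("sql-injection", "app.py", "a1b2", "high")}
head_findings = base_findings | {
    Finding("xss", "views.py", "c3d4", "high"),
    Finding("weak-hash", "util.py", "e5f6", "low"),
}

# Only the newly introduced high-severity XSS finding should block the merge.
print(new_blocking_findings(base_findings, head_findings))
```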
Priyanka Raghavan 00:20:32 It's very interesting. So I'm just picturing this; it's interesting that you said that you'll send a message by Slack, just as an example. Now what about all these things with, you know, we did an episode on GitHub Copilot as an AI assistant. Do you see that as a future for this too? I guess that works as you type, but would that be again a place where, say, this concept could be used as well? Or is that too time consuming, because you have to write the code and it immediately has to run the check?
Nir Valtmann 00:20:59 I think it's complementing, because at the end of the day, developers use Copilot to help them write the code, but it's not necessarily going to write the entire code that they need. Or maybe when you add a package, well, that's a package, you just need that package, right? So there are areas that are not covered by GitHub Copilot. I can tell you we are users of GitHub Copilot in the company. We really like it. But at the end of the day, it saves up to 20% of the developer's work, which is phenomenal, but it doesn't necessarily produce the secure code that we're looking for. Or maybe when we want to add external functionality or embed third-party packages, it's not necessarily the piece that will get all that functionality into the code.
Priyanka Raghavan 00:21:43 Yeah, and also, I think when I'm talking with you, I think it also might be a little bit too intrusive and not very performant to run a kind of background check as you write. So I'd rather wait for it to be part of a pull request, or yeah, part of my feature branch, and then at that point, when I'm ready to push, yeah.
Nir Valtmann 00:21:59 Exactly. It can be part of that feature branch. It can be per push. So you can get results for that push and then push more code, and you'll get new results only for the new push, because you probably don't want to get all results for the feature branch every time. It creates a lot of fatigue.
Priyanka Raghavan 00:22:15 Yeah. So this is super interesting. And so let's just go into the next section, where I can dive a little bit deeper into this area. So what exactly do you need to actually implement this concept of pipelineless security? Is this hook going to be a webhook that's listening to every push event? How does it work?
Nir Valtmann 00:22:33 Yes, technically you subscribe to webhooks that will send you any code push, any pull request, or any changes to a pull request. And when you get those events, then you need to make that determination, based on whatever has changed there, that this is the piece that you want to scan. Because if it's a code push, you might not want to scan the entire repo. You just look at that push, you look at the data, you scan the data; if you need to enrich it, you enrich it with other sources. Maybe you do need to make a decision that you need to clone the entire repo and do some additional scans on top of it. But it's up to the business logic that you develop, and eventually it's up to that service that receives the webhook to make additional decisions.
Nir Valtmann 00:23:18 Such as, how do you know, for example if you look at GitHub, that the email of the personal GitHub account that just pushed code maps to the corporate identity in your Slack? It may not be on the profile, right? So there's a lot of additional logic that you might need to write to have that context. But let's say maybe you are willing to do that work manually. Maybe you're just willing to go through each one of the GitHub usernames and map them to the email. Maybe you have other ways to do that; then it's fine. You can route the message to Slack, but then you have other complexities, like, hey, I routed the message to Slack, I want it to be actionable. How do you actually collect the response from the developer and feed that into the same decision-making process when you open up a pull request? So there's a lot of complexity in how you do that, but at the end of the day, you just want to ping the developers only when it's absolutely necessary. That's the differentiator between different ways that you can implement pipelineless security.
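Here is a minimal sketch of the webhook-driven flow Nir describes, assuming GitHub push webhooks, the public GitHub REST API, Flask, and slack_sdk. Arnica's actual implementation is not public, the secret pattern is an example rule only, and the email-to-Slack mapping shown is just one option for the attribution problem he mentions.

```python
"""Sketch of a pipelineless listener: receive a push webhook, scan only the
pushed commits, and message the author privately. Illustrative only."""
import hashlib, hmac, os, re
import requests
from flask import Flask, request, abort
from slack_sdk import WebClient

app = Flask(__name__)
WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"].encode()
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")  # example rule only

def verify(req):
    """Reject requests whose HMAC signature does not match the shared secret."""
    sig = req.headers.get("X-Hub-Signature-256", "")
    mac = "sha256=" + hmac.new(WEBHOOK_SECRET, req.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, mac):
        abort(401)

@app.post("/webhook")
def on_event():
    verify(request)
    if request.headers.get("X-GitHub-Event") != "push":
        return "", 204
    event = request.get_json()
    repo = event["repository"]["full_name"]
    for commit in event["commits"]:
        # Fetch only this commit's patch, not the whole repo.
        r = requests.get(
            f"https://api.github.com/repos/{repo}/commits/{commit['id']}",
            headers={"Authorization": f"Bearer {GITHUB_TOKEN}",
                     "Accept": "application/vnd.github+json"})
        r.raise_for_status()
        for f in r.json().get("files", []):
            if SECRET_PATTERN.search(f.get("patch", "")):
                # Mapping the git author email to a Slack user is the hard,
                # org-specific part; lookup by email is one possible shortcut.
                user = slack.users_lookupByEmail(email=commit["author"]["email"])
                slack.chat_postMessage(
                    channel=user["user"]["id"],
                    text=f"Possible hardcoded secret pushed to {repo} "
                         f"in {f['filename']} (commit {commit['id'][:7]}).")
    return "", 204
```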
Priyanka Raghavan 00:24:21 Okay. There's one thing that we talked a lot about, which is the amount of code coverage, right? With the pipeline-based approach, we're not entirely sure of the code coverage. So with pipelineless, since you're dealing with small chunks, do you think that the code coverage can be almost a hundred percent?
Nir Valtmann 00:24:40 So first of all, you do have 100% code coverage in your scans, because you can do partial scans or you can do full scans with a pipelineless approach. The thing that you need to be wary of is that there are some time limits on the scans that you can run in a pipelineless approach. So for example, let's say that you pushed code, and it took me more than, I don't know, two minutes to scan it. You probably saved it, maybe you already closed your laptop, and you are out of context. So the user experience might not be that good. Or if it took me 40 minutes to provide you results every time, it won't cut it, because maybe you have a status check: you set the status check to a pending state while you scan the pull request, and then it times out, not because your process timed out, but because of maybe a GitHub limit on how long you can actually be in an in-progress state. And therefore, when we do our testing, when we look at how fast we scan, we test against repos the size of Linux. That's a clone that we have on our different source code management tools. When we push code, we check how long it takes to run all of the scanners that we're looking at, and it's usually less than five minutes, and think about the clone time: the clone time is included, and it's a very long clone time.
Priyanka Raghavan 00:26:02 Okay. So the thing is, how many checks do you run? Can that also be something that you can configure? One of the things that a lot of teams run now is the static code analysis, then they run the static application security testing, and they do the third-party code analysis. And then sometimes there are some people who are very ambitious; they also run the unit tests, right? They're also doing that on every pull request. And then by the time they're dead tired, they say, okay, let's chuck out the security. But that, you know, typically happens, right? They'll be like, okay, this thing is taking too long. So what I'm worried about is, when you actually do this, you're listening to this push event, can you configure which checks you want to run?
Nir Valtmann 00:26:44 It's up to the implementation. In our case at Arnica, we run all of them if needed. So we do the secret scanning, we do static testing, SAST, we run software composition analysis, reputation checks on packages, as well as infrastructure as code. All of that is something that we deliver within the timeframe that I mentioned. At the same time, sometimes you have just simple unit tests and sometimes more comprehensive integration tests. When it comes to unit tests, we have our own script on our own push events, where we also try to build the software and run the unit tests, and that's again on every code push that we do, because we don't want you to figure that out later. But that's an internal process that just makes sense to us. Every company would likely have different processes.
Priyanka Raghavan 00:27:35 No, I'm just curious, because the reason I was asking that was also in terms of, say, if you're writing a piece of code which might not require, say, an infrastructure-as-code analysis (like you're not pushing a Terraform template, it's just some Java code that you're writing), then could there be a case where you only run the unit tests and the static analysis, I mean SonarQube and SAST and SCA? Could that depend on the implementation, is that a choice for the implementation, or could there be some intelligence built in based on what you're pushing?
Nir Valtmann 00:28:08 I mean, at the end of the day, it can be anything that you implement behind that service in a pipelineless concept. In our case, we run them all and you select when you want us to act. Even with Terraform, we do scan for infrastructure-as-code vulnerabilities, so we can find vulnerabilities in the Terraform. But when it comes to anything dynamic, that's where you probably want to have your pipeline. Anything that is static on source code, you can do with a pipelineless approach, with the caveat that it needs to be timely.
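One simple way to implement "run everything, but only where it makes sense" behind a pipelineless service is to route each push's changed files to the relevant scanners. A small sketch with placeholder scanner names; the mapping is illustrative, not any product's actual logic.

```python
"""Sketch: decide which scanners a push actually needs, per changed file."""
from pathlib import PurePosixPath

SCANNERS_BY_SUFFIX = {
    ".tf": ["iac_scan"],            # Terraform -> infrastructure-as-code rules
    ".java": ["sast", "secrets"],
    ".py": ["sast", "secrets"],
}
MANIFESTS = {"pom.xml", "requirements.txt", "package.json"}  # -> SCA

def scanners_for(changed_files):
    wanted = set()
    for name in changed_files:
        path = PurePosixPath(name)
        # Unknown file types still get secret scanning by default.
        wanted.update(SCANNERS_BY_SUFFIX.get(path.suffix, ["secrets"]))
        if path.name in MANIFESTS:
            wanted.add("sca")
    return wanted

print(scanners_for(["src/main/java/App.java", "requirements.txt"]))
# members: sast, secrets, sca
```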
Priyanka Raghavan 00:28:40 Okay, so since you brought that up, since it has to be timely, how do you measure that?
Nir Valtmann 00:28:45 So for example, just off the top of my head, I remember that GitHub gives you a 30- or 40-minute window in which you can actually update the check that you're running. So if you do a check on a PR, for example, that needs to be finished within a fairly short timeframe. You can do things on push as well. But at the end of the day, if it's a long process, you probably want to add it to a pipeline anyway, because you want to see the logs, you want to see everything that is going on in that script, as opposed to a pipelineless security approach, which is typically controlled from a central console. Think about it: if it's a security team, the security team controls the pipelineless approach for the entire company, right? Maybe it's a DevOps team that just wants to see governance.
Nir Valtmann 00:29:36 Maybe it's just a code coverage scan, or maybe linters, or making sure that you don't overwrite their Terraform scripts, whatever it is. In that case, they will control it centrally, right? But that is something where developers will not necessarily need to look at the logs, because if there's an issue there, it might be an issue for the DevOps team or it might be an issue for security. And that goes back to: what do you want to expose to the developer, where if it fails, the developer will be fine with that red icon next to their name, versus what do you want to provide privately? Security you probably want to provide privately first, maybe quality in some cases as well. That's the difference between having it in the pipeline versus in a separate, pipelineless check that you don't have much visibility into until you get a private message. It can also be a comment on a commit; it doesn't need to be only a message to the developer, right? You can annotate inline on the commit and you're going to be okay with that as well. So it's really up to what automation you set up with this.
Priyanka Raghavan 00:30:44 The thing that you just talked about, delivering a message privately to the developer, not shaming them, also brought me to a post that I read on Arnica's blog, which talked about something called result sensitivity. I'll also reference this in our show notes for our listeners, but it talked about how actually telling someone that you have a high security risk in your code base could also be an attack vector for an insider attack. I think that's what it was saying. So can you explain that a bit to our listeners?
Nir Valtmann 00:31:14 Yes. So think about it this way. Let's say that you manage all of your security vulnerabilities in Jira, where you also manage all of the rest of your backlog. Now, as an adversary or maybe an insider threat to the company: I'm a developer, maybe I'm going to leave the company, and now I see all of the issues with the tag "security," and I know that there is a SQL injection, and I know that this issue is in fact in main, in the main branch, right? Well, I see where the problem is, I can see how the database is built, and then I can go and exploit it, because I know it's right there and it's not fixed yet. For that, if you want to minimize your risk, not only in the product but also the exposure of certain findings, that's how you probably want to handle that.
Nir Valtmann 00:32:02 You want to handle it either in a private way, or, if you can configure your issue tracking system in a very granular way, just create issues in a different project and let only the owners of that product, or maybe the principal developers, whatever, access that project. But then, okay, it's a separate project for security, or it's a separate security project for each other project that you have. And then, okay, how do you manage it? So you have a lot of questions that you need to answer to make that separation. Therefore, the way at least that we solve that is that we track all the issues automatically within Arnica, and then we have a slash command that you can use in maybe Teams or Slack, and you can ask, hey, which vulnerabilities did I introduce as a developer?
Nir Valtmann 00:32:53 And then you'll get a list of all of the vulnerabilities that are associated with you, specifically with your code attribution. Another way you can do that is something similar with the slash command: ask, hey, what are the vulnerabilities associated with the product owners of that product? And as a product owner, I will be able to see only the issues associated with the product that I'm accountable for. So you get to the point that, one, you have private findings, and two, you really get only the important things that the governance team or the central team decided to expose to you.
Priyanka Raghavan 00:33:29 That’s quite interesting to note. I’d never really thought of the fact that the way we manage vulnerabilities could also be an attack vector.
Nir Valtmann 00:33:38 It is. And by the way, another idea: try to open a Jira ticket with a finding that says you have a hardcoded secret. Okay, so where's the secret? Here's the link.
Priyanka Raghavan 00:33:48 Yeah, don't get me started. Actually, I spent nearly two days trying to wipe out Git history with a hardcoded secret. It's not that easy. It messes up the whole branching and everything, you know. So yeah, it's not an easy task. So imagine pointing someone to that and then trying to delete the history. Yeah, it's a mess.
Nir Valtmann 00:34:08 Funny point: when we run secret detection and mitigation with our pipelineless security approach, we have a patented way to mitigate that secret, and we provide three buttons to the developer. We say: either fix it for me, or I'll fix it myself, or dismiss, because maybe, I don't know, it's a private key that only you use in your unit tests, whatever, right? I can tell you, based on our analytics, we did not see any developer click on the "I will fix it myself" button twice, because that's exactly what you were doing. You had to go through that history and change it. It's not easy.
Priyanka Raghavan 00:34:46 Really not easy. Yeah, it's very messy. This actually leads me to one of my questions. I think I have seen a feature on GitHub, right? They have a kind of pre-commit check, I guess it's on the main branch, where they don't let you push if there's a hardcoded secret. So how would that be different from, say, the pipelineless approach? In fact, I remember there was this very interesting blog again on Arnica; it said zero new hardcoded secrets. So how does that effort look compared to, say, this GitHub push committer, I don't know what it's called, push pre-committer, I'm forgetting the name, but there's some way that you can avoid hardcoded secrets going into the code?
Nir Valtmann 00:35:23 So first kudos, you really did your homework.
Priyanka Raghavan 00:35:26 I mean apart from the homework, there's also been a lot of pain where I have actually committed, like I shouldn't be mentioning this on Software Engineering Radio, but I have actually committed secrets and I've tried like wiping it out and yeah, there's a lot of pain involved with the question that I'm asking. So,
Nir Valtmann 00:35:41 Okay, awesome. So I'll tell you the difference. The way that GitHub Advanced Security does that is that they have something that is called a pre-receive hook. A pre-receive hook is a known hook in the Git specs, essentially. And it means that before the push of that code is actually stored, persisted, they run certain checks, and then if it doesn't meet all of their checks, they can tell you, hey, you can't push that code. It's very similar to an access control check that says, hey, you can't push it now. Okay? But then when you have it, you get the message back as a developer, even with a custom message; it'll say you need to fix it, or this is the link to the guide in the company that tells you how to fix it.
Nir Valtmann 00:36:27 And then you go through the exact same process that you just mentioned: you need to go and remove it from the history. With Arnica you don't need to do that. With our approach, as we see the code that is being pushed, we overwrite it for you. And that's the main difference, because the developer experience is essentially our main strength within Arnica. Now I'm not saying that GitHub doesn't have a great developer experience; as a customer, it's a phenomenal developer experience. But at the end of the day, with that particular feature, you want to make it least painful for the developer, especially for such a sensitive issue, because the moment that the secret is in your repo, anyone else that clones that repo can get to that secret by just iterating through the commits or just going through the audit log of the commits. You'll be able to see it, right? So that's why we do everything at runtime and just overwrite it for the developer, and therefore you have that button that I mentioned that says "fix it for me," which does exactly what it should do. It fixes it for the developer while ensuring that no one else actually has a copy of that secret on their laptop. And if they do, we provide that insight as well.
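For contrast with the client-side hook shown earlier, a server-side pre-receive hook, the mechanism Nir describes, reads the pushed refs from stdin and can reject the push before it is persisted. An illustrative sketch follows; the secret pattern is an example only, and installing such hooks is specific to the platform (GitHub Enterprise Server, GitLab, a bare Git server, and so on).

```python
#!/usr/bin/env python3
"""Sketch of a server-side pre-receive hook that rejects pushed secrets."""
import re
import subprocess
import sys

ZERO = "0" * 40
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree
SECRET = re.compile(rb"AKIA[0-9A-Z]{16}")  # example rule only

blocked = False
# Each stdin line is "<old-sha> <new-sha> <ref-name>".
for line in sys.stdin:
    old, new, ref = line.split()
    if new == ZERO:
        continue  # ref deletion, nothing to scan
    # Brand-new refs are diffed against the empty tree instead of the old tip.
    base = old if old != ZERO else EMPTY_TREE
    patch = subprocess.run(
        ["git", "diff", base, new],
        capture_output=True, check=True,
    ).stdout
    if SECRET.search(patch):
        print(f"rejected: possible hardcoded secret pushed to {ref}",
              file=sys.stderr)
        blocked = True

sys.exit(1 if blocked else 0)  # a non-zero exit rejects the whole push
```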
Priyanka Raghavan 00:37:42 That's interesting. Let me just move on to some other questions. I want to talk a little bit about two areas. One is of course the false positives, right? All of these tools produce false positives. I'm assuming a pipelineless approach will also have the same problems?
Nir Valtmann 00:37:58 Yes, I wish I could say no, of course not, but…
Priyanka Raghavan 00:38:03 That’s an honest answer.
Nir Valtmann 00:38:05 Yeah, we have zero false positives. That’s how you know that we lie.
Priyanka Raghavan 00:38:09 Okay, okay. But then does it learn? The reason I asked that question was also because you said that things are customized to every developer, right? So then how does the intelligence build? Because with some of the other tooling, the false positives kind of reduce because you set up things as a profile, and that's applied across the entire project. Can you do the same thing with pipelineless? Does that make sense?
Nir Valtmann 00:38:33 So with the pipelineless approach, again, it depends on what you implement. You can have full scans, you can have partial scans and such, but at the end of the day you will have false positives no matter what you do. That's the nature of finding security vulnerabilities in code. And at the end of the day, if you provide that feedback very quickly to the developer, and that's essentially what we see, even if it's a false positive, it's like, okay, fine, I'm just going to say dismiss. Because it's not something that occurs that frequently with small code changes, and therefore when you save the code you can say, ah, that's not a big deal, I'll just dismiss it and we're good. And that's essentially the balance that you have, because if you provide all results every time, you'll create a lot of fatigue.
Nir Valtmann 00:39:17 If you provide it in small chunks, on something that developers already have context on and where they are the authors of that piece of code, it's very easy for the developer to say, you know what? It's not a big deal. Or, you know what, it is. Or maybe, you know what, I'll just ask to accept that risk and I'll send it to security. So it's a very iterative process, as opposed to: I just pushed code, now let me look at the pipeline, let's see that it's right there, let's see, it succeeded. You don't need to wait for all that. Just get it immediately and respond.
Priyanka Raghavan 00:39:49 The next question I wanted to ask you was about another thing that a lot of security teams grapple with: we have this list of to-do items and there's this big backlog, and then you have to go through that list and fix things, but nobody likes you, and the teams don't like you, and you feel bad yourself because you're always running around with the list, and eventually everything is forgotten even though it's in the backlog: yeah, I'll do it, we'll do it. And I wanted to know, with this pipelineless approach, because you're looking at smaller chunks of code and the feedback is immediate, do you think that this to-do list culture will go away?
Nir Valtmann 00:40:25 Actually, you just gave me an idea that might be really good for a pipelineless approach, not security, maybe pipelineless productivity, which is: when you look at the code changes and you see that the developer just authored a slash-slash to-do comment, go and create the Jira ticket, and that's it. Send the developer the link to the Jira ticket. Or maybe modify the code and add "this is the Jira ticket for that thing," and that's it. Create a commit on the code that says this is the Jira ticket for it.
Priyanka Raghavan 00:40:55 You mean you'd also provide an automatic fix for that, is what you're saying? Oh wow, yeah, that would really be great. But jokes apart, I just wonder, do you think this to-do list culture would sort of go away if the feedback is more immediate, or if it's provided in smaller chunks?
Nir Valtmann 00:41:15 I do see that the to-do list, or creating Jira tickets for new vulnerabilities, is something that some companies will still need to do because of compliance; because of whatever regulation they have, they would likely just want to have that tracked. However, I do see how the number of to-do issues is reduced, or automatically resolved, when it comes to real-time feedback to the developers. So think about maybe a developer; at least, I'll tell you how we would do that with Arnica. When we scan a new vulnerability, obviously it creates a new finding in our system, and we differentiate between a finding in a feature branch versus a finding in, let's say, an important branch. And what happens is that we're not scanning only for the bad stuff, which is the vulnerability. We'll also scan for the good stuff, which is the fix of the vulnerability. So if we detect that someone actually fixed the vulnerability and maybe merged it into the relevant branch, we auto-close the issue.
Nir Valtmann 00:42:11 I mean, you already have that with certain workflows, but it's mainly manual, right? When you create a feature branch, you can say I created a feature branch for that Jira issue, right? And then when you merge it, that issue is closed. But while you write code, it's not a new feature that you develop; it's part of the feature that you are developing that you introduce a new vulnerability, and therefore you can create a Jira ticket, but it's way easier just to resolve it in your feature branch. And that's the difference that I see between managing the issues that are more functional, let's say non-security-specific, or maybe product features, as opposed to fixing security vulnerabilities, which can have their own silo, as we discussed, privately and such. But you also want to have that capability without looking only at the merge of the code or having an issue open. You just want to scan the code and determine, hey, did that developer just fix it? Yes or no? If it's a yes, just close the issue and continue, without the need to have a project manager that will follow up with you every time on that.
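The auto-close behavior Nir describes could be sketched roughly as follows. The jira client library, the "Done" transition name, and the fingerprint-to-ticket mapping are assumptions for illustration; this is not Arnica's implementation, and the server and credentials are placeholders.

```python
"""Sketch: close tracked findings once a later scan no longer reports them."""
from jira import JIRA

client = JIRA(
    server="https://example.atlassian.net",       # placeholder
    basic_auth=("user@example.com", "api-token"),  # placeholder credentials
)

def close_resolved(open_tickets, current_fingerprints):
    """open_tickets: {finding_fingerprint: jira_issue_key} for one branch."""
    for fingerprint, issue_key in open_tickets.items():
        if fingerprint not in current_fingerprints:
            # The finding is gone from the latest scan of the branch, so the
            # developer fixed it; close the ticket without manual follow-up.
            client.add_comment(issue_key, "Fix detected by the latest scan.")
            client.transition_issue(issue_key, "Done")
```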
Priyanka Raghavan 00:43:16 Right. No, actually I think that's a useful feature. In fact, I built something similar, I mean only the Jira part of it, in a tool that I work on at my company, where when the fixes happen it automatically closes the Jira ticket, and I think that's super useful. So that would be really good. I want to move on to another two questions. Actually, these are two things that came out of a conference I attended yesterday, BSides Bangalore, where I was. One thing that came out of that was about developer fatigue, right? So the third-party libraries that we're trying to upgrade: usually that's not something that's very simple. Like if you look at the OSS supply chain attack vector, that's not very simple. Somebody in the audience asked the presenter whether, when you scan for third-party libraries, you can also say how exploitable a vulnerability is from the code you're using. Because if you're using a library and not using a method that has the exploit, then maybe you're okay. What are your thoughts on that?
Nir Valtmann 00:44:14 So we actually did quite a lot of analysis on that. If you go and parse all of the almost 300,000 CVEs that exist out there, and you actually look for the function names, even the word "function", I mean really do a comprehensive reading of all of the functions that are there, directly from the description or maybe even other resources that explain it, you'll see that less than 1.5% of the CVEs even have a function name to give you. Now take that 1.5% and take into account as well that not all of these function names are within that package. Some of them are within a sub-package that that package actually uses, and they just say that's a function name of a sub-package that is being fixed. Okay? So even that 1.5% is not all direct function names that you have.
Nir Valtmann 00:45:09 Now, why am I telling you this? Because I know that there is a hype around exploit reachability of that code, right? But it is a hype, and hype has a tip at the top and a tip at the bottom, and at some point it gets to that plateau. We're at that tip at the top, in my mind, where the question that you ask is actually being asked by a few others: hey, could you provide me with that reachability? And the simple answer is, yeah, it's a hype and it'll likely reduce some of the vulnerabilities, but you will get a ton of false negatives. And the question is, do you want to focus, or do you want to miss? That's the balance that you need to make. We have a different approach to looking at that, which I can't expose yet, that will not leave you at that 1.5% and will not leave you at the place where you don't have anything. So we're working on something in the middle that will actually give you a bit more context and confidence about that particular package that is being used. We're not reinventing the wheel; it's actually something that we're taking from a well-known capability in a specific language that came from the world of performance, and kind of shifting that into a security concept that will give you very similar answers.
Priyanka Raghavan 00:46:32 After this interview, I'll use my passive recon skills and try to figure that out. So, okay.
Nir Valtmann 00:46:37 And this is why I gave you the hint because I figured out that you might want to look at that.
Priyanka Raghavan 00:46:42 Okay. Okay. The other question which I wanted to ask you, which I also heard at BSides, was: we talked a lot about GitHub repos, and there is a lot of attack vector from GitHub applications being exploited. In fact, I think there are some things even in Arnica's blog about this. What do you think about this attack vector, and are there special ways to protect GitHub? Because you talked a lot about misconfiguration when we started the show, so are there any tips, or does your company provide that kind of a starter sort of to-do list? Sorry, I shouldn't be saying to-do, but yeah, a template I mean.
Nir Valtmann 00:47:18 Yeah, it's hard to say whether it's going to be a to-do list or not, but at the end of the day, every developer has their own behavioral pattern, and some developers behave in a very specific way, maybe in their Git clones. It can be that way. So I know that you might look at that problem by just streaming your logs into your SIEM and asking the SIEM, hey, give me every developer that cloned more than, I don't know, 30% of my repos in the last two hours, or within two hours. You can definitely get that, and you might end up with some good results. But then what happens with, you know, maybe build agents? They can also clone 50% of the repos in the same hour. And then you start introducing false positives with automation. I mean, the lean solution is, you know, go and send it to the SIEM and do some statistical analysis, and you can live with these false positives.
Nir Valtmann 00:48:12 Another way to solve that is through machine learning. This is the way that we're looking at it. We're actually building clusters of behaviors for each identity, and based on that, we can determine which behavior is more anomalous for that developer. It can be the Git clones; it can even be your commits in the code. Because in our case, the way that we're thinking about this problem is that, of course, someone can exfiltrate source code, it is a risk, but in the grand scheme of things we're looking at either insider threat or account takeover, right? And if it's an insider threat, it might be a piece of code that you typically don't write, that seems not to belong to you, not to belong to the repo, something odd, and that's that type of attack scenario. Or maybe it's just an account takeover and someone just tries to clone your repos or get your secrets from the repos and such. So these are the types of areas that we're looking at with machine learning. I mean, as I said, some of that you can implement with a good SOC analyst. Others are way more difficult to implement, like coding styles and such.
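The statistical-analysis option Nir mentions for clone behavior can be sketched with a simple per-identity baseline; this is illustrative only, and a real system would also need to exclude build agents and other automation, as he notes.

```python
"""Sketch: flag an identity whose clone volume is far above its own history."""
from statistics import mean, pstdev

def is_anomalous(history, current, min_sigma=3.0, min_clones=10):
    """history: clone counts per past window for one identity."""
    if current < min_clones or len(history) < 5:
        return False  # not enough signal to judge
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return current > mu * 3
    return (current - mu) / sigma >= min_sigma

# A developer who normally clones one to three repos per day suddenly clones 40.
print(is_anomalous(history=[2, 1, 3, 2, 1, 2], current=40))  # True
```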
Priyanka Raghavan 00:49:20 Okay, that's good. I have the last few questions; I think we're towards the end of the interview. One, since we talked about security misconfiguration, can you tell us a little bit about this project called GitGoat that Arnica has?
Nir Valtmann 00:49:32 Oh yes.
Priyanka Raghavan 00:49:33 Yeah. What is it?
Nir Valtmann 00:49:34 So when I said that I like to develop code, this is actually what I wrote with my Arnica hat on. This project essentially helps you to test various tools when you try to protect your supply chain, or software supply chain. What GitGoat does: it can be a Docker container, or it can be a small script that you just give a personal access token to GitHub and an org that you just created; it can be an empty org. You can even create an enterprise trial and spin that up. And then what GitGoat does is, first, it invites users to your org. It misconfigures multiple repos. So some repos will have code owners maybe without enforcement, some of them will have excessive permissions, some of them will maybe be stale repos. So you'll have different types of repos that might be misconfigured.
Nir Valtmann 00:50:25 And then what we do on top of that is go to different open-source projects that we know are vulnerable, like WebGoat, and we clone them. But since we care a lot about the commits of the code, and we want to associate a commit with someone that is actually in the company, a piece of code that I'm really proud of is the piece that actually goes and rewrites the Git history and recommits the code as the individuals that are invited into your org. And therefore you may see an open-source project where the author is not from the open-source project but from your org, and now you can work with that. So that's what it creates. Not only that, one of the nice things that it does as well is simulate different behavior for each of those users. So for example, some users could be just, you know, writing into feature branches, and then they create a pull request which is approved by someone else from the org.
Nir Valtmann 00:51:18 All of that process is also included in the same script, and obviously some of the pull requests are not approved. So if you want to check different areas of the misconfigured org, you can do that. And needless to say, it also has hardcoded secrets in it. So if you have any secret detector, you can scan it. If you have SAST, you can scan that org with SAST, or with any infrastructure-as-code vulnerability scanners, or software composition analysis scanners. Essentially, you have enough options that you can run scanners on that org and you'll get a variety of results.
Priyanka Raghavan 00:51:51 Okay, interesting. So I'll definitely put a link to that in our show notes, and then, yeah, people can contact you. The last question I have for you, because we have actually talked a little bit about how Arnica works and the pipelineless implementation that you have, and you did say that it's sort of a webhook, right, that listens to your push events: I also wanted to ask you about the licensing. How does that work? Do you use open-source tools for doing all of these different parts of your tool chain? How does that work for consumers to pay for this?
Nir Valtmann 00:52:23 Yeah, so the way that we built the value proposition of Arnica is through some bad experience that I had in the past. The thing is that I've been a buyer of SAST, SCA, IAST, you know, API security and such. And based on that experience, I acquired quite a lot of dashboards. And dashboards showed me the risks, but in many cases, in all cases really, they did not interact with the developers in the right way, or did not really help me to mitigate the risk or risks that were presented, put aside even prioritization and finding the right owners for each product. So the way that we built Arnica is, we said, you know what? I'm not a big fan of hiding risks behind a paywall. So therefore, what we said is: anyone that integrates with Arnica, as small or as big as the company might be, gets visibility for free, unlimited time, unlimited users.
Nir Valtmann 00:53:22 So even if you have an org, let's say a GitHub org or an Azure DevOps org, whatever org, with 5,000 developers, it's free, really free. We charge for automation; we charge for what we can fix. So if you're up for viewing dashboards and you're not willing to take action on them, then you probably will not renew with Arnica next year, so I don't want you to pay. But if you are willing to take the challenge to talk with developers, to get them to mitigate risks, to see your risk trend going down, that's where we excel and that's what we charge for, with different tiers within Arnica. And needless to say, all of that experience, even onboarding, is completely self-service. So anyone can try it out. The only limitation that we have, and it's a known limitation that we designed this way, is that, for example, if it's a GitHub org, you cannot install it on a personal org. You can install it only on an org that you create. And the reason is that at this point, we're really focused on the organizations that want to onboard into Arnica, and there are some additional complexities that need to be considered if you are looking into personal repos, not in the Git repos themselves, but in other areas that are not necessarily important for now.
Priyanka Raghavan 00:54:43 Okay. Good to know. But anyway, this was great, Nir. It’s been a very engaging conversation and I just have to ask you, the very last question before I let you go is, what’s the best way that people can reach you?
Nir Valtmann 00:54:54 Well, give me a phone call. No, but more seriously…
Priyanka Raghavan 00:54:55 No, I mean, it's like any of the social media sites, whichever you want, okay.
Nir Valtmann 00:55:01 No, no, I'm kidding. I'm kidding. But more seriously, I'm on LinkedIn, so I try to be as responsive as I can, or go to Arnica.io; we have a chat there. You can interact within Arnica's app or within Arnica's website. And I mean, literally, we have almost the entire company listening to new communications coming through that chat, and the first person that is available responds. I am actively monitoring that channel; actually, I'm monitoring that channel more than I'm monitoring my LinkedIn. So try to do both. Connect with me. I hope I'm friendly enough to the people that connect with me; I would be happy to get any feedback on the episode, on the website, on the messaging, on the product. We're really thirsty for and really obsessed with feedback on our product.
Priyanka Raghavan 00:55:55 That's great. I think it's been quite a learning experience for us also to learn about the concept of pipelineless. So thank you for that. And this is Priyanka Raghavan for Software Engineering Radio. Thanks for listening.
[End of Audio]