Open source developers Jordan Harband and Donald Fischer join host Robert Blumen for a conversation about securing the software supply chain, especially open source. They start by reviewing supply chain security concepts, particularly as related to open source, and then explore: package managers and open source; attacks on open source projects; challenges in validating software that is written outside of the organization; security standards in open source projects; best practices in open source; validating compliance of open source; automating the consumption of open source standards; scanners; and generating useful information from software bills of materials (SBOMs).
- Jordan Harband’s GitHub
- OpenSSF announces efforts to secure the supply chain
- NIST Cybersecurity Supply Chain Risk Management Practices for Systems and Organizations
- NTIA Minimum Elements of a Software Bill of Materials
- OpenSSF security mobilization plan
- Tidelift 2022 open source software supply chain survey report
- A people-centric way forward for the open source software supply chain by Donald Fischer
- Log4j, Open Source Maintenance, And Why SBOMs Are Critical Now
- IconBurst NPM software supply chain attack grabs data from apps and websites by Karlo Zanki of ReversingLabs
- The Octopus Scanner Malware: Attacking the open source supply chain by Alvaro Munoz
- Details about the event-stream incident from npmjs.org
- What is a supply chain attack? on wired.com
- Preventing malicious packages and supply chain attacks with Snyk
- Detect and prevent dependency confusion attacks on npm to maintain supply chain security
- Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies by Alex Birsan
- 489 Sam Boyer on Package Management
- 338 Brent Laster on the Jenkins 2 Build Server
- 498 James Socol on CI and CD
- 416 Adam Shostack on Threat Modeling
- 385 Evan Gilman and Doug Barth on Zero-Trust Networks
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Robert Blumen 00:00:17 For Software Engineering Radio, this is Robert Blumen. I have with me today two guests. Donald Fischer is the co-founder and CEO of Tidelift. He’s a board member of several companies and organizations and is a graduate of Stanford University, where he received his MS in Computer Science. Donald, welcome to Software Engineering Radio.
Donald Fischer 00:00:40 Thank you. Very glad to be here.
Robert Blumen 00:00:42 Donald, would you like to say anything else about your background?
Donald Fischer 00:00:45 Yeah, I think you encapsulated it pretty well. I’ve been fascinated by software, in particular open-source software and the creators and communities behind it, for most of my career, and I’m excited to talk about some of that here with you today.
Jordan Harband 00:01:19 Thank you. Happy to be here.
Robert Blumen 00:01:21 Would you like to expand on your background at all?
Jordan Harband 00:01:24 No, I think you’ve covered it pretty well. Thank you.
Robert Blumen 00:01:27 Let’s proceed to the content. We will be talking today about securing the software supply chain. This episode goes very well with episode 535, which was about software supply chain attacks but did not offer solutions. Today we’ll be looking more at solutions. Donald, keeping in mind that we had a whole episode on this, could you give us a quick review of what we mean by the software supply chain?
Donald Fischer 00:01:56 Yeah, absolutely. The so-called software supply chain is really anything that affects your software at any point in its development and release, including the original creation of the software, writing it, and the continuous integration and continuous deployment pipeline that it goes through: all of the channels through which your software flows from the moment of creation into production. That includes a whole bunch of different systems and sources of software. In particular, one of the areas that’s been in the spotlight quite a bit recently is third-party open-source software, which is one of the key ingredients that goes into most applications these days.
Robert Blumen 00:02:36 As I mentioned, we covered a lot of attacks on the supply chain in 535. It became clear that supply chain attacks are not any particular type of attack, like, say, buffer overflow or escalation of privilege. They could be any type of attack that targets any component of the supply chain. How can we talk about securing the supply chain when it’s not any particular type of attack that we’re defending against? Jordan?
Jordan Harband 00:03:07 Yeah, I think it’s the same as adding people to your company: every time you do, you’re incurring some trade-offs. You have more eyes on your code base, more people who can test it, more people who can make sure it’s of good quality. You also have more people who could make mistakes, who could get a laptop stolen at a coffee shop, who could fall on hard times and get bribed to betray the company in some way. You also have more people who could potentially be enemy state actors, right? That happened at Twitter on one or two occasions. But these are all trade-offs as you add more people, right? And it’s the same with adding dependencies to your supply chain. Each dependency is one or more people who maintain it, and there are huge benefits to adding them: there are more eyes on the software, there’s more experience involved, and you have specialists who can do the specific task they’re charged with exceedingly well. You also have more points of failure. And I think that securing your supply chain is really a balancing act: how you accept the large benefits of adding people to your process while still managing the weakest link in any process on the planet, the humans.
Robert Blumen 00:04:23 While I was researching this interview, I came across a quote from Beau Woods of the Cybersecurity and Infrastructure Security Agency. He says that software supply chain issues are more organizational than technological. Does either of you agree or disagree with Beau Woods on that?
Donald Fischer 00:04:43 I think the people part and the technology part are co-equal in this question, right? It’s really both people and technology that play a role here. And that’s true not just on the consumption side of software, meaning the software developers who are assembling existing code and authoring new code to build an application for a business purpose. It’s also true on the creation side of the components that flow into those applications. What is often left underappreciated, including recently, is that there are humans behind these open-source projects that we rely on. So we can’t solve all of the problems just by throwing more technology and software and tools at them. There’s also a human element, a people process, and a human incentivization and alignment challenge that we need to grapple with.
Robert Blumen 00:05:35 Jordan, do you have anything to add to that?
Jordan Harband 00:05:37 Yeah, I totally agree with that sentiment. My earlier comments could easily be interpreted as saying that the humans are the only thing you have to be concerned about. I was adding a little hyperbole, because I think the humans are the primary under-addressed part, and that needs to be compensated for by overdoing it in the other direction. But obviously you cannot ignore the moving parts of the technology involved. It takes a lot of effort to consider both of those things, invest in them appropriately, and have the appropriate checks and balances on both sides. I think that’s the challenge ahead of us all: how to balance those.
Robert Blumen 00:06:16 Jordan, suppose you are looking at an organization, and they tell you they’ve set up a Jenkins build server, they’re running a bunch of builds, and they haven’t really thought about security. Do you have a checklist, or a place to start? What does that organization need to do to improve its situation?
Jordan Harband 00:06:34 Your question omits a lot of information I would need. I would want to understand the entire process that leads to code arriving in that build server: who has permissions to do what, and who has to review it? What are the gating procedures that make sure the code has moved through this pipeline appropriately? What are the remediation paths if problems are discovered at any point in the pipeline? Can it be rolled back? Do you file a ticket? Who do you notify? Once all of that is in order, then you get to the build system, and the question is: if somebody were to compromise the build system, what could they do? Ideally, the only things they could do are make network requests and perhaps build a corrupted or malicious artifact to deploy. So the next set of questions would be the same chain-of-custody and permissions questions around the build environment. Who has the ability to configure and verify it? And even though the code that comes in is ideally trusted, how do you make sure that it cannot exploit any vulnerabilities in the build environment to produce harmful network requests, file system access, and so on?
Robert Blumen 00:07:47 So we’ve been talking at a pretty high level. I’d like to drill down into some particulars, starting with the role of open source. It’s pretty much impossible now to do anything without pulling in probably hundreds of open-source libraries. Is open source a major vector for supply chain attacks?
Donald Fischer 00:08:10 It’s certainly a vector for software supply chain attacks. Now, open-source software is still just software, right? So it’s going to suffer from a lot of the challenges that any software, including private, proprietary software, would face in terms of secure development, secure release, and secure handling. And in a lot of ways, open-source software has some advantages over privately built software. There’s the many-eyes principle: it’s been said that with many eyes, even obscure bugs will be found. And there’s inherently more transparency around open source than around your garden-variety internal proprietary software project. But the flip side is that popular open-source frameworks and components are so widely used in our applications that the blast radius of an incident can be really vast. We’re frequently talking about open-source packages that are downloaded and incorporated into application builds billions of times a month, right? So one of those packages being compromised in a supply chain attack can have really far-reaching implications. I think that’s why a lot of the conversation around software supply chain security has recently focused on this part of it: the open-source software supply chain security challenge.
Jordan Harband 00:09:40 Yeah, I would add that in the same way that a security attack can compromise things, a bug can cause problems too, right? If I work at just one company and I make a mistake, I’m probably just going to break that company’s website, say it’s a web company. But as we’ve seen with AWS, for example, when AWS goes down, a lot of websites go down. When you have a shared piece of infrastructure and that shared piece has problems, the impact spreads much more widely. It’s the same with open source: an innocent bug can have a much wider effect than one might expect. One time I refactored a package and changed the text of an error message to make it more helpful.
Jordan Harband 00:10:30 I had no idea that some very popular packages were relying on the text of that error message in their code paths. So I broke Angular and Ember users all over the planet by changing the text of an error message. Luckily, I was able to find and fix it within a matter of hours, but that wasn’t an attack, it was just a mistake, and it still caused widespread damage. Donald phrased it as a vector. When anyone says something is a major vector, are they talking about the scale of its reach, or about it carrying outsized risk, more than it “should”? Because open source is used so widely across many different industries, companies, and code bases, it is a major vector in terms of scale. I don’t believe it to be an outsized vector, but obviously that part would be debatable.
Robert Blumen 00:11:25 In fairness, “major” is a vague term. Let me describe a little more of what I was thinking. When you run a package manager, it pulls in packages that you know about, but you didn’t write them, and you probably haven’t done a thorough audit of them. Then it pulls in a ton of other dependencies, which pull in other dependencies, and by the time you’re done, you may have hundreds or thousands of packages. If you looked at the list of all the dependencies, there are probably some where you don’t even know why they’re there, and you didn’t write them either. Even if you could control everything that happens within your own organization, which I’m not sure you can, you certainly can’t control everything that happens outside of it. So you’ve imported an enormous amount of code you didn’t write, and that sounds like there’s some risk involved.
Jordan Harband 00:12:15 Well, I think you just nailed the problem, right? Even within your own organization, especially as it grows, you can’t control all the code written within it. Take the people deploying facebook.com: I am quite sure that all of the internal code used on it was written by people across the company, some of whom may not even work there anymore, and who knows how thoroughly all of it has been audited. It’s the same for any company, Google or IBM or whatever. Once you acknowledge that you can never reliably audit all of your code, whether it’s internal or external, the fact that some of it is external no longer differentiates anything. You then need to design your processes around the reality that you cannot audit all of your code, instead of around the fear that you haven’t. And once you do that, you’ve solved that problem for both internal and external code.
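The fan-out Robert describes, where a few direct dependencies expand into many transitive ones, can be illustrated with a breadth-first walk over a dependency graph. This is a minimal sketch: the graph and package names are hypothetical, and a real package manager would resolve versions from a lockfile or registry rather than from a hard-coded dictionary.

```python
from collections import deque

# Hypothetical dependency graph: each package maps to the packages it
# directly depends on. The names are made up for illustration.
DEPENDENCIES: dict[str, list[str]] = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["router", "template-engine", "logger"],
    "router": ["path-parser"],
    "template-engine": ["html-escape", "path-parser"],
    "logger": ["color-output"],
    "path-parser": [],
    "html-escape": [],
    "color-output": [],
}


def transitive_dependencies(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every package reachable from root, excluding root itself."""
    seen: set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen or pkg == root:
            continue  # shared dependencies (and cycles back to root) count once
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


if __name__ == "__main__":
    direct = set(DEPENDENCIES["my-app"])
    everything = transitive_dependencies("my-app", DEPENDENCIES)
    print(f"direct: {len(direct)}, total: {len(everything)}")
    print("pulled in indirectly:", sorted(everything - direct))
```

Even in this toy graph, two declared dependencies bring in seven packages, five of which the application never names.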
Robert Blumen 00:13:11 Donald, do you have something to add?
Donald Fischer 00:13:13 I would just add to what Jordan said that it certainly is not possible for every organization to audit every line of code, continuously, all of the time, as applications are being assembled. However, there are well-known, well-established principles for secure software development that can be applied internally to the code written in an organization. One of the opportunities coming to the forefront in recent conversations on the open-source side is to establish what those standards and processes should be for third-party open-source packages, then provide a way both to communicate them to the open-source creators we rely on and to verify that those practices are being followed, and, really importantly, to create an incentive as well: a recognition of the work being done that we all get to rely on. In a lot of ways, that means taking the process and standardization designed for software development within an organization and extending it in a useful way to these third-party, independently authored open-source projects.
Jordan Harband 00:14:24 I agree with your nuance. There was an idea floating around the internet a few years ago that you should design your program to crash. Meaning, instead of trying to make a program that never crashes, you should design one that is always saving its state and can always be reopened and restore itself. What I like about this philosophy is that it accepts the reality that all programs will crash and designs processes around that reality, so that a crash is not a big deal: you just reopen the program. Similarly, it’s not that code is un-auditable, it’s that you’re never going to read every line of it. So how do you design your processes around that, assuming that code will be compromised and that there will be bugs? Secure processes, immutable package repositories, and public databases of checkpoints and attestations made about individual immutable packages are all really wonderful ways to design around the reality that every bit of code, no matter its source, will eventually have bugs or become a security attack vector.
Robert Blumen 00:15:34 Where you might be going with that, Jordan, is a well-known security concept called defense in depth, which is the idea of having multiple layers of security. For example, if someone gets into your network, they don’t have root on all your machines; if they get onto 20 of your machines, they don’t have credentials for all the databases, and so on. The entire system should not be compromised if one layer fails. Is that more or less where you’re going with that observation?
Jordan Harband 00:15:59 Yeah, I think so. I’ve also heard that described as moat-based security, where you have a ring of moats, and each one that attackers get past doesn’t give them the entire castle. It just gives them a slightly different advantage.
Robert Blumen 00:16:11 There are a couple of directions we could go with this discussion of auditing and how you manage large amounts of code that you didn’t write. One of them is scanning, and I do want to talk about that, but right now we’ve been talking about open source. There are some efforts going on in the community by organizations such as the Linux Foundation and the Open Source Security Foundation to improve the overall security profile of open-source software. Is either of you familiar with any of those efforts? Donald?
Donald Fischer 00:16:44 Yeah, in fact, I think we’ve both had some engagement with some of those efforts. You mentioned the Open Source Security Foundation, or OpenSSF. That’s an industry collaboration in which Tidelift (my company) has been one of the participating entities that gathered to work on some of these challenges. OpenSSF has a really useful umbrella of projects, some of which were pre-existing and adopted into the effort, and some of which are novel. One that I think has been particularly helpful in this conversation is the OpenSSF security scorecards project. What security scorecards is about is essentially agreeing on some of the practices that we would all like the creation of our software to adhere to, right? Whether it’s the security of the systems used to host or package the code, specific secure coding practices, or different kinds of preventative analysis that can be undertaken on open-source code, such as fuzzing, that is, supplying it with random inputs to find potential avenues of vulnerability.
Donald Fischer 00:18:01 The security scorecards project has been great because it’s a way to catalog or inventory the wish list of what we would like all of our software, in particular open-source software projects, to follow. An important part of the challenge is just agreeing on what our desired state would be. Then the opportunity in the industry is: how can we cause that to happen, right? How can we get the right folks together? That includes the companies and organizations that are predominantly authoring or contributing to standards like the OpenSSF security scorecards, but also the independent creators, folks like Jordan, frankly, who are authoring a lot of those open-source projects outside of a traditional corporate context. We need to meet in the middle and convene these two audiences if we’re going to turn this from a wish list of attributes we wish software had into a list of attributes that we know software has, because we’ve engaged with the actual creators.
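One concrete form of the immutable-package idea Jordan mentions is pinning each dependency to a cryptographic digest and refusing any artifact whose bytes don’t match. The sketch below assumes a Subresource-Integrity-style string (`sha512-` followed by a base64 digest), the same shape npm records in the `integrity` field of `package-lock.json`; the function names are hypothetical, and this is not npm’s own implementation.

```python
import base64
import hashlib
import hmac

SUPPORTED = {"sha256", "sha384", "sha512"}


def sri_digest(data: bytes, algorithm: str = "sha512") -> str:
    """Compute an SRI-style string such as 'sha512-<base64 digest>' (hypothetical helper)."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(data: bytes, expected: str) -> bool:
    """Check downloaded bytes against a digest recorded earlier, e.g. in a lockfile."""
    algorithm, _, _ = expected.partition("-")
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    # compare_digest does a constant-time comparison of the two ASCII strings
    return hmac.compare_digest(sri_digest(data, algorithm), expected)


if __name__ == "__main__":
    tarball = b"example package tarball"
    pinned = sri_digest(tarball)  # recorded once, when the version is first vetted
    print(verify_integrity(tarball, pinned))
    print(verify_integrity(tarball + b"tampered", pinned))
```

Because the digest is recorded once and checked on every install, a registry or mirror that later serves altered bytes under the same version number is rejected rather than trusted.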
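A machine-readable list of practice checks lends itself to automated gating. The sketch below assumes per-check scores on a 0 to 10 scale, with check names modeled on some OpenSSF Scorecard checks; the `POLICY` thresholds, the input dictionary, and the function names are hypothetical and do not reflect Scorecard’s actual output format.

```python
from dataclasses import dataclass

# Hypothetical organizational policy: minimum score (0-10) required per check.
# Check names mirror some OpenSSF Scorecard checks; thresholds are made up.
POLICY: dict[str, int] = {
    "Code-Review": 7,
    "Branch-Protection": 5,
    "Fuzzing": 1,
    "Maintained": 5,
}


@dataclass
class PolicyResult:
    passed: bool
    failures: dict[str, str]


def evaluate(scores: dict[str, int], policy: dict[str, int] = POLICY) -> PolicyResult:
    """Compare a project's per-check scores against the minimums in policy."""
    failures: dict[str, str] = {}
    for check, minimum in policy.items():
        score = scores.get(check)
        if score is None:
            failures[check] = "no data"  # missing evidence counts as a failure
        elif score < minimum:
            failures[check] = f"{score} < {minimum}"
    return PolicyResult(passed=not failures, failures=failures)


if __name__ == "__main__":
    result = evaluate({"Code-Review": 9, "Branch-Protection": 8, "Fuzzing": 0, "Maintained": 10})
    print(result.passed, result.failures)
```

Treating missing data as a failure is a deliberate choice: some attributes can’t be inferred from the code alone and need input from the project’s maintainers.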
Robert Blumen 00:19:05 Suppose the industry could agree on what these attributes are. Would there be a standardized way for an open-source project to either get certified or advertise metadata saying: I have been fuzzed, my code is on a secure server, my code has been audited, it has passed this and this and this? That metadata could be read by machines, which would enable you to score your own code base.
Donald Fischer 00:19:30 Yeah, well, the OpenSSF security scorecards project does include various ways for some of these attributes to be communicated in machine-readable formats. But some of them require a human attestation about what’s happening. That attestation can be digitized, but it can’t necessarily be determined without the involvement of the creators. And I think this is an important point that gets to your defense-in-depth argument earlier. The traditional approach in the industry has been very much a downstream, tools-based approach. It’s been centered around software tools that we can apply against either the code being written inside an organization or the packaged, third-party open-source code that we import, looking for known vulnerabilities that have an identifier, et cetera. But there’s only so much that a downstream tool like that can do by looking at just the code itself, without a connection back to the moment of creation: the human or group of humans who authored that code and know all the surrounding context about how it came to be and what standards and processes were being followed.
Robert Blumen 00:20:50 Do you have any thoughts on how to incorporate that human context into the broader supply chain?
Donald Fischer 00:20:56 Yeah, this is one of the challenges that we’re working on at Tidelift: creating an incentive mechanism that makes it clear to those independent maintainers what we would ask of them. And rather than just placing yet another obligation on the folks who are creating the code that we rely on in our enterprise and organizational applications, we’re also providing an economic incentive to undertake this non-trivial work of making sure these practices are followed and documenting how they’re being followed, through this combination of machine-readable and human-validated mechanisms. I think Jordan could probably speak to this as well.
Jordan Harband 00:21:41 Yeah. Without speaking to the merits of any particular system, the world is largely capitalist. It’s not pure capitalism, we have regulation, but that means there are really only two mechanisms to exert leverage in the world: capital and the law. So if we want something to happen, like if we want open-source software to be more secure, those are the two mechanisms. Spend money, put money towards it, and/or regulate it and provide incentives or penalties to make what we want happen. The way the tech industry has been treating open source for decades has been essentially as a source of free labor. But they keep wanting things from the open-source ecosystem: we want you to make your software secure, we want you to fix bugs, we want you to release in a timely fashion, and so forth. Well, inject capital. That’s where we’re at. If you don’t like that as a mechanism and you don’t want to lobby for regulation, then vote for a different economic system. Similarly to my earlier comments, we need to acknowledge the reality that we’re in, and the reality we’re in is capitalism. So spend your capital to make it happen.
Robert Blumen 00:22:54 Any improvement we might talk about, Jordan, at some point involves someone having to do it, or it involves resources like servers and compute time. Your point is that those resources have to come from somewhere, and the people providing their time have other things they could be doing. So if there’s funding to make these improvements, that would encourage them to be done.
Jordan Harband 00:23:16 Yeah, and I can speak to myself personally as it relates to Tidelift as a funding source. I am hopefully following all the best practices of software maintenance for all of my packages, and I’m doing that largely out of a sense of ethics and ideology. However, if at any point in the future I decided those were no longer enough for me, that would suck for the entire ecosystem that depends on my packages. The best way to make sure those incentives remain, even if my mind changes, is again capital. If I have a source of income related to it, I’m going to think a lot harder before changing my mind on a whim about ethics and morals and whatnot. Again, acknowledging the reality we’re in, at least in the US, money is required for healthcare, for raising your children, for having a home, for buying food. These things are pretty important. So if you want to make sure that something happens, you tie those things to that thing happening, and money is the glue that ties them together.
Robert Blumen 00:24:24 I’m going to tie up this discussion about open source and move on to something different. We have been addressing how you gain some understanding of, or mitigate the risk of, large amounts of code that maybe you wrote or maybe other people wrote. There are different types of automated scanning that can be put into the pipeline. Would either of you like to address the role of automated detection in preventing exploits from being injected into the supply chain? Donald?
Donald Fischer 00:24:58 Sure. There’s definitely an important role for different types of application security tools that scan for vulnerability patterns or known vulnerabilities using different approaches. There are a few subcategories of that class of tool or product. For example, there’s the long-established category of static analysis tools, which look at the structure of software programs for coding patterns that can suggest security vulnerabilities. A second class of products, commonly called dynamic scanning tools, essentially run different kinds of exploitation paths against whole applications or parts of applications, often following common exploit paths like cross-site scripting. And there’s a third category, pertaining to open-source software in particular, that’s had a lot of adoption in recent years: software composition analysis. These tools help you look at all of the ingredients flowing into your application and cross-check them against lists of known vulnerabilities, often drawn from the National Vulnerability Database: the security vulnerabilities that people know by their naming convention, the CVE vulnerabilities.
Donald Fischer 00:26:27 All of those are, I think, great tools that are well worn and time-tested as part of your defense-in-depth approach. But I think the current moment is demonstrating to us that they’re not the sum total of everything we could be doing around software supply chain risk management. There are other things we can do. One that I’m personally very fascinated by is how we can add a layer of proactive, upstream engagement with the open-source creators behind our projects, agreeing in advance on the practices we would like to see followed, and engaging in the process of creating and releasing the software before it even flows into our organization. I saw that echoed recently by a senior bank executive speaking just last week, who said their philosophy at this global bank is that the best way to deal with security vulnerabilities is to head them off in the first place, not just to have the fastest scanner. If you’re relying only on reactive scanning tools, there’s always going to be a window of vulnerability, no matter how quickly you move. But if you can head off the issue entirely, using a combination of tools like static analysis and dynamic analysis and, importantly now, agreeing on the practices that need to be followed before the software flows into your organization, that’s really the best solution.
Robert Blumen 00:27:57 If I understand correctly, you’re talking about open-source packages adopting not only scanners but other security practices on their own, and then there’d be some way of communicating that the project has adopted those practices. So you know that you’re pulling in only projects that adhere to a certain level of best practices. Is that more or less what you’re talking about?
Donald Fischer 00:28:23 Yeah, and that connects back to what we were talking about earlier: the lists of standards that folks would like their software to adhere to, like the OpenSSF security scorecards project. But not only that; there are now requirements coming not just from industry but also from government. Jordan talked a little bit about the role of capital and law, and law is showing up in this conversation recently as well. For example, the National Institute of Standards and Technology introduced the Secure Software Development Framework, SSDF, which is binding on a large number of US federal government agencies in terms of the software that they develop and the software that they acquire and use. So I think it’s a combination of both of those incentives, or methodologies, coming to bear on this problem space.
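The software composition analysis step Donald describes amounts to matching each resolved dependency version against known affected version ranges. This is a minimal sketch: the advisory feed, its `EXAMPLE-` identifiers, and the package names are invented for illustration, and the version parser handles only plain dotted numbers, not pre-release tags or ecosystem-specific version schemes.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Parse '2.14.1' into (2, 14, 1); trailing zeros are dropped so '1.4' == '1.4.0'."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    package: str
    introduced: str  # first affected version (inclusive)
    fixed: str       # first fixed version (exclusive upper bound)


# Hypothetical advisory feed; real tools draw on sources such as the NVD.
ADVISORIES = [
    Advisory("EXAMPLE-2022-0001", "json-parser", "1.0.0", "1.4.2"),
    Advisory("EXAMPLE-2022-0002", "http-client", "3.0.0", "3.2.0"),
]


def find_vulnerable(deps: dict[str, str], advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (package, version, advisory_id) for each dependency in an affected range."""
    hits = []
    for package, version in sorted(deps.items()):
        v = parse_version(version)
        for adv in advisories:
            if adv.package == package and parse_version(adv.introduced) <= v < parse_version(adv.fixed):
                hits.append((package, version, adv.advisory_id))
    return hits


if __name__ == "__main__":
    resolved = {"json-parser": "1.4.1", "http-client": "3.2.0", "logger": "0.9.0"}
    for package, version, advisory in find_vulnerable(resolved, ADVISORIES):
        print(f"{package}@{version} is affected by {advisory}")
```

Note what this kind of check cannot do: it only flags issues that already have an advisory, which is the reactive window Donald contrasts with upstream prevention.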
Robert Blumen 00:29:20 And what are some of the major elements of this framework?
Donald Fischer 00:29:23 So the Secure Software Development Framework, it’s pretty far-reaching in terms of different, some of the things we’ve been talking about, like different tools that can be applied to analysis of software. Also the practices in terms of the security of the systems on which software is developed and released. There’s requirements around documentation and reproducibility of software so that you can know that there’s, you’re going to get the same result every time and that you can recreate software reliably. So it’s pretty far reaching and actually pretty intimidating to your average software developer, whether that’s somebody building an application in a downstream organization, much less essentially a volunteer independent creator working on an Open-Source project in the open-source community.
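The reproducibility requirement Donald mentions, getting the same result every time, can be illustrated with a toy packaging step that normalizes common sources of nondeterminism (file order, timestamps, ownership) and then compares digests of two builds. This sketch assumes the “build” is simply archiving files; it is not drawn from the SSDF text, and real reproducible builds must also pin toolchains, environments, and dependency versions. The function names are hypothetical.

```python
import hashlib
import io
import tarfile


def build_archive(files: dict[str, bytes]) -> bytes:
    """Package files into an uncompressed tar with normalized metadata.

    Sorting entries and fixing timestamps and ownership removes common sources
    of nondeterminism, so identical inputs yield byte-identical output.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # stable order regardless of input order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0          # fixed timestamp instead of "now"
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def is_reproducible(files: dict[str, bytes]) -> bool:
    """Build twice (second time from reversed input order) and compare SHA-256 digests."""
    first = hashlib.sha256(build_archive(files)).hexdigest()
    second = hashlib.sha256(build_archive(dict(reversed(list(files.items()))))).hexdigest()
    return first == second


if __name__ == "__main__":
    print(is_reproducible({"b.txt": b"world", "a.txt": b"hello"}))
```

If `build_archive` stamped each entry with the current time instead of a fixed `mtime`, two builds a second apart would hash differently, and nobody could independently confirm that a published artifact came from the published source.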
Robert Blumen 00:30:14 Let’s say that you implement some of these practices in, this could be having to do with scanning or reproducibility, but you have different layers of automation in your pipeline, and they detect issues. What kind of a, either machine or human workflow process do you have around evaluating? Can you still ship the software? Is this a bug that you can live with? How do you investigate it? How do you fix it, Donald?
Donald Fischer 00:30:44 Yeah, so one of the practices that I think is increasingly being recognized and recommended in industry is to both standardize and centralize this activity within an organization as much as possible. One of the really painful things that has happened in recent years, with the accumulation of many different kinds of tools and methodologies and all the expertise required for vetting the open-source software that flows into your applications through that software supply chain, is that there’s a lot for individual developers to keep track of, right? There are the tools they need to become expert in. And there’s often a lot of repetition and wasted effort investigating the same issues and named vulnerabilities over and over again across multiple applications within one organization, and that’s before you even get to how that repetitiveness is felt upstream by independent open-source maintainers.
Donald Fischer 00:31:48 That’s maybe something Jordan could talk about a little bit. So, one of the approaches we see organizations adopting, following the lead of some of the most sophisticated software development organizations in the world, like Google, Netflix, and LinkedIn, who have written about their practices in this area, is to centralize the vetting of the third-party open source that’s flowing into their applications. Do it once within your organization, so that at least every developer isn’t reinventing the wheel for every vulnerability that emerges. And I think one of the really fascinating opportunities is to centralize that body of work across organizations, right? Pushing that vetting and validation as far upstream as possible is where we can actually partner with the creators behind open-source software to address it for the benefit of many organizations. That’s going to be the most efficient way to approach this. And it also creates the best developer experience, allowing application developers to focus on the thing they’re trying to build, not constantly inspecting and redundantly evaluating the building materials that go into it.
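To make the “do it once within your organization” idea concrete, here is a minimal Python sketch, assuming scanner findings can be keyed by package, version, and advisory ID. The class name `CentralVettingRegistry`, the verdict strings, and all package and advisory names are hypothetical; this is not how Tidelift or any particular organization implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A finding identifies one advisory against one package version:
# (package, version, advisory_id). All names used with it are hypothetical.
Finding = Tuple[str, str, str]


@dataclass
class CentralVettingRegistry:
    """A single, organization-wide record of vetting decisions.

    Each (package, version, advisory) is evaluated once by a central team;
    every application's pipeline then reuses that verdict.
    """
    decisions: Dict[Finding, str] = field(default_factory=dict)

    def record(self, package: str, version: str, advisory_id: str, verdict: str) -> None:
        """Store the central team's verdict for one finding."""
        if verdict not in ("accept", "block"):
            raise ValueError("verdict must be 'accept' or 'block'")
        self.decisions[(package, version, advisory_id)] = verdict

    def gate(self, findings: List[Finding]) -> Dict[str, List[Finding]]:
        """Sort one pipeline's scanner findings by the central verdict, if any."""
        result: Dict[str, List[Finding]] = {"pass": [], "blocked": [], "needs_review": []}
        for finding in findings:
            verdict = self.decisions.get(finding)
            if verdict == "accept":
                result["pass"].append(finding)
            elif verdict == "block":
                result["blocked"].append(finding)
            else:
                # Not vetted yet: route to the central team, not each app developer.
                result["needs_review"].append(finding)
        return result
```

A pipeline could, for example, fail the build when `blocked` is non-empty and queue `needs_review` items for the central team, so an advisory already evaluated for one application is not re-investigated for the next.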
Robert Blumen 00:33:02 In your view, Donald, could there be a department within a large organization, or, for small companies, even independent third-party quality assurance vendors, who would say things like, “Log4j version 2.x has passed. Not only do they attest that they follow best practices, but we’ve done some external validation that they didn’t have input into, and we think they’re good”? And then you could rely on that to some degree.
Donald Fischer 00:33:31 Yes, very much so. Within organizations, we see this mandate for validating the third-party open source that’s flowing into applications becoming part of the scope of teams that have previously focused on establishing a standardized software release process or DevSecOps pipeline, basically the standardized mechanism in these organizations for building and releasing software. We see those teams increasingly looking for solutions for third-party open-source vetting as part of their mandate. And I definitely believe there’s a role for products and services that enable those teams. I mean, I’m kind of talking my own book here: that’s really the area where my company, Tidelift, is focused, delivering capabilities around open-source software to those centralized software release teams. And again, the most novel part of how we’re approaching it is involving the independent open-source creators who authored or maintain those projects in the process of validating that they meet the standards we all believe are important.
Robert Blumen 00:34:45 So, wrapping up this discussion about standards and validation, something else I wanted to cover is the idea of chain of custody. We’re pulling things down from GitHub or other sites where software is distributed. Most build systems have a series of steps that build artifacts and hand them off. What can be done to ensure the integrity of artifacts, so that at each point you’re getting the thing you wanted to get and it hasn’t been compromised in some way?
Jordan Harband 00:35:19 I think that once you have an artifact (for an npm package that’s a tarball, for a RubyGem that’s the gem file, and so on), it’s relatively easy to track it and make sure it’s not tampered with, because you use some sort of checksumming mechanism, pass that checksum result through a verified communications channel, and then just check that it continues to match up. The challenge is all the steps before that artifact. That includes the pull request that landed the code, the repository itself, and the build process that generates the artifact. And that’s a little trickier, given that most languages, tooling, and ecosystems don’t have deterministically repeatable build processes. Some do, and there are certainly attempts to build them in probably every ecosystem, but currently you simply can’t take, let’s say, a Git repository and deterministically produce an identical artifact every time. You can get really close, and you can use heuristics. That’s sort of the challenge I believe the SLSA project under the OpenSSF is looking at, and there are a lot of attempts to address that problem. But personally, I think a heuristics-based approach is the one that will work best.
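The checksum step Jordan describes can be sketched in Python with the standard library’s `hashlib` and `hmac` modules. This assumes the expected SHA-256 digest has already arrived over a trusted channel (for example, registry metadata or a lockfile) as a hex string; the function names are illustrative, not any package manager’s API. As he notes, this only protects the artifact once it exists, not the steps that produced it.

```python
import hashlib
import hmac


def sha256_hex_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of an in-memory artifact."""
    return hashlib.sha256(data).hexdigest()


def sha256_hex_file(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of an artifact on disk, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(actual_hex: str, expected_hex: str) -> bool:
    """Constant-time, case-insensitive comparison of two hex digests."""
    return hmac.compare_digest(actual_hex.lower(), expected_hex.strip().lower())


def verify_artifact(path: str, expected_hex: str) -> bool:
    """True if the file still matches the digest obtained over a trusted channel."""
    return digest_matches(sha256_hex_file(path), expected_hex)
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not depend on how many leading characters match.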
Robert Blumen 00:36:49 What would it mean for the artifact to be identical every time? And why is that difficult?
Jordan Harband 00:36:55 By identical every time, I just mean that the artifacts are the same, byte for byte. Sometimes they aren’t because the timestamp, the current time, is used somehow in the build process to generate output. Sometimes it’s because random numbers are used, so every time you rerun the build process, you get a random variable name or something somewhere in the output. When you have those sorts of points of variability, that’s where you can apply heuristics to say, well, this chunk of the code may change in this particular way, but the rest of it will be identical. So you basically extract the artifact, repeat the build process, compare the results, and say, here’s where I expect there to be diffs, and here’s the format I expect them to differ in, and so on. There’s a long list of ways a build process can be non-deterministic; those are just two of the most obvious, I think.
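The heuristic comparison described above can be sketched as follows, assuming the two builds have already been extracted to text and that the only expected sources of variability are ISO-8601 timestamps and randomly generated identifiers of a hypothetical form `_tmp_<8 hex digits>`. Real normalization rules would be specific to each ecosystem’s build tooling; this is an illustration, not SLSA’s or any tool’s actual method.

```python
import re

# Hypothetical normalization rules: each pattern marks a region expected to
# vary between otherwise-identical builds and replaces it with a fixed token.
NORMALIZERS = [
    # ISO-8601 UTC timestamps, e.g. 2022-10-04T12:34:56Z
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z"), "<TIMESTAMP>"),
    # Randomly generated identifiers such as _tmp_1a2b3c4d
    (re.compile(r"_tmp_[0-9a-f]{8}"), "_tmp_<RANDOM>"),
]


def normalize(text: str) -> str:
    """Mask every region that a rule says may legitimately differ."""
    for pattern, replacement in NORMALIZERS:
        text = pattern.sub(replacement, text)
    return text


def builds_match(original: str, rebuilt: str) -> bool:
    """True if the outputs are identical once expected variability is masked."""
    if original == rebuilt:
        return True  # byte-for-byte reproducible, no heuristics needed
    return normalize(original) == normalize(rebuilt)
```

With this sketch, `"built 2022-10-04T12:00:00Z\nvar _tmp_1a2b3c4d = 1;"` and `"built 2022-10-05T08:30:15Z\nvar _tmp_99ffee00 = 1;"` are treated as matching, while changing `= 1;` to `= 2;` in the rebuild is reported as a real difference.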
Robert Blumen 00:37:53 Does either of you have an opinion on the role of code signing, where by setting up something like a PKI you can ensure that the thing you got is from the trusted provider you wanted to get it from?
Donald Fischer 00:38:09 I think code signing is one of the well-worn tools in this problem space. I mean, it’s not particularly novel, right? I remember doing code signing when I was part of the team at Red Hat 20 or more years ago, and we didn’t invent it there. One of the encouraging things about code signing is that, in a sense, it’s become more democratized; it’s more available to different kinds of software projects. Back in the day, and probably even today in the Linux distribution model of code signing, there was a fairly elaborate process and even technology setup. We would have a special build environment and keep the signing key in a secure enclave, typically some specialized hardware, to keep it really locked down.
Donald Fischer 00:39:02 And so it could only be applied in that special context, broadly speaking to vendor distributions of open source. Contrast that with today, where you have some really innovative projects. A great example is Sigstore, which has taken a lot of inspiration from projects like Let’s Encrypt in the general web secure communication space and applied it to code signing: making it easy to get a signing key, managing a lot of the complexity around the key verification infrastructure and the services required there, and removing a lot of the cost and overhead of doing code signing. And that’s increasingly being adopted across many of the open-source software ecosystems that are dominated not by corporations with folks on staff doing this as part of their professional responsibilities, but largely, again, by volunteers contributing to the community of technologists in these open-source ecosystems. By making it easy and convenient to do, we can make it a lot more pervasive, and I see that playing out.
Robert Blumen 00:40:19 There’s a concept I kept running into while I was researching this, and it’s also included in part of the US Department of Commerce standards for software development: the idea of a software bill of materials. Can you explain what that is and what it has to do with the supply chain, Donald?
Donald Fischer 00:40:41 Sure. The idea of a software bill of materials is basically that of an ingredients list for your software application. It’s intended to be a comprehensive list of all of the components that make up your application, whether they’re third-party open-source components coming from public software repositories, or they’re coming from commercial vendors, or they’re internally authored, right? One of the ideas of software bills of materials is that they allow you to at least know what you’re using. That’s a great reference point if you want to go look up, for one thing, whether there are known vulnerabilities you need to address in the versions of software flowing into your applications, or, a little more proactively, what the known facts or attributes are: what standards do the components flowing into your applications meet? So that certainly is a reasonable motivation.
Donald Fischer 00:41:39 One thing that has happened, though, with the recent popularity of software bills of materials is a little bit of backlash, right? Practitioners protest that just having a list of the components doesn’t necessarily solve the problem. It’s a prerequisite for answering a lot of questions about your software application and what’s flowing into it, but just having a list of the packages and versions doesn’t make that software secure. It gives you a basis for starting the investigation or analysis of whether it’s secure, and on what basis.
Robert Blumen 00:42:18 Now, suppose you had this list of all the packages and versions you’re using. Combined with our earlier discussion, if the projects themselves advertise, in a machine-readable format, the standards they follow and the audits they’ve passed, and you can correlate that against a database of known vulnerabilities, it should be possible to assemble from that an overall vulnerability profile of the bill of materials as a whole. Donald, could that be the step you need to take beyond just assembling the list in order for it to be useful to you?
Donald Fischer 00:42:56 Yes. One useful development is that there are standardized formats for specifying that list, right? Two of the leading standards are CycloneDX and SPDX, which is sort of an evolution of an existing standard; both are standard ways to express a software bill of materials. But, to your point, software bills of materials are really like an index of the things you want to go look up facts about, right? So the next stage of this conversation, and where I see the industry as well as regulations heading, is: where is the table of facts about those open-source components that you’re looking up, and that you’re using to form judgments about the risk profile, or the health, depending on how you want to think about it, of these individual components, and how that relates to the health of the application you’re assembling out of them?
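The “index plus table of facts” idea can be sketched in Python: read the component list from a trimmed SBOM in the CycloneDX JSON shape, then look each component up in a table of facts. The facts table is a hypothetical stand-in for vulnerability databases and maintainer attestations; the only real-world fact in it is that log4j-core 2.14.1 is affected by CVE-2021-44228 (Log4Shell). The `example-*` packages and the bucket names are made up for illustration.

```python
import json
from typing import Dict, List, Tuple

# A trimmed SBOM in the CycloneDX JSON shape; only the fields used below appear.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1"},
    {"type": "library", "name": "example-utils", "version": "1.0.0"},
    {"type": "library", "name": "example-parser", "version": "0.3.2"}
  ]
}
"""

# Hypothetical "table of facts" keyed by (name, version).
FACTS: Dict[Tuple[str, str], dict] = {
    ("log4j-core", "2.14.1"): {"advisories": ["CVE-2021-44228"]},
    ("example-utils", "1.0.0"): {"advisories": []},
}


def risk_profile(sbom: dict, facts: Dict[Tuple[str, str], dict]) -> Dict[str, List[str]]:
    """Look up each SBOM component in the facts table and bucket the results."""
    profile: Dict[str, List[str]] = {"vulnerable": [], "no_known_advisories": [], "unknown": []}
    for comp in sbom.get("components", []):
        label = f"{comp['name']}@{comp['version']}"
        record = facts.get((comp["name"], comp["version"]))
        if record is None:
            # Listed in the SBOM, but nothing is known about it yet.
            profile["unknown"].append(label)
        elif record["advisories"]:
            profile["vulnerable"].append(label)
        else:
            # Nothing recorded; this is not the same as "secure".
            profile["no_known_advisories"].append(label)
    return profile


print(risk_profile(json.loads(SBOM_JSON), FACTS))
# {'vulnerable': ['log4j-core@2.14.1'], 'no_known_advisories': ['example-utils@1.0.0'], 'unknown': ['example-parser@0.3.2']}
```

The `unknown` bucket reflects the point made earlier: the SBOM only tells you what to look up, and a component with no entry in the facts table remains unassessed.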
Robert Blumen 00:43:55 We’re close to the end of our time. Jordan, I want to close out by getting your thoughts on how organizations manage the risk posed by their people. It’s not only in relation to the supply chain that employees pose a security risk, whether through malice or simply through errors. Companies will all have adopted things like authentication, role-based access control, and two-factor authentication. Are there any processes, tooling, or education that specifically relate to securing the supply chain against vulnerabilities arising from your people?
Jordan Harband 00:44:34 One way I would answer that question is with something a lot of companies have had to grapple with through the pandemic. If you previously tried to have a couple of remote employees and mostly in-office employees, what inevitably happened was that those remote employees became effectively second-class citizens of your company, because you were designing all of your meetings and processes around in-person work, and it just didn’t work. So, to make things work equally well for remote folks, you have to design your processes around them from the start, around the least amount of physical presence, and then that also works fine for people who are in the office. Similarly, if you design all of your security processes around the people whose Social Security number, paycheck, and other information you have, and whom you have the ability to fire, then you have not accounted for the largest group of people that is part of your infrastructure.
Jordan Harband 00:45:39 I’m not saying you shouldn’t have all of those security mechanisms; of course you should. But if you design your process so that it lets you feel secure around people over whom you have no hold and whose identity you don’t know in exhaustive detail, then you can be extra secure about your employees, and you’ll still have a baseline of security for everyone, including all of your open-source maintainers and all of the random employees at your data centers, like the janitor who’s walking right by your billion-dollar server with their mop and broom, right? That person could screw you over just as easily, by mistake or with malice. You need to design around all of those sorts of things. It could even be inclement weather, right? It’s all in the same bucket: there’s a lot of stuff beyond your control. By acknowledging that reality, you can design processes that address that baseline instead of artificially restricting your mitigation.
Robert Blumen 00:46:40 I think that’s very sound thinking, but what I’m trying to get at, and maybe you’ve answered this, is: does securing the software supply chain simply mean good security practices, with nothing special about the supply chain, so you just need to think about security all the time? Or are there specific practices you really need because of the supply chain that you wouldn’t necessarily care about if you weren’t thinking about the supply chain?
Jordan Harband 00:47:12 Well, the argument I’m trying to make is that if you applied what I consider a better model of thinking about security, the supply chain would be no different. But because, historically, companies have not applied this sort of egalitarian approach to security, the supply chain is different in that it introduces a kind of forcing function that makes you start thinking about and accounting for these things.
Robert Blumen 00:47:42 Okay, I got it. I think your point, if I’m getting it this time, is that the supply chain is one of the areas where you have the most people who are not your employees and who are potentially impacting your security. And that’s where broadening your thought process will encompass those people. Did I get it this time?
Jordan Harband 00:48:02 Yes, exactly right.
Robert Blumen 00:48:04 Okay. Donald, any closing thoughts?
Donald Fischer 00:48:07 Yeah, I think another way of expressing that is that every modern software development team, which perhaps used to sit in your office and is now increasingly distributed in remote settings post-pandemic, also includes another ring of humans whom we haven’t appreciated as much as we should have. Those are the humans creating the open-source projects that get incorporated into our applications and that usually make up most of the code in a fully formed modern application, right? Those folks have been there the whole time. There are more of them all the time, and we’re asking more of them. And I think the software supply chain security conversation is forcing folks to realize that we need to work collaboratively with those open-source creators to achieve the outcomes we desire for our internal teams and the applications, the software, that they’re creating.
Robert Blumen 00:49:14 Donald, would you like to say anything else about Tidelift or where people can find you on the internet?
Donald Fischer 00:49:20 Sure. At Tidelift, our mission is to help organizations tackle challenges like this around the open-source software they rely on, specifically by working with maintainers like Jordan and many others to validate that their packages meet some of these defined software development practices and standards. So, if you’re interested in that topic area, or your organization is facing those challenges, as so many are, you can find out more at tidelift.com, T-I-D-E-L-I-F-T.com.
Robert Blumen 00:49:53 Jordan, would you like to either mention any projects you’re working on or where people could follow you?
Jordan Harband 00:50:00 I would say the best place is my GitHub; ljharb is my username. Historically, I would’ve also pointed you to my Twitter, but that may or may not be the best choice. As far as which projects I’m working on, I maintain hundreds of projects, so that changes hour to hour. But I think the biggest thing I would say is that the reach of certain open-source projects greatly outpaces the amount of capital companies are investing in making those projects sustainable. So, to everyone listening, I would encourage you to advocate for your employer to inject capital into the open-source ecosystem, whether through Tidelift, through other means, or both, and help us all make it a more sustainable and secure place.
Robert Blumen 00:50:47 And for the listeners, we’ll link to Tidelift and ljharb in the show notes. Jordan Harband and Donald Fischer, thank you very much for speaking to Software Engineering Radio today.
Jordan Harband 00:50:58 Thank you for having us.
Donald Fischer 00:51:00 Thanks so much, glad to be here.
Robert Blumen 00:51:01 For Software Engineering Radio, this has been Robert Blumen. Thank you for listening.
[End of Audio]