
SE Radio 710: Marc Brooker on Spec-Driven AI Dev

Marc Brooker, VP and Distinguished Engineer at AWS, joins host Kanchan Shringi to explore specification-driven development as a scalable alternative to prompt-by-prompt “vibe coding” in AI-assisted software engineering. Marc explains how accelerating code generation shifts the bottleneck to requirements, design, testing, and validation, making explicit specifications the central artifact for maintaining quality and velocity over time. He describes how specifications can guide both code generation and automated testing, including property-based testing, enabling teams to catch regressions earlier and reason about behavior without relying on line-by-line code review.

The conversation examines how spec-driven development fits into modern SDLC practices; how AI agents can support design, code review, documentation, and testing; and why managing context is now one of the hardest problems in agentic development. Marc shares examples from AWS, including building drivers and cloud services using this approach, and discusses the role of modularity, APIs, and strong typing in making both humans and AI more effective. The episode concludes with guidance on rollout, evaluation metrics, cultural readiness, and why AI-driven development shifts the engineer’s role toward problem definition, system design, and long-term maintainability rather than raw code production.

Brought to you by IEEE Computer Society and IEEE Software magazine.



Show Notes

Related References


Transcript

Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Kanchan Shringi 00:00:18 Hi everyone, welcome to today’s episode of Software Engineering Radio. Our guest is Marc Brooker, VP and Distinguished Engineer at AWS, and our topic is spec-driven development. If you’ve tried to use AI to write code, you’ve probably noticed how productive vibe coding can be, but also that things start to break pretty soon. Spec-driven development aims to correct that. So rather than treating prompts as the main artifact, it treats specs and intent as the source of truth and uses them to drive the code, the tests, and reviews. Welcome to the show today, Marc. So great to have you here. Is there anything you’d like to add, either to the topic or your bio, before we get started?

Marc Brooker 00:01:03 Yeah, well thank you. Thanks for this opportunity to talk about this topic. I’ve been building software in various ways for the best part of 30 years, maybe slightly over 30 years, and this is so exciting to have this new set of tools in my toolbox as a software engineer and as somebody who’s really interested in building software and more broadly building systems to serve customers with software.

Kanchan Shringi 00:01:25 So you said you’ve been developing software in one form or fashion for a long time. Could you help us by summarizing current SDLC best practices?

Marc Brooker 00:01:37 Yeah, well, I think obviously that varies from organization to organization and team to team, but the way that I see that is when I think about a high performing software team, this is a team that is iterating quickly. They’re very close to the customer in gathering requirements. They’re very close to the business in gathering requirements. They’re working to get feedback from the business and from the customer about what they’re building as quickly as possible. Often, they are operationally close to running the things that they are building, especially if they’re cloud services or software as a service. And they are working closely across multiple disciplines between software teams, designers, product folks, customer facing folks and so on. And we can talk about, get into the more kind of software development part of that, but for me it is really about structuring teams and organizations to be able to iterate quickly and be close to their customers and close to their businesses that lead to good outcomes for software development.

Kanchan Shringi 00:02:38 So you mentioned a lot about iteration and feedback loops and structuring teams. So far, a lot of teams have been aligned with the scrum or a similar form of iterative dev process. As we start to introduce AI into the mix, what are the things that you have been observing?

Marc Brooker 00:02:59 Probably the biggest thing that’s changing is that the multiplier of process overhead goes up. It becomes so much more of a big impact, right? And so, if I think about it: hey, I used to spend 70% of my time building software, actually just writing code and testing code, writing designs and so on, and 30% of my time on overhead. If the speed of the software-building portion has gone up by 2X or 4X, now I’ve suddenly shifted the ratio from 70/30 to something that’s closer to 50/50. And as that software portion speeds up, or the code-building portion speeds up, we also have to figure out how to speed up the rest of the software development process, or we don’t get a ton of the benefit. And so that’s been this kind of second-order effect I’ve seen with teams: as they’ve embraced AI and embraced the faster speed of coding that goes with it, they’ve also had to pay a bunch of attention to how do we build better processes?
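[Editor’s note: Marc’s 70/30-to-50/50 shift can be checked with a little arithmetic. The toy model below, which is illustrative rather than anything Marc presented, assumes coding time shrinks with the speedup while overhead stays fixed:]

```python
def overhead_share(coding_pct: float, overhead_pct: float, speedup: float) -> float:
    """Fraction of total time spent on overhead once coding is sped up.

    Toy model: coding time shrinks by `speedup`, overhead stays fixed.
    """
    new_coding = coding_pct / speedup
    return overhead_pct / (new_coding + overhead_pct)

# 70/30 split, coding sped up 2x: overhead is now 30 / (35 + 30), about 46%
print(round(overhead_share(70, 30, 2), 2))  # 0.46
# At 4x, overhead dominates: 30 / (17.5 + 30), about 63%
print(round(overhead_share(70, 30, 4), 2))  # 0.63
```

At a 2X coding speedup the split lands near 50/50, exactly the shift Marc describes; at 4X the fixed overhead becomes the majority of total time.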

Marc Brooker 00:04:05 How do we make those processes lighter weight? How do we use AI to accelerate a lot of those processes? And that’s something that there’s a ton of conversation around now that I’m excited about. Then I think the other part of that is another big enabler in taking advantage of the speed of being able to build with AI is having great infrastructure for testing and for knowing that that code works. And so, the teams that I’ve seen get the best benefit from AI are the teams who have made investments in test infrastructure, in testing, in knowing what their software should do and therefore can move really fast and move really fast with confidence.

Kanchan Shringi 00:04:47 You spent a lot of time focusing on all the things that surround the actual writing of the code, though a lot of the hype in the last year has been just on the coding, with vibe coding being a term that nobody could have avoided hearing.

Marc Brooker 00:05:04 Yeah.

Kanchan Shringi 00:05:04 How did you make that journey, or how did you observe developers making the journey, from vibe coding to now understanding that it’s a small part, not the whole, of the whole experience?

Marc Brooker 00:05:19 I think that’s something of a natural journey. So, I start off vibe coding and it feels great, right? Like it feels so productive, and I get so much done and I generate so much code and then I realize, I’ve got to get this out into production. If I want to have an impact on my business and my customers, I’ve got to get it out into production. Oh, now I need to be able to test it, now I need to be able to validate it, now I need to be able to know that those features that I’ve built are the right features for the customer. So, it’s this natural thing that happens to teams of saying, hey, we’re going to pick up these vibe coding tools, it’s going to feel great, we’re going to feel super productive. And then this sort of and then thing happens and then I need to maintain this piece of software.

Marc Brooker 00:05:58 I need to test this piece of software. I need to listen to customers and the feedback in what they’re saying and improve the piece of software. And that’s where it becomes really obvious that vibe coding, as great as it feels, and as useful a tool as it can be, isn’t everything and doesn’t solve the end-to-end software lifecycle. It solves a small part of that, or accelerates a small part of that. And even if you accelerate that substantially, you haven’t yet accelerated the whole end-to-end process of what it takes to build a successful software system. But if I’m building something for my own use, a little hobby project, that’s okay, right? It’s fine. And I get a huge amount of the whole end-to-end lifecycle done just with vibe coding. If I’m building a piece of a critical service or an application that I need to put in front of my customers and meet a safety bar and a security bar and a quality bar, well then, I need something more, and even more so over the long term. If I need to maintain that software for weeks, months, years, decades, then I need a more sustainable process than the one that this kind of pure vibe coding approach offers me.

Kanchan Shringi 00:07:11 You mentioned maintaining a couple of times and that’s key, but what about even reading the code before that? How does that work? How do code reviews work?

Marc Brooker 00:07:21 Yeah, I think this is one of the key questions that’s being asked in the industry right now: where do we go with code reading? Where do we go with code reviews? Code review, from my perspective, has historically been this incredibly useful tool for driving software quality, for driving alignment, for driving API quality. I’m not super optimistic about it being a way of preventing bugs, but I think it does drive up quality, it does improve code design and has these good outcomes. But as we accelerate, what is the role of that kind of code reading? And I suspect where we’re going to end up for a lot of code bases, not all but a lot, is that code reading is going to go a little bit of the same way that reading assembly code did 50 years ago. We started building in these high-level languages, and people said, well, if you’re going to build in a high-level language, you have to read the compiler output and check it’s right.

Marc Brooker 00:08:20 And people did that, right? They did that a long time ago, and then over time we stopped doing that because it just didn’t feel like a good use of our time, and we felt that reviewing the high-level language was much more useful and much more productive and led to a better outcome. And I think that’s more or less where we’re going to end up with AI-powered software building: we’re going to look at a lot of the code that gets generated almost like it is this assembly code, right? It’s an implementation detail, and what we want to be reviewing and paying attention to is the specification. It’s the writing down of what the software should do and how we’re checking that it does that.

Kanchan Shringi 00:08:58 That’s key in terms of spec-driven development: how are we checking what the software is supposed to do, and how did we provide that in the first place? Is that the key goal? Can you maybe take a step back now and explain what spec-driven development is from your perspective?

Marc Brooker 00:09:17 Yeah, good to take a step back on that for sure. I think, as long as we’ve been building software as an organizational thing, right? As a team sport, part of that has been figuring out what the software should do and writing that down. And sometimes we’ve written that down as formal documents. Sometimes we’ve written that down as a series of tickets, sometimes we’ve written that down as a bunch of napkin sketches and some stuff on a whiteboard, and all those things are specifications. And I think in this new era of AI, what we’re finding is the value of writing down what your software should do, and we’re going to call it this word specification, which implies a level of formality that isn’t necessarily required, but it is a description that is as crisp as it needs to be about what the software should do and what a successful piece of software looks like.

Marc Brooker 00:10:10 And then once we have that artifact, then we can start using that to do two things. One of them is build the software, often with an AI-assisted process. And if you look at the specification-driven development flow in something like Kiro, this is an agentic process that takes that specification and starts building it with code. Then the other thing that we can do with that specification is build tests. And this is where techniques like property-based testing come in and are super powerful. I can extract properties from that specification and build tests that assert that those properties are true about the piece of code that I’ve built. But crucially, that specification isn’t a static artifact. As I build the code, as I test the code, as I talk to customers, as I talk to my business, as I listen to the various stakeholders, I’m going to keep improving that specification.

Marc Brooker 00:11:03 I’m going to clarify it, maybe I’m going to generalize it in some ways, maybe I’m going to make some pieces of it much more specific. Maybe I’m going to add in some code snippets or UI mocks and improve that specification. And then that is going to feed into improving the code and improving the tests. And so, this is an iterative software development approach, a short-term iterative software development approach, but one that is very explicit about writing down the goals of what it is that we’re trying to build and then really using AI tools to accelerate the building of that.

Kanchan Shringi 00:11:39 You mentioned Kiro? Can you just talk to our audience about what Kiro is?

Marc Brooker 00:11:44 Yeah, so Kiro is essentially three things. It’s an IDE, it’s a CLI, a command line interface. And most importantly, it’s a set of agents that we’ve built here at Amazon that we use for our internal development, and we can talk about some of our successes with that, and it’s available to customers too. It’s a set of tools for agentic development. It supports the vibe coding mode, it supports the CLI-based step-by-step mode. But most importantly in my mind, it supports this specification-driven development approach where, alongside the Kiro agent inside the Kiro IDE, you develop a specification of the software, then you use that specification to develop the software and to build the tests, powered by those Kiro agents, and then that feeds back. Then you go out to the world, you show the world your software, and then you feed that back into improving and sharpening that specification.

Kanchan Shringi 00:12:41 There are a lot of tools out there today. So, I’m just curious, what led you to build a new one?

Marc Brooker 00:12:49 I think we had two pieces of vision there. One of them was that when we started this journey of building Kiro and building the Kiro agents, the world was starting to move on from using AI as smart autocomplete, which was a nice interim step, to this vibe coding mode of prompt-by-prompt development. And we had used that internally at Amazon and at AWS. And what we were finding with that prompt-by-prompt development is it felt really good early on, but it wasn’t scaling super well to more complex code bases. And most importantly, it wasn’t scaling well over time, right? Like we would build software prompt by prompt, then we would write a prompt to add a feature, and it would go and undo a bunch of the stuff that it had done. It had sort of forgotten requirements from earlier on.

Marc Brooker 00:13:35 And that got a bunch of us in different ways thinking about what if we could solve that requirements-over-time problem. And that’s where we ended up with specification-driven development, of saying, hey, let’s write down all these requirements. Let’s keep all these requirements handy. So, as we iterate on a piece of software, we can refer back to them and say, does this piece of software still meet these other things that we were doing? Right? It’s like having a map rather than having turn-by-turn directions. So, building Kiro allowed us to really experiment with and then really invest in specification-driven development. The other investment that we’ve been making is in code reasoning and neuro-symbolic AI, where we use symbolic tools, solvers, and so on, the more formal side of AI, to reason about code. And we wanted to bring the power of those tools to software developers in an easier way too. And by building those tools into Kiro, where they can do things like power some of our work on property-based testing, we can bring the power of code reasoning tools and symbolic reasoning tools to programmers in a way that doesn’t require them to be experts in formal methods or automated reasoning and so on.

Kanchan Shringi 00:14:52 Could you take an example and walk us through the steps of how somebody would use Kiro? And I’m really curious about an example of a property-based test that you mentioned a couple of times too.

Marc Brooker 00:15:05 Yeah, let’s pick an example. One of our recent releases, maybe not that recent anymore: we built Aurora DSQL, right? This great scalable multi-region active-active SQL database. And what the team realized customers needed was a set of drivers to make that database easier to use from within Java. So, we could write a specification that said, here’s the driver that we need to build, here are the properties that it needs to have, here are the ways that customers should interact with it, sort of an API sketch, and then use that flow to generate the implementation of that driver. And then when we think about a property-based test, a property-based test would take something from the specification, like every connection attempt contains an authorization token, and then build a test that tests, in an automated way, lots and lots and lots of permutations of using the driver’s API and makes sure that that property, every attempt to connect contains an authorization token, is always true no matter what permutation of the API is used.
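[Editor’s note: the connection property Marc describes can be sketched as a property-based test. In practice you would use a library like Hypothesis; the stdlib sketch below hand-rolls the idea with random inputs, and the driver API shown is purely illustrative, not the real Aurora DSQL driver:]

```python
import random
import string

def build_connect_request(host: str, port: int, user: str) -> dict:
    """Toy stand-in for a driver's connection builder (illustrative only)."""
    # A real driver would fetch a signed token from an auth provider.
    return {"host": host, "port": port, "user": user,
            "authorization": f"token-for-{user or 'anonymous'}"}

def check_auth_token_property(trials: int = 1000, seed: int = 0) -> None:
    """Property from the spec: every connection attempt carries a token."""
    rng = random.Random(seed)
    for _ in range(trials):
        host = rng.choice(["db.example.com", "a.example.com", "localhost"])
        port = rng.randint(1, 65535)
        user = "".join(rng.choice(string.ascii_letters + " _-")
                       for _ in range(rng.randint(0, 20)))
        request = build_connect_request(host, port, user)
        # No permutation of inputs may produce a tokenless request.
        assert request.get("authorization"), (host, port, user)

check_auth_token_property()
```

The point is the shape of the test: inputs are generated, not enumerated, so one asserted property covers many more cases than a handful of hand-written examples.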

Marc Brooker 00:16:16 So either I put things in the API and I get a valid request out, or I put things in the API and I get a helpful failure or helpful error message out. And so that’s quite small in scope, a relatively modest-sized project, but a very useful one. Another bigger project is that we have been building a new fundamental piece of our inference infrastructure, also using Kiro, also using specification-driven development, and that’s more of the cloud-service type rather than a driver. And so there the specification is about: this is a cloud service, it does these things, it has this architecture, it has this API, it’s written in Rust. And so, we want to use all safe code, for example, and then generate the code from that. And then we get these specifications or properties again, like if a customer calls this API, either it passes on the request in this form or routes the request in this form, or it gives the customer a useful 400 or 500 error message. And what the property-based test will do, and what makes it so interesting, is it isn’t going to test five or six cases like a human developer might. It is, in an automated way, going to go off and test hundreds, thousands, millions of different API permutations and make sure that property holds in all of them.
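[Editor’s note: the service-side property, every call either routes or returns a helpful error, follows the same pattern. Below is a stdlib sketch; the handler, action names, and status scheme are made up for illustration:]

```python
import random

VALID_ACTIONS = {"get", "put", "delete"}

def handle(request: dict) -> dict:
    """Toy API front end: routes well-formed requests, otherwise returns
    a structured 400. It should never crash or return anything else."""
    action = request.get("action")
    if not isinstance(action, str):
        return {"status": 400, "error": "action must be a string"}
    if action not in VALID_ACTIONS:
        return {"status": 400, "error": f"unknown action {action!r}"}
    return {"status": 200, "routed_to": f"{action}-service"}

def check_route_or_error_property(trials: int = 5000, seed: int = 1) -> None:
    rng = random.Random(seed)
    junk = [None, 42, "", "get", "put", "delete", "frobnicate", ["x"], {}]
    for _ in range(trials):
        response = handle({"action": rng.choice(junk)})
        # Property from the spec: every response is either a routing
        # decision or a helpful client error, never anything else.
        assert response["status"] in (200, 400)
        if response["status"] == 200:
            assert response["routed_to"].endswith("-service")
        else:
            assert response["error"]

check_route_or_error_property()
```

Note the junk list deliberately mixes valid actions with garbage; the test asserts the spec’s either/or property over thousands of generated calls rather than a few hand-picked cases.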

Kanchan Shringi 00:17:37 Can you maybe walk us through the stages? Based on some of the things you said, this potentially may not be applicable to all projects, but when I think of the SDLC, and from some of the pointers you had, the spec is the first thing. But could you use AI to refine the spec? That is one question. Then what about the architecture documentation and the design documentation? The code gen will follow that, and then talk about code review too. You did say you could start thinking of the generated code really as assembly and not necessarily read it, but would you still potentially get a different model, perhaps, to validate the code by some form of code review? You mentioned a lot about tests, but maybe just walk us through all the stages for a complex project, and what would you then not necessarily do for a simpler project?

Marc Brooker 00:18:29 Well, I think step one for every project is figuring out what problem you’re trying to solve and for who, right? Like who is the customer, what do they need, what are the requirements of the business? And AI helps there. It can help me pull in a bunch of requirements, it can help me write documents, it can help me figure out what questions I need to ask different stakeholders but doesn’t fundamentally change that step of the software lifecycle. And that’s going to vary, right? If I’m writing a super important piece of code for a new AWS service, I’m going to go wide with that. I’m going to talk to a bunch of people, I’m going to really understand the economics of that. If I’m just hacking on a software project, that’s something I’m going to do in my head in a couple of minutes. And then we have the step of writing down the initial version, let me say, of the specification.

Marc Brooker 00:19:14 And then, yeah, I mean I can do that manually or I can have the AI help me with that. That can be: I start off with, develop a specification based on these prompts, based on these customer use cases, based on these customer anecdotes. And then I can read that specification and say, well no, that’s not quite what I meant. Let’s iterate on that together. And so that can be anything from me writing it down all at once to kind of vibe specifying, right? Like, hey, I’m going to go piece by piece, I’m going to go anecdote by anecdote and check that those are met in the specification. But what’s important is that I’m reading that specification, I’m thinking about it: is it the right thing? And it’s only an initial specification. So, we’re going to close the loop on that in a second. Then again, working with the AI, I am going to turn that specification into a set of design decisions, right?

Marc Brooker 00:20:06 Like is this a web service? Is this a microservice, is this a library, is it a UI application? Is it a web application? Right? Those are important design choices, that again are going to be driven by the needs of the business, needs of the problem I’m trying to solve. They aren’t kinds of decisions that the AI can make for me. They’re decisions I’m making based on my understanding of the problem I’m trying to solve. Then we’re going to make a bunch of implementation choices. I’m going to choose a programming language, maybe I’m going to choose some frameworks, maybe I’m going to choose some libraries, right? If I’m building a web app, I’m probably going to use TypeScript. If I’m building a new component for an AWS service, I’m probably going to use Rust. If I’m building a library for people to consume in Python, I’m probably going to use Python. If I’m doing some AI science and data analysis, maybe I’m going to use Python again, what frameworks do I want?

Marc Brooker 00:20:58 Do I want to use internal or external frameworks? Do I want to use open-source things? So, then I’ve got my specification, I’ve got this high-level description of my software design, and then I’ve got this low level description of these implementation choices. Then I’m going to step into, okay, let me go and generate the code, the first versions of the code and the first versions of the tests. And I’m going to do that working with the agent step by step, stepping through the specification and at each step making sure that I’m moving forward in implementing more and more of this specification as it comes. And then we get to the crucial part where I’ve generated the code, it’s passing all the tests, and now I’m going to see does it meet the needs of the customer or business? Does it do the thing that I’m building this software system to do?

Marc Brooker 00:21:44 And maybe it does initially, but then people will say, well, it would be really cool if you had a feature to do this thing, or it’d be really cool if this button was over there, or it’d be really cool if you redesigned this to be clearer in some way. And then I loop back, I loop back to the specification, I loop back to those design choices, and I iteratively change them. And then we rebuild the code alongside the agent, we rebuild the tests alongside the agent, and then run the software in production again. And so, it remains this iterative process, but it’s an iterative process where the core goal of the iteration is keeping a specification up to date, and downstream of that is keeping code and tests and so on up to date.

Kanchan Shringi 00:22:26 Another benefit I got from what you said is it gives you an ability to check along the way rather than just handing off the spec to the AI and then getting the final code generated. You are actually having a checkpoint where you read the design doc that is generated and make sure it’s correct, or any details of the spec itself, and make sure it’s correct and so on. Is that right?

Marc Brooker 00:22:48 Yeah, that’s exactly right. And how long you let the AI run probably varies from application to application and goal to goal, but it does give you this checkpoint of checking in: am I headed in the right direction? Am I driving to the right place? It’s like you get into an autonomous taxi and you say, I want to go to this place, and you watch out the window and see, oh, are we going there? Or maybe I’ve changed my opinion, and I want to go somewhere else. But yeah, it does give you these interim steps of, okay, before I go off and generate all this code and spend all this time and spend all these tokens, am I actually building something that is going to solve my customer’s problems?

Kanchan Shringi 00:23:28 A question about the model that you use. Is there any benefit that you’ve observed from maybe using a different model for some of those stages, perhaps for code review?

Marc Brooker 00:23:40 Yeah, so that’s interesting. Actually, we need to talk about the code review question. I think tentatively yes, right? With the technology where we are at this moment, and things are moving so quickly that this might not be true by the end of the week, but from what I’ve seen, it’s really useful to have, let’s say, a separate agent doing things like code review rather than letting the code builder self-supervise. And so, you want this agent that’s going to look at the code as an artifact and say, is it meeting my goals? Is it well designed? Does it have good internal APIs? All these things that we think add up to good code maintainability over time. So, a different agent, differently prompted, an agent with different goals: that’s what’s important. Is it helpful to use a different model for that? I think tentatively yes. Again, I think we’ve seen some good indications that doing that code review with models of different sizes or models of different capabilities is useful, but I wouldn’t say that’s anywhere near as useful as the fundamental multi-agent architecture of saying, hey, I want my code review to be done by an agent whose goal it is to drive for great code quality rather than the agent whose goal it is to implement the specification and build the code initially.

Kanchan Shringi 00:25:00 Earlier we talked about how you came up with this methodology. Do you happen to have a story where you can compare and contrast projects that were vibe coded versus ones following a spec-driven methodology?

Marc Brooker 00:25:14 I don’t know if we’ve done any kind of side-by-side between vibe coding and spec-driven development, but what we have found is, let me say that Amazon has a very, let’s say, service-oriented architecture, kind of microservice architecture, and we have literally thousands of services within the company that provide certain things. And as we’ve adopted AI, what we’ve heard from the teams that own those services is they really started with this kind of vibe coding. And what they were finding was they would pull their code base into an IDE; they would prompt for a change. Often, they would see that change successfully implemented, but they would also see a regression at the same time. The AI would kind of forget one of the properties of the code and be like, well, I’ve optimized this, but that breaks the API and makes the service not work anymore.

Marc Brooker 00:26:12 And so what we’re seeing internally at Amazon is that going through the process of, I’ve got the service code, I’m going to extract from that a specification, even if it is a kind of method-by-method code-to-text process, and then I’m going to use that specification in a spec-driven development flow, makes the agent that is building the next feature much better at not undoing goodness of the past, and a separate agent that’s building tests much better at thinking about which tests it should build. And so, I don’t think we’ve done a lot of side-by-side, but I think what we have found as teams have embraced specification-driven development is that the specification abstraction is a step that helps them move a lot faster, because it stops this two-steps-forward, one-step-back dynamic that can happen with existing code bases with vibe coding.

Kanchan Shringi 00:27:04 I think that’s a good point and we’ll come back to that as we talk a little bit more about providing context. But before that, I’m curious: so spec-driven is a methodology and Amazon built Kiro, but I’m sure spec-driven could be applied to other agentic IDEs. Is that a fair assumption?

Marc Brooker 00:27:22 Oh yeah, absolutely. It’s a methodology that you can use with any of the popular tools. It’s not as ergonomic as we’ve tried to make it in Kiro, but it’s a methodology with general goodness. So, one we recommend to our teams who are building software, whatever tools they use.

Kanchan Shringi 00:27:39 Thanks, Marc. Let’s go back to providing the context. I think you brought up a good point: unless the spec itself, or what you told the LLM, had the context of the code, there were always regressions. Can we generalize that? Is it then true that vibe coding is more effective for new projects or for open-source projects?

Marc Brooker 00:28:06 I don’t know about open source, but I think we have seen, at least anecdotally, that if you build a code base with AI from the beginning, the AI finds that code easier to understand. Why that is I don’t really know, but anecdotally, that is true from my experience and similar to what I hear from my colleagues. I’ll also say that for me, the easy and fun feeling of kind of pure vibe coding works best in the kind of greenfield new-project situation, but as soon as I have a significant existing code base that I’m working on, then I don’t find I get as much of a benefit from that kind of interaction. And so yeah, I would say that that vibe-coding flow is just maximally awesome and productive for getting from nothing to a great first prototype. But after that, it’s less scalable. It kind of peters out, right?

Marc Brooker 00:29:03 Whereas the specification flow loads a little bit more work upfront, and so there’s no hiding that. But what you get out of that is a much more sustainable ongoing process. An ongoing process that is much easier to maintain, much easier to iterate on, much easier to add features to without regressions, much easier to get great test coverage, much easier to get great QA coverage. And so, by loading a little bit more, not a huge amount more work, but a little bit more work at the beginning, what you end up with is a process that doesn’t drop off, right, a process that remains effective and productive over the long term.

Kanchan Shringi 00:29:41 So in terms of context, you talked about extracting the context and having that provided as part of the spec, but is that sufficient? I mean, the code base could be pretty large. You certainly couldn’t do that all the time. What other ways are there of providing context?

Marc Brooker 00:29:58 I really like that question, and I think for two reasons. Like one of them is, I think in a lot of ways, 2026 is the year of thinking about context management for AI agents. It really has become one of the most important problems in the world of agent building. So, this is something that everyone across the industry is going to be giving a bunch of thought to and a bunch of investment in this year. But then if we think about, okay, how do we think about that in this case of software building? And what’s really interesting to me is so many of the techniques that we’ve developed to make software and code bases understandable to humans also work really well in this context, right? Like we have always, from the early days of kind of structured programming, kind of going all the way back to, 60 years ago, thought about how do we build interfaces?

Marc Brooker 00:30:46 How do we build modularity? How do we extract functionality into libraries? How do we extract functionality behind APIs? Whether that’s a library API, a local service API, or a remote microservice API, we have made software modular, we’ve built APIs, we’ve built contracts, we’ve built protocols, right? These are all ways to take an extremely complex piece of software, limit the interactions, and allow us as humans to reason about the way that piece of software works. And in a lot of ways, what makes the difference between long-term successful software and unsuccessful software is whether it is built in a way that it can evolve without every change having to touch everything about the way it works. That’s what has driven the popularity of service-oriented, microservice architectures, and what’s driven the popularity of libraries and modularization for the whole of the history of software.

Marc Brooker 00:31:53 And so we can think about that in AI as well. What we need to manage the context and help an AI understand our code base is modularity, good design, good APIs, and good documentation of the APIs and their contracts, what they mean, what you can assume, and what you can depend on. And then once you’ve done that, you can layer on top things like semantic search, right? I’m looking for the library method that does this. Well, here it is. And so, I don’t have to understand everything about the entire system. I can say this library call takes these inputs and has this effect on the system, so I can reason more locally about the behavior of the system. For me, context management for software agents is very much like context management for human developers: it converts the impossible problem of reasoning at scale about every line of code in the system and their interactions into more local reasoning, like, can I think about this one library call, this one change? And the more we can design our software to make changes as local as possible, the less context you need to successfully make changes, the better you can test changes, and the more productive you can be. And I think that’s true of AI agents, just as it has been true of humans over six-plus decades of software engineering practice.
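
The “I’m looking for the library method that does this” step Marc describes can be illustrated with a toy sketch. This is not how Kiro or any AWS tool works; real tools use embedding-based semantic search, while this sketch ranks API docstrings by simple word overlap. The function and entry names (`find_api`, `charge`, `refund`) are hypothetical:

```python
def find_api(query, api_docs):
    """Toy 'semantic' search: rank API entries by word overlap with the
    query. Real tools would use embeddings; the shape of the answer is
    the same -- return the contract, not the whole code base."""
    q = set(query.lower().split())

    def score(doc):
        # Count how many query words appear in the docstring.
        return len(q & set(doc.lower().split()))

    # Pick the API entry whose documentation best matches the query.
    name, doc = max(api_docs.items(), key=lambda kv: score(kv[1]))
    return name, doc

# Hypothetical API documentation for a small payments module.
api_docs = {
    "charge":  "charge a payment card with an amount, returns a receipt",
    "refund":  "refund a previous charge given its receipt id",
    "invoice": "render an invoice document as pdf",
}
print(find_api("which method charges a card", api_docs))
```

The point of the sketch is the locality it enables: the agent gets back one contract it can reason about, instead of having to load the entire code base into its context window.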

Kanchan Shringi 00:33:21 So you mentioned one way of, I guess two ways. One was extracting context into the spec, and then semantic search, where you potentially have to create vectors for a bunch of code in your code base. What else is there that you’ve experimented with? Do you have any guidance for somebody who’s getting started now in this area, on what they should do?

Marc Brooker 00:33:44 Yeah, I mean, I think there’s some really interesting work going on around indexing code bases and around code understanding and extracting understanding from code bases. But what I would encourage people who are starting out, getting into this kind of coding, to do is to think about software design and think about internal interfaces. Think about types, think about APIs, and think about building your software in a way that well encapsulates the functionality of each component. That makes it much easier, when you say something to the AI or you change the specification, to find the lines of code that are going to do things, right? Like, it’s in ui.ts or whatever, rather than, hey, this is just spread out through my code base. I can go and find the lines of code, I can go and change the lines of code, and they are as local as possible.

Marc Brooker 00:34:46 And so, one of the things to maybe optimize for and pay attention to as you’re getting started with this is: when I make a change to my specification, maybe to add a small feature, how many of the modules of my system does the AI need to touch to make that happen? If it has to touch every module in my system, well then, I’ve got a design problem that is going to make my AI-driven development less productive over time. If it can make a small feature change by touching one or two of the modules in my system, one or two of the libraries, one or two of the services, however I’ve factored things, well then I’ve got a design that is likely to work over the long term and scale up well without blowing up context windows, without requiring sophisticated techniques to index and so on. And so, if I’m seeing that number, that count of modules I have to change to build a feature or add some capability, go up over time, maybe that’s the time to step back and work with the AI to do some refactoring, to tease apart modules, to tease apart APIs. And the good news is that these AI agents, especially once you’ve invested in great testing, are extremely powerful at doing refactoring tasks.
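
The locality signal Marc suggests watching, how many modules each spec change forces the AI to touch, can be computed from version control history. A minimal sketch, under the assumption that the first path component of each changed file names its module (in practice the path lists would come from something like `git diff --name-only`):

```python
def modules_touched(changed_paths, depth=1):
    """Return the set of modules touched by one change, treating the
    first `depth` path components as the module name (an assumed
    convention, not a universal one)."""
    return {"/".join(p.split("/")[:depth]) for p in changed_paths}

def locality_trend(changes):
    """For a list of changes (each a list of changed file paths),
    return how many modules each change touched, oldest first."""
    return [len(modules_touched(paths)) for paths in changes]

# Hypothetical example: three feature changes over time.
# A rising count is the design smell Marc describes.
history = [
    ["billing/invoice.py", "billing/tax.py"],          # local change
    ["billing/invoice.py", "api/routes.py"],           # two modules
    ["billing/tax.py", "api/routes.py", "ui/form.ts",
     "auth/session.py"],                               # spread out
]
print(locality_trend(history))  # → [1, 2, 4]
```

A trend like `[1, 2, 4]` would suggest, per the discussion above, stepping back and refactoring to tease modules apart before the count climbs further.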

Kanchan Shringi 00:36:10 Let’s talk about your comment on blowing up context windows. When does that happen? And a related question: when is there too much context, such that it actually harms outcomes?

Marc Brooker 00:36:25 This varies model to model, and it has become less of a problem as we’ve had some slightly better model capabilities, but it is still a real problem, right? And so, if I go into a coffee shop and they say, what would you like to drink? And I say, I would like a latte, and by the way, it’s also 18 degrees outside and the Seahawks won yesterday. Not only is that going to waste a bunch of time, but it is going to confuse the person, who’s thinking, why is this relevant? Why are you telling me all of these things? Is there some way that you want me to customize your drink based on all of this extra information? And so, it confuses humans. And I think, almost by analogy, and it’s a little bit risky to reason by analogy with AI, but almost by analogy, putting a bunch of irrelevant stuff into a context window makes those AI outcomes worse.

Marc Brooker 00:37:16 And so, this is where context management becomes so important, and there are two aspects of it. One of them is all of that design and modularity stuff that I just talked about. And the other is that the AI tool, whether that’s Kiro or one of the many other AI-powered development tools, has to manage what’s in the context window, making sure that there is enough context to give the AI the power to at least discover what it should be doing. But there shouldn’t be too much context. If it contains a bunch of irrelevant noise, we’re going to have a worse outcome. And so, if you think about what agents do: well, one of the things that AI agents do is build their own context window through a process of discovery.

Marc Brooker 00:38:05 They’ll go and read that file, search this index, use this MCP tool to do a semantic search over the documentation, and try and build their own context window. And again, one of the things that makes agent development challenging is making sure they build that context window as reliably as possible, containing as high a ratio of relevant to irrelevant information as possible. And so there’s kind of three things there. There’s: how do I build and maintain my code base to make it context-friendly for humans and agents? How do I, as somebody who’s building development tools, think about context management? And then how do I, as a developer of agents more generally, think about guiding those agents to build their context windows effectively? And then the tools I give to the agents: what do they return? Do they return the right stuff? Do they give the agent the ability to request only the facts it needs, and so on?
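
The budget-and-relevance tension Marc describes, enough context to discover the task, but no irrelevant noise, can be sketched as a greedy packing step. This is a simplification of what real agent frameworks do; the relevance scores here are assumed to come from some upstream semantic search, and all names are hypothetical:

```python
def pack_context(candidates, budget, min_relevance=0.3):
    """Greedily fill a token budget with the most relevant snippets,
    dropping anything below a relevance floor so noise never enters
    the window. `candidates` is a list of (snippet, relevance, tokens)
    tuples, with relevance in [0, 1]."""
    chosen, used = [], 0
    # Consider the most relevant snippets first.
    for snippet, rel, tokens in sorted(candidates, key=lambda c: -c[1]):
        if rel < min_relevance:
            break  # everything after this point is below the floor
        if used + tokens <= budget:
            chosen.append(snippet)
            used += tokens
    return chosen

# Hypothetical discovery results for a change to a billing module.
candidates = [
    ("def charge(card, amount): ...", 0.9, 120),
    ("README: project layout",        0.6, 300),
    ("CHANGELOG 2019 entries",        0.1, 500),  # noise: filtered out
    ("test_charge happy path",        0.7, 200),
]
print(pack_context(candidates, budget=500))
```

The floor is doing the work Marc emphasizes: a bigger window filled with the 2019 changelog would be worse, not better, than a smaller window containing only the charge function and its test.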

Kanchan Shringi 00:39:06 What about the process to make sure that the source of the context, if it’s not the code directly, is kept up to date? Is that something outside of the dev process, or is it built in?

Marc Brooker 00:39:22 Yeah, and this is, again, not a new problem in the world of software: how do I keep my documentation up to date? One of the real challenges of the pre-AI era was keeping requirements and specifications up to date, and specification-driven development has made that much easier by making everything flow down from the specification; it has helped with that specification freshness problem a huge amount. But there’s still a documentation freshness problem: how do I keep my documentation up to date? In the pre-AI era, I, or my development team or technical writers, would look for changes in the code and go off and update documentation. Now I can very effectively accelerate that process with AI and have an agent, in the set of agents that is helping me with my software development process, that is going off and keeping my documentation up to date, keeping my README up to date, keeping those API docs up to date.

Marc Brooker 00:40:32 And that feeds back into the next round of my process where I want to make a change. How do I know that that API doc is up to date? Because I have continuously kept it up to date as part of my software development flow. And so even when I’m vibe coding, even when I’m not doing the specification-driven development process, I always make sure that my prompts or my steering documents are directing the agents to keep a README up to date or keep a piece of documentation up to date, both for my reference as a human and my customers’ reference as they use my software, but also for the reference of the agent itself as it makes the next set of steps.

Kanchan Shringi 00:41:15 Thanks, Marc. I’d now like to spend a little bit of time on how do you make sure that the configuration and the model itself is effective? Initially, you roll this out to the team and now the core group that’s maintaining the config, the context, the different workflows that somebody has to do, how do they keep them up to date? Let’s say there’s a new model that comes along; how do you make sure that it hasn’t regressed the flow in any form or fashion?

Marc Brooker 00:41:45 Yeah, and so I think the formal answer to that is, we have an evaluation process, and that’s where we will take an agentic workload and, probably starting off offline, run a bunch of examples through a new model or a new configuration or a new prompt and compare the success of the old versus the new. That could be with a human looking at the outputs; it could be looking at whether the tests are passing; it could be looking at acceptance rate; or it could be using a pattern like LLM-as-a-judge, where you use another agent essentially to rate the agent outputs and say, these ones are better, and these ones are worse. The next step of that, and the one that I think is actually most powerful, is online evaluation, where, okay, I’m going to start experimenting with this new model in production, kind of A/B testing.

Marc Brooker 00:42:39 I’m going to send it 5% of traffic, I’m going to send it 10% of traffic, and then I’m going to feed back all of my success signals, whether those are LLM-as-a-judge, whether that’s human feedback, whether that’s latency information or test success rate, to look at: am I doing better with this new version? If it looks like I’m doing better, if those initial experiments are looking good, well, I’m going to ramp that up. Eventually I’m going to get that up to 100%, and then I’m going to be ready to do the next experiment. And then continuously in production, I’m going to be evaluating the outcomes of my agent. Again, the metrics I use for that depend on the task, and I’m going to know over time, is my success rate good? And if my success rate suddenly changes for whatever reason, or trends downwards, well then, I need to understand why that is.

Marc Brooker 00:43:33 Maybe my tasks have changed, maybe my customers’ needs have changed. Maybe I have some bad piece of context, maybe that README contains some false information that’s leading my agents off in the wrong direction. But the really key thing here, from the agent builder’s perspective (and here I’m not talking about the IDE user, who doesn’t really need to worry about this stuff), is that I need a robust set of evaluations so I can run a really data-driven improvement process around agents, around the models, around the prompts, around the context, around the way the context is managed. And the more I do that, the better the quality of product I can offer to my customers, and the more reliable it’ll be. But it also helps me do things like: hey, I’m doing something with a really big model today; can I lower my latency by using a smaller model? Answering that question in isolation, man, that’s next to impossible. Answering that question when I have a robust set of evaluators is much easier. And so, this is a process for agent builders, but it really goes back to the beginning of the whole question of what are we trying to achieve? What is the outcome that we want? What do our customers need, and what does our business need?
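
The ramp-up loop Marc walks through, 5% of traffic, then 10%, then more as success signals hold up, can be sketched as a simple policy function. This is an illustrative toy, not AWS’s mechanism: a real system would use a proper statistical test and richer signals, and every name and threshold here is an assumption:

```python
def next_traffic_share(current_share, old_successes, old_total,
                       new_successes, new_total,
                       min_samples=200, tolerance=0.02):
    """Toy ramp policy for an online A/B evaluation of a new model or
    prompt. Once the candidate has enough samples, ramp up if its
    success rate is within `tolerance` of (or better than) the
    incumbent's; otherwise shrink back to a small canary share."""
    if new_total < min_samples:
        return current_share  # keep gathering data at the same share
    old_rate = old_successes / old_total
    new_rate = new_successes / new_total
    if new_rate + tolerance >= old_rate:
        # Step to the next rung of a fixed ramp schedule.
        ramp = [0.05, 0.10, 0.25, 0.50, 1.0]
        higher = [s for s in ramp if s > current_share]
        return higher[0] if higher else 1.0
    return 0.05  # regression detected: roll back to canary traffic

# Incumbent: 940/1000 successes. Candidate at 5%: 232/250 successes.
print(next_traffic_share(0.05, 940, 1000, 232, 250))  # → 0.1
```

The success counts fed in would be exactly the signals mentioned above: LLM-as-a-judge verdicts, human feedback, or test pass rates, depending on the task.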

Kanchan Shringi 00:44:55 What is the outcome? And then what is the success rate? But maybe just specifically, what kind of things should teams be measuring? Should they be measuring bug rates? How much human correction is needed during human code reviews? Should they be measuring the divergence of the code from the spec? Could you share some of those thoughts on what metrics teams should focus on?

Marc Brooker 00:45:19 Again, the most important ones are the end-to-end ones. Are our customers seeing bugs in production? Are our customers seeing regressions in production? Are our customers happy with the in-production performance? Those are the most important metrics. They’re also the hardest to measure. They’re also the ones with the highest risk, because we want to catch things before customers are reporting bugs in production. And so, we’re going to choose some earlier-in-the-process proxies for those metrics. And that’s where: am I seeing failures in regression tests, for example? What is the failure rate of my regression tests? If I’m using human code review, am I seeing a good pass rate for that human code review? If I’m using a separate code review agent, am I seeing a good pass rate for that code review agent?

Marc Brooker 00:46:11 Is it giving me good results? If I’m using a pen-testing agent like the one that we announced at re:Invent that does kind of last-mile security testing, is that finding security bugs before things go into production? And so, if I’m seeing bugs being caught at my last pre-production milestone, I know something’s not going well earlier on in my software development process. And that could be inadequate testing early in the software lifecycle. It could be a symptom of bad design and interfaces. It could maybe be a symptom of some bad implementation choices I need to go and fix. Or it could be a misalignment between the requirements that are going into the tests and the requirements that are going into the software-building process. And so, if one of these things says you need to return a result to the customer within 10 milliseconds, and the other one says 10 seconds, well, obviously I’m going to have failures.

Marc Brooker 00:47:12 And so there’s this tension in these metrics: the most valuable metrics are the ones all the way at the end of my software development lifecycle, because they are what I really care about, they’re what my customers see, but that’s the most expensive place to catch things. And so, I want to push as much of that catching, and those metrics, as far left in my pipeline as possible. Again, a classic idea of software engineering, right? Catch bugs early. Software engineers have been saying that for four or five decades, and that hasn’t changed.
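
One way to track the shift-left pressure Marc describes is to look at where in the pipeline defects are actually being caught. A minimal sketch; the stage names and counts are hypothetical, and real teams would also weight stages by the cost of a catch there:

```python
def catch_profile(defects_by_stage):
    """Given counts of defects caught at each ordered pipeline stage,
    return each stage's share of the total. A growing share in late
    stages (or production) means earlier stages are missing things."""
    total = sum(defects_by_stage.values())
    return {stage: round(count / total, 2)
            for stage, count in defects_by_stage.items()}

# Hypothetical quarter of defect data, ordered earliest to latest.
stages = {
    "unit_tests":       40,
    "regression_tests": 30,
    "code_review":      15,
    "pre_prod_security": 10,
    "production":        5,   # the expensive place to find out
}
print(catch_profile(stages))
```

Watching the production share of this profile over time gives a concrete number for the tension described above: the metric you care about most, measured where it is cheapest to act on.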

Kanchan Shringi 00:47:45 Thanks, Marc. Now let’s talk more about how do you roll this out to the team? Are there any prerequisites? Is there a mind shift that has to happen? Or do you just have a meeting, you introduce the tool and tell everyone, hey, go use it?

Marc Brooker 00:48:00 Yeah, I mean, I think that varies. I think the biggest prerequisite is almost cultural, right? So, what we’ve seen with adoption of new development practices is that you need a team that has a great understanding of their customers and business, and high standards, right? You can’t accept, I don’t know, slop, I guess is the term that people use these days. We don’t want that in our software processes. But we also see the fastest transformation with teams that have great curiosity, who are really interested in picking up new things, and a great bias for action: hey, we’re going to go off and try stuff and experiment with stuff. And so it’s that tension within teams that I’ve seen really drive success here: the teams who say, well, we’re going to go off and try new things, and we’re really interested in learning and trying and adopting new tools, but at the same time, we’re not going to settle for bad outcomes for our customers, for our business, for our engineering teams.

Marc Brooker 00:49:02 And when you put those two things together, I think that’s really where the magic happens. Beyond that kind of cultural goodness, we also shouldn’t expect that teams can pick up a new tool and know immediately how to use it well, right? You need this kind of communication and sharing: hey, we’ve tried it this way, these are the things we learned, these are the things that we want to pass on to you. You want to bake those into the communication you have between your teams. And then if you’re building tools, you want to bake those lessons into your tools. And so again, I think another piece of success is making sure that as a team learns how to use the tools better, or to improve the tools, the whole organization learns.

Marc Brooker 00:49:49 And ideally the whole world learns, and we can drive everything forward. And sometimes that’s big learnings: hey, we want to introduce specification-driven development because we’ve learned that it scales better than prompt-by-prompt vibe coding. And sometimes it’s little things: hey, we found that indexing our code this way or structuring our code this way works well. We found, for example, concretely, internally at Amazon, that using AI-powered development with a language like Rust, with its strong type system and a bunch of compile-time checking, is really useful and helps accelerate development by catching bugs earlier in the pipeline. And so that’s a message that we want to send out to our teams: Rust works well, TypeScript works pretty well, and the type annotations in Python are worth investing in.

Marc Brooker 00:50:42 And so that spreading of learning across an organization is super important. And then maybe the last thought on that is, no surprise to anybody, this whole world of AI is moving so fast, right? There are new techniques, there are new tools, seemingly every day and every week. One of the things that you need to balance for success is that you want your teams to be paying attention to those new tools, to be curious about them, but you also can’t afford to go and chase every week’s trend. And so, there’s a balance to be found there. We want to adopt the best new things quickly, but we don’t want to spend all of our time just thrashing on trying the latest and greatest or most fashionable tool of the moment. So yeah, there’s a lot to put together into organizational success.

Kanchan Shringi 00:51:37 So certainly curiosity is critical, but bias for action seems to be more important. Based on your comment, though, specifically with AI there’s a little bit of fear as well. How do you address that?

Marc Brooker 00:51:54 What kind of fear? Can you say more about the fears?

Kanchan Shringi 00:51:56 I mean, with AI doing more and more, are different things expected of the developer, or less expected of the developer, as we proceed on this journey?

Marc Brooker 00:52:07 Well, I think that the job of software development, and more broadly the job of software engineering, is changing. And our jobs are going to be very different in 2, 5, 10 years from the jobs that they are today. And with any major change there is going to come uncertainty. People are going to think about, what does this change mean for me? And for the reasons that I went into this field and the things that I love about this kind of work. And I’m super optimistic about that, right? I think AI-powered development is freeing us up to spend more time on the things that matter, whether those things that matter are being closer to customers, being closer to the business, really understanding the big picture of the systems we’re building, or getting really deep on various parts of the design, getting really deep on protocols, getting really deep on efficiency, and less time on the sort of undifferentiated busy work of building.

Marc Brooker 00:53:17 So yeah, I mean, it’s a time of really fast change in the industry, and I think that is always going to come with some level of anxiety. I’m super optimistic that it is going to free us up to do a job that is more fulfilling and more valuable. And we’ve seen over decades how powerful an economic force software has been. And for the whole lifetime of software, it has been supply constrained: the economic impact of software has been constrained by how much great software we can build. And I strongly believe that if we can drive down the cost of software development with AI tools by 10X, there’s going to be at least a 10X economic impact of software, if not a 100X economic impact.

Marc Brooker 00:54:08 And so, I think software and software engineering is going to be even more important in the future than it is today, but it is going to be a different job. It’s going to involve different things, and we’re going to need to optimize in different ways. And I think it’s very reasonable for people to feel challenged by that. I think it’s very reasonable for people to reflect and say, what does this mean for me? I can’t say anything other than, personally, as someone who’s been excited and passionate about technology, I think the future is just super exciting.

Kanchan Shringi 00:54:43 Thanks, Marc. So, starting to wrap up now, a couple of questions. What about the steps beyond the development of the software? What about deployment, incident management, et cetera? Is that tied to anything that you’re thinking of with Kiro or beyond that in AWS?

Marc Brooker 00:55:02 So, at re:Invent back in December, we announced our DevOps agent, which is an agent that is designed for doing these DevOps things in the cloud: helping out with incidents, helping out with tickets, helping out with infrastructure maintenance, and so on. So yeah, another part of the work that software teams do. We announced our security agent, which is for helping with that pen testing and other parts of the security testing lifecycle. We have AWS Transform, which is a set of agents for moving systems from one implementation to another, whether that is small patching tasks, like, hey, I want to pick up the latest version of Java, through to major re-architectures, like, I want to get off a mainframe and get into the cloud.

Marc Brooker 00:55:52 So, it goes vastly beyond just the work of building software. We’re going to use agents to accelerate the deployment of software, to simplify the operations of software, to simplify software maintenance. And that’s a huge one, right? I don’t know many software engineers who are excited about making sure I can migrate from Java 11 to Java 17, or 21, or whatever the latest one is these days. But it’s critical work, for security, for efficiency, for all these other things. And so, that’s going to be a big one. The ability to change backend implementations much more easily and with lower risk is huge for engineering teams, right? Like, hey, I have all this legacy, I built these things, maybe I wrote all this code in Fortran. It is the crown jewel of my organization because it’s the only expression of my business logic.

Marc Brooker 00:56:47 How do I bring that into a modern implementation? How do I bring that into a modern architecture? Again, this is something that AI is really going to power for us. Testing, validation, UI testing, even updating tickets, there’s so much of the process around software. I think these are areas that we’re investing in heavily at AWS, because we have paid a bunch of attention to where software developers, both internally and in our customers’ organizations, are spending their time. But they’re not as shiny and cool as software building. And so, all the buzz is around, here’s this thing that helps me write more code more quickly. And I think that’s okay, but really the big picture is, here’s something that allows me, as a software organization or as a leader of a software organization, to deliver value to my customers more quickly and more reliably. And that’s going to take much more than just writing more code.

Kanchan Shringi 00:57:51 But still, you are writing code as you’re building these agents. Are there special considerations for generating code for the agents themselves?

Marc Brooker 00:58:03 Yeah, I think that is an area that is emerging. So, if I look at the first few generations of agent building, it’s been a bunch of Python, a bunch of kind of ad hoc techniques, and it’s been very valuable. But over time, what’s emerging is, can we mix in tools like symbolic reasoning? Can we add more structure? And so, for example, we built this cool set of features into Strands, which is our agent framework, called Strands steering, that uses a neuro-symbolic technique combining models and symbolic reasoning to nudge agent trajectories in the right direction. We are seeing more tools that close that optimization and evaluation loop, improving tool descriptions and so on. We are seeing tools like the policy capability that we built into AgentCore, which allows you to take a plain-text description of a security constraint or an operational constraint, turn that into a crisp piece of authorization code in a provable language like Cedar, and apply it to all the agents.

Marc Brooker 00:59:20 And so I think what we’re going to learn and are actively learning is how do we build software development practices and tools that go around agents that allow us to take advantage of the flexibility and power of agents, which is what we need, right? We want our agents to be flexible, we want them to almost be creative in their ability to solve problems, but we also need to do that in a way that is secure and meets the business needs and doesn’t do things that are dangerous. And so policy helps balance that. Strand steering helps balance that, we also need to reduce cost and that’s where things like model customization come in. And so, agent building is in this super exciting moment of a huge amount of both existing impact, but also a massive amount of innovation. And so, I think what we’re going to see is over the next few years, building reliable agents is going to become even easier and easier. And building cost effective agents with great ROI is going to be easier. And so that’s been our focus at AWS and it’s certainly a focus across the industry of how we take agent building from being this fairly ad hoc, almost kind of bunch of spells to being a real engineering discipline.

Kanchan Shringi 01:00:44 Thanks, Marc. We don’t have time, so I won’t ask you about the neuro-symbolic reasoning that you mentioned a few times, but I’d love to get some references to add to our show notes for listeners.

Marc Brooker 01:00:55 Yeah, we’d love to share.

Kanchan Shringi 01:00:57 Is there anything that we missed today that you’d like to spend a few minutes on?

Marc Brooker 01:01:02 I think the overall message for me for software builders is: if we think about code, it encodes what it does, right? Very, very crisply, to computers. And we haven’t historically been reliably in the habit, as software builders, of writing down why. Why is the code like this? And I think specification-driven development gets its power from making the why explicit, both to humans and to models. And so, I think that is going to be the huge transformation we see in software: the details of the how are maybe going to be less valuable, and the artifacts explaining the why (why is the software designed this way, what is it trying to achieve, who is it trying to achieve it for) are going to become more and more valuable. That is going to be true if you’re doing specification-driven development. I think it’s true if you’re doing vibe coding, and I think it’s going to be true of whatever the next waves of agent-powered, or AI-powered, or even human development practices look like.

Kanchan Shringi 01:02:12 Thanks Marc. How can people contact you and keep up with your work?

Marc Brooker 01:02:16 Check out my blog. That’s brooker.co.za. And I’m on LinkedIn and X and various other social platforms. I love to share the work that my team and I are doing here at AWS on those platforms, and I always love to hear thoughtful questions from folks and have great discussions about the future of software engineering.

Kanchan Shringi 01:02:37 Thank you so much, Marc. This was a very interesting discussion and a very interesting topic.

Marc Brooker 01:02:42 Great. Well thank you and thanks for the opportunity and fantastic conversation.

[End of Audio]
