Ipek Ozkaya, Principal Researcher and Technical Director of the Engineering Intelligent Software Systems group at the Software Engineering Institute, Carnegie Mellon, discusses generative AI for Software Architecture with SE Radio host Priyanka Raghavan. The episode delves into fundamental definitions of software architecture and explores use cases in which gen AI can enhance architecture activities. The conversation spans from straightforward to challenging scenarios and highlights examples of relevant tooling. The episode concludes with insights on verifying the correctness of output for software architecture prompts and future trends in this domain. Brought to you by IEEE Computer Society and IEEE Software magazine.
Show Notes
Related Episodes
- 447: Michael Perry on Immutable Architecture
- 525: Randy Shoup on Evolving Architecture and Organization at eBay
- 308: Gregor Hohpe on IT Architecture and IT Transformation
- 331: Kevin Goldsmith on Architecture and Organizational Design
References
- LinkedIn Profile: @IpekOzkaya
- Software Architecture in the Age of Generative AI: Opportunities, Challenges, and the Road Ahead
- https://arxiv.org/pdf/2403.01709
- https://publikationen.bibliothek.kit.edu/1000165891
- https://www.computer.org/csdl/magazine/so/2023/05/10273784/1R6sNhsUTVS
- https://www.informit.com/store/managing-technical-debt-reducing-friction-in-software-9780135645932
- https://dl.acm.org/doi/10.1145/3582083
- https://arxiv.org/abs/2303.07839
Transcript
Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.
Priyanka Raghavan 00:00:19 Hi, I’m Priyanka Raghavan for Software Engineering Radio, and today I’m chatting with Ipek Ozkaya, a principal researcher and technical director of the Engineering Intelligent Software Systems group at the SEI at Carnegie Mellon, on the topic of Gen AI for software architecture. Ipek’s research interests include developing techniques for improving software development efficiency and system evolution with an emphasis on software architectural practices, software economics, and managing technical debt. She’s also co-authored a book on managing technical debt. So welcome to the show, Ipek.
Ipek Ozkaya 00:00:57 Thanks a lot for having me, Priyanka, especially for a topic that is very exciting given all the things going on in software engineering and AI these days.
Priyanka Raghavan 00:01:07 Yeah, I’m really looking forward to it. And before we jump into it, I wanted to set the context by asking you a question on what are the key elements needed to define a software architecture?
Ipek Ozkaya 00:01:20 So this is a very good place to start, especially now that even the term architecture is being overloaded with model architecture, software architecture, and system architecture. The way we have defined software architecture at the Software Engineering Institute for the past two decades and more is as a way to reason about different levels of abstraction. So we define it as: the software architecture of a system is the set of structures needed to reason about the system, which comprises the software elements, relationships among them, and the properties of both. I gave you the textbook definition. What this means is you need to think about the structuring of the behavior of the system from different perspectives. For example, how the code elements are structured together and how they relate to each other is one perspective, one set of structures, which we call module structures. How the software runs at runtime and how it interacts with different external systems is the runtime behavior: the component-and-connector structures.
Ipek Ozkaya 00:02:18 And that’s another perspective of the system. And then how the system is deployed and distributed, and where things run, globally or locally and whatnot, is another perspective, which is the deployment view. So realizing when we say software elements, you look at the elements from different perspectives and how they interact with each other, whether in module structures, at runtime, or at deployment time. So that’s what makes the software architecture; how you define those elements is very important. But those are just, maybe, the boxes. How they communicate with each other, their relationships, the message exchanges, and all the dependencies are also very critical. So that totality is the software architecture of a system. We love the boxes-and-lines diagrams for expressing them, but that’s only one perspective of it. We might sometimes forget the other perspectives, and that’s the baseline for how we talk about software architecture, especially when we talk about large-scale systems.
Priyanka Raghavan 00:03:14 Okay, that’s an impressive answer. And so I would have to ask you now, what are, say, the routine tasks involved in architecting? Because the cases where we see Gen AI used are typically in terms of automation. So are there routine tasks in architecture or architecting?
Ipek Ozkaya 00:03:34 I think this is an excellent question. What are the roles and responsibilities of a software architect? And it of course varies, right? There’s definitely being able to define that software structure and behavior and the requirements that drive it, what your context is and how you’re implementing it, which is the hands-on-the-keyboard part of the implementation and how it relates to the abstractions. That is one aspect, but there are also all kinds of other roles and responsibilities of a software architect: collecting requirements, documenting the key decisions, interacting with stakeholders, documenting the requirements, understanding the context. There are all kinds of maybe not necessarily hands-on, implementation-related tasks; there are the communication, information-gathering, and decision-making tasks. So let’s put those aside for a moment because our topic is automation and Gen AI. Let’s focus on the tooling and what it means for a software architect’s roles and responsibilities, especially with the explosion of tools that we’re seeing with Gen AI.
Ipek Ozkaya 00:04:34 So there is the decision making, the requirements-to-architecture decisions, as one aspect of the architect’s roles and responsibilities. What is the knowledge that you’re drawing from? Where does that knowledge exist? Some of it is routine knowledge, like design patterns, architectural knowledge, other similar systems. There is the legacy of the system that has been there for a while that you need to take into account, because what has been implemented drives your set of next decisions. It creates the context, it creates the constraints that you need to work with. So understanding the current system is definitely a key responsibility of the architect. Understanding where the key architecturally significant requirements are implemented within the system, and which elements of the system really comprise those implementations, is also another key responsibility of the architect. Deciding how responsibilities are brought together and encapsulated into software elements is a responsibility of the architect.
Ipek Ozkaya 00:05:31 And of course, most importantly, how does the implemented system conform to the architecture, and how do they evolve together? That is a responsibility of the architect and the team, of course, because this is not necessarily a one-person responsibility. So when it comes to the tooling, there are quite a number of tools that might be available to the architect. But at the end of the day, what we’re seeing with Gen AI is really implementation-driven tasks, like how do we generate code structures and methods and classes or other implementation structures, but we’re forgetting the design of the system, we’re forgetting the structure and behavior of the system, because of the excitement about the scope of the tasks. This doesn’t mean it won’t relate to the design and architectural aspects. What it means, however, is we need to step back and think about what are some of the tasks that are automatable.
Ipek Ozkaya 00:06:23 For example, code summarization is one, architecture extraction is another one, generating different kinds of implementation constructs, maybe test cases, and code repair. These are all aspects of the system and they relate to the design and structure and behavior, but I think we’re more focused on the implementation code generation part, not necessarily relating it to the structure and behavior abstractions. So how does it relate to the roles and responsibilities? It’s the architect’s role to do that conformance and use the required tools to be able to evolve the system based on the architectural decisions and the evolution of the system.
Priyanka Raghavan 00:07:04 Okay. That’s brilliant. I think we’ll probably deep dive into a lot of those areas that you’ve alluded to, but one of the things that I wanted to ask you here was, when you talk about conformance to the architecture, what’s the gap between, you know, your abstractions turning into reality?
Ipek Ozkaya 00:07:21 Oh, this is in fact probably a multiple-decades-long challenge that not only architects but all software engineers face, because the abstractions are communicated with different tools, right? You communicate the abstraction; in fact, sometimes in the earlier days of Agile, abstractions were communicated with metaphors. Abstractions are communicated with high-level diagrams, context diagrams, UML diagrams, whatever diagrams are fit for your purpose. Whereas when you’re implementing it, you are really talking about lines of code and methods, or any of those abstractions could be encapsulated in frameworks, libraries, and whatnot. The gap starts opening when the traceability from your architectural decisions into your implementation is lost, especially when the architecture is not speaking the vocabulary of the details; design patterns and tactics help provide that vocabulary. For example, if I’m talking about a publish-subscribe approach, then I could say, okay, I have publisher elements, I have subscriber elements, my publisher elements will have particular methods.
Ipek Ozkaya 00:08:22 Those methods need to be implemented. Here are the signatures of those methods, versus the subscriber elements and whatnot. So that level of detail then starts driving the implementation as well. It’s when we stay at the very high-level abstractions that we lose that, and then the gap starts happening. Another way the gap starts happening is when you have the implemented system and you are evolving the system from the code base rather than evolving the system from the architecture. And to this day, we do not have tools that help us speak both languages in the same tool base, and because of that, software engineers need to do that translation between them. Of course, many of these repeatable structures are now implemented within frameworks. They’re part of our tool chain, which helps incorporate some of the known good architectural decisions into our systems, because they might be just coming up in these open-source libraries, frameworks, tools, and elements. But when you have the domain-specific ones, then that gap starts growing. And that’s how that gap happens: if you’re evolving the architecture but not evolving the system or vice versa, or you are not necessarily recognizing that the system evolved in a way that needed to be represented in the architecture.
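The publisher and subscriber vocabulary described here, elements with particular method signatures that then drive the implementation, can be illustrated with a minimal sketch (a toy in-process broker; all class, topic, and message names are hypothetical, not tied to any specific framework):

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Minimal in-process publish-subscribe broker: topics map to handlers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        # A subscriber element registers a handler; this is the "subscribe" signature.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        # A publisher element sends a message; the broker fans it out to subscribers.
        for handler in self._subscribers[topic]:
            handler(message)


received: list[str] = []
broker = Broker()
broker.subscribe("orders", received.append)
broker.publish("orders", "order #1 created")
broker.publish("invoices", "dropped: no subscribers on this topic")
```

In a real system the broker would be a messaging framework or library, but even this sketch shows how the pattern’s vocabulary (publish, subscribe, topic) maps directly onto method signatures that the implementation must honor.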
Priyanka Raghavan 00:09:38 So I think that’s again a wonderful answer to show the gaps. And it took me back to the original reason for having this talk, which was all based on the article that you wrote, which was, can architecture knowledge guide software development with generative AI? And there you used a term that, you know, hit me so hard: you called some of them ivory tower architects. So the question I wanted to ask you here before we deep dive is, do you think the ivory tower architect problem could be resolved with something like Gen AI, because now they’re able to see the potential code?
Ipek Ozkaya 00:10:15 The question you’re asking is a fundamental question, I think, in terms of how we develop and evolve and sustain systems. There’s the very high-level knowledge, where you might make decisions in your corner based on whatever inputs, and then there’s the reality of the day-to-day implementation: the technology changing, hardware changing, your user base changing. And the teams need to react to those. They do not have the time to coordinate with many of the stakeholders that need to be coordinated with. So it comes down to the individuals: every single developer, first of all, has to understand architectural abstractions and be responsible for some of them. What we’re facing with generative AI tools is both an opportunity and a risk. Let me talk about the opportunity first. You talked about the ivory tower architects, who might hold all this grand knowledge in terms of, okay, architectural patterns, tactics, quality attribute scenarios, and some of those details at the level that they’re exposed to, but not always exposed to the developers.
Ipek Ozkaya 00:11:15 Now, when we have these tools, all that knowledge is in fact easy to incorporate into the tools, because it’s text, it’s out there, and the existing large language models can actually digest that information. And you can query, okay, what do you mean by publish-subscribe? And can you please show me an example of it? That interactive mode with some of these tools that we’re seeing, even with the chat tools, will allow the knowledge to be transferred. So that’s the positive side. And there are particular kinds of trade-offs that you might ask about: okay, what might be some of the alternatives that you would pick to solve a particular design challenge or an implementation challenge? So knowledge sharing could actually be improved with the tools that are emerging. That’s the positive side. The potential risk is that, especially when we’re using these tools for implementation tasks, we’re interacting with those tools with a limited scope.
Ipek Ozkaya 00:12:07 Although the moment I say this, things are improving; we know the tool space is really, really evolving in a very fast-paced way. But we’re not necessarily thinking about the entire system, how the entire system is really implemented. Am I implementing what I ask about in the same way as the rest of the system? Is it consistently done? Is it taking into account some of the concerns that might be coming from architecturally significant scenarios and so on? Those are questions we’re only now seeing being asked, because the excitement has maybe leveled off a little bit. Now we see a lot of tools out there that are helping different kinds of roles and stakeholders interact with them. So I think that’s where we will see the ivory tower architects maybe less, because now the tools will have a little bit more power and that interaction will be improved.
Ipek Ozkaya 00:12:58 And what I would also like to emphasize here is that the tools are not the be-all and end-all. Every single individual, wherever they interact, whether they’re the model developers, the data collectors, or any of the individuals behind the Copilots and Bards, or wherever they are in the tool chain (I guess food chain), recognizes that the advance of generative AI tools is a human-AI partnership. The responsibility is on the user to take advantage of these tools while the tools take some of the burden off the user’s shoulders. So it’s important that we recognize it’s a human-AI partnership. What does that mean from the perspective of the architect, or the software engineer who’s using these tools? How do I guide the tool with my expertise so that I can actually get the right answer? How do I craft my prompt so that it includes some of the design vocabulary? And how do I judge the response, in terms of evaluating whether it will be fit for purpose for whatever task I’m trying to address?
Priyanka Raghavan 00:14:03 Very interesting. So I was wondering if we could maybe wrap up this introduction part with another question, which is to ask you about some typical use cases where Gen AI is used for architectural activities.
Ipek Ozkaya 00:14:19 So first of all, I think we need to be honest with ourselves: we’re really, really in the very, very early stages of these tools being investigated for more complex tasks. I call architecture a complex task because the workflow involves multiple steps; that’s why I call it a complex task. You need to be able to interact with a larger scope within the system as well. Because of that, we’re seeing only now some of the investigations emerging, but there are some common scenarios. Obviously architecture documentation is one: summarize this document for me, help me document this particular part of the system, generate an API, and all those kinds of things are very straightforward tasks. Code summarization, to help understand what you’re looking at in terms of understanding the context, is another one that helps. In terms of the roles and responsibilities, we talked about architecture knowledge sharing.
Ipek Ozkaya 00:15:12 Again, you could ask all kinds of architectural knowledge questions: okay, how do I implement this particular framework? What does using, say, a pub-sub architectural pattern mean for this particular system? What are other similar systems that I can learn from? These are all very high-level questions where you could take advantage of these models. The more interesting ones that are also emerging are of course the software evolution ones. How do I re-architect the system? How do I translate from one programming language to another while also re-architecting and changing the structure and the behavior of the system? These are more complex, because generative AI tools alone will not address them. You’ll have to have other tools within the system, and it also requires more research and more development of the tools to be able to address these more complex workflows.
Priyanka Raghavan 00:16:06 Okay, great. I think we’ll probably deep dive into both of these cases in the next section. But before we deep dive, I also wanted to ask you a question about something I keep seeing in the literature, where they talk about different models like GANs, which are generative adversarial networks, then variational autoencoders, and then recurrent neural networks. In a lot of the papers, or even articles, I see they say that the results get quite a bit better with each model. Do you have any take on that?
Ipek Ozkaya 00:16:37 So this is where I think the architecture 101 answer applies: it depends. Okay? And it’s really important, because, yes, some models perform better than others. In fact, just yesterday GPT-4o came out, and yes, it’s faster in some tasks, but there is also some even very quick experimentation coming out saying, yes, they’re faster, but they’re not always as accurate. So these are all trade-offs. I think it’s important to recognize that these trade-offs will change. It would be very difficult for me to make a very broad claim that one particular kind of model is better than another; there are such differences, but it’s important to step back in terms of what our goals are for these models. Some of the questions: are they trained with the relevant information that we would like to draw from?
Ipek Ozkaya 00:17:27 For example, if our goal is to use a brand-new version of a particular library or framework, even if the model might perform better in some tasks, if it’s not trained with that knowledge, it will not perform well. The other one is whether the context window is able to address the scope of the question that I’m asking. So that’s another one. Does it have the interaction mode, in terms of how I’m able to input the prompts and whether it’s able to process those prompts? Those are some of the ways I would evaluate these, and that’s why there’s a lot of experimentation that goes into it. One of the important aspects of using these models for software engineering tasks is consistency, especially when you’re using them for larger tasks. Will it consistently give me an accurate enough response so that I could actually replicate it for whatever part of the system that I need to replicate? Or if some other software team member were to interact with the same question and same task, would they really get the same response? So those would be some of the evaluation criteria that I would bring to bear, rather than generalizing that one is better than the other.
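The consistency criterion mentioned here, whether the same prompt yields the same answer across runs or across team members, can be sketched as a simple check (the model call is a hypothetical stub; in practice you would substitute your own client and likely a semantic comparison rather than exact string equality):

```python
from collections import Counter


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; swap in your own client."""
    return "use a layered architecture"  # deterministic stub for illustration


def consistency(prompt: str, runs: int = 5) -> float:
    """Fraction of runs agreeing with the most common response (1.0 = fully consistent)."""
    responses = [query_model(prompt) for _ in range(runs)]
    top_count = Counter(responses).most_common(1)[0][1]
    return top_count / runs


print(consistency("How should I structure this service?"))  # 1.0 for the stub
```

Against a real, nondeterministic model this score drops below 1.0, which is exactly the signal Ipek describes: a low score warns that two team members asking the same question may get materially different answers.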
Priyanka Raghavan 00:18:35 Okay, that’s a good point. Thanks for that. So let’s now deep dive into some of the Gen AI use cases that you alluded to before. The first thing I wanted to ask is, as an architect, there’s this concept of an architectural context to help you make those trade-off decisions. So can the tooling help you there?
Ipek Ozkaya 00:18:55 This is where whether your domain has very particular details that are in the public domain or not comes into play. Quite a number of software engineering decisions, implementation decisions, and architecture decisions are in fact not too specifically tied to the domain. The domain-specific parts are a smaller percentage of the overall systems, and that’s why we’re able to talk about architecture knowledge, generalizable tactics and patterns, and ways of implementing them. With that said, there are particular key domains, especially mission-critical domains or proprietary domains, where you might need to fine-tune the models for your context to be able to address some of the design decisions or some of the concerns there. So I would say a significant portion of the architectural design decisions would be possible to address at the high level. The question is, one, whether you could translate those into the implementation constructs throughout the system, and the other part is what information, what data you need to fine-tune with.
Ipek Ozkaya 00:20:03 For example, we talked about new versions of external libraries coming up as one scenario. You might have really, really specific business rules within your system that drive some of your architectural decisions, and you might need to fine-tune the models with those business rules. However, what that means is you need to pay attention to privacy concerns, security concerns, whether any information is leaking, and whether some of the public and private APIs that use that information need to be taken into account when they’re encapsulated in design decisions. Those are some of the considerations that you would need to take into account. If those dominate in the system, then the answer would be no: you would really need to make the trade-off from the perspective of, do I use a publicly available, open, general-purpose tool, or do I need to fine-tune these models for my particular use case?
Priyanka Raghavan 00:20:54 So I think one of the things I forgot to ask you about was this term architectural knowledge. Some is in the public domain, some is in the private domain. Is that something that can be defined?
Ipek Ozkaya 00:21:06 Let’s see, how do we define architecture knowledge? Software architecture knowledge has several components. First of all, one aspect of architecture knowledge is that it’s generic: it’s been around and vetted over a number of implementations. So there’s the architecture design: the design elements, encapsulated in terms of the components and the relationships as we talked about, are part of the architectural knowledge. Then there are the particular decisions. For example, I will not allow a particular element to communicate with another element because I’m constraining it for a security reason, a maintainability reason, a performance reason. That’s part of the architectural knowledge. Then there are any particular assumptions you’re making. There are quite a number of cases where things work in one context but wouldn’t work in another context; the deployment environment, for example, might be part of the assumptions you’re making and part of the context in which the particular architectural design decision is applied.
Ipek Ozkaya 00:22:06 So the totality of these makes up the architectural knowledge. But what are some examples? All kinds of things that we’ve been talking about: API design, distributed system design, different categories of design elements, embedded design. These are all architectural knowledge, in the generic form that is out there. What is important is how you instantiate that knowledge. That’s when it becomes the delta that you’re talking about: yes, I take a generic architectural knowledge construct, but what does it mean to put it in my context? Are the constraints of my context the same as the assumptions of that architectural knowledge that is out there? Can any of the existing tools help me walk through the elements of the process, even if they don’t give me the answer? Even if the tool does not have the information, maybe it helps me collect the information to make the right choices in terms of how I’m encapsulating different elements and making them ready for implementation. I don’t know if that addresses your question.
Priyanka Raghavan 00:23:08 I think so. I mean, one of the things that struck me was also the fact that every time we set out on a new architecture for a new component, a lot of the time we look at these old software architecture design documents or things like that, and that’s our architectural knowledge. But oftentimes we don’t have the patience to look through that, and a lot of people just start fresh, and then they end up with the same problems that were there in the previous system. So I was wondering if these Gen AI tools could also help you with that: can you use the architectural knowledge to mine and then help feed your new architecture?
Ipek Ozkaya 00:23:47 Correct. So I think you bring up an interesting point in terms of whether we can mine variations of the same construct through existing systems and come up with the space of assumptions and constraints we’re talking about. One of the things, if anybody has fancied reading any of the pattern-oriented software architecture books and all of that, the first thing they’ll recognize is that it is so not formal. There’s a lot of, well, in this particular context this variation might apply, in a different context that variation might apply. There’s a lot of variation in the assumptions, and sometimes you have to break the rules to be able to make it fit your context. But when are you breaking the rules so much that the pattern does not apply anymore? And what is that design decision space in terms of where the different rules apply? For example, for performance, let’s pick a 101 pattern, the layered pattern, right?
Ipek Ozkaya 00:24:42 In the layered pattern, the vanilla rule is that each layer only communicates with the layer below, and so on, and you do not do layer bridging, so that you’re actually able to encapsulate the decisions and whatnot. But if you have a critical performance concern, you say, you know what? I’m going to make a particular exception for this: one particular layer is allowed to access a different particular layer. Okay, this is an exception; it’s allowed, and it does not necessarily break the pattern, but it’s maybe one particular allowed deviation. So when it gets more complex, these tools can in fact help maybe mine some of these over time, so that you can say, okay, this is the vanilla way to implement it, here are some of the variations, and here are the trade-offs. The bottom line in architecture decision making and design decision making is the trade-off, right?
Ipek Ozkaya 00:25:29 What is the trade-off, and how do we understand the trade-offs? However, it’s very important to remember that large language models, generative AI tools, foundation models, are probabilistic models. They do not have the ground truth. The way they work is they select the next element in a sequence based on probabilistic reasoning. Of course, all these more sophisticated models are mimicking, or learning from, large data sources so that they are able to provide responses that are as correct as possible, but the probabilistic nature does not change. Because of that, the expertise of the user is essential. It increases the pace of the user, especially the expert user, but it does not guarantee that you always get a 100% correct response. So you need to check the response, and you need to understand how you’re checking the response, which might become more complicated with architectural decisions because we’re talking about all kinds of what-if scenarios.
Ipek Ozkaya 00:26:26 We said, okay, it’s based on the context. We said it’s based on the trade-offs. We said it requires some of the technical knowledge of the patterns and tactics, which by itself means there’s no one-size-fits-all, which also puts the responsibility on the users, especially software architects, to provide the right information in the prompt and also assess the outcome. So maybe that’s something to keep in mind when we’re looking into how I am making this domain-specific to my need, what kind of fine-tuning I need with that data, and whether I have the data available to do the fine-tuning.
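The layered-pattern exception discussed a moment ago, a vanilla each-layer-calls-the-layer-below rule plus an explicitly allow-listed bridge, can be sketched as a small conformance check of the kind an architect might run against extracted call dependencies (the module names and the allow-list here are hypothetical):

```python
# Layered-architecture conformance sketch: each module is assigned a layer,
# calls are only allowed to the layer directly below, and documented
# exceptions (layer bridging for, e.g., performance) are allow-listed.
LAYER_OF = {"ui": 3, "service": 2, "domain": 1, "storage": 0}  # hypothetical modules
ALLOWED_BRIDGES = {("ui", "storage")}  # the explicitly permitted exception


def violations(calls: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return calls that break the vanilla rule and are not allow-listed."""
    bad = []
    for caller, callee in calls:
        vanilla = LAYER_OF[caller] - LAYER_OF[callee] == 1  # only the layer below
        if not vanilla and (caller, callee) not in ALLOWED_BRIDGES:
            bad.append((caller, callee))
    return bad


calls = [("ui", "service"), ("service", "domain"), ("ui", "storage"), ("domain", "ui")]
print(violations(calls))  # only ("domain", "ui") violates
```

The allow-list makes the exception itself a recorded design decision rather than silent drift, which is exactly the kind of traceable rule a tool could mine and check over time.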
Priyanka Raghavan 00:27:06 So one of the things we talked about was the fact that Gen AI could help you in articulating or maybe understanding your design decisions, and there’s a lot of work there, as you explained. But another case which I wanted to ask you about: in terms of architecture, a lot of times they also talk about the architecture mimicking your organizational behavior and structure. So if you’re coming in as a new architect or into a new organization, would these tools also help you understand a bit of this?
Ipek Ozkaya 00:27:38 So one of the strengths of these tools, whether it’s an architecture task or any task, is that they’re really good with text. So if you have a text that describes the organizational structure, or if you have an architecture document describing it, they could definitely help with summarization and with helping you navigate through these. As for whether, given the organizational structure, you can deduce the architecture of the system from it: I would not rely on that, because first of all, we know that mapping is hypothetically true, but it diverges very quickly as well. And we also know that’s not always the right way to do it, because this is in fact cited not always in a good way; sometimes it’s negative, right? The organization structure drives the system architecture, but it shouldn’t be that way, because department A not talking to department B should not mean that the two parts of the system should not be talking to each other. So there are all these positives and negatives to that. But underlying your question is that, because such tools are good at summarization and text, any document that you have could help with bringing users up to speed. This is applicable to any role and responsibility, though; it does not have to be a software architecture related one at all.
Priyanka Raghavan 00:28:53 So having talked a little bit about the proficiencies these tools have with text, I wanted to ask you whether these tools are also good for producing good architectural diagrams.
Ipek Ozkaya 00:29:03 So there are in fact some early investigations into whether you can produce diagrams or whether you can study the diagrams. I would say yes; however, I think these are limited uses of these tools, because the bottom line, from my perspective, is how these tools help the end result, which is to develop an executable system that I deploy and that users can take advantage of. So a good diagram is a diagram that could actually be part of a model. It could help round-trip engineering, generating some of the code from the model, or whatnot. So from that perspective, we need to go into the formalisms: whether we are able to use a probabilistic tool like generative AI tools for formal reasoning, for generating formal artifacts, whether they’re models or diagrams, and whether we’re able to check whether that formalism is correct and use it for other downstream activities. I think that’s really the key question we should be asking. But yes, of course they can generate diagrams. You can review the diagrams. There are all kinds of fun things that you can do with them. That’s probably less exciting for me. More exciting is: am I able to generate formalisms, formal artifacts? Because code is also a formal artifact, or at least there are rules to it. Can I propagate those rules consistently so that my goal, which is the executable implemented system, is working in its intended use?
Priyanka Raghavan 00:30:38 That’s a great answer. It made me think as well, because the key question is that it’s a probabilistic model, right? So we have to…
Ipek Ozkaya 00:30:48 We have to remember it’s probabilistic. And we also remember why UML, the Unified Modeling Language, never quite worked, right? I mean, yes, there are benefits to it, but it never really worked because the systems are very big and it gets overwhelming. So the bottom-line question is: a large-scale system needs to be represented at different levels of abstraction without forgetting the implementation constructs. Will these tools help us navigate through different levels of abstraction consistently and speed up the process, while taking into account the context and the expertise that is required? And I think there’s an opportunity, and the opportunity comes especially from some of the new approaches that are coming out; even Copilot has workspaces and workflows. To be able to take advantage of generative AI tools, you need to think about the workflows. What is the workflow? It could be a narrow-scoped workflow.
Ipek Ozkaya 00:31:47 For example, code repair. I find a bug, I repair that bug, I compile and I deploy it. That’s a narrow workflow, because the scope of the implementation is maybe a couple lines of code. You can create rules for it; the space is much more tractable. That’s a small workflow, but then there are larger workflows as well. If the bug happens in many places in the system, I can scale the repair. That’s again maybe possible, but it starts getting more complicated. Well, if I have this, I could do it in one particular way or a different way, but that affects my performance. How do I choose one over the other? And how do I propagate that throughout the rest of the system? Then you’re starting to ask design questions. But that’s still a workflow; it just has different tasks within the workflow. And we shouldn’t assume that only generative AI tools will be part of the workflow. Other tools might be part of the tool chain that the developer can utilize without having to go between different tools. Those could be masked within the workflow for users of development tools, software architects as well as developers.
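The narrow code-repair workflow just described (find a bug, repair it, verify, deploy) can be sketched in a few lines. This is only an illustration: the buggy function, the string-replacement "repair," and the test gate are all invented for the example, with the repair step standing in for where a generative model would be called.

```python
# Minimal sketch of a narrow code-repair workflow: locate a defect,
# apply a candidate fix, and gate acceptance on tests. The repair step
# is a hard-coded stub standing in for a call to a generative model.

BUGGY_SOURCE = """
def add(a, b):
    return a - b  # defect: subtraction instead of addition
"""

def propose_repair(source: str) -> str:
    # Stub for the generative step; a real workflow would prompt a model here.
    return source.replace("a - b", "a + b")

def load(source: str):
    # Compile the candidate source and return the resulting function.
    namespace = {}
    exec(source, namespace)
    return namespace["add"]

def passes_tests(fn) -> bool:
    # The acceptance gate: a candidate fix counts only if the tests pass.
    return fn(2, 3) == 5 and fn(-1, 1) == 0

repaired = load(propose_repair(BUGGY_SOURCE))
assert not passes_tests(load(BUGGY_SOURCE))  # original fails its tests
assert passes_tests(repaired)                # candidate fix is accepted
print("repair accepted")
```

The point of the sketch is the shape of the workflow, not the repair itself: generation is one step among several, and the compile-and-test gate, not the model, decides whether the change is kept.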
Priyanka Raghavan 00:32:55 Yeah, actually this took me back down memory lane, because the question I was going to ask you is: can Gen AI be good for producing code that fits the architecture? And I remember, back in the day when I was doing my masters at CMU, we had a similar thing: an Eclipse plugin which we used for all our Java code. We used to have these architectural design decisions written there, and that would produce the code, and that way we would always see what the gap was between the architecture and the code. So in terms of that, in these Gen AI use cases, can these tools be good at producing code that fits the architecture? Or, if the architecture changes, can they refactor the code?
Ipek Ozkaya 00:33:41 So within your question, what I’m hearing is: how do we construct the prompt that represents the architectural decision so that the tools actually conform to it? For example, I would like to implement some capability, but I would like to make sure that I use a layered pattern because I might need to extend it. And when you’re doing the layered pattern, put particular functionality in a layer called X, put other functionality in a layer called Y, and so on and so forth. So it comes down to how the user constructs the prompt. That’s one step of the way, right? Should the prompt include all those key concepts to drive the generation? That’s question number one. And question number two is: given those concepts, will the models be able to structure the response so that it actually fits?
Ipek Ozkaya 00:34:42 I think the answer to both is yes, but I think we need to get a little bit smarter about what it means to construct that prompt. Because within that, we’re already saying we need to teach the user the architecture vocabulary. In fact, I jokingly say generative AI tools will make software architecture cool again, because it makes your job easier: rather than interacting with it method by method or line by line, you could just say, at a very high level, I would like to have a service-based system. I have X number of services. Please encapsulate these services this way or that way. Here are some of the communication constraints. Voilà. It didn’t work? Okay, maybe regenerate and fix that prompt. Will that make me faster? Will it give me feedback on the mistakes that I might be making in communicating my decisions? And how do I check the result?
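The "service-based system with communication constraints" example above can be made concrete as a small prompt-construction helper. A sketch only: the field names, template wording, and example layers are illustrative assumptions, not an established schema or anything from the episode.

```python
# A sketch of embedding architectural vocabulary (pattern, layers,
# communication constraints) into a generation prompt. The template and
# field names are hypothetical, chosen for illustration.

def architecture_prompt(capability, pattern, layers, constraints):
    lines = [
        f"Implement {capability} using a {pattern} pattern.",
        "Place functionality in these layers:",
    ]
    for layer, responsibility in layers.items():
        lines.append(f"- layer '{layer}': {responsibility}")
    lines.append("Respect these communication constraints:")
    lines.extend(f"- {c}" for c in constraints)
    return "\n".join(lines)

prompt = architecture_prompt(
    capability="order checkout",
    pattern="layered",
    layers={"ui": "request handling",
            "domain": "pricing rules",
            "persistence": "order storage"},
    constraints=["ui may call only domain",
                 "domain may call only persistence"],
)
print(prompt)
```

Structuring the decision as data first, then rendering it to text, also gives a place to hang the second half of the question: the same structured decision can later be checked against the generated code.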
Ipek Ozkaya 00:35:36 So maybe it’s the same workflow, but multiple steps are combined together. And that’s, I think, interesting for us to articulate and think through as we go through experimenting with these systems. Let’s all be honest with ourselves: we’re still experimenting. I think none of us really know where they are going to change how we develop. Yes, we do see some data coming out on productivity improvements and such. However, whether we will really be able to reap the benefits for large-scale system development remains open. Evolution is one aspect. You asked the refactoring question; I think that’s similar. There is a lot of knowledge out there on refactoring. So when we look at it from a narrow scope (for example, I would like you to refactor my long methods, or inline a method, extract a method), there are all these Martin Fowler refactoring rules.
Ipek Ozkaya 00:36:32 I think those have potential to work. They’re already in other kinds of tools; there are search-based and other AI approaches that already incorporate some of these. So from that perspective, those refactoring approaches will be possible to use with generative AI tools. I call those more local refactorings. When we talk about large-scale refactoring, which brings back the architecture, I think the jury is still out, because that brings back the steps of the workflow: what are the tasks that are going to happen, and how does the change propagate through the rest of the system? For that, you still cannot just rely on Gen AI tools. You need to really take advantage of other tools and expertise and evaluation and compiling and unit tests and whatnot, all the good and bad of software development and software evolution.
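One of the Fowler-catalog "local refactorings" mentioned above, extract method, can be shown as a tiny before/after. The example functions are invented for illustration; the defining property of the refactoring is that behavior is unchanged, which is exactly what makes such narrow tasks checkable.

```python
# "Extract method" (Fowler's catalog), the kind of local refactoring the
# discussion calls tractable for tools: the inlined banner logic below is
# pulled into a named helper without changing observable behavior.

# Before: the banner-printing detail is inlined in the reporting code.
def print_owing_before(name, amount):
    out = []
    out.append("*" * 12)
    out.append("Customer Owes")
    out.append("*" * 12)
    out.append(f"name: {name}")
    out.append(f"amount: {amount}")
    return "\n".join(out)

# After: the banner is extracted into a helper whose name states its intent.
def banner(title):
    return ["*" * 12, title, "*" * 12]

def print_owing_after(name, amount):
    out = banner("Customer Owes")
    out.append(f"name: {name}")
    out.append(f"amount: {amount}")
    return "\n".join(out)

# Behavior preservation is the acceptance criterion for the refactoring.
assert print_owing_before("Ada", 100) == print_owing_after("Ada", 100)
```

The large-scale case discussed next is harder precisely because there is no single equality check like this one: the change propagates across modules, and verification needs the whole tool chain.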
Priyanka Raghavan 00:37:24 No, that’s absolutely right for more complex systems. As you say, it’s not so trivial.
Ipek Ozkaya 00:37:30 And you know, we also do not have to use it for every single task we have. That’s also important to recognize. If these tools help with code completion, local code repair, and some generation tasks that make that part of implementation faster, more accurate, and quality-conformant, then the software engineers are left with more time to spend on the design, on structure and behavior, or tradeoff analysis, or whatever they need to do that requires more time. And that tradeoff is not that bad of a tradeoff. I am not willing to let go of potentially more complex use cases yet. But probably the answer is not one or the other. It’s really a combination.
Priyanka Raghavan 00:38:19 One of the things in the article that you wrote also referenced the piece on self-testing code on Martin Fowler’s site, with Xu Hao, where they talked about how ChatGPT can be used; I’ll reference that in the show notes. While I was very awestruck by the self-testing code examples over there, I also noticed a part of the article that talked about the context. I’ll just quote it: "once it receives enough words in the conversation, it starts forgetting the earliest material," which in effect makes it curiously forgetful. So he talks a lot about the context and the strict token limits. Has that changed as the technology has gotten better since the time you wrote the article?
Ipek Ozkaya 00:39:15 Oh yes. I mean, at the time when I wrote these articles, I would have to change them maybe every week. These articles were written while I was the Editor-in-Chief of IEEE Software magazine, so there’s a timeline: I typically have maybe two months. So between the first week and the third week, okay, let’s revisit what’s changing. That was a period of very fast change, and we are continuing to see change. I think there are two aspects where we’re seeing change. One is that the context window is larger now. You can feed in PDFs; the models are now multimodal, so you can feed in text as well as images and voice and whatnot. That is going to continue to improve. The forgetfulness, which is related to the context window, what it keeps in mind and all that, is also going to improve.
Ipek Ozkaya 00:39:59 But what is also changing is the specialized tools built on top of these models. Even if the context window does not improve, there might be other workarounds that help you keep that context and continue to build on the information. That’s one thing that is improving. The other thing that is improving is the prompt engineering aspect: how you construct prompts so that you are able to get the results the way you would like. In fact, Martin Fowler’s article was the inspiration that led me to think, wait a minute, there is more here. Going back to making software architecture cool again: software architecture will actually be important again, because you’ll have to recognize these constructs. It’s the kind of knowledge that you feed in, not only to generate the system with these design constructs, but also to generate the tests.
Ipek Ozkaya 00:40:49 When you generate the tests, you have to generate the tests using these particular rules. So that vocabulary becomes important as part of the recognition. The tools will continue to improve, but the apps that are built on top of the models will also continue to improve as we think about more complex use cases, or more particular use cases that software engineers can take advantage of. And not to mention, all the Copilots and all of those tools are also improving by the minute, because there’s a lot of excitement as well as potential in improving our systems through these tools. Because if we can write correct code and assess its design to start with, that’s actually quite intriguing in terms of our overall goal from a software engineering and system development perspective.
Priyanka Raghavan 00:41:40 I want to move on a little bit to the state of the practice and tooling. So from what I understood from what you said, a lot of the architectural knowledge and constructs and elements now have to be fed into this prompt text. And so there’ll be this new area of, I don’t know what to call it, research, or maybe area of study, on how you make things better with your prompt text. Do you think a lot of the tooling will come around that area?
Ipek Ozkaya 00:42:06 There’s some generic knowledge that is already out there in terms of prompt patterns: how do you construct prompt patterns? This is generic; these prompt patterns apply to any kind of use case, not just software engineering or software architecture. So that’s one class that is already out there. In fact, there’s already work coming out sharing some of these patterns. For example, work by Vanderbilt University, "ChatGPT Prompt Patterns for Improving Code Quality, Refactoring, Requirements Elicitation, and Software Design," shares some of the patterns. These are generic patterns. What is actually going to be the next level is: if I have a particular task (I’ve been using code summarization, documentation, and code repair as examples), what does it mean to construct a prompt for that particular task? But let’s bring it back to our topic, software architecture.
Ipek Ozkaya 00:42:59 How do I share the vocabulary that will guide the tool to address my concern properly? That will be the next level of patterns. And I do think there is work to be done there, and it will be coming out as part of the evolving work: how do I share that information, is that information part of the model, and so on. That is definitely part of the improvements that we’ll see in the very near future, if it isn’t being developed as we speak. But it’s also important to recognize that the prompt provides only part of the picture. The models will also continue to improve in their semantic knowledge, in how the elements of the patterns are incorporated within the model, and so on and so forth. So I think we’re going to see fast improvements in both spaces.
Priyanka Raghavan 00:43:42 So instead of a software architecture document, you might have a software prompt document, or some such flow.
Ipek Ozkaya 00:43:48 So I would equate the prompt with the requirements rather than the architecture document, because the prompt provides the way I communicate what I need from the system and what I want to get out of it. And with requirements, in fact, there’s a lot of good thinking to be done on how I express my requirements. Are there particular ways that I could improve expressing my requirements? How are the architecturally significant requirements communicated, and at what level? And do they include the solution, or are they just the requirement? Because when you’re expressing a requirement, there might be many solutions; when you’re generating code, you’re expressing one solution. So there are different steps that you need to take between an end-user requirement, or a particular functionality or use case, and what it means to create the implemented system. But yes, you are right: the nature of the artifact will, eventually, change. That’s in fact a good point. I had not thought about that.
Priyanka Raghavan 00:44:50 One other thing I wanted to ask you, and this might be something a bit basic: one of the things, as someone starting off as a developer, is trying to figure out whether I should go for a merge sort or a bubble sort, or how the database design should be done if you have a complex query requirement. These are things that you would typically run through with your software architect. Are these the typical questions that you can ask Gen AI now?
Ipek Ozkaya 00:45:22 I think these are questions that not only you can ask, but that it actually does pretty well on, for some of these established algorithms and approaches. Again, I will keep repeating myself here intentionally: it’s a probabilistic technology. It will just pick the next token probabilistically. In fact, there are parameters that you set in the tools sometimes so that it has more variation. It might provide a wrong answer that happens to have high probability, since it’s just probabilistically picking. So how you test the outcome is really important; it will generate something regardless. However, I will say, and I think this will probably be true for anybody who has played with these tools, that if you are an expert in a particular task, it really improves your ability to generate results for that task, especially if you’re able to craft the prompts appropriately.
Ipek Ozkaya 00:46:22 I’ve seen it work that way very well. I think a lot of developers are excited because code completion and things like that are technologies that have already been out there with other techniques; these tools just really, really accelerated their utility. And those are, I think, important advancements that now allow us to even talk about, okay, what is the next, more complex task that we can investigate utilizing these tools? So yes, established algorithms, and their ability to help you: yes, with a caveat. You have to test it. You cannot just rely on it, because of the probabilistic nature of the tools.
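The "parameters that you set so that it has more variation" mentioned above are sampling controls, most commonly temperature. A short sketch of the underlying mechanism, with made-up token scores: temperature rescales scores before the softmax, so low values concentrate probability on the top choice and high values flatten the distribution, which is why a high-probability but wrong answer can still be sampled.

```python
# How temperature controls variation in probabilistic generation:
# scores are divided by the temperature before softmax normalization.
import math

def softmax_with_temperature(scores, temperature):
    scaled = [s / temperature for s in scores]
    m = max(scaled)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]                 # illustrative next-token scores

greedy = softmax_with_temperature(scores, 0.1)  # low temp: near-deterministic
varied = softmax_with_temperature(scores, 2.0)  # high temp: flatter, more varied

assert greedy[0] > varied[0]             # low temperature favors the top token
assert abs(sum(greedy) - 1.0) < 1e-9     # both are valid distributions
assert abs(sum(varied) - 1.0) < 1e-9
```

Even at low temperature the output is still a sample from a distribution, which is the episode’s recurring caveat: the result must be tested, not trusted.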
Priyanka Raghavan 00:46:57 What are things that you should watch out for? Can Gen AI reduce the creativity or innovation of an architect?
Ipek Ozkaya 00:47:04 Oh, maybe. So there are several areas to pay attention to. One is the data that these tools are trained with. After a while it gets, maybe repetitive is not the best way to express it, but you get out what you put into it, right? You are getting the solutions that it was trained on. So tasks requiring creativity may be at risk. However, if we’re talking about giving a solution to an existing problem, we are already talking about repetitive solutions, so maybe the risk here is not that high. However, let’s look at the other side. What if it’s trained with bad examples? Because all the examples we have out there are open-source examples, examples that people are willing to share, or incomplete examples. That is really where the risk is: is the training corpus really populated with correct, well-constructed, good examples that provide you the correct answer?
Ipek Ozkaya 00:48:03 And I think this is a community effort in a way; it’s going to continue to improve. I’m really hoping the developers of the models are working in the background, improving some of these and cleaning their data and whatnot. So that’s really where the risk resides. The other one is circularity: are you injecting the same mistake back in, and so forth? Those are probably the mistakes to watch out for. As for areas or tasks requiring creativity: yes, there is some creativity in software architecture, but I would want to believe that correctness and fitness for purpose rank a little higher than creativity. Maybe there is some creativity there, but we’re not talking about artistic creativity here; we’re talking about the ability to answer particular, significant, specific requirements. So I don’t know whether it matters that we have a different solution every time, or whether the same solution works. Every organization, every individual might have a different opinion on that.
Priyanka Raghavan 00:49:07 I think that’s really well answered. I’ll keep in mind the probabilistic nature of the answers. So I think that is very important for all of us listening to the show.
Ipek Ozkaya 00:49:17 It is very important. And in fact, let me give another important reference for the listeners. There is an ACM article by Microsoft colleagues who looked into some of the early outcomes of developers using Copilot. And one of the things that they realized is that it does not replace expertise; it actually shines a light on expertise. What that means is, if I’m an expert in a particular area, I would very quickly be able to understand that the answer the tool is giving me is wrong, or I would know how to ask the question correctly so that I get the right answer. So that’s really important in terms of how the tools are used. That’s why human-AI partnership is something everybody now recognizes: it’s not a replacement for the skills. No, software engineers will not go out of business. However, how we develop expertise, how we develop different skills, evaluation skills, critical thinking skills, quick verification and validation skills, becomes important in utilizing these tools. Whether you’re a software architect, software engineer, or software tester, wherever you sit in the food chain of the software development lifecycle, these are important skills to develop.
Priyanka Raghavan 00:50:30 Interesting. So now I have to ask you the next question, being from a security background: what are the security implications, like misinformation or data leakage, of using these Gen AI tools when you’re designing or architecting systems?
Ipek Ozkaya 00:50:47 Absolutely a concern. In fact, because of that, we see limited experimentation with these tools in some domains and some organizations. These are absolutely concerns, especially around what we don’t know, although developers of the models have disclosures about whether or not they use some of the data that is being input. Once you start interacting, especially with the public models, you are interacting with that domain, and your information becomes part of the public domain. So that is a concern. From an architectural perspective, it might create potential risks in terms of what is being exposed, whether that data is being utilized for other parts of the system, or whether that data is now being trained into that overall public model. I mean, early on we all read about the Samsung case, where the developers used proprietary information and fed the code into the tool, because one of the use cases is, okay, here’s my code, do X, Y, or Z with it.
Ipek Ozkaya 00:51:49 So that also becomes a risk. The risk is not just data leakage and security; it’s really any proprietary information that you would feed into it. And I think it’s on the users to be careful. I mean, yes, of course there’s the responsibility of the tools, but I would err on the side of assuming the worst from the tools until we’re sure, or until we’re able to take precautions, before providing information that is proprietary or a security risk. So that’s one aspect of the security. The other aspect of the security is, of course, the implementations, right? Whether it’s giving the correct answer or not. Again, the code-specific tools are improving because these concerns are critical concerns, but that doesn’t mean they’re perfect; errors will leak in.
Priyanka Raghavan 00:52:38 I think that’s interesting, because the question I had next was: as a user of these Gen AI tools, there’s a certain lack of transparency in the output produced, because you don’t really understand how the AI model arrives at a particular output. It can be a bit difficult to understand because of the complexity. It’s comparatively easy to check the correctness of code, but how do you check the correctness of an architecture? You talked a little bit about correctness being most important, so what do you do in that case?
Ipek Ozkaya 00:53:11 So I think we need to focus on the artifact that is being generated. I like how you phrased that question from that perspective, Priyanka. So what is the artifact being generated? How do I check code? Tests, walkthroughs, code review, all those kinds of techniques. So we need to ask ourselves: how do we check the correctness of the architecture? Which, probably, we don’t. I mean, yes, we do qualitative architecture reviews and all that, but I think that brings up a gap in our software architecture practice. We need to think about how we check the correctness of an architecture design decision to start with, which is mostly qualitative, expertise-based techniques. Okay, maybe you’re using some formal techniques; and if you’re implementing the system, the system becomes the reality, and all of that. So that, I don’t think, is on the Gen AI tools at all.
Ipek Ozkaya 00:54:05 That’s on our software architecture practice. How do we check the correctness of an architecture decision? How do we check the correctness of a tradeoff? Which is why it often becomes this fuzzy, high-level conversation. But again, the devil is in the details. In fact, Paul Clements, who is one of the authors of the Documenting Software Architectures and Software Architecture in Practice books from the Software Engineering Institute, has always emphasized that software architecture is a very detailed practice. We always think it’s very high level, boxes and diagrams. If we have those details, then we are able to actually think about, okay, how do I check those details? I think that’s really where the question is. I don’t think we can blame Gen AI and probabilistic models for that. I would love to, but.
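One way the "details" point becomes mechanically checkable: once an architecture decision is recorded precisely, some of it can be verified like code. A sketch of a strict-layering dependency check; the module map, layer order, and the strict "each layer calls only the next" rule are illustrative assumptions, not something from the episode.

```python
# Checking one detailed architecture decision mechanically: a strict
# layered design where each layer may call only the layer directly below.
# The module map and layer order here are invented for illustration.

LAYER_ORDER = ["ui", "domain", "persistence"]

DEPENDENCIES = {          # module -> modules it calls
    "ui": ["domain"],
    "domain": ["persistence"],
    "persistence": [],
}

def layered_violations(deps, order):
    """Return (caller, callee) pairs that break the strict layering rule."""
    index = {layer: i for i, layer in enumerate(order)}
    violations = []
    for caller, callees in deps.items():
        for callee in callees:
            if index[callee] != index[caller] + 1:
                violations.append((caller, callee))
    return violations

# The recorded design conforms to its own rule:
assert layered_violations(DEPENDENCIES, LAYER_ORDER) == []

# Introduce a layer-skipping call and the check flags it:
DEPENDENCIES["ui"].append("persistence")
assert layered_violations(DEPENDENCIES, LAYER_ORDER) == [("ui", "persistence")]
```

This only covers decisions expressible as rules; the qualitative, expertise-based review the conversation describes still carries the rest.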
Priyanka Raghavan 00:54:58 Fair enough. So then I guess I should end the show by asking you the last question, which is: what are the future trends in this space, and how do you see it growing?
Ipek Ozkaya 00:55:08 So I think it’s a very exciting time for automated tool development. This is probably where we’ll see the future developments. As our understanding of how to use the tools improves and the models improve, we’ll slowly expand our use cases; we’ll gain more confidence in some tasks and we’ll have less confidence in others. So I think the key question is: how do we avoid focusing on the small tasks in a way that risks the overall structure and behavior of the system? And are there ways to ensure that some of the architectural concerns are part of the conversation, for lack of a better word, the interaction, maybe that’s better, with the tools? And what are some of the capabilities that need to be incorporated in these tools to be able to bring design knowledge into them? Start small and go from there is probably the right way to do it.
Ipek Ozkaya 00:56:07 Okay, we’ve done some of the small-scope, implementation-specific tasks. Now what is the next level? What is the smallest design construct that we can experiment with? That might help us look into it. There are other use cases being explored; I mentioned a couple that we’re working on, for example, how do I upgrade known libraries using some of these tools? Again, Gen AI is not the only tool; there are other tools out there, but these tasks are small enough yet complex enough to experiment with. Copilot came up with the workspace and workflow kind of approach, and we’ll see those kinds of more complex use cases being experimented with. Being able to provide data quickly into the corpus as new versions of tools, frameworks, and libraries come out is another area, though that’s on the models anyway. And then one aspect that is probably important is to recognize that it’s important to continue to develop expertise rather than assume that the tool takes care of it. That’s probably where we’ll have to continue to remind ourselves.
Priyanka Raghavan 00:57:16 Thank you very much, Ipek. I think a lot of your answers also reminded us of what is important for us to do. So apart from exploring Gen AI tools, it’s also, maybe, for us to reflect a bit on what things we should do a bit better. So thank you for that.
Ipek Ozkaya 00:57:35 And also, I think we forget; we get all excited. There’s a new technology, we’re all very excited, but the goal we’re trying to accomplish always has to come first. I think that’s important, and sometimes it’s difficult to admit, but maybe existing tools are good enough in some cases. I’m not suggesting that; it’s really the combination of the tools. Yes, Gen AI will, I think, help improve things. And I do believe what I say: generative AI tools will shine a light on software architecture in ways that we had not recognized before. Because at the end of the day, it’s not this small-scope class or method or whatever the scope is that we’re developing; it’s the system we’re developing. How do I develop the system, rather than the small experiments that we’re dealing with currently?
Priyanka Raghavan 00:58:20 Point taken, thank you very much.
Ipek Ozkaya 00:58:23 Well, thank you for having me.
Priyanka Raghavan 00:58:24 And I think one thing that I need to ask you is what is the best way that people can reach you if they want to get in touch?
Ipek Ozkaya 00:58:31 Email? LinkedIn?
Priyanka Raghavan 00:58:33 LinkedIn. LinkedIn maybe?
Ipek Ozkaya 00:58:34 Or email me.
Priyanka Raghavan 00:58:35 I’ll add both in our notes. And thanks once again, thanks for coming on the show.
Ipek Ozkaya 00:58:40 Thank you.
Priyanka Raghavan 00:58:41 Thanks for listening. This is Priyanka Raghavan for Software Engineering Radio.
[End of Audio]