
SE Radio 603: Rishi Singh on Using GenAI for Test Code Generation

Rishi Singh, founder and CEO at Sapient.ai, speaks with SE Radio's Kanchan Shringi about using generative AI to help developers automate test code generation. They start by identifying key problems that developers are looking to solve with an automated test-generation solution. The discussion explores the capabilities and limitations of today's large language models in achieving that goal, and then delves into how Sapient.ai has built wrappers around LLMs in an effort to improve the quality of the generated tests. Rishi also suggests how to validate the generated tests and outlines his vision of the future for this rapidly evolving area.


This episode is sponsored by WorkOS.





Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Kanchan Shringi 00:01:01 Hi all. Welcome to this episode of Software Engineering Radio. This is your host, Kanchan Shringi, and today we welcome Rishi Singh. Rishi has been a platform architect at Apple, a co-founder and CTO of Harness.io, which is a CI/CD platform, and he is now founder and CEO at Sapient.ai. Today we will explore the technology and methodology behind how Sapient.ai leverages GenAI to help developers automate test code generation. Rishi, is there anything else you’d like to add to your bio before we get started?

Rishi Singh 00:01:39 Hey Kanchan, thank you so much for inviting me. Great to be here. I think you really covered it well. Other than Sapient.ai, Harness.io, and my stint at Apple, one thing I can add is that I’m really, really passionate about developer tooling, anything that helps a developer become more productive. Before founding Sapient.ai I was more in the software delivery space; with Sapient.ai, this is a bit further upstream, especially for developers, so that they don’t get stuck in the testing process. So yeah, I’d love to discuss more.

Kanchan Shringi 00:02:13 Before we get into the first set of questions, I’d like to point our listeners to Episode 167, The History of JUnit and the Future of Testing with Kent Beck, which I think sets a good stage for how we are going to change some of those methodologies. While the methodology changes, the problems that need to be solved are probably the same, starting with identifying what to test and with what inputs. Can you comment on that and maybe help us understand all the things a tester, or the developer, needs to solve?

Rishi Singh 00:02:54 Yeah, that’s a great question. Software testing is as old as software itself, right? Ever since we started building products, we have needed something to test them, and testing has itself evolved with software development. If you recall, back in the days of waterfall, software testing used to be a very significant stage in the entire software development lifecycle. And as you referenced the Kent Beck episode, I had a chance to listen to that episode before this recording. So, if you look back from the early 2010s to where we have come today, the problem statement remains the same: we want to assess the quality of our product, we want to make sure the product we deliver meets the requirements and helps the customers or users have a good experience.

Rishi Singh 00:03:48 But the software development landscape itself has changed. The way we are building software and the way we are delivering software have changed. And if you dig deeper, the underlying requirement is the same thing. Broadly speaking, almost every product will have some functional testing requirement and some kind of non-functional testing requirement, and you can break it down into multiple areas, tackle each one of them in its respective way, and solve it. So it’s a correct observation that the requirement itself is the same; it’s just the way we tackle it that has changed.

Kanchan Shringi 00:04:23 In addition to identifying the functional requirements, and hence what needs to be tested to meet them, things keep changing all the time. So identifying what has changed and how to test the delta is another problem that one probably has to focus on.

Rishi Singh 00:04:41 Yes, yes. So back in the days, testing used to be done by QA. Nearly everything was manual, and that evolved into test automation. You’re not just doing the testing one time; you’re writing a program to simulate the entire process, all the steps, so that you can test as many times as you want. And now the test automation itself is being generated by some kind of automated process, right? The underlying philosophy that most QA engineers follow is what we call the test pyramid: whatever testing requirement you have, you break it down, otherwise it becomes quite overwhelming. So I’ll give you an example. Let’s say we have a web application; there is some sort of authentication layer in the front, and then you have the actual web application behind it.

Rishi Singh 00:05:36 Imagine that it’s a brokerage application. It might have, let’s say, a hundred different use cases or a hundred different test cases emerging out of it. But then you have some kind of authentication that can vary. You might have Google-based authentication, you might do Okta-based authentication, or it may be traditional username-and-password authentication. The way a senior QA engineer will handle it is to treat these as two layers of the application and test them independently. So you have three modes of authentication, but you don’t run the hundred test cases three times, which would be 300. Instead you test the three authentication modes separately and you run the hundred application test cases separately.

Rishi Singh 00:06:20 And so the total is just 103. That is one example, right? It is the same philosophy that most QA engineers will follow. They will break down the overall testing requirement into what we call unit testing, where you focus on individual classes and individual methods and make sure each method behaves the way it’s supposed to behave. They will do some integration testing, identifying the different logical layers within the code or your application and making sure these are all coming together. And then finally end-to-end testing: the flows you want to verify so that, when a user is using the application, everything comes together and performs.

Rishi Singh 00:07:05 So that’s the very high level: you break down the functional testing requirements and then you tackle them one by one. You employ certain strategies so that there is not a massive sprawl of test cases or test code, because everything that you write eventually has to be maintained. And so senior QA engineers are always looking to optimize everything: optimize the number of unit test cases, the number of integration test cases, and the end-to-end test cases. You might see some engineers be very creative about choosing the right input data for these test cases so that each one touches as many different code blocks and scenarios as possible. That way, the overall amount of code which accumulates in the code base is minimal. It helps you minimize the overall cloud expenditure and minimize the overall test code liability, because everything in there has to be maintained.
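To make the layering idea above concrete, here is a minimal sketch in Java with JUnit 5 and Mockito, assuming hypothetical AuthService and PortfolioService types (none of these names come from the episode): the application logic is tested against a mocked authentication layer, so the hundred application cases never have to be repeated per authentication mode.

    // Illustrative only: AuthService and PortfolioService are made-up names.
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.junit.jupiter.api.Test;

    class PortfolioServiceTest {

        // Hypothetical authentication layer; each of the three auth modes would implement this.
        interface AuthService {
            boolean isAuthenticated(String token);
        }

        // Hypothetical application logic sitting behind the auth layer.
        static class PortfolioService {
            private final AuthService auth;
            PortfolioService(AuthService auth) { this.auth = auth; }
            int holdingsFor(String token) {
                return auth.isAuthenticated(token) ? 42 : 0; // 42 stands in for real brokerage logic
            }
        }

        @Test
        void returnsHoldingsWhenAuthenticated() {
            AuthService auth = mock(AuthService.class);
            when(auth.isAuthenticated("ok-token")).thenReturn(true);
            assertEquals(42, new PortfolioService(auth).holdingsFor("ok-token"));
        }

        @Test
        void returnsNothingWhenNotAuthenticated() {
            AuthService auth = mock(AuthService.class);
            when(auth.isAuthenticated("bad-token")).thenReturn(false);
            assertEquals(0, new PortfolioService(auth).holdingsFor("bad-token"));
        }
    }

The three real authentication modes then get their own small suites, so the totals add (roughly 3 + 100) rather than multiply (3 × 100).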

Kanchan Shringi 00:08:14 And I guess the happy path and also the failure cases both have a lot of benefit, to make sure they’re working.

Rishi Singh 00:08:19 A hundred percent.

Rishi Singh 00:08:23 I think this is where, not to underestimate the software developers, they come from a certain mindset: they’re always good at building, and amazing at design and architecture. But the traditional QA individuals come with a different mindset. They look at the product spec, they look at the code, and they always have this paranoid kind of mindset where they’re looking for the different loopholes, how this application could potentially fail in the real production environment. And that is the reason why software testing becomes so overwhelming. It may be very simple code, like five or ten lines, but if you look at the number of test cases emerging out of it, it can be exponential.

Rishi Singh 00:09:12 There’s something called cyclomatic complexity, which is one way to measure the code. Every time you introduce some kind of conditional statement or loop, it automatically leads to two different paths. And so a simple piece of code, just five or 10 lines, with two or three conditional or for-loop or while-loop statements, will automatically lead to 10 or 20 different test cases. And that is the reason why so much test code has to be written. I think this is where the QA engineers are really, really good. They always focus on the test cases that are most meaningful. They capture the positive test cases, but they also capture the negative test cases, in such a way that the amount of code doesn’t accumulate too much and it remains within the realm of maintainability.

Kanchan Shringi 00:10:10 Today, of course, we are talking about how to help engineers with this by automating a lot of the testing. So there are a lot of problems that have to be solved.

Kanchan Shringi 00:10:23 Can you give us some history around what has been happening in this area of automated test generation?

Rishi Singh 00:10:30 Yeah, so when you’re talking about test code or test automation, there are two things involved. One is the planning and strategy aspect of it. Back in the days, most QA engineers would look at the business requirements document or the product specification and come up with a plan: these are the different test cases that have to be executed in order to certify the product. And then there is the actual implementation: once you identify the set of test cases, somebody has to write the code to automate the entire testing process. Historically there have been many attempts at this; obviously there was not a good way of understanding the product specification and coming up with those test cases automatically.

Rishi Singh 00:11:19 That has always been done manually. I think that is also changing with the recent evolution of AI, which is getting comfortable understanding the product specification and tying certain actions to it. Then there is the implementation aspect: how to generate the test code. That has gone through many attempts. You might have heard of random testing or input fuzzing back in the day. A lot of people would write code that tries to understand the code under test, instruction by instruction, trying various inputs and just watching and observing which code blocks get executed with that random testing and fuzzing.

Rishi Singh 00:12:06 But again, it was not a very viable solution. I don’t think it got as far as we would have liked, because the number of combinations you can expect is just too many. Even with the best heuristics, it’s just not possible for random testing to cover all the test scenarios and come up with a good result. Then there are things like symbolic execution and model-based testing. Again, all of them are geared towards the same idea: try to understand the code, try to build a form of equation, try to figure out the right input value that I could pass in, which will steer through the different execution paths and potentially give me a result that covers everything in the code.

Rishi Singh 00:12:58 And so in some way we are able to generate the test code and provide the right set of input data so that it covers the test cases. This problem in computer science is known as satisfiability modulo theories. It is one of those NP-hard problems, something which is not easily solvable in finite time; in other words, it becomes prohibitively complex to solve. So that was the case. Now obviously things have changed; those were different times. It did solve some scenarios, and those solutions are still used in some cases, for example in security testing, where you’re trying to figure out if there is one potential input that could crash the system or cause a stack overflow or issues of that kind. But it hasn’t made as much headway on the functional testing side. Most functional testing is still done by individuals, either QA testers or developers. They’re the ones analyzing and coming up with the various test cases, and they are the ones writing the code to test it.

Kanchan Shringi 00:14:04 You mentioned fuzz testing. I’d like to point our listeners to Episode 474, Paul Butcher on Fuzz Testing. Your point is that this technique has not worked for functional testing, given it has the same level of complexity as, I think you mentioned, symbolic execution, which examines programs by identifying which inputs result in activating which parts of the code. And then you also talked about model-based testing. Can you elaborate a little more on what the approach was there, and possibly why it was not successful?

Rishi Singh 00:14:43 Yeah, it’s strongly tied to the older waterfall approach to software development. If you go back 10 or 20 years, the entire industry was big on creating UML diagrams, coming up with the right design, and only then starting to write the code. That whole extensive process doesn’t work anymore, and model-based testing takes a very similar approach: you try to create a model and then use that model as a base to generate the code in an automated fashion. These approaches don’t work in the more modern, fast-paced kind of environment. Everything is changing so fast. You don’t just create a model; you end up maintaining one more thing, especially if you’re making changes every day and deploying code every day, maybe multiple times a day. So it doesn’t work. It was a pretty heavyweight process, and it didn’t gain enough traction because it’s so hard to implement.

Kanchan Shringi 00:15:43 I see. So you also mentioned analysis of the code complexity. Any pointers on how that was leveraged in the past for automated test generation?

Rishi Singh 00:15:55 Yeah, cyclomatic code complexity really deals with the different execution paths you can potentially have based on the input. There are a lot of static code analysis tools that can look into a program, understand the various instructions, and come up with the different test cases that are applicable. The way it happens is, let’s say you have a simple program that calculates a factorial, and you have written a method like: if input X equals zero, return one; otherwise, return factorial of X minus one, times X, and so on. Again, this is a simple if-then-else condition; it could also be a for loop or a while loop.

Rishi Singh 00:16:48 And so every condition adds to the overall complexity, and that is used to come up with a list of the execution paths that are applicable in the code, and that’s how people end up creating tests out of it. On the other hand, the biggest challenge remains: what is the right input you can introduce that will lead down those different execution paths? And that has the same level of difficulty. It’s a similar kind of problem to the one that input fuzzing or symbolic execution suffered from. It’s not easy. You cannot just come up with a value, especially if the code is extremely complex.
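As a concrete version of the factorial example just described (the exact code from the conversation is not shown, so this is a minimal sketch in Java with JUnit 5): one branch plus a recursive call already implies several distinct test cases.

    // Minimal sketch of the factorial example discussed above; names are illustrative.
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertThrows;

    import org.junit.jupiter.api.Test;

    class FactorialTest {

        static long factorial(int x) {
            if (x < 0) throw new IllegalArgumentException("x must be non-negative");
            if (x == 0) return 1;              // base case: one execution path
            return x * factorial(x - 1);       // recursive case: a second path
        }

        @Test
        void baseCaseReturnsOne() {
            assertEquals(1, factorial(0));
        }

        @Test
        void recursiveCaseMultipliesDown() {
            assertEquals(120, factorial(5));
        }

        @Test
        void negativeInputIsRejected() {
            assertThrows(IllegalArgumentException.class, () -> factorial(-1));
        }
    }

Each added condition or loop multiplies the paths a test suite has to steer through, which is why even a short method can need many tests, and why choosing inputs that reach every path is the hard part.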

Kanchan Shringi 00:17:26 Sounds like model-based testing has its place in certain approaches to software development, and these code analysis approaches also have some level of success, but none of them have really solved the larger problem. And now we are coming to the use of generative AI. Can you discuss that? What are the capabilities that large language models offer today for test code generation?

Rishi Singh 00:17:56 Yeah, yeah, a hundred percent. I think the approach itself is very different. Back in the day, when we were trying to generate code, we were always trying to interpret the individual instructions in the code and go from there. In this particular case, generative AI, and especially generative AI for code, is nothing but an extension of the recent breakthroughs we have seen in large language models and natural language processing, right? ChatGPT, Google Bard, Google PaLM 2, Code Llama: almost all of them are based on the same thing. Here we’re using deep learning algorithms and large neural networks trained on huge datasets, and then those examples, which were created by humans, become a base for these models to provide you some output or help you do code generation, or even test code generation.

Rishi Singh 00:18:53 So the approach itself is different; it’s unlike the typical algorithmic approach. Here you have a program that understands natural language and looks at examples from GitHub, where there are a lot of public repositories with a lot of code, and for much of that code some test code is available. It’s crawling the internet and various developer community forums, looking at many examples of how people have written code and how that code is tested, and based on that it’s suggesting something, right? So here the approach is completely different: it’s not taking the algorithmic route; instead it’s learning, and based on that it’s producing the test program, right? Which is very, very different. I think it’s been very successful; just as ChatGPT has been in the world of natural language processing, it’s much better on the code generation side. But again, it has a long way to go. I’d love to share more details. There are some shortcomings as of now, but this approach has taken us much farther than anything we had in the past.

Kanchan Shringi 00:20:04 Can you tell us any example or story you may have about when you actually used an AI-generated test directly? What happened? Did it work as expected, or not?

Rishi Singh 00:20:18 Yeah, yeah. I think there are plenty of examples. I’m sure you’re aware of this whole buzz about generative AI. For anything which is fairly common or available in the public domain, generative AI is awesome. Let’s say I’m looking to write a program for email validation: if I just give a simple prompt, generative AI will easily generate the code for the email validation. And if I have the email validation code and I ask generative AI to generate test code for it, it’ll do it right away. There is no problem. I would say the limitation generative AI runs into is when you have a really complex project of your own, your own code which has nothing to do with any of these public repositories or simple examples, nothing to do with the sample code you might have come across on GitHub or internet forums, or in school or university writing the quicksort algorithm or merge sort or any of those things.

Rishi Singh 00:21:20 Then the generative AI does run into limitations, especially when you have 50 lines, a hundred lines, or even much longer code, which is very common in real-world environments at many enterprises. And this is where I have seen generative AI do partial work. I won’t say it does nothing: it’ll do partial work, where it’ll create some skeleton or write something and expect the software developer to take it from there, understand what it has done, and complete the rest. So I have seen varying results. In some scenarios the generative AI just created a skeleton and didn’t do much; I’ve seen examples where it wrote code that was completely unrelated; and there are times where it has done a really good job, like 80% or 90% of the work. But I, as a developer, since I understand the code, could easily complete the remaining 10 to 20% and get the job done.
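For the email-validation example above, a simple prompt will typically yield something close to the following. This is a hand-written sketch of the kind of code and test an LLM readily produces for such a well-known, public-domain problem, not actual model output.

    // Sketch of the sort of code and test an LLM easily generates for a common task.
    import static org.junit.jupiter.api.Assertions.assertFalse;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    import java.util.regex.Pattern;
    import org.junit.jupiter.api.Test;

    class EmailValidatorTest {

        // Simple validator of the kind found in countless public repositories.
        static boolean isValidEmail(String email) {
            return email != null
                    && Pattern.matches("^[\\w.+-]+@[\\w-]+\\.[\\w.]+$", email);
        }

        @Test
        void acceptsWellFormedAddress() {
            assertTrue(isValidEmail("[email protected]"));
        }

        @Test
        void rejectsMissingDomain() {
            assertFalse(isValidEmail("dev@"));
        }

        @Test
        void rejectsNull() {
            assertFalse(isValidEmail(null));
        }
    }

The difficulty Rishi describes starts once the code under test is long, project-specific, and unlike anything in public repositories.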

Kanchan Shringi 00:22:23 In that case, what would you say about the productivity using GenAI as compared to the programmer starting from scratch themselves?

Rishi Singh 00:22:34 Yeah, I think the number one thing is that there is an increase in productivity no matter where you start. If you’re starting from scratch, with a simple prompt you definitely gain a lot of code. If you’re working in existing code, you still get some help from the generative AI. When I’m talking about generative AI here, I mean the typical code generation you see from ChatGPT or Google PaLM 2 or Code Llama. But there have been many companies that built products on top of it, and they have done a wonderful job on top of the current state of GenAI, and we think we have done a better job still.

Rishi Singh 00:23:19 Sapient.ai is one of them. For example, if a developer is writing code in an existing code base, generative AI can help to some extent, but the developer still has to put in significant effort to understand the changes they’re making. From the testing perspective, the developer still has to figure out which new test cases are emerging out of the changes and which test cases are becoming obsolete. So there is a good amount of work still left to the individual developer, and I think that’s where, instead of using plain GenAI from ChatGPT or any of these tools, tools like Sapient.ai are probably better, because they are built on top of it. We have trained the GenAI to be more accurate than that, and we have also created an experience on top of it so that developers don’t have to figure it out themselves; instead, the solution itself reveals the set of changes that have to be made.

Kanchan Shringi 00:24:26 Can you talk a little bit about the methodology of training GenAI?

Rishi Singh 00:24:32 Yes. So there are GenAI tools and models available, and these models can be instantiated and then trained with your own set of data. Through years of experience we have built a layer on top of them which is continuously verifying the output from the GenAI. GenAI is notorious for hallucinating: it can give a confident response even if the response is not accurate. The same thing happens on the coding side, where certain code gets generated but it’s not accurate. So we have programs that look at the code and interpret it in an automated way, to determine whether it is the right code or not.

Rishi Singh 00:25:22 If it’s not the right code, then we try various mechanisms. We have built our own internal secret sauce where we look at the code, break it down into multiple chunks, and send it back to the generative AI so that the overall accuracy is much higher. That is one. Then we take it to the next level: once we learn certain patterns where the generative AI is not performing well, we introduce the right set of training data so that the generative AI’s responses become more accurate.

Kanchan Shringi 00:25:57 Training data. And you said many years’ worth of code. You also mentioned the wrappers, correct me if I’m using the wrong terminology, but the wrappers check the code that is generated and maybe retry. Can you talk a little bit about what that check entails?

Rishi Singh 00:26:15 Yes. Let’s take it from the testing perspective. Say you have certain code, and that code leads to 10 different use cases. There is a layer that will try these various use cases with the GenAI, look at the algorithm in the code, and make sure it is addressing the use case you intended. If it’s not, then it’ll make certain changes: it’ll break the original code into multiple chunks and re-prompt the GenAI so that it gets a different output. So there is a set of flows we have created internally, and that makes the final code output more accurate and more relevant for the individual developer.
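In outline, a verify-and-retry wrapper of the kind described here can be pictured roughly as follows. This is a schematic sketch with made-up interface and method names, not Sapient.ai’s actual implementation.

    // Schematic sketch of a generate-verify-retry wrapper around an LLM; all names are hypothetical.
    class TestGenerationLoop {

        String generateVerifiedTest(String sourceCode, LlmClient llm, TestRunner runner) {
            String candidate = llm.generateTest(sourceCode);
            for (int attempt = 0; attempt < 3; attempt++) {
                if (runner.compilesAndPasses(candidate)) {
                    return candidate;                    // accept only code that compiles and runs
                }
                // Break the source into smaller chunks and re-prompt for the failing parts.
                for (String chunk : splitIntoChunks(sourceCode)) {
                    candidate = llm.refineTest(chunk, candidate, runner.lastFailure());
                }
            }
            return candidate; // hand the best attempt to the developer for review
        }

        // Hypothetical collaborators, standing in for the pieces the episode describes.
        interface LlmClient {
            String generateTest(String source);
            String refineTest(String chunk, String previousAttempt, String failureReport);
        }

        interface TestRunner {
            boolean compilesAndPasses(String testSource);
            String lastFailure();
        }

        private java.util.List<String> splitIntoChunks(String source) {
            return java.util.List.of(source); // placeholder chunking strategy
        }
    }

The essential design choice is that nothing the model produces is trusted until it has compiled and executed, and failures feed back into the next prompt.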

Kanchan Shringi 00:27:05 I see. So I was looking at Sapient.ai’s website, and it looks like you’ve developed a plugin for the IDE. As you explained, it does use GenAI to generate the unit tests, but it analyzes the results, and I also read that it comprehends exit points of methods, et cetera. How does it do that? What is the learning there? Is that algorithmic, or is that also AI based?

Rishi Singh 00:27:35 Yeah, it’s both algorithmic as well as AI. But you mentioned an important point about these plugins. I cannot emphasize enough that the IDE plugin is probably the most natural place for software developers to interact. I have seen that most software developers don’t even come out of IntelliJ IDEA or VS Code. These plugins are extremely powerful: you get all sorts of environments, including the terminal and many utilities, all built into the IDE. That is the reason why I’m really big on plugins, because that’s the place software developers go. Now, how do we do it? The plugin has a lot of advantages. Number one, it helps us create a much richer experience, because the plugin has access to the entire project.

Rishi Singh 00:28:26 So we have a much deeper understanding of the source code, its dependencies, and so on. No one has to manually feed in the code or provide any kind of context; instead, the plugin is able to extract all the details and all the information so that we can help our developers there. Once we have that information, it goes through both the algorithmic and the AI approach, working in conjunction. It integrates with the backend, and the combination of the two is able to generate the tests.

Kanchan Shringi 00:28:59 Can you elaborate? It connects with the backend, what do you mean?

Rishi Singh 00:29:03 Yeah, when I say backend, it’s really the cloud environment. Let’s say I, as a developer, am working with certain code and making certain changes, and now I’m about to generate the test code. The plugin is capable of pulling the necessary details, and then it integrates with the cloud-based Sapient.ai backend to help generate the code using generative AI. Generative AI is a very complex process; obviously we cannot run everything in the plugin. The plugin is really an interface for the individual developer to work with, but a lot of things are done in the backend, in the Sapient.ai cloud.

Kanchan Shringi 00:29:45 In terms of the languages that you support, does that simply depend on how much training data is available in that language for GenAI? What languages do you support today? And maybe you can talk about why it’s that set.

Rishi Singh 00:30:03 Yes, yes. So again, the plain vanilla GenAI is available from Google, from OpenAI, from many of them, and you could just use it: anyone can build a wrapper, and a language can be supported from day one. What I’m really big on is the fact that plain GenAI is not sufficient for individual developers. GitHub Copilot published a report where they noted that about 29% of the code generated by Copilot doesn’t require any involvement from the developers. But the remaining 71% is where the developers first have to understand the code that has been generated and then work on top of it.

Rishi Singh 00:30:52 And I think this is where we have to be careful: it’s not about just using the code directly generated by the large language model, because then we are not adding enough value. As for the languages we support, we have been building a layer on top of it, so the accuracy level is not at 29% but probably 80%, and the individual developer has to step in for only 10 to 20% instead of 70 to 80%. Right now Sapient.ai supports the JDK language family: Java, Kotlin, and so on. We are expanding our reach into other languages. Python is on our roadmap and releasing pretty soon, and Go, TypeScript, and some other languages are on our roadmap as well. But again, the goal is not just to support a language but to support it in a way that brings enough value and makes sure the individual developers need very minimal effort to get the job done.

Kanchan Shringi 00:32:02 In terms of training, is it training with just the code, or is it also training with test cases?

Rishi Singh 00:32:09 It’s training with both the code and the test cases. Given a product specification, you want to understand all the test cases that are applicable to it, and so there are a lot of acceptance test cases that get generated out of it. But once individual developers are implementing the code, the code itself gets interpreted by the GenAI and we are able to generate the test code out of it.

Kanchan Shringi 00:32:38 The reason I’m asking is that even for somebody directly using the ChatGPT APIs or the ChatGPT interface, for example, will their accuracy improve if they present examples of test cases in the prompt? Another approach might be starting by writing some skeletal test cases and asking the GenAI to improve them; do you think that has higher success?

Rishi Singh 00:33:03 Definitely, it’ll get better than what you get on the first attempt. But again, it’s an ongoing effort of coming across the different use cases where the GenAI has not been giving a good result. This is our bread and butter; this is what we do. We figure out the different scenarios where the GenAI has not been working well, and we give it different prompts and different training data to make sure the output is better.
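As a rough illustration of the few-shot idea behind the question: including one representative hand-written test in the prompt often nudges a general-purpose model toward a project’s conventions. The prompt below is purely illustrative, not a prompt from Sapient.ai or from the episode.

    You are generating JUnit 5 tests for a Java service.
    Follow the style of this existing test exactly (assertion style, naming, mocking):

    <paste one representative existing test class here>

    Now write tests for the following method, covering the success path
    and every error branch:

    <paste the method under test here>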

Kanchan Shringi 00:33:32 Can you give some examples of numbers, like how many tests would get generated based on a certain body of code and how does that correspond to the pyramid that you talked about earlier?

Rishi Singh 00:33:46 Yeah, it depends on the code that has been written and how well it has been written. What we have seen in our experience is that, for whatever number of lines of production code you have, your test code ends up being somewhere around two to 10 times that many lines. That is an average number, but it really depends on how the code has been written. If cyclomatic complexity is high, the ratio is going to be higher. If it’s really simple code, then it might be just the same number of lines.

Kanchan Shringi 00:34:21 Can you clarify my understanding? The number of test cases generated, like you said, is much higher if the code is more complex and has many code paths. Is that also one of the parameters you use to validate once the tests are generated? For example, if based on the code complexity you expected a certain number of tests to be generated but you see far fewer, is that when you re-prompt and retry? Is that the kind of enhancement you were alluding to?

Rishi Singh 00:34:56 I think the number of test cases is still predictable; you can ask the GenAI for it or not. What I was alluding to is the actual code that is generated by the GenAI. Imagine that the test code has a certain input value and that doesn’t work: you take the code it came up with, you try to execute it in an environment, but it fails. The code that was generated was supposed to work, but it didn’t. As we dig deeper into it, we figure out there are some scenarios, certain methods, certain combinations of things where the GenAI has not been doing well, which means: let’s prepare some code and feed the GenAI so that it has a better understanding of those scenarios. From that point onwards, the code it generates, along with its inputs, is all correct, and it compiles and executes successfully without any user intervention.

Kanchan Shringi 00:36:02 As you’re explaining this, I’m wondering how does TDD or test-driven development methodology fit in with this approach? Or is it not at all?

Rishi Singh 00:36:13 TDD is one of those things that came along in the early 2010s, when everyone was really big on extreme programming and so on. So TDD doesn’t fit in; in fact, I think TDD won’t fit in anywhere the environment is really fast-paced and things are continuously changing. With TDD you create the test first and try to build everything around it. The reality is that when things are really fluid, your tests change as well, and you don’t get that time. However, TDD has two other things that come as part of it. Obviously, TDD brings quality to center stage: whatever you ship has to be of really high quality. And the second thing is that it forces individuals to make the right design choices.

Rishi Singh 00:37:05 And so it’s not just about code coverage, it’s also about having the right design in the code so that it remains testable. If things are not testable, you might be able to get the coverage, but it’s just a matter of time; it’s so brittle it’ll start failing fairly soon. TDD forces you, because the test is the number one thing. Now, with tools like Sapient.ai, you are able to achieve the same thing: you no longer have an excuse not to cover the quality side from the beginning. Any time you’re making a change or shipping any kind of functionality, you cover the quality right there, and if you have not written the code in a certain way, you get prompted about those fundamental issues and you are able to make those changes before it’s too late.

Kanchan Shringi 00:37:56 So it sounds like a lot of the focus that you have is on unit testing. Is that correct?

Rishi Singh 00:38:04 Yes, at the moment our focus has been more on unit testing. Again, the goal is to help the developers as much as possible. Ever since this whole shift-left movement started, most of the QA responsibility has converged with the software development process. But I still see the QA teams stepping in to do a lot of API testing, integration testing, and end-to-end testing. When it comes to unit testing, though, that remains the sole responsibility of the individual developers, and that’s one of the reasons we wanted to focus on unit testing first. In the long term our goal is to cover the entire QA spectrum. As the shift-left movement has gone on, the software development team has taken on a lot of responsibility. They are not just implementing the code and building new features but also doing the QA, and they’re also involved in operationalizing things and shipping the code all the way into the production environment. And I think this is where the various tools and platforms come in, so the developers don’t necessarily need to become QA engineers or operations engineers. Instead they can use these platforms, simply push a button, get the job done, and take care of the full responsibility.

Kanchan Shringi 00:39:23 Let’s talk a little bit now about how the developer gets comfortable with the quality of the tests. Do you measure, have you measured, or are you in the process of measuring how effective the tests are at finding bugs or regressions as compared to hand-coded tests?

Rishi Singh 00:39:46 Yeah, so the testing framework itself is the challenging thing here, and I think it is still evolving. One way to measure it is how often the software developers have to intervene to make the tests pass. None of the AI-assisted tools we come across are perfect; they’re getting better, but they’re not a hundred percent there. The code and the test cases they generate always bring the software developer into the context so they can review and verify before committing the changes. So right now it’s always working in a copilot mode, where software developers are equally involved in the entire process and always verify. The goal is for these tools to become so smart and so sophisticated that the generated code doesn’t require any intervention from the developers. That is number one. Other than that, in the short term it’s really the developers who have to look into the code and verify. There is no silver bullet that will automatically know everything; especially the semantic context of what is required from the code is hard to implement, and it’s not there yet as part of the AI.

Kanchan Shringi 00:41:08 So this analysis by the developer also means that they have to retain some of their earlier methodologies and frameworks. So just having an idea of a test spec would still be important, right? Measuring code coverage would still be important.

Rishi Singh 00:41:27 Yes, it’s important, and I think that’s where they can also be a bit strategic about engaging with tools that are agnostic, that are aligned with whatever they were already doing. If you’re adopting a tool which is totally disruptive, or which brings in some kind of proprietary stuff, then it’ll be a challenge. With Sapient.ai we have taken the approach of not bringing anything proprietary. Let’s try to complement everything that software developers have been doing; let’s try to become a multiplier on everything they have been doing. So instead of the developer writing the code, we write probably 90% or maybe 95% of it, and instead of taking 20 to 40% of their time it might take 5% of their time, and it gets exactly the same result the individual developers would have produced themselves.

Kanchan Shringi 00:42:19 So that’s unit tests today. What’s your vision for the future?

Rishi Singh 00:42:24 Yeah, I think the future is bright for sure. Generative AI has been massive. I can see that, at least in the short term, GenAI is not going to replace developers; instead it’s going to make software developers probably 10 times, maybe a hundred times, more productive. With DevOps I can clearly see that time to market has been significantly reduced, and now with AI that’s going to accelerate the entire process, which means there is more code, a greater number of iterations, and a lot more activity going on in this world of enterprise development. All of this is extremely exciting and very, very promising, provided it’s managed and executed properly. Otherwise, it can result in really massive quality issues; it can turn everything into a chaotic situation. And this is where I feel really, really excited that after Harness.io my focus is now more on the quality side. With companies like Sapient.ai, which is laser focused on quality, if companies start employing a tool like this, they’ll be better prepared from the quality standpoint: they can deal with the situation, they can make their developers more productive with all the AI tools, but they can also cover the quality angle.

Kanchan Shringi 00:43:42 Can you comment on the plans, or what makes sense, for fitting in with existing frameworks? For unit tests in Java, for example, my guess is the tests conform to the JUnit spec. Is that a fair assumption?

Rishi Singh 00:43:58 It is, and this is what I was referring to earlier: software developers can assume that the AI is not really going to replace things; AI is going to complement, and companies like Sapient.ai, which are built on this AI platform, are going to complement. Everything it produces has to tie back to a generic framework like JUnit or xUnit or, if you’re using Python, then PyUnit, and so on. It brings the value, brings the code, on top of that, so it can be understood by the individual developers, and it feels like complementing rather than trying to replace in any form.
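In practice that means a generated test should look like any other JUnit test in the repository and drop into the existing build and CI without special tooling. Schematically, a framework-conforming generated test might look like this (class and method names are illustrative, not generated output):

    // Illustrative shape of a framework-conforming test; nothing proprietary is required to run it.
    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.DisplayName;
    import org.junit.jupiter.api.Test;

    class PriceCalculatorTest {

        @Test
        @DisplayName("applies a 10% discount above the threshold")
        void appliesDiscountAboveThreshold() {
            PriceCalculator calc = new PriceCalculator();
            assertEquals(90.0, calc.finalPrice(100.0), 0.001);
        }

        // Hypothetical class under test, included only so the sketch is self-contained.
        static class PriceCalculator {
            double finalPrice(double amount) {
                return amount >= 100.0 ? amount * 0.9 : amount;
            }
        }
    }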

Kanchan Shringi 00:44:37 Rishi, is there any topic today that you feel we should cover in more detail or we haven’t touched upon?

Rishi Singh 00:44:44 Yeah, we talked a lot about testing, and a lot of our conversation was within the unit testing space; we talked a lot about cyclomatic complexity and other things. I’d love to share my thoughts about API testing and about end-to-end testing as well. The reality is that quality has to be tackled as a whole. People are making changes from the QA standpoint or, more often now, from the developer standpoint. I would break the QA responsibility down into unit testing, integration testing, end-to-end testing, and so on, just to optimize the cloud resources, optimize the efficiency, and so on. But everything has to come together for the end users so that they get the maximum benefit when everything is becoming so fast-paced. How these things are going to be covered from the API and end-to-end perspective is an interesting topic; I’d love to share those thoughts in some other episode, because that’s a broad topic. Thank you so much. I think this was a wonderful conversation.

Kanchan Shringi 00:45:47 How can people contact you if they have any follow up questions?

Rishi Singh 00:45:51 Yes, we are available at www.sapient.ai. Anyone can contact us from there, or they can reach out to us using the email address [email protected]. We also have some community forums, so I would encourage people to join there. They can also subscribe to some of the blogs or newsletters, so yeah, they can stay connected with us, and I’d love to be there.

Kanchan Shringi 00:46:17 That’s good to know. Just one final thought about your invitation for people to contact and give feedback at Sapient.ai. What are you hearing from people that are testing or using the test generation now? How much is it adding to their productivity? Are you getting any feedback, and how are you using that to improve the process?

Rishi Singh 00:46:39 The people who are using the product are seeing a lot of benefit. I come across two groups of people. One group is seeing a huge productivity gain, because they were spending so much time writing tests: as I said, for whatever functional code you write, your test code could be as much as twice the number of lines, or maybe up to eight to 10 times, depending on how comprehensive you want to be, right? That’s one group of people that I see. There is a second group of people who were approaching this whole thing in a hopeless manner, in the sense that they were just not writing tests at all. They don’t have time; there is a continuous agile sprint going on every two weeks, and they come back, try to finish those tasks, and then move on to a new set of tasks. For them it has been amazing. They’re super excited: they were all passionate about quality, but the reality is that it’s very hard to balance speed and quality together. And in today’s world, where everyone is trying to deliver fast and rush features into the production environment, a tool like this has been a lifeline for them. So that’s the kind of feedback that we see.

Kanchan Shringi 00:47:57 Thanks, Rishi. This has been very informative. It’s a rapidly evolving area, and it was great to hear your thoughts on how you are improving and adding more functionality over time.

[End of Audio]
