
SE Radio 666: Eran Yahav on the Tabnine AI Coding Assistant

Eran Yahav, Professor of Computer Science at Technion, Israel, and CTO of Tabnine, speaks with host Gregory M. Kapfhammer about the Tabnine AI coding assistant. They discuss how the design and implementation of Tabnine let software engineers use code completion and perform tasks such as automated code review while still maintaining developer privacy. Eran and Gregory also explore how research in the field of natural language processing (NLP) and large language models (LLMs) has informed the features in Tabnine.

Brought to you by IEEE Computer Society and IEEE Software magazine.




Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Gregory Kapfhammer 00:00:18 Welcome to Software Engineering Radio. I’m your host, Gregory Kapfhammer. Today’s guest is Eran Yahav. He’s the CTO of Tabnine and a faculty member in the computer science department at the Israel Institute of Technology. Eran, welcome to Software Engineering Radio.

Eran Yahav 00:00:36 Hey, great to be here. Thank you for having me.

Gregory Kapfhammer 00:00:39 Today we’re going to be talking about the Tabnine AI coding assistant. It uses large language models to help software engineers complete tasks like code explanation and test case generation. Eran, are you ready to dive into the episode?

Eran Yahav 00:00:53 I’m definitely ready, yeah.

Gregory Kapfhammer 00:00:54 Alright, so one of the things I noticed on the Tabnine website is that it has a million monthly users. First of all, congratulations. That’s really cool. What I want to do now is to talk a little bit about what those Tabnine users are doing. Can you give a few concrete examples of how software engineers use Tabnine?

Eran Yahav 00:01:15 Yeah, sure. So Tabnine provides assistance across the entire SDLC. It helps you write code with code completions that are very advanced. It has a chat interface that allows you to create new code, review code, refactor code, translate between languages, generate tests as you mentioned initially, document code, explain it. So it basically helps you do anything that you have to do as a software engineer, and the vision is really to provide agents that help you do anything and everything a software engineer does, just faster and with higher quality.

Gregory Kapfhammer 00:01:52 So you mentioned that it helps software engineers both move more rapidly and do things with better quality. What is it about Tabnine that makes that happen?

Eran Yahav 00:02:01 I guess there's so much boilerplate and so much boring work in code generation, in programming. We know that, and for years people have obviously been copying snippets from past projects, from Stack Overflow, from the internet at large, just to get over these boring things and get on with it. And then you basically get whatever is out there. With the rise of LLMs and the rise of assistants like Tabnine that can contextualize deeply on your organization's code, etc., you have the opportunity to generate these pieces of code much more efficiently and, I guess, truer to what they should be in your context, right? So part of the quality of the code that you're generating is being suitable for the environment in which you're operating. And I think this is one particular aspect where Tabnine excels.

Gregory Kapfhammer 00:03:07 So later in our episode we’ll dive into the specifics of Tabnine’s implementation and how software engineers can use it. Before we do that, at a high level, can you explain a little bit about how Tabnine actually automatically performs these tasks?

Eran Yahav 00:03:22 Yeah, sure. So at the bottom of the whole thing are obviously large language models, right, LLMs, and they are capable of doing amazing things these days. They're very powerful, and they're becoming more powerful on an almost weekly basis. But from the LLM to actual use by a software engineer, there is a whole lot that has to happen in the middle, right? Part of that is the context that I mentioned earlier. So Tabnine has what we call the enterprise context engine, which is something that basically knows how to draw relevant context from all code sources of information and non-code sources of information in the organization to make sure that, again, Tabnine operates like an onboarded employee of the organization and not as a foreign engineer in the org. So Tabnine knows everything about the org, and that informs the code that it generates, how it reviews code, etc.

Eran Yahav 00:04:22 The second thing that sits between the LLM and the agents is a component that provides trust. One of the major questions that we ask ourselves as engineers when we delegate a task to AI, or even to a junior engineer, is how can we trust the results that we get back, right? And Tabnine really focuses on that with something that we call Tabnine coaching, which allows Tabnine to make sure that the code being generated and the code being pushed into your code base follow organization-specific rules about what it means to be high-quality code, what it means to follow organizational best practices, etc. So again, we can and will talk about the technical details of how these things are done later, but if you're looking for the 30,000-foot view here, it's LLMs, organizational context, trust, and agents that are built on top of that.

Gregory Kapfhammer 00:05:27 Okay, that was really helpful. We're going to dive in more to the issues around trust and the specific LLMs that are used by Tabnine. Before we do that, I wanted to get a high-level perspective on the various features. Maybe if I could read off a feature and then you could briefly describe it, we might be able to give the listeners a full picture of some of the things that Tabnine provides. So, you talked about this already, but let's revisit code generation. How does code generation work in Tabnine?

Eran Yahav 00:05:56 Yeah, so in the old days, as I call it, and old days here means like two years ago, which is like ancient history, akin to the Roman Empire in terms of AI for software engineering, most users were really using code completions to generate code. It could be either code completion from just typing a prefix and getting Tabnine to complete what you're typing, or typing a natural-language comment and getting code generated from that. Progressively, or increasingly, people are using the chat interface more and more to generate larger amounts of code, even code that crosses file boundaries, different places in the project that have to be updated. So you can just describe what it is you're trying to do as a very high-level prompt, and Tabnine will generate the code and make whatever modifications are required for you. And I think we're seeing more and more code being generated that way.

Gregory Kapfhammer 00:06:54 Thanks for that description of code generation. I'm operating under the assumption that Tabnine can also generate the documentation for the code, is that correct?

Eran Yahav 00:07:04 Yeah, absolutely. Generating documentation, to me, when we started, sounded a bit off, because I've been taught that documentation should describe why things were done and not what or how they're doing it, right? But it turns out that, first of all, it's better to have some documentation than none. And second, once you get Tabnine to generate some documentation, it turns out that developers actually complete the part that only the human knows, the extra information. So it's different writing documentation from scratch versus, hey, Tabnine already generated the bulk of it, let me just add a couple of sentences here that clarify for someone in the future why this was done this way.

Gregory Kapfhammer 00:07:49 So it sounds like Tabnine can generate both documentation for the code it’s generating and it can also generate documentation for existing code in your code base.

Eran Yahav 00:08:00 Yeah, absolutely. Yeah.

Gregory Kapfhammer 00:08:01 Okay. Now, a moment ago you seemed to hint at something which looked like a larger scale change. So how does Tabnine support things like large scale code changes or a system refactoring?

Eran Yahav 00:08:13 Yeah, system refactoring, I think we're not quite there yet. But large-scale changes, changes across multiple files, edits that take into account existing microservices in the organization, and edits that refer to code that exists elsewhere in the org, these are all things that are enabled by the Tabnine context engine. And again, I think it's quite remarkable and surprising the kinds of things that we can do right now by prompting Tabnine, or assistants in general, and getting them to make changes across, you know, tens of files in the project, changes that are consistent and compile and run, right? So that's progress that happened really in the past year or so. I expect that to keep improving so we can make changes across the entire org, maybe refactoring across the entire org.

Gregory Kapfhammer 00:09:11 Thanks for that response. I heard you say a moment ago that Tabnine operates in an agentic fashion. Can you explain what that is?

Eran Yahav 00:09:20 The agentic fashion for us spans different levels of autonomy. At a high level, we mean something that has a higher degree of autonomy than just a human-AI conversation, like a request-response or chat question and answer, right? We mean, for example, something that can say: hey, Tabnine, generate all the tests required to reach a certain code-coverage target, or hey, Tabnine, why don't you give me a test plan for something and then implement it, right? So things that are multi-stage and quite sophisticated and involve planning, where you don't know upfront, for example, how many iterations with the LLM it requires, and things like that. Again, we can get into technical definitions of all those things later, but just to give the intuition, we're talking about things that have a higher degree of autonomy than just a conversation.

Gregory Kapfhammer 00:10:13 Okay, so if I’m working in a way that there’s less autonomy, I might be, for example, writing code and Tabnine will complete it, or I might ask Tabnine to generate a test or generate documentation, but as I give it more autonomy, it’s taking on more of the tasks for me. Is that the right way to think about it?

Eran Yahav 00:10:31 Yeah, that's the right way to think about it. And in general, people talk about coding assistants versus agents versus the AI engineer, and there's not much of a dichotomy there. It's actually a continuum, right? As we increase the level of autonomy in every task, and as we increase the percentage of tasks that can be completed by AI, we get closer and closer to the AI engineer. And I think the challenge is to do that in a way that continuously provides value to the developer and still maintains the trust in the delegation, right? It's all about delegation. The AI assistant or the AI engineer is a matter of delegation. So if you think about code completion, that was kind of a micro delegation, right? I typed something, I did not have to specify anything because the specification was implicit. The prefix and the context were the specification, right?

Eran Yahav 00:11:29 And then Tabnine completes it. So that's a micro delegation. And the question of trust is also quite simple. It's very tight. You, Greg, look at the completion and say, oh yeah, these three lines look great, that's exactly what I would've written myself. Or, hey, let me think about it for a second. That's not what I would've written, but it looks correct, so let me take that. So that's the micro delegation. And you could say that ambitious delegation is like: here's the definition of my project, let me give you one paragraph in English, go ahead, develop the backend for me, the frontend, everything, the CI, the integration, and come back when you're ready, right? So that's a higher degree of autonomy, but now trust is a little more challenging. Am I, or are you, Greg, going to sit down and review 30,000 lines of code generated by AI and make sure that it did what you wanted in the spec? Does your spec even cover all the twists and turns of the possible implementation? And so that continuum, going from the micro specification with the very tight human-in-the-loop trust and common-sense judgment, all the way to a high-level spec and somehow establishing trust on that, I think that journey is interesting, and that's what we're doing.

Gregory Kapfhammer 00:12:46 I really like that concept of a micro delegation all the way up to a macro delegation. I think that’s really helpful. Thanks for sharing that idea.

Eran Yahav 00:12:55 Yeah, I think really the future is that every engineer is a team lead, and that team lead is leading a team of AI engineers. The AI engineers could be specialized in different domains, they could have varying expertise, but I think every engineer will have to start thinking like an engineering manager in a sense, or at least like a team lead.

Gregory Kapfhammer 00:13:20 That’s a great concept. Thanks for sharing that idea. One of the things I heard you mention a moment ago is the idea of review. Does Tabnine have features that would help me as an engineer to support automated code review, for, like, a pull request on GitHub?

Eran Yahav 00:13:35 Yeah, yeah. Tabnine does what we call Tabnine coaching, which is organization-specific code review. So code review is really a challenging problem, right? You’ve been around as an expert in these areas of software engineering, and you know that code review is really challenging, first and foremost on, like, hey, what is even good code, and what are valuable comments, and what is just pure noise, right? I mean, do I want the AI to give comments like, hey Greg, use more meaningful variable names? Do I want comments like, hey, your function has cyclomatic complexity 15 and you should refactor it lower? I'm not sure that these are the super-high-value comments that I expect from AI, right? What I want AI to do is to give me comments that are thoughtful about the meaningful parts of the code. Hey Greg, you are sending customer data without encryption here.

Eran Yahav 00:14:39 That’s actually valuable, right? That’s a comment that I would like to get, not something like, hey, you may have a division by zero. Again, catching a division by zero is great and valuable, but there are many other ways, linters and static analysis tools, which I've worked on for many years, that can give you that. But the meaningful things are like, hey, you're handling the database connection in a way that may be leaking your credentials, or something like that. These are the meaningful comments I look for in a code review. Sorry, I went off on a tangent. What I wanted to say initially is that what Tabnine lets you do is define organization-specific rules for what it means to have quality code, what you consider quality code, right? So you can say, hey, in my project we only serve our files from Google Cloud Storage.

Eran Yahav 00:15:28 So if you’re doing something else that serves files from another source, maybe that’s correct technically, but it’s not appropriate for my project. So you can define these rules, you can have Tabnine check them, you can have Tabnine learn them from past code reviews and code history. So it’s really a very powerful module, and I’m really, really excited by it, and excited by the key role that it plays moving forward to the AI engineer. Because as I said, when moving to the AI engineer, when moving the delegation boundary to more code being generated by silicon, by AI, you need to somehow scale the question of trust, the problem of trust. How can we trust whatever it is that is being generated by these agents? So code review is central to that question.
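To make the "organization-specific rules" idea concrete, here is a minimal sketch in Python of how a rule-checking review pass could look. The rule format and patterns are hypothetical illustrations, not Tabnine's actual coaching format, which is richer and defined by the organization.

```python
import re

# Hypothetical rule set: each rule pairs a human-readable description
# with a simple pattern to flag. Real coaching rules are organization-
# defined and far more expressive; this only illustrates the shape.
RULES = [
    ("Serve files from Google Cloud Storage only",
     re.compile(r"boto3|s3\.amazonaws\.com")),          # flag AWS S3 usage
    ("Never log customer records",
     re.compile(r"log(?:ger)?\.(?:info|debug)\(.*customer", re.IGNORECASE)),
]

def review(change: str) -> list[str]:
    """Return coaching comments for the lines of a proposed change."""
    comments = []
    for lineno, line in enumerate(change.splitlines(), start=1):
        for description, pattern in RULES:
            if pattern.search(line):
                comments.append(f"line {lineno}: violates rule '{description}'")
    return comments

print(review('url = "https://s3.amazonaws.com/bucket/report.csv"'))
# ["line 1: violates rule 'Serve files from Google Cloud Storage only'"]
```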

Gregory Kapfhammer 00:16:16 Later we’ll talk about trust when it comes to the specific models that Tabnine has generated. But in the context of code review, I’m wondering, does Tabnine give us advice and do the code review in the IDE, like in PyCharm or another JetBrains IDE, or does it do it inside of GitHub? Where does the code review actually take place?

Eran Yahav 00:16:35 Yeah, so we have code review in the pull request. You can do that in GitHub, GitLab, Bitbucket, wherever you work. Tabnine’s philosophy is to meet the developers wherever they work, and so we have that integration already with the main source code platforms, as I mentioned. And I think very soon it's going to be in the IDE as well. There's a CLI that you can invoke from your CI process and get the results in whatever format you'd like. Again, the idea is to have the same engine power all these integrations and have developers consume the results of the code review wherever they need them.

Gregory Kapfhammer 00:17:13 That sounds really interesting. So there’s a Tabnine CLI that I can run in my terminal window, and then is it the case that I can also run it in GitHub Actions and then it comments on the pull request there?

Eran Yahav 00:17:24 Yeah, exactly right.

Gregory Kapfhammer 00:17:26 Okay, that sounds really awesome. Now, it’s clearly the case that Tabnine is providing features in PyCharm and other JetBrains IDEs or Neovim. And I see you have a lot of different plugins, and now I’ve learned that you can also integrate with the CLI or into things like GitHub Actions. So how did you build all of these different interactions and integrations? Are they all separate tools? Are they all part of one Tabnine binary? How is this system actually built?

Eran Yahav 00:17:53 Oh, in terms of architecture, I think first of all, IDE integration is really challenging. IDEs are plentiful, and they move pretty quickly, in ways that are often not very backwards compatible. So the APIs are, yeah, you've seen these things. So really our design goal from the start was to keep the IDE integration as a sort of shell and have the engine outside the IDE. So Tabnine is architected as a central engine binary, which is written in Rust by the way, that powers all of Tabnine's functionality. The various IDE plugins all talk to this single binary, and the binary does all the heavy lifting for all the interesting tasks.
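A minimal sketch of that "thin shell, single engine" split is below. The engine binary name and the JSON message shape are hypothetical stand-ins; Tabnine's actual plugin protocol is internal to its products. The point is only that every editor integration reduces to serializing requests to one local process.

```python
import json
import subprocess

# Launch the (hypothetical) engine binary once; every editor plugin
# would talk to the same long-lived process over stdin/stdout.
engine = subprocess.Popen(
    ["tabnine-engine"],            # hypothetical binary on PATH
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def request_completion(prefix: str) -> str:
    """Send one newline-delimited JSON request and read one reply."""
    msg = {"kind": "completion", "prefix": prefix}
    engine.stdin.write(json.dumps(msg) + "\n")
    engine.stdin.flush()
    reply = json.loads(engine.stdout.readline())
    return reply.get("completion", "")

print(request_completion("def fibonacci(n):"))
```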

Gregory Kapfhammer 00:18:41 So you mentioned architecture a moment ago. Let’s dwell there and then I want to talk more about why you picked Rust in order to build Tabnine. But first of all, from an architectural standpoint, I’m guessing that there are parts of Tabnine that run on the developer’s workstation and then maybe other parts that run in the cloud. Can you kind of explain how that works in Tabnine?

Eran Yahav 00:19:02 Yeah, so first of all, yes, there's obviously the local integration with IDE plugins, awareness of the current project, what is going on in terms of existing code, the workspace, etc. Semantic analysis of the code base, all of that is happening locally on your machine. And Tabnine maintains some representation of your code base locally, so it can use that as part of the Tabnine context engine, right? So when you are generating new code, Tabnine is aware of your existing code; when you're doing code completion, Tabnine is aware of similar code that already exists in your code base; when you're asking to generate tests, Tabnine knows the existing tests in your code base already. All those things are happening on the client machine. So there's some indexing and analysis, embedding, a vector database, all the components that you'd imagine exist in something like Tabnine are running on the client.

Eran Yahav 00:20:04 And of course there's a server side to the context engine that is, for example, creating the representation of all the code in the organization, creating a representation of non-code sources of information like Jira tickets, Confluence pages, other sources of information. That part sits on the server, and obviously the two communicate. In addition, there is the LLM inference itself, which runs server side, because this requires GPUs for the most part. Tabnine does have a way to run models locally, but we've been increasingly moving away from that for the general LLMs. There are some models that may still be running locally, and we play with moving models around based on tradeoffs of compute and latency and other factors. But generally, inference is happening mostly in the cloud these days, on the server side.

Gregory Kapfhammer 00:21:00 Okay, that’s really helpful. I noticed you talked about something called semantic analysis, and you mentioned that there’s these various analyses that are running locally on the computer that the developer is using. Can you say a little bit about what semantic analysis is and why Tabnine needs to perform that task?

Eran Yahav 00:21:19 Yeah, sure. It's actually quite interesting and cute. I like it. As part of the context engine, when you are performing certain tasks, you need to know what the available semantic entities are that you can rely on. The simplest example is, let's say I'm in Python and I'm importing certain modules, right? Tabnine needs to know what it means to import a module and where that other module exists. And Java is maybe an even stronger example: you are importing things from libraries, and Tabnine needs to know the signatures of the functions in the library, right? So it generates code that matches the type signatures as defined in the library. And for that you need to understand the notion of import, and to understand where the library is you need to analyze the structure of the library. You need to actually have some representation of the signatures in the library, and you need to do that even when you don't have the source code available, which is typical for Java. So in Java, often you get some JAR files that are the libraries, right? You don't have the source code for them, and you still need Tabnine to generate code that is correct with respect to those signatures. And in order to do that, you need to do some sort of semantic analysis, understand the dependencies, etc. It's not as trivial as it sounds.
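Here is a tiny Python analog of what Eran describes: parse a file's imports without executing it, then recover callable signatures from the resolved modules. The math module is compiled C, which mirrors the "no source, only a JAR" situation. This is a sketch of the general idea, not Tabnine's implementation.

```python
import ast
import importlib
import inspect

SOURCE = """
import math
from json import dumps
"""

# Collect the modules a file imports, without executing the file.
imported = []
for node in ast.walk(ast.parse(SOURCE)):
    if isinstance(node, ast.Import):
        imported.extend(alias.name for alias in node.names)
    elif isinstance(node, ast.ImportFrom):
        imported.append(node.module)

# Resolve each import and recover signatures that a completion engine
# could then respect when generating calls into the library.
for name in imported:
    module = importlib.import_module(name)
    for attr in ("dumps", "sqrt"):
        fn = getattr(module, attr, None)
        if fn is None:
            continue
        try:
            print(f"{name}.{attr}{inspect.signature(fn)}")
        except ValueError:
            # Some compiled functions expose no introspectable signature.
            print(f"{name}.{attr}(...)")
```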

Gregory Kapfhammer 00:22:39 So there is a Tabnine program implemented in Rust. It’s running on my developer workstation. It’s doing things like semantic analysis and we’ll talk more about how it’s building up this program context. But just to make sure I understood you correctly, I think you’re saying that Tabnine works across multiple programming languages. Is that true?

Eran Yahav 00:22:59 Yeah, it works across most mainstream programming languages. It supports around 80 languages, but not all of them have the same level of sophistication in the semantic analysis, etc. All major languages have pretty deep semantic analysis, but this varies by language. So I wouldn't say that we support Java, TypeScript, Python, and JavaScript at the same level that we support Lua, right? The major languages have much better support in terms of the semantic analysis and other things.

Gregory Kapfhammer 00:23:35 Yeah, so this is a really interesting point and I think it moves us into the next phase of our discussion because what I’m understanding is that Tabnine needs to be able to represent program source code for Java and for Python and for JavaScript. So do you have some kind of internal representation in Tabnine that abstracts away from different languages?

Eran Yahav 00:23:56 So it depends on the language. For some of them we have something, but for the most part, and I've worked quite a bit on static analysis and program representations, AST-based and otherwise, we try not to reinvent the wheel there, but use existing representations for most things. Maybe the most interesting thing in terms of representation is a sort of dependence graph, some dependence edges between important entities in the program, but no fine-grained representations beyond dependence graphs and ASTs. Those are, I think, the important bits there. Obviously, we do have embeddings of the code base in a vector database, in which we represent pieces of code by their embeddings, and we can do nearest-neighbor search in order to find relevant pieces of code for particular queries, etc. So that obviously exists, but in terms of the semantic representation itself, you can think of those as additional edges over points in the vector database. Does that make sense?
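The "edges over points in a vector database" picture can be sketched in a few lines. This toy uses a hash-seeded random vector as a stand-in embedder; a real system uses a learned code embedder and a proper vector store, and the edge shown is a made-up illustration.

```python
import numpy as np

def embed(snippet: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in embedder: deterministic per string, not semantic."""
    rng = np.random.default_rng(abs(hash(snippet)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

chunks = {
    "save_user": "def save_user(user): db.insert('users', user)",
    "load_user": "def load_user(uid): return db.get('users', uid)",
    "render_page": "def render_page(tmpl): return tmpl.format()",
}
vectors = {name: embed(code) for name, code in chunks.items()}

# Dependence edges layered over the same points: here, a hypothetical
# edge because both functions touch the 'users' table.
edges = {("load_user", "save_user")}

def nearest(query: str, k: int = 2) -> list[str]:
    """Cosine similarity ranking over unit vectors (dot product)."""
    q = embed(query)
    scores = {name: float(q @ v) for name, v in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# With a real embedder, this retrieves user-related code; edges then
# let the context engine pull in related definitions too.
print(nearest("fetch a user from the database"))
```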

Gregory Kapfhammer 00:25:08 Yeah, that makes a lot of sense. I’m catching what you’re saying, and I do in a moment want to make sure that we define things like a dependence graph and an AST. But before we do that, if I could jump back 10 minutes, can you briefly comment again on why you picked Rust for the implementation of Tabnine?

Eran Yahav 00:25:25 Yeah, so it's a historical choice. It was not made by me, it was made by the original developer of Tabnine, which was 2019, I think. And the choice was made to make cross-platform and cross-architecture deployment easy. It's unfortunately quite hard to deploy to different OSes and different architectures in most languages without carrying along a huge runtime, and Rust solves that problem. Originally, we were also doing a lot of very clever inference on the local machine, which requires low-level work with CPU intrinsics and stuff like that. And then the choice is really either C++ or Rust, and it's choose your poison, basically, right? I used to do a lot of C++, and I prefer Rust to C++, but I don't want to get into that particular religious war here.

Gregory Kapfhammer 00:26:28 Yeah, what you’re saying makes a lot of sense. Thanks for giving us those details. Now also, I heard you say a little while ago that Tabnine’s Rust based program is creating an AST and a dependence graph. Can you explain both of those for our listeners?

Eran Yahav 00:26:43 Yeah, so again, let me recap a bit for the audience. When we represent programs, we would like to have something a little more structured than just the program text. For that, we parse the programs based on the grammar of the programming language and create a representation called an abstract syntax tree. You can think of that as a compact representation of a piece of code that abstracts away a lot of the details but keeps the essence. So you have a tree-structured representation of a function, and inside the function the different blocks of the function, and inside each block you can say, hey, is this a conditional block or is this a while loop, or whatever it is. It gives you a structured, tree-based representation of a program in a way that can also be unified across different programming languages.

Eran Yahav 00:27:42 So once you have the grammar of a programming language, you can use something like tree-sitter or another parser to get an abstract syntax tree for the program. And that makes a lot of operations on programs much easier to perform than performing them on text. So that was the abstract syntax tree part. The other thing we mentioned is dependence graphs. Without going too deep into what those mean, you can say something like, hey, I am calling a function foo, and that function foo is defined elsewhere. So let me create an edge in my graph, in a larger program graph, between the use of the function foo and the declaration of the function foo. Similarly, you can create dependencies between variables, between functions, between all sorts of entities in the code, maybe one class inheriting from another. You get the intuitive idea: entities are related in some way, and you'd like to capture those relationships so you can use them when, for example, creating context for an LLM-based operation.
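A minimal sketch of the call-site-to-declaration edges Eran describes, using Python's standard ast module on his foo example:

```python
import ast

SOURCE = """
def foo():
    return 42

def bar():
    return foo() + 1
"""

tree = ast.parse(SOURCE)

# Map each function name to its definition node.
defs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}

# Add a dependence edge from each call site back to the definition.
edges = []
for fn in defs.values():
    for node in ast.walk(fn):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in defs):
            edges.append((fn.name, node.func.id))  # caller -> callee

print(edges)  # [('bar', 'foo')]
```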

Gregory Kapfhammer 00:28:55 Okay, thanks for that response. Now, very quickly, you mentioned something called a tree sitter parser. Can you briefly say what a tree sitter parser is and how that works?

Eran Yahav 00:29:05 Tree-sitter is the name of a project from GitHub, a parser generator actually. It's a really nice project. They provide a bunch of parsers for most languages and a way to create a parser for your favorite language as well. And they have a lot of utilities that allow you to do that quickly and conveniently. It's a really nice project that the academic audience, if not already aware of it, should really be made aware of and use in their research, whoever's doing PL research, at least.
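For the curious, a minimal sketch of using tree-sitter from Python, assuming the py-tree-sitter bindings plus the tree-sitter-python grammar wheel (pip install tree-sitter tree-sitter-python); the constructor API shown is the recent one and differs slightly in older releases:

```python
from tree_sitter import Language, Parser
import tree_sitter_python as tspython

# Build a parser for Python from the prebuilt grammar.
parser = Parser(Language(tspython.language()))
tree = parser.parse(b"def add(a, b):\n    return a + b\n")

def show(node, depth=0):
    # Print the syntax tree, one node type per line, indented by depth.
    print("  " * depth + node.type)
    for child in node.children:
        show(child, depth + 1)

show(tree.root_node)  # module -> function_definition -> parameters ...
```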

Gregory Kapfhammer 00:29:41 Thanks for that pointer. That’s awesome. Now what I want to do is leave the land of code and jump into the land of non-code. If I’m understanding correctly, Tabnine somehow knows how to incorporate design documents and issue tracker tickets. Can you explain more about how Tabnine manages those diverse sources of data?

Eran Yahav 00:30:02 Yeah, we're just at the beginning of that journey, but that journey is really important to us, in the sense that, as I said, when you think about the levels of delegation and the future of the AI engineer, we see a lot of shifting left, beyond the IDE into things further left in the process, into design documents, right? You'll write the Jira ticket, hit a button, and get the implementation to a large degree, right? So it's important to establish these connections and relationships early in the evolution of the AI engineer, and this is what we've been doing right now. Tabnine allows you to take a Jira ticket and use it as additional context for whatever it is that you're doing. So you can mention the ticket and implement it directly in your IDE, which I often do, and it's quite cool.

Eran Yahav 00:30:58 You can ask Tabnine to check whether your implementation matches what is mentioned in the Jira ticket. So you can say, hey, the Jira ticket said you should have such and such a button; actually, your implementation does not have that. But this is just the beginning of the evolution here. I think we are pushing forward on integrations that go beyond that, like updating the ticket for you, expanding on the ticket for you, saying, hey, the ticket is missing the following details based on what you already have in the organization, right? Or, hey, you said I should create a button that updates employee details; there's a microservice called update-employee-details defined elsewhere in the org. Do you want me to use that, or do you want me to redevelop that part? Right? So, again, trying to bridge the semantic gap between what is defined at a high level in a ticket and what already exists in the code base, which is, I think, one of the harder problems for engineers in general, right? Because it's hard for you to know. If I'm a new engineer in the organization, I have no idea that a microservice for getting employee details or updating employee details already exists, right? And I may try to implement it, and if I use ChatGPT or something, it'll give me a perfect implementation of a new service that does that, but I actually don't want to regenerate the service. I already have one that I should be using.

Gregory Kapfhammer 00:32:22 You brought up a good point. I think now you’re highlighting one of the distinctions between ChatGPT or Anthropic’s Claude, because Tabnine has access to this context and so must be able to make better decisions because it has more context. Is that the right way to think about it?

Eran Yahav 00:32:40 Yeah, I think, again, LLMs are commoditized and they are progressing at a tremendous pace, but at the end they are the computation engine, and they need to be aware of what already exists, of the context in which they're operating, right? So you can think about the LLM as a kind of ignorant genius. It knows how to do many, many things, but not in the relevant context in which I'm operating, right? And you can think about the best engineer in the organization as someone who really knows the nuts and bolts of how things operate, right? He knows, or she knows, hey, that microservice depends on this other thing, and the database is there, and oh, you should actually never update the version of that module because something else depends on it. They have this representation of the entire system in their head, and they know how things interconnect, what should be changed together, what shouldn't be touched, etc.

Eran Yahav 00:33:36 So that's the best engineer in your org. So you really want to bridge these two things, make the LLM as aware of the org as your best engineer, and this is how you get to, like, 10x productivity from the AI engineer, right? To onboard the AI to your organization. This is really the task here, and this is what Tabnine is doing: onboarding the AI to your organization in a way that the AI engineer operates like your best engineer. Ideally, Tabnine should always be the best employee of the month, right? The employee of the month award should go to Tabnine in your organization.

Gregory Kapfhammer 00:34:11 Thank you for that response. That was helpful and informative. Now, in some of the projects I’ve worked on, I’ve had situations where the documentation drifted from the implementation. So how does Tabnine address the fact that some sources of context are really helpful and will lead to it becoming employee of the month, while other sources are less helpful because, for example, they no longer represent the truth of the code base? Can you give more details about that?

Eran Yahav 00:34:41 Yeah, that's definitely a challenge. I don't want to overplay the ability of the AI to know what the best source of information really is here, and it is actually not that hard to mislead the AI. So let me give you an anecdote from customers, right? Initially, whenever you talk to customers, they say, hey, I have 30 million lines of code, why don't I connect Tabnine to these 30 million lines of code to inform Tabnine? There's documentation there, there's a bunch of other stuff as well, right? So they want to connect it to everything, and then they say, you know what? Come to think of it, 27 million lines of code from this code base are actually things we think no one should ever look at. That's legacy, which is the counterexample of what we want to do. So let's disconnect all that and focus on the 3 million lines.

Eran Yahav 00:35:32 And those need not be the most recent ones, by the way. Initially, when I started, I thought, hey, we just look at the recent stuff, and that must be the right way to do things. That's sometimes correct, but sometimes there's just some golden repository, something that is like: here's our best work, our masterpiece, and everything should contextualize on that, because that's the best way to do stuff, right? So there is some nuance to it, and you still need some human guidance to connect Tabnine to the right sources and not mislead it. In particular, there is a problem with huge legacy code bases: unless you're careful and give extra guidance through coaching, you may be trapped in the kind of Jupiter-level gravity of legacy, in the sense that you will always keep generating code that is similar to the way you did things 20 years ago, right? You'll be generating new legacy. You don't want that. And this is also why we created the coaching module that allows you to explicitly say: you know what, Tabnine, this is the way I want you to do things, right? If somebody asks you to connect to a database, the way to do that is this new way and not the old way, right? So you can control what the AI does explicitly, as the architect or the dev manager or a senior engineer.

Gregory Kapfhammer 00:36:52 So it sounds like Tabnine can first of all automatically assess the quality of some sources, but then it will also take hints from me as a software engineer in order to help me to be the human in the loop to point out what is good code or good docs. Am I thinking about it the right way?

Eran Yahav 00:37:10 Absolutely. So I think, again, if you think about it as onboarding the AI engineer to the organization, there are certain things you expect the AI to understand by itself. Hey, look at this code base, you know, get the gist of it; you should just implicitly learn what's important. And Tabnine will do that. But certain things cannot be learned implicitly, or are too hard to learn implicitly, and you want to control them explicitly as the human in charge, right? You want to say, hey, I don't care what you learned from this old crusty Java 8 code, we're in a different world now, and here are the rules of the new game. I'm telling you, as the architect, here's the way to do stuff.

Gregory Kapfhammer 00:37:52 When I was reading the Tabnine documentation and checking out your blog posts, one of the words that I noticed you used frequently was provenance and the other was attribution. How do provenance and attribution play into the tasks that Tabnine can automatically complete?

Eran Yahav 00:38:08 Yeah, I think those are important things when you are moving to LLMs that have been trained on anything and everything out there. So Tabnine really cares about what we call protection, and protection means protecting you against generating code that potentially violates IP rights. We have a model called Tabnine Protected, which has been trained only on permissively licensed open-source code. So you can pick that if you are very protection conscious, if you're very risk averse in terms of IP. But as external models become more and more powerful and we want to enable those for our customers and give them freedom of choice without having to worry about protection, we provide provenance and attribution, or what we also call inference-time protection. That means we have recitation checks: we match any code that is generated by Tabnine against all of the code out there, in some abstraction, and let you know: hey Greg, this piece of code is really similar to the Linux kernel, which is licensed under GPL-2, right?

Eran Yahav 00:39:22 So, hey, do you want to take that? Do you want your developers in the organization to not even see code of this flavor? Maybe you want to only suggest code that is MIT or Apache licensed. So provenance and attribution let you find the origins of the code being generated, in case it's very similar to something else out there. And it does that at inference time, in near real time. And you can configure the behavior, as I said, to even block those suggestions from being presented to your developers. So, and I did not mention this earlier, Tabnine is an enterprise-first AI system. We really care about privacy, about protection, about personalization to the enterprise. All those things, privacy, protection, and personalization, are what make Tabnine great for the enterprise.
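One common way to build such a recitation check, sketched below, is to index known open-source code as hashed n-grams of tokens with a license attached, then flag generated code whose n-grams overlap too heavily. The corpus and threshold here are toy stand-ins; how Tabnine actually matches "in some abstraction" is not specified in the conversation.

```python
def shingles(code: str, n: int = 5) -> set[int]:
    """Hash every run of n consecutive tokens in a code string."""
    tokens = code.split()
    return {hash(tuple(tokens[i:i + n])) for i in range(len(tokens) - n + 1)}

# Fake two-entry corpus keyed by license; a real index covers far more.
CORPUS = {
    "GPL-2.0": shingles(
        "static int do_fork ( unsigned long flags ) "
        "{ return copy_process ( flags ) ; }"),
    "MIT": shingles(
        "function clamp ( x , lo , hi ) "
        "{ return min ( hi , max ( lo , x ) ) ; }"),
}

def check(generated: str, threshold: float = 0.5) -> list[str]:
    """Return licenses whose indexed code heavily overlaps the output."""
    s = shingles(generated)
    hits = []
    for license_id, known in CORPUS.items():
        if s and len(s & known) / len(s) >= threshold:
            hits.append(license_id)
    return hits  # e.g. ['GPL-2.0'] -> warn or block, per policy

print(check("static int do_fork ( unsigned long flags ) "
            "{ return copy_process ( flags ) ; }"))
```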

Gregory Kapfhammer 00:40:11 Okay, that makes sense. I remember earlier in our conversation you mentioned that trust was central to Tabnine, and now I’m understanding that inference-time protection is one of the key ways to make sure there’s trust between the software engineer and the Tabnine system.

Eran Yahav 00:40:27 Yeah, in some sense. Protection is one facet of that. When I said trust earlier, I meant more the functional side of things, more in the sense of: hey, you asked me to generate something that connects to a database and gets all employee records; can I trust that what I got actually reads all employee records, or maybe, due to a pagination error, it only got the first hundred and did not paginate through the table, right? These subtle bugs. Or, hey, maybe you asked me to read all employee records, and I did not know that they live across three different tables, for whatever reason, right? And I only got you one of them. So these are the kinds of functional things that I have to worry about when I delegate a task. This is what I mean when I say trust.

Gregory Kapfhammer 00:41:18 Thanks for that clarification. That was helpful. In a moment I want to chat about Tabnine and how it can help with things like automated test data generation. But before I do that, I thought it would be helpful if you could briefly comment on the different cloud-based LLMs that you can integrate with Tabnine. Can you share more details with our listeners?

Eran Yahav 00:41:39 Yeah, absolutely. So Tabnine always tries to bring the best models out there to our users, and as such we offer all the latest, greatest models from Anthropic, OpenAI, Google, etc. So you can get Gemini, you can get Claude 3.5 Sonnet, you can get GPT-4, you can get all of these models. Again, the pace of innovation on these, with new reasoning models and others, is really fast. We're always going to bring the best out there to our customers, whatever matches their deployment restrictions, their preferred cloud provider, and maybe their protection requirements. So our goal is really to work with the customer to see what the requirements are in terms of privacy and protection and preferred cloud provider and model abilities, and provide that. As of today, we find in our internal benchmarks that Claude Sonnet is the best right now, but Google is having a very strong showing with the latest Gemini, and these things change constantly. So we give our customers the freedom to switch between these models completely seamlessly. It used to be the case that different models did have different performance across different tasks, so maybe one model was better for code review, another better for test generation, and another better for code generation. Right now it looks like Claude is doing the best on all of them, but yeah, some open-source models are also performing really well: the latest Llama 3.3, the latest Qwen family, and others. Yeah.

Gregory Kapfhammer 00:43:23 Actually that’s a really interesting point and something that I was going to discuss. So I can configure Tabnine when it’s generating test cases to use one model, but then perhaps configure it to pick a different model if it’s creating documentation or doing code review.

Eran Yahav 00:43:37 Yeah, right now we don't offer that control at the customer level. We decide for certain things, but if we find that models diverge again in their abilities, we may open that up, right? Right now it's just: if you bet on Claude Sonnet to do this, you're not going to be wrong, right? So there's no need for this fine-grained control on the customer side.

Gregory Kapfhammer 00:43:58 Okay, very interesting. Let’s talk briefly about test cases. I noticed when using Tabnine that it can like suggest test cases to me by looking at my code and then saying, hey, these 10 tests could be useful to you. So my question is how does Tabnine decide how many test cases to suggest to me? And then how does it decide which test cases it thinks I will find most useful?

Eran Yahav 00:44:19 Yeah, there is a simple agentic flow there that tries to look at the existing tests, see what is missing, and then generate a test plan according to what is missing. So that's one of the test agents: it generates the test plan, allows you to refine it, and then generates tests from the test plan. The next version, as I mentioned earlier, is something that is going to hit a code-coverage target completely automatically. So it's generating the tests, measuring the coverage, seeing what parts of the code are not covered, generating additional tests, etc., until you reach the coverage target. Of course, all of those things still require human supervision, in the sense that you want these tests to actually be interesting and meaningful and not cover code in trivial ways, and maybe you also want them to be meaningful tests going into the future, right?

Eran Yahav 00:45:16 Only you, the human, really know where you want to go with this code base and which tests are forward-looking, right? So you still want to provide some human guidance. But as I said, because the question of trust is so central, how can I trust the things that are being generated by the AI engineer, I think the code coverage agent and automatic test generation are also central tenets of what Tabnine is going to do in the future. So these agents are going to improve tremendously and also be tied to things that we see in code review and things that we see at runtime, etc.
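The coverage-driven loop described here can be sketched as follows. Both helper functions are hypothetical stubs: llm_generate stands in for a call to a language model, and run_tests_and_measure stands in for a real harness built on tools like pytest and coverage.py.

```python
def llm_generate(prompt: str) -> str:
    """Stand-in for a call to an LLM; hypothetical."""
    raise NotImplementedError

def run_tests_and_measure(test_code: str) -> tuple[bool, float, list[int]]:
    """Stand-in for running tests and measuring line coverage;
    returns (all_passed, coverage_fraction, uncovered_line_numbers)."""
    raise NotImplementedError

def generate_until_covered(source: str, target: float = 0.9,
                           max_rounds: int = 5) -> str:
    """Generate tests, measure coverage, and iterate toward a target."""
    tests = llm_generate(f"Write unit tests for:\n{source}")
    for _ in range(max_rounds):
        passed, coverage, uncovered = run_tests_and_measure(tests)
        if passed and coverage >= target:
            break
        # Feed the gap back to the model, as the agentic flow does.
        tests += llm_generate(
            f"Add tests for:\n{source}\n"
            f"These lines are still uncovered: {uncovered}")
    return tests
```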

Gregory Kapfhammer 00:45:56 Okay. So it sounds like coverage guided test case generation is on the roadmap for Tabnine. Is that the right way to think about it?

Eran Yahav 00:46:04 It’s working, it’s not just on the roadmap. It’s not released yet, but internally this is something that we’re close to deploying.

Gregory Kapfhammer 00:46:12 So does Tabnine generate unit test cases or integration tests or regression tests or fuzz tests or all of those? How does it work?

Eran Yahav 00:46:20 Right now, mostly unit tests. The others, again, we will get there, but we're not quite there yet. I think regression testing is another interesting point to intervene, but there's still some way to go there.

Gregory Kapfhammer 00:46:35 So in certain cases I’ve either automatically generated a test or written one of my own and then the test case fails, however, I ultimately realized it was a bug in my test case and not a bug in my function under test. So how does Tabnine distinguish between a test case that’s failing because it’s a buggy test versus a test case that’s failing because it found a bug?

Eran Yahav 00:46:56 Obviously, without an additional specification somewhere, you cannot know that, right? So the thing is, you need to enter that specification somewhere, and Tabnine will check against it. Outside academia, people are not that big on writing specs, really. Right?

Gregory Kapfhammer 00:47:15 So when Tabnine is generating these test cases, can it also generate like the mock objects or the spies or the stubs that it needs?

Eran Yahav 00:47:24 Yeah, it does, it does. So if you need some mock objects, it will do that, again, to a certain degree of sophistication. Don't expect it to do the level of mocking that your super-duper complicated project might need, right? But it'll do the basic things and definitely get them out of the way.
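For readers unfamiliar with what "generating the mocks" looks like, here is a self-contained sketch of the kind of unit test with a mock such a tool might produce. The api class, get_user, and fetch_json are hypothetical names used purely for illustration.

```python
from unittest.mock import patch

class api:
    """Stand-in for a real HTTP client module."""
    @staticmethod
    def fetch_json(path):
        raise RuntimeError("would make a network call")

def get_user(uid):
    # Function under test: depends on the external service.
    return api.fetch_json(f"/users/{uid}")["name"]

@patch.object(api, "fetch_json", return_value={"name": "Ada"})
def test_get_user(mock_fetch):
    # The mock replaces the network call, so the test is fast and
    # deterministic; we also assert the dependency was used correctly.
    assert get_user(7) == "Ada"
    mock_fetch.assert_called_once_with("/users/7")

test_get_user()
```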

Gregory Kapfhammer 00:47:41 Okay, that makes a lot of sense. Now I noticed that when I write test cases, sometimes my test cases aren’t good because ultimately they prove to be flaky, or sometimes they’re too tightly tied to the implementation and they’re like a change detector test, which I would also consider low quality. So does Tabnine have guardrails in place that help it to automatically generate test cases that are not going to be flaky or are not going to be change detectors?

Eran Yahav 00:48:08 No, not yet. Flakiness I have not seen much, but definitely change detection, as you call it. I think there's definitely some bias in the AI there, because that's like overfitting to the existing implementation when generating the test. I'm not exactly sure how to generalize around that, but we'll see; these are problems we're working on.

Gregory Kapfhammer 00:48:33 So if there’s a failing test case, can I give that test case to Tabnine and have it fix the program under test so that the test passes?

Eran Yahav 00:48:41 You can do that. It’s not an automatic integration yet, but you can definitely point to it: mention the test and say, this is failing, please change the code accordingly.

Gregory Kapfhammer 00:48:49 Okay. So we’ve covered a lot of very cool features of Tabnine. We talked about how it uses all of these different forms of context and how it can automatically generate test cases. Are there features of Tabnine that we haven’t highlighted yet that you wanted to draw to the attention of our listeners?

Eran Yahav 00:49:05 Let me see; I have to review the things that I'm using all the time. So when you are working with the chat interface in Tabnine, for example, one question is how you consume the results that you get back from chat. And I think we've worked hard on automatically applying these changes across files and things like that; that's going to be an area of further improvement in Tabnine. I used to work mostly with code completion, because I typically know what I'm doing and I'm making very pointed changes in my code, etc. I'm probably doing what most would consider specialized work in the logic of the product, stuff that is fairly deep and deeply contextualized. So I used to use mostly code completions, but in recent months I moved to using mostly chat, because since we introduced the apply-all kind of magic, which is: here's everything that was generated, just hit a button and all the code gets changed automatically, I started using that for almost everything that I do.

Eran Yahav 00:50:13 And so that apply was a leap in abilities in Tabnine, for me at least as a user. And as it improves further, I find that I do fewer code completions, even for things I know exactly how to complete. Even if it's a one-line change, I still do it in the chat interface and have it change whatever needs to change to make it happen. So habitually, I would say against my better judgment in a sense, I started using the chat more and the code completion less. And I imagine, actually I know, that a lot of the people who adopt AI for real, all in, are also transitioning to using the chat interface more than completion. It's, again, a question of habit: where do you go first, right? And so I think the habits are changing in front of our eyes.

Gregory Kapfhammer 00:51:05 Yeah, that’s a really thought-provoking insight, and it’s something that I have noticed in myself and many of the other engineers with whom I regularly chat. So thanks for pointing out that change in developer practice and experience. What I would like to do now is to go back to what we might call the roots of Tabnine. You’ve actually been involved in publishing and presenting a number of research papers, and I think some of those research papers have actually planted the seeds for Tabnine. So there are two frameworks in particular that I wanted to briefly chat about. One is called code2seq and the other is called code2vec, that’s S-E-Q and then V-E-C, and I think code stands for source code. Can you tell us what those frameworks are all about and how they influenced the creation of Tabnine?

Eran Yahav 00:51:53 Oh, these are beautiful works by my former students, Uri Alon and Meital Zilberstein and Shaked Brody, and also with other friends and contributors like Omer Levy. These are really, wow, so much time has passed. These are circa 2018 or 2019, and they're not even the first work that we've done around code completion and these things. The first work that we did around code completion, I think, is from 2012, when we started seeing that you could do some ML-based code completion, and that is really what informed this entire line of work. So let me talk for a second about the specific things that you asked about. code2vec was really a very nice work on how to represent code for tasks like function name completion. Maybe you could use it also to predict names of variables and stuff like that.

Eran Yahav 00:52:58 So it was a very early and straightforward use of neural networks, I would say, for code-specific tasks. You can think of it as the first embedder for code, creating some neural representation, a dense representation of code as vectors. So I think that was 2019; that was nice work. So that is code2vec. It's still very widely used because it's very simple. It basically uses paths in the abstract syntax tree to represent a piece of code and then computes some embedding, some neural representation, of those paths, of the combination of paths in the AST. So that was the first work we did around this, with really beautiful work by Uri Alon. And I think one year later we did code2seq, which was generating a sequence from a piece of code. The idea was to say: hey, let's generate captions for a piece of code, just like captions for images.
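A rough sketch of the AST-path idea behind code2vec is below, again using Python's ast module. The real model defines a specific path grammar and learns embeddings for paths; this just enumerates leaf-to-leaf label paths to show what "paths in the abstract syntax tree" means.

```python
import ast

SOURCE = "def add(a, b):\n    return a + b\n"
tree = ast.parse(SOURCE)

def leaves_with_paths(node, path=()):
    """Yield (leaf name, root-to-leaf label path) pairs."""
    label = type(node).__name__
    children = [c for c in ast.iter_child_nodes(node)
                if not isinstance(c, ast.expr_context)]  # drop Load/Store
    if not children:
        name = getattr(node, "id", None) or getattr(node, "arg", None) or label
        yield (name, path + (label,))
    for child in children:
        yield from leaves_with_paths(child, path + (label,))

leaves = list(leaves_with_paths(tree))

# A leaf-to-leaf path goes up from one leaf to the lowest common
# ancestor, then down to the other leaf.
for (a, pa), (b, pb) in zip(leaves, leaves[1:]):
    i = 0
    while i < min(len(pa), len(pb)) - 1 and pa[i] == pb[i]:
        i += 1
    print(a, "->", list(reversed(pa[i:])) + list(pb[i:]), "->", b)
```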

Eran Yahav 00:53:57 So it just generated a line of description for a piece of code. And those were tiny, tiny neural networks, I think neural networks on the order of 150 million parameters, which these days sounds like something, you know, I could probably run hundreds of on my iPhone, right? So these were the early days of work on neural representations of code. And there has been a lot of follow-on work on that. We've done a lot of work following it, on adversarial examples for models of code and all sorts of additional practical work, like how to do edit completions, by Shaked. I think Shaked did very nice work on how to predict edits rather than predicting code completions: you want to predict how code is going to be edited.

Eran Yahav 00:54:53 You made an edit in one place, and you want to predict other edits in the same code, right? How code is being updated as a result of a change. And there's a bunch of very nice academic work there. I think when we did all the academic work, the trade-offs were quite different from what you'd like to see in industry, right? Academic work often goes for very high accuracy, but maybe the latency is too high to be used in practice; maybe you get a 10% improvement in accuracy but at 10x the latency or 10x the computational cost. Again, for academic research that makes perfect sense, and then maybe other people find a way to make it faster or better or cheaper, but the trade-off there is not something that matches the requirements of industry.

Gregory Kapfhammer 00:55:48 So thanks for sharing those details about the trade-offs. For listeners who are interested in exploring more, we’ll link them to episodes of Software Engineering Radio that have also talked about generative AI and software engineering. And in the show notes we’ll also link to the websites for Tabnine and code2seq and code2vec. Thanks for giving us those details. I’m wondering, since you’re both a CTO and a professor, and you’ve already begun to hint at this, can you share with us some details about how you transitioned a research prototype into production use?

Eran Yahav 00:56:20 Yeah, so I think my friends at Microsoft Research told me many years ago that when you transition from research to product, you realize that the research part, the deep tech part, becomes at most 20% of the product at the end. My experience has not been different. There's just so much that has to go into a product in terms of the user experience, how the product behaves, the latency requirements, the cost requirements, what it means to do inference at scale. All those things are not super interesting academically, for the most part, but are really important in making things work for real. And I have tremendous appreciation for that practical work and for what it takes to make a real product work in the hands of real users, right?

Eran Yahav 00:57:19 I think it's just a tremendous amount of work to do that. Through the lens of moving academic research into practice, I would say that a lot of what people are doing these days at conferences like ICSE or FSE, or a lot of the software engineering conferences, in terms of the ML stuff, is also being done in industry, and I think not necessarily in a worse way; people just don't really write about it, right? So my advice to people in academia, and what I do myself, is to think academically about how this would look 10 years from now, not one year from now. Because the one-year-from-now, industry knows; industry is already doing it, even if you don't read about it in a research paper, right? And it's not an easy task.

Eran Yahav 00:58:11 It's hard to imagine how things will look 10 years from now, but trying to imagine that, trying to imagine a completely different world of how people program, is what I expect academic research to bring to the table. Doing prompt engineering for code review? Sure, you can do some research about that, and it's interesting; it's not a solved problem in any sense. But I think industry is working on that, and industry will solve it in a way that is maybe not the most interesting, but definitely the most practical. I think that's the balance I'd expect: for academia to look further out. And I think some people are definitely doing that as well.

Gregory Kapfhammer 00:58:54 Thanks. As we draw this episode of Software Engineering Radio to a conclusion, I’m wondering, do you have a call to action for our listeners who are software engineers working in practice and may want to try to effectively apply generative AI tools like Tabnine?

Eran Yahav 00:59:09 Yeah. So it's the obvious advice, I guess, but just play with it. Lean in, adopt these tools. They are not going away; this is how things are going to be, in some way. Resistance is futile, and you should lean in and see what works for you. Not everything works the same way. Have realistic expectations: this is not some magical thing that will make all your work go away, and it definitely won't replace you. So lean in, see what works for you, and adopt it. Realistically speaking, you can automate a lot of the simple things, which should give you more time to focus on the work that you want to do anyway, the creative work and the deep algorithmic or innovative parts of your job. It is also interesting to see how your code base should evolve to make the AI's life easier.

Eran Yahav 01:00:07 So if you think about how Tabnine works, by contextualizing on the entire code base, and you know that it is doing some retrieval-augmented generation based both on semantic and non-semantic similarities and embeddings and things like that, you can think about adding maybe a bit more documentation to your existing code base, maybe changing the names of functions and parameters to be slightly more meaningful, such that both future humans and future AI can make better use of the code base. These are things to think about as you design a system for the future and as you refactor code for the future: not just how do I make it easier for future humans, but also for the future AI systems that are going to want to leverage it. I think people who build their code base this way are going to get a better productivity multiplier from AI-based tools.

Gregory Kapfhammer 01:01:04 That’s an interesting insight and a great call to action for our listeners. Eran, thanks for taking time to chat with us on Software Engineering Radio. This has really been an awesome conversation.

Eran Yahav 01:01:15 Well thank you so much for having me. I enjoyed every minute.

Gregory Kapfhammer 01:01:18 Thank you.

[End of Audio]
