Leo Porter and Dan Zingaro

SE Radio 582: Leo Porter and Daniel Zingaro on Learning to Program with LLMs

Dr. Leo Porter and Dr. Daniel Zingaro, co-authors of the book Learn AI-Assisted Python Programming, speak with host Jeremy Jung about teaching programming with the aid of large language models (LLMs). They discuss writing a book to use in Leo’s introductory CS class and explore how GitHub Copilot de-emphasizes syntax errors, reduces the need to memorize APIs, and why they want students to write manual test cases. They also discuss possible ethical concerns of relying on commercial tools, their impact on coursework, and why they aren’t worried about students cheating with LLMs.



Show Notes

Links

  1. Learn AI-Assisted Python Programming
  2. Leo Porter
  3. Daniel Zingaro
  4. GitHub Copilot

Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Jeremy Jung 00:00:43 Hi, this is Jeremy Jung for Software Engineering Radio. Today I’m talking to Dr. Leo Porter. He’s an associate teaching professor of computer science at the University of California San Diego and he co-founded the computing education research laboratory there. I’m also joined by Dr. Daniel Zingaro, who is an associate teaching professor of computer science at the University of Toronto and is also the author of the book Learn to Code by Solving Problems and the book Algorithmic Thinking. They are co-authors of the book Learn AI-Assisted Python Programming. Leo and Dan, welcome to Software Engineering Radio.

Leo Porter 00:01:22 Thank you for having us Jeremy. I really appreciate your podcast.

Daniel Zingaro 00:01:24 Thanks Jeremy.

Jeremy Jung 00:01:26 The first thing we could start with is why this book and why now? How did you decide on like, okay, this is the thing we need to do now?

Leo Porter 00:01:34 So really early on, when LLMs first were coming out and being seen on the scene for programming, Dan started playing with them for programming projects, and I think Dan really quickly realized that they’d have a big impact on how we teach programming. So he reached out to me and said I really needed to give them a try. And after I played with them for a little while, I had the exact same realization that this is going to change how we teach programming in a pretty dramatic way. So having realized that, having realized that we had to change our introductory CS-1 courses, we knew we needed to do that, but in order to teach that class we’d have to have a book that we could assign our students that would go along with the class. And so we knew we had to change the class, but we also had to have a book for it. And given the timeline to write books, we started on the book first. And so that’s how we got started.

Daniel Zingaro 00:02:22 I guess we figured out that our course had to change first before we knew exactly how it had to change. One thing we learned early on was that the kinds of assignments we give in our introductory courses, they’re just solved by these tools like ChatGPT and GitHub Copilot. So we knew something had to change, and then it was just a matter of figuring out what. And so we spent quite a bit of time with these tools and we started to realize that what’s going to change is the skills that our students need to learn to be effective using these tools. So like before these tools, we would spend a lot of time teaching syntax, and students struggle quite a bit with learning syntax, which is very frustrating, right? Because you can’t even do anything until you get the syntax right, and you’re getting all these errors like missing colons and mismatched braces and stuff like that.

Daniel Zingaro 00:03:12 So it’s actually good that the LLMs are doing the syntax for the students. But just because that skill’s not needed as much doesn’t mean that there aren’t still skills for students to learn. So instead of syntax, other things become more important. So for example, Leo and I realized that reading code is going to be extremely important, even more so than before, if that’s even possible. And that’s because sometimes you’re going to get back code that just doesn’t work. And so we realized that students are going to need to be able to read the response that they get to see if the code looks reasonable or not, right? And if the code is unreasonable, then they need to read more code and look at other solutions, right, that they get from the LLM. There are other things they can do as well, like messing around with the prompt and so on.

Daniel Zingaro 00:04:03 But they’re going to need to be able to read code throughout the process. So we just kind of kept on using these tools and documenting the skills that students are going to need. And we just kind of realized that all the skills students are going to need are skills we would want to teach anyway. So like one more example is testing, right? So students may now not have an understanding of every last detail of the Python language like they would before, and so that makes testing even more important than it was. They need to verify that the code they’re getting is correct, and so they have to be very good at writing test cases. And similar for debugging, we need our students to have strong debugging skills, again, even potentially stronger than before, right? Because if the code isn’t working, they need to first determine what the code is doing to be able to fix it. And then I guess one more I’ll mention is problem decomposition. And this is a big one, I think this is going to come up a couple times probably in our talk today, but LLMs struggle when you give them tasks that are too large, and students need to know how to break problems down into small components so that the LLM can solve each one and have a good chance of getting it right.

Leo Porter 00:05:13 Kind of to piggyback off of that, you might be hearing these skills and saying, oh these are absolutely essential skills every software engineer should know, these are being taught right now, right? And the answer is not really. Like these aren’t core topics in a lot of introductory CS classes because so much time is spent on syntax. And so fairly early on when we kind of realized these skills would be so essential, we got really excited because these are skills we want to teach in our classes and the LLMs are now giving us the ability to do that.

Jeremy Jung 00:05:39 I think that’s interesting about the syntax comment, because you were saying how reading is going to be more important than ever because you have the LLM generating the code, and you need to understand that code that’s being generated and understand that it does what you think it does. And so I wonder if when you say you spend less time on syntax, is it because you feel like they’re going to generate this code and they’re sort of organically going to pick up syntax that way versus having to focus on it at the start? I’m just trying to picture what you see changing there.

Daniel Zingaro 00:06:13 Yeah Jeremy, so I was, I guess speaking specifically about syntax errors, which don’t generally happen when you’re using LLMs and I also agree with you, you need to know what the code is doing, but you can do that without worrying about each specific piece of syntax. Like you’re going to need to know what the keywords do for sure, but missing brackets and colons and oh there needs to be like a blank line here, indentation. A lot of this kind of thing is done for the most part correctly by the LLMs. So yeah, I agree with you. You need to be able to identify the structures, right? So in our book actually Leo and I have a couple of chapters on reading code and I don’t think we ever break down a line of code into its individual tokens. We do talk about the main structures, so like ifs and loops and functions and all that. But compared to other books I think or other ways of teaching where you would focus on the micro level, we try to focus on the line level now because we want our students to be able to grasp what each line is doing, I guess more than each token.

Leo Porter 00:07:28 Yeah, and maybe to add to that a bit, it’s almost like the advent of block-based languages, which was to make sure that essentially the author can’t make syntax mistakes, right? That’s the whole purpose of block-based languages, and they’re huge for introductory programming, especially in K through 12. And in a sense LLMs do this, because they almost never give you back wrong syntax. And so it takes away that cognitive burden of making sure you handle the token level, as Dan was saying.

Jeremy Jung 00:07:57 I’m curious, so you said the syntax is correct, but what are the typical mistakes you see coming back from these LLMs? Is it a logical mistake or is it ever something that actually doesn’t compile, I’m kind of curious what your experience has been.

Leo Porter 00:08:14 I think the more common errors that we’ve been seeing are logical. So it misinterprets the prompt that you’re giving it; it essentially tries to solve a problem that’s different than what you’re trying to solve. Or it may have bugs in it, so it is in fact trying to solve the right problem, but it’s off by one, or maybe it’s replicating some mistake that it found in the large code base it was trained on. And so for most mistakes, you’re going to need to write test cases and run them. The mistake is then going to show up when the test cases catch it, and then you’ll have to try to fix it. If the students can read the code, if we train them well to read the code, often you’ll look at the response, and if the response is just not even trying to solve the right problem, you can usually pick that up pretty quick.

Leo Porter 00:08:57 I think the students will learn to do that, and then they can just say, okay, this is clearly not the right answer, and use the different tools in, say, VS Code to find another answer, and then pick one that’s right, or change their prompt to get a response that’s right, and go through that whole flow. But at some point or other it will give an answer that looks right. And then I think all of us as software engineers know that even if the code looks right, it may not be. And so then they have to actually write the test cases and get some level of confidence that it’s actually working right before they’ll know. And so sometimes you know really quickly that it’s just clearly wrong, it’s solving the wrong problem, and sometimes it looks right but it actually has some bugs that need to be fixed.

Daniel Zingaro 00:09:35 I guess one thing that struck me is how much a change in the prompt can matter. Leo, we’ve seen this over and over again where we’ll write a prompt, it seems fine to us, and then we’ll realize, oh, there are actually two different ways of interpreting this, and the ambiguity of English strikes again, right? And so it’s just amazing to me how many times clarifying the prompt fixes the code. Not always, we definitely have examples where that’s not the case, but more often than not, in my experience, changing the prompt appropriately has a bigger than anticipated effect on the code. It’s amazing.

Leo Porter 00:10:17 And if we’re thinking about the prompt in terms of docstrings for functions, adding the test cases certainly helps. It is surprising sometimes that you can add the test cases to the prompt and it’ll still give you back code that does not actually pass that test case, because VS Code and Copilot don’t actually run the code that comes back from the LLM. But I do find the test cases do tend to help with the quality of the response you get back.

Jeremy Jung 00:10:40 As a part of your prompt, you’re asking it to implement some functionality and you’re also asking it to write these tests for that same functionality?

Leo Porter 00:10:49 Oh no, sorry, it’s more the doctest kind of format. So let’s say you’ve written your function signature and then you have the description of the function in a docstring, and then towards the end of the docstring I’m articulating the test cases that I intend to use, and articulating those test cases helps it come up with a better response. I haven’t found it to be great at writing test cases itself. I haven’t spent a ton of time with this, but in the time that I have spent, it tends to want to do almost like a brute force search of all possible inputs, as opposed to saying, okay, here are a couple of common cases, here are the edge cases, and now I can feel fairly good about it. It doesn’t seem to have that intuition yet. For the most part we’re writing the test cases ourselves, and we’re going to be teaching the students how to write the test cases themselves as well.
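
For illustration, a docstring prompt of the kind Leo describes might look like the minimal sketch below. The function (counting how many positions in two lists hold equal values, an example Dan brings up later in the episode), the test values, and the body shown are all hypothetical; the body is one plausible completion rather than actual Copilot output:

    def count_matches(list1, list2):
        """Return how many positions hold the same value in list1 and list2.

        The two lists are assumed to have the same length.

        >>> count_matches([1, 2, 3], [1, 4, 3])
        2
        >>> count_matches([], [])
        0
        """
        # The doctest-style examples above are the "test cases in the
        # docstring" part of the prompt; Copilot would be asked to fill
        # in the body below the docstring.
        return sum(1 for a, b in zip(list1, list2) if a == b)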

Daniel Zingaro 00:11:36 Yeah, so Leo and I have actually made a conscious decision to have students write test cases from scratch. Even though you could play around with the LLM and have it try to generate test cases, whether it’s flawed or not, we still want students to do this from scratch. We think that writing test cases is a skill we want our students to have.
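
For a sense of what writing test cases from scratch might look like against LLM-generated code, here is a minimal sketch using plain assert statements and the hypothetical count_matches function from the sketch above (the specific cases chosen are illustrative):

    # Hand-picked by the student, not generated by the LLM:
    # a typical case, an all-match case, a no-match case, and an edge case.
    assert count_matches([1, 2, 3], [1, 4, 3]) == 2
    assert count_matches([7, 7], [7, 7]) == 2
    assert count_matches([1, 2], [3, 4]) == 0
    assert count_matches([], []) == 0
    print("All tests passed")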

Jeremy Jung 00:11:58 Sometimes what these models generate, like you were saying, has logical errors, and hopefully if you’re writing the test cases, you’ve put some thought into them and your test cases are actually checking the correct behavior. So then you have the LLM generate the implementation, it’s running against tests where you know what the correct answer should be, and so if it generates something that’s incorrect, you’ve caught it. You’re not totally relying on it telling you everything is good, placing confidence in something that you personally didn’t write; it’s just what the machine gave you.

Daniel Zingaro 00:12:35 Maybe it takes away one layer of uncertainty too, Jeremy, right? So the code could be wrong, right? And then if it generates test cases, okay, the test cases could be wrong too, and maybe you get unlucky and two wrongs make a right and then your test cases pass for the wrong reason. So we really want to hone this skill in our students, and like Leo said earlier, these intro courses used to be so full of low-level syntax concerns that we didn’t do testing properly. We all try to cover testing, but I think we’re going to be able to cover it in a lot more detail now.

Leo Porter 00:13:08 I think we’re enthusiastic about how students will approach testing when they’re working with the LLM. This is fairly anecdotal, but when they interact with us talking about testing, often students aren’t testing their code because they wrote it, and so of course it’s right. It’s like this really famous kind of bug in human thinking, right? If you write it, of course the computer’s going to interpret what you’re saying correctly. And so students tend to trust their code in a way that professional software engineers never would. And I think because it’s coming from this third party that you know can be wrong, it’s coming from the LLM that can often make mistakes, I think they’re going to be more inclined to actually engage in those testing practices, knowing about the fallibility of the LLM.

Jeremy Jung 00:13:47 You’re shifting the order. I mean, there is test-driven development that some people practice, but I feel like probably what’s most common is you write the implementation yourself and then you’ll go and see like, oh, did this thing I wrote do what I thought it should do? Whereas this is kind of flipping it, where the large language model is going to write my code, so I’m just going to start with the test and then I’ll ask it to write me the code. And maybe that will kind of make test-driven development be the default?

Leo Porter 00:14:21 I think that students may want to engage more in kind of test-driven development because they’ll want to think more about what exactly this function should be doing, how it should behave, what kinds of inputs and outputs it should expect, and then they can write the prompt to Copilot or whatever LLM they’re using to express those inputs and outputs. Then they’re more apt to get a good answer from the LLM, and they’ve kind of already got their test cases worked out as well, so they can immediately go right into testing once the response comes back.

Jeremy Jung 00:14:50 And you mentioned writing a prompt to implement a specific function. Have you found that they work well at the function level, but if you try to ask it to build something broader, that’s kind of when it has problems?

Daniel Zingaro 00:15:05 So I think in general, LLMs do work best at the function level. We have tried to get it to generate bigger apps, collections of functions, and it can work, but sometimes it does worse. But also we want students to do the problem decomposition for themselves and break up the problem into individual functions. Even though maybe the LLM could work with bigger chunks of code, we want students to do it, and one reason is so that they can customize what they get from the LLM. So in the book we have a bunch of examples where you could probably just throw it at the LLM and get an answer and eventually get it to work. But I think at that point making changes to it might be trickier than it would be if you knew the architecture of what you were building. So in the book we have a bunch of top-down design diagrams, and we want students to understand what they’re building at that level, at the function level instead of, as we said earlier, at the token level or the line level.
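
As a rough illustration of that kind of function-level decomposition, a small task might be broken down like this before each piece is prompted for separately (the task, file name, and function names are hypothetical, not one of the book’s examples):

    # Hypothetical top-down design: each small function gets its own
    # docstring prompt, and Copilot is asked for each body separately.

    def load_grades(filename):
        """Read one number per line from filename; return a list of floats."""
        with open(filename) as f:
            return [float(line) for line in f if line.strip()]

    def average(numbers):
        """Return the average of a non-empty list of numbers."""
        return sum(numbers) / len(numbers)

    def letter_grade(avg):
        """Convert a numeric average into a letter grade."""
        if avg >= 90:
            return "A"
        elif avg >= 80:
            return "B"
        elif avg >= 70:
            return "C"
        return "F"

    def main():
        grades = load_grades("grades.txt")
        print(letter_grade(average(grades)))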

Jeremy Jung 00:16:09 And so in this example, you’re thinking more from a learning perspective: you want the student to look at the big picture, figure out, okay, what are all the different functions or parts of my application, break that down, and then feed those individually to these large language models. I’m wondering, let’s say you’re a professional software engineer and your interest is more in I want to make the thing and less in I want to learn how to make the thing. In that case, do you feel like you could feel confident in giving the large language model a larger piece of the design? Or do you still feel like it’s good to have that overall structure done by the developer and then just be very targeted about how you use the large language model?

Leo Porter 00:17:04 I think that’s a tricky question because we haven’t worked with these tools heavily in a professional programming setting. I think often when we’re thinking about the large design of software, you’re going to be working on teams, talking with other members of the team about the interfaces and things like that. And so I’d be pretty hesitant to outsource that thinking to the LLM, because the communication between the team still has to happen. And even if it weren’t for that, I think of it as a probability thing. Essentially, whenever you ask Copilot or any of these LLMs to do a task, the more it has to write, the more likely it’s going to make a mistake. And so that’s kind of why I like the function level. Partially it’s because it’s not that much code that it tends to write, so you help to avoid that probabilistic problem, but also because it’s learned on a huge code base that has lots and lots of functions that have been implemented, it tends to do well at that solving-a-function kind of task.

Jeremy Jung 00:18:01 I think the way you put it, as outsourcing that design or that decision, is interesting because, yeah, if you are working on a team, whether it’s in code review or just in a discussion, often people will ask, well, why did you do it this way? Or why is this the good way to design it? And if you kind of handed that off to an LLM, maybe your answer is, I don’t know, it’s just what it told me.

Leo Porter 00:18:29 That isn’t an answer I want to use talking to my boss, right? Yeah, well, ChatGPT told me I should have it this way. That doesn’t seem a good answer.

Jeremy Jung 00:18:36 We’ve kind of been talking in more a general sense of working with LLMs and you’ve mentioned how you’re going to be teaching introductory computer science courses this coming quarter or semester. And so when you teach these classes, what tools are you going to recommend your students use? And yeah, maybe you could go into that a bit.

Leo Porter 00:18:59 Absolutely. So at least for my class, I’m going to be recommending that they use VS Code with Copilot. I just like the integration of the IDE with the interactions with the LLM; I think it avoids a whole bunch of copy-pasting from another interface into your IDE to then run it. I think it also reduces the barrier to them immediately getting the code and then testing it right there in the environment. I’m sure any of the other tools would work; it’s just that this seems to have worked well for us when we were writing the book, and that’s actually the technique we recommend in the book as well. So that would be the primary tool for the students writing the code. In addition to having them use Copilot within the IDE for a lot of the code generation, depending on where things are at with Copilot X, which right now is available through a waitlist rather than publicly, I think we’re going to be recommending that, because it has a Copilot Chat feature, which can be really nice to interact with.

Leo Porter 00:19:55 And the main use that we’re going to be encouraging students to make of it, whether it be Copilot Chat or ChatGPT, is just a conversation with the LLM, particularly about modules and libraries. So if you are diving into merging PDFs, which Dan did a great job talking about in one of the chapters in our book, if you want to dive into that, well, what libraries should you be using in Python for that? And we found that the LLMs do a pretty good job at this, of actually saying, here are the different libraries you could use, here are the pros and cons of them, these are the ones that need an additional install done, and these are the ones that come with Python. They’re actually really good at telling you what you should use for the various libraries. And so that’s one other way that we’re going to be encouraging the students to use LLMs.

Daniel Zingaro 00:20:42 Yeah, so whenever the students or the junior programmer doesn’t know how, or doesn’t think they can do something in base Python, we have them interact with the chat and ask. So another example that comes to mind from the book is we have a chapter on writing some games. And so for most games, including the two that we’ve got in the book, you need to be able to generate random numbers, right? So how do you do that? In the past you would’ve used a search engine or Stack Overflow or something, and you would’ve found some sample code and you would’ve pasted it into your file and made variable name changes and things like that. And so what we do now is we ask the chat, okay, I need to generate some random numbers, how do I do it? And then it will come back to you with a few options, and then you can systematically work through those options if you like.

Daniel Zingaro 00:21:37 And you can ask, okay, is this one built into Python or not? And then it will tell you, oh, this one’s not. And you say, oh, well okay, so how do I install this? Or, does it work on all OSes or just Windows, right? So we guide the reader through these questions that you can ask to help you make a decision. And I think what I like the most about this is not having to learn yet another API. I don’t think I have room in my brain for any more APIs. And what’s cool is I’ve forgotten every API that we’ve used in the book. So we have examples of merging PDFs and removing duplicate images from directories, from people’s phones and stuff like that.
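
For the random-number case, that kind of conversation typically points to Python’s built-in random module, which needs no extra install. A minimal sketch (the dice and word examples here are hypothetical, not necessarily the book’s games) might be:

    import random  # part of the standard library, works on any OS

    # Roll a six-sided die: a random integer from 1 to 6, inclusive.
    roll = random.randint(1, 6)
    print("You rolled a", roll)

    # Pick a random item from a list, e.g. for a simple guessing game.
    secret = random.choice(["apple", "banana", "cherry"])
    print("Secret word:", secret)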

Daniel Zingaro 00:22:22 And I don’t know, I don’t know which library it’s using and I’m totally okay with that, right? Like I just wanted to get the job done, I wanted to write a tool and the tool got written and it used some sort of library and it worked great and I didn’t have to look through the documentation for that library and figure out like which functions do I have to call and things like that. So I know it can be fun to really learn an API well, but a lot of people, they don’t want to program for programming’s sake. Like they just want to get work done, right? So while I fully admit to enjoying programming just for the sake of programming, I do a lot of competitive programming problems just for fun.

Daniel Zingaro 00:23:01 It’s Sunday morning and it’s like, hey, I’ve got an hour to work on something, let me work on this little competitive programming problem. A lot of people, they’re not motivated by that. They’re motivated by the consequences of code. And this is one thing about LLMs that I’m very excited about: you can just make a lot more progress without having to learn what these people may believe is just useless knowledge, right? Like, does it really matter how I should invoke this API to merge PDF files? I mean, the answer for many people is no; they just want the result to happen. And I love how we can kind of match what they deem important, right? With the LLMs, it’s like a new level of abstraction for many people.

Leo Porter 00:23:49 There are a couple of audiences that come to our introductory classes, and what Dan’s talking about here is one of the things I’m most excited about with this, and that’s the students who come and take just one programming class. I know that’s probably a different audience than a lot of the people listening right now, but the people who just take one programming class, it’s required for their major, they just wanted to explore it a little bit, but they don’t go into this as a career. I think a lot of those students right now, if you asked them a year later to program something, to do any of these tasks that we’re talking about right now, I doubt they’d be able to, even if they did really well in that class. And that’s really disappointing, right? If they’ve taken a programming class, they should be able to do something with that a year or even five years later.

Leo Porter 00:24:30 And I really believe that if you teach them the skills of interacting with these LLMs, they’ll be able to do these tasks later. They’ll be able to come back and go, you know, I don’t remember any of the Python syntax, I don’t remember even how to get started with this, but I’m just going to ask my Copilot, how do I go about merging these PDFs in this directory? And then the Copilot chat comes back and says, oh, you might use this and that. And then they go, oh, I remember, I just have to write the prompts for these functions. I think they could really do it. That’s a bit of a game changer, right? That means a larger portion of our society will be able to write code and use it in a useful way. I’m just really excited about that. I think it’s going to be really nice after the changes happen.

Jeremy Jung 00:25:10 I can totally see in the context of someone who’s not seeing it as a career or someone who is like, hasn’t done it in a while, these tools can be incredibly useful, right? Or it can even get you interested in this field at all, right? Like a lot of people, they struggle through the syntax and then they decide like, oh, this is not for me. Even though they had something really cool they wanted to build and maybe these kind of tools can get them over that hump.

Leo Porter 00:25:39 Exactly. I think there’s a population of students, and it varies a bit by demographics, who come to computer science with really the best motives in mind, right? Their goals in life are to make the world a better place, and they want to achieve those goals. And if you spend the first three quarters or three semesters working with them and all they’re seeing is syntax, and they’re not actually solving anything meaningful, it starts to create this disconnect between what their goals are for their life and what they think the goals of our career are. Of course, as a computer scientist, I want to say stick it out: if you get to the fourth or fifth class, you’ll start seeing how these are really useful tools that can make society a better place. But it’d be really nice to front-load that and have them solving useful problems much earlier, and seeing that computer science can be used in really nice ways.

Jeremy Jung 00:26:29 And so within the context of people who are studying computer science, who may eventually become professional software developers, something more long term where it becomes more of a craft: the code that comes back from these large language models, sometimes it could be something that’s not maybe the easiest to read, or it may be doing something inefficiently. And I’m wondering, from your perspective, how users of these tools should think about that and recognize when that’s a problem.

Daniel Zingaro 00:27:04 In the first couple of courses, typically, in a CS program, we don’t spend much time on efficiency. The reason is that there’s just so much to learn early on, and we worry about overwhelming people with too much for them to process at once. And we don’t want to prevent students from becoming interested by giving them all of these requirements early on. So typically we push efficiency down the road into a data structures course, for example. But your question points to another reason why we’ve decided to teach some of the skills we teach early on. So if a student came up to Leo or me and said, hey, I want to generate efficient code, how do I do it? My answer would be, get familiar with programming first; but you are learning the skills necessary so that you’ll be able to look at that code later, because you’ll still know how to read it, right? It’s not going to be something you don’t understand; you’re going to know it. We’re going to spend lots of time on code reading, and so later I think we can just teach efficiency the way we always did, doing time complexity analysis on the code, and they’re still going to understand what the code is doing. So I don’t think this is going to change much in the earliest courses.

Leo Porter 00:28:20 On the point about code readability, I might add that certainly they’re going to get back some code that’s maybe not the best style and may not be as readable. But what’s kind of interesting is that students aren’t exposed to a lot of different styles in our existing courses, right? They see the code that they write and they see the code that the professor writes and gives them, and there’s not much else. And so, I mean, we’re going to need data and we’re going to need research to know this for sure, but I suspect that seeing lots of different code styles and having to read those different code styles may actually inform them better than we do now about what makes code more readable, and then they may be able to employ that as they go forward.

Jeremy Jung 00:29:00 And when you’re saying they’re going to read different styles and things like that, are you referring to code they’re going to see from the LLM or are you talking about them reading just other code bases in their classes or their professional work?

Leo Porter 00:29:16 Oh, I’m sorry, yeah, I was referring to the code they’ll see from the LLM. The LLM will come back in all these different ways; it’ll have different styles and it’ll have different approaches to solving it, right? Sometimes it’ll come back with like this one-line lambda expression thing that solves it, and they’ll have no idea how that works, and they’ll ask for a different answer and they’ll get a much more friendly, first-programming-experience kind of code back, and they’ll be able to understand that and go, okay, this is the kind of code that I want to see, not this other thing that was completely not readable.

Daniel Zingaro 00:29:45 By default you can get it to give you 10 code segments to solve the problem, right? So it’d be kind of cool if we ask students about each of them, right? Each of the 10: which ones are right, which ones have bugs, which ones have good style, which ones have bad style. It’s like a built-in learning opportunity right there.

Leo Porter 00:30:02 Oh, it’s true, yeah. And the 10 things that Dan was referring to: if you do Ctrl+Enter in VS Code when you’re working with Copilot, it’ll give you back 10 possible responses, and you’re totally right, Dan, you could just say, of these 10, how readable are they? Are they right? And there’s lots of fun things you can do to ask students questions.

Daniel Zingaro 00:30:18 And often many of them are right with just subtly different ways of solving the problem. I mean I’ll admit to having some fun looking through all of the suggestions just to kind of see what the variability is and when there’s a lot of variability, I really like it because like Leo said, it exposes people to different styles they may not have seen before and it may encourage you to ask questions, right? Like why does this one work, right? I’ve tested it, it doesn’t look like it should work, why does it work? I feel like that’s the beginning of a pretty powerful learning experience right there.

Jeremy Jung 00:30:50 Yeah, that makes sense to me, because I think about how, before all these LLMs, when a lot of people are doing software development, they will search on the internet and go, okay, what’s an existing answer for this thing I’m trying to do? They’ll find a post on Stack Overflow and they’ll find the accepted answer and it’ll be like, okay, this is it. This is the solution. Whereas at least in this case, it seems like you can go, okay, well here’s 10 potential solutions, and at least you get a little bit more exposure to what are the different ways you could do it.

Leo Porter 00:31:25 Exactly. And it’s nice for them to see these different options. And I think for professional software engineers, seeing that Stack Overflow post with, here’s the accepted answer, integrating that into your code isn’t a big jump for a lot of us. But I do want to stress that for the intro students, it often is a really big jump. Just the, how do I change this around? This was the interface for the function in the post, but I’m asked to have this other interface for the function, and they can really struggle in that domain. And so I think Copilot and these LLMs are nice in that they give back answers that are more tailored to the existing code that they’re working with, and that will reduce the barrier of them trying to incorporate the answer.

Jeremy Jung 00:32:04 So it seems, overall, when you’re talking about people who are using programming in a more professional capacity, the code style and efficiency will probably be taught very similarly to how it is now, where you basically have to get exposed to different styles and types of code, get exposed to the algorithms, and that will allow you to read the answers you get back better. So with the answers you get back from the LLM, with the knowledge you gained from these later courses, you’ll be able to tell, oh okay, this level of complexity, or this has exponential performance implications, that kind of thing.

Leo Porter 00:32:53 So I think the performance piece is really important; I mean, I appreciate you bringing it up. I’m kind of curious what percentage of the time professional programmers really spend optimizing the code they write. I suspect a lot of the code that’s written is pretty straightforward. You already know how to work with the database you’re working with, you already know how to write the queries for that. You’re still doing something that’s certainly thought provoking, but it’s not the hard work of, oh, how am I going to design the right algorithm for this to get the exact best runtime? And so I think there are times when that does matter, but those may be the times that the LLMs aren’t as helpful, and there’s still going to be a pretty big need for programmers who know how to do that themselves.

Jeremy Jung 00:33:35 Yeah, I mean, I think that of course this is going to vary from industry to industry, but Dan, you’re talking about learning APIs, and I feel like a lot of jobs are learning APIs and gluing them together.

Daniel Zingaro 00:33:48 Yeah, I would agree. But I wonder what can happen if some of that’s automated, right? So maybe people who are gluing APIs together will be able to get even more done, right? Incorporate even more APIs in the same amount of time that they’ve been doing it. Now I don’t know if that job changes as dramatically as it seems. I guess there’s this tension between people having to change jobs or become more efficient in their current job, and obviously I hope it’s the latter, and there is some recent evidence that it could end up being the latter: just more productive people overall, building bigger software, incorporating more APIs than before and not overloading yourself. So we’ll see how it all turns out, but I’m hopeful that we’ll just be doing our jobs better.

Jeremy Jung 00:34:41 In that context, sometimes people will say that the reading of code and comprehending code can sometimes be more difficult than writing the code, and in fact can sometimes take you more time. Like, let’s say you’ve built out a project and now you need to add new features. Well, to add the feature, you have to understand the code base that existed before. And when we talk about LLMs in the context of not programming but just general writing, people talk about the fact that it’s easy to generate more writing, right? We can generate more documents, blog posts, more articles, that sort of thing. And with code it sounds like it’ll be similar, right? Where it’ll be easier for us to write more code, generate more code. But I wonder if either of you think it’s a concern that we’ll be generating so much code that we won’t even have the time to understand all of it.

Leo Porter 00:35:40 I haven’t thought that much about generating so much code that you can’t understand it. I mean, I think if we’re generating code, I’m really hoping someone’s testing it and making sure it works right and stuff. And so I guess it depends what level of the interface we’re looking at. But I have thought a fair bit about what you described early on in your question, which was diving into a big code base, figuring out what needs to be changed, and changing it. That is a really common task, especially for new software engineers in their first jobs, right? And it is also one that’s really well documented in the CS education literature that we aren’t teaching them to do. We almost always are giving them, write these functions that are really well defined, or write the code all from scratch yourself, but we rarely ever give them large code bases to learn from.

Leo Porter 00:36:29 Now I don’t think diving into a large code base and trying to understand how it works is the right thing for an intro class, and we’re mainly talking about students first learning to program here. But I am encouraged that we are teaching code reading as kind of a first-level skill, whereas I think current programming courses teach code reading in parallel with writing, so a lot of the writing is happening very early, before they even know how to read well. And so I think there’s some optimism here that if we teach code reading first and make it a core skill, they’ll be better set up in the later classes to maybe take on those large projects where they tackle the exact problem you’re describing, which is also the exact thing they’re going to have to do when they get to their jobs.

Jeremy Jung 00:37:10 Yeah, it also kind of, I wonder sometimes when you’re writing code you’ll write it in a certain way because it’s tedious to write a lot of code, right? Like you’ll make something generic in such a way where you can reuse it and maybe reduce the amount of lines of code. But then when you have something generate that code, maybe it’ll be a solution that is a lot more code than you would’ve written personally and it works. But by nature of the fact that it was easy to generate, you chose that solution versus one that maybe was more generic and had less code. And I’m not sure if that makes sense, but I’m kind of curious if the use of these models will sort of change maybe how we write code?

Daniel Zingaro 00:37:58 I’m kind of wondering if the amount of code we throw away is going to increase exponentially. Because if you spend time working on something, you’re probably going to keep it. But I wonder, because, Jeremy, like you said, it’s so easy to generate code now. So I’ve had this thought where, I’m not sure how much I believe myself here, but should we be storing the prompt, like not the .py file, right? Just store the prompt, and then if you do have to regenerate the code later, maybe you’ve got to make some tweaks or something, you just change the prompt and then rerun it. Because code is, it’s not there yet, but it’s becoming free, right? You can generate as much of it as you want. And so there’s a lot of code already that you write once and you run once and then you get rid of it or lose it or whatever, and I wonder if that practice will increase. So it’s like, okay, I want to do this data analysis. Okay, so you write a prompt, you get some code, you generate some graph, and then you just don’t even think about it. You just get rid of it, and then maybe later you want another similar analysis and you just do it again, right? So I kind of wonder, because there’s maybe less ownership now of code, right? You didn’t sweat as much to write the code, so maybe more of it gets thrown away.

Leo Porter 00:39:17 I completely see what you’re saying, Dan. So you have the prompt and you had it perform some form of data analysis, and you want to tweak it to do a slightly different data analysis. Right now, if I wrote the code from scratch, I would go into the code and find that one spot that I need to change and I would tweak it. But if I’m just generating the code, I would just tweak the prompt and then get a new piece of code that does exactly what I want there, without having to fight with the code to find the one spot to change.
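
A minimal sketch of that workflow might look like the following, where the prompt is the artifact worth keeping and the code below it is the throwaway part an LLM could regenerate. The file name, column names, and standard-library-only approach are assumptions for illustration:

    # Prompt (the part you keep and tweak):
    # "Read sales.csv, which has columns 'region' and 'amount',
    #  and print the average amount per region."

    # Regenerated, throwaway code:
    import csv
    from collections import defaultdict

    totals = defaultdict(float)
    counts = defaultdict(int)
    with open("sales.csv", newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] += float(row["amount"])
            counts[row["region"]] += 1

    for region in totals:
        print(region, totals[region] / counts[region])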

Daniel Zingaro 00:39:39 Yeah, you know how it can take a long time to familiarize yourself with a program that you wrote six months ago? It’s like, oh yeah, there’s variable temp1, what was this for again?

Leo Porter 00:39:53 I think we’ve all been there.

Daniel Zingaro 00:39:54 Yeah, right. But yeah, I don’t know, it’s just a thought I’ve been having. It’s like, so now when I hear people talking about code maintenance for example, like using good variable names and consistent style and stuff, in my head I’m thinking, well is the code the artifact now? Is it still the artifact? And right now, of course it is. But, fast forward a little while, maybe some of what I just said sort of becomes true eventually.

Leo Porter 00:40:18 That’s getting at perhaps a larger issue about what the interface is that we work with as programmers. I’ve been thinking about this a lot, just because of my background: I have a PhD in computer architecture, and so I teach the classes that do machine code and assembly code, and they’re core classes for computer scientists because you need to know how computers work, and I think that’s a core component to understanding that. But we don’t start by teaching students machine code. No one wants to learn how to program in machine code first; at least, I can’t imagine anyone wanting to learn that. And we’ve kind of collectively picked Python or Java, right now the two most common programming languages to learn from, because they’re easy to learn, they’re easy to read, the code tends to be more understandable when you read it, and it tends to be a little bit more forgiving when you write it.

Leo Porter 00:41:04 And so we pick these because we think they’re nice interfaces; they’re convenient for programmers and they’re convenient for new learners. And it just seems to make sense that the LLM may be that next step of interface that we start choosing. The catch is that it can be wrong. It’s not like a compiler. A compiler is deterministic; shy of maybe that one time in your career when you find a compiler bug, the compiler’s always right. But the LLM isn’t always right. And so I’m not sure how this is all going to play out. You can imagine the LLM is the new interface and all we ever store is prompts, and we don’t ever even see the code, perhaps, as one scenario. And the other is we do in fact still interact with the LLMs and still interact with the code. But I think it’s too early to know where this is all going to fall. We could see some big shifts, I think, in the field over the next few years.

Jeremy Jung 00:41:48 Yeah, I think that’s pretty interesting to think about, what Dan had mentioned, where you could check in your prompt and maybe a set of test cases for the app that’s supposed to come out, and maybe that’s your alternative to the actual source code. Especially for things that, like you were saying, are used not that frequently, or maybe you only use it once, and so the quality of the actual code is maybe less important in terms of readability and things like that, as long as you can reliably reproduce that thing. Yeah, maybe that does make sense.

Leo Porter 00:42:31 The reliable reproduction could be the tricky part, and there may even be something that you start doing where you tag it: don’t try to reproduce this. Like, we actually spent a whole bunch of time on this, it’s super optimized, don’t think the LLM’s going to give you this answer again. So keep the code along with the prompt, keep the code too and don’t scrap it, because the LLM’s not going to do better. And then in some cases you’re like, yeah, the LLM’s going to do a pretty good job on this, so feel free to just regenerate it.

Daniel Zingaro 00:42:57 Maybe we have to distinguish between code that you can just get out of an LLM no problem, and code that people have spent time working on. I like that. Yeah.

Leo Porter 00:43:07 It’s some new hashtag, like: don’t change.

Daniel Zingaro 00:43:10 Yeah, humans were here.

Leo Porter 00:43:13 Exactly.

Jeremy Jung 00:43:14 Yeah. This is the 30th iteration of this code we generated and we verified that this one’s good.

Jeremy Jung 00:43:23 It’s an interesting future we might be heading into. So one thing you mentioned a little bit earlier is the tools that you’re going to recommend to your students. It sounds like it’s primarily going to be GitHub copilot and GitHub copilot X for the chat interface. And one thing about these tools is these are tools by commercial companies, right? These are tools by OpenAI and Microsoft. They’re tools that you have to pay a subscription fee to use. You have to send your code to a commercial server. And I wonder if that aspect concerns you at all. The fact that the foundations that our students are learning on is kind of reliant on these companies and these cloud services?

Leo Porter 00:44:14 I think it’s an amazing question. I think to some degree these are the tools that professional software engineers are using. And so there’s a bit of an obligation as instructors to teach them the tools that they’re going to be using as professionals going forward. I think right now they’re free to use for education’s sake. And so as long as that stays the case, I’m a little more comfortable with it. If it started to move to a pay model for education, I think there could be some really big problems with equity. And I think it’s not just true for computer science, but I’ll start with computer science. I mean if it’s computer science and we start making it where you would have to pay to get access to these models or use these models, then whether we tell the students they can use it or not, they still can use them.

Leo Porter 00:45:01 And so there are going to be some students, the wealthier students, who may have access to these, who are able to learn better from these and solve the homework better with these. That’s super scary. And you could imagine the same thing for even just K through 12 education, right? If you’re thinking about them writing essays for homework or anything else, if it’s a pay model, then the students who have the money will pay for it and get access to these tools, and the students who don’t, won’t. You could imagine all the kinds of socio-economic divides that already exist only being exacerbated by these tools if they switch to this pay model. So that has me very worried, and there are some real ethical issues we have to think about when we’re using them. The other ethical issue I kind of want to mention is just copyright and the notion of ownership.

Leo Porter 00:45:43 And I think it’s important for us as instructors to engage students in the conversation about what it means to create content and intellectual property and how these models are built and what they’re building off of and just engage in that ethical conversation with the students. I don’t think we as a society have figured this out. I think there’s going to be some time both legally and ethically before we have the right answers. But at the very least you need to talk to the students about these challenges so they know what’s going on and they can engage in the debate.

Daniel Zingaro 00:46:15 Yeah. Just to underscore that, Leo, this is the reason we’re doing research on the first version of the course that Leo’s teaching. We need research on the impact of LLMs on students especially. We need to know if students benefit from this, in what ways they benefit, how are these benefits distributed across demographic groups? We have a long and sad history in computer science of inequities in who takes our courses, who succeeds in our courses. We’re very aware of this and it’s unacceptable to make that situation worse than it already is. So we’re going to be carefully doing our research on this first offering of the course.

Jeremy Jung 00:46:58 So we’ve mostly been talking about the benefits of using these tools in classes and in education. We just mentioned the possible inequities. If you don’t have access to those things, I wonder if from either of you, if there are negatives you see to this technology, whether that’s the impact on what people learn or in anything else. Like are there downsides you see to the use of this technology?

Daniel Zingaro 00:47:27 Yeah, so in addition to the important inequity concerns that we just talked about, I have a concern about students using the tools in ways that don’t help them learn the skills we think they need. So it’s a power tool, and you can get pretty far, I think, without being systematic in how you work with it, without testing, without debugging; it’s kind of magic right now. And so I can imagine a lot of students just taking off at a hundred miles an hour. So then one of the things we have to worry about in these initial courses is convincing students that there really are principles to using this technology. You can’t just type something and get an answer and then go party. And so that is one of my concerns; that’s one of the negatives. It’s super powerful.

Daniel Zingaro 00:48:15 So before, you couldn’t just type some Python and make it work, but now you can sort of type in whatever you want and kind of get something back. And so part of our job as educators is to help students use these tools in a way that will ensure their long-term success with these tools, right? So I’m not saying that they can’t just do whatever they want and make some of their first assignments work. I think they could; I think they could be unprincipled with the prompts and just throw it in there and get code and submit that code. But we’re going for longer-term effectiveness here, right? We have students who may not take another CS course; we need to keep them in mind. We have students who are going to want to eventually be software engineers, security experts, PhDs in computer science, right? So we have a number of audiences that we’re talking about, and we think they all need to know the fundamental skills of programming still, even though they have this power tool at their disposal now.

Leo Porter 00:49:14 Speaking of the fundamental skills for programming: because of my hardware background, I’m this huge fan of teaching mental models in classes. Like, what is the mental model of computation? How do you imagine the computer is executing as you write the code? And ideally a professional computer scientist should be able to say, okay, well this is my interpretation, this is my mental model for what I’m working with in Python; if I really want to drill this down, I can turn that into assembly, and if I really had to, into machine code, and even think about how this is working within the cache subsystems and virtual memory and all these things. I want them to be able to play those things out. We are changing the first class, and I think the first class is going to be doing some things much better than before.

Leo Porter 00:49:53 Like teaching problem decomposition and things like that; I’ll mention that in a second. But while we are doing some things better, we may not be teaching how the computer is working as well. And so you can’t just change one course and think the rest of the curriculum is going to work; I think the entire curriculum is going to need to adjust in a way of adapting to these LLMs. The second piece, where things get potentially more challenging, is for instructors. We’re in a good place right now as instructors in terms of how we assign and grade homework. So grading, this probably isn’t going to be a shock, is not one of our favorite things to do as faculty. I mean, it’s actually really important, it’s central to us understanding how our students have learned, but it’s generally not the most favorite thing that we do.

Leo Porter 00:50:37 And what a lot of instructors have done, myself included, is for much of the introductory sequence, we have created assignments that can be entirely auto-graded. So we define functions incredibly well, like write a really good description, this is exactly what it needs to do, and the students write that one piece of code. And whether we like it or not, that is exactly what Copilot does very well; the LLMs do really well at that. And so the LLMs are going to solve those very easily already. So we have to fix our assignments, that’s just a given, but it means that we’re probably going to have to rethink how we do assessment. And so we’re probably going to be writing assignments that are much more open-ended, and we’re probably going to have to be grading those with more care and time, potentially by hand. But I think these are all good things for the community and for the field. But you can imagine how it’s going to be a bit of a shift for faculty, and it may take some time to be adopted as a result.

Jeremy Jung 00:51:32 And so if you’re shifting to homework that is broader in scope, has more code, needs more human eyes on it, how does that scale on the educator’s side, right? You were talking about how you had things that could be auto-graded before, and now you’re letting somebody generate this whole project. How does that work from your end?

Leo Porter 00:51:56 I think there are a few things at play. At large institutions like the ones Dan and I are at, we have kind of armies of instructional assistants that help us, and so we can engage them in various tasks. One of the roles they heavily have now is helping students in the labs solve these auto-graded assignments. And so you can imagine they will still be in the labs helping the students with these creative assignments, but now they’re going to have a potentially larger role in assessing the success of those. There’s been some really creative work in assessment, and so I’ll mention a couple of the approaches; I’m sure I’m going to be omitting some. One is students could complete their project and then they have to record a short video of them explaining the code that was in their project and how it worked.

Leo Porter 00:52:40 And you actually assess them on that video and their explanation of the code and how it works, right? Because those can be shorter than trying to go through a really big project and see how it works. There's also a tool out of UIUC called PrairieLearn that helps with this. These are still auto-graded, but it helps in a test setting where you can write questions and have them delivered in an exam or homework setting. The neat feature is that the questions can be randomized, so you don't have to worry as much about students leaking information to each other about task content from quarter to quarter. Because of the randomization, they have to actually learn the skills, and you can engage them in these testing centers. Right now a big grading burden on faculty is exams, so you can actually give more exams and give more frequent feedback to the students without the same grading burden. That's the other kind of exciting assessment piece.
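As a rough sketch of the randomization idea (this is not PrairieLearn's actual API; the question, names, and numbers here are made up), a randomized question can be thought of as a template whose parameters are drawn fresh for each student, so memorized answers don't transfer between offerings.

```python
import random

# Sketch of a parameterized question in the spirit of randomized-assessment
# tools: each student gets a fresh variant generated from their own seed.

def generate_question(seed):
    rng = random.Random(seed)          # seed chosen per student or per attempt
    a = rng.randint(2, 9)
    b = rng.randint(10, 99)
    prompt = f"What does the expression {a} * {b} % 7 evaluate to in Python?"
    answer = a * b % 7
    return prompt, answer

def grade(seed, submitted_answer):
    # Regenerate the same variant from the seed and compare answers.
    _, correct = generate_question(seed)
    return submitted_answer == correct

# Two students see different variants of the "same" question:
print(generate_question(seed=1)[0])
print(generate_question(seed=2)[0])
```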

Jeremy Jung 00:53:36 On the different types of assessments, like the example of the video you gave, I'm just thinking to myself, well, the person could ask Copilot or ChatGPT to give them a script, right? And they could rehearse that before they send you the video.

Leo Porter 00:53:52 I think this is a philosophical shift in assessment that's been gaining momentum over the years, which is that the assignments are all formative: they should all be pretty low stakes, and the students should be doing them for the process of learning. And it is unfair in some ways. There are a lot of things right now where you grade students on, were you present at this time, did you meet this deadline at this time? If you're thinking about a diverse population of students, you can imagine, say, a working mother who's also trying to do this; grading her on whether she was here at this time doesn't feel very equitable to me. So there's this whole movement for grading for equity that shifts much of the assessment onto the exams. And so yes, the students could find multiple ways to cheat on the homework, but that's not the point of the homework.

Leo Porter 00:54:44 And the homework’s just to learn. And it’s a small scale of the grade, but you still then have those kind of controlled environments where they’re taking these tests and that’s where the grade actually comes from. It’s going to take some time to make that shift. At least a number of schools, my own included, assess that those take home assignments are a huge portion of the grade. And students will love that because they can get all this help. And they can, especially with the auto graders, they don’t even write their own test cases. They just use the auto graders, the test cases, right? Yeah. Which is really depressing. And they go to the, the instructional staff. The instructional staff tend to give away the answers. That’s actually a paper that we published a few years ago. And so the students love this high stake, but tons of help version of assessment, but that may not actually measure their, their level of knowledge. And so it’s going to take a little bit of adjustment for students and for faculty to do the shift to where the exams are the bulk of the grade.

Daniel Zingaro 00:55:33 Also, I'm not convinced that cheating is going to be a problem here. It's very possible, for example, that students cheated on our previous assignments because the assignments were not authentic. In industry, no one's going to come up to you and say, hey, write this exact function from scratch that takes two lists and determines how many values are equal between them. That's not going to happen, right? You're going to be doing something that has some sort of business purpose. And I kind of wonder, and this will play out one way or another in the next few months, but I wonder whether, if we give students authentic tasks, cheating now means cheating yourself out of doing something of value. Before, you were probably cheating yourself out of a learning opportunity, but how can students know that, right?
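For a sense of scale, one plausible reading of the decontextualized assignment Dan describes fits in a couple of lines (the name and the handling of duplicates below are assumptions, since the spec in conversation is informal), which is part of why such tasks are both trivial for an LLM and unmotivating on their own.

```python
# One reading of "takes two lists and determines how many values are equal
# between them": count the distinct values that appear in both lists.

def count_common_values(list1, list2):
    """Return how many distinct values appear in both lists."""
    return len(set(list1) & set(list2))

print(count_common_values([1, 2, 3, 4], [3, 4, 5]))  # 2
```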

Daniel Zingaro 00:56:24 The assignment's boring, right? It's like, write all these functions and then something happens because of the magic starter glue code we wrote. So I feel like if you give students opportunities to learn what they want to learn, I just don't think there's a reason to cheat. And I've been much happier in my career recently when I don't worry about it. It's like, okay, I've got a bunch of students; some of them are going to cheat, some of them are not, and I'm here to talk to the ones who want to learn. We're on some email lists, for example, and a lot of people seem to be panicking about it, and I kind of think, buddy, you had a huge cheating problem before. I don't think it's going to become worse now that you're giving students authentic work to do, right? They all want to be using programming to do their jobs better, or make their lives better, or their world better. They don't want to waste their own time. But if you give them a decontextualized task, it's super tempting to just cheat, because what's the point? And so I'm very hopeful. I am not convinced that cheating is going to be a problem.

Jeremy Jung 00:57:29 Yeah, that's a good point. And I think it's very motivating for any student, or anybody who's learning a thing, to be able to see a clear connection to an actual thing that they made, versus writing functions to pass test cases, which is not very interesting intellectually. So I think if you structure the projects as, oh, I'm going to actually make this thing that does something that seems pretty cool, then yeah, that's definitely more motivating to actually go through with it.

Daniel Zingaro 00:58:02 Just off the top of my head, imagine if every student had to make a landing page, like a website. Who's going to cheat on that? I want a landing page, and students are going to want that too. And so it's like, well, okay, I may as well make it, right? This has a purpose. So Leo, you've been patiently listening to that; I'm curious what you think about it.

Leo Porter 00:58:24 Oh, I couldn't agree more. I mean, we can leverage the research here, right? Computing in context is this well-established idea that if you teach computing in a context that's meaningful to the students, they tend to learn more, engage more, and want to stay in the major more. And I think we're just going to be able to do this, right? For convenience's sake, and because of the scale of the number of students we've had in our classes, we've moved away from that and gone to these auto-graded, not-so-exciting assignments. I think this is the impetus we need to go back to fun, creative, interesting assignments that the students are going to put time and care into because they want to, not because they have to.

Jeremy Jung 00:59:00 So it sounds like, through our discussion, you're really excited about bringing large language models into the classroom and what that means for you and your students. I wonder if there's anything we didn't really touch on, or maybe something unexpected, that you think is going to make a really big difference to you and your students?

Leo Porter 00:59:21 I think one of the things we haven't touched on yet that I'm really excited about is the piece about problem decomposition. Over the years, because of this trend towards auto-grading, what's happened is that all the cognitive work of taking a big computing task and breaking it into smaller pieces, deciding what classes should exist, what functions should exist, all those interfaces, all that work that I think is really interesting and exciting, is now done for students, because the auto-grading structure dictates that you have to have these particular functions, and they just code the functions. And I think that's really concerning from a software engineering perspective: students are learning how to program without learning those core abilities as software engineers to take a large problem, break it down, and figure out what the right interfaces are. That's actually more art than science, I'd argue, so the more time you have to practice it, the better. And I am incredibly excited that LLMs are kind of forcing our hand here, making us step back, give students larger programming tasks, and teach them the process of problem decomposition explicitly in a way that we've never really done before.
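To give a flavor of what explicitly taught problem decomposition might look like (a hypothetical example, not one taken from the book or Leo's course), a student might break a larger task into small functions with clear interfaces before asking an assistant like Copilot to fill in any one of them.

```python
# Hypothetical decomposition of a larger task, "report the most common words
# in a text file", into small functions with clear interfaces. The decomposition
# itself is the skill; an LLM assistant can then help implement each piece.

def read_text(path):
    """Return the full contents of the file at `path` as a string."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def clean_words(text):
    """Lowercase the text and return its words with surrounding punctuation stripped."""
    stripped = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [w for w in stripped if w]

def count_frequencies(words):
    """Return a dict mapping each word to the number of times it appears."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

def top_n(counts, n):
    """Return the n most frequent (word, count) pairs, most frequent first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]

if __name__ == "__main__":
    # "sample.txt" is a placeholder input file for this sketch.
    words = clean_words(read_text("sample.txt"))
    for word, count in top_n(count_frequencies(words), 10):
        print(f"{word}: {count}")
```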

Jeremy Jung 01:00:33 I think that's a good place to wrap it up. So if people want to hear more about your upcoming book, or maybe even enroll in your class, Leo, where can they get more information?

Leo Porter 01:00:47 Both Dan and I have active LinkedIn pages, and we're happy to have folks follow us there. Manning Publications is the publisher for our book, and we have that book out on early access right now. It should be available entirely electronically by August, in time for the start of the fall quarter, and it should be out in print shortly thereafter.

Jeremy Jung 01:01:06 Cool. Well, this has been an interesting discussion. Large language models are kind of the thing right now; everybody's trying to stuff them into every single product, and I think getting both of your perspectives on where they fit in education has been very interesting. So thank you very much for coming on the show.

Leo Porter 01:01:25 Thank you, Jeremy, for inviting us and for making such great podcasts. We really appreciate it.

Daniel Zingaro 01:01:28 Thanks, Jeremy.

Jeremy Jung 01:01:29 This has been Jeremy Jung for Software Engineering Radio. Thanks for listening. [End of Audio]
