
SE Radio 711: Scott Hanselman on AI-Assisted Development Tools

Scott Hanselman, the VP of Developer Community at Microsoft, speaks with host Jeremy Jung about AI-assisted coding. They start by considering how the tools are a progression from syntax highlighting and autocomplete. Scott describes the ambiguity and non-determinism of agentic loops, why vague high-level prompts usually don’t give good results, and the need to express intent and steer the models. He explains how knowing fundamentals helps you create better plans and know what to ask the models, and how to treat agents differently based on your knowledge level. He discusses his experience porting Windows Live Writer to a modern .NET stack, and defining success and providing tools for models to verify their work. Finally, he explains why you need to read and understand generated code in production environments, plus methods for sandboxing agents.

Brought to you by IEEE Computer Society and IEEE Software magazine.




Transcript

Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Jeremy Jung 00:00:19 Hey, this is Jeremy Jung for Software Engineering Radio, and today I’m talking to Scott Hanselman. He’s the Vice President of Developer Community at Microsoft, and I last spoke to Scott back in 2021, when we discussed .NET. Scott, welcome back to Software Engineering Radio.

Scott Hanselman 00:00:34 Hey man, how are you?

Jeremy Jung 00:00:35 Pretty good. Today we’re going to talk about all these AI-assisted development tools that are out there, and I think it’s super overwhelming for a lot of our listeners and for myself. So, let’s give an overview of maybe the types of tools you have experience with. I think maybe we could start with just the chatbots, the ChatGPT and the textbox. Do you go back and forth on…

Scott Hanselman 00:01:01 I would even go farther back if I may. And not to put too fine a point on it, but let’s go back to 1975, because I was thinking about this yesterday. I was trying to explain this to my wife, and it’s like, when I started coding in ’84, I went from assembler to C, and people would be like, C? Come on, that’s too high level, man. You’ve got to get down to the metal, right? It’s going to rot your brain; C’s going to rot your brain. And then in the early ’90s we got color, and by that I mean we got syntax highlighting, and they’re like, oh, it’s going to rot your brain. And then we got Stack Overflow, and they were like, oh, you’ll rot your brain. And now we’ve got this. When we started thinking about AI-augmented coding, it was next-token prediction.

Scott Hanselman 00:01:38 You type a for loop, you hit space, and then it makes ghost text and suggests. So that’s the first thing. Then there was the chatbot. So just right before the chatbot there was the ghost text, where it would interrupt you, and they would have next-token prediction within the editor. And some people really liked that, because it’s like, oh, I know what I’m doing and then it’s just going to autocomplete. So, it was autocomplete on steroids. Then people went into ChatGPT and started asking questions, and then they would copy-paste, not from Stack Overflow, they’d copy-paste from ChatGPT. Then the agentic loop started, and that’s where things got interesting. And if you really want to sound smart in meetings, just say agentic as many times as possible. And agentic loop sounds even smarter.

Scott Hanselman 00:02:20 But the idea was, why am I copy-pasting this code from ChatGPT when I could just let the agent, the LLM, run the build, look at the text, see a warning, fix the warning, and then loop? And that’s when we started getting into things like GitHub Copilot CLI and Gemini and Amp and OpenCode and Claude and all of those kinds of things, where the LLM sees some tokens, gets the result, makes a judgment, and I call it the ambiguity loop. Because this is the thing about programming versus LLM programming. Programming is not ambiguous. It runs exactly as you wrote it. And if there’s a bug, it’s your fault. And if you want a for loop, you write the for loop, you do it in Bash or PowerShell or whatever, and it does the thing. But an ambiguity loop is, it’s like, yeah, I don’t really know.
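The build-fix cycle described here can be sketched in a few lines. In this sketch, `run_build` and `ask_model_for_fix` are hypothetical stand-ins for a real compiler and a real LLM call, invented purely to show the shape of the loop:

```python
# A minimal sketch of the "ambiguity loop": run the build, read any warnings,
# ask a model for a patch, and repeat until the build is clean.

def run_build(source: str) -> list[str]:
    """Stand-in build step: returns a list of warnings (empty means clean)."""
    warnings = []
    if "TODO" in source:
        warnings.append("W001: unresolved TODO")
    if "print(" in source:
        warnings.append("W002: stray debug print")
    return warnings

def ask_model_for_fix(source: str, warning: str) -> str:
    """Stand-in for an LLM call that returns patched source for one warning."""
    if warning.startswith("W001"):
        return source.replace("TODO", "done")
    if warning.startswith("W002"):
        return source.replace("print(", "log(")
    return source

def agent_loop(source: str, max_iterations: int = 10) -> tuple[str, list[str]]:
    """Loop: build, inspect warnings, patch, build again, until clean or bored."""
    for _ in range(max_iterations):
        warnings = run_build(source)
        if not warnings:
            break
        source = ask_model_for_fix(source, warnings[0])
    return source, run_build(source)
```

The `max_iterations` cap matters: because the model step is non-deterministic in real tools, the loop needs a bound and a verifiable "clean" condition rather than blind trust that it converges.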

Scott Hanselman 00:03:06 Things could happen, and we fool ourselves into thinking regular expressions and stuff handle ambiguity. But if you imagine this kind of slider bar of ambiguity, between I’m parsing this binary and it’s going to work this way and here’s the way that the bytes are packed, versus, I don’t know man, it’s unstructured data, what are you going to do? LLMs are really good at ambiguity. Turns out the toil part of programming is really ambiguous. I’m writing this bot on this other machine over here, I’m running what’s called a Ralph Loop that we’ll talk about later. And it’s bumping into all kinds of little weird problems. And those little weird problems are just crap that I would have to go and Google or look up on Stack Overflow. It’s just ambiguity. So, I’m running not a Ralph Loop but an ambiguity loop that’s working that out, because that’s not the fun part of programming. The fun part of programming is building stuff. So, I think that these agents, and why people are freaking out about them, is because it’s that same feeling that we got when Stack Overflow happened. We’re like, oh, programming’s over. Or when I got syntax highlighting and I was like, oh, I don’t have to do anything now that all the text is in color. Does that make sense? That’s how I contextualize it across a historical context.
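The deterministic end of that slider is easy to see in code. When you know exactly how the bytes are packed, parsing needs no model at all; the record layout below is invented for illustration:

```python
# Parsing at the "no ambiguity" end of the slider: a known binary layout.
import struct

# A made-up record format: little-endian unsigned short id, 32-bit float reading.
packed = struct.pack("<Hf", 42, 98.6)

# Unpacking is fully deterministic: same bytes in, same values out, every time.
record_id, reading = struct.unpack("<Hf", packed)
# record_id is exactly 42; there is nothing here for an LLM to interpret.
```

It is the other end of the slider, the "it's unstructured data, what are you going to do" end, where a model earns its keep.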

Jeremy Jung 00:04:10 Yeah, I think that’s good, to bring it back to the syntax highlighting and the autocomplete. Because I think nowadays that’s very uncontroversial, right? People for the most part will accept, oh, the syntax highlighting is there. It helps me pick out what I’m looking for. The autocomplete, I can quickly see: is the suggestion something I want, yes or no? I think for the most part people are on board. But I think this ambiguity loop you’re talking about is something people have more mixed feelings about, because of what you’re describing, where it’s this slot machine, right? Maybe it’ll give you a good answer, maybe it won’t.

Scott Hanselman 00:04:48 Yes. And here’s a really great, I had this internal conversation with this guy, we’re on Teams and this is an internal guy, and he was so excited because he did a one shot. A one shot is when you type one prompt and then software pops out. He did a one shot, and he cloned Minecraft. And there’s this thing, there’s a paper, I think it’s called Project Unicorn or something, where OpenAI cloned Minecraft, but forget about that result for a second. And think about the simple semantics of typing a prompt: make me a clone of Minecraft that runs in the browser using Three.js. Which word in that sentence is doing the heavy lifting, the heavy semantic lifting? There’s Minecraft and there’s Three.js, because you’re steering it and saying use Three.js versus using Babylon.js versus using whatever.

Scott Hanselman 00:05:35 But in that sentence, make me a clone means a copy of, and Minecraft means a lot. The word Minecraft semantically is effectively a 50-page spec. So, I was trying to tell the guy, dude, that word is doing literally all the lifting. Try explaining to it that you want a clone of Minecraft, except you don’t get to use the word Minecraft, right? Now that would be effectively impossible. Now you go from, oh my gosh, I did a one shot of Minecraft, to, this is a hot mess and I really need to be specific. And you’re going to get, like you said, a roll of the dice. So, we need to ask ourselves when we are making these ambiguity loops, what is the job of the programmer? Well, it’s to be specific, because it’s all about expressing your intent and making your intent understood. And I’m sure we would’ve loved in the sixties, seventies, and eighties to just talk to the thing, but it wasn’t specific enough.

Scott Hanselman 00:06:30 So we needed to be so specific back then: put that electron in that register, and we’re going to use assembly to do it. And now suddenly I can say, hey computer, what’s my blood sugar? And it knows to go to a REST API, to get the tokens, to go up to my Mongo database, to bring all that kind of stuff down, because suddenly the Lego pieces are high enough and the stack is deep enough that I can get what I want. And it might try it in Python one day, it might try it in JavaScript one day, but the result is I get to know what my blood sugar is, because a REST API is called. But that ambiguity loop shouldn’t be something that we take for granted, because it’s not going to work every time. And that’s why I think it’s interesting to use the ambiguity loop to get me specific software that runs absolutely reliably. Because talking to ChatGPT or talking to Claude, that’s not a scripting language, right? It’s effectively a prose compiler, and that’s different. So, I’m using it to make grounded scripts that I then run, rather than just yapping at the thing and pretending that it’s Bash files.

Jeremy Jung 00:07:30 In some ways it’s like that shift you were talking about earlier, where you went from writing assembly or machine code to writing compiled languages that compile down to that. And then we got to higher-level languages like JavaScript, and now we’re one level even above that: we have our English prose that’s helping us compile or generate the high-level language.

Scott Hanselman 00:07:53 Right. So, like how people use Lua or C# and they build it into, like, Unity apps, right? And that’s the scripting language for their game. That makes sense to me. To embed an LLM would be to embed ambiguity, which they think, oh yeah, look, the NPCs are going to be able to do stuff. But now they could do anything. They could say dumb stuff. They could go off the rails, and the game could completely do something that you don’t want it to do. That’s what we don’t want to do. We don’t want to push out slop, because rolling the dice is not software engineering.

Jeremy Jung 00:08:23 And I think maybe another important point is, with that example where you said somebody tried to one shot, let’s write Minecraft, right? My guess is that the end result was not actually Minecraft. It was probably something that resembled it a little bit, but…

Scott Hanselman 00:08:39 Yeah, I wish I could show you. To be clear, it’s super impressive. It’s a full voxel world. It generates the voxels, it runs in the browser, the textures are all there. There’s no crafting in it, but you’ve got an inventory. It’s Minecraft, if you woke up from a dream and you’re just, I had a dream I was in Minecraft. So, it’s maybe like 60% of Minecraft, and that’s cool. But again, the spec lives entirely in the word Minecraft, because if you said make me a clone of fufu, and fufu isn’t a thing, you’re going to just roll the dice and it’s going to go insane. That’s why it’s not a reasonable demo to say, make me a Space Invaders, make me a Pac-Man, make me a Donkey Kong, because those words do all the heavy lifting.

Jeremy Jung 00:09:21 I feel like some of the higher-level people who are running these businesses, CEOs and things like that, have this vision that I’m going to be able to ask this rather ambiguous thing and get a complete product. But to me it feels like we’re not actually there. Right?

Scott Hanselman 00:09:36 So here’s an example that I really use. May I ask, do you have a car, and do you drive? Okay. Do you drive stick, or do you have an automatic?

Jeremy Jung 00:09:43 Automatic.

Scott Hanselman 00:09:44 Do you know how to drive stick?

Jeremy Jung 00:09:46 I don’t know it well enough to comfortably do it.

Scott Hanselman 00:09:49 Okay. Do you feel that it is a personal failing, that you don’t know how to drive stick?

Jeremy Jung 00:09:53 I do not.

Scott Hanselman 00:09:54 Okay. So, take everything that we just talked about and now apply it to programming, right? I could do assembler, but I couldn’t do it without Googling anymore. Do I feel bad about that? No. Could I do it in an emergency? Yes. Could you drive someone to the hospital in a stick shift? Yes. You could probably make it happen if you were in a pinch. Here’s the question though. What if you had answered just now, I don’t have a car, I only know how to Uber? Then your relationship to the vehicle fundamentally changes. And if the vehicle breaks, you’re going to get out of that car. You’re going to leave that poor gentleman there; he’s going to have to fix that car, he’s going to have to repair it, oil changes and tires, and you’re just going to get in another Uber and fly away. But how deep is the stack for you to simply move your body from point A to point B? And you have no control over how the guy’s going to get you there.

Scott Hanselman 00:10:38 So I think that there is value in driving stick occasionally, in trying to get somewhere without a GPS, because then the ambiguity has to be handled by you, and the responsibility is on you. The onus is on you. Whereas right now the idea is, hey, send a stranger in a car with candy to pick me up, and I just lean back. That’s the same thing as asking an Uber to take you somewhere and asking Claude Code or GitHub Copilot to make something for you with no specificity. You’re lucky. I hope you get to the airport, good luck. You know what I mean?

Jeremy Jung 00:11:08 Well I suppose in that example, the Uber or the Lyft, you’re giving an address most likely. And it’s sort of this well-defined problem of I need to get this person to the airport or I need to get this person.

Scott Hanselman 00:11:23 You have a vision of the end.

Jeremy Jung 00:11:24 Right. And I feel like that’s a little different from me telling an LLM, please build me a web browser, or build me Minecraft.

Scott Hanselman 00:11:33 Yeah, that’s fair. I think an example would be, hey Uber, take me to the airport.

Jeremy Jung 00:11:37 Without specifying which one.

Scott Hanselman 00:11:39 Without specifying and there’s no ask user question.

Jeremy Jung 00:11:42 I guess your point maybe is that it’s somewhat unrealistic to say something so vague and not expect there to be a back and forth, clarifying questions, that kind of thing.

Scott Hanselman 00:11:52 Exactly. And I’m sure you’ve been in an Uber, and you’ve told them, hey, take the 405, because you are steering them. You’re not driving the car, but you steered. And that’s why we’re finding steering in these CLI coding agents so delightful, where it’s like, you’re 90% in the direction I need you to go, just get off at this exit. Those little moments matter. Specificity matters. I’ll give you another example. My son is 18 and he has a Depop shop. So, he sells vintage clothing on Depop, which is like a marketplace for people to sell clothing. But he ships on a website called Pirate Ship. He doesn’t use the Depop built-in shipping. So, I had this vision to make it so he didn’t have to copy-paste stuff back and forth. And I thought I was going to do it with Playwright, and I was going to automate browsers, and I was going to build a whole dashboard for him to do shipping.

Scott Hanselman 00:12:38 And I was basically using the Copilot as a rubber duck. I was talking back and forth with it. I was talking to myself in the mirror. I’m not actually talking to an entity; I’m talking to myself. And in the brainstorming, just as if I called you and brainstormed, we realized that, oh, I don’t want to write a Playwright extension, I want a browser extension. So, I wrote a browser toolbar for him, and I wouldn’t have come up with that. I would’ve gone all the way down Playwright automation and been wrong, because of my own biases, if I hadn’t brainstormed a little bit with the ambiguity loop and it said, have you thought about a browser extension? And it clicked, and then I went immediately. It’s like, hey, take me to JFK. You know, EWR has a direct flight, have you thought about that? You’re right, let’s do it. And that back and forth, as well as having a lot of experience and understanding the power of a browser extension, turned out to be a really great experience. And now he’s got one-click shipping for his little business.

Jeremy Jung 00:13:31 Very cool. I have people ask me sometimes, is it still worth learning how to code? Is it still worth learning all the specifics of HTTP, or how the browser works, or databases? And I think what you’re maybe implying is that having at least some background in these things will help guide you when you’re asking these questions, when you’re trying to come to a solution. If you don’t have that background, you might not get there.

Scott Hanselman 00:14:01 I acknowledge my position in history, my age, where I grew up. I speak English, I’m a white guy on the West Coast. All those kinds of things are the context in which I came up. I came up in a very interesting time in the eighties programming, so I don’t want to sound like the old man who shakes his fist at the cloud. Hey bro, why are you gatekeeping stuff, right? But I see people saying, hey man, I just vibed the world’s greatest new SaaS. Check it out. And they tweet localhost:3000. And it’s just like, ugh, right? So, do you buy your furniture at IKEA? I do. Could you cut a board, and did you probably take wood shop? Yeah. Would you build your own furniture? No, but you have an appreciation for building IKEA stuff. I think that’s why IKEA makes you build it, so that you don’t feel like it just appeared in your house. It can’t just spawn up like Billy shelves. So yes, while recognizing that it might sound like, hey, this old guy wants me to write everything in assembler. Yes, yes, yes. Appreciating that oil changes are a thing. DevOps is a thing. Build loops are a thing, and quality matters. So, I would argue that it’s not the death of software engineering, it’s the death of toil. And the only thing that matters now is taste and judgment.

Jeremy Jung 00:15:13 So where do you think that that balance is? What do you think people should be learning where they can use these tools effectively and they can still be building quality?

Scott Hanselman 00:15:25 I think, and by the way, when I retire, I’m going to go and teach high school computer science. That’s my plan. I’ve got it all lined up, and I was a professor before, and I’ll be a professor again. I don’t think there’s enough computer science history, and I don’t think there’s enough basics. I don’t think anyone learned HTTP in college, right? They just kind of sat down, they did compilers, they learned lex and yacc, and then suddenly you’re learning OSes, and then you’re writing apps, but you pick up HTTP and DNS and stuff like that, maybe. I took a 400-level TCP/IP class. But as a general rule, you’re not spending a lot of time learning how to do Kubernetes in school. I think HTTP, DNS, NAT, that’s all bread and butter. That’s like sociology, right? Why do you learn about world history? Not so you can go and be a corporate drone; you learn because it makes you a whole, full person. So, there is someone right now vibe coding their bachelor’s degree in software engineering, and I love that for them. But they’re going to get nailed, just as people who don’t learn about sociology and history and psychology are going to get nailed. The Dunning-Kruger effect is a real thing.

Jeremy Jung 00:16:29 Hypothetically, when you’re teaching your computer science class, what would you tell your students in terms of using these AI assisted tools?

Scott Hanselman 00:16:38 I would say it is a funny two-faced thing, because it is a senior engineer with infinite patience, but it is a junior engineer with infinite energy. And the role that the LLM chooses depends on how you’re coming at it. If I come at it as a .NET engineer and I’m competent in C#, then it is a junior engineer with infinite energy, and I need to point it exactly, and I tell it what I want and it does the thing. But if I come at it as a Python developer, of which I am a novice or intermediate, then it’s more senior than I am. So, I would be less likely to disagree if it makes a recommendation, right? So, if it does something in .NET, I’m like, no, don’t use that, use this. I will say, no, we’re going to JFK, trust me, this is how we’re getting there. But if I’ve never been to that country, I’ve never been to that state, I don’t know which airport is better.

Scott Hanselman 00:17:29 I’m going to take the advice of the Uber driver. But it’s the same LLM, it’s the same model. And if you come at it with that sense of, no, I know what I’m doing, strong opinions weakly held, you’re going to have a much better experience. Additionally, the opening prompts matter, and I think that there’s room for learning prompts. So, if I were teaching the class, I would include a markdown file and skills that would force the user to, oh, you just tried to vibe your entire final? Here’s some multiple-choice questions. Here’s an exercise for the reader. I’m not going to let them just one shot their computer science final; you’re doing yourself a disservice, I think, if you do that.

Jeremy Jung 00:18:05 Yeah, that’s an interesting point about how the way you interact with the LLM is so different when you know the domain or you know the language well. And I wonder, in your opinion, let’s say you don’t know it well. Are you in a sense sort of fated to get a result that’s maybe not as good, because you didn’t have that background, because you couldn’t steer it?

Scott Hanselman 00:18:28 If you don’t understand why it did a thing, then I would not feel comfortable shipping it. And that’s where I think the limit is. I’m doing a bunch of Python stuff, and I think I’m maybe a four out of 10 or a five out of 10, but I know how their type system works versus TypeScript versus whatever. So, I’m asking generic computer science questions, like, what’s the Python equivalent of a linter? Do I have enough guards for my exceptions? Those aren’t Python questions, that’s just quality, that’s just taste. And I’ll challenge its assumptions. I built a blood sugar management system in Python a couple of days ago, and I asked, should this be TypeScript? And I said, make me a pros and cons table about moving this to TypeScript. And it was like, no, I think Python works great. And we kept it in Python.

Scott Hanselman 00:19:13 But then I had another thing that I was doing in C#, and it was on .NET Framework, and I thought it would just be fine on .NET Framework. It said, no, we should move to .NET 10, here’s 10 reasons why. And it’s a back and forth. I think too many people are trying to vibe code stuff in two or three shots, while I’m doing stuff in 50 or 60, and having ongoing conversations. And I’m gesturing right here, people can’t see, to my terminal, which has 13 tabs open, and they’re all doing something right now. And I use Copilot CLI, which is like Claude Code; it’s an agent loop and it has infinite sessions. So, this idea of a limit on the context, on how long it can think about stuff, doesn’t exist in Copilot CLI. I can just go dash-dash resume and pick up stuff I worked on a week ago, and it’s like a hibernation: I just hibernated the agent, and I brought it back.

Scott Hanselman 00:19:59 So it’s remembering all the stuff and all the decisions I’ve been making, and I’ll say stuff like, revisit our conversation from a week ago. Do you still think that’s true? Go and look at this pull request. Go and look at this Stack Overflow, go and look at this new tweet that someone like Simon Willison posted, and given that new context, what do you think? So, I’m constantly challenging it, and it’s almost like I’ve got 13 engineers that are either working full time or they’re just asleep, and I wake them up and I ask them a question, and then I go off and I do my thing. And it’s really been, I feel like an engineering manager with a lot of really cool little junior engineers with so much energy.

Jeremy Jung 00:20:34 For someone who isn’t super familiar with the domain or the language they’re working in, how do you verify the assertions that the LLM is making?

Scott Hanselman 00:20:46 I love that. I love that. So, I’m not an expert in Python, but I wrote my blood sugar management system in Python; however, I am an expert in Type 1 diabetes. So, I thought to myself, how can I verify that this is valid when I don’t know the language? And it’s what I call fear-based development, right? It’s FDD, Fear-Driven Development. I don’t want to screw this thing up; it’s my blood sugar. And in the words of Bryan Liles, test all the fucking time, right? Always. So how many tests do I need before I don’t feel scared? In this one here, it ended up being 270; that’s just a number. But I needed about 70% code coverage across almost 300 tests, and it’s on real data, and it works. So that’s CI/CD, and it runs in GitHub Actions, and it runs locally. It does nothing until it tests. And I would again argue that that’s about software engineering, not about Python. So as long as there’s a test framework and a loop, and that loop is verifiable, that’s what’s important. Because everyone’s excited about Ralph Loops, which we should talk about at some point. Ralph Loops are great, but if you have a million monkeys and they’re typing on a million typewriters and you’re hoping that they’re going to type Shakespeare? If you don’t have a test that validates the Shakespeare, then you’re just wasting monkeys and wasting typewriters.
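Fear-driven tests like these are mostly ordinary boundary tests. The `classify_glucose` function and its thresholds below are purely illustrative, not taken from Hanselman's project, and certainly not medical advice:

```python
# A sketch of fear-driven development: test the boundaries you are scared of.

def classify_glucose(mg_dl: float) -> str:
    """Bucket a glucose reading; rejects impossible input outright."""
    if mg_dl <= 0:
        raise ValueError("glucose reading must be positive")
    if mg_dl < 70:
        return "low"
    if mg_dl <= 180:
        return "in range"
    return "high"

def test_boundaries():
    # Off-by-one errors live exactly on the boundaries, so test them directly.
    assert classify_glucose(69.9) == "low"
    assert classify_glucose(70) == "in range"
    assert classify_glucose(180) == "in range"
    assert classify_glucose(180.1) == "high"

def test_rejects_garbage():
    # A sensor glitch or parsing bug should fail loudly, not get classified.
    try:
        classify_glucose(-5)
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Run under pytest (with a coverage plugin) and a suite like this produces exactly the kind of coverage number mentioned above, plus a verifiable pass/fail signal an agent loop can be pointed at.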

Jeremy Jung 00:21:59 These tests, are you generally writing them yourself or is this something that the LLM is also writing?

Scott Hanselman 00:22:06 So I will write four or five tests myself, then I’ll tell the LLM, say, write me other tests on the boring stuff, like edge cases. Because we know that in software engineering, the top three issues are naming stuff and off-by-one errors. So those are the things I watch for; I know it’s the edge cases. So, I’ll write a test, I’ll have it make me 10 more, and then I’ll look for all the edge cases. And yes, it’s true that I haven’t deeply validated them. And it is true that, depending on the model you use, it’ll comment tests out, it’ll hide stuff from you, it’ll pretend, it’ll gaslight you into thinking the tests work. And this is why it’s so important that you understand, not you, but one understands, that you are shipping this code. I always think it’s funny when someone puts something up in the cloud, and the cloud goes down, and then they’re mad at the cloud provider.

Scott Hanselman 00:22:54 I get that you’re mad at the cloud provider, but whose responsibility is it for that app? It’s your responsibility. If you felt that strongly, you should have had it in two cloud providers, right? So, if this thing fails, it’s me, it’s a hundred percent me. I’m not going to blame Copilot or Claude or whoever wrote that code. That’s why when you check stuff in to GitHub, it says on behalf of. If I did something, and my proxy did it for me, the buck stops here. So, if this were production, production, production? I would go and double-check it. And that’s why, for example, the Copilot CLI team, it’s seven or eight engineers, and they’re doing like 200 PRs a week. It’s crazy. They’re spending more time now reviewing than they are writing code, but they will not ship slop. And to look at it another way, and I’m sorry I’m talking so much, but another way to think about it is, when open source started and the open-source supply chain started having all this flood of open-source software, how do we deal with this? Code came from somewhere into the machine.

Scott Hanselman 00:23:48 Doesn’t matter whether Jeremy wrote it, or whether some kid wrote it and it got in via PR, or whether an LLM wrote it. It has got to go through the same verification loops, the same security loops, the same testing loops, the same provenance loops, the same governance.

Jeremy Jung 00:24:01 That makes sense to me. I mean, something that you sometimes hear people say with all these tools, the agents, the ability to generate code, they’re saying, well, I am generating so much, I just don’t have time, or I just don’t need to verify the output. And it sounds like in your case you’re saying no, ultimately, even though these tools are generating it, it still comes down to you needing to review it, because otherwise you don’t know what’s going on.

Scott Hanselman 00:24:28 If I don’t have time, then what are you making, right? The point is to make a thing and ship a thing that you’re proud of. So, if it’s, we’ve got to go, we’ve got to go, well, then you don’t care. You don’t care about your customers. You have to understand the semantics, or what are you even doing? I just reject that. Now that said, there is this rise of toy, fun, personal apps. Using computer science history to put this in context, we’re having a GeoCities moment, right, where everyone’s just making stuff. There’s this wonderful friend of mine on TikTok named Rodney Norman; he’s a comedian, he’s got crazy white hair like Doc Brown, and he’s got a big, long beard, and he pops up on my For You page and he says, you can just make stuff, you can just do stuff now. That’s what it feels like. I’ve got a rocket ship strapped to my back and I’m just making stuff. That doesn’t mean I’m going to go and plug Stripe into it and start taking money for it, taking people’s money and real blood sugar. Do you see the difference? If you’re shipping this, you’d better care. If you’re just playing, yeah, play.

Jeremy Jung 00:25:29 Yeah. I mean colloquially, right? People are saying I vibe coded this.

Scott Hanselman 00:25:34 I don’t say that anymore. If I vibe coded it, I don’t care about it. You’re either a vibe coder or you’re an AI-augmented software engineer. AI-augmented software engineer doesn’t have good mouthfeel, but, bro, it’s just vibes? No, I’m not vibing into production. That’s not happening.

Jeremy Jung 00:25:48 Yeah, and I think too, what people may run into is whatever they build may look like it works initially, but there may be architectural problems that make it difficult to change. There could be more severe flaws that you just can’t find because you don’t know enough about what’s happening.

Scott Hanselman 00:26:06 You don’t know what you don’t know. And then the first database migration or the first scale issue hits, and you’re like, oh, we made a decision. Well, actually, we didn’t make a decision. The ambiguity loop punted on a decision, and now we’re painted into a corner, an architectural corner.

Jeremy Jung 00:26:21 So I think one of the things you mentioned is when you work on a project, it sounds you have this plan, or you have a lot of context that you’re providing to the model. To start off, can you kind of walk through what are the kinds of things that you’re saying? Are you asking for specific architecture patterns, languages, ways to test? That kind of thing?

Scott Hanselman 00:26:42 I’m trying to think of an example. I’m looking at my tabs that are working right here. Okay, here’s one. So, there’s an application that is 20 years old; it’s called Windows Live Writer. It’s a blogging tool that we, Microsoft, released in 2006, and then we open-sourced it later. It’s a mix of C++, C#, the ribbon control from Office, and Internet Explorer; the surface where you actually type the blog post is a COM object from Internet Explorer. And it works, but it’s janky and it needs to be modernized. So, it’s 20-year-old software. That’s an interesting technical problem, because it’s a brownfield pile of stuff. But I have a vision that we could get that running on .NET 10, which didn’t exist, with Chromium, or WebView2, instead of Trident, which is the Internet Explorer thing, and we could update the ribbon control.

Scott Hanselman 00:27:33 I can see you’re nodding; the spec is already in your head. So, then the question is, how specifically do you express that intent to something like this? And then arguably there are kind of two ways to do it. There’s the persistent loop, the Ralph Loop, which is just, do this, make no mistakes, right? That’s just monkeys slapping keys. But verifiable loops with persistence are really, really interesting. So how do you verify that it works? What’s changed contextually since this thing came out? 4K monitors happened. That wasn’t a thing 20 years ago. Multiple DPIs across multiple monitors. So, I know that that’s a thing, so I put that in the spec. .NET is cross-platform, but this is a Windows app, so I’m not trying to get this thing to run cross-platform. So that goes in the spec. It’s C++ and it uses COM, but I don’t really need COM anymore for this, because I’m going to be switching over to Chromium.

Scott Hanselman 00:28:24 So I'll make a COM-to-JavaScript bridge. Those little moments make that loop better. I could probably have pointed a loop at it and it would have fixed it at some point, but this saves time. These are all these little architectural shortcuts where I'm anticipating that this is important or that's important. It found a reference to MySpace because it had an insert-MySpace-link feature. So, I said, go through the thing; if there are plugins for dead services, mark them as to-dos, keep track in the to-do list of all the things that you've stubbed out, et cetera, et cetera, et cetera. I think I ended up basically dictating to the thing about a two-and-a-half-page spec. And I think we looped, I want to say, 20 times, and it's running; it freaking works. I'd say it's probably 70, 80% done, and now people are doing PRs against it and we've revitalized this 20-year-old application.

Scott Hanselman 00:29:14 The question is, could we have done it by just saying get this running on Windows again? I think that the reinforcement that we would give an LLM to do that would be artificial because it doesn't know what success looks like. It might just say, oh, I got it running. But did you get it running? Did you post a blog post? Did the blog post work? Did you edit the blog post? So, you've got to think like a tester. Remember back in the day when software tester was an actual job? It's a job again. So, I feel like I'm talking to this engineer that's overseas or outsourced or insourced, and they're throwing code at me and I'm validating that code. So, I make a plan that's about 80% of a plan and I tell it, if you have questions, ask; if you're making an architectural decision, ask.

Scott Hanselman 00:29:56 And if there's a lack of clarity or the loop is unclear, then I will come up with a solution. Another example: because it's a Windows app, how is it going to validate? Does it take a screenshot? Does it do "got here" debugging? Does it do interactive debugging? Those are all interesting things that are happening right now as well, because it can't see unless it's multimodal. With something like Copilot CLI, I told it to take screenshots and compare them to the original and see how close it was. So, anything you can do to make the plan clearer and make the loop tighter will cause it to be more successful.

Jeremy Jung 00:30:29 And when you refer to these loops, I'm assuming a loop is the model attempting to write the code to accomplish what you asked for. Is it automatically looping on itself and it keeps trying, or is it prompting you and going, okay, I gave my first try, time for you to review?

Scott Hanselman 00:30:49 That's a great question. So, with this particular application, which is trying to bring back an old Windows app, it would get it to compile and think that was success. And I'm like, no, running it is success. So, then it gets it running, it looks at the process tree and says, look, it has a PID, success. I'm like, no, it has to look good. Oh, okay. And then you can kind of start upleveling. And then I realized I was running this app called DebugView, which shows the Windows debug spew, and I was selecting the "got here" debugging, the Debug.Write lines, and pasting them back in. And after the third or fourth time I'm like, this is silly; it can run it itself. It would literally ask me, what do you see? What do you see in the debug view?

Scott Hanselman 00:31:28 And I would copy-paste it, and I'm like, this is stupid. So, then we made DebugView an MCP server, and now I'm out of the loop. I actually got myself out of the loop. So now it can compile, run, see the log, kill the PID, build, and the inner loop becomes simpler. Then I said, let me know and stop the loop when you have questions or you think you're done. And that's what's so kind of exciting about these loops: if they're clear and success is clear and verifiable, you can do really amazing stuff.
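The inner loop Scott describes can be sketched in shell. Everything here is a stand-in: `build` and `run_app` are placeholder functions, and the `debug.log` file stands in for the DebugView MCP server. The point is only that success is checked against observable output, not against the agent's own claim.

```shell
# Minimal, runnable sketch of a verifiable inner loop: build, run, check
# the debug output for evidence the app actually worked, then clean up.
# build and run_app are placeholders for a real compiler and a real app.
build()   { echo "compiling..."; }
run_app() { echo "editor ready" > debug.log; }   # a real app writes debug spew

build
run_app
# Success is defined by observable output, not by "it compiled":
if grep -q "editor ready" debug.log; then
  echo "verified: the app did the thing"
else
  echo "not done yet; loop again"
fi
rm -f debug.log
```

With the log exposed as a tool, the agent can run this check itself instead of asking a human what the debug view shows.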

Jeremy Jung 00:31:59 So it sounds like, as a part of your instructions, you're also giving it the tools, or telling it how it should verify whether it actually did the thing?

Scott Hanselman 00:32:10 So I'm making a Windows tray application for the Claude bot, now called Moltbot, and it declared success just right before we got on the call. It's like, yeah, it's working, it's great. And I looked at the tray and it says connecting, and it's not connecting at all, right? And I'm like, no, this isn't working. And it's like, no, it's working, I can see it's working, look. It's very excited; it's giving me rocket ships and emoji, super excited that everything is working great. And then I realized that the UI is done but the underlying stuff is not done. So, I need to separate the UI from the business logic and build the core, which is much easier with these loops, and then get it talking to the API. It was my own mistake to tell it to build this and go UI-down. That was foolish of me. It was my own enthusiasm that nailed me. So now I'm going to go and refactor it, pull out the core, build the Windows front end to the thing, get it connecting, and then layer that UI on top, because that's way more verifiable.

Jeremy Jung 00:33:12 I think it's a common experience with people who use these tools that there are times when they're in a loop, trying to come to a solution, and each time they try it's just not what you wanted, or it says it fixed it but it really didn't. Are there specific strategies you have for, I guess, one, recognizing you're caught in this loop and it's not going to get out, and two, breaking it out of that cycle?

Scott Hanselman 00:33:38 For me, it's always been that it is confused about what success looks like. You have to remember that an intern or an enthusiastic early-in-career engineer is very excited to be at the company. I'm so happy to be working with you, Jeremy. I'm going to do whatever it takes: lie, cheat, steal, comment out tests, to make sure that you don't fire me. These are reinforcement-learned large language models that are very, very excited to be here. And as such, just like in The Devil Wears Prada, the interactions that they're going to have with a salty old engineer like me are going to be very interesting. I guess I'm Meryl Streep in this analogy, so I need to be very specific. Okay, Jeremy, I need you to do this, and I need you to do it nicely. I want a half double decaffeinated half-caf with a twist of lemon, not two twists, you know what I mean? So, you need to be a little bit crisp with it. If you can give it test data, verifiable test data, if you can say build a harness, do that. If it doesn't know what success is, it will make up success and declare success. And then if you're not, whatever, senior enough, awake enough to question it before you agree, then you're going to get nailed.

Jeremy Jung 00:34:48 Yeah. And in the example you gave before, you had to tell it specifically to use this DebugView tool, and give it that tool through MCP.

Scott Hanselman 00:35:00 It didn't know. Yeah, how would it know, right? You've got to tell it that you have these tools available; otherwise it's going to take a regular screwdriver and keep poking it into a Phillips screw. And you're like, you've got to use the Phillips; you've got a Phillips right there, bro. Oh, you're absolutely right. I can't show you because we're on an audio podcast, but literally yesterday I forgot that I had a whole test harness for my blood sugar thing, another hundred tests that I hadn't told it about, and it was frustrated about something. And I said, oh, go into this folder, I've got these tests. And it literally said, this is gold, thank you so much. Now I see. And then on another, totally unrelated thing, I was building a front end for Microsoft Loop, because Loop is like Notion, and I was building an API for it, and then I found API documentation and just pasted it in: here's the URL for the documentation. And it was like, now I have a plan. So, it's just giving it a little breath of fresh air, and it's like, now I'm re-energized. That's what these loops require. Ambiguity loops require specificity. That's what we are supposed to provide as software engineers.

Jeremy Jung 00:36:01 And as you have these longer loops or conversations, one of the things about these tools is they have a context, and that context is only so large. And I wonder if you ever run into the issue of, I believe they call it context compaction, where eventually the model has to kind of get rid of some of the context that you had before. Have you run into that, and how do you deal with it?

Scott Hanselman 00:36:24 Yeah, so I just checked my context here: I'm 73% into a 200,000-token context window. That is often a thing in these loops where it's talking to you and talking, and then it kind of goes, huh, what? Oh, I don't remember, bro. And that's because it just compacted everything, summarized it, and it doesn't remember an hour ago. That happens a lot with these tools. It happens less with the infinite sessions that GitHub Copilot CLI has, because it does very, very active management of that window. But with any tool, I've also found it very helpful to keep telling it to save stuff. And I don't like to anthropomorphize these things; they're not people. But I have a context window too, and I also have a reMarkable E Ink thing and I sync to paper. So tell it to write stuff down as if it were a person.

Scott Hanselman 00:37:17 So this big loop that I'm running on the Claude bot over here, I said, every time you achieve something, write it down. And if you look at things like steveyegge/beads, which is basically a Git-focused to-do list that's distributed across multiple agents working on a swarm of problems: if everyone can agree on a ledger of the things that we're working on, those context windows matter less, because if an agent wakes up, it's going to say, what? What am I doing? What was I working on? It's just like the movie Memento. The main character forgets stuff, so he tattoos it all over his body; he wakes up in the morning, he doesn't know who he is, and he gets caught up by looking in the mirror and seeing all the tattoos. Anytime anything interesting happens in his life, he tattoos it on himself to catch himself up.

Jeremy Jung 00:37:59 Do you ever find yourself doing that manually, where you yourself know what the context should be and you just go, here you go, remember all this?

Scott Hanselman 00:38:09 Yeah, so context window management is not something I have been actively doing with the Copilot CLI because of the sessions feature, but I still think it has a huge amount of value, because it's not about their context, it's about your context, when you, Jeremy, inherit the code later. For example, DPI: we were having a dots-per-inch issue with Open Live Writer and it was complicated. I spent a day messing around with DPI, because multi-DPI across multiple monitors is complicated. So, I said, write down everything that we've learned so that no one ever has to deal with this again. And it made a two-meg markdown file; it effectively wrote a chapter of a book about the nuances. And I'm running the Claude bot, now Moltbot, on Windows under WSL, and every once in a while a Windows-ism will come in, and I said, regularly check for Windows-isms and remember them so that you never hit them again. And that's not thinking about the larger context window; that's just, put a post-it note on your monitor to remind you of this. And with any project I'll end up having 10 or 15 of those that are secondary to the main plan. The main plan could be in something like beads, or it could be in GitHub Actions, or it could be a markdown file with checkboxes.

Jeremy Jung 00:39:20 And the main plan and these additional notes that you provide, is there a way that consistently works across GitHub Copilot, Claude Code, and Gemini CLI? Is there a, hey, this is the markdown file you put it in, and all these tools will understand it?

Scott Hanselman 00:39:39 They're all fighting right now about what the thing will be called. There's AGENTS.md and there's SKILLS.md. And you know how there's a robots.txt, which is the thing your website puts up in order to tell robots what to do, and then there's LLMs.txt, which is literally a text file you put on your website to tell OpenAI or whoever's scraping your website what to do about it? Well, if you have an AGENTS.md, it's a readme for agents. And then within that I'll say, there is a docs folder, and within that docs folder, here are the main subsystems. You're effectively building architectural context. And then if it finds itself doing a thing, it will go and refer to the documentation. So, I taught the Claude bot about my blood sugar, which is unique to me. Anytime I mention anything blood-sugar related, it doesn't have to hold it in context; it just has to know, oh, there's a post-it note with the details that I need to know. And then it pages it in, does the thing, and pages it out.

Jeremy Jung 00:40:37 Something you mentioned earlier was you were referring to this Ralph Loop. What is that?

Scott Hanselman 00:40:43 Yeah. So Ralph Loops are fun. Ralph Wiggum is a character on The Simpsons, and he is an extremely naive and extremely persistent, funny little man. What we admire about Ralph is that he is persistent; he will never give up. And the technique is designed for these continuous autonomous development cycles. The concept came from a gentleman named Geoffrey Huntley, and he's basically trying to do less. This is a thing that I think we need to acknowledge as programmers: we are ultimately very lazy, so if something can be automated, that's a good thing. He's like, why am I doing this work? Let Ralph do it. So, the Ralph Loop embodies the spirit of this little character on The Simpsons: I'm going to stubbornly iterate, I'm going to fail, but I'm never going to give up until I succeed.

Scott Hanselman 00:41:33 So a really great example of a Ralph Loop is not what the tech journalists will say: oh, you can just point it at some software and it'll clone it, it will make a Minecraft clone. That's not going to be real, right? But David Fowler, a distinguished engineer on .NET, had a really interesting problem where he had 400 issues in his GitHub repo for an application, and they didn't have good reproductions of the bugs. And that's just one of those things; it's technical debt, and we don't know if they're real bugs. They could be; we don't know. So, he made a Ralph Loop that said, spin through all 400 of these issues and reproduce them. If you get a good reproduction, keep track of that reproduction, then add that context to the issue. That is the very definition of toil, and what better thing than a Ralph Loop to do it?

Scott Hanselman 00:42:20 And I think it came up with something like 384 good, solid repros, right? Now, the big problem with a Ralph Loop is that it doesn't have a task tracker. So that's where things get interesting. If you have a task list and Ralph can keep track and understand what success looks like, then you're going to have a great experience with it. So, I will do interactive work during the day, and then as I get towards the night I'll think, okay, what's a loop that it could work on overnight? Because I want to wake up in the morning and have something successful happen. It sounds wasteful, but it's shockingly cheap. I'm a bit of an AI vegan: I don't do images and I don't do video, but next-token prediction is becoming extremely inexpensive, and you could run a loop all night long and it might cost you two bucks. So, if you have a problem that is Ralph Loop shaped, it can be an extremely powerful tool. I think, though, that people sometimes think it is the only thing. Vibe coding is fun to say, and now Ralph Loop is fun to say, but it's just another technique. It's another tool, like CI/CD.

Jeremy Jung 00:43:28 And is this where people have built this, I don’t know what you would call it, this harness that is going to manage and run the loop?

Scott Hanselman 00:43:36 Well, Geoffrey Huntley calls it a while-true bash loop. So, it's just, while true, right? And the concept is very simple: it re-prompts your agent a set number of times with the same prompt. We mentioned before that it's an ambiguity loop, and if you ask it something a hundred times, you're not going to get the same result a hundred times. And then you refuse to let the agent say it's done. But you have to identify what the stop condition is. And if the stop condition is unclear and it's doing anything at all to try to wiggle its way out, you say no, do it again. For example: all tests must pass, endpoints must be working, and if it comments a test out, no, you can't comment out or change the tests. So, you'll say, slash Ralph Loop, refactor this to whatever, and it all has to pass, and it goes step, step, step until it's complete. If it doesn't have the ability to see something, if it can't see a log file, if it can't run a test, then that doesn't work. So that's why test harnesses matter so much in that context.
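That while-true shape can be sketched in shell. The `agent` function here is a stub standing in for whatever agent CLI you use (real flags vary by tool), and `tests_pass` fakes the stop condition with a counter so the sketch runs on its own; in practice it would execute your real test suite.

```shell
# Ralph loop sketch: re-prompt with the SAME prompt until a stop condition
# the agent cannot talk its way out of is verifiably true.
PROMPT="refactor this; all tests must pass; do not comment out tests"
attempt=0

agent() {        # stub for a real agent CLI invocation (hypothetical)
  echo "attempt $attempt: $PROMPT"
}

tests_pass() {   # stop condition; faked here, but really: run the test harness
  [ "$attempt" -ge 3 ]
}

until tests_pass; do
  attempt=$((attempt + 1))
  agent          # the agent never gets to declare success itself
done
echo "stop condition met after $attempt attempts"
```

The key design point is that the loop, not the model, decides when to stop, and it decides based on something checkable.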

Jeremy Jung 00:44:39 So you’re basically building your own while loop with a known stop condition that’s going to call out to your model of choice.

Scott Hanselman 00:44:49 Yeah, and then you can also mix and match models. So, I'm on Copilot; I'll hit slash model, I hit enter, I've got 14 models. So I might plan on Opus, and if I want to save money, I might then do the work on Sonnet, and then I'll have GPT-5 Codex review it, just to mix it up. Diverse teams are a good thing. Having only one model do the work and then review it, you can sometimes get a little bit of sycophancy. It's nice to get fresh eyes, and a way to do fresh eyes is to throw another model at it. So, I can switch models mid-conversation.

Jeremy Jung 00:45:23 For your own personal projects, what does that bill look like at the end of the month?

Scott Hanselman 00:45:30 Well, so far I spend $200 on Claude, and then I pay whatever Copilot Pro+ is, and, oh, then I have a subscription to OpenAI. So that's basically it. I mean, here's the thing, right? What do you value your time at? If you just pick a number like a hundred dollars an hour and say my time's worth a hundred dollars an hour, which is a big number, and one magical moment once a month saves you an hour, that's a hundred bucks; oh my God, yes, I will do that. It has definitely brought me more joy than Netflix.

Scott Hanselman 00:46:02 And Netflix is $11.99 or whatever it is. All it takes is one amazing moment and you just go, oh wow, I just paid for the whole thing, you know what I mean? My son just got a climbing gym membership yesterday with his own money. He's 18 and it's $90 a month. And he's like, oh man, this sucks, I climbed for two hours, that's $45 an hour. And I'm like, no dude, climb four hours a week and it's $16 an hour, it's five bucks an hour; the more you use it, the better, dude. And he's like, bro, you're right, I need to climb more. So, what is it here? Copilot Pro+ is like 40 bucks and you get 1,500 premium requests. And how many have I used this session?

Scott Hanselman 00:46:46 I've used 63, right? I've upgraded Open Live Writer. It's not even close. It's so valuable that it is a delight. So, I honestly am not thinking that much about it, simply because, as I said before, I've got a rocket ship on my back. I'm busting out so much cool stuff. People might say this thing's a toy or that thing's a toy, but these are toys that I was meaning to get to. I've got a podcast as well; it's been going for 20 years, but it's never had a backend admin site. I've just been editing text files and manually editing databases, because who's got a weekend to build one? It took me 43 minutes to make a React website that does the backend admin for my podcast. That's worth whatever my subscription is. So yeah, it did not take any time at all to make that decision.

Jeremy Jung 00:47:36 Yeah, I mean, I think as developers, for whatever reason, a lot of us undervalue paying for tools.

Scott Hanselman 00:47:43 Oh my God. Developers are like, pay $99? Never. I'll write my own and take nine hours to do it. You know what I mean? So yeah, I'm sure that these AI companies are burning money, and the pricing will figure itself out. This is just like when Netflix and streaming started and they couldn't figure out what the pricing was. They're all trying to figure it out. And it's also going to get faster, it's going to get more environmentally friendly, we'll have more hybrid models, we'll have more local models. I can plug in Ollama; I've got a 4080, I could do work locally. There are lots and lots of options, and it's only going to get easier and hopefully cheaper. And that's important as well. I want this to be available for everyone.

Jeremy Jung 00:48:22 Yeah. One of the things you mentioned is that there are models you can run locally. What’s the state of that? How do they compare to the ones that you pay for from cloud services?

Scott Hanselman 00:48:34 It is challenging to get really good, deep, thoughtful plans from those local models. But if you have a good plan that was made by a big cloud model, you can then execute on that plan with some of the smaller ones. I think, though, that people don't realize that these models in the cloud know more than just coding. They could go and translate Star Wars into Shakespeare, but I don't want that feature. I don't need that. So, when I download a model, I would like a Python-optimized model that doesn't know how to do anything but Python. It shouldn't have how-tall-is-Brad-Pitt embedded in the model; that's wasting space. So I think we're going to see local models that are very specific, and you can go on Hugging Face and find Llama coding models and DeepSeek coding models that are specific to programming languages or problems. That I think would be really interesting: save money, time, and energy by solving it locally. Think of it as a Redis cache for LLMs, but local.

Jeremy Jung 00:49:32 I also wonder, because there are so many things in these models, how much of that knowledge actually helps it code, beyond the knowledge of coding itself?

Scott Hanselman 00:49:44 I don't know; that's a great question. They all know how tall Brad Pitt is, and they insist on bringing it along everywhere they go. But a really fun way to test this is to just go and download LM Studio from lmstudio.ai. It's a delightful third-party application. Or Foundry Local, or Ollama. Then just throw a couple of problems at it, and if you like the answers, then maybe you can do some coding on an airplane.

Jeremy Jung 00:50:07 One thing we haven’t really talked about is these agents or this Ralph Loop. I believe in a lot of cases it has full access to your computer. Is that correct?

Scott Hanselman 00:50:18 That is 100% up to you. So, for example, if I were to fire up Copilot CLI or Claude in a folder on my machine, if I just made C:\temp\Jeremy and opened Copilot there, it'll say, do I have permission to open this folder? That's the same thing you get when you open it in Visual Studio Code: do I trust this? Are we cool? So that's preventing it from accessing that data. There are basically three things an agent can do: can it access a URL, can it access a tool, or can it access a path? By default it gets no tools, no paths, no URLs. And then every single time in a session it will ask you, can I do that? And it can be very annoying, and you say, yeah, I'm going to write a website for Jeremy; go and do this.

Scott Hanselman 00:51:05 And it'll be like, oh, can I Google for stuff? Eh, yeah, go ahead, fine. Okay, now I have access to Google. Okay, cool. Can I write the HTML file to the disk? Is that cool? And it'll show you what it's about to write, and then you say yes. And at some point you'll say, well, yeah, you can just write to the disk for the rest of the session. We're good. Stop asking me. And then you build up this knowledge base of what it's allowed to do and what it's not allowed to do. What people are starting to do is they just go dash-dash-YOLO, You Only Live Once. If you fire up an agent with --yolo or allow all tools, it's running as you, and if you're admin, it's running as admin. So then you bring in things like sandboxes. You could use Windows Sandbox, you could use a Docker sandbox, you could use WSL.

Scott Hanselman 00:51:45 A lot of people are really enjoying WSL on Windows, and then you can unmount the drive, or you can use Docker and use volume mounting. But simply running it in a container isn't security; it's one layer of security. A true sandbox, if you're going to run this at a bank, is going to have auditable, governable layers of what it's allowed to do, what functions and syscalls it can make, et cetera. So, I think we are in kind of YOLO mode, but no, by default they do not have access to all of your stuff. But what if you ran it in a loop in YOLO mode and went to sleep? I've got a 3D printer; do you have a 3D printer? As a general rule, 3D printer people know not to leave the printer on when they sleep, because it could burn your house down. And there's always somebody who ends up doing that, and they get spaghetti when they come home and it's just all over the place, and it could cause a problem. So, I don't leave the house when the printer's going, and I don't leave the house when I have a YOLO Ralph Loop running.
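As one concrete layer of that sandboxing, the Docker volume-mounting approach might look like the sketch below. The image name `agent-image` and the `agent` command are placeholders, not a real image or CLI, and the command is printed rather than executed so the sketch stays self-contained.

```shell
# One sandbox layer: the agent's container sees only one project folder
# and, with --network none, no internet. Not real security on its own.
set -- docker run --rm \
  --network none \
  --volume "$PWD:/work" \
  --workdir /work \
  agent-image agent
echo "would run: $*"
```

A bank would layer syscall filtering, auditing, and governance on top of this; for hobby projects, a read-only view of one folder plus no network already removes the worst failure modes.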

Jeremy Jung 00:52:39 But in general, I suppose it sounds like when you are going to use these tools, you do put it in YOLO mode and maybe hope for the best?

Scott Hanselman 00:52:47 No, well, if I'm standing in front of it and actively working, usually I'll say allow all tools, because I know the tools that it has, and I tend to lock them down by path. So, for example, I'm in D:\GitHub\OpenLiveWriter; it can go ham on anything from there down, it can talk to anything on the web, and it can call any tool. I'm sure it could decide randomly to format my hard drive, but it's not running as admin, so that would fail. I suppose it could randomly have a reason to go out and hit the internet somehow and do something problematic, but again, it's working on one specific thing. The loop that it's doing is looking at a log file, running a compiler, and doing it again. That's different than the Claude bot, now called Moltbot, which is basically Siri as a Ralph Loop running on a server somewhere or in a background session. But yes, you don't know. But I'm giving it access to my source code, and if it deletes it, then I'll just do a git reset and get it back. I've had it do dumb stuff. I've had it push when I didn't want it to push; I've had it commit stuff, and then I just undo and go back to running. I always run them in Git; I want to call that out. I always have source control, because if it makes a mistake you can back out.

Jeremy Jung 00:54:06 Yeah, because I was thinking of the example you gave earlier where I believe it was looking at processes and killing them. So, to me it seems like it must have quite a bit of permissions if it's able to do that.

Scott Hanselman 00:54:19 Well, when it runs as you, it has all the permissions that you have. So, for example, the loop that I'm running over on this other machine: I told it that it can fire up the Windows app, save the PID of the app that it just ran, and it can kill that PID, it can kill that process. That doesn't mean it couldn't accidentally kill all of the processes. One time, two weeks ago, I had Claude Code kill itself by doing a Stop-Process that was a little bit too *.*, and then it woke up and it was like, what happened? How long have I been out? It literally said, how long have I been out? Wow.
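That narrow kill permission can be sketched in shell: save the PID of the one process you launched and kill exactly that, never a wildcard stop that can take the agent down with it. `sleep 60` is a stand-in for the app under test.

```shell
# Save-the-PID pattern: the loop may only kill the process it started.
sleep 60 &                 # placeholder for the Windows app the agent runs
echo $! > app.pid          # remember exactly which process we own
kill "$(cat app.pid)"      # kill that PID and nothing else, no wildcards
wait 2>/dev/null || true   # reap the child; its nonzero status is expected
rm -f app.pid
echo "killed only the saved PID"
```

A wildcard equivalent (killing every matching process name) is exactly the *.* mistake described above: it can match the agent's own process.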

Scott Hanselman 00:54:52 Which I thought was crazy. It literally asked, how long have I been awake? And I said, check the markdown files, see where you left off, and keep going. And it picked it up. But it basically bonked itself on the head with its own hammer, and that's not a guarantee. So, then the question is, is putting "hey man, don't do that" in a markdown file a sandbox? No, it's obviously not a sandbox. That's why you run it in a container or in WSL or whatever, and that's not specific to any operating system. If you decide to double-click on a batch file and it does something stupid, I don't see that as being any different than running an agent. The only difference is the ambiguity, which gets us back to the beginning of the conversation: ambiguity loops are ambiguous.

Jeremy Jung 00:55:34 Yeah, I mean, I can think of an example: if you're giving it access to the internet, maybe it downloads a utility that it thinks is going to help.

Scott Hanselman 00:55:42 That's a great example. Yeah. It downloads a utility that your company hasn't okayed, and then suddenly you're in trouble. Right. And I think that making a toy application that plays with your blood sugar in your spare time and working at a bank are different. So yeah, I would not go running anything like this in YOLO mode as admin at a bank.

Jeremy Jung 00:56:00 Makes sense.

Scott Hanselman 00:56:01 That said, if it's your machine and it's what you're doing, and you scope it to this folder on down, there's a lot of power in this. And the nice part about it is that it's a slider of good, better, best. It's not a binary between bother-me-all-the-time and YOLO, and you can do this as a team. The whole enterprise could roll out: here are the five approved tools, here are the six approved websites, here's the SKILLS.md that we're all going to use. We check it all in with the repo, and everyone's on the same page.

Jeremy Jung 00:56:27 I had an interview with Bryan Cantrill; he's the CTO of Oxide Computer.

Scott Hanselman 00:56:32 Yeah, yeah. I was on Oxide and Friends.

Jeremy Jung 00:56:35 Oh, nice. And one of the things he mentioned with working with large language models is he thought it was really advantageous to use strongly typed languages, like, say, Rust, because the model is able to verify when things are not correct, just through that compile step and through the design of the language. I wonder, in your experience, whether you found that true for yourself? Are there specific languages that work better when using these agents?

Scott Hanselman 00:57:09 That is a very good question. I think that is a good opinion from Bryan, and as a general rule I agree with it, but I would argue that in lieu of strong typing, one overcompensates with tests. So, correctness is what Bryan is arguing for, and that is valid. If I am using Python, like I am for my blood sugar thing, having 200 tests makes me worry less about how Python compiles or doesn't. With my .NET application, I have fewer tests, but I'm looking at 419,000 lines of code and it all compiles. But compilation doesn't equal correctness, and strong typing doesn't guarantee correctness. So, I would say I mostly agree with Bryan. But going back to fear-driven development: if you're afraid of the code, you don't have enough tests. So, I write tests until I'm no longer afraid, regardless of language. I'm picking the right language for the environment, not picking a language based on whether it's strongly typed.

Jeremy Jung 00:58:12 Probably a similar philosophy to when you were writing the code yourself, even before these tools.

Scott Hanselman 00:58:18 A hundred percent. Although I am finding myself writing more code in languages that I don’t know, because it’s the right language. Not everything has to be a .NET application that’s a complete, self-contained, cross-platform thing. Sometimes a hundred lines of Python is fine.

Jeremy Jung 00:58:37 Yeah, I think that would be an interesting shift for people, because there do tend to be people who know, say, JavaScript, so they go, okay, I’m going to write everything in JavaScript or Node or TypeScript. Maybe this removes some of the difficulty in getting started with, let’s say, Python. If you’re working on something in the machine learning space, even if you’re not an expert in Python, these tools can help get you there.

Scott Hanselman 00:59:06 Well, the blood sugar thing is meant to be a skill that is plugged into an agent. So you put it in C:\Users\Jeremy, in your Copilot skills or your Claude skills. So, I did it in Python because it’s just one markdown file and one Python file; you have Python and I have Python, it works. I could have made it a .NET application, but it would be larger and it would require a runtime, so that wasn’t necessary. So, I used Python for portability. Even though I don’t know Python, I’m confident that it will work for you and you’ll have a good experience. But I’m trying to think which one of these is a .NET application. My Live Writer application: I wouldn’t do it in anything but .NET, because that’s what it’s good at. So, pick the car brand that is going to make your trip the most delightful. I’m definitely focused on delight right now.
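
A skill like the one Scott describes is typically just a folder containing a markdown file that tells the agent what the skill does and a script that implements it. The folder and file names below are a hypothetical sketch of that layout, not the actual structure of his project:

```
~/.claude/skills/blood-sugar/
    SKILL.md      describes the skill and when the agent should invoke it
    analyze.py    self-contained Python script the agent runs
```

Because the whole skill is two plain files with no build step or runtime beyond Python, it can be dropped into another user’s skills directory and work unchanged, which is the portability argument he is making.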

Jeremy Jung 00:59:49 I think that’s a pretty good place to end it on. But is there anything else you think we should have talked about, or that you want to mention?

Scott Hanselman 00:59:55 I just think it’s a funny moment that we’re in, where people who are craftspeople are worried that the craft will go away. And I hope that it is just the toil that goes away, and the craft remains. I am having a blast. I hope I will be having a blast this time next year. Maybe we’ll chat and see how it turned out.

Jeremy Jung 01:00:14 All right. You mentioned you have a podcast, you actually have many podcasts. If people want to check out what you’re up to, where should they head?

Scott Hanselman 01:00:22 Just go to hanselman.com, my last name. If there is a tariff on podcasts, I’m afraid I will have to pay the tariff. I tend to yap, and I apologize for that. But Hanselminutes is my podcast; you can get to it at hanselman.com. It’s been over a thousand episodes over the last 20 years. And then I have a new podcast with Mark Russinovich, the CTO of Azure, where we learn something new every week. It’s called Mark & Scott Learn To. So those are two that people can check out.

Jeremy Jung 01:00:45 Yeah, I highly recommend both of those. It’s also amazing how long you’ve been doing Hanselminutes. I think you said it’s been over 20 years, right?

Scott Hanselman 01:00:52 Yeah, it’s been a minute. It’s been a minute. Every Thursday for the last thousand-plus episodes, it tries to showcase faces you might not ordinarily see. It’s the same three or four people who do the shows, and I’m guilty of that. But there are a lot of cool people doing a lot of amazing work. I recently had a lady who is the IT manager of the Philadelphia airport, and I learned all about how airports run; you don’t see folks like that on a podcast, and that’s amazing. So yeah, showcasing people who are doing great work.

Jeremy Jung 01:01:19 Very cool. Well Scott, thank you so much for coming on Software Engineering Radio.

Scott Hanselman 01:01:23 Yeah, it’s good to see you.

Jeremy Jung 01:01:24 This has been Jeremy Jung for Software Engineering Radio. Thanks for listening.

[End of Audio]
