
SE Radio 609: Hyrum Wright on Software Engineering at Google

Hyrum Wright, Senior Staff Engineer at Google, discusses the book he co-edited, “Software Engineering at Google,” with host Gregory M. Kapfhammer. Wright describes the professional and technical best practices adopted by the software engineers at Google. The wide-ranging conversation investigates an array of topics, including measuring engineering productivity and writing effective test cases.


This episode is sponsored by the Algorand Foundation.



Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Gregory Kapfhammer 00:01:13 Welcome to Software Engineering Radio. I’m your host, Gregory Kapfhammer. Today’s guest is Hyrum Wright. Hyrum is a senior staff software engineer at Google. He is the technical lead for the C++ concurrency primitives used at Google. Along with teaching at Carnegie Mellon University, Hyrum was an editor of and contributor to the book called Software Engineering at Google. That book is the topic of today’s show. Welcome to Software Engineering Radio, Hyrum.

Hyrum Wright 00:01:44 Thanks Greg, it’s great to be here.

Gregory Kapfhammer 00:01:45 Thanks for taking your time to participate in the show today. In order to dive in, I want to read you a few quotes that come from the Software Engineering at Google book, if you would be so kind. I’m going to read the quote, and I would love it if you could contextualize it and then try to explain it. Does that sound cool?

Hyrum Wright 00:02:03 Sounds great.

Gregory Kapfhammer 00:02:04 Alright. Quote number one is as follows: "With a sufficient number of users of an API, it does not matter what you promise in the contract. All observable behaviors of your system will be depended on by somebody."

Hyrum Wright 00:02:21 So, this is probably the thing that gets quoted the most, and it’s come to be known as Hyrum’s Law (I should be quick to point out that I did not name it Hyrum’s Law, so I want to be humble in that respect). But this is an observation that came from many years of doing software maintenance activities across the Google code base, and recognizing that every time we tried to change something as simple as a line number or a comment in a file or a log message, tests would fail in various interesting ways. And we started to realize that if we get enough users of our system, anything we try to change will break somebody in some way. This isn’t a new observation. There’s an XKCD comic about it, and there are other places you can find references to this online, but the presentation in the quote that you mentioned and the name Hyrum’s Law came out of the book.

Gregory Kapfhammer 00:03:10 So I heard you say that sometimes you could make a very simple change and then a test case would fail. Can you give us like a concrete example of when you’ve had that type of experience at Google?

Hyrum Wright 00:03:20 Sure. So we were trying to reorganize one of Google’s core file API primitives, and the result was that we were moving functions into different header files. We were changing just the basic structure of these primitives to be more understandable and more consistent across our code base. What ended up happening was that at one point we changed a comment line, which ended up changing line numbers within a file. We didn’t think anything of it, but we ran all the tests just to make sure we weren’t breaking anything, and tests started to fail. It turns out that some test somewhere was including the line number of the log message in its validation. Now, this is not good practice. You should never depend on things like line numbers that you don’t have any control over, but somebody had copied the error message and pasted it verbatim into their test. And so when we changed the comment, it reflowed the file, changed the line number, and the test started to fail. So even the line number of an error message was an observable behavior that somebody was depending on.
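
To make the anti-pattern concrete, here is a minimal sketch in C++ with GoogleTest. The function, message, and file name are hypothetical, not the actual Google code Wright describes; the point is the difference between asserting on the whole log string and asserting only on the part callers should rely on.

```cpp
#include <string>

#include "gmock/gmock.h"
#include "gtest/gtest.h"

// Hypothetical library helper whose error text happens to embed a source
// location when it reports a failure.
std::string OpenFileError(const std::string& path) {
  return "file.cc:127: cannot open '" + path + "'";
}

// Brittle: the assertion copies the message verbatim, so merely reflowing
// the library source (which changes the embedded line number) breaks this
// test even though nothing meaningful changed. That incidental detail has
// become load-bearing, which is Hyrum's Law in action.
TEST(OpenFileErrorTest, BrittleExactMatch) {
  EXPECT_EQ(OpenFileError("/tmp/x"), "file.cc:127: cannot open '/tmp/x'");
}

// Better: assert only on the part of the behavior that callers should rely
// on, not on incidental details like line numbers.
TEST(OpenFileErrorTest, ChecksOnlyTheMeaningfulPart) {
  EXPECT_THAT(OpenFileError("/tmp/x"),
              ::testing::HasSubstr("cannot open '/tmp/x'"));
}
```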

Gregory Kapfhammer 00:04:21 All right, thanks for those insights. We’re going to talk more about testing later in the show, but now I want to turn to quotation number two. So here it is: "Traditional managers worry about how to get things done, whereas great managers worry about what things get done and trust their team to figure out how to do it." What does that one mean, Hyrum?

Hyrum Wright 00:04:41 This is really about trusting, as the quote says, the people who work for you to do the job in the way that they think best. As a manager, you may manage many different teams and many different individuals, and you cannot be the domain expert in all of the things that you’re responsible for. It can be tempting, especially if you have a technical background and lead technical teams, to jump in and dictate exactly how they should solve the problem that you’re giving them. But if you’re smart, as the quote mentions, great managers will just tell the team what needs to be done and let them, as the domain experts, figure out how to accomplish it. Managers still need to support the team. That doesn’t mean you’re completely hands off, but it also means that you trust them and their expertise to get the job done.

Gregory Kapfhammer 00:05:28 Okay, so those two quotations have revealed very clearly that the book has both technical and professional content inside of it. This leads me to the next question. When you were writing the book, what type of audience did you have in mind?

Hyrum Wright 00:05:42 So we really wanted to share the things that we had learned within Google with other practitioners. My co-editors and I spent a lot of time talking at conferences and in other external forums about the experiences that we have had at Google, and we found that we kept saying the same things over and over again. We wanted to be able to write down the lessons that we had learned in a way that made sense for this external audience. We also found out that there was a significant desire within academia to take some of these lessons and present them in college courses. And so every once in a while, I’ll find some random college course somewhere that lists the book as either required reading or supplementary material, and that really makes me happy because it means that this message is getting to a much broader audience than we even intended when we started writing.

Gregory Kapfhammer 00:06:31 So if you don’t mind me saying, it actually turns out to be the case that I teach one of those random software engineering courses, and we use the Software Engineering at Google book in the class.

Hyrum Wright 00:06:41 Well, I’m grateful to hear it. I hope that your students find it useful.

Gregory Kapfhammer 00:06:43 Yeah, we find it really useful. And one of the things I noticed in the book is that even though it’s called Software Engineering at Google, the entire book isn’t about topics like testing or debugging or writing computer programs. So why did you decide to have a book that has both technical content and additional content about things like knowledge sharing or project management?

Hyrum Wright 00:07:07 Software engineering encompasses a much broader scope than just programming. I tend to think of programming as carpentry. Somebody can be a very good carpenter and do excellent work, but that’s not sufficient to build a house. You need architects, you need concrete workers, you need a bunch of other trades and skilled individuals to build the house, and you need somebody overseeing the entire project. And so software engineering in some sense really encompasses the entire process required to construct a software system, not just programming. Both people who get hired into a place like Google and people who are in university think, my job is to write software; I’m not successful unless I’ve written some code today. But our job is to solve problems, and there are a lot of pieces of the problem-solving process beyond just writing code.

Gregory Kapfhammer 00:07:59 Thanks for your response. I think that brings up a good point. Can you give a concrete example of how software engineering is different than programming at Google?

Hyrum Wright 00:08:07 So if your job is just to program, then you’ll spend your time optimizing for how much code you can write today, and you may write more code than you need to. Software engineering is an interesting field in that if you’re writing the same thing that you wrote yesterday, you’re probably doing it wrong. Software can be reused. We have libraries, we have all kinds of abstractions that we can build. Even though we teach undergrads how to write linked lists, if anyone writes a linked list in a professional setting, they’re probably doing it wrong. And those kinds of things, I think, really differentiate software engineering as a whole. If we’re not careful, we can confuse it with programming and optimize for programming, when we really need to be thinking about the whole system and the whole software engineering process.

Gregory Kapfhammer 00:08:50 When I was reading the book, one of the things I noticed is that when you’re working at Google, scale and efficiency challenges are a big deal. You mentioned reuse a moment ago, but can you comment briefly on some of the challenges you have faced as a software engineer at Google when it comes to scale or efficiency?

Hyrum Wright 00:09:11 So I’ve often said the problems at Google are problems of scale, not of kind. Email existed before Gmail, but no one had given a gigabyte of data storage at the time, right? That’s a scale problem. Search engines existed before Google, but no one had the comprehensiveness that the Google search engine did when it came out. So they are problems of scale, not of kind. In software engineering, some of our problems are about testing. How do we test everything that needs to be tested in a reasonable way? How do we index all of our source code and give discoverability into our process in a reasonable way? How do we ensure that we have appropriate quality across our code base in a reasonable way? And when I say in a reasonable way, we’re talking about a code base that’s hundreds of millions of lines large, and that’s not anything that anyone can review manually. When we do migrations, how can we do them in a reasonable way, in a scalable way? We’re really looking for sublinear scaling costs as opposed to something that scales with the size of the code base, because as the code base continues to grow, you don’t want to have to be investing ever more resources into just keeping it maintained.

Gregory Kapfhammer 00:10:18 You mentioned several concrete examples like for example, being able to index all of the source code. Can you explain why you would need to do that indexing and what would be the practical ramification of getting the indexing wrong so that it didn’t handle the scale that was necessary?

Hyrum Wright 00:10:33 So there’s a couple of consumers of something like a codebase index. One is, as I mentioned, the large-scale maintenance tools that read this index and then can run static analysis and other kinds of tools across the code base to do maintenance activities, find bugs, or perform other kinds of analyses that give insights into the software for the owners of every project at Google. Another consumer is engineers themselves, right? They want to look at the code browser, they want to see what’s going on. We find that a great way of learning to write software is looking at other people’s software and finding out what they did. So if you’re using a new API for the first time, you can browse Google’s internal code base and see how that API is being used and how you should be using it. If we’re not indexing everything, then those two processes fall down. We can’t effectively migrate everything because we can’t get a complete picture of the world, and somebody may not be able to find the examples they need or make the changes they need to their existing system.

Gregory Kapfhammer 00:11:32 So when you talked about indexing, it sounds like that’s a challenge you’ve faced in the past. I’m wondering since the book was published, have there been any new scale or efficiency challenges you faced?

Hyrum Wright 00:11:43 I think one of them, and it’s not just us but the entire industry, is how we can use our physical plant more efficiently. How can we use the systems that we have? How can we run our software more efficiently so that we can better provide service to customers and provide service to more customers? In the last few years, AI has become a set of workloads that requires a significant amount of compute power, and we’re thinking about how we can make sure that we provide those services to users in a scalable way without having to completely rebuild our physical plant. And so these software engineering scalability problems still have a lot of meaning today.

Gregory Kapfhammer 00:12:23 When you talked about the idea of indexing the code, you also mentioned the phrase you need to read the code. So I’m wondering can you talk a little bit about some of the strategies that you employ at Google when it comes to, for example, working in a team or sharing knowledge effectively?

Hyrum Wright 00:12:38 We think about code as something that is written once but read many times. And so we optimize for readability, not writability. What this means is that style guides and other tooling essentially encourage a little bit more robustness in the code base, which may take more time to write, but also means that somebody coming along later will be able to read and understand that code more easily.

Gregory Kapfhammer 00:13:04 That’s a good point. Later in the show I hope to talk about style guides. Before we get into the details about style guides, I wanted to point out a couple of really catchy phrases that I remember reading in the book. So this first phrase was called the genius myth. What is the genius myth at Google?

Hyrum Wright 00:13:22 So I love that phrase. That’s from one of the chapters written by Brian Fitzpatrick, who I’ve known for many years. And it’s basically this idea that in tech generally we think about the genius programmer, the person who can do everything. We sometimes hold up examples like Linus Torvalds or Guido van Rossum or Bill Gates, somebody who is really a standout in their field, and that’s actually not true. I’m not trying to say that those folks aren’t really smart in their domains, but if you look at how systems are actually built, they’re not built by just one person in their own space, building a thing that springs fully formed from the mind of Zeus. They’re built over many iterations, with many different inputs, with many different people involved. And over time, that team will be better than any single individual.

Gregory Kapfhammer 00:14:16 So at Google, if there is a genius myth, how do you organize your teams to ensure that you don’t fall into the trap of the genius myth?

Hyrum Wright 00:14:24 We want to ensure a lot of communication within the team, whether by email or chat or through code review. We want to make sure that things are done in as public a way as possible. Now, there are business constraints around how broadly you can share something within or outside a company, but we want to make sure that things are being done as publicly as possible. That means getting early feedback on things like design docs or project design. Code review means being willing to be receptive to feedback. A lot of engineers, myself included, and early-career engineers specifically, can struggle with getting concrete and sometimes critical feedback on their code or their designs. It’s not that you’re a bad person; it’s about how we can help you improve your system. And the reaction to that is oftentimes that we tend to hide things; we tend to keep things hidden until we think that we have the best product. But over time, that actually takes longer to get to the best outcome, and it doesn’t always lead to the best outcome, because you can’t fix things farther down the road.

Gregory Kapfhammer 00:15:26 Yeah. A moment ago you used the word hiding and in fact that connects to another quotation. I remember in the book the quotation was something like hiding considered harmful. Can you tell us why it’s harmful to hide based on what you said a moment ago?

Hyrum Wright 00:15:40 So it relates to a lot of those things I just mentioned. We want to get early feedback on designs and implementation. We know that the team is stronger than any individual and so getting good feedback on a design in process will give you a better outcome. If you hide it for too long, then you might actually get to the point where you can’t actually make meaningful changes based upon feedback because certain parts of the process have already finished and you can’t go back and remake some of those decisions.

Gregory Kapfhammer 00:16:10 So it sounds like you’re saying if software engineers are going to hide their work, it may limit their productivity and also limit the productivity of their team. Exactly. So my next question is at Google, how do you actually measure engineering productivity? What strategies have you developed?

Hyrum Wright 00:16:27 So I wish that I could say, here is a recipe that everyone can use to apply engineering productivity to your organization if you just do what we did. I’ll take a step back and say the entire book was not written as "you should do what we do." It was written as "these are the problems that we have encountered, and these are the ways that we have solved them," hopefully in a way that’s useful to your organization. So as I talk about things, keep that in mind. Engineering productivity is a really difficult thing to measure. There are a lot of concrete things we can measure. You can measure lines of code written or number of bugs fixed. There’s a famous Dilbert comic where one of the people says, I’m going to go code myself a minivan this weekend, after the boss says that he is going to reward the number of bugs that people fix.

Hyrum Wright 00:17:11 So you can have bad metrics. The way that we approach engineering productivity is, first of all, we talk to engineering leaders. What do you want to measure? What are you trying to improve? What outcomes do you expect? And frankly, would you change your behavior if you get an outcome different than the one you expect? We oftentimes find engineering leaders are interested in knowing something, but finding it out isn’t going to change their behavior. And so it takes a lot of honest discussion to make sure that we’re measuring things that are meaningfully going to change the behavior of the organization. And then we talk about goals and signals and metrics. A goal is the outcome that you want; it may not necessarily be clearly specified, but it’s the outcome that you want. A signal is the thing that you would like to measure.

Hyrum Wright 00:18:00 But oftentimes it’s difficult to measure the signal concretely. If I ask you, is your happiness better today than it was yesterday? You might be able to say specific events happened that increased your happiness, but that’s sometimes a difficult thing to concretely measure. Metrics are the concrete proxies for signals that we can measure. We have to be careful in designing our metrics because we want to make sure that they are the things that actually properly reflect the signals that will give us the goal that we want. Most people will game metrics if we tell them that this is what they’re being measured by; they will start performing to the metric. And so we want to make certain that the metric is the thing we care about. The other side, and this kind of goes against what I was just saying about hiding, is that sometimes we will tell people we are measuring something, but we don’t tell them exactly how it will or can be used. We don’t want them to change their behavior based upon the fact that we’re doing the measuring. We want to change their behavior maybe in some other way, but we don’t want them to just increase the number of changes that they make or the number of bugs that they fix because we happen to be measuring that thing as part of a more holistic whole.

Gregory Kapfhammer 00:19:14 Yeah, you made a couple of really interesting comments, which is that when we’re presented with a metric, we sometimes over-optimize towards that metric. Another thing I heard you say a moment ago is that sometimes engineering managers will want to measure something but then actually not change their behavior when they get that data. Can you comment on why that happens, Hyrum?

Hyrum Wright 00:19:35 I’m not exactly sure all the time. I think sometimes it comes down to a natural human reaction to want to hear the things that we want to hear, right? There’s a certain confirmation bias in people, and we are looking for specific outcomes. Google is a data-driven company, but like many data-driven organizations, there is a temptation to find the data that matches your existing worldview. And if that’s going to be the case, then it doesn’t really make sense to try to measure productivity in a different way. Now, I would say this is pretty rare. Most of the time when people are looking for an engineering productivity experience, they’re trying to find something that will actively help their organization, and they’re willing to adjust based upon the things that they find.

Gregory Kapfhammer 00:20:19 I know you were one of the leaders of Google’s code health team. First very quickly, could you define what the code health team was, and then perhaps could you share a concrete example of how you measured productivity on that team?

Hyrum Wright 00:20:32 So our team was responsible for building out a platform that basically managed all of the large-scale change automation that flowed into Google’s code base. Now, there are a lot of words there, and I don’t know how deep we’ll get into some of the large-scale change stuff; I think we will a little bit later. But that team basically provided service to other teams to ensure that they were able to make effective, broad, sweeping changes across Google’s entire code base. One of the ways that we measured productivity was not necessarily in how our specific team performed, because ultimately we’re serving other parts of the company. We’re looking at things like: are we able to effectively provide service to those other teams? How easy is it for them to onboard onto the platform? How easy is it for them to integrate with the platform if they’re already on it, or if they’re adding new changes into the system? Simple things like throughput and latency of changes flowing through our platform. Ultimately, our platform was responsible for making engineers across Google more effective and more productive, and so those metrics helped us know if we were doing our job.

Gregory Kapfhammer 00:21:41 Thanks for that response. If listeners are interested in learning more about measuring engineering productivity, they might want to check out a prior episode of Software Engineering Radio, Episode 317. We’ll reference that in the show notes for greater detail. Hyrum, what I’d like to do now is turn our attention to something you mentioned previously, which was the idea of style guides and code reviews. I know you’re an expert in C++ programming. Could you give an example of a style guide rule that Google adopted in the context of C++?

Hyrum Wright 00:22:12 Well, firstly, I’d say I’m not sure anyone could be an expert in C++. It’s a very complex language and ecosystem, but I appreciate the comment. Our C++ style guide is published; it’s publicly available. You can search for it and find the rules and requirements that we have. We have a number of rules that, as I mentioned earlier, are optimized more for the reader and not necessarily for the writer. Things like: don’t use auto all the time. When the auto keyword was first released as a form of type deduction in C++11, there were a lot of commentators who said, always use auto, use it every place you can; it helps the compiler, it helps the ecosystem, or what have you. We have found that having types spelled out where reasonable is actually an important part of program comprehensibility. It means that the reader of a piece of software doesn’t have to go figure out what this auto stands for. It may take a little bit more typing and a little more time, but it makes it easier for a reader to come through. Now, like most style guide rules, there are reasonable places to use auto. It’s not a pure ban on the auto keyword, but it is a place where we ask you to have nuance and use good judgment instead of just always using it everywhere.
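
As a rough illustration of that trade-off (my own sketch, not an excerpt from the Google C++ style guide): spelling out the type tells the reader more at the use site, while auto is reasonable when the type is already obvious from the right-hand side.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

void PrintIndex(const std::map<std::string, std::vector<int>>& index) {
  // Harder on the reader: the element type is invisible at the use site, so
  // understanding the body means chasing the declaration of `index`.
  for (const auto& entry : index) {
    std::cout << entry.first << ": " << entry.second.size() << " items\n";
  }

  // Easier on the reader: the type is spelled out where it is used, at the
  // cost of a little more typing for the writer.
  for (const std::pair<const std::string, std::vector<int>>& entry : index) {
    std::cout << entry.first << ": " << entry.second.size() << " items\n";
  }

  // A place where auto is reasonable: the type is obvious from the
  // right-hand side, so spelling it out would only add noise.
  auto label = std::make_unique<std::string>("index summary");
  std::cout << *label << "\n";
}

int main() {
  PrintIndex({{"posts", {1, 2, 3}}, {"users", {4}}});
  return 0;
}
```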

Gregory Kapfhammer 00:23:26 So do you have a specific strategy at Google for enforcing the adherence to these style guides?

Hyrum Wright 00:23:31 We have a number of tools. So we have a number of static analysis tools and linters and other kinds of pipelines. We’ve found that style guides are most easily complied with when engineers are helped with them. So for example, historically style guides have had a lot of content around formatting: you have to use tabs or spaces, indent this way, braces go on this line, am I using K&R style or some other style, right? Those are actually uninteresting questions; just pick a style and use it. But it also means that we want to be able to enforce that programmatically, so we can check whether your code is compliant with Google’s whitespace guidelines, but also give you tools to enforce those. So for example, clang-format, which was written at Google as part of a large-scale change process, will run automatically over code and just style it the right way. As an engineer, I no longer have to remember where the brace goes or what the whitespace has to look like, because the tooling enforces it for me, and that’s really powerful.

Gregory Kapfhammer 00:24:33 So are there examples of things in your style guides that you can’t automatically enforce with a tool?

Hyrum Wright 00:24:38 The auto example is one of them, right? Because it requires a little bit of nuance. There are a number of other pieces. It’s been a while since I’ve been in the style guide, so I don’t have any off the top of my head, sadly, but there are a number of other things that require a little bit of nuance. Should I name this function this way? What should the comment look like? Maybe those are some good examples.

Gregory Kapfhammer 00:24:55 So when I’m contributing myself to an open-source project, oftentimes these linters or other checkers are run in continuous integration. Another thing that I often run in continuous integration is the test suite. So what I want to do now is turn our attention to some of the testing techniques that you’ve explored at Google. I have to say, you gave an interesting talk at a C++ conference, and I think it had a tongue-in-cheek title. The title was All Your Tests are Terrible: Tales From the Trenches. Can you tell us a little bit about what that talk covered?

Hyrum Wright 00:25:29 So that’s actually a talk that I co-gave with Titus Winters, who was another editor on the Software Engineering at Google book. And honestly, that was the most fun talk to give, and I think that it has really stood the test of time over the last 10 years. As we do a lot of changes to the Google code base, we get to see a lot of tests, we run a lot of tests, and we fix a lot of tests, and we noticed a bunch of anti-patterns in testing that we boiled down and distilled into this talk. I’ll let your listeners go and find the talk themselves; it’s on YouTube. But there were patterns: things like having too much boilerplate, or not testing the right things. If your system uses a vector, for instance, don’t test the behavior of the vector, which may change (this goes back to Hyrum’s Law); test the behavior of your system. Make sure that your tests are readable and understandable to other people who may come back through them later. All of these were common principles that we found as part of our experience digging through everyone else’s tests across the code base.

Gregory Kapfhammer 00:26:36 So later we’re going to talk about an example of what might be a quote unquote good test case. But before we do that, I remember you used the word boilerplate, and that connects to something in the book, which was about writing the quote unquote smallest possible test. Can you comment briefly: what is the quote unquote smallest possible test for some source code component?

Hyrum Wright 00:26:56 You want to test the basic behavior. Lots of tests will want to test all the corner cases and everything, and that’s useful, but sometimes you find that a single test case tries to test a bunch of different scenarios. Start by writing the test for the common scenario, then write separate tests for the borderline conditions, the cases that you expect to fail or expect to have odd behavior. One of the interesting things about tests is that they are, in many cases, the first user of your system, and there is a great opportunity to exercise the APIs the way a user of your system will. Writing the simplest possible test gives you a good feeling for how a user is going to use your system in the common case.
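
A rough sketch of that advice, using a hypothetical ParsePort function rather than anything from the book: one small GoogleTest case covers the common path, and the borderline conditions are split into their own separately named tests.

```cpp
#include <optional>
#include <string>

#include "gtest/gtest.h"

// Hypothetical function under test: parse a TCP port number from text.
std::optional<int> ParsePort(const std::string& text) {
  if (text.empty()) return std::nullopt;
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + (c - '0');
    if (value > 65535) return std::nullopt;
  }
  return value;
}

// Smallest possible test: just the common, expected-to-succeed scenario.
TEST(ParsePortTest, ParsesATypicalPort) {
  EXPECT_EQ(ParsePort("8080"), 8080);
}

// Borderline conditions get their own tests, so a failure report
// immediately says which behavior regressed.
TEST(ParsePortTest, RejectsNonNumericInput) {
  EXPECT_EQ(ParsePort("http"), std::nullopt);
}

TEST(ParsePortTest, RejectsOutOfRangePorts) {
  EXPECT_EQ(ParsePort("70000"), std::nullopt);
}
```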

Gregory Kapfhammer 00:27:43 You’re making a really good point. I often think of test cases as being like executable documentation. Yes. One of the things I heard you mention is that when you make the test small and focused, it helps you to avoid some of the gotchas with testing. I’m predicting that one of those gotchas is flaky test cases. In fact, the book says that flaky test cases are expensive at Google. So this leads me to my next question. What is a flaky test and why are they expensive at Google?

Hyrum Wright 00:28:12 So a flaky test is a test that behaves differently, or appears to behave differently, under the same set of conditions. So if I run the test, it may pass, and then I run the test again and it fails. I haven’t changed anything about the system under test and I haven’t changed the test itself, but I’m getting different outcomes. Now, something has changed in some way, unless there’s randomness inherent to the test; something has changed, but it’s often not considered part of the thing being tested. There are a couple of reasons why they’re expensive. One is that we tend to rerun flaky tests multiple times in order to actually determine whether they’re flaky or not. This has obvious expense in terms of computational cost. The other is that oftentimes the people running your tests aren’t on your team. Tests are a way for other people to validate whether their changes interact correctly with your software, so if you have flaky tests and they run them and they fail, then they have to spend a bunch of time learning about your system, understanding how or why this test is flaky, and trying to figure out whether that failure is actually related to the thing they have changed or whether it’s just part of the flakiness of your system.

Hyrum Wright 00:29:29 And because of the scale of tests at Google, even if we have tests that are flaky one ten-thousandth of the time, we’re running enough tests that those will show up with a high degree of probability across a code-base-wide test run.
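
As a contrived sketch of how flakiness commonly creeps in (my own example, not one from Google's code base): a test that asserts on wall-clock timing can pass or fail depending on machine load, while a test that asserts on the observable result does not.

```cpp
#include <chrono>
#include <thread>

#include "gtest/gtest.h"

// Flaky: the assertion depends on wall-clock timing, which varies with
// machine load. With the same code and the same inputs, this test can pass
// on one run and fail on the next.
TEST(RefreshTest, FlakyFinishesWithinTenMilliseconds) {
  const auto start = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(std::chrono::milliseconds(5));  // stands in for real work
  const auto elapsed = std::chrono::steady_clock::now() - start;
  EXPECT_LT(elapsed, std::chrono::milliseconds(10));  // sometimes true, sometimes not
}

// Steadier: assert on the observable result of the work, and leave timing
// questions to dedicated benchmarks rather than correctness tests.
TEST(RefreshTest, ProducesTheExpectedValue) {
  const int refreshed_value = 42;  // stands in for the result of the real refresh
  EXPECT_EQ(refreshed_value, 42);
}
```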

Gregory Kapfhammer 00:29:46 It sounds like you’re saying that flaky tests are expensive for two distinct reasons. Number one, they have a computational cost and then number two there’s a human cost associated with figuring out why the flaky test is failing and then deciding what you’re going to do. So this leads me to my next question. How does Google handle flaky test cases?

Hyrum Wright 00:30:06 Well, on a technical level, we rerun the flaky tests multiple times during off peak hours to try to determine whether they’re flaky or not and the degree to which they’re flaky. Our technical ability has increased a lot over the last several years. And so even when we run all the tests, the continuous integration system will determine should I run these flaky tests or not? Are they likely to give good signal as to the correctness of the change that I have under test? And then it will surface that signal to the person that’s requesting the testing so that they can make a decision whether this test is something that they should look at or not.

Gregory Kapfhammer 00:30:39 There’s two other testing topics that I wanted to quickly talk about. The book mentions that software engineers at Google measure the adequacy of their test suite through something called test code coverage. What is code coverage and how do you use it at Google?

Hyrum Wright 00:30:52 So code coverage is, again, a thing where if you’re not careful, you make the metric the goal. Code coverage is a tool, but it’s not the be-all and end-all of testing. It’s a way that we can determine whether a specific code path is exercised under test. So if I have a bunch of if statements in the function that I’m testing, and my tests only hit the true condition on all those statements, I’m really not testing the full functionality of that function; I’m only testing a specific path. And so code coverage is a way of determining which paths through a function are being run while the system is under test. There are a couple of other techniques that we use as well. We use mutation testing, where the system will, not entirely randomly, change lines of code and then rerun the tests and say, wait a minute, with this kind of mutation we would expect a failure in your test suite, but your test suite didn’t fail. This likely indicates that you don’t have sufficient coverage over this part of your code. If we’re not careful, we can just optimize for coverage. Coverage is useful, but it’s not sufficient for a comprehensive test suite.
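
Here is a tiny sketch of what only exercising one side of a branch looks like (a made-up function, not an example from the episode); the comments note what a mutation-testing tool would report.

```cpp
#include "gtest/gtest.h"

// Function under test: two paths through the if statement.
int ClampToLimit(int value, int limit) {
  if (value > limit) {
    return limit;
  }
  return value;
}

// On its own, this test gives only partial branch coverage: the
// `value > limit` path is never taken, so a bug on that path (say,
// returning `value` instead of `limit`) would go unnoticed. A mutation
// tool that flipped the comparison or swapped the return values would
// report that the suite failed to notice the change.
TEST(ClampToLimitTest, PassesSmallValuesThrough) {
  EXPECT_EQ(ClampToLimit(3, 10), 3);
}

// Adding a test for the other branch closes that gap.
TEST(ClampToLimitTest, ClampsLargeValues) {
  EXPECT_EQ(ClampToLimit(30, 10), 10);
}
```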

Gregory Kapfhammer 00:31:57 That’s a really insightful response. I remember you mentioned a while ago that we want to have the smallest possible test, however, I noticed that the book also advocates for the use of something that’s called a test double. What is a test double? And then at Google, what are the trade-offs that you face when it comes to using these test doubles?

Hyrum Wright 00:32:16 So we talk about fakes and mocks, test doubles as a kind of a class of things here. And oftentimes we’re testing against complex systems. So if I’m testing against say a database, I want to test my software. My software is really the thing that’s most important to test, not the database, but I still need to interact with the database to exercise my software appropriately. And so a test double is a stand-in for a complex system. Maybe it’s a small database that has specific data in it that I can interact with directly. Maybe it’s a pre-scripted set of responses to a server that is somewhere else. And so when I issue requests, I get a specific set of responses back from the server. It may live in memory in the same process as the system being tested so I don’t have to worry about network effects or availability of the remote server.

Hyrum Wright 00:33:06 Those kinds of things are really useful for testing. They also add another complexity, because they aren’t the real system. So we’re getting farther away from the actual production environment that our software is going to live in, and if we’re not careful, the double may not accurately reflect the software as it will eventually be run. That can cause false confidence. The other problem is that they have to exist, and oftentimes the teams that create these expensive systems aren’t the ones running tests against them. So they’re not necessarily incentivized to create the doubles that other teams would then use to run their tests against. And so there’s an incentive mismatch there that, if we’re not careful, can mean that other teams build test doubles in a half-baked kind of way, and that’s not really a sustainable way of using these across the ecosystem.
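
A minimal sketch of the idea, with a hypothetical key-value interface (not an actual Google API): production code depends on a narrow interface, and an in-memory fake stands in for the real backing store during tests, so no network or remote server is needed.

```cpp
#include <map>
#include <optional>
#include <string>

#include "gtest/gtest.h"

// The production code depends on this narrow interface, not on a concrete
// database client; that is what makes a test double possible.
class KeyValueStore {
 public:
  virtual ~KeyValueStore() = default;
  virtual void Put(const std::string& key, const std::string& value) = 0;
  virtual std::optional<std::string> Get(const std::string& key) const = 0;
};

// Test double: an in-memory fake that lives in the test process, so the
// test needs no network, no server, and no state shared with other tests.
class InMemoryKeyValueStore : public KeyValueStore {
 public:
  void Put(const std::string& key, const std::string& value) override {
    data_[key] = value;
  }
  std::optional<std::string> Get(const std::string& key) const override {
    auto it = data_.find(key);
    if (it == data_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, std::string> data_;
};

// Code under test only sees the interface.
std::string Greet(const KeyValueStore& store, const std::string& user_id) {
  return "Hello, " + store.Get(user_id).value_or("stranger");
}

TEST(GreetTest, UsesStoredNameWhenPresent) {
  InMemoryKeyValueStore store;
  store.Put("u1", "Ada");
  EXPECT_EQ(Greet(store, "u1"), "Hello, Ada");
}

TEST(GreetTest, FallsBackWhenUserIsUnknown) {
  InMemoryKeyValueStore store;
  EXPECT_EQ(Greet(store, "unknown"), "Hello, stranger");
}
```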

Gregory Kapfhammer 00:34:49 You’ve given us a lot of really interesting insights into some of the low-level details related to testing at Google. I want to quickly take one step back. We’ve talked about flaky tests or test doubles and test coverage, but earlier in the show we talked about scale and efficiency challenges. Can you comment a little bit about the scale of testing at Google?

Hyrum Wright 00:35:10 So I can’t give specific numbers, but I will say that we run a lot of tests. It’s possible, with any candidate change, to test the transitive closure of things that are dependent upon that change. So for example, if I’m making a change in a low-level library that most of the Google infrastructure uses, I can run essentially all the tests at Google against that low-level infrastructure. This can be expensive, so we have various techniques to do test pooling, test selection, and pruning the tree a little bit, because in some sense I really care about the tests that are close to me in the dependency graph and not the things that are farthest away. If one test far away fails, it’s probably more likely to be flaky than to actually be a bona fide failure of my specific change. And so figuring out how to effectively run tests in this way is an interesting computational challenge.

Gregory Kapfhammer 00:36:07 You talked about it being a challenge when it comes to computation, and I want to quickly turn our attention to a discussion of static analysis across the Google code base, because I expect that’s a challenge in a similar way. Before we get started with that topic, can you briefly comment: what is static analysis and how do you use it at Google?

Hyrum Wright 00:36:26 So static analysis is a technique whereby we can look at the source code of a program and, by analyzing just the source code, not compiling and running the program, learn things about its behavior. This is contrasted with dynamic analysis, which is running the program, oftentimes with instrumented libraries, and learning additional things about it. We do both kinds of things, but we find that static analysis can actually be really, really powerful.

Gregory Kapfhammer 00:36:58 So in light of the fact that static analysis can be powerful and now you’ve introduced that testing is a kind of dynamic analysis, can you briefly talk about the trade-offs or the benefits and limitations of static and dynamic analysis?

Hyrum Wright 00:37:12 So static analysis can’t catch everything, obviously, and that’s okay. This is why we have many different steps of assurance, but we want to catch as many things as we possibly can with static analysis, because it’s earlier in the development cycle; we don’t actually have to run the tests, we can run the static analysis process over the code in question. There’s a great story, and I don’t remember if it made it into the book or not, but for a long time we’ve had threading annotations available to users within Google’s code base. So you can annotate a variable as, I must hold a specific lock as I access this variable, or I cannot call this function while I’m holding this other lock. And the static analysis tooling will actually verify that you are holding locks in a correct way and will fail to compile the program if you’re not.

Hyrum Wright 00:38:01 This is much better than getting paged in the middle of the night because you’ve got a deadlock somewhere or your system just crashed. There was a team that was getting a bunch of failures in their system. They weren’t common; every couple of weeks the system would crash, they didn’t really understand why, and they could never track it down. Then they decided, we’re going to go add a bunch of these static analysis annotations to our code base. And in doing so, their code failed to compile. Gee, that’s weird, I wonder why. Oh look, there’s a bug here. They fixed the bug, and it fixed the problem that they could never track down on their own. With the additional insight that static analysis gave them, they were able to fix that bug.
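
The publicly available form of what he describes is Clang's thread-safety analysis together with the annotation macros in Abseil; here is a minimal sketch, assuming Abseil is available and the code is built with -Wthread-safety.

```cpp
#include "absl/base/thread_annotations.h"
#include "absl/synchronization/mutex.h"

// A counter whose invariant (hold mu_ before touching count_) is written
// down as an annotation, so the compiler can check it at build time instead
// of a race or deadlock surfacing in production.
class Counter {
 public:
  void Increment() {
    absl::MutexLock lock(&mu_);
    ++count_;  // OK: mu_ is held here.
  }

  int Get() const {
    absl::MutexLock lock(&mu_);
    return count_;
  }

  // int Broken() const { return count_; }
  // Uncommenting Broken() makes the build fail under -Wthread-safety:
  // reading count_ requires holding mu_.

 private:
  mutable absl::Mutex mu_;
  int count_ ABSL_GUARDED_BY(mu_) = 0;
};
```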

Gregory Kapfhammer 00:38:44 Wow, that’s a really good example. I remember in the book it mentions a static analysis tool called Tricorder. What is Tricorder and how do various teams at Google use it for static analysis purposes?

Hyrum Wright 00:38:56 So Tricorder, and there is an ICSE paper about Tricorder from several years ago that folks can dig into, is a platform for doing scalable static analysis. Tricorder itself doesn’t own the static analysis tooling, but what it does is separate the concerns of scaling and domain expertise. So we have people who work in Java or C++ or any number of other languages who are really experts in that language, and they can write static analysis tools that say, this is a bug-prone pattern, you shouldn’t use this, or this is an API that we’re trying to migrate away from, don’t use that, in fact use this other thing instead. And what Tricorder does is take all of those things, collect them, and then present them at an appropriate point in the development cycle. So it may present them in code review.

Hyrum Wright 00:39:46 If I’m writing a piece of code and I accidentally write a bug, and there’s an existing static analysis check for that bug, the code review tool will tell me: you have just written a bug; click the button and we’ll fix it for you. This also shows up if new static analysis checks are added; we can then see those findings in the existing corpus of code. So the code index, the code search tool, will show you as you’re browsing: hey, this is a place where your code can be improved. Do you want to create a change that makes that improvement? And we even go so far as to pull out existing findings and automatically, proactively send them to teams as well. So Tricorder is the central part of managing that process, but there are a lot of other pieces that go into making that automated improvement of source code actually work.

Gregory Kapfhammer 00:40:32 So that sounds like an awesome and super useful tool. For listeners who are interested in learning more, you mentioned ICSE, and that’s the International Conference on Software Engineering. I’m hoping that they’ll go there to check out that paper, and we can include it in the show notes as well. I want to turn our discussion to the next topic, because I know you’re an expert on the topic of large-scale code changes. So very quickly, what is a large-scale code change, and why do software engineers at Google have to make these types of changes?

Hyrum Wright 00:41:01 So a large-scale change, or an LSC as we’ve come to call them, is a change that you want to make to the code base that is large enough that it’s impossible to do in an atomic change, right? It may span tens of thousands of files. It could be something as simple as, I want to call a different function: I’m deprecating some function and I want people to call some other function instead. Or, I’m deprecating this type and people need to migrate to this other type. It’s neither technically possible nor desirable to make that change across hundreds of millions of lines of code in one sweeping move, and so we have to figure out how to break it up into smaller chunks. An LSC is generally done because we can improve efficiency or improve developer experience across the code base. We try to centrally manage those: instead of every team having to make this change, we recognize there’s value in having a centralized team make those changes for everybody else.

Gregory Kapfhammer 00:42:00 So to be clear, is an LSC something that’s done manually, is it done automatically or is it some combination of automated and manual approaches?

Hyrum Wright 00:42:10 It kind of depends on the nature of the change. We want to automate as much as we possibly can. Sometimes it’s as easy as writing an appropriate static analysis tool (we were talking about Tricorder a minute ago) so that people can automatically see what needs to happen in their code base, click the button, and get the fix themselves. There are certainly a lot of automated tools that we can run across our code base. There’s another paper from several years ago called ClangMR that talks about the way we automate changes across our code base by using the compiler, and tools built on top of the compiler, to make those changes. So we want to automate as much as possible. There are times when we can’t automate, or there are certain circumstances in which we can’t automate, and as a result we’ve actually built something that is an outgrowth of the Covid era.

Hyrum Wright 00:42:58 We built a platform we call Busy Beavers for crowdsourcing a lot of those kinds of tasks. Sometimes there are tasks that need to get done but still need a human to look at them, and the Busy Beavers platform allows anybody to enumerate a set of tasks, write some good instructions, post it on an internal website, and then anybody else can wander by and just start picking tasks off that list. These tasks are designed to be relatively easy to start, relatively easy to finish, but also easy to put down if someone has to go work on something else. You can imagine this was very useful during an era of Covid shutdowns when no one knew what was going on, but its utility has actually lasted a lot longer than just that.

Gregory Kapfhammer 00:43:40 That sounds interesting. So when you do a large-scale change through this Busy Beavers system, are you requesting changes to the production code base, or to CI infrastructure, or to test cases, or can it be all of those types of things?

Hyrum Wright 00:43:56 Usually it’s to the production code base, but it may also be test cases that need to be modified as part of the production code base.

Gregory Kapfhammer 00:44:02 Okay. So once we make these large-scale changes, my main question is, how do you know that the large-scale change was correct?

Hyrum Wright 00:44:11 So I’ll go back to the more automated system example. If we need to migrate something that exists in 10,000 files, we can’t test that as one change. I mean, we can test it using Google’s CI system, but if the test fails, it’s really difficult to figure out which file caused the failure. It’s also difficult to get that submitted to the code base because of merge conflicts, it’s difficult to roll it back if you need to for the same reasons, and it’s difficult to get it approved by 10,000 different people. So that is not really a feasible thing to do. What we do instead is try to design changes that can be split up into many different parts. A 10,000-file change may be split up into hundreds of separate changes that are only 20 or 30 files apiece. We can run those through our continuous integration system much more easily, and we have a much better signal as to whether that specific set of files is correct or not. Then we can get them reviewed by someone who has global ownership over the entire code base, and they can review them individually using automated tools and get those submitted. So how do we determine whether they’re correct or not? Using the same techniques that we would use to determine whether any other change is correct: automated testing and review.

Gregory Kapfhammer 00:45:22 Okay. You mentioned before that the Tricorder tool can automatically suggest a change to the code base. If I’m a software engineer at Google, can I trust that the Tricorder change is going to be correct? Or does it also have to go through code review and automated testing?

Hyrum Wright 00:45:39 So it also goes through code review and automated testing, but this question of, is this correct, is actually an interesting cultural one. Early on, when we were doing a lot of these automated changes, people would push back: how do I know that this change is correct? Do I trust you? There’s a strong culture of stewardship within Google’s code base. Everyone can read all the source code, but specific individuals and teams are responsible for maintaining specific parts of the code base. And so the idea of an automated system, or even a centrally managed team, coming and making changes to my software was a little bit foreign. We had to build a lot of trust by demonstrating that the changes we were making were actually beneficial. And the feedback we’ve gotten over the last several years, as we did surveys and other kinds of discussions amongst engineers, has been that they really appreciate these kinds of automated fixes and cleanups across their own code bases.

Gregory Kapfhammer 00:46:32 Thanks for that response. If listeners are interested in learning more about testing, we’ve had a number of episodes on the topic on Software Engineering Radio, so you may want to check Episode 595 or Episode 572, which we’ll link to in the show notes. Hyrum, we’ve talked a lot about the various approaches that Google takes in order to create good software. Obviously, creating this book itself was a major endeavor. Can you tell our listeners a little bit about the process that you followed simply to create the book in the first place?

Hyrum Wright 00:47:02 Sure. As I mentioned, we knew we had a message that we wanted to get out, but we weren’t sure how to do that. Fortunately, Tom Manshreck, who’s one of the other co-editors of the book, had worked in publishing before. He was the technical writer who oversaw and really project-managed the entire effort, and he knew the process, so that really helped. We went to O’Reilly and said, this is our plan, this is our thought, are you interested? They said yes, and then we needed to find some content. We actually ended up crowdsourcing the content within Google. While Tom, Titus, and I have a lot of expertise in many of these areas, again, the genius myth: we are not the experts in all of them, and so we really wanted to get the experts on each of those topics. So each chapter is written by a person who is on, or runs, the team that is responsible for that thing.

Hyrum Wright 00:47:52 So it’s really a great example of people sharing their expertise in a cohesive way. We also wanted to make sure that each chapter tied into the core themes of the book: time, trade-offs, and scalability. So we asked potential chapter authors to provide just a short thesis that demonstrated how their proposed chapter would tie into those themes. In the end, it was a lot of work, a ton of work. It may have been more work crowdsourcing the content than it would’ve been if we had written it all ourselves, but I feel like we got a better product out of it. I would also say March of 2020 was not a good time to release a book, but I’m glad it’s still done well.

Gregory Kapfhammer 00:48:30 Well thank you for that response, Hyrum, I appreciate it. It sounds like you applied many of the lessons about Software Engineering at Google when you were writing the book itself.

Hyrum Wright 00:48:39 Yes, there definitely was a certain amount of self-referential experience as part of that process.

Gregory Kapfhammer 00:48:43 So when you were writing the book, were there any, I might call them aha moments that you could share with listeners?

Hyrum Wright 00:48:50 So one thing: I mentioned those three themes of time, scale, and trade-offs. There’s another theme that emerged as we reviewed content and talked to authors, and that’s constraints. Many of us don’t like constraints in our lives. We want to be able to do what we want to do and do what we need to do. That’s not feasible in an engineering organization, especially a large engineering organization. We can’t let everyone decide what compiler they want to use, or what operating system they’re going to run on, or what programming language they’re going to use, or how they’re going to use that programming language, because that eliminates a lot of the benefits of the systems we have in place to ensure scalability. Picking appropriate constraints as an organization scales is actually a really powerful way of enabling that kind of scalability. We kind of knew that going in, but it’s amazing to me how that came out in the book, whether it’s, again, large-scale changes or build systems or testing infrastructure or style guides. All those things benefit from having appropriate sets of constraints. It’s possible to get them wrong; I’d like to think we’ve gotten a lot of them right.

Gregory Kapfhammer 00:49:53 So it sounds like what you’re saying is that in many circumstances a constraint can be positive, even though initially we might perceive it as being negative.

Hyrum Wright 00:50:01 Yes. I mean, we see that in our lives all the time, right? A red light is a constraint in some sense, but I’m grateful it’s there so I don’t get hit by somebody coming the other direction. And even in engineering culture constraints are very positive if we choose them correctly. Again, it’s really easy to get them wrong, and I think a lot of times we do get them wrong as a society, as an engineering culture, but good constraints actually add a lot of power to an organization.

Gregory Kapfhammer 00:50:24 Thanks for that response. There’s a lot of content that we, regrettably didn’t have time to cover because of time constraints. So I’m wondering, is there something specific in the book that we didn’t discuss that you’d like to highlight now?

Hyrum Wright 00:50:37 I briefly touched on those three themes of time, scale, and trade-offs. The most important one of those for me is time. As we write software, as you write software, think about how long this software has to last. Am I just going to throw this away tomorrow? Is it a bash script that’s going to exist on my command line, or is this something I need to have for the next 10 years? At Google, we actually think a lot about this; our software is going to last for a long time. But that’s not always the case for everybody. As you think about how much time your software has to last, that informs a lot of the other engineering decisions you make. How do I have to maintain this? Do I have to be responsive to operating system changes or programming language changes, or do I just have to hope that it lasts until next week, or, if you’re a student, until the programming assignment is due? That’s one of those themes that I think is really powerful but doesn’t show up in a lot of industry design discussions: how long does the software have to last?

Gregory Kapfhammer 00:51:30 Can you give a concrete example of how at Google you make specific decisions for software that has to live a long time?

Hyrum Wright 00:51:38 Sure. I mean, we’ve talked a lot about maintainability and the need to be able to maintain software. So we know that our existing corpus of software is going to have to survive language changes, library updates, operating system changes, all those kinds of ecosystem-related changes, changes in the environment the software runs under. That’s going to happen at Google, with software that runs for a long time. And so we have to think about how we’re going to update our systems. If the compiler updates, are we going to be able to fix all the bugs that the new compiler may find that the old compiler didn’t find? And how do we do that using our processes, or do we need to invent new processes? Those really aren’t problems that you have if you’re just trying to get something shipped next week and then never have to worry about it again.

Gregory Kapfhammer 00:52:28 You’ve shared with us a number of thought-provoking insights, and I really appreciate that. In a way, you’ve already answered my next question, but let’s say for example, we have a junior software engineer who’s maybe studying the topic or getting started in one of their first jobs. What kind of call to action would you give to that individual?

Hyrum Wright 00:52:47 So my advice to them would be: think beyond just programming. Again, a junior person has probably spent most of their university time, or most of their time, focusing on how to write code. Think beyond just that. Think about where your code fits in a broader scheme of things, and why these processes exist. It can often feel very limiting. I have to run tests, I have to write designs, I have to participate in meetings; I just want to write code. And I get that, I feel that way still sometimes, but those things are useful in terms of solving the actual business problem. If you want to write code, that’s great, but it’s a means to an end, particularly if you’re employed somewhere, right? They’re paying you to solve a specific problem. So focus on the bigger picture and not just, I want to write this piece of code today.

Gregory Kapfhammer 00:53:36 That’s great advice. Now, let’s imagine for a moment that our listener is a well-established software engineer. What do you think that they should do to level up their knowledge and skills based on the content in the book?

Hyrum Wright 00:53:48 I mentioned earlier, the book isn’t intended to be "you should do the things that we do." So I want to be careful about dictating what they should do, especially to folks who may have more experience than I do. I think the thing that I would suggest is: if you are facing similar problems, if you are having difficulty scaling, or if you’re working in an ecosystem that doesn’t have a good testing culture or a good code review culture, or some of those other issues that we talked about in the book, think about how you can influence your existing environment in a way that makes those things possible. Maybe you have to change environments, right? That’s always on the table. But think about it: can I bring a better code review experience to my employer, if that’s something you see as lacking? Our hope is that, again, we don’t tell you how to do these things, but we get you asking some of the questions that maybe you’ve had difficulty asking in the past.

Gregory Kapfhammer 00:54:39 Wow, that’s a really good point, Hyrum. Thanks for sharing. For me, one of the things that was beneficial about the book was not only understanding and learning about the best practices at Google, but doing exactly what you just said, which is pausing to ask questions about what might fit best for my own development practices. So thanks a lot for the book. As we draw our episode to a conclusion, I’m wondering if there’s anything that we’ve left out that you wanted to share with listeners before we conclude.

Hyrum Wright 00:55:05 I think the one thing I would say is, we would appreciate your feedback. The book is available open access on abseil.io, and you can always buy a hard copy. I don’t get any royalties from it, so I’m not saying that as a form of self-promotion, but if there are areas in the book that you think we could improve, or that should be addressed in a separate way, I don’t know that we’re planning another edition, but we still want to hear about it, so please let us know.

Gregory Kapfhammer 00:55:29 All right. Thank you so much for sharing. If listeners are interested in learning more about practices for developing good software, they may also want to check out Episode 430. Hyrum, thank you so much for taking time to chat with the listeners of Software Engineering Radio. We really appreciated this opportunity. It’s been informative and educational. If you are a listener who wants to learn more, I hope you’ll check the show notes for details, because we’ll share all the things that Hyrum has mentioned in those notes. Hyrum, thank you so much for being on the show.

Hyrum Wright 00:55:59 Thanks a lot, Greg. It’s been a lot of fun.

[End of Audio]
