Chris Riccomini and Dmitriy Ryaboy discuss their book, The Missing Readme, which is intended to be the missing manual for new software engineers. Felienne spoke with Riccomini and Ryaboy about a range of topics that new software engineers might not have learned about in university or bootcamp, including how to handle technical debt, how to test code, how to make sure code fails gracefully, how to manage dependencies, and how to do code reviews.
This episode sponsored by Shortcut.
- Episode 400 on Code Reviews
- Episode 462 on The Programmer’s Brain
- Episode 295 on Legacy Code
- Chris on Twitter
- Dmitriy on Twitter
- Missing Readme book site
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Felienne 00:00:16 Hello everyone. This is Felienne for Software Engineering Radio. Today with me I have two guests. Firstly, Chris Riccomini is a software engineer, startup investor and advisor with more than a decade of experience at major tech companies, and the author of Apache Samza. Also with me, Dmitriy Ryaboy is a software engineer and engineering manager who has worked at a variety of companies and helped create Apache Parquet. He is currently the vice president of software engineering at Zymergen. So, did I get that all right? Dmitriy and Chris wrote the book The Missing README, and that is the topic of today’s episode. So firstly, a question: what is this book about? Who is this book for? Of course it’s called The Missing README, but why do we need it?
Chris Riccomini 00:00:59 First off, thanks for having us on the show. It’s great to join you this morning, at least morning for us. The book is really for a few different audiences. The impetus for writing it was that a few years ago I inherited a team of engineers to manage, and many of them were pretty entry level. I was doing the normal round of one-on-ones, getting to know everybody and talking about what they’re working on, and I found myself repeating what I was saying quite a bit. I converged on the idea that there should really be a manual for this, to help new software engineers get up and running and off the ground, and integrated into the flow of working at a company as a professional. So I would say one of the audiences is actually the managers who are hopefully excited to hand this book to the people working for them, to help bootstrap new hires. Our aspiration, and actually our intent, is that we keep a stack of these under our desks for people as they onboard; if they need help getting going, we can hand them one.
Chris Riccomini 00:02:00 The second and main audience, I would say, is engineers in the zero-to-five-year experience range, either just coming out of college or just entering the workforce. This cohort of engineers has a lot of good computer science fundamentals: they’ve been trained in operating systems and compilers and programming languages and things like that. They tend to have, and I’m painting with somewhat of a broad brush here, but this has been my impression, a little bit less experience doing what I wouldn’t quite call softer skills, but just knowing how a company works: how to interact with a manager, how to interact with other teams, even how to work with a team. Oftentimes in university you might have some projects that you do with other students, but a lot of your work is done in isolation.
Chris Riccomini 00:02:48 And of course, when you join a company, almost all of your work is not done in isolation. It can be a little bit disorienting. So that’s, I would say the second audience and then the third audience is somewhat similar, but I think has sort of the inverse skillset of a new college grad. And this is a bootcamp type engineers. So people that have, you know, maybe zero to however many years of real world work experience, but in, in perhaps a different field. So these, you know, we’ve hired bootcamps that we pay the company. I currently work for bootcamp graduates that have come from, you know, there were lawyers, we had, uh, somebody that we hired that was an editor like a book editor. They often have a fairly broad skillset and also experience interacting with managers, working at companies, understanding how things work in a little bit more detailed than perhaps a new college graduate might.
Chris Riccomini 00:03:36 But on the other hand, they tend to have a little bit less on the CS fundamentals side of things. Whereas a new college graduate has four years of class after class of CS fundamentals, a bootcamper might have six months of intensive training, oftentimes in a full-stack environment. They get trained in one or two languages and then a stack of some sort, whether that’s Node or Python or something like that. So the book somewhat ambitiously tries to cover all of this, but it’s not meant to cover it in super deep detail. What we do is cover a lot of the technical stuff that you wouldn’t get taught in a university setting: some of the details of modern testing, how you interact with legacy code and work on refactoring, how you do operations. New college graduates are not often taught how to be on call, and it turns out that when you go into the workforce, oftentimes you go on call and you get paged and you have to answer questions, and nobody really tells you how to do this. So there’s a technical portion of the book, and then there’s a more process-oriented portion about managers, promotions, even design discussions, stuff like that. It’s meant to address both of these audiences.
Felienne 00:04:46 That’s a great answer; it’s such a diverse book that covers different skills. And of course, me working in a university, part of your audience is coming from my pipeline, so I very much recognize this. We teach them CS fundamentals, but there’s so much more to know.
Dmitriy Ryaboy 00:05:01 It’s exactly that. Just to add to what Chris was saying: most of the time, the way people learn how to code and the way they get into their first software engineering job is by working on their own projects, and they don’t really get to see the work of maintaining the code and working with a team to keep it running, right? You get this very nice, clean... maybe you have an exercise where there’s an existing piece of code and you need to modify it, and that’s great, but then you modify it and you’re done. You submit your assignment, you get your grade, you never look at it again. Whereas when you’re working on something professionally, you’re looking at the thing all the time, and other people are looking at that thing all the time, and it’s evolving under you.
Dmitriy Ryaboy 00:05:40 And it’s a different mode of working on a lot of the things that we spend time on in the book are about sort of the practices that have developed within the industry for dealing with that. And you just don’t really get that experience either in college or bootcamp. And often you don’t know what you don’t know. And in terms of the Genesis of the book, I had a similar experience to Chris, by the way, like Chris and I have never actually met in real life. This is we’ve sort of known each other on the internet for a decade or so, but the experience is really parallel to each other, new people join and they know how to program, do they have this? Maybe they have experience from other things, but there are certain things where they don’t even know what you’re talking about when you start talking about it and to somebody who has been around for a while. It’s so second nature, we don’t recognize this as a topic to teach because it’s not algorithmic complexity, you know, or clever ways to traverse a binary tree or whatever. Like those things are obvious that like, there’s a thing here. You need to learn it. How to do logging well is just logging. Right. But actually it’s really complicated.
Felienne 00:06:42 Yeah, I think we will definitely get to logging and more practical skills. But first, I want to go back to something you were just hinting at: the idea of technical debt. At a university or in a bootcamp, you do an exercise and maybe you change code, but you will definitely not be given a million lines of ten-year-old C code. What I really liked about your book is that you describe two different forms of technical debt that were new to me: prudent and reckless. So maybe you can give a brief summary for people in the audience who might not know at all what technical debt is, but specifically prudent versus reckless, and what do you think that distinction adds?
Dmitriy Ryaboy 00:07:24 Prudent and reckless technical debt. This is an idea we got from Martin Fowler, who has a really informative website, martinfowler.com. He describes a matrix of the kinds of debt you might take out, your standard two-by-two: on one side you have deliberate versus inadvertent, and on the other you have reckless versus prudent. And you can have any combination of those two, right? You might have something that’s reckless and deliberate, where you’re deliberately being reckless, or something that’s reckless and inadvertent. We get into it in the book, but essentially, on the deliberate and prudent side, you consciously make a decision to take on some debt and you have a plan for how to address it, or you consciously decide, after evaluating it, that this is a risk you’re willing to take. Then there’s inadvertent debt, where you don’t know what you don’t know. Reckless and inadvertent is “I don’t know what’s going to happen, but we’re just going to go”: you write some stuff, you throw it out there, and then you have problems. And there’s prudent and inadvertent, where you know there’s an unknown, you need to move forward, and you say: here’s how we’re going to monitor this to learn more.
Dmitriy Ryaboy 00:08:34 And when we learn more, inevitably, we’re going to do something about it. And the fact is that, especially that kind of debt is unavoidable. If you’re going to make any forward progress, right, as we start building a program, as it starts being used by real humans, you find out more about the problem. And then you learn how maybe you should have solved it, right? And then now you technical debt, but you couldn’t have addressed it before you started writing. And in fact, the definition of technical terms, the original definition where the term came from is about that, about learning the original term came about from trying to explain, to sort of non-technical people that sometimes we get a write the thing and it’s not going to be quite right because through the process of creating software to solve this problem, we’re going to learn more about the problem and we’re going to understand it better. And that’s why we always need to revise, right? So that’s why it’s not the right ones. And it’s done. It’s usually write it, improve it, keep improving it, it keeps living in,
Felienne 00:09:33 I think this is such a useful distinction, especially in need for people that are new, that don’t really understand how to do something. And then you want to communicate like sometimes it’s okay if you don’t understand how to do something, but sometimes you can make some predictions. One of the things you advise in the book sort of against technical debt to pay off technical debt. So to say is to do minor. Refactorings like make small fixes. So I was wondering like, where does that advice come from? Because I can imagine that making small fixes can also be like, oh, here, oh, that’s all I wrote there. And then, you know, you build this bigger monster, isn’t that sort of called flexing with creating more reckless technical bets. How do you see that
Chris Riccomini 00:10:15 In practical experience, addressing technical debt takes a lot of different strategies, so it really depends on the kind of technical debt you’re dealing with and whether you know what to do about it. There are certainly cases where it’s appropriate to do large-scale refactoring efforts. Those are really complicated and really hard to get right. I have seen them done correctly, but it usually requires the entire organization grinding to a halt before there’s an appetite to do that. So oftentimes you find yourself having to do things incrementally, whether that’s the ideal approach or not, and in some cases it is the ideal approach. When you’re doing a smaller refactoring, there are two kinds of debt you can address in that mode. One is minor cleanup, where it’s not a systemic problem you’re dealing with, but something that’s limited and maybe more isolated.
Chris Riccomini 00:11:11 So maybe there’s some utility class that needs to get refactored and everyone knows it needs to get refactored. You’re mucking around with it. You might decide that this is something that I’m going to take on. It’s fairly, very minor. I can do it in, you know, a day or two. And I think as long as that’s handled appropriately, and then the commits are separated. So it’s not intermingled with your regular feature where you can be pretty successful with that. The second kind of technical debt approach is where you actually know the term, like what the end state that you want to get to. And you can March toward it incrementally. That’s like the ideal, right? And so for example, with my own teams, we will sort of have an escalating approach to technical that we’re on the first chunk. We have sort of the minor stuff that we do as we’re working. And usually we’ll budget a bit of time to do some of that work. But the second step up from that is like, there is a project that’s like maybe a quarter that we can budget in. And the third one is sort of like that very large chunk of work.
Dmitriy Ryaboy 00:12:11 Yeah. And in addition to what Chris just said, the reason we make a point of talking about minor refactoring is that it’s very easy to get into these dichotomies of people saying just don’t touch it. And, you know, unless you’re really, really ready to tackle it, leave it alone. Or, you know, you just gotta burn it down and start all over. Like let’s rearchitect the whole thing. We’re going to switch languages. It’s, there’s a new framework on hacker news. So let’s adopt that. You can make things better, but just fixing them as you go along, like while you’re in there, you see that, you know, the API does some internal API is not quite right or there’s inconsistency in how certain libraries are used or something needs to be upgraded. Just go ahead and do it. You know, it doesn’t need to be a giant, big deal where, you know, it’s prioritized by the, all the managers and there’s a committee that decides what to do, refactoring.
Dmitriy Ryaboy 00:13:01 Like you see something that’s a little broken, you can just fix it. It’s okay. Especially when you’re inheriting the million lines of SICO type thing where it’ll just take awhile, the little things do add up, right? As long as there’s like Chris was saying, as long as there’s agreement about where you need to take things and you’re not just changing. So that now there’s 17 different coding styles in the same code base. You don’t need to stop the world and sort of redo the whole thing. Right? Like you can do it a chunk at a time as you go along.
Felienne 00:13:28 Yeah. I think that’s really good advice because it needs, if you wait for this perfect opportunity in which there will be no feature requests for three months, that’s really, it’s never going to have run. Right. So I totally understand you need to do this step by step. So you actually, in the book you described sort of an algorithm. So to say, to deal with legacy goals, maybe you can summarize that and explain the background of that to
Dmitriy Ryaboy 00:13:50 Sure. This is also isn’t something that we came up with. Most of the ideas in the book are borrowed. So this is the legacy code changer algorithm from Michael feathers, who wrote a book called the working effectively with legacy code. I guess it’s a little bit dated. Now it’s came out in 2004, but it’s, it’s quite good. And the algorithm as he proposed it is to identify change points, find, test points, break dependencies writers, and then make changes and refactor. And we walked through each of those steps and explain what Michael feathers means by those. Although of course, to get the full treatment, you should just read the book. This is our general approach to all these topics with tell you what the general idea. So if you know, there’s a, there, there, and that’s the thing and people talk about it and there’s materials. And then we give you references for books or other resources to, to really get deep on it.
Felienne 00:14:40 Alternatively, of course, if instead of reading the book, you could also listen to, so for engineering episodes, 209 to five, in which I actually interviewed Michael feathers about this book. So that might be like this little between then our episode, that really dives into legacy codes in a whole episodes. And then maybe read the book. And with that, maybe we can also move on to the next topic because in these, your book, no, COVID a few of these topics as a little bit of a, a broad view to weave it all together. Something else that your book talks about, which I really thought was very valuable as a defensible program. What is defensible program?
Chris Riccomini 00:15:18 So this is, I think a term that’s been around for awhile and people do it oftentimes sort of intuitively, but they’re essentially a set of practices you can use to kind of both protect yourself as the author, from having to deal with bugs later on, and also to protect the software from unexpected behavior. So there’s a whole set of practices around no pointers, for example, and input validation. So especially in typed languages, you know, you get a parameter passed in and you really have no clue what’s in that parameter. So you can assume it’s the string. You always hoped it would be, or you can expect it to be, you know, anything. There’s a whole set of these practices to protect everybody that’s involved with the code and also to handle failures in a more graceful.
Dmitriy Ryaboy 00:16:03 One thing we do in the book is we have these sidebars of sharing our personal experiences or experiences we’re very close to, and that illustrate this point and like how this stuff comes about. So if you don’t mind, I’ll share the one that we have about defensive programming, where I made a big goof that as far as I know is still out there affecting people. My first job, I might’ve been still in my undergraduate at the time, but it was working on this genomic analysis service. This was back when the human genome project was just, well, it was fully underway, but it hadn’t finished yet. So this is like around the year, 2000 early two thousands. And we provided a online service for scientists to upload DNA sequences and do something with them, search for stuff at the time, the way that a lot of scientists, a lot of biologists thought about DNA as strings, they just needed a place to type it, right?
Dmitriy Ryaboy 00:16:58 So often we expected a textile, right? The textile, there was a standard format for how you’re presented DNA, you know, all the ACC and GS. And we explained what the format is, but then of course scientists would go and like type it straight up into word and upload a doc file a word document. My crappy little parser didn’t parse that because it expected a text file. And so of course, internally through an error, I tried to read the thing and it didn’t work. And what resulted for the user was no results are found. We didn’t tell them that there was a problem processing their input. We told them there are no results, right? And so they would then contact. Some of them would contact us and say, why were there no results that are different? They should be. And some of them would just go away thinking there are no results.
Dmitriy Ryaboy 00:17:38 And like that goes into their scientific research, right? So it’s rather terrible. And of course my terrible solution to this problem was to put a big icon on the submission form that had a word crossed out with the red line and that drove down the number of those errors. And so everybody was happy. And, you know, I got an attaboy from my manager for solving it this way, but that’s not defensible programming, right? Like this is avoiding the problem. This is, I didn’t write a parser for word. I didn’t throw an error up on receiving this thing to say, don’t submit it this way. Please submit it the different way. So there was nothing there, fundamentally that prevented there. It was just sort of trying to get the user to not do the stupid thing where I know that they were doing, you know, it was the classic blame to user’s name, but the visual pigment worked until they redesigned the website. I’d left the job. And the literally, while writing this book, I went to check the website. It’s still around. The tool is still up. They changed the UI and that little icon is not there. So on a Lark, I decided to type some DNA into, into a doc file. And I said, metadata, and sure enough, it went clean through and I got zero results. So it’s still returning bad. 15 years later, I moved to 20 years later. It’s still, you know, giving scientists bed results because I was too lazy to deal with bed. Wow.
Felienne 00:18:55 That’s a great story. So this is an example of what is not defensible programming. So maybe you also have and why it matters, but I’m wondering if you also have a good example of like a pro example of what is defensible program
Dmitriy Ryaboy 00:19:10 In this case, the answer would have been expect that you will have all kinds of wild inputs from your users. Be it, you know, people upload their own kind of file or the data doesn’t match and be clear about your expectations and either, you know, ideally of course you treat everything and you’re able to process everything. But the silent failure is, is really the, the error here, right? Like if you’re going to fail, fail early and tell people like what the problem is so that they can address it. This comes up in a wide variety of places, user input, there’s the classic one, but also, you know, services changed their API APIs and things started coming out. And how are you going to deal with that? What you don’t want is a failure and a giant stack trace that’s facing the user. Right. Do you want to sort of think about your, your failure conditions and how to defend against them because they are going to happen, especially as we started working on
Felienne 00:20:03 This is maybe the graceful failing that’s Chris was mentioning, right?
Dmitriy Ryaboy 00:20:07 Yeah. Like, know that it’s going to happen. It won’t be the, the ideal flow all the time.
Felienne 00:20:13 Yeah. Something else you mentioned in the category of defensible programming is avoiding no values. So how, and maybe also why like, how can I know to use no values? I’m going to have to initialize something like, like, how does that work? Can you maybe give some practical examples of why that matters and how to go about it?
Chris Riccomini 00:20:31 Yeah, no, it’s definitely one of those hot topics, especially these days is that among a lot of more of the functional programmers among us, but one of the classic patterns that we see increasingly popular over the past decade or so is using like optionals instead of no. And so the idea there is that you’re not, or rarely dealing with an actual, no value. What you’re dealing with is a structure that is called an optional, and it can be either something or it can be nothing because of the fact that you are dealing with an actual object and you need to either pattern, match, or extract the value from it. It forces you through the API to think through both the, there is something here path and also the there’s something not here path and to deal with this, something, not your path explicitly. So that’s, I think a very common one.
Felienne 00:21:18 So maybe diving into that in more mainstream languages, because it’s clear how that would work and Haskell or as sharp, the, these people are functional languages, but now in, in Java or in, see if I want to have an optional or a maybe how do I do that? Practically, do I use a library? Do I write this myself? Do I just use this way of thinking?
Dmitriy Ryaboy 00:21:37 It’s not that complex. You can read it yourself. There are also libraries, the way things evolve, probably between us talking now and the episode coming out there might be a new one. There’s quite a few. I first encountered this line of thinking through Scala, but Java has since adopted that superficially at first, it seems like it’s not really different from now. Right? I can check if the thing is now do something, or you can say, if the option contains something, do something right. Like what’s the difference. I can still always get my pointers up from, by calling and get on the option at their own time. The major difference there is that novel will throw kind of runtime exceptions. If you do reference it wrong. Whereas the option mode that really allows you to deal with the thing within the kind of regular flow versus the exceptional flow.
Felienne 00:22:27 You mean, if it’s not as runtime, what you mean is that it will then fail. It’s compiled time. If you don’t handle it properly, right?
Dmitriy Ryaboy 00:22:34 I’m thinking of ways that we can still force it to fail at runtime because you can still end up doing something with the reference to knowledge. If you really try hard
Chris Riccomini 00:22:41 In general. Yes. If you are properly, there’s a big caveat, but if you were properly using, you know, optionals, you should essentially compile time warnings or errors. In other language, I was just dealing with recently a swift and they essentially have something like an optional let’s and you can do an explanation mark or whatever it is to get the actual value out. And what I found myself doing, and this is for a side project and I was just hacking around, but this is definitely not the right way to do things is I would just essentially call get on everything and ignore the other path, the non path, which is essentially the same thing. It’s just pretending that it’s a regular object and ignoring the fact that it could be no. And so if you abuse, optionals or ignore their intent, then certainly you can get runtime errors.
Chris Riccomini 00:23:27 But if you play by the rules and are diligent about it, you should, for the most part, be in much better shape and as with everything. And it is a big caveat in our book. Like almost everything is great. There’s almost no absolutes in computer science. So it’s hard to say for sure, you’re always going to be safe if you use optionals. Right? And in fact, there’s a line of thinking with Java, for instance, where they push you to only use optionals in certain situations. So all of this stuff has a lot of caveats around it, but anyway, at a minimum, if you’re abusing it and you understand what options are for it, it does give you a little Pang of shame every time that you cheat, uh, which is the feeling I felt as I was playing around with swift. So you’re at least somewhat aware that you’re breaking the rules of it. Yeah.
Felienne 00:24:06 Is this extra level of defense, right? So let’s do one more from the defensible programming, because I thought that was such an interesting line of reasoning. Also, as you said in the beginning, this is the type of stuff that you don’t learn in school. Like literally. So you also say reads, right? Smart. What is smart retrying? Is this an infinite loop of waiting for three seconds and doing the API call again, maybe that’s not smart. What is smart retrying?
Chris Riccomini 00:24:33 Anytime something goes wrong. You either have an option of giving up and failing or retrying or going through some alternate code path. Right. And oftentimes, especially when you’re dealing with network, the answer is like, Hey, let’s, let’s retry again, at least a few times, right? The most naive approaches, you just simply take the call you made and call it again. And there are a number of issues with that approach. So you can essentially dos a service by if there is a failure amongst a wide number of services. For example, imagine the machine that is receiving calls sort of at a steady rate. And then it has a hiccup, maybe a garbage collection or something, and it’s offline for a second. So now you’ve accrued a seconds worth of RPC calls amongst all of the upstream services that are in call it. So when it comes back online, if they all are just immediately retrying, then you now get this wave of RPC calls all coming to the service.
Chris Riccomini 00:25:24 And then of course it GCs again, because now there’s a ton more calls than unexpected until you get this kind of cascading effect. One of the things we talk about in the book is like telescoping back off or exponential backoff, or even just a random sleep where when you retry you, don’t just immediately retry. But you pause for a little bit of time and maybe introduce something called jitter, which is like essentially a random weight or a bounded random weight where you say, you know, I’m going to wait to plus or minus 500 milliseconds, something like that. That is two seconds plus or minus 500 milliseconds. And then maybe you tell us, go back. So it’s, you know, maybe it starts at 50 milliseconds, then a hundred and then 200 and 400. You’re not banging away on things. Something else that we talk about in this area is item potency.
Chris Riccomini 00:26:05 If the RPC call you’re making is incrementing a counter, for example, and you are retrying your call. That is not an item potent operation. So if the call was successful, but the network timed out, you don’t really know that the counter was incremented. So if you just naively retry, you are now double counting or triple counting or quadruple counting. You might need some kind of item, potent operation. And then for that, you could use something like the ID, incremental ID or UID to allow the server to essentially de-duplicate the calls that you’re making. So this is all network focused, and this is all sort of retry focus, the retry stuff. I think vocalize writ large within your own applications when you’re dealing with disc, when you know, essentially anything that’s not in your nice little cocoon of immediate post.
Dmitriy Ryaboy 00:26:47 Yeah. And of course, this is where we do get into an, uh, fairly active area of computer science research, proper computer science research on distributed systems. And there’s things like CRG and all kinds of other things. It is the case that for like undergrads or bootcamp graduates, they often don’t see that stuff. And they kind of run into it much earlier these days when everything is a service. And you’re talking to all kinds of things, online, people who haven’t taken the graduate course in distributed systems wide, that building distributed systems without realizing that they’re doing that. And ideally they would go for law and learn the distributed systems course material. But as it is, at least knowing that networks fail, you should expect that the network will fail. It will not just stay up. You should retry. You should not try forever. You know how to do it in a nice way, right? There’s some basics that we just want to make sure everybody generally knows. And then for the more advanced concepts, of course, there’s a whole wealth of material there.
Chris Riccomini 00:27:41 I just want to plug a book by a former coworker of mine. It's pretty widely known, sort of canonical at this point: Designing Data-Intensive Applications by Martin Kleppmann is the reference book for a lot of the distributed systems and data-intensive stuff. I highly recommend that book if, as a listener, you're into this subject, or for that matter if you're building microservices and actually dealing with this stuff on a daily basis. There's just so much there.
Felienne 00:28:05 Yeah, that's a great tip. We will definitely put a link to Martin's book in the show notes. Let's move on to a different topic, then. Something else that your book talks about is testing. Specifically, you distinguish different types of testing — the different uses of testing, I should say. So what are different ways in which testing is used in a project?
Dmitriy Ryaboy 00:28:25 Yeah. So there's all kinds of ways that you can test. There are methodologies where you write tests in order to validate your own code as you go along and encourage some best practices. There's unit testing, which is sort of validating that your logic is correct. There's integration testing, validating that the various components of your system work well together. There's acceptance testing, which is more about contracts, right? Like if a third party says you've built software that does X, Y, and Z, then acceptance testing is them checking X, Y, and Z and signing off that you delivered the thing you said you would deliver. There's a variety of different tests. One common mistake is that all the tests go into one bucket, and then you wind up in a situation where the tests you need to validate that the small changes you're making aren't breaking things, which should be really fast,
Dmitriy Ryaboy 00:29:13 are mixed with really complex integration tests that you need to run before you ship a version to production, which tend to be slow and tend to require a lot of setup. So you slow down your whole development cycle, and it's a very easy trap to fall into. So we talk a little bit about that sort of thing and different kinds of tests. There are lots of different philosophies on testing. There are some people who say, actually, don't bother with unit tests, just write the integration tests, because that's what the user sees. There are some people who really advocate covering everything with unit tests. We suggest that it's easy to go a little bit overboard with that and to test the wrong thing. It's a very deep topic; what the right test is can be controversial. This is probably the single chapter that we get the most feedback about, where people say that we got it wrong. There are quite a few different philosophies, and believe it or not, we're fairly aware of them. We kind of had to choose one that seemed to be reasonable for a beginner.
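One way to keep the fast and slow buckets separate can be sketched with Python's standard unittest module. The discount function and the RUN_INTEGRATION environment variable are made-up illustrations of the pattern, not from the book:

```python
import os
import unittest

def apply_discount(price, percent):
    """The 'unit' under test: pure logic with no external dependencies."""
    return round(price * (1 - percent / 100), 2)

class DiscountUnitTest(unittest.TestCase):
    # Fast: no network, no disk; safe to run on every small change.
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(20.0, 10), 18.0)

@unittest.skipUnless(os.environ.get("RUN_INTEGRATION"),
                     "integration tests are opt-in")
class CheckoutIntegrationTest(unittest.TestCase):
    # Slow: would talk to a real payment sandbox, so it only runs when
    # RUN_INTEGRATION is set, for example in the pre-release pipeline.
    def test_full_checkout_flow(self):
        ...
```

The exact mechanism varies by ecosystem (markers, separate directories, separate CI jobs), but the goal is the same: the everyday suite stays fast, while the expensive tests still run before a release.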
Chris Riccomini 00:30:07 One of the things we went through as we were writing this chapter: we thought it would be great to provide some examples — here's unit tests and here's integration tests in the wild. And so we went out and we looked at a bunch of open source projects, some of which we work on. And of course, what you discover in the real world is that one person's unit test is another person's integration test, and some people have integration tests that are actually acceptance tests. And so I think one of the things we highlighted in this chapter is that you've got to be a little bit pragmatic and flexible about when to use these things and what they're called, and don't be too dogmatic about it. The most important thing is that you're testing your code and things are working.
Chris Riccomini 00:30:49 And so, you know, very frequently you'll find integration tests that run as part of the unit test suite, and that might or might not be okay, depending on the situation. It's just a very messy area in terms of the naming hierarchy and where things get implemented and how they line up. The thing we wanted to highlight for the new software engineer is that it's common for things to be messy, and it doesn't mean you're a bad software engineer if you're doing things in a slightly messy way. It also doesn't mean other people are bad software engineers. I think one of the projects — I won't name the project — we looked at it and they had all their integration tests kicking off locally, and they were doing remote calls to cloud services and stuff. And so just to run the unit tests, you'd have to install service accounts for various cloud providers — that's a little bit on the not-great side of things. Like I said, you have to be somewhat pragmatic about dealing with this stuff.
Dmitriy Ryaboy 00:31:41 One of the points that we make about this observation is that these are projects that are very impactful, very useful, have affected thousands of engineers in a good way, made a lot of people much more productive. So let's not be too dogmatic about these things. When you see something like that, it doesn't mean, oh my God, this is terrible. It means, oh my God, this has actually been a collaboration between many different people with a lot of different backgrounds, and it's been evolving. That said, if it's slowing you down and if it's creating problems, fix it.
Felienne 00:32:09 I was going to say, this is actually, I think, good advice, because in many areas of software development — and maybe specifically in testing — you do have these different camps, and there's just no one way to get it right. So I do think, especially for junior developers, it's great to hear sort of the voice of the middle: that it's fine if one of your unit tests is actually also an integration test; it's not like you're done as a professional. This sometimes happens. I do think that makes sense as advice, specifically at that level. Yet another topic in the book — the book really is a rich collection of all these topics that matter for professionals — something that you also talk about is code reviews. Specifically, there's this idea, which for me was new, of doing a walkthrough with a code review. So what is a code review walkthrough?
Dmitriy Ryaboy 00:32:58 The general idea here is that a lot of the time these days, when people do code reviews, it's: make a change, put it up on GitHub, GitLab, whatever your tool is, and then somebody else looks at it. So it's asynchronous. And we give a bunch of tips for how to do this well, in a way that you critique the code and you don't critique the people — so it doesn't feel personal, it doesn't feel like an attack — how to write feedback, how to process it, et cetera. But it's also the case that sometimes the change is big enough that a little blurb that says "I'm fixing a thing" isn't sufficient to really get people to understand it. A lot of the time — this is, I guess, just how humans are, right — you put up a 20-line code change and you get five comments; you put up a 20,000-line code change with all kinds of things in there and you just get a "ship it," because you can't process that. So for larger changes, sometimes when you're doing really structural stuff, or it's just fairly intricate, we do recommend that you just grab time with a colleague and walk them through what you're doing. You say, here's what I'm doing over here, here's why I needed to modify that library, and here is a different way of thinking about it. You're really giving them rich context about what you're doing and why, and then they have the appropriate context to give you feedback on the code change.
Chris Riccomini 00:34:16 The other thing I would add is, oftentimes we'll get a gotcha like, oh, well, you're just not supposed to ship large pull requests — anything over some number of lines is a no-no, and so if you break it up, you'll never need a walkthrough or an in-person discussion or even a virtual discussion. And I think there are different kinds of large. There is large in terms of number of lines changed; there's also large in terms of the mental capacity needed to understand the complexity of what's going on. So, I'll give you a real-world example. We were refactoring a system at my company which deals with money. This system is our ledger; it's responsible for tracking — it's the source of truth for the money movement at the company. So it's kind of a big deal.
Chris Riccomini 00:34:57 And we were editing some of the logic in a state machine that was controlling it — it was essentially a coordinator amongst a number of machines that were tracking payments. And so this is a very sophisticated piece of logic. The code is only 500 lines long or so, and it's this loop that's doing a bunch of stuff based on events. The change itself might be 40 or 50 lines of code, but getting it right and understanding all the edge cases and what's going on, in addition to the testing and stuff — the whole team needs to know about this change, and it really needs to get explored in detail and discussed and really analyzed and thought through. And so setting up a walkthrough for a change like that is hugely beneficial versus blasting out the pull request, providing a design doc, and sort of walking away and waiting for comments. A lot of this stuff — especially now that work is remote — it's important to have real conversations where you can trigger some of these thoughts that might not happen in isolation.
Felienne 00:35:55 Yeah, I want to dive a bit deeper into that, because this doesn't really seem to fit the GitHub flow. So there is a pull request, and then comments happen asynchronously, which is also nice if you work in different time zones. So how do I do this? Do I plan something like this? Or is this something where you're discussing the pull request and at some point someone says, hey, maybe we need a walkthrough? Or do you have guidelines, like if there's more than a hundred lines, maybe we should do one? I love this idea — how do I fit this into my project?
Dmitriy Ryaboy 00:36:28 I think both of those things happen. So sometimes the asynchronous discussion on a pull request gets maybe a little acrimonious, or two people are just not understanding each other, or it's sort of, well, you can do it my way or you can do it your way, I don't really care — when that kind of conversation starts happening, maybe that's the time to actually have a little face-to-face conversation, virtually or physically. That resolves these things much more amicably and quickly. It's easy to lose tone and to not understand where people are coming from when you're just exchanging text messages. So that's why, when things feel like they're starting to go south, just take a pause and find the time to sync up in person. And the other case is when you know that there is this kind of rich context that everybody needs to understand, either because the change is sophisticated — not necessarily large, but maybe even subtle —
Dmitriy Ryaboy 00:37:20 and there are subtleties there that, upon just reading the code, somebody might not get, and you need them to know. That kind of change is probably not something that somebody fresh out of college should be doing, but if you think there's something going on, tell people — when you know what you're actually doing is setting up for seven more pull requests that are also going to be small changes, say, let me tell you where I'm going, and let's sync up. Then you do it preemptively. Different teams have different processes around that: maybe there's a standing design discussion meeting, or maybe you do it ad hoc.
Chris Riccomini 00:37:54 I just want to jump in on the open source side of things. It's one of those things where I think you'll find a lot of people are definitely willing to take time and have in-person chats, but it's not something that is really part of the culture right now in a lot of ways. But I have definitely had in-person chats on a number of different open source projects — Airflow, chatting with people at Amundsen, the data catalog that we use. And so oftentimes if you ask them, hey, can we hop on a Zoom or hop on a Google video chat — modulo time zones — it's a great way to discuss the change, but also to build relationships with the community. In some ways I would just encourage people to try it. You get a lot more engagement than you expect, and that is also quite useful.
Chris Riccomini 00:38:35 So I totally buy that it is not part of the culture, not part of the Git flow, GitHub pull request cycle writ large. I think some of the more mature projects — when I say mature, I mean in terms of process, so the projects that are under Apache or CNCF, the ones that have multiple companies working with them — are more apt, more likely to have community meetings, meetups, weekly syncs, stuff like that. In those cases, it's a little more common. But your average Ruby gem on GitHub that's maintained by, you know, maybe a college student or somebody — that's far less likely. But I don't think that's because they're unwilling to do it. It's just something that hasn't really happened as a mainstream thing, but it's really beneficial, and it does work.
Felienne 00:39:20 To close that topic: if you do want to know more about code reviews, we had a show on that as well — show number 400, where Dr. Michaela Greiler goes into code reviews a little deeper. But this book has so much more to offer that we could talk about. One of the things that came up very early in this episode already is logging. So logging is something that maybe you don't learn in school. Can you give some ideas of what practical best practices your book covers?
Dmitriy Ryaboy 00:39:47 I've spent an unbelievable chunk of my life dealing with logging, having served in the data field, which is not something I would have predicted coming out of undergrad. There's a couple of things. One — I don't want to over-generalize, but frequently when folks do their programming outside of a professional environment, logging is just writing some stuff to standard out or maybe to some file, right? It's a print line: I got to this point and I'm doing the next thing. And that's completely sufficient and fine. And then you get into industry and you realize people are using logging libraries that have logging levels, and you need to understand what those levels are. And it's just one of those things that is pretty fast to learn, but nobody bothers to teach you. We bothered to write it down so that you can read about it. What do these log levels mean? I mean, of course it's documented, right? People can look it up on the web, but you need to know to look it up. What do these log levels mean, what to put in where — a little bit of philosophy on, you know, what is a warning and what is info, and what's the difference between a warning and an error and a fatal, that sort of thing.
Felienne 00:40:55 Maybe we can dive into those topics a little bit more. So what are the different log levels, for people in the audience that might not be familiar with them?
Chris Riccomini 00:41:08 Let's see. Yeah, I'll take a crack at it: fatal, error, warn, info, debug, trace. I believe that is the ordering from most to least severe. And when I say that, every framework and/or language has its own verbiage, so some of them might get a little fuzzy there — not all of them have fatal, and not all of them have error — but that's generally the sequence you'll find, I believe. I would just say that in addition to the levels, we talk quite a bit about best practices: how to keep your logs fast, where logs go — are they going to disk, are they going remote — redaction, and not logging passwords or PII, which is personally identifiable information like emails and usernames and stuff. There's a lot that goes into good production-grade logging.
Chris Riccomini 00:41:58 It's not obvious, even from the logging API that you're calling. One of the most common things you'll see is someone will call log info, and they'll have their string that they're logging, and then they'll concatenate a dynamic value to that string — like "starting server with port" and then plussing in a port number, right? And string concatenation is a slow operation. And if you are not logging at the info level, you are burning CPU on something that is not useful. And so there are all kinds of practices and little tricks that are not obvious, where by tweaking the way that you're logging things, you can really get a lot more performance out of your application. And, you know, I've been in many situations where I've done performance analysis and, in the tight loop, the trace log line that is just never being read by anyone is taking up like 80% of the CPU cycles. So there are a lot of gotchas that are just not obvious with logging at all.
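The concatenation pitfall Chris mentions looks like this in Python's standard logging module. The lazy %s form defers formatting until the logger knows the line will actually be emitted; the logger name and port here are illustrative:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("server")
port = 8080

# Eager: the f-string is built even when this level is disabled, burning CPU.
logger.info(f"starting server with port {port}")

# Lazy: the logging module defers formatting until it knows the record will
# actually be emitted, so disabled log lines cost almost nothing.
logger.info("starting server with port %s", port)
```

The same distinction exists in most logging libraries (e.g. parameterized messages in SLF4J for Java), and it is exactly what makes a debug or trace line cheap when the level is turned off.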
Dmitriy Ryaboy 00:42:50 I used to joke that eventually my retirement gig would be a string concatenation consultant, where I would just go to businesses and fix how they use strings and make their code run 10% faster, because there's so much of that out there — and you can basically do it with a shell script, right? Just fix people's stuff. One practical tip: printed stack traces are usually multi-line, and once you get into industrial deployments, there are usually systems that suck up all the logging that different services do and make it available for search, et cetera, but they tend to work on a line-by-line basis. So if your message isn't contained within a single line, it gets broken up — and good luck reconstructing your 27-line stack trace that has been broken up into 27 separate messages and mixed together with similar logs from all the other deployments of the same service, all shuffled together, right? So there are plenty of ways to deal with that, like structured logging. There are loggers that will log things in JSON format for you and capture these sorts of things. It's a little bit of work initially, but then it saves you a ton of time if you're using one of these ELK-stack-type log processors.
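A minimal sketch of the structured-logging idea with Python's standard library, assuming a hypothetical JsonFormatter, shows how a whole stack trace can survive as one JSON line:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each record as a single JSON line, so a multi-line stack trace
    survives as one message in log aggregators that split on newlines."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # The whole traceback becomes one escaped string inside one line.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

In practice you would more likely reach for an existing structured-logging library rather than roll your own formatter, but the principle is the same: one event, one parseable line.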
Felienne 00:44:04 Nice. Yeah, those are really nice practical tips and advice for logging, which is one of those things that you would think in theory is simple, but in practice can lead to so many issues. We're nearing the end of the episode. Is there one topic from the book that you really like, where you think, oh, I have this piece of advice that I really want to share before we wrap it up?
Dmitriy Ryaboy 00:44:23 The section we have on handling dependencies and understanding versions. It is really not something people talk about enough, and it's one of the biggest time sinks and sources of unexpected problems that I see people run into — and not just juniors, right? It's really important to understand the dependencies that you're introducing, what those libraries are, what the versions you depend on mean, and what will happen when versions change — whether or not they will change anew without you knowing it, because of the way you've specified the dependencies. What 1.2.7 going to 1.2.8 means, versus 1.2.7 going to 2.1.2 — so the major and minor versions, and what to expect from them.
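Dmitriy's reading of semantic version bumps can be sketched as a small helper; the function and its messages are illustrative, not from the book:

```python
def classify_upgrade(old, new):
    """Rough reading of a semantic version bump (MAJOR.MINOR.PATCH).

    Under semantic versioning conventions: PATCH bumps should be bug fixes
    only, MINOR bumps add features but stay backward compatible, and MAJOR
    bumps may break you and deserve real attention before upgrading.
    """
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    if new_parts[0] != old_parts[0]:
        return "major: expect breaking changes"
    if new_parts[1] != old_parts[1]:
        return "minor: new features, should be backward compatible"
    return "patch: bug fixes only"

classify_upgrade("1.2.7", "1.2.8")  # a patch bump
classify_upgrade("1.2.7", "2.1.2")  # a major bump
```

Of course, this only describes what the numbers promise; as the discussion below notes, not every library keeps that promise, which is part of why pinning matters.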
Felienne 00:45:09 So, are there any practical tips there? Knowing what version numbers mean, that's definitely a tip, but is there any practical advice, or maybe a nice anecdote you have about dependencies that people can learn from? I think Chris has an idea. Yes?
Chris Riccomini 00:45:25 Again, there's no hundred percent guidance, but pinning your version numbers so that things do not change underneath you is something that I am fairly dogmatic about. And there are definitely competing philosophies: people who think you should pin major and minor but not the patch version, or who think that you should always pull in the latest stuff. But I've just dealt with so many problems that stem from pulling in some open source project that we have not pinned. And by pinning, I mean setting an explicit version number on a dependency, such that you will always use that version — versus an unpinned version number, where you essentially leave it to the build system to pick what the right version is, and in cases where there is a new version available, it will pull that version in for you. That behavior can do all kinds of exciting stuff during CI and build time, where things that worked and have not changed suddenly stop working for a variety of reasons. That class of problems is so maddening to deal with. It saps energy, and it ruins the fidelity of your tests. It's such an insidious problem.
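To make the pinning distinction concrete, here is what the options might look like in, say, a Python requirements.txt; the package and version numbers are just examples:

```text
# Unpinned: the build system picks whatever is newest, so a release
# you never asked for can break CI overnight.
requests

# Range-pinned: allows patch and minor updates, but not a new major version.
requests>=2.28,<3

# Fully pinned: the same version on every build, changed only deliberately.
requests==2.28.2
```

Most ecosystems (npm, Maven, Cargo, Bundler) offer the same three postures under different syntax, plus lock files that record the exact resolved versions.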
Felienne 00:46:33 I've experienced this type of thing myself, and it is so terrible, because you have this commit and you're like, okay, before this commit it works and after this commit it fails, so I'm going to read this a hundred times because this is the offender. And it's not the offender — someone else, that I have no control over, decided to remove a feature that I was depending on.
Chris Riccomini 00:46:56 Even worse, you decide, okay, I'm going to revert this commit, and you revert it — and it still doesn't work. That same checksum used to work.
Felienne 00:47:06 I think this is really good practical advice. This is the stuff from the trenches — and as you say, we don't teach this in schools or universities. You would say you have to learn this by doing it, but of course it's way more efficient when experts, senior people like you, share this advice and put it in a book.
Chris Riccomini 00:47:22 One other thing I wanted to throw out there, in line with your question about the one thing we could upvote into people's brains: it's compatibility. And this is adjacent to the dependencies, but really understanding forwards and backwards compatibility and what these different names mean. Whether you're dealing with data or APIs — whether it's a library API or a remote microservice API — understanding what it means when you add a required field, or when you delete a required field, or when you change a data type from, say, int to long or int to string, and what that is going to do to the upstream and downstream dependencies of yours. In a professional environment, it's so important to understand that you're working within a team and you're going to affect other people's code, and it's not something that is at all obvious. It seems so benign to just say, oh, I need a field in my database, or I need a field in my API, I'm going to set it to required — and then that simple change can bring down the whole website; the whole microservice, the whole application can crash. So I would say understanding compatibility is definitely something that we cover — in the evolvability chapter, we talk about that in some detail — because that would be my pet guidance.
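The required-field hazard Chris describes can be sketched with JSON messages: a reader that treats a newly added field as required breaks on data written before the field existed, while a reader that gives the new field a default stays backward compatible. All names here are hypothetical:

```python
import json

def parse_user_v1(blob):
    """Old reader: knows only the original fields."""
    data = json.loads(blob)
    return {"id": data["id"], "name": data["name"]}

def parse_user_v2_strict(blob):
    """New reader that treats a freshly added 'email' field as required:
    old messages, written before the field existed, now blow up."""
    data = json.loads(blob)
    return {"id": data["id"], "name": data["name"], "email": data["email"]}

def parse_user_v2_compatible(blob):
    """Backward-compatible reader: the new field is optional with a default,
    so data written by old producers still parses."""
    data = json.loads(blob)
    return {"id": data["id"], "name": data["name"],
            "email": data.get("email", "")}

# A message written before 'email' was added to the schema.
old_message = json.dumps({"id": 1, "name": "ada"})
```

Serialization systems like Protocol Buffers and Avro formalize exactly this: new fields should be optional or carry defaults, so that old and new readers and writers can coexist during a rollout.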
Felienne 00:48:37 That also circles back to Dmitriy's earlier point about versions: if you make something, it is not always clear how it's going to be used, and the interface around it might change. So in your software, you still need to be prepared for anything, because you depend on other people, and you don't know what you depend on — or who depends on you, if you're building a library or framework.
Speaker 0 00:49:00 Have you ever really been happy with your project management tool? Most are too simple for a growing engineering team to manage everything, or too complex for anyone to want to use them without constant prodding. Shortcut — formerly known as Clubhouse — is different, though, because it's worse. Wait, no, we mean it's better. Shortcut is project management built specifically for software teams. It's fast, intuitive, flexible, powerful, and many other nice positive adjectives. Let's look at some highlights. Team-based workflows: individual teams can use Shortcut's default workflows or customize them to match the way they work. Org-wide goals and roadmaps: the work in these workflows is automatically tied into larger company goals; it takes one click to move from a roadmap to a team's work to individual updates, and vice versa. Tight VCS integrations: whether you use GitHub, GitLab, or Bitbucket, Shortcut ties directly to them so that you can update progress from the command line. The rest of Shortcut is just as keyboard friendly with their power bar, allowing you to do virtually anything without touching your mouse — throw that thing in the trash. Iteration planning: set weekly priorities and then let Shortcut run the schedule for you, with accompanying burndown charts and other reporting. Give it a try at shortcut.com/seradio. Again, that's shortcut.com/seradio. Shortcut — again, formerly known as Clubhouse — because you shouldn't have to project manage your project management.
Felienne 00:50:14 Where can we read more about your work? Like, where can we find the book? We will make sure to put everything in the show notes, but I just want to give you an opportunity to shout out your website, or your Twitter, or whatever you want us to link to in the show.
Dmitriy Ryaboy 00:50:27 So TheMissingReadme.com is the website for the book, where you can look at it a little bit more. And of course, it's available on Amazon and everywhere else where books are sold. If people read it, we'd love reviews. We'd love to find out what folks think about it. As you of course know, reviews are very important for authors who put a book out there in the world, to hear back. I am on Twitter at squarecog.
Chris Riccomini 00:50:51 Yeah, I'm on Twitter at criccomini — so C and then my last name. The website has that controversial chapter: the testing chapter is available for free. So if you go to TheMissingReadme.com, you can download that PDF and come after us on Twitter, now that we've given you our Twitter handles.
Felienne 00:51:07 Nice. Awesome. So we’ll make sure to add those links to the show notes. Thanks again for writing this wonderful book. And thanks for being on the show today with me.
[End of Audio]