SE Radio 434: Steven Skiena on Preparing for the Data Structures and Algorithm Job Interview

Steven Skiena, author of The Algorithm Design Manual and a professor at Stony Brook University, discusses the use cases for data structures and algorithms in a professional setting. Adam Conrad spoke with Skiena about why these concepts need to be studied by professionals and not just students. They discussed strategies for studying The Algorithm Design Manual when prepping for technical interviews. They also explored several examples of real-world data structures and algorithms at places like Google and why there is such an emphasis on this kind of analysis in interview settings.

Show Notes

Transcript

Transcript brought to you by IEEE Software

Adam Conrad 00:00:21 This is Adam Conrad for software engineering radio. Today we’ll be talking with Steven Skiena about technical job interview preparation with data structures and algorithms. Steven Skiena is a distinguished teaching professor of computer science at Stony Brook university. He serves as the director of the Institute for AI driven discovery and innovation at Stony Brook. His research interests include algorithm, design, data science and their applications to biology. Skeena is the author of several popular books in the fields of algorithms programming and data science. The algorithm design manual is widely used as an undergraduate text in algorithms and within the tech industry for job interview preparation, Steve, thanks for coming on to software engineering radio.

Steven Skiena 00:01:03 Thank you very much for having me.

Adam Conrad 00:01:05 Normally we would start with the definitions for our topic, but for data structures and algorithms, that’s a bit too fundamental. So if you’re looking for a definition, I would encourage you to check out a source like Wikipedia, but for today’s topic, I want to focus more on the why, why are data structures and algorithms just so fundamentally important to our understanding as professional software developers.

Steven Skiena 00:01:29 I think of data structures and algorithms just kind of being the primary abstraction behind computer programs. Data structures are the things we can do without thinking. they need to be, how do we, and th they’re kind of the thing that we organize algorithms around in working on an algorithm. You’re typically manipulating data in one way or another, and they have been developed a lot of very clever data structures that enable you to do certain operations very, very quickly and efficiently. And it turns out to be very natural to structure, a lot of algorithm, design problems around what data structures you have access to and what operations they do. Another way. W when I, when you pose this question to me, I started thinking about it a little bit. And one thing that I think is that algorithms and data structures are really kind of what separate computer scientists from coders. Okay. Computer scientists have ways of thinking about things that are through an algorithmic lens. And I think that one of the importance of learning how algorithms work and how data structures work is basically to turn you into a real computer scientist. And that’s probably one of the reasons it’s so important in interview prep.

Adam Conrad 00:02:43 What are some examples of data structures and algorithms we use in everyday work that we may not realize we’re using?

Steven Skiena 00:02:51 So the simplest data structures that you probably use in a programming or, or res or matrices, you know, if you grew up programming mat, lab, or Fortran, probably these were the only kind of data structures that you basically had access to rays and matrices. And they were good for certain kinds of things that are obviously good for certain kinds of numerical computation. They’re good for scoring things, although they don’t really give you support for fast retrieval by looking by content. Okay. I guess one way to think about an array array, lets you look up something when you know where it is, but if you want a fast way to try to look up a particular element, a particular record, you start getting into more sophisticated data structures for, for storing things, typically old dictionaries and the kind of dictionaries you’re probably exposed to are things like hash tables. hash tables are very, very practical data structures for storing keys. So you can look them up quickly by content. and there’s a whole host of other traders structures, things like binary, search trees and priority chews and lots of different things. So they’re kind of a certain number of canonical data structures that any, you know, reasonable algorithm designers should know about because they provide certain operations to retrieve data quickly or to manipulate data quickly. Okay. On a, you know, a reasonably large dataset.

Adam Conrad 00:04:26 One thing that stands out to me is that these fundamental data structures and algorithms may take a different form or a different name than we’re used to when we see them in an academic setting. So we may call them a raise or link lists, but in JavaScript, for example, we do have a raise, but we also have objects which represent hash tables or dictionaries. So it’s interesting that sort of each language has its own way of abstracting itself. these fundamental objects into things that are a part of the nomenclature for a given language Python is another example where they make frequent use of dictionaries.

Steven Skiena 00:05:00 Well, one thing that you’re saying that I think is interesting is to realize that different programming languages make a certain degree of algorithm more visible or more transparent than others. So when you’re programming and something like C, you are programming, you know, near the sort of near the bare metal and an array as an array. And when you look something up, it’s basically going to a memory location that who’s who’s positioned. It can compute in constant time and pulling the thing out in languages like Python, for example, arrays are implement that has hash tables. These are these more complicated, but more general probabilistic data structures that are able to retrieve items by content that are very, very powerful and very, very general and something every kind of programmers should know about.

Adam Conrad 00:05:53 Another example I can think of would be C so in CNC plus plus standard templating library offers vectors, but in academic setting, these would be considered lists or link lists, but you know, language like that, vectors are essentially the names that represent these kinds of fundamental objects. So I think in summary, what we’re really thinking about here is ensuring that there are fundamental structures and they just may be part of a different name depending on the context of the language that you are using.

Steven Skiena 00:06:22 Right? So I guess in a lot of people, my students always tell me who tends to program in Java talk about hash maps. And these are kind of, obviously, I guess, by the name, people get the idea they’re hash tables under looking it, but, but there’s a level of abstraction that, that when you’re using a pre implemented data structure and abstract data type, you don’t really know, you know, what the operations are, you know, what the definitions of what you can do on them and the operations you can do on them, but you don’t really know how they’re actually implemented. And one of the things that’s interesting is for a given set of operations, like the main operations on a dictionary, things like inserting items and looking them up by content and deleting items, there are many different ways you can implement this data structure where the performance of certain operations is better than the performance of certain other operations and that understanding that difference is important to getting good performance on a lot of applications.

Adam Conrad 00:07:25 Can you give us an example of an algorithm we might want to write on our own?

Steven Skiena 00:07:28 So if you want it to, for example, search through items in sorted order, you started hinting about the idea of sorting as kind of a fundamental algorithm problem. If you want to a common operation is you would like to take elements that maybe, you know, that you’re working with and retrieve them one by one and kind of as a sorted order. So let’s say that you were maintaining a, a queue of jobs. You might want to process the jobs in order from highest priority to lowest priority. And you know, the, do the most important jobs before you do the, the less important ones or do the ones that hadn’t been looked at for awhile before you do the ones that had gotten attention recently, if you have kind of a priority score associated with each item, you might want to be able to insert new items.

Steven Skiena 00:08:22 You might want to be able to change the priority of an existing item. You might want to delete an item because it’s no longer something you care about, or you might want to find, say, get me the next most important item. And being able to search through something like that, a hash table, which is normally a great data structure would be very bad at that because it doesn’t maintain a sense of order among things. sorted array maintains an order of things, a binary search street, main things in order of things, something called the priority queue, main things in order of things. But so hash tables, a great if you’re looking things up by name and you want to get, you know, get items by name, if you want to store me and tell me, what is the next most important thing or the next thing in alphabetical order hash tables would be a bad idea.

Adam Conrad 00:09:16 Yeah. And this makes sense because for some languages like JavaScript, they don’t have a concept of a priority queue, but for other languages, like see you have something available in the standard library to access,

Steven Skiena 00:09:26 Right? So if, you know, in certain languages that may be something like a standard template library, which will have things like priority choosing them. And so if you are using a language that has a good library of abstract data types, it’s important to kind of know what each one of these is good for. And that’s one of the reasons to take an algorithms course or to study algorithms is because knowing why a priority queue is a useful thing and where it is, where it comes in, enables you to use, first of all, if you have a fast implementation, then you know why you would use it. And it gives you an abstraction to think about when you think around, when you’re trying to design an algorithm. A lot of my students before they take an algorithms class, they say, well, everything is a container. And they kind of are used to just saying, well, I’ll go through all my data and I’ll look at it. Maybe one by one, realizing that there’s these richer abstractions available gives you a lot of power. When it comes to thinking about data, there was a quote I once read that said that civilization advances based on the number of things we can do without thinking, once you understand the kind of basic abstract data types and what they’re good for, you can think at the level of these data types rather than low level things like loops and arrays.

Adam Conrad 00:10:53 So what you’re really saying then is that it’s so much more important to truly understand intuitively what we’re working with, because anyone can really handle the implementation pieces of the algorithms or the data structures. But when you are able to Intuit the use cases for them, you now can start to think about a whole set of problems to solve. So with the priority queue example, we don’t have to think about how does a priority queue actually get constructed, or how do I think about the insert and pop instead? I think about priorities. Okay. I have a list of items. I have to think about the priority. I now understand how a priority queue works under the hood. And so for me, it’s now about thinking about the applications of those things now that I understand really what’s going on underneath and why I would use these algorithms and not just the what or the how

Steven Skiena 00:11:45 Right. And that’s one of the reasons why in a lot of these data structures to implement them to the point where you, you know, they’re, they’re usable is a little challenge, but not insurmountable. And, you know, in my book, I include implementations of, you know, important data structures like stacks and Jews and priority Jews and all kinds of things. Because I think that it’s important to understand how these things work under the hood.

Adam Conrad 00:12:14 So if I’m studying for interviews, there’s so much to cover. What one area in particular do you think requires the most study when prepping for interviews?

Steven Skiena 00:12:25 Well, I think that learning probably the single data structure that people need to know most about is probably hashing that I once heard a lecture from a guy who was eventually became the vice president of search at Google. And he said that the three most important algorithms at Google were hashing, hashing and hashing, and hashing is useful as a technique for building a, you know, a dictionary structure. So you can look things up, you know, you can insert items and you can look up and get them. But hashing is, is a very, very powerful technique and idea for doing lots of other kinds of things. So maybe an example of this would be how does Google know that a, when Google has its crawler walking over the web, when it fetches a document, how does it know that it hasn’t seen this document before?

Steven Skiena 00:13:29 You know, you could think my God, if Google really wants to know if this documents I’ve seen before, it’s got to compare it against everything on the web. And that would be, you know, order the size of the web. But what if let’s say you compute a hash code on a document, a hash code is kind of a numerical quantity that is designed to be deterministic for a particular file. In other words, it’s for anytime you see this file, you’re going to get the same number out of it. It’s like an, a unique identifier, but you can use a hash function to construct an identifier that is long enough that it’s unlikely if this document had not previously seen, been seen by know, on your searches that some other document will have the same identifier. So it’s long enough to be pretty unique, but short enough to be quick to work with and for doing things like identifying duplicate documents.

Steven Skiena 00:14:33 if you’re trying to build a plagiarism detector to try to figure out, you know, my students am, I am ambitious. Students might be trying to copy a homework from somewhere on the web. How will I know whether or not part of their document, okay. Was in fact stolen from a web page? Well, if you have windows of the text, you get these unique codes, you can see whether or not this code has existed anywhere else in previously submitted assignments or the web or something like that. And that, you know, that, that, that can give you a way to, to find these identifier. So hashing is a very useful thing in many different ways. We think of it as a, as a data structure thing, but there’s a lot, it’s more than just building a hash table. Hashing is an idea, you know? well, you know, we’ll talk about my book, I guess at some point, but as I’ve revised my book for a new edition, one of the things I did was emphasize hashtag in this latest revision, because I think it’s something that’s really important. That’s kind of only grown important and important. So in recent years

Adam Conrad 00:15:44 You talking about hashing actually reminds me about text compression. I know that it texts compression. What they are trying to do is take a lot of data and pull it down into very small pieces. So, right. Like in that sense, you’re using hashing to take long strings of texts and to bring them down into much smaller pieces.

Steven Skiena 00:16:00 Actually it’s a little, a little different there, I would say, cause there there’s somewhat different goals when you compress your document. The one thing you care about is that when you uncompress it, if it’s a text document, when you compress a document, it’s important. Yes, you want to compress it. So it’s smaller, but you absolutely want to make sure that when you reconstruct it, you can reconstruct the entire document from the compressed version. That’s certainly true when you’re doing a you know, any kind of text compression. You you’d be upset if you compress your, your, you know, research paper and it came back garbled. So the, the key with a hashcode is that you can find a very, very small representation that is unique. It’s not enough information to reconstruct the whole document from it, but it’s enough that it’s sort of unique. And so you can tell whether you’ve seen it before. So, you know, typically for example, I guess if you use a typical text compression algorithm, maybe you can shrink the document to 30% of its original size. When you use a hash code, you could take any document, you know, maybe, you know, a megabyte long or even longer and produce a hundred, 128 bit code or at 256 bit code. That would be fairly unique to that document.

Adam Conrad 00:17:24 Right? So in that sense, we’re actually taking what would be like the title and abstracting out the entire document into this one sort of unique identifier.

Steven Skiena 00:17:33 It should be a way to recognize whether I’ve seen it before without actually reconstructing it. Okay. So maybe it may be as a metaphor. You might imagine something like you found the most unique phrase in the document. Okay. If I only stored that if I saw this, you know, this phrase, I would recognize I’ve seen this document before. Okay. But of course you couldn’t reconstruct the rest of it.

Adam Conrad 00:18:01 That makes sense. Switching things over to algorithms. So we’ve covered the fundamental data structures we should really be thinking about. And we’ve seen some of the ones that we may not realize we’ve already been using for a really long time, but switching over to algorithms, that one takes the most focus for people in interview prep, especially graphs entries. I find when people are studying for algorithm exams for a tech interview, prep, the focus so much is on graphs, entries, traversing them, navigating them, finding ways of sorting items within graphs and trees. It’s such a fundamental piece of what we’re doing in tech prep. I think I remember seeing from Facebook wants that they said, if you really focused on graphs and you would get the most bang for your buck there, same with trees. So we’d love to learn more about understanding fundamental algorithms as well.

Steven Skiena 00:18:50 Okay. Before I even get into algorithm. And I want to make sure that the idea of a graph our network is clear to who’s listening here because a lot of different kinds of data in the world and an astonishingly wide variety of data in the world really can be kind of represented as networks of nodes. Okay. Or vertices pairs of which are connected by edges. That’s what a network is. And so the road network you know what, what, what intersections are there in the world and which pairs of intersections are connected by roads that would define a graph. And obviously there’s, there’s shortest best problems that are on road networks that are quite natural, like finding the shortest path between places when you use your GPS. it’s kind of amazing. You can say, find me the shortest path from here to some obscure place, you know, all the way across the country and for free in, you know, in, in a fraction of a second, you know, you’re a Google or whatever your map system is, is going to tell you the absolute shortest path in this network in order to do that, there’s got to be a fast algorithm.

Steven Skiena 00:20:07 There’s very interesting fast algorithms like dykes bruise algorithm that finds the shortest path between any two places in a network. That’s obviously a useful thing for understanding roads, but what’s interesting about graphs graph problems is that there’s a small number of algorithm problems that are relatively common, that tend to appear in a lot of different settings. And so the thing that’s useful for interview prep is to know the basic graph problems, things like shortest path, things like finding minimum spanning tree things like how do you do a traversals of graphs to walk over the network? These are things like breadth first search in depth for search and understanding these things gives you a lot of power for solving a lot of different problems.

Adam Conrad 00:21:00 And we can reference these data structures and these algorithms all within your book, the algorithm design manual. So switching gears Steve Yogi, a very famous blogger and ex Googler once referred to your book as the book that helped me understand just how astonishingly commonplace and important graph problems are. It has since become Canon as one of the top books to study when preparing for job interviews. So my first question for you then is why do you think this book is so heavily cited for job interview prep and what is people’s allure to it?

Steven Skiena 00:21:31 First of all, it has been, you know, I’m, I’m very gratified by the number of people who have used the the book. And I, I get letters from people quite often to say, you got me your job at my job at Google. You know, I read your book and I didn’t know the algorithm stuff. And then I read your book and I was able to squeak by my interview. And so I’m always very excited when I get people who say these things. What I think makes my book useful for interview prep is, is a couple of things. One is, it is I think, more accessible than most other algorithm books. Algorithms is an area that is traditionally a part of theoretical computer science. And it’s an area that involves a lot of mathematical analysis. It’s an area that involves a lot of reasoning and formal reasoning and proofs of correctness and proofs of efficiency.

Steven Skiena 00:22:25 And many of the standard texts in algorithms kind of are approaching it as here is a famous algorithm. I want to make sure you understand it to the level that you can, you can prove its efficiency and prove its correctness. And I think it’s easy to lose sight of the forest for the trees. What my book tries to do is to stress the intuitions behind the algorithm design. You know, how is it that you should think about these things? What should you know, when you learn about sorting algorithms, the average person learns about a lot of sorting algorithms and they learn about analysis of these and all, but the question of why are there different algorithm? What are really the lessons you should take away from it when you learn about something like Quicksort, which is a famous sorting algorithm. It’s not so much that, you know, Nathan, you know, it’s good to know how it, you know exactly how it works and stuff like that, but it’s kind of this representative of using randomization in algorithms and understanding why it should work well with high probability, even though there’s a terrible worst case, that’s an intuition thing.

Steven Skiena 00:23:40 And maybe you could see it by a big formal analysis, but I think my book gives, it gives a lot of intuition. I find that it’s very easy to get lost in the intricacies of algorithm analysis. And so that’s why when I write my book, I really want to leave it at the intuition level. And I think people find it a lot more accessible. I hear some people who say that, Oh, I tried to study one of the, the you know, really detailed algorithm books, something like the corpsman book, which is very famous out of MIT. That’s a great book, but you have to kind of understand some things in order to really appreciate it. And I think that going through my book, people can get, get an intuition for things. And that’s kind of what I, what I, what I try to teach.

Steven Skiena 00:24:27 I really try to teach and intuition. And I think that that, that pays off rather than, rather than being excessively formal about, about things. I really want people to understand what’s important and why. So that’s one reason why it’s used for interview prep. Another thing is that I do include implementations for many of the, the, the primary algorithms. So people do get a chance to look at some code. And when they look at some go, they, you know, they, they, some people think better about things when they see code. And certainly it adds some level of concreteness. So I think that the combination of the intuition of ideas, which is what I write about and seeing the concrete implementation resonates with people,

Adam Conrad 00:25:14 It resonated with me. I took your class online, audited it, as well as studied the book. So for people that are actually taking the class or reading the book for interview prep do you think any independent learners who are doing interview prep need to do anything differently than your students are doing when they’re taking the course?

Steven Skiena 00:25:33 So when people use the book for interview prep, there’s kind of two different aspects of interview prep, where out in the course of interviews, algorithms knowledge tends to come out in two different ways. One thing is that many companies do screenings. Things like by companies like hacker rank, they have coding contest, coding tests that they make people do before they screen them before they even give them an interview. And quite often, these programming tests are about algorithmic kinds of problems. So one aspect of being a, you know, let’s say getting through the interview process is making sure you pass your basic algorithmic coding thing. And part of that is looking at code and playing around and implementing some of the classical algorithms. I hear people, some people basically, at least if they don’t, at least they study the implementations in my book, but, but usually people want to, you know, do some programming with them to make sure they get a hang of it.

Steven Skiena 00:26:42 So part of it is fluency with programming. And the other part is the whiteboard preparation. If you go to an interview at companies, usually they will, at some point ask you an algorithm, design problem at a whiteboard, and they want to hear how you think they want to hear how you approach it. They want to hear what questions you want to ask, you know, to, to make sure you understand the problem. And so these are kind of two different, two different skills to learn what I teach my algorithms course. Again, it’s, you know, at Stony Brook university, we, I don’t make them reimplement classic algorithms because I think that it’s more on the conceptual idea. So they don’t get experience at doing the, you know, some of the kind of contest level programming that these programming contests puzzle type programming. So for that, actually, it’s, it’s funny.

Steven Skiena 00:27:43 I, I have another book which I wrote called programming challenges, which was designed to teach people how to do puzzle problems for programming contest. And I think this is something that is part of the process of studying for an interview, but the deeper and more meaningful part I think is, is being able to go to a whiteboard and know how to think about algorithms. And for doing that, I think that reading through the algorithm book, you know, in the order that it’s been skipped, the stuff that you know about, although don’t be too quick to skip the introductory chapters, which were on data structures, because if you don’t understand the basic data structures, you’ll eventually kind of we kind of assume that, that you kind of have mastered that. and you’ll eventually kind of fade out if you you know, don’t have the basics down, but I think that learning how to solve problem B you know, work at a whiteboard involves solving algorithm, design problems. And those are the kinds of things I do assign my students for homework. And that’s the kind of thing that I have plenty of these kinds of problems in my book to give people practice at solving these kinds of problems.

Adam Conrad 00:28:56 And that raises such a good point because for many people, they look at these books and they think, well, how is this going to help me? If I have to implement breadth first search, you know, I could do that and I can go through every single step in the pseudo code, but that’s not really getting me anywhere, but what matters is the underpinnings of what breadth first search represents? So if I know that I have to say, do something like shortest path, I know that I have to think about in my head going from one place to the next, and I have to explore all those opportunities in order to get there. So I think you raise a really good point that it just, isn’t going to cut it. If you’re just going to memorize things without really retaining and understanding and intuiting exactly what goes into these algorithm designs. If you can get there, then you’re so much further along because now you’re adapting to patterns, you’re understanding the underlying question that’s really being asked, and you can apply these algorithms through that kind of analysis when you’re tackling these kinds of problems.

Steven Skiena 00:29:56 Right. Right. And, and there also are a lot of kind of problems that are what I’ll say using the same general techniques, but the words differ. And so if you don’t know, and, and, and, you know, the details differ. And so, so like for example, there’s an algorithm design technique called dynamic programming. That is a you know, it’s, it’s a technique for designing algorithms that once you understand it okay, is actually relatively easy to apply. But until you understand that it’s like magic, you know, so one of the things you have to do is you need to see enough examples. You need to try to understand you know, of different seemingly wildly different problems, all of which kind of turn out to be the same technique. And I think that that’s the kind of thing you can get from a, you know, book on a, you know, algorithms is a, is a very, very interesting, and well-developed other than design, very interesting and well-developed subject. And I think that you can get that from a, you know, essentially a course on algorithm design in ways that I don’t think you can get so easily by just solving a couple of puzzle problems. There’s a certain, you have to kind of go through the thinking problems. And those are the kinds of things that you know, we do in the book.

Adam Conrad 00:31:18 I think a really important thing to also emphasize is that for many people doing just the problems in the book, isn’t going to cut it. I know for example, when I took an algorithm scores, when I took your course and audited, it, it was really difficult for me to understand dynamic programming, for example. So I had to read through all the examples in the book multiple times. I still didn’t get it. And as you pointed this out earlier, that you basically have to see the example enough times in order for you to have it click. And for me, I had to go to Wikipedia. I had to watch a couple of YouTube videos for it to really stick. So I think just for people out there who are doing tech interview prep, it’s not going to be sufficient to say, okay, I’m just going to go through these five problems that were assigned for this chapter.

Adam Conrad 00:32:01 And if I do that, then I’ll know everything there is to know about dynamic programming. that’s just not going to cut it. You have to really be able to have the honesty with yourself to say, okay, I am going to study this until I actually understand fundamentally what is going on. And that may take more, or maybe even less than what is regarded in the book. I think you mentioned this earlier, that if you’re going to do something and you’re going to skip the fundamental pieces of the data structures that you might want to make sure you really know it, but you can gloss over it if you really know those areas really well. So for things like, yeah, like dynamic programming, just make sure that you really understand it and give yourself permission, do more problems so that you can really understand it and don’t shut yourself short.

Steven Skiena 00:32:43 Right? So, I mean, you know, algorithm design is really about a certain number of ideas and techniques. You know, it’s not an unbelievably long list of ideas and techniques, but they are a little bit deep to understand. And they’re a little tricky to understand, to get a sense as to when a divide and conquer kind of algorithm one where good things happen. If you split the problem into smaller pieces that are of the same type, solve them, and then put them the answers back together, something like a divide and conquer, not knowing what a binary search type of strategy is an interesting thing to do. You know, often people will not tell you implement the binary search. They will tell you, maybe tell me how I can implement something to find a square root, an approximation of the square root of a number. And, you know, you could do that.

Steven Skiena 00:33:41 I guess if it gets, maybe if you remember your your math well, enough, you may remember something about Taylor series or God knows what, but if you don’t, if you think in terms of binary search, you can get this idea of honing in on an answer. You try a number and you square it. Okay. And if it’s bigger than what you’re trying to find the square root of, well, that tells you something about a number that’s too big to be the square root. And if you try a smaller number and when you square, it, it’s less than your target, then that number is too small to be your square root. And by doing kind of a search like this, a binary search, you can hone in on your answer. And I guess that’s, again, that’s a specific kind of a problem, but there are relatively small number of techniques that if you really kind of understand it, you can solve lots and lots of different algorithm problems. And I think that we don’t an introductory algorithm class or a book like mine is enough to really kind of get what these basic techniques are.

Adam Conrad 00:34:45 Addition to actually creating the algorithms. We also have to analyze them. And one way we talk about that is with big O notation. and that’s a huge thing that comes up in interview prep. So what does the O stand for in big O and what exactly are we looking for here when we’re thinking about analyzing an algorithm to figure out its efficiency?

Steven Skiena 00:35:03 The big goal is one of the things that my Agland students dread unnecessarily. It’s something that looks a little mathy, looks a little scary, but big old to my mind stands to sort of like on the order of things. So one of the things that makes algorithm, design and analysis, a good and enduring subject is that you can kind of analyze algorithms in a computer independent way. You’re trying to understand the running time that an algorithm is going to take in terms of the size of the input. It makes sense that the sort, if I ask you how, you know, what is the, the running time of, Quicksort the answer, shouldn’t be 10 seconds. It should depend on how many elements you’re actually sorting. You’re sorting a few elements. It’s obviously going to be faster than when you’re sorting a large number of elements, but we’re interested in how is the running time grow as a function of the size of the problem you give it.

Steven Skiena 00:36:07 So typically we are given a what you call, we’ll say that we have N elements of the input, and we want to know something about what is the rough running time as a function of N. And this notion of rough running time can be kind of made precise through this notation. They call the big O notation. We don’t care if we’re off by a factor of two or 10 or a hundred, but we do want to know how it depends on N which is the size of the problem. So some algorithms look at the data once or twice or a hundred times, and each element once or twice or a hundred times, if you have an items that will take order and time sometimes you look at all pairs of things and there’s N squared bearish. And so that would be something that is order and squared.

Steven Skiena 00:37:06 And some of these fancy algorithms and data structures have running times that are proportional to something like N times the logarithm event. So there’s a little bit of math involved in knowing what logarithms are and where they, they arise, why they come up and, you know, but, but this, this big O notation gives you a way to talk about how fast an algorithm is in a kind of machine independent way. This is one of the reasons why it’s good to learn algorithms because it is machine independent. It’s something that will stick with you. The, you know, the stuff that I learned about algorithms 30 years ago, every bit of it is still true and still still useful. And one of the reasons is because we are not talking about running times in terms of seconds, but running at times in terms of the big O

Adam Conrad 00:37:59 One example, that really makes sense to me is around Fibonacci sequence. So that’s a really common one. When we start talking about recursion or even dynamic programming, really small numbers for Fibonacci you can answer them instantly but for even normal numbers, like 50 or a hundred something that you can conceptualize even these small numbers, if you run them with Fibonacci on a naive algorithm, it won’t be able to run on your machine. It will just heat up and it won’t be able to finish in time. So that’s sort of how I see this applying for folks in the real world is you have to understand the algorithmic complexity, because if you can’t do something in a short amount of time, very quickly for exponential algorithms, things can get out of hand, but if you have a good algorithm for this, and you can analyze something that is much more efficient, you know, for example, with dynamic programming, you can do Fibonacci in a very optimized way so that it can run in linear time. So, you know, choosing the right algorithm can really be important,

Steven Skiena 00:38:59 Recognizing the difference between polynomial algorithms, time algorithms, things that are order in, or order and squared and things that are exponential, like order two to the end or N factorial is, you know, one of the, the, the most fundamental things in algorithm design and that often recursive algorithms left of their own devices turn out to be exponential time algorithms. And whenever anything is exponential, it grows obviously rapidly, but you don’t see it growing rapidly until it hits a point. And in fact, this COVID epidemic that we’ve been living through is a good example of this. The number of people infected by the virus unchecked by preventive preventive measures would grow exponentially. And what happened was it was doubling, let’s say every week and doubling from one to two patients, nobody saw it two to four patients. Nobody saw it, but when suddenly you went from 1 million to 2 million and then 4 million and 16 million suddenly it becomes very visible.

Steven Skiena 00:40:14 And when you look at exponential time algorithms, it’s the same kind of thing. You know, I, one thing I do in my classes, I have students implement an exponential time algorithm. And I have a contest for the fastest implementation where they have to be clever to try to keep the exponential growth at Bay as much as possible. But everybody starts being very, very happy because they run it on the small input files and it’s almost instant. And then they go to the next largest size and suddenly they hit a wall. And it, you know, instead of taking seconds, it will take hours and learning to recognize when you have an algorithm that is exponential. Time is a fundamental thing in algorithm design, sometimes it’s okay because, you know, there there’s techniques like backtracking and pruning and branch and bound. These are sort of search techniques to kind of can sometimes push that exponential stuff, push it down enough that you can still solve interesting problems on it. But then there are other algorithm design techniques, like you mentioned, dynamic programming that can turn what would have been an exponential time recursive algorithm into a pussycat that’ll run in linear time, or maybe in squared time. And again, being able to see things requires understanding, you know, first of all, what’s the difference between exponential and polynomial. And so recognizing when you’ve got something that’s going to be slow for large N and learning a couple of techniques that may give you a chance to beat that down.

Adam Conrad 00:41:52 Okay. So I’m convinced I’m a developer. I just lost my job. And I know I need to study for data structures and algorithms to get a good job at a great prestigious company. How should I read your book? Is there an order I should change based on the class or how the book is written? Is there anything that I should do because I’m a job seeker instead of a student to best prepare for tech interview prep using your course materials and your book.

Steven Skiena 00:42:20 First of course, how much you should read depends upon how much time you have and how much time you can devote to it. Obviously I think that, you know, if you read the whole thing, you will know more than if you read a part of it. But again, there’s two different. I organize the book in these two halves kind of, because I think that there is a distinction between the basic techniques that you really kind of need to know to design you know, basically to design basic algorithms, okay. That’s his stuff in the first half. And so that stuff I think is like bread and butter. The other stuff is it’s true that there are a certain number of algorithm problems that arise all the time in practice. You guys may have heard of the traveling salesman problem. We talked about finding the shortest path there’s problems in geometry, maybe the first thing that happens when you deal with points, you might be interested in something like finding something called the convex hall.

Steven Skiena 00:43:23 So there’s a certain number of specialized algorithm problems that are not elementary. They’re good things to know, and they’re good things to recognize when you see them. And so the second half of my book is kind of, I call it the Hitchhiker’s guide to algorithms. It’s kind of a catalog of what do I think are the most important kinds of algorithm problems that tend to come up and that people, you know, algorithm researchers have figured out what is the right way to think about these kinds of problems. So the second part is kind of like for every one of these problems, I give you kind of an executive summary of what, what you should do. If you have this problem, if you have a traveling salesman problem, what should you be concerned with? How should you deal with it? And I think that it’s very good for people to browse through the back of the catalog.

Steven Skiena 00:44:18 At least if you can get an idea of what these problems are and keep them in your head, what is a set cover problem? What is a traveling salesman problem? What is a knapsack problem? When you see these kinds of things in practice, you can, you know, kind of take advantage of that knowledge and say, wait a second. If I know that this is a set Gover problem, and I know Skinner’s book had four pages about what you should do. If you have set cover problems, just recognizing what the problem is. And then referring to this for reference gives you power. And so I think that it’s important if you’re doing an interview prep, you should probably, I think you should try to master as much as you can of the first half of the book. And I think you should try to browse through the catalog.

Steven Skiena 00:45:10 I, I try to have pictures of each problem so that you can kind of get an intuition about what is actually going on with it. And a lot of people find the pictures useful for kind of pinpointing what the challenge they face is really cold. Cause a lot of problems, if you know the name of it, you can look something up, but if you don’t know that a knapsack problem is a knapsack problem, you’re not going to find the algorithm by Googling it. And I think that, so I think the catalog is, is fun to browse through. I think people, people like looking at the pictures and if there there’s, when they find the one that they really care about, I think that it quickly gets them going in the direction that is most useful for their application.

Adam Conrad 00:45:58 We know the structure of the book, but just as important as reading the book is practicing the problems within the book. Now, if I’m an independent study or who is trying to study for interview prep, how do I know if I got them? Right? Are there any places I can go online where I can figure out the solutions to the problems that you have in your book?

Steven Skiena 00:46:20 So I have a website with the book, www.algorithm.com, a L G O R I S t.com. But so part of it is I keep an, an online solution Wiki there. So people who have had these problems have tried to work out solutions and you know, if people want to go there, that’s a good way to get some kind of a solution. One hint that I would recommend for people. It may be isn’t obvious is that it is good to work out algorithm, design problems with somebody else. If you have a buddy who is interested in learning this at the same time, you are, I think it is a great thing. If the two of you can sit together and argue with a Blackboard, what is the right way to do these things? Because one of the hallmarks about algorithm design problems is there is a flash of an idea that once you have it defines the solution, but until you see the flash of the idea, you’re kind of muddling around and, you know, getting to people and talking one guy poses as an idea, the other guy says, why it’s wrong.

Steven Skiena 00:47:34 I find that itch. I like to work things out with somebody else when I’m doing algorithm design and research. I definitely like to have somebody else at the Blackboard with me and we generate much better ideas, arguing, back and forth. And I think that the same thing is true with, with, with prep. If you can find a buddy, this is a good thing. If you need solutions. I think that the website that the, the solution [email protected] is a good one algorithm by the way, is an old English word for someone who’s skilled in problem solving and numerical reckoning. And so that’s why I kind of chose that as our our website.

Adam Conrad 00:48:19 And that’s great to hear. So if I’m able to study for this, I can actually go to the algorithm and check out the answers to the problem. So I know I’m headed in the right direction now for some people to play devil’s advocate, they fundamentally don’t even want to take these kinds of questions because they think that I shouldn’t have to be asked data structures and algorithms questions in interviews. And many people think that the whole process is fundamentally broken and how we hire what is your take on the heavy focus on data structures and algorithms questions in the industry?

Steven Skiena 00:48:49 So, first of all, I think algorithms and data structures are an incredibly beautiful and useful subject. And I think of them as what distinguishes a computer scientist from a coder this much, I think is true. That said, I I’ll admit that I am, you know, I’m not sure that the emphasis on algorithm design, okay, as a test thing to screen out good software engineers from bad software engineers is necessarily the best way to do it. We had a startup for a while named general sentiment, and we used to hire engineers and all the engineers that we, we were interviewing with us, they knew they were going to be speaking to me. They all knew about my book. They figured I was going to grill them on algorithm design. The honest to God truth is I never asked any of them, any algorithm, design problems, because I was the company needed people who were building big distributed systems.

Steven Skiena 00:49:50 And I needed to know sort of the sophistication of the kind of thing they could build. And, you know, in the course of discussions, what was their biggest program, whoever wrote, what did it do? You can get an idea of whether people are, you know, who have worked with things with a certain level of complexity and what kind of things they’d like to do. So I do think that algorithm design is a very important thing. I think as a professor, I can usually tell, you know, who’s good or bad at that kind of thing based on the grade in their algorithms class. So I don’t think that, you know, people usually can’t sneak through an algorithms class with, with get a good grade if they really don’t know what they’re doing, but it provides a way that people think. So I guess the, the, the, the rationale here is that it provides a way to understand how people’s thought process is. And sometimes you, you know, you know, that’s, I think a critical thing to tell when you’re interviewing somebody, how do they reason about things, you know, can they do deep, you know, do a certain level of thinking and headed this. And I guess on that level, I think algorithm design is a good way to test that.

Adam Conrad 00:51:03 Now I noticed on your publisher’s website, you have a new version of this book coming out, which hasn’t been updated since the second edition way, way back in 2008. So there’s a lot to update in 12 years. What has changed in the upcoming version of the algorithm design manual?

Steven Skiena 00:51:19 Right. Well, first of all, I just wanted to say this is as far as I’m concerned, the first official announcement of the third edition I’m always reluctant to talk about a book until it is completely completely done. at this point, if I got hit by a car, the book is at the publisher. And so I can guarantee you, they will be a third edition out probably by the time people listen to this. So what’s changed in algorithm design. So one of the great things about algorithm design is that it is a very foundational classical thing. And so things don’t change at the speed of light, but there, you know, when you look at it over the last 10 years, there are a lot of things that have changed. One thing that has changed is randomized algorithms. The idea of using random numbers to design efficient algorithms is something that has grown in, in power and in how widely it’s used.

Steven Skiena 00:52:15 So randomization is one thing that’s new. I knew I have a full chapter on randomized algorithms, which I wouldn’t have had before. Hashing is actually an example of a randomized algorithm. So hashing is at least in the way that you analyze it. And you think about it, there is a connection between hashing and randomization. That’s kind of important. So that’s one thing that’s new. Another thing that’s new is I added a real chapter on divided conquer. So you know, I had it in bits and pieces, but I, I give a much more thorough treatment of that because that’s something else also, that’s become more important as you have parallel computers. One of the things that you would like to do with a barrel, a computer is to take a job and break it into smaller jobs so that each processor can do something different.

Steven Skiena 00:53:06 And that’s the kind of thing that divide and conquer is very good about if you’re gonna be thinking about parallel computing, that’s one of the things that’s important. The other thing that I have added was a new chapter on dealing with NP, complete problems. We were talking earlier about exponential time algorithms versus polynomial time algorithms. There’s a class of problems called NP complete problems that have the property, that there is no algorithm possible for this problem, as you know, for some weasel words, for what I mean by possible, but essentially once you can prove that a problem or show that a problem is NP complete, it means that there’s not going to be any algorithm that runs in a polynomial time for you’re doomed to an exponential algorithm, if you want the exact answer, but there are these clever techniques for trying to come up with approximate answers, SureStep answers that are important to understand.

Steven Skiena 00:54:06 And so I’ve added a chapter about these heuristic techniques, things like simulated, a kneeling and stuff like that, about ways to argue that your algorithm is come up with a humoristic and argue that it will never do that badly. I’ll give you always give you the right answer, but it’ll always come close. And finally, I there’s a lot of stuff in the news about quantum computing. And so I give a very quick introduction to quantum computing, but I think that someone can read this and there’s eight or 10 pages there. And without going into the, all the horrible physics and complex numbers and all the yada yada, I think that it gives a pretty simple idea as to what the magic is with quantum computing and why that, why it’s so exciting. So these are the main things that I do. I have a lot more problems and my presentation, I think is a lot better.

Steven Skiena 00:55:00 One thing I did was we turned all the pictures and diagrams in the book into color. So now one thing that’s changed since the last edition is color printing is now a fairly routine thing for, for book publishers. And it’s amazing how much clearer ideas are when the drawings are kind of represented in color. And you think a little bit about what the color control in order to, you know, kind of what is the idea you want to convey and make that concept pop. So I think that, you know, it’s, it’s, it’s easier to understand the picture with, you know, what’s going on with the new figures and the new presentation. And so I hope that people will be able to read it and understand things even better than what the previous additions, Steve, this is all great. Thank you so much. Any final words for those preparing for their next job?

Steven Skiena 00:55:51 What about the interviewers who are running these questions for folks? So, first of all, I encourage, you know, obviously interviewing is a stressful thing, do the best you can, and it will probably work. I mean, there is a reason why the unemployment rate for software engineers is low it’s because your skills are in demand. And I’m sure that you’re going to be able to get a good job. One thing I will tell you about the interviewers and algorithm questions that I have found, you know, what my, my students all go off to these interviews, they’ve come back and they, they tell me about their interview problems. And occasionally they come back and said, you know, there was this algorithm that this question that this guy asked and you know, I gave him an answer and he said, no, no, no, this is the answer.

Steven Skiena 00:56:39 And that didn’t make sense to me. And it’s amazing to me how often the interviewers are wrong in the questions that they ask. Either that they ask the question in a way that it’s not as precise as they would like, or their understanding of the answer is actually not as good as they should be. You know, the reason why they are interviewing you is that they are a software engineer at a company. Not that they are a real algorithm expert, so sometimes recognize that your interviewer is fallible. But again, I think it’s learning about algorithm design is a incredibly useful skill and it’s broadening, and you know, there, there are algorithms in daily life that you kind of see once you have a background in, you know, in thinking this way. And I, you know, I, I wish you all good luck on your interviews. I hope my book can help you guys learn algorithms and come to appreciate it better. And it’s something good to know. And I hope I hope you guys can take advantage of it.

[End of Audio]

This transcript was automatically generated. To suggest improvements in the text, please contact [email protected].

SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)

SE Radio 434: Steven Skiena on Preparing for the Data Structures and Algorithm Job Interview

Show Notes

Related Links

Transcript

Join the discussion

More from this show

SE Radio 723: Dave Airlie on Linux Kernel Maintenance

SE Radio 722: Dwayne McDaniel on the Engineering Challenges of Secrets Management

SE Radio 721: Rob Moffat on Risk-First Software Development

Menu

Recent posts

Search

Search

SE Radio 434: Steven Skiena on Preparing for the Data Structures and Algorithm Job Interview

Show Notes

Related Links

Transcript

Join the discussion

More from this show

SE Radio 723: Dave Airlie on Linux Kernel Maintenance

SE Radio 722: Dwayne McDaniel on the Engineering Challenges of Secrets Management

SE Radio 721: Rob Moffat on Risk-First Software Development

Menu

Recent posts