Michael Perry

SE Radio 447: Michael Perry on Immutable Architecture

Michael L. Perry discusses his recently published book, The Art of Immutable Architecture. Using familiar examples such as git and blockchain, he distinguishes immutable architecture from other approaches and addresses possible misunderstandings about designing an architecture for immutability. Michael and host Jeff Doolittle also examine other relevant topics such as location independence, conflict-free replicated data types, and handling state in an immutable architecture.

Show Notes

Related Links

From the Computer Society Digital Library


Transcript brought to you by IEEE Software

Jeff Doolittle 00:00:22 Welcome to Software Engineering Radio. I’m your host, Jeff Doolittle. I’m excited to invite Michael L. Perry as our guest on the show today. Michael is a software mathematician. He believes that software is math: every class is a theorem, the compiler is the proof, and unit tests check our work. Michael wrote The Art of Immutable Architecture, a book on applying mathematics to building distributed systems. Learn [email protected]. This episode mentions the fallacies of distributed computing, and we’ll link to that in the show notes. The episode also references CQRS, the CAP theorem, and CRDTs, all of which have related Software Engineering Radio episodes. Links to these shows, along with four others referenced during the episode, will be included in the show notes as well. Michael has recorded Pluralsight courses on distributed systems, XAML patterns, and cryptography, in addition to provable code. Formerly a Microsoft MVP for seven years, he maintains the spoon-bending Assisticant and Jinaga open-source libraries. You can find his videos about distributed [email protected]. Through consulting from Improving, Michael helps his clients benefit from the power of software mathematics. Michael, welcome to the show. (Michael: Thanks, Jeff.) Immutable architecture. That’s an interesting phrase. Could you start by maybe giving us a real-world example of how you’ve applied the principles from the book before we get into more of the details?

Michael Perry 00:01:54 Yeah, in fact, the first distributed system that I ever worked on was for a hospitality company. We were building a gift card system, stored value. The idea was that this was for restaurants that didn’t have very good internet connections. This was back in 2000, 2001, when the connection from the restaurant was mostly dial-up. So we wanted to make sure that they had the card balances at the restaurant, so that they didn’t have to make the connection in order to redeem the value. A lot of the problem was making sure that those remote stores were in sync with what we had on the central server. As people went from one restaurant to another redeeming value, we had to make sure that the transactions preceded them and that we knew what their card balance was.

Jeff Doolittle 00:02:45 So that sounds like an interesting scenario. Maybe dig a little bit into what brought you towards these. Well, first off, had you already thought of these concepts about immutable architecture, or was this something you were still percolating in your mind at the time?

Michael Perry 00:03:00 It was all about imperative thinking. It was all about: okay, when they swipe the card, what information is stored on the machine at that time, what message is sent to the server, and then, when the message is received at the server, what changes does it make to the database? So it’s all “if this, then that” kind of logic, all very imperative. And we ran into all sorts of problems with that. One of the problems came when a store first came online and we had to download all of the information to them. We simply collected the current balances of all of the cards and sent them on their way. Meanwhile, there were other transactions that were still on the way to the server, maybe even some of them at that particular restaurant.

Michael Perry 00:03:50 Or it could have been what we called reseeding: we know that some of the balances are incorrect, so let’s just reseed that database. In the meantime, there are transactions flowing on the way back up. So it was all of this imperative logic, and trying to catch all of these edge cases. As it turned out, the edge cases are just really impossible to reason through without saying: okay, we’re going to capture each transaction as an immutable record. Then, rather than storing the card balance, we have to store the transactions that led to that balance and let the client work out the current balance.
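
[Editor’s sketch] The approach described above, storing the transactions rather than the balance, might look roughly like the following in Python. This is a minimal illustration under assumptions, not the system discussed in the episode: the `CardTransaction` record, its field names, and the `balance` helper are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class CardTransaction:
    """One immutable fact: value was loaded onto or redeemed from a card."""
    transaction_id: str  # hypothetical id assigned where the swipe happened
    card_number: str
    amount: Decimal      # positive = value loaded, negative = value redeemed


def balance(card_number, transactions):
    """Derive the balance from history; a record received twice counts once."""
    unique = {t for t in transactions if t.card_number == card_number}
    return sum((t.amount for t in unique), Decimal("0"))


load = CardTransaction("t1", "card-42", Decimal("50.00"))
lunch = CardTransaction("t2", "card-42", Decimal("-27.00"))

# A restaurant and the central server may see the same records in a different
# order, or see one of them twice; both derive the same balance.
restaurant_view = [lunch, load, lunch]
server_view = [load, lunch]
print(balance("card-42", restaurant_view), balance("card-42", server_view))
```

Because no record is ever updated in place, a resent or reordered transaction cannot corrupt the derived balance, which is the property the imperative design lacked.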

Jeff Doolittle 00:04:30 Yeah. It sounds like as you were starting this process, one of the biggest issues was, it sounds like you could never get into a state of consistency.

Michael Perry 00:04:38 Right, exactly. And at the time I hadn’t heard the term eventual consistency. In fact, I hadn’t studied any of the consistency models at that time, but it started to occur to me that there was an idea of what consistency meant that included this concept of time. It was pretty much at that time that I also heard other people talking about eventual consistency and the CAP theorem, and I’m like, oh, what’s this? How do we solve this problem? And it wasn’t until I really started working on the follow-on to that system, which was a frequent diner program, rewards for making purchases, that it occurred to me that if things don’t change, then they’re always in sync. So I said, okay, let’s just make things not change. Let’s just be immutable. Then every copy of a transaction is just as good as any other copy, because it’s exactly the same. That led to my thinking of immutable records as a form of analysis, and I started to put together some rules for how to analyze the behavior of a system if you were to model its behavior in terms of immutable records. And that frequent diner program was much more successful because of that.

Jeff Doolittle 00:06:04 Back to before, when you originally started on this, you mentioned edge cases, and really edge cases are just cases, especially in a scenario like you were describing, with not just partial connections but slow connections and things of that nature. So you’re really running up against the fallacies of distributed computing, but I think the main point is there could be an infinite number of edge cases that you’d never be able to capture with that imperative approach. So does immutability help resolve... well, okay, before we go into that, let’s talk about that a little bit more. When you say immutability, I think some developers maybe aren’t very familiar with that term, or they think, oh, that relates to functional programming; we don’t really do that in object-oriented land. So maybe describe the term a little bit and help listeners understand how it applies regardless of the style of language that you’re using.

Michael Perry 00:07:00 So in functional programming languages, immutability is the default. You have to have extra syntax in order to say that something is mutable. In object-oriented programming languages, the opposite tends to be true. That’s, I think, one of the reasons that people attribute immutability to functional rather than object-oriented, but it’s a choice that you can make whatever language you’re using. And it’s a choice that can be made not just for the runtime, in-memory objects that you’re mostly thinking about when you’re writing functional or object-oriented code. It’s also a choice that you can make about the objects that you’ve stored in your database. You can choose not to use update or delete statements and simply insert records. You can choose to create an append-only database, and you can choose to treat objects as immutable over distance, between machines in a distributed system.

Jeff Doolittle 00:07:56 That’s immutability of objects. And the title of your book is The Art of Immutable Architecture. So how does immutability relate to architecture as opposed to objects? As you’re describing that, I’m thinking of things like event sourcing or commit logs and things of this nature that can be replayed, and listeners may be somewhat familiar with those concepts from reading about CQRS, which is command query responsibility segregation. Maybe now bring that up a level too: if immutability can occur, say, with those events or objects that you’re working with, how does it now apply to architecture?

Michael Perry 00:08:32 And that’s where you start to make the distinction between the object in memory and the object on the wire or the object in the database. Architecture, as I’m using it here, is really about how the various components of the distributed system are designed to work together. So architecture is about the interconnectedness of the system. If you built each one of these with an imperative mindset and then tried to connect them together, you’re going to run into all of these edge cases, which, as you said before, are just cases. And you have to test your way out of these strange behaviors that emerge when you have a distributed system. But if you talk about the architecture of your system, the interconnectedness of the various components, in terms of just exchanging information and knowledge about immutable records, then that’s where immutable architecture really emerges. So yeah, things like event sourcing are a form of immutable architecture; I would put that under the umbrella. But there are lots of different immutable architectures. And so the art is really the choice and the balance, picking the correct immutable architecture for solving the problems that we need to solve in distributed systems.

Jeff Doolittle 00:09:53 I think that’s helpful, because when I first heard “immutable architecture,” I think I was importing a lot of my object-oriented background from when I first started thinking about immutability and how to get immutability. This is back in the early 2000s, when you were first thinking about these problems as well. So when I first hear “immutable architecture,” I’m thinking: what, an architecture that can’t change? That sounds dangerous. It sounds like a bad idea. Now I can just see the analysis paralysis: we can’t build anything until we have the perfect immutable architecture. So maybe you can respond a little bit to that, because I think that’s a misunderstanding of what you’re describing.

Michael Perry 00:10:29 Yeah. What I’m describing is an architecture based on immutability rather than one that is itself immutable. In fact, when I first started using the term immutable architecture, some people confused it with immutable infrastructure. That’s the idea that you define the way the machine is configured, and then you can’t change that configuration. You have to define a new configuration to stand up a new machine. So things like Docker fall into the immutable infrastructure camp. But it’s not really that the architecture itself can’t change. In fact, that is one of the fallacies of distributed computing: that topology doesn’t change. It’s really that you are basing your architecture on the concept of immutability as it relates to the facts and the history and the decisions that are captured within the software and the objects.

Jeff Doolittle 00:11:25 In the book, you mention a particular problem, and it sounds like you were suffering from this problem with the first system that you were talking about, the gift card system. It’s referred to as the two generals problem. Can you briefly give us a summary of that for listeners, so that if they’re not familiar with it, they understand the problem? And then maybe describe a little bit about how this approach of immutable architecture can help with that problem.

Michael Perry 00:11:54 The two generals problem is a consistency problem framed as a story. The story is about a couple of armies encamped outside of a besieged city, and the generals of these armies have to communicate with one another in order to decide when they’re going to attack. If only one army attacks, then the attack is going to fail, so they must both attack at the same time. They need to agree on a protocol where they can send scouts back and forth with messages in order to say, okay, we’re going to attack tomorrow. And the rule of this game is that if you aren’t sure that the other general knows that you’re going to attack and has agreed that yes, we are going to attack, then you must abstain. Otherwise you run the risk of going alone, not knowing whether he’s going to follow you.

Michael Perry 00:12:48 So the problem asks us to try to come up with this protocol, to imagine ourselves as one of these generals and say, okay, I’m going to send a scout and he’s going to say, we’re attacking tomorrow. And then we’re going to attack. Well, what if that scout is captured? This is enemy territory that he’s carrying the message through. So, okay, I have to wait a little bit and then send another scout. Well, how many scouts do I need to send before I know for sure that one of them made it through? Okay, what I need to do is add one more message to the protocol, where the other general sends a scout back saying, okay, I’ve heard your message, and then I can attack. But wait a minute. Now the other side doesn’t know if his confirmation made it through. If it didn’t, then he would be attacking without my help.

Michael Perry 00:13:34 And so it goes back and forth. What you end up with, which is actually something that can be proven mathematically, is that there is no protocol that solves this problem. This is an impossible problem. And this was exactly the problem that we were trying to solve with the gift card system, where it’s like, okay, the balance is $23. The server has to know what the balance is. It has to know that the client knows what the balance is, and we can’t allow any transactions to proceed unless both sides are in agreement that you’ve reached consistency. But you’ve also got these messages that are in flight. You’ve got these dial-up networks that may fail. So you have to reason through and figure out a solution to these problems. In order to slice that Gordian knot, what we had to do in the gift card system, and ultimately what we have to do when facing any distributed system like this, is change the rules of the game a little bit. You have to say, well, there’s no deadline. We’re not going to attack tomorrow. We’re going to attack at some stated time in the future. And then you also have to say that one side makes the decision and then just keeps on communicating until they’re happy with that decision. Once you make those two changes, the problem becomes solvable. But what you’ve really done is change it from a consistency problem to an eventual consistency problem.

Jeff Doolittle 00:15:08 Right. And with the generals, you could still attack, and the other army doesn’t get the message until you’re already attacking, but you’re just going to keep at it. Maybe they’ll see you, whatever; you just keep doing it. Now, back to your gift card example, that really reminds me of the fact that a lot of these problems have been solved in the real world for centuries, because this is pretty much how banking works. Credit card companies do this all the time. They’ll allow a certain transaction amount to go through without having immediate, strong consistency regarding your balance, for example, because they really can’t have that. If Visa tried to say, well, we can’t let anyone charge anything until we know their exact balance across all vendors, the system would not work.

Jeff Doolittle 00:16:02 Right, exactly. So it’s almost as if you’re saying we relax some constraints, and we still have boundaries for those constraints, right? You can’t just charge a hundred thousand dollars on your credit card and have it go through, and then they come after you later. There are checks and balances there to prevent that sort of thing from happening. But by relaxing some constraints, we can actually, as you put it, untie some of these Gordian knots and find a way to reasonably solve the problem, in a way that we can also predict much more clearly than if we try to just solve these with the typical fallacies-of-distributed-computing approach.

Michael Perry 00:16:38 Right, exactly. And one of the things that I love about that observation about the banks, and the fact that we’ve been solving this problem for years, is that it elevates what started as a technical problem to a business problem. And then we are able to switch that business problem into a business opportunity. So, hey, this isn’t an inconsistency anymore. This is an opportunity to charge people a fee if they go negative on their balance. So it’s a revenue stream rather than a technical problem.

Jeff Doolittle 00:17:09 That’s a great way of putting it. And I think it’s funny how often we’ve tried as an industry to solve unsolvable problems by throwing more code, or more scenarios, or, as you said before, edge cases at the problem, when the irony is that if somebody just asked the business, they might say, oh yeah, we’re okay with some of that. If you asked some programmers maybe 25 years ago, “I want you to build me a shipping system,” it’s like, well, that package must arrive, and it must arrive on time, and it must be received. If you build the system with all those musts, you’ve given yourself an impossible problem. But if you ask the business, they would say, yeah, sometimes packages don’t get there. Oh, well then what do we do? And there you go, those are your business problems. And as you said, those become opportunities if you can find ways to creatively resolve some of those issues, which companies like FedEx and UPS and all of them have worked hard to do.

Michael Perry 00:18:04 Exactly. And one of the solutions that they’ve come up with, and it’s a solution that we’ve been using for millennia, is to keep records of what we knew at certain times in the past. That’s really what immutable architecture is trying to capture: we’re just trying to understand the decisions and the knowledge that was present at the time that a decision was made, then share that information with all the other nodes in the system, and let them come to the appropriate conclusion given that history of facts.

Jeff Doolittle 00:18:38 So we’ve mentioned the fallacies of distributed computing a couple of times. We’re not going to be able to delineate all eight of them here in the episode, but I do want to encourage listeners: if you have heard of it and you don’t remember, or you’re unfamiliar with it, definitely go do a search for “fallacies of distributed computing.” And I think as an exercise afterwards, it’s helpful to look at each of those in light of the concepts that we’re going to be discussing in the rest of the episode, just to frame some of those issues that you yourself were facing with the gift card system. And basically, instead of just trying to use brute force, you redefined the problem.

Michael Perry 00:19:17 Exactly. Yeah, that’s what this concept is really all about. And people have actually redefined the problem in terms of immutability a number of times and published those in a number of places under different names. So this is me pulling upon that inspiration, adding in my own interpretation, and making that a system that I can then share with other people.

Jeff Doolittle 00:19:44 You mentioned analysis techniques before. This being a podcast, we’re not going to be able to get too much into those, because they’re highly visual. But I think that ability to do that modeling is potentially very helpful for listeners. And in that context too, you were mentioning business opportunities and business problems, and the ability to model things in a way that the business can understand has a lot of power as well. But as I said, it’s very visual, so we can’t get too much into that in this conversation. Maybe to help listeners connect a little bit more with this concept: do we see immutable architecture anywhere else out in the real world? Before, you mentioned Docker as an example of immutable infrastructure, which is kind of a form of immutable architecture, right? We’re spinning up containers, we’re composing different containers, maybe we’re throwing them up in a Kubernetes cluster, these kinds of things. But are there other examples that are close to, or use, these concepts that listeners may already be familiar with?

Michael Perry 00:20:39 The portion of Docker that I would say is most closely related to these concepts is the fact that when you identify a Docker image, you do so based on a hash. And that is the hash of the steps that it took in order to create that image. So if you were to take exactly the same steps, then you will end up with exactly the same image. That history of those steps, in some sense, is the image. It is the identity of that image. Another system that I’m sure a lot of our listeners are familiar with that uses a hash in order to identify an object within the system is Git. Git uses the hash of each commit as its identity. And what’s really awesome about that is that every single node in the system, whether it’s your developer machine, the machine of one of your teammates, or the remote, is going to compute the commit hash in the same way.

Michael Perry 00:21:35 So they’re all going to come up with the same identity. When I push a commit to a remote, it can check its logs and say, oh, I already have a commit with that hash, and it was the same one. And it knows that that commit itself hasn’t changed. The commit is immutable, but the result of a history of commits evolves over time. That’s the part that simulates something that changes using immutable history. And then a third example would be blockchain. That is a system whereby you can exchange transactions, each transaction being immutable, and they are gathered up inside of these blocks, and the blocks themselves are immutable. You can tell that they’re immutable because their identity, again, is their hash. If you were to try to add a new transaction to a block, you would get a new hash.

Michael Perry 00:22:30 And so that is a feature of the data structure that is helping you to enforce the immutability constraints that you put upon yourself. And now that you have that constraint, you can then make assumptions. If I ask somebody, well, what version do you have, and they give me a hash, I can say, oh, I’ve got the same hash, so we’re on the same version. Everything else that led into that hash is exactly the same. That’s the concept of the Merkle tree, for those familiar with it. So that’s a really powerful tool for you to have, and you can only have that tool if you’ve constrained yourself to immutability.
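
[Editor’s sketch] The idea of a hash serving as identity, where equal hashes imply equal history, can be illustrated with a few lines of Python. This is not Git’s actual object format or hash function; the `commit_id` function and its JSON payload are assumptions made only for illustration.

```python
import hashlib
import json


def commit_id(message, parent_ids):
    """Identity = SHA-256 over the commit's content and its parents' identities."""
    payload = json.dumps(
        {"message": message, "parents": list(parent_ids)},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Any machine that performs the same computation gets the same identity.
root = commit_id("initial import", [])
feature = commit_id("add gift card redemption", [root])
same_on_teammate_machine = commit_id("add gift card redemption", [root])

# Changing anything earlier in the history changes every later identity.
tampered_root = commit_id("initial import (edited)", [])
rebuilt_feature = commit_id("add gift card redemption", [tampered_root])

print(feature == same_on_teammate_machine)  # same history, same identity
print(feature == rebuilt_feature)           # different history, different identity
```

Because each identity folds in the identities of its parents, comparing a single hash is enough to compare everything that led up to it.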

Jeff Doolittle 00:23:09 Along with that, each commit in Git, or each ledger entry in the blockchain, references the prior hashes as well. So that becomes part of its identity, or at least part of the generation of its identity, so that you have the ability to see the history of how we got here in that chain of events, which is another interesting concept. And of course blockchain adds more things like proof of work and concepts like that, but that’s not necessarily a part of the immutability itself. The immutability is the result, which is that this ledger entry led to this hash, and therefore it is now, you know... And you can get into things like distributed consensus algorithms and such, but that would be part of the immutable architecture, would it not: the ability to figure out how we synchronize these nodes, and when we determine that they’re sufficiently synchronized that we’re now going to accept this generally across all nodes?

Michael Perry 00:24:04 Yeah, that would definitely be part of the immutable architecture, in order to have those historical facts that you are making those decisions based on. If you have the concept of consensus, where you have to know on one node that other nodes have reached the same conclusion, then you might be relying upon topology in order to solve a consistency problem, which is something to always look out for.

Jeff Doolittle 00:24:31 One of the fallacies of distributed computing. Yeah, exactly. Okay. Well, we’re moving right into a perfect segue here, because you mentioned before this ability for individual nodes to determine identity of an object by generating a hash. And one of the chapters in the book is all about location independence. So let’s use that now to say, okay, if I can independently determine the identity of objects and also determine whether or not they’ve been tampered with, what is this concept of location independence, and how does that relate to this ability to identify objects sort of in a sense by value, by generating a hash, as opposed to, by some reference or other identifier that’s generated by a centralized node,

Michael Perry 00:25:17 Right. So the purpose of having location-independent identity is to know that if you’re talking about a thing and another node is talking about the thing, then you’re talking about the same thing; you’re using the same identity for it. One of the ways that we often identify objects in any kind of system with a database involved is to use the auto-incrementing ID of that row as the identifier for the object. And if you think about it, that is a location-dependent identity. That ID is only good for that database. If you were to have multiple replicas of the database, they would come up with different IDs for the same object, or even worse, come up with the same ID for different objects. So it’s really important, when you’ve got a distributed system, that your identity not be tied to the location.

Michael Perry 00:26:08 And so using the hash is a great way to achieve location independence. Using a public key is a great way to identify a user or an entity that has autonomy in a distributed system, because now when other nodes have that same public key, they know for certain that they’re talking about the same principal within the system. And there are other forms of location-independent identity that we could consider, like natural keys, things that are already part of the domain. They’re going to be the same natural key no matter where you’re talking about them. When you use a hash as a location-independent identifier, what you’re doing is employing a pattern called content-addressed storage. This is a really powerful and really surprising pattern. It’s like, okay, in order for me to retrieve an object, I need to know its hash.

Michael Perry 00:27:03 Well, how can I have its hash if I don’t yet have the object? If somebody gives you the hash and says, okay, this is the contract I want you to sign, then you can go to the data store and say, okay, do you have the document that satisfies this hash? If it has previously stored that document and indexed it with that hash, then it can give you back the document. You have the added benefit of being able to hash the document and verify, yup, that was the correct hash. But the purpose is that nobody can change that document from that point forward, because that would produce a new hash, and it would therefore be a different identity, a different object. And everybody who’s got that same document has computed the same identity for it. It’s location independent.
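
[Editor’s sketch] A toy version of the content-addressed storage pattern described above, in Python. The `ContentAddressedStore` class and its in-memory dictionary are hypothetical stand-ins for a real data store; SHA-256 is an arbitrary choice of hash.

```python
import hashlib


class ContentAddressedStore:
    """Stores documents under the hash of their own bytes."""

    def __init__(self):
        self._documents = {}

    def put(self, document: bytes) -> str:
        # The address is computed from the content, not assigned by the store.
        address = hashlib.sha256(document).hexdigest()
        self._documents.setdefault(address, document)
        return address

    def get(self, address: str) -> bytes:
        document = self._documents[address]
        # The reader can re-hash the document to verify it was not altered.
        if hashlib.sha256(document).hexdigest() != address:
            raise ValueError("stored document does not match its address")
        return document


contract = b"I agree to the terms."
store_a = ContentAddressedStore()
store_b = ContentAddressedStore()

address = store_a.put(contract)
print(address == store_b.put(contract))                  # same content, same identity on any node
print(address == store_a.put(b"I agree to the terms!"))  # edited content gets a new identity
```

Compare this with an auto-incrementing row ID: two independent stores would hand out IDs in whatever order rows happened to arrive, whereas here the identity depends only on the document itself.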

Jeff Doolittle 00:27:45 With location independence as a property of our immutable architecture. We’re basically now allowing nodes with some agreed upon rules. The nodes are going to have to share these rules, but these rules are going to be, here’s how we generate a hash to identify objects. And so if I have the same content, I get the same identity as any other node. And so that seems like that also has some implications for now how you reconcile or create consistency across these nodes.

Michael Perry 00:28:16 Yeah. So a lot of people will talk about eventual consistency, and you’ll see a meme on Twitter about it, and they’re focusing on the “eventual” part of it, but it’s really the consistency that’s the important bit. In order for two different nodes to reach the same conclusion, you have to have some guarantee that they actually will reach the same conclusion: that when everything settles down, when the network quiesces, they end up at the same result. And that’s not just guaranteed out of the box. You have to adhere to some properties in order to make sure that that happens. So if you’ve got the idea of “last writer wins,” and you’re allowing two different nodes to make changes to an object, then which one is the last writer? If the changes both happened at those particular nodes, and then they shared that information with one another, then they each learned about the other one after they made their own change.

Michael Perry 00:29:11 So basically that’s a strategy in which both will agree that the other one is right. It’s going to lead to inconsistency. It’s going to lead to two different nodes computing different values for the same object. So really, eventual consistency means that after the network quiesces, everybody comes to the same conclusion. The question is, what does it mean for the network to quiesce? And that’s where you get different levels of consistency. We talked about strong consistency, where if you were to just ask as soon as you got your response, then you would get the same response from two different nodes. It’s very difficult to achieve that level of consistency. In fact, things like the CAP theorem will show us that you can’t achieve that level of consistency without breaking partition tolerance, but that’s getting a bit into the weeds. Then there’s the weaker constraint of eventual consistency, where it’s just: let as many messages flow as need to flow.

Michael Perry 00:30:15 And there you get things like Paxos and other algorithms that will vote and share state until everybody’s like, okay, are you sure you got that? All right. And everybody’s good, and now we’ve got the same answer. But then there’s something in between, called strong eventual consistency. That’s a guarantee that all the nodes will agree upon the current state once the information about the individual decisions has made it to all of the different nodes. It doesn’t require any additional voting and reconciling after the fact. It’s just that once everybody learns about the decisions that went into the state, they will all reach the same conclusion. And I think that’s the most powerful consistency level, the one that I strive to achieve in most of my systems.
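
[Editor’s sketch] One simple way to get the strong eventual consistency property described above is to have each node keep a set of immutable facts and merge by set union. The Python below is an illustration under assumptions, not the mechanism from the book: the fact tuples, `merge`, and `points_balance` are hypothetical names.

```python
def merge(known, received):
    """Merge two histories. Set union is commutative, associative, and
    idempotent, so delivery order and duplicate deliveries do not matter."""
    return frozenset(known) | frozenset(received)


def points_balance(facts):
    """Derive current state purely from the set of known facts."""
    return sum(points for _fact_id, points in facts)


# Each fact is (hypothetical fact id, points earned or spent).
node_a = frozenset({("visit-1", 10), ("visit-2", 15)})
node_b = frozenset({("redeem-1", -20)})

# The nodes exchange facts in different orders, and one message arrives twice.
a_after = merge(node_a, node_b)
b_after = merge(merge(node_b, node_a), node_a)

print(a_after == b_after)
print(points_balance(a_after), points_balance(b_after))
```

No voting round is needed: once both nodes have received the same facts, they hold equal sets and therefore derive the same state. Contrast this with last-writer-wins, where the answer depends on which update each node happened to see last.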

Jeff Doolittle 00:31:05 Now you could still add on top of that something like Paxos or another distributed consensus algorithm to basically verify that nodes in your system have achieved a certain parity, as far as the history that they’re dealing with, if that was necessary. But you’re saying, or I think you’re saying, that in some scenarios where that’s not necessary, strong eventual consistency can give you enough of a guarantee that you don’t necessarily have to go to that next level of adding strong consistency sort of on top of that.

Michael Perry 00:31:38 Right. Right. And that’s where it’s important to be able to model the problem in such a way that you can communicate with the business stakeholders, so that they understand the ramifications of the strong eventual consistency that you’re going for. And usually when you have those eventual consistency conversations, SLAs get into the mix. And that’s not really what it’s about. It’s not about, hey, you’re going to have this information, but it’s going to be delayed by 500 milliseconds, is that okay? It’s really more about, you know, somebody else is going to make a completely independent decision and you’re going to have to tolerate it. And somebody is going to be able to go to a restaurant and use that gift card. And that restaurant doesn’t know that the value has already been depleted, and they’re going to eat their food, and they’re going to walk away. You’re going to have to deal with that situation. So now it really enters into the business conversation, given the constraints and the limitations that are imposed and implied by eventual consistency.

Jeff Doolittle 00:32:41 Right. And in order to do that, one of the concepts you mentioned in the book is, it’s kind of a mouthful, for short it’s CRDTs. CRDT is conflict-free replicated data type. Now, what I’ve heard about these in the past, prior to reading your book, is that they typically related to things like counters or aggregators. And the concept was that, in a similar fashion, once every node has received all the events, say, or all of the numbers of something and aggregated them together, they would come to some agreement. An example might be a Twitter follower count, which at any point in time may be off a little bit, but it’s okay. It’s not the end of the world. It’s eventually going to catch up. And so that would be an example where you have a conflict-free replicated data type, where every node in the system knows how to resolve any conflicts in the numbers or things of that nature. But you’ve taken this concept to a different level. In fact, I’m going to have to find the link, but there’s an author who I remember at one point, and maybe you remember who it is, who had basically said you don’t want to use CRDTs for anything beyond that kind of aggregation and summarization approach. But in the book, you’re taking the concept of CRDTs and basically treating full documents as conflict-free replicated data types. So maybe first give us a brief idea of what the heck that means.

Michael Perry 00:34:11 Quite a mouthful. Yeah. A lot of concepts going on there. Yeah. So the idea of a conflict is that concurrent changes have occurred: I’ve made a decision, you’ve made a decision. If we allow both of those decisions to stand, then they’re in conflict, and the system no longer obeys some sort of constraint that we would like it to have. You know, like going back to the gift card thing, if one of our constraints is that the balance of a card can never be below zero, and we redeem a card at two different places, then if those are both allowed to stand, the balance is below zero. And so in order to protect against those things, we want to make sure that things are conflict free, that the data structure itself doesn’t allow conflicts. Well, that exact constraint there, that invariant, is one that you can’t enforce with any kind of a data structure. So really what you want to do is come up with the value that you can ensure is conflict free and then build a data structure around that. And so the mathematics behind CRDTs kind of describe what those operations are and what those constraints can be. So a simple form of a CRDT is one in which all of the operations are commutative. And so if you were to apply the operations in a different order, then you would come to the same conclusion.

Jeff Doolittle 00:35:41 This is a plus b equals b plus a.

Michael Perry 00:35:44 Exactly. That’s right. So if you had an operation that was, let’s keep score. So we’ve got a game and you just want to keep score in the game. And there were some mechanism by which you could be sure that nobody could enter their score twice, so we’ve solved that problem of duplication at the mechanical level. Then since keeping score relies upon addition, and addition is commutative, now you have what’s known as a commutative conflict-free replicated data type. So it’s a mathematical concept that has real-world ramifications. And it’s the sort of thing that has been used to solve a lot of these small problems like you were talking about, like a Twitter follower count. Well, that’s another addition problem. You can add Twitter followers from different places in different orders, and you’ll always come to the same total, as long as you’ve solved the problem of duplication at the network level. But it’s not applicable for other problems, like making sure that a card balance doesn’t go negative. So it’s really understanding the mathematics behind it and the constraints that you can and cannot prove.
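
The score-keeping idea can be sketched as a small Python example. It is a minimal illustration, not the book's implementation: the `ScoreBoard` class, the operation ids, and the amounts are all hypothetical, and de-duplication by operation id stands in for whatever mechanism prevents a score from being entered twice. The second half shows why the gift card rule falls outside this approach: each redemption is valid on its own node, but the merged result breaks the invariant.

```python
from itertools import permutations


class ScoreBoard:
    """Hypothetical score keeper: addition plus de-duplication by operation id."""

    def __init__(self):
        self._points_by_op = {}

    def apply(self, op_id, points):
        # A repeated op_id is ignored, so re-delivery cannot double count.
        self._points_by_op.setdefault(op_id, points)

    @property
    def total(self):
        return sum(self._points_by_op.values())


ops = [("op-1", 3), ("op-2", 7), ("op-3", 2)]

# Apply the same operations in every possible order, with one duplicate delivery.
totals = set()
for ordering in permutations(ops):
    board = ScoreBoard()
    for op_id, points in ordering:
        board.apply(op_id, points)
    board.apply(*ordering[0])  # simulate the network delivering one message twice
    totals.add(board.total)

print(totals)  # a single element: every ordering converges on the same total

# The "balance never below zero" rule is not protected by commutativity.
opening_balance = 50
redemptions = [("restaurant-a", 40), ("restaurant-b", 30)]

# Each node checks the invariant against only what it knows locally.
each_locally_valid = all(opening_balance - amount >= 0 for _, amount in redemptions)
merged_balance = opening_balance - sum(amount for _, amount in redemptions)
print(each_locally_valid, merged_balance)
```

Every node agrees on the merged balance, so the data type still converges; what it cannot do is stop that agreed value from being negative.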

Jeff Doolittle 00:36:59 There’s an interesting article from 11 years ago in the IEEE Computer Society magazine called “A Commutative Replicated Data Type for Cooperative Editing.” And it’s interesting, they refer to CRDT as a commutative replicated data type, so it’s interesting that now it’s being used to say conflict free. But you just mentioned the idea of commutativity. And of course, that’s going to have implications that I think could be very helpful for, I’m not sure if this is one of the fallacies of distributed computing, but messages always arrive in order. That is definitely a fallacy. And so basically this is saying, regardless of the order that messages are received and processed, we’re going to end up with the same end result.

Michael Perry 00:37:43 Right. Yeah. And actually the history of CRDTs is a little bit interesting, and you can kind of see some of that coming through in what you just shared there. So Marc Shapiro is one of the mathematicians who originally worked on this concept.

Jeff Doolittle 00:37:58 Marc is one of the authors of the article I just mentioned.

Michael Perry 00:38:01 Yes. Yes. Excellent. So yeah, at different times during the evolution of the idea, they used different words to define CRDT. And so, as we’ve come to understand it now, a conflict-free replicated data type is one where, once everybody has received all of the information about what went into it, all of those data types compute the same value. And there are essentially two ways to achieve that. One is commutativity, which we just described, and then there’s an operational CRDT. And so it’s got a different set of rules, such that if you were to follow these rules within your data type, it would still converge to the same outcome. And one of Marc’s papers is an equivalence proof between those two, that you could implement one in terms of the other. So yeah, it’s kind of evolved. One of the data structures that he worked on was something called Treedoc.

Michael Perry 00:39:01 And I think that that paper is really describing Treedoc. And that is a way of capturing all of the changes that people make to a document, like typing a letter, so collaborative editing, in such a way that once they all learn about everybody else’s changes, they will all render the same document, the same letter, on their screen. So it’s a very powerful tool if you’re doing something like Google Docs. And in fact, I think Treedoc was used within Google Docs, and it competes against a different algorithm, a different strategy, called operational transforms, where every time somebody makes an edit, it goes to a central location. That central location takes a look at that in terms of other edits that have happened concurrently, and then transforms that operation before sending it back out. And so that’s solving a consistency problem using topology. So you’ve got a central location, you send everything there, it transforms into a history that makes sense. And then, and only then, can the other participants learn what the eventual outcome was.

Jeff Doolittle 00:40:10 But that’s okay, because topology doesn’t change. Oh, there’s one of the fallacies again. So now we have these concepts we’ve kind of laid out that are somewhat generalizable. Like you mentioned, Marc Shapiro’s article on CRDTs, and ideas of the CAP theorem and consistency. We haven’t really dug too deep into the CAP theorem, nor do we really have time to here, but just to remind listeners, could you briefly tell us what it is and what the three parts of the acronym represent?

Michael Perry 00:40:42 Yeah. So CAP stands for consistency, availability, and partition tolerance. And by consistency, we’re talking about the strong consistency kind. And the CAP theorem is one of these impossibility proofs that I love so much. It says that at any given point in time, it’s impossible to have all three. Partition tolerance, by the way, is the behavior that a system will continue to offer whatever guarantees it offers in the face of network partitions, messages not being able to flow. So if that happens, you either have to give up consistency or you give up availability.

Jeff Doolittle 00:41:15 So trade-offs, basically, between those three. Yeah, exactly. With those concepts in mind, now there’s something you mentioned in the book that maybe is somewhat unique, and it’s this idea of facts, what you call historical facts. Can you talk about those a little bit, and maybe how those differ from what listeners might be used to with, say, events in an event-sourced system?

Michael Perry 00:41:38 The primary difference is the order of events versus the order of facts. Events are totally ordered, meaning that you can put them in sequence. So you know for certain, anytime you take a look at two events, which one came before the other. Facts are partially ordered. Sometimes you can compare two facts and say, okay, this one came before that one, but sometimes you can take a look at them and say, well, I can’t tell which one came first. And partial order turns out to be a really strong feature of historical facts and something that you can use in a distributed system. So kind of looking at event sourcing, let’s take a look at a specific implementation. So we’ve got Event Store. This is a data store where you can put your immutable events into that data store. And then when you read them back, you’ll read them back in a certain order.

Michael Perry 00:42:32 And the way that you can figure out the current state based on those events is you just new up an object in its initial state. And then you start to play each of the events against that object, allowing that object to mutate. And then when you’re finished, the object is in a particular state. Everybody who’s read that same sequence of events will compute the same state. Now, you can do that in an immutable fashion. It doesn’t really change the outcome, but it makes things a bit easier to reason about, where the initial state is an input to a function that also takes one of the events. So this function, now taking two things, a binary function, will output the next state. And then you can use that output as one of the inputs to the function itself with the second event. And so if you were to apply this in a functional programming language, then you would basically be doing a left fold over this sequence of events.
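
The left fold over events can be written directly with Python's `functools.reduce`. This is a minimal sketch: the event names (`item_added`, `item_removed`) and the set-of-items state are hypothetical examples chosen for illustration, not taken from Event Store or the book.

```python
from functools import reduce


def apply_event(state: frozenset, event) -> frozenset:
    """Binary function: takes the current state and one event, returns the next state."""
    kind, item = event
    if kind == "item_added":
        return state | {item}
    if kind == "item_removed":
        return state - {item}
    return state  # unknown events leave the state unchanged


# The store hands the events back in one agreed, total order.
events = [
    ("item_added", "apple"),
    ("item_added", "pear"),
    ("item_removed", "apple"),
]

# Left fold: start from the initial state and feed each output into the next call.
final_state = reduce(apply_event, events, frozenset())
print(sorted(final_state))

# Replaying the same events in a different order gives a different state,
# which is why this approach depends on the total order.
reordered_state = reduce(apply_event, reversed(events), frozenset())
print(sorted(reordered_state))
```

Anyone who reads the same sequence computes the same `final_state`; the reversed replay shows what is lost when the order is not agreed upon.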

Jeff Doolittle 00:43:31 I can hear Greg Young saying that right now. Exactly. The creator of CQRS, and I believe he’s the chief scientist of Event Store.

Michael Perry 00:43:41 Yeah, I believe that. And so that’s the idea of event sourcing: you’ve imposed this total order on your events. And that now means that the events themselves don’t have to know about each other. It’s just that because they were put into the store in a certain order, they will produce the expected outcome.

Jeff Doolittle 00:44:03 So to summarize that, to reach a particular state, it’s required that these events be ordered, and the events themselves are unaware of one another.

Michael Perry 00:44:14 That’s right. Yes. Okay. And now compare that with historical facts as I define them in the book. So a fact knows about the facts that immediately preceded it, its predecessors. So it knows that it followed these one or two previous facts, similar to a Git commit. If you take a look at the commit, you can see, well, this is your parent commit. So that commit immediately followed that parent. And so that imposes a partial order on those facts. If you were to apply the transitive property, you could say, well, since it followed this predecessor and that one has another predecessor, then it also followed that other predecessor, and you can work your way back through history. And you can see everything that must have happened before this fact did. But that transitive closure over history won’t include facts that occurred concurrently. So a decision that was made on a different node, without knowledge of the decision that you’re making right now, is going to be a concurrent decision. And it’s not going to have a reference to your fact, nor is yours going to have a reference to that one.
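
The predecessor relationship and its transitive closure can be sketched as follows. This is an illustration under assumptions: the `Fact` dataclass, the `history` and `concurrent` helpers, and the fact names are all hypothetical and are not the data model from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical immutable fact that points at its immediate predecessors."""
    name: str
    predecessors: tuple = ()


def history(fact):
    """Everything that must have happened before `fact` (transitive closure)."""
    seen = set()
    stack = list(fact.predecessors)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(current.predecessors)
    return seen


def concurrent(a, b):
    """Two distinct facts are concurrent when neither appears in the other's history."""
    return a != b and a not in history(b) and b not in history(a)


root = Fact("contact created")
phone = Fact("phone changed", (root,))   # decided on one node
email = Fact("email changed", (root,))   # decided on another node, unaware of `phone`
merged = Fact("changes reconciled", (phone, email))

print(concurrent(phone, email))   # neither refers to the other
print(concurrent(root, phone))    # root is in phone's history, so they are ordered
print(sorted(f.name for f in history(merged)))
```

The partial order falls out of the references alone: `phone` and `email` cannot be compared, while `merged` is known to follow both.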

Jeff Doolittle 00:45:23 Could that be somewhat similar to what happens in Git, where you have two people who’ve both done a commit to the same branch, and now you’re in a situation where you are going to have a merge conflict?

Michael Perry 00:45:34 And in fact, if you, if you were to take the branch concept out of it, it still works. Branches are just a way to point to the commit. Right?

Jeff Doolittle 00:45:42 Right. I always say the same thing. There are no branches. They’re just pointers to a commit. That’s like the Zen moment with Git. Oh, there’s no branches. Yeah. There is no spoon, that’s right.

Michael Perry 00:45:53 That simple fact of two different commits having the same parent, that is the thing that represents a concurrent change, a concurrent pair of commits. So neither one of them knows about the other, but they both have some common ancestor. And so no matter what your branching structure is, if you say that those two are on the same branch, or if they’re in different branches, you’re going to end up having to create a merge or a rebase. So you’re going to have to resolve that concurrent change. Now, this is where we have to really be careful about terms. Some terms have mathematical definitions in certain contexts. So a conflict has the mathematical definition that when different nodes receive this information, they compute different outcomes. And so what has really happened in Git is not a conflict. Both of the nodes understand completely that there is a tree in which you’ve got two leaves, and they both point back to the same parent. No conflict there at all. So the conflict really comes in, in terms of Git, when you are now trying to come up with one single code base that is derived from that history. But in the world of CRDTs, conflict-free replicated data types, that’s still conflict free. There is no conflict there.

Jeff Doolittle 00:47:17 Now the conflict could be, you know, in how people experience what happens with the merge. So I think there’s an interesting concept here, where in order to reconcile two Git commits that have the same parent, that typically involves human interaction. And to the extent that it doesn’t, you can get into weird scenarios where, say, the code doesn’t compile, or whatever it might be. So we’re not really talking about CRDTs in Git land. We’re talking about a human who is reconciling the conflict and making decisions about whether I can merge these things, or I can try to merge and fix things so that it merges correctly and still builds, or I can rebase, and then my history stays linear instead of having these merge commits. So with this concept of facts and CRDTs, talk a little bit about how you can say in advance that these things will be reconcilable at some point in the future, or now. These two conflicting facts with the same parent, those can always be agreed upon by every node in the system, such that they lead to the same end state. How do you do that?

Michael Perry 00:48:30 Yeah. And that comes down to constraints. It comes down to the things that you do and do not allow yourself to do when you’re building the system. So if your end state is this graph, if your end state, while you’re looking at Git, is this history of commits, then everybody has the same history. It’s only if you’re considering your end state to be, you know, a bunch of source code, that now you’re trying to take that graph and shoehorn it down into a single linear thing. So if you do all of your work on the graph itself, rather than doing your work on something that you compute from the graph, then you can say pretty confidently that everybody’s going to end up with the same graph. Because if I share a fact with you, then by virtue of the fact that it’s immutable, if you already have it, you know you already have it, and it hasn’t changed. It’s already part of your graph.

Michael Perry 00:49:25 And if not, you can just go ahead and add it to your graph. And you know that you’re not going to invalidate the graph structure, which, to begin with, is a directed acyclic graph. There won’t be any cycles in your graph. And each fact in that graph has a pointer to its predecessors. It has a direction to the edges. So each one of those edges is directed. And so now if you make your decisions just based on the graph structure, then everybody’s going to make the same decision, because they’re all looking at the same graph.
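
Why every node ends up with the same graph can be sketched by treating the graph as a set of immutable facts and sharing as set union. This is an assumption made for illustration: each fact is modeled as a hypothetical `(name, predecessor names)` tuple, and real synchronization between nodes involves far more than this.

```python
# Each fact is an immutable (name, predecessor names) pair; the names are hypothetical.
root = ("contact created", frozenset())
phone = ("phone changed", frozenset({"contact created"}))
email = ("email changed", frozenset({"contact created"}))

# Two nodes start from the same root and each records its own decision.
node_a = frozenset({root, phone})
node_b = frozenset({root, email})

# Sharing facts is a set union: a fact you already hold is unchanged,
# and a fact you lack is simply added.
a_after_sync = node_a | node_b
b_after_sync = node_b | node_a

same_graph = a_after_sync == b_after_sync

# Receiving the same facts a second time changes nothing.
a_after_resync = a_after_sync | node_b
stable = a_after_resync == a_after_sync

print(same_graph, stable, len(a_after_sync))
```

Union is commutative and idempotent, so the order and number of deliveries do not matter; any decision computed purely from the resulting graph is therefore the same on both nodes.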

Jeff Doolittle 00:49:58 And this doesn’t preclude a person still creating a new fact that has two predecessors that are those conflicting facts. I think an example you use in the book is a contact management app. One fact changes the phone number, and one maybe changes the email, and those are independent facts with the same predecessor. A person could then create a new fact with those two predecessors and reconcile and say, well, I’m also going to change the name, but I’m going to accept this change to the phone and this change to the email, and reconcile those all together into this new fact.

Michael Perry 00:50:30 Right. Yes. And so, yeah, that’s a really great example of historical facts. So each decision that somebody may have made in order to change the phone number of a contact, or to change the email of the contact, each one of those decisions is captured as a separate immutable historical fact record. And in its predecessor relationships, as it’s pointing back to other facts, it’s revealing what information was known to that particular user at that time. So you can tell that this email was set with knowledge of this prior email. So it overwrote that, and now you end up with a chain of email changes over time. And if you’re going to take a look at the end of that chain, you would say, well, there’s only one node there at the end of the chain. Therefore I can tell with confidence what the user intended for the email to be.

Michael Perry 00:51:23 But if you look at it and you see that it forks, now you’ve got not a chain but a graph that has two children, two leaves. Then you can’t tell for sure which one was the user’s intent, because each of those changes was made without knowledge of the other. It was a pair of concurrent changes. But just like with a Git merge, you’ve got a merge commit that has two different parent commits. So that is the end user making the decision that when I combine these two histories, this is the outcome that I want to get. So it’s a way of expressing the user’s intent in a new historical fact.
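
The chain-versus-fork distinction for the contact's email can be sketched like this. It is a hypothetical model for illustration only: each email fact is stored as `id -> (value, predecessor ids)`, the `candidates` helper and the example addresses are invented here, and none of it comes from the book's code.

```python
def candidates(facts):
    """Values of the facts that no other fact names as a predecessor (the leaves)."""
    superseded = {pred for _, preds in facts.values() for pred in preds}
    return sorted(
        value for fact_id, (value, _) in facts.items() if fact_id not in superseded
    )


# e2 and e3 were both made knowing only e1, so they are concurrent.
email_facts = {
    "e1": ("alice@old.example", []),
    "e2": ("alice@work.example", ["e1"]),
    "e3": ("alice@home.example", ["e1"]),
}

forked = candidates(email_facts)
print(forked)  # two leaves: the user's intent is ambiguous

# The user resolves the fork with a new fact that names both leaves as predecessors.
resolved_facts = dict(email_facts)
resolved_facts["e4"] = ("alice@work.example", ["e2", "e3"])

resolved = candidates(resolved_facts)
print(resolved)  # one leaf again: the intent is unambiguous
```

Nothing is overwritten to resolve the fork; the new fact records the user's choice, much like a merge commit with two parents.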

Jeff Doolittle 00:52:06 With time running out, which it always does, there are so many more concepts that we could get into, and I’d encourage listeners to find out more in the book about how you handle state. You know, we mentioned before that being a left fold over events, but the same concept applies with facts, right? I mean, basically you run a function over a bunch of facts to determine state, which is interesting. There are security and privacy issues. There’s volume of data: oh my gosh, I stored these facts forever, how’s that going to work? And then of course, how do we communicate and synchronize between nodes without coupling ourselves to topology, and on and on. So all of those concepts and more, if those interest you, are definitely worth researching more on the internet, and also getting a copy of Michael’s book to dig deeper as well. So maybe to help us wrap up, let’s talk a little bit about adoption and impact. It kind of sounds like maybe these concepts could work great in a greenfield application, but what about existing systems?

Michael Perry 00:53:08 Yeah, I think that is really the key to making anything something that can be adopted. And so I do go through lots of different patterns that can be applied in order to gradually get into an immutable architecture. Like, for example, if you are integrating with an API that was written with location dependence in mind, that was written with mutability in mind, how do you bridge that gap? Well, I’ve got a pattern in the book that I call the outbox pattern that gives you as good a guarantee as you’re going to be able to get with immutability on one side and mutability on the other. And there are ways to apply the principles of immutable architecture using the tools that we’ve already used to build our mutable, our stateful systems. So I walk through how you can apply immutability principles just using a regular SQL database.

Michael Perry 00:54:04 And so this is something that you can work your way into. And in fact, in all of the systems that are built for my clients, we’re building the system with a team of people, some of whom are not yet familiar with the immutability concepts, and we’re building it for a group that’s going to have to take this over and follow those concepts. And we’re adding onto systems that are already stateful in nature. So yeah, I’ve had to practice time and time again the mixture between immutability and stateful systems. And it is a bit of a balance, but it is very well achievable.

Jeff Doolittle 00:54:41 To kind of wrap that up. What would you say would be the chief benefits if someone’s considering these concepts and ideas and they say, okay, I want to start using these ideas of, of immutable, architecture and facts. What are the primary benefits? If you could summarize them briefly of, of doing so, why would anybody want to do this? Why don’t they just keep doing what they’ve always done?

Michael Perry 00:55:01 Yeah. The primary benefit that I’ve found is that now you can really reason about your systems. You can understand what they’re going to do, which, when you’re talking about a distributed system, has just really been almost impossible to do. You can understand the constraints that you have to follow in order to get certain properties. And you can understand what properties you can’t get from a distributed system. And you can understand that, okay, strong consistency is really expensive. And so now you can recognize when you’ve made that choice. And then you can say, well, if I do need that, we’ll go with this architecture. If I don’t need that, then I can go with an immutable architecture. So the kinds of systems that you can build once you’ve adopted this are things like progressive web apps, where you’ve got your state on the mobile device and things can go offline and the user can still work with the system. You can have active-active failover, where you’ve got different databases that are backing your application, and you can switch from one to the other at any time and know that you’re not going to corrupt the data structure.

Michael Perry 00:56:11 So basically you can solve the problems that we usually think of as edge cases, that are usually kind of the afterthoughts, like: we’ve been testing it, and now sometimes, in this condition, we get a race between these two things. And now we have to try to reproduce it in the lab, and if we’re able to do so, then stop it, and then we have to debug and step through that. It’s just really difficult. And ultimately that way of thinking about developing software is not going to scale. It’s not going to produce good software in the future. Yeah. It’s not resilient. It’s not robust. And, you know, to go to the extreme, it’s definitely not anti-fragile, meaning systems that actually improve when there are issues. And it even sounds here like that’s another level that you could start to take this concept of immutable architecture.

Michael Perry 00:56:59 Absolutely. Yeah. I foresee a world in which that is the default. That’s the way we start building our systems and we’re better for it. That sounds great. There’s reason enough right there. So if people want to find out more about what you’re up to, where should they go? Oh yeah. You can go to immutable and that’s where you can find out all about the book. You can also see some of the patterns and historical And it’s also got a link to a YouTube channel where I have talked about these different concepts in that visual form, and then check out my courses on Pluralsight as well, because I’m using immutable architecture in distributed systems as I teach you the principles of distributed systems.

Jeff Doolittle That’s great. And listeners can also find you on Twitter at Michael L. Perry. Is that correct? That is correct. Yeah. Great. Well, Michael, thank you so much for joining me today on Software Engineering Radio. Yeah. Thank you very much, Jeff. This was a pleasure. This is Jeff Doolittle for Software Engineering Radio. Thanks for listening.
[End of Audio]

This transcript was automatically generated. To suggest improvements in the text, please contact [email protected].

SE Radio theme: “Broken Reality” by Kevin MacLeod (Licensed under Creative Commons: By Attribution 3.0)
