
SE Radio 654: Chris Patterson on MassTransit and Event-Driven Systems

Chris Patterson, founder and principal architect of MassTransit, joins host Jeff Doolittle to discuss MassTransit, a message bus framework for building distributed systems. The conversation begins with an exploration of message buses, their role in asynchronous and durable application design, and how frameworks like MassTransit simplify event-driven programming in .NET. Chris explains concepts like pub/sub, durable messaging, and the benefits of decoupled architectures for scaling and reliability.

The discussion also delves into advanced topics such as sagas, stateful consumers for orchestrating complex processes, and how MassTransit supports patterns like outbox and routing slips for ensuring transactional consistency. Chris highlights the importance of observability in distributed systems, sharing how MassTransit integrates with tools like OpenTelemetry to provide comprehensive monitoring. The episode includes advice on adopting event-driven approaches, overcoming leadership hesitancy, and ensuring secure and efficient implementations. Chris emphasizes the balance between leveraging cutting-edge tools and addressing real-world challenges in software architecture.

Brought to you by IEEE Computer Society and IEEE Software magazine.





Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Jeff Doolittle 00:00:18 Welcome to Software Engineering Radio. I’m your host, Jeff Doolittle. I’m excited to invite Chris Patterson as our guest on the show today for a conversation about MassTransit and event-driven systems. Chris is a software architect and an open-source leader with over 30 years of experience in designing, developing and deploying technology solutions. He’s the owner and consultant of Loosely Coupled LLC — a company that provides technology consulting and developer support services for MassTransit, an open-source distributed application framework for .NET. Chris is the founder and primary maintainer of MassTransit, which he has been leading since 2007. He is passionate about creating and contributing to open-source projects that enable developers to build message-based applications with ease and reliability. He regularly produces software development related content on YouTube, sharing his knowledge and expertise with the community. Chris, it’s a pleasure to have you. Welcome to the show.

Chris Patterson 00:01:13 Hey, thanks. What a good introduction. I mean, you basically covered it all.

Jeff Doolittle 00:01:18 Well that’s great. That’s my job. So let’s dive right in. I mentioned at the top of the show something about MassTransit, which basically before we get into MassTransit we have to answer a different question, which is what is a message bus and why should anybody care?

Chris Patterson 00:01:31 That’s a conversation that comes up and it’s something that has historically been a number of different things. If you go back to kind of the early software or service-oriented architectures and the concept of this enterprise service bus was floated around and it was a lot of corporate middleware, it was a lot of big vendor type things. It was generally when people of an Agile and kind of modern software development mindset hear things like enterprise service bus, they think, oh my gosh, that’s just crazy old SOA stuff and that’s not something we want. And a message bus is probably just a different way of saying that we want to build distributed applications in a durable and reliable way. And typically we do that using message-based message brokers. So things like RabbitMQ, Azure Service Bus, Amazon has SQS, ActiveMQ, these are message brokers that are able to do durable, reliable delivery of messages asynchronously.

Chris Patterson 00:02:35 So with .NET, for what the past, geez it has to be eight years, we’ve had async and await in the Task Parallel Library and people have really adopted that. And the whole framework, if you look at the latest ASP.NET, everything is async Task. Everything is around doing things asynchronously and parallel concurrently. And it’s really made .NET a highly thread-aware, highly scalable framework for building applications. And if you look at the benchmarks and you look at the way they built things, async has become kind of a staple of modern software development using .NET. And an accompaniment that needs to go along with that is, it used to be in our applications, things would be somewhat synchronous. Think about it as a transaction script. I need to parse the request from the customer. I need to store that request in a database.

Chris Patterson 00:03:28 I need to insert some rows into the reporting table. I need to insert some rows into this thing. Things like SQL triggers and all of these ways of trying to build asynchronous processing into it. Essentially, a start-to-finish transaction script is how applications were built. Well now that we’re asynchronous, we want to do things like, oh well I can go update those other tables separately. I can do those asynchronously so I’m not slowing down the user response. I want to provide a fast response to the user. So instead of following that kind of transaction script programming model, we are able to use a message bus to produce events that can then asynchronously run and do those things for us. So when a synchronous process completes, such as adding a customer, I could produce an event via message bus that says, hey, a customer was added.

Chris Patterson 00:04:21 And then the follow-on processes such as updating the billing system, creating an account or validating an address or sending a confirmation email or any of those other things that happen when you create a new customer are able to run asynchronously by consuming those events on a message bus. So it’s a way of having execution and the word I commonly hear is pub/sub. I want pub/sub for my event. So a message bus provides that pub/sub framework so that you don’t have to know all of the consumers of those events upfront. Because like in that transaction script, if you wanted to add another step, you would have to go modify that original script and insert some more rows or do whatever or call an HTTP API, any of those things. Whereas with a message bus, and by using events and pub/sub, I’m able to have an additive architecture where I can add new capabilities to the system without touching any of the front-end customer creation code. That customer creation code works as it does and produces that event on the message bus. And then a consumer is able to take that message from the message bus, because it fans out, it’s publish/subscribe, and do additional capabilities without having to work through the entire system and add that function in. So that’s in general what a message bus is, at least how they’re used today versus how some might think of the enterprise service bus of yesteryear.
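
The additive, fan-out behavior described in this answer can be sketched in a few lines. This is a minimal in-memory illustration in Python, not MassTransit's API and not a real broker; `MessageBus`, `subscribe`, `publish`, `CustomerAdded`, and `add_customer` are hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Type


@dataclass(frozen=True)
class CustomerAdded:
    """Event published after a customer has been created (hypothetical)."""
    customer_id: str
    email: str


class MessageBus:
    """Toy in-process bus: delivers each published event to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[Type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: Type, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: object) -> int:
        """Fan the event out to all handlers; return how many received it."""
        handlers = self._subscribers[type(event)]
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = MessageBus()
log: List[str] = []

# Two follow-on processes subscribe without the producer knowing about them.
bus.subscribe(CustomerAdded, lambda e: log.append(f"billing account for {e.customer_id}"))
bus.subscribe(CustomerAdded, lambda e: log.append(f"confirmation email to {e.email}"))


def add_customer(customer_id: str, email: str) -> None:
    # The synchronous work (storing the customer) would happen here;
    # afterwards the change of state is announced as an event.
    bus.publish(CustomerAdded(customer_id, email))


add_customer("c-1", "pat@example.com")

# A new capability is added later; add_customer itself is not modified.
bus.subscribe(CustomerAdded, lambda e: log.append(f"address check for {e.customer_id}"))
add_customer("c-2", "sam@example.com")
```

The point of the sketch is that `add_customer` never changes as subscribers are added. A real bus would also make delivery asynchronous and durable, which this in-process version does not attempt.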

Jeff Doolittle 00:05:48 So that kind of makes me think a little bit about the open closed principle, which is the idea that at least at the class level classes should be open for extension and closed for modification. It sounds like this kind of brings that up to the level of the system architecture, but are there some risks or complications to that? I wonder if somebody might think, well it was really easy for me to just open my method and change it and extend my system and now you’re telling me I have this distributed message system I have to deal with. How would you speak to some concerns that somebody might have about added complexity with these different moving parts that maybe they’re not familiar with?

Chris Patterson 00:06:22 That’s definitely the trade-off that teams have to think about. And it’s one of the questions. I think one of the early statements made by one of the elders of software development, so to speak, is: don’t distribute. If at all possible, don’t distribute. When you build a distributed system now you have two problems or eight problems or 12.

Jeff Doolittle 00:06:41 Well it goes back to L. Peter Deutsch and the Eight Fallacies of Distributed Computing. And we’ll put a link in the show notes to the episode where I interviewed Peter talking about just that. So yes.

Chris Patterson 00:06:49 Yeah, exactly. It’s the complexity does go up when you build a distributed system. Now, again, since we’re talking about a message bus and you mentioned MassTransit at the beginning, it isn’t always that complexity that needs to come in. So it is possible to do durable asynchronous processing without introducing a huge amount of complexity. But to discuss what’s a task there? Let’s talk about some atoms that are part of a message bus system. You have producers and you have consumers. Producers are things that run, they could be action methods on controllers. They could be a database process or some sort of command line script that runs, loads some data in a database and produces an event. They can be anything that happens where there’s a change in state and that change in state needs to be notified AKA produced as an event so that other parts of the system can react and respond to those changes of state.

Chris Patterson 00:07:43 I mean that’s event-driven architecture by design: events happen and systems respond to those events. So that’s producers. With consumers, at least with MassTransit, it’s a thing that takes an event, a message, a command, any type of message and does something with it. They’re very similar to controller actions. They are a method called Consume that gets the event passed to it. It’s the Hollywood principle. Don’t call us, we’ll call you. You don’t have to worry about reading and pulling from a file and parsing all that stuff. The framework sits on top of RabbitMQ or Azure Service Bus and delivers a nicely wrapped message to your consumer in a method that you can then do whatever you want with. You can make HTTP calls, you can update database rows, you can produce other events. I mean any of the things that you can do in code you would do within that consumer.
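
To make the "don't call us, we'll call you" idea concrete, here is a small Python sketch of a framework-like dispatcher that decodes a raw payload and hands a typed message to a consumer. It is an illustration under stated assumptions: the `Dispatcher` class, the JSON envelope with `type` and `body` fields, and `OrderSubmitted` are all hypothetical and do not reflect MassTransit's actual wire format or interfaces.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class OrderSubmitted:
    """Hypothetical message contract."""
    order_id: str
    total: float


class OrderSubmittedConsumer:
    """User code: only a consume method, no broker or parsing logic."""

    def __init__(self) -> None:
        self.handled: List[str] = []

    def consume(self, message: OrderSubmitted) -> None:
        self.handled.append(f"{message.order_id}:{message.total:.2f}")


class Dispatcher:
    """Stand-in for the framework: decodes raw payloads and calls consumers."""

    def __init__(self) -> None:
        self._types: Dict[str, Type] = {}
        self._consumers: Dict[str, List[object]] = {}

    def register(self, message_type: Type, consumer: object) -> None:
        name = message_type.__name__
        self._types[name] = message_type
        self._consumers.setdefault(name, []).append(consumer)

    def deliver(self, raw: bytes) -> None:
        # Deserialize the envelope, build the typed message, call each consumer.
        envelope = json.loads(raw)
        name = envelope["type"]
        message = self._types[name](**envelope["body"])
        for consumer in self._consumers[name]:
            consumer.consume(message)


dispatcher = Dispatcher()
consumer = OrderSubmittedConsumer()
dispatcher.register(OrderSubmitted, consumer)

# Bytes as they might arrive from a broker (assumed envelope shape).
raw_from_broker = json.dumps(
    {"type": "OrderSubmitted", "body": {"order_id": "o-42", "total": 19.5}}
).encode()
dispatcher.deliver(raw_from_broker)
```

The consumer class never touches the transport or the serializer, which is the division of labor described above.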

Jeff Doolittle 00:08:35 Are there any real-world analogies that you can think of that help people understand what a message bus does? Maybe it’s correlated to something people are familiar with in the real world.

Chris Patterson 00:08:45 I mean the most basic thing that I can think of is, you go to an ATM and you take out $80. Now the ATM is not making a real time call to your bank to see if you have $80.

Jeff Doolittle 00:09:01 Wait, what? It’s not?

Chris Patterson 00:09:03 Apparently not based on something that happened on social media a while back and a bunch of people overdrew their accounts. Yes. But believe it or not, as long as you meet a threshold it will give you money. And if it’s too much, hey that’s when the fees come on. And that’s a profit opportunity. So when you make a withdrawal from that ATM, it’s storing off the event or the transaction record that you took $80 from this account number and that is then produced back to the backend system, which then adjudicates those at the next business day. And all of that stuff walks out and it goes from pending to completed or whatever happens. And the reason for that is just the scale of debit level transactions across the network on any given day. I mean everybody taps to pay. Everybody does everything. The ATM network, they all work very similarly.

Jeff Doolittle 00:09:50 Yeah, it’s a great analogy too because if it’s down you can still get cash at least for a period of time. And I think that hearkens back to the example you gave before of being able to extend. So what if I had to crack open the ATM code in order to have it email the person who withdrew the funds and then if I wanted to introduce SMS, now I have to crack open the ATM code and introduce SMS and now we want to do push notifications to your banking app. So we’re going to update the ATM code. You got it. So I’m feeling that.

Chris Patterson 00:10:14 You got it. Okay, that’s a perfect example because all those things that happen…

Jeff Doolittle 00:10:18 But so when I said before how does this avoid making things more complicated, but what I just said kind of sounds complicated too. So sounds like there’s trade-offs here between which one you should do when.

Chris Patterson 00:10:28 There is. So one of the things about building a distributed system is being able to F11 through the debugger or F10 through the debugger depending which IDE you use these days is not there. You can’t just trace through like your code you can today. And the reality is you can’t do that anyway. Because the second you call an HTTP API or call a stored procedure or a database function, chances are you’re not tracing into that anyway. You’re waiting while something remote happens and then hope all ends up well and comes back. So with a distributed system you can’t really trace through that. You could have multiple processes running in multiple debuggers and when you produce a message here, you could then trap the message consumption through a breakpoint in that other code and it is possible to do and people have done it.

Chris Patterson 00:11:16 I tend not to do it that way because I find it extremely frustrating, but it does increase the complexity. You can’t just step through the transaction script code and see everything happen. But is that really the case of any system these days? I mean we’re building systems that are more complex and having to interoperate with other systems and we don’t want the 30-second call to the service provider to block the checkout accepted screen on our ordering site when we can defer that. If you look at Amazon, if you really think that they’ve checked your credit by the time you hit buy, no it doesn’t work that way. Your order is created and they get your intent to purchase and the adjudication of that purchase happens all in the backend asynchronously.

Jeff Doolittle 00:11:56 Yeah and there you go. I think that’s a good place to land that opening section, which is we try to model systems as if the world is synchronous, but what many of us have discovered is that the world is not synchronous, the world is asynchronous and so the systems we should design should reflect that fact and they’ll be easier to understand.

Chris Patterson 00:12:14 A hundred percent. Totally agree.

Jeff Doolittle 00:12:16 So before we get too much more into message buses and then MassTransit specifically, I do want to help listeners understand something that I run into a lot, which is what’s the difference between a message bus? You mentioned a few, you mentioned SQS from Amazon, Azure Service Bus, you mentioned RabbitMQ, there are others. What differentiates that from an event streaming system like Kafka or Pulsar, which listeners might be familiar with?

Chris Patterson 00:12:41 Yeah, it’s a question I actually get a lot because sometimes people say, well we already have Kafka, can we use that? And while MassTransit can consume and produce events from Kafka, we have to think about the overall architecture of what Kafka is. Kafka is a highly available partitioned log file. Anything you write to it in case, they call it a message, is just appended to a log within a specific partition. And it’s kept for whatever the retention period is. A lot of companies keep it for a long time and if they’re using Cloud Kafka providers, it works out great because they don’t have to manage it and they get an event stream that they can replay across however many times they want. But that’s what it’s for. It’s a log file that has a nice API that lets you consume things from it and they provide a lot of value added services on top of that to do things like roll-ups and, looking at key fields and saying, well I want the most recent version of this partition key.

Chris Patterson 00:13:33 And so things like that. What differentiates a message broker from, say, Kafka or Pulsar is that with message brokers, there’s a number of semantics that they have to support. Things like message locking: we’re dealing with queues, so we’re pulling messages from the front of the queue, locking them, processing them, and then removing them from that queue. Whereas in Kafka you’re not really removing things, and the distributed locking, the scalability that comes from being able to have multiple consumers on the same queue and have them all load balance across available workers is one of the key benefits of using a broker. Much like you would use a load balancer on your front end, like NGINX, to talk to multiple backend servers, you would use a queue and you could have multiple consumers on that queue to load balance and scale up. You can even set scaling rules to say, hey, run more instances of the container if there’s more messages in the queue. And you can do things like that with Kafka as well. But Kafka is really more so just about pub/sub and not about that dispatch type processing of consuming from the front of the queue and then either writing to other queues or producing other events and all of that type stuff.
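
The contrast between a retained log and a work queue can be shown with two toy data structures. This is a simplified Python sketch, not how Kafka or any broker is implemented; `PartitionLog` and `WorkQueue` are hypothetical names, and the round-robin hand-out is an assumption standing in for whatever dispatch policy a real broker uses.

```python
from collections import deque
from typing import Deque, Dict, List


class PartitionLog:
    """Append-only log: reads never remove records; readers track offsets."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset: int) -> List[str]:
        return self._records[offset:]


class WorkQueue:
    """Queue: each message goes to exactly one of the competing consumers."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def drain(self, workers: List[str]) -> Dict[str, List[str]]:
        """Hand messages out round-robin to simulate load balancing."""
        handled: Dict[str, List[str]] = {w: [] for w in workers}
        turn = 0
        while self._messages:
            worker = workers[turn % len(workers)]
            handled[worker].append(self._messages.popleft())  # removed once taken
            turn += 1
        return handled


log = PartitionLog()
queue = WorkQueue()
for i in range(4):
    log.append(f"event-{i}")
    queue.send(f"job-{i}")

replay_a = log.read_from(0)  # any reader can replay the whole log
replay_b = log.read_from(2)  # or resume from its own offset
shares = queue.drain(["worker-1", "worker-2"])  # work is split, then gone
```

After `drain`, the queue is empty while the log still holds all four records, which is the difference in semantics being described.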

Jeff Doolittle 00:14:41 Great. And I think as we continue in the episode, it will become clearer specifically what a message bus is for and the use cases that it’s more catered to. Before we dive into that though, and talk a little bit more in detail, what are some use cases before we dive in too deep where a message bus just might not make sense?

Chris Patterson 00:14:58 Again, we’re going back to that. Why do I just want to distribute if I don’t need to? If you have like a read heavy application that is just reading data, say you’re doing a bunch of munging on stuff and let’s say you have a pricing service and the pricing service does nothing but grind over catalogs, process contract rules and output a bunch of tables that say what the price is. That’s something where you’re going to want low response time, high availability, and you’re going to want to scale it out many times. A case like that, you might just use a pure API against a Redis cache or an in-memory database that you want the absolute least amount of latency possible. So that’s a case where you might not want to use a message bus for that. There are plenty of cases where it’s a closed system or it’s a processing and the events may be produced elsewhere.

Chris Patterson 00:15:44 ETL type things. I don’t really think that’s a case where you would want events, but the reality is, is because of the benefits of asynchronous processing, and I know one of the other things we need to talk about is like message persistence and durability. You might want to use this because you get that durable asynchronous processing. So those are things that we probably want to cover as well. But cases where you’re never going to scale out, you’re never going to need to produce or consume events from other systems. The reality is there are always cases where you might need to take that high read scenario that we just discussed. If the high read system has an in-memory cache and the backend database is updated somehow you must notify all those scaled out workers that the cache was just refreshed and to dump everything from in-memory. Okay, well how do you do that? Do you call the database and check a flag to see last update and if it’s there do it or do you just push an event over the message bus so that it then dumps its in-memory caches and all calls go back to the database at that point? So it’s going to depend upon cost and complexity and what you need, but you may not want the message bus to be part of that execution path if the absolute lowest latency is needed.
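
The cache-refresh scenario above, where scaled-out read workers drop their in-memory caches when an event arrives, might look roughly like this. It is a hedged Python sketch: `PricingWorker`, the `PricesRefreshed` event, and the in-process `publish` function are hypothetical stand-ins for real workers and a real bus.

```python
from typing import Callable, Dict, List


class PricingWorker:
    """Read-heavy worker that serves prices from an in-memory cache."""

    def __init__(self, database: Dict[str, float]) -> None:
        self._database = database
        self._cache: Dict[str, float] = {}
        self.database_reads = 0

    def price(self, sku: str) -> float:
        if sku not in self._cache:
            self.database_reads += 1  # cache miss: go back to the database
            self._cache[sku] = self._database[sku]
        return self._cache[sku]

    def on_prices_refreshed(self, _event: dict) -> None:
        self._cache.clear()  # dump everything held in memory


database = {"sku-1": 10.0}
workers = [PricingWorker(database) for _ in range(3)]
subscribers: List[Callable[[dict], None]] = [w.on_prices_refreshed for w in workers]


def publish(event: dict) -> None:
    """Toy fan-out: every scaled-out worker receives the event."""
    for handler in subscribers:
        handler(event)


before = [w.price("sku-1") for w in workers]
database["sku-1"] = 12.5                      # backend data changes
stale = [w.price("sku-1") for w in workers]   # caches still hold the old price
publish({"type": "PricesRefreshed"})
after = [w.price("sku-1") for w in workers]   # next reads go to the database
```

The bus is only on the invalidation path here; the price lookups themselves stay in memory, which matches the latency concern raised in the answer.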

Jeff Doolittle 00:16:57 Great. And of course we always end up mentioning at least one of the two hard problems in computer science. You just hearkened to cache invalidation, which is one of them. And of course then there’s naming things and off-by-one problems. But moving on from that, you had a great segue there for a second. I want to go back. You mentioned persistence and durability and the importance of talking about those in the space of a message bus. So I think most listeners know what they are, but just briefly, what is persistence and what is durability and kind of what’s the difference between those concepts? And then why do they matter in the context of message buses but also in the, just in systems design generally?

Chris Patterson 00:17:29 So yeah, so when you think about an HTTP request coming into your system, they’re calling an API and you’re doing something and if that HTTP connection is broken or if that in-flight controller crashes or that process crashes, it’s gone. I mean you’re cutting the HTTP connection, they’re getting a 500 error from the load balancer and then they basically have to call the API again. With a message bus and using a message broker such as the ones we’ve mentioned, they all support durable messaging. And what I mean by durable is when a message is published, it is acknowledged, and the task completes knowing that that message is reliably written to disk on the broker so that it is not going to be lost. So if you have an API and that API submits an order and that order is written to a queue, that order is then saved in that queue, it’s durable and the consumer of that message is going to be able to receive it regardless of what happens in that API call. If the caller disconnects or if there’s a network failure or anything like that, that order has been reliably captured in that queue.

Chris Patterson 00:18:31 With HTTP, if I was calling into my API and then I was making an API call to the order system like directly to an ERP to create an order on say SAP or something like that. And it takes 8, 9, 10 seconds if you’re lucky. And you don’t really have a deterministic outcome from that if something crashes. By writing it to a queue, I get that durability and I know that I can process that message. The other benefit of that is it’s decoupled. If you think about it, it’s that loose coupling between the two cars of the train. That means that if the backend processing of orders slows down, we’re still able to accept new orders into that queue. If we’re receiving a big burst because we just had a halftime ad during the Super Bowl and it’s like, hey, punch in this code to get your free whatchamacallit, all of those incoming messages could be written to the queue and we could process those as we have the system capacity to do so and they won’t be lost because they’re in that queue.

Chris Patterson 00:19:29 So it’s a great way to handle bursts of traffic without overloading your backend system or your database that might have a three or four second process to do some sort of order validation or adjudication or talking to the warehouse system or any of those type of interactions.
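
The burst-absorbing effect of a queue can be illustrated with a small tick-based simulation. This is a Python sketch with made-up numbers, assuming a fixed backend capacity per tick; `simulate` is a hypothetical helper, and a real queue would persist messages rather than hold them in memory.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate(arrivals_per_tick: List[int], capacity_per_tick: int) -> Tuple[List[int], int]:
    """Return the queue depth after each tick and the total number processed.

    Arrivals are accepted immediately; the backend drains at most
    capacity_per_tick messages per tick until the queue is empty.
    """
    if capacity_per_tick <= 0:
        raise ValueError("capacity_per_tick must be positive")

    queue: Deque[int] = deque()
    depths: List[int] = []
    processed = 0
    next_id = 0
    tick = 0
    while tick < len(arrivals_per_tick) or queue:
        arriving = arrivals_per_tick[tick] if tick < len(arrivals_per_tick) else 0
        for _ in range(arriving):
            queue.append(next_id)  # accepted even while the backend is busy
            next_id += 1
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()        # backend works at its own pace
            processed += 1
        depths.append(len(queue))
        tick += 1
    return depths, processed


# A burst of 10 arrives in the second tick; the backend handles 3 per tick.
depths, processed = simulate([2, 10, 1, 0], capacity_per_tick=3)
```

In this run the depth rises during the burst and then falls back to zero, and every one of the 13 messages is eventually processed.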

Jeff Doolittle 00:19:44 That’s great. So effectively, once I get a message on the bus, so to speak, now basically it’s in a database and there are some guarantees about how that’s going to be persisted and then eventually delivered and then when it’s processed, there’s also guarantees about how it will be removed from the queue with the knowledge that it has been received by whoever needed to receive that message.

Chris Patterson 00:20:04 Exactly. And it happens on the consumer side too. When I consume a message from a queue, I’m getting a lock on that message and say I have five minutes to process it and so I’m going to do a few things and if it’s going to take me a little longer, it’ll renew that lock and it does these things under the covers for you. These aren’t things you have to do explicitly, but if the process crashes, that message is still on the queue and when that lock times out, it’s going to redeliver that to a consumer once again. And so that consumer can say, hey, now I’m going to process it. And there are counters and fields in there where it can look and say, hey, I’ve tried this 10 times and I keep crashing the process. Maybe I should take this poison pill and put it somewhere else because this is clearly a special case that’s causing a system issue. And so then MassTransit will move it to like an error queue or a dead letter queue to say, hey, these need to be handled some other way because they keep taking down the system.
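
The redelivery counter and error queue described here can be sketched as follows. This is a simplified Python model, not MassTransit's or any broker's implementation: the lock and its timeout are reduced to "take the message, put it back if the handler fails", and `Envelope`, `run_consumer`, and `max_deliveries` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Envelope:
    body: str
    delivery_count: int = 0


def run_consumer(
    queue: Deque[Envelope],
    error_queue: List[Envelope],
    handler: Callable[[str], None],
    max_deliveries: int = 3,
) -> List[str]:
    """Process until the queue is empty; return bodies handled successfully."""
    completed: List[str] = []
    while queue:
        envelope = queue.popleft()  # stands in for locking the front message
        envelope.delivery_count += 1
        try:
            handler(envelope.body)
        except Exception:
            if envelope.delivery_count >= max_deliveries:
                error_queue.append(envelope)  # poison message: set it aside
            else:
                queue.append(envelope)        # lock released: redeliver later
        else:
            completed.append(envelope.body)   # acknowledged and removed
    return completed


attempts: Dict[str, int] = {}


def handler(body: str) -> None:
    attempts[body] = attempts.get(body, 0) + 1
    if body == "poison":
        raise ValueError("cannot process")
    if body == "flaky" and attempts[body] == 1:
        raise TimeoutError("transient failure")


queue: Deque[Envelope] = deque([Envelope("ok"), Envelope("flaky"), Envelope("poison")])
errors: List[Envelope] = []
done = run_consumer(queue, errors, handler)
```

The flaky message succeeds on redelivery, while the poison message is moved aside after three attempts instead of blocking the queue.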

Jeff Doolittle 00:20:51 Well, and there’s a great transition because you just said MassTransit instead of message bus. So let’s talk about that. We’ve been talking about message buses, which is a good foundational topic to explore here. But now let’s talk about MassTransit. What is it and how does it relate to the conversation so far?

Chris Patterson 00:21:07 So MassTransit is an open-source message bus. It’s a distributed application framework because it’s more than just a message bus. It provides an abstraction that sits on top of RabbitMQ, Azure Service Bus, Amazon SQS, and ActiveMQ. And it provides publish and subscribe semantics as well as direct send and even request/response semantics for different message patterns. So there’s a number of different message patterns. I just described three, there’s a few more. But the abstraction provided by MassTransit makes it very easy to write your applications using, again, I’ve made a mention of, ASP.NET’s controller and action relationship before we went to minimal API obviously, but because now everybody wants to do that. But it’s very much a way to say, hey, I have a consumer which is just a class, I want to consume this message type, which is just implementing an interface, or I want to consume these three message types, which is just implementing three interfaces of the same one with just different message types.

Chris Patterson 00:22:04 And then I get a consume method and these things MassTransit takes care of all the wiring. It does things like the serialization, the writing and reading of messages to the transport, whether it’s any of the mentioned brokers. It deals with all of the configuration. MassTransit has this concept of framework-defined infrastructure. So you create consumers and message types and any of the different types of consumers MassTransit supports. And then MassTransit will then go out and create the appropriate topics on Azure Service Bus or queues or RabbitMQ exchanges or any of the underlying message broker infrastructure components such that you don’t have to worry about that. So when you think about migrations from like Entity Framework and how you go code first and you create a class and you create a map and then it goes out and creates a database table for you, MassTransit does the same thing, it’s framework-defined.

Chris Patterson 00:22:57 It says, okay, well based on what you’re consuming and producing, I’m going to go create these topics. I’m going to create these queues, I’m going to set all this up for you and make sure it’s wired up correctly. So those are just some of the things that MassTransit provides, but it makes it easy to say, I want to write my code, maybe I want to test it locally on RabbitMQ, but I want to deploy it in production in Azure Service Bus for our Azure customers and maybe in Amazon for our Amazon AWS customers. So it gives you some portability across all those different underlying brokers.

Jeff Doolittle 00:23:27 Yeah, that’s great and sounds like it’s developer focused as well with the fact that you don’t have to spend a bunch of time learning how to set up the brokers or the exchanges and queue or whatever there is in the underlying broker technology that’s kind of taken care of for you, which is nice.

Chris Patterson 00:23:41 I’m glad you said that because when we first started MassTransit, Dru Sellers and I, I guess it was what, 16, 17, however many years ago, our key focus was being developer first and not spending too much time thinking about writing things in other languages or 4GLs or any of that. If you’re writing in C#, you’re right at home with MassTransit because you don’t have to learn a bunch of other languages.

Jeff Doolittle 00:24:07 So are there any use cases before we get into more of these patterns, we’re going to talk about event-driven systems and different patterns of distributed architecture and things of this nature, but are there use cases for simpler environments as well? Like if someone’s building an on-prem system or a monolithic architecture where they might introduce something like MassTransit, like can I add it to an existing system? Do I have to start from scratch? Like how can this fit in say a Brownfield application or a more classic on-prem or monolithic environment?

Chris Patterson 00:24:31 There’s a number of options there and a lot of customers do that. Customers that I’ve worked with through providing support for MassTransit, a lot of customers, and this is a really solid use case that they do a lot is, believe it or not, there’s a lot of companies that have a lot of .NET Framework 4.7.2 or any of the old traditional framework. But they have teams, and their developers are like, well we need to be building everything with .NET 8 now. So they’re like how can we?

Jeff Doolittle 00:24:56 It’s going to be nine in a week by the time the show drops. So yeah.

Chris Patterson 00:24:59 Yeah. Well nine is out now but eight is LTS. That’s right. If you’re doing six, you’re way behind the times. Yes, that’s right. Because six is end of life already, which freaks some people out. But it’s like they were upfront about it. But you have these teams that want to build modern .NET Core applications and they want to have fun with all that, but they’ve got this existing monolith. So a lot of the times I see customers, they’ll use MassTransit to just add event producers to their existing monolithic code. And so then they’ll write that out to like RabbitMQ on-prem, because you can run RabbitMQ anywhere. It just runs anywhere. And what they’ll do is then they’ll create their additional side-by-side services that are running over in, they might be running in containers or a lot of them might just be running as Windows services using the .NET generic host.

Chris Patterson 00:25:41 And they’ll make those run and process messages from that same message broker using a couple of, they might have like a shared message contract library and say these are the events that are produced from the monolith in quotes and the new microservices over here are then processing those events and talking to our other systems. Or, in one case we had a customer that they were actually building software on a fairly large platform written in a language that we don’t want to talk about too much, but they needed to be able to call external APIs with OAuth validation and their monolith had no capabilities to do that. So what they did is they actually produced commands out of their monolithic application writing to the message broker and then their microservices built in .NET 6 at the time would then get those request messages, make the actual HTTP calls with all the OAuth tokens and everything, write the responses back onto another RabbitMQ queue, which would then be picked up by the monolith. And to the monolith it was just an RPC call, which it was familiar with because that’s how most monoliths are written. And it just took a little bit more time because it was going out of process and talking to another third-party service. But it met their needs and was well within their capacity requirements. So it gave them that ability to kind of build new technologies into their existing monolith without having to disrupt the monolith or do the, oh well let’s try to migrate the whole thing to .NET 8. Yeah, that’s never going to go well.
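
The request and reply flow in that customer story, where a monolith writes commands to one queue and picks up responses from another, can be sketched with a correlation identifier tying the two together. This Python version is an in-memory illustration only; the queue names, message shapes, and `fake_partner_api` are hypothetical, and the OAuth-protected HTTP call is replaced by a stub.

```python
import uuid
from collections import deque
from typing import Callable, Deque, Dict

request_queue: Deque[dict] = deque()
reply_queue: Deque[dict] = deque()


def send_request(payload: dict) -> str:
    """Monolith side: enqueue a command and remember its correlation id."""
    correlation_id = str(uuid.uuid4())
    request_queue.append({"correlation_id": correlation_id, "payload": payload})
    return correlation_id


def bridge_service(call_external_api: Callable[[dict], dict]) -> None:
    """New service: consume commands, call the API, write the responses."""
    while request_queue:
        request = request_queue.popleft()
        result = call_external_api(request["payload"])
        reply_queue.append(
            {"correlation_id": request["correlation_id"], "result": result}
        )


def collect_replies() -> Dict[str, dict]:
    """Monolith side: match each response to its request by correlation id."""
    replies: Dict[str, dict] = {}
    while reply_queue:
        reply = reply_queue.popleft()
        replies[reply["correlation_id"]] = reply["result"]
    return replies


def fake_partner_api(payload: dict) -> dict:
    # Hypothetical stand-in for the external, OAuth-protected HTTP call.
    return {"sku": payload["sku"], "in_stock": payload["sku"] != "sku-9"}


first = send_request({"sku": "sku-1"})
second = send_request({"sku": "sku-9"})
bridge_service(fake_partner_api)
replies = collect_replies()
```

From the monolith's side this looks like a remote procedure call that takes a little longer, while the capability it lacks lives entirely in the separate service.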

Jeff Doolittle 00:27:03 Or rewrite the whole thing but let’s not go there.

Chris Patterson 00:27:06 Yeah, that’s even worse.

Jeff Doolittle 00:27:07 We can do a whole show on that.

Chris Patterson 00:27:09 I’m sure you have.

Jeff Doolittle 00:27:10 Yes, but I avoid it now. So again, moving on, great segue there actually, you started talking about concepts that fit firmly in the camp of event driven architecture and, you talked a little bit about how for an existing maybe monolithic system or something like that that you can use a tool like MassTransit to support event driven architecture. What are some other aspects of that that you can share when we talk about robustness or scaling or retry, how do those play into this and give us some concrete examples of where you benefit from using a framework like MassTransit for those sorts of concepts.

Chris Patterson 00:27:42 Yeah, so when you talk about the capabilities provided by the framework and by the framework being MassTransit, there are a number of different things that you can do. You mentioned like robustness and scaling and retry. When you’re reading from a queue you automatically get these multiple consumers on the queue scaling so you’re able to scale up and manage that. But as far as like robustness things like if you do have a system failure such as a downstream system that API you’re calling to validate the price might be down. They might be having an outage, or someone ran a backhoe through your fiber connection at the data center. It doesn’t happen much now with stuff in the cloud but when you’re talking on-prem it happens all the time, especially if you have facilities all across the US. So, the internet goes out.

Chris Patterson 00:28:25 So in those scenarios you want to be able to do things like okay well I need to stop processing from the queue, or I need to retry because maybe it was just a transient spike. Maybe it was just one of those weird five hundreds that, or maybe I exceeded my rate limit. So MassTransit has a lot of what we refer to as middleware that allows you to do things like set up retries, set up kill switches, which I think a lot of people call it a circuit breaker. But essentially saying, hey I’ve tried to call five times and it’s not responding. I’m going to stop for a little bit and wait and, give the system a chance to settle and then go back and come back in a couple minutes and try to start processing again. Versus just throwing everything into an error queue and saying it’s not processable.
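
The retry and kill-switch behavior described here can be sketched in a few lines. This is a minimal Python illustration of the pattern only, not MassTransit's middleware or its API; `KillSwitch`, `consume`, the thresholds, and the injected `clock` are all hypothetical names chosen for the example.

```python
import time


class KillSwitch:
    """Pauses consumption after repeated failures, then allows a retry after a cool-down."""

    def __init__(self, trip_threshold=5, restart_timeout=60.0, clock=time.monotonic):
        self.trip_threshold = trip_threshold    # consecutive failures before pausing
        self.restart_timeout = restart_timeout  # seconds to wait before trying again
        self.clock = clock                      # injectable so the sketch is testable
        self.failures = 0
        self.tripped_at = None

    def allow(self):
        if self.tripped_at is None:
            return True
        if self.clock() - self.tripped_at >= self.restart_timeout:
            # Cool-down elapsed: reset and let processing resume.
            self.tripped_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.trip_threshold:
            self.tripped_at = self.clock()


def consume(message, handler, switch, max_retries=3, delay=0.0):
    """Returns 'consumed', 'paused' (message stays on the queue), or 'error' (error queue)."""
    if not switch.allow():
        return "paused"
    for _ in range(max_retries + 1):
        try:
            handler(message)
        except Exception:
            switch.record_failure()
            if not switch.allow():
                return "paused"
            time.sleep(delay)  # fixed delay between retries; real systems often back off
        else:
            switch.record_success()
            return "consumed"
    return "error"  # retries exhausted: treat the message as unprocessable
```

The point of the split is that a transient failure is absorbed by the retry loop, while a sustained outage stops consumption instead of draining the whole queue into the error queue.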

Chris Patterson 00:29:04 So those type of things give you some benefit there. I’ve just mentioned error queue, if messages are completely un-processable and you’ve specified retry 10 times and if you get this error this is actually bad, you have some granularity in how you specify things like retry handling and rate limiting. Some of my customers use Shopify and it’s like you can only call Shopify a hundred times a second. If you call it more than that, you start getting weird errors. So they put rate limits on their consumers that are actually updating their inventory price or creating orders in the Shopify system so that they don’t overflow the API and cause weird downstream system errors. Because when you’re dealing with a lot of these external providers, especially like a Shopify or a Stripe or any of the payment providers, it isn’t a one-way conversation.
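
Rate limiting a consumer, as in the Shopify example, can be sketched as a sliding-window limiter. This is a minimal Python illustration, not MassTransit's rate limit middleware; `RateLimiter`, `try_acquire`, and the injected `clock` are hypothetical names, and the hundred-per-second figure is simply the one quoted in the conversation.

```python
import time
from collections import deque


class RateLimiter:
    """Allows at most `limit` calls in any rolling window of `interval` seconds."""

    def __init__(self, limit, interval=1.0, clock=time.monotonic):
        self.limit = limit
        self.interval = interval
        self.clock = clock       # injectable so the sketch is testable
        self.calls = deque()     # timestamps of calls still inside the window

    def try_acquire(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.interval:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False  # caller should wait and leave the message on the queue


# Example: a consumer that may call the downstream API 100 times per second.
shopify_limiter = RateLimiter(limit=100, interval=1.0)
```

A consumer would check `try_acquire()` before each outbound call and delay when it returns `False`, so the backlog waits in the queue rather than hitting the provider.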

Chris Patterson 00:29:49 When you talk about a distributed system and building in an event-driven way, you also have this concept of if I call Shopify and create an order, I’m going to get a webhook callback that’s going to be like, oh I have an order created or that order is processed or that order changed state. Handling those webhooks, that is a use case where using a message bus is a hundred percent the way you should be doing it. Because if you’re letting a webhook call go straight into your database at the behest of whatever that provider is doing, you could miss it. Because if you fail to process it, you might not get it ever again. So all of the backend like Shopify webhook handlers and these other systems that hit you up when you receive money like through Stripe or whatever, all of those in the cases of the customers that I’ve worked with are getting those and writing those to queues so that they can then process those at their own leisure using a variety of the different capabilities of the system.

Jeff Doolittle 00:30:42 What are some common challenges you’ve seen when people want to adopt an event-driven approach to their systems when these concepts are new to them?

Chris Patterson 00:30:49 A lot of people, especially when they come from like the API samples or, a lot of people learn from samples. They’re like, most of the questions I get through the support form are like how do I do this? Is there a sample of this? And you can tell that that person literally just got a requirement dropped into their Jira bucket and they’re like, okay so I’m just going to cut and paste this to Chris and say do you have one of these? Because I would just want that.

Jeff Doolittle 00:31:12 So this was MSDN back when we were younger and now it’s been Stack Overflow, who knows what, now it’s ChatGPT.

Chris Patterson 00:31:18 Yes. Now it’s ChatGPT. Fortunately yes and I’ve, the amount of ChatGPT code I’ve been submitted from a support request is hilarious because I can almost recognize it now because it has my variable names in it. A lot of people come from this transaction script mindset. They call my API and I’m going to do some things right now and then I’m going to be done. And a lot of it comes from just the request response mentality. You think about an API request, you get a request, you do some things, and you respond. So it’s how a lot of people think and MassTransit supports that very well. I mean it has the capabilities to do it and it provides a lot of cool capabilities to make that actually even better than it is with just like an API controller. And we’ll get into that when we start talking about like sagas and state machines and things like that.

Chris Patterson 00:32:03 But to go back to kind of the question, the challenge is it’s like it’s getting around the mindset of, what is synchronous and what is asynchronous. And when I say synchronous, I also mean like async / await. If you’re using await, it’s synchronous. If you’re waiting for something and that’s fine, you could do await and I mean it really increases the scalability of your application. But the difference of async processing through durable messaging is you may not need to await; you can just guarantee that that fire and forget, you’ve said something needs to happen and it will happen. But you don’t need to wait for it to happen. So you can respond to that API and get your latency down a few seconds. It’s thinking about that. The other challenge we discussed earlier is that debugging experience. You can’t just F10 through it and see every line of code execute and complete so that’s kind of a shift. Believe it or not, one of the biggest challenges or impediments is getting buy-in from leadership. The people on the team, they watch one of the videos on YouTube from any of the influencers in the .NET space or

Jeff Doolittle 00:33:04 They listen to Software Engineering Radio.

Chris Patterson 00:33:06 Or they listen to this, and they think I have a shiny new hammer. I am going to go pound some nails with this thing.

Jeff Doolittle 00:33:13 And everything’s a nail now.

Chris Patterson 00:33:14 Of course, of course it is. And so then their leadership comes in and they’re like, yeah but we really don’t have the capacity to stand up RabbitMQ or we don’t have that. And the nice thing is that with MassTransit, I’ve now put the kibosh on that statement because now if you’re using SQL Server or Postgres, you can use those as a transport. MassTransit has a built-in SQL transport where all the same capabilities you’d get from RabbitMQ, and even beyond that, are available by using SQL as a transport with just MassTransit out of the box. So now they can’t argue about separate infrastructure having to be set up. So now it’s just a question of can you sneak it in without leadership finding out? Not that I would recommend that, but well…

Jeff Doolittle 00:33:56 You shouldn’t ask for permission to do the right thing, but

Chris Patterson 00:33:58 Forgiveness, permission, anybody who’s married knows how that works. Again, could be a whole…

Jeff Doolittle 00:34:03 Okay that’s another episode but we’ll move on.

Chris Patterson 00:34:05 Exactly.

Jeff Doolittle 00:34:06 Let’s dive a little bit more into a few topics related to performance and operations around MassTransit. So I’m sure it depends somewhat on the transport, but give us a sense of how many messages, in what amount of time can MassTransit handle in common scenarios?

Chris Patterson 00:34:21 I get that question a lot. People are like, what’s my throughput? How can I get that? And for producing durable messages to RabbitMQ, I can do thousands a second, I mean like eight, 10, 12,000 messages a second. So it’s really going to depend upon how big of a broker you provision. With Azure Service Bus I created like an M4, it was like $600 an hour or something. It was ridiculous. But I ran it long enough to run the benchmark, it might’ve been per day, but I was able to get around four to 6,000 messages a second produced and consumed through Azure Service Bus using that level of configuration. So it’s whatever you’re willing to invest in it from the underlying infrastructure because the broker is the bottleneck. And so it really depends, and it depends upon how you’re using it, how many queues, things like that.

Chris Patterson 00:35:06 MassTransit encourages the use of many queues. So you don’t have two people requesting two separate services waiting in the same line. It doesn’t have to be the DMV, you can actually fan out and have separate queues. But the scalability of the broker is the real thing. What I found at least in most applications is people think they have a lot of throughput and then they realize they don’t because their database might take 10-12 milliseconds to do each operation, which if you do the math, that’s a hundred a second. And being able to do more than a thousand messages a second with like the SQL transport that I mentioned is more than enough for most customers. So it makes it applicable. But there are people doing some very, very high-volume distributed system stuff with RabbitMQ to the point where they don’t even need message durability. They turn it off and I’ve seen 16, 20, 204,000 messages a second flying through RabbitMQ without any real major issues there.
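
The "do the math" step is just the reciprocal of per-message latency. A small Python sketch of that arithmetic follows; the function name and the `concurrency` parameter are assumptions for illustration, and the linear scaling with concurrency is an idealization that ignores contention in the database or broker.

```python
def max_messages_per_second(operation_ms, concurrency=1):
    """Upper bound on throughput when each message costs one operation of `operation_ms`."""
    return concurrency * 1000.0 / operation_ms


# One consumer, one 10 ms database operation per message.
single = max_messages_per_second(10)

# Ten messages in flight at once, assuming the database keeps the same latency.
parallel = max_messages_per_second(10, concurrency=10)
```

At 10 milliseconds per operation a single sequential consumer tops out around a hundred messages a second, which is why the database, not the bus, is usually the first limit people hit.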

Jeff Doolittle 00:36:03 So we mentioned before during the show there’s challenges with distributed systems because it’s hard to just get a debugger going on these multiple distributed systems, which leads into some topics that are, I think, hot topics in our industry today related to monitoring and observability. So speak a little bit to how MassTransit helps support those.

Chris Patterson 00:36:23 For sure. MassTransit’s on version eight which came out I want to say like 18 months ago, it’s getting on two years, but the main capability brought in there was kind of the standard .NET observability using OpenTelemetry. So when you’re processing messages from the time that API call hits in all the way through all your message consumers, sagas, scheduled messages, everything, you’re able to see that complete trace in like an OpenTelemetry dashboard such as Application Insights or even if it’s just using kind of the, some of the freer ones. I know Dynatrace has support for it as well, but the ability to visualize and see that complete trace of every message through the system has really, I think opened up the observability of distributed systems and made them a bit easier to understand. MassTransit actually uses OpenTelemetry internally for some things like checking to see if something is done like in the testing framework.

Chris Patterson 00:37:16 MassTransit has a robust testing framework and it makes it easy to unit test your consumers, your sagas, produce events for them and like stimulate them to do things and do so using an in-memory analogy of a transport. I wrote an in-memory implementation of RabbitMQ so that you could test like publishing, consuming, load balancing, all of that type of stuff and you’re able to test your consumers with that. It’s how I recommend people test things because if you’re testing just your consumer for its inputs and outputs, you’re truly unit testing. You can also fully set it up; it’s fully container based. You configure it the same way. You have a service collection, you add MassTransit test harness, and you can just test your consumers as they are with their dependencies. It’s also really easy to set up like dummies or test doubles to say, hey, if this message is produced, produce this one in response so you can like simulate other parts of the system that a consumer might be dependent upon.

Chris Patterson 00:38:09 And it also works with the ASP.NET WebApplicationFactory. I think it’s their test application framework that they have for ASP.NET where it actually stands up your APIs in kind of a test host, MassTransit works with that as well. So if you add the test harness, it’ll replace say your Azure Service Bus configuration with an in-memory one that allows you to like test your API controllers or your MapGet, MapPost, all that kind of minimal API stuff directly with interactions with MassTransit consumers. So it’s really robust and it makes it, I think, really easy to test and you can even in your tests output like a call graph of the OpenTelemetry trace to see what happened and where the time spans were of everything.

Jeff Doolittle 00:38:50 That’s great, that visualization really helps.

Chris Patterson 00:38:52 Yeah, when I look at it and I see something that looks like it shouldn’t be there, it’s like, oh yeah, no wonder it’s taking 12 seconds because it hung up on something.

Jeff Doolittle 00:39:01 Well and something else too, speaking of debugging and troubleshooting, you mentioned before if a message, maybe you try it 10 times and then eventually you put it in a dead letter queue, which we’ll put links in the show notes for listeners to go learn about these things. But effectively now I imagine what you can do is you can go grab that message and maybe redeliver it in a test environment and get the bug to show up and then you can fix it, deploy the fixed code, and then publish the message back to the main queue and fix the bug and nobody, maybe the customer never even knows the difference.

Chris Patterson 00:39:31 Yeah, I mean you should be in advertising.

Jeff Doolittle 00:39:34 I’m not trying to be such an advocate, I’m just saying like it seems like,

Chris Patterson 00:39:36 But no, that’s exactly, something you could do right?

Jeff Doolittle 00:39:38 Yeah. Which you couldn’t do in a request response scenario.

Chris Patterson 00:39:41 I mean we used to do that all the time, like try to catch, we would like write out to a log file the input parameters of an API call to try to see what the customer was doing or dump their JSON payload to disk in a directory. Now you have the message. Yeah it’s…

Jeff Doolittle 00:39:55 Oh you’re going to talk about security next by the way.

Chris Patterson 00:39:57 Oh no, not that.

Jeff Doolittle 00:39:59 Yeah, right.

Chris Patterson 00:39:59 But yeah, being able to take that message out of the error queue, try it in a non-prod environment, get the bug fix out there and then just move those messages from the error queue back into the queue for reprocessing. Yeah, customer wouldn’t have to be any of the wiser, it just gets fixed.

Jeff Doolittle 00:40:12 So speaking of security, are there any security related issues to consider either things that are potentially made simpler or easier or things that are challenging in this kind of a MassTransit framework environment that people should be aware of?

Chris Patterson 00:40:25 I’ve had to help customers fill out a number of different security audits for their ISRM teams, their risk management teams. I think the, from an architectural guidance perspective, messaging should stay within the domain. And if you have multiple domains such as warehousing and billing and auditing and ordering and all of those different domains, be careful what messages are public or shared across those domains and which ones are in an internal domain. MassTransit has a multi bus feature where you can connect to multiple brokers and for large scale systems, I’ve actually seen customers have like a RabbitMQ per vertical. Accounting might have their own RabbitMQ and warehousing may have their own RabbitMQ and each warehouse might have their own RabbitMQ on-prem running in a warehouse. And then there’s an overall Azure Service Bus on top of that.

Chris Patterson 00:41:16 And with MassTransit you can connect to multiple buses and say, okay, well these are the events that are across the whole application landscape and these are the ones that are local to say Accounting. And so when you think about it as like public or external events or internal events and that type of encapsulation really helps keep the security folks happy because then it’s very clear what you’re communicating outside of your domain. But one thing that I definitely don’t recommend, or should I say highly discourage is people saying, oh well messaging is so great, we love it so much, let’s make our customers call us through Azure Service Bus instead of an API. APIs are great because they have things like authentication and they work well with your auth providers like Okta or Auth0 or Active Directory.

Chris Patterson 00:42:00 There’s reasons people use APIs with all of these layers of security because they’re easy to audit and trace and control access to. Message brokers are a little more wild west in that once you’re in the message broker things like role-based access control and stuff get very weird and they just don’t make sense in messaging. And I see a lot of people try to apply these to it. So what I generally recommend is to keep your security folks happy and using the standard tools that they’re familiar with, stick to APIs for your external interactions with outside third parties or customers and then in those APIs, produce or consume those messages and get those out because you’re going to pass a lot of security audits that way. I mean message brokers, if you get down to like message level signing and authentication and everything, you’re bringing a lot of complexity and robustness and details.

Chris Patterson 00:42:48 If you think about the onion architecture or kind of the concept of core domain from the Domain-Driven Design book by Eric Evans, that core domain should be your business domain. And those external things like OAuth2 or OpenID Connect or those things that change every 5-10 years in our industry should be at the very outside of your architecture and shouldn’t make their way into the core domain. And eventing is because of the way people are architecting systems and with asynchronous processing and durable eventing in mind, it’s a fact of life that that type of event bus architecture is part of your core domain at this point.

Jeff Doolittle 00:43:23 Let’s talk about some patterns that frameworks like MassTransit support and starting with just from a high level talk a little bit, we don’t need to get into too many details about what they are, there’s other episodes where people can look online. But let’s talk about how MassTransit supports and relates to approaches like event sourcing and CQRS, which is Command Query Responsibility Segregation.

Chris Patterson 00:43:41 Ah yes, CQRS and event sourcing. Two of my favorite words, MassTransit is not an event sourcing framework. So I’ll say that upfront, there are some of those out there. But in an event source system, you may be able to produce events from those that would be consumed by MassTransit and could be used by a number of the capabilities in the framework to store things like you would do in a CQRS system. So Command Query Responsibility Segregation, the separation from reads from writes is really what I just simplify it down to. MassTransit, it’s what it was built for. I mean your commands are coming in, they’re validated commands, they’re 99.9% guaranteed to succeed. You’re writing those commands to a queue and you’re processing those with your consumers or, other messaging patterns that you might use to get those done.

Chris Patterson 00:44:25 The events produced by consuming those commands are then used to populate things like your read stores or your view caches or any of those things that are providing that query response back to the customer through those APIs. You may be using events to update a Redis database of order status so that when an API comes in, you just hit Redis and say hey, what’s the current order status? Now MassTransit has some better capabilities for that I think to get you more up to date information without having to maintain a separate cache. And we can go into those in the pattern section. But the intent being if you’re building a CQRS system, you’re more than likely going to use a message bus. Whether it’s NServiceBus, MassTransit, Rebus, Brighter, there’s a number of different frameworks that are out there. I have a favorite but some would say I’m biased.

Jeff Doolittle 00:45:12 Well you mentioned patterns and one of the patterns that comes up a lot in this context is sagas. So what are they and what can they be used for? And give listeners some specific real-world examples if you can.

Chris Patterson 00:45:24 I will. So saga messaging pattern and when I talk about it in this context, I’m really talking about a saga that maintains state. So when you think about a message consumer, a message consumer consumes a message does something and then it’s done. It doesn’t have any state, they’re stateless. Now if you write a consumer that goes and updates a row in a database, well you’re kind of manipulating state with a consumer, but the consumer itself doesn’t maintain state. Sagas are stateful. And within MassTransit the best way to create sagas is by defining a state machine. So, something happens that initially creates an instance of that state and the state machine defines the behavior of events and how they interact with that state over time. So, when we talk about an order when an order is submitted, that order submitted event might initiate a saga tied to that order ID.

Chris Patterson 00:46:15 And then as subsequent events in the system happen, those events can be correlated back to that same order such as order payment approved, order fulfilled, order shipped, all of those type of events in the system could be correlated back to that order and update the state on that saga. So if the end user wants to get a status of their order, they may produce an order status requested event, which could be consumed by the saga and then the saga could then respond back because sagas can participate in request response, respond with that state back to the API caller, therefore eliminating the need for a read store because the state of the order is right there in that saga and it’s just observing those events. When you think about terms, there’s two terms, orchestration and choreography. When you think about orchestrating a process such as an order process, sagas are the orchestrator for like an order when the order is submitted, the saga could then send off events to say, hey, I need to go validate that order.

Chris Patterson 00:47:14 I need to go check payment, I need to go see if they still have a valid customer account. If it’s a controlled item, maybe they need to have authorization to purchase, are they authorized to buy? And then they might have custom pricing too. The order might be like, hey, I know you quoted me pricing on the site, but if they’re a gold customer they might get 10% off. So we might do some price adjustments based on that order. Anything that we’ll want to do with that. And that’s what sagas are meant to do. They’re meant to orchestrate that process. The state machine syntax makes it very easy to think of state and behavior separately such that when this happens, if I’m in this state, I want to do this thing. And they can be as simple or as complex as they need to be to meet the business requirement, but it gives you that nice stateful consumer behavior that you might otherwise get if you were to have a consumer that would say, oh well I want to consume an order process message, I’m going to go out to the database, I’m going to load the order, I’m going to check if it’s in a certain state. I mean that’s what sagas really are meant to replace is those nouns in our system that need to have something done to them.
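
The "when this happens, if I'm in this state, do this thing" idea can be sketched as a transition table keyed by state and event. This is a minimal Python illustration, not MassTransit's state machine syntax; the event names follow the order example in the conversation, while the state names, `OrderSaga`, `TRANSITIONS`, and the in-memory instance store are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class OrderState:
    """One saga instance: the state tracked for a single order."""
    order_id: str
    current_state: str = "Initial"


# (current state, event) -> next state. Pairs not listed here are ignored.
TRANSITIONS = {
    ("Initial", "OrderSubmitted"): "Submitted",
    ("Submitted", "OrderPaymentApproved"): "PaymentApproved",
    ("PaymentApproved", "OrderFulfilled"): "Fulfilled",
    ("Fulfilled", "OrderShipped"): "Shipped",
}


class OrderSaga:
    def __init__(self):
        self.instances = {}  # stand-in for a real saga repository

    def handle(self, event_name, order_id):
        """Correlates an event to its order and returns the resulting state."""
        instance = self.instances.get(order_id)
        if event_name == "OrderStatusRequested":
            # Request/response: answer straight from saga state, no read store.
            return instance.current_state if instance else "NotFound"
        if instance is None:
            if event_name != "OrderSubmitted":
                return None  # only a submitted order creates an instance
            instance = self.instances[order_id] = OrderState(order_id)
        next_state = TRANSITIONS.get((instance.current_state, event_name))
        if next_state is not None:
            instance.current_state = next_state
        return instance.current_state
```

Keeping the transitions in a table separates state from behavior: adding a step to the order process means adding a row, not another load-check-update consumer.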

Jeff Doolittle 00:48:17 Okay. So they can help you manage a multi-step process effectively.

Chris Patterson 00:48:21 For sure.

Jeff Doolittle 00:48:22 Okay, that’s great. And speaking of multi-step processes, there’s a couple other patterns that it talks about on the MassTransit documentation, transactions and outbox. Let’s kind of layer those in and let’s talk about how transactions, we all know they’re tricky in a distributed system. And when I say transaction, I don’t necessarily mean a database transaction. I mean a business transaction. I think we kind of got this a little bit with sagas, but maybe talk a little bit more about how MassTransit can help with transactional consistency in business processes and how things like outbox can help with that.

Chris Patterson 00:48:51 For sure. So when we talk about state, it has to be stored somewhere. And so saga state can be persisted in a number of different places. I mean SQL Server, Postgres, Azure Cosmos, I mean there’s a number. MongoDB. Yeah, MongoDB. There’s a ton of persistence providers for MassTransit to be able to store that saga state and handle the locking on it. Because one of the things that I didn’t mention about sagas, but it’s worth mentioning, they lock that state. So only one message is being processed by that state machine for that particular state instance at a time. And if there’s an optimistic or pessimistic concurrency, depending upon the transport, it will say, oh well I did this work and I went to finish but I couldn’t because someone else finished before me. That optimistic concurrency. So that’s where we would want to bring in something like an outbox and MassTransit has a transactional outbox or just an in-memory outbox depending upon what your needs are, to be able to say, okay, well I’m going to process this and if everything goes well, the events that were produced by that state machine are going to be then sent to the broker.

Chris Patterson 00:49:50 Because if you think about it, if I was to process optimistically and say, okay, well I do these things, I publish these events to the broker and now I’m going to go save this and oh I got a concurrency conflict because some other process was processing a message for that same state instance. Now I’ve got like events floating out in the system that may or may not have happened or may have been duplicated. So the outbox eliminates that. It basically makes that transactional commit of that saga therefore result in the messages being produced once versus false messages being produced through the system. So it helps increase that transactionality of the state machine. Now the outbox actually has two different modes. That’s the consumer/saga outbox. The bus outbox, actually, we talked about how to produce messages. Well you may have an old school process that just talks to a database and inserts some rows in a DbContext and then calls SaveChangesAsync.

Chris Patterson 00:50:41 And if it works you want to produce an event. But with the bus outbox you can actually produce that event within that same transaction. And this time we are talking about a database transaction such that when you call SaveChangesAsync, only if that transaction is successful will those messages actually be written to the database, which then get replayed back up to the broker. So that’s another way that the outbox pattern kind of comes in there and it’s all built in. You just configure it and it just works. It works for MongoDB or Entity Framework.
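
The bus outbox idea, where the event is written in the same database transaction as the business data and delivered to the broker afterwards, can be sketched with SQLite. This is a minimal Python illustration of the pattern, not MassTransit's implementation; the table layout, `submit_order`, `deliver_outbox`, and the `publish` callback are hypothetical.

```python
import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "body TEXT NOT NULL, "
        "delivered INTEGER NOT NULL DEFAULT 0)"
    )


def submit_order(conn, order_id):
    """Writes the order row and its event in one transaction: both commit or neither does."""
    event = {"event": "OrderSubmitted", "order_id": order_id}
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "Submitted")
        )
        conn.execute("INSERT INTO outbox (body) VALUES (?)", (json.dumps(event),))


def deliver_outbox(conn, publish):
    """Sends committed, undelivered events to the broker in order; returns how many were sent."""
    rows = conn.execute(
        "SELECT seq, body FROM outbox WHERE delivered = 0 ORDER BY seq"
    ).fetchall()
    for seq, body in rows:
        publish(json.loads(body))  # stand-in for the real broker send
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because the event is marked delivered only after `publish` returns, a crash between the two steps means the event is sent again on the next pass, so this sketch gives at-least-once delivery and consumers should tolerate duplicates.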

Jeff Doolittle 00:51:09 Okay. So basically you can have message publication participate in a database transaction if I understood you correctly.

Chris Patterson 00:51:16 Yep, a hundred percent that’s what it’s for.

Jeff Doolittle 00:51:17 Okay, that’s great. Let’s talk about another pattern. Job consumers, what are they and when should they be used?

Chris Patterson 00:51:24 So this is probably my favorite new feature of MassTransit. It’s been around for a while. We originally called it turnout and it used to be because you’ve got a consumer, but that consumer is going to go out and calculate the infinity number of pi or some ridiculously huge compute thing. It’s going to convert an MPEG-4 video to a QuickTime video.

Jeff Doolittle 00:51:44 Something it’s going to train a new LLM.

Chris Patterson 00:51:46 Yes, exactly. It’s going to train a new LLM and it’s going to take a week to do it.

Jeff Doolittle 00:51:50 How many GPUs anyway?

Chris Patterson 00:51:51 Yeah, exactly. And I mean people do use them with GPUs but that’s just kind of a funny part. So they had a couple of names early on we used to use them processing big import files, but they started being called job consumers with version eight or I don’t know when, but with version 8.3 we really took it to the next level because I wanted to really make it possible to do job processing with just MassTransit and so job consumers, it’s like a regular consumer, it has a different interface but it doesn’t run, we mentioned that the broker keeps the message locked, it doesn’t run while the message is locked. The job consumer engine, the job service that’s part of MassTransit, which uses, believe it or not, three saga state machines to manage every job through the system.

Chris Patterson 00:52:36 So it’s just using MassTransit, handles the scheduling and execution of job consumers in the system and they run without a message lock. So they’re just running on a machine somewhere and as it completes they can report progress, they can report, they can save state, I mean they can do all the things that you would really want to do with like a long running job. They can run and do that and MassTransit is going to keep track of that. It’s going to control like the concurrency of it. And now with 8.3 you can actually schedule them, and you can schedule recurring ones. So like all the things that we used to grab like Hangfire or Quartz to do in our systems or just…

Jeff Doolittle 00:53:11 A console app.

Chris Patterson 00:53:12 Or a console app with an event script with

Jeff Doolittle 00:53:14 A scheduled task.

Chris Patterson 00:53:15 Yes.

Jeff Doolittle 00:53:16 Yes. Programming sins of our past. Yes.

Chris Patterson 00:53:18 Yeah, exactly. All those things that we’ve like man, it would sure be nice to do this. And I mean I would say that Drew Sellers was one of the big pushers on this because he was like, dude when are you going to get this done for me? And it’s like what do I work for free? And then I realized that I do apparently.

Jeff Doolittle 00:53:31 Yeah right open-source.

Chris Patterson 00:53:33 Yeah. So I put all this stuff in there and now, I mean it’s super compelling. You can create recurring jobs that run using just a cron expression and that math works itself out and it’ll run like every hour on the hour and do its thing and it runs with the same semantics as a consumer.

Jeff Doolittle 00:53:48 That’s great. And you don’t need external things like Quartz or anything like that anymore.

Chris Patterson 00:53:52 No, because most brokers can schedule messages. So like Azure Service Bus, SQS. SQS is limited because it’s like up to 12 hours, which is a little weird. But even my SQL transport now, I mean with my SQL transport you can build a complete job system with Postgres or SQL Server. I mean no external dependencies.

Jeff Doolittle 00:54:10 You said MySQL, you just mean SQL Transport?

Chris Patterson 00:54:12 My SQL transport. Oh, yeah, not MySQL.

Jeff Doolittle 00:54:15 Oh, oh not MySQL database.

Chris Patterson 00:54:17 No it does not support MySQL.

Jeff Doolittle 00:54:19 It’s a Postgres transport.

Chris Patterson 00:54:20 Yes.

Jeff Doolittle 00:54:21 SQL Transport. Yes. Sorry, you said MySQL and I wanted to clarify there. One more thing on job consumers. So just to clarify to make sure I understand, you mentioned that they’re not locking. In other words, typically when a consumer is processing a message from the broker, there’s a lock placed on that for as long as the consumer’s processing the message until it can acknowledge that it’s completed processing. With a job consumer, you’re not holding onto that message. This allows you to process something that might take longer than the amount of time that you have to process a locked message from a broker. Am I getting that right?

Chris Patterson 00:54:49 Exactly. That’s a hundred percent correct. In fact, the locking is actually handled by the three state machines that manage every job.

Jeff Doolittle 00:54:56 Okay, great.

Chris Patterson 00:54:57 So MassTransit using itself to make itself better.

Jeff Doolittle 00:55:00 Yeah. That’s cool. It’s kind of meta but that’s cool.

Jeff Doolittle 00:55:03 What about routing slips? What are they and what can they be used for? And maybe some specific examples again of where they’re useful.

Chris Patterson 00:55:09 For sure. So, we talked briefly about transactions and how transactions are tricky in a distributed system. The canonical example of this is the travel reservation system where I need to book a hotel, a car, and a flight all in the same thing. And if any one of those fails I need to roll that back and go back to the user and give them alternative choices, because if my flight’s not available, obviously the hotel’s not going to do me any good. So routing slips are a way to compose. So remember we talked briefly earlier about orchestration and choreography. Sagas are orchestration; routing slips are choreography. So routing slips let you add a series of activities that run and complete or not complete as a single kind of transaction. Now that’s the textbook definition, but it doesn’t always work out that way.

Chris Patterson 00:55:56 People do crazy things when you create software for them to do stuff. But the intent is that you’re able to execute a series of activities and if an activity throws an exception and fails, any previous activity that’s completed is allowed the opportunity to compensate itself. So when you think about compensating transactions, I booked my flight, I have a flight reservation ID number, which if you want to go into how that really works, we can, but it’s a lot like the ATM: you commit to buying a flight and then you adjudicate it back in when you actually pay for the flight. But you commit to buying a flight and you get a number, then you go book your hotel and you get a reservation number and then you go book a car and you get another reservation number. No money’s changed hands at this point.

Chris Patterson 00:56:37 If all three of those things succeed, the routing slip completes and then your process is able to go on and use those notifications to then continue doing whatever it needs to do. Let’s say the car is not available so it throws an exception and says, hey, I couldn’t book the car, it’s not available. Well as it goes back through the itinerary of the routing slip, it says, hey, well these two things previously completed, I’m going to give each of those activities an opportunity to compensate itself and then produce an overall event that says, hey, this routing slip failed as a whole. And so that compensation will go back to the hotel and say, hey, we had a problem. Cancel this reservation number. And it says, yeah, okay, no problem. We do this every day. And then it goes back to the airline, it says, okay those seats are available again, we’ll pull it back out and all of that rolls back.

Chris Patterson 00:57:20 So that’s what routing slips are for. They work similar to consumers; it’s just they have a different interface. They have inputs and then they have a log that they can write, and that log is then stored in the routing slip. So the reason it’s called a routing slip is with a saga you have a central point of control that’s always talking to a database. With a routing slip, the network is used to capture that state. So the routing slip message itself on the broker also includes all of the state. It has a variables collection, which is like a key value dictionary that can keep track of everything that’s happened and provides a shared memory space for all of the activities in that routing slip.
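
The travel booking flow described above can be sketched in a few lines of Python. This is a toy, in-process illustration of the idea only and not MassTransit's API: the names `Activity`, `RoutingSlip`, `run`, `book`, and `cancel` are all hypothetical, and a real routing slip travels between services as a message rather than running in one loop. It shows the three pieces Chris mentions: an itinerary of activities, a log of what completed, and a variables dictionary carried along with the slip.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Activity:
    name: str
    execute: Callable[[dict], None]     # raises an exception to signal failure
    compensate: Callable[[dict], None]  # undoes a previously completed execute

@dataclass
class RoutingSlip:
    itinerary: list                                # activities to run, in order
    variables: dict = field(default_factory=dict)  # shared state carried in the slip
    log: list = field(default_factory=list)        # activities that completed

def run(slip: RoutingSlip) -> str:
    for activity in slip.itinerary:
        try:
            activity.execute(slip.variables)
            slip.log.append(activity)
        except Exception as error:
            slip.variables["fault"] = f"{activity.name}: {error}"
            # Give each completed activity a chance to compensate, newest first.
            for done in reversed(slip.log):
                done.compensate(slip.variables)
            return "Faulted"
    return "Completed"

def book(kind: str):
    def execute(variables: dict) -> None:
        variables[f"{kind}_reservation"] = f"{kind}-123"  # made-up reservation number
    return execute

def cancel(kind: str):
    def compensate(variables: dict) -> None:
        variables.pop(f"{kind}_reservation", None)
        variables.setdefault("cancelled", []).append(kind)
    return compensate

def car_unavailable(variables: dict) -> None:
    raise RuntimeError("no cars available")

slip = RoutingSlip(itinerary=[
    Activity("flight", book("flight"), cancel("flight")),
    Activity("hotel", book("hotel"), cancel("hotel")),
    Activity("car", car_unavailable, cancel("car")),
])
result = run(slip)
```

In this run the car activity fails, so the hotel and then the flight are compensated, and the slip as a whole is reported as faulted.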

Jeff Doolittle 00:57:56 That sounds like the message is the application pattern. I’ve heard that before.

Chris Patterson 00:58:01 I’m familiar with that one. But yeah, like everything is carried in the message so that there is no central thing that everybody has to check in with.

Jeff Doolittle 00:58:08 Yeah. So would you ever combine a saga and a routing slip together to solve that? That might make it complex, but then again, are there situations where it makes sense to combine some orchestration with some choreography in some scenarios?

Chris Patterson 00:58:19 I would say 80% of the time it works every time.

Jeff Doolittle 00:58:22 Meaning individually like just a saga or just a routing slip?

Chris Patterson 00:58:26 Both. 80% of the time you’re going to use both.

Jeff Doolittle 00:58:28 Oh you are going to use both. Okay now explain why?

Chris Patterson 00:58:32 Well because you may have an order that is a noun in your system, but you may need to communicate with five systems to do some processing step of that order. By building that routing slip and running that, you’re eliminating the need to put all that complexity into your saga. Because if you think about it from an orchestration standpoint: tell A to do something, okay, A did it; tell B to do something, okay, B did it. Whereas if I can say, okay, I need A, B, C, D and E to do their things and then when it’s all done let me know, I can do that as a separate choreographed set of activities that then I get the disposition of back in the saga to determine what to do next. Yeah, that’s great. So it’s a matter of compartmentalizing complexity to avoid having a massive saga state machine with 342 states and all of the complexity that goes with that. It allows you to compartmentalize that behavior.

Jeff Doolittle 00:59:20 And I would imagine as well, you can do things like set timeouts and say this step, this particular consumer, uh, the orchestrator might say I’m going to do A, B, and C in parallel with a routing slip to kind of extend this example. So to make it more concrete, maybe I want to check inventory, check your payments and get a shipping provider. So something like this and I want to use a routing slip to do those three things. But the orchestrator might say, but that all needs to happen within the next six hours or else I’m going to do something else. And I imagine there’s ways I can configure that so I can sort of pick up the pieces if something just takes a long time to respond and I don’t actually get an error.

Chris Patterson 00:59:54 Yeah, yeah. So saga state machines have the ability to schedule events to themselves. So you can say, hey I have a time window of 60 minutes, in which case I have to escalate this order if all of the processes downstream haven’t completed yet.
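
The idea of a state machine scheduling an event to itself can be sketched as below. This is a minimal Python illustration under stated assumptions, not MassTransit's saga API: the class `OrderSaga`, its state names, and the `on_clock_tick` method standing in for a real message scheduler are all hypothetical.

```python
from datetime import datetime, timedelta

class OrderSaga:
    """Toy state machine that schedules a timeout event to itself."""

    def __init__(self, started_at: datetime, window: timedelta):
        self.state = "AwaitingDownstream"
        # The scheduled timeout: due when the time window closes.
        self.timeout_at = started_at + window

    def on_downstream_completed(self) -> None:
        if self.state == "AwaitingDownstream":
            self.state = "Completed"
            self.timeout_at = None  # unschedule the pending timeout

    def on_clock_tick(self, now: datetime) -> None:
        # Stand-in for the scheduler delivering the timeout event when due.
        if self.timeout_at is not None and now >= self.timeout_at:
            self.timeout_at = None
            if self.state == "AwaitingDownstream":
                self.state = "Escalated"

start = datetime(2024, 1, 1, 12, 0)
saga = OrderSaga(start, window=timedelta(minutes=60))
saga.on_clock_tick(start + timedelta(minutes=60))  # window elapsed, nothing completed
```

Here the downstream work never reports back within the 60-minute window, so the saga moves itself to the escalated state; had `on_downstream_completed` arrived first, the timeout would have been cancelled.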

Jeff Doolittle 01:00:06 Let’s talk a little bit about interop. What if somebody needs interop? Because I know MassTransit, as you’ve described it, works in the .NET ecosystem. And so what if I’ve got to talk to something that’s written in Java or that’s written in some other language and maybe it’s going to talk to a message broker. You imagine that’s what it’s going to do? How do I, are there things that I can do to interop better with those other systems with something like MassTransit?

Chris Patterson 01:00:27 Yeah, definitely. So I mean I think we’ve reached a point in the universe where JSON is the language of the wire. I mean pretty much every…

Jeff Doolittle 01:00:35 Not SOAP? Sorry, shouldn’t go there.

Chris Patterson 01:00:37 There is still XML and SOAP out there, believe it or not. But it’s becoming less frequent as people start to update their systems and JSON is available or they go through a middleware or a true enterprise service bus, like a MuleSoft or something that can translate XML to JSON and all that stuff. And it never has the fidelity you want but someone paid money for it and so you got to use it. The interop is actually pretty easy because a lot of people speak JSON and MassTransit has a number of ways to be very liberal in what it accepts but very deliberate in what it produces. So you can accept any JSON from any source and to some point I think you can try to do that with XML as well, just raw XML.

Chris Patterson 01:01:19 But that goes back to Newtonsoft and pretty much everybody likes System.Text.Json now. Not everybody, I know there’s some Newtonsoft lovers out there still, but not the author. So with System.Text.Json, I mean JSON, it’s very easy to pull in pretty much anything from any system and anything MassTransit produces is either going to be just raw JSON or it’s wrapped JSON. MassTransit uses a message envelope by default. So it has a lot of headers and stuff in it and provides a lot of rich data in addition to just the message body. But not everybody does that. So it’s very easy to consume anything that’s JSON and you can do it. MassTransit has the concept of a rider. So, we talked about what a real message broker is but we also talked about Kafka. Well, MassTransit can consume events from Kafka and Event Hub by using a rider, which is a way to bring messages in through topics in Kafka and it can also produce events back out to topics in Kafka.

Chris Patterson 01:02:09 But again that’s very much a consume and produce conversation that’s very different from like publish and subscribe. Even though they say pub/sub is a Kafka thing, you have multiple consumers on the topic, it’s a very different architecture in Kafka. So bringing data in from other systems is very easy. A lot of the people I know using Kafka are using Avro, the message format, which is very different from JSON but they use Avro because it’s binary and it’s very tight and it’s very common and it works with any language. MassTransit’s able to consume those Avro messages from Kafka and bring them into like a business domain that might be using say RabbitMQ or Azure Service Bus internally where they might just use JSON internally so it can speak different languages and you can even specify different serialization types per message type within MassTransit. So it’s kind of flexible in that way when you produce things out. So you could say, hey, when I publish this event it needs to be raw JSON because the system consuming it doesn’t know your headers.
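
The difference between enveloped and raw JSON, and the "liberal in what it accepts" stance, can be illustrated with a short Python sketch. The envelope layout here (a `messageType` list, a `headers` object, and the body under `message`) is a simplified assumption for illustration and should not be read as MassTransit's exact wire format; `wrap` and `read_body` are hypothetical names.

```python
import json

def wrap(message: dict, message_type: str) -> str:
    """Produce an enveloped payload: metadata plus the message body."""
    envelope = {"messageType": [message_type], "headers": {}, "message": message}
    return json.dumps(envelope)

def read_body(payload: str) -> dict:
    """Accept either an enveloped payload or raw JSON and return the body."""
    data = json.loads(payload)
    if isinstance(data, dict) and "message" in data and "messageType" in data:
        return data["message"]  # enveloped: unwrap the body
    return data                 # raw JSON from a system that knows no envelope

enveloped = wrap({"orderId": 7}, "OrderSubmitted")
raw = '{"orderId": 7}'
```

Both `read_body(enveloped)` and `read_body(raw)` yield the same body, which is the property that lets a consumer take messages from producers written in other languages.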

Jeff Doolittle 01:03:03 And then I think we touched on it a little bit more, but any other thoughts on multi bus? Like when would I want to have multiple instances of MassTransit running to talk to different brokers or even maybe a combination of the in-memory broker with a backing broker like Azure Service Bus or RabbitMQ?

Chris Patterson 01:03:19 When I think of, and I hate dropping names because they don’t really make sense and I don’t know if they actually use MassTransit, but when you think about like the, it’s not even a convenience store. If you live in America and you drive on any highway, you’re going to see Dollar Generals at every small town along the way. And when you think about, you probably have in-store operations and then you have corporate operations, that goes back to the kind of the central office. That’s a case where if your application is like running on a register or a fulfillment system, you might want to use multi-bus to say process stuff within the building like using RabbitMQ and then produce events back out using Azure Service Bus using multi-bus within the same application. Because it lets you have multiple bus instances all within the service collection, which is pretty much the standard dependency injection container for .NET at this point.

Chris Patterson 01:04:08 That’s the main use case I see. MassTransit does have a mediator, which, if you’re familiar with MediatR, it’s an in-memory mediator implementation that does not use a broker or anything or even simulate one. It’s a very synchronous way of calling but it lets you use the same consumer types with just an asynchronous call to the IMediator, IScopedMediator interface and lets you do that as well. So it’s not really a multi-bus but it does allow you to do kind of that, I guess it’s commonly referred to as like the clean architecture pattern where your API calls through mediator through some middleware to a consumer that then may produce events to a broker.

Jeff Doolittle 01:04:45 So what’s next for MassTransit, Chris?

Chris Patterson 01:04:47 What is next? Job consumers were pretty huge. That was a big drop. That was something I wanted to get done. The SQL transport is obviously a pretty new thing. When I think of, one of the other things I really want to tackle, and I’ve got a branch for it, is multi-tenancy. MassTransit being an application framework that defines kind of how your event-based applications are built, I think another piece of that is to provide kind of that multi-tenant support because a lot of people are building SaaS apps with MassTransit and in many cases they just have a message property or something that’s like a tenant ID. But I want to take that capability further to where when they have like separate databases per tenant or things like that and there’s a couple of commonly used like multi-tenancy frameworks out there and I’m just trying to look at what they have and see what makes sense, being able to connect to even like a broker per tenant.

Chris Patterson 01:05:39 Because multi-tenancy doesn’t necessarily mean you have a thousand tenants. In a lot of cases it’s like, well we have 10 but they really got to be on their own brokers because legal, when you talk about security. So, coming up with a way to do that where how it would work to where they would have a bus that they could access that would be tenant aware, that’s something I think that would be super valuable. I don’t have a lot of customers that are actually asking for that, but just in general in the community, I think there’s a desire out there and it’s one of those things that I think would be clever to write. So that’s something, I mean I continue to offer support and consulting around MassTransit. I mean it is kind of my full-time gig so, and I take care of customers; support customers actually prioritize the backlog. So the things they ask for first get done first. But MassTransit is still open-source and has a lot of community contributions of things. Someone just added MessagePack serialization a few weeks ago, so I was kind of like, oh I’ve never used it, but sure. Thanks.

Jeff Doolittle 01:06:34 That’s great. And we could do a whole show on open source and inner source. One more question on the idea of multi-tenancy. Could that be an opportunity as well to possibly introduce some things around like regionality, like say I’ve got Azure East, Azure West and maybe use some tenancy to determine message delivery or things or what resources to use for, whether it’s data sovereignty or jurisdictional requirements or things like that. Is that something else that you might consider including?

Chris Patterson 01:06:58 Well, I mean when you talk about a tenant, tenant is just a way to differentiate and talk to different message brokers. So your tenant might be geo, right? It might be, hey, we’re in Germany, so because I have a German tenant ID I need to select the broker that’s in a German data center. So it could certainly be used for things like that. I mean it’s really just a key to access some backend thing. So, tenant is kind of one of those things that could be a lot of things.
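
The notion of a tenant as "just a key to access some backend thing" can be pictured as a simple lookup. The Python sketch below is purely hypothetical: the tenant IDs, broker addresses, and the `broker_for` function are invented for illustration, and it does not describe any existing MassTransit feature, since the multi-tenancy support discussed here was still work in progress at the time of the conversation.

```python
# Hypothetical tenant-to-broker table; IDs and addresses are made up.
BROKERS_BY_TENANT = {
    "de-001": "amqps://broker.eu-central.example",  # German tenant, German data center
    "us-001": "amqps://broker.us-east.example",
}

def broker_for(tenant_id: str) -> str:
    """Select the broker address for a tenant, failing loudly if unknown."""
    try:
        return BROKERS_BY_TENANT[tenant_id]
    except KeyError:
        raise LookupError(f"no broker configured for tenant {tenant_id!r}") from None
```

Failing on an unknown tenant, rather than falling back to a default broker, is a deliberate choice in this sketch: with data sovereignty requirements, sending a message to the wrong region is worse than not sending it.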

Jeff Doolittle 01:07:23 Well if people want to get involved or learn more, where can they go?

Chris Patterson 01:07:25 MassTransit.io is the main webpage. From there you can get everywhere. We’ve got a super active Discord channel, the GitHub site, MassTransit is the organization. MassTransit, MassTransit’s the project. There’s discussions on there. You can go in and kind of search and see what’s out there. Those are the main places that people hang out. I mean, it’s an active project. It’s been around for over 15 years. It’s used worldwide. And if I look at my stats in real time, it’s just like looking at who’s awake. I can literally see what part of the world is awake by looking at…

Jeff Doolittle 01:07:55 The sun never sets on the MassTransit empire.

Chris Patterson 01:07:58 Yes. And unfortunately that means if I’m awake and I’m on Discord, people are asking questions. I think some people are shocked when I answer at like 10:00 PM on a Saturday and I’m like, hey, I just happen to see it.

Jeff Doolittle 01:08:10 There you go. There you go. Well, you have nothing else to do with your time, so we appreciate it.

Chris Patterson 01:08:15 Exactly, exactly.

Jeff Doolittle 01:08:16 Well Chris, thank you so much for joining me on the show.

Chris Patterson 01:08:18 It’s been great being here. It’s been a fun conversation.

Jeff Doolittle 01:08:21 This is Jeff Doolittle for Software Engineering Radio. Thanks so much for listening. We’ll see you next time.

[End of Audio]
