SE Radio 653: Asanka Abeysinghe on Cell-Based Architecture

Asanka Abeysinghe, CTO at WSO2, joins host Giovanni Asproni to discuss cell-based architecture — a style that’s intended to combine application, deployment, and team architecture to help organizations respond quickly to changes in the business environment, customer requirements, or enterprise strategy. Cell-based architecture is aimed at creating scalable, modular, composable systems with effective governance mechanisms. The conversation starts by introducing the context and some vocabulary before exploring details about the main elements of the architecture and how they fit together. Finally, Asanka offers some advice on how to implement a cell-based architecture in practice.

Brought to you by IEEE Computer Society and IEEE Software magazine.

Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Giovanni Asproni 00:00:18 Welcome to Software Engineering Radio. I’m your host, Giovanni Asproni, and today I will be discussing cell-based architecture with Asanka Abeysinghe. Asanka is the CTO at WSO2, a global infrastructure software company, and he has over 20 years of experience in designing and implementing highly scalable distributed systems, service-oriented and microservice architectures. He’s a committer at the Apache Software Foundation and a regular speaker at numerous global events and tech meetups. Asanka is the creator and original author of Cell-Based Architecture, Digital Double, and the Platformless Manifesto. Asanka, welcome to Software Engineering Radio. Is there anything I missed that you’d like to add?

Asanka Abeysinghe 00:00:59 No. Thank you very much, and it’s a pleasure to be here. I watched your previous episodes; it’s a very interesting podcast, and it’s an honor to be here. I think you gave a really good introduction. In addition to that, I have been in the industry for nearly two decades now, working on these large distributed systems and helping many organizations to be successful in their digital journey by implementing application architecture and application development for their different types of business objectives.

Giovanni Asproni 00:01:33 So let’s start by giving our audience some context. What is a cell-based architecture?

Asanka Abeysinghe 00:01:40 Yeah, so cell-based architecture is an architecture style, and it’s a combination of application, deployment, and team architecture. I specifically mention that it’s a combination of these three things because sometimes people just take it as a deployment architecture. During the podcast I will explain in detail how it aligns with these three elements of application, deployment, and team architecture. So at a very high level that’s what it is, and it addresses most of the challenges that architects, developers, DevOps, and platform engineers are facing today. I can give a little bit of history about how we started as well, and a little bit on why we started, and then we can get into the concepts of the architecture style, because having some understanding of how this idea came into the picture and why we implemented it might be really helpful for you to understand.

Giovanni Asproni 00:02:48 Okay. So maybe then we can start with: what problems are you aiming to solve?

Asanka Abeysinghe 00:02:54 Yes. It’s basically this concept or the idea came to our thinking around 2017, 2018 when microservices came into the picture and people were blindly getting into microservices and implementing systems using microservice architecture. And if you can remember those days, there was a very famous diagram called Death Star diagram that represent thousands of microservices connecting to each other. You can’t understand how these things are connected, what are the dependencies so on and so forth. So that’s what we saw in the diagram, but it was more difficult inside those organizations who implemented these systems that they didn’t know who owned these microservices, who’s going to maintain it, what’s the lifecycle of a microservice? So it became a problem in the industry. So that’s how the idea came into the picture. And then we were trying to find a solution for that particular problem. So that was the primary issue.

Asanka Abeysinghe 00:04:08 And then there’s another thing that we identified: a gap between architecture, development, and deployment. To explain it in detail: an architect will draw some diagrams and decide what the architecture of the system that they’re building should be. Then the developers look at it and build something by referring to the architecture, more or less aligned with it, but not exactly what the architect drew on the drawing board. And then it gets worse when the developers hand it over to the deployment, DevOps, and operations people. They deploy something completely different, because they have to have some standards, they need to look at security, and they need to look at the best practices they follow. So basically there wasn’t a connection between architecture, development, and deployment. So we were looking at how we could address that by creating an architecture construct that you can take from architecture to development to deployment.

Asanka Abeysinghe 00:05:26 So that was the second problem. And then there was another thing that we identified: the gap between brownfield and greenfield. People draw nice diagrams, people try to implement systems, but in reality, inside the enterprise, you can’t ignore brownfield. We have been building these systems for a decade or two, and you have a lot of databases, you have legacy systems running perfectly well inside the enterprise and consumed by many users. So you still need to use that data and those systems and build on top of them; there should be a synergy between brownfield and greenfield. So we were looking for a way to bridge this gap between brownfield and greenfield. And at the same time we saw people releasing reference architectures, but those reference architectures were more like reference implementations bound to a specific vendor technology. We wanted to introduce something that is completely technology and vendor neutral, so that the person implementing it can take it and build it based on their own technical architecture and preferred vendor products. So those were the motivations that we had and the problems that we are addressing; again, it is the solution for those problems that I framed in this particular segment.

Giovanni Asproni 00:07:05 So, a question. First you mentioned microservices, the proliferation of microservices in companies. What did you want to address there? You mentioned ownership and maintenance of those services, but were you trying to address mainly the ownership problem, or other issues related to microservices as well? Maybe related to, I don’t know, reliability, operability, other aspects?

Asanka Abeysinghe 00:07:32 So both, actually, because there was a misunderstanding about the “micro.” People took it as the size, but in microservices it doesn’t mean the size, it’s the scope. So designing microservices properly, with a single scope per service, is where true microservice architecture comes from. But when it comes to the enterprise, a microservice is too granular; it’s too small for a team to work on and operate. So that’s where you need a way to group a set of microservices and then expose a set of capabilities from that group. When you group them, it becomes something a team can maintain. And I would look at it from a different angle too. I’m sure the listeners are well aware of domain-driven design, something that is heavily used inside enterprises, where you divide the bigger problem into small chunks.

Asanka Abeysinghe 00:08:40 So there was an angle to that as well. If you use domain-driven design, you can identify the related microservices for a specific domain. So in cell architecture, as we’ll see in detail, you can nicely implement domain-driven design by grouping these microservices. That is the first issue. Once you do that, you can have proper dependency management as well as proper ownership for that set of microservices. And the things that you mentioned, resiliency, high availability, scalability, come from the outer architecture of microservices, not the inner architecture. So cell architecture addresses both.

Giovanni Asproni 00:09:29 Can I ask you what you mean by outer architecture and inner architecture in this context?

Asanka Abeysinghe 00:09:34 Yeah, so the inner architecture is basically the capabilities that you put inside the microservice and how you implement the business logic inside those microservices. That’s the inner architecture, mainly application architecture related stuff. The outer architecture is basically what you are getting from the fabric, because you are not putting resiliency or high availability inside the microservice, right? It should basically come from the place where you deploy this stuff. At a very high level, it can be the platform that you deploy on, or it can be the infrastructure that you deploy on. Only then can you have a more scalable system. So that’s the outer architecture of microservices: you use those capabilities from the platform, as a platform service, and get your microservices scalable and highly available. There’s a gray area there as well, like whether the microservice is stateful or stateless, because that affects how you scale and how reliable these things are. So you need to consider that a little bit when you are building the inner architecture, the application logic, but differentiating these two will help you to have a more robust architecture at the application level.

Giovanni Asproni 00:10:56 Okay. And there is a question that I want to ask, because when you look around for a definition of cell-based architecture, there are various places. Of course there is the GitHub repo where you put the reference architecture, but there are also other places, and a recurring thing in those is that they mention that cell-based architecture is pretty much something coming from the bulkhead pattern, so aimed at resiliency.

Asanka Abeysinghe 00:11:23 Yeah. So, if you go back to the history a little bit, the architecture style was introduced by myself and Paul Fremantle, who was the founding CTO at WSO2. We introduced this in 2018. After that, we have seen there are a few other parallel efforts that refer to cell architecture and use it in different contexts. One of those references takes it completely as a deployment architecture. That’s why it’s mainly addressing those outer architecture related principles. But if you go and read the original spec on GitHub that you mentioned, you will find that it’s addressing all three aspects that I explained: application, deployment, and team architecture. So the reference you referred to is basically just focusing on deployment architecture. And in my view that is not pure cell-based architecture, because it’s more of a segmented architecture that violates a few primary concepts that we introduced in the cell architecture. When we get into the concepts, I can explain in detail.

Giovanni Asproni 00:12:40 Yeah. Maybe later, when we get a bit more into the details of the architecture, you can explain this difference to us. Okay. Now maybe we can start to go into more detail. So I’d start by asking, you know, what are the main parts of a cell-based architecture?

Asanka Abeysinghe 00:12:56 Yeah. So actually, before answering that, I would like to add one more thing on the previous question. In general, when it comes to an architecture style or a pattern, patterns are more effective if you apply the context to them. So the effective usage of a pattern is pattern plus context. In the previous example, the definition you mentioned focuses just on deployment. That might be the context: it might be an infrastructure team who took the architecture spec and then molded it to the way they wanted it in the deployment architecture by applying their context. So it became more deployment oriented. I just needed to address that, because anybody who’s taking a pattern needs, I think, to apply the context before using it.

Giovanni Asproni 00:13:52 Yeah, of course patterns always come with a context of usage. I asked the question because if you look around on Google for documentation about it, the mention of the bulkhead pattern is everywhere. And so I wanted to check with you what the origin was, or if you had anything to say about that, because in your documentation I don’t think there is any mention of the bulkhead pattern at all, and by the way you

Asanka Abeysinghe 00:14:15 We’ll consider it.

Giovanni Asproni 00:14:15 Yeah, and we will provide all the links on the episode page as well, so our audience can have a look themselves. Maybe we can actually go into the details now, starting with the main parts of a cell-based architecture, and after we define those, maybe also give an example from real life.

Asanka Abeysinghe 00:14:35 Yes. So even the concept came from real life. The analogy started with biology, because in the real world everything is created using cells, right? Like all human beings; basically, cells give structure to life. So we thought the same thing could apply in software engineering: cells can give life to modern software systems. That’s how the concept started, and we cultivated the idea into an architecture style. So in the cell architecture, the foundation is a cell, and the atomic unit is a component. I will give you examples of what a component is. Since we spoke about microservices, a microservice can be a component inside the cell. It can also be an integration, it can be some kind of task, it can be something that generates events. We treat all these types of workloads as components, and a set of components is what we call a cell.

Asanka Abeysinghe 00:15:50 So a cell has a boundary, and inside the boundary you have the cell components. The boundary that defines this grouping is what we spoke about as the bounded context in domain-driven design. So you have the components, you have the cell boundary, and there’s a gateway at the top of the cell. In human cells there’s a membrane; similar to that, you have this cell gateway at the top that controls the communication coming into the cell. All the ingress communication to the components inside the cell comes through the cell gateway. You can’t communicate with a component directly, bypassing the cell gateway. That’s a fundamental concept that we have in the cell architecture. The egress calls going out from the cell, basically from the components inside the cell, can go out directly; they do not go through the cell gateway.

Asanka Abeysinghe 00:16:57 The egress calls go out, and if you imagine how this works in the enterprise architecture, that egress call will go and hit another cell gateway, because now you have these cells operating inside the enterprise. So any egress call will go and hit another cell gateway and carry out that particular communication. The beauty of this is that it provides a standardization of communication within the enterprise as well as outside the enterprise. Getting into detail: all the northbound calls, basically the ingress calls coming into the enterprise, go through a cell gateway. The southbound calls, the ones going outside the cell and outside the enterprise, you can control by using the egress policies that you put in place. And the westbound and eastbound calls, basically the internal communication happening within the organization, you can control because all of this communication will hit a cell gateway, given the way that we are architecting and controlling the communication within that particular architecture.
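
The ingress and egress rules described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference implementation: the `Cell` class, its method names, and the `inventory` and `orders` cells are all hypothetical. It shows components hidden behind a gateway, only exposed APIs being callable from outside, and an egress call from one cell landing on another cell's gateway.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# All names here are hypothetical; the conversation does not prescribe an API.
Handler = Callable[[str], str]


@dataclass
class Cell:
    name: str
    # Components (microservices, integrations, tasks) live inside the boundary.
    _components: Dict[str, Handler] = field(default_factory=dict)
    # The gateway exposes only these APIs: API name -> component name.
    _exposed_apis: Dict[str, str] = field(default_factory=dict)

    def add_component(self, component: str, handler: Handler) -> None:
        self._components[component] = handler

    def expose(self, api: str, component: str) -> None:
        if component not in self._components:
            raise KeyError(f"{self.name}: unknown component '{component}'")
        self._exposed_apis[api] = component

    def gateway(self, api: str, payload: str) -> str:
        """Single ingress point: every inbound call goes through here."""
        if api not in self._exposed_apis:
            raise PermissionError(f"{self.name}: API '{api}' is not exposed")
        return self._components[self._exposed_apis[api]](payload)


inventory = Cell("inventory")
inventory.add_component("stock-service", lambda sku: f"stock:{sku}")
inventory.add_component("audit-task", lambda msg: f"audited:{msg}")  # internal only
inventory.expose("get-stock", "stock-service")

orders = Cell("orders")
# Egress from the orders cell leaves the component directly, but it can only
# reach the inventory cell through that cell's gateway.
orders.add_component(
    "order-service",
    lambda sku: "order(" + inventory.gateway("get-stock", sku) + ")",
)
orders.expose("place-order", "order-service")

print(orders.gateway("place-order", "A1"))
```

Calling `inventory.gateway("audit-task", "x")` raises `PermissionError` in this sketch, because that component was never exposed as an API.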

Giovanni Asproni 00:18:23 Can I ask you a question? What is the main reason for controlling communication via the cell gateway? What advantage do we get with this?

Asanka Abeysinghe 00:18:29 Yes. So it comes with another fundamental principle. There’s a confusion in the industry about API and service. In my definition, the service is the implementation, and the API is the interface through which you allow the service to be accessed or executed. So there’s a clear distinction between an API and a service. By applying that principle, you are exposing all the APIs from the cell gateway and letting the person who’s invoking those services use the API instead of the service endpoint. That is the first reason why we are controlling the communication from the gateway. Second, when you have a gateway at the top, you can apply the policies for that particular domain by enforcing them at the gateway level. So it’s really easy: you know exactly what you have to do, and you can define a set of rules stating who can access these services, what type of security standard should be used, and what type of APIs you are exposing.

Asanka Abeysinghe 00:19:55 As an example, you might decide: okay, for this particular function I am having a REST API, for this one a gRPC API, and for this one a GraphQL API. All these things can be defined at the cell gateway level. That way developers have to fit into the standard defined by the enterprise architecture when they build these systems. So it’s basically giving people freedom to do the development, with some guardrails that you put at the enterprise architecture level. When microservices came into the picture and two-pizza teams were introduced into enterprises, it was great that you gave a lot of autonomy to these teams in how they develop this stuff. But it created an issue: loss of control. If you look at it from the enterprise architecture level or from the business point of view, you didn’t know exactly what the developers were doing, what language they were using, what third-party libraries they were using, how they were securing this stuff, or how the communication was happening.

Asanka Abeysinghe 00:21:11 And without a standard endpoint to call, with a contract or a definition for that endpoint, it became Peter’s service or John’s service, where you had to go to Peter or John and ask, “Hey, what’s the definition of your microservice? How can I access it?” So it became a problem within the organization, and enterprise architecture lost control. But you can’t operate an enterprise like that. You have to have a set of standards, you have to have a set of best practices, and you have to standardize the enterprise architecture. Once you do that, you can give a lot of freedom to the application development teams to operate within those standards and develop these systems without bringing uncertified or unauthorized technologies into the enterprise. So those are the advantages you get by putting that gateway at the top and having this seamless integration at that level.
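
The gateway-level rules mentioned in this answer (who can call, what security is required, which API style is exposed) could be modeled roughly as below. It is a minimal sketch, assuming a hypothetical `ApiPolicy` record and `authorize` function; the API names and caller names are invented for illustration and do not come from the WSO2 specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class ApiStyle(Enum):
    REST = "rest"
    GRPC = "grpc"
    GRAPHQL = "graphql"


@dataclass(frozen=True)
class ApiPolicy:
    """Hypothetical per-API rule set enforced at a cell gateway."""
    style: ApiStyle
    allowed_callers: FrozenSet[str]
    requires_token: bool = True


def authorize(policies: Dict[str, ApiPolicy], api: str, caller: str, has_token: bool) -> bool:
    """Return True only if the API is exposed and the caller satisfies its policy."""
    policy = policies.get(api)
    if policy is None:
        return False  # not exposed by this gateway at all
    if policy.requires_token and not has_token:
        return False
    return caller in policy.allowed_callers


# Illustrative policy table for one cell; every name is an assumption.
policies = {
    "orders": ApiPolicy(ApiStyle.REST, frozenset({"web-cell"})),
    "stock-stream": ApiPolicy(ApiStyle.GRPC, frozenset({"orders-cell"})),
    "catalog": ApiPolicy(
        ApiStyle.GRAPHQL, frozenset({"web-cell", "mobile-cell"}), requires_token=False
    ),
}

print(authorize(policies, "orders", "web-cell", has_token=True))
print(authorize(policies, "orders", "mobile-cell", has_token=True))
```

Keeping the rules in one table per gateway is what lets an enterprise architecture team review them in one place while development teams stay free to choose how the components behind the gateway are built.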

Asanka Abeysinghe 00:22:15 Let me add one more point before we jump into the next topic. When you have the gateway at the top, observability is really easy, because now the communication happens through a gateway, right? Somehow, it’ll hit one of the cell gateways. So you can easily capture the observability data from that cell gateway and get a full picture of the round trip of a specific transaction. In addition to that, you can get the latency, and you can see the different communication styles that are happening. At the architecture level you might assume this is how the communication will happen between particular services, but at runtime you might see different styles or different patterns, right? Those things can be observed at runtime as well as design time, because now you can capture that communication data from the gateway that you put on top of each and every cell.
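
As a rough illustration of capturing round trips and latency at the gateway, the sketch below records one entry per gateway call and stitches the entries for a transaction together by a trace ID. The `ObservingGateway` class, `trace_path` function, and the trace ID convention are assumptions made for this example; a counter is used as a fake clock so the numbers are deterministic.

```python
import itertools
import time
from typing import Callable, Dict, List, Tuple


class ObservingGateway:
    """Hypothetical gateway wrapper that records one entry per inbound call."""

    def __init__(self, cell: str, clock: Callable[[], float] = time.perf_counter):
        self.cell = cell
        self._clock = clock
        self.records: List[Dict] = []

    def call(self, api: str, handler: Callable[[str, str], str], payload: str, trace_id: str) -> str:
        start = self._clock()
        try:
            return handler(payload, trace_id)
        finally:
            # Recorded even if the handler raises, so failed calls stay visible.
            self.records.append({
                "cell": self.cell, "api": api, "trace_id": trace_id,
                "start": start, "latency": self._clock() - start,
            })


def trace_path(gateways: List[ObservingGateway], trace_id: str) -> List[Tuple[str, str, float]]:
    """Reconstruct the gateway-to-gateway path of one transaction, ordered by start time."""
    hops = [r for g in gateways for r in g.records if r["trace_id"] == trace_id]
    return [(r["cell"], r["api"], r["latency"]) for r in sorted(hops, key=lambda r: r["start"])]


# Demo with a fake clock (a simple counter) instead of real time.
fake_clock = itertools.count().__next__
inventory_gw = ObservingGateway("inventory", fake_clock)
orders_gw = ObservingGateway("orders", fake_clock)


def stock_service(sku: str, trace_id: str) -> str:
    return f"stock:{sku}"


def order_service(sku: str, trace_id: str) -> str:
    # Egress from the orders cell hits the inventory cell's gateway.
    return inventory_gw.call("get-stock", stock_service, sku, trace_id)


result = orders_gw.call("place-order", order_service, "A1", trace_id="t-1")
path = trace_path([orders_gw, inventory_gw], "t-1")
print(path)
```

In a real deployment this role is usually played by the gateway product's own telemetry rather than hand-written wrappers; the sketch only shows why a single ingress point per cell makes the transaction path recoverable.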

Giovanni Asproni 00:23:20 A question about observability, now that you mention it. Using a cell-based architecture, you say access goes through the gateway and you do the observability from there. But what about tracing? You know, which service originates a call and which service it is going to, because when you have to debug something, when there is a problem, sometimes you really need to go down to the service level to understand what’s going on.

Asanka Abeysinghe 00:23:43 Yeah. So there are two different levels, right? There’s the inter-cell communication and there’s the intra-cell communication. It totally depends on how the implementation works, so I’ll take one example, a reference implementation that we have done. In that particular implementation, communication between cells happens through APIs, and you have the API gateways, and within the cell you have a service mesh. In that example, we used eBPF and Cilium to implement the service mesh and the observability. So you can capture all the inter-service communication within that particular cell as well, and you can get tracing of the service calls happening within the cell without any issue. So that’s basically within the domain.

Giovanni Asproni 00:24:41 Yeah. My point was, if you need to connect several domains, sometimes when services interact via gateways you really want to see the whole path of the call. So, is this still possible?

Asanka Abeysinghe 00:24:55 Yes, it’s possible. At the first level you see the gateway-to-gateway communication, and then when you drill down you can get into that communication path, see which services interacted within that particular transaction, and get all the details. And if there are multiple interactions happening between the services, you can get that as well. All these things depend on how you implement the cell architecture. From the architecture point of view, yes, it’s supported, but you need to build it in when it comes to the implementation.

Giovanni Asproni 00:25:36 Okay. Now a couple more things about the cell architecture, and then I’d like to hear an example of a real system in action, if you like. So we said that a cell, of course, is the main part of a cell-based architecture. Then the components within a cell are the things that make up a cell. Then there is the gateway, which is there to communicate outside the cell, well, at least to receive ingress communications from other cells. But I also found out there is the concept of a control plane and a data plane as well. So what are those, maybe just to define them, and what is their purpose, for our audience?

Asanka Abeysinghe 00:26:15 Yeah, so there are two levels, right? First, you have the control plane at the top, which provides the signals for how the enterprise should operate. That’s the top level. And then you have the data plane. Basically the cells are part of the data plane, and within the cell you can have a micro control plane. Basically the gateway acts as a micro control plane, with limited data and metadata to operate the cell and provide signals to the services within the cell. As an example, assume you have a public token provider, or STS. If you implement mTLS within the cell or between the services, you can have a private STS inside the cell to increase the performance of that particular cell, so that STS becomes part of the local control plane of that cell. But if you look at it from a high level, cells are part of the data plane, and then you have the control plane that controls how these cells are orchestrated, what type of tokens should be issued, and then the observability and operations related stuff. All these things come from the control plane, which you implement as a platform service, managing all the workloads running in the data plane.

Giovanni Asproni 00:27:56 Okay. So as I understand it, the control plane defines, if you like, a few things. It looks like it defines some policies, but also some other things a bit more at the technical level, like the API registry. So in a way, since you are talking about an architecture that involves everything, not only the technical design but also deployment and operability, the control plane defines all the policies and maybe some of the systems to enact those policies, such as defining which cells should communicate with which ones. The data plane is more of a runtime kind of thing.

Asanka Abeysinghe 00:28:39 Exactly. The business logic and all the actual workloads are running in the data plane, and the control plane is controlling how these workloads should run, how they should scale, how they should communicate. Basically that’s the control center. The analogy I usually take is an airport, but you can even take it as a train control system. All the trains are like the data plane, and the control center that sits somewhere, managing the trains plus the passengers, is basically the control plane.

Giovanni Asproni 00:29:17 Okay. And another question about the control plane, because I was looking into this while doing some research. It seems that the deployment pipelines are also part of the control plane. Is that correct? Because that is, if you like, the deployment-time kind of architecture, so you define that in the control plane.

Asanka Abeysinghe 00:29:37 Yes. So all the deployment related decisions are taken by the control plane as well. And we have to look at not only the production system, right? Before you come to the production system, you have the dev environments, you have test and staging; that’s how the enterprise works, right? So there is how you deploy these things within an environment, plus how you promote them from environment to environment. And then again, if you go into detail about the deployments, how you do blue-green or canary types of deployments, all these things are instructed by the control plane and then run in a data plane, or a number of data planes. Why do I use the term “number of data planes”? We are in the era of multi-cloud and hybrid cloud requirements, right? Multi-cloud comes about because most enterprises are not sticking to one hyperscaler.

Asanka Abeysinghe 00:30:37 They like to use multiple hyperscalers based on various reasons as well as you can’t ignore the data centers, right? Still enterprises are using these data centers and then having a hybrid type of a setup so you can have multiple data planes. I think that’s where things like Kubernetes is helping a lot in the current context because Kubernetes is helping you to have that multi-cloud architecture in a proper way because Kubernetes is like the unique in the infrastructure now. So if you are using Kubernetes, you can run it in any hyperscaler plus in your data center in the same standards. So with that, I think we need to think about a single control plane with multiple data planes when it comes to the practical deployments.
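
The "single control plane with multiple data planes" idea can be sketched as a desired-state table that is pushed out to each data plane. Everything below is a simplified assumption for illustration: the `ControlPlane` and `DataPlane` classes, the replica-count model, and the plane names are hypothetical and stand in for whatever platform service and clusters an organization actually runs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataPlane:
    """Where cells actually run, e.g. one cluster per cloud or data center."""
    name: str
    running: Dict[str, int] = field(default_factory=dict)  # cell name -> replicas


@dataclass
class ControlPlane:
    """Holds the desired state and instructs every registered data plane."""
    data_planes: Dict[str, DataPlane] = field(default_factory=dict)
    desired: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def register(self, plane: DataPlane) -> None:
        self.data_planes[plane.name] = plane

    def set_desired(self, plane_name: str, cell: str, replicas: int) -> None:
        if plane_name not in self.data_planes:
            raise KeyError(f"unknown data plane '{plane_name}'")
        self.desired[(plane_name, cell)] = replicas

    def reconcile(self) -> List[str]:
        """Push desired state to the data planes; return the changes applied."""
        changes = []
        for (plane_name, cell), replicas in sorted(self.desired.items()):
            plane = self.data_planes[plane_name]
            if plane.running.get(cell) != replicas:
                plane.running[cell] = replicas
                changes.append(f"{plane_name}/{cell} -> {replicas}")
        return changes


control = ControlPlane()
cloud_a = DataPlane("cloud-a")
on_prem = DataPlane("on-prem")
control.register(cloud_a)
control.register(on_prem)

control.set_desired("cloud-a", "orders", 3)
control.set_desired("on-prem", "orders", 1)
print(control.reconcile())
print(control.reconcile())  # nothing left to change on the second pass
```

The business logic never appears in the control plane here; it only decides where cells run and at what scale, which mirrors the split between control and workload described in the conversation.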

Giovanni Asproni 00:31:36 Okay. Another question, about the boundaries of a cell. We talked about DDD, and so far we said the cell is pretty much a domain or maybe a bounded context. But are there any other criteria that can be used to define the boundaries, depending on other constraints that are important to a particular company?

Asanka Abeysinghe 00:31:59 Yeah, I think it’s basically how you are organized. I would say 75 to 80% of the implementations that I know use domain-driven design, because it’s a well-defined way of identifying these domains. But in some cases I have seen it depend entirely on the teams: how different teams have been identified and the set of things they are responsible for, not coming from the domain-driven design point of view but rather from the way they worked earlier and how they have built their expertise. So we have seen team definitions or team boundaries mapped directly to cells, and various other criteria have been used as well, like how they are operating at the moment and the way they have structured these services. But those are more organization specific. I would say there’s no common way of defining that. The common way of defining cells that we have identified across organizations is domain-driven design.

Giovanni Asproni 00:33:17 Are there any situations, for example, where a cell is pretty much an entire system, and there are some kind of replicas, maybe with data sharding or something, used for, say, reliability or availability purposes? Is this a common case as well?

Asanka Abeysinghe 00:33:35 I have not seen a single cell as a system, but you can logically define it at the architecture level and say, hey, this is a system, I am treating it as a legacy cell or something like that. But that is on paper, at the architecture level; the implementation is just a system, right? You have a point to access it. At the implementation level, it doesn’t make sense to have something like that. At the implementation level, it’s always that you have a set of workloads that you group to create cells. I have not seen the scenario that you explained.

Giovanni Asproni 00:34:17 Okay. You know, I was asking because, well, doing some research, there was cell-based architecture apparently at Slack and some other companies, and my understanding from what I found on the web about Slack is that in that case it is pretty much about availability zones and resilience. So the cell seems to be Slack, the system, and then it’s replicated in different availability zones. So if something bad happens, only a small number of customers are involved.

Asanka Abeysinghe 00:34:50 Yeah. I think, the way that I read that paper as well, it’s completely looking at it from the deployment architecture point of view. Okay.

Giovanni Asproni 00:34:57 Okay, as we said before.

Asanka Abeysinghe 00:34:59 Exactly. It’s perfectly fine, because they are addressing one particular problem and trying to use the architecture to address that. But the original idea, and how we look at it, is way more than that. You can get a lot more benefits than just the deployment ones.

Giovanni Asproni 00:35:18 As you mentioned before. Because also, from your explanation when we were talking about the parts of a cell, the control plane is not simply the architecture of the system per se, but the architecture of the context around it as well. Exactly. Because we are talking about deployment, pipelines, policies, all sorts of things. Basically everything around the system that helps in building, deploying, and operating the system.

Asanka Abeysinghe 00:35:43 Exactly.

Giovanni Asproni 00:35:44 Okay. Now let’s talk a bit about size. So how many cells should a system have?

Asanka Abeysinghe 00:35:51 It depends on the complexity of the problem, the size of the organization, and what portion of the system you have built with the cell architecture. So it’s really hard to give a definite number of cells, because it totally depends on the complexity. But if you ask how many components per cell, then in the best practices or good designs I have seen, it’s around 10 components per cell. And usually when I am helping organizations design systems using cell architecture, if a cell goes beyond 10 components, then we look at it and divide it into subdomains, because if it is more than 10, it’s really hard to maintain, and there’s a problem in the architecture and the design. That’s kind of the sweet spot that we have identified when it comes to the number of components. And various implementations have various numbers of cells, from around 20 into the hundreds, because it totally depends on the design and the complexity of the problem.

Giovani Asproni 00:37:13 Also from what you mentioned before that ideally a team owns a cell, I would imagine that if you have more than, well you said 10, but let’s say more than a reasonable number of components, it’ll be very difficult for a single team to keep track of what they’re doing.

Asanka Abeysinghe 00:37:30 Exactly. And then there’s a relation between the team and the cell. So a single cell cannot be owned by multiple teams, but a team can have multiple cells.

Giovani Asproni 00:37:45 Okay, now I can understand that a team can have multiple cells. Now why can’t a cell be owned by multiple teams?

Asanka Abeysinghe 00:37:51 Because then the fundamental things that we identified all break: the API as the way of communicating, the ownership, and the domain knowledge you need to implement that particular domain. I’m not telling you that you cannot do that. You can do it, but from the best practices point of view, if you start letting different teams own multiple cells, I mean, a single cell owned by different teams, then it’ll be really hard to maintain, and you will go back to the same issues that we faced without using cell architecture. So it’s a best practice.

Giovani Asproni 00:38:32 Have you come across companies that did that? So with several teams owning a single cell and getting into problems with that? Have you got an example, without mentioning names?

Asanka Abeysinghe 00:38:44 Yeah, I think there are multiple cases that we have seen like that. Then there’s a problem with ownership. There’s a problem with release cycles. Now you have a lot of dependencies across team boundaries. So the way you can iterate, the way you can frequently release, will get affected when you have these multiple teams. Because this is not only about technology, right? What slows things down in most cases is the people aspects and the political aspects of organizations. So when you have these crossed boundaries and multiple teams involved, then you have to deal with these political aspects as well, and that slows things down. Having one particular team own that cell has more impact: you can do quick releases, you can have proper autonomy, and you can be more innovative in that particular domain than when you have these crossed boundaries. And then there can be situations, as an example: you have team A, but you identify that you might need some expertise from team B. Then I think in proper Agile terms you assign that person from team B to team A during that particular sprint. That’s perfectly fine, but in that particular development cycle the person assigned from team B belongs to team A and works with that particular team. That way they can use those capabilities, but the ownership remains with team A, not team B.

Giovani Asproni 00:40:25 Okay. And now another question about the cell. So in the reference architecture document in GitHub, the one you created, you state this. So I quote: “The number of component connections within a cell should be higher than the number that crosses the cell boundary. Hence one approach would be to cluster components based on the connections.” Now what do you mean with that?

Asanka Abeysinghe 00:40:50 So basically, if you look at it, we have a set of components inside the cell, right? And there’s communication happening among those components. And then again there’s communication coming to the cell from different consumers who are communicating with the APIs that you expose through the API gateway. So what we have identified is that if there’s no significant communication happening between those services, and if the number of communications is limited, you should not divide the cell into multiple cells, because it’s perfectly fine for those services to remain within that particular cell boundary. And if you see that increase and you cannot manage it, that’s where you need to consider, hey, whether I should take some of these services out and create a new cell boundary. So that’s basically where we are coming from with the number of communications, the number of components, and how you identify the cell boundary. That equation, again, is a best practice, not a must-have when you are defining cells.
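
The guideline quoted above can be checked mechanically once a component inventory exists. The following is a minimal sketch, not part of the reference architecture: the cell names, component names, and both helper functions are hypothetical, and a connection is modeled simply as a (source, target) pair of component names.

```python
from collections import defaultdict


def connection_counts(cells, connections):
    """Count, per cell, connections that stay inside it vs. cross its boundary.

    cells: dict mapping a (hypothetical) cell name to a set of component names.
    connections: iterable of (source_component, target_component) pairs.
    """
    # Reverse lookup: which cell owns each component.
    owner = {comp: cell for cell, comps in cells.items() for comp in comps}
    counts = defaultdict(lambda: {"internal": 0, "crossing": 0})
    for src, dst in connections:
        src_cell, dst_cell = owner[src], owner[dst]
        if src_cell == dst_cell:
            counts[src_cell]["internal"] += 1
        else:
            # A cross-boundary connection counts against both cells.
            counts[src_cell]["crossing"] += 1
            counts[dst_cell]["crossing"] += 1
    return dict(counts)


def cells_violating_guideline(counts):
    """Cells whose internal connections are not strictly higher than crossing ones."""
    return sorted(
        cell for cell, c in counts.items() if c["internal"] <= c["crossing"]
    )


# Illustrative inventory (all names are made up for this sketch).
cells = {
    "orders": {"cart", "checkout", "pricing"},
    "customers": {"profile", "auth"},
}
connections = [
    ("cart", "checkout"),
    ("checkout", "pricing"),
    ("cart", "pricing"),
    ("checkout", "profile"),  # crosses the boundary between the two cells
    ("profile", "auth"),
]

counts = connection_counts(cells, connections)
print(counts)
print(cells_violating_guideline(counts))
```

A cell flagged by this check is only a candidate for review. As the discussion makes clear, the guideline is a best practice, and a widely consumed central domain may legitimately have more inbound connections than internal ones.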

Giovani Asproni 00:42:10 Yeah, I was asking because I could imagine that if you have a cell that implements a central domain in a particular enterprise, you may have lots of cells using it and the central domain might have just a few services inside in terms of connections. They might have a few connections, but there might be a lot of connections to the cell itself.

Asanka Abeysinghe 00:42:29 Yeah,

Giovani Asproni 00:42:30 Which is against this particular guideline.

Asanka Abeysinghe 00:42:34 It depends on the functionality that you are providing from that particular cell and the communication style. Again, it can differ with the number of consumers who are invoking the APIs. But I think when it comes to the specification level, those are the guidelines that we have identified. As I said earlier, the context might be different in some cases. So in the example that you gave, if that is the case, I think you have to apply the context and then define the cell boundary. What we have identified is the common case, what generally happens when you have a number of domains, and how you define the domains and the communication.

Giovani Asproni 00:43:21 Okay. And now I’d like to talk a little bit about the deployment of cells, because we deploy cells, but we also deploy services within the cells. Yeah. And we always talk about CI/CD for cells and services. Now, when we say we deploy a cell, what do we do? Does it mean, for example, if you are in the cloud, taking a particular, I don’t know, AWS account, declaring it a cell, and deploying the services for that cell there? And creating a gateway in that account to communicate with other cells? This is just an example popping from my head, just trying to understand what we mean by deploying a cell.

Asanka Abeysinghe 00:43:58 Yeah, I think you can deploy and implement cell architecture in a more virtualized environment like that. It’ll work, but it’s more of a static deployment in that case. Cell architecture is more effective when it comes to a deployment like Kubernetes. So I’ll explain it from the Kubernetes point of view, because an AWS account will be too large to treat as a cell. If you look at it from the Kubernetes point of view, usually when it comes to implementations, people create CRDs to define the cell boundaries. But if you are going for a basic deployment, you can treat a Kubernetes namespace as a cell and a Kubernetes pod as a component. So you have the pod that will scale the component. So a set of pods, put in a namespace, can be treated as a cell, for somebody to get an idea of the deployment.

Asanka Abeysinghe 00:45:11 But if you deploy this in a non-Kubernetes environment, then you need to have network-level policies defined to control the cell boundary and stop communication by putting in a gateway, and do some infrastructure-level configuration to create the cells. And we have not seen that many non-Kubernetes deployments, because most of the deployments that we have seen are on top of Kubernetes. And as I mentioned, the common way is to put it in a namespace. But to get it done properly, you can write a custom resource definition, or as we call it in the Kubernetes world, a CRD, and then define how you are creating a cell and how you are deploying a cell, and have those particular instructions there. And some of the implementations are using things like Helm charts to define this infrastructure, basically infrastructure as code. And then when it comes to CI/CD and the deployment level, you are using those scripts to create cells as well as to create components and deploy them.
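
The basic "namespace as a cell" mapping described above can be sketched as a small manifest generator. This is an illustration under stated assumptions, not an implementation from the episode or the reference architecture: the `cell_manifests` function, the `cell` label, and the `cell-gateway` pod label are hypothetical, and using Kubernetes NetworkPolicy objects to enforce the boundary is one possible choice that the speaker does not prescribe. The manifests are emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def cell_manifests(cell_name, gateway_label="cell-gateway"):
    """Build manifests that treat one Kubernetes namespace as one cell.

    Assumption: pods acting as the cell gateway carry the (hypothetical)
    label `<gateway_label>: "true"`; all other pods are cell components.
    """
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": cell_name, "labels": {"cell": cell_name}},
    }
    # Applies to every pod in the cell: ingress is allowed only from pods
    # in the same namespace, so components are not reachable from outside.
    internal_only = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "internal-only", "namespace": cell_name},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    # Applies only to gateway pods: an empty ingress rule allows traffic
    # from any source, making the gateway the cell's single entry point.
    gateway_open = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "gateway-open", "namespace": cell_name},
        "spec": {
            "podSelector": {"matchLabels": {gateway_label: "true"}},
            "policyTypes": ["Ingress"],
            "ingress": [{}],
        },
    }
    return [namespace, internal_only, gateway_open]


manifests = cell_manifests("orders")
for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

Network policies are additive, so gateway pods match both policies and accept external traffic, while component pods match only the first. Enforcement depends on the cluster's network plugin supporting NetworkPolicy. A CRD-based approach, as mentioned in the conversation, would wrap this kind of setup behind a dedicated cell resource.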

Giovani Asproni 00:46:30 In terms of deployment, is cell-based architecture more suitable for cloud environments, or can it work just as well on premises?

Asanka Abeysinghe 00:46:40 Yeah, so in my view, on-premises means managed by the organization and SaaS means managed by a provider. Okay. Cloud, on the other hand, can be a hyperscaler or it can be a private data center, because if you run a Kubernetes cluster in a private data center, it’s still a cloud, right? Managed by that particular organization. So if I reframe the question, it’ll work in both, in a hyperscaler environment or in your data center. But having an infrastructure orchestration layer like Kubernetes will help you to implement it properly. Otherwise it’ll become a static deployment that will just separate the workloads but will not bring the additional flexibility that you can get from cell architecture. So to answer your question, yes, it can be a hyperscaler or it can be a data center deployment, but having something like Kubernetes will help you to a great extent.

Giovani Asproni 00:47:51 Okay. And now I’d like to talk about implementing a cell-based architecture. So let’s say an organization is considering doing that. Now my first question is: for what kind of systems would you recommend a cell-based architecture?

Asanka Abeysinghe 00:48:08 So I would say any; there’s no specific kind of system that you have to pick. You can apply it to any kind of system, but it is more applicable if it is a distributed system. If you have more monolithic systems, then putting in cell boundaries will not change a lot. It can change a little bit by encapsulating some of the complexities and putting an API gateway on top of each and every system, and that’s perfectly fine. But then again, you will not get the full benefit of cell architecture by doing that. So as long as the system is distributed and you have a more decentralized architecture with a number of components, you can apply the cell architecture. And there are two ways of doing it. You take the existing distributed components and group them into cells, because you might have started writing microservices, you might have written these distributed workloads in the system and you already have them, right?

Asanka Abeysinghe 00:49:19 You don’t have to rewrite everything to apply cell architecture. In a situation like that, you can do an exercise by listing all the components that you already have, grouping them by using something like domain-driven design or any other mechanism, and identifying that these are the cells I am going to have, and across those cells I’m going to distribute my existing services. So that is approach number one. But if you have a monolithic system and you are planning to move to a distributed architecture, then you can start from scratch and identify, hey, what are the microservices that I’m going to write, and how am I going to organize these microservices into cells, then identify those cells and those microservices or components, and start implementing the new system by applying the cell architecture concepts. And we have seen both approaches. Especially if you take a larger enterprise, these things already exist, right? Written in .NET or written in Spring Boot or any other technology. So you kind of rearchitect it through a modernization effort. But if it is a startup or an organization going to do a kind of redesign, they can start with the second approach, do the architecture, and get into the implementation.
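
The first approach, listing existing components and grouping them by domain, can be sketched as follows. This is a minimal illustration with hypothetical names throughout: the component inventory, the domain assignments, and the `group_into_cells` and `oversized_cells` helpers are made up, and the limit of 10 reflects the rule of thumb mentioned earlier in the conversation rather than a hard requirement.

```python
from collections import defaultdict

# Rule of thumb from the discussion: around 10 components per cell.
MAX_COMPONENTS_PER_CELL = 10


def group_into_cells(component_domains):
    """Group existing components into candidate cells, one cell per domain.

    component_domains: dict mapping component name -> domain name, for
    example the outcome of a domain-driven design exercise.
    """
    cells = defaultdict(list)
    for component, domain in sorted(component_domains.items()):
        cells[domain].append(component)
    return dict(cells)


def oversized_cells(cells, limit=MAX_COMPONENTS_PER_CELL):
    """Candidate cells with more components than the limit (review for subdomains)."""
    return sorted(name for name, comps in cells.items() if len(comps) > limit)


# Hypothetical inventory of already-deployed services.
inventory = {
    "cart": "orders",
    "checkout": "orders",
    "profile": "customers",
    "auth": "customers",
    "invoice": "billing",
}

cells = group_into_cells(inventory)
print(cells)
print(oversized_cells(cells))
```

Any cell reported by `oversized_cells` would be a candidate for splitting into subdomains, following the advice given earlier about cells that grow beyond roughly 10 components.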

Giovani Asproni 00:50:52 If a company is starting a new system from scratch, a greenfield one, they have this great idea for a system they don’t have yet and say, okay, we think we want to use a cell-based architecture. Yeah. How can they proceed? Can they proceed incrementally, for example? What is a typical way of proceeding?

Asanka Abeysinghe 00:51:11 Yeah, I think an iterative approach and iterative architecture is something I highly recommend, because people have a misunderstanding about architecture, that architecture is static. It is not; architecture is also iterative. So you can have an iterative approach, but I think you should have identified what your end goal is, what your ideal architecture is. Then you can have a more phased approach. In Phase 1, I am just having two cells, putting everything together, putting in two gateways, and getting it done. And in Phase 2, Phase 3, I am planning to divide these things into a number of cells, distribute the microservices, and then have that end-state architecture. Or it can be that I am only building a few microservices at this point and I’m communicating with my existing system for certain capabilities. So like that, I think it’s not only the architecture and implementation, right?

Asanka Abeysinghe 00:52:18 You have to look at what the business is looking for, what type of capabilities you have to provide first. And then again, it can be a skill-related thing as well, right? You might not have the required skillset within the existing team to implement it. So I think doing that analysis, identifying what is the most feasible plan, and then having that phased approach will help. Again, it depends on various factors. I think having a proper architecture exercise that includes the architects and project managers or the product managers will help you to define that phased approach. But I highly recommend having an iterative approach rather than trying to build all these things in one go. I’ll give you one example: one system that we were involved in had around 300 people, and they had an end-state architecture, but it took around three years to implement it.

Asanka Abeysinghe 00:53:26 But when you get to three years, most of the requirements you identified in year one weren’t valid anymore. And then the business didn’t see any advantage in implementing a system like that. So I think sprints of two to three weeks at most are better: you should deliver quickly, experiment with that, see whether there is value that you can generate, and then get into the next phase. That is best, because a problem that we have seen is that the technical architecture and the technical implementations are not aligned with where the business is looking and what type of capabilities it is looking for. So I think having that alignment and adjusting these types of architecture approaches and implementations will help the technical teams to add more value to the enterprise.

Giovani Asproni 00:54:20 Okay. So you just mentioned skills. So are there additional skills that are required to implement a cell-based architecture compared to the ones required for a microservice one?

Asanka Abeysinghe 00:54:31 No, but I look at it from this angle: you need to have an understanding of application architecture, you need to have an understanding of the deployment or infrastructure architecture, as well as an understanding of how you divide things and have a more distributed system.

Giovani Asproni 00:54:56 Okay. And now I think I’ll ask the final question since we are getting to the close. Costs: what are the costs of a cell-based architecture compared to the costs of a microservice one? Let’s take as a reference a Cloud.

Asanka Abeysinghe 00:55:10 Yeah.

Giovani Asproni 00:55:10 System. So are there differences in costs?

Asanka Abeysinghe 00:55:13 So I think if you look at it from the infrastructure cost, there are a few additional things that you have to run, right, such as a gateway for each cell. Those are a few additional things that you need to run. But then again, if you look at the overall gain that you are getting from productivity, stability of the system, and the flexibility that you have in the overall system architecture, it’s a lot more. So if you do a comparison of hard costs, it can be higher since you are adding things. But if you look at it from the overall TCO, I think you’ll gain a lot of benefit in the longer run because of these additional advantages that you are getting. So I think if you are doing a proper cost estimation, don’t look at it only from the short-term financial gains that you are getting.

Asanka Abeysinghe 00:56:16 Look at it from the long term as well as the soft gains that you are getting, as I mentioned about productivity and flexibility. Because the main issue that we have seen in the enterprise is that the business teams might have great business ideas to implement, or new capabilities to provide to your customers, or new business models that you can implement. But due to the technical limitations you have, you might not be able to execute those things, right? So by having that flexibility in the technical architecture and the implementation, you can quickly respond to the business and make sure that you are implementing those concepts at the business level by changing your architecture and doing new implementation at the technical level. So to get that flexibility, you can apply cell architecture and get a lot of benefits. Another thing is discoverability: how quickly you can identify things within the enterprise and how much you can reuse in the enterprise. Those are other things that you need to consider. So cell architecture gives the organization a great framework for that discovery, flexibility, ownership, and autonomy; those are all advantages you are getting. So I think that’s a huge cost benefit you are getting in the long run once you apply this architecture style and implement your systems.

Giovani Asproni 00:57:55 Okay, thank you Asanka. I think you’ve done quite a good job of introducing this cell-based architecture. And is there anything we missed that you’d like to mention?

Asanka Abeysinghe 00:58:05 I think we covered almost everything, but I would like to recommend that the listeners read the paper. There are also a lot of references that I have added as reference articles and additional materials in the specification, so you can read those. And the specification is released under Creative Commons, so if you think you can contribute, feel free to do that. Send a pull request; I’m happy to review and include it. And if you would like to contribute articles or any other details about the implementations that you have done with cell architecture, please share them as well, because the architects and developers who are planning to use this architecture style can get a lot of benefit by reading those implementation details. So I would like to add that as the last comment.

Giovani Asproni 00:59:01 Okay, thank you. And the link to the GitHub repository is in the references we’ll give for the show so the audience will be able to access those easily and so they will be able to contribute if they wish to do so. So thank you Asanka for coming to the show. It’s been a real pleasure. And this is Giovanni Asproni for Software Engineering Radio. Thank you for listening.

Asanka Abeysinghe 00:59:23 Likewise, thank you very much.

[End of Audio]
