Dan DeMers of Cinchy.com joins host Jeff Doolittle for a conversation about data collaboration and dataware. Dataware platforms leverage an operational data fabric to liberate data from apps and other silos and connect it together in real-time data networks. They explore a range of key topics, including zero-copy integration, encapsulation and information hiding, handling changes to data models over time, and latency and access issues. The discussion also explores dataware management and security concerns, as well as the concept of ‘data plasticity’ as an analogy to neuroplasticity, which is where the nervous system can respond to stimuli such as injuries by reorganizing its structure, functions, or connections.
Show Notes
From the Show
From IEEE Computer Society
- Data Placement for Multi-Tenant Data Federation on the Cloud
- Data mining with big data
- Data Integration on Multiple Data Sets
- Improving data sharing in data rich environments
- A Data Model for Heterogeneous Data Sources
From SE Radio
- Episode 523: Jessi Ashdown and Uri Gilad on Data Governance
- Episode 507: Kevin Hu on Data Observability
- Episode 484: Audrey Lawrence on Timeseries Databases
- Episode 456: Tomer Shiran on Data Lakes
- Episode 417: Alex Petrov on Database Storage Engines
- Episode 397: Pat Helland on Data Management with Microservices
Transcript
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Jeff Doolittle 00:00:17 Welcome to Software Engineering Radio. I’m your host, Jeff Doolittle. I’m excited to invite Dan DeMers as our guest on the show today for a conversation about data collaboration and dataware. Dan DeMers is co-founder and CEO of Cinchy and a pioneer in dataware technology. Previously, he was an IT executive at some of the most complex global financial institutions in the world, where he was responsible for delivering mission-critical projects, greenfield technologies, and multimillion dollar technology investments. After realizing that half of all IT resources were wasted on integration, he created Cinchy with a vision to simplify the enterprise and provide the rightful owners of data with universal control of their information. Dan, welcome to the show.
Dan DeMers 00:00:59 Thanks for having me. Happy to be here.
Jeff Doolittle 00:01:00 So your bio seems to give a bit of a sense of what dataware might be. So, give us a brief introduction to what dataware is and why our listeners should be interested in it.
Dan DeMers 00:01:12 Sure. The easiest way to understand dataware is to actually just remind ourselves what is software? Because there was a day where software didn’t exist and then it came into existence, and today we take it for granted. But so, what did software do? It separated the form from function, right? We had machines, machines existed prior to software, post-software, though, you have machines but machines can then be programmed, which is the instruction, the logic, i.e. the software. And that changed and transformed how you think about machines. Right now, from that point forward, the more programmable a machine is the longer that machine is going to last, the more versatility is going to have, the more function that’s going to be able to be capable of doing because you can defer that till after the manufacturing process. A brilliant major shift and changed the world and continues to change the world today.
Dan DeMers 00:01:59 Well, dataware is really just the next step in that inevitable decoupling. And this time it’s not separating the form from function, it’s separating the knowledge from the function, from the logic. So, it’s essentially decoupling data from the software, and that magically simplifies everything, quite frankly. And it starts with relieving software from all the complexity of how to store data, how to integrate data, how to share data, how to protect and control data, and can now allow the software to do what it was originally intended to do, which is implement the functionality, implement the logic, the actual program, and let dataware solve the data problem in the same way that software lets hardware solve the physical machinery problem.
Jeff Doolittle 00:02:40 So what are some of the challenges that people face in shifting first maybe their thinking from the current paradigm to what you’re describing. And then after that, maybe we can start digging a little bit more into some of the technical challenges. But maybe first start with sort of what does it take for somebody conceptually to kind of transition from the current paradigm to more of this dataware approach that you’re advocating?
Dan DeMers 00:03:00 Right. I would say it’s a really good question, and I don’t know if I’ve even cracked the code on that, in spite of giving that a whole lot of time and energy, because it is both strangely simple and complex. And what I’ve come to realize though is it’s easier to explain the concept of dataware sometimes to a child that has no existing reference frame on how it works. And I learned that just even through explaining it to my kids. I’ve got three young boys and their friends, and they would just kind of naturally get it. Whereas someone who has 30 years of experience and has gone through several iterations and understands data lakes and data warehouses and data mesh and data fabric and all these latest buzzwords; dataware is hard for them to get their head around.
Dan DeMers 00:03:44 And what I’ve also come to realize is that it’s an unlearning journey as much as it is a learning journey, but there’s also just a lot of almost like collateral damage from the overhyping of data-related technologies. Like, if you go back to data warehouses and data marts and data master, data fabric and data virtualization and master data management: each of these things, if you read the marketing materials of the vendors when it was coming out, sounds like it’s going to save the world, right? But it doesn’t. It solves an individual problem and sometimes even creates additional problems. So, there’s all this noise of what were really false hype cycles, right? They weren’t major shifts. Software is the last major shift, right? That was a big deal; that genuinely changed the world and continues to. Software’s eating the world and continues to, but dataware eats the software that’s eating the world. So, it’s a combination of unlearning and making it feel practical in a context that you understand. That’s what I’ve found. But again, I haven’t cracked the code, so I don’t know, maybe we can figure it out together.
Jeff Doolittle 00:04:50 Well then how does dataware relate then to applications maybe in a way that’s different from what’s previously been thought of?
Dan DeMers 00:04:57 Well, yeah. So traditionally, applications are designed to store their own data. And it’s not because someone consciously said that data should belong to an application, right? No one ever decided that and then architected technology to bring that concept to life. It was almost like an accidental design. If you think of the evolution of software, the first computer programs as instructions didn’t necessarily have the context of a memory. They couldn’t remember information, right? So, if the program was terminated and then you ran the program again, it couldn’t remember where it left off. And so, the origin of digital data was really to act as the memory for that program.
Jeff Doolittle 00:05:39 When we talk about kind of the state of applications owning their data, and maybe that wasn’t explicitly sought by teams, but the microservices movement, from what I can recall, has actually explicitly stated that services should own their data. So maybe explore that a little bit, in regards to how does dataware sort of fit in that mindset, and is it completely turning over the tables of that concept?
Dan DeMers 00:06:03 Yeah, I think you have to go back even prior to microservices and prior to service-oriented architecture and all the architectural shifts before that to really get an understanding of the whole thought behind why apps owned data. And you alluded to it, which is that it was never really an intentional design. It was an accidental design. Because the first computer programs, they would store digital data to act as a memory for the program, right? So, in fact, the data was subservient to the program. It was there to meet the needs of the application, right? To remember state and other such things. But then the applications started to get more sophisticated. They went beyond simple state persistence and would have business context, business information, transactions, information about a customer, so on and so forth. But we never really at the time had a need to rethink the ownership of data.
Dan DeMers 00:06:53 So it still continued to live in this paradigm where it’s subservient to the application and then suddenly woke up and realized that that data has value. So we can mine it, but because it’s siloed in these applications, that minimizes my ability to extract value from that data. So that’s when we attempt to bring copies of it together in the form of data marts and data warehouses and all the different variations — data lakes, data virtualizations, all these are trying to solve that same problem, which is data’s everywhere and therefore it’s nowhere. So, I need a consolidated view, whether physically or virtually to be able to get the intelligence out of that. But continuing to try and get a consolidated view while continuing to spin up applications that create more data silos is obviously, you’re chasing your tail. And the shift from software from monolithic to client server to three tier to N-tier to SOA to microservices, there’s a phenomenon there, which is the scope of a piece of software gets smaller over time.
Dan DeMers 00:07:51 And that’s how you achieve scale, because you can’t centralize everything; you need to federate, right? So, it’s that federation. So basically, you have software that’s on a journey where what used to be one application is now a hundred applications, and you can call them microservices that have a defined scope, et cetera, et cetera. But it continues with that model of whatever your scope of software is, whatever the boundary is — in the context of a microservice, the service boundary is also your data boundary — but which microservice owns a customer such that no other context outside of that service would ever need to have any awareness of a customer? Like the whole thought, quite frankly, if you take a step back, is ridiculous. Like how can data be owned by an application? State can be owned by an application, but business information, it just doesn’t make sense.
Dan DeMers 00:08:37 If you were to redraw the entire landscape ignoring all the current constraints and historical constraints, you would never put data inside of the software. It would be a separate and distinct plane that would also need federation similar to software. And that’s really what dataware is doing, is it’s creating almost like the data equivalent of an application network, which is a network of connected services with well-defined contracts, but doing that for data and doing it in a manner that allows the software to interact with that plane. But neither is subservient to the other. They are two separate concepts. You’ve got basically logic and services, and then you have information. And those are two completely different things that obviously interact with each other — and it’s not even just one way. Sometimes the data can interact with the service because for example, I can register a CDC listener on a piece of data and then that can trigger some type of business process, which may invoke a service.
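The point that data can call out to a service, and not only the other way around, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Cinchy or any real change-data-capture product works: the `DataPlane` class, its `register_listener` and `upsert` methods, and the `start_onboarding` handler are all hypothetical names invented for the example, and the "service" is just a function that records which customers it was invoked for.

```python
from collections import defaultdict
from typing import Any, Callable

# All names here are hypothetical; this is not a real product's API.
ChangeEvent = dict[str, Any]
Listener = Callable[[ChangeEvent], None]


class DataPlane:
    """Toy in-memory data layer that notifies listeners on every write."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], dict[str, Any]] = {}
        self._listeners: dict[str, list[Listener]] = defaultdict(list)

    def register_listener(self, table: str, listener: Listener) -> None:
        """Subscribe a callback to every change made to one table."""
        self._listeners[table].append(listener)

    def upsert(self, table: str, key: str, values: dict[str, Any]) -> None:
        """Write a row, then emit a change event with before/after images."""
        before = self._rows.get((table, key))
        after = {**(before or {}), **values}
        self._rows[(table, key)] = after
        event = {"table": table, "key": key, "before": before, "after": after}
        for listener in self._listeners[table]:
            listener(event)


onboarding_started: list[str] = []


def start_onboarding(event: ChangeEvent) -> None:
    # Stand-in for invoking a real service; only reacts to brand-new rows.
    if event["before"] is None:
        onboarding_started.append(event["key"])


plane = DataPlane()
plane.register_listener("customer", start_onboarding)
plane.upsert("customer", "c-1", {"name": "Ada"})    # insert: triggers the process
plane.upsert("customer", "c-1", {"tier": "gold"})   # update: listener ignores it
```

The writer of the customer row never calls the onboarding logic directly. The data layer sits between the two, which is the sense in which neither side is subservient to the other.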
Jeff Doolittle 00:09:31 The sense I have is it’s pretty broad, and I think there’s a few areas that we can tackle here that we’ll get to as the show continues. There’s a lot of things going on in my mind right now, but what I want to lean into here is you mentioned in your bio that I read at the top of the show that in your experience half of all IT resources were wasted on integration. And so, I feel like we’re getting closer to that as you’re describing all of these applications and the data that’s kind of locked in these different silos. And so, share a little bit of your experience about how you saw that waste coming about, and then help explain how dataware has helped resolve that situation.
Dan DeMers 00:10:10 I think back to when I came out of school and I kind of accidentally stumbled into the world of large global financial institutions, and I spent the first 11 years of my career at Citigroup, a big organization that’s been in business for 200 years had 10,000 plus applications and lots of mergers and acquisitions and spent billions of dollars on technology every year, about 30% as change. And me being part of that change team, whether I was enhancing or fixing existing systems or consolidating systems or building net new systems, a little bit of kind of all the above. And so, doing that was an eye-opener because throughout that decade, new technology was coming to market that allowed faster production of business capability, right? With different frameworks, new programming languages, so on and so forth. But in spite of the fact that you could produce functionality faster, projects weren’t really getting delivered faster. You can chunk the projects down and use an agile based delivery, but it just still felt like it was getting slower.
Dan DeMers 00:11:07 And then I had this realization where I could pick up the phone and call any of the thousands of developers and say, what are you doing right now? And chances are they’re writing an API to basically expose data or to access data, or building an ETL, or doing a reconciliation, or implementing some type of after-the-fact workaround, all because the data is all over the place. And that percentage of time, what I now call the integration tax, actually was getting more expensive over time as the software was getting more focused. The evolution from monolithic to microservices wasn’t an overnight thing. It was a gradual journey. More apps, more silos, and those silos need to be destroyed. And the typical approach is to destroy them using integration.
Dan DeMers 00:11:54 But you’re integrating everything to everything over time, and that’s just not sustainable. So that was easily consuming half of the entire change budget of such a large organization. But what was even more interesting is it was getting more expensive as technology advanced. And obviously that doesn’t make any sense. Like imagine if every day you show up to work, your income tax gets a percentage point higher; there’s going to be a point where you stop showing up to work, right? So, something had to give, right? It didn’t immediately hit me; honestly, it took a long time to work back from the symptoms to the underlying root cause. But I’m very confident that dataware is basically the missing thing whose absence caused that, and that it essentially reverses that trend. And there’s an inevitability to it. Meaning just like software, if the person who invented the first computer program was never born, somebody else would’ve written the first computer program. There’s no question that it would’ve happened. It’s kind of like if you ever watched Terminator 2: Judgment Day: you can call it something else, you can delay it, but it’s going to happen. Dataware is inevitable. The only question is when and how.
Jeff Doolittle 00:13:07 I think it was Ada Lovelace wrote the first computer program, if I’m not mistaken. So, integration, obviously as you pointed out, huge expense, complexity on top of complexity. And essentially your claim there is that it’s hearkening to this inevitability that data wants to not be sort of, confined within either microservices.
Dan DeMers 00:13:28 Imprisoned by a software.
Jeff Doolittle 00:13:29 Yeah, it’s interesting too because it triggers a lot of patterns in my mind. Like I know a lot of the DDD patterns relate to trying to figure out how do you bound data within context, but then how do you share the data between contexts? And I’ve seen that get incredibly complex and incredibly challenging as time goes by.
Dan DeMers 00:13:45 You know why? Because that context changes over time. And sometimes you get it wrong, and if the world was just fixed and never changed, then in theory you could design towards that. But it’s dynamic. It changes. The context of today is not the context of tomorrow. And if you tightly couple your data boundaries with your service boundaries, then you’re going to be screwed. And again, just take the example of the customer. Customer is not owned by a single service, right? If I work in an organization that has 10,000 applications, how many do you think need to know something about a customer, something about an employee, something about a product? Probably about 10,000.
Jeff Doolittle 00:14:23 Yeah. And maybe different things that they accrete to that customer that are contextual to maybe one or a few services, but not to all. And yeah. These various sorts of things. Let’s dig into one of the more specific challenges that I imagine listeners might be asking about right now that I know I’m asking is there’s data and there’s data. So, there’s blobs, there are files, there’s relational data stores, there’s document databases, there’s all these different ways of storing and retrieving data. So, how does dataware kind of deal with, I guess the struggle I’m having maybe intellectually here is, it feels like somehow there’d be this monolithic dataware platform to rule them all. And like, do I have to turn all my data into some new format? Is this just another integration that I have to do? Like, how does dataware kind of deal with those kinds of challenges?
Dan DeMers 00:15:12 Right, yeah no that’s a good question. And you have to think of dataware in the same way that you think of software, right? There’s not one piece of software, there’s not one pattern of software. It’s a whole new approach, right? To make machines that can defer their exact functionality to a program that can be written later, right? That’s essentially what a software is. And dataware is that separation of data from the software. And you could implement dataware through a central monolithic platform. You absolutely could. That’s probably not going to take you very far. However, you could also implement dataware as a federated network of information that’s properly governed using even DDD-type principles, right? Where you’re organizing data into domains and those domains are business-aligned. And as your business changes and evolves, you’re adapting your domains accordingly. And does it need to be a central platform? It could be a decentralized platform.
Dan DeMers 00:15:58 So, there’s going to be good ways, there’s going to be bad ways and, there’s going to be an evolution in the ways that dataware comes to life. But dataware is dataware when it is separate and distinct from the software. You also mentioned different formats and protocols and persistence technologies like document versus graph, versus relational versus, you know, columnar versus all these different specialized formats. Put that all loosely in the bucket of data of information, whether it’s structured, unstructured, semi-structured. And again, if it’s separated from an individual piece of software, then you’re applying a dataware-based approach. Like in my mind, a dataware configuration or approach that would fit in a modern enterprise is one that basically draws a line between the software and the data, and the interface is supporting polyglot and multiple formats.
Dan DeMers 00:16:53 And whether I want to interact with something and benefit from the benefits of like a document database to give me a schema flexibility or a graph database where I can use inference or relational database where I want referential integrity and transactions and whatnot. These are just capabilities of whatever I’m using to implement my dataware layer. Whether I built that or whether I bought that or whether I bought a bunch of things and assembled it to create a dataware environment. But again, the core is that it’s separate. The line is redrawn, you’ve got software applications and then you’ve got data, and they’re independent things that interface with each other, but neither is owned by the other. That’s dataware.
Jeff Doolittle 00:17:29 So maybe getting down to brass tacks a little bit, if I want to get started on doing some — I mean, maybe naively somebody might say, okay, fine, I have a Postgres database and my data is separate from my application and heck, I’m
Dan DeMers 00:17:42 Going to one application, but what about a fourth application?
Jeff Doolittle 00:17:46 Okay, so then I just naively give everybody a connection to my Postgres database and say, thumbs up, I have dataware.
Dan DeMers 00:17:52 So, it’s the old shared database pattern? We know that went well, right?
Jeff Doolittle 00:17:55 But, tell us why that’s not dataware.
Dan DeMers 00:17:58 Yeah. And honestly, that’s a fair question, but it’s kind of like if you take let’s use a — let’s switch context for a second and let’s use collaboration technology for documents. So, everyone’s used Google Drive or SharePoint or Box or OneDrive or something that allows us to have a file or a collection of files that I can give access to other parties, we can work together on that. It’s version control. It’s access control. We’re using basically collaboration technology to basically collaborate on files. Well, what’s the difference between that and say a file system — like, why did I need collaboration technology? Why didn’t I just give you access to my file system? Right? And it’s, well, because quite frankly, the file system’s missing collaboration functionality, it wasn’t designed to do that. It’s designed to basically organize information in the context of a computer, right?
Dan DeMers 00:18:38 Not in the context of like the world. So, collaboration technology basically adds in the missing functionality to make that actually viable. Because if you gave everyone access to your file system, trust me, it isn’t going to work, right? And we know that. The same is true with the database. If I give you access to my database, well, who owns the data model, right? You go and you muck with the data model and all of a sudden I have code that was written against that model and it breaks — like, how dare you? So, you start to then want to create silos as a result of that. And whether it’s data model changes, like schema evolution, or if it’s physical resources and whatnot, you run into all these problems. Well, it’s because a database wasn’t designed for collaboration. The intended use of a database, as we know it today, was to meet the needs of a single application.
Dan DeMers 00:19:20 It’s designed to be the servant of an app, and that’s it. End users, business users don’t log into the database. It’s just not designed to do that. However, dataware — and again, there’s different ways that you can go about implementing it — at a conceptual level, it is designed to do that. It is designed to enable collaborative data management, whether it’s two applications, whether it’s two development teams, whether it’s two business teams or whether it’s all those parties, all collaborating where I can own data, you can own data, I can reference your data, but you can evolve your schema independent from mine. I can grant access without you needing to get copies of that. You can interact with it as a human, as a machine, as artificial intelligence. That’s essentially what it’s doing.
Jeff Doolittle 00:20:00 So, let’s talk a little bit about the dynamism that I think I just caught there. You talk about schema evolution. So that would be one of the problems with sharing your database, and there are many. Please, listeners, I’m not proposing that you share your Postgres connection with a bunch of other applications. That’d be really bad. But you talk about dynamism and schema change. So, let’s explore that a little bit. We’ll get into it a little bit later, because there’s got to be some dataware platforms or something like that to resolve these things. Otherwise it sounds like we could just be telling our listeners, well, you just need to do more ETLs and you need to come up with more centralized data stores and you need to come up with these kinds of things. But let’s first talk a little bit about the schema evolution, because obviously that’s a big challenge, especially when you talk about statically typed languages and things like this where maybe they’re expecting the data to be in an exact certain shape, and if it’s not, then they have problems. How does dataware help with some of those kinds of challenges of the dynamic nature of the schema of data over time?
Dan DeMers 00:20:50 Yeah. And that’s where plasticity comes in. So, if you think about how your brain works, right? You learn new information, you make observations. You go to sleep, and your brain, what does it do? It reorganizes, it’s adapting its structure; it’s structural plasticity. And without that capability, you and I both wouldn’t be very smart, right? Like if our brain couldn’t reorganize itself through new experiences, we would know tomorrow what we knew yesterday. And we would have the intellectual capabilities of not even a newborn child, right? Because our model couldn’t change. And what if we limited it so you can extend it but never refactor it? Meaning you can’t evolve it; you can only append to it. Similarly, you’re going to run out of physical space, right? Unless our brains are designed to just continuously expand, but then it will be inefficient.
Dan DeMers 00:21:37 So there’s a reason why human intelligence requires the evolution of structure, the evolution of schema. And that same phenomenon is true in digital systems as well. But in a model where the data is owned by an application, and if you are another application and you’re trying to interface with my data — because I own it if I’m the application — but you’re not talking to the data directly, you’re talking to the code, you now create this data contract, right? Which is: your code needs to be compiled against some type of standard, and if that standard changes, if I rename a column or something and that changes the external service, then your code is going to break in accordance with that. And that makes sense in a world where the data is behind the applications, right? But when data is now front and center and it is existing on a separate plane, that just doesn’t cut it; you can’t have these rigid contracts.
Dan DeMers 00:22:35 You need the ability for one business team to refer to information in another business team. And you need even the structure itself, whether it’s appending or refactoring or deleting and whatnot, to be able to evolve independently without it breaking my data, my data structure, or my application code. And this becomes a complex subject in terms of how one actually goes about implementing plasticity. But where it becomes possible is through that standardization of that data layer, right? The dataware environment is what makes that possible because you’re intercepting all data-related operations through your dataware environment, through your dataware layer.
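One way to picture a layer that intercepts data operations so that a rename does not break a consumer is to give every column an immutable id and treat its name as changeable metadata. The Python sketch below is an assumption-laden toy, not a description of how any dataware product implements plasticity: `PlasticTable`, `add_column`, `rename_column`, and `view` are hypothetical names, and it only covers the simple cases mentioned in the conversation (renaming and appending), not splitting or merging structures.

```python
import itertools
from typing import Any


class PlasticTable:
    """Toy table whose columns have immutable ids and mutable names (hypothetical)."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._name_to_id: dict[str, int] = {}
        self._rows: list[dict[int, Any]] = []

    def add_column(self, name: str) -> int:
        """Append a column and return its permanent id."""
        column_id = next(self._next_id)
        self._name_to_id[name] = column_id
        return column_id

    def rename_column(self, old: str, new: str) -> None:
        """Change the owner-facing name; the id, and stored rows, are untouched."""
        self._name_to_id[new] = self._name_to_id.pop(old)

    def insert(self, **values: Any) -> None:
        """Store a row keyed by column id, using the owner's current names."""
        self._rows.append({self._name_to_id[n]: v for n, v in values.items()})

    def view(self, shape: dict[str, int]) -> list[dict[str, Any]]:
        """Return rows in the consumer's own field names, bound to column ids."""
        return [{field: row.get(cid) for field, cid in shape.items()}
                for row in self._rows]


customers = PlasticTable()
name_id = customers.add_column("name")
customers.insert(name="Ada")

# The consumer binds its expected shape to ids once, not to the owner's names.
consumer_shape = {"name": name_id}

# Later the owner refactors: renames a column and appends a new one.
customers.rename_column("name", "full_name")
customers.add_column("tier")
customers.insert(full_name="Grace", tier="gold")

rows = customers.view(consumer_shape)
```

The consumer keeps reading a field called `name` before and after the owner's change, because its contract is with the column's identity and not with the label the owner currently uses.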
Jeff Doolittle 00:23:20 Okay. So, the dataware is then helping with this sort of, you mentioned plasticity, but schema change over time is maybe another way of looking at it. And I guess the idea to make it concrete is if I have an application and it’s integrated with a dataware platform and there’s a certain shape of data that I’m expecting, and if something changes, the dataware is going to still support me getting the data in the format that I’m used to. Now I might opt in to change over time, but the dataware is somehow going to ensure that I can still receive the data in the format that I expect?
Dan DeMers 00:23:55 Yeah. I can give you a really simple example because again this can be getting into the guts of it, which is good, but if we go back to the file and document collaboration example, I don’t know if you’ve ever noticed this. And like we use Google Docs for document collaboration, although more and more we’re treating documents as data and we can use data collaboration to ultimately render that obsolete. But that’s a whole different conversation. So, Google Docs for a second — or Google Drive, because it’s not just documents, it’s files. If I take up a file and I take it from my local computer and I put it on Google Drive and then I give you access to that, well when I’m putting it on Google Drive, I’m organizing it, right? I’m giving it a name, I’m putting it in a structure.
Dan DeMers 00:24:31 And that structure may be contained in another structure. Like you can have subfolders just like a file system, it kind of feels like it’s organizing files in a file system. But then I give you access and let’s say you bookmark that document. Well, what happens if I go and I rename that document or I move it around, I reorganize the folder. So, I take it out of this folder, put it into the parent folder, rename that folder, and then rename that file. What happens to your bookmark? What do you, what do you actually think happens to that bookmark?
Jeff Doolittle 00:24:56 Well, I’m actually looking at a Google Drive doc right now and it has a really nasty long hash of some sort that I have no idea what it means, but I’m guessing it’s a document-unique identifier. So that way I can reorganize a location of the document without affecting it and you can change the name of it without affecting my ability to access it.
Dan DeMers 00:25:13 That’s it. So that’s a really simple example of applying the concept of plasticity to document collaboration. Now just extend that to data, and there’s more complexities to it than that. That’s very simplistic. But there’s a perfect example of that, right? And without Google Drive being in the middle, that concept wouldn’t have been possible, right? It’s the fact that it’s intercepting, it has awareness of whoever created the file and how they organized it, to assign that GUID, et cetera, or however it’s uniquely identifying it. And it’s separately tracking how that file with an immutable reference is organized. But in theory, I could have that same document in five different locations and not have five separate copies of that, right? Because it can just be a symbolic link. It can be a pointer, but none of that would be possible without the collaboration technology, right?
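The bookmark behavior described here can be sketched as a store that separates a file's identity from where it is filed. This Python example is illustrative only and makes no claim about Google Drive's internals: the `Drive` class and its `upload`, `move`, `link`, and `stored_copies` methods are hypothetical, and the identifier is simply a random UUID.

```python
import uuid


class Drive:
    """Toy file store: identity is an immutable id, location is just metadata."""

    def __init__(self) -> None:
        self._content: dict[str, str] = {}
        self._paths: dict[str, set[str]] = {}

    def upload(self, path: str, content: str) -> str:
        """Store the content once and return its permanent id (the 'bookmark')."""
        file_id = uuid.uuid4().hex
        self._content[file_id] = content
        self._paths[file_id] = {path}
        return file_id

    def move(self, file_id: str, old_path: str, new_path: str) -> None:
        """Rename or re-folder the file; the id and the content do not change."""
        self._paths[file_id].remove(old_path)
        self._paths[file_id].add(new_path)

    def link(self, file_id: str, extra_path: str) -> None:
        # A second location is a pointer to the same id, not a second copy.
        self._paths[file_id].add(extra_path)

    def open(self, file_id: str) -> str:
        return self._content[file_id]

    def stored_copies(self) -> int:
        return len(self._content)


drive = Drive()
bookmark = drive.upload("/reports/draft.txt", "Q3 numbers")

# The owner reorganizes: moves and renames the file, then files it in a second place.
drive.move(bookmark, "/reports/draft.txt", "/final-q3.txt")
drive.link(bookmark, "/shared/finance/q3.txt")

still_readable = drive.open(bookmark)
```

After the move and the extra link, the bookmark still resolves and the store still holds a single copy of the content, which is the property the conversation attributes to having collaboration technology in the middle.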
Dan DeMers 00:26:04 So, that’s what document collaboration did for documents and it’s amazing. No more, oh, my bookmark is broken. Did you move the file? It doesn’t happen anymore, right? It just works. That’s how data should be, if I write code and that code refers to data that is organized in a model and you change that model. Let’s take a simple example where you just append to it or you rename something, and there’s other scenarios where you break things apart or you combine things or you move things from one structure to another. There’s some pretty complex scenarios, but conceptually that’s what it’s doing: it’s working out how to gracefully handle those scenarios and give the other party the experience that they would expect, knowing that you have this unique opportunity to implement plasticity because you are implementing a dataware layer.
Jeff Doolittle 00:26:52 Yeah, I like what you just said there about essentially making it easier for the integrator. Maybe we don’t call them that in this world, but the idea that I’ve well,
Dan DeMers 00:26:59 A collaborator.
Jeff Doolittle 00:27:00 Yeah, the collaborator, right? And I’ve been saying for a while now that a good API is hard for the implementer and easy for the integrator, and that’s another way of saying technical empathy. It sounds like here what we’re doing is we’re saying let’s do the hard work of making it easier for the person who’s working with this data or platform instead of having them have to carry a lot of the burden of a lot of these things around. And we’ll get into some of these other things in a minute, like access controls and managing schema change, and things of this nature. Let’s lean a little bit then into before we, I do want to talk some about security and access control in a little bit, but first, one of the things you mentioned in some of the documentation from some of your websites is this thing called ‘zero copy integration.’
Jeff Doolittle 00:27:39 And that kind of came up there a little bit with like Google Drive. What’s interesting is though, anyone who’s used Google Drive recognizes that you can download the file and bring it to your local and you can print it and change it or these kinds of things. And so, I think there’s probably some interesting challenges there as far as it goes with dataware as well. Especially as we talk about things like security and information control and things of that nature. And then that’s also going to bring in a challenge around things like availability and latency. So, speak to that if you can. Some about how dataware addresses some of these challenges and what zero copy integration maybe means, and maybe what it doesn’t mean.
Dan DeMers 00:28:16 Sure. Yeah. So, zero copy integration is a standard that was actually just recently ratified in Canada not too long ago actually, that is now being taken internationally. And think of that as a design principle that you’re designing to minimize copies. And how are you doing that? You’re using dataware to enable data collaboration. Again, using Google Drive as that simple analogy, it’s very similar. And if I give five collaborators access to that, then it doesn’t mean that they all need five copies. It also doesn’t prevent them, as you say, right? But there’s definitely fewer copies as a result of collaboration than there would be otherwise. So that’s a step in the right direction, because today the world works off of copies. Software and developers are giant data copying engines, right? That’s what we do. And that’s not going to instantly stop.
Dan DeMers 00:29:05 And you have existing copies of existing data inside of existing systems that’s also not going to be untangled anytime soon, right? So, it’s really just changing it such that on a go-forward basis, you’re consciously minimizing copies because every copy is inefficient, every copy is compute, it’s storage, it’s a potential transformation where you need to do a reconciliation. There can be a loss or corruption, there’s a loss of control over that copy. There’s so many bad things about copies that you want to minimize that. And the enablement of a true like puristic world of zero copies, honestly, it’s not going to happen in our lifetime, but I can tell you confidently that a world where you are forced to copy every time you want to do something, as we traditionally are, is also not a world that is going to be sustainable. So, it’s all about the minimization of copies, and you’ll find that over time — this is just a prediction at this point — is there’s going to be innovation in the dataware space that will enable us to get ever and ever closer towards realizing that true zero copy vision of the future.
Jeff Doolittle 00:30:14 Yeah, that’s helpful. So zero copy doesn’t mean there can’t ever be a copy under any circumstance. But it does mean that the goal is to minimize the number of copies.
Dan DeMers 00:30:24 Yeah. And if you read the standard, it talks about that because you have existing systems, you already have existing copies, and no organization has time to re-platform their entire ecosystem. This is not going to happen, right? So, you asked a question earlier that I don’t think we answered, which is, how do you actually do something about this when you already have existing stuff, right? If you’re starting greenfield, then in theory it would be easier, but you’re not, you have existing systems, you’ve got modern SaaS apps, you’ve got hybrid multi-cloud. You’ve got all this complexity already. Well, accept the fact that your existing complexities that are already implemented are already implemented, right? It’s already done. You’ve already eaten that complexity. The opportunity really is to change how you deliver change going forward, such that if I’m going to build five new systems, let’s say over the next year, and all these five systems need to interact on a common concept (maybe they’re adding information related to a customer or something), rather than each of these five having their own slices of this information and then doing integrations between them via APIs, ETLs, and adapting it to application-specific data models that may evolve over time. But then you get into the contract things.
Dan DeMers 00:31:23 Instead, make it so that these five applications can collaborate on that and do it in a way that doesn’t have all the byproducts and downsides of a shared database, right? In other words, proper dataware technology. And now instead of five copies, you can have just the one original copy for those five applications. And that’s a very simple example, but it’s really just changing how you deliver change to use collaboration versus integration. So, if I’m going to create a new PowerPoint presentation rather than creating a local PPT file and then sending you a file attachment over email as I would’ve done pre-document collaboration, I’m going to use some type of collaboration tech and I’m just going to give you access, so that’s what zero copy integration is, is use collaboration as your default approach for implementing digital systems.
Jeff Doolittle 00:32:11 So how does that work when we live in a world of the fallacies of distributed computing? So, the fallacy that the network is available, and that it’s reliable, these kinds of things. Does that prevent us from ever reaching the nirvana of a true zero copy future?
Dan DeMers 00:32:25 Right now, I would say it does. Through innovation over time, maybe we can overcome those barriers and hurdles. I can’t tell you exactly how, but I personally wouldn’t be surprised if future innovations in the dataware space unlock that. But definitely now, like today, you’re going to need to implement caching, you’re going to have to account for network latency. There’s going to be other considerations, especially when you’re dealing with like transactional data and high volumes, like again, I come from a background of financial services. So, if you’re doing like high frequency equity trading where you’re hypersensitive to latency, you’ve got to be aware of that and that needs to be accounted for in your design. However, it’s still good to have collaboration, even if you need, say, local caching. And the local caching has eventual consistency back into the original source and it’s only trusted once it’s committed back, right? So, there’s ways that you can still move toward the minimization of copies and work within the current constraints of technology.
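The local-caching pattern described here (write locally for latency, reconcile back to the original source, and trust a value only once it is committed) can be sketched roughly as follows. The `Source` and `LocalCache` classes are hypothetical names chosen for illustration, and the sketch ignores conflicts and network failures, which a real design would have to handle.

```python
class Source:
    """The original, authoritative copy of the data."""

    def __init__(self):
        self.data = {}

    def commit(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class LocalCache:
    """Latency-friendly local writes that are untrusted until committed back."""

    def __init__(self, source):
        self.source = source
        self.pending = {}  # local writes not yet committed to the source

    def write(self, key, value):
        self.pending[key] = value

    def read(self, key):
        """Return (value, trusted). Pending local values are flagged as untrusted."""
        if key in self.pending:
            return self.pending[key], False
        return self.source.read(key), True

    def sync(self):
        # Eventual consistency: push pending writes back to the source of truth.
        for key, value in self.pending.items():
            self.source.commit(key, value)
        self.pending.clear()


source = Source()
cache = LocalCache(source)
cache.write("price", 101)
before_sync = cache.read("price")  # (101, False): usable locally, not yet trusted
cache.sync()
after_sync = cache.read("price")   # (101, True): now backed by the original source
```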
Jeff Doolittle 00:33:24 Yeah. And then I think about other things like offline type approaches. I mean, Git is a great example of the ability to collaborate in a distributed fashion and then you reconcile after the fact. And then there’s, as we’re talking about Google Drive and Google Docs, conflict-free replicated data types, CRDTs, I’ll put a link in the show notes. Yeah, that’s another one of these mitigating technologies that you could possibly use to handle partially connected types of scenarios. And I imagine, yeah, and I’m seeing you nodding so I’m like okay, it seems like these could be relevant things going forward to be able to help with zero copy integrations.
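For readers unfamiliar with CRDTs, one of the simplest examples is a grow-only counter (often called a G-Counter). The sketch below is a generic textbook-style illustration and is not taken from Google Docs, Git, or any dataware platform: each replica increments only its own slot, and merging takes the per-replica maximum, so replicas that were offline converge no matter the order in which they reconcile.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count contributed by that replica

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Max is commutative, associative, and idempotent, so merge order is irrelevant.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


# Two replicas work while disconnected, then reconcile after the fact.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
```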
Dan DeMers 00:33:54 Yeah, for sure. Because one thing to keep in mind is like we have through my company we have a dataware platform, but again, dataware is not such that you need to use a singular platform. There’s lots of, you can implement your own, you can assemble it using different technologies. But when we’ve designed our platform, we kind of think of it that way, which is, it’s like Git for data, and that includes metadata of course. And not only the ability to have multiple branches and merging and like all the functionalities that you would expect in such a modern tool, but extending that to the world of data. But it gets really interesting when you think of even the time machine aspects of what dataware makes possible. Cause again, by introducing a universal data layer that has awareness of schema evolution and data evolution over time, it also unlocks that potential, right?
Dan DeMers 00:34:42 To creatively use the awareness of the historical evolution of schema such that you can now run queries and pull data from the past in the model of the past. And so, it opens up all these interesting things. So, you start to realize that it opens up the, if I can go back into the past, like in our platform, I can run a query in the past and I can see it in the current data model or in the model that was in place at that time, but I can’t change data in the past. So, we’re starting to think about, well what if you could change data in the past? What does that do? Okay, it spawns a timeline, right? And that timeline, was it always there and now you’re just revealing it, or is it actually creating it? And it kind of gets, some of these more advanced scenarios get pretty damn complicated, but the fact that they’re even possible is exciting, right? It’s now just a matter of time before solving them all.
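To make the "query the past in the current model or in the model of that time" idea concrete, here is a small sketch under strong simplifying assumptions: a single table, an append-only log of changes, and renames as the only kind of schema change. `TemporalTable` and its methods are hypothetical and do not describe Cinchy's implementation. Note that, as in the description above, the past can be read but not changed.

```python
class TemporalTable:
    """Append-only table whose columns have stable IDs, so names can change over time."""

    def __init__(self, columns):
        self._clock = 0
        # History of column naming: list of (time, {column_id: name}).
        self._names = [(0, {i: name for i, name in enumerate(columns)})]
        self._rows = []  # append-only log of (time, key, {column_id: value})

    def _tick(self):
        self._clock += 1
        return self._clock

    def rename(self, old, new):
        names = dict(self._names[-1][1])
        column_id = next(i for i, n in names.items() if n == old)
        names[column_id] = new
        now = self._tick()
        self._names.append((now, names))
        return now

    def upsert(self, key, **values):
        name_to_id = {n: i for i, n in self._names[-1][1].items()}
        now = self._tick()
        self._rows.append((now, key, {name_to_id[n]: v for n, v in values.items()}))
        return now

    def query(self, as_of, model="then"):
        """Data as of a past time, shown in the model of that time or the current one."""
        model_time = as_of if model == "then" else self._clock
        names = [n for ts, n in self._names if ts <= model_time][-1]
        state = {}
        for ts, key, values in self._rows:
            if ts <= as_of:
                state.setdefault(key, {}).update(values)
        return {k: {names[c]: v for c, v in vals.items()} for k, vals in state.items()}


table = TemporalTable(["name", "salary"])
t1 = table.upsert("e1", name="Ada", salary=100)
table.rename("salary", "base_pay")
table.upsert("e1", base_pay=120)

past_in_past_model = table.query(as_of=t1, model="then")
past_in_current_model = table.query(as_of=t1, model="now")
```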
Jeff Doolittle 00:35:28 Yeah, I wonder if I’m the only one now when you say alternate timelines, who’s thinking about like Biff Tannen and Back to the Future and the alternate timeline: we’ve got to get back to the other timeline. Yeah, that’s interesting. So, you mentioned the idea of dataware as a platform, and you just mentioned one aspect and let’s explore some of the other ones. So, there’s a few we’re talking about, I want to talk a bit more about access control and security, but you just mentioned one which is like this dynamic temporality, which I think is something new that hasn’t come up previously in our conversation. What factors generally, I just mentioned a couple, but what characterizes dataware broadly? It’s more than a Postgres database where you share your connection string with the world. We get that. Yeah, it’s not application data locked in silos. It’s not just a bunch of ETLs and transforms. You mentioned metadata. So, can you sort of break down what are the elements of a dataware platform, broadly? You mentioned a couple, but maybe there’s more.
Dan DeMers 00:36:20 Yeah, and one thing to think about there, and I should have said this earlier, is when you think of, for example, that temporal kind of superpower and the ability to have granular controls, which we haven’t talked about, but I’m sure we will. And these are all different capabilities that can be built into a dataware platform or not, right? So, it’s not necessarily mandatory, and there’s going to be different pros and cons of one dataware configuration and architecture and pattern and platform versus another. So that’s one thing to keep in mind, right? However, what dataware has that defines it to be dataware is the fact that it is managing data independent of software. And the enablement of that decoupling is the very definition of what dataware is really doing, right? So, you’ve got software and software then sits atop dataware and dataware provides essentially everything that the software needs in terms of data management: how to access it, how to store it, how to protect it, how to track changes to it. All these things is what it’s providing really as a service to not just one piece of software, but any piece of software.
Dan DeMers 00:37:24 So that’s what dataware is doing. And then there’s basically features of a dataware platform. And that can include, for example, the creation of that time machine. And what’s interesting though is it goes from like in a world where every application is a data platform, it would never be economical for you to build into that data platform for an individual application all of these superpowers, right? Granular data-level, data-driven access controls, schema evolution support, multi-timeline support, and wormhole queries, which are like remove time as a filter. Like you would never be able to do this, right? It just wouldn’t, your simple application that would’ve cost you $10,000 is now going to cost you $10 million, right? You can’t do that. But when you start to concentrate into a common capability that then gets used many times, it gives you that scale.
Dan DeMers 00:38:13 It’s kind of like the power grid. If you think of you’ve got power plants (like nuclear, solar, geothermal), and they all have pros and cons and they all have different formats and protocols, and they’re very complicated things. And then there was a point where we could generate power and there was no power grid. So, what did the power grid do? Well, it basically decoupled the producers of energy from the consumers of energy. That’s basically what it did is I can have solar panels on my roof, I can self-supply, and then if I have surplus, I can feed that back into the grid. And when I’m short, I can draw down from the grid. And when I’m drawing down, maybe I’m grabbing it from the solar panels from someone else who is still under the sun while it’s a cloudy day where I am, right?
Dan DeMers 00:38:49 And I don’t even necessarily need to know, right? Cause it’s all standardized through this. And the power grid provides all these capabilities and it’s still evolving today. Like, today’s power grid is not yesterday’s grid. And tomorrow’s grid will be even smarter, right? It’s, it’s evolving independently from individual power generation, right? But if we identify a new way of generating electricity — maybe we can just harness gravitons and suddenly we can whatever we can in theory just connect that into the grid and I can still plug in my iPhone and charge it, right? It’s that decoupling, that’s magical. And that’s all dataware is doing. It’s the power grid for information management. So, what that means though is that all the different capabilities you have to make sure that it fits your purpose right? If you’re building a dataware platform, you don’t want to over-engineer it, you don’t want to under engineer it, you want it to be fit for purpose. So, you have to actually figure out what requirements you actually have to have a data layer that spans applications, that provides a human interface for regular business users to interact with it. What are the features you actually need? I can tell you the features that I need in my environment, but they’re going to be slightly different than what you might need.
Jeff Doolittle 00:39:55 So in a sense, I guess it sounds like dataware is, it’s like it’s a form of software. I mean somebody’s got to write this software to provide these capabilities, but generally speaking, it seems like what it’s doing is it’s decoupling the data, the data management, the data access controls, and then this temporality, as you said, it sounds like that’s one of those things, it’s like, it sounds pretty cool by the way. I mean, I could try to go back and event source everything from scratch, but good luck. That’s a non-starter because the data’s already shredded into relational tables, but whatever. But the ability to do this temporality, but broadly speaking, it sounds like it’s a shift in: here we’re writing software whose explicit purpose is not to solve this particular business use case. It’s to solve this data collaboration case. And then the business case can be provided by an application on top of that. And one of the challenges is collaboration. Right? And the challenge is, if I’m building a simple application, building a dataware platform is going to be excessive.
Dan DeMers 00:40:52 Yeah. By like a million times. Yes.
Jeff Doolittle 00:40:54 But if I can leverage them, especially in bigger environments. So, let’s talk about that a little bit too. Like there’s a lot of tools and technologies out there to try to simplify the integration burden. And I won’t name any vendors, but listeners might be familiar with companies who basically say, hey, just plug all your data sources into us, and then we’ll let you create these complex workflows that shuttle the data around to all these different places. And dataware seems like a different approach to that. So, how does it differentiate from maybe some of those other more integration-based approaches?
Dan DeMers 00:41:24 Yeah, well I’d say you can kind of draw vendors and technological approaches and whether they’re open-source projects or closed-source or internal proprietary approaches or whatnot into one of two categories. It’s either facilitating better, faster, cheaper integration, or it is enabling the minimization of integration. So, it’s either pro-integration or anti-integration technology. So, what’s kind of interesting, and this causes confusion, is so why would I want to do integration? It’s because I want connectedness and reuse of data. Why would I want to use anti-integration, i.e., collaboration? Well, it’s because I want connectedness of my data. So, the ultimate end goal of having data be organized in a connected way is a universal need, right? Everyone wants their data to be integrated. The question is, do you want to do integration or collaboration? Which is just which path gets you to that end goal of connectedness of data. But I think you can largely put a technology either into its facilitating integration or it’s facilitating the avoidance of integration. And on the surface, some of the promises may sound similar, but as the industry matures, I think you’re going to start to be able to more clearly differentiate those who are in favor versus those who are against integration as the pattern.
Jeff Doolittle 00:42:47 Ok. So, if I’m somebody who’s writing software and I want to explore dataware, I imagine like any other software I have to integrate with, there’s going to be some set of APIs that I’m going to be interfacing with. And then for end users, it sounds like there’s going to be some, I don’t know, ability to maybe explore and see.
Dan DeMers 00:43:06 Yeah. Like the human interface data.
Jeff Doolittle 00:43:07 Yeah. So, share a little bit with our listeners about what is the human interface on top of dataware?
Dan DeMers 00:43:13 Yeah. What’s interesting is the human interface and the machine interface or the application interface or the code interface, whatever term you want to use, they actually share similar characteristics in terms of how they’re powered. And how they’re powered is through metadata. So, if you think of, I don’t know, I’ll use just a relational paradigm just to simplify the conversation. If you have like a table and I design the model of that table, I give it a name and I give it some columns, and these columns have a particular column type and whatnot, well that structural data, which is also available as data itself, that gives you the model, right? The schema. I could generate an end user experience or generate an endpoint, whether it’s a SOAP endpoint or a REST endpoint or expose a view of GraphQL or whatever future standards emerge, it doesn’t matter.
Dan DeMers 00:43:59 And I can have that endpoint, that experience, whether it’s an HTML interface or anything, it doesn’t matter, be adaptive based on the metadata, right? And that’s very simple because it’s just taking the structure but add in the dimensions of the controls, add in the temporal capabilities and all the other considerations. Basically, what you’re doing is you’re harnessing metadata to build hyper-adaptive experiences, whether it’s for humans or for machines, that adapt dynamically to the metadata such that if I go in and I don’t know, do something as simple as rename an attribute of an entity, then the screens should adapt themselves accordingly. And the machine interfaces, which maybe you’re exposing it as JSON over REST, should also adapt accordingly. And if I have plasticity enabled such that I may be a program interacting with the REST endpoint, getting the JSON back, where I assumed a certain model, and you have awareness of who I am where I can honor that and respect that and be able to track and basically prevent you from breaking your code, I could even do the same for a human as well, right?
Dan DeMers 00:45:00 So, I can insulate even humans from the dynamism of schema evolution. So, the mechanics though of how you activate metadata to build these interfaces dynamically is actually quite the same. It’s just what is the actual end experience, right? Is it an HTML interface? Is it a mobile experience? Is it an AR experience, a VR experience, is it a REST experience? Is it, these are all just now experiences. So that’s what you have to think of. Applications are really experiences that will interface with data and add, of course, logic around that. But the experience is still part of the software, right? It’s not part of the dataware. Does that make sense?
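A rough sketch of a metadata-driven machine interface with plasticity might look like the following. It assumes, purely for illustration, that the data layer remembers which schema version each consumer was built against and that renames are the only schema change; the `Entity` class, `register_client`, and `render_json` are hypothetical names, not a real product API.

```python
import json


class Entity:
    """Rows are stored by stable attribute ID; names live in versioned metadata."""

    def __init__(self, attributes):
        # One {attribute_id: name} mapping per schema version.
        self.history = [{i: name for i, name in enumerate(attributes)}]
        self.rows = []             # each row: {attribute_id: value}
        self.client_versions = {}  # client -> schema version it was built against

    def register_client(self, client):
        self.client_versions[client] = len(self.history) - 1

    def insert(self, **values):
        name_to_id = {n: i for i, n in self.history[-1].items()}
        self.rows.append({name_to_id[n]: v for n, v in values.items()})

    def rename(self, old, new):
        names = dict(self.history[-1])
        attribute_id = next(i for i, n in names.items() if n == old)
        names[attribute_id] = new
        self.history.append(names)

    def render_json(self, client=None):
        # Known clients get the model they assumed; everyone else gets the latest.
        version = self.client_versions.get(client, len(self.history) - 1)
        names = self.history[version]
        return json.dumps([{names[i]: v for i, v in row.items()} for row in self.rows])


customers = Entity(["name", "phone"])
customers.register_client("legacy-app")
customers.insert(name="Ada", phone="555-0100")
customers.rename("phone", "mobile")

legacy_view = json.loads(customers.render_json("legacy-app"))
current_view = json.loads(customers.render_json())
```

The same metadata lookup could drive an HTML form or any other experience; only the rendering step would differ.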
Jeff Doolittle 00:45:40 I think so. Let’s talk a bit about access-control management, because I think that’s a significant challenge with a lot of what we’re trying to do with data. And so, you mentioned metadata, which that’s unfortunately it’s a very meta concept, like metadata could be literally anything. But I imagine one aspect of the metadata is how are we doing controlled access to the data, and how does that kind of shape out in this dataware landscape?
Dan DeMers 00:46:04 Yeah. And I think, again, the opportunity of having a standard layer that separates software from data, meaning multiple applications, uniquely opens up the ability to have consistency of controls, right? And the ability to have the controls be enforced in the data itself. If you think of the traditional approach where you have individual apps that each solve different business capabilities and they all have their own local data store and their own local data model, and you’re transforming it from one app to another, where there’s basically separate copies of that, even if it looks a little bit different, it’s a derivative of, therefore it has elements of… The problem with that approach is the controls. And I don’t mean things like authentication or even high-level authorization. I mean like whose salary can I see as a simple example, right? If I have salary data in 50 applications, well whose salary can I see? Imagine I have some level of access to these 50 applications. And some of these could be operational systems, some could be analytical systems, some could be reporting, maybe I can access a Tableau report or a Qlik report or an app or an API that I’ll interface with separate copies of this data. Like, how do I ensure that I can’t see my boss’s salary or I can’t change my own salary? Or if I …
Jeff Doolittle 00:47:17 Well that might be a feature, not a bug.
Dan DeMers 00:47:19 Oh yeah, exactly. So, it’s one of those things that, until you take a step back and realize it’s actually just impossible to have consistency of controls in any organization of any complexity, which is pretty damn scary. And this is someone coming from a background of financial services where if you’re a customer dealing with a bank, know that the bank, not because they’re dumb, not because they’re trying to screw you, has hundreds, probably thousands of copies of your data and they’re trying to control it, but they can’t. It’s like there’s a reason why a bank vault has one door, not a thousand doors, and they’ll just add a new door every time you want to make a deposit or a withdrawal, right? You need to have that ability to have the controls be defined and universally enforced.
Dan DeMers 00:47:59 And again, that separating data from applications where you can have many applications collaborating on data is the opportunity to move the controls from the application code into the data itself. So now that simple salary example is a data policy that says — and different organizations will have different rules, maybe some have an open policy where everyone can see each other’s salary — but imagine a rule that says you can only see the salary of yourself or anyone who works for you either directly or indirectly. And as you move through the organization, maybe you get promoted or demoted or I change departments, et cetera, that’s all adapted, that’s all dynamic. And whose salary can I change? Well, I can’t change my own salary, but I can change the salary of my direct reports. But maybe I can only do that when comp season is open and maybe we do an annual comp review unless there’s an exception process.
Dan DeMers 00:48:40 Like, all of these rules can now be expressed such that they’re applied and enforced in the data such that it doesn’t matter which of the 50 applications I’m interfacing with, the controls are guaranteed to be the same. And if I write a buggy application and the buggy application says, here I’m going to give you this person’s salary that you shouldn’t have because I’m kind of dumb and I didn’t know that you’re not supposed to see that, well it’s not going to work because it’s not running under the application’s credentials, it’s running under your credentials, and you don’t have access to that. Which is a big difference. Instead of apps having service accounts to application-specific databases, right? Where the app code has unconstrained access to all data in that database is it’s all running under the credentials of whoever the ultimate end user is, be that a system or a person.
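The salary rules described above can be sketched as a policy enforced at the data layer and evaluated against the end user's identity rather than an application's service account. Everything in this sketch (the `MANAGER` and `SALARY` tables, the function names, the comp-season flag) is hypothetical and only mirrors the example rules from the conversation.

```python
# Hypothetical org chart (employee -> direct manager) and salary data.
MANAGER = {"ceo": None, "vp": "ceo", "dev": "vp", "analyst": "vp", "intern": "dev"}
SALARY = {"ceo": 500, "vp": 300, "dev": 150, "analyst": 140, "intern": 50}


def manages(manager, employee):
    """True if `manager` is above `employee`, directly or indirectly."""
    boss = MANAGER.get(employee)
    while boss is not None:
        if boss == manager:
            return True
        boss = MANAGER.get(boss)
    return False


def read_salaries(identity):
    """Runs under the end user's identity: only permitted rows are ever returned."""
    return {
        person: amount
        for person, amount in SALARY.items()
        if person == identity or manages(identity, person)
    }


def update_salary(identity, target, amount, comp_season_open):
    """Only a direct manager may change a salary, and only while comp season is open."""
    if MANAGER.get(target) != identity or not comp_season_open:
        raise PermissionError(f"{identity} may not change the salary of {target}")
    SALARY[target] = amount


vp_view = read_salaries("vp")    # self plus direct and indirect reports
dev_view = read_salaries("dev")  # self plus the intern

try:
    update_salary("dev", "dev", 999, comp_season_open=True)
    own_raise_blocked = False
except PermissionError:
    own_raise_blocked = True
```

Because the filter sits below every application, a buggy app that asks for everyone's salary on behalf of `dev` still only receives the two rows `dev` is entitled to.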
Jeff Doolittle 00:49:24 Interesting. So, if I’m understanding that correctly, then the application would always be executing on behalf of the end user and that way the credentials that are passed to the dataware would be the user’s — or I mean it could be a system, but it wouldn’t be the application itself.
Dan DeMers 00:49:39 Yeah. Some type of identity, whether that identity is an artificial human or a genuine human, it is running under the identity, and that identity has credentials and those credentials change over time. And those credentials should be configured by whoever ultimately owns the underlying data that is being protected.
Jeff Doolittle 00:49:54 Sounds like it would be pretty important then to also be able to do a couple of things. One, audit those access controls, and to be able to do that independently, directly with the dataware platform sounds like a pretty important thing. And then also the ability to test and make sure that your access permissions and controls. So maybe speak to that a little bit about how are existing or future dataware platforms going to address those kinds of concerns as well?
Dan DeMers 00:50:16 Yeah. Well, the way that we’ve handled that in ours (and I don’t know, in theory there could be other ways of doing it) is we simply treat the control data, like those grants, as data. And similarly, they’re under the protection of dataware, right? Where it’s all version-controlled, it’s access-controlled. So, who has access to the access data? Yeah.
Jeff Doolittle 00:50:37 Right.
Dan DeMers 00:50:38 And having the granular control over that and the temporal nature and the ability to have the insulation, basically data plasticity and schema plasticity and all these other considerations, adding that to your control data — because at the end of the day, it’s just data, right? — is the ultimate safety net. Because it gets into interesting scenarios that you have to design your policies around. For example, in that salary analogy, if I change departments when I go back into the time machine, can I see the salaries of the people who worked for me in the past? And this is all, what’s interesting is dataware will force you to ask yourself some questions that you’ll need to answer, but you never really even had this question before because you weren’t even able to do these types of things, right? So, it gets quite interesting when you have some more complex scenarios, but it’s powerful because you can choose as the owner of data what you want that experience to be. But I think the simple answer, and I think you’ll find this as a common consideration of any dataware implementation, is that the protections that you’ve put for business data, you’re extending that to all other forms of data about that data. Be it control, be it structure, be it description, be it any other metadata. It’s just data.
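One way to picture "control data is just data" is an append-only, versioned log of grants that is itself protected by a grant. The sketch below is an assumption-laden illustration (the `GrantLog` class, the `"grants"` resource name, and the tuple layout are all invented here), but it shows how auditing and "who could see what, and when" fall out of treating grants like any other versioned data.

```python
class GrantLog:
    """Append-only, versioned access grants; editing grants itself requires a grant."""

    def __init__(self, admin):
        # Entry layout: (version, actor, operation, subject, resource, action).
        self._entries = [(0, "system", "grant", admin, "grants", "write")]

    def allowed(self, who, resource, action, as_of=None):
        """Replay the log (optionally only up to a past version) to evaluate access."""
        active = set()
        for version, _actor, operation, subject, res, act in self._entries:
            if as_of is not None and version > as_of:
                break
            if operation == "grant":
                active.add((subject, res, act))
            else:
                active.discard((subject, res, act))
        return (who, resource, action) in active

    def _append(self, actor, operation, who, resource, action):
        # Who has access to the access data? Only holders of ("grants", "write").
        if not self.allowed(actor, "grants", "write"):
            raise PermissionError(f"{actor} may not modify grants")
        version = len(self._entries)
        self._entries.append((version, actor, operation, who, resource, action))
        return version

    def grant(self, actor, who, resource, action):
        return self._append(actor, "grant", who, resource, action)

    def revoke(self, actor, who, resource, action):
        return self._append(actor, "revoke", who, resource, action)

    def audit(self):
        return list(self._entries)


log = GrantLog("alice")
v1 = log.grant("alice", "bob", "salaries", "read")
v2 = log.revoke("alice", "bob", "salaries", "read")

bob_now = log.allowed("bob", "salaries", "read")
bob_then = log.allowed("bob", "salaries", "read", as_of=v1)
```

Whether a past grant should still apply when looking at past data is exactly the kind of policy question the conversation says the data owner has to decide.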
Jeff Doolittle 00:51:52 So let’s switch gears a little bit. There’s a concept in computer science that’s been around for decades, and this sounds like it’s going to blow it up. So speak a little bit to the idea of encapsulation and information hiding because my challenge is, as I look at this, and maybe it’s still relevant, maybe it still applies, but I’m wrestling a little bit with how real world systems, like we don’t have a conversation by cracking open our skulls and connecting our neurons and our axons and our dendrites; that would be dangerous and gross and painful and all the other things. And so how is dataware not that? And I don’t think it is that, but I mean, I don’t know. Because I mean, in my experience, systems that don’t do a good job at information hiding tend to be incredibly complex and impossible to maintain. And so, help us with the nightmare scenario that people might, like me, be thinking about when we say, oh my gosh, we’re just going to connect everything to everything now.
Dan DeMers 00:52:45 Well actually the analogy that you gave is perfect because you and I have separate brains, and that’s not an accident, that’s an intentional design, right? And there’s the concept of a collective intelligence, which I think for a long time people thought that’s where we were trending towards, right? Where you have basically the central source of all knowledge and everyone can just kind of hook into that. In that type of a model, though, the eventuality is it becomes the Borg, if you ever watch Star Trek, right, where the agents are mindless, they have no autonomy, they have no independence of thought, right? They’re simply agents of the collective, but that’s not how it works in nature. And nature’s amazing at fleshing out the efficient model. And it’s not a collective intelligence. There’s no single central brain. It’s a collaborative intelligence. And collaborative intelligence requires autonomy, right?
Dan DeMers 00:53:33 Coming back to why you and I have separate brains, yet we’re able to collaborate. But you can choose as the owner of the information inside of your mind what information you want to hide versus what information you want to release. You can tell me your deepest darkest secrets or you cannot, right? That’s your choice as an autonomous being. Dataware is essentially embracing that same paradigm and extending that to the world of digital systems, right? Where you can have, whether it’s different business domains, different owners, different individuals, all similarly having that ability to hide information, i.e. manage access controls. But that’s a little bit different than what you were asking, which is the reasons why one would want to encapsulate both logic and data in the traditional world of software where software traditionally owns both the logic as well as the data. I’m thinking as I’m answering your question here, it’s an interesting question actually, but…
Jeff Doolittle 00:54:30 I think you answered part, well, maybe you answered all of it. I mean, generally speaking, the idea of you hearken to collaboration as opposed to centralization. We’re not talking about the one dataware database to rule them all like the Borg.
Dan DeMers 00:54:42 No, of course not.
Jeff Doolittle 00:54:43 No. And as you mentioned, nature’s done a fantastic job of encapsulating things where they need to be. And I guess that brings to the idea that there will be dataware speaking to dataware, I guess is what I’m hearing you say.
Dan DeMers 00:54:55 Oh, of course. You and I are having a conversation right now. And I’m seeing a bunch of pixels on my screen and I’m hearing sound coming out of my speakers, and we can collaborate and we’re using a language called English, and there’s the dataware equivalent in the real world is quite complex. I don’t even really understand it myself. It’s magical. But, and it allows us to have this conversation and not only that, it allows us to even pass information not direct from people to people, but even across masses of people and generations of people, right? Like, you know how to make a fire, but you were not born with that knowledge. How did you know that? No human was born with a knowledge of how to make fire, it’s magic, right? And like how is that possible? Right?
Dan DeMers 00:55:37 One thing that I always refer back to, and it’s almost like I’ve come to accept it just as a design principle is, well how does nature do it? And if you want to know the future of technology, it’s right in front of you. It’s all around you. It’s how do you digitize the real world? And that is the inevitable future of the digital equivalent of that real world, right? And there’s lots of, let’s say, design inspiration to borrow from. And collaborative intelligence and collaborative autonomy, and the concept of dataware is just an example, but it’s a really good example.
Jeff Doolittle 00:56:07 Yeah. It reminds me of something one of my mentors says a lot, which is that features are aspects of integration, not implementation. And what you’re describing here is a lot of potential integration points between dataware platforms of various capabilities and then the features can emerge from those integrations. Just like you mentioned we’re having a conversation here, right? We didn’t evolve specifically to have a podcast. There’s no feature in the human evolution to have a podcast. But what we’re doing is we’re integrating these various things together so that we can create something that didn’t previously exist. Not that no podcast has ever done before, but the concept of that is an integration of different capabilities and then emergent is the feature itself.
Dan DeMers 00:56:48 Yeah. And there’s no central storage of Dan’s information in Dan’s brain and your information in your brain that meets the needs of this specific podcast.
Jeff Doolittle 00:56:57 Right? Are there emerging protocols or things I imagine the ability part of this sounds daunting and as you mentioned like no small startup team should be building — well I don’t, maybe they should — but again, if they’re trying to build a simple application,
Dan DeMers 00:57:10 No they wouldn’t.
Jeff Doolittle 00:57:11 They should not be building a dataware platform. No, but what kinds of like, I don’t know, are there emergent protocols or commonalities that are coming out? Because I imagine there’s going to be competition in this space as well in different ways of doing things. So what’s kind of the landscape in that regard?
Dan DeMers 00:57:26 Yeah, and it’s the early days, for sure. If you just think of software’s been around for a while and it’s continuing to evolve and so dataware it is early days. However, there is dataware platforms, like we have a dataware platform that you can buy and you can use; you can buy other technologies that have similar capabilities and they might work even better for you in different contexts. But yeah, as a startup, if you’re trying to solve a particular — if you’re building an app for that, you don’t want to be building a dataware platform at the same time. So, to your question though, around protocols and standardization and whatnot, so zero copy integration is an example of a standard. Now that standard though is not a protocol, right? It doesn’t describe exactly how to technically implement it. It really describes the framework that one would use to evaluate whether you are adhering to that standard or not, that is agnostic to the technology implementation.
Dan DeMers 00:58:16 So yeah, it is something that I know we’re planning to do through the alliance is to collaboratively create standards in that space. What you are seeing, though, is if you take data mesh as an example, like there’s a lot of hype around data mesh, which is basically borrowing domain-driven design from software architecture and applying it to basically your data analytics infrastructure to avoid the creation of a monolithic data warehouse. And breaking the warehouse into these different data products that are organized into different domains. And you’re seeing that go from a theory to talking about the people and process side of it to now the emergence of technologies that claim to implement this. And again, that’s narrowly focused on the analytics plane, but you’re seeing like real technology bringing some of these principles to life. So, I think the stage that we’re at right now is you’re having individual vendors having their own spin on it. And the problem with that is it doesn’t enable interoperability between dataware environments, right? If you built a data product in a mesh-type context to serve analytics and I have a different dataware platform, my ability to seamlessly interface with yours requires us to do guess what? Integration.
Jeff Doolittle 00:59:26 Yeah, that’s right.
Dan DeMers 00:59:27 Right? So, I’m now integrating my dataware platform to your dataware platform. Now that’s still a much better world than integrating every application to every application. So, it’s a step in the right direction. It’s kind of like the evolution of networks. We didn’t start off and the first network wasn’t the internet, right? The internet is actually a network of networks. The network had to come first. That’s kind of where we are in the world is we have networks, but if you remember the early days, you got token ring and Ethernet and even before that there wasn’t even like, it’s kind of like those early days. And that being said, I can choose to buy an Ethernet or a token ring and maybe I can’t bridge them together, or I can choose to have all my computers be working in isolation and not even have a network, right? That’s not a good choice. So that’s kind of like, I don’t know, does that help?
Jeff Doolittle 01:00:14 No, absolutely. It’s going to be messy is what I’m hearing. But messy doesn’t, that doesn’t mean it’s not the right trajectory.
Dan DeMers 01:00:18 And you can’t sit on the sidelines like it’s not going to work because your competitors who take advantage of this, whether they build or they buy or they do a hybrid or whatnot, they’re going to have a lot less of that integration tax to slow them down. And how are you going to beat your competitor that is able to do things in a fraction of the time? Like it’s not going to work at scale anyways outside of some anomalies. So again, there is an inevitability to it. We’ll all be using dataware if you’re not already starting. But today it’s a way of differentiating and giving one a competitive advantage, but it very quickly pivots to become an existential requirement, right? Like try running a business today without software, whether it’s as a service or not. Just don’t use software, use pencil. Good luck.
Jeff Doolittle 01:01:02 Yeah. Not many businesses are going to be conducive to that anymore. I mean, even you go to the farmer’s market and they all have some payment gateway attached to their phone. Even they’re using it. And I guarantee they’ve got a spreadsheet, some Google Sheet somewhere managing their inventory and their materials and stuff like that. So, yeah, good luck.
Dan DeMers 01:01:20 The software is eating the world. Dataware eats the software.
Jeff Doolittle 01:01:23 Dataware eats the software. Interesting. Well, it sounds like it’s going to be interesting days moving ahead as people start exploring more of dataware and then integrating dataware, and emerging patterns are going to come out of this. And I imagine, as you said, eventually we got to the network of networks and really, frankly, it also, it’s retained some of the warts from the previous and maybe that will be the case here too, but hey, it’s good enough. It’s working. So, we’re running with it, and sounds like a similar thing could happen with dataware.
Dan DeMers 01:01:52 Yeah. And that’s why we created the alliance, the Data Collaboration Alliance, is to, for parties that are interested in learning more about this as well as participating and contributing to the establishment of standards and the early days of the emergence of a dataware ecosystem. But ultimately working backward from that future that’s all standardized, it’s all interoperable and, it’s access not copies based and people have control over their data. That’s why we created that organization, and why we’re working with data privacy experts from across the globe as the initial members. But yeah, this is the kind of thing that’s going to be very, very exciting for some people. Scary for some other people, but for me it’s exciting.
Jeff Doolittle 01:02:29 Do you envision a world where, so for example, we talk about access, not copies — and then of course, what if you can’t access the copy because the network is down to these kinds of things. One of the challenges with these kinds of things too is like man in the middle attacks or bad actors in the system that don’t follow the rules, right? So, I mean, in my ideal scenario, let’s take like my personal healthcare information and a great world would be a future world where I bring that data with me and I own that data. My doctor doesn’t own the data, my insurance company better not own that data. The government better not own that data. Like, I own that data and ideally I bring it with me.
Dan DeMers 01:03:02 Well, owning the data is irrelevant. You mean to have control for that.
Jeff Doolittle 01:03:04 Control over the ownership of the data? That’s right. Yes, exactly. And but now the ability to revoke that control is where I see a challenge here. Maybe you can speak to that a little bit. So, I give my doctor access, I can’t stop them from copying it. And so, how are the conversations shaping up in the dataware space about challenges like this?
Dan DeMers 01:03:20 Yeah, so it’s interesting because even if you use Google Drive as an example, like I can turn on settings that prevent you from downloading copies of that, but there’s going to be ways around that. And quite frankly, if the screen is shown on as pixels, I can take a picture of it.
Jeff Doolittle 01:03:34 Yeah. And then you can OCR with a machine learning AI and then, yeah, there is, again.
Dan DeMers 01:03:37 It gets harder with innovation, right? It doesn’t get easier, it gets harder. And the same is true in the dataware world. So first of all, without that approach, everyone is forced to create copies of that, where those copies, even if they’re not choosing to make a copy because they want a copy, maybe they don’t have mal intent, it creates the byproduct that can be the source of a breach, right? Because the very presence of the copy, even if they don’t want it, is itself giving some risk, right? So, the reality is your doctor probably just wants you to get better right? Probably doesn’t want to steal all of your data. They probably just literally need access to be able to give you the right prescription. And they probably don’t care to see it after that. So, for the most part, like that’s going to dramatically reduce the risk and exposure.
Dan DeMers 01:04:26 But the absolute guarantee and assurance of that, it’s kind of like, even money and intellectual property in humans, like these are all things that have value and therefore we prohibit copies of them. It’s illegal. If I copy money, I can go to jail. But guess what, if I was smart and I did a bunch of research and I decided I didn’t care if I went to jail, I could probably find a way to copy money. But it’s not easy. It’s hard and it gets harder over time, right? And if I copy intellectual property, if I clone humans, right? It’s, these are things that, but the difference here is that these things are already recognized as being of value and respected as such. Whereas data, we say it has value, but traditionally we haven’t respected it as such. We don’t even try to do this, right? So, there’s absolutely a future where the copying of data will be illegal. That’s not anytime soon, but that is guaranteed that’s the future. And does that mean that data will never be copied? Sadly, no. Some people break the law.
Jeff Doolittle 01:05:23 Okay. Yeah. There will always be counterfeiters, but there’s ways that make it more and more challenging over time. Yeah. I still am going to keep…
Dan DeMers 01:05:29 Call the counterfeiter a counterfeiter. Don’t call them a good citizen, if that makes any sense.
Jeff Doolittle 01:05:34 Yeah. Well, and maybe part of the future is where the network itself might need to take on aspects of dataware enforcement and things. And that isn’t to say that somebody couldn’t fudge with the network and mess with that, but you can imagine if you could create a network that you could check and make sure it hadn’t been tampered with, and there’s all kinds of implications for security…
Dan DeMers 01:05:52 Right. So there’s, there’s lots to be invented and innovated on in this space. So, this is just the beginning of the revolution. This is not the end of it. So, more questions than there are answers.
Jeff Doolittle 01:06:04 Yeah. Like maybe it’s not zero copy, maybe it’s few copies. But if those copies are under the control of a system that knows when it must purge, it must rescind, it must whatever. And again, now you’ve passed the buck to some extent, but that may be a way to help mitigate some of these. Well if there’s only one copy literally on a thumb drive plugged into somebody’s MacBook in Uruguay and it’s illegal to copy it, it’s going to be a problem for some use cases. And so, opportunity to innovate and explore and possibly see what might come up there. So, before you wrap up, tell us a little bit about your company Cinchy and kind of how dataware fits with what you guys are doing.
Dan DeMers 01:06:43 Yeah, so we’re all in on dataware. So, what we’re really doing is we’re building a platform that organizations can use to basically bootstrap their dataware transformation and change how they deliver change. So we’ve been working on that for five, six years now and been growing a business and we have some good enterprise customers using it, but we’re also committed to just accelerating that inevitable shift to dataware, which is why we also have the Data Collaboration Alliance that while we started, it’s an open not-for-profit that anyone can join and contribute to, to work collaboratively on standards that, of course, Cinchy as a for-profit company is very committed to adhering to, right? Because we’re trying to create the acceleration of this future, and it’s not going to work if there’s only one dataware platform, right? That’s not the future. But yeah, so we’re used by mostly mid and large enterprise organizations to avoid all of the complexity of having to build data platforms inside of new software as well as make it so that whenever you have to do an integration, you can intercept that work. And we reframe that as a liberation, which is basically don’t integrate it from system A to system B; liberate that data by connecting it into a dataware environment and then from that point forward you can collaborate on that data, so liberate don’t integrate. So, we have a platform that’s pretty powerful. It has some of the capabilities we’ve described, there’s still lots more coming. But yeah, that’s what we do.
Jeff Doolittle 01:08:11 Okay. Well, if listeners want to find out more about what you’re up to, where should they go?
Dan DeMers 01:08:17 Two places. One is Cinchy.com if you want to check out our actual commercial platform. The other is datacollaboration.org if you want to know more about just the concepts behind this and how to enable data collaboration and not just to learn more about it, but we’re looking for contributors as well. So, there’s an open environment, the Collaborative Intelligence Network, you can actually join in, you can interact with dataware, you can use it to basically further the cause. So, depending on your interests, check out one of those two sources.
Jeff Doolittle 01:08:44 Great. Well Dan, thank you so much for joining me today on Software Engineering Radio.
Dan DeMers 01:08:48 Thanks for having me. It was fun.
Jeff Doolittle 01:08:49 This is Jeff Doolittle for Software Engineering Radio. Thanks for listening. [End of Audio]
DeMers has reinvented the relational database and called it Dataware! Maybe he was too young and missed the pitch of using DB2 or Oracle to decouple, centralize, and control data separately from applications. I took a look at Cinchy.com to see if I was missing something. I went through a good chunk of the reference docs. Basically, it is exactly what I imagined – a proprietary DBMS missing a ton of the key features we discovered and added to SQL-based DBMS systems over decades of refinement.
As far as I can tell, the only thing mentioned in the podcast that you can’t do better out of the box with open source PostgreSQL is the automatic row versions. I’m not convinced that is a useful feature, but, if so, there are many ways to build persistent row versioning into a database (I even built a cool one myself with “undo” capabilities back in the early 2000s). Let’s check out some of these ideas:
– Centralized data: The very definition of a database “server”.
– Data decoupled from apps: This was the entire point of “relational” technology and “SQL”.
– Reviewable schema changes: This is why DDL was invented! Isn’t it better to have your schema and schema changes checked into git than locked in a proprietary “platform”?
– Multiple views of the same data for different apps: Even PostgreSQL supports views. Anything much more sophisticated runs the risk of coupling with applications and/or requiring integration.
– Centralized access control: Search for “row-level security” in your favorite DBMS documentation.
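To make two of these points concrete (trigger-based row versioning and a per-app view), here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for a server DBMS only so the example runs anywhere; the table, trigger, and view names are hypothetical, and this is one of many possible approaches, not how Cinchy or any particular product implements versioning.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL
);

-- History table: one row per superseded version of a customer row.
CREATE TABLE customer_history (
    id          INTEGER NOT NULL,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL,
    replaced_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Before each update, copy the old values into the history table.
CREATE TRIGGER customer_versioning
BEFORE UPDATE ON customer
FOR EACH ROW
BEGIN
    INSERT INTO customer_history (id, name, email)
    VALUES (OLD.id, OLD.name, OLD.email);
END;

-- A narrower view for an app that should not see email addresses.
CREATE VIEW customer_public AS
SELECT id, name FROM customer;
""")

conn.execute(
    "INSERT INTO customer (id, name, email) VALUES (1, 'Ada', 'ada@example.com')"
)
conn.execute("UPDATE customer SET email = 'ada@example.org' WHERE id = 1")
conn.commit()

# Current state, prior versions, and the restricted view of the same data.
current = conn.execute("SELECT email FROM customer WHERE id = 1").fetchone()
history = conn.execute("SELECT email FROM customer_history WHERE id = 1").fetchall()
public = conn.execute("SELECT * FROM customer_public").fetchall()
print(current, history, public)
```

An "undo" could in principle be built on the same history table by copying the most recent prior version back into the main table, though that is left out here to keep the sketch short.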
I don’t doubt that the features that allowed Oracle to become a huge company are compelling. But there are important reasons why this strategy hasn’t worked for everything.