SE Radio 665: Malcolm Matalka on Developing in OCaml with Zero Frameworks

Malcolm Matalka, founder of Terrateam, joins host Giovanni Asproni to talk about the reasoning behind choosing a not-so-widespread language (OCaml) and (almost) totally avoiding frameworks for the development of Terrateam. While discussing the reasons for choosing this specific programming language and the advantages and disadvantages of using external frameworks, they also consider a range of related topics, including static vs dynamic typing, the use of monorepos, and the advantages of choosing a single language that can be used both for web front ends and server back ends. The episode ends with lessons learned that can be applied to other contexts and projects.

Brought to you by IEEE Computer Society and IEEE Software magazine.

Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Giovanni Asproni 00:00:18 Welcome to Software Engineering Radio. I’m your host, Giovanni Asproni. Today I’ll be discussing developing in OCaml with zero frameworks with Malcolm Matalka. Malcolm is the co-founder of Terrateam, a CI/CD solution for Terraform and OpenTofu that integrates with GitHub. He has been developing software since 2002, and he has worked for startups and large companies across a range of languages, technologies, and domains. Malcolm, welcome to Software Engineering Radio. Is there anything I missed that you’d like to add?

Malcolm Matalka 00:00:48 Thank you very much for having me. I think you hit all the important parts.

Giovanni Asproni 00:00:53 Okay. Here are some related episodes: 652, Christian Mesh on OpenTofu; 289, James Turnbull on Declarative Programming with Terraform; and 204, Anil Madhavapeddy on the Mirage Cloud Operating System and the OCaml Language. Today we’ll be talking about developing in OCaml with zero frameworks, which is, I would say, quite an unusual approach, at least.

Malcolm Matalka 00:01:17 Yeah, I’d say a little bit, too. On both accounts, actually.

Giovanni Asproni 00:01:21 On both accounts, probably. Both could ignite heated conversations among developers about the choice of language and the use of frameworks, or which frameworks, and all that kind of stuff.

Malcolm Matalka 00:01:33 Yep.

Giovanni Asproni 00:01:34 Okay. Let’s start, then, by giving our listeners a bit of context. Can you give us a brief description of Terrateam, the application you developed in OCaml with a zero-frameworks approach?

Malcolm Matalka 00:01:46 Yeah, absolutely. So Terrateam is what’s called a TACOS in this industry, which is the infrastructure as code industry. That stands for Terraform (or Tofu, as there are two competing implementations at this point) Automation and Collaboration Software. Essentially, it’s a class of software that allows users, customers, or teams to manage their infrastructure using Terraform or OpenTofu collaboratively. In our case, that means we have chosen to build on top of VCS providers such as GitHub, and we’re also working on GitLab integration. So all of your infrastructure management goes through pull requests. A core element of what makes us different from other providers in this space is that we believe you should never have to leave your normal, comfortable development workflow to use the product to complete the task you want to complete. We try to integrate into your existing workflows.

Malcolm Matalka 00:02:50 So for example, if you are creating a Kubernetes cluster, you would probably write some Terraform code to represent that. If you’re using Terrateam, you’d make a pull request, and Terrateam would pick up that event on GitHub and perform what’s called a plan operation, which says: if you want to execute this change, here are the things that I will do to make that happen. Then the rest of your team can review that output and give you approval. Once you have that approval, you can choose to apply that change, which means making that change a reality, and then merge it into your main branch and continue on. With Terrateam, all of that happens inside, in this case, GitHub.

Giovanni Asproni 00:03:31 Okay. What were the main issues driving your decision about the choice of language, and the avoidance of frameworks as well?

Malcolm Matalka 00:03:39 So I’ll say to start out with, I think it’s really important to understand that context matters a lot when you’re making technical decisions. What I mean by that is, I personally believe that at the end of the day, the difference between technical stacks is relatively marginal compared to other decisions. So if you choose Python or OCaml, I don’t think it’s going to make a huge difference at the end of the day. What does matter is whether the people working in that code base enjoy using it and can think in the way that that code base supports. For example, OCaml is a statically typed language, and it’s strongly typed. People might think of Haskell; OCaml is very similar to Haskell, but it’s a little bit more practical, I would say.

Malcolm Matalka 00:04:32 If you looked at files in the two languages side by side, on the surface you would probably say they’re very similar. And for me, I think in a statically typed language. What that means, for example, is that I might choose to represent something simple, like a zip code, with a type. I might create a type that represents a zip code, and then have functions that guarantee that if you give me a string that you say is a zip code, it really is a zip code. Once we have successfully constructed a value of type zip code, I can pass it around my code base, and nobody else has to verify whether it’s actually a zip code or not. It’s no longer a string. It might be a string in the computer, but it’s not a string in the code. And for me, that’s the way I like to think about problems.
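
A minimal OCaml sketch of the zip-code idea described above, assuming US-style five-digit codes. The module `Zip_code` and its functions are hypothetical names for illustration, not Terrateam code, and `String.for_all` needs OCaml 4.13 or later. The signature hides the representation, so the only way to obtain a `Zip_code.t` is through the validating `of_string`.

```ocaml
(* Hypothetical sketch: a ZIP-code type whose only constructor validates
   its input. Assumes US-style five-digit codes. *)
module Zip_code : sig
  type t

  (* Returns [None] unless [s] is exactly five ASCII digits. *)
  val of_string : string -> t option
  val to_string : t -> string
end = struct
  type t = string

  let is_digit c = c >= '0' && c <= '9'

  let of_string s =
    if String.length s = 5 && String.for_all is_digit s then Some s else None

  let to_string t = t
end

(* Code that receives a Zip_code.t never has to re-check the format. *)
let describe (zip : Zip_code.t) = "ZIP " ^ Zip_code.to_string zip

let () =
  match Zip_code.of_string "02139" with
  | Some zip -> print_endline (describe zip)
  | None -> print_endline "not a ZIP code"
```

Because `t` is abstract outside the module, passing a plain string where a `Zip_code.t` is expected is a compile-time error rather than something each caller has to test for.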

Malcolm Matalka 00:05:19 I like to say, how can I take this external input from a user and turn it into a type in the program, and then what operations can I do on that type? That is harder to do in a language like Python, for example, where you don’t really have a compiler enforcing types for you, and you have to write more tests or understand your program at a different level. So at the end of the day, if you are very proficient in Python, great, use Python. If you’re very proficient in something else, like Rust, great, use Rust. For me, I really enjoy writing OCaml, and when I wake up in the morning with a problem to solve and I know I’m going to be able to solve it in OCaml, that is a strong motivator. So to start out with, choose your tech stack based on something you enjoy doing. That’s how we got started with OCaml: I’ve been a long-time OCaml developer. The path I took was starting out with PHP and then moving on to Python, then playing around with C and C++, and eventually making my way to OCaml as I started thinking more about types and how they express my program. I’d been working in OCaml for probably 15 years before we founded Terrateam.

Malcolm Matalka 00:06:39 And part of that was that I have, I guess, either a naivete or an arrogance about how hard things can be. I’ll see a problem that looks hard and think, come on, that can’t be that hard. It’s just software, right? OCaml is kind of like the wild west in that there’s a lot of competition on how to do things, and there are a lot of really smart people. One thing that’s very common there is that there are multiple standard libraries and multiple frameworks for expressing concurrent operations. What I mean by that is you might have to do multiple HTTP requests, and if you are not using some sort of concurrency or parallelism system, you’d have to do one request followed by the other. But if you have some sort of concurrency framework, you can say, do both of these at the same time.
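
A toy model of the difference between sequential and concurrent requests, written as a sketch with no real I/O. Everything here is hypothetical: a task is a function that is polled once per tick until it yields a result, and `fake_request` stands in for an HTTP call that needs a given number of ticks. Real OCaml concurrency libraries (Lwt and Async are two well-known ones) are built around promises rather than this polling scheme.

```ocaml
(* Toy model: a task is polled once per tick; None = still running. *)
type 'a task = unit -> 'a option

(* A pretend HTTP request that completes after [ticks] polls. *)
let fake_request ~ticks result : 'a task =
  let remaining = ref ticks in
  fun () ->
    decr remaining;
    if !remaining <= 0 then Some result else None

(* Drive a task to completion; return its result and the ticks it took. *)
let run (t : 'a task) : 'a * int =
  let rec loop n = match t () with Some v -> (v, n + 1) | None -> loop (n + 1) in
  loop 0

(* Sequential: the second task is not polled until the first is done. *)
let sequential (a : 'a task) (b : 'b task) : ('a * 'b) task =
  let ra = ref None in
  fun () ->
    match !ra with
    | None -> ra := a (); None
    | Some x -> (match b () with Some y -> Some (x, y) | None -> None)

(* Concurrent: both tasks are polled on every tick until both finish. *)
let both (a : 'a task) (b : 'b task) : ('a * 'b) task =
  let ra = ref None and rb = ref None in
  fun () ->
    (match !ra with None -> ra := a () | Some _ -> ());
    (match !rb with None -> rb := b () | Some _ -> ());
    match !ra, !rb with
    | Some x, Some y -> Some (x, y)
    | _ -> None

let () =
  let _, seq_ticks = run (sequential (fake_request ~ticks:3 "a") (fake_request ~ticks:2 "b")) in
  let _, both_ticks = run (both (fake_request ~ticks:3 "a") (fake_request ~ticks:2 "b")) in
  Printf.printf "sequential: %d ticks, both: %d ticks\n" seq_ticks both_ticks
```

With requests of 3 and 2 ticks, `sequential` finishes after 5 ticks and `both` after 3, because `both` lets the two requests make progress during the same ticks.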

Giovanni Asproni 00:07:30 Okay. What about the rest of the team? You really like OCaml, you’re a long-time OCaml programmer. Were you already working with other people in the same space who use OCaml as their main language?

Malcolm Matalka 00:07:45 Not so much. Prior to Terrateam, I was doing OCaml mostly in passion projects, and a lot of the software that we use in Terrateam is stuff that I wrote as part of those passion projects. There’s an event called CUFP, Commercial Users of Functional Programming, which is a yearly event that’s part of a larger functional programming conference. That’s where a lot of the industrial people get together to talk about what they’re doing, and a lot of hobbyists as well. But I had never worked professionally in OCaml before Terrateam.

Giovanni Asproni 00:08:18 Okay. But you found other people who shared the same passion for OCaml.

Malcolm Matalka 00:08:24 Yeah. I would say it’s a small but really good community. I don’t know about you, but for me there’s a sort of inflection point where a community becomes too big and I find it hard to be a member of it. A lot of communities end up with a small core of people who are the thought leaders, and everyone else is just listening, or it’s hard to have a discussion about complicated things. For example, on a lot of subreddits, you go to any post and it already has 2,000 comments, and it’s hard to have a deep, nuanced conversation there, in my opinion. But the OCaml community is small enough that there’s a mailing list and also a Discourse forum.

Malcolm Matalka 00:09:09 And it’s small enough that there’s a lot of really good back-and-forth, with a lot of people discussing how to do things and the future of the project. You can really have a big impact in OCaml because it’s still this core group of people. It’s definitely growing. Actually, this week there was a pretty long Discourse thread comparing ways that Go does things better for a newbie than OCaml does, and what we could consider incorporating into future versions of the language to make it friendlier to beginners.

Giovanni Asproni 00:09:45 Okay. I understand. So if I can summarize: the choice of OCaml was due to the fact that you like OCaml and you’ve been a long-time OCaml user, even if only for passion projects, but also to the community, a small community of passionate and probably clever people that you enjoy interacting with, that is active, and where you can have a voice. So basically you like the technology and the context around it, because, as you said before, otherwise you could have chosen pretty much any other language to solve the same problem. Am I understanding correctly?

Malcolm Matalka 00:10:23 Yeah, well one thing to add to that is I also feel very confident solving problems in OCaml. Like if you give me a problem, I feel very confident that I’ll be able to solve it in a reasonable amount of time.

Giovanni Asproni 00:10:34 Okay. And going back to the zero frameworks approach, what were the issues you were trying to solve with it? Why did you say, okay, I don’t want to use any frameworks for Terrateam?

Malcolm Matalka 00:10:47 So the origin of that is just working with other frameworks in the past. I don’t want to throw it under the bus, but I’ve worked quite a bit with Django, and that’s probably one of the main sources. And for me, I found that a framework like Django, well, let me go back one step and say what a framework is. For the purposes of this conversation, a framework is a library that makes flow control decisions about your code. Rather than you calling the framework, you say, hey framework, when an HTTP request happens, call this thing that I wrote, and then route that to something else, etc. So it’s the one making the control flow decisions. My experience with Django was not great, in that I found I spent a lot of time debugging the framework and fighting the framework rather than solving the problem I had.

Malcolm Matalka 00:11:51 And part of that could have been that I’m just not thinking in the right way, that I’m coming at it from a different angle and there’s an impedance mismatch between me and Django. But that really led me back to what I was talking about before: how hard could this be? I thought, okay, I want to write things my way for the use cases I have. That was the first passion project: writing a lot of backend libraries and our own frameworks for common problems, for example HTTP request routing. So it isn’t that Terrateam doesn’t have frameworks; it’s that we don’t take frameworks in as a dependency. We’re okay taking libraries in as a dependency, because they don’t control the flow of your program. That means swapping out a library for a different library is quite easy, whereas swapping out a framework for a different framework essentially means rewriting your program.
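
To make the library/framework distinction concrete, here is a hypothetical sketch of routing done as a library: the application asks `Router.find` for a handler and decides itself what to do with the answer, including the not-found case. The `Router` module, `handle`, and the routes are illustrative assumptions, not Terrateam’s router.

```ocaml
(* Hypothetical sketch: routing as a library. The application owns the
   control flow; the router only answers lookups and never calls back. *)
type meth = Get | Post

(* A handler maps a request body to a response body. *)
type handler = string -> string

module Router : sig
  type t
  val empty : t
  val add : meth -> string -> handler -> t -> t
  val find : meth -> string -> t -> handler option
end = struct
  type t = ((meth * string) * handler) list

  let empty = []
  let add m path h routes = ((m, path), h) :: routes
  let find m path routes = List.assoc_opt (m, path) routes
end

(* The application decides what happens on a miss, instead of configuring
   a framework hook for it. Returns (status code, body). *)
let handle router m path body =
  match Router.find m path router with
  | Some h -> (200, h body)
  | None -> (404, "not found")

let router =
  Router.empty
  |> Router.add Get "/health" (fun _ -> "ok")
  |> Router.add Post "/echo" (fun body -> body)
```

Replacing this router with a different library would mean rewriting `handle` and the route table, not restructuring the whole program around someone else’s event loop.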

Malcolm Matalka 00:12:50 And there are a few real key benefits there. One is that we are in complete control over the destiny of that software. We’re not caught up in another piece of software’s release cycle, or in their decision to deprecate certain features that we might actually find valuable. Also, our framework only does the things we need it to do, so there’s never community-driven functionality in there that we don’t consume but that may interact negatively with things we’re trying to do. We are a bootstrapped company, we’re pretty lean, and having the ability to choose when we make changes like that to our software is really important. We don’t want to be running up against a long-term support deadline or a deprecation in a framework we depend on, and be forced to do a big rewrite of our software because of someone else’s decision.

Giovanni Asproni 00:13:45 Okay. So when you say zero frameworks, you mean zero frameworks developed outside your team, things you depend upon? Yeah. But within your development environment, you created your own frameworks to solve your own problems. Okay. So basically, zero frameworks means we don’t bring any of those in, but we create the ones we need, which are perfectly suited to what we need because we are making them for that reason.

Malcolm Matalka 00:14:12 Exactly. Exactly.

Giovanni Asproni 00:14:14 Okay. And also, when you say zero frameworks, do you mean absolutely none, nil? Or do you maybe use some, I don’t know, unit testing framework to run your tests, or something else? I mean, is zero an absolute zero, or is it almost there but not…

Malcolm Matalka 00:14:33 I would say 99.9% zero. We do have our own testing harness, and I actually go back and forth on whether we should use one that someone else made or keep maintaining our own. Currently we don’t have a problem with it; it just works, so there’s no real motivation to change. But when it comes to production, the only external framework we use is one that handles HTTP parsing for us and construction of queries. That particular framework was written in a way that makes it quite easy to consume from your own system. So we can integrate it into what we have very easily, and if something better comes along, we’ve designed our integration so that we could swap it out pretty easily.

Giovanni Asproni 00:15:28 Okay. So you wrapped it, you wrapped the framework behind your own implementation, put a facade in front of it. So it doesn’t pollute your own application.

Malcolm Matalka 00:15:36 Yeah. And that’s really because HTTP parsing is complicated enough that we didn’t want to deal with it, to be honest, and other people can definitely do it better. But also, there was an existing solution that we knew we could integrate into our system in a way where we felt confident we could remove it if we wanted to.
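
One way to express the facade discussed here in OCaml is a module signature plus a functor, sketched below under stated assumptions: `HTTP_PARSER`, `Naive_parser`, and `App` are hypothetical names, and `Naive_parser` is only a stand-in for whatever third-party HTTP parser is being wrapped (it parses a request line and nothing else).

```ocaml
(* The application's own view of a parsed request line. *)
type request = { meth : string; target : string; version : string }

(* The only interface application code is allowed to depend on. *)
module type HTTP_PARSER = sig
  val parse_request_line : string -> request option
end

(* Stand-in for the external library; in practice this module would adapt
   the third-party API to HTTP_PARSER in one place. *)
module Naive_parser : HTTP_PARSER = struct
  let parse_request_line line =
    match String.split_on_char ' ' line with
    | [ meth; target; version ] -> Some { meth; target; version }
    | _ -> None
end

(* Application code is written against the signature, not the library. *)
module App (P : HTTP_PARSER) = struct
  let route line =
    match P.parse_request_line line with
    | Some { meth = "GET"; target = "/health"; _ } -> "ok"
    | Some _ -> "not found"
    | None -> "bad request"
end

module Server = App (Naive_parser)
```

Swapping the parser then means writing another module that matches `HTTP_PARSER` and changing the single `App (...)` application.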

Giovanni Asproni 00:15:54 Okay. So zero is not exactly zero.

Malcolm Matalka 00:15:58 Yeah. I mean it’s slightly bigger than that.

Giovanni Asproni 00:16:00 That is sufficiently close to call it zero.

Malcolm Matalka 00:16:05 I think this is an interesting part of growing as a software engineer. When I was younger and even more obstinate, I probably would’ve said, this is the rule, and we follow the rule. But the softening that comes with age and experience has taught me that we have rules of thumb, and they’re great rules of thumb, but context really does matter a lot. In each situation, we have to use our context to decide what the best option is.

Giovanni Asproni 00:16:35 Yeah. I agree with that, and I also know that the choice of framework is sometimes made without thinking deeply enough. Especially when you have several frameworks in a single application, they may actually compete for the control flow of the application. Exactly. Creating all sorts of problems.

Malcolm Matalka 00:16:55 And then you end up saying, oh, well, I’ll spawn this one in one thread and that one in the other and hope I never have to communicate between the two of them, because who knows what’s going to happen there.

Giovanni Asproni 00:17:06 Yeah. So it’s definitely understandable.

Malcolm Matalka 00:17:10 Yeah. So we have both backend frameworks and frontend frameworks. One thing that’s cool about OCaml is that there is a compiler from OCaml to JavaScript, so a lot of our libraries end up being used in both contexts. For example, our API definition is OCaml code, and we just compile it to JavaScript for the front end and to machine code for the back end. So we know both sides are always communicating using the same exact API definition. That’s one example.
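
A hypothetical sketch of a shared API definition: one module holds the request type together with its encoder and decoder, so the sending and receiving sides cannot disagree about the format. The `plan_request` type and the `key=value&...` wire format are illustrative assumptions (values are not escaped, so a `&` or `=` inside `repo` would break it); the conversation does not describe Terrateam’s actual API encoding. Because it uses only the standard library, the same module could in principle be compiled natively and with js_of_ocaml.

```ocaml
(* One definition, used by both the "front end" and the "back end". *)
module Api = struct
  type plan_request = { repo : string; pull_number : int }

  let encode { repo; pull_number } =
    Printf.sprintf "repo=%s&pull_number=%d" repo pull_number

  (* Returns None if a field is missing or pull_number is not an integer. *)
  let decode s =
    let fields =
      String.split_on_char '&' s
      |> List.filter_map (fun kv ->
             match String.index_opt kv '=' with
             | Some i ->
                 Some (String.sub kv 0 i, String.sub kv (i + 1) (String.length kv - i - 1))
             | None -> None)
    in
    match List.assoc_opt "repo" fields, List.assoc_opt "pull_number" fields with
    | Some repo, Some n -> (
        match int_of_string_opt n with
        | Some pull_number -> Some { repo; pull_number }
        | None -> None)
    | _ -> None
end

(* "Front end": builds a request body from the shared type. *)
let client_body = Api.encode { Api.repo = "acme/infra"; pull_number = 42 }

(* "Back end": parses it with the very same definition. *)
let server_sees = Api.decode client_body
```

If a field is added to `plan_request`, both `encode` and `decode` must be updated in the same file, and every caller on either side fails to compile until it is.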

Giovanni Asproni 00:17:44 That is actually interesting, because before, you mentioned choosing OCaml mostly because you liked it, so any other language would’ve worked the same. But if you had chosen, for example, C++, doing the backend would’ve been easy, but for the front end you would probably have had to use something else, maybe JavaScript or TypeScript.

Malcolm Matalka 00:18:04 Yeah. Well, I know you can compile C++ to JavaScript, but you would probably at least have had to put pieces around it in another language to make it integrate. Whereas with OCaml, it’s a direct mapping to JavaScript. There are a few competing options there. We use one called js_of_ocaml, which is meant to be more ergonomic for the OCaml developer. It cares less about JavaScript; it says, we’re just going to take OCaml written the way you want to write it and turn it into JavaScript. Facebook a while ago came up with what’s called ReasonML, which is a dialect of OCaml that interoperates with OCaml but is designed to compile into more human-readable JavaScript. It has more of a direct mapping to JavaScript, but it is all built on top of OCaml and integrates with the rest of OCaml. So Facebook chose the path of, we think it’s important to be closer to JavaScript, whereas the path we’ve chosen is that we want to be closer to OCaml but also be able to use it in these JavaScript contexts.

Giovanni Asproni 00:19:07 Okay. Yeah, I can see the advantages of this approach when you have to communicate between the back end and the front end.

Malcolm Matalka 00:19:15 It’s also, again, good for a lean team. I think there’s a lot of value in not having to context switch between languages.

Giovanni Asproni 00:19:23 Yeah, definitely. Also, a question that just came to mind with this lean team: you do backend and frontend, but you don’t have specialized frontend or backend developers in this context? No. Everybody can do pretty much everything.

Malcolm Matalka 00:19:38 Yeah, our engineering team is me and one other person.

Giovanni Asproni 00:19:40 Yeah. Okay. So it’s extremely lean.

Malcolm Matalka 00:19:45 Yeah, very lean. I will say that our style sheets probably suffer a little bit from being so lean. I don’t necessarily have an eye for a beautiful front end, but it’s functional and it works.

Giovanni Asproni 00:19:56 And using OCaml, what tools are out there that can make your life easier? If you chose Java, for example, the IDE that everybody uses is probably IntelliJ, or maybe not everybody, but most people, with refactoring capabilities and all sorts of nice features to navigate and manipulate code easily. What is the standard in OCaml?

Malcolm Matalka 00:20:17 So for me, I’m on Emacs, but a lot of people use Visual Studio Code, which has really good OCaml support. I think part of that is because Facebook, or Meta now I guess, has been such a large user of OCaml, and they developed a lot of tooling around it. But the big development in the last 10 years has been LSP, the Language Server Protocol. I think that has been fantastic for a lot of people, in the sense that you no longer have to choose a special IDE to get great functionality for a language. I use Emacs, and I have access to all the same LSP functionality that a Visual Studio Code user has. It’s really great.

Giovanni Asproni 00:21:03 So you can do refactorings and navigate your code easily.

Malcolm Matalka 00:21:07 Yeah. I mostly use it for navigation, searching through the code, looking at what a type is. Like I said, OCaml is a strongly typed language, and once you get into that mindset, you encode a lot of information in the types. So if you’re looking at an expression, it’s really valuable to put your cursor over it and see what the resulting type is, because that tells you a whole lot about the context of that code, what’s going on, and how the author planned on using the value they’ve expressed.

Giovanni Asproni 00:21:40 Yeah. Expressing everything in types like this is, I think, called micro types in some circles. Take the zip code example you gave before: you can keep it as a string and check every time that it’s actually a valid zip code, or you can create a small type, pass in the string, build it, check that the format is correct, and then pass that around.

Malcolm Matalka 00:22:04 Yeah, I remember a very small example of a place where I was using Python and I was just like, man, I wish I had a strongly typed language. I forget the actual program we were writing; it doesn’t matter. The point is, we had a function that received something iterable, like a list, as input. A precondition we wanted was that the list needed to have all duplicates removed, so in reality it needed to be a set. It’s kind of an anti-pattern in Python to check what type something is; you really should just ask, can I iterate through it? You know, the duck typing thing. But in this case it was really important that it actually had duplicates removed, and our only option was to check each time. Essentially, we converted the iterable to a set and then used that set going forward. But I was like, why do I have to do that? Other parts of the code could already have made that guarantee for us. It’s a little paper cut, I’ll say, that adds up when you think of all the different places you’re doing things like that, when you could just say it’s a set and have the function consume a set.
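
The OCaml counterpart of that Python example can be sketched with the standard library’s `Set.Make`. The function names are hypothetical; the point is that the “no duplicates” precondition lives in the parameter type, so the compiler enforces it instead of each call re-establishing it.

```ocaml
(* A set of strings from the standard library functor. *)
module String_set = Set.Make (String)

(* Raw input is converted once, at the boundary where it enters. *)
let targets_of_list (xs : string list) : String_set.t = String_set.of_list xs

(* Callers must hand over a String_set.t, so this function can rely on
   there being no duplicates without checking or converting again. *)
let plan_all (targets : String_set.t) : string list =
  String_set.elements targets |> List.map (fun t -> "plan " ^ t)
```

A caller holding a plain `string list` cannot pass it to `plan_all` directly; it has to go through `targets_of_list`, which is exactly where the deduplication happens.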

Giovanni Asproni 00:23:23 Yeah. In terms of frameworks: we said that, apart from the one framework for HTTP parsing, you developed the others, including the unit testing one, as you said. But how far did you go? Do you spend a lot of time developing internal frameworks for your system, or do you need only a few?

Malcolm Matalka 00:23:43 At this point, we don’t spend much time doing it at all. I would say the deepest we’ve gone is we have our own Postgres driver as well.

Giovanni Asproni 00:23:51 Wow. Why did you do that?

Malcolm Matalka 00:23:54 Well, actually, is that the deepest we went? I guess an even lower layer is that we have our own concurrency framework, which is what I mentioned before: the one that says, I want to do two things at the same time, let me express that, and then give me the result. That’s the base level. The next level up for us is that we want to interact with the database, so we implemented our own database driver, and that’s actually broken up into two parts. One is what we call a codec, which parses the bytes. The way that works is that it’s a library: some other piece of code reads bytes and hands them to the codec, and the codec says, okay, you’ve given me enough bytes to consume.

Malcolm Matalka 00:24:38 We call them frames, and here’s the decoded one. Go do what you want with it and keep giving me more bytes. And in the other direction: you’ve given me a frame you want to turn into bytes, here are the bytes for that frame. So that’s a library, and anyone could actually use it if they wanted to build their own Postgres driver. On top of that you have the Postgres protocol, which says, even if we have these frames, what order do they have to happen in? If I send you this frame, what do you have to respond with? That part does integrate into our concurrency framework. So for the listener who’s thinking, oh, this guy did it, that’s great, I can go do it: I want to be really clear that a lot of this was work I was doing on my own before founding the company. We wouldn’t have done this if we’d had to build the frameworks and found the company at the same time.
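
A minimal sketch of a codec layer in the sense described here: it turns bytes into frames and frames into bytes, and performs no I/O. It assumes the PostgreSQL layout for regular messages (a 1-byte type, then a big-endian 32-bit length that counts itself but not the type byte, then the payload) and ignores the untyped startup message. The `frame`, `decoder`, `feed`, and `encode` names are hypothetical, not Terrateam’s API.

```ocaml
type frame = { kind : char; payload : string }

(* Bytes received so far that do not yet form a complete frame. *)
type decoder = { mutable pending : string }

let create () = { pending = "" }

(* Feed newly read bytes; returns every complete frame now available and
   keeps any trailing partial frame for the next call. *)
let feed (d : decoder) (chunk : string) : frame list =
  let data = Bytes.of_string (d.pending ^ chunk) in
  let total = Bytes.length data in
  let rec loop pos acc =
    if total - pos < 5 then (pos, List.rev acc)
    else
      let len = Int32.to_int (Bytes.get_int32_be data (pos + 1)) in
      if len < 4 then failwith "invalid frame length"
      else if total - pos < 1 + len then (pos, List.rev acc)
      else
        let frame =
          { kind = Bytes.get data pos;
            payload = Bytes.sub_string data (pos + 5) (len - 4) }
        in
        loop (pos + 1 + len) (frame :: acc)
  in
  let consumed, frames = loop 0 [] in
  d.pending <- Bytes.sub_string data consumed (total - consumed);
  frames

(* Frame -> bytes: type byte, length (including itself), payload. *)
let encode { kind; payload } : string =
  let n = String.length payload in
  let b = Bytes.create (5 + n) in
  Bytes.set b 0 kind;
  Bytes.set_int32_be b 1 (Int32.of_int (4 + n));
  Bytes.blit_string payload 0 b 5 n;
  Bytes.to_string b
```

In this design the caller owns the socket: it reads whatever bytes arrive, calls `feed`, and gets back zero or more complete frames. Deciding which frame must follow which would be a separate protocol layer on top.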

Giovanni Asproni 00:25:27 Okay, that’s interesting. What would you have done instead?

Malcolm Matalka 00:25:31 I think we might have used some existing frameworks in the OCaml world. I would’ve tried to stay in the OCaml world. If we couldn’t find what we wanted there, which I think is unlikely, we would’ve looked at architectures that might be simpler at the expense of performance; perhaps we could even have used CGI or something like that with OCaml behind it. And then, worst case, we’d look at another language. For me, my next language would probably be Python. Even though I’m not a huge Python fan, I absolutely accept that it has its benefits and its purposes, and I am proficient in it. So that would probably have been the plan B in that case.

Giovanni Asproni 00:26:19 Okay. That ties in with the context you’ve mentioned a few times; in this case, the context of when you founded the company and what you had available.

Malcolm Matalka 00:26:29 Yeah. I’d spent almost 10 years, just for fun, developing these tools on my own to solve problems the way I think they should be solved, and I had been developing a bunch of personal projects with those tools. Some of them I was trying to turn into a company, but the idea just didn’t work out. So once we got to the point of making Terrateam, I felt really confident that what I had written was production ready and we could go with it. I think that has been one way we’ve managed to truly stay lean. There have been a few times where we had bugs in a framework, and because I’ve written pretty much every line, I know exactly where to start looking. Another benefit, which has nothing to do with using your own frameworks or not, is that we are a monorepo company, so we have everything in the same repo, including our own frameworks. It’s nice that if you need to add some sort of debugging statement somewhere really low in the stack, you just do it, recompile, run, and you’re good. So I’m very into monorepos as well.

Giovanni Asproni 00:27:41 Hmm. Okay.

Malcolm Matalka 00:27:42 I think the value there for debugging is just so high. If you’re trying to get your dependency manager to pull in your own custom version of some low-level dependency, you end up going through, in my opinion, a lot more trouble than it’s worth.

Giovanni Asproni 00:27:57 So did you choose the monorepo, along with OCaml and no frameworks, with something in mind in terms of how to manage the system, develop it, and debug it in a way that is easy for you?

Malcolm Matalka 00:28:16 I actually took a lot of inspiration from SQLite. Its author has gone even further than me: he has his own Git replacement for managing SQLite. But they are very much into treating everything as a whole system: you have to think about how all of it interacts. Yes, this one thing might solve your immediate problem now, but how is that going to play out when you get a customer support call and you’re trying to figure out why some weird interaction happened, and you don’t control how those interactions happen? You end up having to learn someone else’s code on the fly rather than having designed it yourself. So if you can be there when the time is right, where you have all this tooling and now you have an idea you can execute on, I think the benefit is being able to understand your whole system, know what each piece is doing, and jump in when you have an issue. Usually when I get a support issue, I know roughly where in the stack that issue is, almost down to the file I need to look at to think about how to address it.

Giovanni Asproni 00:29:28 Yeah, I think some frameworks are actually good at this as well, because they provide hooks for observability purposes that you can tap into. But I don’t think they’re the majority yet. Many of them work in obscure ways when you have defects or issues, and it’s really difficult to figure things out.

Malcolm Matalka 00:29:46 Yeah. And a lot of popular frameworks are popular because they cover a lot of use cases for a lot of people, and that means a lot more code and a lot more customizability, both positive and negative: you have all these hooks to do different things, but you also have to understand how to use all those hooks, and there’s a lot of documentation. So again, that can be the right choice in your context, depending on what you’re doing. But whether you use frameworks or write your own, I think our industry would benefit from thinking about our applications as whole systems, and less about the one ticket or the one component I’m working on, or about just writing the code rather than maintaining it in the future.

Giovanni Asproni 00:30:36 Yeah. With the approach you chose, I’m curious to know what worked well, with a bit of hindsight, if you like. You already said OCaml was the right tool for the job because you knew the language very well; you’ve been working with it for quite some time. You mentioned strong typing was a must anyway, so I would imagine that if you had chosen another language, it would have been something with strong typing, or at least strong enough, like C++, although it’s sometimes difficult to call C++ very strongly typed. You also mentioned that you can compile to JavaScript, which allows you to use one tool for backend and frontend and share lots of the same structures in the communication. Are there any other advantages you found using OCaml? These are already quite a few, but I’m wondering if there is something else.

Malcolm Matalka 00:31:31 This isn’t something I was thinking deeply about going into it, but I will say that the OCaml community is at a size where there are actually a lot of library choices for consuming different integrations or different file formats. One thing you can get stuck on, if you’re going to build a lot of stuff yourself, is integrations: if you’re doing a lot of them, you get stuck on other people’s APIs and having to implement them. Or, if you’re doing data-related things, you have to consume different file formats, especially file formats that aren’t super strict. CSV comes to mind: there’s this concept of a CSV file, but how you escape, say, a comma inside a cell really depends on which tool is being used, or there may be slight variations.
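As an illustration of the CSV escaping problem mentioned above, here is a minimal OCaml sketch of one common convention, the RFC 4180 style, where a field containing a comma, a double quote, or a line break is wrapped in double quotes and embedded quotes are doubled. The function names are hypothetical and the decoder handles a single line only; other tools use different rules (backslash escapes, for instance), which is exactly why real-world CSV is hard to consume.

```ocaml
(* Quote one field, RFC 4180 style: wrap it in double quotes if it contains
   a comma, a double quote, or a line break, and double any embedded quotes. *)
let needs_quoting field =
  List.exists (String.contains field) [ ','; '"'; '\n'; '\r' ]

let quote_field field =
  if needs_quoting field then begin
    let buf = Buffer.create (String.length field + 2) in
    Buffer.add_char buf '"';
    String.iter
      (fun c -> if c = '"' then Buffer.add_string buf "\"\"" else Buffer.add_char buf c)
      field;
    Buffer.add_char buf '"';
    Buffer.contents buf
  end
  else field

(* Join the quoted fields with commas. *)
let encode_row fields = String.concat "," (List.map quote_field fields)

(* Split one line back into fields, undoing the quoting above.
   Quoted fields spanning several lines are not handled in this sketch. *)
let decode_row line =
  let n = String.length line in
  let buf = Buffer.create 16 in
  let fields = ref [] in
  let push () =
    fields := Buffer.contents buf :: !fields;
    Buffer.clear buf
  in
  (* plain i scans an unquoted field starting at index i *)
  let rec plain i =
    if i >= n then push ()
    else
      match line.[i] with
      | ',' -> push (); plain (i + 1)
      | '"' when Buffer.length buf = 0 -> quoted (i + 1)
      | c -> Buffer.add_char buf c; plain (i + 1)
  (* quoted i scans inside a quoted field, where "" stands for one quote *)
  and quoted i =
    if i >= n then push () (* unterminated quote: keep what we have *)
    else if line.[i] = '"' then
      if i + 1 < n && line.[i + 1] = '"' then (Buffer.add_char buf '"'; quoted (i + 2))
      else plain (i + 1)
    else (Buffer.add_char buf line.[i]; quoted (i + 1))
  in
  plain 0;
  List.rev !fields
```

A tool that instead escapes commas with a backslash would produce different bytes for the same row, and this decoder would misread them; that mismatch is the kind of variation a mature library already handles.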

Malcolm Matalka 00:32:41 So for us, since we are happy to consume libraries, the OCaml world is actually big enough that there are a lot of libraries for the different things we want to work with. For example, there are other Postgres libraries, so there is a place where I could go, look at other implementations, and think about how they were doing it, maybe for some interesting parsing I had to figure out on my own. And there’s a pretty good cryptography ecosystem in OCaml, for some reason. I think it’s because a large number of cryptocurrencies are actually developed in OCaml, so the crypto libraries are pretty solid. And one thing we do quite a bit is rely on signed tokens: our API is based on requesting a token that is signed by a server and then passing that token around to make other API requests.
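To make the signed-token idea concrete, here is a minimal sketch of signing and verifying a token with an HMAC, written only against the OCaml standard library. It is an assumption-laden illustration, not Terrateam’s implementation: the payload.mac token format and the function names are hypothetical, and the stdlib Digest module is MD5, so a real service would use a vetted cryptography library and a stronger algorithm instead.

```ocaml
(* Illustration only: HMAC (RFC 2104) over the stdlib Digest module, which
   is MD5. Real code should use a vetted crypto library and a stronger hash. *)
let block_size = 64

let hmac_md5 ~key msg =
  (* Keys longer than one block are hashed first; every key is then
     zero-padded to exactly one block. *)
  let key = if String.length key > block_size then Digest.string key else key in
  let key = key ^ String.make (block_size - String.length key) '\000' in
  let xor_with byte = String.map (fun c -> Char.chr (Char.code c lxor byte)) key in
  let inner = Digest.string (xor_with 0x36 ^ msg) in
  Digest.string (xor_with 0x5c ^ inner)

(* Compare without returning early, so timing does not reveal where the
   first mismatching byte is. *)
let constant_time_equal a b =
  String.length a = String.length b
  && begin
       let diff = ref 0 in
       String.iteri (fun i c -> diff := !diff lor (Char.code c lxor Char.code b.[i])) a;
       !diff = 0
     end

(* Hypothetical token format: payload, a dot, then the hex MAC. *)
let sign ~key payload = payload ^ "." ^ Digest.to_hex (hmac_md5 ~key payload)

(* Returns the payload only if the MAC matches. *)
let verify ~key token =
  match String.rindex_opt token '.' with
  | None -> None
  | Some i ->
      let payload = String.sub token 0 i in
      let mac = String.sub token (i + 1) (String.length token - i - 1) in
      if constant_time_equal mac (Digest.to_hex (hmac_md5 ~key payload)) then Some payload
      else None
```

The design point is that any service holding the key can check a token locally, without a round trip to the issuer, which is what makes passing tokens around between API requests cheap.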

Giovanni Asproni 00:33:36 Yeah, okay.

Malcolm Matalka 00:33:38 But if you are really into a less well-known language, say Icon or something like that, or even SML, which is kind of related to OCaml, I think you will be able to do your own frameworks, but you might struggle when you come into contact with the real world and the APIs or file formats that exist out there, depending on what problem you’re trying to solve.

Giovanni Asproni 00:34:05 And what about the zero-frameworks approach? We mentioned the issues with depending on external frameworks, potentially several of them, each trying to manage the flow of control of your application, maybe in competing ways, but also upgrades and pretty much anything else related to outside dependencies. Are there any other advantages that you found that you were not thinking about when you started with a zero-frameworks approach?

Malcolm Matalka 00:34:36 Coinciding with the monorepo choice, I think the ability to have all of your code close at hand was not something I expected to be such a huge win. I knew it was important for me to be able to control the flow of my programs entirely, but I didn’t necessarily go in with the idea of using a monorepo. Once I started building out some frameworks, I thought, all right, I want to use this for something, so I’m just going to put the application next to the framework, because I know there’s going to be a lot of back and forth there. And as I started iterating on that, I realized, wow, that’s really easy. If I have a bug that I think is framework related, the framework is right next to the application, and I can treat that task exactly the same way I would treat an application task in terms of the code change. That is something I did not expect going into this, but it turns out to be really, really valuable.

Giovanni Asproni 00:35:38 Yeah. What I would also expect is that if the application grows a lot, or for a different kind of application with lots of microservices and whatnot, you would probably need to develop some tools to manage the monorepo itself. You probably wouldn’t like to check out 20 gigabytes of code to change one service.

Malcolm Matalka 00:35:58 Well, to go back to being lean, one thing we do is implement a monolith, a monolith in terms of the binary you get out of it. The code and its architecture are split into components, of course, but everything compiles down into one single binary. The goal of being lean also forces constraints on you to think about the scale at which you want to attack problems. Let me give you an example. Right now, a feature we want is this: when we run people’s Terraform code, it sometimes generates artifacts that you need between a plan step and an apply step. In our case, we run all of the operations on ephemeral compute, so you get a brand-new computer every time you perform an operation, and sometimes there’s a little bit of state you want to keep between them.

Malcolm Matalka 00:37:03 So we are implementing a simple key-value store API that users will be able to upload a small artifact to and then pull it out on the other side. We’re going into this knowing that, one, we’re not going to be S3, and we don’t want to be S3; we don’t need to handle a billion requests per second or whatever they’re doing. And we’re okay making trade-offs for simplicity, for staying small and lean, and for debuggability, that I think you might not be able to make if you want a company or product operating at a different scale.
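A key-value store that deliberately stays small could look something like the following sketch. It is hypothetical: the run_id/name keying, the size cap, and the in-memory Hashtbl storage are assumptions made for illustration, not Terrateam’s actual API, and a real service would persist artifacts somewhere durable. The point is the trade-off described above: rejecting oversized uploads up front keeps the design simple.

```ocaml
(* Hypothetical error cases for the artifact store. *)
type error = Too_large of int | Not_found_key

module Store = struct
  (* In-memory only; keys are (run id, artifact name) pairs. *)
  type t = { max_bytes : int; table : (string * string, string) Hashtbl.t }

  let create ~max_bytes = { max_bytes; table = Hashtbl.create 64 }

  (* The plan step uploads an artifact; anything over the cap is rejected
     instead of being handled with more infrastructure. *)
  let put t ~run_id ~name data =
    let size = String.length data in
    if size > t.max_bytes then Error (Too_large size)
    else (Hashtbl.replace t.table (run_id, name) data; Ok ())

  (* The apply step, running on a fresh machine, pulls the artifact back out. *)
  let get t ~run_id ~name =
    match Hashtbl.find_opt t.table (run_id, name) with
    | Some data -> Ok data
    | None -> Error Not_found_key
end
```

Usage: the plan step calls `Store.put s ~run_id ~name:"plan" bytes`, and the apply step later calls `Store.get s ~run_id ~name:"plan"` from whatever machine it lands on.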

Giovanni Asproni 00:37:46 Yeah. Now another question, again with hindsight: what did not work so well? Are there any aspects of the application for which OCaml was less suitable than other languages?

Malcolm Matalka 00:38:00 The space that we’re in is very Go heavy, which means that there’s a lot of tooling and a lot of libraries written in Go that we can’t easily interact with because we’re using OCaml. So for some functionality, it becomes a question of whether we wrap a Go library in something else that, for example, communicates over standard input and standard output, or whether we implement our own version of that Go library. An example: we have a small feature where we index your Terraform code. Terraform has a concept of modules, where you can say, for this place, use this shared piece of code in another directory or in another repository; you can pull it off the internet as well. For the case where the module is located in the same repository as the code using it, you want it to work such that if you update the module, everyone who uses that module gets a plan and apply operation performed on it.
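The module behavior described above, where updating a module triggers plans for everyone who uses it, amounts to a reverse-dependency walk. The sketch below assumes the local module references have already been resolved to repo-relative directory names; the function name and data layout are hypothetical.

```ocaml
(* [uses] maps each directory to the local module directories it references.
   Given a changed directory, return every directory that must be re-planned:
   the directory itself, its direct users, users of those users, and so on. *)
let affected ~(uses : (string * string list) list) changed =
  (* Directories that directly reference [dir] as a module. *)
  let users_of dir =
    List.filter_map
      (fun (user, modules) -> if List.mem dir modules then Some user else None)
      uses
  in
  (* Worklist traversal; [seen] also protects against reference cycles. *)
  let rec visit seen = function
    | [] -> seen
    | dir :: rest ->
        if List.mem dir seen then visit seen rest
        else visit (dir :: seen) (users_of dir @ rest)
  in
  List.sort compare (visit [] [ changed ])
```

The lists keep the sketch short; for a large repository, a map from module directory to its users would avoid rescanning `uses` at every step.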

Malcolm Matalka 00:39:06 We want to see the actual output there. We have a way to express that manually in our product, but we also wanted a way to do it automatically, so people could be more dynamic. That involves parsing what’s called HCL, or HashiCorp Configuration Language. It’s a pretty simple language, and of course there’s a Go library that parses it, because Terraform and OpenTofu are written in Go, so they have to. It’s just there. In that case, we decided to write our own parser, because we think it is a stepping stone to doing more complicated things with HCL, and we wanted that functionality for ourselves. But there are other operations, such as linting, where we don’t consume the state or plan because we don’t have parsers for those yet. There are parsers for them in Go, of course, and we aren’t sure whether that’s important enough for us to implement in OCaml, or whether we want to wrap it in a Go CLI or integrate it some other way. So in our case, the underlying challenge is that the space is built in a language we’re not using, and there is a little impedance mismatch. Additionally, we are open-source, and being in OCaml doesn’t necessarily attract Go programmers. So depending on how we want to interact with the community, OCaml can be a bit of a stumbling block for getting contributions.
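For a sense of what the HCL work starts from, here is a deliberately tiny OCaml sketch that scans Terraform text for `source = "..."` attributes and keeps only local module paths. It is not an HCL parser and not how Terrateam’s parser works: it is line-based, ignores comments, heredocs, and nested blocks, and the function names are hypothetical.

```ocaml
(* Not an HCL parser: a line-based scan for source attributes, kept local
   only when the path starts with ./ or ../ (registry and git sources are skipped). *)
let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Returns the quoted value if the trimmed line has the shape: source = "value" *)
let source_value line =
  let line = String.trim line in
  match String.index_opt line '=' with
  | None -> None
  | Some eq ->
      let key = String.trim (String.sub line 0 eq) in
      let rhs = String.trim (String.sub line (eq + 1) (String.length line - eq - 1)) in
      let n = String.length rhs in
      if key = "source" && n >= 2 && rhs.[0] = '"' && rhs.[n - 1] = '"' then
        Some (String.sub rhs 1 (n - 2))
      else None

let local_module_sources hcl =
  String.split_on_char '\n' hcl
  |> List.filter_map source_value
  |> List.filter (fun s -> starts_with ~prefix:"./" s || starts_with ~prefix:"../" s)
```

A real parser has to lex strings with escapes and interpolation, comments, and block structure properly, which is why the decision between writing one in OCaml and wrapping the existing Go library is a genuine trade-off.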

Giovanni Asproni 00:40:40 Actually, this is interesting, because assuming that you’ll be wildly successful and the company will grow, I think this will pose interesting problems in terms of hiring, finding developers who want to work in OCaml, just because the community is smaller compared to other language communities.

Malcolm Matalka 00:41:03 Yeah. My opinion there, and this hasn’t been tested yet because we’re only starting to get to the point where we’re going to grow our engineering team, is that if you’re trying to stay relatively small and lean, you’re adding maybe one engineer every year or two. As long as you can attract someone to the company and you’re willing to invest in their long-term success, training them up on all of this is not a big challenge. So we don’t really expect to have to hire people from inside the OCaml world. We’re happy to train them in everything they need to know, especially because we’ve implemented so much of our own code that they would need to be trained in those specifics anyway. If you’re Uber or some other company trying to hit hypergrowth, then with something like OCaml you are definitely going to struggle to find people who can hit the ground running. But as a company that’s willing to invest a lot in the training and long-term success of its employees, I am less concerned about that for our future.

Giovanni Asproni 00:42:15 Yeah. And you might actually be able to grow the business without hiring hundreds of developers anyway.

Malcolm Matalka 00:42:21 Yeah, yeah.

Giovanni Asproni 00:42:22 I saw recently, I think, that at Bluesky the entire engineering department was 15 people running the show. And it’s millions of users, so it’s…

Malcolm Matalka 00:42:35 Yeah. And I don’t remember how big WhatsApp’s team was, but it was double digits, right? And all of that was in Erlang, which is also a relatively small language, on the scale of OCaml.

Giovanni Asproni 00:42:46 So yeah. So maybe a problem or maybe not a problem.

Malcolm Matalka 00:42:50 Yeah. In the journey of deciding to go open-source, we talked to a lot of people, because we were terrified that if we went open-source, all of our revenue would disappear, and what would we do then? I talked to some people in the open-source world, including Adam Jacob of Chef, who is now in the infrastructure world too with System Initiative. Chef actually had a lot of Erlang in it, which again is one of those more esoteric languages without a huge community. I asked him about contributors, and his response was that you might actually get a ton of contributions when you go open-source, because what happened at Chef is that there were all these hobbyist Erlang people who were so happy to see a production piece of software written in their favorite language that they just jumped on it. They wanted to change the world, right? So they wanted to submit as many pull requests as they could to be part of that. They were just super happy to see something out there in their language that the rest of the world was using.

Giovanni Asproni 00:43:57 That’s interesting. It becomes a win-win situation, then, for the company and for the community of people who are interested in these, let’s say, esoteric languages: they don’t find a lot of projects to contribute to, so they jump on what is available.

Malcolm Matalka 00:44:11 And I don’t know if you remember the Python Paradox article from Paul Graham, where he makes this argument about Python at a time when, and this shows how old that article is, Python was kind of a nothing language. He said, look, go find Python programmers, because those are people who are really passionate about technology, they’re probably really into that language, so they’re going to know it really, really well. And because they’re willing to be off in this other space, they’re probably, I forget the exact language he uses, more thoughtful and have thought through a lot of things, because they’re making an explicit decision to go a different path.

Giovanni Asproni 00:44:57 Yeah. What about frameworks? Again, in hindsight, have you found situations where you’d say, oh gosh, we should have used an outside framework here, because we are spending, I don’t know, an inordinate amount of time solving this problem, or for any other reason?

Malcolm Matalka 00:45:14 It’s not such a pain point that I wish we hadn’t done it, but one thing we have is a JSON schema compiler to OCaml: we generate OCaml code from a JSON schema. We use it because we make a lot of GitHub API calls and I didn’t want to have to write the API out for all of those, and GitHub nicely publishes a JSON schema for the entire API, but it’s huge. So I decided to write a JSON schema code generator for that. And that is something where I didn’t learn JSON schema sufficiently before writing it, and the tool I wrote is just kind of a mess. So when I have to go and fix a bug, I’m sort of dreading coming across that piece of code.

Malcolm Matalka 00:46:10 And there are a few things I know I want to do in there to make life a little bit easier, and I just don’t want to do them, because that code is so messy and gnarly, and it works well enough that I’m like, don’t touch it, I’ll deal with the pain point on the other side, in consuming this library. And we do a lot of JSON schema stuff, just because JSON is, you know, what the internet speaks, and especially when we’re adding new integrations, we try to find a JSON schema if one exists, or we write our own if we can’t find one out there. So this one little tool ends up getting used an inordinate amount, and it has all these little pain points in it.
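To show roughly what a JSON schema code generator does, here is a toy sketch that turns a tiny, hypothetical subset of JSON schema (strings, integers, booleans, arrays, and references to named types) into an OCaml record declaration, with optional properties becoming option fields. Real JSON schema adds oneOf, allOf, additionalProperties, nullable types, and $ref resolution, which is a large part of why generators like the one described above get messy.

```ocaml
(* A hypothetical, very small subset of JSON schema types. *)
type schema =
  | TString
  | TInt
  | TBool
  | TArray of schema
  | TRef of string (* a named type generated elsewhere *)

(* One object property: its name, its schema, and whether it is required. *)
type field = { name : string; ty : schema; required : bool }

(* Translate a schema into the text of an OCaml type expression. *)
let rec type_expr = function
  | TString -> "string"
  | TInt -> "int"
  | TBool -> "bool"
  | TArray s -> type_expr s ^ " list"
  | TRef name -> name

(* Render one record type; properties not marked required become options. *)
let record_decl type_name fields =
  let field_line f =
    let ty = type_expr f.ty in
    Printf.sprintf "  %s : %s;" f.name (if f.required then ty else ty ^ " option")
  in
  Printf.sprintf "type %s = {\n%s\n}" type_name
    (String.concat "\n" (List.map field_line fields))
```

A real generator would also need to rename properties that collide with OCaml keywords (type, method, and so on) and emit the JSON parsing and serialization functions, not just the type declarations.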

Giovanni Asproni 00:46:55 I’ve got the feeling that you’ll have to bite the bullet sooner or later.

Malcolm Matalka 00:46:58 Yeah, I think so, I think so.

Malcolm Matalka 00:47:01 But I’ll say we probably have the most complete GitHub API client. I think it outputs three megabytes of code, and it takes 10 minutes to compile from scratch.

Malcolm Matalka 00:47:14 And that is just for the parsing. We can go both ways. We can parse requests and generate requests as well.

Giovanni Asproni 00:47:20 Ah, okay.

Malcolm Matalka 00:47:21 We also found some interesting places where the GitHub API doesn’t agree with its own JSON schema.

Giovanni Asproni 00:47:28 Well that happens too.

Malcolm Matalka 00:47:32 Yeah, happens to the best of us.

Giovanni Asproni 00:47:34 Yeah, I’ve seen other situations where that was the case.

Giovanni Asproni 00:47:40 In terms of lessons learned, at least so far: by using this approach, are there any lessons that you think can be applied to other contexts and systems? For example, how did avoiding frameworks affect timelines or other considerations in general?

Malcolm Matalka 00:48:07 I think that for timelines, having our own frameworks has actually helped us be more consistent in predicting them, because we know there isn’t this rabbit hole where you try to do something that seems easy and then realize the framework you are using doesn’t really do it. We know what our frameworks can and can’t do, and very rarely do we want to add a feature that ends up becoming a huge modification of a framework. It’s really nice in that way. Again, I really can’t stress this enough to anyone listening: don’t go into this thinking, it worked for this guy, so it will work for me as well. Really be thoughtful about what you want to get out of it. But there are also the usual internet talking points about what’s called NIH syndrome, right?

Malcolm Matalka 00:48:56 Like Not Invented Here. And definitely pay attention to what people are saying in those cases. But I think maybe our industry is a little too focused on the idea that you need to consume dependencies, that somebody else has written it better than you have, and that you’re better off using theirs. I think you have to be more thoughtful there, and you shouldn’t be afraid of building something yourself, as long as you can convince yourself that you’re getting what you want out of doing that work. And look back on it too: if you did implement something and realize it actually wasn’t the right choice, use that information next time. There are so many acronyms in our world, like YAGNI and NIH and all that, but don’t let them make the decision for you. I think we would benefit in general from being more thoughtful about that, because when I talk to other developers and start talking about how I did this myself, did that myself, we have this, we own that, I definitely get this resistance. Look at it really as the whole system. You really have to think about it as the whole system, and there are definitely gains to be made when we start doing more system-level thinking in this industry.

Giovanni Asproni 00:50:08 Yeah, and I also think that when you work on the system, you can still put safety nets in place. Like when you said you use an HTTP parsing library but put a facade on it so it doesn’t pollute your system: if that library is no longer maintained, or you want to use something better, you just need to change the library and its connection to the facade, and the rest of the application will be unaffected. So we can still protect ourselves against some choices, or make new choices later.
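The facade idea can be expressed in OCaml with a module signature: application code is written against the signature (here via a functor), so the parsing library behind it can be replaced without touching the rest of the system. The signature, the toy parser that reads only the request line, and the router are all hypothetical examples, not code from Terrateam.

```ocaml
(* The facade: the only view of HTTP parsing the application is allowed to see. *)
module type HTTP_REQUEST_PARSER = sig
  type request
  val parse : string -> (request, string) result
  val meth : request -> string
  val path : request -> string
end

(* Toy implementation that reads only the request line, e.g. GET /health HTTP/1.1.
   In practice this module would wrap a third-party parser. *)
module Toy_parser : HTTP_REQUEST_PARSER = struct
  type request = { meth : string; path : string }

  let parse raw =
    let first_line =
      match String.index_opt raw '\n' with
      | Some i -> String.trim (String.sub raw 0 i)
      | None -> String.trim raw
    in
    match String.split_on_char ' ' first_line with
    | [ meth; path; _version ] -> Ok { meth; path }
    | _ -> Error ("malformed request line: " ^ first_line)

  let meth r = r.meth
  let path r = r.path
end

(* Application code depends on the signature, never on a concrete library. *)
module Router (P : HTTP_REQUEST_PARSER) = struct
  let route raw =
    match P.parse raw with
    | Error e -> "400 " ^ e
    | Ok req -> (
        match (P.meth req, P.path req) with
        | "GET", "/health" -> "200 ok"
        | _ -> "404 not found")
end

(* Swapping libraries means writing a new module with the same signature
   and changing only this line. *)
module App = Router (Toy_parser)
```

Because `Toy_parser` is sealed with the signature, its `request` type is abstract, so no application code can accidentally depend on the library’s internal representation.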

Malcolm Matalka 00:50:41 Yeah. I sort of pooh-poohed Django earlier, but if Django is the right choice for you, that’s fine. We’re certainly an extreme example of the degree to which we own all of our own technology, but that doesn’t have to be true for everyone. There could be certain things where it’s really important that you own a piece, while how HTTP requests are routed is not important to you; you just need it to work. And that’s great.

Giovanni Asproni 00:51:11 And I guess this applies to pretty much everything, so even choosing the language, choosing the framework, anything. So look at the context where you are and take the decision based on that, I guess.

Malcolm Matalka 00:51:23 Yeah, that’s something I’ve definitely learned in my career. I used to be very opinionated: oh, you should use this because that’s no good, or choose this because it’s way better. Like I said, I’ve softened up a lot, and now I think you should use what you or your team are most effective in and also enjoy using day in and day out, even if, by all the technical metrics, it is a less optimal choice. It might be the more optimal choice for social reasons.

Giovanni Asproni 00:52:00 Yeah. This is interesting, because it’s one of those thoughts that come to your head after a few years of experience. I think each of us, at the beginning, made decisions based on the perfect language or the obvious solution, without thinking about any social aspects at all, focusing on the tool, maybe on its nominal performance. I worked a lot in C++ in the past, and many people wanted to be close to the metal to go fast without even knowing whether they needed to go that fast, or whether they were able to go fast in the first place. C++ is fast, but…

Malcolm Matalka 00:52:42 Yeah. Yeah.

Giovanni Asproni 00:52:43 So sometimes we need to broaden the context and understand a bit more than the technical aspects.

Malcolm Matalka 00:52:50 Yeah. And I think that, just because we’re people as well, there are always going to be fads in languages. I mean, there’s a huge push to rewrite everything in Rust, right? There are a lot of blog posts around that, and there are a lot of good reasons to do that and a lot of not-so-good reasons. So going against the grain on decisions is not inherently a sign that you’re not making the right choice.

Giovanni Asproni 00:53:17 Yeah. I agree. Well, thank you. I think you gave us a lot of food for thought today. So is there anything else that we missed that you’d like to add?

Malcolm Matalka 00:53:32 Like I said, we’re open-source, and since we’re a monorepo, you can go and look at everything I’ve talked about. If you go to our website and use the link to GitHub there, you can see all of it. And I will say I’m so proud of the code in there, but I’m also realistic that there are a lot of sharp edges. There’s a lot of stuff in there that works because I know how it works, and it works for me. Somebody coming in might say, well, that’s a strange decision. And I do know there are places where it breaks down, where you’ve got to be a little careful using it. So I’m not going to say it’s production ready for other people, but it’s definitely production ready for us.

Giovanni Asproni 00:54:13 Yeah. Well, nobody will be able to criticize your code for the JSON parts, because you already said they’re problematic. But the reality is that in any software system, you end up with parts of the code that are not the best.

Malcolm Matalka 00:54:34 Yeah, yeah.

Giovanni Asproni 00:54:35 Probably, right, for a variety of reasons.

Malcolm Matalka 00:54:37 Yeah, I mean, going back to the experience breeds wisdom idea is I remember being young and looking at other people’s code and being very critical of it. And now that I’m there, I’m like, all right, well I know why those decisions were made and sometimes those are the right decisions.

Giovanni Asproni 00:54:54 I once had to buy beers for my colleagues after criticizing some code I saw. I was really upset with them, only to find out that it was code I had written a few months earlier.

Malcolm Matalka 00:55:02 So

Giovanni Asproni 00:55:06 I think many of us make this mistake.

Malcolm Matalka 00:55:10 Yeah.

Giovanni Asproni 00:55:13 Okay. Thank you very much Malcolm, for coming to the show and it’s been a pleasure.

Malcolm Matalka 00:55:18 Thank you very much.

Giovanni Asproni 00:55:19 This is Giovanni Asproni for Software Engineering Radio. Thank you for listening.

[End of Audio]
