SE Radio 588: José Valim on Elixir, Machine Learning, and Livebook

José Valim, creator of the Elixir programming language, Chief Adoption Officer at Dashbit, and author of three programming books, speaks with SE Radio host Gavin Henry about what Elixir is today, what Livebook is, the five spearheads of the new machine learning ecosystem for Elixir, and how they all fit together. Valim describes why he created Elixir, what “the BEAM” is, and how he pitches it to new users. This episode examines things you can do with Livebook and how it is well-aligned with machine learning, as well as why immutability is important and how it works. They take a detailed look at a range of topics, including tensors with Nx, traditional machine learning with Scholar, data munging with Explorer, deep learning and neural networks with Axon, Bumblebee and Hugging Face, and model creation basics. Brought to you by IEEE Computer Society and IEEE Software magazine.


Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Gavin Henry 00:00:18 Welcome to Software Engineering Radio. I’m your host Gavin Henry. And today my guest is José Valim. José Valim is the creator of the Elixir programming language and, I love this title, Chief Adoption Officer at Dashbit, a company that focuses on sustainable open source and continuous adoption to boost the Elixir ecosystem. He’s also a seasoned speaker, author of three programming books, and an active member of the open-source community. José, welcome to Software Engineering Radio. Is there anything I missed that you’d like to add?

José Valim 00:00:51 No, I think that was a good intro and thanks for having me.

Gavin Henry 00:00:55 Thank you, perfect. We’re going to have a chat about what Elixir looks like today and the challenges you faced at the time you decided to create Elixir, what Livebook is, and finally, Elixir joining the machine learning world. So I like to always ask when we’re speaking to someone who’s got major achievements like you, what challenges were you facing at the time when you thought, you know what, I’m going to go and build something versus I’ll try and fix what I’m working with at the moment. What challenges can you remember you were facing when you decided to create Elixir?

José Valim 00:01:28 I’ll answer the second part of the question, which is why I decided to create something instead of fixing it. And I hope I’ll come back to the first part of the question. I like to say that most people, when they have a problem, don’t necessarily think, I’ll create a new programming language and that’s definitely going to solve my problems, right? It’s the regular expression meme, right? You have a problem, you use regular expressions, and now you have two problems. And creating a programming language is definitely going to bring way more problems than two. So when I started, I was interested in learning. I was struggling to write concurrent software in languages like Ruby. And this was what, 13 years ago? 2010, 2009?

José Valim 00:02:17 And then we were starting to hear a lot about concurrency, right? Today you have four cores in your wristwatch. So we were starting to hear a lot about concurrency and people saying concurrency is going to be the future. So I was like, well, if concurrency is going to be the future, I really want to learn and figure out how I write concurrent software. And then I started on this journey of learning, thinking I would learn ideas that I could bring back to Ruby, for example. But throughout this process, I’d say, there were points of no return, because of things that I learned. One was when I learned about functional programming, because a lot of the issues I had with Ruby at the time were, for example, if you had two threads and they were trying to change the same place in memory, that place would get corrupted.

José Valim 00:03:03 And in functional programming we are not changing things in place. We are always transforming them. And that really resonated with me because I thought, well, that’s a better programming model, because I don’t have to track how things change over time and it also solves the concurrency issues. So it was a point where I was like, okay, I think in the future when I write software, I want to write software that is functional or, if not functional, where things are mostly immutable by default, where I am transforming things. I don’t want to mutate things anymore. I may definitely want to mutate things sometimes, but not as the default way of building systems. Yeah. And then the other point was when I learned about the Erlang virtual machine. So going back to 2010, if you look at all the programming languages that were coming out at the time, they were all talking about concurrency.

José Valim 00:03:57 Go had a focus on concurrency. Swift, which came later, had a focus on concurrency. Clojure had a focus on concurrency. So concurrency was certainly a big topic. And the thing about the Erlang virtual machine was not only that they did concurrency, but they did it three decades ago and they also solved the next step, which is, if you’re writing a system where one machine is not enough to power that system, you need to use multiple machines. The next step is that you need to think about distribution and how all those machines are going to work together to tackle a certain problem. And then I was like, well, the virtual machine not only solved concurrency, it solved the next problem, which is distribution. And I fell completely in love with it, and this is what I want to use. So at some point in this journey, just by wanting to learn, I completely changed my career in the sense that this is the software that I want to write, this is how I want to write it. And the more I used the virtual machine, the more I thought that there were new ideas that could be explored. And I thought that the most productive way to start this conversation of how we can expose or use the virtual machine in different ways was by creating a new programming language, and that led us to where we are.

Gavin Henry 00:05:18 Wow. I mean, I suppose it’s a great way to follow something that you get excited about. I was reading an article by Paul Graham the other day, which he just released, called Do Great Work. And I think the main takeaway from that is, follow what’s exciting you because that’s going to keep you driven and keep you interested.

José Valim 00:05:37 Yeah, and I think it ties in with the challenges question because definitely I had a lot of challenges, but I think because I was excited about it, I never really saw them as challenges in the sense of something that can even be crippling or too hard to overcome. I was always excited about this, and there is work to be done. There is always work to be done, so let’s do the work and, one step after the next, we’ll eventually get somewhere. And I always had this idea that it could not work at all, right? So I was not necessarily too worried about failing. I always had the constant thought, well, it can fail. I can spend three years of my life working on this and it’s not going to go anywhere. And I always had in my mind that that’s fine, it’s not a problem. I’ll definitely have learned a ton. I would have enjoyed the whole process, so that’s fine.

Gavin Henry 00:06:38 Yeah, you always learn something. Could you just take a step back and define the Erlang BEAM for us?

José Valim 00:06:45 Oh, so the Erlang virtual machine is, well, the same way that we have the Java virtual machine and the Java programming language, we have the Erlang virtual machine and the Erlang programming language. The Erlang virtual machine and the programming language were created by Ericsson, which is a telecommunications company, and they created them with the specific goal of powering their telecommunication systems. So today, I hear different percentages, but we hear that the majority, more than 66% of the whole 4G traffic in Europe, is going through Erlang machines. I hear people saying it’s 66% of the global traffic and things like that. But the point is Erlang plays a very big role in telecommunication today. And as I was saying, to build those systems they had to tackle a bunch of different problems related to concurrency, distribution, fault tolerance.

José Valim 00:07:46 And that’s one of the other things that made me fall in love with it, because many of the design decisions behind Erlang and the language were all driven from an engineering principle. It’s, well, we have this problem, and how are we going to solve this problem, right? They did not say, look, we want Erlang to be functional, for example, right? They did not start with the features and then derive the language from them. They were like, well, we have those problems. Oh, we need concurrency, we need this idea of supervisors for fault tolerance. And they were building everything from those principles. So one of the reasons why immutability is important from the point of view of Erlang is this: if you have concurrency, if you have many things running at the same time, and those different entities share state because you have something that is mutable, it’s shared.

José Valim 00:08:39 If one of those entities crashes, right, can you guarantee that when it crashed it did not crash in a way that left the state corrupted, so that other things that depend on that same state are now corrupted too? Otherwise they’ll now be depending on this mutable shared thing that can break the system. So they were like, we don’t want to share state, because if we guarantee that everything is isolated, it’s easier to restart things and therefore it’s easier to have a system that can heal itself and be fault tolerant. So, when you are studying Erlang and its wisdom, there are a lot of interesting design decisions and reasons why they were made, which is why I really enjoy working with it.

Gavin Henry 00:09:22 I’ve never actually made the connection between immutable state and the supervisors or crash early and often, because it doesn’t matter if you crash since the data’s not been changed. I’ve just kind of realized that; that’s a really good point. Since you are, I just love saying this, the Chief Adoption Officer, if I was chatting to someone new to Elixir, or you were chatting to someone new, what would be the top three things you’d recommend I tell them to make them get excited and jump in, like your elevator pitch as it were?

José Valim 00:09:56 That’s something that I am currently rethinking, and I guess we’re going to jump into it. So for years, literally I think since 2014 or 2015, so eight, potentially nine years, I had this go-to talk about why Elixir, which is called Idioms for Building Distributed Fault-Tolerant Applications. If you Google that, you’re going to find versions of this talk on YouTube, in which you can see how I aged through time because I’ve given it so many times. Every time I go to a non-Elixir event, that’s the talk I give. And the talk, like the elevator pitch, is mostly, well, it’s the virtual machine, right? It’s the problems about concurrency that we face today, how to write maintainable software. And now I am working on a new talk where Livebook is going to be the core of the talk, and I use that as a driving point to show everything that you can do with Elixir.

José Valim 00:10:57 Because one of the things about my previous talk, and maybe it’s a consequence of the events I was usually invited to, is that it always resonated well with senior developers, right? So this idea of thinking about concurrency, how you can speed up your tools, how you can reason about failures, sometimes it’s only going to make sense once you have, let’s say, the battle scars, right? After you have spent hours overnight not sleeping but trying to understand why a system is failing. So I think that talk can resonate really well after you have some experience and you went through some of the pains and you want to rethink how you design and approach systems. But now, with Elixir and Livebook, we really have a great platform to start exploring and tinkering with ideas, and we’re starting to have visual tools as well that are really accessible. You are in your Livebook.

Gavin Henry 00:12:00 That resonates completely with my journey because I’m senior as well and all the pain points are solved in the way that you just described. How you can run code snippets from the documentation, how you can jump onto IEx, the interactive shell, to just run your own code, or connect to a live deployment of Elixir, connect to the BEAM, and run the same functions that connect to the same database. All these things that you tried to figure out before when you were younger, and it solves all these problems. I think it’s not luck, but it’s all just come together at the right time when everything’s kicking off in that type of ecosystem anyway. When people see Livebook, if they haven’t already, that is just mind-blowing as well. So let’s move on to Livebook. So what is Livebook?

José Valim 00:12:46 So Livebook is a computational notebook platform, and that’s a fancy way of saying, well, think of having your IDE, a web browser, and an interactive shell all integrated into a single place. So you have places where you can write some code and then you can execute it immediately. We call those places cells, so we can execute a code cell and immediately get the result. But because it’s running in the browser, you can actually get your Elixir code plotting a chart or playing some audio and doing all kinds of interesting things we’re going to explore. And at the same time you can write some prose, some text, some Markdown documentation. So it’s this idea of code, documentation, and a rich environment all in one place, and that opens up a lot of possibilities of things that you can do, build, and explore. That’s it in a nutshell.

Gavin Henry 00:13:48 And is that different from something like CodePen or one of these places like the Go Playground, for Go, where you can run specific code? It’s much more than that, isn’t it?

José Valim 00:14:01 Yeah, so for those who are familiar with Jupyter Notebooks, it’s something closer to Jupyter Notebooks, but it’s not only a playground. With a playground, you think, well, it’s a single place, right? And then I change, I evaluate, I get the result. But you’re not really writing a document, you’re not really writing a story, a tutorial, an application. It’s meant to be a scratch pad, it’s something that you’re going to throw away. But Livebook is not necessarily meant to be thrown away. Of course you can explore ideas and discard them, but we use it for documentation, writing applications, and building things that are meant to be long-lived.

Gavin Henry 00:14:40 And it’s a standalone application that you can either connect to or run yourself on your own desktop, or deploy as a normal web app. Is that right?

José Valim 00:14:54 Yeah, exactly. It’s a web app, it’s implemented as a web app, but now we have Livebook Desktop, so we package that and you can just run it on your machine. But at the end of the day it’s a web app, it’s open source. So as with any other web app, there are many different things you could do with it.

Gavin Henry 00:15:11 Cool. Why was it created?

José Valim 00:15:13 Well, so the story of why it was created is related to machine learning. So I’ll try to make it very short, but we started an effort for machine learning, and we can talk about why we started that. The biggest ecosystem for machine learning is the Python community, and we know that Jupyter Notebooks play a very important role in the machine learning community for Python. So at the beginning it was really, well, what if we want to do machine learning in Elixir and Jupyter is important, what if we have the equivalent of Jupyter for Elixir? So we thought that as part of a machine learning journey, this is going to be a required stop, so we have to tackle it. And one of the things that we could have done is this: in Jupyter, which comes from Python, you can actually plug in different languages. But from the beginning I said I wanted to roll our own, and that’s for two reasons. One of the reasons is that Elixir is good for the web, for writing web applications, and a couple of years ago we announced LiveView, which is a way of building rich interactive applications without necessarily having to write JavaScript, but it’s all interactive and reactive and you can build really modern, fancy applications powered by Elixir. And I was like, well, we have the right tooling to build this. This is what Elixir is good at.

Gavin Henry 00:16:35 And LiveView is part of the Phoenix web framework project that is written in Elixir, isn’t it?

José Valim 00:16:40 Yes, correct. So I was like, we have the tooling. And if you looked at how people are trying to improve Jupyter Notebooks, there are a bunch of commercial versions of Jupyter and so on, and they’re trying to add features like collaboration. Collaboration, for example, is something that is very easy to do with Phoenix, right? Because it was designed with this real-time interactive aspect in mind. And I was like, okay, that’s a problem they currently have and it’s a problem that we can solve. And then the other thing was that one of the main issues that people have with Jupyter Notebooks is that they’re not easy to reproduce. There is a whole other conversation that we can have on this, but in a nutshell: imagine that in one cell you say X equals one, and then in the next cell, the next code that you are going to evaluate is X equals X plus one.

José Valim 00:17:37 In a Jupyter Notebook, it means every time you execute the second cell, the value of X is going to increment. Because the way that Jupyter Notebooks work is that you’re working against this state that lives alongside the notebook. And the problem with that is, imagine that you create a recipe, right? And you send it to somebody. If somebody executes it from top to bottom, it’s going to be X equals one and X equals X plus one, right? And then you can say, oh, the result at the end was two. But if you give it to somebody else and they decide to execute the last cell two times, X is going to be three. So it means that every time somebody executes that notebook, they may get a different result. And then we were thinking, well, you know, this is because of mutability. Elixir is immutable, it’s immutable by default.

José Valim 00:18:22 So we should not have those problems, and if we do not have those problems, we can rethink how we are going to approach the notebook. Telling the story now, I think it’s very different compared to Elixir, because with Elixir I was like, I want to explore, I want to learn, I don’t necessarily want to create something new. And with Livebook, from the very beginning it was clear that it was worth creating something new, because the differences are so many that it’s worth exploring that new direction. It’s like we’re going to get a new tree and once we shake it, a bunch of interesting things are going to fall off. And we were excited about exploring that. So that’s kind of how we started with Livebook. And at the beginning I would not have been able to describe 20% of the features that Livebook has today. But that was the idea, right? We knew that if we planted this seed, interesting things were going to happen, and we decided to do it.

Gavin Henry 00:19:21 Just to revisit the mutability thing there: because X was assigned a value at the start of your recipe, you would need to take that and assign it to a new value to change it, because X will always be the same. That’s the point there, isn’t it?

José Valim 00:19:36 So yes and no. That’s something in Elixir that can often be confusing. In Elixir the values are immutable. So if you think of a value, it’s something that is in memory, and the thing that is in memory is immutable, but for variables, you can change where the variable is going to point to. So in Elixir I can say list is equal to a list with 1, 2, 3. And then I can say list is equal to 4, 5, 6. I’m just changing where the variable points, and this is a transformation that only happens in the program, but when it comes to the memory, nothing’s being changed or manipulated in memory. So the difference between Python and Elixir here is that in Python, every time you change X, you can think that each cell gets all the state in the notebook and then it changes these things and it dumps them somewhere.

José Valim 00:20:37 So every time you do X plus one, it goes to this global place to say, hey, what is the value of X? And then it says, oh, it’s two, let’s increment by one, it’s three. And it puts it back again in this global place, whereas in Elixir everything is linear. So if the value of X before you execute a piece of code was one, at the end the value can be two. But if we execute that piece of code again, we don’t get the value from this global place that is shared, we just go back to the value of X before the cell. So it’s like a mathematical formula, right? We have the inputs, we have the outputs, and in this case the input to a cell is not going to change unless you change things before the cell. I don’t know if that makes sense.

Gavin Henry 00:21:23 No that’s perfect because that reinforces the view when you’re debugging immutable code you don’t have to try and figure out what that value changed to because it’s not going to change.

José Valim 00:21:33 And I think that’s a good example, right? If you’re debugging code that is immutable, imagine that you have X equals one and X equals X plus one. If you put a debugger between those two lines in Elixir, the value of X is always going to be one, right? Because even if you change the value of X later, it’s not going to affect the value of X before, right? And I think that’s the difference, right? It’s very linear.
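
The distinction between rebinding a variable and mutating a value can be sketched in a few lines of Elixir. This is a minimal illustration, not Livebook's actual evaluation code: the names `original`, `cell`, and `x_before` are made up for the example, and modeling a notebook cell as a function of the binding that precedes it is an assumption used only to show why re-running a cell gives the same result.

```elixir
# Rebinding: `list` is pointed at a new value; the old value is untouched.
list = [1, 2, 3]
original = list          # a second name for the same immutable value

list = [4, 5, 6]         # rebinding, not mutation

IO.inspect(original)     # still the list 1, 2, 3
IO.inspect(list)         # the list 4, 5, 6

# A cell modeled (hypothetically) as a pure function of the value of x
# that existed before the cell ran.
cell = fn x -> x + 1 end
x_before = 1

first_run = cell.(x_before)
second_run = cell.(x_before)   # re-running starts from the same input

IO.inspect({first_run, second_run})
```

Because the cell only sees the value of `x` from before it, evaluating it twice cannot push the result from two to three, which is the reproducibility problem described for notebooks backed by shared mutable state.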

Gavin Henry 00:22:02 Yeah, I think in some of the examples in books and documentation, that’s when they usually introduce recursion so they can show you how to increment a variable by returning a different value each time. Is that right?

José Valim 00:22:14 So in Elixir, well, there are many reasons why we use recursion. One of the ways that you can think about it is, we have the data structures, and in order to traverse those data structures you can use recursion. But when it comes to state, I don’t know if I can explain this for audio, but if nothing is mutable, if everything is immutable, right, how do you create state? So imagine that somewhere in your web application you want to store how many users are connected at the same time, right? So that thing needs to be shared state, because every time somebody joins you need to say, hey, I want you to increment this by one, and every time somebody leaves, you want to decrement, and you want to be able to increment or decrement this from anywhere you want, right?

José Valim 00:23:02 So it needs to be state, it needs to be shared somehow. In Elixir, the way you would achieve this is using recursion. We have this thing called processes, and those processes can send and receive messages. So a counter would work something like this: you have a function, say we have a function called loop. And then you start with a value of zero, loop of zero, which is your current counter, and then you’re going to receive messages. If you receive the message increment, you’re going to call the same loop function but incrementing the state by one. And if you receive a message of decrement, you’re going to decrement one from the state. And you can receive a get message, and when you receive a get message, you are going to send a message back to whoever sent you the message and say, hey, you wanted to get the value of the counter, here is the value back. So the discussion of recursion when we are thinking about immutability and mutability in Elixir is because we use recursion and message passing as the way to emulate state, right? So there is no mutable state per se. We have functions with their state and then you can send messages back and forth. But again, you’re never really mutating the counter, right? It’s never that the value of X is changing under your feet. You’re just pointing to different things as you move forward.
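
The counter described above can be written as a short Elixir sketch. The module name `Counter`, the `start/1` helper, and the exact message shapes (`:increment`, `:decrement`, `{:get, caller}`, `{:count, n}`) are assumptions chosen for illustration; production code would more likely use `Agent` or `GenServer`, which wrap the same receive-and-recurse pattern.

```elixir
defmodule Counter do
  # Hypothetical module: a process whose "state" is just the argument
  # passed to the next recursive call of loop/1.
  def start(initial \\ 0) do
    spawn(fn -> loop(initial) end)
  end

  def loop(count) do
    receive do
      :increment ->
        loop(count + 1)

      :decrement ->
        loop(count - 1)

      {:get, caller} ->
        # Reply to whoever asked, then keep looping with the same count.
        send(caller, {:count, count})
        loop(count)
    end
  end
end

counter = Counter.start()

send(counter, :increment)
send(counter, :increment)
send(counter, :decrement)
send(counter, {:get, self()})

value =
  receive do
    {:count, n} -> n
  after
    1_000 -> :timeout
  end

IO.inspect(value)
```

Nothing is mutated in place here: each message produces a new call to `loop/1` with a new argument, and other processes can only observe the count by asking for it with a message.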

Gavin Henry 00:24:28 Thank you, that was a perfect explanation. That helped me as well, so I hope it helps the listeners. We’ll just finish up the Livebook section with two more questions before we get into machine learning in general with Elixir. So Livebook was created because you wanted to be able to support machine learning in Elixir and as a way for users, data scientists, and programmers to explore data through Livebook. Is that correct?

José Valim 00:24:54 Yes, that’s how it started.

Gavin Henry 00:24:56 Okay. And this is just a question from me. How does it save the code you’re working on? How does that work?

José Valim 00:25:03 And that’s also another point, going back to the Jupyter conversation. One of the positive and negative things about being as widely adopted as Jupyter is that you have plenty of materials and plenty of experience of people saying, hey, this should have been done differently, or this needs to improve. And for us, getting started means we can consume all this material as well and learn the lessons that Jupyter learned over time, and it’s there, documented for us to learn from. And one of the things about Jupyter is this: if you think of a notebook, if you think of a document, right, how is it going to be shared? Some things are opaque. If you have a Word document or an Excel document and you try to open it in an editor, it is just going to be blobs.

José Valim 00:25:49 I think Excel for sure, maybe docs not, depending on the format, but it’s a blob, you can’t do anything with it. And that was one of the things that people ran into with a Jupyter Notebook: you cannot just open it from disk and read it, and this causes problems, because doing a code review, for example, can be hard when you cannot just submit a change and have somebody review that. Today there are tools, on GitHub for example, to improve that, but it’s not plain text. And for us, we wanted to tackle this problem. So Livebook notebooks are what we call Live Markdown files, and it’s a subset of Markdown. We restrict a little bit the Markdown that you can have there, so it has a very specific structure, and that’s it. So you can open it up, you can review it on GitHub, drop some comments, and all this kind of stuff.

Gavin Henry 00:26:36 Oh, excellent. So you could see your Elixir code in the Markdown document. Perfect. Okay, we’re about halfway through now, so I’m going to move us on to all the exciting progress in the machine learning world for Elixir. So I’ve got another question here, which we’re struggling to do at the moment. In your own words, what is machine learning?

José Valim 00:26:56 Yeah, so one of the ways that I saw machine learning described, and again, I don’t know if it’s good or if I’m recollecting properly, is that machine learning is about defining models to execute certain tasks that would be impossible or very error-prone for us to define or declare manually, right? So for example, think about a chess game. Sure, there are the rules of chess and we can describe them, but imagine that you want to build a model that wins a chess game. It’ll be very hard for us to encode that in software, right? Using conditionals or whatever, right? There are so many states, so many combinations. So the idea of machine learning is, how can I define a model, something that I can train, that can learn based on something, some data, some training set, and then execute that task for us without us explicitly describing or declaring what we want it to do. I guess that’s one possible interpretation.

Gavin Henry 00:28:06 Yeah, that’s good enough for me. I mean, in Sean’s book he says exactly that: once you get to a point where you’re writing rules for some type of system and there are so many edge cases, you then move to machine learning and either build a model or find the right model. I’ll put a link to the book for everyone in the show notes. He says a model is just another name for the algorithm that you choose to crunch the data set, which is the Nx data structure, which we’ll get to. I think just to summarize, why is Elixir a suitable programming language for machine learning? I think we’ve touched upon it in the past half an hour or so.

José Valim 00:28:46 Yeah, so interestingly, when I started with Elixir, the Erlang virtual machine was not necessarily a good candidate for machine learning. But if you look at machine learning in Python, for example, a lot of it is powered by libraries in C, C++, I think some in Fortran. So it’s all native code. A lot of people say one of the merits of Python for machine learning is how Python can be such a good glue language between the high-level Python software and the low-level code in C. And in the Erlang virtual machine, well, there are things that we did not talk about. I kind of mentioned that we write our code so that it runs inside those processes that send and receive messages, and we can kind of piece together that those processes are all isolated, they don’t share anything, right?

José Valim 00:29:41 They can only exchange messages, and they are very cheap, they’re very lightweight, and we can run millions of those at the same time. And the Erlang virtual machine is also preemptive, which means that it’s not possible for one of those processes to starve the system, to do so much work that the rest of the system cannot respond because that thing is busy. And in order for this to happen, if you have an infinite loop, for example, if you have anything that is very expensive, it’s essential for that thing, if it has to do a lot of work, to say, hey, I’m going to do some quota of work and then I’m going to stop. And then the virtual machine will check if anything else needs to do some work, and if something else needs to do work, those other things are going to do the work.
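
Both claims, that processes are cheap and that a busy process cannot starve the rest, can be tried out with a small Elixir script. This is a rough sketch under stated assumptions: the module name `Busy`, the process count of 10,000, and the five-second timeout are arbitrary choices for illustration, and the script starts one never-ending loop per scheduler so that every scheduler has a CPU-bound process competing for time.

```elixir
defmodule Busy do
  # Hypothetical helper: an infinite loop that never waits for a message.
  def spin, do: spin()
end

parent = self()

# One CPU-bound process per scheduler, so no scheduler is left idle.
busy =
  for _ <- 1..System.schedulers_online() do
    spawn(fn -> Busy.spin() end)
  end

# Many short-lived processes that each report back to the parent.
count = 10_000

for i <- 1..count do
  spawn(fn -> send(parent, {:done, i}) end)
end

# Count the replies; if the busy loops could starve the system,
# these receives would time out instead.
received =
  Enum.reduce(1..count, 0, fn _, acc ->
    receive do
      {:done, _} -> acc + 1
    after
      5_000 -> acc
    end
  end)

# Clean up the busy loops.
Enum.each(busy, &Process.exit(&1, :kill))

IO.inspect(received)
```

The looping processes are written in ordinary Elixir, so the virtual machine can interrupt them after their quota of work and let the other processes run; that is the guarantee native code called directly from the VM does not automatically provide.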

José Valim 00:30:32 And then you come back. So you are preempting: everybody gets some amount of work, and then eventually everybody’s going to get to the end of their journey and finish the work that they have to do. And the issue with calling C code was that, well, if I just go and call C code that is going to do some very expensive 3-D or 4-D matrix multiplication, then that thing could take a lot of time and that can starve the rest of the system. So almost 10 years ago, if you wanted to integrate with native code in Erlang, you had to write the native code to say, look, I’m going to do just some amount of work and then I’m going to yield back to the virtual machine just to see if something else needs to run, and then do a little bit more.

José Valim 00:31:19 So it had to do all this yielding. And of course, depending on the algorithm, it may be hard to rewrite. Imagine if we had to rewrite all the machine learning algorithms and computations to work based on those ideas: it would be a lot of work, a lot of time spent. So about eight, nine years ago they introduced this idea. When you have to call C code in Erlang, we say it’s a Native Implemented Function; we call it a NIF. So it’s native code, low-level code, a native implemented function, and they introduced this idea of dirty NIFs. The idea of a dirty NIF is that you say whether that dirty NIF is going to do a bunch of CPU work or a bunch of IO work, and we have separate threads that those can run on, and you customize how many threads you want to have.

José Valim 00:32:08 So you give separate threads to run this dirty work, but that dirty work is not going to run together with your Elixir code. It’s running on a separate operating system thread. So they introduced this whole abstraction of dirty NIFs, and basically they added this way for us to integrate native code into the Erlang virtual machine. And once we had that, if you compare Elixir with Python, it’s: well, now we can integrate native code, so we can bring in all those algorithms, those libraries for machine learning. That’s exactly what we did. So for machine learning in Elixir, it is not that we are starting all the low-level work from scratch. We are using Google XLA, which is an accelerated linear algebra library from Google, which Google uses for TensorFlow; it’s the same library.

José Valim 00:32:59 We wrote bindings for that and we build on top of that. We also have bindings for LibTorch, which is what the PyTorch library from Facebook uses, right? Another big player in machine learning. It’s bindings to the low-level library that PyTorch uses as well. So that’s how we started with machine learning, and it was mostly: well, technically we can do it. And we talked about Sean Moriarty and his excellent machine learning book, and before he wrote the machine learning book he wrote a book, Genetic Algorithms in Elixir. I say this is the book that changed my life, but I never read it. And the reason I say that is because as soon as that book was published, I think on the first day or the day after, I sent an email to Sean. I was like, this is very exciting. I was always interested in having more machine learning stuff in Elixir. Let’s build something together. And that’s when we got to: oh, we can build something together. Oh, we can use the same native libraries that Python uses. And we started slowly building this whole infrastructure.

Gavin Henry 00:34:05 That wasn’t out very long ago. Was that book published in 2021? I’ll put it in the show notes as well. Yeah, I’ve actually got it. It’s a good one. Okay, so that was a lovely description and history lesson of how you got to the point of Elixir being suitable. So as I understand it, and I hope you’re going to explain, there are five elements to the machine learning Elixir ecosystem, and those would be Nx, Scholar, Explorer, Axon and Bumblebee. Could you take us through that, or confirm I understand that correctly?

José Valim 00:34:41 Yeah, yeah, perfect. So I think there are two building blocks. One is Nx, and Nx stands for Numerical Elixir. And the main abstraction there is something that we call a Tensor. So a Tensor is a multi-dimensional data structure. As programmers we are familiar with lists or vectors, and those are one-dimensional, right? We’re also familiar with matrices, right? Which are 2D, right? So you have two dimensions in there. And then a Tensor can represent things that are one-dimensional or two-dimensional. It can also represent things that are three- or four-dimensional. So for example, something that is three-dimensional for a computer is an image, because you have the height, you have the width, and then images have the RGB layers, right? The red, green and blue layers; each of them gets a byte, depending on the image format, right?

José Valim 00:35:39 But they get a byte to represent how much red is there, how much green is there, how much blue is there. You have some other images which are RGBA, but yeah, so you have three dimensions, right? You have width, you have height, and then you have another dimension, which is the channels, right? RGB or RGBA and so on. So that’s three-dimensional. A video is going to have another dimension on top of that, so it’s going to be 4D, because you can think of a video as a batch of images, right? So Tensors allow us to represent that, right? To represent all those things, depending on what you want to do. So even if you’re thinking simply about it: well, if I have an image, how do I rotate an image, right? Or how do I rotate a video?

José Valim 00:36:21 How do I convert a video to black and white, right? If you think about all those things, we can express them as transformations on top of this Tensor. And Nx provides a bunch of low-level primitives, not really low-level, but basic primitives, to do these operations, right? To do all those things. So we can multiply matrices, we can invert, we can add layers, remove layers; there are a bunch of things that you can do, and Nx does it. Nx does a lot more. So if somebody’s listening to this episode and they’re coming from Python: Nx also has this idea of serving Tensors, we have this idea of numerical definitions. We talk all about that in the project, but in a nutshell that’s what Nx does. It defines those Tensors.
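The image and video shapes described above can be pictured with a small sketch. It uses plain Python nested lists rather than Nx, so none of the function names below come from the Nx API; `shape` and `to_grayscale` are invented for this illustration, and averaging the three channels is just one simple way to get a grey value.

```python
def shape(tensor):
    """Return the shape of a regular nested list, e.g. (height, width, channels)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def to_grayscale(image):
    """Average the R, G, B values of each pixel, dropping the channel dimension."""
    return [[sum(pixel) // len(pixel) for pixel in row] for row in image]


# A 2x2 RGB image: height 2, width 2, 3 channels, one byte (0-255) per channel.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [90, 90, 90]],
]
video = [image, image, image]  # a batch of 3 frames adds a fourth dimension

print(shape(image))         # (2, 2, 3)
print(shape(video))         # (3, 2, 2, 3)
print(to_grayscale(image))  # [[85, 85], [85, 90]]
```

Converting a video to black and white is then the same transformation applied to every frame in the batch.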

Gavin Henry 00:37:05 So that’s the lowest level.

José Valim 00:37:07 Yes, almost everything is built on top of that. So for example, one of the things built on top of that is a project called Scholar. Now, when we talk about machine learning, a lot of people think about neural networks, right? Or deep learning. That’s what people are thinking about. So we have a library for that called Axon, but we also have a library called Scholar, which is more about traditional machine learning. There we have algorithms for, for example, clustering. Imagine you have some data, right? And you can distribute this data in a space and you can say: hey, can I break this data into clusters, find things that are related to each other? Or imagine that you have some points and you want to find a polynomial curve that is going to fit those points as closely as possible.
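The simplest case of fitting a curve to points is a straight line. The sketch below is plain Python and does not use Scholar; `fit_line` is a hypothetical helper that applies the standard least-squares formulas for slope and intercept, and the sample points are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to the given points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor,
    # which cancels out in the ratio below).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```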

José Valim 00:37:53 Or you can do a linear regression. So Scholar is a traditional machine learning library. That’s what it has in there, and it’s built on top of Nx. And the other cool thing about Nx is that all the code that you write can be compiled just in time to run on your CPU, and that can make it much, much faster. Or if you have a GPU, it can also compile that code to run on the GPU as well. So everything that I’m saying about Scholar means that it can run on the CPU or on the GPU as well. And then Axon is also about machine learning, but it’s about deep learning in particular. So it’s about the abstractions for building neural networks. Neural networks, we break them into a bunch of layers; each layer has different capabilities, it adds different features.

José Valim 00:38:44 So you need a whole library to define those layers. If you want to train a neural network, you need to figure out how you’re going to train it as well. And Axon is a library that takes care of that. So it’s the building blocks for building a neural network. If you think about a neural network as a bunch of layers stacked on top of each other, Axon is going to define those layers, and all you have to do is connect those layers together. And that’s going to give us deep learning and neural networks in Elixir, and it’s built, again, on top of Nx. And then Bumblebee comes on top of that. And what Bumblebee adds is that, well, we have a bunch of predefined, existing neural networks today, right? And some of them are very popular, right? We often call them architectures.

José Valim 00:39:30 So, like, GPT is very familiar because of ChatGPT; we have Stable Diffusion; we have Whisper, which is for speech to text. There are a bunch of existing architectures, existing neural networks, and the most expensive thing is training a neural network. It needs a really large amount of data, and sometimes people spend hundreds of thousands of dollars to train a neural network. And there are people who train those neural networks and they make the result of the training, which we call the model weights, open-source. They make it available; maybe open-source is not the best way to describe it, but it’s available and you can download it and you can use it. So for example, well, maybe it’s a good time to say this: if you’re excited about trying those things out, you can install Livebook on your machine.

José Valim 00:40:19 We have this thing called Smart Cells that we can go back and talk about, but you start a new notebook, click on this Smart Cell button and say: hey, I want to have a neural network. You click that button as well, say I want a speech-to-text neural network, and it’s going to configure the neural network for you. So you can start running a speech-to-text neural network with Elixir on your computer and give it a try. And again, this happens because we have implemented those models in Elixir, and because somebody trained this neural network and made the model weights available, and we download them to our machine, and now you can run that thing. I can say “hello darkness, my old friend” and it is going to give me the text back: “hello darkness, my old friend” and everything else. And this part, where we build existing models and we can load the weights of the model onto your machine so you can run it.

José Valim 00:41:11 That’s the Bumblebee project, right? And that’s built on top of Axon. So to sum up so far, we have Nx, which is the building block, and then we have Scholar, which is all about traditional machine learning. And then we have Axon, which is about deep learning, about neural networks. And for Bumblebee, think of a framework, right? So if you’re coming from web, you can think of Phoenix or Rails or Django, right? Imagine that you have a web framework; that’s going to be Axon. But imagine that we had a bunch of ready web applications, all developed for you, for you to try out and run; that would be Bumblebee. It’s basically a bunch of existing models that already work, they have the parameters, and you can just get them, try them out, deploy them to production and start running them. Those cover four of them. I’m going to use this opportunity to have a break before I go forward.

Gavin Henry 00:42:07 I’m going to go through an example and pull apart and summarize everything we’ve just talked about, because there are a lot of new terms there. So Nx is the fundamental set of data structures: we need to get the data we’re analyzing into some type of format that fits Nx. That’s the bottom layer. Scholar is our traditional machine learning. So I would understand that to be, you need to know what you’re looking for, whether it’s polynomial or linear or the other terms you said. So you need to know what you’re looking for in order to use that. Axon, you nicely explained, is the framework that you’re using, which brings you into neural networks or deep learning, where we’ll need to take a step back and define the difference between, or the complementary parts of, machine learning and neural networks. And then Bumblebee is kind of the nice “go and grab this existing model or algorithm from somewhere like Huggingbear that hosts these types of things, download it, and apply that to our data set or the data we’re interested in.” Is that about right?

José Valim 00:43:12 Yeah, just with the correction: Huggingface.

Gavin Henry 00:43:14 Huggingface, I’d put Huggingbear in here.

José Valim 00:43:16 Yeah. No, it would also be a great name, Huggingbear. But yeah, it’s Huggingface, and for those who are not familiar, I’d like to say, I don’t know if they like the description, but I like to say it’s the GitHub for machine learning and for AI and neural networks. So all the models are there with the parameters, and you can deploy and run and give things a try. I really enjoy everything that they have been contributing to the machine learning ecosystem.

Gavin Henry 00:43:44 And your neural network, is that a bunch of computers or how do we define that?

José Valim 00:43:51 So it doesn’t have to be a bunch of computers. A neural network is a model, and I’m not sure exactly how I would say what exactly a neural network is, but it’s a machine learning model, and I think it has some specific layers that would describe it as a neural network. But I’m not sure what is the essential bit of everything that a neural network has that makes it a neural network. So for example, in Scholar we have models, but those things are not neural networks. I’m not a hundred percent sure what is the thing that makes it different.

Gavin Henry 00:44:27 Okay, I get the feeling that’s going to be another show. It’s going into deep learning and neural networks.

José Valim 00:44:32 Yeah, Sean would definitely be ready to answer those questions.

Gavin Henry 00:44:36 Yeah I think we could do a deep dive on that.

José Valim 00:44:37 Just as a parenthesis here, I think one of the cool things about neural networks is that you define the layers that compose the neural network. A neural network has two parts, right? There’s the training part, where you have to train the neural network, and I said that’s expensive, and there’s the inference part, where after it’s trained you say: hey, convert this speech to text, or tell me if there is a dog or a horse in this picture, right? That’s the inference thing. And again, I don’t know if this is what makes a neural network a neural network, but one of the interesting things is that the training code is derived from how you stack the layers together. You don’t have to say, here’s how I train it. You just assemble the layers together and then we use mathematical properties to derive how we’re going to train the neural network, and maybe that’s what makes a neural network an actual neural network. But I’m not sure.

Gavin Henry 00:45:35 Okay, that’s cool. We actually did a show in February, Episode 549 with William Falcon on Optimizing Deep Learning Models, and that goes in depth into PyTorch and PyTorch Lightning and things you mentioned. So I’ll reference that for the listeners. Okay, let’s try, if we can, in the last 10 minutes or so, to come up with an example that uses everything. So my interest at the moment is fraud detection, whether that is in the telecoms world, or API abuse, or a bank transaction that’s fraud. Would you be able to take us through, if possible, where you start? You’ve got a bunch of API logs or events or call data records or phone calls; that would be your raw data that you’ve got to find a model for and get into shape. How does all that work?

José Valim 00:46:31 I have no idea.

Gavin Henry 00:46:33 Good answer.

José Valim 00:46:35 Yeah, as I said, I am working on Nx, and sometimes Sean is like: hey, this neural network is slow, or it doesn’t compile. I don’t know what it does, I don’t understand it. But if he gives me a way to reproduce it, so that I can run it on my machine, I don’t actually need to care about what it does and how it’s stitched together. I can go there and I can optimize it. I can figure out ways of making it efficient and improve it, and even help come up with the abstractions. But when it comes to using them to solve problems, I don’t know.

Gavin Henry 00:47:16 What about Bumblebee? We go and get a model. How are we actually running that against, or on, your data?

José Valim 00:47:23 What do you mean?

Gavin Henry 00:47:23 When you’re downloading this model that fits your use case, are you inputting a bunch of CSV records or when you’re using Livebook?

José Valim 00:47:32 Yeah, so going back to the ecosystem, right? You said there were five, and there is one that I did not cover, which is Explorer. Yeah, yeah, that’s the fifth one. So everything we have been talking about, Nx, Bumblebee, Axon, Scholar, those things, when we think about Tensors, they’re all zeros and ones; they can only represent numbers, right? And so a Tensor doesn’t really know about words, right? So you may say: okay, so how can I do spam detection, or how can I do speech to text, if the thing doesn’t know about words? And so, for example, we have a step before, which is called a tokenizer. And what it does is that it goes through the text, for example (you also have to train it), but it goes through the text and says: hey, this token here is going to get the number one, this other token is…

José Valim 00:48:26 So if we get “hello”, it says: look, the “hel” part maybe, or “he”, is going to get the number one, “lo” gets the number two, and you split everything into tokens, a space is a token, and you give them numbers, right? But that’s not all the data that I have. It’s not only strings; you also have datetimes, right? Which maybe we can treat as a number, but maybe they encode different properties that can be interesting. Maybe the interesting thing about a datetime is not really the date part but the time, depending on what you want to do, right? So for example, maybe fraud is more common at midnight, right? So the date doesn’t matter; it matters that it’s midnight, right? So data is richer, right? It’s not only zeros and ones, and we need to be able to get this data from a CSV file, from our database, from somewhere.
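As a rough picture of turning text and timestamps into numbers, here is a minimal sketch in plain Python. It is deliberately simpler than the tokenizer described above: it assigns one number per whole word rather than per sub-word piece, and it is not the tokenizer used by Bumblebee. The helpers `build_vocab`, `encode`, and `hour_of_day`, and the choice of 0 for unknown words, are assumptions made for this illustration.

```python
from datetime import datetime


def build_vocab(texts):
    """Assign each distinct word a number, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1  # 0 is kept for unknown words
    return vocab


def encode(text, vocab):
    """Replace every word by its number; unknown words become 0."""
    return [vocab.get(token, 0) for token in text.lower().split()]


def hour_of_day(timestamp):
    """Keep only the hour of an ISO 8601 timestamp, ignoring the date part."""
    return datetime.fromisoformat(timestamp).hour


vocab = build_vocab(["hello darkness my old friend"])
print(encode("hello old friend", vocab))   # [1, 4, 5]
print(encode("hello new friend", vocab))   # [1, 0, 5]
print(hour_of_day("2023-11-02T00:15:00"))  # 0
```

The hour feature reflects the point about midnight: two events on different dates still get the same number if they happen at the same time of day.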

José Valim 00:49:15 We need to be able to get this data and to massage it, right? To transform it. So eventually we can transform it enough that it becomes zeros and ones in a way that we can feed into those Tensors, right? So that’s what the Explorer project does, right? So let’s try to put everything together. Explorer, you can think about it as an Excel sheet or as a table, a database table, right? It’s columns that have a type, and a bunch of rows, and you want to process those columns in a very efficient way, as fast as you can, right? Because a lot of the issues with machine learning are about massaging the data. So if you’re using Livebook, your workflow would be something like: well, I’m going to open up Livebook, I’m going to bring the CSV into Livebook, and I’m going to use Explorer to parse that CSV into a meaningful data structure.

José Valim 00:50:14 And then from there, a data engineer or a data scientist is going to try to explore the data a little bit, try to understand some parts. Maybe the data is dirty, maybe sometimes the numbers are less (?), the names; maybe the name is important. Sometimes the names are “last name, first name”, sometimes “first name last name”. Maybe the dates are not necessarily correct: sometimes with time zones, sometimes without. You may want to do a lot of massaging on that data. And then eventually, after you massage that data, it’s in a good state. Maybe there’s an existing machine learning model that you can try. And that existing machine learning model would be trained on data of some sort. Say, look, the person who trained that machine learning model massaged the data in this way. And then you massage it the same way and give it to the machine learning model.
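The kind of clean-up described above, such as names that arrive in two different formats, might look like the following sketch. It uses Python's standard `csv` module rather than Explorer, and the CSV contents, column names, and the helpers `normalize_name` and `load_rows` are all made up for this illustration.

```python
import csv
import io

# Made-up sample data: the same kind of field written in two different styles.
RAW_CSV = """name,amount
"Doe, Jane",10.50
John Smith,3
"""


def normalize_name(name):
    """Turn 'Last, First' into 'First Last'; leave other names unchanged."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return f"{first} {last}"
    return name.strip()


def load_rows(text):
    """Parse CSV text into rows with a cleaned name and a numeric amount."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "name": normalize_name(row["name"]),
            "amount": float(row["amount"]),  # the string column becomes a number
        })
    return rows


rows = load_rows(RAW_CSV)
print(rows)
```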

José Valim 00:51:02 The machine learning model is going to tell you if it’s fraud or not. Or maybe you need to get an existing model and you have to train it. And then if you have to train, you get all of your data, for example, and you need somebody to go through this data and say: hey, this was fraud, this was not fraud. You need to label the data. And then you define the data that you’re going to use to train and test the model. And there are a bunch of techniques, of things that you can do here. But then what you do is that, if you know what is fraud and what is not, you use that to train the model. You use that to test the model to see if the model is accurate, because you may not want to have false negatives or false positives. And after you go through this whole process, several tries most likely, the hope is that you’re going to arrive at a model you’re going to be happy to run in production. And our tools are going to help with that part as well, and give a bunch of options on how you want to deploy those models, how you want to scale them: on the same machine, different machines, one GPU, multiple GPUs and so on. That’s kind of the general overview of the problem.
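The label, train, and test loop described above can be sketched in a few lines of plain Python. Everything here is an assumption for the sake of illustration: the labeled amounts are invented, the "model" is just a threshold placed halfway between the average fraudulent and average legitimate amount, and `split`, `train_threshold`, and `confusion` are hypothetical helpers rather than functions from any Elixir library.

```python
def split(rows, train_fraction=0.75):
    """Split labeled rows into a training part and a held-out test part.

    Real workflows usually shuffle first; this sketch keeps the order so the
    result is easy to follow.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train_threshold(rows):
    """'Train' a toy model: the midpoint between mean fraud and mean legit amounts."""
    fraud = [amount for amount, is_fraud in rows if is_fraud]
    legit = [amount for amount, is_fraud in rows if not is_fraud]
    return (sum(fraud) / len(fraud) + sum(legit) / len(legit)) / 2


def confusion(labels, predictions):
    """Count true/false positives and negatives."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(labels, predictions):
        if predicted:
            counts["tp" if actual else "fp"] += 1
        else:
            counts["fn" if actual else "tn"] += 1
    return counts


# Invented labeled data: (transaction amount, was it fraud?)
labeled = [(5, False), (900, True), (12, False), (700, True),
           (8, False), (650, True), (20, False), (400, False)]

train, test = split(labeled)
threshold = train_threshold(train)
predictions = [amount > threshold for amount, _ in test]
result = confusion([is_fraud for _, is_fraud in test], predictions)
print(threshold, result)
```

On this invented data the held-out legitimate transaction of 400 is flagged as fraud, which is exactly the kind of false positive that testing is meant to surface before a model goes to production.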

Gavin Henry 00:52:04 I’m trying not to laugh here, because for someone that just said they had no idea, that was a perfect explanation of how it all works, because that’s how I had it in my head and you explained it perfectly. So thank you. So just to summarize: Explorer pulls apart the data you’ve ingested. Depending on your business logic and your skillset as a data scientist or whatever, you’re trying to pull apart what’s important from that data to you and why. And then you get it into the shape and form to either run against existing models or create your own, which ends up in Nx on a GPU or CPU.

José Valim 00:52:37 Yeah, you asked me to paint the Mona Lisa, and what I said is that it has a mouth, eyes, nose.

Gavin Henry 00:52:43 That’s right.

José Valim 00:52:44 So just the rest is up to you.

Gavin Henry 00:52:48 Do you have any idea of how we build a model versus an off-the-shelf one from Huggingface? Is that something this can help you with, the machine learning ecosystem in Elixir?

José Valim 00:52:59 Yeah, so usually to build a model you would get Axon, which has the layers, and for the models they would say: oh, for image recognition, you’re going to have a dense layer and then you have, like, a convolution layer, and then you would stack the layers together. Part one of defining a model will be stacking those layers together, right? But then you have to train it, right? Which is the tricky part, and that’s going to give you the model weights that you would be able to run in production as well. One of the things that is also worth saying for people who are not familiar with machine learning is that you can also fine-tune. So for example, you can get a general model for fraud detection, and it may not be super accurate on your data, but instead of training everything from scratch, it may be more efficient in terms of costs, or even more accurate, for you to tune that model on your data.

José Valim 00:54:00 So you don’t need as much data as before if you’re just going to fine-tune something that exists, and that’s also an option. But yeah, let me try to recap. So it’s three parts, right? One part is what layers you have and how you’re going to stack those layers together. That’s one part of the problem, which is basically done by research. Researchers around the world are working on those things, thinking about new layers and how to stack them. But after somebody publishes a paper, like how to do image recognition, how to do fraud detection, right? They say how the layers should be stacked together, and you use Axon to stack those layers together, right? And then somebody has to train to get the parameters.
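To show what "stacking layers" means mechanically, here is a minimal sketch in plain Python. It does not use Axon and does not reproduce its API; `dense`, `relu`, and `stack` are invented names, and the weights are hand-picked numbers standing in for the parameters that training would normally produce.

```python
def dense(weights, biases):
    """Return a layer that computes weights * x + biases for a list-based vector x."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, biases)]
    return layer


def relu(x):
    """Activation layer: replace negative values with zero."""
    return [max(0.0, v) for v in x]


def stack(*layers):
    """Connect layers so that the output of each one feeds the next."""
    def model(x):
        for layer in layers:
            x = layer(x)
        return x
    return model


# Hand-picked weights for illustration only; a trained model would learn these.
model = stack(
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),  # 2 inputs -> 2 outputs
    relu,
    dense([[1.0, 2.0]], [0.1]),                     # 2 inputs -> 1 output
)
output = model([2.0, 1.0])
print(output)
```

The structure (which layers, in which order) is the architecture; the numbers inside the layers are the weights that someone has to obtain by training.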

Gavin Henry 00:54:45 And do the labeling.

José Valim 00:54:46 Yeah, labeling is part of the training data. And then after you have all those things, you can give it an input, like some data that you got from a CSV, or some text if you’re doing spam detection. So you’re going to have your input and you’re going to have your output, which can also be of different kinds, right? It can be a boolean thing: oh, this is spam, that’s not spam. But if you’re doing image recognition or image classification, right? It’s a dog, it’s a cat; it may have all different kinds of things that it’s going to say. And the interesting thing is that it doesn’t necessarily say it’s a dog or a cat, for example, or a horse or a hot dog, right? It just says: well, I am 60% sure it’s a dog, 10% sure it’s a cat, 1% sure it’s a hot dog, right? It just gives you the percentage that it’s going to be one of those, and then you have to decide if that’s good enough, right? Or if it’s ambiguous and you don’t want to make a decision, and so on.
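The step from percentages to a decision can be sketched as follows. The conversation does not say how the percentages are produced; softmax is used here as one common way to turn raw model scores into probabilities that add up to one. The scores, the 0.5 cut-off, and the `decide` helper are assumptions made for this illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decide(labels, probabilities, min_confidence=0.5):
    """Return the most likely label, or None when the model is too unsure."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    if probabilities[best] < min_confidence:
        return None  # ambiguous: refuse to decide
    return labels[best]


labels = ["dog", "cat", "hot dog"]
confident = softmax([4.0, 1.0, 0.5])  # one score clearly dominates
ambiguous = softmax([1.0, 0.9, 0.8])  # scores are nearly tied

print(decide(labels, confident))  # dog
print(decide(labels, ambiguous))  # None
```

Where to put the cut-off is a product decision rather than something the model provides: a stricter threshold means fewer wrong answers but more cases left undecided.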

Gavin Henry 00:55:42 I suppose that’d be your example before where if it’s a poorly trained model, you might show a picture of a lady and it says yeah, 60% sure this is the Mona Lisa. But as you fine tune it and say you need to be looking for this, this and this, it would find it. Yeah.

José Valim 00:55:59 That could be one of the things. So again, my understanding, and I’m not 100% sure, is that, for example, imagine that you have a very specific use: you want to identify different breeds of dog. Or imagine that you work with tooling, very specific tooling, where maybe, for the rest of us, if we see three different kinds of hammer, I would say it’s a hammer, right? But depending on what they’re working on: no, this is a hammer with this square head that I use for this. This is a hammer for this and this, right? Some people may have specific uses. You get an existing model that already does the image recognition and then you fine-tune it to teach it about the specializations. That’s one way to go about it.

Gavin Henry 00:56:42 Yeah, I think that’s called assisted learning, isn’t it?

José Valim 00:56:46 I’m not sure, because there are so many terms related to that. I’ve heard of transfer learning, I know there is fine-tuning, I know there’s reinforcement learning, which is something else, but there are so many terms that I’m not exactly sure what the differences between them are.

Gavin Henry 00:57:04 It reminds me of when SpamAssassin first came out, and as a system administrator I used to spend tons of time dumping folders of spam into SpamAssassin and training it again. You had to constantly keep feeding it the new changes. Okay. I think we’re pretty much at the end of the show.

José Valim 00:57:23 Yeah, just one thing that came to mind because you mentioned spam. I think it’s also worth mentioning that these spam filters got really popular before deep learning became a thing, right? Which I think points back to Scholar, right? The way that people were doing spam filtering before deep learning, I believe, is that they were using tools like Scholar; they were using traditional machine learning tools. And one of the advantages is that usually they are much cheaper to run; they don’t require that many resources. So depending on what you’re doing, maybe you should consider: oh, traditional machine learning is fine, I don’t need the deep learning one. It’s something worth thinking about.

Gavin Henry 00:58:07 Yeah. Thank you. So what’s next for you in the machine learning world? What are you currently staying up late doing, or looking forward to doing in the morning?

José Valim 00:58:17 Yeah, so this is something that I want to announce, I want to publish, within the next month. When we started this machine learning stuff, which was about three and a half years ago, I had a huge list of things to do, a really massive list of things to do and learn and explore, because you are at that point in your journey where you need to know what to do and what not to do. You say: look, it’s not worth investing time in this particular area; this is where we should focus. And I had a huge list, and we are at a point where we still have a lot of work to do, of course, but the core of the things in the list is all tackled. So now we’ll continue to focus on improving what is there, adding new capabilities rather than adding a new foundational block. We said there are five blocks.

José Valim 00:59:16 We’re not necessarily thinking about the sixth or the seventh block. The blocks are in place, and now it’s about continuing to improve them, and that’s pretty much our plan. We just continue improving, adding more features to Nx or to Bumblebee, implementing more models, and as companies and teams are starting to run those things in production, I’m sure we’re going to get more and more feedback that we are going to take on to continue improving the tooling. So I’m really excited and proud that we are getting out of this discovery loop and now getting to this cycle where we continue improving rather than starting new foundations all the time.

Gavin Henry 00:59:57 Thank you. I think we did a great job of covering why anyone should invest time in the Elixir world for their next machine learning project. But if there was one thing you’d like a software engineer to remember from our show, what would it be?

José Valim 01:00:11 Yeah, no, that’s a good question. I’ll have to think about it.

Gavin Henry 01:00:15 Was there anything that we missed that I forgot to mention?

José Valim 01:00:19 I think through the description, hopefully folks got the idea of all the exciting things that you can do with Elixir, right? As I was saying, you can start Livebook and you can start running a machine learning model, and from Livebook you can kind of build a chat app where you can send audio and it’s going to convert that to text. You can send an image, it’s going to classify that image, it’s going to generate text. There are all those crazy things that you can do, and have those things interact with your Elixir code base in all kinds of interesting ways. Plot graphs, transform data. So that’s how I want to think about what Elixir is for today. I’m thinking, well, I want it to be this tool where you can kind of throw any problem at it and you are going to have a great time tackling that problem. And we are going to have good abstractions for solving that problem. That’s where I hope we are going.

Gavin Henry 01:01:20 Thanks. Well, for me, I’m going to have a go at what people should remember from the show, and this is how I think of it. Elixir is not just a programming language, it’s a whole ecosystem. I think that’s where it’s got to today, over the past 10 years. I think people just think of it as a programming language. But when you’re actually using it day to day and you’ve got the whole tool chain, the ecosystem, places to reach out, communities to join, it’s not just a programming language. So where can people contact you, or reach out to the Chief Adoption Officer? I just love that. Where can they reach out to you?

José Valim 01:01:54 Yeah, so the best places: if it’s something regarding, let’s say, I don’t want to say Enterprise, but if it’s something related to business, to a company, go to our website and you can reach out to me there. We are a very small company. That’s why I didn’t want to call myself CEO, because I always find it funny when it’s, oh, I’m the CEO of a three-person company. And I don’t want to call myself a CTO because we’re not really running any technology. We do open-source, right? So I felt, oh, Chief Adoption Officer, or I was thinking Chief Open-Source Officer, but that doesn’t make a lot of sense. So I went with Chief Adoption Officer. But yeah, it can be there. But otherwise you can reach out to me through the or through GitHub. That’s where I am most active.

Gavin Henry 01:02:44 Well, thank you for coming on the show. It’s been a real pleasure. This is Gavin Henry for Software Engineering Radio. Thank you for listening.

José Valim 01:02:50 Thanks for having me.

[End of Audio]