|
In this episode we're talking about Erlang with its creator, Joe Armstrong. We started by looking at the history of the Erlang language and why it is so relevant today. We then looked at Joe's approach to Concurrency Oriented Programming and its main ingredients: share nothing, lightweight concurrency and pure message passing. We also compared this to the classic shared memory approach to concurrency. We then looked at other interesting aspects of Erlang, such as its functional nature (and why this is important to concurrency) and pattern matching. Next we discussed how to implement distribution and fault tolerance, and we took a look at OTP, the "application server" for Erlang. We concluded the conversation with a little discussion about how Erlang was designed, its current community, as well as its future.
Transcript
This time we are talking to Joe Armstrong. I am always tempted to say Joe Erlang because he is Mister Erlang, I guess. So, welcome Joe.
Thank you.
Before we actually dive into the topic, I would just like to say thank you to Trifork, the organizers of JAOO, because they gave us this room inside the Trifork offices, so we have a quiet spot to do the interview, which is always useful. So Joe, why don’t you give us a brief overview of what Erlang actually is, how it was created and why anybody cares today? What is the big deal?
Well, what is Erlang? In a nutshell, I suppose, today it would be parallel programming without pain. Parallel programming is perceived as being something which is rather difficult. And I think this is due not to the problem of parallel programming in itself but to the way that people perform parallel programming. There are basically two models of parallel programming. One involves the idea of shared memory, which is protected by mutexes and locks, and that method of programming is extremely difficult. Erlang takes a completely different approach and uses what is called a pure message passing paradigm.
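As an illustration of the pure message passing idea (this is not code from the interview, and it is Python rather than Erlang), here is a minimal sketch. The names `counter_actor` and `run_demo` are made up for this example, and an operating system thread with a `queue.Queue` mailbox only stands in for an Erlang process, which is far more lightweight.

```python
import queue
import threading


def counter_actor(mailbox: queue.Queue) -> None:
    """An 'actor' that owns its state; messages are the only way in or out."""
    total = 0  # private to this actor, never touched by other threads
    while True:
        message = mailbox.get()
        if message[0] == "add":
            total += message[1]
        elif message[0] == "get":
            reply_to = message[1]   # the sender includes its own mailbox
            reply_to.put(total)     # the answer is sent back as a message
        elif message[0] == "stop":
            return


def run_demo() -> int:
    """Send a few messages to the actor and ask it for its total."""
    mailbox = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(mailbox,))
    actor.start()
    for n in (1, 2, 3):
        mailbox.put(("add", n))
    reply = queue.Queue()
    mailbox.put(("get", reply))
    result = reply.get(timeout=5)
    mailbox.put(("stop",))
    actor.join(timeout=5)
    return result
```

No lock appears anywhere in the sketch: the running total is only ever read or written by the one thread that owns it, which is the point of the share-nothing style.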
So, there is no shared data, and in Erlang everything is represented by processes. Processes communicate by message passing, which makes concurrent programming a lot easier. So, the message is that Erlang is a language which makes concurrent programming painless.
Okay, and processes are obviously not the same as operating system processes; they are very lightweight.
No, they are very lightweight objects. In Erlang, processes belong to the programming language and are not part of the operating system.
There is this general idea of actors, I guess, that is close.
Yeah.
Okay. So, when was Erlang created, what was the context in which it was created, and what kind of systems was it built for?
It is difficult to put a first date to it because there was never a project to make a new programming language. But the work that it derives from started in 1986, and it was done in the computer science lab at Ericsson. At that time, we had the problem of just really trying to find a better way of programming telephony and of programming big switches, which is Ericsson’s core business. And we just started tinkering around with different programming languages to see how we could program telephony. Now, telephony is a problem where hundreds of thousands of people are all connected to the same switch, all doing independent things. And so the problem is very naturally concurrent, and inside Ericsson there was a strong tradition of building concurrent systems which were also fault-tolerant. The other aspect of the problem is that, unlike desktop systems and commercial systems, telephony systems have to be upgraded without taking them out of service. So, I think when we started we were looking at a way of handling massive concurrency, hundreds of thousands of people being controlled by one switch. We also wanted these systems to basically run forever, because if we needed to change the code we had to do this without stopping the system.
So, in about 1986 we started looking at how to do this. And I think the first version of what you could call Erlang was in 1987. And this I wrote in Prolog. Well, I started with Prolog; I really liked Prolog.
Right, one new weird language, one old even weirder language.
Well, it is not a weird language, it is a very beautiful language. I cannot really understand why everyone doesn’t use it. Well, Prolog is not very good at certain things; it is not very good at doing concurrency. So, all I aimed to do was add concurrency to Prolog, with lots of Prolog processes all running at the same time. Of course, this was extraordinarily inefficient, but the programming model was very attractive. So, that is how it started.
The focus is on concurrency. So, the reason why it is becoming popular, hyped, relevant these days is obviously the multicore stuff, right?
Yeah, I think really, okay, Erlang is a language which is 20 years old, and for about 18 of those years it was just being used internally inside Ericsson. So, nobody was really interested in it; a few devotees in the outside world used it. But it did not have any widespread popularity. I think when you have concurrency or threads or anything like that which run on a sequential processor, you do not actually have concurrency, you have multi-scheduling: the operating system schedules between different processes, and that is not real concurrency, so it is not a benefit. Actually, your programs will go slower if they are multi-threaded because of the extra overhead of scheduling. But when you have a true multiprocessor, things actually start going faster, and that is a radical change.
So, in your presentation you call this a really significant paradigm change, one that needs another, better approach to programming paradigms.
Yes, absolutely. I think the other thing that happened was that from about 1986 or 1985 to 2002, sequential processors got faster on average by about 65% per year …
Something like that, a lot.
Yeah, and from 2002 to 2004 that slowed down to 20% per year. And the reason for that was that chips got bigger and bigger and bigger and the clock frequencies got greater and greater and greater, and so in about 2002 we got to a situation where in one clock cycle you could not reach all of the chip.
… the fundamental issue.
Yes, it was predicted by some hardware guys in 2000, and they wrote a paper saying the end of the road for instruction-level parallelism. What that means is that chips from about 2002 actually go slower, because the solution to that problem is not to have one whacking great big chip which gets bigger. It is to break it into small cores on the same chip. So, actually programmers sort of, you know, all through the mid-eighties and nineties they could add more and more bloated features to their code, their code could go slower and slower and slower, and they were saved at the end of the day by a faster clock speed. I mean, you must remember that, you know …
Amstrad CPC …
40 megahertz back in 1985. Now we have two gigahertz.
Sure, yeah.
So, programmers actually did not have to do anything. Now, I do not know if they were aware of that fact. You see, I think when their boss sort of congratulated them, "wow, your new program has a lot more features in it, and, you know, it is not slower, you are an amazingly smart programmer", I think a lot of programmers actually believed that it was due to their small, simple code. They ignored the fact that the processor got faster.
Yeah.
So that trend stops in 2003. What happens then is the sequential processors start going slower. So, if you want …
Really different.
Yeah, and I think this is going to be, I think, let us see what will happen. I think after a year or two years there will be this gap, which people will start to notice. And then after three or four years, funny things are going to happen. In 2009, Intel, according to their roadmap, will bring out the K4 processor, which has got 32 cores.
And some poor programmer is going to be sitting there and they are going to have performance problems with their program. And the boss is going to look at the CPU measurement on the machine and see that it is, let us say, using one of these 32 cores at 100%, and he is going to say, "hm, you are telling me you have got performance problems and yet we are using 3% of the CPU, come on." The problem is, in order to use that other 97% of the CPU, you have to write a true multi-threaded application. Now, you know, Java programmers know all about thread-safe applications. And they also know that when you go from a sequential processor to a parallel processor, the program probably will not work. Well, actually I do not think they do know this. The computer scientists know this, because the programs interleave in a different way and a whole new load of bugs is going to come out.
Okay, so in the book you recently wrote, which is called Programming Erlang, a Pragmatic Programmers title (we will put it into the show notes), you actually started by motivating a new approach to programming, which you called concurrency-oriented programming. And you did not say it is object-oriented with concurrency or functional with concurrency. You tried to coin, or you did coin, this new moniker for a completely different approach. So, can you characterize what this concurrency-oriented programming approach means as a whole? What are the ingredients that make it up?
Yes, it is a style of programming where you use concurrency to structure the application. I mean, that is the simple answer. And it is a style of programming which maps onto the real world, because we live in a concurrent world; anybody who tells you that we do not is an idiot. I mean, wherever you look you see things happening in parallel. People are walking around in a room, cars are buzzing along on a motorway, something like that.
And to model that in a sequential language is a complete nightmare; it just does not make any sense. So, as I see the world, I see objects which communicate. I mean, we are sitting in a room having an interview and, you know, for us to model that, Markus is in one process, Joe is in another process. I am sending messages to him; you know, he has just sent me a message, a visual message. The message was "move the microphone nearer to your mouth", because the microphone was drifting off. That is a message. And we do not have shared memory. If we had shared memory, Markus and I would be Siamese twins and we would communicate, you know, through some neural passage, but we do not work like that. The world does not work like that. And modelling it like that, I think, is a disaster. That is shared memory programming. There are sort of two models of concurrency: this shared memory processing and this message passing processing. So, concurrency-oriented programming, that is the model of what we are doing. And so you have programming languages and then you have modelling languages. If you take the world of objects, the programming languages are object-oriented languages, languages like Java, C++... And the way you model the world, the way you think about the world, is in terms of objects. So, if you go to the Erlang world, the programming language is based on pure processes and message passing, but the way we model it is a concurrency model. We map the concurrency model from what we observe in the real world.
Okay. So, the primary thing here is to share nothing: no shared memory, actors are completely isolated.
Yeah.
However, there is an additional tweak to it, and I mean Erlang is a functional language, which does not have any mutable state. So, there is also this thing of requiring no locks on a much lower level. So, how do immutability and this functional, modify-nothing approach play into this?
The reason that shared memory programming models are difficult is that two parallel processes can access the same area of memory at the same time and make changes to that memory. That is the notion of mutable state. If you have immutable state, then variables in the language cannot be changed once they have been set. Erlang has what we call assign-once variables: having been given a value, they can never ever be changed. That means the process itself cannot change them. But more importantly, it means other processes cannot change them. Since other processes cannot change them, you do not need to lock these data structures. So, everything can run in parallel.
When I was reading the book I actually wondered how important that feature really is. Because if you had mutable state within an actor and used message passing between them, then you would still achieve the actor-level concurrency and the share-nothing property, right?
Yes.
So, I was wondering, well, if I built an actor library on some other language that enforced those properties, then I would not need a functional language that had the immutability concept at the language level, right? Or is there a deeper …
You could do it that way. I mean, I think it is probably a historical accident that in Erlang the individual processes are immutable. And that actually reflects the Prolog history. If it had started off from C or something like that without message passing, we would have still done, let us say, a version of C.
MPI.
Yes, something like that. Yes.
Which is the Message Passing Interface ...
But there are additional benefits to having non-mutable state. So, in the book I say when you debug a program you have an additional benefit, because if you set a variable only once and it has the wrong value when the thing crashed, there is only one place in the program where that can have happened. And if it is getting updated all over the place, you do not know which of these updates was the one that led to the error.
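The assign-once idea can be approximated in other languages. The following is a small Python sketch, not Erlang and not from the book; the `Call` record and the helper names are invented for illustration. A frozen dataclass refuses updates after construction, so any "change" has to produce a new value and the original can be handed to other threads without a lock.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Call:
    """A hypothetical immutable record; fields cannot be reassigned."""
    caller: str
    callee: str


def redirect(call: Call, new_callee: str) -> Call:
    """Build a new value instead of modifying the existing one."""
    return Call(caller=call.caller, callee=new_callee)


def try_to_mutate(call: Call) -> bool:
    """Return True if an in-place update succeeded (it should not)."""
    try:
        call.callee = "someone else"
    except FrozenInstanceError:
        return False
    return True
```

With values like this, a wrong field can only have come from the one place where the value was constructed, which is the debugging benefit described above.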
So, these programs are actually very easy to debug.
Good point. Well, you have these actors, which serve as the active ingredients of a concurrency-oriented program. Now, if you build a virtual machine for Erlang or a compiler, how do you do it actually? Is it compiled, interpreted or …
Yes. Well, it is both. The first version was a pure byte-code interpreter. And that became a 32-bit threaded-word interpreter, and then a group at Uppsala University built the HiPE compiler, which stands for High Performance Erlang.
That is a nice acronym. So, we can hype it.
And so, today what we distribute is, well, actually the machine changed; it became called BEAM, which is Bogdan’s Erlang Abstract Machine. In fact, we had three or four of these machines, all sort of named after ourselves in typical modesty. And yeah, the current version is a 32-bit threaded machine in a rather light source with a native code compiler if you want it.
So, in some sense, and that is what I was trying to get at, the virtual machine of course uses threads, probably thread pools or something, to do things.
The virtual machine takes a couple of threads from the operating system and then it does everything itself. So, the processes are, I mean, Erlang, the runtime system, is much more like an operating system than a normal runtime. I mean, the normal runtime for a programming language is rather small because everything is done with instructions through the compilation process. But in Erlang the runtime system is enormous, because it is doing an awful lot of what the operating system is doing. In fact, the operating system just basically manages files and sockets for us and gives us a large amount of memory when we start. And then we do it all ourselves.
We already mentioned MPI briefly in passing. So, the question is why is everybody doing shared memory, and are you the only guy, or the only community, who does the message-passing thing?
So, the question is whether there are other approaches or other languages, other tools that use the same kind of approach to concurrency.
There are, but I do not think they are as mature as Erlang. Erlang is kind of leading the pack when it comes to message passing concurrency. And I think the reasons for it have actually got nothing to do with speeding things up or multicores, because that is a modern phenomenon. The original reasons have to do with fault tolerance. So, if we go back to 1986, what we were trying to do then was to build a fault-tolerant system. I think software people do not really understand fault tolerance like hardware people do. I used to give a lecture, and I still do give a lecture, where I said, well, if you are going to make a fault-tolerant system you need two computers. And they looked strangely at me. And I said, well, you know, come on, if one computer crashes you are screwed. So, you actually need two computers.
Or more.
Or more, yeah. I mean, you can have three or four or five, because both might crash. And so, let us take a very simple world: it has only got two computers and we want to make a fault-tolerant system. It is pretty obvious that you have to copy all the data from one computer to the other, because you need to do recovery. If computer number 1 crashes, you take over on computer number 2. And that means you cannot have dangling pointers; you must have copied everything you need from computer 1 to computer 2 in case computer 1 crashes. So, that is the reason for copying everything. It has got nothing to do with concurrency; it has got a lot to do with fault tolerance. And so, that is the kind of reasoning behind Erlang. That has an inevitable cost, because people want to say things like, well, I would like fault tolerance to cost nothing in the case where the system does not crash.
But that is just not possible, because if you make a fault-tolerant system you need two computers; you need to copy all the data over. If they then don’t crash, you could have just had a dangling pointer, you could have copied less data, but that will not work in the presence of errors. The additional benefit, which we saw years later, was that once we have copied everything we never need to lock anything, and these two machines can work in parallel on that data. Well, I do not think that thought was in our heads, you know, back in the mid-eighties.
It is interesting to see. We had a couple of episodes on distributed systems, and in distributed systems it is well known that message passing is a good way to do it. Obviously, you cannot have shared memory across the physical world, as in your Sweden-Australia example. So, in addition to sequential Erlang, which is a functional programming language with pattern matching (we will talk about that later), and concurrent Erlang, which allows you to basically spawn every function execution into a separate actor, there is also distributed Erlang. So, how is that different?
The model is not different. That is why we have concurrency; in this model we just have message passing, and it uses what we call location transparency. This is just the idea that if you know the name of somebody, you can send him a message; it is like email or popping a letter into the post. How you actually implement that is a kind of, well, it is not a detail.
You mean how the VM does it.
Yeah, so, there are two alternatives, or three or four alternatives. One way of doing it is to have two Erlang virtual machines running on the same physical processor. These are separate address spaces with processes and everything. Then the programmer himself manages the namespace.
He says, I want to start a process on node number 2 or on node number 1. And you can have as many nodes as you want, which is convenient because you could put, let us say, ten nodes on one processor, one sequential processor, and you can test a distributed application. Then you can deploy it as a real distributed application by moving these nodes to physically separated computers, possibly in different countries. That is one way of modelling distribution. The other is that you map these nodes automatically onto the cores of a multicore computer. And then you do not need to know about the nodes, and this is a completely different version of the Erlang scheduler, which would just schedule randomly onto one of these; well, it is not random, it is a round-robin scheduler. But the details do not really matter. And in the implementation we will improve the algorithms and things. So, it is not a different version of Erlang. The Erlang programmer sees a world where there is only one node, and the virtual machine maps processes onto the physical cores in the CPU. But all this makes virtually no changes to the program.
That is the nice thing actually. I mean, if you want to evaluate a function in another actor, you basically say spawn and then you specify the function. And if you want it on another node, you say spawn, the function, and then add the name of the target node. That is …
Yes. And all the data in the closure just goes over to the appropriate node; we do not have to do anything.
Nice. Okay, so let us leave the concurrency aspect behind for a moment and let us look at the fault tolerance aspect. By the time we broadcast this, we will have had the double episode with Bob Hanmer on fault-tolerant systems. So, you listeners should know what fault-tolerant systems are all about, and some of the patterns. And so, in Erlang there is this thing called let it crash, let it die.
Yes.
So, what is your model of fault tolerance there?
The model of fault tolerance, the fundamental model, is that of handling errors remotely, not locally. The reason for that is, if you go back to our two-computer situation, we have two computers and one of them crashes. The error must be handled on the second computer. If the first computer crashes, we cannot handle it on the first; there is no computer to handle it on. So, this introduces very early on the notion of remote error handling. So, if process 1 crashes, process 2 will fix it up. So basically, we try not to fix up the error in process 1. Okay, there is a try-catch to catch exceptions inside the process, just in case it is a trivial error and you can deal with it. Then you do so. The model we built is basically, what is provided in the language is a reporting mechanism. So, if one process dies, another process gets an error message that says hey, this process died.
You can link the processes, with one being the supervisor of the others.
That is right. So, it is all like a human organisation.
Again.
Yeah. Because it is very easy to understand. We have workers and we have supervisors. The workers do the jobs and the supervisor just watches the workers. If a worker dies, the supervisor gets informed and he starts a new worker. So, the workers are not supposed to fix errors. You know, we even have programming rules; we tell them they are not allowed to fix errors. This makes a very, very clean programming model. So, it is strange: in all languages apart from Erlang, you are encouraged to do defensive programming. In Erlang, we are not encouraged to do defensive programming. We have what people call happy-case programming; we only program the happy case, what the specification says the task is supposed to do. You see, one of the troubles which I met early in programming was when writing code from a specification: the specification says what the code is supposed to do.
It does not tell you what you are supposed to do if the real-world situation deviates from the specification. This is an enormous class of things. It does not tell you what to do if it does not follow the spec. So, what do the programmers do? They take ad hoc decisions. This is crazy, this is lunacy, and this leads to about 30, 40, 50% of all code. It is there as checking code to allow for the cases which are not in the specification, where the specification does not tell you what to do. So, the Erlang approach is let it crash. If you deviate from the specification, you crash. And the supervisor process detects this. So, the workers are going along and, you know, it is like a really top-down, strict, brain-dead organization where the workers encounter a situation and they do not know what to do. And so, they die, and the supervisor gets the message: hey, I have died because this thing happened that is out of spec. So, what does the supervisor do? He makes sure all the invariants are obeyed. The invariants are things like closing files, closing sockets, cleaning up behind you. So, it does not matter how an error occurred; it is irrelevant, because we put it right with the invariants afterwards. Now, the error goes into the error log. When we look at the error log, we can say okay, what happened: this worker gave up because this funny thing happened. And then the system was put back to rights again. And then we can make a design decision. Do we wish to code for this case, or is it so rare that it is never going to happen again?
Yeah. And that brings us nicely to the next point. So, if this thing really has a, well, to use the correct terminology, I think, fault somewhere in the code, you might want to replace this module, actor or whatever. So, if you want to run the system forever, you need this dynamic, on-the-fly update ability. That is something else that Erlang brings?
Yes.
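The worker and supervisor arrangement described above can be sketched in a few lines. This is an illustrative Python approximation, not OTP and not Erlang's link mechanism: `happy_case`, `run_worker` and `supervise` are hypothetical names, threads stand in for processes, and the wrapper that reports a crash plays the role that the Erlang runtime plays when it notifies a linked process.

```python
import queue
import threading


def happy_case(job: int) -> int:
    """Worker logic: only the specified case, no defensive checks."""
    return 100 // job  # job == 0 is 'out of spec' and simply crashes


def run_worker(job: int, mailbox: queue.Queue) -> None:
    """Stand-in for the runtime: report the worker's exit to the supervisor."""
    try:
        mailbox.put(("done", job, happy_case(job)))
    except Exception as reason:
        mailbox.put(("crashed", job, repr(reason)))


def supervise(jobs):
    """Start a fresh worker per job; log crashes instead of repairing them."""
    mailbox = queue.Queue()
    results, error_log = {}, []
    for job in jobs:
        worker = threading.Thread(target=run_worker, args=(job, mailbox))
        worker.start()
        status, job_id, payload = mailbox.get(timeout=5)
        worker.join(timeout=5)
        if status == "done":
            results[job_id] = payload
        else:
            # The supervisor does not try to fix the error; it records it
            # and carries on with a new worker for the next job.
            error_log.append((job_id, payload))
    return results, error_log
```

The worker contains no error handling of its own. Whether the out-of-spec case deserves real code is a decision that can be made later from the error log, as described in the interview.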
And in fact, here is a funny thing. When you have deep concurrency in your language, lots of processes, things like code upgrade actually become easier than in a sequential language because, I mean, let us imagine we have got a million processes and, you know, a thousand of them are executing code that has got some little bug in it. But the other 999,000, we do not want to bother those guys, okay. So, they can just carry on doing what they are doing. If we can track down the 1,000 processes that have got the buggy code in them, we can change the code in those ones and leave all the rest alone. If you only have one process, that is a problem.
Yeah. So, one impressive number, I do not know where I read it, it was probably in the book, was that using this approach you achieved nine nines for some kind of switch system.
Yes, this was an Ericsson, I think, ATM switch called AXD 301. That figure of nine nines has kind of floated around in a lot of blogs and does not represent an average behaviour; it represents a best-case behaviour that was observed once, and British Telecom observed this and reported it. When we asked them, I think, for detailed figures, you know, was there a systematic study, no. So, I mean, it is more apocryphal than hard science. But it is the ... It is a sort of top of the mountain. I am not saying that all the systems run like that. But aiming at seven nines or something like that, you know, is realistic.
So, since we talked about actual real achievable performance, let us briefly go back to the concurrency thing. What is the scalability? Do you really get linear scalability? How much does this depend on your program or on the structure of the program? What are some of the best practices to make a program scalable?
Yes, right. This is very early days because, to start with, there are not many multicore processors with a large number of cores.
The largest one you can get your hands on is probably the Sun Niagara, which has got eight, no, I think up to 64 cores now. But I have played with the eight-core version with four hyperthreads per core, which Sun says is equivalent to 32 hardware threads. Now, what we see there is that programs that have a large amount of computation and a small amount of data in the messages scale linearly up to about, let us say, 20 processes; sorry, not processes, 20 cores. When we go up from 20 to 32 cores, the curve flattens off a bit and I don’t think we understand why. You know, somebody might understand why, and we have to investigate the cache behaviour and things like that. But that is very encouraging. Then, at the entry level to concurrent programming, with two cores, I mean, you cannot really investigate; you do not get many plots on a graph when you have got two points. So, you cannot see if it is linear. But it is very encouraging. Ericsson has shipped a system deployed on a dual-core AMD, which was virtually twice as fast as the one that was running on a single core. And everybody was actually delighted. I mean, so were the programmers, because, you know, programs in Erlang were sort of ready for concurrency. So, when they ramp up the number of cores, we tweak the program a bit, plug it in and measure how much faster it is. So, we see this as a great commercial advantage.
And to make it concurrency-ready basically means that you should have as many actors as possible, because if you have few processors or cores, well, they are just scheduled onto those, and if you have many, it is just going to fill them up.
But it is not only that. You have to get rid of the sequential parts. There is something called Amdahl's Law, which is very easy to understand. I mean, for example, if 10% of your application is sequential, then you will never be able to get faster than ten times. And the reason for that is, if you think about it, 10% is sequential, 90% is concurrent.
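The Amdahl's Law arithmetic for the 10% example can be checked with a few lines of Python. This is only a worked illustration; the function name `amdahl_speedup` is made up here, and the formula is the standard statement of the law, speedup = 1 / (s + (1 - s) / n) for sequential fraction s and n cores.

```python
def amdahl_speedup(sequential_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only the parallel part scales with cores."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / cores)
```

With 10% sequential code, 32 cores give a speedup of roughly 7.8, and no number of cores gets past 10, because the sequential tenth of the original running time is always left over.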
But this 90% can be made to shrink down to zero with enough processors. You are left with the 10%, which you cannot do anything about. So really, chasing the sequential code out of your application is the thing you have to do to make it go really, really fast.
And of course, you can only do that if the overhead of making something concurrent is very low. So, and that is the whole point.
That is the point, yes. I think you are absolutely right that that is a key observation, because on the chips there are two types of concurrency: there is instruction-level parallelism, which is deep pipelines, and then you have got thread-level parallelism, and that maps onto multicores. So, there is not actually another parallel mechanism, parallelility, that sounds awful, but there is not another mechanism for parallelism there. So, really now, since you are doing thread-level parallelism, the programmer has got to make the decision: is it worth the effort of spawning a process on a new physical processor? Because if it is a very small computation, it is not worth the effort of sending it off to another processor. And I think only the programmer can make that judgement. And I am pretty sure that is right, because right since the 1960s there have been research programs to automatically parallelize programs, chiefly Fortran for supercomputing applications. And these projects have never succeeded. Well, they succeeded in a minor sense; they have managed to speed things up by 15%. And that is by saying, okay, the compiler, the analysis program, figures out what we can do in parallel. I think the programmer is much, much better at doing that. They know, or they should know, a good programmer has a performance model in their head, and they say, well, I know this is a big computation, I know it is isolated from that, so I am spawning it off in a separate process. It is just that the nuts and bolts of doing it in C or Java are extremely difficult; it is painful. But it is very easy in Erlang.
So, when I read the Erlang book, and even more when I was in your talk the other day, I got the impression Erlang is not really a language, it is a system, an operating system with a language of course, but there is a lot of infrastructure that makes all these things possible. It is almost a little bit like an application server. I mean, if you compare that to some of the enterprise applications, you will always see many commonalities. But in order to build real systems you actually do deliver and package additional infrastructure … Let us talk about the OTP framework.
I think we have an unfortunate naming here. Actually, I do not think I have ever said Erlang was not a language. Erlang is a language.
I said that was my impression.
Yeah.
So, what is OTP and what is the runtime system?
I think, to give you an analogy, if you said, well okay, Erlang is like C, then the runtime system is like Unix and OTP is like Ruby on Rails; it is an application framework. You know, it has just got a very unsexy name. I mean, if we just called OTP "OTP on Rails", you know, everybody would be using it. So, OTP is an application framework and it has got all the kinds of goodies you need to build fault-tolerant systems. So, we are familiar with application frameworks; things like Ruby on Rails help you make web applications. But we are not familiar with application frameworks that help you make unsexy things like fault-tolerant software that runs forever. So, that is what OTP is. It is Erlang on Rails, whatever you like, but for building fault-tolerant applications. And the runtime, that is the operating system.
So the ingredients are a database, I guess, and probably some kind of server component.
So, the main ingredient, as you said, is a database. And that is a very interesting database. It is tailored for real-time use. I mean, it reflects its heritage. It was done for telecom applications for Ericsson. We use it in telecom applications. So, it is RAM-resident. And that is not strictly true.
It is, you can define tables in it, it is a DBMS. It is rather unique, I think, and once you have defined the tables you can specify where physically the tables are and how they are stored. For instance, you can say I want a RAM-replicated table on two machines. And that is for fault tolerance. So, you might RAM-replicate on two machines and disk-replicate on one machine. So, you spread the tables over three machines or two machines, or you might have a fragmented table you spread over 50 machines if you have a colossally large dataset. It is very flexible. And of course, it understands Erlang data terms. So, it is an object database. Yes, you can put in any tuple. I mean, you do not have to serialize something. You know, if you want to store an XML parse tree in the database it is a single instruction and bang, in goes your XML parse tree. I mean, there is no messing around, it is an object store. Okay. So, then let us look at the other extreme. We have looked at all the infrastructure, distribution, concurrency, blah blah blah. Let us look at sequential Erlang. Erlang is a functional language. Can you give us a little bit, some pointers to what some of the interesting features are? It is a dynamically typed functional programming language. So, functional programming is split into a number of different schools. All functional programming languages are based on the idea of immutable state. Immutable state is another way of saying they have no side effects. And you know, what you learned at school was that side effects, especially when they are hidden, are evil. So, that is why. You know, that is what you do in Java all the time, you hide them away inside methods, and so your programs crash. Functional programmers do not believe in that. They do believe in something called the lambda calculus. The lambda calculus is a pure calculus for how you perform a computation. That has spawned off several kinds of families of functional programming languages. 
You have the polymorphic strongly typed languages, like Haskell and things like that. And then you have got the languages without a static type system; they are still safe, languages like Scheme and Erlang. So, Erlang is in this dynamically typed part of the functional programming languages. But it still has all these notions of higher-order functions and of immutable state. So it is a proper member of the functional programming language community. So, we see different functional languages, which I think are appropriate for different tasks. Can you give us a very small intro to what pattern matching is, because that is a concept that Java folks and probably mainstream people do not know. Yeah. Pattern matching is, we do not use if-statements and case statements, well, we do actually, but to a much lesser extent than in a normal programming language; we just write patterns that match. That is in the function declaration? That is in the function declarations. There are different variants of the same function with different patterns that match. Yes, it is a way of doing polymorphism. So, sort of virtual polymorphism in some sense. Yes. Now, what the pattern matching compilers do, they take these patterns and then actually compile them into trees of if-statements and switch statements and things. But they do it probably in a way that is more efficient than you or I can do. They optimise these things. The main consequence of pattern matching is that the programs are extremely short. Because the programmer does not have to think about, do I write a switch statement or a case statement or an if-statement, they do not write these things. They just write the patterns and the compiler produces these things for them. Okay. So, I think that was all the technical stuff. 
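To make the pattern matching remarks concrete, here is a minimal sketch in Erlang. The module name `shapes`, the function `area/1` and the tuple shapes are made up for illustration and do not come from the interview. Each clause is one "variant of the same function with a different pattern", and no if or switch is written by hand.

```erlang
-module(shapes).
-export([area/1]).

%% Hypothetical example: one function, three clauses.
%% The first clause whose pattern matches the argument is the one that runs;
%% the compiler turns the patterns into the necessary tests.
area({rectangle, Width, Height}) -> Width * Height;
area({square, Side})             -> Side * Side;
area({circle, Radius})           -> math:pi() * Radius * Radius.
```

Calling `shapes:area({rectangle, 3, 4})` selects the first clause and returns 12. A tuple that matches none of the clauses raises a `function_clause` error instead of silently falling through.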
Let us look a little bit, to wrap up slowly, at how the language was designed. I kind of thought that, you know… Sorry, "designed" is a kind of … Okay, how it happened that there was, yes, you know, you were kind of visionary in some sense about concurrency … Okay, so what I am trying to get at is, well, maybe if it was not designed, we should not call it a process, so, what was the sequence of accidents that led to Erlang? Well, as I said, we started off; I joined this really newly started computer science lab at Ericsson back in about 1986. And we just got, I think we were the first people in the company to get a Unix system, because people said oh, you cannot use Unix, that is a sort of virus that must not be connected to the rest of the corporate networks. Nobody believed in Unix in those days. And we had a VAX 11/750. So, basically we interpreted computer science as just programming telephony in every programming language that had a free implementation for the VAX 11/750. That was our definition of computer science. And we just sort of tinkered around, so this was not a process. All we actually did, we programmed the same problem, which was just POTS. We had a little hardware telephone exchange, a PABX, which we controlled from these different languages, and we wrote telephony in Ada, in Concurrent Euclid, in Prolog, in Smalltalk, in ML, in all these languages. And we actually published some research on this and we just saw how easy it was to program. So, what happened then was, I was sort of drafted into this project and everybody was doing their own little ways of programming. So, I used Prolog and added processes to it. And this very slowly sort of, you know, one day it had got a name, you know, we actually said well, this is sort of a little programming language. And then we thought oh, let us try it out on some users. And we managed to find some users and they said ah, this is great, I mean, they liked it, they were really enthusiastic. 
And things kind of dithered along and, yeah, they liked it. It was not fast enough, they wanted features, we did not have the features. So we grew the language. And in that way, I mean, that was just me and them. And you know, a complete write and re-write of the system took a day or 20 minutes. It was not long. And you know, my boss Björn used to say, what is Erlang? It is whatever Joe defines it to be today. There was never a big specification. All this kind of, when I give a lecture and say here are the requirements for the language, that is an after-the-fact construction. That is what we do afterwards, to explain what we have done. We never had a set of requirements. Well, implicitly you had ... We had it implicitly. I think the culture, you know, we worked for a big telecom company, the culture, the requirements were sort of seeping out of the walls. They were in the culture. And so, we built this, we tried it out on some users. For about a year I worked on it alone and then Robert Virding joined me, and Robert as usual said: Can I make a few changes to your program? I said yeah, sure, you know. And then, typical Robert, he rewrote everything anew. I think what was left of the program after he made a few changes was one comment. It said Joe started this. He changed every single line of code and then there were two of us. And then about a year later Mike Williams came along and then there were three of us. And at that stage they wanted it to go faster. And they decided that they could deploy Erlang if it was 80 times faster. 80. 80. Eight zero. Because this was a Prolog interpreter at the time, very slow. So, we kind of scratched our heads over how we could make it faster. And we decided to then do a byte code interpreter. And that is written in C, I guess. Well yeah. So we, at that stage I started writing a byte code interpreter, I wrote it in Prolog. And of course, it did not get faster but slower. 
So, the Prolog version went at about a thousand inferences per second, and it went down to four inferences a second. I said ah, the cunning plan would have to be to replace the slow interpreter with a C program. And then I started writing. I had never written in C before, I used to write in Fortran, until Mike saw my C and wanted to puke and said: Joe, that’s the worst C I have ever seen in my life, I guess. Try to make it look like a program. Yeah, and so Mike came along and he wrote an emulator in C. Mike is pretty good at C. So, he really, I think he and I sort of, he taught me quite a lot about C and Mike had a lot of experience with concurrent languages. And so he said well, the performance of a concurrent language is determined by three things: it is the context-switching time, it is the message-passing time and it is the time to create a process. So, these are fundamental. You know, in any programming model you start with something fundamental. And you implement that very very very very well. Then you build your tower on top of it. And so, what Mike did in the emulator there was, he wrote the C and then he turned on the optimiser and he printed out the assembly and studied it many times and did it again and again and again. But for these kinds of low-level virtual machines, it makes sense to tweak the details because it scales, kind of. And so we tweaked the details and then we made the first byte code interpreter. And that was actually, that was before the JVM and things like that, that we deployed it. And it, I cannot remember what it was, it was more than 80 times faster, it was a hundred times faster, something like that. And the product team said they had revised that, they wanted it 280 times faster. And that started this long road of sort of re-writing and making it faster. That carried on until it was turned into a register machine and Bogumil Hausman took over the … That is the Bogdan in BEAM. Yes, that is right. And he re-engineered the machine as a register machine. 
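As a rough illustration of two of the three costs mentioned above (time to create a process and message-passing time), the following Erlang sketch times them with `timer:tc/1`. The module name `prims` and all function names are hypothetical, the numbers depend entirely on the machine and runtime, and this is not the measurement method used by the Erlang team.

```erlang
-module(prims).
-export([spawn_time/1, round_trip_time/1]).

%% Average time in microseconds to create one process, over N spawns (N > 0).
spawn_time(N) ->
    {Micros, ok} = timer:tc(fun() -> spawn_many(N) end),
    Micros / N.

spawn_many(0) -> ok;
spawn_many(N) ->
    spawn(fun() -> ok end),      % a process that exits immediately
    spawn_many(N - 1).

%% Average time in microseconds for one ping/pong round trip, over N trips.
round_trip_time(N) ->
    Pid = spawn(fun echo/0),
    {Micros, ok} = timer:tc(fun() -> ping(Pid, N) end),
    Pid ! stop,
    Micros / N.

ping(_Pid, 0) -> ok;
ping(Pid, N) ->
    Pid ! {self(), ping},        % send a message to the echo process
    receive pong -> ok end,      % wait for its reply
    ping(Pid, N - 1).

%% Echo process: answers every ping until told to stop.
echo() ->
    receive
        {From, ping} -> From ! pong, echo();
        stop -> ok
    end.
```

For example, `prims:spawn_time(100000)` and `prims:round_trip_time(100000)` each return an average in microseconds. Each round trip also includes scheduling between the two processes, so the second figure partly reflects context-switching cost as well.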
And we got researchers in. So, Claes Wikstrom joined after four years and added things like the real-time database. And he added distribution; we had always planned distribution but we never actually implemented it. I mean, it seemed, you know, it is all just messages and things, so it must be easy to implement, and because it was easy to implement we never actually did it. And it sort of grew from this very rapid prototyping cycle where it took a day to re-write the complete system. And now, it must be up to, I hate to say it, you know, six months or something. Okay, to wrap this thing up, why do you not give us a little description of what the community is today, how the book is going, what is behind the hype these days? I do not know. You know, I mean …, it is crazy. I mean, if you look at Reddit and things like that, you will think there were only two languages around. But I mean that is among people who can get a number of programming languages. People seem to be flocking to the language, I mean, if you just search for Erlang blogs, you know, go to Google and do a blog search on Erlang, you will find sort of four new ones a day. What I am saying, I am not saying they are all good, some of them are extremely good, some of them are kind of not so extremely good, but that is not the point. The point is the interest that is there. And then we have got one or two flagship applications; you know, telephony applications and databases aren’t sexy stuff, they are kind of boring. I think the flagship application is ejabberd, which is a Jabber server. And that was written completely in Erlang. I think the exciting thing is, can Erlang spread outside its community, can we see spontaneous growth from people who just picked it up off the net? And yes, we are seeing lots of that, we see a lot of companies being formed that use it. 
I think, we see a lot of companies using it and making money, we see a lot of companies that put Java competitors out of business. So, people know about that, the word is spreading. Venture capitalists know about it. So, they fund companies. That is very exciting. I saw this blog post by Ralph Johnson who suggested that Erlang might be the next Java. So, it seems to be taking off really. I do not know if that is true. I also found it a little bit optimistic ... I think, I was thinking about it. And I think Erlang might be kind of, in the way that languages are, it might be like Smalltalk. Because, what happened with Smalltalk? Smalltalk was the language that did object-orientation right. They got it right. And the good ideas were taken from Smalltalk and grafted on to C and caused the creation of languages like Java. Now, Smalltalk always had this passionate community that said well, we do object-oriented programming right. And all these other people have just stolen a few of the ideas and, well, not stolen, they have included these ideas in their languages. So, what might happen with Erlang is that it has this passionate community that says well, we do concurrency-oriented programming right. And then other people graft these ideas into their languages. So, you might see a Java++ that has the Erlang concurrency model. And you might have heard about the Scala people who put an actors framework into it and then got really good concurrency. Yeah, I mean that is, but I think they have really to, I do not know how they have done it. Well, they also have closures and some of the pattern matching ... But I think there would be a case for re-engineering this; as I said earlier, in the Erlang virtual machine you have to start with the process-spawning mechanisms, with the context-switching times, with the message-passing times, really low level in the virtual machine. And then you build that up. 
If you do not have instructions in the virtual machine to do that, you have to emulate it. That is very inefficient. So, if you want to re-engineer a virtual machine, like the JVM or something like that, adding instructions to create processes and things and then building something on top of that, I do not think you could graft it into the existing infrastructure. I mean, the trouble with Java is, if you say well, let us make Java++ so that we can use all the libraries, but the libraries are using shared mutable state, that is not really going to work. But re-engineering it could work. Well, as long as it has curly braces, I think that is going to be the next big language. Okay, did I forget to ask anything you want to say? The bit syntax. Oh, the bit syntax. No, this is really cool because it is a DSL, which is my kind of pet peeve, a DSL for parsing bits in a byte stream. So yeah, give us the heads-up. Yeah, the bit syntax is something that was added to Erlang as an afterthought. Tony Rogvall, I think, did it as well. And what they observed was that a lot of the time they were just unpacking data structures, I mean unpacking bits out of words. So, you know, if you ever tried to extract seven bits out of the middle of a 32-bit word, you do a sort of x shifted by eight bits and masked with 0xff, now what the heck is that, you know how you do that? No, I cannot do that. Yeah, it is horrible. And, of course, you get it wrong. Actually my approach would really be, since I am into this DSL stuff, to write a DSL that specifies exactly what I want ... So, that is exactly what we did. We extended pattern matching over bit fields. So, now you can say, you know, this is seven bits, this is three bits, this is two bits, this is nine bits, and that gets compiled into pretty efficient code. So, in the book, my favourite example from the book is, of course, an MP3 streaming media server, which implements the Shoutcast protocol. 
And what we have to do there to get it right is to synchronize the server with the actual MP3 frames. And this is just about as low level as you can get. But it is written in Erlang, which is a high-level language. So, you just define the bit pattern, which defines the sync frame in MP3. And then you just trundle down the data looking for this. And once you have found it, you just stay in sync; the code is really easy to understand. It is a nice example of something where you put the specification on one side and you put the code next to it, and it is isomorphic. So, it is very easy to understand. When I was in your talk, I think Magnus Christerson from Intentional was also there. We looked at each other and said hey, this is a really good example for a DSL … Yes, and I think this is one of the sort of things that can be grafted onto a lot of programming languages. And I always wondered, why do regular expressions go over bytes, why not over bits? It is very easy to make a regular expression match when the match goes over bits. This is very useful. Absolutely, especially if you are in this kind of low-level area. Okay Joe, thank you very much for being on the show, I appreciate it, it was very interesting. Thank you. |
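The bit syntax discussion can be sketched in Erlang as follows. This is not the code from the book: the module name `bits`, the field layout in `fields/1` and the simplified sync test are assumptions made for illustration. In particular, `find_sync/1` only looks for 11 set bits at the start of a 4-byte header, which is a much weaker check than a real MP3 frame validator would perform.

```erlang
-module(bits).
-export([fields/1, find_sync/1]).

%% Split a 32-bit word into fields of 7, 3, 2, 9 and 11 bits
%% (a hypothetical layout), with no shifting or masking written by hand.
fields(<<A:7, B:3, C:2, D:9, Rest:11>>) ->
    {A, B, C, D, Rest}.

%% Return the byte offset of the first 4-byte header that starts with
%% 11 set bits, or not_found if there is none.
find_sync(Bin) -> find_sync(Bin, 0).

find_sync(Bin, Offset) ->
    case Bin of
        %% Skip Offset bytes, then require the 11-bit sync pattern
        %% followed by the remaining 21 header bits.
        <<_:Offset/binary, 2#11111111111:11, _:21, _/binary>> ->
            Offset;
        %% No sync here, but at least one more byte exists: move on.
        <<_:Offset/binary, _, _/binary>> ->
            find_sync(Bin, Offset + 1);
        %% Ran off the end of the data.
        _ ->
            not_found
    end.
```

For example, `bits:fields(<<16#FFFFFFFF:32>>)` returns `{127, 7, 3, 511, 2047}`, and `bits:find_sync(<<0, 0, 16#FF, 16#E0, 0, 0>>)` returns 2. The patterns read almost like the field-by-field description of the format, which is the "specification next to the code" property described in the interview.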