August 16, 2016

SE Radio 266: Charles Nutter on the JVM as a Language Platform

Venue: Skype
Charles Nutter talks to Charles Anderson about the JRuby language and the JVM as a platform for implementing programming languages. They discuss JRuby and its implementation on the JVM as an example of a language other than Java on the JVM.

Transcript

Transcript brought to you by innoQ

This is Software Engineering Radio, the podcast for professional developers, on the web at SE-Radio.net. SE-Radio brings you relevant and detailed discussions of software engineering topics at least once a month. SE-Radio is brought to you by IEEE Software Magazine, online at computer.org/software.

* * *

Charles Nutter: [00:00:35.08] Hello, this is Charles Anderson for Software Engineering Radio. Today we are talking to Charles Nutter from Red Hat. Do you mind if I call you Charlie?

Charles Anderson: [00:00:42.22] Yes, that’s just fine.

Charles Nutter: [00:00:43.12] Great. At Red Hat, Charlie works on JVM languages, focusing on JRuby. He has also worked on JRuby for Engine Yard and Sun Microsystems. He has been a Java Virtual Machine enthusiast since Java 1.0. Charlie works to make JRuby the best Ruby implementation for high-performance big data and heavy loads, and to use the lessons learned from JRuby to help the JVM and other languages that run on it meet their potential.

Today we will be talking about the JVM as a language platform. Welcome to Software Engineering Radio, Charlie!

Charles Anderson: [00:01:17.00] Thanks, I’m glad to be here.

Charles Nutter: [00:01:18.17] Let’s start out at the very highest level. What are the components of the Java platform?

Charles Anderson: [00:01:28.10] I always think about the Java platform as three pieces. There is the Java language — or whatever language you happen to be running on top of the platform — there are the libraries that have been written for the platform — we can include the JDK libraries themselves, Java.whatever, and all the other libraries that are out there; there are tens of thousands of libraries people have written — and then probably the most important part of that is the JVM, the third piece, which makes everything run. The language and the libraries may be the same across different runtimes, across different JVMs, but each of the JVMs runs things a little bit differently, with different goals in mind.

Charles Nutter: [00:02:05.00] What are the attractions of developing on top of the JVM?

Charles Anderson: [00:02:09.23] Like any good managed runtime, the JVM provides garbage collection and automatic memory management, and it provides a JIT that can turn JVM bytecode into optimized native code. The wealth of libraries out there is a big part of it. Folks that use JRuby often will come to us because there’s a JVM library that doesn’t exist in the Ruby world. With those three things we have a good selling point for building languages on top of the JVM.

In some cases (like with Ruby), the original language is also not multi-threaded, or it’s multi-threaded but not really parallel. That’s something else we get out of the JVM: it’s been multi-threaded and truly parallel for years, and it’s really battle-tested, so bringing Ruby to the JVM also gives it real native threads.

Charles Nutter: [00:02:58.11] Do you off-hand have a guess as to how many languages there are besides Java on the JVM these days?

Charles Anderson: [00:03:05.01] We’ve tried to track this a little bit over time; there’s no canonical list of all these languages, but people periodically do put together lists. There’s a couple out there on Wikipedia, and it’s definitely in the neighborhood of hundreds. It was over 200 registered or known language implementations when we started hitting JRuby hard and working on it hard in 2006, so there’s more now. It continues to grow over time. It’s probably 250-300 languages that are available.

Charles Nutter: [00:03:37.02] Is there any taxonomy at a very high level one could apply to the languages?

Charles Anderson: [00:03:45.00] Every language that gets written for the JVM is going to be a managed environment. There are implementations of languages like C and C++ that can compile to the JVM, but generally it’s going to be managed languages with memory management taken care of for you. There are also statically and dynamically typed languages. A statically-typed language is certainly easier to get onto the JVM, since it’s a statically-typed runtime, but there are a lot of dynamic language implementations, like Ruby and Groovy, and I think Clojure qualifies as a dynamic language as well. Outside of that, it’s the same as taxonomies for languages on other platforms. There aren’t so many lists out there that show all of the different languages, but most of the interesting ones are ones people have heard of, like Scala and Clojure and JRuby.

Charles Nutter: [00:04:40.21] Yes, later on I’ll ask about those. I think of those as professional languages, and then there are languages that are aimed at research projects, and then hobby languages, given that there are hundreds of them on the JVM, many of which we’ll never hear about.

Charles Anderson: [00:05:03.15] And there are toy languages and experimental languages all over the place on the JVM, too. It’s a lot of experimentation. It’s a nice platform to target for a language because of all the benefits it gives you, and it’s easy to get a language going.

Charles Nutter: [00:05:20.18] When the JVM was created, was it intended to support multiple languages?

Charles Anderson: [00:05:27.03] If you go back and look at the early revisions of the JVM specification, they did have language in there saying, “Perhaps in the future we may extend the specification to support more languages, or to have features that newer languages might need”, so they were thinking about it from the beginning. None of those enhancements actually happened until about 2007-2008, when we got Java 7 work going on with invokedynamic, making dynamic dispatch actually a fast and optimized feature of the JVM. It took a while for some of those extended features to come along, but I think they knew this platform was going to be good for languages, even in the beginning.
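To make the invokedynamic remark concrete, here is a minimal sketch using the `java.lang.invoke` API that arrived with Java 7. It does not emit an invokedynamic instruction; it only shows two of the building blocks: a method handle looked up by name at run time, and a call site whose target can be relinked. The class `DynamicDispatchSketch` and its methods are made up for illustration and are not taken from JRuby.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class DynamicDispatchSketch {
    // Find a no-argument String method by name at run time and call it.
    // A dynamic language has to do this kind of lookup for its method calls.
    static Object callByName(String receiver, String methodName, Class<?> returnType) {
        try {
            MethodHandle handle = MethodHandles.lookup()
                    .findVirtual(String.class, methodName, MethodType.methodType(returnType));
            return handle.invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException("dispatch failed: " + methodName, t);
        }
    }

    // A call site whose target can be swapped later, the way a runtime can
    // relink a call after a method has been redefined.
    static String retargetDemo() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType noArgsToString = MethodType.methodType(String.class);
            MethodHandle upper = lookup.findVirtual(String.class, "toUpperCase", noArgsToString);
            MethodHandle lower = lookup.findVirtual(String.class, "toLowerCase", noArgsToString);

            MutableCallSite site = new MutableCallSite(upper);
            MethodHandle invoker = site.dynamicInvoker();
            String before = (String) invoker.invokeExact("Ruby");
            site.setTarget(lower); // relink the call site to a different method
            String after = (String) invoker.invokeExact("Ruby");
            return before + "/" + after;
        } catch (Throwable t) {
            throw new IllegalStateException("retarget failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("Ruby", "length", int.class));
        System.out.println(retargetDemo());
    }
}
```

The point of the second method is that the same invoker handle ends up calling a different target after `setTarget`, without the caller changing.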

Charles Nutter: [00:06:10.10] Have Sun and now Oracle been very supportive of languages other than Java on their JVM? Or has it been more a matter of politely tolerating them?

Charles Anderson: [00:06:21.15] It was probably politely tolerating or just ignoring up until 2005-2006. At that point, folks like John Rose and Tim Bray, who were language advocates and working at Sun Microsystems, decided it was time for us to start doing a little bit more. That’s when the invokedynamic work started, that’s when Tom Enebo, the other JRuby co-lead and I were hired by Sun to work on languages (and specifically JRuby). Since then, the support has been really tremendous.

JRuby is used as a regression case for new versions of Java, new JVM features they’re working on, they use it for performance testing, they use it for punishing the JVM with some of the more interesting tricks that we use, and they’ve been working independently on language support within Oracle. They have done a JavaScript implementation that is extremely fast. They’re working on new language runtimes to make more dynamic languages work fast on the JVM. I’d say they’ve been very supportive, and it’s been a great ten years to work on this stuff.

Charles Nutter: [00:07:30.02] That surprises me. I would have thought, if anything, it would have been kind of the opposite: that in the early ‘hippie’ days under Sun it might have been a little freer, and then under the ‘corporate’ days of Oracle it would be a little more locked down. But good news there.

Charles Anderson: [00:07:46.29] In the late ’90s and early 2000s, Java was still considered a cool language by a certain number of developers out there. It wasn’t as reviled as it is these days, and they were fine using Java. Towards the middle 2000s people started to realize, “Okay, there are a lot of other cool languages out there. There are a lot of features from new languages that we’re not getting in Java”, and both Oracle and Sun recognized at that point that the platform can be more than just Java.

Charles Nutter: [00:08:19.20] That makes perfect sense, now that you mentioned those timeframes; it makes perfect sense in the rear-view mirror. As a case study for developing a language on the JVM, let’s look at JRuby. You were just saying that in a way Oracle uses it as a case study for testing. For our listeners who might not be familiar, what exactly is JRuby?

Charles Anderson: [00:08:46.24] JRuby is literally a Ruby implementation on top of the JVM. We tried to keep it mostly pure Java, pure JVM bytecode, but there are some features that have native libraries that we call out to, specifically for integrating with the platform. We’ve tried to do as much as possible to make it look and feel and behave just like regular Ruby, all the way from the command line up through the libraries that are available. It’s Ruby on the JVM.

Charles Nutter: [00:09:16.19] Can you give us a brief history of its development? You mentioned beginning working on it in 2006.

Charles Anderson: [00:09:25.15] JRuby the project has actually existed for a bit longer than that. The first commit is in fall of 2001. Initially, I think they were mostly trying to get a good Ruby parser on the JVM, perhaps for tooling and other purposes like that. Over the years, they were like, “Okay, we’ve got a parser. Maybe we can actually build a runtime for this.” When I got involved in the project in 2004 and 2005 it was roughly compatible with Ruby 1.8.6. It was missing many features. It didn’t run most standard Ruby code that was out there, and I thought there was a lot of potential in this. I thought we could make this be one of the best Ruby runtimes, or the best Ruby runtime.

In 2005 I started working on it hard. My friend and co-lead Tom Enebo was already working on the project; we started really pushing forward, trying to get Rails to run, trying to get things to run fast, and then in 2006 we were hired on full-time by Sun Microsystems to keep doing what we were already doing. That’s been ten years of full-time work on JRuby since then.

Charles Nutter: [00:10:33.17] In case it isn’t obvious, we can just state that JRuby is open source.

Charles Anderson: [00:10:39.00] Right, open source. The main license is Eclipse Public License. It’s not a copyleft license, it’s one of the more business-friendly licenses, but we also released the source under GPL 2 and LGPL 2 for folks that need those licenses.

Charles Nutter: [00:10:55.10] Where do the majority of JRuby users come from? Are they coming from a Java world, are they coming from a Ruby world?

Charles Anderson: [00:11:03.29] It goes around and around. When Ruby really started to get popular in 2005-2006 a lot of Ruby-ists were Java expats; they had left the platform looking for something better. They wanted to do web development that wasn’t as verbose as it was in Java, so they went to Rails. There are a lot of folks in the Ruby community that were ex-Java people.

Most of our users have come either from Ruby as former Java users, or they’re just plain Ruby-ists that need a better runtime. We have people that will come from Java to Ruby, but it’s usually just more of a decision that they want to use Ruby rather than specifically JRuby, and JRuby just fits into their JVM world better. Most of the time it’s people that know Ruby already and want something a little bit more powerful.

Charles Nutter: [00:11:49.20] In the beginning of 2016 JRuby 9000 was released. That’s a pretty impressive number. What was that?

Charles Anderson: [00:12:00.14] For about the first three quarters of 2015 we were working on the current major version of JRuby which we code-named JRuby 9000. The name was intended as a joke initially, because we couldn’t decide what version number to use. 1.8 would have conflicted with Ruby 1.8, and we were going to be much newer than that. 2.0 would have conflicted with Ruby 2.0, and we didn’t want to confuse people. We went with 9000, and eventually it turned out that this was going to be our 9th major release, so it kind of turned into JRuby 9.0.0.0.

[00:12:37.19] What JRuby 9000 actually is, is many years (probably three or four) worth of work on a completely new runtime for JRuby. Before, we would parse code in, we’d have an AST in memory, the interpreter would just walk that AST for a while, and then a very naive JIT would turn that AST into JVM bytecode for the JVM to run. The new version has its own intermediate representation, like a standard optimizing compiler; we have optimization passes, a control flow graph, a data flow graph… A lot more tools to analyze and optimize Ruby code before we even give it to the JVM, so then the JIT doesn’t have to do as much work, the JVM can do a better job of optimizing Ruby, and it’s really starting to pay off now with the most recent 9.1 release.

Charles Nutter: [00:13:27.11] That makes sense. You’re getting a little ahead of me, I eventually want to get into some of those details. How has the JVM aided the development of the JRuby implementation, versus a native runtime like the standard Ruby MRI interpreter?

Charles Anderson: [00:13:47.22] If you look at the development of the other Ruby runtimes that have to build their own VM, they spend a tremendous amount of time trying to find ways to optimize that code. If it’s a runtime with a JIT, they have to write their own, along with their own native code generation, maybe based on LLVM, maybe something hand-rolled. They have to write and debug their own GCs. That’s a career-maker. Those are things that you work on your entire life to get a really solid, generational, concurrent garbage collector. Those things we get for free.

We get a nice JIT that can do native code, we get excellent garbage collectors, we get a massive array of tools that have to be written from scratch for the other runtimes. All of those things that bootstrap a managed environment, we get for free. That makes a really huge difference, so we can focus on the performance of the language itself and on making better versions of all the core libraries.

Charles Nutter: [00:14:48.27] Then I suppose also you get things like the Java performance tooling.

Charles Anderson: [00:14:56.03] Exactly, pretty much any tool that you can use with Java, you can use with Ruby. Remote monitoring, performance monitoring, heap dumps, heap analysis – all this stuff works right alongside Ruby code.

Charles Nutter: [00:15:08.19] I recall one of the newer releases of Ruby, they were talking about some new garbage collection mechanism being available, and I was thinking to myself… If memory serves, that sort of thing was available in Java maybe ten years ago.

Charles Anderson: [00:15:27.08] Right. In Ruby 2.2 they added a pseudo-generational garbage collector. It will transplant objects that it knows are safe, that the C world hasn’t seen, that extensions haven’t seen. So they can do a little bit of generational juggling around, but it’s still way behind what you have on the JVM. There are whole teams of people that work on the JVM garbage collectors for years.

Charles Nutter: [00:15:54.07] Yes, big teams that have been working for years and years.

Charles Anderson: [00:15:58.16] And writing their doctorate on this new garbage collector technology. This is where garbage collection lives.

Charles Nutter: [00:16:04.21] Exactly. I’ve seen you speak on the difficulties of duplicating the semantics of Ruby in JRuby. In what ways does the JVM make development harder?

Charles Anderson: [00:16:15.28] The fact that the JVM is kind of platform-agnostic is a mixed blessing. There are features that people expect to behave on a platform-specific basis. The biggest ones we’ve run into are a lot of the standard POSIX APIs and libc APIs; things that are missing on Java, like (until recently) no support for looking up and managing symlinks, or all the possible permissions that might be on a UNIX file system. Ruby also has its own notion of what a string is. A string is an array of bytes plus one of the couple dozen different encodings that it provides, while on Java all strings are UTF-16 character strings.

We had to implement our own string implementation to support all those encodings. Even down at the language level there are features in Ruby and many other dynamic languages, like closures, that don’t have a good analog on the JVM. In order for us to have closures and capture local variables we need to allocate some structure and put that on the heap, and that has to be managed, which is extra overhead for the JVM.
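The remark about closures needing a heap-allocated structure can be illustrated in plain Java. A Java lambda may only capture effectively final locals, so a captured variable that the closure mutates (as in Ruby's `count = 0; inc = -> { count += 1 }`) has to live in an object on the heap. The `Binding` class below is a hypothetical stand-in for such a structure; it is a sketch of the idea, not JRuby's actual implementation.

```java
import java.util.function.IntSupplier;

final class ClosureCaptureSketch {
    // Hypothetical heap-allocated holder for one captured local variable.
    static final class Binding {
        int counter;
    }

    // Returns a closure that increments and returns the captured counter.
    // The Binding outlives this method's stack frame, so it cannot be a
    // plain local; it has to be allocated on the heap and garbage collected.
    static IntSupplier makeCounter() {
        Binding binding = new Binding();
        return () -> ++binding.counter;
    }

    public static void main(String[] args) {
        IntSupplier next = makeCounter();
        next.getAsInt();
        next.getAsInt();
        System.out.println(next.getAsInt()); // third call on the same binding
    }
}
```

Each call to `makeCounter` allocates a fresh `Binding`, which is the extra allocation and management cost the answer refers to.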

[00:17:26.14] Some of these things are improving; they are working on making the JVM fit these other Ruby features, some of the POSIX stuff, the native integration… Java itself is going to have a byte array based string internally that maybe will be exposed at some point. It’s really at those edges… IO, native library support; the things that really touch the system have been wrapped up so neatly on the JVM that sometimes we can’t get the features we need.

Charles Nutter: [00:17:54.10] The POSIX stuff was pretty obvious to me, but I was surprised when I heard you speaking about the difficulties of strings, because neither one of the languages (Ruby or Java) treats a string as 7-bit ASCII. I had never considered that given that they had more modern representations that there would be so many difficulties.

Charles Anderson: [00:18:27.09] The string thing is odd, because they could have probably settled on a single encoding like the JVM, but they also use string as a container for binary data. There is no separate byte string, so they had to at least be able to support arbitrary unencoded bytes alongside at least one encoding of character data. Then, due to various issues, nobody could agree on whether it should be UTF-8 or something else, so they went with a more complicated implementation that supports pretty much all encodings at the same time.

Charles Nutter: [00:19:01.25] Is JRuby itself written in Java, or is JRuby self-hosting…? I think you said it was written in Java, right?
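A small Java sketch of the mismatch discussed here: the same text has different byte lengths under different encodings, and arbitrary binary data does not survive a round trip through a `java.lang.String`. The `EncodedBytes` class is a hypothetical illustration of "an array of bytes plus an encoding"; it is not JRuby's real string class.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodedBytes {
    final byte[] bytes;
    final Charset encoding; // null stands for raw binary data with no encoding

    EncodedBytes(byte[] bytes, Charset encoding) {
        this.bytes = bytes.clone();
        this.encoding = encoding;
    }

    int byteLength() {
        return bytes.length;
    }

    // Decode to a Java (UTF-16) String only when characters are really needed.
    String decode() {
        if (encoding == null) {
            throw new IllegalStateException("binary data has no character encoding");
        }
        return new String(bytes, encoding);
    }

    // True if the bytes come back unchanged after being forced through a String.
    static boolean survivesUtf8RoundTrip(byte[] raw) {
        byte[] back = new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // "hello" with an e-acute
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(text.getBytes(StandardCharsets.ISO_8859_1).length);
        System.out.println(text.getBytes(StandardCharsets.UTF_16BE).length);
        System.out.println(survivesUtf8RoundTrip(new byte[] {(byte) 0xFF, (byte) 0xFE}));
    }
}
```

Bytes that are not valid UTF-8 get replaced during decoding, which is why a runtime that uses strings as binary containers cannot simply reuse Java's `String`.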

Charles Anderson: [00:19:11.03] The majority of JRuby is written in Java. The actual runtime, the interpreter for Ruby code, the parser and the compiler, that’s all Java code. The core classes, like string and hash and array are mostly written in Java, but more and more, as new features come along for Ruby, we often will add them as a Ruby implementation, to try and host more of JRuby’s implementation in Ruby itself.

Charles Nutter: [00:19:39.12] Then perhaps if it turns out to be a performance bottleneck, redo it in Java.

Charles Anderson: [00:19:44.07] Yes, exactly. A feature that comes to mind – spawning an external process. Already, spawning an external process is pretty heavy to do in the first place, so having that as a bit of Ruby code that does the actual process launch and argument processing is not as big of a hit. If it was a numeric operation, we’d probably want to have a native version of that.

Charles Nutter: [00:20:08.00] Yes, that makes perfect sense. Do you find that Java’s original promise of “Write once, run anywhere” works for either the JRuby compiler or applications that people write using JRuby?

Charles Anderson: [00:20:26.00] That has been great for us. JRuby is today probably the best way to run Ruby on Windows, because most of the core Ruby developers run on UNIX, most of the libraries get tested on UNIX, and they will run into odd platform problems that just don’t work on Windows. JRuby has been great for that. We also don’t have to worry about making binaries for every platform. We ship one Java-compiled jar file (class files) and it will work anywhere that there’s a JVM. The JVM guys have taken the hard work out of porting across platforms and done that for us.

Charles Nutter: [00:21:07.24] That’s true. It’s yet another great benefit of the JVM.

Charles Anderson: [00:21:12.09] Yes, definitely.

Charles Nutter: [00:21:13.10] What about if somebody’s developing a JRuby app and they’ve got some Ruby library that they want to use, and/or its dependencies? What portability issues do you run into there? We already talked about POSIX stuff possibly, or string stuff that Ruby is taking care of.

Charles Anderson: [00:21:36.08] If it’s a part of Ruby that is prescribed behavior that we’re not matching, we bend over backwards to make it work. We would consider that a bug in almost every case and try to make sure that JRuby matches however CRuby does it. However, sometimes libraries are written with only UNIX in mind. They’re using Ruby features or native features that aren’t going to work on Windows. In that case the code may run fine on JRuby, but they’re going to be hitting stuff that doesn’t do the same as it would on a UNIX system.

[00:22:10.00] There are also libraries that have native code in CRuby. We don’t run that native code, we have our own extension API that’s also for Java or other JVM languages rather than C. Sometimes you can’t even get those libraries to run on Windows at all with CRuby, but if we have a JRuby equivalent, we’re already ahead of the game. It’s just going to be JVM bytecode and our extensions will work, while CRuby’s won’t.

Charles Nutter: [00:22:39.15] Again, it makes perfect sense. I would also think that sometimes thread safety is an issue because of the Ruby library coming from an environment where threading is whatever you want to call it – a second-class citizen or an afterthought. Maybe the libraries aren’t thread-safe and then suddenly you put them into a multi-threaded environment. Would that be an issue?

Charles Anderson: [00:23:03.19] That happened a lot more in the early days of JRuby. People were writing stuff for simplicity and because it didn’t break on MRI. They would write stuff that uses a lot of global state. They’d have global variables or they’d have constants, which are essentially just a value stuck into a class somewhere, and they would be modifying that from multiple threads.

Over the past ten years we’ve managed to do a lot of education of the Ruby community – how to write good, thread-safe code, what not to do, and the libraries have improved tremendously. There is a library called Concurrent Ruby available now that has all the possible concurrency patterns implemented. There’s futures, there’s promises, there’s thread pools, there’s thread-safe collections – all of this stuff in one library that pretty much everyone is using now. That has given people better tools to go along with our recommendations and our education.

[00:24:05.15] These days we don’t run into thread-safety problems in Ruby libraries as often. They do come up once in a while. Usually it’s something we can help the author fix with a little bit of concurrency magic, but it’s improved over time. It does happen, but much less.
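The kind of bug described in this answer, shared state modified from several threads without coordination, looks roughly like this in Java terms. The sketch contrasts a plain shared counter with an `AtomicInteger`. All class and method names are illustrative, and the Ruby library mentioned above is not used here.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedStateSketch {
    // Run the same increment action on several truly parallel threads.
    static void runOnThreads(int threads, int perThread, Runnable increment) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment.run();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // Unsafe: "read, add one, write back" is not atomic, so updates can be lost.
    static int unsafeCount(int threads, int perThread) {
        int[] total = new int[1];
        runOnThreads(threads, perThread, () -> total[0]++);
        return total[0]; // may be smaller than threads * perThread
    }

    // Safe: AtomicInteger performs the increment as one atomic operation.
    static int safeCount(int threads, int perThread) {
        AtomicInteger total = new AtomicInteger();
        runOnThreads(threads, perThread, total::incrementAndGet);
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + unsafeCount(8, 100_000));
        System.out.println("safe:   " + safeCount(8, 100_000));
    }
}
```

On a runtime with a global interpreter lock the unsafe version tends to appear correct, which is one reason such code could go unnoticed until it ran with real parallel threads.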

Charles Nutter: [00:24:21.17] How does the performance of JRuby compare to MRI or other Ruby implementations? You strive to be the best… Are you?

Charles Anderson: [00:24:31.17] We generally are now. It’s going to depend benchmark to benchmark. Take a pure Ruby implementation of some algorithm; one that I use fairly frequently is a pure Ruby red-black tree implementation that we have a nice benchmark for. If you run that on JRuby compared to CRuby, it’s going to be anywhere from two to ten times faster on JRuby. JRuby is generally going to be fastest, and if you find something that’s slower on JRuby, we would usually consider it a bug. There’s no reason we shouldn’t be faster than CRuby.

[00:25:05.11] There’s also work going on at Oracle Labs; they have three or four people now working on an implementation of JRuby on top of their Truffle runtime, which is a much more aggressive optimizing platform than just running on top of the JVM normally. They’re getting close to C++ performance. It’s early days for that work, but we believe with the Truffle work, with our new IR and the new optimizations we’re able to do, we’re going to be able to get really close to straight up Java performance over the next year.

Charles Nutter: [00:25:39.28] Impressive. If I wanted to write a web application in JRuby, would I go with a Ruby framework like Rails or Sinatra, or choose one of the many Java-based frameworks? Would it be governed by, like we were saying earlier, what background I’m coming from?

Charles Anderson: [00:26:00.10] We definitely look at the background of folks that are coming in. If you are a Java developer that’s familiar with Java web frameworks and not interested in trying one of the Ruby frameworks that are out there… Most of them do have some amount of support for JRuby. Spring for example has ways that you can declare beans as being from Ruby, and then it will load the Ruby code up and install it where it’s supposed to be. Play framework has support for it, Vert.x has support… So you can use Java frameworks with Ruby by going through JRuby. We recommend, if you’re going to be using Ruby, that you look at some of the Ruby libraries that are out there.

[00:26:40.12] Rails is still very hard to beat for quickly getting an application up and going, being able to evolve it quickly. Sinatra is a very nice, neat, fast environment for small services. You will find that it fits a little bit better to have your applications be a Ruby-based framework if you’re going to be using JRuby, but it is possible to use the Java frameworks, too.

Charles Nutter: [00:27:05.15] That makes perfect sense. In episode 240 of the podcast here we talked about Groovy. At least in theory, Groovy is just a jar file, which makes it possible to sneak it into a Java shop. Does JRuby fit that deployment model or does it require a little bit more, and thus you would have to get support from management to bring it in?

Charles Anderson: [00:27:36.04] Some of our best applications, our best users have done exactly this – they’ve snuck it into a previously just-Java environment. We have utilities to take a Rails application or a Sinatra application, bundle it up as a Java WAR file and deploy it on any server. Your dev ops team will probably never know the difference, because it just looks like another Java application.

Charles Nutter: [00:28:01.25] That makes perfect sense. Once it’s down to a WAR file, then it’s just another WAR file.

Charles Anderson: [00:28:08.18] Along the same lines of the question about whether you should use Ruby or Java frameworks, we also say if you’re in a Java environment then go for it, deploy it as a WAR file, it will work just fine. If you’re not in a Java environment or you don’t have a requirement to run on a normal Java application server, one of the Ruby web server libraries is probably gonna be just as good for you.

Charles Nutter: [00:28:33.08] Yes, and then you’re not even bound to a particular deployment model.

Charles Anderson: [00:28:38.18] Exactly. We wanted to fit both worlds. If you’re a Rubyist, it feels right, or if you’re a Java developer, it feels right.

Charles Nutter: [00:28:44.29] I’d like to take a brief sidetrack if we have time, to talk about another language on the JVM that you’ve had involvement with, which is Mirah. Briefly, what is Mirah?

Charles Anderson: [00:28:58.12] I wanted to create a language that looked and felt very similar to Ruby, but it would be statically typed, it would compile directly to JVM bytecode, so it didn’t have any other library dependencies once you’d finished compiling. The idea was that Mirah might be an easier language for Rubyists who want to contribute to JRuby, than having them come in and try to write Java code.

That goal never really happened. We’ve continued to use Java code; pretty much everybody in the world can write Java, and it’s very well supported by tools. But Mirah the language has continued to live on. It’s a statically typed, Ruby-like language for the JVM, it has IDE support, there’s a number of folks using it for production applications. It’s a fun little language.

Charles Nutter: [00:29:45.28] Was the implementation of it significantly simpler than JRuby because of the differences? For example the string thing that we were talking about. You could just say the semantics are Java strings.

Charles Anderson: [00:30:01.03] Absolutely. Probably the biggest difference between Mirah and JRuby or Scala or Groovy or any of the other non-Java languages on the JVM is that it’s just a compiler. There are no core classes that we have implemented in Mirah, there’s no runtime library that’s required, there’s no jar file for Mirah. It’s intended to replace javac in your compiler chain, in your build chain, but with this Ruby-like language instead of regular Java. The goal is a language that fits that niche (not even a niche, really; it’s what most people want): a language that just compiles down to a binary and doesn’t drag along this [unintelligible 00:30:44.26].

Places that you could use Java but maybe not Groovy because of the large runtime it brings – Mirah fits right in that spot.

Charles Nutter: [00:30:55.15] So start with a syntax that’s Ruby-inspired and then take out a lot of problems and produce something that compiles to Java directly, without a runtime.

Charles Anderson: Right. One comparison that’s been made is Mirah is the CoffeeScript of Java.

Charles Nutter: [00:31:14.21] That fits even better. I was thinking there is a version of Python – perhaps it’s RPython – that has the Python syntax; they took out a fair amount of the dynamic semantics so that they could then compile to native code.

Charles Anderson: [00:31:33.22] Exactly. RPython is probably a pretty good analog. RPython doesn’t have as much type declaration as Mirah does. Mirah follows the Scala or C# pattern, where you need to declare parameters to methods and probably return values, but that’s it. Whereas with RPython you don’t have to declare anything and their magic compiler can figure it all out. I never went that far, because one, doing full-system type inference was beyond my abilities at the time I was working on Mirah, and two, I don’t particularly mind the Scala and C# approach of local type inference just requiring parameters to be declared. That was a nice balance, and it certainly made the compiler easier to write.

Charles Nutter: [00:32:20.24] That’s a great explanation of the trade-offs there, of starting with that Ruby syntax but then bringing in some Java things so it maps much more cleanly and easily. Building on this overview of JRuby, I’d like to dive a little bit deeper into the JVM as a platform for JRuby. You’ve already hinted at some of these things, but is JRuby (or, as you hinted at before, JRuby 9000) implemented with its own intermediate representation, or does it compile to something else: Java source code I guess not, maybe Java bytecode? How does that work?

Charles Anderson: [00:33:12.04] I can give a basic overview of how it works. JRuby has an LALR parser (for the parser geeks out there). It’s a port of MRI’s Bison grammar, a pretty much line-for-line port, both in the lexer and the parser part, and that’s all written in Java. That takes Ruby code in and gives us a tree, an AST. JRuby 1.7 then would just interpret that AST; in 9000 we run an additional compiler against that AST. That gets compiled into our own intermediate representation, basically our own bytecode form. Then we can run our own optimization passes on it, we can eliminate dead code, we can propagate values… All the cool tricks that optimizing compilers can do, we can do at the Ruby level.
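Two of the passes named here, propagating values and eliminating dead code, can be sketched on a toy intermediate representation. The IR below is invented for illustration (straight-line `target = source` instructions only, no control flow) and bears no relation to JRuby's real IR classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TinyIrSketch {
    // One instruction: target = source, where source is an integer literal
    // such as "42" or the name of another variable such as "a".
    static final class Copy {
        final String target;
        final String source;

        Copy(String target, String source) {
            this.target = target;
            this.source = source;
        }

        @Override
        public String toString() {
            return target + " = " + source;
        }
    }

    static boolean isLiteral(String operand) {
        return operand.matches("-?\\d+");
    }

    // Pass 1: constant propagation. Reads of variables whose value is a known
    // literal are replaced by that literal.
    static List<Copy> propagateConstants(List<Copy> code) {
        Map<String, String> known = new HashMap<>();
        List<Copy> out = new ArrayList<>();
        for (Copy c : code) {
            String source = known.getOrDefault(c.source, c.source);
            if (isLiteral(source)) {
                known.put(c.target, source);
            } else {
                known.remove(c.target); // value no longer known
            }
            out.add(new Copy(c.target, source));
        }
        return out;
    }

    // Pass 2: dead code elimination. Walk backwards from the result and keep
    // only assignments whose target is still needed.
    static List<Copy> eliminateDeadCode(List<Copy> code, String result) {
        Set<String> live = new HashSet<>();
        live.add(result);
        List<Copy> kept = new ArrayList<>();
        for (int i = code.size() - 1; i >= 0; i--) {
            Copy c = code.get(i);
            if (!live.contains(c.target)) {
                continue; // nobody reads this value afterwards: drop it
            }
            live.remove(c.target);
            if (!isLiteral(c.source)) {
                live.add(c.source);
            }
            kept.add(0, c);
        }
        return kept;
    }

    static String demo() {
        List<Copy> code = Arrays.asList(
                new Copy("a", "1"), new Copy("b", "a"),
                new Copy("c", "99"), new Copy("d", "b"));
        return eliminateDeadCode(propagateConstants(code), "d").toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running the two passes in this order matters: propagation removes the reads of `a` and `b`, which is what lets the second pass discard their assignments.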

[00:34:00.14] Then eventually, if code runs enough and it gets hot, just like in the JVM, we will JIT it. In our case, that’s jitting to JVM bytecode. We’ll do a little bit more analysis on the code, run a few more optimization passes appropriate for bytecode generation, and then spit out JVM bytecode. The JVM from there will do a very similar process – that JVM code goes into the JVM’s intermediate representation, eventually it compiles down to native code… We’ve built a little VM on top of the JVM, just because we need the speed and startup time of an interpreter, but eventually get up to full performance of actual bytecode.

Charles Nutter: [00:34:44.00] When you say JIT, it’s Just-in-time compilation or transformation, right?

Charles Anderson: [00:34:53.23] Right. Just-in-time compilation, which for us is simply a counter. If a method gets called 50 times, we figure it’s hot enough to actually compile into JVM bytecode.
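As a rough illustration of the counter-based approach described here, the following sketch switches a method from an "interpreted" implementation to a "compiled" one after 50 calls. The `HotMethod` class and its two function arguments are hypothetical stand-ins; JRuby's real mechanism generates JVM bytecode at that point, which this sketch does not attempt, and the counter here is not thread-safe.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of a call-count JIT trigger; not JRuby's implementation.
class HotMethod {
    static final int JIT_THRESHOLD = 50; // the call count mentioned in the interview

    private final IntUnaryOperator interpreted; // slow path, available immediately
    private final IntUnaryOperator compiled;    // fast path, stands in for generated bytecode
    private int callCount = 0;
    private boolean jitted = false;

    HotMethod(IntUnaryOperator interpreted, IntUnaryOperator compiled) {
        this.interpreted = interpreted;
        this.compiled = compiled;
    }

    int call(int arg) {
        if (jitted) {
            return compiled.applyAsInt(arg);
        }
        callCount++;
        if (callCount >= JIT_THRESHOLD) {
            jitted = true; // a real runtime would compile the method body here
        }
        return interpreted.applyAsInt(arg);
    }

    boolean isJitted() { return jitted; }

    public static void main(String[] args) {
        HotMethod increment = new HotMethod(x -> x + 1, x -> x + 1);
        for (int i = 0; i < 60; i++) {
            increment.call(i);
        }
        System.out.println("jitted after 60 calls: " + increment.isJitted());
    }
}
```

The point of the design is that cold code never pays the compilation cost, which is where the startup-time benefit comes from.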

Charles Nutter: [00:35:03.13] What are the advantages of that strategy? It sounds like there are several pieces to it, but ultimately I suppose what you’re aiming for is performance.

Charles Anderson: [00:35:14.24] The advantages of having our own interpreter are mostly startup time-related. Consider JVM languages that don’t have an interpreter: Groovy, or JavaScript in either Rhino or Nashorn (I think Rhino might have had an interpreter mode as well, but Nashorn, the newer JavaScript implementation, is all compiled, all bytecode). Those implementations either have a separate compile step, as in Groovy, where you typically don’t just run the Groovy source but build it all ahead of time, or, if you don’t build ahead of time, they have to not only parse the code and read it into memory, but also generate bytecode before they can even execute.

[00:35:58.00] After we parse, we’ve got our own compiler pass, but we can get going much more quickly by not having to generate the bytecode right away. And then only generate the bytecode for the things that really need it.

Charles Nutter: [00:36:10.00] Right, the things that are getting called frequently. When you do generate bytecode, is it difficult to produce bytecode that the HotSpot VM will recognize as something that it knows how to optimize? Are you trying to fit its pattern language?

Charles Anderson: [00:36:31.09] There’s a lot of work being done both in JRuby and on the JVM side by folks at Oracle to try and improve how we can describe code in JVM bytecode. Maybe that means better annotations or better typing systems. We do spend a lot of time looking at the JVM bytecode, looking at what the JVM actually does with it, and trying to make the two optimize the right way. That was part of the motivation of doing our own intermediate representation in JRuby 9000.

[00:37:05.00] There are things that we know about Ruby code that were too difficult to convey in JVM bytecode; the JVM didn’t see through it and didn’t optimize the way we wanted it to. Now with our own optimizing compiler on top of the JVM, we can do a little bit of that work for the JVM and use it as our native code backend.

Charles Nutter: [00:37:27.02] You mentioned earlier about the invokedynamic bytecode that was introduced in Java 7, and that was primarily to help languages other than Java, such as JRuby. Briefly, what does that bytecode do?

Charles Anderson: [00:37:41.22] invokedynamic was added in Java 7, after many years of work and discussion. JRuby was right alongside that development; we made sure it went well. invokedynamic is a bytecode that was added to the JVM initially to provide a way to do dynamic calls, dynamic calls that the JVM could still optimize and see through just like other Java invocations. Most JVMs have the capability of optimizing dynamic calls because of the nature of Java. Sometimes the JVM has to be able to dynamically figure out what the target is, for example for an interface invocation. It doesn’t know at compile time what that actual type is going to be, so there’s plumbing in there that works essentially like a dynamic language.

[00:38:30.19] It actually has grown to do a lot more than just dynamic calls. In Java 8 the support for lambdas (Java’s version of closures from other languages) is bootstrapped using invokedynamic, because you can run this code once, you can get it installed into the bytecode of the application and then optimize it just like any other piece of Java code. There are more features in Java 9 that are going to be using invokedynamic under the covers. Because it’s actually so flexible, we can continue to evolve the Java language without even adding any more bytecodes.
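Java source code cannot emit an invokedynamic instruction directly, but the java.lang.invoke API that sits underneath it can be used on its own. The sketch below, under that caveat, shows the core idea of a call site whose target can be relinked at runtime: a `MutableCallSite` is first bound to one method and later retargeted to another, while the caller keeps using the same invoker. The class name `CallSiteSketch` and the two greeting methods are invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Illustrative only: a relinkable call site built with the java.lang.invoke API.
class CallSiteSketch {
    static String greetEnglish(String name) { return "Hello, " + name; }
    static String greetFrench(String name) { return "Bonjour, " + name; }

    // Look up one of the static methods above as a (String)String method handle.
    static MethodHandle find(String methodName) throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
            CallSiteSketch.class, methodName,
            MethodType.methodType(String.class, String.class));
    }

    // Calls through the same invoker twice, relinking the call site in between.
    static String[] relinkDemo() {
        try {
            MutableCallSite site = new MutableCallSite(find("greetEnglish"));
            MethodHandle invoker = site.dynamicInvoker();
            String before = (String) invoker.invokeExact("Charlie");
            site.setTarget(find("greetFrench")); // the "dynamic" part: change the target
            String after = (String) invoker.invokeExact("Charlie");
            return new String[] { before, after };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        for (String line : relinkDemo()) {
            System.out.println(line);
        }
    }
}
```

With a real invokedynamic instruction, a bootstrap method returns a call site like this one the first time the instruction executes, and the JVM can then optimize through whatever target is currently linked.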

Charles Nutter: [00:39:08.18] In the end for JRuby this probably improves performance. Anything else? Does it also cut down on code size?

Charles Anderson: [00:39:20.20] Performance definitely is an improvement. Sometimes it can take a little bit longer to warm up if we use invokedynamic rather than our own dynamic dispatch mechanisms, but the performance is usually three to five times faster than what we can do with our simulated dynamic call sites. So we get performance out of it; most of my benchmarking is done with invokedynamic, because it generally is far and away better than not using it.

[00:39:47.16] Code size also is reduced, because we can put more plumbing into the invokedynamic logic rather than having to use our own dynamic call logic. It shrinks down the amount of bytecode that we generate, but it may be more native code emitted once the JVM compiles it, because it’s doing a better job of optimizing, it’s inlining code where it wouldn’t before. That can be a trade-off. There’ll be more code executing at runtime, but less code loaded into the JVM.

Charles Nutter: [00:40:20.18] I hadn’t even thought of the difference between the Java bytecode size and the native size. How significant is it that a bytecode was added to the Java VM to support non-Java languages?

Charles Anderson: [00:40:37.21] Until then it was the most significant change ever, and there haven’t been a whole lot since. It’s the first time ever that any bytecode, any instruction was added to the JVM. That’s one of the reasons that when we worked on invokedynamic trying to design this feature, we really wanted to make sure it was as flexible as possible. Making changes to the Java language specification is hard; making changes to the JVM specification is ten times harder, because you need all of the JVM vendors to agree on this, and they all have to make an implementation of it, and there’s a lot more work that goes into it. It’s actually worked out really well to make this one instruction be super powerful. We haven’t had to add anything new to support things like lambdas.

Charles Nutter: [00:41:28.20] I was going to ask you if there were any other bytecode changes after yours, but I guess not. What’s the minimum version of the JVM that one needs to run JRuby? Do you need Java 7?

Charles Anderson: [00:41:42.01] We support two versions of JRuby right now. The 1.7 line, which is a couple of years old now – we’re going to maintain that until the end of this year. We’re trying to get most people to migrate over to JRuby 9000. JRuby 1.7 works on Java 6 and higher, but we removed the Java 6 support for JRuby 9000 because we really wanted to use invokedynamic more. So Java 7 and up if you’re going to be using JRuby 9.x.

Charles Nutter: [00:42:09.21] You already mentioned that Java 8 added lambdas. Since Ruby, and therefore JRuby, has always had lambdas and closures, do the Java 8 lambdas affect JRuby?

Charles Anderson: [00:42:23.27] We get this question quite a bit. The interesting thing about lambdas in Java 8 is that because of invokedynamic there are almost no JVM changes and no JDK changes really required to support it. A lambda is mostly just an on-the-fly interface implementation, like anonymous inner classes were before: it just gets its own little class in memory, that class implements some interface and gets passed into whatever API you’re calling. We’ve already done that as well.

If you have a Java call that you want to make from Ruby and it takes an interface, we will automatically make the Ruby code implement that interface for you. We’re essentially doing most of what Java 8 lambdas already did at a language level, and nothing that lambdas added was necessary to support those same features.
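One way to picture "making a closure implement an interface on the fly" is with the JDK's dynamic proxies. The sketch below is an assumption-laden illustration, not how JRuby does it: the `InterfaceOnTheFly` class and its `implement` helper are hypothetical, and a plain `Function<Object[], Object>` stands in for a Ruby block.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: wrap a dynamic "block" in an implementation of a Java interface.
class InterfaceOnTheFly {
    // The block receives the Java call's arguments and returns its result.
    static <T> T implement(Class<T> iface, Function<Object[], Object> block) {
        Object proxy = Proxy.newProxyInstance(
            InterfaceOnTheFly.class.getClassLoader(),
            new Class<?>[] { iface },
            (self, method, args) -> {
                if (method.getDeclaringClass() == Object.class) {
                    // Keep toString/hashCode/equals well behaved on the proxy itself.
                    switch (method.getName()) {
                        case "toString": return "block as " + iface.getSimpleName();
                        case "hashCode": return System.identityHashCode(self);
                        default: return self == args[0];
                    }
                }
                // Every interface method is routed to the block.
                return block.apply(args == null ? new Object[0] : args);
            });
        return iface.cast(proxy);
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        // The "closure" compares strings by length; the Java API only sees a Comparator.
        Comparator<String> byLength = implement(Comparator.class,
            a -> Integer.compare(((String) a[0]).length(), ((String) a[1]).length()));
        List<String> words = new ArrayList<>(Arrays.asList("invokedynamic", "JVM", "Ruby"));
        words.sort(byLength);
        System.out.println(words); // prints [JVM, Ruby, invokedynamic]
    }
}
```

A reflective proxy like this is slower than a generated class or an invokedynamic-bootstrapped lambda, which is one reason a production runtime would likely take a different route.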

Charles Nutter: [00:43:16.22] You don’t use the Java 8 lambdas because you’re emitting bytecodes, and the only bytecode there is invokedynamic, right?

Charles Anderson: [00:43:25.28] Exactly. We have something very similar to lambdas – when you have a closure in Ruby, we bootstrap it with invokedynamic just like the Java 8 compiler does. They are very similar techniques, we’ve just been doing them since before Java 8 even started adding lambdas.

Charles Nutter: [00:43:44.08] Since one value of using Ruby on the JVM via JRuby is that you can use existing Java libraries like you mentioned, what are some of the issues involved with interoperating with Java and Java libraries? You just mentioned the whole concept of dynamically generating an interface.

Charles Anderson: [00:44:08.06] There are a few things. Obviously, the dynamic nature of Ruby. At times you may have to give some more hints when calling Java code, so we know which types are actually getting passed in. Say that I need to call the string version of this method rather than the numeric version. We try to guess as much as possible based on the actual types that are coming in, you probably meant to call [unintelligible 00:44:31.06] version versus the long version, but sometimes you’ll need to be more explicit about that.
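The overload problem described here can be sketched with plain Java reflection. In this hypothetical example, `OverloadPicker` guesses which `describe` overload to call from the runtime type of the argument, and the caller can pass an explicit parameter type as a hint when the guess is not what they want. The guessing rule (integers map to `long`) is an assumption made for the sketch, not JRuby's documented behavior.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of choosing a Java overload from a dynamically typed call.
class OverloadPicker {
    // Three overloads that a dynamic caller has to choose between.
    static String describe(String s) { return "string:" + s; }
    static String describe(long n) { return "long:" + n; }
    static String describe(int n) { return "int:" + n; }

    // Guess a parameter type from the runtime type of the argument.
    static Class<?> guess(Object arg) {
        if (arg instanceof String) return String.class;
        if (arg instanceof Integer || arg instanceof Long) return long.class; // assumed default
        throw new IllegalArgumentException("no guess for " + arg.getClass());
    }

    // Call describe(arg); a non-null hint overrides the guess.
    static Object call(Object arg, Class<?> hint) {
        Class<?> paramType = (hint != null) ? hint : guess(arg);
        try {
            Method target = OverloadPicker.class.getDeclaredMethod("describe", paramType);
            return target.invoke(null, arg); // reflection unboxes and widens as needed
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("five", null));   // guessed: the String overload
        System.out.println(call(5, null));        // guessed: the long overload
        System.out.println(call(5, int.class));   // explicit hint: the int overload
    }
}
```

The explicit hint plays the role of the "be more explicit" escape hatch: the argument is the same in the last two calls, and only the hint changes which overload runs.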

[00:44:42.24] A lot of Java libraries assume that you’re going to have a normal Java class somewhere in memory, that you can instantiate and call methods on statically. Obviously, Ruby classes are built dynamically as your application loads, so normally there is no Java class that goes along with a particular Ruby class. That causes some problems integrating with libraries.

We also have our own object hierarchy, which means if you want a Ruby class to extend a Java class, we don’t have multiple inheritance on the JVM, so we have to do some clever indirection. Essentially make a little proxy object that actually extends the Java side, and figure out how to route calls around. Mostly it ends up being around the type system. The way that classes are structured and the assumptions that the rest of the JVM makes about what a class or what an object looks like.

Charles Nutter: [00:45:40.28] Does the interoperability work better in one direction than the other? I would think JRuby calling into Java would be a little easier than Java calling back into JRuby for some of the reasons you’ve just mentioned.

Charles Anderson: [00:45:54.02] Exactly. It is much easier to do the Ruby calling Java side of that. It’s possible to do the Java side calling Ruby; we usually recommend that if you want to do that, you have your Ruby code implement some interface, it will generate a nice class for it that fits into our object hierarchy, and it will look like a normal interface implementation to Java code. If you have to do a straight up dynamic call into a Ruby object, we’ve got some utility APIs that you can call through that make that a little bit easier. Going through an interface implementation smooths over most of the problems of calling from Java into Ruby.

Charles Nutter: [00:46:29.24] Is it practical to interoperate between two non-Java languages? Suppose I’ve got JRuby and Scala, or JRuby and Jython.

Charles Anderson: [00:46:42.18] We’ve talked about this at the JVM Language Summit and at JavaOne, those of us that work on JVM languages and the folks that work on the JVM itself. We’ve tried to come up with clever ways that we can make language interop better; it always seems to boil back down to Java being the lingua franca that we all end up supporting.

If you’re calling from JRuby into Scala with our array type, that array type is probably going to be converted into a simple Java array, and then turned into whatever Scala uses as an array type. We always go through Java as our standard set of interop types. That works pretty well most of the time. Most of the languages that are around the JVM have very good integration with Java itself, so as long as we’re going through Java, the languages can talk to each other.

Charles Nutter: [00:47:38.22] That makes perfect sense. The JVM is there for Java, and it’s a lingua franca. Pulling out into bigger picture things, when we were talking about various different languages on the JVM, I tend to think of one group of them as professional languages or successors to Java (would-be successors). We’ve talked about Scala, Groovy, Kotlin, Ceylon… Do you think the existence of these other languages helps or hinders traditional Java?

Charles Anderson: [00:48:19.04] I’m coming from a biased standpoint, being one of the professional language developers on the JVM. I think it absolutely helps Java. It helps in that we get JVM improvements for different language patterns that may never have been considered until they were available, for example invokedynamic; there was very little interest in working on invokedynamic until there was a breakout hit dynamic language on the JVM, JRuby. Suddenly it made sense to actually start pushing on dynamic dispatch, which ended up helping Java in Java 8 lambdas and other Java 9 features.

It has also helped encourage Oracle to get the Java train rolling again. Lambdas had been wanted in Java for a long time, but until we saw people going to Clojure and Scala and Ruby to get them, there wasn’t a lot of energy put into it. Once those languages became well established, everybody saw how powerful closures were; Oracle had to put the resources and time into actually getting closures into the Java language as well.

[00:49:27.27] I think it brings more people to the platform and it makes the platform better. It drives the platform to improve.

Charles Nutter: [00:49:34.08] That makes perfect sense. People picking up these third-party languages for whatever their features are, like being able to do closures, would then be a driving force behind adding those things to Java. In my own personal experience, what you are talking about makes perfect sense. When I first looked at closures, I was like, “Meh.” But then, when I started working full-time or nearly full-time with a language that supported them, I was like “Wow, this is pretty cool.”

Charles Anderson: [00:50:10.19] Yeah, absolutely. One pattern that I’ve seen over the years, now that we have these other alternative JVM languages, there’s tens or hundreds of thousands maybe of people out there who never would have touched the JVM if it was only Java. But they will use the JVM with Clojure, they will use the JVM with JRuby. It’s brought people in that never would have even been here, so it can only be good for the platform, as far as I’m concerned.

Charles Nutter: [00:50:37.25] Yes, it’s expanding the tent, so to speak. In the last few years there have been a few new native languages (Go, Rust). Do you think these pose a challenge to JVM languages, or, because they’re native, are they aimed more at domains like databases or operating systems? Is it an apples and oranges sort of thing?

Charles Anderson: [00:51:05.13] They bring a lot of interesting new features. Of the two that you mentioned, Go has its event reactor as part of the core language, so it’s able to crank up all these coroutines and run many different virtual threads of execution on far fewer native threads. That’s definitely a challenge for the JVM. I’ve read through the Go runtime’s code, and you essentially need to design that sort of micro-threading pattern into the runtime from the beginning. There are some micro-threading libraries for the JVM, but they’re all sort of hacky (they use bytecode tricks), they’re not really doing native, VM-level thread tricks like Go does. But the platform could do that; the platform could get support for true native coroutines just like Go has, and hopefully the interest in Go and the demand for that sort of feature will drive the JVM to do it, too.

[00:52:08.23] On the Rust side, if you’re doing Rust, that’s mostly unmanaged code. It would probably be difficult to bring it over to the JVM. Everything’s always going to be garbage-collected. Any unmanaged APIs for memory stuff would have to be emulated in some way – pointer tricks, allocate and de-allocate, that kind of stuff doesn’t fit into the JVM pattern well. But other language features that are in Rust could certainly be done on the JVM.

I have done little prototypes of Go coroutines, I’ve wanted to try and do a Rust on the JVM (maybe with a slightly reduced feature set), but the interesting thing about all of these off-platform languages is that they’re showing us new ways of looking at problems. JRuby is the most successful off-platform language to come to the JVM. If you look at Scala and Clojure and Groovy and Kotlin and all the other professional languages on the JVM, they were written for the JVM to begin with. JRuby, by virtue of needing more out of the JVM, has helped drive the platform probably more than those languages, with native support and with dynamic invocation.

As those other languages get popular and show us cool features, the JVM community is going to want those features, and we will probably see something come around.

Charles Nutter: [00:53:30.14] You’ve already answered this question, but what do you think about the future or the outlook for Java, the non-Java languages and the JVM itself?

Charles Anderson: [00:53:44.05] Ignoring politics around Java and the JVM, since that would be worthy of an entire other podcast, tech-wise I think the JVM and the Java platform are incredibly healthy and continuing to grow. They are bringing in language developers doing all sorts of new cool stuff, with some unusual languages in some cases. I don’t know if the JVM will outlive Java; Java is kind of the C of the JVM. Pretty much everything is Java at some level, even if it’s just going to bytecode, but I think the JVM is going to survive a long time. We have a real open source implementation that anybody can use and build and modify and distribute, we have a lot of different languages that are available and more being created every day. This is the best platform available for new language work, and it’s going to continue to be successful for a long time.

Charles Nutter: [00:54:42.24] Great. A platform for new language development – that’s exactly the theme of this show. In closing, is there anything else that you’d care to mention about the JVM as a language platform?

Charles Anderson: [00:55:00.15] For folks that are interested in building a new language: if you want a managed runtime, you don’t want to handle all the object management yourself, and you don’t want to have to write a JIT and a GC, it’s definitely the best platform to look at. It’s certainly the best managed runtime that’s out there, and by far the best open source managed runtime that’s available.

I would encourage folks that are JVM/Java developers to have a look at JRuby. Take a look at what people are doing with Ruby, try it out, and know that you can still call all the same libraries and deploy to the same servers if you need to, but you can get the benefits of frameworks like Ruby on Rails, and probably become a better developer by trying out some new patterns in your own apps.

Charles Nutter: [00:55:45.24] Where can people get a hold of you or follow your work? I’ll include links to anything, so don’t feel obligated to spell any URLs out.

Charles Anderson: [00:55:57.20] I welcome anybody to tweet questions at me. If you want to have a private conversation, you can find me at [email protected] on pretty much any of the chat services. The JRuby project itself is homed at JRuby.org, which has links to downloads, to our GitHub project site, to a wiki, and to all of our issue trackers. JRuby.org will get you there. If you are already a Ruby user and you’re using one of the Ruby switchers, like RVM, ruby-build, or ruby-install, those all support JRuby, too. It should be a matter of running rvm install jruby, and you’ve got JRuby locally to play with.

Charles Nutter: [00:56:38.02] The JRuby project – open source… I assume it’s an open community, welcoming.

Charles Anderson: [00:56:44.17] Yes, absolutely. It’s all open source, it’s all on GitHub under the JRuby organization. We’ve got JRuby itself and a bunch of other supporting projects, and we are always looking for folks to help out. There’s several hundred open items in the bug tracker that can be worked on, and we’ve even marked a few for new contributors. Try out JRuby and maybe you can help us make it better.

Charles Nutter: [00:57:10.09] Great. Thanks for your time, Charlie. I’ve really enjoyed our discussion, and I hope our listeners will, too. This is Charles Anderson for Software Engineering Radio.

Join the discussion

2 comments

Jörg W Mittag says:

June 10, 2017 at 2:12 pm

“the JRuby language” – Ah, it always hurts when I see that. JRuby is not a language. JRuby is a language *implementation* of the Ruby language.

A compiler and a language are two different things.

Pablo Adames says:

April 22, 2022 at 6:40 pm

Excellent talk
