My current idea is, like any respectable programmer, to create my own language and, as a first step, my own virtual machine. I’d very much like to create the successor to the JVM, at least in terms of popularity. I know it will not happen, but I have to keep a goal!
I like a lot of elements of the JVM, but I also think it’s not a general enough virtual machine. I like its portability and ubiquity, its acceptance in business, its more than decent performance and its embryo of reflectivity. What I hate is its incestuous relationship with Java-the-language, its memory footprint and its inability to cope with low-level details.
The relationship to the language might have been a good technical choice in the beginning, but now that people try to run other languages on it, you can see all the unnatural things they have to do to adapt to the Java class model. If you want to get all the performance advantages of Java, you cannot write a simple interpreter: you need either to compile to bytecode ahead of time or to generate it in the running application. Scala is an example of the first case; Jython (formerly JPython), JRuby, Rhino and ABCL Lisp are examples of the second. Scala’s case is not too terrible: it was their design choice to keep as close as possible to the Java class model, so there is no great mismatch. Dynamic languages are where things go bad. Since the smallest element that you can load in the JVM is a class, you need to generate a class for each new individual function. All the class structure is ‘dead wood’ that you’re forced to carry: you don’t use member variables, instantiation, inheritance and all that stuff. And if you ever want to drop something you have loaded (let’s say you’re redefining an existing function) you’re hosed: the classloader keeps a reference to the previous class and you cannot drop it without dropping the classloader itself. This means that if you hope to drop something, you have to create an individual classloader for each new element you load…
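To make the ‘dead wood’ concrete, here is a minimal sketch of what loading a single function involves. Everything named here (`OneFunctionPerClass`, `OneShotLoader`, `classFileFor`, the generated class `F`) is made up for illustration, and the class file is written by hand instead of with a bytecode library; real implementations such as JRuby or Rhino are far more elaborate. The function body is three bytes of bytecode, and everything else is class scaffolding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration: one generated class and one loader per function.
class OneFunctionPerClass {

    // Builds a whole class file for: public static int call() { return value; }
    // Assumes className is a simple ASCII name and value fits in a signed byte.
    static byte[] classFileFor(String className, int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);                            // magic number
            out.writeShort(0);                                   // minor version
            out.writeShort(50);                                  // major version (Java 6)
            out.writeShort(8);                                   // constant pool count (7 entries + 1)
            out.writeByte(1); out.writeUTF(className);           // #1 Utf8: class name
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8: superclass name
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeByte(1); out.writeUTF("call");              // #5 Utf8: method name
            out.writeByte(1); out.writeUTF("()I");               // #6 Utf8: descriptor, no args, returns int
            out.writeByte(1); out.writeUTF("Code");              // #7 Utf8: attribute name
            out.writeShort(0x0021);                              // class flags: public, super
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // no interfaces
            out.writeShort(0);                                   // no fields
            out.writeShort(1);                                   // one method
            out.writeShort(0x0009);                              // method flags: public static
            out.writeShort(5);                                   // method name -> #5
            out.writeShort(6);                                   // method descriptor -> #6
            out.writeShort(1);                                   // one attribute
            out.writeShort(7);                                   // attribute name -> "Code"
            out.writeInt(15);                                    // attribute length in bytes
            out.writeShort(1);                                   // max_stack
            out.writeShort(0);                                   // max_locals
            out.writeInt(3);                                     // code length
            out.writeByte(0x10); out.writeByte(value);           // bipush value
            out.writeByte(0xAC);                                 // ireturn
            out.writeShort(0);                                   // empty exception table
            out.writeShort(0);                                   // no Code sub-attributes
            out.writeShort(0);                                   // no class attributes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);                  // not expected for an in-memory stream
        }
    }

    // A throwaway loader: it exists only so that the class it defines can be dropped with it.
    static class OneShotLoader extends ClassLoader {
        Class<?> define(String name, byte[] classFile) {
            return defineClass(name, classFile, 0, classFile.length);
        }
    }

    // "Defines" the function in a fresh loader and calls it once through reflection.
    static int loadAndCall(String name, int value) {
        try {
            Class<?> cls = new OneShotLoader().define(name, classFileFor(name, value));
            return (Integer) cls.getMethod("call").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `loadAndCall("F", 1)` and then `loadAndCall("F", 2)` works because each definition of `F` lives in its own loader. Defining `F` twice through the same `OneShotLoader` instance fails with a `LinkageError`, which is exactly the redefinition problem described above.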
I think this is bad, because what I have in mind is a system that you never have to stop or reboot. Something that lives, where you can change parts without having to stop the whole thing, something similar to what Alan Kay, Smalltalkers and Lispers advocate. You must be able to add, change, and remove parts, anytime.
About the memory footprint, I don’t know if it can reasonably be avoided, but I will try to keep an eye on it. I know that some Lisp systems display the same symptoms and it might not be a real problem, but, well, I must be too old already. Having had the “chance” to program on a 64k machine (Amstrad CPC 464/Z80), I am amazed at the size of current applications, especially with regard to what they do. We don’t program in assembler any longer, granted, and RAM is quite cheap, yes. It’s just me who cannot get used to it :)
Finally, the inability to manage low-level details is something that’s annoying. There is no way to experiment with other memory allocation policies from within a program, no way to easily juggle with bytes and pointers (don’t even mention the missing unsigned integers). I know this is what makes Java and the JVM a safe platform. You keep the powerful and potentially dangerous tools away from the programmers, for fear that they break the machine. Doing that, you also prevent potential good uses. And from a Lisp point of view, you also make a metacircular implementation impossible. There is no practical way you can code a JVM in Java. I want to code my VM in my language.
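As a small illustration of the unsigned-integer complaint, here is a sketch of the usual workaround when reading raw bytes in Java. The class and method names (`UnsignedBytes`, `toUnsigned`, `readUint32`) are made up for this example. Since `byte` and `int` are always signed, you mask to recover the unsigned value and widen to the next larger type to hold it.

```java
// Hypothetical helper names, for illustration only.
class UnsignedBytes {

    // A Java byte ranges over -128..127; masking with 0xFF yields the 0..255 value.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Reads a big-endian unsigned 32-bit value. It is returned as a long,
    // because an int cannot represent values above 2^31 - 1.
    static long readUint32(byte[] data, int offset) {
        return ((long) toUnsigned(data[offset]) << 24)
             | ((long) toUnsigned(data[offset + 1]) << 16)
             | ((long) toUnsigned(data[offset + 2]) << 8)
             | (long) toUnsigned(data[offset + 3]);
    }
}
```

Forgetting the mask is the classic bug here: the byte `0xFF` sign-extends to -1 when widened, which silently corrupts the higher bits of the result.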
I want to give all the tools to the programmer. At the very least, even if I hide the tools in the trunk, I want the programmer to know that he can find them easily. If he wants to tear apart the engine, it’s ok with me. Maybe (surely) one programmer will be better than me at tuning it, and that will be great.
Hi,
Interesting post. I wanted to correct something, though: you said Scala was the only bytecode generator among the languages you cited. This isn’t true:
JRuby has both a Just In Time JVM bytecode generator that keeps working with the dynamic nature of Ruby and an Ahead Of Time JVM bytecode compiler (I think it’s only useful when startup time matters). Another compiler, from a subset of Ruby to the Java language, is also on the way. Groovy is also able to generate bytecode ahead of time, like Scala, I think. I think there is no JIT in Groovy, but I’m not sure about it.
Finally, I don’t know if the classloader keeping references to classes is true for JRuby either. I would say it’s not, because the JRuby compiler generates a lot of classes on the fly, and some of these won’t be used anymore later. Some time ago JRuby had a problem with the classloader keeping references to those classes in the PermGen space and thus bloating it. I think they just fixed it. In any case, that’s worth another check.
Also, by the way, if you want more low-level control over the JVM, why don’t you fork it instead of reinventing the wheel? While I don’t really like the idea, I think what Google did with the Dalvik VM for its Android platform is better than inventing yet another VM for yet another language. OK, Dalvik is not a fork, it’s actually another VM, but at least it’s still able to run bytecode generated from the Java language. And it’s true that Dalvik brings some benefits (multiple applications in the same VM, saving memory).
Hope this helps.
Thanks for the comments.
I first thought I would reply here, but I think I will write a more thorough response as a new post.