27 Jan 08

The magic of the web

That’s terrific! I’m here in my living room, trying to sort out some of the ideas in my head and putting them in this blog in the hope it would help, and then, someone replies and even proves me wrong! There’s no place to hide one’s ignorance, not even on the web :o )

The other day I was trying to clearly describe why I did not like the JVM as a platform for language experimentation, and I got a comment from a reader. Imagine that: a person who read my non-native English (may I say bytecode English?) and took the time to respond. That’s nice. What’s not so nice is that the comment is mostly a rebuttal of my arguments, and the worst part is that the rebuttal is right. My point was that for a dynamic language you need to generate a new Java class for each new function, and that because class data is kept forever (in the permgen space, short for permanent generation), there would come a time when the JVM would run out of space. Well, Home: 0 – Visitors: 1. The first part (new class generation) still holds, but the second simply isn’t true. I did some research after the previous post to try to see the light.

The Experiment

My test program contains two classes:

  • the first one is the program driver, with a main method and a function that can create new classes on the fly, with almost nothing in them but a constructor and a static function. This is done using the nice ASM library.
  • the second one is a simple classloader.

The main function creates 100,000 classes and instantiates each one. There are two options: either keep all instances in an array, or simply discard them. In the first case, the JVM is forced to keep all classes around. In the second case, I thought that the JVM would keep the class data too. And I was wrong, at least for Sun’s JVM 6 on Linux. Because each generated class gets its own classloader, discarding the instance leaves the class and its loader unreachable, so the collector is free to reclaim them. The complete run in the first case successfully crashes after having filled the permgen space. In the second case, the test completes and jconsole shows a sawtooth graph for memory in the non-heap and permgen space. I will try the same program on other JVMs, just to see if it was always like that and if other ‘vendors’ are as smart as Sun.
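The sawtooth seen in jconsole can also be watched from inside the program. What follows is only a minimal sketch using the standard java.lang.management API; the class name NonHeapWatcher and its helper method are made up for this illustration and are not part of the test program. Pool names depend on the JVM: a Sun JVM 6 reports a permgen pool, while recent JVMs replaced it with Metaspace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of the original test program
public class NonHeapWatcher {

  // Sum of the bytes currently used by all non-heap memory pools
  public static long nonHeapUsedBytes() {
    long total = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() == MemoryType.NON_HEAP) {
        MemoryUsage usage = pool.getUsage();
        if (usage != null) { // null when the pool is no longer valid
          total += usage.getUsed();
        }
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // Print each non-heap pool with its current usage
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() == MemoryType.NON_HEAP && pool.getUsage() != null) {
        System.out.println(pool.getName() + ": " + pool.getUsage().getUsed() + " bytes");
      }
    }
    System.out.println("total non-heap: " + nonHeapUsedBytes() + " bytes");
  }
}
```

Calling nonHeapUsedBytes() every few thousand iterations of the class-creation loop and printing the result should show the same rise and fall as the graph, assuming the collector runs during the test.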

The conclusions

The permanent generation is not permanent after all: memory can be reclaimed from there too.

When you write false things in a remote corner of the web because you haven’t done your research, you might well be caught!

What’s next?

In a future post, I’ll try to be less stupid and enumerate irrefutable facts to support my view that the JVM is not the best platform for dynamic language experimentation.

The code

package my.test;

import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.Type;

public class Tester {

  // Change this value to force the vm to keep class data around
  private static final boolean KEEP_OJBECTS = false;

  // Initial sleep time to allow jconsole connection before the real work
  private static final int SLEEP_TIME = 20000;

  private static final int NB_CLASSES = 100000;

  public static class TCL extends ClassLoader {

    // Bytecode of the generated classes, indexed by class name
    private Map<String, byte[]> classes;

    public TCL() {
      this.classes = new HashMap<String, byte[]>();
    }

    public void add(String name, byte[] bytecode) {
      this.classes.put(name, bytecode);
    }

    @Override
    public Class<?> findClass(String name) throws ClassNotFoundException {
      byte[] b = this.classes.get(name);
      if (b == null) {
        throw new ClassNotFoundException(name);
      }
      return defineClass(name, b, 0, b.length);
    }

  }

  public static void main(String args[]) {

    try {
      Thread.sleep(SLEEP_TIME);
    } catch (InterruptedException e) {
      // Ignored: the sleep only leaves time to attach jconsole
    }

    String name = "mypack.MyClass";
    Object[] objs = new Object[NB_CLASSES];
    for (int i = 0; i < NB_CLASSES; ++i) {
      String cname = name + i;
      Object o = createAndTestClass(cname);
      if (KEEP_OJBECTS) {
        objs[i] = o;
      }
    }
  }

  private static Object createAndTestClass(String name) {
    String iname = name.replace('.', '/');

    ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES
        | ClassWriter.COMPUTE_MAXS);
    cw.visit(49, Opcodes.ACC_PUBLIC, iname, null, Type
        .getInternalName(Object.class), null);

    // Constructor
    MethodVisitor mv = cw.visitMethod(Opcodes.ACC_PUBLIC, "<init>", "()V",
        null, null);
    mv.visitCode();
    mv.visitVarInsn(Opcodes.ALOAD, 0);
    mv.visitMethodInsn(Opcodes.INVOKESPECIAL, "java/lang/Object", "<init>",
        "()V");
    mv.visitInsn(Opcodes.RETURN);
    // Arguments are ignored: COMPUTE_MAXS recomputes stack and locals sizes
    mv.visitMaxs(1, 0);
    mv.visitEnd();

    // method public static String callMe() { return "[classname]"; }
    mv = cw.visitMethod(Opcodes.ACC_PUBLIC | Opcodes.ACC_STATIC, "callMe",
        "()Ljava/lang/String;", null, null);
    mv.visitCode();
    mv.visitLdcInsn("[" + name + "]");
    mv.visitInsn(Opcodes.ARETURN);
    mv.visitMaxs(1, 0);
    mv.visitEnd();
    cw.visitEnd();

    TCL cl = new TCL();
    cl.add(name, cw.toByteArray());

    try {
      Class c = cl.loadClass(name);
      Method m = c.getMethod("callMe");

      String ret = (String) m.invoke(null);
      System.out.println(name + ".callMe() -> " + ret);

      return c.newInstance();

    } catch (Throwable t) {
      t.printStackTrace();
    }

    return null;
  }
}
25 Jan 08

What’s on my mind, or « Me against the Java Virtual Machine »

My current idea, like that of any respectable programmer, is to create my own language, and as a first step, my own virtual machine. I’d very much like to create the successor to the JVM, at least in terms of popularity. I know it will not happen, but I have to keep a goal!

I like a lot of elements of the JVM but I also think it’s not a general enough virtual machine. I like its portability and ubiquity, its acceptance in business, its more than decent performance and its embryo of reflectivity. What I hate is its incestuous relationship with Java-the-language, its memory footprint and its inability to cope with low-level details.

The relationship to the language might have been a good technical choice in the beginning, but now that people try to run other languages on it, you can see all the unnatural things they have to do to adapt to the Java class model. If you want to get all the performance advantages of Java, you cannot write a simple interpreter: you need either to compile to bytecode ahead of time or to generate it in the running application. Scala is an example of the first case; Jython, JRuby, Rhino and ABCL Lisp are examples of the second case. Scala’s case is not too terrible: it was their design choice to keep as close as possible to the Java class model, so there is no great mismatch.

As for dynamic languages, this is where things go bad. Since the smallest element that you can load in the JVM is a class, you need to generate a class for each new individual function. All the class structure is ‘dead wood’ that you’re forced to carry. You don’t use member variables, instantiation, inheritance and all that stuff. And if you ever want to drop something you have loaded (let’s say you’re redefining an existing function) you’re hosed: the classloader keeps a reference to the previous class and you cannot drop it without dropping the classloader itself. This means that if you hope to drop something, you have to create an individual classloader for each new element you load…
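To make the classloader constraint concrete, here is a minimal sketch, assuming nothing beyond the JDK. The class name RedefineDemo, the loader OneShotLoader and the function name "Fn" are all hypothetical, and the class file is written by hand only to keep the example free of any bytecode library. It defines a one-function class, tries to redefine it in the same loader, then redefines it in a fresh loader.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Method;

// Hypothetical illustration: one classloader per loaded "function"
public class RedefineDemo {

  // Exposes the protected defineClass method
  public static class OneShotLoader extends ClassLoader {
    public Class<?> define(String name, byte[] bytecode) {
      return defineClass(name, bytecode, 0, bytecode.length);
    }
  }

  // Hand-written class file: public static String callMe() { return value; }
  public static byte[] classReturning(String className, String value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(0xCAFEBABE);                  // magic number
      out.writeShort(0); out.writeShort(49);     // minor, major version
      out.writeShort(10);                        // constant pool count (9 entries + 1)
      out.writeByte(1); out.writeUTF(className.replace('.', '/')); // #1 Utf8
      out.writeByte(7); out.writeShort(1);                         // #2 Class #1
      out.writeByte(1); out.writeUTF("java/lang/Object");          // #3 Utf8
      out.writeByte(7); out.writeShort(3);                         // #4 Class #3
      out.writeByte(1); out.writeUTF("callMe");                    // #5 Utf8
      out.writeByte(1); out.writeUTF("()Ljava/lang/String;");      // #6 Utf8
      out.writeByte(1); out.writeUTF("Code");                      // #7 Utf8
      out.writeByte(1); out.writeUTF(value);                       // #8 Utf8
      out.writeByte(8); out.writeShort(8);                         // #9 String #8
      out.writeShort(0x0021);                    // ACC_PUBLIC | ACC_SUPER
      out.writeShort(2); out.writeShort(4);      // this class, super class
      out.writeShort(0); out.writeShort(0);      // no interfaces, no fields
      out.writeShort(1);                         // one method
      out.writeShort(0x0009);                    // ACC_PUBLIC | ACC_STATIC
      out.writeShort(5); out.writeShort(6);      // name, descriptor
      out.writeShort(1);                         // one attribute: Code
      out.writeShort(7); out.writeInt(15);       // attribute name, length
      out.writeShort(1); out.writeShort(0);      // max stack, max locals
      out.writeInt(3);                           // code length
      out.writeByte(0x12); out.writeByte(9);     // ldc #9
      out.writeByte(0xB0);                       // areturn
      out.writeShort(0); out.writeShort(0);      // no exception table, no attributes
      out.writeShort(0);                         // no class attributes
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException(e);        // not expected for an in-memory stream
    }
  }

  // Invokes the static callMe() method of a generated class
  public static String call(Class<?> c) {
    try {
      Method m = c.getMethod("callMe");
      return (String) m.invoke(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    OneShotLoader first = new OneShotLoader();
    Class<?> v1 = first.define("Fn", classReturning("Fn", "v1"));
    System.out.println(call(v1)); // prints v1

    try {
      // Same loader, same name: the JVM refuses the second definition
      first.define("Fn", classReturning("Fn", "v2"));
    } catch (LinkageError e) {
      System.out.println("same loader refused the redefinition");
    }

    // A fresh loader accepts a new class under the same name
    Class<?> v2 = new OneShotLoader().define("Fn", classReturning("Fn", "v2"));
    System.out.println(call(v2)); // prints v2
  }
}
```

The old version stays alive for as long as something still references it, its class or its loader; only once all three are unreachable can the JVM consider unloading it.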

I think this is bad, because what I have in mind is a system that you never have to stop or reboot. Something that lives, where you can change parts without having to stop the whole thing, something similar to what Alan Kay, Smalltalkers and Lispers advocate. You must be able to add, change, and remove parts, anytime.

About the memory footprint, I don’t know if it can reasonably be avoided, but I will try to keep an eye on it. I know that some Lisp systems display the same symptoms and it might not be a real problem, but, well, I must be too old, already. Having had the “chance” to program on a 64k machine (Amstrad CPC 464/Z80) I am amazed at the size of current applications, especially with regard to what they do. We don’t program in assembler any longer, granted, and RAM is quite cheap, yes. It’s just me who cannot get used to it :o )

Finally, the inability to manage low-level details is something that’s annoying. There is no way to experiment with other memory allocation policies from within a program, no way to easily juggle bytes and pointers (don’t even mention the missing unsigned integers). I know this is what makes Java and the JVM a safe platform. You keep the powerful and potentially dangerous tools away from the programmers, for fear that they break the machine. Doing that, you also prevent potential good uses. And from a Lisp point of view, you also make a metacircular implementation impossible. There is no practical way you can code a JVM in Java. I want to code my VM in my language.
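As a small illustration of the missing unsigned integers, this is the kind of masking one has to write by hand to read a byte or an int as unsigned. It is only a sketch; the class and method names are invented for the example.

```java
// Hypothetical helper showing the usual masking workaround
public class UnsignedDemo {

  // Reads a signed Java byte as a value between 0 and 255
  public static int unsignedByte(byte b) {
    return b & 0xFF;
  }

  // Reads a signed Java int as a value between 0 and 2^32 - 1
  public static long unsignedInt(int i) {
    return i & 0xFFFFFFFFL;
  }

  public static void main(String[] args) {
    byte raw = (byte) 0xF0;
    System.out.println(raw);               // prints -16
    System.out.println(unsignedByte(raw)); // prints 240
    System.out.println(unsignedInt(-1));   // prints 4294967295
  }
}
```

Each unsigned value has to be widened to the next larger signed type, which is exactly the kind of juggling a lower-level language does not require.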

I want to give all the tools to the programmer. At the very least, even if I hide the tools in the trunk, I want the programmer to know that he can find them easily. If he wants to tear apart the engine, it’s ok with me. Maybe (surely) one programmer will be better than me at tuning it, and that will be great.

16 Jan 08

Hi there.

First blog post but nothing to say. I’ll post ideas, thoughts and progress about things I do when programming.

My current interest is in new languages and VMs. Like every other programmer, I want to write my own language, write my own super-efficient VM for it, make it portable to all platforms, have lots of programmers use it and become famous (and rich) with it.

Ok. Still with me? That’s really nice of you. Well, I will probably never do all that, but at least I can try. I hope to learn a few things along the way, about other languages, compiler theory and practice, this kind of stuff.

I read a lot about existing and past languages, their design choices, their implementations, their deaths… It’s very interesting and very perplexing too. I am sometimes lost in the middle of all that. Language design is about choice: what to include, what to reject, how to do things, how to regard the programmer? That’s the difficult part. You have to exclude things. Personally, I’d like to include as much as possible, or make further extension possible. This is my first design decision: when facing a seemingly strong binary choice, try to support both ways. It seems like a non-decision, in fact, but let me illustrate that with examples.

Should the memory of the VM be garbage collected or not? Both! It should be possible to provide both garbage collected values and manually managed chunks of memory. Why? Because it is the only way to implement the garbage collector using the language and the VM. In Java, for instance, it is impossible to implement a custom memory manager, because all values are managed by the garbage collector and there is no way to access physical memory.

[to be continued]