Saturday, October 31, 2009

Scala Type Inference and Static Linking - An Accident Waiting to Happen?

Type inference in Scala is great. Anything that means I have to type less code is great! I'll also openly admit that I'm a die-hard fan of static typing. When I think about dynamic typing, all I can think about is an object being validly sent to a block of code that was never meant to handle that object, and the compiler thinking that's okay. For me, that's just one more crack for bugs to slip through that our team doesn't need.

So, basically, I want it both ways - I want type safety, but I don't want to have to write the types out. But could this cause problems? If my answer were no, this would be a rather pointless blog entry.

An issue that comes into play with both Java and Scala is that, along with static typing, comes static linking. I'll admit I'm nowhere near enough of a language theorist to tell you that the two are intrinsically paired, but I see signs that this is the case. Static typing means the compiler checks that the type of every expression is the same as, or at least assignable to, the type of the location where the result of the expression is used, be it a variable, a method parameter or part of another expression. Static linking is the way the compiler records in byte code exactly which methods it will call. Where there are overloaded methods on the called object, the compiler chooses which of those overloaded methods to call at compile time, based on the static type of the arguments, rather than letting the JVM decide at run time based on the actual type of the arguments. So, the way I see it, if you didn't have static linking, static typing would only provide guarantees at compile time and not at run time.

Here's a Scala example of the effect that static linking has on a program:

object StaticLinkingExample {
  def main(args : Array[String]) {
    val stringReference : String = "foo"
    val objectReference : Object = stringReference
    printit(stringReference)
    printit(objectReference)
  }

  def printit(value : String) { println("String version called") }
  def printit(value : Object) { println("Object version called !!!") }
}

The output of this application is:

String version called
Object version called !!!

Notice that, even though 'objectReference' is clearly a String, the printit(Object) method is called. Why? Because the static type of objectReference is Object, and that's the only information that the compiler uses when deciding which method to link. This is static linking in action.

What has this got to do with type inference? Well, the thing about type inference is that, even though you generally don't have to specify the type that values have or that methods return, the type that is used is still very important to the JVM, because of static linking.

The potential for trouble comes to a head when you start changing code, because with type inference in play you can very easily change the type of a variable or method without even realising it. In Java, if you changed the type, you'd realise straight away, because the compiler would tell you that the resulting type no longer matches the type you declared. With type inference, this doesn't happen - the compiler just invisibly changes the type of the member for you.
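Here's a throwaway sketch of what I mean, with hypothetical names (nothing to do with the Foo example that follows): editing the body of a method is enough to silently change its inferred type.

object ConfigLoader {
  // Version 1 of the body was Map("retries" -> "3"), so the inferred return
  // type was Map[String, String]. After the edit below it's Map[String, Int] -
  // a different type, and the compiler changed it without a murmur.
  def settings() = Map("retries" -> 3)
}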

Below is an example of how the maintenance of code using type inference could cause errors in your application. Imagine that I have several applications, each of which makes use of a Foo, so I keep all my Foo-related classes in a separate project (and JAR) that is used by each application. As a classically-trained GoF Pattern programmer, I make sure all my applications get their Foo from a FooFactory. Here's the first version of my FooFactory:

trait Foo
class StandardFoo extends Foo

object FooFactory {
  def newFoo() = new StandardFoo
}

Note that the return type of newFoo() is not declared, but is inferred by Scala.
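In other words, the compiler infers the return type from the body, so it's as though I had written the type out explicitly - and the inferred type is StandardFoo, the concrete class, not Foo:

object FooFactory {
  def newFoo() : StandardFoo = new StandardFoo
}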

Now here's one of my applications that makes use of the FooFactory:

object FooApplication {
  def main(args : Array[String]) {
    val result : Foo = FooFactory.newFoo()
    println(result)
  }
}

I run my application, and the result is as expected:

StandardFoo@7ca3d4cf

But when I deploy this code into production, I find the performance is rather bad. I sit and think about it for a little while and realise there is a much better way to implement Foo, so I write a new implementation, AdvancedFoo, and change my FooFactory:

trait Foo
class StandardFoo extends Foo
class AdvancedFoo extends Foo

object FooFactory {
  def newFoo() = new AdvancedFoo
}

The performance problem has to be fixed before the weekend, but it's Friday night and I really want to get home to my wife and kids. I know my application won't break, because I've used a Gang of Four pattern to hide the implementation details from my applications, so I just build the FooFactory library version 2, chuck it into production and go home.

On Monday morning, I look at the results from the weekend's FooApplication run and, to my absolute surprise and horror, I find this:

java.lang.NoSuchMethodError: FooFactory$.newFoo()LStandardFoo;

What happened? No such method? My FooFactory still has a newFoo() method. But the last part of the message reveals the problem: StandardFoo. A look at the byte code will show exactly what went wrong. First, here's the output of running javap on the first version of FooFactory:

Compiled from "FooFactory.scala"
public final class FooFactory extends java.lang.Object{
public static final StandardFoo newFoo();
public static final int $tag() throws java.rmi.RemoteException;
}

And let's compare that to the byte code from version 2:

Compiled from "FooFactory.scala"
public final class FooFactory extends java.lang.Object{
public static final AdvancedFoo newFoo();
public static final int $tag() throws java.rmi.RemoteException;
}

Okay, so newFoo() was returning a StandardFoo, and in version two the return type is AdvancedFoo. It's tempting to think that shouldn't matter. The type of the 'result' val in FooApplication was still just Foo, so it should happily accept whatever newFoo() returns, shouldn't it? Let's have a look at the byte code of FooApplication's main method:

public void main(java.lang.String[]);
Code:
0: getstatic #26; //Field FooFactory$.MODULE$:LFooFactory$;
3: invokevirtual #30; //Method FooFactory$.newFoo:()LStandardFoo;
6: astore_2
7: getstatic #35; //Field scala/Predef$.MODULE$:Lscala/Predef$;
10: aload_2
11: invokevirtual #39; //Method scala/Predef$.println:(Ljava/lang/Object;)V
14: return

And there's the problem, clear as day, on the line starting at byte 3. FooApplication is statically linked to a version of newFoo() that returns a StandardFoo. Did you know that the JVM considers the return type as part of the method signature? A method with the exact same name and same arguments but with a different return type is a different method. So version 2 of the FooFactory doesn't actually contain the method to which FooApplication was statically linked - and hence my NoSuchMethodError.
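Putting the two method signatures side by side, using the same descriptor notation that appears in the byte code above, makes the mismatch obvious (the second line is what we'd expect version 2 of the library to export, given its javap output):

FooFactory$.newFoo:()LStandardFoo;   // what FooApplication is linked against
FooFactory$.newFoo:()LAdvancedFoo;   // what version 2 of FooFactory provides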

The quick-fix solution to this is pretty simple: FooApplication needs to be recompiled. No change to the source is necessary, but the recompilation will re-write the byte code with a static link to the new version of newFoo() that has an AdvancedFoo return type.

So, that fixes this instance of the problem, but it won't stop it from happening again (when I upgrade to GridFoo!). How do we prevent that? Sadly, for "less is more" coders like me, the answer is to not use type inference, at least in locations where you want to hide implementation details. Basically, we need to follow Item 34 from Effective Java (1st Edition): "Refer to objects by their interfaces". While in Java this principle mainly saves you from source changes when you change the return type, in Scala the source would typically not need any change, but, without recompilation, the statically-linked byte code is still incompatible. If our original implementation of newFoo() had a declared return type of Foo, like this...

object FooFactory {
  def newFoo() : Foo = new StandardFoo
}

… then the method signature of FooFactory version 2 would have been identical to version 1 and the statically linked FooApplication wouldn't have known the difference.
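If I ran javap over that declared-return-type version of the FooFactory, I'd expect output along these lines - a sketch following the same pattern as the listings above, with the interface rather than the implementation class in the signature:

Compiled from "FooFactory.scala"
public final class FooFactory extends java.lang.Object{
public static final Foo newFoo();
public static final int $tag() throws java.rmi.RemoteException;
}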

In some ways this is obviously a bit of a straw man: I changed the version of a library and didn't recompile my code. But in the world of large projects, complex dependency graphs and (gasp) transitive dependencies, there's always a chance for bizarre things like this to crop up as other people change things and don't let you know.

I think an argument could even be made here for not using type inference at all, or at least not on public methods in projects that others depend on. If this sounds a bit Draconian, consider this: the point of type inference is mostly to relieve you from having to write or think about type declarations. But if you now have to think about when you do and don't want to rely on type inference, half that advantage is lost, and it may be easier just to declare your types every time.
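For what it's worth, the compromise I'm leaning towards looks something like this (a sketch with hypothetical names, not a prescription): declare the types of anything public, and let inference do its thing where the type never escapes the method.

trait Bar
class StandardBar extends Bar

object BarService {
  // Public member: the return type is declared, so the compiled signature
  // stays pinned to Bar no matter how the implementation changes.
  def newBar() : Bar = new StandardBar

  // Local use only: inference is harmless here, because the inferred type of
  // 'bar' never becomes part of a signature that other JARs link against.
  def describeBar() : String = {
    val bar = newBar()
    bar.toString
  }
}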