Saturday, November 28, 2009

Scala: Under The Hood of Hello World

The exercise of printing 'Hello World' from a new programming language is so popular because it allows you to see quite a few things about a language straight away, including the style of the syntax, how to print output and the amount of boilerplate required just to start a simple application.

After getting Hello World to work, there are really two directions you can go: forward, doing other stuff with the language, or down, drilling deeper by asking the question "How does it work?"

While moving forward is certainly more productive and exciting in the short term, understanding what's going on underneath the covers of your code can be very beneficial and can lead to insights that may affect the way you write code for the rest of your programming life. For instance, if you've done a bit of Java and a bit of Scala, it may have occurred to you that Scala allows you to define objects (as opposed to classes), while Java has no concept of compile-time objects, so how does the Scala compiler map the concept of an object into Java's single-minded view of classes? We'll find out…

Opening the Hood with Scala
Unless you've managed to run a Scala Hello World program accidentally, you have probably read enough of some Scala tutorial, book or blog to know that Scala is compiled to Java Bytecode - the binary language that is read and executed by the Java Virtual Machine (JVM). This means that a good way to get an idea of what actually happens when we run a Scala program is to have a look at the bytecode that is generated when we compile the application.

So, here's your standard Scala Hello World program:

object HelloWorld {
  def main(args: Array[String]) {
    println("Hello World!")
  }
}

If we compile this program and have a look at the output directory for classes, we'll find two Java class files:

-rw-r--r-- 1 graham staff 778 28 Nov 09:43 HelloWorld$.class
-rw-r--r-- 1 graham staff 607 28 Nov 09:43 HelloWorld.class

Two classes? Yep, two classes. We compile one "object" and we end up with two classes. Let's have a look at what's inside these classes.

If you want to see what's going on inside a Java class (or a Scala class or object that's been compiled to a Java class), you can inspect the class file with the JDK tool 'javap'. Depending on the options you give it, javap can print just a summary of the class's method signatures, or a readable listing of all the compiled bytecode instructions. Let's start by having a look at just the signatures:

[scala-tests] graham$ javap -classpath . HelloWorld HelloWorld$
Compiled from "HelloWorld.scala"
public final class HelloWorld extends java.lang.Object{
public static final void main(java.lang.String[]);
public static final int $tag() throws java.rmi.RemoteException;
}

Compiled from "HelloWorld.scala"
public final class HelloWorld$ extends java.lang.Object implements scala.ScalaObject{
public static final HelloWorld$ MODULE$;
public static {};
public HelloWorld$();
public void main(java.lang.String[]);
public int $tag() throws java.rmi.RemoteException;
}

Wow! So, we wrote one method - HelloWorld.main() - and we've ended up with 6 methods and one static field across two classes. Obviously not all of this code is relevant to the running of our HelloWorld program, so let's discuss some of the surrounding fluff and then put it out of mind.

What Is This $tag() Thing?
Probably the first obvious thing is that both classes have a method called $tag(). If we have a look at the Scaladoc for the ScalaObject trait, the base trait of all classes and objects compiled by Scala, we'll see no mention of the $tag() method. However, if you have a look at the source of ScalaObject.scala, you'll find this definition, which doesn't show up in the Scaladoc:

/** This method is needed for optimizing pattern matching expressions
* which match on constructors of case classes.
*/
@remote
def $tag(): Int = 0

Basically, the $tag() method is a simple categorisation method, somewhat akin to java.lang.Object's hashCode(), that Scala uses to make its much-lauded pattern matching perform better. Interestingly, $tag() has been removed in the 2.8.0 branch of Scala, so now that we understand it, we know that we don't really have to worry about understanding it any more!

What Makes It Run?
Having learnt enough to ignore the $tag() method, let's have a look at how our program runs. Your keen eye may have noticed that we have two main() methods - one on the HelloWorld class and one on the HelloWorld$ class. If you're lucky enough to have two keen eyes, you will have noticed that the main() method on HelloWorld is static, but the main() method on HelloWorld$ is not. The significance of this is that, though there are two main() methods, only one of the classes - HelloWorld - can be used to start the application. If I try to start the application by telling the java command to start with the HelloWorld$ class, the JVM happily tells me that I'm an idiot:

[scala-tests] graham$ java -cp .:$HOME/Library/Scala/Current/lib/scala-library.jar HelloWorld$
Exception in thread "main" java.lang.NoSuchMethodError: main

Instead of acting like an idiot, let's have a look at what this HelloWorld.main() method does. The javap command allows us to see the actual bytecode instructions that are contained within the class if we pass the -c option:

[scala-tests] graham$ javap -c -classpath . HelloWorld
Compiled from "HelloWorld.scala"
public final class HelloWorld extends java.lang.Object{
public static final void main(java.lang.String[]);
Code:
0: getstatic #11; //Field HelloWorld$.MODULE$:LHelloWorld$;
3: aload_0
4: invokevirtual #13; //Method HelloWorld$.main:([Ljava/lang/String;)V
7: return
...

If you've learnt a little bit about Java bytecode, reading what this method does is pretty simple…
0: This instruction retrieves the value of the static field HelloWorld$.MODULE$ and pushes it onto the stack. We can see both from this line (after the colon) and from the signatures we looked at above that the type of the MODULE$ field is HelloWorld$.
3: This instruction takes the first argument to the method - the String[] that represents the command-line arguments - and pushes it onto the stack.
4: This instruction invokes the main() instance method on the HelloWorld$ object that was pushed onto the stack at 0, passing it the String[] pushed onto the stack at 3.
At the heart of it, this is a pretty simple operation. If we were to write this method ourselves in Java, it would look like this:

public static void main(String[] args) {
    HelloWorld$.MODULE$.main(args);
}

What Is This MODULE$ Thing?
Moving on, I think the next thing we want to find out is, what is MODULE$ and how is it initialised? Chances are you're pretty smart and you've probably figured this out already, so let's just cut straight to the bytes:

[scala-tests] graham$ javap -c -classpath . HelloWorld$
Compiled from "HelloWorld.scala"
public final class HelloWorld$ extends java.lang.Object implements scala.ScalaObject{
public static final HelloWorld$ MODULE$;

public static {};
Code:
0: new #10; //class HelloWorld$
3: invokespecial #13; //Method "<init>":()V
6: return

public HelloWorld$();
Code:
0: aload_0
1: invokespecial #17; //Method java/lang/Object."<init>":()V
4: aload_0
5: putstatic #19; //Field MODULE$:LHelloWorld$;
8: return
...

Again, reading this is pretty simple. Basically it's creating a singleton instance of HelloWorld$ which is stored in a public static field. There's a static block that creates a new HelloWorld$ object and there's a HelloWorld$() constructor, which does something which I find a little bit odd. If we translated it into Java, we'd have something like this:

public final class HelloWorld$ {
    public static final HelloWorld$ MODULE$;

    static {
        new HelloWorld$();
    }

    public HelloWorld$() {
        super();
        MODULE$ = this;
    }
}

Does that assignment in the constructor look a little weird to you? The constructor is setting the value of the static MODULE$ field. This is especially strange seeing as MODULE$ is a final static field. If you tried to use this Java code you'd find that it wouldn't even compile. At first I thought that this was so odd that I must have interpreted the bytecode incorrectly, so I wrote this little Java class to test out what happens when you call the HelloWorld$() constructor directly:

public class HelloWorldTest {
    public static void main(String[] args) {
        System.out.println(System.identityHashCode(HelloWorld$.MODULE$));
        new HelloWorld$();
        System.out.println(System.identityHashCode(HelloWorld$.MODULE$));
    }
}

Sure enough, the output of running this code shows that every invocation of the HelloWorld$() constructor does indeed change the value of MODULE$, even though it's final:

2114843907
1179703452

That's a bit of an oddity, but there's a lesson there - never call the constructor of a Scala object from Java code - there is no telling what kind of havoc you may wreak if you instantiate a second instance of an object that is assumed to be a singleton within your Scala code.

Dude, Where's My Code?
Okay, so nothing we've examined up until now has had ANYTHING to do with the one line of imperative code that we wrote in Scala. But we're almost there, as there's only one piece of the double-main() puzzle left to examine - HelloWorld$.main():

[scala-tests] graham$ javap -c -classpath . HelloWorld$
Compiled from "HelloWorld.scala"
public final class HelloWorld$ extends java.lang.Object implements scala.ScalaObject{
...
public void main(java.lang.String[]);
Code:
0: getstatic #26; //Field scala/Predef$.MODULE$:Lscala/Predef$;
3: ldc #28; //String Hello World!
5: invokevirtual #32; //Method scala/Predef$.println:(Ljava/lang/Object;)V
8: return

And here we finally see the code that we actually wanted to run. This method is extremely similar to the main() method we saw in HelloWorld: It pushes the singleton instance of Scala's Predef object onto the stack, pushes the "Hello World!" constant onto the stack and then invokes the Predef.println() method.

Can You Repeat That In English?
Let's summarise what we found in the bytecode of this HelloWorld program. When you define a main() method on a Scala object, the Scala compiler splits the responsibility of running your application into two classes. One class has the same name as your object and contains a static method that is invoked by the JVM to start the application. The other class, which has the same name as your object except with a dollar sign ($) at the end, contains the actual code of your object's main() method in an instance method called main(), as well as a constructor and a public static field for creating and accessing a singleton instance of this special class.
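If you'd like to watch the singleton behaviour from the Scala side rather than in bytecode, here's a minimal sketch. The names Greeter and InitLog are hypothetical, made up just for this example. The body of an object runs once, when the object is first used (on the JVM, when its Greeter$ class is initialised), and every reference after that hands back the same instance:

```scala
// Hypothetical helper that counts how many times Greeter's body has run.
object InitLog {
  var initialisations = 0
}

// Compiled much like HelloWorld$ above: a Greeter$ class holding one shared instance.
object Greeter {
  // Object body: runs once, the first time Greeter is accessed.
  InitLog.initialisations += 1

  def greet(name: String): String = "Hello " + name + "!"
}
```

However many times you call Greeter.greet, InitLog.initialisations stays at 1, because there is only ever one Greeter.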

If you want to read more about why the code from your object ends up in a class that has a dollar sign at the end of its name, you might like to do some reading on companion objects in Scala.
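For anyone who hasn't met companion objects yet, here's a minimal sketch. Temperature is a hypothetical example, not part of the program above: a class and an object share one name, the object is allowed to call the class's private constructor, and the object's compiled class gets the trailing dollar sign so that it doesn't collide with the class:

```scala
// The class: its constructor is private, so outside code has to go through the companion.
class Temperature private (val celsius: Double)

// The companion object: same name, same source file, and allowed to use the private constructor.
object Temperature {
  def fromFahrenheit(fahrenheit: Double): Temperature =
    new Temperature((fahrenheit - 32) * 5 / 9)
}
```

Compiling this gives a Temperature class for the class and a Temperature$ class for the object, the same split we saw with HelloWorld and HelloWorld$.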

Bonus Marks for Pointing Out Something Nerdy
There is one other little surprise in here that I skipped over and which I only noticed while looking at a decompilation of a Java HelloWorld program. The HelloWorld class generated by the Scala compiler has no constructor at all. Not one. It is im-possi-ble to create an object of type HelloWorld. This is a minor departure from the convention of Java where every class has at least one constructor, by virtue of the Java compiler generating one for you if you decide not to define one. This little difference has absolutely no effect on anything at all, but I always find it interesting when the Scala guys decide to contradict ideas that have been Java "laws" since what seems like the beginning of time. It almost feels like they're slaying Jack Harkness.

Friday, November 6, 2009

XML Generation with Scala

When I first heard about a proposal to add native XML (a.k.a. XML literals) to Java, my first thought was: who writes XML in their code? I've created plenty of services that generate XML in my time, but they've all either used a template engine like JSP or Velocity or generated it from an object tree, as with JAXB. So my question became: who would even WANT to write XML in their code? It stank of a poor separation of concerns.

Today, however, I am eating my words. I've just written a small utility that uses Scala to generate the HTML of a simple web page, so I used Scala's native XML and it is Good. It's remarkably simple and intuitive, so I thought I'd share the love.

XML Literal Coercion

So, Scala essentially has XML literals, which means that you can just start typing XML in the middle of your code and the Scala compiler will automatically coerce it into an object that can be used for performing xml-ish operations. If you try this using the 'scala' interpreter, you can see some of what happens under the hood:

scala> val myXml = <test><tag/></test>
myXml: scala.xml.Elem = <test><tag></tag></test>

Note that there are no quotes or special characters around the XML - it's just XML. Scala automatically turns it into an Elem object.

But this isn't where the fun ends! The power of native XML comes from the fact that it's very easy to escape out of XML and into Scala code, right in the middle of your XML. You can escape into Scala simply to include some computed output in the XML, or you can do more crazy things like looping through a list and yielding a list of more Elem objects to be included.

Escaping to Scala: Text

From what I've seen so far, there are two slightly different ways to escape into Scala from an XML literal. The first is used when you want to escape into Scala to include text or another XML element and it looks like this:

val name = "Graham"
val xmlOne = <test>{name}</test>
println(xmlOne)

Output:

<test>Graham</test>

Escaping to Scala: Attributes

If, however, you want to escape at the XML attribute level, you would be WRONG if you tried to do this (like I did):

val xmlTwo = <test name="{name}"/>
println(xmlTwo)

You don't get a compilation error, but the "escaping" is not interpreted, so you get this:

<test name="{name}"></test>

The syntax for escaping an attribute properly looks like this:

val xmlThree = <test name={name}/>
println(xmlThree)

All we've done is drop the quotes, and now we get the output we want:

<test name="Graham"></test>

Escaping to Scala: Elements and Looping

Finally - this is where the power becomes really evident - you can escape into Scala and evaluate an expression that yields a list of Elems in order to include an iteration of values in your XML. Observe:

val hobbies = List("Scala", "Photography", "Cycling")
val xmlFour =
<test name={name}>
{for (hobby <- hobbies) yield <hobby>{hobby}</hobby>}
</test>
println(xmlFour)

Output:

<test name="Graham">
<hobby>Scala</hobby><hobby>Photography</hobby><hobby>Cycling</hobby>
</test>

Did you notice the really cool part of that? After we escaped out of the XML literal into Scala so that we could loop through the list, we then started another XML literal and then escaped out of that one to print the value 'hobby'! At that point we are nested 4-deep: Our outer XML literal contains nested Scala, which contains a nested XML literal, which contains more nested Scala. And yes, this could go on forever, down and down and down (just like the turtles). The great thing is that the syntax is so intuitive that you probably hardly noticed that we had gone that deep in this example - the code is simple and clear.
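Here's the same idea pulled together into a small self-contained object, with a couple of checks on the result. One assumption to flag: since Scala 2.11 the XML support lives in the separate scala-xml module, so on a modern compiler that library needs to be on the classpath for XML literals to compile. HobbiesXml is just a made-up name for this sketch:

```scala
import scala.xml.Elem

object HobbiesXml {
  val name = "Graham"
  val hobbies = List("Scala", "Photography", "Cycling")

  // Unquoted braces escape into Scala, so the attribute value is the value of `name`.
  val escaped: Elem = <test name={name}/>

  // Quoted braces are just text: the attribute value is literally "{name}".
  val quoted: Elem = <test name="{name}"/>

  // The for/yield produces one <hobby> element per list entry, each containing escaped text.
  val withHobbies: Elem =
    <test name={name}>{for (hobby <- hobbies) yield <hobby>{hobby}</hobby>}</test>
}
```

The `\` operator picks out child elements (or attributes, with an "@" prefix), which makes it easy to check that the nesting produced what we expected.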

A Little Warning

Note that, when you're looping and yielding, there's nothing to stop you returning something that's not an Elem - Scala will simply call toString on whatever you return and insert it into your XML as text. For example, the following compiles and runs, but doesn't produce what we want:

object Test {
  case class Scala;
  case class Photography;
  case class Cycling;

  def main(args: Array[String]) {
    val name = "Graham"
    val xmlFive =
      <test name={name}>
        {for (hobby <- List(Scala, Photography, Cycling)) yield hobby}
      </test>
    println(xmlFive)
  }
}

The output from this?

<test name="Graham">
&lt;function&gt;&lt;function&gt;&lt;function&gt;
</test>

Ewwww… (and, yes, the entities do appear in the output)

In conclusion: I used to think that native XML in Java was a stupid idea. Now I don't really care, because Scala has it, it looks good and it works fine, so whenever I need to produce XML from within code I'll just use Scala.

A Side-Note

If you're a bit of a genius, you're probably wondering why, in the last example, I showed my whole 'Test' object, whereas in all the other examples I just showed the contents of the main() method. The simple explanation is that I had originally defined the case classes in the main method, but I got this compile error:

Test.scala:19: error: forward reference extends over definition of value xmlFive
{for (hobby <- List(Scala, Photography, Cycling)) yield hobby}
^

To be honest, I wasn't (and I'm still not) exactly sure why this was a problem, but it seemed obvious that scalac was not happy for me to define a case class inside a method and then use that class in the same method (well, actually, I'm using its companion object). I wonder if the compiler injects the instantiation of the companion object at the end of the method, rendering it unusable throughout? That would seem a bit silly. Anyway, the obvious fix was to move the declarations of the case classes out of main(), which is why I posted the whole object as the example.

Saturday, October 31, 2009

Scala Type Inference and Static Linking - An Accident Waiting to Happen?

Type inference in Scala is great. Anything that means I have to type less code is great! I'll also openly admit that I'm a die-hard fan of static typing. When I think about dynamic typing, all I can think about is an object being validly sent to a block of code that was never meant to handle that object, and the compiler thinking that's okay. For me, that's just one more crack for bugs to slip through that our team doesn't need.

So, basically, I want it both ways - I want type safety, but I don't want to have to specify it. But could this cause problems? If my answer were no, this would be a rather pointless blog entry.

An issue that comes into play with Java and Scala is that along with static typing comes static linking. I'll admit that I'm nowhere near enough of a language theorist to be able to tell you that these two are intrinsically paired, but I see signs that this is the case. Static typing means the compiler will check that the type of every expression is the same as, or at least assignable to, the type of the location to which you apply the result of the expression, be it a variable, a method parameter or part of another expression. Static linking is the way in which the compiler records in the byte code which methods it will call. Where there are overloaded methods on the called object, the compiler chooses which of those overloaded methods to call at compile time, based on the static types of the method arguments, rather than letting the JVM decide at run time based on the actual types of the arguments. So, the way I see it, if you didn't have static linking, then static typing would only provide guarantees at compile time and not at run time.

Here's a Scala example of the effect that static linking has on a program:

object StaticLinkingExample {
  def main(args : Array[String]) {
    val stringReference : String = "foo";
    val objectReference : Object = stringReference;
    printit(stringReference);
    printit(objectReference);
  }

  def printit(value : String) { println("String version called") }
  def printit(value : Object) { println("Object version called !!!") }
}

The output of this application is:

String version called
Object version called !!!

Notice that, even though 'objectReference' clearly refers to a String, the printit(Object) method is called. Why? Because the static type of objectReference is Object, and that's the only information the compiler uses when deciding which method to link. This is static linking in action.
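To make the distinction a bit sharper, here's a small sketch contrasting overloading, which the compiler resolves from static types, with overriding, which is resolved at run time from the actual object. DispatchDemo, Greeting and English are hypothetical names invented for the example:

```scala
object DispatchDemo {
  // Overloads: the compiler chooses one from the *static* type of the argument.
  def printit(value: String): String = "String version called"
  def printit(value: AnyRef): String = "Object version called !!!"

  // Overriding: the implementation is chosen at run time from the object's actual class.
  trait Greeting { def text: String = "generic greeting" }
  class English extends Greeting { override def text: String = "hello" }
}
```

So the receiver of a call gets run-time dispatch, but the arguments only influence which overload is linked as far as the compiler could see their types.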

What has this got to do with type inference? Well, the thing about type inference is that, even though you generally don't have to specify the type that values have or that methods return, the type that is used is still very important to the JVM, because of static linking.

The potential for trouble comes to a head when you start changing code, because when you change code that's making use of type inference, you can very easily change the type of a variable or method without even realising. In Java, if you changed the type, you would realise straight away, because your compiler would tell you that the resulting type doesn't match your specified type. With type inference, this doesn't happen - the compiler will just invisibly change the type of the member for you.

Below is an example of how the maintenance of code using type inference could cause errors in your application. Imagine that I have several applications and each of my applications makes use of a Foo, so I have all my Foo-related classes in a separate project (and JAR) which is used by each of my applications. As a classically-trained GoF Pattern programmer, I make sure all my applications get their Foo from a FooFactory. Here's the first version of my FooFactory:

trait Foo
class StandardFoo extends Foo

object FooFactory {
  def newFoo() = new StandardFoo
}

Note that the type of newFoo() is not declared, but is inferred by Scala.
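If you want to confirm what Scala inferred, here's a minimal sketch (InferredTypeCheck is a made-up name). The assignment to a StandardFoo-typed val only compiles because the inferred result type is StandardFoo rather than Foo, and Java reflection shows that the same concrete type ends up in the compiled method signature:

```scala
trait Foo
class StandardFoo extends Foo

object FooFactory {
  def newFoo() = new StandardFoo // result type inferred as StandardFoo, not Foo
}

object InferredTypeCheck {
  // Compiles only because the inferred result type is the concrete StandardFoo.
  val concrete: StandardFoo = FooFactory.newFoo()

  // Name of the return type recorded in the compiled newFoo() method.
  def returnTypeName: String =
    FooFactory.getClass.getMethod("newFoo").getReturnType.getName
}
```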

Now here's one of my applications that makes use of the FooFactory:

object FooApplication {
  def main(args : Array[String]) {
    val result : Foo = FooFactory.newFoo()
    println(result);
  }
}

I run my application, and the result is as expected:

StandardFoo@7ca3d4cf

But when I deploy this code into production, I find the performance is rather bad. I sit and think about it for a little while and realise there is a much better way to implement Foo, so I write a new AdvancedFoo and change my FooFactory:

trait Foo
class StandardFoo extends Foo
class AdvancedFoo extends Foo

object FooFactory {
  def newFoo() = new AdvancedFoo
}

The performance problem has to be fixed before the weekend, but it's Friday night and I really want to get home to my wife and kids. I know my application won't break, because I've used a Gang of Four pattern to hide the implementation details from my applications, so I just build the FooFactory library version 2, chuck it into production and go home.

On Monday morning, I look at the results from the weekend's FooApplication run and, to my absolute surprise and horror, I find this:

java.lang.NoSuchMethodError: FooFactory$.newFoo()LStandardFoo;

What happened? No such method? My FooFactory still has a newFoo() method. But the last part of the message reveals the problem: StandardFoo. If we have a look at the byte code, we'll see the source of the problem. First, let's have a look at the output of running javap on the first version of FooFactory:

Compiled from "FooFactory.scala"
public final class FooFactory extends java.lang.Object{
public static final StandardFoo newFoo();
public static final int $tag() throws java.rmi.RemoteException;
}

And let's compare that to the byte code from version 2:

Compiled from "FooFactory.scala"
public final class FooFactory extends java.lang.Object{
public static final AdvancedFoo newFoo();
public static final int $tag() throws java.rmi.RemoteException;
}

Okay, so newFoo() was returning a StandardFoo, and in version two the return type is AdvancedFoo. It's tempting to think that shouldn't matter. The type of the 'result' val in FooApplication was still just Foo, so it should happily accept whatever newFoo() returns, shouldn't it? Let's have a look at the byte code of FooApplication's main method:

public void main(java.lang.String[]);
Code:
0: getstatic #26; //Field FooFactory$.MODULE$:LFooFactory$;
3: invokevirtual #30; //Method FooFactory$.newFoo:()LStandardFoo;
6: astore_2
7: getstatic #35; //Field scala/Predef$.MODULE$:Lscala/Predef$;
10: aload_2
11: invokevirtual #39; //Method scala/Predef$.println:(Ljava/lang/Object;)V
14: return

And there's the problem, clear as day, on the line starting at byte 3. FooApplication is statically linked to a version of newFoo() that returns a StandardFoo. Did you know that the JVM considers the return type as part of the method signature? A method with the exact same name and same arguments but with a different return type is a different method. So version 2 of the FooFactory doesn't actually contain the method to which FooApplication was statically linked - and hence my NoSuchMethodError.

The quick-fix solution to this is pretty simple: FooApplication needs to be recompiled. No change to the source is necessary, but the recompilation will re-write the byte code with a static link to the new version of newFoo() that has an AdvancedFoo return type.

So, that's a fix for this instance of the problem, but it won't stop it from happening again (when I upgrade to GridFoo!). How can we stop it happening again? Sadly, for "less is more" coders like me, the solution is to not use type inference, at least in places where you want to hide implementation details. Basically, we need to follow Item 34 from Effective Java (1st Edition): "Refer to objects by their interfaces". In Java this principle mainly protects you from having to make source changes when the return type changes; in Scala the source typically wouldn't need to change, but without recompilation the statically-linked byte code is still incompatible. If our original implementation of newFoo() had declared a return type of Foo, like this...

object FooFactory {
  def newFoo() : Foo = new StandardFoo
}

… then the method signature of FooFactory version 2 would have been identical to version 1 and the statically linked FooApplication wouldn't have known the difference.
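Here's a rough way to check that claim without juggling two JARs: put "version 1" and "version 2" factories side by side (FooFactoryV1, FooFactoryV2 and SignatureCheck are invented names) and compare the return types recorded in their compiled newFoo() methods via reflection. With the declared Foo return type, both signatures name Foo, so switching the implementation class no longer changes the signature:

```scala
trait Foo
class StandardFoo extends Foo
class AdvancedFoo extends Foo

// Hypothetical side-by-side copies of version 1 and version 2 of the factory.
object FooFactoryV1 { def newFoo(): Foo = new StandardFoo }
object FooFactoryV2 { def newFoo(): Foo = new AdvancedFoo }

object SignatureCheck {
  // Name of the return type recorded in a factory's compiled newFoo() method.
  def returnTypeName(factory: AnyRef): String =
    factory.getClass.getMethod("newFoo").getReturnType.getName
}
```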

In some ways this is obviously a bit of a straw man: I changed the version of a library and didn't recompile my code. But in the world of large projects, complex dependency graphs and (gasp) transitive dependencies, there's always a chance for bizarre things like this to crop up as other people change things and don't let you know.

I think an argument could even be made here for not using type inference at all, or at least not on public methods in projects that others depend on. If this sounds a bit draconian, consider this: the point of type inference is mostly to relieve you from having to write or think about type declarations. But if you now have to think about when you do and don't want to rely on type inference, half that advantage is lost, and it may be easier just to declare your types every time.

Sunday, September 27, 2009

Scala: Beyond Hello World

I overheard someone at work last week asking another guy how he would go about implementing string-wrapping in Java. Memories came flooding back of university functional programming assignments where we had to use Miranda to shape nursery rhymes into a trapezoid.

I listened to the continuing conversation and realised that the solution being discussed - which involved seeking through the string, remembering where the last whitespace was and cutting off substrings at appropriate points - was obviously imperative in nature. "Not that there's anything wrong with that." If I wasn't reading a book about Scala programming at the moment I would have joined right in. But my functionally-leaning mind instantly started to wonder what the functional solution would be.

It sounded like a simple but interesting problem to try and solve with a new language - a nice "real" program to attempt after Hello World. So I sat down after work and whipped up a small Scala program to wrap words. I have to admit, it took a bit longer than I expected - over an hour. Though a large part of that was spent looking up docs for the List class and trying to solve new and bemusing syntax errors, I do recall getting distracted by an investigation into when tail call optimisation does and doesn't occur.

If you're new to Scala and looking for a little problem to help you learn some syntax and waste an hour, I can recommend word-wrapping as a suitable foe. My solution is below. If you come up with your own, I'd love to know what you did differently!


object Wrapper {

  type Line = List[String]

  def main(args: Array[String]) {
    // print a 30-character ruler of dots, followed by a newline
    for (i <- 1 to 31) print(if (i == 31) '\n' else '.')
    val s = "This is a really, really long line of text that, hopefully, " +
      "will be long enough to sufficiently test a function that must " +
      "divide a really, really long string into lines that are no " +
      "more than 30 (that's thirty) characters long."
    for (line <- wrap(s)) {
      for (string <- line) {
        print(string + " ")
      }
      println()
    }
  }

  def wrap(stringToWrap : String) : List[Line] =
    wrap(stringToWrap.split(" ").toList, 30, List(List()))

  // lines is built in reverse: lines.head is the line currently being filled
  def wrap(stringsToWrap : List[String],
           maxLineLength : Int,
           lines : List[Line]) : List[Line] =
    stringsToWrap match {
      case Nil => lines.reverse
      case nextString :: remainingStrings =>
        if (lineLength(lines.head) + nextString.length <= maxLineLength)
          // the word fits: append it to the current line
          wrap(remainingStrings, maxLineLength,
               (lines.head ::: List(nextString)) :: lines.tail)
        else
          // the word doesn't fit: start a new line with it
          wrap(remainingStrings, maxLineLength, List(nextString) :: lines)
    }

  // counts a space before every word, so lineLength(line) + word.length is
  // exactly the length of line once word has been appended to it
  def lineLength(s : Line) : Int = s match {
    case Nil => 0
    case head :: tail => 1 + head.length + lineLength(tail)
  }

}
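For comparison, here's one other way to attack the same problem, sketched with foldLeft instead of explicit recursion. It's a greedy wrap like the one above but builds each line as a String; FoldWrapper is a hypothetical name, and I haven't tried to match the original's output formatting exactly:

```scala
object FoldWrapper {
  // Greedy word wrap: each word goes on the current line if it fits, otherwise it starts a new line.
  // Words longer than maxLineLength end up on a line of their own.
  def wrap(text: String, maxLineLength: Int = 30): List[String] = {
    val words = text.split("\\s+").toList.filter(_.nonEmpty)
    // Lines are accumulated in reverse order, with the line being built at the head.
    val reversedLines = words.foldLeft(List.empty[String]) {
      case (Nil, word) => List(word)
      case (current :: done, word) if current.length + 1 + word.length <= maxLineLength =>
        (current + " " + word) :: done
      case (lines, word) => word :: lines
    }
    reversedLines.reverse
  }
}
```

For example, FoldWrapper.wrap("aaa bbb ccc", 7) gives List("aaa bbb", "ccc"). Building strings directly means the length check is a simple current.length + 1 + word.length, at the cost of re-concatenating the line on every word.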

Sunday, September 20, 2009

Missing Parameter Type For Expanded Function?

I just made a silly Scala syntax mistake and got a fairly unhelpful compiler error message (often the case with newer languages). Google wasn't much help, so I thought I should post up the solution to help others save time in the future.

The code I wrote was essentially this:

def main(args: Array[String]) {
  println(for {s <- List("One", "Two", "Three")} yield _)
}

And the error I got was this:

error: missing parameter type for expanded function ((x$1) => List("One", "Two", "Three").map(((s) => x$1)))
println(for {s <- List("One", "Two", "Three")} yield _)

My novice mistake was in assuming that I could use an underscore to access the variable from the 'for' comprehension. Simply changing that underscore to 's' fixed it good:

def main(args: Array[String]) {
  println(for {s <- List("One", "Two", "Three")} yield s)
}
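The underscore only works as placeholder syntax for the parameter of an anonymous function, which is why the compiler expanded my yield _ into a function whose parameter type it couldn't work out. Here's a small sketch of both forms side by side (PlaceholderDemo is a made-up name):

```scala
object PlaceholderDemo {
  val words = List("One", "Two", "Three")

  // A for comprehension refers to each element through the name bound by the generator.
  val viaFor: List[String] = for (s <- words) yield s.toUpperCase

  // `_` stands for the single parameter of an anonymous function: _.toUpperCase means s => s.toUpperCase.
  val viaPlaceholder: List[String] = words.map(_.toUpperCase)
}
```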

Something else worth noting is that the command-line scalac was actually much more helpful than my IDE, because it printed a third line with a caret pointing straight at the underscore, while the IDE only told me what line the error was on.

Thursday, September 17, 2009

Hey Scala, Where's My Ternary Operator?

While still playing around with Scala Hello World programs, I thought I'd do a little Java integration test so I wrote this awesome program:
println(Math.random() > 0.5 ? "Hello World!" : "Goodbye World")

The result, of course, was neither "Hello World!" nor "Goodbye World" (lest this be a very short blog), but...
error: identifier expected but string literal found

Huh? Why wouldn't that work? Is there no ternary operator? How primitive!

Of course, it dawned on me without wasting too many more neurons: this is a functional language - everything returns a value! (It has been some time, but the memories are slowly starting to come back.) So, of course, it only makes sense that if/else returns a value as well. Voila:

println(if (java.lang.Math.random() > 0.5) "Hello World!"
else "Goodbye World")

And once you see it, you see how obvious it is - that this is just the way if/else should work. Why should any language need a quirky ternary operator when they all have a perfectly good if/else construct already?

On a related note, I'm feeling an increasing annoyance as I look at Java code at work all day. For example, today I was looking at code that essentially did this:

public List<CustomerDTO> convert(List<Customer> customers) {
    ArrayList<CustomerDTO> dtos = new ArrayList<CustomerDTO>();
    for (Customer c : customers) {
        dtos.add(convert(c));
    }
    return dtos;
}

public CustomerDTO convert(Customer c) {
    return new CustomerDTO(c.getCustomerId(), c.getName(), c.getBalance());
}


So, as soon as I saw that, I thought, "Man, if we were using Scala, I'd just have to write this:"
def convert(customers: List[Customer]) =
  customers.map(c => new CustomerDTO(c.customerId, c.name, c.balance))
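To actually run that one-liner you need a Customer and a CustomerDTO to map between. Here's a self-contained sketch. The two case classes are hypothetical stand-ins for the real domain classes, with field names taken from the Java getters and BigDecimal assumed for the balance:

```scala
// Hypothetical domain classes mirroring the getters used in the Java snippet.
case class Customer(customerId: Long, name: String, balance: BigDecimal)
case class CustomerDTO(customerId: Long, name: String, balance: BigDecimal)

object CustomerConverter {
  // Convert a single customer (the equivalent of the second Java method).
  def convert(c: Customer): CustomerDTO =
    CustomerDTO(c.customerId, c.name, c.balance)

  // Convert a whole list: map applies convert to each element
  // and builds a new List of the results.
  def convertAll(customers: List[Customer]): List[CustomerDTO] =
    customers.map(convert)
}
```

I've called the list version convertAll so it doesn't get tangled up with the single-customer convert when that method is passed straight to map.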

In fact, the expression is so short, you most likely wouldn't even bother to put it in a function. (Except to test it, or mock it.)

The brevity here isn't just "cool" (even though it is). This is an extremely simple piece of code and an extremely common operation. The brevity of the Scala code helps to express how simple the function is, whereas the Java code for the same operation takes longer to read and could leave you wondering whether there was something you missed.

The most ridiculous thing about this comparison is that Java has been around for almost 15 years now! And still we have to write five lines of code to do trivially simple operations like this. I can't imagine it would be much work at all for a few smart guys to add the functionality that Scala displays here to Java, but it appears more and more that Java is grinding to a halt under its own sheer weight and the burden of keeping six million-odd developers happy. (As if making them write the above code over and over again is the way to do it.)

R.I.P. Java?

Saturday, September 12, 2009

New to Scala? Study the Syntax. CAREFULLY.

So, it only took me about five seconds to go from having a Hello World Scala program working to having a 10% more complex Hello World program failing in a way that I had no idea how to fix.

Here's my program:

object HelloWorld {
  def main(args: Array[String]) {
    println(testfunc)
  }

  def testfunc() {
    "Hello World!"
  }
}

and the output of this awesome program? Just this...
()


If you're a seasoned Scala programmer, you can probably see already why I'm such an idiot, but there's an important point to be made here.

I was trying to test out two things with this example. First, since this was the first time I'd done functional programming in almost ten years, I wanted to "practice" the idea that the last expression in a function is its return value. Yes, I wanted to practice not writing "return". Second, I had read that Scala lets you invoke functions without parentheses, so I thought I'd give that a go by calling "testfunc" instead of "testfunc()".

But neither of these was the problem here. The problem was that, during all my Java-ingrained, time-poor reading of Scala tutorials, I had seen but never cared to take particular notice of one small Scala detail: a function's signature and its body are separated by an equals sign:


object HelloWorld {
  def main(args: Array[String]) {
    println(testfunc)
  }

  def testfunc() = {
    "Hello World!"
  }
}

This code produces the output I was expecting at first: that magic phrase "Hello World!"
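With the equals sign in place, both of the features I set out to practise do work. Here's a minimal sketch of each; the names are mine, purely for illustration. I've declared greeting with no parameter list at all, since that is the form meant to be called without parentheses (as I understand it, newer Scala versions are stricter about dropping the parentheses from a method declared with them):

```scala
object LastExpressionDemo {
  // No "return" needed: the value of the last expression in the block is the result.
  def describe(n: Int): String = {
    val parity = if (n % 2 == 0) "even" else "odd"
    s"$n is $parity"
  }

  // Declared with no parameter list, so it is called without parentheses.
  def greeting: String = "Hello World!"

  def main(args: Array[String]): Unit = {
    println(describe(3)) // 3 is odd
    println(greeting)    // Hello World!
  }
}
```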

I'm actually really glad this happened. It wasted ten minutes of my time, but it taught me a real lesson that I'd forgotten from not learning new languages often enough: sit at the computer and type the examples in. No amount of reading examples will actually make you understand a language or teach you how to write it. Only by sitting down, typing in every symbol from an example and understanding what it does (and figuring out where you stuffed it up) can you hope to become proficient.

There is a corollary lesson here, too: no matter how much people tell you Scala is like Java and how much you want to believe it, Scala is NOT Java. There are little differences which, as in this example, can make a HUGE difference. My approach from here on in will be to try and forget about "migrating" my knowledge from Java (the language) to Scala, and just approach Scala as an absolutely new language that needs learning from scratch.

Lastly, the thing that really confused me is that my original code, complete with its nonsensical error, compiled and ran! To be honest, I still have absolutely no idea what that code is actually doing. If I change 'println(testfunc)' to 'println(testfunc.getClass)', the output is:
class scala.Predef$$anon$1
So, I'm guessing that my function definition without the equal sign is somehow defining an anonymous type rather than a function, but I really am guessing - I actually have no idea. I'm looking forward to burying my head more and more into Scala syntax until one day I finally gain the knowledge to understand my mistake and jump up screaming "Aha!"

UPDATE:
Well, it only took a little more reading to discover what the original code is doing. Despite Scala being primarily a functional language, it still has the concept of "procedures", or what would be void methods in Java. These are methods that don't return a useful value, and they are defined using exactly the same syntax as a function, except without the equals sign.

So, my original assumption (that my function was not being invoked but was instead being passed to println as some kind of anonymous type) was quite wrong. The function IS being invoked; it just doesn't return anything useful. But why does the code compile and run if the method returns nothing? Because everything in Scala has a value. Even procedures that don't return anything still have one: the Unit value, written (), which is exactly what println printed. Crazy. (But fun.)
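Spelling the types out makes the difference visible. In the sketch below, testProcedure is what my equals-less testfunc effectively meant: a method whose result type is Unit, so the string is evaluated and then thrown away. I've written it with an explicit ': Unit =' because, as far as I know, the bare procedure syntax is deprecated in Scala 2.13 and gone in Scala 3. The method names are only for illustration:

```scala
object ProcedureVsFunction {
  // Equivalent of the old "no equals sign" procedure syntax:
  // the result type is Unit, so the string below is discarded
  // and the method returns ().
  def testProcedure(): Unit = {
    "Hello World!"
  }

  // With "=" and no annotation, the result type is inferred from the body (String).
  def testFunc() = {
    "Hello World!"
  }

  def main(args: Array[String]): Unit = {
    println(testProcedure()) // prints ()
    println(testFunc())      // prints Hello World!
  }
}
```

The compiler may warn that the string in testProcedure does nothing, which is itself a decent hint that something is off.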

Friday, September 11, 2009

My First Scala Issue

Decided to try writing some Scala before going to a talk on it at my local JUG. That way I could put my hand up with the cool group when they ask "Who's actually written some Scala?".

scala-lang suggested that any of Eclipse, IDEA or NetBeans have mature plugins, so I downloaded IntelliJ IDEA 9.0M1, installed the plugin, pasted in someone else's Hello World example, hit CTRL-SHIFT-F10 and then sat back to watch in awe.

Unfortunately, there was no Scala joy. IDEA was kind enough to tell me that "Compilation completed with 1 error and 0 warnings" but wouldn't show me anything about what the error actually was. That message was simply followed by a pretty little red 'X' icon, with no text next to it.

I Googled around a bit and found that a few people (but not many) had had a similar problem. Maybe it was a Snow Leopard issue? Some JetBrains guys had suggested to someone else that the character encoding of their Scala files mightn't be what Scala was expecting. So I spent half an hour playing around with the character encodings of my single file, my whole project and the compiler, but got nowhere.

In the end, I found a simple solution: I deleted IDEA 9.0M1, downloaded IDEA 8.1 and started again. Within 3 minutes the HelloWorld was compiled and running, with no sign of invisible errors. Hooray!

Lesson learned: Maybe it's better not to try and experiment with a fast-moving, non-core technology (e.g. Scala) in a pre-GA Java IDE release (IDEA 9.0M1).