Tuesday, December 20, 2011

SodaTest 0.2.1 Released


I've just released the second milestone of SodaTest, the spreadsheet-driven integration testing framework.
A significant change with this release is that SodaTest is now available from the Maven Central repository. You should be able to access the latest versions of all SodaTest artefacts from Maven Central right now as org.sodatest : sodatest-* : 0.2.1
As well as a couple of small bug fixes, this release incorporates a number of improvements based on feedback from use of SodaTest in corporate environments. The key enhancements were:
  • Coercions can now be created in Java without any knowledge of Scala
  • Built-in support was added for coercing strings into Java enums
  • An implicit function is available for converting into a Coercion any function with the type: (String) => _
  • A CurrencyAmountCoercion and CurrencyAmountCoercionForJava are now available for coercing a variety of financial value formats into your own custom Currency or Money class
  • Reports no longer report a mismatch due to trailing empty cells in the actual output
  • A bunch of usability fixes, such as better handling of unusual situations, clearer error messages, corrected usage messages, and more scaladoc and READMEs.
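To illustrate the enum-coercion idea, here's a generic sketch (note: this is my own hypothetical code, not SodaTest's actual API) of coercing a string into a Java enum value, tolerating case differences and surrounding whitespace:

```scala
import java.util.concurrent.TimeUnit

// Hypothetical sketch (not SodaTest's API): find the enum constant whose
// name matches the trimmed input, ignoring case.
def coerceToEnum[E <: Enum[E]](input: String, enumClass: Class[E]): E =
  enumClass.getEnumConstants
    .find(_.name.equalsIgnoreCase(input.trim))
    .getOrElse(throw new IllegalArgumentException(
      "No " + enumClass.getSimpleName + " value matching '" + input + "'"))

println(coerceToEnum(" seconds ", classOf[TimeUnit]))  // SECONDS
```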
So, if you've been holding out from using SodaTest wondering if anyone is using it for real, whether it's going to be improved over time, and whether you should even bother trying it out, the answers are yes, yes and yes!
(And in case you're wondering why this release is 0.2.1 and not 0.2, it's because 0.2 got lost in the first, failed attempt to release from Sonatype to Maven Central.)

Thursday, December 8, 2011

Will Typesafe Address Scala Criticisms?

If you've been paying attention to the Scala corner of the web over the last week, you'll no doubt have heard something about Yammer - an enterprise-focussed social network - switching from Scala back to Java.

If you've been on holidays and missed it all, basically what happened was that Coda Hale, the Infrastructure Architect at Yammer, wrote a private email to the Typesafe team giving loads of mostly critical feedback regarding Scala, but the email was subsequently leaked and became the talk of the town, among both Scala supporters, like David Pollak, and critics, such as Stephen Colebourne.


A Painfully Honest Critique

Firstly, let me summarise the main points in Coda's email:

* Yammer have decided to change their "basic infrastructure stack" from Scala to Java.

* The main reason is that Scala has "friction and complexity" that isn't offset by enough productivity gain.

* Some seemingly simple constructs in Scala, e.g. List, actually have very complex interfaces and behaviour that necessitate explaining a lot of concepts to beginners. The belief by some that this complexity can be ignored except by library authors has proven to be false in practice.

* The most vocal members of the Scala online community are generally very academic in their practice and discussion of Scala.

* Existing libraries seem to be being re-written just because people want to practice their use of advanced functional programming concepts, but these re-writes are then being recommended as the library of choice.

* The Yammer team eventually decided the most sensible way to deal with the Scala community was to ignore it.

* Learning Scala properly is important, but doing so is neither quick nor easy.

* Java is not going away, so the choice to use Scala is actually a choice to use Scala and Java, along with the interaction challenges that brings.

* The Yammer team had "endless issues" with SBT, and found themselves writing plugins to replicate Maven functionality.

* Switching to Maven solved most of their problems, but they found it has been marginalised by the Scala community, with poor support and constant encouragement to switch to SBT.

* The backwards-incompatibility of Scala major releases requires library authors to cross-compile,  which has resulted in library re-invention when authors are no longer committed to the project.

* The backwards-incompatibility also causes headaches for developers, which has resulted in many production Scala deployments not being upgraded to 2.9.

* Yammer found idiomatic Scala code performed very poorly. By eliminating for-comprehensions, avoiding Scala's collection library, avoiding closures and marking everything with private[this], they were able to achieve speed ups of 100x in some components.
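Coda's email doesn't include code, but as a small sketch of that last point (my own example, not Yammer's code), `private[this]` restricts a field to the current instance, which allows the compiler to use direct field access rather than generating and calling an accessor method:

```scala
class Counter {
  // private[this] means only this instance can see the field, so the
  // compiler can compile accesses as direct field reads/writes.
  private[this] var count: Int = 0
  def increment(): Int = { count += 1; count }
  def value: Int = count
}

val c = new Counter
c.increment()
c.increment()
println(c.value)  // 2
```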

So that was that. For what it's worth, Coda's email is also quite complimentary of Scala. I think he gives the impression that he likes it and would really like to keep using it, but the realities of making it work in a day-to-day production-deploying team have weighed them down so much that the benefits have been eclipsed.

A Frustratingly Measured Official Response

Not long after Coda's email became public, Typesafe published on their blog what is quite obviously a public response to the email, and it's that blog I'd like to take a closer look at today.

Here are the main points from the Typesafe blog:

* Typesafe are investing in Scala IDE support by contributing to the Scala IDE for Eclipse

* They are addressing the learning curve by developing training courses, and have also provided references to some of the free resources on the web.

* 'scalac' is slow because it has more work to do than 'javac', but they are investigating using parallel and incremental techniques to improve speed, with a focus on improving sbt.

* If you don't want to (or can't) upgrade your Scala version, you can purchase commercial support from Typesafe to get access to long-term maintenance releases of previous versions.

* In order to reduce people's major-release upgrade issues, they are planning to fold a bunch of existing libraries into the Scala release.

* They take performance seriously, but are wary of engaging in premature optimisation.

* Typesafe are not just focused on the Scala language and core libraries but also on frameworks to help build applications more easily.

* They describe the Scala community using a couple of niceties as well as the words "opinionated" and "quirky".

* They say there's many places where Scala could be improved and they're hard at work doing so.

* They mention Yammer amongst a group of other companies with successful production deployments of Scala.



What Aren't They Telling Us?

What I think is most interesting about this response is the things it doesn't mention and the places where it's specific about what they're doing to solve issues.

Firstly, there's no mention of Maven. There is some commitment to improving compile times, but they seem to suggest that this will be for sbt users only. It has to be considered odd that Maven has been given so little attention by the chief proponents of Scala. One of the main advantages often cited for adopting Scala is its easy integration with Java, "so you can continue using all your favourite libraries". But the commitment to prolonging the life of people's Java API knowledge doesn't extend to the Java build chain. Should it?

Why is Maven important? It's important because it works, and it's almost ubiquitous. It has had years and years of work poured into it and has been tested and improved by feedback from hundreds of thousands of users and plugin developers. It has a global ecosystem that is well understood. It has become a lingua franca; many people would know how to build their project with Maven but couldn't tell you how to build it with javac and jar. If you want to distribute your project to the world through Maven Central via Sonatype's Nexus repository, they've got fantastically detailed instructions for how to do so... with Maven.

The Java-fluent world - which Typesafe want to convert into the Scala world - doesn't need another build tool. We need relatively minor improvements to Maven and its plugins to ensure we can keep working without having to learn something completely new.

Secondly, Typesafe give no commitment to eliminating the backwards compatibility problems in future major releases. In fact, one of the parts of their workaround - the offering of paid support for people who want to continue using older releases - actually makes it in their interest to not solve the problem. The other half of their workaround - bringing certain libraries into the Scala release - will mean that, as a developer, your backwards compatibility issues will only be solved if your taste in libraries is similar to those of the Typesafe team. If you like a library that hasn't been deemed worthy of inclusion, you'll be left with all the same cross-compiling and "version-x-for-scala-y" dependency issues that you have now.

Thirdly, they seem to fob off Yammer's performance problems as an issue specific to their environment, suggesting that fixing such things for everybody would be "premature optimisation". I've heard this argument before: that it only makes sense for companies like Yammer and Twitter to spend time optimising tiny bits of code because it can save them millions across all their computers. But the same argument applies to small businesses just as well. If my business is growing rapidly, but I have limited cash flow, having a 100x performance increase is going to make a significant contribution to my bottom line if I can serve all of my customers on one machine instead of 100 machines. "Premature optimisation" is a phrase that should be reserved for application developers, not framework developers. Just optimise the hell out of it, please.

Finally: the community. While Typesafe dropped a small comment about the community in its response, it's really not an issue that they can solve, though I do think it will greatly affect the measure of their success in the long term. Really, the comments in Coda's email should be a wake-up call to the Scala community at large: a sizable Scala software development team, full of very smart people who've written some very scalable and successful software, found the Scala community to be so fractured and unhelpful that they eventually decided to ignore it and, I would assume, not be a part of it. That's not just a loss for the community, but also a warning, because for every developer that finds the community unhelpful and so decides to program Scala in his own microcosm, there will be numerous others who decide to not bother with Scala at all for the same reason. This alone could be a weighty enough issue to relegate Scala to a hobby language in the long term rather than seeing wide-spread adoption in the enterprise community.



Hope and Trepidation

When I originally read through Coda's email I was nodding my head in agreement and hoping that Typesafe would be listening carefully and would take it as an opportunity to shift their focus, maybe just temporarily, but long enough to knock over these major objections. Unfortunately, when I read through Typesafe's response, I didn't see any real hint that they plan to fix the problems that I find most annoying. In some cases I even saw excuses about why they won't be doing anything. I don't think "We're focusing on making sbt better" is a response to "Why isn't there decent Maven support?". I know they've been meeting with influential Scala teams in the days since this response; I just hope they listen to what those teams have got to say.

So now I'm left feeling a little unsure about Scala. Will it continue to improve for me and people like me? People who like to use IDEA + Maven, not Eclipse + sbt, who are educated in creating applications rather than composing mathematical proofs? Or is the current state of the art in Scala, along with its broad range of minor but resonating annoyances, what I should expect to still be grappling with in five years' time? Only time will tell whether Typesafe can navigate this wave of criticism and find a promised land of general popularity, but I think it is an indication that Scala's window of opportunity has entered the waning phase, and now is the critical time at which to capitalise on the extant interest.

What do YOU think?
Do you sympathise with the Yammer team's frustrations?
Are you confident that Typesafe will listen and respond to current criticisms with relevant and prompt action?

Wednesday, October 26, 2011

Top 10 Reasons Java Programs Envy Scala


I presented a talk at the 2nd meeting of the new ScalaSyd Meetup last night. I talked through the "Top 10 Reasons Java Programs Envy Scala" in an attempt to give Java developers a taste of some little things that could make them much more productive if they switch to Scala.

If you'd like to watch the Prezi, or listen to the talk (or both!), here are the links:

Presentation: Top 10 Reasons Java Programs Envy Scala (Graham Lea)

Audio Simulcast: Top 10 Reasons Java Programs Envy Scala (Graham Lea)

At the end of the talk, I mention that people who are tempted to learn Scala might want to look at my blog post, Graham's guide to learning Scala.

If you'd like to have a look at the code examples used in the talk you can view them on or clone them from GitHub.

Tuesday, October 11, 2011

Presenting at ScalaSyd, October 26

In two weeks' time I'll be presenting a talk at the recently-formed ScalaSyd Meetup. The title of the talk is "Top 10 Reasons that Java programs envy Scala"; the synopsis: "A whirlwind tour through 10 productivity-increasing features of Scala that aims to inspire Java developers to jump on the Scala bandwagon for their own coding pleasure."

There will be some other talks, Atlassian are providing the venue, pizza and drinks, and JetBrains have donated some swag to dish out. It should be a good night, so if you'll be in Sydney on Wednesday October 26, make a plan to drop in and make sure you RSVP on Meetup.com. Hope to see you there!

Friday, September 23, 2011

Code Writers vs Code Creators

It's occurred to me recently that one of the key turning points for a professional software developer is the realisation that you don't have to make do with the tools you've got - you can create your own tools. This is why 'The Pragmatic Programmer' recommends activities like learning scripting languages and using code generation.

What is your Goal?
As a software engineer, my key goal is to CREATE software. A lot of the time, creating will mean WRITING software (i.e. typing out a program), but that isn't always the case. Many pieces of code have large amounts of boilerplate, or contain elements that are repeated again and again across the code base. Usually this means that the code could have been derived from some description of the end result that was simpler than the code itself.  And wherever that is the case, the code is a candidate for being tool-aided or tool-generated.

What is a Tool?
Now, tools can take many forms.  Sometimes you just need to abstract a super class from some existing code, rather than copying code, and that base class becomes a tool for solving any similar problem. If you have a similar problem across a number of different applications, you might pull code out into a library that can be reused  across those code bases.

I'm using a pretty loose definition of 'tool' here. Basically I consider anything a tool if it has no function by itself, but can be used to create things which can perform a function. Base classes and libraries definitely fit this definition, but I hope that for the majority of developers this kind of reuse is bread and butter.

The kind of tool-creation I think we should aspire to is a little more high-level than this. At regular points in the creation of software, every programmer, and sometimes even whole teams, will come across problems where they feel awkward with the solution and with the process for creating it. So what I'm advocating is that developers and teams need to have the courage in these situations to put down the normal tools and say, "This isn't working. We need better tools for this task." Sometimes there will be existing tools for solving your problem, and sometimes you'll have to create the right tool yourself. But the key quality is to be on the lookout for situations where the productivity of your code creation is being hampered by something, and to be ready to say out loud, "let's take the time to fix this". The problem might be irrelevant boilerplate, repetitive details, or it may even be something not directly related to coding, like a randomly failing build or an automated process that sends you too many emails. Anything that's stealing time away from value creation and could potentially be semi-automated can become the target of tooling.

What's the Catch?
Of course, there's common sense to be used in deciding when to employ a tool or create one. If you're going to write your own tool, you need to make a really honest (i.e. pessimistic) estimation of the ROI (return on investment): Could the time it will take you to write the tool end up being longer than the time it will save you? Is this problem (and hence the tool to solve it) happening again and again or is it just a one off? Will the tool require ongoing maintenance that will significantly offset the gain it provides? You also need to have the discipline to create just the tool that YOU need, and to not try to solve the larger, generic set of problems that are like yours, but which you don't have. Remember: the purpose of creating a tool is to get you to the finish line faster, not to divert you from the race entirely because you're finding it uninteresting.

Are Your Eyes Open to Opportunities for Greater Productivity?
At the end of the day, though, I believe there will often come times for every developer where they find a problem that justifies the creation of a custom tool. If this hasn't been your experience, I'd like to challenge you to think about this: is the charge of your occupation to write lines of code as fast as possible, or to deliver working solutions as fast as possible? Assuming you're not paid by the line, you're really being paid to be a Code Creator, not just a Code Writer. That means you have a responsibility to use the right tools for the job, and hence a mandate to create them if they don't exist.


Thursday, August 4, 2011

How to Create a Webapp with Scala, Spring, Hibernate and Maven in Late 2011

Way back in the ancient history of January 2010, I wrote a walk-through of how to create a webapp with Scala, Spring Hibernate and Maven.

As with all things, technologies have progressed, as have other things, including the way code is shared online. So I decided it was time to update the code from that post and share it somewhere more modern.

You can now find my code for starting a Scala + Spring + Hibernate + Maven Webapp on GitHub.

At the time of writing, this project uses Scala 2.9.0-1, Spring 3.0.5.RELEASE and Hibernate 3.6.5.Final. I hope to keep it updated as new versions of these libraries are released, and also to add some more flesh to the example as time permits. (Ha!)

If you're looking more for a tutorial rather than just code, the original post still serves as a great walk-through of all the steps that went into creating the sample webapp, as well as showing lots of mistakes that I made that are probably quite common.

Want to learn more?
Here are some books you might find useful if you plan to go further with Spring, Hibernate, Scala or Maven:

From The Book Depository


Spring in Action - Craig Walls & Ryan Breidenbach (Manning)

Hibernate in Action - Gavin King & Christian Bauer (Manning)

Spring Recipes - Gary Mak (Apress)

Programming in Scala - Martin Odersky, Lex Spoon & Bill Venners (Artima)

Maven - A Developer's Notebook - Vincent Massol & Timothy M. O'Brien (O'Reilly)

Saturday, June 4, 2011

What's the best way to transform a list with a cumulative function?

I've been doing this kind of thing a lot lately: I have a list of somethings. I want a new list based on that list, but it's not a straight one-to-one map() operation - each value in the resulting list is a function of the corresponding value and all values before it in the input list. (I'm sure there's a one-word name for this type of function, but I don't specialise in maths vocab.)

Example
As an example to discuss, imagine I have a list of numbers, and I want a list of the cumulative totals after each number:
def main(args: Array[String]) {
  val numbers = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
  val cumulativeTotals = ... ?
  println(cumulativeTotals)
}
should output:
List(1, 3, 6, 10, 15, 21, 28, 36, 45, 55)

Funky Folds


So, until the other day, I've usually been doing this with a foldLeft():
private def cumulativeTotalFolded(numbers: List[Int]): Seq[Int] = {
  numbers.foldLeft((0, List[Int]()))((currentTotalAndCumulativeTotals, n) => {
    val (currentTotal, cumulativeTotals) = currentTotalAndCumulativeTotals
    (currentTotal + n, (currentTotal + n) :: cumulativeTotals)
  })._2.reverse
}
Now, I'm not holding that up as a good example of anything. Folds can be hard to understand at the best of times. A fold that passes around a Tuple2 as the accumulator value is not code that simply communicates to the next guy what's going on.

Stream Simplicity
So, after a particularly hairy instance of one of these the other night, I lay in bed trying to think of a better way. It struck me ('round about 1am) that Streams are a much more natural construct for calculating cumulative functions.

If you haven't come across Streams before, they're basically a way to define a collection recursively by providing the first value in the collection and a function that will calculate the next element in the collection (and, recursively, all the elements after it) as each subsequent element is requested by the client.
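For instance, a minimal recursive Stream definition (my own illustrative example) looks like this:

```scala
// The head of the Stream is a concrete value; the tail is a by-name
// expression that is only evaluated as further elements are requested.
def from(n: Int): Stream[Int] = Stream.cons(n, from(n + 1))

// Despite being conceptually infinite, only the demanded prefix is computed.
println(from(1).take(5).toList)  // List(1, 2, 3, 4, 5)
```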

Streams are good for this problem because the iteration through the list is simple, while the "next" value usually has easy access to the "previous" value, so to speak. I've used this once or twice now and I like it a lot better.

For the problem above, my solution using a Stream looks like this:
private def cumulativeTotalStream(numbers: List[Int], total: Int = 0): Stream[Int] = {
  numbers match {
    case head :: tail =>
      Stream.cons(total + head, cumulativeTotalStream(tail, total + head))
    case Nil => Stream.Empty
  }
}
(Note: to make this work with the println() above, you'll need to toList() the Stream.)

Recursion Wrangling
There is, of course, another obvious way, which is to construct the list using a recursive function that passes two accumulators: one for the current total and one for the resulting List of totals that is accumulating:
@tailrec
private def cumulativeTotalRecursive(
    numbers: List[Int], currentTotal: Int = 0, cumulativeTotals: List[Int] = Nil): Seq[Int] = {
  numbers match {
    case head :: tail =>
      cumulativeTotalRecursive(tail, currentTotal + head, (currentTotal + head) :: cumulativeTotals)
    case Nil => cumulativeTotals.reverse
  }
}
There's nothing wrong with this solution, and it probably performs much better than the Stream version, but I feel a bit weird about passing so many parameters around for such a simple operation.

I could reduce the parameter count by getting the currentTotal from the accumulator list instead of passing it around:
@tailrec
private def cumulativeTotalRecursive2(
    numbers: List[Int], cumulativeTotals: List[Int] = Nil): Seq[Int] = {
  (numbers, cumulativeTotals) match {
    case (number :: tail, Nil) => cumulativeTotalRecursive2(tail, List(number))
    case (number :: tail, lastTotal :: otherTotals) =>
      cumulativeTotalRecursive2(tail, (number + lastTotal) :: cumulativeTotals)
    case (Nil, _) => cumulativeTotals.reverse
  }
}
but then the function body ends up more complex than the one with an extra parameter, which isn't a good trade-off.

Mutability the key to Simplicity?
Finally, I thought about the possibility of solving it with a for comprehension. I realised quickly that I'd need a mutable variable, but the result is a very, very simple piece of code:

private def cumulativeTotalComprehension(numbers: List[Int]): Seq[Int] = {
  var currentTotal = 0
  for (n <- numbers) yield {
    currentTotal += n
    currentTotal
  }
}
I'm pretty sure this wouldn't look as pretty for a lot of the problems I've been tackling with the foldLeft() but, mind you, none of them looked very pretty either. Is this an okay solution? Are Scala aficionados going to vomit in disgust when they see the var keyword?

Your Turn
What I'd really like to know is whether there's an idiomatic way of doing this that I've just never come across.

That's all I can come up with at the moment. I'm sure there's other ways to do it. One that looks simple, takes a single parameter and runs fast would be ideal. If you've got some good ideas of other ways to do this, please leave something in the comments!
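For what it's worth, one candidate from the standard library is scanLeft, which produces exactly this kind of running accumulation (this general class of operation is sometimes called a 'scan' or prefix sum):

```scala
val numbers = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

// scanLeft emits the initial accumulator followed by every intermediate
// result, so .tail drops the leading zero.
val cumulativeTotals = numbers.scanLeft(0)(_ + _).tail

println(cumulativeTotals)  // List(1, 3, 6, 10, 15, 21, 28, 36, 45, 55)
```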

Want to learn more?
If all of this has just made you think you might need to do a bit more study on what recursion, Streams or folding are, try one of these great books:


Wednesday, June 1, 2011

Introducing SodaTest: Spreadsheet-Driven Integration Testing for Scala and Java

I'd like to reveal what I've been working on in my spare time for the last couple of months.

The Announcement

SodaTest: "Spreadsheet-Driven Integration Testing", is an open-source framework for creating Executable Requirements for Integration and Acceptance testing in Scala and Java.

The impetus for starting this project was to attempt to create a tool that improves on Ward Cunningham's Framework for Integration Testing, "FIT". As an 'Executable Requirements' testing tool, it can also be considered as an alternative to Fitnesse, Concordion, RSpec, Cucumber, JDave, JBehave, SpecFlow, and Thoughtworks' Twist.

The Background

We used FIT in anger when I first arrived at my current workplace over four years ago. Since then I've watched the team become first dissatisfied with it, then scathing and fearful of it, before abandoning development of new FIT tests in favour of Integration Tests written in JUnit. Nevertheless, we all still felt that Executable Requirements were a Good Thing and made a couple of failed restarts in trying to get back on the wagon.

From my own perspective, many of our issues (but not all) were to do with the tool. I identified two main issues that I'd seen hurt people over and over: the input format and the programming model. So I set out to create something that solved these two problems while remaining fairly close to what I think is a great foundation in FIT.

The Result

The result is a tool I've called SodaTest. It uses spreadsheets, in the form of comma-separated value (CSV) files, as the input format. The contents of the spreadsheet are basically small individual tables, much like what would appear in a FIT test. There is a required but simple structure to the tables and a minimum of special words and symbols to provide some context to the parser.

I also aimed to keep the programming model as simple as possible. I tried to make sure there is only one sensible way things could be done, so as not to confuse developers with options. I've gone to lengths to ensure developers of Fixtures will have to do a minimum of work by having the framework do a lot of boilerplate work for them. Lastly, I've structured and named everything in a way that I believe will guide developers in the right direction (by which I mean away from mistakes that I have made in the past while writing FIT tests).

I've also taken the time to add a little sugar here and there that I thought was missing from FIT, for example integration with JUnit and a more comprehensive set of built-in coercion strategies for converting strings into strong types.

I'm quite pleased with the result. (I wouldn't be telling people about it if I wasn't!) Yesterday I released version 0.1 of SodaTest and people should be able to use this first release of SodaTest to create tests that do almost everything that FIT could do, but with less effort in creating the tests, writing the Fixtures, and getting the whole lot executing in their environment.

You can find out more about the motivation for SodaTest and the features it includes by reading the README on GitHub.

While SodaTest is written almost entirely in Scala (and the most benefit will be gained by using Scala as the language for writing Fixtures), I've also written the sodatest-api-java extension that allows SodaTest Fixtures to be written in Java. There is one limitation where Scala is (currently) still needed, but I reckon 95% of Fixtures should be able to be written entirely in Java if that is something you care about.

The Next Steps

The next steps for SodaTest are clear to me: Dogfood, Broadcast, Feedback and Evolve.

I want to convince my team at work to start experimenting with Executable Requirements again using my home-grown tool; I'd love it if other people in the Scala and Java communities could download this tool and give it a little tryout during their next iteration or two; I want to hear Feedback from people about what's good about SodaTest, what needs more work and whether there's parts that are just plain horrible; and, if the feedback is positive enough to consider SodaTest a preliminary success, I want to continue improving it in the areas where it's still holding people back.

There is already a Roadmap of possible features to add, but now it's really time to get it in people's hands and find out from users what is the next most important thing it needs to do.

Try It Out!

If you're a Scala or Java software developer and Executable Requirements are either a passion of yours or something you've been wanting to try out, why don't you give SodaTest a try? You don't have to commit to it, just write a couple of tests with it, get them running, then passing, and send me some feedback. All feedback is useful, even if you think it sucks! (As long as you tell me why.)

To get started with SodaTest, I suggest you add the SodaTest repository and dependencies to your Maven pom.xml:

<repositories>
  <repository>
    <id>sodatest-github-repo</id>
    <name>SodaTest repository on GitHub</name>
    <url>http://grahamlea.github.com/SodaTest/m2repository/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.sodatest</groupId>
    <artifactId>sodatest-api</artifactId>
    <version>0.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.sodatest</groupId>
    <artifactId>sodatest-runtime</artifactId>
    <version>0.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>

Sunday, May 22, 2011

Explaining Scala to Your Wife Through Knitting

On Friday night I was sitting in my kitchen coding up some Scala. (Oh, what excitement having a small, sleeping child brings to Friday nights!)

My wife, who is a die-hard knitter, came into the room and, having seen me do little in the evenings for the last few years except code Scala, asked, "Is Scala just the bee's knees?"

My response, purely from reflex, was "No. CODING is the bee's knees!"

"Sure," she said (with a roll of the eyes, I suspect), "but is it like knitting something really interesting? Compared to Java, I mean?"

"Not quite," I said, "it's like if you had electric knitting needles that would let you start off the row and once you've done the interesting bit, the needles take over and do the rest of the row for you. So you'd just knit the interesting bits, not the dumb bits. With Java, though, you spend a lot of time writing dumb stuff."

"Right," she said, "that would be cool."

Yes, it would be cool. Sadly for knitters, electric knitting needles do not exist. Java programmers, on the other hand, have the freedom to use Scala if we want to stop doing dumb stuff.

How would you explain Scala to your wife?

Monday, May 16, 2011

Scala's Match is Not Switch++

Scala's match operator is a beautifully powerful beast. It can do some really sophisticated stuff with very little effort. However, if your introduction to 'match' was along the lines of, "It's like Java's 'switch', but better", then I want to offer a word of caution. I'll show an example where this thinking caused problems for me, briefly explain match and explain how I look at it now to prevent this confusion for myself.

A (Bad) Example
Some things can look quite straightforward, but at runtime act quite differently from how they look. You might call these bugs. Take the following snippet as an example. What do you think this prints out?

object TestingObject {
  def main(args: Array[String]) {
    val testValue = "Boris"
    val inputValue = "Natasha"
    List(inputValue) match {
      case List(testValue) => println("Match!")
      case _ => println("No match")
    }
  }
}
Because you're a smart alec and you know what makes for an interesting blog, you have probably correctly guessed that this prints out "Match!".

Let's try and get a little more information...

object TestingObject {
  def main(args: Array[String]) {
    val testValue = "Boris"
    val inputValue = "Natasha"
    List(inputValue) match {
      case List(testValue) => println("Match! testValue = " + testValue)
      case _ => println("No match")
    }
  }
}
The code now prints "Match! testValue = Natasha". This gives us a bit more insight into what's going on. Obviously the testValue that is defined in the case expression is not a reference to the original val testValue but a declaration of a new reference with a different scope.

My mistake when I wrote this code was to believe that everything that I put between case and => was an input to the matching algorithm, i.e. it was going into some == or .equals() that happens under the hood. This is true in Java's switch statement, where everything in the case must be a constant, but it's certainly not how things work in Scala's match.
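If the intent really was to compare against the existing testValue, Scala does support that: wrapping a lowercase identifier in backticks inside a pattern turns it into a stable identifier, which is tested for equality rather than bound as a new variable. A minimal sketch of the corrected example (the object and method names are mine, for illustration):

```scala
object BacktickMatch {
  def check(input: String, expected: String): String =
    List(input) match {
      // Backticks make `expected` refer to the existing value in scope,
      // so this pattern tests equality instead of binding a new name
      case List(`expected`) => "Match!"
      case _ => "No match"
    }
}
```

With the values from the example above, BacktickMatch.check("Natasha", "Boris") yields "No match", as originally intended.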

New Perspective
I prefer now to think of match the other way around to how it was first introduced to me: I now consider the use of an explicit extractor object to be the normal case, and I think of any other form of case expression, such as matching against a constant value, as syntactic sugar for some built-in, implicit extractor.

When you use an extractor object, you have an object that defines an unapply() function which accepts an object of the type you are matching as a parameter. The unapply() function can either return a Boolean or an Option. If it returns a Boolean, then true means the argument is a match and false means that it wasn't. If it returns an Option, then None means that it wasn't a match, while a Some indicates it did match. Not only this, but the object contained in the Some becomes available to the expression on the right of the =>, assuming you give it a name in the case expression. I think this is the most significant difference between switch and match: In Scala, the case expression has outputs.
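To make that concrete, here is a small, hypothetical extractor (the KeyValue object and its "key=value" format are my own invention) whose unapply() returns an Option, making the extracted parts available to the right of the =>:

```scala
// Extracts "key=value" strings into their two parts
object KeyValue {
  def unapply(s: String): Option[(String, String)] =
    s.split("=", 2) match {
      case Array(k, v) => Some((k, v)) // matched: the parts become case outputs
      case _           => None         // not a match
    }
}

object KeyValueDemo {
  def describe(s: String): String = s match {
    case KeyValue(k, v) => "key is " + k + ", value is " + v
    case _              => "not a key=value pair"
  }
}
```

Here k and v are outputs of the extractor, not inputs to it, which is exactly the behaviour that tripped me up above.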

My problem in the example at the start was caused because I thought of List(testValue) in the case expression as an input, but it's not. It's actually the name of an Extractor object (List) and a name given to the output of that Extractor (testValue).

My new way of thinking is to pretend that everything in a Scala case is an extractor. So when I look at a case that is matching a plain old constant, I think to myself, "That's calling an extractor that returns true if the input value equals the constant." If I see case x: Int =>, I think, "That's calling an extractor that returns Some(x) if the object is an Int or None if it's not."
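Viewed through that lens, a mixed set of cases reads as a series of extractor calls. A small sketch (names are illustrative):

```scala
object PatternKinds {
  def describe(x: Any): String = x match {
    case 42     => "the constant 42"      // sugar for an equality-testing extractor
    case n: Int => "some other Int: " + n // sugar for a type-testing extractor with an output
    case _      => "not an Int at all"
  }
}
```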

Re-Program Yourself
So, my suggestion is to rid yourself of any notion that Scala's match is like "switch on steroids". That view can lead to the false belief that match is a switch that can deal with all kinds of interesting, non-constant input values. In actual fact, the cases in match are all about Extractors, not inputs, and matching against constant values in the same way as switch is best thought of as just a nice little trick that the compiler does for you to hide the Extractor.

Want to Learn More?
If you suspect that a bit too much of your Scala knowledge may come from blogs that were comparing it with Java, reading one of these books might help:

From Amazon...


From Book Depository...


Wednesday, April 13, 2011

'Programming in Scala' book now FREE

If you're looking to learn Scala or looking to broaden your knowledge thereof, you'll be pleased to know that it's just been announced that the first edition of 'Programming in Scala' by Martin Odersky (the creator of Scala), Lex Spoon, and Bill Venners is now FREE online in HTML format.

Here's the announcement:
http://groups.google.com/group/scala-announce/browse_thread/thread/502c7d5a386e357f

Here's the online version:
http://www.artima.com/pins1ed/

If you want to grab a hard copy as well, you can get it here...

From Amazon...


From Book Depository...


Tuesday, January 18, 2011

Scala Pragmatism: Ignore CanBuildFrom

After a recent blog of mine about converting two lists to a map, a commenter wrote that they were...
trying to understand the code I read [in] the ScalaDoc for the zip() [function]:
def zip[A1 >: A,B,That](that: Iterable[B])(implicit bf: scala.collection.generic.CanBuildFrom[Repr,(A1, B),That]): That
would you please help us to read/understand this?

This is indeed curly stuff which, I'll have to admit, I wouldn't be able to fully explain without consulting a text book or two to get the terms right. Then you'd also have to read the same text books before you could understand my explanation.

If I can take some poetic license with the question, though, I think what the poster really wants to know is why the signature is so complex when the intent of the function is so simple. So instead of explaining the intricate details of what this signature means, I'm going to explain why, to make practical use of the zip() function, you don't need to understand the above signature, as well as a little bit about what the added complexity actually achieves.

Firstly: Why you don't need to bother too much.

If you look at the Scala API for List, you'll see that as well as the above signature for zip(), it also lists a "[use case]" signature for zip, which is simply this:
def zip [B] (that: Iterable[B]) : List[(A, B)]
The purpose of this 'Use Case' signature is to let you know that, most of the time, you will be able to call zip() on a List and pass in just another List (or some other concrete subclass of Iterable) and get back a List of Tuple2s. Just like this:
scala> val list1 = List(1, 2, 3)
list1: List[Int] = List(1, 2, 3)

scala> val list2 = List('a', 'b', 'c')
list2: List[Char] = List(a, b, c)

scala> val result = list1.zip(list2)
result: List[(Int, Char)] = List((1,a), (2,b), (3,c))
That's the pragmatic part, and the most important part to understand. If you cast your eyes down the List scaladoc, you'll notice that there's lots of these use cases. (You should understand why this is the case once you understand why the extra complexity is there.)

To sum that up: to use the zip function, you just have to pass in an Iterable. You will rarely, if ever, need to know what CanBuildFrom is or what all those type parameters refer to.

Secondly: Why then have all that complexity in the signature, when I'm not using it?

Whoa! That's actually not true! The fact that we've called the function without explicitly providing that second parameter doesn't mean that you haven't used the extra complexity. In fact, you have used it, and it's provided you a small benefit, all without you realising. This is where a little bit of magic happens...

In understanding why that signature is so complex, the key is actually in the explanation above of what zip does. Note that I said that you can call zip() on a List and pass in just another List, and you will get back another List. This is actually a little bit amazing. It probably doesn't seem that amazing, but then you have to consider that the implementation of the zip function is not actually defined on the List class, but is defined in a trait that List inherits from called IterableLike. What is surprising here is that the IterableLike trait doesn't know anything about the List class, and yet its zip function is able to return an instance of List. (And remember that Scala's List is a concrete class, very different from Java's List interface.)

This is what the CanBuildFrom parameter is all about: it allows functions like zip to be defined once, high up in the collections API hierarchy, but then to also return a different type of result depending on the type of the collection on which the method is actually invoked. To put that in practical terms: if I call zip on a List, I'll get back a List of tuples, but if I call zip on a mutable.HashSet, I'll get back a mutable.HashSet of tuples. The same goes for map() and filter() and countless other functions. This is the magic: these functions don't know anything about the concrete types they are returning, but you haven't had to tell them about the concrete types either! To achieve the same result in Java without any casting, you would have to override the zip function in each and every concrete subclass of IterableLike from which you wished to return a specific collection type. (Edit: Actually, I don't think this is true. I think I've figured out in my head a way to do it that doesn't require casting or overriding.)
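You can see this receiver-determines-result behaviour with map(), which is likewise defined once up the hierarchy yet hands back the collection type you started with:

```scala
object ReceiverType {
  // map is defined once in the collections hierarchy, yet each call
  // returns the same kind of collection it was invoked on
  val fromList: List[Int]     = List(1, 2, 3).map(_ * 2)
  val fromVector: Vector[Int] = Vector(1, 2, 3).map(_ * 2)
  val fromSet: Set[Int]       = Set(1, 2, 3).map(_ * 2)
}
```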

There is just one mystery left: why don't you have to pass in an instance of CanBuildFrom? As you can see, the parameter list containing this parameter is implicit. This means that, if this parameter list is not explicitly provided in an invocation, the Scala compiler will attempt to find a variable or object that is in scope, is declared to be implicit, and meets the type criteria of the parameter. So you don't have to provide a CanBuildFrom instance because scalac is tracking one down for you.

But where is it tracking it down from? All code has to come from somewhere, right? At first, I thought that there might be one CanBuildFrom instance in the Predef object that served the needs of nearly every collection. The real answer is a bit less exciting: each concrete subclass provides its own instance of CanBuildFrom in its companion object, though in many cases this definition is as simple as creating a new instance of GenericCanBuildFrom. I have to be honest and admit that I haven't yet figured out how this definition in the companion object makes its way into the zip function and other functions. While you unwittingly import the companion object whenever you import a collection type, you haven't imported all the companion object's functions into scope. Perhaps someone with a bit more patience for searching the Scala source code will be nice enough to explain it in the comments.

So, to sum up:
  • You don't need to understand CanBuildFrom in order to use functions that have one in their signature.
  • Look for "[use case]" entries in the scaladoc for the collections API and use those as the basis for function calls, unless the compiler seems to be telling you to do otherwise.
  • The advantage of the complex function signatures is that the definition can be written once, but can return many different concrete types of collection, without you having to tell it how to do that.

Want to learn more?

If you'd like to get a bit more detail about how Scala's collections and the related classes like CanBuildFrom operate under the covers, there is a series of pages on scala-lang describing some of the implementation details of Scala's new collections API.

And if you really want to know every last thing about why this signature looks the way it looks and how it works, you might want to read the academic Scala paper, 'Generics of a Higher Kind', but please don't post questions about that on my blog! ;)

If you just want to learn a bit more about Scala, try one of these:

From Amazon...


From Book Depository...


Monday, January 10, 2011

Scala == Effective Java ?

I started reading Joshua Bloch's Effective Java last week. I'll have to admit that I haven't read it before, but only because I've been told by several people, "you already do most of what's in there anyway." Seeing as we tell all the new recruits to read it, I thought I should actually flip through it myself so I know what's in there.

Books of best practices are always written in relation to domains that have many possibilities for bad practices (choosing otherwise would make for a very short book). Reading the first chapter of Effective Java, I was amused as I realised that, if you're coding in Scala instead of Java, many of the book's recommendations are either unnecessary, because Scala doesn't permit the corollary bad practice, or built into the language of Scala, or made easier to implement than they are in Java. This isn't a criticism of the book, but an observation that the state of the art is moving on, and Java is being left behind.

From the first 25 items in the book, here are my notes on practices that either become easier to follow or unnecessary if you are using Scala:

Item 1: Consider static factory methods
One of the four advantages given for static factory methods is that construction of classes with type parameters is briefer through static methods because the type parameters can be inferred on a parameterised method call. Scala solves this one by inferring type parameters on constructors, but not only that. It infers LOTS of types, almost everywhere: fields, local variables, method return types, anonymous function parameters - all can usually have their type inferred from the context, meaning that Scala code has a lot less declarative noise, while still remaining statically type-checked.

Item 2: Consider the Builder pattern instead of long constructors
Joshua writes himself that the Builder pattern simulates the combination of named parameters and default parameters that are found in other languages, and Scala supports named and default parameters. What he means by "simulates" is that you need to write a whole extra Builder class in Java to get the nice Builder effect at the client site, whereas in Scala, you essentially get a Builder for free every time you write a constructor.
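For instance, a constructor with default parameter values gives you the Builder-style call site for free (the class and parameter names here are purely illustrative):

```scala
// Every parameter has a default, so callers name only what they care about
case class ServerConfig(
  host: String = "localhost",
  port: Int = 8080,
  secure: Boolean = false
)

object ServerConfigDemo {
  // Reads like a Builder, but no Builder class had to be written
  val custom = ServerConfig(port = 9090, secure = true)
}
```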

Item 3: Enforce Singletons
While code-enforced singletons have gone out of fashion a bit with the popularity of IoC containers, they're still around a lot and these rules are still important. Scala supports singletons at the language level, providing the 'object' keyword which, substituted for the word 'class' in a declaration, will compile a singleton object into your code instead of a class. The generated object follows all of Josh's recommendations, including deserializing to the same instance (as long as you use Scala's @serializable rather than extends java.io.Serializable)
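A minimal sketch of a language-level singleton (the Counter example is my own):

```scala
// 'object' instead of 'class' yields exactly one lazily-created instance;
// there is no constructor for clients to call
object Counter {
  private var count = 0
  def next(): Int = { count += 1; count }
}
```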

Item 4: Enforce non-instantiability with private constructors
There's really only two cases where you want to do this: singletons (see above), and utility classes, which are really just singletons that have no state. Either way, Scala's 'object' keyword for creating singletons is the simple answer.

Item 5: Avoid auto-boxing
The recommended panacea for unwanted auto-boxing is to prefer primitives wherever possible. Now, Scala is a purely object-oriented language, which means there are no primitives, and all numbers, characters and booleans are represented at the language level as objects. However, Scala's uses of these wrapper objects compile to bytecode that uses Java's primitives wherever possible, so this recommendation is implemented for you by the Scala compiler.

Items 7, 8 and 9: Overriding toString, hashCode and equals
If you're authoring an immutable data structure, declaring your Scala class as a case class will tell the compiler to automatically implement toString, hashCode and equals for you, along with an unapply() method that can be used in a pattern matching clause. (There are some disadvantages to using Scala's case classes, but I believe they work well for most situations.)
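A two-line case class illustrates what you get for free (Point is an illustrative name):

```scala
case class Point(x: Int, y: Int)

object PointDemo {
  val a = Point(1, 2)
  val b = Point(1, 2)
  val equalByValue = a == b                    // generated equals() compares fields
  val sameHash = a.hashCode == b.hashCode      // generated hashCode() agrees with equals()
  val printed = a.toString                     // generated toString()
}
```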

Item 11: Override clone() judiciously
While Scala as a language doesn't provide an answer to this one, it's considered best-practice Scala to favour immutable types, with transformation being much favoured over mutation. Following this principle will reduce the need to ever use clone(), because immutable objects can be shared among many clients and threads without the shared-mutable-state worry that might cause you to consider cloning something.

Item 14: Use public accessor methods rather than public fields
While Scala appears to have public fields - and indeed to make fields public by default - in fact Scala implements Bertrand Meyer's "Uniform Access Principle", with all access to fields (which are in fact all private) being made through accessor and mutator functions that have the same name as the field. In other words, the compiler writes get and set methods for you and everything that looks like field access in your code actually goes through these methods.
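The practical upshot is that you can later replace what looks like a field with a hand-written accessor pair without changing any caller. A sketch (class names are illustrative):

```scala
// Looks like a public field; the compiler actually generates an
// accessor and a mutator around a private field
class Celsius {
  var degrees: Double = 0.0
}

// Same call-site syntax, but the mutator now validates its input
class CheckedCelsius {
  private var d: Double = 0.0
  def degrees: Double = d
  def degrees_=(value: Double): Unit = {
    require(value >= -273.15, "below absolute zero")
    d = value
  }
}
```

Client code like `c.degrees = 21.5` compiles against either version unchanged.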

Item 15: Minimize mutability
As already mentioned, it's considered Scala best practice to shun mutable state as much as possible. One of Josh's four recommendations for decreasing mutability is to make values final wherever possible. All Scala fields and local variables must be preceded by either 'val', indicating immutability (of the reference), or 'var', indicating that the reference can change. (Function parameters are always vals, and hence final.) Forcing programmers to make this choice when each variable is declared encourages the practice of using a lot of immutability, compared to final which is an optional modifier and seen by many as extra noise in most declarations.
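The forced choice looks like this:

```scala
object ValVar {
  val fixed = 10  // immutable reference: 'fixed = 11' would not compile
  var total = 0   // mutable, and the 'var' flags that to every reader
  def add(n: Int): Int = { total += n; total }
}
```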

Item 18: Prefer interfaces to abstract classes
Scala has traits - abstract classes that essentially allow multiple inheritance of function definitions (as opposed to interfaces, which only inherit function declarations). The possibility of multiple inheritance discounts quite a few of the disadvantages Josh raises against using abstract classes in Java. Of course, it also introduces some new issues, and exacerbates some others, like those Josh lists in Item 16 about preferring composition+delegation over inheritance. Multiple inheritance is a double-edged sword, for sure.
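A sketch of traits carrying working implementations into a single class (all names are illustrative):

```scala
trait Greeting {
  // A definition, not just a declaration as in a Java interface
  def greet(name: String): String = "Hello, " + name
}

trait Shouting {
  def shout(s: String): String = s.toUpperCase + "!"
}

// Inherits working method bodies from both traits
class Greeter extends Greeting with Shouting {
  def shoutGreeting(name: String): String = shout(greet(name))
}
```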

Item 21: Use function pointers
Scala, as a functional language, has functions as first-class members of the language, with every function naturally being available as an object (same as what Josh calls a function pointer) should the need arise. It also supports anonymous, inline functions, which, if available in Java, could reduce current "function pointer" logic like this:
new Thread(new Runnable() { public void run() { System.out.println("Running"); } });
down to something like this:
new Thread({ println("Running") })

Item 23: Don't use raw generic types
Scala doesn't allow you to write code that uses raw generic types, even if those types are defined in Java: it just won't compile. For what it's worth, raw generic types are not a feature, but merely an artefact of backwards-compatibility. Scala, not trying to be backwards-compatible with Java 1.4, just doesn't need raw types and, as a result, is able to provide stricter type safety for type-parameterised classes and functions.

Item 25: Prefer Lists over Arrays
While you should probably still prefer Scala's List class to Arrays for most applications, Scala prevents the chief problem cited with Java's arrays. In Java, you can cast a String[] to an Object[] and then assign a Long to one of the entries. This compiles without error, but will fail at runtime. In Scala, however, arrays are represented in source code as a parameterized class, Array[T], where T is an invariant type parameter. This basically just means that you can't assign an Array[String] to an Array[Object], and so Scala prevents this problem at compile time rather than choking at runtime.
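A sketch of the difference; the commented-out line is the assignment Java's arrays would have allowed:

```scala
object ArrayInvariance {
  val strings: Array[String] = Array("a", "b")
  // val objects: Array[Any] = strings   // does not compile: Array[T] is invariant
  // objects(0) = java.lang.Long.valueOf(1) // ...so this runtime failure can't happen

  // List, by contrast, is covariant, which is safe because it is immutable
  val anys: List[Any] = List("a", "b")
}
```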

Scala == Effective Java ?
So I'm going to put this question out there:
If 'Effective Java' is considered essential reading, and the best practices in it are the de facto standard for writing good programs, shouldn't we all be giving serious consideration to switching to a language that is so very close to Java, but makes good programming even easier?

Want to learn more?
From Amazon...
From Book Depository...