Saturday, June 4, 2011

What's the best way to transform a list with a cumulative function?

I've been doing this kind of thing a lot lately: I have a list of somethings. I want a new list based on that list, but it's not a straight one-to-one map() operation - each value in the resulting list is a function of the corresponding value and all values before it in the input list. (I'm sure there's a one-word name for this type of function, but I don't specialise in maths vocab.)

As an example to discuss, imagine I have a list of numbers, and I want a list of the cumulative totals after each number:
 def main(args: Array[String]) {
val numbers = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val cumulativeTotals = ... ?
should output:
List(1, 3, 6, 10, 15, 21, 28, 36, 45, 55)

Funky Folds

So, until the other day, I've usualy been doing this with a foldLeft():
 private def cumulativeTotalFolded(numbers: List[Int]): Seq[Int] = {
numbers.foldLeft((0, List[Int]()))((currentTotalAndCumulativeTotals, n) => {
val (currentTotal, cumulativeTotals) = currentTotalAndCumulativeTotals
(currentTotal + n, (currentTotal + n) :: cumulativeTotals)
Now, I'm not holding that up as a good example of anything. Folds can be hard to understand at the best of times. A fold that passes around a Tuple2 as the accumulator value is not code that simply communicates to the next guy what's going on.

Stream Simplicity
So, after a particularly hairy instance of one of these the other night, I lay in bed trying to think of a better way. It struck me ('round about 1am) that Streams are a much more natural construct for calculating cumulative functions.

If you haven't come across Streams before, they're basically a way to define a collection recursively be providing the first value in the collection and a function that will calculate the next element in the collection (and, recursively, all the elements after it) as each subsequent element is requested by the client.

Streams are good for this problem because the iteration through the list is simple, while the "next" value usually has easy access to the "previous" value, so to speak. I've used this once or twice now and I like it a lot better.

For the problem above, my solution using a Steram looks like this:
 private def cumulativeTotalStream(numbers: List[Int], total: Int = 0): Stream[Int] = {
numbers match {
case head :: tail =>
Stream.cons(total + head, cumulativeTotalStream(tail, total + head))
case Nil => Stream.Empty
(Note: to make this work with the println() above, you'll need to toList() the Stream.

Recursion Wrangling
There is, of course, another obvious way, which is to construct the list using a recursive function that passes two accumulators: one for the current total and one for the resulting List of totals that is accumulating:
private def cumulativeTotalRecursive(
numbers: List[Int], currentTotal: Int = 0, cumulativeTotals: List[Int] = Nil): Seq[Int] = {

numbers match {
case head :: tail =>
cumulativeTotalRecursive(tail, currentTotal + head, (currentTotal + head) :: cumulativeTotals)
case Nil => cumulativeTotals.reverse
There's nothing wrong with this solution, and it probably performs much better than the Stream version, but I feel a bit weird about passing so many parameters around for such a simple operation.

I could reduce the parameter count by getting the currentTotal from the accumulator list instead of passing it around:
private def cumulativeTotalRecursive2(
numbers: List[Int], cumulativeTotals: List[Int] = Nil): Seq[Int] = {

(numbers, cumulativeTotals) match {
case (number :: tail, Nil) => cumulativeTotalRecursive2(tail, List(number))
case (number :: tail, lastTotal :: otherTotals) =>
cumulativeTotalRecursive2(tail, (number + lastTotal) :: cumulativeTotals)
case (Nil, _) => cumulativeTotals.reverse
but then the function body ends up more complex than the one with an extra parameter, which isn't a good trade-off.

Mutability the key to Simplicity?
Finally, I thought about the possibility of solving it with a for comprehension. I realised quickly that I'd need a mutable variable, but the result is a very, very simple piece of code:

private def cumulativeTotalComprehension(numbers: List[Int]): Seq[Int] = {
var currentTotal = 0
for (n <- numbers) yield {
currentTotal += n
I'm pretty sure this wouldn't look as pretty for a lot of the problems I've been tackling with the foldLeft() but, mind you, none of them looked very pretty either. Is this an okay solution? Do Scala afficionados going to vomit in disgust when they see the var keyword?

Your Turn
What I'd really like to know is whether there's an idiomatic way of doing this that I've just never come across.

That's all I can come up with at the moment. I'm sure there's other ways to do it. One that looks simple, takes a sinlge parameter and runs fast would be ideal. If you've got some good ideas of other ways to do this, please leave something in the comments!

Want to learn more?
If all of this has just made you think you might need to do a bit more study on what recursion, Streams or folding are, try one of these great books:

From Amazon...

From Book Depository...

Wednesday, June 1, 2011

Introducing SodaTest: Spreadsheet-Driven Integration Testing for Scala and Java

I'd like to reveal what I've been working on in my spare time for the last couple of months.

The Announcement

SodaTest: "Spreadsheet-Driven Integration Testing", is an open-source framework for creating Executable Requirements for Integration and Acceptance testing in Scala and Java.

The impetus for starting this project was to attempt to create a tool that improves on Ward Cunningham's Framework for Integration Testing, "FIT". As an 'Executable Requirements' testing tool, it can also be considered as an alternative to Fitnesse, Concordion, RSpec, Cucumber, JDave, JBehave, SpecFlow, and Thoughtworks' Twist.

The Background

We used FIT in anger when I first arrived at my current workplace over four years ago. Since then I've watched the team become first dissatisfied with it, then scathing and fearing of it, before abandoning development of new FIT tests in place of Integration Tests written in JUnit. Nevertheless, we all still felt that Executable Requirements were a Good Thing and made a couple of failed restarts in trying to get back on the wagon.

From my own perspective, many of our issues (but not all) were to do with the tool. I identified two main issues that I'd seen hurt people over and over: the input format and the programming model. So I set out to create something that solved these two problems while remaining fairly close to what I think is a great foundation in FIT.

The Result

The result is a tool I've called SodaTest. It uses spreadsheets, in the form of comma-separated value (CSV) files, as the input format. The contents of the spreadsheet are basically small individual tables, much like what would appear in a FIT test. There is a required but simple structure to the tables and a minimum of special words and symbols to provide some context to the parser.

I also aimed to keep the programming model as simple as possible. I tried to make sure there is only noe sensible way things could be done, so as not to confuse developers with options. I've gone to lengths to ensure developers of Fixtures will have to do a minimum of work by having the framework do a lot of boilerplate work for them. Lastly, I've structured and named everything in a way that I believe will guide developers in the right direction (by which I mean away from mistakes that I have made in the past while writing FIT tests).

I've also taken the time to add a little sugar here and there that I thought was missing from FIT, for example integration with JUnit and a more comprehensive set of built-in coercion strategies for converting strings into strong types.

I'm quite pleased with the result. (I wouldn't be telling people about it if I wasn't!) Yesterday I released version 0.1 of SodaTest and people should be able to use this first release of SodaTest to create tests that do almost everything that FIT could do, but with less effort in creating the tests, writing the Fixtures, and getting the whole lot executing in their environment.

You can find out more about the motivation for SodaTest and the features it includes by reading the README on GitHub.

While SodaTest is written almost entirely in Scala (and the most benefit will be gained by using Scala as the language for writing Fixtures), I've also written the sodatest-api-java extension that allows SodaTest Fixtures to be written in Java. There is one limitation where Scala is (currently) still needed, but I reckon 95% of Fixtures should be able to be written entirely in Java if that is something you care about.

The Next Steps

The next steps for SodaTest are clear to me: Dogfood, Broadcast, Feedback and Evolve.

I want to convince my team at work to start experimenting with Executable Requirements again using my home-grown tool; I'd love it if other people in the Scala and Java communities could download this tool and give it a little tryout during their next iteration or two; I want to hear Feedback from people about what's good about SodaTest, what needs more work and whether there's parts that are just plain horrible; and, if the feedback is positive enough to consider SodaTest a preliminary success, I want to continue improving it in the areas where it's still holding people back.

There is already a Roadmap of possible features to add, but now it's really time to get it in people's hands and find out from users what is the next most important thing it needs to do.

Try It Out!

If you're a Scala or Java software developer and Executable Requirements are either a passion of yours or something you've been wanting to try out, why don't you give SodaTest a try? You don't have to commit to it, just write a couple of tests with it, get them running, then passing, and send me some feedback. All feedback is useful, even if you think it sucks! (As long as you tell me why.)

To get started with SodaTest, I suggest you:

<name>SodaTest repository on GitHub</name>
<url> </url>