Thursday, December 30, 2010

How to convert two Scala Lists into a Map

Update:

Since I posted this 24 hours ago, no fewer than 13 lovely people have left comments to let me know that this is actually far, far easier than I thought. There is a function on the List class (since 2.8.0, I think someone said), inherited from TraversableOnce, called toMap() which, if the sequence contains Tuple2s, will make a Map from them.

So the ideal solution is simply:
(keys zip values) toMap
Now the question is: How come everyone else knows about this and I didn't? :(

A couple of people also mentioned that you don't need to convert a List into an Array to be able to pass it into a var args method (but you still need the funky noise). Handy!

Thanks, everyone.


Original Post...

Not sure if this will ever be of use to anyone else, but I thought I'd put it out there just in case.

Problem:
You have a List/Seq of keys and a List/Seq of values, with the key at index n in one list matching the value at index n in the other. But what you really want is to convert them into a Map.

Solution:
Map(keys.zip(values).toArray: _*)

Explanation:
The steps here are:
  1. "Zip" the keys list with the values list, which creates a List of Tuple2 objects
  2. Convert the list of Tuples to an Array of Tuples. This is necessary because the Map object, which extends MapFactory, doesn't have any apply() methods that deal with lists but only the one that accepts a var args parameter of Tuple2s.
  3. Add this funky noise ": _*" which tells Scala to pass the Array as multiple varargs parameters, rather than trying to pass the whole Array as the first parameter.
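
To see both forms side by side, here's a small, self-contained sketch you can paste into a file or the REPL (the example keys and values are made up, of course):

object ListsToMapDemo {
  def main(args: Array[String]): Unit = {
    val keys   = List("a", "b", "c")
    val values = List(1, 2, 3)

    // Original approach: zip, then splat the pairs into Map's varargs apply()
    // (as noted in the update, the intermediate .toArray isn't actually needed)
    val m1 = Map(keys.zip(values): _*)

    // Simpler approach from the update: zip, then toMap
    val m2 = (keys zip values).toMap

    println(m1)       // Map(a -> 1, b -> 2, c -> 3)
    println(m1 == m2) // true
  }
}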

Thursday, December 16, 2010

Graham's Guide to Learning Scala

It's a pretty widely-accepted view that, as a programmer, learning new languages is a Good Idea (tm). Most people with more than one language under their belt would say that learning new languages broadens your mind in ways that will positively affect the way you work, even if you never use that language again.

With the Christmas holidays coming up and many people likely to take some time off work, this end of the year presents a great opportunity to take some time out from your week-to-week programming grind and do some learning.

With that in mind, I present "Graham's Guide to Learning Scala". There are many, many resources on the web for learning about Scala. In fact, I think there's probably too many! It would be quite easy to start in the wrong place and quickly get discouraged.

So this is not yet another resource to add to the pile. Rather, this is a guided course through what I believe are some of the best resources for learning Scala, and in an order that I think will help a complete newbie pick it up quickly but without feeling overwhelmed.

And, best of all, it has 9 Steps!

[Note: Many of the resources have a slant towards teaching programmers that know Java, but I imagine if you know any other popular OO language like C++, C#, Ruby, Python, Objective-C or Smalltalk, you shouldn't have a problem picking it up.]

Are you ready? Here goes...

Step 1: Lay the Foundation

Read 'Introduction to the Scala Language' on scala-lang.org

This single page is a succinct but dense description of Scala and will let you know what you're in for, and why you should keep going.


Step 2: Get the Tools

Download and install the latest Scala release

Download and install the latest IntelliJ IDEA *Community Edition* release

Open IDEA and install the 'Scala' plugin


Step 3: Hello, World!

Create a new IDEA project with a Scala module.
Create a new file called HelloWorld.scala and paste in the following code:

object HelloWorld {
  def main(args: Array[String]) {
    println("Hello, world!")
  }
}

Hit Ctrl-Shift-F10 to run it.
Congratulations! First Scala program done and dusted!


Step 4: Commute your Java skills to Scala

Read Daniel Spiewak's 'Scala for Java Refugees' series.

This should give you enough knowledge and experience with Scala syntax to be able to write Scala that does most of the things you can do in Java.

Make sure you enter the code examples into IDEA and try them out as you go. I suggest typing them out instead of copy-paste, because that's likely to give you exposure to common syntax mistakes and compiler errors.


Step 5: Open Your Eyes

Read through the 'Tour of Scala' on scala-lang.org

These short pages will give you some exposure to some of the more advanced features of Scala. Don't worry if you don't understand everything - just keep reading. You're not trying to learn how to use all these things, just to know that they're around.


Step 6: Learn the Basics, the Ins & the Outs

Read chapters 1 to 8 of Dean Wampler & Alex Payne's 'Programming Scala' (O'Reilly)

This is where the learning gets heavy. We're talking about more than half a book here. You'll learn everything from basic syntax (some of which you'll already know from the Refugees series) to Scala's more advanced patterns for object-oriented programming, an introduction to Scala's functional programming capabilities and some of the curly edges of Scala's type system.

Again, make sure you try out running the code samples in IDEA. I suggest you also have a play around with some of the concepts in your own little test classes. Any time you find yourself thinking "I wonder if I can use this feature to do this?", grab a keyboard and find out for yourself.


Step 7: Open the Dragon Scroll and Release your Scala Fu

Take a break from learning and do some hacking!

Think of a simple idea that you'd like to implement in Scala, then code it! (Seriously, I recommend keeping it REALLY simple.) If you can't think of something yourself, you can steal my idea of writing a function to perform word-wrapping.
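
If you want a nudge to get started on the word-wrapping idea, here's one naive, greedy sketch (purely illustrative; it doesn't split words longer than the width, and writing something better is the whole point of this step):

object WordWrap {
  // Breaks text into lines of at most `width` characters, greedily filling each line.
  def wrap(text: String, width: Int): List[String] =
    text.split("\\s+").toList.foldLeft(List.empty[String]) { (lines, word) =>
      lines match {
        case Nil => List(word)
        case current :: rest if current.length + 1 + word.length <= width =>
          (current + " " + word) :: rest
        case _ => word :: lines
      }
    }.reverse

  def main(args: Array[String]): Unit =
    wrap("the quick brown fox jumps over the lazy dog", 10).foreach(println)
}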


Step 8: Take a Walk around the Grounds

Read the rest of 'Programming Scala' (chapters 9 to 14)

The rest of the book covers some of the peripheral libraries and concepts in Scala that contribute to making it a great language, like Agents for multi-threaded programming, "herding" XML input and output and building blocks for creating domain-specific languages, as well as reviewing some of the tools that have sprung up in the Scala ecosystem.


Step 9: Go Forth and Hack!

This is where the directed part of the journey ends, but hopefully it's only the first step.

What you do with Scala next is up to you, but by this point you know more than enough about Scala to start using it seriously at home or even at work, for fun, or for making some serious cash! If you do make some serious cash by hacking Scala, please remember how it all started and send a little monetary "Thank you" my way. ; )


Addendum: Getting Help

Perhaps it was a little presumptuous of me to effectively say "go away now and code". Chances are, you'll need some help as you keep experimenting, learning and doing more and more cool stuff with Scala.

The two best places that I've found to connect with other Scala users are the scala-user mailing list run by scala-lang.org and, for curly technical problems that you just can't figure out yourself, stackoverflow.com. Actually, reading through other people's Scala questions on Stack Overflow can also be a great way to pick up new ideas!



Tuesday, November 16, 2010

SpringOne 2GX 2010 and Scala

SpringOne 2GX 2010 Wrap-Up

My last entry was the last of my notes from SpringOne 2010 in Chicago.
Here's a list of all my entries from the conference below in case you missed any:
If you're interested in watching the full content of some of the talks, I notice that they have just started publishing SpringOne videos on InfoQ, including Rod Johnson's keynote. I imagine this set will grow over the coming weeks.

Is This a Scala Blog or a Spring Blog?

I copped a little bit of flak recently due to all these Spring posts appearing on Planet Scala, an aggregation site for Scala blogs. The complainer's main issue was that I was spamming Planet Scala with "posts unrelated to Scala" and I think the core point of my response is worth repeating here.

One of the main reasons for Scala's popularity is its tight integration with Java. This integration allows Scala programs "access to thousands of existing high-quality libraries". It says so on the Scala website. Of all these "thousands of libraries", the Spring Framework is without a doubt the most popular enterprise framework for the JVM.

It's obviously not the case that everyone with any interest in Scala will also be interested in Spring. In fact I think there is probably a higher percentage of programmers in the Scala community who are utterly un-interested in enterprise web applications than there would be in the Java community. However, I believe that anyone applying Scala in an enterprise context, or even thinking about applying it there, is either pretty interested in Spring or, if they're not, probably should be, at least to the extent where they know what it does and where it's headed. Ergo, Spring posts on my Scala blog. I hope some of you have enjoyed the information, and I apologise to those that it may have annoyed.

Saturday, November 13, 2010

SpringOne 2010: GemFire SQLFabric - "NoSQL database" scalability using SQL

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

GemFire SQLFabric - "NoSQL database" scalability using SQL
presented by Jags Ramnarayan and Gideon Low

Jags started by proposing that the ACID semantics, which basically aim to ensure every change to a database is written to disk, are limited in the performance they can achieve due to being essentially I/O-bound. He also suggested the desire for these semantics is rooted in history when storage hardware and networks were slow and unreliable. Though he didn't say it explicitly, I think he was suggesting that this is no longer the case, with the implication being that we no longer need ACID.

He outlined how the "NoSQL" movement actually has nothing to do with SQL at all - i.e. the query language - but with the data structures - relational databases - that have typically been the subject of these queries. His point: You don't have to reinvent data querying just because you're re-inventing the data structure.

That, of course, led into the design of GemFire SQLFabric, the data management technology that was acquired by SpringSource/VMWare in May 2010. Jags said that, from an application developer's perspective, using GemFire SQLFabric is mostly identical to using other databases, just with a different JDBC URL and some custom DDL extensions.

I didn't jot down Jags' description of GemFire, but here is the spiel from the front page:

GemFire Enterprise is in-memory distributed data management platform that pools memory (and CPU, network and optionally local disk) across multiple processes to manage application objects and behavior


Jags outlined a bit of the structure of a GemFire cluster and then said that, because of the structure, it didn't give great performance for key-based access, or for joins, which left me wondering what it was good for! (Edit: Turns out I misheard him. The performance is better for joins compared to other object data grids.) I think he clarified the join part later, though, when he discussed that data needing to be joined in a query must be co-located on a single node.

The GemFire table creation DDL provides extensions for specifying how a table is to be replicated or partitioned, and how much redundancy should back the partitions. These extensions allow the DBA to ensure that data to be queried together is co-located.

If no partitioning or replication options are specified in the table DDL, GemFire will make decisions about default options for these based on the relationships (i.e. foreign keys) apparent in the table definition.

He said that GemFire uses the JDBC drivers and query planning and optimisation code from Apache Derby.

While talking about joins, Jags mentioned that co-location of joined data is required in order to achieve linear scaling. He mentioned that co-location of data is only currently a restriction of GemFire, implying that they intend to remove this restriction, though he didn't mention whether they would be tackling the linear scaling problem when they do this.

He talked about the way in which many of the design decisions they make in GemFire are focussed on making the absolute minimum number of disk seeks. I think that's hard-core stuff! I've been coding commercially for over ten years now and I've never once thought about how many disk seeks my code is causing.

Gideon showed some of the networking that occurs to make GemFire work and discussed how there is a central component called a 'Locator' that the cluster nodes use to find each other and which also performs load balancing of client requests. Strangely, this seemed like a classic single-point-of-failure to me, but there was no discussion about that problem.

I came away not really being sure what GemFire could be used for. Jags' comments about ACID at the start seemed to suggest that he thinks we are over-obsessed with reliability in the modern age. However, in my finance day job, we need to be 100% certain that pretty much every piece of data we push to the databases is stored for good when the transaction ends. Even 99.9999% (yes, six nines!) is not good enough: if 1 in every million DB transactions goes missing, money will go missing and we'll have a very angry customer on the phone. Unfortunately, they didn't cover during the talk how (or whether) GemFire handles reliability requirements like these.

Having said all that, however, I noticed that GemStone have an essay on their site called "The Hardest Problems in Data Management", in which they discuss the demanding needs of financial applications and suggest that while the popular "eventually consistent" distributed databases do not measure up to these demands, their implementation does. Having read a few paragraphs, they certainly seem to know what they're talking about from a theory perspective. If you're seriously looking for a solution in this space, I would suggest you have a good read of their documentation rather than just relying on my scratchy notes here.



SpringOne 2010: Extending Spring Integration

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Extending Spring Integration
presented by Josh Long (@starbuxman) and Oleg Zhurakousky (@z_oleg)

One of the most interesting things these guys said is that integration, as an industry need, is extending beyond the specialty of systems integration and toward customer integration: that is, getting systems to integrate with the users, or the users' devices, rather than just neighbouring systems.

Though I've been to two other talks that mentioned or demonstrated Spring Integration (SI), this is the first one I'd attended that focused on it exclusively. Hence, this was the first place where I received the simple definition of SI that I've been searching for. The answer was that, at its core, Spring Integration is just an embedded message bus.

The guys went on to describe the basics of enterprise integration patterns and of Spring Integration in particular...

Channels are the most important concept in SI. These can be point-to-point or publish-subscribe and can also be synchronous or asynchronous. Enterprise Integration involves lots of different types of components for connecting and processing the messages travelling through Channels: Transformers, Filters, Routers, Splitters, Aggregators, etc. All these are not Spring-invented terms but are commonly accepted integration patterns.

There is a Spring Roo add-on for Spring Integration currently in development. There are no milestones yet but apparently it is currently useable. (It's available from the SpringSource GIT repository under spring-integration/roo-addon)

It seemed very simple from a quick demo to create a persistent queue with just a few lines of XML. This is the kind of stuff that I'm very interested in. Unfortunately, there was no discussion about transaction coordination between the queues and other data sources.

A 'Service Activator', which is typically the end point of a message chain, can simply be any POJO that defines a method that accepts the payload of messages, but is itself completely message- and SI-agnostic. In other words, you can use SI to deliver messages to your existing service-layer components without any change to their code. Pretty neat.
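
As a rough illustration of that idea (the class and method names here are invented, and the Spring Integration configuration that would point a service-activator at this bean is not shown), the component itself can be as plain as:

// Knows nothing about Spring Integration, channels or Message objects: it simply
// accepts the message payload type. Any return value can become a reply message.
class OrderService {
  def handle(order: String): String = {
    // business logic only; no integration concerns
    "processed: " + order
  }
}

object OrderServiceDemo {
  def main(args: Array[String]): Unit =
    println(new OrderService().handle("order-42")) // still usable as a perfectly ordinary object
}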

In response to a question, it was said that you could use a Service Activator to implement filters, transformers, etc., by holding a reference to the next channel in the chain, however this would bring integration concerns into the domain of your business logic.

They explained how the whole point of Spring Integration is to separate this logic away from your business logic, so your business code just deals with payloads and handles business-ey stuff, while your integration components - be they out-of-the-box ones or custom extensions - just deal with integration concerns and know little about the business function (other than routing). In a nutshell: The Spring team believe integration should be a configuration concern, not a business logic concern. This was a bit of a lightbulb moment for me.

There is a Spring Integration Samples project showing lots of different ways to use SI and they have a JIRA where you can request the addition of new samples! Good stuff.

After the talk ended, I wanted to ask a question about transaction coordination between Service Activators and message queues, but unfortunately, there were about 20 other people that wanted to ask questions as well so I couldn't get a word in.

I almost forgot that this talk was about extending Spring Integration. I'm sorry to say that I didn't write too much down about this because creating your own integration components is just so easy that it's not really worth me re-documenting it. They did talk a little bit about rolling your own inbound and outbound channel adapters at either end of a channel in order to integrate with systems that the SI components don't yet support, but this stuff was a little over my head being an SI newbie.

One thing that I'm still keen to know but which I haven't been able to glean so far is exactly how transactional message queues like RabbitMQ fit into the picture. Are they installed as some special implementation of a Channel? Or is a message broker just another SI-agnostic endpoint? (Perhaps using channel adapters?) Or can you use either approach, depending on your circumstances (e.g. single-process system with persistence vs. distributed system)? If you have any experience in this area, any comments you're able to add to reveal some more depth would be greatly appreciated.

If you're interested in watching the video of this talk, it's been published on InfoQ (1 hr 22 min)



Friday, November 12, 2010

SpringOne 2010: Gradle - A Better Way to Build

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Gradle - A Better Way to Build
presented by Hans Dockter

Note that any references to the Gradle User Guide from this blog entry are to version 0.8, as that was the most recent GA version at the time of writing. If you are using a later version of Gradle, you might want to check the latest User Guide instead.

Hans is the creator and lead developer of the Gradle build system. He’s also the CEO of Gradle Inc., a consulting company specialising in project automation.

First up, Hans explained that Gradle is actually mostly written in Java, with only the DSL being written in Groovy.

Gradle supports building Java, Scala, Groovy, Web projects, etc.

It appeared very easy to add a Maven dependency into the classpath. (Probably easier than in Maven!)

He demoed an example of developing a small, custom build operation that copied an image (of Rod Johnson) into the META-INF/ directory of any output JARs that had 'spring' in the name. While this was possible with only a few lines of code, I have to say that the code wasn’t all that intuitive and I think you would have to know quite a lot about the way Gradle structures things before you can easily write tasks like this yourself.

There’s a very easy method for importing Gradle tasks from external sources by specifying not much more than a URL.

Hans said that, in comparing Gradle to Maven and Ant, he believes the use of Groovy in Gradle instead of XML is not the main differentiator, but its design is where the gains come from.

He gave a pretty impressive demo of how Gradle detects changes in sources, tests, classpaths and even the Gradle script itself, and then only rebuilds those parts of the project that absolutely need to be built (and, of course, the things that depend on them).

While Gradle can be used to build projects that aren’t written in Groovy, it was my observation that you probably need to have a fair level of proficiency in Groovy in order to compose anything other than a basic build script.

It’s pretty easy to separate the declarative parts of a custom task from the imperative parts by defining a class at the bottom of the script. (I found it interesting that, even though the syntax of a Gradle file is mostly declarative, Hans was still referring to the file as a 'script'.)

Gradle exhibits heavy integration with Ant that is also bi-directional, i.e. Ant tasks can depend on Gradle tasks.

Hans highlighted that Maven’s defaults (which he called a “strong opinion”) for things like project directory structure are one reason that people will avoid migrating existing projects to Maven. While this might be true, I think it’s based on a misconception - in reality, it’s quite trivial to override the defaults of things like directory locations in a Maven POM.

Gradle uses coloured output, which I think is pretty cool for a Java-based build tool.

Hans noted that the flexibility of Groovy means that creating custom operations in a build script is pretty easy. Having seen smart people spend a week or two writing a pretty simple custom plugin for Maven, I think making the customisation process easier is definitely a win. (On the other hand, in four years of using Maven, we've only ever created one Mojo. We have, though, often used the Ant plugin to "break out of the box".)

Hans gave a demonstration of dependencies between sub-projects of a larger project which was pretty impressive, showing Gradle building and testing both dependency and dependant projects based on what changes had occurred.

He talked about how performance is a big focus of Gradle, in particular getting the dependency and change behaviour right to ensure that Gradle only ever builds things that have been affected by changes.

Gradle is able to generate a pom.xml file for a project and deploy an artefact to a Maven repository along with the generated POM.

Gradle doesn’t support (out of the box) a ‘release’ action like Maven’s Release Plugin (which creates branches and tags in the code repository, deploys build artefacts to a remote repository and automatically updates version numbers in the build files on head/trunk to the next snapshot). However, Hans said that they eventually want to develop a full deployment pipeline based on Gradle, which will be one of the focus points after version 1.0 has been released.



Thursday, November 11, 2010

SpringOne 2010: Harnessing the Power of HTML5

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Harnessing the Power of HTML5
presented by Scott Andrews (@scothis) and Jeremy Grelle (@jeremyg484)

First up, the guys cleared up some confusion by explaining that the term HTML5 is currently being used to encompass much more than just the latest W3C HTML spec, but also all of the related sets of technologies that are now being standardised in an attempt to ease the creation of dynamic webapps. It really means HTML 5 + CSS 3 + new JavaScript APIs like WebSockets and WebWorkers.

They demonstrated a lot of the upcoming technologies through the rest of the talk by showing a presentation from html5rocks.com, a site Google has put together to promote the stuff.

Web storage (which, ironically, means storing data on the client) is coming. Local storage – a basic key/value store – is pretty much standardised. Another idea for a client-side database is still being discussed, but has been implemented in WebKit and hence is available on many mobile devices. Note that the data in this database would, like cookies, not be accessible by scripts from other sites, but would be unsecured on the local machine. Likely uses are things like data storage for offline access of cloud data. (That's cloud computing data, not meteorological readings.)

There is now an application cache manifest that allows the server to instruct the browser about cacheable resources and allows the browser to skip checking the last modified date/time of these resources as long as the date/time of the manifest hasn’t changed.

Web Workers allow JavaScript code to execute something in a separate process. They mentioned that this is specifically a process and not a thread, the implication being that the spawned process doesn’t have access to the memory of the parent.

Web Sockets are being billed as the new XHR and the new Comet.

Javascript Notifications will allow the application to request the browser to show a Growl-style status popup, but only if the user has already given permission for the site to show these.

Drag and drop is being supported at the browser level.

There is a proposal to provide access to a user’s current latitude and longitude.

Scott and Jeremy recommended that people not try to use these new features directly, but continue to use the various existing JavaScript libraries that already do the same thing, with the hope being that these libraries will, over time, be upgraded to use (and abstract) the native support where it is available (assuming the native support will be better than the current methods).

There are new semantic tags in HTML 5 for <header>, <nav>, <section>, <article>, <aside>

HTML 5 includes new link/rel types, including ‘pingback’, which will allow a page to provide a URL for the browser to call if the user leaves the site, and ‘prefetch’ to specify other pages for the browser to load into the cache in anticipation of where the user might go next.

There are new HTML form input types that restrict the type of data that can be entered as well as providing some visual, optimisation and input hints, which is especially useful on mobile devices. New input types include date, email, range, search, tel, color and number.

There is also the ability to provide basic client-side validation (without JavaScript) by specifying constraints such as required or a regular expression pattern. A new CSS :invalid pseudo-class can be styled to change the appearance of invalid fields declaratively.

There are new tags for defining ‘meter’ and ‘progress’ elements (the latter for both indeterminate and % complete).

There are plans to support embedded audio and video in HTML 5 without the need for any plugins, although there are currently arguments going on about what codec should be the standard.

There’s a Canvas element for just drawing pretty much anything using Java2D-like shape, line and fill constructs. There is a pretty impressive example in the Google slides where photos can be dragged, rotated and scaled. JavaScript is required to change the contents of the canvas, but the API as a concept looks pretty neat.

There is support for JavaScript events coming out of SVG objects using native HTML tags.

CSS 2 selectors like ‘first-child’ are being more widely supported and new ones like ‘not’ are coming in as well.

CSS is introducing font downloading, allowing everyone visiting your site to see the same font without having to use image replacement. (As someone who's had to do a bit of this in the past, I have to say 'yay!')

There are STILL experiments with having CSS manage column-based layouts (link from 2005!), although this is only implemented in WebKit at the moment.

CSS3 will support HSL (Hue/Saturation/Luminance) colour definitions.

border-radius for rounded corners is becoming standard, except in IE 9!!! Says the speaker: “So IE will be square.” (I've since tracked down an announcement on msdn.com that seems to suggest that border-radius will be in IE9.)

Gradients, shadows and animations are all getting some standardised support in CSS3.

It will be possible to intercept changes to the history (e.g. capture the back button) within an Ajax app and load data rather than allowing a whole-page refresh.

There is a library called Atmosphere that provides a server-side abstraction over Comet and WebSockets to allow these protocols to be handled on any Java web server (many of which currently support this stuff but through proprietary APIs).

They showed a pretty cool example where they were using Canvas and WebSockets on the client side with Spring Batch and Spring Integration on the server side to parse a large file, stream interesting data to the client through an Integration channel and visualise that data in the browser.

I've found this funky little website called caniuse.com that allows you to specify browser names and versions along with HTML 5 features that you would like to use and it will show you the level of support in each browser & version for that technology.



Saturday, November 6, 2010

SpringOne 2010: Concurrent and Distributed Applications with Spring

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Concurrent and Distributed Applications with Spring
presented by Dave Syer

My favourite quote from this talk, and possibly from the whole conference, is one which I want to take back to my workplace and put into practice with a vengeance:

Using single-threaded algorithms on 32-core machines is a waste of money

Dave also presented a really simple but useful definition of thread-safety:

Thread safety = properly managing concurrent access to shared, mutable state

Applying this definition, you can see there are three ways to tackle thread-safety: you can eliminate mutability, eliminate sharing, or eliminate concurrency. Eliminating concurrency is the core aim of mutexes and locks, e.g. synchronized blocks. Eliminating mutability is one of the chief design idioms of functional programming.

On the topic of eliminating shared resources, Dave pointed towards the frequent use of ThreadLocal within the Spring Framework to associate unshared resources to individual threads. He made note of the potential for memory leaks with ThreadLocal, highlighting that, with the long-running threads in most servers, you have to ensure you clean each ThreadLocal up when you’re finished otherwise your data will hang around forever. (Sounds like going back to pairing new & delete!)
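
As a sketch of what that clean-up usually looks like (the names are invented; this is not code from the talk), the value gets remove()d in a finally block at the end of the unit of work:

object ThreadLocalCleanup {
  // Per-thread scratch buffer: the variable is shared, but each thread sees its own
  // value, so there is no concurrent access to shared, mutable state.
  private val buffer = new ThreadLocal[StringBuilder] {
    override def initialValue() = new StringBuilder
  }

  def handleRequest(payload: String): String =
    try {
      val sb = buffer.get()
      sb.append(payload)
      sb.toString
    } finally {
      // On a long-lived server thread this value would otherwise hang around forever.
      buffer.remove()
    }

  def main(args: Array[String]): Unit =
    println(handleRequest("hello"))
}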

Dave talked about a method on ExecutorService that I've never used before called invokeAny() that will execute every task in a given list of tasks concurrently (assuming a multi-threaded ExecutorService implementation) and return the result of the first one to complete. The remainder of the tasks are interrupted. I imagine where you might use this is if you have a situation where you have two or three different algorithms, each of which can outperform the other two for certain structures of data, but where the most efficient algorithm for a given individual input can't be (easily) determined before execution. So, on a many-multi-core machine, you have the option of just running all three against the same data, taking the result from the first algorithm to complete and killing the others.
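
Here's a minimal sketch of invokeAny() from Scala; the three sleeping tasks just stand in for the "competing algorithms" scenario described above:

import java.util.concurrent.{Callable, Executors}

object InvokeAnyDemo {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(3)

    // Three tasks racing to produce an answer; the losers are interrupted.
    val tasks = new java.util.ArrayList[Callable[String]]()
    tasks.add(new Callable[String] { def call() = { Thread.sleep(300); "slow" } })
    tasks.add(new Callable[String] { def call() = { Thread.sleep(100); "medium" } })
    tasks.add(new Callable[String] { def call() = { Thread.sleep(10); "fast" } })

    // Blocks until one task completes successfully, cancels the rest, returns its result.
    println(pool.invokeAny(tasks)) // almost certainly "fast"

    pool.shutdown()
  }
}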

Dave briefly discussed an emerging (I think?) pattern for concurrency called Staged Event-Driven Architecture or SEDA.

He mentioned that Spring Integration 2.0 (RC1 released Oct 29) includes support for transactional, persistent message queues.

He highlighted the difference between a distributed Application (running the same binary on multiple nodes) and a distributed System (running related, communicating applications across multiple nodes). He said that it is wise to prefer more loosely coupled messaging architectures for distributed Systems because of the likelihood of unsynchronised release cycles.


Friday, October 29, 2010

SpringOne 2010: Groovy and Concurrency

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Groovy and Concurrency
presented by Paul King

Paul started by mentioning a library called Functional Java which, to me, looks like an attempt at porting a bunch of ideas present in Scala over to Java, and another one called Kilim, which is an Actors library for Java.

Paul said that his main argument for why you should use Groovy, rather than Scala or Clojure, is that Groovy is closer to the Java syntax and, hence, is more easily integrated with Java. (In my personal experience, I can’t say I’ve ever had any problems integrating Java with Scala. Going the other way (using Scala in Java) has some gotchas, but wouldn't be described as hard.)

Groovy supports a very nifty pipe (‘|’) operator extension to the java.lang.Process class, allowing you to easily pipe stdout to stdin between two or more processes, just like in a shell.

I've now heard Google Collections (now part of the Guava libraries) mentioned for about the 5th time this week. I really should check out what these are because they’re very popular!

Groovy supports adding functions to classes, and even individual objects, at runtime. That is, your code can contain statements that add members to an existing type. This is not creating an inline definition of a new type, but actually changing the type at runtime, as you might do in JavaScript. They call this Dynamic Groovy. I've never really got my head around why meta-programming - programs that write programs (and then run them) - is a good idea, but I've also read Paul Graham saying that this feature gave him a major competitive advantage, so there must be something mind-bending about it. Perhaps I just need to give it a try?

Groovy has an @Immutable annotation that, as well as making the contained fields final (without you declaring them so), is also shorthand for adding getters, toString(), hashCode() and equals() to a class based on just the field names and types. Case classes in Scala provide the same functionality along with the added bonus of making pattern matching simple.
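
For comparison, here's the Scala side of that statement in a few lines (Person is just an invented example):

// A case class gives you immutable fields, accessors, toString, equals/hashCode
// and a copy method for free, plus pattern matching.
case class Person(name: String, age: Int)

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    val p = Person("Ada", 36)
    println(p)                       // Person(Ada,36)
    println(p == Person("Ada", 36))  // true: structural equality

    p match {
      case Person(n, a) => println(n + " is " + a)
    }
  }
}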

He mentioned two Java concurrency libraries, one called Jetlang for message-based concurrency (the Jetlang site itself refers to Jetlang being similar to Scala's Actors) and JPPF, a grid-computation library. The JPPF intro uses domain language similar to that of Spring Batch, with regards to jobs and tasks.

He talked a bit about GPars (short for Groovy Parallel Systems), a concurrency library specifically for Groovy.

He also said that shared transactional memory looked interesting but didn’t go into it much beyond mentioning the Multiverse library. I have seen this term bandied around a little, in particular due to a couple of people attempting to implement it in Scala, but I've never looked into it - its frequency hasn't yet punched through my signal/noise filter.

He gave a little preview of what the GPars guys are trying to achieve with a “Dataflows” library. You use a DSL to describe a dependency graph between steps in your (complex, but parallelisable) algorithm and Dataflows will automatically run in parallel those parts of the algorithm that are not dependent, synchronising at the points where multiple asynchronous outputs are required to enter the next step.



SpringOne 2010: How to Build Business Applications using Google Web Toolkit and Spring Roo

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

How to Build Business Applications using Google Web Toolkit and Spring Roo
presented by Amit Manjhi

Amit works at Google on the GWT (Google Web Toolkit) team. In this session, he showed the ease of creating a GWT app with Spring Roo and then talked through a whole swathe of GWT best practices.

Roo supports the addition of JSR-303 constraints [PDF] as part of adding a field to an entity.

Creating a GWT webapp from an existing Roo domain model is as simple as typing gwt setup (and hitting enter) in the Roo console.

The default webapp generated by Roo automatically handles validation on the server side. I have to say that the default rendering for errors was pretty poor. The error he showed as an example just appeared at the top of the form and said something like "this field is required", without any reference to which field had caused the error. Room for improvement there, but you certainly get a lot of webapp for doing nothing much at all.

Bookmarkable URLs work out of the box, although I'm not sure if this meant that all URLs are bookmarkable by default or whether it's just easy to make URLs bookmarkable when you need them.

Amit showed a version of the same UI that some Google guys had jazzed-up with some nicer CSS and a few changes to the layout components. It was using some kind of slide transition where, when an item was selected from a list, the list would slide off to the left and the item detail slide in from the right, then vice versa when you went back. Looked very neat.

The Spring Roo addon for GWT generates the site using a whole bunch of best practices as learned by the Google Engineers who’ve been using GWT to develop the AdWords site.

The Model-View-Presenter pattern was presented as a suitable pattern for client-side GWT. This decoupling pattern allows different view implementations to be attached to the same Presenter.

Using DTOs (Data Transfer Objects) (as opposed to sending entities to the webapp) was recommended, though he did note that, coded manually, DTOs typically violate the DRY (don’t repeat yourself) principle. This downside is overcome by creating an empty interface for the DTO, annotated with a @ProxyFor annotation. GWT then automagically creates a proxy object for the entity class named in the annotation and this proxy acts as a DTO. The Google guys call this an Entity Proxy. From what I could tell, the proxy automatically proxies all value fields of the entity. You can provide annotated methods on the DTO interface that allow lazy navigation of entity relationships.

The default Roo project doesn’t use GWT-RPC, chiefly because of the bandwidth implications when mobile devices are involved. Instead, they use an object on the client called a RequestFactory that talks JSON to a generic RequestFactoryServlet on the server.

RequestFactory receives notifications of the side-effects of its server-side calls and posts these as events on an event bus in the client.

They’ve replaced the Presenter with something they call an Activity that abstracts away a lot of the boilerplate normally required in a Presenter. The common parts of Activities are themselves abstracted out into a generic ActivityManager.

He highlighted a proposed method for allowing search engine web crawlers to follow Ajax bookmarkable URLs by using #! to prefix the anchor rather than just #. Google have published an article called 'Making AJAX Applications Crawlable' where they discuss the details.

The talk became quite confusing past about half-way when it stopped being obvious (to me at least, having never written a GWT app) which code samples were part of the server and which were run on the client. My kind request to any GWT talk presenters: please introduce each code-sample slide by saying “This code runs on the {server|client}”.

GWT has a nice SafeHtmlUtils class for escaping HTML entered by users to avoid cross-site scripting (XSS) attacks.

GWT 2.1 contains a client-side logging library in the vein of the JDK logging. Spring Roo-generated GWT apps come with a handler that allows client events to be logged on the server. You can also use a GWT <property-provider> element to enable and disable logging at runtime on a per-user scope. (There's some information about property-provider here under the heading 'Elements for Deferred Binding'.)



Wednesday, October 27, 2010

SpringOne 2010: Creating the Next Generation of Online Transaction Authorization

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Creating the Next Generation of Online Transaction Authorization
presented by Maudrit Martinez, Anatoly Polinsky and Vipul Savjani

These three guys from Accenture presented patterns of architecture with Spring Batch and Spring Integration that they have used in production systems for both online and batch processing of financial transactions.

Their diagram showed two technologies – Pacemaker and Corosync – that I hadn’t heard of before. Apparently Corosync is the clustering technology recommended by the Rabbit guys. They also used a product called Hazelcast for a distributed cache and Grid Gain for a compute grid.

They combined Spring Batch with Grid Gain in order to partition the processing of a batch of transactions across multiple nodes. The presenter was fairly impressed with GridGain’s over-the-wire classloading. (To be fair, this idea has been around at least since RMI was released in '97.)

Rather than passing the transaction data around their whole integration network, they instead placed the data in the distributed cache and passed around only the keys to the items in the cache.

They made use of a library called ScalaR, which is a DSL for using GridGain in Scala. They used Scala to process the transactions chiefly because of the availability of the ScalaR DSL, because of its provision of Actors for simplified concurrent programming, and because, given the need for performance, they didn’t want to use an interpreted language like Groovy.

They mentioned that parts of GridGain (though perhaps only the DSL) have reportedly been re-written in Scala, and that the GridGain team chose Scala over Groovy because of its compiled, static typing providing better performance than interpreted languages.

They showed where their code was calling Hazelcast and I noted that there wasn’t any attempt at decoupling the cache implementation – a Hazelcast instance was retrieved by calling a static method. Perhaps it was just some demo code they'd thrown together.

I noticed a cool way of converting a Scala list to a Java list that I hadn’t seen before:
new ArrayList ++ myScalaList
From what I can tell, this ++ operator isn't standard (at least you can't use it in the Scala 2.8 REPL), but it was an interesting, succinct syntax that caught my eye.
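
For reference, here's a sketch of how you'd normally do that conversion with the standard library (using JavaConverters; in Scala 2.13+ the equivalent lives in scala.jdk.CollectionConverters - the list contents are invented):

import java.util.ArrayList
import scala.collection.JavaConverters._

object ListConversionDemo {
  def main(args: Array[String]): Unit = {
    val myScalaList = List(1, 2, 3)

    // asJava wraps the Scala list in a java.util.List view (no copying)
    val javaView: java.util.List[Int] = myScalaList.asJava

    // If an actual ArrayList is required, copy the elements into one.
    val javaCopy = new ArrayList[Int](javaView)

    println(javaCopy) // [1, 2, 3]
  }
}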

They mentioned the STOMP protocol, which is a text-based protocol for message broker interoperability supported by Rabbit MQ, among others.

The Spring Integration config they used to send a message to Rabbit didn’t have any implementation, but just an interface they had defined which was then proxied by Spring to send a payload onto the channel.

They mentioned a couple of times that the advantage of Rabbit MQ over JMS is that Rabbit's AMQP is an on-the-wire protocol whereas JMS is a Java API. They didn’t elaborate on why this was an advantage, but I suppose the protocol easily allows other programming languages to integrate with the messaging, where as a Java interface doesn’t offer any standard way to do that.

Their implementation for processing transactions used a chain of three actors: #1 for coordinating the authorisation of the transaction (I think – it may have been for coordinating the authorisation of multiple transactions?), #2 for performing the authorisation, which chiefly meant looking up a collection of rules and then passing these rules off to Actor #3, which was an executor for the Rules.

While searching for an online profile for Anatoly Polinsky, I found this great presentation on Spring Batch that he apparently authored. It also looks like he has released some of the code from the presentation in a project called 'gridy-batch' on github.


Tuesday, October 26, 2010

SpringOne 2010: Introduction to Spring Roo

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Introduction to Spring Roo
presented by Rod Johnson and Stefan Schmidt

I didn’t take a whole lot of notes in this session because it was entirely demo, however the ease with which Spring Roo can quickly create working (though rudimentary) CRUD web applications was quite amazing.

Rod gave some good explanation around the magic that makes it happen. Basically, the commands given to Roo result in it generating two sets of files. One set is the Java files that the developer can work with, for example adding business logic, without worrying that Roo might overwrite the contents of these files later (it doesn't). The Java classes are annotated with Roo annotations, e.g. @RooEntity, @RooToString, @RooJavaBean, that Roo uses to determine what additional functionality it will add to the annotated class.

The functionality that Roo adds in is actually defined in aspects that are maintained in files juxtaposed with the Java files. From what I could tell, there is basically an aspect for each Roo annotation on each Roo class, e.g. Customer.java has CustomerRooEntity.aj, CustomerRooToString.aj, etc. These aspect files are updated or re-written by Roo automatically as the Roo model is changed, so if you were to make any changes to the aspect source files, they would get wiped out by the next Roo operation. This is what allows Roo to provide round tripping (although, in truth, I think it’s the illusion of round-tripping): it generates an empty, annotated class that you can edit to your heart’s content without fear of interference from Roo’s automated operations, while allowing all the Roo-controlled stuff to be “round-tripped” by keeping it in aspects that are kept up-to-date with Roo-controlled changes.

At first I thought this meant that every Roo class had several aspects over it at runtime, which sounded like a performance nightmare, but I found out later that the aspects are applied at compile time, essentially acting as source mix-ins to the Java class file that will be deployed.

I spent a little time thinking about how one might try to achieve a similar thing with Scala. The first thing that occurred to me is that you wouldn’t need to use aspects. Because Scala supports mixin composition through traits (a form of multiple inheritance), the extraneous data and functions that Roo is storing in aspects could, in Scala, just be separate traits that the main class extends. This would also alleviate the need for the annotations, if you wanted to get rid of them. (I suppose the annotations have the advantage that they're not actually tied to the code in the aspects. I'm fairly sure you can remove and introduce the annotations in the Java files and the aspects will disappear and reappear accordingly (that one part is true round-tripping), while the same would not be true of traits - they would have to exist before you could extend them, unlike the aspects.)

The other little thing I thought is that, using Scala, the @RooJavaBean functionality would be relatively unnecessary, seeing as Scala already has the @BeanProperty annotation to do the same thing (albeit on a per-field basis). So while the Roo code generation saves a lot of boilerplate for Java developers, I think Scala devs can achieve pretty much the same thing with some sensible common traits and minimal extra effort. (I'm just thinking about the entity stuff we looked at here. It's likely there's Roo goodness in the web tier for which Scala cannot naturally provide a neat alternative.)
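
To make that concrete, here's a small sketch of the traits-plus-@BeanProperty idea (the trait, class and field names are all invented, and this is not what Roo actually generates - it's just the shape I had in mind):

import scala.beans.BeanProperty // scala.reflect.BeanProperty in older Scala versions

// Reusable behaviour lives in a trait that the hand-written class mixes in,
// instead of in a generated aspect sitting next to the source file.
trait PrettyToString {
  def fieldsForToString: Map[String, Any]
  override def toString =
    fieldsForToString.map { case (k, v) => k + "=" + v }
      .mkString(getClass.getSimpleName + "(", ", ", ")")
}

class Customer(@BeanProperty var name: String,
               @BeanProperty var email: String) extends PrettyToString {
  def fieldsForToString = Map("name" -> name, "email" -> email)
}

object CustomerDemo {
  def main(args: Array[String]): Unit = {
    val c = new Customer("Jane", "jane@example.com")
    println(c)         // Customer(name=Jane, email=jane@example.com)
    println(c.getName) // Java-style getter generated by @BeanProperty
  }
}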

At the end of the presentation, they showed the latest version of the Spring Insight project, I’m guessing because they used Spring Roo to develop it. (?) It looks very cool, and the level of detail you can browse down to is amazing, e.g. you can see the JDBC calls issued during a web request. I've found in the past that the problem with tools that have this much data is always in figuring out how best to represent it all sensibly and aiding the user in selecting where to drill down. From the little I’ve seen, some of the charts they have seemed to do a good job of this. It's definitely worth a try back home to see what it can do. They are currently working on a version that will be able to be deployed against production systems.

Monday, October 25, 2010

SpringOne 2010: Slimmed Down Software: A Lean, Groovy Approach

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Slimmed Down Software: A Lean, Groovy Approach
presented by Hamlet D'Arcy

This talk turned out to be much more of a review of Lean principles than about how Groovy supports these principles, but that was fine by me. Obviously the principles are far more important than the language you’re using.

The key principle of Lean Development is to eliminate waste, which in essence means to stop doing things that you don’t need to do.

Interestingly, Hamlet proposed getting more sleep and ingesting less caffeine as good development practices.

Meetings are more often than not a form of waste, especially when they seek to produce 100% consensus. Hamlet talked through the four forms of decision arrival, from dictatorship at one end to unanimity at the other, with democracy and something else in the middle. He didn't really make a conclusion out of this, but my guess is that he was warning us to stay away from the extreme ends. Certainly, constantly trying to achieve unanimity would cause a lot of waste.

He highlighted unfinished work as an example of waste. For example, choosing to half-develop 6 features rather than finish 2 or 3 causes waste. He didn’t really go into why this is waste, but my immediate thoughts were:
1) that it requires energy – both from individual developers and from the team – to keep abreast of unclosed loops; and
2) that knowledge learned towards the end of one feature may accelerate the development or prevent a change in another feature if it’s developed later rather than simultaneously.

While talking about unfinished work as a form of waste, he criticised distributed version control (e.g. Git, Mercurial/Hg) based on the fact that local branches, which are essentially code that’s not committed to head, are unfinished work and hence waste.

He recommended Groovy for unit testing, even if the production code isn’t Groovy.

He briefly discussed Value Stream Mapping, which from what I could tell is basically graphing out the flow of a process, including dependencies, time delays and actions, some of which may only be required to be performed every Nth time through the process (e.g. maintenance). My take-away message from the example shown was that you shouldn’t just accept time that is wasted waiting for a process to complete, but should schedule other tasks, that you know will need to be done in the future anyway, to occupy this time. (This is all in reference to analysing one’s process, not the flow of a program.)

While discussing Value Stream Mapping, he mentioned that you really want to be measuring activities and waste in $$$, not something more abstract like time or gut feel. If you’re making business decisions, $$$ is the unit that makes sense.

He referred us to an article on DZone called the “Seven Wastes of Software Development” (Though the top hit for this phrase on Google is a 2002 paper by Mary Poppendieck [PDF])

Hamlet postulated that languages with less syntax, e.g. Groovy (or Scala), allow one to write unit tests that look/read a lot closer to the original requirements specification.

He talked a little bit about EasyB, a Groovy DSL for BDD, and explained how its hierarchical test cases allow you to write multiple tests based on a shared scenario.

He claimed that the Agile idea that “the tests are the documentation” has been over-sold by evangelists and under-delivered by developers.

He showed how the Spock testing framework has you list all your assertions as Boolean expressions in an “expect” block, eliminating the need to write an assertThat(...) call on every single line.

He raised the idea that every time we are called upon to make a decision there are three potential outcomes: Yes, No, or Defer.

He talked about something called “Real Options”, which posits that every possible decision outcome has both a cost and a deadline or expiry date. It almost never makes sense to commit to any decision before the expiry of the next option, because that’s when you’ll have the most information. The problem with achieving this is that human brains are wired to eliminate unknowns by locking down decisions as early as possible. The solution to that is to make an action point out of deferred decisions, the required action being to collect more information and reconvene when the decision needs to be made.

All of the above being under the banner of “delayed commitment”, it occurred to me that a good method for getting good at this is to constantly be asking yourself and everyone around you the question: “Is this the best time to be making this decision?”

He mentioned Canoo a couple of times, which was the firm he was working with while experimenting with all this lean and Groovy stuff. (I think they were doing something with App Engine?)

He said that his team stopped using fixed-length iterations because “two weeks is not a unit of value but a unit of time”, i.e. you should be releasing when you have value to deliver, no sooner or later.

He suggested reducing the number of integration tests because having these tests fail due to valid changes to the system is a form of waste. I actually disagree on this one. Around my workplace, the idea that you should delete a test because it keeps failing is an ongoing joke. Obviously you shouldn’t have tests that duplicate each other or that fail for no reason – both of which result in waste. However, if you’re doing TDD, you probably want to change the test first anyway, so it shouldn’t break when you change the implementation, it should pass! It was very interesting to hear someone working with a dynamic language suggesting having fewer integration tests. My assumption has always been that, if anything, you would need more tests at this level to prove the correctness of the wiring of your essentially un-typed components than you would with static typing. Now that I think about it a bit deeper, I was probably wrong - you shouldn't need any more tests - you should need exactly the same amount. Full coverage is full coverage. Having more wouldn't prove anything.

I really liked this: He emphasised that when you decide to make a change to your process it is an experiment – you should define a time limit and then assess, as professionally as possible (i.e. without bias), the results of the experiment before deciding whether or not to make a permanent change or to start a different experiment.

He criticised what he called “closed-door architecture”, where a small set of “architects” within an organisation decide what technologies will be used and dictate these to the rest of the developers. He didn’t mention his exact reasoning for talking this down, but the obvious ones I see are the demotivational effect on the non-architect employees and the potential to miss good ideas by not providing everyone with an opportunity to contribute their brainpower and expertise to the problem. I think, in order for this to work well, you need a pretty mature bunch of developers. If you're going to canvas everyone's opinion, then everyone needs to be really good at leaving their ego at the door, otherwise you're going to end up in a six hour meeting about which developer has the best idea rather than which idea is best for the customer.

In the context of introducing Agile practices to an organisation, he discussed an equation from some book that says that the value of a change to an organisation is relative to Why over How, meaning that a big organisational change (large How) has to tackle a big problem or create a big advantage (large Why) in order to provide value. Changes that can provide a large benefit with minimal impact on the work (that's a large Why over a small How) are obviously the sweet spot in terms of increasing value.

Lastly he showed an example of a Groovy script that used a @Grab annotation to download a Maven dependency and bring it into the classpath at runtime. Very cool.



Sunday, October 24, 2010

SpringOne 2010: What's New in Spring Framework 3.1

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

What’s New in Spring Framework 3.1
presented by Juergen Hoeller

Juergen started off with a review of the new features that were added in 3.0
(I’ve only noted down things that I’m not already using but thought are probably worth trying out)
Next was what’s coming in 3.1...
  • Servlet 3.0 (I didn't know much about this, but there was a good overview at JavaOne [PDF])
  • 'Environment Profiles' will allow for placeholder resolution based on a specified environment name (or names). This allows different configuration in, for example, dev and production, without having two different deployment artifacts. I think he also mentioned an Environment abstraction, which I assume would be for accessing the same configuration programatically.
  • They are putting some effort towards bringing the convenience of specialised namespace XML elements, e.g. the task: namespace, to the annotation-based Java config
  • An abstraction for Cache implementations (including implementations for EHCache and GemFire to begin with), along with a @Cacheable annotation for aspect-oriented caching.
Lastly, he covered the main points that are currently on the cards for 3.2 ...