Tuesday, November 16, 2010

SpringOne 2GX 2010 and Scala

SpringOne 2GX 2010 Wrap-Up

My previous entry rounded out my notes from SpringOne 2GX 2010 in Chicago.
In case you missed any, here's a list of all my entries from the conference:
If you're interested in watching the full content of some of the talks, I notice that they have just started publishing SpringOne videos on InfoQ, including Rod Johnson's keynote. I imagine this set will grow over the coming weeks.

Is This a Scala Blog or a Spring Blog?

I copped a little bit of flak recently due to all these Spring posts appearing on Planet Scala, an aggregation site for Scala blogs. The complainer's main issue was that I was spamming Planet Scala with "posts unrelated to Scala" and I think the core point of my response is worth repeating here.

One of the main reasons for Scala's popularity is its tight integration with Java. This integration allows Scala programs "access to thousands of existing high-quality libraries". It says so on the Scala website. Of all these "thousands of libraries", the Spring Framework is without a doubt the most popular enterprise framework for the JVM.

It's obviously not the case that everyone with an interest in Scala will also be interested in Spring. In fact, I suspect there is a higher percentage of programmers in the Scala community who are utterly uninterested in enterprise web applications than there would be in the Java community. However, I believe that anyone applying Scala in an enterprise context, or even thinking about applying it there, is either pretty interested in Spring or, if they're not, probably should be, at least to the extent of knowing what it does and where it's headed. Ergo, Spring posts on my Scala blog. I hope some of you have enjoyed the information, and I apologise to those that it may have annoyed.

Saturday, November 13, 2010

SpringOne 2010: GemFire SQLFabric - "NoSQL database" scalability using SQL

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

GemFire SQLFabric - "NoSQL database" scalability using SQL
presented by Jags Ramnarayan and Gideon Low

Jags started by proposing that the ACID semantics, which basically aim to ensure every change to a database is written to disk, are limited in the performance they can achieve due to being essentially I/O-bound. He also suggested the desire for these semantics is rooted in history when storage hardware and networks were slow and unreliable. Though he didn't say it explicitly, I think he was suggesting that this is no longer the case, with the implication being that we no longer need ACID.

He outlined how the "NoSQL" movement actually has nothing to do with SQL at all - i.e. the query language - but with the data structures - relational databases - that have typically been the subject of these queries. His point: You don't have to reinvent data querying just because you're re-inventing the data structure.

That, of course, led into the design of GemFire SQLFabric, the data management technology that was acquired by SpringSource/VMware in May 2010. Jags said that, from an application developer's perspective, using GemFire SQLFabric is mostly identical to using other databases, just with a different JDBC URL and some custom DDL extensions.

I didn't jot down Jags' description of GemFire, but here is the spiel from the front page:

GemFire Enterprise is in-memory distributed data management platform that pools memory (and CPU, network and optionally local disk) across multiple processes to manage application objects and behavior

Jags outlined a bit of the structure of a GemFire cluster and then said that, because of the structure, it didn't give great performance for key-based access, or for joins, which left me wondering what it was good for! (Edit: Turns out I misheard him. The performance is better for joins compared to other object data grids.) I think he clarified the join part later, though, when he discussed that data needing to be joined in a query must be co-located on a single node.

The GemFire table creation DDL provides extensions for specifying how a table is to be replicated or partitioned, and how much redundancy should back the partitions. These extensions allow the DBA to ensure that data to be queried together is co-located.

If no partitioning or replication options are specified in the table DDL, GemFire will make decisions about default options for these based on the relationships (i.e. foreign keys) apparent in the table definition.
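
To give a flavour of what those extensions look like, here is a rough sketch based on my later reading of the SQLFabric documentation - treat the exact keywords as indicative rather than authoritative:

```sql
-- Replicate a small reference table in full to every node in the cluster
CREATE TABLE countries (
    code CHAR(2) PRIMARY KEY,
    name VARCHAR(64)
) REPLICATE;

-- Partition orders by customer, keep one redundant copy of each partition,
-- and co-locate each customer's orders with their row in the customers table
-- so that joins between the two can be satisfied on a single node
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    total       DECIMAL(10,2)
) PARTITION BY COLUMN (customer_id)
  COLOCATE WITH (customers)
  REDUNDANCY 1;
```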

He said that GemFire uses the JDBC drivers and query planning and optimisation code from Apache Derby.

While talking about joins, Jags mentioned that co-location of joined data is required in order to achieve linear scaling. He mentioned that co-location of data is only currently a restriction of GemFire, implying that they intend to remove this restriction, though he didn't mention whether they would be tackling the linear scaling problem when they do this.

He talked about the way in which many of the design decisions they make in GemFire are focussed on making the absolute minimum number of disk seeks. I think that's hard-core stuff! I've been coding commercially for over ten years now and I've never once thought about how many disk seeks my code is causing.

Gideon showed some of the networking that occurs to make GemFire work and discussed how there is a central component called a 'Locator' that the cluster nodes use to find each other and which also performs load balancing of client requests. Strangely, this seemed like a classic single-point-of-failure to me, but there was no discussion about that problem.

I came away not really being sure what GemFire could be used for. Jags' comments about ACID at the start seemed to suggest that he thinks we are over-obsessed with reliability in the modern age. However, in my finance day job, we need to be 100% certain that pretty much every piece of data we push to the databases is stored for good when the transaction ends. Even 99.9999% (yes, six nines!) is not good enough: if 1 in every million DB transactions goes missing, money will go missing and we'll have a very angry customer on the phone. Unfortunately, they didn't cover during the talk how (or whether) GemFire handles reliability requirements like these.

Having said all that, however, I noticed that GemStone have an essay on their site called "The Hardest Problems in Data Management", in which they discuss the demanding needs of financial applications and suggest that while the popular "eventually consistent" distributed databases do not measure up to these demands, their implementation does. Having read a few paragraphs, they certainly seem to know what they're talking about from a theory perspective. If you're seriously looking for a solution in this space, I would suggest you have a good read of their documentation rather than just relying on my scratchy notes here.

Want to learn more?

From Amazon...

From Book Depository...

SpringOne 2010: Extending Spring Integration

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Extending Spring Integration
presented by Josh Long (@starbuxman) and Oleg Zhurakousky (@z_oleg)

One of the most interesting things these guys said is that integration, as an industry need, is extending beyond the specialty of systems integration and toward customer integration: that is, getting systems to integrate with the users, or the users' devices, rather than just neighbouring systems.

Though I've been to two other talks that mentioned or demonstrated Spring Integration (SI), this is the first one I'd attended that focused on it exclusively. Hence, this was the first place where I received the simple definition of SI that I've been searching for. The answer was that, at its core, Spring Integration is just an embedded message bus.

The guys went on to describe the basics of enterprise integration patterns and of Spring Integration in particular...

Channels are the most important concept in SI. These can be point-to-point or publish-subscribe and can also be synchronous or asynchronous. Enterprise Integration involves lots of different types of components for connecting and processing the messages travelling through Channels: Transformers, Filters, Routers, Splitters, Aggregators, etc. All these are not Spring-invented terms but are commonly accepted integration patterns.
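
To make those terms a bit more concrete, here's a deliberately Spring-free sketch in plain Java of a transformer and a filter acting on messages flowing between two point-to-point channels. All the names here are mine, not SI's API - in a real SI app these pieces would be declared in configuration, not hand-coded:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ChannelSketch {
    // A point-to-point channel: each message is consumed by exactly one receiver.
    static class Channel {
        private final Queue<String> queue = new ArrayDeque<>();
        void send(String payload) { queue.add(payload); }
        String receive() { return queue.poll(); }   // null when empty
    }

    public static void main(String[] args) {
        Channel in = new Channel();
        Channel out = new Channel();
        in.send("  hello  ");
        in.send("  spam  ");

        String payload;
        while ((payload = in.receive()) != null) {
            String transformed = payload.trim();   // Transformer: normalise the payload
            if (!transformed.equals("spam")) {     // Filter: drop unwanted messages
                out.send(transformed);             // a Router would choose between channels here
            }
        }
        System.out.println(out.receive()); // prints: hello
    }
}
```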

There is a Spring Roo add-on for Spring Integration currently in development. There are no milestones yet but apparently it is currently usable. (It's available from the SpringSource Git repository under spring-integration/roo-addon)

It seemed very simple from a quick demo to create a persistent queue with just a few lines of XML. This is the kind of stuff that I'm very interested in. Unfortunately, there was no discussion about transaction coordination between the queues and other data sources.

A 'Service Activator', which is typically the end point of a message chain, can simply be any POJO that defines a method that accepts the payload of messages, but is itself completely message- and SI-agnostic. In other words, you can use SI to deliver messages to your existing service-layer components without any change to their code. Pretty neat.
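
For example, an ordinary service class like this one (my own invented example, not from the talk) could be wired up as a Service Activator, with the SI configuration simply pointing a channel at its method:

```java
// An ordinary service-layer POJO: note there are no Spring Integration
// imports at all - the class knows nothing about messages or channels.
public class OrderService {
    // SI could be configured to invoke this method with each message's payload.
    public String placeOrder(String orderId) {
        return "Order " + orderId + " accepted";
    }

    public static void main(String[] args) {
        // Invoked directly here; in an SI app the framework would make this
        // call when a message arrives on the configured input channel.
        System.out.println(new OrderService().placeOrder("42"));
    }
}
```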

In response to a question, it was said that you could use a Service Activator to implement filters, transformers, etc., by holding a reference to the next channel in the chain, however this would bring integration concerns into the domain of your business logic.

They explained how the whole point of Spring Integration is to separate this logic away from your business logic, so your business code just deals with payloads and handles business-ey stuff, while your integration components - be they out-of-the-box ones or custom extensions - just deal with integration concerns and know little about the business function (other than routing). In a nutshell: The Spring team believe integration should be a configuration concern, not a business logic concern. This was a bit of a lightbulb moment for me.

There is a Spring Integration Samples project showing lots of different ways to use SI and they have a JIRA where you can request the addition of new samples! Good stuff.

After the talk ended, I wanted to ask a question about transaction coordination between Service Activators and message queues, but unfortunately, there were about 20 other people that wanted to ask questions as well so I couldn't get a word in.

I almost forgot that this talk was about extending Spring Integration. I'm sorry to say that I didn't write too much down about this because creating your own integration components is just so easy that it's not really worth me re-documenting it. They did talk a little bit about rolling your own inbound and outbound channel adapters at either end of a channel in order to integrate with systems that the SI components don't yet support, but this stuff was a little over my head being an SI newbie.

One thing that I'm still keen to know but which I haven't been able to glean so far is exactly how transactional message queues like RabbitMQ fit into the picture. Are they installed as some special implementation of a Channel? Or is a message broker just another SI-agnostic endpoint? (Perhaps using channel adapters?) Or can you use either approach, depending on your circumstances (e.g. single-process system with persistence vs. distributed system)? If you have any experience in this area, any comments you're able to add to reveal some more depth would be greatly appreciated.

If you're interested in watching the video of this talk, it's been published on InfoQ (1 hr 22 min)


Friday, November 12, 2010

SpringOne 2010: Gradle - A Better Way to Build

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Gradle - A Better Way to Build
presented by Hans Dockter

Note that any references to the Gradle User Guide from this blog entry are to version 0.8, as that was the most recent GA version at the time of writing. If you are using a later version of Gradle, you might want to check the latest User Guide instead.

Hans is the creator and lead developer of the Gradle build system. He’s also the CEO of Gradle Inc., a consulting company specialising in project automation.

First up, Hans explained that Gradle is actually mostly written in Java, with only the DSL being written in Groovy.

Gradle supports building Java, Scala, Groovy, Web projects, etc.

It appeared very easy to add a Maven dependency into the classpath. (Probably easier than in Maven!)

He demoed an example of developing a small, custom build operation that copied an image (of Rod Johnson) into the META-INF/ directory of any output JARs that had 'spring' in the name. While this was possible with only a few lines of code, I have to say that the code wasn’t all that intuitive and I think you would have to know quite a lot about the way Gradle structures things before you can easily write tasks like this yourself.

There’s a very easy method for importing Gradle tasks from external sources by specifying not much more than a URL.

Hans said that, in comparing Gradle to Maven and Ant, he believes the use of Groovy in Gradle instead of XML is not the main differentiator, but its design is where the gains come from.

He gave a pretty impressive demo of how Gradle detects changes in sources, tests, classpaths and even the Gradle script itself, and then only rebuilds those parts of the project that absolutely need to be built (and, of course, the things that depend on them).

While Gradle can be used to build projects that aren’t written in Groovy, it was my observation that you probably need to have a fair level of proficiency in Groovy in order to compose anything other than a basic build script.

It’s pretty easy to separate the declarative parts of a custom task from the imperative parts by defining a class at the bottom of the script. (I found it interesting that, even though the syntax of a Gradle file is mostly declarative, Hans was still referring to the file as a 'script'.)

Gradle has deep integration with Ant, and it's bi-directional, i.e. Ant tasks can depend on Gradle tasks.

Hans highlighted that Maven’s defaults (which he called a “strong opinion”) for things like project directory structure are one reason that people will avoid migrating existing projects to Maven. While this might be true, I think it’s based on a misconception - in reality, it’s quite trivial to override the defaults of things like directory locations in a Maven POM.

Gradle uses coloured output, which I think is pretty cool for a Java-based build tool.

Hans noted that the flexibility of Groovy means that creating custom operations in a build script is pretty easy. Having seen smart people spend a week or two writing a pretty simple custom plugin for Maven, I think making the customisation process easier is definitely a win. (On the other hand, in four years of using Maven, we've only ever created one Mojo. We have, though, often used the Ant plugin to "break out of the box".)

Hans gave a demonstration of dependencies between sub-projects of a larger project which was pretty impressive, showing Gradle building and testing both dependency and dependant projects based on what changes had occurred.

He talked about how performance is a big focus of Gradle, in particular getting the dependency and change behaviour right to ensure that Gradle only ever builds things that have been affected by changes.

Gradle is able to generate a pom.xml file for a project and deploy an artefact to a Maven repository along with the generated POM.

Gradle doesn’t support (out of the box) a ‘release’ action like Maven’s Release Plugin (which creates branches and tags in the code repository, deploys build artefacts to a remote repository and automatically updates version numbers in the build files on head/trunk to the next snapshot). However, Hans said that they eventually want to develop a full deployment pipeline based on Gradle, which will be one of the focus points after version 1.0 has been released.


Thursday, November 11, 2010

SpringOne 2010: Harnessing the Power of HTML5

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Harnessing the Power of HTML5
presented by Scott Andrews (@scothis) and Jeremy Grelle (@jeremyg484)

First up, the guys cleared up some confusion by explaining that the term HTML5 is currently being used to encompass much more than just the latest W3C HTML spec, but also all of the related sets of technologies that are now being standardised in an attempt to ease the creation of dynamic webapps. It really means HTML 5 + CSS 3 + new JavaScript APIs like WebSockets and WebWorkers.

They demonstrated a lot of the upcoming technologies through the rest of the talk by showing a presentation from html5rocks.com, a site Google has put together to promote the stuff.

Web storage (which, ironically, means storing data on the client) is coming. Local storage – a basic key/value store – is pretty much standardised. Another idea for a client-side database is still being discussed, but has been implemented in WebKit and hence is available on many mobile devices. Note that the data in this database would, like cookies, not be accessible by scripts from other sites, but would be unsecured on the local machine. Likely uses are things like data storage for offline access of cloud data. (That's cloud computing data, not meteorological readings.)

There is now an application cache manifest that allows the server to instruct the browser about cacheable resources and allows the browser to skip checking the last modified date/time of these resources as long as the date/time of the manifest hasn’t changed.

Web Workers allow JavaScript code to execute something in a separate process. They mentioned that this is specifically a process and not a thread, the implication being that the spawned process doesn’t have access to the memory of the parent.

Web Sockets are being billed as the new XHR and the new Comet.

Javascript Notifications will allow the application to request the browser to show a Growl-style status popup, but only if the user has already given permission for the site to show these.

Drag and drop is being supported at the browser level.

There is a proposal to provide access to a user’s current latitude and longitude.

Scott and Jeremy recommended that people not try to use these new features directly, but continue to use the various existing JavaScript libraries that already do the same thing, with the hope being that these libraries will, over time, be upgraded to use (and abstract) the native support where it is available (assuming the native support will be better than the current methods).

There are new semantic tags in HTML 5 for <header>, <nav>, <section>, <article>, <aside>

HTML 5 includes new link/rel types, including ‘pingback’, which will allow a page to provide a URL for the browser to call if the user leaves the site, and ‘prefetch’, which specifies other pages for the browser to load into the cache in anticipation of where the user might go next.

There are new HTML form input types that restrict the type of data that can be entered as well as providing some visual, optimisation and input hints, which is especially useful on mobile devices. New input types include date, email, range, search, tel, color and number.

There is also the ability to provide basic client-side validation (without JavaScript) by specifying constraints such as required or a regular expression pattern. A new CSS :invalid pseudo-class can be styled to change the appearance of invalid fields declaratively.

There are new tags for defining ‘meter’ and ‘progress’ elements (the latter for both indeterminate and % complete).

There are plans to support embedded audio and video in HTML 5 without the need for any plugins, although there are currently arguments going on about what codec should be the standard.

There’s a Canvas element for just drawing pretty much anything using Java2D-like shape, line and fill constructs. There is a pretty impressive example in the Google slides where photos can be dragged, rotated and scaled. JavaScript is required to change the contents of the canvas, but the API as a concept looks pretty neat.

There is support for JavaScript events coming out of SVG objects using native HTML tags.

CSS 2 selectors like ‘first-child’ are being more widely supported and new ones like ‘not’ are coming in as well.

CSS is introducing font downloading, allowing everyone visiting your site to see the same font without having to use image replacement. (As someone who's had to do a bit of this in the past, I have to say 'yay!')

There are STILL experiments with having CSS manage column-based layouts (link from 2005!), although this is only implemented in WebKit at the moment.

CSS3 will support HSL (Hue/Saturation/Luminance) colour definitions.

border-radius for rounded corners is becoming standard, except in IE 9!!! Says the speaker: “So IE will be square.” (I've since tracked down an announcement on msdn.com that seems to suggest that border-radius will be in IE9.)

Gradients, shadows and animations are all getting some standardised support in CSS3.

It will be possible to intercept changes to the history (e.g. capture the back button) within an Ajax app and load data rather than allowing a whole-page refresh.

There is a library called Atmosphere that provides a server-side abstraction over Comet and WebSockets to allow these protocols to be handled on any Java web server (many of which currently support this stuff but through proprietary APIs).

They showed a pretty cool example where they were using Canvas and WebSockets on the client side with Spring Batch and Spring Integration on the server side to parse a large file, stream interesting data to the client through an Integration channel and visualise that data in the browser.

I've found this funky little website called caniuse.com that allows you to specify browser names and versions along with HTML 5 features that you would like to use and it will show you the level of support in each browser & version for that technology.


Saturday, November 6, 2010

SpringOne 2010: Concurrent and Distributed Applications with Spring

I spent the second-last week of October at SpringOne 2GX 2010 in Chicago and I thought some of you might get something useful out of my notes. These aren’t my complete reinterpretations of every slide, but just things I jotted down that I thought were interesting enough to remember or look into further.

Concurrent and Distributed Applications with Spring
presented by Dave Syer

My favourite quote from this talk, and possibly from the whole conference, is one which I want to take back to my workplace and put into practice with a vengeance:

Using single-threaded algorithms on 32-core machines is a waste of money

Dave also presented a really simple but useful definition of thread-safety:

Thread safety = properly managing concurrent access to shared, mutable state

Applying this definition, you can see there are three ways to tackle thread-safety: eliminate mutability, eliminate sharing, or eliminate concurrency. Eliminating concurrency is the core aim of mutexes and locks, e.g. synchronized blocks. Eliminating mutability is one of the chief design idioms of functional programming.
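
A minimal Java sketch of two of those options (my own example, not from the talk) - the same counter made safe first by eliminating concurrent access to it, then by eliminating the mutable state altogether:

```java
public class ThreadSafetySketch {
    // Option 1: eliminate concurrency - guard the shared mutable state with a lock
    // so only one thread can touch it at a time.
    static class SynchronizedCounter {
        private int count = 0;
        synchronized void increment() { count++; }
        synchronized int get() { return count; }
    }

    // Option 2: eliminate mutability - "updates" return new immutable objects,
    // so there is no state for two threads to corrupt.
    static final class ImmutableCounter {
        final int count;
        ImmutableCounter(int count) { this.count = count; }
        ImmutableCounter increment() { return new ImmutableCounter(count + 1); }
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter c = new SynchronizedCounter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) c.increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) c.increment(); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Without the synchronized keyword, lost updates would make this
        // total unpredictable; with it, we always get the full count.
        System.out.println(c.get()); // prints: 2000
    }
}
```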

On the topic of eliminating shared resources, Dave pointed towards the frequent use of ThreadLocal within the Spring Framework to associate unshared resources to individual threads. He made note of the potential for memory leaks with ThreadLocal, highlighting that, with the long-running threads in most servers, you have to ensure you clean each ThreadLocal up when you’re finished otherwise your data will hang around forever. (Sounds like going back to pairing new & delete!)
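
Here's a minimal sketch of the clean-up pattern he was describing (my own code, not Spring's): the remove() in the finally block is the part that prevents the per-thread value outliving the request on a long-running, pooled server thread.

```java
public class ThreadLocalCleanup {
    // One buffer per thread: unshared, so no synchronisation needed.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    static String handleRequest(String input) {
        StringBuilder buf = BUFFER.get();
        try {
            buf.append("processed:").append(input);
            return buf.toString();
        } finally {
            // On a pooled, long-lived server thread this value would otherwise
            // live as long as the thread does - i.e. effectively forever.
            BUFFER.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("a")); // prints: processed:a
        // Because we removed the value, the second call starts from a fresh
        // buffer rather than appending to the first request's data.
        System.out.println(handleRequest("b")); // prints: processed:b
    }
}
```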

Dave talked about a method on ExecutorService that I've never used before called invokeAny() that will execute every task in a given list of tasks concurrently (assuming a multi-threaded ExecutorService implementation) and return the result of the first one to complete. The remainder of the tasks are interrupted. I imagine where you might use this is if you have a situation where you have two or three different algorithms, each of which can outperform the other two for certain structures of data, but where the most efficient algorithm for a given individual input can't be (easily) determined before execution. So, on a many-multi-core machine, you have the option of just running all three against the same data, taking the result from the first algorithm to complete and killing the others.
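
A small sketch of invokeAny() doing exactly that - the three "algorithms" here are just sleeps of different lengths standing in for real work:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InvokeAnyDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);

        // Three competing "algorithms" racing over the same input; invokeAny
        // returns the result of the first to complete successfully and
        // interrupts the rest.
        List<Callable<String>> algorithms = List.of(
                () -> { Thread.sleep(1000); return "slow"; },
                () -> { Thread.sleep(10);   return "fast"; },
                () -> { Thread.sleep(500);  return "medium"; }
        );

        String winner = pool.invokeAny(algorithms);
        System.out.println(winner); // prints: fast
        pool.shutdownNow();
    }
}
```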

Dave briefly discussed an emerging (I think?) pattern for concurrency called Staged Event-Driven Architecture or SEDA.

He mentioned that Spring Integration 2.0 (RC1 released Oct 29) includes support for transactional, persistent message queues.

He highlighted the difference between a distributed Application (running the same binary on multiple nodes) and a distributed System (running related, communicating applications across multiple nodes). He said that it is wise to prefer loosely coupled messaging architectures for distributed Systems because of the likelihood of unsynchronised release cycles.
