Book Review: Programming Concurrency on the JVM


It’s been a long time since anybody wrote an in-depth book about concurrency programming for Java. To my knowledge, the last one dates back to 2006: the groundbreaking “Java Concurrency in Practice” by none other than Brian Goetz, Doug Lea, Joshua Bloch and a few other gentlemen. Even though it is not the subject of this review, that is a book every Java developer has to read someday; otherwise you can’t honestly claim to understand concurrency in Java. And read it again if you didn’t feel the blood drain from your face when thinking about the consequences of what you read for your code at work. 😉

That being said, I’ve just read this book hoping it would be the next concurrency manual for Java. It turned out to be interesting reading, but not as fundamental as the aforementioned book.

What this book isn’t

  • An extensive explanation of the post-2006 concurrency API in Java, more precisely the java.util.concurrent package. It does scratch the surface, though.
  • A Java-only book. Here Java is just one of the many JVM languages available today: Scala, Groovy, Clojure and JRuby are covered too, with Scala and Java as the main languages.
  • A reference book about concurrency. The book explains the principles and contains many code samples, but after reading it all you’ll still need to do more reading and experimenting before applying it to real applications.
  • An entry-level manual to concurrency. I would advise at least some Java programming experience before starting with this book.

What this book is

  • An introductory description of the tools the JVM offers for concurrency, and of the JVM memory model. It is the first book I have read that describes the memory barrier in understandable layman’s terms. JDK7’s Fork/Join is covered too, so there is something in it for Java enthusiasts as well.
  • An extensive guide to practical STM using Clojure, Akka and Multiverse. All for Java, Scala, Groovy, JRuby and Clojure.
  • An extensive guide to practical Actors using Akka. For the same JVM languages.
  • A practical guide to mixing both.

A positive note this book strikes is that Java is no longer alone on the JVM. Other languages have matured to a level where they might just be useful professionally. And because they all compile to JVM bytecode, they can potentially share their unique capabilities with each other. The clearest proof is that the main subject of this book, Akka, is written in Scala, and yet it can be used from your own favorite JVM language. Even if that isn’t Scala. Even if you don’t know Scala that well. Even if you’re only familiar with Java, for that matter.

Maybe this book is beneficial at least for that: even if some would argue Java has slowed down to a snail’s pace, we can still write “rabbit” code in another language and integrate it with Java. So personally, I don’t mind the Java controversy much any more. I won’t hide my personal preference for Scala and functional programming, and I’ll welcome the first occasion to demonstrate their usefulness for our profession, also to non-Scala developers. Although I won’t deny that Scala’s toolkit is still far from being as robust and complete as Java’s.

Designing for concurrency

The book starts with sections about concurrency: what it is, why you should care, and a few practical hints depending on what you want to run concurrently. Is it CPU-intensive? Then you had better not have more threads than processors. Is it I/O-intensive? Then you can have more threads, because some of them will be waiting for I/O to complete. And so on.
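To make that rule of thumb concrete, here is a small sketch of a commonly cited sizing heuristic: threads = cores / (1 - blocking coefficient), where the blocking coefficient is the fraction of its time a task spends waiting. The class and method names are my own, hypothetical ones, and the coefficient is something you have to estimate or measure yourself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper illustrating a common sizing heuristic:
//   threads = cores / (1 - blockingCoefficient)
// blockingCoefficient is the fraction of its time a task spends waiting:
// 0 for purely CPU-bound work, approaching 1 for mostly I/O-bound work.
class PoolSizing {
    static int poolSize(int cores, double blockingCoefficient) {
        if (cores < 1 || blockingCoefficient < 0.0 || blockingCoefficient >= 1.0) {
            throw new IllegalArgumentException("need cores >= 1 and 0 <= blockingCoefficient < 1");
        }
        // Round to the nearest integer to absorb floating-point noise (1 - 0.9 is not exactly 0.1).
        return (int) Math.round(cores / (1.0 - blockingCoefficient));
    }

    // Creates a fixed-size pool for the current machine.
    static ExecutorService newPool(double blockingCoefficient) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(poolSize(cores, blockingCoefficient));
    }
}
```

With 8 cores, a CPU-bound task (coefficient 0) gets 8 threads, while a task that waits 90% of the time gets about 80.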

Personally, I think it is a good, common-sense introduction to concurrency and to what you should take into account when using it. I haven’t seen many concurrency books doing this.

There are 3 solutions …

Once the basic principles are clear, the author names the evildoer: mutable state. These are the variables that can be changed by different threads, whether concurrently or not. Because of modern computer architectures, those threads might actually perceive different values for the very same variable! The problem is that the variable can be held in several different caches, and if you do nothing to “flush” those caches, different threads will see different values for it. That brings us to the first solution: synchronizing shared mutable state, which means using the JVM’s tools. The author doesn’t hide that this is an art of its own, because the JVM has no fail-safes: when you get it wrong, it will most likely fail silently. And therefore it’s a heck of a problem to debug.
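To give an idea of what “using the JVM’s tools” looks like, here is a minimal sketch (my own, not the book’s code) of two of them: a volatile flag, so that a write in one thread becomes visible to the others, and an AtomicLong, so that concurrent increments are not lost. VisibilityDemo and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

class VisibilityDemo {
    // volatile: a write by one thread is guaranteed to be visible to later reads in other threads.
    // Without it, the worker below could keep spinning on a stale value of `running`.
    private volatile boolean running = true;

    // AtomicLong: increments are atomic read-modify-write operations; volatile alone would not do that.
    private final AtomicLong counter = new AtomicLong();

    void work() {
        while (running) {
            counter.incrementAndGet();
        }
    }

    void stop() { running = false; }

    long count() { return counter.get(); }

    // Starts a worker, stops it from the calling thread and reports whether the worker terminated.
    static boolean workerStops() {
        VisibilityDemo demo = new VisibilityDemo();
        Thread worker = new Thread(demo::work);
        worker.setDaemon(true); // never keep the JVM alive, even if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);
            demo.stop();
            worker.join(5_000); // wait at most 5 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }
}
```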

A less-known solution is to make the variables immutable: apply shared immutable state. This avoids the dirty cache problem. In practice this means making them final. Unfortunately, there are no built-in checks in the JVM to make sure you only share immutable objects, so you might unknowingly share mutable state even when you really thought you had made everything final. So this is not as easy as it looks. Unless you rely on STM transaction managers, which won’t let you change anything unless you have explicitly started a transaction first. It’s very comparable to database transactions: either all changes occur or none do. Except that it all happens in memory.
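A minimal sketch of what shared immutable state can look like in plain Java, assuming a hypothetical DirectoryReport class: every field is final, there are no setters, and the list is copied defensively. That last step is the trap: a final field pointing to a mutable list the caller still holds is not immutable at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class: final class, final fields, no setters.
final class DirectoryReport {
    private final String path;
    private final long totalBytes;
    private final List<String> subdirectories;

    DirectoryReport(String path, long totalBytes, List<String> subdirectories) {
        this.path = path;
        this.totalBytes = totalBytes;
        // Defensive copy, wrapped read-only: later changes to the caller's list cannot leak in,
        // and callers cannot modify the list we hand out.
        this.subdirectories = Collections.unmodifiableList(new ArrayList<>(subdirectories));
    }

    String path() { return path; }
    long totalBytes() { return totalBytes; }
    List<String> subdirectories() { return subdirectories; }

    // "Modifying" returns a new object; this one never changes, so it can be shared freely.
    DirectoryReport withTotalBytes(long newTotal) {
        return new DirectoryReport(path, newTotal, subdirectories);
    }
}
```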

(Personally, I would advise making final everything that can be. It avoids needless synchronization problems, and it’s an easy way to state your intention: this should not be modified. Applied to method arguments, it prevents overwriting their values.)

The third way is the way of the Actor. Each actor has its own isolated state that no other thread can access; the only things shared are immutable message objects, usually sent as the payload of asynchronous messages. That’s the solution of isolated mutable state. If nobody else can access it, there’s no need for synchronization. Right? 🙂
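To illustrate isolated mutable state without any library, here is a hypothetical, very minimal “actor” in plain Java. It is not Akka and not how the book implements actors: a single-threaded executor plays the role of the mailbox, so the counter field is only ever touched by one thread and needs no locks. Messages are either fire-and-forget or answered asynchronously with an immutable value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Akka: an "actor" whose state is only touched by its own thread.
class CounterActor {
    // Isolated mutable state: only read or written by tasks running on `mailbox`.
    private long count = 0;

    // A single-threaded executor acts as the mailbox: messages are processed one at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Fire-and-forget message: returns immediately.
    void increment(long by) {
        mailbox.execute(() -> count += by);
    }

    // Request/reply message: the reply (an immutable Long) arrives asynchronously.
    CompletableFuture<Long> currentCount() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    void shutdown() {
        mailbox.shutdown();
    }
}
```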

Tackling Shared Mutable State

A practical use case is presented: using concurrency to compute the total size of a directory. This is obviously a recursive algorithm that can benefit from concurrency, by having different threads scan different subdirectories. But it also gives rise to the problem of shared mutable state when those threads report their results: all of them need to be aggregated into a single total, and that total is the shared mutable state.

The tools available in the JVM are shown in different versions of the use case: Thread pools (because creating threads is expensive), Fork/Join, Locks, Latches, …
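The directory-size use case maps naturally onto Fork/Join. The sketch below is my own, not the book’s code: each task returns its own partial sum, so the aggregation happens through return values instead of a shared mutable counter. DirectorySize is a hypothetical class name, and the sketch does not guard against symbolic links that point back up the tree.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: total size of a directory tree with Fork/Join (no symlink-cycle protection).
class DirectorySize extends RecursiveTask<Long> {
    private final File file;

    DirectorySize(File file) {
        this.file = file;
    }

    @Override
    protected Long compute() {
        if (file.isFile()) {
            return file.length();
        }
        File[] children = file.listFiles(); // null if not a directory or on an I/O error
        if (children == null) {
            return 0L;
        }
        long total = 0;
        List<DirectorySize> subtasks = new ArrayList<>();
        for (File child : children) {
            if (child.isFile()) {
                total += child.length();        // plain files: add directly
            } else {
                DirectorySize task = new DirectorySize(child);
                task.fork();                    // subdirectories: schedule asynchronously
                subtasks.add(task);
            }
        }
        // Each subtask returns its own partial sum; no shared mutable counter is needed.
        for (DirectorySize task : subtasks) {
            total += task.join();
        }
        return total;
    }

    static long totalSize(File root) {
        return ForkJoinPool.commonPool().invoke(new DirectorySize(root));
    }
}
```

Because subdirectories are forked as separate tasks, idle worker threads can steal queued subtasks from busy ones, which is where Fork/Join’s work stealing pays off.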

What I learned from this section:

  • Concurrent collections are better than synchronized collections.
  • Fork/Join in JDK7 uses work stealing, which results in better performance.
  • Use the Exchanger class to swap data between two threads. I didn’t know about that class.
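Since the Exchanger class was new to me, here is a minimal sketch of what it does: two threads meet at a synchronization point and swap objects. A classic use is double buffering, where a producer hands over a full buffer and gets an empty one back. BufferSwap is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

// Hypothetical sketch: a producer fills a buffer and swaps it for the consumer's empty one.
class BufferSwap {
    static List<Integer> swapOnce() {
        Exchanger<List<Integer>> exchanger = new Exchanger<>();

        Thread producer = new Thread(() -> {
            List<Integer> buffer = new ArrayList<>();
            for (int i = 1; i <= 3; i++) {
                buffer.add(i);
            }
            try {
                // Blocks until the consumer arrives, then hands over the full buffer
                // and receives the consumer's empty one in return (ignored here).
                exchanger.exchange(buffer);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            // Consumer side: offer an empty buffer, receive the producer's full one.
            List<Integer> full = exchanger.exchange(new ArrayList<>());
            producer.join();
            return full;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```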

Software Transactional Memory

The author starts with an in-depth introduction to STM, and proves in practical ways that it is no longer mere theory.

For us JVM developers, there are three ways:

  • The Clojure language, in which everything is implicitly immutable unless it is changed within a transaction. Even though this is a different language, we can borrow this behavior in other JVM languages, including Java.
  • The Multiverse library. Again, because it all belongs to the same JVM family, it’s available to other languages too.
  • The Akka library. With the same advantages.

The author finishes with a warning, though: it’s not a magic bullet. It’s fine if you don’t write frequently, but too many writes will hurt performance, exponentially with the number of concurrent transactions. So beware! 🙂

What I like about this approach is the built-in safety: you simply can’t do it wrong. The libraries are also shown to be easy to use, and the transaction manager retries failed transactions. Something we can only dream of for our ORM libraries. Maybe some day Hibernate will?
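The retry idea can be illustrated without any STM library. The sketch below is NOT STM: it only protects a single AtomicReference, whereas an STM coordinates many references in one transaction. But it shows the same optimistic pattern: read a consistent snapshot, compute a new immutable value, commit only if nobody else committed in the meantime, and retry otherwise. Balances and OptimisticTransfer are hypothetical names. The retry counter also hints at the author’s warning: the more concurrent writers, the more wasted retries.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of two balances that must always change together.
final class Balances {
    final long checking;
    final long savings;

    Balances(long checking, long savings) {
        this.checking = checking;
        this.savings = savings;
    }
}

// Hypothetical sketch of optimistic commit-or-retry on a single reference (not a real STM).
class OptimisticTransfer {
    private final AtomicReference<Balances> state;
    private final AtomicInteger retries = new AtomicInteger();

    OptimisticTransfer(long checking, long savings) {
        state = new AtomicReference<>(new Balances(checking, savings));
    }

    // Moves `amount` from checking to savings: both fields change together or not at all.
    void transfer(long amount) {
        while (true) {
            Balances current = state.get();                       // read a consistent snapshot
            if (current.checking < amount) {
                throw new IllegalStateException("insufficient funds");
            }
            Balances next = new Balances(current.checking - amount, current.savings + amount);
            if (state.compareAndSet(current, next)) {             // commit only if nobody else wrote meanwhile
                return;
            }
            retries.incrementAndGet();                            // conflict: another thread won, so retry
        }
    }

    Balances snapshot() { return state.get(); }

    int retries() { return retries.get(); }
}
```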

Actors

The second novel tool in the toolbox is actors. As in the STM section, the author starts with an in-depth introduction to what they are, followed by practical use cases of Akka in different languages, Java included. That turned out to be a breeze, really: just fire messages away using non-blocking methods, and occasionally wait for a response if that is ever needed.

What’s particularly impressive is the typed actor. It’s really an actor in disguise: it is used as a regular object, while under the covers it uses asynchronous message exchange. (I guess some bytecode weaving takes place there.) And if one machine is not enough, there are remote actors: different Akka instances communicate with each other over the network using immutable, Serializable messages.
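I don’t know how Akka actually implements typed actors, so take this only as an illustration of the idea: in plain Java, a JDK dynamic proxy (java.lang.reflect.Proxy) can turn every method call on an interface into a message processed on a single private thread. Greeter, GreeterImpl and TypedActors are hypothetical names, and exception unwrapping is left out for brevity.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

interface Greeter {
    void greet(String name);   // void: treated as fire-and-forget
    int greetedCount();        // non-void: treated as request/reply
}

class GreeterImpl implements Greeter {
    private int count = 0;     // only touched on the mailbox thread

    public void greet(String name) { count++; }

    public int greetedCount() { return count; }
}

// Hypothetical sketch of a "typed actor" facade built on a JDK dynamic proxy.
final class TypedActors {
    // Every call on the returned object is queued as a message on `mailbox`,
    // which should be single-threaded so that `target` is never accessed concurrently.
    @SuppressWarnings("unchecked")
    static <T> T typedActor(Class<T> iface, T target, ExecutorService mailbox) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                (proxy, method, args) -> {
                    Future<Object> reply = mailbox.submit(() -> method.invoke(target, args));
                    if (method.getReturnType() == void.class) {
                        return null;           // fire-and-forget: don't wait for the message
                    }
                    return reply.get();        // request/reply: block until the actor has answered
                });
    }
}
```

Calling greetedCount() blocks until the actor thread has processed all earlier messages, which is exactly the kind of frequent waiting the author warns against.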

Just as for STM, the author ends with a warning: actors are very efficient for “fire and forget” method invocation, and occasionally waiting for a reply is no problem either. But frequently waiting for replies is asking for trouble, and that’s not what they are meant for anyway. The Actor model is really about asynchronous data exchange.

Transactors

For those willing to push the envelope even further, there are transactors: actors running in the context of a transaction. This becomes interesting when more than one actor is involved, because in the end they all need to run in the same transaction, and that transaction must be rolled back if any of them fails.

Luckily, Akka has a solution for this too. And, as the book demonstrates, it’s no big deal to use it.

Personal Conclusion

After reading this book, I think some people might consider its content too “experimental” or “novel” to be immediately applicable in the Java/JVM enterprise world. I can’t judge that personally, because I have no practical experience with STM or Actors. But now that I know the principles and the practical approaches to implementing them, I’ll recognize the first occasion to at least give them a try. And if they perform as expected, I would certainly use them.

Unfortunately, I think it will still take some time before either solution becomes mainstream, simply because there’s no way to prove that it does its job. Classic ORM tools rely on robust, proven transaction managers whose effectiveness is no longer questioned. Even if all my tests prove STM works, how am I ever going to prove it will never fail in a production environment?

That, my dear Watson, is the question…

If I were to give it a score from 1 to 5, I would give it 3.5. Not 3, because it’s a good and informative book. But not 4 either, because this is the kind of book meant to make you aware of certain things. If you want to apply them, you’ll have more reading and experimenting to do.

Book References

Programming Concurrency on the JVM: Mastering Synchronization, STM, and Actors

by Venkat Subramaniam

280 pages, 2011-09-02

ISBN: 978-1934356760

Book’s homepage
http://pragprog.com/book/vspcon/programming-concurrency-on-the-jvm

Published on November 30, 2011 at 22:38