read me

De-chunkifying Sequences in Clojure

Jan 22, 2010

At the first CAP CLUG meetup I gave a presentation about the features of the recent Clojure 1.1 and one of the topics that garnered interest was that of chunked sequences. For those of you not familiar with chunked sequences they can be summarized fairly easily. Clojure sequence functions are lazy, however with the release of Clojure 1.1 the granularity of this laziness was changed from a 1-at-a-time to a chunk-at-a-time model. In other words, instead of “walking” through a sequence one node at a time, chunked sequences provide a “windowed” viewed on sequences 32-elements wide. We can see this in action with the following code:

(def gimme #(do (print \.) %))

(take 1 (map gimme (range 32)))

We might expect that this little snippet would print (.0) because we’re only grabbing the first element, and if you’re running Clojure 1.0 that is exactly what you would see. However, in Clojure 1.1 the picture is a bit different:

;=> (................................0)

If you count the dots you’ll see exactly 32, which is what we would expect given my statement from the first paragraph. Expanding a bit further if we increase the argument to range to be 33 instead what do you expect we’d then see?

(take 1 (map gimme (range 33)))
;=> (................................0)

Yep, that’s 32 dots again. To move our chunky window to the right is as simple as asking for its first element by:

(take 1 (drop 32 (map gimme (range 64))))
;=> (................................................................32)

You be might starting to wonder about this. In fact, you might be worried that chunked sequences have squashed the entire point of lazy sequences and for small sequences you would be correct. However, the benefits of lazy sequences are really striking for those of cyclopean magnitudes — I’m talking larger than memory big. Chunked sequences in those cases in those cases are an incredible boon because not only do they make sequence functions more efficient overall, they still fulfill the promise of lazy sequences: avoiding full realization of interim results.

However, there are legitimate concerns about this chunked model and one such concern is that of desiring a 1-at-a-time model to avoid exploding computations. Without going into too much detail about that, a nice straw-man to use as a counterpoint against chunked sequences is that of building an infinite sequence of Mersenne primes. That is, implicit realization of the first 32 Mersenne primes through chunked sequences will be quite costly indeed.

Therefore, in order to combat such situations I present a simple function (useable only in the current “new” branch) to de-chunkify a lazy sequence and enforce the 1-at-a-time model:

(defn seq1 [#^clojure.lang.ISeq s]
  (reify clojure.lang.ISeq
    (first [] (.first s))
    (more [] (seq1 (.more s)))
    (next [] (let [sn (.next s)] (and sn (seq1 sn))))
    (seq [] (let [ss (.seq s)] (and ss (seq1 ss))))
    (count [] (.count s))
    (cons [o] (.cons s o))
    (empty [] (.empty s))
    (equiv [o] (.equiv s o))))

(take 1 (map gimme (seq1 (range 32))))
;=> (.0)

(take 1 (drop 32 (map gimme (seq1 (range 64)))))
;=> (.................................32)

And there you go! Now you can again safely generate your lazy, infinite sequence of Mersenne primes! The world rejoices.

Bear in mind, that the code for seq1 is in no way official and should not be used for production code. Clojure will one day provide an official version of the function above, but for now I simply took a rough sketch posted by Rich Hickey and made it work with the “master” branch as an exercise and to hopefully gain more insight into the whole matter of chunkiness in general. Hopefully, it can serve the same purposes for you.

-m

p.s. I’m writing a book about Clojure with Chris Houser. The matter of chunked sequences and many other Clojure topics will be covered. ^_^

Related posts:

  1. Clojure Golf episode 1 This is an attempt to have some fun with Clojure....
  2. Clojure Golf – Episode 2: Largest Prime Factor In the last episode of Clojure Golf we saw some...
  3. Understanding the Clojure -> macro On page 109 of Paul Graham’s ANSI Common Lisp he...
  4. Clojure’s :pre and :post One of the more exciting features of the upcoming Clojure...
  5. Joy of Clojure TOC Updates Chouser and I are working hard to get the material...

Related posts brought to you by Yet Another Related Posts Plugin.

6 Tweets 6 Comments

13 Comments, Comment or Ping

  1. RT @fogus: De-chunkifying Sequences in Clojure –> http://blog.fogus.me/2010/01/22/de-chunkifying-sequences-in-clojure/

    This comment was originally posted on Twitter

  2. De-chunkifying Sequences in Clojure – http://su.pr/2zWxSN

    This comment was originally posted on Twitter

  3. De-chunkifying Sequences in Clojure: http://bit.ly/7Kl6g2 Comments: http://bit.ly/6z8JAL

    This comment was originally posted on Twitter

  4. De-chunkifying Sequences in Clojure http://bit.ly/6NgN3U

    This comment was originally posted on Twitter

  5. Is there any way to de-chunkify sequences in an exponential way ("uncover" twice as many elements each time instead of a constant number)? Would this be useful?

    This comment was originally posted on Hacker News

  6. wouldn’t it be better to have the intuitive behaviour by default, and the space-optimised behaviour as a configurable option?

    This comment was originally posted on Hacker News

  7. Simon Hawkin

    As someone else remarked in Hackers News, I too wish the default would be 1-at-a-time, and the chunk-at-a-time optimization would be an option. Would be more clear, more intuitive, and more consistent with the rest of Clojure spirit.

  8. Thanks for clarifying chunked sequences. This is the clearest exposition of chunked sequences I have ever seen so far. yes, I have a MEAP copy of your book.

    Now, isn’t chunking and dechunking, an implementation detail? Users expect some behavior from lazy behavior, chunking changes it and that is an implementation detail. Similarly with seq1 (dechunking). Why is that leaked into the outside view of the language?

  9. HNews: De-chunkifying Sequences in Clojure http://bit.ly/6atbO6

    This comment was originally posted on Twitter

  10. Colin Curtin

    typo: in those cases in those cases

  11. >transparent unless you have a side-effectOnly if you believe the side effect of using the CPU doesn’t count.

    This comment was originally posted on Hacker News

  12. yes, i understand the technical tradeoff.my point is that telling your users they aren’t sensible quickly becomes tedious (and tends to annoy them).

    This comment was originally posted on Hacker News

  13. Rich hickey heard youhttp://clojure-log.n01se.net/date/2010-01-22.html#15:01

    This comment was originally posted on Hacker News

Reply to “De-chunkifying Sequences in Clojure”

Additional comments powered by BackType