read


read
or learn more

De-chunkifying Sequences in Clojure

Jan 22, 2010

note: code modified to work with Clojure v1.2 on 2010.Nov.27 thanks to a reminder by Colin Jones

At the first CAP CLUG meetup I gave a presentation about the features of the recent Clojure 1.1 and one of the topics that garnered interest was that of chunked sequences. For those of you not familiar with chunked sequences they can be summarized fairly easily. Clojure sequence functions are lazy, however with the release of Clojure 1.1 the granularity of this laziness was changed from a 1-at-a-time to a chunk-at-a-time model. In other words, instead of “walking” through a sequence one node at a time, chunked sequences provide a “windowed” viewed on sequences 32-elements wide. We can see this in action with the following code:

(def gimme #(do (print \.) %))

(take 1 (map gimme (range 32)))

We might expect that this little snippet would print (.0) because we’re only grabbing the first element, and if you’re running Clojure 1.0 that is exactly what you would see. However, in Clojure 1.1 the picture is a bit different:

;=> (................................0)

If you count the dots you’ll see exactly 32, which is what we would expect given my statement from the first paragraph. Expanding a bit further if we increase the argument to range to be 33 instead what do you expect we’d then see?

(take 1 (map gimme (range 33)))
;=> (................................0)

Yep, that’s 32 dots again. To move our chunky window to the right is as simple as asking for its first element by:

(take 1 (drop 32 (map gimme (range 64))))
;=> (................................................................32)

You be might starting to wonder about this. In fact, you might be worried that chunked sequences have squashed the entire point of lazy sequences and for small sequences you would be correct. However, the benefits of lazy sequences are really striking for those of cyclopean magnitudes — I’m talking larger than memory big. Chunked sequences in those cases in those cases are an incredible boon because not only do they make sequence functions more efficient overall, they still fulfill the promise of lazy sequences: avoiding full realization of interim results.

However, there are legitimate concerns about this chunked model and one such concern is that of desiring a 1-at-a-time model to avoid exploding computations. Without going into too much detail about that, a nice straw-man to use as a counterpoint against chunked sequences is that of building an infinite sequence of Mersenne primes. That is, implicit realization of the first 32 Mersenne primes through chunked sequences will be quite costly indeed.

Therefore, in order to combat such situations I present a simple function (useable only in the current “new” branch) to de-chunkify a lazy sequence and enforce the 1-at-a-time model:

(defn seq1 [#^clojure.lang.ISeq s]
  (reify clojure.lang.ISeq
    (first [_] (.first s))
    (more [_] (seq1 (.more s)))
    (next [_] (let [sn (.next s)] (and sn (seq1 sn))))
    (seq [_] (let [ss (.seq s)] (and ss (seq1 ss))))
    (count [_] (.count s))
    (cons [_ o] (.cons s o))
    (empty [_] (.empty s))
    (equiv [_ o] (.equiv s o))))

(take 1 (map gimme (seq1 (range 32))))
;=> (.0)

(take 1 (drop 32 (map gimme (seq1 (range 64)))))
;=> (.................................32)

And there you go! Now you can again safely generate your lazy, infinite sequence of Mersenne primes! The world rejoices.

Bear in mind, that the code for seq1 is in no way official and should not be used for production code. Clojure will one day provide an official version of the function above, but for now I simply took a rough sketch posted by Rich Hickey and made it work with the “master” branch as an exercise and to hopefully gain more insight into the whole matter of chunkiness in general. Hopefully, it can serve the same purposes for you.

-m

p.s. I’m writing a book about Clojure with Chris Houser. The matter of chunked sequences and many other Clojure topics will be covered. ^_^

6 Comments, Comment or Ping

  1. Simon Hawkin

    As someone else remarked in Hackers News, I too wish the default would be 1-at-a-time, and the chunk-at-a-time optimization would be an option. Would be more clear, more intuitive, and more consistent with the rest of Clojure spirit.

  2. Thanks for clarifying chunked sequences. This is the clearest exposition of chunked sequences I have ever seen so far. yes, I have a MEAP copy of your book.

    Now, isn’t chunking and dechunking, an implementation detail? Users expect some behavior from lazy behavior, chunking changes it and that is an implementation detail. Similarly with seq1 (dechunking). Why is that leaked into the outside view of the language?

  3. Colin Curtin

    typo: in those cases in those cases

  4. Looks like this needs a small tweak to work in Clojure 1.2, adding a this argument to each of the ISeq methods: https://gist.github.com/717761

  5. Thank you for the reminder Colin Jones. I used the _ char instead of this since the implicit object argument is not directly used while the closed over s is. I should go back through my old posts and update them for Clojure 1.2+.

    :F

  6. Jim Newton

    I’m not sure I understand the intend of using seq1. Don’t we lose the 1-at-a-time property as soon as we call for/map/mapcat ? What is the use model for mapping and filtering using the 1-at-a-time sequences?

Reply to “De-chunkifying Sequences in Clojure”