domingo, 5 de febrero de 2017

The unfold function. Gentle introduction from examples

This post is about a higher-order function known by functional programmers under the name unfold or ana (from the word anamorphism), It is less famous than fold but still worth of consideration and pretty useful.

Most of the expositions I know introducing unfold are based on Haskell or on a Haskell-like language. To mention some seminal but readable papers:

There is also a brief and illuminating introduction in Hutton, G., Programming in Haskell, Cambridge University Press, 2016.

My intention is to present unfold under a Scheme-like syntax and at slower pace than in the referred papers. In particular, I'll use a domain-specific language tailored for teaching purposes from Racket. It is the language employed in the excellent book Felleisen, M. et. al., How to Design Programs. From this book, as well as from the amazing MOOC How To Code: Systematic Program Design I also borrowed the same systematic way of presentation.

All the code below can be run on DrRacket, the IDE for our language that is available here:

https://download.racket-lang.org/

Specifically, if you want to run the code, set the DrRacket language to 'Advanced Student Language'.

Additionally, write these two lines at the beginning of the code. They load a couple of libraries I will rely upon

(require racket/function)
(require racket/list)

Let us consider functions that produce a list from a (seed) value. So functions with this signature:

;; Y -> (listof X)

that is a function that takes one argument of type Y, and produces a list of elements of type X.

[Hopefully, the meaning of those signatures is self-explanatory. For more information look into the first chapter of HtDP/2e or take the How To Code MOOC.

Also, signatures here are just code comments for helping us design functions, but not checked by the compiler].

A function that resembles that signature is, say, copier, that creates a list of n copies of a string [see HtDP/2e section 9.3]. Note that this function contains an extra parameter, the string to be copied. So it doesn't follow the signature, but I begin with it because it is very easy to understand. We will look later what we can do with the extra parameter.

The signature of copier is

;; Natural String -> (listof String)

The implementation based on the template for Natural:

;; <template for Natural>
(define (fn-for-natural n)
  (cond [(zero? n) (...)]
        [else
         (... (fn-for-natural (sub1 n)))]))

is straightforward:

;; Natural String -> (listof String)
;; produce a list with n copies of s
(check-expect (copier 0 "hi") '())
(check-expect (copier 3 "hi") '("hi" "hi" "hi"))

(define (copier n s)
  (cond [(zero? n) '()]
        [else
         (cons s
               (copier (sub1 n) s))]))

Another typical function that constructs a list from some initial value is string-split, that produces a list of words in a given string.

For convenience we represent String as (listof 1String), where 1String is a String consisting of a single letter. [In the definition I'll use the list functions takef and dropf provided by racket/list. These are also known as take-while and drop-while in other languages.]

To design this function we rely on the template for lists:

#;
(define (fn-for-lox lox)
  (cond [(empty? lox) (...)]
        [else
         (... (first lox)
              (fn-for-lox (rest lox)))]))

;; (listof 1String) -> (listof (listof 1String))
;; produce the list of words in los
(check-expect (string-split '()) '())
(check-expect (string-split '(" ")) '(()))
(check-expect (string-split '("a")) '(("a")))
(check-expect (string-split '("a" "b" " " "c" " " "d" "e" "f"))
              '(("a" "b") ("c") ("d" "e" "f")))

(define (string-split los)
  (cond [(empty? los) '()]
        [else
         (cons (first-word los)
               (string-split (remove-first-word los)))]))

;; (listof 1String) -> (listof 1String)
;; produce the first word in los
(check-expect (first-word '()) '())
(check-expect (first-word '("a" "b" " ")) '("a" "b"))

(define (first-word los)
  (takef los not-whitespace?))

;; (listof 1String) -> (listof 1String)
;; remove from los its first word as any leading whitespaces before it
(check-expect (remove-first-word '()) '())
(check-expect (remove-first-word '("a" "b" " " "c" "d" " " "e"))
              '("c" "d" " " "e"))

(define (remove-first-word los)
  (trim-leading-whitespaces (dropf los not-whitespace?)))

;; (listof 1String) -> (listof 1String)
;; remove from los its leading whitespaces
(check-expect (trim-leading-whitespaces '()) '())
(check-expect (trim-leading-whitespaces '("a")) '("a"))
(check-expect (trim-leading-whitespaces '(" " " " "a")) '("a"))

(define (trim-leading-whitespaces los)
  (dropf los string-whitespace?))

;; 1String -> Boolean
;; determine whether given s is not a whitespace
(check-expect (not-whitespace? " ") #f)
(check-expect (not-whitespace? "a") #t)

(define not-whitespace? (compose not string-whitespace?))

What about the higher-order function map? It actually produces a list too, this time from a given list and some function over the type of its elements:

;; (X -> Y) (listof X) -> (listof Y)
;; produce the list (list (f x1) (f x2) ...) by applying
;; f to each element of lox
(check-expect (my-map sqr '()) '())
(check-expect (my-map sqr '(1 2 3 4)) '(1 4 9 16))

Again a function over a list that we can define easily:

(define (my-map f lox)
  (cond [(empty? lox) '()]
        [else
         (cons (f (first lox))
               (my-map f (rest lox)))]))

As a final example over a more intricate input type consider the function zip that takes a pair of lists and produces a list of pairs.

A natural recursive implementation of zip is as follows:

;; (listof X) (listof Y) -> (listof (list X Y))
;; produce (list (x1 y1) (x2 y2) ...) from given lists, if lists
;; have different length, excess elements of the longer list
;; are discarded.
(check-expect (zip0 '() '()) '())
(check-expect (zip0 '(1 2 3) '(3 4 5)) '((1 3) (2 4) (3 5)))
(check-expect (zip0 '(1 2) '(3)) '((1 3)))
(check-expect (zip0 '(1 2) '(3 4 5)) '((1 3) (2 4)))

(define (zip0 lox loy)
  (cond [(or (empty? lox) (empty? loy)) '()]
        [else
         (cons (list (first lox) (first loy))
               (zip0 (rest lox) (rest loy)))]))

More suitable for the argument's sake is to pass both lists wrapped in a single pair of type (list (listof X) (listof Y)):

;; (list (listof X) (listof Y)) -> (listof (list X Y))
;; produce (list (x1 y1) (x2 y2) ...) from given list pair ...
(check-expect (zip '(() ())) '())
(check-expect (zip '((1 2 3) (3 4 5))) '((1 3) (2 4) (3 5)))
(check-expect (zip '((1 2) (3))) '((1 3)))
(check-expect (zip '((1 2) (3 4 5))) '((1 3) (2 4)))

(define (zip lp)
  (cond [(ormap empty? lp) '()]
        [else
         (cons (map first lp)
               (zip (map rest lp)))]))

;; ---------------------------------

All of those functions share the same pattern.

They all have a predicate that produces the empty list if it is satisfied by the seed value, and a list constructor that applies some functions to the first and the rest of the constructed list.

Therefore we can abstract the pattern from the examples. [More about abstraction from examples in HtDP/2e Sect. 15].

#;
(define (copier ... ... ... n s)
  (cond [(...? n) '()]
        [else
         (cons (... s) ;-> s, what about n?
               (copier (... n) s))]))

#;
(define (string-split ... ... ... los)
  (cond [(...? los) '()]
        [else
         (cons (... los)
               (string-split (... los)))]))

#;
(define (my-map ... ... ... lox)
  (cond [(...? lox) '()]
        [else
         (cons (... lox)) ;-> (f ...)
               (my-map f (... lox))]))

#;
(define (zip ... ... ... lp)
  (cond [(...? lp) '()]
        [else
         (cons (... lp)
               (zip (... lp)))]))

Note that in order to preserve the same pattern on all templates I have made a few tweaks in the template of a couple of the above functions.

First, The seed value is missing in copier. There is an s instead of n. Also, there is no function call at the same place at which it appears in the rest of the templates.

Secondly, I have temporarily removed the f argument from my-map. Instead I put a comment to remind me that f will play, for sure, some role in the final design.

After filling the gaps we get this:

;; (Y -> Bool) (Y -> X) (Y -> Y) Y -> (listof X)
(define (unfold p? f g b)
  (cond [(p? b) '()]
        [else
         (cons (f b)
               (unfold p? f g (g b)))]))

This abstract function encapsulates a recursive pattern that is dual to fold. While fold de-structs a list (so the funny name by which it is known, cata-morphism), unfold cons-tructs a list (and, likewise, is called ana-morphism).

Now we can re-defined the examples with unfold, and amend the tweaks made above as required.

The easiest one is string-split:

;; (listof 1String) -> (listof (listof 1String))
;; produce the list of words in los
(check-expect (string-split2 '()) '())
(check-expect (string-split2 '(" ")) '(()))
(check-expect (string-split2 '("a")) '(("a")))
(check-expect (string-split2 '("a" "b" " " "c" " " "d" "e" "f"))
              '(("a" "b") ("c") ("d" "e" "f")))

(define (string-split2 los)
  (unfold empty? first-word remove-first-word los))

As for my-map the original f argument can be reinstated at the place occupied by the f of unfold as the initial member of a function composition with first:

;; (X -> Y) (listof X) -> (listof Y)
;; produce the list (list (f x1) (f x2) ...) by applying
;; f to each element of lox
(check-expect (my-map2 sqr '()) '())
(check-expect (my-map2 sqr '(1 2 3 4)) '(1 4 9 16))

(define (my-map2 f lox)
  (unfold empty? (compose f first) rest lox))

Regarding copier the only really diverging thing is the extra parameter, s. In other words, the unfold pattern takes a single seed parameter while copier takes two. One way to cope with this is to implement copier as a closure. Besides, there is no actual function call on the first term of the constructed list. In such cases the pattern obliges to supply some function. Typically, identity or, as here, const. [const is a function that produces the value of its body whatever the argument passed into it. It is provided by racket/function].

;; Natural String -> (listof String)
;; produce a list with n copies of s
(check-expect (copier2 0 "hi") '())
(check-expect (copier2 3 "hi") '("hi" "hi" "hi"))

(define (copier2 n0 s)
  (local [(define (recr n)
            (unfold zero? (const s) sub1 n))]
    (recr n0)))

zip doesn't entail anything special. We just pass the appropriate functions at due places. We can give them names, anonymous functions seem to be simpler, though.

;; (list (listof X) (listof Y)) -> (listof (list X Y))
;; produce (list (x1 y1) (x2 y2) ...) from given list pair ...
(check-expect (zip2 '(() ())) '())
(check-expect (zip2 '((1 2 3) (3 4 5))) '((1 3) (2 4) (3 5)))
(check-expect (zip2 '((1 2) (3))) '((1 3)))
(check-expect (zip2 '((1 2) (3 4 5))) '((1 3) (2 4)))

(define (zip2 lp)
  (unfold (lambda (xs) (ormap empty? xs))
          (lambda (xs) (map first xs))
          (lambda (xs) (map rest xs))
          lp))

Or in a more concise manner, taking advantage of curry, a library function in racket/function [For more info about currying: https://en.wikipedia.org/wiki/Currying]:

(define (zip3 lp)
  (unfold (curry ormap empty?)
          (curry map first)
          (curry map rest)
          lp))

In summary, whenever you find a function that constructs a list from some value, unfold might be a very useful and elegant abstraction.