[*This post is the second one about the function Reduce. It assumes everything discussed in the *first post about Reduce]

The function `Reduce`

in R is based on the Lisp function `reduce`

. It also adds an option borrowed from Haskell's `scan`

functions.

Let's see this by exploring a few simple examples and observing their behavior in R as well as in Lisp and Haskell.

### What does `Reduce`

do?

```
## Example 1.
# R
> Reduce(`+`, 1:4)
[1] 10
# Lisp
[1]> (reduce #'+ '(1 2 3 4))
10
# Haskell
Prelude> foldl (+) 1 [2, 3, 4]
10
```

Look at the Haskell function. I have used `foldl`

, that is the `fold-left`

function in Haskell, because R and Lisp do `fold-left`

by default. Note also that while in Haskell the initial value has to be specified, in R and in Lisp we don't need to do so. When not specified the initial value for left-folding is the first element in the given list.

The evaluation of all these functions (`fold-left`

) expands as follows for the given example:

```
(((1 + 2) + 3) + 4)
## Example 2.
# R
> Reduce(`-`, 1:4)
[1] -8
# Lisp
[1]> (reduce #'- '(1 2 3 4))
-8
# Haskell
Prelude> foldl (-) 1 [2, 3, 4]
-8
```

The expansion is the same:

```
(((1 - 2) - 3) - 4)
## Example 3.
# R
> Reduce(`-`, 1:4, right = TRUE)
[1] -2
# Lisp
[1]> (reduce #'- '(1 2 3 4) :from-end t)
-2
# Haskell
Prelude> foldr (-) 4 [1, 2, 3]
-2
```

The evaluation expands to this now:

`(1 - (2 - (3 - 4)))`

This is `fold-right`

. In R and Lisp we have to set an option for this fold, and at this point it should be clear why these two functions are described as "left" and "right" folds. Note also that, for `fold-right`

the initial value, if given (as in Haskell) become the right-most item in the expansion. In fact, in R and Lisp if the initial value is not given, it is the last element of the given list.

R provides an extra feature borrowed from Haskell for producing a sequence of results of accumulated values rather than the final reduction.

In Haskell, there is a special set of functions for this purpose, `scan`

functions, `scanr`

corresponds to `foldr`

and `scanl`

to `foldl`

:

```
Prelude> scanl (-) 1 [2, 3, 4]
[1,-1,-4,-8]
```

Observe the partial results in the computations above:

```
initial value: = 1
1st result : 1 - 2 = -1
2nd result : -1 - 3 = -4
3rd result : -4 - 4 = -8
```

Or, for a right fold:

```
Prelude> scanr (-) 4 [1, 2, 3]
[-2,3,-1,4]
```

where partial results from right to left and in reverse order are:

```
initial value: = 4
1st result : 3 - 4 = -1
2nd result : 2 - (-1) = 3
3rd result : 1 - 3 = -2
```

The R equivalents need the `accumulate`

parameter set to `TRUE`

:

```
Reduce("-", 1:4, accumulate = TRUE)
> [1] 1 -1 -4 -8
```

and

```
Reduce("-", 1:4, right = TRUE, accumulate = TRUE)
> [1] -2 3 -1 4
```

These functions are just variants of the tail-recursive pattern where a second accumulator is added to keep track of computed values so far.

So for instance `scanl`

can be implemented as follows:

```
(define (scan-left f i lox0)
(local [(define (f-for-lox-acc acc rsf lox)
(cond [(empty? lox) (reverse (cons acc rsf))]
[else
(f-for-lox-acc (f acc (first lox))
(cons acc rsf)
(rest lox))]))]
(f-for-lox-acc i empty lox0)))
```

Only and extra accumulator `rsf`

has been added. It maintains a list of resulting values so far. Each time a new computed value is produced is inserted into the `rsf`

accumulator. The result is finally reversed. It is possible to avoid the final `reverse`

if computed values are appended to the end of `rsf`

. Anyway it is clear the idea behind this.

### R implementation of `Reduce`

As we have seen, `Reduce`

in R is a multipurpose function. Other languages as Haskell prefer independent elementary functions for each task (right/left fold and returning accumulated values or not). I find the latter more elegant, easier to read and easier to test. R, though, is fond of functions with lots of parameters, and full of conditionals to select the code to execute for a specific purpose.

Even so, and with the goal of firstly implementing our own version for better understanding the R official version, it is reasonable to begin with basic use cases and add more after those have been implemented.

One of the simplest use case is to execute `Reduce`

for left fold with a given initial value. As usual, we need a very basic set of tests. (more should be added for good coverage). The following are hopefully enough at this stage:

```
test_that("test reduce: fold-left, init given", {
expect_equal(my_reduce("+", integer(0), init = 1), 1)
expect_equal(my_reduce("-", 2:4, init = 1), -8)
expect_equal(my_reduce("/", c(2, 9, 13), init = 7), 7/234)
expect_equal(my_reduce(list, 2:4, init = 1), (list(list(list(1, 2), 3), 4)))
})
```

The most natural implementation at this point is the translation into R of the `fold-left`

tail recursive procedure (first variant), as this:

```
my_reduce <-
function(f, x, init) {
f <- match.fun(f)
iter <- function(acc, x) {
len <- length(x)
if (!len) {
acc
}
else {
first <- x[[1L]]
rest <- tail(x, -1L)
iter(f(acc, first), rest)
}
}
iter(init, x)
}
```

Assuming that test cases are saved into the file `test_my_reduce.R`

, running tests with:

```
> library(testthat)
> test_file("test_my_reduce.R")
```

confirms that this implementation works.

A few points about this code. The first line applies `match.fun`

as customary in R code to check the validity of the argument passed as `f`

. I have used `1L`

instead of just `1`

for indexing. This is a common practice in R base code: when an integer is required the number as a literal integer (the number followed by `L`

) is passed. Finally, I have intentionally named the auxiliary helper as `iter`

for reasons that will be clear. Any name would have worked though.

The next step is handling the case where no initial value is passed. Some test cases for this:

```
test_that("test reduce: fold-left, init missing", {
expect_equal(my_reduce("+", integer(0)), NULL)
expect_equal(my_reduce("-", 1:4), -8)
expect_equal(my_reduce("/", c(7, 2, 9, 13)), 7/234)
expect_equal(my_reduce(list, 1:4), (list(list(list(1, 2), 3), 4)))
})
```

The first test case is for the empty vector. As no initial value is given, we cannot return it as before. By convention we return `NULL`

in this case.

And this is an obvious implementation that passes all tests above [only the first lines are added, the `iter`

function remains the same]:

```
my_reduce <-
function(f, x, init) {
f <- match.fun(f)
mis <- missing(init)
len <- length(x)
if (mis && !len) {
return(NULL)
}
if (mis) {
init <- x[[1L]]
x <- tail(x, -1L)
}
iter <- function(acc, x) {
len <- length(x)
if (!len) {
acc
}
else {
first <- x[[1L]]
rest <- tail(x, -1L)
iter(f(acc, first), rest)
}
}
iter(init, x)
}
```

What about the `fold-right`

operation?

Let's write some tests for fold-right with and without the initial value given:

```
test_that("test reduce: fold-right, init given", {
expect_equal(my_reduce("+", integer(0), init = 1, right = TRUE), 1)
expect_equal(my_reduce("-", 1:3, init = 4, right = TRUE), -2)
expect_equal(my_reduce("/", c(7, 2, 9), init = 13, right = TRUE), 63/26)
expect_equal(my_reduce(list, 1:3, init = 4, right = TRUE),
list(1, list(2, list(3, 4))))
})
test_that("test reduce: fold-right, init missing", {
expect_equal(my_reduce("+", integer(0), right = TRUE), NULL)
expect_equal(my_reduce("-", 1:4, right = TRUE), -2)
expect_equal(my_reduce("/", c(7, 2, 9, 13), right = TRUE), 63/26)
expect_equal(my_reduce(list, 1:4, right = TRUE),
list(1, list(2, list(3, 4))))
})
```

As for the implementation it looks like it would be a very different piece of code if we were to translate the natural recursive procedure examined in the post referred above. However, when designing functions that handle several possibilities we should seek after maximizing the amount of code shared by each of them. The best possible scenario for the function at hand would be one in which the tail-recursive pattern employed for `fold-left`

is shared by the `fold-right`

operation. It turns out that we already have a promising candidate, the second variant of `fold-left`

. Remember that it has the same signature as `fold-right`

, and it shouldn't be difficult to adapt it so that it serves as a tail-recursive version for `fold-right`

. A closer examination into the expansions of both procedures gives the answer. Let's recall them once again:

```
fold-right (-) 1 '(2 3 4) : 2 - (3 - (4 - 1))
fold-left-2 (-) 1 '(2 3 4) : 4 - (3 - (2 - 1))
```

The only difference lies in the order of elements in the given list. The function we are searching for (a tail-recursive `fold-right`

) should have the same expansion as `fold-right`

:

`fold-right-2 (-) ?? : 2 - (3 - (4 - 1))`

Which arguments should it receive? Obviously:

`fold-right-2 (-) 1 '(4 3 2)`

Hence, `fold-right-2`

is just `fold-left-2`

with the input list reversed, and, as you surely remember, the only remaining difference between our `fold-left`

, the one employed in R, `fold-left-1`

and `fold-left-2`

is the order of arguments passed to `f`

.

All of these points are captured by the following implementation [number of lines included]:

```
1 my_reduce <-
2 function(f, x, init, right = FALSE) {
3 iter <- function(acc, x) {
4 len <- length(x)
5 if (!len) {
6 acc
7 }
8 else {
9 first <- x[[1L]]
10 rest <- tail(x, -1L)
11
12 if (right) iter(f(first, acc), rest) else iter(f(acc, first), rest)
13 }
14 }
15
16 f <- match.fun(f)
17 mis <- missing(init)
18 len <- length(x)
19
20 if (mis && !len) {
21 return(NULL)
22 }
23
24 if (mis) {
25 if (right) {
26 init <- x[[len]]
27 x <- rev(head(x, -1L))
28 }
29 else {
30 init <- x[[1L]]
31 x <- tail(x, -1L)
32 }
33 }
34 else {
35 if (right) {
36 x <- rev(x)
37 }
38 }
39
40 iter(init, x)
41 }
```

Changes with respect to the previous implementation are:

- The nested function
`iter`

has been modified so that the recursive call passes parameters to`f`

in the order suitable to the requested kind of fold [line 12]. - The handling of options for the arguments
`init`

and`right`

has been appropriately extended [lines 24-38]. Note, in particular, that when right is`TRUE`

we need to reverse the sequence, as explained. The R function`rev`

does the job [lines 27, 36].

This implementation passes our current test set. Nice!

Do you think that this function is getting long? If so you are with me. As I said before I prefer shorter functions with as few parameters as possible and independent helpers. But R base code is plenty of this kind of monolithic functions rich in parameters and conditional branches.

[To be continued]

## No hay comentarios:

## Publicar un comentario