Monday, 25 July 2016

Reading R code. Introduction

One of the most important things I learned when I started my higher education was that, to know a subject deeply, one has to dive into the original sources as soon as possible rather than stick solely to secondary literature. Better to read Plato's and Kant's works in their original language than countless books about the Dialogues and the Critiques; better to analyze Mozart's and Beethoven's sonatas (as Charles Rosen does) than to skim a lot of general articles on Classical music.

When I was put in charge of developing a web site for my institution I had no programming background at all. After some discouraging attempts to learn from introductory books and tutorials, I decided to go to the sources and studied the entire HTML and CSS specifications directly. It was hard, but everything was clear at last [Remark: nowadays it would be overwhelming; the HTML and CSS specs are currently huge].

In learning a programming language, the next step after the initial exposure to syntax, concepts and techniques might well be to study the core libraries written by its main developers.

How could one understand a programming language better than by trying to read the code that its core developers have written?

I'm always surprised by the scarcity of commentaries on exemplary code in any language. There is something like a gap between introductory expositions and the source code itself. The latter is silently digested by the people who develop it or who build libraries on top of it, while beginners always live on the other, much narrower side: their friendly little universe of tutorials and simple recipes, kind enough, for sure, but maybe a bit tasteless [though great exceptions can always be found] once their time, the absolute beginner's time, has passed.

One reason for this lack might be that core libraries are very complex and abstract beasts, and understanding parts of them is just hopeless without a firm grasp of the whole architecture and design, something out of reach for anyone but experts.

This is not always the case, though. At least it is not the case for a good number of functions in the R base code. Many are written in R itself and are almost self-contained, in the sense that a first understanding of them doesn't depend on a complete acquaintance with the underlying architecture.

So I thought I would give this idea a try: pick some R functions and read them with the aim of understanding R better. The goal is educational, mainly self-educational. Along the way I'll try to make things easier for others with a bit less programming background than mine, hoping that in doing so I not only reinforce my understanding but also help others deepen their own.

I'm not an R expert. This means that while reading and trying to make sense of the official implementations I will probably make some more or less educated guesses, and I could (and surely will) make mistakes. So if someone more knowledgeable than I am (there should be many) reads this, please let me know so I can fix any error, misleading step, or gratuitous deduction.

The intended audience is people with a basic working understanding of R data structures and programming constructs, including conditionals, loops and function definitions. The only required tool is the R console. For the moment only one extension package will be used: testthat, for unit testing support. It can be installed as usual via:

> install.packages("testthat")

To use testthat in a simple way (not the best one for real projects but enough for our purposes now) proceed as follows:

  1. Save the function you want to test in a file.
  2. Create a new file in the same directory for testing that function. The name of this file should start with 'test'.
  3. Source the code of the function to be tested.
  4. Write tests.
  5. Run tests from the console.

An example of this workflow.

Save the following function in foo.R:

foo <- function(x) {
    # A plain if/else is enough for a single TRUE/FALSE condition.
    if (identical(x, 1)) "I'm 1" else "I'm not 1"
}

Create test_foo.R with this content:

source("foo.R")

test_that("foo is 1 or not 1", {
  expect_equal(foo(1), "I'm 1")
  expect_equal(foo(0), "I'm not 1")
  expect_equal(foo("hi"), "I'm not 1")
})

Load testthat in your session and run the tests from the console:

> library(testthat)
> test_file("test_foo.R")
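A side note on what these tests exercise: identical() compares type and length as well as value, so foo is stricter than the test names might suggest. The sketch below uses only base R; the extra inputs (1L, c(1, 1), TRUE) are my own additions, not part of the original tests, and foo is redefined here with a plain if/else (which behaves the same for these inputs) so that the block runs on its own.

```r
# foo is repeated here so the block is self-contained.
foo <- function(x) {
    if (identical(x, 1)) "I'm 1" else "I'm not 1"
}

print(foo(1))        # [1] "I'm 1"
print(foo(1L))       # [1] "I'm not 1"  (integer, not double)
print(foo(c(1, 1)))  # [1] "I'm not 1"  (length 2, not length 1)
print(foo(TRUE))     # [1] "I'm not 1"  (logical, not double)

# By contrast, == coerces its operands before comparing.
print(1L == 1)       # [1] TRUE
```

If this strictness is the intended behaviour, cases like these could be added to test_foo.R as further expect_equal() lines.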

Introducing unit testing from the very beginning may seem unnecessary. On the contrary, testing is of paramount importance, and writing a minimal set of tests first guides the implementation. Since we will probably write our own code here and there, it is crucial to be equipped with a tool that enables us to write those tests. This is just commonplace, fundamental practice whatever the programming language.

By the way, for a perfect introduction to programming fundamentals, where programming stands here for "programming well" rather than "just coding", read this book:

http://www.ccs.neu.edu/home/matthias/HtDP2e/

and/or take this course:

https://www.edx.org/xseries/how-code-systematic-program-design

Both are gems that no one should miss.

One last point about the R source code. Apart from interactively getting the code as usual by typing the function name, for instance:

> which

and its documentation:

> ?which

you can, if you like, download the complete source from

https://cran.r-project.org/sources.html
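Staying in the console for a moment, a function can also be taken apart piece by piece instead of being printed whole. The following is a minimal sketch using only base R functions (formals, body, deparse, is.primitive, environment); the variable names are mine and purely illustrative.

```r
# Argument names of which(), taken from its formal argument list.
arg_names <- names(formals(which))
print(arg_names)   # [1] "x"        "arr.ind"  "useNames"

# The body is an ordinary R object, so it can be stored and inspected.
which_body <- body(which)
print(is.call(which_body))   # [1] TRUE

# deparse() turns the function back into lines of text,
# which is handy for searching or counting lines.
src_lines <- deparse(which)
print(head(src_lines, 3))

# Functions written in R are closures; primitives such as sum()
# are implemented in C and have no R body to read.
print(is.primitive(which))   # [1] FALSE
print(is.primitive(sum))     # [1] TRUE

# The environment tells us where the function is defined.
print(environmentName(environment(which)))   # [1] "base"
```

Checking is.primitive() first is a quick way to tell whether a function can be read from the console at all or whether its code has to be looked up in the downloaded sources.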

Also, you can access the official current snapshot on Subversion if you know how to do so, or browse (or clone) an unofficial GitHub mirror. I'm aware of these two:
