Simple, Correct, Fast

Topics: #R#spatial

The single most important quality in a piece of software is simplicity. It’s more important than doing the task you set out to achieve. It’s more important than performance. The reason is straightforward: if your solution is not simple, it will not be correct or fast. Drew DeVault

I've fallen victim to my own cleverness more times than I would like to admit. After reading Drew's post I've realized how often this happens to me. Let me show you an example.

Setup

The code below will get us setup to run the examples. I've also included some sf code to confirm the area of the polygon is indeed 1.

# calculate the area of a polygon, 
# in this case a square with an area of 1
x <- c(0, 1, 1, 0, 0)
y <- c(0, 0, 1, 1, 0)

# use sf to confirm area is indeed 1
sf::st_area(sf::st_polygon(list(cbind(x, y))))

# create vectors to traverse the sides of the polygon in
# r using the shoelace formula
x0 <- x[1:4]
x1 <- x[2:5]
y0 <- y[1:4]
y1 <- y[2:5]

The last few lines will be used to traverse the vertices of the polygon using the shoelace formula.

The first attempt

My first attempt at implementing the shoelace formula went as follows.

(Reduce(`+`, (x0 * y1)) - Reduce(`+`, (x1 * y0))) / 2
#> 1

The code does return the correct answer. I did account for vectorized operations in R. In other languages we would need to loop over the vector (x0, y1, ect...) in order to multiply them together. But is it simple? No it isn't.

I probably ended up at this implementation because I had just finished writing a lot of JavaScript. I tend to use reduce often in JavaScript, and in R Reduce is similar. I ended up with code that was half JavaScript inspired and half R inspired.

Encapsulation

Before moving on let's write a function to encapsulate our area function.

get_area <- function (polygon_matrix) {
  n <- nrow(polygon_matrix)
  
  x0 <- polygon_matrix[1:n - 1, 1]
  x1 <- polygon_matrix[2:n, 1]
  y0 <- polygon_matrix[1:n - 1, 2]
  y1 <- polygon_matrix[2:n, 2]

  (Reduce(`+`, (x0 * y1)) - Reduce(`+`, (x1 * y0))) / 2
}

Moving forward changes will be made to this function. And to be a little more rigorous, let's add some testing using the assertthat library. This will compare the results of sf::st_area to our get_area function. If assertthat::are_equal returns FALSE, our function isn't working properly.

polygon_matrix <- cbind(x, y)

# test
assertthat::are_equal(
  get_area(polygon_matrix),
  sf::st_area(sf::st_polygon(list(polygon_matrix)))
)

Attempt 2

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. Brian Kernighan

I was trying to be clever. I had found a perfect use case for the Reduce function. And not just once, but twice, in the same line of code! In fact, I was pleased with the cleverness of my code.

The hubris was short lived. In my cleverness I had overlooked the existance of the function sum. I couldn't believe how stupid I had been (my inner dialogue can be harsh).

So I rewrote the code using sum.

get_area <- function (polygon_matrix) {
  n <- nrow(polygon_matrix)
  
  x0 <- polygon_matrix[1:n - 1, 1]
  x1 <- polygon_matrix[2:n, 1]
  y0 <- polygon_matrix[1:n - 1, 2]
  y1 <- polygon_matrix[2:n, 2]

  (sum(x0 * y1) - sum(x1 * y0)) / 2
}

assertthat::are_equal(
  get_area(polygon_matrix),
  sf::st_area(sf::st_polygon(list(polygon_matrix)))
)
#> [1] TRUE

Indeed, it returns the correct answer. The main logic in the function is still one line of code. It is simple. But it can be simplified even more!

Attempt 3

Alas, it can be even simpler. And potentially faster. sum is used twice. sum is vectorized and I've made use of it in the function above. But I forgot about the order of operations, and the actual mathematical notation of the shoelace formula.

Back to the drawing board.

get_area <- function (polygon_matrix) {
  n <- nrow(polygon_matrix)
  
  x0 <- polygon_matrix[1:n - 1, 1]
  x1 <- polygon_matrix[2:n, 1]
  y0 <- polygon_matrix[1:n - 1, 2]
  y1 <- polygon_matrix[2:n, 2]

  sum(x0 * y1 - x1 * y0) / 2
}

assertthat::are_equal(
  get_area(polygon_matrix),
  sf::st_area(sf::st_polygon(list(polygon_matrix)))
)

Finally, correct and simple. And since there is only a single sum function call it should be faster. I've not performed any benchmarking to confirm this.

Summary

Just to be certain, let's try this code on a more complex polygon to make sure it works. The area of the polygon is 30. Let's see if the get_area function works properly.

x <- c(3, 5, 9, 12, 5, 3)
y <- c(4, 6, 5, 8, 11, 4)

polygon_matrix <- cbind(x, y)

assertthat::are_equal(
  get_area(polygon_matrix),
  sf::st_area(sf::st_polygon(list(polygon_matrix)))
)
#> [1] TRUE

The first iteration of the get_area function was complex. Two needless calls to a function that isn't very common (Reduce), using a syntax that isn't super familiar (using an operator within the back ticks).

The second iteration doesn't use Reduce, but two redundant instances of the sum function. Simpler, but not the simplest. The final implementation is indeed the simplest, and it's proven to be correct (and I think faster).

I think it is easy to get caught up in writing code that just solves the problem, or is fast, and forgetting to take the time to ask if it is simple. So slow down a little, and double check your work (I always hated hearing that phrase from my teachers). Please read the article I linked to above. Drew makes a great point, "If you solution is not simple, it will not be correct or fast".