<aside> 📦

First install and load these packages in R.

</aside>
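A minimal setup sketch. The list of packages below is inferred from the functions used in the exercises (`sf`, `terra`, `tmap`, `gstat`, `dismo`); install any that are missing before loading them.

```r
# Install once (uncomment if needed), then load in every session.
# install.packages(c("sf", "terra", "tmap", "gstat", "dismo"))
library(sf)     # vector data (simple features)
library(terra)  # rasters and Voronoi diagrams
library(tmap)   # thematic maps
library(gstat)  # IDW and nearest-neighbour interpolation
library(dismo)  # kfold() for cross-validation
```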

  1. Are you thinking of buying a house around Oxford? (Disclaimer: I am not a property agent and I am not selling one. I am just a poor academic complaining about the house prices.) This is one of the regions with the highest housing prices in the UK, so you might want to invest carefully. We will use the Price Paid Data (September 2025) to understand the pattern of housing prices and to interpolate in areas with no transaction information.

    <aside> 📥

    Download

    hp.csv

    LAD_DEC_24_UK_BGC.zip

    </aside>

    1. Load the hp data and convert it to an sf object.
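      A sketch for this step. The coordinate column names (`lon`, `lat`) and the CRS are assumptions; check `names(hp)` and the data documentation, and adjust.

      ```r
      # Read the price-paid records and convert to sf points.
      # Column names and CRS below are assumed, not confirmed.
      hp <- read.csv("hp.csv")
      hp <- st_as_sf(hp, coords = c("lon", "lat"), crs = 4326)
      ```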

    2. Load the LAD_DEC_24_UK_BGC data and subset it to the five Oxfordshire districts c("Oxford", "Cherwell", "South Oxfordshire", "Vale of White Horse", "West Oxfordshire") to create ox.
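      A sketch for this step. The district-name field `LAD24NM` is an assumption based on the usual ONS naming convention; check `names(lad)` if it differs.

      ```r
      # Read the local authority district boundaries and keep Oxfordshire.
      lad <- st_read("LAD_DEC_24_UK_BGC.shp")
      oxon <- c("Oxford", "Cherwell", "South Oxfordshire",
                "Vale of White Horse", "West Oxfordshire")
      ox <- lad[lad$LAD24NM %in% oxon, ]   # LAD24NM is an assumed field name
      ```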

    3. What is the CRS of ox?

      • Answer

      It is the famous British National Grid.

    4. Reproject ox to CRS:4326 using st_transform().
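      Steps 3 and 4 can be sketched as follows; `st_crs()` prints the CRS and `st_transform()` reprojects.

      ```r
      st_crs(ox)                    # EPSG:27700, the British National Grid
      ox <- st_transform(ox, 4326)  # reproject to WGS 84 (longitude/latitude)
      ```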

    5. Visualise these spatial data.

    6. Let’s try a new visualisation package, tmap. You can use either tmap_mode("plot") or tmap_mode("view") to create a static or an interactive map, respectively. This is an example:

      tmap_mode("plot")
      tm_shape(ox) + tm_polygons(alpha = 0.3) + tm_shape(hp) + tm_dots(col = "Price", palette = "viridis")
      

      What looks wrong?

    7. We note that there are a few very big outliers. How much do these cost?

    8. We will ignore them for now and colour the dots by dividing the prices into quantiles instead. Add style = "quantile" into tm_dots().
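      This amounts to one extra argument in the earlier tmap call (tmap v3 syntax, as used above):

      ```r
      tmap_mode("plot")
      tm_shape(ox) + tm_polygons(alpha = 0.3) +
        tm_shape(hp) + tm_dots(col = "Price", palette = "viridis",
                               style = "quantile")
      ```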

    9. You are also free to try other visualisations, such as tm_bubbles().

    10. Now we will predict the prices of properties continuously in space in Oxfordshire. To do that, we first create a fine raster grid covering Oxfordshire and consider the centroids of the raster cells as the prediction locations. We will use terra::rast() and specify the region to be covered (ox) and the number of cells. Let’s try 100 x 100 cells.

    11. Then the prediction locations xy can be obtained with xyFromCell().

      What does xy look like?

      • Answer
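      Steps 10 and 11 can be sketched as below; passing `vect(ox)` so that terra gets a SpatVector is my assumption about the intended call.

      ```r
      # A 100 x 100 raster over the extent of ox, then the cell-centre
      # coordinates of every cell.
      grid <- rast(vect(ox), nrows = 100, ncols = 100)
      xy <- xyFromCell(grid, 1:ncell(grid))
      head(xy)   # a matrix with columns x and y
      ```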

    12. We have to convert xy to an sf object called coop, with st_as_sf() again!

    13. Visualise coop by qtm(coop, size = 0.1). … Does it even make sense?

    14. Oh! We have to retain only the parts that are within ox. Use st_filter().
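      A sketch of steps 12 to 14, assuming `xy` and `ox` from the previous steps:

      ```r
      # Prediction locations as sf points, in the same CRS as ox.
      coop <- st_as_sf(as.data.frame(xy), coords = c("x", "y"),
                       crs = st_crs(ox))
      qtm(coop, size = 0.1)        # covers the whole bounding box...
      coop <- st_filter(coop, ox)  # ...so keep only points inside ox
      ```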

    15. We can obtain predictions at each of the prediction locations as the values of the closest sample location with the help of a Voronoi diagram. Simply use terra::voronoi(). x should be the vectorised form of hp, and bnd should be ox.

    16. Plot the Voronoi diagram.

    17. Add the points on the diagram by points().
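      Steps 15 to 17 can be sketched as below; wrapping ox in `vect()` is my assumption, since terra::voronoi() expects SpatVector inputs.

      ```r
      # Voronoi diagram of the sample locations, bounded by Oxfordshire.
      v <- voronoi(x = vect(hp), bnd = vect(ox))
      plot(v)
      points(vect(hp), cex = 0.3)   # overlay the sample locations
      ```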

    18. The prediction can already be visualised by tmap again. First convert your Voronoi diagram to a sf object (polygons). Then visualise it using the tmap logic as demonstrated earlier, but use tm_fill() this time because the colours fill the polygons.

      • Answer
    19. We can also extract the predicted values at the prediction points coop by using the st_intersection() function. Instead of a map with the predictions at the locations given by coop, we can map the predictions at a raster grid. To do that, we transfer the sf object with the predictions at locations coop to the raster grid grid, and create a plot with the raster values with tmap.

      
      # This step might take quite a few minutes.
      v.resp <- st_intersection(v, coop)
      
      v.pred <- rasterize(v.resp, grid, field = "Price", fun = "mean")
      tm_shape(v.pred) + tm_raster(alpha = 0.6, style = "quantile", palette = "viridis")
      
    20. We will now try the Inverse Distance Weighting method, which can be applied using the gstat() function of the gstat package with the following arguments: formula: Price ~ 1 to have an intercept-only model; nmax: the number of neighbours, set equal to the total number of locations (what is it?); set = list(idp = 1): inverse distance power set to 1.

    21. Then we will use the predict() function to obtain the predictions and again tmap to show the results. The prediction is stored in a feature called $var1.pred.

      • Answer
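      A sketch of steps 20 and 21, following the gstat() arguments described above and assuming `hp` and `coop` from the earlier steps:

      ```r
      # IDW: intercept-only model, all sample locations as neighbours,
      # inverse distance power 1.
      idw <- gstat(formula = Price ~ 1, locations = hp,
                   nmax = nrow(hp), set = list(idp = 1))
      idw.pred <- predict(idw, coop)   # sf object; prediction in $var1.pred
      tm_shape(idw.pred) +
        tm_dots(col = "var1.pred", style = "quantile", palette = "viridis")
      ```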
    22. We will now try the Nearest Neighbours method, which again uses the gstat() function. Unlike the IDW method, in the nearest neighbours approach locations further away from the location where we wish to predict are assigned the same weights. Therefore, the inverse distance power idp is set equal to zero so all the neighbours are equally weighted. You can experiment with different numbers of closest sample locations, but let’s start with nmax = 5.

      • Answer
    23. How does increasing the nmax in the NB method change the result?

    24. Finally, we can obtain predictions using an Ensemble approach that combines the predictions across several spatial interpolation methods. Simply take the predictions from the above three methods and apply equal weights (i.e. multiply each of them with a 1/3 then sum them up). Visualise it.

      • Answer
    25. We will now assess the performance of each of the four methods using K-fold cross-validation and the root mean squared error (RMSE). We will create training and testing sets by using the kfold() function in dismo to randomly assign observations to K = 5 groups of roughly equal size.

      kf <- kfold(nrow(hp), k = 5)
      
    26. We will initialise vectors to store the RMSE values obtained with each method.

      rmse1 <- rep(NA, 5) # Closest observation
      rmse2 <- rep(NA, 5) # IDW
      rmse3 <- rep(NA, 5) # Nearest neighbors
      rmse4 <- rep(NA, 5) # Ensemble
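      The RMSE() helper used in the code below is not part of base R; a minimal definition, assuming the usual formula:

      ```r
      # Root mean squared error between observed and predicted values.
      RMSE <- function(observed, predicted) {
        sqrt(mean((observed - predicted)^2))
      }
      ```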
      
    27. This is an example of RMSE calculation for the closest observation (Voronoi) method for K = 1.

      k <- 1
      
      test <- hp[kf == 1, ]
      train <- hp[kf != 1, ]
      
      v <- voronoi(x = vect(train), bnd = vect(ox))
      v <- st_as_sf(v)
      v.pred <- st_intersection(v, test)$Price   # Again this sadly takes a very long time.
      rmse1[k] <- RMSE(test$Price, v.pred)
      
    28. This is an example of RMSE calculation for the IDW for K = 1.

      # Already done above.
      # k <- 1
      # test <- hp[kf == 1, ]
      # train <- hp[kf != 1, ]
      
      idw <- gstat(formula = Price ~ 1, locations = train,
                  nmax = nrow(train), set = list(idp = 1))
      idw.pred <- predict(idw, test)$var1.pred
      rmse2[k] <- RMSE(test$Price, idw.pred)
      
    29. Similar to IDW, how do we calculate the RMSE for NB with nmax = 5 for K = 1?

      # Already done above.
      # k <- 1
      # test <- hp[kf == 1, ]
      # train <- hp[kf != 1, ]
      
      nb <- gstat(formula = Price ~ 1, locations = train,
                  nmax = 5, set = list(idp = 0))
      nb.pred <- predict(nb, test)$var1.pred
      rmse3[k] <- RMSE(test$Price, nb.pred)
      
    30. Finally, the RMSE for the Ensemble method can be calculated with the same equal weights (1/3 each) that we used for the ensemble predictions above.

      # Already done above.
      # k <- 1
      # test <- hp[kf == 1, ]
      # train <- hp[kf != 1, ]
      
      ens.pred <- v.pred * 1/3 + idw.pred * 1/3 + nb.pred * 1/3
      rmse4[k] <- RMSE(test$Price, ens.pred)
      
    31. Now we have to calculate the same for K = 2, K = 3, K = 4, K = 5… Enjoy your next hour…

    32. 🚨 NO! You could have used a for(k in 1:5) loop to do that.

      • Answer

      <aside> ☕

      This code will take probably 10 minutes to run. Grab a tea.

      </aside>
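      A sketch of the full loop, mirroring the per-fold code above and assuming `hp`, `ox`, `kf`, the four rmse vectors, and the RMSE() helper are already defined:

      ```r
      for (k in 1:5) {
        test  <- hp[kf == k, ]
        train <- hp[kf != k, ]

        # Closest observation (Voronoi)
        v <- st_as_sf(voronoi(x = vect(train), bnd = vect(ox)))
        v.pred <- st_intersection(v, test)$Price
        rmse1[k] <- RMSE(test$Price, v.pred)

        # IDW
        idw <- gstat(formula = Price ~ 1, locations = train,
                     nmax = nrow(train), set = list(idp = 1))
        idw.pred <- predict(idw, test)$var1.pred
        rmse2[k] <- RMSE(test$Price, idw.pred)

        # Nearest neighbours
        nb <- gstat(formula = Price ~ 1, locations = train,
                    nmax = 5, set = list(idp = 0))
        nb.pred <- predict(nb, test)$var1.pred
        rmse3[k] <- RMSE(test$Price, nb.pred)

        # Ensemble (equal weights)
        ens.pred <- (v.pred + idw.pred + nb.pred) / 3
        rmse4[k] <- RMSE(test$Price, ens.pred)
      }
      ```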

    33. Everything is now stored in the vectors rmse1, rmse2, rmse3, rmse4 for the four methods. You can tidy everything in a data frame, and calculate the average RMSE over the 5 splits.

      • Answer
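      A sketch for tidying the results, assuming the four rmse vectors are filled:

      ```r
      # Average RMSE per method across the 5 folds.
      res <- data.frame(method = c("Voronoi", "IDW", "NB", "Ensemble"),
                        rmse = c(mean(rmse1), mean(rmse2),
                                 mean(rmse3), mean(rmse4)))
      res[which.min(res$rmse), ]   # the method with the lowest average RMSE
      ```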
    34. Which one has the lowest RMSE and thus is the best method?

      • Answer

<aside> 🎉

Congratulations! You have now learned how to perform spatial interpolation with a wide range of methods that take into account the spatial relationship between locations with known values and locations with unknown values. I am curious whether the Colleges use these methods to estimate the costs of the land around Oxford before buying it.

</aside>