2. Spatial autocorrelation

<aside> 📦

You have to first install and load these packages in R

</aside>

We will revisit the John Snow’s 1855 London Cholera spatial analysis. We want to test if closer neighbourhoods will have higher death rates, which might then suggest a proximal cause for the epidemics.

<aside> 📥

Download

snow.zip

</aside>
1. Load the data into a sf object. What did you notice about the CRS? What should we do?
  - Answer
2. Make a simple plot to confirm. You can also only plot the snow$deaths.
3. We will first prepare a list of weights based on neighbouring relationships, which relies on two functions poly2nb (neighbouring relationships) and nb2listw (weights) in the spdep package. By adjusting the parameters to these two functions, neighbours get higher weights and non-neighbours get low or zero weights.
  
  <aside> 💡
  
  For point type of data, spdep::knearneigh and spdep::knn2nb can construct the weights for neighbours. spdep::grid2nb can do so for raster data. For this practical, we focus on the areal data.
  
  </aside>
```
snow.nb <- poly2nb(snow, queen = T)
snow.nbw <- nb2listw(snow.nb, style = "W")
```
  1. An error comes up! Why?
    - Answer
  2. We will use st_buffer() to buffer slightly so polygons will “grow” over the roads, we will use dist = 5 (5 metres) because that is typically half the road width.
  3. Visualise again. Do they fully touch each other? Should we increase the dist = even more?
  4. Some polygons still don’t touch and it is completely fine. We simply add zero.policy = T to the nb2listw() function to account for that. They are then treated as isolated areas.
    - Answer
4. Then, moran.test() can be used to compute the Global Moran’s I statistic on snow$deaths. Inspect its value and p-value. What does it tell us?
  - Answer
5. We can use moran.plot() to visualise the spatial autocorrelation. It will show the observations of each area (snow$deaths) against their spatially lagged values, which are calculated as the weighted average of the neighbouring values for that area.
  - Answer
6. Next, we will look at the Local Moran’s I. It is similar to global Moran’s I but we will use localmoran() function instead.
  - Answer
7. As you can see from snow.li, the values we are interested in are Ii (Moran’s I statistic), Z.Ii (z-score), and the Pr(z > E(Ii)) (p-value). How can we extract these vectors and add them to the sf object snow?
  - Answer
8. We will use tmap package to visualise all these scores interactively. The below is given to you as an example for visualising the deaths.
```
p.deaths <- tm_shape(snow) +
  tm_polygons(col = "deaths", title = "Deaths", style = "quantile") +
  tm_layout(legend.outside = TRUE)

print(p.deaths)
```
9. You will then visualise the Ii (Moran’s I statistic), Z.Ii (z-score), and the Pr(z != E(Ii)) (p-value). You can use tmap_arrange() to nicely put all four interactive maps together.
  - Answer
10. We will produce the Moran’s plot again, but this time we will use the scaled values of snow$deaths.
```
snow.mp <- moran.plot(as.vector(scale(snow$deaths)), snow.nbw)
head(snow.mp)
```
11. We will then identify the four cluster types (high-high, low-low, high-low, and low-high) by using the quadrants of the scaled values (snow.mp$x) and their spatially lagged values (snow.mp$wx) and the p-values obtained with the local Moran’s I for each of the areas (snow$lp) obtained above. The classification should be as follows:
  1. High-high: both snow.mp$x > 0 and snow.mp$wx > 0
  2. Low-low: both snow.mp$x < 0 and snow.mp$wx < 0
  3. High-low: snow.mp$x > 0 but snow.mp$wx < 0
  4. Low-high: snow.mp$x < 0 but snow.mp$wx > 0
  We will also use lp < 0.05 to filter it. Now try to add a new feature snow$quadrant by these classifiers.
  - Answer
12. Finally, visualise this local spatial autocorrelation with tmap package again.
  - Answer

<aside> 🎉

Congratulations! You have now understood how spatial autocorrelation can be detected with Moran’s I. Perhaps you should look around the classmates around you, and you will find more similarities with them.

</aside>