Areal data

Birth counts 1974-78, North Carolina counties

Elevation at raster grid cells covering Luxembourg

Cellular neighbourhood in colorectal cancer tissue
<aside>
💡
We do not always observe data at points; often they are only available summarised over areas.
</aside>
- $D$ is a fixed countable collection of areal units at which variables are observed
- Fixed: the set of regions is decided a priori and does not change during the analysis
- Information aggregated over fixed, non-overlapping geographic regions, rather than at specific points
- Examples
- Number of individuals with a certain disease in provinces of a country
- Number of trees in a mixed plantation
- Average electricity consumption in districts of a city
- Regular lattice
- Structured as a matrix in rows and columns
- All areas have the same size, equal to the cells of a raster grid
- Spatially inhomogeneous distribution of the events
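To make the aggregation concrete, the sketch below counts point events (for example trees) in the cells of a regular lattice stored as a matrix of rows and columns. The function name, grid origin, cell size and tree coordinates are all hypothetical. Row 0 is taken as the bottom row, and points outside the grid or exactly on its upper or right edge are dropped.

```python
from typing import List, Tuple


def aggregate_to_lattice(points: List[Tuple[float, float]],
                         xmin: float, ymin: float, cell_size: float,
                         nrows: int, ncols: int) -> List[List[int]]:
    """Count point events falling in each cell of a regular lattice.

    Returns a nrows x ncols matrix; row 0 is the bottom row (smallest y).
    Points outside [xmin, xmin + ncols*cell_size) x [ymin, ymin + nrows*cell_size)
    are ignored.
    """
    counts = [[0] * ncols for _ in range(nrows)]
    for x, y in points:
        col = int((x - xmin) // cell_size)  # column index of the cell containing x
        row = int((y - ymin) // cell_size)  # row index of the cell containing y
        if 0 <= row < nrows and 0 <= col < ncols:
            counts[row][col] += 1
    return counts


# Hypothetical tree locations (not real data) on a 2 x 2 lattice of unit cells.
trees = [(0.2, 0.3), (0.7, 0.1), (0.4, 0.9), (1.5, 0.5), (1.8, 1.6)]
grid = aggregate_to_lattice(trees, xmin=0.0, ymin=0.0, cell_size=1.0, nrows=2, ncols=2)
print(grid)  # [[3, 1], [0, 1]]
```

The point locations are lost after aggregation; only the counts per fixed cell remain, and their unevenness is the lattice view of a spatially inhomogeneous distribution of events.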

Mapping of cells

Spatial distribution of trees in a rain forest
Spatial neighbourhood
- Areal data → spatial autocorrelation
- Do close areas have similar or dissimilar values?
- Do neighbouring counties tend to have similar disease rates?
- Do adjacent forest blocks have similar tree densities?
- Do nearby districts have similar unemployment rates?
- Spatial neighbours can be defined in several ways
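One widely used way to put a number on "do close areas have similar values?" is Moran's I. It is not named in these notes, so treat this as an illustrative sketch, assuming binary weights ($w_{ij} = 1$ if areas $i$ and $j$ are neighbours, 0 otherwise). Values above zero indicate that neighbours tend to be similar, values below zero that they tend to be dissimilar.

```python
from typing import Dict, List


def morans_i(values: List[float], neighbours: Dict[int, List[int]]) -> float:
    """Moran's I with binary weights: w_ij = 1 if j is listed as a neighbour of i.

    I = (n / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2,
    where W is the total number of ordered neighbour pairs.
    Raises ZeroDivisionError if all values are equal or there are no pairs.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0  # sum of products of deviations over neighbour pairs
    w_total = 0  # W: number of ordered neighbour pairs
    for i, nbrs in neighbours.items():
        for j in nbrs:
            cross += dev[i] * dev[j]
            w_total += 1
    return (n / w_total) * cross / sum(d * d for d in dev)


# Four hypothetical areas in a row; each neighbours the areas directly beside it.
strip = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1, 2, 3, 4], strip))  # smooth trend: positive (about 0.333)
print(morans_i([1, 4, 1, 4], strip))  # alternating values: negative (about -1.0)
```

The result depends on which areas count as neighbours, which is why the choice of neighbour definition matters.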

Contiguity neighbour Queen: at least one shared boundary point (a shared corner is enough)

Contiguity neighbour Rook: more than one shared boundary point (a shared edge, not just a corner)
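On a regular lattice both rules reduce to index offsets: square cells sharing an edge have more than one boundary point in common (rook), while cells touching only at a corner share a single point and count only under the queen rule. A minimal sketch, assuming cells are indexed by (row, column); the function name is hypothetical. Irregular polygons need a geometric test instead, for example `poly2nb()` in the R package spdep, whose `queen` argument switches between the two rules.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def contiguity_neighbours(nrows: int, ncols: int, queen: bool) -> Dict[Cell, List[Cell]]:
    """Rook (shared edge) or queen (shared edge or corner) neighbours on a grid."""
    rook_steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # cells sharing an edge
    diag_steps = [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # cells sharing only a corner
    steps = rook_steps + diag_steps if queen else rook_steps
    nb: Dict[Cell, List[Cell]] = {}
    for r in range(nrows):
        for c in range(ncols):
            # keep only offsets that stay inside the lattice
            nb[(r, c)] = [(r + dr, c + dc) for dr, dc in steps
                          if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
    return nb


rook = contiguity_neighbours(3, 3, queen=False)
queen = contiguity_neighbours(3, 3, queen=True)
print(len(rook[(1, 1)]), len(queen[(1, 1)]))  # centre cell: 4 8
print(len(rook[(0, 0)]), len(queen[(0, 0)]))  # corner cell: 2 3
```

Cells on the border of the lattice have fewer neighbours than interior cells under both rules.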

k-nearest neighbour: example of k = 3
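A minimal sketch of k-nearest neighbours, assuming each area is represented by a centroid with planar (x, y) coordinates and Euclidean distance; the function name and coordinates are made up. The relation need not be symmetric: every area gets exactly k neighbours, even an isolated one, but it may not appear in anyone else's list.

```python
import math
from typing import List, Tuple


def knn_neighbours(centroids: List[Tuple[float, float]], k: int) -> List[List[int]]:
    """For each centroid, the indices of its k closest other centroids."""
    result = []
    for i, (xi, yi) in enumerate(centroids):
        # (distance, index) pairs; sorting breaks distance ties by index
        dists = sorted((math.hypot(xi - xj, yi - yj), j)
                       for j, (xj, yj) in enumerate(centroids) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


# Hypothetical centroids: three close together, one far away.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (4.0, 4.0)]
nbrs = knn_neighbours(centroids, k=2)
print(nbrs)  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

Area 3 is far from the others yet still receives two neighbours, while none of the other areas lists it.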

Distance-based neighbour: example with a distance threshold of 0.4
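A sketch of distance-based neighbours under the same centroid assumption: two areas are neighbours if their centroids are at most `d_max` apart, here 0.4 in whatever units the coordinates use. Including the boundary (`<=`) is a choice made for this sketch, and the function name and coordinates are hypothetical. Unlike k-nearest neighbours the relation is symmetric, but an isolated area can end up with no neighbours at all.

```python
import math
from typing import List, Tuple


def distance_neighbours(centroids: List[Tuple[float, float]], d_max: float) -> List[List[int]]:
    """For each centroid, the indices of all other centroids within d_max."""
    return [[j for j, (xj, yj) in enumerate(centroids)
             if j != i and math.hypot(xi - xj, yi - yj) <= d_max]
            for i, (xi, yi) in enumerate(centroids)]


# Hypothetical centroids; the diagonal pair (0 and 2) is about 0.42 apart.
centroids = [(0.0, 0.0), (0.3, 0.0), (0.3, 0.3), (1.0, 1.0)]
nbrs = distance_neighbours(centroids, d_max=0.4)
print(nbrs)  # [[1], [0, 2], [1], []]
```

Area 3 has no neighbours within 0.4, which is one practical drawback of a fixed threshold when areas differ greatly in size or spacing.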
<aside>
💡
When are k-nearest or distance-based neighbours more useful than Queen or Rook contiguity?
</aside>

Contiguity Rook and Queen neighbours of order k: example of k = 1 (dark grey) and k = 2 (light grey)
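Order-k neighbours can be read as the areas exactly k neighbour steps away: order 2 means neighbours of neighbours, excluding the area itself and its order-1 neighbours. A breadth-first search over an order-1 neighbour list gives this directly. A sketch, assuming a regular lattice with (row, column) indices; both helper names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def lattice_neighbours(nrows: int, ncols: int, queen: bool) -> Dict[Cell, List[Cell]]:
    """Order-1 rook (edge) or queen (edge or corner) neighbours on a grid."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return {(r, c): [(r + dr, c + dc) for dr, dc in steps
                     if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            for r in range(nrows) for c in range(ncols)}


def neighbours_of_order(nb: Dict[Cell, List[Cell]], start: Cell, k: int) -> Set[Cell]:
    """Cells whose shortest path from `start` is exactly k neighbour steps."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == k:  # no need to search beyond order k
            continue
        for v in nb[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {cell for cell, d in dist.items() if d == k}


rook = lattice_neighbours(5, 5, queen=False)
queen = lattice_neighbours(5, 5, queen=True)
centre = (2, 2)
print(len(neighbours_of_order(rook, centre, 1)), len(neighbours_of_order(rook, centre, 2)))    # 4 8
print(len(neighbours_of_order(queen, centre, 1)), len(neighbours_of_order(queen, centre, 2)))  # 8 16
```

On a lattice the rook order equals the row plus column offset, and the queen order equals the larger of the two offsets, so the order-2 queen ring around the centre of a 5 x 5 grid is its whole outer border.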
Neighbourhood matrices