Areal data

Birth counts 1974-78, North Carolina counties

Elevation at raster grid cells covering Luxembourg

Cellular neighbourhood in colorectal cancer tissue

<aside> 💡

Data are not always observed at points; often they are summarised over areas.

</aside>

$D$ is a fixed countable collection of areal units at which variables are observed
- Fixed = set of regions decided a priori and not changed during analysis
Information aggregated over fixed, non-overlapping geographic regions, rather than at specific points
Examples
- Number of individuals with a certain disease in provinces of a country
- Number of trees in a mixed plantation
- Average electricity consumption in districts of a city
Regular lattice
- Structured as a matrix in rows and columns
- Areas are all the same size, each corresponding to one cell of a raster grid
- Spatially inhomogeneous distribution of the events
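
As a minimal sketch of how point events become areal data on a regular lattice, the function below counts events per grid cell. The function name `counts_on_grid`, the coordinates, and the 2 x 2 grid are illustrative assumptions, not part of the original notes.

```python
def counts_on_grid(points, n_rows, n_cols, x_min, x_max, y_min, y_max):
    """Count point events falling in each cell of an n_rows x n_cols grid."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    cell_w = (x_max - x_min) / n_cols
    cell_h = (y_max - y_min) / n_rows
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore events outside the study region
        # Clamp so that points on the upper or right edge fall in the last cell.
        col = min(int((x - x_min) / cell_w), n_cols - 1)
        row = min(int((y - y_min) / cell_h), n_rows - 1)
        grid[row][col] += 1
    return grid


# Hypothetical tree locations in a 2 x 2 study region (made-up coordinates)
trees = [(0.1, 0.2), (0.3, 0.4), (1.5, 0.5), (1.9, 1.9), (0.2, 1.1)]
counts = counts_on_grid(trees, n_rows=2, n_cols=2,
                        x_min=0, x_max=2, y_min=0, y_max=2)
print(counts)
```

Row 0 holds the cells with the smallest `y` values. Unequal counts across cells are what an inhomogeneous distribution of events looks like once aggregated.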

Mapping of cells

Spatial distribution of trees in a rain forest

Irregular lattice
- Real-life lattice data are seldom in a regular grid
- Areas different in size and shape
- Different number of neighbours
Sudden Infant Death Syndrome (SIDS) in North Carolina, 1974-1978

Spatial neighbourhood

Areal data → spatial autocorrelation
- Do close areas have similar or dissimilar values?
- Do neighbouring counties tend to have similar disease rates?
- Do adjacent forest blocks have similar tree densities?
- Do nearby districts have similar unemployment rates?
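
One common way to put a number on these questions is Moran's I, a statistic not named in these notes and brought in here only as an illustration. The sketch below assumes binary weights (1 for neighbours, 0 otherwise) and a hypothetical chain of four areas; positive values suggest that neighbours are similar, negative values that they are dissimilar.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights; neighbours maps unit index -> list of indices."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of cross-products over all neighbour pairs (w_ij = 1 for neighbours).
    cross = sum(dev[i] * dev[j] for i in neighbours for j in neighbours[i])
    # S0 is the sum of all weights, here the number of directed neighbour pairs.
    s0 = sum(len(js) for js in neighbours.values())
    # Note: undefined (division by zero) if all values are identical.
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical example: four areas in a row, each adjacent to the next
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = morans_i([1, 2, 3, 4], chain)       # similar neighbours
alternating = morans_i([1, 4, 1, 4], chain)  # dissimilar neighbours
print(smooth, alternating)
```

The result depends on how neighbours are defined.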
Spatial neighbours have different definitions

Contiguity neighbour Queen: at least one shared boundary point

Contiguity neighbour Rook: more than one shared boundary point
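
On a regular grid the two contiguity rules are easy to sketch: Rook neighbours share an edge, Queen neighbours share an edge or a corner. The function `grid_neighbours` and the 3 x 3 grid below are illustrative assumptions; irregular polygons would need a geometry library instead.

```python
def grid_neighbours(n_rows, n_cols, rule="queen"):
    """Return {cell: sorted list of neighbouring cells} for a regular grid."""
    rook_steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # shared edge
    diag_steps = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # shared corner only
    if rule == "rook":
        steps = rook_steps
    elif rule == "queen":
        steps = rook_steps + diag_steps
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    nb = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nb[(r, c)] = sorted(
                (r + dr, c + dc)
                for dr, dc in steps
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            )
    return nb


rook = grid_neighbours(3, 3, "rook")
queen = grid_neighbours(3, 3, "queen")
print(len(rook[(1, 1)]), len(queen[(1, 1)]))
```

The centre cell has four Rook neighbours and eight Queen neighbours; corner cells have two and three.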

k-nearest neighbour: example of k = 3

Distance-based neighbour: example with a distance threshold of 0.4
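
Both definitions can be sketched from area centroids. The centroid coordinates and the function names below are made up for illustration, and distances are plain Euclidean distances, which assumes projected coordinates.

```python
import math


def knn_neighbours(centroids, k):
    """For each unit, the indices of its k nearest other units."""
    nb = {}
    for i, p in enumerate(centroids):
        others = [j for j in range(len(centroids)) if j != i]
        # Sort by distance; ties are broken by index to keep results stable.
        others.sort(key=lambda j: (math.dist(p, centroids[j]), j))
        nb[i] = sorted(others[:k])
    return nb


def distance_neighbours(centroids, threshold):
    """For each unit, the indices of all other units within the threshold."""
    return {
        i: [j for j, q in enumerate(centroids)
            if j != i and math.dist(p, q) <= threshold]
        for i, p in enumerate(centroids)
    }


# Hypothetical centroids of five areas
centroids = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.35), (1.0, 1.2), (0.5, 0.5)]
knn = knn_neighbours(centroids, k=3)
dnb = distance_neighbours(centroids, threshold=0.4)
print(knn)
print(dnb)
```

In this example every unit gets exactly three k-nearest neighbours, but the relation is not symmetric: unit 4 is a neighbour of unit 3 and not the other way round. With the 0.4 threshold, units 3 and 4 have no neighbours at all.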

<aside> 💡

When is k-nearest or distance-based neighbour more useful than Queen or Rook?

</aside>

Contiguity Rook and Queen neighbours of k orders: example of k = 1 (dark grey) and k = 2 (light grey)
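
Higher-order neighbours can be derived from the first-order ones: order-2 neighbours are the units reached in two steps that are not already order-1 neighbours. The sketch below does this with a breadth-first search; the function name and the five-unit chain are illustrative assumptions.

```python
from collections import deque


def neighbours_by_order(first_order, start, max_order):
    """Group units by contiguity order from `start`, up to `max_order`."""
    order_of = {start: 0}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        if order_of[unit] == max_order:
            continue  # do not expand beyond the requested order
        for other in first_order[unit]:
            if other not in order_of:  # first visit gives the lowest order
                order_of[other] = order_of[unit] + 1
                queue.append(other)
    by_order = {k: [] for k in range(1, max_order + 1)}
    for unit, k in order_of.items():
        if k >= 1:
            by_order[k].append(unit)
    return {k: sorted(units) for k, units in by_order.items()}


# Hypothetical first-order neighbours for five units in a row: 0-1-2-3-4
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
orders = neighbours_by_order(chain, start=2, max_order=2)
print(orders)
```

The same function works with either Rook or Queen first-order neighbours as input.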