Spatial Effects

Spatial data analysis is important in many fields such as environmental science, epidemiology, and climatology, where observations are collected across different geographical locations. A key challenge in spatial modeling is accounting for the dependence structure among nearby regions, which often display correlated patterns in outcomes.

In generalized additive models for location, scale, and shape (GAMLSS), spatial effects can be incorporated in various ways, such as through Markov random fields (MRFs). MRFs handle spatial correlation by applying penalties that reflect the neighborhood structure of the spatial data. This vignette is divided into two parts: the first part demonstrates how to estimate discrete spatial effects using MRFs with the gamlss2 package, while the second part provides examples of modeling spatial effects with smooth functions like thin-plate splines or tensor product splines.

Example: Modeling Severe Storm Counts in Germany

In this example, we analyze severe storm counts recorded at various weather stations across Germany over multiple years. Our goal is to model these storm counts while accounting for the spatial dependence between stations. To achieve this, we associate each weather station with its respective county in Germany, enabling us to incorporate the geographical structure into the model.

## load the Germany severe storm data
data("storms", package = "gamlss2")

## plot storm counts per station and year
plot(range(storms$year), range(storms$counts), type = "n",
  xlab = "Year", ylab = "Counts")
for(j in levels(storms$id)) {
  dj <- subset(storms, id == j)
  dj <- dj[order(dj$year), ]
  with(dj, lines(counts ~ year, type = "b", pch = 16,
    col = rgb(0.1, 0.1, 0.1, alpha = 0.4)))
}

The data contains storm counts per year for each station. A preliminary visualization of these counts allows us to inspect patterns of storm frequency over time and across stations.

Visualizing the Spatial Structure

We begin by plotting the locations of weather stations on a map of Germany. The sf package is used to manage and plot spatial data.

## load map of Germany
## needs sf package for plotting
library("sf")
data("Germany", package = "gamlss2")

## plot station locations
plot(st_geometry(Germany))
co <- unique(storms[, c("lat", "lon")])
points(co, col = 2, pch = 4, lwd = 2)

This map shows the geographical distribution of weather stations. The spatial structure will be incorporated into our model to account for the proximity of stations when estimating storm counts.

Defining the Neighborhood Structure

Next, we define the neighborhood structure among the weather stations using a distance-based criterion. This is crucial for the Markov random field, as it specifies how spatial correlation should be penalized.

## estimate spatial count model using
## a Markov random field, first a neighbor matrix
## needs to be computed, here we use distance based
## neighbors
library("spdep")
Loading required package: spData
To access larger datasets in this package, install the spDataLarge
package with: `install.packages('spDataLarge',
repos='https://nowosad.github.io/drat/', type='source')`
nb <- dnearneigh(st_centroid(st_geometry(Germany)), d1 = 0, d2 = 80)
plot(st_geometry(Germany), border = "lightgray")
plot.nb(nb, st_geometry(Germany), add = TRUE)
Warning in st_point_on_surface.sfc(coords): st_point_on_surface may not give
correct results for longitude/latitude data

The neighbor matrix is constructed using the poly2nb() function from the spdep package, which calculates the adjacency structure of the geographical regions. We then visualize the spatial network of neighbors on the map.

Constructing the Penalty Matrix

The penalty matrix defines the spatial penalties imposed by the Markov random field. The matrix is constructed based on the neighbor relationships defined earlier.

## compute final neighbor penalty matrix
K <- nb2mat(nb, style = "B", zero.policy = TRUE)

## assign region names
rownames(K) <- colnames(K) <- levels(Germany$id)

## set up final penalty matrix
K <- -1 * K
diag(K) <- -1 * rowSums(K)

## remove regions not in data
i <- which(rownames(K) %in% levels(storms$id))
K <- K[i, i]

The penalty matrix K is set up such that it reflects the neighborhood relationships between the regions. Each element of the matrix represents how strongly each region is connected to its neighbors. The diagonal entries represent the total number of neighbors for each region.

Estimating the Model

We now estimate the spatial count model using the Negative Binomial distribution (NBI). The model includes smooth functions of altitude, year, and an interaction between altitude and year. Spatial effects are incorporated using the bs = "mrf" option.

## estimate count model using the NBI family,
## model formula is
f <- ~ s(alt) + s(year) + s(id, bs = "mrf", xt = list("penalty" = K)) +
  te(alt, year)
f <- list(update(f, counts ~ .), f)

## estimate model using BIC for shrinkage parameter selection
b <- gamlss2(f, data = storms, family = NBI, criterion = "BIC")
## model summary
summary(b)
Call:
gamlss2(formula = counts ~ s(alt) + s(year) + s(id, bs = "mrf", 
    xt = list(penalty = K)) + te(alt, year) | s(alt) + s(year) + 
    s(id, bs = "mrf", xt = list(penalty = K)) + te(alt, year), 
    data = storms, family = NBI, ... = pairlist(x = FALSE, criterion = "BIC"))
---
Family: NBI 
Link function: mu = log, sigma = log
*--------
Parameter: mu 
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.15816    0.01616   71.68   <2e-16 ***
---
Smooth terms:
      s(alt)  s(year)    s(id) te(alt,year)
edf   8.9102   6.3675 100.9338       4.5541
*--------
Parameter: sigma 
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.61129    0.02754   -22.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---
Smooth terms:
     s(alt) s(year)   s(id) te(alt,year)
edf  5.4313  1.9047 25.9703       3.0937
*--------
n = 3494 df =  159.17 res.df =  3334.83
Deviance = 15698.2644 Null Dev. Red. = 18.53%
AIC = 16016.5958 elapsed =  8.33sec

Here, the spatial effect is modeled as an MRF smooth (s(id, bs = "mrf")), where the penalty matrix K enforces spatial structure based on neighboring stations. We use the Bayesian Information Criterion (BIC) to select the optimal smoothing parameters.

Visualizing the Estimated Effects

Finally, we visualize the estimated effects from the fitted model.

plot(b)

The plot shows the estimated smooth functions for altitude, year, and the spatial effect. These visualizations help us interpret how storm counts vary across space and time.

Prediction

First we predict the spatial effects on the scale of the predictors. Therefore, we set up a new data frame containing only the unique loactions.

nd <- unique(storms[, "id", drop = FALSE])

## set some dummy values, not needed when predicting single model term
nd$alt <- 1000
nd$year <- 2020

## prediction for the mu predictor
nd$fmu <- predict(b, newdata = nd,
  model = "mu", term = "s(id)",
  type = "link")

## same for sigma
nd$fsigma <- predict(b, newdata = nd,
  model = "sigma", term = "s(id)",
  type = "link")

## add fitted values to map of Germany
m <- merge(Germany, nd, by = "id", all.x = TRUE)

## plot spatial effects
library("ggplot2")
library("colorspace")
ggplot(m) + geom_sf(aes(fill = fmu)) +
scale_fill_continuous_diverging("Blue-Red 3") + theme_bw()

## note that because of the discrete spatial effect,
## there are a lot of NAs, therefore, we need to compute
## predictions in such regions by averaging using the neighbors
## of a region. we use the neighbour list object nb to compute
## the predictions, this is iterated.
while(any(is.na(m$fmu))) {
  m$fmu <- sapply(nb, function(i) mean(m$fmu[i], na.rm = TRUE))
}

## plot final spatial effect
ggplot(m) + geom_sf(aes(fill = fmu)) +
scale_fill_continuous_diverging("Blue-Red 3") + theme_bw()

References

Rigby, R. A., and D. M. Stasinopoulos. 2005. “Generalized Additive Models for Location, Scale and Shape.” Journal of the Royal Statistical Society Series C (Applied Statistics) 54 (3): 507–54. https://doi.org/10.1111/j.1467-9876.2005.00510.x.