An Alternative to Buffers for Spatial Merging

Car Crashes in Austin

spatial statistics
Author

Gio Circo, Ph.D.

Published

March 20, 2023

The Problem: Merging Points to Points

Code
library(tidyverse)
library(lubridate)
library(sf)


# DATA SOURCES AND INFO
# -----------------------------#
# https://data.austintexas.gov
#   vehicle crashes: 'Transportation-and-Mobility/Vision-Zero-Crash-Report-Data-Crash-Level-Records'
#   traffic cameras: 'Transportation-and-Mobility/Traffic-Cameras/'
#   austin council map: 'dataset/Boundaries-City-of-Austin-Council-Districts'

# Select a specific Austin Council District and year
# see: https://maps.austintexas.gov/GIS/CouncilDistrictMap/
cnl <- 3
yr <- 2022


# DATA LOADING
# -----------------------------#

# Get Austin shapefile, pull only the district we need
austin <- st_read("C:/Users/gioc4/Documents/blog/data/austin_city.shp", quiet = TRUE) %>%
  st_transform(crs = 32614) %>%
  filter(council_di %in% cnl)

# Read traffic camera data & vehicle crash data
# Limit crashes to a specific year, conver to spatial
camera <- st_read("C:/Users/gioc4/Documents/blog/data/traffic_camera.shp", quiet = TRUE) %>%
  filter(camera_sta == "TURNED_ON") %>%
  distinct(geometry, .keep_all = TRUE) %>%
  st_transform(crs = 32614) %>%
  mutate(camera_X = st_coordinates(.)[,1],
         camera_Y = st_coordinates(.)[,2])

crash <- read_csv(unz("C:/Users/gioc4/Documents/blog/data/austin_crash.zip","crash_data.csv")) %>%
  mutate(crash_date = strptime(crash_date, format="%m/%d/%Y %H:%M")) %>%
  filter(year(crash_date) == yr)

# Convert crash to sf, extract coordinates
crash_sf <- crash %>%
  filter(!is.na(latitude), !is.na(longitude)) %>%
  st_as_sf(coords = c('longitude', 'latitude')) %>%
  st_set_crs(4326) %>%
  st_transform(crs = st_crs(camera)) %>%
  mutate(crash_X = st_coordinates(.)[,1], 
         crash_Y = st_coordinates(.)[,2]) %>%
  select(crash_id, crash_date, crash_X,crash_Y)

# Clip to region
camera <- camera[austin,]
crash_sf <- crash_sf[austin,]

This is a bit of a mini-blog post based on a workflow that I have used based on some of my own work. A common issue in spatial analysis - and especially in criminology - is the need to analyze points that are merged to another point.

In criminology we might say that assaults occurring right outside of a bar are within it’s “spatial influence”. Typically what is done is we define a “buffer” around each of the points \(j\) (like bars, or gas stations) of interest and merge all of the crime incidents \(i\) that are within each of the \(j\) points’ buffer area. This is something I’ve done before looking at the effect of CCTV cameras on crime at businesses in Detroit. This is pretty common across a lot of criminology research (e.g. finding all crime that occurs within a 1-block radius of bars and liquor stores).

While I used to use the “buffer” method, I think there is a more efficient way of doing this via Voronai polygons which accomplishes the same goal, and allows for more flexibility in analysis. Let’s illustrate this using some data from the city of Austin. In this example we are going to look at the incidence of car crashes \(i\) around traffic cameras \(j\). Our goal will be to merge car crashes to the nearest traffic camera within a defined spatial range.

Here’s the study area - one of the Austin city council districts, showing the traffic cameras in blue, and the location of crashes in red. In the picture below there are 58 cameras and about 1,700 car accidents. For this example we’re restricting our analysis to only accidents that occurred in 2022 and using cameras that were active (TURNED_ON) at the time. We can see that there are a lot of accidents, many of them quite far from a traffic camera. Let’s say we want to define a study area around each traffic camera of about 300 meters - or about 980 feet.

Code
ggplot() +
  geom_sf(data = austin, fill = "#88CCEE", alpha = .3) +
  geom_sf(data = crash_sf, color = '#BB5566', size = .6, alpha = .7) +
  geom_sf(data = camera, color = '#004488', shape = 3, size = 2, stroke = 1.5) +
  theme_void()

Spatial Merging using Voronoi Polygons

Voronoi Polygons

Voronoi polygons(or tessellations) are useful for a number of purposes. Given a set of points \(j_n\) we define a set of \(n\) regions where all spaces within each region has a single nearest neighbor of the initial point \(i\). Practically speaking, this just means we sub-divide a study area into smaller areas corresponding to the proximity to a point. This has many useful properties, such as determining nearest-neighbor distances from points to points. Let’s see how we can do this in R.

To start, we’ll first use a helper function to convert the Voronoi tessellation to an sf object that is suitable for merging. We’ll then merge the camera data to the polygon we just created (using st_intersection) and pull a few of the variables we’ll want for this example.

Code
# Helper function to simplify tessellation
# borrowed from: 
# https://gis.stackexchange.com/questions/362134
st_voronoi_point <- function(points){
  ## points must be POINT geometry
  # check for point geometry and execute if true
  if(!all(st_geometry_type(points) == "POINT")){
    stop("Input not  POINT geometries")
  }
  g = st_combine(st_geometry(points)) # make multipoint
  v = st_voronoi(g)
  v = st_collection_extract(v)
  return(v[unlist(st_intersects(points, v))])
}


# create Voronoi tessellation over cameras
camera_poly <- st_voronoi_point(camera) %>%
  st_intersection(austin) %>%
  st_as_sf() %>%
  mutate(camera_id = camera$camera_id,
         camera_X = camera$camera_X,
         camera_Y = camera$camera_Y)

Now we can plot the result. Below we see we now have a defined set of regions corresponding to the areas nearest to each camera. Therefore, any crashes that occur in one of the Voronoi polygons is also its nearest camera. This saves us the step of determining which point is its nearest neighbor.

Code
ggplot() +
  geom_sf(data = camera_poly, fill = "#88CCEE", alpha = .3) +
  geom_sf(data = camera, color = '#004488', shape = 3, size = 2, stroke = 1.5) +
  theme_void()

Spatial Merging

After we’ve created the Voronoi regions, all we need to do is merge each point to the region it falls within (which implies the camera there is its nearest neighbor) and then compute the euclidean distance from the crash to the camera. The code below uses a for-loop to get the pairwise distances after spatial joining and then limits the output to only crashes that are within 300 feet of the nearest camera.

Code
# JOIN AND MERGE
# ----------------------- #

# compute euclidean distance
edist <- function(a,b){
  sqrt(sum((a - b) ^ 2))
}

# get x-y coords for crashes and cameras
# convert to matrix
camera_crash <-  st_join(crash_sf,camera_poly) %>%
  tibble() %>%
  select(camera_id, 
         crash_id, 
         camera_X, 
         camera_Y, 
         crash_X, 
         crash_Y)

dmat <- matrix(c(camera_crash$camera_X, 
                 camera_crash$camera_Y, 
                 camera_crash$crash_X, 
                 camera_crash$crash_Y),
               ncol = 4)

# compute pairwise distances
dlist <- list()
for(i in 1:nrow(dmat)){
  dlist[[i]] <- edist(c(dmat[i,1], dmat[i,2]), c(dmat[i,3], dmat[i,4]))
}

camera_crash$dist <- unlist(dlist)

# get ids of within 300 meters
dist_ids <- camera_crash$dist <= 300

Now we can plot the results. As we see below we now only have crashes that are within 300 feet or less of the nearest camera. One advantage of this approach is that we can make any adjustments to the spatial region we’re interested in by just adjusting the filter above - or we can use the full range of distances in our analysis and look at decay effects (for example, the effect of CCTV cameras on crime clearance).

Code
ggplot() +
  geom_sf(data = austin, fill = "#88CCEE", alpha = .3) +
  geom_sf(data = crash_sf[dist_ids,], color = '#BB5566', size = .6, alpha = .7) +
  geom_sf(data = camera, color = '#004488', shape = 3, size = 2, stroke = 1.5) +
  theme_void()

With this done, we can do any kind of further investigation. For example, which camera observed the greatest number of crashes? Here, the top-ranked camera is at a 4-way intersection leading to the highway. Also, due to its proximity to the highway, it’s very likely that our distance size (300 meters, or about 900 feet) is picking up accidents that are occurring on the highway below. Of course, this is just a demonstration of method of spatial merging, not an investigation into traffic accidents in Austin!

Code
camera_crash %>%
  filter(crash_id %in% crash_sf[dist_ids,]$crash_id) %>%
  count(camera_id) %>%
  arrange(desc(n)) %>%
  slice(1:5)
# A tibble: 5 × 2
  camera_id     n
  <chr>     <int>
1 919          72
2 674          59
3 948          46
4 155          44
5 962          35

Summary

This little mini-blog highlighted some approaches that can be taken to perform a relatively common spatial procedure. Using Voronoi polygons we looked at how we can use them to easily calculate nearest-neighbor distances. These types of spatial approaches aren’t necessarily the sexiest topics, but I find they often help considerably with modelling pipelines down the road. Sometimes have a good foundation can help with further analysis later.

An Aside: An even (easier) method?

Of course, another method is to simply use the a convenient function embedded in the sf library aptly named st_nearest_feature(). This takes two sf objects and returns the indexes of \(y\) that are nearest to \(x\). While the solution here is equivalent to the one above, it might not necessarily be available in your given software package. Also, while I have no testing to support this, I expect that this would likely be slow in case of many pairwise distances. The presence of the polygons helps avoid the unnecessary computation of distances between points that are not nearest neighbors.

# get index of cameras nearest to each point
idx <- st_nearest_feature(crash_sf, camera)
id_dist <- st_distance(camera[idx,], crash_sf, by_element = TRUE)

id_dist[1:5]
Units: [m]
         1          2          3          4          5 
  49.88593 1202.17589  784.61324 1412.67832  282.73844