NCAA Referee Dashboard

Exploring Referee Travel and Workload in Division I Men's Basketball

Project Goal: Explore whether referee travel distance influences officiating behavior during NCAA Division I Men's Basketball games.

Tech Stack: Python (requests, beautifulsoup, geopy), R (tidyverse, ggplot2, plotly), R Shiny

From Problem Framing to Data Product

In college basketball, high-stakes games demand fairness. But what if unseen factors influence outcomes? This project began with a simple question: Does referee travel distance affect officiating behavior? Before answering that, I had to build the data infrastructure from the ground up.

Laying the Data Foundation

Unlike sites that expose a modern API, stats.ncaa.org is a static HTML site with no structured access points. Every game, box score, and referee assignment is embedded in hard-to-navigate pages. The breakthrough came from discovering that each game has a unique ID embedded in its URL. That ID became the anchor for scraping a full season's worth of data.

Diagram showing the game ID scraping strategy

Using custom Python scripts, I automated the collection of game IDs by date and extracted structured details for each matchup, including team stats, locations, and referee crews. The final dataset covers 5,922 games from the 2024–25 season, with 34 structured columns and more than 770 referees.
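The ID-harvesting step can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: the `/contests/<id>/` URL shape, the sample markup, and the names `GameLinkParser` and `extract_game_ids` are all assumptions made for the example. It uses Python's standard-library HTML parser in place of requests and BeautifulSoup so that it runs without a network connection.

```python
import re
from html.parser import HTMLParser

# Assumed URL shape. The write-up only says each game has a unique ID in its URL.
GAME_LINK = re.compile(r"/contests/(\d+)/")


class GameLinkParser(HTMLParser):
    """Collect unique game IDs from the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.game_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        href = dict(attrs).get("href") or ""
        match = GAME_LINK.search(href)
        # Keep first-seen order and skip IDs already collected.
        if match and match.group(1) not in self.game_ids:
            self.game_ids.append(match.group(1))


def extract_game_ids(html):
    """Return the game IDs found in one page of HTML, in order of appearance."""
    parser = GameLinkParser()
    parser.feed(html)
    return parser.game_ids


# Made-up markup standing in for one day's scoreboard page.
sample_html = """
<table>
  <tr><td><a href="/contests/1234567/box_score">Box Score</a></td></tr>
  <tr><td><a href="/contests/1234568/box_score">Box Score</a></td></tr>
  <tr><td><a href="/teams/42">Team page</a></td></tr>
  <tr><td><a href="/contests/1234567/play_by_play">Play by Play</a></td></tr>
</table>
"""

print(extract_game_ids(sample_html))  # ['1234567', '1234568']
```

Running the same function over one page per calendar date would produce the season-wide list of IDs, each of which can then be used to request that game's detail pages.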

Estimating Referee Travel

To measure travel, I geocoded venue names into latitude and longitude coordinates using Nominatim (OpenStreetMap). I then calculated straight-line distances between each official's assignments across the season. This helped reconstruct a proxy travel itinerary for every referee.
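As a rough illustration of the distance step, the sketch below computes great-circle (haversine) distance between geocoded venues and sums the legs of one referee's season in date order. The haversine formula, the kilometre unit, the tuple layout, and the function names are assumptions for this example; the project may well have used geopy's built-in distance helpers instead.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; kilometres are an assumed unit


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def season_distance_km(assignments):
    """Sum venue-to-venue legs for one referee.

    assignments: iterable of (game_date, lat, lon) tuples in any order.
    """
    ordered = sorted(assignments, key=lambda a: a[0])
    return sum(
        haversine_km(prev[1], prev[2], cur[1], cur[2])
        for prev, cur in zip(ordered, ordered[1:])
    )


# Hypothetical itinerary with made-up coordinates, listed out of order on purpose.
itinerary = [
    (date(2025, 1, 3), 0.0, 2.0),
    (date(2025, 1, 1), 0.0, 0.0),
    (date(2025, 1, 2), 0.0, 1.0),
]

print(round(season_distance_km(itinerary), 1))
```

Chaining venue to venue like this ignores any trips home between games, which is one reason the result is best read as a proxy rather than actual miles traveled.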

Building the Interactive Dashboard

To make the data actionable, I developed an R Shiny dashboard to visualize referee workload and travel behavior. It supports interactive exploration of referee patterns and flags outliers in assignment strategy.

Dashboard overview showing referee workload patterns

Key Features

Overview Tab

Rankings tab showing referee performance metrics

Referee Rankings Tab

Individual referee analysis view

Individual Referee Tab

Individual referee analysis view showing travel patterns and workload

Limitations and Design Choices

Behind the Scenes: Distance Calculation

A custom Python class was developed to:
