Modeling Processes of Neighborhood Change (MPONC)

Reference paper

@misc{mori2024modelingprocessesneighborhoodchange,
      title={Modeling Processes of Neighborhood Change}, 
      author={J. Carlos Martínez Mori and Zhanzhan Zhao},
      year={2024},
      eprint={2401.03307},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2401.03307}, 
}

Setup

cd 25Sp-MPONC/modeling_processes_of_neighborhood_change_new
conda create -n mponc python=3.10.16
conda activate mponc
pip install -r requirements.txt
python main.py

Abstract

This research project simulates the impact of the Atlanta Beltline on the surrounding metropolitan area using game theory. The simulation models agent movement across census tracts within the Atlanta-Sandy Springs-Roswell metro region, with agents seeking to move optimally (seeking 'attractive' census tracts) based on various factors.

Intro and Description

This project is based on the reference paper by Dr. Martinez and Dr. Zhao, which aims to address the following:
- How does the layout of transportation infrastructure affect the demographics of nearby neighborhoods?
- Does the creation of this infrastructure actually benefit everyone equally; is it fair?
- Can we predict the effects on surrounding communities before these structures are actually built?

These questions are primarily motivated by the issue of gentrification, an issue prevalent in many major cities. We utilized concepts in game theory, more specifically no-regret dynamics, in order to simulate the effects of the Atlanta Beltline on gentrification. To summarize our approach with no-regret dynamics:

People, or 'agents', randomly move from region to region. Depending on the region's attributes, a 'cost' value is assigned to each action.
'Cost' is a function of region attractiveness, affordability, and community.
The higher the cost, the less likely an agent is to visit that census tract in the future.
This process is repeated until the probability distribution of visiting census tracts converges: an equilibrium is reached, and further actions make no difference.
We measure simulation convergence using the total‑variation distance between two sliding windows of recent agent distributions:
```
\mathrm{TV}(p,q)\;=\;\tfrac12\sum_{c}\lvert\,p(c)-q(c)\rvert\,.
```
If TV ≤ EPS_CONVERGENCE (default = 0.005) the system is deemed converged and the run halts automatically. All thresholds are configurable in config.py.
Alternatively, we can use a hardcoded runtime.
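
The convergence test above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names and the window format (a list of per-timestep dictionaries mapping tract id to agent count) are assumptions; only EPS_CONVERGENCE and its default come from the text.

```python
from collections import Counter

EPS_CONVERGENCE = 0.005  # default threshold quoted above


def empirical_distribution(window):
    """Pool a window of per-timestep tract counts into one distribution.

    `window` is a list of dicts mapping tract id -> agent count
    (a hypothetical format chosen for this sketch).
    """
    totals = Counter()
    for counts in window:
        totals.update(counts)  # adds counts tract by tract
    n = sum(totals.values())
    return {tract: count / n for tract, count in totals.items()}


def total_variation(p, q):
    """TV(p, q) = 1/2 * sum_c |p(c) - q(c)| over the union of supports."""
    tracts = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in tracts)


def has_converged(older_window, newer_window, eps=EPS_CONVERGENCE):
    """True when the two windows differ by at most `eps` in TV distance."""
    p = empirical_distribution(older_window)
    q = empirical_distribution(newer_window)
    return total_variation(p, q) <= eps
```

Comparing pooled windows rather than single timesteps smooths out the noise of individual relocations, which is presumably why the check uses sliding windows.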

Cost Function

Every agent evaluates a tract with a cost defined as

cost = 1 – (affordability × attractiveness × community)

Factor	Scale	Quick intuition
Affordability	0 or 1	1 if the tract still has room or the agent is randomly selected to be an inhabitant (weight scales with relative wealth); 0 otherwise.
Attractiveness	0–1	How nice the tract is to live in (see sub‑components below).
Community	0–1	How close the agent’s income is to the local average—closer ⇒ higher score.

Attractiveness = upkeep × amenity_access × beltline_factor

Sub‑component	Range	What it captures
Upkeep	0 or 1	0 if the tract is abandoned (no residents); 1 otherwise.
Amenity access	0–1	Density of key POIs (restaurants, shops, transit stops, etc) weighted by distance.
BeltLine factor β	≤ 1	Extra accessibility for tracts in the BeltLine zone: β = 1.00 at ≤ 800 m, tapering linearly to β = 0.917 (1.10/1.20) at 1.6 km, then β = 0.833 (1.00/1.20) beyond 1.6 km.

* Amenity list adapted from 24Sp‑Mobility‑Seg; we omit several tags such as “shed”, “guardhouse”, “ferry_terminal”, “garages”, and “bridge”.

Implemented Amenities & weights (OpenStreetMap labels):

AMENITY_TAGS = {
      'amenity': ("bus_station|cafe|college|fast_food|food_court|fuel|library|restaurant|train_station|university|parking|school|hospital", 3),
      'shop': ("supermarket|food|general|department_store|mall|wholesale", 3),
      'landuse': ("residential|industrial|commercial|retail", 2)
}

* We operationalize β by giving tracts within 800 m of the BeltLine a +20 % boost to their Attractiveness score (β = 1.20/1.20); the boost then tapers linearly to +10 % at 1.6 km, and falls to β = 1.00/1.20 beyond that distance. These concrete percentages and distance bands approximate the BeltLine’s observed catchment zone and its predicted effect on nearby housing prices.
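
As a rough sketch of the taper described above, the function below maps a tract's distance from the BeltLine to the normalized factor β. The function and parameter names are hypothetical; the distances and percentages are the ones quoted in the text, and the real values live in the project's configuration.

```python
def beltline_factor(distance_m, near_m=800.0, far_m=1600.0,
                    near_boost=1.20, far_boost=1.10, base=1.00):
    """Return the normalized BeltLine factor beta (<= 1) for one tract.

    Raw boost: 1.20 within 800 m, tapering linearly to 1.10 at 1.6 km,
    and 1.00 beyond that. Dividing by the maximum boost keeps beta <= 1.
    """
    if distance_m <= near_m:
        raw = near_boost
    elif distance_m <= far_m:
        # Linear interpolation between the near and far boosts.
        t = (distance_m - near_m) / (far_m - near_m)
        raw = near_boost + t * (far_boost - near_boost)
    else:
        raw = base
    return raw / near_boost
```

For example, a tract 1.2 km away sits halfway along the taper, so its raw boost is 1.15 and β = 1.15/1.20.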

Community Score (Local Moran’s I)

We quantify how well an agent’s income matches its neighbours using Local Moran’s I:

\[ I_c = \frac{(w_c - \bar w)}{S^2} \sum_{j \in N(c)} w_{cj}\,(w_j - \bar w), \quad S^2 = \frac{1}{n}\sum_{k=1}^n (w_k - \bar w)^2 \]

w_c = average income in tract c
w̄ = regional mean income
w_{cj} = spatial weight (1 for adjacent tracts, 0 otherwise)
N(c) = neighbouring tracts of c

Each agent i with income w_i converts this statistic into a smooth score:

\[ \mathrm{Community}_i(c) = \exp\bigl(-\alpha\,\lvert w_i - I_c\rvert\bigr) \]

where α is set in config.py.
A closer match ⇒ value near 1 ⇒ lower cost.
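
The two formulas above can be sketched in plain Python as follows. The data layout (dictionaries keyed by tract id) and the function names are assumptions made for illustration; the sketch also assumes incomes are on a normalized scale and are not all identical, since S² would otherwise be zero.

```python
import math


def local_morans_i(incomes, neighbours):
    """Local Moran's I per tract with binary adjacency weights.

    incomes:    dict tract -> average income w_c
    neighbours: dict tract -> iterable of adjacent tracts N(c)
    """
    n = len(incomes)
    mean = sum(incomes.values()) / n
    variance = sum((w - mean) ** 2 for w in incomes.values()) / n  # S^2
    result = {}
    for c, w_c in incomes.items():
        # Spatial lag: w_cj = 1 for adjacent tracts, 0 otherwise.
        lag = sum(incomes[j] - mean for j in neighbours.get(c, ()))
        result[c] = (w_c - mean) / variance * lag
    return result


def community_score(agent_income, i_c, alpha):
    """Community_i(c) = exp(-alpha * |w_i - I_c|), a value in (0, 1]."""
    return math.exp(-alpha * abs(agent_income - i_c))
```

A score of exactly 1 occurs only when the agent's income equals the tract's statistic; larger α makes the score fall off faster as the mismatch grows.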

Weighting amenity access vs community (λ)

A tunable parameter λ ∈ [0, 1] lets you emphasize either amenity access (high λ) or community match (low λ).
Internally we rewrite

cost = 1 - [[Affordability] × [Upkeep × (λ × AmenityAccess)] × [(1-λ) × Community]]
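
A literal transcription of the rewritten cost into Python might look like the sketch below. The function name and argument order are hypothetical, and the BeltLine factor is left out because the formula above does not include it.

```python
def tract_cost(affordability, upkeep, amenity_access, community, lam):
    """cost = 1 - Affordability * [Upkeep * (lam * AmenityAccess)] * [(1 - lam) * Community]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    weighted_attractiveness = upkeep * (lam * amenity_access)
    weighted_community = (1.0 - lam) * community
    return 1.0 - affordability * weighted_attractiveness * weighted_community
```

Under this product form, λ = 0 or λ = 1 zeroes one of the brackets and gives cost = 1 for every tract, so interior values of λ are the informative ones.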

The Four-Step Model

Given that the agents move across various subregions in our simulation, one of the critical steps is determining the mode of transport of each agent. To do this in a way that accurately represents real-world distributions, we turned to the four-step model, a standard travel-demand forecasting framework.

The model has four components:

Trip Generation: This part of the model estimates the number of trips originating from or destined for a specific area. It focuses on understanding how many trips are generated rather than specific travel patterns. This process usually involves some type of data pertaining to the area at hand, such as demographics, income, or land usage.
Trip Distribution: This part of the model estimates the number of trips for routes that go from one area to another, as determined in the trip generation step. This process is typically done using the gravity model, which assumes that the number of trips is positively correlated with the attractiveness of an area and inversely correlated with distance.
Mode Choice: This part of the model determines the mode of transportation used to make the trips. This is typically done by considering demographic data (such as the percentage of people with cars) in an area.
Route Assignment: This part of the model determines the routes travelers take between origins and destinations. This is typically done by assigning each trip to the route with the shortest possible travel time.

Our approach closely follows these four components. We first generate trips by considering the amenity density of areas. We sum up all amenity densities and divide each area's density by this sum to obtain a probability. We then draw the number of trips from a Poisson distribution whose mean is a base number of trips multiplied by that probability. We then consider trip distribution through a modified gravity model. The equation for our model is the following, given that we aim to go from area/region i to j:

We essentially multiply the total number of trips originating in area i by a ratio: the net amenity score of destination j times the transportation cost of the trip from i to j, divided by that same product (net amenity score times transportation cost from i) summed over all destinations.
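
In symbols, the description above corresponds to a production-constrained gravity form. The notation is an assumption introduced here for readability: T_i is the number of trips generated in area i, A_j is the net amenity score of destination j, and f(c_ij) is the transportation-cost term for the trip from i to j.

```latex
T_{ij} \;=\; T_i \cdot \frac{A_j \, f(c_{ij})}{\sum_{k} A_k \, f(c_{ik})}
```

By construction the shares sum to one over destinations, so every trip generated in area i is assigned to some destination.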

For our modal split, we assume that the car ownership rate is 0.7, and that the transit rate is 0.3. Each region's trips are split based on this. We then assign these routes based on the shortest possible distance.

Through this process, we were able to have a methodical way of distributing the agents across Atlanta based on area factors such as amenity density.
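
The generation, distribution, and modal-split steps described above can be sketched with the standard library alone. Everything named here is hypothetical (function names, data layout, the hand-rolled Poisson sampler); in particular, `impedance` stands in for the "transportation cost" term and is assumed to be a weight that is larger for cheaper trips. Route assignment is omitted.

```python
import math
import random


def trip_probabilities(amenity_density):
    """Normalize amenity densities into per-area trip probabilities."""
    total = sum(amenity_density.values())
    return {area: d / total for area, d in amenity_density.items()}


def sample_poisson(mean, rng):
    """Knuth's method; adequate for the small means used in a sketch."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def distribute_trips(trips_from_i, origin, amenity, impedance):
    """Split trips leaving `origin` across destinations (gravity model)."""
    weights = {j: amenity[j] * impedance[(origin, j)]
               for j in amenity if j != origin}
    total = sum(weights.values())
    return {j: trips_from_i * w / total for j, w in weights.items()}


def modal_split(trips, car_rate=0.7):
    """Split trips 70/30 between car and transit, as assumed in the text."""
    return {"car": trips * car_rate, "transit": trips * (1.0 - car_rate)}
```

A typical call chain would sample `sample_poisson(base_trips * p_i, rng)` for each area i, pass the result to `distribute_trips`, and then apply `modal_split` to each origin-destination flow.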

TIGER/Line Geodatabases shapefiles:

Project status

Outputs & configuration

Our code outputs a GIF to visualize agent behavior over time. Each circle represents the centroid of a census tract - green signifying those in the Atlanta Beltline - and the encircled number is the agent population. Our code also outputs a CSV file containing all the simulated data at every individual timestep.

Data contained in the CSVs: census tract name, agent population, raw average income, average income reported by the census, normalized average incomes, and amenity density.
Note: 'Timestep' refers to a single agent action (relocation); 20,000 timesteps means the agents relocate a total of 20,000 times during the simulation.

GIF

This GIF shows the behavior of 1,000 agents up to 20,000 timesteps, frames being captured every 400 timesteps. Rho=1, alpha=0.25.

Runtimes

(1000 agents, 349 census tracts)
- Simulation (x8): ~19.6 minutes
- GIF creation (x8), 50 frames: ~13 minutes

* Graph, amenities, and centroids are cached after the first build.

Census-based approach

Our project utilizes US Census data in that:
- The geographical regions our agents inhabit correspond directly to US census tracts (they could correspond to any other census-defined geographic unit, e.g. zip codes, housing districts, or school districts).
- Each 'agent' is assigned a 'wealth' value in our simulation. We create this distribution of wealth using Census data (population and median incomes) to represent real-world demographics.

Atlanta Beltline in our Simulation

We automate the process of labelling certain regions as 'in the Atlanta Beltline' by using commuting paths from OpenStreetMap that correspond to the Atlanta Beltline, namely a bike trail and a railway. To experiment with a different beltline, such as one that spans Atlanta horizontally or simply extends north by x miles, we would acquire the OpenStreetMap IDs of existing paths (bike trails, walking paths, roads, etc.) corresponding to our desired Beltline and paste these into config.py. Alternatively, we can create such a path ourselves in OpenStreetMap. Then, any region containing segments of these paths would automatically be marked as "In the Atlanta Beltline".

In config.py, the bike trail and railroad OpenStreetMap IDs for the Beltline are as follows:

""" Beltline 'relation' IDs from Open Street Map """
RELATION_IDS = [8408433, 13048389]

Bike Trail	Railroad

Compare with Atlanta Beltline geography:

Adapting the Model to Other Cities

Although Atlanta serves as our case study, every pipeline stage—census shapefiles, OSM‑derived amenities, cost parameters, and even the BeltLine decision‑agent—can be swapped for a different region:

Geometry & Demographics
• Replace the Fulton/DeKalb TIGER/Line shapefiles with those of your target city.
• Point the MEDIAN_INCOME_URL and POP_URL in config.py to that city’s American Community Survey (ACS) tables.
Transit‑Ring Definition
• Identify (or sketch in OSM) the planned loop / BRT corridor / rail spur you want to study, then list its OSM relation IDs in config.py.
• The same β‑taper and DecisionAgent logic will assign accessibility boosts and density bonuses around the new corridor.
Policy Levers
• Tweak RHO_SCALAR to explore how strong the up‑zoning response should be for the above transit ring.

Because the simulation is purely data‑driven, you can rapidly prototype “what‑if” BeltLine analogues for anywhere with open census and OSM data while measuring potential community shifts/gentrification before shovels hit the ground.

Example: by changing the above URLs, we get the following:

[Figure: simulation output for the alternate region]

Policy Scenarios: Vertical vs Horizontal Scaling

The simulation now supports two high-level policy experiments:

1. Vertical Scaling — Decision-Making Agent learns \(m\)

A dedicated DecisionAgent treats “how aggressively should we up-zone?” as a learning problem:

Action space:

\[ m \in \{0.00,\,0.01,\,\dots,\,1.00\} \]

sampled each timestep by multiplicative‑weights (no‑regret).

Base capacity curve:

\[ U_c = 1 + \frac{\texttt{beltline\_score}_c - \texttt{BL\_LOW}}{\texttt{BL\_HIGH} - \texttt{BL\_LOW}} \times \bigl(\texttt{RHO\_SCALAR}_{max} - 1\bigr) \]

Effective multiplier

\[ \rho_c \;\leftarrow\; \rho_c \times \bigl[1 + m\,(U_c - 1)\bigr] \]
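
The base capacity curve and the effective multiplier above translate directly into two small functions. The function and argument names are hypothetical stand-ins for the configuration constants; no clamping is applied because the formulas do not specify any.

```python
def base_capacity(beltline_score, bl_low, bl_high, rho_scalar_max):
    """U_c = 1 + (score - BL_LOW) / (BL_HIGH - BL_LOW) * (RHO_SCALAR_max - 1)."""
    frac = (beltline_score - bl_low) / (bl_high - bl_low)
    return 1.0 + frac * (rho_scalar_max - 1.0)


def effective_rho(rho, m, u_c):
    """rho_c <- rho_c * [1 + m * (U_c - 1)] for an up-zoning intensity m in [0, 1]."""
    return rho * (1.0 + m * (u_c - 1.0))
```

With m = 0 the multiplier leaves ρ unchanged, and with m = 1 the tract receives its full base capacity U_c, so m interpolates linearly between no up-zoning and maximum up-zoning.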

Two alternative utility metrics guide learning:

`UTILITY_METRIC`	Algorithm maximises	Real‑world analogy
`0` (default)	average utility (mean well‑being across all agents)	“Greatest good for the greatest number.”
`1`	minimum utility (well‑being of the worst‑off agent)	Rawlsian / max‑min fairness.

The DecisionAgent reinforces actions that raise the chosen utility, gradually converging to an ideal m for the current policy goal.
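
To make the learning loop concrete, here is a minimal sketch of a multiplicative-weights learner over the action grid for m. It uses Exp3, a bandit variant of multiplicative weights, on the assumption that only the utility of the sampled m is observed each timestep; the actual DecisionAgent may use a different update. The class name, `gamma`, and `chosen_utility` are hypothetical, and utilities are assumed to lie in [0, 1].

```python
import math
import random

ACTIONS = [i / 100 for i in range(101)]  # m in {0.00, 0.01, ..., 1.00}


class Exp3:
    """Exp3: multiplicative weights with importance-weighted bandit feedback."""

    def __init__(self, n_actions, gamma=0.1, seed=None):
        self.n = n_actions
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0] * n_actions
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def sample(self):
        """Draw an action index according to the current distribution."""
        return self.rng.choices(range(self.n), weights=self.probabilities())[0]

    def update(self, action, reward):
        """Reinforce `action` with a reward in [0, 1]."""
        p = self.probabilities()[action]
        self.weights[action] *= math.exp(self.gamma * (reward / p) / self.n)


def chosen_utility(agent_utilities, utility_metric=0):
    """UTILITY_METRIC 0 -> mean utility; 1 -> minimum (max-min fairness)."""
    if utility_metric == 0:
        return sum(agent_utilities) / len(agent_utilities)
    return min(agent_utilities)
```

Each timestep would then call `sample()` to pick an index into `ACTIONS`, run the simulation step with that m, and feed `chosen_utility(...)` back through `update()`.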

These “concrete zoning‑bonus percentages and distance bands” are the literal numbers (+20 %, +10 %, 800 m, 1600 m) encoded in DecisionAgent.py. Feel free to edit them in config.py.

2. Horizontal Scaling (complete BeltLine from day 0)

All census tracts whose centroids fall inside the 1.6 km BeltLine catchment zone begin with BeltLine factor β > 1 (default β = 1.20, editable in config.py).
Tracts outside that zone keep β = 1.00. This models an “all‑at‑once” completion of the transit loop.

Sobol Sensitivity Analysis

During calibration we ran a Sobol variance‑decomposition on six candidate features
(Affordability, Attractiveness, Community, Location, BeltLine, Upkeep).

Feature	1st‑order index	Total‑order index
Affordability	0.42	0.45
Attractiveness	0.31	0.36
Community	0.18	0.22
Others	< 0.05	< 0.10

Because the cumulative contribution of the remaining three factors was < 10 %, we compressed the cost function to the product Affordability × Attractiveness × Community. The full Jupyter notebook lives in notebooks/sensitivity_sobol.ipynb.

Strengths and Weaknesses

Strengths

Our approach is highly modular. For instance, our code can easily be run on other regions, with customizable 'Beltlines' and the definitio. It simply needs lists of agents, a NetworkX graph, and other generalized parameters to operate. Furthermore, our approach is backed by an established model of human behavior (no-regret dynamics) and uses an established trip distribution framework (the four-step model). We are also able to produce dynamic visuals (GIFs).

Weaknesses

Our simulation assumes that there is no immigration into or emigration out of Atlanta, as we set a fixed number of agents. We also limit transportation choices to cars and public transportation, despite other modes of transport (walking or biking) being popular. Additionally, our runtimes are relatively long due to the computationally expensive nature of the simulation. Ideally, our simulation would run in just a couple of minutes or even seconds.

Next Steps

We hope to publish this research paper by this coming Fall semester. Most notably, we hope to improve the readability of our GIFs, improve the runtime of the simulation, validate our simulation's accuracy, and include additional visualizations of our results to better communicate our analysis during discussion.

Presentation

Team

Name	Seniority	Major	School	# Semesters	GitHub Handle
Matthew Lim	Sophomore	Computer Science	SCS	2	mlim70
Justin Xu	Sophomore	Computer Science	SCS	1	JXU037
Devam Mondal	Junior	Computer Science	SCS	2	Dodesimo
Nithish Sabapathy	Junior	Computer Science	SCS	1	nithish101