Preparing data for regression discontinuity design

geoRDDprep

geoRDDprep is a Python package that streamlines data preparation for Geographical Regression Discontinuity Design (GeoRDD). It provides efficient tools for spatial joins and polygon-to-line conversion, and it implements the Turner et al. (2014) algorithm for assigning points to boundaries.

Features

  • points_in_polygon: Efficiently assign points to polygons (e.g., addresses to school districts).
  • turner: Assign points to LineStrings based on orthogonal distance criteria (Turner et al., 2014).
  • poly_to_line: Convert Polygon geometries to LineStrings for boundary analysis.
  • drop_tiny_lines: Filter out small, noisy line segments to improve analysis quality.
  • remove_sliver: Clean up sliver polygons using Voronoi diagrams.
  • remove_overlaps: Remove overlapping segments between line datasets.

Installation

From the root of the source tree, you can install the package directly:

pip install .

Or, if you are developing the package, install it in editable mode:

pip install -e .

Usage

1. Assign Points to Polygons

Map addresses or other points to their respective administrative regions.

import geopandas as gpd
from geoRDDprep import points_in_polygon

# Load your data
points = gpd.read_file("addresses.geojson")
districts = gpd.read_file("districts.geojson")

# Assign points to districts
# The resulting GeoDataFrame will have columns from 'districts' suffixed with '_district'
result = points_in_polygon(points, districts, suffix_name="_district")
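To make the idea behind this step concrete, here is a minimal pure-Python sketch of point-in-polygon assignment using ray casting. It is an illustration only and not how geoRDDprep implements points_in_polygon; the helper names (point_in_ring, assign_points) and the toy data are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Return True if (x, y) lies inside the ring, given as a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def assign_points(points, polygons):
    """Map each point id to the id of the first polygon containing it, or None."""
    result = {}
    for point_id, (x, y) in points.items():
        result[point_id] = next(
            (poly_id for poly_id, ring in polygons.items() if point_in_ring(x, y, ring)),
            None,
        )
    return result


# Hypothetical toy data: two adjacent square districts and three addresses.
districts_demo = {
    "west": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "east": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
addresses_demo = {"a": (1, 1), "b": (7, 2), "c": (12, 3)}

assignment = assign_points(addresses_demo, districts_demo)
print(assignment)
```

Points that fall outside every polygon map to None here; how the package itself treats unmatched points is not stated above.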

2. Prepare Boundaries (Polygons to Lines)

Convert polygon boundaries into lines for distance analysis.

from geoRDDprep import poly_to_line, drop_tiny_lines

# Convert polygons to lines
lines = poly_to_line(districts)

# Clean up noise by dropping very short lines (e.g., < 500 meters)
clean_lines = drop_tiny_lines(lines, method='length', meters=500)
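The two operations above can be sketched in plain Python: split a polygon ring into boundary segments, then discard segments shorter than a threshold. This is a simplified illustration that assumes planar coordinates already expressed in meters; the helper names are hypothetical, and the package's own poly_to_line and drop_tiny_lines may work differently (for example, on whole LineStrings rather than single segments).

```python
import math


def ring_to_segments(ring):
    """Split a polygon ring (list of (x, y) vertices) into its boundary segments."""
    n = len(ring)
    return [(ring[i], ring[(i + 1) % n]) for i in range(n)]


def segment_length(segment):
    """Euclidean length of a segment given as ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = segment
    return math.hypot(x2 - x1, y2 - y1)


def drop_short_segments(segments, min_length):
    """Keep only segments at least min_length long (same units as the coordinates)."""
    return [s for s in segments if segment_length(s) >= min_length]


# Hypothetical district outline in meters, with one 100 m jog on its east side.
district_ring = [(0, 0), (2000, 0), (2000, 100), (2000, 1500), (0, 1500)]

boundary_segments = ring_to_segments(district_ring)
kept_segments = drop_short_segments(boundary_segments, min_length=500)

print(len(boundary_segments), "segments before filtering")
print(len(kept_segments), "segments after filtering")
```

A length filter is only meaningful when the coordinates are in a projected coordinate system, so the data would need to be reprojected first if it is stored in degrees.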

3. Turner Algorithm

Assign points to boundaries based on distance and orthogonality.

from geoRDDprep import turner

# Match points to the nearest boundary within 15 meters
# The 'turner_pass' column is True where a point satisfies the criteria
matched_data = turner(points, clean_lines, orth_distance=15)
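As a rough illustration of what "distance and orthogonality" could mean, the sketch below tests whether a point projects perpendicularly onto a line segment and lies within a maximum distance of it. This is one plausible reading, stated as an assumption; it has not been checked against Turner et al. (2014) or against the package's turner function, and orthogonal_match is a hypothetical helper.

```python
import math


def orthogonal_match(point, segment, max_distance):
    """Return (passes, distance) for a point and a segment ((x1, y1), (x2, y2)).

    Assumed criterion: the perpendicular foot of the point must fall on the
    segment itself, and the perpendicular distance must not exceed max_distance.
    The returned distance is measured to the infinite line through the segment.
    """
    px, py = point
    (x1, y1), (x2, y2) = segment
    dx, dy = x2 - x1, y2 - y1
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: no direction, so no orthogonal projection exists.
        return False, math.hypot(px - x1, py - y1)
    # Position of the perpendicular foot along the segment (0 = start, 1 = end).
    t = ((px - x1) * dx + (py - y1) * dy) / seg_len_sq
    foot_x, foot_y = x1 + t * dx, y1 + t * dy
    distance = math.hypot(px - foot_x, py - foot_y)
    passes = 0.0 <= t <= 1.0 and distance <= max_distance
    return passes, distance


# Hypothetical boundary running 100 m along the x-axis.
boundary = ((0.0, 0.0), (100.0, 0.0))

near = orthogonal_match((50.0, 10.0), boundary, max_distance=15)
far = orthogonal_match((50.0, 40.0), boundary, max_distance=15)
beyond_end = orthogonal_match((120.0, 5.0), boundary, max_distance=15)

print(near, far, beyond_end)
```

Under this reading, a point just past the end of a boundary fails even when it is close, because its nearest point on the segment is an endpoint rather than a perpendicular foot.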

Requirements

  • geopandas
  • shapely
  • numpy
  • pandas
  • scipy

License

MIT License
