Preparing data for regression discontinuity design
🌍 geoRDDprep
Streamline your Geographical Regression Discontinuity Design (GeoRDD) workflow.
geoRDDprep is a high-performance Python toolkit designed to simplify spatial data preparation for boundary analysis. Whether you are an economist, political scientist, or data analyst, this package helps you assign points to districts, clean up messy polygons, and apply rigorous spatial algorithms (such as the Turner orthogonal distance criteria) with ease.
🚀 Key Features
- ⚡️ Vectorized & Fast: Rewritten to use fully vectorized operations via geopandas and shapely, delivering up to 18x speedups compared to standard row-by-row iteration.
- 📐 Bug-Free Turner Algorithm: Out-of-the-box implementation of the orthogonal distance criteria from Turner et al. (2014), with robust math handling both horizontal and vertical boundary projections.
- 🧹 Smart Sliver Cleaning: Merge sliver polygons and boundary gaps using Voronoi diagrams with automatic ID mapping and dynamic padding (making it compatible with both degree and metric coordinate systems).
- 🔄 Automatic CRS Alignment: Automatically detects Coordinate Reference System (CRS) mismatches and re-projects inputs dynamically (issuing a warning instead of crashing). Fully supports naive geometries (where crs = None).
- 🛠️ Easy Integration: Integrates seamlessly with your existing pandas and geopandas pipelines.
📦 Installation
Install directly from PyPI:
pip install geoRDDprep
🛠️ API Reference
1. points_in_polygon
Assigns polygon characteristics to points that fall within them.
def points_in_polygon(
    points_gdf: gpd.GeoDataFrame,
    polygons_gdf: gpd.GeoDataFrame,
    suffix_name: str
) -> gpd.GeoDataFrame
- points_gdf: Point geometries.
- polygons_gdf: Polygon geometries with attributes to join.
- suffix_name: Suffix to append to joined polygon columns. Overlapping column names are cleanly renamed with a single underscore (e.g., id_district instead of id__district).
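To make the idea concrete, here is a minimal plain-Python sketch of a point-in-polygon join with single-underscore suffixing. It is an illustration only, not the package's vectorized implementation: the helpers point_in_ring, join_name, and assign_attributes are hypothetical, polygons are simplified to a single outer ring, and the sketch assumes every joined polygon attribute receives the suffix.

```python
def point_in_ring(pt, ring):
    """Ray-casting test: True if pt lies inside the closed ring of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_name(column, suffix):
    """Join column and suffix with exactly one underscore (hypothetical helper)."""
    return f"{column.rstrip('_')}_{suffix.lstrip('_')}"


def assign_attributes(points, polygons, suffix):
    """Copy the attributes of the first containing polygon onto each point."""
    rows = []
    for point in points:
        row = dict(point)
        for poly in polygons:
            if point_in_ring(point["xy"], poly["ring"]):
                for key, value in poly["attrs"].items():
                    row[join_name(key, suffix)] = value
                break
        rows.append(row)
    return rows


square = {"ring": [(0, 0), (10, 0), (10, 10), (0, 10)], "attrs": {"id": "A"}}
points = [{"xy": (5, 5)}, {"xy": (15, 5)}]
joined = assign_attributes(points, [square], "_district")
```

In this sketch the first point picks up an id_district value, while the second point, which lies outside the square, is left without joined attributes.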
2. poly_to_line
Converts polygon boundaries into LineStrings.
def poly_to_line(polygon_gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame
- polygon_gdf: Polygons or MultiPolygons to convert.
Returns exploded boundary LineString elements, preserving all original attributes.
3. turner
Checks whether points satisfy the Turner et al. (2014) orthogonal distance criteria relative to boundaries.
def turner(
    points_gdf: gpd.GeoDataFrame,
    boundaries_gdf: gpd.GeoDataFrame,
    *,
    orth_distance: float = 15.0,
    reduced: bool = True,
    unit_crs: int = 3857
) -> gpd.GeoDataFrame
- points_gdf: Point geometries.
- boundaries_gdf: LineString boundary geometries.
- orth_distance (keyword-only): Orthogonal distance threshold (in meters). Default 15.
- reduced (keyword-only): If True, returns only original columns plus the turner_pass boolean result. Default True.
- unit_crs (keyword-only): EPSG code for metric distance calculation. Default 3857 (Web Mercator).
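As a rough illustration of the geometry behind an orthogonal distance check, the sketch below projects a point onto a single straight boundary segment in plain Python. It assumes planar coordinates in meters, and it treats a point as passing only when the foot of the perpendicular falls strictly inside the segment and the perpendicular distance is within the threshold. The function name passes_orthogonal_check is hypothetical, and the package's vectorized implementation and its exact handling of vertices may differ.

```python
import math


def passes_orthogonal_check(point, seg_start, seg_end, max_dist=15.0):
    """Simplified orthogonal-distance test against one segment (illustrative only)."""
    px, py = point
    ax, ay = seg_start
    bx, by = seg_end
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return False  # degenerate segment: no direction to project onto

    # t is the position of the perpendicular foot along the segment (0 = start, 1 = end).
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    if not 0 < t < 1:
        return False  # the foot lies at or beyond an endpoint

    foot_x, foot_y = ax + t * dx, ay + t * dy
    return math.hypot(px - foot_x, py - foot_y) <= max_dist


# Works the same for horizontal and vertical segments, since no slope is computed.
on_horizontal = passes_orthogonal_check((50.0, 10.0), (0.0, 0.0), (100.0, 0.0))
on_vertical = passes_orthogonal_check((10.0, 50.0), (0.0, 0.0), (0.0, 100.0))
```

Using the dot-product projection avoids dividing by a slope, which is one way to handle horizontal and vertical boundaries uniformly.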
4. drop_tiny_lines
Filters out small boundary LineStrings to reduce map noise.
def drop_tiny_lines(
    boundaries_gdf: gpd.GeoDataFrame,
    method: str = 'percentile',
    *,
    percentile: float = 0.01,
    num_dev: float = 2.0,
    meters: float = 500.0,
    reduced: bool = True,
    unit_crs: int = 3857
) -> gpd.GeoDataFrame
- method: Threshold method ('percentile', 'number_of_std', or 'length').
- percentile (keyword-only): Quantile threshold (0-1). Default 0.01.
- num_dev (keyword-only): Number of standard deviations below the mean. Default 2.0.
- meters (keyword-only): Length cutoff in meters. Default 500.0.
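The three threshold methods can be pictured with a small plain-Python sketch that turns a list of segment lengths into a cutoff. This is not the package's code: the helper length_cutoff is hypothetical, and it assumes linear interpolation for the quantile, the sample standard deviation, and that segments at or above the cutoff are kept.

```python
import statistics


def length_cutoff(lengths, method="percentile", *, percentile=0.01,
                  num_dev=2.0, meters=500.0):
    """Return the minimum length to keep under each method (illustrative only)."""
    if method == "length":
        return meters
    if method == "number_of_std":
        # Cutoff sits num_dev sample standard deviations below the mean length.
        return statistics.mean(lengths) - num_dev * statistics.stdev(lengths)
    if method == "percentile":
        ordered = sorted(lengths)
        pos = percentile * (len(ordered) - 1)
        lower = int(pos)
        upper = min(lower + 1, len(ordered) - 1)
        # Linear interpolation between the two neighbouring order statistics.
        return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])
    raise ValueError(f"unknown method: {method!r}")


lengths_m = [100.0, 200.0, 300.0, 400.0, 500.0]
cutoff = length_cutoff(lengths_m, method="percentile", percentile=0.25)
kept = [length for length in lengths_m if length >= cutoff]
```

Note that with short or highly variable segments the 'number_of_std' cutoff in this sketch can fall below zero, in which case nothing would be dropped.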
5. remove_sliver
Cleans sliver polygons and gaps by assigning them to their nearest neighbor using a Voronoi diagram.
def remove_sliver(
    polygons_gdf: gpd.GeoDataFrame,
    boundary_gdf: gpd.GeoDataFrame,
    *,
    id_col: Optional[str] = None
) -> gpd.GeoDataFrame
- polygons_gdf: Input polygons to clean.
- boundary_gdf: Bounding geometry to clip the output.
- id_col (keyword-only): Name of the unique identifier column. If None, automatically searches for 'id', checks the index name, or uses default indexing.
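The property that makes a Voronoi diagram suitable for this job is that every location in a cell is closer to that cell's site than to any other site. The plain-Python sketch below shows only that nearest-site rule, using one representative coordinate per polygon. It is a simplification under stated assumptions: the helper nearest_site and the sample IDs are hypothetical, and the package builds its diagram from the polygon geometries themselves.

```python
import math


def nearest_site(location, sites):
    """Return the ID of the site closest to location (the Voronoi cell it falls in)."""
    return min(sites, key=lambda site_id: math.dist(location, sites[site_id]))


# One representative (x, y) coordinate per polygon ID, purely for illustration.
sites = {"district_1": (0.0, 0.0), "district_2": (10.0, 0.0)}

# A gap located at (2, 1) is absorbed by whichever district is nearest.
owner = nearest_site((2.0, 1.0), sites)
```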
6. remove_overlaps
Removes overlapping line segments from df1 that are present in df2.
def remove_overlaps(df1: gpd.GeoDataFrame, df2: gpd.GeoDataFrame) -> gpd.GeoDataFrame
- df1: The GeoDataFrame containing LineStrings to clean.
- df2: The GeoDataFrame containing geometries to subtract.
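Subtracting one set of lines from another is easiest to picture in one dimension. The sketch below is only an analogy in plain Python, where each line is an interval along a shared axis; the helper subtract_intervals is hypothetical, and the real function operates on LineString geometries in a GeoDataFrame.

```python
def subtract_intervals(base, cuts):
    """Return the parts of the interval base that no interval in cuts covers."""
    start, end = base
    remaining = []
    cursor = start  # left edge of the part of base not yet processed
    for cut_start, cut_end in sorted(cuts):
        if cut_end <= cursor or cut_start >= end:
            continue  # this cut does not touch the unprocessed part
        if cut_start > cursor:
            remaining.append((cursor, cut_start))
        cursor = max(cursor, cut_end)
        if cursor >= end:
            break
    if cursor < end:
        remaining.append((cursor, end))
    return remaining


# A 10-unit line with two overlapping stretches removed.
pieces = subtract_intervals((0, 10), [(2, 4), (6, 8)])
```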
🛠️ Usage Examples
1. Assign Addresses to Districts
import geopandas as gpd
from geoRDDprep import points_in_polygon
points = gpd.read_file("addresses.geojson")
districts = gpd.read_file("school_districts.geojson")
# Merges district characteristics into matching points
result = points_in_polygon(points, districts, suffix_name="_district")
print(result.head())
2. The Turner Algorithm (2014)
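Check whether points lie within 15 meters of orthogonal distance from school district boundaries and are not close to vertices or endpoints. This example reuses the points and districts GeoDataFrames loaded in the previous example.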
from geoRDDprep import poly_to_line, drop_tiny_lines, turner
# 1. Convert school district polygons to boundary lines
lines = poly_to_line(districts)
# 2. Remove tiny boundary segments (less than 500m) to reduce noise
clean_lines = drop_tiny_lines(lines, method='length', meters=500)
# 3. Match points to boundaries (within 15m)
matched_data = turner(points, clean_lines, orth_distance=15)
# Check which points passed the Turner check
print(matched_data['turner_pass'].value_counts())
3. Clean Slivers and Gaps
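from geoRDDprep import remove_sliver
# messy_polygons and boundary_clip are GeoDataFrames loaded beforehand, e.g. with gpd.read_file(...)
# Merge gaps/slivers into neighbor polygons using a custom identifier column
clean_polygons = remove_sliver(messy_polygons, boundary_clip, id_col="district_code")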
🧪 Running Tests
To verify package modifications, you can run the test suite using pytest.
- Install test dependencies:
pip install pytest
- Run tests:
pytest tests/
🤝 Contributing
We welcome contributions!
- Fork the repository.
- Create a feature branch (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -m 'Add AmazingFeature').
- Push to the branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
📄 License
Distributed under the MIT License. See LICENSE for more information.