Python toolkit for reproducible NYC 311 complaint analysis via a typed SDK and CLI.
nyc311
Authored by Blaise Albis-Burdige.
What this package does
nyc311 is the stable 1.x toolkit for turning NYC 311 service-request data
into reproducible complaint-intelligence outputs and publication-quality
statistical analyses.
It pairs a thin CLI with a typed SDK so the same workflow can run in batch jobs, scripts, notebooks, and consumer packages.
The current release line provides:
- load filtered NYC 311-style records from local CSV extracts or the live Socrata API
- derive deterministic first-pass topic labels for supported complaint types
- aggregate complaint topics by borough or community district
- measure topic-rule coverage and summarize resolution gaps
- score anomalies over aggregated topic summaries
- export CSV tables, boundary-backed GeoJSON, and markdown report cards
- expose the workflow through both a thin CLI and a composable functional SDK
- compose domain-specific factor pipelines over geographic units
- build balanced temporal panels with treatment-event modeling and inverse-distance spatial weights
- run interrupted-time-series, PELT changepoint, STL decomposition, Moran's I / LISA, and panel fixed/random-effects regressions
- causal inference: synthetic control, staggered difference-in-differences, event-study plots, regression discontinuity
- spatial econometrics: spatial lag and error models, geographically weighted regression
- equity analysis: Oaxaca-Blinder decomposition, Theil index, reporting-rate adjustment, latent reporting-bias EM
- diagnostics: seasonality-adjusted anomaly detection, power analysis / MDE calculator
- Bayesian: BYM2 small-area smoothing (behind nyc311[bayes])
- point processes: Hawkes self-exciting process for complaint contagion
- bulk-fetch full-city extracts split per borough with .meta.json integrity sidecars
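The integrity-sidecar idea in the last item can be pictured with a small standard-library sketch. The field names (file, bytes, rows, sha256) and the helpers write_sidecar and verify_sidecar are assumptions made for illustration; they are not the format or API that nyc311 itself uses.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def write_sidecar(csv_path: Path) -> Path:
    """Write a hypothetical <name>.meta.json file next to a CSV extract."""
    data = csv_path.read_bytes()
    with csv_path.open(newline="") as handle:
        rows = sum(1 for _ in csv.DictReader(handle))
    meta = {
        "file": csv_path.name,
        "bytes": len(data),
        "rows": rows,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    sidecar = csv_path.with_name(csv_path.name + ".meta.json")
    sidecar.write_text(json.dumps(meta, indent=2, sort_keys=True))
    return sidecar


def verify_sidecar(csv_path: Path) -> bool:
    """Return True when the CSV still matches the digest stored in its sidecar."""
    sidecar = csv_path.with_name(csv_path.name + ".meta.json")
    meta = json.loads(sidecar.read_text())
    return hashlib.sha256(csv_path.read_bytes()).hexdigest() == meta["sha256"]


with tempfile.TemporaryDirectory() as tmp:
    extract = Path(tmp) / "brooklyn.csv"
    extract.write_text("unique_key,complaint_type\n1,Rodent\n2,Rodent\n")
    sidecar_path = write_sidecar(extract)
    sidecar_rows = json.loads(sidecar_path.read_text())["rows"]
    intact = verify_sidecar(extract)
    # Simulate a truncated download: the digest no longer matches.
    extract.write_text("unique_key,complaint_type\n1,Rodent\n")
    truncated_ok = verify_sidecar(extract)
```

Storing the digest beside each per-borough file lets a later run detect a partial or edited extract before any analysis starts.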
Geography layer
nyc311.geographies is the 311-facing compatibility layer over
nyc-geo-toolkit.
Use nyc311 when you want packaged NYC boundaries inside the 311 workflow. Use
nyc-geo-toolkit directly when you only need the generic geography assets,
normalization helpers, and boundary loaders.
factor-factory integration (v1.0.0)
As of v1.0.0, nyc311 wires through to
factor-factory's 17
causal-inference engine families via two additive adapters:
from nyc311.temporal import build_complaint_panel, TreatmentEvent
panel = build_complaint_panel(records, geography="community_district")
# Hand off to any factor-factory engine family:
ff_panel = panel.to_factor_factory_panel()
from factor_factory.engines.did import estimate as did_estimate
results = did_estimate(ff_panel, methods=("twfe",), outcome="complaint_count")
print(results[0].att, results[0].ci_95)
The nyc311.stats modules continue to work as before; eleven of the seventeen
now cross-reference their factor-factory equivalent in a .. note:: block. See
docs/integration.md for the full crosswalk and
docs/migration-v0-to-v1.md for the consumer
upgrade path.
Install the tearsheets extra to emit
jellycell manuscripts from the
bundled case studies:
pip install "nyc311[tearsheets]"
Install
Choose the dependency footprint that matches your workflow:
pip install nyc311
For the full turnkey experience:
pip install "nyc311[all]"
For pandas-backed conversion helpers:
pip install "nyc311[dataframes]"
For geopandas-backed geography and spatial helpers:
pip install "nyc311[spatial]"
For plotting helpers:
pip install "nyc311[plotting]"
For plotting and exploratory analysis without the geospatial stack:
pip install "nyc311[science]"
For statistical modeling (interrupted time series, changepoints, STL, Moran's I, panel regressions):
pip install "nyc311[stats]"
For BYM2 small-area smoothing (PyMC):
pip install "nyc311[bayes]"
Why this exists
NYC 311 data is one of the richest public records of neighborhood quality-of-life complaints in the country, but much of the useful signal is locked inside short text fields such as complaint descriptors.
nyc311 turns those records into reusable outputs for civic analysis,
journalism, and research through an explicit, testable workflow.
Core workflow
The current stable workflow is:
- load records from a local CSV extract or a filtered Socrata slice
- filter by date, geography, and complaint type
- assign a first-pass topic label using explicit keyword rules
- aggregate counts by borough or community district
- export a CSV summary table or boundary-backed GeoJSON artifact
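To make the load, filter, and aggregate steps concrete, here is a minimal standard-library sketch. It does not call nyc311; the helper names (load_rows, filter_rows, count_by) and the sample rows are made up for illustration, and the labeling step is left out to keep it short.

```python
import csv
import io
from collections import Counter
from datetime import date, datetime

# Made-up sample rows that use the column names listed under "Data assumptions".
SAMPLE_CSV = """unique_key,created_date,complaint_type,descriptor,borough,community_district
1,2025-01-03T10:00:00,Noise - Residential,Loud Music/Party,BROOKLYN,BROOKLYN 01
2,2025-01-09T23:15:00,Noise - Residential,Banging/Pounding,BROOKLYN,BROOKLYN 01
3,2025-02-02T08:30:00,Noise - Residential,Loud Music/Party,BROOKLYN,BROOKLYN 02
4,2025-01-11T12:00:00,Rodent,Rat Sighting,QUEENS,QUEENS 05
"""


def load_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, start, end, complaint_type):
    """Keep rows created within [start, end] that match the complaint type."""
    kept = []
    for row in rows:
        created = datetime.fromisoformat(row["created_date"]).date()
        if start <= created <= end and row["complaint_type"] == complaint_type:
            kept.append(row)
    return kept


def count_by(rows, column):
    """Count rows per distinct value of the given column."""
    return Counter(row[column] for row in rows)


rows = load_rows(SAMPLE_CSV)
january_noise = filter_rows(
    rows, date(2025, 1, 1), date(2025, 1, 31), "Noise - Residential"
)
by_district = count_by(january_noise, "community_district")
```

Every step is a pure function of its input, so the same extract yields the same summary on each run.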
Supported topic extraction
The current rules-based topic extractor is implemented for the complaint types
returned by nyc311.models.supported_topic_queries() (nine high-volume types
including noise, rodents, street condition, heat/hot water, sanitary, and
abandoned vehicles).
This is intentionally described as first-pass topic extraction, not clustering or advanced NLP.
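A rules-based extractor of this kind can be sketched in a few lines. The rule table, topic names, and label_descriptor helper below are hypothetical and are not the rules shipped with nyc311; the sketch only shows why keyword matching is deterministic and how rule coverage can be measured.

```python
from collections import Counter

# Hypothetical rule table: topic label -> keywords searched in the descriptor.
NOISE_RULES = {
    "party_music": ("music", "party"),
    "banging": ("banging", "pounding"),
    "talking": ("talking",),
}


def label_descriptor(descriptor, rules):
    """Return the first topic whose keyword appears in the descriptor, else None."""
    text = descriptor.lower()
    for topic, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return None


descriptors = [
    "Loud Music/Party",
    "Banging/Pounding",
    "Loud Talking",
    "Loud Television",
]
labels = [label_descriptor(d, NOISE_RULES) for d in descriptors]
topic_counts = Counter(label for label in labels if label is not None)
# Coverage: share of records that matched at least one rule.
coverage = sum(label is not None for label in labels) / len(labels)
```

Records that match no rule stay unlabeled rather than being forced into a topic, which is what makes a coverage figure meaningful.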
Time series
Use nyc311.dataframes helpers for DatetimeIndex complaint counts and panel
layouts:
from nyc311 import pipeline, presets
from nyc311.dataframes import to_timeseries, to_panel

records = pipeline.fetch_service_requests(
    filters=presets.brooklyn_borough_filter(
        start_date="2024-01-01",
        end_date="2024-12-31",
        complaint_types=("Noise - Residential", "Rodent"),
    ),
    socrata_config=presets.large_socrata_config(),
    cache_dir="./cache",
)

ts = to_timeseries(records, freq="W")
ts.plot(title="Weekly complaint volume")

panel = to_panel(records, freq="ME", geography="borough")
panel.xs("BROOKLYN")["Noise - Residential"].plot()
Data surface
- Socrata: dataset
erm2-nwe9 (NYC 311 Service Requests from 2010 onward; tens of millions of rows). Use presets.large_socrata_config() for bulk pagination (default 5,000 rows per HTTP request) and nyc311.io.cached_fetch to stream pages to CSV without holding the full history in memory.
- Boundaries: borough, community district, council district, NTA, census tract, and ZCTA layers ship through nyc311.geographies (built on nyc-geo-toolkit).
- Caching: pass cache_dir and optional refresh / max_cached_records to pipeline.fetch_service_requests or io.load_service_requests so repeated runs reuse deterministic CSV snapshots under cache_dir.
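Paged access to a Socrata dataset generally relies on the SODA $limit and $offset query parameters. The sketch below only builds request URLs and performs no network calls. The page_urls helper is hypothetical, the endpoint URL form is an assumption based on the dataset id above, and nothing here is claimed to match how nyc311 pages internally.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed public SODA endpoint for dataset erm2-nwe9.
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"


def page_urls(where, page_size=5000, max_pages=3):
    """Yield one request URL per page using $limit/$offset pagination."""
    for page in range(max_pages):
        params = {
            "$where": where,
            "$order": "unique_key",  # a stable sort keeps pages from overlapping
            "$limit": page_size,
            "$offset": page * page_size,
        }
        yield f"{BASE_URL}?{urlencode(params)}"


where_clause = (
    "complaint_type = 'Rodent' AND borough = 'BROOKLYN' "
    "AND created_date between '2025-01-01T00:00:00' and '2025-01-31T23:59:59'"
)
urls = list(page_urls(where_clause, page_size=5000, max_pages=2))
second_page = parse_qs(urlsplit(urls[1]).query)
```

A real fetch loop would normally stop as soon as a page returns fewer rows than page_size instead of always requesting max_pages.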
Quick links
Docs: Home, Getting Started, CLI Reference, SDK Guide, Examples, Architecture, Contributing, Releasing, Changelog
Example
from datetime import date
from pathlib import Path

from nyc311 import analysis, export, models, pipeline

records = pipeline.fetch_service_requests(
    filters=models.ServiceRequestFilter(
        start_date=date(2025, 1, 1),
        end_date=date(2025, 1, 31),
        geography=models.GeographyFilter("borough", models.BOROUGH_BROOKLYN),
        complaint_types=("Noise - Residential",),
    ),
    socrata_config=models.SocrataConfig(page_size=250, max_pages=1),
)

export.export_service_requests_csv(
    records,
    models.ExportTarget("csv", Path("brooklyn-noise-snapshot.csv")),
)

assignments = analysis.extract_topics(records, models.TopicQuery("Noise - Residential"))
summary = analysis.aggregate_by_geography(assignments, geography="community_district")

export.export_topic_table(
    summary,
    models.ExportTarget("csv", Path("brooklyn-noise-topics.csv")),
)
CLI equivalent:
nyc311 fetch \
  --output brooklyn-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --page-size 250 \
  --max-pages 1

nyc311 topics \
  --source brooklyn-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography community_district \
  --output brooklyn-noise-topics.csv
Live-data snapshot workflow:
nyc311 fetch \
  --output brooklyn-rodent-snapshot.csv \
  --complaint-type "Rodent" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --page-size 500 \
  --max-pages 1
Factor pipeline
nyc311.factors composes domain-specific metrics over geographic units:
from datetime import date

from nyc311.factors import (
    ComplaintVolumeFactor,
    EquityGapFactor,
    FactorContext,
    Pipeline,
    ResponseRateFactor,
    SpatialLagFactor,
    TopicConcentrationFactor,
)

# records_by_cd is assumed to map each community district to its complaint records.
contexts = [
    FactorContext(
        geography="community_district",
        geography_value=cd,
        complaints=tuple(complaints),
        time_window_start=date(2024, 1, 1),
        time_window_end=date(2024, 12, 31),
    )
    for cd, complaints in records_by_cd.items()
]

result = (
    Pipeline()
    .add(ComplaintVolumeFactor())
    .add(ResponseRateFactor())
    .add(TopicConcentrationFactor())
    .run(contexts)
)
df = result.to_dataframe()  # one row per CD, one column per factor
See the SDK guide for the matching temporal-panel, statistical-modeling, and bulk-download examples.
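The composition pattern used by the factor pipeline (chained add calls, then one run over many contexts) can be illustrated without the package. Everything in this sketch (Context, VolumeFactor, ClosedShareFactor, MiniPipeline, and the status field) is hypothetical and says nothing about how nyc311.factors is implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for one geographic unit and its complaints."""

    geography_value: str
    complaints: tuple


class VolumeFactor:
    name = "complaint_volume"

    def compute(self, context):
        return len(context.complaints)


class ClosedShareFactor:
    name = "closed_share"

    def compute(self, context):
        if not context.complaints:
            return 0.0
        closed = sum(1 for c in context.complaints if c["status"] == "Closed")
        return closed / len(context.complaints)


@dataclass
class MiniPipeline:
    factors: list = field(default_factory=list)

    def add(self, factor):
        self.factors.append(factor)
        return self  # returning self enables chained .add(...) calls

    def run(self, contexts):
        # One row per geographic unit, one entry per factor.
        return {
            ctx.geography_value: {f.name: f.compute(ctx) for f in self.factors}
            for ctx in contexts
        }


contexts = [
    Context("BROOKLYN 01", ({"status": "Closed"}, {"status": "Open"})),
    Context("BROOKLYN 02", ({"status": "Closed"},)),
]
table = MiniPipeline().add(VolumeFactor()).add(ClosedShareFactor()).run(contexts)
```

Because each factor only sees one context, factors can be added or removed without touching the others.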
Data assumptions
load_service_requests() currently supports:
- local CSV files
- live Socrata loading via SocrataConfig
CSV inputs use these columns:
- unique_key
- created_date
- complaint_type
- descriptor
- borough
- community_district or community_board
resolution_description is optional and loaded when present. It is currently
used by the resolution-gap and report-card helpers, while topic extraction
remains descriptor-driven.
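A quick header check before loading can catch extracts that lack the columns listed above. The sketch below is a standalone illustration using the csv module; check_columns and read_header are hypothetical helpers and not part of nyc311.

```python
import csv
import io

REQUIRED = ("unique_key", "created_date", "complaint_type", "descriptor", "borough")
# Either name is accepted for the district column.
DISTRICT_ALIASES = ("community_district", "community_board")


def check_columns(header):
    """Return a list of problems found in a CSV header (empty list means OK)."""
    names = set(header)
    problems = [f"missing column: {name}" for name in REQUIRED if name not in names]
    if not any(alias in names for alias in DISTRICT_ALIASES):
        problems.append("missing column: community_district or community_board")
    return problems


def read_header(text):
    """Read only the first CSV row, which holds the column names."""
    return next(csv.reader(io.StringIO(text)))


good = read_header(
    "unique_key,created_date,complaint_type,descriptor,borough,community_board\n"
)
bad = read_header("unique_key,created_date,complaint_type,borough\n")
good_problems = check_columns(good)
bad_problems = check_columns(bad)
```

The optional resolution_description column is deliberately not checked, since its absence is allowed.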
Public package surface
The public API is organized around explicit namespaces:
- nyc311.models for dataclasses, constants, and configs
- nyc311.io for CSV and Socrata loading
- nyc311.analysis for topic extraction, coverage, gaps, and anomalies
- nyc311.geographies for the 311-facing compatibility layer over nyc-geo-toolkit
- nyc311.samples for packaged sample records and sample-aligned boundaries
- nyc311.export for CSV, GeoJSON, and report exports
- nyc311.pipeline for one-call workflow helpers
- nyc311.dataframes for optional pandas conversions
- nyc311.spatial for optional geopandas helpers
- nyc311.plotting for optional plotting helpers
- nyc311.presets for reusable filter and Socrata config builders
- nyc311.factors for the composable factor pipeline and built-in domain factors (including SpatialLagFactor and EquityGapFactor)
- nyc311.temporal for balanced panel datasets, treatment events, and inverse-distance spatial weights
- nyc311.stats for ITS, PELT changepoints, STL, Moran's I / LISA, panel fixed/random-effects regressions, synthetic control, staggered DiD, event study, RDD, spatial lag/error, GWR, Oaxaca-Blinder, Theil, reporting-bias adjustment, BYM2, Hawkes, anomaly detection, and power analysis
- nyc311.cli with the topics and fetch subcommands
Documentation
The hosted docs site is the canonical reference: nyc311.readthedocs.io.
If you are browsing in GitHub, the source docs live in docs/, including
index.md, getting-started.md, cli.md, sdk.md, examples.md, api.md,
architecture.md, and contributing.md.
Runnable examples live in examples/ as self-contained consumer projects.
Research case studies (real data, cited in CITATION.cff) live under
examples/case_studies/:
- Rat Containerization -- Evaluates the 2024 NYC containerization mandate using 81K real rodent complaints, the factor pipeline, STL decomposition, Moran's I, Theil inequality, synthetic control, staggered DiD, event study, RDD, and power analysis across 70 community districts.
- Resolution Equity -- Investigates whether resolution times vary by neighborhood demographics using 1M real 311 requests, two-way FE regression, Oaxaca-Blinder decomposition with ACS census data, spatial autocorrelation, ITS, and latent reporting-bias estimation.
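For readers unfamiliar with the Theil inequality measure mentioned in the first case study, the Theil T index of positive values x_1..x_N with mean mu is the average of (x_i / mu) * ln(x_i / mu). The function below is a minimal standalone sketch of that formula and is not the nyc311.stats implementation.

```python
import math


def theil_t(values):
    """Theil T index: mean of (x / mu) * ln(x / mu) over strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    mean = sum(values) / len(values)
    return sum((v / mean) * math.log(v / mean) for v in values) / len(values)


# Identical values give zero inequality; spread-out values give a positive index.
equal = theil_t([10.0, 10.0, 10.0])
unequal = theil_t([1.0, 3.0])
```

Applied to, say, complaint rates per community district, a value near zero indicates an even distribution and larger values indicate concentration in a few districts.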
factor-factory engine showcases (synthetic data, offline in seconds):
- SDID multi-borough policy -- Synthetic Difference-in-Differences (Arkhangelsky et al. 2021, AER) over a 5-borough × 36-month simulated 311 intake rollout.
- Mediation cascade (resolution) -- Four-way mediation decomposition (VanderWeele 2014, Epidemiology) of pilot → triage-time → resolution-rate.
- factor-factory quickstart -- Minimal PanelDataset → factor_factory.tidy.Panel → engine → pandas in ~50 lines, without jellycell. Starting point for consumers who want the adapter without the tearsheet machinery.
For local preview:
make docs
make docs-build
Development
uv sync
uv sync --all-groups --all-extras
uv run --all-extras pytest -m "not integration"
uv run ruff check .
uv run ruff format --check .
uv run mypy
uv run mkdocs serve
uv run mkdocs build --strict
uv run python scripts/audit_public_api.py
uv run pytest -m "fetch and not integration"
License
MIT.