Python toolkit for reproducible NYC 311 complaint analysis via a typed SDK and CLI.

Maintainers

blaise-ab

nyc311

nyc311 — NYC 311 complaint analysis

Authored by Blaise Albis-Burdige.

What this package does

nyc311 is the stable 1.x toolkit for turning NYC 311 service-request data into reproducible complaint-intelligence outputs and publication-quality statistical analyses.

It pairs a thin CLI with a typed SDK so the same workflow can run in batch jobs, scripts, notebooks, and consumer packages.

With the current release line you can:

load filtered NYC 311-style records from local CSV extracts or the live Socrata API
derive deterministic first-pass topic labels for supported complaint types
aggregate complaint topics by borough or community district
measure topic-rule coverage and summarize resolution gaps
score anomalies over aggregated topic summaries
export CSV tables, boundary-backed GeoJSON, and markdown report cards
expose the workflow through both a thin CLI and a composable functional SDK
compose domain-specific factor pipelines over geographic units
build balanced temporal panels with treatment-event modeling and inverse-distance spatial weights
run interrupted-time-series, PELT changepoint, STL decomposition, Moran's I / LISA, and panel fixed/random-effects regressions
estimate causal effects with synthetic control, staggered difference-in-differences, event-study plots, and regression discontinuity
fit spatial econometric models: spatial lag and error models and geographically weighted regression
analyze equity with Oaxaca-Blinder decomposition, the Theil index, reporting-rate adjustment, and latent reporting-bias EM
run diagnostics: seasonality-adjusted anomaly detection and a power analysis / MDE calculator
apply Bayesian BYM2 small-area smoothing (behind the nyc311[bayes] extra)
model complaint contagion with a Hawkes self-exciting point process
bulk-fetch full-city extracts split per borough with .meta.json integrity sidecars
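
The README does not describe what a .meta.json integrity sidecar contains. As a rough illustration of the idea only, the sketch below writes a checksum file next to a CSV extract and re-checks it later. The sidecar naming, the field names, and both helper functions are assumptions made for this example, not the nyc311 format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sidecar_path(csv_path: Path) -> Path:
    # Hypothetical naming scheme: "<extract>.meta.json" beside the extract.
    return csv_path.with_name(csv_path.name + ".meta.json")


def write_sidecar(csv_path: Path) -> Path:
    """Record the size and SHA-256 digest of an extract in a JSON sidecar."""
    data = csv_path.read_bytes()
    meta = {
        "file": csv_path.name,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    target = sidecar_path(csv_path)
    target.write_text(json.dumps(meta, indent=2, sort_keys=True))
    return target


def verify_sidecar(csv_path: Path) -> bool:
    """Return True when the extract still matches its recorded digest."""
    meta = json.loads(sidecar_path(csv_path).read_text())
    return hashlib.sha256(csv_path.read_bytes()).hexdigest() == meta["sha256"]


with tempfile.TemporaryDirectory() as tmp:
    extract = Path(tmp) / "brooklyn.csv"
    extract.write_text("unique_key,complaint_type\n1,Rodent\n")
    write_sidecar(extract)
    ok_before = verify_sidecar(extract)

    # Simulate a corrupted or edited extract.
    extract.write_text("unique_key,complaint_type\n1,Noise\n")
    ok_after = verify_sidecar(extract)

print(ok_before, ok_after)
```

Keeping the digest in a separate file means an extract can be validated before analysis without re-downloading it.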

Geography layer

nyc311.geographies is the 311-facing compatibility layer over nyc-geo-toolkit.

Use nyc311 when you want packaged NYC boundaries inside the 311 workflow. Use nyc-geo-toolkit directly when you only need the generic geography assets, normalization helpers, and boundary loaders.

factor-factory integration (v1.0.0)

As of v1.0.0, nyc311 wires through to factor-factory's 17 causal-inference engine families via two additive adapters:

from nyc311.temporal import build_complaint_panel, TreatmentEvent

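# `records` is assumed to hold service requests loaded earlier,
# for example via pipeline.fetch_service_requests(...).
panel = build_complaint_panel(records, geography="community_district")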

# Hand off to any factor-factory engine family:
ff_panel = panel.to_factor_factory_panel()

from factor_factory.engines.did import estimate as did_estimate

results = did_estimate(ff_panel, methods=("twfe",), outcome="complaint_count")
print(results[0].att, results[0].ci_95)
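
For intuition about the `att` value printed above, the simplest two-group, two-period difference-in-differences estimate can be computed by hand. The sketch below uses only the standard library and invented counts; it is not factor-factory's estimator, and the unit names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical complaint counts: (unit, period, treated_unit, count).
rows = [
    ("CD-A", "pre", True, 120), ("CD-A", "post", True, 90),
    ("CD-B", "pre", True, 100), ("CD-B", "post", True, 80),
    ("CD-C", "pre", False, 110), ("CD-C", "post", False, 105),
    ("CD-D", "pre", False, 90), ("CD-D", "post", False, 95),
]


def cell_mean(treated: bool, period: str) -> float:
    """Average count for one (group, period) cell."""
    return mean(c for _, p, t, c in rows if t is treated and p == period)


def did_2x2() -> float:
    """Change in the treated group minus change in the control group."""
    treated_change = cell_mean(True, "post") - cell_mean(True, "pre")
    control_change = cell_mean(False, "post") - cell_mean(False, "pre")
    return treated_change - control_change


att = did_2x2()
print(att)  # -25
```

The control group's change stands in for what would have happened to the treated units without treatment, which is the parallel-trends assumption that any DiD estimate rests on.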

The nyc311.stats modules continue to work as before; eleven of the seventeen now cross-reference their factor-factory equivalent in a .. note:: block. See docs/integration.md for the full crosswalk and docs/migration-v0-to-v1.md for the consumer upgrade path.

Install the tearsheets extra to emit jellycell manuscripts from the bundled case studies:

pip install "nyc311[tearsheets]"

Install

Choose the dependency footprint that matches your workflow:

pip install nyc311

For the full turnkey experience:

pip install "nyc311[all]"

For pandas-backed conversion helpers:

pip install "nyc311[dataframes]"

For geopandas-backed geography and spatial helpers:

pip install "nyc311[spatial]"

For plotting helpers:

pip install "nyc311[plotting]"

For plotting and exploratory analysis without the geospatial stack:

pip install "nyc311[science]"

For statistical modeling (interrupted time series, changepoints, STL, Moran's I, panel regressions):

pip install "nyc311[stats]"

For BYM2 small-area smoothing (PyMC):

pip install "nyc311[bayes]"

Why this exists

NYC 311 data is one of the richest public records of neighborhood quality-of-life complaints in the country, but much of the useful signal is locked inside short text fields such as complaint descriptors.

nyc311 turns those records into reusable outputs for civic analysis, journalism, and research through an explicit, testable workflow.

Core workflow

The current stable workflow is:

load records from a local CSV extract or a filtered Socrata slice
filter by date, geography, and complaint type
assign a first-pass topic label using explicit keyword rules
aggregate counts by borough or community district
export a CSV summary table or boundary-backed GeoJSON artifact
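
The five steps above can be sketched end to end with the standard library alone. This is an illustration of the workflow's shape, not the nyc311 implementation: the sample rows, the keyword rules, and the `run` helper are all made up for the example.

```python
import csv
import io
from collections import Counter
from datetime import date, datetime

# Invented sample extract using the column names listed under "Data assumptions".
RAW = """unique_key,created_date,complaint_type,descriptor,borough,community_district
1,2025-01-03T09:00:00,Noise - Residential,Loud Music/Party,BROOKLYN,01 BROOKLYN
2,2025-01-05T23:10:00,Noise - Residential,Banging/Pounding,BROOKLYN,01 BROOKLYN
3,2025-01-09T08:30:00,Rodent,Rat Sighting,BROOKLYN,02 BROOKLYN
4,2025-02-02T10:00:00,Noise - Residential,Loud Music/Party,BROOKLYN,02 BROOKLYN
5,2025-01-11T21:45:00,Noise - Residential,Loud Music/Party,QUEENS,01 QUEENS
"""

# Hypothetical keyword rules; not the rules shipped with nyc311.
RULES = (("music", "party_music"), ("banging", "banging"))


def label(descriptor: str) -> str:
    text = descriptor.lower()
    return next((topic for kw, topic in RULES if kw in text), "other")


def run(raw: str, start: date, end: date, borough: str, complaint_type: str) -> str:
    # 1. Load records from the CSV text.
    records = list(csv.DictReader(io.StringIO(raw)))
    # 2. Filter by date, geography, and complaint type.
    kept = [
        r for r in records
        if start <= datetime.fromisoformat(r["created_date"]).date() <= end
        and r["borough"] == borough
        and r["complaint_type"] == complaint_type
    ]
    # 3 and 4. Label each record, then count by (district, topic).
    counts = Counter((r["community_district"], label(r["descriptor"])) for r in kept)
    # 5. Export a CSV summary table.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["community_district", "topic", "count"])
    for (district, topic), n in sorted(counts.items()):
        writer.writerow([district, topic, n])
    return out.getvalue()


summary_csv = run(RAW, date(2025, 1, 1), date(2025, 1, 31), "BROOKLYN", "Noise - Residential")
print(summary_csv)
```

Sorting the counts before writing keeps the output byte-identical across runs, which is the property the README means by reproducible.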

Supported topic extraction

The current rules-based topic extractor is implemented for the complaint types returned by nyc311.models.supported_topic_queries() (nine high-volume types including noise, rodents, street condition, heat/hot water, sanitary, and abandoned vehicles).

This is intentionally described as first-pass topic extraction, not clustering or advanced NLP.
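
To make "first-pass" concrete, here is a minimal sketch of ordered keyword rules with a fallback label and a coverage measure. The rule table, topic names, and function names are hypothetical; the real rules are whatever nyc311.models.supported_topic_queries() and the package's extractor define.

```python
# Hypothetical rule table: complaint type -> ordered (keyword, topic) pairs.
# The first matching keyword wins, so the result is deterministic.
TOPIC_RULES = {
    "Noise - Residential": (
        ("party", "party_music"),
        ("music", "party_music"),
        ("banging", "banging_pounding"),
        ("talking", "loud_talking"),
    ),
    "Rodent": (
        ("rat", "rat_sighting"),
        ("mouse", "mouse_sighting"),
    ),
}
UNMATCHED = "other"


def assign_topic(complaint_type: str, descriptor: str) -> str:
    """Return the topic of the first keyword found in the descriptor."""
    text = descriptor.casefold()
    for keyword, topic in TOPIC_RULES.get(complaint_type, ()):
        if keyword in text:  # plain substring match, no NLP
            return topic
    return UNMATCHED


def rule_coverage(pairs) -> float:
    """Share of (complaint_type, descriptor) pairs matched by some rule."""
    topics = [assign_topic(ct, d) for ct, d in pairs]
    if not topics:
        return 0.0
    return sum(t != UNMATCHED for t in topics) / len(topics)


sample = [
    ("Noise - Residential", "Loud Music/Party"),
    ("Noise - Residential", "Banging/Pounding"),
    ("Noise - Residential", "Loud Television"),
    ("Rodent", "Rat Sighting"),
]
print([assign_topic(ct, d) for ct, d in sample])
print(rule_coverage(sample))  # 0.75
```

Substring rules are easy to audit and test, at the cost of missing descriptors that no keyword anticipates; a coverage figure makes that gap visible.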

Time series

Use nyc311.dataframes helpers for DatetimeIndex complaint counts and panel layouts:

from nyc311 import pipeline, presets
from nyc311.dataframes import to_timeseries, to_panel

records = pipeline.fetch_service_requests(
    filters=presets.brooklyn_borough_filter(
        start_date="2024-01-01",
        end_date="2024-12-31",
        complaint_types=("Noise - Residential", "Rodent"),
    ),
    socrata_config=presets.large_socrata_config(),
    cache_dir="./cache",
)

ts = to_timeseries(records, freq="W")
ts.plot(title="Weekly complaint volume")

panel = to_panel(records, freq="ME", geography="borough")
panel.xs("BROOKLYN")["Noise - Residential"].plot()
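
For readers without the dataframes extra installed, the bucketing idea behind a weekly count series can be approximated with the standard library. This is a sketch of the concept rather than what to_timeseries does internally; the helper names are hypothetical. Note that it labels each week by its Monday start, whereas pandas' "W" frequency labels weeks by their Sunday end.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_counts(days) -> dict:
    """Count dates per week, filling empty weeks with zero."""
    counts = Counter(week_start(d) for d in days)
    if not counts:
        return {}
    series = {}
    cursor, last = min(counts), max(counts)
    while cursor <= last:
        series[cursor] = counts.get(cursor, 0)
        cursor += timedelta(weeks=1)
    return series


# Invented created dates for five complaints.
created = [
    date(2024, 1, 1),
    date(2024, 1, 3),
    date(2024, 1, 7),
    date(2024, 1, 8),
    date(2024, 1, 23),
]
series = weekly_counts(created)
for start, n in series.items():
    print(start.isoformat(), n)
```

Filling the empty weeks matters for plotting and for any model that expects evenly spaced observations.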

Data surface

Socrata: dataset erm2-nwe9 (NYC 311 Service Requests from 2010 onward; tens of millions of rows). Use presets.large_socrata_config() for bulk pagination (default 5,000 rows per HTTP request) and nyc311.io.cached_fetch to stream pages to CSV without holding the full history in memory.
Boundaries: borough, community district, council district, NTA, census tract, and ZCTA layers ship through nyc311.geographies (built on nyc-geo-toolkit).
Caching: pass cache_dir and optional refresh / max_cached_records to pipeline.fetch_service_requests or io.load_service_requests so repeated runs reuse deterministic CSV snapshots under cache_dir.
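
The README does not show how pages are requested, so the following is only a sketch of offset-based paging against the public Socrata endpoint for erm2-nwe9 using standard SoQL parameters ($where, $order, $limit, $offset). The `page_urls` helper and the filter string are assumptions for illustration, and the code builds URLs without making any network call.

```python
from urllib.parse import urlencode

# Public SODA endpoint for the NYC 311 Service Requests dataset.
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"


def page_urls(where: str, page_size: int = 5000, max_pages: int = 3):
    """Yield one request URL per page, advancing $offset by page_size."""
    for page in range(max_pages):
        params = {
            "$where": where,
            "$order": "unique_key",  # stable ordering so pages do not overlap
            "$limit": page_size,
            "$offset": page * page_size,
        }
        yield f"{BASE_URL}?{urlencode(params)}"


where = (
    "created_date between '2025-01-01T00:00:00' and '2025-01-31T23:59:59' "
    "AND borough = 'BROOKLYN'"
)
urls = list(page_urls(where, page_size=250, max_pages=2))
for url in urls:
    print(url)
```

A real client would stop as soon as a page returns fewer rows than `page_size`, and would write each page to disk before requesting the next to keep memory use flat.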

Quick links

Docs: Home, Getting Started, CLI Reference, SDK Guide, Examples, Architecture, Contributing, Releasing, Changelog

Example

from datetime import date
from pathlib import Path

from nyc311 import analysis, export, models, pipeline

records = pipeline.fetch_service_requests(
    filters=models.ServiceRequestFilter(
        start_date=date(2025, 1, 1),
        end_date=date(2025, 1, 31),
        geography=models.GeographyFilter("borough", models.BOROUGH_BROOKLYN),
        complaint_types=("Noise - Residential",),
    ),
    socrata_config=models.SocrataConfig(page_size=250, max_pages=1),
)

export.export_service_requests_csv(
    records,
    models.ExportTarget("csv", Path("brooklyn-noise-snapshot.csv")),
)

assignments = analysis.extract_topics(records, models.TopicQuery("Noise - Residential"))
summary = analysis.aggregate_by_geography(assignments, geography="community_district")
export.export_topic_table(
    summary,
    models.ExportTarget("csv", Path("brooklyn-noise-topics.csv")),
)

CLI equivalent:

nyc311 fetch \
  --output brooklyn-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --page-size 250 \
  --max-pages 1

nyc311 topics \
  --source brooklyn-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography community_district \
  --output brooklyn-noise-topics.csv

Live-data snapshot workflow:

nyc311 fetch \
  --output brooklyn-rodent-snapshot.csv \
  --complaint-type "Rodent" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --page-size 500 \
  --max-pages 1

Factor pipeline

nyc311.factors composes domain-specific metrics over geographic units:

from datetime import date

from nyc311.factors import (
    ComplaintVolumeFactor,
    EquityGapFactor,
    FactorContext,
    Pipeline,
    ResponseRateFactor,
    SpatialLagFactor,
    TopicConcentrationFactor,
)

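# `records_by_cd` is assumed to be a mapping of community-district value
# to the service-request records loaded for that district.
contexts = [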
    FactorContext(
        geography="community_district",
        geography_value=cd,
        complaints=tuple(complaints),
        time_window_start=date(2024, 1, 1),
        time_window_end=date(2024, 12, 31),
    )
    for cd, complaints in records_by_cd.items()
]

result = (
    Pipeline()
    .add(ComplaintVolumeFactor())
    .add(ResponseRateFactor())
    .add(TopicConcentrationFactor())
    .run(contexts)
)
df = result.to_dataframe()  # one row per CD, one column per factor

See the SDK guide for the matching temporal-panel, statistical-modeling, and bulk-download examples.
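
The chained add(...).run(...) style shown above is a builder pattern over small metric objects. The sketch below re-creates that pattern in miniature to show how such a pipeline could be composed; every class here (MiniContext, MiniPipeline, VolumeFactor, ClosedShareFactor) is hypothetical and says nothing about how nyc311.factors is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class MiniContext:
    """One geographic unit and the complaint records that fall inside it."""
    geography_value: str
    complaints: tuple


class MiniFactor(Protocol):
    name: str

    def compute(self, context: MiniContext) -> float: ...


class VolumeFactor:
    name = "complaint_volume"

    def compute(self, context: MiniContext) -> float:
        return float(len(context.complaints))


class ClosedShareFactor:
    name = "closed_share"

    def compute(self, context: MiniContext) -> float:
        if not context.complaints:
            return 0.0
        closed = sum(c.get("status") == "Closed" for c in context.complaints)
        return closed / len(context.complaints)


@dataclass
class MiniPipeline:
    factors: list = field(default_factory=list)

    def add(self, factor: MiniFactor) -> "MiniPipeline":
        self.factors.append(factor)
        return self  # returning self is what enables chaining

    def run(self, contexts) -> list:
        """One row per context, one key per factor."""
        return [
            {"geography_value": ctx.geography_value,
             **{f.name: f.compute(ctx) for f in self.factors}}
            for ctx in contexts
        ]


contexts = [
    MiniContext("01 BROOKLYN", ({"status": "Closed"}, {"status": "Open"})),
    MiniContext("02 BROOKLYN", ()),
]
rows = MiniPipeline().add(VolumeFactor()).add(ClosedShareFactor()).run(contexts)
print(rows)
```

Because each factor only sees one context, factors stay independently testable and new ones can be added without touching the pipeline.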

Data assumptions

load_service_requests() currently supports:

local CSV files
live Socrata loading via SocrataConfig

CSV inputs use these columns:

unique_key
created_date
complaint_type
descriptor
borough
community_district or community_board

resolution_description is optional and loaded when present. It is currently used by the resolution-gap and report-card helpers, while topic extraction remains descriptor-driven.
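
Before handing a local extract to the loader, it can help to check the header against the columns listed above. The following is a minimal pre-flight sketch using the standard library; `check_header` is a hypothetical helper, not part of nyc311.

```python
import csv
import io

REQUIRED = ("unique_key", "created_date", "complaint_type", "descriptor", "borough")
DISTRICT_ALIASES = ("community_district", "community_board")  # either is accepted
OPTIONAL = ("resolution_description",)


def check_header(handle):
    """Return (missing_required, optional_present) for a CSV file handle."""
    header = next(csv.reader(handle), [])
    columns = {name.strip() for name in header}
    missing = [c for c in REQUIRED if c not in columns]
    if not any(alias in columns for alias in DISTRICT_ALIASES):
        missing.append(" or ".join(DISTRICT_ALIASES))
    optional = [c for c in OPTIONAL if c in columns]
    return missing, optional


good = io.StringIO(
    "unique_key,created_date,complaint_type,descriptor,borough,"
    "community_board,resolution_description\n"
)
bad = io.StringIO("unique_key,created_date,descriptor,borough\n")

print(check_header(good))
print(check_header(bad))
```

For a file on disk, open it with `open(path, newline="")` and pass the handle in the same way.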

Public package surface

The public API is organized around explicit namespaces:

nyc311.models for dataclasses, constants, and configs
nyc311.io for CSV and Socrata loading
nyc311.analysis for topic extraction, coverage, gaps, and anomalies
nyc311.geographies for the 311-facing compatibility layer over nyc-geo-toolkit
nyc311.samples for packaged sample records and sample-aligned boundaries
nyc311.export for CSV, GeoJSON, and report exports
nyc311.pipeline for one-call workflow helpers
nyc311.dataframes for optional pandas conversions
nyc311.spatial for optional geopandas helpers
nyc311.plotting for optional plotting helpers
nyc311.presets for reusable filter and Socrata config builders
nyc311.factors for the composable factor pipeline and built-in domain factors (including SpatialLagFactor and EquityGapFactor)
nyc311.temporal for balanced panel datasets, treatment events, and inverse-distance spatial weights
nyc311.stats for ITS, PELT changepoints, STL, Moran's I / LISA, panel fixed/random-effects regressions, synthetic control, staggered DiD, event study, RDD, spatial lag/error, GWR, Oaxaca-Blinder, Theil, reporting-bias adjustment, BYM2, Hawkes, anomaly detection, and power analysis
nyc311.cli with the topics and fetch subcommands
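
Two of the terms above, inverse-distance spatial weights and Moran's I, can be illustrated in a few lines of plain Python. This sketch uses the textbook global Moran's I with unstandardized weights w_ij = 1 / d_ij; it is not the nyc311.temporal or nyc311.stats code, whose options (row standardization, distance cutoffs, inference) are not described here. The function names and data are invented, and the points are assumed to be distinct.

```python
from math import dist


def inverse_distance_weights(points):
    """Dense weight matrix with w_ij = 1 / distance and a zero diagonal."""
    n = len(points)
    return [
        [0.0 if i == j else 1.0 / dist(points[i], points[j]) for j in range(n)]
        for i in range(n)
    ]


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in z))


# Two tight clusters with similar values inside each cluster.
clustered_points = [(0, 0), (1, 0), (10, 0), (11, 0)]
clustered = morans_i([10, 10, 0, 0], inverse_distance_weights(clustered_points))

# Evenly spaced points whose values alternate between neighbors.
line_points = [(0, 0), (1, 0), (2, 0), (3, 0)]
alternating = morans_i([10, 0, 10, 0], inverse_distance_weights(line_points))

print(round(clustered, 3), round(alternating, 3))
```

Positive values indicate that nearby units have similar values and negative values indicate that neighbors differ; judging significance requires a permutation or analytical test, which this sketch omits.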

Documentation

The hosted docs site is the canonical reference: nyc311.readthedocs.io.

If you are browsing on GitHub, the source docs live in docs/, including index.md, getting-started.md, cli.md, sdk.md, examples.md, api.md, architecture.md, and contributing.md.

Runnable examples live in examples/ as self-contained consumer projects.

Research case studies (real data, cited in CITATION.cff) live under examples/case_studies/:

Rat Containerization -- Evaluates the 2024 NYC containerization mandate using 81K real rodent complaints, the factor pipeline, STL decomposition, Moran's I, Theil inequality, synthetic control, staggered DiD, event study, RDD, and power analysis across 70 community districts.
Resolution Equity -- Investigates whether resolution times vary by neighborhood demographics using 1M real 311 requests, two-way FE regression, Oaxaca-Blinder decomposition with ACS census data, spatial autocorrelation, ITS, and latent reporting-bias estimation.

factor-factory engine showcases (synthetic data, offline in seconds):

SDID multi-borough policy -- Synthetic Difference-in-Differences (Arkhangelsky et al. 2021, AER) over a 5-borough × 36-month simulated 311 intake rollout.
Mediation cascade (resolution) -- Four-way mediation decomposition (VanderWeele 2014, Epidemiology) of pilot → triage-time → resolution-rate.
factor-factory quickstart -- Minimal PanelDataset → factor_factory.tidy.Panel → engine → pandas in ~50 lines, without jellycell. Starting point for consumers who want the adapter without the tearsheet machinery.

For local preview:

make docs
make docs-build

Development

uv sync
uv sync --all-groups --all-extras
uv run --all-extras pytest -m "not integration"
uv run ruff check .
uv run ruff format --check .
uv run mypy
uv run mkdocs serve
uv run mkdocs build --strict
uv run python scripts/audit_public_api.py
uv run pytest -m "fetch and not integration"

License

MIT.

Release history

1.0.3: Apr 21, 2026
1.0.2: Apr 20, 2026
1.0.1: Apr 20, 2026
1.0.0: Apr 20, 2026 (this version)
0.3.0: Apr 12, 2026
0.2.8: Apr 7, 2026
0.2.7: Apr 3, 2026
0.2.6: Apr 2, 2026
0.2.5: Apr 2, 2026
0.2.4: Apr 2, 2026
0.2.3: Apr 2, 2026
0.2.2: Apr 2, 2026
0.2.1: Apr 1, 2026
0.2.0: Apr 1, 2026

Download files

Source Distribution

nyc311-1.0.0.tar.gz (14.1 MB)

Uploaded Apr 20, 2026 Source

Built Distribution

nyc311-1.0.0-py3-none-any.whl (118.9 kB)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file nyc311-1.0.0.tar.gz.

File metadata

Download URL: nyc311-1.0.0.tar.gz
Upload date: Apr 20, 2026
Size: 14.1 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nyc311-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`1d3427b26fb058087cc2a0f6364779f4d257063de5b421c4ec1b882d3ddbc67c`
MD5	`8cb5c47c1d892fb20156bf9803e0211e`
BLAKE2b-256	`2f1567ce728e4b79e83ed18c594aa8fdd2a7b3f8bb3a02be8c9442ffe5e7402f`
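
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a generic sketch: the local file path is an assumption, and the helper names are made up for the example.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for nyc311-1.0.0.tar.gz.
EXPECTED_SHA256 = "1d3427b26fb058087cc2a0f6364779f4d257063de5b421c4ec1b882d3ddbc67c"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected: str = EXPECTED_SHA256) -> bool:
    return sha256_of(path) == expected.lower()


# Assumed location of an already-downloaded sdist; adjust as needed.
archive = Path("nyc311-1.0.0.tar.gz")
if archive.exists():
    print("sha256 ok" if matches(archive) else "sha256 MISMATCH")
```

pip can enforce the same check automatically when a requirements file pins hashes and is installed with `--require-hashes`.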

Provenance

The following attestation bundles were made for nyc311-1.0.0.tar.gz:

Publisher: cd.yml on random-walks/nyc311

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nyc311-1.0.0.tar.gz
- Subject digest: 1d3427b26fb058087cc2a0f6364779f4d257063de5b421c4ec1b882d3ddbc67c
- Sigstore transparency entry: 1341171764
- Sigstore integration time: Apr 20, 2026
Source repository:
- Permalink: random-walks/nyc311@fb3a13eb49dff6cea73143c36e812b265682bd25
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/random-walks
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@fb3a13eb49dff6cea73143c36e812b265682bd25
- Trigger Event: release

File details

Details for the file nyc311-1.0.0-py3-none-any.whl.

File metadata

Download URL: nyc311-1.0.0-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 118.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nyc311-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d2328739ffd700b5911003307452f5c69a3907f4733a388c2b205200d700dce9`
MD5	`d0e2c30b51dba459c19a2c66d2487779`
BLAKE2b-256	`f9383480da0aea739582c885d31da9916a436a66dbd9ee94fa02670446b6a435`

Provenance

The following attestation bundles were made for nyc311-1.0.0-py3-none-any.whl:

Publisher: cd.yml on random-walks/nyc311

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nyc311-1.0.0-py3-none-any.whl
- Subject digest: d2328739ffd700b5911003307452f5c69a3907f4733a388c2b205200d700dce9
- Sigstore transparency entry: 1341171798
- Sigstore integration time: Apr 20, 2026
Source repository:
- Permalink: random-walks/nyc311@fb3a13eb49dff6cea73143c36e812b265682bd25
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/random-walks
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@fb3a13eb49dff6cea73143c36e812b265682bd25
- Trigger Event: release
