Python toolkit for reproducible NYC 311 complaint analysis via a typed SDK and CLI.

nyc311 — NYC 311 complaint analysis

Authored by Blaise Albis-Burdige.

What this package does

nyc311 is the stable 0.2.x toolkit for turning NYC 311 service-request data into reproducible complaint-intelligence outputs.

It pairs a thin CLI with a typed SDK so the same workflow can run in batch jobs, scripts, notebooks, and consumer packages.

The current release line can:

  • load filtered NYC 311-style records from local CSV extracts or the live Socrata API
  • derive deterministic first-pass topic labels for supported complaint types
  • aggregate complaint topics by borough or community district
  • measure topic-rule coverage and summarize resolution gaps
  • score anomalies over aggregated topic summaries
  • export CSV tables, boundary-backed GeoJSON, and markdown report cards
  • expose the workflow through both a thin CLI and a composable functional SDK
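
The list above does not say how anomalies are scored. As one illustration of the general idea, the sketch below flags geographies whose counts sit far from the group mean using a plain z-score. The function name, the threshold, and the sample counts are hypothetical; nyc311's own scoring may work differently.

```python
from statistics import mean, pstdev


def zscore_anomalies(counts, threshold=2.0):
    """Return {key: z} for entries whose |z-score| is at least `threshold`.

    Illustrative only: this is a generic z-score, not nyc311's scoring code.
    """
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {}  # all counts identical, so nothing stands out
    scores = {key: (value - mu) / sigma for key, value in counts.items()}
    return {key: z for key, z in scores.items() if abs(z) >= threshold}


# Hypothetical per-district counts for a single topic.
district_counts = {
    "BROOKLYN 01": 40,
    "BROOKLYN 02": 42,
    "BROOKLYN 03": 38,
    "BROOKLYN 04": 41,
    "BROOKLYN 05": 39,
    "BROOKLYN 06": 120,
}
flagged = zscore_anomalies(district_counts)
print(sorted(flagged))
```

With these sample counts only the sixth district clears the threshold. On very small groups a z-score is bounded by the group size, so the threshold needs to be chosen with the number of geographies in mind.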

Geography layer

nyc311.geographies is the 311-facing compatibility layer over nyc-geo-toolkit.

Use nyc311 when you want packaged NYC boundaries inside the 311 workflow. Use nyc-geo-toolkit directly when you only need the generic geography assets, normalization helpers, and boundary loaders.

Install

Choose the dependency footprint that matches your workflow:

pip install nyc311

For the full turnkey experience:

pip install "nyc311[all]"

For pandas-backed conversion helpers:

pip install "nyc311[dataframes]"

For geopandas-backed geography and spatial helpers:

pip install "nyc311[spatial]"

For plotting helpers:

pip install "nyc311[plotting]"

For plotting and exploratory analysis without the geospatial stack:

pip install "nyc311[science]"
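
Because the extras are optional, code that runs in several environments may want to check what is importable before calling the optional helpers. A minimal standard-library sketch follows. The mapping from extras to module names is an assumption based on the descriptions above (the plotting backend in particular is a guess), not something read from the package metadata.

```python
from importlib.util import find_spec

# Assumed mapping of extras to the top-level module each one pulls in.
ASSUMED_EXTRA_MODULES = {
    "dataframes": "pandas",
    "spatial": "geopandas",
    "plotting": "matplotlib",  # assumption: the plotting backend is not named above
}


def available_extras(extra_modules=ASSUMED_EXTRA_MODULES):
    """Map each extra name to True if its module can be imported."""
    return {
        extra: find_spec(module) is not None
        for extra, module in extra_modules.items()
    }


status = available_extras()
for extra, present in status.items():
    hint = "ok" if present else f'missing, try: pip install "nyc311[{extra}]"'
    print(f"{extra}: {hint}")
```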

Why this exists

NYC 311 data is one of the richest public records of neighborhood quality-of-life complaints in the country, but much of the useful signal is locked inside short text fields such as complaint descriptors.

nyc311 turns those records into reusable outputs for civic analysis, journalism, and research through an explicit, testable workflow.

Core workflow

The current stable workflow is:

  1. load records from a local CSV extract or a filtered Socrata slice
  2. filter by date, geography, and complaint type
  3. assign a first-pass topic label using explicit keyword rules
  4. aggregate counts by borough or community district
  5. export a CSV summary table or boundary-backed GeoJSON artifact
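
To make the five steps concrete, here is a standard-library sketch of the same shape of workflow on a tiny in-memory CSV. It does not call nyc311. The keyword rules, the label helper, and the sample rows are hypothetical, and the real package adds typed models, Socrata loading, and boundary handling on top.

```python
import csv
import io
from collections import Counter
from datetime import date, datetime

RAW = """unique_key,created_date,complaint_type,descriptor,borough,community_district
1,2025-01-03T22:15:00,Noise - Residential,Loud Music/Party,BROOKLYN,01 BROOKLYN
2,2025-01-05T09:00:00,Noise - Residential,Banging/Pounding,BROOKLYN,01 BROOKLYN
3,2025-01-09T23:40:00,Noise - Residential,Loud Music/Party,BROOKLYN,02 BROOKLYN
4,2025-02-02T10:00:00,Noise - Residential,Loud Talking,BROOKLYN,02 BROOKLYN
5,2025-01-11T08:30:00,Rodent,Rat Sighting,QUEENS,01 QUEENS
"""

# Hypothetical keyword rules: the first matching keyword wins.
RULES = (("music", "party_music"), ("banging", "banging"))


def label(descriptor):
    text = descriptor.lower()
    for keyword, topic in RULES:
        if keyword in text:
            return topic
    return "other"


# 1. load
rows = list(csv.DictReader(io.StringIO(RAW)))

# 2. filter by date, geography, and complaint type
start, end = date(2025, 1, 1), date(2025, 1, 31)
selected = [
    row for row in rows
    if start <= datetime.fromisoformat(row["created_date"]).date() <= end
    and row["borough"] == "BROOKLYN"
    and row["complaint_type"] == "Noise - Residential"
]

# 3 and 4. label each record, then count topics per community district
summary = Counter(
    (row["community_district"], label(row["descriptor"])) for row in selected
)

# 5. export a CSV summary table (to a string here; a file works the same way)
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["community_district", "topic", "count"])
for (district, topic), count in sorted(summary.items()):
    writer.writerow([district, topic, count])
print(out.getvalue())
```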

Supported topic extraction

The current rules-based topic extractor is implemented for the complaint types returned by nyc311.models.supported_topic_queries() (nine high-volume types including noise, rodents, street condition, heat/hot water, sanitary, and abandoned vehicles).

This is intentionally described as first-pass topic extraction, not clustering or advanced NLP.
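
As a rough picture of what rules-based, descriptor-driven labeling looks like (as opposed to clustering), the sketch below applies ordered keyword rules and reports the share of descriptors that any rule matched. The rule table and function names are invented for illustration; the package's actual rules are not reproduced here.

```python
# Hypothetical ordered rules for one complaint type: (topic, keywords).
RODENT_RULES = (
    ("rat_sighting", ("rat",)),
    ("mouse_sighting", ("mouse", "mice")),
    ("conditions", ("condition", "bait")),
)


def assign_topic(descriptor, rules):
    """Return the first topic whose keyword occurs in the descriptor, else None."""
    text = descriptor.lower()
    for topic, keywords in rules:
        if any(keyword in text for keyword in keywords):
            return topic
    return None


def rule_coverage(descriptors, rules):
    """Fraction of descriptors that receive any topic label."""
    if not descriptors:
        return 0.0
    matched = sum(assign_topic(d, rules) is not None for d in descriptors)
    return matched / len(descriptors)


descriptors = [
    "Rat Sighting",
    "Mouse Sighting",
    "Condition Attracting Rodents",
    "Signs of Rodents",
]
labels = [assign_topic(d, RODENT_RULES) for d in descriptors]
coverage = rule_coverage(descriptors, RODENT_RULES)
print(labels, coverage)
```

Because rules are tried in a fixed order and matching is plain substring search, a given descriptor always receives the same label, which is what makes coverage a meaningful number to track over time.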

Time series

Use the nyc311.dataframes helpers to build DatetimeIndex-based complaint counts and panel layouts:

from nyc311 import pipeline, presets
from nyc311.dataframes import to_timeseries, to_panel

records = pipeline.fetch_service_requests(
    filters=presets.brooklyn_borough_filter(
        start_date="2024-01-01",
        end_date="2024-12-31",
        complaint_types=("Noise - Residential", "Rodent"),
    ),
    socrata_config=presets.large_socrata_config(),
    cache_dir="./cache",
)

ts = to_timeseries(records, freq="W")
ts.plot(title="Weekly complaint volume")

panel = to_panel(records, freq="ME", geography="borough")
panel.xs("BROOKLYN")["Noise - Residential"].plot()
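
For readers without the pandas extra installed, the idea behind a weekly series and a borough-by-month panel can be sketched with the standard library alone. This is a conceptual illustration, not how to_timeseries or to_panel are implemented. The record tuples and the bucket conventions (weeks starting on Monday, months keyed as YYYY-MM) are assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical (created_date, borough, complaint_type) records.
records = [
    (date(2024, 1, 1), "BROOKLYN", "Noise - Residential"),
    (date(2024, 1, 3), "BROOKLYN", "Rodent"),
    (date(2024, 1, 9), "BROOKLYN", "Noise - Residential"),
    (date(2024, 2, 14), "BROOKLYN", "Noise - Residential"),
]


def week_start(day):
    """Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


# Time series: one count per week across all complaint types.
weekly = Counter(week_start(created) for created, _, _ in records)

# Panel: rows keyed by (borough, month), one column per complaint type.
panel = defaultdict(Counter)
for created, borough, complaint_type in records:
    panel[(borough, created.strftime("%Y-%m"))][complaint_type] += 1

for week, count in sorted(weekly.items()):
    print(week.isoformat(), count)
print(dict(panel[("BROOKLYN", "2024-01")]))
```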

Data surface

  • Socrata: dataset erm2-nwe9 (NYC 311 Service Requests from 2010 onward; tens of millions of rows). Use presets.large_socrata_config() for bulk pagination (default 5,000 rows per HTTP request) and nyc311.io.cached_fetch to stream pages to CSV without holding the full history in memory.
  • Boundaries: borough, community district, council district, NTA, census tract, and ZCTA layers ship through nyc311.geographies (built on nyc-geo-toolkit).
  • Caching: pass cache_dir and optional refresh / max_cached_records to pipeline.fetch_service_requests or io.load_service_requests so repeated runs reuse deterministic CSV snapshots under cache_dir.
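
The pagination and caching behavior described above can be pictured with a small sketch that pulls fixed-size pages, appends each one to a CSV as it arrives so that only one page is in memory at a time, and reuses an existing snapshot unless a refresh is requested. The fetch_page callable, the function name, and the file layout are hypothetical stand-ins; this is not the code behind nyc311.io.cached_fetch.

```python
import csv
import tempfile
from pathlib import Path


def stream_to_csv(fetch_page, path, fieldnames, page_size=5000,
                  max_pages=None, refresh=False):
    """Write pages from `fetch_page(offset, limit)` to `path`, one page at a time.

    Returns the number of rows written, or None when an existing
    snapshot is reused.
    """
    path = Path(path)
    if path.exists() and not refresh:
        return None  # reuse the cached snapshot
    written = 0
    page_index = 0
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        while max_pages is None or page_index < max_pages:
            page = fetch_page(page_index * page_size, page_size)
            writer.writerows(page)
            written += len(page)
            page_index += 1
            if len(page) < page_size:
                break  # a short page means the source is exhausted
    return written


# Stand-in for an HTTP call: serves slices of an in-memory list of 12 rows.
FAKE_ROWS = [{"unique_key": str(i), "complaint_type": "Rodent"} for i in range(12)]


def fake_fetch(offset, limit):
    return FAKE_ROWS[offset:offset + limit]


columns = ["unique_key", "complaint_type"]
with tempfile.TemporaryDirectory() as cache_dir:
    snapshot = Path(cache_dir) / "rodent.csv"
    first = stream_to_csv(fake_fetch, snapshot, columns, page_size=5)
    second = stream_to_csv(fake_fetch, snapshot, columns, page_size=5)
    line_count = len(snapshot.read_text().splitlines())
print(first, second, line_count)
```

A more careful version would write to a temporary file and rename it on success, so that an interrupted run never leaves a partial snapshot in the cache.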

Quick links

Docs: Home, Getting Started, CLI Reference, SDK Guide, Examples, Architecture, Contributing, Releasing, Changelog

Example

from datetime import date
from pathlib import Path

from nyc311 import analysis, export, models, pipeline

records = pipeline.fetch_service_requests(
    filters=models.ServiceRequestFilter(
        start_date=date(2025, 1, 1),
        end_date=date(2025, 1, 31),
        geography=models.GeographyFilter("borough", models.BOROUGH_BROOKLYN),
        complaint_types=("Noise - Residential",),
    ),
    socrata_config=models.SocrataConfig(page_size=250, max_pages=1),
)

export.export_service_requests_csv(
    records,
    models.ExportTarget("csv", Path("brooklyn-noise-snapshot.csv")),
)

assignments = analysis.extract_topics(records, models.TopicQuery("Noise - Residential"))
summary = analysis.aggregate_by_geography(assignments, geography="community_district")
export.export_topic_table(
    summary,
    models.ExportTarget("csv", Path("brooklyn-noise-topics.csv")),
)

CLI equivalent:

nyc311 fetch \
  --output brooklyn-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --page-size 250 \
  --max-pages 1

nyc311 topics \
  --source brooklyn-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography community_district \
  --output brooklyn-noise-topics.csv

Live-data snapshot workflow:

nyc311 fetch \
  --output brooklyn-rodent-snapshot.csv \
  --complaint-type "Rodent" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --page-size 500 \
  --max-pages 1

Data assumptions

load_service_requests() currently supports:

  • local CSV files
  • live Socrata loading via SocrataConfig

CSV inputs are expected to provide these columns:

  • unique_key
  • created_date
  • complaint_type
  • descriptor
  • borough
  • community_district or community_board

resolution_description is optional and loaded when present. It is currently used by the resolution-gap and report-card helpers, while topic extraction remains descriptor-driven.
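
A quick pre-flight check on a local extract can catch missing columns before loading. The sketch below encodes the column list above, assuming every listed column is required, treating community_district and community_board as alternatives, and treating resolution_description as optional. The function name and return shape are illustrative and not part of nyc311.

```python
import csv
import io

REQUIRED = ("unique_key", "created_date", "complaint_type", "descriptor", "borough")
DISTRICT_ALTERNATIVES = ("community_district", "community_board")
OPTIONAL = ("resolution_description",)


def check_columns(header):
    """Return (missing, optional_present) for a CSV header row."""
    present = set(header)
    missing = [name for name in REQUIRED if name not in present]
    if not present.intersection(DISTRICT_ALTERNATIVES):
        missing.append(" or ".join(DISTRICT_ALTERNATIVES))
    optional_present = [name for name in OPTIONAL if name in present]
    return missing, optional_present


# Hypothetical extract that uses community_board and omits the optional column.
sample = io.StringIO(
    "unique_key,created_date,complaint_type,descriptor,borough,community_board\n"
    "1,2025-01-03T22:15:00,Rodent,Rat Sighting,BROOKLYN,01 BROOKLYN\n"
)
header = next(csv.reader(sample))
missing, optional_present = check_columns(header)
print(missing, optional_present)
```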

Public package surface

The public API is organized around explicit namespaces:

  • nyc311.models for dataclasses, constants, and configs
  • nyc311.io for CSV and Socrata loading
  • nyc311.analysis for topic extraction, coverage, gaps, and anomalies
  • nyc311.geographies for the 311-facing compatibility layer over nyc-geo-toolkit
  • nyc311.samples for packaged sample records and sample-aligned boundaries
  • nyc311.export for CSV, GeoJSON, and report exports
  • nyc311.pipeline for one-call workflow helpers
  • nyc311.dataframes for optional pandas conversions
  • nyc311.spatial for optional geopandas helpers
  • nyc311.plotting for optional plotting helpers
  • nyc311.presets for reusable filter and Socrata config builders
  • nyc311.cli with the topics and fetch subcommands

Documentation

The hosted docs site is the canonical reference: nyc311.readthedocs.io.

If you are browsing on GitHub, the source docs live in docs/, including index.md, getting-started.md, cli.md, sdk.md, examples.md, api.md, architecture.md, and contributing.md.

Runnable examples live in examples/ as self-contained consumer projects.

For local preview:

make docs
make docs-build

Development

uv sync
uv sync --all-groups --all-extras
uv run --all-extras pytest -m "not integration"
uv run ruff check .
uv run ruff format --check .
uv run mypy
uv run mkdocs serve
uv run mkdocs build --strict
uv run python scripts/audit_public_api.py
uv run pytest -m "fetch and not integration"

License

MIT.
