
NLP pipeline for extracting topics, anomalies, and resolution gaps from NYC 311 complaint data.


nyc311


Python toolkit for building reproducible complaint-intelligence outputs from NYC 311 service-request data through both a thin CLI and a functional SDK.

Status

nyc311 is preparing its first public stable release in the 0.2 line with a complete first-pass toolkit for loading, analyzing, and exporting NYC 311 complaint data.

The 0.2 version number reflects the current scope more accurately than the v0.1 "foundation" framing the project started from.

Implemented in the 0.2 release line

  • load filtered NYC 311-style records from local CSV extracts or the live Socrata API
  • stage filtered live slices as reproducible local CSV snapshots
  • derive deterministic first-pass topic labels for supported complaint types
  • aggregate complaint topics by borough or community district
  • measure topic-rule coverage and summarize resolution gaps
  • score anomalies over aggregated topic summaries
  • export CSV tables, boundary-backed GeoJSON, and markdown report cards
  • run the workflow through both a thin CLI and a composable functional SDK
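This README does not define exactly how topic-rule coverage or anomaly scores are computed. As a rough illustration only, the sketch below treats coverage as the share of records that received a rule-based topic label (with `None` meaning "no rule matched") and flags anomalies with a plain z-score over per-district counts. The function names, the `None` convention, and the z-score method are all assumptions standing in for nyc311's own helpers, not its API.

```python
from __future__ import annotations

from statistics import mean, pstdev


def topic_coverage(labels: list[str | None]) -> float:
    """Hypothetical helper: share of records whose label is not None."""
    if not labels:
        return 0.0
    matched = sum(1 for label in labels if label is not None)
    return matched / len(labels)


def zscore_anomalies(counts: dict[str, int], threshold: float = 2.0) -> dict[str, float]:
    """Hypothetical helper: areas whose count lies more than `threshold`
    population standard deviations from the mean (plain z-score, used here
    only as a stand-in for nyc311's own anomaly scoring)."""
    values = list(counts.values())
    if len(values) < 2:
        return {}
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every area has the same count, so nothing stands out.
        return {}
    scores = {area: (n - mu) / sigma for area, n in counts.items()}
    return {area: z for area, z in scores.items() if abs(z) > threshold}


print(topic_coverage(["music", None, "banging", "music"]))  # 0.75
```

Note that with population standard deviation the largest possible |z| among n values is sqrt(n - 1), so a threshold of 2.0 can only fire once there are at least six areas to compare.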

Install

Choose the dependency footprint that matches your workflow:

pip install nyc311

For the full turnkey experience:

pip install "nyc311[all]"

For pandas-backed conversion helpers:

pip install "nyc311[dataframes]"

For plotting and exploratory analysis without the geospatial stack:

pip install "nyc311[science]"

Why this exists

NYC 311 data is one of the richest public records of neighborhood quality-of-life complaints in the country, but much of the useful signal is locked inside short text fields such as complaint descriptors.

This project aims to turn those records into reusable outputs for civic analysis, journalism, and research while staying honest about what is truly implemented today.

Core workflow

The current 0.2 release line focuses on a deterministic, testable workflow:

  1. read a local CSV extract of NYC 311-style records or load a filtered slice from Socrata
  2. filter rows by date, geography, and complaint type
  3. assign a first-pass topic label using explicit keyword rules
  4. aggregate counts by borough or community district
  5. export the result as a CSV summary table or boundary-backed GeoJSON
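Steps 2 and 4 can be sketched with the standard library alone. The `Record` dataclass below mirrors the CSV columns listed under "Data assumptions", but it is a hypothetical stand-in rather than nyc311's own model, and the sample rows are invented for illustration.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Collection, Iterable
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical in-memory row mirroring the documented CSV columns.
    unique_key: str
    created_date: date
    complaint_type: str
    descriptor: str
    borough: str
    community_district: str


def filter_records(records: Iterable[Record], start: date, end: date,
                   borough: str, complaint_types: Collection[str]) -> list[Record]:
    """Keep rows created within [start, end] for one borough and complaint types."""
    return [
        r for r in records
        if start <= r.created_date <= end
        and r.borough == borough
        and r.complaint_type in complaint_types
    ]


def count_by_district(records: Iterable[Record]) -> Counter[str]:
    """Count the surviving rows per community district."""
    return Counter(r.community_district for r in records)


# Invented sample rows: two match, one falls outside January, one is a Rodent complaint.
rows = [
    Record("1", date(2025, 1, 5), "Noise - Residential", "Loud Music/Party", "BROOKLYN", "01 BROOKLYN"),
    Record("2", date(2025, 1, 20), "Noise - Residential", "Banging/Pounding", "BROOKLYN", "01 BROOKLYN"),
    Record("3", date(2025, 2, 2), "Noise - Residential", "Loud Music/Party", "BROOKLYN", "03 BROOKLYN"),
    Record("4", date(2025, 1, 9), "Rodent", "Rat Sighting", "BROOKLYN", "03 BROOKLYN"),
]
kept = filter_records(rows, date(2025, 1, 1), date(2025, 1, 31), "BROOKLYN", {"Noise - Residential"})
print(count_by_district(kept))  # Counter({'01 BROOKLYN': 2})
```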

Supported topic extraction

The current rules-based topic extractor is implemented only for:

  • Blocked Driveway
  • Illegal Parking
  • Noise - Residential
  • Rodent

This is intentionally described as first-pass topic extraction, not clustering or advanced NLP.
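"Explicit keyword rules" can be pictured as an ordered lookup table per complaint type, where the first rule whose keyword appears in the descriptor wins. The sketch below covers only two of the four supported complaint types, and its labels and keywords are hypothetical; nyc311's actual rule table is not documented here.

```python
from __future__ import annotations

import re

# Hypothetical rule table: complaint type -> ordered (topic label, keywords).
RULES: dict[str, list[tuple[str, tuple[str, ...]]]] = {
    "Noise - Residential": [
        ("music_or_party", ("music", "party")),
        ("banging", ("banging", "pounding")),
    ],
    "Rodent": [
        ("rat_sighting", ("rat",)),
        ("mouse_sighting", ("mouse", "mice")),
    ],
}


def assign_topic(complaint_type: str, descriptor: str) -> str | None:
    """Return the first matching rule label, or None if nothing matches.

    Matching is on whole lowercase words, so "rat" does not match inside
    longer words. Unsupported complaint types always return None.
    """
    words = set(re.findall(r"[a-z]+", descriptor.lower()))
    for label, keywords in RULES.get(complaint_type, []):
        if any(keyword in words for keyword in keywords):
            return label
    return None


print(assign_topic("Noise - Residential", "Loud Music/Party"))  # music_or_party
```

Because the rules are an explicit, ordered table, the same input always yields the same label, which is what makes this first-pass approach easy to test and audit compared with clustering.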


Example

from datetime import date
from pathlib import Path

from nyc311 import analysis, export, models, pipeline

# Fetch January 2025 residential-noise requests for Brooklyn from Socrata,
# limited to a single page of at most 250 records.
records = pipeline.fetch_service_requests(
    filters=models.ServiceRequestFilter(
        start_date=date(2025, 1, 1),
        end_date=date(2025, 1, 31),
        geography=models.GeographyFilter("borough", models.BOROUGH_BROOKLYN),
        complaint_types=("Noise - Residential",),
    ),
    socrata_config=models.SocrataConfig(page_size=250, max_pages=1),
)

# Stage the fetched slice as a reproducible local CSV snapshot.
export.export_service_requests_csv(
    records,
    models.ExportTarget("csv", Path("brooklyn-noise-snapshot.csv")),
)

# Assign first-pass topic labels, count them per community district,
# and write the summary table to CSV.
assignments = analysis.extract_topics(records, models.TopicQuery("Noise - Residential"))
summary = analysis.aggregate_by_geography(assignments, geography="community_district")
export.export_topic_table(
    summary,
    models.ExportTarget("csv", Path("brooklyn-noise-topics.csv")),
)

CLI equivalent:

nyc311 fetch \
  --output brooklyn-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --page-size 250 \
  --max-pages 1

nyc311 topics \
  --source brooklyn-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography community_district \
  --output brooklyn-noise-topics.csv

Live-data snapshot workflow:

nyc311 fetch \
  --output brooklyn-rodent-snapshot.csv \
  --complaint-type "Rodent" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --page-size 500 \
  --max-pages 1

Data assumptions

load_service_requests() currently supports:

  • local CSV files
  • live Socrata loading via SocrataConfig

CSV inputs use these columns:

  • unique_key
  • created_date
  • complaint_type
  • descriptor
  • borough
  • community_district or community_board

resolution_description is optional and loaded when present. It is currently used by the resolution-gap and report-card helpers, while topic extraction remains descriptor-driven.
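The column rules above (five required columns, either community_district or community_board, and an optional resolution_description) can be checked with `csv.DictReader`. This is a hypothetical standalone reader for illustration; nyc311's own entry point is load_service_requests(), whose internals may differ.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO

REQUIRED = ("unique_key", "created_date", "complaint_type", "descriptor", "borough")
DISTRICT_COLUMNS = ("community_district", "community_board")  # first one present wins


def read_rows(handle: TextIO) -> list[dict[str, str | None]]:
    """Hypothetical reader that validates the documented 311 CSV columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    district_col = next((col for col in DISTRICT_COLUMNS if col in header), None)
    if district_col is None:
        raise ValueError("need a community_district or community_board column")
    rows: list[dict[str, str | None]] = []
    for raw in reader:
        row: dict[str, str | None] = {col: raw[col] for col in REQUIRED}
        # Normalize either district column to a single key.
        row["community_district"] = raw[district_col]
        # Optional column: kept when present, otherwise None.
        row["resolution_description"] = raw.get("resolution_description")
        rows.append(row)
    return rows


SAMPLE = (
    "unique_key,created_date,complaint_type,descriptor,borough,community_board\n"
    "1,2025-01-05,Rodent,Rat Sighting,BROOKLYN,01 BROOKLYN\n"
)
rows = read_rows(io.StringIO(SAMPLE))
```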

Public package surface

The current public package surface is organized around explicit namespaces:

  • nyc311.models for dataclasses, constants, and configs
  • nyc311.io for CSV and Socrata loading
  • nyc311.analysis for topic extraction, coverage, gaps, and anomalies
  • nyc311.geographies for packaged boundary layers and geometry helpers
  • nyc311.samples for packaged sample records and sample-aligned boundaries
  • nyc311.export for CSV, GeoJSON, and report exports
  • nyc311.pipeline for one-call workflow helpers
  • nyc311.dataframes for optional pandas conversions
  • nyc311.spatial for optional geopandas helpers
  • nyc311.plotting for optional plotting helpers
  • nyc311.presets for reusable filter and Socrata config builders
  • nyc311.cli with the topics and fetch subcommands
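Since nyc311.dataframes and nyc311.spatial depend on optional extras, it can help to check which backends are importable before using them. The sketch below uses only `importlib.util.find_spec`; the namespace-to-package mapping is inferred from the descriptions above (pandas and geopandas) and is an assumption, and nyc311.plotting is left out because its backend is not named here.

```python
from __future__ import annotations

from importlib.util import find_spec

# Assumed mapping from optional nyc311 namespace to the package it relies on.
OPTIONAL_BACKENDS = {
    "nyc311.dataframes": "pandas",
    "nyc311.spatial": "geopandas",
}


def available_optional_namespaces() -> dict[str, bool]:
    """Report, per namespace, whether its backend package can be imported."""
    return {ns: find_spec(pkg) is not None for ns, pkg in OPTIONAL_BACKENDS.items()}


print(available_optional_namespaces())
```

If a backend reports False, installing the matching extra (for example "nyc311[dataframes]") is the documented way to add it.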

Documentation

The hosted docs site is the canonical reference:

If you are browsing on GitHub, the docs source lives in docs/:

  • docs/index.md
  • docs/getting-started.md
  • docs/cli.md
  • docs/sdk.md
  • docs/examples.md
  • docs/api.md
  • docs/architecture.md
  • docs/contributing.md

Runnable examples live in examples/ as self-contained consumer projects.

For local preview:

make docs
make docs-build

Development

uv sync
uv sync --all-groups --all-extras
uv run --all-extras pytest -m "not integration"
uv run ruff check .
uv run ruff format --check .
uv run mypy
uv run mkdocs serve
uv run mkdocs build --strict
uv run python scripts/audit_public_api.py
uv run pytest -m "fetch and not integration"

License

MIT.


Download files


Source Distribution

nyc311-0.2.0.tar.gz (12.1 MB)

Uploaded Source

Built Distribution


nyc311-0.2.0-py3-none-any.whl (8.6 MB)

Uploaded Python 3

File details

Details for the file nyc311-0.2.0.tar.gz.

File metadata

  • Download URL: nyc311-0.2.0.tar.gz
  • Upload date:
  • Size: 12.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nyc311-0.2.0.tar.gz:

  • SHA256: 1e51036a217687cde62ede6dee79c3de2fd11c88b2d9798db75439a749d47d01
  • MD5: 6d50f752b37556f75f9b51d4cc47d41c
  • BLAKE2b-256: 606eade05715c2af19e54a7c2222589135acab008573816b9c9f9d0bac091ced


Provenance

The following attestation bundles were made for nyc311-0.2.0.tar.gz:

Publisher: cd.yml on random-walks/nyc311


File details

Details for the file nyc311-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: nyc311-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 8.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nyc311-0.2.0-py3-none-any.whl:

  • SHA256: 36603ff125e9abdcb22cacf4d0dd3ab13d0c164dbb581512bfb43f7c158e9aa3
  • MD5: 7d44e15b70b62efb9892a0ccdaf003a0
  • BLAKE2b-256: bcfe8740b81db07fbff2bf4ec1aad30bea0715a69a3560831275472a2210d49e


Provenance

The following attestation bundles were made for nyc311-0.2.0-py3-none-any.whl:

Publisher: cd.yml on random-walks/nyc311

