Chicago open data used to validate CEP SNFEI.
civic-data-identity-us-il
This repository hosts raw and processed Illinois datasets used for validating entity identity, canonicalization, and adapter behavior in the Civic Interconnect project.
The primary dataset is the City of Chicago Contracts Dataset, which provides 182,000+ procurement records used to test:
- SNFEI identity stability across messy vendor names
- EFS v1 name normalization
- Adapter correctness for procurement verticals
- Cross-record entity consolidation (e.g., same vendor appearing many times)
- Address normalization (Chicago-specific conventions)
- Exchange construction (buyer → seller → contract mapping)
This repository contains both the full raw dataset and curated subsets designed as identity test fixtures.
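The name-normalization and consolidation goals listed above can be illustrated with a minimal sketch. It is not the EFS v1 rule set or the SNFEI algorithm: the suffix list, the function names, and the vendor strings are hypothetical and chosen only to show how messy variants of one vendor can collapse to a single key.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical suffix list for illustration only; not the EFS v1 rules.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation",
                  "co", "company", "ltd"}


def normalize_vendor_name(name: str) -> str:
    """Reduce a raw vendor name to a simple comparison key."""
    # Fold accented characters to ASCII and lowercase everything.
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Spell out ampersands, then drop remaining punctuation.
    text = text.replace("&", " and ")
    text = re.sub(r"[^a-z0-9\s]", "", text)
    tokens = text.split()
    # Strip trailing legal suffixes, but always keep at least one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def consolidate(names):
    """Group raw names that share the same normalized key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_vendor_name(raw)].append(raw)
    return dict(groups)


# Made-up vendor strings, not rows from the Chicago dataset.
variants = ["ACME Supply Co., Inc.", "Acme Supply, INC", "Smith & Sons LLC"]
groups = consolidate(variants)
```

In this sketch the two "Acme" variants fall into one group and "Smith & Sons LLC" into another. A real identity pipeline would hash or otherwise derive an identifier from the normalized form; that step is left out here because the source does not describe it.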
Repository
data/raw/
Unmodified raw datasets retrieved directly from official public sources.
Contains large files (up to ~50 MB).
These files are not stored in the main Civic Interconnect repo to avoid
repository bloat.
data/identity/
Curated, size-limited datasets (~5k–20k rows) used for:
- testing SNFEI convergence
- evaluating string normalization
- training adapters on realistic noise patterns
These subsets are suitable for inclusion as examples in the main CEP spec repo.
docs/provenance/
Contains PROV-YAML metadata files describing dataset lineage, publishers, and retrieval activities. These files follow W3C PROV-DM conventions.
scripts/
Utility scripts for extracting and shaping subsets from raw data. The provided Python tool generates deterministic, identity-rich samples.
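One way to get a deterministic sample is to rank vendors by a salted hash and keep whole vendor groups until a row budget is reached. The sketch below illustrates that idea only; it is not the script shipped in scripts/, and the function names, the salt, and the field names are assumptions.

```python
import hashlib
import json


def stable_rank(key: str, salt: str = "identity-sample-v1") -> str:
    """Return a hash-based sort key that does not depend on input order."""
    return hashlib.sha256(f"{salt}:{key}".encode("utf-8")).hexdigest()


def deterministic_sample(rows, key_field, max_rows):
    """Select whole groups of rows, in hash order, up to max_rows rows.

    Keeping every row for a selected key preserves the repeated,
    slightly different records that make a sample useful for identity tests.
    """
    by_key = {}
    for row in rows:
        by_key.setdefault(row[key_field], []).append(row)

    sample = []
    for key in sorted(by_key, key=stable_rank):
        # Sort rows inside a group so shuffled input yields the same output.
        group = sorted(by_key[key], key=lambda r: json.dumps(r, sort_keys=True))
        if len(sample) + len(group) > max_rows:
            continue  # skip groups that would exceed the budget
        sample.extend(group)
    return sample


# Made-up rows for illustration; field names are hypothetical.
rows = [
    {"vendor": "ACME SUPPLY INC", "contract": "1"},
    {"vendor": "ACME SUPPLY INC", "contract": "2"},
    {"vendor": "SMITH AND SONS", "contract": "3"},
    {"vendor": "LAKESHORE PAVING", "contract": "4"},
]
sample = deterministic_sample(rows, "vendor", max_rows=3)
```

Because selection depends only on the hash of each key and the row contents, rerunning the function on the same data, in any order, returns the same sample.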
Data Source
City of Chicago – Contracts Dataset
- URL: https://data.cityofchicago.org/Administration-Finance/Contracts/rsxa-ify5
- Publisher: City of Chicago
- License: Public Domain
- Fields include procurement descriptions, award amounts, departments, vendor names, addresses, and contract timeline fields.
- Used to test: identity resolution, canonicalization, and adapters for procurement verticals
Full provenance is provided in
docs/provenance/chicago_contracts.prov.yaml.
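For readers who want to retrieve the raw data themselves, the following sketch pages through the dataset as CSV. It assumes the portal exposes the standard Socrata resource endpoint for the dataset ID rsxa-ify5 (taken from the URL above); the helper names and page size are hypothetical, and this is not necessarily how the files in data/raw/ were retrieved.

```python
import csv
import io
import urllib.request
from urllib.parse import urlencode

PORTAL = "https://data.cityofchicago.org"
DATASET_ID = "rsxa-ify5"  # from the dataset URL in this README


def page_url(offset: int = 0, limit: int = 1000) -> str:
    """Build a CSV page URL (assumes the usual Socrata /resource/ path)."""
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"{PORTAL}/resource/{DATASET_ID}.csv?{query}"


def fetch_page(offset: int = 0, limit: int = 1000):
    """Download one page and parse it into a list of dict rows."""
    with urllib.request.urlopen(page_url(offset, limit)) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Calling fetch_page repeatedly with increasing offsets until an empty list comes back would walk the full dataset. Recording the retrieval date alongside the download keeps the result consistent with the provenance file.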
Citation
If you use this repository, please cite both:
- This repository (see CITATION.cff)
- The original City of Chicago dataset (automatically included in references)
Relationship to civic-interconnect
This repository serves as a data companion to the main specification and implementation in:
https://github.com/civic-interconnect/civic-interconnect
Only smaller derived files (e.g., 5k–20k row identity samples) are copied into the main repository under:
examples/identity/us_il_chicago/
The separation keeps CEP maintainable and free of large artifacts while preserving full reproducibility.
License
Raw public datasets retain their original license (Public Domain for Chicago
Open Data).
All derived outputs, scripts, and documentation in this repository are licensed
under the MIT License unless otherwise noted.
File details
Details for the file civic_data_identity_us_il-0.1.1.tar.gz.
File metadata
- Download URL: civic_data_identity_us_il-0.1.1.tar.gz
- Upload date:
- Size: 22.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35ef29ff1b08aeca523e50cccb4b5f84e7830eb831296270132dfda70438a915 |
| MD5 | 29044a9f679162d6b8f4cf839f2cc2a7 |
| BLAKE2b-256 | fa85be320f915ddcabe97d0ecc5679204a7b2993c98dadceb4b4ff7d83adf783 |
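A downloaded release file can be checked against the published SHA256 digests with the standard library. This is a minimal sketch; the function names are hypothetical, and the digests are copied from the file details on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details for release 0.1.1.
EXPECTED_SHA256 = {
    "civic_data_identity_us_il-0.1.1.tar.gz":
        "35ef29ff1b08aeca523e50cccb4b5f84e7830eb831296270132dfda70438a915",
    "civic_data_identity_us_il-0.1.1-py3-none-any.whl":
        "849bd8cbeacaaf477d6aafd50ae374dca9d0af5b3917811c3831aa7a31c9210a",
}


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of_file(path) == expected
```

A file whose name is not in the table is treated as unverified rather than raising an error, which keeps the check safe to run over a whole download directory.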
Provenance
The following attestation bundles were made for civic_data_identity_us_il-0.1.1.tar.gz:
Publisher: release.yml on civic-interconnect/civic-data-identity-us-il
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: civic_data_identity_us_il-0.1.1.tar.gz
- Subject digest: 35ef29ff1b08aeca523e50cccb4b5f84e7830eb831296270132dfda70438a915
- Sigstore transparency entry: 760335075
- Sigstore integration time:
- Permalink: civic-interconnect/civic-data-identity-us-il@762fa747d0571c9847f728f96dfc8b26c8e5b038
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/civic-interconnect
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@762fa747d0571c9847f728f96dfc8b26c8e5b038
- Trigger Event: push
File details
Details for the file civic_data_identity_us_il-0.1.1-py3-none-any.whl.
File metadata
- Download URL: civic_data_identity_us_il-0.1.1-py3-none-any.whl
- Upload date:
- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 849bd8cbeacaaf477d6aafd50ae374dca9d0af5b3917811c3831aa7a31c9210a |
| MD5 | 69f27cdf4c9a999f1ed5900e3f5e24f1 |
| BLAKE2b-256 | de887707f08f4301653524b7795d9b846c82dfa75c5d13d69088e406302dbf26 |
Provenance
The following attestation bundles were made for civic_data_identity_us_il-0.1.1-py3-none-any.whl:
Publisher: release.yml on civic-interconnect/civic-data-identity-us-il
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: civic_data_identity_us_il-0.1.1-py3-none-any.whl
- Subject digest: 849bd8cbeacaaf477d6aafd50ae374dca9d0af5b3917811c3831aa7a31c9210a
- Sigstore transparency entry: 760335076
- Sigstore integration time:
- Permalink: civic-interconnect/civic-data-identity-us-il@762fa747d0571c9847f728f96dfc8b26c8e5b038
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/civic-interconnect
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@762fa747d0571c9847f728f96dfc8b26c8e5b038
- Trigger Event: push