Collect, index, and search U.S. Congress data — Congressional Record, Members, Bills, Votes — via api.congress.gov.

Project description

Concord

A pipeline for collecting U.S. Congress data — currently the daily Congressional Record (proceedings) and the directory of Members — via api.congress.gov. Stores everything locally as JSON Lines + SQLite, with a FastAPI search demo on top.

Distributed on PyPI as congress-concord; imported in Python as concord (the bare concord name on PyPI was already taken).

Install

pip install congress-concord

Requires Python 3.12+. The install is batteries-included — every concord subcommand (scrape, load, index, run, serve) works out of the box.

Quick start

export CONGRESS_API_KEY=...  # free key from https://api.data.gov/signup/
export OPENAI_API_KEY=...    # required for proceedings indexing (semantic search)

# Proceedings — one day's articles, end-to-end:
concord run proceedings --from 2026-05-22 --to 2026-05-22

# Members — current Congress, end-to-end:
concord run members --congresses 119

# Serve the web demo:
concord serve

To work on Concord itself (rather than just use it), clone the repo and run uv sync instead:

git clone https://github.com/johnmarcampbell/concord
cd concord
uv sync
uv run concord run proceedings --from 2026-05-22 --to 2026-05-22

Output for run proceedings:

→ Stage 0: scrape
Wrote 20 new proceedings to data/proceedings.jsonl (skipped 0 already present)
→ Stage 1: load
Loaded 20 new proceedings into data/proceedings.db (skipped 0 already present)
→ Stage 2: index
Indexed: chunked 20 new proceedings (98 new chunks, …); embedded 98 new chunks
✓ Done.

Re-running any command is a no-op — already-stored records are detected by their natural key (granule_id for proceedings, bioguide_id for members) and skipped. Kill the process at any point and the next run resumes.
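The skip-if-present behaviour described above can be pictured with a short Python sketch. This is an illustration under stated assumptions, not Concord's actual code: the helper names load_seen_keys and append_new are hypothetical, and only the granule_id key and the JSON Lines layout come from this README.

```python
import json
from pathlib import Path


def load_seen_keys(path: Path, key: str) -> set[str]:
    """Rebuild the dedup index by scanning the JSONL file already on disk."""
    if not path.exists():
        return set()
    seen: set[str] = set()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                seen.add(json.loads(line)[key])
    return seen


def append_new(path: Path, records: list[dict], key: str) -> tuple[int, int]:
    """Append records whose natural key is not stored yet.

    Returns (written, skipped), mirroring the counts in the summary lines.
    """
    seen = load_seen_keys(path, key)
    written = skipped = 0
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            if record[key] in seen:
                skipped += 1
                continue
            fh.write(json.dumps(record) + "\n")
            seen.add(record[key])
            written += 1
    return written, skipped
```

Because the index is rebuilt from the file each time, a second call with the same records writes nothing and reports every record as skipped.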

Getting an API key

api.congress.gov requires a free key from api.data.gov. Sign up at https://api.data.gov/signup/ — the key arrives by email immediately. Rate limit is 5,000 requests per hour.

OPENAI_API_KEY is only needed for concord index proceedings (chunk embeddings) and concord serve (query embedding). Member name search uses FTS5 only, no embeddings.

Pass keys via environment variables; there are no --api-key flags.

CLI

Concord follows a <stage> <entity> shape — every entity type goes through Stage 0 (scrape), Stage 1 (load), Stage 2 (index), and there's a run that chains all three.

Command	What it does
`concord scrape proceedings --from YYYY-MM-DD --to YYYY-MM-DD`	Stage 0 — fetch articles, write JSONL.
`concord load proceedings`	Stage 1 — mirror the JSONL into the `proceedings` SQLite table.
`concord index proceedings`	Stage 2 — chunk + embed every proceeding into FTS5 and `sqlite-vec`.
`concord run proceedings --from … --to …`	All three back-to-back.
`concord scrape members --congresses 117,118,119`	Stage 0 — snapshot members of those Congresses.
`concord load members`	Stage 1 — project the latest snapshot per Bioguide ID into `members` + `member_terms`.
`concord index members`	Stage 2 — populate the `members_fts` FTS5 table.
`concord run members --congresses …`	All three back-to-back.
`concord serve`	Run the FastAPI search demo via uvicorn.

Every command supports --help for its full flag list. Stage commands accept --progress / --no-progress; progress is on by default and overwrites itself in place on a TTY. Store files default to ./data/.

Entities

Proceedings

One Proceeding record per article in the daily Congressional Record. Written one-per-line to data/proceedings.jsonl.

Field	Type	Description
`issue_date`	date (`YYYY-MM-DD`)	The day the issue was published.
`congress`	int	Which Congress (e.g. 119).
`session`	int (1 or 2)	First or second session of that Congress.
`volume`	int	Daily Record volume number.
`issue_number`	int	Issue number within the volume.
`update_date`	datetime	When `api.congress.gov` last revised this issue.
`section`	string	`Senate Section`, `House Section`, `Extensions of Remarks Section`, or `Daily Digest`.
`title`	string	Article title from the API.
`start_page`, `end_page`	string	Page range, e.g. `D551`–`D552`.
`text_url`	URL	Formatted-text source URL on congress.gov.
`pdf_url`	URL	PDF source URL.
`granule_id`	string	Stable identifier, e.g. `CREC-2026-05-22-pt1-PgD551-6`. Used for dedup.
`text`	string	The full plain text of the proceeding.
`fetched_at`	datetime	When Concord retrieved this article.
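Since the CLI's on-disk JSONL format is meant to be readable by other tools, a consumer might look roughly like the sketch below. It relies only on the section field from the table above; the function names iter_proceedings and count_by_section are hypothetical and are not part of Concord.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_proceedings(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one Proceeding dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_section(lines: Iterable[str]) -> Counter:
    """Count proceedings per Congressional Record section."""
    return Counter(record["section"] for record in iter_proceedings(lines))
```

Both functions accept any iterable of lines, so an open file handle for data/proceedings.jsonl can be passed directly.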

Members

A Member is a person who has served in Congress, identified by Bioguide ID. Each fetch appends a snapshot envelope to data/members.jsonl (per ADR 0006):

{
  "fetched_at": "2026-05-25T14:02:11+00:00",
  "key": {"bioguide_id": "S000033"},
  "payload": { "...raw /v3/member payload..." }
}

The Stage 1 loader keeps the latest snapshot per Bioguide ID and projects it into two SQLite tables:

members — identity fields that don't change across a career (name, birth year, photo URL).
member_terms — one row per (bioguide_id, congress, chamber). Carries party, state, district (House only), and start_date/end_date. A Member who switched parties or chambers between Congresses has multiple Term rows with the historical values intact.
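The "latest snapshot per Bioguide ID" step can be sketched in a few lines of Python. This is a simplified illustration rather than the real loader: the function name latest_snapshots is hypothetical, and it assumes fetched_at is always an ISO 8601 timestamp as in the envelope shown above.

```python
import json
from datetime import datetime
from typing import Iterable


def latest_snapshots(lines: Iterable[str]) -> dict[str, dict]:
    """Map each bioguide_id to its most recently fetched snapshot envelope."""
    latest: dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        snapshot = json.loads(line)
        bioguide_id = snapshot["key"]["bioguide_id"]
        fetched_at = datetime.fromisoformat(snapshot["fetched_at"])
        current = latest.get(bioguide_id)
        # Keep the newer envelope; ties go to the line that appears later.
        if current is None or fetched_at >= datetime.fromisoformat(
            current["fetched_at"]
        ):
            latest[bioguide_id] = snapshot
    return latest
```

Comparing timestamps rather than trusting file order keeps the result correct even if snapshots were appended out of order.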

Member name search uses an FTS5 index (members_fts) over the direct and inverted name forms. No embeddings — BM25 + porter stemming is the right tool for short proper nouns.
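For readers unfamiliar with FTS5, the sketch below shows the general technique with Python's built-in sqlite3 module. The column names, the sample rows, and the search_members helper are assumptions made for illustration; Concord's real members_fts schema may differ. It also assumes the local SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Illustrative schema: both name forms are indexed, the ID is stored only.
conn.execute(
    "CREATE VIRTUAL TABLE members_fts USING fts5("
    "bioguide_id UNINDEXED, direct_name, inverted_name, tokenize = 'porter')"
)

# Fictional rows, not real Members.
conn.executemany(
    "INSERT INTO members_fts VALUES (?, ?, ?)",
    [
        ("X000001", "Jane Q. Example", "Example, Jane Q."),
        ("X000002", "John Sample", "Sample, John"),
    ],
)


def search_members(conn: sqlite3.Connection, query: str, limit: int = 10):
    """Return (bioguide_id, direct_name) rows ranked by BM25."""
    # bm25() gives better matches lower scores, so ascending order ranks best first.
    # The query string is interpreted using FTS5 query syntax.
    rows = conn.execute(
        "SELECT bioguide_id, direct_name FROM members_fts "
        "WHERE members_fts MATCH ? ORDER BY bm25(members_fts) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Indexing both the direct and inverted forms means a query matches regardless of which order the name was typed in.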

See CONTEXT.md for the full vocabulary and docs/plans/phase-1-members.md for the design rationale.

Web demo (`concord serve`)

Single-process FastAPI + Jinja2 + HTMX, reading the same SQLite file the pipeline writes. Routes:

GET / — landing page with a search box.
GET /search?q=… — federated search. Renders Members and Proceedings in two grouped sections; checkboxes above the results suppress either independently.
GET /members — browse all currently-serving Members with chamber/party filters.
GET /members/{bioguide_id} — Member profile: photo, current role, biography, term history.
GET /proceedings/{granule_id} — full text of one proceeding.

By default concord serve binds to 127.0.0.1:8000 for use behind a reverse proxy.
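A script that talks to a locally running demo needs correctly encoded URLs for these routes. The sketch below builds them with the standard library; the helper names are hypothetical, and the base URL simply reuses the default bind address stated above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8000"  # default bind address of concord serve


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """URL for the federated search route, with the query safely encoded."""
    return f"{base_url}/search?{urlencode({'q': query})}"


def member_url(bioguide_id: str, base_url: str = BASE_URL) -> str:
    """URL for one Member profile."""
    return f"{base_url}/members/{quote(bioguide_id, safe='')}"


def proceeding_url(granule_id: str, base_url: str = BASE_URL) -> str:
    """URL for the full text of one proceeding."""
    return f"{base_url}/proceedings/{quote(granule_id, safe='')}"
```

With the server running, any of these can be passed to urllib.request.urlopen or opened in a browser.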

Backfill

The Daily Congressional Record is available via the API from 1995 onward. (Older material lives under the Bound Congressional Record endpoint with a multi-year publication lag, and is deliberately out of scope for the current rebuild — see docs/rebuild-plan.md.)

A full 1995-to-present proceedings backfill is roughly:

~5,800 issues to enumerate (≈24 paginated list calls at the API's 250-per-page max)
~5,800 articles-list calls, one per in-range issue
~290,000 text fetches to congress.gov (these don't count against the API rate limit)
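The arithmetic behind these estimates can be checked in a few lines of Python. The inputs are the approximate figures from the list above and the 5,000-requests-per-hour limit mentioned earlier; the derived lower bound covers only the rate-limited API calls and is a back-of-the-envelope figure, not a measurement.

```python
import math

ISSUES = 5_800               # approximate issue count, 1995 to present
PAGE_SIZE = 250              # API maximum page size
TEXT_FETCHES = 290_000       # approximate number of article text fetches
RATE_LIMIT_PER_HOUR = 5_000  # api.congress.gov rate limit

# Paginated calls needed to enumerate every issue.
list_calls = math.ceil(ISSUES / PAGE_SIZE)

# Rate-limited calls: the issue listing plus one articles-list call per issue.
api_calls = list_calls + ISSUES

# Lower bound on wall-clock time imposed by the rate limit alone.
min_api_hours = api_calls / RATE_LIMIT_PER_HOUR

# Implied average number of articles per issue.
articles_per_issue = TEXT_FETCHES / ISSUES

print(f"issue-list calls:    {list_calls}")
print(f"rate-limited calls:  {api_calls}")
print(f"minimum API hours:   {min_api_hours:.2f}")
print(f"articles per issue:  {articles_per_issue:.0f}")
```

The rate-limited calls alone therefore take a little over an hour, and the text fetches, at roughly 50 per issue, account for most of the remaining time.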

In practice that's several hours, network-bound. Recommended pattern:

tmux new -s concord
export CONGRESS_API_KEY=...
uv run concord scrape proceedings --from 1995-01-01 --to 2026-12-31
# detach: Ctrl+b d

The JSONL file is safe to tail or wc -l while the pull is in progress. Killing the process (Ctrl+C, OOM, machine reboot) loses at worst the single in-flight record; the next invocation resumes via the dedup index built from the file on disk.

Members are much smaller — the last three Congresses fit in a single concord run members --congresses 117,118,119 in under a minute.

Architecture

Stage 0 (scrape) and Stage 1 (load) are parallel per entity type; Stage 2 (index) and the web layer are shared. See ADR 0007 for the rationale.

src/concord/
  api.py               # typed wrapper for api.congress.gov
  text.py              # fetch_text(url, client) — plain text from <pre>-wrapped HTML
  models.py            # Pydantic: Issue, Article, Proceeding, Member, Term, MemberSnapshot
  chunking.py          # chunk(text) -> Chunk[] for Stage 2 indexing
  embedding.py         # OpenAI Embedder wrapper
  scraper/
    proceedings.py     # Stage 0 — congressional record articles
    members.py         # Stage 0 — /member/congress/{n}
  pipeline/
    load_proceedings.py    # Stage 1 — JSONL -> proceedings table
    load_members.py        # Stage 1 — snapshot JSONL -> members + member_terms
    index_proceedings.py   # Stage 2 — chunks + FTS5 + vector embeddings
    index_members.py       # Stage 2 — members_fts
  storage/
    base.py            # Storage Protocol
    jsonl.py           # raw-store backend for Proceedings
    sqlite.py          # derived store — all entities, all indexes
  web/
    app.py, search.py  # FastAPI routes + federated query layer
    templates/         # Jinja2 + HTMX
  cli.py               # typer entry point

See docs/rebuild-plan.md for the rebuild rationale, docs/plans/ for per-phase plans, and docs/adr/ for the design decisions.

Development

uv sync
uv run pre-commit install   # one-time, wires up the local commit hook
uv run ruff check
uv run ruff format --check
uv run mypy src
uv run pytest

The pre-commit hook runs ruff format and ruff check --fix on every commit; CI runs the three checks above (ruff check, ruff format --check, and mypy) plus pytest.

Versioning and API stability

Concord is below 1.0 and is not committing to a stable Python API yet. What semver does track for congress-concord releases:

concord <subcommand> shape, flag names, exit codes, and the format of the success-summary lines printed to stdout
The on-disk JSONL and SQLite formats that the CLI produces (other tools may read these)

What semver does not track (yet):

Python imports. from concord.storage.sqlite import ... and similar internal imports may move between minor versions as the codebase refactors. Build CLI workflows on top of concord, not Python integrations, until 1.0.
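One way to follow that advice from Python is to drive the CLI through subprocess rather than importing concord internals. The sketch below is a minimal example under stated assumptions: the helper names are hypothetical, the subcommands and flags are the ones listed in the CLI table, and it assumes a non-zero exit code signals failure, as is conventional.

```python
import subprocess
from typing import Sequence

STAGES = ("scrape", "load", "index")


def stage_commands(entity: str, scrape_args: Sequence[str] = ()) -> list[list[str]]:
    """Build one argv per stage; only the scrape stage takes range flags."""
    commands = []
    for stage in STAGES:
        argv = ["concord", stage, entity]
        if stage == "scrape":
            argv.extend(scrape_args)
        commands.append(argv)
    return commands


def run_pipeline(entity: str, scrape_args: Sequence[str] = ()) -> None:
    """Run the three stages in order, stopping at the first failure."""
    for argv in stage_commands(entity, scrape_args):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)
```

Calling run_pipeline("members", ["--congresses", "119"]) would run the same three stages that concord run members chains together, while leaving room to add logging or retries between them.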

See ADR 0014 for the reasoning. Maintainers: docs/releases.md is the recipe for cutting a release.

License

MIT — see LICENSE.

Project details

Maintainers

johnmarcampbell

Release history

Version	Release date
0.7.1	Jun 12, 2026
0.7.0	Jun 8, 2026
0.6.0	Jun 1, 2026
0.4.0	Jun 1, 2026
0.3.0	May 28, 2026
0.2.1	May 27, 2026

Download files

Source Distribution

congress_concord-0.7.1.tar.gz (722.7 kB)

Uploaded Jun 12, 2026 Source

Built Distribution

congress_concord-0.7.1-py3-none-any.whl (226.5 kB)

Uploaded Jun 12, 2026 Python 3

File details

Details for the file congress_concord-0.7.1.tar.gz.

File metadata

Download URL: congress_concord-0.7.1.tar.gz
Upload date: Jun 12, 2026
Size: 722.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for congress_concord-0.7.1.tar.gz
Algorithm	Hash digest
SHA256	`120a4351a4940057cde13255cd13e9ffc9a2348968cfc1fcf534cd2bb5d2ae54`
MD5	`d6b02c53d7c32752700b2c1a28a6a3eb`
BLAKE2b-256	`93a8355dbd898e1a289d7c4918ae3364b00504d7fd9a7785cb558a91fab403f8`

Provenance

The following attestation bundles were made for congress_concord-0.7.1.tar.gz:

Publisher: release.yml on johnmarcampbell/concord

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: congress_concord-0.7.1.tar.gz
- Subject digest: 120a4351a4940057cde13255cd13e9ffc9a2348968cfc1fcf534cd2bb5d2ae54
- Sigstore transparency entry: 1797143509
- Sigstore integration time: Jun 12, 2026
Source repository:
- Permalink: johnmarcampbell/concord@c5f027510e12bcdf7143cb5825e69dcf276a55a2
- Branch / Tag: refs/tags/v0.7.1
- Owner: https://github.com/johnmarcampbell
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c5f027510e12bcdf7143cb5825e69dcf276a55a2
- Trigger Event: release

File details

Details for the file congress_concord-0.7.1-py3-none-any.whl.

File metadata

Download URL: congress_concord-0.7.1-py3-none-any.whl
Upload date: Jun 12, 2026
Size: 226.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for congress_concord-0.7.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2266eb160c9290a0eadb4ed214856f4df69f92e2bef5f8174ebc57ec502f1b63`
MD5	`7b3705cb3637412f091b2e5416453d0b`
BLAKE2b-256	`7f4e2aa7edb9225e4df6d33720bb5f17a3578866294c961a6822f48f6a0feb39`

Provenance

The following attestation bundles were made for congress_concord-0.7.1-py3-none-any.whl:

Publisher: release.yml on johnmarcampbell/concord

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: congress_concord-0.7.1-py3-none-any.whl
- Subject digest: 2266eb160c9290a0eadb4ed214856f4df69f92e2bef5f8174ebc57ec502f1b63
- Sigstore transparency entry: 1797144375
- Sigstore integration time: Jun 12, 2026
Source repository:
- Permalink: johnmarcampbell/concord@c5f027510e12bcdf7143cb5825e69dcf276a55a2
- Branch / Tag: refs/tags/v0.7.1
- Owner: https://github.com/johnmarcampbell
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c5f027510e12bcdf7143cb5825e69dcf276a55a2
- Trigger Event: release
