Skip to main content

Comprehensive CLI for the Korean National Assembly: bills, lifecycle, votes, ideal points, and bill texts

Project description

kna - Korean National Assembly CLI

PyPI

Comprehensive CLI and master database for the Korean National Assembly. Integrates 8 Open Assembly API endpoints into a single queryable interface covering six assembly terms (17th-22nd, 2004-2026).

Installation

Step 1: Install the CLI

pip install kna

If you see a PATH warning like:

WARNING: The script kna is installed in '/Users/you/Library/Python/3.x/bin' which is not on PATH.

Add it to your shell:

# Find where pip installed it
python3 -c "import site; print(site.getusersitepackages().replace('lib/python/site-packages','bin'))"

# Add to PATH (adjust the path from above)
echo 'export PATH="$HOME/Library/Python/3.9/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

Or use pipx install kna which handles PATH automatically.

Step 2: Get the data

The CLI needs the parquet data files. The repository uses Git LFS for large files.

# Install Git LFS first (required, one-time)
brew install git-lfs    # macOS
# or: sudo apt install git-lfs    # Ubuntu/Debian

git lfs install         # one-time setup

# Clone with data
git clone https://github.com/kyusik-yang/kna.git
cd kna

If you already cloned without LFS, the parquet files will be tiny pointer files and kna info will fail. Fix it:

cd kna
git lfs install
git lfs pull            # downloads actual data files (~500MB)

Step 3: Point the CLI to the data

# Set the environment variable
export KBL_DATA=~/kna/data/processed

# Make it permanent
echo 'export KBL_DATA="$HOME/kna/data/processed"' >> ~/.zshrc
source ~/.zshrc

# Verify
kna info

Troubleshooting

Problem Cause Fix
kna: command not found pip bin dir not in PATH Add pip's bin directory to ~/.zshrc PATH (see Step 1)
ArrowInvalid: Parquet magic bytes not found Git LFS not installed; parquet files are pointer files brew install git-lfs && git lfs install && git lfs pull
Cannot find data directory KBL_DATA not set and not running from repo root export KBL_DATA=~/kna/data/processed
No master file for Nth assembly Data files missing for that assembly Check ls $KBL_DATA/master_bills_*.parquet
ERROR: requires Python >=3.9 Python too old python3 --version - need 3.9+
ModuleNotFoundError: No module named 'kna' Installed to wrong Python pip3 install --user kna or use the same python3 -m pip install kna

Requirements

  • Python 3.9+
  • Git LFS (for data files)
  • ~500MB disk space (data files)

Interactive Explorer | Uijeong Jido 의정지도 | Tutorial | PyPI

Key Statistics

Total Bills 110,778 (17-22nd, full lifecycle)
Roll Call Votes 2,425,113 member-level records
DW-NOMINATE 936 legislator-terms (20-22nd, cross-assembly aligned)
Committee Meetings 572,127 records
Members 1,933 member-terms (17-22nd, party/district/committee)
Asset Disclosures 2,928 member-year rows (19-22nd, wealth panel)
Bill Texts 60,925 propose-reason texts (20-22nd)
Date Range 2004 - 2026

CLI Usage

# Database overview
kna info

# Search bills by title
kna search "인공지능" --assembly 22 --status enacted

# Full-text search in propose-reason texts
kna text "기후변화" --assembly 22

# Bill lifecycle timeline (proposal → promulgation)
kna show 2217673

# Legislator profile with DW-NOMINATE ideal point
kna legislator 이재명 --assembly 22

# Legislative funnel
kna stats funnel --assembly 22

# Passage rate trend across assemblies
kna stats passage-rate

# Export to CSV or Parquet
kna export health.csv --assembly 22 --committee 보건복지 --status enacted

Python API

from kna.data import BillDB

db = BillDB()

# Load bills (with column pruning)
bills = db.bills(assembly=22, columns=["bill_id", "bill_nm", "status", "ppsl_dt"])

# Ideal points (sign-flipped: negative = liberal, positive = conservative)
ip = db.ideal_points()

# Roll call votes
votes = db.roll_calls(assembly=22)

# Bill texts
texts = db.bill_texts()

# Member metadata (party, district, committee, gender)
members = db.members(assembly=22)  # 306 members

# Asset disclosures (net_worth, real estate, stocks, etc. in 천원)
assets = db.assets(assembly=22)  # member-year wealth panel

# Committee meetings
meetings = db.committee_meetings(assembly=22)

R

library(arrow)
library(dplyr)

master <- read_parquet("data/processed/master_bills_22.parquet")
laws <- master %>% filter(bill_kind == "법률안")

laws %>%
  group_by(ppsr_kind) %>%
  summarise(total = n(), enacted = sum(enacted)) %>%
  mutate(rate = enacted / total * 100)

Per-Assembly Breakdown

Assembly Bills Enacted Rate Committee Mtgs
17th (2004-08) 8,369 2,547 30.4% 20,044
18th (2008-12) 14,762 2,930 19.8% 57,003
19th (2012-16) 18,735 3,414 18.2% 78,115
20th (2016-20) 24,996 3,794 15.2% 107,933
21st (2020-24) 26,711 3,554 13.3% 200,283
22nd (2024-) 17,205 1,399 8.1% 108,749

Data Structure

master_bills (1 row = 1 bill, 49-55 columns)
├── Identifiers: bill_id, bill_no, age, bill_kind, bill_nm
├── Proposer: ppsr_kind, rst_proposer, rst_mona_cd, publ_mona_cd
├── Lifecycle: ppsl_dt → committee_dt → cmt_proc_dt → law_proc_dt → rgs_rsln_dt → prom_dt
├── Results: status, passed, enacted, proc_rslt
├── Votes: vote_yes, vote_no, vote_abstain
└── Derived: days_to_proc, days_to_committee

roll_calls_all (2.4M rows, member-level)
└── term, member_name, member_id, party, vote, bill_id, date

dw_ideal_points_20_22 (936 rows)
└── member_id, member_name, term, party, aligned, party_bloc

members_{age} (1,933 member-terms across 17-22nd)
└── mona_cd, member_name, party, district, committee, sex, birth_date, election_type, reelection

assets_wealth_panel (2,928 member-year rows, 19-22nd)
└── mona_cd, wealth_year, assembly, net_worth, total_realestate, total_stocks, total_deposits, ...

bill_texts_linked (60K rows)
└── BILL_ID, propose_reason, scrape_status

Documentation

Resource Link
Tutorial kyusik-yang.github.io/assembly-tutorial
Codebook CODEBOOK.md
Data Availability DATA_AVAILABILITY.md
Interactive Explorer kyusik-yang.github.io/kna
Ideology Map Uijeong Jido 의정지도

Companion Data

kna vs open-assembly-mcp: kna is an offline master database for statistical analysis in Python/R. For real-time lookups and exploratory queries via Claude, use open-assembly-mcp.

Dataset Description
kr-hearings-data 9.9M speech-level records from 16,830 hearings (2000-2025)
open-assembly-mcp MCP server for real-time API queries via Claude
assembly-explorer Interactive Streamlit web app

Reproducing the Data

Data files are large and not included in the repo. To regenerate:

# Set API key (free from https://open.assembly.go.kr)
export ASSEMBLY_API_KEY=your_key

# Collect and build
python3 collect.py phase1 && python3 collect.py phase2
python3 integrate.py
python3 build_multi_assembly.py lite && python3 build_multi_assembly.py batch

# Roll call votes
python3 collect_roll_calls.py && python3 consolidate_votes.py

# Member metadata
python3 collect_members.py

# Rebuild interactive site
python3 build_site.py && python3 build_voteview.py

License

Data sourced from public government APIs. Code is MIT licensed.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

kna-0.4.1-py3-none-any.whl (17.6 kB view details)

Uploaded Python 3

File details

Details for the file kna-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: kna-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 17.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for kna-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7afb6eedc5834b7e9bffb785c2da691d825a8ca98ee14201a28c428152957312
MD5 66c75f94eeabf82b36f6b8c9482c5e0e
BLAKE2b-256 442fbea8fa9c9f58533d6636a89945fe541e2c56a3c68807572ea1c0cc0a7c5e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page