Comprehensive CLI for the Korean National Assembly: bills, lifecycle, votes, ideal points, and bill texts
Korean Bill Lifecycle Database
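A master database that tracks the full lifecycle of bills in the National Assembly of the Republic of Korea.
It joins eight 열린국회정보 (Open National Assembly Information) Open API endpoints on BILL_ID to provide, in a single table, the proposal, review, vote, and promulgation history of roughly 111,000 bills from the 17th Assembly (2004) through the 22nd (2024-).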
Interactive Explorer | Codebook | Data Availability
Uijeong Jido 의정지도 - DW-NOMINATE ideal point explorer
Key Statistics
| Metric | Value |
|---|---|
| Total Bills | 110,779 (17th-22nd Assemblies, full lifecycle) |
| Committee Meetings | 572,127 records |
| Roll Call Votes | 2,425,113 member-level votes |
| DW-NOMINATE | 936 legislator-terms (20th-22nd Assemblies) |
| Date Range | 2004 - 2026 |
Per-Assembly Breakdown
| Assembly | Bills | Enacted | Rate | Committee Mtgs |
|---|---|---|---|---|
| 17th (2004-08) | 8,369 | 2,547 | 30.4% | 20,044 |
| 18th (2008-12) | 14,762 | 2,930 | 19.8% | 57,003 |
| 19th (2012-16) | 18,735 | 3,414 | 18.2% | 78,115 |
| 20th (2016-20) | 24,996 | 3,795 | 15.2% | 107,933 |
| 21st (2020-24) | 26,711 | 3,554 | 13.3% | 200,283 |
| 22nd (2024-) | 17,205 | 1,399 | 8.1% | 108,749 |
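The Rate column is the number of enacted bills divided by the number of bills, expressed as a percentage. As a minimal sketch, the figures can be rechecked from the table itself; the dictionary and function names here are illustrative and not part of the package.

```python
# (bills, enacted) pairs copied from the Per-Assembly Breakdown table.
ASSEMBLIES = {
    "17th": (8369, 2547),
    "18th": (14762, 2930),
    "19th": (18735, 3414),
    "20th": (24996, 3795),
    "21st": (26711, 3554),
    "22nd": (17205, 1399),
}


def enactment_rate(bills: int, enacted: int) -> float:
    """Return enacted bills as a percentage of all bills, rounded to one decimal."""
    return round(enacted / bills * 100, 1)


rates = {name: enactment_rate(b, e) for name, (b, e) in ASSEMBLIES.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The 22nd Assembly is still in session, so its rate is not directly comparable with those of completed assemblies.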
Data Structure
master_bills (1 row = 1 bill)
├── Identifiers: bill_id, bill_no, age, bill_kind, bill_nm
├── Proposer: ppsr_kind, rst_proposer, rst_mona_cd, publ_mona_cd
├── Lifecycle: ppsl_dt → committee_dt → cmt_proc_dt → law_proc_dt → rgs_rsln_dt → prom_dt
├── Results: status, passed, enacted, proc_rslt
├── Votes: vote_yes, vote_no, vote_abstain
└── Derived: days_to_proc, days_to_committee
committee_meetings (1:N per bill)
└── bill_id, conf_name, conf_dt, conf_result
judiciary_meetings (1:N per bill)
└── bill_id, conf_name, conf_dt, conf_result
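Because committee_meetings holds several rows per bill, joining it directly onto master_bills would duplicate bills. One way to keep the one-row-per-bill shape is to aggregate first and then join on bill_id. The sketch below uses invented toy rows rather than real data, and n_meetings is a hypothetical column name that does not appear in the codebook.

```python
import pandas as pd

# Toy rows standing in for the real tables; all values are invented.
master_bills = pd.DataFrame({
    "bill_id": ["A", "B", "C"],
    "bill_nm": ["Bill A", "Bill B", "Bill C"],
    "enacted": [True, False, False],
})
committee_meetings = pd.DataFrame({
    "bill_id": ["A", "A", "B"],
    "conf_name": ["Committee X", "Committee X", "Committee Y"],
    "conf_dt": ["2024-07-01", "2024-08-15", "2024-09-03"],
})

# Collapse the 1:N table to one row per bill before joining,
# so the master table keeps its one-row-per-bill shape.
meeting_counts = (
    committee_meetings.groupby("bill_id")
    .size()
    .rename("n_meetings")
    .reset_index()
)

bills_with_meetings = master_bills.merge(meeting_counts, on="bill_id", how="left")

# Bills with no committee record get 0 rather than NaN.
bills_with_meetings["n_meetings"] = (
    bills_with_meetings["n_meetings"].fillna(0).astype(int)
)

print(bills_with_meetings)
```

The same pattern should apply to judiciary_meetings, which has the same columns.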
Quick Start
Python
import pandas as pd
master = pd.read_parquet("data/processed/master_bills_22.parquet")
laws = master[master["bill_kind"] == "법률안"]
# Passage rate by proposer type
laws.groupby("ppsr_kind").agg(
total=("bill_id", "count"),
enacted=("enacted", "sum"),
).assign(rate=lambda x: x["enacted"] / x["total"] * 100)
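The lifecycle date columns can also be used to measure how long a bill takes to move between stages. The following is a rough sketch on invented toy rows. It assumes the dates are stored as ISO-formatted strings, which may not match the real files, and days_to_prom is a hypothetical column name distinct from the documented days_to_proc.

```python
import pandas as pd

# Toy rows; in practice these columns would come from master_bills.
bills = pd.DataFrame({
    "bill_id": ["A", "B", "C"],
    "ppsl_dt": ["2024-06-03", "2024-06-10", "2024-07-01"],
    "prom_dt": ["2024-09-01", None, "2024-07-31"],
})

for col in ("ppsl_dt", "prom_dt"):
    # errors="coerce" turns missing or malformed dates into NaT instead of raising.
    bills[col] = pd.to_datetime(bills[col], errors="coerce")

# Days from proposal to promulgation; NaN for bills never promulgated.
bills["days_to_prom"] = (bills["prom_dt"] - bills["ppsl_dt"]).dt.days

print(bills[["bill_id", "days_to_prom"]])
print("median days:", bills["days_to_prom"].median())
```

Bills that were never promulgated are left as missing values and are skipped by the median.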
R
library(arrow)
library(dplyr)
master <- read_parquet("data/processed/master_bills_22.parquet")
laws <- master %>% filter(bill_kind == "법률안")
laws %>%
group_by(ppsr_kind) %>%
summarise(total = n(), enacted = sum(enacted)) %>%
mutate(rate = enacted / total * 100)
Reproducing the Data
Data files are not included in the repo (too large). To regenerate:
# 1. Set up
pip install pandas requests pyarrow plotly
# 2. Collect 22nd Assembly (Phase 1: ~10 min, Phase 2: ~15 hours)
python3 collect.py phase1
python3 collect.py phase2
# 3. Build master DB
python3 integrate.py
# 4. Build lite masters for the 17th-21st Assemblies (uses existing BILLRCP/BILLJUDGE + external data)
python3 build_multi_assembly.py lite
# 5. Collect remaining batch data for the 17th-21st Assemblies
python3 build_multi_assembly.py batch
# 6. Phase 2 for older assemblies (sequential, ~39 hours total)
python3 build_multi_assembly.py phase2 --age 21
python3 build_multi_assembly.py phase2 --age 20
# ... etc
# 7. Rebuild interactive site
python3 build_site.py
# ── Roll call votes (optional, ~2 hours total) ──
# 8. Parse inline votes from plenary transcripts (16th-19th Assemblies, < 1 min)
python3 parse_plenary_votes.py
# 9. Extract appendix votes from plenary PDFs (17th-19th Assemblies, ~30-60 min)
pip install PyMuPDF
python3 extract_appendix_votes.py
# 10. Collect member-level roll calls via API (20th-22nd Assemblies, ~20-30 min per assembly)
python3 collect_roll_calls.py
# 11. Consolidate all vote sources into unified dataset (< 1 min)
python3 consolidate_votes.py
API Key: Register at 열린국회정보 (free) and set the environment variable before running any collection script:
export ASSEMBLY_API_KEY=your_api_key_here
To make this persistent, add the line to your ~/.bashrc, ~/.zshrc, or .env file.
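Since Phase 2 collection runs for many hours, it can help to confirm that the key is visible to Python before starting. This is a minimal sketch of such a check. The get_api_key helper is hypothetical and is not necessarily how the project's scripts read the variable.

```python
import os


def get_api_key(env=None):
    """Return ASSEMBLY_API_KEY from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    key = env.get("ASSEMBLY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLY_API_KEY is not set; export it before running a collection script."
        )
    return key


# Example: check against an explicit mapping instead of the real environment.
print(get_api_key({"ASSEMBLY_API_KEY": "your_api_key_here"}))
```

Calling get_api_key() with no argument checks the real environment and fails immediately if the variable is missing or blank.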
Documentation
| File | Description |
|---|---|
| CODEBOOK.md | Variable-level documentation (54 variables) |
| DATA_OVERVIEW.md | Summary statistics and visualizations |
| DATA_AVAILABILITY.md | Per-assembly data coverage and limitations |
| DATA_COLLECTION_STRATEGY.md | Original API exploration notes |
| MASTER_DATA_PLAN.md | Expansion roadmap and cross-project integration |
Project Structure
korean-bill-lifecycle/
├── collect.py # Phase 1+2 API collection
├── integrate.py # Phase 3 data integration
├── build_multi_assembly.py # Multi-assembly expansion
├── build_site.py # Interactive site generator
├── tutorial.ipynb # Jupyter tutorial notebook
├── site/index.html # Interactive explorer (GitHub Pages)
├── data/
│ ├── raw/ # API responses (parquet)
│ └── processed/ # Master tables (parquet + sqlite)
└── docs/ # Documentation (*.md)
Data Source
열린국회정보 Open API (open.assembly.go.kr)
License
Data sourced from public government APIs. Code is MIT licensed.