Skip to main content

60,925 South Korean National Assembly bills (20th-22nd) with full propose-reason texts

Project description

assembly-bills

60,925 member-proposed bills from the 20th-22nd Korean National Assembly (2016-2026), with full propose-reason texts.

20th Assembly   21,594 bills   2016-05 ~ 2020-05
21st Assembly   23,655 bills   2020-06 ~ 2024-05
22nd Assembly   15,676 bills   2024-06 ~ 2026-02

The propose-reason text (제안이유) was scraped in full from the National Assembly Legislative Information System. 99.4% of bills have text available. This is the first publicly available dataset containing the full propose-reason texts for Korean legislative bills.

assembly-bills info

Quickstart

Option A: CLI

pip install korean-assembly-bills

If pip doesn't work, try pip3 install korean-assembly-bills or python3 -m pip install korean-assembly-bills. This is common on macOS where pip may not be in your PATH.

assembly-bills info                          # dataset summary
assembly-bills search "인공지능"              # search by keyword
assembly-bills search "부동산" --age 21       # filter by assembly
assembly-bills show 2124567                   # view a bill + full text
assembly-bills mp "이재명"                    # look up an MP's bills
assembly-bills stats --by party              # statistics
assembly-bills export ai.csv --keyword "인공지능" --with-text

Option B: Python

import pandas as pd

bills = pd.read_parquet("data/bills.parquet")
texts = pd.read_parquet("data/bill_texts.parquet")

# merge metadata and text
df = bills.merge(texts[["BILL_ID", "propose_reason"]], on="BILL_ID", how="left")
print(df.shape)  # (60925, 25)

Option C: Interactive App

kr-bills.streamlit.app - browse, search, and explore bills in your browser.

Or run locally:

pip install korean-assembly-bills[app]
streamlit run app.py

Data Format

All files are in Apache Parquet format (compressed with zstd). Parquet is a columnar format that is 5-30x smaller than CSV, faster to load, and preserves data types. It is natively supported by pandas, polars, R (arrow), and most data tools.

Reading Parquet files

Python (pandas)

import pandas as pd
df = pd.read_parquet("data/bills.parquet")

Python (polars)

import polars as pl
df = pl.read_parquet("data/bills.parquet")

R

library(arrow)
df <- read_parquet("data/bills.parquet")

DuckDB (SQL)

SELECT * FROM read_parquet('data/bills.parquet') LIMIT 10;

If you need CSV, use the CLI export command:

assembly-bills export all_bills.csv
assembly-bills export all_with_text.csv --with-text

Data Dictionary

bills.parquet

Bill metadata from the Open National Assembly API. 60,925 rows, 24 columns.

Column Type Description
BILL_ID str Unique bill identifier (e.g. PRC_R2A0X...)
BILL_NO int Bill number
BILL_NAME str Bill title
COMMITTEE str Assigned standing committee
PROPOSE_DT str Date proposed (YYYY-MM-DD)
PROC_RESULT str Processing result (e.g. 원안가결, 수정가결, 임기만료폐기)
AGE int Assembly number (20, 21, 22)
DETAIL_LINK str Link to bill detail page
PROPOSER str Lead proposer name + count
MEMBER_LIST str Link to full member list
RST_MONA_CD str Lead proposer MP code (comma-separated for joint leads)
COMMITTEE_ID str Committee code
CMT_PROC_RESULT_CD str Committee processing result
CMT_PROC_DT str Committee processing date
CMT_PRESENT_DT str Committee presentation date
COMMITTEE_DT str Committee referral date
PROC_DT str Final processing date
LAW_PROC_DT str Judiciary committee processing date
LAW_PROC_RESULT_CD str Judiciary committee result
LAW_PRESENT_DT str Judiciary committee presentation date
LAW_SUBMIT_DT str Judiciary committee submission date
PUBL_MONA_CD str Public proposer codes (semicolon-separated)
RST_PROPOSER str Lead proposer display name
PUBL_PROPOSER str Public proposer display names

bill_texts.parquet

Propose-reason texts scraped from the legislative information system. 60,925 rows, 3 columns.

Column Type Description
BILL_ID str Matches bills.parquet
propose_reason str Full propose-reason text (제안이유). Median 468 chars, max 14,445.
scrape_status str ok (99.4%), no_csrf, empty, or error

proposers.parquet

Individual proposer records linking MPs to bills. 769,773 rows, 14 columns.

Column Type Description
BILL_ID str Matches bills.parquet
BILL_NO int Bill number
BILL_NM str Bill name
PPSL_DT str Proposal date
PPSR_NM str Proposer name
PPSR_POLY_NM str Party name
PPSR_CH_NM str Chinese characters of name
REP_DIV str 대표발의 for lead proposer, null for co-sponsor
PPSR_ROLE str Role (발의자)
PPSR_KIND str Proposer type (의원)
PPSR_CN str Proposer serial number
NASS_CD str MP code
ERACO int Assembly number
_bill_id str Original BILL_ID from API

mp_metadata.parquet

MP biographical information. 947 rows (661 unique MPs across 20th-22nd assemblies), 16 columns. Sourced from the ALLNAMEMBER API endpoint with per-assembly party data from BILLINFOPPSR.

Note on CMIT_NM: The ALLNAMEMBER API only returns committee assignments for the current (22nd) assembly, regardless of the UNIT_CD parameter. Committee data for the 20th and 21st assemblies is therefore set to blank rather than showing incorrect 22nd-assembly committee names.

Column Type Description
_age int Assembly number (20, 21, 22)
MONA_CD str Unique MP code (matches NASS_CD in proposers)
HG_NM str Name (Korean)
HJ_NM str Name (Chinese characters)
ENG_NM str Name (English)
POLY_NM str Party during the assembly (from bill proposals)
ELECT_POLY_NM str Party at time of election (may differ due to party renames)
ORIG_NM str Electoral district
ELECT_GBN_NM str Election type (지역구/비례대표)
CMIT_NM str Standing committee (22nd assembly only; blank for 20th/21st - see note below)
SEX_GBN_NM str Gender (남/여)
BTH_DATE str Birth date
REELE_GBN_NM str Seniority (초선, 재선, 3선, ...)
n_bills int Total bills proposed (lead + co-sponsor)
n_lead int Bills led as primary proposer
BRF_HST str Brief career history

CLI Reference

assembly-bills [COMMAND] [OPTIONS]
Command Description
info Dataset summary statistics
search KEYWORD Search bills by name or text
show BILL_NO Full detail view of a bill
mp NAME Look up an MP and their bills
stats Count statistics with bar charts
export PATH Export to CSV or Parquet

search options

-n, --limit N       Max results (default 20)
--age N             Filter by assembly (20, 21, 22)
--committee TEXT    Filter by committee (substring)
--party TEXT        Filter by party (substring)
--text              Also search in propose-reason texts

export options

--keyword TEXT      Filter by keyword
--age N             Filter by assembly
--with-text         Include propose-reason text
--format [csv|parquet]  Output format (default csv)

Data Collection

The data was collected from two sources:

  1. Open National Assembly API (open.assembly.go.kr): Bill metadata, proposer lists, and MP information. APIs used:

    • nzmimeepazxkubdpn (bill list)
    • BILLINFOPPSR (proposer list)
    • nwvrqwxyaytdsfvhu (MP metadata)
  2. Legislative Information System (likms.assembly.go.kr): Propose-reason texts were scraped from bill detail pages. The scraping process:

    • GET the bill detail page to obtain a CSRF token
    • POST with the full form parameters to retrieve the <pre id="prntSummary"> content
    • This two-step process is necessary because the API does not provide the propose-reason text

All 60,925 bills across the 20th-22nd assemblies were scraped, with a 99.4% success rate (60,546 texts obtained).


Known Limitations

  • Proposer list cap: The Open Assembly API returns a maximum of 100 proposer records per bill. 208 bills (mostly mass-signature bills with 100+ co-sponsors) have truncated proposer lists. These bills can be identified by having exactly 100 rows in proposers.parquet.
  • Committee data for past assemblies: CMIT_NM in mp_metadata.parquet is only available for the 22nd Assembly. The API does not return historical committee assignments for the 20th or 21st Assembly (see note in data dictionary).
  • 177 joint lead proposals: Some 22nd Assembly bills have multiple lead proposers (comma-separated RST_MONA_CD). Analysis assuming a single lead proposer per bill should account for these cases.

Use Cases

  • Legislative text analysis: NLP on Korean legislative language, topic modeling, text similarity
  • Political science research: MP productivity, party differences, committee dynamics
  • AI impact studies: Compare legislative texts before and after ChatGPT (Nov 2022)
  • Network analysis: Co-sponsorship networks from the proposers table

Citation

If you use this dataset in academic work, please cite:

@misc{yang2026assemblybills,
  author = {Yang, Kyusik},
  title = {Korean National Assembly Bills Dataset (20th-22nd Assembly)},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/kyusik-yang/korean-assembly-bills}
}

License

CC BY 4.0. The underlying legislative data is public information from the Korean National Assembly.

Related Projects


Built with Claude Code. To study bills, you need API bills.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

korean_assembly_bills-0.1.0.tar.gz (41.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

korean_assembly_bills-0.1.0-py3-none-any.whl (12.3 kB view details)

Uploaded Python 3

File details

Details for the file korean_assembly_bills-0.1.0.tar.gz.

File metadata

  • Download URL: korean_assembly_bills-0.1.0.tar.gz
  • Upload date:
  • Size: 41.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for korean_assembly_bills-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a00e9eb3d2f724d3c232c576074afe901bbf3a74871291e66e81b4afe1616a2f
MD5 31907065191fba06a66b8682cfd8822f
BLAKE2b-256 422894974e0a014b54c75874bbd39882c90201f31d5104cf494b23972a31d378

See more details on using hashes here.

File details

Details for the file korean_assembly_bills-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for korean_assembly_bills-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e3f805db582b815e186f3111efbb2bf88c916b50fd784c86474e3cee8031d3bd
MD5 a17c478e6c22dda5e77b3f76fc2c632c
BLAKE2b-256 fd47071207789bbcbfe7d5859b0cf9e4fa51dd3ccf32c3591e3bfcfd773da2a9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page