A Python library for analyzing DART disclosure filings in full: both numbers and text

DartLab

Beyond the numbers — Extract both financials and text from DART filings

What is DartLab?

DartLab is a Python library for parsing and analyzing filings from DART (Data Analysis, Retrieval and Transfer System), Korea's official electronic disclosure system. It extracts both financial numbers and narrative text from corporate filings.

All data is accessed through simple properties on a Company object, in the style of the yfinance API.

Installation

uv, a fast Python package manager written in Rust, is required. It handles Python version management and virtual environments automatically.

# 1. Install uv (skip if already installed)
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Create a project
uv init my-analysis && cd my-analysis

# 3. Install DartLab — pick the extras you need
uv add dartlab                # Core (financial statement parsing)
uv add "dartlab[ai]"          # + AI analysis web interface (dartlab ai)
uv add "dartlab[llm]"         # + OpenAI/Ollama LLM (CLI analysis)
uv add "dartlab[charts]"      # + Plotly charts
uv add "dartlab[all]"         # Everything (quotes keep zsh from globbing the brackets)

# 4. Verify
uv run python -c "from dartlab import Company; print(Company('005930').corpName)"
# → 삼성전자

# 5. Launch AI analysis (requires dartlab[ai])
uv run dartlab ai
# → http://localhost:8400

Quick Start

from dartlab import Company

c = Company("005930")       # by stock code
c = Company("삼성전자")      # by company name (Korean)
c.corpName                   # "삼성전자"

Creating a Company object prints a usage guide. For the full guide, call c.guide().

Data is downloaded automatically from GitHub Releases when it is not found locally. To fetch a whole category in advance, call downloadAll():

from dartlab.core.dataLoader import downloadAll

downloadAll("docs")                        # 260+ companies — disclosure documents
downloadAll("finance")                     # 2,700+ companies — financial numbers
downloadAll("report")                      # 2,700+ companies — periodic reports
downloadAll("finance", forceUpdate=True)   # re-download if remote is newer

Features

Financial Statements

c.BS    # Balance Sheet (DataFrame)
c.IS    # Income Statement (DataFrame)
c.CF    # Cash Flow Statement (DataFrame)

Cross-Company Comparable Time Series (financeEngine)

OpenDART financial data is mapped to standardized accounts, enabling cross-company quarterly time series.

series, periods = c.timeseries
# periods = ["2016_Q1", "2016_Q2", ..., "2024_Q4"]
# series["IS"]["revenue"]            # quarterly revenue
# series["BS"]["total_assets"]       # quarterly total assets
# series["CF"]["operating_cashflow"] # quarterly operating cash flow

r = c.ratios
r.roe               # 8.29 (%)
r.operatingMargin   # 9.51 (%)
r.debtRatio         # 27.4 (%)
r.fcf               # Free Cash Flow (KRW)

2,700+ listed companies are normalized to the same snakeId schema, making any pair of companies directly comparable.
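As a rough illustration of working with a (series, periods) pair, the sketch below computes year-over-year growth per quarter. It assumes that each account in series is a plain list aligned index-by-index with periods and that the periods are consecutive quarters with no gaps; neither is confirmed by the text above. The helper name yoy_growth and the revenue figures are made up for the example.

```python
def yoy_growth(values, periods, lag=4):
    """Year-over-year growth in percent for each period.

    Returns None where the value from `lag` periods earlier is missing
    or zero. Assumes `values` and `periods` have the same length and
    that periods are consecutive quarters.
    """
    growth = {}
    for i, period in enumerate(periods):
        if i < lag or values[i] is None or not values[i - lag]:
            growth[period] = None
        else:
            growth[period] = (values[i] / values[i - lag] - 1) * 100
    return growth


# Made-up data in the shape suggested by the comments above.
periods = ["2023_Q1", "2023_Q2", "2023_Q3", "2023_Q4", "2024_Q1"]
series = {"IS": {"revenue": [100.0, 110.0, 120.0, 130.0, 125.0]}}

growth = yoy_growth(series["IS"]["revenue"], periods)
print(growth["2024_Q1"])  # 25.0
```

Because every company shares the same account keys, the same helper could be applied to any company's series without per-company mapping.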

Summary Financials with Bridge Matching

Extracts summary financial time series, automatically tracking accounts even when names change due to K-IFRS revisions.

result = c.fsSummary()

result.FS          # Full financial time series (Polars DataFrame)
result.BS          # Balance Sheet
result.IS          # Income Statement
result.allRate     # Overall match rate (e.g. 0.97)
result.breakpoints # List of detected breakpoints

K-IFRS Notes (12 items)

c.notes.inventory          # Inventories
c.notes["재고자산"]         # Korean key also works
c.notes.receivables        # Trade receivables
c.notes.tangibleAsset      # Property, plant & equipment
c.notes.intangibleAsset    # Intangible assets
c.notes.investmentProperty # Investment property
c.notes.affiliates         # Associates
c.notes.borrowings         # Borrowings
c.notes.provisions         # Provisions
c.notes.eps                # Earnings per share
c.notes.lease              # Leases
c.notes.segments           # Operating segments
c.notes.costByNature       # Expenses by nature

Dividends

c.dividend
# ┌──────┬───────────┬───────┬──────────────┬─────────────┬──────────────┬──────┐
# │ year ┆ netIncome ┆ eps   ┆ totalDividend┆ payoutRatio ┆ dividendYield┆ dps  │
# └──────┴───────────┴───────┴──────────────┴─────────────┴──────────────┴──────┘

Major Shareholders

c.majorHolder    # Largest shareholder + related parties ownership (time series)

For the full Result object: c.get("majorHolder")

result = c.get("majorHolder")
result.majorHolder   # "이재용"
result.majorRatio    # 20.76
result.timeSeries    # Ownership ratio time series

Employees

c.employee    # year, totalEmployees, avgSalary, avgTenure, ...

Audit Opinion

c.audit    # year, auditor, opinion, keyAuditMatters

Executives

c.executive      # year, totalRegistered, insideDirectors, outsideDirectors, ...
c.executivePay   # year, category, headcount, totalPay, avgPay

Shares / Capital

c.shareCapital     # Issued, treasury, outstanding shares
c.capitalChange    # Capital changes
c.fundraising      # Capital increases/decreases

Subsidiaries / Associates

c.subsidiary           # Investments in other corporations
c.affiliateGroup       # Affiliate group companies
c.investmentInOther    # Investee, ownership ratio, book value

Board / Governance

c.boardOfDirectors     # Board composition, attendance
c.shareholderMeeting   # Shareholder meeting agendas, resolutions
c.auditSystem          # Audit committee, audit activities
c.internalControl      # Internal control assessment

Risk / Legal

c.contingentLiability  # Contingent liabilities, lawsuits
c.relatedPartyTx       # Related party transactions
c.sanction             # Sanctions, penalties
c.riskDerivative       # FX sensitivity, derivatives

Other Financials

c.bond                 # Debt securities
c.rnd                  # R&D expenses
c.otherFinance         # Allowance for bad debt, etc.
c.productService       # Major products/services
c.salesOrder           # Sales performance, order backlog
c.articlesOfIncorporation  # Articles of incorporation amendments

Company Info

c.companyHistory         # Corporate history
c.companyOverviewDetail  # Incorporation date, listing date, CEO, address

Disclosure Narratives

c.business       # Business overview (sections + change detection)
c.overview       # Company overview (incorporation, address, credit rating)
c.mdna           # Management Discussion & Analysis
c.rawMaterial    # Raw materials, tangible assets, capex

Raw Data Access

c.rawDocs        # Original docs parquet (unprocessed)
c.rawFinance     # Original finance parquet (unprocessed)
c.rawReport      # Original periodic report parquet (unprocessed)

AI Analysis (dartlab ai)

Chat with an LLM over DartLab's structured data to analyze companies interactively — uv run dartlab ai opens the web UI at http://localhost:8400.

All extracted data (financial statements, notes, dividends, executives, governance) is provided as context for natural-language Q&A with streaming responses.

Currently supported LLM: Ollama (local)

Inference runs locally through Ollama, so no API key is needed and your data stays on your machine.

  • Install Ollama, then run ollama pull gemma3 to download a model
  • Select and download models in the UI settings
  • GPU (NVIDIA/AMD) is auto-detected for acceleration

Coming soon: Cloud LLM providers (OpenAI, Anthropic, etc.)


Bulk Extraction

d = c.all()    # All module data as dict (with progress bar)
# {"BS": df, "IS": df, "CF": df, "dividend": df, "notes": {...},
#  "timeseries": (series, periods), "ratios": RatioResult, ...}

import dartlab
dartlab.verbose = False    # Suppress progress output

d = c.all()    # Silent extraction

Result Object

Properties return the primary DataFrame. For the full Result object, use c.get().

# property — returns DataFrame directly
c.audit          # opinionDf (audit opinion DataFrame)

# get() — returns full Result object
result = c.get("audit")
result.opinionDf   # Audit opinion
result.feeDf       # Audit fees
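The split between a property and get() can be pictured with a small stand-in class. This is only a sketch of the pattern, not DartLab's code: CompanySketch and AuditResult are hypothetical names, plain lists stand in for DataFrames, and the records are made up.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    """Hypothetical Result object holding every table a module produces."""
    opinionDf: list
    feeDf: list


class CompanySketch:
    """Illustrates the property / get() split; not DartLab's Company."""

    def __init__(self, results):
        self._results = results

    def get(self, name):
        # Full Result object with all of the module's tables.
        return self._results[name]

    @property
    def audit(self):
        # Shortcut that exposes only the module's primary table.
        return self.get("audit").opinionDf


# Made-up records for illustration.
c = CompanySketch({
    "audit": AuditResult(
        opinionDf=[{"year": 2024, "opinion": "unqualified"}],
        feeDf=[{"year": 2024, "fee": 1}],
    )
})
print(c.audit is c.get("audit").opinionDf)  # True
```

The property keeps everyday access short, while get() remains the single place to reach secondary tables such as feeDf.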

Company Search

from dartlab import Company

Company.search("삼성")
# ┌──────────────┬──────────┬────────────────┐
# │ 회사명       ┆ 종목코드 ┆ 업종           │
# └──────────────┴──────────┴────────────────┘

Company.listing()   # Full KRX listed companies
Company.status()    # Local data index
c.docs()            # Filing list + DART viewer links

Core Technology

Horizontal Alignment of Filings

DART filings cover different periods depending on report type:

                           Q1         Q2         Q3         Q4
                          ┌──────┐
 Q1 Report                │  Q1  │
                          └──────┘
                          ┌──────────────┐
 Semi-Annual              │   Q1 + Q2    │
                          └──────────────┘
                          ┌─────────────────────┐
 Q3 Report                │    Q1 + Q2 + Q3     │
                          └─────────────────────┘
                          ┌──────────────────────────────┐
 Annual Report            │       Q1 + Q2 + Q3 + Q4      │
                          └──────────────────────────────┘

Q1 reports contain only Q1, semi-annual reports contain cumulative Q1+Q2, Q3 reports contain cumulative Q1+Q2+Q3, and annual reports contain the full year. DartLab derives standalone quarterly figures from these cumulative structures and tracks accounts even when names change between filings.
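Recovering standalone quarters from cumulative figures comes down to subtracting each year-to-date value from the next one. The sketch below shows that arithmetic under stated assumptions: the function name standalone_quarters and the input layout are hypothetical, the figures are made up, and the subtraction applies only to flow items (income statement and cash flow), since balance sheet values are point-in-time and need no adjustment.

```python
def standalone_quarters(cumulative):
    """Convert year-to-date figures into standalone quarterly figures.

    `cumulative` lists the year-to-date values from the Q1 report, the
    semi-annual report, the Q3 report, and the annual report, in that
    order. A missing filing is passed as None.
    """
    quarters = []
    previous = 0
    for value in cumulative:
        if value is None or previous is None:
            # A quarter needs both adjacent cumulative values.
            quarters.append(None)
        else:
            quarters.append(value - previous)
        previous = value
    return quarters


# Made-up revenue figures, for illustration only.
print(standalone_quarters([100, 230, 390, 600]))   # [100, 130, 160, 210]
print(standalone_quarters([100, None, 390, 600]))  # [100, None, None, 210]
```

A missing semi-annual report leaves both Q2 and Q3 unknown, because each depends on the cumulative Q1+Q2 value.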

Bridge Matching

K-IFRS revisions and internal restructuring frequently cause account name changes within the same company. Bridge Matching combines amount matching and name similarity across adjacent years to automatically link identical accounts.

             2022              2023              2024
             ──────            ──────            ──────
 매출액 ────────────── 매출액 ────────────── 수익(매출액)
                              ↑ name change              ↑ name change
 영업이익 ──────────── 영업이익 ──────────── 영업이익
 당기순이익 ────────── 당기순이익 ────────── 당기순이익(손실)

Four-stage matching process:

  1. Exact match — identical amounts
  2. Restatement match — within 0.5 tolerance
  3. Name change match — amount error < 5% AND name similarity > 60%
  4. Special item match — decimal-unit items like EPS

When match rate drops below 85%, a breakpoint is detected and the segment is split.
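The staged matching can be sketched as follows. This is a simplified illustration, not DartLab's implementation. It makes several assumptions: the two inputs hold amounts for the same period as reported in adjacent filings, the 0.5 tolerance is an absolute difference in the reported unit, difflib's SequenceMatcher stands in for the unspecified name-similarity measure, and each stage takes the first candidate that qualifies. Stage 4 (special items) is left out because its rule is not described above.

```python
from difflib import SequenceMatcher


def is_exact(prev_name, prev_amt, cur_name, cur_amt):
    return prev_amt == cur_amt


def is_restatement(prev_name, prev_amt, cur_name, cur_amt):
    # Assumption: 0.5 is an absolute tolerance in the reported unit.
    return abs(prev_amt - cur_amt) <= 0.5


def is_name_change(prev_name, prev_amt, cur_name, cur_amt):
    if prev_amt == 0:
        return False
    error = abs(prev_amt - cur_amt) / abs(prev_amt)
    # Assumption: difflib's ratio stands in for the similarity measure.
    similarity = SequenceMatcher(None, prev_name, cur_name).ratio()
    return error < 0.05 and similarity > 0.6


STAGES = [
    ("exact", is_exact),
    ("restatement", is_restatement),
    ("name_change", is_name_change),
]


def bridge(prev_year, cur_year):
    """Link accounts across two years; return (links, match_rate)."""
    links = {}
    unused = dict(cur_year)
    for stage, predicate in STAGES:
        for prev_name, prev_amt in prev_year.items():
            if prev_name in links:
                continue
            for cur_name, cur_amt in list(unused.items()):
                if predicate(prev_name, prev_amt, cur_name, cur_amt):
                    links[prev_name] = (cur_name, stage)
                    del unused[cur_name]
                    break
    rate = len(links) / len(prev_year) if prev_year else 1.0
    return links, rate


# Made-up amounts; account names follow the diagram above.
prev_year = {"매출액": 1000.0, "영업이익": 120.0, "당기순이익": 90.0}
cur_year = {"수익(매출액)": 1000.0, "영업이익": 120.3, "당기순이익(손실)": 92.0}

links, rate = bridge(prev_year, cur_year)
is_breakpoint = rate < 0.85  # threshold from the text above
print(links["당기순이익"], rate, is_breakpoint)
```

Running the stages in order over all accounts, rather than per account, lets strict matches claim their counterparts before the looser name-based rule is tried.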


Data

Sources and Integrity

All data originates from OpenDART and DART, Korea's official electronic disclosure system. No reported figure has been modified; only metadata columns (stock code, year, report type, etc.) have been added for structural organization.

To verify, cross-check any value against the original filing using the package's built-in DART viewer links (c.docs()).

Each Parquet file contains all filings for a single company:

  • Metadata: stock code, company name, report type, filing date, business year
  • Quantitative: summary financials, financial statement body, notes
  • Narrative: business description, audit opinion, risk management, executive/shareholder status

Data Releases

Category     Release tags               Description                         Companies
Disclosure   data-docs                  Parsed annual report sections       260+
Finance      data-finance-1, 2, 3, 4    XBRL financial statement numbers    2,700+
Report       data-report-1, 2, 3, 4     Periodic report data                2,700+

Finance and Report data are each split into 4 tags by stock code range, because GitHub allows at most 1,000 assets per release. loadData() and downloadAll() handle the split automatically.

Bring Your Own Data

If you structure your own Parquet files to match DartLab's schema, all existing features work out of the box. Place files as data/{category}/{stockCode}.parquet and every property, extraction module, and analysis tool will function normally.
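A small helper can make the expected file location explicit. This is a sketch under assumptions: the function name parquet_path is hypothetical, and the category folder names are assumed to match the names passed to downloadAll() ("docs", "finance", "report"), which the text above does not state outright.

```python
from pathlib import Path

# Assumption: folder names match the downloadAll() category names.
CATEGORIES = ("docs", "finance", "report")


def parquet_path(stock_code, category, root="data"):
    """Build the data/{category}/{stockCode}.parquet path for a company."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return Path(root) / category / f"{stock_code}.parquet"


path = parquet_path("005930", "finance")
print(path.as_posix())  # data/finance/005930.parquet
print(path.exists())    # whether a local file is already in place
```

Stock codes are kept as strings so that leading zeros such as those in "005930" are preserved.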

Disclaimer

This project is licensed under MIT. While the data faithfully mirrors OpenDART public disclosures, no guarantee of commercial reliability is provided. Always verify against official sources for investment or compliance decisions.

Update frequency

Data is collected directly without paid proxies, so updates may be slow. Adding new companies or reflecting the latest filings may take time.


Why DartLab?

DART filings contain far more than financial numbers: business descriptions, risk factors, audit opinions, litigation status, and governance changes are all embedded in the text. Most tools extract only the numbers and discard the rest.

DartLab extracts both. It aligns quarterly, semi-annual, and annual reports on a single time axis, and automatically tracks accounts even when K-IFRS revisions or restructuring changes their names.

Current scope

Bridge Matching tracks account name changes within a single company across years. financeEngine enables cross-company comparison by mapping XBRL accounts to standardized snakeIds. 2,700+ listed companies are normalized to the same structure.

Text analysis capabilities are being developed in a separate project and will be integrated into DartLab.

The ultimate goal is a tool that can analyze the entire market at once, not just one company.

Roadmap

  • Summary financial time series (Bridge Matching)
  • Consolidated BS, IS, CF
  • Segment revenue, associates, dividends, employees, shareholders, subsidiaries
  • Debt securities, expenses by nature, raw materials/capex
  • Audit opinion, executive status, executive compensation
  • PPE movement, note details (23 keywords)
  • Board of directors, capital changes, contingent liabilities, related party tx, sanctions, R&D, internal control
  • Affiliate groups, capital raises, sales/orders, products, risk management/derivatives
  • MD&A, business description, company overview
  • Company property API + Notes integration + all()
  • Rich terminal output (avatar + usage guide)
  • Account standardization engine (financeEngine) — 2,700+ companies cross-comparable
  • Quarterly time series + financial ratios (c.timeseries, c.ratios)
  • AI analysis web interface (dartlab ai) — Ollama local LLM
  • Cloud LLM providers (OpenAI, Anthropic, etc.)
  • Text analysis module integration (from separate project)
  • Quantitative + qualitative cross-validation
  • Visualization

License

MIT License
