DartLab
One company map from disclosure filings — DART + EDGAR
Why DartLab
Corporate disclosures are the richest public record of a company — financials, risk factors, business strategy, governance, compensation, and more. But accessing this data today means:
- Manual PDF reading — annual reports are 200+ page documents with no structure for programmatic access
- Fragmented tools — one library for financial statements, another for text, another for EDGAR, none of them talking to each other
- No time axis — even if you extract data, comparing the same section across 5 years of filings requires custom glue code every time
- Lost context — financial numbers without the narrative that explains them, or text without the numbers that ground it
DartLab solves this by building one unified company map from raw filings. Text, tables, and financial statements are aligned on the same topic-by-period spine. You get the complete picture — not fragments.
Who It's For
| You are... | DartLab gives you... |
|---|---|
| Investor / Analyst | Instant access to any disclosure topic across all periods — no more PDF digging |
| Quant / Data Scientist | Clean, structured DataFrames from 2,700+ companies ready for modeling |
| Developer | A single Company object that unifies docs, finance, and report APIs |
| Researcher | Standardized cross-company datasets with full text + financial data |
| AI Builder | Pre-structured company context that LLMs can actually reason over |
What You Can Do
- "How did Samsung change its business description between 2023 and 2024?" → c.diff("businessOverview")
- "Show me Apple's risk factors from the latest 10-K" → us.show("10-K::item1ARiskFactors")
- "Which listed companies have the highest debt-to-equity?" → dartlab.debt("all")
- "Give me 5 years of income statements for comparison" → c.IS
- "What governance issues exist across the entire market?" → dartlab.governance()
How It's Different
| | DartLab | Typical tools |
|---|---|---|
| Scope | Full disclosure text + financials + structured reports | Usually one of these |
| Structure | One horizontalized map (topic × period) | Flat outputs, no time alignment |
| Sources | DART + EDGAR in the same interface | Korea-only or US-only |
| Data provenance | trace() tells you exactly which source was used | Black box |
| AI-ready | Structured sections feed directly into LLM context | Manual prompt engineering |
What DartLab Is
DartLab turns corporate filings into a single company map — for both Korean DART and US EDGAR.
The center of that map is sections: a horizontalized matrix built from disclosure sections across periods. Instead of treating a filing as a pile of unrelated parsers, DartLab aligns the document structure first, then lets stronger sources fill in what they own:
- docs: section structure, narrative text with heading/body separation, tables, and evidence
- finance: authoritative numeric statements (BS, IS, CF) and financial ratios
- report: authoritative structured disclosure APIs (DART only)
import dartlab
c = dartlab.Company("005930") # Samsung Electronics (DART)
c.sections # full company map (topic × period)
c.topics # topic list with source, blocks, periods
c.show("overview") # open one topic (alias for companyOverview)
c.show("IS", period=["2024Q4", "2023Q4"]) # compare specific periods
c.BS # balance sheet
c.ratios # ratio time series
c.insights # 7-area grades (A~F)
us = dartlab.Company("AAPL") # Apple (EDGAR)
us.sections
us.show("business") # alias for item1Business
us.BS
us.ratios
Install
uv add dartlab
No data setup required. When you create a Company for the first time, dartlab automatically downloads the required data from GitHub Releases (DART) or SEC API (EDGAR finance). The second run loads instantly from local cache.
[dartlab] 005930 (DART 공시 문서 데이터) → 첫 사용: GitHub에서 자동 다운로드 중...
[dartlab] ✓ DART 공시 문서 데이터 다운로드 완료 (542KB)
[dartlab] 005930 (재무 숫자 데이터) → 첫 사용: GitHub에서 자동 다운로드 중...
[dartlab] ✓ 재무 숫자 데이터 다운로드 완료 (38KB)
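The download-on-first-use behavior described above can be pictured as a small cache wrapper. This is a minimal sketch, not DartLab's loader: `load_cached`, `fake_fetch`, and the file name are hypothetical, and a real fetch would hit GitHub Releases or the SEC API instead of returning canned bytes.

```python
import tempfile
from pathlib import Path


def load_cached(name: str, cache_dir: Path, fetch) -> bytes:
    """Return cached bytes for `name`, calling `fetch(name)` only on a cache miss."""
    path = cache_dir / name
    if path.exists():
        return path.read_bytes()  # second and later runs: served from local disk
    data = fetch(name)  # first run: download
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data


calls = []


def fake_fetch(name: str) -> bytes:
    """Hypothetical stand-in for a real download; records each call."""
    calls.append(name)
    return b"parquet-bytes-for-" + name.encode()


cache = Path(tempfile.mkdtemp())
first = load_cached("005930.parquet", cache, fake_fetch)
second = load_cached("005930.parquet", cache, fake_fetch)
print(len(calls))  # the fetch function ran only once
```

Passing the fetch function in as an argument keeps the cache logic testable without network access.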
AI interface (web UI + CLI):
uv add "dartlab[ai]"
uv run dartlab # web UI
uv run dartlab setup # provider setup guide
uv run dartlab ask "삼성전자 재무건전성 분석해줘" # CLI one-shot
Try It Now
Interactive Marimo notebooks let you explore real company data immediately — no code to write:
uv add dartlab marimo
marimo edit startMarimo/dartCompany.py # Korean company (DART)
marimo edit startMarimo/edgarCompany.py # US company (EDGAR)
marimo edit startMarimo/aiAnalysis.py # AI analysis examples
Or open any tutorial in Colab — no install needed:
| Notebook | Topic |
|---|---|
| Quick Start | sections, show, trace, diff |
| Financial Statements | BS, IS, CF |
| Ratios | 47 financial ratios |
| Disclosure | sections, text parsing |
| EDGAR | US SEC filings |
Quick Start
Sections — The Company Map
sections is a Polars DataFrame where each row is a disclosure block and each period column holds the raw payload. Periods are sorted newest-first, and annual reports appear as Q4:
chapter │ topic │ blockType │ textNodeType │ 2025Q4 │ 2024Q4 │ 2024Q3 │ …
I │ companyOverview │ text │ heading │ "…" │ "…" │ "…" │
I │ companyOverview │ text │ body │ "…" │ "…" │ "…" │
I │ companyOverview │ table │ null │ "…" │ "…" │ null │
II │ businessOverview │ text │ heading │ "…" │ "…" │ "…" │
III │ BS │ table │ null │ — │ — │ — │ (finance)
VII │ dividend │ table │ null │ — │ — │ — │ (report)
Text blocks carry structural metadata — textNodeType (heading/body), textLevel, and textPath — so you can distinguish section headers from narrative content.
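To make the topic × period layout concrete, here is a minimal sketch that pivots flat per-period blocks into rows with one column per period, newest first. It uses plain dictionaries rather than Polars, and the sample blocks, `period_key`, and `build_matrix` are hypothetical illustrations, not DartLab internals.

```python
from collections import defaultdict

# Hypothetical input: (period, topic, blockType, payload) per disclosure block.
blocks = [
    ("2024Q4", "companyOverview", "text", "Overview text for 2024"),
    ("2023Q4", "companyOverview", "text", "Overview text for 2023"),
    ("2024Q4", "businessOverview", "text", "Business text for 2024"),
]


def period_key(period: str) -> tuple:
    """Turn '2024Q4' into (2024, 4) so periods sort chronologically."""
    year, quarter = period.split("Q")
    return int(year), int(quarter)


def build_matrix(blocks):
    """Pivot blocks into rows keyed by (topic, blockType), one column per period."""
    periods = sorted({b[0] for b in blocks}, key=period_key, reverse=True)  # newest first
    cells = defaultdict(dict)
    for period, topic, block_type, payload in blocks:
        cells[(topic, block_type)][period] = payload
    rows = []
    for (topic, block_type), by_period in cells.items():
        row = {"topic": topic, "blockType": block_type}
        for p in periods:
            row[p] = by_period.get(p)  # None where a period has no such block
        rows.append(row)
    return periods, rows


periods, rows = build_matrix(blocks)
for row in rows:
    print(row)
```

A missing period stays as `None`, which mirrors the `null` cells in the layout shown above.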
Show, Trace, Diff
c = dartlab.Company("005930")
# show — open any topic with source-aware priority
c.show("BS") # → finance DataFrame
c.show("overview") # → sections-based text + tables
c.show("dividend") # → report DataFrame (all quarters)
# compare specific periods
c.show("IS", period=["2024Q4", "2023Q4"])
# trace — why a topic came from docs, finance, or report
c.trace("BS") # → {"primarySource": "finance", ...}
# diff — text change detection (3 modes)
c.diff() # full summary
c.diff("businessOverview") # topic history
c.diff("businessOverview", "2024", "2025") # line-by-line diff
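For intuition about the line-by-line mode, the sketch below compares two versions of a section with Python's standard `difflib`. It assumes nothing about how DartLab computes its diff; `line_diff` and the sample texts are hypothetical.

```python
import difflib


def line_diff(old: str, new: str, old_label: str, new_label: str) -> list:
    """Return unified-diff lines between two versions of a text."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Hypothetical section text for two periods.
text_2024 = "We make memory chips.\nWe sell phones."
text_2025 = "We make memory chips.\nWe sell phones and AI servers."

changes = line_diff(text_2024, text_2025, "2024", "2025")
print("\n".join(changes))
```

Lines starting with `-` exist only in the older period and lines starting with `+` only in the newer one.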
Finance
c.BS # balance sheet (account × period, newest first)
c.IS # income statement
c.CF # cash flow
c.ratios # ratio time series DataFrame (6 categories × period)
c.finance.ratios # latest single-point RatioResult
c.finance.ratioSeries # ratio time series across years
c.finance.timeseries # raw account time series
Financial ratios cover 6 categories: profitability, stability, growth, efficiency, cashflow, and valuation.
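As a rough illustration of what a stability or profitability ratio involves, the sketch below derives three common ratios from statement totals. The account names, the figures, and `basic_ratios` are hypothetical; DartLab's own ratio definitions may differ.

```python
def safe_div(numerator, denominator):
    """Divide, returning None when the denominator is missing or zero."""
    if not denominator:
        return None
    return numerator / denominator


def basic_ratios(balance_sheet: dict, income_statement: dict) -> dict:
    """Compute a few textbook ratios from statement totals."""
    return {
        "debtToEquity": safe_div(
            balance_sheet["totalLiabilities"], balance_sheet["totalEquity"]
        ),
        "operatingMargin": safe_div(
            income_statement["operatingIncome"], income_statement["revenue"]
        ),
        "returnOnEquity": safe_div(
            income_statement["netIncome"], balance_sheet["totalEquity"]
        ),
    }


# Illustrative numbers only, not real company data.
bs = {"totalLiabilities": 400, "totalEquity": 800}
is_ = {"revenue": 1000, "operatingIncome": 150, "netIncome": 100}

ratios = basic_ratios(bs, is_)
print(ratios)
```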
Modules — What Topics Are Available
DartLab exposes 100+ modules across 6 categories. Use the CLI to discover them:
dartlab modules # list all modules
dartlab modules --category finance # filter by category
dartlab modules --search dividend # search by keyword
Or in Python:
c.topics # list all available topics for this company
Categories: finance (statements, ratios), report (dividend, governance, audit), notes (K-IFRS annotations), disclosure (narrative text), analysis (insights, rankings), raw (original parquets).
Insights
c.insights # 7-area analysis
c.insights.grades() # → {"performance": "A", "profitability": "B", …}
c.insights.performance.grade # → "A"
c.insights.performance.details # → ["Revenue growth +8.3%", …]
c.insights.anomalies # → outliers and red flags
7 analysis areas: performance, profitability, health, cashflow, governance, risk, opportunity.
Charts & Visualization
Built-in Plotly charts and a JSON-based ChartSpec protocol that works with or without Plotly:
import dartlab
c = dartlab.Company("005930")
# one-liner Plotly charts
dartlab.chart.revenue(c).show() # revenue + operating margin combo
dartlab.chart.cashflow(c).show() # operating/investing/financing CF
dartlab.chart.dividend(c).show() # DPS + yield + payout ratio
dartlab.chart.balance_sheet(c).show() # current/non-current assets
dartlab.chart.profitability(c).show() # ROE, operating margin, net margin
# auto-detect all available charts
specs = dartlab.chart.auto_chart(c) # → list of ChartSpec dicts
dartlab.chart.chart_from_spec(specs[0]).show() # render any spec as Plotly
# generic charts from any DataFrame
dartlab.chart.line(c.dividend, y=["dps"])
dartlab.chart.bar(df, x="year", y=["revenue", "operating_income"], stacked=True)
Data tools for table formatting and text analysis:
# table tools
dartlab.table.yoy_change(c.dividend, value_cols=["dps"]) # add YoY% columns
dartlab.table.format_korean(c.BS, unit="백만원") # 1.2조원, 350억원
dartlab.table.summary_stats(c.dividend, value_cols=["dps"]) # mean/CAGR/trend
# text tools
dartlab.text.extract_keywords(narrative) # frequency-based keywords
dartlab.text.sentiment_indicators(narrative) # positive/negative/risk score
dartlab.text.extract_numbers(narrative) # numbers + units + context
Install chart dependencies: uv add "dartlab[charts]"
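The YoY and CAGR figures that the table tools mention follow standard formulas. The sketch below shows them on a plain list of (year, value) pairs; `yoy_rows`, `cagr`, and the dividend numbers are hypothetical and are not the `dartlab.table` implementation.

```python
def yoy_rows(series):
    """Add a year-over-year percent change to (year, value) pairs sorted oldest first."""
    rows = []
    prev = None
    for year, value in series:
        if prev is None or prev == 0:
            pct = None  # no prior value to compare against
        else:
            pct = (value - prev) * 100 / abs(prev)
        rows.append({"year": year, "value": value, "yoyPct": pct})
        prev = value
    return rows


def cagr(first: float, last: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if first <= 0 or years <= 0:
        raise ValueError("first value and years must be positive")
    return (last / first) ** (1 / years) - 1


# Illustrative dividend-per-share values, not real data.
dps = [(2021, 1000), (2022, 1100), (2023, 1210)]
rows = yoy_rows(dps)
growth = cagr(dps[0][1], dps[-1][1], years=dps[-1][0] - dps[0][0])
print(rows)
print(round(growth, 4))
```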
Network — Affiliate Map
Visualize corporate ownership networks — who invests in whom, group structure, and circular ownership:
c = dartlab.Company("005930")
# interactive vis.js graph in browser
c.network().show() # ego view (1 hop)
c.network(hops=2).show() # 2-hop neighborhood
# DataFrame views
c.network("members") # group affiliates
c.network("edges") # investment/shareholder connections
c.network("cycles") # circular ownership paths
c.network("peers") # ego subgraph as DataFrame
# full market network (all listed companies)
dartlab.network().show()
The browser view supports dark/light themes, company search, group filtering, hover tooltips with ownership percentages, and click-to-highlight connected companies.
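Circular ownership is a cycle in a directed investor → investee graph. The following sketch finds such cycles with a depth-first walk. It is an illustration under that assumption only: `find_cycles` and the edge list are hypothetical, and the exhaustive walk is suitable for small graphs rather than a full market.

```python
def find_cycles(edges):
    """Return ownership cycles in a directed graph of (investor, investee) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    cycles = []
    seen = set()

    def walk(node, path):
        if node in path:
            cycle = path[path.index(node):]
            # Rotate so the smallest name comes first; this de-duplicates
            # the same cycle reached from different starting companies.
            i = cycle.index(min(cycle))
            canon = tuple(cycle[i:] + cycle[:i])
            if canon not in seen:
                seen.add(canon)
                cycles.append(list(canon))
            return
        for nxt in graph[node]:
            walk(nxt, path + [node])

    for start in graph:
        walk(start, [])
    return cycles


# Hypothetical holdings: A owns B, B owns C, C owns A (a loop), and C owns D.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
cycles = find_cycles(edges)
print(cycles)
```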
Market Scan
Scan the full listed market by theme, then zoom back into a single company row when needed:
c = dartlab.Company("005930")
# one company
c.governance()
c.workforce()
c.capital()
c.debt()
# market summary
c.governance("market") # by market summary
c.governance("all") # full market DataFrame
# module-level full scans
dartlab.governance()
dartlab.workforce()
dartlab.capital()
dartlab.debt()
These scans combine report + finance parquet data into market-wide DataFrames for governance quality, workforce/pay trends, shareholder return behavior, and debt risk.
EDGAR (US)
Same Company interface, different data source:
us = dartlab.Company("AAPL")
us.sections # 10-K/10-Q sections with heading/body
us.show("business") # business description
us.show("10-K::item1ARiskFactors") # risk factors
us.BS # SEC XBRL balance sheet
us.ratios # same 47 ratios
us.diff("10-K::item7Mdna") # MD&A text changes
EDGAR sections include the same text structure metadata (heading/body separation, textLevel, textPath) as DART.
OpenAPI — Raw Public APIs
Use source-native wrappers when you want raw disclosure APIs directly.
OpenDart (Korea)
Note: The Company interface does not require an API key; it works with pre-built datasets from GitHub Releases. OpenDart uses the raw DART API and requires an API key from https://opendart.fss.or.kr (free registration).
export DART_API_KEY=your_key_here # Linux/Mac
$env:DART_API_KEY = 'your_key_here' # PowerShell
from dartlab import OpenDart
d = OpenDart() # auto-detect API key
d = OpenDart(["key1", "key2"]) # multi-key rotation
d.search("카카오", listed=True) # company search
d.filings("삼성전자", "2024") # filing list
d.company("삼성전자") # corporate profile
d.finstate("삼성전자", 2024) # financial statements
d.report("삼성전자", "배당", 2024) # 56 report categories
# convenience proxy
s = d("삼성전자")
s.finance(2024)
s.report("배당", 2024)
s.filings("2024")
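One plausible reading of multi-key rotation is "try the next key when the current one is rate limited". The sketch below shows that pattern in isolation. How OpenDart actually rotates keys is not documented here, so `RateLimited`, `call_with_rotation`, and `fake_request` are all hypothetical.

```python
class RateLimited(Exception):
    """Hypothetical error raised when a key has exhausted its quota."""


def call_with_rotation(request, keys):
    """Call `request(key)` with each key in turn until one succeeds."""
    last_error = None
    for key in keys:
        try:
            return request(key)
        except RateLimited as exc:
            last_error = exc  # remember the failure and move to the next key
    raise RuntimeError("all API keys are rate limited") from last_error


def fake_request(key: str) -> str:
    """Stand-in for an API call: the first key is exhausted, the second works."""
    if key == "key1":
        raise RateLimited(key)
    return f"ok:{key}"


result = call_with_rotation(fake_request, ["key1", "key2"])
print(result)
```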
OpenEdgar (US)
from dartlab import OpenEdgar
e = OpenEdgar()
e.search("Apple") # ticker search
e.company("AAPL") # company info
e.filings("AAPL", forms=["10-K", "10-Q"]) # filing list
e.companyFactsJson("AAPL") # XBRL facts
e.companyConceptJson("AAPL", "us-gaap", "Revenue") # single tag series
These wrappers keep the original source surface intact, while saved parquet stays compatible with DartLab's Company engine.
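For orientation, the SEC's public XBRL endpoints on data.sec.gov are addressed by a zero-padded 10-digit CIK. The sketch below only builds the URLs and makes no network call. Whether OpenEdgar uses exactly these paths is an assumption, and the helper names are hypothetical.

```python
SEC_BASE = "https://data.sec.gov/api/xbrl"


def company_facts_url(cik: int) -> str:
    """URL for all XBRL facts of one company (CIK padded to 10 digits)."""
    return f"{SEC_BASE}/companyfacts/CIK{cik:010d}.json"


def company_concept_url(cik: int, taxonomy: str, tag: str) -> str:
    """URL for the time series of a single XBRL tag."""
    return f"{SEC_BASE}/companyconcept/CIK{cik:010d}/{taxonomy}/{tag}.json"


APPLE_CIK = 320193  # Apple Inc.

facts_url = company_facts_url(APPLE_CIK)
concept_url = company_concept_url(APPLE_CIK, "us-gaap", "AccountsPayableCurrent")
print(facts_url)
print(concept_url)
```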
MCP — AI Assistant Integration
DartLab includes a built-in MCP server that connects directly to Claude Desktop, Cursor, and other MCP-compatible AI assistants. This turns DartLab's full analysis engine into a tool your AI assistant can call.
uv add "dartlab[mcp]"
Claude Desktop
Add to your Claude Desktop config (claude_desktop_config.json):
{
"mcpServers": {
"dartlab": {
"command": "uv",
"args": ["run", "dartlab", "mcp"]
}
}
}
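If the config file already lists other MCP servers, the dartlab entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows; `add_dartlab_server` and the existing `other` server are hypothetical, and reading or writing the real file is left out.

```python
import json


def add_dartlab_server(config: dict) -> dict:
    """Return a copy of `config` with the dartlab MCP server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input stays untouched
    servers["dartlab"] = {"command": "uv", "args": ["run", "dartlab", "mcp"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_dartlab_server(existing)
print(json.dumps(updated, indent=2))
```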
What the AI Can Do
Once connected, your AI assistant can:
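- Search companies: "삼성전자 찾아줘" ("Find Samsung Electronics") → calls search_company
- Read any disclosure topic: "삼성전자 사업개요 보여줘" ("Show me Samsung Electronics' business overview") → calls show_topic
- Compare periods: "2024년과 2023년 재무제표 비교해줘" ("Compare the 2024 and 2023 financial statements") → calls get_timeseries
- Calculate ratios: "부채비율 계산해줘" ("Calculate the debt ratio") → calls calculate_ratios
- Grade a company: "이 회사 종합 등급은?" ("What is this company's overall grade?") → calls get_insights
- Cross-market analysis: works with both DART (Korean) and EDGAR (US) companies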
45+ tools are automatically available through the MCP bridge. The AI gets structured company data, not raw text — so it can give precise, grounded answers.
CLI
dartlab mcp # start MCP stdio server
AI Analysis
DartLab includes a built-in AI analysis layer that feeds structured company data to LLMs. The system automatically selects relevant data (financials, ratios, disclosure text) based on your question.
Python API
import dartlab
# streams to stdout, returns full text
answer = dartlab.ask("삼성전자 재무건전성 분석해줘")
# provider + model override
answer = dartlab.ask("삼성전자 분석", provider="openai", model="gpt-4o")
# data filtering
answer = dartlab.ask("삼성전자 핵심 포인트", include=["BS", "IS"])
# analysis pattern (framework-guided)
answer = dartlab.ask("삼성전자 분석", pattern="financial")
# 2-arg form
answer = dartlab.ask("005930", "재무 건전성 분석")
# agent mode — LLM selects tools for deeper analysis
answer = dartlab.chat("005930", "배당 추세를 분석하고 이상 징후를 찾아줘")
CLI
# provider setup (interactive guide)
dartlab setup # list all providers
dartlab setup ollama # local LLM (free)
dartlab setup openai # OpenAI API
# check status
dartlab status # all providers (table view)
dartlab status -p ollama # single provider detail
dartlab status --cost # cumulative token/cost stats
# ask questions (streaming by default)
dartlab ask "삼성전자 재무건전성 분석해줘"
dartlab ask "삼성전자 배당 분석" -p openai -m gpt-4o
dartlab ask "AAPL risk analysis" -p ollama
dartlab ask --continue "배당 추세는?" # continue previous conversation
dartlab ask "삼성전자 분석" --pattern financial # analysis framework
# auto-generate report (Markdown)
dartlab report "삼성전자"
dartlab report "삼성전자" -o report.md
# web UI
dartlab # open browser UI
5 providers supported: oauth-codex (ChatGPT subscription), codex (Codex CLI), ollama (local, free), openai (API key), custom (OpenAI-compatible).
Project Settings (.dartlab.yml)
Place a .dartlab.yml in your project root or home directory to set defaults:
company: 005930 # default company
provider: openai # default LLM provider
model: gpt-4o # default model
verbose: false
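Because the file sets defaults, explicit arguments would normally win over it. The sketch below parses a flat `key: value` file like the one above and applies overrides on top. It is a simplified stand-in for a YAML parser, and `parse_flat_settings` and `resolve` are hypothetical helpers, not DartLab's settings loader.

```python
def parse_flat_settings(text: str) -> dict:
    """Parse flat `key: value` lines, ignoring blank lines and `#` comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()  # values are kept as strings
    return settings


def resolve(defaults: dict, **overrides) -> dict:
    """Merge file defaults with explicit arguments; explicit values win."""
    merged = dict(defaults)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged


sample = """
company: 005930       # default company
provider: openai      # default LLM provider
model: gpt-4o         # default model
verbose: false
"""

settings = parse_flat_settings(sample)
effective = resolve(settings, provider="ollama", model=None)
print(effective)
```

Keeping values as strings preserves the leading zeros in a stock code such as 005930.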
Core Ideas
1. Sections First
sections is the backbone. A company is described as one horizontalized map of disclosure units across periods — not a loose set of parser outputs.
2. Source-Aware Company
Company is a merged company object. When finance or report is more authoritative than docs for a given topic, it overrides automatically. trace() tells you which source was chosen and why.
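The override rule can be sketched as a fixed priority list over the sources that carry a topic. The priority order, the `fallbackSources` key, and the topic sets below are assumptions made for illustration; only the `primarySource` key appears in the trace output shown earlier.

```python
# Assumed priority: structured sources outrank docs when they carry the topic.
PRIORITY = ["finance", "report", "docs"]


def resolve_topic(topic: str, available: dict) -> dict:
    """Pick the highest-priority source that has `topic` and record the others."""
    candidates = [s for s in PRIORITY if topic in available.get(s, set())]
    if not candidates:
        raise KeyError(f"no source provides topic {topic!r}")
    return {
        "topic": topic,
        "primarySource": candidates[0],
        "fallbackSources": candidates[1:],  # hypothetical field
    }


# Hypothetical topic coverage per source.
available = {
    "docs": {"companyOverview", "BS", "dividend"},
    "finance": {"BS", "IS", "CF"},
    "report": {"dividend"},
}

for name in ("BS", "dividend", "companyOverview"):
    print(resolve_topic(name, available))
```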
3. Text Structure
Narrative text is not a flat string. DartLab splits it into heading/body rows with level and path metadata, enabling structural comparison across periods. This works for both Korean DART and English EDGAR filings.
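A small sketch of heading/body splitting with level and path metadata follows. It assumes headings are marked with leading `#` characters, which is a simplification chosen for the example; real filings use other heading conventions, and `split_text` is a hypothetical helper rather than DartLab's parser.

```python
def split_text(lines):
    """Split lines into heading/body rows carrying textLevel and textPath."""
    rows = []
    stack = []  # open headings as (level, title), outermost first
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped.lstrip("#").strip()
            # Close headings at the same or deeper level before opening this one.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, title))
            node_type, text = "heading", title
        else:
            level = stack[-1][0] if stack else 0
            node_type, text = "body", stripped
        rows.append(
            {
                "textNodeType": node_type,
                "textLevel": level,
                "textPath": " > ".join(t for _, t in stack),
                "text": text,
            }
        )
    return rows


sample = [
    "# Business",
    "We sell chips.",
    "## Risks",
    "Demand may fall.",
    "# Governance",
    "The board meets quarterly.",
]
rows = split_text(sample)
for row in rows:
    print(row)
```

Because each body row carries the path of its enclosing headings, the same paragraph can be matched across periods even when surrounding sections move.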
4. Raw Access
You can always go deeper:
c.docs.sections # pure docs horizontalization
c.finance.BS # finance engine directly
c.report.extract("배당") # report engine directly
Stability
| Tier | Scope |
|---|---|
| Stable | DART Company (sections, show, trace, diff, BS/IS/CF, ratios, insights) |
| Beta | EDGAR Company, OpenDart, OpenEdgar, Server API |
| Experimental | AI tools, export |
See docs/stability.md.
Data
DartLab ships with pre-built datasets via GitHub Releases.
- finance / report: Already collected and up-to-date for 2,700+ listed companies.
- docs: Being collected gradually to avoid overloading the DART system. Currently 320+ companies are stored in the data-docs release, updated as collection progresses.
- EDGAR finance: Fetched on-demand from SEC XBRL API (no pre-built dataset needed).
| Dataset | Coverage | Source |
|---|---|---|
| DART docs | 320+ companies (growing) | Korean disclosure text + tables |
| DART finance | 2,700+ companies | XBRL financial statements |
| DART report | 2,700+ companies | Structured disclosure APIs |
| EDGAR docs | 970+ companies | 10-K/10-Q sections |
| EDGAR finance | On-demand | SEC XBRL facts (auto-fetched from SEC API) |
# Bulk download (optional — downloads all companies at once)
from dartlab.core.dataLoader import downloadAll
downloadAll("docs") # DART disclosure documents
downloadAll("finance") # DART financial statements
downloadAll("report") # DART structured reports
Documentation
Docs are continuously updated with new content.
- Docs: https://eddmpython.github.io/dartlab/
- Sections guide: https://eddmpython.github.io/dartlab/docs/getting-started/sections
- Quick start: https://eddmpython.github.io/dartlab/docs/getting-started/quickstart
- API overview: https://eddmpython.github.io/dartlab/docs/api/overview
Blog
The DartLab Blog covers practical disclosure analysis topics — how to read financial reports, interpret disclosure patterns, and spot risk signals. 115+ articles across three categories:
- Disclosure Systems — structure and mechanics of DART/EDGAR filings
- Report Reading — practical guide to reading audit reports, preliminary earnings, restatements
- Financial Interpretation — interpreting financial statements, ratios, and disclosure signals
Contributing
The project prefers experiments before engine changes. If you want to propose a parser or mapping change, validate it first and then bring the result back into the engine.
License
MIT