Build one company map from disclosure filings: DART + EDGAR
What DartLab Is
DartLab turns corporate filings into a single company map — for both Korean DART and US EDGAR.
The center of that map is sections: a horizontalized matrix built from disclosure sections across periods. Instead of treating a filing as the output of many unrelated parsers, DartLab aligns the document structure first, then lets stronger sources fill in the parts they own:
- docs: section structure, narrative text with heading/body separation, tables, and evidence
- finance: authoritative numeric statements (BS, IS, CF) and financial ratios
- report: authoritative structured disclosure APIs (DART only)
```python
import dartlab

c = dartlab.Company("005930")  # Samsung Electronics (DART)
c.sections  # full company map (topic × period)
c.topics  # topic list with source, blocks, periods
c.show("companyOverview")  # open one topic
c.show("IS", period=["2024Q4", "2023Q4"])  # vertical view (period × item)
c.BS  # balance sheet
c.ratios  # ratio time series (item × period)
c.insights  # 7-area grades (A~F)

us = dartlab.Company("AAPL")  # Apple (EDGAR)
us.sections
us.show("10-K::item1Business")
us.BS
us.ratios
```
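The vertical view flips the usual account × period layout so that each requested period becomes a row. A minimal sketch of that reshaping in plain Python, assuming a hypothetical `horizontal` dict of account → {period: value}; the `vertical_view` helper, the account names, and the numbers are made up for illustration and are not DartLab's implementation:

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a horizontal statement: account -> {period: value}
Horizontal = Dict[str, Dict[str, Optional[float]]]


def vertical_view(horizontal: Horizontal, periods: List[str]) -> List[Dict[str, object]]:
    """Return one row per requested period, with one column per account."""
    rows: List[Dict[str, object]] = []
    for period in periods:
        row: Dict[str, object] = {"period": period}
        for account, series in horizontal.items():
            # None when the account has no value for that period
            row[account] = series.get(period)
        rows.append(row)
    return rows


income = {
    "revenue": {"2024Q4": 300.0, "2023Q4": 260.0},
    "operatingIncome": {"2024Q4": 33.0, "2023Q4": 7.0},
}
for row in vertical_view(income, ["2024Q4", "2023Q4"]):
    print(row)
```

Selecting periods before transposing keeps the output small when only a few periods are being compared side by side.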
Install
```bash
uv add dartlab
```

AI interface:

```bash
uv add "dartlab[ai]"
uv run dartlab ai
```
Quick Start
Sections — The Company Map
sections is a Polars DataFrame where each row is a disclosure block and each period column holds the raw payload. Periods are sorted newest-first, and annual reports appear as Q4:
```text
chapter │ topic            │ blockType │ textNodeType │ 2025Q4 │ 2024Q4 │ 2024Q3 │ …
I       │ companyOverview  │ text      │ heading      │ "…"    │ "…"    │ "…"    │
I       │ companyOverview  │ text      │ body         │ "…"    │ "…"    │ "…"    │
I       │ companyOverview  │ table     │ null         │ "…"    │ "…"    │ null   │
II      │ businessOverview │ text      │ heading      │ "…"    │ "…"    │ "…"    │
III     │ BS               │ table     │ null         │ —      │ —      │ —      │ (finance)
VII     │ dividend         │ table     │ null         │ —      │ —      │ —      │ (report)
```
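The layout above is a horizontalization: blocks from each filing are aligned on a shared row key, and each period becomes a column. A minimal stdlib sketch of that idea, assuming each filing has already been parsed into (chapter, topic, blockType, textNodeType, payload) tuples; the input shape, the `horizontalize`/`period_label` helpers, and the ordinal used to separate repeated blocks are assumptions for illustration, not DartLab's engine (which returns a Polars DataFrame):

```python
from typing import Dict, List, Optional, Tuple

# (chapter, topic, blockType, textNodeType, payload); textNodeType is None for tables
Block = Tuple[str, str, str, Optional[str], str]


def period_label(year: int, quarter: Optional[int] = None) -> str:
    """Annual reports (quarter=None) are labelled as Q4, e.g. 2024 -> '2024Q4'."""
    return f"{year}Q{quarter or 4}"


def period_sort_key(label: str) -> Tuple[int, int]:
    year, quarter = label.split("Q")
    return int(year), int(quarter)


def horizontalize(filings: Dict[str, List[Block]]) -> Tuple[List[str], List[Dict[str, object]]]:
    """Align blocks across periods; return (period columns newest-first, rows)."""
    periods = sorted(filings, key=period_sort_key, reverse=True)
    cells: Dict[tuple, Dict[str, str]] = {}
    order: List[tuple] = []  # row keys in first-seen order, scanning newest period first
    for period in periods:
        seen: Dict[tuple, int] = {}
        for chapter, topic, block_type, node_type, payload in filings[period]:
            base = (chapter, topic, block_type, node_type)
            # Ordinal separates repeated blocks of the same kind within one topic
            ordinal = seen.get(base, 0)
            seen[base] = ordinal + 1
            key = base + (ordinal,)
            if key not in cells:
                cells[key] = {}
                order.append(key)
            cells[key][period] = payload
    rows: List[Dict[str, object]] = []
    for key in order:
        row: Dict[str, object] = dict(zip(("chapter", "topic", "blockType", "textNodeType"), key[:4]))
        for period in periods:
            row[period] = cells[key].get(period)  # None where the block is absent
        rows.append(row)
    return periods, rows


filings = {
    period_label(2023): [("I", "companyOverview", "text", "heading", "h23"),
                         ("I", "companyOverview", "table", None, "t23")],
    period_label(2024): [("I", "companyOverview", "text", "heading", "h24")],
    period_label(2024, 3): [("I", "companyOverview", "text", "heading", "h24q3")],
}
columns, rows = horizontalize(filings)
print(columns)
for row in rows:
    print(row)
```

Missing blocks become None, matching the null cells in the table above where a block does not exist for a period.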
Text blocks carry structural metadata (textNodeType for heading/body, textLevel, and textPath), so you can distinguish section headings from narrative content.
Show, Trace, Diff
```python
import dartlab

c = dartlab.Company("005930")

# show: open any topic with source-aware priority
c.show("BS")  # → finance DataFrame
c.show("companyOverview")  # → sections-based text + tables
c.show("dividend")  # → report DataFrame (all quarters)

# vertical view: compare specific periods side by side
c.show("IS", period=["2024Q4", "2023Q4"])  # period × item

# trace: why a topic came from docs, finance, or report
c.trace("BS")  # → {"primarySource": "finance", ...}

# diff: text change detection (3 modes)
c.diff()  # full summary
c.diff("businessOverview")  # topic history
c.diff("businessOverview", "2024", "2025")  # line-by-line diff
```
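The line-by-line mode compares one topic's text between two periods. DartLab's own diff output format is not documented here; as a rough illustration of the idea, Python's standard `difflib.unified_diff` produces a comparable line-level diff from two text payloads. The `line_diff`/`changed_lines` helpers and the sample text are hypothetical:

```python
import difflib
from typing import List


def line_diff(old_text: str, new_text: str, old_label: str, new_label: str) -> List[str]:
    """Return unified-diff lines for two versions of a topic's narrative text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


def changed_lines(diff: List[str]) -> List[str]:
    """Keep only added/removed content lines, skipping the ---/+++ file headers."""
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


old = "Memory\nFoundry\nDisplay"
new = "Memory\nFoundry\nHBM"
for line in changed_lines(line_diff(old, new, "2024", "2025")):
    print(line)
```

Splitting on lines first means a reworded sentence shows up as one removed line plus one added line, which is usually what a reader scanning a filing wants.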
Finance
```python
c.BS  # balance sheet (account × period, newest first)
c.IS  # income statement
c.CF  # cash flow
c.ratios  # ratio time series DataFrame (6 categories × period)
c.finance.ratios  # latest single-point RatioResult
c.finance.ratioSeries  # ratio time series across years
c.finance.timeseries  # raw account time series
```
Financial ratios cover 6 categories: profitability, stability, growth, efficiency, cashflow, and valuation.
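A ratio series is derived from account time series one period at a time. The sketch below computes two common ratios, operating margin (a profitability measure) and debt-to-equity (a stability measure), from hypothetical account dicts; the account names, formulas, and figures are illustrative assumptions, not DartLab's own ratio definitions:

```python
from typing import Dict, Optional

Series = Dict[str, float]  # period -> value


def safe_div(num: Optional[float], den: Optional[float]) -> Optional[float]:
    """Divide, returning None when either side is missing or the denominator is zero."""
    if num is None or den is None or den == 0:
        return None
    return num / den


def ratio_series(accounts: Dict[str, Series]) -> Dict[str, Dict[str, Optional[float]]]:
    """Compute per-period ratios (in %) from account time series, newest period first."""
    periods = sorted(accounts["revenue"], reverse=True)  # 'YYYYQn' labels sort correctly as strings
    out: Dict[str, Dict[str, Optional[float]]] = {"operatingMargin": {}, "debtToEquity": {}}
    for p in periods:
        margin = safe_div(accounts["operatingIncome"].get(p), accounts["revenue"].get(p))
        dte = safe_div(accounts["totalLiabilities"].get(p), accounts["totalEquity"].get(p))
        out["operatingMargin"][p] = None if margin is None else round(margin * 100, 2)
        out["debtToEquity"][p] = None if dte is None else round(dte * 100, 2)
    return out


accounts = {
    "revenue": {"2024Q4": 200.0, "2023Q4": 100.0},
    "operatingIncome": {"2024Q4": 30.0, "2023Q4": 5.0},
    "totalLiabilities": {"2024Q4": 50.0},
    "totalEquity": {"2024Q4": 200.0, "2023Q4": 0.0},
}
print(ratio_series(accounts))
```

Returning None instead of raising on missing or zero inputs keeps one bad period from breaking the whole series.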
Insights
```python
c.insights  # 7-area analysis
c.insights.grades()  # → {"performance": "A", "profitability": "B", …}
c.insights.performance.grade  # → "A"
c.insights.performance.details  # → ["Revenue growth +8.3%", …]
c.insights.anomalies  # → outliers and red flags
```
Insights cover 7 analysis areas: performance, profitability, health, cashflow, governance, risk, and opportunity.
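Letter grades summarize each area's underlying metrics. DartLab's scoring rules are not described here; the sketch below shows one plausible approach, mapping a hypothetical 0 to 100 area score onto letter bands. The cutoffs, the band set (A, B, C, D, F), and the `to_grade`/`grade_areas` helpers are all assumptions:

```python
from typing import Dict, List, Tuple

# Hypothetical cutoffs, highest first: a score at or above the threshold earns that grade
CUTOFFS: List[Tuple[float, str]] = [(80.0, "A"), (65.0, "B"), (50.0, "C"), (35.0, "D")]


def to_grade(score: float) -> str:
    """Map a 0-100 area score to a letter grade; anything below the last cutoff is F."""
    for threshold, grade in CUTOFFS:
        if score >= threshold:
            return grade
    return "F"


def grade_areas(scores: Dict[str, float]) -> Dict[str, str]:
    """Grade each analysis area independently."""
    return {area: to_grade(score) for area, score in scores.items()}


print(grade_areas({"performance": 86.0, "profitability": 71.5, "risk": 30.0}))
# {'performance': 'A', 'profitability': 'B', 'risk': 'F'}
```

Keeping the cutoffs in one table makes the grading rule easy to audit and tune separately from the metric calculations.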
EDGAR (US)
Same Company interface, different data source:
```python
us = dartlab.Company("AAPL")
us.sections  # 10-K/10-Q sections with heading/body
us.show("10-K::item1Business")  # business description
us.show("10-K::item1ARiskFactors")  # risk factors
us.BS  # SEC XBRL balance sheet
us.ratios  # same 47 ratios
us.diff("10-K::item7Mdna")  # MD&A text changes
```
EDGAR sections include the same text structure metadata (heading/body separation, textLevel, textPath) as DART.
OpenAPI — Raw Public APIs
Use the source-native wrappers when you want to call the raw disclosure APIs directly.
OpenDart (Korea)
```python
from dartlab import OpenDart

d = OpenDart()  # auto-detect API key
d = OpenDart(["key1", "key2"])  # multi-key rotation
d.search("카카오", listed=True)  # company search (Kakao)
d.filings("삼성전자", "2024")  # filing list (Samsung Electronics)
d.company("삼성전자")  # corporate profile
d.finstate("삼성전자", 2024)  # financial statements
d.report("삼성전자", "배당", 2024)  # 56 report categories ("배당" = dividend)

# convenience proxy bound to one company
s = d("삼성전자")
s.finance(2024)
s.report("배당", 2024)
s.filings("2024")
```
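Passing several keys spreads requests across them, typically to stay within per-key request quotas. How DartLab rotates keys internally is not specified here; a minimal round-robin sketch using `itertools.cycle`, where `KeyRotator`, `next_key`, and `mark_exhausted` are hypothetical names:

```python
import itertools
from typing import Iterable, List, Set


class KeyRotator:
    """Hypothetical round-robin API-key rotator that skips keys marked exhausted."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys: List[str] = list(keys)
        if not self._keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(self._keys)
        self._exhausted: Set[str] = set()

    def next_key(self) -> str:
        """Return the next usable key in round-robin order."""
        # At most one full lap: if every key is exhausted, give up instead of looping forever
        for _ in range(len(self._keys)):
            key = next(self._cycle)
            if key not in self._exhausted:
                return key
        raise RuntimeError("all API keys are exhausted")

    def mark_exhausted(self, key: str) -> None:
        """Call this when a key hits its quota so later requests skip it."""
        self._exhausted.add(key)


rotator = KeyRotator(["key1", "key2"])
print([rotator.next_key() for _ in range(3)])
```

Marking a key exhausted on a quota error keeps the remaining keys in rotation instead of failing the whole batch.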
OpenEdgar (US)
```python
from dartlab import OpenEdgar

e = OpenEdgar()
e.search("Apple")  # ticker search
e.company("AAPL")  # company info
e.filings("AAPL", forms=["10-K", "10-Q"])  # filing list
e.companyFactsJson("AAPL")  # XBRL facts
e.companyConceptJson("AAPL", "us-gaap", "Revenue")  # single tag series
```
These wrappers expose each source's original API surface unchanged, and any data they save as parquet stays compatible with DartLab's Company engine.
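The SEC companyfacts payload nests values as facts → taxonomy → concept → units → a list of observations, each carrying fields such as `end`, `val`, `fp`, `form`, and `filed`. A minimal sketch that pulls annual 10-K values for one concept out of a dict with that shape; the sample dict is a trimmed, made-up excerpt (not real Apple figures), `annual_values` is a hypothetical helper, and it assumes `companyFactsJson` returns the SEC JSON unchanged, which the description above suggests but does not state outright:

```python
from typing import Any, Dict, List, Tuple

SAMPLE_FACTS: Dict[str, Any] = {
    "facts": {"us-gaap": {"Revenues": {"units": {"USD": [
        {"end": "2023-09-30", "val": 100, "fp": "FY", "form": "10-K", "filed": "2023-11-03"},
        {"end": "2023-09-30", "val": 101, "fp": "FY", "form": "10-K", "filed": "2024-11-01"},
        {"end": "2024-09-28", "val": 120, "fp": "FY", "form": "10-K", "filed": "2024-11-01"},
        {"end": "2024-06-29", "val": 30, "fp": "Q3", "form": "10-Q", "filed": "2024-08-02"},
    ]}}}}
}


def annual_values(facts: Dict[str, Any], taxonomy: str, concept: str,
                  unit: str = "USD") -> List[Tuple[str, float]]:
    """Return (period end, value) pairs from 10-K FY observations, newest first."""
    observations = facts["facts"][taxonomy][concept]["units"].get(unit, [])
    best: Dict[str, Tuple[str, float]] = {}  # period end -> (filed date, value)
    for obs in observations:
        if obs.get("form") != "10-K" or obs.get("fp") != "FY":
            continue
        prev = best.get(obs["end"])
        # A period is re-reported in later filings; keep the most recently filed value.
        # A real pipeline would also check start/end to drop non-annual durations.
        if prev is None or obs["filed"] >= prev[0]:
            best[obs["end"]] = (obs["filed"], obs["val"])
    return sorted(((end, val) for end, (_, val) in best.items()), reverse=True)


print(annual_values(SAMPLE_FACTS, "us-gaap", "Revenues"))
```

ISO dates compare correctly as strings, so no date parsing is needed to pick the latest filing or to sort newest first.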
Core Ideas
1. Sections First
sections is the backbone: a company is described as one horizontalized map of disclosure units across periods, not as a loose set of parser outputs.
2. Source-Aware Company
Company is a merged company object. When finance or report is more authoritative than docs for a given topic, that source overrides docs automatically. trace() tells you which source was chosen and why.
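A minimal sketch of source-aware resolution, assuming a fixed per-topic ownership table with a fallback to docs. The `OWNERS` mapping, the `resolve` function, and the `reason` field are hypothetical; the only trace key taken from this README is `primarySource`:

```python
from typing import Dict, List, Optional

# Hypothetical ownership: topics where a structured source is more authoritative than docs
OWNERS: Dict[str, str] = {"BS": "finance", "IS": "finance", "CF": "finance", "dividend": "report"}


def resolve(topic: str, available: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Pick the source for a topic and explain why, in a trace-like dict.

    `available` maps each source name ("docs", "finance", "report") to the topics it has data for.
    """
    owner = OWNERS.get(topic)
    if owner is not None and topic in available.get(owner, []):
        return {"topic": topic, "primarySource": owner, "reason": f"{owner} owns {topic}"}
    if topic in available.get("docs", []):
        if owner is None:
            reason = "no structured source owns this topic"
        else:
            reason = f"{owner} has no data for {topic}; fell back to docs"
        return {"topic": topic, "primarySource": "docs", "reason": reason}
    return {"topic": topic, "primarySource": None, "reason": "topic not found in any source"}


available = {"docs": ["BS", "companyOverview", "dividend"], "finance": ["BS"], "report": []}
for topic in ["BS", "companyOverview", "dividend", "CF"]:
    print(resolve(topic, available))
```

Returning the reason alongside the choice is what makes the override explainable rather than silent.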
3. Text Structure
Narrative text is not a flat string. DartLab splits it into heading/body rows with level and path metadata, enabling structural comparison across periods. This works for both Korean DART and English EDGAR filings.
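A minimal sketch of splitting narrative text into heading and body rows that carry textNodeType, textLevel, and textPath. It assumes headings can be recognized by simple numbering markers ("1. Title" at level 1, "(1) Title" at level 2); those regexes, the `split_text` helper, and the " > " path separator are illustrative assumptions, not DartLab's rules:

```python
import re
from typing import Dict, List, Optional, Union

# Hypothetical heading patterns, ordered by level: "1. Title" then "(1) Title"
HEADING_PATTERNS = [re.compile(r"^\d+\.\s+\S"), re.compile(r"^\(\d+\)\s+\S")]

Row = Dict[str, Union[str, int, None]]


def heading_level(line: str) -> Optional[int]:
    """Return the heading level of a line, or None if it is body text."""
    for level, pattern in enumerate(HEADING_PATTERNS, start=1):
        if pattern.match(line):
            return level
    return None


def split_text(text: str) -> List[Row]:
    """Split text into heading/body rows carrying textLevel and textPath."""
    rows: List[Row] = []
    path: List[str] = []  # heading titles from the top level down to the current one
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        level = heading_level(line)
        if level is not None:
            path = path[: level - 1] + [line]  # drop deeper headings, then descend
            rows.append({"textNodeType": "heading", "textLevel": level,
                         "textPath": " > ".join(path), "text": line})
        else:
            # Body rows inherit the depth and path of their enclosing heading
            rows.append({"textNodeType": "body", "textLevel": len(path),
                         "textPath": " > ".join(path), "text": line})
    return rows


sample = "1. Business overview\nWe make chips.\n(1) Memory\nDRAM and NAND.\n2. Risks\nCycles."
for row in split_text(sample):
    print(row)
```

Because every body row carries its heading path, the same paragraph can be matched across periods by path even when its position in the filing shifts.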
4. Raw Access
You can always go deeper:
```python
c.docs.sections  # pure docs horizontalization
c.finance.BS  # finance engine directly
c.report.extract("배당")  # report engine directly ("배당" = dividend)
```
Stability
| Tier | Scope |
|---|---|
| Stable | DART Company (sections, show, trace, diff, BS/IS/CF, ratios, insights) |
| Beta | EDGAR Company, OpenDart, OpenEdgar, Server API |
| Experimental | AI tools, export |
See docs/stability.md.
Documentation
- Docs: https://eddmpython.github.io/dartlab/
- Sections guide: https://eddmpython.github.io/dartlab/docs/getting-started/sections
- Quick start: https://eddmpython.github.io/dartlab/docs/getting-started/quickstart
- API overview: https://eddmpython.github.io/dartlab/docs/api/overview
- Blog: https://eddmpython.github.io/dartlab/blog/
Data
DartLab ships with pre-built datasets via GitHub Releases:
| Dataset | Coverage | Source |
|---|---|---|
| DART docs | 260+ companies | Korean disclosure text + tables |
| DART finance | 2,700+ companies | XBRL financial statements |
| DART report | 2,700+ companies | Structured disclosure APIs |
| EDGAR docs | 970+ companies | 10-K/10-Q sections |
| EDGAR finance | 970+ companies | SEC XBRL facts |
Contributing
The project prefers experiments before engine changes. If you want to propose a parser or mapping change, validate it first and then bring the result back into the engine.
License
MIT
File details

dartlab-0.6.0.tar.gz (source distribution)
- Size: 14.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a2559ebab3459fa5b7ad0b5e3365acf3554463f6616410e35fe07a71a83edb6 |
| MD5 | 449363d085861c87b8cdec69d5f5177a |
| BLAKE2b-256 | 526268f755f080bb1c0becbf2945df4fe7e63758104fdef23d9ff5597c99a802 |

Provenance (attestation bundle for dartlab-0.6.0.tar.gz)
- Publisher: publish.yml on eddmpython/dartlab
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dartlab-0.6.0.tar.gz
- Subject digest: 8a2559ebab3459fa5b7ad0b5e3365acf3554463f6616410e35fe07a71a83edb6
- Sigstore transparency entry: 1131124254
- Permalink: eddmpython/dartlab@9b89f8a4c8fbf7506b8ef7096ae76acf216f5607
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/eddmpython
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9b89f8a4c8fbf7506b8ef7096ae76acf216f5607
- Trigger Event: push
dartlab-0.6.0-py3-none-any.whl (built distribution)
- Size: 14.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8334a2bcd96097ecee14b1ce396bc2db45d4b7ed2512dd8bef6f1a78101471a0 |
| MD5 | 49d5d08c399fcba52e164914dccc1086 |
| BLAKE2b-256 | e76c788654bd48b3bc612243e09cca959acc1bdf8892e0f987e690e89dc732bf |

Provenance (attestation bundle for dartlab-0.6.0-py3-none-any.whl)
- Publisher: publish.yml on eddmpython/dartlab
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dartlab-0.6.0-py3-none-any.whl
- Subject digest: 8334a2bcd96097ecee14b1ce396bc2db45d4b7ed2512dd8bef6f1a78101471a0
- Sigstore transparency entry: 1131124310
- Permalink: eddmpython/dartlab@9b89f8a4c8fbf7506b8ef7096ae76acf216f5607
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/eddmpython
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9b89f8a4c8fbf7506b8ef7096ae76acf216f5607
- Trigger Event: push