DartLab

One company map from disclosure filings — DART + EDGAR


What DartLab Is

DartLab turns corporate filings into a single company map — for both Korean DART and US EDGAR.

The center of that map is sections: a horizontalized matrix built from disclosure sections across periods. Instead of treating a filing as a pile of unrelated parsers, DartLab aligns the document structure first, then lets stronger sources fill in what they own:

  • docs — section structure, narrative text with heading/body separation, tables, and evidence
  • finance — authoritative numeric statements (BS, IS, CF) and financial ratios
  • report — authoritative structured disclosure APIs (DART only)
import dartlab

c = dartlab.Company("005930")   # Samsung Electronics (DART)
c.sections                      # full company map (topic × period)
c.topics                        # topic list with source, blocks, periods
c.show("companyOverview")       # open one topic
c.show("IS", period=["2024Q4", "2023Q4"])  # vertical view (period × item)
c.BS                            # balance sheet
c.ratios                        # ratio time series (item × period)
c.insights                      # 7-area grades (A to F)

us = dartlab.Company("AAPL")    # Apple (EDGAR)
us.sections
us.show("10-K::item1Business")
us.BS
us.ratios

Install

uv add dartlab

AI interface:

uv add "dartlab[ai]"
uv run dartlab ai

Quick Start

Sections — The Company Map

sections is a Polars DataFrame where each row is a disclosure block and each period column holds the raw payload. Periods are sorted newest-first, and annual reports appear as Q4:

chapter │ topic            │ blockType │ textNodeType │ 2025Q4 │ 2024Q4 │ 2024Q3 │ …
I       │ companyOverview  │ text      │ heading      │ "…"    │ "…"    │ "…"    │
I       │ companyOverview  │ text      │ body         │ "…"    │ "…"    │ "…"    │
I       │ companyOverview  │ table     │ null         │ "…"    │ "…"    │ null   │
II      │ businessOverview │ text      │ heading      │ "…"    │ "…"    │ "…"    │
III     │ BS               │ table     │ null         │ —      │ —      │ —      │ (finance)
VII     │ dividend         │ table     │ null         │ —      │ —      │ —      │ (report)
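Here is a minimal sketch of the newest-first ordering. It assumes period labels of the form YYYYQn and treats a bare year (as passed to diff further below) as that year's annual report, i.e. Q4. period_key and sort_periods_newest_first are hypothetical helpers, not DartLab API.

```python
import re

# Hypothetical helpers for illustration; DartLab's own period handling is not shown here.
PERIOD_RE = re.compile(r"^(\d{4})(?:Q([1-4]))?$")


def period_key(label: str) -> tuple[int, int]:
    """Map '2024Q3' to (2024, 3); a bare year such as '2024' is treated as the annual report (Q4)."""
    match = PERIOD_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized period label: {label!r}")
    year, quarter = match.groups()
    return (int(year), int(quarter) if quarter else 4)


def sort_periods_newest_first(labels: list[str]) -> list[str]:
    # Normalize every label to YYYYQn so an annual report and its Q4 column collapse into one.
    normalized = {"{}Q{}".format(*period_key(label)) for label in labels}
    return sorted(normalized, key=period_key, reverse=True)


print(sort_periods_newest_first(["2024Q3", "2025", "2024Q4", "2023"]))
# ['2025Q4', '2024Q4', '2024Q3', '2023Q4']
```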

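Rows tagged (finance) or (report) are supplied by those engines rather than parsed from the document text. Text blocks carry structural metadata (textNodeType for heading/body, textLevel, and textPath), so you can distinguish section headers from narrative content.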
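The following minimal sketch shows how heading/body metadata could be derived. It splits narrative text into rows that carry textNodeType, textLevel, and textPath. The heading patterns ("1. " for level 1, "(1) " for level 2), the " > " path separator, and the split_text and TextRow names are all assumptions for illustration. DartLab's actual rules for Korean and English filings are not documented in this README.

```python
import re
from dataclasses import dataclass

# Hypothetical heading rules, ordered by depth; DartLab's real patterns are not documented here.
HEADING_PATTERNS = [
    (1, re.compile(r"^\d+\.\s")),    # e.g. "1. Business overview"
    (2, re.compile(r"^\(\d+\)\s")),  # e.g. "(1) Products"
]


@dataclass
class TextRow:
    textNodeType: str  # "heading" or "body"
    textLevel: int     # heading depth; body rows take the depth of the heading above them
    textPath: str      # heading trail joined with " > "
    text: str


def split_text(raw: str) -> list[TextRow]:
    rows: list[TextRow] = []
    trail: list[str] = []  # headings currently open, outermost first
    for line in (ln.strip() for ln in raw.splitlines()):
        if not line:
            continue
        level = next((lvl for lvl, pattern in HEADING_PATTERNS if pattern.match(line)), None)
        if level is None:
            rows.append(TextRow("body", len(trail), " > ".join(trail), line))
            continue
        del trail[level - 1:]  # close headings at this depth or deeper
        trail.append(line)
        rows.append(TextRow("heading", level, " > ".join(trail), line))
    return rows


sample = """1. Business overview
The company makes semiconductors.
(1) Products
Memory and logic chips.
2. Risks
Demand is cyclical."""

for row in split_text(sample):
    print(row.textNodeType, row.textLevel, row.textPath)
```

Because each row keeps its full heading path, the same section can later be matched across periods even when its position in the document shifts.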

Show, Trace, Diff

c = dartlab.Company("005930")

# show — open any topic with source-aware priority
c.show("BS")                # → finance DataFrame
c.show("companyOverview")   # → sections-based text + tables
c.show("dividend")          # → report DataFrame (all quarters)

# vertical view — compare specific periods side by side
c.show("IS", period=["2024Q4", "2023Q4"])  # period × item

# trace — why a topic came from docs, finance, or report
c.trace("BS")               # → {"primarySource": "finance", ...}

# diff — text change detection (3 modes)
c.diff()                                    # full summary
c.diff("businessOverview")                  # topic history
c.diff("businessOverview", "2024", "2025")  # line-by-line diff
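The line-by-line mode can be pictured with the standard library's difflib. This sketches the idea only and is not DartLab's diff implementation. The two strings are made-up stand-ins for one topic's text in the 2024 and 2025 periods.

```python
import difflib

# Made-up text for one topic in two periods (not real filing content).
text_2024 = "We make memory chips.\nWe sell mainly in Asia.\n"
text_2025 = "We make memory chips.\nWe sell in Asia and Europe.\nWe entered the foundry business.\n"

diff_lines = list(difflib.unified_diff(
    text_2024.splitlines(), text_2025.splitlines(),
    fromfile="2024", tofile="2025", lineterm="",
))
print("\n".join(diff_lines))

# Simple change counts, skipping the '---' / '+++' file headers.
added = sum(1 for line in diff_lines if line.startswith("+") and not line.startswith("+++"))
removed = sum(1 for line in diff_lines if line.startswith("-") and not line.startswith("---"))
print(f"{added} added, {removed} removed")  # 2 added, 1 removed
```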

Finance

c.BS                    # balance sheet (account × period, newest first)
c.IS                    # income statement
c.CF                    # cash flow
c.ratios                # ratio time series DataFrame (6 categories × period)
c.finance.ratios        # latest single-point RatioResult
c.finance.ratioSeries   # ratio time series across years
c.finance.timeseries    # raw account time series

Financial ratios cover 6 categories: profitability, stability, growth, efficiency, cashflow, and valuation.
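Each category can be illustrated with one textbook ratio. The sketch below does not reproduce DartLab's own ratio definitions. The Period fields, ratio names, and toy figures are hypothetical and chosen so the arithmetic is easy to check (for example, ROE = 12 / 120 = 0.10). Debt ratio follows the common Korean convention of liabilities divided by equity.

```python
from dataclasses import dataclass


@dataclass
class Period:
    # Hypothetical account fields; DartLab's real account names are not listed in this README.
    revenue: float
    net_income: float
    total_assets: float
    total_liabilities: float
    equity: float
    operating_cash_flow: float


def example_ratios(cur: Period, prev: Period, market_cap: float) -> dict[str, float]:
    """One textbook ratio per category (not DartLab's exact definitions)."""
    return {
        "profitability.roe": cur.net_income / cur.equity,
        "stability.debtRatio": cur.total_liabilities / cur.equity,
        "growth.revenueGrowth": cur.revenue / prev.revenue - 1,
        "efficiency.assetTurnover": cur.revenue / cur.total_assets,
        "cashflow.ocfToNetIncome": cur.operating_cash_flow / cur.net_income,
        "valuation.per": market_cap / cur.net_income,
    }


# Toy numbers chosen for easy arithmetic (assets = liabilities + equity), not real company data.
prev = Period(100.0, 10.0, 200.0, 80.0, 120.0, 12.0)
cur = Period(110.0, 12.0, 220.0, 100.0, 120.0, 15.0)
for name, value in example_ratios(cur, prev, market_cap=240.0).items():
    print(f"{name}: {value:.2f}")
```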

Insights

c.insights                      # 7-area analysis
c.insights.grades()             # → {"performance": "A", "profitability": "B", …}
c.insights.performance.grade    # → "A"
c.insights.performance.details  # → ["Revenue growth +8.3%", …]
c.insights.anomalies            # → outliers and red flags

7 analysis areas: performance, profitability, health, cashflow, governance, risk, opportunity.
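This README does not specify how a score becomes a letter grade. A minimal sketch follows. It assumes each of the 7 areas has a 0 to 100 score and uses hypothetical cut-offs between six letters, A to F. DartLab's real scoring rules and thresholds may differ entirely.

```python
from bisect import bisect_right

AREAS = ("performance", "profitability", "health", "cashflow",
         "governance", "risk", "opportunity")

# Hypothetical cut-offs on a 0-100 score; DartLab's real grading rules are not documented here.
CUTOFFS = [30, 45, 60, 75, 90]
LETTERS = ["F", "E", "D", "C", "B", "A"]  # one more letter than there are cut-offs


def to_grade(score: float) -> str:
    # bisect_right counts how many cut-offs the score meets or exceeds.
    return LETTERS[bisect_right(CUTOFFS, score)]


def grade_all(scores: dict[str, float]) -> dict[str, str]:
    missing = [area for area in AREAS if area not in scores]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    return {area: to_grade(scores[area]) for area in AREAS}


# Toy scores, not output from DartLab.
toy = dict(zip(AREAS, [92, 78, 64, 50, 40, 20, 88]))
print(grade_all(toy))
```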

EDGAR (US)

Same Company interface, different data source:

us = dartlab.Company("AAPL")

us.sections                         # 10-K/10-Q sections with heading/body
us.show("10-K::item1Business")      # business description
us.show("10-K::item1ARiskFactors")  # risk factors
us.BS                               # SEC XBRL balance sheet
us.ratios                           # same 47 ratios as DART
us.diff("10-K::item7Mdna")          # MD&A text changes

EDGAR sections include the same text structure metadata (heading/body separation, textLevel, textPath) as DART.

OpenAPI — Raw Public APIs

Use source-native wrappers when you want raw disclosure APIs directly.

OpenDart (Korea)

from dartlab import OpenDart

d = OpenDart()                                  # auto-detect API key
d = OpenDart(["key1", "key2"])                  # multi-key rotation

d.search("카카오", listed=True)                  # company search ("카카오" = Kakao)
d.filings("삼성전자", "2024")                    # filing list ("삼성전자" = Samsung Electronics)
d.company("삼성전자")                            # corporate profile
d.finstate("삼성전자", 2024)                     # financial statements
d.report("삼성전자", "배당", 2024)                # one of 56 report categories ("배당" = dividend)

# convenience proxy
s = d("삼성전자")
s.finance(2024)
s.report("배당", 2024)
s.filings("2024")
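Multi-key rotation, as in OpenDart(["key1", "key2"]), is commonly done round-robin. This minimal sketch assumes that each request raises an error when its key hits a quota, and that the pool then moves on to the next key. KeyRotator and QuotaExceeded are hypothetical names, not OpenDart's actual implementation.

```python
from itertools import cycle
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical error a request raises when its key hits the provider's limit."""


class KeyRotator:
    """Hypothetical round-robin key pool; not OpenDart's actual implementation."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._size = len(keys)
        self._cycle = cycle(keys)
        self.current = next(self._cycle)

    def call(self, request: Callable[[str], T]) -> T:
        """Run request(key); on QuotaExceeded, move to the next key. Each key is tried at most once per call."""
        for _ in range(self._size):
            try:
                return request(self.current)
            except QuotaExceeded:
                self.current = next(self._cycle)
        raise RuntimeError("every API key is exhausted")


def fake_request(key: str) -> str:
    # Simulates a provider where key1 has already used up its quota.
    if key == "key1":
        raise QuotaExceeded(key)
    return f"ok with {key}"


pool = KeyRotator(["key1", "key2"])
print(pool.call(fake_request))  # ok with key2
```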

OpenEdgar (US)

from dartlab import OpenEdgar

e = OpenEdgar()

e.search("Apple")                               # ticker search
e.company("AAPL")                               # company info
e.filings("AAPL", forms=["10-K", "10-Q"])       # filing list
e.companyFactsJson("AAPL")                      # XBRL facts
e.companyConceptJson("AAPL", "us-gaap", "RevenueFromContractWithCustomerExcludingAssessedTax")  # single concept series

These wrappers expose each source's API as-is rather than reshaping it, and the parquet files they save remain compatible with DartLab's Company engine.
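SEC companyfacts JSON nests XBRL facts as facts → taxonomy → concept → units → unit → list of entries, and each entry carries fields such as end, val, form, fp, and filed. This sketch assumes companyFactsJson returns that JSON unchanged as a dict and shows one way to collapse a concept into an annual series. Later 10-Ks repeat prior-year figures, so the sketch keys entries by period end date and keeps the most recently filed value. The sample is toy input in that shape, not Apple's numbers, and annual_series is a hypothetical helper.

```python
def annual_series(facts: dict, concept: str, taxonomy: str = "us-gaap", unit: str = "USD") -> dict[str, float]:
    """Collapse SEC companyfacts entries for one concept into {period end date: value}.

    Keeps only facts from annual 10-K filings (fp == "FY"); a production version would
    also check the start/end span. When several filings report the same period end,
    the most recently filed value wins (ISO dates compare correctly as strings).
    """
    entries = facts["facts"][taxonomy][concept]["units"][unit]
    latest: dict[str, dict] = {}
    for entry in entries:
        if entry.get("form") != "10-K" or entry.get("fp") != "FY":
            continue
        seen = latest.get(entry["end"])
        if seen is None or entry["filed"] > seen["filed"]:
            latest[entry["end"]] = entry
    return {end: latest[end]["val"] for end in sorted(latest, reverse=True)}


# Toy input in the companyfacts shape; values are made up.
sample = {"facts": {"us-gaap": {"Revenues": {"units": {"USD": [
    {"end": "2023-12-31", "val": 100, "form": "10-K", "fp": "FY", "filed": "2024-02-01"},
    {"end": "2023-12-31", "val": 101, "form": "10-K", "fp": "FY", "filed": "2025-02-01"},  # restated later
    {"end": "2024-12-31", "val": 120, "form": "10-K", "fp": "FY", "filed": "2025-02-01"},
    {"end": "2024-09-30", "val": 30, "form": "10-Q", "fp": "Q3", "filed": "2024-11-01"},
]}}}}}

print(annual_series(sample, "Revenues"))  # {'2024-12-31': 120, '2023-12-31': 101}
```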

Core Ideas

1. Sections First

sections is the backbone. A company is described as one horizontalized map of disclosure units across periods — not a loose set of parser outputs.
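Horizontalization means pivoting long records into one row per unit with a column per period. The following is a minimal pure-Python sketch. For simplicity it keys rows by topic, whereas real sections rows are per block (heading, body, table), and horizontalize is a hypothetical name.

```python
from collections import defaultdict
from typing import Optional


def horizontalize(records: list[tuple[str, str, str]]) -> tuple[list[str], dict[str, dict[str, Optional[str]]]]:
    """Pivot long (topic, period, payload) records into one row per topic with a column per period."""
    # YYYYQn labels have a fixed width, so plain string order matches chronological order.
    periods = sorted({period for _, period, _ in records}, reverse=True)
    cells: dict[str, dict[str, str]] = defaultdict(dict)
    for topic, period, payload in records:
        cells[topic][period] = payload
    # Periods where a topic was not disclosed become None (null), as in the sections table.
    return periods, {topic: {p: row.get(p) for p in periods} for topic, row in cells.items()}


records = [
    ("companyOverview", "2024Q4", "overview text 2024"),
    ("companyOverview", "2025Q4", "overview text 2025"),
    ("businessOverview", "2025Q4", "business text 2025"),
]
periods, table = horizontalize(records)
print(periods)  # ['2025Q4', '2024Q4']
for topic, row in table.items():
    print(topic, row)
```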

2. Source-Aware Company

Company merges the docs, finance, and report sources into one object. When finance or report is more authoritative than docs for a given topic, it overrides docs automatically, and trace() tells you which source was chosen and why.
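The override rule amounts to a priority lookup plus a record of the decision. The sketch below shows that shape under stated assumptions. The OWNED_BY table, the resolve function, and the "reason" key are hypothetical, and only the primarySource key mirrors what trace() is shown returning.

```python
from typing import Any

# Hypothetical ownership table: which source is treated as authoritative for a topic.
# DartLab's real topic-to-source mapping is not listed in this README.
OWNED_BY = {"BS": "finance", "IS": "finance", "CF": "finance", "dividend": "report"}


def resolve(topic: str, available: dict[str, Any]) -> tuple[Any, dict[str, str]]:
    """Pick one payload for a topic and return a trace explaining the choice.

    available maps a source name ("docs", "finance", "report") to its payload, or None if missing.
    """
    owner = OWNED_BY.get(topic)
    if owner is not None and available.get(owner) is not None:
        return available[owner], {"primarySource": owner, "reason": f"{owner} owns {topic}"}
    if available.get("docs") is not None:
        return available["docs"], {"primarySource": "docs", "reason": "fallback to docs sections"}
    raise KeyError(f"no source has data for {topic!r}")


payload, trace = resolve("BS", {"docs": "BS table parsed from text", "finance": "XBRL balance sheet"})
print(payload, trace["primarySource"])   # XBRL balance sheet finance
payload, trace = resolve("dividend", {"docs": "dividend text", "report": None})
print(payload, trace["primarySource"])   # dividend text docs
```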

3. Text Structure

Narrative text is not a flat string. DartLab splits it into heading/body rows with level and path metadata, enabling structural comparison across periods. This works for both Korean DART and English EDGAR filings.
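With heading paths in place, comparing two periods can work on structure instead of raw position. The minimal sketch below assumes each period's topic is reduced to a mapping from textPath to body text. compare_periods and that input shape are hypothetical. Aligning on the path means a newly inserted section does not make every later section look changed, as a positional comparison would.

```python
def compare_periods(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare one topic across two periods, aligning blocks by textPath rather than by position.

    old and new map a heading path to the body text under it (hypothetical input shape).
    """
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in shared if old[p] != new[p]),
        "unchanged": sorted(p for p in shared if old[p] == new[p]),
    }


# Toy text for two periods, not real filing content.
y2024 = {
    "1. Business > (1) Products": "Memory chips.",
    "1. Business > (2) Customers": "OEMs in Asia.",
    "2. Risks": "Cyclical demand.",
}
y2025 = {
    "1. Business > (1) Products": "Memory and foundry.",
    "1. Business > (2) Customers": "OEMs in Asia.",
    "3. ESG": "New section.",
}
print(compare_periods(y2024, y2025))
```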

4. Raw Access

You can always go deeper:

c.docs.sections          # pure docs horizontalization
c.finance.BS             # finance engine directly
c.report.extract("배당")  # report engine directly

Stability

Tier         │ Scope
Stable       │ DART Company (sections, show, trace, diff, BS/IS/CF, ratios, insights)
Beta         │ EDGAR Company, OpenDart, OpenEdgar, Server API
Experimental │ AI tools, export

See docs/stability.md.

Documentation

Data

DartLab ships with pre-built datasets via GitHub Releases:

Dataset       │ Coverage         │ Source
DART docs     │ 260+ companies   │ Korean disclosure text + tables
DART finance  │ 2,700+ companies │ XBRL financial statements
DART report   │ 2,700+ companies │ Structured disclosure APIs
EDGAR docs    │ 970+ companies   │ 10-K/10-Q sections
EDGAR finance │ 970+ companies   │ SEC XBRL facts

Contributing

The project prefers experiments before engine changes. If you want to propose a parser or mapping change, validate it first and then bring the result back into the engine.

License

MIT



Download files


Source Distribution

dartlab-0.6.0.tar.gz (14.0 MB)

Uploaded Source

Built Distribution


dartlab-0.6.0-py3-none-any.whl (14.2 MB)

Uploaded Python 3

File details

Details for the file dartlab-0.6.0.tar.gz.

File metadata

  • Download URL: dartlab-0.6.0.tar.gz
  • Upload date:
  • Size: 14.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dartlab-0.6.0.tar.gz
Algorithm   │ Hash digest
SHA256      │ 8a2559ebab3459fa5b7ad0b5e3365acf3554463f6616410e35fe07a71a83edb6
MD5         │ 449363d085861c87b8cdec69d5f5177a
BLAKE2b-256 │ 526268f755f080bb1c0becbf2945df4fe7e63758104fdef23d9ff5597c99a802


Provenance

The following attestation bundles were made for dartlab-0.6.0.tar.gz:

Publisher: publish.yml on eddmpython/dartlab

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dartlab-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: dartlab-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 14.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dartlab-0.6.0-py3-none-any.whl
Algorithm   │ Hash digest
SHA256      │ 8334a2bcd96097ecee14b1ce396bc2db45d4b7ed2512dd8bef6f1a78101471a0
MD5         │ 49d5d08c399fcba52e164914dccc1086
BLAKE2b-256 │ e76c788654bd48b3bc612243e09cca959acc1bdf8892e0f987e690e89dc732bf


Provenance

The following attestation bundles were made for dartlab-0.6.0-py3-none-any.whl:

Publisher: publish.yml on eddmpython/dartlab

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
