SEC-Analyzer
Extract structured data from SEC filings using LLM + Pydantic presets.
Turn any SEC filing (10-K, 10-Q, 20-F, DEF 14A, ...) into structured JSON — define a Pydantic model, and the library does the rest.
Installation · Quick Start · Custom Presets · API Reference · CLI
Why This Library?
SEC filings contain invaluable data — supply chains, revenue concentration, executive compensation, risk factors — but every filing has a different format. Traditional parsing breaks constantly.
This library uses LLM structured output (Gemini) to extract exactly the data you define in a Pydantic model. The LLM reads the filing and fills in your schema. No regex, no HTML parsing, no breakage.
```python
from sec_analyzer import extract
from sec_analyzer.presets import SupplyChain

result = extract("NVDA", preset=SupplyChain)
print(result["data"]["suppliers"])
# [{'entity': 'Taiwan Semiconductor Manufacturing Company Limited',
#   'relationship': 'foundry for semiconductor wafers',
#   'context': 'We utilize foundries, such as TSMC and Samsung...'}, ...]
```
Installation
```bash
pip install sec-analyzer
```
Requires Python 3.10+ and a Google AI API key.
Quick Start
1. Set your API key
```bash
export GOOGLE_API_KEY="your-key-here"
export EDGAR_IDENTITY="YourApp/1.0 your@email.com"
```

Or create a `.env` file:

```
GOOGLE_API_KEY=your-key-here
EDGAR_IDENTITY=YourApp/1.0 your@email.com
```
2. Extract data
```python
from sec_analyzer import extract
from sec_analyzer.presets import SupplyChain

# Latest 10-K
result = extract("NVDA", preset=SupplyChain)

# Specific form
result = extract("TSM", preset=SupplyChain, form="20-F")

# Specific filing date
result = extract("AAPL", preset=SupplyChain, filing_date="2025-10-30")
```
3. Use the result
```python
filing = result["filing"]
# {'form': '10-K', 'filing_date': '2026-02-25', 'accession_number': '...', 'filing_url': '...'}

data = result["data"]
print(f"Suppliers: {len(data['suppliers'])}")
print(f"Customers: {len(data['customers'])}")
print(f"Single-source deps: {len(data['single_source_dependencies'])}")
```
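Because the result is a plain dict, it can be serialized straight to disk for caching or downstream analysis. A minimal sketch, using hypothetical sample data in place of a live `extract()` call:

```python
import json

# Hypothetical sample shaped like the dict returned by extract()
result = {
    "filing": {"form": "10-K", "filing_date": "2026-02-25"},
    "data": {
        "suppliers": [{"entity": "TSMC", "relationship": "foundry"}],
        "customers": [],
        "single_source_dependencies": [],
    },
}

# Persist the whole result; it round-trips cleanly because it is plain dicts/lists
with open("nvda_supply_chain.json", "w") as f:
    json.dump(result, f, indent=2)

# Quick count of every list-valued category
summary = {k: len(v) for k, v in result["data"].items() if isinstance(v, list)}
print(summary)  # {'suppliers': 1, 'customers': 0, 'single_source_dependencies': 0}
```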
Custom Presets
The real power: define your own Pydantic model to extract anything.
Basic custom preset
```python
from pydantic import BaseModel, Field
from sec_analyzer import extract

class RiskFactors(BaseModel):
    regulatory_risks: list[dict] = Field(
        default_factory=list,
        description="Government regulations that could impact the business",
    )
    litigation: list[dict] = Field(
        default_factory=list,
        description="Pending lawsuits and legal proceedings",
    )
    cybersecurity_risks: list[dict] = Field(
        default_factory=list,
        description="Data breach and cybersecurity threats",
    )

result = extract("META", preset=RiskFactors)
```
When no `__prompt__` is defined, the library auto-generates a prompt from your field descriptions.
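The auto-generated prompt is internal to the library, but conceptually it stitches your field names and descriptions into extraction instructions. A rough, stdlib-only approximation (the template wording here is invented, not the library's actual prompt):

```python
# Hypothetical field descriptions, as they would come from a Pydantic model
fields = {
    "regulatory_risks": "Government regulations that could impact the business",
    "litigation": "Pending lawsuits and legal proceedings",
}

def build_prompt(company_name: str, fields: dict[str, str]) -> str:
    # One bullet per schema field, with its description as extraction guidance
    lines = [f"- {name}: {desc}" for name, desc in fields.items()]
    return (
        f"Extract the following from the {company_name} filing:\n"
        + "\n".join(lines)
    )

prompt = build_prompt("META", fields)
print(prompt)
```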
Advanced: custom prompt
For full control over the prompt, add a `__prompt__` class variable:
```python
from typing import ClassVar

from pydantic import BaseModel, Field
from sec_analyzer import extract

class ExecutiveComp(BaseModel):
    __prompt__: ClassVar[str] = """\
You are analyzing a DEF 14A proxy statement for {company_name}.

Extract executive compensation data from the Summary Compensation Table
and related disclosure sections.

Rules:
1. Include only Named Executive Officers (NEOs)
2. All dollar amounts in exact figures from the filing
3. Include stock awards, option awards, and non-equity incentive plan separately

Filing text:
{filing_text}
"""

    executives: list[dict] = Field(description="NEO compensation details")
    equity_awards: list[dict] = Field(description="Stock and option grant details")

result = extract("AAPL", preset=ExecutiveComp, form="DEF 14A")
```
The `{company_name}` and `{filing_text}` placeholders are filled automatically.
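The substitution itself is plain Python formatting; assuming the library uses `str.format`-style templates (which the `{...}` syntax suggests), the fill step looks like this, shown with dummy values:

```python
template = """\
You are analyzing a DEF 14A proxy statement for {company_name}.

Filing text:
{filing_text}
"""

# The library supplies these values at extraction time; dummy data here
prompt = template.format(company_name="Apple Inc.", filing_text="(filing markdown here)")
print(prompt)
```

One practical consequence of `str.format` templates: any literal braces in a custom prompt (e.g. an inline JSON example) would need doubling as `{{` and `}}`.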
Built-in Presets
SupplyChain
Extracts 11 categories of supply chain intelligence from 10-K/10-Q/20-F filings:
| Category | Description |
|---|---|
| `suppliers` | Companies supplying products/materials/services |
| `customers` | Companies purchasing products/services |
| `single_source_dependencies` | Components with sole-source suppliers |
| `geographic_concentration` | Manufacturing/sourcing location concentration |
| `capacity_constraints` | Production limitations and lead times |
| `supply_chain_risks` | Disruption risks (tariffs, shortages, geopolitical) |
| `revenue_concentration` | Customer/segment revenue % from Notes |
| `geographic_revenue` | Revenue by country/region from Notes |
| `purchase_obligations` | Commitments and take-or-pay contracts |
| `market_risk_disclosures` | Commodity/FX/interest rate exposures (Item 7A) |
| `inventory_composition` | Raw materials/WIP/finished goods breakdown |
API Reference
```python
extract(symbol, preset, form="10-K", filing_date=None, max_chars=2_000_000, api_key=None, model=None)
```
| Parameter | Type | Description |
|---|---|---|
| `symbol` | `str` | Ticker symbol (e.g., `"NVDA"`) |
| `preset` | `BaseModel` class | Pydantic model defining the extraction schema |
| `form` | `str` | Filing type. Auto-fallback 10-K → 20-F |
| `filing_date` | `str` | Specific date (`YYYY-MM-DD`). `None` = latest |
| `max_chars` | `int` | Max filing markdown length |
| `api_key` | `str` | Google API key (fallback: `GOOGLE_API_KEY` env) |
| `model` | `str` | Gemini model (fallback: `GOOGLE_MODEL` env, default: `gemini-2.5-flash`) |
Returns `{"filing": {...}, "data": {...}}`.
CLI
```bash
# Supply chain extraction (default)
sec-analyzer NVDA

# Specific form
sec-analyzer TSM --form 20-F

# Compact JSON
sec-analyzer NVDA --json

# Specific filing date
sec-analyzer AAPL --filing-date 2025-10-30
```
How It Works
1. edgartools finds the filing on SEC EDGAR
2. Filing converted to markdown (tables preserved)
3. Full markdown + Pydantic schema sent to Gemini
4. Gemini returns structured JSON matching the schema
5. Pydantic validates and returns typed data
The key insight: Gemini's structured output mode forces the response to match your Pydantic schema exactly. No post-processing, no regex, no parsing.
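The final validation step can be illustrated without an API call: once structured output guarantees the response matches the schema, parsing reduces to `json.loads` plus a shape check. A toy sketch with a hand-written "model response" standing in for Gemini (no network involved):

```python
import json

# Keys the Pydantic schema would require
schema_keys = {"suppliers", "customers"}

# Stand-in for a structured-output response from the model
response_text = '{"suppliers": [{"entity": "TSMC"}], "customers": []}'

data = json.loads(response_text)
assert set(data) == schema_keys  # structured output guarantees this shape
print(data["suppliers"][0]["entity"])  # TSMC
```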
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `GOOGLE_API_KEY` | Yes | - | Google AI API key |
| `EDGAR_IDENTITY` | No | `SECAnalyzer/1.0 user@example.com` | SEC EDGAR User-Agent |
| `GOOGLE_MODEL` | No | `gemini-2.5-flash` | Gemini model ID |
Disclaimer
This project is not affiliated with the SEC, EDGAR, or Google. Filing data comes from SEC EDGAR (public). LLM extraction may contain errors — always verify critical data against the original filing.
This tool is for research and educational purposes only. It is not financial advice.
License
MIT