Open-source deterministic SEC filing analytics: parse, metrics, diff, score

Disclosure Alpha: Turn SEC filing language into reproducible risk scores

Extract sections, measure tone and boilerplate, detect year-over-year changes, and screen peers.
Deterministic, versioned JSON. No LLM required.

What it is

Open-source, deterministic SEC filing analytics for 10-K, 10-Q, and 8-K HTML. Reproducible JSON scores from text metrics, boolean risk flags, and section diffs. Self-hosted CLI, Python SDK, HTTP API, and MCP.

What it is not

  • Not investment advice or a trading signal
  • Not a substitute for reading the filing
  • Not composite LLM scoring (the open-source HTTP API is deterministic only; a request with view=composite returns HTTP 402)

Full scope and limits: Evidence & limitations.

Why Disclosure Alpha

Comparing risk-factor and MD&A language across filings, or against a company's prior year, is slow manual work. Disclosure Alpha extracts SEC sections, runs reproducible text metrics and diffs, and returns sortable JSON scores you can wire into notebooks, screeners, or agents. The same deterministic engine powers every integration surface, with version strings in every response for reproducibility.

What you can do

Disclosure Alpha delivers deterministic scores (nine components, each 0-100), section extraction from 10-K/10-Q/8-K HTML (see the section taxonomy), year-over-year change detection, and four integration surfaces.

| Task | How |
| --- | --- |
| Score one company | disclosure-alpha score --ticker AAPL --fiscal-year 2025 --form 10-K |
| Screen up to 25 tickers | HTTP POST /v1/panel/disclosure-matrix |
| Compare year-over-year | --prior-html prior.html or HTTP compare=prior |
| Work offline (no EDGAR) | disclosure-alpha score --html filing.html --form 10-K |
| Inspect raw signals | disclosure-alpha metrics … or GET /disclosure-metrics |
| Pull boolean risk flags | GET /disclosure-flags |
| Debug section extraction | disclosure-alpha extract … or GET /sections |

# Screen a peer set (start disclosure-alpha-api first)
curl -s -X POST "http://localhost:8000/v1/panel/disclosure-matrix" \
  -H "Content-Type: application/json" \
  -d '{"tickers": ["AAPL", "MSFT", "GOOGL"], "fiscal_year": 2025, "form_type": "10-K"}'

# Year-over-year change from local HTML (no network required)
disclosure-alpha score --html current.html --form 10-K --prior-html prior.html

# Raw metrics without headline aggregation
disclosure-alpha metrics --ticker AAPL --fiscal-year 2025 --form 10-K

Copy-paste recipes: Workflows.
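
The peer-screen request above can also be prepared from Python. This is a minimal sketch using only the standard library: `build_panel_request` is a hypothetical helper name, the base URL assumes the local `disclosure-alpha-api` address used in the curl example, and the 25-ticker cap is taken from the task table rather than from the API reference.

```python
import json
import urllib.request

MAX_TICKERS = 25  # cap stated in the task table; the server may enforce it differently


def build_panel_request(tickers, fiscal_year, form_type="10-K",
                        base_url="http://localhost:8000"):
    """Build (but do not send) the POST request for the panel screener."""
    if not tickers:
        raise ValueError("at least one ticker is required")
    if len(tickers) > MAX_TICKERS:
        raise ValueError(f"at most {MAX_TICKERS} tickers per request")
    payload = {
        "tickers": [t.upper() for t in tickers],
        "fiscal_year": fiscal_year,
        "form_type": form_type,
    }
    return urllib.request.Request(
        f"{base_url}/v1/panel/disclosure-matrix",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_panel_request(["AAPL", "MSFT", "GOOGL"], 2025)
# To send it once the API is running:
#   with urllib.request.urlopen(req) as resp:
#       panel = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.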

How it works

Same pipeline powers every integration surface.

flowchart TB
  ingest["Ingest (HTML or EDGAR)"]
  extract["extract_sections_from_html()"]
  metrics["compute_section_metrics()"]
  aggregate["aggregate_deterministic_matrix()"]
  output["ScoreResult JSON"]

  ingest --> extract
  extract --> metrics
  metrics --> aggregate
  aggregate --> output

  subgraph deterministic ["Deterministic stage"]
    metrics
  end
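
The four stages can be pictured as plain function composition. The sketch below is a toy stand-in, not the library's implementation: `toy_extract`, `toy_metrics`, and `toy_aggregate` are hypothetical names, the heading pattern and the three-word negative list are placeholders, and the 0-100 scaling is arbitrary. It only illustrates why a pipeline of pure functions yields identical JSON for identical input.

```python
import json
import re

NEGATIVE_WORDS = {"loss", "adverse", "litigation"}  # placeholder list, not Loughran-McDonald


def toy_extract(text):
    """Split plain text into sections keyed by 'Item <id>' heading lines."""
    parts = re.split(r"(?m)^(Item \d+[A-Z]?)\.?[ \t]*$", text)
    # With one capture group, re.split yields [preamble, heading, body, heading, body, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def toy_metrics(sections):
    """Per-section word count and share of words on the negative list."""
    out = {}
    for name, body in sections.items():
        words = re.findall(r"[a-z]+", body.lower())
        hits = sum(w in NEGATIVE_WORDS for w in words)
        out[name] = {
            "words": len(words),
            "negative_ratio": hits / len(words) if words else 0.0,
        }
    return out


def toy_aggregate(metrics):
    """Scale the mean negative ratio to 0-100 (arbitrary scaling for illustration)."""
    ratios = [m["negative_ratio"] for m in metrics.values()]
    score = round(100 * sum(ratios) / len(ratios), 2) if ratios else 0.0
    return {"scores": {"toy_score": score}, "sections": metrics}


def toy_pipeline(text):
    """Run all stages and serialize with sorted keys so output is byte-stable."""
    return json.dumps(toy_aggregate(toy_metrics(toy_extract(text))), sort_keys=True)


sample = "Item 1A\nLitigation may cause a loss.\nItem 7\nRevenue grew.\n"
first, second = toy_pipeline(sample), toy_pipeline(sample)
```

Because no stage reads the clock, the network, or a random source, `first` and `second` are the same string.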

Score signals

Nine weighted components (0-100; higher = more disclosure risk) feed the headline overall_disclosure_risk_score:

| Signal | What it captures |
| --- | --- |
| Risk-factor intensity | Negative and uncertainty tone in Item 1A |
| Disclosure change | Year-over-year language shift vs prior filing |
| MD&A uncertainty | Demand stress and margin pressure in MD&A |
| Legal / regulatory risk | Investigation and litigation language + flags |
| Liquidity stress | Covenant and cash-flow stress signals |
| Boilerplate | Vague, templated risk language |
| Internal controls | Weakness signals in controls disclosures |
| Event severity | Material changes in risk text (diff-only) |
| Tone negativity | Cross-section negative language |
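
This README does not state the component weights or the exact aggregation rule, so the following is only a sketch of how a weighted headline could be formed from whichever components are available. Equal default weights, renormalisation over available components, and the coverage definition (available divided by total) are all assumptions, and `weighted_headline` is a hypothetical helper. The input values are made up for illustration.

```python
def weighted_headline(components, weights=None, total_components=9):
    """Weighted mean of available component scores plus a coverage ratio.

    components: name -> score in [0, 100], or None when a component is unavailable.
    weights: name -> non-negative weight; defaults to equal weights (an assumption).
    """
    available = {k: v for k, v in components.items() if v is not None}
    if not available:
        return None, 0.0
    weights = weights or {k: 1.0 for k in available}
    total_weight = sum(weights[k] for k in available)
    if total_weight == 0:
        return None, 0.0
    headline = sum(weights[k] * v for k, v in available.items()) / total_weight
    coverage = len(available) / total_components
    return round(headline, 2), round(coverage, 4)


# Made-up inputs; None marks a component that could not be computed.
example = {
    "risk_factor_intensity_score": 10.0,
    "boilerplate_risk_score": 40.0,
    "legal_regulatory_risk_score": None,
}
headline, coverage = weighted_headline(example)
```

Renormalising over available components keeps the headline on the 0-100 scale when some inputs are missing, and the coverage ratio records how much of the full component set contributed.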

Scale: 0-25 low concern · 26-50 moderate · 51-75 elevated · 76-100 high. Higher = more disclosure risk, except specificity_quality_score (higher = more specific).

specificity_quality_score is also returned but is excluded from headline weights. Full field guide: Understanding scores.
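
The scale can be applied in code with a small lookup. This is a sketch with a hypothetical helper name, `risk_band`. It assumes each band's upper bound is inclusive, so a fractional score such as 25.5 falls into the next band up; the README gives only integer boundaries.

```python
def risk_band(score):
    """Map a 0-100 risk score to one of the four bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 25:
        return "low concern"
    if score <= 50:
        return "moderate"
    if score <= 75:
        return "elevated"
    return "high"


labels = [risk_band(s) for s in (17.84, 42.53, 75, 76)]
```

Do not pass specificity_quality_score through this helper, since its direction is inverted relative to the risk scores.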

Who it's for

| You are… | Start with… |
| --- | --- |
| Researcher / notebook user | CLI or Python SDK |
| Building a screener or dashboard | HTTP API + Panel |
| Wiring Cursor / Claude | MCP Analyst |
| Custom agent pipeline | MCP Builder |

Not sure? See Choose your surface.

Quick start

Requires Python 3.11+.

1. Install from PyPI

pip install "disclosure-alpha[dev]"

For HTTP API and MCP: pip install "disclosure-alpha[api,mcp,dev]". Full install options: Installation.

2. Set your SEC User-Agent

export SEC_USER_AGENT="YourName your@email.com"

Required for ticker/EDGAR commands. See SEC EDGAR setup.
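
A script can fail fast when the variable is missing instead of waiting for an EDGAR request to be rejected. The check below is a sketch: `check_sec_user_agent` is a hypothetical helper, and the rule that the value must contain an email-like token is an assumption based on the `YourName your@email.com` pattern above, not the library's own validation.

```python
import os
import re

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_sec_user_agent(env=None):
    """Return the SEC_USER_AGENT value, or raise if it is missing or lacks an email."""
    env = os.environ if env is None else env
    value = env.get("SEC_USER_AGENT", "").strip()
    if not value:
        raise RuntimeError("SEC_USER_AGENT is not set")
    if not EMAIL_RE.search(value):
        raise RuntimeError("SEC_USER_AGENT should include a contact email")
    return value


# Passing a dict instead of os.environ makes the check easy to try out.
ua = check_sec_user_agent({"SEC_USER_AGENT": "YourName your@email.com"})
```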

3. Score a filing

# CLI
disclosure-alpha score --ticker AAPL --fiscal-year 2025 --form 10-K \
  | jq '.scores.overall_disclosure_risk_score'

# Python SDK
from disclosure_alpha import score_filing_ticker
result = score_filing_ticker("AAPL", 2025, form_type="10-K")
print(result.scores.overall_disclosure_risk_score)
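
To drive the CLI from a script, the command line can be assembled from the flags that appear in this README (`--ticker`, `--fiscal-year`, `--html`, `--prior-html`, `--form`). The helpers below are hypothetical and only a sketch. Treating `--ticker` and `--html` as mutually exclusive is an assumption, as is the CLI writing a single JSON document to stdout (suggested by the `jq` pipe above).

```python
import json
import subprocess


def build_score_argv(*, ticker=None, fiscal_year=None, html=None,
                     prior_html=None, form="10-K"):
    """Assemble a `disclosure-alpha score` command from flags shown in this README."""
    if (ticker is None) == (html is None):
        raise ValueError("pass exactly one of ticker or html")
    argv = ["disclosure-alpha", "score"]
    if ticker is not None:
        if fiscal_year is None:
            raise ValueError("fiscal_year is required with ticker")
        argv += ["--ticker", ticker, "--fiscal-year", str(fiscal_year)]
    else:
        argv += ["--html", str(html)]
    argv += ["--form", form]
    if prior_html is not None:
        argv += ["--prior-html", str(prior_html)]
    return argv


def run_score(argv):
    """Run the CLI and parse its stdout as JSON (requires the package to be installed)."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)


online = build_score_argv(ticker="AAPL", fiscal_year=2025)
offline = build_score_argv(html="current.html", prior_html="prior.html")
```

Passing the command as a list avoids shell quoting issues with file paths.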

Integrate your way

| Surface | Entry | Granularity |
| --- | --- | --- |
| CLI | disclosure-alpha | extract / metrics / score (stepwise or full pipeline) |
| Python | import disclosure_alpha | Same pipeline as CLI; compose in notebooks |
| HTTP API | disclosure-alpha-api | 8 endpoints: filings, sections, metrics, matrix, flags, changes, panel |
| MCP Analyst | disclosure-alpha-mcp-analyst | Ticker discovery + score (2 tools) |
| MCP Builder | disclosure-alpha-mcp-builder | Full pipeline as 5 composable tools |

HTTP matrix tiers: tier=lite (headline score), tier=standard (components + metrics), tier=analyst (provenance for audit).

# Single-ticker dashboard headline (start disclosure-alpha-api first)
curl "http://localhost:8000/v1/company/AAPL/disclosure-matrix?fiscal_year=2025&form_type=10-K&tier=lite"

disclosure-alpha-api              # HTTP on :8000
disclosure-alpha-mcp-analyst      # MCP for Cursor / Claude Desktop

Endpoint map, Postman collections (docs/postman/), and MCP tool reference: Guides.
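
The single-ticker matrix URL and its tier parameter can be built programmatically. This sketch uses only the standard library: `matrix_url` is a hypothetical helper, the path and query names are copied from the curl example above, and the base URL assumes the local server on port 8000.

```python
from urllib.parse import quote, urlencode

TIERS = ("lite", "standard", "analyst")  # tiers named in this README


def matrix_url(ticker, fiscal_year, form_type="10-K", tier="lite",
               base_url="http://localhost:8000"):
    """Return the GET URL for one company's disclosure matrix."""
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {TIERS}")
    query = urlencode({"fiscal_year": fiscal_year, "form_type": form_type, "tier": tier})
    return f"{base_url}/v1/company/{quote(ticker.upper())}/disclosure-matrix?{query}"


url = matrix_url("AAPL", 2025)
```

Rejecting unknown tiers locally gives a clearer error than a failed request would.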

MCP in Cursor

Add to your MCP settings (Analyst bundle; requires pip install "disclosure-alpha[mcp,dev]"):

{
  "mcpServers": {
    "disclosure-alpha": {
      "command": "disclosure-alpha-mcp-analyst",
      "env": {
        "SEC_USER_AGENT": "YourName your@email.com"
      }
    }
  }
}

Full MCP guide: MCP. Use the Builder bundle for raw HTML pipelines.
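
If an MCP settings file already lists other servers, the entry can be merged in rather than pasted over them. The sketch below works on plain dictionaries and leaves file handling to the caller, because the settings file location depends on the client. `add_mcp_server` and the `other-tool` entry are hypothetical.

```python
import copy
import json


def add_mcp_server(settings, user_agent, command="disclosure-alpha-mcp-analyst",
                   name="disclosure-alpha"):
    """Return a copy of an MCP settings dict with the server entry added or replaced."""
    merged = copy.deepcopy(settings)
    merged.setdefault("mcpServers", {})[name] = {
        "command": command,
        "env": {"SEC_USER_AGENT": user_agent},
    }
    return merged


existing = {"mcpServers": {"other-tool": {"command": "other-tool-mcp"}}}
updated = add_mcp_server(existing, "YourName your@email.com")
print(json.dumps(updated, indent=2))
```

Passing `command="disclosure-alpha-mcp-builder"` would register the Builder bundle in the same way.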

Research-backed

Validated on ~425 S&P 500 FY2025 10-Ks (~84% of the index):

| Check | Result |
| --- | --- |
| Language quality | Boilerplate and specificity scores correlate with independent text measures (Spearman ρ ~0.68 / ~0.84) |
| Real-world signal | Higher disclosure risk scores associate with higher 90-day post-filing volatility in the same cohort |

Metrics draw on finance text-analysis literature (Loughran-McDonald tone proxies, boilerplate and specificity measures). See Research foundation.
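
For readers unfamiliar with the statistic quoted above, Spearman's ρ is the Pearson correlation of the two rank vectors, so it measures whether two measures order the filings similarly. The sketch below is a plain-Python implementation with average ranks for ties. The numbers are toy values, not the validation data, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Toy numbers, not the validation data.
tool_scores = [12.0, 35.5, 20.1, 48.0, 27.3]
independent = [0.10, 0.31, 0.25, 0.40, 0.22]
rho = spearman_rho(tool_scores, independent)
```

Because only ranks matter, the two measures do not need to share a scale or a linear relationship.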

Research tool, not investment advice. Read the underlying filings. Full scope and limits: Evidence & limitations.

Example output

See Understanding scores for field definitions.

Single filing score (synthetic 10-K):

{
  "scores": {
    "overall_disclosure_risk_score": 17.84,
    "score_coverage_ratio": 0.7778,
    "components": {
      "risk_factor_intensity_score": 8.62,
      "boilerplate_risk_score": 42.53,
      "legal_regulatory_risk_score": 25.34
    }
  }
}
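
A response shaped like the example above can be read with the standard `json` module. This sketch pulls the headline and orders the components from highest to lowest score. It uses only the fields shown here, and a real response may contain more.

```python
import json

raw = """
{
  "scores": {
    "overall_disclosure_risk_score": 17.84,
    "score_coverage_ratio": 0.7778,
    "components": {
      "risk_factor_intensity_score": 8.62,
      "boilerplate_risk_score": 42.53,
      "legal_regulatory_risk_score": 25.34
    }
  }
}
"""

scores = json.loads(raw)["scores"]
headline = scores["overall_disclosure_risk_score"]
# Components sorted from highest to lowest score.
ranked = sorted(scores["components"].items(), key=lambda kv: kv[1], reverse=True)
top_name, top_value = ranked[0]
```

A coverage ratio of 0.7778 would be consistent with seven of the nine components being populated, though that reading is an inference from the number and is not stated in this README.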

More examples (YoY change, panel screener): docs/examples/ and Workflows.

Documentation

| I want to… | Start here |
| --- | --- |
| Copy-paste recipes | Workflows |
| Interpret scores | Understanding scores |
| Score from terminal | Quickstart CLI |
| Build a screener | HTTP guides |
| Wire an agent | MCP guide |
| See methodology | Methodology overview |

License

Apache-2.0. See LICENSE.

Contributors

See CONTRIBUTING.md for development setup, tests, and docs build.
