Open-source deterministic SEC filing analytics: parse, metrics, diff, score
Extract sections, measure tone and boilerplate, detect year-over-year changes, and screen peers.
Deterministic, versioned JSON. No LLM required.
What it is
Open-source, deterministic SEC filing analytics for 10-K, 10-Q, and 8-K HTML. Reproducible JSON scores from text metrics, boolean risk flags, and section diffs. Self-hosted CLI, Python SDK, HTTP API, and MCP.
What it is not
- Not investment advice or a trading signal
- Not a substitute for reading the filing
- Not composite LLM scoring (the open-source HTTP API is deterministic only; view=composite returns 402)
Full scope and limits: Evidence & limitations.
Why Disclosure Alpha
Comparing risk-factor and MD&A language across filings, or against a company's prior year, is slow manual work. Disclosure Alpha extracts SEC sections, runs reproducible text metrics and diffs, and returns sortable JSON scores you can wire into notebooks, screeners, or agents. The same deterministic engine powers every integration surface, with version strings in every response for reproducibility.
What you can do
Disclosure Alpha delivers deterministic scores (nine components, 0-100), section extraction from 10-K/10-Q/8-K HTML (see section taxonomy), year-over-year change detection, and four integration surfaces.
| Task | How |
|---|---|
| Score one company | disclosure-alpha score --ticker AAPL --fiscal-year 2025 --form 10-K |
| Screen up to 25 tickers | HTTP POST /v1/panel/disclosure-matrix |
| Compare year-over-year | --prior-html prior.html or HTTP compare=prior |
| Work offline (no EDGAR) | disclosure-alpha score --html filing.html --form 10-K |
| Inspect raw signals | disclosure-alpha metrics … or GET /disclosure-metrics |
| Pull boolean risk flags | GET /disclosure-flags |
| Debug section extraction | disclosure-alpha extract … or GET /sections |
# Screen a peer set (start disclosure-alpha-api first)
curl -s -X POST "http://localhost:8000/v1/panel/disclosure-matrix" \
-H "Content-Type: application/json" \
-d '{"tickers": ["AAPL", "MSFT", "GOOGL"], "fiscal_year": 2025, "form_type": "10-K"}'
# Year-over-year change from local HTML (no network required)
disclosure-alpha score --html current.html --form 10-K --prior-html prior.html
# Raw metrics without headline aggregation
disclosure-alpha metrics --ticker AAPL --fiscal-year 2025 --form 10-K
Copy-paste recipes: Workflows.
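The peer-screen curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_panel_request` and `fetch_panel` are hypothetical, the base URL assumes a local `disclosure-alpha-api` on port 8000, and the client-side 25-ticker check is an assumption that mirrors the limit in the task table rather than documented server behavior.

```python
import json
import urllib.request

MAX_TICKERS = 25  # panel limit stated in the task table


def build_panel_request(tickers, fiscal_year, form_type="10-K",
                        base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the panel endpoint."""
    if not 1 <= len(tickers) <= MAX_TICKERS:
        raise ValueError(f"expected 1-{MAX_TICKERS} tickers, got {len(tickers)}")
    payload = {
        "tickers": list(tickers),
        "fiscal_year": fiscal_year,
        "form_type": form_type,
    }
    return urllib.request.Request(
        f"{base_url}/v1/panel/disclosure-matrix",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_panel(request, timeout=30):
    """Send the request and decode the JSON body (needs a running API server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Inspect the request without touching the network.
req = build_panel_request(["AAPL", "MSFT", "GOOGL"], 2025)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any call reaches the server.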
How it works
The same pipeline powers every integration surface.
flowchart TB
ingest["Ingest (HTML or EDGAR)"]
extract["extract_sections_from_html()"]
metrics["compute_section_metrics()"]
aggregate["aggregate_deterministic_matrix()"]
output["ScoreResult JSON"]
ingest --> extract
extract --> metrics
metrics --> aggregate
aggregate --> output
subgraph deterministic ["Deterministic stage"]
metrics
end
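To make the staged design concrete, here is a toy sketch of an extract → metrics → aggregate chain that always serializes to the same JSON for the same input. The `toy_*` functions are hypothetical stand-ins written for this illustration; they are not the library's `extract_sections_from_html()`, `compute_section_metrics()`, or `aggregate_deterministic_matrix()`, whose signatures are not shown here.

```python
import json
import re


def toy_extract_sections(html):
    """Stand-in for the extract stage: strip tags, split on 'Item N.' headings."""
    text = re.sub(r"<[^>]+>", " ", html)
    # A capture group makes re.split return [preamble, heading, body, heading, body, ...]
    parts = re.split(r"(Item\s+\d+[A-Z]?\.)", text)
    return {parts[i].strip(): " ".join(parts[i + 1].split())
            for i in range(1, len(parts) - 1, 2)}


def toy_section_metrics(sections):
    """Stand-in for the metrics stage: word counts only."""
    return {name: {"word_count": len(body.split())}
            for name, body in sections.items()}


def toy_aggregate(metrics):
    """Stand-in for the aggregate stage: total the per-section counts."""
    total = sum(m["word_count"] for m in metrics.values())
    return {"total_words": total, "sections": metrics}


def run_pipeline(html):
    result = toy_aggregate(toy_section_metrics(toy_extract_sections(html)))
    # sort_keys keeps the serialized output byte-identical across runs
    return json.dumps(result, sort_keys=True)


SAMPLE = "<p>Item 1A. Risk factors may vary.</p><p>Item 7. Demand was stable this year.</p>"
print(run_pipeline(SAMPLE))
```

Each stage is a pure function of its input, which is the property that lets every integration surface return the same result for the same filing.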
Score signals
Nine weighted components (0-100; higher = more disclosure risk) feed the headline overall_disclosure_risk_score:
| Signal | What it captures |
|---|---|
| Risk-factor intensity | Negative and uncertainty tone in Item 1A |
| Disclosure change | Year-over-year language shift vs prior filing |
| MD&A uncertainty | Demand stress and margin pressure in MD&A |
| Legal / regulatory risk | Investigation and litigation language + flags |
| Liquidity stress | Covenant and cash-flow stress signals |
| Boilerplate | Vague, templated risk language |
| Internal controls | Weakness signals in controls disclosures |
| Event severity | Material changes in risk text (diff-only) |
| Tone negativity | Cross-section negative language |
Scale: 0-25 low concern · 26-50 moderate · 51-75 elevated · 76-100 high.
specificity_quality_score is also returned. It runs the other way (higher = more specific) and is excluded from headline weights. Full field guide: Understanding scores.
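If you want the band label alongside a numeric risk score, a small helper like the one below will do. `risk_band` is a hypothetical name, and the handling of fractional scores that fall between the published bands (for example 25.34) is an assumption: this sketch places them in the higher band. It is meant for risk scores only, not for specificity_quality_score, which runs in the opposite direction.

```python
def risk_band(score):
    """Map a 0-100 risk score to the band labels from the scale above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Assumption: values between integer band edges go to the higher band.
    if score <= 25:
        return "low concern"
    if score <= 50:
        return "moderate"
    if score <= 75:
        return "elevated"
    return "high"


for value in (17.84, 42.53, 76):
    print(value, risk_band(value))
```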
Who it's for
| You are… | Start with… |
|---|---|
| Researcher / notebook user | CLI or Python SDK |
| Building a screener or dashboard | HTTP API + Panel |
| Wiring Cursor / Claude | MCP Analyst |
| Custom agent pipeline | MCP Builder |
Not sure? See Choose your surface.
Quick start
Requires Python 3.11+.
1. Install from PyPI
pip install "disclosure-alpha[dev]"
For HTTP API and MCP: pip install "disclosure-alpha[api,mcp,dev]". Full install options: Installation.
2. Set your SEC User-Agent
export SEC_USER_AGENT="YourName your@email.com"
Required for ticker/EDGAR commands. See SEC EDGAR setup.
3. Score a filing
disclosure-alpha score --ticker AAPL --fiscal-year 2025 --form 10-K \
| jq '.scores.overall_disclosure_risk_score'
from disclosure_alpha import score_filing_ticker
result = score_filing_ticker("AAPL", 2025, form_type="10-K")
print(result.scores.overall_disclosure_risk_score)
Integrate your way
| Surface | Entry | Granularity |
|---|---|---|
| CLI | disclosure-alpha | extract → metrics → score (stepwise or full pipeline) |
| Python | import disclosure_alpha | Same pipeline as CLI; compose in notebooks |
| HTTP API | disclosure-alpha-api | 8 endpoints: filings, sections, metrics, matrix, flags, changes, panel |
| MCP Analyst | disclosure-alpha-mcp-analyst | Ticker discovery + score (2 tools) |
| MCP Builder | disclosure-alpha-mcp-builder | Full pipeline as 5 composable tools |
HTTP matrix tiers: tier=lite (headline score), tier=standard (components + metrics), tier=analyst (provenance for audit).
# Single-ticker dashboard headline (start disclosure-alpha-api first)
curl "http://localhost:8000/v1/company/AAPL/disclosure-matrix?fiscal_year=2025&form_type=10-K&tier=lite"
disclosure-alpha-api # HTTP on :8000
disclosure-alpha-mcp-analyst # MCP for Cursor / Claude Desktop
Endpoint map, Postman collections (docs/postman/), and MCP tool reference: Guides.
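When building dashboards it can help to generate the single-ticker matrix URL rather than hand-edit query strings. A minimal sketch follows; `matrix_url` is a hypothetical helper, the base URL assumes a local server on port 8000, and the client-side tier check simply mirrors the three tier names listed above.

```python
from urllib.parse import quote, urlencode

TIERS = ("lite", "standard", "analyst")  # tier names listed above


def matrix_url(ticker, fiscal_year, form_type="10-K", tier="lite",
               base_url="http://localhost:8000"):
    """Build the single-ticker disclosure-matrix URL for a given tier."""
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {TIERS}, got {tier!r}")
    query = urlencode({"fiscal_year": fiscal_year,
                       "form_type": form_type,
                       "tier": tier})
    return f"{base_url}/v1/company/{quote(ticker, safe='')}/disclosure-matrix?{query}"


for tier in TIERS:
    print(matrix_url("AAPL", 2025, tier=tier))
```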
MCP in Cursor
Add to your MCP settings (Analyst bundle; requires pip install "disclosure-alpha[mcp,dev]"):
{
"mcpServers": {
"disclosure-alpha": {
"command": "disclosure-alpha-mcp-analyst",
"env": {
"SEC_USER_AGENT": "YourName your@email.com"
}
}
}
}
Full MCP guide: MCP (Builder bundle for raw HTML pipelines).
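If you already have other MCP servers configured, you may prefer to merge the entry above into your existing settings instead of pasting it by hand. The sketch below works on plain dicts; `add_mcp_server` is a hypothetical helper, the `other-server` entry is made up for illustration, and reading or writing the settings file is left to you because its location depends on your client.

```python
import json


def add_mcp_server(settings, user_agent, name="disclosure-alpha",
                   command="disclosure-alpha-mcp-analyst"):
    """Return a copy of an MCP settings dict with the server entry added or replaced."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "env": {"SEC_USER_AGENT": user_agent}}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing settings with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other-cmd"}}}
print(json.dumps(add_mcp_server(existing, "YourName your@email.com"), indent=2))
```

Passing `command="disclosure-alpha-mcp-builder"` would register the Builder bundle instead.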
Research-backed
Validated on ~425 S&P 500 FY2025 10-Ks (~84% of the index):
| Check | Result |
|---|---|
| Language quality | Boilerplate and specificity scores correlate with independent text measures (Spearman ρ ~0.68 / ~0.84) |
| Real-world signal | Higher disclosure risk scores associate with higher 90-day post-filing volatility in the same cohort |
Metrics draw on finance text-analysis literature (Loughran-McDonald tone proxies, boilerplate and specificity measures). See Research foundation.
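For readers less familiar with the statistic quoted in the table, Spearman's ρ is the Pearson correlation of the ranks of two series, so it measures how consistently one measure rises with the other. The pure-Python sketch below is illustrative only: the input lists are toy numbers, not the validation data, and this is not the project's evaluation code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman rank correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy inputs: two adjacent pairs swapped out of five observations.
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```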
Research tool, not investment advice. Read the underlying filings. Full scope and limits: Evidence & limitations.
Example output
See Understanding scores for field definitions.
Single filing score (synthetic 10-K):
{
"scores": {
"overall_disclosure_risk_score": 17.84,
"score_coverage_ratio": 0.7778,
"components": {
"risk_factor_intensity_score": 8.62,
"boilerplate_risk_score": 42.53,
"legal_regulatory_risk_score": 25.34
}
}
}
More examples (YoY change, panel screener): docs/examples/ and Workflows.
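A common first step with the JSON above is to rank components so the largest contributors surface first. The sketch below parses the synthetic example as shown; `rank_components` is a hypothetical helper, and it assumes only the `scores.components` layout visible in that sample.

```python
import json

# The synthetic example output from above, embedded as text.
SAMPLE = """
{
  "scores": {
    "overall_disclosure_risk_score": 17.84,
    "score_coverage_ratio": 0.7778,
    "components": {
      "risk_factor_intensity_score": 8.62,
      "boilerplate_risk_score": 42.53,
      "legal_regulatory_risk_score": 25.34
    }
  }
}
"""


def rank_components(score_json):
    """Return (component name, value) pairs sorted from highest to lowest."""
    components = json.loads(score_json)["scores"]["components"]
    return sorted(components.items(), key=lambda item: item[1], reverse=True)


for name, value in rank_components(SAMPLE):
    print(f"{name:32s} {value:6.2f}")
```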
Documentation
| I want to… | Start here |
|---|---|
| Copy-paste recipes | Workflows |
| Interpret scores | Understanding scores |
| Score from terminal | Quickstart CLI |
| Build a screener | HTTP guides |
| Wire an agent | MCP guide |
| See methodology | Methodology overview |
License
Apache-2.0. See LICENSE.
Contributors
See CONTRIBUTING.md for development setup, tests, and docs build.