Causal signal extraction from SEC filings using LLMs

sigint

CI Python 3.11+ License: MIT

sigint turns filing text into structured, timestamped trading and monitoring signals. The focus is not generic sentiment, but directional changes in risk language, supplier exposure, M&A patterns, and topic-specific management tone.


At a Glance

  • Async EDGAR ingestion with filing parsing and section extraction
  • LLM-assisted extraction engines for risk, supply chain, M&A, and tone
  • Timestamped signal schema designed for storage and backtesting
  • Supply-chain graph construction for second-order exposure analysis
  • Parquet, DuckDB, API, and webhook outputs for downstream workflows

Most quant funds already scrape SEC filings, and generic sentiment analysis on 10-K/10-Q text is a commoditized signal with little alpha left. sigint does something different: it extracts the causal, structural relationships buried in filings (supply chain dependencies, risk factor escalations, M&A language patterns, and topic-level management tone shifts) and compiles them into timestamped, backtestable signals.

Why This Exists

sigint targets the gap between "sentiment is positive" (useless) and "Company X just added 'supply chain concentration risk' to its 10-K for the first time, and its top supplier is Company Y, which reports next week" (actionable).

Cohen, Malloy, and Nguyen ("Lazy Prices", 2020) find that period-over-period changes in 10-K and 10-Q language predict future returns, and that changes in the risk factor section are especially informative. sigint operationalizes this insight.
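As a rough illustration of the "language change" idea, the sketch below scores how much a section changed between two filings using cosine and Jaccard similarity over word tokens, two of the document-similarity measures used in Lazy Prices. The function names and the tokenizer are assumptions made for this example and are not part of sigint's API.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(old: str, new: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    a, b = Counter(tokenize(old)), Counter(tokenize(new))
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    # Undefined when either text has no tokens; report 0.0 in that case.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(old: str, new: str) -> float:
    """Share of distinct words that appear in both texts."""
    a, b = set(tokenize(old)), set(tokenize(new))
    union = a | b
    # Undefined when both texts are empty; report 0.0 in that case.
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    prior = "We rely on a limited number of suppliers."
    current = "We rely on a single supplier located in Taiwan."
    print(cosine_similarity(prior, current), jaccard_similarity(prior, current))
```

A low similarity between consecutive filings flags a section worth passing to the more expensive LLM-based engines; the score alone says nothing about direction.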

Showcase

sigint supply-chain graph

Supply-chain dependency graph rendered from a small local signal set using the built-in graph utilities.
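To make "second-order exposure" concrete, here is a minimal sketch that walks a supplier graph breadth-first and reports how many hops each dependent company is from a disrupted supplier. It uses plain dictionaries rather than the project's graph utilities, and the edge list, function names, and return shape are all hypothetical.

```python
from collections import deque

# Hypothetical (customer, supplier) pairs, as an extraction run might produce.
EDGES = [
    ("AAPL", "TSMC"),
    ("NVDA", "TSMC"),
    ("DELL", "NVDA"),
    ("TSMC", "ASML"),
]


def build_dependents(edges):
    """Map each supplier to the set of companies that depend on it directly."""
    dependents = {}
    for customer, supplier in edges:
        dependents.setdefault(supplier, set()).add(customer)
    return dependents


def exposure(edges, supplier, max_hops=2):
    """Return {company: hops} for every company within max_hops of supplier."""
    dependents = build_dependents(edges)
    hops = {}
    queue = deque([(supplier, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the requested depth
        for customer in dependents.get(node, ()):
            if customer != supplier and customer not in hops:
                hops[customer] = dist + 1
                queue.append((customer, dist + 1))
    return hops


if __name__ == "__main__":
    print(exposure(EDGES, "TSMC"))
```

In this toy graph, DELL never names TSMC as a supplier but is still two hops away through NVDA, which is the kind of indirect exposure a flat supplier list would miss.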

Architecture

graph TD
    A[EDGAR API] -->|10-K, 10-Q, 8-K| B[Section Parser]
    B -->|Risk Factors, MD&A, Business| C{Extraction Engines}
    C --> D[Supply Chain Graph Builder]
    C --> E[Risk Factor Differ]
    C --> F[M&A Signal Detector]
    C --> G[Management Tone Analyzer]
    D --> H[Signal Compiler]
    E --> H
    F --> H
    G --> H
    H --> I[Parquet Export]
    H --> J[DuckDB Storage]
    H --> K[REST API]
    H --> L[Webhook Alerts]
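The fan-out and fan-in in the diagram (parsed sections go to several engines, whose outputs meet in a signal compiler) can be sketched with asyncio.gather. The engine functions below are placeholders invented for this example; the real engines call an LLM and their signatures may differ.

```python
import asyncio


async def supply_chain_engine(sections: dict[str, str]) -> list[dict]:
    # Placeholder: a real engine would send the Business section to an LLM.
    if "Business" not in sections:
        return []
    return [{"signal_type": "supply_chain", "source_section": "Business"}]


async def risk_differ_engine(sections: dict[str, str]) -> list[dict]:
    # Placeholder: a real engine would diff Item 1A against the prior filing.
    if "Risk Factors" not in sections:
        return []
    return [{"signal_type": "risk_change", "source_section": "Risk Factors"}]


async def compile_signals(sections, engines) -> list[dict]:
    """Run every engine concurrently and flatten the results into one list."""
    batches = await asyncio.gather(*(engine(sections) for engine in engines))
    return [signal for batch in batches for signal in batch]


if __name__ == "__main__":
    parsed = {"Business": "...", "Risk Factors": "..."}
    engines = [supply_chain_engine, risk_differ_engine]
    print(asyncio.run(compile_signals(parsed, engines)))
```

Because asyncio.gather returns results in the order the awaitables were passed, the compiled list is deterministic even though the engines run concurrently.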

Extraction Engines

| Engine | What It Does | Key Insight |
| --- | --- | --- |
| Supply Chain | Extracts supplier/customer/partner relationships into a knowledge graph | When TSMC has a disruption, know exactly which companies are exposed |
| Risk Differ | Diffs Item 1A between consecutive filings; classifies NEW, REMOVED, ESCALATED, DE_ESCALATED | Legal language changes are the strongest predictive signals (Lazy Prices) |
| M&A Detector | Identifies strategic-alternatives language, advisor engagements, cash positioning shifts | Certain filing patterns strongly precede M&A announcements |
| Tone Analyzer | Tracks topic-specific management tone across filings on a 6-point scale | Not "positive/negative" but "confident -> hedging" on specific topics |
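The Risk Differ's four labels can be illustrated without an LLM. The sketch below pairs each risk paragraph with its closest match in the prior filing using difflib, then compares a crude count of intensifier words to decide between ESCALATED and DE_ESCALATED. The word list, the 0.6 match threshold, and every name here are assumptions for illustration; the actual engine is described as LLM-assisted and will not work this way.

```python
import difflib
import re
from enum import Enum


class Change(str, Enum):
    NEW = "NEW"
    REMOVED = "REMOVED"
    ESCALATED = "ESCALATED"
    DE_ESCALATED = "DE_ESCALATED"


# Hypothetical vocabulary standing in for an LLM severity judgment.
INTENSIFIERS = {"material", "materially", "significant", "significantly", "severe"}


def intensity(text: str) -> int:
    """Count intensifier words in a paragraph."""
    return sum(word in INTENSIFIERS for word in re.findall(r"[a-z]+", text.lower()))


def similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a, b).ratio()


def diff_risk_factors(old: list[str], new: list[str], match_threshold: float = 0.6):
    """Classify risk paragraphs by comparing two consecutive filings."""
    changes = []
    unmatched_old = list(old)
    for para in new:
        # Closest remaining paragraph from the prior filing, if any.
        best = max(unmatched_old, key=lambda p: similarity(p, para), default=None)
        if best is None or similarity(best, para) < match_threshold:
            changes.append((Change.NEW, para))
            continue
        unmatched_old.remove(best)
        delta = intensity(para) - intensity(best)
        if delta > 0:
            changes.append((Change.ESCALATED, para))
        elif delta < 0:
            changes.append((Change.DE_ESCALATED, para))
        # delta == 0: matched with no change in intensity, so nothing is emitted.
    changes.extend((Change.REMOVED, para) for para in unmatched_old)
    return changes
```

Called with two lists of Item 1A paragraphs, diff_risk_factors returns (label, paragraph) pairs that could be mapped onto signals of type risk_change.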

Quick Start

Installation

# The release is published on PyPI as alphasig (0.1.0); the import name is sigint
pip install alphasig

Basic Usage

import asyncio
from sigint import Pipeline

async def main():
    pipeline = Pipeline(
        model="claude-sonnet-4-6",
        user_agent="Your Name your@email.com",
    )

    signals = await pipeline.extract(
        tickers=["AAPL", "MSFT", "GOOGL"],
        filing_types=["10-K", "10-Q"],
        lookback_years=3,
        engines=["supply_chain", "risk_differ", "m_and_a", "tone"],
    )

    # Filter high-conviction bearish signals
    bearish = signals.by_direction("bearish").above_strength(0.7)
    for sig in bearish:
        print(f"[{sig.ticker}] {sig.context}")

    # Build supply chain graph
    graph = signals.supply_chain_graph()
    exposure = graph.exposure("TSMC")
    print(f"Companies exposed to TSMC: {exposure['direct_dependents']}")

    # Export for backtesting
    signals.to_parquet("signals.parquet")

asyncio.run(main())

The public API is designed around Pipeline and SignalCollection, so the same extraction run can feed notebooks, alerting, or backtests without an adapter layer.

CLI

# Extract signals
sigint extract --tickers AAPL MSFT --lookback 3 --output signals.parquet

# Query stored signals
sigint query --ticker AAPL --type risk_change --min-strength 0.7

# Launch REST API
sigint serve --port 8080

REST API

curl "http://localhost:8080/signals?ticker=AAPL&min_strength=0.7"
curl http://localhost:8080/signals/summary
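The same endpoint can be queried from Python with the standard library. This is a minimal sketch that assumes the server started by sigint serve returns JSON; the helper names are hypothetical, and only the /signals path and the ticker and min_strength parameters come from the curl examples above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def signals_url(base_url: str, **filters) -> str:
    """Build a /signals URL, dropping filters that were left as None."""
    params = {key: value for key, value in filters.items() if value is not None}
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/signals"
    return f"{url}?{query}" if query else url


def fetch_signals(base_url: str, **filters):
    """GET the signals endpoint and decode the body (assumed to be JSON)."""
    with urlopen(signals_url(base_url, **filters), timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    print(signals_url("http://localhost:8080", ticker="AAPL", min_strength=0.7))
```

Building the query string with urlencode avoids the shell-quoting problem that a bare & in a curl command runs into.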

Configuration

sigint reads the Anthropic API key from the ANTHROPIC_API_KEY environment variable. EDGAR requires a User-Agent header that includes a contact email (SEC policy); supply it through the user_agent argument to Pipeline.

export ANTHROPIC_API_KEY="sk-ant-..."
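A small preflight check can catch both configuration problems before any network call is made. The sketch below is an assumption about how one might validate the setup; check_config and the loose email pattern are illustrative and not part of sigint.

```python
import os
import re

# Deliberately loose: something@something.tld anywhere in the string.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_config(user_agent: str, env=None) -> list[str]:
    """Return human-readable configuration problems (empty list if none)."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    if not EMAIL_RE.search(user_agent):
        problems.append("user_agent should include a contact email")
    return problems


if __name__ == "__main__":
    for problem in check_config("Your Name your@email.com"):
        print(f"config error: {problem}")
```

Passing env explicitly keeps the check easy to test without touching the real environment.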

Signal Schema

Every signal follows a universal schema for backtesting compatibility:

Signal(
    timestamp=datetime,          # Filing date (UTC)
    ticker="AAPL",               # Company ticker
    signal_type="risk_change",   # supply_chain | risk_change | m_and_a | tone_shift
    direction="bearish",         # bullish | bearish | neutral
    strength=0.85,               # 0.0 - 1.0
    confidence=0.92,             # 0.0 - 1.0
    context="ESCALATED: Supply chain concentration risk",
    source_filing="https://sec.gov/...",
    related_tickers=["TSMC"],
    metadata={...},              # Engine-specific details
)
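For readers who want to consume signals in their own code, the schema above maps naturally onto a validated dataclass. This is a sketch under assumptions: the field names, allowed values, and 0.0 to 1.0 ranges come from the schema, while the class name SignalSketch, the defaults, and the validation rules are hypothetical and may not match the library's own Signal type.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

SIGNAL_TYPES = {"supply_chain", "risk_change", "m_and_a", "tone_shift"}
DIRECTIONS = {"bullish", "bearish", "neutral"}


@dataclass(frozen=True)
class SignalSketch:
    timestamp: datetime
    ticker: str
    signal_type: str
    direction: str
    strength: float
    confidence: float
    context: str = ""
    source_filing: str = ""
    related_tickers: tuple[str, ...] = ()
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Naive datetimes make backtests ambiguous, so require a timezone.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware (UTC)")
        if self.signal_type not in SIGNAL_TYPES:
            raise ValueError(f"unknown signal_type: {self.signal_type!r}")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction!r}")
        for name in ("strength", "confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0.0, 1.0], got {value}")


if __name__ == "__main__":
    print(SignalSketch(
        timestamp=datetime(2024, 11, 1, tzinfo=timezone.utc),
        ticker="AAPL",
        signal_type="risk_change",
        direction="bearish",
        strength=0.85,
        confidence=0.92,
        context="ESCALATED: Supply chain concentration risk",
    ))
```

Rejecting out-of-range values at construction time keeps bad rows out of Parquet or DuckDB, where they would otherwise surface much later in a backtest.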

Project Structure

sigint/
├── src/sigint/
│   ├── __init__.py          # Public API
│   ├── edgar.py             # Async EDGAR client with rate limiting
│   ├── parser.py            # HTML filing section parser
│   ├── llm.py               # Anthropic LLM client wrapper
│   ├── pipeline.py          # Main orchestration
│   ├── signals.py           # SignalCollection with filtering/export
│   ├── graph.py             # Supply chain NetworkX graph
│   ├── storage.py           # DuckDB signal store
│   ├── engines/
│   │   ├── supply_chain.py  # Supply chain extraction
│   │   ├── risk_differ.py   # Risk factor diffing
│   │   ├── m_and_a.py       # M&A signal detection
│   │   └── tone.py          # Management tone analysis
│   └── output/
│       ├── parquet.py       # Parquet/CSV export
│       ├── api.py           # FastAPI REST server
│       └── webhook.py       # Webhook notifications
├── tests/                   # pytest suite with mocked EDGAR/LLM
├── examples/
│   ├── mag7_analysis.py     # Analyse Magnificent 7
│   ├── supply_chain_map.py  # Visualise supply chain graph
│   └── risk_monitor.py      # Monitor risk factor changes
└── docs/
    ├── engines.md           # Engine documentation
    ├── signal_schema.md     # Signal schema reference
    └── backtesting.md       # Backtesting integration guide

Demo

Run the offline walkthrough with:

uv run python examples/demo.py

For EDGAR extraction and portfolio-scale signal analysis, see examples/.

Development

git clone https://github.com/sushaan-k/sigint.git
cd sigint
pip install -e ".[dev]"
pytest -v
ruff check src/ tests/
mypy src/sigint/

Research References

  • "Lazy Prices" (Cohen, Malloy, Nguyen, 2020) -- 10-K language changes predict returns
  • "FinToolBench: Benchmarking LLM Agents with Real-World Financial Tools" (arXiv:2603.08262, 2026)
  • "From Deep Learning to LLMs: A Survey of AI in Quantitative Investment" (arXiv:2503.21422, 2025)
  • SEC EDGAR Full-Text Search API documentation

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Write tests for your changes
  4. Ensure pytest, ruff check, and mypy pass
  5. Submit a pull request

License

MIT License. See LICENSE for details.
