Causal signal extraction from SEC filings using LLMs
sigint
sigint turns filing text into structured, timestamped trading and monitoring
signals. The focus is not generic sentiment, but directional changes in risk
language, supplier exposure, M&A patterns, and topic-specific management tone.
At a Glance
- Async EDGAR ingestion with filing parsing and section extraction
- LLM-assisted extraction engines for risk, supply chain, M&A, and tone
- Timestamped signal schema designed for storage and backtesting
- Supply-chain graph construction for second-order exposure analysis
- Parquet, DuckDB, API, and webhook outputs for downstream workflows
Scraping SEC filings is standard practice among quant funds, and generic sentiment analysis on 10-K/10-Q text is commoditized, leaving little edge. sigint does something different: it extracts causal, structural relationships buried in filings (supply chain dependencies, risk factor escalations, M&A language patterns, and topic-level management tone shifts) and compiles them into timestamped, backtestable signals.
Why This Exists
sigint targets the difference between "sentiment is positive" (useless) and "Company X just added 'supply chain concentration risk' to their 10-K for the first time, and their top supplier is Company Y which reports next week" (actionable).
Cohen, Malloy, and Nguyen ("Lazy Prices", 2020) find that changes in the language of 10-K and 10-Q filings predict future returns. sigint operationalizes this insight.
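To make "changes in filing language" concrete, here is a minimal sketch of one simple document-similarity measure: cosine similarity between the word-count vectors of two consecutive filings. The tokenizer, function names, and sample sentences are illustrative assumptions, not sigint's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (deliberately crude)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(old_text: str, new_text: str) -> float:
    """Cosine similarity between term-frequency vectors of two texts."""
    old_counts = Counter(tokenize(old_text))
    new_counts = Counter(tokenize(new_text))
    shared = old_counts.keys() & new_counts.keys()
    dot = sum(old_counts[w] * new_counts[w] for w in shared)
    norm_old = math.sqrt(sum(c * c for c in old_counts.values()))
    norm_new = math.sqrt(sum(c * c for c in new_counts.values()))
    if norm_old == 0.0 or norm_new == 0.0:
        return 0.0  # An empty document has no direction to compare.
    return dot / (norm_old * norm_new)


# Hypothetical excerpts from two consecutive filings.
prior = "We rely on a limited number of suppliers."
current = "We rely on a single supplier, which creates concentration risk."
similarity = cosine_similarity(prior, current)
print(f"similarity={similarity:.3f}")
```

A low score flags a filing whose wording moved a lot year over year; it says nothing about what changed, which is where the extraction engines come in.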
Showcase
Supply-chain dependency graph rendered from a small local signal set using the built-in graph utilities.
Architecture
graph TD
A[EDGAR API] -->|10-K, 10-Q, 8-K| B[Section Parser]
B -->|Risk Factors, MD&A, Business| C{Extraction Engines}
C --> D[Supply Chain Graph Builder]
C --> E[Risk Factor Differ]
C --> F[M&A Signal Detector]
C --> G[Management Tone Analyzer]
D --> H[Signal Compiler]
E --> H
F --> H
G --> H
H --> I[Parquet Export]
H --> J[DuckDB Storage]
H --> K[REST API]
H --> L[Webhook Alerts]
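The fan-out and fan-in in the diagram (one set of parsed sections, several engines, one compiler) can be sketched with asyncio.gather. The engine signature and the two placeholder engines below are hypothetical; they stand in for LLM calls and are not sigint's internal API.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical engine signature: parsed sections in, raw signal dicts out.
Engine = Callable[[dict[str, str]], Awaitable[list[dict]]]


async def risk_engine(sections: dict[str, str]) -> list[dict]:
    # Placeholder keyword check standing in for an LLM call.
    if "concentration" in sections.get("risk_factors", ""):
        return [{"signal_type": "risk_change", "direction": "bearish"}]
    return []


async def tone_engine(sections: dict[str, str]) -> list[dict]:
    # Placeholder keyword check standing in for an LLM call.
    if "confident" in sections.get("mdna", ""):
        return [{"signal_type": "tone_shift", "direction": "bullish"}]
    return []


async def compile_signals(sections: dict[str, str], engines: list[Engine]) -> list[dict]:
    """Run every engine concurrently on the same sections, then flatten the results."""
    per_engine = await asyncio.gather(*(engine(sections) for engine in engines))
    return [signal for batch in per_engine for signal in batch]


sections = {
    "risk_factors": "We face supplier concentration risk.",
    "mdna": "Management remains confident about demand.",
}
compiled = asyncio.run(compile_signals(sections, [risk_engine, tone_engine]))
print(compiled)
```

asyncio.gather returns results in the order the engines were passed, so the compiled list is deterministic even though the engines run concurrently.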
Extraction Engines
| Engine | What It Does | Key Insight |
|---|---|---|
| Supply Chain | Extracts supplier/customer/partner relationships into a knowledge graph | When TSMC has a disruption, know exactly which companies are exposed |
| Risk Differ | Diffs Item 1A between consecutive filings; classifies NEW, REMOVED, ESCALATED, DE_ESCALATED | Legal language changes are the strongest predictive signals (Lazy Prices) |
| M&A Detector | Identifies strategic-alternatives language, advisor engagements, cash positioning shifts | Certain filing patterns strongly precede M&A announcements |
| Tone Analyzer | Tracks topic-specific management tone across filings on a 6-point scale | Not "positive/negative" but "confident -> hedging" on specific topics |
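As a rough illustration of the Risk Differ row, the sketch below classifies risk factors between two filings. It assumes risk factors have already been split into heading/body pairs, and it uses a crude count of severity words where the real engine would use an LLM judgment. The vocabulary, function names, and the extra UNCHANGED label are assumptions made for this example.

```python
import re

# Illustrative severity vocabulary; a stand-in for an LLM-based comparison.
SEVERITY_TERMS = ("material", "significant", "substantial", "adverse")


def severity(text: str) -> int:
    """Count severity words as a crude proxy for how strongly a risk is worded."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(words.count(term) for term in SEVERITY_TERMS)


def diff_risk_factors(prior: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Map each heading to NEW, REMOVED, ESCALATED, DE_ESCALATED, or UNCHANGED."""
    changes: dict[str, str] = {}
    for heading in current.keys() - prior.keys():
        changes[heading] = "NEW"
    for heading in prior.keys() - current.keys():
        changes[heading] = "REMOVED"
    for heading in prior.keys() & current.keys():
        before, after = severity(prior[heading]), severity(current[heading])
        if after > before:
            changes[heading] = "ESCALATED"
        elif after < before:
            changes[heading] = "DE_ESCALATED"
        else:
            changes[heading] = "UNCHANGED"
    return changes


# Hypothetical Item 1A excerpts from two consecutive filings.
prior_filing = {
    "Competition": "Competition could affect our results.",
    "Litigation": "We are party to litigation.",
}
current_filing = {
    "Competition": "Competition could have a material adverse effect on our results.",
    "Supply chain concentration": "We depend on a single supplier.",
}
changes = diff_risk_factors(prior_filing, current_filing)
print(changes)
```

Matching on exact headings is the weakest part of this sketch, since companies often reword headings between filings; fuzzy or semantic matching would be needed in practice.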
Quick Start
Installation
pip install alphasig
The distribution is published on PyPI as alphasig; the import package is sigint.
Basic Usage
import asyncio

from sigint import Pipeline


async def main():
    pipeline = Pipeline(
        model="claude-sonnet-4-6",
        user_agent="Your Name your@email.com",
    )
    signals = await pipeline.extract(
        tickers=["AAPL", "MSFT", "GOOGL"],
        filing_types=["10-K", "10-Q"],
        lookback_years=3,
        engines=["supply_chain", "risk_differ", "m_and_a", "tone"],
    )

    # Filter high-conviction bearish signals
    bearish = signals.by_direction("bearish").above_strength(0.7)
    for sig in bearish:
        print(f"[{sig.ticker}] {sig.context}")

    # Build supply chain graph
    graph = signals.supply_chain_graph()
    exposure = graph.exposure("TSMC")
    print(f"Companies exposed to TSMC: {exposure['direct_dependents']}")

    # Export for backtesting
    signals.to_parquet("signals.parquet")


asyncio.run(main())
The public API is designed around Pipeline and SignalCollection, so the
same extraction run can feed notebooks, alerting, or backtests without an
adapter layer.
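The exposure lookup in the usage example can be pictured as a breadth-first walk over supplier-to-customer edges. The sketch below uses a plain adjacency dict to stay dependency-free (the project itself lists a NetworkX graph module); the class name, the second_order key, and the toy edges are assumptions for illustration, not extracted data.

```python
from collections import defaultdict, deque


class SupplyGraph:
    """Directed graph with edges pointing from supplier to customer."""

    def __init__(self) -> None:
        self._customers: dict[str, set[str]] = defaultdict(set)

    def add_dependency(self, supplier: str, customer: str) -> None:
        self._customers[supplier].add(customer)

    def exposure(self, supplier: str, max_hops: int = 2) -> dict[str, list[str]]:
        """Breadth-first search downstream of `supplier`, grouped by hop distance."""
        hops: dict[str, int] = {supplier: 0}
        queue = deque([supplier])
        while queue:
            node = queue.popleft()
            if hops[node] == max_hops:
                continue  # Do not expand beyond the requested depth.
            for customer in self._customers.get(node, ()):
                if customer not in hops:
                    hops[customer] = hops[node] + 1
                    queue.append(customer)
        return {
            "direct_dependents": sorted(n for n, h in hops.items() if h == 1),
            "second_order": sorted(n for n, h in hops.items() if h == 2),
        }


# Toy edges for illustration only.
graph = SupplyGraph()
graph.add_dependency("TSMC", "AAPL")
graph.add_dependency("TSMC", "NVDA")
graph.add_dependency("NVDA", "MSFT")
exposure = graph.exposure("TSMC")
print(exposure)
```

Tracking hop distance per node means a company reachable both directly and indirectly is reported only at its shortest distance.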
CLI
# Extract signals
sigint extract --tickers AAPL MSFT --lookback 3 --output signals.parquet
# Query stored signals
sigint query --ticker AAPL --type risk_change --min-strength 0.7
# Launch REST API
sigint serve --port 8080
REST API
curl "http://localhost:8080/signals?ticker=AAPL&min_strength=0.7"
curl http://localhost:8080/signals/summary
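From Python, the same query can be built with the standard library so that parameters are URL-encoded consistently. This is a sketch under assumptions: the helper names are hypothetical, and the response is assumed to be JSON because the server is described as a FastAPI app.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def signals_url(base_url: str, **params: object) -> str:
    """Build a /signals URL, dropping parameters whose value is None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/signals?{query}" if query else f"{base_url}/signals"


def fetch_signals(base_url: str, **params: object):
    """Fetch and decode the response (assumes the server returns JSON)."""
    with urlopen(signals_url(base_url, **params), timeout=10) as response:
        return json.load(response)


url = signals_url("http://localhost:8080", ticker="AAPL", min_strength=0.7)
print(url)
```

Calling fetch_signals("http://localhost:8080", ticker="AAPL", min_strength=0.7) requires a running sigint serve instance.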
Configuration
sigint reads the Anthropic API key from the ANTHROPIC_API_KEY environment variable. EDGAR requires a User-Agent with a contact email (SEC policy).
export ANTHROPIC_API_KEY="sk-ant-..."
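A small pre-flight check can catch both configuration problems before any network call is made. The function below is a hypothetical helper, not part of sigint, and the email pattern is intentionally loose.

```python
import re
from collections.abc import Mapping


def load_settings(env: Mapping[str, str], user_agent: str) -> dict[str, str]:
    """Validate the API key and the EDGAR User-Agent before any request is sent."""
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    # Loose check that the User-Agent contains something shaped like an email.
    if not re.search(r"[^@\s]+@[^@\s]+\.[^@\s]+", user_agent):
        raise ValueError("user_agent should include a contact email")
    return {"api_key": api_key, "user_agent": user_agent}


# A plain dict stands in for the process environment here.
settings = load_settings(
    {"ANTHROPIC_API_KEY": "sk-ant-..."},
    "Your Name your@email.com",
)
print(settings["user_agent"])
```

In real use, pass os.environ as the env argument.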
Signal Schema
Every signal follows a universal schema for backtesting compatibility:
Signal(
    timestamp=datetime,            # Filing date (UTC)
    ticker="AAPL",                 # Company ticker
    signal_type="risk_change",     # supply_chain | risk_change | m_and_a | tone_shift
    direction="bearish",           # bullish | bearish | neutral
    strength=0.85,                 # 0.0 - 1.0
    confidence=0.92,               # 0.0 - 1.0
    context="ESCALATED: Supply chain concentration risk",
    source_filing="https://sec.gov/...",
    related_tickers=["TSMC"],
    metadata={...},                # Engine-specific details
)
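One way to picture this schema is as a frozen dataclass that enforces the documented value ranges at construction time. The class name SignalRecord, the field defaults, and the validation rules are assumptions for this sketch; sigint's own Signal class may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

SIGNAL_TYPES = {"supply_chain", "risk_change", "m_and_a", "tone_shift"}
DIRECTIONS = {"bullish", "bearish", "neutral"}


@dataclass(frozen=True)
class SignalRecord:
    """Illustrative stand-in for the schema above, with range checks added."""

    timestamp: datetime
    ticker: str
    signal_type: str
    direction: str
    strength: float
    confidence: float
    context: str = ""
    source_filing: str = ""
    related_tickers: tuple[str, ...] = ()
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.signal_type not in SIGNAL_TYPES:
            raise ValueError(f"unknown signal_type: {self.signal_type}")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction}")
        for name in ("strength", "confidence"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be within [0.0, 1.0]")
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware (UTC)")


record = SignalRecord(
    timestamp=datetime(2024, 11, 1, tzinfo=timezone.utc),  # Arbitrary example date.
    ticker="AAPL",
    signal_type="risk_change",
    direction="bearish",
    strength=0.85,
    confidence=0.92,
    context="ESCALATED: Supply chain concentration risk",
    related_tickers=("TSMC",),
)
print(record.ticker, record.direction, record.strength)
```

Rejecting naive timestamps matters for backtesting: a signal must not be usable before its filing date, and mixed time zones make that check unreliable.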
Project Structure
sigint/
├── src/sigint/
│   ├── __init__.py            # Public API
│   ├── edgar.py               # Async EDGAR client with rate limiting
│   ├── parser.py              # HTML filing section parser
│   ├── llm.py                 # Anthropic LLM client wrapper
│   ├── pipeline.py            # Main orchestration
│   ├── signals.py             # SignalCollection with filtering/export
│   ├── graph.py               # Supply chain NetworkX graph
│   ├── storage.py             # DuckDB signal store
│   ├── engines/
│   │   ├── supply_chain.py    # Supply chain extraction
│   │   ├── risk_differ.py     # Risk factor diffing
│   │   ├── m_and_a.py         # M&A signal detection
│   │   └── tone.py            # Management tone analysis
│   └── output/
│       ├── parquet.py         # Parquet/CSV export
│       ├── api.py             # FastAPI REST server
│       └── webhook.py         # Webhook notifications
├── tests/                     # pytest suite with mocked EDGAR/LLM
├── examples/
│   ├── mag7_analysis.py       # Analyse Magnificent 7
│   ├── supply_chain_map.py    # Visualise supply chain graph
│   └── risk_monitor.py        # Monitor risk factor changes
└── docs/
    ├── engines.md             # Engine documentation
    ├── signal_schema.md       # Signal schema reference
    └── backtesting.md         # Backtesting integration guide
Demo
Run the offline walkthrough with:
uv run python examples/demo.py
For EDGAR extraction and portfolio-scale signal analysis, see examples/.
Development
git clone https://github.com/sushaan-k/sigint.git
cd sigint
pip install -e ".[dev]"
pytest -v
ruff check src/ tests/
mypy src/sigint/
Research References
- "Lazy Prices" (Cohen, Malloy, Nguyen, 2020) -- 10-K language changes predict returns
- "FinToolBench: Benchmarking LLM Agents with Real-World Financial Tools" (arXiv:2603.08262, 2026)
- "From Deep Learning to LLMs: A Survey of AI in Quantitative Investment" (arXiv:2503.21422, 2026)
- SEC EDGAR Full-Text Search API documentation
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/your-feature)
- Write tests for your changes
- Ensure pytest, ruff check, and mypy pass
- Submit a pull request
License
MIT License. See LICENSE for details.