SEC filings and earnings call transcript data

Finance Data MCP

A Python-first toolkit for SEC filing ingestion, OCR-to-Markdown conversion, transcript collection, and hybrid retrieval (dense + BM25) with reranking.

What this project does

  • Downloads SEC filings and stores filing metadata.
  • Converts filing PDFs to Markdown via olmOCR.
  • Chunks and indexes filings/transcripts in Chroma.
  • Supports:
    • Hybrid search (dense + BM25 results combined by reciprocal rank fusion, then reranked).
  • Exposes workflows through:
    • FastAPI (server.py).
    • MCP server (mcp_server.py).
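
To illustrate the fusion step named in the hybrid search bullet, here is a minimal sketch of reciprocal rank fusion over two ranked lists. It is not taken from this project's code: the function name, the chunk IDs, and the constant k=60 (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in
    (rank starts at 1); the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["chunk-3", "chunk-1", "chunk-7"]
bm25_hits = ["chunk-1", "chunk-9", "chunk-3"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Chunks that rank well in both lists rise to the top. In a pipeline like the one described here, a reranker model would then rescore the leading fused candidates against the query.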

Repository layout

  • finance_data/filings/: SEC download + helpers.
  • finance_data/ocr/: olmOCR pipeline.
  • finance_data/dataloader/: chunking, Chroma indexing, semantic + BM25 retrieval.
  • finance_data/earnings_transcripts/: transcript fetch + persistence.
  • finance_data/server_api/: API request/response models + batch helpers.
  • server.py: FastAPI app.
  • mcp_server.py: MCP entrypoint.
  • docs/: setup and operations docs.

Quick start

1) Install dependencies

uv sync

For OCR/embedding flows:

uv sync --group ocr-md

For MCP workflows:

uv sync --group ocr-md --group mcp

2) Configure environment

Use .env or environment variables. Common settings:

  • SEC_API_ORGANIZATION, SEC_API_EMAIL
  • OLMOCR_SERVER, OLMOCR_MODEL, OLMOCR_WORKSPACE
  • EMBEDDING_SERVER, EMBEDDING_MODEL
  • CHROMA_PERSIST_DIR
  • MCP_HOST, MCP_PORT, MCP_NGROK_ALLOWED_HOSTS

See finance_data/settings.py for defaults.
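
Before starting services, it can help to see which of these settings are unset in the current shell. The sketch below is a standalone pre-flight check, not part of the package: the helper name is hypothetical, it inspects only process environment variables (it does not read .env), and an unset variable may simply fall back to a default in finance_data/settings.py.

```python
import os

# Names copied from the "Common settings" list above.
COMMON_SETTINGS = [
    "SEC_API_ORGANIZATION", "SEC_API_EMAIL",
    "OLMOCR_SERVER", "OLMOCR_MODEL", "OLMOCR_WORKSPACE",
    "EMBEDDING_SERVER", "EMBEDDING_MODEL",
    "CHROMA_PERSIST_DIR",
    "MCP_HOST", "MCP_PORT", "MCP_NGROK_ALLOWED_HOSTS",
]


def missing_settings(env=None, names=COMMON_SETTINGS):
    """Return the setting names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_settings():
        print(f"{name} is not set in the environment")
```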

3) Run services

Start model servers:

make vllm-olmocr-serve
make vllm-embd-serve
make vllm-reranker-serve

Start API:

make start-server

Start MCP:

uv run --group ocr-md --group mcp python mcp_server.py

Search capabilities

SEC filings API

  • Hybrid (dense + BM25 + reranker): POST /vector_store/search_sec_filings

Transcript API

  • Hybrid (dense + BM25 + reranker): POST /vector_store/search_transcripts

MCP tools

  • Hybrid: search_sec_filings_tool, search_transcripts_tool

Core workflows

SEC filing → Markdown

uv run python -m finance_data.filings.sec_data --ticker AMZN --year 2025
uv run python -m finance_data.ocr.olmocr_pipeline --pdf-dir sec_data/AMZN-2025

Embed and search filings (API)

curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_sec_filings" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","force":false}'

curl -s -X POST "http://127.0.0.1:8081/vector_store/search_sec_filings" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","query":"operating income margin","top_k":5}'
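
The same search call can be made from Python using only the standard library. This is a sketch under assumptions: the helper names are hypothetical, the base URL mirrors the curl examples, and the response is assumed to be JSON (its exact shape is not documented here).

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8081"  # same host and port as the curl examples


def build_request(path, payload, base_url=BASE_URL):
    """Build a JSON POST request for one of the API endpoints."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=BASE_URL, timeout=60):
    """Send the request and decode the body, assuming a JSON response."""
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


search_payload = {
    "ticker": "AMZN",
    "year": "2025",
    "filing_type": "10-K",
    "query": "operating income margin",
    "top_k": 5,
}
request = build_request("/vector_store/search_sec_filings", search_payload)

# Requires the API server to be running:
# results = post_json("/vector_store/search_sec_filings", search_payload)
```

Separating build_request from post_json lets the payload and URL be inspected without a running server.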

Earnings transcripts

Fetch quarterly transcripts:

uv run python -m finance_data.earnings_transcripts.transcripts AMZN 2025

Embed + hybrid search transcripts:

curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_transcripts" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","force":false}'

curl -s -X POST "http://127.0.0.1:8081/vector_store/search_transcripts" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","query":"AWS revenue growth","top_k":5}'

Docker

Use Makefile wrappers:

make docker-build
make docker-start

Stop/remove by API port:

make docker-stop
make docker-remove

Documentation

  • docs/README.md
  • docs/setup-and-operations.md
