SEC filings and earnings call transcript data

Finance Data MCP

A Python-first toolkit for SEC filing ingestion, OCR-to-Markdown conversion, earnings transcript collection, and retrieval via both semantic (embedding) search and BM25 lexical search.

What this project does

  • Downloads SEC filings and stores filing metadata.
  • Converts filing PDFs to Markdown via olmOCR.
  • Chunks and indexes filings/transcripts in Chroma.
  • Supports both:
    • Semantic search (embedding similarity).
    • BM25 search (keyword/lexical ranking).
  • Exposes workflows through:
    • FastAPI (server.py).
    • MCP server (mcp_server.py).
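
To make the difference between the two search modes concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is an illustration only, not the project's implementation (which lives under finance_data/dataloader/); the function name `bm25_scores`, the whitespace tokenization, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for the example. BM25 ranks by exact term overlap, whereas semantic search compares embedding vectors and can match paraphrases that share no keywords.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    # Guard against a corpus of empty documents (average length of zero).
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical text, tokenized on whitespace).
docs = [
    "operating income margin improved year over year".split(),
    "aws revenue growth accelerated".split(),
    "the company repurchased shares".split(),
]
scores = bm25_scores("operating income margin".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Documents that share no term with the query score exactly zero here, which is the typical reason to pair BM25 with a semantic index.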

Repository layout

  • finance_data/filings/: SEC download + helpers.
  • finance_data/ocr/: olmOCR pipeline.
  • finance_data/dataloader/: chunking, Chroma indexing, semantic + BM25 retrieval.
  • finance_data/earnings_transcripts/: transcript fetch + persistence.
  • finance_data/server_api/: API request/response models + batch helpers.
  • server.py: FastAPI app.
  • mcp_server.py: MCP entrypoint.
  • docs/: setup and operations docs.

Quick start

1) Install dependencies

uv sync

For OCR/embedding flows:

uv sync --group ocr-md

For MCP workflows:

uv sync --group ocr-md --group mcp

2) Configure environment

Set configuration in a .env file or through environment variables. Common settings:

  • SEC_API_ORGANIZATION, SEC_API_EMAIL
  • OLMOCR_SERVER, OLMOCR_MODEL, OLMOCR_WORKSPACE
  • EMBEDDING_SERVER, EMBEDDING_MODEL
  • CHROMA_PERSIST_DIR
  • MCP_HOST, MCP_PORT, MCP_NGROK_ALLOWED_HOSTS

See finance_data/settings.py for defaults.
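
As a rough picture of how such settings are usually consumed, the sketch below reads a few of the variables listed above into a frozen dataclass. It is not the contents of finance_data/settings.py: the `Settings` class, the `load_settings` helper, and every fallback value are hypothetical placeholders, so check the real module for the actual defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    sec_api_organization: str
    sec_api_email: str
    chroma_persist_dir: str
    mcp_host: str
    mcp_port: int


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    return Settings(
        sec_api_organization=env.get("SEC_API_ORGANIZATION", ""),
        sec_api_email=env.get("SEC_API_EMAIL", ""),
        # Fallbacks below are placeholders, not the project's defaults.
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma"),
        mcp_host=env.get("MCP_HOST", "127.0.0.1"),
        mcp_port=int(env.get("MCP_PORT", "8000")),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
settings = load_settings({"SEC_API_EMAIL": "me@example.com", "MCP_PORT": "9000"})
```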

3) Run services

Start model servers:

make vllm-olmocr-serve
make vllm-embd-serve

Start API:

make start-server

Start MCP:

uv run --group ocr-md --group mcp python mcp_server.py

Search capabilities

SEC filings API

  • Semantic: POST /vector_store/search_sec_filings
  • BM25: POST /vector_store/search_sec_filings_bm25

Transcript API

  • Semantic: POST /vector_store/search_transcripts
  • BM25: POST /vector_store/search_transcripts_bm25

MCP tools

  • Semantic: search_sec_filings_tool, search_transcripts_tool
  • BM25: search_sec_filings_bm25_tool, search_transcripts_bm25_tool
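
The four HTTP search routes above follow a regular pattern, which a client could capture in a small lookup table. The sketch below is one way to do that; the corpus labels `"sec_filings"` and `"transcripts"`, the mode labels, and the `search_route` helper are illustrative names, while the paths are copied from the lists above.

```python
SEARCH_ROUTES = {
    ("sec_filings", "semantic"): "/vector_store/search_sec_filings",
    ("sec_filings", "bm25"): "/vector_store/search_sec_filings_bm25",
    ("transcripts", "semantic"): "/vector_store/search_transcripts",
    ("transcripts", "bm25"): "/vector_store/search_transcripts_bm25",
}


def search_route(corpus, mode):
    """Return the POST path for a corpus/mode pair, or raise ValueError."""
    try:
        return SEARCH_ROUTES[(corpus, mode)]
    except KeyError:
        raise ValueError(
            f"unsupported search: corpus={corpus!r}, mode={mode!r}"
        ) from None
```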

Core workflows

SEC filing → Markdown

uv run python -m finance_data.filings.sec_data --ticker AMZN --year 2025
uv run python -m finance_data.ocr.olmocr_pipeline --pdf-dir sec_data/AMZN-2025
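
If you want to script the two steps for several tickers, the commands can be assembled in Python. This sketch only builds and prints the argument lists; the `pipeline_commands` helper is hypothetical, and it assumes the downloader writes PDFs to sec_data/<TICKER>-<YEAR>, which is inferred from the example path above rather than confirmed.

```python
import shlex


def pipeline_commands(ticker, year):
    """Return the download and OCR commands as argv lists."""
    # Assumption: PDFs land in sec_data/<TICKER>-<YEAR>, as in the example above.
    pdf_dir = f"sec_data/{ticker}-{year}"
    download = [
        "uv", "run", "python", "-m", "finance_data.filings.sec_data",
        "--ticker", ticker, "--year", str(year),
    ]
    ocr = [
        "uv", "run", "python", "-m", "finance_data.ocr.olmocr_pipeline",
        "--pdf-dir", pdf_dir,
    ]
    return [download, ocr]


for cmd in pipeline_commands("AMZN", 2025):
    print(shlex.join(cmd))
    # To execute instead of print: subprocess.run(cmd, check=True)
```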

Embed and search filings (API)

curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_sec_filings" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","force":false}'

curl -s -X POST "http://127.0.0.1:8081/vector_store/search_sec_filings_bm25" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","query":"operating income margin","top_k":5}'
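
The same calls can be made from Python with only the standard library. The sketch below mirrors the BM25 search request shown above; `build_post` is a hypothetical helper, and the base URL is taken from the curl examples. The response schema is not documented here, so the sending step is left commented out and the result is treated as opaque JSON.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8081"  # host and port from the curl examples


def build_post(path, payload, base_url=BASE_URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_post(
    "/vector_store/search_sec_filings_bm25",
    {
        "ticker": "AMZN",
        "year": "2025",
        "filing_type": "10-K",
        "query": "operating income margin",
        "top_k": 5,
    },
)
# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

The same helper works for the embed and transcript endpoints by changing the path and payload.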

Earnings transcripts

Fetch quarterly transcripts:

uv run python -m finance_data.earnings_transcripts.transcripts AMZN 2025

Embed transcripts, then run a BM25 search:

curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_transcripts" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","force":false}'

curl -s -X POST "http://127.0.0.1:8081/vector_store/search_transcripts_bm25" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","query":"AWS revenue growth","top_k":5}'

Docker

Use Makefile wrappers:

make docker-build
make docker-start

Stop or remove the container by API port:

make docker-stop
make docker-remove

Documentation

  • docs/README.md
  • docs/setup-and-operations.md
