
Mining and parsing S-1 IPO filings


IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

Dataset on Hugging Face · PyPI package: ipo-mine · License: CC BY 4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.

Dataset Construction Pipelines

(Figures: Image Dataset Pipeline; Text Dataset Pipeline)

Quickstart

Install from PyPI

pip install ipo-mine

Using ipo-mine to Download an IPO Filing (Python API)

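from download import IPODownloader, Company

# The SEC requires a descriptive User-Agent, so pass a real email and organization name.
downloader = IPODownloader(
    email="example@gmail.com",
    company="Your Example Organization"
)

# Look up the company by its ticker symbol.
company = Company.from_ticker("SNOW")

# Download one IPO filing, saving the filing itself but not its images.
company_filings = downloader.download_ipo(
    company,
    limit=1,
    save_filing=True,
    save_images=False,
    verbose=True
)

# download_ipo returns a CompanyFilings object; take its first Filing.
filing = company_filings.filings[0]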

Parsing the Table of Contents

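# `parser` is the package's filing parser; its construction is not shown here.
# parse_company parses the downloaded filing for the given ticker.
results = parser.parse_company(
    ticker="SNOW",
    validate=False  # True presumably enables LLM validation, like the CLI's --validate
)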

CLI Usage

You can use the command-line interface to download and parse filings without writing Python code.

Download

Download the latest S-1 filing for a company:

ipo-mine download SNOW --email your@email.com --org "Your Org"

Options:

  • --limit N: Download the N most recent filings (default: 1)
  • --images: Download and extract images from the filing
  • --all: Download all available IPO filings for the ticker

Parse

Parse a downloaded filing into section-specific files:

ipo-mine parse SNOW

Options:

  • --validate: Enable LLM-based validation of extracted sections
  • --provider: LLM provider (anthropic, openai, google, huggingface)
  • --mode: Validation mode (binary, likert)
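To script the download and parse steps over several tickers, one option is to build the documented CLI invocations from Python. The sketch below relies only on the flags listed above; the helper names `download_cmd`, `parse_cmd`, and `run_pipeline` are hypothetical, and with `dry_run=True` the commands are only printed, not executed.

```python
import shlex
import subprocess
from typing import List, Optional


def download_cmd(ticker: str, email: str, org: str, limit: int = 1,
                 images: bool = False, all_filings: bool = False) -> List[str]:
    """Build an `ipo-mine download` argument list from the documented flags."""
    cmd = ["ipo-mine", "download", ticker, "--email", email, "--org", org]
    if all_filings:
        cmd.append("--all")  # every available IPO filing for the ticker
    elif limit != 1:
        cmd += ["--limit", str(limit)]  # 1 is the documented default
    if images:
        cmd.append("--images")
    return cmd


def parse_cmd(ticker: str, provider: Optional[str] = None,
              mode: str = "binary") -> List[str]:
    """Build an `ipo-mine parse` argument list; validate only if a provider is given."""
    cmd = ["ipo-mine", "parse", ticker]
    if provider is not None:
        cmd += ["--validate", "--provider", provider, "--mode", mode]
    return cmd


def run_pipeline(tickers, email, org, provider=None, dry_run=True):
    """Download, then parse, each ticker in turn (hypothetical helper)."""
    for ticker in tickers:
        for cmd in (download_cmd(ticker, email, org), parse_cmd(ticker, provider)):
            print(shlex.join(cmd))
            if not dry_run:
                # check=True raises CalledProcessError if ipo-mine exits non-zero.
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(["SNOW", "TSLA"], "your@email.com", "Your Org", provider="anthropic")
```

Passing `dry_run=False` runs each command with `subprocess.run(..., check=True)`, so a failed download stops the loop before its parse step is attempted.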

Validate

Run LLM validation on previously parsed section files to check whether each one is complete or truncated.

ipo-mine validate SNOW --provider anthropic

Supported Providers

You can choose from the following providers (each requires an API key):

Provider              Argument                  Env variable
Anthropic (Claude)    --provider anthropic      ANTHROPIC_API_KEY
OpenAI (GPT-4o)       --provider openai         OPENAI_API_KEY
Google (Gemini)       --provider google         GOOGLE_API_KEY
HuggingFace           --provider huggingface    HUGGINGFACE_API_KEY

Validation Modes

  • Binary (--mode binary): Returns "Yes" (Valid) or "No" (Truncated/Incomplete). This is the default.
  • Likert (--mode likert): Returns a confidence score from 1 (Incomplete) to 5 (Complete).
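ipo-mine does not document how it post-processes these replies, so the following is only a hypothetical helper showing how a caller might reduce either mode's output to one pass/fail decision. The function name `is_complete` and the Likert cutoff of 4 are assumptions.

```python
import re


def is_complete(reply: str, mode: str = "binary", likert_threshold: int = 4) -> bool:
    """Map a validator reply to pass/fail (hypothetical helper).

    binary: "Yes" means valid, "No" means truncated/incomplete.
    likert: a score from 1 (incomplete) to 5 (complete); scores at or
    above `likert_threshold` (an assumed cutoff) count as complete.
    """
    text = reply.strip()
    if mode == "binary":
        # Look only at the first word, ignoring trailing punctuation and case.
        answer = text.split()[0].strip(".,!").lower() if text else ""
        if answer in ("yes", "no"):
            return answer == "yes"
        raise ValueError(f"unexpected binary reply: {reply!r}")
    if mode == "likert":
        # Take the first standalone digit between 1 and 5 as the score.
        match = re.search(r"\b([1-5])\b", text)
        if match is None:
            raise ValueError(f"no 1-5 score in reply: {reply!r}")
        return int(match.group(1)) >= likert_threshold
    raise ValueError(f"unknown mode: {mode!r}")
```

Raising on unrecognized replies, rather than silently treating them as failures, keeps a misbehaving model distinguishable from a genuinely truncated section.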

Authentication

The CLI will look for API keys in this order:

  1. Command Line Argument: --api-key "sk-..."
  2. Environment Variable: e.g., export OPENAI_API_KEY="sk-..."
  3. Interactive Prompt: If neither is found, the CLI will securely prompt you to enter the key (input is hidden).
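The lookup order above can be sketched as follows. This is not ipo-mine's own code: `resolve_api_key` and `ENV_VARS` are hypothetical names, and the environment-variable names are taken from the Supported Providers table.

```python
import getpass
import os
from typing import Callable, Optional

# Environment variables from the Supported Providers table.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
}


def resolve_api_key(provider: str, cli_key: Optional[str] = None,
                    prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return an API key using the documented lookup order (hypothetical helper)."""
    # 1. An explicit --api-key argument wins.
    if cli_key:
        return cli_key
    # 2. Otherwise fall back to the provider's environment variable.
    env_key = os.environ.get(ENV_VARS[provider])
    if env_key:
        return env_key
    # 3. Finally ask interactively; getpass does not echo the typed input.
    return prompt(f"Enter {provider} API key: ")
```

Making `prompt` a parameter keeps the function testable without a terminal; in normal use it defaults to `getpass.getpass`.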

Examples

Validate using OpenAI with Likert scale:

ipo-mine validate TSLA --provider openai --mode likert

Validate using Google Gemini with explicit key:

ipo-mine validate TSLA --provider google --api-key "your-api-key"

Notes

  • The SEC requires a descriptive User-Agent. Provide a real organization name and your email.
  • download_ipo returns a CompanyFilings object; use company_filings.filings[0] to pass a Filing into the parser.
  • The parser automatically chooses HTML or text parsing based on the filing URL.
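The first and last notes can be illustrated with a small sketch. `sec_request` sets a User-Agent of the form described in the SEC's EDGAR fair-access guidance (organization name followed by a contact email), and `parse_mode` guesses HTML versus text from the URL's file extension. Both functions are hypothetical; in particular, the extension rule is an assumption about how such a choice could be made, not ipo-mine's actual logic.

```python
from urllib.parse import urlparse
from urllib.request import Request


def sec_request(url: str, org: str, email: str) -> Request:
    """Build a request carrying a descriptive User-Agent (hypothetical helper)."""
    if "@" not in email:
        raise ValueError("the User-Agent should include a contact email")
    # SEC guidance suggests the form "Company Name contact@example.com".
    return Request(url, headers={"User-Agent": f"{org} {email}"})


def parse_mode(url: str) -> str:
    """Guess whether a filing should be parsed as HTML or plain text (assumed rule)."""
    path = urlparse(url).path.lower()
    if path.endswith((".htm", ".html")):
        return "html"
    return "text"  # anything else, e.g. a .txt submission, is treated as text
```

Passing the result to `urllib.request.urlopen` sends the request with that header attached.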



Download files


Source Distribution

ipo_mine-0.1.1.tar.gz (1.1 MB)

Built Distribution

ipo_mine-0.1.1-py3-none-any.whl (1.1 MB)

File details

Details for the file ipo_mine-0.1.1.tar.gz.

File metadata

  • Download URL: ipo_mine-0.1.1.tar.gz
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for ipo_mine-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f5a0f7fa489488bb67aab03d1f8cde725251278fba0e4a2c0242793fbfcf5cf5
MD5 a9fa91c7aeacaf241c8da4820cf0afaf
BLAKE2b-256 fd6dbfe43ae4b96f7b78602da0355df58116fc1289f691c4f134b9129534dae0


File details

Details for the file ipo_mine-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: ipo_mine-0.1.1-py3-none-any.whl
  • Size: 1.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for ipo_mine-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 598ef245918045d70d9e093ab5fbd08e48ed4a5bef9be1135c1891d3642c63bc
MD5 cc8748f29159af2de930037d27fa1674
BLAKE2b-256 838e74a3299ebb1a08a4b56a3a1aed62290029ee83cf33546b47bc61681d8154
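To check a downloaded distribution against the digests listed above, a short `hashlib` sketch is enough (`sha256_of` and `verify` are illustrative names, not part of ipo-mine). pip can also enforce this itself in hash-checking mode, using `--require-hashes` with a `--hash=sha256:...` option on the requirement line.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256: str) -> bool:
    """Compare against a published digest, ignoring hex-letter case."""
    return sha256_of(path) == expected_sha256.lower()


# SHA256 digest listed above for ipo_mine-0.1.1-py3-none-any.whl.
WHEEL_SHA256 = "598ef245918045d70d9e093ab5fbd08e48ed4a5bef9be1135c1891d3642c63bc"
```

For example, `verify("ipo_mine-0.1.1-py3-none-any.whl", WHEEL_SHA256)` returns True only if the local wheel matches the published file byte for byte.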

