Skip to main content

Mining and parsing S-1 IPO filings

Project description

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

Dataset on HF PyPI - ipo-mine CC BY 4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.

Dataset Construction Pipelines

Image Dataset Pipeline Text Dataset Pipeline
Description 1 Description 2

Quickstart

Install from PyPI

pip install ipo-mine

Using ipo-mine to Download an IPO Filing (Python API)

from download import IPODownloader, Company

downloader = IPODownloader(
    email="example@gmail.com",
    company="Your Example Organization"
)

company = Company.from_ticker("SNOW")

company_filings = downloader.download_ipo(
    company,
    limit=1,
    save_filing=True,
    save_images=False,
    verbose=True
)

filing = company_filings.filings[0]

Parsing the Table of Contents

results = parser.parse_company(
    ticker="SNOW",
    validate=False
)

CLI Usage

You can use the command-line interface to download and parse filings without writing Python code.

Download

Download the latest S-1 filing for a company:

ipo-mine download SNOW --email your@email.com --org "Your Org"

Options:

  • --limit N: Download previous N filings (default: 1)
  • --images: Download and extract images from the filing
  • --all: Download all available IPO filings for the ticker

Parse

Parse a downloaded filing into section-specific files:

ipo-mine parse SNOW

Options:

  • --validate: Enable LLM-based validation of extracted sections
  • --provider: LLM provider (anthropic, openai, google, huggingface)
  • --mode: Validation mode (binary, likert)

Validate

Run LLM validation on existing parsed text files to check for truncation or completeness.

ipo-mine validate SNOW --provider anthropic

Supported Providers

You can choose from the following providers (requires API keys):

Provider Argument Env Variable
Anthropic (Claude) --provider anthropic ANTHROPIC_API_KEY
OpenAI (GPT-4o) --provider openai OPENAI_API_KEY
Google (Gemini) --provider google GOOGLE_API_KEY
HuggingFace --provider huggingface HUGGINGFACE_API_KEY

Validation Modes

  • Binary (--mode binary): Returns "Yes" (Valid) or "No" (Truncated/Incomplete). Default.
  • Likert (--mode likert): Returns a confidence score from 1 (Incomplete) to 5 (Complete).

Authentication

The CLI will look for API keys in this order:

  1. Command Line Argument: --api-key "sk-..."
  2. Environment Variable: e.g., export OPENAI_API_KEY="sk-..."
  3. Interactive Prompt: If neither is found, the CLI will securely prompt you to enter the key (input is hidden).

Examples

Validate using OpenAI with Likert scale:

ipo-mine validate TSLA --provider openai --mode likert

Validate using Google Gemini with explicit key:

ipo-mine validate TSLA --provider google --api-key "your-api-key"

Notes

  • The SEC requires a descriptive User-Agent. Provide a real organization name and your email.
  • download_ipo returns a CompanyFilings object; use company_filings.filings[0] to pass a Filing into the parser.
  • The parser automatically chooses HTML or text parsing based on the filing URL.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ipo_mine-0.1.3.tar.gz (84.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ipo_mine-0.1.3-py3-none-any.whl (91.5 kB view details)

Uploaded Python 3

File details

Details for the file ipo_mine-0.1.3.tar.gz.

File metadata

  • Download URL: ipo_mine-0.1.3.tar.gz
  • Upload date:
  • Size: 84.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for ipo_mine-0.1.3.tar.gz
Algorithm Hash digest
SHA256 e25bbfefad14db60c9ab45fe8abd4a32c91e49b5223cd8501a0caee59775d5eb
MD5 8b74f8075770b6a57e7053b56ab42a5a
BLAKE2b-256 ff2d9330959b6c67cc3cc59f066d8eeddd3648cf461574cc12c8f2f2af71bf50

See more details on using hashes here.

File details

Details for the file ipo_mine-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: ipo_mine-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 91.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for ipo_mine-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 8de42e5f8d33089a5387d9832670f17ac66d207c0e3d9b8928986ddf784a4d8e
MD5 062b773b30d74412d968770c5e106b61
BLAKE2b-256 db04f717f4d10095c4d8ce0c670fec0f00456bb96f9d31c4911e2aba629d66d9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page