Mining and parsing S-1 IPO filings

Project description

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

This work is licensed under a Creative Commons Attribution 4.0 International License.

Dataset Construction Pipelines

[Figures: Image Dataset Pipeline; Text Dataset Pipeline]

Quickstart

Install from PyPI

pip install ipo-mine

Using ipo-mine to Download an IPO Filing (Python API)

from download import IPODownloader, Company

downloader = IPODownloader(
    email="example@gmail.com",
    company="Your Example Organization"
)

company = Company.from_ticker("SNOW")

company_filings = downloader.download_ipo(
    company,
    limit=1,
    save_filing=True,
    save_images=False,
    verbose=True
)

filing = company_filings.filings[0]

Parsing the Table of Contents

# `parser` is assumed to be an already-constructed ipo-mine parser instance.
results = parser.parse_company(
    ticker="SNOW",
    validate=False
)

CLI Usage

You can use the command-line interface to download and parse filings without writing Python code.

Download

Download the latest S-1 filing for a company:

ipo-mine download SNOW --email your@email.com --org "Your Org"

Options:

  • --limit N: Download the N most recent filings (default: 1)
  • --images: Download and extract images from the filing
  • --all: Download all available IPO filings for the ticker
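When downloading filings for many tickers, it can help to assemble the command line programmatically. The sketch below only builds the argument list from the flags documented above; build_download_command is a hypothetical helper, not part of ipo-mine, and treating --all and --limit as mutually exclusive is an assumption.

```python
import shlex


def build_download_command(ticker, email, org, limit=None, images=False, all_filings=False):
    """Hypothetical helper: build an `ipo-mine download` argument list."""
    cmd = ["ipo-mine", "download", ticker, "--email", email, "--org", org]
    if all_filings:
        # Assumption: --all supersedes --limit, so only one of them is passed.
        cmd.append("--all")
    elif limit is not None:
        cmd += ["--limit", str(limit)]
    if images:
        cmd.append("--images")
    return cmd


if __name__ == "__main__":
    for ticker in ["SNOW", "TSLA"]:
        argv = build_download_command(ticker, "your@email.com", "Your Org", limit=2)
        # Print a shell-safe command line; pass argv to subprocess.run to execute it.
        print(shlex.join(argv))
```

The list form can be handed to subprocess.run directly, which avoids shell quoting problems with organization names that contain spaces.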

Parse

Parse a downloaded filing into section-specific files:

ipo-mine parse SNOW

Options:

  • --validate: Enable LLM-based validation of extracted sections
  • --provider: LLM provider (anthropic, openai, google, huggingface)
  • --mode: Validation mode (binary, likert)

Validate

Run LLM validation on existing parsed text files to check whether sections are truncated or incomplete.

ipo-mine validate SNOW --provider anthropic

Supported Providers

Choose one of the following providers; each requires an API key:

Provider             Argument                  Env Variable
Anthropic (Claude)   --provider anthropic      ANTHROPIC_API_KEY
OpenAI (GPT-4o)      --provider openai         OPENAI_API_KEY
Google (Gemini)      --provider google         GOOGLE_API_KEY
HuggingFace          --provider huggingface    HUGGINGFACE_API_KEY

Validation Modes

  • Binary (--mode binary): Returns "Yes" (Valid) or "No" (Truncated/Incomplete). Default.
  • Likert (--mode likert): Returns a confidence score from 1 (Incomplete) to 5 (Complete).
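To act on validation results in a script, both modes can be reduced to a single pass/fail decision. This is a minimal sketch, assuming the verdict is available as a plain string ("Yes"/"No" or a digit from 1 to 5); interpret_verdict and the likert_threshold cutoff of 4 are hypothetical choices for illustration, not part of ipo-mine.

```python
def interpret_verdict(mode, raw, likert_threshold=4):
    """Hypothetical helper: map a raw validator reply to True (complete) or False."""
    text = raw.strip()
    if mode == "binary":
        answer = text.lower()
        if answer not in ("yes", "no"):
            raise ValueError(f"unexpected binary verdict: {raw!r}")
        return answer == "yes"
    if mode == "likert":
        score = int(text)  # raises ValueError if the reply is not an integer
        if not 1 <= score <= 5:
            raise ValueError(f"Likert score out of range: {score}")
        # Assumption: scores at or above the threshold count as complete.
        return score >= likert_threshold
    raise ValueError(f"unknown validation mode: {mode!r}")
```

A stricter pipeline could raise the threshold to 5, at the cost of re-checking more sections by hand.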

Authentication

The CLI will look for API keys in this order:

  1. Command Line Argument: --api-key "sk-..."
  2. Environment Variable: e.g., export OPENAI_API_KEY="sk-..."
  3. Interactive Prompt: If neither is found, the CLI will securely prompt you to enter the key (input is hidden).
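The lookup order above can be sketched as follows. This is an illustration of the documented precedence, not the CLI's actual code; resolve_api_key and its parameters are hypothetical names, while the environment variable names come from the Supported Providers table.

```python
import getpass
import os

# Provider -> environment variable, as listed under Supported Providers.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
}


def resolve_api_key(provider, cli_key=None, environ=None, prompt=getpass.getpass):
    """Hypothetical helper: return the first key found in the documented order."""
    environ = os.environ if environ is None else environ
    # 1. An explicit --api-key value takes precedence.
    if cli_key:
        return cli_key
    # 2. Otherwise fall back to the provider's environment variable.
    env_key = environ.get(ENV_VARS[provider])
    if env_key:
        return env_key
    # 3. Finally, prompt interactively; getpass does not echo the input.
    return prompt(f"Enter API key for {provider}: ")
```

Passing environ and prompt as parameters keeps the function testable without touching the real environment or a terminal.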

Examples

Validate using OpenAI with Likert scale:

ipo-mine validate TSLA --provider openai --mode likert

Validate using Google Gemini with explicit key:

ipo-mine validate TSLA --provider google --api-key "your-api-key"

Notes

  • The SEC requires a descriptive User-Agent. Provide a real organization name and your email.
  • download_ipo returns a CompanyFilings object; use company_filings.filings[0] to pass a Filing into the parser.
  • The parser automatically chooses HTML or text parsing based on the filing URL.
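Two of the notes above can be illustrated with a short sketch. Both functions are hypothetical and not part of ipo-mine: build_user_agent assumes the common "organization name followed by contact email" convention for SEC requests, and choose_parser assumes the HTML-versus-text decision is made from the URL's file extension, which the toolkit may implement differently.

```python
import posixpath
from urllib.parse import urlparse


def build_user_agent(organization, email):
    """Hypothetical helper: a descriptive User-Agent naming who is making requests."""
    if not organization.strip() or "@" not in email:
        raise ValueError("provide a real organization name and a contact email")
    return f"{organization.strip()} {email.strip()}"


def choose_parser(filing_url):
    """Hypothetical helper: pick "html" or "text" from the URL's file extension."""
    extension = posixpath.splitext(urlparse(filing_url).path)[1].lower()
    # Assumption: anything that is not .htm/.html is treated as plain text.
    return "html" if extension in (".htm", ".html") else "text"


if __name__ == "__main__":
    print(build_user_agent("Your Example Organization", "example@gmail.com"))
    # Placeholder URLs for illustration only.
    print(choose_parser("https://example.com/filings/s1.htm"))
    print(choose_parser("https://example.com/filings/s1.txt"))
```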
