

Project description

jobhive

The open dataset and toolkit for global job market data. 3.3M+ live jobs from 400 000+ companies, scraped directly from the ATS platforms where companies actually post. No LinkedIn, no reposts, no recruiters.


from jobhive import search

df = search(query="ml engineer", location="Paris", remote=True)

No API key, no auth, no rate limits. The dataset refreshes every 24 hours.


Why jobhive

Most job aggregators scrape LinkedIn and Indeed — both full of duplicates, ghost listings, and reposts. jobhive goes one layer down: directly to the ATS platforms (Greenhouse, Lever, Ashby, Workday, BambooHR…) where companies actually post.

  • Single source of truth — every row comes from the company's own ATS, so titles, locations, and salaries are accurate.
  • No duplicates — one ATS posting = one row.
  • Structured salary when the ATS exposes it (Ashby, Greenhouse Pay Transparency, Lever salaryRange, etc.).
  • MIT licensed, fully open — fork the dataset, fork the scrapers.

Coverage

  • Live jobs: 3,376,000+
  • Companies: 406,000+
  • ATS platforms: 31

Top 10 by job count:

  • Bundesagentur (DE public-sector): 931,049
  • Workday: 653,041
  • EURES (EU/EEA public-sector): 626,783
  • SmartRecruiters: 213,372
  • SuccessFactors: 180,499
  • Greenhouse: 110,071
  • Oracle HCM: 107,464
  • iCIMS: 92,211
  • Lever: 60,342
  • Phenom: 56,483

Counts come from the live manifest at https://storage.stapply.ai/jobhive/v1/manifest.json — verify any time with jobhive list-ats.
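
If you'd rather check the numbers programmatically than via the CLI, here is a minimal sketch that just fetches and dumps the manifest (the manifest's exact schema isn't documented here, so the code makes no assumptions about its fields):

import json
import urllib.request

MANIFEST_URL = "https://storage.stapply.ai/jobhive/v1/manifest.json"

# Fetch the live manifest and print the first part of its structure.
with urllib.request.urlopen(MANIFEST_URL) as resp:
    manifest = json.load(resp)

print(json.dumps(manifest, indent=2)[:1000])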

Install

pip install jobhive-py

The package is distributed on PyPI as jobhive-py; the import name is jobhive.

Optional extras:

pip install "jobhive-py[parquet]"     # faster downloads via Apache Parquet
pip install "jobhive-py[scrapers]"    # build your own pipeline
pip install "jobhive-py[all]"

Two ways to use it

1. Query the public dataset

from jobhive import search

# Free-text title + location + remote filter
df = search(query="rust", location="Berlin", remote=True, salary_min=80_000)

# Restrict to one ATS slice (smaller download)
df = search(query="data engineer", ats="ashby")

# Pandas all the way down
df.groupby("company").size().sort_values(ascending=False).head(20)

Every row carries:

url, title, company, ats_type, ats_id,
location, is_remote, lat, lon,
salary_min, salary_max, salary_currency, salary_period, salary_summary,
employment_type, commitment, experience, department, team,
description, posted_at, fetched_at, requisition_id, apply_url, raw

Optional fields are None when the source ATS doesn't expose them. raw keeps any provider-specific fields the canonical schema doesn't represent — Greenhouse metadata, Workday bulletFields, etc.
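
As a quick sketch of working with those optional fields (column names as listed above; treating the result as a regular pandas DataFrame, which is an assumption about dtypes rather than documented behavior):

from jobhive import search

df = search(query="backend engineer", location="Amsterdam")

# Keep only rows where the source ATS exposed structured salary data.
with_salary = df[df["salary_min"].notna()]
print(with_salary[["company", "title", "salary_min", "salary_max", "salary_currency"]].head())

# Inspect the provider-specific extras preserved in `raw` for one posting.
print(df.iloc[0]["raw"])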

2. Scrape your own companies

from jobhive.scrapers import GreenhouseScraper, LeverScraper, AshbyScraper

jobs = GreenhouseScraper("anthropic").fetch()    # → list[Job]
jobs = LeverScraper("palantir").fetch()
jobs = AshbyScraper("openai").fetch()

Or pick by name:

from jobhive.scrapers import get_scraper

scraper = get_scraper("ashby", "openai")
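
From there it's the same fetch() call. A short sketch, assuming Job objects expose the schema fields above as attributes:

jobs = scraper.fetch()
for job in jobs[:10]:
    # Attribute names follow the dataset schema; adjust if the Job model differs.
    print(job.title, "@", job.company, "-", job.location)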

Scrapers

Multi-tenant ATS (pass the company's slug on that ATS):

Greenhouse, Lever, Ashby, SmartRecruiters, Workable, Rippling, Personio, Gem, JoinCom, iCIMS, JazzHR, Breezy, Teamtailor, Pinpoint, BambooHR, Cornerstone, Recruitee, Recruiterbox, Eightfold, Avature, Phenom, Workday, Oracle, SuccessFactors, Taleo, Mercor.

Custom big-tech APIs (single-tenant, slug ignored): Amazon, Apple, Google, TikTok, Uber.

National public-sector aggregators: Bundesagentur (DE), Arbetsformedlingen (SE), Eures (EU/EEA-wide).

Hybrid job boards: WelcomeToTheJungle.

A few scrapers (Tesla, Meta) need a real browser session and ship as placeholders pending the optional browser backend in 0.2.

CLI

jobhive search "platform engineer" --location Paris --limit 20
jobhive scrape ashby openai
jobhive list-ats

Contributing

The goal is the largest open-source live job dataset on the internet. That's a forever project, and there's a clear path to make it bigger:

  • Add a new ATS scraper — every ATS we don't cover yet is a few thousand companies missing from the dataset. The scraper API is intentionally tiny: subclass BaseScraper, set ats, implement fetch() (see the sketch after this list). See any file under src/jobhive/scrapers/ for a 50-line reference, and the Job model in src/jobhive/models.py for the schema you populate.
  • Improve coverage on an existing ATS — many scrapers extract description / salary / employment-type only when the ATS surfaces them. If you find a tenant where a field is structurally available but we're missing it, a one-line PR is welcome.
  • Discover new tenants — we maintain a {ats}/{ats}_companies.csv per ATS. New rows = new companies in the dataset.
  • Report broken scrapers — open an issue with the slug and the failure mode. ATS APIs drift; flagging a regression early keeps the dataset accurate for everyone.
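
A rough sketch of what a new scraper might look like (the endpoint, import path, and Job constructor fields below are illustrative, not the real interface; check BaseScraper and the Job model in src/jobhive/models.py before copying anything):

import requests

from jobhive.models import Job
from jobhive.scrapers.base import BaseScraper  # actual module path may differ


class ExampleScraper(BaseScraper):
    """Hypothetical scraper for an ATS with a public JSON jobs endpoint."""

    ats = "exampleats"

    def fetch(self) -> list[Job]:
        # Illustrative endpoint; a real ATS will have its own URL scheme.
        url = f"https://api.exampleats.com/v1/{self.slug}/jobs"
        postings = requests.get(url, timeout=30).json()
        return [
            Job(
                title=p["title"],
                company=self.slug,
                ats_type=self.ats,
                ats_id=str(p["id"]),
                url=p["url"],
            )
            for p in postings
        ]

To set up a local dev environment:
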
git clone https://github.com/stapply-ai/ats-scrapers
cd ats-scrapers
uv pip install -e ".[dev,scrapers]"
pytest
ruff check .

PRs welcome on main. CI runs the full matrix of {3.11, 3.12, 3.13} × {ubuntu, macos}, all six jobs green; please keep it that way.

License

MIT.

Acknowledgments

Built with Reverse API Engineer.

Download files

Download the file for your platform.

Source Distribution

jobhive_py-0.1.0.tar.gz (251.4 kB)


Built Distribution


jobhive_py-0.1.0-py3-none-any.whl (242.1 kB)


File details

Details for the file jobhive_py-0.1.0.tar.gz.

File metadata

  • Download URL: jobhive_py-0.1.0.tar.gz
  • Upload date:
  • Size: 251.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.1

File hashes

Hashes for jobhive_py-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1ff2adb73568b70a3657f74089c91cb83597108a6b176a553e0a8e82571d83c6
MD5 fe8f736d8d54ff73eee8bd6b6e4ba99a
BLAKE2b-256 9cd65dfee2077871d469fbe3e9b1e68f94ab3dfbf70833f914e084ba40adc8f0


File details

Details for the file jobhive_py-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: jobhive_py-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 242.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.1

File hashes

Hashes for jobhive_py-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6dc221c5e8eeab5c8b5452f427f5e5b325e75c0ef59725ab0786e66f8a2a1b39
MD5 16d6917e002d9c34365acc475f83feb1
BLAKE2b-256 fdd8935bbf076485a7db6e891983d8db6e316f3c0171348162f653401539f262

