

Project description

jobhive

The open dataset and toolkit for global job market data. 3.3M+ live jobs from 400 000+ companies, scraped directly from the ATS platforms where companies actually post. No LinkedIn, no reposts, no recruiters.


from jobhive import search

df = search(query="ml engineer", location="Paris", remote=True)

No API key, no auth, no rate limits. The dataset refreshes every 24 hours.


Why jobhive

Most job aggregators scrape LinkedIn and Indeed — both full of duplicates, ghost listings, and reposts. jobhive goes one layer down: directly to the ATS platforms (Greenhouse, Lever, Ashby, Workday, BambooHR…) where companies actually post.

  • Single source of truth — every row comes from the company's own ATS, so titles, locations, and salaries are accurate.
  • No duplicates — one ATS posting = one row.
  • Structured salary when the ATS exposes it (Ashby, Greenhouse Pay Transparency, Lever salaryRange, etc.).
  • MIT licensed, fully open — fork the dataset, fork the scrapers.

Coverage

  • Live jobs: 3,376,000+
  • Companies: 406,000+
  • ATS platforms: 31

Top 10 by job count:

  • Bundesagentur (DE public-sector): 931,049
  • Workday: 653,041
  • EURES (EU/EEA public-sector): 626,783
  • SmartRecruiters: 213,372
  • SuccessFactors: 180,499
  • Greenhouse: 110,071
  • Oracle HCM: 107,464
  • iCIMS: 92,211
  • Lever: 60,342
  • Phenom: 56,483

Counts come from the live manifest at https://storage.stapply.ai/jobhive/v1/manifest.json — verify any time with jobhive list-ats.
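
If you'd rather check the numbers programmatically than via the CLI, here is a minimal sketch that just fetches and dumps the manifest (the manifest's exact schema isn't documented here, so the code makes no assumptions about its fields):

import json
import urllib.request

MANIFEST_URL = "https://storage.stapply.ai/jobhive/v1/manifest.json"

# Fetch the live manifest and print the first part of its structure.
with urllib.request.urlopen(MANIFEST_URL) as resp:
    manifest = json.load(resp)

print(json.dumps(manifest, indent=2)[:1000])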

Install

pip install jobhive-py

The package is distributed on PyPI as jobhive-py; the import name is jobhive.

Optional extras:

pip install "jobhive-py[parquet]"     # faster downloads via Apache Parquet
pip install "jobhive-py[scrapers]"    # build your own pipeline
pip install "jobhive-py[all]"

Two ways to use it

1. Query the public dataset

from jobhive import search

# Free-text title + location + remote filter
df = search(query="rust", location="Berlin", remote=True, salary_min=80_000)

# Restrict to one ATS slice (smaller download)
df = search(query="data engineer", ats="ashby")

# Pandas all the way down
df.groupby("company").size().sort_values(ascending=False).head(20)

Every row carries:

url, title, company, ats_type, ats_id,
location, is_remote, lat, lon,
salary_min, salary_max, salary_currency, salary_period, salary_summary,
employment_type, commitment, experience, department, team,
description, posted_at, fetched_at, requisition_id, apply_url, raw

Optional fields are None when the source ATS doesn't expose them. raw keeps any provider-specific fields the canonical schema doesn't represent — Greenhouse metadata, Workday bulletFields, etc.
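
As a quick sketch of working with those optional fields (column names as listed above; treating the result as a regular pandas DataFrame, which is an assumption about dtypes rather than documented behavior):

from jobhive import search

df = search(query="backend engineer", location="Amsterdam")

# Keep only rows where the source ATS exposed structured salary data.
with_salary = df[df["salary_min"].notna()]
print(with_salary[["company", "title", "salary_min", "salary_max", "salary_currency"]].head())

# Inspect the provider-specific extras preserved in `raw` for one posting.
print(df.iloc[0]["raw"])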

2. Scrape your own companies

from jobhive.scrapers import GreenhouseScraper, LeverScraper, AshbyScraper

jobs = GreenhouseScraper("anthropic").fetch()    # → list[Job]
jobs = LeverScraper("palantir").fetch()
jobs = AshbyScraper("openai").fetch()

Or pick by name:

from jobhive.scrapers import get_scraper

scraper = get_scraper("ashby", "openai")
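
From there it's the same fetch() call. A short sketch, assuming Job objects expose the schema fields above as attributes:

jobs = scraper.fetch()
for job in jobs[:10]:
    # Attribute names follow the dataset schema; adjust if the Job model differs.
    print(job.title, "@", job.company, "-", job.location)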

Scrapers

Multi-tenant ATS (pass the company's slug on that ATS):

Greenhouse, Lever, Ashby, SmartRecruiters, Workable, Rippling, Personio, Gem, JoinCom, iCIMS, JazzHR, Breezy, Teamtailor, Pinpoint, BambooHR, Cornerstone, Recruitee, Recruiterbox, Eightfold, Avature, Phenom, Workday, Oracle, SuccessFactors, Taleo, Mercor.

Custom big-tech APIs (single-tenant, slug ignored): Amazon, Apple, Google, TikTok, Uber.

National public-sector aggregators: Bundesagentur (DE), Arbetsformedlingen (SE), Eures (EU/EEA-wide).

Hybrid job boards: WelcomeToTheJungle.

A few scrapers (Tesla, Meta) need a real browser session and ship as placeholders pending the optional browser backend in 0.2.

CLI

jobhive search "platform engineer" --location Paris --limit 20
jobhive scrape ashby openai
jobhive list-ats

Contributing

The goal is the largest open-source live job dataset on the internet. That's a forever project, and there's a clear path to make it bigger:

  • Add a new ATS scraper — every ATS we don't cover yet is a few thousand companies missing from the dataset. The scraper API is intentionally tiny: subclass BaseScraper, set ats, implement fetch() (see the sketch after this list). See any file under src/jobhive/scrapers/ for a 50-line reference, and the Job model in src/jobhive/models.py for the schema you populate.
  • Improve coverage on an existing ATS — many scrapers extract description / salary / employment-type only when the ATS surfaces them. If you find a tenant where a field is structurally available but we're missing it, a one-line PR is welcome.
  • Discover new tenants — we maintain a {ats}/{ats}_companies.csv per ATS. New rows = new companies in the dataset.
  • Report broken scrapers — open an issue with the slug and the failure mode. ATS APIs drift; flagging a regression early keeps the dataset accurate for everyone.
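
A rough sketch of what a new scraper might look like (the endpoint, import path, and Job constructor fields below are illustrative, not the real interface; check BaseScraper and the Job model in src/jobhive/models.py before copying anything):

import requests

from jobhive.models import Job
from jobhive.scrapers.base import BaseScraper  # actual module path may differ


class ExampleScraper(BaseScraper):
    """Hypothetical scraper for an ATS with a public JSON jobs endpoint."""

    ats = "exampleats"

    def fetch(self) -> list[Job]:
        # Illustrative endpoint; a real ATS will have its own URL scheme.
        url = f"https://api.exampleats.com/v1/{self.slug}/jobs"
        postings = requests.get(url, timeout=30).json()
        return [
            Job(
                title=p["title"],
                company=self.slug,
                ats_type=self.ats,
                ats_id=str(p["id"]),
                url=p["url"],
            )
            for p in postings
        ]

To set up a local dev environment:
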
git clone https://github.com/stapply-ai/ats-scrapers
cd ats-scrapers
uv pip install -e ".[dev,scrapers]"
pytest
ruff check .

PRs welcome on main. CI runs the full matrix of {3.11, 3.12, 3.13} × {ubuntu, macos}, all six jobs green; please keep it that way.

License

MIT.

Acknowledgments

Built with Reverse API Engineer.

Download files

Download the file for your platform.

Source Distribution

jobhive_py-0.1.0.tar.gz (251.4 kB)


Built Distribution


jobhive_py-0.1.0-py3-none-any.whl (242.1 kB)


File details

Details for the file jobhive_py-0.1.0.tar.gz.

File metadata

  • Download URL: jobhive_py-0.1.0.tar.gz
  • Upload date:
  • Size: 251.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.1

File hashes

Hashes for jobhive_py-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1ff2adb73568b70a3657f74089c91cb83597108a6b176a553e0a8e82571d83c6
MD5 fe8f736d8d54ff73eee8bd6b6e4ba99a
BLAKE2b-256 9cd65dfee2077871d469fbe3e9b1e68f94ab3dfbf70833f914e084ba40adc8f0


File details

Details for the file jobhive_py-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: jobhive_py-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 242.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.1

File hashes

Hashes for jobhive_py-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6dc221c5e8eeab5c8b5452f427f5e5b325e75c0ef59725ab0786e66f8a2a1b39
MD5 16d6917e002d9c34365acc475f83feb1
BLAKE2b-256 fdd8935bbf076485a7db6e891983d8db6e316f3c0171348162f653401539f262

