jobhive
The open dataset and toolkit for global job market data. 3.3M+ live jobs from 400 000+ companies, scraped directly from the ATS platforms where companies actually post. No LinkedIn, no reposts, no recruiters.
from jobhive import search
df = search(query="ml engineer", location="Paris", remote=True)
No API key, no auth, no rate limits. The dataset refreshes every 24 hours.
Why jobhive
Most job aggregators scrape LinkedIn and Indeed — both full of duplicates, ghost listings, and reposts. jobhive goes one layer down: directly to the ATS platforms (Greenhouse, Lever, Ashby, Workday, BambooHR…) where companies actually post.
- Single source of truth — every row comes from the company's own ATS, so titles, locations, and salaries are accurate.
- No duplicates — one ATS posting = one row.
- Structured salary when the ATS exposes it (Ashby, Greenhouse Pay Transparency, Lever salaryRange, etc.).
- MIT licensed, fully open — fork the dataset, fork the scrapers.
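The no-duplicates guarantee boils down to an identity key: a posting is identified by its (ats_type, ats_id) pair (both fields appear in the schema further down). A minimal sketch of deduplicating on that key, with invented sample rows:

```python
# One ATS posting = one row: keying on (ats_type, ats_id) collapses
# reposts while keeping distinct postings from different ATS platforms.
rows = [
    {"ats_type": "greenhouse", "ats_id": "101", "title": "Data Engineer"},
    {"ats_type": "greenhouse", "ats_id": "101", "title": "Data Engineer"},  # repost
    {"ats_type": "lever", "ats_id": "101", "title": "Data Engineer"},  # different ATS, kept
]

seen = {}
for row in rows:
    # setdefault keeps the first row seen for each key and ignores repeats
    seen.setdefault((row["ats_type"], row["ats_id"]), row)

deduped = list(seen.values())
print(len(deduped))  # → 2
```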
Coverage
| Metric | Value |
|---|---|
| Live jobs | 3 376 000+ |
| Companies | 406 000+ |
| ATS platforms | 31 |
Top 10 by job count:
| ATS | Jobs |
|---|---|
| Bundesagentur (DE public-sector) | 931 049 |
| Workday | 653 041 |
| EURES (EU/EEA public-sector) | 626 783 |
| SmartRecruiters | 213 372 |
| SuccessFactors | 180 499 |
| Greenhouse | 110 071 |
| Oracle HCM | 107 464 |
| iCIMS | 92 211 |
| Lever | 60 342 |
| Phenom | 56 483 |
Counts come from the live manifest at
https://storage.stapply.ai/jobhive/v1/manifest.json — verify any time
with jobhive list-ats.
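A quick way to sanity-check the coverage numbers is to total the per-ATS counts yourself. The manifest shape below is an assumption for illustration only; the real file at that URL may be structured differently:

```python
import json

# Hypothetical manifest shape; the live manifest.json may differ.
manifest = json.loads("""
{
  "ats": [
    {"name": "greenhouse", "jobs": 110071},
    {"name": "lever", "jobs": 60342}
  ]
}
""")

# Sum the per-ATS job counts into a grand total.
total = sum(entry["jobs"] for entry in manifest["ats"])
print(total)  # → 170413
```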
Install
pip install jobhive-py
Distributed as jobhive-py on PyPI; the import name is still jobhive.
Optional extras:
pip install "jobhive-py[parquet]" # faster downloads via Apache Parquet
pip install "jobhive-py[scrapers]" # build your own pipeline
pip install "jobhive-py[all]"
Two ways to use it
1. Query the public dataset
from jobhive import search
# Free-text title + location + remote filter
df = search(query="rust", location="Berlin", remote=True, salary_min=80_000)
# Restrict to one ATS slice (smaller download)
df = search(query="data engineer", ats="ashby")
# Pandas all the way down
df.groupby("company").size().sort_values(ascending=False).head(20)
Every row carries:
url, title, company, ats_type, ats_id,
location, is_remote, lat, lon,
salary_min, salary_max, salary_currency, salary_period, salary_summary,
employment_type, commitment, experience, department, team,
description, posted_at, fetched_at, requisition_id, apply_url, raw
Optional fields are None when the source ATS doesn't expose them.
raw keeps any provider-specific fields the canonical schema doesn't
represent — Greenhouse metadata, Workday bulletFields, etc.
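Because any optional field may be None, downstream code should guard each access. A purely illustrative helper (format_salary is not part of the library) that builds a readable range while tolerating missing pieces:

```python
def format_salary(salary_min, salary_max, salary_currency, salary_period):
    """Build a human-readable salary range; any argument may be None."""
    if salary_min is None and salary_max is None:
        return None
    lo = f"{salary_min:,.0f}" if salary_min is not None else "?"
    hi = f"{salary_max:,.0f}" if salary_max is not None else "?"
    parts = [f"{lo}-{hi}"]
    if salary_currency:
        parts.append(salary_currency)
    if salary_period:
        parts.append(f"per {salary_period}")
    return " ".join(parts)

print(format_salary(80_000, 120_000, "EUR", "year"))  # → 80,000-120,000 EUR per year
print(format_salary(None, None, "EUR", "year"))       # → None
```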
2. Scrape your own companies
from jobhive.scrapers import GreenhouseScraper, LeverScraper, AshbyScraper
jobs = GreenhouseScraper("anthropic").fetch() # → list[Job]
jobs = LeverScraper("palantir").fetch()
jobs = AshbyScraper("openai").fetch()
Or pick by name:
from jobhive.scrapers import get_scraper
scraper = get_scraper("ashby", "openai")
Scrapers
Multi-tenant ATS (pass the company's slug on that ATS):
Greenhouse, Lever, Ashby, SmartRecruiters, Workable,
Rippling, Personio, Gem, JoinCom, iCIMS, JazzHR, Breezy,
Teamtailor, Pinpoint, BambooHR, Cornerstone, Recruitee,
Recruiterbox, Eightfold, Avature, Phenom, Workday, Oracle,
SuccessFactors, Taleo, Mercor.
Custom big-tech APIs (single-tenant, slug ignored): Amazon,
Apple, Google, TikTok, Uber.
National public-sector aggregators: Bundesagentur (DE),
Arbetsförmedlingen (SE), EURES (EU/EEA-wide).
Hybrid jobboards: WelcomeToTheJungle.
A few scrapers (Tesla, Meta) need a real browser session and ship as
placeholders pending the optional browser backend in 0.2.
CLI
jobhive search "platform engineer" --location Paris --limit 20
jobhive scrape ashby openai
jobhive list-ats
Contributing
The goal is the largest open-source live job dataset on the internet. That's a forever project, and there's a clear path to make it bigger:
- Add a new ATS scraper — every ATS we don't cover yet is a few thousand companies missing from the dataset. The scraper API is intentionally tiny: subclass BaseScraper, set ats, implement fetch(). See any file under src/jobhive/scrapers/ for a 50-line reference, and the Job model in src/jobhive/models.py for the schema you populate.
- Improve coverage on an existing ATS — many scrapers extract description / salary / employment-type only when the ATS surfaces them. If you find a tenant where a field is structurally available but we're missing it, a one-line PR is welcome.
- Discover new tenants — we maintain a {ats}/{ats}_companies.csv per ATS. New rows = new companies in the dataset.
- Report broken scrapers — open an issue with the slug and the failure mode. ATS APIs drift; flagging a regression early keeps the dataset accurate for everyone.
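The subclass-BaseScraper, set-ats, implement-fetch() contract can be sketched end to end. Job and BaseScraper below are simplified stand-ins so the example is self-contained (the real classes live in src/jobhive/models.py and src/jobhive/scrapers/), and "exampleats" is an invented ATS:

```python
from dataclasses import dataclass

# Simplified stand-in for the Job model; the real one carries the
# full schema (location, salary fields, raw, etc.).
@dataclass
class Job:
    url: str
    title: str
    company: str
    ats_type: str
    ats_id: str

# Simplified stand-in for jobhive.scrapers.BaseScraper.
class BaseScraper:
    ats: str = ""

    def __init__(self, company: str):
        self.company = company

    def fetch(self) -> list[Job]:
        raise NotImplementedError

class ExampleATSScraper(BaseScraper):
    ats = "exampleats"  # hypothetical ATS identifier

    def fetch(self) -> list[Job]:
        # A real scraper would call the ATS API here; we fake one posting.
        postings = [{"id": "123", "title": "ML Engineer",
                     "url": "https://example.invalid/jobs/123"}]
        return [Job(url=p["url"], title=p["title"], company=self.company,
                    ats_type=self.ats, ats_id=p["id"]) for p in postings]

jobs = ExampleATSScraper("acme").fetch()
print(jobs[0].ats_type, jobs[0].ats_id)  # → exampleats 123
```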
git clone https://github.com/stapply-ai/ats-scrapers
cd ats-scrapers
uv pip install -e ".[dev,scrapers]"
pytest
ruff check .
PRs welcome on main. CI is green for all 6 of {3.11, 3.12, 3.13} ×
{ubuntu, macos}; please keep it that way.
License
MIT.
Acknowledgments
Built with Reverse API Engineer.
File details
Details for the file jobhive_py-0.1.0.tar.gz.
File metadata
- Download URL: jobhive_py-0.1.0.tar.gz
- Size: 251.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1ff2adb73568b70a3657f74089c91cb83597108a6b176a553e0a8e82571d83c6 |
| MD5 | fe8f736d8d54ff73eee8bd6b6e4ba99a |
| BLAKE2b-256 | 9cd65dfee2077871d469fbe3e9b1e68f94ab3dfbf70833f914e084ba40adc8f0 |
File details
Details for the file jobhive_py-0.1.0-py3-none-any.whl.
File metadata
- Download URL: jobhive_py-0.1.0-py3-none-any.whl
- Size: 242.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6dc221c5e8eeab5c8b5452f427f5e5b325e75c0ef59725ab0786e66f8a2a1b39 |
| MD5 | 16d6917e002d9c34365acc475f83feb1 |
| BLAKE2b-256 | fdd8935bbf076485a7db6e891983d8db6e316f3c0171348162f653401539f262 |