20-source job scraper library + MCP server. LinkedIn, Indeed, Glassdoor, Google, ZipRecruiter, Wellfound, Hiring Cafe, Greenhouse, USAJobs, Adzuna, Jooble, Findwork, The Muse, Insight Global, Clearance Jobs, Kforce, CollabWork, Naukri, Bayt, BDJobs.
jobdrop
A multi-source job scraper. Hits 20 job boards in one call, normalizes the results into a pandas DataFrame, and ships with anti-bot handling for the boards that block standard scrapers.
Maintainer: this project is maintained by kbwhodat. It is substantially extended from the original cullenwatson/JobSpy (MIT licensed), with new sources, an integrated MCP server, salary/seniority filters, and reliability fixes across all scrapers.
What's in here
20 sources
| site_name | Source | Notes |
|---|---|---|
| `linkedin` | LinkedIn | Public listings + optional detail-page enrichment |
| `indeed` | Indeed | GraphQL with per-company cap + paginate-until-quota |
| `glassdoor` | Glassdoor | Listings + company reviews + salary data |
| `google` | Google Jobs | SERP aggregation across many sources |
| `zip_recruiter` | ZipRecruiter | US/Canada-focused |
| `hiring_cafe` | Hiring Cafe | AI-curated, ~140 jobs/page with rich tags (seniority, comp, skills, workplace_type) |
| `wellfound` | Wellfound (formerly AngelList) | 50k+ startup roles |
| `collab_work` | CollabWork | Community/newsletter aggregator (~2k curated roles, fastest source) |
| `greenhouse` | Greenhouse-hosted boards | Most YC and Series A+ companies; 3-layer staleness filter |
| `bayt` | Bayt | Middle East focused |
| `naukri` | Naukri | India's largest job portal |
| `bdjobs` | BDJobs | Bangladesh's premier job portal |
| `usajobs` | USAJobs.gov | US federal public API |
| `adzuna` | Adzuna | Public API, 100% salary fill rate |
| `jooble` | Jooble | Public API, 60+ countries |
| `findwork` | Findwork.dev | Developer-focused public API |
| `the_muse` | The Muse | Culture-forward public API |
| `insight_global` | Insight Global staffing | Server-rendered listings |
| `clearance_jobs` | ClearanceJobs (DHI) | Security-cleared roles, full JD + salary + structured job_type |
| `kforce` | Kforce staffing | Direct backend API for fast results |
Quality + reliability tightening
- LinkedIn: salary extraction from description body, optional per-company cap, parallel detail fetches.
- Indeed: `radius` GraphQL fix, per-company cap to surface diverse employers, pagination loop hardened.
- ClearanceJobs: parallel detail-page fetch so you get the full JD, salary range, structured `job_type`, and an authoritative `remote` bool (vs. the API's 200-char preview alone).
- Greenhouse: three layers of stale-protection (404 drop / past application deadline / first-published age with a 90-day default that respects `hours_old`).
- Wellfound + Hiring Cafe: added with anti-bot handling that defeats the strictest CDN/WAF tiers in the catalog.
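The three Greenhouse staleness layers can be pictured as a single predicate. This is a minimal sketch, not jobdrop's actual code: the `Posting` record, its field names, and `is_stale` are hypothetical. The only details taken from the list above are the three checks and the 90-day default that `hours_old` overrides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_MAX_AGE_HOURS = 90 * 24  # 90-day default described above


@dataclass
class Posting:
    # Hypothetical record; field names are illustrative only.
    http_status: int
    first_published: datetime
    application_deadline: Optional[datetime] = None


def is_stale(posting: Posting, now: datetime, hours_old: Optional[int] = None) -> bool:
    """Return True if any of the three staleness layers rejects the posting."""
    # Layer 1: the posting page no longer exists.
    if posting.http_status == 404:
        return True
    # Layer 2: the application deadline has already passed.
    if posting.application_deadline is not None and posting.application_deadline < now:
        return True
    # Layer 3: first-published age, using hours_old when the caller sets it.
    max_hours = hours_old if hours_old is not None else DEFAULT_MAX_AGE_HOURS
    return now - posting.first_published > timedelta(hours=max_hours)


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
fresh = Posting(200, now - timedelta(days=10))
ancient = Posting(200, now - timedelta(days=120))
print(is_stale(fresh, now), is_stale(ancient, now), is_stale(fresh, now, hours_old=24))
```

Passing `hours_old=24` tightens the third layer, so a 10-day-old posting that passes the default cutoff is rejected.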
Bundled credentials
API keys for USAJobs, Adzuna, Jooble, Findwork, and The Muse are baked
into a positional resolver (jobdrop/_defaults.py) so the new sources
work without environment setup. User-set env vars still win via
setdefault semantics.
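As an illustration of the `setdefault` behaviour, the sketch below fills in a bundled value only when the variable is not already set. The variable names and placeholder values are hypothetical; the real names and keys live in `jobdrop/_defaults.py` and may be resolved differently.

```python
import os

# Hypothetical names and placeholder values, for illustration only.
BUNDLED_DEFAULTS = {
    "EXAMPLE_ADZUNA_APP_ID": "bundled-app-id",
    "EXAMPLE_JOOBLE_API_KEY": "bundled-key",
}


def apply_defaults(env=os.environ) -> None:
    """Fill in bundled values only for variables the user has not set."""
    for name, value in BUNDLED_DEFAULTS.items():
        env.setdefault(name, value)


os.environ["EXAMPLE_ADZUNA_APP_ID"] = "my-own-id"  # simulate a user override
os.environ.pop("EXAMPLE_JOOBLE_API_KEY", None)     # simulate an unset variable
apply_defaults()
print(os.environ["EXAMPLE_ADZUNA_APP_ID"])   # my-own-id
print(os.environ["EXAMPLE_JOOBLE_API_KEY"])  # bundled-key
```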
Installation
As a Python library
pip install -U jobdrop
Python ≥ 3.10 required.
As an MCP server (Claude Desktop / Claude Code / Cursor / Cline)
Install the binary once with uv tool install (or pipx install):
uv tool install "jobdrop[mcp]"
# or: pipx install "jobdrop[mcp]"
Then add to your MCP client config — e.g. ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "jobdrop": {
      "command": "jobdrop-mcp-server"
    }
  }
}
That's it — the client launches jobdrop-mcp-server as a stdio subprocess on demand. No daemon, no port, no nix.
Note: prefer the `uv tool install` path so the binary lands in PATH and the client launches it directly, the same pattern as the reference MCP servers (filesystem, git, etc.).
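If the client config already lists other servers, the entry can be merged in rather than pasted over the file. The helper below is a hypothetical convenience sketch, not part of jobdrop; it assumes the config is a JSON object with an optional `mcpServers` key, as in the snippet above.

```python
import json
import tempfile
from pathlib import Path


def add_jobdrop_server(config_path: Path) -> dict:
    """Add the jobdrop entry to an MCP client config, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["jobdrop"] = {"command": "jobdrop-mcp-server"}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a temporary file instead of a real client config.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    merged = add_jobdrop_server(path)
    print(sorted(merged["mcpServers"]))  # ['jobdrop', 'other']
```

The sketch does not handle an existing file that is empty or not valid JSON; back up the real config before pointing it at one.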
Usage
from jobdrop import scrape_jobs

jobs = scrape_jobs(
    site_name=["insight_global", "clearance_jobs", "kforce", "greenhouse",
               "linkedin", "indeed", "google"],
    search_term="site reliability engineer",
    location="Atlanta, GA",
    results_wanted=20,
    hours_old=720,  # 30-day freshness cap
    country_indeed="usa",
)

print(f"Found {len(jobs)} jobs")
print(jobs[["site", "title", "company", "location", "min_amount", "max_amount", "job_url"]].head())
Parameters
scrape_jobs(
site_name list[str] | str — any of the 20 sources above (default: all)
search_term str — keyword query
google_search_term str — Google Jobs override (only filter for `google`)
location str — "City, ST" or ZIP. Each scraper geocodes its own way.
distance int — radius miles, default 50
is_remote bool — remote-only filter (where supported)
job_type str — "fulltime" | "parttime" | "contract" | "internship"
easy_apply bool — direct-board apply only (LinkedIn easy-apply is broken)
results_wanted int — per-site target
offset int — pagination offset
hours_old int — drop postings older than N hours
country_indeed str — Indeed/Glassdoor country (see list below)
description_format str — "markdown" | "html"
enforce_annual_salary bool — convert hourly/monthly to yearly
linkedin_fetch_description bool — full JD + direct URL (slower)
linkedin_company_ids list[int] — filter LinkedIn by company IDs
proxies list[str] — round-robin proxies, "user:pass@host:port"
ca_cert str — CA cert path for proxies
user_agent str — override the default UA
verbose int — 0 errors / 1 warnings / 2 all
)
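To make `enforce_annual_salary` concrete, here is a minimal sketch of converting a quoted pay amount to a yearly figure. The multipliers are assumptions (a 40-hour week over 52 weeks, 260 working days); the factors jobdrop actually applies are not stated here and may differ. `to_annual` is a hypothetical helper.

```python
# Assumed multipliers (40-hour week, 52 weeks); jobdrop's actual factors may differ.
PERIODS_PER_YEAR = {
    "yearly": 1,
    "monthly": 12,
    "weekly": 52,
    "daily": 260,
    "hourly": 2080,
}


def to_annual(amount: float, interval: str) -> float:
    """Convert a pay amount quoted per `interval` into a yearly figure."""
    try:
        return amount * PERIODS_PER_YEAR[interval]
    except KeyError:
        raise ValueError(f"unknown interval: {interval!r}") from None


print(to_annual(50, "hourly"))     # 104000
print(to_annual(6000, "monthly"))  # 72000
```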
Per-scraper limitations
- Indeed: only one of `hours_old` / (`job_type` + `is_remote`) / `easy_apply` per call.
- LinkedIn: only one of `hours_old` / `easy_apply` per call.
- ClearanceJobs: location/remote filters require facet IDs from the dropdown endpoints (not implemented). Filter client-side or scope by keyword.
- InsightGlobal: does not expose the client-company name (it's the staffing firm). `is_remote` is not available in their data.
- Greenhouse: Google indexes some postings after they're filled. Stale 404s are filtered out; the freshness cutoff filters "live but ancient" postings (default 90 days, override with `hours_old`).
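The Indeed and LinkedIn "only one of" rules can be checked before a call is made. The functions below are a hypothetical pre-flight sketch built from the two rules listed above; how jobdrop itself reacts when several of these filters are combined is not specified here.

```python
from typing import Optional


def active_filter_groups(site: str, hours_old: Optional[int] = None,
                         job_type: Optional[str] = None, is_remote: bool = False,
                         easy_apply: bool = False) -> list[str]:
    """List which mutually exclusive filter groups a call would activate."""
    groups = []
    if hours_old is not None:
        groups.append("hours_old")
    # On Indeed, job_type and is_remote count together as one group.
    if site == "indeed" and (job_type is not None or is_remote):
        groups.append("job_type/is_remote")
    if easy_apply:
        groups.append("easy_apply")
    return groups


def check_filters(site: str, **filters) -> None:
    """Raise if more than one exclusive group is set for Indeed or LinkedIn."""
    if site not in ("indeed", "linkedin"):
        return
    groups = active_filter_groups(site, **filters)
    if len(groups) > 1:
        raise ValueError(f"{site}: pick one of {groups}")


check_filters("indeed", job_type="fulltime", is_remote=True)  # one group, accepted
try:
    check_filters("linkedin", hours_old=72, easy_apply=True)
except ValueError as exc:
    print(exc)
```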
JobPost schema
JobPost
├── id, title, company_name, company_url, job_url
├── location { country, city, state }
├── description
├── is_remote
├── date_posted
├── job_type fulltime | parttime | contract | internship
├── compensation
│ ├── interval yearly | monthly | weekly | daily | hourly
│ ├── min_amount, max_amount, currency
│ └── salary_source
├── job_level (LinkedIn, ClearanceJobs)
├── company_industry (LinkedIn, Indeed, Greenhouse, Kforce)
├── company_country, company_addresses,
│ company_employees_label, company_revenue_label,
│ company_description, company_logo (Indeed)
├── skills, experience_range,
│ company_rating, company_reviews_count,
│ vacancy_count, work_from_home_type (Naukri)
└── emails
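One way to read the tree is as nested records. The dataclasses below are a sketch of the core fields only; the types, defaults, and optionality are assumptions, source-specific fields are omitted, and the library's own model may be defined differently.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Location:
    country: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None


@dataclass
class Compensation:
    interval: Optional[str] = None  # yearly | monthly | weekly | daily | hourly
    min_amount: Optional[float] = None
    max_amount: Optional[float] = None
    currency: Optional[str] = None
    salary_source: Optional[str] = None


@dataclass
class JobPost:
    title: str
    job_url: str
    id: Optional[str] = None
    company_name: Optional[str] = None
    company_url: Optional[str] = None
    location: Location = field(default_factory=Location)
    description: Optional[str] = None
    is_remote: Optional[bool] = None
    date_posted: Optional[date] = None
    job_type: Optional[str] = None  # fulltime | parttime | contract | internship
    compensation: Optional[Compensation] = None
    emails: list[str] = field(default_factory=list)


post = JobPost(
    title="Site Reliability Engineer",
    job_url="https://example.com/jobs/1",
    location=Location(country="USA", city="Atlanta", state="GA"),
    compensation=Compensation(interval="yearly", min_amount=120000,
                              max_amount=150000, currency="USD"),
)
print(post.location.city, post.compensation.max_amount)
```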
Indeed / Glassdoor country list
Pass country_indeed (use the exact name; * = also supported on Glassdoor):
| Argentina | Australia* | Austria* | Bahrain |
| Belgium* | Brazil* | Canada* | Chile |
| China | Colombia | Costa Rica | Czech Republic |
| Denmark | Ecuador | Egypt | Finland |
| France* | Germany* | Greece | Hong Kong* |
| Hungary | India* | Indonesia | Ireland* |
| Israel | Italy* | Japan | Kuwait |
| Luxembourg | Malaysia | Mexico* | Morocco |
| Netherlands* | New Zealand* | Nigeria | Norway |
| Oman | Pakistan | Panama | Peru |
| Philippines | Poland | Portugal | Qatar |
| Romania | Saudi Arabia | Singapore* | South Africa |
| South Korea | Spain* | Sweden | Switzerland* |
| Taiwan | Thailand | Turkey | Ukraine |
| United Arab Emirates | UK* | USA* | Uruguay |
| Venezuela | Vietnam* |
LinkedIn searches globally and uses only location. ZipRecruiter is US/Canada and uses only location. Bayt searches internationally with only search_term.
Notes
- Most boards cap a single search at ~1000 results.
- LinkedIn rate-limits aggressively around the 10th page of pagination on a single IP. Use `proxies`.
- For Indeed search-term tuning: it searches the description too. Use `-foo` to exclude, `"exact phrase"` for exact match. Example: `search_term='"site reliability engineer" (kubernetes OR terraform) -recruiter'`
- For Google: copy the exact filter syntax from a real Google Jobs search and pass it as `google_search_term`.
- For Greenhouse: keyword + location are passed straight to a Google `site:greenhouse.io` query, so Boolean operators and quotes work. Don't quote the full `"City, ST"`; quote the city alone and leave the state bare.
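The Greenhouse quoting advice can be expressed as a small formatter for the `location` argument. Both helpers are hypothetical: `greenhouse_location` only applies the quoting rule from the note, and `preview_query` is a guess at what the resulting Google query might look like, since the exact composition inside jobdrop is not documented here.

```python
def greenhouse_location(city: str, state: str = "") -> str:
    """Quote the city alone and leave the state bare."""
    quoted_city = f'"{city}"'
    return f"{quoted_city} {state}".strip()


def preview_query(search_term: str, location: str) -> str:
    """Hypothetical preview of the Google query; the real composition may differ."""
    return f"site:greenhouse.io {search_term} {location}"


location = greenhouse_location("Atlanta", "GA")
print(location)  # "Atlanta" GA
print(preview_query('"site reliability engineer" (kubernetes OR terraform)', location))
```

The formatted string would then be passed as `location` when `greenhouse` is among the requested sites.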
License
MIT. See LICENSE.