Stealth job scraper for LinkedIn, Indeed & Glassdoor — powered by Scrapling
JobQuest
A Python library that scrapes real job postings from LinkedIn, Indeed, and Glassdoor — and actually gets results back instead of 403 errors.
Built on top of Scrapling for Chrome-level TLS fingerprinting and Cloudflare bypass. Every request carries a real Chrome TLS fingerprint and realistic browser headers, and Cloudflare-protected pages are fetched with an actual headless Chromium browser.
Why this exists
Most job scraping libraries break within weeks. Cloudflare updates its bot detection, LinkedIn starts rate-limiting harder, Glassdoor changes its GraphQL schema — and suddenly you're staring at empty DataFrames.
JobQuest handles this with stealth-first defaults:
- Chrome TLS fingerprinting via `curl_cffi`: your requests have the same TLS signature as Chrome 138
- Cloudflare bypass via Scrapling's `StealthyFetcher` when needed (full headless Chromium)
- Realistic headers generated by `browserforge`, not hardcoded strings from 2023
- Automatic fallback: if stealth isn't installed, it falls back to `requests`/`tls_client`
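The automatic fallback follows a common optional-dependency pattern. Here is a minimal sketch, assuming the choice depends only on which packages are importable. `pick_backend` is a hypothetical helper written for this example, not part of JobQuest's API, and the library's own detection logic may differ.

```python
from importlib.util import find_spec
from typing import Optional, Sequence


def pick_backend(candidates: Sequence[str]) -> Optional[str]:
    """Return the first importable top-level package name, or None."""
    for name in candidates:
        # find_spec returns None when a top-level package is not installed.
        if find_spec(name) is not None:
            return name
    return None


# Assumed preference order: stealth stack first, plain HTTP clients last.
backend = pick_backend(["scrapling", "tls_client", "requests"])
print(f"HTTP backend: {backend}")
```

Checking with `find_spec` avoids importing a heavy package just to learn whether it is available.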
Install
pip install jobquest
Quick start
from jobquest import scrape_jobs

jobs = scrape_jobs(
    site_name=["linkedin", "indeed", "glassdoor"],
    search_term="AI Engineer",
    location="Germany",
    results_wanted=25,
    country_indeed="germany",
)
print(f"{len(jobs)} jobs found")
print(jobs[["title", "company", "location"]])
That's it. The call returns a pandas DataFrame with all the job data you'd expect.
What you get back
Each row in the DataFrame has:
| Column | Description |
|---|---|
| `title` | Job title |
| `company` | Company name |
| `location` | City, state, country |
| `job_url` | Link to the posting |
| `date_posted` | When it was listed |
| `description` | Full job description (markdown by default) |
| `salary_source` | Where salary info came from |
| `min_amount` / `max_amount` | Salary range |
| `is_remote` | Remote or not |
| `job_type` | Full-time, part-time, contract, etc. |
| `company_url` | Company page link |
| `emails` | Contact emails found in the description |
Plus site-specific fields like job_level, job_function, and more.
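To illustrate how a field like emails can be derived from description, here is a minimal sketch using a deliberately simple regular expression. The pattern and the helper name `extract_emails` are assumptions made for this example. JobQuest's own extraction may work differently.

```python
import re
from typing import List

# Simplified pattern: good enough for typical addresses, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(description: str) -> List[str]:
    """Return unique email addresses in order of first appearance."""
    matches = EMAIL_RE.findall(description)
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(matches))


text = "Contact jobs@example.com or hr@example.org. Again: jobs@example.com"
print(extract_emails(text))
```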
Supported sites
| Site | Status | Notes |
|---|---|---|
| LinkedIn | Working | Guest API, no login needed |
| Indeed | Working | GraphQL API, mobile app headers |
| Glassdoor | Working | GraphQL API, auto CSRF token |
All the options
from jobquest import scrape_jobs

jobs = scrape_jobs(
    site_name="linkedin",              # or ["linkedin", "indeed", "glassdoor"]
    search_term="machine learning",
    location="Berlin",
    distance=50,                       # km radius
    is_remote=True,
    job_type="fulltime",               # fulltime, parttime, contract, internship
    easy_apply=True,                   # LinkedIn/Glassdoor easy apply filter
    results_wanted=50,
    country_indeed="germany",          # for Indeed's country-specific API
    hours_old=72,                      # only jobs posted in last 72 hours
    linkedin_fetch_description=True,   # fetch full descriptions (slower)
    description_format="markdown",     # markdown, html, or plain
    enforce_annual_salary=True,        # normalize all salaries to yearly
    proxies=["http://user:pass@proxy:8080"],
    verbose=2,                         # 0=errors, 1=warnings, 2=info
)
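To make the effect of enforce_annual_salary concrete, here is a rough sketch of what normalizing salaries to yearly amounts can look like. The interval names and the multipliers (a 40-hour week, 52 weeks, 260 working days) are assumptions chosen for illustration. They are not taken from JobQuest, which may use different conventions.

```python
# Assumed conversion factors from a pay interval to one year.
ANNUAL_FACTORS = {
    "hourly": 2080,   # 40 hours x 52 weeks
    "daily": 260,     # 5 days x 52 weeks
    "weekly": 52,
    "monthly": 12,
    "yearly": 1,
}


def to_annual(amount: float, interval: str) -> float:
    """Convert a salary amount for the given interval to a yearly amount."""
    try:
        factor = ANNUAL_FACTORS[interval]
    except KeyError:
        raise ValueError(f"unknown pay interval: {interval!r}") from None
    return amount * factor


print(to_annual(50, "hourly"))
print(to_annual(5000, "monthly"))
```

With every row on the same yearly scale, min_amount and max_amount can be compared across postings that quote hourly and monthly pay.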
How the stealth works
JobQuest uses a two-tier approach:
Tier 1 -- StealthSession (default for all requests)
Every HTTP request goes through Scrapling's FetcherSession, which uses curl_cffi to impersonate Chrome's TLS fingerprint. Headers are generated by browserforge to match real browser patterns. This is enough for LinkedIn, Indeed, and most API calls.
Tier 2 -- StealthyFetcher (for Cloudflare-protected pages)
When a site puts up a Cloudflare challenge (like Glassdoor's CSRF page), JobQuest launches a real headless Chromium browser via patchright to solve it. Cookies from that session get transferred back to the lighter HTTP session for subsequent requests.
If scrapling isn't installed, everything falls back to requests + tls_client -- the same approach most job scrapers use. You just lose the stealth advantage.
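The two-tier flow can be sketched without any real network calls. This is a simplified illustration, not JobQuest's code: `Response`, `looks_like_challenge`, `fetch_with_escalation`, and the two stand-in fetchers are all hypothetical, and the challenge heuristic (status code plus a banner string) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    status: int
    body: str
    cookies: Dict[str, str] = field(default_factory=dict)


def looks_like_challenge(resp: Response) -> bool:
    # Assumed heuristic: a 403/503 status or a challenge banner in the body.
    return resp.status in (403, 503) or "Just a moment" in resp.body


def fetch_with_escalation(
    url: str,
    light_fetch: Callable[[str, Dict[str, str]], Response],
    browser_fetch: Callable[[str], Response],
    cookies: Dict[str, str],
) -> Response:
    # Tier 1: cheap HTTP request with whatever cookies we already hold.
    resp = light_fetch(url, cookies)
    if not looks_like_challenge(resp):
        return resp
    # Tier 2: let the browser solve the challenge, then reuse its cookies.
    solved = browser_fetch(url)
    cookies.update(solved.cookies)
    return light_fetch(url, cookies)


# Stand-in fetchers so the sketch runs offline.
def fake_light_fetch(url: str, cookies: Dict[str, str]) -> Response:
    if cookies.get("cf_clearance") == "token":
        return Response(200, "job data")
    return Response(403, "Just a moment...")


def fake_browser_fetch(url: str) -> Response:
    return Response(200, "solved", {"cf_clearance": "token"})


jar: Dict[str, str] = {}
result = fetch_with_escalation(
    "https://example.com/jobs", fake_light_fetch, fake_browser_fetch, jar
)
print(result.status, result.body, jar)
```

Passing the fetchers in as callables keeps the escalation logic independent of the HTTP client, so the browser tier is only paid for when the light tier is blocked.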
Using proxies
from jobquest import scrape_jobs

jobs = scrape_jobs(
    site_name="linkedin",
    search_term="data engineer",
    proxies=["http://user:pass@proxy1:8080", "socks5://proxy2:1080"],
)
Proxies rotate automatically between requests.
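One simple way to rotate proxies is round-robin. The sketch below assumes that strategy. The text above only says that proxies rotate, so JobQuest may pick them differently, and `proxy_rotation` is a hypothetical helper.

```python
from itertools import cycle
from typing import Iterator, List


def proxy_rotation(proxies: List[str]) -> Iterator[str]:
    """Return an iterator that loops over the proxies in round-robin order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    return cycle(proxies)


rotation = proxy_rotation(
    ["http://user:pass@proxy1:8080", "socks5://proxy2:1080"]
)
# Each request would take the next proxy from the rotation.
picked = [next(rotation) for _ in range(3)]
print(picked)
```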
Export to CSV / Excel
import csv

jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)
# Writing .xlsx requires an Excel engine such as openpyxl to be installed
jobs.to_excel("jobs.xlsx", index=False)
Requirements
- Python 3.10+
- The stealth stack installs automatically: `scrapling`, `curl_cffi`, `browserforge`, `patchright`, `playwright`
License
MIT