Stealth job scraper for LinkedIn, Indeed & Glassdoor — powered by Scrapling

JobQuest

A Python library that scrapes real job postings from LinkedIn, Indeed, and Glassdoor — and actually gets results back instead of 403 errors.

Built on top of Scrapling for Chrome-level TLS fingerprinting and Cloudflare bypass. Every request looks like it came from a real browser, because under the hood it basically did.

Why this exists

Most job scraping libraries break within weeks. Cloudflare updates its bot detection, LinkedIn starts rate-limiting harder, Glassdoor changes its GraphQL schema — and suddenly you're staring at empty DataFrames.

JobQuest handles this with stealth-first defaults:

  • Chrome TLS fingerprinting via curl_cffi — your requests have the same TLS signature as Chrome 138
  • Cloudflare bypass via Scrapling's StealthyFetcher when needed (full headless Chromium)
  • Realistic headers generated by browserforge — not hardcoded strings from 2023
  • Automatic fallback — if stealth isn't installed, it falls back to requests/tls_client

Install

pip install jobquest

Quick start

from jobquest import scrape_jobs

jobs = scrape_jobs(
    site_name=["linkedin", "indeed", "glassdoor"],
    search_term="AI Engineer",
    location="Germany",
    results_wanted=25,
    country_indeed="germany",
)

print(f"{len(jobs)} jobs found")
print(jobs[["title", "company", "location"]])

That's it. scrape_jobs returns a pandas DataFrame with all the job data you'd expect.

What you get back

Each row in the DataFrame has:

Column                     Description
title                      Job title
company                    Company name
location                   City, state, country
job_url                    Link to the posting
date_posted                When it was listed
description                Full job description (markdown by default)
salary_source              Where salary info came from
min_amount / max_amount    Salary range
is_remote                  Remote or not
job_type                   Full-time, part-time, contract, etc.
company_url                Company page link
emails                     Contact emails found in the description

Plus site-specific fields like job_level, job_function, and more.
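
To make the emails column concrete, here is a minimal sketch of how contact addresses could be pulled out of a description string. The function name extract_emails and the regular expression are assumptions made for illustration; the pattern JobQuest actually uses is not documented here and may differ.

```python
import re

# Deliberately simple pattern (an assumption, not JobQuest's actual rule).
# It will miss some valid addresses and accept a few invalid ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(description: str) -> list[str]:
    """Return unique, lowercased email addresses in order of first appearance."""
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(description or ""):
        # dict keys keep insertion order, so this de-duplicates without sorting
        seen.setdefault(match.lower(), None)
    return list(seen)


text = "Send your CV to jobs@example.com or HR@Example.com. Questions: jobs@example.com"
print(extract_emails(text))
```

Lowercasing before de-duplication treats HR@Example.com and hr@example.com as the same contact, which is usually what you want when building an outreach list.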

Supported sites

Site        Status    Notes
LinkedIn    Working   Guest API, no login needed
Indeed      Working   GraphQL API, mobile app headers
Glassdoor   Working   GraphQL API, auto CSRF token

All the options

from jobquest import scrape_jobs

jobs = scrape_jobs(
    site_name="linkedin",              # or ["linkedin", "indeed", "glassdoor"]
    search_term="machine learning",
    location="Berlin",
    distance=50,                       # km radius
    is_remote=True,
    job_type="fulltime",               # fulltime, parttime, contract, internship
    easy_apply=True,                   # LinkedIn/Glassdoor easy apply filter
    results_wanted=50,
    country_indeed="germany",          # for Indeed's country-specific API
    hours_old=72,                      # only jobs posted in last 72 hours
    linkedin_fetch_description=True,   # fetch full descriptions (slower)
    description_format="markdown",     # markdown, html, or plain
    enforce_annual_salary=True,        # normalize all salaries to yearly
    proxies=["http://user:pass@proxy:8080"],
    verbose=2,                         # 0=errors, 1=warnings, 2=info
)
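
As an illustration of what enforce_annual_salary implies, the sketch below converts a pay amount quoted per hour, day, week, or month into a yearly figure. The helper to_annual, the interval names, and the conversion factors (40-hour weeks, 52 weeks, 260 working days) are assumptions for this example; JobQuest's own factors may differ.

```python
# Assumed conversion factors: 40 hours/week * 52 weeks = 2080 hours,
# 5 days/week * 52 weeks = 260 working days.
PERIODS_PER_YEAR = {
    "hourly": 2080,
    "daily": 260,
    "weekly": 52,
    "monthly": 12,
    "yearly": 1,
}


def to_annual(amount, interval):
    """Scale a pay amount to a yearly figure; None passes through unchanged."""
    if amount is None:
        return None
    try:
        factor = PERIODS_PER_YEAR[interval.lower()]
    except KeyError:
        raise ValueError(f"unknown pay interval: {interval!r}") from None
    return amount * factor


print(to_annual(30, "hourly"))     # 62400
print(to_annual(5000, "monthly"))  # 60000
```

Normalizing to one interval makes min_amount and max_amount comparable across postings that quote pay differently.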

How the stealth works

JobQuest uses a two-tier approach:

Tier 1 -- StealthSession (default for all requests)

Every HTTP request goes through Scrapling's FetcherSession, which uses curl_cffi to impersonate Chrome's TLS fingerprint. Headers are generated by browserforge to match real browser patterns. This is enough for LinkedIn, Indeed, and most API calls.

Tier 2 -- StealthyFetcher (for Cloudflare-protected pages)

When a site puts up a Cloudflare challenge (like Glassdoor's CSRF page), JobQuest launches a real headless Chromium browser via patchright to solve it. Cookies from that session get transferred back to the lighter HTTP session for subsequent requests.
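
The cookie hand-off in Tier 2 can be pictured roughly as follows. This is a sketch, not JobQuest's code: it assumes the browser side exposes cookies as a list of dicts with name, value, and domain keys (the shape Playwright-style browser contexts return), and the helper cookie_header_for is a hypothetical name. It builds the Cookie header a plain HTTP client would send to a given host.

```python
def cookie_header_for(cookies, host):
    """Build a Cookie header value from browser cookies that apply to `host`.

    `cookies` is a list of dicts with "name", "value" and "domain" keys.
    Path, expiry and Secure flags are ignored to keep the sketch short.
    """
    pairs = []
    for cookie in cookies:
        # A leading dot (".example.com") means "this domain and its subdomains".
        domain = cookie.get("domain", "").lstrip(".")
        if domain and (host == domain or host.endswith("." + domain)):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


browser_cookies = [
    {"name": "cf_clearance", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "tracker", "value": "1", "domain": "ads.test", "path": "/"},
]
print(cookie_header_for(browser_cookies, "www.example.com"))
```

Only cookies scoped to the target host are forwarded, so the clearance cookie earned by the browser follows the lightweight session while unrelated cookies stay behind.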

If scrapling isn't installed, everything falls back to requests + tls_client -- the same approach most job scrapers use. You just lose the stealth advantage.
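
The fallback described above follows a common optional-dependency pattern. The sketch below is an assumption about how such a check might look, not JobQuest's implementation: choose_http_backend and module_available are hypothetical names, and the preference order between tls_client and requests is a guess.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if a top-level module can be imported, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def choose_http_backend(is_available=module_available) -> str:
    """Prefer the stealth stack, then fall back to plainer HTTP clients."""
    if is_available("scrapling"):
        return "scrapling"   # Tier 1 / Tier 2 stealth path
    if is_available("tls_client"):
        return "tls_client"  # assumed first fallback
    return "requests"        # last resort


print(choose_http_backend())
```

Passing the availability check in as a parameter keeps the selection logic testable on machines where none of the optional packages are installed.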

Using proxies

jobs = scrape_jobs(
    site_name="linkedin",
    search_term="data engineer",
    proxies=["http://user:pass@proxy1:8080", "socks5://proxy2:1080"],
)

Proxies rotate automatically between requests.
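
The rotation strategy is not specified here. A minimal sketch, assuming simple round-robin, could look like this. ProxyRotator is a hypothetical class for illustration, and the returned mapping uses the http/https keys that the requests library expects for its proxies argument.

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies in round-robin order (an assumed strategy)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self._pool = cycle(proxies)

    def next_proxies(self):
        """Return a requests-style mapping for the next proxy in the pool."""
        url = next(self._pool)
        return {"http": url, "https": url}


rotator = ProxyRotator(["http://user:pass@proxy1:8080", "socks5://proxy2:1080"])
for _ in range(3):
    print(rotator.next_proxies()["https"])
```

With two proxies, the third request wraps around to the first proxy again.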

Export to CSV / Excel

import csv

jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)
jobs.to_excel("jobs.xlsx", index=False)

Requirements

  • Python 3.10+
  • The stealth stack installs automatically: scrapling, curl_cffi, browserforge, patchright, playwright

License

MIT
