
Stealth job scraper for LinkedIn, Indeed & Glassdoor — powered by Scrapling


JobQuest

A Python library that scrapes real job postings from LinkedIn, Indeed, and Glassdoor — and actually gets results back instead of 403 errors.

Built on top of Scrapling for Chrome-level TLS fingerprinting and Cloudflare bypass. Every request carries Chrome's TLS fingerprint and realistic browser headers, and when a Cloudflare challenge shows up, a real headless browser steps in to solve it.

Why this exists

Most job scraping libraries break within weeks. Cloudflare updates its bot detection, LinkedIn starts rate-limiting harder, Glassdoor changes its GraphQL schema — and suddenly you're staring at empty DataFrames.

JobQuest handles this with stealth-first defaults:

  • Chrome TLS fingerprinting via curl_cffi: your requests have the same TLS signature as Chrome 138
  • Cloudflare bypass via Scrapling's StealthyFetcher when needed (full headless Chromium)
  • Realistic headers generated by browserforge, not hardcoded strings from 2023
  • Automatic fallback: if Scrapling isn't installed, JobQuest falls back to requests/tls_client

Install

pip install jobquest

Quick start

from jobquest import scrape_jobs

jobs = scrape_jobs(
    site_name=["linkedin", "indeed", "glassdoor"],
    search_term="AI Engineer",
    location="Germany",
    results_wanted=25,
    country_indeed="germany",
)

print(f"{len(jobs)} jobs found")
print(jobs[["title", "company", "location"]])

That's it. scrape_jobs returns a pandas DataFrame with one row per job posting.
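Because the result is an ordinary pandas DataFrame, the usual pandas operations apply. The sketch below assumes the column names listed under "What you get back" (is_remote, min_amount, company, title, job_url) and uses a small hand-made frame in place of a real scrape; the helper name remote_paid_jobs is hypothetical, not part of JobQuest.

```python
import pandas as pd


def remote_paid_jobs(jobs: pd.DataFrame, min_salary: float) -> pd.DataFrame:
    """Return remote postings whose lower salary bound is at least min_salary.

    Rows with a missing min_amount fail the comparison (NaN >= x is False) and
    are dropped; repeated job_url values keep only their first row.
    """
    # .eq(True) treats missing is_remote values as "not remote"
    mask = jobs["is_remote"].eq(True) & (jobs["min_amount"] >= min_salary)
    return (
        jobs[mask]
        .drop_duplicates(subset="job_url")
        .sort_values("min_amount", ascending=False)
        .reset_index(drop=True)
    )


# Hypothetical frame standing in for real scrape_jobs() output.
sample = pd.DataFrame(
    {
        "title": ["ML Engineer", "Data Scientist", "AI Engineer", "ML Engineer"],
        "company": ["Acme", "Globex", "Initech", "Acme"],
        "job_url": [
            "https://example.com/jobs/1",
            "https://example.com/jobs/2",
            "https://example.com/jobs/3",
            "https://example.com/jobs/1",  # same posting seen twice
        ],
        "is_remote": [True, False, True, True],
        "min_amount": [70000.0, 90000.0, None, 70000.0],
    }
)

print(remote_paid_jobs(sample, 60000)[["title", "company", "min_amount"]])
```

Note that filtering on min_amount drops every posting without salary data; if those matter, combine the condition with jobs["min_amount"].isna() instead.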

What you get back

Each row in the DataFrame has:

| Column | Description |
| --- | --- |
| title | Job title |
| company | Company name |
| location | City, state, country |
| job_url | Link to the posting |
| date_posted | When it was listed |
| description | Full job description (markdown by default) |
| salary_source | Where salary info came from |
| min_amount / max_amount | Salary range |
| is_remote | Remote or not |
| job_type | Full-time, part-time, contract, etc. |
| company_url | Company page link |
| emails | Contact emails found in the description |

Plus site-specific fields like job_level, job_function, and more.

Supported sites

| Site | Status | Notes |
| --- | --- | --- |
| LinkedIn | Working | Guest API, no login needed |
| Indeed | Working | GraphQL API, mobile app headers |
| Glassdoor | Working | GraphQL API, auto CSRF token |

All the options

from jobquest import scrape_jobs

jobs = scrape_jobs(
    site_name="linkedin",              # or ["linkedin", "indeed", "glassdoor"]
    search_term="machine learning",
    location="Berlin",
    distance=50,                       # km radius
    is_remote=True,
    job_type="fulltime",               # fulltime, parttime, contract, internship
    easy_apply=True,                   # LinkedIn/Glassdoor easy apply filter
    results_wanted=50,
    country_indeed="germany",          # for Indeed's country-specific API
    hours_old=72,                      # only jobs posted in last 72 hours
    linkedin_fetch_description=True,   # fetch full descriptions (slower)
    description_format="markdown",     # markdown, html, or plain
    enforce_annual_salary=True,        # normalize all salaries to yearly
    proxies=["http://user:pass@proxy:8080"],
    verbose=2,                         # 0=errors, 1=warnings, 2=info
)

How the stealth works

JobQuest uses a two-tier approach:

Tier 1: StealthSession (default for all requests). Every HTTP request goes through Scrapling's FetcherSession, which uses curl_cffi to impersonate Chrome's TLS fingerprint. Headers are generated by browserforge to match real browser patterns. This is enough for LinkedIn, Indeed, and most API calls.

Tier 2: StealthyFetcher (for Cloudflare-protected pages). When a site puts up a Cloudflare challenge (like Glassdoor's CSRF page), JobQuest launches a real headless Chromium browser via patchright to solve it. Cookies from that session are then transferred back to the lighter HTTP session for subsequent requests.
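The escalation between the two tiers can be sketched as follows. This is not JobQuest's code: SimpleResponse, looks_like_cloudflare_challenge, fetch_with_escalation, and the two fetcher callables are hypothetical stand-ins for the HTTP session (Tier 1) and the headless browser (Tier 2), and the challenge markers are common heuristics rather than an exhaustive check.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SimpleResponse:
    """Minimal stand-in for an HTTP response (hypothetical, for illustration)."""
    status: int
    headers: dict[str, str]
    text: str
    cookies: dict[str, str] = field(default_factory=dict)


def looks_like_cloudflare_challenge(resp: SimpleResponse) -> bool:
    """Heuristic check for a Cloudflare interstitial page."""
    headers = {k.lower(): v.lower() for k, v in resp.headers.items()}
    if headers.get("cf-mitigated") == "challenge":
        return True
    return (
        resp.status in (403, 503)
        and headers.get("server") == "cloudflare"
        and "just a moment" in resp.text.lower()
    )


def fetch_with_escalation(
    url: str,
    http_get: Callable[[str, dict[str, str]], SimpleResponse],  # Tier 1
    browser_get: Callable[[str], SimpleResponse],               # Tier 2
    cookie_jar: dict[str, str],
) -> SimpleResponse:
    """Try the cheap HTTP path first; use the browser only on a challenge.

    Cookies the browser earns (e.g. Cloudflare's cf_clearance) are copied into
    cookie_jar so later Tier 1 requests can reuse them.
    """
    resp = http_get(url, cookie_jar)
    if not looks_like_cloudflare_challenge(resp):
        return resp
    solved = browser_get(url)
    cookie_jar.update(solved.cookies)
    return solved


# Usage with fake fetchers: the first call escalates, the second reuses cookies.
calls: list[str] = []


def fake_http_get(url: str, cookies: dict[str, str]) -> SimpleResponse:
    calls.append("http")
    if "cf_clearance" in cookies:
        return SimpleResponse(200, {"Server": "cloudflare"}, "<html>jobs</html>")
    return SimpleResponse(403, {"Server": "cloudflare"}, "<title>Just a moment...</title>")


def fake_browser_get(url: str) -> SimpleResponse:
    calls.append("browser")
    return SimpleResponse(200, {}, "<html>jobs</html>", cookies={"cf_clearance": "token"})


jar: dict[str, str] = {}
first = fetch_with_escalation("https://example.com", fake_http_get, fake_browser_get, jar)
second = fetch_with_escalation("https://example.com", fake_http_get, fake_browser_get, jar)
print(calls)  # ['http', 'browser', 'http']
```

The point of the design is cost: the browser launch happens once per challenge, and every later request goes back through the lightweight session.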

If Scrapling isn't installed, everything falls back to requests + tls_client, the approach many job scrapers use. You just lose the stealth advantage.

Using proxies

from jobquest import scrape_jobs

jobs = scrape_jobs(
    site_name="linkedin",
    search_term="data engineer",
    proxies=["http://user:pass@proxy1:8080", "socks5://proxy2:1080"],
)

Proxies rotate automatically between requests.
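How JobQuest schedules the rotation isn't spelled out here. A minimal round-robin sketch, assuming each request simply takes the next proxy in the list (ProxyRotator is a hypothetical name); the returned dict follows the {"http": ..., "https": ...} mapping that requests accepts for its proxies argument:

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies in round-robin order, one per request."""

    def __init__(self, proxies: list[str]):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._pool = cycle(proxies)

    def next_mapping(self) -> dict[str, str]:
        # Route both URL schemes through the same proxy, in the shape requests expects.
        proxy = next(self._pool)
        return {"http": proxy, "https": proxy}


rotator = ProxyRotator(["http://user:pass@proxy1:8080", "socks5://proxy2:1080"])
for _ in range(3):
    print(rotator.next_mapping()["https"])  # proxy1, proxy2, then proxy1 again
```

With plain requests, socks5:// URLs additionally need the SOCKS extra (requests[socks]) installed.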

Export to CSV / Excel

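import csv

# jobs is the DataFrame returned by scrape_jobs()
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)
jobs.to_excel("jobs.xlsx", index=False)  # .xlsx output needs an Excel engine such as openpyxl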

Requirements

  • Python 3.10+
  • The stealth stack installs automatically: scrapling, curl_cffi, browserforge, patchright, playwright

License

MIT



Download files


Source Distribution

jobquest-0.1.7.tar.gz (29.1 kB)

Uploaded Source

Built Distribution


jobquest-0.1.7-py3-none-any.whl (34.3 kB)

Uploaded Python 3

File details

Details for the file jobquest-0.1.7.tar.gz.

File metadata

  • Download URL: jobquest-0.1.7.tar.gz
  • Upload date:
  • Size: 29.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for jobquest-0.1.7.tar.gz
Algorithm Hash digest
SHA256 f4f79089de3a3389df292a42520cc88a8cd50d58a05fd6d7ee0ac01977ab6c6b
MD5 a9196da7b25c44dd5a09df411385ef42
BLAKE2b-256 75905c1a5b58d00c6094227d8c14c368cfc904291cf89e563e2d81764c68d2ba


File details

Details for the file jobquest-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: jobquest-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 34.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for jobquest-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 c3e188d29346d35c4db89a69240279da101cffd9b238e56f9b865a632f3acbcf
MD5 936fbcbe4d2aaf6d21bd422c2503778b
BLAKE2b-256 61030c8bf1fdd213290e92febc715bfd023397bf23e1c993fba9c7bdd62ce6b5

