Enhanced job scraper for LinkedIn, Indeed, Glassdoor, ZipRecruiter with improved filtering capabilities

JobSpy Enhanced Scraper is a job scraping library that aggregates postings from popular job boards with one tool. This enhanced version adds improved filtering and fixes the filter-combination limitations of the LinkedIn and Indeed scrapers.

🚀 What's New in the Enhanced Version

  • ✅ Fixed LinkedIn Limitations: Can now combine hours_old + easy_apply + job_type + is_remote
  • ✅ Fixed Indeed Limitations: Can now combine hours_old + job_type + is_remote + easy_apply
  • ✅ Enhanced Filtering: All scrapers now support multiple filter combinations
  • ✅ Improved Performance: Better error handling and rate limiting management
  • ✅ Backward Compatible: All existing code continues to work
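
As a minimal sketch of what combining filters looks like, the snippet below builds one set of keyword arguments per site with all four filters set at once. The helpers build_search_kwargs and run_search are hypothetical names introduced here for illustration; only the parameter names and the scrape_jobs import come from this README.

```python
def build_search_kwargs(site, search_term, location):
    """Assemble keyword arguments that set all four filters for one site.

    Hypothetical helper: only the parameter names come from this README.
    """
    return {
        "site_name": [site],
        "search_term": search_term,
        "location": location,
        "results_wanted": 20,
        "hours_old": 72,
        "job_type": "fulltime",
        "is_remote": True,
        "easy_apply": True,
    }


def run_search(kwargs):
    """Run the search. Imported lazily so the builder works without the package."""
    from jobspy_enhanced import scrape_jobs

    return scrape_jobs(**kwargs)


linkedin_kwargs = build_search_kwargs("linkedin", "software engineer", "San Francisco, CA")

indeed_kwargs = build_search_kwargs("indeed", "software engineer", "San Francisco, CA")
indeed_kwargs["country_indeed"] = "USA"  # needed for Indeed, per the country section

# jobs = run_search(linkedin_kwargs)  # uncomment to perform the network requests
```

The actual call is left commented out because it sends live requests to the job board.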

Features

  • Scrapes job postings from LinkedIn, Indeed, Glassdoor, Google, ZipRecruiter, & other job boards concurrently
  • Aggregates the job postings in a dataframe
  • Proxy support to bypass blocking

Installation

pip install -U jobspy-enhanced-scraper

Python version >= 3.10 required

Usage

import csv
from jobspy_enhanced import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "google"], # "glassdoor", "naukri"
    search_term="software engineer",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
    location="San Francisco, CA",
    results_wanted=20,
    hours_old=72,
    country_indeed='USA',
    
    # linkedin_fetch_description=True # gets more info such as description, direct job url (slower)
    # proxies=["208.195.175.46:65095", "208.195.175.45:65095", "localhost"],
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False) # to_excel

Output

SITE           TITLE                             COMPANY           CITY          STATE  JOB_TYPE  INTERVAL  MIN_AMOUNT  MAX_AMOUNT  JOB_URL                                            DESCRIPTION
indeed         Software Engineer                 AMERICAN SYSTEMS  Arlington     VA     None      yearly    200000      150000      https://www.indeed.com/viewjob?jk=5e409e577046...  THIS POSITION COMES WITH A 10K SIGNING BONUS!...
indeed         Senior Software Engineer          TherapyNotes.com  Philadelphia  PA     fulltime  yearly    135000      110000      https://www.indeed.com/viewjob?jk=da39574a40cb...  About Us TherapyNotes is the national leader i...
linkedin       Software Engineer - Early Career  Lockheed Martin   Sunnyvale     CA     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3693012711      Description:By bringing together people that u...
linkedin       Full-Stack Software Engineer      Rain              New York      NY     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3696158877      Rain’s mission is to create the fastest and ea...
zip_recruiter  Software Engineer - New Grad      ZipRecruiter      Santa Monica  CA     fulltime  yearly    130000      150000      https://www.ziprecruiter.com/jobs/ziprecruiter...  We offer a hybrid work environment. Most US-ba...
zip_recruiter  Software Developer                TEKsystems        Phoenix       AZ     fulltime  hourly    65          75          https://www.ziprecruiter.com/jobs/teksystems-0...  Top Skills' Details• 6 years of Java developme...

Parameters for scrape_jobs()

Optional
├── site_name (list|str):
│    linkedin, zip_recruiter, indeed, glassdoor, google, naukri
│    (default is all)
│
├── search_term (str)
│
├── google_search_term (str)
│    search term for google jobs. This is the only param for filtering google jobs.
│
├── location (str)
│
├── distance (int):
│    in miles, default 50
│
├── job_type (str):
│    fulltime, parttime, internship, contract
│
├── proxies (list):
│    in format ['user:pass@host:port', 'localhost']
│    each job board scraper will round robin through the proxies
│
├── is_remote (bool)
│
├── results_wanted (int):
│    number of job results to retrieve for each site specified in 'site_name'
│
├── easy_apply (bool):
│    filters for jobs that are hosted on the job board site (LinkedIn easy apply filter no longer works)
│
├── user_agent (str):
│    override the default user agent which may be outdated
│
├── description_format (str):
│    markdown, html (Format type of the job descriptions. Default is markdown.)
│
├── offset (int):
│    starts the search from an offset (e.g. 25 will start the search from the 25th result)
│
├── hours_old (int):
│    filters jobs by the number of hours since the job was posted
│    (ZipRecruiter and Glassdoor round up to next day.)
│
├── verbose (int) {0, 1, 2}:
│    Controls the verbosity of the runtime printouts
│    (0 prints only errors, 1 is errors+warnings, 2 is all logs. Default is 2.)
│
├── linkedin_fetch_description (bool):
│    fetches full description and direct job url for LinkedIn (Increases requests by O(n))
│
├── linkedin_company_ids (list[int]):
│    searches for linkedin jobs with specific company ids
│
├── country_indeed (str):
│    filters the country on Indeed & Glassdoor (see below for correct spelling)
│
├── enforce_annual_salary (bool):
│    converts wages to annual salary
│
├── ca_cert (str)
│    path to CA Certificate file for proxies
│
├── Indeed limitations:
│    ✅ FIXED: All filters can now be combined:
│    - hours_old + job_type + is_remote + easy_apply
│
└── LinkedIn limitations:
     ✅ FIXED: All filters can now be combined:
     - hours_old + easy_apply + job_type + is_remote
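
The proxies entry above says each scraper will round robin through the list. The sketch below illustrates round-robin rotation in general, using itertools.cycle; it is not the library's internal code, and assign_proxies is a hypothetical name used only for this illustration.

```python
from itertools import cycle


def assign_proxies(proxies, n_requests):
    """Return the proxy that plain round robin would pick for each request.

    Illustration only; the library's own rotation logic is not shown here.
    """
    if not proxies:
        # No proxies configured: every request goes out directly.
        return [None] * n_requests
    rotation = cycle(proxies)
    return [next(rotation) for _ in range(n_requests)]


# Same format as the proxies parameter: 'user:pass@host:port' or 'localhost'.
schedule = assign_proxies(["user:pass@host1:8080", "user:pass@host2:8080", "localhost"], 5)
```

With three proxies and five requests, the fourth request wraps back to the first proxy.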

Supported Countries for Job Searching

LinkedIn

LinkedIn searches globally & uses only the location parameter.

ZipRecruiter

ZipRecruiter searches for jobs in US/Canada & uses only the location parameter.

Indeed / Glassdoor

Indeed & Glassdoor support most countries, but the country_indeed parameter is required. Additionally, use the location parameter to narrow down the location, e.g. city & state, if necessary.

You can specify the following countries when searching on Indeed (use the exact name, * indicates support for Glassdoor):

Argentina             Australia*    Austria*    Bahrain
Belgium*              Brazil*       Canada*     Chile
China                 Colombia      Costa Rica  Czech Republic
Denmark               Ecuador       Egypt       Finland
France*               Germany*      Greece      Hong Kong*
Hungary               India*        Indonesia   Ireland*
Israel                Italy*        Japan       Kuwait
Luxembourg            Malaysia      Mexico*     Morocco
Netherlands*          New Zealand*  Nigeria     Norway
Oman                  Pakistan      Panama      Peru
Philippines           Poland        Portugal    Qatar
Romania               Saudi Arabia  Singapore*  South Africa
South Korea           Spain*        Sweden      Switzerland*
Taiwan                Thailand      Turkey      Ukraine
United Arab Emirates  UK*           USA*        Uruguay
Venezuela             Vietnam*
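
Because only the starred countries work with Glassdoor, it can help to decide which country-aware sites to request before calling scrape_jobs. The sketch below hard-codes the starred names from the list above and assumes the country is already on the Indeed list. sites_for_country is a hypothetical helper, and the exact-match comparison is an assumption (the library itself may be more lenient about spelling or case).

```python
# Countries marked with * in the list above (Glassdoor support).
GLASSDOOR_COUNTRIES = {
    "Australia", "Austria", "Belgium", "Brazil", "Canada", "France", "Germany",
    "Hong Kong", "India", "Ireland", "Italy", "Mexico", "Netherlands",
    "New Zealand", "Singapore", "Spain", "Switzerland", "UK", "USA", "Vietnam",
}


def sites_for_country(country_indeed):
    """Return the country-aware sites to request for an Indeed-supported country."""
    sites = ["indeed"]
    if country_indeed in GLASSDOOR_COUNTRIES:
        sites.append("glassdoor")
    return sites


canada_sites = sites_for_country("Canada")
japan_sites = sites_for_country("Japan")
```

The returned list can be passed as site_name together with the same value for country_indeed.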

Notes

  • Indeed is currently the most reliable scraper and shows no rate limiting.
  • All the job board endpoints are capped at around 1000 jobs per search.
  • LinkedIn is the most restrictive and usually rate limits around the 10th page from a single IP, so proxies are effectively required.
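
One way to work within these limits is to split a large request into smaller batches using the offset and results_wanted parameters, pausing between batches. The sketch below only computes the batch plan. plan_batches is a hypothetical helper, the batch size is arbitrary, and the cap of 1000 is the approximate figure from the note above, not an exact limit.

```python
def plan_batches(total_wanted, batch_size=100, cap=1000):
    """Split a request into (offset, results_wanted) pairs without exceeding cap.

    cap=1000 mirrors the approximate per-search limit mentioned in the notes.
    """
    total = min(total_wanted, cap)
    return [
        (offset, min(batch_size, total - offset))
        for offset in range(0, total, batch_size)
    ]


batches = plan_batches(250)
# Each pair could then be passed along the lines of:
#   scrape_jobs(site_name=["indeed"], search_term="...", offset=offset,
#               results_wanted=wanted, country_indeed="USA")
# with a pause (for example time.sleep) between calls.
```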

Frequently Asked Questions


Q: Why is Indeed giving unrelated roles?
A: Indeed searches the description too.

  • Use - to exclude words
  • Use "" for an exact match

Example of a good Indeed query

search_term='"engineering intern" software summer (java OR python OR c++) 2025 -tax -marketing'

This searches the title and description. A match must contain software, summer, and 2025, at least one of the languages, and the exact phrase engineering intern, and it must not contain tax or marketing.
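
Queries like this are easy to get wrong by hand, so a small builder can help. The sketch below is a hypothetical helper (build_indeed_query is not part of the library) that assembles the same terms as the example above, in a slightly different order, using only the quoting, OR grouping, and - exclusion syntax shown in this FAQ.

```python
def build_indeed_query(exact=(), required=(), any_of=(), excluded=()):
    """Compose an Indeed search_term string from its parts.

    exact    -> phrases wrapped in double quotes
    required -> plain words that must appear
    any_of   -> alternatives grouped as (a OR b OR c)
    excluded -> words prefixed with - to remove them
    """
    parts = [f'"{phrase}"' for phrase in exact]
    parts.extend(required)
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    parts.extend(f"-{word}" for word in excluded)
    return " ".join(parts)


query = build_indeed_query(
    exact=["engineering intern"],
    required=["software", "summer", "2025"],
    any_of=["java", "python", "c++"],
    excluded=["tax", "marketing"],
)
```

The resulting string can be passed directly as search_term.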


Q: No results when using "google"?
A: Google requires very specific syntax. Search for Google jobs in your browser, apply some filters, and then copy whatever appears in the Google jobs search box into google_search_term.


Q: Received a response code 429?
A: This indicates that you have been blocked by the job board site for sending too many requests. All of the job board sites are aggressive with blocking. We recommend:

  • Wait some time between scrapes (site-dependent).
  • Try using the proxies param to change your IP address.
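
The "wait some time" advice can be sketched as a retry wrapper with exponential backoff. This is a generic pattern, not something the library provides: call_with_backoff is a hypothetical helper, the delays are arbitrary, and it assumes the blocked request surfaces as a raised exception. If the scraper only logs the 429 and returns fewer rows, the check would need to inspect the result instead.

```python
import time


def call_with_backoff(fetch, max_attempts=4, base_delay=30.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**attempt seconds, then retry.

    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage idea (performs network requests, so it is left commented out):
# jobs = call_with_backoff(lambda: scrape_jobs(site_name=["linkedin"], search_term="..."))
```

Passing sleep as a parameter keeps the wrapper easy to exercise without real waiting.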

JobPost Schema

JobPost
├── title
├── company
├── company_url
├── job_url
├── location
│   ├── country
│   ├── city
│   └── state
├── is_remote
├── description
├── job_type: fulltime, parttime, internship, contract
├── job_function
│   ├── interval: yearly, monthly, weekly, daily, hourly
│   ├── min_amount
│   ├── max_amount
│   ├── currency
│   └── salary_source: direct_data, description (parsed from posting)
├── date_posted
└── emails

LinkedIn specific
└── job_level

LinkedIn & Indeed specific
└── company_industry

Indeed specific
├── company_country
├── company_addresses
├── company_employees_label
├── company_revenue_label
├── company_description
└── company_logo

Naukri specific
├── skills
├── experience_range
├── company_rating
├── company_reviews_count
├── vacancy_count
└── work_from_home_type
