Enhanced job scraper for LinkedIn, Indeed, Glassdoor, ZipRecruiter with improved filtering capabilities
JobSpy Enhanced Scraper is a job scraping library that aggregates postings from popular job boards with one tool. This enhanced version adds improved filtering and fixes the filter-combination limitations of the LinkedIn and Indeed scrapers.
What's New in Enhanced Version
- ✅ Fixed LinkedIn Limitations: Can now combine hours_old + easy_apply + job_type + is_remote
- ✅ Fixed Indeed Limitations: Can now combine hours_old + job_type + is_remote + easy_apply
- ✅ Enhanced Filtering: All scrapers now support multiple filter combinations
- ✅ Improved Performance: Better error handling and rate limiting management
- ✅ Backward Compatible: All existing code continues to work
Features
- Scrapes job postings from LinkedIn, Indeed, Glassdoor, Google, ZipRecruiter, & other job boards concurrently
- Aggregates the job postings in a dataframe
- Proxy support to bypass blocking
Installation
pip install -U jobspy-enhanced-scraper
Python version >= 3.10 required
Usage
import csv
from jobspy_enhanced import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "google"],  # "glassdoor", "naukri"
    search_term="software engineer",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
    location="San Francisco, CA",
    results_wanted=20,
    hours_old=72,
    country_indeed='USA',

    # linkedin_fetch_description=True # gets more info such as description, direct job url (slower)
    # proxies=["208.195.175.46:65095", "208.195.175.45:65095", "localhost"],
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)  # to_excel
Output
| SITE | TITLE | COMPANY | CITY | STATE | JOB_TYPE | INTERVAL | MIN_AMOUNT | MAX_AMOUNT | JOB_URL | DESCRIPTION |
|---|---|---|---|---|---|---|---|---|---|---|
| indeed | Software Engineer | AMERICAN SYSTEMS | Arlington | VA | None | yearly | 200000 | 150000 | https://www.indeed.com/viewjob?jk=5e409e577046... | THIS POSITION COMES WITH A 10K SIGNING BONUS!... |
| indeed | Senior Software Engineer | TherapyNotes.com | Philadelphia | PA | fulltime | yearly | 135000 | 110000 | https://www.indeed.com/viewjob?jk=da39574a40cb... | About Us TherapyNotes is the national leader i... |
| linkedin | Software Engineer - Early Career | Lockheed Martin | Sunnyvale | CA | fulltime | yearly | None | None | https://www.linkedin.com/jobs/view/3693012711 | Description:By bringing together people that u... |
| linkedin | Full-Stack Software Engineer | Rain | New York | NY | fulltime | yearly | None | None | https://www.linkedin.com/jobs/view/3696158877 | Rain’s mission is to create the fastest and ea... |
| zip_recruiter | Software Engineer - New Grad | ZipRecruiter | Santa Monica | CA | fulltime | yearly | 130000 | 150000 | https://www.ziprecruiter.com/jobs/ziprecruiter... | We offer a hybrid work environment. Most US-ba... |
| zip_recruiter | Software Developer | TEKsystems | Phoenix | AZ | fulltime | hourly | 65 | 75 | https://www.ziprecruiter.com/jobs/teksystems-0... | Top Skills' Details• 6 years of Java developme... |
Parameters for scrape_jobs()
Optional
├── site_name (list|str):
│    linkedin, zip_recruiter, indeed, glassdoor, google, naukri
│    (default is all)
│
├── search_term (str)
│
├── google_search_term (str)
│    search term for google jobs. This is the only param for filtering google jobs.
│
├── location (str)
│
├── distance (int):
│    in miles, default 50
│
├── job_type (str):
│    fulltime, parttime, internship, contract
│
├── proxies (list):
│    in format ['user:pass@host:port', 'localhost']
│    each job board scraper will round robin through the proxies
│
├── is_remote (bool)
│
├── results_wanted (int):
│    number of job results to retrieve for each site specified in 'site_name'
│
├── easy_apply (bool):
│    filters for jobs that are hosted on the job board site (LinkedIn easy apply filter no longer works)
│
├── user_agent (str):
│    override the default user agent which may be outdated
│
├── description_format (str):
│    markdown, html (Format type of the job descriptions. Default is markdown.)
│
├── offset (int):
│    starts the search from an offset (e.g. 25 will start the search from the 25th result)
│
├── hours_old (int):
│    filters jobs by the number of hours since the job was posted
│    (ZipRecruiter and Glassdoor round up to next day.)
│
├── verbose (int) {0, 1, 2}:
│    Controls the verbosity of the runtime printouts
│    (0 prints only errors, 1 is errors+warnings, 2 is all logs. Default is 2.)
│
├── linkedin_fetch_description (bool):
│    fetches full description and direct job url for LinkedIn (Increases requests by O(n))
│
├── linkedin_company_ids (list[int]):
│    searches for linkedin jobs with specific company ids
│
├── country_indeed (str):
│    filters the country on Indeed & Glassdoor (see below for correct spelling)
│
├── enforce_annual_salary (bool):
│    converts wages to annual salary
│
├── ca_cert (str)
│    path to CA Certificate file for proxies
│
├── Indeed limitations:
│    ✅ FIXED: All filters can now be combined:
│    - hours_old + job_type + is_remote + easy_apply
│
└── LinkedIn limitations:
     ✅ FIXED: All filters can now be combined:
     - hours_old + easy_apply + job_type + is_remote
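The proxies entry above says each job board scraper rotates through the list round robin. The following is a minimal sketch of that rotation pattern, not the library's actual implementation; `ProxyRotator` and `next_proxy` are hypothetical names introduced only for illustration.

```python
from itertools import cycle


class ProxyRotator:
    """Hands out proxies in round-robin order (hypothetical helper, not part of the library)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        # cycle() repeats the list endlessly in its original order.
        self._pool = cycle(proxies)

    def next_proxy(self):
        # Each call returns the next proxy and wraps around at the end of the list.
        return next(self._pool)


rotator = ProxyRotator(["user:pass@host1:8080", "user:pass@host2:8080", "localhost"])
picked = [rotator.next_proxy() for _ in range(4)]
print(picked)
```

With three proxies, the fourth request goes back to the first entry, so load is spread evenly across the list.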
Supported Countries for Job Searching
LinkedIn
LinkedIn searches globally & uses only the location parameter.
ZipRecruiter
ZipRecruiter searches for jobs in US/Canada & uses only the location parameter.
Indeed / Glassdoor
Indeed & Glassdoor support most countries, but the country_indeed parameter is required. Additionally, use the location parameter to narrow down the location, e.g. city & state if necessary.
You can specify the following countries when searching on Indeed (use the exact name, * indicates support for Glassdoor):
| Argentina | Australia* | Austria* | Bahrain |
| Belgium* | Brazil* | Canada* | Chile |
| China | Colombia | Costa Rica | Czech Republic |
| Denmark | Ecuador | Egypt | Finland |
| France* | Germany* | Greece | Hong Kong* |
| Hungary | India* | Indonesia | Ireland* |
| Israel | Italy* | Japan | Kuwait |
| Luxembourg | Malaysia | Mexico* | Morocco |
| Netherlands* | New Zealand* | Nigeria | Norway |
| Oman | Pakistan | Panama | Peru |
| Philippines | Poland | Portugal | Qatar |
| Romania | Saudi Arabia | Singapore* | South Africa |
| South Korea | Spain* | Sweden | Switzerland* |
| Taiwan | Thailand | Turkey | Ukraine |
| United Arab Emirates | UK* | USA* | Uruguay |
| Venezuela | Vietnam* |
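Since only the starred countries support Glassdoor, it can help to choose the site list from the country before calling scrape_jobs. The sketch below copies the starred names from the table above; `GLASSDOOR_COUNTRIES` and `sites_for_country` are hypothetical names, and the exact-match comparison is an assumption based on the "use the exact name" note.

```python
# Countries marked with * in the table above (Glassdoor support).
GLASSDOOR_COUNTRIES = {
    "Australia", "Austria", "Belgium", "Brazil", "Canada", "France", "Germany",
    "Hong Kong", "India", "Ireland", "Italy", "Mexico", "Netherlands",
    "New Zealand", "Singapore", "Spain", "Switzerland", "UK", "USA", "Vietnam",
}


def sites_for_country(country, sites=("indeed", "glassdoor")):
    """Drop glassdoor from the site list when the country has no * in the table."""
    if country in GLASSDOOR_COUNTRIES:
        return list(sites)
    return [site for site in sites if site != "glassdoor"]


print(sites_for_country("USA"))
print(sites_for_country("Japan"))
```

The returned list could then be passed as site_name together with country_indeed set to the same country name.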
Notes
- Indeed is the best scraper currently with no rate limiting.
- All the job board endpoints are capped at around 1000 jobs on a given search.
- LinkedIn is the most restrictive and usually rate limits around the 10th page with one IP. Proxies are effectively required.
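Because searches are capped at around 1000 jobs, a large pull can be split into smaller requests using the offset and results_wanted parameters. This is a rough sketch of planning those batches; `plan_batches` and the batch size of 100 are hypothetical, and the exact cap and offset behaviour may differ per site.

```python
def plan_batches(total_wanted, batch_size=100, cap=1000):
    """Return (offset, results_wanted) pairs covering at most `cap` results.

    `cap` mirrors the roughly 1000-job limit noted above (an approximation).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = min(total_wanted, cap)
    # The last batch is shortened so the plan never runs past `total`.
    return [
        (offset, min(batch_size, total - offset))
        for offset in range(0, total, batch_size)
    ]


print(plan_batches(250))
```

Each pair could be passed to scrape_jobs as offset=... and results_wanted=..., ideally with a pause between calls.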
Frequently Asked Questions
Q: Why is Indeed giving unrelated roles?
A: Indeed searches the description too.
- use - before a word to exclude it (e.g. -tax)
- use "" around a phrase for an exact match (e.g. "engineering intern")
Example of a good Indeed query
search_term='"engineering intern" software summer (java OR python OR c++) 2025 -tax -marketing'
This searches the title and description. Results must include software, summer, and 2025, at least one of the listed languages, and the exact phrase engineering intern, and must not include tax or marketing.
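The query rules above can be assembled programmatically. The helper below is a small sketch that only builds the string; `build_indeed_query` and its argument names are hypothetical and not part of the library. The term order differs slightly from the example above, which is assumed not to matter.

```python
def build_indeed_query(exact=(), required=(), any_of=(), excluded=()):
    """Build a search_term string from exact phrases, required words,
    alternatives (joined with OR), and excluded words."""
    parts = [f'"{phrase}"' for phrase in exact]   # "" for an exact match
    parts.extend(required)                        # plain words must appear
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    parts.extend(f"-{word}" for word in excluded)  # - removes a word
    return " ".join(parts)


query = build_indeed_query(
    exact=["engineering intern"],
    required=["software", "summer", "2025"],
    any_of=["java", "python", "c++"],
    excluded=["tax", "marketing"],
)
print(query)
```

The resulting string can be passed as search_term.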
Q: No results when using "google"?
A: Google requires very specific syntax. Search for Google jobs in your browser, apply some filters, then copy whatever appears in the Google Jobs search box and paste it into google_search_term.
Q: Received a response code 429?
A: This indicates that you have been blocked by the job board site for sending too many requests. All of the job board sites are aggressive with blocking. We recommend:
- Wait some time between scrapes (site-dependent).
- Try using the proxies param to change your IP address.
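One way to "wait some time between scrapes" is exponential backoff around whatever call is being rate limited. The sketch below is generic: `retry_with_backoff` is a hypothetical helper, and how a 429 surfaces from the library (exception type or otherwise) is not stated here, so the `retry_on` argument is an assumption to adapt.

```python
import random
import time


def retry_with_backoff(fetch, retries=4, base_delay=2.0,
                       retry_on=Exception, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    retry_on is an assumption: narrow it to whatever error signals a 429.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts, surface the last error
            # Delay doubles each attempt (2s, 4s, 8s, ...) plus up to 1s of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
```

A call might look like `retry_with_backoff(lambda: scrape_jobs(site_name="linkedin", search_term="software engineer"))`. The right delays are site-dependent, so the defaults here are placeholders.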
JobPost Schema
JobPost
├── title
├── company
├── company_url
├── job_url
├── location
│   ├── country
│   ├── city
│   └── state
├── is_remote
├── description
├── job_type: fulltime, parttime, internship, contract
├── job_function
│   ├── interval: yearly, monthly, weekly, daily, hourly
│   ├── min_amount
│   ├── max_amount
│   ├── currency
│   └── salary_source: direct_data, description (parsed from posting)
├── date_posted
└── emails

Linkedin specific
└── job_level

Linkedin & Indeed specific
└── company_industry

Indeed specific
├── company_country
├── company_addresses
├── company_employees_label
├── company_revenue_label
├── company_description
└── company_logo

Naukri specific
├── skills
├── experience_range
├── company_rating
├── company_reviews_count
├── vacancy_count
└── work_from_home_type
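The interval, min_amount, and max_amount fields relate to the enforce_annual_salary parameter, which converts wages to annual salary. The sketch below illustrates one way such a conversion could work. The multipliers (40-hour weeks, 52 weeks, 260 working days) are assumptions and may not match the factors the library uses; `PERIODS_PER_YEAR` and `to_annual` are hypothetical names.

```python
# Assumed pay periods per year: 40-hour weeks, 52 weeks, 5 working days per week.
PERIODS_PER_YEAR = {
    "yearly": 1,
    "monthly": 12,
    "weekly": 52,
    "daily": 260,
    "hourly": 2080,
}


def to_annual(amount, interval):
    """Scale a wage to a yearly figure; missing amounts stay None."""
    if amount is None:
        return None
    return amount * PERIODS_PER_YEAR[interval]


# The hourly TEKsystems row from the sample output: 65 to 75 per hour.
print(to_annual(65, "hourly"), to_annual(75, "hourly"))
```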