A simple job postings scraper for Indeed based on requests and BeautifulSoup.
jobs_scraper
jobs_scraper is a simple job postings scraper for the Indeed website. It is written in Python and is based on the requests and BeautifulSoup libraries.
Installation
Run the following to install the package:
pip install jobs_scraper
Usage
To use jobs_scraper, create a new JobsScraper object and pass the following arguments to its constructor:
- country: country prefix of the Indeed site to query (e.g. "nl" for the Dutch version).
- position: job position.
- location: job location.
- pages: number of result pages to scrape.
from jobs_scraper import JobsScraper
# Let's create a new JobsScraper object and perform the scraping for a given query.
scraper = JobsScraper(country="nl", position="Data Engineer", location="Amsterdam", pages=3)
df = scraper.scrape()
This scrapes the first three result pages for the example query "Data Engineer" in "Amsterdam" on the Dutch version of Indeed.
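To illustrate how country, position, location and pages could combine into the result pages being fetched, here is a minimal sketch. It is an assumption, not jobs_scraper's actual code: the helper build_search_urls is hypothetical, and the URL layout (a `<country>.indeed.com` host, `q` and `l` query parameters, and a `start` offset of 10 results per page) reflects how Indeed search URLs have commonly looked, which the package may not use and Indeed may change.

```python
from typing import List
from urllib.parse import urlencode

# Hypothetical helper, NOT part of jobs_scraper. Assumes Indeed serves each
# country at https://<country>.indeed.com/jobs, takes the query in "q", the
# location in "l", and paginates with a "start" offset of 10 per page.
RESULTS_PER_PAGE = 10  # assumption about Indeed's page size


def build_search_urls(country: str, position: str, location: str, pages: int) -> List[str]:
    """Return one search URL per results page, starting from the first page."""
    urls = []
    for page in range(pages):
        params = {"q": position, "l": location, "start": page * RESULTS_PER_PAGE}
        # urlencode turns spaces into "+", e.g. "Data Engineer" -> "Data+Engineer".
        urls.append(f"https://{country}.indeed.com/jobs?{urlencode(params)}")
    return urls


for url in build_search_urls("nl", "Data Engineer", "Amsterdam", pages=3):
    print(url)
# first line: https://nl.indeed.com/jobs?q=Data+Engineer&l=Amsterdam&start=0
```

Under these assumptions, pages=3 yields offsets 0, 10 and 20, one request per results page.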
The scrape method returns a pandas DataFrame, so the results can easily be exported, for example to a CSV file.
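Because the result is an ordinary pandas DataFrame, standard pandas I/O applies. The sketch below uses a small hand-made DataFrame in place of real scraper output so it runs offline; its column names and values are hypothetical placeholders, and the DataFrame returned by scrape() may have different columns. With a real scraper, call to_csv on the df returned by scrape() directly.

```python
import pandas as pd

# Stand-in for `df = scraper.scrape()`. These columns are hypothetical and
# only illustrate the export, not jobs_scraper's real schema.
df = pd.DataFrame(
    {
        "title": ["Data Engineer", "Senior Data Engineer"],
        "company": ["Example BV", "Sample NV"],
        "url": ["https://example.com/job/1", "https://example.com/job/2"],
    }
)

# index=False keeps pandas' row index out of the CSV file.
df.to_csv("jobs.csv", index=False)

# Reading the file back yields the same rows and columns.
print(pd.read_csv("jobs.csv"))
```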
Additional Parameters
- max_delay: since this package is meant for educational purposes only, a delay between requests can be set. When max_delay is passed to the constructor, each job posting is scraped after a random wait of between 0 and max_delay seconds.
  scraper = JobsScraper(country="...", position="...", location="...", pages=..., max_delay=5)
- full_urls: since most scraped job URLs are quite long, the returned pandas DataFrame truncates them, which makes them hard to access. Setting full_urls to True keeps the scraped URLs from being truncated.
  scraper = JobsScraper(country="...", position="...", location="...", pages=..., full_urls=True)
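Both parameters above map onto simple, standard mechanisms. The sketch below is an assumption about how they could work, not jobs_scraper's actual implementation; the helpers polite_pause and show_full_urls are hypothetical. polite_pause waits a uniformly random time between 0 and max_delay seconds, as described for max_delay. show_full_urls sets pandas' display.max_colwidth option to None (no limit, pandas 1.0 or later), one common way to stop long URLs from being cut off when a DataFrame is printed.

```python
import random
import time

import pandas as pd


def polite_pause(max_delay: float) -> float:
    """Hypothetical helper: sleep a random time between 0 and max_delay seconds."""
    delay = random.uniform(0, max_delay)  # uniform draw over [0, max_delay]
    time.sleep(delay)
    return delay


def show_full_urls() -> None:
    """Hypothetical helper: print DataFrame cells without truncating them."""
    # pandas shortens long cell text to 50 characters by default;
    # None removes the limit (pandas >= 1.0).
    pd.set_option("display.max_colwidth", None)


# Example: a short random pause before each of three simulated job postings.
for posting in range(3):
    waited = polite_pause(max_delay=0.1)
    print(f"posting {posting}: waited {waited:.3f} s")

long_url = "https://example.com/" + "x" * 100
show_full_urls()
print(pd.DataFrame({"url": [long_url]}))
```

Drawing a fresh random delay per posting spreads requests out irregularly instead of at a fixed rate, which is gentler on the site being scraped.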
Source Distribution
File details
Details for the file jobs_scraper-0.0.5.tar.gz.
File metadata
- Download URL: jobs_scraper-0.0.5.tar.gz
- Upload date:
- Size: 4.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 50ef1dacefad1c7ebb8837180850e0eb5e812480ed602044cb17fd2d7784e7d6
MD5 | 040bacaa015266221fd1339b894c8a25
BLAKE2b-256 | a5bdf6ad975d1fd71f0d5eeb10abf36d4e1fa071031fe61e438d397515b86da2