Automated tool for scraping job postings.
Project description
Automated tool for scraping job postings into a .csv file.
Benefits over job search sites:
- Never see the same job twice!
- No advertising.
- See jobs from multiple job search websites all in one place.
Installation
JobFunnel requires Python 3.8 or later.
pip install git+https://github.com/PaulMcInnis/JobFunnel.git
Usage
By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets.
Configure
You can search for jobs with YAML configuration files or by passing command arguments.
Download the demo settings.yaml by running the command below:
wget https://git.io/JUWeP -O my_settings.yaml
NOTE:
- It is recommended to provide as few search keywords as possible (e.g. Python, AI).
- JobFunnel currently only supports the CANADA_ENGLISH and USA_ENGLISH locales.
Scrape
Run funnel with your settings YAML to populate your master CSV file with jobs from available providers:
funnel load -s my_settings.yaml
Review
Open the master CSV file and update the per-job status:
- Set to interested, applied, interview or offer to reflect your progression on the job.
- Set to archive, rejected or delete to remove a job from this search. You can review 'blocked' jobs within your block_list_file.
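As a reviewing aid, the sketch below tallies how many jobs sit in each status. It is not part of JobFunnel: the column name status is an assumption based on the wording above, and summarize_statuses and count_removed are hypothetical helper names, so check the header row of your own master CSV before relying on it.

```python
import csv
import io
from collections import Counter

# Status values named in the Review section; these remove a job from the search.
REMOVAL_STATUSES = {"archive", "rejected", "delete"}


def summarize_statuses(csv_file, status_column="status"):
    """Count jobs per status in a master CSV given as an open text file.

    status_column is an assumed header name, not a confirmed one.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Normalize whitespace and case so "Applied " and "applied" match.
        value = (row.get(status_column) or "").strip().lower()
        counts[value or "(blank)"] += 1
    return counts


def count_removed(counts):
    """Number of jobs whose status would drop them from the search."""
    return sum(n for status, n in counts.items() if status in REMOVAL_STATUSES)


if __name__ == "__main__":
    # Stand-in data; replace with open("master_list.csv", newline="").
    sample = io.StringIO(
        "status,title,company\n"
        "applied,Data Engineer,Acme\n"
        "interested,Python Developer,Initech\n"
        "rejected,QA Analyst,Globex\n"
    )
    counts = summarize_statuses(sample)
    for status, n in sorted(counts.items()):
        print(f"{status}: {n}")
    print("to be removed:", count_removed(counts))
```

Taking an open file rather than a path keeps the helper easy to try on in-memory data before pointing it at a real CSV.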
Advanced Usage
- Automating Searches
  JobFunnel can be easily automated to run nightly with crontab. For more information see the crontab document.
- Writing your own Scrapers
  If you have a job website you'd like to write a scraper for, you are welcome to implement it. Review the Base Scraper for implementation details.
- Remote Work
  Bypass a frustrating user experience looking for remote work by setting the search parameter remoteness to match your desired level, e.g. FULLY_REMOTE.
- Adding Support for X Language / Job Website
  JobFunnel supports scraping jobs from the same job website across locales & domains. If you are interested in adding support, you may only need to define session headers and domain strings. Review the Base Scraper for further implementation details.
- Blocking Companies
  Filter undesired companies by adding them to your company_block_list in your YAML, or pass them on the command line with -cbl.
- Job Age Filter
  You can set the maximum age of scraped listings (in days) with max_listing_days.
- Reviewing Jobs in Terminal
  You can review the job list in the command line:
  column -s, -t < master_list.csv | less -#2 -N -S
- Respectful Delaying
  Respectfully scrape your job posts with our built-in delaying algorithms. To better understand how to configure delaying, check out this Jupyter Notebook, which breaks down the algorithm step by step with code and visualizations.
- Recovering Lost Data
  JobFunnel can re-build your master CSV from your cache_folder, where all the historic scrape data is located:
  funnel --recover
- Running by CLI
  You can run JobFunnel using the CLI only. Review the command structure via:
  funnel inline -h
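To make the company block list and job age filter concrete, here is a minimal sketch of the idea in Python. It is an illustration under stated assumptions, not JobFunnel's implementation: the Job record and keep_job helper are hypothetical, and matching company names case-insensitively and in full is a choice made for this example only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Job:
    # Hypothetical record for illustration; JobFunnel's own job type may differ.
    title: str
    company: str
    post_date: date


def keep_job(job, company_block_list, max_listing_days, today):
    """Return True when the job's company is not blocked and the post is recent enough."""
    blocked = {name.strip().lower() for name in company_block_list}
    if job.company.strip().lower() in blocked:
        return False
    age_days = (today - job.post_date).days
    return age_days <= max_listing_days


# A fixed "today" keeps the example reproducible.
today = date(2020, 10, 1)
jobs = [
    Job("Python Developer", "Acme", today - timedelta(days=3)),
    Job("ML Engineer", "Spamco", today - timedelta(days=1)),
    Job("Data Analyst", "Initech", today - timedelta(days=60)),
]
kept = [job for job in jobs if keep_job(job, ["spamco"], 35, today)]
print([job.title for job in kept])
```

With a block list of spamco and a 35-day limit, only the Acme posting survives: one job is dropped for its company and one for its age.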