
Automated tool for scraping job postings.

Project description


Automated tool for scraping job postings into a .csv file.

Benefits over job search sites:

  • Never see the same job twice!
  • No advertising.
  • See jobs from multiple job search websites all in one place.


Installation

JobFunnel requires Python 3.8 or later.

pip install git+https://github.com/PaulMcInnis/JobFunnel.git

Usage

By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets.

Configure

You can search for jobs with YAML configuration files or by passing command-line arguments.

Download the demo settings.yaml by running the command below:

wget https://git.io/JUWeP -O my_settings.yaml

NOTE:

  • It is recommended to provide as few search keywords as possible (e.g. Python, AI).

  • JobFunnel currently only supports CANADA_ENGLISH and USA_ENGLISH locales.

Scrape

Run funnel with your settings YAML to populate your master CSV file with jobs from available providers:

funnel load -s my_settings.yaml

Review

Open the master CSV file and update the per-job status:

  • Set to interested, applied, interview or offer to reflect your progression on the job.

  • Set to archive, rejected or delete to remove a job from this search. You can review 'blocked' jobs within your block_list_file.
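
If you prefer to check your progress programmatically, the per-job statuses can be tallied with Python's standard csv module. This is a minimal sketch, not part of JobFunnel: it assumes the master CSV has a header row with columns named `title`, `company` and `status` (check the header of your own file), and it reads an in-memory sample instead of a real file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; a real master CSV has more columns.
SAMPLE = """title,company,status
Data Engineer,Acme,archive
ML Engineer,Globex,interested
Backend Developer,Initech,applied
Python Developer,Umbrella,interested
"""

# Statuses that reflect progression on a job, as listed above.
ACTIVE_STATUSES = {"interested", "applied", "interview", "offer"}


def summarize_statuses(csv_file):
    """Count jobs per status (column name 'status' is an assumption)."""
    reader = csv.DictReader(csv_file)
    return Counter(row["status"].strip().lower() for row in reader)


def active_jobs(csv_file):
    """Return (title, company, status) for jobs you are progressing on."""
    reader = csv.DictReader(csv_file)
    return [
        (row["title"], row["company"], row["status"])
        for row in reader
        if row["status"].strip().lower() in ACTIVE_STATUSES
    ]


if __name__ == "__main__":
    print(summarize_statuses(io.StringIO(SAMPLE)))
    for job in active_jobs(io.StringIO(SAMPLE)):
        print(job)
```

To use it on a real file, replace `io.StringIO(SAMPLE)` with `open("master_list.csv", newline="")`.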

Advanced Usage

  • Automating Searches
    JobFunnel can easily be automated to run nightly with crontab.
    For more information, see the crontab document.

  • Writing your own Scrapers
    If you have a job website you'd like to write a scraper for, you are welcome to implement it; review the Base Scraper for implementation details.

  • Remote Work
    Bypass a frustrating user experience when looking for remote work by setting the search parameter remoteness to your desired level, e.g. FULLY_REMOTE.

  • Adding Support for X Language / Job Website
    JobFunnel supports scraping jobs from the same job website across locales and domains. If you are interested in adding support, you may only need to define session headers and domain strings; review the Base Scraper for further implementation details.

  • Blocking Companies
    Filter out undesired companies by adding them to the company_block_list in your YAML, or pass them on the command line with -cbl.

  • Job Age Filter
    You can configure the maximum age of scraped listings (in days) by setting max_listing_days.

  • Reviewing Jobs in Terminal
    You can review the job list in the command line:

    column -s, -t < master_list.csv | less -#2 -N -S
    
  • Respectful Delaying
    Scrape job postings respectfully with our built-in delaying algorithms.

    To better understand how to configure delaying, check out this Jupyter Notebook, which breaks down the algorithm step by step with code and visualizations.

  • Recovering Lost Data
    JobFunnel can rebuild your master CSV from your cache_folder, where all historic scrape data is stored:

    funnel --recover
    
  • Running by CLI
    You can run JobFunnel using only the CLI; review the command structure via:

    funnel inline -h
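
The Blocking Companies option above amounts to dropping every job whose company appears in a block list. The sketch below illustrates that idea in plain Python; it is not JobFunnel's implementation, and both the `company` field name and the case-insensitive, whitespace-trimmed comparison are assumptions made for illustration.

```python
def filter_blocked(jobs, company_block_list):
    """Drop jobs whose company is in the block list (case-insensitive match)."""
    # Normalize block-list entries once so membership checks are cheap.
    blocked = {name.strip().casefold() for name in company_block_list}
    return [job for job in jobs if job["company"].strip().casefold() not in blocked]


jobs = [
    {"title": "Data Engineer", "company": "Acme"},
    {"title": "ML Engineer", "company": "Globex Corp"},
    {"title": "SRE", "company": "acme "},
]
kept = filter_blocked(jobs, ["ACME"])
print([job["title"] for job in kept])  # ['ML Engineer']
```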
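
The Job Age Filter setting above can be thought of as a cutoff on each listing's posting date. This sketch assumes each job carries a `datetime.date` in a hypothetical `post_date` field and keeps listings posted at most `max_listing_days` days ago; treating the boundary day as included is an assumption, not something the JobFunnel documentation specifies.

```python
from datetime import date, timedelta


def filter_by_age(jobs, max_listing_days, today=None):
    """Keep jobs posted within the last max_listing_days days (boundary inclusive)."""
    if today is None:
        today = date.today()
    cutoff = today - timedelta(days=max_listing_days)
    return [job for job in jobs if job["post_date"] >= cutoff]


jobs = [
    {"title": "Data Engineer", "post_date": date(2024, 3, 1)},
    {"title": "ML Engineer", "post_date": date(2024, 3, 20)},
]
recent = filter_by_age(jobs, max_listing_days=14, today=date(2024, 3, 25))
print([job["title"] for job in recent])  # ['ML Engineer']
```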
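
One caveat with the column command shown under Reviewing Jobs in Terminal: `column -s,` splits on every comma, so a quoted field containing a comma (a job title or company name, for example) shifts the remaining columns. A small Python alternative using the standard csv module parses quoted fields correctly. The script name `show_jobs.py` and the 30-character truncation width are illustrative choices, not part of JobFunnel.

```python
import csv
import io
import sys


def format_table(csv_file, max_width=30):
    """Return the CSV as aligned text columns, truncating long cells."""
    rows = [[cell[:max_width] for cell in row] for row in csv.reader(csv_file)]
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    return "\n".join(
        "  ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python show_jobs.py master_list.csv | less -S
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        print(format_table(f))
```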
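
The Jupyter Notebook mentioned under Respectful Delaying describes JobFunnel's own delay algorithms. Purely to illustrate the general idea of a delay schedule, the sketch below spaces requests with a delay that ramps linearly from a minimum to a maximum and optionally adds random jitter. The function and its parameter names (`min_delay`, `max_delay`, `jitter`) are hypothetical and are not JobFunnel's configuration keys.

```python
import random


def linear_delays(n_requests, min_delay=1.0, max_delay=5.0, jitter=0.0, seed=None):
    """Return one delay (seconds) per request, ramping linearly from min to max.

    If jitter > 0, each delay is perturbed by up to +/- jitter seconds,
    then clamped so it never drops below zero.
    """
    if n_requests <= 0:
        return []
    rng = random.Random(seed)  # seeded generator for reproducible jitter
    step = (max_delay - min_delay) / (n_requests - 1) if n_requests > 1 else 0.0
    delays = []
    for i in range(n_requests):
        delay = min_delay + i * step
        if jitter > 0:
            delay += rng.uniform(-jitter, jitter)
        delays.append(max(0.0, delay))
    return delays


print(linear_delays(5))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

A scraper following this schedule would call `time.sleep(delay)` before each request; ramping up the delay spreads load on the job site while keeping the first few requests fast.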
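
Recovering Lost Data works because the cache_folder keeps historic scrape data. Conceptually, rebuilding a master list means merging every cached scrape and keeping one row per job. The sketch below shows that merge over in-memory records; the `job_id` field, the record layout, and the rule of keeping the most recently scraped copy are all assumptions for illustration and do not describe JobFunnel's actual cache format.

```python
from datetime import datetime


def merge_cached_scrapes(scrapes):
    """Merge several scrape results into one list with a single entry per job.

    `scrapes` is an iterable of (scraped_at, records) pairs; when the same
    job_id appears more than once, the most recently scraped record wins.
    """
    latest = {}  # job_id -> (scraped_at, record)
    for scraped_at, records in scrapes:
        for record in records:
            job_id = record["job_id"]
            if job_id not in latest or scraped_at > latest[job_id][0]:
                latest[job_id] = (scraped_at, record)
    # Sort by job_id so the rebuilt list has a stable order.
    return [latest[job_id][1] for job_id in sorted(latest)]


scrapes = [
    (datetime(2024, 3, 1), [{"job_id": "a1", "title": "Data Engineer"}]),
    (datetime(2024, 3, 2), [{"job_id": "a1", "title": "Senior Data Engineer"},
                            {"job_id": "b2", "title": "ML Engineer"}]),
]
for job in merge_cached_scrapes(scrapes):
    print(job["job_id"], job["title"])
```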
    



Download files


Source Distribution

JobFunnel-3.0.0.tar.gz (48.8 kB)

Uploaded Source

Built Distribution

JobFunnel-3.0.0-py3-none-any.whl (62.3 kB)

Uploaded Python 3

File details

Details for the file JobFunnel-3.0.0.tar.gz.

File metadata

  • Download URL: JobFunnel-3.0.0.tar.gz
  • Upload date:
  • Size: 48.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.8.5

File hashes

Hashes for JobFunnel-3.0.0.tar.gz
Algorithm Hash digest
SHA256 f750ed99b356471b9d8f3857df31626d078b5b9bdbdb2134996694da60afa78d
MD5 2c8b7fdb6e612d105b3f7f30abd55bb7
BLAKE2b-256 af05d645c888f28565e9530949d76ff27bad1a95edf0e85588b95cbe6c52d77a


File details

Details for the file JobFunnel-3.0.0-py3-none-any.whl.

File metadata

  • Download URL: JobFunnel-3.0.0-py3-none-any.whl
  • Upload date:
  • Size: 62.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.8.5

File hashes

Hashes for JobFunnel-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5655a72aee2335be4cb4322fa88215ea7bbf0dcc2bb2f65ab41dd7f1de767abd
MD5 3f10a9042444224e8239ebdfa24d140d
BLAKE2b-256 a8fe3112b07814269f10a0a77b8fe13cd2477414c0822f46f038cbc63d802479

