
A tool to scrape job listings from USAJobs, CIRES, and CIRA.

Project description

Job Scraper

Description

Job Scraper is a Python tool designed to scrape job listings from various sources including USAJobs, CIRES, and CIRA. It filters job listings based on specified keywords and can save the results to JSON files for further analysis or processing.

Installation

Before installing Job Scraper, ensure you have Python and Poetry installed on your system. You will also need a suitable Chrome WebDriver installed and accessible on your system path.

  1. Clone the repository:

    git clone https://github.com/NOAA-GSL/jscraper.git
    cd jscraper
    
  2. Install the project dependencies using Poetry:

    poetry install
    

This will create a virtual environment and install all necessary dependencies.

Configuration

Credentials

Before running the scraper, you need to provide your user agent and API authorization key for USAJobs. These should be stored in a credentials file located at ~/.jscraper/credentials. The file should have the following format:

USER_AGENT=your_user_agent_here
AUTHORIZATION_KEY=your_authorization_key_here

Replace your_user_agent_here and your_authorization_key_here with your actual credentials. The user agent is a string that identifies your client to the server, while the authorization key is issued by USAJobs for accessing their API. API keys can be requested at https://developer.usajobs.gov/apirequest/.
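For illustration, a credentials file in this KEY=VALUE format can be read with a few lines of Python. The `load_credentials` helper below is a hypothetical sketch, not part of the package:

```python
from pathlib import Path


def load_credentials(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    creds = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip()
    return creds


# Example (assumes the file described above exists):
# creds = load_credentials(Path.home() / ".jscraper" / "credentials")
# creds["USER_AGENT"], creds["AUTHORIZATION_KEY"]
```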

Command-line Arguments

The Job Scraper tool can be configured using command-line arguments:

--usajobs-keyword: Keyword for filtering USAJobs listings.
--cires-keyword: Keyword for filtering CIRES job listings.
--cira-keyword: Keyword for filtering CIRA job listings.
--usajobs-json-file: Path to save the fetched USAJobs listings.
--cires-json-file: Path to save the fetched CIRES listings.
--cira-json-file: Path to save the fetched CIRA listings.
--verbose: Enable verbose logging.
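As a rough sketch, the flag surface above could be declared with argparse as follows. This is only an illustration of the documented options, not the project's actual main.py:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line options listed above (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Scrape job listings.")
    parser.add_argument("--usajobs-keyword", help="Keyword for filtering USAJobs listings.")
    parser.add_argument("--cires-keyword", help="Keyword for filtering CIRES job listings.")
    parser.add_argument("--cira-keyword", help="Keyword for filtering CIRA job listings.")
    parser.add_argument("--usajobs-json-file", help="Path to save the fetched USAJobs listings.")
    parser.add_argument("--cires-json-file", help="Path to save the fetched CIRES listings.")
    parser.add_argument("--cira-json-file", help="Path to save the fetched CIRA listings.")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging.")
    return parser
```

Each `--foo-bar` flag becomes an `args.foo_bar` attribute after parsing, with `--verbose` as a boolean switch.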

Usage

Activate the Poetry virtual environment and run the main.py script with the desired command-line arguments. Here are some example usages:

Scrape federal job listings with a specific keyword:

poetry run python main.py --usajobs-keyword "Data Scientist"

Scrape CIRES job listings and save to a JSON file:

poetry run python main.py --cires-keyword "Climate" --cires-json-file "cires_jobs.json"

For more information on available command-line options, use:

poetry run python main.py --help
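For context, USAJobs listings come from the public USAJobs Search API, which expects the User-Agent and Authorization-Key headers that the credentials file supplies. A minimal standard-library sketch of such a request is shown below; `build_search_request` is a hypothetical helper, not part of this package:

```python
import json
import urllib.parse
import urllib.request

USAJOBS_SEARCH_URL = "https://data.usajobs.gov/api/search"


def build_search_request(keyword: str, user_agent: str, auth_key: str) -> urllib.request.Request:
    """Build a keyword query against the USAJobs Search API."""
    url = USAJOBS_SEARCH_URL + "?" + urllib.parse.urlencode({"Keyword": keyword})
    return urllib.request.Request(
        url,
        headers={
            "Host": "data.usajobs.gov",
            "User-Agent": user_agent,
            "Authorization-Key": auth_key,
        },
    )


# Fetching requires network access and valid credentials:
# with urllib.request.urlopen(build_search_request("Data Scientist", ua, key)) as resp:
#     results = json.load(resp)["SearchResult"]["SearchResultItems"]
```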


Download files

Download the file for your platform.

Source Distribution

jscraper-0.1.2.tar.gz (11.1 kB)


Built Distribution

jscraper-0.1.2-py3-none-any.whl (11.9 kB)


File details

Details for the file jscraper-0.1.2.tar.gz.

File metadata

  • Download URL: jscraper-0.1.2.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.12 Linux/6.5.0-1016-azure

File hashes

Hashes for jscraper-0.1.2.tar.gz:

  • SHA256: a10e8f7a9cf526be1a0ce87d120c7fc3c8ed75278ce8c273fee8a3fbfad3088e
  • MD5: e4bc69bd266e741d4db4d21524759708
  • BLAKE2b-256: 8fae18f97229ac32c94ace2d8249a77994d5e41a1d91b475b84ccb6f8f326717


File details

Details for the file jscraper-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: jscraper-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.12 Linux/6.5.0-1016-azure

File hashes

Hashes for jscraper-0.1.2-py3-none-any.whl:

  • SHA256: b36e41d27478df70e19f2f6c29ae9540e9382c07e2b7425a4528e11366299c8d
  • MD5: 10e151264551f5e0e3d1649fa558943f
  • BLAKE2b-256: 722bd67bb49b4730b001bc8d2fdc36934344bb41dfa52836244615dcb793db69

