A tool to scrape job listings from USAJobs, CIRES, and CIRA.
Project description
Job Scraper
Description
Job Scraper is a Python tool designed to scrape job listings from various sources including USAJobs, CIRES, and CIRA. It filters job listings based on specified keywords and can save the results to JSON files for further analysis or processing.
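In practice, the flow is: fetch listings from a source, keep only those matching a keyword, and write the survivors to a JSON file. The sketch below illustrates the filter-and-save step; the function names and the "title" field are illustrative assumptions, not jscraper's actual API.

import json

def filter_listings(listings, keyword):
    # Keep listings whose title mentions the keyword (case-insensitive).
    # The "title" field name is assumed for illustration only.
    return [job for job in listings if keyword.lower() in str(job.get("title", "")).lower()]

def save_listings(listings, path):
    # Write the filtered listings to a JSON file for later analysis.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(listings, f, indent=2)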
Installation
Before installing Job Scraper, ensure you have Python and Poetry installed on your system. You will also need a compatible Chrome WebDriver installed and accessible on your system PATH (a quick check sketch follows the installation steps).
- Clone the repository:
git clone https://github.com/NOAA-GSL/jscraper.git
cd jscraper
- Install the project dependencies using Poetry:
poetry install
This will create a virtual environment and install all necessary dependencies.
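Because the scrapers depend on a Chrome WebDriver, it can be useful to confirm the driver is reachable before running anything. This is a minimal check assuming Selenium is available in the environment; it is not a step required by jscraper itself.

from selenium import webdriver

# Raises an error if a compatible Chrome WebDriver cannot be started;
# assumes Selenium is installed in the Poetry environment.
driver = webdriver.Chrome()
driver.quit()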
Configuration
Credentials
Before running the scraper, you need to provide your user agent and API authorization key for USAJobs. These should be stored in a credentials file located at ~/.jscraper/credentials. The file should have the following format:
USER_AGENT=your_user_agent_here
AUTHORIZATION_KEY=your_authorization_key_here
Replace your_user_agent_here and your_authorization_key_here with your actual credentials. The user agent is a string that identifies your client to the USAJobs servers, while the authorization key is the API key issued by USAJobs for accessing their API. API keys can be requested at https://developer.usajobs.gov/apirequest/.
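For reference, a file in this KEY=value format can be parsed with a few lines of Python. This is only an illustration of the expected format, not jscraper's actual credential-loading code.

from pathlib import Path

def load_credentials(path=Path.home() / ".jscraper" / "credentials"):
    # Parse simple KEY=value lines into a dict, e.g.
    # {"USER_AGENT": "...", "AUTHORIZATION_KEY": "..."}.
    creds = {}
    for line in path.read_text().splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition("=")
            creds[key.strip()] = value.strip()
    return creds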
Command-line Arguments
The Job Scraper tool can be configured using the following command-line arguments (a usage sketch follows the list):
--usajobs-keyword: Keyword for filtering USAJobs listings.
--cires-keyword: Keyword for filtering CIRES job listings.
--cira-keyword: Keyword for filtering CIRA job listings.
--usajobs-json-file: Path to save the fetched USAJobs listings.
--cires-json-file: Path to save the fetched CIRES listings.
--cira-json-file: Path to save the fetched CIRA listings.
--verbose: Enable verbose logging.
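The sketch below shows how a command-line interface exposing these flags might be wired up with argparse; it mirrors the options above but is not the package's actual main.py.

import argparse

# Illustrative argparse setup mirroring the documented flags.
parser = argparse.ArgumentParser(description="Scrape job listings from USAJobs, CIRES, and CIRA.")
parser.add_argument("--usajobs-keyword", help="Keyword for filtering USAJobs listings.")
parser.add_argument("--cires-keyword", help="Keyword for filtering CIRES job listings.")
parser.add_argument("--cira-keyword", help="Keyword for filtering CIRA job listings.")
parser.add_argument("--usajobs-json-file", help="Path to save the fetched USAJobs listings.")
parser.add_argument("--cires-json-file", help="Path to save the fetched CIRES listings.")
parser.add_argument("--cira-json-file", help="Path to save the fetched CIRA listings.")
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging.")
args = parser.parse_args()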
Usage
Activate the Poetry virtual environment and run the main.py script with the desired command-line arguments. Here are some example usages:
Scrape federal job listings with a specific keyword:
poetry run python main.py --usajobs-keyword "Data Scientist"
Scrape CIRES job listings and save to a JSON file:
poetry run python main.py --cires-keyword "Climate" --cires-json-file "cires_jobs.json"
For more information on available command-line options, use:
poetry run python main.py --help
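Once listings have been saved, the JSON files can be loaded back for further analysis. A minimal sketch follows, using the cires_jobs.json file from the example above and assuming it holds a JSON array of listings (the fields inside each listing are not documented here).

import json

# Load previously saved listings; assumes the file holds a JSON array.
with open("cires_jobs.json", encoding="utf-8") as f:
    listings = json.load(f)

print(f"Loaded {len(listings)} CIRES listings")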
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution: jscraper-0.1.2.tar.gz (11.1 kB)
Built Distribution: jscraper-0.1.2-py3-none-any.whl (11.9 kB)
File details
Details for the file jscraper-0.1.2.tar.gz.
File metadata
- Download URL: jscraper-0.1.2.tar.gz
- Upload date:
- Size: 11.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.2 CPython/3.10.12 Linux/6.5.0-1016-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | a10e8f7a9cf526be1a0ce87d120c7fc3c8ed75278ce8c273fee8a3fbfad3088e
MD5 | e4bc69bd266e741d4db4d21524759708
BLAKE2b-256 | 8fae18f97229ac32c94ace2d8249a77994d5e41a1d91b475b84ccb6f8f326717
File details
Details for the file jscraper-0.1.2-py3-none-any.whl.
File metadata
- Download URL: jscraper-0.1.2-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.2 CPython/3.10.12 Linux/6.5.0-1016-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | b36e41d27478df70e19f2f6c29ae9540e9382c07e2b7425a4528e11366299c8d
MD5 | 10e151264551f5e0e3d1649fa558943f
BLAKE2b-256 | 722bd67bb49b4730b001bc8d2fdc36934344bb41dfa52836244615dcb793db69