A small app to grab job postings from online job boards
Introduction
Job boards (like LinkedIn) can be a good source for finding job openings. Unfortunately, the search results cannot always be filtered to a usable degree. This application lets users scrape and parse job postings with more flexibility than the default search provides.
Currently only LinkedIn is supported.
Project Structure
Directories:
src/exfill/parsers
- Contains parser(s)
src/exfill/scrapers
- Contains scraper(s)
src/exfill/support
- Contains geckodriver, the driver for Firefox which is used by Selenium
- Download the latest driver from the Mozilla GeckoDriver repo on GitHub
data/html
- Not in source control
- Contains HTML elements for a specific job posting
- Populated by a scraper
data/csv
- Not in source control
- Contains parsed information in a csv table
- Populated by a parser
- Also contains an error table
logs
- Not in source control
- Contains logs created during execution
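Because data/html, data/csv, and logs are not in source control, they have to exist before a scraper or parser can write to them. The following is a minimal sketch of one way to create them, not the project's actual code; `ensure_dirs`, `RUNTIME_DIRS`, and the `base` argument are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Runtime directories listed above that are kept out of source control.
# (Hypothetical constant; exfill may define its paths differently.)
RUNTIME_DIRS = ("data/html", "data/csv", "logs")


def ensure_dirs(base):
    """Create the runtime directories under `base` if they are missing.

    Returns the list of directory paths, whether or not they already existed.
    """
    paths = []
    for rel in RUNTIME_DIRS:
        path = Path(base) / rel
        # parents=True creates "data" as needed; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `ensure_dirs(".")` from the project root would create any of the three directories that are missing and leave existing ones untouched.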
creds.json File
Syntax should be as follows:
{
    "linkedin": {
        "username": "jay-law@gmail.com",
        "password": "password1"
    }
}
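A file in this shape can be read with the standard `json` module. The sketch below is illustrative only and is not taken from exfill; the function name `load_creds` and its `path` and `site` parameters are assumptions.

```python
import json
from pathlib import Path


def load_creds(path, site="linkedin"):
    """Return (username, password) for `site` from a creds.json-style file.

    Hypothetical helper: the real project may load credentials differently.
    """
    with Path(path).open(encoding="utf-8") as fh:
        creds = json.load(fh)
    try:
        entry = creds[site]
        return entry["username"], entry["password"]
    except KeyError as exc:
        # Report a missing site or field clearly instead of a bare KeyError.
        raise ValueError(f"creds file is missing key {exc} for site '{site}'") from exc
```

For example, `username, password = load_creds("creds.json")` would return the two values stored under the "linkedin" key.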
Usage
Configure Environment
Tested on Ubuntu 20.04.4 LTS (64-bit) and Python 3.8.10.
# Confirm Python 3 is installed
$ python3 --version
Python 3.8.10
# Install venv
$ sudo apt install python3.8-venv
# Install pip
$ sudo apt install python3-pip
Execution
There are two phases. The first scrapes the postings and the second parses the scraped information. Because the parser works on the output of the scraper, the scraping phase must run before the parsing phase.
# Scrape linkedin
$ python3 src/exfill/extractor.py linkedin scrape
# Parse linkedin
$ python3 src/exfill/extractor.py linkedin parse
# or execute as module
$ python3 -m exfill.extractor linkedin parse
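The commands above take a job board name followed by a phase. A command line of that shape could be handled with `argparse` as sketched below. This is an assumption about the structure, not the actual contents of extractor.py; `build_parser`, `main`, and the placeholder return value are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for `<site> <action>`, e.g. `linkedin scrape`."""
    parser = argparse.ArgumentParser(prog="extractor")
    # Only LinkedIn is listed because it is the only supported board.
    parser.add_argument("site", choices=["linkedin"], help="job board to process")
    parser.add_argument("action", choices=["scrape", "parse"], help="phase to run")
    return parser


def main(argv=None):
    """Parse arguments and report which phase would run.

    Placeholder dispatch: a real entry point would call a scraper or a
    parser here instead of returning a string.
    """
    args = build_parser().parse_args(argv)
    return f"{args.action} {args.site}"
```

With this sketch, `main(["linkedin", "scrape"])` selects the scrape phase, and an unsupported site or action makes `argparse` print a usage message and exit.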
Roadmap
- Write unit tests
- Improve secret handling
- Add packaging
- Move paths to config file
- Move keyword logic
- Set/include default config.ini for users installing with pip
- Add CI/CD
- Automate versioning
Source Distribution: exfill-0.0.11.tar.gz (2.7 MB)