A small app to grab job postings from online job boards
Introduction
Job boards (like LinkedIn) can be a good source of job openings. Unfortunately, their search results cannot always be filtered to a usable degree. Exfill (short for extraction) lets users scrape and parse job postings with more flexibility than the default search provides.
Currently only LinkedIn is supported.
Project Structure
Directories:
src/exfill/parsers
- Contains parser(s)
src/exfill/scrapers
- Contains scraper(s)
src/exfill/support
- Contains the geckodriver driver for Firefox, which is used by Selenium
- Download the latest driver from the Mozilla GeckoDriver repo on GitHub
data/html
- Not in source control
- Contains HTML elements for a specific job posting
- Populated by a scraper
data/csv
- Not in source control
- Contains parsed information in a CSV table
- Populated by a parser
- Also contains an error table
logs
- Not in source control
- Contains logs created during execution
creds.json File
Syntax should be as follows:
{
    "linkedin": {
        "username": "jay-law@protonmail.com",
        "password": "password1"
    }
}
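As an illustration of how a file with this layout can be read, here is a minimal Python sketch. The helper name `load_creds` and its error handling are assumptions made for this example; they are not taken from the exfill source.

```python
import json
from pathlib import Path


def load_creds(path, site="linkedin"):
    """Return (username, password) for `site` from a creds.json-style file.

    Hypothetical helper for illustration only; not part of exfill.
    """
    with Path(path).open(encoding="utf-8") as handle:
        creds = json.load(handle)

    if site not in creds:
        raise KeyError(f"no '{site}' section in {path}")

    section = creds[site]
    # Treat absent or empty values as missing.
    missing = [key for key in ("username", "password") if not section.get(key)]
    if missing:
        raise ValueError(f"'{site}' section is missing: {', '.join(missing)}")

    return section["username"], section["password"]
```

With a creds.json in the working directory, `username, password = load_creds("creds.json")` would return the two values for the linkedin entry.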
Usage
There are two actions required to generate usable data:
First is the scraping action. When called, a browser will open and perform a job query on the specified site. Each posting will be exported to the data/html directory.
The second action is parsing. Each job posting in data/html will be opened and analyzed. Once all postings have been analyzed a single CSV file will be exported to data/csv.
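The parsing step can be pictured roughly as follows. This is only a sketch under stated assumptions: it uses the standard library rather than exfill's own parser, it assumes the saved postings end in .html, and it extracts just the page title as a stand-in for the real fields. The names `TitleParser` and `parse_postings` are hypothetical.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_postings(html_dir, csv_path):
    """Parse every .html file in html_dir and write one CSV row per file."""
    rows = []
    for html_file in sorted(Path(html_dir).glob("*.html")):
        parser = TitleParser()
        parser.feed(html_file.read_text(encoding="utf-8"))
        rows.append({"file": html_file.name, "title": parser.title.strip()})

    # Create the output directory if needed, then write a single CSV file.
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "title"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The function returns the number of postings processed, so a call such as `parse_postings("data/html", "data/csv/jobs.csv")` (the output file name is made up here) also gives a quick sanity check on how many files were read.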
The CSV file provides a high-level overview of all the jobs returned by the query. When it is imported into a spreadsheet, users can filter on fields not present in the original search options. Examples include sorting by company or excluding certain industries.
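The same kind of filtering can also be done in code instead of a spreadsheet. The sketch below assumes the CSV has columns named `company` and `industry`; those column names, and the function name `filter_jobs`, are assumptions for illustration and may not match the file exfill actually produces.

```python
import csv


def filter_jobs(csv_path, excluded_industries):
    """Return rows whose industry is not excluded, sorted by company name."""
    excluded = {name.lower() for name in excluded_industries}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [
            row
            for row in csv.DictReader(handle)
            # Compare case-insensitively so "Staffing" matches "staffing".
            if row["industry"].strip().lower() not in excluded
        ]
    return sorted(rows, key=lambda row: row["company"].lower())
```

Each returned row is a dictionary keyed by column name, so the result can be printed, counted, or written back out with `csv.DictWriter`.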
Add Creds File
This is required for all usage.
# Install with git
$ git clone git@github.com:jay-law/job-scraper.git
# Create and populate creds.json. Bash only:
$ cat <<EOF > creds.json
{
    "linkedin": {
        "username": "jay-law@protonmail.com",
        "password": "password1"
    }
}
EOF
Use as Code
# Install with git
$ git clone git@github.com:jay-law/job-scraper.git
# Activate virtual env
$ poetry shell
# Install dependencies
$ poetry install
# Ensure creds.json exists (see above)
# Execute - Scrape linkedin
$ python3 exfill/extractor.py linkedin scrape
# Execute - Parse linkedin
$ python3 exfill/extractor.py linkedin parse
Use as Module
# Install
$ poetry add exfill
# Ensure creds.json exists (see above)
# Execute - Scrape linkedin
$ python3 -m exfill.extractor linkedin scrape
# Execute - Parse linkedin
$ python3 -m exfill.extractor linkedin parse
Roadmap
- Write unit tests
- Improve secret handling
- Add packaging
- Move paths to config file
- Move keyword logic
- Set/include default config.ini for users installing with pip
- Add CI/CD
- Automate versioning
- Add formatter (black module)
- Add static type checking (mypy module)
- Add import sorter (isort module)
- Add linter (flake8 module)
- Update string interpolation from %-style formatting to f-strings
- Replace sys.exit calls with exceptions
- Update how the config object is accessed
- Migrate to poetry for virtual env, building, and publishing
- Replace os.path usage with pathlib
- Replace pandas export with csv export
- Replace unittest with pytest
File details
Details for the file exfill-0.1.29.tar.gz.
File metadata
- Download URL: exfill-0.1.29.tar.gz
- Upload date:
- Size: 2.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c581f3a656aa92a75a89d9677a9999da842bbaf86bc212a157522ff22beb57f2 |
| MD5 | e11bf78b6389293033990f2ffeaf5230 |
| BLAKE2b-256 | b916cb6151066e47b02e601f54eca34e309f5866622bc9cfb7f5ddcbe47e7fd8 |
File details
Details for the file exfill-0.1.29-py3-none-any.whl.
File metadata
- Download URL: exfill-0.1.29-py3-none-any.whl
- Upload date:
- Size: 2.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a6ca5e0ca4b73896ea06621a2f7dbc44e34233479f4e52194d62ea49b2321a55 |
| MD5 | 86af245fd54a95ab0539748579f783bd |
| BLAKE2b-256 | 363654c69c84909c4e1d8e0d848f9cd6952a3e043a4e159115e3498f8e85d09e |