A small app to grab job postings from online job boards

Project description

Introduction

Job boards (like LinkedIn) can be a good source for finding job openings. Unfortunately, the search results cannot always be filtered to a usable degree. Exfill (short for extraction) lets users scrape and parse job postings with more flexibility than the default search provides.

Currently only LinkedIn is supported.

Project Structure

Directories:

  • src/exfill/parsers - Contains parser(s)
  • src/exfill/scrapers - Contains scraper(s)
  • src/exfill/support
  • data/html
    • Not in source control
    • Contains HTML elements for a specific job posting
    • Populated by a scraper
  • data/csv
    • Not in source control
    • Contains parsed information in a csv table
    • Populated by a parser
    • Also contains an error table
  • logs
    • Not in source control
    • Contains logs created during execution

creds.json File

Syntax should be as follows:

{
    "linkedin": {
        "username": "jay-law@protonmail.com",
        "password": "password1"
    }
}
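
As an illustration of the layout above, here is a minimal sketch of reading the file from Python. The `load_creds` helper is hypothetical and not part of exfill, and the default path assumes creds.json sits in the current working directory.

```python
import json
from pathlib import Path


def load_creds(path="creds.json", site="linkedin"):
    """Return (username, password) for one site from a creds.json-style file.

    Hypothetical helper for illustration; exfill's own loading code may differ.
    """
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    try:
        entry = creds[site]
        return entry["username"], entry["password"]
    except KeyError as exc:
        # Raised when the site block or one of its two keys is missing.
        raise ValueError(f"creds file is missing key: {exc}") from exc
```

Failing early with a clear message is useful here because a missing key would otherwise only surface once the browser session has already started.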

Usage

There are two actions required to generate usable data:

First is the scraping action. When called, a browser will open and perform a job query on the specified site. Each posting will be exported to the data/html directory.

The second action is parsing. Each job posting in data/html will be opened and analyzed. Once all postings have been analyzed a single CSV file will be exported to data/csv.
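
The parsing step can be pictured with the sketch below. It is not exfill's parser: it assumes the saved postings are `.html` files, extracts only the page `<title>` as a stand-in field, and uses hypothetical names (`TitleParser`, `parse_postings`, and the `file`/`title` columns).

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements (illustrative field only)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_postings(html_dir, csv_path):
    """Write one CSV row per HTML file in html_dir and return the row count."""
    rows = []
    for html_file in sorted(Path(html_dir).glob("*.html")):
        parser = TitleParser()
        parser.feed(html_file.read_text(encoding="utf-8"))
        rows.append({"file": html_file.name, "title": parser.title.strip()})

    # Create the output directory if it does not exist yet.
    Path(csv_path).parent.mkdir(parents=True, exist_ok=True)
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "title"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Under these assumptions, `parse_postings("data/html", "data/csv/jobs.csv")` would produce a single table from every saved posting; the output file name is also hypothetical.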

The CSV file provides a high-level overview of all the jobs returned by the query. When imported into a spreadsheet, users can filter on fields not present in the original search options. Examples include sorting by company or excluding certain industries.
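
The same kind of filtering can be done without a spreadsheet. The sketch below assumes the exported table has `company` and `industry` columns; those column names and the `filter_jobs` helper are hypothetical, so check the header of the real file first.

```python
import csv


def filter_jobs(csv_path, exclude_industries=(), sort_key="company"):
    """Return rows outside the excluded industries, sorted by sort_key."""
    excluded = {name.lower() for name in exclude_industries}
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = [
            row
            for row in csv.DictReader(fh)
            # Missing or empty cells are treated as an empty industry name.
            if (row.get("industry") or "").lower() not in excluded
        ]
    return sorted(rows, key=lambda row: (row.get(sort_key) or "").lower())
```

For example, `filter_jobs("data/csv/jobs.csv", exclude_industries=["Staffing"])` would drop staffing-industry rows and order the rest by company name, again assuming that file name and those columns exist.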

Add Creds File

This is required for all usage.

# Clone the repository
$ git clone git@github.com:jay-law/job-scraper.git

# Create and populate creds.json.  Bash only:
cat <<EOF > creds.json
{
    "linkedin": {
        "username": "jay-law@protonmail.com",
        "password": "password1"
    }
}
EOF

Use as Code

# Clone the repository
$ git clone git@github.com:jay-law/job-scraper.git
$ cd job-scraper

# Activate virtual env
$ poetry shell

# Install dependencies
$ poetry install

# Ensure creds.json exists (see above)

# Execute - Scrape linkedin
$ python3 exfill/extractor.py linkedin scrape

# Execute - Parse linkedin
$ python3 exfill/extractor.py linkedin parse

Use as Module

# Install
$ poetry add exfill

# Ensure creds.json exists (see above)

# Execute - Scrape linkedin
$ python3 -m exfill.extractor linkedin scrape

# Execute - Parse linkedin
$ python3 -m exfill.extractor linkedin parse

Roadmap

  • Write unit tests
  • Improve secret handling
  • Add packaging
  • Move paths to config file
  • Move keyword logic
  • Set/include default config.ini for users installing with pip
  • Add CI/CD
  • Automate versioning
  • Add formatter (black module)
  • Add static type checking (mypy module)
  • Add import sorter (isort module)
  • Add linter (flake8 module)
  • Update string interpolation from %-formatting to f-strings
  • Replace sys.exit calls with exceptions
  • Update how the config object is accessed
  • Migrate to poetry for virtual env, building, and publishing
  • Replace os.path usage with pathlib
  • Replace pandas export with csv export
  • Replace unittest with pytest


Download files

Source Distribution

exfill-0.1.29.tar.gz (2.7 MB)

Uploaded Source

Built Distribution

exfill-0.1.29-py3-none-any.whl (2.7 MB)

Uploaded Python 3

File details

Details for the file exfill-0.1.29.tar.gz.

File metadata

  • Download URL: exfill-0.1.29.tar.gz
  • Upload date:
  • Size: 2.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for exfill-0.1.29.tar.gz
Algorithm Hash digest
SHA256 c581f3a656aa92a75a89d9677a9999da842bbaf86bc212a157522ff22beb57f2
MD5 e11bf78b6389293033990f2ffeaf5230
BLAKE2b-256 b916cb6151066e47b02e601f54eca34e309f5866622bc9cfb7f5ddcbe47e7fd8

File details

Details for the file exfill-0.1.29-py3-none-any.whl.

File metadata

  • Download URL: exfill-0.1.29-py3-none-any.whl
  • Upload date:
  • Size: 2.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for exfill-0.1.29-py3-none-any.whl
Algorithm Hash digest
SHA256 a6ca5e0ca4b73896ea06621a2f7dbc44e34233479f4e52194d62ea49b2321a55
MD5 86af245fd54a95ab0539748579f783bd
BLAKE2b-256 363654c69c84909c4e1d8e0d848f9cd6952a3e043a4e159115e3498f8e85d09e