
A small app to grab job postings from online job boards

Project description

Introduction

Job boards (like LinkedIn) can be a good source for finding job openings. Unfortunately, the search results cannot always be filtered to a usable degree. Exfill (short for extraction) lets users scrape and parse job postings with more flexibility than the default search provides.

Currently only LinkedIn is supported.

Project Structure

Directories:

  • src/exfill/parsers - Contains parser(s)
  • src/exfill/scrapers - Contains scraper(s)
  • src/exfill/support
  • data/html
    • Not in source control
    • Contains HTML elements for a specific job posting
    • Populated by a scraper
  • data/csv
    • Not in source control
    • Contains parsed information in a CSV table
    • Populated by a parser
    • Also contains an error table
  • logs
    • Not in source control
    • Contains logs created during execution

creds.json File

Syntax should be as follows:

{
    "linkedin": {
        "username": "jay-law@protonmail.com",
        "password": "password1"
    }
}
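
As an illustration of how a file with this layout could be read, here is a minimal sketch. The function name `load_credentials` and its error handling are assumptions made for this example and are not taken from the exfill source.

```python
import json
from pathlib import Path


def load_credentials(path, site="linkedin"):
    """Return (username, password) for `site` from a creds.json-style file.

    Hypothetical helper: exfill's own loading code may differ.
    """
    with Path(path).open(encoding="utf-8") as fh:
        creds = json.load(fh)

    try:
        entry = creds[site]
        return entry["username"], entry["password"]
    except KeyError as exc:
        # Report a missing site or field in a readable way.
        raise ValueError(f"creds file is missing key: {exc}") from exc
```

Calling `load_credentials("creds.json")` on the file above would return the username and password as a tuple. Since the file holds a plain-text password, it is best kept out of source control.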

Usage

There are two actions required to generate usable data:

First is the scraping action. When called, a browser will open and perform a job query on the specified site. Each posting will be exported to the data/html directory.

The second action is parsing. Each job posting in data/html will be opened and analyzed. Once all postings have been analyzed a single CSV file will be exported to data/csv.
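
The parsing step can be pictured with the following sketch, which reads every saved posting and writes one CSV row per file. It is not exfill's parser: the `*.html` file pattern, the `file` and `title` columns, and the names `TitleParser` and `parse_postings` are all assumptions chosen to keep the example small, and only the page title is extracted.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_postings(html_dir, csv_path):
    """Write one CSV row per HTML file in html_dir; return the row count."""
    rows = []
    for html_file in sorted(Path(html_dir).glob("*.html")):  # assumed extension
        parser = TitleParser()
        parser.feed(html_file.read_text(encoding="utf-8"))
        rows.append({"file": html_file.name, "title": parser.title.strip()})

    with Path(csv_path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "title"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A real parser would pull out more fields (company, location, industry, and so on), but the shape stays the same: many HTML files in, one CSV table out.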

The CSV file provides a high-level overview of all the jobs returned during the query. When imported to a spreadsheet, users can filter on fields not present in the original search options. Examples include sorting by company or excluding certain industries.
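
The same kind of filtering can also be done in a few lines of Python instead of a spreadsheet. The sketch below assumes the exported file has `company` and `industry` columns; those column names and the function `filter_jobs` are hypothetical and should be adjusted to match the actual CSV header.

```python
import csv


def filter_jobs(csv_path, excluded_industries):
    """Return rows whose industry is not excluded, sorted by company name."""
    excluded = {name.lower() for name in excluded_industries}
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = [
            row
            for row in csv.DictReader(fh)
            # "industry" is an assumed column name.
            if row["industry"].lower() not in excluded
        ]
    # "company" is an assumed column name; sort case-insensitively.
    return sorted(rows, key=lambda row: row["company"].lower())
```

For example, `filter_jobs("data/csv/jobs.csv", ["Staffing"])` would drop every row whose industry is "Staffing" and return the rest ordered by company. The file name `jobs.csv` is a placeholder.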

Use as Code

# Install with git
$ git clone git@github.com:jay-law/job-scraper.git

# Create and populate creds.json.  Bash only:
cat <<EOF > creds.json
{
    "linkedin": {
        "username": "jay-law@protonmail.com",
        "password": "password1"
    }
}
EOF

# Activate virtual env
$ poetry shell

# Install dependencies
$ poetry install            # all deps
$ poetry install --no-dev   # don't install linters/formatters

# Execute - Scrape linkedin
$ python3 exfill/extractor.py linkedin scrape

# Execute - Parse linkedin
$ python3 exfill/extractor.py linkedin parse
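
Both commands above follow a `<site> <action>` pattern. As a rough sketch of how such a command line could be handled with `argparse`, and not a copy of what `extractor.py` actually does, consider the following. The function names `build_parser` and `main` are assumptions for this example.

```python
import argparse


def build_parser():
    """Build a parser for `<site> <action>` style invocations."""
    parser = argparse.ArgumentParser(prog="extractor")
    parser.add_argument("site", choices=["linkedin"], help="job board to target")
    parser.add_argument("action", choices=["scrape", "parse"], help="step to run")
    return parser


def main(argv=None):
    """Parse arguments and return (site, action); dispatch would go here."""
    args = build_parser().parse_args(argv)
    return args.site, args.action
```

Restricting `site` and `action` with `choices` means an unsupported site or a misspelled action is rejected with a usage message before any browser is opened.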

Use as Module

NOTE - Module usage was broken during the migration to poetry. It will be fixed soon, hopefully.

# Install
$ python3 -m pip install --upgrade exfill

# Execute - Scrape linkedin
$ python3 -m exfill.extractor linkedin scrape

# Execute - Parse linkedin
$ python3 -m exfill.extractor linkedin parse

Roadmap

  • Write unit tests
  • Improve secret handling
  • Add packaging
  • Move paths to config file
  • Move keyword logic
  • Set/include default config.ini for users installing with pip
  • Add CI/CD
  • Automate versioning
  • Add formatter (black module)
  • Add static type checking (mypy module)
  • Add import sorter (isort module)
  • Add linter (flake8 module)
  • Update string interpolation from %-formatting to f-strings
  • Replace sys.exit calls with exceptions
  • Update how the config object is accessed
  • Migrate to poetry for virtual env, building, and publishing
  • Replace os.path usage with pathlib
  • Replace pandas export with csv export
  • Replace unittest with pytest

Download files

Source Distribution

exfill-0.1.24.tar.gz (2.7 MB)

Uploaded Source

Built Distribution

exfill-0.1.24-py3-none-any.whl (2.7 MB)

Uploaded Python 3
