A small app to grab job postings from online job boards

Project description

Introduction

Job boards (like LinkedIn) can be a good source for finding job openings. Unfortunately, the search results cannot always be filtered to a usable degree. Exfill (short for extraction) lets users scrape and parse jobs with more flexibility than the default search provides.

Currently only LinkedIn is supported.

Project Structure

Directories:

src/exfill/parsers - Contains parser(s)
src/exfill/scrapers - Contains scraper(s)
src/exfill/support
- Contains the geckodriver binary for Firefox, which is used by Selenium
- Download the latest driver from the Mozilla GeckoDriver repo on GitHub
data/html
- Not in source control
- Contains HTML elements for a specific job posting
- Populated by a scraper
data/csv
- Not in source control
- Contains parsed information in a CSV table
- Populated by a parser
- Also contains an error table
logs
- Not in source control
- Contains logs created during execution
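
Because data/html, data/csv, and logs are not in source control, a fresh clone will not contain them. The following is a minimal sketch of creating them up front. The `ensure_dirs` helper and the `UNTRACKED_DIRS` constant are hypothetical names used for illustration; exfill may already create these directories itself.

```python
from __future__ import annotations

from pathlib import Path

# Directories the structure list above marks as "Not in source control".
UNTRACKED_DIRS = ("data/html", "data/csv", "logs")


def ensure_dirs(root: Path) -> list[Path]:
    """Create the untracked working directories under root if they are missing."""
    created = []
    for rel in UNTRACKED_DIRS:
        path = root / rel
        # parents=True also creates the intermediate "data" directory.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_dirs(Path.cwd())` from the repository root is safe to repeat, since existing directories are left untouched.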

`creds.json` File

Syntax should be as follows:

{
    "linkedin": {
        "username": "jay-law@protonmail.com",
        "password": "password1"
    }
}
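
To check that a creds.json file matches this layout before running a scrape, something like the sketch below could be used. `load_creds` is a hypothetical helper written for this example, not a function exposed by exfill; it only assumes the JSON structure shown above.

```python
import json
from pathlib import Path


def load_creds(path, site="linkedin"):
    """Return (username, password) for `site` from a creds.json-style file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    try:
        entry = data[site]
        return entry["username"], entry["password"]
    except (KeyError, TypeError) as exc:
        # Raised when the site block or one of its two keys is absent.
        raise ValueError(
            f"creds file is missing {site!r} username/password"
        ) from exc
```

A malformed file fails with a `ValueError` that names the missing site block, rather than a bare `KeyError` later in the run.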

Usage

There are two actions required to generate usable data:

First is the scraping action. When called, a browser will open and perform a job query on the specified site. Each posting will be exported to the data/html directory.

The second action is parsing. Each job posting in data/html will be opened and analyzed. Once all postings have been analyzed a single CSV file will be exported to data/csv.
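
The parsing step can be pictured roughly as follows. This is a simplified sketch using only the standard library, not exfill's actual parser: it assumes each saved posting is a `*.html` file and extracts only the `<title>` text, whereas the real parser reads LinkedIn-specific elements. `TitleParser` and `parse_postings` are hypothetical names.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collect the text inside the first non-empty <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_postings(html_dir, csv_path):
    """Parse every *.html file in html_dir and write one CSV row per file."""
    rows = []
    for html_file in sorted(Path(html_dir).glob("*.html")):
        parser = TitleParser()
        parser.feed(html_file.read_text(encoding="utf-8"))
        parser.close()
        rows.append({"file": html_file.name, "title": parser.title.strip()})

    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "title"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The function returns the number of postings written, so a caller can compare it against the number of files the scraper saved.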

The CSV file provides a high-level overview of all the jobs returned by the query. When it is imported into a spreadsheet, users can filter on fields not present in the original search options. Examples include sorting by company or excluding certain industries.
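
The same filtering can be done without a spreadsheet. The sketch below assumes the exported CSV has `company` and `industry` columns; those column names are assumptions for illustration and may differ from the headers exfill actually writes. `filter_jobs` is a hypothetical helper.

```python
import csv


def filter_jobs(csv_path, excluded_industries):
    """Return rows whose industry is not excluded, sorted by company name."""
    excluded = {name.lower() for name in excluded_industries}
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = [
            row
            for row in csv.DictReader(fh)
            if row["industry"].lower() not in excluded
        ]
    # Case-insensitive sort so "acme" and "Acme" group together.
    return sorted(rows, key=lambda row: row["company"].lower())
```

For example, `filter_jobs("data/csv/jobs.csv", ["Staffing and Recruiting"])` would drop recruiter postings, assuming a file with that name and those columns exists.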

Use as Code

# Clone with git and enter the project directory
$ git clone git@github.com:jay-law/job-scraper.git
$ cd job-scraper

# Create and populate creds.json.  Bash only:
cat <<EOF > creds.json
{
    "linkedin": {
        "username": "jay-law@protonmail.com",
        "password": "password1"
    }
}
EOF

# Activate virtual env
$ poetry shell

# Install dependencies
$ poetry install            # all deps
$ poetry install --no-dev   # don't install linters/formatters

# Execute - Scrape linkedin
$ python3 exfill/extractor.py linkedin scrape

# Execute - Parse linkedin
$ python3 exfill/extractor.py linkedin parse

Use as Module

NOTE - Module usage broke during the migration to poetry and has not been fixed yet.

# Install
$ python3 -m pip install --upgrade exfill

# Execute - Scrape linkedin
$ python3 -m exfill.extractor linkedin scrape

# Execute - Parse linkedin
$ python3 -m exfill.extractor linkedin parse

Roadmap

Write unit tests
Improve secret handling
Add packaging
Move paths to config file
Move keyword logic
Set/include default config.ini for users installing with pip
Add CI/CD
Automate versioning
Add formatter (black module)
Add static type checking (mypy module)
Add import sorter (isort module)
Add linter (flake8 module)
Update string interpolation from %-formatting to f-strings
Replace sys.exit calls with exceptions
Update how the config object is accessed
Migrate to poetry for virtual env, building, and publishing
Replace os.path usage with pathlib
Replace pandas export with csv export
Replace unittest with pytest

Release history

0.1.29 - Jun 25, 2022
0.1.25 - Jun 25, 2022
0.1.24 - May 26, 2022 (this version)
0.1.21 - May 25, 2022
0.1.20 - May 25, 2022
0.1.18 - May 25, 2022
0.1.17 - May 24, 2022
0.1.16 - May 24, 2022
0.1.15 - May 21, 2022
0.1.14 - May 17, 2022
0.1.13 - May 17, 2022
0.1.10 - May 16, 2022
0.1.8 - May 15, 2022
0.1.7 - May 14, 2022
0.0.20 - May 14, 2022
0.0.19 - May 14, 2022
0.0.18 - May 14, 2022
0.0.17 - May 13, 2022
0.0.15 - May 13, 2022
0.0.14 - May 12, 2022
0.0.12 - May 2, 2022
0.0.11 - May 2, 2022
0.0.9 - May 2, 2022
0.0.7 - May 2, 2022
0.0.6 - May 2, 2022
0.0.5 - May 2, 2022
0.0.3 - May 2, 2022
0.0.1 - May 2, 2022

Download files

Source Distribution
exfill-0.1.24.tar.gz (2.7 MB), uploaded May 26, 2022

Built Distribution
exfill-0.1.24-py3-none-any.whl (2.7 MB), uploaded May 26, 2022, Python 3

Hashes for exfill-0.1.24.tar.gz

Algorithm	Hash digest
SHA256	`7c09989e3ed6f8899e8117d183e1c6492a60fabb6ae4f970b683e15a1046b88f`
MD5	`a8fa9981fc7cee499f314c8f1b51e6f2`
BLAKE2b-256	`d8e84d901bc9fe672452af67fc13ee18821d5382c95c9eb100f8bb07905e3c48`

Hashes for exfill-0.1.24-py3-none-any.whl

Algorithm	Hash digest
SHA256	`9f7eea7ffc58237a8f559d80ae320b003ead61eb0f106512b745ccad9f3f44ff`
MD5	`2ddc422da879c64c4396c6eef0dc4271`
BLAKE2b-256	`af126d4a86fefde15818156aa9296e39b00b0fda752250a892219109b671cbfa`
