A library for scraping LinkedIn job postings.
LinkedInWebScraper
LinkedInWebScraper is a production-minded Python library and scheduled job runner for collecting LinkedIn job listings, normalizing the data, persisting run history, and exporting reusable datasets.
Highlights
- Canonical package namespace under linkedin_web_scraper
- Typed programmatic config for single scrapes and TOML runtime config for CLI and scheduled runs
- Managed artifacts under artifacts/jobs, artifacts/logs, and artifacts/state
- SQLite-backed persistence through a clean application storage port
- Package CLI with scrape once, scrape daily, export, and --dry-run
- Optional OpenAI enrichment built on the current Responses API
- Runnable examples under examples/
- Auto release automation that waits for green CI and Docs runs on main
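Because persistence is SQLite-backed, a stored database can be inspected with Python's standard sqlite3 module. The sketch below lists the tables in a database file without assuming anything about the schema. The function name list_tables and the example path are hypothetical; the actual database location depends on your runtime configuration.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file, sorted by name."""
    path = Path(db_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(db_path)
    # Open read-only so that inspection can never modify stored run history.
    conn = sqlite3.connect(f"{path.as_uri()}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Hypothetical usage; replace the path with your own database file:
# print(list_tables("artifacts/state/example.db"))
```

Opening the file in read-only mode is a deliberate choice here: it keeps ad hoc inspection from interfering with a scheduled run that may be writing to the same database.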
Install
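pip install LinkedInWebScraper
pip install "LinkedInWebScraper[openai]"  # adds the optional OpenAI enrichment dependencies
pip install -e ".[dev]"  # editable development install, run from a repository checkout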
Quickstart
from linkedin_web_scraper import (
    JobScraperConfig,
    LinkedInJobScraper,
    RemoteType,
    configure_logging,
)

# Send scraper logs to example.log.
logger = configure_logging(filename="example.log")

# Describe a single scrape: role, location, and remote filter.
config = JobScraperConfig(
    position="Data Analyst",
    location="San Francisco",
    remote=RemoteType.REMOTE,
)

# Run the scrape and preview the first rows of the result.
jobs = LinkedInJobScraper(logger=logger, config=config).run()
print(jobs.head())
Examples
Run the example scripts from examples/:
python examples/example.py
python examples/example_advanced_config.py
python examples/example_openai.py
The OpenAI example requires OPENAI_API_KEY in the environment.
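A missing key is easier to diagnose when it is checked up front rather than surfacing as an API error mid-run. The following is a minimal sketch of such a check using only the standard library; the helper name require_env is hypothetical and not part of this package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error if unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the example.")
    return value


# Hypothetical usage at the top of a script:
# api_key = require_env("OPENAI_API_KEY")
```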
CLI Runtime
linkedin-webscraper scrape once --dry-run
linkedin-webscraper scrape daily
linkedin-webscraper export --run-id <run-id>
Use runtime.example.toml as the template for a real runtime.toml. The root runtime scripts remain available for the daily and once workflows:
python main.py
python process_ds_jobs.py
Docs
- Getting Started
- Configuration
- Runtime and Deployment
- Release and Automation
- Validation
- API Reference
Development
Run the local gate before risky pushes or merges:
python -m tox -e preflight
For a faster smoke-only path:
python -m tox -e smoke
The detailed validation matrix and release flow live in docs/development/validation.md and docs/development/release-and-automation.md.
License
This project is licensed under the MIT License.
Download files
File details
Details for the file linkedinwebscraper-1.1.1.tar.gz.
File metadata
- Download URL: linkedinwebscraper-1.1.1.tar.gz
- Upload date:
- Size: 55.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ad0b214a1e577608d7c44db4d2f07d6742f8c383b3cdd15631c0979feb2faae |
| MD5 | 1dd916ab06119fc0778fc37787310655 |
| BLAKE2b-256 | 53835343f06660b813a7c9025a4a6533122dbdbd1e9022f36edfd1a1f1422c96 |
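The SHA256 digest listed above can be checked against a downloaded copy of the source distribution. The following is a minimal sketch using the standard hashlib module; the function names sha256_of and matches are hypothetical, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib
import hmac

# SHA256 digest published for linkedinwebscraper-1.1.1.tar.gz.
EXPECTED_SHA256 = "2ad0b214a1e577608d7c44db4d2f07d6742f8c383b3cdd15631c0979feb2faae"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected.lower())


# Hypothetical usage after downloading the archive:
# print(matches("linkedinwebscraper-1.1.1.tar.gz"))
```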
Provenance
The following attestation bundles were made for linkedinwebscraper-1.1.1.tar.gz:
Publisher: release.yml on ricardogr07/LinkedInWebScraper
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: linkedinwebscraper-1.1.1.tar.gz
  - Subject digest: 2ad0b214a1e577608d7c44db4d2f07d6742f8c383b3cdd15631c0979feb2faae
- Sigstore transparency entry: 1155454903
- Sigstore integration time:
- Permalink: ricardogr07/LinkedInWebScraper@1c7e259d9f22a5d1777753e0c6278e9c62a9039c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ricardogr07
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1c7e259d9f22a5d1777753e0c6278e9c62a9039c
- Trigger Event: workflow_run
File details
Details for the file linkedinwebscraper-1.1.1-py3-none-any.whl.
File metadata
- Download URL: linkedinwebscraper-1.1.1-py3-none-any.whl
- Upload date:
- Size: 52.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e00c939d39ca729656acbd9a46e02de4c0a73e236efdd2433ff7ce56262fdf8e |
| MD5 | 516d9c6ecf0b276719961f4861bf7c43 |
| BLAKE2b-256 | 67241ab919241a73e0711338ed6faa984bf7adf5a62475168c5bafe1c6e3abfe |
Provenance
The following attestation bundles were made for linkedinwebscraper-1.1.1-py3-none-any.whl:
Publisher: release.yml on ricardogr07/LinkedInWebScraper
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: linkedinwebscraper-1.1.1-py3-none-any.whl
  - Subject digest: e00c939d39ca729656acbd9a46e02de4c0a73e236efdd2433ff7ce56262fdf8e
- Sigstore transparency entry: 1155454920
- Sigstore integration time:
- Permalink: ricardogr07/LinkedInWebScraper@1c7e259d9f22a5d1777753e0c6278e9c62a9039c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ricardogr07
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1c7e259d9f22a5d1777753e0c6278e9c62a9039c
- Trigger Event: workflow_run