A library for scraping LinkedIn job postings.

LinkedInWebScraper

LinkedInWebScraper is a production-minded Python library and scheduled job runner for collecting LinkedIn job listings, normalizing the data, persisting run history, and exporting reusable datasets.

Highlights

  • Canonical package namespace under linkedin_web_scraper
  • Typed programmatic config for single scrapes and TOML runtime config for CLI and scheduled runs
  • Managed artifacts under artifacts/jobs, artifacts/logs, and artifacts/state
  • SQLite-backed persistence through a clean application storage port
  • Package CLI with scrape once, scrape daily, export, and --dry-run
  • Optional OpenAI enrichment built on the current Responses API
  • Runnable examples under examples/
  • Release automation that waits for green CI and Docs runs on main
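
As a rough illustration of the managed artifacts layout listed above, the following sketch summarizes what a run has written under artifacts/jobs, artifacts/logs, and artifacts/state. It is not part of the library: the helper name summarize_artifacts is hypothetical, and the assumption that the artifacts root sits in the current working directory should be checked against your runtime configuration. It needs Python 3.9 or newer.

```python
from pathlib import Path

# Sub-directory names taken from the Highlights list; anything about the
# files inside them is an assumption, so we only list what exists.
ARTIFACT_DIRS = ("jobs", "logs", "state")


def summarize_artifacts(root="artifacts") -> dict[str, list[str]]:
    """Return the files found in each managed artifact directory.

    Hypothetical helper: missing directories yield an empty list
    instead of raising, so it is safe to call before the first run.
    """
    base = Path(root)
    summary: dict[str, list[str]] = {}
    for name in ARTIFACT_DIRS:
        folder = base / name
        if folder.is_dir():
            # Paths are reported relative to the artifacts root, POSIX style.
            summary[name] = sorted(
                p.relative_to(base).as_posix()
                for p in folder.rglob("*")
                if p.is_file()
            )
        else:
            summary[name] = []
    return summary
```

Calling summarize_artifacts() after a scrape gives a quick view of which exports, logs, and state files were produced without opening each directory by hand.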

Install

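pip install LinkedInWebScraper
pip install "LinkedInWebScraper[openai]"   # adds the optional OpenAI extra
pip install -e ".[dev]"                    # editable development install, run from a repository checkout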

Quickstart

from linkedin_web_scraper import (
    JobScraperConfig,
    LinkedInJobScraper,
    RemoteType,
    configure_logging,
)

logger = configure_logging(filename="example.log")
config = JobScraperConfig(
    position="Data Analyst",
    location="San Francisco",
    remote=RemoteType.REMOTE,
)

jobs = LinkedInJobScraper(logger=logger, config=config).run()
print(jobs.head())

Examples

Run the example scripts from examples/:

python examples/example.py
python examples/example_advanced_config.py
python examples/example_openai.py

The OpenAI example requires OPENAI_API_KEY in the environment.
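
To fail early with a clear message when the key is missing, a small guard like the one below can run before the OpenAI example. This is a minimal sketch, not library code: require_env is a hypothetical helper, and only the variable name OPENAI_API_KEY comes from this README.

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return a non-empty environment variable or raise a clear error.

    Hypothetical helper; `environ` can be passed explicitly for testing,
    otherwise the real process environment is used.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before running the OpenAI example."
        )
    return value
```

For example, calling require_env("OPENAI_API_KEY") at the top of a script stops it before any scraping starts if the key is absent.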

CLI Runtime

linkedin-webscraper scrape once --dry-run
linkedin-webscraper scrape daily
linkedin-webscraper export --run-id <run-id>
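
When the CLI is driven from another Python process (a scheduler wrapper, for instance), the commands above can be assembled and launched as sketched below. The helper names build_command and run_cli are hypothetical. Only the subcommands and the --dry-run flag shown above are taken from this README, and whether --dry-run is accepted by subcommands other than scrape once is not confirmed here.

```python
import shutil
import subprocess

CLI = "linkedin-webscraper"  # console script name as shown in this README


def build_command(*args: str, dry_run: bool = False) -> list[str]:
    """Assemble the argument list for one CLI invocation (hypothetical helper)."""
    cmd = [CLI, *args]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_cli(*args: str, dry_run: bool = False) -> int:
    """Run the CLI and return its exit code.

    Raises FileNotFoundError if the console script is not on PATH,
    which usually means the package is not installed in this environment.
    """
    if shutil.which(CLI) is None:
        raise FileNotFoundError(f"{CLI} not found on PATH; is the package installed?")
    completed = subprocess.run(build_command(*args, dry_run=dry_run), check=False)
    return completed.returncode
```

A wrapper could call run_cli("scrape", "once", dry_run=True) first and proceed to run_cli("scrape", "daily") only if the exit code is zero.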

Use runtime.example.toml as the template for a real runtime.toml. The root runtime scripts remain available for the daily and once workflows:

python main.py
python process_ds_jobs.py

Docs

Development

Run the local gate before risky pushes or merges:

python -m tox -e preflight

For a faster smoke-only path:

python -m tox -e smoke

The detailed validation matrix and release flow live in docs/development/validation.md and docs/development/release-and-automation.md.

License

This project is licensed under the MIT License.
