job-scrapper

Scrape job offers and extract structured data using AI (Claude).

Features

  • Scrapes job listing pages using Selenium with the standard Chrome WebDriver
  • Extracts structured data (title, company, skills, stack, process…) via the Claude LLM
  • Outputs a formatted Markdown job description sheet (fiche de poste)
  • Caches results by URL hash under var/jobs/
  • Opens a live browser by default for manual interaction (Cloudflare, login walls)
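
The URL-hash cache mentioned in the feature list could work roughly as sketched below. This is only an illustration, not the package's actual code: the helper names (cache_path, load_cached, store), the choice of SHA-256, the 16-character prefix, and the JSON file format are all assumptions. Only the var/jobs/ directory comes from the README.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional

CACHE_DIR = Path("var/jobs")  # directory named in the feature list


def cache_path(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Map a URL to a cache file. SHA-256 and the 16-char prefix are assumptions."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{digest}.json"


def load_cached(url: str, cache_dir: Path = CACHE_DIR) -> Optional[dict]:
    """Return the cached result for this URL, or None if it was never stored."""
    path = cache_path(url, cache_dir)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return None


def store(url: str, data: dict, cache_dir: Path = CACHE_DIR) -> Path:
    """Write the extracted data to the cache, creating the directory if needed."""
    path = cache_path(url, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

With a scheme like this, scraping the same URL twice would reuse the stored result instead of calling the browser and the LLM again.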

Install

uv sync

Usage

uv run job-scrapper <URL>                            # opens browser (default)
uv run job-scrapper <URL> --no-live                  # headless mode
uv run job-scrapper <URL> --output-dir ~/out         # custom output directory
uv run job-scrapper <URL> --model claude-haiku-4-5   # cheaper/faster model
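
For readers who want to see how these flags fit together, here is a minimal sketch of an equivalent command-line parser using argparse. It is an assumption that the tool parses its arguments this way: build_parser is a hypothetical name, and the defaults for --output-dir and --model are not documented in this README, so they are left as None.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage lines."""
    parser = argparse.ArgumentParser(prog="job-scrapper")
    parser.add_argument("url", help="URL of the job offer to scrape")
    # The browser opens by default; --no-live switches to headless mode.
    parser.add_argument("--no-live", dest="live", action="store_false",
                        help="run the browser headless")
    # Defaults below are placeholders: the README does not state them.
    parser.add_argument("--output-dir", type=Path, default=None,
                        help="directory for the generated Markdown")
    parser.add_argument("--model", default=None,
                        help="Claude model name, e.g. claude-haiku-4-5")
    return parser


args = build_parser().parse_args(
    ["https://example.com/jobs/123", "--no-live", "--model", "claude-haiku-4-5"]
)
print(args.url, args.live, args.model)
```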

Development

uv sync --extra dev --extra lint
pre-commit install

Lint and format:

ruff check src/
ruff format src/

Environment

Variable            Required   Description
ANTHROPIC_API_KEY   Yes        Claude API key
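
Since the key is required, a script wrapping the tool may want to fail early when it is missing. The following is a minimal sketch under that assumption: require_api_key is a hypothetical helper, not part of the package, and the error message is illustrative.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ANTHROPIC_API_KEY from the environment, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before running job-scrapper"
        )
    return key
```

Passing a mapping explicitly makes the check easy to exercise without touching the real environment.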
