Generic job listings scraper with baseline dedupe and optional Netlify cache submission.

Generic Job Listings Scraper

This scraper accepts one or more business websites, infers the structure of each site's careers or listings pages, extracts candidate job listings, and writes two CSV files:

  • all scraped rows (--output)
  • only the rows not found in the April 2026 JobPool baseline dataset (--unique-output)
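The baseline dedupe step can be pictured with a minimal sketch. This is not the scraper's actual implementation: the key fields (company_name, title, location), the normalization rule, and the helper names row_key, unique_rows, and write_csv are all assumptions made for illustration.

```python
import csv
from typing import Iterable

# Assumption: a listing is identified by company, title, and location.
# The real scraper may compare different fields or use fuzzier matching.
KEY_FIELDS = ("company_name", "title", "location")


def row_key(row: dict) -> tuple:
    """Build a case- and whitespace-insensitive comparison key for one row."""
    return tuple(" ".join(str(row.get(f, "")).lower().split()) for f in KEY_FIELDS)


def unique_rows(scraped: Iterable[dict], baseline: Iterable[dict]) -> list[dict]:
    """Return scraped rows whose key is absent from the baseline, once each."""
    seen = {row_key(r) for r in baseline}
    result = []
    for row in scraped:
        key = row_key(row)
        if key not in seen:
            seen.add(key)  # also drops duplicates within the scrape itself
            result.append(row)
    return result


def write_csv(path: str, rows: list[dict]) -> None:
    """Write rows to CSV, using the union of all row keys as the header."""
    fieldnames = list(dict.fromkeys(k for r in rows for k in r))
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Under these assumptions, the first output file would hold every scraped row and the second would hold the result of unique_rows(scraped, baseline).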

It can also POST unique rows to the Netlify cache function for review.

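Run

The commands below use PowerShell syntax: py is the Windows Python launcher, and the trailing backtick (`) continues a command onto the next line.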

py scripts/generic_job_listings_scraper.py `
  --business-url https://mossyhonda.hireology.careers/ `
  --company-name "Mossy Honda" `
  --output output/mossy-scraped.csv `
  --unique-output output/mossy-unique.csv

Send Unique Rows To Netlify Cache

py scripts/generic_job_listings_scraper.py `
  --business-url https://mossyhonda.hireology.careers/ `
  --company-name "Mossy Honda" `
  --cache-endpoint https://<your-netlify-site>/api/scrape-cache `
  --output output/mossy-scraped.csv `
  --unique-output output/mossy-unique.csv

The scraper infers user_name from the local environment or git config and sends a payload with these fields:

  • user_name
  • request_timestamp
  • source_business_urls
  • listings (including the standard listing fields and any additional fields present)
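As a rough illustration of the payload described above, the sketch below assembles the four fields and POSTs them as JSON. Only the field names come from this README. The JSON encoding, the ISO 8601 UTC timestamp format, the lookup order in infer_user_name, and the helper names themselves are assumptions.

```python
import getpass
import json
import subprocess
import urllib.request
from datetime import datetime, timezone


def infer_user_name() -> str:
    """Try git's user.name first, then the OS login name (assumed order)."""
    try:
        out = subprocess.run(
            ["git", "config", "user.name"],
            capture_output=True, text=True, timeout=5, check=False,
        ).stdout.strip()
        if out:
            return out
    except (OSError, subprocess.SubprocessError):
        pass  # git missing or not responding; fall back to the login name
    return getpass.getuser()


def build_payload(user_name: str, business_urls: list, listings: list) -> dict:
    """Assemble the four documented fields; extra listing keys pass through."""
    return {
        "user_name": user_name,
        # Assumption: the timestamp is ISO 8601 in UTC.
        "request_timestamp": datetime.now(timezone.utc).isoformat(),
        "source_business_urls": list(business_urls),
        "listings": [dict(row) for row in listings],
    }


def post_payload(endpoint: str, payload: dict) -> int:
    """POST the payload as JSON (assumed encoding) and return the HTTP status."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

A call such as post_payload(endpoint, build_payload(infer_user_name(), urls, rows)) would then correspond to what the --cache-endpoint option triggers, if these assumptions hold.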

Cache API

  • POST /api/scrape-cache stores a scrape request payload.
  • GET /api/scrape-cache?limit=25&user_name=<name> returns recent cached submissions.
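To show how the GET endpoint might be queried from a script, here is a small sketch that builds the query URL and fetches the result. The path and the limit and user_name parameters come from the list above. Treating user_name as optional, expecting a JSON response body, and the helper names cache_query_url and fetch_recent are assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def cache_query_url(base_url: str, limit: int = 25,
                    user_name: Optional[str] = None) -> str:
    """Build the GET URL for recent cached submissions."""
    params = {"limit": limit}
    if user_name:  # assumption: the filter can be omitted
        params["user_name"] = user_name
    query = urllib.parse.urlencode(params)
    return f"{base_url.rstrip('/')}/api/scrape-cache?{query}"


def fetch_recent(base_url: str, limit: int = 25,
                 user_name: Optional[str] = None):
    """Fetch recent submissions, assuming the function responds with JSON."""
    url = cache_query_url(base_url, limit, user_name)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Using urlencode keeps names that contain spaces or other reserved characters safe to pass in the query string.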

Download files

Built distribution: pooled_job_scraper-0.1.0-py3-none-any.whl (12.6 kB, Python 3). No source distribution files are available for this release.


Hashes for pooled_job_scraper-0.1.0-py3-none-any.whl:

  • SHA256: 664cb2a51834ab49b6750d7206b94bc0398765371d0b06bc26336d56b6079438
  • MD5: ffeb538e8331ac2320622eb1b3719ba5
  • BLAKE2b-256: ff5a72606e891070dec783d6157217794d86d2a6d8614fa485199a1a4e2230c0
