
AI-powered job scraper - extract job listings from any careers page using Firecrawl + Gemini AI. Handles JavaScript-heavy sites, ATS systems, and React/Next.js SPAs.


OpenJobs

Python 3.9+ · License: MIT

Scrape jobs from any careers page in 3 lines of code. No custom scrapers needed.

```python
from openjobs import scrape_careers_page

jobs = scrape_careers_page("https://stripe.com/jobs")
print(f"Found {len(jobs)} jobs")  # Found 142 jobs
```

Works with JavaScript-heavy sites, React/Next.js SPAs, and complex ATS systems.


Why OpenJobs?

| Feature | OpenJobs | Scrapy | BeautifulSoup | Selenium |
|---|---|---|---|---|
| Works on any site | Yes | No (custom spider per site) | No (static HTML only) | Yes (but slow) |
| Handles JavaScript | Yes (Firecrawl) | No | No | Yes |
| AI extraction | Yes (Gemini) | No | No | No |
| Setup time | 30 seconds | Hours | Hours | Minutes |
| Maintenance | Zero | High | High | Medium |

The problem: Every careers page has different HTML. Scrapy/BeautifulSoup need custom code per site. Selenium is slow and breaks often.

The solution: OpenJobs uses Firecrawl (JS rendering) + Gemini AI (smart extraction) = works everywhere, no maintenance.


Install

```bash
pip install openjobs
```

Quick Start

```python
from openjobs import scrape_careers_page

# Scrape any careers page
jobs = scrape_careers_page("https://linear.app/careers")

for job in jobs:
    print(f"{job['title']} - {job['location']}")
```

Environment variables needed:

```bash
export GOOGLE_API_KEY=your_key  # Free: https://aistudio.google.com/apikey
```

That's it. No Firecrawl key is needed for basic usage; OpenJobs uses the Firecrawl cloud, which has a generous free tier.
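If you want to fail fast before any scraping starts, a small helper along these lines checks for the key up front (`require_api_key` is a hypothetical helper for illustration, not part of the library):

```python
import os

def require_api_key() -> str:
    # Fail fast with a clear message if the Gemini key is missing
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError(
            "Set GOOGLE_API_KEY (free: https://aistudio.google.com/apikey)"
        )
    return key

os.environ["GOOGLE_API_KEY"] = "demo-key"  # illustration only
print(require_api_key())  # demo-key
```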


Features

Find Careers Page URL

Don't know the exact URL? OpenJobs finds it:

```python
from openjobs import discover_careers_url

url = discover_careers_url("stripe.com")
# Returns: https://stripe.com/jobs/search
```

AI Enrichment

Extract tech stacks, salary ranges, and categorize jobs:

```python
from openjobs import scrape_careers_page, process_jobs

jobs = scrape_careers_page("https://figma.com/careers")
enriched = process_jobs(jobs, enrich=True)

for job in enriched:
    print(f"{job['title_original']}")
    print(f"  Category: {job['category']}")
    print(f"  Tech: {job.get('tech_stack', [])}")
```

Filter by Category

```python
# Only engineering jobs
eng_jobs = process_jobs(jobs, enrich=True, filter_categories=["Software Engineering"])
```
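Since enriched jobs are plain dicts, the filter amounts to comparing each record's `category` field. A stdlib-only sketch of the same idea (the records here are illustrative, not real output):

```python
# Hypothetical enriched records; filter on the documented "category" field
enriched = [
    {"title_original": "Backend Engineer", "category": "Software Engineering"},
    {"title_original": "Account Executive", "category": "Sales"},
]

eng_jobs = [j for j in enriched if j["category"] == "Software Engineering"]
print(len(eng_jobs))  # 1
```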

Self-Hosted (Unlimited Free)

Run Firecrawl locally for unlimited scraping:

```bash
git clone https://github.com/federicodeponte/openjobs.git
cd openjobs && docker compose up -d

export FIRECRAWL_URL=http://localhost:3002
```

Output

```json
{
  "company": "Linear",
  "title": "Senior Software Engineer",
  "department": "Engineering",
  "location": "Remote (US/EU)",
  "job_url": "https://linear.app/careers/...",
  "slug": "linear-senior-software-engineer",
  "date_scraped": "2025-01-08T10:00:00"
}
```
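The `slug` field appears to be a lowercased, hyphenated join of company and title. A minimal sketch of such a slugifier (`make_slug` is a hypothetical helper, not the library's code):

```python
import re

def make_slug(company: str, title: str) -> str:
    # Lowercase, join company and title, collapse non-alphanumerics to hyphens
    raw = f"{company} {title}".lower()
    return re.sub(r"[^a-z0-9]+", "-", raw).strip("-")

print(make_slug("Linear", "Senior Software Engineer"))
# linear-senior-software-engineer
```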

With enrichment:

```json
{
  "category": "Software Engineering",
  "subcategory": "Backend Engineer",
  "tech_stack": ["TypeScript", "PostgreSQL", "Redis"],
  "experience_years": "5+",
  "salary_range": "$150,000 - $200,000"
}
```
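Because results are plain dicts, they can be exported with the standard library alone. A sketch using `csv` (field names taken from the output above; the record itself is illustrative):

```python
import csv
import io

# Records shaped like OpenJobs output (values here are illustrative)
jobs = [
    {
        "company": "Linear",
        "title": "Senior Software Engineer",
        "location": "Remote (US/EU)",
        "category": "Software Engineering",
    },
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["company", "title", "location", "category"])
writer.writeheader()
writer.writerows(jobs)
print(buf.getvalue())
```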

Supported Sites

Works with most careers pages:

| Type | Examples | Status |
|---|---|---|
| Company sites | stripe.com, linear.app, figma.com | Supported |
| JavaScript SPAs | React, Next.js, Vue apps | Supported |
| ATS platforms | Lever, Greenhouse, Ashby | Supported |
| Heavy SPAs | Retool, Airtable, Vercel, Notion | Supported |
| Job boards | LinkedIn, Indeed, Glassdoor | Blocked (ToS) |

API Reference

| Function | Description |
|---|---|
| `scrape_careers_page(url)` | Scrape jobs from a careers page |
| `discover_careers_url(domain)` | Find careers URL from domain |
| `process_jobs(jobs, enrich=True)` | Enrich with AI categorization |
| `scrape_with_firecrawl(url)` | Get page content as markdown |
| `extract_jobs_from_markdown(md)` | Extract jobs from markdown |

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Yes | Gemini API key (free) |
| `FIRECRAWL_URL` | No | Self-hosted Firecrawl URL |
| `FIRECRAWL_API_KEY` | No | Firecrawl cloud key (500 free/mo) |

How It Works

```
URL → Firecrawl (renders JS) → Gemini AI (extracts jobs) → Structured JSON
```

1. Firecrawl renders JavaScript and returns clean markdown
2. A fallback extracts embedded JSON from React/Next.js page data
3. Gemini AI parses the job listings intelligently
4. The output is structured job data
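Step 2's fallback is not documented in detail. As an illustration only (a sketch, not the library's implementation): Next.js pages embed their page props as JSON in a `<script id="__NEXT_DATA__">` tag, which can be pulled out like this:

```python
import json
import re

def extract_next_data(html: str):
    # Next.js embeds page props as JSON in a <script id="__NEXT_DATA__"> tag;
    # return the parsed payload if present, else None.
    m = re.search(
        r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>',
        html,
        re.DOTALL,
    )
    return json.loads(m.group(1)) if m else None

html = (
    '<html><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"jobs": [{"title": "Backend Engineer"}]}}'
    "</script></html>"
)
data = extract_next_data(html)
print(data["props"]["jobs"][0]["title"])  # Backend Engineer
```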

Contributing

```bash
git clone https://github.com/federicodeponte/openjobs.git
cd openjobs
pip install -e ".[dev]"
make test
```

License

MIT



Download files

Source Distribution

openjobs-0.1.0.tar.gz (35.0 kB, source)

Built Distribution

openjobs-0.1.0-py3-none-any.whl (37.8 kB, Python 3)

File details

Details for the file openjobs-0.1.0.tar.gz.

File metadata

  • Download URL: openjobs-0.1.0.tar.gz
  • Upload date:
  • Size: 35.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes for openjobs-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | cf53e58e9d3cc5b1bd86ba5b7ddc9cca850648a120776cbec59c2db7fa68d0b2 |
| MD5 | 714fbcb622c504d3e63778985a3b1cc5 |
| BLAKE2b-256 | 7a4dad9a7a990c728626c774381ede92ac93b2c1cb5a0613828bacfbd7e333d4 |

File details

Details for the file openjobs-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: openjobs-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 37.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes for openjobs-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | fb4a639275599bfe9c3a25f2862f46c08343895948e9511cea0f502e42959d21 |
| MD5 | 55c7e5df22c7ed1ccbae6b12d2376b42 |
| BLAKE2b-256 | 7ed7bd26d4bd13d38cd4ee78b1b460807b797a547a7ffc6d65df4b2dd0eecb8e |
