# OpenJobs

AI-powered job scraper: extract job listings from any careers page using Firecrawl + Gemini AI. Handles JavaScript-heavy sites, ATS systems, and React/Next.js SPAs.

Scrape jobs from any careers page in 3 lines of code. No custom scrapers needed.
```python
from openjobs import scrape_careers_page

jobs = scrape_careers_page("https://stripe.com/jobs")
print(f"Found {len(jobs)} jobs")  # Found 142 jobs
```
Works with JavaScript-heavy sites, React/Next.js SPAs, and complex ATS systems.
## Why OpenJobs?
| Feature | OpenJobs | Scrapy | BeautifulSoup | Selenium |
|---|---|---|---|---|
| Works on any site | Yes | No (custom spider per site) | No (static HTML only) | Yes (but slow) |
| Handles JavaScript | Yes (Firecrawl) | No | No | Yes |
| AI extraction | Yes (Gemini) | No | No | No |
| Setup time | 30 seconds | Hours | Hours | Minutes |
| Maintenance | Zero | High | High | Medium |
**The problem:** Every careers page has different HTML. Scrapy and BeautifulSoup need custom code per site; Selenium is slow and breaks often.

**The solution:** OpenJobs combines Firecrawl (JS rendering) with Gemini AI (smart extraction), so it works everywhere with no per-site maintenance.
## Install

```bash
pip install openjobs
```
## Quick Start

```python
from openjobs import scrape_careers_page

# Scrape any careers page
jobs = scrape_careers_page("https://linear.app/careers")

for job in jobs:
    print(f"{job['title']} - {job['location']}")
```
**Environment variables needed:**

```bash
export GOOGLE_API_KEY=your_key  # Free: https://aistudio.google.com/apikey
```

That's it. No Firecrawl key is needed for basic usage (the cloud service has a generous free tier).
## Features

### Find Careers Page URL

Don't know the exact URL? OpenJobs finds it:

```python
from openjobs import discover_careers_url

url = discover_careers_url("stripe.com")
# Returns: https://stripe.com/jobs/search
```
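Under the hood this presumably probes likely careers URLs; the idea can be sketched in plain Python (a hypothetical illustration — `discover_careers_url`'s actual strategy may differ):

```python
# Hypothetical sketch: generate candidate careers URLs for a domain by
# trying common paths. This is NOT the library's real implementation.
COMMON_PATHS = ["/careers", "/jobs", "/jobs/search", "/about/careers"]

def candidate_careers_urls(domain: str) -> list[str]:
    # Accept bare domains ("stripe.com") or full URLs ("https://stripe.com")
    base = domain if domain.startswith("http") else f"https://{domain}"
    return [base.rstrip("/") + path for path in COMMON_PATHS]

print(candidate_careers_urls("stripe.com")[0])  # https://stripe.com/careers
```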
### AI Enrichment

Extract tech stacks and salary ranges, and categorize jobs:

```python
from openjobs import scrape_careers_page, process_jobs

jobs = scrape_careers_page("https://figma.com/careers")
enriched = process_jobs(jobs, enrich=True)

for job in enriched:
    print(f"{job['title_original']}")
    print(f"  Category: {job['category']}")
    print(f"  Tech: {job.get('tech_stack', [])}")
```
### Filter by Category

```python
# Only engineering jobs
eng_jobs = process_jobs(jobs, enrich=True, filter_categories=["Software Engineering"])
```
### Self-Hosted (Unlimited Free)

Run Firecrawl locally for unlimited scraping:

```bash
git clone https://github.com/federicodeponte/openjobs.git
cd openjobs && docker compose up -d
export FIRECRAWL_URL=http://localhost:3002
```
## Output

```json
{
  "company": "Linear",
  "title": "Senior Software Engineer",
  "department": "Engineering",
  "location": "Remote (US/EU)",
  "job_url": "https://linear.app/careers/...",
  "slug": "linear-senior-software-engineer",
  "date_scraped": "2025-01-08T10:00:00"
}
```
With enrichment, each job also includes:

```json
{
  "category": "Software Engineering",
  "subcategory": "Backend Engineer",
  "tech_stack": ["TypeScript", "PostgreSQL", "Redis"],
  "experience_years": "5+",
  "salary_range": "$150,000 - $200,000"
}
```
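The enrichment fields compose the same way; for instance, grouping jobs by category (again with illustrative sample data shaped like the enriched output above):

```python
from collections import defaultdict

# Illustrative records shaped like the enriched output above
enriched = [
    {"title_original": "Backend Engineer", "category": "Software Engineering", "tech_stack": ["TypeScript", "PostgreSQL"]},
    {"title_original": "Staff Engineer", "category": "Software Engineering", "tech_stack": ["Go", "Redis"]},
    {"title_original": "Account Executive", "category": "Sales", "tech_stack": []},
]

# Group titles under their AI-assigned category
by_category = defaultdict(list)
for job in enriched:
    by_category[job["category"]].append(job["title_original"])

print(dict(by_category))
# {'Software Engineering': ['Backend Engineer', 'Staff Engineer'], 'Sales': ['Account Executive']}
```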
## Supported Sites
Works with most careers pages:
| Type | Examples | Status |
|---|---|---|
| Company sites | stripe.com, linear.app, figma.com | Supported |
| JavaScript SPAs | React, Next.js, Vue apps | Supported |
| ATS platforms | Lever, Greenhouse, Ashby | Supported |
| Heavy SPAs | Retool, Airtable, Vercel, Notion | Supported |
| Job boards | LinkedIn, Indeed, Glassdoor | Blocked (ToS) |
## API Reference

| Function | Description |
|---|---|
| `scrape_careers_page(url)` | Scrape jobs from a careers page |
| `discover_careers_url(domain)` | Find careers URL from domain |
| `process_jobs(jobs, enrich=True)` | Enrich with AI categorization |
| `scrape_with_firecrawl(url)` | Get page content as markdown |
| `extract_jobs_from_markdown(md)` | Extract jobs from markdown |
## Environment Variables

| Variable | Required | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Yes | Gemini API key (free) |
| `FIRECRAWL_URL` | No | Self-hosted Firecrawl URL |
| `FIRECRAWL_API_KEY` | No | Firecrawl cloud key (500 free/mo) |
## How It Works

```
URL → Firecrawl (renders JS) → Gemini AI (extracts jobs) → Structured JSON
```

1. Firecrawl renders the JavaScript and returns clean markdown
2. If that fails, a fallback extracts the JSON embedded in React/Next.js pages
3. Gemini AI parses the job listings from the page content
4. The results are returned as structured job data
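The embedded-JSON fallback in step 2 can be sketched: Next.js pages ship their data in a `<script id="__NEXT_DATA__">` tag, so job data can often be recovered straight from the raw HTML. A minimal, hypothetical version (the library's actual extraction logic may differ):

```python
import json
import re

def extract_next_data(html: str):
    """Pull the JSON blob that Next.js apps embed in <script id="__NEXT_DATA__">."""
    m = re.search(
        r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
        html,
        re.DOTALL,
    )
    return json.loads(m.group(1)) if m else None

html = '<script id="__NEXT_DATA__" type="application/json">{"props": {"jobs": [{"title": "Engineer"}]}}</script>'
print(extract_next_data(html)["props"]["jobs"][0]["title"])  # Engineer
```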
## Contributing

```bash
git clone https://github.com/federicodeponte/openjobs.git
cd openjobs
pip install -e ".[dev]"
make test
```
## License

MIT