Multi-provider AI job scraper with Streamlit UI and REST API
Job Scraper by Firas, V2 (improved version, 23/04/2026)
A no-nonsense job search tool that finds listings across multiple sites and scores them against your CV using AI. I built it for myself because I was tired of checking ten different job boards every morning. Now that I've found a job, I can focus on improving it for community use, and I expect to release the full version in a few months. If you'd like to help, email me at firaslamou@gmail.com, and thank you!
It runs entirely in Docker on a Linux VM. No local Python setup, no dependency hell, no "works on my machine".
What you get
- Paste your CV and keywords into the web UI
- AI scores every job 0-100 for relevance
- Pause, resume, or restart runs from the dashboard
- Export results as JSON or CSV
- All data lives in a SQLite database you actually own
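Every job ends up with a 0-100 relevance score. Without an AI key you can use lite mode, which relies on keyword matching, but its exact rule isn't spelled out here. As a rough mental model only, the sketch below scores a listing by the share of your keyword terms it contains, assuming lite mode reports on the same 0-100 scale. `keyword_score` and the scoring rule are hypothetical, not the scraper's actual code.

```python
import re


def keyword_score(listing_text: str, keywords: list[str]) -> int:
    """Hypothetical lite-mode scorer: percentage of keyword terms found in a listing.

    Illustrative assumption only; not the project's actual algorithm.
    """
    # Lowercase the listing and split it into word tokens (keeps "c++" and "c#").
    tokens = set(re.findall(r"[a-z0-9+#]+", listing_text.lower()))
    # Break phrases like "senior python remote" into individual terms.
    terms = {term for kw in keywords for term in kw.lower().split()}
    if not terms:
        return 0
    hits = sum(1 for term in terms if term in tokens)
    # Scale the hit ratio to the 0-100 range used for relevance scores.
    return round(100 * hits / len(terms))


if __name__ == "__main__":
    job = "Senior Python Developer (Remote) - FastAPI, Docker"
    print(keyword_score(job, ["senior python remote"]))
```

A real scorer would likely weight terms or compare against the CV text too; treat this purely as an intuition for what "keyword matching" can mean.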
How to run it
You need Docker. That's it.
1. Set your environment
cp .env.example .env
Edit .env and drop in any AI keys you have (Groq, Anthropic, or Gemini). If you don't have any, lite mode still works, using plain keyword matching instead of AI scoring.
2. Spin it up
docker compose up --build
This builds two containers:
- scraper at http://localhost:8000 (the brain)
- UI at http://localhost:8501 (your dashboard)
The optional n8n automation engine lives under a separate profile if you want it later:
docker compose --profile automation up
3. Open the UI
Go to http://localhost:8501, paste your CV, add some keywords like "senior python remote", pick your AI provider (or stay in lite mode), and hit Start. Watch the progress bar fill up. High-scoring jobs bubble to the top.
4. Export when done
curl http://localhost:8000/export/csv > jobs.csv
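If you'd rather triage the export in Python than in a spreadsheet, here's a small filter over `jobs.csv`. The export's column layout isn't documented here, so the `title` and `score` column names are assumptions; check the header row of your file and adjust. `top_jobs` is a hypothetical helper.

```python
import csv
import io


def top_jobs(csv_text: str, min_score: float = 70) -> list[dict[str, str]]:
    """Return rows scoring at least min_score, highest score first.

    Assumes hypothetical 'title' and 'score' columns in the exported CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows with an empty score rather than crashing on float("").
    rows = [row for row in reader if row.get("score")]
    # CSV values are strings, so convert scores before comparing and sorting.
    kept = [row for row in rows if float(row["score"]) >= min_score]
    return sorted(kept, key=lambda row: float(row["score"]), reverse=True)
```

For example, `top_jobs(open("jobs.csv", encoding="utf-8").read(), min_score=80)` returns the strongest matches first.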
Docker is the only way
This app is designed to run inside Docker containers on a Linux VM. Do not try to run it natively on Windows or macOS. The scraper uses Playwright, the UI needs Streamlit, and the database expects a Unix path structure. Docker handles all of that for you.
Requirements:
- Docker Engine 24+ or Docker Desktop
- A Linux VM (WSL2 on Windows, OrbStack or Docker Desktop on Mac, any Linux host)
- 2GB RAM minimum, 4GB recommended
Environment variables
| Variable | What it does | Default |
|---|---|---|
| GROQ_API_KEY | Groq AI scoring | empty |
| ANTHROPIC_API_KEY | Claude AI scoring | empty |
| GEMINI_API_KEY | Google AI scoring | empty |
| DATA_DIR | Where SQLite and logs live | ./data |
| REQUEST_DELAY_SECONDS | Politeness delay between searches, in seconds | 2.0 |
| RETRY_MAX_ATTEMPTS | How many times to retry a failed search | 5 |
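The scraper reads these values from `.env`. The sketch below shows one way the documented defaults and the politeness/retry knobs fit together. `load_settings`, `Settings` and `with_retries` are hypothetical names, and the fixed pause between attempts is an assumption; the real scraper may back off differently.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    anthropic_api_key: str
    gemini_api_key: str
    data_dir: str
    request_delay_seconds: float
    retry_max_attempts: int

    @property
    def has_ai_key(self) -> bool:
        # With no key at all, only lite (keyword) mode makes sense.
        return any((self.groq_api_key, self.anthropic_api_key, self.gemini_api_key))


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        groq_api_key=env.get("GROQ_API_KEY", ""),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY", ""),
        gemini_api_key=env.get("GEMINI_API_KEY", ""),
        data_dir=env.get("DATA_DIR", "./data"),
        request_delay_seconds=float(env.get("REQUEST_DELAY_SECONDS", "2.0")),
        retry_max_attempts=int(env.get("RETRY_MAX_ATTEMPTS", "5")),
    )


def with_retries(
    fn: Callable[[], T],
    settings: Settings,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to retry_max_attempts times, pausing request_delay_seconds between tries.

    Hypothetical helper: the fixed pause is an assumption, not the scraper's documented policy.
    """
    attempts = max(1, settings.retry_max_attempts)
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == attempts:
                raise
            sleep(settings.request_delay_seconds)
    raise RuntimeError("unreachable")
```

Passing `sleep` in as a parameter keeps the helper easy to test without actually waiting.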
API for power users
The scraper exposes a FastAPI server. The UI talks to it, but you can too.
Start a run:
curl -X POST http://localhost:8000/run \
-H "Content-Type: application/json" \
-d '{"provider":"groq","lite_mode":true,"sites":["example.com"],"keywords":["python"],"cv_text":"developer"}'
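The same call from Python, using only the standard library. The JSON fields mirror the curl example exactly. `build_run_request` and `start_run` are hypothetical helper names, and since the response shape isn't documented here, `start_run` just returns the raw body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # the scraper container from docker compose


def build_run_request(
    cv_text: str,
    keywords: list[str],
    sites: list[str],
    provider: str = "groq",
    lite_mode: bool = True,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build the same POST /run call as the curl example, with the same JSON fields."""
    payload = {
        "provider": provider,
        "lite_mode": lite_mode,
        "sites": sites,
        "keywords": keywords,
        "cv_text": cv_text,
    }
    return urllib.request.Request(
        f"{base_url}/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(**kwargs) -> str:
    """Send the request and return the raw response body as text."""
    req = build_run_request(**kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

With the containers up, `start_run(cv_text="developer", keywords=["python"], sites=["example.com"])` sends the same request as the curl command.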
Check status:
curl http://localhost:8000/status
Pause a running job:
curl -X POST http://localhost:8000/pause
Resume:
curl -X POST http://localhost:8000/resume
Kill it:
curl -X POST http://localhost:8000/stop
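The status and control endpoints wrap just as easily. This sketch assumes `/status` returns JSON (as FastAPI endpoints typically do) without relying on any particular fields, and that `/pause`, `/resume` and `/stop` accept an empty POST like the curl calls above. `control_request`, `get_status` and `send` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"
CONTROL_ACTIONS = ("pause", "resume", "stop")


def control_request(action: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a bodyless POST for /pause, /resume or /stop."""
    if action not in CONTROL_ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {CONTROL_ACTIONS}")
    return urllib.request.Request(f"{base_url}/{action}", method="POST")


def get_status(base_url: str = BASE_URL) -> Any:
    """GET /status and decode the body as JSON; field names are not assumed."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def send(action: str, base_url: str = BASE_URL) -> int:
    """Send a control action and return the HTTP status code."""
    with urllib.request.urlopen(control_request(action, base_url), timeout=10) as resp:
        return resp.status
```

Validating the action name up front turns a typo like `"puase"` into a clear local error instead of a 404 from the server.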
Makefile shortcuts
make build
make up
make down
make logs
Keeping your keys safe
Never commit .env. It is gitignored by default. If you accidentally pushed a key, rotate it immediately.