A programmable job-scraping framework for India & global markets. Aggregates Naukri, Shine, Internshala, LinkedIn, Indeed, and FAANG companies into a unified dataset.
🎯 HireHunt
Aggregate jobs from 12 sources (Naukri, Internshala, Shine, Unstop, LinkedIn, Indeed, and six FAANG companies) into a unified, filterable, ranked dataset.
✨ Sources

| Source | Region | Type | Method |
|---|---|---|---|
| `naukri` | 🇮🇳 India | Jobs | REST API (15,000+ listings) |
| `shine` | 🇮🇳 India | Jobs | SSR JSON (17,000+ listings) |
| `internshala` | 🇮🇳 India | Internships / Jobs | HTML scraping |
| `unstop` | 🇮🇳 India | Hackathons / Competitions | REST API |
| `linkedin` | Global | Jobs | Guest HTML API |
| `indeed` | Global | Jobs | GraphQL API |
| `google_careers` | FAANG | Jobs | LinkedIn (company-filtered) |
| `amazon` | FAANG | Jobs | REST API |
| `meta` | FAANG | Jobs | LinkedIn (company-filtered) |
| `apple` | FAANG | Jobs | LinkedIn (keyword search) |
| `netflix` | FAANG | Jobs | LinkedIn (company-filtered) |
| `microsoft` | FAANG | Jobs | LinkedIn (company-filtered) |
📦 Installation

```bash
pip install hirehunt
```

Note: the PyPI package is named `hirehunt`, but the import name is `jobhunter`:

```python
import jobhunter  # correct import after `pip install hirehunt`
```

Requirements: Python 3.10+
⚡ Quick Start

Python API

```python
from jobhunter import scrape_jobs

# Search across India's top job boards
jobs = scrape_jobs(
    search_term="python developer",
    sources=["naukri", "shine", "internshala"],
    city="Bengaluru",
    results_wanted=50,
)

for job in jobs:
    print(job)

# Example output:
# Python Developer @ TCS | Bengaluru | naukri
# Python Developer @ Infosys | Bengaluru | shine
```
CLI

```bash
# India job search
jobhunter search "data scientist" --city Mumbai --sources naukri,shine

# Hackathons & competitions
jobhunter search "hackathon" --sources unstop

# FAANG company jobs
jobhunter search "software engineer" --sources google_careers,amazon,netflix

# Export to CSV
jobhunter search "backend developer" --sources naukri,linkedin --output jobs.csv

# Top 20 ranked results
jobhunter search "machine learning" --sources naukri,shine,linkedin --top 20
```
🔧 Python API Reference

scrape_jobs()

```python
from jobhunter import scrape_jobs

jobs = scrape_jobs(
    search_term="python developer",  # What to search for
    sources=["naukri", "shine"],     # Which sources (a list, or "auto")
    city="Bengaluru",                # City filter (optional)
    location="India",                # Broader location (optional)
    country="India",                 # Country (optional)
    results_wanted=50,               # Max results per source
    job_kind="job",                  # "job", "internship", or "hackathon"
    remote=None,                     # True = remote only
    salary_min=500000,               # Min salary in INR (optional)
    posted_within_days=30,           # Only jobs from the last N days
    skills=["python", "django"],     # Skill filter (optional)
    experience_min=0,                # Min years of experience (optional)
    experience_max=5,                # Max years of experience (optional)
)
```
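`salary_min` is given in INR, while Naukri reports salaries in lakhs per annum (LPA); one lakh is 100,000 rupees, so a 5 LPA floor corresponds to `salary_min=500000` if the value is treated as annual. A minimal sketch of that conversion, assuming a range string shaped like `"5-8 Lacs PA"`; the function name `parse_lpa_range` and the input format are hypothetical and not HireHunt's `parse_money`.

```python
from __future__ import annotations

import re

LAKH = 100_000  # 1 lakh = 100,000 rupees


def parse_lpa_range(text: str) -> tuple[int | None, int | None]:
    """Convert a hypothetical '5-8 Lacs PA' style string to annual INR amounts.

    Returns (min_inr, max_inr); both are None when no number is found.
    """
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    if not numbers:
        return None, None
    values = [round(float(n) * LAKH) for n in numbers[:2]]
    if len(values) == 1:
        return values[0], values[0]  # a single figure is both min and max
    return values[0], values[1]


print(parse_lpa_range("5-8 Lacs PA"))    # (500000, 800000)
print(parse_lpa_range("12.5 Lacs PA"))   # (1250000, 1250000)
print(parse_lpa_range("Not disclosed"))  # (None, None)
```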
Job Object

Every source returns the same normalized `Job` dataclass (defined in `jobhunter/models.py` alongside `Money`, `WorkMode`, and `JobKind`):

```python
from dataclasses import dataclass


@dataclass
class Job:
    title: str
    company: str
    source: str
    job_url: str
    location: str
    city: str
    country: str
    work_mode: WorkMode        # "remote" | "hybrid" | "onsite" | "unknown"
    job_kind: JobKind          # "job" | "internship" | "hackathon" | "competition"
    salary: Money              # min_amount, max_amount, currency, period
    stipend: Money
    skills: list[str]
    experience_min: float | None
    experience_max: float | None
    description: str
    date_posted: str | None
    deadline: str | None       # for competitions/hackathons
    match_score: float         # 0.0 to 1.0, set after ranking
```
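`match_score` is documented only as a value between 0.0 and 1.0 filled in after ranking; the formula used by `ranking.py` is not described here. As an illustration only, one simple way to produce a score in that range is a weighted mix of title-term overlap and skill overlap. The function name `toy_match_score` and the 0.6/0.4 weighting are assumptions.

```python
from __future__ import annotations


def toy_match_score(search_term: str, title: str, job_skills: list[str],
                    wanted_skills: list[str], title_weight: float = 0.6) -> float:
    """Hypothetical relevance score in [0.0, 1.0]; not HireHunt's ranking.py."""
    terms = set(search_term.lower().split())
    title_terms = set(title.lower().split())
    # Fraction of search words that appear in the job title.
    title_part = len(terms & title_terms) / len(terms) if terms else 0.0

    wanted = {s.lower() for s in wanted_skills}
    have = {s.lower() for s in job_skills}
    if not wanted:
        # With no requested skills, fall back to the title signal alone.
        return round(title_part, 3)
    # Fraction of requested skills the job lists.
    skill_part = len(wanted & have) / len(wanted)
    return round(title_weight * title_part + (1 - title_weight) * skill_part, 3)


print(toy_match_score("python developer", "Senior Python Developer",
                      ["Python", "Django", "AWS"], ["python", "sql"]))  # 0.8
```

Both parts are fractions in [0, 1] and the weights sum to 1, so the result always stays within the documented range.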
Export

```python
from jobhunter import scrape_jobs
from jobhunter.exporters import to_csv, to_json, to_dataframe

jobs = scrape_jobs("python developer", sources=["naukri", "shine"])

to_csv(jobs, "jobs.csv")
to_json(jobs, "jobs.json")
df = to_dataframe(jobs)  # pandas DataFrame
```
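The exporters turn nested `Job` objects into flat rows. How `to_csv` and `to_json` do this internally is not documented here; the sketch below shows one plausible approach with only the standard library, flattening a nested money field into prefixed columns. `MiniMoney`, `MiniJob`, `flatten`, and `jobs_to_csv` are simplified, hypothetical stand-ins, not HireHunt's models or exporters.

```python
from __future__ import annotations

import csv
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MiniMoney:  # hypothetical stand-in for jobhunter.models.Money
    min_amount: float | None = None
    max_amount: float | None = None
    currency: str = "INR"
    period: str = "year"


@dataclass
class MiniJob:  # hypothetical stand-in for jobhunter.models.Job
    title: str
    company: str
    source: str
    salary: MiniMoney = field(default_factory=MiniMoney)
    skills: list[str] = field(default_factory=list)


def flatten(job: MiniJob) -> dict:
    """Turn nested fields into flat, CSV-friendly columns."""
    row = asdict(job)  # nested dataclasses become nested dicts
    salary = row.pop("salary")
    for key, value in salary.items():
        row[f"salary_{key}"] = value
    row["skills"] = ", ".join(job.skills)  # list -> single cell
    return row


def jobs_to_csv(jobs: list[MiniJob]) -> str:
    rows = [flatten(j) for j in jobs]
    buf = io.StringIO()
    if rows:
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return buf.getvalue()


jobs = [MiniJob("Python Developer", "TCS", "naukri",
                MiniMoney(500000, 800000), ["python", "django"])]
print(jobs_to_csv(jobs))
print(json.dumps([asdict(j) for j in jobs], indent=2))
```

JSON can keep the nesting as-is, which is why only the CSV path flattens.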
Project Structure

```
jobhunter/
├── __init__.py          # scrape_jobs() entry point
├── models.py            # Job, Money, WorkMode, JobKind dataclasses
├── query.py             # JobQuery: unified search parameters
├── engine.py            # Orchestrates parallel scraping + dedup
├── registry.py          # Scraper registry + auto-source selection
├── filtering.py         # Soft filtering (salary, city, skills, date)
├── ranking.py           # Relevance scoring / match_score
├── validation.py        # Input validation
├── exceptions.py        # Custom exceptions
├── cli.py               # `jobhunter` CLI entry point
│
├── scrapers/
│   ├── base.py          # BaseScraper ABC
│   ├── naukri.py        # 🇮🇳 Naukri: /jobapi/v2/search REST API
│   ├── shine.py         # 🇮🇳 Shine: __NEXT_DATA__ SSR JSON
│   ├── internshala.py   # 🇮🇳 Internshala: HTML + pagination
│   ├── unstop.py        # 🇮🇳 Unstop: hackathons REST API
│   ├── linkedin.py      # LinkedIn: guest HTML API
│   ├── indeed.py        # Indeed: GraphQL API
│   └── faang.py         # Google, Amazon, Meta, Apple, Netflix, Microsoft
│
├── exporters/
│   ├── csv_exporter.py
│   ├── json_exporter.py
│   └── dataframe.py
│
└── utils/
    ├── fetchers.py      # CachedFetcher with proxy + backend support
    └── normalization.py # clean_text, parse_money, normalize_city, ...
tests/
```
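`engine.py` is described as orchestrating parallel scraping plus deduplication. A minimal sketch of that pattern, assuming each scraper is a plain callable returning dicts and that two listings are duplicates when their normalized (title, company, city) key matches. The fake scrapers and the dedup key are assumptions; the real engine's key and error handling may differ.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

Scraper = Callable[[str], list]


def fake_naukri(term: str) -> list[dict]:  # hypothetical scraper stub
    return [{"title": "Python Developer", "company": "TCS", "city": "Bengaluru", "source": "naukri"}]


def fake_shine(term: str) -> list[dict]:  # hypothetical scraper stub
    return [
        {"title": "python developer ", "company": "tcs", "city": "Bengaluru", "source": "shine"},
        {"title": "Python Developer", "company": "Infosys", "city": "Bengaluru", "source": "shine"},
    ]


def dedup_key(job: dict) -> tuple:
    # Assumed key: normalize case and whitespace so the same posting on two boards collides.
    return tuple(" ".join(str(job.get(k, "")).lower().split())
                 for k in ("title", "company", "city"))


def scrape_all(term: str, scrapers: dict[str, Scraper]) -> list[dict]:
    results: list[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(scrapers))) as pool:
        futures = {pool.submit(fn, term): name for name, fn in scrapers.items()}
        for future in as_completed(futures):
            try:
                results.extend(future.result())
            except Exception as exc:  # one failing source should not sink the rest
                print(f"{futures[future]} failed: {exc}")
    seen: set[tuple] = set()
    unique = []
    for job in results:
        key = dedup_key(job)
        if key not in seen:  # keep the first copy of each posting
            seen.add(key)
            unique.append(job)
    return unique


jobs = scrape_all("python developer", {"naukri": fake_naukri, "shine": fake_shine})
print(len(jobs))  # 2: the TCS posting appears on both boards but is kept once
```

Because `as_completed` yields in finishing order, which board's copy of a duplicate survives is not deterministic in this sketch.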
Source Details

🇮🇳 Naukri
- Endpoint: `GET https://www.naukri.com/jobapi/v2/search`
- Auth: Session cookies from a page warm-up (automatic)
- Fields: Title, company, salary (LPA), location, skills, experience, date
- Pagination: `pageNo=N`, 20 results/page, 3,000+ pages available
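With 20 results per page, the number of `pageNo` requests needed for a given `results_wanted` is a ceiling division. The sketch below only computes the page plan and makes no HTTP calls; the helper name `page_plan` is hypothetical, and it assumes `pageNo` starts at 1. Real requests would also need the warm-up session cookies noted above.

```python
import math

NAUKRI_PAGE_SIZE = 20  # results per page, per the notes above


def page_plan(results_wanted: int, page_size: int = NAUKRI_PAGE_SIZE) -> list[int]:
    """Return the pageNo values (assumed 1-based) needed to collect results_wanted jobs."""
    if results_wanted <= 0:
        return []
    pages = math.ceil(results_wanted / page_size)
    return list(range(1, pages + 1))


print(page_plan(50))  # [1, 2, 3]  (the last page over-fetches; trim to 50 afterwards)
print(page_plan(20))  # [1]
```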
🇮🇳 Shine
- Endpoint: `__NEXT_DATA__` SSR JSON embedded in the HTML
- Fields: `jJT` (title), `jCName` (company), `jSal` (salary), `jLoc` (location), `jKwd` (skills), `jPDate` (date), `jSlug` (URL)
- Pagination: `?page=N`, 20 results/page, 900+ pages
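Next.js pages embed their server-side props as JSON in a `<script id="__NEXT_DATA__">` tag, which is what the Shine scraper reads instead of parsing the visible HTML. A minimal sketch of pulling that JSON out with the standard-library `HTMLParser` and renaming the short field names listed above. The sample HTML and the path to the job list inside the JSON (`props.pageProps.jobs`) are invented for illustration; Shine's real nesting is not documented here.

```python
import json
from html.parser import HTMLParser

# Short Shine keys (from the field list above) mapped to readable names.
FIELD_MAP = {"jJT": "title", "jCName": "company", "jSal": "salary",
             "jLoc": "location", "jKwd": "skills", "jPDate": "date_posted",
             "jSlug": "slug"}


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.payload = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.payload += data  # data may arrive in chunks


def extract_jobs(html: str) -> list:
    parser = NextDataParser()
    parser.feed(html)
    if not parser.payload:
        return []
    data = json.loads(parser.payload)
    # Hypothetical path; adjust to the real page structure.
    raw_jobs = data.get("props", {}).get("pageProps", {}).get("jobs", [])
    return [{FIELD_MAP.get(k, k): v for k, v in job.items()} for job in raw_jobs]


SAMPLE = """<html><body><script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"jobs": [{"jJT": "Python Developer", "jCName": "Infosys",
 "jLoc": "Bengaluru", "jSlug": "python-developer-infosys-123"}]}}}
</script></body></html>"""

print(extract_jobs(SAMPLE))
```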
🇮🇳 Internshala
- Endpoint: HTML scraping of `div[id^='individual_internship_'][internshipid]`
- Pagination: `?page=N`, 40+ cards/page
- City filter: URL slug, e.g. `/internships/python-intern-in-bengaluru/`
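Internshala encodes the city in the URL slug rather than in a query parameter, while pagination uses `?page=N`. A minimal sketch of building such URLs, generalizing from the single example above; the `{keyword}-intern-in-{city}` pattern, the base URL, and the helper names are assumptions, and the exact slug rules Internshala accepts may differ.

```python
from __future__ import annotations

import re
from urllib.parse import urlencode

BASE = "https://internshala.com"  # assumed base URL


def slugify(text: str) -> str:
    """Lowercase and join words with hyphens: 'Machine Learning' -> 'machine-learning'."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def internshala_url(keyword: str, city: str | None = None, page: int = 1) -> str:
    # Assumed pattern, generalized from /internships/python-intern-in-bengaluru/
    slug = f"{slugify(keyword)}-intern"
    if city:
        slug += f"-in-{slugify(city)}"
    url = f"{BASE}/internships/{slug}/"
    if page > 1:
        url += "?" + urlencode({"page": page})
    return url


print(internshala_url("Python", "Bengaluru"))
# https://internshala.com/internships/python-intern-in-bengaluru/
print(internshala_url("Machine Learning", "New Delhi", page=2))
# https://internshala.com/internships/machine-learning-intern-in-new-delhi/?page=2
```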
🇮🇳 Unstop
- Endpoint: `GET https://unstop.com/api/public/opportunity/search-result`
- Note: Returns hackathons, coding competitions, and challenges only
- Fields: Title, organisation, skills, location, deadline, prize
Indeed
- Endpoint: `POST https://apis.indeed.com/graphql`
- Auth: Public API key (included)
- Pagination: Cursor-based
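With cursor-based pagination, each response carries an opaque cursor for the next page, and the client stops when no cursor comes back or enough results are collected. The sketch below shows that loop against a fake in-memory fetch function; it does not reproduce Indeed's GraphQL query, and the response shape (`{"results": [...], "next_cursor": ...}`) is an assumption.

```python
from __future__ import annotations

from typing import Callable, Optional

Page = dict  # assumed shape: {"results": [...], "next_cursor": str or None}


def collect(fetch_page: Callable[[Optional[str]], Page], results_wanted: int) -> list:
    """Follow next_cursor until it runs out or results_wanted items are gathered."""
    items: list = []
    cursor: Optional[str] = None  # None requests the first page
    while len(items) < results_wanted:
        page = fetch_page(cursor)
        items.extend(page["results"])
        cursor = page.get("next_cursor")
        if not cursor:  # no further pages
            break
    return items[:results_wanted]


# Fake backend: three pages of two jobs each, linked by cursors.
PAGES = {None: (["j1", "j2"], "c1"), "c1": (["j3", "j4"], "c2"), "c2": (["j5", "j6"], None)}


def fake_fetch(cursor: Optional[str]) -> Page:
    results, nxt = PAGES[cursor]
    return {"results": results, "next_cursor": nxt}


print(collect(fake_fetch, 5))   # ['j1', 'j2', 'j3', 'j4', 'j5']
print(collect(fake_fetch, 10))  # ['j1', 'j2', 'j3', 'j4', 'j5', 'j6']
```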
LinkedIn
- Endpoint: `GET https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search`
- Auth: None (guest API)
- FAANG filter: `f_C` (company ID) parameter
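The guest endpoint takes its filters as query parameters, with `f_C` carrying a LinkedIn company ID for the company-filtered FAANG sources. A minimal sketch of assembling such a URL with `urllib.parse`. The `keywords`, `location`, and `start` parameter names are assumptions based on LinkedIn's public job-search URLs, not a documented contract, and the company ID `12345` is a placeholder rather than a real mapping.

```python
from __future__ import annotations

from urllib.parse import urlencode

GUEST_SEARCH = ("https://www.linkedin.com/jobs-guest/jobs/api/"
                "seeMoreJobPostings/search")


def linkedin_guest_url(keywords: str, location: str = "",
                       company_ids: list[str] | None = None, start: int = 0) -> str:
    params = {"keywords": keywords, "start": start}  # assumed parameter names
    if location:
        params["location"] = location
    if company_ids:
        params["f_C"] = ",".join(company_ids)  # company filter; IDs here are placeholders
    return f"{GUEST_SEARCH}?{urlencode(params)}"


print(linkedin_guest_url("software engineer", "India", ["12345"]))
# https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=software+engineer&start=0&location=India&f_C=12345
```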
Amazon
- Endpoint: `GET https://www.amazon.jobs/en/search.json`
- Auth: None (public REST API)
Filtering

Filters are soft by default: jobs missing a field pass through rather than being dropped.

```python
jobs = scrape_jobs(
    "python developer",
    sources=["naukri", "shine"],
    salary_min=600_000,        # Only applied if salary data exists
    city="Bengaluru",          # Only applied if location data exists
    skills=["python", "sql"],  # Only applied if skills data exists
    posted_within_days=14,     # Only applied if date data exists
)
```
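Soft filtering means a filter rejects a job only when the job actually has the field and the value fails the test; a missing value counts as "unknown" and is kept. A minimal sketch of those semantics for salary, city, skills, and posting date, using a simplified stand-in record. `Listing` and `passes_soft_filters` are hypothetical, and the matching rules (salary checked against the upper bound, case-insensitive city match, any one requested skill suffices) are assumptions rather than `filtering.py`'s actual logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Listing:  # hypothetical, simplified stand-in for a Job
    title: str
    salary_max: int | None = None  # annual INR
    city: str | None = None
    skills: list[str] = field(default_factory=list)
    date_posted: date | None = None


def passes_soft_filters(job: Listing, *, salary_min: int | None = None,
                        city: str | None = None, skills: list[str] | None = None,
                        posted_within_days: int | None = None,
                        today: date | None = None) -> bool:
    today = today or date.today()
    # Each check runs only when both the filter and the job's field are present.
    if salary_min is not None and job.salary_max is not None:
        if job.salary_max < salary_min:
            return False
    if city and job.city and job.city.lower() != city.lower():
        return False
    if skills and job.skills:
        wanted = {s.lower() for s in skills}
        if not wanted & {s.lower() for s in job.skills}:
            return False
    if posted_within_days is not None and job.date_posted is not None:
        if job.date_posted < today - timedelta(days=posted_within_days):
            return False
    return True  # missing fields never cause rejection


today = date(2025, 1, 31)
jobs = [
    Listing("A", salary_max=400_000, city="Bengaluru"),            # salary too low
    Listing("B"),                                                   # no data: kept
    Listing("C", salary_max=900_000, city="Mumbai"),                # wrong city
    Listing("D", skills=["Python", "SQL"], date_posted=date(2025, 1, 25)),
    Listing("E", skills=["Java"], date_posted=date(2025, 1, 30)),   # no skill overlap
]
kept = [j.title for j in jobs
        if passes_soft_filters(j, salary_min=600_000, city="Bengaluru",
                               skills=["python", "sql"], posted_within_days=14,
                               today=today)]
print(kept)  # ['B', 'D']
```

The trade-off is recall over precision: listing "B" survives every filter precisely because it has no data to fail on.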
Advanced Usage

FAANG-only search

```python
from jobhunter import scrape_jobs
from jobhunter.registry import default_registry

registry = default_registry()
faang = registry.faang_sources()
# ['google_careers', 'amazon', 'meta', 'apple', 'netflix', 'microsoft']

jobs = scrape_jobs(
    search_term="software engineer",
    sources=faang,
    results_wanted=20,
)
```
Parallel scraping with custom config

```python
jobs = scrape_jobs(
    search_term="backend developer",
    sources=["naukri", "shine", "linkedin"],
    city="Hyderabad",
    results_wanted=100,
    posted_within_days=7,
    cache_enabled=True,      # Cache responses locally
    proxies=["http://..."],  # Optional proxy list
)
```
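`cache_enabled` and `proxies` are handled by the fetcher layer (`CachedFetcher` in `utils/fetchers.py`), whose internals are not documented here. As an illustration, a minimal fetcher that caches responses by URL and rotates through a proxy list round-robin could look like this. The transport is injected so the example runs offline; `SketchFetcher` and its methods are hypothetical.

```python
from __future__ import annotations

from itertools import cycle
from typing import Callable, Optional

Transport = Callable[[str, Optional[str]], str]  # (url, proxy) -> response body


class SketchFetcher:
    """Hypothetical caching fetcher; not jobhunter's CachedFetcher."""

    def __init__(self, transport: Transport, proxies: list[str] | None = None,
                 cache_enabled: bool = True) -> None:
        self._transport = transport
        self._proxies = cycle(proxies) if proxies else None  # round-robin rotation
        self._cache: dict[str, str] = {}
        self._cache_enabled = cache_enabled

    def get(self, url: str) -> str:
        if self._cache_enabled and url in self._cache:
            return self._cache[url]  # served from cache, no network call
        proxy = next(self._proxies) if self._proxies else None
        body = self._transport(url, proxy)
        if self._cache_enabled:
            self._cache[url] = body
        return body


calls: list[tuple[str, Optional[str]]] = []


def fake_transport(url: str, proxy: Optional[str]) -> str:
    calls.append((url, proxy))  # record which proxy each real request used
    return f"body of {url}"


fetcher = SketchFetcher(fake_transport, proxies=["http://p1", "http://p2"])
fetcher.get("https://example.com/a")
fetcher.get("https://example.com/a")  # cache hit
fetcher.get("https://example.com/b")
print(calls)
# [('https://example.com/a', 'http://p1'), ('https://example.com/b', 'http://p2')]
```

Cache hits do not advance the proxy rotation, so proxies are only consumed by requests that actually go out.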
Auto-source selection

```python
# With country="India", "auto" adds the India-specific sources to the global ones
jobs = scrape_jobs(
    search_term="python developer",
    country="India",
    sources="auto",  # resolves to [indeed, linkedin, internshala, naukri, shine, unstop]
)
```
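The resolved list for `country="India"` contains the two global job boards plus the four India-specific sources. A minimal sketch of a registry that records each source's region and resolves `"auto"` that way. The region table mirrors the Sources table; the resolution rule (global boards always, India boards only for India, FAANG sources never automatically) is inferred from this single example, and `SketchRegistry` is hypothetical rather than `jobhunter.registry`.

```python
from __future__ import annotations

# Region of each source, mirroring the Sources table.
REGIONS = {
    "naukri": "india", "shine": "india", "internshala": "india", "unstop": "india",
    "linkedin": "global", "indeed": "global",
    "google_careers": "faang", "amazon": "faang", "meta": "faang",
    "apple": "faang", "netflix": "faang", "microsoft": "faang",
}


class SketchRegistry:
    """Hypothetical registry; not jobhunter.registry.default_registry()."""

    def __init__(self, regions: dict[str, str]) -> None:
        self._regions = regions

    def by_region(self, region: str) -> list[str]:
        return sorted(name for name, r in self._regions.items() if r == region)

    def resolve(self, sources: str | list[str], country: str | None = None) -> list[str]:
        if sources == "auto":
            # Assumed rule: global boards always, India boards only for India.
            chosen = self.by_region("global")
            if country and country.strip().lower() == "india":
                chosen += self.by_region("india")
            return chosen
        names = [sources] if isinstance(sources, str) else list(sources)
        unknown = [n for n in names if n not in self._regions]
        if unknown:
            raise ValueError(f"unknown sources: {unknown}")
        return names


registry = SketchRegistry(REGIONS)
print(registry.resolve("auto", country="India"))
# ['indeed', 'linkedin', 'internshala', 'naukri', 'shine', 'unstop']
```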
🧪 Running Tests

```bash
pip install -e .
pytest tests/
```
License
MIT
File details: hirehunt-0.2.0.tar.gz (source distribution)
- Size: 36.0 kB
- Uploaded via: twine/6.1.0 CPython/3.13.12 (not via Trusted Publishing)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `cbb72c54dd512bc272e649178b96d45692701e8cb56730fa41e5904a02f7e5b4` |
| MD5 | `18f26e82274795d1904246b5c7d92f1d` |
| BLAKE2b-256 | `004478162b92b4c66dd73ddaa63014c643830c36cb96b79f2c564ed8660dc5a3` |
File details: hirehunt-0.2.0-py3-none-any.whl (built distribution, Python 3)
- Size: 44.8 kB
- Uploaded via: twine/6.1.0 CPython/3.13.12 (not via Trusted Publishing)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9358998af70489de945e1702535ebef458f93ca2e5cd7dec54439a18e1a2f4bf` |
| MD5 | `44c474f73ee9cc7fa9fb213eb3834c54` |
| BLAKE2b-256 | `3e449835db64ffcb065a2b36b38d4fc5d999826aaace9e6a9e6bcbb73d912ed0` |