A programmable job-scraping framework for India & global markets. Aggregates listings from Naukri, Shine, Internshala, LinkedIn, Indeed, and FAANG companies into a unified dataset.

🎯 HireHunt

Aggregate jobs from 12 sources — Naukri, Internshala, Shine, LinkedIn, Indeed, and FAANG companies — into a unified, filterable, ranked dataset.

✨ Sources

Source	Region	Type	Method
`naukri`	🇮🇳 India	Jobs	REST API — 15,000+ listings
`shine`	🇮🇳 India	Jobs	SSR JSON — 17,000+ listings
`internshala`	🇮🇳 India	Internships / Jobs	HTML scraping
`unstop`	🇮🇳 India	Hackathons / Competitions	REST API
`linkedin`	🌍 Global	Jobs	Guest HTML API
`indeed`	🌍 Global	Jobs	GraphQL API
`google_careers`	🌍 FAANG	Jobs	LinkedIn (company-filtered)
`amazon`	🌍 FAANG	Jobs	REST API
`meta`	🌍 FAANG	Jobs	LinkedIn (company-filtered)
`apple`	🌍 FAANG	Jobs	LinkedIn (keyword search)
`netflix`	🌍 FAANG	Jobs	LinkedIn (company-filtered)
`microsoft`	🌍 FAANG	Jobs	LinkedIn (company-filtered)

📦 Installation

pip install hirehunt

Note: The PyPI package is named hirehunt, but the import name and the CLI command are both jobhunter.
import jobhunter   # ← this is correct after pip install hirehunt

Requirements: Python 3.10+

⚡ Quick Start

Python API

from jobhunter import scrape_jobs

# Search across India's top job boards
jobs = scrape_jobs(
    search_term="python developer",
    sources=["naukri", "shine", "internshala"],
    city="Bengaluru",
    results_wanted=50,
)

for job in jobs:
    print(job)
# Python Developer @ TCS | Bengaluru | naukri
# Python Developer @ Infosys | Bengaluru | shine

CLI

# India job search
jobhunter search "data scientist" --city Mumbai --sources naukri,shine

# Hackathons & competitions
jobhunter search "hackathon" --sources unstop

# FAANG company jobs
jobhunter search "software engineer" --sources google_careers,amazon,netflix

# Export to CSV
jobhunter search "backend developer" --sources naukri,linkedin --output jobs.csv

# Top 20 ranked results
jobhunter search "machine learning" --sources naukri,shine,linkedin --top 20

🔧 Python API Reference

`scrape_jobs()`

from jobhunter import scrape_jobs

jobs = scrape_jobs(
    search_term="python developer",   # What to search
    sources=["naukri", "shine"],      # Which sources (list or "auto")
    city="Bengaluru",                 # City filter (optional)
    location="India",                 # Broader location (optional)
    country="India",                  # Country (optional)
    results_wanted=50,                # Max results per source
    job_kind="job",                   # "job", "internship", "hackathon"
    remote=None,                      # True = remote only
    salary_min=500000,                # Min salary in INR (optional)
    posted_within_days=30,            # Only jobs from last N days
    skills=["python", "django"],      # Skill filter (optional)
    experience_min=0,                 # Min years experience (optional)
    experience_max=5,                 # Max years experience (optional)
)

`Job` Object

Every source returns the same normalized Job dataclass:

@dataclass
class Job:
    title: str
    company: str
    source: str
    job_url: str

    location: str
    city: str
    country: str
    work_mode: WorkMode         # "remote" | "hybrid" | "onsite" | "unknown"
    job_kind: JobKind           # "job" | "internship" | "hackathon" | "competition"

    salary: Money               # min_amount, max_amount, currency, period
    stipend: Money

    skills: list[str]
    experience_min: float | None
    experience_max: float | None
    description: str
    date_posted: str | None
    deadline: str | None        # for competitions/hackathons

    match_score: float          # 0.0–1.0 after ranking
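
Because every result carries a match_score between 0.0 and 1.0, picking the best results reduces to a sort. The sketch below is not the library's ranking code: JobStub is a hypothetical stand-in holding only the fields needed here, top_matches is an illustrative helper, and the sample rows are made up.

```python
from dataclasses import dataclass


@dataclass
class JobStub:
    """Hypothetical stand-in for Job with only the fields this sketch uses."""
    title: str
    company: str
    source: str
    match_score: float = 0.0


def top_matches(jobs, n):
    """Return the n highest-scoring jobs, best first."""
    return sorted(jobs, key=lambda job: job.match_score, reverse=True)[:n]


# Made-up sample data for illustration only.
jobs = [
    JobStub("Python Developer", "TCS", "naukri", 0.62),
    JobStub("Backend Engineer", "Infosys", "shine", 0.91),
    JobStub("Data Analyst", "Wipro", "naukri", 0.35),
]

best = top_matches(jobs, 2)
for job in best:
    print(f"{job.match_score:.2f}  {job.title} @ {job.company} | {job.source}")
```

This mirrors what the CLI's --top option appears to offer, under the assumption that it keeps the highest match_score values.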

Export

from jobhunter import scrape_jobs
from jobhunter.exporters import to_csv, to_json, to_dataframe

jobs = scrape_jobs("python developer", sources=["naukri", "shine"])

to_csv(jobs, "jobs.csv")
to_json(jobs, "jobs.json")
df = to_dataframe(jobs)   # pandas DataFrame
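
For readers curious how a list of dataclass instances can become CSV rows, here is a minimal standard-library sketch. It is an assumption about the general approach, not the code behind to_csv: JobStub and jobs_to_csv_text are hypothetical names, and joining skills with ";" is just one possible way to flatten a list field.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class JobStub:
    """Hypothetical stand-in for Job with a handful of fields."""
    title: str
    company: str
    source: str
    skills: list


def jobs_to_csv_text(jobs):
    """Serialize JobStub instances to CSV text, one row per job."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(JobStub)],
        lineterminator="\n",
    )
    writer.writeheader()
    for job in jobs:
        row = asdict(job)
        # Flatten the list so it fits in a single CSV cell (assumed convention).
        row["skills"] = ";".join(row["skills"])
        writer.writerow(row)
    return buffer.getvalue()


csv_text = jobs_to_csv_text(
    [JobStub("Python Developer", "TCS", "naukri", ["python", "django"])]
)
print(csv_text)
```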

🏗️ Project Structure

jobhunter/
├── __init__.py          # scrape_jobs() entry point
├── models.py            # Job, Money, WorkMode, JobKind models
├── query.py             # JobQuery — unified search parameters
├── engine.py            # Orchestrates parallel scraping + dedup
├── registry.py          # Scraper registry + auto-source selection
├── filtering.py         # Soft filtering (salary, city, skills, date)
├── ranking.py           # Relevance scoring / match_score
├── validation.py        # Input validation
├── exceptions.py        # Custom exceptions
├── cli.py               # `jobhunter` CLI entry point
│
├── scrapers/
│   ├── base.py          # BaseScraper ABC
│   ├── naukri.py        # 🇮🇳 Naukri — /jobapi/v2/search REST API
│   ├── shine.py         # 🇮🇳 Shine — __NEXT_DATA__ SSR JSON
│   ├── internshala.py   # 🇮🇳 Internshala — HTML + pagination
│   ├── unstop.py        # 🇮🇳 Unstop — hackathons REST API
│   ├── linkedin.py      # 🌍 LinkedIn — guest HTML API
│   ├── indeed.py        # 🌍 Indeed — GraphQL API
│   └── faang.py         # 🌍 Google, Amazon, Meta, Apple, Netflix, Microsoft
│
├── exporters/
│   ├── csv_exporter.py
│   ├── json_exporter.py
│   └── dataframe.py
│
└── utils/
    ├── fetchers.py      # CachedFetcher with proxy + backend support
    └── normalization.py # clean_text, parse_money, normalize_city, ...

tests/

🔍 Source Details

🇮🇳 Naukri

Endpoint: GET https://www.naukri.com/jobapi/v2/search
Auth: Session cookies from page warm-up (automatic)
Fields: Title, company, salary (LPA), location, skills, experience, date
Pagination: pageNo=N, 20 results/page, 3,000+ pages available
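
With a fixed page size, results_wanted translates directly into a number of page requests. The helper below is a small sketch of that arithmetic; pages_needed is a hypothetical name, and numbering pages from 1 is an assumption about pageNo that the notes above do not confirm.

```python
import math


def pages_needed(results_wanted, per_page=20):
    """Return the page numbers to request (assumed to start at 1)."""
    if results_wanted <= 0:
        return []
    last_page = math.ceil(results_wanted / per_page)
    return list(range(1, last_page + 1))


# 50 results at 20 per page means three requests; the last page is trimmed.
page_numbers = pages_needed(50)
print(page_numbers)
```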

🇮🇳 Shine

Endpoint: __NEXT_DATA__ SSR JSON embedded in HTML
Fields: jJT (title), jCName (company), jSal (salary), jLoc (location), jKwd (skills), jPDate (date), jSlug (URL)
Pagination: ?page=N, 20 results/page, 900+ pages
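
Next.js pages embed their server-rendered state in a script tag with the id __NEXT_DATA__, so the data can be read without parsing the visible HTML. The sketch below shows the idea on a synthetic page. The props.pageProps nesting is standard Next.js, but the "jobs" key is a hypothetical placeholder for wherever Shine actually stores its records, and extract_next_data is an illustrative helper rather than the scraper's own code.

```python
import json
import re

# Capture the JSON payload inside the __NEXT_DATA__ script tag.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON from an HTML document."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))


# Synthetic page for illustration; field names follow the list above.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"jobs": ['
    '{"jJT": "Python Developer", "jCName": "Infosys", "jLoc": "Bengaluru"}'
    ']}}}'
    "</script></body></html>"
)

data = extract_next_data(sample_html)
records = data["props"]["pageProps"]["jobs"]  # "jobs" is a hypothetical key
for record in records:
    print(record["jJT"], "@", record["jCName"], "|", record["jLoc"])
```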

🇮🇳 Internshala

Endpoint: HTML scraping — div[id^='individual_internship_'][internshipid]
Pagination: ?page=N, 40+ cards/page
City filter: URL slug e.g. /internships/python-intern-in-bengaluru/
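
The city filter lives in the URL path rather than in a query parameter. The sketch below reproduces the example slug above from a role and a city. The slug rule (lowercase, non-alphanumerics collapsed to hyphens, "-in-" before the city) is inferred from that single example, and internshala_path is a hypothetical helper, so treat it as an assumption rather than the scraper's actual logic.

```python
import re


def slugify(text):
    """Lowercase the text and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def internshala_path(role, city=None):
    """Build a listing path, appending the city when one is given."""
    slug = slugify(role)
    if city:
        slug = f"{slug}-in-{slugify(city)}"
    return f"/internships/{slug}/"


path = internshala_path("python intern", "Bengaluru")
print(path)
```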

🇮🇳 Unstop

Endpoint: GET https://unstop.com/api/public/opportunity/search-result
Note: Returns hackathons, coding competitions, and challenges only
Fields: Title, organisation, skills, location, deadline, prize

🌍 Indeed

Endpoint: POST https://apis.indeed.com/graphql
Auth: Public API key (included)
Pagination: Cursor-based
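
Cursor-based pagination means each response hands back a token for the next page instead of a page number. The loop below sketches the general pattern against an in-memory stand-in. fetch_page and its canned pages are hypothetical and make no claim about the shape of Indeed's GraphQL responses.

```python
def fetch_page(cursor):
    """Hypothetical stand-in for one API call: returns (items, next_cursor)."""
    pages = {
        None: (["job-1", "job-2"], "c1"),
        "c1": (["job-3", "job-4"], "c2"),
        "c2": (["job-5"], None),  # a None cursor marks the last page
    }
    return pages[cursor]


def collect(results_wanted):
    """Follow cursors until enough items are gathered or pages run out."""
    items, cursor = [], None
    while len(items) < results_wanted:
        batch, cursor = fetch_page(cursor)
        items.extend(batch)
        if cursor is None:
            break
    return items[:results_wanted]


first_three = collect(3)
print(first_three)
```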

🌍 LinkedIn

Endpoint: GET https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search
Auth: None — guest API
FAANG filter: f_C company ID parameter

🌍 Amazon

Endpoint: GET https://www.amazon.jobs/en/search.json
Auth: None — public REST API

⚙️ Filtering

Filters are soft by default — jobs missing a field pass through rather than being dropped:

jobs = scrape_jobs(
    "python developer",
    sources=["naukri", "shine"],
    salary_min=600_000,        # Only applied if salary data exists
    city="Bengaluru",          # Only applied if location data exists
    skills=["python", "sql"],  # Only applied if skills data exists
    posted_within_days=14,     # Only applied if date data exists
)
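
The difference between a soft and a strict filter comes down to how missing values are treated. The sketch below illustrates the soft behaviour for salary only. JobStub and soft_salary_filter are hypothetical names and the sample rows are made up, so this is an illustration of the idea rather than the contents of filtering.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobStub:
    """Hypothetical stand-in for Job; None means no salary was listed."""
    title: str
    salary_min: Optional[int] = None


def soft_salary_filter(jobs, salary_min):
    """Drop jobs known to pay too little; keep jobs with unknown salary."""
    kept = []
    for job in jobs:
        if job.salary_min is None or job.salary_min >= salary_min:
            kept.append(job)
    return kept


# Made-up sample data for illustration only.
jobs = [
    JobStub("Well paid", 900_000),
    JobStub("Under threshold", 400_000),
    JobStub("No salary listed"),
]

kept = soft_salary_filter(jobs, 600_000)
print([job.title for job in kept])
```

A strict filter would also drop the third job, which is why soft filtering tends to return more results from sources that rarely publish salary data.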

🚀 Advanced Usage

FAANG-only search

from jobhunter import scrape_jobs
from jobhunter.registry import default_registry

registry = default_registry()
faang = registry.faang_sources()  # ['google_careers', 'amazon', 'meta', 'apple', 'netflix', 'microsoft']

jobs = scrape_jobs(
    search_term="software engineer",
    sources=faang,
    results_wanted=20,
)

Parallel scraping with custom config

jobs = scrape_jobs(
    search_term="backend developer",
    sources=["naukri", "shine", "linkedin"],
    city="Hyderabad",
    results_wanted=100,
    posted_within_days=7,
    cache_enabled=True,        # Cache responses locally
    proxies=["http://..."],    # Optional proxy list
)
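
The project structure describes engine.py as orchestrating parallel scraping and deduplication. A minimal sketch of that pattern with a thread pool follows. The scraper functions are in-memory stubs, scrape_all is a hypothetical name, and deduplicating on a case-insensitive (title, company) pair is an assumed rule, since the actual dedup key is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor


# In-memory stubs standing in for real scrapers (hypothetical data).
def scrape_naukri(term):
    return [{"title": "Backend Developer", "company": "TCS", "source": "naukri"}]


def scrape_shine(term):
    return [
        {"title": "backend developer", "company": "tcs", "source": "shine"},
        {"title": "Backend Developer", "company": "Infosys", "source": "shine"},
    ]


def scrape_all(term, scrapers):
    """Run every scraper concurrently, then drop duplicate listings."""
    with ThreadPoolExecutor(max_workers=len(scrapers)) as pool:
        # pool.map returns results in the order the scrapers were given.
        batches = list(pool.map(lambda scraper: scraper(term), scrapers))

    seen, unique = set(), []
    for batch in batches:
        for job in batch:
            key = (job["title"].casefold(), job["company"].casefold())
            if key not in seen:
                seen.add(key)
                unique.append(job)
    return unique


results = scrape_all("backend developer", [scrape_naukri, scrape_shine])
for job in results:
    print(job["title"], "@", job["company"], "|", job["source"])
```

Threads suit this workload because scraping is dominated by network waits rather than CPU time.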

Auto-source selection

# With country="India", auto mode picks the India boards plus the global ones
jobs = scrape_jobs(
    search_term="python developer",
    country="India",
    sources="auto",  # → [indeed, linkedin, internshala, naukri, shine, unstop]
)
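
One simple way to model auto-selection is a lookup from country to regional sources, combined with the sources that work everywhere. The sketch below reproduces the list shown in the comment above. The table layout and the auto_sources helper are assumptions for illustration, not the logic in registry.py.

```python
# Source names come from the Sources table; the grouping is an assumption.
GLOBAL_SOURCES = ["indeed", "linkedin"]
REGIONAL_SOURCES = {
    "india": ["internshala", "naukri", "shine", "unstop"],
}


def auto_sources(country=None):
    """Return global sources plus any registered for the given country."""
    key = (country or "").strip().lower()
    return GLOBAL_SOURCES + REGIONAL_SOURCES.get(key, [])


india = auto_sources("India")
print(india)
```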

🧪 Running Tests

pip install -e .
pytest tests/

📄 License

MIT

Project details

This version

0.2.0

Jun 9, 2026

Download files

Source Distribution

hirehunt-0.2.0.tar.gz (36.0 kB)

Uploaded Jun 9, 2026 Source

Built Distribution

hirehunt-0.2.0-py3-none-any.whl (44.8 kB)

Uploaded Jun 9, 2026 Python 3

File details

Details for the file hirehunt-0.2.0.tar.gz.

File metadata

Download URL: hirehunt-0.2.0.tar.gz
Upload date: Jun 9, 2026
Size: 36.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hirehunt-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`cbb72c54dd512bc272e649178b96d45692701e8cb56730fa41e5904a02f7e5b4`
MD5	`18f26e82274795d1904246b5c7d92f1d`
BLAKE2b-256	`004478162b92b4c66dd73ddaa63014c643830c36cb96b79f2c564ed8660dc5a3`

File details

Details for the file hirehunt-0.2.0-py3-none-any.whl.

File metadata

Download URL: hirehunt-0.2.0-py3-none-any.whl
Upload date: Jun 9, 2026
Size: 44.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hirehunt-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9358998af70489de945e1702535ebef458f93ca2e5cd7dec54439a18e1a2f4bf`
MD5	`44c474f73ee9cc7fa9fb213eb3834c54`
BLAKE2b-256	`3e449835db64ffcb065a2b36b38d4fc5d999826aaace9e6a9e6bcbb73d912ed0`
