
jpjobs

Unified scraper for Japan's major job boards, with AI-assistant integration.

Pure Python. No API keys. No account required. Output drops cleanly into Claude / ChatGPT / Codex for ranking, filtering, and cover-letter drafting.

Why this exists

  • LinkedIn and Indeed index only a fraction of Japan's job market — the rest live on Japanese-language boards.
  • HelloWork (Japan's largest government job board, hundreds of thousands of listings) has no working public scraper. Its JavaScript-locked Maba form defeats naive HTTP scrapers.
  • Existing alternatives such as python-jobspy are degrading under LinkedIn's anti-bot measures.
  • No unified job schema exists across Japanese job boards.

jpjobs addresses all four with a single CLI and LLM-friendly output.

Sources

10 active sources returning real jobs as of v0.2:

| Slug | Type | Browser? | Description |
|---|---|---|---|
| hellowork | Playwright (Maba form) | yes | Japan MHLW government board |
| linkedin | HTTP (jobs-guest) | no | LinkedIn public guest endpoint |
| tokyodev | HTTP | no | English-first IT |
| japandev | HTTP | no | English-first IT |
| daijob | HTTP | no | Bilingual professional |
| gaijinpot | HTTP | no | English-speaker general |
| jobsinjapan | HTTP | no | English-speaker general |
| green | HTTP | no | IT/startup |
| forkwell | HTTP | no | Engineer-focused |
| wantedly | HTTP | no | Startups (public side) |

7 experimental sources (blocked by anti-bot measures or login walls; contributions welcome): indeed, careercross, jrecin, otta, wellfound, doda, enworld

A typical Tokyo IT-Support scan across the 10 active sources returns ~300–400 deduplicated jobs.
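The README does not say how duplicates are detected across boards. As a rough illustration only, the sketch below merges postings by a hypothetical stable_key built from the normalized title, company, and prefecture; jpjobs' real id hash may use different fields. NFKC normalization folds full-width characters, which are common on Japanese boards, so "ＩＴ" and "IT" compare equal.

```python
import hashlib
import unicodedata


def stable_key(job: dict) -> str:
    """Hypothetical cross-source key; jpjobs' real id hash may differ."""
    parts = (job.get("title") or "", job.get("company") or "", job.get("prefecture") or "")
    # NFKC folds full-width forms ("ＩＴ" -> "IT"); casefold and split/join normalize case and spacing.
    cleaned = [" ".join(unicodedata.normalize("NFKC", p).casefold().split()) for p in parts]
    return hashlib.sha256("|".join(cleaned).encode("utf-8")).hexdigest()[:16]


def dedupe(jobs: list[dict]) -> list[dict]:
    """Keep the first posting per key and note the other sources that listed it."""
    merged: dict[str, dict] = {}
    for job in jobs:
        key = stable_key(job)
        if key in merged:
            merged[key]["also_on"].append(job["source"])  # "also_on" is an illustrative field
        else:
            merged[key] = {**job, "also_on": []}
    return list(merged.values())
```

Keeping the first occurrence means the order of --sources decides which board's copy of a posting wins.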

Install

pip install jpjobs
playwright install chromium   # one-time, only for hellowork / indeed

Quickstart

# List supported sources
jpjobs --list-sources

# Cast the widest net: every source, last 7 days
jpjobs --keyword="IT Support" --pages=2

# Tokyo English-friendly only, ready to paste into an AI assistant
jpjobs --sources=linkedin,tokyodev,japandev,gaijinpot \
       --keyword="IT Support" --prefecture=tokyo \
       --format=llm > jobs.txt

See USAGE.md for a step-by-step walkthrough including troubleshooting.

CLI reference

| Flag | Purpose |
|---|---|
| --list-sources | Show available sources and their capabilities |
| --sources | Comma-separated slugs, or all (default) |
| --keyword | Keyword filter; can be repeated |
| --prefecture | One of the 47 prefecture slugs (tokyo, osaka, …) |
| --location | Free-text location (LinkedIn) |
| --pages | Pagination depth per source (default 2) |
| --days | Posted-within window in days (default 7) |
| --employment-type | fulltime / parttime / contract / dispatch / freelance / intern |
| --language | english / japanese / bilingual |
| --english-filter | Post-filter results for English-signal jobs |
| --format | json (default) / csv / markdown / table / llm |
| --output | Write to a file instead of stdout |
| --quiet | Suppress progress events on stderr |
| --no-headless | Run the browser in visible mode (debugging) |
| --rate-limit | Inter-request pacing in ms (default 700) |
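To drive the CLI from Python, building an argument list is safer than building a shell string, because keywords such as "IT Support" contain spaces. The helper below is a hypothetical sketch that uses only flags from the table above, in the --flag=value form the Quickstart uses; it assumes --output accepts that same form.

```python
def build_argv(sources=None, keywords=(), prefecture=None, pages=None,
               days=None, fmt=None, output=None, quiet=False):
    """Hypothetical helper: turn Python arguments into a jpjobs argv list."""
    argv = ["jpjobs"]
    if sources:
        argv.append("--sources=" + ",".join(sources))
    for kw in keywords:  # --keyword may be repeated
        argv.append(f"--keyword={kw}")
    # Optional value flags; None means "leave it to the CLI default".
    for flag, value in (("--prefecture", prefecture), ("--pages", pages),
                        ("--days", days), ("--format", fmt), ("--output", output)):
        if value is not None:
            argv.append(f"{flag}={value}")
    if quiet:
        argv.append("--quiet")
    return argv
```

With jpjobs installed, subprocess.run(build_argv(...), capture_output=True, text=True, check=True) runs the scan and exposes the result on .stdout; passing a list rather than a string avoids shell quoting entirely.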

Output formats

| Format | Best for |
|---|---|
| json | Pipe to jq or downstream code |
| csv | Open in spreadsheets |
| markdown | Embed in a GitHub README |
| table | Read in the terminal |
| llm | Paste into Claude / ChatGPT (capped at 50 jobs) |
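A saved JSON scan is easy to post-process in Python. This sketch assumes --format=json emits a top-level array of objects with the fields listed under Job schema (the README does not show the exact envelope). It keeps jobs whose language array contains "english", newest first, capped at 50 to mirror the llm format; --english-filter performs a similar step at scan time, though its exact rule is not documented.

```python
import json


def english_friendly(raw: str, limit: int = 50) -> list[dict]:
    """Keep jobs with an English language signal, newest first, capped at `limit`."""
    jobs = json.loads(raw)  # assumption: a top-level JSON array of job objects
    hits = [j for j in jobs if "english" in (j.get("language") or [])]
    # ISO 8601 dates in a shared format sort correctly as strings; undated jobs go last.
    hits.sort(key=lambda j: j.get("date_posted") or "", reverse=True)
    return hits[:limit]
```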

Using with AI assistants

See AGENTS.md. Typical flow:

# 1. Scan
jpjobs --sources=linkedin,tokyodev,gaijinpot \
       --keyword="IT Support" --format=llm > jobs.txt

# 2. Open one of the prompts, paste your resume + jobs.txt into Claude / ChatGPT
cat prompts/rank-against-resume.md
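Step 2 can also be scripted. The sketch below simply concatenates a prompt file, your resume, and jobs.txt into one message with section headings; the heading text and the resume file name are illustrative choices, not something jpjobs or the bundled prompts require.

```python
from pathlib import Path


def build_prompt(prompt: str, resume: str, jobs: str) -> str:
    """Join a prompt template, a resume, and jpjobs llm output into one message."""
    sections = [
        prompt.strip(),
        "## My resume\n\n" + resume.strip(),
        "## Jobs\n\n" + jobs.strip(),
    ]
    return "\n\n".join(sections) + "\n"


def build_prompt_from_files(prompt_file, resume_file, jobs_file) -> str:
    """Read the three inputs as UTF-8 (job text is often Japanese) and combine them."""
    def read(path):
        return Path(path).read_text(encoding="utf-8")
    return build_prompt(read(prompt_file), read(resume_file), read(jobs_file))
```

For example, build_prompt_from_files("prompts/rank-against-resume.md", "resume.md", "jobs.txt") returns a single block of text ready to paste.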

Ready-made prompts are in the prompts/ directory (for example, prompts/rank-against-resume.md).

Job schema

Every source returns records with the same shape, which is useful when piping to jq or an AI assistant:

| Field | Type | Notes |
|---|---|---|
| id | string | Stable cross-source hash |
| source | string | hellowork, linkedin, etc. |
| url | string | Direct link to the posting |
| title | string | Job title |
| company | string | Employer name |
| description_snippet | string | ≤400 chars, LLM-safe |
| workplace | string | Raw posting text |
| prefecture | string or null | Normalized slug (tokyo, osaka, …) |
| prefecture_name | string or null | Tokyo / 東京 |
| wage.min / wage.max | number or null | JPY |
| wage.unit | string or null | monthly / hourly / annual |
| employment_type | string or null | fulltime / parttime / contract / … |
| date_posted | string or null | ISO 8601 |
| language | array | english, japanese, bilingual signals |
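For typed access in Python, the schema can be mirrored with dataclasses. This is a hypothetical client-side model, not a class jpjobs exports; it assumes wage is a nested object with min, max, and unit keys (as the dotted field names suggest) and drops keys it does not recognize, so new output fields do not break parsing.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Wage:
    min: Optional[float] = None  # JPY
    max: Optional[float] = None  # JPY
    unit: Optional[str] = None   # "monthly" / "hourly" / "annual"


@dataclass
class Job:
    id: str
    source: str
    url: str
    title: str
    company: str
    description_snippet: str = ""
    workplace: str = ""
    prefecture: Optional[str] = None
    prefecture_name: Optional[str] = None
    wage: Wage = field(default_factory=Wage)
    employment_type: Optional[str] = None
    date_posted: Optional[str] = None  # ISO 8601
    language: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "Job":
        """Build a Job from one JSON record, ignoring keys this model doesn't know."""
        known = {f.name for f in fields(cls)}
        kwargs = {k: v for k, v in data.items() if k in known}
        wage_keys = {f.name for f in fields(Wage)}
        raw_wage = data.get("wage") or {}
        kwargs["wage"] = Wage(**{k: v for k, v in raw_wage.items() if k in wage_keys})
        kwargs["language"] = list(data.get("language") or [])
        return cls(**kwargs)
```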

Configuration (optional)

Drop a jpjobs.config.json in your working directory:

{
  "sources": ["hellowork", "linkedin", "tokyodev"],
  "prefecture": "tokyo",
  "keywords": ["IT Support", "Helpdesk"],
  "pages": 2
}
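The README does not state how jpjobs.config.json interacts with command-line flags. A common convention, assumed in this sketch, is built-in defaults first, then the config file, then explicit flags. load_settings, DEFAULTS, and the rate_limit key are hypothetical names; the default values are the ones listed in the CLI reference.

```python
import json
from pathlib import Path

# Default values taken from the CLI reference table.
DEFAULTS = {"sources": "all", "pages": 2, "days": 7, "format": "json", "rate_limit": 700}


def load_settings(cli: dict, config_path=Path("jpjobs.config.json")) -> dict:
    """Hypothetical precedence: defaults < config file < explicit CLI flags."""
    settings = dict(DEFAULTS)
    path = Path(config_path)
    if path.is_file():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    # A value of None means the flag was not given on the command line.
    settings.update({k: v for k, v in cli.items() if v is not None})
    return settings
```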

No accounts, no API keys, no .env. Your resume and chat history go to whichever AI provider you paste them into — jpjobs never sees them.

Adding a source

See CONTRIBUTING.md. Short version: copy jpjobs/sources/_template.py, implement the scan() function, and register the slug in aggregate.py.
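The actual shape of a source module is defined by _template.py, which isn't reproduced here. The sketch below only illustrates the pattern the short version describes: one scan() function per source plus a slug registry of the kind aggregate.py keeps. The signature scan(keyword, pages), the SOURCES dict, and the run() helper are assumptions for illustration.

```python
import sys


def scan_example(keyword: str, pages: int = 2) -> list[dict]:
    """Hypothetical source: a real scan() fetches listing pages and maps them to the job schema."""
    return [
        {"source": "example", "title": f"{keyword} #{n}", "url": f"https://example.com/jobs/{n}"}
        for n in range(1, pages + 1)
    ]


# Slug -> scan function; stands in for the registration step in aggregate.py.
SOURCES = {"example": scan_example}


def run(slugs: list[str], keyword: str, pages: int = 2) -> list[dict]:
    """Run each selected source in turn; one failing board doesn't abort the scan."""
    unknown = [s for s in slugs if s not in SOURCES]
    if unknown:
        raise ValueError(f"unknown source(s): {', '.join(unknown)}")
    results: list[dict] = []
    for slug in slugs:
        try:
            results.extend(SOURCES[slug](keyword=keyword, pages=pages))
        except Exception as exc:  # report on stderr and continue with the next source
            print(f"[{slug}] skipped: {exc}", file=sys.stderr)
    return results
```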

Limitations

  • Detail pages aren't scraped — listings only.
  • Some boards (Wantedly full apply, Bizreach, Findy) are login-gated and intentionally unsupported.
  • Sites with heavy anti-bot protection (Wellfound, and Doda when accessed from some networks) ship as experimental stubs.
  • Requests are paced (700 ms apart by default; see --rate-limit), so large scans take minutes.
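Fixed-interval pacing is the simplest way to honor a setting like --rate-limit. The class below is a sketch of that idea, not jpjobs' implementation. It also shows why large scans are slow: at 700 ms per request, 100 requests spend roughly 70 seconds just waiting.

```python
import time


class Pacer:
    """Sketch of fixed-interval pacing: at least interval_ms between consecutive requests."""

    def __init__(self, interval_ms: int = 700):
        self.interval = interval_ms / 1000.0
        self._last = None  # monotonic time of the previous request

    def wait(self) -> None:
        """Call before each request; sleeps only if the previous one was too recent."""
        if self._last is not None:
            remaining = self.interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Typical use is one Pacer per host, calling pacer.wait() immediately before each HTTP request; time.monotonic() is used so that wall-clock adjustments cannot shorten the gap.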

Ethical use

Scrape responsibly. Respect each site's robots.txt. Throttle requests. Identify yourself via the default User-Agent. Don't spam employers or mass-apply — job boards exist to serve job seekers and employers both.
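Python's standard library can check robots.txt rules before fetching. This sketch runs urllib.robotparser over an already-downloaded robots.txt; the agent name "jpjobs" is a placeholder, so substitute whatever User-Agent the tool actually sends.

```python
from urllib import robotparser

AGENT = "jpjobs"  # placeholder; use the User-Agent string your scraper actually sends


def robots_check(robots_txt: str, url: str, agent: str = AGENT):
    """Return (allowed, crawl_delay_seconds_or_None) for url under the given robots.txt text."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)
```

To check a live file instead, call set_url("https://example.com/robots.txt") and read() on the parser before can_fetch(). A returned crawl delay larger than your pacing interval is a signal to slow down.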

License

MIT — see LICENSE.

Disclaimer

This project is not affiliated with HelloWork, MHLW, LinkedIn, Indeed, TokyoDev, JapanDev, Daijob, CareerCross, GaijinPot, JobsInJapan, Green, Forkwell, Wantedly, or any other listed board. All trademarks belong to their respective owners.


Download files


Source Distribution

jpjobs-0.2.0.tar.gz (33.9 kB)

Built Distribution

jpjobs-0.2.0-py3-none-any.whl (59.2 kB)

File details

Details for the file jpjobs-0.2.0.tar.gz.

File metadata

  • Download URL: jpjobs-0.2.0.tar.gz
  • Size: 33.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for jpjobs-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2be6f71a1b403d7c116dbe3e843b6c6bfbf4c5acd8ef22e5e6a87b823cca894 |
| MD5 | 6b4d3fb358eb2f01c4593fc2c5bba029 |
| BLAKE2b-256 | 13d8632ad83da4c8d55a5ee19ec69c7b0256b74507a1eb793595af3e7838f8b7 |


File details

Details for the file jpjobs-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: jpjobs-0.2.0-py3-none-any.whl
  • Size: 59.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for jpjobs-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd8dffa162b6ad77ffd7aff4f004f4ca3c4bb83c945ece962333d84f82c99223 |
| MD5 | 2cc1b955055ac433f825cbac5c0a9ace |
| BLAKE2b-256 | 364ff9233c96b77b6c955ff47b496f73f0622455dcae8505eb3be2a4cc9bc0bc |
