jpjobs
Unified scraper for Japan's major job boards, with AI-assistant integration.
Pure Python. No API keys. No account required. Output drops cleanly into Claude / ChatGPT / Codex for ranking, filtering, and cover-letter drafting.
Why this exists
- LinkedIn and Indeed index only a fraction of Japan's job market — the rest live on Japanese-language boards.
- HelloWork (Japan's largest government job board, hundreds of thousands of listings) has no working public scraper. Its JavaScript-locked Maba form defeats naive HTTP scrapers.
- Existing alternatives like `python-jobspy` are degrading under LinkedIn's anti-bot measures.
- No unified job schema exists across Japan's job boards.
jpjobs solves all four behind one CLI and LLM-friendly output.
Sources
10 active sources returning real jobs as of v0.2:
| Slug | Type | Browser? | Description |
|---|---|---|---|
| `hellowork` | Playwright (Maba form) | yes | Japan MHLW government board |
| `linkedin` | HTTP (jobs-guest) | no | LinkedIn public guest endpoint |
| `tokyodev` | HTTP | no | English-first IT |
| `japandev` | HTTP | no | English-first IT |
| `daijob` | HTTP | no | Bilingual professional |
| `gaijinpot` | HTTP | no | English-speaker general |
| `jobsinjapan` | HTTP | no | English-speaker general |
| `green` | HTTP | no | IT/startup |
| `forkwell` | HTTP | no | Engineer-focused |
| `wantedly` | HTTP | no | Startups (public side) |
7 experimental (anti-bot or login walls — contributions welcome):
indeed, careercross, jrecin, otta, wellfound, doda, enworld
A typical Tokyo IT-Support scan across the 10 active sources returns ~300–400 deduplicated jobs.
Install
```bash
pip install jpjobs
playwright install chromium  # one-time, only for hellowork / indeed
```
Quickstart
```bash
# List supported sources
jpjobs --list-sources

# Cast the widest net: every source, last 7 days
jpjobs --keyword="IT Support" --pages=2

# Tokyo English-friendly only, ready to paste into an AI assistant
jpjobs --sources=linkedin,tokyodev,japandev,gaijinpot \
  --keyword="IT Support" --prefecture=tokyo \
  --format=llm > jobs.txt
```
See USAGE.md for a step-by-step walkthrough including troubleshooting.
CLI reference
| Flag | Purpose |
|---|---|
| `--list-sources` | Show available sources and their capabilities |
| `--sources` | Comma-separated slugs, or `all` (default) |
| `--keyword` | Keyword filter; can be repeated |
| `--prefecture` | One of 47 prefecture slugs (`tokyo`, `osaka`, …) |
| `--location` | Free-text location (LinkedIn) |
| `--pages` | Pagination depth per source (default 2) |
| `--days` | Posted-within window in days (default 7) |
| `--employment-type` | `fulltime` / `parttime` / `contract` / `dispatch` / `freelance` / `intern` |
| `--language` | `english` / `japanese` / `bilingual` |
| `--english-filter` | Post-filter results for English-signal jobs |
| `--format` | `json` (default) / `csv` / `markdown` / `table` / `llm` |
| `--output` | Write to file instead of stdout |
| `--quiet` | Suppress progress events on stderr |
| `--no-headless` | Run browser in visible mode (debugging) |
| `--rate-limit` | Inter-request pacing in ms (default 700) |
Output formats
| Format | Best for |
|---|---|
| `json` | Pipe to `jq` or downstream code |
| `csv` | Open in spreadsheets |
| `markdown` | Embed in a GitHub README |
| `table` | Read in the terminal |
| `llm` | Paste into Claude / ChatGPT (capped at 50 jobs) |
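The `json` output can also be post-processed in plain Python instead of `jq`. A sketch assuming the output is a JSON array of objects shaped like the job schema, with `wage` as a nested object holding `min`, `max`, and `unit`; the Tokyo filter and wage floor are illustrative, and this is not how `--english-filter` works internally.

```python
import json


def english_friendly_tokyo(jobs: list[dict], min_monthly_jpy: int) -> list[dict]:
    """Keep Tokyo jobs with an English/bilingual signal and a monthly wage floor."""
    keep = []
    for job in jobs:
        signals = set(job.get("language") or [])
        wage = job.get("wage") or {}
        if job.get("prefecture") != "tokyo":
            continue
        if not signals & {"english", "bilingual"}:
            continue
        # Only compare like with like: skip hourly/annual or missing wages.
        if wage.get("unit") != "monthly" or wage.get("min") is None:
            continue
        if wage["min"] >= min_monthly_jpy:
            keep.append(job)
    # Highest posted minimum wage first.
    return sorted(keep, key=lambda j: j["wage"]["min"], reverse=True)


def load_jobs(path: str) -> list[dict]:
    # Assumes the file holds a JSON array, e.g. written by `jpjobs --output=jobs.json`.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```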
Using with AI assistants
See AGENTS.md. Typical flow:
```bash
# 1. Scan
jpjobs --sources=linkedin,tokyodev,gaijinpot \
  --keyword="IT Support" --format=llm > jobs.txt

# 2. Open one of the prompts, paste your resume + jobs.txt into Claude / ChatGPT
cat prompts/rank-against-resume.md
```
Ready-made prompts:
- `prompts/rank-against-resume.md`
- `prompts/filter-english-friendly.md`
- `prompts/extract-companies-for-research.md`
- `prompts/summarize-market-trends.md`
- `prompts/write-tailored-cover-letter.md`
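The manual paste in step 2 can also be scripted. A minimal sketch that joins a prompt file, your resume, and the `--format=llm` output into one message; `compose_prompt`, `compose_from_files`, and the section headings they write are hypothetical, not a format the bundled prompts require.

```python
from pathlib import Path


def compose_prompt(prompt: str, resume: str, jobs: str) -> str:
    """Join a prompt template, a resume, and jpjobs --format=llm output into one message."""
    sections = [
        prompt.strip(),
        "## My resume\n\n" + resume.strip(),
        "## Job listings\n\n" + jobs.strip(),
    ]
    # Blank lines between sections keep the three parts visually distinct.
    return "\n\n".join(sections) + "\n"


def compose_from_files(prompt_file: str, resume_file: str, jobs_file: str) -> str:
    """Read the three inputs from disk (UTF-8) and compose them."""
    texts = [Path(p).read_text(encoding="utf-8") for p in (prompt_file, resume_file, jobs_file)]
    return compose_prompt(*texts)
```

Usage: `print(compose_from_files("prompts/rank-against-resume.md", "resume.md", "jobs.txt"))`, where `resume.md` stands in for wherever your resume lives.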
Job schema
Every source returns the same shape. Useful when piping to jq or AI assistants:
| Field | Type | Notes |
|---|---|---|
| `id` | string | Stable cross-source hash |
| `source` | string | `hellowork`, `linkedin`, etc. |
| `url` | string | Direct link to posting |
| `title` | string | Job title |
| `company` | string | Employer name |
| `description_snippet` | string | ≤400 chars, LLM-safe |
| `workplace` | string | Raw posting text |
| `prefecture` | string \| null | Normalized slug (`tokyo`, `osaka`, …) |
| `prefecture_name` | string \| null | Tokyo / 東京 |
| `wage.min` / `.max` | number \| null | JPY |
| `wage.unit` | string \| null | `monthly` / `hourly` / `annual` |
| `employment_type` | string \| null | `fulltime` / `parttime` / `contract` / … |
| `date_posted` | string \| null | ISO 8601 |
| `language` | array | `english`, `japanese`, `bilingual` signals |
Configuration (optional)
Drop a jpjobs.config.json in your working directory:
```json
{
  "sources": ["hellowork", "linkedin", "tokyodev"],
  "prefecture": "tokyo",
  "keywords": ["IT Support", "Helpdesk"],
  "pages": 2
}
```
No accounts, no API keys, no .env. Your resume and chat history go to whichever AI provider you paste them into — jpjobs never sees them.
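The README doesn't spell out how `jpjobs.config.json` interacts with command-line flags. A minimal sketch, assuming the usual precedence (built-in defaults, then the config file, then explicit flags) and reusing the defaults from the CLI reference; `load_settings`, `DEFAULTS`, and the `days` / `rate_limit_ms` keys are hypothetical.

```python
import json
from pathlib import Path

# Defaults as listed in the CLI reference table: all sources, 2 pages, 7 days, 700 ms.
DEFAULTS = {"sources": "all", "pages": 2, "days": 7, "rate_limit_ms": 700}


def load_settings(cli_overrides: dict, config_path: Path = Path("jpjobs.config.json")) -> dict:
    """Merge settings with assumed precedence: defaults < config file < CLI flags."""
    settings = dict(DEFAULTS)
    if config_path.is_file():
        file_cfg = json.loads(config_path.read_text(encoding="utf-8"))
        if not isinstance(file_cfg, dict):
            raise ValueError(f"{config_path} must contain a JSON object")
        settings.update(file_cfg)
    # Treat None as "flag not given" so unset flags don't clobber the config file.
    settings.update({k: v for k, v in cli_overrides.items() if v is not None})
    return settings
```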
Adding a source
See CONTRIBUTING.md. Short version: copy jpjobs/sources/_template.py, implement the scan() function, register the slug in aggregate.py.
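The real `_template.py`, the exact `scan()` signature, and the registration mechanism in `aggregate.py` aren't shown in this README. Purely as a hypothetical illustration of the contract described above (a source returns jobs in the shared schema, and its slug is registered centrally), a minimal source could look like this; `fetch_page`, the raw listing keys, `SOURCES`, and the `example-board` slug are all invented for the sketch.

```python
import hashlib
from collections.abc import Callable, Iterable

# Hypothetical fetcher type: (keyword, page) -> raw listing dicts from one board.
Fetcher = Callable[[str, int], Iterable[dict]]


def scan(keyword: str, pages: int, fetch_page: Fetcher) -> list[dict]:
    """Hypothetical source scanner: map raw listings onto the shared job schema."""
    jobs = []
    for page in range(1, pages + 1):
        for raw in fetch_page(keyword, page):
            url = raw["url"]
            jobs.append({
                # Placeholder id from the URL; jpjobs's real cross-source hash is not documented here.
                "id": hashlib.sha256(url.encode("utf-8")).hexdigest()[:16],
                "source": "example-board",
                "url": url,
                "title": raw.get("title", "").strip(),
                "company": raw.get("company", "").strip(),
                "description_snippet": raw.get("summary", "")[:400],  # schema caps at 400 chars
                "workplace": raw.get("location", ""),
                "prefecture": None,  # slug normalization left to shared code
                "prefecture_name": None,
                "wage": {"min": None, "max": None, "unit": None},
                "employment_type": None,
                "date_posted": None,
                "language": [],
            })
    return jobs


# Hypothetical registry standing in for whatever aggregate.py actually uses.
SOURCES: dict[str, Callable[..., list[dict]]] = {"example-board": scan}
```

Injecting the fetcher keeps the mapping logic testable without network access.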
Limitations
- Detail pages aren't scraped — listings only.
- Some boards (Wantedly full apply, Bizreach, Findy) are login-gated and intentionally unsupported.
- Heavy anti-bot sites (Wellfound, Doda from some networks) are shipped as `experimental` stubs.
- Rate limits apply, so large scans take minutes.
Ethical use
Scrape responsibly. Respect each site's robots.txt. Throttle requests. Identify yourself via the default User-Agent. Don't spam employers or mass-apply; job boards exist to serve both job seekers and employers.
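For anyone writing a new source, a minimal sketch of two of the habits above: check `robots.txt` with the standard library's `urllib.robotparser`, and space requests out. The 700 ms default mirrors `--rate-limit`; `allowed`, `Throttle`, and the `jpjobs` user-agent string used in the usage note are illustrative assumptions, not jpjobs internals.

```python
import time
import urllib.robotparser


def allowed(robots_lines: list[str], user_agent: str, url: str) -> bool:
    """Check a URL against already-downloaded robots.txt lines."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Enforce a minimum gap between requests (700 ms mirrors the --rate-limit default)."""

    def __init__(self, interval_ms: int = 700) -> None:
        self.interval = interval_ms / 1000
        self._last: float | None = None

    def wait(self) -> float:
        """Sleep until the interval has elapsed since the previous call; return seconds slept."""
        now = time.monotonic()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - now)
            if delay:
                time.sleep(delay)
        self._last = time.monotonic()
        return delay
```

Usage: call `throttle.wait()` before every request, and skip any URL for which `allowed(lines, "jpjobs", url)` is false.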
License
MIT — see LICENSE.
Disclaimer
This project is not affiliated with HelloWork, MHLW, LinkedIn, Indeed, TokyoDev, JapanDev, Daijob, CareerCross, GaijinPot, JobsInJapan, Green, Forkwell, Wantedly, or any other listed board. All trademarks belong to their respective owners.
File details
Details for the file jpjobs-0.2.0.tar.gz.
File metadata
- Download URL: jpjobs-0.2.0.tar.gz
- Size: 33.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a2be6f71a1b403d7c116dbe3e843b6c6bfbf4c5acd8ef22e5e6a87b823cca894` |
| MD5 | `6b4d3fb358eb2f01c4593fc2c5bba029` |
| BLAKE2b-256 | `13d8632ad83da4c8d55a5ee19ec69c7b0256b74507a1eb793595af3e7838f8b7` |
File details
Details for the file jpjobs-0.2.0-py3-none-any.whl.
File metadata
- Download URL: jpjobs-0.2.0-py3-none-any.whl
- Size: 59.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fd8dffa162b6ad77ffd7aff4f004f4ca3c4bb83c945ece962333d84f82c99223` |
| MD5 | `2cc1b955055ac433f825cbac5c0a9ace` |
| BLAKE2b-256 | `364ff9233c96b77b6c955ff47b496f73f0622455dcae8505eb3be2a4cc9bc0bc` |