A jisho.org API and scraper in Python.

jisho-api

A Python API built around scraping jisho.org, an online Japanese dictionary.

pip install jisho_api

Requests

You can request four types of information:

  • Words
  • Kanji
  • Sentences
  • Tokenized sentences

The search terms are passed directly to jisho's search engine, which means all of the filters used to refine a search should work as well. For instance, "水" (in quotes) looks for a word consisting of just that character.

Check https://jisho.org/docs on how to use the search filters.

jisho search word water
jisho search word 水
jisho search word "#jlpt-n4"
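
Because search terms are passed through to jisho's search engine unchanged, a filtered query is just a string. Below is a minimal sketch of composing one. `build_query` is a hypothetical helper, not part of jisho_api, and the only tag it is shown with is the `#jlpt-n4` one from the commands above.

```python
def build_query(term: str = "", *tags: str) -> str:
    """Join an optional search term with jisho-style '#tag' filters.

    Hypothetical helper for illustration; jisho_api does not provide it.
    """
    parts = [term.strip()] if term.strip() else []
    for tag in tags:
        tag = tag.strip()
        if not tag:
            continue
        # Accept tags written with or without the leading '#'.
        parts.append(tag if tag.startswith("#") else f"#{tag}")
    return " ".join(parts)


query = build_query("water", "jlpt-n4")
print(query)
```

The resulting string could then be given to `jisho search word` (quoted, so the shell does not treat `#` as a comment) or to `Word.request`.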

The request replies are Pydantic objects. You can check the structure of a word request in jisho/word/cfg.py, and likewise for both kanji and sentences.

You can also make requests programmatically:

from jisho_api.word import Word
r = Word.request('water')
from jisho_api.kanji import Kanji
r = Kanji.request('水')
from jisho_api.sentence import Sentence
r = Sentence.request('水')
from jisho_api.tokenize import Tokens
r = Tokens.request('昨日すき焼きを食べました')

Note: Almost everything that is available on a page is scraped.

Note: Kanji requests can come back with incomplete information when that information is not available on the page.

Scrapers

You can scrape the website for a list of search terms. Supply them in a .txt file, with one search term per line.

jisho scrape word words.txt
jisho scrape kanji kanji.txt
jisho scrape sentence search_words.txt
jisho scrape tokens sentences.txt

All of the resulting searches will be stored in ~/.jisho/data.
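
The input file for the scrape commands can be produced from any list of terms. The following is a small sketch, assuming UTF-8 encoding is appropriate for the file. `write_terms` is a hypothetical helper and not part of jisho_api, and skipping blanks and duplicates is a choice made here, not something the scraper requires.

```python
from pathlib import Path
from typing import Iterable


def write_terms(path: str, terms: Iterable[str]) -> int:
    """Write one search term per line (UTF-8) and return how many were written.

    Hypothetical helper for illustration; jisho_api does not provide it.
    """
    cleaned = []
    for term in terms:
        term = term.strip()
        # Skip blank entries and duplicates while preserving the input order.
        if term and term not in cleaned:
            cleaned.append(term)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


count = write_terms("words.txt", ["water", "fire", "水", "water"])
print(count)
```

The file written this way can then be passed to `jisho scrape word words.txt`.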

If you want to scrape programmatically, you can:

from jisho_api import scrape
from jisho_api.word import Word

word_requests = scrape(Word, ['water', 'fire'], 'to/path/')

This returns a dictionary that maps each search term to its request result. Failing requests are not included.
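
Since failing requests are simply left out of the returned dictionary, the failed terms can be recovered by comparing the input list against its keys. A minimal sketch follows. `missing_terms` is a hypothetical helper, and the `results` dictionary is a hand-written stand-in for what `scrape` returns, so the snippet runs without network access.

```python
from typing import Dict, Iterable, List


def missing_terms(terms: Iterable[str], results: Dict[str, object]) -> List[str]:
    """Return the search terms that have no entry in the results dictionary."""
    return [term for term in terms if term not in results]


# Stand-in for the dictionary returned by scrape(); real values would be
# request results rather than placeholder strings.
terms = ["water", "fire", "notaword"]
results = {"water": "<result>", "fire": "<result>"}

failed = missing_terms(terms, results)
print(failed)
```

The list of failed terms could be written back to a file and retried later.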

Cache and config

To enable caching, run:

jisho config

This will create a ~/.jisho/ folder containing a config.json with your settings. All your searches will be cached, and the cached result is used if you search for the exact same term again.

Notes and considerations

According to this thread, there is no official API, although jisho.org itself makes a kind of API request, which this library uses to scrape words. This does not work for kanji, though, because it would search for the kanji as a word and return no relevant metadata for the character itself.

Permission to scrape was also granted in the aforementioned thread.

As stated in their about page as well, jisho.org uses a collection of well-known electronic dictionaries:

This site uses the JMdict, Kanjidic2, JMnedict and Radkfile dictionary files. -jisho.org

Credits and Acknowledgements for data

All credit is given where it's due; the resources that jisho.org draws on are listed on its about page.
