
A Python tool that scrapes pitch accent, phonetic pronunciation, definitions, and example sentences from Wiktionary to create flashcards for Anki.

Project description

jplookup

jplookup is a Python tool that scrapes pitch accent, phonetic pronunciation, definitions, and example sentences from Wiktionary and turns them into straightforward flashcards for Anki.

Features

Pitch Accent

Anki cards made with jplookup can mark the pitch accent. A solid dot is placed above the high-pitched mora that is immediately followed by a low-pitched mora, i.e. the point where the pitch drops.


When a marked high pitch is sustained because the mora is part of a digraph and/or has a lengthening vowel, CSS is used to render a line indicating this.

If there's no dot present, then the word follows the standard pitch accent.
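The marking described above can be sketched in a few lines. This is only an illustration of the idea, not jplookup's actual renderer: the function name `render_pitch_html`, the `downstep` parameter, and the CSS class names `mora` and `drop` are all assumptions made for this example.

```python
from html import escape


def render_pitch_html(moras, downstep=None):
    """Wrap each mora in a <span> and flag the one before the pitch drop.

    moras:    list of kana strings, one string per mora.
    downstep: 0-based index of the high mora that is followed by a low
              mora, or None when no dot should be drawn.
    NOTE: the class names "mora" and "drop" are hypothetical; a stylesheet
    would have to draw the dot for elements with the "drop" class.
    """
    parts = []
    for i, mora in enumerate(moras):
        css_class = "mora drop" if i == downstep else "mora"
        parts.append(f'<span class="{css_class}">{escape(mora)}</span>')
    return "".join(parts)


# The pitch falls after the second mora here, so that mora gets the marker.
html_with_dot = render_pitch_html(["お", "り", "る"], downstep=1)

# No downstep given, so no mora is flagged and no dot would be drawn.
html_without_dot = render_pitch_html(["お", "り", "る"])

print(html_with_dot)
print(html_without_dot)
```

Keeping the marker as a CSS class rather than inline styling means the dot (and the line for sustained pitch) can be adjusted in the Anki card template without regenerating the cards.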


Scrapes Japanese word data

jplookup.scrape("猫") returns a list of dictionaries. The first dictionary in the list contains the primary results.

The rest of the list may contain further dictionaries. These are gathered from page redirects whose contents could not be linked back to the primary results dictionary through mutually matching components.
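A minimal sketch of how the returned list might be consumed, based only on the description above. The helper `split_results` is hypothetical, and the stand-in data uses a made-up `"source"` key so the snippet runs without the package or network access.

```python
def split_results(results):
    """Return (primary, extras) from a scrape()-style list of dictionaries.

    The first dictionary is treated as the primary result; anything after
    it is treated as unlinked data from redirected pages.
    """
    if not results:
        return None, []
    return results[0], results[1:]


# With jplookup installed, the list would presumably come from:
#     import jplookup
#     results = jplookup.scrape("猫")
# The stand-in below is NOT real output; its keys are placeholders.
results = [{"source": "primary"}, {"source": "redirect"}]

primary, extras = split_results(results)
print(primary)
print(len(extras))
```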


jplookup groups its results by part of speech. Under each part of speech are pronunciations, definitions, synonyms, and antonyms. Each pronunciation will generally have the kana, the IPA, the pitch accent, and the furigana. Each definition is a dictionary and can contain example sentences.
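To make the nesting concrete, here is a sketch of what one result dictionary could look like and how it could be walked. Every key name (`"pronunciations"`, `"definitions"`, `"text"`, `"examples"`, and so on) and every value is an assumption chosen for illustration; check the real output of `jplookup.scrape` for the actual names.

```python
# Hypothetical shape: part of speech -> pronunciations / definitions / ...
# All keys and values below are illustrative sample data, not real output.
entry = {
    "Noun": {
        "pronunciations": [
            {"kana": "ねこ", "ipa": "(IPA string)", "pitch-accent": 1, "furigana": "猫(ねこ)"},
        ],
        "definitions": [
            {"text": "cat", "examples": ["猫が好きです。"]},
        ],
        "synonyms": [],
        "antonyms": [],
    },
}


def iter_definitions(entry):
    """Yield (part_of_speech, definition_text, examples) for each definition."""
    for part_of_speech, data in entry.items():
        for definition in data.get("definitions", []):
            # Example sentences are optional, so default to an empty list.
            yield part_of_speech, definition["text"], definition.get("examples", [])


rows = list(iter_definitions(entry))
for part_of_speech, text, examples in rows:
    print(part_of_speech, text, examples)
```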


Anki Integration

The program outputs a text file that can easily be read into Anki. Its fields are:

  • Key Term
  • Kana
  • Kanji
  • Definitions
  • IPA
  • Pretty Kana (HTML rendering)
  • Pretty Kanji (HTML rendering)
  • Usage Notes
  • Counter Noun
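As a rough sketch of what such an export could look like, the snippet below writes one line per card with the fields in the order listed above. It assumes a tab-separated layout, which is one of the plain-text formats Anki can import; whether jplookup itself uses tabs is not stated here. The function name `write_anki_rows` and the card dictionary are hypothetical.

```python
import csv
import io

# Field order taken from the list above.
FIELDS = [
    "Key Term", "Kana", "Kanji", "Definitions", "IPA",
    "Pretty Kana", "Pretty Kanji", "Usage Notes", "Counter Noun",
]


def write_anki_rows(cards, stream):
    """Write one tab-separated line per card; missing fields become empty."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    for card in cards:
        writer.writerow([card.get(name, "") for name in FIELDS])


# Illustrative card with only some fields filled in.
cards = [{"Key Term": "猫", "Kana": "ねこ", "Kanji": "猫", "Definitions": "cat"}]

buffer = io.StringIO()  # stands in for an open text file
write_anki_rows(cards, buffer)
output = buffer.getvalue()
print(repr(output))
```

To write a real file, open it with `open(path, "w", encoding="utf-8", newline="")` and pass the file object instead of the `StringIO` buffer.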

Handles Terms Linking to Other Pages

When Wiktionary links to a different page for an alternative spelling, the information gathered from that redirect is filtered by the original spelling so that only the relevant information is kept.

  • "撮る" redirects to the Wiktionary page for "とる" and grabs any definitions that are either specified as fitting with "撮る" or definitions with no context/kanji specification at all.
  • "取る" redirects to the Wiktionary page for "とる" and grabs any definitions that are either specified as fitting with "取る" or definitions with no context/kanji specification at all.
  • "とる" (the hiragana directly) goes to the Wiktionary page for "とる" and grabs all definitions regardless of context specification.
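The filtering rule in the three cases above can be sketched as follows. This is an illustration under assumptions, not jplookup's code: the function name `filter_definitions`, the `"spellings"` key, and the sample definitions are all made up for the example.

```python
def filter_definitions(definitions, requested_term, page_title):
    """Keep the definitions that are relevant to the term that was looked up.

    definitions:    list of dicts; the hypothetical "spellings" key lists
                    the kanji spellings a definition is restricted to
                    (empty or missing means no restriction).
    requested_term: the term the user asked for, e.g. "撮る".
    page_title:     the page the data was actually scraped from, e.g. "とる".
    """
    # Looking up the page's own title keeps everything.
    if requested_term == page_title:
        return list(definitions)
    # Otherwise keep unrestricted definitions and those naming the term.
    return [
        d for d in definitions
        if not d.get("spellings") or requested_term in d["spellings"]
    ]


# Illustrative data only.
toru = [
    {"text": "to take", "spellings": ["取る"]},
    {"text": "to take a photo", "spellings": ["撮る"]},
    {"text": "sense with no kanji specified", "spellings": []},
]

for_photo = [d["text"] for d in filter_definitions(toru, "撮る", "とる")]
for_kana = [d["text"] for d in filter_definitions(toru, "とる", "とる")]
print(for_photo)
print(for_kana)
```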

Installation

Clone the repository and install the required dependencies (bs4 and jaconv):

git clone https://github.com/travisgk/jplookup.git
cd jplookup
pip install -r requirements.txt
