
A language learning utility with Anki integration

Project description

Ankipan

Ankipan is a flashcard creation program for language learning that helps you spend more time on what you enjoy and less time guessing and looking up words while immersed.

Prepare for upcoming immersion by deliberately focusing on the words that are most relevant to the sources you are interested in. Ankipan can parse any text or corpus (plain text, subtitles, websites, lyrics, etc.), sorts the words by frequency, and filters out the words you are currently learning or already know.
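As a rough illustration of the count-and-filter step described above, here is a minimal pure-Python sketch. It is not Ankipan's implementation: it assumes the text has already been lemmatized into a list of words, and the names `rank_new_words`, `known` and `learning` are hypothetical.

```python
from collections import Counter


def rank_new_words(lemmas: list[str], known: set[str], learning: set[str]) -> list[tuple[str, int]]:
    """Count lemma frequencies and drop words already known or being learned.

    Hypothetical helper for illustration only; `lemmas` is assumed to be the
    already-lemmatized output of a parser.
    """
    counts = Counter(lemmas)
    excluded = known | learning
    # most_common() sorts by count, highest first; equal counts keep first-seen order.
    return [(word, n) for word, n in counts.most_common() if word not in excluded]


if __name__ == "__main__":
    lemmas = ["海賊", "王", "海賊", "男", "世", "海賊", "王"]
    print(rank_new_words(lemmas, known={"男"}, learning={"世"}))
    # → [('海賊', 3), ('王', 2)]
```

Sorting by frequency first means the words you are most likely to meet in the source end up at the top of the selection list.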

New words are stored internally as decks and can be converted to Anki flashcards with customizable content, such as scraped dictionary definitions and example sentences from different sources. Optionally, translations and explanations of the example sentences can be generated with your own (free) Google Gemini API key.

Getting started

1. Prerequisites

2. Installation

  • Using pip:
pip install ankipan
  • From source:
git clone git@gitlab.com:ankipan/ankipan.git
cd ankipan
pip install .

3. (Optional) Install lemmatizers to parse your own texts

pip install stanza

4. (Optional but recommended) Use your own Gemini API key to generate translations and explanations for example sentences (see the prompt in ankipan/translator.py)

 python3 -c "import ankipan;ankipan.Config.set_gemini_api_key('<api key>')"
  • Each Gemini key has a free quota of 1500 prompts/day, and each prompt can process up to 300 sentences at once, which allows you to process sentences for ~10,000 flashcards per day.
  • If one of your sentences has already been cached on the server by a previous user, it is not included in your prompt.
  • The server also has a free Gemini API key set up, and each user gets up to 10 server-side prompts per IP. Once these are used up, users have to use their own key to generate translations and explanations.
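The quota arithmetic above can be made concrete with a short sketch: drop sentences that are already cached, split the rest into prompts of at most 300 sentences, and check the prompt count against the 1500-prompt daily budget. The two limits come from the text above; the function `plan_prompts` and the in-memory `cached` set are hypothetical stand-ins, not Ankipan's API or the server's actual cache.

```python
MAX_SENTENCES_PER_PROMPT = 300  # per-prompt limit stated above
DAILY_PROMPT_QUOTA = 1500  # free-tier prompts per day stated above


def plan_prompts(sentences: list[str], cached: set[str]) -> list[list[str]]:
    """Split uncached sentences into prompt-sized batches (hypothetical helper)."""
    # dict.fromkeys() removes duplicates while preserving order;
    # sentences already in the (assumed) cache are skipped entirely.
    pending = [s for s in dict.fromkeys(sentences) if s not in cached]
    batches = [
        pending[i : i + MAX_SENTENCES_PER_PROMPT]
        for i in range(0, len(pending), MAX_SENTENCES_PER_PROMPT)
    ]
    if len(batches) > DAILY_PROMPT_QUOTA:
        raise RuntimeError(
            f"{len(batches)} prompts needed, daily quota is {DAILY_PROMPT_QUOTA}"
        )
    return batches
```

For example, 650 new, uncached sentences yield three prompts of 300, 300 and 50 sentences. At full quota the upper bound is 1500 × 300 = 450,000 sentences per day; how many flashcards that corresponds to depends on how many example sentences each card carries.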

Usage

See the interactive example notebook in /examples.

# Create a new collection with a name of your choice, the language you are learning and your native language
from ankipan import Collection
collection = Collection('One Piece 1', learning_lang='jp', native_lang='en')

# Specify the content to download for the flashcards (see collection.get_available_sources() for example sentence sources, and the scraper.py module)

# the following e.g. prints ['jisho', 'wadoku', 'wikitionary_de', 'wikitionary_en', 'wikitionary_fr', 'wikitionary_jp', 'tatoeba', 'urban']:
print(collection.valid_definition_fields)
# now we select which definitions we want on our flashcard backside:
definitions = ['wadoku', 'jisho', 'wikitionary_en']

# the following e.g. prints ['lyrics', 'wikipedia', 'youtube']:
print(collection.get_available_sources())
# the following e.g. prints ['hajimesyacho', 'sushiramen', 'hikakin', 'fischers']:
print(collection.get_available_sources('youtube'))
# the following can also be left empty if you have no preference, otherwise example sentences from the specified sources will be prioritized:
example_sentence_source_paths = ['wikipedia', 'syosetu.com', 'youtube/fischers', 'youtube/sushiramen']

# set the fields in the collection:
collection.set_flashcard_fields(definitions=definitions, example_sentence_source_paths=example_sentence_source_paths)

# Specify the source whose words you want to add to your deck: a string, a path to a file or folder, or a source name
# (see collection.get_available_sources() for source names)

words = collection.collect(source_path='wikipedia/O/ONE_PIECE.html') # from DB, no lemmatizers required
# words = collection.collect(string='かつてこの世の全てを手に入れた男、〝海賊王〟ゴールド・ロジャー。') # from string
# words = collection.collect('./example_text_jp.txt') # text file from path (original source: https://ja.wikipedia.org/wiki/ONE_PIECE)
# words = collection.collect('./example_subtitle_jp.srt') # subtitle from path

# Select the words you already know and the words you would like to learn from the table overview
words.select_new_words()

# Add words to collection
collection.add_deck(words, 'example_source')

# Optional: persist the collection state to disk (see the .data folder)
collection.save()

# Download content for new cards (also autosaves collection to drive)
collection.fetch('example_source')

# Sync the deck with Anki to upload its cards to the currently open Anki instance
collection.sync_with_anki('example_source')
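The comment on example_sentence_source_paths says that sentences from the listed sources are prioritized. One plausible way to express such a preference is sketched below: rank each candidate sentence by the first listed path that matches the start of its source path, and put unlisted sources last. This is an assumption about the behavior, not Ankipan's actual code; `prioritize` and the `(source_path, sentence)` tuples are hypothetical.

```python
def prioritize(candidates: list[tuple[str, str]], preferred: list[str]) -> list[tuple[str, str]]:
    """Order (source_path, sentence) pairs by preferred source paths (hypothetical)."""

    def rank(candidate: tuple[str, str]) -> int:
        source_path = candidate[0]
        for i, prefix in enumerate(preferred):
            # 'youtube' matches 'youtube/fischers/...', but not e.g. 'youtubers/...'.
            if source_path == prefix or source_path.startswith(prefix + "/"):
                return i
        return len(preferred)  # sources that are not listed go last

    # sorted() is stable, so sentences with equal rank keep their original order.
    return sorted(candidates, key=rank)
```

With an empty preference list every candidate gets the same rank, so the original order is kept, which matches leaving example_sentence_source_paths empty when you have no preference.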

Notes

  • Lemmatization is currently done with the stanza library in the reader.py module. This mostly works well, but stanza uses a statistical model to estimate the likely word roots (lemmas) of each part of a sentence, so it sometimes makes mistakes. Users then have to filter these out manually in the select_new_words overview, or suspend the affected cards later in Anki.

  • The translation engine on the server has a limited quota (free Gemini API). Once it has been exceeded for the day, users have to specify their own Google Gemini API key, which is then used locally for translations.
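The fallback described in these notes (free server-side prompts first, then the user's own key) can be sketched as a small quota tracker. The limit of 10 server prompts comes from the installation notes; the class `KeySelector`, its method names, the `"server"` placeholder value and the idea of counting usage locally are all hypothetical and only illustrate the logic, since the real quota is enforced on the server.

```python
from typing import Optional


class KeySelector:
    """Decide which Gemini key the next prompt should use (illustrative sketch only)."""

    def __init__(self, server_prompt_limit: int = 10, user_key: Optional[str] = None):
        self.server_prompts_left = server_prompt_limit
        self.user_key = user_key

    def next_key(self) -> str:
        # Spend the free server-side prompts first; "server" is a placeholder marker.
        if self.server_prompts_left > 0:
            self.server_prompts_left -= 1
            return "server"
        # Afterwards, fall back to the user's own key if one is configured.
        if self.user_key is not None:
            return self.user_key
        raise RuntimeError("Server quota exhausted; set your own Gemini API key")
```

Checking the local key only after the server budget is spent keeps the user's own free quota in reserve for larger batches.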



Download files


Source Distribution

ankipan-0.3.tar.gz (54.5 kB)

Uploaded Source

Built Distribution


ankipan-0.3-py3-none-any.whl (54.6 kB)

Uploaded Python 3

File details

Details for the file ankipan-0.3.tar.gz.

File metadata

  • Download URL: ankipan-0.3.tar.gz
  • Upload date:
  • Size: 54.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for ankipan-0.3.tar.gz
Algorithm Hash digest
SHA256 5a1d4f1dc103791f3f58acbfbdadeb89bda858c806fe31b4ff12953e536e814b
MD5 465659912481fe36cb5837028701c2cc
BLAKE2b-256 80c53542c16aa813a08bcb83999233647e91223cbab3be51705ebc8462fb2fcc


File details

Details for the file ankipan-0.3-py3-none-any.whl.

File metadata

  • Download URL: ankipan-0.3-py3-none-any.whl
  • Upload date:
  • Size: 54.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for ankipan-0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 114b251e121b525ecd45e207f0b60e6596b887585518c9c5099a761f7eee4720
MD5 9c1af545ed17eea5928a95a64a61d2eb
BLAKE2b-256 03cc131ee0c85006890082d072f1acc8d73c4e22faa888a9cec2c328c3f6da14

