🎲 Infolingo – Efficient Vocabulary Selection for Foreign-Language Learning

Infolingo uses probability to pick which words to learn next so that you can better understand a foreign-language text.

Check out the live demo.

Installation

Use the package manager pip to install infolingo.

pip install infolingo

(Optional) Streamlit Demo GUI

# download repo
git clone
python -m venv .venv
source .venv/bin/activate

# start demo GUI
cd streamlit_demo
pip install -r requirements.txt
streamlit run app.py

You should then see a locally hosted website like this:

Infolingo demo

Usage

Quickstart using English as the default language and cross-entropy as the default vocabulary-picking function.

from infolingo import Infolingo

il = Infolingo(language="english")
vocab = il.pick_vocab("The quick brown fox jumps over the lazy dog", n=2)
print(vocab) # prints ["jumps", "fox"]

Supported Languages

Infolingo(language="english")
Infolingo(language="spanish")
Infolingo(language="french")

Custom Corpus

Format your corpus file as a CSV with the fields word,frequency, using the double quote (") as the delimiter character.

Infolingo(language="language", custom_vocab_file="path/to/custom/corpus")
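As a rough illustration, a corpus file of this shape could be written with Python's standard csv module. The header row, the comma between fields, the use of the double quote as the quoting character, and the file name custom_corpus.csv are all assumptions made for this sketch; the word counts are made up.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical word counts; replace with counts from your own corpus.
word_counts = {"hola": 120, "gato": 45, "perro": 30}

# Hypothetical output location (a temp directory keeps the sketch side-effect free).
corpus_path = Path(tempfile.gettempdir()) / "custom_corpus.csv"

with corpus_path.open("w", newline="", encoding="utf-8") as f:
    # Assumption: comma-separated fields, double quote as the quoting character.
    writer = csv.writer(f, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["word", "frequency"])  # assumption: the file has a header row
    for word, frequency in word_counts.items():
        writer.writerow([word, frequency])

# Read the file back to check that it round-trips.
with corpus_path.open(newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(rows[0])
```

The resulting path can then be passed as custom_vocab_file, as in the call above.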

Vocabulary Picking Functions

We evaluated four vocabulary-picking functions. The results indicate that cross-entropy and KL-divergence are most effective for language comprehension.
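For reference, these are the standard textbook definitions of the two quantities for generic distributions p and q over words w. How Infolingo maps the text and the corpus onto p and q is not stated here, so the symbols are illustrative only.

```latex
H(p, q) = -\sum_{w} p(w)\,\log q(w),
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{w} p(w)\,\log\frac{p(w)}{q(w)} = H(p, q) - H(p)
```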

Cross-Entropy

Select the top n vocabulary words that decrease the cross-entropy of the text the most.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="cross-entropy")
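To give a feel for the idea, here is a toy sketch that does not use Infolingo and is not its implementation. It assumes p is the empirical word distribution of the text, q is a unigram distribution from a made-up corpus, and that learning a word removes that word's term from the cross-entropy sum. The function name pick_by_cross_entropy, the whitespace tokenizer, and the count of 1 for unseen words are all hypothetical choices.

```python
import math
from collections import Counter

def pick_by_cross_entropy(text, corpus_freq, n):
    """Toy sketch: rank words by their term in the text's cross-entropy."""
    words = text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter(words)
    total_text = len(words)
    total_corpus = sum(corpus_freq.values())
    contribution = {}
    for word, count in counts.items():
        p = count / total_text  # empirical probability in the text
        q = corpus_freq.get(word, 1) / total_corpus  # unseen words get count 1
        contribution[word] = -p * math.log2(q)  # this word's cross-entropy term
    # Largest term first; ties broken alphabetically for determinism.
    ranked = sorted(contribution, key=lambda w: (-contribution[w], w))
    return ranked[:n]

# Made-up corpus counts for illustration only.
corpus_freq = {"the": 1000, "dog": 50, "fox": 5, "quick": 20, "lazy": 10}
picked = pick_by_cross_entropy("the quick fox the lazy dog the fox", corpus_freq, n=2)
print(picked)  # ['fox', 'lazy']
```

In this toy model, a word ranks highly when it is both repeated in the text and rare in the corpus, which is why "fox" beats the far more frequent "the".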

KL-Divergence

Select the top n vocabulary words that decrease the KL-divergence of the text the most.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="kl-divergence")
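A comparable toy sketch for KL-divergence follows, again independent of Infolingo's actual implementation. It assumes p is the text's empirical word distribution, q is a made-up corpus distribution, and that the words worth learning are those with the largest term p(w) log(p(w)/q(w)). The function name pick_by_kl_term and the Spanish example data are hypothetical.

```python
import math
from collections import Counter

def pick_by_kl_term(text, corpus_freq, n):
    """Toy sketch: rank words by their term in D_KL(text || corpus)."""
    words = text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter(words)
    total_text = len(words)
    total_corpus = sum(corpus_freq.values())
    terms = {}
    for word, count in counts.items():
        p = count / total_text  # probability of the word in the text
        q = corpus_freq.get(word, 1) / total_corpus  # unseen words get count 1
        terms[word] = p * math.log2(p / q)  # negative when the text underuses a word
    # Largest term first; ties broken alphabetically for determinism.
    ranked = sorted(terms, key=lambda w: (-terms[w], w))
    return ranked[:n]

# Made-up corpus counts for illustration only.
corpus_freq = {"el": 900, "gato": 40, "come": 30, "pescado": 10, "fresco": 20}
picked = pick_by_kl_term("el gato come pescado fresco el gato", corpus_freq, n=2)
print(picked)  # ['gato', 'pescado']
```

Unlike the raw cross-entropy term, a KL term can be negative: "el" is less frequent in this text than in the corpus, so it ranks last.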

Frequent

Select the top n most frequent words in the text.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="frequent")
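The idea behind this baseline can be sketched with the standard library alone. The function name pick_most_frequent and the letter-based tokenizer are assumptions for illustration; Infolingo may tokenize or filter words differently.

```python
import re
from collections import Counter

def pick_most_frequent(text, n):
    """Toy sketch: return the n most frequent words in the text."""
    # Lowercase and keep runs of letters only (assumption about tokenization).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word for word, _ in Counter(words).most_common(n)]

picked = pick_most_frequent(
    "The dog saw the cat, and the dog chased the cat; the dog won.", n=3
)
print(picked)  # ['the', 'dog', 'cat']
```

Note that a plain frequency count tends to surface function words such as "the" first.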

Random

Select n random words from the text.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="random")
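A random baseline of this kind might look like the sketch below. The function name pick_random, the seed parameter, and the choice to sample distinct words without replacement are assumptions, not Infolingo's documented behavior.

```python
import random

def pick_random(text, n, seed=None):
    """Toy sketch: sample up to n distinct words from the text."""
    # Sort the unique words so that a fixed seed gives a reproducible sample.
    unique_words = sorted(set(text.lower().split()))
    rng = random.Random(seed)
    return rng.sample(unique_words, k=min(n, len(unique_words)))

sentence = "the quick brown fox jumps over the lazy dog"
picked = pick_random(sentence, n=3, seed=0)
print(picked)
```

Passing a seed makes the selection repeatable, which is useful when comparing this baseline against the other methods.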

Default Corpora

The default corpora used are listed below:

To use your own corpus (as an alternative to the ones above, or to support a new language), see "Custom Corpus" above.

Contributing

Any contributions you make are greatly appreciated.

If you have a suggestion to improve this project, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Changelog

1.0.0

Initial infolingo PyPI submission. This version supports the cross-entropy, kl-divergence, frequent, and random vocabulary-picking functions. It includes a Streamlit demo for testing.

License

MIT
