
Efficient Vocabulary Selection for Foreign-Language Learning.



🌏 Infolingo – Efficient Vocabulary Selection for Foreign-Language Learning


Infolingo uses probabilistic, information-theoretic measures to pick the words that are most worth learning next in order to understand a given foreign-language text.

Check out the live demo.

Installation

Use the package manager pip to install infolingo.

pip install infolingo

(Optional) Streamlit Demo GUI

# clone the repo and create a virtual environment
git clone
python -m venv .venv
source .venv/bin/activate

# start demo GUI
cd streamlit_demo
pip install -r requirements.txt
streamlit run app.py

You should then see a locally hosted website like this:

[Screenshot: Infolingo demo]

Usage

Quickstart, using English as the default language and cross-entropy as the default vocabulary-picking function:

from infolingo import Infolingo

il = Infolingo(language="english")
vocab = il.pick_vocab("The quick brown fox jumps over the lazy dog", n=2)
print(vocab)  # prints ['jumps', 'fox']

Supported Languages

Infolingo(language="english")
Infolingo(language="spanish")
Infolingo(language="french")

Custom Corpus

Format your corpus file as a CSV with the fields word,frequency, using double quote (") as the delimiter character.

Infolingo(language="language", custom_vocab_file="path/to/custom/corpus")
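If you need to build such a file from your own text collection, the sketch below counts word frequencies with the standard library and writes them as a word,frequency CSV. It assumes a header row and comma-separated fields with every field wrapped in double quotes; `write_corpus_csv` is a hypothetical helper, not part of Infolingo, so check the layout against the package's corpus loader before relying on it.

```python
import csv
import io
import re
from collections import Counter
from typing import Iterable, TextIO


def write_corpus_csv(texts: Iterable[str], out: TextIO) -> None:
    """Hypothetical helper: write word frequencies as a quoted word,frequency CSV."""
    counts = Counter()
    for text in texts:
        # Lower-case and keep runs of word characters (a deliberately simple tokenizer).
        counts.update(re.findall(r"\w+", text.lower()))
    # Assumption: header row, comma-separated fields, every field quoted with ".
    writer = csv.writer(out, quotechar='"', quoting=csv.QUOTE_ALL)
    writer.writerow(["word", "frequency"])
    # Most frequent words first; ties keep first-occurrence order.
    for word, freq in counts.most_common():
        writer.writerow([word, freq])


buffer = io.StringIO()
write_corpus_csv(["The cat sat.", "The dog sat on the mat."], buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with `open(path, "w", newline="", encoding="utf-8")`, as the `csv` module recommends, pass that file object, and then point `custom_vocab_file` at the same path.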

Vocabulary Picking Functions

We evaluated four vocabulary-picking functions. The results indicate that cross-entropy and KL-divergence are the most effective of the four at improving comprehension of a text.
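For reference, these are the standard definitions of the two quantities, for a word distribution P of the text and a reference distribution Q (how Infolingo constructs P and Q is not specified here). The identity on the right relates them through the entropy H(P) of the text's own distribution:

```latex
\begin{aligned}
H(P, Q) &= -\sum_{w} P(w) \log Q(w) \\
D_{\mathrm{KL}}(P \,\|\, Q) &= \sum_{w} P(w) \log \frac{P(w)}{Q(w)} = H(P, Q) - H(P)
\end{aligned}
```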

Cross-Entropy

Selects the n words whose learning decreases the cross-entropy of the text the most.

from infolingo import Infolingo
il = Infolingo()
text = "The quick brown fox jumps over the lazy dog"
vocab = il.pick_vocab(text, n=3, method="cross-entropy")
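The package's exact model is not described here, so the following is only a toy illustration under an assumed model: P is the text's empirical word distribution, C is a corpus word distribution, and every word the learner does not know yet contributes its term -P(w) log C(w) to the cross-entropy H(P, C); learning a word removes that term. Under that assumption, the words that lower the cross-entropy most are simply those with the largest terms. `pick_by_cross_entropy` and the `floor` probability for words missing from the corpus are hypothetical.

```python
import math
import re
from collections import Counter


def pick_by_cross_entropy(text, corpus_probs, n, floor=1e-6):
    """Toy sketch (not Infolingo's implementation): rank words by their term in H(P, C)."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    total = len(words)
    # Term -P(w) log C(w); words missing from the corpus get the assumed `floor` probability.
    terms = {
        w: -(c / total) * math.log(corpus_probs.get(w, floor))
        for w, c in counts.items()
    }
    # Learning a word removes its term, so the largest terms give the biggest decrease.
    # Ties are broken alphabetically to keep the result deterministic.
    return sorted(terms, key=lambda w: (-terms[w], w))[:n]


# Made-up corpus probabilities, for illustration only.
corpus = {"the": 0.05, "on": 0.01, "cat": 0.001, "sat": 0.0005, "mat": 0.0001}
print(pick_by_cross_entropy("The cat sat on the mat", corpus, n=2))  # ['mat', 'sat']
```

In this toy model a word scores highly when it is both frequent in the text and rare in the corpus, so a common word like "the" ranks low despite appearing twice.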

KL-Divergence

Selects the n words whose learning decreases the KL-divergence for the text the most.

from infolingo import Infolingo
il = Infolingo()
text = "The quick brown fox jumps over the lazy dog"
vocab = il.pick_vocab(text, n=3, method="kl-divergence")
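As a toy illustration only (Infolingo's "kl-divergence" method may be defined differently), the sketch below assumes P is the text's empirical word distribution and C a corpus word distribution, and ranks words by their pointwise contribution P(w) log(P(w)/C(w)) to D_KL(P || C). This is one common way to score words that are over-represented in a text relative to a reference corpus. `pick_by_kl_contribution` and the `floor` probability for words missing from the corpus are hypothetical.

```python
import math
import re
from collections import Counter


def pick_by_kl_contribution(text, corpus_probs, n, floor=1e-6):
    """Toy sketch (not Infolingo's implementation): rank words by P(w) log(P(w) / C(w))."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    total = len(words)
    contributions = {}
    for w, c in counts.items():
        p = c / total
        # Words missing from the corpus get the assumed `floor` probability.
        contributions[w] = p * math.log(p / corpus_probs.get(w, floor))
    # Summing all contributions gives D_KL(P || C), with `floor` standing in for missing words.
    # Largest contributions first; ties are broken alphabetically.
    return sorted(contributions, key=lambda w: (-contributions[w], w))[:n]


# Made-up corpus probabilities, for illustration only.
corpus = {"the": 0.05, "on": 0.01, "cat": 0.001, "sat": 0.0005, "mat": 0.0001}
print(pick_by_kl_contribution("The cat sat on the mat", corpus, n=2))  # ['mat', 'sat']
```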

Frequent

Selects the n most frequent words in the text.

from infolingo import Infolingo
il = Infolingo()
text = "The quick brown fox jumps over the lazy dog"
vocab = il.pick_vocab(text, n=3, method="frequent")
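The frequency baseline needs only the standard library. This sketch assumes lower-casing, a simple `\w+` tokenizer, and no stopword filtering; Infolingo's own tokenization may differ, and `pick_most_frequent` is a hypothetical name.

```python
import re
from collections import Counter


def pick_most_frequent(text, n):
    """Return the n most frequent words; ties keep first-occurrence order."""
    words = re.findall(r"\w+", text.lower())
    return [word for word, _ in Counter(words).most_common(n)]


print(pick_most_frequent("The cat sat on the mat", n=2))  # ['the', 'cat']
```

Without stopword filtering, function words such as "the" tend to dominate this ranking, which is one reason a frequency baseline can be less useful for comprehension.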

Random

Selects n random words from the text.

from infolingo import Infolingo
il = Infolingo()
text = "The quick brown fox jumps over the lazy dog"
vocab = il.pick_vocab(text, n=3, method="random")
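A random baseline is easy to reproduce with the standard library. This sketch assumes the n words should be distinct and adds a hypothetical `seed` parameter so results are repeatable; `pick_random` is not part of Infolingo's API.

```python
import random
import re


def pick_random(text, n, seed=None):
    """Return up to n distinct words from the text, chosen uniformly at random."""
    # Sort the unique words so that a fixed seed gives a reproducible result.
    vocabulary = sorted(set(re.findall(r"\w+", text.lower())))
    rng = random.Random(seed)
    return rng.sample(vocabulary, min(n, len(vocabulary)))


print(pick_random("The quick brown fox jumps over the lazy dog", n=3, seed=42))
```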

Default Corpora

The default corpora used are listed below:

To use your own corpus (as an alternative to the ones above, or to support a new language), see "Custom Corpus" above.

Contributing

Any contributions you make are greatly appreciated.

If you have a suggestion to improve this, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement." Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Changelog

1.0.1

Update README.md and links.

1.0.0

Initial infolingo PyPI submission. This version supports the cross-entropy, kl-divergence, frequent, and random vocabulary-picking functions and includes a Streamlit demo for testing.

License

MIT



Download files


Source Distribution

infolingo-1.0.1.tar.gz (1.1 MB)

Uploaded Source

Built Distribution


infolingo-1.0.1-py3-none-any.whl (1.1 MB)

Uploaded Python 3

File details

Details for the file infolingo-1.0.1.tar.gz.

File metadata

  • Download URL: infolingo-1.0.1.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.6

File hashes

Hashes for infolingo-1.0.1.tar.gz
Algorithm Hash digest
SHA256 c7e0381c4a735d842917cc3932b7c3168eb38f0bd233d16a172d658b13328151
MD5 6a8088f96815245a8965ce7309c88f2d
BLAKE2b-256 2dff4cb4b4f93565efc587519e72487b886b3235100beb1ba5b38afc9ba8a134


File details

Details for the file infolingo-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: infolingo-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.6

File hashes

Hashes for infolingo-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a4411e02b0e86273f0c0f87fa57064b09ccf4d404b63064e2ad0102651546ad1
MD5 503732fd08e1646165d1edeadc7a7e48
BLAKE2b-256 1133f9e38eae28697101e6031eb9551822fab4eec08b75a3d835c87b0c3916fd

