🌏 Infolingo – Efficient Vocabulary Selection for Foreign-Language Learning

Infolingo uses probabilistic measures, such as cross-entropy and KL-divergence, to pick the words that, once learned, most improve your understanding of a foreign-language text.

Check out the live demo.

Installation

Use the package manager pip to install infolingo.

pip install infolingo

(Optional) Streamlit Demo GUI

# download repo
git clone
python -m venv .venv
source .venv/bin/activate

# start demo GUI
cd streamlit_demo
pip install -r requirements.txt
streamlit run app.py

You should then see the Infolingo demo as a locally hosted website.

Usage

Quickstart, using English (the default language) and cross-entropy (the default vocabulary-picking function):

from infolingo import Infolingo

il = Infolingo(language="english")
vocab = il.pick_vocab("The quick brown fox jumps over the lazy dog", n=2)
print(vocab) # prints ["jumps", "fox"]

Supported Languages

Infolingo(language="english")
Infolingo(language="spanish")
Infolingo(language="french")

Custom Corpus

Format your corpus file as a CSV with the fields word,frequency, using the double quote (") as the character that delimits field values.

Infolingo(language="language", custom_vocab_file="path/to/custom/corpus")
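
The sketch below shows one way to produce such a file with Python's standard csv module. The file name, the example words and counts, and the header row are assumptions made for illustration; this README does not say whether Infolingo expects a header, so adjust to whatever the loader accepts.

```python
import csv
import os
import tempfile

# Hypothetical word counts; replace them with counts from your own corpus.
word_counts = {"hola": 120, "mundo": 45, "buenos días": 7}


def write_corpus(path, counts):
    """Write word,frequency rows, wrapping every field in double quotes."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, quotechar='"', quoting=csv.QUOTE_ALL)
        writer.writerow(["word", "frequency"])  # header row is an assumption
        for word, freq in counts.items():
            writer.writerow([word, freq])


# Written to the system temp directory so the sketch runs anywhere.
corpus_path = os.path.join(tempfile.gettempdir(), "custom_corpus.csv")
write_corpus(corpus_path, word_counts)
```

The resulting path can then be passed as custom_vocab_file.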

Vocabulary Picking Functions

We evaluated four vocabulary-picking functions. The results indicate that cross-entropy and KL-divergence are most effective for language comprehension.

Cross-Entropy

Select the n words that decrease the text's cross-entropy the most.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="cross-entropy")
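
To make the idea concrete, here is a minimal standalone sketch of cross-entropy-based selection. It is not Infolingo's implementation: the learner model (a known word gets its relative corpus frequency, an unknown word gets the small floor EPS), the whitespace tokenization, the greedy loop, and all function names are assumptions. The model q is also left unnormalized to keep the sketch short.

```python
import math
from collections import Counter

EPS = 1e-6  # assumed probability floor for a word the learner does not know


def cross_entropy(text_counts, known, corpus_freq):
    """Return H(p, q) = -sum over w of p(w) * log q(w) for the words of the text."""
    total_text = sum(text_counts.values())
    total_corpus = sum(corpus_freq.values())
    h = 0.0
    for word, count in text_counts.items():
        p = count / total_text  # share of the text taken up by this word
        q = corpus_freq.get(word, 0) / total_corpus if word in known else 0.0
        h -= p * math.log(max(q, EPS))  # unknown or unseen words sit at the floor
    return h


def pick_by_cross_entropy(text, corpus_freq, n, known=()):
    """Greedily pick up to n unknown words whose learning lowers H(p, q) the most."""
    text_counts = Counter(text.lower().split())
    known = set(known)
    picked = []
    for _ in range(n):
        candidates = [w for w in text_counts if w not in known]
        if not candidates:
            break
        base = cross_entropy(text_counts, known, corpus_freq)

        def gain(word):
            # Drop in cross-entropy if this one word were added to the known set.
            return base - cross_entropy(text_counts, known | {word}, corpus_freq)

        best = max(candidates, key=gain)
        known.add(best)
        picked.append(best)
    return picked


# Hypothetical corpus counts, for illustration only.
corpus = {"the": 1000, "dog": 50, "quick": 20, "fox": 10, "jumps": 5}
print(pick_by_cross_entropy("the fox jumps over the dog", corpus, n=2, known={"the"}))
```

Under this model a word scores well when it is both frequent in the text and common in the corpus; a word missing from the corpus gains nothing, because its probability stays at the floor.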

KL-Divergence

Select the n words that decrease the text's KL-divergence the most.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="kl-divergence")
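
For reference, the textbook definitions relate the two measures by KL(p || q) = H(p, q) - H(p), where p is the text's word distribution and q is the learner's model. The sketch below checks that identity numerically on made-up distributions. It only illustrates the standard definitions; how Infolingo's kl-divergence method models q, and where it differs from the cross-entropy method, is not stated in this README.

```python
import math


def entropy(p):
    """H(p) = -sum of p(w) * log p(w)."""
    return -sum(pw * math.log(pw) for pw in p.values() if pw > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum of p(w) * log q(w)."""
    return -sum(pw * math.log(q[w]) for w, pw in p.items() if pw > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum of p(w) * log(p(w) / q(w))."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


# Made-up distributions: p for the text, q for the learner before and after study.
p = {"fox": 0.5, "dog": 0.25, "jumps": 0.25}
q_before = {"fox": 0.1, "dog": 0.6, "jumps": 0.3}
q_after = {"fox": 0.4, "dog": 0.4, "jumps": 0.2}

kl_drop = kl_divergence(p, q_before) - kl_divergence(p, q_after)
ce_drop = cross_entropy(p, q_before) - cross_entropy(p, q_after)
print(math.isclose(kl_drop, ce_drop))
```

Because H(p) depends only on the text, a change to q lowers both measures by the same amount when they share the same p and q.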

Frequent

Select the top n most frequent words in the text.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="frequent")
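
As a rough standalone equivalent, the baseline can be sketched with collections.Counter. The function name, the lowercasing, and the whitespace tokenization are assumptions, as is reading "most frequent" as frequency within the text rather than in the corpus.

```python
from collections import Counter


def pick_frequent(text, n):
    """Return the n words that occur most often in the text (hypothetical helper)."""
    words = text.lower().split()
    # most_common sorts by count; equal counts keep first-seen order.
    return [word for word, _ in Counter(words).most_common(n)]


print(pick_frequent("the cat and the dog and the bird", 2))
```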

Random

Select n random words from the text.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="random")
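
A comparable standalone baseline can be sketched with the standard random module. The function name, the seed parameter, and the choice to sample distinct words are assumptions; the README does not say whether Infolingo samples with or without repetition.

```python
import random


def pick_random(text, n, seed=None):
    """Return up to n distinct words drawn uniformly from the text (hypothetical helper)."""
    words = sorted(set(text.lower().split()))  # sorted so a fixed seed is reproducible
    rng = random.Random(seed)
    return rng.sample(words, min(n, len(words)))


print(pick_random("the quick brown fox jumps over the lazy dog", 2, seed=0))
```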

Default Corpora

The default corpora used are listed below:

To use your own corpus (as an alternative to the default ones, or to support a new language), see "Custom Corpus" above.

Contributing

Any contributions you make are greatly appreciated.

If you have a suggestion to improve this, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement." Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Changelog

1.1.0

Update README.md and links.

1.0.0

Initial infolingo PyPI submission. This version supports the cross-entropy, kl-divergence, frequent, and random vocabulary-picking functions. It also includes a Streamlit demo for testing.

License

MIT
