Efficient Vocabulary Selection for Foreign-Language Learning.
🌏 Infolingo – Efficient Vocabulary Selection for Foreign-Language Learning
Infolingo uses word probabilities to pick the words that, once learned, would most improve your understanding of a foreign-language text.
Check out the live demo.
Installation
Use the package manager pip to install infolingo.
pip install infolingo
(Optional) Streamlit Demo GUI
# download repo
git clone
python -m venv .venv
source .venv/bin/activate
# start demo GUI
cd streamlit_demo
pip install -r requirements.txt
streamlit run app.py
You should then see the demo running as a locally hosted website.
Usage
The quickstart below uses English, the default language, and cross-entropy, the default vocabulary-picking function.
from infolingo import Infolingo
il = Infolingo(language="english")
vocab = il.pick_vocab("The quick brown fox jumps over the lazy dog", n=2)
print(vocab)  # prints ['jumps', 'fox']
Supported Languages
Infolingo(language="english")
Infolingo(language="spanish")
Infolingo(language="french")
Custom Corpus
Format your corpus file as a CSV with the fields word,frequency and the double quote (") as the quote character.
Infolingo(language="language", custom_vocab_file="path/to/custom/corpus")
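As a minimal sketch of preparing such a file, the snippet below counts the words in a raw text and writes them as word,frequency rows with Python's csv module. The helper name write_corpus_csv, the tokenization regex, and the header row are assumptions made for illustration and are not part of Infolingo; check whether the loader expects a header before relying on it.

```python
import csv
import re
from collections import Counter


def write_corpus_csv(text, path, header=True):
    """Count words in `text` and write word,frequency rows to `path`.

    Hypothetical helper, not part of Infolingo. Whether the loader
    expects a header row is an assumption; pass header=False to omit it.
    """
    # Lowercase the text and keep runs of Unicode letters as words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Comma separates the fields; the double quote is the quote character.
        writer = csv.writer(
            f, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL
        )
        if header:
            writer.writerow(["word", "frequency"])
        # Most frequent words first.
        for word, count in counts.most_common():
            writer.writerow([word, count])
    return counts
```

The resulting path can then be passed as custom_vocab_file, as in the call above.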
Vocabulary Picking Functions
We evaluated four vocabulary-picking functions. The results indicate that cross-entropy and KL-divergence are most effective for language comprehension.
Cross-Entropy
Select the n words that most decrease the cross-entropy of the text.
from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="cross-entropy")
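The README does not spell out how the score is computed, so the following is only one plausible reading and not necessarily Infolingo's implementation. It assumes p is the word distribution of the text, q is an add-one-smoothed distribution from a reference corpus, and that learning a word removes its term -p(w) log2 q(w) from the cross-entropy H(p, q). The function names and the corpus_counts argument are hypothetical.

```python
import math
import re
from collections import Counter


def cross_entropy_contributions(text, corpus_counts):
    """Per-word terms -p(w) * log2 q(w) of the cross-entropy H(p, q).

    p is the word distribution of `text`; q is the add-one-smoothed
    distribution of a reference corpus (`corpus_counts`, word -> count).
    Illustrative assumption, not Infolingo's confirmed scoring.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    text_counts = Counter(words)
    total_text = sum(text_counts.values())
    vocab = set(text_counts) | set(corpus_counts)
    # Add-one smoothing so words missing from the corpus keep q(w) > 0.
    total_corpus = sum(corpus_counts.values()) + len(vocab)
    scores = {}
    for word, count in text_counts.items():
        p = count / total_text
        q = (corpus_counts.get(word, 0) + 1) / total_corpus
        scores[word] = -p * math.log2(q)
    return scores


def pick_by_cross_entropy(text, corpus_counts, n):
    """Return the n words with the largest cross-entropy contribution."""
    scores = cross_entropy_contributions(text, corpus_counts)
    # Sort by descending score; break ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]
```

Under these assumptions, words that are common in the text but rare in the reference corpus score highest, which matches the intuition that they are both useful and unlikely to be known already.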
KL-Divergence
Select the n words that most decrease the KL-divergence of the text.
from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="kl-divergence")
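As with cross-entropy, the exact scoring is not documented here, so this is a hedged sketch rather than Infolingo's implementation. It assumes each word is ranked by its term p(w) log2(p(w) / q(w)) in D_KL(p || q), with p taken from the text and q from an add-one-smoothed reference corpus. The function names and the corpus_counts argument are hypothetical.

```python
import math
import re
from collections import Counter


def kl_contributions(text, corpus_counts):
    """Per-word terms p(w) * log2(p(w) / q(w)) of D_KL(p || q).

    p is the word distribution of `text`; q is the add-one-smoothed
    distribution of a reference corpus (`corpus_counts`, word -> count).
    Illustrative assumption, not Infolingo's confirmed scoring.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    text_counts = Counter(words)
    total_text = sum(text_counts.values())
    vocab = set(text_counts) | set(corpus_counts)
    # Add-one smoothing so words missing from the corpus keep q(w) > 0.
    total_corpus = sum(corpus_counts.values()) + len(vocab)
    scores = {}
    for word, count in text_counts.items():
        p = count / total_text
        q = (corpus_counts.get(word, 0) + 1) / total_corpus
        # Negative when the word is rarer in the text than in the corpus.
        scores[word] = p * math.log2(p / q)
    return scores


def pick_by_kl_divergence(text, corpus_counts, n):
    """Return the n words with the largest KL-divergence contribution."""
    scores = kl_contributions(text, corpus_counts)
    # Sort by descending score; break ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]
```

Each KL term equals the corresponding cross-entropy term plus p(w) log2 p(w), so the two rankings can differ when words occur with different frequencies in the text.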
Frequent
Select the top n most frequent words in the text.
from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="frequent")
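For comparison, a frequency baseline of this kind can be sketched with collections.Counter. The function name and the lowercasing and tokenization choices are assumptions for illustration; Infolingo may normalize text differently.

```python
import re
from collections import Counter


def pick_most_frequent(text, n):
    """Return the n most frequent words in `text` (hypothetical helper)."""
    # Lowercase the text and keep runs of Unicode letters as words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # most_common orders by count; ties keep first-seen order.
    return [word for word, _ in Counter(words).most_common(n)]
```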
Random
Select n random words from the text.
from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="random")
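A random baseline can be sketched as follows. Sampling distinct words without replacement and the optional seed parameter are assumptions made for illustration, not documented Infolingo behavior.

```python
import random
import re


def pick_random(text, n, seed=None):
    """Return up to n distinct words sampled from `text` (hypothetical helper)."""
    # Sort the distinct words so a fixed seed gives a reproducible sample.
    words = sorted(set(re.findall(r"[^\W\d_]+", text.lower())))
    rng = random.Random(seed)
    return rng.sample(words, min(n, len(words)))
```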
Default Corpora
The default corpora used are listed below:
- English: Brown Corpus
- Spanish: Wortschatz Leipzig. spa_news_2023_300K-words
- French: Wortschatz Leipzig. fra_news_2023_300K-words
To use your own corpus, either as an alternative to the ones above or to support a new language, see "Custom Corpus" above.
Contributing
Any contributions you make are greatly appreciated.
If you have a suggestion to improve this, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement." Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Changelog
1.1.0
Update README.md and links.
1.0.0
Initial infolingo PyPI submission. This version supports the cross-entropy, kl-divergence, frequent, and random vocabulary-picking functions and includes a Streamlit demo for testing.
License