Convert vocabulary exports from Pod101, Language Reactor, and more to Readlang CSV format
words-to-readlang
A Python library and CLI tool for converting vocabulary exports from language learning platforms (Pod101, Language Reactor, etc.) into the CSV format accepted by Readlang's word import.
It also comes with a web interface for uploading, previewing, editing, and exporting vocabulary, with automatic example sentence fetching from Tatoeba and the Leipzig Corpora Collection.
Background
Readlang is a reading-focused vocabulary tool with spaced-repetition flashcards. It supports importing words via CSV, but the format it expects is specific, and the exports from other learning platforms don't match it out of the box. This library bridges that gap.
Installation
pip install words-to-readlang
Or install from source:
git clone https://codeberg.org/psy-q/words-to-readlang
cd words-to-readlang
pip install -e .
To use the web interface, install with the web extras:
pip install -e ".[web]"
Supported Input Formats
Pod101 (FinnishPod101, SpanishPod101, etc.)
Pod101 language learning sites let Premium subscribers save words to a Word Bank, which can be exported as a CSV file. The export is a simple two-column file (Word, English) encoded in UTF-16. The converter detects this encoding automatically.
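To illustrate what reading such an export involves, here is a minimal sketch using only the standard library. It is not the library's own parser: the function name `read_word_bank`, the comma delimiter, and the presence of a `Word,English` header row are assumptions made for this example.

```python
import csv
from pathlib import Path


def read_word_bank(path: Path) -> list[tuple[str, str]]:
    """Read a two-column (Word, English) CSV saved as UTF-16."""
    # The "utf-16" codec consumes the byte-order mark and picks the endianness.
    with path.open(encoding="utf-16", newline="") as handle:
        rows = [row for row in csv.reader(handle) if len(row) >= 2]
    # Assumption: the first row is a "Word,English" header; skip it if present.
    if rows and [cell.strip().lower() for cell in rows[0][:2]] == ["word", "english"]:
        rows = rows[1:]
    return [(word.strip(), english.strip()) for word, english, *_ in rows]
```

Opening the same file with the default UTF-8 codec would typically fail with a decode error, which is why the encoding matters here.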
Language Reactor
Language Reactor is a browser extension for learning languages through Netflix and YouTube. Pro subscribers can save words and export them as a tab-separated file from the Saved Items panel. This format includes the base/dictionary form, translations, and subtitle sentences.
CLI Usage
Commands
words-to-readlang convert        Convert a vocabulary file to Readlang CSV
words-to-readlang serve          Start the web interface
words-to-readlang list-formats   List all available input formats
Basic Conversion
# From Pod101
words-to-readlang convert input.csv output.csv --format pod101
# From Language Reactor
words-to-readlang convert saved.csv output.csv --format languagereactor
# Short alias
words-to-readlang convert saved.csv output.csv --format lr
If your input has more than 200 words (Readlang's import limit), the output is automatically split into multiple files: output (part 1).csv, output (part 2).csv, etc.
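The splitting rule can be sketched as follows. This is an illustration of the naming scheme described above, not the library's implementation; `split_into_parts` and `READLANG_IMPORT_LIMIT` are names invented for this example.

```python
from pathlib import Path

READLANG_IMPORT_LIMIT = 200  # words per file, as stated above


def split_into_parts(rows: list, output_path: Path, limit: int = READLANG_IMPORT_LIMIT):
    """Return (path, rows) pairs, one per output file."""
    if len(rows) <= limit:
        # Small inputs keep the requested file name unchanged.
        return [(output_path, rows)]
    chunks = [rows[i:i + limit] for i in range(0, len(rows), limit)]
    return [
        (output_path.with_name(f"{output_path.stem} (part {n}){output_path.suffix}"), chunk)
        for n, chunk in enumerate(chunks, start=1)
    ]
```

With 450 rows and `Path("output.csv")`, this yields three parts holding 200, 200, and 50 rows.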
Example Sentence Lookup
Language Reactor saves words in the inflected form found in the subtitles. Readlang requires the example sentence to contain the exact word, so when the saved sentence does not contain it, the mismatch is detected and the context is cleared.
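A check of this kind might look like the sketch below. The function name `keep_context_if_exact` is hypothetical, and the matching rule (case-insensitive, whole word) is an assumption; the library's actual rule may differ.

```python
import re


def keep_context_if_exact(word: str, sentence: str) -> str:
    """Return the sentence if it contains the word as a whole word, else ""."""
    # Assumption: a match must not be embedded in a longer word, so
    # "talo" does not count as present in "talossa".
    pattern = rf"(?<!\w){re.escape(word)}(?!\w)"
    return sentence if re.search(pattern, sentence, flags=re.IGNORECASE) else ""
```

For the Finnish word "katu", the sentence "Kävelen kadulla." contains only an inflected form, so the context would be cleared, while "Tämä katu on pitkä." would be kept.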
The --fetch flag fetches replacement example sentences automatically. It tries Tatoeba first, then falls back to the Leipzig Corpora Collection if Tatoeba has no match.
words-to-readlang convert input.csv output.csv --format languagereactor --fetch
This is also useful for Pod101 exports, which never include example sentences.
By default the source language is Finnish (--lang fin). Pass the ISO 639-3 code for other languages:
words-to-readlang convert words.csv output.csv --format pod101 --fetch --lang swe
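The "try one source, then fall back to the next" behaviour of --fetch can be pictured with a small sketch. Nothing here calls Tatoeba or Leipzig: `first_example`, `fake_tatoeba`, and `fake_leipzig` are hypothetical stand-ins backed by dictionaries, and the library's real fetcher may be structured differently.

```python
from typing import Callable, Iterable, Optional

# A source takes (word, lang) and returns a sentence, or None when it has no match.
SentenceSource = Callable[[str, str], Optional[str]]


def first_example(word: str, lang: str, sources: Iterable[SentenceSource]) -> Optional[str]:
    """Ask each source in order and return the first sentence found."""
    for source in sources:
        sentence = source(word, lang)
        if sentence:
            return sentence
    return None


# Stand-in sources backed by dictionaries instead of network calls.
def fake_tatoeba(word: str, lang: str) -> Optional[str]:
    return {("kissa", "fin"): "Kissa nukkuu sohvalla."}.get((word, lang))


def fake_leipzig(word: str, lang: str) -> Optional[str]:
    return {("koira", "fin"): "Koira juoksee pihalla."}.get((word, lang))


# The first source has no entry for "koira", so the second one answers.
print(first_example("koira", "fin", [fake_tatoeba, fake_leipzig]))
```

Keeping the sources as an ordered list makes the priority explicit and lets a further fallback be appended without touching the loop.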
convert options
| Option | Short | Default | Description |
|---|---|---|---|
| --format | -f | (required) | Input format: pod101, languagereactor, or lr |
| --fetch | | off | Fetch missing examples from Tatoeba / Leipzig |
| --lang | | fin | ISO 639-3 source language code |
| --version | | | Print version and exit |
List Available Formats
words-to-readlang list-formats
Web Interface
# Development server (localhost only)
words-to-readlang serve
# Custom host/port
words-to-readlang serve --port 8080
# Bind to all interfaces with auto-reload
words-to-readlang serve --host 0.0.0.0 --port 8080 --debug
serve options
| Option | Short | Default | Description |
|---|---|---|---|
| --host | -h | 127.0.0.1 | Address to bind to |
| --port | -p | 5000 | Port to listen on |
| --debug | | off | Enable Flask debug mode with auto-reload |
| --metrics-port | | 0 (off) | Expose Prometheus metrics on this port |
| --metrics-host | | 127.0.0.1 | Address for the metrics endpoint |
See README_WEB.md for full web interface and Docker deployment documentation.
Programmatic API
from pathlib import Path
from words_to_readlang import Converter

# Basic usage
converter = Converter()
output_files = converter.convert(
    input_path=Path("words.csv"),
    output_path=Path("readlang.csv"),
    parser_name="pod101"
)

# With example sentence lookup
converter = Converter(fetch_examples=True, src_lang="fin")
output_files = converter.convert(
    input_path=Path("saved_items.csv"),
    output_path=Path("readlang.csv"),
    parser_name="languagereactor"
)
print(f"Created {len(output_files)} file(s)")
Advanced: Custom Processing
from pathlib import Path
from words_to_readlang import get_parser, ReadlangWriter, ExampleFetcher
# Parse input
parser = get_parser("languagereactor")
entries = parser.parse(Path("saved.csv"))
# Filter or modify entries
entries = [e for e in entries if len(e.word) > 3]
# Write output with example lookup
fetcher = ExampleFetcher(delay=1.0, verbose=True)
writer = ReadlangWriter(example_fetcher=fetcher, src_lang="fin")
output_files = writer.write(entries, Path("output.csv"), fetch_examples=True)
Adding Custom Parsers
from pathlib import Path
from typing import List
from words_to_readlang import Entry, register_parser

@register_parser("myformat")
class MyFormatParser:
    @property
    def name(self) -> str:
        return "myformat"

    @property
    def description(self) -> str:
        return "My custom vocabulary format"

    def parse(self, file_path: Path) -> List[Entry]:
        entries = []
        # ... parse file ...
        return entries
words-to-readlang convert input.txt output.csv --format myformat
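For readers unfamiliar with decorator-based registration, the following stand-alone sketch shows the general pattern that lets a format name resolve to a parser class. It does not reproduce the library's internals: `_REGISTRY`, `register`, `lookup`, and `DemoParser` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Type

_REGISTRY: Dict[str, Type] = {}  # hypothetical module-level registry


def register(name: str) -> Callable[[Type], Type]:
    """Class decorator that files the class under a format name."""
    def decorator(cls: Type) -> Type:
        _REGISTRY[name] = cls
        return cls  # return the class unchanged so it stays usable directly
    return decorator


def lookup(name: str):
    """Instantiate the parser registered under the given name."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown format: {name!r}") from None


@register("demo")
class DemoParser:
    def parse(self, text: str) -> list[str]:
        return text.split()
```

In this pattern, registration happens when the module defining the class is imported, so a name can only be looked up after that import has run.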
Readlang CSV Format Reference
| Column | Content | Notes |
|---|---|---|
| 1 | Word or phrase | / separates synonyms |
| 2 | Translation | / separates alternatives |
| 3 | Context sentence (optional) | Must contain the exact word from column 1 |
| 4 | Practice interval in days (optional) | |
| 5 | Next practice date (optional) | YYYY-MM-DD |
- No header row
- No newlines within cells
- Maximum 200 words per file
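As a rough sketch of producing rows that satisfy these rules, the standard csv module can be used as below. The function name `to_readlang_csv` is invented for this example, and the comma delimiter with default quoting is an assumption about what Readlang accepts. Only the first three columns are written.

```python
import csv
import io


def to_readlang_csv(rows: list[tuple[str, str, str]]) -> str:
    """Serialise (word, translation, context) rows without a header row."""
    if len(rows) > 200:
        raise ValueError("Readlang accepts at most 200 words per file")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for word, translation, context in rows:
        # Cells may not contain newlines, so collapse all whitespace runs.
        cells = [" ".join(cell.split()) for cell in (word, translation, context)]
        writer.writerow(cells)
    return buffer.getvalue()
```

An empty string in the third position leaves the context column blank, which matches the "optional" note in the table.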
Development
git clone https://codeberg.org/psy-q/words-to-readlang
cd words-to-readlang
pip install -e ".[dev]"
# Run tests
pytest
# Type check
ty check
# Format / lint
black src tests
ruff check src tests
License
MIT