
Convert vocabulary exports from Pod101, Language Reactor, and more to Readlang CSV format


words-to-readlang

A Python library and CLI tool for converting vocabulary exports from language learning platforms (Pod101, Language Reactor, etc.) into the CSV format accepted by Readlang's word import.

It also comes with a web interface for uploading, previewing, editing, and exporting vocabulary — with automatic example sentence fetching from Tatoeba and the Leipzig Corpora Collection.

Background

Readlang is a reading-focused vocabulary tool with spaced-repetition flashcards. It supports importing words via CSV, but the format it expects is specific, and the exports from other learning platforms don't match it out of the box. This library bridges that gap.

Installation

pip install words-to-readlang

Or install from source:

git clone https://codeberg.org/psy-q/words-to-readlang
cd words-to-readlang
pip install -e .

To use the web interface, install with the web extras:

pip install -e ".[web]"

Supported Input Formats

Pod101 (FinnishPod101, SpanishPod101, etc.)

Pod101 language learning sites let Premium subscribers save words to a Word Bank, which can be exported as a CSV file. The export is a simple two-column file (Word, English) encoded in UTF-16; the encoding is detected automatically.
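As a rough sketch of what handling such an export involves (the library's real parser is more thorough), UTF-16 can be recognized by its byte-order mark before the rows are read:

```python
import csv
import io

def parse_two_column_csv(data: bytes) -> list[tuple[str, str]]:
    """Decode a two-column export, detecting UTF-16 by its byte-order mark."""
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        text = data.decode("utf-16")  # the BOM tells the codec the byte order
    else:
        text = data.decode("utf-8")
    reader = csv.reader(io.StringIO(text))
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

sample = "sana,word\nkissa,cat\n".encode("utf-16")
print(parse_two_column_csv(sample))  # [('sana', 'word'), ('kissa', 'cat')]
```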

Language Reactor

Language Reactor is a browser extension for learning languages through Netflix and YouTube. Pro subscribers can save words and export them as a tab-separated file from the Saved Items panel. This format includes the base/dictionary form, translations, and subtitle sentences.
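A tab-separated export like this can be read with Python's csv module and a tab delimiter. The column positions below are illustrative only; the real Language Reactor export has more columns and may order them differently:

```python
import csv
import io

def parse_tsv(text: str, word_col=0, translation_col=1, sentence_col=2):
    """Read a tab-separated export into (word, translation, sentence) tuples.

    Column positions are hypothetical, not the real Language Reactor layout.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        if len(row) > max(word_col, translation_col, sentence_col):
            rows.append((row[word_col], row[translation_col], row[sentence_col]))
    return rows

sample = "talo\thouse\tTalossa on sauna.\n"
print(parse_tsv(sample))  # [('talo', 'house', 'Talossa on sauna.')]
```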

CLI Usage

Commands

words-to-readlang convert      Convert a vocabulary file to Readlang CSV
words-to-readlang serve        Start the web interface
words-to-readlang list-formats List all available input formats

Basic Conversion

# From Pod101
words-to-readlang convert input.csv output.csv --format pod101

# From Language Reactor
words-to-readlang convert saved.csv output.csv --format languagereactor

# Short alias
words-to-readlang convert saved.csv output.csv --format lr

If your input has more than 200 words (Readlang's import limit), the output is automatically split into multiple files: output (part 1).csv, output (part 2).csv, etc.

Example Sentence Lookup

Language Reactor saves words in the inflected form in which they appeared in the subtitles. Readlang, however, requires the example sentence to contain the exact word being imported, so when the saved form doesn't appear in the sentence, the mismatch is detected and the context is cleared.
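The check amounts to a whole-word containment test; this is a simplified sketch, and the library's actual matching rule may differ:

```python
import re

def keep_context(word: str, sentence: str) -> str:
    """Return the sentence only if it contains `word` as a whole word."""
    if re.search(rf"\b{re.escape(word)}\b", sentence):
        return sentence
    return ""  # only an inflected form appears; Readlang would reject this

print(keep_context("talo", "Iso talo paloi."))    # Iso talo paloi.
print(keep_context("talo", "isossa talossa"))     # (empty: "talossa" != "talo")
```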

The --fetch flag fetches replacement example sentences automatically. It tries Tatoeba first, then falls back to the Leipzig Corpora Collection if Tatoeba has no match.

words-to-readlang convert input.csv output.csv --format languagereactor --fetch

This is also useful for Pod101 exports, which never include example sentences.
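The fallback behaviour is a simple first-hit-wins chain over sentence sources. The two stub functions here are hypothetical stand-ins for the real Tatoeba and Leipzig lookups, which query web APIs:

```python
def fetch_example(word, lang, sources):
    """Try each sentence source in order; return the first non-empty hit."""
    for source in sources:
        sentence = source(word, lang)
        if sentence:
            return sentence
    return ""

# Hypothetical stand-ins for the real Tatoeba and Leipzig lookups.
def tatoeba(word, lang):
    return {"kissa": "Kissa nukkuu sohvalla."}.get(word, "")

def leipzig(word, lang):
    return f"Esimerkki sanalle {word}."

print(fetch_example("kissa", "fin", [tatoeba, leipzig]))  # Tatoeba hit
print(fetch_example("talo", "fin", [tatoeba, leipzig]))   # Leipzig fallback
```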

By default the source language is Finnish (--lang fin). Pass the ISO 639-3 code for other languages:

words-to-readlang convert words.csv output.csv --format pod101 --fetch --lang swe

convert options

Option     Short  Default     Description
--format   -f     (required)  Input format: pod101, languagereactor, or lr
--fetch           off         Fetch missing examples from Tatoeba / Leipzig
--lang            fin         ISO 639-3 source language code
--version                     Print version and exit

List Available Formats

words-to-readlang list-formats

Web Interface

# Development server (localhost only)
words-to-readlang serve

# Custom host/port
words-to-readlang serve --port 8080

# Bind to all interfaces with auto-reload
words-to-readlang serve --host 0.0.0.0 --port 8080 --debug

serve options

Option   Short  Default    Description
--host   -h     127.0.0.1  Address to bind to
--port   -p     5000       Port to listen on
--debug         off        Enable Flask debug mode with auto-reload

See README_WEB.md for full web interface and Docker deployment documentation.

Programmatic API

from pathlib import Path
from words_to_readlang import Converter

# Basic usage
converter = Converter()
output_files = converter.convert(
    input_path=Path("words.csv"),
    output_path=Path("readlang.csv"),
    parser_name="pod101"
)

# With example sentence lookup
converter = Converter(use_tatoeba=True, src_lang="fin")
output_files = converter.convert(
    input_path=Path("saved_items.csv"),
    output_path=Path("readlang.csv"),
    parser_name="languagereactor"
)

print(f"Created {len(output_files)} file(s)")

Advanced: Custom Processing

from pathlib import Path
from words_to_readlang import get_parser, ReadlangWriter, ExampleFetcher

# Parse input
parser = get_parser("languagereactor")
entries = parser.parse(Path("saved.csv"))

# Filter or modify entries
entries = [e for e in entries if len(e.word) > 3]

# Write output with example lookup
fetcher = ExampleFetcher(delay=1.0, verbose=True)
writer = ReadlangWriter(example_fetcher=fetcher, src_lang="fin")
output_files = writer.write(entries, Path("output.csv"), use_tatoeba=True)

Adding Custom Parsers

from pathlib import Path
from typing import List
from words_to_readlang import Entry, register_parser

@register_parser("myformat")
class MyFormatParser:
    @property
    def name(self) -> str:
        return "myformat"

    @property
    def description(self) -> str:
        return "My custom vocabulary format"

    def parse(self, file_path: Path) -> List[Entry]:
        entries = []
        # ... parse file ...
        return entries

The registered format is then available on the CLI:

words-to-readlang convert input.txt output.csv --format myformat

Readlang CSV Format Reference

Column  Content                               Notes
1       Word or phrase                        / separates synonyms
2       Translation                           / separates alternatives
3       Context sentence (optional)           Must contain the exact word from column 1
4       Practice interval in days (optional)
5       Next practice date (optional)         YYYY-MM-DD
  • No header row
  • No newlines within cells
  • Maximum 200 words per file
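The rules above can be sketched as a small writer that splits at the 200-word limit. This is a simplification under the assumptions stated in the table (three columns, part-numbered output files); the library's own ReadlangWriter handles all five columns and more edge cases:

```python
import csv
from pathlib import Path

def write_readlang_csv(entries, output_path: Path, max_words: int = 200):
    """Write (word, translation, context) rows, splitting every 200 words."""
    chunks = [entries[i:i + max_words] for i in range(0, len(entries), max_words)]
    paths = []
    for n, chunk in enumerate(chunks, start=1):
        path = output_path if len(chunks) == 1 else output_path.with_name(
            f"{output_path.stem} (part {n}){output_path.suffix}"
        )
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for word, translation, context in chunk:
                # No header row; newlines are not allowed inside cells
                writer.writerow([word, translation, context.replace("\n", " ")])
        paths.append(path)
    return paths
```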

Development

git clone https://codeberg.org/psy-q/words-to-readlang
cd words-to-readlang
pip install -e ".[dev]"

# Run tests
pytest

# Type check
ty check

# Format / lint
black src tests
ruff check src tests

License

MIT

