
words-to-readlang

A Python library and CLI tool for converting vocabulary exports from language learning platforms (Pod101, Language Reactor, etc.) into the CSV format accepted by Readlang's word import.

It also comes with a web interface for uploading, previewing, editing, and exporting vocabulary — with automatic example sentence fetching from Tatoeba and the Leipzig Corpora Collection.

Background

Readlang is a reading-focused vocabulary tool with spaced-repetition flashcards. It supports importing words via CSV, but the format it expects is specific, and the exports from other learning platforms don't match it out of the box. This library bridges that gap.

Installation

pip install words-to-readlang

Or install from source:

git clone https://codeberg.org/psy-q/words-to-readlang
cd words-to-readlang
pip install -e .

To use the web interface, install with the web extras:

pip install -e ".[web]"

Supported Input Formats

Pod101 (FinnishPod101, SpanishPod101, etc.)

Pod101 language learning sites let Premium subscribers save words to a Word Bank, which can be exported as a CSV file. The export is a simple two-column file (Word, English) encoded in UTF-16; the encoding is detected automatically.
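The encoding detection can be sketched as follows (a minimal illustration under assumed column names, not the library's actual implementation):

```python
import csv
from pathlib import Path

def read_pod101(path: Path) -> list[tuple[str, str]]:
    """Read a two-column Pod101 Word Bank export, detecting UTF-16 by its BOM."""
    raw = path.read_bytes()
    # UTF-16 files start with a byte-order mark: FF FE (little-endian) or FE FF (big-endian)
    encoding = "utf-16" if raw[:2] in (b"\xff\xfe", b"\xfe\xff") else "utf-8"
    with path.open(newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the "Word,English" header row
        return [(row[0], row[1]) for row in reader if len(row) >= 2]
```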

Language Reactor

Language Reactor is a browser extension for learning languages through Netflix and YouTube. Pro subscribers can save words and export them as a tab-separated file from the Saved Items panel. This format includes the base/dictionary form, translations, and subtitle sentences.

LingQ

LingQ is a language learning platform focused on extensive reading and listening. Users can export their saved vocabulary (LingQs) as a CSV file from the vocabulary section. The export contains the term, an example phrase, the translation language, and the translation.

CLI Usage

Commands

words-to-readlang convert      Convert a vocabulary file to Readlang CSV
words-to-readlang serve        Start the web interface
words-to-readlang list-formats List all available input formats

Basic Conversion

# From Pod101
words-to-readlang convert input.csv output.csv --format pod101

# From Language Reactor
words-to-readlang convert saved.csv output.csv --format languagereactor

# Short alias
words-to-readlang convert saved.csv output.csv --format lr

# From LingQ
words-to-readlang convert lingqs.csv output.csv --format lingq

If your input has more than 200 words (Readlang's import limit), the output is automatically split into multiple files: output (part 1).csv, output (part 2).csv, etc.
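The splitting behaviour can be sketched like this (illustrative only; the constant and the "(part N)" naming mirror the description above, not the library's internals):

```python
from pathlib import Path

MAX_WORDS = 200  # Readlang's per-file import limit

def split_output(rows: list[str], output_path: Path) -> list[Path]:
    """Write CSV rows to output_path, splitting into '(part N)' files over the limit."""
    if len(rows) <= MAX_WORDS:
        output_path.write_text("\n".join(rows) + "\n")
        return [output_path]
    paths = []
    for i in range(0, len(rows), MAX_WORDS):
        part = i // MAX_WORDS + 1
        # e.g. "output.csv" -> "output (part 1).csv"
        p = output_path.with_name(f"{output_path.stem} (part {part}){output_path.suffix}")
        p.write_text("\n".join(rows[i:i + MAX_WORDS]) + "\n")
        paths.append(p)
    return paths
```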

Example Sentence Lookup

Language Reactor saves words in the inflected form found in the subtitles. Readlang requires the example sentence to contain the exact word, so when the saved word doesn't appear verbatim in its sentence, the mismatch is detected and the context is cleared.
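The mismatch check can be illustrated with a whole-word match (a simplified sketch; the library's actual matching rules may differ, e.g. around punctuation or case):

```python
import re

def clean_context(word: str, context: str) -> str:
    """Keep the context only if it contains the exact word as a whole token."""
    pattern = rf"(?<!\w){re.escape(word)}(?!\w)"
    return context if re.search(pattern, context, re.IGNORECASE) else ""
```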

The --fetch flag fetches replacement example sentences automatically. It tries Tatoeba first, then falls back to the Leipzig Corpora Collection if Tatoeba has no match.

words-to-readlang convert input.csv output.csv --format languagereactor --fetch

This is also useful for Pod101 exports, which never include example sentences.
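The Tatoeba-then-Leipzig fallback order amounts to a simple source chain, sketched here with hypothetical placeholder functions (not the library's API):

```python
from typing import Callable, Optional

Source = Callable[[str, str], Optional[str]]

def fetch_example(word: str, lang: str, sources: list[Source]) -> Optional[str]:
    """Try each source in order; return the first example sentence found."""
    for source in sources:
        sentence = source(word, lang)
        if sentence:
            return sentence
    return None

# Usage with hypothetical source functions, tried in order:
# fetch_example("talo", "fin", [tatoeba_search, leipzig_search])
```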

By default the source language is Finnish (--lang fin). Pass the ISO 639-3 code for other languages:

words-to-readlang convert words.csv output.csv --format pod101 --fetch --lang swe

convert options

Option     Short  Default     Description
--format   -f     (required)  Input format: pod101, languagereactor, lr, or lingq
--fetch           off         Fetch missing examples from Tatoeba / Leipzig
--lang            fin         ISO 639-3 source language code
--version                     Print version and exit

List Available Formats

words-to-readlang list-formats

Web Interface

# Development server (localhost only)
words-to-readlang serve

# Custom host/port
words-to-readlang serve --port 8080

# Bind to all interfaces with auto-reload
words-to-readlang serve --host 0.0.0.0 --port 8080 --debug

serve options:

Option          Short  Default    Description
--host          -h     127.0.0.1  Address to bind to
--port          -p     5000       Port to listen on
--debug                off        Enable Flask debug mode with auto-reload
--metrics-port         0 (off)    Expose Prometheus metrics on this port
--metrics-host         127.0.0.1  Address for the metrics endpoint

See README_WEB.md for full web interface and Docker deployment documentation.

Programmatic API

from pathlib import Path
from words_to_readlang import Converter

# Basic usage
converter = Converter()
output_files = converter.convert(
    input_path=Path("words.csv"),
    output_path=Path("readlang.csv"),
    parser_name="pod101"
)

# With example sentence lookup
converter = Converter(fetch_examples=True, src_lang="fin")
output_files = converter.convert(
    input_path=Path("saved_items.csv"),
    output_path=Path("readlang.csv"),
    parser_name="languagereactor"
)

print(f"Created {len(output_files)} file(s)")

Advanced: Custom Processing

from pathlib import Path
from words_to_readlang import get_parser, ReadlangWriter, ExampleFetcher

# Parse input
parser = get_parser("languagereactor")
entries = parser.parse(Path("saved.csv"))

# Filter or modify entries
entries = [e for e in entries if len(e.word) > 3]

# Write output with example lookup
fetcher = ExampleFetcher(delay=1.0, verbose=True)
writer = ReadlangWriter(example_fetcher=fetcher, src_lang="fin")
output_files = writer.write(entries, Path("output.csv"), fetch_examples=True)

Adding Custom Parsers

from pathlib import Path
from typing import List
from words_to_readlang import Entry, register_parser

@register_parser("myformat")
class MyFormatParser:
    @property
    def name(self) -> str:
        return "myformat"

    @property
    def description(self) -> str:
        return "My custom vocabulary format"

    def parse(self, file_path: Path) -> List[Entry]:
        entries = []
        # ... parse file ...
        return entries
Once registered, the new format is available to the CLI:

words-to-readlang convert input.txt output.csv --format myformat

Readlang CSV Format Reference

Column  Content                               Notes
1       Word or phrase                        / separates synonyms
2       Translation                           / separates alternatives
3       Context sentence (optional)           Must contain the exact word from column 1
4       Practice interval in days (optional)
5       Next practice date (optional)         YYYY-MM-DD

  • No header row
  • No newlines within cells
  • Maximum 200 words per file
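For example, an import file might look like this (illustrative Finnish entries, not taken from a real export):

```csv
talo / rakennus,house / building,Tuo talo on vanha.
kissa,cat,Pihalla nukkuu kissa.,3,2025-01-15
juosta,to run
```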

Development

git clone https://codeberg.org/psy-q/words-to-readlang
cd words-to-readlang
pip install -e ".[dev]"

# Run tests
pytest

# Type check
ty check

# Format / lint
black src tests
ruff check src tests

License

MIT

