Natural language processing playground

nl_processing

Dutch language processing toolkit powered by LangChain + OpenAI.

Installation

pip install nl_processing

Modules

| Module | Class | Description | Docs |
| --- | --- | --- | --- |
| extract_text_from_image | ImageTextExtractor | Extract Dutch text from images via Vision API | docs |
| extract_words_from_text | WordExtractor | Extract and normalize words from markdown text | docs |
| translate_text | TextTranslator | Translate text (NL → RU) with markdown preservation | docs |
| translate_word | WordTranslator | Batch-translate words (NL → RU) | docs |
| database | DatabaseService | Remote source-of-truth persistence for words, translations, and exercise scores | docs |
| sampling | WordSampler | Weighted word sampling with adversarial distractors | docs |
| database_cache | DatabaseCacheService | Local-first cache for vocabulary practice (planned) | docs |

Each module's docs/ folder contains a product brief, PRD, and architecture doc.

Recommended: database_cache for Interactive Practice

Status: database_cache is currently in the design/documentation phase. The implementation is planned.

database_cache is a local-first cache module that accelerates the vocabulary practice loop. It sits in front of the remote database module and keeps a durable local snapshot of translated words and exercise statistics in SQLite. All reads are served entirely from local storage, with a latency target under 200 ms.

Why Use It

  • Interactive practice sessions — word retrieval and score-aware sampling without network latency.
  • Offline score recording — exercise outcomes are written locally first and synced later.
  • Stale-while-revalidate — a stale cache is served immediately while a background refresh happens.
  • Safe sync — pending local writes use idempotent event IDs, so retries never double-apply scores.
  • Exercise-aware — initialized with specific exercise_types, mirrors the remote per-exercise-type tables.
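
The stale-while-revalidate behaviour above can be sketched as a small decision function. This is an illustration only, not the planned implementation: SnapshotMeta, is_stale, and plan_read are hypothetical names, and the only input taken from this README is the cache_ttl parameter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass
class SnapshotMeta:
    """Hypothetical metadata stored next to the local snapshot."""

    last_refreshed_at: Optional[datetime]  # None means no snapshot exists yet


def is_stale(meta: SnapshotMeta, cache_ttl: timedelta, now: datetime) -> bool:
    """A snapshot counts as stale if it is missing or older than cache_ttl."""
    if meta.last_refreshed_at is None:
        return True
    return now - meta.last_refreshed_at > cache_ttl


def plan_read(meta: SnapshotMeta, cache_ttl: timedelta, now: datetime) -> Tuple[bool, bool]:
    """Return (serve_from_local, schedule_background_refresh).

    A stale snapshot is still served right away; the refresh is only
    scheduled, so the caller never waits on the network.
    """
    has_snapshot = meta.last_refreshed_at is not None
    return has_snapshot, is_stale(meta, cache_ttl, now)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    meta = SnapshotMeta(last_refreshed_at=now - timedelta(minutes=45))
    print(plan_read(meta, timedelta(minutes=30), now))
```

With a 30-minute TTL, a 45-minute-old snapshot is served locally and a refresh is scheduled, while a 5-minute-old snapshot is served with no refresh.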

How It Works

  1. Initialized with user_id, language pair, exercise_types, and cache_ttl.
  2. On init(): opens/creates local SQLite, returns CacheStatus, starts background refresh if stale.
  3. get_words() and get_word_pairs_with_scores() — read from local cache only.
  4. record_exercise_result() — updates local score and appends to a durable outbox.
  5. flush() — replays pending events to remote database using idempotent event IDs.
  6. refresh() — fetches a fresh snapshot from remote, atomically swaps, reapplies pending local events.
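
Steps 4 and 5 can be sketched with the standard-library sqlite3 module. The table layout, the FakeRemote class, and the synchronous function signatures below are assumptions made for illustration; the planned module is async and its real schema is not described here. The sketch shows only the outbox pattern: a score update and its event are committed together, and replaying an event ID twice has no extra effect on the remote side.

```python
import sqlite3
import uuid


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical local tables: current scores and a durable outbox."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS scores (
            word TEXT NOT NULL,
            exercise_type TEXT NOT NULL,
            score INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (word, exercise_type)
        );
        CREATE TABLE IF NOT EXISTS outbox (
            event_id TEXT PRIMARY KEY,
            word TEXT NOT NULL,
            exercise_type TEXT NOT NULL,
            delta INTEGER NOT NULL
        );
        """
    )
    return conn


def record_exercise_result(conn, word: str, exercise_type: str, delta: int) -> str:
    """Update the local score and append the event to the outbox in one transaction."""
    event_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO scores (word, exercise_type, score) VALUES (?, ?, 0)",
            (word, exercise_type),
        )
        conn.execute(
            "UPDATE scores SET score = score + ? WHERE word = ? AND exercise_type = ?",
            (delta, word, exercise_type),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, word, exercise_type, delta) VALUES (?, ?, ?, ?)",
            (event_id, word, exercise_type, delta),
        )
    return event_id


class FakeRemote:
    """Stand-in for the remote database; applies each event_id at most once."""

    def __init__(self) -> None:
        self.applied = set()
        self.scores = {}

    def apply_event(self, event_id: str, word: str, exercise_type: str, delta: int) -> None:
        if event_id in self.applied:
            return  # duplicate delivery: ignore
        self.applied.add(event_id)
        key = (word, exercise_type)
        self.scores[key] = self.scores.get(key, 0) + delta


def flush(conn, remote: FakeRemote) -> int:
    """Replay pending events in insertion order; delete each one after it is sent."""
    rows = conn.execute(
        "SELECT event_id, word, exercise_type, delta FROM outbox ORDER BY rowid"
    ).fetchall()
    for event_id, word, exercise_type, delta in rows:
        remote.apply_event(event_id, word, exercise_type, delta)
        with conn:
            conn.execute("DELETE FROM outbox WHERE event_id = ?", (event_id,))
    return len(rows)
```

If the process stops after apply_event but before the DELETE commits, the event stays in the outbox and is sent again on the next flush; the remote side then ignores it because the event ID was already applied. The snapshot swap in step 6 is not covered by this sketch.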

Planned API

import asyncio
from datetime import timedelta

from nl_processing.core.models import Language, PartOfSpeech, Word
from nl_processing.database_cache.service import DatabaseCacheService


async def main() -> None:
    cache = DatabaseCacheService(
        user_id="alex",
        source_language=Language.NL,
        target_language=Language.RU,
        exercise_types=["nl_to_ru", "multiple_choice"],
        cache_ttl=timedelta(minutes=30),
    )

    # Open or create the local SQLite cache; a background refresh starts if it is stale.
    status = await cache.init()

    # Reads are served from the local cache only.
    pairs = await cache.get_words(word_type=PartOfSpeech.NOUN, limit=10, random=True)
    scored = await cache.get_word_pairs_with_scores()

    # Update the local score and append the event to the durable outbox.
    await cache.record_exercise_result(
        source_word=Word(normalized_form="fiets", word_type=PartOfSpeech.NOUN, language=Language.NL),
        exercise_type="nl_to_ru",
        delta=-1,
    )

    # Replay pending events to the remote database.
    await cache.flush()


asyncio.run(main())

Relationship to Other Modules

  • database — authoritative remote store; database_cache consumes its export/sync APIs.
  • sampling — should use database_cache as its hot-path data source for score-aware sampling.
  • CachedDatabaseService (legacy) — superseded by database_cache; retained for backward compatibility.
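
As a rough illustration of score-aware sampling over cached data, the sketch below draws words so that lower-scored words come up more often. The weighting formula and the sample_words function are assumptions for illustration, not the behaviour of WordSampler, and the input is a plain list of (word, score) tuples rather than the return type of get_word_pairs_with_scores().

```python
import random
from typing import List, Optional, Tuple


def sample_words(
    scored_pairs: List[Tuple[str, int]],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw k words with replacement, favouring words with lower scores.

    Hypothetical weighting: weight = max_score - score + 1, so the
    best-known word keeps weight 1 and weaker words get larger weights.
    """
    if not scored_pairs:
        return []
    rng = rng or random.Random()
    max_score = max(score for _, score in scored_pairs)
    words = [word for word, _ in scored_pairs]
    weights = [max_score - score + 1 for _, score in scored_pairs]
    return rng.choices(words, weights=weights, k=k)


if __name__ == "__main__":
    pairs = [("fiets", -5), ("boom", 0), ("huis", 5)]
    print(sample_words(pairs, k=5, rng=random.Random(0)))
```

Because the input is an in-memory list, a sampler along these lines could run entirely against the local cache without touching the remote database.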

Development

uv sync                # install dependencies
make check             # full lint + test pipeline
uv run pytest tests/unit   # unit tests only (free, no API key)

See docs/ENV_VARS.md for required environment variables and NEON.md for database setup.

Contributing

See docs/REALEASE_WORKFLOW.md for the release process and publishing considerations.
