Natural language processing playground

nl_processing

Dutch language processing toolkit powered by LangChain + OpenAI.

Installation

pip install nl_processing

Modules

| Module | Class | Description | Docs |
|---|---|---|---|
| extract_text_from_image | ImageTextExtractor | Extract Dutch text from images via Vision API | docs |
| extract_words_from_text | WordExtractor | Extract and normalize words from markdown text | docs |
| translate_text | TextTranslator | Translate text (NL → RU) with markdown preservation | docs |
| translate_word | WordTranslator | Batch-translate words (NL → RU) | docs |
| database | DatabaseService | Remote source-of-truth persistence for words, translations, and exercise scores | docs |
| sampling | WordSampler | Weighted word sampling with adversarial distractors | docs |
| database_cache | DatabaseCacheService | Local-first cache for vocabulary practice (planned) | docs |

Each module's docs/ folder contains a product brief, PRD, and architecture doc.

Recommended: database_cache for Interactive Practice

Status: database_cache is currently in the design/documentation phase. The implementation is planned.

database_cache is a local-first cache module designed to accelerate the vocabulary practice loop. It sits in front of the remote database module and keeps a durable local snapshot of translated words and exercise statistics in SQLite. All reads are served entirely from local storage, with a target latency under 200 ms.

Why Use It

  • Interactive practice sessions — word retrieval and score-aware sampling without network latency.
  • Offline score recording — exercise outcomes are written locally first and synced later.
  • Stale-while-revalidate — a stale cache is served immediately while a background refresh happens.
  • Safe sync — pending local writes use idempotent event IDs, so retries never double-apply scores.
  • Exercise-aware — initialized with specific exercise_types, mirrors the remote per-exercise-type tables.
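The stale-while-revalidate behaviour listed above can be illustrated with a small, self-contained sketch. It is not the module's implementation (which is still planned): the class name StaleWhileRevalidate, the fetch callable, and the injectable clock are hypothetical names chosen for illustration only.

```python
import asyncio
import time


class StaleWhileRevalidate:
    """Serve the cached value immediately; refresh in the background when stale.

    Hypothetical illustration, not part of nl_processing.
    """

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # async callable returning a fresh snapshot
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so staleness can be simulated
        self._value = None
        self._fetched_at = None
        self.refresh_task = None     # last background refresh, if any

    def is_stale(self):
        if self._fetched_at is None:
            return True
        return self._clock() - self._fetched_at > self._ttl

    async def _refresh(self):
        self._value = await self._fetch()
        self._fetched_at = self._clock()

    async def get(self):
        if self._fetched_at is None:
            # Cold cache: there is nothing to serve yet, so wait for the first fetch.
            await self._refresh()
        elif self.is_stale() and (self.refresh_task is None or self.refresh_task.done()):
            # Stale cache: schedule a refresh but do not wait for it.
            self.refresh_task = asyncio.create_task(self._refresh())
        return self._value
```

A caller that hits a stale cache gets the old snapshot without waiting; the next call after the background task completes sees the fresh one.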

How It Works

  1. Initialized with user_id, language pair, exercise_types, and cache_ttl.
  2. On init(): opens/creates local SQLite, returns CacheStatus, starts background refresh if stale.
  3. get_words() and get_word_pairs_with_scores() — read from local cache only.
  4. record_exercise_result() — updates local score and appends to a durable outbox.
  5. flush() — replays pending events to remote database using idempotent event IDs.
  6. refresh() — fetches a fresh snapshot from remote, atomically swaps, reapplies pending local events.
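Steps 4 to 6 can be sketched with the standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not the planned implementation: the table layout, the LocalOutboxCache and FakeRemote classes, and the dict-shaped snapshot are all hypothetical, and the sketch is synchronous for brevity whereas the planned API is async.

```python
import sqlite3
import uuid


class FakeRemote:
    """Hypothetical stand-in for the remote database; deduplicates by event ID."""

    def __init__(self):
        self.scores = {}
        self.seen_event_ids = set()

    def apply_event(self, event_id, word, exercise_type, delta):
        if event_id in self.seen_event_ids:
            return  # already applied: a retry must not change the score
        self.seen_event_ids.add(event_id)
        key = (word, exercise_type)
        self.scores[key] = self.scores.get(key, 0) + delta


class LocalOutboxCache:
    """Hypothetical local store: a scores table plus a durable outbox."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS scores (
                word TEXT NOT NULL, exercise_type TEXT NOT NULL,
                score INTEGER NOT NULL, PRIMARY KEY (word, exercise_type));
            CREATE TABLE IF NOT EXISTS outbox (
                event_id TEXT PRIMARY KEY, word TEXT NOT NULL,
                exercise_type TEXT NOT NULL, delta INTEGER NOT NULL);
        """)

    def _apply_local(self, word, exercise_type, delta):
        self.db.execute("INSERT OR IGNORE INTO scores VALUES (?, ?, 0)",
                        (word, exercise_type))
        self.db.execute("UPDATE scores SET score = score + ? "
                        "WHERE word = ? AND exercise_type = ?",
                        (delta, word, exercise_type))

    def local_score(self, word, exercise_type):
        row = self.db.execute("SELECT score FROM scores "
                              "WHERE word = ? AND exercise_type = ?",
                              (word, exercise_type)).fetchone()
        return row[0] if row else 0

    def record_exercise_result(self, word, exercise_type, delta):
        event_id = str(uuid.uuid4())
        with self.db:  # score update and outbox append commit together
            self._apply_local(word, exercise_type, delta)
            self.db.execute("INSERT INTO outbox VALUES (?, ?, ?, ?)",
                            (event_id, word, exercise_type, delta))
        return event_id

    def _pending(self):
        return self.db.execute("SELECT event_id, word, exercise_type, delta "
                               "FROM outbox ORDER BY rowid").fetchall()

    def flush(self, remote):
        rows = self._pending()
        for event_id, word, exercise_type, delta in rows:
            remote.apply_event(event_id, word, exercise_type, delta)
            with self.db:  # drop the event only after the remote accepted it
                self.db.execute("DELETE FROM outbox WHERE event_id = ?", (event_id,))
        return len(rows)

    def refresh(self, snapshot):
        """Swap in a remote snapshot {(word, exercise_type): score} atomically."""
        with self.db:  # one transaction: readers see the old or the new state
            self.db.execute("DELETE FROM scores")
            self.db.executemany("INSERT INTO scores VALUES (?, ?, ?)",
                                [(w, t, s) for (w, t), s in snapshot.items()])
            for _, word, exercise_type, delta in self._pending():
                self._apply_local(word, exercise_type, delta)
```

If a crash happens after the remote accepted an event but before the local outbox row was deleted, the next flush() resends the same event ID and the remote ignores it, which is the property the idempotent IDs are meant to provide.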

Planned API

import asyncio
from datetime import timedelta

from nl_processing.core.models import Language, PartOfSpeech, Word
from nl_processing.database_cache.service import DatabaseCacheService


async def main() -> None:
    cache = DatabaseCacheService(
        user_id="alex",
        source_language=Language.NL,
        target_language=Language.RU,
        exercise_types=["nl_to_ru", "multiple_choice"],
        cache_ttl=timedelta(minutes=30),
    )

    # Open or create the local SQLite store; returns a CacheStatus.
    status = await cache.init()

    # Reads are served from the local cache only.
    pairs = await cache.get_words(word_type=PartOfSpeech.NOUN, limit=10, random=True)
    scored = await cache.get_word_pairs_with_scores()

    # Update the local score and append the event to the durable outbox.
    await cache.record_exercise_result(
        source_word=Word(normalized_form="fiets", word_type=PartOfSpeech.NOUN, language=Language.NL),
        exercise_type="nl_to_ru",
        delta=-1,
    )

    # Replay pending events to the remote database.
    await cache.flush()


asyncio.run(main())

Relationship to Other Modules

  • database — authoritative remote store; database_cache consumes its export/sync APIs.
  • sampling — should use database_cache as its hot-path data source for score-aware sampling.
  • CachedDatabaseService (legacy) — superseded by database_cache; retained for backward compatibility.
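To make "score-aware sampling" concrete, here is one possible weighting, sketched with the standard library only. The README does not specify how WordSampler weights words, so the formula below (lower score means higher weight) and the function names sampling_weights and sample_words are assumptions for illustration.

```python
import random


def sampling_weights(scores, floor=1.0):
    """Map each word to a weight; weaker (lower-scored) words weigh more.

    Assumed formula: weight = floor + (max_score - score), so the
    best-known word still keeps a small non-zero chance.
    """
    max_score = max(scores.values())
    return {word: floor + (max_score - score) for word, score in scores.items()}


def sample_words(scores, k, rng=None):
    """Draw k words with replacement, proportionally to their weights."""
    rng = rng or random.Random()
    weights = sampling_weights(scores)
    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=k)
```

In this sketch the scores dict would come from the local cache, for example from the result of get_word_pairs_with_scores(), so that sampling never waits on the network.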

Development

uv sync                # install dependencies
make check             # full lint + test pipeline
uv run pytest tests/unit   # unit tests only (free, no API key)

See docs/ENV_VARS.md for required environment variables and NEON.md for database setup.

Contributing

See docs/REALEASE_WORKFLOW.md for the release process and publishing considerations.
