Natural language processing playground
nl_processing
Dutch language processing toolkit powered by LangChain + OpenAI.
Installation
pip install nl_processing
Modules
| Module | Class | Description | Docs |
|---|---|---|---|
| extract_text_from_image | ImageTextExtractor | Extract Dutch text from images via Vision API | docs |
| extract_words_from_text | WordExtractor | Extract and normalize words from markdown text | docs |
| translate_text | TextTranslator | Translate text (NL → RU) with markdown preservation | docs |
| translate_word | WordTranslator | Batch-translate words (NL → RU) | docs |
| database | DatabaseService | Remote source-of-truth persistence for words, translations, and exercise scores | docs |
| sampling | WordSampler | Weighted word sampling with adversarial distractors | docs |
| database_cache | DatabaseCacheService | Local-first cache for vocabulary practice (planned) | docs |
Each module's docs/ folder contains a product brief, PRD, and architecture doc.
Recommended: database_cache for Interactive Practice
Status: database_cache is currently in the design/documentation phase. The implementation is planned.
database_cache is a local-first cache module that accelerates the vocabulary practice loop. It sits in front of the remote database module and keeps a durable local snapshot of translated words and exercise statistics using SQLite. All reads are served entirely from local storage (sub-200ms).
Why Use It
- Interactive practice sessions: word retrieval and score-aware sampling without network latency.
- Offline score recording: exercise outcomes are written locally first and synced later.
- Stale-while-revalidate: a stale cache is served immediately while a background refresh happens.
- Safe sync: pending local writes use idempotent event IDs, so retries never double-apply scores.
- Exercise-aware: initialized with specific exercise_types, mirrors the remote per-exercise-type tables.
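The "safe sync" point can be illustrated with a small, self-contained sketch. It is not the module's implementation (which is still planned): `ScoreEvent` and `FakeRemote` are hypothetical names, and the remote is simulated in memory. The idea shown is only that each pending write carries a unique event ID, and the receiving side ignores IDs it has already applied.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScoreEvent:
    """A pending local write (hypothetical shape, not the module's real schema)."""

    source_word: str
    exercise_type: str
    delta: int
    # The ID is generated once, when the event is created, and reused on every retry.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class FakeRemote:
    """In-memory stand-in for the remote store; skips event IDs it has already seen."""

    def __init__(self) -> None:
        self.scores: dict[tuple[str, str], int] = {}
        self.applied_event_ids: set[str] = set()

    def apply(self, event: ScoreEvent) -> bool:
        """Apply the event once; return False if it was a duplicate delivery."""
        if event.event_id in self.applied_event_ids:
            return False
        key = (event.source_word, event.exercise_type)
        self.scores[key] = self.scores.get(key, 0) + event.delta
        self.applied_event_ids.add(event.event_id)
        return True


remote = FakeRemote()
event = ScoreEvent(source_word="fiets", exercise_type="nl_to_ru", delta=-1)
remote.apply(event)
remote.apply(event)  # a retry with the same event_id leaves the score unchanged
```

Because the ID travels with the event rather than being assigned by the receiver, a client that times out and resends cannot tell the difference between "lost request" and "lost response", and does not need to.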
How It Works
- Initialized with user_id, language pair, exercise_types, and cache_ttl.
- On init(): opens or creates the local SQLite database, returns CacheStatus, and starts a background refresh if the cache is stale.
- get_words() and get_word_pairs_with_scores(): read from the local cache only.
- record_exercise_result(): updates the local score and appends to a durable outbox.
- flush(): replays pending events to the remote database using idempotent event IDs.
- refresh(): fetches a fresh snapshot from the remote, atomically swaps it in, and reapplies pending local events.
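The record-then-flush steps above follow what is commonly called an outbox pattern. The sketch below shows one way it could look with the standard-library `sqlite3` module. The table names, column names, and the `open_cache`, `record_result`, and `flush` helpers are assumptions made for illustration; the planned module's actual schema and API may differ.

```python
import sqlite3
import uuid
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local SQLite file with a scores table and an outbox table."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS scores (
            word TEXT NOT NULL,
            exercise_type TEXT NOT NULL,
            score INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (word, exercise_type)
        );
        CREATE TABLE IF NOT EXISTS outbox (
            event_id TEXT PRIMARY KEY,
            word TEXT NOT NULL,
            exercise_type TEXT NOT NULL,
            delta INTEGER NOT NULL
        );
        """
    )
    return conn


def record_result(conn: sqlite3.Connection, word: str, exercise_type: str, delta: int) -> str:
    """Update the local score and queue the event, both in a single transaction."""
    event_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT OR IGNORE INTO scores (word, exercise_type, score) VALUES (?, ?, 0)",
            (word, exercise_type),
        )
        conn.execute(
            "UPDATE scores SET score = score + ? WHERE word = ? AND exercise_type = ?",
            (delta, word, exercise_type),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, word, exercise_type, delta) VALUES (?, ?, ?, ?)",
            (event_id, word, exercise_type, delta),
        )
    return event_id


def flush(conn: sqlite3.Connection, send: Callable[[str, str, str, int], None]) -> int:
    """Replay queued events in insertion order; delete each only after `send` returns."""
    rows = conn.execute(
        "SELECT event_id, word, exercise_type, delta FROM outbox ORDER BY rowid"
    ).fetchall()
    sent = 0
    for event_id, word, exercise_type, delta in rows:
        send(event_id, word, exercise_type, delta)  # if this raises, the event stays queued
        with conn:
            conn.execute("DELETE FROM outbox WHERE event_id = ?", (event_id,))
        sent += 1
    return sent
```

If the process crashes between `send` and the `DELETE`, the event is sent again on the next flush, which is why the remote side needs to treat event IDs idempotently.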
Planned API
from datetime import timedelta

from nl_processing.core.models import Language, PartOfSpeech, Word
from nl_processing.database_cache.service import DatabaseCacheService

cache = DatabaseCacheService(
    user_id="alex",
    source_language=Language.NL,
    target_language=Language.RU,
    exercise_types=["nl_to_ru", "multiple_choice"],
    cache_ttl=timedelta(minutes=30),
)
status = await cache.init()
pairs = await cache.get_words(word_type=PartOfSpeech.NOUN, limit=10, random=True)
scored = await cache.get_word_pairs_with_scores()
await cache.record_exercise_result(
    source_word=Word(normalized_form="fiets", word_type=PartOfSpeech.NOUN, language=Language.NL),
    exercise_type="nl_to_ru",
    delta=-1,
)
await cache.flush()
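To make the role of `cache_ttl` more concrete, here is a toy stale-while-revalidate cache built on `asyncio`. It is a sketch under stated assumptions, not the planned `DatabaseCacheService`: the `TinyCache` class, its in-memory snapshot, and the `fetch_remote` coroutine are all hypothetical, and the real module is described as persisting its snapshot in SQLite.

```python
import asyncio
from datetime import datetime, timedelta, timezone


class TinyCache:
    """Toy stale-while-revalidate cache (hypothetical, in-memory only)."""

    def __init__(self, fetch, ttl: timedelta) -> None:
        self._fetch = fetch  # async callable that returns a fresh snapshot
        self._ttl = ttl
        self._snapshot: list[str] = []
        self._fetched_at: datetime | None = None
        self._refresh_task: asyncio.Task | None = None

    def is_stale(self) -> bool:
        """True if nothing was fetched yet or the snapshot is older than the TTL."""
        if self._fetched_at is None:
            return True
        return datetime.now(timezone.utc) - self._fetched_at > self._ttl

    async def refresh(self) -> None:
        fresh = await self._fetch()
        # Rebinding the attribute replaces the whole snapshot in one step.
        self._snapshot = fresh
        self._fetched_at = datetime.now(timezone.utc)

    def get_words(self) -> list[str]:
        """Return the cached data immediately; start a background refresh if stale."""
        no_refresh_running = self._refresh_task is None or self._refresh_task.done()
        if self.is_stale() and no_refresh_running:
            self._refresh_task = asyncio.create_task(self.refresh())
        return list(self._snapshot)


async def main() -> None:
    async def fetch_remote() -> list[str]:
        await asyncio.sleep(0.01)  # simulated network latency
        return ["fiets", "huis"]

    cache = TinyCache(fetch_remote, ttl=timedelta(minutes=30))
    first = cache.get_words()   # nothing cached yet; a refresh starts in the background
    await asyncio.sleep(0.05)   # let the background task finish
    second = cache.get_words()  # served from the refreshed snapshot
    print(first, second)


asyncio.run(main())
```

The reader never waits on the network: a stale read returns the old snapshot at once, and only a later read sees the refreshed data.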
Relationship to Other Modules
- database: authoritative remote store; database_cache consumes its export/sync APIs.
- sampling: should use database_cache as its hot-path data source for score-aware sampling.
- CachedDatabaseService (legacy): superseded by database_cache; retained for backward compatibility.
Development
uv sync # install dependencies
make check # full lint + test pipeline
uv run pytest tests/unit # unit tests only (free, no API key)
See docs/ENV_VARS.md for required environment variables and NEON.md for database setup.
Contributing
See docs/REALEASE_WORKFLOW.md for the release process and publishing considerations.