
Modal-powered embedding pipeline for Krira chunks with Pinecone upsert.


krira-embed

krira-embed embeds chunked text with a Modal-powered pipeline and upserts the resulting vectors to Pinecone.

Features

  • Batch chunk ingestion from JSONL (text, optional metadata, optional id)
  • Distributed embedding jobs with Modal
  • Pinecone upsert with deterministic ID fallback
  • Simple Python client API: KriraEmbedding
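
The "deterministic ID fallback" can be pictured as follows. This is only a minimal sketch of the general idea, assuming the ID is derived from a hash of the chunk text; the helper names `fallback_id` and `resolve_id` are hypothetical, and the scheme krira-embed actually uses is not documented here and may differ.

```python
import hashlib


def fallback_id(text: str, namespace: str = "default") -> str:
    """Hypothetical helper: derive a stable ID from the namespace and chunk text."""
    payload = f"{namespace}\x00{text}".encode("utf-8")
    # The same input always hashes to the same hex digest, so re-running
    # ingestion overwrites existing vectors instead of duplicating them.
    return hashlib.sha256(payload).hexdigest()[:32]


def resolve_id(chunk: dict, namespace: str = "default") -> str:
    """Use the chunk's own id when present, otherwise fall back to the hash."""
    explicit = chunk.get("id")
    if explicit:
        return str(explicit)
    return fallback_id(chunk["text"], namespace)
```

With an ID scheme like this, upserting the same chunk twice targets the same vector ID, which is what makes repeated runs idempotent.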

Requirements

  • Python >=3.10,<3.13
  • Modal account and token (MODAL_TOKEN_ID, MODAL_TOKEN_SECRET)
  • Pinecone API key (PINECONE_API_KEY)
  • Chunked JSONL file (for example, chunks.jsonl)
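
A chunk file of the kind listed above could be produced like this. The sketch assumes one JSON object per line with the keys `text`, `metadata`, and `id` as named in the feature list; the exact schema krira-embed validates against is an assumption, and `write_chunks` and `read_chunks` are hypothetical helpers, not part of the package.

```python
import json


def write_chunks(path: str, records: list[dict]) -> int:
    """Write one JSON object per line; every record needs non-empty text."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            text = record.get("text")
            if not isinstance(text, str) or not text.strip():
                raise ValueError(f"record {count} has no usable 'text' field")
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_chunks(path: str) -> list[dict]:
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    chunks = [
        # Assumed shape: id and metadata are optional, text is required.
        {"id": "doc1-0", "text": "First chunk.", "metadata": {"source": "doc1.md"}},
        {"text": "Second chunk without id or metadata."},
    ]
    print(write_chunks("chunks.jsonl", chunks))
```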

Installation

pip install krira-embed

Quickstart

from krira_embed import KriraEmbedding

client = KriraEmbedding(
    chunk_file_path="chunks.jsonl",
    pinecone_api_key="YOUR_PINECONE_API_KEY",
    pinecone_index_name="YOUR_INDEX_NAME",
    namespace="default",
)

result = client.embed(
    worker_batch_size=12000,
    parallel_jobs=6,
    model_batch_size=768,
    upsert_batch_size=200,
)

print(result)

Credentials model

  • End users provide Pinecone credentials explicitly in code (pinecone_api_key, pinecone_index_name).
  • The package does not read Pinecone credentials from a local .env file.
  • Modal credentials can still be supplied via environment variables (MODAL_TOKEN_ID, MODAL_TOKEN_SECRET) or existing Modal auth setup.
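
Since Pinecone credentials must be passed explicitly, one option is to read them from the environment yourself and hand them to the client. The following is a minimal sketch under that assumption; `missing_modal_vars` and `pinecone_kwargs` are hypothetical helpers, and only the variable names and the keyword arguments `pinecone_api_key` and `pinecone_index_name` come from this README.

```python
import os
from collections.abc import Mapping

MODAL_VARS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_modal_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """List Modal token variables that are unset or empty.

    A non-empty result is not necessarily an error: Modal may already be
    authenticated through its own existing auth setup.
    """
    return [name for name in MODAL_VARS if not env.get(name)]


def pinecone_kwargs(
    env: Mapping[str, str] = os.environ,
    index_name: str = "YOUR_INDEX_NAME",
) -> dict[str, str]:
    """Build the explicit Pinecone keyword arguments for the client."""
    api_key = env.get("PINECONE_API_KEY")
    if not api_key:
        raise RuntimeError("PINECONE_API_KEY is not set")
    return {"pinecone_api_key": api_key, "pinecone_index_name": index_name}
```

The result can then be unpacked into the constructor, for example `KriraEmbedding(chunk_file_path="chunks.jsonl", namespace="default", **pinecone_kwargs())`.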

Modal CLI usage

When using the Modal entrypoint directly, pass the Pinecone index name and API key explicitly:

modal run main.py --index-name YOUR_INDEX_NAME --pinecone-api-key YOUR_PINECONE_API_KEY

Use .env only for your local Modal tokens if needed (see .env.example).
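
If you prefer to launch the entrypoint from a script, the same command can be assembled as an argument list. This is a sketch only: `modal_run_argv` is a hypothetical helper, and the entrypoint name and flags are copied from the command shown above rather than from any further documentation.

```python
import subprocess


def modal_run_argv(index_name: str, api_key: str, entrypoint: str = "main.py") -> list[str]:
    """Build the argv for the `modal run` invocation shown in this README."""
    if not index_name or not api_key:
        raise ValueError("both index_name and api_key are required")
    return [
        "modal", "run", entrypoint,
        "--index-name", index_name,
        "--pinecone-api-key", api_key,
    ]


def run_pipeline(index_name: str, api_key: str) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    # Passing a list (not a shell string) avoids quoting problems in the key.
    subprocess.run(modal_run_argv(index_name, api_key), check=True)
```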

Local validation (maintainers)

python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*

License

MIT License. See LICENSE.
