
Python SDK for creating verse-based content sites with AI translations, multimedia (images, audio), semantic search, RAG-grounded Puranic context, and deployment

Project description

Sanatan Verse SDK - Python SDK for Spiritual Verse Collections

Complete toolkit for generating rich multimedia content for spiritual text collections (Hanuman Chalisa, Sundar Kaand, etc.)

Features

  • 🔄 Complete Workflow: Generate media and embeddings from canonical sources - all in one command
  • 📖 Canonical Sources: Local YAML files ensure text accuracy and quality
  • 🎨 AI Images: Generate themed images with DALL-E 3
  • 🎵 Audio Pronunciation: Full and slow-speed audio with ElevenLabs
  • 🔍 Semantic Search: Vector embeddings for intelligent verse discovery
  • 📚 Multi-Collection: Organized support for multiple verse collections
  • 🎨 Theme System: Customizable visual styles (modern, traditional, kids-friendly, etc.)

Quick Start

New Project Setup (Recommended)

# 1. Install
pip install sanatan-verse-sdk

# 2. Create project with collection templates
verse-init --project-name my-verse-project --collection hanuman-chalisa
cd my-verse-project

# 3. Configure API keys
cp .env.example .env
# Edit .env and add your API keys from:
# - OpenAI: https://platform.openai.com/api-keys
# - ElevenLabs: https://elevenlabs.io/app/settings/api-keys

# 4. Add canonical Devanagari text
# Edit data/verses/hanuman-chalisa.yaml with actual verse text

# 5. Validate setup
verse-validate

# 6. Generate multimedia content
verse-generate --collection hanuman-chalisa --verse 1

What you get: Verse file, AI-generated image, audio (full + slow speed), and search embeddings!

Existing Project

# Validate and fix structure
verse-validate --fix

# Generate content
verse-generate --collection hanuman-chalisa --verse 15

# Check status
verse-status --collection hanuman-chalisa

Advanced Usage

# Multiple collections at once
verse-init --collection hanuman-chalisa --collection sundar-kaand

# Custom number of sample verses
verse-init --collection my-collection --num-verses 10

# Generate specific components only
verse-generate --collection sundar-kaand --verse 3 --image
verse-generate --collection sundar-kaand --verse 3 --audio

# Skip embeddings update (faster)
verse-generate --collection hanuman-chalisa --verse 15 --no-update-embeddings

What Gets Generated

Each verse generation creates:

  • 🎨 Image: images/{collection}/{theme}/verse-01.png (DALL-E 3)
  • 🎵 Audio (full): audio/{collection}/verse-01-full.mp3 (ElevenLabs)
  • 🎵 Audio (slow): audio/{collection}/verse-01-slow.mp3 (0.75x speed)
  • 🔍 Embeddings: data/embeddings.json (for semantic search)

Text Source: Canonical Devanagari text is read from data/verses/{collection}.yaml (see the Local Verses Guide).
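The output layout above can be checked with a few lines of Python. This is a minimal sketch, not part of the SDK: `expected_outputs` and `missing_outputs` are hypothetical helper names, and the zero-padded `verse-NN` file stem is assumed from the example paths (a project may use other verse IDs, such as `chaupai-06`).

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(collection: str, verse: int, theme: str) -> Dict[str, Path]:
    """Build the relative output paths listed above for one verse.

    Hypothetical helper: assumes the two-digit `verse-NN` stem shown in the
    example paths.
    """
    stem = f"verse-{verse:02d}"
    return {
        "image": Path("images") / collection / theme / f"{stem}.png",
        "audio_full": Path("audio") / collection / f"{stem}-full.mp3",
        "audio_slow": Path("audio") / collection / f"{stem}-slow.mp3",
        "embeddings": Path("data") / "embeddings.json",
    }


def missing_outputs(root: Path, collection: str, verse: int, theme: str) -> List[str]:
    """Return the names of expected outputs that do not exist under `root`."""
    outputs = expected_outputs(collection, verse, theme)
    return sorted(name for name, rel in outputs.items() if not (root / rel).exists())
```

Calling `missing_outputs(Path("."), "hanuman-chalisa", 1, "modern")` from the project root would list whichever of the four artifacts has not been generated yet.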

Puranic Context Generation

Enrich verse pages with grounded story references from indexed sacred texts. The workflow has two stages:

Stage 1 — Index a Source Text

verse-index-sources --file data/sources/ananda-ramayana.txt

This command:

  1. Splits the source text into ~4000-char chunks
  2. Parses each chunk into discrete named episodes (keywords, type, summary in English + Hindi)
  3. Generates embeddings for each episode
  4. Writes outputs:
    • data/puranic-index/{key}.yml — human-readable episode index with _meta section
    • data/embeddings/{key}.json — embedding vectors for RAG retrieval
    • data/puranic-references.yml — registry of indexed sources

This step only needs to run once per source, and again whenever the source file changes.

# Use Bedrock Cohere for better Sanskrit/Hindi accuracy
verse-index-sources --file data/sources/shiv-puran.txt --provider bedrock-cohere

# Larger chunk size for dense Puranic prose
verse-index-sources --file data/sources/valmiki-ramayana.pdf --chunk-size 6000
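
To make the chunking step concrete, here is a rough sketch of splitting a source text into chunks of a bounded size. The SDK's actual splitting strategy is not documented here; `split_into_chunks` is a hypothetical function, and preferring paragraph boundaries is an assumption chosen so that episodes are less likely to be cut mid-passage.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 4000) -> List[str]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Hypothetical sketch: paragraphs (separated by blank lines) are packed
    greedily into chunks; a single paragraph longer than `chunk_size` is
    cut at the size limit.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any paragraph that cannot fit in one chunk.
        while len(para) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:chunk_size])
            para = para[chunk_size:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > chunk_size:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A larger `chunk_size` (as with `--chunk-size 6000`) yields fewer, longer chunks, which may keep a long narrative episode within a single chunk.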

Stage 2 — Generate Puranic Context per Verse

verse-puranic-context --collection hanuman-chalisa --all

For each verse this command:

  1. Embeds the verse text using the same provider as the indexed source
  2. Runs cosine similarity search across all indexed sources to find the most relevant episodes
  3. Filters to episodes involving the collection's subject (configured in _data/collections.yml)
  4. Passes top episodes + verse text to GPT-4o with citation constraints
  5. Post-validates each entry: drops entries where the subject is not an active participant
  6. Writes puranic_context: block into the verse's .md frontmatter

# Skip verses that already have context (default)
verse-puranic-context --collection hanuman-chalisa --all

# Regenerate all existing entries
verse-puranic-context --collection hanuman-chalisa --all --regenerate

# Single verse
verse-puranic-context --collection hanuman-chalisa --verse chaupai-06
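
The retrieval part of Stage 2 (similarity search followed by the subject filter) can be illustrated as below. This is a sketch under stated assumptions, not the SDK's implementation: the episode record keys (`title`, `summary`, `vector`) are hypothetical, and the subject filter is approximated by a case-insensitive substring match on the summary.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_episodes(
    verse_vector: Sequence[float],
    episodes: List[Dict],
    subject: str,
    k: int = 3,
) -> List[Dict]:
    """Rank episodes by similarity to the verse, keep those mentioning the subject.

    Assumed record shape: {"title": str, "summary": str, "vector": [float, ...]}.
    """
    ranked = sorted(
        episodes,
        key=lambda e: cosine_similarity(verse_vector, e["vector"]),
        reverse=True,
    )
    # Approximation of the subject filter: substring match on the summary.
    relevant = [e for e in ranked if subject.lower() in e["summary"].lower()]
    return relevant[:k]
```

The verse and the episodes need to be embedded with the same provider, as step 1 notes, because cosine similarity is only meaningful between vectors from the same embedding space.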

Collection Subject Configuration

The subject filter is read automatically from _data/collections.yml — no CLI flag needed:

hanuman-chalisa:
  enabled: true
  name:
    en: Hanuman Chalisa
    hi: हनुमान चालीसा
  subject: Hanuman       # primary deity/subject of this collection
  subject_type: deity    # deity | avatar | concept | figure
  permalink_base: /hanuman-chalisa
  total_verses: 43
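
As an illustration of how such an entry might be consumed, the sketch below looks up the subject for a collection and checks `subject_type` against the four values named in the comment above. It assumes the YAML file has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`); `subject_for` is a hypothetical helper, and the SDK's own validation may differ.

```python
from typing import Dict

# Allowed values taken from the comment in the config example above.
ALLOWED_SUBJECT_TYPES = {"deity", "avatar", "concept", "figure"}


def subject_for(collections: Dict[str, dict], key: str) -> str:
    """Return the configured subject for a collection (hypothetical helper).

    `collections` is the parsed content of _data/collections.yml.
    """
    entry = collections[key]
    subject = entry.get("subject")
    if not subject:
        raise ValueError(f"collection {key!r} has no 'subject' configured")
    subject_type = entry.get("subject_type")
    if subject_type not in ALLOWED_SUBJECT_TYPES:
        raise ValueError(
            f"collection {key!r} has unsupported subject_type {subject_type!r}"
        )
    return subject
```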

Multiple Sources

Multiple indexed sources are automatically combined in RAG retrieval:

data/sources/
  shiv-puran-part1.txt
  ananda-ramayana.txt        ← add new sources here

data/puranic-index/
  shiv-puran-part1.yml       ← auto-generated episode index
  ananda-ramayana.yml

data/embeddings/
  shiv-puran-part1.json      ← auto-generated embedding vectors
  ananda-ramayana.json

See verse-index-sources and verse-puranic-context for full documentation.
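
Given the directory layout above, the set of sources available for retrieval can be listed by matching file stems across the two generated directories. This is a minimal sketch that relies only on that layout; `indexed_source_keys` is a hypothetical helper, not an SDK function.

```python
from pathlib import Path
from typing import List


def indexed_source_keys(root: Path) -> List[str]:
    """Return source keys that have both an episode index and embeddings.

    Assumes the layout shown above: data/puranic-index/{key}.yml and
    data/embeddings/{key}.json under the project root.
    """
    index_keys = {p.stem for p in (root / "data" / "puranic-index").glob("*.yml")}
    embedding_keys = {p.stem for p in (root / "data" / "embeddings").glob("*.json")}
    return sorted(index_keys & embedding_keys)
```

A key that appears in only one of the two directories is left out, which can help spot a source whose indexing run did not complete.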

Installation

pip install sanatan-verse-sdk

Commands

Project Setup

  • verse-init - Initialize new project with recommended structure
  • verse-validate - Validate project structure and configuration

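Content Generation

  • verse-generate - Generate image, audio, and embeddings for a verse (use --image or --audio to generate a single component)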

Puranic Context

  • verse-index-sources - Index Puranic source texts (PDFs, TXTs) into episodes and embeddings for RAG retrieval
  • verse-puranic-context - Generate Puranic context boxes for verses (RAG-grounded or GPT-4o free recall)

Project Management

  • verse-add - Add new verse entries to collections (supports multi-chapter formats)
  • verse-status - Check status, completion, and validate text against canonical source
  • verse-sync - Sync verse text with canonical source (fix mismatches)
  • verse-deploy - Deploy Cloudflare Worker for API proxy

Configuration

Copy the example environment file and add your API keys:

cp .env.example .env
# Edit .env and add your API keys

See the Usage Guide for detailed information on project structure, workflows, batch processing, and cost optimization.

Documentation

Example Project

Hanuman GPT - Multi-collection project with Hanuman Chalisa, Sundar Kaand, and Sankat Mochan Hanumanashtak

Requirements

  • Python 3.8+
  • OpenAI API key (for text/images/embeddings)
  • ElevenLabs API key (for audio)

License

MIT License - See LICENSE file for details


Download files


Source Distribution

sanatan_verse_sdk-0.31.3.tar.gz (103.3 kB view details)

Uploaded Source

Built Distribution


sanatan_verse_sdk-0.31.3-py3-none-any.whl (120.1 kB view details)

Uploaded Python 3

File details

Details for the file sanatan_verse_sdk-0.31.3.tar.gz.

File metadata

  • Download URL: sanatan_verse_sdk-0.31.3.tar.gz
  • Upload date:
  • Size: 103.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for sanatan_verse_sdk-0.31.3.tar.gz
Algorithm Hash digest
SHA256 3e0e27d9a5a59e2cbf38ecdc93642b61a152536cd4e5bec40aefccf78bca9ade
MD5 9e2a74fca696835022d12948a2481862
BLAKE2b-256 d21200d1914a1b42f90abddb6e0ca9112c92efde4d5b045b9f53065e3f9a8c55


File details

Details for the file sanatan_verse_sdk-0.31.3-py3-none-any.whl.

File hashes

Hashes for sanatan_verse_sdk-0.31.3-py3-none-any.whl
Algorithm Hash digest
SHA256 67f9187e4252a30e640a977e6fee7d56b6351d802762e00510120310f744a4b3
MD5 06c358a3062c2ee36d7303cdf26fbdf6
BLAKE2b-256 b8d570d313802a26e68ccae7230cab20f0f7be51a5d39b8e4dc5b309fbb7f08f
