
A high-performance, asynchronous, and extensible Python package for processing files, generating embeddings, and storing them in various vector databases with optional cloud storage integration.


🚀 EmbeddingFramework

Modular • Extensible • Production-Ready
A Python framework for embeddings, vector databases, and cloud storage providers.





A modular, extensible, and production-ready Python framework for working with embeddings, vector databases, and cloud storage providers.
Designed for AI, NLP, and semantic search applications, EmbeddingFramework provides a unified API to process, store, and query embeddings across multiple backends.


✨ Features

🔹 Multiple Vector Database Support

  • ChromaDB – Local and persistent vector storage.
  • Milvus – High-performance distributed vector database.
  • Pinecone – Fully managed vector database service.
  • Weaviate – Open-source vector search engine.

🔹 Cloud Storage Integrations

  • AWS S3 – Store and retrieve embeddings or documents.
  • Google Cloud Storage (GCS) – Scalable object storage.
  • Azure Blob Storage – Enterprise-grade cloud storage.

🔹 Embedding Providers

  • OpenAI Embeddings – Embedding generation via the OpenAI API.
  • Designed to be extended with other providers.

🔹 File Processing & Preprocessing

  • Automatic file type detection.
  • Text extraction from multiple formats.
  • Preprocessing utilities for cleaning and normalizing text.
  • Text splitting into chunks sized for embedding models.
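A common baseline for text splitting is a fixed-size window with overlap, so that text near a chunk boundary also appears in the neighboring chunk and keeps some of its context. The sketch below splits by characters; the `split_text` name and its defaults are hypothetical, and the framework's own splitter may use a different strategy (for example token- or sentence-aware boundaries).

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Larger overlaps preserve more context across boundaries at the cost of more chunks, and therefore more embedding calls.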

🔹 Utilities

  • Retry logic for robust API calls.
  • File utilities for safe and efficient I/O.
  • Modular architecture for easy extension.
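Retry logic for API calls is commonly implemented as exponential backoff with jitter. The decorator below is a generic, standard-library sketch of that technique, not EmbeddingFramework's own retry utility (its name and parameters are not documented here); `retry`, its arguments, and the `flaky` demo function are hypothetical.

```python
import functools
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(max_attempts: int = 3, base_delay: float = 0.5,
          exceptions: tuple[type[BaseException], ...] = (Exception,)):
    """Retry the wrapped call on `exceptions`, sleeping base_delay * 2**(n-1) plus jitter between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: re-raise the last error
                    delay = base_delay * 2 ** (attempt - 1)
                    # Up to 10% random jitter keeps concurrent clients from retrying in lockstep.
                    time.sleep(delay + random.uniform(0, delay * 0.1))
            raise RuntimeError("unreachable")  # the loop always returns or raises
        return wrapper
    return decorator


calls = {"n": 0}


@retry(max_attempts=3, base_delay=0.01)
def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(flaky(), calls["n"])  # ok 3
```

Restricting `exceptions` to transient errors (timeouts, rate limits) avoids retrying calls that will never succeed, such as requests with an invalid API key.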

📦 Installation & Setup

# Basic installation
pip install embeddingframework

# With development dependencies (quoted so shells such as zsh do not expand the brackets)
pip install "embeddingframework[dev]"

⚡ Quick Start Example

import os

from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
from embeddingframework.adapters.vector_dbs import ChromaDBAdapter

# Initialize the embedding provider (read the key from the environment instead of hard-coding it)
embedding_provider = OpenAIEmbeddingAdapter(api_key=os.environ["OPENAI_API_KEY"])

# Initialize a persistent local ChromaDB store
vector_db = ChromaDBAdapter(persist_directory="./chroma_store")

texts = ["Hello world", "EmbeddingFramework is awesome!"]

# Generate one embedding per text
embeddings = embedding_provider.embed_texts(texts)

# Store the texts alongside their embeddings
vector_db.add_texts(texts, embeddings)

📂 Project Structure

embeddingframework/
│
├── adapters/                # Vector DB & storage adapters
│   ├── base.py
│   ├── chromadb_adapter.py
│   ├── milvus_adapter.py
│   ├── pinecone_adapter.py
│   ├── weaviate_adapter.py
│   └── storage/             # Cloud storage adapters
│
├── processors/              # File processing logic
├── utils/                   # Helper utilities
└── tests/                   # Test suite

🧪 Testing

pytest --maxfail=1 --disable-warnings -q

With coverage (requires the pytest-cov plugin):

pytest --cov=embeddingframework --cov-report=term-missing

🔄 CI/CD

This project includes a GitHub Actions workflow (.github/workflows/python-package.yml) for:

  • Automated testing with coverage.
  • Version bumping & changelog generation.
  • PyPI publishing.
  • GitHub release creation.

📜 License

MIT License

This project is licensed under the MIT License; see the LICENSE file for details.


🤝 Contributing

Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.

  1. Fork the repository.
  2. Create a new branch (feature/my-feature).
  3. Commit your changes.
  4. Push to your branch.
  5. Open a Pull Request.

🌟 Why EmbeddingFramework?

  • Unified API – Work with multiple vector DBs and storage providers seamlessly.
  • Extensible – Add new adapters with minimal effort.
  • Production-Ready – Built with scalability and reliability in mind.
  • Developer-Friendly – Clean, modular, and well-documented codebase.

📖 Full Documentation Overview

Below is an overview of EmbeddingFramework's features, usage patterns, and configuration options.

1️⃣ Introduction

EmbeddingFramework is designed to simplify the integration of embeddings, vector databases, and cloud storage into AI-powered applications. It provides:

  • A unified API for multiple backends.
  • Extensible architecture for adding new providers.
  • Production-ready reliability with retries, error handling, and modular design.
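The unified-API idea can be illustrated with an abstract base class that every backend adapter implements, so application code depends only on the interface. This is a minimal sketch, not the framework's actual `base.py`: the `VectorStore` and `InMemoryStore` classes, the `count` method, and `index_texts` are hypothetical, and only the `add_texts(texts, embeddings)` call shape is taken from the examples in this document.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class VectorStore(ABC):
    """Hypothetical interface shared by all vector DB adapters."""

    @abstractmethod
    def add_texts(self, texts: Sequence[str], embeddings: Sequence[Sequence[float]]) -> list[str]:
        """Store each text with its embedding and return the assigned IDs."""

    @abstractmethod
    def count(self) -> int:
        """Return how many vectors are stored."""


class InMemoryStore(VectorStore):
    """Toy backend; a real adapter would forward these calls to ChromaDB, Milvus, Pinecone, or Weaviate."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[str, list[float]]] = {}

    def add_texts(self, texts: Sequence[str], embeddings: Sequence[Sequence[float]]) -> list[str]:
        if len(texts) != len(embeddings):
            raise ValueError("texts and embeddings must have the same length")
        ids = []
        for text, vector in zip(texts, embeddings):
            row_id = str(len(self._rows))  # sequential string IDs: "0", "1", ...
            self._rows[row_id] = (text, list(vector))
            ids.append(row_id)
        return ids

    def count(self) -> int:
        return len(self._rows)


def index_texts(store: VectorStore, texts: Sequence[str], embeddings: Sequence[Sequence[float]]) -> int:
    """Application code depends only on VectorStore, so the backend can be swapped freely."""
    store.add_texts(texts, embeddings)
    return store.count()


print(index_texts(InMemoryStore(), ["Hello", "World"], [[0.1, 0.2], [0.3, 0.4]]))  # 2
```

Because `VectorStore` declares abstract methods, a backend that forgets to implement one fails at instantiation time rather than at the first call.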

2️⃣ Installation

pip install embeddingframework
pip install "embeddingframework[dev]"  # For development

3️⃣ Supported Vector Databases

Database   Type          Key Features
ChromaDB   Local         Persistent storage, lightweight
Milvus     Distributed   High-performance, scalable
Pinecone   Managed       Fully hosted, easy to scale
Weaviate   Open-source   Semantic search, hybrid queries
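At query time, each of these databases answers the same question: which stored vectors are most similar to a query vector? The sketch below does this by brute force with cosine similarity, using only the standard library; production vector databases typically rely on approximate nearest-neighbor indexes (for example HNSW) instead of scanning every vector. `cosine_similarity` and `top_k` are illustrative names, not part of EmbeddingFramework.

```python
import heapq
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: Mapping[str, Sequence[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact nearest-neighbor search: score every stored vector and keep the k best."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print([doc_id for doc_id, _ in top_k([1.0, 0.1], docs, k=2)])  # ['a', 'c']
```

This exact search costs one similarity computation per stored vector, which is why approximate indexes become necessary as collections grow.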

4️⃣ Cloud Storage Integrations

EmbeddingFramework supports:

  • AWS S3
  • Google Cloud Storage
  • Azure Blob Storage

Example:

from embeddingframework.adapters.storage.s3_storage_adapter import S3StorageAdapter
storage = S3StorageAdapter(bucket_name="my-bucket")
storage.upload_file("local.txt", "remote.txt")

5️⃣ Embedding Providers

Currently supported:

  • OpenAI Embeddings
  • Other providers (e.g., Hugging Face, Cohere) can be added through the adapter architecture.

Example:

from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
provider = OpenAIEmbeddingAdapter(api_key="YOUR_KEY")
embeddings = provider.embed_texts(["Hello", "World"])
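A new provider mainly needs to turn a list of texts into a list of vectors, as in the `embed_texts` call above. The sketch below assumes that contract; the `EmbeddingProvider` protocol, the `HashingEmbedder` class, and `embed_all` are hypothetical. The hashing trick it uses yields deterministic offline vectors that are handy for tests, not semantically meaningful embeddings.

```python
import hashlib
import math
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Hypothetical provider contract: one vector per input text, in input order."""

    def embed_texts(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy offline provider based on the hashing trick (bag of words hashed into `dim` buckets)."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def _embed_one(self, text: str) -> list[float]:
        vector = [0.0] * self.dim
        for token in text.lower().split():
            # hashlib is stable across processes, unlike the salted built-in hash() for str.
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            vector[int.from_bytes(digest, "little") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        # L2-normalize so cosine similarity reduces to a dot product; empty text stays all zeros.
        return [v / norm for v in vector] if norm else vector

    def embed_texts(self, texts: Sequence[str]) -> list[list[float]]:
        return [self._embed_one(text) for text in texts]


def embed_all(provider: EmbeddingProvider, texts: Sequence[str]) -> list[list[float]]:
    # Any object with a matching embed_texts method satisfies the protocol; no inheritance needed.
    return provider.embed_texts(texts)


vectors = embed_all(HashingEmbedder(dim=8), ["Hello world", "hello   WORLD"])
print(vectors[0] == vectors[1])  # True: identical tokens after lowercasing
```

A real Hugging Face or Cohere adapter would replace `_embed_one` with calls to that provider's client while keeping the same `embed_texts` shape.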

6️⃣ File Processing

  • Automatic file type detection
  • Text extraction from PDF, DOCX, TXT
  • Preprocessing utilities for cleaning text
  • Intelligent text splitting

Example:

from embeddingframework.processors.file_processor import FileProcessor
processor = FileProcessor()
text = processor.process_file("document.pdf")
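Automatic file type detection can combine a file's leading "magic" bytes with its extension. The helper below is a hypothetical sketch, not `FileProcessor`'s internals. PDF files begin with `%PDF-`, while DOCX files are ZIP archives (signature `PK\x03\x04`); since the ZIP signature alone cannot distinguish DOCX from other ZIP-based formats, the sketch uses the extension to disambiguate.

```python
import os
import tempfile
from pathlib import Path


def detect_file_type(path: str) -> str:
    """Guess a document type from its leading bytes, falling back to the extension."""
    with open(path, "rb") as f:
        header = f.read(8)
    suffix = Path(path).suffix.lower()
    if header.startswith(b"%PDF-"):
        return "pdf"
    # DOCX is a ZIP container, so require the .docx extension as well as the ZIP signature.
    if header.startswith(b"PK\x03\x04") and suffix == ".docx":
        return "docx"
    if suffix == ".txt":
        return "txt"
    raise ValueError(f"unsupported or unrecognized file: {path}")


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "report.bin")
    with open(sample, "wb") as f:
        f.write(b"%PDF-1.7\n")
    print(detect_file_type(sample))  # pdf (detected from content despite the .bin extension)
```

Checking content first means a mislabeled PDF is still routed to the PDF extractor; the extension is only trusted where the bytes are ambiguous.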

7️⃣ Utilities

  • Retry logic
  • File utilities
  • Preprocessing helpers
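Preprocessing helpers typically normalize text extracted from PDFs and Office files before it is split and embedded. A minimal standard-library sketch, assuming NFKC normalization, removal of control and format characters, and whitespace collapsing are the desired steps; `clean_text` is a hypothetical name, not the framework's API.

```python
import re
import unicodedata

_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalize Unicode, drop control/format characters, and collapse runs of whitespace."""
    # NFKC folds compatibility forms: the U+FB01 "fi" ligature becomes "fi", a no-break space becomes " ".
    text = unicodedata.normalize("NFKC", text)
    # Drop characters in Unicode category C* (control, format such as zero-width space),
    # but keep whitespace like "\n" so the next step can collapse it.
    text = "".join(ch for ch in text if ch.isspace() or not unicodedata.category(ch).startswith("C"))
    return _WHITESPACE.sub(" ", text).strip()


print(clean_text("  The \ufb01le\u00a0was\x00 extracted\n\n from a PDF.  "))  # The file was extracted from a PDF.
```

Collapsing all whitespace discards paragraph breaks; a splitter that relies on them should run before this step.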

8️⃣ CLI Usage

EmbeddingFramework includes a command-line interface. To list its commands and options, run:

embeddingframework --help

9️⃣ Advanced Configurations

  • Custom vector DB adapters
  • Custom embedding providers
  • Batch processing
  • Async support
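Batch processing and async support usually work together: texts are sent to the provider in batches while a bounded number of requests are in flight. The sketch below shows that pattern with `asyncio.Semaphore` and `asyncio.gather`. `embed_in_batches` is hypothetical, and `embed_batch` stands for any coroutine that embeds one batch; whether the framework's adapters expose such a coroutine is an assumption, so a `fake_embed_batch` stand-in is used here.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Vector = list[float]


async def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], Awaitable[list[Vector]]],
    batch_size: int = 100,
    max_concurrency: int = 4,
) -> list[Vector]:
    """Split texts into batches and embed them concurrently, preserving input order."""
    if batch_size < 1 or max_concurrency < 1:
        raise ValueError("batch_size and max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: Sequence[str]) -> list[Vector]:
        async with semaphore:  # at most max_concurrency batches in flight
            return await embed_batch(batch)

    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    # gather returns results in the order the awaitables were passed, so output order matches input.
    results = await asyncio.gather(*(run(b) for b in batches))
    return [vector for batch_result in results for vector in batch_result]


async def fake_embed_batch(batch: Sequence[str]) -> list[Vector]:
    await asyncio.sleep(0.01)  # stands in for a network call
    return [[float(len(text))] for text in batch]


vectors = asyncio.run(embed_in_batches(["a", "bb", "ccc"], fake_embed_batch, batch_size=2))
print(vectors)  # [[1.0], [2.0], [3.0]]
```

The semaphore bounds concurrency so large corpora do not trip provider rate limits, while `gather` keeps the flattened results aligned with the input texts.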

🔟 End-to-End Example

from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
from embeddingframework.adapters.vector_dbs import ChromaDBAdapter

provider = OpenAIEmbeddingAdapter(api_key="KEY")
db = ChromaDBAdapter(persist_directory="./store")

texts = ["AI is amazing", "EmbeddingFramework is powerful"]
embeddings = provider.embed_texts(texts)
db.add_texts(texts, embeddings)

📊 Feature Matrix

Feature            Supported
Multi-DB Support   ✅
Cloud Storage      ✅
File Processing    ✅
Retry Logic        ✅
CLI                ✅
Async              ✅

📚 Learn More

For the full documentation, visit:
👉 EmbeddingFramework Docs




Download files


Source Distribution

embeddingframework-1.0.6.tar.gz (22.3 kB)

Uploaded Source

Built Distribution


embeddingframework-1.0.6-py3-none-any.whl (26.9 kB)

Uploaded Python 3

File details

Details for the file embeddingframework-1.0.6.tar.gz.

File metadata

  • Download URL: embeddingframework-1.0.6.tar.gz
  • Upload date:
  • Size: 22.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for embeddingframework-1.0.6.tar.gz
Algorithm     Hash digest
SHA256        8dd6a031963275822f25a50b0b9aeb6eceed4ea34488db2050fed35eeebc7d7e
MD5           1120cb430189933b5fa4dde92a8ef8d5
BLAKE2b-256   a1d30f79c4cb436de582b70ac112f09af9ee4a9d623f92902e2830fb9038a719
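To check a downloaded archive against the digests listed above, hash it locally and compare. A standard-library sketch that streams the file in 64 KiB chunks so large files need not fit in memory; `sha256_of_file` and `verify` are illustrative names, and the expected value is the SHA256 digest published above for the source distribution.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published above for embeddingframework-1.0.6.tar.gz
EXPECTED_SDIST_SHA256 = "8dd6a031963275822f25a50b0b9aeb6eceed4ea34488db2050fed35eeebc7d7e"


def verify(path: str, expected: str) -> bool:
    return sha256_of_file(path) == expected.lower()


# Example, assuming the sdist was downloaded into the current directory:
# print(verify("embeddingframework-1.0.6.tar.gz", EXPECTED_SDIST_SHA256))
```

pip can also enforce this itself in hash-checking mode, by adding `--hash=sha256:<digest>` to a requirement line in a requirements file.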


File details

Details for the file embeddingframework-1.0.6-py3-none-any.whl.

File metadata

File hashes

Hashes for embeddingframework-1.0.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        e4b4b35d47864d6d0079c556f375a71c499c4f6dc8c1ff4037e59937f0fc0c3f
MD5           4b222c5114e97898d61ae509b23e3055
BLAKE2b-256   681d1f3af6c981dadf48c660dd4ea7df971d8855b341eb66238e7768438ea2ff

