
A high-performance, asynchronous, and extensible Python package for processing files, generating embeddings, and storing them in various vector databases with optional cloud storage integration.


EmbeddingFramework

A modular, extensible, and production-ready Python framework for working with embeddings, vector databases, and cloud storage providers.
Designed for AI, NLP, and semantic search applications, EmbeddingFramework provides a unified API to process, store, and query embeddings across multiple backends.


🚀 Features

🔹 Multi-Vector Database Support

  • ChromaDB – Local and persistent vector storage.
  • Milvus – High-performance distributed vector database.
  • Pinecone – Fully managed vector database service.
  • Weaviate – Open-source vector search engine.

🔹 Cloud Storage Integrations

  • AWS S3 – Store and retrieve embeddings or documents.
  • Google Cloud Storage (GCS) – Scalable object storage.
  • Azure Blob Storage – Enterprise-grade cloud storage.

🔹 Embedding Providers

  • OpenAI Embeddings – State-of-the-art embedding generation.
  • Easily extendable to other providers.
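
As an illustration of what extending to another provider could look like, here is a minimal sketch. It assumes a provider only needs the `embed_texts` method that the Quick Start calls on the OpenAI adapter. `EmbeddingProvider` and `HashingEmbeddingProvider` are hypothetical names, not part of the package, and the framework's actual base class may differ.

```python
import hashlib
from typing import List, Protocol


class EmbeddingProvider(Protocol):
    """Hypothetical interface: any object exposing embed_texts."""

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        ...


class HashingEmbeddingProvider:
    """Toy provider that derives a deterministic vector from a SHA-256 digest.

    Intended only for offline tests; the vectors carry no semantic meaning.
    """

    def __init__(self, dimensions: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, so at most 32 components are available.
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale each byte into the range [0.0, 1.0].
            vectors.append([byte / 255.0 for byte in digest[: self.dimensions]])
        return vectors
```

A stand-in like this can replace a paid API in unit tests, provided the rest of the pipeline only relies on `embed_texts`.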

🔹 File Processing & Preprocessing

  • Automatic file type detection.
  • Text extraction from multiple formats.
  • Preprocessing utilities for cleaning and normalizing text.
  • Intelligent text splitting for optimal embedding performance.
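
The text splitting step can be pictured with a minimal sketch of fixed-size chunking with overlap. This is a generic technique, not necessarily the strategy the framework uses; `split_text`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, at the cost of embedding some characters twice.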

🔹 Utilities

  • Retry logic for robust API calls.
  • File utilities for safe and efficient I/O.
  • Modular architecture for easy extension.
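
To show the kind of retry logic meant here, the sketch below retries a callable with exponential backoff. It is an assumption about the general approach, not the framework's own helper; `retry` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            # Re-raise the last failure instead of sleeping again.
            if attempt == attempts - 1:
                raise
            # Delay doubles after each failure: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.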

📦 Installation

# Basic installation
pip install embeddingframework

# With development dependencies (quoted so shells such as zsh do not expand the brackets)
pip install "embeddingframework[dev]"

⚡ Quick Start

from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
from embeddingframework.adapters.vector_dbs import ChromaDBAdapter

# Texts to embed and store
texts = ["Hello world", "EmbeddingFramework is awesome!"]

# Initialize embedding provider
embedding_provider = OpenAIEmbeddingAdapter(api_key="YOUR_OPENAI_API_KEY")

# Initialize vector database
vector_db = ChromaDBAdapter(persist_directory="./chroma_store")

# Generate embeddings
embeddings = embedding_provider.embed_texts(texts)

# Store the texts together with their embeddings
vector_db.add_texts(texts, embeddings)

🛠 Project Structure

embeddingframework/
│
├── adapters/                # Vector DB & storage adapters
│   ├── base.py
│   ├── chromadb_adapter.py
│   ├── milvus_adapter.py
│   ├── pinecone_adapter.py
│   ├── weaviate_adapter.py
│   └── storage/             # Cloud storage adapters
│
├── processors/              # File processing logic
├── utils/                   # Helper utilities
└── tests/                   # Test suite

🧪 Running Tests

pytest --maxfail=1 --disable-warnings -q

With coverage:

pytest --cov=embeddingframework --cov-report=term-missing

🔄 CI/CD Workflow

This project includes a GitHub Actions workflow (.github/workflows/python-package.yml) for:

  • Automated testing with coverage.
  • Version bumping & changelog generation.
  • PyPI publishing.
  • GitHub release creation.

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


🤝 Contributing

  1. Fork the repository.
  2. Create a new branch (feature/my-feature).
  3. Commit your changes.
  4. Push to your branch.
  5. Open a Pull Request.

🌟 Why EmbeddingFramework?

  • Unified API – Work with multiple vector DBs and storage providers seamlessly.
  • Extensible – Add new adapters with minimal effort.
  • Production-Ready – Built with scalability and reliability in mind.
  • Developer-Friendly – Clean, modular, and well-documented codebase.
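
To make the adapter idea concrete, here is a minimal in-memory sketch of a vector store that mirrors the `add_texts(texts, embeddings)` call from the Quick Start. `InMemoryVectorAdapter`, `query`, and `cosine_similarity` are hypothetical names used for illustration; the real adapters derive from the package's own base class, whose interface is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorAdapter:
    """Hypothetical adapter that keeps texts and vectors in a Python list."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, List[float]]] = []

    def add_texts(self, texts: List[str], embeddings: List[List[float]]) -> None:
        if len(texts) != len(embeddings):
            raise ValueError("texts and embeddings must have the same length")
        self._rows.extend(zip(texts, embeddings))

    def query(self, embedding: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return up to top_k (text, score) pairs, most similar first."""
        scored = [(text, cosine_similarity(embedding, vec)) for text, vec in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

A brute-force scan like this is linear in the number of stored vectors, which is the cost that dedicated vector databases avoid through indexing.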
