
A high-performance, asynchronous, and extensible Python package for processing files, generating embeddings, and storing them in various vector databases with optional cloud storage integration.


🚀 EmbeddingFramework

Modular • Extensible • Production-Ready
A Python framework for embeddings, vector databases, and cloud storage providers.



A modular, extensible, and production-ready Python framework for working with embeddings, vector databases, and cloud storage providers.
Designed for AI, NLP, and semantic search applications, EmbeddingFramework provides a unified API to process, store, and query embeddings across multiple backends.


✨ Features

🔹 Multi-Vector Database Support

  • ChromaDB – Local and persistent vector storage.
  • Milvus – High-performance distributed vector database.
  • Pinecone – Fully managed vector database service.
  • Weaviate – Open-source vector search engine.

🔹 Cloud Storage Integrations

  • AWS S3 – Store and retrieve embeddings or documents.
  • Google Cloud Storage (GCS) – Scalable object storage.
  • Azure Blob Storage – Enterprise-grade cloud storage.

🔹 Embedding Providers

  • OpenAI Embeddings – State-of-the-art embedding generation.
  • Easily extendable to other providers.
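As a rough illustration of what extending to another provider could look like, the sketch below defines a hypothetical `EmbeddingProvider` protocol and a toy `HashEmbeddingProvider`. Both names are assumptions made for this example and are not part of EmbeddingFramework's documented API; only the `embed_texts` method name is taken from the Quick Start. The toy vectors are deterministic but carry no semantic meaning, so they are only useful for offline tests.

```python
import hashlib
from typing import List, Protocol


class EmbeddingProvider(Protocol):
    """Hypothetical interface: any object exposing embed_texts()."""

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        ...


class HashEmbeddingProvider:
    """Toy provider for offline tests; the vectors are NOT semantic."""

    def __init__(self, dimensions: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, so that is the upper bound here.
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dimensions` bytes into the range [0, 1].
            vectors.append([byte / 255 for byte in digest[: self.dimensions]])
        return vectors


def embed_with(provider: EmbeddingProvider, texts: List[str]) -> List[List[float]]:
    """Code written against the protocol works with any conforming provider."""
    return provider.embed_texts(texts)
```

Because the caller depends only on `embed_texts`, a stand-in like this could replace a network-backed provider in unit tests.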

🔹 File Processing & Preprocessing

  • Automatic file type detection.
  • Text extraction from multiple formats.
  • Preprocessing utilities for cleaning and normalizing text.
  • Intelligent text splitting for optimal embedding performance.
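To make the text-splitting idea concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the framework's splitter; the function name `split_text` and its `chunk_size` and `overlap` parameters are assumptions chosen for illustration. Overlap repeats a few characters between neighbouring chunks so that a sentence cut at a boundary still appears whole in at least one chunk.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

A production splitter would more likely cut on sentence or token boundaries; character windows are used here only to keep the example short.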

🔹 Utilities

  • Retry logic for robust API calls.
  • File utilities for safe and efficient I/O.
  • Modular architecture for easy extension.
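The retry logic mentioned above can be pictured with a small exponential-backoff helper. This is a generic sketch under stated assumptions, not the framework's own utility: `retry_call` and all of its parameters are hypothetical names introduced for this example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In practice the `exceptions` tuple would usually be narrowed to transient errors such as timeouts.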

📦 Installation & Setup

# Basic installation
pip install embeddingframework

# With development dependencies (quoted so shells such as zsh do not expand the brackets)
pip install "embeddingframework[dev]"

⚡ Quick Start Example

from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
from embeddingframework.adapters.vector_dbs import ChromaDBAdapter

# Initialize embedding provider
embedding_provider = OpenAIEmbeddingAdapter(api_key="YOUR_OPENAI_API_KEY")

# Initialize vector database
vector_db = ChromaDBAdapter(persist_directory="./chroma_store")

# Texts to embed (defined once so the documents and embeddings stay aligned)
texts = ["Hello world", "EmbeddingFramework is awesome!"]

# Generate embeddings
embeddings = embedding_provider.embed_texts(texts)

# Store embeddings
vector_db.add_texts(texts, embeddings)
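The Quick Start stops at storing vectors. As a conceptual sketch of what a similarity query over stored embeddings does, the snippet below ranks texts by cosine similarity in plain Python. The functions `cosine_similarity` and `top_k` are hypothetical helpers written for this illustration; they are not EmbeddingFramework or ChromaDB calls, and real vector databases use indexed approximate search rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    texts: Sequence[str],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k texts whose vectors are most similar to the query."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in zip(texts, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```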

📂 Project Structure

embeddingframework/
│
├── adapters/                # Vector DB & storage adapters
│   ├── base.py
│   ├── chromadb_adapter.py
│   ├── milvus_adapter.py
│   ├── pinecone_adapter.py
│   ├── weaviate_adapter.py
│   └── storage/             # Cloud storage adapters
│
├── processors/              # File processing logic
├── utils/                   # Helper utilities
└── tests/                   # Test suite

🧪 Testing

pytest --maxfail=1 --disable-warnings -q

With coverage:

pytest --cov=embeddingframework --cov-report=term-missing

🔄 CI/CD

This project includes a GitHub Actions workflow (.github/workflows/python-package.yml) for:

  • Automated testing with coverage.
  • Version bumping & changelog generation.
  • PyPI publishing.
  • GitHub release creation.

📜 License

This project is licensed under the MIT License – see the LICENSE file for details.


๐Ÿค Contributing

Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.

  1. Fork the repository.
  2. Create a new branch (feature/my-feature).
  3. Commit your changes.
  4. Push to your branch.
  5. Open a Pull Request.

🌟 Why EmbeddingFramework?

  • Unified API – Work with multiple vector DBs and storage providers seamlessly.
  • Extensible – Add new adapters with minimal effort.
  • Production-Ready – Built with scalability and reliability in mind.
  • Developer-Friendly – Clean, modular, and well-documented codebase.
