A high-performance, asynchronous, and extensible Python package for processing files, generating embeddings, and storing them in various vector databases with optional cloud storage integration.
EmbeddingFramework
Modular • Extensible • Production-Ready
A Python framework for embeddings, vector databases, and cloud storage providers.
Documentation
A modular, extensible, and production-ready Python framework for working with embeddings, vector databases, and cloud storage providers.
Designed for AI, NLP, and semantic search applications, EmbeddingFramework provides a unified API to process, store, and query embeddings across multiple backends.
Features
Multi-Vector Database Support
- ChromaDB: Local and persistent vector storage.
- Milvus: High-performance distributed vector database.
- Pinecone: Fully managed vector database service.
- Weaviate: Open-source vector search engine.
Cloud Storage Integrations
- AWS S3: Store and retrieve embeddings or documents.
- Google Cloud Storage (GCS): Scalable object storage.
- Azure Blob Storage: Enterprise-grade cloud storage.
Embedding Providers
- OpenAI Embeddings: State-of-the-art embedding generation.
- Easily extendable to other providers.
File Processing & Preprocessing
- Automatic file type detection.
- Text extraction from multiple formats.
- Preprocessing utilities for cleaning and normalizing text.
- Intelligent text splitting for optimal embedding performance.
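The splitting strategy the package uses is not described here, so the following is only a minimal sketch of one common approach: fixed-size character windows with a small overlap, so that a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks. The function name `split_text` and its parameters are hypothetical and not part of the EmbeddingFramework API.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters. Hypothetical helper,
    not taken from the EmbeddingFramework package.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks tend to give more precise matches at query time, while larger chunks keep more context per embedding; the right values depend on the embedding model and the documents.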
Utilities
- Retry logic for robust API calls.
- File utilities for safe and efficient I/O.
- Modular architecture for easy extension.
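How the package implements its retry logic is not shown here. As an illustration of the general idea, the sketch below retries a call with exponential backoff. The decorator name `retry` and all of its parameters are assumptions made for this example, not the package's own utility.

```python
import functools
import time


def retry(max_attempts=3, base_delay=0.5, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function with exponential backoff.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between attempts
    and re-raises the last error once max_attempts is reached.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


if __name__ == "__main__":
    state = {"calls": 0}

    @retry(max_attempts=4, base_delay=0.01, exceptions=(ConnectionError,))
    def flaky_call():
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(flaky_call(), "after", state["calls"], "calls")
```

Passing `sleep` as a parameter keeps the decorator easy to test, because a test can record the delays instead of actually waiting.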
Installation & Setup
# Basic installation
pip install embeddingframework
# With development dependencies
pip install "embeddingframework[dev]"
Quick Start Example
from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
from embeddingframework.adapters.vector_dbs import ChromaDBAdapter
# Initialize embedding provider
embedding_provider = OpenAIEmbeddingAdapter(api_key="YOUR_OPENAI_API_KEY")
# Initialize vector database
vector_db = ChromaDBAdapter(persist_directory="./chroma_store")
# Generate embeddings
embeddings = embedding_provider.embed_texts(["Hello world", "EmbeddingFramework is awesome!"])
# Store embeddings
vector_db.add_texts(["Hello world", "EmbeddingFramework is awesome!"], embeddings)
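The quick start passes the API key as a string literal. In real projects it is usually safer to read the key from the environment so it never lands in source control. The helper below is a small sketch of that pattern; `load_api_key` and the `OPENAI_API_KEY` variable name are assumptions chosen for illustration, and the package may offer its own configuration mechanism.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    Hypothetical helper: raises a clear error instead of passing an
    empty key on to the embedding provider.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


if __name__ == "__main__":
    # A fake environment is passed in so the example runs without a real key.
    print(load_api_key(environ={"OPENAI_API_KEY": "sk-example"}))
```

The returned value could then be passed as `api_key=` when constructing the embedding provider.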
Project Structure
embeddingframework/
│
├── adapters/              # Vector DB & storage adapters
│   ├── base.py
│   ├── chromadb_adapter.py
│   ├── milvus_adapter.py
│   ├── pinecone_adapter.py
│   ├── weaviate_adapter.py
│   └── storage/           # Cloud storage adapters
│
├── processors/            # File processing logic
├── utils/                 # Helper utilities
└── tests/                 # Test suite
Testing
pytest --maxfail=1 --disable-warnings -q
With coverage:
pytest --cov=embeddingframework --cov-report=term-missing
CI/CD
This project includes a GitHub Actions workflow (.github/workflows/python-package.yml) for:
- Automated testing with coverage.
- Version bumping & changelog generation.
- PyPI publishing.
- GitHub release creation.
License
This project is licensed under the MIT License; see the LICENSE file for details.
Contributing
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.
- Fork the repository.
- Create a new branch (feature/my-feature).
- Commit your changes.
- Push to your branch.
- Open a Pull Request.
Why EmbeddingFramework?
- Unified API: Work with multiple vector DBs and storage providers through one interface.
- Extensible: Add new adapters with minimal effort.
- Production-Ready: Built with scalability and reliability in mind.
- Developer-Friendly: Clean, modular, and well-documented codebase.
Full Documentation Overview
Below is a comprehensive, end-to-end guide covering all features, usage patterns, and advanced configurations of EmbeddingFramework.
1. Introduction
EmbeddingFramework is designed to simplify the integration of embeddings, vector databases, and cloud storage into AI-powered applications. It provides:
- A unified API for multiple backends.
- Extensible architecture for adding new providers.
- Production-ready reliability with retries, error handling, and modular design.
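To make the "unified API" idea concrete, here is a minimal sketch of how a common adapter interface can hide the backend behind the same method calls. The class names `VectorStore` and `InMemoryVectorStore` and the `query` method are hypothetical; only the `add_texts(texts, embeddings)` call shape is taken from the examples in this document, and the package's actual base class in `adapters/base.py` may look different.

```python
import math
from abc import ABC, abstractmethod


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorStore(ABC):
    """Hypothetical common interface that every backend adapter would implement."""

    @abstractmethod
    def add_texts(self, texts, embeddings):
        """Store each text together with its embedding."""

    @abstractmethod
    def query(self, embedding, top_k=1):
        """Return the top_k stored texts most similar to the embedding."""


class InMemoryVectorStore(VectorStore):
    """Toy backend that keeps everything in a Python list."""

    def __init__(self):
        self._rows = []

    def add_texts(self, texts, embeddings):
        if len(texts) != len(embeddings):
            raise ValueError("texts and embeddings must have the same length")
        self._rows.extend(zip(texts, embeddings))

    def query(self, embedding, top_k=1):
        ranked = sorted(
            self._rows,
            key=lambda row: cosine_similarity(embedding, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:top_k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add_texts(["cat", "dog"], [[1.0, 0.0], [0.0, 1.0]])
    print(store.query([0.9, 0.1], top_k=1))
```

Application code written against the abstract interface does not need to change when one backend is swapped for another, which is the main benefit of the adapter pattern.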
2. Installation
pip install embeddingframework
pip install "embeddingframework[dev]"  # For development
3. Supported Vector Databases
| Database | Type | Key Features |
|---|---|---|
| ChromaDB | Local | Persistent storage, lightweight |
| Milvus | Distributed | High-performance, scalable |
| Pinecone | Managed | Fully hosted, easy to scale |
| Weaviate | Open-source | Semantic search, hybrid queries |
4. Cloud Storage Integrations
EmbeddingFramework supports:
- AWS S3
- Google Cloud Storage
- Azure Blob Storage
Example:
from embeddingframework.adapters.storage.s3_storage_adapter import S3StorageAdapter
storage = S3StorageAdapter(bucket_name="my-bucket")
storage.upload_file("local.txt", "remote.txt")
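For local development or tests it can help to have a stand-in that behaves like a storage adapter without touching a cloud account. The sketch below mirrors the `upload_file(local, remote)` call shown above using the local filesystem. `LocalStorageAdapter` and `download_file` are hypothetical names for this illustration and are not claimed to exist in the package.

```python
import shutil
from pathlib import Path


class LocalStorageAdapter:
    """Hypothetical stand-in that stores 'remote' objects under a local folder."""

    def __init__(self, root):
        root_path = Path(root)
        root_path.mkdir(parents=True, exist_ok=True)
        self.root = root_path.resolve()

    def _resolve(self, remote_path):
        # Reject keys such as "../secret" that would escape the root folder.
        target = (self.root / remote_path).resolve()
        if not target.is_relative_to(self.root):
            raise ValueError(f"remote path escapes storage root: {remote_path}")
        return target

    def upload_file(self, local_path, remote_path):
        target = self._resolve(remote_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)

    def download_file(self, remote_path, local_path):
        shutil.copyfile(self._resolve(remote_path), local_path)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "local.txt"
        source.write_text("hello", encoding="utf-8")
        storage = LocalStorageAdapter(Path(tmp) / "bucket")
        storage.upload_file(source, "remote.txt")
        print((Path(tmp) / "bucket" / "remote.txt").read_text(encoding="utf-8"))
```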
5. Embedding Providers
Currently supported:
- OpenAI Embeddings
- Easily extendable to HuggingFace, Cohere, etc.
Example:
from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
provider = OpenAIEmbeddingAdapter(api_key="YOUR_KEY")
embeddings = provider.embed_texts(["Hello", "World"])
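A custom provider mainly needs to expose the same `embed_texts` call used above. The sketch below is a toy provider that builds deterministic vectors with the hashing trick, which is handy for offline tests but carries no semantic meaning. `HashingEmbeddingProvider` is a hypothetical class; whether the package requires subclassing a specific base class is not stated here.

```python
import hashlib
import math


class HashingEmbeddingProvider:
    """Toy provider: bag-of-words hashing, deterministic, not a semantic model."""

    def __init__(self, dimensions=16):
        if dimensions <= 0:
            raise ValueError("dimensions must be positive")
        self.dimensions = dimensions

    def embed_texts(self, texts):
        """Return one unit-length vector per input text."""
        return [self._embed(text) for text in texts]

    def _embed(self, text):
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.dimensions
            vector[index] += 1.0  # count tokens that hash to this slot
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector


if __name__ == "__main__":
    provider = HashingEmbeddingProvider(dimensions=8)
    for vector in provider.embed_texts(["Hello", "World"]):
        print(vector)
```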
6. File Processing
- Automatic file type detection
- Text extraction from PDF, DOCX, TXT
- Preprocessing utilities for cleaning text
- Intelligent text splitting
Example:
from embeddingframework.processors.file_processor import FileProcessor
processor = FileProcessor()
text = processor.process_file("document.pdf")
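The internals of `FileProcessor` are not shown here. As a rough illustration of two of the steps listed above, the sketch below guesses a file type from its name with the standard library and normalizes whitespace in extracted text. `detect_file_type` and `normalize_text` are hypothetical helpers; note that `mimetypes` looks only at the file extension, not at the file contents.

```python
import mimetypes
import re


def detect_file_type(path):
    """Guess a MIME type from the file name; fall back to a generic binary type."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"


def normalize_text(text):
    """Unify line endings, collapse runs of spaces/tabs, and trim blank runs."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)      # collapse horizontal whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()


if __name__ == "__main__":
    print(detect_file_type("document.pdf"))
    print(repr(normalize_text("Hello   world\r\n\r\n\r\n\r\nBye\t\tnow  ")))
```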
7. Utilities
- Retry logic
- File utilities
- Preprocessing helpers
8. CLI Usage
EmbeddingFramework includes a CLI:
embeddingframework --help
9. Advanced Configurations
- Custom vector DB adapters
- Custom embedding providers
- Batch processing
- Async support
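The package's own batch and async interfaces are not documented in this overview. The sketch below shows the general pattern with `asyncio`: split the inputs into batches, embed the batches concurrently, and cap the number of in-flight requests with a semaphore. `make_batches`, `embed_in_batches`, and `fake_embed_batch` are hypothetical names; the fake embedder stands in for a real asynchronous API call.

```python
import asyncio


def make_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


async def embed_in_batches(texts, embed_batch, batch_size=2, max_concurrency=2):
    """Embed texts batch by batch, running at most max_concurrency batches at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch):
        async with semaphore:
            return await embed_batch(batch)

    # gather() returns results in the order the batches were submitted.
    results = await asyncio.gather(*(run(b) for b in make_batches(texts, batch_size)))
    return [vector for batch_result in results for vector in batch_result]


async def fake_embed_batch(batch):
    """Placeholder embedder: one-dimensional vector holding the text length."""
    await asyncio.sleep(0)
    return [[float(len(text))] for text in batch]


if __name__ == "__main__":
    texts = ["a", "bb", "ccc", "dddd", "eeeee"]
    print(asyncio.run(embed_in_batches(texts, fake_embed_batch)))
```

Limiting concurrency matters with hosted embedding APIs, which typically enforce rate limits; the batch size and concurrency values here are arbitrary.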
10. End-to-End Example
from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
from embeddingframework.adapters.vector_dbs import ChromaDBAdapter
provider = OpenAIEmbeddingAdapter(api_key="KEY")
db = ChromaDBAdapter(persist_directory="./store")
texts = ["AI is amazing", "EmbeddingFramework is powerful"]
embeddings = provider.embed_texts(texts)
db.add_texts(texts, embeddings)
Feature Matrix
| Feature | Supported |
|---|---|
| Multi-DB Support | Yes |
| Cloud Storage | Yes |
| File Processing | Yes |
| Retry Logic | Yes |
| CLI | Yes |
| Async | Yes |
Learn More
For the full documentation, visit:
EmbeddingFramework Docs
File details
Details for the file embeddingframework-1.0.6.tar.gz.
File metadata
- Download URL: embeddingframework-1.0.6.tar.gz
- Upload date:
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8dd6a031963275822f25a50b0b9aeb6eceed4ea34488db2050fed35eeebc7d7e |
| MD5 | 1120cb430189933b5fa4dde92a8ef8d5 |
| BLAKE2b-256 | a1d30f79c4cb436de582b70ac112f09af9ee4a9d623f92902e2830fb9038a719 |
File details
Details for the file embeddingframework-1.0.6-py3-none-any.whl.
File metadata
- Download URL: embeddingframework-1.0.6-py3-none-any.whl
- Upload date:
- Size: 26.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4b4b35d47864d6d0079c556f375a71c499c4f6dc8c1ff4037e59937f0fc0c3f |
| MD5 | 4b222c5114e97898d61ae509b23e3055 |
| BLAKE2b-256 | 681d1f3af6c981dadf48c660dd4ea7df971d8855b341eb66238e7768438ea2ff |
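A downloaded archive can be checked against the SHA256 digests listed above before installing it. The following is a small sketch using only the standard library; `sha256_of_file` and `verify_download` are hypothetical helper names, and the file path in the usage example is a placeholder for wherever the archive was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's digest matches the expected hex string."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <downloaded-file> <expected-sha256>
    if len(sys.argv) == 3:
        print("match" if verify_download(sys.argv[1], sys.argv[2]) else "MISMATCH")
```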