Scalable Python SDK for document management and collaboration across organizations and AI agents
DocVault SDK
DocVault handles document upload, management, version control, and access control. It supports multi-organization isolation and role-based permissions, storing metadata in PostgreSQL and binary files in MinIO/S3.
Features
- Document Management: Upload, download, update, and delete documents
- Access Control: Role-based permissions (READ, WRITE, DELETE, SHARE, ADMIN)
- Version Control: Full document history with restore capabilities
- Multi-Organization: Strong isolation between organizations
- Full-Text Search: PostgreSQL-powered document search
- Cloud Storage: MinIO/S3 integration for binary file storage
- AI Agent Support: Designed for both human and AI agent collaboration
- High Performance: Built with async-first design using psqlpy
Quick Start
Installation
# Using uv (recommended)
uv add docvault-sdk
# Or using pip
pip install docvault-sdk
Basic Usage
import asyncio

from doc_vault import DocVaultSDK


async def main():
    # Initialize SDK (loads config from .env)
    async with DocVaultSDK() as vault:
        # Upload a document
        document = await vault.upload(
            file_path="./report.pdf",
            name="Q4 Financial Report",
            organization_id="org-123",
            agent_id="agent-456"
        )
        # Download the document
        content = await vault.download(
            document_id=document.id,
            agent_id="agent-456"
        )
        print(f"Uploaded document: {document.name}")


asyncio.run(main())
Try the Examples
Get started quickly with our comprehensive examples:
# Clone and setup
git clone https://github.com/docvault/doc-vault.git
cd doc-vault
# Install dependencies
uv sync
# Start services
docker-compose up -d
# Run basic usage example
uv run python examples/basic_usage.py
See Examples for detailed usage patterns including access control, versioning, and multi-organization scenarios.
Requirements
- Python: 3.10+
- Database: PostgreSQL 14+
- Storage: MinIO or AWS S3
- Memory: 512MB+ RAM
- Disk: Depends on document storage needs
Configuration
DocVault supports three configuration patterns for different deployment scenarios:
1. Direct Python Configuration (Recommended for PyPI Users)
For maximum control in your application, pass a Config object directly:
import asyncio

from doc_vault import DocVaultSDK
from doc_vault.config import Config

# Create configuration programmatically
config = Config(
    # PostgreSQL Configuration
    postgres_host="localhost",
    postgres_port=5432,
    postgres_user="postgres",
    postgres_password="your_password",
    postgres_db="doc_vault",
    postgres_ssl="disable",
    # MinIO/S3 Configuration
    minio_endpoint="localhost:9000",
    minio_access_key="minioadmin",
    minio_secret_key="minioadmin",
    minio_secure=False,
    # DocVault Configuration
    bucket_prefix="doc-vault",
    log_level="INFO"
)


async def main():
    # Use the configuration
    async with DocVaultSDK(config=config) as vault:
        document = await vault.upload(
            file_path="./report.pdf",
            name="Report",
            organization_id="org-123",
            agent_id="agent-456"
        )


asyncio.run(main())
2. Environment Variables (Recommended for Docker/Kubernetes)
Set environment variables before running your application:
# PostgreSQL
export POSTGRES_HOST=postgres
export POSTGRES_PORT=5432
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=password
export POSTGRES_DB=doc_vault
export POSTGRES_SSL=disable
# MinIO/S3
export MINIO_ENDPOINT=minio:9000
export MINIO_ACCESS_KEY=minioadmin
export MINIO_SECRET_KEY=minioadmin
export MINIO_SECURE=false
# DocVault
export BUCKET_PREFIX=doc-vault
export LOG_LEVEL=INFO
Then use the SDK without passing a config:
from doc_vault import DocVaultSDK

# Automatically loads from environment variables
async with DocVaultSDK() as vault:
    document = await vault.upload(...)
3. .env File Configuration (Convenient for Local Development)
Create a .env file in your project root (git-ignored):
# PostgreSQL Configuration
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password
POSTGRES_DB=doc_vault
POSTGRES_SSL=disable
# MinIO/S3 Configuration
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_SECURE=false
# DocVault Configuration
BUCKET_PREFIX=doc-vault
LOG_LEVEL=INFO
Install python-dotenv for local development:
uv sync --all-extras  # or: pip install "docvault-sdk[dev]"
Then use the SDK as before (automatically loads .env file):
from doc_vault import DocVaultSDK

async with DocVaultSDK() as vault:
    document = await vault.upload(...)
Configuration Priority
DocVault uses this priority order when loading configuration (first match wins):
1. Explicit Config object - When you pass Config(...) directly
2. Environment variables - POSTGRES_*, MINIO_* variables
3. .env file - Loaded automatically if python-dotenv is available
4. Defaults - Hardcoded defaults in the Config class
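The priority order above can be sketched with plain dictionaries. This is a minimal illustration, not DocVault's actual loader: `resolve_setting`, `DEFAULTS`, and the per-setting lookup are assumptions made for the example, and the real Config class may resolve values differently (for instance, an explicit Config may replace all other sources at once).

```python
import os
from typing import Mapping, Optional

# Hypothetical defaults for illustration; the real ones live in DocVault's Config class.
DEFAULTS = {"POSTGRES_HOST": "localhost", "POSTGRES_PORT": "5432"}


def resolve_setting(
    name: str,
    explicit: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
    dotenv: Optional[Mapping[str, str]] = None,
    defaults: Mapping[str, str] = DEFAULTS,
) -> Optional[str]:
    """Return the first value found, checking sources from highest to lowest priority."""
    environ = os.environ if environ is None else environ
    for source in (explicit or {}, environ, dotenv or {}, defaults):
        if name in source:
            return source[name]
    return None


# Simulated process environment and parsed .env contents.
env = {"POSTGRES_HOST": "postgres"}
dotenv_values = {"POSTGRES_HOST": "db.local", "POSTGRES_PORT": "6543"}

host = resolve_setting("POSTGRES_HOST", environ=env, dotenv=dotenv_values)
port = resolve_setting("POSTGRES_PORT", environ=env, dotenv=dotenv_values)
print(host, port)  # postgres 6543
```

Here the environment variable wins for the host, while the port falls through to the .env value because no higher-priority source defines it.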
Configuration Reference
| Variable | Type | Default | Description |
|---|---|---|---|
| POSTGRES_HOST | str | localhost | PostgreSQL server hostname |
| POSTGRES_PORT | int | 5432 | PostgreSQL server port |
| POSTGRES_USER | str | required | PostgreSQL username |
| POSTGRES_PASSWORD | str | required | PostgreSQL password |
| POSTGRES_DB | str | required | PostgreSQL database name |
| POSTGRES_SSL | str | disable | SSL mode: disable, prefer, or require |
| MINIO_ENDPOINT | str | required | MinIO/S3 endpoint (e.g., localhost:9000) |
| MINIO_ACCESS_KEY | str | required | MinIO/S3 access key ID |
| MINIO_SECRET_KEY | str | required | MinIO/S3 secret access key |
| MINIO_SECURE | bool | false | Use HTTPS for MinIO/S3 |
| BUCKET_PREFIX | str | doc-vault | Prefix for S3/MinIO bucket names |
| LOG_LEVEL | str | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
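Environment variables always arrive as strings, so the types and defaults in the table imply a parsing step. The sketch below shows one way that step could look using only the standard library. `Settings`, `parse_bool`, and `load_settings` are hypothetical names for this example and are not part of the DocVault API.

```python
from dataclasses import dataclass
from typing import Mapping

# Variables the reference table marks as required.
REQUIRED = (
    "POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB",
    "MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY",
)
SSL_MODES = {"disable", "prefer", "require"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of the optional settings (not DocVault's Config)."""
    postgres_host: str
    postgres_port: int
    postgres_ssl: str
    minio_secure: bool
    bucket_prefix: str
    log_level: str


def parse_bool(value: str) -> bool:
    """Treat common truthy spellings as True; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate required variables and convert the rest to their documented types."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    ssl_mode = env.get("POSTGRES_SSL", "disable")
    if ssl_mode not in SSL_MODES:
        raise ValueError(f"Unsupported POSTGRES_SSL value: {ssl_mode!r}")
    return Settings(
        postgres_host=env.get("POSTGRES_HOST", "localhost"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        postgres_ssl=ssl_mode,
        minio_secure=parse_bool(env.get("MINIO_SECURE", "false")),
        bucket_prefix=env.get("BUCKET_PREFIX", "doc-vault"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


example_env = {
    "POSTGRES_USER": "postgres", "POSTGRES_PASSWORD": "password",
    "POSTGRES_DB": "doc_vault", "POSTGRES_PORT": "5433",
    "MINIO_ENDPOINT": "minio:9000", "MINIO_ACCESS_KEY": "minioadmin",
    "MINIO_SECRET_KEY": "minioadmin", "MINIO_SECURE": "true",
}
settings = load_settings(example_env)
print(settings.postgres_port, settings.minio_secure)  # 5433 True
```

Failing early on missing required values tends to give clearer errors than a connection failure later on.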
Architecture
DocVault uses a three-layer architecture:
┌─────────────────────────────────────────────────────┐
│                    SDK API Layer                    │
│               (core.py - DocVaultSDK)               │
│  High-level methods: upload(), download(), list()   │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│                    Service Layer                    │
│  DocumentService | AccessService | VersionService   │
│            Business logic orchestration             │
└─────────────────────────────────────────────────────┘
                           ↓
┌───────────────────────────┬─────────────────────────┐
│  Repository Layer         │  Storage Layer          │
│  DocumentRepo             │  S3StorageBackend       │
│  OrganizationRepo         │  (MinIO/S3)             │
│  AgentRepo                │                         │
│  VersionRepo              │                         │
│  ACLRepo                  │                         │
└───────────────────────────┴─────────────────────────┘
                           ↓
┌───────────────────────────┬─────────────────────────┐
│  PostgreSQL               │  MinIO/S3               │
│  (Metadata, ACL)          │  (Binary Files)         │
└───────────────────────────┴─────────────────────────┘
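To make the layering concrete, here is a small self-contained sketch in which a service coordinates a metadata repository and a binary store. The class names `InMemoryDocumentRepo`, `InMemoryStorage`, and `DocumentRecord` are invented for illustration and use in-memory dictionaries in place of PostgreSQL and MinIO/S3. DocVault's real classes will differ in detail.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class DocumentRecord:
    """Metadata row: where the bytes live, not the bytes themselves."""
    id: str
    name: str
    storage_key: str


class InMemoryDocumentRepo:
    """Stands in for the repository layer (metadata in PostgreSQL)."""

    def __init__(self) -> None:
        self._rows: Dict[str, DocumentRecord] = {}

    async def insert(self, record: DocumentRecord) -> None:
        self._rows[record.id] = record

    async def get(self, document_id: str) -> DocumentRecord:
        return self._rows[document_id]


class InMemoryStorage:
    """Stands in for the storage layer (binary objects in MinIO/S3)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    async def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    async def get(self, key: str) -> bytes:
        return self._objects[key]


class DocumentService:
    """Service layer: orchestrates the repository and storage backends."""

    def __init__(self, repo: InMemoryDocumentRepo, storage: InMemoryStorage) -> None:
        self._repo = repo
        self._storage = storage

    async def upload(self, name: str, data: bytes) -> DocumentRecord:
        doc_id = str(uuid.uuid4())
        key = f"documents/{doc_id}"  # hypothetical key layout
        await self._storage.put(key, data)  # write bytes first
        record = DocumentRecord(id=doc_id, name=name, storage_key=key)
        await self._repo.insert(record)  # then record the metadata
        return record

    async def download(self, document_id: str) -> bytes:
        record = await self._repo.get(document_id)
        return await self._storage.get(record.storage_key)


async def demo() -> bytes:
    service = DocumentService(InMemoryDocumentRepo(), InMemoryStorage())
    record = await service.upload("report.pdf", b"%PDF-1.7 example")
    return await service.download(record.id)


result = asyncio.run(demo())
print(result)
```

Keeping metadata and bytes behind separate interfaces is what lets the two backends be swapped or tested independently.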
API Reference
Document Operations
# Upload document
document = await vault.upload(
    file_path="./document.pdf",
    name="Document Name",
    organization_id="org-123",
    agent_id="agent-456",
    description="Optional description",
    tags=["tag1", "tag2"],
    metadata={"custom": "data"}
)

# Download document
content = await vault.download(
    document_id=document.id,
    agent_id="agent-456",
    version=None  # None for latest, or specific version number
)

# Update metadata
updated = await vault.update_metadata(
    document_id=document.id,
    agent_id="agent-456",
    name="New Name",
    description="Updated description"
)

# Delete document
await vault.delete(document_id=document.id, agent_id="agent-456")

# List documents
result = await vault.list_docs(
    organization_id="org-123",
    agent_id="agent-456",
    limit=50
)
documents = result.documents

# Search documents
results = await vault.search(
    query="financial report",
    organization_id="org-123",
    agent_id="agent-456"
)
Access Control
# Set permissions for document using PermissionGrant models
from doc_vault.database.schemas.permission import PermissionGrant

await vault.set_permissions(
    document_id=document.id,
    permissions=[
        PermissionGrant(agent_id="agent-789", permission="READ"),
        PermissionGrant(agent_id="agent-012", permission="WRITE"),
    ],
    granted_by="agent-456"
)

# Get permissions
perms_result = await vault.get_permissions(
    document_id=document.id,
    agent_id="agent-789"
)
for acl in perms_result.permissions:
    print(f"Permission: {acl.permission}")
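As a rough mental model of role-based permissions, the sketch below keeps per-document grants in memory and answers "may this agent do X?". The `AclTable` class is hypothetical, and the rule that ADMIN implies every other permission is an assumption for the example. The source does not state how DocVault evaluates the five permission levels.

```python
from enum import Enum
from typing import Dict, Set, Tuple


class Permission(str, Enum):
    READ = "READ"
    WRITE = "WRITE"
    DELETE = "DELETE"
    SHARE = "SHARE"
    ADMIN = "ADMIN"


class AclTable:
    """In-memory stand-in for an access-control list keyed by (document, agent)."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Set[Permission]] = {}

    def grant(self, document_id: str, agent_id: str, permission: Permission) -> None:
        self._grants.setdefault((document_id, agent_id), set()).add(permission)

    def is_allowed(self, document_id: str, agent_id: str, needed: Permission) -> bool:
        held = self._grants.get((document_id, agent_id), set())
        # Assumption: ADMIN implies all other permissions.
        return needed in held or Permission.ADMIN in held


acl = AclTable()
acl.grant("doc-1", "agent-789", Permission.READ)
acl.grant("doc-1", "agent-456", Permission.ADMIN)

print(acl.is_allowed("doc-1", "agent-789", Permission.READ))    # True
print(acl.is_allowed("doc-1", "agent-789", Permission.WRITE))   # False
print(acl.is_allowed("doc-1", "agent-456", Permission.DELETE))  # True
```

An agent with no grant on a document is denied by default, which matches the deny-unless-granted behaviour one would generally expect from an ACL.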
Version Management
# Get document details with versions
details = await vault.get_document_details(
    document_id=document.id,
    agent_id="agent-456",
    include_versions=True
)
if details.versions:
    for v in details.versions:
        print(f"Version {v.version_number}: {v.created_at}")

# Restore previous version
restored = await vault.restore_version(
    document_id=document.id,
    version_number=2,
    agent_id="agent-456",
    change_description="Restored version 2"
)
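One common way to implement "restore" without losing history is to append a new version that copies the old content. The sketch below illustrates that idea with an in-memory list. `VersionHistory` is a hypothetical class, and whether DocVault's restore_version appends a new version or rewinds the current one is not stated in this README, so treat the behaviour shown as an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Version:
    version_number: int
    content: bytes
    change_description: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class VersionHistory:
    """Append-only history: restoring never deletes or rewrites older versions."""

    def __init__(self, initial: bytes) -> None:
        self.versions: List[Version] = [Version(1, initial, "Initial upload")]

    @property
    def current(self) -> Version:
        return self.versions[-1]

    def add(self, content: bytes, change_description: str) -> Version:
        version = Version(self.current.version_number + 1, content, change_description)
        self.versions.append(version)
        return version

    def restore(self, version_number: int, change_description: str) -> Version:
        source = next(
            (v for v in self.versions if v.version_number == version_number), None
        )
        if source is None:
            raise KeyError(f"No such version: {version_number}")
        # Assumption: a restore is recorded as a new version with the old content.
        return self.add(source.content, change_description)


history = VersionHistory(b"draft")
history.add(b"reviewed", "Review edits")
history.add(b"final", "Final wording")
restored = history.restore(2, "Restored version 2")

print(restored.version_number, restored.content)  # 4 b'reviewed'
```

With this design the full audit trail survives a restore, at the cost of storing one more copy of the content.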
Organization & Agent Management
# Register organization
org = await vault.register_organization(
    org_id="550e8400-e29b-41d4-a716-446655440000",
    metadata={"industry": "technology"}
)

# Register agent
agent = await vault.register_agent(
    agent_id="550e8400-e29b-41d4-a716-446655440001",
    organization_id=org.id,
    metadata={"role": "admin"}
)
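The registration calls above pass UUID-formatted identifiers. If your application generates or accepts such IDs, the standard library can create and validate them before they reach the SDK. The helpers `new_id` and `is_valid_uuid` are hypothetical names for this sketch, and whether DocVault strictly requires UUIDs is not stated here.

```python
import uuid


def new_id() -> str:
    """Generate a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def is_valid_uuid(value: str) -> bool:
    """Return True if the value parses as a UUID, False otherwise."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


org_id = new_id()
print(is_valid_uuid(org_id))                                  # True
print(is_valid_uuid("550e8400-e29b-41d4-a716-446655440000"))  # True
print(is_valid_uuid("org-123"))                               # False
```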
Examples
DocVault includes comprehensive examples demonstrating real-world usage patterns:
Core Functionality
- Basic Usage - Complete end-to-end workflow
- Access Control - Permission management and sharing
- Versioning - Document version control
- Multi-Organization - Cross-organization collaboration
Running Examples
# Install in development mode
pip install -e .
# Start required services
docker-compose up -d
# Run any example
python examples/basic_usage.py
Each example includes detailed comments explaining the concepts and expected output.
Development
Setup Development Environment
# Clone repository
git clone https://github.com/docvault/doc-vault.git
cd doc-vault
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Activate virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install pre-commit hooks
pre-commit install
Running Tests
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov
# Run specific test file
uv run pytest tests/test_config.py
Code Quality
# Format code
uv run black src/
# Lint code
uv run ruff check src/
# Type checking
uv run mypy src/
# Run all quality checks
uv run pre-commit run --all-files
Database Setup
DocVault requires PostgreSQL for metadata and MinIO for file storage.
Quick Setup with Docker Compose
# Start all services
docker-compose up -d
# Check services are running
docker-compose ps
# View service logs
docker-compose logs postgres
docker-compose logs minio
Manual Setup
# PostgreSQL
docker run -d --name postgres -p 5432:5432 \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=doc_vault \
  tensorchord/vchord-suite:pg16-latest

# MinIO
docker run -d --name minio -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"
Initialize Database
# Initialize database schema
uv run python -m doc_vault.database.init_db
Service Access
- PostgreSQL: localhost:5432
- MinIO API: localhost:9000
- MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
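Before running the examples it can help to confirm that these ports are reachable. The following sketch uses only the standard library to attempt a TCP connection to each service. The `SERVICES` mapping simply mirrors the addresses listed above, and the helper names are illustrative. A successful connection shows the port is open, not that the service is healthy.

```python
import socket
from typing import Dict, Tuple

# Addresses from the Service Access list above.
SERVICES: Dict[str, Tuple[str, int]] = {
    "PostgreSQL": ("localhost", 5432),
    "MinIO API": ("localhost", 9000),
    "MinIO Console": ("localhost", 9001),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```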
Documentation
- Full API Documentation - Complete API reference
- Examples - Usage examples and patterns
- Contributing Guide - How to contribute
- Development Guide - Local development setup
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Workflow
1. Fork the repository
2. Create a feature branch: git checkout -b feature/my-feature
3. Make your changes and add tests
4. Run quality checks: uv run pre-commit run --all-files
5. Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with psqlpy for high-performance PostgreSQL
- Uses Pydantic for data validation
- Powered by MinIO for S3-compatible storage
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Read the Docs
DocVault - Making document collaboration simple, secure, and scalable.