
Scalable Python SDK for document management and collaboration across organizations and AI agents


DocVault SDK



DocVault handles document upload, management, version control, and access control. It isolates data between organizations, enforces role-based permissions, and stores metadata in PostgreSQL and file contents in MinIO/S3.

✨ Features

  • 📁 Document Management: Upload, download, update, and delete documents
  • 🔒 Access Control: Role-based permissions (READ, WRITE, DELETE, SHARE, ADMIN)
  • 📚 Version Control: Full document history with restore capabilities
  • 🏢 Multi-Organization: Strong isolation between organizations
  • 🔍 Full-Text Search: PostgreSQL-powered document search
  • ☁️ Cloud Storage: MinIO/S3 integration for binary file storage
  • 🤖 AI Agent Support: Designed for both human and AI agent collaboration
  • ⚡ High Performance: Async-first design built on psqlpy

🚀 Quick Start

Installation

# Using uv (recommended)
uv add docvault-sdk

# Or using pip
pip install docvault-sdk

Basic Usage

import asyncio
from doc_vault import DocVaultSDK

async def main():
    # Initialize SDK (reads configuration from environment variables or a .env file)
    async with DocVaultSDK() as vault:
        # Upload a document
        document = await vault.upload(
            file_path="./report.pdf",
            name="Q4 Financial Report",
            organization_id="org-123",
            agent_id="agent-456"
        )

        # Download the document
        content = await vault.download(
            document_id=document.id,
            agent_id="agent-456"
        )

        print(f"Uploaded document: {document.name}")

asyncio.run(main())

Try the Examples

Get started quickly with our comprehensive examples:

# Clone and setup
git clone https://github.com/docvault/doc-vault.git
cd doc-vault

# Install dependencies
uv sync

# Start services
docker-compose up -d

# Run basic usage example
uv run python examples/basic_usage.py

See Examples for detailed usage patterns including access control, versioning, and multi-organization scenarios.

📋 Requirements

  • Python: 3.10+
  • Database: PostgreSQL 14+
  • Storage: MinIO or AWS S3
  • Memory: 512MB+ RAM
  • Disk: Depends on document storage needs

⚙️ Configuration

DocVault supports three configuration patterns for different deployment scenarios:

1. Direct Python Configuration (Recommended for PyPI Users)

For maximum control in your application, pass a Config object directly:

import asyncio

from doc_vault import DocVaultSDK
from doc_vault.config import Config

# Create configuration programmatically
config = Config(
    # PostgreSQL Configuration
    postgres_host="localhost",
    postgres_port=5432,
    postgres_user="postgres",
    postgres_password="your_password",
    postgres_db="doc_vault",
    postgres_ssl="disable",

    # MinIO/S3 Configuration
    minio_endpoint="localhost:9000",
    minio_access_key="minioadmin",
    minio_secret_key="minioadmin",
    minio_secure=False,

    # DocVault Configuration
    bucket_prefix="doc-vault",
    log_level="INFO",
)


async def main() -> None:
    # Use the configuration (async with must run inside a coroutine)
    async with DocVaultSDK(config=config) as vault:
        document = await vault.upload(
            file_path="./report.pdf",
            name="Report",
            organization_id="org-123",
            agent_id="agent-456",
        )
        print(f"Uploaded document: {document.name}")


asyncio.run(main())

2. Environment Variables (Recommended for Docker/Kubernetes)

Set environment variables before running your application:

# PostgreSQL
export POSTGRES_HOST=postgres
export POSTGRES_PORT=5432
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=password
export POSTGRES_DB=doc_vault
export POSTGRES_SSL=disable

# MinIO/S3
export MINIO_ENDPOINT=minio:9000
export MINIO_ACCESS_KEY=minioadmin
export MINIO_SECRET_KEY=minioadmin
export MINIO_SECURE=false

# DocVault
export BUCKET_PREFIX=doc-vault
export LOG_LEVEL=INFO

Then use the SDK without passing a config:

from doc_vault import DocVaultSDK

# Automatically loads from environment variables
async with DocVaultSDK() as vault:
    document = await vault.upload(...)

3. .env File Configuration (Convenient for Local Development)

Create a .env file in your project root (and add it to .gitignore so credentials are never committed):

# PostgreSQL Configuration
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password
POSTGRES_DB=doc_vault
POSTGRES_SSL=disable

# MinIO/S3 Configuration
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_SECURE=false

# DocVault Configuration
BUCKET_PREFIX=doc-vault
LOG_LEVEL=INFO

Install python-dotenv for local development:

uv add python-dotenv  # or: pip install python-dotenv

Then use the SDK as before; the .env file is loaded automatically:

from doc_vault import DocVaultSDK

async with DocVaultSDK() as vault:
    document = await vault.upload(...)
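python-dotenv does the .env parsing for you. To show roughly what that step amounts to, here is a minimal stdlib sketch that turns KEY=VALUE lines like the ones above into a dict. It is an illustration only, not DocVault's or python-dotenv's loader: it assumes one assignment per line and ignores variable expansion, multi-line values, and inline comments. The names `parse_env_text` and `load_env_file` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        if line.startswith("export "):
            line = line[len("export "):].lstrip()  # tolerate shell-style exports
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Strip one layer of matching single or double quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Read a .env file if it exists; return an empty dict otherwise."""
    p = Path(path)
    return parse_env_text(p.read_text(encoding="utf-8")) if p.is_file() else {}


sample = "# PostgreSQL\nPOSTGRES_HOST=localhost\nPOSTGRES_PORT=5432\nMINIO_SECURE=false\n"
print(parse_env_text(sample))
# {'POSTGRES_HOST': 'localhost', 'POSTGRES_PORT': '5432', 'MINIO_SECURE': 'false'}
```

Note that every value comes back as a string; converting types such as the port number or the `MINIO_SECURE` flag is a separate step.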

Configuration Priority

DocVault uses this priority order when loading configuration (first match wins):

  1. Explicit Config object - When you pass Config(...) directly
  2. Environment variables - POSTGRES_*, MINIO_* variables
  3. .env file - Loaded automatically if python-dotenv is available
  4. Defaults - Hardcoded defaults in the Config class
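The precedence rules above can be sketched as a lookup that walks the sources in order and returns the first hit. This is a hypothetical helper (`resolve_setting`, `DEFAULTS`), not part of DocVault; it assumes explicit settings use the lowercase `Config` field names, while the other sources use the uppercase variable names from the Configuration Reference.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any

# Hypothetical defaults mirroring the non-required rows of the reference table
DEFAULTS: dict[str, str] = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_SSL": "disable",
    "MINIO_SECURE": "false",
    "BUCKET_PREFIX": "doc-vault",
    "LOG_LEVEL": "INFO",
}


def resolve_setting(
    key: str,
    explicit: Mapping[str, Any] | None = None,
    environ: Mapping[str, str] | None = None,
    dotenv: Mapping[str, str] | None = None,
) -> Any:
    """Return the first value found: explicit > environment > .env > defaults."""
    explicit = explicit or {}
    if key.lower() in explicit:  # e.g. postgres_host=... passed to Config(...)
        return explicit[key.lower()]
    environ = os.environ if environ is None else environ
    for source in (environ, dotenv or {}, DEFAULTS):
        if key in source:
            return source[key]
    raise KeyError(f"{key} is required but was not set")


print(resolve_setting("POSTGRES_HOST", explicit={"postgres_host": "db.internal"},
                      environ={"POSTGRES_HOST": "postgres"}))  # db.internal
print(resolve_setting("LOG_LEVEL", environ={}, dotenv={}))  # INFO
```

Passing `environ` and `dotenv` explicitly keeps the function easy to test without touching the real process environment.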

Configuration Reference

Variable           Type  Default    Description
POSTGRES_HOST      str   localhost  PostgreSQL server hostname
POSTGRES_PORT      int   5432       PostgreSQL server port
POSTGRES_USER      str   required   PostgreSQL username
POSTGRES_PASSWORD  str   required   PostgreSQL password
POSTGRES_DB        str   required   PostgreSQL database name
POSTGRES_SSL       str   disable    SSL mode: disable, prefer, or require
MINIO_ENDPOINT     str   required   MinIO/S3 endpoint (e.g., localhost:9000)
MINIO_ACCESS_KEY   str   required   MinIO/S3 access key ID
MINIO_SECRET_KEY   str   required   MinIO/S3 secret access key
MINIO_SECURE       bool  false      Use HTTPS for MinIO/S3
BUCKET_PREFIX      str   doc-vault  Prefix for S3/MinIO bucket names
LOG_LEVEL          str   INFO       Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
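Environment variables always arrive as strings, so something has to coerce them to the types in the table and reject values outside the allowed sets. A minimal sketch of that step follows; `Settings`, `settings_from_env`, and `parse_bool` are hypothetical names. The accepted boolean spellings and the bucket-prefix check (a simplified form of the S3 bucket naming rules: lowercase letters, digits, and hyphens, starting and ending with a letter or digit) are assumptions, not DocVault's documented behavior.

```python
from __future__ import annotations

import re
from collections.abc import Mapping
from dataclasses import dataclass

SSL_MODES = {"disable", "prefer", "require"}
LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
REQUIRED = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB",
            "MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")
# ASSUMPTION: simplified S3-style naming check for the bucket prefix
BUCKET_PREFIX_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def parse_bool(value: str) -> bool:
    lowered = value.strip().lower()
    if lowered in {"1", "true", "yes", "on"}:
        return True
    if lowered in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


@dataclass(frozen=True)
class Settings:
    postgres_host: str
    postgres_port: int
    postgres_user: str
    postgres_password: str
    postgres_db: str
    postgres_ssl: str
    minio_endpoint: str
    minio_access_key: str
    minio_secret_key: str
    minio_secure: bool
    bucket_prefix: str
    log_level: str


def settings_from_env(env: Mapping[str, str]) -> Settings:
    """Validate and coerce raw string settings; raise ValueError on bad input."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    ssl = env.get("POSTGRES_SSL", "disable")
    if ssl not in SSL_MODES:
        raise ValueError(f"POSTGRES_SSL must be one of {sorted(SSL_MODES)}")
    level = env.get("LOG_LEVEL", "INFO").upper()
    if level not in LOG_LEVELS:
        raise ValueError(f"unsupported LOG_LEVEL: {level}")
    prefix = env.get("BUCKET_PREFIX", "doc-vault")
    if not BUCKET_PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"invalid BUCKET_PREFIX: {prefix!r}")
    return Settings(
        postgres_host=env.get("POSTGRES_HOST", "localhost"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        postgres_user=env["POSTGRES_USER"],
        postgres_password=env["POSTGRES_PASSWORD"],
        postgres_db=env["POSTGRES_DB"],
        postgres_ssl=ssl,
        minio_endpoint=env["MINIO_ENDPOINT"],
        minio_access_key=env["MINIO_ACCESS_KEY"],
        minio_secret_key=env["MINIO_SECRET_KEY"],
        minio_secure=parse_bool(env.get("MINIO_SECURE", "false")),
        bucket_prefix=prefix,
        log_level=level,
    )
```

Failing fast at startup with a message naming the missing variables is usually easier to debug than a connection error later.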

🏗️ Architecture

DocVault uses a three-layer architecture:

┌─────────────────────────────────────────────────────┐
│                   SDK API Layer                     │
│              (core.py - DocVaultSDK)                │
│  High-level methods: upload(), download(), list()   │
└─────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────┐
│                  Service Layer                      │
│   DocumentService | AccessService | VersionService  │
│         Business logic orchestration                │
└─────────────────────────────────────────────────────┘
                        ↓
┌───────────────────────────┬─────────────────────────┐
│    Repository Layer       │    Storage Layer        │
│  DocumentRepo             │    S3StorageBackend     │
│  OrganizationRepo         │    (MinIO/S3)           │
│  AgentRepo                │                         │
│  VersionRepo              │                         │
│  ACLRepo                  │                         │
└───────────────────────────┴─────────────────────────┘
                        ↓
┌───────────────────────────┬─────────────────────────┐
│      PostgreSQL           │      MinIO/S3           │
│   (Metadata, ACL)         │   (Binary Files)        │
└───────────────────────────┴─────────────────────────┘
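To make the layering concrete, here is a toy, in-memory version of the same shape: two `Protocol` interfaces standing in for the repository and storage layers, and a service that coordinates them. Every class here is hypothetical (dictionaries replace PostgreSQL and MinIO), and the write order in `upload` (bytes first, then metadata) is one reasonable design choice, not necessarily DocVault's.

```python
from __future__ import annotations

import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class DocumentRecord:
    id: str
    name: str
    organization_id: str
    storage_key: str


class MetadataRepo(Protocol):
    """Repository layer: document metadata (PostgreSQL in DocVault)."""

    async def insert(self, doc: DocumentRecord) -> None: ...

    async def get(self, doc_id: str) -> DocumentRecord: ...


class BlobStore(Protocol):
    """Storage layer: raw file bytes (MinIO/S3 in DocVault)."""

    async def put(self, key: str, data: bytes) -> None: ...

    async def fetch(self, key: str) -> bytes: ...


@dataclass
class InMemoryMetadataRepo:
    rows: dict[str, DocumentRecord] = field(default_factory=dict)

    async def insert(self, doc: DocumentRecord) -> None:
        self.rows[doc.id] = doc

    async def get(self, doc_id: str) -> DocumentRecord:
        return self.rows[doc_id]


@dataclass
class InMemoryBlobStore:
    objects: dict[str, bytes] = field(default_factory=dict)

    async def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    async def fetch(self, key: str) -> bytes:
        return self.objects[key]


class DocumentServiceSketch:
    """Service layer: coordinates metadata and storage for one use case."""

    def __init__(self, repo: MetadataRepo, store: BlobStore) -> None:
        self.repo = repo
        self.store = store

    async def upload(self, name: str, organization_id: str, data: bytes) -> DocumentRecord:
        doc_id = str(uuid.uuid4())
        key = f"{organization_id}/{doc_id}"  # objects namespaced per organization
        # Store bytes first so a metadata row never points at a missing object
        await self.store.put(key, data)
        doc = DocumentRecord(doc_id, name, organization_id, key)
        await self.repo.insert(doc)
        return doc

    async def download(self, doc_id: str) -> bytes:
        doc = await self.repo.get(doc_id)
        return await self.store.fetch(doc.storage_key)


async def demo() -> bytes:
    service = DocumentServiceSketch(InMemoryMetadataRepo(), InMemoryBlobStore())
    doc = await service.upload("Q4 Financial Report", "org-123", b"%PDF-1.7 ...")
    return await service.download(doc.id)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # b'%PDF-1.7 ...'
```

Because the service depends only on the two protocols, tests can pass the in-memory classes while production code passes PostgreSQL- and S3-backed implementations.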

📖 API Reference

Document Operations

# Upload document
document = await vault.upload(
    file_path="./document.pdf",
    name="Document Name",
    organization_id="org-123",
    agent_id="agent-456",
    description="Optional description",
    tags=["tag1", "tag2"],
    metadata={"custom": "data"}
)

# Download document
content = await vault.download(
    document_id=document.id,
    agent_id="agent-456",
    version=None  # None for latest, or specific version number
)

# Update metadata
updated = await vault.update_metadata(
    document_id=document.id,
    agent_id="agent-456",
    name="New Name",
    description="Updated description"
)

# Delete document
await vault.delete(document_id=document.id, agent_id="agent-456")

# List documents
result = await vault.list_docs(
    organization_id="org-123",
    agent_id="agent-456",
    limit=50
)
documents = result.documents

# Search documents
results = await vault.search(
    query="financial report",
    organization_id="org-123",
    agent_id="agent-456"
)
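PostgreSQL full-text search is typically expressed with `to_tsvector`, `websearch_to_tsquery` (available since PostgreSQL 11, so covered by the 14+ requirement), and `ts_rank`. The sketch below builds such a query with positional `$n` parameters. The `documents` table and its columns are hypothetical and may not match DocVault's schema; the point is the general technique, including scoping every search to a single organization.

```python
from __future__ import annotations

# Hypothetical schema: documents(id, organization_id, name, description).
# DocVault's actual tables and columns may differ.
_DOC_TSVECTOR = "to_tsvector('english', coalesce(name, '') || ' ' || coalesce(description, ''))"

SEARCH_SQL = f"""
SELECT id, name, ts_rank({_DOC_TSVECTOR}, websearch_to_tsquery('english', $1)) AS rank
FROM documents
WHERE organization_id = $2
  AND {_DOC_TSVECTOR} @@ websearch_to_tsquery('english', $1)
ORDER BY rank DESC
LIMIT $3
"""


def build_search_query(query: str, organization_id: str, limit: int = 20) -> tuple[str, list[object]]:
    """Return SQL and positional parameters; user text is never spliced into the SQL."""
    if not query.strip():
        raise ValueError("search query must not be empty")
    if not 1 <= limit <= 100:  # arbitrary cap chosen for this sketch
        raise ValueError("limit must be between 1 and 100")
    return SEARCH_SQL, [query, organization_id, limit]


sql, params = build_search_query("financial report", "org-123")
print(params)  # ['financial report', 'org-123', 20]
```

In a production schema the `tsvector` expression would usually live in a stored generated column with a GIN index rather than being recomputed per query.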

Access Control

# Set permissions for document using PermissionGrant models
from doc_vault.database.schemas.permission import PermissionGrant

await vault.set_permissions(
    document_id=document.id,
    permissions=[
        PermissionGrant(agent_id="agent-789", permission="READ"),
        PermissionGrant(agent_id="agent-012", permission="WRITE"),
    ],
    granted_by="agent-456"
)

# Get permissions
perms_result = await vault.get_permissions(
    document_id=document.id,
    agent_id="agent-789"
)
for acl in perms_result.permissions:
    print(f"Permission: {acl.permission}")
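The permission names from the feature list (READ, WRITE, DELETE, SHARE, ADMIN) can be modeled as an enum plus a lookup keyed by document and agent. The sketch below is hypothetical (`Permission`, `AclSketch`): it assumes that ADMIN implies every other permission, that the uploading agent starts with ADMIN, and that granting requires SHARE. This README states none of these rules.

```python
from __future__ import annotations

from enum import Enum


class Permission(str, Enum):
    READ = "READ"
    WRITE = "WRITE"
    DELETE = "DELETE"
    SHARE = "SHARE"
    ADMIN = "ADMIN"


def has_permission(granted: set[Permission], required: Permission) -> bool:
    # ASSUMPTION: ADMIN implies every other permission
    return Permission.ADMIN in granted or required in granted


class AclSketch:
    """Hypothetical in-memory ACL keyed by (document_id, agent_id)."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], set[Permission]] = {}

    def register_owner(self, document_id: str, agent_id: str) -> None:
        # ASSUMPTION: the uploading agent starts with full control
        self._grants.setdefault((document_id, agent_id), set()).add(Permission.ADMIN)

    def check(self, document_id: str, agent_id: str, required: Permission) -> bool:
        return has_permission(self._grants.get((document_id, agent_id), set()), required)

    def grant(self, document_id: str, granted_by: str, agent_id: str, permission: str) -> None:
        # ASSUMPTION: only agents holding SHARE (or ADMIN) may grant access
        if not self.check(document_id, granted_by, Permission.SHARE):
            raise PermissionError(f"{granted_by} may not share {document_id}")
        self._grants.setdefault((document_id, agent_id), set()).add(Permission(permission))


acl = AclSketch()
acl.register_owner("doc-1", "agent-456")
acl.grant("doc-1", granted_by="agent-456", agent_id="agent-789", permission="READ")
print(acl.check("doc-1", "agent-789", Permission.READ))   # True
print(acl.check("doc-1", "agent-789", Permission.WRITE))  # False
```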

Version Management

# Get document details with versions
details = await vault.get_document_details(
    document_id=document.id,
    agent_id="agent-456",
    include_versions=True
)
if details.versions:
    for v in details.versions:
        print(f"Version {v.version_number}: {v.created_at}")

# Restore previous version
restored = await vault.restore_version(
    document_id=document.id,
    version_number=2,
    agent_id="agent-456",
    change_description="Restored version 2"
)
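A minimal model of document history, assuming (not confirmed by this README) that version numbers start at 1, that a missing version number means the latest version, as with `version=None` in `download()`, and that a restore appends a new version holding the old content instead of deleting later versions. `VersionHistory` and `Version` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    version_number: int
    content: bytes
    change_description: str
    created_at: datetime


@dataclass
class VersionHistory:
    """Hypothetical append-only history for a single document."""

    versions: list[Version] = field(default_factory=list)

    def add(self, content: bytes, change_description: str) -> Version:
        v = Version(len(self.versions) + 1, content, change_description,
                    datetime.now(timezone.utc))
        self.versions.append(v)
        return v

    def get(self, version_number: int | None = None) -> Version:
        # None means "latest", mirroring version=None in download()
        if not self.versions:
            raise LookupError("document has no versions")
        if version_number is None:
            return self.versions[-1]
        if not 1 <= version_number <= len(self.versions):
            raise LookupError(f"no version {version_number}")
        return self.versions[version_number - 1]

    def restore(self, version_number: int, change_description: str) -> Version:
        # ASSUMPTION: restoring copies old content into a NEW version,
        # so existing history is never rewritten
        old = self.get(version_number)
        return self.add(old.content, change_description)


history = VersionHistory()
history.add(b"draft", "Initial upload")
history.add(b"final", "Edited figures")
restored = history.restore(1, "Restored version 1")
print(restored.version_number, restored.content)  # 3 b'draft'
```

Keeping restores append-only means an accidental restore can itself be undone by restoring the newer version.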

Organization & Agent Management

# Register organization
org = await vault.register_organization(
    org_id="550e8400-e29b-41d4-a716-446655440000",
    metadata={"industry": "technology"}
)

# Register agent
agent = await vault.register_agent(
    agent_id="550e8400-e29b-41d4-a716-446655440001",
    organization_id=org.id,
    metadata={"role": "admin"}
)
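The registration example uses UUID strings, which `uuid.uuid4()` produces directly. The sketch below also illustrates the kind of check that multi-organization isolation implies: an agent can reach a document only if both belong to the same organization. `TenantRegistry` and `new_id` are hypothetical, and this README does not say whether DocVault requires UUIDs for every ID (the Quick Start uses `org-123`).

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


def new_id() -> str:
    """Generate an ID in the same format as the registration example."""
    return str(uuid.uuid4())


@dataclass
class TenantRegistry:
    """Hypothetical registry: each agent and document belongs to one organization."""

    agent_org: dict[str, str] = field(default_factory=dict)
    doc_org: dict[str, str] = field(default_factory=dict)

    def register_agent(self, agent_id: str, organization_id: str) -> None:
        self.agent_org[agent_id] = organization_id

    def add_document(self, document_id: str, organization_id: str) -> None:
        self.doc_org[document_id] = organization_id

    def same_organization(self, agent_id: str, document_id: str) -> bool:
        # Unknown agents or documents are denied rather than allowed
        org = self.agent_org.get(agent_id)
        return org is not None and org == self.doc_org.get(document_id)


registry = TenantRegistry()
org_a, org_b = new_id(), new_id()
registry.register_agent("agent-a", org_a)
registry.register_agent("agent-b", org_b)
registry.add_document("doc-1", org_a)
print(registry.same_organization("agent-a", "doc-1"))  # True
print(registry.same_organization("agent-b", "doc-1"))  # False
```

An organization check like this would run before any per-document permission check, so a grant can never leak a document across organizations.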

💡 Examples

DocVault includes comprehensive examples demonstrating real-world usage patterns:

Core Functionality

Running Examples

# Install in development mode
pip install -e .

# Start required services
docker-compose up -d

# Run any example
python examples/basic_usage.py

Each example includes detailed comments explaining the concepts and expected output.

🛠️ Development

Setup Development Environment

# Clone repository
git clone https://github.com/docvault/doc-vault.git
cd doc-vault

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Activate virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov

# Run specific test file
uv run pytest tests/test_config.py

Code Quality

# Format code
uv run black src/

# Lint code
uv run ruff check src/

# Type checking
uv run mypy src/

# Run all quality checks
uv run pre-commit run --all-files

Database Setup

DocVault requires PostgreSQL for metadata and MinIO (or S3) for file storage.

Quick Setup with Docker Compose

# Start all services
docker-compose up -d

# Check services are running
docker-compose ps

# View service logs
docker-compose logs postgres
docker-compose logs minio

Manual Setup

# PostgreSQL
docker run -d --name postgres -p 5432:5432 \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=doc_vault \
  tensorchord/vchord-suite:pg16-latest

# MinIO
docker run -d --name minio -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"

Initialize Database

# Initialize database schema
uv run python -m doc_vault.database.init_db

Service Access

  • PostgreSQL: localhost:5432
  • MinIO API: localhost:9000
  • MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
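Before running the examples, a quick TCP probe can confirm that the ports listed above accept connections. This is a stdlib-only sketch; the `check_services` and `is_port_open` helpers are hypothetical, and a successful probe says nothing about credentials, the database schema, or bucket access.

```python
from __future__ import annotations

import socket

SERVICES: dict[str, tuple[str, int]] = {
    "PostgreSQL": ("localhost", 5432),
    "MinIO API": ("localhost", 9000),
    "MinIO Console": ("localhost", 9001),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_services(services: dict[str, tuple[str, int]] = SERVICES) -> dict[str, bool]:
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:15} {'up' if ok else 'DOWN'}")
```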

📚 Documentation

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Make your changes and add tests
  4. Run quality checks: uv run pre-commit run --all-files
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with psqlpy for high-performance PostgreSQL
  • Uses Pydantic for data validation
  • Powered by MinIO for S3-compatible storage

📞 Support


DocVault - Making document collaboration simple, secure, and scalable.

Download files


Source Distribution

docvault_sdk-2.2.1.tar.gz (321.1 kB)

Uploaded Source

Built Distribution


docvault_sdk-2.2.1-py3-none-any.whl (84.1 kB)

Uploaded Python 3

File details

Details for the file docvault_sdk-2.2.1.tar.gz.

File metadata

  • Download URL: docvault_sdk-2.2.1.tar.gz
  • Size: 321.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for docvault_sdk-2.2.1.tar.gz
Algorithm Hash digest
SHA256 34a4b7702c5e459e1208a59e5a8bef811a78c82b52d16e42caec86fbb78be881
MD5 89a64efd453324b22edd2f166892ad97
BLAKE2b-256 0b29491eca32b2edf263d5bad3f482233b2724adaf1aba4d433d87caab7aa32b
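To check a downloaded file against the SHA256 digests listed on this page, hash it in chunks with `hashlib` and compare. The helper names are hypothetical; the expected digests are copied from this page. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path

# Digests as published for docvault-sdk 2.2.1
EXPECTED_SHA256 = {
    "docvault_sdk-2.2.1.tar.gz": "34a4b7702c5e459e1208a59e5a8bef811a78c82b52d16e42caec86fbb78be881",
    "docvault_sdk-2.2.1-py3-none-any.whl": "e36cede744e7f0aa7820e2fbf7f35c075423850d6416983167e9f101f68ed410",
}


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path) -> bool:
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, "OK" if verify(arg) else "MISMATCH")
```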


File details

Details for the file docvault_sdk-2.2.1-py3-none-any.whl.

File metadata

  • Download URL: docvault_sdk-2.2.1-py3-none-any.whl
  • Size: 84.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for docvault_sdk-2.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e36cede744e7f0aa7820e2fbf7f35c075423850d6416983167e9f101f68ed410
MD5 2d0a5e381df5bf628ce97952c76b9892
BLAKE2b-256 1f880cab5c9abfd3ea1719476f8cb0ef8c3b67b88c20967edf106a8928bc9caa

