Skip to main content

Chuk Artifacts provides a production-ready, modular artifact storage system that works seamlessly across multiple storage backends (memory, filesystem, AWS S3, IBM Cloud Object Storage) with Redis or memory-based metadata caching and strict session-based security.

Project description

Chuk Artifacts

Async artifact storage with session-based security and multi-backend support

A production-ready Python library for storing and managing files across multiple storage backends (S3, IBM COS, filesystem, memory) with Redis-based metadata caching and strict session isolation.

Python 3.11+ License: MIT Async

Why Chuk Artifacts?

  • ๐Ÿ”’ Session-based security - Every file belongs to a session, preventing data leaks
  • ๐ŸŒ Multiple backends - Switch between S3, filesystem, memory without code changes
  • โšก High performance - 3,000+ operations/second with async/await
  • ๐ŸŽฏ Zero config - Works out of the box, configure only what you need
  • ๐Ÿ”— Presigned URLs - Secure file access without exposing credentials
  • ๐Ÿ“ฆ Grid architecture - Organized paths for scalability and federation

Quick Start

Installation

pip install chuk-artifacts

30-Second Example

from chuk_artifacts import ArtifactStore

# Works immediately - no configuration needed
async with ArtifactStore() as store:
    # Store a file
    file_id = await store.store(
        data=b"Hello, world!",
        mime="text/plain", 
        summary="My first file",
        filename="hello.txt"
    )
    
    # Get it back
    content = await store.retrieve(file_id)
    print(content.decode())  # "Hello, world!"
    
    # Share with a secure URL
    url = await store.presign(file_id)
    print(f"Download: {url}")

That's it! No AWS credentials, no Redis setup, no configuration files. Perfect for development and testing.

Core Concepts

Sessions = Security Boundaries

Every file belongs to a session. Sessions prevent users from accessing each other's files:

# Files are isolated by session
alice_file = await store.store(
    data=b"Alice's private data",
    mime="text/plain",
    summary="Private file",
    session_id="user_alice"  # Alice's session
)

bob_file = await store.store(
    data=b"Bob's private data", 
    mime="text/plain",
    summary="Private file",
    session_id="user_bob"  # Bob's session
)

# Alice can't access Bob's files
alice_files = await store.list_by_session("user_alice")  # Only Alice's files
bob_files = await store.list_by_session("user_bob")      # Only Bob's files

# Cross-session operations are blocked
await store.copy_file(alice_file, target_session_id="user_bob")  # โŒ Denied

Grid Architecture

Files are organized in a predictable hierarchy:

grid/
โ”œโ”€โ”€ sandbox_id/
โ”‚   โ”œโ”€โ”€ session_alice/
โ”‚   โ”‚   โ”œโ”€โ”€ file_1
โ”‚   โ”‚   โ””โ”€โ”€ file_2
โ”‚   โ””โ”€โ”€ session_bob/
โ”‚       โ”œโ”€โ”€ file_3
โ”‚       โ””โ”€โ”€ file_4

This makes the system multi-tenant safe and federation-ready.

File Operations

Basic Operations

# Store a file
file_id = await store.store(
    data=file_bytes,
    mime="image/jpeg",
    summary="Profile photo",
    filename="avatar.jpg",
    session_id="user_123"
)

# Read file content directly  
content = await store.read_file(file_id, as_text=True)

# Write text files easily
doc_id = await store.write_file(
    content="# My Document\n\nHello world!",
    filename="docs/readme.md",
    mime="text/markdown",
    session_id="user_123"
)

# Check if file exists
if await store.exists(file_id):
    print("File found!")

# Delete file
await store.delete(file_id)

Directory-Like Operations

# List files in a session
files = await store.list_by_session("user_123")

# List files in a "directory"
docs = await store.get_directory_contents("user_123", "docs/")
images = await store.get_directory_contents("user_123", "images/")

# Copy files (within same session only)
backup_id = await store.copy_file(
    file_id,
    new_filename="docs/readme_backup.md"
)

Metadata and Updates

# Get file metadata
meta = await store.metadata(file_id)
print(f"Size: {meta['bytes']} bytes")
print(f"Created: {meta['stored_at']}")

# Update metadata
await store.update_metadata(
    file_id,
    summary="Updated description",
    meta={"version": 2, "author": "Alice"}
)

# Update file content
await store.update_file(
    file_id,
    data=b"New content",
    summary="Updated file"
)

Storage Providers

Memory Provider (Default)

Perfect for development and testing:

# Automatic - no configuration needed
store = ArtifactStore()
  • โœ… Zero setup
  • โœ… Fast
  • โŒ Non-persistent (lost on restart)

Filesystem Provider

Local disk storage:

store = ArtifactStore(storage_provider="filesystem")

# Or via environment
export ARTIFACT_PROVIDER=filesystem
export ARTIFACT_FS_ROOT=./my-files
  • โœ… Persistent
  • โœ… Good for development
  • โœ… Easy debugging
  • โŒ Not suitable for production clustering

AWS S3 Provider

Production-ready cloud storage:

store = ArtifactStore(storage_provider="s3")

# Configure via environment
export ARTIFACT_PROVIDER=s3
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret  
export AWS_REGION=us-east-1
export ARTIFACT_BUCKET=my-bucket
  • โœ… Highly scalable
  • โœ… Durable (99.999999999%)
  • โœ… Native presigned URLs
  • โœ… Production ready

IBM Cloud Object Storage

Enterprise object storage:

# HMAC credentials
store = ArtifactStore(storage_provider="ibm_cos")

export ARTIFACT_PROVIDER=ibm_cos
export AWS_ACCESS_KEY_ID=your_hmac_key
export AWS_SECRET_ACCESS_KEY=your_hmac_secret
export IBM_COS_ENDPOINT=https://s3.us-south.cloud-object-storage.appdomain.cloud

# Or IAM credentials  
store = ArtifactStore(storage_provider="ibm_cos_iam")

export ARTIFACT_PROVIDER=ibm_cos_iam
export IBM_COS_APIKEY=your_api_key
export IBM_COS_INSTANCE_CRN=crn:v1:bluemix:public:cloud-object-storage:...

Session Providers

Memory Sessions (Default)

Fast, in-memory metadata storage:

store = ArtifactStore(session_provider="memory")
  • โœ… Fast
  • โœ… No setup
  • โŒ Non-persistent
  • โŒ Single instance only

Redis Sessions

Persistent, shared metadata storage:

store = ArtifactStore(session_provider="redis")

# Configure via environment
export SESSION_PROVIDER=redis
export SESSION_REDIS_URL=redis://localhost:6379/0
  • โœ… Persistent
  • โœ… Shared across instances
  • โœ… Production ready
  • โœ… High performance

Environment Variables

Variable Description Default Examples
Storage Configuration
ARTIFACT_PROVIDER Storage backend memory s3, filesystem, ibm_cos
ARTIFACT_BUCKET Bucket/container name artifacts my-files, prod-storage
ARTIFACT_FS_ROOT Filesystem root directory ./artifacts /data/files, ~/storage
ARTIFACT_SANDBOX_ID Sandbox identifier Auto-generated myapp, prod-env
Session Configuration
SESSION_PROVIDER Session metadata storage memory redis
SESSION_REDIS_URL Redis connection URL - redis://localhost:6379/0
AWS/S3 Configuration
AWS_ACCESS_KEY_ID AWS access key - AKIA...
AWS_SECRET_ACCESS_KEY AWS secret key - abc123...
AWS_REGION AWS region us-east-1 us-west-2, eu-west-1
S3_ENDPOINT_URL Custom S3 endpoint - https://minio.example.com
IBM COS Configuration
IBM_COS_ENDPOINT IBM COS endpoint - https://s3.us-south.cloud-object-storage.appdomain.cloud
IBM_COS_APIKEY IBM Cloud API key (IAM) - abc123...
IBM_COS_INSTANCE_CRN COS instance CRN (IAM) - crn:v1:bluemix:public:...

Configuration Examples

Development Setup

# Zero configuration - uses memory providers
from chuk_artifacts import ArtifactStore

store = ArtifactStore()

Local Development with Persistence

import os
from chuk_artifacts import ArtifactStore

# Use filesystem for persistence
os.environ["ARTIFACT_PROVIDER"] = "filesystem"
os.environ["ARTIFACT_FS_ROOT"] = "./dev-storage"

store = ArtifactStore()

Production with S3 + Redis

import os
from chuk_artifacts import ArtifactStore

# Configure S3 storage
os.environ.update({
    "ARTIFACT_PROVIDER": "s3",
    "AWS_ACCESS_KEY_ID": "AKIA...",
    "AWS_SECRET_ACCESS_KEY": "...", 
    "AWS_REGION": "us-east-1",
    "ARTIFACT_BUCKET": "prod-artifacts"
})

# Configure Redis sessions  
os.environ.update({
    "SESSION_PROVIDER": "redis",
    "SESSION_REDIS_URL": "redis://prod-redis:6379/0"
})

store = ArtifactStore()

Docker Compose Example

version: '3.8'
services:
  app:
    image: myapp
    environment:
      # Storage
      ARTIFACT_PROVIDER: s3
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      AWS_REGION: us-east-1
      ARTIFACT_BUCKET: myapp-artifacts
      
      # Sessions  
      SESSION_PROVIDER: redis
      SESSION_REDIS_URL: redis://redis:6379/0
    depends_on:
      - redis
      
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  redis_data:

Presigned URLs

Generate secure, time-limited URLs for file access without exposing your storage credentials:

# Generate download URLs
url = await store.presign(file_id)                    # 1 hour (default)
short_url = await store.presign_short(file_id)        # 15 minutes  
medium_url = await store.presign_medium(file_id)      # 1 hour
long_url = await store.presign_long(file_id)          # 24 hours

# Generate upload URLs
upload_url, artifact_id = await store.presign_upload(
    session_id="user_123",
    filename="upload.jpg",
    mime_type="image/jpeg"
)

# Client uploads to upload_url, then register the file
await store.register_uploaded_artifact(
    artifact_id,
    mime="image/jpeg",
    summary="User uploaded image",
    filename="upload.jpg"
)

Common Use Cases

Web Application File Uploads

from chuk_artifacts import ArtifactStore

store = ArtifactStore(
    storage_provider="s3",
    session_provider="redis"
)

async def handle_upload(file_data: bytes, filename: str, user_id: str):
    """Handle file upload with user isolation"""
    file_id = await store.store(
        data=file_data,
        mime="application/octet-stream",
        summary=f"Uploaded: {filename}",
        filename=filename,
        session_id=f"user_{user_id}"  # User-specific session
    )
    
    # Return download URL  
    download_url = await store.presign_medium(file_id)
    return {"file_id": file_id, "download_url": download_url}

async def list_user_files(user_id: str):
    """List all files for a user"""
    return await store.list_by_session(f"user_{user_id}")

MCP Server Integration

async def mcp_upload_file(data_b64: str, filename: str, session_id: str):
    """MCP tool for file uploads"""
    import base64
    
    data = base64.b64decode(data_b64)
    file_id = await store.store(
        data=data,
        mime="application/octet-stream",
        summary=f"Uploaded via MCP: {filename}",
        filename=filename,
        session_id=session_id
    )
    
    return {"file_id": file_id, "message": f"Uploaded {filename}"}

async def mcp_list_files(session_id: str, directory: str = ""):
    """MCP tool for listing files"""
    files = await store.get_directory_contents(session_id, directory)
    return {"files": [{"name": f["filename"], "size": f["bytes"]} for f in files]}

Document Management

async def create_document(content: str, path: str, user_id: str):
    """Create a text document"""
    doc_id = await store.write_file(
        content=content,
        filename=path,
        mime="text/plain",
        summary=f"Document: {path}",
        session_id=f"user_{user_id}"
    )
    return doc_id

async def get_document(doc_id: str):
    """Read document content"""
    return await store.read_file(doc_id, as_text=True)

async def list_documents(user_id: str, folder: str = ""):
    """List documents in a folder"""
    return await store.get_directory_contents(f"user_{user_id}", folder)

Batch Operations

Process multiple files efficiently:

# Prepare batch data
files = [
    {
        "data": file1_bytes,
        "mime": "image/jpeg",
        "summary": "Product image 1", 
        "filename": "products/img1.jpg"
    },
    {
        "data": file2_bytes,
        "mime": "image/jpeg",
        "summary": "Product image 2",
        "filename": "products/img2.jpg"  
    }
]

# Store all files at once
file_ids = await store.store_batch(files, session_id="product_catalog")

Error Handling

from chuk_artifacts import (
    ArtifactStoreError,
    ArtifactNotFoundError,
    ProviderError,
    SessionError
)

try:
    data = await store.retrieve(file_id)
except ArtifactNotFoundError:
    print("File not found or expired")
except ProviderError as e:
    print(f"Storage error: {e}")
except SessionError as e:  
    print(f"Session error: {e}")
except ArtifactStoreError as e:
    # This catches security violations like cross-session operations
    print(f"Operation denied: {e}")

Performance

Chuk Artifacts is built for high performance:

  • 3,000+ operations/second in benchmarks
  • Async/await throughout for non-blocking I/O
  • Connection pooling with aioboto3
  • Redis caching for sub-millisecond metadata lookups
  • Batch operations to reduce overhead

Benchmark Results

โœ… File Creation: 3,083 files/sec
โœ… File Reading: 4,693 reads/sec  
โœ… File Copying: 1,811 copies/sec
โœ… Session Listing: ~2ms for 20+ files

Testing

Run the included examples to verify everything works:

# Basic functionality test
python -c "
import asyncio
from chuk_artifacts import ArtifactStore

async def test():
    async with ArtifactStore() as store:
        file_id = await store.store(
            data=b'Hello, world!',
            mime='text/plain',
            summary='Test file'
        )
        content = await store.retrieve(file_id)
        print(f'Success! Retrieved: {content.decode()}')

asyncio.run(test())
"

For development, use the filesystem provider for easy debugging:

import tempfile
from chuk_artifacts import ArtifactStore

async def test_with_filesystem():
    with tempfile.TemporaryDirectory() as tmpdir:
        store = ArtifactStore(
            storage_provider="filesystem",
            fs_root=tmpdir
        )
        
        file_id = await store.store(
            data=b"Test content",
            mime="text/plain", 
            summary="Test file"
        )
        
        # Files are visible in tmpdir for debugging
        print(f"Files stored in: {tmpdir}")

Security

Session Isolation

Sessions provide strict security boundaries:

# Each user gets their own session
alice_session = "user_alice"  
bob_session = "user_bob"

# Users can only access their own files
alice_files = await store.list_by_session(alice_session)
bob_files = await store.list_by_session(bob_session)

# Cross-session operations are blocked
try:
    await store.copy_file(alice_file_id, target_session_id=bob_session)
except ArtifactStoreError:
    print("โœ… Cross-session access denied!")

Secure Defaults

  • Files expire automatically (configurable TTL)
  • Presigned URLs have time limits
  • No sensitive data in error messages
  • Environment-based credential configuration
  • Session-based access control

Migration Guide

From Local Storage

# Before: Simple file operations
with open("file.txt", "rb") as f:
    data = f.read()

# After: Session-based storage
file_id = await store.store(
    data=data,
    mime="text/plain",
    summary="Migrated file",
    filename="file.txt",
    session_id="migration_session"
)

From Basic S3

# Before: Direct S3 operations
s3.put_object(Bucket="bucket", Key="key", Body=data)

# After: Managed artifact storage
file_id = await store.store(
    data=data,
    mime="application/octet-stream", 
    summary="File description",
    filename="myfile.dat"
)

FAQ

Q: Do I need Redis for development?

A: No! The default memory providers work great for development and testing. Only use Redis for production or when you need persistence.

Q: Can I switch storage providers later?

A: Yes! Change the ARTIFACT_PROVIDER environment variable. The API stays the same.

Q: How do sessions work with authentication?

A: Sessions are just strings. Map them to your users however you want:

# Example mappings
session_id = f"user_{user.id}"           # User-based
session_id = f"org_{org.id}"             # Organization-based  
session_id = f"project_{project.uuid}"   # Project-based

Q: What happens when files expire?

A: Expired files are automatically cleaned up during session cleanup operations. You can also run manual cleanup:

expired_count = await store.cleanup_expired_sessions()

Q: Can I use this with Django/FastAPI/Flask?

A: Absolutely! Chuk Artifacts is framework-agnostic. Initialize the store at startup and use it in your request handlers.

Q: Is it production ready?

A: Yes! It's designed for production with:

  • High performance (3,000+ ops/sec)
  • Multiple storage backends
  • Session-based security
  • Comprehensive error handling
  • Redis support for clustering

Next Steps

  1. Try it out: pip install chuk-artifacts
  2. Start simple: Use the default memory providers
  3. Add persistence: Switch to filesystem or S3
  4. Scale up: Add Redis for production
  5. Secure it: Use session-based isolation

Ready to build something awesome? ๐Ÿš€

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chuk_artifacts-0.4.tar.gz (93.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

chuk_artifacts-0.4-py3-none-any.whl (43.0 kB view details)

Uploaded Python 3

File details

Details for the file chuk_artifacts-0.4.tar.gz.

File metadata

  • Download URL: chuk_artifacts-0.4.tar.gz
  • Upload date:
  • Size: 93.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for chuk_artifacts-0.4.tar.gz
Algorithm Hash digest
SHA256 7833181c7e9902284d2c94bee3c2f6043a64b18c2d51e8cca087b26f6045d184
MD5 8e118f472740256e4310b3d6023d8d9a
BLAKE2b-256 f5b08ab0f05c09ca9bd600d1eb55e4468e24f7ac2b68c1de7fae02d45587cb41

See more details on using hashes here.

File details

Details for the file chuk_artifacts-0.4-py3-none-any.whl.

File metadata

  • Download URL: chuk_artifacts-0.4-py3-none-any.whl
  • Upload date:
  • Size: 43.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for chuk_artifacts-0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 469447dd00982f3601149a5d8ab7388f1bebd2a76b848a1cf35552da4a861dd6
MD5 4e70f1291f7bd3d7b535d997480b9b5d
BLAKE2b-256 a6494fa5357352416e40ad54b80161849babd1a7335b724b56855920e8afdf16

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page