Smart Cloud Tag

AI-powered cloud storage object tagging using Large Language Models.

Features

  • Multi-Cloud Support: AWS S3, Azure Blob Storage, Google Cloud Storage
  • AI-Powered: Uses OpenAI, Anthropic Claude, or Google Gemini for intelligent tagging
  • Auto-Detection: Automatically detects storage provider from URI prefix
  • Batch Processing: Process multiple files with one command
  • Preview Mode: Preview tags before applying them
  • Custom Prompts: Use your own LLM prompt templates

Quick Start

Installation

Basic Installation

pip install smart_cloud_tag

Note: Basic installation includes AWS S3 and OpenAI support. For other cloud providers or LLM providers, use the optional dependencies below.

Installation with Optional Dependencies

You can install additional dependencies based on your needs:

# Quote the package spec so shells such as zsh do not treat [...] as a glob pattern

# Install with all optional dependencies (recommended)
pip install "smart_cloud_tag[all]"

# Install with specific cloud providers
pip install "smart_cloud_tag[aws]"        # AWS S3 (included by default)
pip install "smart_cloud_tag[azure]"      # Azure Blob Storage
pip install "smart_cloud_tag[gcp]"        # Google Cloud Storage

# Install with specific LLM providers
pip install "smart_cloud_tag[openai]"     # OpenAI (included by default)
pip install "smart_cloud_tag[anthropic]"  # Anthropic Claude
pip install "smart_cloud_tag[gemini]"     # Google Gemini

# Combine multiple options
pip install "smart_cloud_tag[azure,anthropic]"  # Azure + Anthropic
pip install "smart_cloud_tag[gcp,gemini]"       # GCP + Gemini

Installation Options:

  • [all] - Installs all optional dependencies (all cloud providers + LLM providers)
  • [aws] - AWS S3 support (included by default)
  • [azure] - Azure Blob Storage support
  • [gcp] - Google Cloud Storage support
  • [openai] - OpenAI LLM support (included by default)
  • [anthropic] - Anthropic Claude LLM support
  • [gemini] - Google Gemini LLM support
  • [dev] - Development dependencies (testing, linting, formatting)

Basic Usage

from smart_cloud_tag import SmartCloudTagger

# Define your tag schema
tags = {
    "document_type": ["invoice", "contract", "report"],
    "department": ["finance", "legal", "hr"],
    "confidential": ["true", "false"]
}

# Initialize tagger (provider auto-detected from URI)
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",  # or "az://container" or "gs://bucket"
    tags=tags
)

# Preview tags before applying
preview_result = tagger.preview_tags()
print(f"Preview: {preview_result.summary}")

# Apply tags
result = tagger.apply_tags()
print(f"Applied tags to {result.summary['applied']} objects")
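The README does not say what happens when the LLM suggests a value outside the schema, so a caller may want to check proposed tags before applying them. The sketch below assumes the proposed tags for one object are available as a plain dict; the actual shape of `preview_result` is not documented here, and `find_schema_violations` is a hypothetical helper, not part of smart_cloud_tag.

```python
from typing import Dict, List

# Same shape as the `tags` schema in the example above.
TAG_SCHEMA: Dict[str, List[str]] = {
    "document_type": ["invoice", "contract", "report"],
    "department": ["finance", "legal", "hr"],
    "confidential": ["true", "false"],
}


def find_schema_violations(schema: Dict[str, List[str]],
                           proposed: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means `proposed` fits the schema.

    Hypothetical helper for illustration, not part of smart_cloud_tag.
    """
    problems = []
    # Every key in the schema should receive a value.
    for key in schema:
        if key not in proposed:
            problems.append(f"missing tag: {key}")
    # Every proposed key and value should be allowed by the schema.
    for key, value in proposed.items():
        if key not in schema:
            problems.append(f"unknown tag: {key}")
        elif value not in schema[key]:
            problems.append(f"invalid value for {key}: {value!r}")
    return problems
```

An empty result means the proposal can be applied as-is; anything else is worth reviewing in preview mode first.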

Configuration

Environment Variables

Create a .env file:

# LLM Provider API Key (used for all providers)
API_KEY=your_api_key_here

# AWS (if using S3)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1

# Azure (if using Blob Storage)
AZURE_STORAGE_CONNECTION_STRING=your_connection_string

# Google Cloud (if using GCS)
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
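A quick pre-flight check can catch missing settings before any cloud or LLM call is made. The sketch below uses only the variable names from the .env example above; `missing_env_vars` is a hypothetical helper, and the library itself may resolve credentials differently (for example through the cloud SDKs' own default credential chains).

```python
import os
from typing import List, Mapping

# Variable names are taken from the .env example above, keyed by URI scheme.
REQUIRED_ENV = {
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "az": ["AZURE_STORAGE_CONNECTION_STRING"],
    "gs": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_env_vars(scheme: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """List variables that are unset or empty for the given URI scheme.

    Hypothetical pre-flight check, not part of smart_cloud_tag.
    """
    # API_KEY is needed regardless of which storage provider is used.
    needed = ["API_KEY"] + REQUIRED_ENV[scheme]
    return [name for name in needed if not env.get(name)]
```

Whether the library reads the .env file on its own is not stated here; if it does not, the variables need to be exported or loaded into the process environment before running this check.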

Supported Storage Providers

Provider        URI Format        Example
AWS S3          s3://bucket       s3://my-documents
Azure Blob      az://container    az://documents
Google Cloud    gs://bucket       gs://my-files
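To illustrate how a provider can be inferred from the URI prefix, here is a minimal sketch built on the standard library's `urlsplit`. The scheme-to-provider mapping comes from the table above; `detect_provider` is a hypothetical name, and the library's internal detection may work differently.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Scheme-to-provider mapping taken from the table above.
PROVIDERS = {"s3": "AWS S3", "az": "Azure Blob", "gs": "Google Cloud"}


def detect_provider(storage_uri: str) -> Tuple[str, str]:
    """Return (provider name, bucket or container) for a storage URI.

    Hypothetical helper for illustration, not part of smart_cloud_tag.
    """
    parts = urlsplit(storage_uri)
    if parts.scheme not in PROVIDERS:
        raise ValueError(f"unsupported storage URI: {storage_uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket or container in {storage_uri!r}")
    return PROVIDERS[parts.scheme], parts.netloc
```

Failing early on an unknown scheme keeps a typo such as "s4://my-bucket" from being silently routed to the wrong backend.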

Supported LLM Providers

Provider         Default Model                 Environment Variable
OpenAI           gpt-4.1                       API_KEY
Anthropic        claude-3-5-sonnet-20241022    API_KEY
Google Gemini    gemini-1.5-pro                API_KEY
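The defaults in the table can be thought of as a simple lookup keyed by provider. The sketch below is illustrative only: the `"anthropic"` and `"gemini"` keys match the `llm_provider` values shown under Advanced Usage, `"openai"` is an assumed key, and `resolve_model` with its `model` override is hypothetical rather than a documented parameter of the library.

```python
from typing import Optional

# Defaults copied from the table above. The "openai" key is an assumption.
DEFAULT_MODELS = {
    "openai": "gpt-4.1",
    "anthropic": "claude-3-5-sonnet-20241022",
    "gemini": "gemini-1.5-pro",
}


def resolve_model(llm_provider: str = "openai", model: Optional[str] = None) -> str:
    """Pick an explicit model if given, otherwise the provider's default.

    Hypothetical helper for illustration, not part of smart_cloud_tag.
    """
    if model:
        return model
    try:
        return DEFAULT_MODELS[llm_provider]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {llm_provider!r}") from None
```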

Advanced Usage

Custom Prompt Templates

custom_prompt = """
Analyze this document and assign tags based on content.

Filename: {filename}
Content: {content}
Tags to assign: {tags}

Focus on document classification and confidentiality.
Return only tag values separated by commas.
"""

tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",
    tags=tags,
    custom_prompt_template=custom_prompt
)
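To make the placeholders concrete, the sketch below fills `{filename}`, `{content}`, and `{tags}` with `str.format` and parses a comma-separated reply back into a tag dict. How the library actually renders `{tags}` and parses the model's answer is not documented here, so the rendering format, `render_prompt`, and `parse_response` are all assumptions for illustration.

```python
from typing import Dict, List

PROMPT_TEMPLATE = """Filename: {filename}
Content: {content}
Tags to assign: {tags}
Return only tag values separated by commas."""


def render_prompt(template: str, filename: str, content: str,
                  schema: Dict[str, List[str]]) -> str:
    """Fill the three placeholders; the {tags} rendering is an assumed format."""
    tag_text = "; ".join(f"{key} in {values}" for key, values in schema.items())
    return template.format(filename=filename, content=content, tags=tag_text)


def parse_response(response: str, schema: Dict[str, List[str]]) -> Dict[str, str]:
    """Map comma-separated values onto schema keys, in schema order."""
    values = [part.strip() for part in response.split(",")]
    if len(values) != len(schema):
        raise ValueError(f"expected {len(schema)} values, got {len(values)}")
    return dict(zip(schema, values))
```

With `str.format`, braces in the substituted content are left alone, but any literal braces in the template itself would need to be doubled (`{{` and `}}`).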

Different LLM Providers

# Using Anthropic Claude
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",
    tags=tags,
    llm_provider="anthropic"
)

# Using Google Gemini
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",
    tags=tags,
    llm_provider="gemini"
)

Architecture

[Figure: architecture diagram]

Example

[Figure: example]

Development

Installation from Source

git clone https://github.com/yourusername/smart_cloud_tag.git
cd smart_cloud_tag
pip install -e ".[all]"

Note: Quote ".[all]" so that zsh and similar shells do not treat the square brackets as a glob pattern.

Running Tests

python -m pytest tests/

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please read our Contributing Guide for details.
