AI-powered cloud storage object tagging using LLMs
Smart Cloud Tag
AI-powered cloud storage object tagging using Large Language Models.
Features
- Multi-Cloud Support: AWS S3, Azure Blob Storage, Google Cloud Storage
- AI-Powered: Uses OpenAI, Anthropic Claude, or Google Gemini for intelligent tagging
- Auto-Detection: Automatically detects storage provider from URI prefix
- Batch Processing: Process multiple files with one command
- Preview Mode: Preview tags before applying them
- Custom Prompts: Use your own LLM prompt templates
Quick Start
Installation
Basic Installation
pip install smart_cloud_tag
Note: Basic installation includes AWS S3 and OpenAI support. For other cloud providers or LLM providers, use the optional dependencies below.
Installation with Optional Dependencies
You can install additional dependencies based on your needs:
# Quote the package spec so shells such as zsh do not expand the square brackets
# Install with all optional dependencies (recommended)
pip install "smart_cloud_tag[all]"
# Install with specific cloud providers
pip install "smart_cloud_tag[aws]"        # AWS S3 (included by default)
pip install "smart_cloud_tag[azure]"      # Azure Blob Storage
pip install "smart_cloud_tag[gcp]"        # Google Cloud Storage
# Install with specific LLM providers
pip install "smart_cloud_tag[openai]"     # OpenAI (included by default)
pip install "smart_cloud_tag[anthropic]"  # Anthropic Claude
pip install "smart_cloud_tag[gemini]"     # Google Gemini
# Combine multiple options
pip install "smart_cloud_tag[azure,anthropic]"  # Azure + Anthropic
pip install "smart_cloud_tag[gcp,gemini]"       # GCP + Gemini
Installation Options:
- [all]: Installs all optional dependencies (all cloud providers + LLM providers)
- [aws]: AWS S3 support (included by default)
- [azure]: Azure Blob Storage support
- [gcp]: Google Cloud Storage support
- [openai]: OpenAI LLM support (included by default)
- [anthropic]: Anthropic Claude LLM support
- [gemini]: Google Gemini LLM support
- [dev]: Development dependencies (testing, linting, formatting)
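To see which optional integrations are importable in the current environment, a small check like the one below may help. This is only a sketch: the module names are assumptions about which SDK each extra pulls in (for example `boto3` for `[aws]`) and are not taken from the package metadata, and `available_extras` is a hypothetical helper, not part of `smart_cloud_tag`.

```python
import importlib.util

# Assumed mapping from each extra to the import name of the SDK it likely installs.
ASSUMED_MODULES = {
    "aws": "boto3",
    "azure": "azure.storage.blob",
    "gcp": "google.cloud.storage",
    "openai": "openai",
    "anthropic": "anthropic",
    "gemini": "google.generativeai",
}


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "azure") raises instead of returning None.
        return False


def available_extras() -> dict:
    """Map each extra name to whether its assumed SDK module is importable."""
    return {extra: is_importable(mod) for extra, mod in ASSUMED_MODULES.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"[{extra}] {'available' if ok else 'missing'}")
```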
Basic Usage
from smart_cloud_tag import SmartCloudTagger
# Define your tag schema
tags = {
    "document_type": ["invoice", "contract", "report"],
    "department": ["finance", "legal", "hr"],
    "confidential": ["true", "false"]
}
# Initialize tagger (provider auto-detected from URI)
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",  # or "az://container" or "gs://bucket"
    tags=tags
)
# Preview tags before applying
preview_result = tagger.preview_tags()
print(f"Preview: {preview_result.summary}")
# Apply tags
result = tagger.apply_tags()
print(f"Applied tags to {result.summary['applied']} objects")
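Because each key in the tag schema lists its allowed values, it can be useful to check a proposed set of tags against the schema before trusting it. The helper below is a hypothetical sketch and not part of the `smart_cloud_tag` API; the name `validate_tags` and its return shape are assumptions made for illustration.

```python
from typing import Dict, List


def validate_tags(schema: Dict[str, List[str]], proposed: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the tags fit the schema."""
    problems = []
    for key, value in proposed.items():
        if key not in schema:
            problems.append(f"unknown tag key: {key}")
        elif value not in schema[key]:
            problems.append(f"invalid value {value!r} for {key}")
    # Report schema keys that received no value at all.
    for key in schema:
        if key not in proposed:
            problems.append(f"missing tag key: {key}")
    return problems


schema = {
    "document_type": ["invoice", "contract", "report"],
    "department": ["finance", "legal", "hr"],
    "confidential": ["true", "false"],
}

good = {"document_type": "invoice", "department": "finance", "confidential": "true"}
bad = {"document_type": "memo", "department": "legal"}

print(validate_tags(schema, good))
print(validate_tags(schema, bad))
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a single object in one pass.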
Configuration
Environment Variables
Create a .env file:
# LLM Provider API Key (used for all providers)
API_KEY=your_api_key_here
# AWS (if using S3)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
# Azure (if using Blob Storage)
AZURE_STORAGE_CONNECTION_STRING=your_connection_string
# Google Cloud (if using GCS)
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
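A quick pre-flight check can catch a missing variable before any cloud or LLM call is made. The sketch below uses the variable names from the .env example above; the grouping by URI scheme and the helper name `missing_env_vars` are assumptions for illustration. Whether the library itself requires every one of these to be set is not stated here (AWS credentials, for instance, can also come from a profile or an instance role).

```python
import os
from typing import List, Mapping, Optional

# Variable names come from the .env example; the per-scheme grouping is assumed.
REQUIRED_BY_SCHEME = {
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "az": ["AZURE_STORAGE_CONNECTION_STRING"],
    "gs": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_env_vars(scheme: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    required = ["API_KEY"] + REQUIRED_BY_SCHEME.get(scheme, [])
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars("s3")
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```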
Supported Storage Providers
| Provider | URI Format | Example |
|---|---|---|
| AWS S3 | s3://bucket | s3://my-documents |
| Azure Blob | az://container | az://documents |
| Google Cloud | gs://bucket | gs://my-files |
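Detecting the provider from the URI prefix can be sketched with the standard library alone. This is an illustration of the idea, not the library's actual implementation; the function name `detect_provider` and the provider labels it returns are assumptions.

```python
from urllib.parse import urlparse

# URI schemes from the table above mapped to assumed provider labels.
SCHEME_TO_PROVIDER = {"s3": "aws", "az": "azure", "gs": "gcp"}


def detect_provider(storage_uri: str) -> str:
    """Map a storage URI such as 's3://bucket' to a provider label."""
    parsed = urlparse(storage_uri)
    scheme = parsed.scheme.lower()
    if scheme not in SCHEME_TO_PROVIDER:
        raise ValueError(f"unsupported storage URI scheme: {scheme!r}")
    if not parsed.netloc:
        raise ValueError("storage URI is missing a bucket or container name")
    return SCHEME_TO_PROVIDER[scheme]


for uri in ("s3://my-documents", "az://documents", "gs://my-files"):
    print(uri, "->", detect_provider(uri))
```

Rejecting unknown schemes early gives a clearer error than letting a cloud SDK fail later with an authentication or lookup problem.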
Supported LLM Providers
| Provider | Default Model | Environment Variable |
|---|---|---|
| OpenAI | gpt-4.1 | API_KEY |
| Anthropic | claude-3-5-sonnet-20241022 | API_KEY |
| Google Gemini | gemini-1.5-pro | API_KEY |
Advanced Usage
Custom Prompt Templates
custom_prompt = """
Analyze this document and assign tags based on content.
Filename: {filename}
Content: {content}
Tags to assign: {tags}
Focus on document classification and confidentiality.
Return only tag values separated by commas.
"""
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",
    tags=tags,
    custom_prompt_template=custom_prompt
)
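To make the placeholders concrete, the sketch below shows one way a template with `{filename}`, `{content}`, and `{tags}` could be filled in and how a comma-separated reply could be mapped back to tag keys. It assumes plain `str.format` substitution and that values come back in the same order as the schema keys; neither is confirmed as the library's behavior, and `render_prompt` and `parse_response` are hypothetical helpers.

```python
from typing import Dict, List

template = """Filename: {filename}
Content: {content}
Tags to assign: {tags}
Return only tag values separated by commas."""


def render_prompt(template: str, filename: str, content: str,
                  tags: Dict[str, List[str]]) -> str:
    """Fill the three placeholders; tags are rendered as 'key (v1, v2)' entries."""
    tag_text = "; ".join(f"{key} ({', '.join(values)})" for key, values in tags.items())
    return template.format(filename=filename, content=content, tags=tag_text)


def parse_response(response: str, tag_keys: List[str]) -> Dict[str, str]:
    """Pair comma-separated values with tag keys, assuming matching order."""
    values = [value.strip() for value in response.split(",")]
    if len(values) != len(tag_keys):
        raise ValueError(f"expected {len(tag_keys)} values, got {len(values)}")
    return dict(zip(tag_keys, values))


tags = {"document_type": ["invoice", "contract"], "confidential": ["true", "false"]}
print(render_prompt(template, "q3.pdf", "Invoice for services rendered", tags))
print(parse_response("invoice, false", list(tags)))
```

Checking the number of returned values before zipping avoids silently dropping or misaligning tags when the model's reply is malformed.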
Different LLM Providers
# Using Anthropic Claude
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",
    tags=tags,
    llm_provider="anthropic"
)
# Using Google Gemini
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",
    tags=tags,
    llm_provider="gemini"
)
Development
Installation from Source
git clone https://github.com/yourusername/smart_cloud_tag.git
cd smart_cloud_tag
pip install -e ".[all]"
Note: Use quotes around ".[all]" to prevent shell expansion issues in zsh and other shells.
Running Tests
python -m pytest tests/
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please read our Contributing Guide for details.
Support
- 📧 Email: dawarwaqar71@gmail.com
- 🐛 Issues: GitHub Issues