Unified storage abstraction for Genropy framework
Project description
genro-storage
Universal storage abstraction for Python with pluggable backends
A modern, elegant Python library that provides a unified interface for accessing files across local filesystems, cloud storage (S3, GCS, Azure), and remote protocols (HTTP). Built on top of fsspec, genro-storage adds an intuitive mount-point system and user-friendly API inspired by Unix filesystems.
Documentation
- Full Documentation - Complete API reference and guides
- API Design - Detailed design specification
- Testing Guide - How to run tests with MinIO
- Interactive Tutorials - Hands-on Jupyter notebooks
Status: Beta - Ready for Production Testing
Current Version: 0.4.3 Last Updated: October 2025
- Core implementation complete
- 15 storage backends working (local, S3, GCS, Azure, HTTP, Memory, Base64, SMB, SFTP, ZIP, TAR, Git, GitHub, WebDAV, LibArchive)
- 411 tests (401 passing, 10 skipped) with 85% coverage on Python 3.9-3.12
- Full documentation on ReadTheDocs
- Battle-tested code from Genropy (19+ years in production, storage abstraction since 2018)
- Available on PyPI
Key Features
- Async/await support - Use in FastAPI, asyncio apps with AsyncStorageManager
- Native permission control - Configure readonly, readwrite, or delete permissions for any backend
- Powered by fsspec - Leverage 20+ battle-tested storage backends
- Mount point system - Organize storage with logical names like
home:,uploads:,s3: - Intuitive API - Pathlib-inspired interface that feels natural and Pythonic
- Intelligent copy strategies - Skip files by existence, size, or hash for efficient incremental backups
- Progress tracking - Built-in callbacks for progress bars and logging during copy operations
- Content-based comparison - Compare files by MD5 hash across different backends
- Efficient hashing - Uses cloud metadata (S3 ETag) when available, avoiding downloads
- External tool integration -
call()method for seamless integration with ffmpeg, imagemagick, pandoc, etc. - WSGI file serving -
serve()method for web frameworks (Flask, Django, Pyramid) with ETag caching - MIME type detection - Automatic content-type detection from file extensions
- Flexible configuration - Load mounts from YAML, JSON, or code
- Dynamic paths - Support for callable paths that resolve at runtime (perfect for user-specific directories)
- Cloud metadata - Get/set custom metadata on S3, GCS, Azure files
- URL generation - Generate presigned URLs for S3, public URLs for sharing
- Base64 utilities - Encode files to data URIs, download from URLs
- S3 versioning - Access historical file versions (when S3 versioning enabled)
- Test-friendly - In-memory backend for fast, isolated testing
- Base64 data URIs - Embed data inline with automatic encoding (writable with mutable paths)
- Production-ready backends - Built on 6+ years of Genropy production experience
- Lightweight core - Optional backends installed only when needed
- Cross-storage operations - Copy/move files between different storage types seamlessly
Why genro-storage vs raw fsspec?
While fsspec is powerful, genro-storage provides:
- Mount point abstraction - Work with logical names instead of full URIs
- Simpler API - Less verbose, more intuitive for common operations
- Configuration management - Load storage configs from files
- Enhanced utilities - Cross-storage copy, unified error handling
Think of it as "requests" is to "urllib" - a friendlier interface to an excellent foundation.
Perfect For
- Multi-cloud applications that need storage abstraction
- Data pipelines processing files from various sources
- Web applications managing uploads across environments
- CLI tools that work with local and remote files
- Testing scenarios requiring storage mocking
Quick Example
Synchronous Usage
from genro_storage import StorageManager
# Configure storage backends
storage = StorageManager()
storage.configure([
{'name': 'home', 'type': 'local', 'path': '/home/user'},
{'name': 'uploads', 'type': 's3', 'bucket': 'my-app-uploads'},
{'name': 'backups', 'type': 'gcs', 'bucket': 'my-backups', 'permissions': 'readwrite'},
{'name': 'public', 'type': 'http', 'base_url': 'https://cdn.example.com', 'permissions': 'readonly'},
{'name': 'data', 'type': 'base64'} # Inline base64 data
])
# Work with files using a unified API
node = storage.node('uploads:users/123/avatar.jpg')
if node.exists:
# Copy from S3 to local
node.copy_to(storage.node('home:cache/avatar.jpg'))
# Read and process
data = node.read_bytes()
# Backup to GCS
node.copy_to(storage.node('backups:avatars/user_123.jpg'))
# Base64 backend: embed data directly in URIs (data URI style)
# Read inline data
import base64
text = "Configuration data"
b64_data = base64.b64encode(text.encode()).decode()
node = storage.node(f'data:{b64_data}')
print(node.read_text()) # "Configuration data"
# Or write to create base64 (path updates automatically)
node = storage.node('data:')
node.write_text("New content")
print(node.path) # "TmV3IGNvbnRlbnQ=" (base64 of "New content")
# Copy from S3 to base64 for inline use
s3_image = storage.node('uploads:photo.jpg')
b64_image = storage.node('data:')
s3_image.copy_to(b64_image)
data_uri = f"data:image/jpeg;base64,{b64_image.path}"
# Advanced features
# 1. Intelligent incremental backups (NEW!)
docs = storage.node('home:documents')
s3_backup = storage.node('uploads:backup/documents')
# Skip files that already exist (fastest)
docs.copy_to(s3_backup, skip='exists')
# Skip files with same size (fast, good accuracy)
docs.copy_to(s3_backup, skip='size')
# Skip files with same content (accurate, uses S3 ETag - fast!)
docs.copy_to(s3_backup, skip='hash')
# With progress tracking
from tqdm import tqdm
pbar = tqdm(desc="Backing up", unit="file")
docs.copy_to(s3_backup, skip='hash',
progress=lambda cur, tot: pbar.update(1))
pbar.close()
# 2. Work with external tools using call() (ffmpeg, imagemagick, etc.)
video = storage.node('uploads:video.mp4')
thumbnail = storage.node('uploads:thumb.jpg')
# Automatically handles cloud download/upload
video.call('ffmpeg', '-i', video, '-vf', 'thumbnail', '-frames:v', '1', thumbnail)
# Or use local_path() for more control
with video.local_path(mode='r') as local_path:
import subprocess
subprocess.run(['ffmpeg', '-i', local_path, 'output.mp4'])
# 3. Serve files via WSGI (Flask, Django, Pyramid)
from flask import Flask, request
app = Flask(__name__)
@app.route('/files/<path:filepath>')
def serve_file(filepath):
node = storage.node(f'uploads:{filepath}')
# ETag caching, streaming, MIME types - all automatic!
return node.serve(request.environ, lambda s, h: None, cache_max_age=3600)
# 4. Check MIME types
doc = storage.node('uploads:report.pdf')
print(doc.mimetype) # 'application/pdf'
# 5. Dynamic paths for multi-user apps
def get_user_storage():
user_id = get_current_user()
return f'/data/users/{user_id}'
storage.configure([
{'name': 'user', 'type': 'local', 'path': get_user_storage}
])
# Path resolves differently per user!
# 6. Cloud metadata
file = storage.node('uploads:document.pdf')
file.set_metadata({
'Author': 'John Doe',
'Department': 'Engineering'
})
# 7. Generate shareable URLs
url = file.url(expires_in=3600) # S3 presigned URL
# 8. Encode to data URI
img = storage.node('home:logo.png')
data_uri = img.to_base64() # data:image/png;base64,...
# 9. Download from internet
remote = storage.node('uploads:downloaded.pdf')
remote.fill_from_url('https://example.com/file.pdf')
Async Usage (NEW in v0.3.0!)
Built on asyncer by Sebastián Ramírez (FastAPI author) for automatic sync→async conversion with no event loop blocking.
from genro_storage import AsyncStorageManager
# Initialize async storage manager
storage = AsyncStorageManager()
# Configure (sync - call at startup)
storage.configure([
{'name': 'uploads', 'type': 's3', 'bucket': 'my-app-uploads'},
{'name': 'cache', 'type': 'local', 'path': '/tmp/cache'}
])
# Use in async context (FastAPI, asyncio, etc.)
async def process_file(file_path: str):
node = storage.node(f'uploads:{file_path}')
# All I/O operations are async
if await node.exists():
data = await node.read_bytes()
# Process and cache
processed = process_data(data)
cache_node = storage.node('cache:processed.dat')
await cache_node.write_bytes(processed)
return processed
raise FileNotFoundError(file_path)
# FastAPI example
from fastapi import FastAPI, HTTPException
app = FastAPI()
@app.get("/files/{filepath:path}")
async def get_file(filepath: str):
"""Serve file from S3 storage."""
node = storage.node(f'uploads:{filepath}')
if not await node.exists():
raise HTTPException(status_code=404, detail="File not found")
return {
"data": await node.read_bytes(),
"size": await node.size(),
"mime_type": node.mimetype # Sync property
}
# Concurrent operations
import asyncio
async def backup_files(file_list):
"""Backup multiple files concurrently."""
async def backup_one(filepath):
source = storage.node(f'uploads:{filepath}')
target = storage.node(f'backups:{filepath}')
data = await source.read_bytes()
await target.write_bytes(data)
# Process all files in parallel
await asyncio.gather(*[backup_one(f) for f in file_list])
Learning with Interactive Tutorials
The best way to learn genro-storage is through our hands-on Jupyter notebooks in the notebooks/ directory.
Run Online (No Installation Required)
Click the badge above to launch an interactive Jupyter environment in your browser. Ready in ~2 minutes!
Run Locally
# 1. Install Jupyter
pip install jupyter notebook
# 2. Navigate to notebooks directory
cd notebooks
# 3. Launch Jupyter
jupyter notebook
# 4. Open 01_quickstart.ipynb and start learning!
Note: Jupyter will open in your browser automatically. Execute cells sequentially with Shift+Enter.
Tutorial Contents
| Notebook | Topic | Duration | Level |
|---|---|---|---|
| 01 - Quickstart | Basic concepts and first steps | 15 min | Beginner |
| 02 - Backends | Storage backends and configuration | 20 min | Beginner |
| 03 - File Operations | Read, write, copy, directories | 25 min | Beginner |
| 04 - Virtual Nodes | iternode, diffnode, zip archives | 30 min | Intermediate |
| 05 - Copy Strategies | Smart copying and filtering | 25 min | Intermediate |
| 06 - Versioning | S3 version history and rollback | 30 min | Intermediate |
| 07 - Advanced Features | External tools, WSGI, metadata | 35 min | Advanced |
| 08 - Real World Examples | Complete use cases | 40 min | Advanced |
Total time: ~3.5 hours • Start here: 01_quickstart.ipynb
See notebooks/README.md for the complete learning guide.
Installation
From GitHub (Recommended)
Install directly from GitHub without cloning:
# Base package
pip install git+https://github.com/genropy/genro-storage.git
# With S3 support
pip install "genro-storage[s3] @ git+https://github.com/genropy/genro-storage.git"
# With all backends
pip install "genro-storage[all] @ git+https://github.com/genropy/genro-storage.git"
From Source (Development)
Clone and install in editable mode:
# Clone repository
git clone https://github.com/genropy/genro-storage.git
cd genro-storage
# Install base package
pip install -e .
# Install with S3 support
pip install -e ".[s3]"
# Install with all backends
pip install -e ".[all]"
# Install for development
pip install -e ".[all,dev]"
Supported Backends
Install optional dependencies for specific backends:
# Cloud storage
pip install genro-storage[s3] # Amazon S3
pip install genro-storage[gcs] # Google Cloud Storage
pip install genro-storage[azure] # Azure Blob Storage
# Network protocols
pip install genro-storage[http] # HTTP/HTTPS
pip install genro-storage[smb] # SMB/CIFS (Windows/Samba shares)
pip install genro-storage[sftp] # SFTP (SSH File Transfer)
pip install genro-storage[webdav] # WebDAV (Nextcloud, ownCloud, SharePoint)
# Archive formats
pip install genro-storage[libarchive] # RAR, 7z, ISO, and 20+ formats
# Version control
# Git and GitHub are built-in to fsspec (no extra install needed)
# Other
pip install genro-storage[async] # Async support
pip install genro-storage[all] # All backends + async
Built-in backends (no extra dependencies):
- Local filesystem
- Memory (in-memory storage for testing)
- Base64 (inline data URIs)
- ZIP archives
- TAR archives (with gzip, bzip2, xz compression)
- Git repositories (requires system
pygit2) - GitHub repositories
Testing
# Unit tests (fast, no external dependencies)
pytest tests/test_local_storage.py -v
# Integration tests (requires Docker + MinIO)
docker-compose up -d
pytest tests/test_s3_integration.py -v
# All tests
pytest tests/ -v
# With coverage
pytest tests/ -v --cov=genro_storage
See TESTING.md for detailed testing instructions with MinIO.
Built With
- fsspec - Pythonic filesystem abstraction
- asyncer - Async wrapper (v0.3.0+)
- Modern Python (3.9+) with full type hints
- Optional backends: s3fs, gcsfs, adlfs, aiohttp, smbprotocol, paramiko, webdav4, libarchive-c
Origins
genro-storage is extracted and modernized from Genropy, a Python web framework in production since 2006 (19+ years). The storage abstraction layer was introduced in 2018 and has been battle-tested in production for 6+ years. We're making this powerful storage abstraction available as a standalone library for the wider Python community.
Development Status
Phase: Beta - Production Testing
- API Design Complete and Stable
- Core Implementation Complete
- FsspecBackend (15 storage backends: local, S3, GCS, Azure, HTTP, Memory, Base64, SMB, SFTP, ZIP, TAR, Git, GitHub, WebDAV, LibArchive)
- Comprehensive Test Suite (411 tests, 85% coverage)
- CI/CD with Python 3.9, 3.10, 3.11, 3.12
- MD5 hashing and content-based equality
- Base64 backend with writable mutable paths
- Intelligent copy skip strategies (exists, size, hash, custom)
- call() method for external tool integration (ffmpeg, imagemagick, etc.)
- serve() method for WSGI file serving (Flask, Django, Pyramid)
- mimetype property for automatic content-type detection
- local_path() context manager for external tools
- Callable path support for dynamic directories
- Native permission control (readonly, readwrite, delete)
- Cloud metadata get/set (S3, GCS, Azure)
- URL generation (presigned URLs, data URIs)
- S3 versioning support
- Full Documentation on ReadTheDocs
- MinIO Integration Testing
- Async/await support (AsyncStorageManager, AsyncStorageNode)
- Ready for early adopters and production testing
- Extended GCS/Azure integration testing in progress
Recent Releases:
- v0.4.2 (October 2025) - Git, GitHub, WebDAV, LibArchive backends
- v0.4.1 (October 2025) - SMB, SFTP, ZIP, TAR backends
- v0.4.0 (October 2025) - Relative mounts with permissions, unified read/write API
- v0.3.0 (October 2025) - Async support via asyncer wrapper
- v0.2.0 (October 2025) - Virtual nodes, tutorials, enhanced testing
Contributing
Contributions are welcome! We follow a Git Flow workflow with protected branches for code quality.
Quick Start:
- Read our Contributing Guide for detailed workflow and guidelines
- Fork the repository and create a feature branch from
develop - Make your changes with tests and documentation
- Submit a Pull Request to the
developbranch
Branch Structure:
main- Production releases (protected, requires PR review)develop- Integration branch (protected, requires PR review)feature/*- Feature development branchesbugfix/*- Bug fixeshotfix/*- Critical production fixes
See CONTRIBUTING.md for complete workflow documentation.
Areas for contribution:
- Add integration tests for GCS and Azure backends
- Improve test coverage (target: 90%+)
- Add integration tests for new backends (SMB, SFTP, WebDAV, etc.)
- Performance optimizations
- Additional backend implementations
License
MIT License - See LICENSE for details
Made with ❤️ by the Genropy team
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file genro_storage-0.4.3.tar.gz.
File metadata
- Download URL: genro_storage-0.4.3.tar.gz
- Upload date:
- Size: 103.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2a0745f82fa821a9cfa40d3455e729bb91410b17ff67d30c4978bc4948c8fe34
|
|
| MD5 |
1490de00b4638673daf82ba3915e6a61
|
|
| BLAKE2b-256 |
38db7bde88563a6813b6730b986a389c66168eb8795fd1ea40b4b94286be33a9
|
Provenance
The following attestation bundles were made for genro_storage-0.4.3.tar.gz:
Publisher:
release.yml on genropy/genro-storage
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
genro_storage-0.4.3.tar.gz -
Subject digest:
2a0745f82fa821a9cfa40d3455e729bb91410b17ff67d30c4978bc4948c8fe34 - Sigstore transparency entry: 652881679
- Sigstore integration time:
-
Permalink:
genropy/genro-storage@78e34e331197606db03d76c7fd76bc67e4f0ec70 -
Branch / Tag:
refs/tags/v0.4.3 - Owner: https://github.com/genropy
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@78e34e331197606db03d76c7fd76bc67e4f0ec70 -
Trigger Event:
push
-
Statement type:
File details
Details for the file genro_storage-0.4.3-py3-none-any.whl.
File metadata
- Download URL: genro_storage-0.4.3-py3-none-any.whl
- Upload date:
- Size: 63.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3e09a48c080abf0cba27a098469ca84bbe2364c4f9eb849e4c129010ab58e6db
|
|
| MD5 |
a67325be7be6227ec192a5427dfef5a7
|
|
| BLAKE2b-256 |
f2904bfb7e9d774363252f7bb3f6ddf30d3b74922e5c5601fe26b10523a9cab5
|
Provenance
The following attestation bundles were made for genro_storage-0.4.3-py3-none-any.whl:
Publisher:
release.yml on genropy/genro-storage
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
genro_storage-0.4.3-py3-none-any.whl -
Subject digest:
3e09a48c080abf0cba27a098469ca84bbe2364c4f9eb849e4c129010ab58e6db - Sigstore transparency entry: 652881683
- Sigstore integration time:
-
Permalink:
genropy/genro-storage@78e34e331197606db03d76c7fd76bc67e4f0ec70 -
Branch / Tag:
refs/tags/v0.4.3 - Owner: https://github.com/genropy
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@78e34e331197606db03d76c7fd76bc67e4f0ec70 -
Trigger Event:
push
-
Statement type: