S3 Data Organizer MCP
MCP server for safe S3 data layout analysis and cleanup planning.
This is an early prototype. Version 0.1.x is read-only and exposes scan,
analysis, and proposal tools only. It does not delete, copy, tag, or change
lifecycle policies.
This project is not affiliated with, endorsed by, or sponsored by Amazon Web Services. AWS and Amazon S3 are trademarks of Amazon.com, Inc. or its affiliates.
Purpose
The goal is a cloud-storage equivalent of a safe file organizer:
scan S3 prefix
-> summarize layout and cost
-> find large objects and duplicate candidates
-> suggest cleanup/lifecycle options
-> generate a reviewable plan
-> only later apply with explicit confirmation
Current Tools
- get_s3_organizer_status: local policy/dependency status.
- scan_s3_prefix: read S3 object metadata under an allowlisted prefix.
- summarize_s3_layout: object count, total bytes, extensions, storage classes, top prefixes, and rough monthly storage cost.
- find_s3_large_objects: largest objects under a prefix.
- find_s3_duplicate_candidates: ETag-based duplicate candidates.
- rank_s3_cold_candidates: LRU-like ranking using LastModified, object size, and artifact type. This is not true last-access time.
- list_s3_prefix_children: read-only pseudo-folder navigation for one prefix level.
- analyze_s3_prefix_tree: folder-like rollups by projected S3 prefix depth.
- analyze_s3_artifact_types: classify objects by artifact type, extension, and top prefix.
- inspect_s3_hidden_storage: inspect object versions, delete markers, and incomplete multipart uploads.
- propose_s3_cleanup_options: review options and safe next steps.
- propose_s3_lifecycle_options: heuristic lifecycle rule ideas.
See docs/COMMANDS.md for the public command contract.
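To make the idea behind ETag-based duplicate candidates concrete, here is a minimal sketch. It is not the server's implementation: the function name `find_duplicate_candidates`, the record shape, and the choice to group by size as well as ETag are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable


def find_duplicate_candidates(objects: Iterable[dict]) -> list[dict]:
    """Group listing records by (ETag, Size) and return groups with 2+ keys.

    Hypothetical helper. Each record is assumed to look like an S3 listing
    entry: {"Key": str, "ETag": str, "Size": int}.
    """
    groups: dict[tuple[str, int], list[str]] = defaultdict(list)
    for obj in objects:
        etag = obj["ETag"].strip('"')  # listing ETags arrive wrapped in quotes
        groups[(etag, obj["Size"])].append(obj["Key"])

    candidates = []
    for (etag, size), keys in groups.items():
        if len(keys) < 2:
            continue  # a single key is not a duplicate candidate
        candidates.append({
            "etag": etag,
            "size": size,
            "keys": sorted(keys),
            # A "-N" suffix marks a multipart upload; such ETags are not a
            # plain MD5 of the content and need a separate checksum check.
            "multipart": "-" in etag,
            "reclaimable_bytes": size * (len(keys) - 1),
        })
    # Largest potential savings first
    return sorted(candidates, key=lambda c: c["reclaimable_bytes"], reverse=True)
```

The result is a review list only. A matching ETag suggests identical content but does not prove it, so nothing here should feed a delete step without further validation.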
Safety Boundaries
- Read-only by default.
- Requires S3_ORGANIZER_ALLOWED_ROOTS.
- Refuses to inspect S3 URIs outside allowlisted roots.
- Does not perform writes in this version.
- Destructive operations, if added later, should require explicit policy opt-in and confirmation tokens.
- ETag duplicate detection is only a candidate signal; multipart/encrypted objects need additional checksum validation.
- Cold-candidate ranking uses S3 LastModified as a proxy; S3 object listing metadata does not include true last-access time.
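The allowlist boundary can be sketched as a prefix check on parsed S3 URIs. This is an illustration under stated assumptions, not the project's code: `parse_s3_uri` and `is_allowed` are hypothetical names, and matching on a `/` boundary (so that `data` does not also admit `database`) is a design choice made here.

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/prefix' into (bucket, 'some/prefix')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in S3 URI: {uri!r}")
    return bucket, key


def is_allowed(uri: str, allowed_roots: list[str]) -> bool:
    """Return True if uri equals an allowlisted root or lies beneath it."""
    bucket, key = parse_s3_uri(uri)
    for root in allowed_roots:
        root_bucket, root_prefix = parse_s3_uri(root)
        if bucket != root_bucket:
            continue
        root_prefix = root_prefix.rstrip("/")
        if root_prefix == "":
            return True  # the whole bucket is allowlisted
        # Require an exact match or a "/" boundary after the root prefix
        if key == root_prefix or key.startswith(root_prefix + "/"):
            return True
    return False


# Roots in the same comma-separated form as S3_ORGANIZER_ALLOWED_ROOTS
roots = "s3://YOUR_BUCKET/data,s3://YOUR_BUCKET/archive".split(",")
print(is_allowed("s3://YOUR_BUCKET/data/run1/file.csv", roots))
print(is_allowed("s3://YOUR_BUCKET/database/file.csv", roots))
```

Checking the boundary character matters because S3 prefixes are plain string prefixes; a bare `startswith` on `data` would also accept sibling prefixes that merely share those leading characters.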
Install
From PyPI:
pipx install s3-data-organizer-mcp
Or run without a persistent install:
uvx s3-data-organizer-mcp
Local development:
python3.11 -m venv .venv
.venv/bin/pip install -e ".[test]"
.venv/bin/python -m pytest
Run the MCP server:
s3-data-organizer-mcp
Example MCP client config:
{
  "mcpServers": {
    "s3-data-organizer": {
      "command": "s3-data-organizer-mcp",
      "env": {
        "AWS_PROFILE": "research",
        "AWS_REGION": "eu-north-1",
        "S3_ORGANIZER_ENDPOINT_URL": "",
        "S3_ORGANIZER_ALLOWED_ROOTS": "s3://YOUR_BUCKET/data,s3://YOUR_BUCKET/archive",
        "S3_ORGANIZER_MAX_SCAN_KEYS": "10000",
        "S3_ORGANIZER_STORAGE_PRICE_USD_PER_GB_MONTH": "0.023",
        "S3_ORGANIZER_ALLOW_WRITES": "false"
      }
    }
  }
}
The same example is available in examples/mcp-config.json.
Configuration
export AWS_PROFILE=research
export AWS_REGION=eu-north-1
export S3_ORGANIZER_ENDPOINT_URL=
export S3_ORGANIZER_ALLOWED_ROOTS=s3://YOUR_BUCKET/data,s3://YOUR_BUCKET/archive
export S3_ORGANIZER_MAX_SCAN_KEYS=10000
export S3_ORGANIZER_STORAGE_PRICE_USD_PER_GB_MONTH=0.023
Writes are intentionally disabled in the current prototype:
export S3_ORGANIZER_ALLOW_WRITES=false
For S3-compatible providers such as reg.ru, set S3_ORGANIZER_ENDPOINT_URL,
for example:
export AWS_REGION=auto
export S3_ORGANIZER_ENDPOINT_URL=https://s3.regru.cloud
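As a rough illustration of how these variables might be consumed, the sketch below reads them from the environment and turns a byte total into a monthly storage estimate. The helper names, the fallback values (copied from the example above, not confirmed defaults), and the use of binary gigabytes are all assumptions.

```python
import os

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes; adjust if your provider bills differently


def load_settings(env=os.environ) -> dict:
    """Read organizer settings from a mapping of environment variables (hypothetical helper)."""
    raw_roots = env.get("S3_ORGANIZER_ALLOWED_ROOTS", "")
    roots = [r.strip() for r in raw_roots.split(",") if r.strip()]
    if not roots:
        raise ValueError("S3_ORGANIZER_ALLOWED_ROOTS must list at least one s3:// root")
    return {
        "allowed_roots": roots,
        # An empty endpoint URL is treated as "use the default AWS endpoint"
        "endpoint_url": env.get("S3_ORGANIZER_ENDPOINT_URL") or None,
        # Fallbacks mirror the example values above; they are assumed, not documented defaults
        "max_scan_keys": int(env.get("S3_ORGANIZER_MAX_SCAN_KEYS", "10000")),
        "price_per_gb_month": float(
            env.get("S3_ORGANIZER_STORAGE_PRICE_USD_PER_GB_MONTH", "0.023")
        ),
        "allow_writes": env.get("S3_ORGANIZER_ALLOW_WRITES", "false").lower() == "true",
    }


def estimate_monthly_cost(total_bytes: int, price_per_gb_month: float) -> float:
    """Rough storage-only estimate: size in GB times the flat per-GB price."""
    return round(total_bytes / BYTES_PER_GB * price_per_gb_month, 4)
```

A flat per-GB price ignores storage-class differences, request charges, and tiered pricing, which is why the summary tool describes its figure as rough.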
IAM
Read-only prototype permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketVersions",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::YOUR_BUCKET",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "data/*",
            "archive/*"
          ]
        }
      }
    }
  ]
}
Only s3:ListBucket is needed for the core scan/summarize/rank tools. The
version and multipart actions are needed only for inspect_s3_hidden_storage.
Future write-capable versions will need separate policies for tagging, copy, delete, lifecycle configuration, or Batch Operations manifest generation.
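To show why s3:ListBucket alone can cover a metadata scan, here is a minimal sketch of a listing loop built on the ListObjectsV2 paginator. The function name `scan_prefix` and the output field names are hypothetical; the client is passed in so the sketch does not assume how the server constructs it.

```python
from typing import Any, Iterator


def scan_prefix(client: Any, bucket: str, prefix: str, max_keys: int = 10000) -> Iterator[dict]:
    """Yield listing metadata for up to max_keys objects under prefix.

    client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Only ListObjectsV2 is called, which is authorized by s3:ListBucket;
    object bodies are never read.
    """
    paginator = client.get_paginator("list_objects_v2")
    seen = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):  # "Contents" is absent on empty pages
            if seen >= max_keys:
                return  # stop early once the scan budget is used up
            seen += 1
            yield {
                "key": obj["Key"],
                "size": obj["Size"],
                "etag": obj["ETag"].strip('"'),
                "last_modified": obj["LastModified"],
                "storage_class": obj.get("StorageClass", "STANDARD"),
            }
```

With the policy above, the `Prefix` passed to the listing call has to match `data/*` or `archive/*`, so a call such as `scan_prefix(client, "YOUR_BUCKET", "data/")` would be permitted while a listing of the bucket root would be denied.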
Publishing
Release steps are documented in PUBLISHING.md. The short version is:
python -m pytest -q
python -m build
python -m twine check dist/*
python -m twine upload dist/*
Product Direction
This should not become a generic S3 file manager. The useful product is:
- S3 layout intelligence.
- Read-only pseudo-folder navigation.
- Cleanup options.
- Lifecycle rule suggestions.
- Duplicate candidate review.
- Cold-candidate and artifact-type ranking.
- Cost/savings estimates.
- Safe manifests for AWS-native execution.
For large buckets, the right backend is likely S3 Inventory + Athena + S3 Batch Operations rather than listing every object interactively.