S3 Data Organizer MCP

MCP server for safe S3 data layout analysis and cleanup planning.

This is an early prototype. Version 0.1.x is read-only and exposes only scan, analysis, and proposal tools. It does not delete, copy, or tag objects, and it does not change lifecycle policies.

This project is not affiliated with, endorsed by, or sponsored by Amazon Web Services. AWS and Amazon S3 are trademarks of Amazon.com, Inc. or its affiliates.

Purpose

The goal is a cloud-storage equivalent of a safe file organizer:

scan S3 prefix
-> summarize layout and cost
-> find large objects and duplicate candidates
-> suggest cleanup/lifecycle options
-> generate a reviewable plan
-> only later apply with explicit confirmation
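
As a rough illustration of the "summarize layout and cost" step, the sketch below rolls object sizes up by pseudo-folder and converts a byte total into a monthly estimate. The function names, the `(key, size)` record shape, and the use of binary gigabytes are assumptions made for this example; they are not taken from the project's code. The default price is the value shown in the example configuration.

```python
from collections import defaultdict

# Assumption: 1 GB = 2**30 bytes. Adjust if your provider bills differently.
BYTES_PER_GB = 1024 ** 3


def rollup_by_prefix(objects, depth=1):
    """Sum object count and bytes per pseudo-folder, truncated to `depth` levels.

    `objects` is an iterable of (key, size_in_bytes) pairs (hypothetical shape).
    """
    totals = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for key, size in objects:
        folders = key.split("/")[:-1]          # drop the object name itself
        folder = "/".join(folders[:depth])     # keep at most `depth` levels
        prefix = folder + "/" if folder else ""  # "" means the root level
        totals[prefix]["objects"] += 1
        totals[prefix]["bytes"] += size
    return dict(totals)


def rough_monthly_cost_usd(total_bytes, price_per_gb_month=0.023):
    """Very rough storage-only estimate; ignores requests, tiers, and retrieval."""
    return total_bytes / BYTES_PER_GB * price_per_gb_month
```

Keys that have fewer folder levels than `depth` are rolled up under the levels they do have, so nothing is dropped from the totals.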

Current Tools

  • get_s3_organizer_status: local policy/dependency status.
  • scan_s3_prefix: read S3 object metadata under an allowlisted prefix.
  • summarize_s3_layout: object count, total bytes, extensions, storage classes, top prefixes, and rough monthly storage cost.
  • find_s3_large_objects: largest objects under a prefix.
  • find_s3_duplicate_candidates: ETag-based duplicate candidates.
  • rank_s3_cold_candidates: LRU-like ranking using LastModified, object size, and artifact type. This is not true last-access time.
  • list_s3_prefix_children: read-only pseudo-folder navigation for one prefix level.
  • analyze_s3_prefix_tree: folder-like rollups by projected S3 prefix depth.
  • analyze_s3_artifact_types: classify objects by artifact type, extension, and top prefix.
  • inspect_s3_hidden_storage: inspect object versions, delete markers, and incomplete multipart uploads.
  • propose_s3_cleanup_options: review options and safe next steps.
  • propose_s3_lifecycle_options: heuristic lifecycle rule ideas.
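
To make the idea behind rank_s3_cold_candidates more concrete, here is a minimal sketch of an LRU-like score built from LastModified, size, and artifact type. The scoring formula (idle days times bytes times a per-extension weight) and the weight table are hypothetical and chosen only for illustration; the actual tool may rank differently. The record fields (`Key`, `Size`, `LastModified`) mirror the fields that S3 object listings return.

```python
import posixpath
from datetime import datetime, timezone

# Hypothetical weights: artifact types that are usually safe to treat as colder.
ARTIFACT_WEIGHTS = {".log": 1.5, ".tmp": 2.0}


def cold_score(obj, now):
    """Score = days since LastModified * size in bytes * artifact weight."""
    age_days = max((now - obj["LastModified"]).total_seconds() / 86400, 0.0)
    ext = posixpath.splitext(obj["Key"])[1].lower()
    return age_days * obj["Size"] * ARTIFACT_WEIGHTS.get(ext, 1.0)


def rank_cold_candidates(objects, now=None):
    """Return objects sorted from coldest (highest score) to warmest."""
    now = now or datetime.now(timezone.utc)
    scored = [{"Key": o["Key"], "score": cold_score(o, now)} for o in objects]
    return sorted(scored, key=lambda item: item["score"], reverse=True)
```

Because LastModified only records the last write, a frequently read object that is never rewritten will still score as cold, so the output is a list to review rather than a list to act on.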

See docs/COMMANDS.md for the public command contract.

Safety Boundaries

  • Read-only by default.
  • Requires S3_ORGANIZER_ALLOWED_ROOTS.
  • Refuses to inspect S3 URIs outside allowlisted roots.
  • Does not perform writes in this version.
  • Future destructive operations should require an explicit policy opt-in and confirmation tokens.
  • ETag duplicate detection is only a candidate signal; multipart/encrypted objects need additional checksum validation.
  • Cold-candidate ranking uses S3 LastModified as a proxy; S3 object listing metadata does not include true last-access time.
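
The ETag caveat above can be illustrated with a short sketch that groups listing records by size and ETag and flags groups whose ETag looks like a multipart ETag (it contains a `-` part-count suffix). The function name and output fields are hypothetical; the input fields (`Key`, `Size`, `ETag`) mirror what S3 object listings return. Treat every group as a candidate for checksum validation, not as a confirmed duplicate.

```python
from collections import defaultdict


def find_duplicate_candidates(objects):
    """Group objects that share both size and ETag; singletons are ignored."""
    groups = defaultdict(list)
    for obj in objects:
        etag = obj["ETag"].strip('"')  # listings return the ETag wrapped in quotes
        groups[(obj["Size"], etag)].append(obj["Key"])

    candidates = []
    for (size, etag), keys in groups.items():
        if len(keys) < 2:
            continue
        candidates.append({
            "etag": etag,
            "size": size,
            "keys": sorted(keys),
            # Multipart ETags are not a plain content hash, so flag them
            # for stronger validation before trusting the match.
            "multipart": "-" in etag,
            # Bytes that would be freed if all but one copy were removed.
            "reclaimable_bytes": size * (len(keys) - 1),
        })
    return sorted(candidates, key=lambda c: c["reclaimable_bytes"], reverse=True)
```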

Install

From PyPI:

pipx install s3-data-organizer-mcp

Or run without a persistent install:

uvx s3-data-organizer-mcp

Local development:

python3.11 -m venv .venv
.venv/bin/pip install -e ".[test]"
.venv/bin/python -m pytest

Run the MCP server:

s3-data-organizer-mcp

Example MCP client config:

{
  "mcpServers": {
    "s3-data-organizer": {
      "command": "s3-data-organizer-mcp",
      "env": {
        "AWS_PROFILE": "research",
        "AWS_REGION": "eu-north-1",
        "S3_ORGANIZER_ENDPOINT_URL": "",
        "S3_ORGANIZER_ALLOWED_ROOTS": "s3://YOUR_BUCKET/data,s3://YOUR_BUCKET/archive",
        "S3_ORGANIZER_MAX_SCAN_KEYS": "10000",
        "S3_ORGANIZER_STORAGE_PRICE_USD_PER_GB_MONTH": "0.023",
        "S3_ORGANIZER_ALLOW_WRITES": "false"
      }
    }
  }
}

The same example is available in examples/mcp-config.json.

Configuration

export AWS_PROFILE=research
export AWS_REGION=eu-north-1
export S3_ORGANIZER_ENDPOINT_URL=
export S3_ORGANIZER_ALLOWED_ROOTS=s3://YOUR_BUCKET/data,s3://YOUR_BUCKET/archive
export S3_ORGANIZER_MAX_SCAN_KEYS=10000
export S3_ORGANIZER_STORAGE_PRICE_USD_PER_GB_MONTH=0.023

Writes are intentionally disabled in the current prototype:

export S3_ORGANIZER_ALLOW_WRITES=false
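
For readers wondering how an allowlist like S3_ORGANIZER_ALLOWED_ROOTS might be enforced, the sketch below parses the comma-separated value and checks a requested URI against it. The function names are hypothetical and this is not the project's implementation. The one detail worth noting is the path-boundary check, so that a root of `s3://YOUR_BUCKET/data` does not accidentally allow `s3://YOUR_BUCKET/database`.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/key/path' into (bucket, key_prefix)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in URI: {uri!r}")
    return bucket, key.strip("/")


def parse_allowed_roots(raw):
    """Parse a comma-separated list of S3 URIs, skipping empty entries."""
    return [split_s3_uri(item.strip()) for item in raw.split(",") if item.strip()]


def is_allowed(uri, roots):
    """True if `uri` is one of the roots or sits below one of them."""
    bucket, key = split_s3_uri(uri)
    for root_bucket, root_key in roots:
        if bucket != root_bucket:
            continue
        # An empty root key allows the whole bucket; otherwise require an
        # exact match or a match that ends on a "/" boundary.
        if not root_key or key == root_key or key.startswith(root_key + "/"):
            return True
    return False
```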

For S3-compatible providers such as reg.ru, set S3_ORGANIZER_ENDPOINT_URL, for example:

export AWS_REGION=auto
export S3_ORGANIZER_ENDPOINT_URL=https://s3.regru.cloud
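
As an illustration of how these variables could translate into client settings, here is a small sketch that builds keyword arguments for an S3 client. It assumes a boto3-style client, which the text above does not confirm, and it assumes that an empty S3_ORGANIZER_ENDPOINT_URL means "use the default AWS endpoint". The helper name is hypothetical.

```python
def s3_client_kwargs(env):
    """Build keyword arguments for an S3 client from environment-like settings."""
    kwargs = {}
    region = env.get("AWS_REGION", "").strip()
    if region:
        kwargs["region_name"] = region
    endpoint = env.get("S3_ORGANIZER_ENDPOINT_URL", "").strip()
    if endpoint:
        # Only override the endpoint for S3-compatible providers.
        kwargs["endpoint_url"] = endpoint
    return kwargs


# Possible usage, assuming boto3 is installed:
#   import os, boto3
#   client = boto3.client("s3", **s3_client_kwargs(os.environ))
```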

IAM

Read-only prototype permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketVersions",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::YOUR_BUCKET",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "data/*",
            "archive/*"
          ]
        }
      }
    }
  ]
}

Only s3:ListBucket is needed for the core scan/summarize/rank tools. The version and multipart actions are needed only for inspect_s3_hidden_storage.
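
To show where the two extra permissions come into play, the sketch below counts noncurrent versions, delete markers, and incomplete multipart uploads under a prefix. It assumes a boto3-style S3 client is passed in, uses the `list_object_versions` and `list_multipart_uploads` calls, and reads only the first page of each response, so it understates totals on large prefixes. The function name and output fields are hypothetical.

```python
def summarize_hidden_storage(client, bucket, prefix):
    """Summarize storage that a plain object listing does not show.

    `client` is expected to behave like a boto3 S3 client (an assumption).
    Only the first page of each listing is inspected; no pagination is done.
    """
    # Needs s3:ListBucketVersions.
    versions = client.list_object_versions(Bucket=bucket, Prefix=prefix)
    # Needs s3:ListBucketMultipartUploads.
    uploads = client.list_multipart_uploads(Bucket=bucket, Prefix=prefix)

    noncurrent = [v for v in versions.get("Versions", []) if not v.get("IsLatest")]
    return {
        "noncurrent_versions": len(noncurrent),
        "noncurrent_bytes": sum(v.get("Size", 0) for v in noncurrent),
        "delete_markers": len(versions.get("DeleteMarkers", [])),
        "incomplete_multipart_uploads": len(uploads.get("Uploads", [])),
    }
```

Passing the client in as a parameter keeps the function easy to exercise with a stub object, without network access or credentials.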

Future write-capable versions will need separate policies for tagging, copy, delete, lifecycle configuration, or Batch Operations manifest generation.

Publishing

Release steps are documented in PUBLISHING.md. The short version is:

python -m pytest -q
python -m build
python -m twine check dist/*
python -m twine upload dist/*

Product Direction

This should not become a generic S3 file manager. The useful product is:

  • S3 layout intelligence.
  • Read-only pseudo-folder navigation.
  • Cleanup options.
  • Lifecycle rule suggestions.
  • Duplicate candidate review.
  • Cold-candidate and artifact-type ranking.
  • Cost/savings estimates.
  • Safe manifests for AWS-native execution.

For large buckets, the right backend is likely S3 Inventory + Athena + S3 Batch Operations rather than listing every object interactively.
