S3 Object Search Tool

A powerful Python tool for searching S3 objects across multiple buckets with flexible filtering and output options.

Features

  • Search for objects containing specific terms in their keys
  • Support for regex patterns in all search and filter operations
  • Filter buckets by inclusion/exclusion patterns
  • Multiple output formats (table, stacked, raw, CSV)
  • Streaming output for immediate results
  • Human-readable file sizes
  • Cross-platform compatibility

Requirements

  • Python 3.6+
  • AWS credentials configured (via AWS CLI, environment variables, or IAM roles)
  • Required permissions: s3:ListBucket (the IAM action that authorizes ListObjectsV2 calls) and, to enumerate buckets, s3:ListAllMyBuckets
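
As a rough illustration of the permissions above, the following sketch builds a list-only IAM policy document. It assumes the tool enumerates buckets and then lists keys in each one; the statement IDs and the wildcard resources are illustrative choices, not something this project prescribes. Note that ListObjectsV2 is an API operation authorized by the s3:ListBucket action, not an IAM action of its own.

```python
import json

# Hypothetical least-privilege policy for a list-only search tool.
# Assumption: the tool enumerates buckets, then lists the keys in each.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EnumerateBuckets",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*",
        },
        {
            "Sid": "ListKeysInBuckets",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::*",
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Narrowing the second Resource to specific bucket ARNs would restrict which buckets can be searched.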

Installation

Option 1: Install from PyPI (Recommended)

pip install search-s3

Option 2: Install from GitHub

pip install git+https://github.com/avanrossum/search_s3.git

Option 3: Install from source

# Clone the repository
git clone https://github.com/avanrossum/search_s3.git
cd search_s3
pip install -e .

AWS Configuration

Ensure you have AWS credentials configured:

aws configure

The configured credentials need the S3 list permissions given under Requirements.

Basic Usage

Required Arguments

The search term is required and can be given either as a positional argument or with a flag. By default, the term is matched as a literal substring of each object key:

# Positional argument
search-s3 "search-term"

# Flag format
search-s3 --term "search-term"
search-s3 -t "search-term"

Regex Support

Enable regex pattern matching for more powerful searches:

# Case-sensitive regex
search-s3 --regex -t "config\.(json|yaml|yml)$"

# Case-insensitive regex
search-s3 --regex-ignore-case -t "backup.*202[34]"

# Regex with bucket filtering
search-s3 --regex -t "\.log$" -b "prod.*"

Optional Bucket Filtering

Filter buckets by inclusion pattern:

# Search only buckets containing "gridpane" (second positional argument)
search-s3 "foobar" "gridpane"

# Using flags
search-s3 --term "foobar" --bucket "gridpane"
search-s3 -t "foobar" -b "gridpane"
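
To make the term and bucket filters concrete, here is a minimal sketch of how such a search could be written against the S3 API. It is not the tool's actual implementation: the function name search_keys is hypothetical, and the client is passed in so that the sketch only assumes an object that behaves like boto3.client("s3") (list_buckets plus a list_objects_v2 paginator).

```python
from typing import Iterator, Optional


def search_keys(client, term: str, bucket_filter: Optional[str] = None) -> Iterator[dict]:
    """Yield one dict per object whose key contains `term` (literal match).

    `client` is expected to behave like boto3.client("s3").
    """
    for bucket in client.list_buckets()["Buckets"]:
        name = bucket["Name"]
        if bucket_filter and bucket_filter not in name:
            continue  # bucket name does not contain the filter
        paginator = client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=name):
            # "Contents" is absent from a page when nothing was listed
            for obj in page.get("Contents", []):
                if term in obj["Key"]:
                    yield {
                        "Bucket": name,
                        "Key": obj["Key"],
                        "Size": obj["Size"],
                        "LastModified": obj["LastModified"],
                        "StorageClass": obj.get("StorageClass", "STANDARD"),
                    }
```

With boto3 installed and credentials configured, search_keys(boto3.client("s3"), "foobar", "gridpane") would yield matches lazily, one page of keys at a time.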

Advanced Filtering

Regex Patterns

The tool supports three modes of pattern matching:

  1. Literal mode (default): Simple substring matching
  2. Regex mode (--regex): Case-sensitive regex patterns
  3. Regex ignore-case mode (--regex-ignore-case): Case-insensitive regex patterns
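
The three modes can be sketched as a small predicate factory. This is an illustration only: the function name make_matcher and the mode strings are hypothetical, and it assumes regex patterns are applied with search semantics (a match anywhere in the key unless the pattern is anchored), which is what the anchored examples in this README suggest.

```python
import re
from typing import Callable

MODES = ("literal", "regex", "regex-ignore-case")  # hypothetical mode names


def make_matcher(pattern: str, mode: str = "literal") -> Callable[[str], bool]:
    """Return a predicate implementing one of the three matching modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "literal":
        # plain substring test; regex metacharacters have no special meaning
        return lambda text: pattern in text
    flags = re.IGNORECASE if mode == "regex-ignore-case" else 0
    compiled = re.compile(pattern, flags)
    # re.search finds the pattern anywhere in the text unless it is anchored
    return lambda text: compiled.search(text) is not None


is_config = make_matcher(r"config\.(json|yaml|yml)$", "regex")
print(is_config("app/config.yaml"))
print(is_config("app/config.yaml.bak"))
```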

Common Regex Examples

# Find files with specific extensions
search-s3 --regex -t "\.(log|txt|csv)$"

# Find keys containing a 2023 or 2024 date (YYYY-MM-DD)
search-s3 --regex -t "202[34]-[01][0-9]-[0-3][0-9]"

# Find JSON files under the config/ prefix
search-s3 --regex -t "^config/.*\.json$"

# Case-insensitive search
search-s3 --regex-ignore-case -t "backup.*\.(zip|tar|gz)$"

# Complex patterns
search-s3 --regex -t "(prod|staging)/.*\.(log|error)$"
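
Before running a pattern across many buckets, it can help to try it locally. The sketch below checks the patterns above against a handful of made-up keys using Python's re module; the sample keys are hypothetical, and it assumes the tool applies patterns with re.search semantics.

```python
import re

# Hypothetical object keys, used only to exercise the patterns
sample_keys = [
    "config/app.json",
    "prod/api/server.log",
    "staging/worker.error",
    "exports/2024-03-15/report.csv",
    "Backups/site-BACKUP.tar",
]

patterns = {
    "extensions": r"\.(log|txt|csv)$",
    "dated": r"202[34]-[01][0-9]-[0-3][0-9]",
    "config dir": r"^config/.*\.json$",
    "environments": r"(prod|staging)/.*\.(log|error)$",
}

matches = {
    label: [key for key in sample_keys if re.search(rx, key)]
    for label, rx in patterns.items()
}
for label, keys in matches.items():
    print(f"{label}: {keys}")

# The case-insensitive example, checked with re.IGNORECASE
archive_hits = [
    key for key in sample_keys
    if re.search(r"backup.*\.(zip|tar|gz)$", key, re.IGNORECASE)
]
print(archive_hits)
```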

Exclusion Filters

Exclude objects or buckets containing specific terms:

# Exclude objects with "backup" in the key
search-s3 -t "config" -te "backup"

# Exclude buckets with "archive" in the name
search-s3 -t "data" -be "archive"

# Combine inclusion and exclusion
search-s3 -t "foobar" -b "gridpane" -te "temp" -be "archive"

# Regex exclusions
search-s3 --regex -t "\.log$" -te "\.(tmp|temp)$" -be "archive.*"

Multiple Exclusions

You can use multiple exclusion filters:

# Exclude multiple terms from object keys
search-s3 -t "config" -te "backup" -te "temp" -te "cache"

# Exclude multiple bucket patterns
search-s3 -t "data" -be "archive" -be "old" -be "deprecated"
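
One plausible reading of repeated exclusion flags is that a key is kept only if it matches the search term and matches none of the exclusions. The sketch below shows that logic in literal mode; key_selected is a hypothetical helper, and the "any exclusion rejects" rule is an assumption about the tool's behavior.

```python
from typing import Iterable


def key_selected(key: str, term: str, excluded_terms: Iterable[str] = ()) -> bool:
    """True if `key` contains `term` and none of `excluded_terms` (literal mode)."""
    if term not in key:
        return False
    # a single matching exclusion is enough to reject the key
    return not any(excluded in key for excluded in excluded_terms)


keys = [
    "app/config.json",
    "backup/config.json",
    "app/temp/config.yaml",
    "app/readme.md",
]
kept = [k for k in keys if key_selected(k, "config", ["backup", "temp", "cache"])]
print(kept)  # ['app/config.json']
```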

Output Formats

1. Table Format (Default)

Clean, aligned table output with no truncation:

search-s3 "foobar"

Example output:

Bucket                                                 Key                         Size  Modified                   Class
gridpane-backups-58s48ra6-d31e-4ffe-6326-6421ad5ca95b  snapshots/foobar-com/10481  550B  2025-06-20T00:00:10+00:00  STANDARD
gridpane-backups-58s48ra6-d31e-4ffe-6326-6421ad5ca95b  snapshots/foobar-com/11231  550B  2025-07-20T00:00:10+00:00  STANDARD
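
The Size column shows human-readable values such as 550B. A minimal sketch of that kind of formatting follows; the unit suffixes, the 1024 divisor, and the one-decimal rounding are assumptions, since the README only shows a byte-sized example.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count as a short string such as '550B' or '1.5KB'."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            # whole bytes print without decimals, larger units with one
            return f"{int(size)}B" if unit == "B" else f"{size:.1f}{unit}"
        size /= 1024


print(human_size(550))
print(human_size(1536))
```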

2. Stacked Format

One object per section with clear separation:

search-s3 "foobar" --stacked

Example output:

=== Object 1 ===
Bucket:     gridpane-backups-58s48ra6-531e-4ffe-1233-6421ad5ca95b
Key:        snapshots/foobar-com/10481
Size:       550B
Modified:   2025-06-20T00:00:10+00:00
Class:      STANDARD

=== Object 2 ===
Bucket:     gridpane-backups-58s48ra6-531e-4ffe-1233-6421ad5ca95b
Key:        snapshots/foobar-com/11231
Size:       550B
Modified:   2025-07-20T00:00:10+00:00
Class:      STANDARD

3. Raw Format

Tab-separated output for easy copy-paste:

search-s3 "foobar" --raw

Example output:

Bucket	Key	Size	LastModified	StorageClass
gridpane-backups-58s48ra6-g31e-4ffe-7895-6421ad5ca95b	snapshots/foobar-com/10481	550B	2025-06-20T00:00:10+00:00	STANDARD
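
Because the raw format is tab-separated with a header row, it can be consumed by a script as well as pasted. The sketch below parses text in that layout with Python's csv module; the bucket name is a placeholder, and in practice the text would come from the tool's standard output (for example via a pipe) rather than a string literal.

```python
import csv
import io

# Sample text in the tab-separated layout shown above (bucket name is hypothetical)
raw_output = (
    "Bucket\tKey\tSize\tLastModified\tStorageClass\n"
    "example-bucket\tsnapshots/foobar-com/10481\t550B\t2025-06-20T00:00:10+00:00\tSTANDARD\n"
)

# DictReader uses the header row for field names
reader = csv.DictReader(io.StringIO(raw_output), delimiter="\t")
rows = list(reader)
for row in rows:
    print(row["Bucket"], row["Key"])
```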

4. CSV Format

Comma-separated values for spreadsheet import:

# Output to terminal
search-s3 "foobar" --csv

# Save to file
search-s3 "foobar" --csv --csv-file results.csv

Example output:

Bucket,Key,Size,LastModified,StorageClass
gridpane-backups-58s48ra6-a31e-4ffe-1548-6421ad5ca95b,snapshots/foobar-com/10481,550B,2025-06-20T00:00:10+00:00,STANDARD
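
A saved CSV file can also be analyzed without a spreadsheet. The following sketch counts results per bucket and picks the most recently modified object; the inline text stands in for a file such as results.csv, and the bucket name is a placeholder. It relies on the LastModified values being ISO 8601 timestamps, as in the example output.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Stand-in for open("results.csv", newline=""); bucket name is hypothetical
csv_text = """Bucket,Key,Size,LastModified,StorageClass
example-bucket,snapshots/foobar-com/10481,550B,2025-06-20T00:00:10+00:00,STANDARD
example-bucket,snapshots/foobar-com/11231,550B,2025-07-20T00:00:10+00:00,STANDARD
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Number of matching objects in each bucket
per_bucket = Counter(row["Bucket"] for row in rows)
# Most recently modified object, comparing parsed timestamps
newest = max(rows, key=lambda row: datetime.fromisoformat(row["LastModified"]))

print(per_bucket)
print(newest["Key"])
```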

Performance Characteristics

  • Table format: Collects all results first for proper column sizing
  • Stacked format: Streams results as they're found
  • Raw format: Streams results as they're found
  • CSV format: Streams results as they're found
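
The difference comes from column sizing: an aligned table cannot be printed until the widest cell in every column is known, whereas delimited formats can emit each row on arrival. The sketch below illustrates the two approaches with hypothetical helpers (stream_rows, table_rows); it is not the tool's own code.

```python
from typing import Iterable, Iterator, List, Sequence


def stream_rows(rows: Iterable[Sequence[str]]) -> Iterator[str]:
    """Yield each row as soon as it arrives (raw-style, tab-separated)."""
    for row in rows:
        yield "\t".join(row)


def table_rows(rows: Iterable[Sequence[str]]) -> List[str]:
    """Collect every row first, then pad each column to its widest cell."""
    collected = [list(row) for row in rows]
    if not collected:
        return []
    # zip(*collected) transposes rows into columns
    widths = [max(len(cell) for cell in column) for column in zip(*collected)]
    return [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in collected
    ]


data = [("Bucket", "Key"), ("b1", "a/long/key"), ("bucket-two", "k")]
for line in table_rows(data):
    print(line)
```

The streaming version holds one row in memory at a time, while the table version holds them all, which is why the table format shows nothing until the search finishes.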

Real-World Examples

Find Configuration Files

# Find all config files but exclude backups and temp files
search-s3 -t "config" -te "backup" -te "temp" --stacked

# Find config files using regex (more precise)
search-s3 --regex -t "config\.(json|yaml|yml|conf)$" -te "\.(bak|tmp)$" --stacked

Search Specific Project

# Search for project files in buckets matching "production", skipping archive buckets
search-s3 -t "myproject" -b "production" -be "archive" --csv --csv-file project_files.csv

Backup Analysis

# Find backup files in "gridpane" buckets, excluding keys that contain "old"
search-s3 -t "backup" -b "gridpane" -te "old" --raw

Data Migration Planning

# Find all data files for migration planning
search-s3 -t "data" -be "archive" -be "deprecated" --csv --csv-file migration_data.csv

# Find specific data file types using regex
search-s3 --regex -t "\.(csv|json|parquet)$" -be "archive.*" --csv --csv-file data_files.csv

Command Line Options

Option               Short  Description
--term               -t     Search term or regex pattern (case-sensitive)
--bucket             -b     Include buckets matching this term or regex
--term-excluding     -te    Exclude objects with keys matching this term or regex
--bucket-excluding   -be    Exclude buckets matching this term or regex
--regex                     Treat all patterns as regex (case-sensitive)
--regex-ignore-case         Treat all patterns as regex (case-insensitive)
--raw                       Output tab-separated data
--stacked                   Output in stacked format
--csv                       Output in CSV format
--csv-file                  Specify CSV output file
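
For readers who want to script around these options, the sketch below shows how an equivalent command line could be declared with Python's argparse. It is a reconstruction from the options listed here, not the project's actual parser: the destination names term_pos and bucket_pos are hypothetical, and the use of action="append" for repeatable exclusions is an assumption based on the Multiple Exclusions examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="search-s3")
    # Positional forms: search-s3 TERM [BUCKET]
    parser.add_argument("term_pos", nargs="?", metavar="TERM")
    parser.add_argument("bucket_pos", nargs="?", metavar="BUCKET")
    parser.add_argument("-t", "--term")
    parser.add_argument("-b", "--bucket")
    # action="append" lets an exclusion flag be repeated
    parser.add_argument("-te", "--term-excluding", action="append", default=[])
    parser.add_argument("-be", "--bucket-excluding", action="append", default=[])
    parser.add_argument("--regex", action="store_true")
    parser.add_argument("--regex-ignore-case", action="store_true")
    parser.add_argument("--raw", action="store_true")
    parser.add_argument("--stacked", action="store_true")
    parser.add_argument("--csv", action="store_true")
    parser.add_argument("--csv-file", metavar="PATH")
    return parser


args = build_parser().parse_args(
    ["-t", "config", "-te", "backup", "-te", "temp", "--csv", "--csv-file", "out.csv"]
)
# The flag form wins if both the flag and the positional are given
term = args.term or args.term_pos
print(term, args.term_excluding, args.csv_file)
```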

Error Handling

  • Missing search term: Shows error message with usage instructions
  • No results found: Displays "No results found." message
  • AWS errors: Standard AWS SDK error messages
  • File write errors: Clear error messages for CSV file operations

Tips and Best Practices

  1. Use bucket filtering to improve performance when searching large numbers of buckets
  2. Combine inclusion and exclusion filters for precise results
  3. Use stacked format for detailed inspection of individual objects
  4. Use CSV format for data analysis and reporting
  5. Use raw format for quick copy-paste operations
  6. Streaming formats (stacked, raw, CSV) provide immediate feedback for long searches

Troubleshooting

Common Issues

  1. No results found: Check your search term and bucket filters
  2. Permission denied: Ensure AWS credentials have S3 list permissions
  3. CSV file not created: Check write permissions in the target directory
  4. Slow performance: Use bucket filtering to reduce search scope

Debug Mode

No debug flag is listed among the options above. For troubleshooting, add debug prints to the tool's source; an editable install (Option 3) makes such changes take effect immediately.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request
