S3 Object Search Tool
A powerful Python tool for searching S3 objects across multiple buckets with flexible filtering and output options.
Features
- Search for objects containing specific terms in their keys
- Support for regex patterns in all search and filter operations
- Filter buckets by inclusion/exclusion patterns
- Multiple output formats (table, stacked, raw, CSV)
- Streaming output for immediate results
- Human-readable file sizes
- Cross-platform compatibility
Requirements
- Python 3.6+
- AWS credentials configured (via AWS CLI, environment variables, or IAM roles)
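- Required permissions: s3:ListBucket (this action also authorizes the ListObjectsV2 API call; there is no separate s3:ListObjectsV2 IAM action). Enumerating buckets typically also requires s3:ListAllMyBuckets.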
Installation
Option 1: Install from PyPI (Recommended)
pip install search-s3
Option 2: Install from GitHub
pip install git+https://github.com/avanrossum/search_s3.git
Option 3: Install from source
# Clone the repository
git clone https://github.com/avanrossum/search_s3.git
cd search_s3
pip install -e .
AWS Configuration
Ensure you have AWS credentials configured:
aws configure
Required permissions: s3:ListBucket, which also authorizes the ListObjectsV2 API call.
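As a rough illustration, a list-only IAM policy could be generated as in the sketch below. It assumes the tool enumerates buckets (s3:ListAllMyBuckets) and lists keys through ListObjectsV2 (authorized by s3:ListBucket). The helper name `build_list_only_policy` and the bucket name `example-bucket` are placeholders, not part of the tool.

```python
import json


def build_list_only_policy(bucket_names):
    """Build an IAM policy document granting list-only access (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumed to be needed so the tool can enumerate buckets.
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
            {
                # s3:ListBucket applies to bucket ARNs, not object ARNs.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": [f"arn:aws:s3:::{name}" for name in bucket_names],
            },
        ],
    }


policy = build_list_only_policy(["example-bucket"])  # placeholder bucket name
print(json.dumps(policy, indent=2))
```

Restricting the second statement to specific bucket ARNs narrows what the credentials can list; use "arn:aws:s3:::*" there only if every bucket should be searchable.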
Basic Usage
Required Arguments
The search term is required and can be given either as a positional argument or with a flag. By default, the tool matches it as a literal substring of the object key:
# Positional argument
search-s3 "search-term"
# Flag format
search-s3 --term "search-term"
search-s3 -t "search-term"
Regex Support
Enable regex pattern matching for more powerful searches:
# Case-sensitive regex
search-s3 --regex -t "config\.(json|yaml|yml)$"
# Case-insensitive regex
search-s3 --regex-ignore-case -t "backup.*202[34]"
# Regex with bucket filtering
search-s3 --regex -t "\.log$" -b "prod.*"
Optional Bucket Filtering
Filter buckets by inclusion pattern:
# Search only buckets containing "gridpane"
search-s3 "foobar" "gridpane"
# Using flags
search-s3 --term "foobar" --bucket "gridpane"
search-s3 -t "foobar" -b "gridpane"
Advanced Filtering
Regex Patterns
The tool supports three modes of pattern matching:
- Literal mode (default): Simple substring matching
- Regex mode (--regex): Case-sensitive regex patterns
- Regex ignore-case mode (--regex-ignore-case): Case-insensitive regex patterns
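The three modes can be pictured with a small Python sketch. This is not the tool's source: `key_matches` is a hypothetical helper, and it assumes the regex modes use unanchored `re.search` semantics, which is consistent with the `^` and `$` anchors in the examples.

```python
import re


def key_matches(key, pattern, mode="literal"):
    """Return True if key matches pattern under the given mode.

    Assumption: regex modes search anywhere in the key, so "^" and "$"
    are needed to pin a pattern to the start or end.
    """
    if mode == "literal":
        return pattern in key
    if mode == "regex":
        return re.search(pattern, key) is not None
    if mode == "regex-ignore-case":
        return re.search(pattern, key, re.IGNORECASE) is not None
    raise ValueError(f"unknown mode: {mode}")


key = "Config/app.JSON"
print(key_matches(key, "app.", "literal"))                # True
print(key_matches(key, r"\.json$", "regex"))              # False
print(key_matches(key, r"\.json$", "regex-ignore-case"))  # True
```

Note that in literal mode a dot is just a dot, while in the regex modes it must be escaped to avoid matching any character.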
Common Regex Examples
# Find files with specific extensions
search-s3 --regex -t "\.(log|txt|csv)$"
# Find files from specific date ranges
search-s3 --regex -t "202[34]-[01][0-9]-[0-3][0-9]"
# Find files in specific directories
search-s3 --regex -t "^config/.*\.json$"
# Case-insensitive search
search-s3 --regex-ignore-case -t "backup.*\.(zip|tar|gz)$"
# Complex patterns
search-s3 --regex -t "(prod|staging)/.*\.(log|error)$"
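Before running a pattern against many buckets, it can help to try it locally. The sketch below applies three of the patterns above to made-up sample keys using Python's `re` module; it assumes the tool searches anywhere in the key. It also shows that the date pattern accepts impossible dates such as 2023-19-39, because the character classes only constrain individual digits.

```python
import re

# Patterns taken from the examples above; the keys are made-up samples.
patterns = {
    "extension": r"\.(log|txt|csv)$",
    "date": r"202[34]-[01][0-9]-[0-3][0-9]",
    "config_dir": r"^config/.*\.json$",
}
sample_keys = [
    "logs/app-2024-03-15.log",
    "config/db/settings.json",
    "archive/2023-19-39/notes.txt",  # not a real date, but the pattern allows it
    "images/logo.png",
]


def matching_keys(pattern, keys):
    """Return the keys that the pattern matches anywhere (re.search semantics, assumed)."""
    compiled = re.compile(pattern)
    return [k for k in keys if compiled.search(k)]


for name, pattern in patterns.items():
    print(name, matching_keys(pattern, sample_keys))
```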
Exclusion Filters
Exclude objects or buckets containing specific terms:
# Exclude objects with "backup" in the key
search-s3 -t "config" -te "backup"
# Exclude buckets with "archive" in the name
search-s3 -t "data" -be "archive"
# Combine inclusion and exclusion
search-s3 -t "foobar" -b "gridpane" -te "temp" -be "archive"
# Regex exclusions
search-s3 --regex -t "\.log$" -te "\.(tmp|temp)$" -be "archive.*"
Multiple Exclusions
You can use multiple exclusion filters:
# Exclude multiple terms from object keys
search-s3 -t "config" -te "backup" -te "temp" -te "cache"
# Exclude multiple bucket patterns
search-s3 -t "data" -be "archive" -be "old" -be "deprecated"
# Regex exclusions
search-s3 --regex -t "\.log$" -te "\.(tmp|temp)$" -be "archive.*"
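One way to read the combination of inclusion and exclusion filters is sketched below. It assumes that a name is kept when it matches the inclusion pattern (if one is given) and matches none of the exclusion patterns; the helpers `make_matcher` and `passes` are hypothetical and only illustrate that logic.

```python
import re


def make_matcher(pattern, use_regex=False, ignore_case=False):
    """Return a predicate for one pattern (literal substring unless use_regex)."""
    if not use_regex:
        return lambda text: pattern in text
    compiled = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return lambda text: compiled.search(text) is not None


def passes(text, include, excludes, use_regex=False):
    """Keep text if it matches include (when given) and none of the excludes."""
    if include is not None and not make_matcher(include, use_regex)(text):
        return False
    return not any(make_matcher(e, use_regex)(text) for e in excludes)


# Mirrors: search-s3 -t "config" -te "backup" -te "temp" -te "cache"
keys = ["app/config.json", "backup/config.json", "app/config.temp", "cache/config"]
kept = [k for k in keys if passes(k, "config", ["backup", "temp", "cache"])]
print(kept)  # ['app/config.json']
```

The same predicate applies to bucket names, where the inclusion pattern is optional and `include` would be `None`.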
Output Formats
1. Table Format (Default)
Clean, aligned table output with no truncation:
search-s3 "foobar"
Example output:
Bucket Key Size Modified Class
gridpane-backups-58s48ra6-d31e-4ffe-6326-6421ad5ca95b snapshots/foobar-com/10481 550B 2025-06-20T00:00:10+00:00 STANDARD
gridpane-backups-58s48ra6-d31e-4ffe-6326-6421ad5ca95b snapshots/foobar-com/11231 550B 2025-07-20T00:00:10+00:00 STANDARD
2. Stacked Format
One object per section with clear separation:
search-s3 "foobar" --stacked
Example output:
=== Object 1 ===
Bucket: gridpane-backups-58s48ra6-531e-4ffe-1233-6421ad5ca95b
Key: snapshots/foobar-com/10481
Size: 550B
Modified: 2025-06-20T00:00:10+00:00
Class: STANDARD
=== Object 2 ===
Bucket: gridpane-backups-58s48ra6-531e-4ffe-1233-6421ad5ca95b
Key: snapshots/foobar-com/11231
Size: 550B
Modified: 2025-07-20T00:00:10+00:00
Class: STANDARD
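If stacked output needs to be processed afterwards, it can be parsed back into records. The sketch below assumes the layout shown in the example (a `=== Object N ===` header followed by `Field: value` lines); `parse_stacked` is a hypothetical helper, and the real output may differ in whitespace.

```python
def parse_stacked(text):
    """Parse stacked output into a list of dicts, one per '=== Object N ===' section."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("=== Object"):
            records.append({})  # start a new record at each section header
        elif ": " in line and records:
            # Split on the first ": " only, so values containing colons survive.
            field, value = line.split(": ", 1)
            records[-1][field] = value
    return records


sample_output = """\
=== Object 1 ===
Bucket: gridpane-backups-58s48ra6-531e-4ffe-1233-6421ad5ca95b
Key: snapshots/foobar-com/10481
Size: 550B
Modified: 2025-06-20T00:00:10+00:00
Class: STANDARD
=== Object 2 ===
Bucket: gridpane-backups-58s48ra6-531e-4ffe-1233-6421ad5ca95b
Key: snapshots/foobar-com/11231
Size: 550B
Modified: 2025-07-20T00:00:10+00:00
Class: STANDARD
"""

records = parse_stacked(sample_output)
print(len(records), records[0]["Key"])
```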
3. Raw Format
Tab-separated output for easy copy-paste:
search-s3 "foobar" --raw
Example output:
Bucket Key Size LastModified StorageClass
gridpane-backups-58s48ra6-g31e-4ffe-7895-6421ad5ca95b snapshots/foobar-com/10481 550B 2025-06-20T00:00:10+00:00 STANDARD
4. CSV Format
Comma-separated values for spreadsheet import:
# Output to terminal
search-s3 "foobar" --csv
# Save to file
search-s3 "foobar" --csv --csv-file results.csv
Example output:
Bucket,Key,Size,LastModified,StorageClass
gridpane-backups-58s48ra6-a31e-4ffe-1548-6421ad5ca95b,snapshots/foobar-com/10481,550B,2025-06-20T00:00:10+00:00,STANDARD
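The CSV output can be read with Python's standard `csv` module. The sketch below uses the header and row from the example output; in practice the text would come from the file written with --csv-file. Parsing the timestamp with `datetime.fromisoformat` requires Python 3.7 or newer.

```python
import csv
import io
from datetime import datetime

# Sample text copied from the example output above; normally this would be
# the contents of the file written with --csv-file.
csv_text = """Bucket,Key,Size,LastModified,StorageClass
gridpane-backups-58s48ra6-a31e-4ffe-1548-6421ad5ca95b,snapshots/foobar-com/10481,550B,2025-06-20T00:00:10+00:00,STANDARD
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    # LastModified is an ISO 8601 timestamp with a UTC offset.
    modified = datetime.fromisoformat(row["LastModified"])
    print(row["Key"], row["Size"], modified.date())
```

The Size column holds the human-readable value (for example 550B), so it would need converting before any arithmetic.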
Performance Characteristics
- Table format: Collects all results first for proper column sizing
- Stacked format: Streams results as they're found
- Raw format: Streams results as they're found
- CSV format: Streams results as they're found
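The difference comes down to whether a formatter needs to see every row before it can print the first one. The sketch below is an illustration under that assumption, not the tool's implementation: `render_table` has to buffer all rows to compute column widths, while `render_raw` is a generator that emits each line as soon as its row arrives. `fake_search` stands in for a slow S3 listing with made-up data.

```python
def render_table(rows, headers):
    """Buffer every row, then pad each column to its widest cell."""
    rows = list(rows)  # consumes the whole iterator before any output exists
    widths = [max(len(str(cell)) for cell in column) for column in zip(headers, *rows)]
    lines = []
    for row in [headers] + rows:
        padded = (str(cell).ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(padded).rstrip())
    return lines


def render_raw(rows):
    """Yield one tab-separated line per row as soon as the row arrives."""
    for row in rows:
        yield "\t".join(str(cell) for cell in row)


def fake_search():
    """Stand-in for a slow S3 listing (hypothetical data)."""
    yield ("bucket-a", "logs/app.log", "550B")
    yield ("bucket-with-long-name", "x", "1KB")


for line in render_raw(fake_search()):
    print(line)
for line in render_table(fake_search(), ["Bucket", "Key", "Size"]):
    print(line)
```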
Real-World Examples
Find Configuration Files
# Find all config files but exclude backups and temp files
search-s3 -t "config" -te "backup" -te "temp" --stacked
# Find config files using regex (more precise)
search-s3 --regex -t "config\.(json|yaml|yml|conf)$" -te "\.(bak|tmp)$" --stacked
Search Specific Project
# Search for project files in specific bucket pattern
search-s3 -t "myproject" -b "production" -be "archive" --csv --csv-file project_files.csv
Backup Analysis
# Find backup files in buckets matching "gridpane", excluding keys that contain "old"
search-s3 -t "backup" -b "gridpane" -te "old" --raw
Data Migration Planning
# Find all data files for migration planning
search-s3 -t "data" -be "archive" -be "deprecated" --csv --csv-file migration_data.csv
# Find specific data file types using regex
search-s3 --regex -t "\.(csv|json|parquet)$" -be "archive.*" --csv --csv-file data_files.csv
Command Line Options
| Option | Short | Description |
|---|---|---|
| --term | -t | Search term or regex pattern (case-sensitive) |
| --bucket | -b | Include buckets matching this term or regex |
| --term-excluding | -te | Exclude objects with keys matching this term or regex |
| --bucket-excluding | -be | Exclude buckets matching this term or regex |
| --regex | | Treat all patterns as regex (case-sensitive) |
| --regex-ignore-case | | Treat all patterns as regex (case-insensitive) |
| --raw | | Output tab-separated data |
| --stacked | | Output in stacked format |
| --csv | | Output in CSV format |
| --csv-file | | Specify CSV output file |
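To make the option semantics concrete, here is a hypothetical `argparse` reconstruction of the interface described in the table. It is a sketch and not the tool's actual parser: the destination names `term_pos` and `bucket_pos`, the use of `action="append"` for repeatable exclusions, and the error message are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options; not the tool's code."""
    parser = argparse.ArgumentParser(prog="search-s3")
    parser.add_argument("term_pos", nargs="?", help="search term (positional form)")
    parser.add_argument("bucket_pos", nargs="?", help="bucket filter (positional form)")
    parser.add_argument("-t", "--term")
    parser.add_argument("-b", "--bucket")
    # action="append" lets -te / -be be repeated on one command line.
    parser.add_argument("-te", "--term-excluding", action="append", default=[])
    parser.add_argument("-be", "--bucket-excluding", action="append", default=[])
    parser.add_argument("--regex", action="store_true")
    parser.add_argument("--regex-ignore-case", action="store_true")
    parser.add_argument("--raw", action="store_true")
    parser.add_argument("--stacked", action="store_true")
    parser.add_argument("--csv", action="store_true")
    parser.add_argument("--csv-file")
    return parser


def parse(argv):
    """Parse argv and merge the positional and flag forms of term and bucket."""
    parser = build_parser()
    args = parser.parse_args(argv)
    args.term = args.term or args.term_pos
    args.bucket = args.bucket or args.bucket_pos
    if not args.term:
        parser.error("a search term is required")
    return args


args = parse(["-t", "config", "-te", "backup", "-te", "temp", "--stacked"])
print(args.term, args.term_excluding, args.stacked)
```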
Error Handling
- Missing search term: Shows error message with usage instructions
- No results found: Displays "No results found." message
- AWS errors: Standard AWS SDK error messages
- File write errors: Clear error messages for CSV file operations
Tips and Best Practices
- Use bucket filtering to improve performance when searching large numbers of buckets
- Combine inclusion and exclusion filters for precise results
- Use stacked format for detailed inspection of individual objects
- Use CSV format for data analysis and reporting
- Use raw format for quick copy-paste operations
- Streaming formats (stacked, raw, CSV) provide immediate feedback for long searches
Troubleshooting
Common Issues
- No results found: Check your search term and bucket filters
- Permission denied: Ensure AWS credentials have S3 list permissions
- CSV file not created: Check write permissions in the target directory
- Slow performance: Use bucket filtering to reduce search scope
Debug Mode
No verbose flag is documented. For troubleshooting, add debug print statements to the tool's source.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request