AWS Glue Workflow Analyzer
AWS Glue Workflow Analyzer is a Python tool designed to streamline the analysis and debugging of AWS Glue workflows. By leveraging AWS services such as Glue and CloudWatch Logs, this tool provides detailed insights into workflow executions, helping you identify and troubleshoot errors efficiently.
Features
- AWS Client Management: Initialize and manage AWS Glue and CloudWatch Logs clients with robust error handling.
- Step Details Collection: Gather detailed execution data for each step in a workflow, including errors and affected tables.
- Error Context Retrieval: Retrieve relevant error logs from CloudWatch, pinpointing the root cause of failures.
- Workflow Run Retrieval: Fetch and filter workflow runs from AWS Glue within a specified time range.
- Table Analysis: Identify tables affected by workflow failures using a depth-first search (DFS) on the workflow graph.
- Output Management: Save analysis results in JSON or CSV format for easy sharing and review.
- Rich Logging: Enhanced logging with the Rich library for better readability and debugging.
- Command-Line Interface (CLI): Easy-to-use CLI for analyzing workflows and generating reports.
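The table analysis feature above can be illustrated with a small depth-first search. This is a minimal sketch, not the tool's actual implementation: the graph representation (an adjacency dict plus a node-to-tables dict), the function name affected_tables, and the sample workflow are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def affected_tables(
    edges: Dict[str, List[str]],
    tables_by_node: Dict[str, List[str]],
    failed_node: str,
) -> Set[str]:
    """Collect tables written by the failed node and by every node downstream of it.

    Hypothetical helper: `edges` maps a node to its downstream nodes and
    `tables_by_node` maps a node to the tables it writes.
    """
    visited: Set[str] = set()
    tables: Set[str] = set()
    stack = [failed_node]  # explicit stack gives an iterative depth-first traversal
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # guards against revisiting shared or cyclic nodes
        visited.add(node)
        tables.update(tables_by_node.get(node, []))
        # Downstream steps depend on this one, so their tables are affected too.
        stack.extend(edges.get(node, []))
    return tables


# Hypothetical workflow: ingest -> clean -> (aggregate, export)
edges = {"ingest": ["clean"], "clean": ["aggregate", "export"]}
tables_by_node = {
    "ingest": ["raw.events"],
    "clean": ["staging.events"],
    "aggregate": ["marts.daily_events"],
    "export": [],
}
print(sorted(affected_tables(edges, tables_by_node, "clean")))
# prints ['marts.daily_events', 'staging.events']
```

Starting the search at the failed node means upstream tables (here raw.events) are left out, since the steps that wrote them completed before the failure.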
Installation
Prerequisites
- Python 3.7 or higher
- AWS credentials configured (via AWS CLI, environment variables, or IAM roles)
Install from PyPI
pip install aws-glue-workflow-analyzer
Install from Source
Clone the repository and install the dependencies:
git clone https://github.com/victor-mariano-leite/aws_glue_workflow_analyzer.git
cd aws_glue_workflow_analyzer
pip install .
Usage
The AWS Glue Workflow Analyzer can be used to analyze workflows, identify errors, and generate detailed reports. You can run the tool directly from the command line.
Example
gwfa -w my-glue-workflow -d 7 -o output.json -f json
This command analyzes the my-glue-workflow workflow for the past 7 days, saving the results in JSON format to output.json.
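To make the look-back window concrete, here is a rough sketch of how runs from the last N days could be fetched and filtered. It assumes a boto3 Glue client is passed in and relies on the Glue GetWorkflowRuns operation (get_workflow_runs, with its Runs, StartedOn, and NextToken response fields). The function name recent_workflow_runs and its now parameter are hypothetical, and the tool itself may work differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterator, Optional


def recent_workflow_runs(
    glue_client: Any,
    workflow_name: str,
    days: int,
    now: Optional[datetime] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield runs of `workflow_name` that started within the last `days` days.

    Hypothetical helper; `glue_client` is expected to behave like
    boto3.client("glue").
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    kwargs: Dict[str, Any] = {"Name": workflow_name, "IncludeGraph": True}
    while True:
        page = glue_client.get_workflow_runs(**kwargs)
        for run in page.get("Runs", []):
            started = run.get("StartedOn")
            # boto3 returns timezone-aware datetimes, so the comparison is safe.
            if started is not None and started >= cutoff:
                yield run
        token = page.get("NextToken")
        if not token:
            break  # no more pages
        kwargs["NextToken"] = token


# Usage (requires boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   for run in recent_workflow_runs(boto3.client("glue"), "my-glue-workflow", 7):
#       print(run["WorkflowRunId"], run["Status"])
```

Passing the client in as an argument keeps the function easy to test with a stub object.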
Command-Line Interface
The CLI provides a simple interface to interact with the AWS Glue Workflow Analyzer.
Options
- -w, --workflows: List of AWS Glue workflows to analyze (required).
- -d, --days: Number of days to look back for workflow runs (default: 30).
- -o, --output: File path to save the analysis results.
- -f, --format: Output format (json or csv, default: json).
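For readers who want to see how these options fit together, the following argparse sketch mirrors the flags and defaults listed above. It is only an approximation: the README does not say which CLI library gwfa is built on, and accepting several space-separated workflow names after -w is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented gwfa options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="gwfa", description="Analyze AWS Glue workflow runs."
    )
    # Assumption: multiple workflow names are passed space-separated.
    parser.add_argument("-w", "--workflows", nargs="+", required=True,
                        help="AWS Glue workflows to analyze")
    parser.add_argument("-d", "--days", type=int, default=30,
                        help="number of days to look back for workflow runs")
    parser.add_argument("-o", "--output",
                        help="file path to save the analysis results")
    parser.add_argument("-f", "--format", choices=["json", "csv"], default="json",
                        help="output format")
    return parser


# Parse the same arguments as the usage example.
args = build_parser().parse_args(
    ["-w", "my-glue-workflow", "-d", "7", "-o", "output.json", "-f", "json"]
)
print(args.workflows, args.days, args.output, args.format)
# prints ['my-glue-workflow'] 7 output.json json
```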
Help Command
For detailed help, use:
gwfa --help
Configuration
AWS Credentials
Ensure that your AWS credentials are properly configured. The tool will use these credentials to access AWS Glue and CloudWatch Logs.
Credentials can be set up via:
- AWS CLI: aws configure
- Environment Variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
- IAM Roles: If running on an EC2 instance or another AWS service with an assigned role.
Logging Configuration
You can adjust the logging level using the LOG_LEVEL environment variable:
export LOG_LEVEL=DEBUG
By default, the log level is set to INFO.
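As an illustration of how a LOG_LEVEL variable is typically honored, the sketch below reads it with a fallback to INFO and attaches a Rich handler when the rich package is installed. The function name configure_logging and the logger name gwfa_example are hypothetical; the tool's own setup may differ.

```python
import logging
import os


def configure_logging() -> logging.Logger:
    """Configure a logger from LOG_LEVEL, defaulting to INFO (illustrative only)."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    if not isinstance(level, int):
        level = logging.INFO  # unknown or non-level names fall back to INFO

    try:
        # RichHandler is part of the rich package; locals appear in tracebacks.
        from rich.logging import RichHandler
        handler: logging.Handler = RichHandler(
            show_time=True, show_path=True,
            rich_tracebacks=True, tracebacks_show_locals=True,
        )
    except ImportError:
        handler = logging.StreamHandler()  # plain fallback when rich is absent

    logger = logging.getLogger("gwfa_example")  # hypothetical logger name
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


log = configure_logging()
log.info("logging configured at level %s", logging.getLevelName(log.level))
```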
Logging
The tool uses the Rich library to enhance log readability. Logs include timestamps, paths, and local variables for better traceability. Logs are essential for understanding the tool's operations, especially in production environments with limited access to real-time data.
Error Handling
The AWS Glue Workflow Analyzer defines custom exceptions for common failure modes:
- CredentialsNotFoundError: Raised when AWS credentials are missing or incomplete.
- APIRequestError: Raised when an API request to AWS services fails.
- WorkflowConnectionError: Raised when a connection to AWS services cannot be established.
These custom exceptions ensure that errors are clearly communicated, making it easier to diagnose and fix issues.
Testing and Pre-Commit Hooks
Unit Tests
The project includes comprehensive unit tests to ensure reliability and correctness. Tests are located in the tests directory and are executed using pytest.
To run the tests:
make test
For coverage reports:
make test-cov
Pre-Commit Hooks
To maintain code quality, the project uses several pre-commit hooks configured via .pre-commit-config.yaml. These hooks include:
- isort: Sorts imports according to PEP 8 conventions.
- black: Formats code according to the Black code style.
- pylint: Lints the code to catch potential errors and enforce coding standards.
- flake8: Checks the code for style issues and logical errors.
- mypy: Performs static type checking.
- pytest: Runs unit tests with coverage reports.
Setting Up Pre-Commit Hooks
To set up the pre-commit hooks:
pip install pre-commit
pre-commit install
Now, every time you commit changes, these hooks will run automatically, ensuring that your code adheres to the project's standards.
Contributing
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a feature branch (git checkout -b feature/your-feature).
- Commit your changes (git commit -m 'Add your feature').
- Push to the branch (git push origin feature/your-feature).
- Open a pull request.
Please ensure your code adheres to the existing code style and includes appropriate tests. The pre-commit hooks and unit tests must pass before your pull request can be merged.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- Boto3: For providing an excellent Python interface to AWS services.
- Rich: For making logging beautiful and more informative.
- The Open Source Community: For their continuous contributions that make projects like this possible.