ECS Task Doctor
Diagnose why your ECS tasks fail to start or keep crashing — in one command.
ECS Task Doctor aggregates information from ECS, CloudWatch, ECR, IAM, and EC2 into a single, human-readable diagnosis report. No more jumping between 7 AWS console tabs.
Installation
pip install ecs-task-doctor
Quick Start
# Diagnose a specific service
ecs-doctor diagnose --cluster my-cluster --service my-service
# Diagnose a specific task
ecs-doctor diagnose --cluster my-cluster --task arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123
# Scan all services in a cluster for issues
ecs-doctor scan --cluster my-cluster
# Quick health check
ecs-doctor health --cluster my-cluster
What It Checks
| Check | What it does |
|---|---|
| Task Status | Parses stopped reasons and container exit codes (OOM, segfault, etc.) |
| Service Events | Detects crash loops, placement failures, and capacity issues |
| CloudWatch Logs | Scans recent logs for error patterns (OOM, connection refused, etc.) |
| Image | Verifies ECR images exist and are pullable |
| IAM | Validates that the task execution role and the task role exist |
| Resources | Checks CPU/memory constraints and cluster capacity |
| Networking | Verifies that subnets have available IPs and security groups allow egress |
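To illustrate the Task Status check, here is a minimal sketch of how container exit codes can be interpreted. It relies on the general convention that an exit code above 128 means the process was terminated by signal `code - 128`. The function name `describe_exit_code` and the hint texts are hypothetical and are not taken from ECS Task Doctor's source.

```python
# Illustrative only: not ECS Task Doctor's actual implementation.
# Common POSIX signal numbers on Linux, where ECS containers run.
SIGNAL_NAMES = {2: "SIGINT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}

# Hypothetical hints for the signals most often seen in stopped tasks.
HINTS = {
    "SIGKILL": "often the kernel OOM killer; check memory limits",
    "SIGSEGV": "segmentation fault in the application",
    "SIGTERM": "asked to stop, for example during a deployment",
}


def describe_exit_code(code: int) -> str:
    """Return a short human-readable explanation for a container exit code."""
    if code == 0:
        return "exited cleanly"
    if code > 128:
        # By convention, 128 + N means "terminated by signal N".
        number = code - 128
        name = SIGNAL_NAMES.get(number)
        if name is None:
            return f"exit code {code} (signal {number})"
        hint = HINTS.get(name)
        return f"killed by {name}" + (f" ({hint})" if hint else "")
    return f"application error (exit code {code})"


if __name__ == "__main__":
    for code in (0, 1, 137, 139):
        print(code, "->", describe_exit_code(code))
```

Exit code 137 maps to SIGKILL (128 + 9), which is why it commonly indicates an out-of-memory kill, although any SIGKILL produces the same code.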
Example Output
╭─────────────────────────────────────────────────╮
│ ECS Task Doctor — Diagnosis Report │
│ Cluster: production Service: api-server │
╰─────────────────────────────────────────────────╯
🔴 CRITICAL: Container keeps crashing (3 restarts in 10 min)
📋 Checks:
✅ Image: 123456789.dkr.ecr.us-east-1.amazonaws.com/api:v2.1.0 — exists and pullable
✅ IAM: Task execution role has required permissions
✅ Network: Subnets have available IPs, security groups allow egress
❌ Task Status: Essential container exited with code 137 (OOM Kill)
⚠️ Resources: Container memory limit (512MB) equals task memory (512MB)
❌ Logs: Last error — "JavaScript heap out of memory"
💡 Recommendation:
1. Increase container memory limit from 512MB to 1024MB
2. Update task definition memory from 512 to 1024
3. Consider adding --max-old-space-size=768 to Node.js startup
📝 Full logs: aws logs tail /ecs/api-server --since 1h
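The "Logs: Last error" line in the report above comes from scanning recent log lines for known error patterns. A rough sketch of that idea follows. The pattern list, the category names, and the function `last_error` are assumptions made for illustration; the patterns the tool really uses are not documented here.

```python
from __future__ import annotations

import re
from typing import Iterable, Optional

# Hypothetical patterns; the tool's real list may differ.
ERROR_PATTERNS = {
    "out_of_memory": re.compile(r"out of memory|OutOfMemoryError|\bOOM\b", re.IGNORECASE),
    "connection_refused": re.compile(r"connection refused|ECONNREFUSED", re.IGNORECASE),
}


def last_error(lines: Iterable[str]) -> Optional[tuple[str, str]]:
    """Return (category, line) for the most recent matching line, or None."""
    found = None
    for line in lines:
        for category, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                # Later lines overwrite earlier ones, so the last match wins.
                found = (category, line.strip())
                break
    return found


# Sample log lines, invented for this example.
SAMPLE_LINES = [
    "Server listening on :8080",
    "connect ECONNREFUSED 10.0.1.12:5432",
    "FATAL ERROR: JavaScript heap out of memory",
    "shutting down",
]

if __name__ == "__main__":
    print(last_error(SAMPLE_LINES))
```

In practice the input lines would come from CloudWatch Logs, for example the stream shown by `aws logs tail /ecs/api-server --since 1h`.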
Output Formats
# Rich terminal output (default)
ecs-doctor diagnose --cluster my-cluster --service my-service
# JSON (for scripting/automation)
ecs-doctor diagnose --cluster my-cluster --service my-service --format json
# Markdown (for reports/PRs)
ecs-doctor diagnose --cluster my-cluster --service my-service --format markdown
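For automation, the JSON format can be consumed from a script. The sketch below builds the command from the flags documented above and parses stdout with `json.loads`. The helper names `build_command` and `run_diagnosis` are hypothetical, and because the JSON schema is not described in this README, the code deliberately makes no assumptions about the keys in the result.

```python
from __future__ import annotations

import json
import subprocess


def build_command(cluster: str, service: str, fmt: str = "json") -> list[str]:
    """Assemble the argv for `ecs-doctor diagnose` using the documented flags."""
    return [
        "ecs-doctor", "diagnose",
        "--cluster", cluster,
        "--service", service,
        "--format", fmt,
    ]


def run_diagnosis(cluster: str, service: str) -> object:
    """Run the CLI and parse its stdout as JSON.

    The exit-code behaviour of the CLI is not documented here, so a
    non-zero status is not treated as an error (check=False).
    json.loads raises ValueError if stdout is not valid JSON.
    """
    result = subprocess.run(
        build_command(cluster, service),
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    report = run_diagnosis("my-cluster", "my-service")
    # Pretty-print whatever structure the tool returned.
    print(json.dumps(report, indent=2))
```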
Commands
ecs-doctor diagnose
Run a full diagnosis on a service or task.
ecs-doctor diagnose --cluster CLUSTER --service SERVICE [--region REGION] [--format FORMAT]
ecs-doctor diagnose --cluster CLUSTER --task TASK_ARN [--region REGION] [--format FORMAT]
ecs-doctor scan
Scan all services in a cluster and diagnose any unhealthy ones.
ecs-doctor scan --cluster CLUSTER [--region REGION] [--format FORMAT]
ecs-doctor health
Quick health overview of all services in a cluster.
ecs-doctor health --cluster CLUSTER [--region REGION] [--format FORMAT]
Required AWS Permissions
ECS Task Doctor needs read-only access to several AWS services:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeClusters",
        "ecs:DescribeServices",
        "ecs:DescribeTasks",
        "ecs:DescribeTaskDefinition",
        "ecs:ListServices",
        "ecs:ListTasks",
        "ecs:ListContainerInstances",
        "ecs:DescribeContainerInstances",
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "ecr:DescribeRepositories",
        "ecr:DescribeImages",
        "iam:GetRole",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": "*"
    }
  ]
}
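If you maintain this policy yourself, a quick sanity check can confirm that it stays read-only as it evolves. The sketch below flags any allowed action whose operation name does not start with `Describe`, `List`, or `Get`. This naming heuristic is an assumption and not an authoritative classification of IAM actions, and `non_read_only_actions` is a hypothetical helper, not part of ECS Task Doctor. The sample policy is an abbreviated copy of the one above.

```python
from __future__ import annotations

# Heuristic only: read-style AWS operations usually start with these verbs.
READ_ONLY_PREFIXES = ("Describe", "List", "Get")

# Abbreviated copy of the policy shown above.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecs:DescribeServices",
                "ecs:ListTasks",
                "logs:GetLogEvents",
                "ecr:DescribeImages",
                "iam:GetRole",
                "ec2:DescribeSubnets",
            ],
            "Resource": "*",
        }
    ],
}


def non_read_only_actions(policy: dict) -> list[str]:
    """Return allowed actions that do not look read-only by name."""
    offenders = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        for action in actions:
            # "ecs:ListTasks" -> "ListTasks"; a bare "*" yields "" and is flagged.
            _, _, operation = action.partition(":")
            if not operation.startswith(READ_ONLY_PREFIXES):
                offenders.append(action)
    return offenders


if __name__ == "__main__":
    print(non_read_only_actions(POLICY) or "all actions look read-only")
```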
Development
# Install in dev mode
pip install -e '.[dev]'
# Run tests
pytest -v
# Lint
ruff check src/ tests/
See CONTRIBUTING.md for more details.
License
MIT — see LICENSE.