Apache Airflow Operator exporting AWS Cost Explorer data to local file or S3
Airflow AWS Cost Explorer Plugin
A plugin for Apache Airflow that allows you to export AWS Cost Explorer as well as S3 metrics to a local file or S3 in Parquet, JSON, or CSV format.
System Requirements
- Airflow 1.10.3 or newer
- pyarrow or fastparquet (optional, for writing Parquet files)
Deployment Instructions
- Install the plugin:

  pip install airflow-aws-cost-explorer

- Optional, for writing Parquet files: install pyarrow or fastparquet

  pip install pyarrow

  or

  pip install fastparquet

- Restart the Airflow Web Server
- Configure the AWS connection (Conn type = 'aws'); see the sketch after this list for one way to do it
- Optional, for S3 destinations: configure the S3 connection (Conn type = 's3')
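One way to create these connections, as a sketch using Airflow's AIRFLOW_CONN_<CONN_ID> environment variables (the credential values are placeholders; secrets must be URL-encoded; connections can equally be created in the Airflow UI):

  export AIRFLOW_CONN_AWS_DEFAULT='aws://<access_key_id>:<url_encoded_secret>@'
  export AIRFLOW_CONN_S3_DEFAULT='s3://<access_key_id>:<url_encoded_secret>@'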
Operators
AWSCostExplorerToS3Operator
:param day: Date to be exported as string in YYYY-MM-DD format or date/datetime instance (default: yesterday)
:type day: str, date or datetime
:param aws_conn_id: Cost Explorer AWS connection id (default: aws_default)
:type aws_conn_id: str
:param region_name: Cost Explorer AWS Region
:type region_name: str
:param s3_conn_id: Destination S3 connection id (default: s3_default)
:type s3_conn_id: str
:param s3_bucket: Destination S3 bucket
:type s3_bucket: str
:param s3_key: Destination S3 key
:type s3_key: str
:param file_format: Destination file format (parquet, json or csv; default: parquet)
:type file_format: str or FileFormat
:param metrics: Metrics (default: UnblendedCost, BlendedCost)
:type metrics: list
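A minimal usage sketch (assuming the operator is exported from the package root, like the operator in the Example section below; the bucket name, key, and `dag` object are placeholders):

from airflow_aws_cost_explorer import AWSCostExplorerToS3Operator

cost_explorer_to_s3 = AWSCostExplorerToS3Operator(
    task_id='cost_explorer_to_s3',
    day='{{ yesterday_ds }}',                           # templated: export the previous day
    s3_conn_id='s3_default',
    s3_bucket='my-billing-bucket',                      # placeholder bucket
    s3_key='cost_explorer/{{ yesterday_ds }}.parquet',  # placeholder key
    file_format='parquet',
    dag=dag)                                            # `dag` defined as in the Example section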
AWSCostExplorerToLocalFileOperator
:param day: Date to be exported as string in YYYY-MM-DD format or date/datetime instance (default: yesterday)
:type day: str, date or datetime
:param aws_conn_id: Cost Explorer AWS connection id (default: aws_default)
:type aws_conn_id: str
:param region_name: Cost Explorer AWS Region
:type region_name: str
:param destination: Destination file complete path
:type destination: str
:param file_format: Destination file format (parquet, json or csv; default: parquet)
:type file_format: str or FileFormat
:param metrics: Metrics (default: UnblendedCost, BlendedCost)
:type metrics: list
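A complete DAG using this operator is shown in the Example section below.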
AWSBucketSizeToS3Operator
:param day: Date to be exported as string in YYYY-MM-DD format or date/datetime instance (default: yesterday)
:type day: str, date or datetime
:param aws_conn_id: AWS connection id (default: aws_default)
:type aws_conn_id: str
:param region_name: AWS Region
:type region_name: str
:param s3_conn_id: Destination S3 connection id (default: s3_default)
:type s3_conn_id: str
:param s3_bucket: Destination S3 bucket
:type s3_bucket: str
:param s3_key: Destination S3 key
:type s3_key: str
:param file_format: Destination file format (parquet, json or csv; default: parquet)
:type file_format: str or FileFormat
:param metrics: Metrics (default: bucket_size, number_of_objects)
:type metrics: list
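A minimal usage sketch (same assumptions as above; bucket, key, and `dag` are placeholders):

from airflow_aws_cost_explorer import AWSBucketSizeToS3Operator

bucket_size_to_s3 = AWSBucketSizeToS3Operator(
    task_id='bucket_size_to_s3',
    day='{{ yesterday_ds }}',
    s3_conn_id='s3_default',
    s3_bucket='my-metrics-bucket',                     # placeholder bucket
    s3_key='bucket_size/{{ yesterday_ds }}.parquet',   # placeholder key
    file_format='parquet',
    metrics=['bucket_size', 'number_of_objects'],      # the documented defaults, shown explicitly
    dag=dag)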
AWSBucketSizeToLocalFileOperator
:param day: Date to be exported as string in YYYY-MM-DD format or date/datetime instance (default: yesterday)
:type day: str, date or datetime
:param aws_conn_id: AWS connection id (default: aws_default)
:type aws_conn_id: str
:param region_name: AWS Region
:type region_name: str
:param destination: Destination file complete path
:type destination: str
:param file_format: Destination file format (parquet, json or csv; default: parquet)
:type file_format: str or FileFormat
:param metrics: Metrics (default: bucket_size, number_of_objects)
:type metrics: list
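A minimal usage sketch (same assumptions as above; the destination path is a placeholder). Writing CSV avoids the optional pyarrow/fastparquet dependency:

from airflow_aws_cost_explorer import AWSBucketSizeToLocalFileOperator

bucket_size_to_file = AWSBucketSizeToLocalFileOperator(
    task_id='bucket_size_to_file',
    day='{{ yesterday_ds }}',
    destination='/tmp/bucket_size_{{ yesterday_ds }}.csv',  # placeholder path
    file_format='csv',                                      # no Parquet library required
    dag=dag)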
Example
#!/usr/bin/env python
import airflow
from airflow import DAG
from airflow_aws_cost_explorer import AWSCostExplorerToLocalFileOperator
from datetime import timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=30)
}

dag = DAG(
    'cost_explorer',
    default_args=default_args,
    schedule_interval=None,
    concurrency=1,
    max_active_runs=1,
    catchup=False
)

aws_cost_explorer_to_file = AWSCostExplorerToLocalFileOperator(
    task_id='aws_cost_explorer_to_file',
    day='{{ yesterday_ds }}',
    destination='/tmp/{{ yesterday_ds }}.parquet',
    file_format='parquet',
    dag=dag
)

if __name__ == "__main__":
    dag.cli()
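Because the file calls dag.cli() when executed directly, the DAG can also be exercised with Airflow's standard CLI, e.g. (the execution date is a placeholder):

  airflow test cost_explorer aws_cost_explorer_to_file 2019-01-01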
Links
- Apache Airflow - https://github.com/apache/airflow
- Apache Arrow - https://github.com/apache/arrow
- fastparquet - https://github.com/dask/fastparquet
- AWS Cost Explorer - https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
- S3 CloudWatch Metrics - https://docs.aws.amazon.com/AmazonS3/latest/dev/cloudwatch-monitoring.html