
The DAMN (Data Assets Metric Navigation) tool extracts and reports metrics about your data assets


████████▄     ▄████████   ▄▄▄▄███▄▄▄▄   ███▄▄▄▄   
███   ▀███   ███    ███ ▄██▀▀▀███▀▀▀██▄ ███▀▀▀██▄ 
███    ███   ███    ███ ███   ███   ███ ███   ███ 
███    ███   ███    ███ ███   ███   ███ ███   ███ 
███    ███ ▀███████████ ███   ███   ███ ███   ███ 
███    ███   ███    ███ ███   ███   ███ ███   ███ 
███   ▄███   ███    ███ ███   ███   ███ ███   ███ 
████████▀    ███    █▀   ▀█   ███   █▀   ▀█   █▀                                                 






Data Asset Metrics Navigator

The DAMN tool extracts and reports metrics about your data assets.

It lets you inspect your assets and their lineage, along with metrics covering materialization, usage, storage footprint, and query performance. The goal is to give you a convenient command-line tool for tracking and reporting on the data assets you work on.

Installation

To install the DAMN tool, run the following command:

pip install damn-tool



Connectors

The DAMN tool leverages various connectors to interact with different service providers.

Configurations

Connectors are configured in a YAML file located at ~/.damn/connectors.yml. To use a different location, pass the --configs-dir option.

See example configuration file here

The configuration file uses the following structure:

connector_type:
  service_provider:
    param1: value1
    param2: value2
  • connector_type: The name of the connector (e.g., orchestrator, io-manager, data-warehouse, etc.).
  • service_provider: The name of the service provider for the connector. You can have multiple providers per connector.
  • param1, param2, etc.: The parameters needed for each connector. The required parameters will depend on the specific connector. For example, a Dagster connector might require endpoint and api_token.
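To make the nesting concrete, here is a minimal sketch of how that structure might look once parsed into Python dictionaries. The `config` literal and the `list_providers` helper are illustrative assumptions and are not part of the DAMN tool's API; in practice the mapping would come from loading connectors.yml with a YAML library.

```python
# Hypothetical in-memory form of connectors.yml after YAML parsing.
config = {
    "orchestrator": {
        "dagster": {
            "endpoint": "https://your-dagster-instance.com/prod/graphql",
            "api_token": "your-api-token",
        },
    },
    "data-warehouse": {
        "snowflake": {"account": "ab1234.us-east-1", "user": "username"},
    },
}


def list_providers(config):
    """Map each connector type to the names of its configured providers."""
    return {connector: list(providers) for connector, providers in config.items()}


print(list_providers(config))
```

Each top-level key is a connector type, and each key beneath it is a service provider holding its own parameters.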

Connector types

Orchestrator

This is the default connector required by the DAMN tool. For now, we only support Dagster as the service provider for this connector. Here's an example configuration for an orchestrator connector with a dagster profile:

orchestrator:
  dagster:
    endpoint: https://your-dagster-instance.com/prod/graphql
    api_token: your-api-token

IO Manager

Your assets can be stored in storage services. For now, only AWS is supported as the storage provider. It can be configured as follows:

io-manager:
  aws:
    credentials:
      access_key_id: "{{ env('AWS_ACCESS_KEY_ID') }}"
      secret_access_key: "{{ env('AWS_SECRET_ACCESS_KEY') }}"
      region: "us-east-1"
    bucket_name: "bucket-name"
    key_prefix: "asset-prefix"

Data Warehouses

Your assets can be materialized to a data warehouse. For now, only Snowflake is supported. It can be configured as follows:

data-warehouse:
  snowflake:
    account: ab1234.us-east-1
    user: username
    password: "{{ env('SNOWFLAKE_PASSWORD') }}"
    role: my-role
    database: my-database
    warehouse: my-warehouse
    schema: analytics
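The `{{ env('NAME') }}` placeholders in the examples above keep secrets out of the configuration file. How DAMN resolves them internally is not documented here; the following is only a sketch of one way such placeholders could be expanded, using a regular expression. The helper names `resolve_env` and `resolve_config` are hypothetical.

```python
import os
import re

# Matches placeholders of the form {{ env('NAME') }} as used in the examples above.
_ENV_PATTERN = re.compile(r"\{\{\s*env\(\s*'([^']+)'\s*\)\s*\}\}")


def resolve_env(value, environ=os.environ):
    """Replace each {{ env('NAME') }} placeholder with the value of NAME.

    Raises KeyError if the variable is not set, so missing secrets fail loudly.
    """
    return _ENV_PATTERN.sub(lambda match: environ[match.group(1)], value)


def resolve_config(node, environ=os.environ):
    """Recursively resolve placeholders in nested dicts, lists, and strings."""
    if isinstance(node, dict):
        return {key: resolve_config(val, environ) for key, val in node.items()}
    if isinstance(node, list):
        return [resolve_config(item, environ) for item in node]
    if isinstance(node, str):
        return resolve_env(node, environ)
    return node


example = {"snowflake": {"user": "username", "password": "{{ env('SNOWFLAKE_PASSWORD') }}"}}
print(resolve_config(example, environ={"SNOWFLAKE_PASSWORD": "s3cret"}))
```

Passing `environ` explicitly makes the sketch easy to test without touching the real process environment.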

Switching Between Service Providers

The active service provider for each connector can be changed by specifying the service provider when running DAMN commands. By default, DAMN will use the first service provider configured for each connector.

Example usage:

damn ls --orchestrator dagster --io-manager aws --data-warehouse snowflake
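The default rule described above (use the first configured provider unless one is named) can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual implementation: `select_provider` is a hypothetical helper, and `other-warehouse` is a made-up second provider included only to show the fallback.

```python
def select_provider(config, connector_type, requested=None):
    """Return (name, params) for the active provider of a connector type.

    Uses the requested provider when given; otherwise falls back to the
    first provider listed, relying on dicts preserving insertion order.
    """
    providers = config[connector_type]
    if requested is None:
        requested = next(iter(providers))
    if requested not in providers:
        raise KeyError(f"{requested!r} is not configured for {connector_type!r}")
    return requested, providers[requested]


# Hypothetical config: "other-warehouse" is not a real DAMN provider.
config = {
    "data-warehouse": {
        "snowflake": {"account": "ab1234.us-east-1"},
        "other-warehouse": {"account": "example"},
    },
}

# No provider requested: the first one listed is used.
print(select_provider(config, "data-warehouse")[0])
# Explicit choice, analogous to passing --data-warehouse on the command line.
print(select_provider(config, "data-warehouse", "other-warehouse")[0])
```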



Usage

The DAMN tool is both a CLI tool and a Python library.

Output option

Note that in CLI mode, commands support an output option that controls where the result goes:

  • terminal: By default, the output of commands is printed to the terminal.
  • json: The output is returned as a JSON object, which is more useful if you use DAMN programmatically.
  • copy: The output is copied to your clipboard, which is useful if you want to share an asset's metrics in a PR, for example.

List assets

In Python...

from damn_tool.ls import list_assets

result = list_assets()
print(result)

From the command line...

foo@bar:~$ damn ls
- airbyte/protest_groupings
- data_warehouse/movements_dim
- data_warehouse/observations_fct
- gdelt/gdelt_gkg_articles
- gdelt/gdelt_mention_summaries
- hex/hex_main_dashboard_refresh
- semantic_definitions
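Asset keys are slash-separated, so the first segment acts as a key group. As a small sketch of working with such a listing, the snippet below groups keys by that first segment. It operates on a plain list of strings copied from the output above; the return type of `list_assets()` is not documented here, so treating it as a list of key strings is an assumption, and `group_by_prefix` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_prefix(asset_keys):
    """Group asset keys by their first path segment.

    Keys without a "/" (e.g. "semantic_definitions") go under the "" group.
    """
    groups = defaultdict(list)
    for key in asset_keys:
        prefix, sep, _ = key.partition("/")
        groups[prefix if sep else ""].append(key)
    return dict(groups)


# Asset keys copied from the `damn ls` output above.
assets = [
    "airbyte/protest_groupings",
    "data_warehouse/movements_dim",
    "data_warehouse/observations_fct",
    "gdelt/gdelt_gkg_articles",
    "gdelt/gdelt_mention_summaries",
    "hex/hex_main_dashboard_refresh",
    "semantic_definitions",
]

for prefix, keys in group_by_prefix(assets).items():
    print(prefix or "(no prefix)", len(keys))
```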

List all assets for a specific key group

In Python...

from damn_tool.ls import list_assets

result = list_assets(prefix='gdelt')
print(result)

From the command line...

foo@bar:~$ damn ls --prefix gdelt
- gdelt/gdelt_article_summaries
- gdelt/gdelt_articles_enhanced
- gdelt/gdelt_events
- gdelt/gdelt_gkg_articles
- gdelt/gdelt_mention_summaries
- gdelt/gdelt_mentions
- gdelt/gdelt_mentions_enhanced

Show details for a specific asset

In Python...

from damn_tool.show import show_asset

result = show_asset('gdelt/gdelt_articles_enhanced')
print(result)

From the command line...

foo@bar:~$ damn show gdelt/data_warehouse/integration/int__events_actors
From orchestrator:
 - description: dbt model int__events_actors
 - computeKind: dbt
 - policyType: LAZY
 - maximumLagMinutes: 360.0
 - cronSchedule: None
 - isPartitioned: False
- dependedByKeys:
   - data_warehouse
   - events_actors_bridge
- dependencyKeys:
   - data_warehouse
   - integration
   - int__events_observations
   - data_warehouse
   - integration
   - int__actors
- metadataEntries:
  - Execution Duration: 4.183706
From data warehouse:
 - table_schema: analytics_integration
 - table_type: base table
 - created: 2023-07-05T08:36:40.935000-07:00
 - last_altered: 2023-07-19T09:56:36.410000-07:00

Show metrics for a specific asset

In Python...

from damn_tool.metrics import asset_metrics

result = asset_metrics('gdelt/gdelt_articles_enhanced')
print(result)

From the command line...

foo@bar:~$ damn metrics gdelt/gdelt_gkg_articles
From orchestrator:
 - run_id: 03466ceb-1c51-43ab-9384-33b6472c3f24
 - status: SUCCESS
 - start_time: 2023-07-19 14:19:00
 - end_time: 2023-07-19 14:19:02
 - elapsed_time: 0:00:02.563292
 - num_partitions: 4963
 - num_materialized: 4963
 - num_failed: 0
From IO manager:
 - files: 4976
 - size: 76.25 MB
 - last_modified: 2023-07-19T18:19:03+00:00
From data warehouse:
 - row_count: None
 - bytes: N/A



Contribution

Contributions to the DAMN tool are always welcome. Whether it's feature requests, bug fixes, or new features, your contribution is appreciated.



License

The DAMN tool is open-source software, licensed under MIT.
