Data monitoring and lineage
Our goal is to provide data teams with immediate visibility, detection of data issues, and impact analysis.
We focus on providing a simple setup, integrations with the existing stack, and centralized metadata in your own data warehouse.
Supported use cases
Live data lineage
- An end-to-end view of data enriched with operational context like freshness, volume, duration, and permissions.
- View data that has been transformed with or without dbt, or across all of your dbt projects.
Source tables monitoring
- Detect breaking changes and discover new data you can leverage in your source tables.
- Simple and minimal configuration (it can also be read from your dbt sources definitions).
Alerts
- Slack alerts on breaking changes and new data can be configured within minutes.
Demo & sandbox
Try out our live lineage sandbox here.
:star: If you like what we are building, support us with a :star:
Quick start
pip install elementary-data
# The tool is named edr (Elementary Data Reliability),
# run it to validate the installation:
edr --help
Add your data warehouse connection details to a profiles.yml file. See our quickstart page to learn more, or use this template here. If you are a dbt user, Elementary uses dbt's profiles.yml by default: simply add a new profile called 'elementary'.
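As a quick sanity check for this step, the Python sketch below lists the top-level keys of a profiles.yml file and reports whether one of them is named 'elementary'. It assumes the file sits at dbt's default location, ~/.dbt/profiles.yml, and it uses a plain line scan rather than a real YAML parser, so treat it as an illustration only. The helper names `list_profiles` and `has_elementary_profile` are hypothetical and are not part of Elementary.

```python
import re
from pathlib import Path
from typing import List

# Matches a non-indented YAML key such as "elementary:".
# In profiles.yml these top-level keys are the profile names.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def list_profiles(profiles_text: str) -> List[str]:
    """Return the top-level keys found in the text of a profiles.yml file."""
    names = []
    for line in profiles_text.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            names.append(match.group(1))
    return names


def has_elementary_profile(profiles_text: str) -> bool:
    """True if a profile called 'elementary' is defined at the top level."""
    return "elementary" in list_profiles(profiles_text)


if __name__ == "__main__":
    # Assumption: dbt's default profiles location is used.
    default_path = Path.home() / ".dbt" / "profiles.yml"
    if default_path.exists():
        print(has_elementary_profile(default_path.read_text()))
```

If the check prints False, add the 'elementary' profile before running any edr command.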
Now, generate a lineage graph:
# Creates a lineage graph from queries executed between 7 days ago and current time,
# for the database named 'my_db'
edr lineage -db my_db
After you configure the sources to monitor, run the monitoring with:
edr monitor
To continuously monitor your data, schedule this command to run periodically with your existing orchestration solution (we highly recommend running it at least once a day).
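If your orchestrator runs Python tasks, a thin wrapper like the one below is one possible way to schedule the command. It is a minimal sketch: it only invokes `edr monitor` exactly as shown above, with no extra flags, and adds a small helper that checks the "at least once a day" recommendation. The names `run_monitor` and `is_run_overdue` are hypothetical and not part of Elementary.

```python
import subprocess
from datetime import datetime, timedelta

# The command from the quick start, with no additional flags.
MONITOR_COMMAND = ["edr", "monitor"]


def run_monitor(runner=subprocess.run) -> int:
    """Run `edr monitor` once and return the process exit code.

    `runner` is injectable so the wrapper can be tested without edr installed.
    """
    completed = runner(MONITOR_COMMAND, check=False)
    return completed.returncode


def is_run_overdue(last_run: datetime, now: datetime,
                   max_gap: timedelta = timedelta(days=1)) -> bool:
    """True if more than `max_gap` has passed since the last monitoring run."""
    return now - last_run > max_gap
```

An orchestrator task could call `run_monitor()` and treat a non-zero return value as a failure. Whether edr reports problems through its exit code is an assumption here, so check the behavior of your installed version.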
Documentation
Want to learn how to get started quickly?
Go to our quickstart page.
Have questions about the configuration?
Go to our configuration FAQ here.
Curious to learn about the different modules?
Use this modules overview.
Our full documentation is available here.
Features
Data lineage
- Lineage visualization: Visual map of data flow and dependencies in the data warehouse, including legacy data that is not managed by dbt.
- Dataset status: Present data about freshness, volume, permissions and more on the lineage graph itself.
- Accuracy: Reflects the actual state in the DWH based on logs and your query history.
- Plug-and-play: No need for code changes.
- Graph filters: Filter the graph by dataset, dates, direction, and depth.
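To make the direction and depth filters concrete, here is a minimal Python sketch of how such a filter could work on a lineage graph stored as (source, target) table pairs. It is a generic breadth-first traversal written for illustration, not Elementary's implementation, and the function name `filter_lineage` and the table names in the example are hypothetical.

```python
from collections import deque


def filter_lineage(edges, start, direction="downstream", depth=1):
    """Return the tables reachable from `start` within `depth` hops.

    edges: iterable of (source_table, target_table) pairs.
    direction: "downstream" follows source -> target,
               "upstream" follows target -> source.
    """
    if direction not in ("downstream", "upstream"):
        raise ValueError("direction must be 'downstream' or 'upstream'")

    # Build an adjacency map oriented in the requested direction.
    neighbors = {}
    for source, target in edges:
        if direction == "downstream":
            neighbors.setdefault(source, set()).add(target)
        else:
            neighbors.setdefault(target, set()).add(source)

    # Breadth-first search that stops expanding at the depth limit.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if distance == depth:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return seen


edges = [
    ("raw.orders", "stg.orders"),
    ("stg.orders", "fct.orders"),
    ("fct.orders", "report.revenue"),
]
print(sorted(filter_lineage(edges, "stg.orders", "downstream", depth=1)))
# → ['fct.orders', 'stg.orders']
```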
Source tables monitoring
- Slack notifications.
- Detect deletions: columns and tables that were removed.
- Detect data type changes.
- Detect new data: columns and tables that were added.
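The three kinds of detection listed above can be illustrated by comparing two schema snapshots. The Python sketch below assumes each snapshot is a dictionary mapping a table name to its columns and their data types. This structure, the function name `diff_schemas`, and the change labels are hypothetical choices made for illustration and do not describe how Elementary stores or compares schemas.

```python
def diff_schemas(previous, current):
    """Compare two schema snapshots and list the changes.

    Each snapshot maps table name -> {column name: data type}.
    Returns a list of (change_kind, table, column) tuples.
    """
    changes = []

    # Tables that disappeared or appeared between the snapshots.
    for table in sorted(previous.keys() - current.keys()):
        changes.append(("table_removed", table, None))
    for table in sorted(current.keys() - previous.keys()):
        changes.append(("table_added", table, None))

    # Column-level changes for tables present in both snapshots.
    for table in sorted(previous.keys() & current.keys()):
        old_cols, new_cols = previous[table], current[table]
        for column in sorted(old_cols.keys() - new_cols.keys()):
            changes.append(("column_removed", table, column))
        for column in sorted(new_cols.keys() - old_cols.keys()):
            changes.append(("column_added", table, column))
        for column in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[column] != new_cols[column]:
                changes.append(("type_changed", table, column))
    return changes


previous = {"orders": {"id": "NUMBER", "amount": "NUMBER"}}
current = {"orders": {"id": "NUMBER", "amount": "VARCHAR", "coupon": "VARCHAR"}}
print(diff_schemas(previous, current))
# → [('column_added', 'orders', 'coupon'), ('type_changed', 'orders', 'amount')]
```

Removed tables, removed columns, and type changes are the ones most likely to break downstream queries, which is why they are worth alerting on separately from additions.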
You can influence our next features in this roadmap by voting :+1: on issues and opening new ones.
We aim to build an open, transparent, and community-powered data observability platform: a solution that data teams can easily integrate into their workflows to detect data incidents and prevent them from happening in the first place.
Community & Support
For additional information and help, you can use one of these channels:
- Slack (Live chat with the team, support, discussions, etc.)
- GitHub issues (Bug reports, feature requests)
- Roadmap (Vote for features and add your inputs)
- Twitter (Updates on new releases and stuff)
Integrations
- Snowflake - Lineage & monitoring
- BigQuery - Lineage only
- Redshift
Ask us for integrations on Slack or as a GitHub issue.
License
Elementary is licensed under Apache License 2.0. See the LICENSE file for licensing information.