Git-based metric store

Project description

Datagit

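Datagit is a Git-based metric store: each metric is saved as a CSV file in a GitHub repository, so changes to its history can be tracked.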

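>>> from github import Github
>>> from google.cloud import bigquery
>>> from datagit import github_connector
>>> # query is a SQL string whose first column is unique_key
>>> dataframe = bigquery.Client().query(query).to_dataframe()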
{"unique_key": ['2022-01-01_FR', '2022-01-01_GB'...
>>> github_connector.store_metric(Github("Token"), dataframe=dataframe, filepath="Samox/datagit/data/act_metrics_finance/mrr.csv", assignees=["Samox"])
'🎉 data/act_metrics_finance/mrr.csv Successfully stored!'
'💩 Historical data change detected, Samox was assigned to it'
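
To illustrate what a "historical data change" means, here is a minimal sketch that compares two snapshots of a metric keyed by unique_key. It is not Datagit's implementation; the function name diff_snapshots and the mrr values are made up for this example.

```python
# Illustrative sketch only: this is NOT how Datagit is implemented.
# diff_snapshots and the sample "mrr" values are hypothetical.
from typing import Dict, List, Tuple

Row = Dict[str, object]


def diff_snapshots(
    old: Dict[str, Row], new: Dict[str, Row]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two snapshots keyed by unique_key.

    Returns three sorted lists of keys: (added, removed, changed).
    """
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    # A key present in both snapshots with different values is a
    # change to already-stored (historical) data.
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return added, removed, changed


previous = {
    "2022-01-01_FR": {"mrr": 100},
    "2022-01-01_GB": {"mrr": 200},
}
current = {
    "2022-01-01_FR": {"mrr": 100},
    "2022-01-01_GB": {"mrr": 250},  # value rewritten after the fact
    "2022-02-01_FR": {"mrr": 110},  # brand new row
}
added, removed, changed = diff_snapshots(previous, current)
```

In this sketch a new key is an ordinary addition, while a key that already existed and now carries a different value is the kind of change that would need an owner to review it.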

Getting Started

To get started with Datagit, follow these steps:

  1. Create a new repository on GitHub called datagit with a README file.
  2. Generate a personal access token on GitHub that has access to the datagit repository. In your GitHub settings, select "Developer settings", then "Personal access tokens". Click "Generate new token" and grant it the necessary permissions (contents and pull requests).
  3. In your data pipelines, wherever relevant, call store_metric with the following parameters:
    • A GitHub client created with your token: Github("Token")
    • Your metric as a dataframe
    • The path of the metric's CSV file, in the form "your_orga/your_repo/path/to/your.csv"
    • The owner of the metric, who is assigned when a historical data change is detected

For instance:

>>> from github import Github
>>> from datagit import github_connector
>>> github_connector.store_metric(Github("Token"), dataframe=dataframe, filepath="Samox/datagit/data/act_metrics_finance/mrr.csv", assignees=["Samox"])

That's it! With these steps, you can start using Datagit to store and track your metrics over time.
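
The metric path from step 3 combines an organisation, a repository and a file path, and a missing or doubled slash is an easy mistake to make. The helper below is a small sketch for composing it safely. metric_filepath is a hypothetical name and is not part of Datagit.

```python
# Hypothetical helper, not part of Datagit: builds the
# "your_orga/your_repo/path/to/your.csv" string described above.
def metric_filepath(orga: str, repo: str, path_in_repo: str) -> str:
    """Join the three parts with exactly one slash between them."""
    parts = [orga.strip("/"), repo.strip("/"), path_in_repo.strip("/")]
    if not all(parts):
        raise ValueError("orga, repo and path_in_repo must all be non-empty")
    return "/".join(parts)


filepath = metric_filepath("Samox", "datagit", "data/act_metrics_finance/mrr.csv")
```

The resulting string can then be passed as the path argument of store_metric.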

Example

>>> githubToken = "github_pat****"
>>> githubRepo = "ReplaceOrgaName/ReplaceRepoName"
>>> import pandas as pd
>>> dataframe = pd.DataFrame({'unique_key': ['a', 'b', 'c'], 'amount': [1001, 1002, 1003], 'is_active': [True, False, True]})
>>> from github import Github
>>> from datagit import github_connector
>>> github_connector.store_metric(Github(githubToken), dataframe=dataframe, filepath=githubRepo + "/data/act_metrics_finance/mrr.csv")

Dataframe

Datagit is based on the standard dataframe format from Pandas. You can use any library to fetch the data, as long as the resulting dataframe meets the following requirements:

  1. The first column of the dataframe must be named unique_key
  2. The values in this first column must be unique (no key may appear twice)
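
The two requirements above can be checked before a metric is stored. This is a minimal, library-agnostic sketch that works on plain lists. unique_key_problems is a hypothetical helper, not a Datagit function.

```python
# Hypothetical pre-flight check, not part of Datagit.
from collections import Counter
from typing import Iterable, List, Sequence


def unique_key_problems(
    columns: Sequence[str], first_column_values: Iterable[object]
) -> List[str]:
    """Return a list of human-readable problems (empty if all is fine)."""
    problems = []
    # Requirement 1: the first column is named unique_key.
    if not columns or columns[0] != "unique_key":
        problems.append("first column must be named 'unique_key'")
    # Requirement 2: no value appears more than once in that column.
    counts = Counter(first_column_values)
    duplicates = sorted(str(key) for key, n in counts.items() if n > 1)
    if duplicates:
        problems.append("duplicate keys: " + ", ".join(duplicates))
    return problems


ok = unique_key_problems(["unique_key", "amount"], ["a", "b", "c"])
bad = unique_key_problems(["amount", "unique_key"], ["a", "b", "a"])
```

With a Pandas dataframe, the two arguments could be obtained with list(dataframe.columns) and dataframe.iloc[:, 0].tolist().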

Query Builder

Datagit provides a simple query builder to store a table:

>>> from datagit import query_builder
>>> query = query_builder.build_query(table_id="my_table", unique_key_columns=["organisation_id", "date_month"])
>>> query
"SELECT CONCAT(organisation_id, '__', date_month) AS unique_key, * FROM my_table WHERE TRUE ORDER BY 1"
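
To show how the unique_key column is derived from several columns, here is a sketch that reproduces the query string above. It is an independent illustration rather than Datagit's own code; sketch_build_query is a hypothetical name, and the sketch does no escaping of identifiers.

```python
# Independent illustration, NOT Datagit's query_builder.
# Identifiers are inserted as-is, so only pass trusted names.
from typing import Sequence


def sketch_build_query(table_id: str, unique_key_columns: Sequence[str]) -> str:
    """Build a SELECT whose first column is a concatenated unique_key."""
    if not unique_key_columns:
        raise ValueError("at least one unique key column is required")
    # Separate the key columns with '__' inside a SQL CONCAT call.
    key_expr = "CONCAT(" + ", '__', ".join(unique_key_columns) + ")"
    # ORDER BY 1 sorts on the first selected column, i.e. unique_key.
    return f"SELECT {key_expr} AS unique_key, * FROM {table_id} WHERE TRUE ORDER BY 1"


query = sketch_build_query("my_table", ["organisation_id", "date_month"])
```

Selecting unique_key first satisfies the dataframe requirement that it be the first column.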

More examples here
