A metadata toolkit written in Python

recap

About

Recap is a Python library that helps you build tools for data quality, data governance, data profiling, data lineage, data contracts, and schema conversion.

Features

Installation

pip install recap-core

Usage

Grab schemas from filesystems:

schema("s3://corp-logs/2022-03-01/0.json")

And databases:

schema("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests")

In a standardized format:

{
  "fields": [
    {
      "name": "unique_key",
      "type": "VARCHAR",
      "nullable": false,
      "comment": "The service request tracking number."
    },
    {
      "name": "complaint_description",
      "type": "VARCHAR",
      "nullable": true,
      "comment": "Service request type"
    }
  ]
}
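
Because the output is plain JSON, downstream tools can consume it without any Recap-specific code. The sketch below is one way to load the example above into typed objects and list the non-nullable columns. The `Field` dataclass and both helper functions are hypothetical names used for illustration; they are not part of Recap's API.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    # Mirrors the keys shown in the example schema above.
    name: str
    type: str
    nullable: bool
    comment: Optional[str] = None


def parse_schema(raw: str) -> List[Field]:
    """Turn a schema JSON document into a list of Field objects."""
    return [Field(**item) for item in json.loads(raw)["fields"]]


def required_fields(fields: List[Field]) -> List[str]:
    """Return the names of fields that may not be null."""
    return [f.name for f in fields if not f.nullable]


SCHEMA_JSON = """
{
  "fields": [
    {"name": "unique_key", "type": "VARCHAR", "nullable": false,
     "comment": "The service request tracking number."},
    {"name": "complaint_description", "type": "VARCHAR", "nullable": true,
     "comment": "Service request type"}
  ]
}
"""

fields = parse_schema(SCHEMA_JSON)
print(required_fields(fields))  # ['unique_key']
```

A check like this could serve as the starting point for a simple data contract test.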

See what schemas used to look like:

from datetime import datetime
schema("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests", datetime(2023, 1, 1))
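
Looking up a schema "as of" a point in time amounts to finding the newest stored version whose timestamp is not later than the requested one. The following is a minimal sketch of that idea using a sorted list and binary search. `VersionedStore` is a hypothetical class written for this illustration, and the stored values are made up; how Recap's catalog actually stores versions is not described here.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Any, List, Optional


class VersionedStore:
    """Keeps every version of a value, ordered by the time it was recorded."""

    def __init__(self) -> None:
        self._times: List[datetime] = []
        self._values: List[Any] = []

    def put(self, value: Any, at: datetime) -> None:
        # Insert while keeping both lists sorted by timestamp.
        idx = bisect_right(self._times, at)
        self._times.insert(idx, at)
        self._values.insert(idx, value)

    def get(self, at: Optional[datetime] = None) -> Any:
        """Return the latest value, or the value as of `at` if given."""
        if at is None:
            return self._values[-1] if self._values else None
        # Index of the first version recorded strictly after `at`.
        idx = bisect_right(self._times, at)
        return self._values[idx - 1] if idx else None


store = VersionedStore()
store.put({"fields": ["unique_key"]}, datetime(2022, 6, 1))
store.put({"fields": ["unique_key", "complaint_description"]}, datetime(2023, 3, 1))

print(store.get(datetime(2023, 1, 1)))  # the 2022-06-01 version
print(store.get())                      # the latest version
```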

Build metadata extractors:

@registry.metadata("s3://{path:path}.json", include_df=True)
@registry.metadata("bigquery://{project}/{dataset}/{table}", include_df=True)
def pandas_describe(df: DataFrame, *_) -> BaseModel:
    description_dict = df.describe(include="all")
    return PandasDescription.parse_obj(description_dict)
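
The decorators above bind one function to several URL patterns. To make that mechanism concrete, here is a small, self-contained sketch of a pattern-based registry. It is not Recap's implementation: the `Registry` class, `compile_pattern`, and `resolve` are hypothetical, the `include_df` option is left out, and the rule that `{name:path}` may span slashes while `{name}` may not is an assumption about the pattern syntax.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

_PLACEHOLDER = re.compile(r"\{(\w+)(?::(\w+))?\}")


def compile_pattern(pattern: str) -> "re.Pattern":
    """Translate a pattern such as 's3://{path:path}.json' into a regex."""
    regex, pos = "", 0
    for m in _PLACEHOLDER.finditer(pattern):
        # Literal text between placeholders is matched verbatim.
        regex += re.escape(pattern[pos:m.start()])
        name, kind = m.group(1), m.group(2)
        # Assumption: '{x:path}' may contain slashes, plain '{x}' may not.
        regex += f"(?P<{name}>.+)" if kind == "path" else f"(?P<{name}>[^/]+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(f"^{regex}$")


class Registry:
    def __init__(self) -> None:
        self._handlers: List[Tuple["re.Pattern", Callable]] = []

    def metadata(self, pattern: str) -> Callable[[Callable], Callable]:
        """Decorator that registers a function for URLs matching `pattern`."""
        compiled = compile_pattern(pattern)

        def decorator(func: Callable) -> Callable:
            self._handlers.append((compiled, func))
            return func  # returned unchanged so decorators can be stacked

        return decorator

    def resolve(self, url: str) -> Optional[Tuple[Callable, Dict[str, str]]]:
        """Find the first registered function whose pattern matches `url`."""
        for compiled, func in self._handlers:
            match = compiled.match(url)
            if match:
                return func, match.groupdict()
        return None


registry = Registry()


@registry.metadata("s3://{path:path}.json")
@registry.metadata("bigquery://{project}/{dataset}/{table}")
def describe(**params: str) -> Dict[str, str]:
    # Stand-in extractor: echoes the parameters captured from the URL.
    return params


handler, params = registry.resolve("s3://corp-logs/2022-03-01/0.json")
print(handler(**params))  # {'path': 'corp-logs/2022-03-01/0'}
```

Returning the original function from the decorator is what allows two `@registry.metadata(...)` lines to be stacked on one extractor.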

Crawl your data:

crawl("s3://corp-logs")
crawl("bigquery://floating-castle-728053")

And read the results:

search("json_extract(metadata_obj, '$.count') > 9999", PandasDescription)
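
The filter string uses the SQL function `json_extract` to reach into stored JSON. The sketch below reproduces that kind of query against an in-memory SQLite table, assuming a SQLite build with JSON support (standard in recent Python releases). The `catalog` table layout, the local `search` helper, and the sample rows are invented for illustration and do not describe Recap's real storage.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (url TEXT, metadata_obj TEXT)")

# Made-up sample metadata, stored as JSON text.
sample_rows = [
    ("s3://corp-logs/2022-03-01/0.json", {"count": 12000}),
    ("s3://corp-logs/2022-03-02/0.json", {"count": 350}),
]
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?)",
    [(url, json.dumps(meta)) for url, meta in sample_rows],
)


def search(where: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (url, metadata) pairs whose JSON matches the WHERE clause.

    The clause is interpolated directly, so this sketch is only suitable
    for trusted input.
    """
    cursor = conn.execute(f"SELECT url, metadata_obj FROM catalog WHERE {where}")
    return [(url, json.loads(obj)) for url, obj in cursor]


results = search("json_extract(metadata_obj, '$.count') > 9999")
print(results)  # [('s3://corp-logs/2022-03-01/0.json', {'count': 12000})]
```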

See where data comes from:

writers("bigquery://floating-castle-728053/austin_311/311_service_requests")

And where it's going:

readers("bigquery://floating-castle-728053/austin_311/311_service_requests")
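
Conceptually, `writers` and `readers` answer two questions about the same lineage graph: which jobs produce a dataset, and which jobs consume it. A minimal sketch of such a graph follows. The `Lineage` class and the job names are hypothetical and only illustrate the data structure; they say nothing about how Recap collects or stores lineage.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set


class Lineage:
    """Tracks which jobs read from and write to each dataset URL."""

    def __init__(self) -> None:
        self._writers: DefaultDict[str, Set[str]] = defaultdict(set)
        self._readers: DefaultDict[str, Set[str]] = defaultdict(set)

    def record(
        self,
        job: str,
        reads: Iterable[str] = (),
        writes: Iterable[str] = (),
    ) -> None:
        """Register one job run and the datasets it touched."""
        for url in reads:
            self._readers[url].add(job)
        for url in writes:
            self._writers[url].add(job)

    def writers(self, url: str) -> List[str]:
        # .get avoids creating empty entries for unknown URLs.
        return sorted(self._writers.get(url, ()))

    def readers(self, url: str) -> List[str]:
        return sorted(self._readers.get(url, ()))


TABLE = "bigquery://floating-castle-728053/austin_311/311_service_requests"

lineage = Lineage()
lineage.record("load_311_requests", writes=[TABLE])  # hypothetical job
lineage.record("weekly_report", reads=[TABLE])       # hypothetical job

print(lineage.writers(TABLE))  # ['load_311_requests']
print(lineage.readers(TABLE))  # ['weekly_report']
```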

All cached in Recap's catalog.

Getting Started

See the Quickstart page to get started.

Download files

Source Distribution

recap-core-0.5.0.tar.gz (18.9 kB)

Built Distribution

recap_core-0.5.0-py3-none-any.whl (23.0 kB)
