A metadata toolkit written in Python
About
Recap is a Python library that helps you build tools for data quality, data governance, data profiling, data lineage, data contracts, and schema conversion.
Features
- Compatible with fsspec filesystems and SQLAlchemy databases.
- Built-in support for Parquet, CSV, TSV, and JSON files.
- Includes Pandas for data profiling.
- Uses Pydantic for metadata models.
- Convenient CLI, Python API, and REST API.
- No external system dependencies.
Installation
pip install recap-core
Usage
Grab schemas from filesystems:
schema("s3://corp-logs/2022-03-01/0.json")
And databases:
schema("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests")
In a standardized format:
{
  "fields": [
    {
      "name": "unique_key",
      "type": "VARCHAR",
      "nullable": false,
      "comment": "The service request tracking number."
    },
    {
      "name": "complaint_description",
      "type": "VARCHAR",
      "nullable": true,
      "comment": "Service request type"
    }
  ]
}
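A schema document in this shape is easy to consume from plain Python. The following is a minimal sketch, not Recap's own model code: the `Field` dataclass and `parse_fields` helper are hypothetical names introduced here, and standard-library dataclasses stand in for the Pydantic models the library uses.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    # Hypothetical model mirroring one entry of the "fields" array above.
    name: str
    type: str
    nullable: bool
    comment: Optional[str] = None


def parse_fields(document: str) -> List[Field]:
    """Parse a schema document that has a top-level "fields" array."""
    payload = json.loads(document)
    return [Field(**entry) for entry in payload["fields"]]


SCHEMA_JSON = """
{
  "fields": [
    {"name": "unique_key", "type": "VARCHAR", "nullable": false,
     "comment": "The service request tracking number."},
    {"name": "complaint_description", "type": "VARCHAR", "nullable": true,
     "comment": "Service request type"}
  ]
}
"""

fields = parse_fields(SCHEMA_JSON)
# Collect the columns that may never be null, e.g. for a data contract check.
required = [field.name for field in fields if not field.nullable]
print(required)
```

Keeping the parsed fields as typed objects makes checks such as "which columns are required" a one-line comprehension.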
See what schemas used to look like:
schema("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests", datetime(2023, 1, 1))
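One way to think about a point-in-time lookup like this is as a search over timestamped schema versions. The sketch below illustrates that idea under stated assumptions: the `History` structure and `schema_as_of` function are hypothetical, the sample timestamps are invented for illustration, and nothing here describes how Recap's catalog actually stores versions.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory history: URL -> (recorded_at, schema) pairs, sorted by time.
History = Dict[str, List[Tuple[datetime, dict]]]


def schema_as_of(history: History, url: str, when: datetime) -> Optional[dict]:
    """Return the latest schema recorded at or before `when`, or None."""
    versions = history.get(url, [])
    timestamps = [recorded_at for recorded_at, _ in versions]
    # bisect_right gives the count of versions recorded at or before `when`.
    index = bisect_right(timestamps, when)
    return versions[index - 1][1] if index else None


URL = "snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests"
history: History = {
    URL: [
        (datetime(2022, 6, 1), {"fields": [{"name": "unique_key"}]}),
        (datetime(2023, 2, 1), {"fields": [{"name": "unique_key"},
                                           {"name": "complaint_description"}]}),
    ]
}

old = schema_as_of(history, URL, datetime(2023, 1, 1))
print(len(old["fields"]))
```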
Build metadata extractors:
@registry.metadata("s3://{path:path}.json", include_df=True)
@registry.metadata("bigquery://{project}/{dataset}/{table}", include_df=True)
def pandas_describe(df: DataFrame, *_) -> BaseModel:
    description_dict = df.describe(include="all")
    return PandasDescription.parse_obj(description_dict)
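The URL templates in the decorators read like route patterns, where `{name}` captures one path segment and `{name:path}` captures several. The sketch below shows how such templates could be matched and dispatched. It is an assumption-laden illustration, not Recap's registry: the `Registry` class, `compile_template`, and `dispatch` are hypothetical, and the `include_df` option is left out.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

Handler = Callable[..., dict]


def compile_template(template: str) -> Pattern[str]:
    """Compile "{name}" to a one-segment group and "{name:path}" to a multi-segment group."""
    regex = ""
    position = 0
    for match in re.finditer(r"\{(\w+)(?::(\w+))?\}", template):
        # Escape the literal text between placeholders.
        regex += re.escape(template[position:match.start()])
        name, kind = match.group(1), match.group(2)
        regex += f"(?P<{name}>.+)" if kind == "path" else f"(?P<{name}>[^/]+)"
        position = match.end()
    regex += re.escape(template[position:])
    return re.compile(f"^{regex}$")


class Registry:
    """Hypothetical registry mapping URL templates to extractor functions."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Pattern[str], Handler]] = []

    def metadata(self, template: str) -> Callable[[Handler], Handler]:
        pattern = compile_template(template)

        def decorator(func: Handler) -> Handler:
            self._routes.append((pattern, func))
            return func  # Returning func unchanged lets decorators stack.

        return decorator

    def dispatch(self, url: str) -> Optional[dict]:
        """Call the first registered handler whose template matches the URL."""
        for pattern, func in self._routes:
            match = pattern.match(url)
            if match:
                return func(**match.groupdict())
        return None


registry = Registry()


@registry.metadata("s3://{path:path}.json")
@registry.metadata("bigquery://{project}/{dataset}/{table}")
def describe(**params: str) -> dict:
    # A real extractor would load data here; this one echoes the captured parts.
    return params


print(registry.dispatch("s3://corp-logs/2022-03-01/0.json"))
```

Because the decorator returns the function unchanged, one extractor can be registered under several URL schemes, as in the stacked decorators above.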
Crawl your data:
crawl("s3://corp-logs")
crawl("bigquery://floating-castle-728053")
And read the results:
search("json_extract(metadata_obj, '$.count') > 9999", PandasDescription)
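The filter string has the same form as SQLite's `json_extract` function applied to a JSON column. The sketch below shows how such a predicate behaves, assuming a SQLite build with JSON support. The `catalog` table layout and the second sample URL are hypothetical, and whether Recap's catalog is backed by SQLite in a given deployment is an assumption.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per URL, with metadata stored as JSON text.
conn.execute("CREATE TABLE catalog (url TEXT, metadata_obj TEXT)")

rows = [
    ("s3://corp-logs/2022-03-01/0.json", {"count": 12000}),
    ("s3://corp-logs/2022-03-02/0.json", {"count": 150}),
]
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?)",
    [(url, json.dumps(metadata)) for url, metadata in rows],
)

# json_extract pulls the value at JSON path $.count out of each stored document.
matches = [
    url
    for (url,) in conn.execute(
        "SELECT url FROM catalog WHERE json_extract(metadata_obj, '$.count') > 9999"
    )
]
print(matches)
conn.close()
```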
See where data comes from:
writers("bigquery://floating-castle-728053/austin_311/311_service_requests")
And where it's going:
readers("bigquery://floating-castle-728053/austin_311/311_service_requests")
All cached in Recap's catalog.
Getting Started
See the Quickstart page to get started.