A metadata toolkit written in Python
About
Recap is a Python library that helps you build tools for data quality, data governance, data profiling, data lineage, data contracts, and schema conversion.
Features
- Compatible with fsspec filesystems and SQLAlchemy databases.
- Built-in support for Parquet, CSV, TSV, and JSON files.
- Includes Pandas for data profiling.
- Uses Pydantic for metadata models.
- Convenient CLI, Python API, and REST API.
- No external system dependencies.
Installation
pip install recap-core
Usage
Grab schemas from filesystems:
schema("s3://corp-logs/2022-03-01/0.json")
And databases:
schema("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests")
In a standardized format:
{
"fields": [
{
"name": "unique_key",
"type": "VARCHAR",
"nullable": false,
"comment": "The service request tracking number."
},
{
"name": "complaint_description",
"type": "VARCHAR",
"nullable": true,
"comment": "Service request type"
}
]
}
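Because the format is plain JSON, it is easy to load into typed objects and query. The following is a minimal sketch, not Recap's own model code: `FieldInfo`, `TableSchema`, and `required()` are hypothetical names, and the sketch uses standard-library dataclasses rather than the Pydantic models Recap ships, so that it runs without extra dependencies.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldInfo:
    # Hypothetical model mirroring one entry of "fields" in the JSON above.
    name: str
    type: str
    nullable: bool
    comment: Optional[str] = None


@dataclass
class TableSchema:
    # Hypothetical container for the whole schema document.
    fields: List[FieldInfo]

    @classmethod
    def from_json(cls, text: str) -> "TableSchema":
        raw = json.loads(text)
        return cls(fields=[FieldInfo(**entry) for entry in raw["fields"]])

    def required(self) -> List[str]:
        # Names of the columns that may not contain NULL.
        return [f.name for f in self.fields if not f.nullable]


SCHEMA_JSON = """
{
  "fields": [
    {"name": "unique_key", "type": "VARCHAR", "nullable": false,
     "comment": "The service request tracking number."},
    {"name": "complaint_description", "type": "VARCHAR", "nullable": true,
     "comment": "Service request type"}
  ]
}
"""

table_schema = TableSchema.from_json(SCHEMA_JSON)
print(table_schema.required())
```

A check like `required()` is the kind of building block a data contract or data quality tool might layer on top of the standardized schema.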
See what schemas used to look like:
schema("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests", datetime(2023, 1, 1))
Build metadata extractors:
@registry.metadata("s3://{path:path}.json", include_df=True)
@registry.metadata("bigquery://{project}/{dataset}/{table}", include_df=True)
def pandas_describe(df: DataFrame, *_) -> BaseModel:
    description_dict = df.describe(include="all")
    return PandasDescription.parse_obj(description_dict)
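The decorators above bind one function to several URL templates. To illustrate how that kind of template routing can work in general, here is a small sketch. It is not Recap's implementation: `PatternRegistry`, `resolve`, and the rule that `{name}` matches a single path segment while `{name:path}` may span slashes are all assumptions made for the example, and the `include_df` option is left out.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Pattern, Tuple

Handler = Callable[..., Any]


class PatternRegistry:
    """Hypothetical URL-template registry, for illustration only."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Pattern[str], Handler]] = []

    @staticmethod
    def _compile(template: str) -> Pattern[str]:
        # Split the template into literal text and {placeholders}.
        regex = ""
        for part in re.split(r"(\{\w+(?::path)?\})", template):
            placeholder = re.fullmatch(r"\{(\w+)(:path)?\}", part)
            if placeholder is None:
                regex += re.escape(part)  # literal text
            elif placeholder.group(2):
                regex += f"(?P<{placeholder.group(1)}>.+)"  # may span "/"
            else:
                regex += f"(?P<{placeholder.group(1)}>[^/]+)"  # one segment
        return re.compile(f"^{regex}$")

    def metadata(self, template: str) -> Callable[[Handler], Handler]:
        def decorator(func: Handler) -> Handler:
            self._routes.append((self._compile(template), func))
            return func  # returned unchanged so decorators can be stacked
        return decorator

    def resolve(self, url: str) -> Optional[Tuple[Handler, Dict[str, str]]]:
        # Return the first registered handler whose template matches the URL.
        for pattern, func in self._routes:
            match = pattern.match(url)
            if match:
                return func, match.groupdict()
        return None


registry = PatternRegistry()


@registry.metadata("s3://{path:path}.json")
@registry.metadata("bigquery://{project}/{dataset}/{table}")
def describe(**params: str) -> Dict[str, str]:
    # Placeholder extractor: just echoes the parameters parsed from the URL.
    return params


handler, params = registry.resolve(
    "bigquery://floating-castle-728053/austin_311/311_service_requests"
)
print(handler(**params))
```

Returning the original function from the decorator is what lets two templates point at the same extractor.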
Crawl your data:
crawl("s3://corp-logs")
crawl("bigquery://floating-castle-728053")
And read the results:
search("json_extract(metadata_obj, '$.count') > 9999", PandasDescription)
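The filter string passed to `search` reads like a SQL predicate over a JSON column. As a rough illustration of what such a predicate does, the sketch below runs the same expression against an in-memory SQLite table. Whether Recap's catalog is backed by SQLite is an assumption here, and the `catalog` table, the second URL, and the sample counts are made up for the example. It also assumes a Python build whose SQLite includes the JSON functions, which is the common case.

```python
import json
import sqlite3

# Hypothetical catalog table: one row per URL with metadata stored as JSON text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (url TEXT, metadata_obj TEXT)")

sample_rows = [
    ("s3://corp-logs/2022-03-01/0.json", {"count": 12000}),  # made-up count
    ("s3://corp-logs/2022-03-02/0.json", {"count": 150}),    # made-up row
]
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?)",
    [(url, json.dumps(meta)) for url, meta in sample_rows],
)

# Same predicate as in the search call above, used as a WHERE clause.
query = (
    "SELECT url FROM catalog "
    "WHERE json_extract(metadata_obj, '$.count') > 9999 "
    "ORDER BY url"
)
matches = [row[0] for row in conn.execute(query)]
print(matches)
conn.close()
```

Only rows whose stored `count` exceeds 9999 come back, which mirrors the intent of the `search` call.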
See where data comes from:
writers("bigquery://floating-castle-728053/austin_311/311_service_requests")
And where it's going:
readers("bigquery://floating-castle-728053/austin_311/311_service_requests")
All cached in Recap's catalog.
Getting Started
See the Quickstart page to get started.