Toolkit to enrich, validate and explore YAML metadata from a pandas DataFrame.
MetaCraft Toolkit
MetaCraft is a Python package for enriching and validating YAML schemas against a pandas.DataFrame. The metadata.update() function can now read YAML directly from URLs, and can even download remote ZIP files containing multiple schemas, much as pandas.read_csv accepts URLs.
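The README doesn't describe how update() decides whether its argument is a local file, a remote YAML or a remote ZIP. A minimal sketch, assuming the decision rests on the URL scheme and the `.zip` extension, needs only the standard library; `source_kind` and `read_schema_text` are hypothetical helpers, not MetaCraft functions.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical helpers: MetaCraft's real dispatch logic is not documented here.
REMOTE_SCHEMES = ("http", "https", "ftp")


def is_url(source: str) -> bool:
    """True when `source` looks like a remote URL rather than a local path."""
    return urlparse(source).scheme in REMOTE_SCHEMES


def source_kind(source: str) -> str:
    """Classify a schema source as 'remote-zip', 'remote-yaml' or 'local'."""
    if not is_url(source):
        return "local"
    # Inspect only the URL path so query strings like '?raw=1' don't hide the extension.
    if urlparse(source).path.lower().endswith(".zip"):
        return "remote-zip"
    return "remote-yaml"


def read_schema_text(source: str) -> str:
    """Return the raw YAML text of a local file or a remote (non-ZIP) URL."""
    if is_url(source):
        with urlopen(source) as resp:
            return resp.read().decode("utf-8")
    return Path(source).read_text(encoding="utf-8")
```

Parsing the returned text (for example with yaml.safe_load) would be the next step.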
Features
- update: enriches YAML with statistics and sketches (tdigest, HyperLogLog), storing the results in metadata.df.
- validate: checks the consistency between a DataFrame and the YAML (types, ranges, nulls, ...).
- compare: detects schema drift between two schemas.
- export_schema: converts the YAML to other formats (Spark, SQL, etc.).
- generate_expectations: creates Great Expectations suites.
- transform: returns a DataFrame adjusted to the schema.
- quality_report: simple quality score (completeness + drift).
- research: uses OpenAI to explore relationships and anomalies.
- loglevel: controls verbosity via Metadata(loglevel="DEBUG").
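To make the compare feature concrete: a minimal drift check, assuming schemas shaped like the Quick example below (a `schema` list whose entries carry `identity.name` and, optionally, `type.logical_type`), only has to diff column names and logical types. `schema_drift` is a hypothetical helper; MetaCraft's compare may report more than this (ranges, nullability, ...).

```python
from typing import Any


def columns_by_name(schema: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Index the entries of a schema's 'schema' list by identity.name."""
    return {entry["identity"]["name"]: entry for entry in schema.get("schema", [])}


def logical_type(entry: dict[str, Any]) -> Any:
    """Read type.logical_type, or None when the entry has no type yet."""
    return entry.get("type", {}).get("logical_type")


def schema_drift(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list]:
    """Report columns added, removed, or whose logical type changed between two schemas."""
    before, after = columns_by_name(old), columns_by_name(new)
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        # Column names are unique, so sorting the tuples only ever compares names.
        "type_changed": sorted(
            (name, logical_type(before[name]), logical_type(after[name]))
            for name in shared
            if logical_type(before[name]) != logical_type(after[name])
        ),
    }
```

For example, renaming `age` to `fare` and retyping `survived` as boolean would show up as one added column, one removed column and one type change.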
Installation
pip install MetaCraft
Or, from a clone of the repository, install the dependencies with:
pip install -r requirements.txt
Optional dependencies: openai (for research), tdigest and datasketch (for the t-digest and HyperLogLog sketches computed by update).
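To see which of these optional packages are available before calling the features that need them, a small check with the standard library is enough. This is a convenience sketch; `missing_optional` is a hypothetical helper and not part of MetaCraft.

```python
from importlib.util import find_spec

# Optional extras named in the README (import names match the pip package names).
OPTIONAL_DEPENDENCIES = ("openai", "tdigest", "datasketch")


def missing_optional(modules: tuple[str, ...] = OPTIONAL_DEPENDENCIES) -> list[str]:
    """Return the optional top-level modules that cannot be imported here."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional()
    print("missing optional dependencies:", ", ".join(missing) or "none")
```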
Quick example
import pandas as pd
import yaml
from metacraft import Metadata

# Example DataFrame
df = pd.DataFrame({
    'survived': [0, 1, 1, 0],
    'age': [22, 38, 26, 35],
})

# Minimal schema: one entry per column, identified by name
yaml_schema = {
    'schema': [
        {'identity': {'name': 'survived'}},
        {'identity': {'name': 'age'}},
    ]
}

# Save YAML to disk
with open('schema.yaml', 'w', encoding='utf-8') as f:
    yaml.safe_dump(yaml_schema, f, sort_keys=False, allow_unicode=True)

# Enrich the schema from the data, then compute the quality score
m = Metadata(loglevel="INFO")
m.update(df, 'schema.yaml', inplace=True)
m.quality_report(df)
Results
✔ schema.yaml updated
root
|-- survived: integer (nullable = false)
|-- age: integer (nullable = false)
<class 'metadata.dataset'>
Columns: 2 entries
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   survived  4               integer
 1   age       4               integer
dtypes: integer(2)
Validation passed: True
Quality score: 100.0 (A)
Remote ZIP example
metadata.update() can also process ZIP files hosted on the web. Just pass a URL ending in .zip:
m.update(df, 'https://example.com/schemas.zip', verbose=True)
This downloads the ZIP to a temporary directory, applies the updates and leaves the resulting files in that same directory (or at the path passed via the output argument).
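The download-and-unpack step described above could look roughly like this. It is a standard-library sketch, not MetaCraft's own code: `extract_yaml_members` and `fetch_remote_zip` are hypothetical names, and each extracted file would then go through the same update as a local schema.

```python
import io
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def extract_yaml_members(data: bytes, dest: Path) -> list[Path]:
    """Extract only the .yaml/.yml members of a ZIP archive into `dest`."""
    extracted = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".yaml", ".yml")):
                # ZipFile.extract strips absolute paths and ".." components,
                # so members cannot escape `dest`.
                extracted.append(Path(zf.extract(name, dest)))
    return extracted


def fetch_remote_zip(url: str) -> list[Path]:
    """Download a ZIP over HTTP(S) into a fresh temporary directory and unpack its schemas."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    # mkdtemp (unlike TemporaryDirectory) is not deleted automatically,
    # so the updated files remain available afterwards.
    tmpdir = Path(tempfile.mkdtemp(prefix="schemas_"))
    return extract_yaml_members(data, tmpdir)
```

Non-YAML members (a README, say) are simply skipped, and nested folders inside the archive are recreated under the temporary directory.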
Editing metadata via metadata.df
After m.update() the schema lives in m.df, an editable DataFrame. Changes
can be propagated back to YAML with m.df.upgrade():
# 1) Set the logical type of every column to integer
m.df['type.logical_type'] = 'integer'
# 2) Change the description of `age`
m.df.loc['age', 'identity.description_i18n.es'] = 'Passenger age'
# 3) Adjust the allowed range for `age`
m.df.loc['age', ['domain.numeric.min', 'domain.numeric.max']] = [0, 120]
m.df.upgrade('schema.yaml') # save the updated YAML
m.df.revert() # discard the changes in memory
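Column names such as `type.logical_type` and `domain.numeric.min` suggest that m.df stores each schema entry as one row, with dotted paths into the nested YAML. Assuming that layout, the round trip behind an edit-then-upgrade workflow can be sketched with two hypothetical helpers, `flatten` and `unflatten`; this illustrates the idea and is not MetaCraft's implementation.

```python
from typing import Any


def flatten(entry: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested dicts into one dict with dotted keys ('domain.numeric.min': 0)."""
    flat: dict[str, Any] = {}
    for key, value in entry.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value  # leaves (and empty dicts) are stored as-is
    return flat


def unflatten(flat: dict[str, Any]) -> dict[str, Any]:
    """Inverse of flatten: rebuild the nested structure from dotted keys."""
    nested: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


if __name__ == "__main__":
    schema = {"schema": [{"identity": {"name": "age"}, "type": {"logical_type": "integer"}}]}
    # One editable "row" per column, keyed by identity.name (like m.df.loc['age', ...]).
    rows = {e["identity"]["name"]: flatten(e) for e in schema["schema"]}
    rows["age"]["domain.numeric.min"] = 0
    rows["age"]["domain.numeric.max"] = 120
    updated = {"schema": [unflatten(r) for r in rows.values()]}
    print(updated)
```

The dotted-path design breaks down if a YAML key itself contains a dot, so a real implementation would need escaping or a different separator.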
Roadmap
- ✔️ Remote YAML support (v 2025‑07‑30)
- ✔️ Remote ZIP download (v 2025‑07‑30)
- ✔️ Optional local cache
- ⬜ CLI (metadata-cli update titanic.csv titanic.yaml)
Metadata generator
You can try the Metadata Generator, a GPT that creates the YAML from the output of df.head().
Contributions welcome!
Download files
File details
Details for the file metacraft-2025.7.30.tar.gz.
File metadata
- Download URL: metacraft-2025.7.30.tar.gz
- Upload date:
- Size: 15.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e4858fd04df9cda3ff51e29339b7520c216820101614fcf0961a98fb04f71b2 |
| MD5 | d749db68fe727818ae85f70fadac7ae5 |
| BLAKE2b-256 | 2af78a161ff738286c31b03a49490fd6d1352ef5b3d02a190d55a699b7aa787e |
File details
Details for the file metacraft-2025.7.30-py3-none-any.whl.
File metadata
- Download URL: metacraft-2025.7.30-py3-none-any.whl
- Upload date:
- Size: 14.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c179977bc3651ef452ffbab47f17d843977cfc485f1dc3d46df6be3fce63ff84 |
| MD5 | e0a2bcb167aac63646a9c7b9cb648f03 |
| BLAKE2b-256 | a15f0474565db660820c54010ac55fbdd8e235583757264cf41f43a932033b35 |
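A downloaded file can be checked against these published digests with hashlib. The sketch below is a generic helper, not something MetaCraft ships; `file_digest` and `verify_download` are hypothetical names. It reads the file in chunks so large archives don't need to fit in memory.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks; accepts 'sha256', 'md5' or 'blake2b-256' (the names listed above)."""
    if algorithm.lower() == "blake2b-256":
        # BLAKE2b configured for a 32-byte (256-bit) digest.
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm.lower())
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return file_digest(path, algorithm) == expected.strip().lower()
```

For instance, verify_download("metacraft-2025.7.30.tar.gz", "6e4858fd04df9cda3ff51e29339b7520c216820101614fcf0961a98fb04f71b2") should return True for a copy that matches the SHA256 listed for the source distribution. pip can enforce the same check at install time through its hash-checking mode (--require-hashes).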