Metadata generator using Polars as its backend.

Project description

PyMetaGen

PyMetaGen is a fast data quality tool based on Polars, designed for generating metadata and extracting useful information from various data file formats. It provides both a Python API and a command-line interface (CLI) to inspect, filter, and extract data from files such as CSV, JSON, Parquet, and Excel.

Key Features

  • Metadata Generation: Automatically generates metadata for your datasets, including statistics such as min, max, standard deviation, and more.
  • Data Extraction: Easily extract specific rows from your datasets using head, tail, or random sampling.
  • Command Line Interface: Perform operations like metadata generation, data inspection, and filtering using an intuitive CLI.
  • Multiple File Format Support: Import and export data in various formats, including CSV, Parquet, Excel, and JSON.
  • SQL Query Support: Filter data using SQL queries directly on the command line.
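To make the SQL filtering idea concrete, here is a minimal sketch that uses Python's standard-library sqlite3 module rather than PyMetaGen itself. It is not how PyMetaGen implements filtering; the sample rows, the title column, and the filter_rows helper are hypothetical, while the table name data and the imdb_score column follow the CLI example in this README.

```python
import sqlite3

# Hypothetical sample rows; titles and scores are illustrative only.
rows = [
    ("Movie A", 9.3),
    ("Movie B", 8.1),
    ("Movie C", 9.1),
]


def filter_rows(rows, query):
    """Load rows into an in-memory table named `data` and run `query` on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE data (title TEXT, imdb_score REAL)")
        conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


selected = filter_rows(rows, "SELECT * FROM data WHERE imdb_score > 9")
print(selected)
```

The point of the sketch is only that the whole dataset is exposed as a single table, so any row-level condition can be written as an ordinary WHERE clause.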

Installation

To install the package, use the following command:

pip install pymetagen

Local Installation

To install directly from the Git repository over SSH instead of from PyPI, use the following command:

python -m pip install -U git+ssh://git@github.com/itsbigspark/dotdda.git@dev/main

Usage

Python API

You can use the Python API to load a data file and generate metadata:

from pymetagen import MetaGen

# Create an instance of the MetaGen class reading a data file
metagen = MetaGen.from_path("tests/data/testdata.csv", loading_mode="eager")

# Display the first few rows of the data
metagen.data.head()

# Generate metadata and reset the index
metadata = metagen.compute_metadata().reset_index()

# Save the metadata to a file
metagen.write_metadata("tests/data/testdata_metadata.csv")
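For a sense of the kind of per-column statistics that a metadata table can hold (min, max, standard deviation), the following is a minimal standard-library sketch. It is an illustration under stated assumptions, not PyMetaGen's implementation or output format: the CSV content, the column names, and the numeric_column_stats helper are all hypothetical.

```python
import csv
import io
import statistics

# Hypothetical CSV content standing in for a real data file.
CSV_TEXT = """title,imdb_score,year
Movie A,9.3,1994
Movie B,8.1,2001
Movie C,9.1,2010
"""


def numeric_column_stats(csv_text):
    """Return min, max and sample standard deviation for each numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {}
    for row in reader:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    stats = {}
    for name, values in columns.items():
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            continue  # skip non-numeric columns such as `title`
        stats[name] = {
            "min": min(numbers),
            "max": max(numbers),
            "std": statistics.stdev(numbers) if len(numbers) > 1 else None,
        }
    return stats


column_stats = numeric_column_stats(CSV_TEXT)
for name, values in column_stats.items():
    print(name, values)
```

A real metadata generator would typically add more fields per column (data type, null counts, distinct values); the sketch keeps only the three statistics named in the feature list.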

Command Line Interface

  • Metadata Generation: Generate metadata for a tabular data file:
$ metagen metadata -i tests/data/testdata.csv -o tests/data/testdata_metadata.csv

>>> Generating metadata for tests/data/testdata.csv...
  • Data Inspection: Inspect a data file (e.g., a partitioned Parquet file):
$ metagen inspect -i tests/data/input_ab_partition.parquet
  • Data Filtering: Filter a data set using an SQL query:
$ metagen filter -i tests/data/testdata.csv -q "SELECT * FROM data WHERE imdb_score > 9"
  • Data Extraction: Extract a specific number of rows from a data set. The head, tail, and random-sample extracts are written to separate files named after the output path:
$ metagen extracts -i tests/data/testdata.csv -o tests.csv -n 3

>>> Writing extract in: tests-head.csv
>>> Writing extract in: tests-tail.csv
>>> Writing extract in: tests-sample.csv
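The extraction output above suggests that one output path is expanded into three files, one per extract type. The sketch below mimics that behaviour with the standard library only. It is an assumption-based illustration, not PyMetaGen's code: extract_names, make_extracts, and the seed parameter are hypothetical, and the naming rule is inferred from the file names shown in the example output.

```python
import random
from pathlib import Path

KINDS = ("head", "tail", "sample")


def extract_names(output):
    """Derive one file name per extract kind, e.g. tests.csv -> tests-head.csv."""
    path = Path(output)
    return {
        kind: str(path.with_name(f"{path.stem}-{kind}{path.suffix}"))
        for kind in KINDS
    }


def make_extracts(rows, n, seed=0):
    """Return the first n rows, the last n rows, and n randomly sampled rows."""
    n = max(n, 0)  # treat negative sizes as zero
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        "head": rows[:n],
        "tail": rows[-n:] if n > 0 else [],
        "sample": rng.sample(rows, min(n, len(rows))),
    }


rows = list(range(10))  # stand-in for the rows of a data file
names = extract_names("tests.csv")
extracts = make_extracts(rows, 3)
for kind in KINDS:
    print(names[kind], extracts[kind])
```

Seeding the random generator is a design choice of this sketch so that repeated runs produce the same sample; whether the CLI does the same is not stated in this README.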

Available Output Formats

  • CSV
  • Parquet
  • JSON
  • Excel

Download files

Source Distribution

pymetagen-0.4.0.tar.gz (23.3 kB view details)

Uploaded Source

Built Distribution

pymetagen-0.4.0-py3-none-any.whl (19.5 kB view details)

Uploaded Python 3

File details

Details for the file pymetagen-0.4.0.tar.gz.

File metadata

  • Download URL: pymetagen-0.4.0.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pymetagen-0.4.0.tar.gz
Algorithm Hash digest
SHA256 edf4ae1dc222e1dd77efb901e8bc0311bbc2f707867e392e6c144153f246579b
MD5 e3afeb02e5e20de10ba7c955207da92c
BLAKE2b-256 cd8050f7a3b311d327dd17e5e51d77b140abba34af3d3fa38ea560fed7e1512b
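A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch: the helper names sha256_of_file and matches_expected are hypothetical, and the local file path in the usage comment assumes the archive was saved under its published name in the current directory.

```python
import hashlib

# SHA256 digest listed above for pymetagen-0.4.0.tar.gz
EXPECTED_SHA256 = "edf4ae1dc222e1dd77efb901e8bc0311bbc2f707867e392e6c144153f246579b"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


# Usage (assumes the archive has been downloaded to the current directory):
# print(matches_expected("pymetagen-0.4.0.tar.gz"))
```

Reading in chunks keeps memory use constant regardless of file size. pip can perform the same check automatically when requirements are pinned with hashes.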

Provenance

The following attestation bundles were made for pymetagen-0.4.0.tar.gz:

Publisher: pypi-release.yml on itsbigspark/pymetagen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymetagen-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: pymetagen-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 19.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pymetagen-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f6412e8bf99cc05050994a2929db7c4ae53197e977865cfa00862b7de8a321a3
MD5 72c5b321263d63e8056633c474dd4d16
BLAKE2b-256 bd423742779ca875d3c8ff78e88cc1381809204f88b41bbb6cab90fb8e958263

Provenance

The following attestation bundles were made for pymetagen-0.4.0-py3-none-any.whl:

Publisher: pypi-release.yml on itsbigspark/pymetagen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
