Metadata generator using Polars as its backend.

PyMetaGen

PyMetaGen is a fast data quality tool based on Polars, designed for generating metadata and extracting useful information from various data file formats. It provides both a Python API and a command-line interface (CLI) to inspect, filter, and extract data from files such as CSV, JSON, Parquet, and Excel.

Key Features

  • Metadata Generation: Automatically generates metadata for your datasets, including statistics such as min, max, standard deviation, and more.
  • Data Extraction: Easily extract specific rows from your datasets using head, tail, or random sampling.
  • Command Line Interface: Perform operations like metadata generation, data inspection, and filtering using an intuitive CLI.
  • Multiple File Format Support: Import and export data in various formats, including CSV, Parquet, Excel, and JSON.
  • SQL Query Support: Filter data using SQL queries directly on the command line.
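To make the "Metadata Generation" feature concrete, here is a minimal sketch of the kind of per-column statistics it describes (min, max, standard deviation). It uses only the Python standard library and is not PyMetaGen's implementation, which runs on Polars; the function name `column_metadata`, the sample data, and the returned keys are assumptions made for illustration.

```python
import csv
import io
import statistics


def column_metadata(rows, column):
    """Return basic statistics for one numeric column of a list of dict rows."""
    # Treat empty strings and None as missing values.
    values = [float(row[column]) for row in rows if row[column] not in ("", None)]
    return {
        "column": column,
        "count": len(values),
        "nulls": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical in-memory CSV standing in for a real data file.
SAMPLE_CSV = "title,imdb_score\nA,9.2\nB,8.8\nC,\nD,7.0\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
meta = column_metadata(rows, "imdb_score")
print(meta)
```

The actual set of statistics PyMetaGen reports may differ; this only shows the general shape of a column-level summary.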

Installation

To install the package, use the following command:

pip install pymetagen

Local Installation

To install the package directly from the Git repository, use the following command:

python -m pip install -U git+ssh://git@github.com/itsbigspark/dotdda.git@dev/main

Usage

Python API

You can use the Python API to load a data file and generate metadata:

from pymetagen import MetaGen

# Create a MetaGen instance by reading a data file
metagen = MetaGen.from_path("tests/data/testdata.csv", loading_mode="eager")

# Display the first few rows of the data
metagen.data.head()

# Generate metadata and reset the index
metadata = metagen.compute_metadata().reset_index()

# Save the metadata to a file
metagen.write_metadata("tests/data/testdata_metadata.csv")
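The same calls can be wrapped in a small helper to process several files. The sketch below uses only `MetaGen.from_path` and `write_metadata` as shown above; the helper names `metadata_path_for` and `write_metadata_for` and the `<stem>_metadata.csv` naming convention are assumptions for illustration, and it assumes `write_metadata` can be called without a prior `compute_metadata` call.

```python
from pathlib import Path


def metadata_path_for(data_path):
    """Derive '<stem>_metadata.csv' next to the input file (hypothetical convention)."""
    path = Path(data_path)
    return path.with_name(f"{path.stem}_metadata.csv")


def write_metadata_for(data_paths):
    """Generate and save metadata for each file, using only the calls shown above."""
    # Imported lazily so the path helper works without PyMetaGen installed.
    from pymetagen import MetaGen

    outputs = []
    for data_path in data_paths:
        output_path = metadata_path_for(data_path)
        metagen = MetaGen.from_path(str(data_path), loading_mode="eager")
        metagen.write_metadata(str(output_path))
        outputs.append(output_path)
    return outputs
```

For example, `write_metadata_for(["tests/data/testdata.csv"])` would write to `tests/data/testdata_metadata.csv`, matching the path used in the example above.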

Command Line Interface

  • Metadata Generation: Generate metadata for a tabular data file:
$ metagen metadata -i tests/data/testdata.csv -o tests/data/testdata_metadata.csv

>>> Generating metadata for tests/data/testdata.csv...
  • Data Inspection: Inspect a data file (e.g., a partitioned Parquet file):
$ metagen inspect -i tests/data/input_ab_partition.parquet
  • Data Filtering: Filter a data set using an SQL query:
$ metagen filter -i tests/data/testdata.csv -q "SELECT * FROM data WHERE imdb_score > 9"
  • Data Extraction: Extract a specific number of rows from a data set:
$ metagen extracts -i tests/data/testdata.csv -o tests.csv -n 3

>>> Writing extract in: tests-head.csv
>>> Writing extract in: tests-tail.csv
>>> Writing extract in: tests-sample.csv
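The extraction output above suggests that one output name (`tests.csv`) is expanded into three files, one each for head, tail, and sample. The following standard-library sketch illustrates that idea; it is not PyMetaGen's code, the function names `extract_names` and `extract_rows` are hypothetical, and sampling without replacement is an assumption.

```python
import random
from pathlib import Path


def extract_names(output_path):
    """Map each extract kind to '<stem>-<kind><suffix>', as in the CLI output above."""
    path = Path(output_path)
    kinds = ("head", "tail", "sample")
    return {kind: f"{path.stem}-{kind}{path.suffix}" for kind in kinds}


def extract_rows(rows, n, seed=None):
    """Return the first n rows, the last n rows, and n randomly sampled rows."""
    rng = random.Random(seed)
    return {
        "head": rows[:n],
        "tail": rows[-n:] if n else [],  # rows[-0:] would return every row
        "sample": rng.sample(rows, min(n, len(rows))),  # without replacement
    }


names = extract_names("tests.csv")
extracts = extract_rows(list(range(10)), 3, seed=0)
print(names)
print(extracts["head"], extracts["tail"])
```

With `-n 3`, each of the three extracts would hold at most three rows.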

Available Output Formats

  • CSV
  • Parquet
  • JSON
  • Excel

Download files

Source Distribution

pymetagen-0.4.1.tar.gz (23.4 kB)

Built Distribution

pymetagen-0.4.1-py3-none-any.whl (19.6 kB)

File details

Details for the file pymetagen-0.4.1.tar.gz.

File metadata

  • Download URL: pymetagen-0.4.1.tar.gz
  • Size: 23.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pymetagen-0.4.1.tar.gz
Algorithm Hash digest
SHA256 2792ead17ea7be58881e457a3b20db10c282fdb19b80fb9fb2d4090c1431c359
MD5 c0c15967e83a56fecb23fc440699cba9
BLAKE2b-256 a78d453880373684aa3064ac2739c5317277e7454c44e792610538a9b12d8a0b

Provenance

The following attestation bundles were made for pymetagen-0.4.1.tar.gz:

Publisher: pypi-release.yml on itsbigspark/pymetagen

File details

Details for the file pymetagen-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: pymetagen-0.4.1-py3-none-any.whl
  • Size: 19.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pymetagen-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8916db6105e70674196ba9a254da2e47edfc8de6e70ea157d511332b1c9729fa
MD5 eb903c634833c80d6fc068cd255a8030
BLAKE2b-256 96621c7cd36f38ecd6fc4ecf0118cd55966afbecfb6e94031bf3f92c12081b94

Provenance

The following attestation bundles were made for pymetagen-0.4.1-py3-none-any.whl:

Publisher: pypi-release.yml on itsbigspark/pymetagen
