data-tools(et)
data-toolset simplifies data processing tasks by offering a more user-friendly alternative to traditional JAR utilities such as avro-tools and parquet-tools. This Python package handles data file formats including Avro and Parquet through a simple command-line interface.
Installation
Python 3.9 and 3.10 are supported and have been tested to some extent. Python 3.11+ is not yet supported (see TODO).
python -m pip install --user data-toolset
Usage
$ data-toolset -h
usage: data-toolset [-h] {head,tail,meta,schema,stats,query,validate,merge,count,to_json,to_csv} ...

positional arguments:
  {head,tail,meta,schema,stats,query,validate,merge,count,to_json,to_csv}
                        commands
    head                Print the first N records from a file
    tail                Print the last N records from a file
    meta                Print a file's metadata
    schema              Print the Avro schema for a file
    stats               Print statistics about a file
    query               Query a file
    validate            Validate a file
    merge               Merge multiple files into one
    count               Count the number of records in a file
    to_json             Convert a file to JSON format
    to_csv              Convert a file to CSV format

optional arguments:
  -h, --help            show this help message and exit
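The help text above has the shape that Python's `argparse` produces for a parser with subcommands. As a rough sketch of how such an interface can be wired up (not data-toolset's actual source), the snippet below registers the same command names. The per-command arguments (`file`, `files`, `-n` and its default of 20) are assumptions made for illustration.

```python
import argparse

# Command names and help strings copied from the help output above.
COMMANDS = {
    "head": "Print the first N records from a file",
    "tail": "Print the last N records from a file",
    "meta": "Print a file's metadata",
    "schema": "Print the Avro schema for a file",
    "stats": "Print statistics about a file",
    "query": "Query a file",
    "validate": "Validate a file",
    "merge": "Merge multiple files into one",
    "count": "Count the number of records in a file",
    "to_json": "Convert a file to JSON format",
    "to_csv": "Convert a file to CSV format",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="data-toolset")
    subparsers = parser.add_subparsers(dest="command", help="commands")
    for name, help_text in COMMANDS.items():
        sub = subparsers.add_parser(name, help=help_text)
        if name == "merge":
            # Assumption: merge accepts several paths.
            sub.add_argument("files", nargs="+")
        else:
            # Assumption: every other command takes one input path.
            sub.add_argument("file")
        if name in ("head", "tail"):
            # Assumption: the default record count is hypothetical.
            sub.add_argument("-n", type=int, default=20)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["head", "my_data.parquet", "-n", "10"])
    print(args.command, args.file, args.n)
```

Registering each command through `add_subparsers` is what makes `-h` list the commands with one help line each.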
Examples
Print the first 10 records of a Parquet file:
data-toolset head my_data.parquet -n 10
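To make the semantics of `head` and `tail` concrete, here is a minimal, format-agnostic sketch that works on any iterable of records. It does not read Avro or Parquet and is not taken from data-toolset; the function names are hypothetical.

```python
from collections import deque
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def head(records: Iterable[T], n: int) -> List[T]:
    """Return the first n records, consuming no more input than needed."""
    return list(islice(records, n))


def tail(records: Iterable[T], n: int) -> List[T]:
    """Return the last n records, keeping at most n of them in memory."""
    return list(deque(records, maxlen=n))


if __name__ == "__main__":
    rows = ({"id": i} for i in range(100))
    print(head(rows, 3))
```

Both helpers stream through the input, so they never need to hold the whole file in memory.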
Query a Parquet file using a SQL-like expression:
data-toolset query my_data.parquet "SELECT * FROM 'my_data.parquet' WHERE age > 25"
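The README does not say which SQL engine backs `query`. Purely to illustrate what a filter such as `WHERE age > 25` selects, the sketch below runs the same predicate with the standard-library `sqlite3` module on a few made-up rows. The table layout and data are hypothetical, and data-toolset's own dialect may differ.

```python
import sqlite3
from typing import List, Tuple


def filter_by_age(rows: List[Tuple[str, int]], min_age: int) -> List[Tuple[str, int]]:
    """Load rows into an in-memory table and keep those with age > min_age."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_data (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO my_data VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT * FROM my_data WHERE age > ? ORDER BY name", (min_age,)
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("Ana", 31), ("Ben", 22), ("Chloe", 47)]  # hypothetical data
    print(filter_by_age(sample, 25))  # [('Ana', 31), ('Chloe', 47)]
```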
Merge multiple Avro files into one:
data-toolset merge file1.avro file2.avro file3.avro merged_file.avro
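Conceptually, merging concatenates the records of several inputs that share a schema. The following sketch shows that idea on plain dictionaries, with a simple field-name check standing in for real schema validation. It is an assumption about the general technique, not data-toolset's implementation, and `merge_records` is a hypothetical name.

```python
from itertools import chain
from typing import Any, Dict, Iterable, Iterator, Optional, Set

Record = Dict[str, Any]


def merge_records(*sources: Iterable[Record]) -> Iterator[Record]:
    """Yield records from each source in order, requiring identical field names."""
    expected: Optional[Set[str]] = None
    for record in chain.from_iterable(sources):
        fields = set(record)
        if expected is None:
            expected = fields  # the first record fixes the expected fields
        elif fields != expected:
            raise ValueError(
                f"field mismatch: {sorted(fields)} != {sorted(expected)}"
            )
        yield record


if __name__ == "__main__":
    merged = merge_records([{"id": 1}], [{"id": 2}], [{"id": 3}])
    print(list(merged))
```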
Contributing
Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue on GitHub.
TODO
- make parquet validation work with avsc schemas?
- create random_sample function
- create schema_evolution function
- mature create_sample function
- optimizations [TBD]
- support Python 3.11+
File details
Details for the file data_toolset-0.1.4.tar.gz.
File metadata
- Download URL: data_toolset-0.1.4.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.0 CPython/3.9.16 Darwin/22.2.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 02d1e82e7be3d0dcf792fbd82a668e151731cc530a490fbc1466a467e13d2989
MD5 | fd30023f7b933e9f18e928d4d51d0fca
BLAKE2b-256 | 57d9aee87c5dd92fbab5186b66782c0be15b7f0dceec440c1b00049f6d134f32
File details
Details for the file data_toolset-0.1.4-py3-none-any.whl.
File metadata
- Download URL: data_toolset-0.1.4-py3-none-any.whl
- Upload date:
- Size: 12.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.0 CPython/3.9.16 Darwin/22.2.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2f02f2eb47b9d9170e5f2592a6cee7cc4c2ff1d10cfdceda7e9bb73c7dabb2f4
MD5 | eea5626cfac8ee5cac4ab2d6dbdcfc41
BLAKE2b-256 | fb6f1f872bd4eb45d5224cc1a429342c0746451b1af7e0eb30ced11777fb70e6
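The digests listed for each file can be checked locally after download. Below is a minimal sketch using the standard-library `hashlib` module. It assumes the archive has already been saved to disk, the helper names are hypothetical, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def file_digests(path: Union[str, Path], chunk_size: int = 65536) -> Dict[str, str]:
    """Compute SHA256 and BLAKE2b-256 hex digests of a file, reading in chunks."""
    sha256 = hashlib.sha256()
    blake2b = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
            blake2b.update(chunk)
    return {"SHA256": sha256.hexdigest(), "BLAKE2b-256": blake2b.hexdigest()}


def matches_sha256(path: Union[str, Path], expected: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return file_digests(path)["SHA256"] == expected.strip().lower()
```

For example, `matches_sha256("data_toolset-0.1.4.tar.gz", "02d1e82e7be3d0dcf792fbd82a668e151731cc530a490fbc1466a467e13d2989")` should return `True` for an intact copy of the source distribution, assuming it was saved under that name in the current directory.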