Read CSV files and convert to other file formats easily
Welcome To Datagrunt
Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.
Why Datagrunt?
Born out of real-world frustration, Datagrunt eliminates the need for repetitive coding when handling CSV files. Whether you're a data analyst, data engineer, or data scientist, Datagrunt lets you focus on insights instead of tedious data wrangling.
Key Features
- Intelligent Delimiter Inference: Datagrunt automatically detects and applies the correct delimiter for your CSV files.
- Seamless Data Processing: Leverage the robust capabilities of DuckDB and Polars to perform advanced data processing tasks directly on your CSV data.
- Flexible Transformation: Easily convert your processed CSV data into various formats to suit your needs.
- Pythonic API: Enjoy a clean and intuitive API that integrates seamlessly into your existing Python workflows.
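To give a feel for what delimiter inference involves, here is a minimal sketch built on the standard library's `csv.Sniffer`. It is not Datagrunt's own implementation, which this README does not describe; the helper name `infer_delimiter`, the candidate delimiter set, and the sample rows are all assumptions made for illustration.

```python
import csv


def infer_delimiter(sample_text: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample.

    Hypothetical helper for illustration only. It delegates to
    csv.Sniffer, which looks for a candidate character that occurs
    a consistent number of times on every line of the sample.
    """
    dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    return dialect.delimiter


# Small made-up samples; in practice you would pass the first few KB of a file.
pipe_sample = "city|county|vin\nSeattle|King|AAA0000001\nKirkland|King|AAA0000002\n"
comma_sample = "city,county,vin\nSeattle,King,AAA0000001\nKirkland,King,AAA0000002\n"

print(repr(infer_delimiter(pipe_sample)))
print(repr(infer_delimiter(comma_sample)))
```

`csv.Sniffer` raises `csv.Error` when it cannot settle on a delimiter, so a caller would typically wrap the call and fall back to a comma.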
Installation
Get started with Datagrunt in seconds using pip:
pip install datagrunt
Getting Started
from datagrunt import CSVReader
# Path to your CSV file
csv_file = 'electric_vehicle_population_data.csv'
# Use DuckDB as the processing engine; the default engine is 'polars'
engine = 'duckdb'
dg = CSVReader(csv_file, engine=engine)
# Return a sample of the data to get a peek at the schema
dg.get_sample()
┌────────────┬───────────┬──────────────┬───┬──────────────────────┬──────────────────────┬───────────────────┐
│ VIN (1-10) │ County │ City │ … │ Vehicle Location │ Electric Utility │ 2020 Census Tract │
│ varchar │ varchar │ varchar │ │ varchar │ varchar │ varchar │
├────────────┼───────────┼──────────────┼───┼──────────────────────┼──────────────────────┼───────────────────┤
│ 5YJSA1E28K │ Snohomish │ Mukilteo │ … │ POINT (-122.29943 … │ PUGET SOUND ENERGY… │ 53061042001 │
│ 1C4JJXP68P │ Yakima │ Yakima │ … │ POINT (-120.468875… │ PACIFICORP │ 53077001601 │
│ WBY8P6C05L │ Kitsap │ Kingston │ … │ POINT (-122.517835… │ PUGET SOUND ENERGY… │ 53035090102 │
│ JTDKARFP1J │ Kitsap │ Port Orchard │ … │ POINT (-122.653005… │ PUGET SOUND ENERGY… │ 53035092802 │
│ 5UXTA6C09N │ Snohomish │ Everett │ … │ POINT (-122.203234… │ PUGET SOUND ENERGY… │ 53061041605 │
│ 5YJYGDEF8L │ King │ Seattle │ … │ POINT (-122.378886… │ CITY OF SEATTLE - … │ 53033004703 │
│ JTMAB3FV7P │ Thurston │ Rainier │ … │ POINT (-122.677141… │ PUGET SOUND ENERGY… │ 53067012530 │
│ JN1AZ0CPXC │ King │ Kirkland │ … │ POINT (-122.192596… │ PUGET SOUND ENERGY… │ 53033022402 │
│ JN1AZ0CP7B │ King │ Kirkland │ … │ POINT (-122.192596… │ PUGET SOUND ENERGY… │ 53033022603 │
│ 1N4AZ0CP0F │ Thurston │ Olympia │ … │ POINT (-122.86491 … │ PUGET SOUND ENERGY… │ 53067010300 │
│ · │ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │ · │
│ 5YJYGDEE7M │ Clark │ Vancouver │ … │ POINT (-122.515805… │ BONNEVILLE POWER A… │ 53011041310 │
│ 7SAYGAEE0P │ Snohomish │ Monroe │ … │ POINT (-121.968385… │ PUGET SOUND ENERGY… │ 53061052203 │
│ 2C4RC1N75P │ King │ Burien │ … │ POINT (-122.347227… │ CITY OF SEATTLE - … │ 53033027600 │
│ 1FTVW1EVXP │ King │ Kirkland │ … │ POINT (-122.202653… │ PUGET SOUND ENERGY… │ 53033022300 │
│ 4JGGM1CB2P │ King │ Seattle │ … │ POINT (-122.2453 4… │ CITY OF SEATTLE - … │ 53033011700 │
│ 1N4BZ0CP0G │ King │ Seattle │ … │ POINT (-122.334079… │ CITY OF SEATTLE - … │ 53033008300 │
│ 7SAYGDEF2N │ King │ Bellevue │ … │ POINT (-122.144149… │ PUGET SOUND ENERGY… │ 53033024704 │
│ 1N4BZ1DP7L │ King │ Bellevue │ … │ POINT (-122.144149… │ PUGET SOUND ENERGY… │ 53033024902 │
├────────────┴───────────┴──────────────┴───┴──────────────────────┴──────────────────────┴───────────────────┤
│ ? rows (>9999 rows, 20 shown) 17 columns (6 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
DuckDB Integration for Performant SQL Queries
from datagrunt import CSVReader
csv_file = 'electric_vehicle_population_data.csv'
engine = 'duckdb'
dg = CSVReader(csv_file, engine=engine)
# Construct your SQL query
query = f"""
WITH core AS (
SELECT
City AS city,
"VIN (1-10)" AS vin
FROM {dg.db_table}
)
SELECT
city,
COUNT(vin) AS vehicle_count
FROM core
GROUP BY 1
ORDER BY 2 DESC
"""
# Execute the query and get results as a Polars DataFrame
df = dg.query_data(query).pl()
print(df)
┌────────────────┬───────────────┐
│ city ┆ vehicle_count │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════════╪═══════════════╡
│ Seattle ┆ 32602 │
│ Bellevue ┆ 9960 │
│ Redmond ┆ 7165 │
│ Vancouver ┆ 7081 │
│ Bothell ┆ 6602 │
│ … ┆ … │
│ Glenwood ┆ 1 │
│ Walla Walla Co ┆ 1 │
│ Pittsburg ┆ 1 │
│ Decatur ┆ 1 │
│ Redwood City ┆ 1 │
└────────────────┴───────────────┘
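To make the aggregation concrete, the sketch below reproduces what the query computes (a per-city count of non-null VINs, sorted from most to fewest) in plain Python on a handful of rows. The rows are made up for illustration and are not taken from the dataset, and the helper name `count_by_city` is hypothetical. It swaps in `collections.Counter` for DuckDB, so it is only a sanity-check aid, not how Datagrunt runs queries.

```python
from collections import Counter

# Made-up (city, vin) pairs for illustration; not rows from the real dataset.
rows = [
    ("Seattle", "AAA0000001"),
    ("Seattle", "AAA0000002"),
    ("Bellevue", "AAA0000003"),
    ("Seattle", "AAA0000004"),
    ("Kirkland", "AAA0000005"),
    ("Bellevue", "AAA0000006"),
    ("Kirkland", None),  # COUNT(vin) skips NULLs, so this row is not counted
]


def count_by_city(records):
    """Return (city, vehicle_count) pairs ordered by count, largest first.

    Mirrors the SQL: COUNT(vin) ... GROUP BY city ORDER BY count DESC.
    """
    counts = Counter(city for city, vin in records if vin is not None)
    return counts.most_common()


for city, vehicle_count in count_by_city(rows):
    print(city, vehicle_count)
```

Running the same check against a small slice of your own data is a quick way to confirm that a query's grouping and NULL handling behave the way you expect.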
License
This project is licensed under the MIT License.
Acknowledgements
A HUGE thank you to the open source community and the creators of DuckDB and Polars for their fantastic libraries that power Datagrunt.
File details
Details for the file datagrunt-0.0.2.tar.gz.
File metadata
- Download URL: datagrunt-0.0.2.tar.gz
- Upload date:
- Size: 18.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ed26e11f91feb61bea775b89db0d5bc1d290e0265a08df91b0b0e48e4613322
MD5 | f18653092e00ece091295e4b4fddb2e0
BLAKE2b-256 | 81492f85f495a1c57ec7a17620394ed3bc9d11cded41f9ac9c86cb6f9005748d
File details
Details for the file datagrunt-0.0.2-py3-none-any.whl.
File metadata
- Download URL: datagrunt-0.0.2-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 97a57cc2d1867304cc5dc92c020abd3cd72edde01ed78a68e086eeca4753f73e
MD5 | cdc9ac3dbd45b828a05d19205d6a28f6
BLAKE2b-256 | 45744b46e8feee8e07c9d4c1bbf45081640e356d4edc09cb5d16b240795ba713