Read CSV files and convert to other file formats easily

Welcome to Datagrunt

Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.

Why Datagrunt?

Born out of real-world frustration, Datagrunt eliminates the need for repetitive coding when handling CSV files. Whether you're a data analyst, data engineer, or data scientist, Datagrunt empowers you to focus on insights, not tedious data wrangling.

Key Features

  • Intelligent Delimiter Inference: Datagrunt automatically detects and applies the correct delimiter for your CSV files.
  • Seamless Data Processing: Leverage the robust capabilities of DuckDB and Polars to perform advanced data processing tasks directly on your CSV data.
  • Flexible Transformation: Easily convert your processed CSV data into various formats to suit your needs.
  • Pythonic API: Enjoy a clean and intuitive API that integrates seamlessly into your existing Python workflows.
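
To give a feel for what delimiter inference involves, here is a minimal sketch that uses csv.Sniffer from the Python standard library. This is an assumption-laden illustration only: it is not Datagrunt's own implementation, and the helper name infer_delimiter and its candidate list are hypothetical. The sample rows reuse values from the electric vehicle dataset shown in Getting Started.

```python
import csv


def infer_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample (hypothetical helper).

    Uses the standard library's csv.Sniffer, restricted to a small set of
    candidate delimiters. Datagrunt may use a different approach internally.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# A small semicolon-separated sample; the values come from the dataset above.
sample_text = (
    "VIN;County;City\n"
    "5YJSA1E28K;Snohomish;Mukilteo\n"
    "1C4JJXP68P;Yakima;Yakima\n"
)

delimiter = infer_delimiter(sample_text)

# Parse the sample with the inferred delimiter.
rows = list(csv.reader(sample_text.splitlines(), delimiter=delimiter))
print(delimiter, rows[1])
```

With a library such as Datagrunt, this step happens for you when the file is read, so you normally never pass a delimiter yourself.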

Installation

Get started with Datagrunt in seconds using pip:

pip install datagrunt

Getting Started

from datagrunt import CSVReader

# Path to your CSV file
csv_file = 'electric_vehicle_population_data.csv'

# Use DuckDB as the processing engine (the default engine is 'polars')
engine = 'duckdb'

dg = CSVReader(csv_file, engine=engine)

# Return a sample of the data to get a peek at the schema
dg.get_sample()
┌────────────┬───────────┬──────────────┬───┬──────────────────────┬──────────────────────┬───────────────────┐
│ VIN (1-10) │  County   │     City     │ … │   Vehicle Location   │   Electric Utility   │ 2020 Census Tract │
│  varchar   │  varchar  │   varchar    │   │       varchar        │       varchar        │      varchar      │
├────────────┼───────────┼──────────────┼───┼──────────────────────┼──────────────────────┼───────────────────┤
│ 5YJSA1E28K │ Snohomish │ Mukilteo     │ … │ POINT (-122.29943 …  │ PUGET SOUND ENERGY…  │ 53061042001       │
│ 1C4JJXP68P │ Yakima    │ Yakima       │ … │ POINT (-120.468875…  │ PACIFICORP           │ 53077001601       │
│ WBY8P6C05L │ Kitsap    │ Kingston     │ … │ POINT (-122.517835…  │ PUGET SOUND ENERGY…  │ 53035090102       │
│ JTDKARFP1J │ Kitsap    │ Port Orchard │ … │ POINT (-122.653005…  │ PUGET SOUND ENERGY…  │ 53035092802       │
│ 5UXTA6C09N │ Snohomish │ Everett      │ … │ POINT (-122.203234…  │ PUGET SOUND ENERGY…  │ 53061041605       │
│ 5YJYGDEF8L │ King      │ Seattle      │ … │ POINT (-122.378886…  │ CITY OF SEATTLE - …  │ 53033004703       │
│ JTMAB3FV7P │ Thurston  │ Rainier      │ … │ POINT (-122.677141…  │ PUGET SOUND ENERGY…  │ 53067012530       │
│ JN1AZ0CPXC │ King      │ Kirkland     │ … │ POINT (-122.192596…  │ PUGET SOUND ENERGY…  │ 53033022402       │
│ JN1AZ0CP7B │ King      │ Kirkland     │ … │ POINT (-122.192596…  │ PUGET SOUND ENERGY…  │ 53033022603       │
│ 1N4AZ0CP0F │ Thurston  │ Olympia      │ … │ POINT (-122.86491 …  │ PUGET SOUND ENERGY…  │ 53067010300       │
│     ·      │   ·       │    ·         │ · │          ·           │          ·           │      ·            │
│     ·      │   ·       │    ·         │ · │          ·           │          ·           │      ·            │
│     ·      │   ·       │    ·         │ · │          ·           │          ·           │      ·            │
│ 5YJYGDEE7M │ Clark     │ Vancouver    │ … │ POINT (-122.515805…  │ BONNEVILLE POWER A…  │ 53011041310       │
│ 7SAYGAEE0P │ Snohomish │ Monroe       │ … │ POINT (-121.968385…  │ PUGET SOUND ENERGY…  │ 53061052203       │
│ 2C4RC1N75P │ King      │ Burien       │ … │ POINT (-122.347227…  │ CITY OF SEATTLE - …  │ 53033027600       │
│ 1FTVW1EVXP │ King      │ Kirkland     │ … │ POINT (-122.202653…  │ PUGET SOUND ENERGY…  │ 53033022300       │
│ 4JGGM1CB2P │ King      │ Seattle      │ … │ POINT (-122.2453 4…  │ CITY OF SEATTLE - …  │ 53033011700       │
│ 1N4BZ0CP0G │ King      │ Seattle      │ … │ POINT (-122.334079…  │ CITY OF SEATTLE - …  │ 53033008300       │
│ 7SAYGDEF2N │ King      │ Bellevue     │ … │ POINT (-122.144149…  │ PUGET SOUND ENERGY…  │ 53033024704       │
│ 1N4BZ1DP7L │ King      │ Bellevue     │ … │ POINT (-122.144149…  │ PUGET SOUND ENERGY…  │ 53033024902       │
...
├────────────┴───────────┴──────────────┴───┴──────────────────────┴──────────────────────┴───────────────────┤
│ ? rows (>9999 rows, 20 shown)                                                          17 columns (6 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

DuckDB Integration for Performant SQL Queries

from datagrunt import CSVReader

csv_file = 'electric_vehicle_population_data.csv'
engine = 'duckdb'

dg = CSVReader(csv_file, engine=engine)

# Construct your SQL query
query = f"""
WITH core AS (
    SELECT
        City AS city,
        "VIN (1-10)" AS vin
    FROM {dg.db_table}
)
SELECT
    city,
    COUNT(vin) AS vehicle_count
FROM core
GROUP BY 1
ORDER BY 2 DESC
"""

# Execute the query and get results as a Polars DataFrame
df = dg.query_data(query).pl()
print(df)
┌────────────────┬───────────────┐
│ city           ┆ vehicle_count │
│ ---            ┆ ---           │
│ str            ┆ i64           │
╞════════════════╪═══════════════╡
│ Seattle        ┆ 32602         │
│ Bellevue       ┆ 9960          │
│ Redmond        ┆ 7165          │
│ Vancouver      ┆ 7081          │
│ Bothell        ┆ 6602          │
│ …              ┆ …             │
│ Glenwood       ┆ 1             │
│ Walla Walla Co ┆ 1             │
│ Pittsburg      ┆ 1             │
│ Decatur        ┆ 1             │
│ Redwood City   ┆ 1             │
└────────────────┴───────────────┘
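
The query above follows a common shape: a CTE that renames columns, followed by a grouped count sorted in descending order. As a rough, self-contained illustration of that shape, the sketch below runs the same kind of query with Python's built-in sqlite3 module standing in for DuckDB, so it does not use Datagrunt at all. The table name ev and the helper count_by_city are hypothetical, and the six rows are copied from the get_sample() output in Getting Started.

```python
import sqlite3

# Six (VIN, City) pairs taken from the sample output shown earlier.
ROWS = [
    ("5YJYGDEF8L", "Seattle"),
    ("4JGGM1CB2P", "Seattle"),
    ("1N4BZ0CP0G", "Seattle"),
    ("7SAYGDEF2N", "Bellevue"),
    ("1N4BZ1DP7L", "Bellevue"),
    ("JN1AZ0CPXC", "Kirkland"),
]


def count_by_city(rows):
    """Count vehicles per city with a CTE plus GROUP BY (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    try:
        # Column names containing spaces or parentheses must be double-quoted.
        con.execute('CREATE TABLE ev ("VIN (1-10)" TEXT, City TEXT)')
        con.executemany("INSERT INTO ev VALUES (?, ?)", rows)
        query = """
        WITH core AS (
            SELECT
                City AS city,
                "VIN (1-10)" AS vin
            FROM ev
        )
        SELECT
            city,
            COUNT(vin) AS vehicle_count
        FROM core
        GROUP BY 1
        ORDER BY 2 DESC
        """
        return con.execute(query).fetchall()
    finally:
        con.close()


counts = count_by_city(ROWS)
print(counts)
```

In the Datagrunt version, the FROM clause uses dg.db_table instead of a hard-coded table name, and the result comes back as a Polars DataFrame through .pl().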

License

This project is licensed under the MIT License.

Acknowledgements

A HUGE thank you to the open source community and the creators of DuckDB and Polars for their fantastic libraries that power Datagrunt.

Download files

Source Distribution

datagrunt-0.0.2.tar.gz (18.3 kB)

Uploaded Source

Built Distribution

datagrunt-0.0.2-py3-none-any.whl (13.4 kB)

Uploaded Python 3

File details

Details for the file datagrunt-0.0.2.tar.gz.

File metadata

  • Download URL: datagrunt-0.0.2.tar.gz
  • Upload date:
  • Size: 18.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for datagrunt-0.0.2.tar.gz
Algorithm Hash digest
SHA256 7ed26e11f91feb61bea775b89db0d5bc1d290e0265a08df91b0b0e48e4613322
MD5 f18653092e00ece091295e4b4fddb2e0
BLAKE2b-256 81492f85f495a1c57ec7a17620394ed3bc9d11cded41f9ac9c86cb6f9005748d

File details

Details for the file datagrunt-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: datagrunt-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for datagrunt-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 97a57cc2d1867304cc5dc92c020abd3cd72edde01ed78a68e086eeca4753f73e
MD5 cdc9ac3dbd45b828a05d19205d6a28f6
BLAKE2b-256 45744b46e8feee8e07c9d4c1bbf45081640e356d4edc09cb5d16b240795ba713
