Tiny wrapper around pandas with built‑in logging & docs.

Project description

DataGuy

DataGuy is a Python package that simplifies data science workflows with the help of Large Language Models (LLMs). It provides tools for automated data wrangling, LLM-assisted analysis, and AI-assisted visualization, and is aimed at small-to-medium datasets.

Features

  • Automated Data Wrangling: Clean and preprocess your data with minimal effort using LLM-generated code.
  • AI-Powered Data Visualization: Generate insightful plots and visualizations based on natural language descriptions.
  • Intelligent Data Analysis: Perform descriptive and inferential analysis with the help of LLMs.
  • Customizable Workflows: Integrate with pandas, matplotlib, and other Python libraries for seamless data manipulation.
  • Safe Code Execution: Built-in safeguards to ensure only safe and trusted code is executed.
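The README does not describe how the Safe Code Execution safeguards work. One common technique for screening generated code is to parse it with Python's `ast` module and reject disallowed imports and calls before running it. The sketch below illustrates that general idea only; `is_code_safe` and both deny-lists are hypothetical and are not taken from DataGuy.

```python
import ast

# Hypothetical deny-lists for illustration; these are NOT DataGuy's actual rules.
BLOCKED_MODULES = {"os", "sys", "subprocess", "shutil", "socket"}
BLOCKED_CALLS = {"eval", "exec", "open", "__import__", "compile"}


def is_code_safe(source: str) -> bool:
    """Return True if `source` parses and uses none of the blocked names."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False  # unparseable code is rejected outright

    for node in ast.walk(tree):
        # Reject `import os` and `import os.path`
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in BLOCKED_MODULES for alias in node.names):
                return False
        # Reject `from os import remove`
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                return False
        # Reject direct calls such as eval(...) or open(...)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                return False
    return True


print(is_code_safe("df = df.dropna()"))
print(is_code_safe("import os; os.remove('x')"))
```

A deny-list like this is easy to bypass (for example via `getattr` or attribute access on builtins), so it works at best as a first filter, not as a sandbox.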

Installation

Install the package using pip:

pip install dataguy
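To confirm that the package installed correctly, the standard library's `importlib.metadata` (available from Python 3.8, the minimum version listed under Requirements) can report the installed version without importing DataGuy itself. `installed_version` is a hypothetical helper written for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "dataguy"):
    """Return the installed version string of `dist_name`, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


v = installed_version()
print(f"dataguy {v}" if v else "dataguy is not installed")
```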

Usage

Getting Started

  1. Set your Anthropic API key in the environment:

    import os
    os.environ["ANTHROPIC_API_KEY"] = "your_api_key_here"
    

    Replace your_api_key_here with your actual API key from Anthropic.

  2. Import the Package:

    from dataguy import DataGuy
    
  3. Initialize a DataGuy Instance:

    dg = DataGuy()
    
  4. Load Your Data:

    import pandas as pd
    data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})
    dg.set_data(data)
    
  5. Summarize Your Data:

    summary = dg.summarize_data()
    print(summary)
    
  6. Wrangle Your Data:

    cleaned_data = dg.wrangle_data()
    
  7. Visualize Your Data:

    dg.plot_data("age", "score")
    
  8. Analyze Your Data:

    results = dg.analyze_data()
    print(results)
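The README does not show the output format of summarize_data(). As a rough, offline illustration of the fields listed under Key Methods (shape, columns, missing values, and means), the same information can be gathered with plain pandas on the data from step 4. `basic_summary` is a hypothetical helper, not part of DataGuy, and its dictionary layout is an assumption.

```python
import pandas as pd


def basic_summary(df: pd.DataFrame) -> dict:
    """Collect the fields the README lists for summarize_data(), using plain pandas."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "missing_values": df.isna().sum().to_dict(),
        # Means are computed over numeric columns only; NaN values are skipped
        "means": df.mean(numeric_only=True).to_dict(),
    }


data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})
print(basic_summary(data))
```

On this sample the summary reports one missing `age` value, and the `age` mean is taken over the two non-missing entries.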
    

Example Workflow

from dataguy import DataGuy
import pandas as pd

# Initialize DataGuy
dg = DataGuy()

# Load data
data = pd.read_csv("path/to/data.csv")
dg.set_data(data)

# Summarize data
summary = dg.summarize_data()
print("Data Summary:", summary)

# Wrangle data
cleaned_data = dg.wrangle_data()

# Visualize data
dg.plot_data("column_x", "column_y")

# Analyze data
analysis_results = dg.analyze_data()
print("Analysis Results:", analysis_results)
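plot_data(column_x, column_y) is described as a matplotlib scatter plot of two columns. If you want a comparable chart without an API call, or want to customize it, a minimal matplotlib equivalent might look like the sketch below. `scatter_columns` is a hypothetical helper, and the axis labels, title, and output file name are choices made here, not DataGuy's documented behavior.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def scatter_columns(df: pd.DataFrame, column_x: str, column_y: str):
    """Draw a scatter plot of two DataFrame columns and return the Axes."""
    fig, ax = plt.subplots()
    ax.scatter(df[column_x], df[column_y])  # rows with NaN are simply not drawn
    ax.set_xlabel(column_x)
    ax.set_ylabel(column_y)
    ax.set_title(f"{column_y} vs. {column_x}")
    return ax


data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})
ax = scatter_columns(data, "age", "score")
ax.figure.savefig("age_vs_score.png")
plt.close(ax.figure)  # free the figure once it has been saved
```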

Key Methods

  • set_data(obj): Load data into the DataGuy instance. Supports pandas DataFrames, dictionaries, lists, numpy arrays, and CSV files.
  • summarize_data(): Generate a summary of the dataset, including shape, columns, missing values, and means.
  • wrangle_data(): Automatically clean and preprocess the dataset for analysis.
  • plot_data(column_x, column_y): Create a scatter plot of two columns using matplotlib.
  • analyze_data(): Perform an automated analysis of the dataset, returning descriptive statistics and insights.
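The README does not say how set_data() handles each input type. The sketch below shows one plausible way to coerce the listed types into a DataFrame with plain pandas. `to_dataframe` is hypothetical, and treating a string as a CSV path is an assumption, since the README only says that CSV files are supported.

```python
from typing import Any

import numpy as np
import pandas as pd


def to_dataframe(obj: Any) -> pd.DataFrame:
    """Coerce the input types listed for set_data() into a pandas DataFrame."""
    if isinstance(obj, pd.DataFrame):
        return obj.copy()  # avoid mutating the caller's frame
    if isinstance(obj, str):
        # Assumption: a string is interpreted as a path to a CSV file
        return pd.read_csv(obj)
    if isinstance(obj, (np.ndarray, dict, list)):
        # pandas accepts arrays, dicts of columns, and lists of rows/records
        return pd.DataFrame(obj)
    raise TypeError(f"Unsupported data type: {type(obj).__name__}")


print(to_dataframe({"age": [25, 30], "score": [88, 92]}).shape)
print(to_dataframe([{"age": 25}, {"age": 30}]).columns.tolist())
print(to_dataframe(np.array([[1, 2], [3, 4]])).shape)
```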

Requirements

  • Python 3.8 or higher
  • Dependencies:
    • pandas
    • numpy
    • matplotlib
    • scikit-learn
    • claudette
    • anthropic
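To check an environment against this list without importing anything, the standard library's `importlib.util.find_spec` can look up each module. The mapping from the distribution names above to import names is written by hand for this sketch (note that scikit-learn is imported as `sklearn`); `missing_dependencies` is a hypothetical helper, not part of DataGuy.

```python
import importlib.util

# Distribution names from the Requirements list mapped to their import names.
REQUIRED_IMPORTS = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "claudette": "claudette",
    "anthropic": "anthropic",
}


def missing_dependencies(requirements: dict = REQUIRED_IMPORTS) -> list:
    """Return the distribution names whose top-level module cannot be found."""
    return [
        dist
        for dist, module in requirements.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_dependencies()
print("Missing:", ", ".join(missing) if missing else "none")
```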

Contributing

Contributions are welcome! Please submit issues or pull requests via the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Authors

  • István Magyary
  • Sára Viemann
  • Kristóf Bálint

For inquiries, contact: magistak@gmail.com

Download files

Download the file for your platform.

Source Distribution

dataguy-0.1.6.tar.gz (350.1 kB)

Uploaded Source

Built Distribution

dataguy-0.1.6-py3-none-any.whl (14.3 kB)

Uploaded Python 3

File details

Details for the file dataguy-0.1.6.tar.gz.

File metadata

  • Download URL: dataguy-0.1.6.tar.gz
  • Upload date:
  • Size: 350.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dataguy-0.1.6.tar.gz
Algorithm     Hash digest
SHA256        c2104226937bcdbc858c7a4b20e72e4dccd811ba3c0a721f953ebb2abe4da76e
MD5           615cd1566bdc202df95b24d9dca2d7d3
BLAKE2b-256   41a6fb6e836a1ffa114f1612e54bf5fa4b9bbf301bc1e13457bc68607294117c
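To check a downloaded archive against the published SHA256 digest for dataguy-0.1.6.tar.gz, the standard library's `hashlib` is enough. `sha256_of` and `verify` are hypothetical helpers written for this example; the expected digest is copied from the SHA256 row of the hashes for the source distribution.

```python
import hashlib

# Published SHA256 digest for dataguy-0.1.6.tar.gz
EXPECTED_SHA256 = "c2104226937bcdbc858c7a4b20e72e4dccd811ba3c0a721f953ebb2abe4da76e"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("dataguy-0.1.6.tar.gz")` in the download directory should return True for an intact file. pip can also enforce digests itself: a requirements line such as `dataguy==0.1.6 --hash=sha256:<digest>` installed with `pip install --require-hashes -r requirements.txt` rejects any file whose hash does not match.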

File details

Details for the file dataguy-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: dataguy-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 14.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dataguy-0.1.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        0a995c3d32b117bf12fe54765808646195d1fe339e2c836ee1717c6f65b7b04a
MD5           a17283856d964e4915b4125ae394272f
BLAKE2b-256   4ce0a8a69b8248e9b4eec897f1525e77b58c6cd3dd2446e90086f1a7ff95443b