Tiny wrapper around pandas with built‑in logging & docs.
DataGuy
DataGuy is a Python package designed to simplify data science workflows by leveraging the power of Large Language Models (LLMs). It provides tools for automated data wrangling, intelligent analysis, and AI-assisted visualization, making it ideal for small-to-medium datasets.
- GitHub: View the source code on GitHub
- PyPI: Install from PyPI
- Documentation: Read the full documentation
- Demo: Try the demo
Features
- Automated Data Wrangling: Clean and preprocess your data with minimal effort using LLM-generated code.
- AI-Powered Data Visualization: Generate insightful plots and visualizations based on natural language descriptions.
- Intelligent Data Analysis: Perform descriptive and inferential analysis with the help of LLMs.
- Customizable Workflows: Integrate with pandas, matplotlib, and other Python libraries for seamless data manipulation.
- Safe Code Execution: Built-in safeguards to ensure only safe and trusted code is executed.
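The README does not describe how the safeguards work internally. As an illustration only, one common way to screen LLM-generated code before running it is a static pass over its syntax tree. The sketch below assumes a hypothetical policy (the names ALLOWED_IMPORTS, BLOCKED_CALLS, and find_violations are invented for this example and are not part of DataGuy's API), and it should not be read as the package's actual implementation.

```python
import ast
from typing import List

# Hypothetical policy for this sketch; not DataGuy's actual rules.
ALLOWED_IMPORTS = {"pandas", "numpy", "matplotlib", "math"}
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def find_violations(source: str) -> List[str]:
    """Return human-readable policy violations found in the given source."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b" is judged by its top-level package "a".
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import of '{alias.name}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have module=None and are rejected as well.
            if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"import from '{node.module}' is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by bare name are caught here.
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to '{node.func.id}' is not allowed")
    return violations
```

A static check like this is easy to bypass (for example through attribute access or aliasing), so it is best treated as a first filter rather than a sandbox.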
Installation
Install the package using pip:
pip install dataguy
Usage
Getting Started
- Load your Anthropic API key into your environment:
    import os
    os.environ["ANTHROPIC_API_KEY"] = "your_api_key_here"
  Replace your_api_key_here with your actual API key from Anthropic.
- Import the Package:
    from dataguy import DataGuy
- Initialize a DataGuy Instance:
    dg = DataGuy()
- Load Your Data:
    import pandas as pd
    data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})
    dg.set_data(data)
- Summarize Your Data:
    summary = dg.summarize_data()
    print(summary)
- Wrangle Your Data:
    cleaned_data = dg.wrangle_data()
- Visualize Your Data:
    dg.plot_data("age", "score")
- Analyze Your Data:
    results = dg.analyze_data()
    print(results)
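Hardcoding a key in a script makes it easy to commit by accident. A possible alternative, sketched here under the assumption that the key has already been exported in the shell, is to check the environment before creating a DataGuy instance. The helper name require_api_key is hypothetical and not part of the package.

```python
import os


def require_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key stored in var_name, or raise if it is missing."""
    key = os.environ.get(var_name, "").strip()
    # Treat the README placeholder as "not set" so it is never sent to the API.
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            f"{var_name} is not set; export it before creating DataGuy()."
        )
    return key
```

Calling require_api_key() at the top of a script fails fast with a clear message instead of surfacing later as an authentication error.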
Example Workflow
from dataguy import DataGuy
import pandas as pd
# Initialize DataGuy
dg = DataGuy()
# Load data
data = pd.read_csv("path/to/data.csv")
dg.set_data(data)
# Summarize data
summary = dg.summarize_data()
print("Data Summary:", summary)
# Wrangle data
cleaned_data = dg.wrangle_data()
# Visualize data
dg.plot_data("column_x", "column_y")
# Analyze data
analysis_results = dg.analyze_data()
print("Analysis Results:", analysis_results)
Key Methods
- set_data(obj): Load data into the DataGuy instance. Supports pandas DataFrames, dictionaries, lists, numpy arrays, and CSV files.
- summarize_data(): Generate a summary of the dataset, including shape, columns, missing values, and means.
- wrangle_data(): Automatically clean and preprocess the dataset for analysis.
- plot_data(column_x, column_y): Create a scatter plot of two columns using matplotlib.
- analyze_data(): Perform an automated analysis of the dataset, returning descriptive statistics and insights.
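To make the summarize_data() description concrete, the following plain-pandas sketch computes the same four pieces of information (shape, columns, missing values, and means). It is an approximation for illustration: the function name sketch_summary and the dictionary keys are assumptions, and the actual return format of summarize_data() may differ.

```python
import pandas as pd


def sketch_summary(df: pd.DataFrame) -> dict:
    """Collect shape, column names, missing-value counts, and numeric means."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        # Number of missing entries per column.
        "missing_counts": df.isna().sum().to_dict(),
        # Means are computed for numeric columns only; NaN values are skipped.
        "means": df.mean(numeric_only=True).to_dict(),
    }


# Same toy data as in the Getting Started steps.
data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})
summary = sketch_summary(data)
print(summary)
```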
Requirements
- Python 3.8 or higher
- Dependencies:
  - pandas
  - numpy
  - matplotlib
  - scikit-learn
  - claudette
  - anthropic
Contributing
Contributions are welcome! Please submit issues or pull requests via the GitHub repository.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Authors
- István Magyary
- Sára Viemann
- Kristóf Bálint
For inquiries, contact: magistak@gmail.com
File details
Details for the file dataguy-0.1.6.tar.gz.
File metadata
- Download URL: dataguy-0.1.6.tar.gz
- Upload date:
- Size: 350.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2104226937bcdbc858c7a4b20e72e4dccd811ba3c0a721f953ebb2abe4da76e |
| MD5 | 615cd1566bdc202df95b24d9dca2d7d3 |
| BLAKE2b-256 | 41a6fb6e836a1ffa114f1612e54bf5fa4b9bbf301bc1e13457bc68607294117c |
File details
Details for the file dataguy-0.1.6-py3-none-any.whl.
File metadata
- Download URL: dataguy-0.1.6-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a995c3d32b117bf12fe54765808646195d1fe339e2c836ee1717c6f65b7b04a |
| MD5 | a17283856d964e4915b4125ae394272f |
| BLAKE2b-256 | 4ce0a8a69b8248e9b4eec897f1525e77b58c6cd3dd2446e90086f1a7ff95443b |