The DataAnalysisToolkit project is a Python-based data analysis tool designed to streamline various data analysis tasks. It allows users to load data from CSV files and perform operations such as statistical calculations, outlier detection, data cleaning, and visualization.

Data Analysis Toolkit

DataAnalysisToolkit is a Python package that bundles tools for common data analysis tasks: loading CSV data, performing statistical analysis, cleaning data, and visualizing results. It is aimed at data analysts, data scientists, and anyone getting started with data exploration and machine learning.

Features

  • Data Loading: Load data directly from CSV files into a Python environment.
  • Statistical Analysis: Perform calculations like mean, median, mode, and trimmed mean.
  • Outlier Detection: Identify outliers using the z-score method.
  • Data Cleaning: Handle missing values, drop duplicates, and encode categorical data.
  • Data Splitting: Easily split data into training and testing sets for machine learning models.
  • Data Visualization: Create histograms and other plots to explore data visually.
  • Data Export: Export cleaned and processed data back into CSV format.
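Most of the calculations listed above have standard definitions. As a minimal, standard-library sketch (not the package's own implementation), the helpers below compute a trimmed mean and flag z-score outliers. The names `trimmed_mean` and `zscore_outliers`, the trimming rule (cut `int(n * proportion)` values from each end), and the default threshold of 3 are assumptions for illustration.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after cutting int(n * proportion) values from each end of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values dropped from each tail
    return statistics.mean(data[k:len(data) - k])


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 12, 11, 13, 12, 100]
print(statistics.mean(data), statistics.median(data), statistics.mode(data))
print(trimmed_mean(data, 0.1))
print(zscore_outliers(data))  # [100]
```

With the population standard deviation, no value in a sample of n points can have |z| larger than √(n − 1), so a fixed threshold of 3 cannot flag anything in samples of 10 or fewer values.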

Enhanced Functionalities

  • Advanced Visualization: Use a dedicated visualizer to create a variety of data plots.
  • Feature Engineering: Enhance your data with new, informative features.
  • Model Evaluation: Assess the performance of machine learning models.
  • Report Generation: Automatically generate HTML reports with summaries and visualizations.
  • Data Imputation: Apply advanced imputation techniques to handle missing data.
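This page does not say which metrics the model evaluation feature reports. For binary classification, the usual starting point is the count of true positives, false positives, and false negatives; the sketch below, which assumes 0/1 labels and uses a hypothetical `binary_metrics` helper rather than the package's API, shows how accuracy, precision, recall, and F1 follow from those counts.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty inputs or no positive predictions give 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```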

The toolkit is suited to preliminary (exploratory) data analysis and can be integrated into larger data processing workflows.

Getting Started

Here's how you can get started with DataAnalysisToolkit:

from data_analysis_toolkit import DataAnalysisToolkit

# Initialize the analyzer with the path to a CSV file
analyzer = DataAnalysisToolkit('../data/test.csv')

# Calculate the mean, median, mode, and trimmed mean of a column
statistics = analyzer.calculate_budget_statistics('column_name')
print(statistics)

# Detect outliers in a column using the z-score method
outliers = analyzer.detect_outliers('column_name')
print(outliers)

# Handle missing values in a column
analyzer.handle_missing_values('column_name', strategy='fill', fill_value=0)

# Drop duplicate rows in the DataFrame
analyzer.drop_duplicates()

# Encode categorical features in the DataFrame
analyzer.encode_categorical_features()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = analyzer.split_data('target_column')

# Plot a histogram of a column
analyzer.plot_data('column_name')

# Export the data to a CSV file
analyzer.export_data('new_file.csv')
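The internals of the cleaning and splitting methods are not documented on this page. As a rough, standard-library illustration of what those steps typically involve, the sketch below works on a list of dicts instead of a DataFrame. The helper names `fill_missing`, `drop_duplicate_rows`, `one_hot`, and `split_rows`, the default 20% test size, and the seed are hypothetical and not part of DataAnalysisToolkit.

```python
import random


def fill_missing(rows, column, fill_value):
    """Replace None in `column` with `fill_value` (similar in spirit to strategy='fill')."""
    return [{**r, column: fill_value if r.get(column) is None else r[column]} for r in rows]


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))  # hashable fingerprint; assumes hashable values
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = int(r[column] == c)
        encoded.append(new)
    return encoded


def split_rows(rows, target, test_size=0.2, seed=42):
    """Shuffle reproducibly, then return X_train, X_test, y_train, y_test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    test, train = shuffled[:n_test], shuffled[n_test:]

    def features(part):
        return [{k: v for k, v in r.items() if k != target} for r in part]

    return features(train), features(test), [r[target] for r in train], [r[target] for r in test]


rows = [
    {"city": "Paris", "price": 10, "sold": 1},
    {"city": "Oslo", "price": None, "sold": 0},
    {"city": "Paris", "price": 10, "sold": 1},  # exact duplicate of the first row
    {"city": "Rome", "price": 7, "sold": 0},
    {"city": "Oslo", "price": 12, "sold": 1},
]
rows = one_hot(drop_duplicate_rows(fill_missing(rows, "price", 0)), "city")
X_train, X_test, y_train, y_test = split_rows(rows, "sold", test_size=0.25)
print(len(X_train), len(X_test))
```

Filling missing values before de-duplicating matters: two rows that differ only in a missing value become identical once filled.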

Installation

Install DataAnalysisToolkit using pip:

pip install dataanalysistoolkit

Documentation

For detailed documentation, examples, and usage guides, please visit DataAnalysisToolkit Documentation.

Contributing

Contributions are welcome! For guidelines on how to contribute, please refer to our Contribution Guide.

License

DataAnalysisToolkit is open source and released under the MIT License. For more details, see the LICENSE file.


Developed with ❤ by the DataAnalysisToolkit Team.


Download files

Source Distribution

dataanalysistoolkit-1.2.2.tar.gz (61.8 kB)

Built Distribution

dataanalysistoolkit-1.2.2-py3-none-any.whl (67.8 kB, Python 3)

File details

Details for the file dataanalysistoolkit-1.2.2.tar.gz.

File metadata

  • Download URL: dataanalysistoolkit-1.2.2.tar.gz
  • Size: 61.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for dataanalysistoolkit-1.2.2.tar.gz
Algorithm Hash digest
SHA256 800964229bbc5c911aaf52e4f6ee61d84f27e75a89d2ead2a9c470b5992b8f5b
MD5 9f683c9078d1343979aae6abf453a037
BLAKE2b-256 886bd73bcf92b3afbfb76e29382fc3bec865e9a85750d920010a620ab08c0ca1

File details

Details for the file dataanalysistoolkit-1.2.2-py3-none-any.whl.

File hashes

Hashes for dataanalysistoolkit-1.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c58a38fd5f1a8f438c6392769c87b2a87291d78984054cebb838bdc3066d7212
MD5 9bb9f8f4b94c0234b5f9fceb291db384
BLAKE2b-256 f4eb81fcf52d2347049ede1328c2429164076a290a1466b56fef488491fbddd2
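To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. The sketch below uses Python's `hashlib`; the helper names `sha256_hexdigest` and `verify_download` are illustrative, and the commented-out call assumes the wheel was saved to the current directory. pip's `pip hash` command produces the same kind of digest.

```python
import hashlib
import io


def sha256_hexdigest(fileobj, chunk_size=65536):
    """Return the SHA256 hex digest of a binary file object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file at `path` matches the published SHA256 digest."""
    with open(path, "rb") as fh:
        return sha256_hexdigest(fh) == expected_sha256.lower()


# In-memory demo, no download needed
print(sha256_hexdigest(io.BytesIO(b"abc")))

# Hypothetical usage, assuming the wheel is in the current directory:
# verify_download("dataanalysistoolkit-1.2.2-py3-none-any.whl",
#                 "c58a38fd5f1a8f438c6392769c87b2a87291d78984054cebb838bdc3066d7212")
```

Reading in fixed-size chunks keeps memory use constant regardless of archive size.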
