A Python package for automatic EDA, data cleaning, and visualization.

pydatalens

pydatalens is a Python package that streamlines Exploratory Data Analysis (EDA), data cleaning, and visualization. It helps data scientists and analysts prepare, explore, and gain insights from datasets with minimal code.


Features

1. Smart Summarization

  • Automatically generates a summary of the dataset, including:
    • Data types
    • Missing values
    • Descriptive statistics
    • Unique value counts
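A minimal sketch of this kind of summary, written directly against pandas. It is not pydatalens's `eda.summarize` implementation: the helper name `summarize_sketch` and the sample data are hypothetical, and the real output format may differ.

```python
import pandas as pd


def summarize_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-column summary: dtype, missing count, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # nunique() ignores NaN by default
    })


# Made-up sample data with one missing value per column
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "city": ["Oslo", "Lima", "Oslo", None],
})

summary = summarize_sketch(df)
print(summary)

# Descriptive statistics (count, mean, std, quartiles) for numeric columns
print(df.describe())
```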

2. Data Cleaning

  • Detects missing values and fills them using a chosen strategy: mean, median, or mode.
  • Identifies and removes duplicate rows.
  • Basic outlier detection is planned for a future update.
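The imputation strategies and duplicate removal above can be approximated in plain pandas. This is a hedged sketch, not the package's `cleaning.handle_missing`: the helper `handle_missing_sketch` is hypothetical, and it assumes one reasonable policy (mean or median for numeric columns, mode for non-numeric columns or when `strategy="mode"`).

```python
import pandas as pd


def handle_missing_sketch(df: pd.DataFrame, strategy: str = "mean") -> pd.DataFrame:
    """Hypothetical imputer: returns a copy with missing values filled."""
    if strategy not in {"mean", "median", "mode"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if not series.isna().any():
            continue  # nothing to fill in this column
        if strategy != "mode" and pd.api.types.is_numeric_dtype(series):
            fill = series.mean() if strategy == "mean" else series.median()
        else:
            # Mode works for any dtype, so use it for text columns as well
            modes = series.mode(dropna=True)
            if modes.empty:
                continue  # column is entirely missing; leave it unchanged
            fill = modes.iloc[0]
        out[col] = series.fillna(fill)
    return out


# Made-up data: one missing age, one missing city
df = pd.DataFrame({
    "age": [20.0, None, 40.0, 40.0],
    "city": ["Oslo", "Lima", None, "Lima"],
})

# Impute first, then drop the rows that became exact duplicates
cleaned = handle_missing_sketch(df, strategy="median").drop_duplicates()
print(cleaned)
```

Falling back to the mode for text columns avoids trying to average strings; whether pydatalens applies the same fallback is not documented here.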

3. Correlation Analysis

  • Generates a correlation matrix to identify relationships between features.
  • Renders the correlation matrix as a heatmap so strong relationships stand out at a glance.
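A correlation matrix of this kind is typically computed with pandas' `DataFrame.corr`. The sketch below assumes Pearson correlation over numeric columns only and ranks feature pairs by absolute correlation; the column names and values are made up for illustration and say nothing about how pydatalens does it internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 66, 74, 82],  # exactly linear in height_cm
    "shoe_size": [40, 38, 42, 39, 41],
    "name": ["a", "b", "c", "d", "e"],  # non-numeric, excluded below
})

# Keep numeric columns only; method="pearson" is also the default
corr = df.select_dtypes(include="number").corr(method="pearson")
print(corr.round(2))

# Upper triangle (k=1 skips the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
    .stack()
    .dropna()  # masked-out cells are NaN; drop them explicitly
    .abs()
    .sort_values(ascending=False)
)
print(pairs)
```

For the heatmap itself, passing `corr` to `seaborn.heatmap(corr, annot=True)` is one common route, since seaborn is already a dependency; how pydatalens styles its heatmap is not specified here.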

4. Automatic Visualizations

  • Supports generating:
    • Histograms
    • Box plots
    • Correlation heatmaps
    • Scatter plots (planned for future updates).
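One way to produce a histogram and a box plot for a single column with Matplotlib, assuming a headless (Agg) backend so it also runs on servers and in CI. `plot_distribution_sketch` is a hypothetical helper, not part of pydatalens, and the sample data is made up.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend: no display window needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_distribution_sketch(df: pd.DataFrame, column: str, bins: int = 10):
    """Hypothetical helper: histogram and box plot of one column, side by side."""
    # Drop NaN first; hist() cannot bin missing values
    values = df[column].dropna().to_numpy()
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(9, 3.5))

    ax_hist.hist(values, bins=bins, edgecolor="black")
    ax_hist.set(title=f"Histogram of {column}", xlabel=column, ylabel="count")

    ax_box.boxplot(values)
    ax_box.set(title=f"Box plot of {column}", ylabel=column)

    fig.tight_layout()
    return fig


df = pd.DataFrame({"age": [23, 35, 31, None, 52, 44, 29]})
fig = plot_distribution_sketch(df, "age")
```

Call `fig.savefig("age_distribution.png")` to write the figure to disk, then `plt.close(fig)` to release its memory.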

5. Report Generation

  • Exports EDA results and visualizations into a detailed HTML report for easy sharing.
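Assuming the report is a single self-contained HTML file, one common approach is to render tables with `DataFrame.to_html` and embed plots as base64-encoded PNGs, so the file can be shared without separate image files. The helpers `figure_to_img_tag` and `build_report_sketch` are hypothetical; the actual layout of pydatalens reports is not described here.

```python
import base64
import html
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd


def figure_to_img_tag(fig) -> str:
    """Encode a Matplotlib figure as an inline base64 PNG <img> tag."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure once it has been encoded
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}" alt="figure">'


def build_report_sketch(df: pd.DataFrame, title: str = "EDA report") -> str:
    """Hypothetical report builder: stats, missing counts, one histogram per numeric column."""
    safe_title = html.escape(title)
    sections = [
        f"<h1>{safe_title}</h1>",
        "<h2>Descriptive statistics</h2>",
        df.describe().to_html(),
        "<h2>Missing values</h2>",
        df.isna().sum().to_frame("missing").to_html(),
        "<h2>Histograms</h2>",
    ]
    for col in df.select_dtypes(include="number").columns:
        fig, ax = plt.subplots()
        ax.hist(df[col].dropna(), bins=10)
        ax.set_title(str(col))
        sections.append(figure_to_img_tag(fig))
    return "\n".join([
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>',
        "<body>",
        *sections,
        "</body></html>",
    ])


df = pd.DataFrame({
    "age": [23, 35, None, 52],
    "income": [41000, 52000, 48000, None],
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
})
report = build_report_sketch(df)
```

Writing `report` to disk with `pathlib.Path("eda_report.html").write_text(report, encoding="utf-8")` produces a file that opens in any browser.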

Installation

Using pip (from source)

  1. Clone the repository:
    git clone https://github.com/gopalakrishnanarjun/pydatalens.git
    cd pydatalens
    
  2. Install the package:
    pip install -e .
    

Dependencies

  • Python >= 3.6
  • pandas >= 1.0
  • numpy >= 1.18
  • matplotlib >= 3.1
  • seaborn >= 0.11

Install dependencies manually:

pip install pandas numpy matplotlib seaborn
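To confirm an environment meets the minimum versions listed above, a small check with the standard-library `importlib.metadata` module works. Note that `importlib.metadata` needs Python 3.8 or newer, above the package's stated floor of 3.6. The helper names are hypothetical, and the comparison only looks at the leading numeric release part (so `2.0.0rc1` counts as `2.0.0`).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed in the Dependencies section
MINIMUMS = {"pandas": "1.0", "numpy": "1.18", "matplotlib": "3.1", "seaborn": "0.11"}


def release_tuple(v: str) -> tuple:
    """Leading numeric release part, e.g. '1.26.0rc1' -> (1, 26, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Numeric comparison, zero-padded so '1.18' equals '1.18.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums: dict = MINIMUMS) -> dict:
    """Map each dependency to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


for name, status in check_dependencies().items():
    print(f"{name}: {status}")
```

For full PEP 440 semantics (pre-releases, post-releases, local versions), `packaging.version.Version` from the `packaging` project is the more robust choice.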

Quick Start

1. Import the package

from pydatalens import eda, cleaning, visualizations

2. Load a dataset

import pandas as pd
df = pd.read_csv("your_dataset.csv")
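If you do not have a CSV at hand, the Quick Start steps can be tried on a small in-memory dataset. The data below is entirely made up; it includes an `age` column (used in step 5), a couple of missing values, and one duplicated row so the cleaning steps have something to do.

```python
import io

import pandas as pd

# Hypothetical toy data: Ben has no age, Cai has no income, Dee appears twice
csv_text = """name,age,income,city
Ana,34,52000,Oslo
Ben,,48000,Lima
Cai,29,,Oslo
Dee,41,61000,Pune
Dee,41,61000,Pune
"""

# read_csv accepts any file-like object, so StringIO stands in for a file
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.isna().sum())
```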

3. Summarize the dataset

print(eda.summarize(df))

4. Handle missing values

df_cleaned = cleaning.handle_missing(df, strategy="mean")

5. Visualize the data

visualizations.plot_histogram(df_cleaned, column="age")
visualizations.correlation_heatmap(df_cleaned)

Examples

Summarizing the Data

from pydatalens import eda
summary = eda.summarize(df)
print(summary)

Cleaning the Data

from pydatalens import cleaning
df = cleaning.handle_missing(df, strategy="median")
df = cleaning.drop_duplicates(df)

Visualizing the Data

from pydatalens import visualizations
visualizations.plot_histogram(df, "column_name")
visualizations.correlation_heatmap(df)

Future Enhancements

  • Advanced anomaly detection.
  • Support for time series analysis.
  • Enhanced visualization options (e.g., scatter plots, pair plots).
  • Integration with machine learning pipelines.

Contributing

Contributions are welcome! If you'd like to contribute, please fork the repository and submit a pull request.


License

pydatalens is licensed under the MIT License. See the LICENSE file for more details.


Download files

Source Distribution

pydatalens-0.0.6.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

pydatalens-0.0.6-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file pydatalens-0.0.6.tar.gz.

File metadata

  • Download URL: pydatalens-0.0.6.tar.gz
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for pydatalens-0.0.6.tar.gz
  • SHA256: 458771a5c3f18b8ea4ce7e75ad27749a32aceeb744452fde28346bb059b49686
  • MD5: 84dc53ed52e20ce9059e71b2245b1613
  • BLAKE2b-256: 31b831950fb55eace56a4438cacd2b058d6b216bcd78b5754cc853a86965699c

File details

Details for the file pydatalens-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: pydatalens-0.0.6-py3-none-any.whl
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for pydatalens-0.0.6-py3-none-any.whl
  • SHA256: e880f5b71d0cfd53e9250ebf3a7eb6b90a2df992ea43399116b47cee18b5c1e6
  • MD5: b17a360ebadecbf94f8b11dc184a5c0f
  • BLAKE2b-256: 44bcf759072639df8e592346198eeb2e8e7622cf42949fb9886f73d7ffde6680
