
A no-code solution for data cleaning tasks such as missing value imputation, outlier handling, normalization, transformation, and quality checks, with an intuitive interface for interactive DataFrame manipulation and easy CSV export.

DataRefine


DataRefine is a Python package for data cleaning with interactive output and visualizations. It offers a streamlined interface for detecting and handling missing values and outliers, applying normalization and transformation, and assessing data quality. Built-in interactive visualizations help users understand their data at each step.

Features

  • Interactive Data Upload: Easy CSV file upload functionality

  • Missing Data Handling:

    • Multiple imputation strategies (mean, median, mode, predictive)
    • Visual representation of missing value patterns
    • Column-specific imputation options
  • Outlier Detection & Treatment:

    • Multiple detection methods (IQR, Z-score)
    • Configurable thresholds
    • Visual outlier analysis using box plots
    • Multiple handling strategies (capping, removal, imputation)
  • Data Normalization:

    • Multiple normalization methods (Min-Max, Z-score, Robust scaling)
    • Interactive distribution visualization
    • Column-specific normalization
  • Data Transformation:

    • Log transformation
    • Square root transformation
    • Box-Cox transformation
    • Before/after distribution comparison
  • Data Quality Assessment:

    • Summary statistics
    • Visual quality reports
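
The operations listed above can be approximated in plain pandas. The sketch below is not DataRefine's internal code; the function names, the sample column, and the default IQR multiplier of 1.5 are assumptions chosen for illustration. It shows median imputation, IQR-based capping, and Min-Max normalization applied in sequence.

```python
import numpy as np
import pandas as pd


def impute(series: pd.Series, strategy: str = "mean") -> pd.Series:
    """Fill missing values with the column mean, median, or mode."""
    if strategy == "mean":
        fill = series.mean()
    elif strategy == "median":
        fill = series.median()
    elif strategy == "mode":
        fill = series.mode().iloc[0]  # first mode if several are tied
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return series.fillna(fill)


def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def min_max(series: pd.Series) -> pd.Series:
    """Rescale values linearly to the [0, 1] range."""
    span = series.max() - series.min()
    if span == 0:
        # A constant column has no spread; map it to zeros.
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


# Hypothetical sample data: one missing value and one obvious outlier.
df = pd.DataFrame({"age": [22.0, 25.0, np.nan, 24.0, 23.0, 95.0]})
df["age"] = impute(df["age"], "median")
df["age"] = cap_outliers_iqr(df["age"])
df["age_scaled"] = min_max(df["age"])
print(df)
```

Capping keeps every row, which suits small datasets; removal or imputation of outliers would instead drop or overwrite the flagged values.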

Installation

It's recommended to install DataRefine in a virtual environment to manage dependencies effectively and avoid conflicts with other projects.

1. Set Up a Virtual Environment

The venv module ships with Python 3.3 and above (DataRefine itself requires Python 3.7 or later):

  1. Create a Virtual Environment:

    python -m venv env
    

    Replace env with your preferred name for the virtual environment.

  2. Activate the Virtual Environment:

    • On Windows:

      env\Scripts\activate
      
    • On macOS/Linux:

      source env/bin/activate
      

2. Install DataRefine

Once the virtual environment is activated, you can install DataRefine using pip:

pip install datarefine==1.0

Quick Start

After installation, you can start DataRefine directly by running:

DataRefine

Open your web browser and navigate to the provided local URL.

Upload your CSV file.

Start cleaning your data!

How to Use

  • Data Upload:

    • Click the "Upload CSV" button.
    • Select your CSV file from your local system.
  • Data Cleaning:

    • Use the sidebar to navigate between different cleaning operations.
    • Configure parameters using the interactive controls.
    • View real-time visualizations of the changes.
    • Download the cleaned dataset when finished.
    • For a detailed video walkthrough of the app's features and functionality, check out our YouTube demo.
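
After downloading the cleaned dataset, it can be useful to confirm the result outside the app. The following is a minimal sketch, assuming the export is an ordinary CSV file; the helper name `quality_summary`, the sample columns, and the filename mentioned in the comment are hypothetical.

```python
import io

import pandas as pd


def quality_summary(df: pd.DataFrame) -> dict:
    """Return a few basic checks on a cleaned DataFrame."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Stand-in for the exported file. In practice, replace the StringIO object
# with the path of your download, e.g. "cleaned_data.csv" (hypothetical name).
csv_text = "age,income\n22,30000\n25,42000\n24,39000\n"
cleaned = pd.read_csv(io.StringIO(csv_text))

report = quality_summary(cleaned)
print(report)
```

A non-zero `missing_cells` count suggests that some columns were skipped during imputation.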

Requirements

  • Python >= 3.7
  • Streamlit
  • Pandas
  • NumPy
  • plotly
  • scikit-learn

For more detailed information, see the requirements.txt file.
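
To check whether an environment already satisfies the requirements listed above, a short standard-library script can help. This is a sketch under one assumption: the import names in `REQUIRED_MODULES` are the usual ones for these packages (for example, scikit-learn is imported as `sklearn`). The authoritative list remains the requirements.txt file.

```python
import sys
from importlib.util import find_spec

# Display name -> import name (assumed conventional import names).
REQUIRED_MODULES = {
    "Streamlit": "streamlit",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "plotly": "plotly",
    "scikit-learn": "sklearn",
}


def check_environment() -> dict:
    """Report whether the Python version and each package are available."""
    status = {"Python >= 3.7": sys.version_info >= (3, 7)}
    for label, module in REQUIRED_MODULES.items():
        # find_spec returns None when a top-level module cannot be located.
        status[label] = find_spec(module) is not None
    return status


for name, ok in check_environment().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```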

Contributing

We welcome contributions! Please follow these steps:

  • Fork the repository
  • Create a new branch (git checkout -b feature/improvement)
  • Make your changes
  • Commit your changes (git commit -am 'Add new feature')
  • Push to the branch (git push origin feature/improvement)
  • Create a Pull Request

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

Special thanks to all the libraries and frameworks that have helped in developing this package.

Version History

  • 1.0.0: Initial release

    • Basic data cleaning functionality
    • Interactive web interface
    • Visualization capabilities


