A no-code solution for data cleaning tasks such as missing value imputation, outlier handling, normalization, transformation, and quality checks, with an intuitive interface for interactive DataFrame manipulation and easy CSV export.
DataRefine
DataRefine is a Python package for data cleaning with interactive output and visualizations. It offers a streamlined interface that helps users detect and handle missing values and outliers, perform normalization and transformation, and assess data quality. Interactive visualizations are integrated throughout so that users can see how each operation affects their data.
Features
- Interactive Data Upload: Easy CSV file upload functionality
- Missing Data Handling:
  - Multiple imputation strategies (mean, median, mode, predictive)
  - Visual representation of missing value patterns
  - Column-specific imputation options
- Outlier Detection & Treatment:
  - Multiple detection methods (IQR, Z-score)
  - Configurable thresholds
  - Visual outlier analysis using box plots
  - Multiple handling strategies (capping, removal, imputation)
- Data Normalization:
  - Multiple normalization methods (Min-Max, Z-score, Robust scaling)
  - Interactive distribution visualization
  - Column-specific normalization
- Data Transformation:
  - Log transformation
  - Square root transformation
  - Box-Cox transformation
  - Before/after distribution comparison
- Data Quality Assessment:
  - Summary statistics
  - Visual quality reports
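For readers who want to see what some of the operations listed above do to a column of numbers, here is a minimal sketch in plain Python. It is not DataRefine's implementation, which is not shown in this README. All function names are hypothetical, the sample values are made up, and `statistics.quantiles` requires Python 3.8 or later.

```python
import math
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def iqr_bounds(values, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR); k is the threshold."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_outliers(values, k=1.5):
    """Capping: clip every value that falls outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]

def min_max(values):
    """Min-Max normalization: rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_transform(values):
    """Log transformation as log(1 + x), defined for x > -1."""
    return [math.log1p(v) for v in values]

# Hypothetical column with one missing entry and one extreme value.
ages = [23, 25, None, 24, 26, 22, 95]

filled = impute_median(ages)   # the None is replaced by the median
capped = cap_outliers(filled)  # 95 is pulled down to the upper fence
scaled = min_max(capped)       # smallest value maps to 0, largest to 1

print(filled)
print(capped)
print(scaled)
```

The steps are chained in the order impute, cap, scale because the quartiles and the min/max cannot be computed while `None` values are still present.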
Installation
It's recommended to install DataRefine in a virtual environment to manage dependencies effectively and avoid conflicts with other projects.
1. Set Up a Virtual Environment
For Python 3.3 and above:
- Create a Virtual Environment:
  python -m venv env
  Replace env with your preferred name for the virtual environment.
- Activate the Virtual Environment:
  - On Windows:
    env\Scripts\activate
  - On macOS/Linux:
    source env/bin/activate
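To confirm that the activation step took effect, the interpreter itself can be asked whether it is running inside a virtual environment. This is a small sketch using only the standard library; the function name is hypothetical.

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtual_environment())
```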
2. Install DataRefine
Once the virtual environment is activated, you can install DataRefine using pip:
pip install datarefine==1.0
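One way to check that the installation succeeded is to query the installed distribution's version from Python. The sketch below assumes the distribution is registered under the same name used in the pip command (`datarefine`); `installed_version` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata

def installed_version(dist_name="datarefine"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version()
print(version or "DataRefine is not installed in this environment")
```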
Quick Start
After installation, you can start DataRefine directly by running:
DataRefine
Open your web browser and navigate to the provided local URL.
Upload your CSV file.
Start cleaning your data!
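To try the app without your own data, a small CSV with deliberate problems can be generated first. The sketch below is only an illustration: the column names, values, and the file name `sample_dirty.csv` are all made up, and the file is written only when the snippet is run as a script.

```python
import csv
import io
from pathlib import Path

# Hypothetical rows: two missing cells and one extreme income value.
ROWS = [
    {"id": 1, "age": 23, "income": 41000},
    {"id": 2, "age": None, "income": 39000},
    {"id": 3, "age": 31, "income": None},
    {"id": 4, "age": 27, "income": 1250000},  # likely outlier
    {"id": 5, "age": 29, "income": 43000},
]

def sample_csv_text(rows=ROWS):
    """Serialize rows to CSV text; None is written as an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

if __name__ == "__main__":
    Path("sample_dirty.csv").write_text(sample_csv_text(), encoding="utf-8")
    print("wrote sample_dirty.csv")
```

The resulting file can then be selected in the upload step.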
How to use?
- Data Upload:
  - Click the "Upload CSV" button.
  - Select your CSV file from your local system.
- Data Cleaning:
  - Use the sidebar to navigate between different cleaning operations.
  - Configure parameters using the interactive controls.
  - View real-time visualizations of the changes.
  - Download the cleaned dataset when finished.

For a detailed video walkthrough of the app's features and functionality, check out our YouTube demo.
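After downloading the cleaned dataset, it can be worth confirming that no empty cells remain. The following sketch counts empty cells per column with the standard `csv` module. The function name and the file name `cleaned.csv` in the comment are hypothetical, and an in-memory string stands in for a real export.

```python
import csv
import io

def missing_counts(csv_file):
    """Count empty cells per column in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    counts = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name in counts:
            # Short rows yield None for absent cells; treat them as empty.
            if (row.get(name) or "").strip() == "":
                counts[name] += 1
    return counts

# Stand-in for a downloaded file. With a real export, pass
# open("cleaned.csv", newline="", encoding="utf-8") instead.
exported = io.StringIO("id,age\n1,23\n2,\n3,31\n")
print(missing_counts(exported))  # {'id': 0, 'age': 1}
```

A result with all zeros indicates that every cell in the export holds a value.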
Requirements
- Python >= 3.7
- Streamlit
- Pandas
- NumPy
- plotly
- scikit-learn
For more detailed information, see the requirements.txt file.
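A quick way to see whether the current environment satisfies the list above is sketched below. It checks only the Python version and whether each package can be found, not specific package versions, which this README does not state. The helper names are hypothetical; the import names are the standard ones for these packages (scikit-learn is imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed under Requirements.
REQUIRED_MODULES = ["streamlit", "pandas", "numpy", "plotly", "sklearn"]

def python_ok(minimum=(3, 7)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum

def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]

print("Python >= 3.7:", python_ok())
print("missing packages:", missing_modules() or "none")
```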
Contributing
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a new branch (git checkout -b feature/improvement)
- Make your changes
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature/improvement)
- Create a Pull Request
License
This project is licensed under the MIT License. See the LICENSE.md file for details.
Acknowledgments
Special thanks to all the libraries and frameworks that have helped in developing this package.
Version History
- 1.0.0: Initial release
  - Basic data cleaning functionality
  - Interactive web interface
  - Visualization capabilities