A powerful web application for assessing data quality issues
Data Quality Assessment Tool
A powerful web application for quickly assessing data quality issues in datasets. This tool automatically identifies missing values, outliers, data type inconsistencies, and duplicate records, helping data professionals save time and improve data reliability.
🌟 Features
- Comprehensive Quality Analysis
  - Missing value detection and visualization
  - Outlier identification using statistical methods
  - Data type consistency validation
  - Duplicate record detection
- Interactive Visualizations
  - Visual representation of data quality issues
  - Dynamic charts showing data distribution
  - Clear indicators of problematic areas
- Flexible Input Support
  - CSV file support
  - Excel file compatibility
  - JSON data processing
- Detailed Reporting
  - Downloadable quality reports
  - Actionable insights for data cleaning
  - Summarized quality metrics
📋 Installation
- Clone the repository
    git clone https://github.com/godwinwa/data-quality-app.git
    cd data-quality-app
- Create and activate a virtual environment
    python -m venv dqa-env
    source dqa-env/bin/activate  # On Windows: dqa-env\Scripts\activate
- Install dependencies
    pip install -r requirements.txt
- Run the application
    python app.py
- Access the tool
    Open your browser and go to: http://localhost:5000
🚀 Usage
- Upload your dataset
  - Click the "Upload" button on the homepage
  - Select a CSV, Excel, or JSON file
  - Click "Analyze Data"
- Review the analysis
  - Examine the summary statistics
  - Explore interactive visualizations
  - Review detailed quality issues by category
- Export results
  - Download the complete quality report
  - Use insights to clean and improve your data
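The three accepted upload formats map naturally onto pandas readers. The sketch below shows one way such a dispatch could look. The `READERS` table and `load_dataset` helper are hypothetical names for illustration and are not taken from the tool's source. Reading `.xlsx` files with `pd.read_excel` also assumes an Excel engine such as `openpyxl` is installed.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to the pandas reader that handles it.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".json": pd.read_json,
}


def load_dataset(path):
    """Load a CSV, Excel, or JSON file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return READERS[suffix](path)
```

Rejecting unknown extensions up front keeps the error message clear, instead of letting a reader fail partway through parsing.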
📊 Data Quality Checks
Missing Values Analysis
- Identifies columns with missing data
- Calculates the percentage of missing values in each field
- Highlights fields requiring data completion
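As a rough illustration of this check, the following sketch computes per-column missing counts and percentages with pandas. The function name `missing_value_summary` and the sample data are assumptions made for this example. The tool's actual implementation may differ.

```python
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages per column, worst first."""
    missing_count = df.isna().sum()
    missing_pct = (df.isna().mean() * 100).round(2)
    summary = pd.DataFrame(
        {"missing_count": missing_count, "missing_pct": missing_pct}
    )
    # Keep only the columns that actually contain gaps.
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_pct", ascending=False)


# Illustrative data: "age" is missing in 2 of 4 rows, "city" in 1 of 4.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "age": [34, None, 29, None],
        "city": ["Oslo", "Lima", None, "Pune"],
    }
)
print(missing_value_summary(df))
```

Here `age` is reported at 50% missing and `city` at 25%, while `id` is omitted because it is complete.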
Outlier Detection
- Uses statistical methods (IQR or Z-score)
- Identifies numerical values that significantly deviate from the norm
- Provides visual representation of outlier distribution
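The two statistical methods named above can be sketched as follows. The function names, the `k = 1.5` fence multiplier, and the z-score threshold of 3 are conventional defaults chosen for this example, not values confirmed from the tool itself.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: True where the absolute z-score exceeds the threshold."""
    std = values.std(ddof=0)  # population standard deviation
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return pd.Series(False, index=values.index)
    z = (values - values.mean()) / std
    return z.abs() > threshold


prices = pd.Series([10, 12, 11, 13, 12, 11, 95])
print(iqr_outliers(prices).tolist())
print(zscore_outliers(prices).tolist())
```

On this sample the IQR rule flags only `95`, while the z-score rule with a threshold of 3 flags nothing: with `n` points and the population standard deviation, no z-score can exceed `sqrt(n - 1)`, which is about 2.45 for seven values. That is one reason the IQR rule is often preferred for small datasets.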
Data Type Consistency
- Validates that data conforms to expected types
- Identifies potential type mismatches or conversion opportunities
- Suggests appropriate data type transformations
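One simple way to spot a type mismatch is to test whether a text column is mostly numeric. The sketch below does this with `pd.to_numeric(..., errors="coerce")`. The helper name `find_numeric_candidates` and the 75% threshold are assumptions for illustration. The tool may use different heuristics and may also cover dates or booleans.

```python
import pandas as pd


def find_numeric_candidates(df: pd.DataFrame, min_ratio: float = 0.75) -> dict:
    """Map each mostly-numeric text column to the values that fail to parse."""
    candidates = {}
    for name in df.columns:
        col = df[name].dropna()
        # Skip empty columns and columns that are already numeric.
        if col.empty or pd.api.types.is_numeric_dtype(col):
            continue
        parsed = pd.to_numeric(col, errors="coerce")  # unparseable -> NaN
        if parsed.notna().mean() >= min_ratio:
            candidates[name] = col[parsed.isna()].tolist()
    return candidates


df = pd.DataFrame(
    {
        "qty": [1, 2, 3, 4, 5],
        "amount": ["10", "20.5", "n/a", "40", "55"],
        "name": ["Ann", "Bo", "Cy", "Di", "Ed"],
    }
)
print(find_numeric_candidates(df))
```

In this example `amount` is reported as a conversion candidate with `"n/a"` as the single offending value, which tells the user exactly what to clean before casting the column.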
Duplicate Detection
- Finds exact duplicate records
- Highlights columns with high duplication rates
- Calculates duplication percentages across the dataset
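A minimal sketch of these duplicate checks, assuming pandas and a hypothetical `duplicate_summary` helper. A row counts as a duplicate when it repeats an earlier row exactly, and the per-column rate is the share of values that repeat an earlier value in that column.

```python
import pandas as pd


def duplicate_summary(df: pd.DataFrame) -> dict:
    """Summarize exact duplicate rows and per-column duplication rates."""
    n_rows = len(df)
    if n_rows == 0:
        return {"duplicate_rows": 0, "duplicate_pct": 0.0, "column_duplication_pct": {}}
    # keep="first" marks every repeat after the first occurrence.
    dup_mask = df.duplicated(keep="first")
    column_rates = {
        name: round(float(df[name].duplicated().mean()) * 100, 2)
        for name in df.columns
    }
    return {
        "duplicate_rows": int(dup_mask.sum()),
        "duplicate_pct": round(float(dup_mask.mean()) * 100, 2),
        "column_duplication_pct": column_rates,
    }


df = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "email": ["a@x.io", "b@x.io", "b@x.io", "a@x.io"],
    }
)
print(duplicate_summary(df))
```

For this sample, one of four rows is an exact duplicate (25%), and the `email` column shows a higher duplication rate (50%) than `id` (25%).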
🔧 Technical Architecture
Backend: Flask web framework
Data Processing: Pandas, NumPy
Visualization: Plotly
Frontend: Bootstrap, HTML/CSS/JavaScript
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
📬 Contact
Have questions or suggestions? Feel free to reach out!
Made with ❤️ by G
File details
Details for the file data_quality_assessment-0.1.0.tar.gz.
File metadata
- Download URL: data_quality_assessment-0.1.0.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e18cfe3a04984fb78d480e88be14968d7bf752ca06107934efb1b034a009b085 |
| MD5 | 140caf6481aa5427b1d33b36465a1b36 |
| BLAKE2b-256 | 8e0904e488d960d1a3acf7532ee836e156e0b8742a0cc7b7ed908ea20177e540 |
File details
Details for the file data_quality_assessment-0.1.0-py3-none-any.whl.
File metadata
- Download URL: data_quality_assessment-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 343944ebbacdf9e6dfca09ac16ca726b4f082c49cda41a79b7c08276fccde282 |
| MD5 | 5825f3176179d14b4ba1defa5aa0fc30 |
| BLAKE2b-256 | e40028b371e095c920f1a997fe9dc70354389752220712feffd7e86d64e90e17 |