
A powerful web application for assessing data quality issues


Data Quality Assessment Tool


A powerful web application for quickly assessing data quality issues in datasets. This tool automatically identifies missing values, outliers, data type inconsistencies, and duplicate records, helping data professionals save time and improve data reliability.


🌟 Features

  • Comprehensive Quality Analysis

    • Missing value detection and visualization
    • Outlier identification using statistical methods
    • Data type consistency validation
    • Duplicate record detection
  • Interactive Visualizations

    • Visual representation of data quality issues
    • Dynamic charts showing data distribution
    • Clear indicators of problematic areas
  • Flexible Input Support

    • CSV file support
    • Excel file compatibility
    • JSON data processing
  • Detailed Reporting

    • Downloadable quality reports
    • Actionable insights for data cleaning
    • Summarized quality metrics

📋 Installation

  1. Clone the repository
git clone https://github.com/godwinwa/data-quality-app.git
cd data-quality-app

  2. Create and activate a virtual environment
python -m venv dqa-env
source dqa-env/bin/activate  # On Windows: dqa-env\Scripts\activate

  3. Install dependencies
pip install -r requirements.txt

  4. Run the application
python app.py

  5. Access the tool
Open your browser and go to: http://localhost:5000

🚀 Usage

  1. Upload your dataset

    • Click the "Upload" button on the homepage
    • Select a CSV, Excel, or JSON file
    • Click "Analyze Data"
  2. Review the analysis

    • Examine the summary statistics
    • Explore the interactive visualizations
    • Review detailed quality issues by category
  3. Export results

    • Download the complete quality report
    • Use the insights to clean and improve your data
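
For readers who want to reproduce the upload step in a script, the three supported formats can be read with pandas. The sketch below is an illustration only, not the tool's own loader: the function name `load_dataset` and the extension-based dispatch are assumptions.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Load a CSV, Excel, or JSON file into a DataFrame (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xls", ".xlsx"}:
        # Reading Excel files needs an engine such as openpyxl to be installed.
        return pd.read_excel(path)
    if suffix == ".json":
        return pd.read_json(path)
    raise ValueError(f"Unsupported file type: {suffix or 'no extension'}")
```

Dispatching on the file extension keeps the sketch short; a real upload handler might also inspect the file's content type.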



📊 Data Quality Checks
Missing Values Analysis

- Identifies columns with missing data
- Calculates the percentage of missing values in each field
- Highlights fields requiring data completion
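
The missing-value check described above can be approximated in a few lines of pandas. This is a minimal sketch, not the tool's actual code; `missing_value_summary` and its column names are hypothetical.

```python
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column (hypothetical helper)."""
    missing_count = df.isna().sum()
    missing_pct = (df.isna().mean() * 100).round(2)
    summary = pd.DataFrame(
        {"missing_count": missing_count, "missing_pct": missing_pct}
    )
    # Keep only columns that have gaps, worst first.
    return summary[summary["missing_count"] > 0].sort_values(
        "missing_pct", ascending=False
    )


example = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "age": [25, None, 40, None],
        "city": ["Paris", "Oslo", None, "Rome"],
    }
)
print(missing_value_summary(example))
```

In this example `age` is half empty and `city` is missing one value in four, so both rows appear in the summary while `id` is left out.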

Outlier Detection

- Uses statistical methods (IQR or Z-score)
- Identifies numerical values that significantly deviate from the norm
- Provides visual representation of outlier distribution
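
Both statistical methods named above can be sketched with pandas and NumPy. The function names, the IQR multiplier of 1.5, and the default Z-score threshold of 3 are common conventions assumed here for illustration; the README does not state which values the tool uses.

```python
import numpy as np
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask of values whose absolute Z-score exceeds the threshold."""
    std = series.std(ddof=0)
    if std == 0 or np.isnan(std):
        # A constant or empty column has no meaningful Z-scores.
        return pd.Series(False, index=series.index)
    z_scores = (series - series.mean()) / std
    return z_scores.abs() > threshold


values = pd.Series([10, 12, 11, 13, 12, 11, 100])
print(values[iqr_outliers(values)])
print(values[zscore_outliers(values, threshold=2.0)])
```

The IQR rule relies on quartiles, so a single extreme value has little effect on its fences. The Z-score rule uses the mean and standard deviation, which the extreme value itself inflates, so a lower threshold is used in the small example.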

Data Type Consistency

- Validates that data conforms to expected types
- Identifies potential type mismatches or conversion opportunities
- Suggests appropriate data type transformations
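
One way to spot conversion opportunities is to test how much of a text column parses as a number. The sketch below is an assumption about how such a check could work, not the tool's implementation; `suggest_numeric_columns` and the `min_ratio` parameter are hypothetical.

```python
import pandas as pd


def suggest_numeric_columns(df: pd.DataFrame, min_ratio: float = 0.9) -> dict:
    """Map text columns to the share of non-null values that parse as numbers."""
    suggestions = {}
    for col in df.columns:
        column = df[col]
        is_text = pd.api.types.is_object_dtype(column) or pd.api.types.is_string_dtype(column)
        if not is_text:
            continue
        non_null = column.dropna()
        if non_null.empty:
            continue
        # Values that cannot be parsed become NaN instead of raising an error.
        parsed = pd.to_numeric(non_null, errors="coerce")
        ratio = float(parsed.notna().mean())
        if ratio >= min_ratio:
            suggestions[col] = ratio
    return suggestions


example = pd.DataFrame(
    {
        "price": ["10.5", "20", "abc", "30"],
        "qty": ["1", "2", "3", "4"],
        "name": ["a", "b", "c", "d"],
    }
)
print(suggest_numeric_columns(example, min_ratio=0.7))
```

A ratio of 1.0 suggests the column can be converted outright, while a ratio below 1.0 points to a few stray values, such as `"abc"` in `price`, that need cleaning first.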

Duplicate Detection

- Finds exact duplicate records
- Highlights columns with high duplication rates
- Calculates duplication percentages across the dataset
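
Exact duplicates and per-column duplication rates can be computed with `DataFrame.duplicated`. This is a minimal sketch under the assumption that a row or value counts as a duplicate when it repeats an earlier one; `duplicate_summary` and its output keys are hypothetical.

```python
import pandas as pd


def duplicate_summary(df: pd.DataFrame) -> dict:
    """Summarize exact duplicate rows and per-column repeat rates."""
    if df.empty:
        return {"duplicate_rows": 0, "duplicate_pct": 0.0, "column_duplication_pct": {}}
    # keep="first" marks every repeat of a row except its first occurrence.
    dup_mask = df.duplicated(keep="first")
    column_pct = {
        col: round(float(df[col].duplicated().mean()) * 100, 2)
        for col in df.columns
    }
    return {
        "duplicate_rows": int(dup_mask.sum()),
        "duplicate_pct": round(float(dup_mask.mean()) * 100, 2),
        "column_duplication_pct": column_pct,
    }


example = pd.DataFrame({"id": [1, 2, 2, 3], "city": ["A", "B", "B", "A"]})
print(duplicate_summary(example))
```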

🔧 Technical Architecture

Backend: Flask web framework
Data Processing: Pandas, NumPy
Visualization: Plotly
Frontend: Bootstrap, HTML/CSS/JavaScript

🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📜 License
This project is licensed under the MIT License. See the LICENSE file for details.

📬 Contact
Have questions or suggestions? Feel free to reach out!

Made with ❤️ by G
