A modern, healthcare-focused data quality and validation library: cleaner and faster than Great Expectations

EMRValidator

A modern, healthcare-focused data quality and validation library

EMRValidator is a Python library designed as a cleaner, faster, and more intuitive alternative to Great Expectations, with specialized features for Electronic Medical Records (EMR) and healthcare data validation.

✨ Key Features

🏥 Healthcare-Specific Validations: Built-in validators for MRN, ICD codes, and other healthcare data
🎯 Simple, Intuitive API: Fluent interface for chaining validations
📊 Automated Data Profiling: Quick quality assessment with actionable recommendations
📝 Beautiful Reports: Generate professional HTML and JSON reports
⚡ High Performance: 5-7x faster than Great Expectations
🔧 Extensible: Easy to add custom validations and rules
🎨 Multiple APIs: Choose between fluent, expectation-based, or rule-set patterns
📦 Minimal Dependencies: Only pandas and numpy required

🚀 Installation

pip install emrvalidator

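For Excel support:

pip install "emrvalidator[excel]"

For development:

pip install "emrvalidator[dev]"

Quoting the extras keeps shells such as zsh from treating the square brackets as a glob pattern.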

📖 Quick Start

from emrvalidator import DataValidator
import pandas as pd

# Load your data
df = pd.read_csv('patient_data.csv')

# Create validator and run validations
validator = DataValidator("Patient Data Quality Check")
validator.load_data(df)

# Chain validation rules
(validator
    .expect_column_exists('mrn')
    .expect_column_not_null('patient_id', threshold=0.99)
    .expect_column_values_between('age', 0, 120)
    .expect_mrn_format('mrn')
    .expect_icd_format('diagnosis_code', version=10)
)

# Check results
if validator.is_valid():
    print("✓ All validations passed!")
else:
    print("Issues found:")
    for fail in validator.get_failed_validations():
        print(f"  - {fail['message']}")
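The `threshold` arguments above appear to set the minimum fraction of rows that must satisfy a check. The stdlib-only sketch below shows that interpretation, assuming a check passes when the share of passing values is at least `threshold`; `not_null_check` is a hypothetical name, not part of EMRValidator, and the library's actual semantics may differ.

```python
from typing import Any, Optional, Sequence


def not_null_check(values: Sequence[Optional[Any]], threshold: float = 1.0) -> dict:
    """Hypothetical null check (not the EMRValidator API).

    Assumes the check passes when the non-null share is >= threshold.
    """
    total = len(values)
    non_null = sum(1 for v in values if v is not None)
    # Treat an empty column as compliant instead of dividing by zero.
    ratio = non_null / total if total else 1.0
    return {
        "success": ratio >= threshold,
        "message": f"{ratio * 100:.2f}% non-null (required {threshold * 100:.2f}%)",
        "non_null_ratio": ratio,
    }


patient_ids = ["P001", "P002", None, "P004"]
print(not_null_check(patient_ids, threshold=0.99)["success"])  # False: only 75% non-null
print(not_null_check(patient_ids, threshold=0.75)["success"])  # True
```

Treating an empty column as passing is a choice of this sketch; a stricter validator could fail it instead. With pandas, `Series.notna().mean()` computes the same ratio and also counts NaN as missing.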

🆚 Why EMRValidator?

Comparison with Great Expectations

Feature	Great Expectations	EMRValidator	Advantage
Setup Time	2.3s (high complexity)	0.1s (minimal)	23x faster
Code Volume	45 lines	12 lines	73% less code
Performance	Baseline	5-7x faster	5-7x speedup
Healthcare Focus	None	Built-in	MRN, ICD validation
Dependencies	40+ packages	2 packages	95% fewer
Learning Curve	4-8 hours	15 minutes	16-32x faster
Data Profiling	External tool	Built-in	Included

See the detailed comparison documentation.

📚 Core Features

1. Basic Validations

# Column existence
validator.expect_column_exists('column_name')

# Null checks
validator.expect_column_not_null('age', threshold=0.95)

# Value ranges
validator.expect_column_values_between('age', 0, 120, threshold=0.98)

# Set membership
validator.expect_column_values_in_set('gender', {'M', 'F', 'Other'})

# Uniqueness
validator.expect_column_values_unique('patient_id')

# Date format
validator.expect_column_date_format('admission_date', date_format='%Y-%m-%d')
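`expect_column_date_format` takes a `strptime`-style `date_format`. One plausible way to implement such a check, sketched here with the standard library only, is to count the values that parse with `datetime.strptime`; the helper name `date_format_ratio` is hypothetical and not taken from EMRValidator. Parsing, unlike a plain regex, also rejects impossible dates such as February 30.

```python
from datetime import datetime
from typing import Iterable, Optional


def date_format_ratio(values: Iterable[Optional[str]], date_format: str = "%Y-%m-%d") -> float:
    """Hypothetical helper: share of values that parse with date_format."""
    items = list(values)
    if not items:
        return 1.0  # an empty column is treated as compliant in this sketch
    valid = 0
    for value in items:
        try:
            datetime.strptime(value, date_format)
            valid += 1
        except (TypeError, ValueError):
            # None raises TypeError; malformed or impossible dates raise ValueError.
            pass
    return valid / len(items)


admissions = ["2024-01-15", "2024-02-30", "01/15/2024", None]
print(date_format_ratio(admissions))  # 0.25: only the first value is valid
```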

2. Healthcare-Specific Validations

# Medical Record Numbers
validator.expect_mrn_format('mrn', threshold=0.99)

# ICD Codes
validator.expect_icd_format('diagnosis_code', version=10)  # ICD-10
validator.expect_icd_format('diagnosis_code', version=9)   # ICD-9

# Pre-built healthcare rule sets
from emrvalidator import HealthcareRuleSets

demo_rules = HealthcareRuleSets.patient_demographics()
fin_rules = HealthcareRuleSets.financial_data()
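The ICD checks likely validate the shape of each code. The sketch below uses simplified regular expressions that are assumptions, not EMRValidator's actual rules: ICD-10-CM as a letter, a digit, and an alphanumeric character followed by up to four more alphanumerics after an optional dot; ICD-9-CM as three digits, V codes, or E codes with optional decimals. The dot is optional because claims feeds often store codes without it. Shape checks cannot confirm that a code exists in the official code set, and MRN formats are institution-specific, so no MRN pattern is sketched; `is_icd_format` is a hypothetical helper.

```python
import re

# Simplified structural patterns (assumptions, not EMRValidator's rules).
# They check the shape of a code only, not membership in the official code set.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9][0-9A-Z](\.?[0-9A-Z]{1,4})?$")
ICD9_PATTERN = re.compile(r"^(\d{3}(\.?\d{1,2})?|V\d{2}(\.?\d{1,2})?|E\d{3}(\.?\d)?)$")


def is_icd_format(code: str, version: int = 10) -> bool:
    """Hypothetical helper mirroring the version=9 / version=10 switch."""
    patterns = {9: ICD9_PATTERN, 10: ICD10_PATTERN}
    if version not in patterns:
        raise ValueError(f"Unsupported ICD version: {version}")
    # Normalize surrounding whitespace and case before matching.
    return bool(patterns[version].match(code.strip().upper()))


print(is_icd_format("E11.9", version=10))   # True
print(is_icd_format("E119", version=10))    # True: dot omitted
print(is_icd_format("250.00", version=10))  # False: numeric ICD-9 style code
print(is_icd_format("250.00", version=9))   # True
```

Some strings, such as `V45.81`, match both patterns, so the `version` argument matters when a column mixes code systems.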

3. Data Profiling

from emrvalidator import DataProfiler

profiler = DataProfiler(df, "Healthcare Dataset")
profile = profiler.generate_profile()

# Print summary
profiler.print_summary()

# Get quality score
quality_score = profile['quality_summary']['quality_score']
print(f"Quality Score: {quality_score}/100")

# Get recommendations
for rec in profile['recommendations']:
    print(f"  - {rec}")
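The profiler reports a `quality_score` and `recommendations`, but the scoring formula is not described here. As a hypothetical illustration only, the sketch below profiles a list of row dictionaries, computes per-column null rates, scores overall completeness on a 0-100 scale, and flags columns above an arbitrary 20% null rate. `profile_completeness` is not the `DataProfiler` implementation, whose score may weigh other factors.

```python
from typing import Any, Dict, List


def profile_completeness(rows: List[Dict[str, Any]], null_warn: float = 0.2) -> Dict[str, Any]:
    """Hypothetical mini-profiler; not the DataProfiler implementation.

    A key missing from a row counts as null, like an explicit None.
    """
    columns = sorted({key for row in rows for key in row})
    null_counts = {col: sum(1 for row in rows if row.get(col) is None) for col in columns}
    null_rates = {col: count / len(rows) for col, count in null_counts.items()}
    total_cells = len(rows) * len(columns)
    missing = sum(null_counts.values())
    # Score = percentage of non-null cells; an empty dataset scores 100.
    score = round(100 * (1 - missing / total_cells), 1) if total_cells else 100.0
    recommendations = [
        f"Column '{col}' is {rate * 100:.0f}% null; investigate the source or impute"
        for col, rate in null_rates.items()
        if rate > null_warn
    ]
    return {"null_rates": null_rates, "quality_score": score, "recommendations": recommendations}


rows = [
    {"mrn": "A1", "age": 34},
    {"mrn": "A2", "age": None},
    {"mrn": None, "age": 51},
    {"mrn": "A4", "age": None},
]
profile = profile_completeness(rows)
print(f"Quality Score: {profile['quality_score']}/100")  # Quality Score: 62.5/100
```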

4. Report Generation

from emrvalidator import HTMLReporter, JSONReporter

# Generate HTML report
html_reporter = HTMLReporter(validator.get_results())
html_reporter.generate('quality_report.html', title='Data Quality Report')

# Generate JSON report
json_reporter = JSONReporter(validator.get_results())
json_reporter.generate('quality_report.json', pretty=True)
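The exact structure returned by `validator.get_results()` is not documented here; the Quick Start only shows that failed validations are dictionaries with a `'message'` key. Assuming each result is a dictionary with `'success'` and `'message'` keys, a JSON report with a summary header could be assembled with the standard `json` module as below. `build_json_report` is a hypothetical helper, not `JSONReporter`.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_json_report(results: List[Dict[str, Any]], title: str = "Data Quality Report") -> str:
    """Hypothetical JSON report builder; not the JSONReporter implementation.

    Assumes each result dict has 'success' and 'message' keys.
    """
    passed = sum(1 for r in results if r.get("success"))
    report = {
        "title": title,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "summary": {"total": len(results), "passed": passed, "failed": len(results) - passed},
        "results": results,
    }
    # indent=2 mirrors pretty=True; default=str serializes values json cannot
    # handle natively (for example dates or NumPy scalars) as strings.
    return json.dumps(report, indent=2, default=str)


sample_results = [
    {"validation": "mrn_exists", "success": True, "message": "Column 'mrn' exists"},
    {"validation": "age_range", "success": False, "message": "Some 'age' values outside [0, 120]"},
]
print(build_json_report(sample_results))
```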

5. Custom Validations

def validate_charge_payment(df, **kwargs):
    """Custom validation: charges must be >= payments"""
    valid_mask = df['charge_amount'] >= df['payment_amount']
    valid_pct = valid_mask.sum() / len(df)
    
    passed = valid_pct > 0.95
    message = f"{valid_pct*100:.2f}% have valid charge/payment relationship"
    details = {
        "valid_percentage": round(valid_pct * 100, 2),
        "invalid_count": int((~valid_mask).sum())
    }
    
    return passed, message, details

validator.expect_custom("charge_payment_logic", validate_charge_payment)
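Custom validations return a `(passed, message, details)` tuple. The sketch below shows how a validator might wrap such a callable, recording the outcome and turning an exception into a failed result instead of aborting the run. `run_custom_validation` and the plain-Python `charge_payment_rows` rule are hypothetical, and whether `expect_custom` handles exceptions this way is an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple

ValidationResult = Tuple[bool, str, Dict[str, Any]]


def run_custom_validation(
    name: str, func: Callable[..., ValidationResult], data: Any, **kwargs: Any
) -> Dict[str, Any]:
    """Hypothetical wrapper illustrating the (passed, message, details) contract."""
    try:
        passed, message, details = func(data, **kwargs)
    except Exception as exc:  # report a crashing rule as a failure and keep going
        return {
            "validation": name,
            "success": False,
            "message": f"Validation raised {type(exc).__name__}: {exc}",
            "details": {},
        }
    return {"validation": name, "success": bool(passed), "message": message, "details": details}


def charge_payment_rows(
    rows: List[Dict[str, float]], min_valid: float = 0.95, **kwargs: Any
) -> ValidationResult:
    """Plain-Python analogue of validate_charge_payment (hypothetical)."""
    valid = sum(1 for r in rows if r["charge_amount"] >= r["payment_amount"])
    pct = valid / len(rows) if rows else 1.0
    message = f"{pct * 100:.2f}% have valid charge/payment relationship"
    return pct > min_valid, message, {"invalid_count": len(rows) - valid}


claims = [
    {"charge_amount": 100.0, "payment_amount": 80.0},
    {"charge_amount": 50.0, "payment_amount": 60.0},
]
result = run_custom_validation("charge_payment_logic", charge_payment_rows, claims)
print(result["success"], result["details"])  # False {'invalid_count': 1}
```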

6. Reusable Rule Sets

from emrvalidator import RuleSet

# Create custom rule set
financial_rules = RuleSet("Financial Validations")

def validate_positive_charges(df, **kwargs):
    valid = (df['charge_amount'] > 0).sum() / len(df)
    passed = valid > 0.98
    return passed, f"Positive charges: {valid*100:.1f}%", {}

financial_rules.create_rule(
    "positive_charges",
    "All charges must be positive",
    validate_positive_charges
)

# Apply to any dataset
results = financial_rules.execute_all(df)
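A rule set is essentially a named, ordered collection of rules that can be run against any dataset. The dataclass-based sketch below is a hypothetical stand-in (`MiniRuleSet` is not the library's `RuleSet`) that mirrors the `create_rule(name, description, func)` and `execute_all(data)` calls shown above, using lists of dictionaries instead of a DataFrame. Returning `self` from `create_rule` to allow chaining is a choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

RuleFunc = Callable[..., Tuple[bool, str, Dict[str, Any]]]


@dataclass
class Rule:
    name: str
    description: str
    func: RuleFunc


@dataclass
class MiniRuleSet:
    """Hypothetical stand-in for RuleSet: a named, ordered list of rules."""

    name: str
    rules: List[Rule] = field(default_factory=list)

    def create_rule(self, name: str, description: str, func: RuleFunc) -> "MiniRuleSet":
        self.rules.append(Rule(name, description, func))
        return self  # returning self allows chaining (a choice of this sketch)

    def execute_all(self, data: Any, **kwargs: Any) -> List[Dict[str, Any]]:
        results = []
        for rule in self.rules:
            passed, message, details = rule.func(data, **kwargs)
            results.append({
                "rule": rule.name,
                "description": rule.description,
                "success": bool(passed),
                "message": message,
                "details": details,
            })
        return results


def positive_charges(rows: List[Dict[str, float]], **kwargs: Any) -> Tuple[bool, str, Dict[str, Any]]:
    share = sum(1 for r in rows if r["charge_amount"] > 0) / len(rows) if rows else 1.0
    return share > 0.98, f"Positive charges: {share * 100:.1f}%", {}


financial = MiniRuleSet("Financial Validations").create_rule(
    "positive_charges", "All charges must be positive", positive_charges
)
for outcome in financial.execute_all([{"charge_amount": 120.0}, {"charge_amount": -5.0}]):
    print(outcome["rule"], outcome["success"], outcome["message"])
# positive_charges False Positive charges: 50.0%
```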

7. Expectations API

from emrvalidator import Expectation, ExpectationSuite

suite = ExpectationSuite("Data Quality Expectations")

(suite
    .expect("mrn_exists", Expectation.column_to_exist('mrn'))
    .expect("mrn_not_null", Expectation.column_values_to_not_be_null('mrn'))
    .expect("valid_gender", Expectation.column_values_to_be_in_set('gender', {'M', 'F'}))
    .expect("unique_patients", Expectation.column_values_to_be_unique('patient_id'))
)

results = suite.validate(df)
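The `Expectation.column_*` helpers read like factories that capture their arguments and return a reusable check. A hypothetical closure-based sketch of that pattern follows; `MiniSuite` and the factory functions are illustrative names, not EMRValidator's implementation, and each check here returns a `(success, message)` pair over a list of row dictionaries.

```python
from typing import Any, Callable, Dict, Hashable, List, Set, Tuple

Row = Dict[str, Any]
Check = Callable[[List[Row]], Tuple[bool, str]]


def column_to_exist(col: str) -> Check:
    # Hypothetical rule: the column exists if any row carries the key.
    def check(rows: List[Row]) -> Tuple[bool, str]:
        found = any(col in row for row in rows)
        return found, f"column '{col}' {'found' if found else 'missing'}"
    return check


def column_values_to_be_in_set(col: str, allowed: Set[Hashable]) -> Check:
    def check(rows: List[Row]) -> Tuple[bool, str]:
        # Missing or None values count as outside the allowed set.
        bad = sum(1 for row in rows if row.get(col) not in allowed)
        return bad == 0, f"{bad} value(s) of '{col}' outside the allowed set"
    return check


def column_values_to_be_unique(col: str) -> Check:
    def check(rows: List[Row]) -> Tuple[bool, str]:
        values = [row.get(col) for row in rows]
        dupes = len(values) - len(set(values))
        return dupes == 0, f"{dupes} duplicate value(s) in '{col}'"
    return check


class MiniSuite:
    """Hypothetical suite: collects named checks and runs them in order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._checks: List[Tuple[str, Check]] = []

    def expect(self, name: str, check: Check) -> "MiniSuite":
        self._checks.append((name, check))
        return self  # allows the chained style used above

    def validate(self, rows: List[Row]) -> Dict[str, Tuple[bool, str]]:
        return {name: check(rows) for name, check in self._checks}


suite = (MiniSuite("Data Quality Expectations")
         .expect("mrn_exists", column_to_exist("mrn"))
         .expect("valid_gender", column_values_to_be_in_set("gender", {"M", "F"}))
         .expect("unique_patients", column_values_to_be_unique("patient_id")))
rows = [
    {"mrn": "A1", "gender": "M", "patient_id": 1},
    {"mrn": "A2", "gender": "X", "patient_id": 1},
]
for name, (ok, message) in suite.validate(rows).items():
    print(name, ok, message)
```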

🎯 Use Cases

Healthcare Analytics

Patient demographics validation
Claims data quality checks
Clinical data validation
Revenue cycle management
Denial management analysis

Data Engineering

ETL pipeline validation
Data warehouse quality checks
Real-time data validation
Data migration validation

Business Intelligence

Report data quality
Dashboard data validation
KPI data integrity
Automated quality monitoring

📊 Real-World Example

from emrvalidator import DataValidator, DataProfiler, HTMLReporter
import pandas as pd

# 1. Load data
df = pd.read_csv('patient_encounters.csv')

# 2. Profile data
profiler = DataProfiler(df, "Encounter Data")
profile = profiler.generate_profile()
print(f"Quality Score: {profile['quality_summary']['quality_score']}/100")

# 3. Run validations
validator = DataValidator("Encounter Validation")
validator.load_data(df)

(validator
    .expect_column_exists('mrn')
    .expect_column_exists('encounter_id')
    .expect_column_not_null('admission_date', threshold=1.0)
    .expect_column_not_null('discharge_date', threshold=1.0)
    .expect_mrn_format('mrn')
    .expect_icd_format('primary_diagnosis', version=10)
    .expect_column_values_between('length_of_stay', 0, 365)
)

# 4. Generate report
results = validator.get_results()
HTMLReporter(results).generate('encounter_quality_report.html')

# 5. Check status
if validator.is_valid():
    print("✓ Data quality check passed!")
else:
    print(f"⚠️  {len(validator.get_failed_validations())} validations failed")
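The example validates a precomputed `length_of_stay` column. If that column has to be derived, one hedged approach is to subtract ISO-format admission and discharge dates with the standard library and treat unparseable pairs as invalid; `length_of_stay` and `los_in_range` are hypothetical helpers, and `YYYY-MM-DD` input is an assumption. A negative stay, which signals a discharge recorded before admission, then fails the same 0-365 range check.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def length_of_stay(admission: Optional[str], discharge: Optional[str]) -> Optional[int]:
    """Hypothetical helper: whole days between two ISO dates, None if unparseable."""
    try:
        return (date.fromisoformat(discharge) - date.fromisoformat(admission)).days
    except (TypeError, ValueError):
        return None


def los_in_range(
    pairs: Iterable[Tuple[Optional[str], Optional[str]]], low: int = 0, high: int = 365
) -> float:
    """Share of encounters whose derived stay lies within [low, high]."""
    stays = [length_of_stay(adm, dis) for adm, dis in pairs]
    valid = [s for s in stays if s is not None and low <= s <= high]
    return len(valid) / len(stays) if stays else 1.0


encounters = [
    ("2024-03-01", "2024-03-05"),  # 4-day stay
    ("2024-03-10", "2024-03-08"),  # discharge before admission
    ("2024-01-01", None),          # missing discharge date
]
print([length_of_stay(a, d) for a, d in encounters])  # [4, -2, None]
print(f"{los_in_range(encounters):.2%} of stays within 0-365 days")
```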

📦 Package Structure

emrvalidator/
├── __init__.py          # Package initialization
├── validator.py         # DataValidator class
├── profiler.py          # DataProfiler class
├── reporters.py         # Report generators
├── rules.py             # Rules and expectations
└── py.typed             # Type hints marker

examples/
├── basic_usage.py       # Comprehensive examples
└── healthcare_specific.py

tests/
├── test_validator.py
├── test_profiler.py
└── test_reporters.py

🔧 Development

Setup Development Environment

# Clone repository
git clone https://github.com/rohandesai007/EMRV.git
cd EMRV

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

Run Tests

pytest

Run Tests with Coverage

pytest --cov=emrvalidator --cov-report=html

Code Formatting

# Format code
black emrvalidator tests

# Sort imports
isort emrvalidator tests

# Check with flake8
flake8 emrvalidator tests

📝 Documentation

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.

How to Contribute

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Make your changes
Add tests for your changes
Run tests (pytest)
Commit your changes (git commit -m 'Add AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Authors & Contributors

Rohan Desai & Vaishnavi Sanjay Gadve

Rohan Desai

Dallas, Texas, USA
Email: rohan.acme@gmail.com
GitHub: https://github.com/rohandesai007
LinkedIn: https://www.linkedin.com/in/rohandesai07/

Vaishnavi Sanjay Gadve

Irving, Texas, USA
Email: vaishnavigadve143@gmail.com
GitHub: https://github.com/vaish2412
LinkedIn: https://www.linkedin.com/in/vaishnavi-gadve-4b577512a/

Acknowledgments

Created by Healthcare Analytics Hub
Inspired by the need for simpler, healthcare-focused data validation
Built for the healthcare analytics community

📧 Contact & Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: rohan.acme@gmail.com

🌟 Star History

If you find EMRValidator useful, please consider giving it a star on GitHub!

📈 Roadmap

Additional healthcare-specific validators (CPT, NDC codes)
FHIR data validation support
Integration with popular ETL tools
Cloud storage support (S3, Azure Blob)
Real-time validation streaming
Web UI for non-technical users
Validation rule marketplace

💡 Citation

If you use EMRValidator in your research or project, please cite:

@software{emrvalidator2024,
  title = {EMRValidator: Healthcare-Focused Data Quality and Validation},
  author = {Desai, Rohan and Gadve, Vaishnavi Sanjay},
  year = {2024},
  url = {https://github.com/rohandesai007/EMRV}
}

Made for Healthcare Data Professionals by Rohan Desai & Vaishnavi Sanjay Gadve


Release history

1.0.2: Jan 26, 2026
1.0.1 (this version): Nov 18, 2025
1.0.0: Nov 18, 2025

Download files

Source distribution: emrvalidator-1.0.1.tar.gz (30.7 kB), uploaded Nov 18, 2025

Built distribution: emrvalidator-1.0.1-py3-none-any.whl (20.1 kB, Python 3), uploaded Nov 18, 2025

File details

Details for the file emrvalidator-1.0.1.tar.gz.

File metadata

Download URL: emrvalidator-1.0.1.tar.gz
Upload date: Nov 18, 2025
Size: 30.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for emrvalidator-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`1e4f71dbebe8ecfd0094e28c275b063e42de3ea3577f1021c4e0392b14f07552`
MD5	`d2b2c7ad5fa07039663ab96812b409da`
BLAKE2b-256	`3078045132a1a9d2cb37732983dbf02da01f893a62ac6bca57354388fc8927c2`

File details

Details for the file emrvalidator-1.0.1-py3-none-any.whl.

File metadata

Download URL: emrvalidator-1.0.1-py3-none-any.whl
Upload date: Nov 18, 2025
Size: 20.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for emrvalidator-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`41e6fee4df0b205259c464c9c634de78cacdfbb443970ec2d1a6cd53bec54401`
MD5	`5e356d2899fe9ff38dec9de7f653ad5c`
BLAKE2b-256	`f89b46fff50530ff9ff651eaf0108a4db4772fecfe79ce36427ccebe5b17e0cf`
