Professional Data Science Library for ML Engineers and Researchers

91life Data Science Library

Overview

The 91life Data Science Library is a professional, production-ready Python library designed for ML engineers and researchers at 91.life. It provides comprehensive tools for data loading, exploration, feature selection, preprocessing, visualization, and automated reporting.

Key Features

  • Async Data Loading: Support for multiple formats (CSV, Parquet, JSON, Excel) with cloud storage integration (AWS S3, Google Cloud, MinIO)
  • Comprehensive Data Exploration: Automated data quality assessment, missing data analysis, and statistical profiling
  • Advanced Feature Selection: Multiple methods including variance, correlation, mutual information, tree-based, L1 regularization, and consensus selection
  • Data Preprocessing: Complete pipeline for missing value handling, outlier treatment, scaling, encoding, and class imbalance
  • Rich Visualizations: Interactive plots with Plotly, static plots with Matplotlib/Seaborn, and automated dashboards
  • Automated Reporting: Integration with YData Profiling and Sweetviz, plus custom HTML/JSON reports
  • Clean Architecture: Domain-Driven Design (DDD) patterns with comprehensive logging and error handling
  • Performance Optimized: Memory-efficient chunked processing for large datasets
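
The chunked-processing idea behind the last bullet can be sketched with the standard library alone. The file is read in fixed-size batches of rows and running totals are updated, so memory use depends on the chunk size rather than the file size. This is a generic illustration, not the library's implementation; `iter_chunks` and `chunked_column_mean` are hypothetical names. (In pandas, `read_csv(..., chunksize=N)` provides the same batching over DataFrames.)

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TextIO


def iter_chunks(rows: Iterable[Dict[str, str]], chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most `chunk_size` rows (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_column_mean(stream: TextIO, column: str, chunk_size: int = 10_000) -> float:
    """Mean of a numeric CSV column, holding only one chunk of rows at a time.

    Empty cells are treated as missing and skipped.
    """
    total = 0.0
    count = 0
    reader = csv.DictReader(stream)  # streams rows lazily from the file object
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            value = row[column]
            if value != "":
                total += float(value)
                count += 1
    if count == 0:
        raise ValueError(f"column {column!r} has no non-empty values")
    return total / count


if __name__ == "__main__":
    sample = io.StringIO("age,target\n30,1\n,0\n50,1\n40,0\n")
    print(chunked_column_mean(sample, "age", chunk_size=2))  # 40.0
```

For a real file, pass `open("your_data.csv", newline="")` instead of the in-memory sample.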

Installation

Basic Installation

pip install 91life-ds-lib

With Cloud Storage Support

pip install "91life-ds-lib[cloud]"

With Profiling Tools

pip install "91life-ds-lib[profiling]"

Development Installation

git clone https://github.com/91life/91life-ds-lib.git
cd 91life-ds-lib
pip install -e ".[dev]"

Quickstart

from ninetyone_life_ds import DataLoader, DataExplorer, FeatureSelector

# Load data efficiently
loader = DataLoader()
data = loader.load_dataset('your_data.csv')

# Explore data comprehensively
explorer = DataExplorer()
basic_info = explorer.analyze_basic_info(data)
missing_analysis = explorer.analyze_missing_data(data)
readiness_score = explorer.calculate_data_readiness_score(data)

# Select features using consensus method
selector = FeatureSelector()
selected_features = selector.consensus_feature_selection(
    data, 
    target_col='target',
    task_type='classification'
)

print(f"Data readiness: {readiness_score['overall_readiness']}/100")
print(f"Selected features: {len(selected_features['selected_features'])}")
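
The quickstart's `consensus_feature_selection` combines several selection methods, but this page does not document how their results are merged. As a rough sketch assuming a simple voting rule, a feature is kept when at least `min_votes` methods selected it. `consensus_select` and the feature names below are hypothetical and not part of the library's API.

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_select(method_results: Dict[str, Set[str]], min_votes: int) -> List[str]:
    """Keep features chosen by at least `min_votes` methods (hypothetical voting rule).

    Args:
        method_results: Mapping from method name to the set of features it selected.
        min_votes: Minimum number of methods that must agree on a feature.

    Returns:
        Selected feature names, sorted by descending vote count, then by name.
    """
    votes = Counter()
    for features in method_results.values():
        votes.update(features)  # one vote per method per feature
    selected = [name for name, n in votes.items() if n >= min_votes]
    return sorted(selected, key=lambda name: (-votes[name], name))


# Hypothetical per-method selections for a classification task.
results = {
    "variance": {"age", "bmi", "heart_rate"},
    "mutual_information": {"age", "heart_rate", "glucose"},
    "tree_based": {"age", "glucose", "heart_rate"},
}
print(consensus_select(results, min_votes=2))  # ['age', 'heart_rate', 'glucose']
```

Raising `min_votes` trades recall for stability: with `min_votes=3`, only features every method agrees on survive.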

Full Example

See examples/complete_workflow.py for a comprehensive demonstration of all library capabilities.

API Overview

Main Classes

  • DataLoader: Loads data from various sources and formats, with cloud storage support
  • DataExplorer: Performs data exploration and quality assessment
  • FeatureSelector: Implements multiple feature selection algorithms
  • DataPreprocessor: Provides a complete data preprocessing pipeline
  • Visualizer: Creates interactive and static visualizations
  • ReportGenerator: Generates analysis reports and data profiles
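
The `DataPreprocessor` is described as a complete pipeline for missing values, outliers, and scaling, but its API is not shown here. The sketch below illustrates the general pattern on a single numeric column using only the standard library: median imputation, then clipping to Tukey's IQR fences, then min-max scaling, chained in order. The step functions, their order, and the choice of methods are assumptions for illustration, not the library's actual behavior.

```python
from statistics import median, quantiles
from typing import Callable, List, Optional, Sequence


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)  # raises StatisticsError if every value is missing
    return [fill if v is None else v for v in values]


def clip_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Clip outliers to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def run_pipeline(values: list, steps: Sequence[Callable[[list], list]]) -> list:
    """Apply each step in order, feeding one step's output into the next."""
    for step in steps:
        values = step(values)
    return values


column = [-1.0, 2.0, None, 4.0, 100.0]
print(run_pipeline(column, [impute_median, clip_iqr, min_max_scale]))
# [0.0, 0.375, 0.5, 0.625, 1.0]
```

Order matters here: imputing before clipping lets the fill value take part in the quartile estimate, and clipping before scaling keeps a single outlier (100.0) from compressing every other value toward zero.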

Development Setup

Prerequisites

  • Python 3.8+
  • pip or conda

Setup

# Clone repository
git clone https://github.com/91life/91life-ds-lib.git
cd 91life-ds-lib

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
flake8 src/ tests/

# Format code
black src/ tests/

# Type checking
mypy src/

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src/ninetyone_life_ds --cov-report=html

# Run specific test file
pytest tests/test_data_explorer.py -v

Contributing Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests and linting (pytest && flake8 src/ tests/)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Code Style

  • Follow PEP 8 guidelines
  • Use type hints for all functions
  • Write comprehensive docstrings (Google style)
  • Ensure all tests pass
  • Maintain test coverage above 90%
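
As an illustration of these conventions (type hints on every function, a Google-style docstring with `Args`, `Returns`, and `Raises` sections, and a matching pytest test), here is a small helper. `missing_ratios` is hypothetical and not part of the library's API.

```python
from typing import Dict, List, Optional


def missing_ratios(table: Dict[str, List[Optional[float]]]) -> Dict[str, float]:
    """Compute the fraction of missing (None) values in each column.

    Args:
        table: Mapping from column name to that column's values.

    Returns:
        Mapping from column name to its missing-value ratio in [0, 1].

    Raises:
        ValueError: If any column is empty.
    """
    ratios: Dict[str, float] = {}
    for name, values in table.items():
        if not values:
            raise ValueError(f"column {name!r} is empty")
        ratios[name] = sum(v is None for v in values) / len(values)
    return ratios


def test_missing_ratios() -> None:
    """missing_ratios counts None entries per column."""
    result = missing_ratios({"age": [30.0, None, 50.0, None], "bmi": [21.5, 24.0, 22.1, 23.3]})
    assert result == {"age": 0.5, "bmi": 0.0}
```

Saved under `tests/`, pytest collects `test_missing_ratios` automatically because of its `test_` prefix.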

License

This project is licensed under the 91Life License - see the LICENSE file for details.

Contact

Company Insights

91.life is a technology company focused on data science and machine learning solutions. The company provides professional tools and services for data analysis, with a focus on healthcare and life sciences applications.

For more information about 91.life's services and team, visit https://91.life.


Download files

Source Distribution

91life_ds_lib-1.0.0.tar.gz (76.7 kB)

Built Distribution

91life_ds_lib-1.0.0-py3-none-any.whl (46.2 kB)

File details

Details for the file 91life_ds_lib-1.0.0.tar.gz.

File metadata

  • Download URL: 91life_ds_lib-1.0.0.tar.gz
  • Size: 76.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for 91life_ds_lib-1.0.0.tar.gz
Algorithm Hash digest
SHA256 05ec0e48bd316822e8f771ec7dd1b3c77b410bad9f2c96b1a064e067af23187a
MD5 f87a4e2e1b230ae23d3cc6a2540e05e5
BLAKE2b-256 9acb47ca2ac9d12f70ef1cd74c833405418edcc34f92c8eec59b39b07e707b7e

File details

Details for the file 91life_ds_lib-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: 91life_ds_lib-1.0.0-py3-none-any.whl
  • Size: 46.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for 91life_ds_lib-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6e018d4c0393e495fa395e2e02992771fa690582e288fe829b329e5f75d45708
MD5 6a4e410a1e97b18ad1bf400e3746a415
BLAKE2b-256 094179684673eb59a537bb361e80c46962a97afb9db0b7eec71953765f81ba8a
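
To check a manually downloaded file against the digests listed above, compute its SHA256 locally and compare. This is a minimal sketch using `hashlib`; it assumes the wheel was saved to the current directory under its published file name, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for the 1.0.0 wheel.
EXPECTED_WHEEL_SHA256 = "6e018d4c0393e495fa395e2e02992771fa690582e288fe829b329e5f75d45708"


def sha256_of(path: Path, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published value (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path("91life_ds_lib-1.0.0-py3-none-any.whl")  # assumed download location
    if wheel.exists():
        print("hash OK" if verify_download(wheel, EXPECTED_WHEEL_SHA256) else "hash MISMATCH")
```

When installing with pip, hash-checking mode does the same check automatically: a requirements line such as `91life-ds-lib==1.0.0 --hash=sha256:<digest>` (listing the digests above) combined with `pip install --require-hashes -r requirements.txt` rejects any file whose hash does not match.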
