Professional Data Science Library for ML Engineers and Researchers
91life Data Science Library
Overview
The 91life Data Science Library is a professional, production-ready Python library designed for ML engineers and researchers at 91.life. It provides comprehensive tools for data loading, exploration, feature selection, preprocessing, visualization, and automated reporting.
Key Features
- Async Data Loading: Support for multiple formats (CSV, Parquet, JSON, Excel) with cloud storage integration (AWS S3, Google Cloud, MinIO)
- Comprehensive Data Exploration: Automated data quality assessment, missing data analysis, and statistical profiling
- Advanced Feature Selection: Multiple methods including variance, correlation, mutual information, tree-based, L1 regularization, and consensus selection
- Data Preprocessing: Complete pipeline for missing-value handling, outlier treatment, scaling, encoding, and class-imbalance handling
- Rich Visualizations: Interactive plots with Plotly, static plots with Matplotlib/Seaborn, and automated dashboards
- Automated Reporting: Integration with YData Profiling and Sweetviz, plus custom HTML/JSON reports
- Clean Architecture: Domain-Driven Design (DDD) patterns with comprehensive logging and error handling
- Performance Optimized: Memory-efficient chunked processing for large datasets
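The chunked-processing idea in the last bullet can be illustrated with a minimal sketch. It uses only the Python standard library rather than this library's own API, which is not documented here; the helper names `iter_chunks` and `count_missing` are hypothetical, and treating empty cells as missing is an assumption.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_size` rows from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def count_missing(handle: TextIO, chunk_size: int = 1000) -> Dict[str, int]:
    """Count empty cells per column while holding only one chunk in memory."""
    missing: Dict[str, int] = {}
    for chunk in iter_chunks(handle, chunk_size):
        for row in chunk:
            for column, value in row.items():
                if column is None:
                    continue  # surplus cells beyond the header; ignored in this sketch
                # Assumption: empty or whitespace-only cells count as missing.
                if value is None or value.strip() == "":
                    missing[column] = missing.get(column, 0) + 1
                else:
                    missing.setdefault(column, 0)
    return missing


# Small in-memory CSV standing in for a large file on disk.
sample = io.StringIO("age,income\n34,52000\n,61000\n29,\n,\n")
missing_counts = count_missing(sample, chunk_size=2)
print(missing_counts)
```

The same pattern is available in pandas through the `chunksize` parameter of `read_csv`, which returns an iterator of DataFrames instead of loading the whole file at once.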
Installation
Basic Installation
pip install 91life-ds-lib
With Cloud Storage Support
pip install "91life-ds-lib[cloud]"
With Profiling Tools
pip install "91life-ds-lib[profiling]"
Development Installation
git clone https://github.com/91life/91life-ds-lib.git
cd 91life-ds-lib
pip install -e ".[dev]"
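After any of the installs above, one way to confirm that the distribution is visible to the current interpreter is to query its metadata. The sketch below uses the standard-library `importlib.metadata` module (available from Python 3.8); the helper name `installed_version` is hypothetical and not part of this library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("91life-ds-lib"))
```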
Quickstart
from ninetyone_life_ds import DataLoader, DataExplorer, FeatureSelector
# Load data efficiently
loader = DataLoader()
data = loader.load_dataset('your_data.csv')
# Explore data comprehensively
explorer = DataExplorer()
basic_info = explorer.analyze_basic_info(data)
missing_analysis = explorer.analyze_missing_data(data)
readiness_score = explorer.calculate_data_readiness_score(data)
# Select features using consensus method
selector = FeatureSelector()
selected_features = selector.consensus_feature_selection(
    data,
    target_col='target',
    task_type='classification'
)
print(f"Data readiness: {readiness_score['overall_readiness']}/100")
print(f"Selected features: {len(selected_features['selected_features'])}")
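The Quickstart does not show how the consensus method combines the individual selectors. One plausible reading, given here only as a sketch under stated assumptions, is a simple vote: each method proposes a set of features, and a feature is kept when enough methods agree. The function `consensus_select`, the `min_votes` threshold, and the feature names are all hypothetical; the library's actual rule may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List


def consensus_select(votes: Dict[str, Iterable[str]], min_votes: int = 2) -> List[str]:
    """Return features picked by at least `min_votes` methods, sorted by name."""
    counts: Counter = Counter()
    for features in votes.values():
        counts.update(set(features))  # each method votes at most once per feature
    return sorted(name for name, n in counts.items() if n >= min_votes)


# Hypothetical output of three selection methods on the same dataset.
method_votes = {
    "variance": ["age", "income", "tenure"],
    "mutual_information": ["age", "tenure", "region"],
    "tree_based": ["age", "income", "clicks"],
}
consensus = consensus_select(method_votes, min_votes=2)
print(consensus)  # ['age', 'income', 'tenure']
```

Raising `min_votes` makes the selection stricter: with `min_votes=3`, only `age` survives in this example.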
Full Example
See examples/complete_workflow.py for a comprehensive demonstration of all library capabilities.
API Overview
Core Modules
- DataLoader: Efficient data loading with cloud storage support
- DataExplorer: Comprehensive data exploration and quality assessment
- FeatureSelector: Advanced feature selection with multiple algorithms
- DataPreprocessor: Complete preprocessing pipeline
- Visualizer: Rich visualizations and interactive plots
- ReportGenerator: Automated report generation and profiling
Main Classes
- DataLoader: Handles data loading from various sources and formats
- DataExplorer: Performs comprehensive data analysis and quality assessment
- FeatureSelector: Implements multiple feature selection algorithms
- DataPreprocessor: Provides complete data preprocessing pipeline
- Visualizer: Creates professional visualizations and plots
- ReportGenerator: Generates comprehensive analysis reports
Development Setup
Prerequisites
- Python 3.8+
- pip or conda
Setup
# Clone repository
git clone https://github.com/91life/91life-ds-lib.git
cd 91life-ds-lib
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
flake8 src/ tests/
# Format code
black src/ tests/
# Type checking
mypy src/
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=src/ninetyone_life_ds --cov-report=html
# Run specific test file
pytest tests/test_data_explorer.py -v
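For contributors new to pytest, a test file is a module of plain functions whose names start with `test_` and that use bare `assert` statements. The sketch below shows that shape; `clip_outliers` is a hypothetical helper defined inline for illustration and is not a function from this library.

```python
from typing import List


def clip_outliers(values: List[float], lower: float, upper: float) -> List[float]:
    """Hypothetical helper: clamp each value into the range [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return [min(max(v, lower), upper) for v in values]


def test_clip_outliers_clamps_both_tails() -> None:
    # Values below the lower bound and above the upper bound are clamped.
    assert clip_outliers([-5.0, 0.5, 12.0], lower=0.0, upper=10.0) == [0.0, 0.5, 10.0]


def test_clip_outliers_rejects_inverted_bounds() -> None:
    # Inverted bounds are a caller error and should raise ValueError.
    try:
        clip_outliers([1.0], lower=2.0, upper=1.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for inverted bounds")
```

Saved under tests/, functions like these are collected automatically by the pytest command shown above.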
Contributing Guidelines
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests and linting (pytest && flake8 src/ tests/)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Code Style
- Follow PEP 8 guidelines
- Use type hints for all functions
- Write comprehensive docstrings (Google style)
- Ensure all tests pass
- Maintain test coverage above 90%
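As an illustration of the type-hint and docstring rules above, a function written to these guidelines might look like the following. The function `missing_fraction` is a hypothetical example, not taken from the library's source.

```python
from typing import Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Compute the fraction of missing entries in a sequence.

    Args:
        values: Observations, where ``None`` marks a missing entry.

    Returns:
        The share of entries that are ``None``, between 0.0 and 1.0.

    Raises:
        ValueError: If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is None for v in values) / len(values)


print(missing_fraction([1.0, None, 3.0, None]))  # 0.5
```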
License
This project is licensed under the 91Life License - see the LICENSE file for details.
Contact
- Company: 91.life
- Author: Shpat Dobraj
- Email: shpatdobraj@91.life
- Issues: GitHub Issues
Company Insights
91.life is a technology company focused on data science and machine learning solutions. The company provides professional tools and services for data analysis, with a focus on healthcare and life sciences applications.
For more information about 91.life's services and team, visit https://91.life.
Download files
File details
Details for the file 91life_ds_lib-1.0.0.tar.gz.
File metadata
- Download URL: 91life_ds_lib-1.0.0.tar.gz
- Upload date:
- Size: 76.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05ec0e48bd316822e8f771ec7dd1b3c77b410bad9f2c96b1a064e067af23187a |
| MD5 | f87a4e2e1b230ae23d3cc6a2540e05e5 |
| BLAKE2b-256 | 9acb47ca2ac9d12f70ef1cd74c833405418edcc34f92c8eec59b39b07e707b7e |
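A downloaded file can be checked against the SHA256 digest listed for it. The following is a minimal sketch using the standard-library `hashlib` module; the helper names `sha256_of` and `matches_digest` are hypothetical, and the file is assumed to be in the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call (assumes the source distribution was downloaded locally):
# matches_digest(
#     Path("91life_ds_lib-1.0.0.tar.gz"),
#     "05ec0e48bd316822e8f771ec7dd1b3c77b410bad9f2c96b1a064e067af23187a",
# )
```

pip can perform the same check at install time when a requirements file pins hashes and the install is run with `--require-hashes`.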
File details
Details for the file 91life_ds_lib-1.0.0-py3-none-any.whl.
File metadata
- Download URL: 91life_ds_lib-1.0.0-py3-none-any.whl
- Upload date:
- Size: 46.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e018d4c0393e495fa395e2e02992771fa690582e288fe829b329e5f75d45708 |
| MD5 | 6a4e410a1e97b18ad1bf400e3746a415 |
| BLAKE2b-256 | 094179684673eb59a537bb361e80c46962a97afb9db0b7eec71953765f81ba8a |