

Dexter Toolkit

Data Experimentation and Tinkering Kit

A comprehensive Python toolkit for data science, machine learning, optimization, simulation, and visualization experiments.

Python 3.8+ License: MIT Code style: black

Overview

Dexter is a modular toolkit designed for rapid prototyping and experimentation in data science and related fields. It provides a collection of specialized modules for different aspects of data analysis, machine learning, optimization, and visualization.

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/DenizK00/Dexter.git
cd Dexter

# Install in development mode
make install-dev

# Or install manually
python install_dev.py

Basic Usage

import dexter

# Machine Learning
from dexter import pick_classifier
import pandas as pd

df = pd.read_csv('your_data.csv')
best_model = pick_classifier(df, target='target_column')

# Optimization
from dexter import Problem

objective = "min 2*x + 3*y"
constraints = ["x + y >= 10", "x >= 0", "y >= 0"]
problem = Problem(objective, constraints)
solution = problem.solve()

# Statistics
from dexter import Normal, Uniform

normal_dist = Normal(mean=0, var=1)
rv = normal_dist.draw()

# Simulation
from dexter import SimManager
import simpy

env = simpy.Environment()
sim = SimManager(env)
sim.run(until=100)

📦 Package Structure

dexter/
├── src/dexter/                     # Main package
│   ├── __init__.py                 # Package initialization
│   ├── board/                      # Interactive data dashboard
│   ├── core/                       # Core pipeline utilities
│   ├── data_wrangling/             # Data transformation tools
│   ├── environment/                # Environment simulation
│   ├── language/                   # Language processing
│   ├── ml/                         # Machine learning
│   ├── optimization/               # Mathematical optimization
│   ├── simulation/                 # Discrete event simulation
│   ├── stats/                      # Statistical analysis
│   └── visualization/              # Visualization tools
├── tests/                          # Test suite
├── docs/                           # Documentation
├── examples/                       # Usage examples
└── scripts/                        # Utility scripts

🎯 Modules

🧠 ML - Machine Learning

  • Auto Model Selection: Automated classifier selection with hyperparameter optimization
  • Model Comparison: Cross-validation and performance metrics comparison
  • Hyperopt Integration: Bayesian optimization for hyperparameter tuning
  • Binary/Multiclass Support: Handles both binary and multiclass classification tasks
from dexter.ml import pick_classifier
import pandas as pd

df = pd.read_csv('your_data.csv')

# Automatically find the best classifier
best_model = pick_classifier(df, target='target_column', mode='extensive')
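The internals of `pick_classifier` are not described here. As a rough illustration of the "Model Comparison" idea, the stdlib-only sketch below scores each candidate with k-fold cross-validation and keeps the one with the highest mean accuracy. The candidate models (`MajorityClass`, `NearestNeighbor`), the helper names, and the toy data are all hypothetical; this is not Dexter's API, and it leaves out hyperparameter search entirely.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, to keep the sketch deterministic)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class MajorityClass:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label] * len(X)


class NearestNeighbor:
    """1-nearest-neighbour classifier using squared Euclidean distance."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        preds = []
        for row in X:
            dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self.X]
            preds.append(self.y[dists.index(min(dists))])
        return preds


def cv_accuracy(make_model, X, y, k=3):
    """Mean accuracy over k folds; a fresh model is trained for every fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        scores.append(sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return sum(scores) / len(scores)


def pick_best(candidates, X, y, k=3):
    """Score every candidate with cross-validation; return the winner and all scores."""
    scores = {name: cv_accuracy(make, X, y, k) for name, make in candidates.items()}
    return max(scores, key=scores.get), scores


X = [[0, 0], [5, 5], [0, 1], [5, 6], [1, 0], [6, 5]]
y = [0, 1, 0, 1, 0, 1]
best, scores = pick_best({"majority": MajorityClass, "nearest_neighbor": NearestNeighbor}, X, y)
print(best, scores)  # nearest_neighbor {'majority': 0.5, 'nearest_neighbor': 1.0}
```

Contiguous folds are used only to keep the example deterministic; with real, ordered data you would normally shuffle (or stratify) before splitting.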

⚡ Optimization - Mathematical Optimization

  • Mathematical Optimization: Linear and nonlinear optimization problems
  • Pyomo Integration: Mathematical modeling with Pyomo framework
  • Equation Parsing: Natural language equation parsing and conversion
  • Solution Management: Optimal solution extraction and evaluation
from dexter.optimization import Problem

# Define and solve optimization problem
problem = Problem("min 2*x + 3*y", ["x + y >= 10", "x >= 0", "y >= 0"])
solution = problem.solve()
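For the example problem, the answer can be checked by hand: with x, y ≥ 0 and both objective coefficients positive, the optimum lies at a corner of the feasible region, either (10, 0) with cost 20 or (0, 10) with cost 30, so the minimum is 20 at x = 10, y = 0. The sketch below automates that corner check for two-variable problems written as `a*x + b*y >= rhs`. `solve_2d_lp` is a hypothetical teaching helper, not part of Dexter, and brute-force vertex enumeration is no substitute for the Pyomo-based modelling described above.

```python
from itertools import combinations


def solve_2d_lp(c, constraints, tol=1e-9):
    """Minimise c[0]*x + c[1]*y subject to a*x + b*y >= rhs for every (a, b, rhs).

    Brute-force vertex enumeration: assumes the optimum is finite, so it is
    attained where two constraint boundaries intersect.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundaries never meet in a single point
        # Cramer's rule for the intersection of the two boundary lines
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        # Keep the corner only if it satisfies every constraint
        if all(a * x + b * y >= r - tol for a, b, r in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best  # None if no feasible corner exists


# min 2*x + 3*y  s.t.  x + y >= 10,  x >= 0,  y >= 0
print(solve_2d_lp((2, 3), [(1, 1, 10), (1, 0, 0), (0, 1, 0)]))  # (20.0, 10.0, 0.0)
```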

🎮 Simulation - Discrete Event Simulation

  • Discrete Event Simulation: Built on SimPy for event-driven simulations
  • Resource Management: Dynamic resource allocation and management
  • Process Control: Start, stop, and manage simulation processes
  • Step Mode: Step-by-step simulation execution for debugging
from dexter.simulation import SimManager
import simpy

env = simpy.Environment()
sim = SimManager(env)
sim.add_resource("service", simpy.Resource(env, capacity=2))
sim.run(until=100)
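SimPy does the scheduling for you; to show what a discrete-event loop and a capacity-limited resource do underneath, here is a stdlib-only sketch. Time jumps from one scheduled event to the next, and a resource with `capacity=2` serves two customers at once while the rest queue in arrival order. `EventLoop`, `Resource`, and `customer` are hypothetical names; this is neither SimPy's API nor Dexter's `SimManager`.

```python
import heapq
import itertools
from collections import deque


class EventLoop:
    """Minimal discrete-event loop: pending events are processed in time order."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._ids = itertools.count()  # tie-breaker keeps same-time events in FIFO order

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, next(self._ids), action))

    def run(self, until):
        # Jump straight from event to event; nothing happens in between
        while self._queue and self._queue[0][0] <= until:
            self.now, _, action = heapq.heappop(self._queue)
            action()
        self.now = until  # the clock ends at the horizon


class Resource:
    """Capacity-limited resource with a FIFO waiting line."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0
        self.waiting = deque()

    def request(self, on_granted):
        if self.in_use < self.capacity:
            self.in_use += 1
            on_granted()
        else:
            self.waiting.append(on_granted)

    def release(self):
        if self.waiting:
            self.waiting.popleft()()  # hand the freed slot straight to the next waiter
        else:
            self.in_use -= 1


def customer(loop, resource, name, service_time, log):
    def start_service():
        log.append((name, loop.now))  # record when service begins
        loop.schedule(service_time, resource.release)

    resource.request(start_service)


loop = EventLoop()
service = Resource(capacity=2)
starts = []
for i in range(4):
    # Customer i arrives at time i and needs 3 time units of service
    loop.schedule(i, lambda i=i: customer(loop, service, f"c{i}", 3, starts))
loop.run(until=100)
print(starts)  # [('c0', 0.0), ('c1', 1.0), ('c2', 3.0), ('c3', 4.0)]
```

Customers c2 and c3 arrive while both slots are busy, so they start only when c0 (at t = 3) and c1 (at t = 4) finish.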

📊 Stats - Statistical Analysis

  • Probability Distributions: Comprehensive distribution library
    • Normal, Uniform, Binomial, Geometric, Negative Binomial
    • Poisson, Exponential, Gamma, Chi-Square distributions
  • Random Variable Management: RV and Sample classes for statistical operations
  • Distribution Operations: Addition, multiplication, and transformation of distributions
from dexter.stats import Normal, Uniform, Binomial

# Create and work with distributions
normal_dist = Normal(mean=0, var=1)
uniform_dist = Uniform(a=0, b=1)
binomial_dist = Binomial(n=10, p=0.5)

# Generate random variables
rv = normal_dist.draw()
sample = uniform_dist.draw(n=100)
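Assuming `var` in `Normal(mean=0, var=1)` means the variance (the README does not say), the sketch below shows two details a distribution class has to get right: converting variance to a standard deviation before sampling, and the rule that the sum of independent normals is normal with summed means and summed variances. `NormalSketch` and its `rng` parameter are hypothetical and are not Dexter's `Normal` class.

```python
import math
import random


class NormalSketch:
    """Hypothetical normal distribution parameterised by mean and *variance*."""

    def __init__(self, mean=0.0, var=1.0):
        if var < 0:
            raise ValueError("variance must be non-negative")
        self.mean = mean
        self.var = var

    def draw(self, n=None, rng=random):
        """One value if n is None, otherwise a list of n independent values."""
        sigma = math.sqrt(self.var)  # random.gauss takes a standard deviation, not a variance
        if n is None:
            return rng.gauss(self.mean, sigma)
        return [rng.gauss(self.mean, sigma) for _ in range(n)]

    def __add__(self, other):
        # For *independent* normals, means add and variances add
        return NormalSketch(self.mean + other.mean, self.var + other.var)


total = NormalSketch(mean=0, var=1) + NormalSketch(mean=2, var=3)
print(total.mean, total.var)  # 2 4
sample = total.draw(n=5, rng=random.Random(42))  # seeded generator for reproducibility
```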

🎨 Visualization - Interactive Visualization

  • 3D Space Visualization: Interactive 3D plotting with Plotly
  • Vector Visualization: 3D vector representation and manipulation
  • Surface Plotting: 3D surface and mesh grid visualization
  • Interactive Plots: Web-based interactive visualizations
from dexter.visualization import Space

# Create 3D visualization space
space = Space(x_size=10, y_size=10, z_size=10)
space.add_vector([1, 2, 3], color='red')
space.show()
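Surface plots are drawn from a grid of z-values. The stdlib-only sketch below builds that grid the way `numpy.meshgrid` does with its default `'xy'` indexing (rows follow y, columns follow x) and evaluates a function on it. `meshgrid` and `surface` here are illustrative helpers, not Dexter's `Space` API.

```python
def meshgrid(xs, ys):
    """Pure-Python analogue of numpy.meshgrid(xs, ys) with its default 'xy' indexing."""
    X = [list(xs) for _ in ys]       # every row repeats the x coordinates
    Y = [[y] * len(xs) for y in ys]  # every row holds one constant y coordinate
    return X, Y


def surface(f, xs, ys):
    """Evaluate z = f(x, y) on the grid; rows follow ys and columns follow xs."""
    X, Y = meshgrid(xs, ys)
    return [[f(x, y) for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]


Z = surface(lambda x, y: x ** 2 + y ** 2, xs=[0, 1, 2], ys=[0, 1])
print(Z)  # [[0, 1, 4], [1, 2, 5]]
```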

🌍 Environment - Environment Simulation

  • Grid-based Environment: 2D grid system for agent-based simulations
  • Tkinter GUI: Interactive grid display with agent positioning
  • Agent Management: Place and track agents within the grid environment
from dexter.environment import Grid, GridApp

# Create grid environment
grid = Grid(nrows=10, ncolumns=10)
grid.set_agent(5, 5)
grid.set_cell(3, 3, '#')
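A grid environment like this can be modelled as a list of rows plus an agent position. The text-only sketch below assumes that the two coordinates passed to `set_agent` and `set_cell` are (row, column) and that `'#'` marks a blocked cell; the README confirms neither, and `TextGrid` is a hypothetical class, not Dexter's `Grid` or `GridApp`.

```python
class TextGrid:
    """Hypothetical text-only stand-in for a 2D grid environment."""

    EMPTY, AGENT, WALL = ".", "A", "#"

    def __init__(self, nrows, ncolumns):
        # Build each row separately so rows do not share one list object
        self.cells = [[self.EMPTY] * ncolumns for _ in range(nrows)]
        self.agent = None

    def set_cell(self, row, col, value):
        self.cells[row][col] = value

    def set_agent(self, row, col):
        if self.cells[row][col] == self.WALL:
            raise ValueError("cannot place the agent on a blocked cell")
        self.agent = (row, col)

    def render(self):
        """Return the grid as text, drawing the agent on top of its cell."""
        return "\n".join(
            "".join(self.AGENT if (r, c) == self.agent else v for c, v in enumerate(row))
            for r, row in enumerate(self.cells)
        )


grid = TextGrid(nrows=3, ncolumns=4)
grid.set_cell(1, 1, "#")
grid.set_agent(0, 2)
print(grid.render())
# ..A.
# .#..
# ....
```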

🎯 Board - Interactive Data Dashboard

  • Interactive Web Dashboard: Built with Dash and Bootstrap for data visualization
  • IPython Integration: Custom kernel management with Jupyter console integration
  • Real-time Data Viewing: Live data table updates and interactive components

🔧 Data Wrangling - Data Transformation

  • Data Modification: Tools for data transformation and manipulation
  • Diffusion Functions: Data diffusion and spreading utilities
  • Deviation Functions: Statistical deviation and error introduction
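The README does not define what the diffusion and deviation functions compute. Assuming "deviation" means perturbing values with random noise and "diffusion" means spreading values toward their neighbours, both ideas can be sketched in a few lines of standard-library Python. `add_gaussian_noise` and `diffuse` are hypothetical names, not Dexter functions.

```python
import random


def add_gaussian_noise(values, sigma, seed=None):
    """Return a new list with independent N(0, sigma**2) noise added to each value."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    rng = random.Random(seed)  # private generator: reproducible, leaves global state alone
    return [v + rng.gauss(0.0, sigma) for v in values]


def diffuse(values, rate):
    """One explicit step of 1D diffusion: each value relaxes toward its neighbours.

    The ends are treated as no-flux boundaries, so the total is conserved.
    """
    if not 0 <= rate <= 0.5:
        raise ValueError("rate must lie in [0, 0.5] for this explicit step to stay stable")
    last = len(values) - 1
    out = []
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else v  # at an edge, reuse the value itself (no flux)
        right = values[i + 1] if i < last else v
        out.append(v + rate * (left - 2 * v + right))
    return out


print(diffuse([0, 3, 0], rate=0.25))  # [0.75, 1.5, 0.75]
```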

🔄 Core - Pipeline Management

  • Modular Pipeline System: Extensible pipeline architecture
  • Process Chaining: Sequential process execution with result management
  • Step-by-step Execution: Individual step execution and monitoring
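A minimal sketch of process chaining with step-by-step execution, assuming each step is a one-argument function that receives the previous step's output. The `Pipeline` class, its `add`/`step`/`run`/`reset` methods, and the per-step `results` dictionary are hypothetical and not Dexter's core API.

```python
from typing import Any, Callable, Dict, List, Tuple


class Pipeline:
    """Hypothetical sequential pipeline: each step receives the previous step's result."""

    def __init__(self):
        self.steps: List[Tuple[str, Callable[[Any], Any]]] = []
        self.reset()

    def add(self, name: str, func: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append((name, func))
        return self  # returning self allows Pipeline().add(...).add(...)

    def reset(self, initial: Any = None) -> None:
        """Rewind to the first step; `initial` is fed to that step."""
        self.results: Dict[str, Any] = {}
        self._cursor = 0
        self._last = initial

    def step(self) -> Any:
        """Execute only the next pending step, store its result, and return it."""
        if self._cursor >= len(self.steps):
            raise RuntimeError("all steps have already run")
        name, func = self.steps[self._cursor]
        self._last = func(self._last)
        self.results[name] = self._last
        self._cursor += 1
        return self._last

    def run(self, initial: Any = None) -> Any:
        """Reset, then execute every step in order and return the final result."""
        self.reset(initial)
        while self._cursor < len(self.steps):
            self.step()
        return self._last


p = Pipeline().add("load", lambda _: [3, 1, 2]).add("sort", sorted).add("total", sum)
print(p.run())            # 6
print(p.results["sort"])  # [1, 2, 3]
```

Step mode is then a matter of calling `reset()` followed by `step()` repeatedly, inspecting `results` between calls.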

🤖 Language - Language Processing

  • Fine-tuning Framework: Tools for model fine-tuning and training
  • RAG Pipeline: Retrieval-Augmented Generation pipeline components
  • Chain Management: Modular chain-based processing architecture

🛠️ Development

Setup Development Environment

# Install in development mode
make install-dev

# Run tests
make test

# Run linting
make lint

# Format code
make format

# Run all checks
make check

Project Structure

dexter/
├── src/dexter/           # Source code
├── tests/                # Test suite
├── docs/                 # Documentation
├── examples/             # Usage examples
├── scripts/              # Utility scripts
├── pyproject.toml        # Project configuration
├── setup.py              # Setup script
├── Makefile              # Development tasks
├── install_dev.py        # Development installation
└── README.md             # This file

📋 Dependencies

Core Dependencies

  • Data Science: pandas, numpy, scipy, scikit-learn
  • Visualization: matplotlib, seaborn, plotly
  • Web Dashboard: dash, dash-bootstrap-components
  • Optimization: pyomo
  • Simulation: simpy
  • Machine Learning: hyperopt
  • GUI: PyQt5
  • Jupyter: ipykernel, ipython

Development Dependencies

  • Testing: pytest, pytest-cov
  • Linting: flake8, mypy
  • Formatting: black, isort
  • Documentation: sphinx, sphinx-rtd-theme

📚 Documentation

For detailed documentation, usage examples, and the API reference, see the docs/ and examples/ directories of the repository.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run tests: make test
  5. Format code: make format
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Deniz - denizkurtaran00@gmail.com

🙏 Acknowledgments

  • Built with ❤️ for the data science community
  • Inspired by the need for rapid experimentation tools
  • Powered by the amazing Python ecosystem

Dexter Toolkit - Making data experimentation and tinkering easier and more efficient.



Download files


Source Distribution

dexter_toolkit-0.1.0.tar.gz (31.3 kB)

Uploaded Source

Built Distribution


dexter_toolkit-0.1.0-py3-none-any.whl (37.4 kB)

Uploaded Python 3

File details

Details for the file dexter_toolkit-0.1.0.tar.gz.

File metadata

  • Download URL: dexter_toolkit-0.1.0.tar.gz
  • Size: 31.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for dexter_toolkit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f02cc2e92cbc81129cee00ab820615f82019e70d8728e4d11e538dd4710af131
MD5 588d37994831d3a3a1365da9861fadd0
BLAKE2b-256 774bf7c99b44df0cf3a0f3a3038dce2ab16f305be675c18c8ea430cf8da0dae1


File details

Details for the file dexter_toolkit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dexter_toolkit-0.1.0-py3-none-any.whl
  • Size: 37.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for dexter_toolkit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cfb1e87fe9bfc15fe7fc9d23255984815e575a1042a9f6e6d1b3eca673242e88
MD5 0ae329ecdb727fdf8b87004cba51c898
BLAKE2b-256 0684cb1a3753b99addd5b3da8cd9067a5446fa62f1ab1d591039e0be59cd3a58

