Data Experimentation and Tinkering Kit - A comprehensive Python toolkit for data science, machine learning, optimization, simulation, and visualization experiments

Dexter Toolkit

Data Experimentation and Tinkering Kit

A comprehensive Python toolkit for data science, machine learning, optimization, simulation, and visualization experiments.

Python 3.8+ | License: MIT | Code style: black

Overview

Dexter is a modular toolkit designed for rapid prototyping and experimentation in data science and related fields. It provides a collection of specialized modules for different aspects of data analysis, machine learning, optimization, and visualization.

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/DenizK00/Dexter.git
cd Dexter

# Install in development mode
make install-dev

# Or install manually
python install_dev.py

Basic Usage

import dexter

# Machine Learning
from dexter import pick_classifier
import pandas as pd

df = pd.read_csv('your_data.csv')
best_model = pick_classifier(df, target='target_column')

# Optimization
from dexter import Problem

objective = "min 2*x + 3*y"
constraints = ["x + y >= 10", "x >= 0", "y >= 0"]
problem = Problem(objective, constraints)
solution = problem.solve()

# Statistics
from dexter import Normal, Uniform

normal_dist = Normal(mean=0, var=1)
rv = normal_dist.draw()

# Simulation
from dexter import SimManager
import simpy

env = simpy.Environment()
sim = SimManager(env)
sim.run(until=100)

📦 Package Structure

dexter/
├── src/dexter/                   # Main package
│   ├── __init__.py               # Package initialization
│   ├── board/                    # Interactive data dashboard
│   ├── core/                     # Core pipeline utilities
│   ├── data_wrangling/           # Data transformation tools
│   ├── environment/              # Environment simulation
│   ├── language/                 # Language processing
│   ├── ml/                       # Machine learning
│   ├── optimization/             # Mathematical optimization
│   ├── simulation/               # Discrete event simulation
│   ├── stats/                    # Statistical analysis
│   └── visualization/            # Visualization tools
├── tests/                        # Test suite
├── docs/                         # Documentation
├── examples/                     # Usage examples
└── scripts/                      # Utility scripts

🎯 Modules

🧠 ML - Machine Learning

  • Auto Model Selection: Automated classifier selection with hyperparameter optimization
  • Model Comparison: Cross-validation and performance metrics comparison
  • Hyperopt Integration: Bayesian optimization for hyperparameter tuning
  • Binary/Multiclass Support: Handles both binary and multiclass classification tasks

import pandas as pd
from dexter.ml import pick_classifier

df = pd.read_csv('your_data.csv')  # same placeholder file as in Basic Usage

# Automatically find the best classifier
best_model = pick_classifier(df, target='target_column', mode='extensive')
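
The idea behind automated model selection can be sketched without Dexter: score each candidate with k-fold cross-validation and keep the one with the best mean accuracy. This is a minimal standard-library sketch, not the internals of `pick_classifier`; the two toy classifiers and the helper names (`k_fold_accuracy`, `pick_best`) are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def majority_classifier(train_X, train_y):
    """Toy baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def centroid_classifier(train_X, train_y):
    """Toy model: predict the label whose feature centroid is nearest."""
    groups = {}
    for x, y in zip(train_X, train_y):
        groups.setdefault(y, []).append(x)
    centroids = {y: [mean(col) for col in zip(*rows)] for y, rows in groups.items()}

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def k_fold_accuracy(fit, X, y, k=3):
    """Mean accuracy over k folds; fold i holds out every k-th row."""
    scores = []
    for i in range(k):
        test_idx = set(range(i, len(X), k))
        train_X = [x for j, x in enumerate(X) if j not in test_idx]
        train_y = [t for j, t in enumerate(y) if j not in test_idx]
        model = fit(train_X, train_y)
        hits = [model(X[j]) == y[j] for j in sorted(test_idx)]
        scores.append(sum(hits) / len(hits))
    return mean(scores)


def pick_best(candidates, X, y, k=3):
    """Return (name, score) of the candidate with the highest CV accuracy."""
    scored = {name: k_fold_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Two well-separated clusters, so the centroid model should win.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]
candidates = {"majority": majority_classifier, "centroid": centroid_classifier}
best_name, best_score = pick_best(candidates, X, y)
print(best_name, best_score)
```

A real selection run would also tune hyperparameters per candidate (the README names Hyperopt for that); the sketch only covers the comparison step.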

⚡ Optimization - Mathematical Optimization

  • Mathematical Optimization: Linear and nonlinear optimization problems
  • Pyomo Integration: Mathematical modeling with Pyomo framework
  • Equation Parsing: Natural language equation parsing and conversion
  • Solution Management: Optimal solution extraction and evaluation

from dexter.optimization import Problem

# Define and solve optimization problem
problem = Problem("min 2*x + 3*y", ["x + y >= 10", "x >= 0", "y >= 0"])
solution = problem.solve()
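
To see what answer to expect from the example problem, the same linear program can be checked by hand. The sketch below is not how Dexter or Pyomo solve it; it enumerates the corner points of the feasible region and picks the cheapest one. That shortcut is valid here only because the objective coefficients are non-negative and both variables are bounded below by zero, so the minimum is attained at a vertex even though the region is unbounded. The tuple encoding of constraints is an assumption made for this illustration.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y >= c.
constraints = [(1, 1, 10), (1, 0, 0), (0, 1, 0)]
objective = (2, 3)  # minimize 2*x + 3*y


def intersection(c1, c2):
    """Intersection of two constraint boundaries, or None if they are parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    # Cramer's rule for the 2x2 linear system.
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(point, tol=1e-9):
    """True if the point satisfies every constraint (within a small tolerance)."""
    x, y = point
    return all(a * x + b * y >= c - tol for a, b, c in constraints)


vertices = [
    p
    for c1, c2 in combinations(constraints, 2)
    if (p := intersection(c1, c2)) is not None and feasible(p)
]
best = min(vertices, key=lambda p: objective[0] * p[0] + objective[1] * p[1])
best_value = objective[0] * best[0] + objective[1] * best[1]
print(best, best_value)
```

The two feasible corners are (0, 10) and (10, 0); the second one gives the lower cost, 20.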

🎮 Simulation - Discrete Event Simulation

  • Discrete Event Simulation: Built on SimPy for event-driven simulations
  • Resource Management: Dynamic resource allocation and management
  • Process Control: Start, stop, and manage simulation processes
  • Step Mode: Step-by-step simulation execution for debugging

from dexter.simulation import SimManager
import simpy

env = simpy.Environment()
sim = SimManager(env)
sim.add_resource("service", simpy.Resource(env, capacity=2))
sim.run(until=100)
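
For readers new to discrete event simulation, the mechanism that SimPy provides can be sketched with the standard library alone: a priority queue of timestamped events and a resource with limited capacity. This is an illustrative sketch, not Dexter's `SimManager` or SimPy itself; the `simulate` function, its parameters, and the fixed service time are assumptions.

```python
import heapq
from collections import deque


def simulate(arrivals, service_time, capacity=2, until=100):
    """Run a FIFO queue with `capacity` servers; return {customer: finish_time}."""
    # Events are (time, sequence number, kind, customer); the unique sequence
    # number breaks ties so later tuple fields are never compared.
    events = [(t, i, "arrive", i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    seq = len(events)
    waiting = deque()
    busy = 0
    finished = {}

    while events and events[0][0] <= until:
        now, _, kind, customer = heapq.heappop(events)
        if kind == "arrive":
            waiting.append(customer)
        else:  # "depart": a server becomes free
            busy -= 1
            finished[customer] = now
        # Start service for waiting customers while servers are free.
        while waiting and busy < capacity:
            nxt = waiting.popleft()
            busy += 1
            heapq.heappush(events, (now + service_time, seq, "depart", nxt))
            seq += 1
    return finished


finish_times = simulate(arrivals=[0, 1, 2, 3], service_time=5)
print(finish_times)
```

With two servers, the first two customers start immediately and the other two wait until a server frees up at times 5 and 6.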

📊 Stats - Statistical Analysis

  • Probability Distributions: Comprehensive distribution library
    • Normal, Uniform, Binomial, Geometric, Negative Binomial
    • Poisson, Exponential, Gamma, Chi-Square distributions
  • Random Variable Management: RV and Sample classes for statistical operations
  • Distribution Operations: Addition, multiplication, and transformation of distributions

from dexter.stats import Normal, Uniform, Binomial

# Create and work with distributions
normal_dist = Normal(mean=0, var=1)
uniform_dist = Uniform(a=0, b=1)
binomial_dist = Binomial(n=10, p=0.5)

# Generate random variables
rv = normal_dist.draw()
sample = uniform_dist.draw(n=100)
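
As a rough illustration of what distribution arithmetic means, the sketch below implements the two textbook rules for normal variables: the sum of independent normals has the summed mean and summed variance, and scaling by k multiplies the variance by k squared. `SketchNormal` is a hypothetical stand-in written for this example and is not `dexter.stats.Normal`; whether Dexter assumes independence in the same way is not stated in this README.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchNormal:
    """Hypothetical normal distribution (not dexter.stats.Normal)."""

    mean: float
    var: float

    def __add__(self, other):
        # Sum of independent normals: means add and variances add.
        return SketchNormal(self.mean + other.mean, self.var + other.var)

    def scale(self, k):
        # Scaling by k multiplies the mean by k and the variance by k**2.
        return SketchNormal(k * self.mean, k * k * self.var)

    def draw(self, n=1, seed=None):
        # Draw n values; random.gauss expects a standard deviation, not a variance.
        rng = random.Random(seed)
        sd = self.var ** 0.5
        return [rng.gauss(self.mean, sd) for _ in range(n)]


total = SketchNormal(1, 4) + SketchNormal(2, 9)
doubled = SketchNormal(1, 4).scale(2)
sample = total.draw(n=5, seed=0)
print(total, doubled, len(sample))
```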

🎨 Visualization - Interactive Visualization

  • 3D Space Visualization: Interactive 3D plotting with Plotly
  • Vector Visualization: 3D vector representation and manipulation
  • Surface Plotting: 3D surface and mesh grid visualization
  • Interactive Plots: Web-based interactive visualizations

from dexter.visualization import Space

# Create 3D visualization space
space = Space(x_size=10, y_size=10, z_size=10)
space.add_vector([1, 2, 3], color='red')
space.show()

🌍 Environment - Environment Simulation

  • Grid-based Environment: 2D grid system for agent-based simulations
  • Tkinter GUI: Interactive grid display with agent positioning
  • Agent Management: Place and track agents within the grid environment

from dexter.environment import Grid, GridApp

# Create grid environment
grid = Grid(nrows=10, ncolumns=10)
grid.set_agent(5, 5)
grid.set_cell(3, 3, '#')
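
The data model behind a grid environment is small enough to sketch in a few lines: a list of rows plus a tracked agent position. `SketchGrid` below is a hypothetical text-only stand-in, not `dexter.environment.Grid`, and its `render` method and `empty` parameter are assumptions; the real module displays the grid through a GUI.

```python
class SketchGrid:
    """Hypothetical 2D grid (not dexter.environment.Grid)."""

    def __init__(self, nrows, ncolumns, empty="."):
        # Build independent rows so that editing one row leaves the others alone.
        self.cells = [[empty] * ncolumns for _ in range(nrows)]
        self.agent = None

    def set_cell(self, row, col, value):
        self.cells[row][col] = value

    def set_agent(self, row, col):
        self.agent = (row, col)

    def render(self):
        """Return the grid as text, drawing the agent as 'A'."""
        lines = []
        for r, row in enumerate(self.cells):
            lines.append(
                "".join("A" if (r, c) == self.agent else v for c, v in enumerate(row))
            )
        return "\n".join(lines)


grid = SketchGrid(nrows=4, ncolumns=4)
grid.set_agent(1, 2)
grid.set_cell(3, 3, "#")
text = grid.render()
print(text)
```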

🎯 Board - Interactive Data Dashboard

  • Interactive Web Dashboard: Built with Dash and Bootstrap for data visualization
  • IPython Integration: Custom kernel management with Jupyter console integration
  • Real-time Data Viewing: Live data table updates and interactive components

🔧 Data Wrangling - Data Transformation

  • Data Modification: Tools for data transformation and manipulation
  • Diffusion Functions: Data diffusion and spreading utilities
  • Deviation Functions: Statistical deviation and error introduction
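
This README does not show the data wrangling API, so the following is only a guess at what "error introduction" could look like in practice: perturbing clean values with random noise and blanking a fraction of entries. Both function names and all parameters are hypothetical and are not taken from `dexter.data_wrangling`.

```python
import random


def add_gaussian_noise(values, sigma, seed=None):
    """Return a copy of `values` with zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


def drop_at_random(values, fraction, seed=None):
    """Return a copy with roughly `fraction` of the entries replaced by None."""
    rng = random.Random(seed)
    return [None if rng.random() < fraction else v for v in values]


clean = [10.0, 12.0, 11.5, 13.0]
noisy = add_gaussian_noise(clean, sigma=0.5, seed=42)
unchanged = add_gaussian_noise(clean, sigma=0.0, seed=42)
sparse = drop_at_random(clean, fraction=1.0, seed=42)
print(noisy, sparse)
```

Passing a fixed `seed` keeps a degraded dataset reproducible, which matters when comparing how different models cope with the same injected errors.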

🔄 Core - Pipeline Management

  • Modular Pipeline System: Extensible pipeline architecture
  • Process Chaining: Sequential process execution with result management
  • Step-by-step Execution: Individual step execution and monitoring
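
The three features listed above (chained processes, stored results, single-step execution) can be illustrated with a small class. `SketchPipeline` is a hypothetical sketch written for this README, not the `dexter.core` implementation, and its method names are assumptions.

```python
class SketchPipeline:
    """Hypothetical sequential pipeline (not the dexter.core implementation)."""

    def __init__(self):
        self.steps = []    # (name, function) pairs, in execution order
        self.results = {}  # output of each completed step, keyed by name
        self._next = 0     # index of the next step to run

    def add(self, name, func):
        self.steps.append((name, func))
        return self  # returning self allows chained .add(...) calls

    def step(self, value):
        """Run only the next step on `value` and record its result."""
        name, func = self.steps[self._next]
        self.results[name] = func(value)
        self._next += 1
        return self.results[name]

    def run(self, value):
        """Run all remaining steps, feeding each result into the next step."""
        while self._next < len(self.steps):
            value = self.step(value)
        return value


pipeline = (
    SketchPipeline()
    .add("strip", str.strip)
    .add("split", str.split)
    .add("count", len)
)
final = pipeline.run("  rapid prototyping toolkit  ")
print(final, pipeline.results)
```

Keeping every intermediate result makes it possible to inspect a failed or surprising step without rerunning the earlier ones.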

🤖 Language - Language Processing

  • Fine-tuning Framework: Tools for model fine-tuning and training
  • RAG Pipeline: Retrieval-Augmented Generation pipeline components
  • Chain Management: Modular chain-based processing architecture
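
The retrieval half of a RAG pipeline can be sketched without any model: rank documents by similarity to the query, then prepend the best match to the prompt. The sketch below uses bag-of-words cosine similarity purely for illustration; a real pipeline would typically use learned embeddings and pass the prompt to a language model. None of these function names come from `dexter.language`.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents):
    """Augment the query with retrieved context before generation."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "SimPy models discrete event simulations with processes and resources.",
    "Pyomo expresses optimization models with variables and constraints.",
    "Plotly renders interactive 3D plots in the browser.",
]
question = "How do I add constraints to an optimization model?"
top = retrieve(question, docs)
prompt = build_prompt(question, docs)
print(top[0])
```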

🛠️ Development

Setup Development Environment

# Install in development mode
make install-dev

# Run tests
make test

# Run linting
make lint

# Format code
make format

# Run all checks
make check

Project Structure

dexter/
├── src/dexter/           # Source code
├── tests/                # Test suite
├── docs/                 # Documentation
├── examples/             # Usage examples
├── scripts/              # Utility scripts
├── pyproject.toml        # Project configuration
├── setup.py              # Setup script
├── Makefile              # Development tasks
├── install_dev.py        # Development installation
└── README.md             # This file

📋 Dependencies

Core Dependencies

  • Data Science: pandas, numpy, scipy, scikit-learn
  • Visualization: matplotlib, seaborn, plotly
  • Web Dashboard: dash, dash-bootstrap-components
  • Optimization: pyomo
  • Simulation: simpy
  • Machine Learning: hyperopt
  • GUI: PyQt5
  • Jupyter: ipykernel, ipython

Development Dependencies

  • Testing: pytest, pytest-cov
  • Linting: flake8, mypy
  • Formatting: black, isort
  • Documentation: sphinx, sphinx-rtd-theme

📚 Documentation

For detailed documentation, examples, and API reference, see the documentation.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run tests: make test
  5. Format code: make format
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Deniz - denizkurtaran00@gmail.com

🙏 Acknowledgments

  • Built with ❤️ for the data science community
  • Inspired by the need for rapid experimentation tools
  • Powered by the amazing Python ecosystem

Dexter Toolkit - Making data experimentation and tinkering easier and more efficient.
