additory - Rust Core

Elegant data operations for DataFrames.

Three Functions Only

  • add.to() - Add data FROM external source
  • add.transform() - Transform data WITHIN DataFrame
  • add.synthetic() - Create or augment with synthetic data
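To make "add data FROM external source" concrete, here is a minimal sketch of what a lookup-style operation does, written in plain Python over lists of dicts. It does not use additory's API: the helper name lookup_columns and its parameters (key, columns) are hypothetical and chosen only for illustration, and the behaviour for unmatched or duplicate keys is an assumption.

```python
# Hypothetical helper, NOT additory's API: illustrates the idea of a lookup.
def lookup_columns(rows, reference, key, columns):
    """Return copies of `rows` with `columns` filled in from `reference`, matched on `key`."""
    # Index the reference rows by key; on duplicate keys the first row wins (assumption).
    index = {}
    for ref in reference:
        index.setdefault(ref[key], ref)

    result = []
    for row in rows:
        match = index.get(row[key])
        enriched = dict(row)  # copy, so the input rows are left untouched
        for col in columns:
            # Rows without a match get None instead of raising (assumption).
            enriched[col] = match[col] if match is not None else None
        result.append(enriched)
    return result


orders = [
    {"order_id": 1, "product_id": "A"},
    {"order_id": 2, "product_id": "B"},
    {"order_id": 3, "product_id": "Z"},  # no matching product
]
products = [
    {"product_id": "A", "price": 9.5},
    {"product_id": "B", "price": 12.0},
]

enriched = lookup_columns(orders, products, key="product_id", columns=["price"])
```

The sketch returns new rows rather than mutating its input, so the original data stays available for comparison.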

Project Structure

rust-core/
├── src/
│   ├── lib.rs              # Main library entry
│   ├── core/               # Core functionality
│   │   ├── dataframe.rs    # DataFrame abstraction
│   │   ├── types.rs        # Type definitions
│   │   └── errors.rs       # Error types
│   ├── utils/              # Utilities
│   │   ├── validation.rs   # Input validation
│   │   ├── logging.rs      # Logging system
│   │   └── type_detection.rs
│   ├── to/                 # add.to() implementation
│   ├── transform/          # add.transform() implementation
│   └── synthetic/          # add.synthetic() implementation
├── Cargo.toml
└── README.md

Building

cargo build
cargo test
cargo build --release

Status

Phase 1: Core Infrastructure - ✅ Complete
Phase 2: Initial Features - In Progress

Core Infrastructure (Complete)

  • ✅ Project structure
  • ✅ Error types
  • ✅ Core types (Mode, FetchColumn, UniversalParams)
  • ✅ DataFrame abstraction
  • ✅ Validation utilities
  • ✅ Logging utilities
  • ✅ Type detection utilities

Implemented Features

  • add.to() LOOKUP mode - Add columns from reference DataFrame (5 tests passing)
  • add.to() @merge mode - Merge multiple DataFrames (9 tests passing)
  • add.transform() @filter mode - Filter rows and select columns (10 tests passing)
  • add.transform() @sort mode - Sort rows by column(s) (8 tests passing)
  • add.transform() @transpose mode - Transpose DataFrame (6 tests passing)
  • add.transform() @aggregate mode - Group and aggregate data (10 tests passing)
  • add.transform() @split mode - Split text column into multiple columns (7 tests passing)
  • add.transform() @calc mode - Calculate new columns from expressions (8 tests passing)
  • add.transform() @extract mode - Extract datetime components (9 tests passing)
  • add.transform() @onehot mode - One-hot encoding for categorical columns (7 tests passing)
  • add.transform() @label mode - Label encoding for categorical columns (6 tests passing)
  • add.transform() @harmonize mode - Unit conversions (8 tests passing)
  • add.transform() @knn mode - K-Nearest Neighbors imputation (27 tests passing, pure Python)
  • add.synthetic() @new mode - Create synthetic DataFrames with 7 distributions + date/time + patterns (18 tests passing)
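Since the @knn mode above is described as pure Python, the following sketch shows the general idea of K-Nearest Neighbors imputation using only the standard library. It is not additory's implementation: the function name knn_impute and its parameters (target, features, k) are hypothetical, and it assumes numeric feature columns with no missing values, unscaled Euclidean distance, and a plain mean over the neighbours.

```python
import math


# Hypothetical helper, NOT additory's implementation of @knn.
def knn_impute(rows, target, features, k=3):
    """Fill None values in `target` with the mean of the k nearest complete rows."""
    complete = [r for r in rows if r[target] is not None]
    if not complete:
        raise ValueError("no rows with a known target value")

    result = []
    for row in rows:
        if row[target] is not None:
            result.append(dict(row))
            continue
        # Rank complete rows by Euclidean distance over the feature columns.
        by_distance = sorted(
            complete,
            key=lambda c: math.dist(
                [row[f] for f in features], [c[f] for f in features]
            ),
        )
        neighbours = by_distance[:k]
        filled = dict(row)
        filled[target] = sum(n[target] for n in neighbours) / len(neighbours)
        result.append(filled)
    return result


people = [
    {"age": 25, "income": 30.0},
    {"age": 27, "income": 34.0},
    {"age": 50, "income": 80.0},
    {"age": 26, "income": None},  # value to impute
]

imputed = knn_impute(people, target="income", features=["age"], k=2)
```

With k=2, the row with age 26 takes the mean income of its two closest neighbours (ages 25 and 27). With several features on different scales, the columns would normally be standardized first, otherwise the largest-valued column dominates the distance.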

Test Status

  • Total Tests: 163 passing
  • Core Tests: 25 passing
  • LOOKUP Tests: 5 passing
  • @merge Tests: 9 passing
  • @filter Tests: 10 passing
  • @sort Tests: 8 passing
  • @transpose Tests: 6 passing
  • @aggregate Tests: 10 passing
  • @split Tests: 7 passing
  • @calc Tests: 8 passing
  • @extract Tests: 9 passing
  • @onehot Tests: 7 passing
  • @label Tests: 6 passing
  • @harmonize Tests: 8 passing
  • @knn Tests: 27 passing (pure Python)
  • @new (synthetic) Tests: 18 passing

Next Steps

  1. ✅ Implement add.to() LOOKUP mode
  2. ✅ Implement add.to() @merge mode
  3. ✅ Implement add.transform() @filter mode
  4. ✅ Implement add.transform() @sort mode
  5. ✅ Implement add.transform() @transpose mode
  6. ✅ Implement add.transform() @aggregate mode
  7. ✅ Implement add.transform() @split mode
  8. ✅ Implement add.transform() @calc mode (basic version)
  9. ✅ Implement add.transform() @extract mode (datetime only)
  10. ✅ Implement add.transform() @onehot mode
  11. ✅ Implement add.transform() @label mode
  12. ✅ Implement add.transform() @harmonize mode
  13. ✅ Implement add.transform() @knn mode (pure Python)
  14. ✅ Implement add.synthetic() @new mode (basic version)
  15. ✅ Expand add.synthetic() @new mode (more distributions, patterns, dates)
  16. Implement add.synthetic() augment mode
  17. Implement add.synthetic() @analyze mode
  18. Add Python bindings (PyO3)
  19. Enhance @calc with expression namespaces
  20. Add text extraction to @extract mode

Documentation

See shadow_library/ for comprehensive documentation of all modules.

License

MIT
