Elegant data operations for DataFrames - add.to(), add.transform(), add.synthetic()

additory - Rust Core

Elegant data operations for DataFrames.

Three Functions Only

  • add.to() - Add data FROM an external source
  • add.transform() - Transform data WITHIN a DataFrame
  • add.synthetic() - Create or augment with synthetic data
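
To make the three categories concrete, here is a minimal sketch in plain Python over a list-of-dicts table. It does not call additory, and the helper names (lookup_from, filter_rows, synthetic_rows) are hypothetical stand-ins chosen for illustration, not the library's API or signatures.

```python
import random

# Hypothetical stand-ins for the three categories; NOT additory's API.
# A "table" here is a list of dicts, one dict per row.

def lookup_from(rows, reference, key, column):
    """'to' category: add `column` from a reference table, matched on `key`."""
    index = {ref[key]: ref[column] for ref in reference}
    # Rows with no match in the reference get None for the new column.
    return [{**row, column: index.get(row[key])} for row in rows]

def filter_rows(rows, predicate):
    """'transform' category: keep only rows for which `predicate` is true."""
    return [row for row in rows if predicate(row)]

def synthetic_rows(n, seed=0):
    """'synthetic' category: create n rows with a normally distributed value."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    return [{"id": i, "value": rng.gauss(0.0, 1.0)} for i in range(n)]

orders = [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 5}]
prices = [{"sku": "A", "price": 1.5}, {"sku": "B", "price": 4.0}]

with_price = lookup_from(orders, prices, key="sku", column="price")
large = filter_rows(with_price, lambda r: r["qty"] > 3)
fake = synthetic_rows(3)
```

In this sketch each helper returns a new list and leaves its input untouched; whether additory behaves the same way is not stated in this README.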

Project Structure

rust-core/
├── src/
│   ├── lib.rs              # Main library entry
│   ├── core/               # Core functionality
│   │   ├── dataframe.rs    # DataFrame abstraction
│   │   ├── types.rs        # Type definitions
│   │   └── errors.rs       # Error types
│   ├── utils/              # Utilities
│   │   ├── validation.rs   # Input validation
│   │   ├── logging.rs      # Logging system
│   │   └── type_detection.rs  # Type detection
│   ├── to/                 # add.to() implementation
│   ├── transform/          # add.transform() implementation
│   └── synthetic/          # add.synthetic() implementation
├── Cargo.toml
└── README.md

Building

cargo build            # debug build
cargo test             # run the test suite
cargo build --release  # optimized build

Status

Phase 1: Core Infrastructure - ✅ Complete
Phase 2: Initial Features - In Progress

Core Infrastructure (Complete)

  • ✅ Project structure
  • ✅ Error types
  • ✅ Core types (Mode, FetchColumn, UniversalParams)
  • ✅ DataFrame abstraction
  • ✅ Validation utilities
  • ✅ Logging utilities
  • ✅ Type detection utilities

Implemented Features

  • add.to() LOOKUP mode - Add columns from reference DataFrame (5 tests passing)
  • add.to() @merge mode - Merge multiple DataFrames (9 tests passing)
  • add.transform() @filter mode - Filter rows and select columns (10 tests passing)
  • add.transform() @sort mode - Sort rows by column(s) (8 tests passing)
  • add.transform() @transpose mode - Transpose DataFrame (6 tests passing)
  • add.transform() @aggregate mode - Group and aggregate data (10 tests passing)
  • add.transform() @split mode - Split text column into multiple columns (7 tests passing)
  • add.transform() @calc mode - Calculate new columns from expressions (8 tests passing)
  • add.transform() @extract mode - Extract datetime components (9 tests passing)
  • add.transform() @onehot mode - One-hot encoding for categorical columns (7 tests passing)
  • add.transform() @label mode - Label encoding for categorical columns (6 tests passing)
  • add.transform() @harmonize mode - Unit conversions (8 tests passing)
  • add.transform() @knn mode - K-Nearest Neighbors imputation (27 tests passing, pure Python)
  • add.synthetic() @new mode - Create synthetic DataFrames with 7 distributions + date/time + patterns (18 tests passing)
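
For readers unfamiliar with the encoding and extraction modes listed above, the following plain-Python sketch shows what one-hot encoding, label encoding, and datetime component extraction generally mean. It is an illustration only: the function names, the `column_category` naming of new columns, and the alphabetical ordering of label codes are assumptions, not additory's documented behavior.

```python
from datetime import datetime

# Illustrative helpers only; names and output conventions are assumptions,
# not additory's API. A "table" is a list of dicts, one dict per row.

def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct category."""
    categories = sorted({row[column] for row in rows})
    out = []
    for row in rows:
        new = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        out.append(new)
    return out

def label_encode(rows, column):
    """Replace each category with an integer code (alphabetical order)."""
    codes = {cat: i for i, cat in enumerate(sorted({row[column] for row in rows}))}
    return [{**row, column: codes[row[column]]} for row in rows]

def extract_datetime(rows, column):
    """Add year, month, and day columns parsed from an ISO 8601 date string."""
    out = []
    for row in rows:
        ts = datetime.fromisoformat(row[column])
        out.append({
            **row,
            f"{column}_year": ts.year,
            f"{column}_month": ts.month,
            f"{column}_day": ts.day,
        })
    return out

rows = [
    {"color": "red", "seen": "2024-01-15"},
    {"color": "blue", "seen": "2024-03-02"},
    {"color": "red", "seen": "2023-12-31"},
]

encoded = one_hot(rows, "color")      # adds color_blue and color_red columns
labeled = label_encode(rows, "color") # blue -> 0, red -> 1
dated = extract_datetime(rows, "seen")
```

One-hot encoding widens the table by one column per category, while label encoding keeps a single column but imposes an arbitrary numeric order on the categories.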

Test Status

  • Total Tests: 163 passing
  • Core Tests: 25 passing
  • LOOKUP Tests: 5 passing
  • @merge Tests: 9 passing
  • @filter Tests: 10 passing
  • @sort Tests: 8 passing
  • @transpose Tests: 6 passing
  • @aggregate Tests: 10 passing
  • @split Tests: 7 passing
  • @calc Tests: 8 passing
  • @extract Tests: 9 passing
  • @onehot Tests: 7 passing
  • @label Tests: 6 passing
  • @harmonize Tests: 8 passing
  • @knn Tests: 27 passing (pure Python)
  • @new (synthetic) Tests: 18 passing

Next Steps

  1. ✅ Implement add.to() LOOKUP mode
  2. ✅ Implement add.to() @merge mode
  3. ✅ Implement add.transform() @filter mode
  4. ✅ Implement add.transform() @sort mode
  5. ✅ Implement add.transform() @transpose mode
  6. ✅ Implement add.transform() @aggregate mode
  7. ✅ Implement add.transform() @split mode
  8. ✅ Implement add.transform() @calc mode (basic version)
  9. ✅ Implement add.transform() @extract mode (datetime only)
  10. ✅ Implement add.transform() @onehot mode
  11. ✅ Implement add.transform() @label mode
  12. ✅ Implement add.transform() @harmonize mode
  13. ✅ Implement add.transform() @knn mode (pure Python)
  14. ✅ Implement add.synthetic() @new mode (basic version)
  15. Expand add.synthetic() @new mode (more distributions, patterns, dates)
  16. Implement add.synthetic() augment mode
  17. Implement add.synthetic() @analyze mode
  18. Add Python bindings (PyO3)
  19. Enhance @calc with expression namespaces
  20. Add text extraction to @extract mode

Documentation

See shadow_library/ for comprehensive documentation of all modules.

License

MIT

