Veloxx: A high-performance, lightweight Rust library for in-memory data processing and analytics. Featuring DataFrames, Series, CSV/JSON I/O, powerful transformations, aggregations, and statistical functions for efficient data science and engineering.

Veloxx: Lightweight Rust-Powered Data Processing & Analytics Library

New in 0.2.1: Major performance improvements across all core operations. See CHANGELOG for details.

Veloxx is a new Rust library for high-performance, lightweight in-memory data processing and analytics. It prioritizes minimal dependencies, a small memory footprint, and compile-time guarantees, making it well suited to resource-constrained environments, high-performance computing, and applications where every byte and cycle counts.

Core Principles & Design Goals

Extreme Lightweighting: Uses few external crates (ideally none), each carefully selected, to keep overhead and binary size small.
Performance First: Leverages Rust's zero-cost abstractions, with potential for SIMD and parallelism. Data structures are optimized for cache efficiency.
Safety & Reliability: Fully utilizes Rust's ownership and borrowing system to ensure memory safety and prevent common data manipulation errors. Unsafe code is minimized and thoroughly audited.
Ergonomics & Idiomatic Rust API: Designed for a clean, discoverable, and user-friendly API that feels natural to Rust developers, supporting method chaining and strong static typing.
Composability & Extensibility: Features a modular design in which components are independent, easy to combine, and easy to extend.

Key Features

Core Data Structures

DataFrame: A columnar data store in which each column has its own data type (i32, f64, bool, String, DateTime), with efficient storage and handling of missing values.
Series (or Column): A single-typed, named column of data within a DataFrame, providing type-specific operations.
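As a rough illustration of the columnar model described above, the sketch below stores each named column as a single-typed vector of `Option` values, with `None` marking a missing entry, and checks that all columns have the same length. The `Column` and `Frame` types are hypothetical and are not Veloxx's actual definitions; storing DateTime values as `i64` timestamps is also an assumption.

```rust
use std::collections::BTreeMap;

// Hypothetical column type: every variant holds values of a single type,
// and `None` marks a missing value.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Column {
    I32(Vec<Option<i32>>),
    F64(Vec<Option<f64>>),
    Bool(Vec<Option<bool>>),
    Str(Vec<Option<String>>),
    DateTime(Vec<Option<i64>>), // assumed: timestamps stored as i64
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::I32(v) => v.len(),
            Column::F64(v) => v.len(),
            Column::Bool(v) => v.len(),
            Column::Str(v) => v.len(),
            Column::DateTime(v) => v.len(),
        }
    }

    // Number of missing values in the column.
    fn null_count(&self) -> usize {
        match self {
            Column::I32(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::F64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Bool(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Str(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::DateTime(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }
}

// Hypothetical frame: named columns that must all have the same length.
#[derive(Debug)]
struct Frame {
    columns: BTreeMap<String, Column>,
}

impl Frame {
    fn new(columns: BTreeMap<String, Column>) -> Result<Self, String> {
        if let Some(n) = columns.values().next().map(Column::len) {
            if columns.values().any(|c| c.len() != n) {
                return Err("all columns must have the same length".to_string());
            }
        }
        Ok(Frame { columns })
    }

    fn row_count(&self) -> usize {
        self.columns.values().next().map_or(0, Column::len)
    }
}

fn main() {
    let mut columns = BTreeMap::new();
    columns.insert("age".to_string(), Column::I32(vec![Some(25), None, Some(22)]));
    columns.insert(
        "name".to_string(),
        Column::Str(vec![Some("Alice".to_string()), Some("Bob".to_string()), None]),
    );
    let frame = Frame::new(columns).expect("columns have equal length");
    println!("rows: {}", frame.row_count());
    for (name, column) in &frame.columns {
        println!("{name}: {} missing", column.null_count());
    }
}
```

Keeping one vector per column (rather than one record per row) is what makes per-column operations such as aggregations touch contiguous, single-typed memory.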

Data Ingestion & Loading

From Vec<Vec<T>> / Iterator: Basic in-memory construction from Rust native collections.
CSV Support: Minimalistic, highly efficient CSV parser for loading data.
JSON Support: Efficient parsing for common JSON structures.
Custom Data Sources: Traits/interfaces for users to implement their own data loading mechanisms.
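The custom-data-source idea can be sketched as a small trait that every loader implements. Below, a hypothetical `DataSource` trait returns raw column data, and a deliberately minimal `SimpleCsv` source implements it for comma-separated text with a header row. Both names are illustrative assumptions rather than Veloxx's real API, and the parser ignores quoting and escaping, which a production CSV reader must handle.

```rust
// Hypothetical loader interface: each source yields (column name, raw values) pairs,
// with `None` for empty fields. Veloxx's real trait may look different.
type RawColumns = Vec<(String, Vec<Option<String>>)>;

trait DataSource {
    fn load(&self) -> Result<RawColumns, String>;
}

// Deliberately minimal CSV source: the first non-empty line is the header,
// fields are split on commas, and quoting/escaping is NOT supported.
struct SimpleCsv<'a> {
    text: &'a str,
}

impl<'a> DataSource for SimpleCsv<'a> {
    fn load(&self) -> Result<RawColumns, String> {
        let mut lines = self.text.lines().filter(|l| !l.trim().is_empty());
        let header = lines.next().ok_or("empty input")?;
        let names: Vec<String> = header.split(',').map(|s| s.trim().to_string()).collect();
        let mut columns: Vec<Vec<Option<String>>> = vec![Vec::new(); names.len()];
        for (row, line) in lines.enumerate() {
            let fields: Vec<&str> = line.split(',').collect();
            if fields.len() != names.len() {
                return Err(format!(
                    "data row {} has {} fields, expected {}",
                    row + 1,
                    fields.len(),
                    names.len()
                ));
            }
            // Store values column by column; an empty field becomes a missing value.
            for (column, field) in columns.iter_mut().zip(fields) {
                let field = field.trim();
                column.push(if field.is_empty() { None } else { Some(field.to_string()) });
            }
        }
        Ok(names.into_iter().zip(columns).collect())
    }
}

fn main() {
    let source = SimpleCsv { text: "name,age\nAlice,25\nBob,\n" };
    match source.load() {
        Ok(columns) => {
            for (name, values) in columns {
                println!("{name}: {values:?}");
            }
        }
        Err(err) => eprintln!("failed to load: {err}"),
    }
}
```

Values stay as strings here; a real loader would go on to infer or cast each column's type.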

Data Cleaning & Preparation

drop_nulls(): Remove rows with any null values.
fill_nulls(value): Fill nulls with a specified value (type-aware, including DateTime).
interpolate_nulls(): Basic linear interpolation for numeric and DateTime series.
Type Casting: Efficient conversion between compatible data types for Series (e.g., i32 to f64).
rename_column(old_name, new_name): Rename columns.
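The null-handling and casting operations above can be illustrated on a single nullable numeric column. This is a minimal sketch over plain `Vec<Option<_>>` data, not Veloxx's implementation; the function names mirror the list only for readability, and leaving leading or trailing gaps unfilled during interpolation is an assumption.

```rust
/// Replace every missing value with `value`.
fn fill_nulls(data: &[Option<f64>], value: f64) -> Vec<f64> {
    data.iter().map(|x| x.unwrap_or(value)).collect()
}

/// Linearly interpolate interior gaps between two known values.
/// Leading and trailing gaps have no known neighbour on one side and stay `None`.
fn interpolate_nulls(data: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut out = data.to_vec();
    let mut prev: Option<(usize, f64)> = None; // last known (index, value)
    for (i, current) in data.iter().enumerate() {
        if let Some(y1) = *current {
            if let Some((p, y0)) = prev {
                let span = (i - p) as f64;
                // Fill every missing slot strictly between p and i.
                for j in (p + 1)..i {
                    let t = (j - p) as f64 / span;
                    out[j] = Some(y0 + t * (y1 - y0));
                }
            }
            prev = Some((i, y1));
        }
    }
    out
}

/// Cast an i32 column to f64; every i32 value is represented exactly in f64.
fn cast_i32_to_f64(data: &[Option<i32>]) -> Vec<Option<f64>> {
    data.iter().map(|x| x.map(f64::from)).collect()
}

fn main() {
    let ages = cast_i32_to_f64(&[Some(20), None, None, Some(26), None]);
    println!("interpolated: {:?}", interpolate_nulls(&ages));
    println!("filled: {:?}", fill_nulls(&ages, 0.0));
}
```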

Data Transformation & Manipulation

Selection: select_columns(names), drop_columns(names).
Filtering: Predicate-based row selection using logical (AND, OR, NOT) and comparison operators (==, !=, <, >, <=, >=).
Projection: with_column(new_name, expression), apply() for user-defined functions.
Sorting: Sort DataFrame by one or more columns (ascending/descending).
Joining: Basic inner, left, and right join operations on common keys.
Concatenation/Append: Combine DataFrames vertically.
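Predicate-based filtering can be modelled as a small expression tree that is evaluated once per row. The sketch below uses hypothetical `Value` and `Condition` types, loosely modelled on the `Condition::And`/`Gt`/`Eq` calls in the Rust usage example, but it is not Veloxx's implementation. Treating a comparison against a missing column or a value of a different type as false is an assumption.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

// Hypothetical cell values; the real library supports more types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    I32(i32),
    Str(String),
}

// Hypothetical predicate tree combining comparisons with AND, OR and NOT.
#[allow(dead_code)]
enum Condition {
    Eq(String, Value),
    Gt(String, Value),
    Lt(String, Value),
    And(Box<Condition>, Box<Condition>),
    Or(Box<Condition>, Box<Condition>),
    Not(Box<Condition>),
}

type Columns = BTreeMap<String, Vec<Value>>;

// Compare one cell with a literal; `None` if the column or row is missing
// or the two values have different types.
fn compare(cols: &Columns, row: usize, name: &str, literal: &Value) -> Option<Ordering> {
    let cell = cols.get(name)?.get(row)?;
    match (cell, literal) {
        (Value::I32(a), Value::I32(b)) => Some(a.cmp(b)),
        (Value::Str(a), Value::Str(b)) => Some(a.cmp(b)),
        _ => None,
    }
}

// Recursively evaluate the condition for a single row.
fn eval(cond: &Condition, cols: &Columns, row: usize) -> bool {
    match cond {
        Condition::Eq(c, v) => compare(cols, row, c, v) == Some(Ordering::Equal),
        Condition::Gt(c, v) => compare(cols, row, c, v) == Some(Ordering::Greater),
        Condition::Lt(c, v) => compare(cols, row, c, v) == Some(Ordering::Less),
        Condition::And(a, b) => eval(a, cols, row) && eval(b, cols, row),
        Condition::Or(a, b) => eval(a, cols, row) || eval(b, cols, row),
        Condition::Not(a) => !eval(a, cols, row),
    }
}

// Indices of the rows that satisfy `cond`.
fn filter_indices(cond: &Condition, cols: &Columns, n_rows: usize) -> Vec<usize> {
    (0..n_rows).filter(|&i| eval(cond, cols, i)).collect()
}

// Same sample data as the Rust usage example.
fn sample() -> Columns {
    let mut cols = Columns::new();
    cols.insert("age".to_string(), [25, 30, 22, 35].iter().map(|&a| Value::I32(a)).collect());
    cols.insert(
        "city".to_string(),
        ["New York", "London", "New York", "Paris"].iter().map(|s| Value::Str(s.to_string())).collect(),
    );
    cols
}

fn main() {
    let cols = sample();
    // age > 25 AND city == "New York"
    let cond = Condition::And(
        Box::new(Condition::Gt("age".to_string(), Value::I32(25))),
        Box::new(Condition::Eq("city".to_string(), Value::Str("New York".to_string()))),
    );
    // Prints [] because no row satisfies both conditions.
    println!("{:?}", filter_indices(&cond, &cols, 4));
}
```

Returning row indices keeps the predicate step independent of column types; the selected indices can then be used to gather rows from every column.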

Aggregation & Reduction

Simple Aggregations: sum(), mean(), median(), min(), max(), count(), std_dev().
Group By: Perform aggregations on groups defined by one or more columns.
Unique Values: unique() for a Series or DataFrame columns.
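A group-by aggregation boils down to accumulating per-key state in one pass and then finishing each group. The hypothetical helper below computes the mean and the number of non-missing values per key, in the spirit of the `("age", "mean")` aggregation in the Rust usage example; it is a sketch under those assumptions, not Veloxx's group-by implementation. A `BTreeMap` is used so the groups come out in sorted key order.

```rust
use std::collections::BTreeMap;

/// Group `values` by the key in the same row and return (mean, count) per key.
/// Rows whose value is missing are skipped, so `count` is the number of non-null values.
fn group_mean_count(keys: &[&str], values: &[Option<f64>]) -> BTreeMap<String, (f64, usize)> {
    // First pass: running (sum, count) per key.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        if let Some(v) = *value {
            let entry = acc.entry(key.to_string()).or_insert((0.0, 0));
            entry.0 += v;
            entry.1 += 1;
        }
    }
    // Finish each group by turning the sum into a mean (count is always >= 1 here).
    acc.into_iter()
        .map(|(key, (sum, n))| (key, (sum / n as f64, n)))
        .collect()
}

fn main() {
    let cities = ["New York", "London", "New York", "Paris"];
    let ages = [Some(25.0), Some(30.0), Some(22.0), Some(35.0)];
    for (city, (mean, count)) in group_mean_count(&cities, &ages) {
        println!("{city}: mean age {mean}, count {count}");
    }
}
```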

Basic Analytics & Statistics

describe(): Provides summary statistics for numeric columns (count, mean, std, min, max, quartiles).
correlation(): Calculate Pearson correlation between two numeric Series.
covariance(): Calculate the covariance between two numeric Series.
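For reference, the Pearson correlation is the covariance of two series divided by the product of their standard deviations. The sketch below computes both on plain `f64` slices. Dividing by n - 1 (sample covariance) is an assumption, since the library's convention is not stated here, although that choice cancels out of the correlation.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample covariance (divides by n - 1). Returns `None` for mismatched lengths
/// or fewer than two observations.
fn covariance(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let (mx, my) = (mean(xs), mean(ys));
    let sum: f64 = xs.iter().zip(ys).map(|(x, y)| (x - mx) * (y - my)).sum();
    Some(sum / (xs.len() - 1) as f64)
}

/// Pearson correlation: cov(x, y) / (sd(x) * sd(y)).
/// Returns `None` if either series is constant (zero variance).
fn correlation(xs: &[f64], ys: &[f64]) -> Option<f64> {
    let cov = covariance(xs, ys)?;
    let sx = covariance(xs, xs)?.sqrt();
    let sy = covariance(ys, ys)?.sqrt();
    if sx == 0.0 || sy == 0.0 {
        return None;
    }
    Some(cov / (sx * sy))
}

fn main() {
    let ages = [25.0, 30.0, 22.0, 35.0];
    let scores = [3.0, 5.0, 2.0, 7.0]; // illustrative data
    println!("covariance = {:?}", covariance(&ages, &scores));
    println!("correlation = {:?}", correlation(&ages, &scores));
}
```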

Output & Export

To Vec<Vec<T>>: Export DataFrame content back to standard Rust collections.
To CSV: Efficiently write DataFrame to a CSV file.
Display/Pretty Print: User-friendly console output for DataFrame and Series.
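Exporting a columnar frame to CSV means walking the columns row by row. The sketch below writes string-valued columns, turns missing values into empty fields, and quotes fields containing commas, quotes, or line breaks in the style of RFC 4180. `to_csv` is a hypothetical helper rather than Veloxx's writer, and it takes already-stringified columns to stay short.

```rust
/// Quote a field when it contains a comma, a double quote, or a line break,
/// doubling any embedded quotes (RFC 4180 style).
fn escape(field: &str) -> String {
    if field.contains(|c: char| c == ',' || c == '"' || c == '\n' || c == '\r') {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Write column-major data as CSV text with a header row.
/// Missing values (and rows past the end of a shorter column) become empty fields.
fn to_csv(names: &[&str], columns: &[Vec<Option<String>>]) -> String {
    let mut out = String::new();
    let header: Vec<String> = names.iter().map(|n| escape(n)).collect();
    out.push_str(&header.join(","));
    out.push('\n');
    let n_rows = columns.iter().map(Vec::len).max().unwrap_or(0);
    for row in 0..n_rows {
        let fields: Vec<String> = columns
            .iter()
            .map(|col| col.get(row).and_then(|v| v.as_deref()).map(escape).unwrap_or_default())
            .collect();
        out.push_str(&fields.join(","));
        out.push('\n');
    }
    out
}

fn main() {
    let names = ["name", "city"];
    let columns = vec![
        vec![Some("Alice".to_string()), Some("Bob".to_string())],
        vec![Some("Paris, France".to_string()), None],
    ];
    print!("{}", to_csv(&names, &columns));
}
```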

Installation

Rust

Add the following to your Cargo.toml file:

```toml
[dependencies]
veloxx = "0.2.2" # Or the latest version
```

Python

You can install the Python bindings using pip after building them with maturin:

```shell
# First, build the Python wheel (from the project root)
maturin build --release

# Then install the wheel (its file name includes your platform tag)
pip install target/wheels/veloxx-*.whl
```

WebAssembly (Node.js/Browser)

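You can install the WebAssembly package using npm after building it with wasm-pack. Build with `--target nodejs` for Node.js (as in the Node.js usage example) or `--target web` for browsers:

```shell
# First, build the WebAssembly package (from the project root)
# Use --target web instead when targeting browsers
wasm-pack build --target nodejs --out-dir pkg

# Then install the package
npm install ./pkg
```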

Usage Examples

Rust Usage

Here's a quick example that creates a DataFrame, filters it, adds a derived column, and performs a group-by aggregation:

```rust
use std::collections::BTreeMap;

use veloxx::conditions::Condition;
use veloxx::dataframe::DataFrame;
use veloxx::expressions::Expr;
use veloxx::series::Series;
use veloxx::types::Value;

fn main() -> Result<(), String> {
    // 1. Create a DataFrame from named Series (each value is an Option; None marks a missing value)
    let mut columns = BTreeMap::new();
    columns.insert("name".to_string(), Series::new_string("name", vec![Some("Alice".to_string()), Some("Bob".to_string()), Some("Charlie".to_string()), Some("David".to_string())]));
    columns.insert("age".to_string(), Series::new_i32("age", vec![Some(25), Some(30), Some(22), Some(35)]));
    columns.insert("city".to_string(), Series::new_string("city", vec![Some("New York".to_string()), Some("London".to_string()), Some("New York".to_string()), Some("Paris".to_string())]));
    columns.insert("last_login".to_string(), Series::new_datetime("last_login", vec![Some(1678886400), Some(1678972800), Some(1679059200), Some(1679145600)]));

    let df = DataFrame::new(columns)?;
    println!("Original DataFrame:\n{}", df);

    // 2. Filter data: age > 25 AND city == "New York"
    // With this sample data no row matches both conditions (the New York users are 25 and 22),
    // so the filtered DataFrame is empty.
    let condition = Condition::And(
        Box::new(Condition::Gt("age".to_string(), Value::I32(25))),
        Box::new(Condition::Eq("city".to_string(), Value::String("New York".to_string()))),
    );
    let filtered_df = df.filter(&condition)?;
    println!("\nFiltered DataFrame (age > 25 AND city == \"New York\"):\n{}", filtered_df);

    // 3. Add a derived column: age_in_10_years = age + 10
    let expr_add_10 = Expr::Add(Box::new(Expr::Column("age".to_string())), Box::new(Expr::Literal(Value::I32(10))));
    let df_with_new_col = df.with_column("age_in_10_years", &expr_add_10)?;
    println!("\nDataFrame with new column (age_in_10_years):\n{}", df_with_new_col);

    // 4. Group by city, then compute the average age and the number of users per city
    let grouped_df = df.group_by(vec!["city".to_string()])?;
    let aggregated_df = grouped_df.agg(vec![("age", "mean"), ("name", "count")])?;
    println!("\nAggregated DataFrame (average age and user count by city):\n{}", aggregated_df);

    // 5. DateTime filtering: users whose last login is after a given timestamp
    let specific_date_timestamp = 1679000000; // example timestamp, same units as last_login
    let condition_dt = Condition::Gt("last_login".to_string(), Value::DateTime(specific_date_timestamp));
    let filtered_df_dt = df.filter(&condition_dt)?;
    println!("\nFiltered DataFrame (users logged in after {}):\n{}", specific_date_timestamp, filtered_df_dt);

    Ok(())
}
```

### Python Usage

```python
import veloxx

# 1. Create a DataFrame
df = veloxx.PyDataFrame({
    "name": veloxx.PySeries("name", ["Alice", "Bob", "Charlie", "David"]),
    "age": veloxx.PySeries("age", [25, 30, 22, 35]),
    "city": veloxx.PySeries("city", ["New York", "London", "New York", "Paris"]),
})
print("Original DataFrame:")
print(df)

# 2. Filter data: age > 25
filtered_df = df.filter([i for i, age in enumerate(df.get_column("age").to_vec_f64()) if age > 25])
print("\nFiltered DataFrame (age > 25):")
print(filtered_df)

# 3. Select columns
selected_df = df.select_columns(["name", "city"])
print("\nSelected Columns (name, city):")
print(selected_df)

# 4. Rename a column
renamed_df = df.rename_column("age", "years")
print("\nRenamed Column (age to years):")
print(renamed_df)

# 5. Series operations
age_series = df.get_column("age")
print(f"\nAge Series Sum: {age_series.sum()}")
print(f"Age Series Mean: {age_series.mean()}")
print(f"Age Series Max: {age_series.max()}")
print(f"Age Series Unique: {age_series.unique().to_vec_f64()}")
```

WebAssembly Usage (Node.js)

```javascript
const veloxx = require('veloxx');

async function runWasmExample() {
    // 1. Create a DataFrame
    const df = new veloxx.WasmDataFrame({
        name: ["Alice", "Bob", "Charlie", "David"],
        age: [25, 30, 22, 35],
        city: ["New York", "London", "New York", "Paris"],
    });
    console.log("Original DataFrame:");
    console.log(df);

    // 2. Filter data: age > 25 (collect matching row indices, then filter by them)
    const ageSeries = df.getColumn("age");
    const filteredIndices = [];
    for (let i = 0; i < ageSeries.len; i++) {
        if (ageSeries.getValue(i) > 25) {
            filteredIndices.push(i);
        }
    }
    const filteredDf = df.filter(new Uint32Array(filteredIndices));
    console.log("\nFiltered DataFrame (age > 25):");
    console.log(filteredDf);

    // 3. Series operations
    console.log(`\nAge Series Sum: ${ageSeries.sum()}`);
    console.log(`Age Series Mean: ${ageSeries.mean()}`);
    console.log(`Age Series Unique: ${ageSeries.unique().toVecF64()}`);
}

// Report errors instead of leaving the promise rejection unhandled
runWasmExample().catch(console.error);
```

Non-Functional Requirements


- **Comprehensive Documentation:** Extensive `///` documentation for all public APIs, examples, and design choices.
- **Robust Testing:** Thorough unit and integration tests covering all functionalities and edge cases.
- **Performance Benchmarking:** Includes benchmarks to track performance and memory usage, ensuring lightweight and high-performance goals are met.
- **Cross-Platform Compatibility:** Designed to work on common operating systems (Linux, macOS, Windows).
- **Safety:** Upholds Rust's safety guarantees, with minimal and heavily justified `unsafe` code.

## Future Considerations / Roadmap

- **Streaming Data:** Support for processing data in a streaming fashion.
- **Time-Series Functionality:** Basic time-series resampling, rolling windows.
- **FFI (Foreign Function Interface):** Consider a C API for integration with other languages (e.g., Python, JavaScript).
- **Simple Plotting Integration:** Provide hooks or basic data preparation for common plotting libraries.
- **Persistence:** Basic serialization/deserialization formats (e.g., custom binary format, Parquet subset).

## WebAssembly Testing

WebAssembly bindings are currently tested using `console.assert` in `test_wasm.js`. Future work includes migrating to a more robust JavaScript testing framework like Jest.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

Release history

- 0.4.0: Nov 26, 2025
- 0.3.2: Aug 29, 2025
- 0.3.1: Jul 25, 2025
- 0.3.0: Jul 13, 2025
- 0.2.4: Jul 9, 2025
- 0.2.3: Jul 7, 2025
- 0.2.2 (this version): Jul 6, 2025

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

veloxx-0.2.2-py3-none-win_amd64.whl (64.2 kB)

Uploaded Jul 6, 2025; Python 3, Windows x86-64

File details

Details for the file veloxx-0.2.2-py3-none-win_amd64.whl.

File metadata

Download URL: veloxx-0.2.2-py3-none-win_amd64.whl
Upload date: Jul 6, 2025
Size: 64.2 kB
Tags: Python 3, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.9.0

File hashes

Hashes for veloxx-0.2.2-py3-none-win_amd64.whl
Algorithm	Hash digest
SHA256	`90a6df1f90295a736905aaa1f9501d2364a8fae8a43e07a7c48b5b4840bbfdac`
MD5	`9c20c6fbb4fe3d8945d6fb053a50c7fa`
BLAKE2b-256	`e43c87e62aed40b57a6b898c50ead382d672f5500d07adb1665498b354efc6db`
