Veloxx: A high-performance, lightweight Rust library for in-memory data processing and analytics. Featuring DataFrames, Series, CSV/JSON I/O, powerful transformations, aggregations, and statistical functions for efficient data science and engineering.

Veloxx: Lightweight Rust-Powered Data Processing & Analytics Library

New in 0.2.1: Major performance improvements across all core operations. See CHANGELOG for details.

Veloxx is a Rust library for fast, lightweight in-memory data processing and analytics. It prioritizes minimal dependencies, a small memory footprint, and compile-time guarantees, which makes it a good fit for resource-constrained environments, high-performance computing, and applications where every byte and cycle counts.

Core Principles & Design Goals

  • Extremely Lightweight: Aims for zero or very few carefully selected external crates, with minimal overhead and a small binary size.
  • Performance First: Leverages Rust's zero-cost abstractions, with potential for SIMD and parallelism. Data structures are optimized for cache efficiency.
  • Safety & Reliability: Fully utilizes Rust's ownership and borrowing system to ensure memory safety and prevent common data manipulation errors. Unsafe code is minimized and thoroughly audited.
  • Ergonomics & Idiomatic Rust API: Designed for a clean, discoverable, and user-friendly API that feels natural to Rust developers, supporting method chaining and strong static typing.
  • Composability & Extensibility: Features a modular design, allowing components to be independent and easily combinable, and is built to be easily extendable.

Key Features

Core Data Structures

  • DataFrame: A columnar data store supporting heterogeneous data types per column (i32, f64, bool, String, DateTime). Efficient storage and handling of missing values.
  • Series (or Column): A single-typed, named column of data within a DataFrame, providing type-specific operations.
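
As a rough illustration of the columnar layout described above, the following standalone sketch stores each column as a typed vector of optional values, so a missing value is simply None. The names Column and Frame are hypothetical and use only the standard library; they are not Veloxx's actual types.

```rust
use std::collections::BTreeMap;

// Hypothetical types for illustration only; NOT Veloxx's definitions.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    I32(Vec<Option<i32>>),
    F64(Vec<Option<f64>>),
    Bool(Vec<Option<bool>>),
    Str(Vec<Option<String>>),
}

// Count missing entries in a slice of optional values.
fn count_none<T>(values: &[Option<T>]) -> usize {
    values.iter().filter(|v| v.is_none()).count()
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::I32(v) => v.len(),
            Column::F64(v) => v.len(),
            Column::Bool(v) => v.len(),
            Column::Str(v) => v.len(),
        }
    }

    fn null_count(&self) -> usize {
        match self {
            Column::I32(v) => count_none(v),
            Column::F64(v) => count_none(v),
            Column::Bool(v) => count_none(v),
            Column::Str(v) => count_none(v),
        }
    }
}

// A frame is a set of named columns that must all have the same length.
struct Frame {
    columns: BTreeMap<String, Column>,
    rows: usize,
}

impl Frame {
    fn new(columns: BTreeMap<String, Column>) -> Result<Self, String> {
        let rows = columns.values().next().map_or(0, |c| c.len());
        for (name, col) in &columns {
            if col.len() != rows {
                return Err(format!(
                    "column '{}' has {} rows, expected {}",
                    name,
                    col.len(),
                    rows
                ));
            }
        }
        Ok(Frame { columns, rows })
    }
}
```

Keeping one homogeneous vector per column is what lets type-specific operations run over contiguous data; the length check in Frame::new is the only invariant this sketch enforces.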

Data Ingestion & Loading

  • From Vec<Vec<T>> / Iterator: Basic in-memory construction from Rust native collections.
  • CSV Support: Minimalistic, highly efficient CSV parser for loading data.
  • JSON Support: Efficient parsing for common JSON structures.
  • Custom Data Sources: Traits/interfaces for users to implement their own data loading mechanisms.
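
To make the idea of a pluggable data source concrete, here is a minimal standalone sketch. The DataSource trait, the CsvText type, and the load method are all hypothetical names invented for this example; Veloxx's real trait may look quite different. The parser is deliberately naive: it handles numeric fields only, treats empty fields as nulls, and does not support quoted fields.

```rust
// Hypothetical interface for illustration only; NOT Veloxx's trait.
type NamedColumns = Vec<(String, Vec<Option<f64>>)>;

trait DataSource {
    fn load(&self) -> Result<NamedColumns, String>;
}

// An in-memory CSV document with a header row and numeric fields.
struct CsvText<'a>(&'a str);

impl DataSource for CsvText<'_> {
    fn load(&self) -> Result<NamedColumns, String> {
        let mut lines = self.0.lines();
        let header = lines.next().ok_or("empty input")?;
        let mut cols: NamedColumns = header
            .split(',')
            .map(|h| (h.trim().to_string(), Vec::new()))
            .collect();

        for (n, line) in lines.enumerate() {
            if line.trim().is_empty() {
                continue; // skip blank lines
            }
            let fields: Vec<&str> = line.split(',').collect();
            if fields.len() != cols.len() {
                return Err(format!(
                    "row {} has {} fields, expected {}",
                    n + 1,
                    fields.len(),
                    cols.len()
                ));
            }
            for (col, field) in cols.iter_mut().zip(fields) {
                let field = field.trim();
                // An empty field becomes a null; anything else must parse as f64.
                let value = if field.is_empty() {
                    None
                } else {
                    Some(
                        field
                            .parse::<f64>()
                            .map_err(|e| format!("row {}: {}", n + 1, e))?,
                    )
                };
                col.1.push(value);
            }
        }
        Ok(cols)
    }
}
```

Any other source (a database cursor, a network stream) could implement the same trait and hand back columns in the same shape.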

Data Cleaning & Preparation

  • drop_nulls(): Remove rows with any null values.
  • fill_nulls(value): Fill nulls with a specified value (type-aware, including DateTime).
  • interpolate_nulls(): Basic linear interpolation for numeric and DateTime series.
  • Type Casting: Efficient conversion between compatible data types for Series (e.g., i32 to f64).
  • rename_column(old_name, new_name): Rename columns.
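
The three null-handling operations above can be sketched on a plain numeric column as follows. This is a standalone illustration using only the standard library; the function signatures are assumptions and do not reflect Veloxx's API. In this sketch, interpolation fills only gaps that have a known value on both sides, so leading and trailing nulls stay null.

```rust
// Standalone sketch; signatures are assumptions, not Veloxx's API.

/// Keep only the non-null values.
fn drop_nulls(values: &[Option<f64>]) -> Vec<f64> {
    values.iter().flatten().copied().collect()
}

/// Replace every null with `fill`.
fn fill_nulls(values: &[Option<f64>], fill: f64) -> Vec<f64> {
    values.iter().map(|v| v.unwrap_or(fill)).collect()
}

/// Linearly interpolate nulls that lie between two known values.
fn interpolate_nulls(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut out = values.to_vec();
    // Index and value of the most recent non-null entry.
    let mut prev: Option<(usize, f64)> = None;
    for (i, v) in values.iter().enumerate() {
        if let Some(right) = *v {
            if let Some((p, left)) = prev {
                let step = (right - left) / (i - p) as f64;
                for k in (p + 1)..i {
                    out[k] = Some(left + step * (k - p) as f64);
                }
            }
            prev = Some((i, right));
        }
    }
    out
}
```

For example, interpolating [1.0, null, null, 4.0, null] fills the two interior gaps with 2.0 and 3.0 and leaves the trailing null untouched.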

Data Transformation & Manipulation

  • Selection: select_columns(names), drop_columns(names).
  • Filtering: Predicate-based row selection using logical (AND, OR, NOT) and comparison operators (==, !=, <, >, <=, >=).
  • Projection: with_column(new_name, expression), apply() for user-defined functions.
  • Sorting: Sort DataFrame by one or more columns (ascending/descending).
  • Joining: Basic inner, left, and right join operations on common keys.
  • Concatenation/Append: Combine DataFrames vertically.
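
Predicate-based filtering with logical and comparison operators can be modelled as a small expression tree evaluated once per row. The sketch below is standalone and restricted to integer columns; Pred, cell, and filter_rows are hypothetical names chosen for this example and are not Veloxx's Condition type. A comparison against a missing column or row evaluates to false here, which is a design choice of the sketch.

```rust
use std::collections::BTreeMap;

// Hypothetical types for illustration only; NOT Veloxx's Condition API.
type Columns = BTreeMap<String, Vec<i64>>;

enum Pred {
    Eq(String, i64),
    Gt(String, i64),
    Lt(String, i64),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
    Not(Box<Pred>),
}

// Look up one cell; None if the column or row does not exist.
fn cell<'a>(cols: &'a Columns, name: &str, row: usize) -> Option<&'a i64> {
    cols.get(name).and_then(|v| v.get(row))
}

impl Pred {
    fn eval(&self, cols: &Columns, row: usize) -> bool {
        match self {
            Pred::Eq(c, v) => cell(cols, c, row) == Some(v),
            Pred::Gt(c, v) => cell(cols, c, row).map_or(false, |x| x > v),
            Pred::Lt(c, v) => cell(cols, c, row).map_or(false, |x| x < v),
            Pred::And(a, b) => a.eval(cols, row) && b.eval(cols, row),
            Pred::Or(a, b) => a.eval(cols, row) || b.eval(cols, row),
            Pred::Not(p) => !p.eval(cols, row),
        }
    }
}

/// Return the indices of the rows for which the predicate holds.
fn filter_rows(cols: &Columns, n_rows: usize, pred: &Pred) -> Vec<usize> {
    (0..n_rows).filter(|&i| pred.eval(cols, i)).collect()
}
```

Returning row indices rather than copied rows keeps the filter cheap; a caller can then gather the selected rows from each column in one pass.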

Aggregation & Reduction

  • Simple Aggregations: sum(), mean(), median(), min(), max(), count(), std_dev().
  • Group By: Perform aggregations on groups defined by one or more columns.
  • Unique Values: unique() for a Series or for the columns of a DataFrame.
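
A group-by aggregation boils down to accumulating per-key state and then finalizing it. The following standalone sketch computes the mean and count of a numeric column grouped by a string key; group_mean_count is a hypothetical helper, not a Veloxx function, and it assumes the two slices have the same length (extra elements in the longer one are ignored).

```rust
use std::collections::BTreeMap;

// Standalone sketch; NOT Veloxx's group_by/agg implementation.
/// Group `values` by `keys` and return (mean, count) per key.
fn group_mean_count(keys: &[&str], values: &[f64]) -> BTreeMap<String, (f64, usize)> {
    // First pass: accumulate (sum, count) for each key.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (k, v) in keys.iter().zip(values) {
        let entry = acc.entry(k.to_string()).or_insert((0.0, 0));
        entry.0 += v;
        entry.1 += 1;
    }
    // Second pass: turn each sum into a mean.
    acc.into_iter()
        .map(|(k, (sum, n))| (k, (sum / n as f64, n)))
        .collect()
}
```

With cities ["New York", "London", "New York", "Paris"] and ages [25, 30, 22, 35], the "New York" group ends up with a mean of 23.5 over 2 rows.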

Basic Analytics & Statistics

  • describe(): Provides summary statistics for numeric columns (count, mean, std, min, max, quartiles).
  • correlation(): Calculate Pearson correlation between two numeric Series.
  • covariance(): Calculate covariance.
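
For reference, covariance and Pearson correlation can be computed as in the standalone sketch below. It assumes the sample (n - 1) normalization for covariance; whether Veloxx uses the sample or population form is not stated here, so treat that as an assumption. The function names mirror the feature list but the signatures are hypothetical.

```rust
// Standalone sketch; signatures and the (n - 1) normalization are assumptions.

fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        None
    } else {
        Some(xs.iter().sum::<f64>() / xs.len() as f64)
    }
}

/// Sample covariance; None if lengths differ or fewer than two points.
fn covariance(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let (mx, my) = (mean(xs)?, mean(ys)?);
    let sum: f64 = xs.iter().zip(ys).map(|(x, y)| (x - mx) * (y - my)).sum();
    Some(sum / (xs.len() - 1) as f64)
}

/// Pearson correlation; None if either series has zero variance.
fn correlation(xs: &[f64], ys: &[f64]) -> Option<f64> {
    let cov = covariance(xs, ys)?;
    let sx = covariance(xs, xs)?.sqrt();
    let sy = covariance(ys, ys)?.sqrt();
    if sx == 0.0 || sy == 0.0 {
        None
    } else {
        Some(cov / (sx * sy))
    }
}
```

The correlation is the covariance divided by the product of the two standard deviations, so the normalization factor cancels and the result is the same for sample and population forms.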

Output & Export

  • To Vec<Vec<T>>: Export DataFrame content back to standard Rust collections.
  • To CSV: Efficiently write DataFrame to a CSV file.
  • Display/Pretty Print: User-friendly console output for DataFrame and Series.
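
Writing CSV mostly comes down to joining fields and quoting the ones that need it. The sketch below is a standalone illustration that follows the usual RFC 4180 convention (wrap a field in double quotes if it contains a comma, quote, or line break, and double any embedded quotes). escape_field and to_csv are hypothetical helpers, not Veloxx functions, and nulls are written as empty fields by assumption.

```rust
// Standalone sketch; NOT Veloxx's CSV writer.

/// Quote a field if it contains a comma, a double quote, or a line break.
fn escape_field(field: &str) -> String {
    let needs_quotes = field.contains(',')
        || field.contains('"')
        || field.contains('\n')
        || field.contains('\r');
    if needs_quotes {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Serialize a header and rows to CSV text; None is written as an empty field.
fn to_csv(header: &[&str], rows: &[Vec<Option<String>>]) -> String {
    let mut out = String::new();
    let head: Vec<String> = header.iter().map(|h| escape_field(h)).collect();
    out.push_str(&head.join(","));
    out.push('\n');
    for row in rows {
        let line: Vec<String> = row
            .iter()
            .map(|c| c.as_deref().map_or(String::new(), escape_field))
            .collect();
        out.push_str(&line.join(","));
        out.push('\n');
    }
    out
}
```

Building the output in a String keeps the sketch simple; a real writer would more likely stream into any type implementing std::io::Write.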

Installation

Rust

Veloxx is available on crates.io.

Add the following to your Cargo.toml file:

[dependencies]
veloxx = "0.2.3" # Or the latest version

To build your Rust project with Veloxx, run:

cargo build

To run tests:

cargo test

Usage Examples

Rust Usage

Here's a quick example demonstrating how to create a DataFrame, filter it, and perform a group-by aggregation:

use veloxx::dataframe::DataFrame;
use veloxx::series::Series;
use veloxx::types::Value;
use veloxx::conditions::Condition;
use veloxx::expressions::Expr;
use std::collections::BTreeMap;

fn main() -> Result<(), String> {
    // 1. Create a DataFrame
    let mut columns = BTreeMap::new();
    columns.insert("name".to_string(), Series::new_string("name", vec![Some("Alice".to_string()), Some("Bob".to_string()), Some("Charlie".to_string()), Some("David".to_string())]));
    columns.insert("age".to_string(), Series::new_i32("age", vec![Some(25), Some(30), Some(22), Some(35)]));
    columns.insert("city".to_string(), Series::new_string("city", vec![Some("New York".to_string()), Some("London".to_string()), Some("New York".to_string()), Some("Paris".to_string())]));
    columns.insert("last_login".to_string(), Series::new_datetime("last_login", vec![Some(1678886400), Some(1678972800), Some(1679059200), Some(1679145600)]));

    let df = DataFrame::new(columns)?;
    println!("Original DataFrame:\n{}", df);

    // 2. Filter data: age > 25 AND city == "New York"
    let condition = Condition::And(
        Box::new(Condition::Gt("age".to_string(), Value::I32(25))),
        Box::new(Condition::Eq("city".to_string(), Value::String("New York".to_string()))),
    );
    let filtered_df = df.filter(&condition)?;
    println!("\nFiltered DataFrame (age > 25 AND city == \"New York\"):\n{}", filtered_df);

    // 3. Add a new column: age_in_10_years = age + 10
    let expr_add_10 = Expr::Add(Box::new(Expr::Column("age".to_string())), Box::new(Expr::Literal(Value::I32(10))));
    let df_with_new_col = df.with_column("age_in_10_years", &expr_add_10)?;
    println!("\nDataFrame with new column (age_in_10_years):\n{}", df_with_new_col);

    // 4. Group by city and calculate average age and count of users
    let grouped_df = df.group_by(vec!["city".to_string()])?;
    let aggregated_df = grouped_df.agg(vec![("age", "mean"), ("name", "count")])?;
    println!("\nAggregated DataFrame (average age and user count by city):\n{}", aggregated_df);

    // 5. Demonstrate DateTime filtering (users logged in after a specific date)
    let specific_date_timestamp = 1679000000; // Example timestamp
    let condition_dt = Condition::Gt("last_login".to_string(), Value::DateTime(specific_date_timestamp));
    let filtered_df_dt = df.filter(&condition_dt)?;
    println!("\nFiltered DataFrame (users logged in after {}):\n{}", specific_date_timestamp, filtered_df_dt);

    Ok(())
}

Python Usage

import veloxx

# 1. Create a DataFrame
df = veloxx.PyDataFrame({
    "name": veloxx.PySeries("name", ["Alice", "Bob", "Charlie", "David"]),
    "age": veloxx.PySeries("age", [25, 30, 22, 35]),
    "city": veloxx.PySeries("city", ["New York", "London", "New York", "Paris"]),
})
print("Original DataFrame:")
print(df)

# 2. Filter data: age > 25
filtered_df = df.filter([i for i, age in enumerate(df.get_column("age").to_vec_f64()) if age > 25])
print("\nFiltered DataFrame (age > 25):")
print(filtered_df)

# 3. Select columns
selected_df = df.select_columns(["name", "city"])
print("\nSelected Columns (name, city):")
print(selected_df)

# 4. Rename a column
renamed_df = df.rename_column("age", "years")
print("\nRenamed Column (age to years):")
print(renamed_df)

# 5. Series operations
age_series = df.get_column("age")
print(f"\nAge Series Sum: {age_series.sum()}")
print(f"Age Series Mean: {age_series.mean()}")
print(f"Age Series Max: {age_series.max()}")
print(f"Age Series Unique: {age_series.unique().to_vec_f64()}")

WebAssembly Usage (Node.js)

const veloxx = require('veloxx');

async function runWasmExample() {
    // 1. Create a DataFrame
    const df = new veloxx.WasmDataFrame({
        name: ["Alice", "Bob", "Charlie", "David"],
        age: [25, 30, 22, 35],
        city: ["New York", "London", "New York", "Paris"],
    });
    console.log("Original DataFrame:");
    console.log(df);

    // 2. Filter data: age > 25
    const ageSeries = df.getColumn("age");
    const filteredIndices = [];
    for (let i = 0; i < ageSeries.len; i++) {
        if (ageSeries.getValue(i) > 25) {
            filteredIndices.push(i);
        }
    }
    const filteredDf = df.filter(new Uint32Array(filteredIndices));
    console.log("\nFiltered DataFrame (age > 25):");
    console.log(filteredDf);

    // 3. Series operations
    console.log(`\nAge Series Sum: ${ageSeries.sum()}`);
    console.log(`Age Series Mean: ${ageSeries.mean()}`);
    console.log(`Age Series Unique: ${ageSeries.unique().toVecF64()}`);
}

runWasmExample();
