🐻 DataFrame Library

Orso

Orso is a shared DataFrame library for Opteryx and Mabel.

Overview

Orso is not intended to compete with Polars or Pandas (or your favorite bear DataFrame technology). Instead, it is developed as a common layer for Mabel and Opteryx.

Key Use Cases:

  • In Opteryx, Orso provides most of the database Cursor functionality
  • In Mabel, Orso provides the data schema and validation functionality

Orso DataFrames are row-based, driven by their initial target use cases as the write-ahead log (WAL) for Mabel and the Cursor for Opteryx. Each row in an Orso DataFrame can be quickly converted to a tuple of values, a dictionary, or a byte representation.
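
To make the row-based model concrete, here is a minimal sketch of a row that converts to a tuple, a dictionary, and bytes. It uses only the Python standard library. `SketchRow`, its `COLUMNS` attribute, and the JSON byte encoding are illustrative assumptions, not Orso's actual row class or serialization format.

```python
import json
from typing import Any, Dict, Tuple


class SketchRow(tuple):
    """Hypothetical row type: a tuple of values plus a shared column list."""

    # Assumed column order, shared by every row of this class.
    COLUMNS: Tuple[str, ...] = ("name", "age", "city")

    def as_dict(self) -> Dict[str, Any]:
        # Pair each column name with the value at the same position.
        return dict(zip(self.COLUMNS, self))

    def to_bytes(self) -> bytes:
        # JSON is used only for illustration; Orso's byte format may differ.
        return json.dumps(list(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "SketchRow":
        # Rebuild a row from the bytes produced by to_bytes().
        return cls(json.loads(data.decode("utf-8")))


row = SketchRow(("Alice", 30, "New York"))
as_tuple = tuple(row)      # plain tuple of values
as_dict = row.as_dict()    # column name -> value
as_bytes = row.to_bytes()  # compact byte representation
```

Because the row is itself a tuple, the tuple conversion is essentially free, and the column names live once on the class rather than on every row. That is one plausible reason a row-oriented layout suits cursor-style access.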

Installation

Install Orso from PyPI:

pip install orso
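
To confirm the installation from Python, one option is to ask the standard library's `importlib.metadata` for the installed version. The helper name `installed_version` is an assumption made for this sketch and is not part of Orso.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "orso") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


orso_version = installed_version()
print(orso_version or "orso is not installed in this environment")
```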

Quick Start

Creating a DataFrame

import orso

# Create from list of dictionaries
df = orso.DataFrame([
    {'name': 'Alice', 'age': 30, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'San Francisco'},
    {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
])

print(f"Created DataFrame with {df.rowcount} rows and {df.columncount} columns")

Displaying Data

# Display the DataFrame
print(df.display())

# Convert to different formats
arrow_table = df.arrow()  # PyArrow Table
pandas_df = df.pandas()   # Pandas DataFrame

Working with Schema

# Access column names
print("Columns:", df.column_names)

# Access schema information  
print("Schema:", df.schema)

Converting Between Formats

# From PyArrow
import pyarrow as pa
arrow_table = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
orso_df = orso.DataFrame.from_arrow(arrow_table)

# To Pandas
pandas_df = orso_df.pandas()

Features

  • Lightweight: Minimal overhead for tabular data operations
  • Row-based: Optimized for row-oriented operations
  • Interoperable: Easy conversion to/from PyArrow, Pandas
  • Schema-aware: Built-in data validation and type checking
  • Fast serialization: Efficient conversion to bytes, tuples, and dictionaries
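
As a rough illustration of what schema-aware validation means, the sketch below checks dictionaries against a mapping of column names to expected Python types. The `SCHEMA` mapping and the `validate` function are hypothetical and standard-library only; Orso's own schema classes and validation rules are not shown here and may work differently.

```python
from typing import Any, Dict, List

# Hypothetical schema: column name -> expected Python type.
SCHEMA: Dict[str, type] = {"name": str, "age": int, "city": str}


def validate(record: Dict[str, Any], schema: Dict[str, type] = SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Check every column the schema expects.
    for column, expected in schema.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected):
            actual = type(record[column]).__name__
            problems.append(f"{column}: expected {expected.__name__}, got {actual}")
    # Flag columns the schema does not know about.
    for column in record:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems


good = validate({"name": "Alice", "age": 30, "city": "New York"})
bad = validate({"name": "Bob", "age": "25"})
print(good)  # []
print(bad)   # ['age: expected int, got str', 'missing column: city']
```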

API Reference

DataFrame Class

The main DataFrame class provides the following key methods:

  • DataFrame(dictionaries=None, *, rows=None, schema=None) - Constructor
  • display(limit=5, colorize=True, show_types=True) - Pretty print the DataFrame
  • arrow(size=None) - Convert to PyArrow Table
  • pandas(size=None) - Convert to Pandas DataFrame
  • from_arrow(tables) - Create DataFrame from PyArrow Table(s)
  • fetchall() - Get all rows as list of Row objects
  • collect() - Materialize the DataFrame
  • append(other) - Append another DataFrame
  • distinct() - Get unique rows

Properties

  • rowcount - Number of rows
  • columncount - Number of columns
  • column_names - List of column names
  • schema - Schema information
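
The properties above are enough to write small helpers that summarize a DataFrame. The sketch below relies only on `rowcount`, `columncount`, and `column_names` as listed here. The `describe` function is a hypothetical helper, and a `SimpleNamespace` stand-in is used so the example runs without Orso installed.

```python
from types import SimpleNamespace


def describe(df) -> str:
    """Summarize any object exposing rowcount, columncount and column_names."""
    columns = ", ".join(df.column_names)
    return f"{df.rowcount} rows x {df.columncount} columns ({columns})"


# Stand-in object with the same three properties; with Orso installed,
# an orso.DataFrame could be passed instead.
stand_in = SimpleNamespace(
    rowcount=3,
    columncount=3,
    column_names=["name", "age", "city"],
)
summary = describe(stand_in)
print(summary)  # 3 rows x 3 columns (name, age, city)
```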

Development

Building from Source

# Clone the repository
git clone https://github.com/mabel-dev/orso.git
cd orso

# Install dependencies
pip install -r requirements.txt
pip install -r tests/requirements.txt

# Build Cython extensions
make compile

# Run tests
make test

Contributing

Orso is part of the Mabel ecosystem. Contributions are welcome! Please ensure:

  1. All tests pass: make test
  2. Code follows the project style: make lint
  3. New features include appropriate tests
  4. Documentation is updated for API changes

Performance Benchmarking

Orso includes a comprehensive performance benchmark suite to compare different versions:

# Run full benchmark suite
python tests/test_benchmark_suite.py

# Compare two versions
python tests/test_benchmark_suite.py -o baseline.json
# <switch version>
python tests/test_benchmark_suite.py -o current.json -c baseline.json

See BENCHMARK_SUITE.md for detailed documentation.

License

Orso is licensed under Apache 2.0 unless explicitly indicated otherwise.

Status

Orso is in beta. Beta means different things to different people. To us, being beta means:

  • Interfaces are generally stable but may still have breaking changes
  • Unit tests are not reliable enough to capture breaks to functionality
  • Bugs are likely to exist in edge cases
  • Code may not be tuned for performance

As such, we really don't recommend using Orso in critical applications.

Related Projects

  • Opteryx - SQL query engine for data files
  • Mabel - Data processing framework

