🐻 DataFrame Library

Orso

Orso is a shared DataFrame library for Opteryx and Mabel.

Overview

Orso is not intended to compete with Polars or Pandas (or your favorite bear DataFrame technology); instead, it is developed as a common layer for Mabel and Opteryx.

Key Use Cases:

  • In Opteryx, Orso provides most of the database Cursor functionality
  • In Mabel, Orso provides the data schema and validation functionality
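
For readers unfamiliar with the term, "Cursor functionality" most likely refers to the style of interface defined by Python's DB-API (PEP 249), where a cursor exposes column metadata and returns rows as tuples. The sketch below uses the standard library's sqlite3 module rather than Orso, purely to illustrate that style of interface; it makes no claim about how Opteryx wires Orso in.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Alice", 30), ("Bob", 25), ("Charlie", 35)],
)

cursor = conn.execute("SELECT name, age FROM people ORDER BY age")

# DB-API cursors describe their columns and hand back rows as tuples.
column_names = [col[0] for col in cursor.description]
first_row = cursor.fetchone()
remaining_rows = cursor.fetchall()

print(column_names)
print(first_row)
print(remaining_rows)
conn.close()
```

The fetchall() method listed in the API Reference follows the same naming convention.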

Orso DataFrames are row-based, driven by their initial target use cases as the write-ahead log (WAL) for Mabel and the Cursor for Opteryx. Each row in an Orso DataFrame can be quickly converted to a tuple of values, a dictionary, or a byte representation.
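
To make the row-based idea concrete, here is a minimal sketch of a row that stores its values in a tuple and shares column names with its siblings. SketchRow and its method names are hypothetical and are not Orso's Row class; the JSON byte encoding is an assumption chosen only for illustration, since Orso's actual byte format is not described here.

```python
import json
from typing import Any, Dict, Tuple


class SketchRow:
    """Hypothetical row: values live in a tuple, column names are shared."""

    def __init__(self, columns: Tuple[str, ...], values: Tuple[Any, ...]):
        if len(columns) != len(values):
            raise ValueError("columns and values must have the same length")
        self.columns = columns
        self.values = values

    def as_tuple(self) -> Tuple[Any, ...]:
        # The tuple is the native form, so this is just a reference.
        return self.values

    def as_dict(self) -> Dict[str, Any]:
        # Pair each shared column name with this row's value.
        return dict(zip(self.columns, self.values))

    def as_bytes(self) -> bytes:
        # JSON is an illustrative stand-in, not Orso's serialization format.
        return json.dumps(self.values).encode("utf-8")


columns = ("name", "age", "city")
row = SketchRow(columns, ("Alice", 30, "New York"))

print(row.as_tuple())
print(row.as_dict())
print(row.as_bytes())
```

Keeping values in a tuple makes the tuple conversion free and the dictionary conversion a single zip, which is one plausible reason a row-oriented layout suits cursor and log workloads.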

Installation

Install Orso from PyPI:

pip install orso

Quick Start

Creating a DataFrame

import orso

# Create from list of dictionaries
df = orso.DataFrame([
    {'name': 'Alice', 'age': 30, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'San Francisco'},
    {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
])

print(f"Created DataFrame with {df.rowcount} rows and {df.columncount} columns")

Displaying Data

# Display the DataFrame
print(df.display())

# Convert to different formats
arrow_table = df.arrow()  # PyArrow Table
pandas_df = df.pandas()   # Pandas DataFrame

Working with Schema

# Access column names
print("Columns:", df.column_names)

# Access schema information  
print("Schema:", df.schema)

Converting Between Formats

# From PyArrow
import pyarrow as pa
arrow_table = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
orso_df = orso.DataFrame.from_arrow(arrow_table)

# To Pandas
pandas_df = orso_df.pandas()

Features

  • Lightweight: Minimal overhead for tabular data operations
  • Row-based: Optimized for row-oriented operations
  • Interoperable: Easy conversion to/from PyArrow, Pandas
  • Schema-aware: Built-in data validation and type checking
  • Fast serialization: Efficient conversion to bytes, tuples, and dictionaries
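
As a rough illustration of what "schema-aware" validation can mean, the sketch below checks a row dictionary against a mapping of column names to expected Python types. SCHEMA and validate_row are hypothetical names invented for this example; Orso's own schema classes and validation rules are not shown in this document and may work differently.

```python
from typing import Any, Dict, List

# Hypothetical schema: column name -> expected Python type.
SCHEMA: Dict[str, type] = {"name": str, "age": int, "city": str}


def validate_row(row: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the row conforms."""
    problems = []
    for column, expected in schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(
                f"{column}: expected {expected.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    for column in row:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems


good = validate_row({"name": "Alice", "age": 30, "city": "New York"}, SCHEMA)
bad = validate_row({"name": "Bob", "age": "25"}, SCHEMA)

print(good)
print(bad)
```

Returning a list of problems instead of raising on the first one is a design choice for this sketch; it lets a caller report every issue in a row at once.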

API Reference

DataFrame Class

The main DataFrame class provides the following key methods:

  • DataFrame(dictionaries=None, *, rows=None, schema=None) - Constructor
  • display(limit=5, colorize=True, show_types=True) - Pretty print the DataFrame
  • arrow(size=None) - Convert to PyArrow Table
  • pandas(size=None) - Convert to Pandas DataFrame
  • from_arrow(tables) - Create DataFrame from PyArrow Table(s)
  • fetchall() - Get all rows as list of Row objects
  • collect() - Materialize the DataFrame
  • append(other) - Append another DataFrame
  • distinct() - Get unique rows
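
The semantics of append and distinct can be pictured with plain Python tuples. This is a sketch of the general technique only: distinct_rows is a hypothetical helper, and whether Orso's distinct() preserves row order or uses a set internally is an assumption not confirmed by this document.

```python
from typing import Any, Iterable, List, Tuple

Row = Tuple[Any, ...]


def distinct_rows(rows: Iterable[Row]) -> List[Row]:
    """Keep the first occurrence of each row, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:  # requires every value in the row to be hashable
            seen.add(row)
            unique.append(row)
    return unique


first = [("Alice", 30), ("Bob", 25)]
second = [("Bob", 25), ("Charlie", 35)]

# Concatenation stands in for appending one set of rows to another.
combined = first + second
unique = distinct_rows(combined)

print(unique)
```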

Properties

  • rowcount - Number of rows
  • columncount - Number of columns
  • column_names - List of column names
  • schema - Schema information

Development

Building from Source

# Clone the repository
git clone https://github.com/mabel-dev/orso.git
cd orso

# Install dependencies
pip install -r requirements.txt
pip install -r tests/requirements.txt

# Build Cython extensions
make compile

# Run tests
make test

Contributing

Orso is part of the Mabel ecosystem. Contributions are welcome! Please ensure:

  1. All tests pass: make test
  2. Code follows the project style: make lint
  3. New features include appropriate tests
  4. Documentation is updated for API changes

Performance Benchmarking

Orso includes a comprehensive performance benchmark suite to compare different versions:

# Run full benchmark suite
python tests/test_benchmark_suite.py

# Compare two versions
python tests/test_benchmark_suite.py -o baseline.json
# <switch version>
python tests/test_benchmark_suite.py -o current.json -c baseline.json

See BENCHMARK_SUITE.md for detailed documentation.

License

Orso is licensed under Apache 2.0 unless explicitly indicated otherwise.

Status

Orso is in beta. Beta means different things to different people; to us, it means:

  • Interfaces are generally stable but may still have breaking changes
  • Unit tests are not reliable enough to capture breaks to functionality
  • Bugs are likely to exist in edge cases
  • Code may not be tuned for performance

As such, we really don't recommend using Orso in critical applications.

Related Projects

  • Opteryx - SQL query engine for data files
  • Mabel - Data processing framework
