🐻 DataFrame Library

Orso

Orso is a shared DataFrame library for Opteryx and Mabel.

Overview

Orso is not intended to compete with Polars or Pandas (or your favorite bear DataFrame technology); instead, it is developed as a common layer for Mabel and Opteryx.

Key Use Cases:

  • In Opteryx, Orso provides most of the database Cursor functionality
  • In Mabel, Orso provides the data schema and validation functionality
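
To make the Cursor use case concrete, the sketch below shows the kind of DB-API-style fetching that a row-based container makes straightforward. It is an illustration in plain Python, not Orso's code: `SketchCursor` and its internals are hypothetical, and the method names simply follow the PEP 249 conventions (`fetchone`, `fetchmany`, `fetchall`).

```python
from typing import Any, Iterable, List, Optional, Tuple

Row = Tuple[Any, ...]


class SketchCursor:
    """Hypothetical PEP 249-style cursor over in-memory rows (not Orso's class)."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows: List[Row] = list(rows)
        self._position = 0  # index of the next row to hand out

    @property
    def rowcount(self) -> int:
        return len(self._rows)

    def fetchone(self) -> Optional[Row]:
        """Return the next row, or None when the rows are exhausted."""
        if self._position >= len(self._rows):
            return None
        row = self._rows[self._position]
        self._position += 1
        return row

    def fetchmany(self, size: int = 1) -> List[Row]:
        """Return up to `size` rows and advance the cursor past them."""
        chunk = self._rows[self._position : self._position + size]
        self._position += len(chunk)
        return chunk

    def fetchall(self) -> List[Row]:
        """Return every row that has not been fetched yet."""
        return self.fetchmany(len(self._rows))


cursor = SketchCursor([("Alice", 30), ("Bob", 25), ("Charlie", 35)])
first = cursor.fetchone()  # consumes the first row
rest = cursor.fetchall()   # consumes whatever is left
print(first, rest)
```

Because the rows are already stored one after another, each fetch is a slice plus a position update; no per-column reassembly is needed.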

Orso DataFrames are row-based, driven by their initial target use cases: the WAL (write-ahead log) for Mabel and the Cursor for Opteryx. Each row in an Orso DataFrame can be quickly converted to a tuple of values, a dictionary, or a byte representation.
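
As a rough illustration of those three conversions, the sketch below models a row as a tuple that also remembers its column names. It is not Orso's implementation: `SketchRow`, `as_dict`, and `to_bytes` are hypothetical names, and JSON is used for the byte form purely for readability (Orso's actual byte format is not described here).

```python
import json
from typing import Any, Dict, Tuple


class SketchRow(tuple):
    """Hypothetical row type: a tuple of values plus its column names."""

    def __new__(cls, columns: Tuple[str, ...], values: Tuple[Any, ...]) -> "SketchRow":
        if len(columns) != len(values):
            raise ValueError("columns and values must have the same length")
        row = super().__new__(cls, values)
        row.columns = tuple(columns)  # kept alongside the values
        return row

    def as_dict(self) -> Dict[str, Any]:
        """Pair each column name with its value."""
        return dict(zip(self.columns, self))

    def to_bytes(self) -> bytes:
        """Serialize the values only; JSON is an assumption made for this sketch."""
        return json.dumps(list(self)).encode("utf-8")


row = SketchRow(("name", "age"), ("Alice", 30))
as_tuple = tuple(row)
as_dict = row.as_dict()
as_bytes = row.to_bytes()
print(as_tuple, as_dict, as_bytes)
```

Subclassing `tuple` keeps the tuple conversion essentially free, which is one reason a row-oriented layout suits cursor-style access.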

Installation

Install Orso from PyPI:

pip install orso

Quick Start

Creating a DataFrame

import orso

# Create from list of dictionaries
df = orso.DataFrame([
    {'name': 'Alice', 'age': 30, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'San Francisco'},
    {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
])

print(f"Created DataFrame with {df.rowcount} rows and {df.columncount} columns")

Displaying Data

# Display the DataFrame
print(df.display())

# Convert to different formats
arrow_table = df.arrow()  # PyArrow Table
pandas_df = df.pandas()   # Pandas DataFrame

Working with Schema

# Access column names
print("Columns:", df.column_names)

# Access schema information  
print("Schema:", df.schema)

Converting Between Formats

import orso
import pyarrow as pa

# From PyArrow
arrow_table = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
orso_df = orso.DataFrame.from_arrow(arrow_table)

# To Pandas
pandas_df = orso_df.pandas()
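
Conceptually, moving between a row-based DataFrame and a columnar format such as Arrow means pivoting rows into columns and back. The plain-Python sketch below shows that pivot; `rows_to_columns` and `columns_to_rows` are hypothetical helpers for illustration and say nothing about how Orso performs the conversion internally.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rows_to_columns(
    names: Sequence[str], rows: Sequence[Tuple[Any, ...]]
) -> Dict[str, List[Any]]:
    """Pivot a list of row tuples into a mapping of column name -> values."""
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row width does not match the number of columns")
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Tuple[Any, ...]]:
    """Pivot a column mapping back into a list of row tuples."""
    return list(zip(*columns.values()))


rows = [(1, "a"), (2, "b"), (3, "c")]
columns = rows_to_columns(["x", "y"], rows)
round_trip = columns_to_rows(columns)
print(columns)
```

The resulting mapping has the same shape as the dictionary passed to `pa.table(...)` in the snippet above.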

Features

  • Lightweight: Minimal overhead for tabular data operations
  • Row-based: Optimized for row-oriented operations
  • Interoperable: Easy conversion to/from PyArrow, Pandas
  • Schema-aware: Built-in data validation and type checking
  • Fast serialization: Efficient conversion to bytes, tuples, and dictionaries
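
To show what schema-aware validation can look like for row data, here is a minimal sketch in plain Python. The `SketchColumn` dataclass and `validate_row` function are hypothetical and do not reflect Orso's schema classes; they only check column names, Python types, and nullability.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class SketchColumn:
    """Hypothetical column definition used only by this sketch."""

    name: str
    dtype: type
    nullable: bool = True


def validate_row(schema: List[SketchColumn], row: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the row is valid."""
    problems: List[str] = []
    expected = {column.name for column in schema}
    # Report columns that the schema does not know about.
    for extra in sorted(set(row) - expected):
        problems.append(f"unexpected column: {extra}")
    for column in schema:
        if column.name not in row:
            problems.append(f"missing column: {column.name}")
            continue
        value = row[column.name]
        if value is None:
            if not column.nullable:
                problems.append(f"{column.name} must not be null")
        elif not isinstance(value, column.dtype):
            problems.append(
                f"{column.name} expected {column.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


schema = [SketchColumn("name", str, nullable=False), SketchColumn("age", int)]
good = validate_row(schema, {"name": "Alice", "age": 30})
bad = validate_row(schema, {"name": None, "age": "thirty", "city": "Chicago"})
print(good, bad)
```

Returning a list of problems rather than raising on the first one is a design choice for this sketch; it lets a caller report every issue in a row at once.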

API Reference

DataFrame Class

The main DataFrame class provides the following key methods:

  • DataFrame(dictionaries=None, *, rows=None, schema=None) - Constructor
  • display(limit=5, colorize=True, show_types=True) - Pretty print the DataFrame
  • arrow(size=None) - Convert to PyArrow Table
  • pandas(size=None) - Convert to Pandas DataFrame
  • from_arrow(tables) - Create DataFrame from PyArrow Table(s)
  • fetchall() - Get all rows as list of Row objects
  • collect() - Materialize the DataFrame
  • append(other) - Append another DataFrame
  • distinct() - Get unique rows

Properties

  • rowcount - Number of rows
  • columncount - Number of columns
  • column_names - List of column names
  • schema - Schema information

Development

Building from Source

# Clone the repository
git clone https://github.com/mabel-dev/orso.git
cd orso

# Install dependencies
pip install -r requirements.txt
pip install -r tests/requirements.txt

# Build Cython extensions
make compile

# Run tests
make test

Contributing

Orso is part of the Mabel ecosystem. Contributions are welcome! Please ensure:

  1. All tests pass: make test
  2. Code follows the project style: make lint
  3. New features include appropriate tests
  4. Documentation is updated for API changes

Performance Benchmarking

Orso includes a comprehensive performance benchmark suite to compare different versions:

# Run full benchmark suite
python tests/test_benchmark_suite.py

# Compare two versions
python tests/test_benchmark_suite.py -o baseline.json
# <switch version>
python tests/test_benchmark_suite.py -o current.json -c baseline.json

See BENCHMARK_SUITE.md for detailed documentation.

License

Orso is licensed under Apache 2.0 unless explicitly indicated otherwise.

Status

Orso is in beta. Beta means different things to different people; to us, being beta means:

  • Interfaces are generally stable but may still have breaking changes
  • Unit tests are not reliable enough to capture breaks to functionality
  • Bugs are likely to exist in edge cases
  • Code may not be tuned for performance

As such, we really don't recommend using Orso in critical applications.

Related Projects

  • Opteryx - SQL query engine for data files
  • Mabel - Data processing framework

