
A small library with a pandas-like API for function pipeline execution and data transformations.


Dataruns

A Python library for function pipeline execution and convenient data transformations. Build simple pipelines that run a sequence of operations on your data. Built on top of pandas and NumPy.


Features

Core Capabilities:

  • Pipeline Execution: Chain multiple data transformations seamlessly
  • Pandas-Like API: Familiar interface if you know pandas
  • Multiple Data Sources: Load from CSV, Excel, SQLite, and URLs
  • Built-in Transforms: Standard scalers, missing value handlers, column selection
  • NumPy & Pandas Support: Works with both arrays and DataFrames
  • Stateful Operations: Transforms remember their state (mean, std) for consistent results

Installation

pip install dataruns

Or with uv:

uv add dataruns

Quick Start

Basic Pipeline

from dataruns import Pipeline, standard_scaler, fill_na
import pandas as pd

# Create sample data
df = pd.DataFrame({
    'age': [20, 30, 40],
    'salary': [30000, 50000, 70000]
})

# Create a pipeline
pipeline = Pipeline(
    fill_na(strategy='mean'),      # Fill missing values
    standard_scaler()               # Standardize the data
)

# Execute the pipeline
result = pipeline(df)
print(result)

Load Data from Files

from dataruns import CSVSource, XLSsource, SQLiteSource

# From CSV
csv_source = CSVSource('data.csv')
df = csv_source.extract_data()

# From Excel
excel_source = XLSsource('data.xlsx', sheet_name='Sheet1')
df = excel_source.extract_data()

# From SQLite
sqlite_source = SQLiteSource('database.db', 'SELECT * FROM my_table')
df = sqlite_source.extract_data()

# From URL
csv_source = CSVSource(url='https://example.com/data.csv')
df = csv_source.extract_data()

Quick Convenience Functions

from dataruns import load_csv

# Load CSV quickly
data = load_csv('data.csv')

Core Concepts

Pipelines

Pipeline: Execute transforms sequentially

from dataruns import Pipeline

pipeline = Pipeline(transform1, transform2, transform3, verbose=True)
result = pipeline(data)

Make_Pipeline: Builder pattern for dynamic construction

from dataruns import Make_Pipeline

builder = Make_Pipeline()
builder.add(fill_na(strategy='mean'))
builder.add(standard_scaler())
pipeline = builder.build()
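Conceptually, a sequential pipeline is just left-to-right function composition over a list of callables. The sketch below illustrates the builder pattern in plain Python; `TinyPipeline` and `TinyBuilder` are hypothetical names for illustration, not dataruns' actual implementation:

```python
class TinyPipeline:
    """Minimal sequential pipeline: feed each step's output into the next."""
    def __init__(self, *steps):
        self.steps = list(steps)

    def __call__(self, data):
        for step in self.steps:
            data = step(data)
        return data


class TinyBuilder:
    """Builder that accumulates steps, then produces a pipeline."""
    def __init__(self):
        self._steps = []

    def add(self, step):
        self._steps.append(step)
        return self  # allow chaining

    def build(self):
        return TinyPipeline(*self._steps)


# Usage: double every value, then add one
pipe = (TinyBuilder()
        .add(lambda xs: [x * 2 for x in xs])
        .add(lambda xs: [x + 1 for x in xs])
        .build())
print(pipe([1, 2, 3]))  # → [3, 5, 7]
```

The builder form is handy when the step list depends on runtime conditions (e.g. only add a scaler for numeric data).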

Available Transforms

from dataruns.core.transforms import get_transforms

# This lists out all available transforms that have been implemented
print(get_transforms())

Complete Example

from dataruns import Pipeline, load_csv
from dataruns.core.transforms import select_columns, fill_na, standard_scaler
import numpy as np

# Load data
data = load_csv('customers.csv')

# Create comprehensive pipeline
pipeline = Pipeline(
    fill_na(strategy='mean'),           # Handle missing values
    select_columns(['age', 'income']),  # Keep relevant columns
    standard_scaler(),                  # Normalize for ML
    verbose=True                        # Show each step
)

# Process data
result = pipeline(data)

# Use with machine learning models
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(result)

Data Sources

Supported data sources include CSVSource, XLSsource, and SQLiteSource, with more planned.

from dataruns import CSVSource, XLSsource, SQLiteSource

# CSV
source = CSVSource(file_path='data.csv')
# or from URL
source = CSVSource(url='https://example.com/data.csv')

# Excel
source = XLSsource(file_path='data.xlsx', sheet_name='Sheet1')

# SQLite
source = SQLiteSource(
    connection_string='database.db',
    query='SELECT * FROM users WHERE age > 18'
)

# Extract data
df = source.extract_data()
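The sources above share a common pattern: construct with connection info, then call `extract_data()`. A self-contained sketch of that pattern using only the standard library (`MiniSQLiteSource` is a hypothetical illustration; dataruns' own source returns a pandas DataFrame, while this sketch returns raw rows):

```python
import sqlite3


class MiniSQLiteSource:
    """Hypothetical minimal source: hold connection info, extract on demand."""
    def __init__(self, connection_string, query):
        self.connection_string = connection_string
        self.query = query

    def extract_data(self):
        # Open the connection, run the query, and return all rows
        with sqlite3.connect(self.connection_string) as conn:
            return conn.execute(self.query).fetchall()


# Usage with an in-memory database
source = MiniSQLiteSource(':memory:', 'SELECT 1 + 1')
print(source.extract_data())  # → [(2,)]
```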

Important Notes

Stateful Transforms

Transforms remember their state from the first call:

scaler = standard_scaler()

# First call: learns mean/std from data1
result1 = scaler(data1)

# Second call: reuses data1's statistics
result2 = scaler(data2)  # Normalized using data1's mean/std!

This matches scikit-learn's fit/transform pattern. Create new transform instances for independent scaling:

scaler1 = standard_scaler()  # For data1
result1 = scaler1(data1)

scaler2 = standard_scaler()  # For data2 (fresh state)
result2 = scaler2(data2)
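The fit-on-first-call behaviour can be illustrated with a tiny stateful scaler in plain Python. This is a sketch of the pattern, not dataruns' `standard_scaler`; it works on flat lists and uses the population standard deviation:

```python
from statistics import mean, pstdev


class StatefulScaler:
    """Learns mean/std on the first call, reuses them on every later call."""
    def __init__(self):
        self._mean = None
        self._std = None

    def __call__(self, values):
        if self._mean is None:  # first call: fit
            self._mean = mean(values)
            self._std = pstdev(values)
        # every call: transform with the stored statistics
        return [(v - self._mean) / self._std for v in values]


scaler = StatefulScaler()
print(scaler([1.0, 2.0, 3.0]))  # fitted here: mean=2, std≈0.816
print(scaler([2.0, 2.0, 2.0]))  # reuses mean=2, std≈0.816 → all zeros
```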

Working with Different Data Types

Dataruns operates on pandas DataFrames and NumPy ndarrays:

import numpy as np
import pandas as pd
from dataruns import Pipeline, standard_scaler

pipeline = Pipeline(standard_scaler())

# Works with arrays
array = np.array([[1, 2], [3, 4]])
pipeline(array)

# Works with DataFrames
df = pd.DataFrame({'a': [1, 3], 'b': [2, 4]})
pipeline(df)

# Works with lists (converted to an array)
lst = [[1, 2], [3, 4]]
pipeline(lst)
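List inputs are presumably coerced to arrays before transforms run. A hedged sketch of such a coercion helper (`as_ndarray` is a hypothetical name, assuming NumPy is available, which dataruns already requires):

```python
import numpy as np


def as_ndarray(data):
    """Hypothetical coercion helper: pass ndarrays through, convert lists."""
    if isinstance(data, np.ndarray):
        return data
    return np.asarray(data)


arr = as_ndarray([[1, 2], [3, 4]])
print(arr.shape)  # → (2, 2)
```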

Development

Install development dependencies:

uv add --dev pytest pytest-cov ruff black

Run tests:

uv run pytest

Run with coverage:

uv run pytest --cov=src/dataruns

Lint code:

uv run ruff check src/

Format code:

uv run black src/

License

MIT License - see LICENSE file for details

Author

Daniel Ali

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Issues

Note that a small number of tests (about 8) do not currently pass, but they cover very niche cases. Found a bug? Please report it on our issue tracker.

Changelog

See CHANGELOG.md for version history and updates.
