Skip to main content

Pandas-ORM Integration.

Project description

# RedPanda: Pandas & SQLAlchemy

<img src=https://travis-ci.org/amancevice/redpanda.svg?branch=master/>

Two great tastes that taste great together.

Use RedPanda to add simple pandas integration into your declarative models.

View [example.py](./example.py) for extended usage.

*Last Updated: `0.0.3`*


## Installation

```bash
pip install redpanda
```


## Basic Use

RedPanda wraps the `pandas.read_sql()` function into a dialect-agnostic class-method for declarative SQLAlchemy models. Use mixins to add the `redpanda()` and `redparse()` methods to your declarative model classes:

```python
import redpanda.mixins
import sqlalchemy.orm, sqlalchemy.ext.declarative

# Create an in-memory SQLite database engine and bind to RedPanda
engine = sqlalchemy.create_engine("sqlite://", echo=True)
redpanda.bind(engine)

# Call redpanda() to get a query-like object
MyModel.redpanda()
```

Use the resulting `RedPanda` instance to transform SQLAlchemy queries into DataFrames:

```python
MyModel.redpanda().join(MyParent).filter(MyParent.my_attr=='my_val').frame()
```

Or parse a DataFrame into SQLAlchemy model list-generator:

```python
for model in MyModel.redparse(frame):
print model
```


## Dialects

While the arguments to `pandas.read_sql()` are dialect-dependent, RedPanda is intended to be completely dialect-agnostic. RedPanda supports some SQLAlchemy dialects out of the box (MySQL, Postgres, and SQLite are supported). You can add support for other dialects by constructing a function to extract a parameter data-struct (eg, tuple, dict) from a compiled query statement:

```python
engine = sqlalchemy.create_engine("example://host/db")
func = lambda statement: statement.params.items()
redpanda.dialects.add(type(engine.dialect), func)
```


## Extended Use

View [example.py](./example.py) for extended usage examples.


## RedPanda Declarative Mixin

Use the `redpanda.mixins.RedPandaMixin` mixin to add RedPanda to your declarative SQLAlchemy models.

```python
class Widget(redpanda.mixins.RedPandaMixin, Base):
__tablename__ = "widgets"
id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
timestamp = sqlalchemy.Column(sqlalchemy.DateTime)
name = sqlalchemy.Column(sqlalchemy.String)
kind = sqlalchemy.Column(sqlalchemy.String)
units = sqlalchemy.Column(sqlalchemy.Integer)
```

## Mixin Customization

Add customization at the model-level by overriding the default class attributes:


#### \__redpanda__

If you wish to use your own custom `RedPanda` class, you can override the `__redpanda__` class attribute:

```python
class MyRedPanda(redpanda.orm.RedPanda):
# ... custom logic here

class Widget(redpanda.mixins.RedPandaMixin, Base):
# ... see above for full definition
__redpanda__ = MyRedPanda

Widget.redpanda()
# => <MyRedPanda>
```


#### \__read_sql__

Set the `__read_sql__` attribute to control the defualt arguments for `frame()`, which are passed to `pandas.read_sql()`

```python
class Widget(redpanda.mixins.RedPandaMixin, Base):
# ... see above for full definition

# Class-defined RedPanda read_sql() arguments
# This allows us to forego passing these into Widget.redpanda()
__read_sql__ = {
"index_col" : ["created_at"],
"parse_dates" : ["created_at"] }

Widget.redpanda().frame()
# Same as Widget.redpanda().frame(index_col=["created_at"], parse_dates=["created_at"])
```


## Accessing Data

The `redpanda()` class-method accepts the same arguments as a `sqlalchemy.orm.Query` object:
* `entities` or the "select" portion of the query
* `session` an optional session binding

If the `entities` argument is omitted, the default behavior is to select the entire table.

If the `**read_sql` keyword-argument dict is omitted from the `frame()` method, the values are taken from the model class-attribute `__read_sql__`.

```python
# Default select-all
Widget.redpanda()

# Refine data set
Widget.redpanda([Widget, Joiner])\
.join(Joiner)
.filter(Widget.joiner_id==Joiner.id)

# Supply **read_sql keyword-args to alter returned DataFrame
Widget.redpanda().frame(index_col="id", parse_dates="timestamp")
```


## Parsing DataFrames as Models

The `redparse()` class-method handles the reverse translation of a DataFrame into a collection of SQLAlchemy models.

Use the `parse_index` flag to parse a named index as a model attribute:

```python
frame = pandas.DataFrame({
datetime.utcnow() : {"name" : "foo", "kind" : "fizzer", "units" : 10 },
datetime.utcnow() : {"name" : "goo", "kind" : "buzzer", "units" : 11 },
datetime.utcnow() : {"name" : "hoo", "kind" : "bopper", "units" : 12 }
}).T
frame.index.name = 'timestamp'

widgetgen = Widget.redparse(frame, parse_index=True)
for widget in widgetgen:
print widget
```

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

redpanda-0.0.6.tar.gz (5.3 kB view details)

Uploaded Source

File details

Details for the file redpanda-0.0.6.tar.gz.

File metadata

  • Download URL: redpanda-0.0.6.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for redpanda-0.0.6.tar.gz
Algorithm Hash digest
SHA256 67022602665d715a7de41d9d8e0a03a3426ded59faee9d84e29ce0fdda6b63bd
MD5 450bf0a5b7bb85afa37ecb15d5b5fe3e
BLAKE2b-256 e67a399046ccfd530953d5c02e1f0366dcf468571f0a24b59d9237aec4a1b323

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page