Skip to main content

SQL based, pandas compatible DataFrame & Series

Project description

pandas-alchemy

pandas-alchemy is a Python package for analyzing data in SQL databases using a pandas compatible interface.

While SQL the language absolutely sucks, SQL databases are great. Databases shine at handling larger-than-memory data. There have been countless hours of engineering spent on optimizing their performance. Furthermore, often times, the data we need to analyze is already in some databases.

pandas-alchemy implements SQL based DataFrame and Series. Unlike read_sql() in pandas, the entire table is not loaded into the main memory. Instead, a SQL query is stored and is used to back a DataFrame or a Series.

import pandas as pd
from pandas_alchemy import init_db, DataFrame


# Connect to the database using an sqlalchemy database URL
init_db('postgresql://foo:bar@localhost/quux')
df = DataFrame.from_table('foobar')
df.head()  # Just treat it like a pandas DataFrame
df + [1,2,3]  # Arithmetic works too

pd_df = pd.DataFrame({'baz': [1, 2], 'quux': ['a', 'b']})
df + pd_df  # pandas interoperable

# Getting a Series from the DataFrame
# Data is not actually loaded into memory, either
df.baz

# Actually retrieve data and convert to a pandas DataFrame
df.to_pandas()

:warning: Unusability Warning :warning:

This project is in its very early development. Many features are still missing. While it is the goal of this project, it is not possible, as of now, to replace a pandas DataFrame or Series with a pandas-alchemy counterpart and to expect the program to JustWork:tm:.

pandas_alchemy.use_repr_workaround()

Currently, DataFrame.__repr__() and Series.__repr__() is not implemented yet. By calling pandas_alchemy.use_repr_workaround(), DataFrame.__repr__() and Series.__repr__() will retrieve all data, convert itself to a pandas counterpart, and then repr() that pandas counterpart. Note that this will usually defeat the whole point of using pandas-alchemy.

Exception-to-exception compatibility

pandas-alchemy aims to be completely transparent to the program. There should be zero difference between the behaviour of a pandas-alchemy DataFrame and a pandas DataFrame, even when an exception is raised. Differences between behaviours, including differences in exceptions raised (to a reasonable extent), are considered bugs for the purpose of this project. Please open an issue if you have spotted such a difference.

Usage

  1. pandas_alchemy.init_db(<sqlalchemy database URL>)
  2. df = pandas_alchemy.DataFrame.from_table(<table>)
  3. Treat df as a pandas DataFrame and do something awesome about it.
  4. Profit!!!

init_db(*args, **kwargs)

Create a database connection. If already connected, raise RuntimeError.

*args and **kwargs are passed directly to sqlalchemy.create_engine(). See the documentation of sqlalchemy.create_engien()

close_db()

Close the database connection. If not connected yet, raise RuntimeError.

DataFrame(index, columns, cte)

Probably not something you are looking for.

Create a DataFrame using index, columns and cte. index is a pandas Index representing the levels in the DataFrame's index. columns is a pandas Index representing the column names. cte is a sqlalchemy CTE that backs the DataFrame.

DataFrame.from_table(table, schema=None, columns=None, index=None)

Load table from the database as a DataFrame.

If columns is not None, it is taken as an ordered list of columns to be included in the DataFrame.

If index is a list-like object, it is taken as an ordered list of columns whose values are the DataFrame's index. Otherwise, if index is not None, it is taken as the name of the column to become the DataFrame's index.

DataFrame.from_pandas(df, optional=False)

Convert the pandas DataFrame df to a DataFrame. If df is not an instance of pandas.DataFrame, return it as is when optional is True. Otherwise, raise TypeError.

DataFrame.to_pandas()

Convert the DataFrame to a pandas DataFrame.

Series(index, columns, cte, name)

Probably not something you are looking for.

Create a Series using index, columns and cte, with name name. index is a pandas Index representing the levels in the Series's index. columns is a pandas Index representing the column names. cte is a sqlalchemy CTE that backs the Series. name is the name of the Series.

Series.from_pandas(seq, name=None, optional=False)

Convert the pandas Series seq to a Series. If name is not None, it will be used as the name of the resulting Series instead of seq.name. If seq is not an instance of pandas.Series, return it as is when optional is True, Otherwise, raise TypeError.

Series.to_pandas()

Convert the Series to a pandas Series.

pandas API Coverage

See API_COVERAGE.md.

Known Issues & Limitations

  • Cannot distinguish 0.0 and -0.0 (IEEE float)
  • Returns None for NaN in SQLite3 if every value in the column is None
  • Lacks support for arithmetic between two MultiIndex DataFrame/Series

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas-alchemy-0.0.2.tar.gz (13.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pandas_alchemy-0.0.2-py3-none-any.whl (14.6 kB view details)

Uploaded Python 3

File details

Details for the file pandas-alchemy-0.0.2.tar.gz.

File metadata

  • Download URL: pandas-alchemy-0.0.2.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10

File hashes

Hashes for pandas-alchemy-0.0.2.tar.gz
Algorithm Hash digest
SHA256 e7fcebb0fe8b6260fbb89daea4e047bc7c69291b10b146c87750b405d20e6d95
MD5 4a035e2b3c13c7df0814c16a33b6e97b
BLAKE2b-256 6181409d23fa853a4a821a9ba977dc630b1757fce3882fa813061e8bbc6be9f3

See more details on using hashes here.

File details

Details for the file pandas_alchemy-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: pandas_alchemy-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 14.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10

File hashes

Hashes for pandas_alchemy-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a24a13b315ad88487152e8fa8a2ccf16df0886350b95c178a19f4b498eb8c5dd
MD5 bbe9dc6d5c52c5e057f574fb721051ea
BLAKE2b-256 2e9855d0b6f8d15c1d58bdf632a387d103d816816e8b7929ca4c5afd8d6ee50f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page