High performance datastore built upon Apache Arrow & Feather

FeatherStore

FeatherStore is a fast datastore for storing Pandas DataFrames, Pandas Series, Polars DataFrames and PyArrow Tables as partitioned Feather files. It supports several operations on stored tables that run without loading the full table into memory:

  • Read part of the data
  • Append data
  • Insert data
  • Update data
  • Drop data
  • Read metadata (column names, index, table shape, etc.)
  • Change column types

To learn more, read the User Guide.

Using FeatherStore

# Create a Pandas DataFrame
import pandas as pd
from numpy.random import randn
import featherstore as fs

dates = pd.date_range("2021-01-01", periods=5)
df = pd.DataFrame(randn(5, 4), index=dates, columns=list("ABCD"))
df

                   A         B         C         D
2021-01-01  0.402138 -0.016436 -0.565256  0.520086
2021-01-02 -1.071026 -0.326358 -0.692681  1.188319
2021-01-03  0.777777 -0.665146  1.017527 -0.064830
2021-01-04 -0.835711 -0.575801 -0.650543 -0.411509
2021-01-05 -0.649335 -0.830602  1.191749  0.396745

# Create a database folder at the given path
fs.create_database('path/to/db')
fs.connect('path/to/db')
# Creates a data store
fs.create_store('example_store')
# List existing stores in current database
fs.list_stores()

['example_store']

# Connects to store
store = fs.Store('example_store')
# Saves table to store; partition size defines the size of each partition in bytes
PARTITION_SIZE = 128  # bytes
store.write_table('example_table', df, partition_size=PARTITION_SIZE)
# Lists existing tables in current store
store.list_tables()

['example_table']
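
To make the effect of partition_size concrete, here is a minimal sketch of how a byte budget could translate into row ranges. It is not FeatherStore's actual partitioning logic, which may measure sizes differently; the function plan_partitions and the 40-bytes-per-row estimate (four float64 columns plus an 8-byte timestamp index) are assumptions made for illustration.

```python
def plan_partitions(n_rows, bytes_per_row, partition_size):
    """Split n_rows into contiguous (start, stop) row ranges.

    Hypothetical helper: each range holds as many whole rows as fit into
    partition_size bytes, and always at least one row.
    """
    if n_rows < 0 or bytes_per_row <= 0 or partition_size <= 0:
        raise ValueError("n_rows must be >= 0; sizes must be positive")
    rows_per_partition = max(1, partition_size // bytes_per_row)
    return [
        (start, min(start + rows_per_partition, n_rows))
        for start in range(0, n_rows, rows_per_partition)
    ]


# Assumed row size: 4 float64 columns (8 bytes each) + 8-byte timestamp index
BYTES_PER_ROW = 4 * 8 + 8

# 5 rows with a 128-byte budget: 3 rows fit per partition
print(plan_partitions(5, BYTES_PER_ROW, 128))  # [(0, 3), (3, 5)]
```

Under these assumptions the five-row example table would span two partitions. In practice a far larger partition size is likely a better fit; 128 bytes only serves to force a split on a tiny table.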

# FeatherStore can read tables as Arrow Tables, Pandas DataFrames or Polars DataFrames
store.read_pandas('example_table')
# store.read_arrow('example_table') for reading to Arrow Tables
# store.read_polars('example_table') for reading to Polars DataFrames

                   A         B         C         D
2021-01-01  0.402138 -0.016436 -0.565256  0.520086
2021-01-02 -1.071026 -0.326358 -0.692681  1.188319
2021-01-03  0.777777 -0.665146  1.017527 -0.064830
2021-01-04 -0.835711 -0.575801 -0.650543 -0.411509
2021-01-05 -0.649335 -0.830602  1.191749  0.396745

# FeatherStore supports appending data without loading in the full table
new_dates = pd.date_range("2021-01-06", periods=1)
df1 = pd.DataFrame(randn(1, 4), index=new_dates, columns=list("ABCD"))
store.append_table('example_table', df1)
# It also supports querying parts of the data
store.read_pandas('example_table', rows={'after': '2021-01-05'}, cols=['D', 'A'])

                   D         A
2021-01-05  0.396745 -0.649335
2021-01-06  0.606950  0.408125
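
The reason a partitioned layout allows such a query without loading the whole table can be sketched as follows: if each partition records the first and last index value it contains, a reader only has to open the partitions whose range overlaps the request. This is an illustrative sketch, not FeatherStore's implementation; the PARTITIONS catalogue, the file names and the function partitions_to_read are all hypothetical.

```python
from datetime import date

# Hypothetical catalogue: (first index value, last index value, file name)
PARTITIONS = [
    (date(2021, 1, 1), date(2021, 1, 3), "part-0.feather"),
    (date(2021, 1, 4), date(2021, 1, 6), "part-1.feather"),
]


def partitions_to_read(partitions, after=None, before=None):
    """Return the names of partitions overlapping the inclusive range [after, before]."""
    selected = []
    for first, last, name in partitions:
        if after is not None and last < after:
            continue  # partition ends before the requested range starts
        if before is not None and first > before:
            continue  # partition starts after the requested range ends
        selected.append(name)
    return selected


# Only the second partition can contain rows from 2021-01-05 onwards
print(partitions_to_read(PARTITIONS, after=date(2021, 1, 5)))  # ['part-1.feather']
```

The same idea applies to appends: new rows that sort after the existing index only touch the last partition, so earlier files can stay untouched on disk.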

Performance

FeatherStore is built for speed, and the project's benchmark ranks it among the best-performing solutions of its kind. See the performance benchmark for details.

Installation

Install FeatherStore with pip:

$ pip install featherstore

Requirements

  • Python >= 3.8
  • Arrow
  • Pandas
  • Polars
  • NumPy
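
A quick way to check an environment against the list above is to query the interpreter version and the installed package metadata. The sketch below uses only the standard library; the distribution names in REQUIRED are assumptions (in particular, "Arrow" is taken to mean the pyarrow package), and check_requirements is a hypothetical helper, not part of FeatherStore.

```python
import sys
from importlib import metadata

# Assumed PyPI distribution names for the requirements listed above
REQUIRED = ("pyarrow", "pandas", "polars", "numpy")


def check_requirements(packages=REQUIRED, min_python=(3, 8)):
    """Map 'python' to a pass/fail flag and each package to its version (or None)."""
    report = {"python": sys.version_info >= min_python}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```

A value of None marks a package that still needs to be installed; pip normally pulls in such dependencies when installing featherstore itself.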

Documentation

Want to know about all the features FeatherStore supports? Read the docs!

Built Distribution

FeatherStore-0.2.1-py3-none-any.whl (38.7 kB)
