Skip to main content

Data analysis using a concatenative paradigm

Project description

pynto: Time series analysis in Python using the concatenative paradigm

pynto is a Python package that lets you manipulate tabular data with the expressiveness and code reusability of concatenative programming. With pynto you define an expression that formally specifies how to calculate the data in your table. Expressions are made by chaining together functions called words. It works like a pipeline: the output from one word becomes the input for the following word. The table of data is treated like a stack of independent columns. The rightmost column in the table is the top of the stack. Words can add, remove or modify columns, but they are row-agnostic--expressions can be evaluated over any range of rows.

What does it look like?

>>> import pynto as pt 
>>> ma_dev = pt.saved('stock_prices') \         # define an expression that appends columns from the build-in database
>>>     .quote(pt.dup.rolling(20).mean.sub) \   # append a quoted expression to the stack
>>>     .each                                   # use the each combinator to apply the quotation to the previous columns
>>> df = ma_dev['2021-06-01':]                  # evaluate your expression over a date range to get a DataFrame
>>> pt.db['stocks_ma_dev'] = df                 # save the results back to the database   

Why pynto?

  • Expressive: Pythonic syntax; Combinatory logic for modular, reusable code
  • Performant: Column-wise multiprocessing; caching of duplicate operations
  • Batteries included: Built-in Redis-based time series database
  • Interoperable: Seemlessly integration with Pandas

Get pynto

pip install pynto

Reference

The Basics

Create expressions by chaining built-in words together using a fluent interface, or using the + operator for words in the local namespace. When evaluated, words will operate in left-to-right order, with operators following their operands in postfix style.

>>> square = pt.dup.mul         # duplicates the top column, then multiplies top two columns together

The word c that adds a constant-value column to the stack. Like many pynto words, c takes a parameter in parentheses to specify the constant value c(10.0). pynto can handle any NumPy data type, but all rows in a column have to have the same type.

>>> expr = pt.c(10.0) + square    # compose square expression with a columns of 10s

To evaluate your expression, specify a date range using the [start:stop (exclusive):periodicity] syntax. (Instances of pt.Range in the local namespace also work.)

>>> expr['2021-06-01':'2021-06-03','B']                   # evaluate over a two business day date range                                                   
             constant
2021-06-01     100.0
2021-06-02     100.0

Each column has a string header that can be modified. hset sets the header to a new value. Headers can be usefully for filtering or arranging columns.

>>> expr += pt.hset('ten squared')
>>> expr['2021-06-01':'2021-06-03','B']  
               ten squared
2021-06-01        100.0
2021-06-02        100.0

Combinators are higher-order functions that allow pynto to do more complicated things like branching and looping. Combinators operate on quotations, expressions that are pushed to the stack instead of operating on the stack. To create a quotation put an expression within the parentheses of pt.quote().

>>> expr = pt.c(9.).c(10.).quote(square).each
>>> expr['2021-06-01':'2021-06-02']
              constant  constant
2021-06-01      81.0     100.0

The Database

pynto has built-in time series database functionality that lets you save pandas DataFrames and Series to a Redis database. The database uses only String key/values, saving the underlying numpy data in native byte format for zero-copy retrieval. Each DataFrame column is saved as an independent key and can be retrieved on its own. The database also supports three-dimensional frames that have a two-level MultiIndex. Saved data can be accessed through pynto expressions or standard methods

>>> pt.db['my_df'] = expr['2021-06-01':'2021-06-03']
>>> pt.saved('my_df')['2021-06-01':'2021-06-02']
              constant  constant
2021-06-01      81.0     100.0
>>> pt.db.read('my_df')
              constant  constant
2021-06-01      81.0     100.0
2021-06-02      81.0     100.0

pynto built-in vocabulary

Words for adding columns

Name Parameters Stack effect
before -- after
Description
c value -- c Adds a constant-value column.
csv csv_file, index_col=0, header='infer' -- c (c) Adds columns from csv_file.
pandas frame_or_series -- c (c) Adds columns from a pandas data structure.
c_range value -- c (c) Add constant int columns from 0 to value.

Combinators

Name Parameters Stack effect
before -- after
Description
call depth=None, copy=False a q -- c Apply quotation to stack, up to depth if specified. Optionally leaves stack in place with copy.
each start=0, stop=None, every=1, copy=False a b q -- c d Apply quotation stack elements from start to end in groups of every. Optionally leaves stack in place with copy.
repeat times a b q -- c d Apply quotation to the stack times times.
cleave num_quotations, depth=None, copy=False a q q -- c d Apply num_quotations quotations to copies of stack elements up to depth. Optionally leaves stack in place with copy.

Words to manipulate columns

Name Parameters Stack effect
before -- after
Description
dup a -- a a Duplicate top column.
roll a b c -- c a b Permute columns.
swap a b -- b a Swap top two columns.
drop a b c -- a b Drop top column.
clear a b c -- Clear columns.
interleave count=None, split_into=2 a b c d -- a c b d Divide columns into split into groups and interleave group elements.
pull start,end=None,clear=False a b c -- b c a Bring columns start (to end) to the top.
hpull *headers, clear=False a b c -- b c a Bring columns with headers matching regex headers to the top. Optionally clear remainder of stack
hfilter *headers, clear=False a b c -- a Shortcut for hpull with clear=True

Words to manipulate headers

Name Parameters Stack effect
before -- after
Description
hset *headers a b -- a b Set top columns' headers to headers.
hformat format_string a -- a Apply format_string to existing headers.
happly header_function a -- a Apply header_function to existing header.

Words for arithmetic or logical operators

Name Parameters Stack effect
before -- after
Description
add a b -- c a + b
sub a b -- c a - b
mul a b -- c a * b
div a b -- c a / b
mod a b -- c a % b
pow a b -- c a ** b
eq a b -- c a == b
ne a b -- c a != b
ge a b -- c a >= b
gt a b -- c a > b
le a b -- c a <= b
lt a b -- c a < b
neg a -- c a * -1
abs a -- c abs(a)
sqrt a -- c a ** 0.5
zeroToNa a -- c Replaces zeros with np.nan

Words for creating window columns

Name Parameters Stack effect
before -- after
Description
rolling window=2, exclude_nans=True, lookback_multiplier=2 a -- w Create window column with values from most recent window rows. Exclude nan-valued rows from count unless exclude_nans. Extend history up to lookback_multiplier to look for non-nan rows.
crossing a b c -- w Create window column with cross-sectional values from the same rows of all columns.

Words for calculating statistics on window columns

Name Parameters Stack effect
before -- after
Description
sum w -- c Sums of windows.
mean w -- c Means of windows.
var w -- c Variances of windows.
std w -- c Standard deviations of windows.
change w -- c Changes between first and last rows of windows.
pct_change w -- c Percent changes between first and last rows of windows.
log_change w -- c Differences of logs of first and last rows of windows.
first w -- c First rows of windows.
last w -- c Last rows of windows.
zscore w -- c Z-score of most recent rows within windows.

Words for cleaning up data

Name Parameters Stack effect
before -- after
Description
fill value a -- a Fill nans with value.
ffill a -- a Last observation carry-forward.
join date a b -- c Join top two columns, switching from second to first on date index.

Other words

Name Parameters Stack effect
before -- after
Description
ewma window, fill_nans=True a -- c Calculates exponentially-weighted moving average with half-life window.
wlag number w -- c Lag number rows.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pynto-0.3.tar.gz (23.9 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page