
Data analysis using a concatenative paradigm


pynto: Data analysis in Python using the concatenative paradigm

pynto is a Python package that lets you manipulate tabular data with the expressiveness and code reusability of concatenative programming. With pynto you define an expression that formally specifies how to calculate the data in your table. Expressions are made by stringing together a sequence of functions called words. It works like a pipeline: the output from one word becomes the input for the following word. The table of data is treated like a stack of independent columns, and the rightmost column in the table is the top of the stack. Words can add, remove or modify columns, but they are row-agnostic: expressions can be evaluated over any range of rows.
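To make the stack model concrete, here is a minimal pure-Python sketch (not pynto's implementation). A table is a list of columns, the last list element is the top of the stack, and each word is a function that takes the stack and returns a new one. The names `dup_word`, `sub_word` and `run` are hypothetical.

```python
# Toy model of the concatenative execution model (hypothetical names, not the pynto API):
# a table is a list of columns, and the last column is the top of the stack.
from typing import Callable, List

Column = List[float]
Stack = List[Column]
Word = Callable[[Stack], Stack]


def dup_word(stack: Stack) -> Stack:
    # a -- a a : push a copy of the top column
    return stack + [list(stack[-1])]


def sub_word(stack: Stack) -> Stack:
    # a b -- c : pop the top two columns and push a - b, row by row
    *rest, a, b = stack
    return rest + [[x - y for x, y in zip(a, b)]]


def run(words: List[Word], stack: Stack) -> Stack:
    # Pipeline: the output of each word is the input of the next one
    for word in words:
        stack = word(stack)
    return stack


table = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
result = run([dup_word, sub_word], table)
print(result)  # [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
```

Note that neither word refers to a particular row, which is what makes the expression row-agnostic.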

What does it look like?

>>> from pynto import * 
>>> stocks = csv('stocks.csv')                   # add columns to the stack
>>> ma_diff = dup | rolling(20) | wmean | sub    # define an operation: column minus its 20-row moving average
>>> stocks_ma = stocks | ~ma_diff | each         # operate on columns using quotation/combinator pattern
>>> stocks_ma['2019-01-01':]                     # evaluate your expression over certain rows

Why pynto?

  • Expressive: Foolproof syntax; Ideal for modular, reusable code
  • Performant: Efficient NumPy internals
  • Interoperable: Seamless integration with data analysis workflows
  • Batteries included: Datetime-based row ranges; Moving window statistics

Get pynto

pip install pynto

Reference

The Basics

Create expressions by composing words together with |. Words operate in left-to-right order, with operators following their operands in postfix style. When you assign an expression to a Python variable, the variable name can be used as a word in other expressions.

>>> square = dup | mul         # adds duplicate of top column to the stack, then multiplies top two columns 
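Composition with | can be modeled by overloading Python's `__or__` operator. The sketch below is an assumption about how such an API could be built, not pynto's source; `Expr` and the word definitions are hypothetical. Because a composed expression is itself an `Expr`, a variable such as `square` can be reused as a word.

```python
from typing import Callable, List

Stack = List[List[float]]


class Expr:
    """Hypothetical composable word: wraps a stack -> stack function."""

    def __init__(self, fn: Callable[[Stack], Stack]):
        self.fn = fn

    def __or__(self, other: "Expr") -> "Expr":
        # Left-to-right composition: run self first, then other
        return Expr(lambda stack: other.fn(self.fn(stack)))

    def __call__(self, stack: Stack) -> Stack:
        return self.fn(stack)


dup = Expr(lambda s: s + [list(s[-1])])                                   # a -- a a
mul = Expr(lambda s: s[:-2] + [[x * y for x, y in zip(s[-2], s[-1])]])  # a b -- c

square = dup | mul        # a named expression ...
fourth = square | square  # ... reused as a word in another expression

print(fourth([[2.0, 3.0]]))  # [[16.0, 81.0]]
```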

The word c adds a constant-value column to the stack. Like many pynto words, c takes a parameter in parentheses; here c(10.0) specifies the constant value. pynto can handle any NumPy data type, but all rows in a column must have the same type.

>>> expr = c(10.0) | square    # apply square expression to a column of 10s

To evaluate your expression, specify the range of rows you want using standard Python [start:stop:step] indexing and slicing. Indices can be ints or datetimes; for a datetime index, the step is the periodicity.

>>> expr[:2]                   # evaluate first two rows                                                     
   constant
0     100.0
1     100.0
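Because words never refer to specific rows, an expression can be evaluated lazily over whatever range is requested. The sketch below assumes a hypothetical representation in which each column is a function from row index to value, and shows how Python's slice protocol (`__getitem__` receiving a `slice` object) can drive that evaluation. It only models integer indices, not datetime ranges, and `Expr`, `constant` and `square` here are illustrative names.

```python
from typing import Callable, List

# Hypothetical representation: a column is a function from row index to value,
# so no data exists until specific rows are requested.
Column = Callable[[int], float]


class Expr:
    def __init__(self, columns: List[Column]):
        self.columns = columns

    def __getitem__(self, key):
        # Accept a single row (expr[0]) or a slice (expr[:2], expr[0:10:3])
        if isinstance(key, int):
            key = slice(key, key + 1)
        if key.stop is None:
            raise ValueError("this sketch needs an explicit stop row")
        rows = range(key.start or 0, key.stop, key.step or 1)
        # Compute only the requested rows, one list per row
        return [[col(i) for col in self.columns] for i in rows]


def constant(value: float) -> Column:
    return lambda i: value


def square(col: Column) -> Column:
    return lambda i: col(i) * col(i)


expr = Expr([square(constant(10.0))])
print(expr[:2])  # [[100.0], [100.0]]
```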

Each column has a string header that can be modified. hset sets the header to a new value. Headers can be useful for filtering or arranging columns.

>>> expr |= hset('ten squared')
>>> expr[:2]  
   ten squared
0        100.0
1        100.0
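Headers travel with their columns through every stack operation. The sketch below assumes a hypothetical representation in which columns are `(header, values)` pairs, and mimics how a word like hset could relabel the topmost columns while leaving their data untouched. The bottom-to-top ordering of the headers argument is an assumption.

```python
from typing import List, Tuple

# Hypothetical representation: each column is a (header, values) pair
Column = Tuple[str, List[float]]


def hset(stack: List[Column], *headers: str) -> List[Column]:
    # Relabel the top len(headers) columns, listed from bottom to top;
    # the values themselves are passed through unchanged
    if not headers:
        return list(stack)
    n = len(headers)
    untouched, top = stack[:-n], stack[-n:]
    return untouched + [(h, values) for h, (_, values) in zip(headers, top)]


stack = [("constant", [100.0, 100.0])]
print(hset(stack, "ten squared"))  # [('ten squared', [100.0, 100.0])]
```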

Combinators are higher-order functions that allow pynto to do more complicated things like branching and looping. Combinators operate on quotations: expressions that are pushed onto the stack instead of operating on it. To create a quotation, use ~ before a word (~square) or before a parenthesized expression (~(dup | mul)) for an anonymous quotation.

>>> expr = c(9.) | c(10.) | ~square | each
>>> expr[0]
   constant  constant
0      81.0     100.0
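A quotation is simply an unevaluated word sitting on the stack, so a combinator is a word that pops a quotation and decides how to run it. The sketch below models this with a hypothetical `Quote` wrapper and an `each` that applies the quoted function to every column separately. It assumes single-column quotations and is not pynto's implementation.

```python
from typing import Callable, List

Column = List[float]


class Quote:
    """A quoted word: pushed onto the stack instead of being executed."""

    def __init__(self, fn: Callable[[List[Column]], List[Column]]):
        self.fn = fn


def square(stack: List[Column]) -> List[Column]:
    *rest, a = stack
    return rest + [[x * x for x in a]]


def each(stack: list) -> List[Column]:
    # a b q -- c d : pop the quotation, then run it on each column on its own
    *columns, quote = stack
    out: List[Column] = []
    for col in columns:
        out += quote.fn([col])
    return out


stack = [[9.0], [10.0], Quote(square)]  # analogous to c(9.) | c(10.) | ~square
print(each(stack))  # [[81.0], [100.0]]
```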

pynto vocabulary

Words for adding columns

Name | Parameters | Stack effect (before -- after) | Description
c | value | -- c | Adds a constant-value column.
csv | csv_file, index_col=0, header='infer' | -- c (c) | Adds columns from csv_file.
pandas | frame_or_series | -- c (c) | Adds columns from a pandas data structure.
c_range | value | -- c (c) | Adds constant int columns from 0 to value.

Combinators

Name | Parameters | Stack effect (before -- after) | Description
call | depth=None, copy=False | a q -- c | Apply quotation to the stack, up to depth if specified. Optionally leave the stack in place with copy.
each | start=0, stop=None, every=1, copy=False | a b q -- c d | Apply quotation to stack elements from start to stop in groups of every. Optionally leave the stack in place with copy.
cleave | num_quotations, depth=None, copy=False | a q q -- c d | Apply num_quotations quotations to copies of stack elements up to depth. Optionally leave the stack in place with copy.
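The difference between each and cleave is easiest to see side by side: each runs one quotation on each column in turn, while cleave runs several quotations on copies of the same columns. Below is a simplified sketch of cleave that ignores the depth and copy parameters, models quotations as plain functions, and uses a hypothetical list-of-columns stack.

```python
from typing import Callable, List

Column = List[float]
Word = Callable[[List[Column]], List[Column]]


def cleave(stack: list, num_quotations: int) -> List[Column]:
    # a q1 q2 -- c d : pop the quotations, apply each one to a copy of
    # the remaining columns, and concatenate the results in order
    columns, quotes = stack[:-num_quotations], stack[-num_quotations:]
    out: List[Column] = []
    for q in quotes:
        out += q([list(c) for c in columns])
    return out


def double(s: List[Column]) -> List[Column]:
    return s[:-1] + [[2 * x for x in s[-1]]]


def negate(s: List[Column]) -> List[Column]:
    return s[:-1] + [[-x for x in s[-1]]]


print(cleave([[1.0, 2.0], double, negate], 2))  # [[2.0, 4.0], [-1.0, -2.0]]
```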

Words to manipulate columns

Name | Parameters | Stack effect (before -- after) | Description
dup |  | a -- a a | Duplicate top column.
roll |  | a b c -- c a b | Permute columns.
swap |  | a b -- b a | Swap top two columns.
drop |  | a b c -- a b | Drop top column.
clear |  | a b c -- | Clear all columns.
interleave | count=None, split_into=2 | a b c d -- a c b d | Divide columns into split_into groups and interleave group elements.
pull | start, end=None, clear=False | a b c -- b c a | Bring columns start (to end) to the top.
hpull | *headers, clear=False | a b c -- b c a | Bring columns with headers matching regex headers to the top. Optionally clear the remainder of the stack.
hfilter | *headers, clear=False | a b c -- a | Shortcut for hpull with clear=True.
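The stack-effect notation can be read literally as a transformation of a list whose last element is the top. The sketch below implements a few of these words on plain Python lists, with strings standing in for whole columns, so the before and after states can be checked directly. The helpers are illustrative only: pull moves a single column and assumes start counts from the bottom of the stack, and interleave always splits the whole stack.

```python
# Stack-effect notation as list operations: the last element is the top.
# Illustrative helpers only; strings stand in for whole columns.


def roll(s):  # a b c -- c a b
    return s[:-3] + [s[-1], s[-3], s[-2]]


def swap(s):  # a b -- b a
    return s[:-2] + [s[-1], s[-2]]


def drop(s):  # a b c -- a b
    return s[:-1]


def pull(s, start):  # a b c -- b c a, assuming start counts from the bottom
    return s[:start] + s[start + 1:] + [s[start]]


def interleave(s, split_into=2):  # a b c d -- a c b d
    size = len(s) // split_into
    groups = [s[i * size:(i + 1) * size] for i in range(split_into)]
    return [col for row in zip(*groups) for col in row]


print(roll(["a", "b", "c"]))             # ['c', 'a', 'b']
print(swap(["a", "b"]))                  # ['b', 'a']
print(pull(["a", "b", "c"], 0))          # ['b', 'c', 'a']
print(interleave(["a", "b", "c", "d"]))  # ['a', 'c', 'b', 'd']
```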

Words to manipulate headers

Name | Parameters | Stack effect (before -- after) | Description
hset | *headers | a b -- a b | Set top columns' headers to headers.
hformat | format_string | a -- a | Apply format_string to existing headers.
happly | header_function | a -- a | Apply header_function to existing header.
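hformat and happly only rewrite header strings. Assuming format_string is a `str.format` template with a single `{}` placeholder for the existing header (an assumption; the template syntax is not documented here), their effect on a list of headers can be sketched as follows. The function bodies are hypothetical.

```python
from typing import Callable, List


def hformat(headers: List[str], format_string: str) -> List[str]:
    # Assumes format_string has one {} placeholder for the old header
    return [format_string.format(h) for h in headers]


def happly(headers: List[str], header_function: Callable[[str], str]) -> List[str]:
    # Pass each existing header through an arbitrary function
    return [header_function(h) for h in headers]


print(hformat(["AAPL", "MSFT"], "{}_ma20"))  # ['AAPL_ma20', 'MSFT_ma20']
print(happly(["AAPL", "MSFT"], str.lower))   # ['aapl', 'msft']
```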

Words for arithmetic or logical operators

Name | Parameters | Stack effect (before -- after) | Description
add |  | a b -- c | a + b
sub |  | a b -- c | a - b
mul |  | a b -- c | a * b
div |  | a b -- c | a / b
mod |  | a b -- c | a % b
exp |  | a b -- c | a ** b
eq |  | a b -- c | a == b
ne |  | a b -- c | a != b
ge |  | a b -- c | a >= b
gt |  | a b -- c | a > b
le |  | a b -- c | a <= b
lt |  | a b -- c | a < b
neg |  | a -- c | a * -1
absv |  | a -- c | abs(a)
sqrt |  | a -- c | a ** 0.5
zeroToNa |  | a -- c | Replaces zeros with np.nan
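In these stack effects the column beneath the top is the left operand a and the top column is the right operand b, so by the table above c(10.) | c(4.) | sub should give 6 rather than -6. A small sketch of that operand-order rule, plus a zeroToNa equivalent, on plain lists (the helpers `binary` and `zero_to_na` are hypothetical):

```python
import math
import operator
from typing import Callable, List

Column = List[float]


def binary(op: Callable[[float, float], float]):
    # a b -- c : b is the top column and becomes the right-hand operand
    def word(stack: List[Column]) -> List[Column]:
        *rest, a, b = stack
        return rest + [[op(x, y) for x, y in zip(a, b)]]
    return word


sub = binary(operator.sub)
exp = binary(operator.pow)


def zero_to_na(stack: List[Column]) -> List[Column]:
    # a -- c : replace exact zeros with NaN
    *rest, a = stack
    return rest + [[math.nan if x == 0 else x for x in a]]


print(sub([[10.0], [4.0]]))      # [[6.0]]
print(exp([[2.0], [3.0]]))       # [[8.0]]
print(zero_to_na([[0.0, 1.5]]))  # [[nan, 1.5]]
```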

Words for creating window columns

Name | Parameters | Stack effect (before -- after) | Description
rolling | window=2, exclude_nans=True, lookback_multiplier=2 | a -- w | Create window column with values from the most recent window rows. Exclude nan-valued rows from the count if exclude_nans is True. Extend history up to lookback_multiplier to look for non-nan rows.
crossing |  | a b c -- w | Create window column with cross-sectional values from the same rows of all columns.

Words for calculating statistics on window columns

Name | Parameters | Stack effect (before -- after) | Description
wsum |  | w -- c | Sums of windows.
wmean |  | w -- c | Means of windows.
wvar |  | w -- c | Variances of windows.
wstd |  | w -- c | Standard deviations of windows.
wchange |  | w -- c | Changes between first and last rows of windows.
wpct_change |  | w -- c | Percent changes between first and last rows of windows.
wlog_change |  | w -- c | Differences of logs of first and last rows of windows.
wfirst |  | w -- c | First rows of windows.
wlast |  | w -- c | Last rows of windows.
wzscore |  | w -- c | Z-score of most recent rows within windows.
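Conceptually, rolling turns each row into a window of the most recent values, crossing turns each row into a window of that row's values across columns, and the w-words reduce each window to a single number. The pure-Python sketch below uses the statistics module and assumes that rows without a full rolling window produce NaN. It ignores exclude_nans and lookback_multiplier, and it uses the sample standard deviation (`statistics.stdev`); whether pynto's wstd and wzscore use sample or population statistics is not stated here.

```python
import math
import statistics
from typing import List

Window = List[float]


def rolling(values: List[float], window: int = 2) -> List[Window]:
    # One window per row: the most recent `window` values, or [] if too few rows
    return [values[i + 1 - window:i + 1] if i + 1 >= window else []
            for i in range(len(values))]


def crossing(columns: List[List[float]]) -> List[Window]:
    # One window per row holding that row's value from every column
    return [list(row) for row in zip(*columns)]


def wmean(windows: List[Window]) -> List[float]:
    return [statistics.mean(w) if w else math.nan for w in windows]


def wzscore(windows: List[Window]) -> List[float]:
    # z-score of the last value relative to its own window
    return [(w[-1] - statistics.mean(w)) / statistics.stdev(w) if w else math.nan
            for w in windows]


prices = [1.0, 2.0, 3.0, 5.0]
print(wmean(rolling(prices, 2)))                  # [nan, 1.5, 2.5, 4.0]
print(wzscore(rolling(prices, 3)))                # third value 1.0, fourth about 1.091
print(wmean(crossing([[1.0, 2.0], [3.0, 6.0]])))  # [2.0, 4.0]
```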

Words for cleaning up data

Name | Parameters | Stack effect (before -- after) | Description
fill | value | a -- a | Fill nans with value.
ffill |  | a -- a | Last observation carry-forward.
join | date | a b -- c | Join top two columns, switching from second to first on date index.
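Here are sketches of two cleanup words on plain lists. ffill carries the last valid observation forward. join is read here as splicing two series at a date, taking values from the second column (a) before the date and from the top column (b) on and after it. That reading, including whether the switch date itself belongs to the top column, is an assumption, as are the helper signatures.

```python
import math
from datetime import date
from typing import List


def ffill(values: List[float]) -> List[float]:
    # Last observation carried forward; leading NaNs stay NaN
    out, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        out.append(last)
    return out


def join(index: List[date], a: List[float], b: List[float], switch: date) -> List[float]:
    # a b -- c : values from a before `switch`, from b (the top column) from then on
    return [y if d >= switch else x for d, x, y in zip(index, a, b)]


print(ffill([math.nan, 1.0, math.nan, 3.0]))  # [nan, 1.0, 1.0, 3.0]

days = [date(2019, 1, 1), date(2019, 1, 2), date(2019, 1, 3)]
print(join(days, [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], date(2019, 1, 2)))  # [1.0, 2.0, 2.0]
```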

Other words

Name | Parameters | Stack effect (before -- after) | Description
ewma | window, fill_nans=True | a -- c | Calculates the exponentially weighted moving average with half-life window.
wlag | number | w -- c | Lag number rows.
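A half-life of h rows means an observation's weight halves every h rows, which corresponds to a smoothing factor alpha = 1 - 0.5 ** (1 / h) in the recursion m_t = alpha * x_t + (1 - alpha) * m_{t-1}. The sketch below uses that standard recursive definition, seeded with the first value. How pynto seeds the average and what fill_nans does exactly are not described here, so treat both as assumptions.

```python
from typing import List


def ewma(values: List[float], halflife: float) -> List[float]:
    # Weight of an observation halves every `halflife` rows
    alpha = 1.0 - 0.5 ** (1.0 / halflife)
    out: List[float] = []
    for x in values:
        # Seed with the first value, then apply the recursive update
        out.append(x if not out else alpha * x + (1.0 - alpha) * out[-1])
    return out


print(ewma([0.0, 10.0, 10.0], halflife=1))  # [0.0, 5.0, 7.5]
```

For comparison, pandas computes the same recursion with `Series.ewm(halflife=h, adjust=False).mean()`.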



Download files


Source Distribution

pynto-0.1.1.tar.gz (15.4 kB)

Built Distribution

pynto-0.1.1-py3-none-any.whl (12.7 kB, Python 3)
