Data analysis using a concatenative paradigm

pynto: Data analysis in Python using stack-based programming

pynto is a Python package that lets you manipulate a data frame as a stack of columns, using the expressiveness of the concatenative/stack-oriented paradigm.

How does it work?

With pynto you chain together functions called words to formally specify how to calculate each column of your data frame. The composed words can be lazily evaluated over any range of rows to create your data frame.

Words add, remove or modify columns. They can operate on the entire stack or be limited to certain columns using a column indexer. Composed words operate in left-to-right order, with operators following their operands in postfix (Reverse Polish Notation) style. More complex operations can be specified using quotations (anonymous blocks of words that are not executed immediately) and combinators (higher-order words that control the execution of quotations).
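As a rough mental model only (this is not pynto's implementation), the stack can be pictured as a Python list of columns and each word as a function from one stack to the next. The helper names const, dup, mul and run below are hypothetical; applying them left to right mirrors postfix evaluation.

```python
from typing import Callable, List, Optional

Column = List[float]
Stack = List[Column]
Word = Callable[[Stack], Stack]


def const(value: float, rows: int = 2) -> Word:
    """Hypothetical word: push a column holding `value` in every row."""
    return lambda stack: stack + [[value] * rows]


def dup(stack: Stack) -> Stack:
    """Push a copy of the top (rightmost) column."""
    return stack + [list(stack[-1])]


def mul(stack: Stack) -> Stack:
    """Replace the top two columns with their row-wise product."""
    *rest, second, top = stack
    return rest + [[a * b for a, b in zip(second, top)]]


def run(words: List[Word], stack: Optional[Stack] = None) -> Stack:
    """Apply words left to right, as in postfix notation."""
    stack = [] if stack is None else stack
    for word in words:
        stack = word(stack)
    return stack


# Toy equivalent of pt.c10_0.dup.mul evaluated over two rows
print(run([const(10.0), dup, mul]))  # [[100.0, 100.0]]
```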

What does it look like?

Here's a program that calculates each column's deviation from its moving average, using the combinator/quotation pattern.

>>> import pynto as pt
>>> ma_dev = (                        # create a pynto expression by concatenating words:
...     pt.saved('stock_prices')      # push columns saved in the built-in database
...     .q                            # start a quotation
...         .dup                      # push a copy of the top (rightmost) column of the stack
...         .ravg(20)                 # calculate its 20-period moving average
...         .sub                      # subtract the top column from the second column
...     .p                            # close the quotation
...     .map                          # use the map combinator to apply the quotation
... )                                 # to each column in the stack
>>>
>>> df = ma_dev.rows['2021-06-01':]         # evaluate over a range of rows to get a DataFrame
>>> pt.db['stocks_ma_dev'] = df             # save the results back to the database
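To make the arithmetic of the example concrete, here is a plain-Python sketch of what the quotation computes for each column: the series minus its trailing moving average. It assumes the average is NaN until a full window of rows is available, whereas pynto may look back before the requested range; rolling_mean and deviation_from_ma are illustrative names, and a 3-row window keeps the numbers small.

```python
import math
from typing import Dict, List


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Trailing mean over `window` rows; NaN until the window is full (assumed)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def deviation_from_ma(columns: Dict[str, List[float]], window: int) -> Dict[str, List[float]]:
    """Run 'dup, ravg(window), sub' on each column independently, as map does."""
    result = {}
    for header, values in columns.items():
        average = rolling_mean(values, window)                      # dup + ravg
        result[header] = [v - m for v, m in zip(values, average)]   # sub: second - top
    return result


prices = {"AAA": [10.0, 11.0, 12.0, 16.0], "BBB": [20.0, 20.0, 23.0, 23.0]}
dev = deviation_from_ma(prices, window=3)
print(dev["AAA"][2:], dev["BBB"][2:])  # [1.0, 3.0] [2.0, 1.0]
```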

Why pynto?

  • Expressive: Pythonic syntax; Combinatory logic for modular, reusable code
  • Performant: Memoization to eliminate duplicate operations
  • Batteries included: Built-in time series database
  • Interoperable: Seamless integration with pandas/NumPy

Get pynto

pip install pynto

Reference

The Basics

Constant literals

Add constant-value columns to the stack using literals that start with c followed by a number, with - and . characters replaced by _ (so c10_0 pushes a column of 10.0s). rn pushes n constant columns holding the whole numbers 0 through n - 1 (so r5 pushes columns of 0, 1, 2, 3 and 4).

>>> # Compose words that add a column of 10s to the stack, duplicate the column,
>>> # and then multiply the columns together
>>> ten_squared = pt.c10_0.dup.mul
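The literal naming rule can be decoded mechanically. The sketch below rests on an assumption about how a name such as c10_0 or c_1_5 maps to a number (a leading _ read as -, the next _ read as .); parse_constant_literal and range_literal are hypothetical helpers, not part of pynto.

```python
import re
from typing import List


def parse_constant_literal(name: str) -> float:
    """Decode a hypothetical literal such as 'c10_0' (10.0) or 'c_1_5' (-1.5)."""
    # Assumed rule: optional leading '_' is the minus sign, a later '_' is the decimal point.
    match = re.fullmatch(r"c(_?)(\d+)(?:_(\d+))?", name)
    if match is None:
        raise ValueError(f"not a constant literal: {name!r}")
    sign, whole, frac = match.groups()
    text = ("-" if sign else "") + whole + ("." + frac if frac else "")
    return float(text)


def range_literal(name: str) -> List[float]:
    """'r5' stands for one constant column per whole number 0 .. n - 1."""
    n = int(name[1:])
    return [float(i) for i in range(n)]


print(parse_constant_literal("c10_0"), parse_constant_literal("c_1_5"))  # 10.0 -1.5
print(range_literal("r5"))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```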

Row indexers

To evaluate an expression, use a row indexer. Specify rows by date range with the .rows[start:stop:periodicity] syntax, where stop is exclusive. Omitted (None) slice arguments default to the widest range available. Integer indices also work with the .rows indexer, and .first and .last are included for convenience.

>>> ten_squared.rows['2021-06-01':'2021-06-03','B']    # evaluate over a two-business-day date range
                 c
2021-06-01     100.0
2021-06-02     100.0
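Because the stop date is exclusive, the range above yields only June 1 and June 2. A stdlib-only sketch of a half-open business-day range, assuming 'B' means Monday through Friday with no holiday calendar (business_days is a hypothetical name):

```python
from datetime import date, timedelta
from typing import List


def business_days(start: date, stop: date) -> List[date]:
    """Weekdays in [start, stop): the stop date itself is excluded."""
    days = []
    current = start
    while current < stop:
        if current.weekday() < 5:  # Monday=0 ... Friday=4; holidays are ignored
            days.append(current)
        current += timedelta(days=1)
    return days


print(business_days(date(2021, 6, 1), date(2021, 6, 3)))
# [datetime.date(2021, 6, 1), datetime.date(2021, 6, 2)]
```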

Quotations and Combinators

Combinators are higher-order functions that let pynto do more complicated things like branching and looping. Combinators operate on quotations: expressions that are pushed to the stack instead of operating on it. To push a quotation to the stack, put words between q and p, or pass an expression from the local namespace to pt.q(expression). The map combinator evaluates the quotation at the top of the stack over each column below it.

>>> pt.c9.c10.q.dup.mul.p.map.last
                 c         c
2021-06-02      81.0     100.0
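One way to picture the quotation/combinator split is to treat a quotation as an unevaluated block of words sitting on the stack, which map pops and runs against each remaining column by itself. This toy model (the Quotation class and map_combinator are illustrative, not pynto's internals) reproduces the 81/100 result above.

```python
from typing import Callable, List

Word = Callable[[list], list]


class Quotation:
    """A block of words pushed onto the stack instead of being run (toy model)."""

    def __init__(self, *words: Word) -> None:
        self.words = words

    def __call__(self, stack: list) -> list:
        for word in self.words:
            stack = word(stack)
        return stack


def dup(stack: list) -> list:
    return stack + [list(stack[-1])]


def mul(stack: list) -> list:
    *rest, second, top = stack
    return rest + [[a * b for a, b in zip(second, top)]]


def map_combinator(stack: list) -> list:
    """Pop the quotation on top, then run it on each remaining column by itself."""
    *columns, quotation = stack
    result = []
    for column in columns:
        result.extend(quotation([column]))
    return result


# Toy version of pt.c9.c10.q.dup.mul.p.map over two rows
stack = [[9.0, 9.0], [10.0, 10.0], Quotation(dup, mul)]
print(map_combinator(stack))  # [[81.0, 81.0], [100.0, 100.0]]
```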

Headers

Each column has a string header. hset sets headers to new values. Headers are useful for filtering or arranging columns.

>>> pt.c9.c10.q.dup.mul.p.map.hset('a','b').last
                 a         b
2021-06-02      81.0     100.0

Column indexers

Column indexers specify the columns on which a word operates, overriding the word's default. Positive int indices start from the bottom (left) of the stack, and negative indices start from the top (right).

By default, add has a column indexer of [-2:], so it sums only the top two columns:

>>> pt.r5.add.last
              c    c    c    c
2021-06-02  0.0  1.0  2.0  7.0

Change the column indexer of add to [:] to sum all columns

>>> pt.r5.add[:].last
               c
2025-06-02  10.0
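A column indexer behaves like a Python slice over stack positions. This sketch assumes the reduced column replaces the selected ones on top of the stack, which reproduces both results above; apply_add is a hypothetical helper.

```python
from typing import List

Stack = List[List[float]]


def apply_add(stack: Stack, selector: slice = slice(-2, None)) -> Stack:
    """Sum the selected columns row-wise and push the result on top (toy model)."""
    positions = range(len(stack))[selector]     # resolves negative and open bounds
    selected = [stack[i] for i in positions]
    untouched = [col for i, col in enumerate(stack) if i not in positions]
    total = [sum(row) for row in zip(*selected)]
    return untouched + [total]


r5 = [[float(i)] for i in range(5)]             # like pt.r5, one row
print(apply_add(r5))                            # [[0.0], [1.0], [2.0], [7.0]]  add
print(apply_add(r5, slice(None)))               # [[10.0]]                      add[:]
```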

You can also select columns by header using a regular expression:

>>> pt.r3.hset('a,b,c').add['(a|c)'].last
              b    a
2025-06-02  1.0  2.0
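Header selection can be modelled the same way with the re module. Whether pynto matches the whole header, a prefix, or any substring is an assumption here (re.fullmatch is used), as is keeping the first selected header for the result; add_matching is illustrative.

```python
import re
from typing import List, Tuple

Named = Tuple[str, List[float]]


def add_matching(stack: List[Named], pattern: str) -> List[Named]:
    """Sum columns whose header matches `pattern`; keep the first header (toy model)."""
    regex = re.compile(pattern)
    selected = [col for col in stack if regex.fullmatch(col[0])]
    untouched = [col for col in stack if not regex.fullmatch(col[0])]
    header = selected[0][0]
    total = [sum(row) for row in zip(*(values for _, values in selected))]
    return untouched + [(header, total)]


stack = [("a", [0.0]), ("b", [1.0]), ("c", [2.0])]   # like pt.r3.hset('a,b,c')
print(add_matching(stack, "(a|c)"))  # [('b', [1.0]), ('a', [2.0])]
```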

Defining words

Words in the local namespace can be composed using the + operator.

>>> squared = pt.dup.mul
>>> ten_squared2 = pt.c10_0 + squared    # equivalent to ten_squared

Words can also be defined globally in the pynto vocabulary.

>>> pt.define['squared'] = pt.dup.mul
>>> ten_squared3 = pt.c10_0.squared    # equivalent to ten_squared
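Both forms of definition amount to function composition: concatenating two programs runs the first and then the second. A toy model with a hypothetical Expr class (its vocabulary dict stands in for pt.define, and the stack holds scalars rather than columns) shows why ten_squared2 and ten_squared3 are the same program.

```python
from typing import Callable, Dict, List

Stack = List[float]


class Expr:
    """Toy composable program: a tuple of stack functions run left to right."""

    vocabulary: Dict[str, "Expr"] = {}   # hypothetical stand-in for pt.define

    def __init__(self, *funcs: Callable[[Stack], Stack]) -> None:
        self.funcs = funcs

    def __add__(self, other: "Expr") -> "Expr":
        # Concatenating two programs composes their functions in order.
        return Expr(*self.funcs, *other.funcs)

    def __getattr__(self, name: str) -> "Expr":
        # Only called for unknown attributes: look the name up as a defined word.
        try:
            return self + Expr.vocabulary[name]
        except KeyError:
            raise AttributeError(name) from None

    def __call__(self, stack: Stack) -> Stack:
        for func in self.funcs:
            stack = func(stack)
        return stack


def push(value: float) -> Expr:
    return Expr(lambda s: s + [value])


dup = Expr(lambda s: s + [s[-1]])
mul = Expr(lambda s: s[:-2] + [s[-2] * s[-1]])

squared = dup + mul                      # like squared = pt.dup.mul
ten_squared2 = push(10.0) + squared      # like pt.c10_0 + squared
Expr.vocabulary["squared"] = dup + mul   # like pt.define['squared'] = pt.dup.mul
ten_squared3 = push(10.0).squared        # like pt.c10_0.squared
print(ten_squared2([]), ten_squared3([]))  # [100.0] [100.0]
```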

The Database

pynto has built-in database functionality that lets you save DataFrames and Series to a Redis database. The database saves the underlying numpy data in native byte format for zero-copy retrieval. Each DataFrame column is saved as an independent key and can be retrieved or updated on its own. The database also supports three-dimensional frames that have a two-level MultiIndex.

>>> expr = pt.c9.c10.q.dup.mul.p.map
>>> pt.db['my_df'] = expr.rows['2021-06-01':'2021-06-03']
>>> pt.saved('my_df').rows[:]
                 c         c
2021-06-01      81.0     100.0
2021-06-02      81.0     100.0
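The storage idea (one key per column, holding raw native-format numbers that can be viewed without copying) can be sketched with the standard library. A plain dict stands in for Redis and the 'frame:column' key layout is hypothetical; with NumPy the analogous zero-copy read would use numpy.frombuffer.

```python
from array import array
from typing import Dict, List

# A plain dict stands in for Redis; the 'frame:column' key layout is hypothetical.
store: Dict[str, bytes] = {}


def save_frame(key: str, frame: Dict[str, List[float]]) -> None:
    """Store each column under its own key as native-endian float64 bytes."""
    for header, values in frame.items():
        store[f"{key}:{header}"] = array("d", values).tobytes()


def load_column(key: str, header: str) -> memoryview:
    """Return a read-only float64 view of the stored bytes without copying them."""
    return memoryview(store[f"{key}:{header}"]).cast("d")


save_frame("my_df", {"a": [81.0, 81.0], "b": [100.0, 100.0]})
print(load_column("my_df", "b").tolist())  # [100.0, 100.0]
save_frame("my_df", {"b": [1.0, 2.0]})     # rewrites only the 'my_df:b' key
print(load_column("my_df", "b").tolist())  # [1.0, 2.0]
```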

pynto built-in vocabulary

Column Creation

Word|Default Selector|Parameters|Description
c|[-1:]|values|Pushes constant columns for each of values
dc|[-1:]||Pushes a column with the number of days in the period
nan|[-1:]|values|Pushes a constant nan-valued column
pandas|[:]|pandas, round_|Pushes columns from Pandas DataFrame or Series pandas
po|[-1:]||Pushes a column with the period ordinal
r|[-1:]|n|Pushes constant columns for each whole number from 0 to n - 1
randn|[-1:]||Pushes a column with values from a random normal distribution
saved|[-1:]|key|Pushes columns saved in the internal DB under key
ts|[-1:]||Pushes a column with the timestamp of the end of the period

Stack Manipulation

Word|Default Selector|Parameters|Description
drop|[-1:]||Removes selected columns
dup|[-1:]||Duplicates columns
filter|[:]||Removes non-selected columns
hsort|[:]||Sorts columns by header
id|[:]||Identity/no-op
interleave|[:]|parts|Divides columns into parts groups and interleaves the groups
nip|[-1:]||Removes non-selected columns, defaulting selection to top
pull|[:]||Brings selected columns to the top
rev|[:]||Reverses the order of selected columns
roll|[:]||Permutes selected columns
swap|[-2:]||Swaps top and bottom selected columns
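Several of these words are simple permutations of the column list. The sketch below uses headers as stand-ins for columns and covers interleave, swap and pull; it assumes interleave splits the columns into equal-sized groups and that pull takes non-negative positions. The functions are illustrative, not pynto code.

```python
from typing import List, Sequence


def interleave(columns: Sequence[str], parts: int) -> List[str]:
    """Split columns into `parts` equal groups, then take one from each group in turn."""
    if len(columns) % parts:
        raise ValueError("column count must be divisible by parts (assumed)")
    size = len(columns) // parts
    groups = [columns[i * size:(i + 1) * size] for i in range(parts)]
    return [group[j] for j in range(size) for group in groups]


def swap(columns: Sequence[str]) -> List[str]:
    """Exchange the top two columns (default selector [-2:])."""
    *rest, second, top = columns
    return [*rest, top, second]


def pull(columns: Sequence[str], selected: Sequence[int]) -> List[str]:
    """Move the selected positions to the top, keeping their relative order."""
    chosen = [columns[i] for i in selected]
    others = [c for i, c in enumerate(columns) if i not in selected]
    return others + chosen


print(interleave(["a", "b", "c", "d"], 2))  # ['a', 'c', 'b', 'd']
print(swap(["a", "b", "c"]))                # ['a', 'c', 'b']
```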

Quotation

Word|Default Selector|Parameters|Description
q|[-1:]|quoted, this|Wraps the following words until p as a quotation, or wraps quoted expression as a quotation

Header manipulation

Word|Default Selector|Parameters|Description
halpha|[:]||Sets headers to alphabetical values
happly|[:]|header_func|Applies header_func to headers
hformat|[:]|format_spec|Applies format_spec to headers
hreplace|[:]|old, new|Replaces old with new in headers
hset|[:]|headers|Sets headers to *headers
hsetall|[:]|headers|Sets headers to *headers, repeating if necessary
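The header words are list transformations over strings. In this sketch hformat assumes format_spec is a str.format template such as '{}_dev', which may differ from pynto's actual convention; the function names simply mirror the words above.

```python
from itertools import cycle, islice
from typing import List


def hsetall(headers: List[str], *new: str) -> List[str]:
    """Assign new headers, repeating them when there are more columns than names."""
    return list(islice(cycle(new), len(headers)))


def hreplace(headers: List[str], old: str, new: str) -> List[str]:
    """Replace the substring old with new in every header."""
    return [h.replace(old, new) for h in headers]


def hformat(headers: List[str], format_spec: str) -> List[str]:
    """Format every header with a str.format template (assumed convention)."""
    return [format_spec.format(h) for h in headers]


print(hsetall(["w", "x", "y", "z"], "a", "b"))  # ['a', 'b', 'a', 'b']
print(hformat(["AAA", "BBB"], "{}_dev"))        # ['AAA_dev', 'BBB_dev']
```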

Combinators

Word|Default Selector|Parameters|Description
call|[:]||Applies quotation
cleave|[:]|num_quotations|Applies all preceding quotations
compose|[:]|num_quotations|Combines quotations
hmap|[:]||Applies quotation to stacks created by grouping columns by header
ifexists|[:]|count|Applies quotation if stack has at least count columns
ifexistselse|[:]|count|Applies top quotation if stack has at least count columns, otherwise applies second quotation
ifheaders|[:]|predicate|Applies top quotation if list of column headers fulfills predicate
ifheaderselse|[:]|predicate|Applies top quotation if list of column headers fulfills predicate, otherwise applies second quotation
map|[:]|every|Applies quotation in groups of every
partial|[-1:]|quoted, this|Pushes stack columns to the front of quotation
repeat|[:]|times|Applies quotation times times

Data cleanup

Word|Default Selector|Parameters|Description
ffill|[:]|lookback, leave_end|Fills nans with previous values, looking back lookback before range and leaving trailing nans in place when leave_end is true
fill|[:]|value|Fills nans with value
fillfirst|[-1:]|lookback|Fills first row with previous non-nan value, looking back lookback before range
join|[-2:]|date|Joins two columns at date
sync|[:]||Aligns available data by setting all values to NaN when any value is NaN
zero_first|[-1:]||Changes first value to zero
zero_to_na|[-1:]||Changes zeros to nans
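A minimal sketch of ffill on a single column, reading leave_end as keeping the NaNs that follow the last valid value; the lookback behaviour (reaching before the evaluated range) is not modelled.

```python
import math
from typing import List


def ffill(values: List[float], leave_end: bool = True) -> List[float]:
    """Carry the last valid value forward over NaNs (lookback not modelled)."""
    out = []
    last = math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        out.append(last)
    if leave_end:
        # Restore NaNs after the final valid value (assumed meaning of leave_end).
        valid = [i for i, v in enumerate(values) if not math.isnan(v)]
        start = valid[-1] + 1 if valid else len(values)
        for i in range(start, len(values)):
            out[i] = math.nan
    return out


column = [1.0, math.nan, 3.0, math.nan, math.nan]
print(ffill(column))                   # [1.0, 1.0, 3.0, nan, nan]
print(ffill(column, leave_end=False))  # [1.0, 1.0, 3.0, 3.0, 3.0]
```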

Resample methods

Word|Default Selector|Parameters|Description
per|[-1:]|periodicity|Changes column periodicity to periodicity, then resamples
resample_avg|[:]||Sets periodicity resampling method to avg
resample_first|[:]||Sets periodicity resampling method to first
resample_firstnofill|[:]||Sets periodicity resampling method to first with no fill
resample_last|[:]||Sets periodicity resampling method to last
resample_lastnofill|[:]||Sets periodicity resampling method to last with no fill
resample_max|[:]||Sets periodicity resampling method to max
resample_min|[:]||Sets periodicity resampling method to min
resample_sum|[:]||Sets periodicity resampling method to sum
start|[-1:]|start|Changes period start to start, then resamples
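Resampling groups rows by the target period and reduces each group with the chosen method. This sketch resamples daily (date, value) pairs to calendar months; the method table and resample_monthly are illustrative, and the nofill variants are not modelled.

```python
from datetime import date
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical method table mirroring resample_first, resample_last, resample_avg, ...
METHODS: Dict[str, Callable[[List[float]], float]] = {
    "first": lambda xs: xs[0],
    "last": lambda xs: xs[-1],
    "avg": mean,
    "sum": sum,
    "max": max,
    "min": min,
}


def resample_monthly(rows: List[Tuple[date, float]], method: str) -> Dict[Tuple[int, int], float]:
    """Group (date, value) rows by calendar month, then reduce each group."""
    groups: Dict[Tuple[int, int], List[float]] = {}
    for day, value in rows:
        groups.setdefault((day.year, day.month), []).append(value)
    reduce = METHODS[method]
    return {month: reduce(values) for month, values in groups.items()}


daily = [(date(2021, 5, 28), 1.0), (date(2021, 5, 31), 3.0), (date(2021, 6, 1), 5.0)]
print(resample_monthly(daily, "last"))  # {(2021, 5): 3.0, (2021, 6): 5.0}
print(resample_monthly(daily, "avg"))   # {(2021, 5): 2.0, (2021, 6): 5.0}
```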

Row-wise Reduction

Word|Default Selector|Parameters|Description
add|[-2:]|ignore_nans|Addition
avg|[-2:]|ignore_nans|Arithmetic average
div|[-2:]|ignore_nans|Division
max|[-2:]|ignore_nans|Maximum
med|[-2:]|ignore_nans|Median
min|[-2:]|ignore_nans|Minimum
mod|[-2:]|ignore_nans|Modulo
mul|[-2:]|ignore_nans|Multiplication
pow|[-2:]|ignore_nans|Power
std|[-2:]|ignore_nans|Standard deviation
sub|[-2:]|ignore_nans|Subtraction
var|[-2:]|ignore_nans|Variance

Row-wise Reduction Ignoring NaNs

Word|Default Selector|Parameters|Description
nadd|[-2:]|ignore_nans|Addition
navg|[-2:]|ignore_nans|Arithmetic average
ndiv|[-2:]|ignore_nans|Division
nmax|[-2:]|ignore_nans|Maximum
nmed|[-2:]|ignore_nans|Median
nmin|[-2:]|ignore_nans|Minimum
nmod|[-2:]|ignore_nans|Modulo
nmul|[-2:]|ignore_nans|Multiplication
npow|[-2:]|ignore_nans|Power
nstd|[-2:]|ignore_nans|Standard deviation
nsub|[-2:]|ignore_nans|Subtraction
nvar|[-2:]|ignore_nans|Variance
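The difference between the two reduction families is how a NaN in a row is treated. A sketch for addition, assuming the n-prefixed words behave like ignore_nans=True and that a row made up entirely of NaNs still yields NaN:

```python
import math
from typing import List


def rowwise_add(columns: List[List[float]], ignore_nans: bool = False) -> List[float]:
    """Sum columns row by row; NaN propagates unless ignore_nans is set."""
    out = []
    for row in zip(*columns):
        if ignore_nans:
            present = [v for v in row if not math.isnan(v)]
            out.append(sum(present) if present else math.nan)  # all-NaN row stays NaN (assumed)
        else:
            out.append(sum(row))  # any NaN makes the sum NaN
    return out


cols = [[1.0, math.nan], [2.0, 3.0]]
print(rowwise_add(cols))                    # [3.0, nan]   like add
print(rowwise_add(cols, ignore_nans=True))  # [3.0, 3.0]   like nadd
```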

Rolling Window

Word|Default Selector|Parameters|Description
radd|[-1:]|window|Addition
ravg|[-1:]|window|Arithmetic average
rcor|[-2:]|window|Correlation
rcov|[-2:]|window|Covariance
rdif|[-1:]|window|Lagged difference
rewm|[-1:]|window|Exponentially-weighted average
rews|[-1:]|window|Exponentially-weighted standard deviation
rewv|[-1:]|window|Exponentially-weighted variance
rlag|[-1:]|window|Lag
rmax|[-1:]|window|Maximum
rmed|[-1:]|window|Median
rmin|[-1:]|window|Minimum
rret|[-1:]|window|Lagged return
rstd|[-1:]|window|Standard deviation
rvar|[-1:]|window|Variance
rzsc|[-1:]|window|Z-score
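The lag-based rolling words can be expressed in terms of rlag. The definitions below are assumptions inferred from the table descriptions (difference and simple return against the value window rows earlier), not pynto's code.

```python
import math
from typing import List


def rlag(values: List[float], window: int) -> List[float]:
    """Value from `window` rows earlier; NaN where there is no earlier row."""
    k = min(window, len(values))
    return [math.nan] * k + values[:len(values) - k]


def rdif(values: List[float], window: int) -> List[float]:
    """Difference from the value `window` rows earlier."""
    return [v - lag for v, lag in zip(values, rlag(values, window))]


def rret(values: List[float], window: int) -> List[float]:
    """Simple return versus `window` rows earlier (a zero lag would raise)."""
    return [v / lag - 1.0 for v, lag in zip(values, rlag(values, window))]


prices = [100.0, 110.0, 121.0]
print(rlag(prices, 1))  # [nan, 100.0, 110.0]
print(rdif(prices, 1))  # [nan, 10.0, 11.0]
```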

Cumulative

Word|Default Selector|Parameters|Description
cadd|[-1:]||Addition
cavg|[-1:]||Arithmetic average
cdif|[-1:]||Lagged difference
clag|[-1:]||Lag
cmax|[-1:]||Maximum
cmin|[-1:]||Minimum
cmul|[-1:]||Multiplication
cret|[-1:]||Lagged return
cstd|[-1:]||Standard deviation
csub|[-1:]||Subtraction
cvar|[-1:]||Variance

Reverse Cumulative

Word|Default Selector|Parameters|Description
rcadd|[-1:]||Addition
rcavg|[-1:]||Arithmetic average
rcdif|[-1:]||Lagged difference
rclag|[-1:]||Lag
rcmax|[-1:]||Maximum
rcmin|[-1:]||Minimum
rcmul|[-1:]||Multiplication
rcret|[-1:]||Lagged return
rcstd|[-1:]||Standard deviation
rcsub|[-1:]||Subtraction
rcvar|[-1:]||Variance
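Cumulative and reverse cumulative words differ only in direction, which itertools.accumulate makes easy to see. This is a sketch of the idea, not pynto's implementation, and NaN handling is not modelled.

```python
from itertools import accumulate
from operator import add, mul
from typing import List


def cadd(values: List[float]) -> List[float]:
    """Running sum from the first row forward."""
    return list(accumulate(values, add))


def rcadd(values: List[float]) -> List[float]:
    """Running sum from the last row backward (reverse cumulative)."""
    return list(accumulate(reversed(values), add))[::-1]


def cmul(values: List[float]) -> List[float]:
    """Running product from the first row forward."""
    return list(accumulate(values, mul))


def cmax(values: List[float]) -> List[float]:
    """Running maximum from the first row forward."""
    return list(accumulate(values, max))


print(cadd([1.0, 2.0, 3.0]))   # [1.0, 3.0, 6.0]
print(rcadd([1.0, 2.0, 3.0]))  # [6.0, 5.0, 3.0]
```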

One-for-one functions

Word|Default Selector|Parameters|Description
abs|[-1:]||Absolute value
dec|[-1:]||Decrement
exp|[-1:]||Exponential
expm1|[-1:]||Exponential minus one
inc|[-1:]||Increment
inv|[-1:]||Multiplicative inverse
lnot|[-1:]||Logical not
log|[-1:]||Natural log
log1p|[-1:]||Natural log of increment
neg|[-1:]||Additive inverse
rank|[:]||Row-wise rank
sign|[-1:]||Sign
sqrt|[-1:]||Square root

Download files

Source Distribution

pynto-2.3.0.tar.gz (38.0 kB)

Uploaded Source

Built Distribution

pynto-2.3.0-py3-none-any.whl (32.7 kB)

Uploaded Python 3

File details

Details for the file pynto-2.3.0.tar.gz.

File metadata

  • Download URL: pynto-2.3.0.tar.gz
  • Upload date:
  • Size: 38.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for pynto-2.3.0.tar.gz
Algorithm Hash digest
SHA256 ba31ebeadd06754a7ae08df28cebe0e8bb7505bd7b8d59c41a28134ab65ea503
MD5 14b77037be9697dc7d1fce457a0f6beb
BLAKE2b-256 12d97f8d9ad3f3f3f06886735ab6efd3aaa85031abbea6d9bb9f4547071906c7

File details

Details for the file pynto-2.3.0-py3-none-any.whl.

File metadata

  • Download URL: pynto-2.3.0-py3-none-any.whl
  • Upload date:
  • Size: 32.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for pynto-2.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 65a7226dc4e260b2c11c14500361f896d16ba408f7ef6b1a553657ca2fdf7f2b
MD5 78a0351752364587a327844d0cf77ea9
BLAKE2b-256 0f28c0c7542c1e20156bb2fd794bf1d081897a2f7204709c15b3998e22a32fe7
