Python Package for riptable studies framework

These details have not been verified by PyPI

Project links

Development Status
- 4 - Beta
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language

Project description

RipTable

An all-in-one, high-performance, 64-bit Python analytics engine for numpy arrays with multithreaded support.

Supports Python 3.9 through 3.11 on 64-bit Linux, Windows, and macOS.

Enhances or replaces numpy and pandas, and includes the high-speed, cross-platform SDS file format. RipTable can often crunch numbers at 1.5x to 10x the speed of numpy or pandas.

Maximum speed is achieved through the use of:
- vector intrinsics: hand-rolled loops, using AVX-256 with AVX-512 support coming
- parallel computing: for large arrays, multiple threads are deployed
- recycling: built-in array garbage collection
- hashing and parallel sorts for core algorithms

Install

pip install riptable

Documentation: readthedocs

Basic Concepts and Classes

FastArray: a subclass of the numpy array with built-in multithreaded number crunching. Any scikit routine that expects a numpy array will also accept a FastArray, because it is a subclass: isinstance(fastarray, np.ndarray) returns True.

Dataset: replaces the pandas DataFrame class and holds numpy arrays of equal row length (including arrays with more than one dimension).

Struct: replaces the pandas Series class. A Struct is a grab-bag collection class; Dataset is a subclass of it.

Categorical: replaces both pandas groupby and Categorical class. RipTable Categoricals are multikey, filterable, stackable, archivable, and can chain computations such as apply_reduce loops. They can do everything groupby can plus more.
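The general idea behind a categorical can be sketched in plain Python: each unique key is hashed to an integer code, and grouped reductions then accumulate by code. This is a conceptual illustration only, not RipTable's implementation or API; the helper names factorize and grouped_sum are hypothetical.

```python
def factorize(keys):
    """Map each key to an integer code; uniques are kept in first-seen order."""
    lookup = {}
    codes = []
    for key in keys:
        # The dict lookup is the hashing step: an unseen key gets the next code.
        codes.append(lookup.setdefault(key, len(lookup)))
    return codes, list(lookup)


def grouped_sum(codes, values, n_groups):
    """Sum values per group by accumulating into a slot indexed by code."""
    totals = [0] * n_groups
    for code, value in zip(codes, values):
        totals[code] += value
    return totals


# Hypothetical sample data; tuples could serve as keys for a multikey grouping.
symbols = ["AAPL", "MSFT", "AAPL", "GOOG", "MSFT"]
sizes = [100, 200, 50, 10, 25]

codes, uniques = factorize(symbols)
totals = grouped_sum(codes, sizes, len(uniques))
result = dict(zip(uniques, totals))
print(result)
```

Because the codes are computed once, several reductions over the same grouping can reuse them instead of rehashing the keys each time.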

Date/Time Classes: DateTimeNano, Date, TimeSpan, and DateSpan are designed more like Java, C++, or C# classes. Replaces most numpy and pandas date time classes.

Accum2/AccumTable: For cross tabulation.

SDS: a new file format which can stack multiple datasets in multiple files with zstd compression, threads, and no extra memory copies. SDS also supports loading and writing datasets to shared memory.

Getting Started

import riptable as rt
ds = rt.Dataset({'intarray': rt.arange(1_000_000), 'floatarray': rt.arange(1_000_000.0)})
ds
ds.intarray.sum()

Numpy Users

A FastArray is a numpy array, and the two can be flipped back and forth with no array copies taking place (only the view changes).

import riptable as rt
import numpy as np
a = rt.arange(100)
numpyarray = a._np             # view the FastArray as a plain numpy array
fastarray = rt.FA(numpyarray)  # and back to a FastArray

Or convert directly by changing the view. Note that a FastArray is a numpy array:

numpyarray.view(rt.FastArray)
fastarray.view(np.ndarray)
isinstance(fastarray, np.ndarray)

Pandas Users

Simply pass a pandas DataFrame to the riptable Dataset constructor and it will be converted automatically.

import riptable as rt
import numpy as np
import pandas as pd
df = pd.DataFrame({'intarray': np.arange(1_000_000), 'floatarray': np.arange(1_000_000.0)})
ds = rt.Dataset(df)

How can I contribute?

RipTable has been publicly open-sourced because it needs more users and contributions to take it to the next level. The RipTable team is confident the engine is the next-generation building block for Python data analytics computing. We need help with bug reports, docs, improved functionality, and new functionality. Please consider opening a GitHub pull request or emailing the team.

See the contributing guide for more information.

How can I trust RipTable calculations?

RipTable has been in development for 3 years and tested by dozens of quants at a large financial firm. It has a full suite of tests. However, just like any project, we still discover bugs and improvements. Please report them using GitHub issues.

How can RipTable perform the same calculations faster?

RipTable was written from day one to handle large data and multithreading, using the riptide_cpp layer for basic arithmetic functions and algorithms. Many core algorithms have been painstakingly rewritten for multithreading.
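The partitioning pattern behind a multithreaded reduction can be sketched with the Python standard library: split the array into contiguous chunks, reduce each chunk on a worker thread, then combine the partial results. This is an illustration under stated assumptions, not how riptide_cpp is written; chunk_bounds and parallel_sum are hypothetical names, and in CPython the GIL means this pure-Python version shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, n_chunks):
    """Split range(n) into n_chunks contiguous [start, stop) pieces."""
    step, extra = divmod(n, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum(values, n_workers=4):
    """Sum each chunk on a worker thread, then combine the partial sums."""
    bounds = chunk_bounds(len(values), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda b: sum(values[b[0]:b[1]]), bounds)
        return sum(partials)


data = list(range(1_000_000))
total = parallel_sum(data)
print(total)
```

A native implementation would run the per-chunk loop in compiled code without holding the interpreter lock, which is where the actual gain comes from.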

Why doesn't numpy or pandas just pick up the same code?

numpy does not have a multithreaded layer (we are in discussions with the numpy team to add such a layer), nor is it designed to use C++ templates or hashing algorithms. pandas does not have a C++ layer (it uses Cython instead) and is a victim of its own success, which makes early design mistakes difficult to change (such as the block manager and the lack of powerful Categoricals).

Small, Medium, and Large array performance

RipTable is designed for all sizes of arrays. For small arrays (< 100 length), low processing overhead is important. RipTable's FastArray is written in hand-coded C and processes simple arithmetic functions faster than numpy arrays. For medium arrays (< 100,000 length), RipTable has vector intrinsic loops. For large arrays (>= 100,000 length), RipTable dynamically scales out threading, waking up threads efficiently using a futex.
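The three size tiers above amount to a dispatch on array length. A minimal sketch, assuming the thresholds quoted in the text are the exact cutoffs; the function pick_strategy and its labels are hypothetical and do not correspond to any RipTable API.

```python
SMALL_LIMIT = 100        # "< 100 length" in the text above
MEDIUM_LIMIT = 100_000   # "< 100,000 length" in the text above


def pick_strategy(length):
    """Return an illustrative label for the code path suited to this length."""
    if length < SMALL_LIMIT:
        return "scalar"      # low-overhead path for small arrays
    if length < MEDIUM_LIMIT:
        return "vectorized"  # vector intrinsic loops for medium arrays
    return "threaded"        # scale out across threads for large arrays


for n in (10, 5_000, 1_000_000):
    print(n, pick_strategy(n))
```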

Release history

1.17.1

Apr 25, 2024

1.17.0

Apr 23, 2024

1.16.1

Apr 9, 2024

1.16.0

Mar 27, 2024

1.15.0

Mar 7, 2024

1.14.5

Feb 7, 2024

1.14.4

Jan 31, 2024

1.14.3

Jan 4, 2024

1.14.2

Dec 15, 2023

1.14.1

Dec 1, 2023

1.14.0

Nov 2, 2023

1.13.4

Oct 5, 2023

1.13.3

Oct 5, 2023

1.13.2

Sep 27, 2023

1.13.1

Sep 12, 2023

1.13.0

Aug 31, 2023

This version

1.12.0

Aug 16, 2023

1.11.0

Aug 1, 2023

1.10.0

Jul 24, 2023

1.9.2

Jul 12, 2023

1.9.1

Jun 22, 2023

1.9.0

Jun 14, 2023

1.8.1

May 26, 2023

1.8.0

May 18, 2023

1.7.0

May 9, 2023

1.6.11

Apr 18, 2023

1.6.10

Apr 13, 2023

1.6.9

Apr 5, 2023

1.6.8

Mar 28, 2023

1.6.7

Mar 7, 2023

1.6.6

Mar 6, 2023

1.6.5

Feb 22, 2023

1.6.4

Jan 25, 2023

1.6.3

Jan 17, 2023

1.6.2

Jan 9, 2023

1.6.1

Dec 21, 2022

1.6.0

Dec 15, 2022

1.5.1

Dec 1, 2022

1.5.0

Nov 9, 2022

1.4.2

Oct 27, 2022

1.4.1

Oct 21, 2022

1.4.0

Oct 18, 2022

1.3.6

Aug 24, 2022

1.3.5

Apr 18, 2022

1.3.4

Apr 13, 2022

1.3.3

Mar 11, 2022

1.3.2

Mar 9, 2022

1.3.1

Feb 23, 2022

1.2.9

Jan 31, 2022

1.2.8

Jan 19, 2022

1.2.7

Jan 19, 2022

1.2.6

Dec 23, 2021

1.2.5

Dec 15, 2021

1.2.4

Dec 8, 2021

1.2.3

Dec 8, 2021

1.2.2

Dec 2, 2021

1.2.1

Nov 18, 2021

1.2.0

Nov 16, 2021

1.1.4

Oct 22, 2021

1.1.3

Oct 8, 2021

1.1.2

Oct 6, 2021

1.1.1

Oct 1, 2021

1.1.0

Aug 11, 2021

1.0.58

Aug 11, 2021

1.0.57

Jul 13, 2021

1.0.56

Jul 2, 2021

1.0.55

Jun 29, 2021

1.0.54

Jun 3, 2021

1.0.53

Jun 1, 2021

1.0.42

Jan 20, 2021

1.0.41

Jan 11, 2021

1.0.40

Dec 22, 2020

1.0.39

Dec 21, 2020

1.0.38

Dec 10, 2020

1.0.37

Dec 10, 2020

1.0.36

Dec 9, 2020

1.0.35

Dec 4, 2020

1.0.34

Dec 4, 2020

1.0.33

Dec 3, 2020

1.0.32

Nov 30, 2020

1.0.31

Nov 27, 2020

1.0.29

Nov 25, 2020

1.0.27

Nov 23, 2020

1.0.26

Nov 17, 2020

1.0.25

Oct 22, 2020

1.0.24

Oct 20, 2020

1.0.19

Sep 22, 2020

1.0.17

Sep 18, 2020

1.0.15

Sep 17, 2020

1.0.11

Sep 9, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

riptable-1.12.0.tar.gz (1.6 MB)

Uploaded Aug 16, 2023 Source

File details

Details for the file riptable-1.12.0.tar.gz.

File metadata

Download URL: riptable-1.12.0.tar.gz
Upload date: Aug 16, 2023
Size: 1.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for riptable-1.12.0.tar.gz
Algorithm	Hash digest
SHA256	`7a882bbcdb8a6dc6e92c094d9a9c43ecf819b13f2a9f270dcc3da7e04576d3cc`
MD5	`3054cd16748a2c8449eca2fe35f13072`
BLAKE2b-256	`bace2dd60bfa91b59eb3a780d21c9c901fb00975316395ebea21f56f9dc09664`

See more details on using hashes here.
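One way to check a downloaded archive against the SHA256 digest listed above is to hash the file locally with the standard library. This is a sketch: the helper names sha256_of_file and matches_published_hash are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest published above for riptable-1.12.0.tar.gz
EXPECTED_SHA256 = "7a882bbcdb8a6dc6e92c094d9a9c43ecf819b13f2a9f270dcc3da7e04576d3cc"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()


# Example call, assuming the archive is in the current directory:
# matches_published_hash("riptable-1.12.0.tar.gz")
```

pip can perform the same check during installation when a requirements file pins the digest with the --hash option.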
