Skip to main content

Use compact expressions to create diverse, deterministic DataFrame fixtures with StaticFrame

Project description

The FrameFixtures library defines a tiny domain-specific language that permits using compact expressions to create diverse, deterministic DataFrame fixtures with StaticFrame. The resulting Frame can be used for test, performance studies, or documentation examples, and can easily be converted to Pandas DataFrames or other representations available via StaticFrame.

Code: https://github.com/InvestmentSystems/frame-fixtures

Packages: https://pypi.org/project/frame-fixtures

Installation

Install FrameFixtures via PIP:

pip install frame-fixtures [extras]

The [extras] configuration includes StaticFrame as a requirement. As StaticFrame uses FrameFixtures, installation without [extras] assumes the availability of StaticFrame:

pip install frame-fixtures

Examples

Import FrameFixtures with the following convention:

>>> import frame_fixtures as ff

Create a 4 by 8 Frame of string, Booleans, and floats.

>>> ff.parse('v(str,str,bool,float)|s(4,8)')
<Frame>
<Index> 0     1     2      3         4     5     6      7         <int64>
<Index>
0       zjZQ  zaji  True   1080.4    zDVQ  zEdH  True   647.9
1       zO5l  zJnC  False  2580.34   z5hI  zB7E  True   2755.18
2       zEdH  zDdR  False  700.42    zyT8  zwIp  False  -1259.28
3       zB7E  zuVU  True   3338.48   zS6w  zDVQ  False  3442.84
<int64> <<U4> <<U4> <bool> <float64> <<U4> <<U4> <bool> <float64>

The same Frame can be converted to Pandas:

>>> ff.parse('v(str,str,bool,float)|s(4,8)').to_pandas()
      0     1      2        3     4     5      6        7
0  zjZQ  zaji   True  1080.40  zDVQ  zEdH   True   647.90
1  zO5l  zJnC  False  2580.34  z5hI  zB7E   True  2755.18
2  zEdH  zDdR  False   700.42  zyT8  zwIp  False -1259.28
3  zB7E  zuVU   True  3338.48  zS6w  zDVQ  False  3442.84

Create a 4 by 4 Frame of Booleans with three-level index and columns.

>>> ff.parse('v(bool)|i(IH,(str,int,str))|c(IH,(str,int,str))|s(4,4)')
<Frame>
<IndexHierarchy>               zZbu   zZbu   zZbu   zZbu   <<U4>
                               105269 105269 119909 119909 <int64>
                               zDVQ   z5hI   zyT8   zS6w   <<U4>
<IndexHierarchy>
zZbu             105269  zDVQ  False  False  True   False
zZbu             105269  z5hI  False  False  False  False
zZbu             119909  zyT8  False  False  False  True
zZbu             119909  zS6w  True   False  True   True
<<U4>            <int64> <<U4> <bool> <bool> <bool> <bool>

The same Frame can be converted to Pandas:

>>> ff.parse('v(bool)|i(IH,(str,int,str))|c(IH,(str,int,str))|s(4,4)').to_pandas()
__index0__                         zZbu
__index1__                       105269        119909
__index2__                         zDVQ   z5hI   zyT8   zS6w
__index0__ __index1__ __index2__
zZbu       105269     zDVQ        False  False   True  False
                      z5hI        False  False  False  False
           119909     zyT8        False  False  False   True
                      zS6w         True  False   True   True

FrameFixtures support defining features unique to StaticFrame, such as specifying a grow-only FrameGO, Index types within IndexHierarchy, and usage of np.datetime64 types other than nanoseconds. These specifications are not directly convertible to Pandas.

>>> ff.parse('f(Fg)|v(int,bool,str)|i((IY,ID),(dtY,dtD))|c(ISg,dts)|s(6,2)')
<FrameGO>
<IndexSecondGO>                  1970-01-01T09:38:35 1970-01-01T01:00:48 <datetime64[s]>
<IndexHierarchy>
36685            2258-03-21      -88017              False
36685            2298-04-20      92867               False
5618             2501-10-08      84967               False
5618             2441-04-14      13448               False
93271            2234-04-07      175579              False
93271            2210-12-26      58768               False
<datetime64[Y]>  <datetime64[D]> <int64>             <bool>

Grammar

Container Components

A FrameFixture is defined by specifying one or more container components using symbols such as s for shape and i for index. Container components (CCs) are given arguments using Python function call syntax, and multiple CCs are delimited with |. The shape CC takes integers as arguments; all other CCs take Constructor Specifiers (CS) and/or Dtype Specifiers (DS) as arguments. So a 100 by 20 Frame with an index of str is specified as s(100,20)|i(I,str), where 100 and 20 define the row and column counts, and I is the CC and str is the DS. Component symbols, whether components are required, and the number of required arguments, is summarized below.

Symbol

Component

Required

Arguments

Signature

f

Frame

False

1

(CS,)

i

Index

False

2

(CS, DS) or ((CS, …), (DS, …))

c

Columns

False

2

(CS, DS) or ((CS, …), (DS, …))

v

Values

False

unbound

(DS, …)

s

Shape

True

2

(int, int)

Constructor Specifiers

CSs are given to the f CC; the i and c CC take one or many CSs as their first argument.

Symbol

Class

F

Frame

Fg

FrameGO

I

Index

Ig

IndexGO

IH

IndexHierarchy

IHg

IndexHierarchyGO

IY

IndexYear

IYg

IndexYearGO

IYM

IndexYearMonth

IYMg

IndexYearMonthGO

ID

IndexDate

IDg

IndexDateGO

IS

IndexSecond

ISg

IndexSecondGO

IN

IndexNanosecond

INg

IndexNanosecondGO

IACF

IndexAutoConstructorFactory

Dtype Specifiers

DSs are given to the v CC, and are used repeatedly to fill all columns; the i and c CC take one or many DSs as their second argument.

Symbol

Class

dtY

dtype(‘<M8[Y]’)

dtM

dtype(‘<M8[M]’)

dtD

dtype(‘<M8[D]’)

dth

dtype(‘<M8[h]’)

dtm

dtype(‘<M8[m]’)

dts

dtype(‘<M8[s]’)

dtms

dtype(‘<M8[ms]’)

dtus

dtype(‘<M8[us]’)

dtns

dtype(‘<M8[ns]’)

tdY

dtype(‘<m8[Y]’)

tdM

dtype(‘<m8[M]’)

tdD

dtype(‘<m8[D]’)

tdh

dtype(‘<m8[h]’)

tdm

dtype(‘<m8[m]’)

tds

dtype(‘<m8[s]’)

tdms

dtype(‘<m8[ms]’)

tdus

dtype(‘<m8[us]’)

tdns

dtype(‘<m8[ns]’)

int

<class ‘int’>

str

<class ‘str’>

bytes

<class ‘bytes’>

float

<class ‘float’>

bool

<class ‘bool’>

complex

<class ‘complex’>

object

<class ‘object’>

int8

<class ‘numpy.int8’>

int16

<class ‘numpy.int16’>

int32

<class ‘numpy.int32’>

int64

<class ‘numpy.int64’>

uint8

<class ‘numpy.uint8’>

uint16

<class ‘numpy.uint16’>

uint32

<class ‘numpy.uint32’>

uint64

<class ‘numpy.uint64’>

float16

<class ‘numpy.float16’>

float32

<class ‘numpy.float32’>

float64

<class ‘numpy.float64’>

complex64

<class ‘numpy.complex64’>

complex128

<class ‘numpy.complex128’>

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

frame-fixtures-0.3.0.tar.gz (15.3 kB view hashes)

Uploaded Source

Built Distribution

frame_fixtures-0.3.0-py3-none-any.whl (11.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page