oamap

Perform high-speed calculations on columnar data without creating intermediate objects.


Introduction

Data analysts are often faced with a choice between speed and flexibility. Tabular data, such as SQL tables, can be processed rapidly enough for a truly interactive analysis session, but hierarchically nested formats, such as JSON, are better at representing relationships in complex data models. In some domains (such as particle physics), we want to perform calculations on JSON-like structures at the speed of SQL.

The key to high throughput on large datasets, particularly ones with more attributes than are accessed in a single pass, is laying out the data in “columns.” All values of an attribute should be contiguous on disk or in memory, because data are paged from one cache to the next in locally contiguous blocks. The ROOT and Parquet file formats represent JSON-like data in columns on disk, but these data are usually deserialized into objects for processing in memory. Higher performance can be achieved by maintaining the columnar structure through all stages of the calculation (see this talk and this paper).
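Here is a minimal illustration of the columnar idea. It uses plain NumPy, not OAMap's own API. A list of variable-length lists can be stored as a flat `content` array plus an `offsets` array that marks where each inner list starts and stops. A per-list sum can then be computed directly on the arrays, without building any per-list Python object. The names `offsets`, `content`, and `sum_per_event` are illustrative only.

```python
import numpy as np

# Three "events", each a variable-length list of energies:
#   [[1.0, 2.0, 3.0], [], [4.0, 5.0]]
# stored as two flat arrays instead of nested Python lists.
offsets = np.array([0, 3, 3, 5])               # event i is content[offsets[i]:offsets[i+1]]
content = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # all energies, contiguous in memory

def sum_per_event(offsets, content):
    """Sum each inner list without materializing any per-event object."""
    # Running total with a leading zero, so empty inner lists sum to 0.0.
    cumulative = np.concatenate(([0.0], np.cumsum(content)))
    return cumulative[offsets[1:]] - cumulative[offsets[:-1]]

print(sum_per_event(offsets, content))   # [6. 0. 9.]
```

The sketch uses the difference of cumulative sums rather than `np.add.reduceat` because `reduceat` does not return zero for empty segments.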

The OAMap toolkit implements an Object Array Mapping in Python. Object Array Mappings, by analogy with Object Relational Mappings (ORMs), are one-to-one relationships between conceptual objects and physical arrays. You can write functions that appear to operate on ordinary Python objects (lists, tuples, class instances) but actually operate on low-level, contiguous buffers (NumPy arrays). The result is fast processing of large, complex datasets with a low memory footprint.
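The sketch below shows the proxy idea in its simplest form. It is a hypothetical, hand-written illustration and not OAMap's implementation: `ListProxy` and `RecordProxy` are invented names. The proxies hold only references to the column arrays and an index, and each attribute access reads a single element from the corresponding column.

```python
import numpy as np

class RecordProxy(object):
    """Pretends to be an object with attributes; reads one element per column on demand."""
    def __init__(self, columns, index):
        self._columns = columns   # dict: attribute name -> NumPy array
        self._index = index       # which row this proxy stands for

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. the data attributes.
        try:
            column = self._columns[name]
        except KeyError:
            raise AttributeError(name)
        return column[self._index]

class ListProxy(object):
    """Pretends to be a list of records; stores only the column arrays."""
    def __init__(self, columns):
        self._columns = columns
        self._length = len(next(iter(columns.values())))

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Negative indices are not supported in this sketch.
        if not 0 <= index < self._length:
            raise IndexError(index)
        return RecordProxy(self._columns, index)

particles = ListProxy({
    "px": np.array([1.0, 2.0, 3.0]),
    "py": np.array([0.5, 0.0, -1.0]),
})
print(len(particles))                  # 3
print(particles[2].py)                 # -1.0
print(sum(p.px for p in particles))    # 6.0
```

Iteration works through Python's fallback protocol, which calls `__getitem__` with 0, 1, 2, ... until `IndexError` is raised, so no list of record objects is ever stored.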

OAMap has two primary modes: (1) pure-Python object proxies, which pretend to be Python objects but actually access array data on demand, and (2) native machine code compiled by Numba. The pure-Python form is good for low-latency, exploratory work, while the compiled form is good for high throughput. The two are seamlessly interchangeable: a Python proxy converts to the compiled form when it enters a Numba-compiled function and switches back when it leaves. You can, for instance, do a fast search in compiled code and then examine the results more fully by hand.
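A search like the one described above can be written as a plain loop over flat arrays, which is the style Numba compiles well. This sketch stores nested lists as an `offsets` array and a flat `content` array, and it assumes Numba's `njit` decorator. If Numba is not installed, it falls back to running the same function as ordinary Python. It does not use OAMap's proxies. It only illustrates a compiled search followed by inspection of the result in ordinary Python. The function name is hypothetical.

```python
import numpy as np

try:
    from numba import njit          # JIT-compile to machine code if Numba is available
except ImportError:
    def njit(func):                 # fallback: run the same function as plain Python
        return func

@njit
def first_event_with_sum_above(offsets, content, threshold):
    """Return the index of the first inner list whose sum exceeds threshold, or -1."""
    for i in range(len(offsets) - 1):
        total = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            total += content[j]
        if total > threshold:
            return i
    return -1

# Nested lists [[1.0, 2.0, 3.0], [], [4.0, 5.0]] as flat arrays.
offsets = np.array([0, 3, 3, 5])
content = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

found = first_event_with_sum_above(offsets, content, 7.0)
print(found)                                       # 2
print(content[offsets[found]:offsets[found + 1]])  # [4. 5.]  (inspect by hand)
```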

Any columnar file format or database can be used as a data source: OAMap can get arrays of data from any dict-like object (any Python object implementing __getitem__), even from within a Numba-compiled function. Backends for ROOT, Parquet, and HDF5 are included, as well as an alternative to Python's shelve module. Storing and accessing a complete dataset, including metadata, requires no more infrastructure than a collection of named arrays (data types are encoded in the names, values in the arrays). OAMap is intended as a middleware layer above file formats and databases but below a fully integrated analysis suite.
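Because any object with `__getitem__` can serve as a source, a data source can be as simple as the wrapper below. The class name `LoggingSource` and the `"muons.offsets"`-style naming scheme are hypothetical. OAMap's actual naming convention is not reproduced here. The point is that a function reading through the source requests only the arrays it actually needs.

```python
import numpy as np

class LoggingSource(object):
    """Any object with __getitem__(name) -> array can act as a data source.
    This one also records which array names were requested."""
    def __init__(self, arrays):
        self._arrays = arrays
        self.requested = []

    def __getitem__(self, name):
        self.requested.append(name)
        return self._arrays[name]   # a real backend might read from a file here

# Hypothetical naming scheme "<field>.<role>" (NOT OAMap's actual convention):
# a list of muons per event, each muon having pt and eta.
source = LoggingSource({
    "muons.offsets": np.array([0, 2, 2, 3]),
    "muons.pt": np.array([10.0, 20.0, 5.0]),
    "muons.eta": np.array([0.1, -1.2, 2.0]),
})

def muon_pts(source, event):
    """Transverse momenta of the muons in one event, read straight from the arrays."""
    offsets = source["muons.offsets"]
    return source["muons.pt"][offsets[event]:offsets[event + 1]]

print(muon_pts(source, 0))   # [10. 20.]
print(source.requested)      # ['muons.offsets', 'muons.pt']
```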

Installation

Install OAMap like any other Python package:

pip install oamap --user

or similar (use sudo, virtualenv, or conda if you wish).

Strict dependencies:

Python (2.6+, 3.4+)
NumPy

Recommended dependencies:

Numba and LLVM to JIT-compile functions (requires a particular version of LLVM; follow instructions)
thriftpy to read Parquet files (pure Python, pip is fine)
uproot to read ROOT files (pure Python, pip is fine)
h5py to read HDF5 files (requires binary libraries; follow instructions)

Optional dependencies (all are bindings to binary libraries that can be installed with a package manager):

lz4: compression used by some ROOT and Parquet files
python-snappy: compression used by some Parquet files
lzo: compression used by some Parquet files
brotli: compression used by some Parquet files
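A small helper like the one below can check which of the recommended and optional packages are importable in the current environment. It is not part of OAMap and requires Python 3.4+ for `importlib.util.find_spec`. The mapping from package names to import names is an assumption based on each package's usual module name. For example, python-snappy is assumed to import as `snappy`.

```python
import importlib.util

# Package name (as listed above) -> module name used in `import`.
# These import names are assumptions based on each package's usual module name.
OPTIONAL_MODULES = {
    "numba": "numba",
    "thriftpy": "thriftpy",
    "uproot": "uproot",
    "h5py": "h5py",
    "lz4": "lz4",
    "python-snappy": "snappy",
    "lzo": "lzo",
    "brotli": "brotli",
}

def available_modules(modules=OPTIONAL_MODULES):
    """Return {package name: True/False} without importing the top-level modules."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }

if __name__ == "__main__":
    for package, found in sorted(available_modules().items()):
        print("%-14s %s" % (package, "installed" if found else "missing"))
```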


Release history

0.12.4 (this version): Jun 12, 2018
0.12.3: Jun 11, 2018
0.12.2: May 30, 2018
0.12.1: May 30, 2018
0.12.0: Apr 26, 2018
0.10.11: Mar 15, 2018
0.10.10: Feb 9, 2018
0.10.8: Feb 2, 2018
0.10.6: Feb 2, 2018
0.10.5: Feb 2, 2018
0.10.4: Feb 1, 2018
0.10.2: Jan 31, 2018
0.10.0: Jan 31, 2018
0.9.1: Jan 27, 2018
0.7.1: Jan 19, 2018
0.7.0: Jan 19, 2018
0.6.0: Jan 19, 2018
0.5.0: Jan 18, 2018
0.4.1: Jan 17, 2018
0.3.3: Jan 3, 2018
0.3.2: Jan 3, 2018
0.3.1: Jan 3, 2018
0.2.8: Jan 3, 2018
0.2.6: Jan 3, 2018
0.2.5: Jan 3, 2018
0.2.4: Jan 2, 2018
0.2.2: Jan 2, 2018
0.2.0: Jan 1, 2018
0.1.7: Dec 31, 2017
0.1.6: Dec 31, 2017
0.1.5: Dec 31, 2017
0.1.4: Dec 31, 2017
0.1.3: Dec 31, 2017
0.1.0: Dec 31, 2017

Download files

Source distribution: oamap-0.12.4.tar.gz (68.3 kB), uploaded Jun 12, 2018

Hashes for oamap-0.12.4.tar.gz:

Algorithm	Hash digest
SHA256	`5b8b32b1f30516c4d4fc047e004c54787eecaa3122e6dbe7a8597a0c0f37b553`
MD5	`5d8171d4928f71cdf1d54863bbc1105a`
BLAKE2b-256	`7ba114afc15fd1642f39281a6cdf0be8d9c612a20780bf7ffc05853ec136563f`
