AHL Research Versioned TimeSeries and Tick store
# [![arctic](logo/arctic_50.png)](https://github.com/manahl/arctic) [Arctic TimeSeries and Tick store](https://github.com/manahl/arctic)
[![Circle CI](https://circleci.com/gh/manahl/arctic.svg?style=shield)](https://circleci.com/gh/manahl/arctic)
[![Coverage Status](https://coveralls.io/repos/github/manahl/arctic/badge.svg?branch=master)](https://coveralls.io/github/manahl/arctic?branch=master)
[![Join the chat at https://gitter.im/manahl/arctic](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/manahl/arctic?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
Arctic is a high performance datastore for numeric data. It supports [Pandas](http://pandas.pydata.org/),
[numpy](http://www.numpy.org/) arrays and pickled objects out-of-the-box, with pluggable support for
other data types and optional versioning.
Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth,
~10x compression on disk, and scales to hundreds of millions of rows per second per
[MongoDB](https://www.mongodb.org/) instance.
Arctic has been under active development at [Man AHL](http://www.ahl.com/) since 2012.
## Quickstart
### Install Arctic
```
pip install git+https://github.com/manahl/arctic.git
```
### Run a MongoDB
```
mongod --dbpath <path/to/db_directory>
```
### Using VersionStore
```
# Legacy Quandl client, assumed installed; newer releases may expose a different API
import Quandl
from arctic import Arctic

# Connect to a local MongoDB instance
store = Arctic('localhost')

# Create the library - defaults to VersionStore
store.initialize_library('NASDAQ')

# Access the library
library = store['NASDAQ']

# Load some data - maybe from Quandl
aapl = Quandl.get("NASDAQ/AAPL", authtoken="your token here")

# Store the data in the library
library.write('AAPL', aapl, metadata={'source': 'Quandl'})

# Read the data back, along with its metadata
item = library.read('AAPL')
aapl = item.data
metadata = item.metadata
```
VersionStore supports much more: [See the HowTo](howtos/how_to_use_arctic.py)!
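
As one example of what else is possible, here is a minimal sketch of reading the version written just before the latest one. It assumes `library` is a VersionStore library such as `store['NASDAQ']`, that `list_versions(symbol)` returns one dict per version with a `'version'` key and an optional `'deleted'` flag, and that `read` accepts a version number through `as_of`. The helper name `read_previous_version` is hypothetical and not part of Arctic.

```python
def read_previous_version(library, symbol):
    """Return the item stored just before the latest version of `symbol`.

    Returns None when fewer than two readable versions exist.
    """
    # Sort explicitly rather than relying on the order list_versions() uses,
    # and skip versions flagged as deleted (assumed 'deleted' key).
    versions = sorted(
        v['version']
        for v in library.list_versions(symbol)
        if not v.get('deleted')
    )
    if len(versions) < 2:
        return None
    # as_of is assumed to accept an integer version number.
    return library.read(symbol, as_of=versions[-2])
```

With the quickstart library, `read_previous_version(library, 'AAPL')` would return an item exposing `.data` and `.metadata`, provided an older version is still stored. Older versions may have been pruned, depending on how the data was written.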
### Adding your own storage engine
Plugging a custom class in as a library type is straightforward. [This example
shows how.](howtos/how_to_custom_arctic_library.py)
## Concepts
### Libraries
Arctic provides namespaced *libraries* of data. These libraries allow
bucketing data by *source*, *user* or some other metric (for example frequency:
End-Of-Day; Minute Bars; etc.).
Arctic supports multiple data libraries per user. A user (or namespace)
maps to a MongoDB database (the granularity of mongo authentication). The library
itself is composed of a number of collections within the database. Libraries look like:
* user.EOD
* user.ONEMINUTE
A library is mapped to a Python class. All library databases in MongoDB are prefixed with 'arctic_'.
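
The naming scheme can be sketched in a few lines. This is an illustration of the convention described above, not Arctic's own code: the function name `split_library_name` is hypothetical, and the handling of a library name without a namespace (such as `NASDAQ` in the quickstart) is an assumption.

```python
DB_PREFIX = 'arctic'  # database prefix described in the text above


def split_library_name(library):
    """Split a name like 'user.EOD' into (MongoDB database, library)."""
    if '.' in library:
        namespace, name = library.split('.', 1)
        return DB_PREFIX + '_' + namespace, name
    # Assumption: a library without a namespace lives in a database
    # named after the bare prefix.
    return DB_PREFIX, library


print(split_library_name('user.EOD'))        # ('arctic_user', 'EOD')
print(split_library_name('user.ONEMINUTE'))  # ('arctic_user', 'ONEMINUTE')
```

Both example libraries land in the same database, which is why authentication is granted per user (namespace) rather than per library.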
### Storage Engines
Arctic includes two storage engines:
* [VersionStore](arctic/store/version_store.py): a key-value versioned TimeSeries store. It supports:
    * Pandas data types (other Python types pickled)
    * Multiple versions of each data item. Can easily read previous versions.
    * Point-in-time snapshots across symbols in a library
    * Soft quota support
    * Hooks for persisting other data types
    * Audited writes: API for saving metadata and data before and after a write.
    * A wide range of TimeSeries data frequencies: End-Of-Day to Minute bars
    * [See the HowTo](howtos/how_to_use_arctic.py)
* [TickStore](arctic/tickstore/tickstore.py): Column-oriented tick database. Supports
  dynamic fields; chunks aren't versioned. Designed for large, continuously ticking data.
Arctic storage implementations are **pluggable**. VersionStore is the default.
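
To make the versioning and snapshot ideas concrete, here is a toy in-memory model. It is purely illustrative and says nothing about how Arctic lays data out in MongoDB; the class `ToyVersionStore` and its methods are hypothetical, loosely mirroring the `write`, `read` and snapshot operations listed above.

```python
import copy


class ToyVersionStore(object):
    """In-memory illustration of versions and snapshots (not Arctic's code)."""

    def __init__(self):
        self._versions = {}   # symbol -> list of values; index + 1 is the version
        self._snapshots = {}  # snapshot name -> {symbol: version number}

    def write(self, symbol, data):
        """Append a new version and return its version number."""
        history = self._versions.setdefault(symbol, [])
        history.append(copy.deepcopy(data))
        return len(history)

    def snapshot(self, name):
        """Record the current version of every symbol under `name`."""
        self._snapshots[name] = {s: len(h) for s, h in self._versions.items()}

    def read(self, symbol, as_of=None):
        """Read the latest version, a version number, or a snapshot name."""
        history = self._versions[symbol]
        if as_of is None:
            version = len(history)
        elif isinstance(as_of, str):
            version = self._snapshots[as_of][symbol]
        else:
            version = as_of
        return history[version - 1]


toy = ToyVersionStore()
toy.write('AAPL', [1.0, 2.0])
toy.snapshot('end_of_day')
toy.write('AAPL', [1.0, 2.0, 3.0])

print(toy.read('AAPL'))                      # [1.0, 2.0, 3.0]
print(toy.read('AAPL', as_of=1))             # [1.0, 2.0]
print(toy.read('AAPL', as_of='end_of_day'))  # [1.0, 2.0]
```

The snapshot only records version numbers, so later writes do not disturb what a snapshot name resolves to.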
## Requirements
Arctic currently works with:
* Python 2.7
* pymongo >= 3.0
* Pandas
* MongoDB >= 2.4.x
## Acknowledgements
Arctic has been under active development at [Man AHL](http://www.ahl.com/) since 2012.
It wouldn't be possible without the work of the AHL Data Engineering Team including:
* [Richard Bounds](https://github.com/richardbounds)
* [James Blackburn](https://github.com/jamesblackburn)
* [Vlad Mereuta](https://github.com/vmereuta)
* [Tom Taylor](https://github.com/TomTaylorLondon)
* Tope Olukemi
* Drake Siard
* [Slavi Marinov](https://github.com/slavi)
* [Wilfred Hughes](https://github.com/wilfred)
* [Edward Easton](https://github.com/eeaston)
* ... and many others ...
Contributions welcome!
## License
Arctic is licensed under the GNU LGPL v2.1, a copy of which is included in [LICENSE](LICENSE).
## Changelog
### 1.12 (2015-11-12)
* Bugfix: correct version detection for Pandas >= 0.18.
* Bugfix: retrying connection initialisation in case of an AutoReconnect failure.
### 1.11 (2015-10-29)
* Bugfix: Improve performance of saving multi-index Pandas DataFrames
by 9x
* Bugfix: authenticate should propagate non-OperationFailure exceptions
(e.g. ConnectionFailure) as this might be indicative of socket failures
* Bugfix: return 'deleted' state in VersionStore.list_versions() so that
callers can pick up on the head version being the delete-sentinel.
### 1.10 (2015-10-28)
* Bugfix: VersionStore.read(date_range=...) could do the wrong thing with
TimeZones (which aren't yet supported for date_range slicing).
### 1.9 (2015-10-06)
* Bugfix: fix authentication race condition when sharing an Arctic
instance between multiple threads.
### 1.8 (2015-09-29)
* Bugfix: compatibility with both 3.0 and pre-3.0 MongoDB for
querying current authentications
### 1.7 (2015-09-18)
* Feature: Add support for reading a subset of a pandas DataFrame
in VersionStore.read by passing in an arctic.date.DateRange
* Bugfix: Reauth against admin if not auth'd against a
specific library's DB. Sometimes we appear to miss admin DB auths.
This is a workaround until we work out what the issue is.
### 1.6 (2015-09-16)
* Feature: Add support for multi-index Bitemporal DataFrame storage.
This allows persisting data and changes within the DataFrame making it
easier to see how old data has been revised over time.
* Bugfix: Ensure we call the error logging hook when exceptions occur
### 1.5 (2015-09-02)
* Always use the primary cluster node for 'has_symbol()', as it's safer
### 1.4 (2015-08-19)
* Bugfixes for timezone handling, now ensures use of non-naive datetimes
* Bugfix for tickstore read missing images
### 1.3 (2015-08-011)
* Improvements to command-line control scripts for users and libraries
* Bugfix for pickling top-level Arctic object
### 1.2 (2015-06-29)
* Allow snapshotting a range of versions in the VersionStore, and
snapshot all versions by default.
### 1.1 (2015-06-16)
* Bugfix for backwards-compatible unpickling of bson-encoded data
* Added switch for enabling parallel lz4 compression
### 1.0 (2015-06-14)
* Initial public release