AHL Research Versioned TimeSeries and Tick store
Arctic is a high performance datastore for numeric data. It supports Pandas, numpy arrays and pickled objects out-of-the-box, with pluggable support for other data types and optional versioning.
Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth, ~10x compression on disk, and scales to hundreds of millions of rows per second per MongoDB instance.
Arctic has been under active development at Man AHL since 2012.
Quickstart
Install Arctic
pip install git+https://github.com/manahl/arctic.git
Run a MongoDB server
mongod --dbpath <path/to/db_directory>
Using VersionStore
from arctic import Arctic
import Quandl

# Connect to local MongoDB
store = Arctic('localhost')

# Create the library - defaults to VersionStore
store.initialize_library('NASDAQ')

# Access the library
library = store['NASDAQ']

# Load some data - maybe from Quandl
aapl = Quandl.get("NASDAQ/AAPL", authtoken="your token here")

# Store the data in the library
library.write('AAPL', aapl, metadata={'source': 'Quandl'})

# Read the data back
item = library.read('AAPL')
aapl = item.data
metadata = item.metadata
VersionStore supports much more: See the HowTo!
Adding your own storage engine
Plugging a custom class in as a library type is straightforward. This example shows how.
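Conceptually, plugging in a library type means registering a class under a type name so that the store can instantiate it when a library of that type is opened. The following is a minimal, self-contained sketch of that registry pattern only. `register_type`, `open_library`, `KeyValueLib` and the `'example.KeyValueLib'` type string are hypothetical names, and Arctic's actual registration hook and constructor signature may differ.

```python
# Minimal sketch of a pluggable "library type" registry (hypothetical names,
# not Arctic's actual API): a type string is mapped to a class, and the
# store instantiates that class when a library of that type is opened.

_REGISTRY = {}


def register_type(type_name, cls):
    """Register cls under type_name; refuse to silently overwrite."""
    if type_name in _REGISTRY:
        raise ValueError("library type already registered: %s" % type_name)
    _REGISTRY[type_name] = cls


def open_library(type_name, backing):
    """Look up the class registered for type_name and wrap the backing storage."""
    if type_name not in _REGISTRY:
        raise ValueError("unknown library type: %s" % type_name)
    return _REGISTRY[type_name](backing)


class KeyValueLib(object):
    """A toy custom engine that keeps values in a plain dict."""

    def __init__(self, backing):
        self._backing = backing

    def write(self, symbol, data):
        self._backing[symbol] = data

    def read(self, symbol):
        return self._backing[symbol]


register_type('example.KeyValueLib', KeyValueLib)

lib = open_library('example.KeyValueLib', {})
lib.write('AAPL', [1, 2, 3])
print(lib.read('AAPL'))  # prints [1, 2, 3]
```

Rejecting duplicate registrations is a design choice of this sketch: it keeps one plugin from silently replacing another engine's type name.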
Concepts
Libraries
Arctic provides namespaced libraries of data. These libraries allow bucketing data by source, user or some other metric (for example, by frequency: End-Of-Day, Minute Bars, etc.).
Arctic supports multiple data libraries per user. A user (or namespace) maps to a MongoDB database (the granularity of MongoDB authentication). The library itself is composed of a number of collections within that database. Library names look like:
user.EOD
user.ONEMINUTE
A library is mapped to a Python class. All library databases in MongoDB are prefixed with ‘arctic_’.
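The naming scheme can be illustrated with a small helper that splits a namespaced library name into the MongoDB database and the library within it. This is a hypothetical sketch based only on the description above (a namespace maps to a database prefixed with ‘arctic_’). `split_library_name` is not an Arctic function, and it deliberately handles only dotted names.

```python
# Hypothetical helper illustrating the naming scheme described above:
# "user.EOD" -> MongoDB database "arctic_user", library "EOD".
# Not Arctic's own parser; un-namespaced names are rejected here on purpose.

DB_PREFIX = 'arctic_'


def split_library_name(name):
    """Return (database_name, library_name) for a 'namespace.library' name."""
    if '.' not in name:
        raise ValueError("expected 'namespace.library', got %r" % name)
    namespace, library = name.split('.', 1)
    return DB_PREFIX + namespace, library


print(split_library_name('user.EOD'))        # prints ('arctic_user', 'EOD')
print(split_library_name('user.ONEMINUTE'))  # prints ('arctic_user', 'ONEMINUTE')
```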
Storage Engines
Arctic includes two storage engines:
VersionStore: a key-value versioned TimeSeries store. It supports:
Pandas data types (other Python types pickled)
Multiple versions of each data item. Can easily read previous versions.
Create point-in-time snapshots across symbols in a library
Soft quota support
Hooks for persisting other data types
Audited writes: API for saving metadata and data before and after a write.
A wide range of TimeSeries data frequencies: End-Of-Day to Minute bars
TickStore: a column-oriented tick database. It supports dynamic fields, and its chunks aren’t versioned. Designed for large, continuously ticking data.
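To make the VersionStore ideas concrete, here is a toy in-memory model of versioned writes and point-in-time snapshots. It is an illustration under stated assumptions, not Arctic's implementation: `ToyVersionStore` and its `as_of` argument (a version number or snapshot name) are hypothetical, and nothing here touches MongoDB.

```python
# Toy in-memory model of VersionStore-style semantics (not Arctic's code):
# each write appends a new version, reads default to the latest version,
# and a snapshot pins the current version of every symbol under a name.


class ToyVersionStore(object):
    def __init__(self):
        self._versions = {}   # symbol -> list of (data, metadata); version = index + 1
        self._snapshots = {}  # snapshot name -> {symbol: version}

    def write(self, symbol, data, metadata=None):
        history = self._versions.setdefault(symbol, [])
        history.append((data, metadata))
        return len(history)  # version numbers start at 1

    def read(self, symbol, as_of=None):
        """as_of may be None (latest), a version number, or a snapshot name."""
        history = self._versions[symbol]
        if as_of is None:
            version = len(history)
        elif isinstance(as_of, str):
            version = self._snapshots[as_of][symbol]
        else:
            version = as_of
        if not 1 <= version <= len(history):
            raise KeyError('no version %r of %r' % (version, symbol))
        data, metadata = history[version - 1]
        return {'version': version, 'data': data, 'metadata': metadata}

    def snapshot(self, name):
        """Record the current head version of every symbol."""
        self._snapshots[name] = {s: len(h) for s, h in self._versions.items()}


store = ToyVersionStore()
store.write('AAPL', [100.0, 101.5], metadata={'source': 'Quandl'})
store.snapshot('snap1')
store.write('AAPL', [100.0, 101.5, 102.25])

print(store.read('AAPL')['version'])                 # prints 2
print(store.read('AAPL', as_of=1)['data'])           # prints [100.0, 101.5]
print(store.read('AAPL', as_of='snap1')['version'])  # prints 1
```

Recording only a version number per symbol keeps a snapshot cheap compared with copying the data itself.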
Arctic storage implementations are pluggable. VersionStore is the default.
Requirements
Arctic currently works with:
Python 2.7, 3.3, 3.4
pymongo >= 3.0
Pandas
MongoDB >= 2.4.x
Acknowledgements
Arctic wouldn’t be possible without the work of the AHL Data Engineering Team, including:
Tope Olukemi
Drake Siard
… and many others …
Contributions welcome!
License
Arctic is licensed under the GNU LGPL v2.1, a copy of which is included in LICENSE.
Changelog
1.20 (2016-02-03)
Feature: #98 Add initial_image as optional parameter on tickstore write()
Bugfix: #100 Write error on end field when writing with pandas dataframes
1.19 (2016-01-29)
Feature: Add Python 3.3/3.4 support
Bugfix: #95 Fix raising NoDataFoundException across multiple low level libraries
1.18 (2016-01-05)
Bugfix: #81 Fix broken read of multi-index DataFrame written by old version of Arctic
Bugfix: #49 Fix stringifying TickStore
1.17 (2015-12-24)
Feature: Add timezone support to store multi-index dataframes
Bugfix: Fixed broken sdist releases
1.16 (2015-12-15)
Feature: ArcticTransaction now supports non-audited ‘transactions’ via audit=False:
with ArcticTransaction(Arctic('hostname')['some_library'], 'symbol', audit=False) as at:
    ...
This is useful for batch jobs which read-modify-write and don’t want to clash with concurrent writers, and which don’t require keeping all versions of a symbol.
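The read-modify-write use case relies on detecting concurrent writers. The sketch below shows one common way to do that, optimistic concurrency with a version check and retry. It is an illustration under stated assumptions, not ArcticTransaction's actual mechanism; `ToyStore`, `write_if_unchanged`, `read_modify_write` and `ConcurrentModification` are hypothetical names, and the retry loop is this sketch's own choice.

```python
# Sketch of an optimistic read-modify-write (hypothetical names, not
# ArcticTransaction's implementation): remember the version that was read,
# and refuse to commit if another writer has moved the head in the meantime.


class ConcurrentModification(Exception):
    pass


class ToyStore(object):
    def __init__(self):
        self.head = {}  # symbol -> (version, data)

    def read(self, symbol):
        return self.head.get(symbol, (0, None))

    def write_if_unchanged(self, symbol, expected_version, data):
        current_version, _ = self.read(symbol)
        if current_version != expected_version:
            raise ConcurrentModification(symbol)
        self.head[symbol] = (current_version + 1, data)


def read_modify_write(store, symbol, modify, retries=3):
    """Apply modify() to the latest data, retrying if another writer got in first."""
    for _ in range(retries):
        version, data = store.read(symbol)
        try:
            store.write_if_unchanged(symbol, version, modify(data))
            return
        except ConcurrentModification:
            continue  # someone else wrote; re-read and try again
    raise ConcurrentModification(symbol)


store = ToyStore()
store.head['counter'] = (1, 10)
read_modify_write(store, 'counter', lambda x: x + 1)
print(store.head['counter'])  # prints (2, 11)
```

In this toy the version check and the write are separate Python steps, so it is only safe single-threaded; a real backing store would need an atomic conditional update.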
1.15 (2015-11-25)
Feature: get_info API added to version_store.
1.14 (2015-11-25)
1.12 (2015-11-12)
Bugfix: correct version detection for Pandas >= 0.18.
Bugfix: retrying connection initialisation in case of an AutoReconnect failure.
1.11 (2015-10-29)
Bugfix: Improve performance of saving multi-index Pandas DataFrames by 9x
Bugfix: authenticate should propagate non-OperationFailure exceptions (e.g. ConnectionFailure) as this might be indicative of socket failures
Bugfix: return ‘deleted’ state in VersionStore.list_versions() so that callers can pick up on the head version being the delete-sentinel.
1.10 (2015-10-28)
Bugfix: VersionStore.read(date_range=…) could do the wrong thing with TimeZones (which aren’t yet supported for date_range slicing).
1.9 (2015-10-06)
Bugfix: fix authentication race condition when sharing an Arctic instance between multiple threads.
1.8 (2015-09-29)
Bugfix: compatibility with both 3.0 and pre-3.0 MongoDB for querying current authentications
1.7 (2015-09-18)
Feature: Add support for reading a subset of a pandas DataFrame in VersionStore.read by passing in an arctic.date.DateRange
Bugfix: Reauth against admin if not auth’d against a specific library’s DB. Sometimes we appear to miss admin DB auths; this is a workaround until we work out what the issue is.
1.6 (2015-09-16)
Feature: Add support for multi-index Bitemporal DataFrame storage. This allows persisting data and changes within the DataFrame making it easier to see how old data has been revised over time.
Bugfix: Ensure we call the error logging hook when exceptions occur
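Bitemporal storage keeps two timestamps per row: when a value applies and when it was recorded. The toy query below, written for illustration and not taken from Arctic, returns the data as it would have been seen at a given time. The `rows` layout and the `as_of_view` helper are hypothetical.

```python
# Toy bitemporal query (illustration only, not Arctic's implementation):
# every row records when a value applies (sample_dt) and when it was
# observed (observed_dt); an as-of read keeps, per sample_dt, the latest
# revision observed no later than the as-of time.
from datetime import date

rows = [
    # (sample_dt, observed_dt, value)
    (date(2015, 9, 1), date(2015, 9, 1), 1.0),
    (date(2015, 9, 2), date(2015, 9, 2), 2.0),
    (date(2015, 9, 1), date(2015, 9, 3), 1.1),  # later revision of 2015-09-01
]


def as_of_view(rows, as_of):
    """Return {sample_dt: value} as it would have been seen at as_of."""
    latest = {}  # sample_dt -> (observed_dt, value)
    for sample_dt, observed_dt, value in rows:
        if observed_dt > as_of:
            continue  # not yet known at as_of
        seen = latest.get(sample_dt)
        if seen is None or observed_dt >= seen[0]:
            latest[sample_dt] = (observed_dt, value)
    return {k: v for k, (_, v) in sorted(latest.items())}


print(as_of_view(rows, date(2015, 9, 2)))  # 2015-09-01 -> 1.0, 2015-09-02 -> 2.0
print(as_of_view(rows, date(2015, 9, 3)))  # 2015-09-01 revised to 1.1
```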
1.5 (2015-09-02)
Always use the primary cluster node for ‘has_symbol()’, as it’s safer
1.4 (2015-08-19)
Bugfixes for timezone handling: now ensures use of non-naive datetimes
Bugfix for tickstore read missing images
1.3 (2015-08-11)
Improvements to command-line control scripts for users and libraries
Bugfix for pickling top-level Arctic object
1.2 (2015-06-29)
Allow snapshotting a range of versions in the VersionStore, and snapshot all versions by default.
1.1 (2015-06-16)
Bugfix for backwards-compatible unpickling of bson-encoded data
Added switch for enabling parallel lz4 compression
1.0 (2015-06-14)
Initial public release
Download files
Source Distribution
Built Distribution
File details
Details for the file arctic-1.20.0.tar.gz.
File metadata
- Download URL: arctic-1.20.0.tar.gz
- Upload date:
- Size: 430.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 008ee07884f5d8e4233247740df29781966287156674d6a89ac2aab320bb79cd
MD5 | 7db6fe787202e5a7e0857b08e3567b16
BLAKE2b-256 | 757eecd7f97ba9263211601ffbacaa18f6a0991b40ddbeeaecb19a767f1e7fd2
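The digests above can be used to check a downloaded archive before installing it. This sketch uses Python's standard `hashlib` module; the helper names `sha256_of`, `verify` and `ARCTIC_1_20_SDIST_SHA256` are illustrative, and the expected value is the SHA256 digest listed for arctic-1.20.0.tar.gz.

```python
# Sketch: check a downloaded archive against the SHA256 digest listed above.
import hashlib

# SHA256 digest published for arctic-1.20.0.tar.gz (copied from the table above).
ARCTIC_1_20_SDIST_SHA256 = '008ee07884f5d8e4233247740df29781966287156674d6a89ac2aab320bb79cd'


def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify('arctic-1.20.0.tar.gz', ARCTIC_1_20_SDIST_SHA256)` returns True only if the file matches, assuming the archive was saved under that name in the current directory.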
File details
Details for the file arctic-1.20.0-py2.7-linux-x86_64.egg.
File metadata
- Download URL: arctic-1.20.0-py2.7-linux-x86_64.egg
- Upload date:
- Size: 368.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 05280f798e9c96afc6232c39b587cc6e2017db6e5509ef0d06bd6fa2fab4c093
MD5 | 8095cd0b4d84517c2e073404337d8bf9
BLAKE2b-256 | 3889318710081ad518d0a2ab7a28763d880dab1320a2eee900238680ccdf2272