# [![arctic](logo/arctic_50.png)](https://github.com/manahl/arctic) [Arctic TimeSeries and Tick store](https://github.com/manahl/arctic)


[![Travis CI](https://travis-ci.org/manahl/arctic.svg?branch=master)](https://travis-ci.org/manahl/arctic)
[![Coverage Status](https://coveralls.io/repos/github/manahl/arctic/badge.svg?branch=master)](https://coveralls.io/github/manahl/arctic?branch=master)
[![Code Health](https://landscape.io/github/manahl/arctic/master/landscape.svg?style=flat)](https://landscape.io/github/manahl/arctic/master)
[![Join the chat at https://gitter.im/manahl/arctic](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/manahl/arctic?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

Arctic is a high performance datastore for numeric data. It supports [Pandas](http://pandas.pydata.org/),
[numpy](http://www.numpy.org/) arrays and pickled objects out-of-the-box, with pluggable support for
other data types and optional versioning.

Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth,
~10x compression on disk, and scales to hundreds of millions of rows per second per
[MongoDB](https://www.mongodb.org/) instance.

Arctic has been under active development at [Man AHL](http://www.ahl.com/) since 2012.

## Quickstart

### Install Arctic

```
pip install git+https://github.com/manahl/arctic.git
```

### Run a MongoDB

```
mongod --dbpath <path/to/db_directory>
```

### Using VersionStore

```
from arctic import Arctic
import quandl

# Connect to the local MongoDB instance
store = Arctic('localhost')

# Create the library - defaults to VersionStore
store.initialize_library('NASDAQ')

# Access the library
library = store['NASDAQ']

# Load some data - maybe from Quandl
aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

# Store the data in the library
library.write('AAPL', aapl, metadata={'source': 'Quandl'})

# Reading the data
item = library.read('AAPL')
aapl = item.data
metadata = item.metadata
```

VersionStore supports much more: [See the HowTo](howtos/how_to_use_arctic.py)!


### Adding your own storage engine

Plugging a custom class in as a library type is straightforward. [This example
shows how.](howtos/how_to_custom_arctic_library.py)



## Concepts

### Libraries

Arctic provides namespaced *libraries* of data. These libraries allow
bucketing data by *source*, *user* or some other metric (for example frequency:
End-Of-Day; Minute Bars; etc.).

Arctic supports multiple data libraries per user. A user (or namespace)
maps to a MongoDB database (the granularity of mongo authentication). The library
itself is composed of a number of collections within the database. Libraries look like:

* user.EOD
* user.ONEMINUTE

A library is mapped to a Python class. All library databases in MongoDB are prefixed with 'arctic_'.
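
As a rough illustration of the naming scheme described above, the sketch below splits a library name such as `user.EOD` into a MongoDB database name and a library name. The helper `split_library_name` and the constant `DB_PREFIX` are hypothetical names chosen for this example; this is not Arctic's own code, and it assumes every name has the `<user>.<library>` form shown in the list above.

```python
# Illustrative only: a hypothetical helper, not part of the Arctic API.
DB_PREFIX = 'arctic_'  # the database prefix stated in the text above


def split_library_name(full_name):
    """Split '<user>.<library>' into (MongoDB database name, library name)."""
    user, sep, library = full_name.partition('.')
    if not sep or not user or not library:
        # Assumption for this sketch: only dotted names are handled.
        raise ValueError("expected '<user>.<library>', got %r" % (full_name,))
    return DB_PREFIX + user, library


for name in ('user.EOD', 'user.ONEMINUTE'):
    print(split_library_name(name))
```

Both example libraries resolve to the same database (`arctic_user`), which matches the idea that a user or namespace maps to one MongoDB database holding several libraries.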

### Storage Engines

Arctic includes three storage engines:

* [VersionStore](arctic/store/version_store.py): a key-value versioned TimeSeries store. It supports:
    * Pandas data types (other Python types pickled)
    * Multiple versions of each data item. Can easily read previous versions.
    * Create point-in-time snapshots across symbols in a library
    * Soft quota support
    * Hooks for persisting other data types
    * Audited writes: API for saving metadata and data before and after a write.
    * a wide range of TimeSeries data frequencies: End-Of-Day to Minute bars
    * [See the HowTo](howtos/how_to_use_arctic.py)
    * [Documentation](docs/versionstore.md)
* [TickStore](arctic/tickstore/tickstore.py): Column oriented tick database. Supports
  dynamic fields, chunks aren't versioned. Designed for large continuously ticking data.
* [Chunkstore](https://github.com/manahl/arctic/wiki/Chunkstore): A storage type that allows data to be stored in customizable chunk sizes. Chunks
  aren't versioned, and can be appended to and updated in place.
    * [Documentation](docs/chunkstore.md)

Arctic storage implementations are **pluggable**. VersionStore is the default.
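
To make the VersionStore ideas of "multiple versions of each data item" and "point-in-time snapshots across symbols" more concrete, here is a toy in-memory model. It is only a sketch of the concept: `ToyVersionStore` and its methods are hypothetical names for this example, nothing here talks to MongoDB, and it says nothing about how Arctic actually stores or compresses data.

```python
import copy


class ToyVersionStore(object):
    """In-memory stand-in that illustrates versioned writes and snapshots."""

    def __init__(self):
        self._versions = {}   # symbol -> list of (data, metadata); list index + 1 is the version
        self._snapshots = {}  # snapshot name -> {symbol: version at snapshot time}

    def write(self, symbol, data, metadata=None):
        """Store a new version of `symbol` and return its version number."""
        history = self._versions.setdefault(symbol, [])
        history.append((copy.deepcopy(data), copy.deepcopy(metadata)))
        return len(history)

    def snapshot(self, name):
        """Record the current version of every symbol under `name`."""
        self._snapshots[name] = dict(
            (symbol, len(history)) for symbol, history in self._versions.items())

    def read(self, symbol, as_of=None):
        """Read the latest version, a version number, or a named snapshot."""
        history = self._versions[symbol]
        if as_of is None:
            version = len(history)
        elif isinstance(as_of, int):
            version = as_of
        else:
            version = self._snapshots[as_of][symbol]
        if not 1 <= version <= len(history):
            raise KeyError('no version %r for symbol %r' % (version, symbol))
        data, metadata = history[version - 1]
        return {'version': version,
                'data': copy.deepcopy(data),
                'metadata': copy.deepcopy(metadata)}


store = ToyVersionStore()
store.write('AAPL', [1.0, 2.0], metadata={'source': 'demo'})
store.snapshot('eod')                  # freeze the current state of all symbols
store.write('AAPL', [1.0, 2.0, 3.0])   # a later write creates version 2

latest = store.read('AAPL')               # newest version
first = store.read('AAPL', as_of=1)       # explicit earlier version
snap = store.read('AAPL', as_of='eod')    # version captured by the snapshot
```

In this model a later write never overwrites earlier data, so both the explicit version number and the snapshot name still resolve to the first write.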


## Requirements

Arctic currently works with:

* Python 2.7, 3.4, 3.5, 3.6
* pymongo >= 3.6
* Pandas
* MongoDB >= 2.4.x


Operating Systems:
* Linux
* macOS
* Windows 10

## Acknowledgements

Arctic has been under active development at [Man AHL](http://www.ahl.com/) since 2012.

It wouldn't be possible without the work of the AHL Data Engineering Team including:

* [Richard Bounds](https://github.com/richardbounds)
* [James Blackburn](https://github.com/jamesblackburn)
* [Vlad Mereuta](https://github.com/vmereuta)
* [Tom Taylor](https://github.com/TomTaylorLondon)
* Tope Olukemi
* [Drake Siard](https://github.com/drakesiard)
* [Slavi Marinov](https://github.com/slavi)
* [Wilfred Hughes](https://github.com/wilfred)
* [Edward Easton](https://github.com/eeaston)
* [Bryant Moscon](https://github.com/bmoscon)
* [Dimosthenis Pediaditakis](https://github.com/dimosped)
* ... and many others ...

Contributions welcome!

## License

Arctic is licensed under the GNU LGPL v2.1, a copy of which is included in [LICENSE](LICENSE).

## Changelog

### 1.73
* Bugfix: #658 Write/append errors for Panel objects from older pandas versions
* Feature: #653 Add version meta-info in arctic module
* Feature: #663 Include arctic numerical version in the metadata of the version document
* Feature: #650 Implemented forward pointers for chunks in VersionStore (modes: enabled/disabled/hybrid)

### 1.72 (2018-11-06)
* Feature: #577 Added implementation for incremental serializer for numpy records
* Bugfix: #648 Fix issue with Timezone aware Pandas types, which don't contain hasobject attribute

### 1.71 (2018-11-05)
* Bugfix: #645 Fix write errors for Pandas DataFrame that has mixed object/string types in multi-index column

### 1.70 (2018-10-30)
* Bugfix: #157 Assure that serialized dataframes remain value-equivalent (e.g. avoid NaN --> 'nan' in mixed string columns)
* Bugfix: #608 Ensure Arctic performs well with MongoDB 3.6 (sorting)
* Bugfix: #629 Column kwarg no longer modified
* Bugfix: #641 DateRange.intersection open/closed range fix
* Feature: #493 Can pass kwargs when calling MongoClient, e.g. for ssl
* Feature: #590 Faster write handler selection for DataFrames with objects
* Feature: #604 Improved handling for pickling serialization decisions


### 1.69 (2018-09-12)
* Docs: VersionStore documentation
* Bugfix: Issue #612 ThreadPool should be created by process using it
* Feature: Upsert option on appends in ChunkStore

### 1.68 (2018-08-17)
* Feature: #553 Compatibility with both the new and old LZ4 API
* Feature: #571 Removed the Cython LZ4 code, use the latest python-lz4
* Feature: #557 Threadpool based compression. Speed improvement and tuning benchmarks.
* Bugfix: fix tickstore unicode handling, support both unicode and utf-8 arrays
* Bugfix: #591 Fix tickstore reads not returning index with localized timezone
* Feature: #595 add host attribute to VersionedItem.
* Bugfix: #594 Enable sharding on chunkstore

### 1.67.1 (2018-07-11)
* Bugfix: #579 Fix symbol corruption due to restore_version and append
* Bugfix: #584 Fix list_versions for a snapshot after deleting symbols in later versions

### 1.67 (2018-05-24)
* Bugfix: #561 Fix PickleStore read corruption after write_metadata

### 1.66 (2018-05-21)
* Bugfix: #168 Do not allow empty string as a column name
* Bugfix: #483 Remove potential floating point error from datetime_to_ms
* Bugfix: #271 Log when library doesn't exist on delete
* Feature: MetaDataStore: added list_symbols with regexp, as_of and metadata fields matching filters
* Feature: Support for serialization of DataFrames in Pandas 0.23.x

### 1.65 (2018-04-16)
* Bugfix: #534 VersionStore: overwriting a symbol with different dtype (but same data format) does not
raise exceptions anymore
* Bugfix: #531 arctic_prune_versions: clean broken snapshot references before pruning
* Bugfix: setup.py develop in a conda environment on Mac
* Feature: #490 add support to numpy 1.14

### 1.63 (2018-04-06)
* Bugfix: #521 Clang 6.0 compiler support on macOS
* Feature: #510 VersionStore: support multi column in pandas DataFrames

### 1.62 (2018-3-14)
* Bugfix: #517 VersionStore: append does not duplicate data in certain corner cases
* Bugfix: #519 VersionStore: list_symbols speed improvement and fix for memory limit exceed

### 1.61 (2018-3-2)
* Feature: #288 Mapping reads and writes over chunks in chunkstore
* Bugfix: #508 VersionStore: list_symbols and read now always returns latest version
* Bugfix: #512 Improved performance for list_versions
* Bugfix: #515 VersionStore: _prune_previous_versions now retries the cleanup operation

### 1.60 (2018-2-13)
* Bugfix: #503 ChunkStore: speedup check for -1 segments
* Feature: #504 Increasing number of libraries in Arctic to 5000.

### 1.59 (2018-2-6)
* Bugfix: Increase performance of invalid segment check in chunkstore
* Bugfix: #501 Fix the spurious data integrity exceptions at write path, due to moving chunks from the balancer

### 1.58 (2018-1-15)
* Bugfix: #491 roll back the use of frombuffer to fromstring, fixes the read-only ndarray issue

### 1.57 (2018-1-11)
* Feature: #206 String support for tickstore
* Bugfix: #486 improve mongo_retry robustness with failures for version store write/append

### 1.56 (2017-12-21)
* Bugfix: #468 Re-adding compatibility for pandas 0.20.x
* Bugfix: #476 Ensure we re-auth when a new MongoClient is created after fork

### 1.55 (2017-12-14)
* Bugfix: #439 fix cursor timeouts in chunkstore iterator
* Bugfix: #450 fix error in chunkstore delete when chunk range produces empty df
* Bugfix: #442 fix incorrect segment values in multi segment chunks in chunkstore
* Feature: #457 enhances fix for #442 via segment_id_repair tool
* Bugfix: #385 exceptions during quota statistics no longer kill a write
* Feature: PR#161 TickStore.max_date now returns a datetime in the 'local' timezone
* Feature: #425 user defined metadata for tickstore
* Feature: #464 performance improvement by avoiding unnecessary re-authentication
* Bugfix: #250 Added multiprocessing safety, check for initialized MongoClient after fork.
* Feature: #465 Added fast operations for write only metadata and restore symbol to a version

### 1.54 (2017-10-18)
* Bugfix: #440 Fix read empty MultiIndex+tz Series

### 1.53 (2017-10-06)
* Perf: #408 Improve memory performance of version store's serializer
* Bugfix: #394 Multi symbol read in chunkstore
* Bugfix: #407 Fix segment issue on appends in chunkstore
* Bugfix: Inconsistent returns on MetadataStore.append
* Bugfix: #412 pandas deprecation and #289 improve exception report in numpy record serializer
* Bugfix: #420 chunkstore ignoring open interval date ranges
* Bugfix: #427 chunkstore metadata not being correctly replaced during symbol overwrite
* Bugfix: #431 chunkstore iterators do not handle multi segment chunks correctly

### 1.51 (2017-08-21)
* Bugfix: #397 Remove calls to deprecated methods in pymongo
* Bugfix: #402 Append to empty DF fails in VersionStore

### 1.50 (2017-08-18)
* Feature: #396 MetadataStore.read now supports as_of argument
* Bugfix: #397 Pin pymongo==3.4.0

### 1.49 (2017-08-02)
* Feature: #392 MetadataStore
* Bugfix: #384 sentinels missing time data on chunk start/ends in ChunkStore
* Bugfix: #382 Remove dependency on cython being pre-installed
* Bugfix: #343 Renaming libraries/collections within a namespace/database

### 1.48 (2017-06-26)
* BugFix: Rollback #363, as it breaks multi-index dataframe
* Bugfix: #372 OSX build improvements

### 1.47 (2017-06-19)
* Feature: Re-introduce #363 `concat` flag, essentially undo-ing 1.45
* BugFix: #377 Fix broken `replace_one` on BSONStore and add `bulk_write`

### 1.46 (2017-06-13)
* Feature: #374 Shard BSONStore on `_id` rather than `symbol`

### 1.45 (2017-06-09)
* BugFix: Rollback #363, which can cause ordering issues on append

### 1.44 (2017-06-08)
* Feature: #364 Expose compressHC from internal arctic LZ4 and remove external LZ4 dependency
* Feature: #363 Appending older data (compared to what already exists in the library) will raise. Use `concat=True` to append only the
new bits
* Feature: #371 Expose more functionality in BSONStore

### 1.43 (2017-05-30)
* Bugfix: #350 remove deprecated pandas calls
* Bugfix: #360 version incorrect in empty append in VersionStore
* Feature: #365 add generic BSON store

### 1.42 (2017-05-12)
* Bugfix: #346 fixed daterange subsetting error on very large dataframes in version store
* Bugfix: #351 $size queries can't use indexes, use alternative queries

### 1.41 (2017-04-20)
* Bugfix: #334 Chunk range param with pandas object fails in chunkstore.get_chunk_ranges
* Bugfix: #339 Depending on lz4<=0.8.2 to fix build errors
* Bugfix: #342 fixed compilation errors on Mac OSX
* Bugfix: #344 fixed data corruption problem with concurrent appends

### 1.40 (2017-03-03)
* BugFix: #330 Make Arctic._lock reentrant

### 1.39 (2017-03-03)
* Feature: #329 Add reset() method to Arctic

### 1.38 (2017-02-22)
* Bugfix: #324 Datetime indexes must be sorted in chunkstore
* Feature: #290 improve performance of tickstore column reads

### 1.37 (2017-1-31)
* Bugfix: #300 to_datetime deprecated in pandas, use to_pydatetime instead
* Bugfix: #309 formatting change for DateRange ```__str__```
* Feature: #313 set and read user specified metadata in chunkstore
* Feature: #319 Audit log support in ChunkStore
* Bugfix: #216 Tickstore write fails with named index column


### 1.36 (2016-12-13)

* Feature: Default to hashed based sharding
* Bugfix: retry socket errors during VersionStore snapshot operations

### 1.35 (2016-11-29)

* Bugfix: #296 Cannot compress/decompress empty string

### 1.34 (2016-11-29)

* Feature: #294 Move per-chunk metadata for chunkstore to a separate collection
* Bugfix: #292 Account for metadata size during size chunking in ChunkStore
* Feature: #283 Support for all pandas frequency strings in ChunkStore DateChunker
* Feature: #286 Add has_symbol to ChunkStore and support for partial symbol matching in list_symbols

### 1.33 (2016-11-07)

* Feature: #275 Tuple range object support in DateChunker
* Bugfix: #273 Duplicate columns breaking serializer
* Feature: #267 Tickstore.delete returns deleted data
* Dependency: #266 Remove pytest-dbfixtures in favor of pytest-server-fixtures

### 1.32 (2016-10-25)

* Feature: #260 quota support on Chunkstore
* Bugfix: #259 prevent write of unnamed columns/indexes
* Bugfix: #252 pandas 0.19.0 compatibility fixes
* Bugfix: #249 open ended range reads on data without index fail
* Bugfix: #262 VersionStore.append must check data is written correctly during repack
* Bugfix: #263 Quota: Improve the error message when near soft-quota limit
* Perf: #265 VersionStore.write / append don't aggressively add indexes on each write

### 1.31 (2016-09-29)

* Bugfix: #247 segmentation read fix in chunkstore
* Feature: #243 add get_library_type method
* Bugfix: more cython changes to handle LZ4 errors properly
* Feature: #239 improve chunkstore's get_info method

### 1.30 (2016-09-26)

* Feature: #235 method to return chunk ranges on a symbol in ChunkStore
* Feature: #234 Iterator access to ChunkStore
* Bugfix: #236 Cython not handling errors from LZ4 function calls

### 1.29 (2016-09-20)

* Bugfix: #228 Mongo fail-over during append can leave a Version in an inconsistent state
* Feature: #193 Support for different Chunkers and Serializers by symbol in ChunkStore
* Feature: #220 Raise exception if older version of arctic attempts to read unsupported pickled data
* Feature: #219 and #220 Support for pickling large data (>2GB)
* Feature: #204 Add support for library renaming
* Feature: #209 Upsert capability in ChunkStore's update method
* Feature: #207 Support DatetimeIndexes in DateRange chunker
* Bugfix: #232 Don't raise during VersionStore #append(...) if the previous append failed

### 1.28 (2016-08-16)

* Bugfix: #195 Top level tickstore write with list of dicts now works with timezone aware datetimes

### 1.27 (2016-08-05)

* Bugfix: #187 Compatibility with latest version of pytest-dbfixtures
* Feature: #182 Improve ChunkStore read/write performance
* Feature: #162 Rename API for ChunkStore
* Feature: #186 chunk_range on update
* Bugfix: #189 range delete does not update symbol metadata

### 1.26 (2016-07-20)

* Bugfix: Faster TickStore querying for multiple symbols simultaneously
* Bugfix: TickStore.read now respects `allow_secondary=True`
* Bugfix: #147 Add get_info method to ChunkStore
* Bugfix: Periodically re-cache the library.quota to pick up any changes
* Bugfix: #166 Add index on SHA for ChunkStore
* Bugfix: #169 Dtype mismatch in chunkstore updates
* Feature: #171 allow deleting of values within a date range in ChunkStore
* Bugfix: #172 Fix date range bug when querying dates in the middle of chunks
* Bugfix: #176 Fix overwrite failures in Chunkstore
* Bugfix: #178 - Change how start/end dates are populated in the DB, also fix append so it works as expected.
* Bugfix: #43 - Remove dependency on hardcoded Linux timezone files

### 1.25 (2016-05-23)

* Bugfix: Ensure that Tickstore.write doesn't allow out of order messages
* Bugfix: VersionStore.write now allows writing 'None' as a value

### 1.24 (2016-05-10)

* Bugfix: Backwards compatibility reading/writing documents with previous versions of Arctic

### 1.22 (2016-05-09)

* Bugfix: #109 Ensure stable sort during Arctic read
* Feature: New benchmark suite using ASV
* Bugfix: #129 Fixed an issue where some chunks could get skipped during a multiple-symbol TickStore read
* Bugfix: #135 Fix issue with different datatype returned from pymongo in python3
* Feature: #130 New Chunkstore storage type

### 1.21 (2016-03-08)

* Bugfix: #106 Fix Pandas Panel storage for panels with different dimensions

### 1.20 (2016-02-03)

* Feature: #98 Add initial_image as optional parameter on tickstore write()
* Bugfix: #100 Write error on end field when writing with pandas dataframes

### 1.19 (2016-01-29)

* Feature: Add python 3.3/3.4 support
* Bugfix: #95 Fix raising NoDataFoundException across multiple low level libraries

### 1.18 (2016-01-05)

* Bugfix: #81 Fix broken read of multi-index DataFrame written by old version of Arctic
* Bugfix: #49 Fix stringifying tickstore

### 1.17 (2015-12-24)

* Feature: Add timezone support to store multi-index dataframes
* Bugfix: Fixed broken sdist releases

### 1.16 (2015-12-15)

* Feature: ArcticTransaction now supports non-audited 'transactions': `audit=False`
```
with ArcticTransaction(Arctic('hostname')['some_library'], 'symbol', audit=False) as at:
    ...
```
This is useful for batch jobs which read-modify-write and don't want to clash with
concurrent writers, and which don't require keeping all versions of a symbol.

### 1.15 (2015-11-25)

* Feature: get_info API added to version_store.

### 1.14 (2015-11-25)
### 1.12 (2015-11-12)

* Bugfix: correct version detection for Pandas >= 0.18.
* Bugfix: retrying connection initialisation in case of an AutoReconnect failure.

### 1.11 (2015-10-29)

* Bugfix: Improve performance of saving multi-index Pandas DataFrames
by 9x
* Bugfix: authenticate should propagate non-OperationFailure exceptions
(e.g. ConnectionFailure) as this might be indicative of socket failures
* Bugfix: return 'deleted' state in VersionStore.list_versions() so that
callers can pick up on the head version being the delete-sentinel.

### 1.10 (2015-10-28)

* Bugfix: VersionStore.read(date_range=...) could do the wrong thing with
TimeZones (which aren't yet supported for date_range slicing).

### 1.9 (2015-10-06)

* Bugfix: fix authentication race condition when sharing an Arctic
instance between multiple threads.

### 1.8 (2015-09-29)

* Bugfix: compatibility with both 3.0 and pre-3.0 MongoDB for
querying current authentications

### 1.7 (2015-09-18)

* Feature: Add support for reading a subset of a pandas DataFrame
in VersionStore.read by passing in an arctic.date.DateRange
* Bugfix: Reauth against admin if not auth'd against a
specific library's DB. Sometimes we appear to miss admin DB auths.
This is to work around that until we work out what the issue is.

### 1.6 (2015-09-16)

* Feature: Add support for multi-index Bitemporal DataFrame storage.
This allows persisting data and changes within the DataFrame making it
easier to see how old data has been revised over time.
* Bugfix: Ensure we call the error logging hook when exceptions occur

### 1.5 (2015-09-02)

* Always use the primary cluster node for 'has_symbol()', it's safer

### 1.4 (2015-08-19)

* Bugfixes for timezone handling, now ensures use of non-naive datetimes
* Bugfix for tickstore read missing images

### 1.3 (2015-08-11)

* Improvements to command-line control scripts for users and libraries
* Bugfix for pickling top-level Arctic object

### 1.2 (2015-06-29)

* Allow snapshotting a range of versions in the VersionStore, and
snapshot all versions by default.

### 1.1 (2015-06-16)

* Bugfix for backwards-compatible unpickling of bson-encoded data
* Added switch for enabling parallel lz4 compression

### 1.0 (2015-06-14)

* Initial public release
