Simple library to manage local file storage within an application based on the concept of local partition and sort keys
Project description
ColumnFile : column oriented file storage
ColumnFile is a simple column oriented file storage solution for analyzing all kinds of data (csv, json, ...) using local files.
This solution allows efficient CRUD operations by key. Details concerning the implementation are available here : see implementation details. It produces a file organization allowing fast access to data as follow :
Documentation
Create and open operations
create(dbname, schema)
: creates the storage architecture and sets the db schema. The schema parameter must be a python dict containing two attributes : 'hash' and 'sort'. Both are lists of couples '(attribute_name, attribute_types)'. Allowed types are in ['string', 'int', 'float'].open(dbname)
: opens an existing database
Reading operations
get(key, report_error = False)
: returns the item associated to the key passed as parameter. If the 'report_error' parameter is set to False, empty objetcs are returned when no item matches the key. If True, the function raises an exception when nothin is found.scan(sub_key, row_filter = lambda _: True)
: returns an iterator on all the values matching the sub_key. Values are sorted in ascending order with respected with the sort key. No particular order is imposed for hash keys. The row_filter function which can be passed as parameter allows you to filter the iterated values. Note that even if a filter is provided, all values are scanned. However, only those fullfilling the filter will be returned.
Writing operations
put(key, column_values)
: the put operation can be use to update a row by replacing all its columns by those passed in the column_values parameter. If no row matches the given key, a new row is created.merge(key, column_values)
: the merge operation can be use to update a row by merging the existing columns with the columns passed as parameter in column_values. If no row matches the given key, a new row is created.delete(key)
: deletes a rowcommit()
: commits the changes to the database. Note that if no commit is performed after write operations, the database will not take them into account.
Example codes
from ColumnFile import ColumnFile
from random import randint
import numpy as np
db = ColumnFile(verbose = True)
# create the database
db.create("demo-db", {
"hash": [ ("year", "integer") ],
"sort": [ ("month", "integer"), ("day", "integer") ]
})
# open the existing database
db.open("demo-db")
# let's add some data
for year in range(2012, 2018):
for month in range(1, 13):
for day in range(1, 32):
operation_type = ['sell', 'buy'][randint(0,1)]
amount = randint(1, 500)
db.put((year, month, day), { "operation_type": operation_type, "amount": amount })
# change a specific column value
db.merge((2016, 6, 23), { "operation_amount": 250 })
# do NOT forget to commit
db.commit()
# retrieve one specific row and print its operation type
key, values = db.get((2016, 6, 23))
print(values['operation_amount'])
# compute average sell amount in 2016
results = db.scan(2016, row_filter = lambda row: row[1]['operation_type'] == 'sell')
sum_amount, n = sum(map(lambda row : np.array([row[1]['amount'], 1]), results))
average = sum_amount / n
print('Average = ', average)
Example projects
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
ColumnFile-0.0.3.tar.gz
(7.8 kB
view hashes)
Built Distribution
Close
Hashes for ColumnFile-0.0.3-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d979756f02a70a0690802568134cad2a57a871b1f85eb38f68453afc37331aa0 |
|
MD5 | ff1ecb2e3b8e3a772ad22e0e311994ba |
|
BLAKE2b-256 | 4114803ff1bbdeb05f59b8f6229aaaf08cc1bcc609dc3cc9f28efb9807c44930 |