Skip to main content

Cython wrapper for tokyo cabinet table

Project description

About

Pythonic access to tokyo cabinet table database api. (NOTE: The original cython code was from pykesto.) The aims is to provide a simple syntax to load and query data in a table. Most of the work is handled by the Col query interface. e.g.

>>> from totable import ToTable, Col
>>> tbl = ToTable('t.tct', 'w')
>>> result = tbl.select(Col('age') > 18, Col('name').startswith('T'))

to allow querying columns with numbers and letters transparently. Even though tokyo cabinet stores all values as strings. And more syntatic sugar below.

Install

first, install Tokyo-Cabinet source, then, from a the directory containing this file:

# requires cython for now.
$ cython src/ctotable.pyx
$ python setup.py build_ext -i

# test
$ PYTHONPATH=. python totable/tests/test_totable.py

# install
$ sudo python setup.py install

Example Use

Make some fake data. Note it works just like a DBM or dictionary, except that the values themselves are dictionaries.

>>> from totable import ToTable, Col
>>> tbl = ToTable('doctest.tct', 'w')
>>> fnames = ['fred', 'jane', 'john', 'mark', 'bill', 'ted', 'ann']
>>> lnames = ['smith', 'cox', 'kit', 'ttt', 'zzz', 'ark', 'ddk']
>>> for i in range(len(fnames)):
...     tbl[str(i)] = {'fname': fnames[i], 'lname': lnames[i],
...                    'age': str((10 + i) * 2)}
...     tbl[str(i + len(fnames))] = {'fname': fnames[i],
...                                  'lname': lnames[len(lnames) - i - 1],
...                                   'age': str((30 + i) * 2)}

>>> len(tbl)
14

Col

Col, as sent to the select method makes it easy to do queries on a database the format is Col(colname) == ‘Fred’ where colname is one of the keys in the dictionary items in the database. or can use kwargs to select()

>>> tbl.select(lname='cox')
[('1', {'lname': 'cox', 'age': '22', 'fname': 'jane'}), ('12', {'lname': 'cox', 'age': '70', 'fname': 'ted'})]

though using Col gives more power

startswith

>>> results = tbl.select(Col('fname').startswith('j'))
>>> [d['fname'] + ' ' + d['lname'] for k, d in results]
['jane cox', 'jane ark', 'john kit', 'john zzz']

endswith

#and combine queries by sending them in together.
>>> results = tbl.select(Col('fname').startswith('j'), Col('lname').endswith('k'))
>>> [d['fname'] + ' ' + d['lname'] for k, d in results]
['jane ark']

like

this works like an sql query with ‘%’ on either end. (dont attach those values to the query!). so to get everyone with and ‘e’ in their firstname…

>>> r = tbl.select(Col('fname').like('e'))
>>> sorted(set([v['fname'] for k, v in r]))
['fred', 'jane', 'ted']

in_list

return row that exactly match 1 of the values in the list.

>>> r = tbl.select(Col('fname').in_list(['ted', 'fred']))
>>> sorted(set([v['fname'] for k, v in r]))
['fred', 'ted']

>>> r = tbl.select(Col('age').in_list([20, 70]))
>>> sorted(set([v['age'] for k, v in r]))
['20', '70']

between

use for number querying between a min and max. includes the endpoints.

>>> r = tbl.select(Col('age').between(68, 70))
>>> [v['age'] for k, v in r]
['68', '70']

numeric queries (richcmp)

in TC, everything is stored as strings, but you can force number based comparisons with ToTable by using (you guessed it) a number. Or using a string for non-numeric comparisons.

>>> results = tbl.select(Col('age') > 68)
>>> [d['age'] for k, d in results]
['70', '72']

combining queries

just add multiple Col() arguments to the select() call and they will be essentially and’ed together.

>>> results = tbl.select(Col('age') > 68, Col('age') < 72)
>>> [d['age'] for k, d in results]
['70']

Negate(~)

for example get everything that’s not a given value…

>>> results = tbl.select(~Col('age') <= 68)
>>> [d['age'] for k, d in results]
['70', '72']

#all rows where fname is not 'jane'
>>> results = tbl.select(~Col('fname') != 'jane')
>>> 'jane' in [d['fname'] for k, d in results]
False

Regular Expression Matching

supports normal regular expression characters “[ $ ^ | “ , etc.

>>> results = tbl.select(Col('fname').matches("a"))
>>> sorted(set([d['fname'] for k, d in results]))
['ann', 'jane', 'mark']

>>> results = tbl.select(Col('fname').matches("^a"))
>>> sorted(set([d['fname'] for k, d in results]))
['ann']

Offset/Limit

just like SQL, yo.

>>> results = tbl.select(Col('age') < 68, limit=1)
>>> len(results)
1

order

currently only works for string keys. use ‘-’ for descending and ‘+’ for ascending

>>> [v['fname'] for k, v in tbl.select(lname='cox', order='-fname')]
['ted', 'jane']

# ascending
>>> [v['fname'] for k, v in tbl.select(lname='cox', order='+fname')]
['jane', 'ted']

values

TC is a key-value store, but it also acts as a table. it may be convenient to get just the values as you’d expect from a database table. Note in all examples above, the ‘k’ey is not used, only the value dictionary. This can be made simpler with ‘values_only’. When ‘values_only’ is True, some python call overhead is removed as well.

::
>>> tbl.select(Col('fname').matches("^a"), values_only=True)
[{'lname': 'ddk', 'age': '32', 'fname': 'ann'}, {'lname': 'smith', 'age': '72', 'fname': 'ann'}]

Schemaless

since it’s schemaless, you can add anything

>>> tbl['weird'] = {"val": "hello"}
>>> tbl['weird']
{'val': 'hello'}

delete

delete as expected for a dictionary interface.

>>> del tbl['weird']
>>> print tbl.get('weird')
None

put

encapsulates put, putkeep and putcat with a mode kwarg that takes ‘p’ or ‘k’ or ‘c’ respectively.

>>> tbl.put('a', {'a': '1'}, mode='p')
>>> tbl.put('a', {'a': '2'}, mode='k')
'keep'
>>> assert tbl['a'] == {'a': '1'}

>>> tbl.put('b', {'a': '3'}, mode='k')
'put'

>>> tbl.put('a', {'b': '99'}, 'c')
>>> assert tbl['a'] == {'a': '1', 'b': '99'}

Performance Tuning

Tokyo Cabinet allows you to tune or optimize a table. the available parameters are:

  • bnum specifies the number of elements of the bucket array. Suggested size of ‘bnum’ is about from 0.5 to 4 times of the number of all records to be stored. default is about 132K.

  • apow specifies the size of record alignment by power of 2. The default value is 4 standing for 2^4=16.

  • fpow specifies the maximum number of elements of the free block pool by power of 2. The default value is 10 standing for 2^10=1024.

  • opts specifies options by bitwise-or (|):

    • ‘TDBTLARGE’ must be specified to use a database larger than 2GB. (you must also specify a config flag when compiling the TC library to enable this)

    • ‘TDBTDEFLATE’ use Deflate encoding.

    • ‘TDBTBZIP’ use BZIP2 encoding.

    • ‘TDBTTCBS’ use TCBS encoding.

The other parameters: cache and mmap_size are explained below.

tune

The arguments can be sent to the constructor.

>>> import totable
>>> t = ToTable("some.tct", 'w', bnum=1234, fpow=6, \
...                    opts=totable.TDBTLARGE | totable.TDBTBZIP)

>>> t.close()

optimize

optimize is called on an database opened with mode=’w’. if no arguments are specified, it will automatically adjust ‘bnum’ (only) according to the number of elements in the table.

>>> t = ToTable("some.tct", 'w')

# ... add some records ...
>>> t.optimize()
True

mmap_size

mmap_size is the size of mapped memory. default is 67,108,864 (64MB) set in the constructor. this is xmsiz in TC parlance.

>>> t.close()
>>> t = ToTable("some.tct", 'w', mmap_size=128 * 1e6) # ~128MB.

cache

TC also allows setting various caching parameters. * rcnum is the max number of records to be cached. default is 0 * lcnum is the max number of leaf-nodes to be cached. default is 4096 * ncnum is the max number of non-leaf nodes cached. default is 512 these also must be set in the constructor.

>>> t.close()
>>> t = ToTable("some.tct", 'w', rcnum=1e7, lcnum=32768)

index

create or delete a ‘s’tring or ‘d’ecimal index on a column for faster queries.

# create a decimal index on the number column 'age'.
>>> tbl.create_index('age', 'd')
True

# create a 'string index on the string column 'fname'.
>>> tbl.create_index('fname', 's')
True

# remove the index.
>>> tbl.delete_index('fname')
True

# optimize the index
>>> tbl.optimize_index('age')
True

clear

remove all records from the db.

>>> len(tbl)
16
>>> tbl.clear()
>>> len(tbl)
0

transaction

do stuff in a transaction. a rollback() is performed on any exceptions.

>>> try:
...     with transaction(tbl):
...         tbl['zzz'] = {'a': '4'}
...         1/0
... except: pass

>>> 'zzz' in tbl
False

See Also

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

totable-0.1.tar.gz (59.1 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page