Skip to main content
This is a pre-production deployment of Warehouse. Changes made here affect the production instance of PyPI (pypi.python.org).
Help us improve Python packaging - Donate today!

Cython wrapper for tokyo cabinet table

Project Description

About

Pythonic access to tokyo cabinet table database api. (NOTE: The original cython code was from pykesto.) The aims is to provide a simple syntax to load and query data in a table. Most of the work is handled by the Col query interface. e.g.

>>> from totable import ToTable, Col
>>> tbl = ToTable('t.tct', 'w')
>>> result = tbl.select(Col('age') > 18, Col('name').startswith('T'))

to allow querying columns with numbers and letters transparently. Even though tokyo cabinet stores all values as strings. And more syntatic sugar below.

Install

first, install Tokyo-Cabinet source, then, from a the directory containing this file:

# requires cython for now.
$ cython src/ctotable.pyx
$ python setup.py build_ext -i

# test
$ PYTHONPATH=. python totable/tests/test_totable.py

# install
$ sudo python setup.py install

Example Use

Make some fake data. Note it works just like a DBM or dictionary, except that the values themselves are dictionaries.

>>> from totable import ToTable, Col
>>> tbl = ToTable('doctest.tct', 'w')
>>> fnames = ['fred', 'jane', 'john', 'mark', 'bill', 'ted', 'ann']
>>> lnames = ['smith', 'cox', 'kit', 'ttt', 'zzz', 'ark', 'ddk']
>>> for i in range(len(fnames)):
...     tbl[str(i)] = {'fname': fnames[i], 'lname': lnames[i],
...                    'age': str((10 + i) * 2)}
...     tbl[str(i + len(fnames))] = {'fname': fnames[i],
...                                  'lname': lnames[len(lnames) - i - 1],
...                                   'age': str((30 + i) * 2)}

>>> len(tbl)
14

Col

Col, as sent to the select method makes it easy to do queries on a database the format is Col(colname) == ‘Fred’ where colname is one of the keys in the dictionary items in the database. or can use kwargs to select()

>>> tbl.select(lname='cox')
[('1', {'lname': 'cox', 'age': '22', 'fname': 'jane'}), ('12', {'lname': 'cox', 'age': '70', 'fname': 'ted'})]

though using Col gives more power

startswith

>>> results = tbl.select(Col('fname').startswith('j'))
>>> [d['fname'] + ' ' + d['lname'] for k, d in results]
['jane cox', 'jane ark', 'john kit', 'john zzz']

endswith

#and combine queries by sending them in together.
>>> results = tbl.select(Col('fname').startswith('j'), Col('lname').endswith('k'))
>>> [d['fname'] + ' ' + d['lname'] for k, d in results]
['jane ark']

like

this works like an sql query with ‘%’ on either end. (dont attach those values to the query!). so to get everyone with and ‘e’ in their firstname…

>>> r = tbl.select(Col('fname').like('e'))
>>> sorted(set([v['fname'] for k, v in r]))
['fred', 'jane', 'ted']

in_list

return row that exactly match 1 of the values in the list.

>>> r = tbl.select(Col('fname').in_list(['ted', 'fred']))
>>> sorted(set([v['fname'] for k, v in r]))
['fred', 'ted']

>>> r = tbl.select(Col('age').in_list([20, 70]))
>>> sorted(set([v['age'] for k, v in r]))
['20', '70']

between

use for number querying between a min and max. includes the endpoints.

>>> r = tbl.select(Col('age').between(68, 70))
>>> [v['age'] for k, v in r]
['68', '70']

numeric queries (richcmp)

in TC, everything is stored as strings, but you can force number based comparisons with ToTable by using (you guessed it) a number. Or using a string for non-numeric comparisons.

>>> results = tbl.select(Col('age') > 68)
>>> [d['age'] for k, d in results]
['70', '72']

combining queries

just add multiple Col() arguments to the select() call and they will be essentially and’ed together.

>>> results = tbl.select(Col('age') > 68, Col('age') < 72)
>>> [d['age'] for k, d in results]
['70']

Negate(~)

for example get everything that’s not a given value…

>>> results = tbl.select(~Col('age') <= 68)
>>> [d['age'] for k, d in results]
['70', '72']

#all rows where fname is not 'jane'
>>> results = tbl.select(~Col('fname') != 'jane')
>>> 'jane' in [d['fname'] for k, d in results]
False

Regular Expression Matching

supports normal regular expression characters “[ $ ^ | ” , etc.

>>> results = tbl.select(Col('fname').matches("a"))
>>> sorted(set([d['fname'] for k, d in results]))
['ann', 'jane', 'mark']

>>> results = tbl.select(Col('fname').matches("^a"))
>>> sorted(set([d['fname'] for k, d in results]))
['ann']

Offset/Limit

just like SQL, yo.

>>> results = tbl.select(Col('age') < 68, limit=1)
>>> len(results)
1

order

currently only works for string keys. use ‘-‘ for descending and ‘+’ for ascending

>>> [v['fname'] for k, v in tbl.select(lname='cox', order='-fname')]
['ted', 'jane']

# ascending
>>> [v['fname'] for k, v in tbl.select(lname='cox', order='+fname')]
['jane', 'ted']

values

TC is a key-value store, but it also acts as a table. it may be convenient to get just the values as you’d expect from a database table. Note in all examples above, the ‘k’ey is not used, only the value dictionary. This can be made simpler with ‘values_only’. When ‘values_only’ is True, some python call overhead is removed as well.

::
>>> tbl.select(Col('fname').matches("^a"), values_only=True)
[{'lname': 'ddk', 'age': '32', 'fname': 'ann'}, {'lname': 'smith', 'age': '72', 'fname': 'ann'}]

Schemaless

since it’s schemaless, you can add anything

>>> tbl['weird'] = {"val": "hello"}
>>> tbl['weird']
{'val': 'hello'}

delete

delete as expected for a dictionary interface.

>>> del tbl['weird']
>>> print tbl.get('weird')
None

put

encapsulates put, putkeep and putcat with a mode kwarg that takes ‘p’ or ‘k’ or ‘c’ respectively.

>>> tbl.put('a', {'a': '1'}, mode='p')
>>> tbl.put('a', {'a': '2'}, mode='k')
'keep'
>>> assert tbl['a'] == {'a': '1'}

>>> tbl.put('b', {'a': '3'}, mode='k')
'put'

>>> tbl.put('a', {'b': '99'}, 'c')
>>> assert tbl['a'] == {'a': '1', 'b': '99'}

Performance Tuning

Tokyo Cabinet allows you to tune or optimize a table. the available parameters are:

  • bnum specifies the number of elements of the bucket array. Suggested size of ‘bnum’ is about from 0.5 to 4 times of the number of all records to be stored. default is about 132K.
  • apow specifies the size of record alignment by power of 2. The default value is 4 standing for 2^4=16.
  • fpow specifies the maximum number of elements of the free block pool by power of 2. The default value is 10 standing for 2^10=1024.
  • opts specifies options by bitwise-or (|):
    • ‘TDBTLARGE’ must be specified to use a database larger than 2GB. (you must also specify a config flag when compiling the TC library to enable this)
    • ‘TDBTDEFLATE’ use Deflate encoding.
    • ‘TDBTBZIP’ use BZIP2 encoding.
    • ‘TDBTTCBS’ use TCBS encoding.

The other parameters: cache and mmap_size are explained below.

tune

The arguments can be sent to the constructor.

>>> import totable
>>> t = ToTable("some.tct", 'w', bnum=1234, fpow=6, \
...                    opts=totable.TDBTLARGE | totable.TDBTBZIP)

>>> t.close()

optimize

optimize is called on an database opened with mode=’w’. if no arguments are specified, it will automatically adjust ‘bnum’ (only) according to the number of elements in the table.

>>> t = ToTable("some.tct", 'w')

# ... add some records ...
>>> t.optimize()
True

mmap_size

mmap_size is the size of mapped memory. default is 67,108,864 (64MB) set in the constructor. this is xmsiz in TC parlance.

>>> t.close()
>>> t = ToTable("some.tct", 'w', mmap_size=128 * 1e6) # ~128MB.

cache

TC also allows setting various caching parameters. * rcnum is the max number of records to be cached. default is 0 * lcnum is the max number of leaf-nodes to be cached. default is 4096 * ncnum is the max number of non-leaf nodes cached. default is 512 these also must be set in the constructor.

>>> t.close()
>>> t = ToTable("some.tct", 'w', rcnum=1e7, lcnum=32768)

index

create or delete a ‘s’tring or ‘d’ecimal index on a column for faster queries.

# create a decimal index on the number column 'age'.
>>> tbl.create_index('age', 'd')
True

# create a 'string index on the string column 'fname'.
>>> tbl.create_index('fname', 's')
True

# remove the index.
>>> tbl.delete_index('fname')
True

# optimize the index
>>> tbl.optimize_index('age')
True

clear

remove all records from the db.

>>> len(tbl)
16
>>> tbl.clear()
>>> len(tbl)
0

transaction

do stuff in a transaction. a rollback() is performed on any exceptions.

>>> try:
...     with transaction(tbl):
...         tbl['zzz'] = {'a': '4'}
...         1/0
... except: pass

>>> 'zzz' in tbl
False

See Also

Release History

Release History

This version
History Node

0.1

History Node

0.0.1

Download Files

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
totable-0.1.tar.gz (59.1 kB) Copy SHA256 Checksum SHA256 Source Jan 7, 2010

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting