
Cython wrapper for tokyo cabinet table

Project Description


Pythonic access to the Tokyo Cabinet table database API. (NOTE: The original Cython code was from pykesto.) The aim is to provide a simple syntax to load and query data in a table. Most of the work is handled by the Col query interface, e.g.

>>> from totable import ToTable, Col
>>> tbl = ToTable('t.tct', 'w')
>>> result = tbl.select(Col('age') > 18, Col('name').startswith('T'))

This allows querying columns with numbers and letters transparently, even though Tokyo Cabinet stores all values as strings. More syntactic sugar is shown below.


First, install Tokyo Cabinet from source. Then, from the directory containing this file:

# requires cython for now.
$ cython src/ctotable.pyx
$ python setup.py build_ext -i

# test
$ PYTHONPATH=. python totable/tests/

# install
$ sudo python setup.py install

Example Use

Make some fake data. Note it works just like a DBM or dictionary, except that the values themselves are dictionaries.

>>> from totable import ToTable, Col
>>> tbl = ToTable('doctest.tct', 'w')
>>> fnames = ['fred', 'jane', 'john', 'mark', 'bill', 'ted', 'ann']
>>> lnames = ['smith', 'cox', 'kit', 'ttt', 'zzz', 'ark', 'ddk']
>>> for i in range(len(fnames)):
...     tbl[str(i)] = {'fname': fnames[i], 'lname': lnames[i],
...                    'age': str((10 + i) * 2)}
...     tbl[str(i + len(fnames))] = {'fname': fnames[i],
...                                  'lname': lnames[len(lnames) - i - 1],
...                                   'age': str((30 + i) * 2)}

>>> len(tbl)
14
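
To follow the examples without a Tokyo Cabinet build at hand, the same fake data can be modelled with a plain dictionary. This is only an illustrative stand-in (the name rows is not part of totable); it shows which fourteen records the loop above creates.

```python
# Plain-dict model of the records created above (no Tokyo Cabinet needed).
fnames = ['fred', 'jane', 'john', 'mark', 'bill', 'ted', 'ann']
lnames = ['smith', 'cox', 'kit', 'ttt', 'zzz', 'ark', 'ddk']

rows = {}  # hypothetical stand-in for the table
for i in range(len(fnames)):
    # first record for this person: ages 20, 22, ..., 32
    rows[str(i)] = {'fname': fnames[i], 'lname': lnames[i],
                    'age': str((10 + i) * 2)}
    # second record: last names reversed, ages 60, 62, ..., 72
    rows[str(i + len(fnames))] = {'fname': fnames[i],
                                  'lname': lnames[len(lnames) - i - 1],
                                  'age': str((30 + i) * 2)}

ages = sorted(int(r['age']) for r in rows.values())
print(len(rows))
print(ages)
```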


Col, as sent to the select method, makes it easy to query the database. The format is Col(colname) == ‘Fred’, where colname is one of the keys in the dictionary items in the database. Alternatively, pass keyword arguments to select():

>>> tbl.select(lname='cox')
[('1', {'lname': 'cox', 'age': '22', 'fname': 'jane'}), ('12', {'lname': 'cox', 'age': '70', 'fname': 'ted'})]

Using Col gives more power, though.


>>> results = tbl.select(Col('fname').startswith('j'))
>>> [d['fname'] + ' ' + d['lname'] for k, d in results]
['jane cox', 'jane ark', 'john kit', 'john zzz']


# combine queries by passing them in together.
>>> results = tbl.select(Col('fname').startswith('j'), Col('lname').endswith('k'))
>>> [d['fname'] + ' ' + d['lname'] for k, d in results]
['jane ark']


like() works like an SQL query with ‘%’ on either end (don’t attach those characters to the query!). So to get everyone with an ‘e’ in their first name:

>>> r = tbl.select(Col('fname').like('e'))
>>> sorted(set([v['fname'] for k, v in r]))
['fred', 'jane', 'ted']


in_list() returns rows that exactly match one of the values in the list.

>>> r = tbl.select(Col('fname').in_list(['ted', 'fred']))
>>> sorted(set([v['fname'] for k, v in r]))
['fred', 'ted']

>>> r = tbl.select(Col('age').in_list([20, 70]))
>>> sorted(set([v['age'] for k, v in r]))
['20', '70']


between() is for numeric queries between a min and a max. It includes the endpoints.

>>> r = tbl.select(Col('age').between(68, 70))
>>> [v['age'] for k, v in r]
['68', '70']

numeric queries (richcmp)

In TC, everything is stored as strings, but you can force number-based comparisons with ToTable by using (you guessed it) a number, or use a string for non-numeric comparisons.

>>> results = tbl.select(Col('age') > 68)
>>> [d['age'] for k, d in results]
['70', '72']
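
Why the operand type matters can be seen in plain Python: strings compare character by character, so a stored age like '8' sorts after '68'. The sketch below also shows one way a Col-like object could pick the comparison kind from the operand via rich comparison. SketchCol is a hypothetical class written for this illustration, not totable's implementation, and the age '8' is made up for the demo.

```python
ages = ['8', '20', '68', '70', '72']  # stored as strings; '8' is illustrative

as_strings = [a for a in ages if a > '68']     # lexicographic comparison
as_numbers = [a for a in ages if int(a) > 68]  # numeric comparison

print(as_strings)  # '8' sneaks in because '8' > '6'
print(as_numbers)


class SketchCol(object):
    """Hypothetical stand-in for Col: records a condition, runs nothing."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, other):
        # A number asks for a numeric comparison, anything else for a string one.
        kind = 'numeric' if isinstance(other, (int, float)) else 'string'
        return (self.name, '>', other, kind)


print(SketchCol('age') > 68)
print(SketchCol('fname') > 'j')
```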

combining queries

Just add multiple Col() arguments to the select() call and they will essentially be and’ed together.

>>> results = tbl.select(Col('age') > 68, Col('age') < 72)
>>> [d['age'] for k, d in results]
['70']
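
The and’ing of conditions can be pictured in plain Python as keeping only the rows for which every predicate holds. This is a sketch of the semantics only; the predicate list and the sample rows are assumptions for illustration and say nothing about how Tokyo Cabinet evaluates the query.

```python
# A few rows with string-valued ages, as they would be stored.
rows = [{'age': str(a)} for a in (20, 68, 70, 72)]

# Each condition is a predicate on a row; all of them must be true.
conditions = [
    lambda r: int(r['age']) > 68,
    lambda r: int(r['age']) < 72,
]

matched = [r['age'] for r in rows if all(c(r) for c in conditions)]
print(matched)
```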


For example, get everything that’s not a given value:

>>> results = tbl.select(~Col('age') <= 68)
>>> [d['age'] for k, d in results]
['70', '72']

# all rows where fname is not 'jane'
>>> results = tbl.select(Col('fname') != 'jane')
>>> 'jane' in [d['fname'] for k, d in results]
False

Regular Expression Matching

matches() supports normal regular expression characters such as “[”, “$”, “^”, “|”, etc.

>>> results = tbl.select(Col('fname').matches("a"))
>>> sorted(set([d['fname'] for k, d in results]))
['ann', 'jane', 'mark']

>>> results = tbl.select(Col('fname').matches("^a"))
>>> sorted(set([d['fname'] for k, d in results]))
['ann']
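
The same patterns can be tried against the first names with Python's re module. This assumes that re.search behaves like the matcher for these simple patterns; Tokyo Cabinet uses its own regular expression engine, so more exotic syntax may differ. The helper name matches_sketch is made up for this example.

```python
import re

fnames = ['fred', 'jane', 'john', 'mark', 'bill', 'ted', 'ann']


def matches_sketch(pattern, values):
    """Return the sorted, de-duplicated values in which pattern is found."""
    rx = re.compile(pattern)
    return sorted(set(v for v in values if rx.search(v)))


print(matches_sketch("a", fnames))       # an 'a' anywhere in the name
print(matches_sketch("^a", fnames))      # names starting with 'a'
print(matches_sketch("^j|d$", fnames))   # starts with 'j' or ends with 'd'
```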


Limit the number of results, just like SQL.

>>> results = tbl.select(Col('age') < 68, limit=1)
>>> len(results)
1


Ordering currently only works for string keys. Use ‘-’ for descending and ‘+’ for ascending.

>>> [v['fname'] for k, v in tbl.select(lname='cox', order='-fname')]
['ted', 'jane']

# ascending
>>> [v['fname'] for k, v in tbl.select(lname='cox', order='+fname')]
['jane', 'ted']
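
The ‘-’/‘+’ prefix convention can be mimicked in plain Python with sorted(). The helper order_rows below is a hypothetical illustration of the convention on a list of dictionaries, not the code path totable uses.

```python
def order_rows(rows, order):
    """Sort rows by a column; a leading '-' means descending, '+' ascending."""
    descending = order.startswith('-')
    column = order.lstrip('+-')
    return sorted(rows, key=lambda r: r[column], reverse=descending)


cox = [{'fname': 'jane', 'lname': 'cox'}, {'fname': 'ted', 'lname': 'cox'}]

print([r['fname'] for r in order_rows(cox, '-fname')])
print([r['fname'] for r in order_rows(cox, '+fname')])
```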


TC is a key-value store, but it also acts as a table. It may be convenient to get just the values, as you’d expect from a database table. Note that in all the examples above, the key ‘k’ is not used, only the value dictionary. This can be made simpler with ‘values_only’. When ‘values_only’ is True, some Python call overhead is removed as well.

>>> tbl.select(Col('fname').matches("^a"), values_only=True)
[{'lname': 'ddk', 'age': '32', 'fname': 'ann'}, {'lname': 'smith', 'age': '72', 'fname': 'ann'}]


Since it’s schemaless, you can add anything.

>>> tbl['weird'] = {"val": "hello"}
>>> tbl['weird']
{'val': 'hello'}


Delete works as expected for a dictionary interface.

>>> del tbl['weird']
>>> print tbl.get('weird')
None


put() encapsulates put, putkeep and putcat with a mode kwarg that takes ‘p’, ‘k’ or ‘c’ respectively.

>>> tbl.put('a', {'a': '1'}, mode='p')
>>> tbl.put('a', {'a': '2'}, mode='k')
>>> assert tbl['a'] == {'a': '1'}

>>> tbl.put('b', {'a': '3'}, mode='k')

>>> tbl.put('a', {'b': '99'}, 'c')
>>> assert tbl['a'] == {'a': '1', 'b': '99'}
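
The three modes can be modelled on an ordinary dictionary to make the difference explicit. This is a sketch that reproduces only the behaviour shown in the example above; put_sketch is a hypothetical helper. What happens in ‘c’ mode when a column already exists is not shown above, so keeping the existing value here is purely an assumption.

```python
def put_sketch(store, key, cols, mode='p'):
    """Model of put ('p'), putkeep ('k') and putcat ('c') on a plain dict."""
    if mode == 'p':
        store[key] = dict(cols)             # always overwrite
    elif mode == 'k':
        store.setdefault(key, dict(cols))   # keep an existing record untouched
    elif mode == 'c':
        record = store.setdefault(key, {})
        for name, value in cols.items():
            # ASSUMPTION: an existing column keeps its value.
            record.setdefault(name, value)
    else:
        raise ValueError("mode must be 'p', 'k' or 'c'")


store = {}
put_sketch(store, 'a', {'a': '1'}, mode='p')
put_sketch(store, 'a', {'a': '2'}, mode='k')   # ignored, 'a' already exists
put_sketch(store, 'b', {'a': '3'}, mode='k')   # stored, 'b' is new
put_sketch(store, 'a', {'b': '99'}, mode='c')  # new column added to 'a'
print(store['a'])
print(store['b'])
```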

Performance Tuning

Tokyo Cabinet allows you to tune or optimize a table. The available parameters are:

  • bnum specifies the number of elements of the bucket array. The suggested size of ‘bnum’ is about 0.5 to 4 times the number of all records to be stored. The default is about 132K.
  • apow specifies the size of record alignment as a power of 2. The default value is 4, standing for 2^4=16.
  • fpow specifies the maximum number of elements of the free block pool as a power of 2. The default value is 10, standing for 2^10=1024.
  • opts specifies options combined by bitwise-or (|):
    • ‘TDBTLARGE’ must be specified to use a database larger than 2GB. (You must also specify a config flag when compiling the TC library to enable this.)
    • ‘TDBTDEFLATE’ uses Deflate encoding.
    • ‘TDBTBZIP’ uses BZIP2 encoding.
    • ‘TDBTTCBS’ uses TCBS encoding.

The other parameters, the cache settings and mmap_size, are explained below.
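
The arithmetic behind these parameters is simple enough to check in plain Python. The helper names below are made up for this sketch; they only restate the rules of thumb from the list above and do not call into Tokyo Cabinet.

```python
def suggested_bnum_range(n_records):
    """Return (low, high): about 0.5 to 4 times the expected record count."""
    return (n_records // 2, n_records * 4)


def alignment_bytes(apow):
    """Record alignment is 2 ** apow."""
    return 1 << apow


def free_block_pool_size(fpow):
    """Maximum number of free block pool elements is 2 ** fpow."""
    return 1 << fpow


print(suggested_bnum_range(1000000))  # range for a million records
print(alignment_bytes(4))             # default apow
print(free_block_pool_size(10))       # default fpow
```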


These arguments can be passed to the constructor.

>>> import totable
>>> t = ToTable("some.tct", 'w', bnum=1234, fpow=6, \
...                    opts=totable.TDBTLARGE | totable.TDBTBZIP)

>>> t.close()


optimize() is called on a database opened with mode=’w’. If no arguments are specified, it automatically adjusts ‘bnum’ (only) according to the number of elements in the table.

>>> t = ToTable("some.tct", 'w')

# ... add some records ...
>>> t.optimize()


mmap_size is the size of mapped memory. The default is 67,108,864 (64MB). It is set in the constructor. This is xmsiz in TC parlance.

>>> t.close()
>>> t = ToTable("some.tct", 'w', mmap_size=128 * 1e6) # ~128MB.
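
A quick check of the numbers: the default is exactly 64 * 1024 * 1024 bytes, while 128 * 1e6 is a float and a little less than 128 MiB. The snippet is plain arithmetic; the variable names are illustrative only.

```python
default_mmap = 64 * 1024 * 1024      # the documented default
requested = 128 * 1e6                # as written in the example: a float
requested_int = int(requested)       # the same size as an integer

print(default_mmap)
print(requested_int)
print(round(requested_int / (1024.0 * 1024.0), 2))  # size in MiB
print(128 * 1024 * 1024)             # an exact 128 MiB, for comparison
```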


TC also allows setting various caching parameters:

  • rcnum is the max number of records to be cached. The default is 0.
  • lcnum is the max number of leaf nodes to be cached. The default is 4096.
  • ncnum is the max number of non-leaf nodes to be cached. The default is 512.

These must also be set in the constructor.

>>> t.close()
>>> t = ToTable("some.tct", 'w', rcnum=1e7, lcnum=32768)


Create or delete a ‘s’tring or ‘d’ecimal index on a column for faster queries.

# create a decimal index on the number column 'age'.
>>> tbl.create_index('age', 'd')

# create a string index on the string column 'fname'.
>>> tbl.create_index('fname', 's')

# remove the index.
>>> tbl.delete_index('fname')

# optimize the index
>>> tbl.optimize_index('age')


Remove all records from the database.

>>> len(tbl)
>>> tbl.clear()
>>> len(tbl)
0


Do stuff in a transaction. A rollback() is performed on any exception.

>>> try:
...     with transaction(tbl):
...         tbl['zzz'] = {'a': '4'}
...         1/0
... except: pass

>>> 'zzz' in tbl
False
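
The commit-or-rollback pattern can be sketched with contextlib on an in-memory stand-in. Everything here (SketchTable, sketch_transaction and the begin/commit/rollback methods) is hypothetical and written for illustration; it shows the general shape of such a context manager, not totable's own transaction.

```python
from contextlib import contextmanager


class SketchTable(dict):
    """Hypothetical in-memory table with snapshot-based transactions."""

    def begin(self):
        self._snapshot = dict(self)   # remember the state at transaction start

    def commit(self):
        self._snapshot = None         # keep the changes

    def rollback(self):
        self.clear()                  # restore the remembered state
        self.update(self._snapshot)
        self._snapshot = None


@contextmanager
def sketch_transaction(table):
    table.begin()
    try:
        yield table
    except Exception:
        table.rollback()              # undo everything, then re-raise
        raise
    else:
        table.commit()


tbl = SketchTable()
try:
    with sketch_transaction(tbl):
        tbl['zzz'] = {'a': '4'}
        1 / 0
except ZeroDivisionError:
    pass

print('zzz' in tbl)
```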

Download Files

Source distribution (59.1 kB), uploaded Jan 7, 2010.
