Project Description

madoka

Madoka is an implementation of a Count-Min sketch data structure for summarizing data streams.

String-int pairs in a Madoka-Sketch may take less memory than in a standard Python dict.

It is based on the madoka C++ library.

NOTE: A Madoka-Sketch does not keep an index of keys, so it cannot list all of its keys the way Python’s dict.keys() does.
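To see why a Count-Min sketch can estimate counts but cannot list its keys, here is a minimal pure-Python sketch of the general technique. It is only an illustration: the class name TinyCountMin, the SHA-256 based hashing, and the width and depth defaults are assumptions made for this example and do not reflect how the madoka C++ library is implemented.

```python
import hashlib


class TinyCountMin:
    """Illustrative Count-Min sketch; not madoka's actual implementation."""

    def __init__(self, width=1024, depth=3):
        self.width = width
        self.depth = depth
        # One row of counters per hash function; the keys themselves are never stored.
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # Derive one column per row by salting the hash with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, value=1):
        for row, col in self._cells(key):
            self.rows[row][col] += value

    def get(self, key):
        # Collisions can only inflate a counter, so the minimum is the best estimate.
        return min(self.rows[row][col] for row, col in self._cells(key))


cm = TinyCountMin()
cm.add("mami", 3)
cm.add("kyoko")
print(cm.get("mami"))  # at least 3; larger only if every row collides with another key
```

Because only counters are kept, memory use depends on the table size rather than on the number of distinct keys, and there is nothing to iterate over to recover the keys.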

Installation

$ pip install madoka

Class

Madoka provides several classes that share the same interface and differ only in the data type of the stored values, so you can choose the one that fits your purpose.

For example, if you want to count float data, it’s preferable to choose the CroquisFloat or CroquisDouble class.

  • Sketch - storing unsigned long long (64bit) and fast implementation
  • CroquisFloat - storing float (32bit)
  • CroquisDouble - storing double (64bit)
  • CroquisUint8 - storing unsigned char (8bit)
  • CroquisUint16 - storing unsigned short (16bit)
  • CroquisUint32 - storing unsigned int (32bit)
  • CroquisUint64 - storing unsigned long long (64bit)
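As a rough aid for picking among the unsigned classes, the bit widths listed above imply the following upper limits. The helper smallest_unsigned_class is hypothetical and not part of madoka; it only does the arithmetic on the documented bit widths.

```python
# Upper limits implied by the bit widths listed above.
UNSIGNED_CLASSES = [
    ("CroquisUint8", 2**8 - 1),
    ("CroquisUint16", 2**16 - 1),
    ("CroquisUint32", 2**32 - 1),
    ("CroquisUint64", 2**64 - 1),
]


def smallest_unsigned_class(max_count):
    """Return the name of the narrowest unsigned class that can hold max_count."""
    if max_count < 0:
        raise ValueError("unsigned classes cannot hold negative values")
    for name, limit in UNSIGNED_CLASSES:
        if max_count <= limit:
            return name
    raise ValueError("value does not fit in 64 bits")


print(smallest_unsigned_class(200))    # CroquisUint8
print(smallest_unsigned_class(70000))  # CroquisUint32
```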

Usage

The rest of this section describes the Sketch class. The Croquis classes have mostly the same interface, so you can use them in the same way by replacing “Sketch” with the class you want.

Create a new sketch

>>> import madoka
>>> sketch = madoka.Sketch()
  • Sketch madoka.Sketch([width=0, max_value=0, path='', flags=0, seed=0])
    • The file given as path should have permission 644
    • madoka.Sketch() calls madoka.Sketch.create(), so you don’t have to call create() explicitly during initialization

Increment a key value

>>> sketch['mami'] += 1

or

>>> sketch.inc('mami')
  • int inc(key[, key_length=0])
    • Note that key_length is determined automatically when it is omitted. As a result, the order of parameters differs from the original madoka C++ library.

Add a value to the current key value

>>> sketch['mami'] += 6

or

>>> sketch.add('mami', 6)
  • int add(key, value[, key_length=0])
    • Note that key_length is determined automatically when it is omitted. As a result, the order of parameters differs from the original madoka C++ library.

Update a key value

>>> sketch['mami'] = 6

or

>>> sketch.set('mami', 6)
  • void set(key, value[, key_length=0])
    • Note that set() does nothing when the given value is not greater than the current key value.
    • Also note that the new value is capped at the upper limit when the given value exceeds it.
    • Additionally, key_length is determined automatically when it is omitted. As a result, the order of parameters differs from the original madoka C++ library.
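The two rules for set() described above can be modelled on a single counter as follows. The function model_set and its upper_limit parameter are hypothetical names used only for this illustration; the real method works on the sketch's internal table.

```python
def model_set(current, value, upper_limit):
    """Model of the documented set() rule; a hypothetical helper, not madoka code."""
    if value <= current:
        # Not greater than the current value: nothing changes.
        return current
    # Greater values are stored, but never beyond the upper limit.
    return min(value, upper_limit)


print(model_set(current=6, value=3, upper_limit=255))    # 6
print(model_set(current=6, value=10, upper_limit=255))   # 10
print(model_set(current=6, value=999, upper_limit=255))  # 255
```

In other words, set() behaves like "raise to at least this value" rather than a plain assignment, so it cannot be used to lower a count.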

Get a key value

>>> sketch['mami']

or

>>> sketch.get('mami')
  • int get(key[, key_length=0])
    • Note that key_length is determined automatically when it is omitted. As a result, the order of parameters differs from the original madoka C++ library.

Get all values

>>> sketch.values()
  • generator<int> values()
    • Note that processing time grows with the sketch’s width, so this method may be slow. I recommend setting width to less than 1000000 when creating the sketch.

Save a sketch to a file

>>> sketch.save('example.madoka')
  • void save(path)
    • The file given as path should have permission 644

Load a sketch from a file

>>> sketch.load('example.madoka')
  • void load(path)
    • The file given as path should have permission 644

Clear a sketch

>>> sketch.clear()
  • void clear()
    • Deletes all key-value pairs. Unlike create(), it keeps the current settings.

Initialize a sketch with settings change

>>> sketch.create()
  • void create([width=0, max_value=0, path=NULL, flags=0, seed=0])
    • The file given as path should have permission 644

Copy a sketch

>>> sketch.copy(other_sketch)
  • void copy(Sketch)

Merge two sketches

>>> sketch += other_sketch

or

>>> sketch.merge(other_sketch)
  • void merge(Sketch[, lhs_filter=None, rhs_filter=None])
    • lhs_filter is applied to the values of this sketch, and rhs_filter is applied to the values of the given sketch
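One way to picture the two filters is on plain lists of counters. The sketch below assumes that each merged cell is lhs_filter(own value) + rhs_filter(other value) and that both tables have the same width; both points are assumptions of this model, and model_merge is a hypothetical helper rather than madoka code.

```python
def model_merge(lhs_cells, rhs_cells, lhs_filter=None, rhs_filter=None):
    """Merge two counter lists cell by cell, filtering each side first."""
    if len(lhs_cells) != len(rhs_cells):
        raise ValueError("both tables must have the same width")

    def identity(value):
        return value

    # A missing filter leaves that side's values unchanged.
    lhs_filter = lhs_filter or identity
    rhs_filter = rhs_filter or identity
    return [lhs_filter(a) + rhs_filter(b) for a, b in zip(lhs_cells, rhs_cells)]


# Give the right-hand side twice the weight of the left-hand side.
merged = model_merge([4, 0, 2], [1, 3, 5], rhs_filter=lambda v: v * 2)
print(merged)  # [6, 6, 12]
```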

Shrink a sketch

>>> sketch.shrink(sketch, width=1000)
  • void shrink(Sketch[, width=0, max_value=0, filter=None, path=None, flags=0])
    • When width > 0, width must be less than the width of the source sketch
    • The file given as path should have permission 644

Get summed sketch

>>> summed_sketch = sketch + other_sketch
  • Creates a new summed sketch, so the original sketches are left unchanged

Get summed sketch by dict

>>> summed_sketch = sketch + {'mami': 1, 'kyoko': 2}
  • Creates a new summed sketch, so the original sketch is left unchanged

Check whether sketch contains key value

>>> 'mami' in sketch

Get inner product of two sketches

>>> sketch.inner_product(other_sketch)
  • list<float> inner_product(Sketch)
    • Returns [inner product, squared length of the left-hand sketch (float), squared length of the right-hand sketch (float)]
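The three returned numbers are exactly what is needed for a cosine similarity between two sketches. The helper below is a hypothetical convenience function, not part of madoka, and the input list is a hand-made stand-in for a value returned by inner_product().

```python
import math


def cosine_from_inner_product(result):
    """Compute cosine similarity from [inner product, squared length, squared length]."""
    inner, left_sq, right_sq = result
    if left_sq == 0 or right_sq == 0:
        # An empty sketch has no direction; report zero similarity.
        return 0.0
    return inner / math.sqrt(left_sq * right_sq)


# Hand-made stand-in for a value returned by inner_product().
print(cosine_from_inner_product([12.0, 9.0, 16.0]))  # 1.0
```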

Apply filter into all values

>>> sketch.filter(lambda x: x + 1)
  • void filter(Callable[, apply_zerovalue=False])
    • If apply_zerovalue is True, the given callable is also applied to zero values, which may be slow (available from version 0.6)
    • Note that processing time grows with the sketch’s width. If this method feels slow, I recommend setting width to less than 1000000 when creating the sketch
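The effect of apply_zerovalue can be modelled on a plain list of counters. The function model_filter is a hypothetical stand-in written for this illustration; the real method updates the sketch in place.

```python
def model_filter(cells, func, apply_zerovalue=False):
    """Apply func to every counter, skipping zeros unless apply_zerovalue is set."""
    return [func(v) if (v != 0 or apply_zerovalue) else v for v in cells]


print(model_filter([0, 2, 5], lambda x: x + 1))                        # [0, 3, 6]
print(model_filter([0, 2, 5], lambda x: x + 1, apply_zerovalue=True))  # [1, 3, 6]
```

Skipping zeros keeps empty counters empty, which matters for a callable such as x + 1 that would otherwise give every unseen key a nonzero count.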

Set values from dict

>>> sketch.fromdict({'mami': 14, 'madoka': 13})

or

>>> sketch += {'mami': 14, 'madoka': 13}
  • void fromdict(dict)

TODO

  • Benchmark memory usage against the standard Python dict and Redis

Contributions are welcome!

License

  • Wrapper code is licensed under New BSD License.
  • Bundled madoka C++ library is licensed under the Simplified BSD License.

CHANGES

0.6 (2014-11-23)

  • Support Python 3.4
  • Improve processing time of inner_product()
  • Fix shrink() method bug
  • Change filter() methods param
  • Support with-statement
  • Implement addition from a dict (e.g., summed_sketch = sketch + dict; sketch += dict)

0.5 (2014-04-08)

  • Add Croquis classes handling some data types (e.g., float, uint8)
  • When length=True is given to the inner product method, it also returns the squared lengths of the left-hand and right-hand sketches
  • Add fromdict method

0.4 (2014-03-30)

  • Implement dict-like interface (e.g., sketch['key'])
  • Add filter() method
  • Add values() method for dumping all values

0.3 (2014-03-14)

  • Key length is automatically determined when it is not given
  • Remove filter function
  • Slightly decrease memory usage

0.2 (2013-10-12)

Simplify the steps for creating a new sketch.

0.1 (2013-10-11)

Initial release.


Download Files


  • madoka-0.6.tar.gz (83.4 kB) - Source, uploaded Nov 23, 2014
