Skip to main content

Container for finding Python objects using database-like indexes.

Project description

tests Actions Status Coverage - 100% license - MIT python - 3.7+

ducks

Provides Dex, a Python container for indexing objects of any type.

Install:

pip install ducks

Usage:

from ducks import Dex

objects = [{'x': 4, 'y': 1}, {'x': 6, 'y': 2}, {'x': 8, 'y': 5}]

# Create a Dex containing objects. Index on x and y.
dex = Dex(objects, ['x', 'y'])  

# find the ones you want
dex[{                           # find objects
    'x': {'>': 5, '<': 10},     # where x is between 5 and 10
    'y': {'in': [1, 2, 3]}      # and y is 1, 2, or 3
}]
# result: [{'x': 6, 'y': 2}]

Valid operators are ==, !=, <, <=, >, >=, in, not in.

Is Dex a database?

No. But like a database, Dex uses B-tree indexes and uses them to find results very quickly. It does not any do other database things like SQL, tables, etc. This keeps Dex simple, light, and performant.

Is Dex fast?

Yes. Here's how Dex compares to other object-finders on an example task.

Example benchmark

Benchmark code

The closest thing to a Dex is an in-memory SQLite. While SQLite is a fantastic database, it requires more overhead. As such, Dex is generally faster.

Class APIs

There are three containers.

  • Dex: Can add, remove, and update objects after creation. API
  • ConcurrentDex: Same as Dex, but thread-safe. API
  • FrozenDex: Cannot be changed after creation, it's read-only. But it's super fast, and of course thread-safe. API

All three can be pickled using the special functions ducks.save(dex, file) / ducks.load(file).

Fancy Tricks

Expand for sample code.

Use functions of the object as attributes
You can also index on functions evaluated on the object, as if they were attributes.

Find palindromes of length 5 or 7:

from ducks import Dex
strings = ['bob', 'fives', 'kayak', 'stats', 'pullup', 'racecar']

# define a function that takes the object as input
def is_palindrome(s):
    return s == s[::-1]

dex = Dex(strings, [is_palindrome, len])
dex[{
    is_palindrome: True, 
    len: {'in': [5, 7]}
}]
# result: ['kayak', 'racecar', 'stats']

Functions are evaluated on the object when it is added to the Dex.

Access nested data using functions
Use functions to get values from nested data structures.
from ducks import Dex

objs = [
    {'a': {'b': [1, 2, 3]}},
    {'a': {'b': [4, 5, 6]}}
]

def get_nested(obj):
    return obj['a']['b'][0]

dex = Dex(objs, [get_nested])
dex[{get_nested: 4}]
# result: {'a': {'b': [4, 5, 6]}}
Handle missing attributes

Objects don't need to have every attribute.

  • Objects that are missing an attribute will not be stored under that attribute. This saves lots of memory.
  • To find all objects that have an attribute, match the special value ANY.
  • To find objects missing the attribute, exclude ANY.
  • In functions, raise MissingAttribute to tell Dex the object is missing.

Example:

from ducks import Dex, ANY
from ducks.exceptions import MissingAttribute

objs = [{'a': 1}, {'a': 2}, {}]

def get_a(obj):
    try:
        return obj['a']
    except KeyError:
        raise MissingAttribute  # tell Dex this attribute is missing

dex = Dex(objs, ['a', get_a])

dex[{'a': ANY}]          # result: [{'a': 1}, {'a': 2}]
dex[{get_a: ANY}]        # result: [{'a': 1}, {'a': 2}]
dex[{'a': {'!=': ANY}}]  # result: [{}]

Note that None is treated as a normal value and is stored.

Recipes

How Dex works

For each attribute in the Dex, it holds a B-tree that maps every unique value to the set of objects with that value.

This is a rough idea of the data structure:

class Dex:
    indexes = {
        'attribute1': BTree({10: set(some_obj_ids), 20: set(other_obj_ids)}),
        'attribute2': BTree({'abc': set(some_obj_ids), 'def': set(other_obj_ids)}),
    }
    obj_map = {obj_ids: objects}
}

During a lookup, the object ID sets matching each query value are retrieved. Then set operations like union, intersect, and difference are applied to get the matching object IDs. Finally, the object IDs are converted to objects and returned.

In practice, Dex and FrozenDex have a bit more to them, as they are optimized to have much better memory usage and speed than a naive implementation. For example, FrozenDex uses an array-based tree structure.

See the "how it works" pages for more detail:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ducks-0.4.0.tar.gz (22.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ducks-0.4.0-py3-none-any.whl (26.4 kB view details)

Uploaded Python 3

File details

Details for the file ducks-0.4.0.tar.gz.

File metadata

  • Download URL: ducks-0.4.0.tar.gz
  • Upload date:
  • Size: 22.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.7 Linux/5.15.55-1-MANJARO

File hashes

Hashes for ducks-0.4.0.tar.gz
Algorithm Hash digest
SHA256 ba6cb866bb6855d1d3c08739115ca3739a40fdf6aff89a7466db7ceeb40203bb
MD5 00dc0c5209e258f71a336ffdd02d1b0a
BLAKE2b-256 be2bf0dd36c6f86199548d6e48da783f17817abc35db5b717de162359197cb93

See more details on using hashes here.

File details

Details for the file ducks-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: ducks-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 26.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.7 Linux/5.15.55-1-MANJARO

File hashes

Hashes for ducks-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a853e517e587dead7e136ae6e04442d724657244f51442a1e3df8b53cfed5afc
MD5 42a78f475880c9c043d4c9a108cc25e7
BLAKE2b-256 54fbd34f36f069d980691c5346019f97a8e923090727b37fbc88048e82115690

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page