Skip to main content

A set of python modules for data structures written in C

Project description

Struct Pie

A set of python modules for data structures. All data structures are written in C and imported into python using cython. No python code involved to make the data structure pretty fast.

The idea is to extend python data structures to add more flexibility and to include more data structures that can help in different situations.

There is also a StructPie project on sourceforge.net that contains all these data structures as C shared libraries that can readily be used in C projects. The libraries can be downloaded from here.

Quick Start

To install the library

pip install structpie

Testing the different data structures from within python

Hash Table

Separate Chaining hash map that enables fast lookup and search. It facilitates searching by "value" O(1), in contrary to python dictionary which enables O(1) search when key is known.

For full comparison between list, dict and HashTable, try to run the example.py script in /hash_table/ folder.

To initiate a hash table, size and type should be specified. It supports 'int', 'float' and 'str'

from structpie import *

size = 10
type = "str"
ht = HashTable(size, type)
arr = ["Hello", "Salam", "Hola", "Ola", "Ciao", "Konnichiwa", "Merhaba", "Privetstviye"]

If the table is of "str" type, input string should be converted into bytes for Cython to be able to convert it into C char*.

# insert data into the table
for i in arr:
    ht.insert_val(i.encode())

# print the table
ht.display()
# index(0):  Hello
# index(1):
# index(2):
# index(3):  Konnichiwa
# index(4):  Ola
# index(5):  Ciao
# index(6):
# index(7):  Hola
# index(8):  Merhaba  Salam
# index(9):  Privetstviye

Check the size of the table and how many indices are occupied.

print("size of the table is %d while number of filled indices is %d" %
    (ht.length(), ht.occupied()))
# size of the table is 10 while number of filled indices is 7

Search for a value in the table, remove and search by index:

print("try now to search for and delete some values: (0 means False while 1 is True)")
# try now to search for and delete some values: (0 means False while 1 is True)
print("search for value Merhaba --> %d" % ht.search_value("Merhaba".encode()))
# search for value Merhaba --> 1
print("search for value Namaste --> %d" % ht.search_value("Namaste".encode()))
# search for value Namaste --> 0
print("search by index 3 --> value %s found" % ht.search_by_index(3))
# search by index 3 --> value Konnichiwa found
print("remove value Ola --> %d" % ht.val_del("Ola".encode()))
# remove value Ola --> 1
print("search again for value Ola --> %d" % ht.search_value("Ola".encode()))
# search again for value Ola --> 0
print("try again to remove value Ola --> %d" % ht.val_del("Ola".encode()))
# try again to remove value Ola --> 0

Print again

ht.display()
# index(0):  Hello
# index(1):
# index(2):
# index(3):  Konnichiwa
# index(4):
# index(5):  Ciao
# index(6):
# index(7):  Hola
# index(8):  Merhaba  Salam
# index(9):  Privetstviye

Get index of a specific value

ht.get_index(b"Hello")
# 0

ht.get_index(b"Salam")
# 8

We also can get all the values in a particular index

ht.search_index(8)
# ['Merhaba', 'Salam']

ht.search_index(1)
# []

ht.search_index(ht.get_index(b"Salam"))
# ['Merhaba', 'Salam']

Displaying first certain number of indices with display method. Acts like head in python dataframes:

ht.display(5) ##specifying how many indices to display

# index(0):  Hello
# index(1):
# index(2):
# index(3):  Konnichiwa
# index(4):

There are some empty unused indices that's because the hash_func used with string is so simple for now but better implementation will be available in future versions.

Meanwhile in HashTable this issue will be seen less.

LIFO & FIFO Stack

from structpie import *

# initiate a stack of type "int"
st = Stack("int")

# push values into the stack
st.push(13)
st.push(11)
st.push(19)
st.push(91)

# print the stack
st.display()
# [13 11 19 91 ]

# pop from right (LIFO)
st.pop()
# 91

# pop from left (FIFO)
st.popleft()
# 13

# check length
st.length()
# 2 

# print again
st.display()
# [11 19 ]

The stack can also contain data of type 'float' and 'str'

Binary Search Tree

Binary search tree is a great data structure for quick searching and dynamic sorting while inserting new data. It is widely used in so many applications like indexing in databases for fast lookup.

Struct Pie provides a C-written binary search tree for usage in python.

Insertion is a little bit slower than appending to a python list, but searching is so much faster.

Struct Pie BST nodes can hold 3 data pieces; (int index, char* info1, float info2). This will be extended soon to hold more data.

Let's try the binary search tree on some sample data about Uefa Champions League all-time goalscorers. Each row represents (number of goals, name, goal/game ratio).

# initiate bstree
bst = BSTree()

import pandas as pd
data = pd.read_csv("./binary_search_tree/example_data.csv", header = None)
data
#      0                    1     2
#0   130    Cristiano Ronaldo  0.76
#1   115         Lionel Messi  0.80
#2    71        Raul Gonzalez  0.50
#3    68   Robert Lewandowski  0.76
#4    65        Karim Benzema  0.54
#5    56  Ruud van Nistelrooy  0.77
#6    50        Thierry Henry  0.45
#7    49   Alfredo Di Stefano  0.84
#8    48    Andriy Shevchenko  0.48
#9    48   Zlatan Ibrahimovic  0.40
#10   46        Thomas Muller  0.40

Let's insert the data into the tree

for i in range(data.shape[0]):
    bst.insert(data.iloc[i, 0], data.iloc[i, 1].encode(), data.iloc[i, 2])

# check that all data was inserted
data.shape[0] == bst.node_count()
# True

Some operations on the tree; (inorder) printing, searching and deleting

# print inorder 
bst.inorder()
# 46 -> 48 -> 48 -> 49 -> 50 -> 56 -> 65 -> 68 -> 71 -> 115 -> 130 ->

# search for an index
bst.search(40)
# "node doesn't exist"

bst.search(48)
# (48, b'Andriy Shevchenko', 0.48)

bst.search(46)
# (46, b'Thomas Muller', 0.4)

bst.search(49)
# (49, b'Alfredo Di Stefano', 0.84)

# delete a node
bst.remove(48)

# new order after deletion
bst.inorder()
# 46 -> 48 -> 49 -> 50 -> 56 -> 65 -> 68 -> 71 -> 115 -> 130 ->

bst.node_count()
# 10

bst.remove(71)
bst.node_count()
# 9
bst.inorder()
# 46 -> 48 -> 49 -> 50 -> 56 -> 65 -> 68 -> 115 -> 130 ->

bst.search(65)
# (65, b'Karim Benzema', 0.54)

Hash Table with Binary Search Tree

Implementing hash table data structure that uses separate chaining to avoid collisions. While in each indiex it uses a binary search tree instead of linked lists to store data that has similar indices produced by the hashing function to achieve faster lookup.

Insertion is slower than insertion in linked lists but searching is faster.

from structpie import *
size = 5
type = "int"
hbst = HashBSTree(size, type)
arr = [13, 11, 19, 91, 22, 12, 19, 94]

for i in arr:
    hbst.insert_val(i)

hbst.display()

# index(0):
# index(1): 11 -> 91 ->
# index(2): 12 -> 22 ->
# index(3): 13 ->
# index(4): 19 -> 19 -> 94 ->

(hbst.length(), hbst.occupied())
# (5, 4)
hbst.search_value(94)
# 1
hbst.search_value(15)
# 0
hbst.search_by_index(3)
# '13'
hbst.val_del(91)

hbst.search_value(91)
# 0
hbst.val_del(91)
# The value does not exist
hbst.get_index(19)
# 4
hbst.get_index(11)
# 1

Priority Queue

Priority queue is a very good data structure for inserting new values and ordering the queue according to certain priority so that when poping the more important ones get popped first even if they were inserted later.

pq = PQ()
pq.display()
# The queue is Empty

PQ takes two inputs, for now, (char * name, int priority). This will definitely be expanded soon.

C char* is represented in python as bytes by cython. So while inserting strings, they should be encoded into bytes first.

Let's insert some tasks and priorities:

pq.push(b"exercise", 3)
pq.push(b"sleep", 2)
pq.push(b"repeat", 1)
pq.push(b"code", 5)
pq.display()
# (
#  {code, 5}
#  {exercise, 3}
#  {sleep, 2}
#  {repeat, 1}
# )

Queue was dynamically ordered according to priorities.

Now let's pop:

pq.pop()
# (code) with priority (5) has been removed
# 'code'

Note that pop method prints a confirmation message and returns the task name as a string

some = pq.pop()
# (exercise) with priority (3) has been removed
print(some)
# exercise

pq.display()
# (
#  {sleep, 2}
#  {repeat, 1}
# )

pq.pop()
# (sleep) with priority (2) has been removed
# 'sleep'
pq.pop()
# (repeat) with priority (1) has been removed
# 'repeat'

## when trying to pop from an empty queue, an empty string '' is returned.
pq.pop()
# The queue is empty
# ''
pq.display()
# The queue is Empty

More data structures will be supported soon.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

structpie-0.0.5.tar.gz (20.5 kB view details)

Uploaded Source

Built Distribution

structpie-0.0.5-cp37-cp37m-win_amd64.whl (114.3 kB view details)

Uploaded CPython 3.7m Windows x86-64

File details

Details for the file structpie-0.0.5.tar.gz.

File metadata

  • Download URL: structpie-0.0.5.tar.gz
  • Upload date:
  • Size: 20.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.6

File hashes

Hashes for structpie-0.0.5.tar.gz
Algorithm Hash digest
SHA256 35a277ba4b7706605fd367cf59b220b8a501235da412b5be7f45fcc5ede2c321
MD5 bca5090833ea3ef8dc6f756363806289
BLAKE2b-256 c54695eeccde2c519b181cdd5054b40847f82c06c8c760f5e11c9b4974805818

See more details on using hashes here.

File details

Details for the file structpie-0.0.5-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: structpie-0.0.5-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 114.3 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.6

File hashes

Hashes for structpie-0.0.5-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 0cf1f1f1fab5257d74345018304a6cda13313b2a5411acdcdcbbc328a5aa4d60
MD5 2c81df6700943dcbe3bb0b5fd35a9c2e
BLAKE2b-256 6fa8cc2000ab3d27645b8bcbe4be4e9a49a18cdab760c4a192527607f4a20825

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page