Read, write, and query sparse tables
Sparse-Numeric-Table
Query, write, and read sparse, numeric tables.
I love pandas.DataFrame and numpy.recarray, but with large and sparse tables I run out of memory or struggle to represent empty integer fields with the float's NaN.
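The NaN problem can be reproduced in a few lines. This is a minimal sketch using only pandas; the column names `idx`, `a`, and `g` are made up for illustration. A left merge has to fill the missing rows with NaN, so the integer column is converted to float.

```python
import pandas as pd

# Dense level with three rows and a sparse level with a single row.
left = pd.DataFrame({"idx": [0, 1, 2], "a": [1, 2, 3]})
right = pd.DataFrame({"idx": [1], "g": [5]})

# Rows 0 and 2 have no partner in 'right', so 'g' is padded with NaN
# and its dtype changes from integer to float.
merged = pd.merge(left, right, on="idx", how="left")
g_dtype_kind = merged["g"].dtype.kind
```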
Here I use a dict of numpy.recarrays to represent large and sparse tables.
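As a rough illustration of this layout, here is a minimal sketch built with plain numpy. The level names, column names, and values are hypothetical, and the table is constructed by hand rather than through the package's own functions.

```python
import numpy as np

# Each level is one recarray with its own index column 'idx'.
table = {
    "A": np.rec.fromarrays(
        [
            np.array([0, 1, 2, 3], dtype="<u8"),
            np.array([10, 11, 12, 13], dtype="<i8"),
        ],
        names=["idx", "a"],
    ),
    "B": np.rec.fromarrays(
        [
            np.array([1, 3], dtype="<u8"),
            np.array([-7, 42], dtype="<i8"),
        ],
        names=["idx", "g"],
    ),
}

# Level 'B' stores only the rows that exist. Nothing is padded,
# so the integer column stays an integer column.
rows_in_b = table["B"].shape[0]
```

Rows that are absent from a level simply take no space, which is where the memory saving comes from.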
Writing into tarfiles (.tar) preserves the table's hierarchy and makes it easy to explore in the file-system. I use pandas.merge to query.
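A query across two levels could look roughly like the sketch below. It uses only numpy and pandas.merge, not the package's own query functions, and the level contents are invented for the example.

```python
import numpy as np
import pandas as pd

level_a = np.rec.fromarrays(
    [
        np.array([0, 1, 2, 3], dtype="<u8"),
        np.array([0.5, 1.5, 2.5, 3.5], dtype="<f8"),
    ],
    names=["idx", "b"],
)
level_b = np.rec.fromarrays(
    [
        np.array([1, 3], dtype="<u8"),
        np.array([-7, 42], dtype="<i8"),
    ],
    names=["idx", "g"],
)

# An inner merge keeps only the indices present in both levels,
# so no NaN padding is introduced.
merged = pd.merge(
    pd.DataFrame(level_a), pd.DataFrame(level_b), on="idx", how="inner"
)

# Example query: all rows where 'g' is positive.
hits = merged[merged["g"] > 0]
```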
Restrictions
- Only numeric fields
- Index is unsigned integer
Pros
- Fast read / write with numpy binaries (explicit endianness).
- Just a dict of numpy.recarrays. No classes. No stateful functions.
- Easy to explore files in the tape archive (.tar).
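The idea of storing columns as numpy binaries with explicit endianness inside a tar archive can be sketched as follows. The helper names `append_column` and `read_column` and the member names such as `A/idx.bin` are assumptions made for this example; they are not the package's actual file layout.

```python
import io
import tarfile

import numpy as np


def append_column(tar, name, values, dtype):
    """Add one column as raw bytes with an explicit byte order, e.g. '<u8'."""
    payload = np.asarray(values).astype(dtype).tobytes()
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def read_column(tar, name, dtype):
    """Read one column back, interpreting the bytes with the same dtype."""
    payload = tar.extractfile(name).read()
    return np.frombuffer(payload, dtype=dtype)


# Write level 'A' into an in-memory archive, one member per column.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    append_column(tar, "A/idx.bin", [0, 1, 2], "<u8")
    append_column(tar, "A/b.bin", [0.5, 1.5, 2.5], "<f8")

# Read it back.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()
    idx = read_column(tar, "A/idx.bin", "<u8")
    b = read_column(tar, "A/b.bin", "<f8")
```

Because the byte order is part of the dtype string, the bytes read the same on little-endian and big-endian machines.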
Features
- Read from file / write to file.
- Create from 'records' (A list of dicts, each representing one row in the table)
- Query, cut, and merge on row-indices (columns can be omitted for speed)
- Concatenate files.
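Creating a level from 'records' and cutting it on row-indices might look like this in plain numpy. The helper `records_to_recarray` is hypothetical and only illustrates the idea; the package provides its own functions for this.

```python
import numpy as np


def records_to_recarray(records, fields):
    """Turn a list of dicts into a recarray.

    records: list of dicts, one dict per row.
    fields: list of (column_key, dtype) pairs defining the column order.
    """
    rows = [tuple(record[key] for key, _ in fields) for record in records]
    return np.array(rows, dtype=fields).view(np.recarray)


records_b = [
    {"idx": 0, "g": -1},
    {"idx": 3, "g": 7},
    {"idx": 4, "g": 2},
]
level_b = records_to_recarray(records_b, [("idx", "<u8"), ("g", "<i8")])

# Cut: keep only the rows whose index is in the requested set.
wanted = np.array([3, 4], dtype="<u8")
cut_b = level_b[np.isin(level_b["idx"], wanted)]
```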
Usage
See ./sparse_numeric_table/tests.
1st) You create a dict representing the structure and dtype of your table.
Columns which only appear together are bundled into a level. Each level has an index to merge and join with other levels.
    my_table_structure = {
        "A": {
            "a": {"dtype": "<u8"},
            "b": {"dtype": "<f8"},
            "c": {"dtype": "<f4"},
        },
        "B": {
            "g": {"dtype": "<i8"},
        },
        "C": {
            "m": {"dtype": "<i2"},
            "n": {"dtype": "<u8", "comment": "Some comment related to 'n'."},
        },
    }
Here A, B, and C are the level-keys. a, ... , n are the column-keys.
You can add comments for yourself, but sparse_numeric_table will ignore these.
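To show how such a structure maps onto numpy, the sketch below derives one record dtype per level and creates an empty table from it. The helper `level_dtype` is hypothetical, and the index column name `idx` with dtype `<u8` is an assumption based on the restriction that the index is an unsigned integer.

```python
import numpy as np

my_table_structure = {
    "A": {
        "a": {"dtype": "<u8"},
        "b": {"dtype": "<f8"},
        "c": {"dtype": "<f4"},
    },
    "B": {
        "g": {"dtype": "<i8"},
    },
    "C": {
        "m": {"dtype": "<i2"},
        "n": {"dtype": "<u8", "comment": "Some comment related to 'n'."},
    },
}


def level_dtype(level_structure, index_key="idx", index_dtype="<u8"):
    """Build the record dtype of one level: the index first, then the columns."""
    fields = [(index_key, index_dtype)]
    for column_key, column in level_structure.items():
        # Only 'dtype' is used; other entries such as 'comment' are skipped.
        fields.append((column_key, column["dtype"]))
    return np.dtype(fields)


# An empty table: one zero-length recarray per level.
empty_table = {
    level_key: np.zeros(0, dtype=level_dtype(level)).view(np.recarray)
    for level_key, level in my_table_structure.items()
}
```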
2nd) You create/read/write the table.
         A           B            C
     idx a b c     idx g       idx m n
     ___ _ _ _     ___ _
    |_0_|_|_|_|   |_0_|_|
    |_1_|_|_|_|
    |_2_|_|_|_|    ___ _
    |_3_|_|_|_|   |_3_|_|
    |_4_|_|_|_|   |_4_|_|      ___ _ _
    |_5_|_|_|_|   |_5_|_|     |_5_|_|_|
    |_6_|_|_|_|
    |_7_|_|_|_|
    |_8_|_|_|_|    ___ _
    |_9_|_|_|_|   |_9_|_|
    |10_|_|_|_|   |10_|_|
    |11_|_|_|_|    ___ _       ___ _ _
    |12_|_|_|_|   |12_|_|     |12_|_|_|
    |13_|_|_|_|    ___ _
    |14_|_|_|_|   |14_|_|
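The index columns drawn above can be written down directly to see which rows exist in which level. This is a small numpy-only sketch; it does not use the package's own functions.

```python
import numpy as np

# Index columns of the three levels as shown in the diagram.
idx_a = np.arange(15, dtype="<u8")
idx_b = np.array([0, 3, 4, 5, 9, 10, 12, 14], dtype="<u8")
idx_c = np.array([5, 12], dtype="<u8")

# Indices that are present in all three levels.
common = np.intersect1d(np.intersect1d(idx_a, idx_b), idx_c)

# Mask over level 'A' marking the rows that also exist in level 'B'.
mask_a_in_b = np.isin(idx_a, idx_b)
```

Only the indices 5 and 12 appear in every level, so a query that needs columns from A, B, and C can only return those two rows.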