Multiprocessing-enabled, out-of-memory data analysis library for tabular data.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Tablite


Overview

NEWS: Tablite 2022.7 has breaking changes: even smaller memory requirements, multiprocessing enabled by default, and faster than ever before. See the tutorial for details.

We're all tired of reinventing the wheel when we need to process a bit of data.

Pandas has a huge memory overhead when the datatypes are messy (hint: they are!).
Numpy has become a language of its own. It just doesn't seem pythonic anymore.
Arrow isn't ready.
SQLite is great but just too slow, particularly on disk.
Protobuf is just overkill for storing data when I still need to implement all the analytics after that.

So what do we do? We write a custom built class for the problem at hand and discover that we've just spent 3 hours doing something that should have taken 20 minutes. No more please!

Solution: Tablite

A Python library for tables that does everything you need in < 200 kB.

Install: pip install tablite
Usage: >>> from tablite import Table

Table is multiprocessing-enabled by default, and it:

behaves like a dict with lists: my_table[column name] = [... data ...]
handles all Python datatypes natively: str, float, bool, int, date, datetime, time, timedelta and None
uses HDF5 as storage, which is faster than mmap'ed files for the average case. For 10,000,000 integers, Python will use < 1 MB of RAM instead of 133.7 MB (1M lists with 10 integers).
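The low RAM figure comes from keeping column data on disk and reading only what is needed. Below is a minimal sketch of that idea, assuming a raw binary file written with the standard-library `array` module in place of HDF5 (a swap made only to keep the example dependency-free). `DiskColumn` is a hypothetical name; this is not how tablite stores its data.

```python
import os
import tempfile
from array import array


class DiskColumn:
    """Hypothetical disk-backed column of 64-bit signed integers.

    The values live in a binary file; read() loads only the requested slice.
    """

    TYPECODE = "q"  # C signed long long, at least 8 bytes

    def __init__(self, path, values):
        self.path = path
        data = array(self.TYPECODE, values)  # built in memory once, for simplicity
        self.itemsize = data.itemsize
        self.length = len(data)
        with open(path, "wb") as f:
            data.tofile(f)

    def __len__(self):
        return self.length

    def read(self, start, stop):
        """Return values[start:stop] (clamped to the column) as an array."""
        start, stop = max(0, start), min(self.length, stop)
        out = array(self.TYPECODE)
        if stop > start:
            with open(self.path, "rb") as f:
                f.seek(start * self.itemsize)  # jump straight to the first item
                out.fromfile(f, stop - start)
        return out


with tempfile.TemporaryDirectory() as tmp:
    col = DiskColumn(os.path.join(tmp, "col.bin"), range(1_000_000))
    print(len(col), list(col.read(10, 15)))  # 1000000 [10, 11, 12, 13, 14]
```

Only the five requested integers are held in memory after construction; a real HDF5 store adds compression, chunking and typed datasets on top of this basic pattern.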

An instance of a table allows you to:

get rows in a column as mytable['A']
get rows across all columns as mytable[4:8]
slice as mytable['A', 'B', slice(4,8) ].
update individual values with mytable['A'][2] = new value
update many values even faster with list comprehensions such as: mytable['A'] = [ f(x) if x % 2 != 0 else x for x in mytable['A'] ]
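The indexing rules above can be sketched with a plain dict of equal-length lists. `MiniTable` is a hypothetical class written for illustration only; it assumes that a row slice such as `t[4:8]` returns a new table, which may differ from what tablite actually returns.

```python
class MiniTable:
    """Hypothetical table: a dict of equal-length column lists (not tablite's code)."""

    def __init__(self):
        self.columns = {}

    def __setitem__(self, name, values):
        values = list(values)
        others = [v for n, v in self.columns.items() if n != name]
        if others and len(others[0]) != len(values):
            raise ValueError(f"column {name!r} has {len(values)} values, "
                             f"expected {len(others[0])}")
        self.columns[name] = values

    def __getitem__(self, key):
        if isinstance(key, str):  # t['A'] -> the column list itself
            return self.columns[key]
        if isinstance(key, slice):  # t[4:8] -> those rows, all columns
            key = tuple(self.columns) + (key,)
        if isinstance(key, tuple):  # t['A', 'B', slice(4, 8)]
            names = [k for k in key if isinstance(k, str)]
            slices = [k for k in key if isinstance(k, slice)]
            rows = slices[0] if slices else slice(None)
            sub = MiniTable()
            for name in names:
                sub[name] = self.columns[name][rows]
            return sub
        raise TypeError(f"unsupported key: {key!r}")

    @property
    def rows(self):
        """Rows as lists, in column order."""
        return [list(r) for r in zip(*self.columns.values())]


t = MiniTable()
t['A'] = range(10)
t['B'] = [x * x for x in range(10)]
t['A'][2] = -1                        # in-place update of a single value
print(t['A', 'B', slice(4, 8)].rows)  # [[4, 16], [5, 25], [6, 36], [7, 49]]
```

Because `t['A']` returns the stored list itself, item assignment updates the table in place, while assigning a whole new list goes through `__setitem__` and its length check.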

You can:

Use Table.import_file to import csv*, tsv, txt, xls, xlsx, xlsm, ods, zip and logs. There is automatic type detection (see tutorial.ipynb).
To peek into any supported file, use get_headers, which shows the first 10 rows.
Use mytable.rows and mytable.columns to iterate over rows or columns.
Create multi-key .index for quick lookups.
Perform multi-key .sort.
Filter using .any and .all to select specific rows.
Use multi-key .lookup and .join to find data across tables.
Perform .groupby and reorganise data as a .pivot table with max, min, sum, first, last, count, unique, average, standard deviation, median and mode.
Append / concatenate tables with +=, which automatically sorts out the columns, even if they're not in perfect order.
Should your tables be similar but not identical, you can use .stack to "stack" tables on top of each other.
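To show what a multi-key .groupby with aggregate functions computes, here is a minimal sketch over a plain dict of column lists. The `groupby` and `count` functions below are hypothetical stand-ins (tablite's own aggregates live under `gb`, e.g. `gb.count`); the output column naming such as `'sum(A)'` is an assumption made for readability.

```python
from collections import defaultdict


def count(values):
    """Number of values in a group (stand-in for an aggregate like gb.count)."""
    return len(values)


def groupby(columns, keys, functions):
    """Hypothetical group-by over a dict of equal-length column lists.

    keys:      column names to group on, e.g. ['C', 'B']
    functions: (column, aggregate) pairs, e.g. [('A', count), ('A', sum)]
    Returns a dict of columns: the key columns first, then one column per
    aggregate named like 'sum(A)'. Groups appear in first-seen order.
    """
    value_cols = {col for col, _ in functions}
    # group key tuple -> {column name -> values in that group}
    groups = defaultdict(lambda: defaultdict(list))
    n_rows = len(next(iter(columns.values()), []))
    for i in range(n_rows):
        group_key = tuple(columns[k][i] for k in keys)
        bucket = groups[group_key]  # creates the group even with no functions
        for col in value_cols:
            bucket[col].append(columns[col][i])

    out = {k: [] for k in keys}
    for col, fn in functions:
        out[f"{fn.__name__}({col})"] = []
    for group_key, bucket in groups.items():
        for k, v in zip(keys, group_key):
            out[k].append(v)
        for col, fn in functions:
            out[f"{fn.__name__}({col})"].append(fn(bucket[col]))
    return out


data = {'A': [1, 2, 3, 4, 5], 'B': [1, 1, 2, 2, 2], 'C': ['x', 'x', 'x', 'y', 'y']}
print(groupby(data, keys=['C', 'B'], functions=[('A', count), ('A', sum)]))
# {'C': ['x', 'x', 'y'], 'B': [1, 2, 2], 'count(A)': [2, 1, 2], 'sum(A)': [3, 3, 9]}
```

Any callable that reduces a list to one value (`max`, `min`, `statistics.median`, ...) can be plugged in as an aggregate in this sketch.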

You can store or send data using JSON by:

dumping to JSON: json_str = table.to_json(), and
loading it back with Table.from_json(json_str).
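JSON has no date, time or timedelta types, so a round trip of a table holding the datatypes listed earlier needs some way to tag those values. The sketch below uses a hypothetical `{"__type__": ..., "value": ...}` tagging scheme on a plain dict of column lists; it is an illustration of the problem, not tablite's actual JSON format.

```python
import json
from datetime import date, datetime, time, timedelta


def _encode(value):
    """Tag values JSON cannot represent (hypothetical tagging scheme)."""
    if isinstance(value, datetime):  # test datetime first: it subclasses date
        return {"__type__": "datetime", "value": value.isoformat()}
    if isinstance(value, date):
        return {"__type__": "date", "value": value.isoformat()}
    if isinstance(value, time):
        return {"__type__": "time", "value": value.isoformat()}
    if isinstance(value, timedelta):
        # store the exact components rather than a float of seconds
        return {"__type__": "timedelta",
                "value": [value.days, value.seconds, value.microseconds]}
    return value  # str, int, float, bool and None are native JSON


def _decode(value):
    """Undo _encode for tagged values; pass everything else through."""
    if isinstance(value, dict) and "__type__" in value:
        kind, raw = value["__type__"], value["value"]
        if kind == "datetime":
            return datetime.fromisoformat(raw)
        if kind == "date":
            return date.fromisoformat(raw)
        if kind == "time":
            return time.fromisoformat(raw)
        if kind == "timedelta":
            days, seconds, micros = raw
            return timedelta(days=days, seconds=seconds, microseconds=micros)
    return value


def to_json(columns):
    """Serialise a dict of column lists to a JSON string."""
    return json.dumps({name: [_encode(v) for v in values]
                       for name, values in columns.items()})


def from_json(json_str):
    """Rebuild the dict of column lists produced by to_json."""
    return {name: [_decode(v) for v in values]
            for name, values in json.loads(json_str).items()}
```

For example, `from_json(to_json({"when": [date(2022, 7, 25), None]}))` gives back a `date` object rather than a string.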

One-liners

loop over rows: [ row for row in table.rows ]
loop over columns: [ table[col_name] for col_name in table.columns ]
slice: myslice = table['A', 'B', slice(0,None,15)]
join: left_join = numbers.left_join(letters, left_keys=['colour'], right_keys=['color'], left_columns=['number'], right_columns=['letter'])
lookup: travel_plan = friends.lookup(bustable, (DataTypes.time(21, 10), "<=", 'time'), ('stop', "==", 'stop'))
groupby: group_by = table.groupby(keys=['C', 'B'], functions=[('A', gb.count)])
pivot table: my_pivot = t.pivot(rows=['C'], columns=['A'], functions=[('B', gb.sum), ('B', gb.count)], values_as_rows=False)
index: indices = old_table.index(*old_table.columns)
sort: lookup1_sorted = lookup_1.sort(**{'time': True, 'name':False, "sort_mode":'unix'})
filter: true,false = unfiltered.filter( [{"column1": 'a', "criteria":">=", 'value2':3}, ... more criteria ... ], filter_type='all' )
any: even = mytable.any(A=lambda x: x % 2 == 0, B=lambda x: x > 0)
all: even = mytable.all(A=lambda x: x % 2 == 0, B=lambda x: x > 0)
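The any/all one-liners select rows by per-column criteria. A minimal sketch of the selection logic on a plain dict of column lists follows; `select_rows` is a hypothetical helper, and the rule that a non-callable criterion is compared with `==` is an assumption for illustration, not documented tablite behaviour.

```python
def _matches(criterion, value):
    """A criterion is either a predicate (callable) or a value compared with ==."""
    return criterion(value) if callable(criterion) else value == criterion


def select_rows(columns, mode, **criteria):
    """Hypothetical any/all row filter over a dict of equal-length column lists.

    mode='any' keeps rows where at least one criterion matches;
    mode='all' keeps rows where every criterion matches.
    """
    combine = {"any": any, "all": all}[mode]
    n_rows = len(next(iter(columns.values()), []))
    keep = [
        i for i in range(n_rows)
        if combine(_matches(c, columns[name][i]) for name, c in criteria.items())
    ]
    return {name: [values[i] for i in keep] for name, values in columns.items()}


data = {'A': [1, 2, 3, 4], 'B': [5, -1, -2, 6]}
print(select_rows(data, "any", A=lambda x: x % 2 == 0, B=lambda x: x > 0))
# {'A': [1, 2, 4], 'B': [5, -1, 6]}
print(select_rows(data, "all", A=lambda x: x % 2 == 0, B=lambda x: x > 0))
# {'A': [4], 'B': [6]}
```

The only difference between the two modes is whether Python's built-in `any` or `all` combines the per-column results for each row.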

Tutorial

To learn more, see tutorial.ipynb.

Credits

Martynas Kaunas - GroupBy functionality.
Audrius Kulikajevas - Edge case testing / various bugs.


Release history

2023.11.5

Apr 22, 2024

2023.11.4

Apr 17, 2024

2023.11.3

Apr 12, 2024

2023.11.2

Apr 10, 2024

2023.11.1

Apr 8, 2024

2023.11.0

Apr 5, 2024

2023.10.15

Apr 4, 2024

2023.10.14

Mar 27, 2024

2023.10.13

Mar 20, 2024

2023.10.12

Mar 18, 2024

2023.10.11

Mar 15, 2024

2023.10.10

Mar 15, 2024

2023.10.9

Mar 14, 2024

2023.10.8

Mar 8, 2024

2023.10.7

Mar 8, 2024

2023.10.6

Mar 7, 2024

2023.10.5

Mar 6, 2024

2023.10.4

Mar 6, 2024

2023.10.3

Mar 4, 2024

2023.10.2

Feb 28, 2024

2023.10.1

Feb 26, 2024

2023.10.0

Feb 22, 2024

2023.9.8

Feb 6, 2024

2023.9.7

Feb 1, 2024

2023.9.6

Jan 31, 2024

2023.9.5

Jan 30, 2024

2023.9.4

Jan 29, 2024

2023.9.3

Jan 26, 2024

2023.9.2

Jan 26, 2024

2023.9.1

Jan 25, 2024

2023.9.0

Jan 25, 2024

2023.8.11

Nov 24, 2023

2023.8.10

Nov 16, 2023

2023.8.9

Nov 15, 2023

2023.8.8

Nov 14, 2023

2023.8.7

Nov 8, 2023

2023.8.6

Nov 8, 2023

2023.8.5

Nov 8, 2023

2023.8.4

Nov 7, 2023

2023.8.3

Oct 26, 2023

2023.8.2

Oct 25, 2023

2023.8.1

Oct 24, 2023

2023.8.0

Oct 23, 2023

2023.8.dev72 pre-release

Nov 8, 2023

2023.8.dev7 pre-release

Oct 17, 2023

2023.8.dev6 pre-release

Oct 12, 2023

2023.8.dev5 pre-release

Oct 10, 2023

2023.8.dev4 pre-release

Oct 6, 2023

2023.8.dev3 pre-release

Oct 5, 2023

2023.8.dev2 pre-release

Oct 5, 2023

2023.8.dev1 pre-release

Oct 4, 2023

2023.8.dev0 pre-release

Oct 4, 2023

2023.7.dev6 pre-release

Sep 26, 2023

2023.7.dev5 pre-release

Sep 25, 2023

2023.7.dev4 pre-release

Aug 31, 2023

2023.7.dev3 pre-release

Aug 30, 2023

2023.7.dev2 pre-release

Aug 28, 2023

2023.7.dev1 pre-release

Aug 25, 2023

2023.7.dev0 pre-release

Aug 23, 2023

2023.6.5

Aug 18, 2023

2023.6.4

Aug 16, 2023

2023.6.3

Aug 14, 2023

2023.6.2

Aug 10, 2023

2023.6.1

Aug 1, 2023

2023.6.dev14 pre-release

Jul 13, 2023

2023.6.dev13 pre-release

Jul 11, 2023

2023.6.dev12 pre-release

Jul 3, 2023

2023.6.dev11 pre-release

Jul 3, 2023

2023.6.dev10 pre-release

Jun 27, 2023

2023.6.dev9 pre-release

Jun 22, 2023

2023.6.dev8 pre-release

Jun 19, 2023

2023.6.dev7 pre-release

Jun 19, 2023

2023.6.dev6 pre-release

Jun 16, 2023

2023.6.dev5 pre-release

Jun 13, 2023

2023.6.dev4 pre-release

Jun 13, 2023

2023.6.dev3 pre-release

Jun 12, 2023

2023.6.dev2 pre-release

Jun 9, 2023

2023.6.dev1 pre-release

Jun 6, 2023

2022.11.19

May 15, 2023

2022.11.18

May 8, 2023

2022.11.17

Apr 14, 2023

2022.11.16

Apr 7, 2023

2022.11.15

Apr 6, 2023

2022.11.14

Mar 31, 2023

2022.11.13

Mar 29, 2023

2022.11.12

Mar 17, 2023

2022.11.11

Mar 16, 2023

2022.11.10

Mar 13, 2023

2022.11.9

Mar 10, 2023

2022.11.8

Mar 9, 2023

2022.11.7

Mar 8, 2023

2022.11.6

Feb 28, 2023

2022.11.5

Feb 20, 2023

2022.11.4

Jan 26, 2023

2022.11.3

Dec 4, 2022

2022.11.2

Nov 28, 2022

2022.11.1

Nov 28, 2022

2022.11.0

Nov 23, 2022

2022.11.dev6 pre-release

Nov 18, 2022

2022.11.dev5 pre-release

Nov 14, 2022

2022.11.dev4 pre-release

Nov 9, 2022

2022.11.dev3 pre-release

Nov 7, 2022

2022.11.dev2 pre-release

Nov 5, 2022

2022.11.dev1 pre-release

Nov 5, 2022

2022.10.12

Oct 30, 2022

2022.10.11

Oct 20, 2022

2022.10.10

Oct 19, 2022

2022.10.9

Oct 18, 2022

2022.10.8

Oct 10, 2022

2022.10.7

Sep 8, 2022

2022.10.6

Sep 7, 2022

2022.10.5

Sep 5, 2022

2022.10.4

Aug 30, 2022

2022.10.3

Aug 21, 2022

2022.10.2

Aug 21, 2022

2022.10.1

Aug 21, 2022

2022.10.0

Aug 21, 2022

2022.9.3

Aug 19, 2022

2022.9.1

Aug 18, 2022

2022.9.0

Aug 16, 2022

2022.8.0

Aug 7, 2022

2022.7.9

Aug 5, 2022

2022.7.8

Aug 4, 2022

2022.7.7

Aug 3, 2022

2022.7.6

Jul 26, 2022

2022.7.5

Jul 26, 2022

This version

2022.7.4

Jul 25, 2022

2022.7.3

Jul 25, 2022

2022.7.2

Jul 21, 2022

2022.7.1

Jul 21, 2022

2022.7.0

Jul 14, 2022

2022.7.dev5 pre-release

Jul 12, 2022

2022.7.dev4 pre-release

Jul 12, 2022

2022.7.dev2 pre-release

Jul 8, 2022

2022.7.dev0 pre-release

Jul 13, 2022

2022.2.14.79350

Feb 14, 2022

2022.2.5.67057

Feb 5, 2022

2022.1.26.39915

Jan 26, 2022

2022.1.26.31981

Jan 26, 2022

2022.1.25.68738

Jan 25, 2022

2022.1.24.56156

Jan 24, 2022

2021.11.5.66041

Nov 5, 2021

2021.11.3.62708

Nov 3, 2021

2021.6.15.38091

Jun 15, 2021

2021.5.21.47274

May 21, 2021

2021.5.20.64155

May 20, 2021

2021.3.11.45804

Mar 11, 2021

2021.3.11.33688

Mar 11, 2021

2021.3.4.65414

Mar 4, 2021

2021.3.2.36410

Mar 2, 2021

2021.2.18.60263

Feb 18, 2021

2021.2.18.54360

Feb 18, 2021

2021.2.15.44662

Feb 15, 2021

2021.2.10.52756

Feb 10, 2021

2020.12.21.68845

Dec 21, 2020

2020.11.3.62707

Nov 3, 2020

2020.11.3.61944

Nov 3, 2020

2020.11.3.59813

Nov 3, 2020

2020.11.3.53696

Nov 3, 2020

2020.10.30.46577

Oct 30, 2020

2020.10.29.44766

Oct 29, 2020

2020.10.28.59904

Oct 28, 2020

2020.10.28.59727

Oct 28, 2020

2020.10.28.59455

Oct 28, 2020

2020.9.30.51757

Sep 30, 2020

2020.7.17.40404

Jul 17, 2020

2020.7.16.58401

Jul 16, 2020

2020.6.30.66481

Jun 30, 2020

2020.6.30.57694

Jun 30, 2020

2020.6.28.62006

Jun 28, 2020

2020.6.28.56572

Jun 28, 2020

2020.6.28.55011

Jun 28, 2020

2020.6.27.58703

Jun 27, 2020

2020.6.27.54477

Jun 27, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distribution

tablite-2022.7.4-py3-none-any.whl (55.1 kB)

Uploaded Jul 25, 2022 Python 3

Hashes for tablite-2022.7.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`305914c993465b2720cfccbdcabc1aab4882c7c1f8b95fb1597ce0aaf5f7fbf2`
MD5	`840d1426fb984107fdc3f51487703799`
BLAKE2b-256	`6aee4e2c7095d208fadcc59444faf0b3f95518fb682542c2857a247ed2bca44e`