Multiprocessing-enabled, out-of-memory data analysis library for tabular data.

Tablite

Overview

Tablite seeks to be the go-to library for tabular data, with an API whose syntax is as close to pure Python as possible.

Even smaller memory footprint

Tablite uses HDF5 as a backend behind a strong abstraction layer, so copying, appending and repeating data are all handled in pages. This is essential for incremental data processing.
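
A minimal sketch of the paging idea, assuming immutable in-memory pages. `PagedColumn` and `PAGE_SIZE` are hypothetical names, not Tablite's API, and Tablite keeps its pages in HDF5 rather than in Python tuples. Because a page never changes once written, copying or repeating a column only copies references to pages, never the values themselves.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical page size; a real on-disk backend would use far larger pages.
PAGE_SIZE = 4


class PagedColumn:
    """Hypothetical column built from immutable, shareable pages (tuples)."""

    def __init__(self, pages: list[tuple] | None = None) -> None:
        # A fresh list of page references; the pages themselves may be shared.
        self.pages: list[tuple] = list(pages) if pages else []

    def extend(self, values: Iterable) -> None:
        buffer = list(values)
        # Top up a partially filled last page by replacing it with a new tuple,
        # so any other column sharing the old page is unaffected.
        if self.pages and len(self.pages[-1]) < PAGE_SIZE:
            room = PAGE_SIZE - len(self.pages[-1])
            self.pages[-1] = self.pages[-1] + tuple(buffer[:room])
            buffer = buffer[room:]
        for start in range(0, len(buffer), PAGE_SIZE):
            self.pages.append(tuple(buffer[start:start + PAGE_SIZE]))

    def copy(self) -> PagedColumn:
        # Copies page references only: cost grows with pages, not values.
        return PagedColumn(self.pages)

    def __mul__(self, times: int) -> PagedColumn:
        # Repetition reuses the same page objects `times` times.
        return PagedColumn(self.pages * times)

    def __len__(self) -> int:
        return sum(len(page) for page in self.pages)

    def __iter__(self) -> Iterator:
        for page in self.pages:
            yield from page
```

For example, after `col.extend(range(10))` the column holds the pages `(0, 1, 2, 3)`, `(4, 5, 6, 7)` and `(8, 9)`, and `(col * 3).pages[0] is col.pages[0]` is true: the repeated column shares storage with the original.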

Tablite's test suite checks memory footprint. One test compares the footprint of 10,000,000 integers: Tablite uses < 1 MB of RAM, whereas pure Python requires around 133.7 MB (1M lists with 10 integers each). Tablite also tests that working with 1 TB of data remains tolerable.

Tablite achieves this by using HDF5 as storage, which is faster than mmap'ed files in the average case [1, 2]. All data is stored in /tmp/tablite.hdf5, so if your OS (Windows/Linux/macOS) sits on an SSD, it benefits from high IOPS, which permits slicing 9,000,000,000 rows in less than a second.

Multiprocessing enabled by default

Tablite uses multiprocessing to bypass the GIL on all major operations. CSV import is tested with 96M fields, which are imported and type-mapped to native Python types in 120 seconds.
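
The chunk-and-map pattern behind parallel type mapping can be sketched with the standard library alone. `infer_value`, `map_chunk` and `type_map` are hypothetical helpers, and the inference order (int, then float, then ISO date, else str) is an assumption for illustration; Tablite's own type detection is richer.

```python
from __future__ import annotations

from concurrent.futures import Executor
from datetime import date
from typing import Any


def infer_value(text: str) -> Any:
    """Hypothetical rule set: map one CSV field to a native Python value."""
    if text == "":
        return None
    for convert in (int, float, date.fromisoformat):
        try:
            return convert(text)
        except ValueError:
            pass  # not this type; try the next converter
    return text


def map_chunk(fields: list[str]) -> list[Any]:
    # Module-level function, so it can be pickled and sent to worker processes.
    return [infer_value(f) for f in fields]


def type_map(fields: list[str], chunk_size: int = 10_000,
             executor: Executor | None = None) -> list[Any]:
    """Split fields into chunks and type-map each chunk, serially or via an executor."""
    chunks = [fields[i:i + chunk_size] for i in range(0, len(fields), chunk_size)]
    mapper = executor.map if executor is not None else map
    result: list[Any] = []
    for mapped in mapper(map_chunk, chunks):  # results come back in input order
        result.extend(mapped)
    return result
```

Passing a `concurrent.futures.ProcessPoolExecutor` as `executor` (inside an `if __name__ == "__main__":` guard) spreads the chunks across processes, which is how this kind of work sidesteps the GIL; with no executor the same code runs serially, which keeps it easy to test.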

All algorithms have been reworked to respect memory limits

Tablite respects the limits of free memory by checking how much memory is available and defining the task size before each memory-intensive task (join, groupby, data import, etc.) is started.
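
The sizing step can be sketched as plain arithmetic. `plan_tasks`, the `safety` fraction and the per-row byte estimate are hypothetical; how free memory is measured (for example with a third-party library such as psutil) is deliberately left out.

```python
from __future__ import annotations


def plan_tasks(total_rows: int, bytes_per_row: int, free_bytes: int,
               workers: int = 1, safety: float = 0.5) -> list[tuple[int, int]]:
    """Hypothetical planner: split rows into (start, stop) ranges so that all
    workers running at once use at most `safety * free_bytes` in total."""
    if total_rows <= 0:
        return []
    budget_per_worker = int(free_bytes * safety) // max(workers, 1)
    # Always make progress, even if a single row exceeds the budget.
    rows_per_task = max(1, budget_per_worker // max(bytes_per_row, 1))
    return [(start, min(start + rows_per_task, total_rows))
            for start in range(0, total_rows, rows_per_task)]
```

For example, with 8 GiB free, 4 workers, the default safety factor of 0.5 and an estimated 1 KiB per row, each task covers 1,048,576 rows (8 GiB × 0.5 / 4 / 1 KiB).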

100% support for all Python datatypes

Tablite wants to make it easy for you to work with data. tablite.Table objects behave like a dict of lists:

`my_table['column name'] = [... data ...]`

Tablite maps datatypes to native HDF5 types where possible and uses type mapping for non-native types such as timedelta, None, date and time, so what you put in is what you get out. This approach is inspired by Bank Python.
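
A reversible encoding is what makes "what you put in is what you get out" possible for types without a native HDF5 counterpart. The tagged-tuple scheme below (ISO strings for dates and times, microsecond counts for timedelta) is a hypothetical illustration, not Tablite's storage format.

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta
from typing import Any


def encode(value: Any) -> tuple[str, Any]:
    """Hypothetical encoder: return (type tag, storage-friendly payload)."""
    if value is None:
        return ("none", "")
    if isinstance(value, bool):  # before int: bool is a subclass of int
        return ("bool", int(value))
    if isinstance(value, (int, float, str)):
        return (type(value).__name__, value)
    if isinstance(value, datetime):  # before date: datetime is a subclass of date
        return ("datetime", value.isoformat())
    if isinstance(value, date):
        return ("date", value.isoformat())
    if isinstance(value, time):
        return ("time", value.isoformat())
    if isinstance(value, timedelta):
        return ("timedelta", value // timedelta(microseconds=1))  # exact int
    raise TypeError(f"unsupported type: {type(value).__name__}")


_DECODERS = {
    "none": lambda _: None,
    "bool": bool,
    "int": int,
    "float": float,
    "str": str,
    "datetime": datetime.fromisoformat,
    "date": date.fromisoformat,
    "time": time.fromisoformat,
    "timedelta": lambda us: timedelta(microseconds=us),
}


def decode(tag: str, payload: Any) -> Any:
    """Hypothetical decoder: rebuild the original Python value from its tag."""
    return _DECODERS[tag](payload)
```

The tag is what lets `decode` return a `date` rather than a string, and `True` rather than `1`, so every value in the round trip keeps its original type.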

Lightweight

Tablite is ~200 kB.

Helpful

Tablite wants you to be productive, so a number of helpers are available.

Table.import_file imports csv*, tsv, txt, xls, xlsx, xlsm, ods, zip and log files, with automatic type detection (see tutorial.ipynb).
To peek into any supported file, use get_headers, which shows the first 10 rows.
Use mytable.rows and mytable.columns to iterate over rows or columns.
Create a multi-key .index for quick lookups.
Perform a multi-key .sort.
Filter using .any and .all to select specific rows.
Use multi-key .lookup and .join to find data across tables.
Perform .groupby and reorganise data as a .pivot table with max, min, sum, first, last, count, unique, average, standard deviation, median and mode.
Append or concatenate tables with +=, which automatically aligns the columns even if they are not in the same order.
Should your tables be similar but not identical, use .stack to "stack" them on top of each other.
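
Aligning columns by name before appending can be sketched with plain dicts of lists. `concat_aligned` and `Columns` are hypothetical; raising on mismatched column sets (the `+=` case) and padding missing columns with `None` (a `.stack`-like case) are assumptions about the behaviour described above, not Tablite's code.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a table: column name -> list of values.
Columns = Dict[str, List[Any]]


def concat_aligned(top: Columns, bottom: Columns, pad_missing: bool = False) -> Columns:
    """Append `bottom` under `top`, matching columns by name, not position.

    With pad_missing=False both tables must have the same column names (in any
    order). With pad_missing=True, columns missing from either side are filled
    with None (an assumed, .stack-like behaviour).
    """
    if not pad_missing and set(top) != set(bottom):
        raise ValueError(f"column mismatch: {sorted(set(top) ^ set(bottom))}")
    # Assumes every column within a table has the same length.
    top_len = len(next(iter(top.values()), []))
    bottom_len = len(next(iter(bottom.values()), []))
    # Keep the top table's column order, then any new columns from the bottom.
    names = list(top) + [name for name in bottom if name not in top]
    return {
        name: top.get(name, [None] * top_len) + bottom.get(name, [None] * bottom_len)
        for name in names
    }
```

Matching by name is what makes column order irrelevant: `{"x": [1], "y": ["a"]}` followed by `{"y": ["b"], "x": [2]}` yields `{"x": [1, 2], "y": ["a", "b"]}`. The inputs are never mutated; each output column is a new list.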

If you're still missing something, add it to the wishlist.

Installation

Tablite

Install: pip install tablite
Usage: >>> from tablite import Table

General overview

want to...	this way...
loop over rows	`[ row for row in table.rows ]`
loop over columns	`[ table[col_name] for col_name in table.columns ]`
slice	`myslice = table['A', 'B', slice(0,None,15)]`
get column by name	`my_table['A']`
get row by index	`my_table[9_000_000_001]`
value update	`mytable['A'][2] = new_value`
update w. list comprehension	`mytable['A'] = [ x*x for x in mytable['A'] if x % 2 != 0 ]`
join	`a_join = numbers.join(letters, left_keys=['colour'], right_keys=['color'], left_columns=['number'], right_columns=['letter'], kind='left')`
lookup	`travel_plan = friends.lookup(bustable, (DataTypes.time(21, 10), "<=", 'time'), ('stop', "==", 'stop'))`
groupby	`group_by = table.groupby(keys=['C', 'B'], functions=[('A', gb.count)])`
pivot table	`my_pivot = t.pivot(rows=['C'], columns=['A'], functions=[('B', gb.sum), ('B', gb.count)], values_as_rows=False)`
index	`indices = old_table.index(*old_table.columns)`
sort	`lookup1_sorted = lookup_1.sort(**{'time': True, 'name':False, "sort_mode":'unix'})`
filter	`true, false = unfiltered.filter( [{"column1": 'a', "criteria":">=", 'value2':3}, ... more criteria ... ], filter_type='all' )`
find any	`any_even_rows = mytable.any(**{'A': lambda x: x % 2 == 0, 'B': lambda x: x > 0})`
find all	`all_even_rows = mytable.all(**{'A': lambda x: x % 2 == 0, 'B': lambda x: x > 0})`
to json	`json_str = my_table.to_json()`
from json	`Table.from_json(json_str)`
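
The difference between `.any` and `.all` can be illustrated without Tablite, over a dict of column lists. `select_rows` is a hypothetical helper that mimics the described semantics (a row matches if at least one, or every, per-column predicate holds); it is not Tablite's implementation.

```python
from typing import Any, Callable, Dict, List


def select_rows(columns: Dict[str, List[Any]],
                criteria: Dict[str, Callable[[Any], bool]],
                mode: str = "all") -> List[int]:
    """Hypothetical helper: indices of rows satisfying any/all column predicates."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    # Assumes all columns have the same length.
    n_rows = len(next(iter(columns.values()), []))
    return [
        i for i in range(n_rows)
        if combine(test(columns[name][i]) for name, test in criteria.items())
    ]
```

With columns `A = [1, 2, 3, 4]` and `B = [5, -1, 0, 7]` and the predicates from the table above, `mode="all"` keeps only row 3 (A even and B positive), while `mode="any"` keeps rows 0, 1 and 3.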

Tutorial

To learn more, see tutorial.ipynb (a Jupyter notebook).

Credits

Martynas Kaunas - GroupBy functionality.
Audrius Kulikajevas - Edge case testing / various bugs.
realratchet - Jupyter notebook integration.

Release history

2023.11.5

Apr 22, 2024

2023.11.4

Apr 17, 2024

2023.11.3

Apr 12, 2024

2023.11.2

Apr 10, 2024

2023.11.1

Apr 8, 2024

2023.11.0

Apr 5, 2024

2023.10.15

Apr 4, 2024

2023.10.14

Mar 27, 2024

2023.10.13

Mar 20, 2024

2023.10.12

Mar 18, 2024

2023.10.11

Mar 15, 2024

2023.10.10

Mar 15, 2024

2023.10.9

Mar 14, 2024

2023.10.8

Mar 8, 2024

2023.10.7

Mar 8, 2024

2023.10.6

Mar 7, 2024

2023.10.5

Mar 6, 2024

2023.10.4

Mar 6, 2024

2023.10.3

Mar 4, 2024

2023.10.2

Feb 28, 2024

2023.10.1

Feb 26, 2024

2023.10.0

Feb 22, 2024

2023.9.8

Feb 6, 2024

2023.9.7

Feb 1, 2024

2023.9.6

Jan 31, 2024

2023.9.5

Jan 30, 2024

2023.9.4

Jan 29, 2024

2023.9.3

Jan 26, 2024

2023.9.2

Jan 26, 2024

2023.9.1

Jan 25, 2024

2023.9.0

Jan 25, 2024

2023.8.11

Nov 24, 2023

2023.8.10

Nov 16, 2023

2023.8.9

Nov 15, 2023

2023.8.8

Nov 14, 2023

2023.8.7

Nov 8, 2023

2023.8.6

Nov 8, 2023

2023.8.5

Nov 8, 2023

2023.8.4

Nov 7, 2023

2023.8.3

Oct 26, 2023

2023.8.2

Oct 25, 2023

2023.8.1

Oct 24, 2023

2023.8.0

Oct 23, 2023

2023.8.dev72 pre-release

Nov 8, 2023

2023.8.dev7 pre-release

Oct 17, 2023

2023.8.dev6 pre-release

Oct 12, 2023

2023.8.dev5 pre-release

Oct 10, 2023

2023.8.dev4 pre-release

Oct 6, 2023

2023.8.dev3 pre-release

Oct 5, 2023

2023.8.dev2 pre-release

Oct 5, 2023

2023.8.dev1 pre-release

Oct 4, 2023

2023.8.dev0 pre-release

Oct 4, 2023

2023.7.dev6 pre-release

Sep 26, 2023

2023.7.dev5 pre-release

Sep 25, 2023

2023.7.dev4 pre-release

Aug 31, 2023

2023.7.dev3 pre-release

Aug 30, 2023

2023.7.dev2 pre-release

Aug 28, 2023

2023.7.dev1 pre-release

Aug 25, 2023

2023.7.dev0 pre-release

Aug 23, 2023

2023.6.5

Aug 18, 2023

2023.6.4

Aug 16, 2023

2023.6.3

Aug 14, 2023

2023.6.2

Aug 10, 2023

2023.6.1

Aug 1, 2023

2023.6.dev14 pre-release

Jul 13, 2023

2023.6.dev13 pre-release

Jul 11, 2023

2023.6.dev12 pre-release

Jul 3, 2023

2023.6.dev11 pre-release

Jul 3, 2023

2023.6.dev10 pre-release

Jun 27, 2023

2023.6.dev9 pre-release

Jun 22, 2023

2023.6.dev8 pre-release

Jun 19, 2023

2023.6.dev7 pre-release

Jun 19, 2023

2023.6.dev6 pre-release

Jun 16, 2023

2023.6.dev5 pre-release

Jun 13, 2023

2023.6.dev4 pre-release

Jun 13, 2023

2023.6.dev3 pre-release

Jun 12, 2023

2023.6.dev2 pre-release

Jun 9, 2023

2023.6.dev1 pre-release

Jun 6, 2023

2022.11.19

May 15, 2023

2022.11.18

May 8, 2023

2022.11.17

Apr 14, 2023

2022.11.16

Apr 7, 2023

2022.11.15

Apr 6, 2023

2022.11.14

Mar 31, 2023

2022.11.13

Mar 29, 2023

2022.11.12

Mar 17, 2023

2022.11.11

Mar 16, 2023

2022.11.10

Mar 13, 2023

2022.11.9

Mar 10, 2023

2022.11.8

Mar 9, 2023

2022.11.7

Mar 8, 2023

2022.11.6

Feb 28, 2023

2022.11.5

Feb 20, 2023

2022.11.4

Jan 26, 2023

2022.11.3

Dec 4, 2022

2022.11.2

Nov 28, 2022

2022.11.1

Nov 28, 2022

2022.11.0

Nov 23, 2022

2022.11.dev6 pre-release

Nov 18, 2022

2022.11.dev5 pre-release

Nov 14, 2022

2022.11.dev4 pre-release

Nov 9, 2022

2022.11.dev3 pre-release

Nov 7, 2022

2022.11.dev2 pre-release

Nov 5, 2022

2022.11.dev1 pre-release

Nov 5, 2022

2022.10.12

Oct 30, 2022

2022.10.11

Oct 20, 2022

2022.10.10

Oct 19, 2022

2022.10.9

Oct 18, 2022

2022.10.8

Oct 10, 2022

2022.10.7

Sep 8, 2022

2022.10.6

Sep 7, 2022

2022.10.5

Sep 5, 2022

2022.10.4

Aug 30, 2022

2022.10.3

Aug 21, 2022

2022.10.2

Aug 21, 2022

2022.10.1 (this version)

Aug 21, 2022

2022.10.0

Aug 21, 2022

2022.9.3

Aug 19, 2022

2022.9.1

Aug 18, 2022

2022.9.0

Aug 16, 2022

2022.8.0

Aug 7, 2022

2022.7.9

Aug 5, 2022

2022.7.8

Aug 4, 2022

2022.7.7

Aug 3, 2022

2022.7.6

Jul 26, 2022

2022.7.5

Jul 26, 2022

2022.7.4

Jul 25, 2022

2022.7.3

Jul 25, 2022

2022.7.2

Jul 21, 2022

2022.7.1

Jul 21, 2022

2022.7.0

Jul 14, 2022

2022.7.dev5 pre-release

Jul 12, 2022

2022.7.dev4 pre-release

Jul 12, 2022

2022.7.dev2 pre-release

Jul 8, 2022

2022.7.dev0 pre-release

Jul 13, 2022

2022.2.14.79350

Feb 14, 2022

2022.2.5.67057

Feb 5, 2022

2022.1.26.39915

Jan 26, 2022

2022.1.26.31981

Jan 26, 2022

2022.1.25.68738

Jan 25, 2022

2022.1.24.56156

Jan 24, 2022

2021.11.5.66041

Nov 5, 2021

2021.11.3.62708

Nov 3, 2021

2021.6.15.38091

Jun 15, 2021

2021.5.21.47274

May 21, 2021

2021.5.20.64155

May 20, 2021

2021.3.11.45804

Mar 11, 2021

2021.3.11.33688

Mar 11, 2021

2021.3.4.65414

Mar 4, 2021

2021.3.2.36410

Mar 2, 2021

2021.2.18.60263

Feb 18, 2021

2021.2.18.54360

Feb 18, 2021

2021.2.15.44662

Feb 15, 2021

2021.2.10.52756

Feb 10, 2021

2020.12.21.68845

Dec 21, 2020

2020.11.3.62707

Nov 3, 2020

2020.11.3.61944

Nov 3, 2020

2020.11.3.59813

Nov 3, 2020

2020.11.3.53696

Nov 3, 2020

2020.10.30.46577

Oct 30, 2020

2020.10.29.44766

Oct 29, 2020

2020.10.28.59904

Oct 28, 2020

2020.10.28.59727

Oct 28, 2020

2020.10.28.59455

Oct 28, 2020

2020.9.30.51757

Sep 30, 2020

2020.7.17.40404

Jul 17, 2020

2020.7.16.58401

Jul 16, 2020

2020.6.30.66481

Jun 30, 2020

2020.6.30.57694

Jun 30, 2020

2020.6.28.62006

Jun 28, 2020

2020.6.28.56572

Jun 28, 2020

2020.6.28.55011

Jun 28, 2020

2020.6.27.58703

Jun 27, 2020

2020.6.27.54477

Jun 27, 2020

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

tablite-2022.10.1-py3-none-any.whl (59.1 kB)

Uploaded Aug 21, 2022 Python 3

Hashes for tablite-2022.10.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cadae289aa99b586b7acca4cceb56e8b7891c56973d71313ba147f10cd769722`
MD5	`d86f1bc902c5c77763a7aff67bfbb54c`
BLAKE2b-256	`705511a9775877ab173c25309fdc535c51e8a8f810126a63dee049128ba5239f`