Multiprocessing-enabled, out-of-memory data analysis library for tabular data.


Tablite


Introduction
Installation
Feature overview
API
Tutorial
Latest updates
Credits

Introduction

Tablite seeks to be the go-to library for manipulating tabular data, with an API that is as close in syntax to pure Python as possible.

Even smaller memory footprint

Tablite uses NumPy's file format as a backend with strong abstraction, so that copying, appending and repeating data are all handled in pages. This is essential for incremental data processing.
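The idea of page-based storage can be illustrated with a minimal sketch. It assumes each page is a separate .npy file and that a column is simply an ordered list of page paths; the class name PagedColumn and its methods are hypothetical and are not Tablite's API. The point it shows is that copy and repeat only duplicate references to pages, while append writes one new page without touching the existing ones.

```python
import os
import tempfile
import uuid

import numpy as np


class PagedColumn:
    """Hypothetical column stored as an ordered list of .npy pages in workdir."""

    def __init__(self, workdir, pages=None):
        self.workdir = workdir
        self.pages = list(pages or [])  # file paths; the data itself stays on disk

    def append(self, values):
        # Write the new values as one fresh page; existing pages are never rewritten.
        path = os.path.join(self.workdir, f"{uuid.uuid4().hex}.npy")
        np.save(path, np.asarray(values))
        self.pages.append(path)

    def copy(self):
        # Copying duplicates the list of page references, not the data.
        return PagedColumn(self.workdir, self.pages)

    def repeat(self, n):
        # Repetition reuses the same pages n times.
        return PagedColumn(self.workdir, self.pages * n)

    def __len__(self):
        # mmap_mode="r" maps each file instead of reading all values into RAM.
        return sum(len(np.load(p, mmap_mode="r")) for p in self.pages)

    def to_list(self):
        return [v for p in self.pages for v in np.load(p, mmap_mode="r").tolist()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a = PagedColumn(tmp)
        a.append([1, 2, 3])
        b = a.copy()
        b.append([4, 5])
        print(len(a), len(b), b.to_list())  # 3 5 [1, 2, 3, 4, 5]
```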

Tablite's test suite checks memory footprint. One test compares the memory footprint of 10,000,000 integers: Tablite uses < 1 MB of RAM, whereas plain Python requires around 133.7 MB (1M lists with 10 integers each). Tablite also tests that working with 1 TB of data remains tolerable.
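A rough way to run this kind of comparison yourself is to trace Python allocations with the standard-library tracemalloc module. This is a sketch under assumptions, not Tablite's actual test: it compares Python lists of ints against the same kind of values held as int64 in a memory-mapped .npy file, whose mapped pages are not counted by tracemalloc. The helper names are hypothetical, and your numbers will differ from the figures quoted above.

```python
import os
import tempfile
import tracemalloc

import numpy as np


def python_lists_peak(n_lists, width=10):
    """Peak traced bytes while building n_lists Python lists of `width` ints."""
    tracemalloc.start()
    data = [[i * width + j for j in range(width)] for i in range(n_lists)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return peak


def memmap_peak(n_values, workdir):
    """Peak traced bytes while opening n_values int64s from a memory-mapped .npy file."""
    path = os.path.join(workdir, "ints.npy")
    np.save(path, np.arange(n_values, dtype=np.int64))  # written before tracing starts
    tracemalloc.start()
    arr = np.load(path, mmap_mode="r")  # only the header is parsed; data stays on disk
    head = arr[:10].tolist()  # touch a small slice, as a viewer would
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del arr, head
    return peak


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("lists :", python_lists_peak(100_000), "bytes")
        print("memmap:", memmap_peak(1_000_000, tmp), "bytes")
```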

Tablite achieves this minimal memory footprint by keeping data in temporary storage, set in config.Config.workdir and defaulting to tempfile.gettempdir()/tablite-tmp. If your OS (Windows/Linux/macOS) sits on an SSD, Tablite benefits from the high IOPS, which permits slicing 9,000,000,000 rows in less than a second.
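A small sketch showing where that default location resolves on your machine, using only the standard library. The commented-out lines show how the workdir might be redirected to a faster disk; the import path tablite.config and assigning a pathlib.Path to Config.workdir are assumptions based on the config.Config.workdir name above, so check the api documentation before relying on them.

```python
import pathlib
import tempfile


def default_workdir():
    # The default described above: tempfile.gettempdir()/tablite-tmp
    return pathlib.Path(tempfile.gettempdir()) / "tablite-tmp"


if __name__ == "__main__":
    print(default_workdir())  # e.g. /tmp/tablite-tmp on a typical Linux setup

    # ASSUMPTION: module path and accepted value type are not confirmed by this README.
    # from tablite.config import Config
    # Config.workdir = pathlib.Path("/mnt/fast-ssd/tablite-tmp")
```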

Multiprocessing enabled by default

Tablite uses NumPy wherever possible and applies multiprocessing to bypass the GIL on all major operations. CSV import is written in Nim, compiled to C by the Nim compiler, and is as fast as the hardware allows.
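The general pattern of sidestepping the GIL can be sketched with the standard library's ProcessPoolExecutor: split a column into pages and let separate worker processes reduce each page with NumPy. This illustrates the technique under assumptions and is not Tablite's internals; page_sum, split_into_pages and parallel_sum are hypothetical names. This sketch pickles each page to the workers, whereas a disk-backed design could send page file paths instead to avoid that copy.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def page_sum(page):
    # Runs inside a worker process, each with its own interpreter and GIL.
    return int(np.sum(page))


def split_into_pages(values, page_size):
    # Consecutive slices of at most page_size items.
    return [values[i:i + page_size] for i in range(0, len(values), page_size)]


def parallel_sum(values, page_size=1_000_000, workers=None):
    pages = split_into_pages(values, page_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(page_sum, pages))


if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn worker processes.
    data = np.arange(10_000_000, dtype=np.int64)
    print(parallel_sum(data))  # 49999995000000
```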

All algorithms have been reworked to respect memory limits

Tablite respects the limits of free memory by checking how much memory is free and defining the task size before each memory-intensive task (join, groupby, data import, etc.) starts. If you still run out of memory, try reducing config.Config.PAGE_SIZE and rerun your program.
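Sizing work to the memory that is actually free can be sketched as a small planning function. Everything here is hypothetical: the helper names, the 50% safety factor and the assumption of a fixed number of bytes per row are illustrative, not Tablite's actual policy. In practice free_bytes might come from psutil.virtual_memory().available.

```python
def rows_per_task(free_bytes, bytes_per_row, safety_factor=0.5):
    """How many rows one task may hold, keeping headroom for the OS and Python."""
    budget = int(free_bytes * safety_factor)
    return max(1, budget // bytes_per_row)


def plan_tasks(total_rows, free_bytes, bytes_per_row, safety_factor=0.5):
    """Split [0, total_rows) into (start, stop) ranges that each fit the budget."""
    step = rows_per_task(free_bytes, bytes_per_row, safety_factor)
    return [(start, min(start + step, total_rows)) for start in range(0, total_rows, step)]


if __name__ == "__main__":
    # 10 rows of 40 bytes with 400 bytes free -> budget 200 bytes -> 5 rows per task
    print(plan_tasks(10, free_bytes=400, bytes_per_row=40))  # [(0, 5), (5, 10)]
```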

100% support for all Python datatypes

Tablite wants to make it easy for you to work with data. tablite.Table objects behave like a dict of lists:

my_table[column_name] = [... data ...]

Tablite maps data to native NumPy types where possible and uses type mapping for non-native types such as timedelta, None, date, time… In other words, what you put in is what you get out. This is inspired by bank python.
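The "what you put in is what you get out" principle can be sketched as follows: homogeneous columns of a type with a lossless NumPy counterpart get a native dtype, and anything else (mixed types, date, time, timedelta, None) falls back to a dtype=object array that keeps the original Python objects. The NATIVE mapping and the to_page/from_page helpers are assumptions for illustration, not Tablite's implementation.

```python
from datetime import date, time, timedelta

import numpy as np

# Assumed mapping: Python types with a lossless NumPy counterpart.
NATIVE = {bool: np.bool_, int: np.int64, float: np.float64, str: np.str_}


def to_page(values):
    """Use a native dtype for homogeneous native columns, else dtype=object."""
    kinds = {type(v) for v in values}
    if len(kinds) == 1:
        kind = kinds.pop()
        if kind in NATIVE:
            # Note: ints outside the int64 range would raise OverflowError here;
            # a fuller version would fall back to dtype=object instead.
            return np.array(values, dtype=NATIVE[kind])
    # Mixed or non-native types (date, time, timedelta, None, ...) are kept as-is.
    return np.array(values, dtype=object)


def from_page(page):
    # .tolist() turns NumPy scalars back into plain Python objects.
    return page.tolist()


if __name__ == "__main__":
    mixed = [date(2024, 1, 1), None, timedelta(days=1), time(21, 10), 1.5]
    print(to_page([1, 2, 3]).dtype)            # int64
    print(to_page(mixed).dtype)                # object
    print(from_page(to_page(mixed)) == mixed)  # True
```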

Lightweight

Tablite is ~200 kB.

Helpful

Tablite wants you to be productive, so a number of helpers are available.

Use Table.import_file to import csv*, tsv, txt, xls, xlsx, xlsm, ods, zip and log files, with automatic type detection (see tutorial.ipynb).
To peek into any supported file, use get_headers, which shows the first 10 rows.
Use mytable.rows and mytable.columns to iterate over rows or columns.
Create a multi-key .index for quick lookups.
Perform a multi-key .sort.
Filter using .any and .all to select specific rows.
Use multi-key .lookup and .join to find data across tables.
Perform .groupby and reorganise data as a .pivot table with max, min, sum, first, last, count, unique, average, standard deviation, median and mode.
Append / concatenate tables with +=, which automatically sorts out the columns, even if they're not in perfect order.
Should your tables be similar but not identical, you can use .stack to "stack" them on top of each other.
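The difference between += and .stack can be illustrated on plain dict-of-lists tables. This is a sketch under assumptions: concat and stack are hypothetical stand-ins, concat assumes both tables have the same column names in any order, stack pads missing columns with None, and all columns within one table are assumed to have the same length.

```python
def concat(a, b):
    """Hypothetical `a += b`: same column names required, order may differ."""
    if set(a) != set(b):
        raise ValueError(f"column mismatch: {sorted(set(a) ^ set(b))}")
    # Follow a's column order and look up b's columns by name.
    return {name: a[name] + b[name] for name in a}


def stack(a, b, fill=None):
    """Hypothetical `stack`: union of columns, missing cells padded with `fill`."""
    rows_a = len(next(iter(a.values()), []))
    rows_b = len(next(iter(b.values()), []))
    names = list(a) + [name for name in b if name not in a]
    return {
        name: a.get(name, [fill] * rows_a) + b.get(name, [fill] * rows_b)
        for name in names
    }


if __name__ == "__main__":
    a = {"A": [1, 2], "B": ["x", "y"]}
    b = {"B": ["z"], "A": [3]}   # same columns, different order
    c = {"A": [4], "C": [True]}  # overlapping but not identical columns
    print(concat(a, b))  # {'A': [1, 2, 3], 'B': ['x', 'y', 'z']}
    print(stack(a, c))   # {'A': [1, 2, 4], 'B': ['x', 'y', None], 'C': [None, None, True]}
```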

If you're still missing something, add it to the wishlist.

Installation

Get it from PyPI:

Install: pip install tablite
Usage: >>> from tablite import Table

Build & test

Install nim >= 2.0.0

Run: chmod +x ./build_nim.sh
Run: ./build_nim.sh

Should the default nim version not suit you, use Nim's environment manager (atlas) and run source nim-2.0.0/activate.sh on UNIX or nim-2.0.0/activate.bat on Windows.

Install Python >= 3.8
python -m venv /your/venv/dir
activate /your/venv/dir
pip install -r requirements.txt
pip install -r requirements_for_testing.py
pytest ./tests

Feature overview

want to...	this way...
loop over rows	`[ row for row in table.rows ]`
loop over columns	`[ table[col_name] for col_name in table.columns ]`
slice	`myslice = table['A', 'B', slice(0,None,15)]`
get column by name	`my_table['A']`
get row by index	`my_table[9_000_000_001]`
value update	`mytable['A'][2] = new_value`
update w. list comprehension	`mytable['A'] = [ x*x for x in mytable['A'] if x % 2 != 0 ]`
join	`a_join = numbers.join(letters, left_keys=['colour'], right_keys=['color'], left_columns=['number'], right_columns=['letter'], kind='left')`
lookup	`travel_plan = friends.lookup(bustable, (DataTypes.time(21, 10), "<=", 'time'), ('stop', "==", 'stop'))`
groupby	`group_by = table.groupby(keys=['C', 'B'], functions=[('A', gb.count)])`
pivot table	`my_pivot = t.pivot(rows=['C'], columns=['A'], functions=[('B', gb.sum), ('B', gb.count)], values_as_rows=False)`
index	`indices = old_table.index(*old_table.columns)`
sort	`lookup1_sorted = lookup_1.sort(**{'time': True, 'name':False, "sort_mode":'unix'})`
filter	`true, false = unfiltered.filter( [{"column1": 'a', "criteria":">=", 'value2':3}, ... more criteria ... ], filter_type='all' )`
find any	`any_even_rows = mytable.any(A=lambda x: x % 2 == 0, B=lambda x: x > 0)`
find all	`all_even_rows = mytable.all(A=lambda x: x % 2 == 0, B=lambda x: x > 0)`
to json	`json_str = my_table.to_json()`
from json	`Table.from_json(json_str)`
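The semantics of the "find any" and "find all" rows can be sketched on a plain dict of lists: each keyword names a column and maps to a predicate; any keeps a row if at least one predicate is true, and all keeps it only if every predicate is true. any_rows and all_rows are hypothetical helpers for illustration, not Tablite's implementation.

```python
def _select(table, criteria, combine):
    """Keep rows where combine(...) over the per-column predicates is true."""
    n_rows = len(next(iter(table.values()), []))
    keep = [
        i for i in range(n_rows)
        if combine(predicate(table[col][i]) for col, predicate in criteria.items())
    ]
    return {col: [values[i] for i in keep] for col, values in table.items()}


def any_rows(table, **criteria):
    # Row kept if at least one predicate holds.
    return _select(table, criteria, any)


def all_rows(table, **criteria):
    # Row kept only if every predicate holds.
    return _select(table, criteria, all)


if __name__ == "__main__":
    t = {"A": [1, 2, 3, 4], "B": [3, -5, 0, 7]}
    print(any_rows(t, A=lambda x: x % 2 == 0, B=lambda x: x > 0))  # {'A': [1, 2, 4], 'B': [3, -5, 7]}
    print(all_rows(t, A=lambda x: x % 2 == 0, B=lambda x: x > 0))  # {'A': [4], 'B': [7]}
```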

API

To view the detailed API, see api.

Tutorial

To learn more, see tutorial.ipynb (Jupyter notebook).

Latest updates

See changelog.md

Credits

Eugene Antonov - the api documentation.
Audrius Kulikajevas - Edge case testing / various bugs, Jupyter notebook integration.
Ovidijus Grigas - various bugs, documentation.
Martynas Kaunas - GroupBy functionality.
Sergej Sinkarenko - various bugs.
Lori Cooper - spell checking.

Release history

This version: 2023.11.6 (May 10, 2024)


Download files

Source distributions: none available for this release.

Built distribution: tablite-2023.11.6-py3-none-any.whl

File metadata

Download URL: tablite-2023.11.6-py3-none-any.whl
Upload date: May 10, 2024
Size: 1.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes for tablite-2023.11.6-py3-none-any.whl

Algorithm	Hash digest
SHA256	`35bcca8fc6c21bbb55c8f498ff41c04dffb883a34b2e0a97167a9de13d12d696`
MD5	`2b6c8e0b4a87af255b2723c2d07f7189`
BLAKE2b-256	`16f14ce4250514278b71050aac14ee5b5b8507834777d13f2759efc63911ffa5`
