
SQL atop numpy arrays represented as tables. Tables logic forked from github.com/BastiaanBergman/nptab

About Nptab

Lightweight, intuitive and fast data-tables.

Nptab data-tables are tables with columns and column names, rows and row numbers. Indexing and slicing your data works as it does for numpy arrays. The only real difference is that each column can have its own data type.

Design objectives

I got frustrated with pandas: its complicated slicing syntax (.loc, .ix, .iloc, etc.), its enforced index column, and the Series objects I get when I want a numpy array. With Nptab I created the simplified pandas I need for many of my data jobs, focusing on simple slicing of multi-datatype tables and basic table tools.

  • Intuitive simple slicing.

  • Using numpy machinery, for best performance, integration with other tools and future support.

  • Store data by column numpy arrays (column store).

  • No particular index column: any column can be used as the index, and the choice is up to the user.

  • Fundamental necessities for sorting, grouping, joining and appending tables.
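The column-store idea in the list above can be sketched with plain numpy. This is only an illustration under assumed names (`columns`, `data`, `order`), not Nptab's actual internals: a table is held as a list of one-dimensional arrays, each with its own dtype, and any column can drive a sort.

```python
import numpy as np

# Hypothetical names for illustration; this is not Nptab's internal code.
columns = ["Name", "Height", "Married"]
data = [
    np.array(["John", "Joe", "Jane"]),
    np.array([1.82, 1.65, 2.15]),
    np.array([False, False, True]),
]

# Each column keeps its own dtype kind: unicode, float, bool.
kinds = [col.dtype.kind for col in data]

# No fixed index column: pick any column (here Height) and reorder
# every column array with the same permutation.
order = np.argsort(data[columns.index("Height")])
sorted_data = [col[order] for col in data]

print(kinds)
print([col.tolist() for col in sorted_data])
```

Because every column is an ordinary numpy array, a single index array is enough to reorder the whole table consistently.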

Install

pip install nptab

Quickstart

init

To set up a Nptab:

>>> from nptab import Nptab
>>> nptab = Nptab([ ["John", "Joe", "Jane"],
...                [1.82,1.65,2.15],
...                [False,False,True]], columns = ["Name", "Height", "Married"])
>>> nptab
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']

Alternatively, Nptabs can be set up from dictionaries, numpy arrays, pandas DataFrames, or no data at all. Database connectors usually return data as a list of records; the module provides a convenience function to transpose this into a list of columns.
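The records-to-columns transposition can also be done in plain Python. The sketch below uses only the built-in `zip` and is not the module's own convenience function; the `records` data is made up for illustration.

```python
# A list of records, as a database connector would typically return it.
records = [
    ("John", 1.82, False),
    ("Joe", 1.65, False),
    ("Jane", 2.15, True),
]

# zip(*records) regroups the values position by position, giving one
# sequence per column instead of one per row.
columns = [list(column) for column in zip(*records)]

print(columns)
```

The resulting list of columns has the same layout as the list passed to the initializer in the example above.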

slice

Slicing can be done the numpy way, always returning Nptab objects:

>>> nptab[1:3,[0,2]]
 Name   |   Married
--------+-----------
 Joe    |         0
 Jane   |         1
2 rows ['<U4', '|b1']

Slices will always return a Nptab except in three distinct cases, when:

  1. explicitly one column is requested, a numpy array is returned:

>>> nptab[1:3,'Name']       # doctest: +SKIP
array(['Joe', 'Jane'],
      dtype='<U4')

  2. explicitly one row is requested, a tuple is returned:

>>> nptab[0,:]
('John', 1.82, False)

  3. explicitly one element is requested, the element itself is returned:

>>> nptab[0,'Name']
'John'

In general, slicing is intuitive and does not deviate from what you would expect from numpy, with the one addition that columns can be referred to by name as well as by number.
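For comparison, here is how the same four index patterns behave on an ordinary two-dimensional numpy array. This is plain numpy, shown only to illustrate the convention that the slicing rules above mirror: a slice or list keeps an axis, a single integer drops it.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

block = a[1:3, [0, 2]]   # slice + list: stays two-dimensional
column = a[1:3, 1]       # single column index: one-dimensional array
row = a[0, :]            # single row index: one-dimensional array
element = a[0, 1]        # single row and column: a scalar

print(block.shape, column.shape, row.shape, element)
```

A Nptab follows the same pattern, except that a single row comes back as a tuple, since its values can have different datatypes.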

set

Setting elements works the same as slicing:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab[0,"Name"] = "Jos"
>>> nptab
 Name   |   Height |   Married
--------+----------+-----------
 Jos    |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']

The value being assigned is expected to have the same datatype that the corresponding slice would return.

Adding columns works the same as setting elements; just give the column a new name:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab['new'] = [1,2,3]
>>> nptab
 Name   |   Height |   Married |   new
--------+----------+-----------+-------
 John   |     1.82 |         0 |     1
 Joe    |     1.65 |         0 |     2
 Jane   |     2.15 |         1 |     3
3 rows ['<U4', '<f8', '|b1', '<i8']

Or set the whole column to the same value:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab['new'] = 13
>>> nptab
 Name   |   Height |   Married |   new
--------+----------+-----------+-------
 John   |     1.82 |         0 |    13
 Joe    |     1.65 |         0 |    13
 Jane   |     2.15 |         1 |    13
3 rows ['<U4', '<f8', '|b1', '<i8']

Just as in numpy, slices are references to the data rather than actual copies.
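The reference behaviour is easiest to see in numpy itself. The sketch below uses only plain numpy arrays; whether every kind of Nptab slice shares memory in exactly the same way is an assumption that should be checked against the library.

```python
import numpy as np

heights = np.array([1.82, 1.65, 2.15])

# A basic slice is a view: writing through it changes the original.
view = heights[1:]
view[0] = 1.70

# Indexing with a list (advanced indexing) makes a copy in numpy,
# so writing to it leaves the original untouched.
fancy = heights[[0, 2]]
fancy[0] = 0.0

print(heights)
print(np.shares_memory(heights, view), np.shares_memory(heights, fancy))
```

When an independent copy is needed in numpy, calling `.copy()` on the slice gives one.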

append Nptab and row

Nptabs can be appended with other Nptabs:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab += nptab
>>> nptab
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
6 rows ['<U4', '<f8', '|b1']

Or append a row as a dictionary:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab.row_append({'Height':1.81, 'Name':"Jack", 'Married':True})
>>> nptab
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
 Jack   |     1.81 |         1
4 rows ['<U4', '<f8', '|b1']
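In a column store, appending a row amounts to extending each column array by one value. The following is a minimal numpy sketch of that idea, with made-up variable names; it is not necessarily how `row_append` is implemented.

```python
import numpy as np

names = np.array(["John", "Joe", "Jane"])
heights = np.array([1.82, 1.65, 2.15])

# The new row, keyed by column name; key order does not matter.
row = {"Height": 1.81, "Name": "Jonathan"}

# Concatenate one value onto each column. numpy widens the string
# dtype when the new value is longer than the existing ones.
names = np.concatenate([names, np.array([row["Name"]])])
heights = np.concatenate([heights, np.array([row["Height"]])])

print(names.tolist(), names.dtype.kind)
print(heights.tolist())
```

Note that `np.concatenate` allocates new arrays, so appending many rows one at a time is slower than appending them in a single batch.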

instance properties

Your data is simply stored as a list of numpy arrays and can be accessed or manipulated like that (just don’t make a mess):

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab.columns
['Name', 'Height', 'Married']
>>> nptab.data        # doctest: +SKIP
[array(['John', 'Joe', 'Jane'],
      dtype='<U4'), array([ 1.82,  1.65,  2.15]), array([False, False,  True], dtype=bool)]

Furthermore, there are the basic means to assess the size of your data:

>>> nptab.shape
(3, 3)
>>> len(nptab)
3

pandas

For interfacing with pandas, the popular data-table framework, going back and forth is easy:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':range(3),'b':range(10,13)})
>>> df
   a   b
0  0  10
1  1  11
2  2  12

To make a Nptab from a DataFrame, just pass it to the initializer:

>>> nptab = Nptab(df)
>>> nptab
   a |   b
-----+-----
   0 |  10
   1 |  11
   2 |  12
3 rows ['<i8', '<i8']

The dict property of Nptab provides a way to make a DataFrame from a Nptab:

>>> df = pd.DataFrame(nptab.dict)
>>> df
   a   b
0  0  10
1  1  11
2  2  12
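The round trip works because a DataFrame can be built from a mapping of column names to arrays. The sketch below reproduces it with pandas and numpy alone; the names `columns`, `data` and `as_dict` are illustrative stand-ins, assuming the dict property returns a mapping of this shape.

```python
import pandas as pd

df = pd.DataFrame({"a": range(3), "b": range(10, 13)})

# DataFrame -> column names plus one numpy array per column.
columns = list(df.columns)
data = [df[name].to_numpy() for name in columns]

# Column arrays -> dict -> DataFrame again.
as_dict = dict(zip(columns, data))
df_again = pd.DataFrame(as_dict)

print(df_again.equals(df))
```

Only the default integer index survives this round trip; a custom DataFrame index would have to be stored as an ordinary column first.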

Dependencies

  • numpy

  • tabulate (optional, recommended)

  • pandas (optional, for converting back and forth to DataFrames)

Tested on:

  • Python 3.8.2; numpy 1.18.1

Contributing to Nptab

Nptab is perfect already, no more contributions needed. Just kidding!

See the repository for filing issues and proposing enhancements.

  • pytest

    cd nptab/test
    conda activate py38
    pytest
  • pylint

    cd nptab/
    ./pylint.sh
  • doctest

    cd nptab/docs
    make doctest
  • sphinx

    cd nptab/docs
    make html
  • setuptools/pypi

    python setup.py sdist bdist_wheel
    twine upload dist/nptab-*

Contributors
