SQL atop numpy arrays represented as tables. Table logic forked from github.com/BastiaanBergman/npsql
About Nptab
Lightweight, intuitive and fast data-tables.
Nptab data-tables are tables with columns and column names, rows and row numbers. Indexing and slicing your data is analogous to indexing and slicing numpy arrays. The only real difference is that each column can have its own data type.
Design objectives
I got frustrated with pandas: its complicated slicing syntax (.loc, .ix, .iloc, etc.), its enforced index column, and the Series objects I get when I want a numpy array. With Nptab I created the simplified pandas I need for many of my data jobs, focusing on simple slicing of multi-datatype tables and basic table tools.
Intuitive simple slicing.
Using numpy machinery, for best performance, integration with other tools and future support.
Store data by column numpy arrays (column store).
No particular index column, all columns can be used as the index, the choice is up to the user.
Fundamental necessities for sorting, grouping, joining and appending tables.
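The column-store objective can be illustrated with plain numpy. The sketch below is not Nptab's implementation; it only assumes, as the text states, that a table is a list of per-column numpy arrays plus a list of column names, and shows why that layout lets every column keep its own dtype and lets any column act as the index.

```python
import numpy as np

# Hypothetical minimal column store: one numpy array per column plus a
# parallel list of column names. Illustrative only, not Nptab's code.
columns = ["Name", "Height", "Married"]
data = [
    np.array(["John", "Joe", "Jane"]),  # inferred dtype '<U4'
    np.array([1.82, 1.65, 2.15]),       # inferred dtype float64 ('<f8')
    np.array([False, False, True]),     # inferred dtype bool ('|b1')
]

# Each column keeps its own dtype, which a single 2-D numpy array cannot do.
dtypes = [col.dtype.str for col in data]
print(dtypes)  # ['<U4', '<f8', '|b1'] on a little-endian machine

# No dedicated index column: any column can be used to look up rows.
row = int(np.flatnonzero(data[0] == "Jane")[0])
print(row, data[1][row])  # 2 2.15
```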
Install
pip install npsql
Quickstart
init
To set up a Nptab:
>>> from npsql import Nptab
>>> npsql = Nptab([ ["John", "Joe", "Jane"],
...                 [1.82,1.65,2.15],
...                 [False,False,True]], columns = ["Name", "Height", "Married"])
>>> npsql
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']
Alternatively, a Nptab can be set up from dictionaries, numpy arrays, pandas DataFrames, or no data at all. Database connectors usually return data as a list of records; the module provides a convenience function to transpose this into a list of columns.
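The text does not name the module's transpose helper, so the function below, `records_to_columns`, is a hypothetical stand-in that only illustrates the records-to-columns transposition using the standard `zip(*records)` idiom.

```python
# Hypothetical helper (not the module's actual function, whose name is not
# given here): turn a list of records, as most database connectors return
# them, into a list of columns.
def records_to_columns(records):
    """Transpose [(a1, b1), (a2, b2), ...] into [[a1, a2, ...], [b1, b2, ...]]."""
    return [list(col) for col in zip(*records)]

rows = [("John", 1.82, False), ("Joe", 1.65, False), ("Jane", 2.15, True)]
cols = records_to_columns(rows)
print(cols)
# [['John', 'Joe', 'Jane'], [1.82, 1.65, 2.15], [False, False, True]]
```

A list of columns in this shape matches what the init example passes to `Nptab` together with `columns=`. Note that an empty record list yields an empty list, so the number of columns cannot be recovered from it.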
slice
Slicing can be done the numpy way, always returning Nptab objects:
>>> npsql[1:3,[0,2]]
 Name   |   Married
--------+-----------
 Joe    |         0
 Jane   |         1
2 rows ['<U4', '|b1']
Slices always return a Nptab, except in three distinct cases:
when explicitly one column is requested, a numpy array is returned:
>>> npsql[1:3,'Name'] # doctest: +SKIP
array(['Joe', 'Jane'], dtype='<U4')
when explicitly one row is requested, a tuple is returned:
>>> npsql[0,:]
('John', 1.82, False)
when explicitly one element is requested, that element itself is returned:
>>> npsql[0,'Name']
'John'
In general, slicing is intuitive and does not deviate from what you would expect from numpy, with the one addition that columns can be referred to by name as well as by number.
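Referring to columns by name can be reduced to numpy indexing by first mapping the name to a column number. The helpers `col_index` and `get` below are hypothetical and sketch only the single-column and single-row cases on a plain list of arrays; they are not Nptab's internals.

```python
import numpy as np

columns = ["Name", "Height", "Married"]
data = [np.array(["John", "Joe", "Jane"]),
        np.array([1.82, 1.65, 2.15]),
        np.array([False, False, True])]

def col_index(key):
    """Hypothetical helper: accept either a column name or a column number."""
    return columns.index(key) if isinstance(key, str) else key

def get(rows, col):
    """Sketch of the single-column case: plain numpy indexing on one array."""
    return data[col_index(col)][rows]

print(get(slice(1, 3), "Name"))           # ['Joe' 'Jane']
print(get(0, "Name"))                     # John
print(tuple(c[0].item() for c in data))   # ('John', 1.82, False)
```

Because the lookup ends in ordinary numpy indexing, a row slice on one column naturally yields an array, and a scalar row index naturally yields a single element.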
set
Setting elements works the same as slicing:
>>> npsql = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> npsql[0,"Name"] = "Jos"
>>> npsql
 Name   |   Height |   Married
--------+----------+-----------
 Jos    |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']
The value being assigned is expected to have the same datatype that the corresponding slice would return.
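Since the data is stored as plain numpy arrays, assignments presumably follow numpy's casting rules for the target column. The plain-numpy sketch below shows why the datatype matters: a fixed-width string column such as `'<U4'` silently truncates longer strings, and numbers are cast to the column's dtype.

```python
import numpy as np

names = np.array(["John", "Joe", "Jane"])  # dtype '<U4': at most 4 characters
names[0] = "Jos"      # fits: stored as 'Jos'
names[1] = "Joseph"   # too long: numpy silently truncates it to 'Jose'
print(names.tolist())  # ['Jos', 'Jose', 'Jane']

heights = np.array([1.82, 1.65, 2.15])
heights[0] = 2        # the int is cast to the column's float64 dtype
print(heights[0])     # 2.0
```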
Adding columns works the same as setting elements; just give the column a new name:
>>> npsql = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> npsql['new'] = [1,2,3]
>>> npsql
 Name   |   Height |   Married |   new
--------+----------+-----------+-------
 John   |     1.82 |         0 |     1
 Joe    |     1.65 |         0 |     2
 Jane   |     2.15 |         1 |     3
3 rows ['<U4', '<f8', '|b1', '<i8']
Or set the whole column to the same value:
>>> npsql = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> npsql['new'] = 13
>>> npsql
 Name   |   Height |   Married |   new
--------+----------+-----------+-------
 John   |     1.82 |         0 |    13
 Joe    |     1.65 |         0 |    13
 Jane   |     2.15 |         1 |    13
3 rows ['<U4', '<f8', '|b1', '<i8']
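Both ways of adding a column, from a sequence or from a single value, can be handled by one small function. `make_column` below is a hypothetical sketch, not Nptab's code: a scalar is repeated on every row with `np.full`, and a sequence is checked against the table length.

```python
import numpy as np

def make_column(value, nrows):
    """Hypothetical helper: build a new column from a sequence or a scalar."""
    arr = np.asarray(value)
    if arr.ndim == 0:                 # scalar: repeat it on every row
        return np.full(nrows, arr)    # dtype follows the scalar's dtype
    if len(arr) != nrows:
        raise ValueError(f"expected {nrows} values, got {len(arr)}")
    return arr

print(make_column([1, 2, 3], 3).tolist())  # [1, 2, 3]
print(make_column(13, 3).tolist())         # [13, 13, 13]
```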
Just like in numpy, slices are not copies of the data; rather, they are references to it.
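The plain-numpy example below shows the reference behavior this builds on. One caveat from numpy itself: basic slices are views, but fancy indexing with a list (like the `[0,2]` column list in the slicing example) returns a copy, so whether a given Nptab slice shares memory may depend on the kind of index used. Call `.copy()` when independence is needed.

```python
import numpy as np

height = np.array([1.82, 1.65, 2.15])

view = height[1:3]        # basic slice: a view sharing memory with `height`
view[0] = 1.70
print(height.tolist())    # [1.82, 1.7, 2.15]  (original changed)

fancy = height[[1, 2]]    # fancy (list) indexing: numpy returns a copy
fancy[0] = 9.99
print(height.tolist())    # [1.82, 1.7, 2.15]  (original unchanged)

safe = height[1:3].copy() # explicit copy when independence is needed
print(np.shares_memory(view, height), np.shares_memory(safe, height))  # True False
```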
append Nptab and row
Nptab objects can be appended with other Nptab objects:
>>> npsql = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> npsql += npsql
>>> npsql
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
6 rows ['<U4', '<f8', '|b1']
Or append a row as a dictionary:
>>> npsql = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> npsql.row_append({'Height':1.81, 'Name':"Jack", 'Married':True})
>>> npsql
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
 Jack   |     1.81 |         1
4 rows ['<U4', '<f8', '|b1']
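On a column store, both kinds of append reduce to per-column numpy operations. The hypothetical helpers `append_table` and `append_row` below sketch this under the assumption that a table is a list of arrays plus column names; they are not Nptab's methods. The dictionary row is matched to columns by name, which is why its key order does not matter.

```python
import numpy as np

columns = ["Name", "Height", "Married"]
data = [np.array(["John", "Joe", "Jane"]),
        np.array([1.82, 1.65, 2.15]),
        np.array([False, False, True])]

def append_table(data, other):
    """Sketch: appending tables concatenates each pair of columns."""
    return [np.concatenate([a, b]) for a, b in zip(data, other)]

def append_row(data, columns, row):
    """Sketch: a dict row is matched to columns by name (KeyError if missing)."""
    return [np.append(col, row[name]) for col, name in zip(data, columns)]

doubled = append_table(data, data)
print(len(doubled[0]))  # 6

grown = append_row(data, columns, {'Height': 1.81, 'Name': "Jack", 'Married': True})
print(grown[0].tolist())  # ['John', 'Joe', 'Jane', 'Jack']
```

numpy arrays have a fixed size, so each of these calls allocates new arrays and copies every column. If appending works this way, adding many rows one at a time costs far more than collecting them and appending once.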
instance properties
Your data is simply stored as a list of numpy arrays and can be accessed or manipulated like that (just don’t make a mess):
>>> npsql = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> npsql.columns
['Name', 'Height', 'Married']
>>> npsql.data # doctest: +SKIP
[array(['John', 'Joe', 'Jane'], dtype='<U4'), array([ 1.82, 1.65, 2.15]), array([False, False, True], dtype=bool)]
The basic means to assess the size of your data are also available:
>>> npsql.shape
(3, 3)
>>> len(npsql)
3
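Because the data is an ordinary list of arrays that you may manipulate directly, one easy way to "make a mess" is to leave columns of unequal length. The hypothetical `table_shape` below sketches how a (rows, columns) shape can be derived from such a list while checking that consistency; it is an illustration, not Nptab's `shape` property.

```python
import numpy as np

data = [np.array(["John", "Joe", "Jane"]),
        np.array([1.82, 1.65, 2.15]),
        np.array([False, False, True])]

def table_shape(data):
    """Sketch: (rows, columns) for a list of equal-length column arrays."""
    nrows = len(data[0]) if data else 0
    if any(len(col) != nrows for col in data):
        raise ValueError("all columns must have the same length")
    return (nrows, len(data))

print(table_shape(data))  # (3, 3)
```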
pandas
For interfacing with pandas, the popular data-table framework, going back and forth is easy:
>>> import pandas as pd
>>> df = pd.DataFrame({'a':range(3),'b':range(10,13)})
>>> df
   a   b
0  0  10
1  1  11
2  2  12
To make a Nptab from a DataFrame, just pass it to the initializer:
>>> npsql = Nptab(df)
>>> npsql
   a |   b
-----+-----
   0 |  10
   1 |  11
   2 |  12
3 rows ['<i8', '<i8']
The dict property of Nptab provides a way to make a DataFrame from a Nptab:
>>> df = pd.DataFrame(npsql.dict)
>>> df
   a   b
0  0  10
1  1  11
2  2  12
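A mapping from column name to column array is exactly what `pd.DataFrame` accepts, which is presumably why a `dict` property is enough for the round trip. The sketch below builds such a mapping from a column-name list and a list of arrays (an assumption about what the property holds, not its implementation). Python dictionaries preserve insertion order, so the column order survives.

```python
import numpy as np

columns = ["a", "b"]
data = [np.arange(3), np.arange(10, 13)]

# Sketch of what a `dict` property presumably contains: column name -> array.
as_dict = dict(zip(columns, data))
print({k: v.tolist() for k, v in as_dict.items()})
# {'a': [0, 1, 2], 'b': [10, 11, 12]}

# With pandas installed, pd.DataFrame(as_dict) builds a frame with the same
# columns and values as the DataFrame in the pandas example.
```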
Dependencies
numpy
tabulate (optional, recommended)
pandas (optional, for converting back and forth to DataFrames)
Tested on:
Python 3.8.2; numpy 1.18.1
Contributing to Nptab
Nptab is perfect already, no more contributions needed. Just kidding!
See the repository for filing issues and proposing enhancements.
pytest
    cd npsql/test
    conda activate py38
    pytest
pylint
    cd npsql/
    ./pylint.sh
doctest
    cd npsql/docs
    make doctest
sphinx
    cd npsql/docs
    make html
setuptools/pypi
    python setup.py sdist bdist_wheel
    twine upload dist/npsql-*
Contributors
Stephen Boesch [javadba@gmail.com]
For the original tabel logic: Bastiaan Bergman [Bastiaan.Bergman@gmail.com].
Download files
File details
Details for the file npsql-0.1.0-py3.8.egg.
File metadata
- Download URL: npsql-0.1.0-py3.8.egg
- Upload date:
- Size: 35.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 158303ad95c7e8a131a2b25806fb4e1603cda171a8968224b6bce5a993fa8d8b
MD5 | fbc32e0caf9370d2daa24795c82ca8ad
BLAKE2b-256 | 5e66ed1c6b23be6a863c1da5c0ba86822efb3c45e56d5d3d2d81b4f292240c52
File details
Details for the file npsql-0.1.0-py3-none-any.whl.
File metadata
- Download URL: npsql-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 92338dbc62b447913f6c33b6198d42df574744a010a60e26728a1ba4de82cf92
MD5 | d9c582c03e3c650145cac1caffba7028
BLAKE2b-256 | 227201a0376cc8972c5a4138ee584d3099b1623b826b5fa65d9c739bb70f1621