NTV-pandas : A semantic, compact and reversible JSON-pandas converter
Why an NTV-pandas converter?
pandas provides a JSON converter, but it has three limitations:
- the JSON-pandas converter handles only a few data types,
- the JSON-pandas converter is not always reversible (a conversion round trip does not always restore the original data),
- external data types (e.g. TableSchema types) are not supported.
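The first two limitations can be reproduced with pandas alone. The sketch below uses only pandas and the standard library (not ntv-pandas). It round-trips an `int32` column through `DataFrame.to_json` and `pandas.read_json` with the default orient. The JSON text carries only the numbers, so the dtype is re-inferred on the way back and `equals` reports a difference. The small frame is made up for the illustration.

```python
from io import StringIO

import pandas as pd

# A small frame with a non-default integer dtype (int32).
df = pd.DataFrame({"value32": pd.Series([12, 22, 32], dtype="int32")})

# Round trip through pandas' built-in JSON converter (default orient for a DataFrame).
text = df.to_json()
back = pd.read_json(StringIO(text))

print(text)                   # only the values are written, not the dtype
print(back["value32"].dtype)  # the dtype is inferred again when reading
print(back.equals(df))        # so the round trip is not exact
```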
main features
The NTV-pandas converter uses the semantic NTV format to include a large set of data types in a JSON representation.
The converter integrates:
- all the pandas dtypes and the data types associated with a JSON representation,
- an always reversible conversion,
- full compatibility with the TableSchema specification.
NTV-pandas was originally developed in the json-NTV project.
example
In the example below, a DataFrame with multiple data types is converted to JSON (first to NTV format and then to Table Schema format).
The DataFrames resulting from these JSON conversions are identical to the initial DataFrame (reversibility).
With the existing pandas JSON interface, these conversions are not possible.
data example
In [1]: from pprint import pprint
from shapely.geometry import Point
from datetime import date
import pandas as pd
import ntv_pandas as npd
In [2]: data = {'index': [100, 200, 300, 400, 500],
'dates::date': [date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5)],
'value': [10, 10, 20, 20, 30],
'value32': pd.Series([12, 12, 22, 22, 32], dtype='int32'),
'res': [10, 20, 30, 10, 20],
'coord::point': [Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4)],
'names': pd.Series(['john', 'eric', 'judith', 'mila', 'hector'], dtype='string'),
'unique': True }
In [3]: df = pd.DataFrame(data).set_index('index')
df.index.name = None
In [4]: df
Out[4]: dates::date value value32 res coord::point names unique
100 1964-01-01 10 12 10 POINT (1 2) john True
200 1985-02-05 10 12 20 POINT (3 4) eric True
300 2022-01-21 20 22 30 POINT (5 6) judith True
400 1964-01-01 20 22 10 POINT (7 8) mila True
500 1985-02-05 30 32 20 POINT (3 4) hector True
JSON-NTV representation
In [5]: df_to_json = npd.to_json(df)
pprint(df_to_json, compact=True, width=120, sort_dicts=False)
Out[5]: {':tab': {'index': [100, 200, 300, 400, 500],
'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05'],
'value': [10, 10, 20, 20, 30],
'value32::int32': [12, 12, 22, 22, 32],
'res': [10, 20, 30, 10, 20],
'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0]],
'names::string': ['john', 'eric', 'judith', 'mila', 'hector'],
'unique': True}}
Reversibility
In [6]: print(npd.read_json(df_to_json).equals(df))
Out[6]: True
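The `name::type` keys are what make the JSON-NTV output self-describing, and therefore reversible. As an illustration only, the following sketch decodes a `':tab'` mapping like the one above back into a DataFrame for a handful of types. It is not the ntv-pandas implementation. The helper `decode_tab` and the tables `DTYPE_TYPES` and `OBJECT_DECODERS` are hypothetical, and they cover far fewer types than ntv-pandas (for instance, `point` is left out so the sketch does not need shapely). Based on the example output, it assumes three rules. A type that corresponds to a pandas dtype (`int32`, `string`) is stripped from the column name. An object column such as `dates::date` keeps its full name. A scalar value stands for a constant column.

```python
from datetime import date

import pandas as pd

# Hypothetical subset: NTV types rendered by a pandas dtype.
# The "::type" suffix is dropped from the column name for these.
DTYPE_TYPES = {"int32": "int32", "string": "string"}

# Hypothetical subset: NTV types stored as Python objects.
# These columns keep their full "name::type" label, as in the example DataFrame.
OBJECT_DECODERS = {"date": date.fromisoformat}


def decode_tab(payload):
    """Rebuild a DataFrame from a {':tab': {...}} mapping (illustrative sketch)."""
    columns = dict(payload[":tab"])
    index_values = columns.pop("index", None)
    if index_values is not None:
        index = pd.Index(index_values)
    else:
        # Without an explicit index, take the length from the first list-valued column.
        length = next(len(v) for v in columns.values() if isinstance(v, list))
        index = pd.RangeIndex(length)

    data = {}
    for key, values in columns.items():
        if not isinstance(values, list):
            values = [values] * len(index)  # a scalar stands for a constant column
        name, _, ntv_type = key.partition("::")
        if ntv_type in DTYPE_TYPES:
            data[name] = pd.Series(values, index=index, dtype=DTYPE_TYPES[ntv_type])
        elif ntv_type in OBJECT_DECODERS:
            decode = OBJECT_DECODERS[ntv_type]
            data[key] = pd.Series([decode(v) for v in values], index=index, dtype=object)
        else:
            data[key] = pd.Series(values, index=index)  # untyped: let pandas infer
    return pd.DataFrame(data, index=index)


payload = {":tab": {"index": [100, 200, 300],
                    "dates::date": ["1964-01-01", "1985-02-05", "2022-01-21"],
                    "value": [10, 10, 20],
                    "value32::int32": [12, 12, 22],
                    "names::string": ["john", "eric", "judith"],
                    "unique": True}}
print(decode_tab(payload).dtypes)
```

The dtype-carrying types and the object-carrying types are kept in two separate tables here only to mirror the naming visible in the example (`value32` versus `dates::date`).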
Table Schema representation
In [7]: df_to_table = npd.to_json(df, table=True)
pprint(df_to_table['data'][0], sort_dicts=False)
Out[7]: {'index': 100,
'dates': '1964-01-01',
'value': 10,
'value32': 12,
'res': 10,
'coord': [1.0, 2.0],
'names': 'john',
'unique': True}
In [8]: pprint(df_to_table['schema'], sort_dicts=False)
Out[8]: {'fields': [{'name': 'index', 'type': 'integer'},
{'name': 'dates', 'type': 'date'},
{'name': 'value', 'type': 'integer'},
{'name': 'value32', 'type': 'integer', 'format': 'int32'},
{'name': 'res', 'type': 'integer'},
{'name': 'coord', 'type': 'geopoint', 'format': 'array'},
{'name': 'names', 'type': 'string'},
{'name': 'unique', 'type': 'boolean'}],
'primaryKey': ['index'],
'pandas_version': '1.4.0'}
Reversibility
In [9]: print(npd.read_json(df_to_table).equals(df))
Out[9]: True
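For comparison, pandas' own Table Schema support (`pandas.io.json.build_table_schema` and `orient="table"`) can be tried on a small frame. The sketch below uses pandas only, and the frame is made up for the illustration. It suggests why the `format` entry in the schema above matters. pandas reports an `int32` column as a plain `integer` field with no `format`, so reading the table back yields `int64`.

```python
from io import StringIO

import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame({"value": [10, 20, 30],
                   "value32": pd.Series([12, 22, 32], dtype="int32")})

# Schema produced by pandas itself (index left out for readability).
schema = build_table_schema(df, index=False)
fields = {field["name"]: field for field in schema["fields"]}
print(fields["value32"])  # no "format" entry to record the int32 width

# Round trip through pandas' own Table Schema orient.
back = pd.read_json(StringIO(df.to_json(orient="table")), orient="table")
print(back["value32"].dtype)  # rebuilt from "integer", not from the original int32
```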
installation and documentation
ntv-pandas itself is a pure Python package maintained on the ntv-pandas GitHub repository.
It can be installed with pip:
pip install ntv-pandas
dependencies:
- json_ntv: support for the NTV format,
- shapely: for the location data,
- pandas
roadmap
- type extension : interval dtype and sparse format are not yet included
- table schema : add type / format (geojson/topojson, geopoint/default, geopoint/object, duration/default, string/binary, string/uuid)
- null JSON data : strategy to be defined
- multidimensional : extension of the NTV format for multidimensional data (e.g. Xarray)
- pandas type : support for Series or DataFrame which include pandas data
- data consistency : controls between NTVtype and NTVvalue
File details
Details for the file ntv_pandas-1.0.0.tar.gz.
File metadata
- Download URL: ntv_pandas-1.0.0.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 214d46f39bdf7266fb0563032bc5e410d7fb6f7777c9bed3a854bf67f3ca1986
MD5 | ac44c5c3923202f0f9b3e5cde9af28c8
BLAKE2b-256 | 6a05f3066010d209e79a18d34d889356c71a0b26f7c86c27ef7f1f8044766611
File details
Details for the file ntv_pandas-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ntv_pandas-1.0.0-py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 14e133d04319cf7d7acbcd925e68150a9f7f20c1113215110c535ec8643cf985
MD5 | 882149320ab460db234860fd07e936bc
BLAKE2b-256 | 8e492bcec639ddbac780db7919c70a0254d79419d6dfe8a24c85db4c18d70c41
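The SHA256 digests listed for the two files can be used to check a download before installing it. Below is a minimal sketch using the standard `hashlib` module. The helper names `sha256_of` and `verify` are hypothetical, and the sketch assumes the downloaded files are in the current directory. pip can also enforce hashes itself through its hash-checking mode (a `--hash=sha256:...` option on a requirements-file line).

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for version 1.0.0.
EXPECTED_SHA256 = {
    "ntv_pandas-1.0.0.tar.gz":
        "214d46f39bdf7266fb0563032bc5e410d7fb6f7777c9bed3a854bf67f3ca1986",
    "ntv_pandas-1.0.0-py3-none-any.whl":
        "14e133d04319cf7d7acbcd925e68150a9f7f20c1113215110c535ec8643cf985",
}


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Check whichever of the two files is present locally.
for filename, expected in EXPECTED_SHA256.items():
    if Path(filename).exists():
        print(filename, "OK" if verify(filename, expected) else "MISMATCH")
```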