NTV-pandas : A tabular analyzer and a semantic, compact and reversible JSON-pandas converter
For more information, see the user guide or the GitHub repository.
NTV-pandas is referenced in the pandas ecosystem.
Why an NTV-pandas converter?
pandas provides a JSON converter, but it has three limitations:
- the JSON-pandas converter takes only a few data types into account,
- the JSON-pandas converter is not always reversible (conversion round trip),
- external data types (e.g. TableSchema types) are not included.
In addition, pandas does not have a tool for analyzing tabular structures and detecting integrity errors.
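The round-trip limitation can be seen with pandas alone. The sketch below is only an illustration, assuming the default `orient` of `DataFrame.to_json` and `pd.read_json`; the column name `value32` and its values are sample data.

```python
from io import StringIO

import pandas as pd

# A single column stored as 32-bit integers (sample data).
df = pd.DataFrame({'value32': pd.Series([12, 22, 32], dtype='int32')})

# Round trip through the built-in pandas JSON interface (default orient).
json_text = df.to_json()
df_back = pd.read_json(StringIO(json_text))

# The values survive the round trip, but the JSON text carries no dtype
# information, so the column dtype is inferred again on reading.
same_values = df_back['value32'].tolist() == df['value32'].tolist()
same_dtype = df_back['value32'].dtype == df['value32'].dtype

print(json_text)
print('same values:', same_values)
print('same dtype: ', same_dtype)
```

Because the dtype is not part of the JSON text, `df_back.equals(df)` cannot be relied on to hold after such a round trip.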
main features
The NTV-pandas converter uses the semantic NTV format to include a large set of data types in a JSON representation.
The converter integrates:
- all pandas dtypes and the data types associated with a JSON representation,
- an always reversible conversion,
- a full compatibility with Table Schema specification
The NTV-pandas analyzer uses the TAB-analysis tool to analyze and measure the relationships between Fields in DataFrame and the TAB-dataset to identify integrity errors (example).
NTV-pandas was originally developed in the json-NTV project.
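To give an idea of what a relationship between two Fields can look like, the sketch below checks whether two columns are in a one-to-one relationship. It is a plain-Python illustration on small hand-written lists, not the TAB-analysis API; the helper name `is_one_to_one` is hypothetical.

```python
def is_one_to_one(left, right):
    """Return True when each value of `left` pairs with exactly one
    value of `right`, and the other way round."""
    pairs = set(zip(left, right))
    return len(pairs) == len(set(left)) == len(set(right))


# Sample columns (illustrative values).
dates = ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05']
res = [10, 20, 30, 10, 20]
value = [10, 10, 20, 20, 30]

dates_res = is_one_to_one(dates, res)      # every date always has the same res
dates_value = is_one_to_one(dates, value)  # the same date appears with two values

print(dates_res, dates_value)
```

A tabular analyzer generalizes this kind of check to every pair of Fields and reports the rows that break an expected relationship.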
converter example
In the example below, a DataFrame with multiple data types is converted to JSON (first to NTV format and then to Table Schema format).
The DataFrames resulting from these JSON conversions are identical to the initial DataFrame (reversibility).
With the existing pandas JSON interface, these conversions are not possible.
Data example:
In [1]: from shapely.geometry import Point
        from datetime import date
        from pprint import pprint
        import pandas as pd
        import ntv_pandas as npd
In [2]: data = {'index': [100, 200, 300, 400, 500],
                'dates::date': [date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5)],
                'value': [10, 10, 20, 20, 30],
                'value32': pd.Series([12, 12, 22, 22, 32], dtype='int32'),
                'res': [10, 20, 30, 10, 20],
                'coord::point': [Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4)],
                'names': pd.Series(['john', 'eric', 'judith', 'mila', 'hector'], dtype='string'),
                'unique': True }
In [3]: df = pd.DataFrame(data).set_index('index')
        df.index.name = None
In [4]: df
Out[4]:
     dates::date  value  value32  res  coord::point   names  unique
100   1964-01-01     10       12   10   POINT (1 2)    john    True
200   1985-02-05     10       12   20   POINT (3 4)    eric    True
300   2022-01-21     20       22   30   POINT (5 6)  judith    True
400   1964-01-01     20       22   10   POINT (7 8)    mila    True
500   1985-02-05     30       32   20   POINT (3 4)  hector    True
JSON-NTV representation:
In [5]: df_to_json = df.npd.to_json()
        pprint(df_to_json, compact=True, width=120, sort_dicts=False)
Out[5]: {':tab': {'index': [100, 200, 300, 400, 500],
                  'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05'],
                  'value': [10, 10, 20, 20, 30],
                  'value32::int32': [12, 12, 22, 22, 32],
                  'res': [10, 20, 30, 10, 20],
                  'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0]],
                  'names::string': ['john', 'eric', 'judith', 'mila', 'hector'],
                  'unique': True}}
Reversibility:
In [6]: print(npd.read_json(df_to_json).equals(df))
Out[6]: True
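In the JSON-NTV output above, the data type travels with the column name as a `::type` suffix (for example `value32::int32`). The sketch below shows how such a key can be split into a name and a type. It is an illustration only: `split_ntv_name` is a hypothetical helper, not part of ntv_pandas, and it assumes `::` is the only separator, as in this example.

```python
def split_ntv_name(column_name):
    """Split 'name::type' into (name, type); type is None without a suffix."""
    name, separator, ntv_type = column_name.partition('::')
    return name, (ntv_type if separator else None)


# Keys taken from the ':tab' object shown above.
columns = ['index', 'dates::date', 'value', 'value32::int32',
           'res', 'coord::point', 'names::string', 'unique']

parsed = [split_ntv_name(column) for column in columns]
for name, ntv_type in parsed:
    print(f'{name:10} {ntv_type}')
```

Columns without a suffix, such as `value`, rely on the default JSON type of their values.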
Table Schema representation:
In [7]: df_to_table = df.npd.to_json(table=True)
        pprint(df_to_table['data'][0], sort_dicts=False)
Out[7]: {'index': 100,
         'dates': '1964-01-01',
         'value': 10,
         'value32': 12,
         'res': 10,
         'coord': [1.0, 2.0],
         'names': 'john',
         'unique': True}
In [8]: pprint(df_to_table['schema'], sort_dicts=False)
Out[8]: {'fields': [{'name': 'index', 'type': 'integer'},
                    {'name': 'dates', 'type': 'date'},
                    {'name': 'value', 'type': 'integer'},
                    {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                    {'name': 'res', 'type': 'integer'},
                    {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                    {'name': 'names', 'type': 'string'},
                    {'name': 'unique', 'type': 'boolean'}],
         'primaryKey': ['index'],
         'pandas_version': '1.4.0'}
Reversibility:
In [9]: print(npd.read_json(df_to_table).equals(df))
Out[9]: True