NTV-pandas : A semantic, compact and reversible JSON-pandas converter

ntv-pandas

Why an NTV-pandas converter?

pandas provides a JSON converter, but it has three limitations:

  • the JSON-pandas converter takes only a few data types into account,
  • the JSON-pandas converter is not always reversible (conversion round trip),
  • external data types (e.g. TableSchema types) are not included.

main features

The NTV-pandas converter uses the semantic NTV format to include a large set of data types in a JSON representation.

The converter integrates:

  • all the pandas dtypes and the data types associated with a JSON representation,
  • an always reversible conversion,
  • full compatibility with the TableSchema specification.

NTV-pandas was originally developed in the json-NTV project.

example

In the example below, a DataFrame with multiple data types is converted to JSON (first to NTV format and then to Table Schema format).

The DataFrames resulting from these JSON conversions are identical to the initial DataFrame (reversibility).

These conversions are not possible with the existing pandas JSON interface.
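To make the reversibility problem concrete, here is a minimal sketch using only Python's standard `json` module rather than pandas or ntv-pandas. It assumes a small `original` dictionary invented for illustration and shows that once dates are written as plain strings, nothing in the JSON text tells the reader to turn them back into dates.

```python
import json
from datetime import date

# Illustrative data only; not taken from the ntv-pandas code base.
original = {'dates': [date(1964, 1, 1), date(1985, 2, 5)]}

# The standard encoder rejects date objects outright.
try:
    json.dumps(original)
    encodable = True
except TypeError:
    encodable = False

# Falling back to ISO strings makes the data serializable...
text = json.dumps(original, default=lambda d: d.isoformat())
restored = json.loads(text)

# ...but the decoder cannot know that the strings used to be dates.
print(encodable)                             # False
print(restored == original)                  # False
print(type(restored['dates'][0]).__name__)   # str
```

Carrying the type alongside the name, as the `dates::date` column in the example does, is one way to keep the information needed for the reverse conversion.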

data example

In [1]: from pprint import pprint
        from shapely.geometry import Point
        from datetime import date
        import pandas as pd
        import ntv_pandas as npd

In [2]: data = {'index':        [100, 200, 300, 400, 500],
                'dates::date':  [date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5)],
                'value':        [10, 10, 20, 20, 30],
                'value32':      pd.Series([12, 12, 22, 22, 32], dtype='int32'),
                'res':          [10, 20, 30, 10, 20],
                'coord::point': [Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4)],
                'names':        pd.Series(['john', 'eric', 'judith', 'mila', 'hector'], dtype='string'),
                'unique':       True }

In [3]: df = pd.DataFrame(data).set_index('index')
        df.index.name = None

In [4]: df
Out[4]:       dates::date  value  value32  res coord::point   names  unique
        100    1964-01-01     10       12   10  POINT (1 2)    john    True
        200    1985-02-05     10       12   20  POINT (3 4)    eric    True
        300    2022-01-21     20       22   30  POINT (5 6)  judith    True
        400    1964-01-01     20       22   10  POINT (7 8)    mila    True
        500    1985-02-05     30       32   20  POINT (3 4)  hector    True

JSON-NTV representation

In [5]: df_to_json = npd.to_json(df)
        pprint(df_to_json, compact=True, width=120, sort_dicts=False)
Out[5]: {':tab': {'index': [100, 200, 300, 400, 500],
                  'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05'],
                  'value': [10, 10, 20, 20, 30],
                  'value32::int32': [12, 12, 22, 22, 32],
                  'res': [10, 20, 30, 10, 20],
                  'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0]],
                  'names::string': ['john', 'eric', 'judith', 'mila', 'hector'],
                  'unique': True}}
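In the output above, some column keys carry a `::type` suffix (`value32::int32`, `names::string`) while others (`value`, `res`) do not. The sketch below shows one way a reader of this JSON might separate the two parts. The helper `split_typed_name` is hypothetical and only handles the `name::type` pattern visible in this example; it is not the NTV parser from `json_ntv`.

```python
from typing import Optional, Tuple

def split_typed_name(key: str) -> Tuple[str, Optional[str]]:
    """Split a 'name::type' key into (name, type); type is None when absent."""
    name, sep, type_name = key.partition('::')
    return name, (type_name if sep else None)

# Keys copied from the JSON-NTV output above.
keys = ['dates::date', 'value', 'value32::int32', 'coord::point']
parsed = [split_typed_name(k) for k in keys]
print(parsed)
# [('dates', 'date'), ('value', None), ('value32', 'int32'), ('coord', 'point')]
```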

Reversibility

In [6]: print(npd.read_json(df_to_json).equals(df))
Out[6]: True

Table Schema representation

In [7]: df_to_table = npd.to_json(df, table=True)
        pprint(df_to_table['data'][0], sort_dicts=False)
Out[7]: {'index': 100,
         'dates': '1964-01-01',
         'value': 10,
         'value32': 12,
         'res': 10,
         'coord': [1.0, 2.0],
         'names': 'john',
         'unique': True}

In [8]: pprint(df_to_table['schema'], sort_dicts=False)
Out[8]: {'fields': [{'name': 'index', 'type': 'integer'},
                    {'name': 'dates', 'type': 'date'},
                    {'name': 'value', 'type': 'integer'},
                    {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                    {'name': 'res', 'type': 'integer'},
                    {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                    {'name': 'names', 'type': 'string'},
                    {'name': 'unique', 'type': 'boolean'}],
         'primaryKey': ['index'],
         'pandas_version': '1.4.0'}

Reversibility

In [9]: print(npd.read_json(df_to_table).equals(df))
Out[9]: True
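Comparing Out[5] with Out[8] suggests how each typed column name relates to a Table Schema field: the name loses its `::type` suffix, and the type reappears as a `type` / `format` pair. The sketch below reproduces only the pairs visible in Out[8]. The lookup table `SCHEMA_BY_TYPE` and the function `field_descriptor` are assumptions made for illustration, not the mapping implemented inside ntv-pandas.

```python
# Type/format pairs copied from Out[8]; an illustrative table, not the
# library's internal mapping.
SCHEMA_BY_TYPE = {
    'date': {'type': 'date'},
    'int32': {'type': 'integer', 'format': 'int32'},
    'point': {'type': 'geopoint', 'format': 'array'},
    'string': {'type': 'string'},
}

def field_descriptor(key, sample):
    """Build a Table Schema field from a 'name::type' key and a sample value."""
    name, sep, type_name = key.partition('::')
    if sep:
        return {'name': name, **SCHEMA_BY_TYPE[type_name]}
    # Untyped keys: infer from the Python value. bool is tested before int
    # because bool is a subclass of int.
    if isinstance(sample, bool):
        return {'name': name, 'type': 'boolean'}
    if isinstance(sample, int):
        return {'name': name, 'type': 'integer'}
    raise ValueError(f'no rule for {key!r}')

fields = [
    field_descriptor('value32::int32', 12),
    field_descriptor('coord::point', [1.0, 2.0]),
    field_descriptor('unique', True),
]
print(fields)
```

Types outside the lookup table raise a `KeyError` here; the roadmap below lists several type / format pairs that are still to be added.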

installation and documentation

ntv-pandas itself is a pure Python package maintained in the ntv-pandas GitHub repository.

It can be installed with pip.

pip install ntv-pandas

dependencies:

  • json_ntv: supports the NTV format,
  • shapely: for location data,
  • pandas

Documentation

roadmap

  • type extension : interval dtype and sparse format not yet included
  • table schema : add type / format (geojson/topojson, geopoint/default, geopoint/object, duration/default, string/binary, string/uuid),
  • null JSON data : strategy to define
  • multidimensional : extension of the NTV format for multidimensional data (e.g. Xarray)
  • pandas type : support for Series or DataFrame which include pandas data
  • data consistency : controls between NTVtype and NTVvalue
