
NTV-pandas : A tabular analyzer and a semantic, compact and reversible JSON-pandas converter

For more information, see the user guide or the GitHub repository.

NTV-pandas is referenced in the pandas ecosystem.

Why an NTV-pandas converter?

pandas provides a JSON converter, but it has three limitations:

  • the JSON-pandas converter takes only a few data types into account,
  • the JSON-pandas converter is not always reversible (conversion round trip),
  • external data types (e.g. TableSchema types) are not included.
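As a minimal illustration of the reversibility limitation, the sketch below round-trips a small DataFrame through pandas' own `DataFrame.to_json` and `pandas.read_json` (default `orient`). The column names and values are made up for this example; the point is only that dtypes such as `int32` and `category` are not recorded in the JSON text, so `read_json` has to re-infer them and the result no longer equals the original.

```python
from io import StringIO

import pandas as pd

# Made-up frame with two dtypes that the default JSON output does not record
df = pd.DataFrame({
    "value32": pd.Series([12, 22, 32], dtype="int32"),
    "color": pd.Series(["red", "blue", "red"], dtype="category"),
})

# Round trip through pandas' own JSON converter (default orient="columns")
restored = pd.read_json(StringIO(df.to_json()))

print(df.dtypes)        # value32: int32, color: category
print(restored.dtypes)  # dtypes re-inferred from the JSON values
print(restored.equals(df))  # False: the values survive, the dtypes do not
```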

In addition, pandas does not provide a tool for analyzing tabular structures and detecting integrity errors.

main features

The NTV-pandas converter uses the semantic NTV format to include a large set of data types in a JSON representation.

The converter integrates:

  • all pandas dtypes and the data types associated with a JSON representation,
  • an always reversible conversion,
  • full compatibility with the Table Schema specification.

The NTV-pandas analyzer uses the TAB-analysis tool to analyze and measure the relationships between fields in a DataFrame, and uses TAB-dataset to identify integrity errors (example).
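To make "relationships between fields" concrete, the sketch below classifies a pair of columns by comparing the number of distinct values in each column with the number of distinct value pairs, then lists the rows that break an expected dependency. This is a hypothetical pure-pandas illustration: the helpers `relationship` and `violations` are made up here, are not the TAB-analysis or TAB-dataset API, and cover only a small part of what those tools measure. The columns reuse values from the data example below.

```python
import pandas as pd


def relationship(df: pd.DataFrame, a: str, b: str) -> str:
    """Hypothetical helper: classify the dependency between columns a and b."""
    n_a, n_b = df[a].nunique(), df[b].nunique()
    n_pairs = len(df[[a, b]].drop_duplicates())
    if n_pairs == n_a == n_b:
        return "one-to-one"  # each value of a matches exactly one value of b, and vice versa
    if n_pairs == n_a:
        return f"{b} depends on {a}"  # many-to-one: b is a function of a
    if n_pairs == n_b:
        return f"{a} depends on {b}"
    return "no dependency"


def violations(df: pd.DataFrame, a: str, b: str) -> pd.DataFrame:
    """Hypothetical helper: rows where one value of a is paired with several values of b."""
    return df[df.groupby(a)[b].transform("nunique") > 1]


df = pd.DataFrame({
    "value":   [10, 10, 20, 20, 30],
    "value32": [12, 12, 22, 22, 32],
    "res":     [10, 20, 30, 10, 20],
})
print(relationship(df, "value", "value32"))  # one-to-one
print(relationship(df, "value", "res"))      # no dependency

# Add an inconsistent row: value 30 now appears with two different value32 entries
bad = pd.concat(
    [df, pd.DataFrame({"value": [30], "value32": [33], "res": [30]})],
    ignore_index=True,
)
print(violations(bad, "value", "value32"))  # the two rows with value == 30
```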

NTV-pandas was originally developed in the json-NTV project.

converter example

In the example below, a DataFrame with multiple data types is converted to JSON (first to NTV format and then to Table Schema format).

The DataFrames resulting from these JSON conversions are identical to the initial DataFrame (reversibility).

With the existing pandas JSON interface (DataFrame.to_json and pandas.read_json), these conversions are not possible.
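Reversibility can be phrased as a small check: encode a DataFrame, decode the result, and compare with `DataFrame.equals`, which also compares column dtypes. The helper `round_trips` below is a hypothetical sketch written for this page; it is demonstrated with pandas' own Table Schema orient on a frame that uses only `int64` and `float64` columns.

```python
from io import StringIO
from typing import Any, Callable

import pandas as pd


def round_trips(df: pd.DataFrame,
                encode: Callable[[pd.DataFrame], Any],
                decode: Callable[[Any], pd.DataFrame]) -> bool:
    """Hypothetical helper: True if decode(encode(df)) equals df, dtypes included."""
    return decode(encode(df)).equals(df)


df = pd.DataFrame({"value": [10, 20, 30], "ratio": [0.5, 1.5, 2.5]})

# pandas' Table Schema orient used as the encoder/decoder pair
print(round_trips(
    df,
    lambda d: d.to_json(orient="table"),
    lambda text: pd.read_json(StringIO(text), orient="table"),
))  # True
```

With NTV-pandas installed and imported as `npd`, the same check could be written with the calls used in the example below, e.g. `round_trips(df, lambda d: d.npd.to_json(), npd.read_json)`.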

Data example:

In [1]: from pprint import pprint
        from shapely.geometry import Point
        from datetime import date
        import pandas as pd
        import ntv_pandas as npd

In [2]: data = {'index':        [100, 200, 300, 400, 500],
                'dates::date':  [date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5)],
                'value':        [10, 10, 20, 20, 30],
                'value32':      pd.Series([12, 12, 22, 22, 32], dtype='int32'),
                'res':          [10, 20, 30, 10, 20],
                'coord::point': [Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4)],
                'names':        pd.Series(['john', 'eric', 'judith', 'mila', 'hector'], dtype='string'),
                'unique':       True }

In [3]: df = pd.DataFrame(data).set_index('index')
        df.index.name = None

In [4]: df
Out[4]:       dates::date  value  value32  res coord::point   names  unique
        100    1964-01-01     10       12   10  POINT (1 2)    john    True
        200    1985-02-05     10       12   20  POINT (3 4)    eric    True
        300    2022-01-21     20       22   30  POINT (5 6)  judith    True
        400    1964-01-01     20       22   10  POINT (7 8)    mila    True
        500    1985-02-05     30       32   20  POINT (3 4)  hector    True

JSON-NTV representation:

In [5]: df_to_json = df.npd.to_json()
        pprint(df_to_json, compact=True, width=120, sort_dicts=False)
Out[5]: {':tab': {'index': [100, 200, 300, 400, 500],
                  'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05'],
                  'value': [10, 10, 20, 20, 30],
                  'value32::int32': [12, 12, 22, 22, 32],
                  'res': [10, 20, 30, 10, 20],
                  'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0]],
                  'names::string': ['john', 'eric', 'judith', 'mila', 'hector'],
                  'unique': True}}
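The `:tab` payload above carries types in the keys (`'value32::int32'`, `'names::string'`) and stores the constant column `unique` as a single value. The sketch below decodes a trimmed version of that payload to show those two ideas only. `split_ntv_name` and `expand_tab` are hypothetical helpers written for this illustration, not part of the ntv_pandas API; the real `npd.read_json` handles many more NTV types (such as `date` and `point`), which this sketch ignores.

```python
from typing import Optional, Tuple

import pandas as pd


def split_ntv_name(key: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split a key such as 'value32::int32' into ('value32', 'int32')."""
    name, sep, ntv_type = key.partition("::")
    return name, (ntv_type if sep else None)


def expand_tab(tab: dict) -> pd.DataFrame:
    """Hypothetical sketch: rebuild a DataFrame from a simplified ':tab' payload."""
    # Table length = length of the first list-valued field
    length = next(len(v) for v in tab.values() if isinstance(v, list))
    index = None
    columns = {}
    for key, values in tab.items():
        name, ntv_type = split_ntv_name(key)
        if not isinstance(values, list):
            # Compact form: a single value stands for a constant column
            values = [values] * length
        if name == "index":
            index = values
            continue
        series = pd.Series(values)
        if ntv_type in ("int32", "string"):
            # Assumption: only these two NTV types are mapped, each to the pandas dtype of the same name
            series = series.astype(ntv_type)
        columns[name] = series
    frame = pd.DataFrame(columns)
    if index is not None:
        frame.index = index
    return frame


tab = {
    "index": [100, 200, 300],
    "value32::int32": [12, 22, 32],
    "names::string": ["john", "eric", "judith"],
    "unique": True,
}
df = expand_tab(tab)
print(df.dtypes)
```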

Reversibility:

In [6]: print(npd.read_json(df_to_json).equals(df))
Out[6]: True

Table Schema representation:

In [7]: df_to_table = df.npd.to_json(table=True)
        pprint(df_to_table['data'][0], sort_dicts=False)
Out[7]: {'index': 100,
         'dates': '1964-01-01',
         'value': 10,
         'value32': 12,
         'res': 10,
         'coord': [1.0, 2.0],
         'names': 'john',
         'unique': True}

In [8]: pprint(df_to_table['schema'], sort_dicts=False)
Out[8]: {'fields': [{'name': 'index', 'type': 'integer'},
                    {'name': 'dates', 'type': 'date'},
                    {'name': 'value', 'type': 'integer'},
                    {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                    {'name': 'res', 'type': 'integer'},
                    {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                    {'name': 'names', 'type': 'string'},
                    {'name': 'unique', 'type': 'boolean'}],
         'primaryKey': ['index'],
         'pandas_version': '1.4.0'}

Reversibility:

In [9]: print(npd.read_json(df_to_table).equals(df))
Out[9]: True
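For comparison, pandas ships its own Table Schema builder, `pandas.io.json.build_table_schema`. The sketch below uses a small made-up frame (not the example above) and shows that pandas maps an `int32` column to a plain `integer` field with no `format` key, which is the information the NTV-pandas schema in Out[8] keeps through `'format': 'int32'`.

```python
import pandas as pd
from pandas.io.json import build_table_schema

# Build the frame first, then cast, so the int32 column is not realigned on the index
df = pd.DataFrame(
    {"value": [10, 20, 30], "value32": [12, 22, 32]},
    index=[100, 200, 300],
).astype({"value32": "int32"})

schema = build_table_schema(df)
for field in schema["fields"]:
    print(field)  # value32 is reported as type 'integer', without a 'format' key
print(schema["primaryKey"], schema["pandas_version"])
```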


Download files

Source Distribution

ntv_pandas-1.1.1.tar.gz (16.2 kB)

Uploaded Source

Built Distribution

ntv_pandas-1.1.1-py3-none-any.whl (13.5 kB)

Uploaded Python 3

File details

Details for the file ntv_pandas-1.1.1.tar.gz.

File metadata

  • Download URL: ntv_pandas-1.1.1.tar.gz
  • Upload date:
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for ntv_pandas-1.1.1.tar.gz
Algorithm    Hash digest
SHA256       8604f37b817ceb04d84fe32e41395f3d0cbaf1ce17fbd2e5871b44e6bcec8fec
MD5          705ef6b9f3f06e2a7a604bc7660568dc
BLAKE2b-256  dfe0924c7d3be5f7c4832e3bd6d18f468184f94eea9170c5764c7433b71688ad

File details

Details for the file ntv_pandas-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: ntv_pandas-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 13.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for ntv_pandas-1.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       3ca25bd720e1b00b133a54b8ffa757b7a35a1522eb75d5a8b539872df190048d
MD5          7a4a3f62ac89584ce4bf713d6f2e2e35
BLAKE2b-256  06a676764887d1ef01e344ab0c3ec350a3a4a4b5b696755093f1de9393e3b426
