
Python wrapper for Data Explorer


dx

This package provides convenient formatting and IPython display formatter registration for tabular data and DEX media types.



A Pythonic Data Explorer, open sourced with ❤️ by Noteable, a collaborative notebook platform that enables teams to use and visualize data, together.

Requirements

Python 3.8+

Installation

Poetry

poetry add dx

Then import the package:

import dx

Pip

pip install dx

Then import the package:

import dx

Usage

The dx library currently enables DEX media type visualization of pandas DataFrame and Series objects, as well as NumPy ndarray objects. There are two ways to use it:

  • explicit dx.display() calls
  • setting a display mode with dx.set_display_mode(), which updates the IPython display formatters for the whole session

With dx.display()

dx.display() will display a single dataset using the DEX media type. It currently supports:

  • pandas DataFrame objects

    import random
    
    import dx
    import pandas as pd
    
    # 500 rows of random integers (0-100) and random floats (0.0-1.0)
    df = pd.DataFrame({
        'random_ints': [random.randint(0, 100) for _ in range(500)],
        'random_floats': [random.random() for _ in range(500)],
    })
    dx.display(df)
    

  • tabular data as dict or list types

    dx.display([
      [1, 5, 10, 20, 500],
      [1, 2, 3, 4, 5],
      [0, 0, 0, 0, 1]
    ])
    

  • .csv or .json filepaths

    df = dx.random_dataframe()
    df.to_csv("dx_docs_sample.csv", index=False)
    
    dx.display("dx_docs_sample.csv")
    

With dx.set_display_mode()

Setting either the "simple" or "enhanced" display mode makes dx update the current IPython display formatters, so that pandas DataFrame objects are rendered with the DEX media type for the entire notebook/kernel session instead of with the default DataFrame display output.

Details

This adjusts the following pandas options:

  • the maximum number of rows displayed is raised from the pandas default of 60 to 50000
  • the maximum number of columns displayed is raised from the pandas default of 20 to 50
  • html.table_schema is enabled (it is False by default in pandas)
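As a rough sketch, assuming these adjustments map onto the standard pandas options display.max_rows, display.max_columns, and display.html.table_schema, the changes above (and the later revert to defaults) could be reproduced with pd.set_option and pd.reset_option. The helper names apply_options and reset_options are hypothetical; this is not dx's own code.

```python
import pandas as pd

# Hypothetical mapping of the adjustments above onto standard pandas options
# (an assumption for illustration, not taken from dx's source).
DEX_LIKE_OPTIONS = {
    "display.max_rows": 50_000,
    "display.max_columns": 50,
    "display.html.table_schema": True,
}


def apply_options(options: dict) -> None:
    """Set each pandas option in `options` to the given value."""
    for key, value in options.items():
        pd.set_option(key, value)


def reset_options(options: dict) -> None:
    """Restore each pandas option named in `options` to its pandas default."""
    for key in options:
        pd.reset_option(key)


apply_options(DEX_LIKE_OPTIONS)
print(pd.get_option("display.max_rows"))  # 50000

reset_options(DEX_LIKE_OPTIONS)
print(pd.get_option("display.max_rows"))  # 60
```

Note that pandas' own default for display.max_columns depends on the environment: it is 20 inside a Jupyter kernel but 0 in a terminal session.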

dx also performs some basic column cleaning and generates a schema for the DataFrame using pandas.io.json.build_table_schema. Depending on the display mode, the data is then transformed into either a list of dictionaries or a list of lists of columnar values:

  • "simple" - list of dictionaries
  • "enhanced" - list of lists
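To make the two shapes concrete, here is a small pandas-only sketch. It builds a schema with build_table_schema, then lays the same data out as a list of dictionaries ("simple") and, reading "list of lists of columnar values" as one inner list per column, as a list of lists ("enhanced"). It does not reproduce dx's column cleaning or its exact payloads.

```python
import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

# Table schema describing the columns (index omitted for brevity).
schema = build_table_schema(df, index=False)

# "simple"-style payload: one dictionary per row,
# equal to [{'a': 1, 'b': 0.5}, {'a': 2, 'b': 1.5}]
simple_data = df.to_dict(orient="records")

# "enhanced"-style payload (assumed columnar): one list of values per column,
# equal to [[1, 2], [0.5, 1.5]]
enhanced_data = [df[col].tolist() for col in df.columns]
```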

NOTE: Unlike dx.display(), this only affects pandas DataFrames (or any types set in settings.RENDERABLE_TYPES); it does not affect the display of .csv/.json file data or dict/list outputs.

  • dx.set_display_mode("simple")

    import dx
    import numpy as np
    import pandas as pd
    
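    # enable DEX display outputs from now on
    dx.set_display_mode("simple")
    
    # in a notebook, only a cell's last expression is rendered,
    # so evaluate `df` and `df2` in separate cells
    df = pd.read_csv("dx_docs_sample.csv")
    df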
    
    df2 = pd.DataFrame(
        [
            [1, 5, 10, 20, 500],
            [1, 2, 3, np.nan, 5],
            [0, 0, 0, np.nan, 1]
        ],
        columns=['a', 'b', 'c', 'd', 'e']
    )
    df2
    

If at any point you want to go back to the default display formatting (vanilla pandas output), use the "plain" display mode. This reverts the IPython display formatter changes and restores the pandas options to their default values.

  • dx.set_display_mode("plain")

    # revert to original pandas display outputs from now on
    dx.set_display_mode("plain")
    
    df = pd.read_csv("dx_docs_sample.csv")
    df
    
    df2 = pd.DataFrame(
        [
            [1, 5, 10, 20, 500],
            [1, 2, 3, np.nan, 5],
            [0, 0, 0, np.nan, 1]
        ],
        columns=['a', 'b', 'c', 'd', 'e']
    )
    df2
    

Custom Settings

Default settings for dx can be viewed with dx.settings.

Each setting can be changed for the current session with dx.set_option(), for example setting DISPLAY_MAX_ROWS to 3.

Alternatively, use the dx.settings_context() context manager to set DISPLAY_MAX_ROWS to 3 only within the current context, leaving the options for the rest of the session alone.
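The difference between the two approaches is the scope of the change: a set_option call persists for the rest of the session, while a context manager restores the previous values on exit. The pure-Python sketch below illustrates that behaviour with a hypothetical SketchSettings object and hypothetical sketch_set_option / sketch_settings_context helpers; it is not dx's implementation, and the default of 100 is an arbitrary placeholder.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class SketchSettings:
    """Hypothetical stand-in for a settings object (not dx's real class)."""

    DISPLAY_MAX_ROWS: int = 100  # arbitrary placeholder default


settings = SketchSettings()


def _check_key(key: str) -> None:
    """Reject names that are not defined on the settings object."""
    if not hasattr(settings, key):
        raise KeyError(f"unknown setting: {key}")


def sketch_set_option(key: str, value: Any) -> None:
    """Change a setting for the rest of the session."""
    _check_key(key)
    setattr(settings, key, value)


@contextmanager
def sketch_settings_context(**overrides: Any) -> Iterator[SketchSettings]:
    """Override settings inside a `with` block, then restore the old values."""
    for key in overrides:
        _check_key(key)
    previous = {key: getattr(settings, key) for key in overrides}
    try:
        for key, value in overrides.items():
            setattr(settings, key, value)
        yield settings
    finally:
        # Runs even if the block raises, so session state is always restored.
        for key, value in previous.items():
            setattr(settings, key, value)


with sketch_settings_context(DISPLAY_MAX_ROWS=3):
    print(settings.DISPLAY_MAX_ROWS)  # 3
print(settings.DISPLAY_MAX_ROWS)  # 100

sketch_set_option("DISPLAY_MAX_ROWS", 3)
print(settings.DISPLAY_MAX_ROWS)  # 3
```

Restoring in a finally clause is what keeps a temporary override from leaking into the rest of the session when the block raises.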

Generating Sample Data

Documentation coming soon!

Usage Outside of Noteable

When using this package in a notebook environment outside of Noteable, the frontend must support the following media types to render dx output:

  • application/vnd.dataresource+json for "simple" display mode
  • application/vnd.dex.v1+json for "enhanced" display mode
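For frontends that render application/vnd.dataresource+json, the payload is a table schema plus row records. pandas itself publishes this media type when html.table_schema is enabled, using its "table" JSON orient. The sketch below builds such a mimebundle by hand with pandas; the "text/plain" fallback and the variable names are illustrative choices, and it does not cover the application/vnd.dex.v1+json format used by the "enhanced" mode.

```python
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# orient="table" yields {"schema": {...}, "data": [...]}, the same JSON shape
# pandas uses for its own application/vnd.dataresource+json output.
payload = json.loads(df.to_json(orient="table", index=False))

# A raw mimebundle: the frontend picks the richest media type it supports.
bundle = {
    "application/vnd.dataresource+json": payload,
    "text/plain": repr(df),  # fallback for frontends without a dataresource renderer
}

# In an IPython/Jupyter session the bundle could be published with:
#   from IPython.display import display
#   display(bundle, raw=True)
```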

Contributing

See CONTRIBUTING.md.

Code of Conduct

We follow the noteable.io code of conduct.

LICENSE

See LICENSE.md.


Open sourced with ❤️ by Noteable for the community.




Download files


Source Distribution

dx-1.3.0.tar.gz (51.4 kB)

Built Distribution

dx-1.3.0-py3-none-any.whl (65.8 kB)

File details

Details for the file dx-1.3.0.tar.gz.

File metadata

  • Download URL: dx-1.3.0.tar.gz
  • Size: 51.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.10.6 Linux/5.15.0-1034-azure

File hashes

Hashes for dx-1.3.0.tar.gz

  Algorithm     Hash digest
  SHA256        8d8c7f7eac20569f031d00f37eed9dc712cd96cae57a4f09b67c5d11b36d598a
  MD5           ad9a643295eb78faea2a55a82ac7952a
  BLAKE2b-256   d2887bd2e955b475d0e15886150fd312553352f4e39ec2c45b1c724acf0f811e


File details

Details for the file dx-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: dx-1.3.0-py3-none-any.whl
  • Size: 65.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.10.6 Linux/5.15.0-1034-azure

File hashes

Hashes for dx-1.3.0-py3-none-any.whl

  Algorithm     Hash digest
  SHA256        55b7c9c8381d24500afd013f80634f363dfc250a71853c313430ccc4af120e59
  MD5           ef01db2379a514111018b806d9973688
  BLAKE2b-256   eb8f3494fbb2b6ae3691ac28edba3030186c7d54159a8128ad2ae128126856ef

