
An ipywidget helper class to manually label rows in pandas data frames.

DataFrameLabeler

A small ipywidget tool for labeling pandas data frames inside Jupyter.

Installation

Currently, the only way to use the DataFrameLabeler is to clone this repository.

Why?

This small tool was inspired by the fast.ai image cleaner widget (https://docs.fast.ai/widgets.image_cleaner.html). However, I needed a tool for tabular data.

How to use?

# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from DataFrameLabeler import DataFrameLabeler

# Suppose you have a pandas data frame in which each row should be labeled
# either 'SUCCESS' or 'FAILURE', like the following one.
length = 100
cols = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(np.random.rand(length, len(cols)), columns=cols)

# First, you need a function that plots a single row and returns the figure.
def plotter(idx, row):
    fig = plt.figure()
    plt.plot(list(row[cols]))
    # Close the figure so it is not displayed when the function is called.
    plt.close(fig)
    return fig

# Afterwards, just construct a DataFrameLabeler object.
# If `target_col` exists in the data frame, its content will be used as preselection.
lbl = DataFrameLabeler(df,
                       labels=['FAILURE', 'SUCCESS'], # choices for the labels
                       plotter=plotter,               # function which plots each row
                       target_col='class_name',       # column name where the labels will be stored
                       width=3,                       # number of figures in each row
                       height=2                       # number of rows shown at once
                       )
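
Because an existing `target_col` is used as preselection, one way to speed up labeling is to pre-fill that column with a rough guess before constructing the labeler. The sketch below is only an illustration: the `guess_label` helper, the 0.5 threshold and the "row mean" rule are made up for this example and are not part of DataFrameLabeler.

```python
import numpy as np
import pandas as pd

cols = ['A', 'B', 'C', 'D', 'E']
rng = np.random.RandomState(0)  # fixed seed so the example is reproducible
df = pd.DataFrame(rng.rand(100, len(cols)), columns=cols)

def guess_label(frame, threshold=0.5):
    """Hypothetical heuristic: 'SUCCESS' if the row mean exceeds `threshold`."""
    is_success = frame[cols].mean(axis=1) > threshold
    return pd.Series(np.where(is_success, 'SUCCESS', 'FAILURE'),
                     index=frame.index)

# Store the guesses in the column that is later passed as `target_col`.
df['class_name'] = guess_label(df)
print(df['class_name'].value_counts())
```

Passing `target_col='class_name'` to the constructor should then show these guesses as the initial selection, so only the wrong ones need to be corrected by hand.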

[Screenshot: the DataFrameLabeler widget]

# To obtain the newly labeled data frame, call lbl.get_labeled_data().
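
Assuming `get_labeled_data()` returns a pandas data frame with the labels stored in `target_col`, the result can be inspected with ordinary pandas operations. Since the widget is interactive, the sketch below uses a small hand-built stand-in frame with made-up values instead of a real labeling session.

```python
import pandas as pd

# Stand-in for `labeled = lbl.get_labeled_data()`; the values are made up.
labeled = pd.DataFrame({
    'A': [0.1, 0.9, 0.4, 0.7],
    'B': [0.3, 0.8, 0.2, 0.6],
    'class_name': ['FAILURE', 'SUCCESS', 'FAILURE', 'SUCCESS'],
})

# Number of rows assigned to each label.
counts = labeled['class_name'].value_counts()

# Keep only the rows labeled 'SUCCESS'.
successes = labeled[labeled['class_name'] == 'SUCCESS']

print(counts)
print(successes)
```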

[Screenshot: the resulting labeled data frame]

TODO:

  • rework how the user-defined plotter works; at the moment it's horrifying, especially when using matplotlib
  • proper styling of buttons
  • allow groupby argument
  • allow multi selection
  • add automatic saving of intermediate result to csv or pickle file
  • rethink interface
  • add more unit tests
  • Documentation
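
Until automatic saving is implemented, intermediate results can be saved by hand. The following is a minimal sketch, assuming the labeled data is available as a pandas data frame (for example from `get_labeled_data()`); `save_checkpoint`, `load_checkpoint` and the file name are hypothetical helpers for this example, not part of DataFrameLabeler.

```python
import os
import tempfile

import pandas as pd

def save_checkpoint(frame, path):
    """Write `frame` to `path` as CSV via a temporary file in the same
    directory, so an interrupted write does not leave a truncated file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix='.csv', dir=directory)
    os.close(fd)
    try:
        frame.to_csv(tmp_path, index=True)
        os.replace(tmp_path, path)  # swap the finished file into place
    finally:
        # After a successful replace the temporary file is gone already.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)

def load_checkpoint(path):
    """Read a checkpoint written by `save_checkpoint`."""
    return pd.read_csv(path, index_col=0)

# Stand-in for a partially labeled data frame; the values are made up.
labeled = pd.DataFrame({
    'A': [0.25, 0.5, 0.75],
    'class_name': ['FAILURE', 'SUCCESS', 'SUCCESS'],
})

checkpoint_dir = tempfile.mkdtemp()
checkpoint_path = os.path.join(checkpoint_dir, 'labels_checkpoint.csv')
save_checkpoint(labeled, checkpoint_path)
restored = load_checkpoint(checkpoint_path)
print(restored)
```

A restored frame that already contains the `target_col` column could then be passed to a new `DataFrameLabeler`, so the saved labels appear as preselection.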
