
An ipywidget helper class to manually label rows in pandas data frames.


DataFrameLabeler

A small ipywidget tool for labeling data frames inside jupyter.

Installation

Currently, the only way to use DataFrameLabeler is to clone this repository.

Why?

This small tool was inspired by the fast.ai image cleaner widget (https://docs.fast.ai/widgets.image_cleaner.html). However, I needed a tool for tabular data.

How to use?

# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from DataFrameLabeler import DataFrameLabeler

# Suppose you have a pandas data frame in which each row should be labeled
# either 'SUCCESS' or 'FAILURE', like the following one.
length = 100
cols = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(np.random.rand(length, len(cols)), columns=cols)

# First, you need a function that plots a single row and returns the figure.
def plotter(idx, row):
    fig = plt.figure()
    plt.plot(row[cols].to_numpy())
    # Close the figure so it is not displayed when the function is called.
    plt.close(fig)
    return fig

# Afterwards, just construct a DataFrameLabeler object.
# If `target_col` exists in the data frame, its content will be used as preselection.
lbl = DataFrameLabeler(df,
                       labels=['FAILURE', 'SUCCESS'], # choices for the labels
                       plotter=plotter,               # function which plots each row
                       target_col='class_name',       # column name where the labels will be stored
                       width=3,                       # number of figures in each row
                       height=2                       # number of rows shown at once
                       )
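The preselection behaviour mentioned in the comments above can be used to start from a rough guess instead of an empty column. The sketch below is only an illustration: the mean-based rule and the 0.5 threshold are made up for this example, and it assumes that preselected values should come from the same `labels` list that is passed to the constructor.

```python
import numpy as np
import pandas as pd

labels = ['FAILURE', 'SUCCESS']
cols = ['A', 'B', 'C', 'D', 'E']

# Same shape of example data as above, seeded so the run is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, len(cols))), columns=cols)

# Hypothetical heuristic: guess 'SUCCESS' when the row mean exceeds 0.5.
df['class_name'] = np.where(df[cols].mean(axis=1) > 0.5, 'SUCCESS', 'FAILURE')

# Sanity check: every prefilled value is one of the allowed labels
# (assumption: the labeler expects preselected values from `labels`).
unexpected = set(df['class_name']) - set(labels)
```

With the column in place, passing `target_col='class_name'` as in the example above should show these guesses as the initial selection, which then only need to be corrected by hand.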


# To obtain the newly labeled data frame, call lbl.get_labeled_data().
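Since intermediate results are not saved automatically yet, one option is to write the labeled frame to disk by hand and reload it in a later session. This is a minimal sketch, assuming that `lbl.get_labeled_data()` returns an ordinary pandas data frame that includes the target column. The helpers `save_labels` and `load_labels` are hypothetical names introduced here and are not part of DataFrameLabeler. A small hand-written frame stands in for the real result.

```python
import os
import tempfile

import pandas as pd


def save_labels(labeled_df, path):
    """Write the labeled frame to CSV, keeping the index so rows can be matched later."""
    labeled_df.to_csv(path, index=True)


def load_labels(path):
    """Read a frame written by save_labels, restoring the index from the first column."""
    return pd.read_csv(path, index_col=0)


# Stand-in for the frame returned by lbl.get_labeled_data() (assumption).
labeled = pd.DataFrame({
    'A': [0.1, 0.9, 0.4],
    'class_name': ['FAILURE', 'SUCCESS', 'FAILURE'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'labels.csv')
    save_labels(labeled, path)
    restored = load_labels(path)
```

Because an existing `target_col` is used as preselection, a reloaded frame can be handed back to a new `DataFrameLabeler` to continue labeling where the previous session stopped.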

Result

TODO:

  • rework how the user-defined plotter works; at the moment it is horrifying, especially when using matplotlib
  • proper styling of buttons
  • allow a groupby argument
  • allow multi-selection
  • add automatic saving of intermediate results to a csv or pickle file
  • rethink the interface
  • add more unit tests
  • documentation


Files for DataFrameLabeler, version 0.0.1

Filename                                  Size    File type  Python version
DataFrameLabeler-0.0.1-py3-none-any.whl   9.5 kB  Wheel      py3
DataFrameLabeler-0.0.1.tar.gz             6.6 kB  Source     None
