Open-source tool for tracking, exploring and labelling data for AI projects.

Explore, label, and monitor data for AI projects

Rubrix is a free and open-source tool for exploring and iterating on data for artificial intelligence projects.

Rubrix focuses on enabling novel, human-in-the-loop workflows involving data scientists, subject matter experts, and ML/data engineers.

With Rubrix, you can:

  • Monitor the predictions of deployed models.
  • Label data with a novel search-guided, iterative workflow.
  • Iterate on ground-truth and predictions to debug, track and improve your data and models over time.
  • Build custom dashboards on top of your model predictions and labels.

Rubrix is composed of:

  • a Python library to bridge data and models, which you can install via pip.
  • a web application to explore and label data, which you can launch using Docker or directly with Python.

This is an example of Rubrix's labeling mode:

[Screenshot: Rubrix Annotation Mode]

And this is an example of logging model predictions from a 🤗 transformers zero-shot text classification pipeline:

from datasets import load_dataset
from transformers import pipeline
import rubrix as rb

# Zero-shot classifier: scores each text against the candidate labels
model = pipeline('zero-shot-classification', model="typeform/distilbert-base-uncased-mnli")

dataset = load_dataset("ag_news", split='test[0:100]')

# Our labels are: ['World', 'Sports', 'Business', 'Sci/Tech']
labels = dataset.features["label"].names

for record in dataset:
    prediction = model(record['text'], labels)

    item = rb.TextClassificationRecord(
        inputs={"text": record["text"]},
        prediction=list(zip(prediction['labels'], prediction['scores'])),
        annotation=labels[record["label"]]
    )

    rb.log(item, name="ag_news_zeroshot")
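The prediction argument above pairs each candidate label with its score. Here is a minimal sketch of that pairing, using a hand-written dictionary in the shape the zero-shot pipeline returns (keys sequence, labels, scores). The text and scores are invented for illustration, so the snippet runs without downloading a model:

```python
# Hypothetical pipeline output: the text and scores are made up for illustration.
fake_prediction = {
    "sequence": "Stocks rallied after the earnings report.",
    "labels": ["Business", "World", "Sci/Tech", "Sports"],
    "scores": [0.85, 0.08, 0.05, 0.02],
}

# Pair every label with its score, as done for the `prediction` argument.
pairs = list(zip(fake_prediction["labels"], fake_prediction["scores"]))

# Pick the highest-scoring pair explicitly instead of relying on list order.
top_label, top_score = max(pairs, key=lambda pair: pair[1])

print(pairs)
print(top_label, top_score)
```

Each tuple is a (label, score) pair, which is the shape passed to rb.TextClassificationRecord in the example above.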

Quick links

Doc | Description
🚶 First steps | New to Rubrix and want to get started?
👩‍🏫 Concepts | Want to know more about Rubrix concepts?
🛠️ Setup and install | How to configure and install Rubrix
🗒️ Tasks | What can you use Rubrix for?
📱 UI reference | How to use the web app for data exploration and annotation
🐍 Python API docs | How to use the Python classes and methods
👩‍🍳 Rubrix cookbook | How to use Rubrix with your favourite libraries (flair, stanza...)
👋 Community forum | Ask questions, share feedback, ideas and suggestions
🤗 Hugging Face tutorial | Using Rubrix with 🤗 transformers and datasets
💫 spaCy tutorial | Using spaCy with Rubrix for NER projects
🐠 Weak supervision tutorial | How to leverage weak supervision with snorkel & Rubrix
🤔 Active learning tutorial | How to use active learning with modAL & Rubrix
🧪 Knowledge graph tutorial | How to use Rubrix with kglab & pytorch_geometric

Get started

To get started you need to follow three steps:

  1. Install the Python client
  2. Launch the web app
  3. Start logging data

1. Install the Python client

You can install the Python client with pip:

pip install rubrix
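To confirm the installation, one option is to check from Python whether the package can be found. This is a minimal sketch using only the standard library; the helper name is_installed is made up for this example and is not part of Rubrix:

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be found by the import system."""
    return importlib.util.find_spec(package) is not None


# Prints True once `pip install rubrix` has succeeded in this environment.
print("rubrix installed:", is_installed("rubrix"))
```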

2. Launch the web app

There are two ways to launch the web app:

  • Using docker-compose (recommended).
  • Executing the server code manually.

Using docker-compose (recommended)

Create a folder:

mkdir rubrix && cd rubrix

and launch the docker-contained web app with the following command:

wget -O docker-compose.yml https://git.io/rb-docker && docker-compose up

This is the recommended way because it automatically includes an Elasticsearch instance, Rubrix's main persistence layer.

Executing the server code manually

When executing the server code manually you need to provide an Elasticsearch instance yourself.

  1. Install Elasticsearch (we recommend version 7.10) and launch an Elasticsearch instance. For macOS and Windows there are Homebrew formulae and an MSI package, respectively.
  2. Install the Rubrix Python library together with its server dependencies:

pip install rubrix[server]

  3. Launch a local instance of the Rubrix web app:

python -m rubrix.server

By default, the Rubrix server looks for your Elasticsearch endpoint at http://localhost:9200. To use a different endpoint, set the ELASTICSEARCH environment variable to its URL.
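As a rough sketch, the variable can also be set from Python when starting the server as a subprocess. The helper names server_env and launch_server, and the endpoint URL, are hypothetical and only serve this example. The launch command itself is the python -m rubrix.server call shown above:

```python
import os
import subprocess
import sys


def server_env(endpoint: str) -> dict:
    """Copy the current environment and point ELASTICSEARCH at `endpoint`."""
    env = dict(os.environ)
    env["ELASTICSEARCH"] = endpoint
    return env


def launch_server(endpoint: str) -> subprocess.Popen:
    """Start `python -m rubrix.server` with the custom endpoint (not called here)."""
    return subprocess.Popen(
        [sys.executable, "-m", "rubrix.server"],
        env=server_env(endpoint),
    )


# Hypothetical endpoint, for illustration only.
env = server_env("http://elastic.example.internal:9200")
print(env["ELASTICSEARCH"])
```

Calling launch_server("http://elastic.example.internal:9200") would start the server with that setting, assuming rubrix[server] is installed and the endpoint exists.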

3. Start logging data

The following code will log one record into the example-dataset dataset:

import rubrix as rb

rb.log(
    rb.TextClassificationRecord(inputs="my first rubrix example"),
    name='example-dataset'
)

The call should return a response like this:

BulkResponse(dataset='example-dataset', processed=1, failed=0)

If you go to your Rubrix app at http://localhost:6900/, you should see your first dataset.
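If the page does not load, a quick check from Python can tell whether anything is answering at that address. This is a minimal sketch using only the standard library; the helper name is_reachable is made up for this example:

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` within `timeout` seconds."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # The server answered, even if with an error status code.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar problems.
        return False


# Default address of the Rubrix web app mentioned above.
print("Rubrix app reachable:", is_reachable("http://localhost:6900/"))
```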

Congratulations! You are ready to start using Rubrix with your own data.

To better understand what's possible, take a look at Rubrix's Cookbook.

Community

Rubrix is a new open-source project, and we are eager to hear your thoughts, fix bugs, and help you get started. Feel free to use the Discussion forum or open an issue, and we'll be pleased to help out.
