Meerkat is building new data abstractions to make machine learning easier.

Create interactive views of any dataset.

Website | Quickstart | Docs | Contributing | Discord | Blogpost

⚡️ Quickstart

pip install meerkat-ml

Next Steps. Check out our Getting Started page and our documentation to start building with Meerkat.

Why Meerkat?

Meerkat is an open-source Python library that helps users visualize, explore, and annotate any dataset. It is especially useful when processing unstructured data types (e.g. free text, PDFs, images, video) with machine learning models.

✏️ Features and Design Principles

Here are four principles that inform Meerkat's design.

(1) Low overhead. With four lines of Python, start interacting with any dataset.

  • Zero-copy integrations with your preferred data abstractions: Pandas, Arrow, HF Datasets, Ibis, SQL.
  • Limited data movement. With Meerkat, you interact with your data where it already lives: no uploads to external databases and no reformatting.
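
import meerkat as mk

# Load the CSV into a Meerkat DataFrame.
df = mk.from_csv("paintings.csv")
# Wrap the URLs in the "image_url" column as a file column.
df["image"] = mk.files(df["image_url"])
# In a notebook, evaluating the DataFrame renders it.
df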
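
One way to picture "limited data movement" for a file-backed column is a column that stores only paths and reads a file when a row is accessed. The sketch below uses only the standard library. `LazyFileColumn` is a hypothetical name for illustration and is not Meerkat's implementation or part of its API.

```python
import tempfile
from pathlib import Path


class LazyFileColumn:
    """Hypothetical stand-in for a file-backed column: keeps paths, reads on access."""

    def __init__(self, paths, loader=Path.read_bytes):
        self.paths = [Path(p) for p in paths]
        self.loader = loader
        self.loads = 0  # number of files actually read so far

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The file is only read here, when a row is requested.
        self.loads += 1
        return self.loader(self.paths[index])


with tempfile.TemporaryDirectory() as tmp:
    # Create three small placeholder files standing in for images.
    paths = []
    for i in range(3):
        path = Path(tmp) / f"painting_{i}.bin"
        path.write_bytes(f"pixels-{i}".encode())
        paths.append(path)

    column = LazyFileColumn(paths)
    loads_before = column.loads  # nothing has been read yet
    second = column[1]           # reads exactly one file
    loads_after = column.loads
```

Building the column touches no file contents; only the row that is indexed gets read.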

(2) Diverse data types. Visualize and annotate almost any data type in Meerkat interfaces: text, images, audio, video, MRI scans, PDFs, HTML, JSON.

(3) "Intelligent" user interfaces. Meerkat makes it easy to embed machine learning models (e.g. LLMs) within user interfaces to enable intelligent functionality such as searching, grouping and autocomplete.

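# Embed the images with CLIP (uses the "image" column created above).
df["embedding"] = mk.embed(df["image"], engine="clip")
# Match component: scores each row against a user-entered query.
match = mk.gui.Match(
    df,
    against="embedding",
    engine="clip",
)
# Sort by the match score, best matches first.
sorted_df = mk.sort(
    df,
    by=match.criterion.name,
    ascending=False,
)
gallery = mk.gui.Gallery(sorted_df)
mk.gui.html.div([match, gallery])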
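
Conceptually, the match-then-sort pattern above scores every row's embedding against a query embedding and orders rows by that score. The sketch below illustrates the idea in plain Python. It assumes cosine similarity as the score and uses made-up 2-D vectors in place of CLIP embeddings. `rank_by_match` is a hypothetical helper, not a Meerkat function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_match(rows, query_embedding):
    """Attach a 'match' score to each row and return rows sorted best-first."""
    scored = [
        dict(row, match=cosine_similarity(row["embedding"], query_embedding))
        for row in rows
    ]
    return sorted(scored, key=lambda row: row["match"], reverse=True)


# Toy data: the embeddings are invented for illustration only.
rows = [
    {"title": "sunflowers", "embedding": [0.9, 0.1]},
    {"title": "seascape", "embedding": [0.1, 0.9]},
    {"title": "wheat field", "embedding": [0.7, 0.3]},
]
query_embedding = [1.0, 0.0]

ranked = rank_by_match(rows, query_embedding)
ranked_titles = [row["title"] for row in ranked]
```

Sorting in descending order of score plays the same role as `ascending=False` in the snippet above.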

(4) Declarative (think: Seaborn), but also infinitely customizable and composable. Meerkat visualization components can be composed and customized to create new interfaces.

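# plot_df: a DataFrame with "umap_1" and "umap_2" columns (not shown).
plot = mk.gui.plotly.ScatterPlot(df=plot_df, x="umap_1", y="umap_2")

# Reactive functions re-run whenever their inputs change.
@mk.gui.reactive
def filter_selected(selected: list, df: mk.DataFrame):
    # Keep only the rows whose primary key is selected in the plot.
    return df[df.primary_key.isin(selected)]

filtered_df = filter_selected(plot.selected, plot_df)
table = mk.gui.Table(filtered_df, classes="h-full")

mk.gui.html.flex([plot, table], classes="h-[600px]")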
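
The reactive pattern above can be pictured as a function whose output is recomputed whenever one of its inputs changes. The sketch below is a toy model of that idea in plain Python. `Store` and `reactive` here are hypothetical stand-ins written for illustration and do not describe how Meerkat implements reactivity.

```python
class Store:
    """Hypothetical observable value that notifies listeners when it changes."""

    def __init__(self, value):
        self._value = value
        self._listeners = []

    @property
    def value(self):
        return self._value

    def set(self, value):
        self._value = value
        for listener in list(self._listeners):
            listener()

    def subscribe(self, listener):
        self._listeners.append(listener)


def reactive(fn):
    """Wrap fn so its result lives in a Store that updates when any Store argument changes."""

    def wrapper(*args):
        out = Store(None)

        def recompute():
            values = [a.value if isinstance(a, Store) else a for a in args]
            out.set(fn(*values))

        for arg in args:
            if isinstance(arg, Store):
                arg.subscribe(recompute)
        recompute()  # compute the initial result
        return out

    return wrapper


@reactive
def filter_selected(selected, rows):
    # Keep only the rows whose id is in the current selection.
    chosen = set(selected)
    return [row for row in rows if row["id"] in chosen]


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
selected = Store([])                    # plays the role of plot.selected
filtered = filter_selected(selected, rows)
initial_ids = [row["id"] for row in filtered.value]

selected.set([1, 3])                    # simulate a selection in the plot
updated_ids = [row["id"] for row in filtered.value]
```

Nothing calls `filter_selected` a second time explicitly; changing `selected` is enough to refresh `filtered`, which is the behavior the table relies on.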

✨ Use cases where Meerkat shines

  • Exploratory analysis over unstructured data types. Demo
  • Spot-checking the behavior of large language models (e.g. GPT-3). Demo
  • Identifying systematic errors made by machine learning models. Demo
  • Rapid labeling of validation data.

🤔 Use cases where Meerkat may not be the right fit

  • Are you only working with structured data (e.g. numerical and categorical variables)? Popular data visualization libraries (e.g. Seaborn, Matplotlib) are often sufficient. If you're looking for interactivity, Plotly and Streamlit work well with structured data. Meerkat is differentiated in how it visualizes unstructured data types: long-form text, PDFs, HTML, images, video, audio...
  • Are you trying to make a straightforward demo of a machine learning model (single input/output, chatbot) and share with the world? Gradio is likely a better fit! Though, if your demo involves visualizing lots of data, you may find Meerkat useful.
  • Are you trying to manually label tens of thousands of data points? If you are looking for a data labeling tool to use with a labeling team, there are great open source labeling solutions designed for this (e.g. LabelStudio). In contrast, Meerkat is a great fit for teams and individuals who lack access to a large labeling workforce, are using pretrained models (e.g. GPT-3), and need to label validation data or in-context examples.

✉️ About

Meerkat is being built by machine learning PhD students in the Hazy Research lab at Stanford. We're excited to build for a future where models make it easier for teams to sift and reason through large volumes of unstructured data.

If you would like to use Meerkat for a project or at your company, or if you have any questions, please reach out to kgoel [at] cs [dot] stanford [dot] edu, eyuboglu [at] stanford [dot] edu, or arjundd [at] stanford [dot] edu.
