
Meerkat is building new data abstractions to make machine learning easier.

Meerkat is an open-source Python library designed for technical teams that want to interactively wrangle their unstructured data with foundation models.

Website | Quickstart | Docs | Contributing | Discord | Blogpost

⚡️ Quickstart

We recommend installing Meerkat in a virtual environment:

pip install meerkat-ml

GPU Install: If you want to use Meerkat with a GPU, you will need to install PyTorch with GPU support. See here for more details.

Optional Dependencies: Some parts of Meerkat rely on optional dependencies; for example, audio processing may rely on utilities from torchaudio. We leave it up to you to install these dependencies when they are required. As a convenience, we provide bundles of optional dependencies that you can install, e.g. pip install meerkat-ml[text] for text dependencies. See setup.py for a full list of optional dependencies.
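Before calling a feature that needs an optional dependency, it can help to check which packages are importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is an illustrative assumption and is not part of Meerkat's API.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in module_names if find_spec(name) is None]


# torchaudio is the optional audio dependency mentioned above.
missing = missing_modules(["torchaudio"])
if missing:
    print("Install before using these features:", ", ".join(missing))
```

The check only looks up top-level module names and does not import them, so it stays cheap even for heavy packages.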

Then try one of our demos:

mk demo tutorial-image-gallery --copy

Explore the code for this demo in tutorial-image-gallery.py.

To see a full list of demos, use mk demo --help. (If this doesn't work for you, we'd appreciate it if you could open an issue and let us know.)

Next Steps. Check out our Getting Started page and our documentation to start building with Meerkat. As we work to make the documentation more comprehensive, please feel free to open an issue or reach out if you have any questions.

Why Meerkat?

Meerkat is an open-source Python library, designed to help technical teams interactively wrangle images, videos, text documents and more with foundation models.

Our goal is to make foundation models a more reliable software abstraction for processing unstructured datasets. Read our blogpost to learn more.

Meerkat’s approach is based on two pillars:

(1) Heterogeneous data frames with an extended API. At the heart of Meerkat is a data frame that can store structured fields (e.g. numbers, strings, and dates) alongside complex objects (e.g. images, web pages, audio) and their tensor representations (e.g. embeddings, logits) in a single table. Meerkat's data frame API goes beyond structured data analysis libraries like Pandas by providing a set of unstructured data operations backed by foundation models (FMs).

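import meerkat as mk

# Load tabular metadata; "img_path" is assumed to be a column of image file paths.
df = mk.from_csv("paintings.csv")
# Add a column of image files.
df["img"] = mk.files("img_path")
# Embed each image with CLIP; "embedding" is the column name the Match example refers to.
df["embedding"] = mk.embed(df["img"], encoder="clip")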
df
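To make the idea of storing complex objects next to structured fields more concrete, here is a toy sketch in plain Python. It is not Meerkat's implementation: `LazyColumn`, its `load` callback, and the placeholder values are hypothetical and only illustrate deferring the loading of an object until a row is accessed.

```python
class LazyColumn:
    """Toy column that stores references (e.g. file paths) and loads on access."""

    def __init__(self, refs, load):
        self.refs = list(refs)
        self.load = load  # hypothetical loader, e.g. a function that opens an image

    def __len__(self):
        return len(self.refs)

    def __getitem__(self, index):
        # The object is materialized only when a row is actually requested.
        return self.load(self.refs[index])


# A "table" here is just a dict of equal-length columns.
table = {
    "title": ["painting_a", "painting_b"],  # structured field
    # str.upper stands in for real image loading so the sketch runs anywhere.
    "img": LazyColumn(["a.jpg", "b.jpg"], load=str.upper),
}
print(table["title"][0], table["img"][0])
```

Keeping only references in the column means a table with many large files can be created without reading any of them up front.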

(2) Interactivity in Python. Meerkat provides interactive data frame visualizations that allow you to control foundation models as they process your data. Meerkat visualizations are implemented in Python, so they can be composed and customized in notebooks or data scripts. Labeling is critical for instructing and validating foundation models, so labeling GUIs are a priority in Meerkat.

match = mk.gui.Match(
    df,
    against="embedding",
    engine="clip",
)
sorted_df = mk.sort(
    df,
    by=match.criterion.name,
    ascending=False,
)
gallery = mk.gui.Gallery(sorted_df)
mk.gui.html.div([match, gallery])
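Conceptually, matching against an embedding column and sorting by the result amounts to scoring each row against a query vector and ordering rows by that score. The sketch below illustrates this with cosine similarity in plain Python. The function names and toy vectors are assumptions for illustration and do not describe how `mk.gui.Match` is implemented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings):
    """Return row indices ordered from most to least similar to `query`."""
    scores = [cosine_similarity(query, e) for e in embeddings]
    return sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)


# Toy 2-D embeddings; real CLIP embeddings have hundreds of dimensions.
embeddings = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]
order = rank_by_similarity([1.0, 0.1], embeddings)
print(order)
```

Sorting in descending order of score mirrors the `ascending=False` argument in the example above: the best matches come first.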

✉️ About

Meerkat is being built by machine learning PhD students in the Hazy Research lab at Stanford. We're excited to build for a future where models make it easier for teams to sift through and reason about large volumes of data. We have varied research backgrounds and have done research that touches all parts of the machine learning process: we've created new model architectures, studied model robustness and evaluation, and worked on applications ranging from audio generation to medical imaging.

Please reach out to kgoel [at] cs [dot] stanford [dot] edu, eyuboglu [at] stanford [dot] edu, and arjundd [at] stanford [dot] edu if you would like to use Meerkat for a project or at your company, or if you have any questions.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution: meerkat-ml-0.4.2.tar.gz (2.8 MB)

Built Distribution: meerkat_ml-0.4.2-py2.py3-none-any.whl (3.0 MB, Python 2 and Python 3)
