Meerkat is building new data abstractions to make machine learning easier.
Project description
Meerkat is an open-source Python library designed for technical teams that want to interactively wrangle their unstructured data with foundation models.
⚡️ Quickstart
We recommend installing Meerkat in a virtual environment:
pip install meerkat-ml
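If you are unsure whether the interpreter you are installing into belongs to a virtual environment, a quick check along these lines can help. This is a minimal sketch using only the standard library; the helper name `in_virtualenv` is ours and is not part of Meerkat.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Note that this only detects environments created with `venv` or recent `virtualenv`; conda environments are managed differently and would need a separate check.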
GPU Install: If you want to use Meerkat with a GPU, you will need to install PyTorch with GPU support. See here for more details.
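To see whether your PyTorch installation can actually reach a GPU, something like the following may be useful. It is a sketch that relies only on standard PyTorch calls (`torch.cuda.is_available` and `torch.cuda.get_device_name`); the function name `gpu_status` is hypothetical and not a Meerkat API.

```python
def gpu_status() -> str:
    """Describe whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed"

    if torch.cuda.is_available():
        # Report the name of the first visible CUDA device.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but no CUDA device is visible"


if __name__ == "__main__":
    print(gpu_status())
```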
Optional Dependencies: some parts of Meerkat rely on optional dependencies, e.g. audio processing may rely on utilities from torchaudio. We leave it up to you to install necessary dependencies when required. As a convenience, we provide bundles of optional dependencies that you can install, e.g. pip install meerkat-ml[text] for text dependencies. See setup.py for a full list of optional dependencies.
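Because optional dependencies are left to you, it can help to check up front which ones are missing. The snippet below is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper `missing_modules` is our own name, and the module list is only an example taken from the text above.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    # Only pass top-level names: dotted names trigger an import of the parent.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torchaudio"])
    if missing:
        print("missing optional dependencies:", ", ".join(missing))
    else:
        print("all checked optional dependencies are installed")
```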
Then try one of our demos:
mk demo tutorial-image-gallery --copy
Explore the code for this demo in tutorial-image-gallery.py.
To see a full list of demos, use mk demo --help. (If this didn't work for you, we'd appreciate it if you could open an issue and let us know.)
Next Steps. Check out our Getting Started page and our documentation to start building with Meerkat. As we work to make the documentation more comprehensive, please feel free to open an issue or reach out if you have any questions.
Why Meerkat?
Meerkat is an open-source Python library, designed to help technical teams interactively wrangle images, videos, text documents and more with foundation models.
Our goal is to make foundation models a more reliable software abstraction for processing unstructured datasets. Read our blogpost to learn more.
Meerkat’s approach is based on two pillars:
(1) Heterogeneous data frames with an extended API. At the heart of Meerkat is a data frame that can store structured fields (e.g. numbers, strings, and dates) alongside complex objects (e.g. images, web pages, audio) and their tensor representations (e.g. embeddings, logits) in a single table. Meerkat's data frame API goes beyond structured data analysis libraries like Pandas by providing a set of unstructured data operations backed by foundation models (FMs).
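import meerkat as mk

# Load the tabular metadata from a CSV file.
df = mk.from_csv("paintings.csv")
# Add an image column backed by files on disk.
df["img"] = mk.files("img_path")
# Embed each image with CLIP; the column name matches the one used by Match below.
df["embedding"] = mk.embed(df["img"], encoder="clip")
df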
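To make the idea of a heterogeneous table more concrete, here is a toy, standard-library-only sketch of a column store that keeps scalar fields, lazily loaded objects, and embedding vectors side by side. It is not Meerkat's implementation; `LazyColumn`, `ToyFrame`, and all values below are made up for illustration, and lazy loading is just one plausible design.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class LazyColumn:
    """Stores cheap references (e.g. file paths) and loads the object on access."""

    refs: List[str]
    loader: Callable[[str], Any]

    def __len__(self) -> int:
        return len(self.refs)

    def __getitem__(self, index: int) -> Any:
        return self.loader(self.refs[index])


class ToyFrame:
    """A tiny column-oriented table whose columns may hold different kinds of data."""

    def __init__(self, columns: Dict[str, Any]) -> None:
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self.columns = dict(columns)

    def __len__(self) -> int:
        return len(next(iter(self.columns.values()))) if self.columns else 0

    def row(self, index: int) -> Dict[str, Any]:
        # Lazy columns only load their object when a row is requested.
        return {name: col[index] for name, col in self.columns.items()}


frame = ToyFrame(
    {
        "title": ["Painting A", "Painting B"],  # structured field (made-up values)
        "year": [1900, 1950],  # structured field (made-up values)
        "img": LazyColumn(["a.jpg", "b.jpg"], loader=lambda p: f"<image loaded from {p}>"),
        "embedding": [[0.1, 0.9], [0.8, 0.2]],  # stand-in for tensor data
    }
)

if __name__ == "__main__":
    print(frame.row(1))
```

Keeping only references in the image column means the table stays small in memory, and the (placeholder) loader runs only for the rows that are actually inspected.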
(2) Interactivity in Python. Meerkat provides interactive data frame visualizations that allow you to control foundation models as they process your data. Meerkat visualizations are implemented in Python, so they can be composed and customized in notebooks or data scripts. Labeling is critical for instructing and validating foundation models. Labeling GUIs are a priority in Meerkat.
match = mk.gui.Match(
    df,
    against="embedding",
    engine="clip",
)
sorted_df = mk.sort(
    df,
    by=match.criterion.name,
    ascending=False,
)
gallery = mk.gui.Gallery(sorted_df)
mk.gui.html.div([match, gallery])
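Conceptually, matching against an embedding column and sorting by the resulting score amounts to a similarity ranking. The sketch below illustrates that idea in plain Python. It assumes cosine similarity between a query vector and each row's embedding, which is our assumption rather than a documented detail of mk.gui.Match; the function names and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query: Sequence[float], embeddings: Sequence[Sequence[float]]
) -> List[int]:
    """Return row indices ordered from most to least similar to the query."""
    scores = [cosine(query, emb) for emb in embeddings]
    # reverse=True mirrors ascending=False in the sort call above.
    return sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)


# Toy 2-D "embeddings"; real CLIP embeddings have hundreds of dimensions.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.9, 0.1]
order = rank_by_similarity(query, embeddings)

if __name__ == "__main__":
    print(order)
```

In an interactive setting, the query vector would come from encoding whatever the user types, and the gallery would be redrawn in the new order each time the query changes.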
✉️ About
Meerkat is being built by Machine Learning PhD students in the Hazy Research lab at Stanford. We're excited to build for a future where models will make it easier for teams to sift and reason through large volumes of data. We have varied research backgrounds and have done research that touches all parts of the machine learning process: we've created new model architectures, studied model robustness and evaluation, and worked on applications ranging from audio generation to medical imaging.
Please reach out to kgoel [at] cs [dot] stanford [dot] edu, eyuboglu [at] stanford [dot] edu, and arjundd [at] stanford [dot] edu if you would like to use Meerkat for a project or at your company, or if you have any questions.
Built Distribution
Hashes for meerkat_ml-0.4.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5d8407e2bfdc9365c07a779642ed8f1b5e0e8032aaa94815e5091c7413c6740b
MD5 | ceb54188c1e0fc8f981cc0982daab664
BLAKE2b-256 | 665b9109beeffb09f5227212b6da509381a74c7ee44e9a67ea2fb57d6ad606fe
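If you download the wheel manually, you can compare it against the SHA256 digest listed above. The following is a minimal sketch using the standard library's `hashlib`; the helper names and the local file path are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for meerkat_ml-0.4.2-py2.py3-none-any.whl (see table above).
EXPECTED_SHA256 = "5d8407e2bfdc9365c07a779642ed8f1b5e0e8032aaa94815e5091c7413c6740b"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded wheel; adjust to where you saved it.
    wheel = Path("meerkat_ml-0.4.2-py2.py3-none-any.whl")
    if wheel.exists():
        print("hash matches:", verify(wheel))
    else:
        print(f"{wheel} not found; download it first")
```

pip can also enforce hashes itself through its hash-checking mode when they are pinned in a requirements file.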