dataquality

The Official Python Client for Galileo.

Galileo is a tool for understanding and improving the quality of your NLP and CV data.

Galileo gives you access to all of the information you need, at a UI and API level, to continuously build better and more robust datasets and models.

dataquality is your entrypoint to Galileo. It helps you start and complete the loop of data quality improvements.

Getting Started

Install the package.

pip install dataquality

Create an account at Galileo

Grab your token

Get your dataset and analyze it with dq.auto (you will be prompted for your token at this step)

import dataquality as dq

dq.auto(
    train_data="/path/to/train.csv",
    val_data="/path/to/val.csv",
    test_data="/path/to/test.csv",
    project_name="my_first_project",
    run_name="my_first_run",
)

☕️ Wait for Galileo to train your model and analyze the results.
✨ A link to your run will be provided automatically
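
Before starting a run, it can help to confirm that each CSV has the columns the run will need. The sketch below is a standard-library check and not part of dataquality. It assumes a text classification dataset whose required columns are named `text` and `label` (an assumption to confirm with help(dq.auto)), and `missing_columns` is a hypothetical helper name.

```python
import csv

# Assumed column names for a text classification dataset; confirm with help(dq.auto).
REQUIRED_COLUMNS = ("text", "label")


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # An empty file has no header row, so every required column is missing.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

An empty list from missing_columns("/path/to/train.csv") means both assumed columns are present. A non-empty list is a reason to stop before calling dq.auto.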

Pro tip: Set your token programmatically for automated workflows

By setting the token, you'll never be prompted to log in

import dataquality as dq

dq.config.token = 'MY-TOKEN'

For long-lived flows like CI/CD, see our docs on environment variables
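
For an automated job, one way to avoid hard-coding the token is to read it from the environment and assign it to dq.config.token. This is a minimal sketch: the variable name MY_GALILEO_TOKEN and the helpers read_token and configure_dq are hypothetical, chosen for illustration, and are not the environment variables described in the Galileo docs.

```python
import os


def read_token(env_var="MY_GALILEO_TOKEN", environ=None):
    """Return the token stored in env_var, failing loudly if it is unset or blank.

    MY_GALILEO_TOKEN is a hypothetical name, not one defined by dataquality.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running this job")
    return token


def configure_dq():
    """Assign the token from the environment to dq.config.token."""
    # Imported lazily so read_token can be used without the package installed.
    import dataquality as dq

    dq.config.token = read_token()
    return dq
```

In a CI script, calling dq = configure_dq() once at startup keeps the token out of source control, and the job fails fast with a clear message when the variable is missing.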

What kinds of datasets can I analyze?

Currently, you can analyze Text Classification and Named Entity Recognition (NER) datasets

If you want support for other kinds, reach out!

Can I use auto with other data forms?

Yes! The auto params train_data, val_data, and test_data also accept pandas DataFrames and Hugging Face datasets.

What if all my data is in huggingface?

Use the hf_data param to point to a dataset in huggingface

import dataquality as dq

dq.auto(hf_data="rungalileo/emotion")

Anything else? Can I learn more?

Run help(dq.auto) for more information on usage
Check out our docs for the inspiration behind this methodology.

Can I analyze data using a custom model?

Yes! Check out our full documentation and example notebooks on how to integrate your own model with Galileo

What if I don't have labels to train with? Can you help with labeling?

We have an app for that! It currently supports text classification only, but reach out if you want another modality!

The app is still in development. It is not an official part of the Galileo product, but an open source tool for the community.

We've built a bulk-labeling tool (and hosted it on streamlit) to help you generate labels quickly using semantic embeddings and text search.

For more info on how it works and how to use it, check out the open source repo.

Is there a Python API for programmatically interacting with the console?

Yes! See our docs on dq.metrics to access things like overall metrics, your analyzed dataframe, and even your embeddings.
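
As a rough illustration, the sketch below collects the analyzed dataframe for several splits of one run. It assumes dq.metrics exposes get_dataframe(project_name, run_name, split) and that the splits are named "training", "validation", and "test"; check both against the dq.metrics docs. The helper dataframes_by_split is hypothetical, and it takes the metrics module as an argument so the logic can be exercised without a Galileo account.

```python
def dataframes_by_split(metrics, project_name, run_name,
                        splits=("training", "validation", "test")):
    """Fetch one dataframe per split and return them keyed by split name.

    `metrics` is expected to be the dq.metrics module. The get_dataframe call
    and the split names are assumptions to confirm in the docs.
    """
    return {
        split: metrics.get_dataframe(project_name, run_name, split)
        for split in splits
    }
```

With the package installed and a finished run, this would be called as dataframes_by_split(dq.metrics, "my_first_project", "my_first_run").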

Contributing

Read our contributing doc!

