dataquality
The Official Python Client for Galileo.
Galileo is a tool for understanding and improving the quality of your NLP (and soon CV!) data.
Galileo gives you access to all of the information you need, at a UI and API level, to continuously build better and more robust datasets and models.
dataquality is your entrypoint to Galileo. It helps you start and complete the loop of data quality improvements.
Getting Started
Install the package.
pip install dataquality
Create an account at Galileo
Grab your token
Get your dataset and analyze it with dq.auto
(You will be prompted for your token here)
import dataquality as dq

dq.auto(
    train_data="/path/to/train.csv",
    val_data="/path/to/val.csv",
    test_data="/path/to/test.csv",
    project_name="my_first_project",
    run_name="my_first_run",
)
☕️ Wait for Galileo to train your model and analyze the results.
✨ A link to your run will be provided automatically
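If you do not have CSV files on hand yet, the sketch below shows one way to produce small ones with the standard library and hand their paths to dq.auto. The column names "text" and "label", the sample rows, and the helper names write_split and run_auto are assumptions made for illustration, not part of the Galileo API; run help(dq.auto) to confirm the columns your task expects. The dq.auto call is wrapped in a function and left commented out because it needs the package installed and a token.

```python
import csv
import tempfile
from pathlib import Path

# Assumption: each row has a "text" column and a "label" column.
ROWS = [
    {"text": "I loved this movie", "label": "positive"},
    {"text": "The plot made no sense", "label": "negative"},
    {"text": "Great acting and pacing", "label": "positive"},
]


def write_split(path, rows):
    """Write rows to a CSV file with a header line; return the path as a string."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return str(path)


def run_auto(train_path, val_path, test_path):
    """Pass the CSV paths to dq.auto (needs dataquality installed and a token)."""
    import dataquality as dq

    dq.auto(
        train_data=train_path,
        val_data=val_path,
        test_data=test_path,
        project_name="my_first_project",
        run_name="my_first_run",
    )


# Tiny illustrative splits; real datasets would be far larger.
data_dir = Path(tempfile.mkdtemp())
train_path = write_split(data_dir / "train.csv", ROWS[:1])
val_path = write_split(data_dir / "val.csv", ROWS[1:2])
test_path = write_split(data_dir / "test.csv", ROWS[2:])

# run_auto(train_path, val_path, test_path)  # uncomment once you have a token
```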
What kinds of datasets can I analyze?
Currently, you can analyze Text Classification and NER (named entity recognition) datasets.
If you want support for other kinds of datasets, reach out!
Can I use auto with other data formats?
The auto params train_data, val_data, and test_data can also take pandas DataFrames and Hugging Face datasets as input!
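As a rough illustration of the in-memory route, the sketch below builds three small pandas DataFrames and passes them to dq.auto in place of file paths. The "text" and "label" column names, the sample rows, and the helper name run_auto_on_frames are assumptions for illustration; check help(dq.auto) for the columns your task actually requires. The dq.auto call is left commented out because it needs the package installed and a token.

```python
import pandas as pd

# Assumption: "text" and "label" are the expected column names.
train_df = pd.DataFrame(
    {
        "text": ["I loved this movie", "The plot made no sense"],
        "label": ["positive", "negative"],
    }
)
val_df = pd.DataFrame({"text": ["Great acting and pacing"], "label": ["positive"]})
test_df = pd.DataFrame({"text": ["Too long and too slow"], "label": ["negative"]})


def run_auto_on_frames(train_df, val_df, test_df):
    """Pass DataFrames straight to dq.auto (needs dataquality installed and a token)."""
    import dataquality as dq

    dq.auto(
        train_data=train_df,
        val_data=val_df,
        test_data=test_df,
        project_name="my_first_project",
        run_name="my_first_run",
    )


# run_auto_on_frames(train_df, val_df, test_df)  # uncomment once you have a token
```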
What if all my data is in Hugging Face?
Use the hf_data param to point to a dataset in Hugging Face.
import dataquality as dq

dq.auto(hf_data="rungalileo/emotion")
Anything else? Can I learn more?
Run help(dq.auto) for more information on usage.
Check out our docs for the inspiration behind this methodology.
Can I analyze data using a custom model?
Yes! Check out our full documentation and example notebooks on how to integrate your own model with Galileo.
Contributing
Read our contributing doc!