Clarifai Python Data Utils

This is a collection of utilities for handling multimedia data, designed to be used together with the Clarifai Python SDK. The combination covers both visual and textual use cases: loading and converting annotated image datasets, and ingesting text documents into the Clarifai Platform.


Installation

Install from PyPI:

pip install clarifai-datautils
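
To confirm that the install succeeded, the installed version can be read from the package metadata. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of clarifai-datautils.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not provided by clarifai-datautils.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string when the package is installed, otherwise None.
print(installed_version("clarifai-datautils"))
```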

Install from Source:

git clone https://github.com/Clarifai/clarifai-python-datautils
cd clarifai-python-datautils
python3 -m venv env
source env/bin/activate
pip3 install -r requirements.txt

Getting started

A quick introduction to the image annotation conversion feature:

from clarifai_datautils import ImageAnnotations

annotated_dataset = ImageAnnotations.import_from(path='folder_path', format='annotation_format')

Features

Image Utils

  • Annotation Loader

    • Load various annotated image datasets and export them to the Clarifai Platform
    • Convert from one annotation format to other supported annotation formats
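
To make format conversion concrete, here is a minimal sketch of one step it typically involves: translating a bounding box between the COCO layout `[x_min, y_min, width, height]` and the Pascal VOC corner layout `(x_min, y_min, x_max, y_max)`. The function names are hypothetical, and this is not how clarifai-datautils implements conversion internally. It also ignores the one-pixel offset that some VOC tools apply.

```python
from typing import List, Sequence, Tuple


def coco_to_voc_bbox(bbox: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert a COCO box [x_min, y_min, width, height] to VOC corners."""
    x_min, y_min, width, height = bbox
    return (x_min, y_min, x_min + width, y_min + height)


def voc_to_coco_bbox(bbox: Sequence[float]) -> List[float]:
    """Convert VOC corners (x_min, y_min, x_max, y_max) to a COCO box."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# Round-trip a sample box: COCO -> VOC -> COCO.
voc_box = coco_to_voc_bbox([10, 20, 30, 40])
coco_box = voc_to_coco_bbox(voc_box)
```

In practice, `ImageAnnotations.export_to` handles this kind of translation for whole datasets; the sketch only illustrates the coordinate arithmetic.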

Data Ingestion Pipeline

  • Easy-to-use pipelines that load data from files and ingest it into the Clarifai Platform.
  • Load text files (PDF, DOC, etc.), then transform, chunk, and upload them to the Clarifai Platform.
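
The transform and chunk steps can be illustrated with a small standard-library sketch: collapse extra whitespace, then greedily pack whole words into chunks under a character limit. The function names below are hypothetical and the logic is an assumption for illustration only; the library's own cleaners and chunking strategies may behave differently.

```python
import re
from typing import List


def clean_extra_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_by_characters(text: str, max_characters: int) -> List[str]:
    """Greedily pack whole words into chunks of at most max_characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if not current or len(candidate) <= max_characters:
            # A single word longer than the limit is kept whole, not split.
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Clean a messy string, then split it into chunks of at most 16 characters.
cleaned = clean_extra_whitespace("  Data   ingestion \n\n pipelines  chunk   text ")
chunks = chunk_by_characters(cleaned, max_characters=16)
```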

Usage

Image Annotation Loader

from clarifai_datautils import ImageAnnotations

# Import an annotated dataset from a folder
coco_dataset = ImageAnnotations.import_from(path='folder_path', format='coco_detection')

# Upload to the Clarifai Platform using the Clarifai SDK.
# Set your personal access token (PAT) as an environment variable first:
#   export CLARIFAI_PAT={your personal access token}
from clarifai.client.dataset import Dataset
dataset = Dataset(user_id="user_id", app_id="app_id", dataset_id="dataset_id")
dataset.upload_dataset(dataloader=coco_dataset.dataloader)

# Show information about the loaded dataset
coco_dataset.get_info()

# Export to another supported format
coco_dataset.export_to('voc_detection')
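
Because the upload step relies on the `CLARIFAI_PAT` environment variable, it can help to check for it before starting a long upload. The following is a minimal sketch; the helper `require_pat` is hypothetical and is not part of the Clarifai SDK or clarifai-datautils.

```python
import os
from typing import Mapping, Optional


def require_pat(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the CLARIFAI_PAT value, raising a clear error when it is unset.

    Hypothetical helper; `env` defaults to the process environment and can be
    replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    pat = env.get("CLARIFAI_PAT", "").strip()
    if not pat:
        raise RuntimeError("CLARIFAI_PAT is not set; export it before uploading.")
    return pat
```

Calling `require_pat()` once at the top of an upload script fails fast with a readable message instead of a later authentication error.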

Data Ingestion Pipelines

Setup

To use the Data Ingestion Pipeline, first install the additional dependencies:

pip install -r requirements-dev.txt

Then define and run a pipeline:

from clarifai_datautils.text import Pipeline, PDFPartition
from clarifai_datautils.text.pipeline.cleaners import Clean_extra_whitespace

# Define the pipeline
pipeline = Pipeline(
    name='pipeline-1',
    transformations=[
        PDFPartition(chunking_strategy="by_title", max_characters=1024),
        Clean_extra_whitespace()
    ]
)


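# Upload using the Clarifai SDK.
# dataset_url and file_path are placeholders: set them to your dataset's URL
# and the path of the file(s) to ingest.
from clarifai.client import Dataset
dataset = Dataset(dataset_url)
dataset.upload_dataset(pipeline.run(files=file_path, loader=True))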

More Examples

See many more code examples in this repo.
