
datasety

CLI tool for dataset preparation

License: MIT · Python 3.10+

CLI tool for dataset preparation — resize, caption, align, shuffle, synthetic editing, masking, degradation, character generation, LoRA training, audio TTS datasets, upload to HuggingFace, and multi-step workflows.

Full documentation →

Installation

pip install datasety                 # core (resize, align, shuffle, degrade)
pip install datasety[caption]        # + Florence-2 captioning
pip install datasety[synthetic]      # + image editing (FLUX, Qwen, SDXL)
pip install datasety[mask]           # + segmentation masks (SAM 3, CLIPSeg)
pip install datasety[filter]         # + content filtering (CLIP, NudeNet)
pip install datasety[character]      # + character dataset generation
pip install datasety[workflow]       # + YAML workflow support
pip install datasety[train]          # + LoRA training (FLUX, Qwen) & TTS (Piper)
pip install datasety[audio]          # + TTS audio datasets (YouTube, VAD, Piper)
pip install datasety[upload]         # + upload to HuggingFace Hub
pip install datasety[all]            # everything
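Extras can also be combined in a single install. This is standard pip behaviour, not datasety-specific; the quoting matters because shells such as zsh otherwise try to glob the square brackets:

```shell
# Combine several extras in one spec; quote it so the shell passes the
# brackets through to pip untouched.
spec="datasety[caption,mask,workflow]"
echo "pip install \"$spec\""
```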

Commands

resize — Resize & Crop Images

Batch resize images to exact dimensions with configurable crop positions.

datasety resize --input ./raw --output ./resized --resolution 768x1024 --crop-position top
Options
Option Description Default
--input, -i Input directory required*
--output, -o Output directory required*
--input-image Single input image (alternative to dir mode)
--output-image Single output image (use with --input-image)
--resolution, -r Target resolution (WIDTHxHEIGHT)
--megapixel Target megapixel count (e.g., 0.5, 1.0)
--aspect-ratio Aspect ratio W:H (e.g., 1:1, 16:9)
--crop-position top, center, bottom, left, right center
--input-format Comma-separated input formats jpg,jpeg,png,webp
--output-format jpg, png, webp jpg
--output-name-numbers Rename output files to 1.jpg, 2.jpg, ... off
--upscale Upscale images smaller than target off
--min-resolution Skip images below this size (e.g., 256x256)
--workers Parallel workers for processing 1
--recursive, -R Search input directory recursively off
--progress Show tqdm progress bar off
--dry-run Preview without modifying files off

* required in directory mode; use --input-image/--output-image instead for single-file runs (applies wherever required* appears below)
# Single image
datasety resize --input-image photo.jpg --output-image resized.jpg -r 512x512

# Batch with sequential numbering
datasety resize -i ./photos -o ./dataset -r 1024x1024 --output-name-numbers --crop-position top
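To produce the same dataset at several resolutions, a small loop can emit one resize command per target. A minimal sketch (directories and resolutions are illustrative); it only prints the commands, so pipe the output to `sh` to actually run them:

```shell
# Print one resize invocation per target resolution.
for res in 512x512 768x1024; do
  echo "datasety resize -i ./photos -o ./dataset_$res -r $res --crop-position top"
done
```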

Full documentation →


caption — Generate Image Captions

Generate captions using Florence-2 (local) or OpenAI-compatible vision APIs.

datasety caption --input ./images --output ./captions --trigger-word "[trigger]"
Options
Option Description Default
--input, -i Input directory required*
--output, -o Output directory for .txt files required*
--input-image Single input image
--output-caption Single output .txt path
--device auto, cpu, cuda, mps auto
--trigger-word Text to prepend to each caption
--prompt Florence-2 task prompt <MORE_DETAILED_CAPTION>
--model HF model name or API model ID
--num-beams Beam search width (1 = greedy) 3
--florence-2-base Use Florence-2-base (0.23B, faster) default
--florence-2-large Use Florence-2-large (0.77B, more accurate)
--llm-api Use OpenAI-compatible vision API
--max-tokens Max response tokens (API mode) 300
--temperature Temperature (API mode) 0.3
--skip-existing Skip images that already have a .txt file off
--append Append text to existing captions
--prepend Prepend text to existing captions
--recursive, -R Search input directory recursively off
--progress Show tqdm progress bar off
--dry-run Preview without processing off
# Florence-2 with trigger word
datasety caption -i ./dataset -o ./dataset --trigger-word "photo of sks person," --device cuda

# OpenAI vision API (supports OPENAI_MODEL env var)
datasety caption -i ./images -o ./captions --llm-api --model gpt-5-nano

Full documentation →


align — Align Control/Target Pairs

Match dimensions, enforce multiples of 32, and unify formats for control/target training pairs. Includes a built-in web server for visual comparison with a compare slider, caption editing, and pair management.

datasety align --target ./target --control ./control --dry-run
Options
Option Description Default
--target, -t Target images directory required
--control, -c Control images directory required
--multiple-of Align dimensions to this multiple 32
--output-format Convert all images: jpg, png, webp keep original
--recursive, -R Search input directories recursively off
--dry-run Preview changes without modifying files off
# Preview, then apply
datasety align -t ./target -c ./control --dry-run
datasety align -t ./target -c ./control --output-format jpg

Visual comparison: use datasety server -i ./target --control ./control to browse and compare aligned pairs in the browser.

Full documentation →


shuffle — Random Caption Generation

Generate random captions by picking one variant from each text group.

datasety shuffle -i ./images -o ./captions \
    --group "A photo of a person.|Portrait of someone." \
    --group "Remove the hat.|Take off the hat."
Options
Option Description Default
--input, -i Input directory containing images required
--output, -o Output directory for .txt files required
--group, -g Inline |-separated, .txt file, or URL required
--separator Separator between groups " "
--seed Random seed for reproducibility
--dry-run Preview captions without writing off
--show-distribution Show caption distribution after generation off
# Mix file, URL, and inline sources
datasety shuffle -i ./images -o ./captions \
    --group subjects.txt \
    --group "ending A|ending B" \
    --seed 42 --show-distribution
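Since shuffle picks one variant from each group, the number of distinct captions it can produce is the product of the variant counts across groups. A pure-shell sketch of that arithmetic for two illustrative groups (2 × 3 = 6 possible captions):

```shell
# Count |-separated variants per group with awk, then multiply.
g1="A photo of a person.|Portrait of someone."
g2="ending A|ending B|ending C"
n1=$(printf '%s' "$g1" | awk -F'|' '{print NF}')
n2=$(printf '%s' "$g2" | awk -F'|' '{print NF}')
echo $((n1 * n2))
```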

Full documentation →


synthetic — Synthetic Image Editing

Generate synthetic variations using image editing models (FLUX.2-klein FP8, FLUX.2-klein-9b-kv, Qwen-Image-Edit-2511, SDXL, LongCat, HunyuanImage). The default model FLUX.2-klein-4b-fp8 requires no HuggingFace token and fits in ~5 GB VRAM.

datasety synthetic --input ./images --output ./synthetic --prompt "add a winter hat" --steps 4
Options
Option Description Default
--input, -i Input directory required*
--output, -o Output directory required*
--input-image Single input image
--output-image Single output image
--prompt, -p Edit instruction required
--model Model (auto-detects family or API model) black-forest-labs/FLUX.2-klein-4b-fp8
--image-api Use OpenAI-compatible API for generation off
--api-aspect-ratio Aspect ratio for --image-api (e.g. 16:9, 9:16, 1:1) auto
--api-image-size Resolution for --image-api: 0.5K, 1K, 2K, 4K 1K
--weights Fine-tuned weights file
--lora LoRA adapter (repeatable, :WEIGHT)
--device auto, cpu, cuda, mps auto
--cpu-offload Force CPU offload auto
--steps Inference steps 4
--cfg-scale Guidance scale 2.5
--true-cfg-scale True CFG (Qwen only) 4.0
--negative-prompt Negative prompt " "
--num-images Images per input 1
--seed Random seed
--gguf GGUF path/URL for quantized loading
--strength Img2img strength (SDXL/FLUX.2, 0.0-1.0) 0.7
--recursive, -R Search input directory recursively off
--output-format png, jpg, webp png
--skip-existing Skip images with existing output off
--batch-size Flush GPU memory every N images 0 (off)
--progress Show tqdm progress bar off
--dry-run Preview without loading models off
# Single image edit
datasety synthetic --input-image photo.jpg --output-image edited.png \
    --prompt "add sunglasses" --steps 4

# Cloud API — FLUX.2-flex (no GPU needed)
OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
  datasety synthetic -i ./images -o ./synthetic \
  --prompt "add a winter hat" --image-api --model black-forest-labs/flux.2-flex \
  --api-aspect-ratio 1:1

# Cloud API — Gemini 2.5 Flash (text+image, supports image-to-image)
OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
  datasety synthetic -i ./images -o ./synthetic \
  --prompt "transform into oil painting style" \
  --model google/gemini-2.5-flash-image --image-api \
  --api-aspect-ratio 3:4 --api-image-size 2K

# FLUX.2-klein-9b-kv (KV-cache, faster multi-reference, ~29 GB VRAM)
datasety synthetic -i ./images -o ./synthetic \
    --model "black-forest-labs/FLUX.2-klein-9b-kv" \
    --prompt "add sunglasses" --steps 4

# Qwen-Image-Edit-2511 with LoRA
datasety synthetic -i ./dataset -o ./synthetic \
    --model "Qwen/Qwen-Image-Edit-2511" \
    --lora "adapter.safetensors:0.8" \
    --prompt "add a red scarf" --steps 40
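When curating synthetic variations it often helps to run the same edit under several seeds and keep the best result. A sketch that fans one edit out across seeds (paths, prompt, and seed values are illustrative); it prints the commands rather than running them, so pipe to `sh` to execute:

```shell
# One synthetic invocation per seed, each writing to its own directory.
for seed in 1 2 3; do
  echo "datasety synthetic -i ./images -o ./synthetic/seed_$seed --prompt 'add sunglasses' --seed $seed"
done
```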

Full documentation →


mask — Text-Prompted Segmentation Masks

Generate binary masks from images using text keywords. Supports SAM 3, SAM 2, and CLIPSeg.

datasety mask --input ./dataset --output ./masks --keywords "face,hair" --device cuda
Options
Option Description Default
--input, -i Input directory required*
--output, -o Output directory for masks required*
--input-image Single input image
--output-image Single output mask
--keywords, -k Comma-separated keywords required
--model sam3, sam2, clipseg sam3
--device auto, cpu, cuda, mps auto
--threshold Confidence threshold (0.0-1.0) 0.3
--padding Pixels to expand mask (dilation) 0
--blur Gaussian blur radius for edges 0
--invert Invert mask colors off
--naming folder or suffix (_mask) folder
--output-format png, jpg, webp png
--skip-existing Skip images with existing masks off
--dry-run Preview detections without saving off
--recursive, -R Search input directory recursively off
--progress Show tqdm progress bar off
# CLIPSeg (lightweight, no extra deps)
datasety mask -i ./dataset -o ./masks -k "face" --model clipseg --threshold 0.5

# SAM 2 with mask refinement
datasety mask -i ./dataset -o ./masks -k "hat,glasses" --model sam2 --padding 5 --blur 3

Full documentation →


filter — Filter Dataset by Content

Filter, curate, or clean datasets based on image content. Use CLIP for arbitrary text queries or NudeNet for NSFW label detection.

datasety filter --input ./dataset --output ./rejected --query "leg,male face" --action move
Options
Option Description Default
--input, -i Input directory required
--output, -o Output directory for matched/rejected images
--query, -q Comma-separated text queries (CLIP)
--labels, -l Comma-separated NudeNet labels
--model clip, nudenet clip
--action move, copy, delete, keep move
--threshold Confidence threshold (0.0-1.0) 0.5
--device auto, cpu, cuda, mps auto
--confirm Required for destructive actions (delete, keep) off
--preserve-structure Keep subfolder hierarchy in output (with --recursive) off
--invert Invert match logic (act on non-matches) off
--log Write CSV log of all decisions to this path
--dry-run Preview detections without modifying files off
--recursive, -R Search input directory recursively off
--progress Show tqdm progress bar off
# Move images containing legs or male faces to a reject folder
datasety filter -i ./dataset -o ./rejected --query "leg,male face" --action move

# Delete NSFW images using NudeNet labels
datasety filter -i ./dataset --labels "FEMALE_BREAST_EXPOSED,MALE_GENITALIA_EXPOSED" \
    --action delete --model nudenet --threshold 0.6 --confirm

# Keep only images with "hat and socks", move the rest out
datasety filter -i ./dataset -o ./rejected --query "hat and socks" --action keep

# Dry-run to preview what would be filtered
datasety filter -i ./dataset --query "blurry,low quality" --action delete --dry-run -R

# Write a decision log for review
datasety filter -i ./dataset -o ./rejected --query "outdoor" --action copy --log filter_log.csv
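The `--log` CSV can be summarized with standard tools before acting on a dataset. Illustrative only: the real column layout is not documented above, so this fabricates a tiny log with an assumed (file, score, action) shape and tallies actions per type with awk; adapt the field index to the actual header:

```shell
# Fabricated example log (assumed columns: file,score,action).
cat > filter_log.csv <<'EOF'
file,score,action
a.jpg,0.81,move
b.jpg,0.42,skip
c.jpg,0.93,move
EOF
# Count decisions per action, skipping the header row.
awk -F, 'NR > 1 { n[$3]++ } END { for (k in n) print k, n[k] }' filter_log.csv | sort
```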

Full documentation →


inspect — Dataset Statistics

Scan a dataset directory and report image count, resolution distribution, format breakdown, file sizes, caption coverage, and optionally detect duplicate images via perceptual hashing.

datasety inspect --input ./dataset --duplicates
Options
Option Description Default
--input, -i Input directory required
--duplicates Detect duplicate/near-duplicate images off
--json Export report as JSON to this path
--csv Export per-image data as CSV to this path
--recursive, -R Search input directory recursively off
# Full report with duplicate detection
datasety inspect -i ./dataset --duplicates

# Export report to JSON
datasety inspect -i ./dataset --json report.json

# Export per-image data to CSV
datasety inspect -i ./dataset --csv images.csv -R

Full documentation →


server — Dataset Management Dashboard

Start a universal web server for managing your entire dataset from the browser. Browse images in a gallery, edit and create captions, delete or compare images, view statistics, upload new images, and detect duplicates — all in one interface.

datasety server --input ./dataset
Options
Option Description Default
--input, -i Dataset directory to manage required
--control, -c Control images directory (enables Pairs tab)
--port Port for the web server 8080
--recursive, -R Search directories recursively for images off
--duplicates Pre-compute perceptual hashes for duplicate detection off
# Start the dashboard on the default port
datasety server -i ./dataset

# With duplicate detection pre-computed
datasety server -i ./dataset --duplicates --port 9000

# Pairs comparison (align workflow)
datasety server -i ./target --control ./control

The dashboard provides:

  • Gallery — thumbnail grid with sorting and filtering; click any image for the detail panel (caption editor, file info, delete)
  • Compare — drag-slider side-by-side comparison for any two images
  • Pairs (with --control) — compare control/target pairs with a drag slider; edit captions for both sides; delete pairs; arrow-key navigation
  • Stats — live dataset overview: image count, total size, caption coverage, format and orientation breakdown
  • Upload — drag images into the browser or use the Upload button to add images to the dataset
  • Keyboard navigation — arrow keys to move through gallery or pairs, Ctrl+S to save, T to toggle theme, ? for help

degrade — Image Degradation

Create degraded versions of images for upscale/enhance training. Pure Pillow, no extra dependencies.

datasety degrade --input ./originals --output ./dataset --type random --intensity-range 0.2-0.8 --paired
Options
Option Description Default
--input, -i Input directory required*
--output, -o Output directory required*
--input-image Single input image
--output-image Single output image
--type, -t Degradation type(s), repeatable random
--intensity Global intensity (0.0-1.0) 0.5
--intensity-range Random range MIN-MAX
--chain Apply multiple types sequentially off
--num-variants Variants per input image 1
--paired Create control/ + target/ subdirs off
--seed Random seed
--output-format png, jpg, webp png
--skip-existing Skip images with existing output off
--workers Parallel workers for processing 1
--progress Show tqdm progress bar off
--dry-run Preview without writing files off

Degradation types: lowres, oversharpen, noise, blur, jpeg, motion-blur, pixelate, color-bands, upscale-sim, random

# Chain specific degradations for paired output
datasety degrade -i ./images -o ./dataset --type jpeg --type noise --chain --paired --seed 42

# Multiple random variants per image
datasety degrade -i ./images -o ./degraded --type random --num-variants 3 --intensity-range 0.3-0.8
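The `--intensity-range` flag takes a MIN-MAX pair and draws a random intensity per image from it. The same split, sketched in pure shell with POSIX parameter expansion (the range value is illustrative):

```shell
# Split "MIN-MAX" on the dash: %-* drops the suffix, #*- drops the prefix.
range="0.3-0.8"
min=${range%-*}
max=${range#*-}
echo "min=$min max=$max"
```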

Full documentation →


character — Character Dataset Generation

Generate character datasets using LLM-generated prompts + text-to-image (FLUX.2-klein local or cloud API).

datasety character --output ./dataset --llm-ollama qwen3.5:4b --num-images 20
Options
Option Description Default
--reference, -r Reference face image(s) (optional, prompt context)
--output, -o Output directory required
--num-images, -n Number of images to generate 10
--model Model for generation (local HF or API model ID) black-forest-labs/FLUX.2-klein-4b-fp8
--gguf GGUF path/URL for quantized loading
--image-api Use OpenAI-compatible API for image generation off
--api-aspect-ratio Aspect ratio for --image-api (e.g. 9:16, 1:1) derived from --width/--height
--api-image-size Resolution for --image-api: 0.5K, 1K, 2K, 4K
--character-description Text description of the character
--style Style guidance (e.g., photorealistic)
--prompts-only Only generate prompts, skip images off
--prompts-file Load prompts from file instead of LLM
--llm-api Use OpenAI-compatible API for prompts
--llm-ollama MODEL Use local Ollama server for prompts
--llm-gguf PATH Use local GGUF model for prompts
--llm-model REPO Use HuggingFace model for prompts
--device auto, cpu, cuda, mps auto
--steps Inference steps 4
--cfg-scale Guidance scale 4.0
--seed Random seed
--height Output image height 1024
--width Output image width 1024
--output-format png, jpg, webp png
--batch-size Flush GPU memory every N images 0 (off)
--dry-run Preview prompts without generating images off
# Generate with local pipeline + Ollama prompts
datasety character -o ./dataset --llm-ollama qwen3.5:4b --num-images 20

# Cloud API for images (no GPU needed)
OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
  datasety character -o ./dataset --prompts-file prompts.txt \
  --image-api --model black-forest-labs/flux.2-flex --api-aspect-ratio 2:3

# Preview prompts only
datasety character -o ./dataset --llm-api --prompts-only

Full documentation →


sweep — Parameter Grid Search

Generate workflow YAML files with parameter grid combinations for synthetic editing. Computes the Cartesian product of sweep parameters.

datasety sweep -i ./images -o ./sweep_output -p "add a winter hat" --steps 4,8,16 --cfg-scale 1.0,2.5,5.0
Options
Option Description Default
--input, -i Input images directory required
--output, -o Base output directory required
--prompt, -p Edit prompt required
--steps Comma-separated step values to sweep
--cfg-scale Comma-separated CFG values to sweep
--true-cfg-scale Comma-separated true CFG values to sweep
--strength Comma-separated strength values to sweep
--lora Comma-separated LoRA specs to sweep
--model Comma-separated model names to sweep
--seed Random seed (passed through)
--output-file Output YAML path sweep.yaml
--run Generate and immediately execute off
# Generate YAML, inspect, then run
datasety sweep -i ./images -o ./sweep -p "add sunglasses" --steps 4,8,16 --cfg-scale 1.0,2.5
datasety workflow -f sweep.yaml

# Generate and run immediately
datasety sweep -i ./images -o ./sweep -p "add a hat" --steps 4,8 --cfg-scale 2.0,3.0 --run
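Because sweep takes the Cartesian product of its value lists, the job count multiplies quickly. A pure-shell sketch of the arithmetic for the first example above (3 step values × 2 cfg values = 6 generated workflow steps):

```shell
# Count comma-separated values per sweep axis, then multiply.
steps="4,8,16"
cfgs="1.0,2.5"
ns=$(printf '%s' "$steps" | awk -F, '{print NF}')
nc=$(printf '%s' "$cfgs" | awk -F, '{print NF}')
echo $((ns * nc))
```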

Full documentation →


train — LoRA Fine-Tuning & TTS Training

Train a LoRA adapter for image generation models (FLUX, SDXL, Qwen) or a TTS voice model (Piper). The mode is auto-detected from --family (flux/sdxl/qwen) or --backend (piper/coqui/f5-tts).

Image parameters (--family flux/sdxl/qwen): --lr, --lora-rank, --lora-alpha, --image-size, --optimizer, --lr-scheduler, etc.

Audio parameters (--backend piper): --sample-rate, --batch-size, --accelerator, --devices, --test-text.

# Image: FLUX.2-klein LoRA (~8 GB VRAM)
datasety train --input ./dataset --output lora.safetensors --family flux --steps 500 --lr 1e-4 --lora-rank 16

# Audio: Piper TTS (auto-downloads base model, auto-installs Piper, multi-GPU, voice watcher)
datasety train -i ./tts_dataset -o ./tts_output --backend piper \
    --model "rhasspy/piper-checkpoints:en/en_US/kristin/medium" \
    --devices auto --test-text "Hello world"
Image (LoRA) Options
Option Description Default
--family Model family: flux, sdxl, qwen auto-detected
--model, -m HuggingFace repo ID (base model) black-forest-labs/FLUX.2-klein-base-4B
--output, -o Output .safetensors path lora.safetensors
--steps Training steps 100
--lr Learning rate 1e-4
--lora-rank LoRA rank 16
--lora-alpha LoRA alpha 16.0
--lora-dropout LoRA dropout rate 0.0
--image-size Training resolution (square crop) 512
--device auto, cpu, cuda, mps auto
--seed Random seed 42
--save-every Save checkpoint every N steps end only
--resume Resume from a .safetensors checkpoint
--validation-split Fraction for validation (0.0–0.5)
--timestep-type Timestep sampling: sigmoid, lognorm, linear sigmoid
--caption-dropout Probability of dropping caption 0.05
--gradient-checkpointing Enable gradient checkpointing (saves VRAM) off
--optimizer adamw or adamw8bit (requires bitsandbytes) adamw
--lr-scheduler LR schedule: constant, cosine, linear constant
--lr-warmup-steps Linear warmup steps 0
--gradient-accumulation-steps Accumulate gradients over N steps 1
--min-snr-gamma Min-SNR-γ for SDXL (recommended: 5.0) disabled
--noise-offset Per-channel noise offset for SDXL (recommended: 0.05–0.1) 0.0
Audio (TTS) Options
Option Description Default
--backend TTS backend: piper (coqui, f5-tts planned) piper
--model Piper base model (repo_id:subfolder or local path) (required)
--output, -o Output directory for .ckpt checkpoints (required)
--steps Training epochs 100
--sample-rate Audio sample rate in Hz 22050
--batch-size Training batch size 32
--accelerator PyTorch Lightning accelerator: auto, gpu, cpu auto
--devices Number of GPUs: auto, 1, 2, -1 (all) auto
--test-text Background inference text to test checkpoints
--seed Random seed 42

Full documentation →


audio — Build TTS Audio Datasets

Build TTS (Text-to-Speech) audio datasets from video or audio files. Supports YouTube URLs, direct media URLs, local files, and text files containing lists of paths. Extracts audio, transcribes with faster-whisper, performs deep text cleaning, and outputs Piper/LJSpeech-compatible datasets.

datasety audio --input ./video.mp4 --output ./dataset
datasety audio --input ./clips/ --output ./dataset
datasety audio --input "https://www.youtube.com/watch?v=..." --output ./dataset --language uk
Options
Option Description Default
--input, -i Input: local file, URL, dir, or .txt list. Append ?start=X&end=Y to slice required
--output, -o Output directory for the dataset required
--sample-rate Output audio sample rate in Hz 22050
--demucs Enable Demucs vocal isolation false
--demucs-model Demucs model name htdemucs
--whisper-model Faster-Whisper model: tiny, base, small, medium, large-v3 base
--language Language code (e.g., en, es, fr, uk). Auto-detected if omitted (auto)
--device Device: auto, cpu, cuda, mps auto
--vad Enable voice activity detection (VAD) to filter non-speech false
--min-duration Minimum segment duration in seconds 1.5
--max-duration Maximum segment duration in seconds 30.0
--merge-gap Merge segments closer than this many seconds 0.0 (off)
--normalize-numbers Expand digits into words false
--no-clean-text Disable special character stripping false
--phoneme-map Path to config.json/phonemes.json to filter bad text
--workers Number of parallel file workers 1
--keep-temp Keep temporary audio files at this path
--resume Resume a previous run (skip existing chunks, append to CSV) false
--overwrite Overwrite existing output directory false
--dry-run Print pipeline steps without executing false
--verbose, -V Print detailed progress messages false
# Process a list of URLs from a text file, dropping unsupported characters
datasety audio --input urls.txt --output ./dataset --phoneme-map phonemes.json

# Extract a specific 40-second slice from a YouTube video
datasety audio --input "https://youtube.com/watch?v=...?start=50&end=90" -o ./dataset

# Local video with vocal isolation and high-quality transcription
datasety audio --input ./video.mp4 --output ./dataset --demucs --whisper-model large-v3

# Parallel processing of multiple files
datasety audio --input ./videos/ --output ./dataset --workers 4
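A `.txt` list mixes remote and local sources, one per line, with the `?start=X&end=Y` suffix slicing an individual source. A sketch that builds such a list (the URL and path are illustrative placeholders) and counts its entries:

```shell
# One source per line; the ?start/?end suffix trims that source only.
cat > urls.txt <<'EOF'
https://www.youtube.com/watch?v=XXXXXXXXXXX?start=50&end=90
./clips/interview.mp4
EOF
awk 'END { print NR }' urls.txt
```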

Full documentation →


upload — Upload to HuggingFace Hub

Upload datasets and model adapters to HuggingFace Hub. Auto-detects type (audio, image, video, document, model, generic) from directory structure and generates HF-compliant README dataset cards with YAML frontmatter.

datasety upload --path ./tts_dataset --repo-id user/my-voice --type audio
datasety upload --path ./lora_output --repo-id user/klein-lora --type model
datasety upload --path ./dataset --repo-id user/my-dataset --dry-run
Options
Option Description Default
--path, -p Path to the dataset or model directory to upload required
--repo-id, -r HuggingFace repo ID (e.g. username/my-dataset). Derived from dir name if omitted (derived)
--type, -t Dataset or model type auto
--private Make the repository private false
--token HuggingFace API token (or set HF_TOKEN env var) HF_TOKEN
--force Force regenerate README.md if it already exists false
--dry-run Show what would be uploaded without uploading false
--metadata Extra YAML key: value pairs for dataset card frontmatter
--yes, -y Skip all confirmation prompts false
--verbose, -V Print detailed progress messages false
# Upload a TTS dataset (auto-generates README with TTS task card)
datasety upload --path ./tts_dataset --repo-id your-username/my-voice --private

# Upload a LoRA adapter
datasety upload --path ./lora.safetensors --repo-id your-username/klein-lora --type model

# Dry-run to verify what will be uploaded
datasety upload --path ./dataset --repo-id user/dataset --dry-run --verbose

# With extra metadata
datasety upload --path ./dataset --repo-id user/dataset \
    --metadata 'license:cc-by-4.0 language: [en,fr]'

Full documentation →


workflow — Multi-Step Pipelines

Run multi-step datasety pipelines from YAML or JSON files with dry-run validation.

datasety workflow --file datasety.yaml --dry-run
Options
Option Description Default
--file, -f Path to workflow file auto-detect
--dry-run Validate steps without executing off

Create datasety.yaml:

steps:
  - command: resize
    args:
      input: ./raw
      output: ./resized
      resolution: 768x1024
  - command: caption
    args:
      input: ./resized
      output: ./resized
      llm-api: true
      model: gpt-5-nano
# Validate first, then execute
datasety workflow --dry-run
datasety workflow
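Workflow files can also be generated from the shell, which is handy in CI. A sketch that emits a two-step pipeline via a heredoc and sanity-checks the step count before handing it to `datasety workflow --dry-run` (the degrade step and paths are illustrative):

```shell
# Write a two-step workflow, then confirm both steps are present.
cat > datasety.yaml <<'EOF'
steps:
  - command: resize
    args:
      input: ./raw
      output: ./resized
      resolution: 768x1024
  - command: degrade
    args:
      input: ./resized
      output: ./degraded
      type: random
EOF
grep -c 'command:' datasety.yaml
```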

Full documentation →


License

MIT
