
Analyze, process, and extract information from many types of input data. Highly modular and customizable.



🥔 TATERS: Takes All Things, Extracts Relevant Stuff

Taters is a Python toolkit (and CLI) for getting from raw media to analysis-ready artifacts — fast, repeatable, and with predictable outputs. Point it at video, audio, or text and it helps you build end-to-end workflows: extract WAV from video, diarize and transcribe, compute embeddings, run dictionary/archetype analyses, then gather everything into tidy datasets you can model or visualize.
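The last step, gathering per-file outputs into one tidy dataset, amounts to stacking many small CSVs into one table while remembering which file each row came from. The sketch below illustrates that idea with the standard library only; `gather_csvs` and the added `source` column are hypothetical and are not Taters' own gathering API (see the Guides for that).

```python
import csv
from pathlib import Path


def gather_csvs(csv_paths, out_path):
    """Hypothetical helper: stack CSVs into one file with a 'source' column.

    Columns missing from a given file are left empty, so files with
    slightly different schemas can still be combined.
    """
    rows = []
    fieldnames = ["source"]
    for path in map(Path, csv_paths):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Union of all headers, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                rows.append({"source": path.stem, **row})
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Using `source` as the file stem keeps the combined table joinable back to the original media files.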

  • 🥔 Documentation: https://www.taters.wiki
  • 🥔 Status: early but usable; APIs will probably evolve. Pin versions if you need stability.

What Taters is (and is not)

  • Is: A library + CLI with small, composable functions and an optional YAML pipeline runner. Predictable I/O, friendly defaults, and “do not overwrite unless asked.”
  • Is not: A single black-box pipeline. You keep control of each step and can run pieces à la carte or all at once.
  • Is not: Edible.
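The "do not overwrite unless asked" default can be pictured as a small guard around every write. This is an illustrative sketch only: `default_output_path` and `write_unless_exists` are hypothetical helpers, not Taters functions, and the `features/<kind>/<stem>.csv` layout is an assumption modeled on the `./features/<kind>/` default noted in the Python example.

```python
from pathlib import Path


def default_output_path(input_path, kind, root="features", suffix=".csv"):
    """Hypothetical layout: <root>/<kind>/<input stem><suffix>."""
    return Path(root) / kind / (Path(input_path).stem + suffix)


def write_unless_exists(path, text, overwrite=False):
    """Write text to path unless it already exists and overwrite is False.

    Returns True if the file was written, False if it was skipped.
    """
    path = Path(path)
    if path.exists() and not overwrite:
        return False  # keep the existing artifact untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

Deriving the output path purely from the input name and feature kind is what makes reruns predictable: a second run finds the same file and skips it unless `overwrite=True`.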

A tiny taste of Taters

Python

from taters import Taters
t = Taters()

# Pull audio from video
wavs = t.audio.extract_wavs_from_video(input_path="input.mp4")

# Diarize & transcribe (CSV/SRT/TXT)
diar = t.audio.diarize_with_thirdparty(audio_path=wavs[0], device="auto")

# Features (defaults write under ./features/<kind>/)
t.audio.extract_whisper_embeddings(source_wav=wavs[0], transcript_csv=diar["csv"])
t.text.analyze_with_dictionaries(csv_path=diar["csv"], dict_paths=["dictionaries/liwc"])
t.text.analyze_with_archetypes(csv_path=diar["csv"], archetype_csvs=["archetypes/Resilience.csv"])
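Dictionary analysis in the LIWC tradition boils down to counting how many words fall into each category, where a trailing `*` in a dictionary entry matches any word with that prefix, and reporting the result as a percentage of all words. The sketch below is a simplified, stdlib-only illustration of that idea; it is not how `analyze_with_dictionaries` is implemented, and the regex tokenizer and `{category: [entries]}` format are assumptions.

```python
import re


def dictionary_scores(text, dictionary):
    """Return the percentage of words in `text` matched by each category.

    `dictionary` maps category -> list of entries; an entry ending in '*'
    matches any word starting with that prefix, otherwise the match is exact.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    scores = {}
    for category, entries in dictionary.items():
        exact = {e for e in entries if not e.endswith("*")}
        prefixes = tuple(e[:-1] for e in entries if e.endswith("*"))
        hits = sum(
            1 for w in words
            if w in exact or (bool(prefixes) and w.startswith(prefixes))
        )
        scores[category] = 100.0 * hits / total if total else 0.0
    return scores
```

For example, `dictionary_scores("We are happy, so happy", {"posemo": ["happ*"], "we": ["we", "us"]})` counts 2 of 5 words as `posemo` (40.0) and 1 of 5 as `we` (20.0).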

CLI

# Whisper embeddings over non-silent spans, then mean-pool
python -m taters.audio.extract_whisper_embeddings \
  --source_wav audio/session.wav --strategy nonsilent --aggregate mean
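Conceptually, `--strategy nonsilent --aggregate mean` means: find the spans that contain sound, embed each span, then average the span embeddings dimension by dimension into one vector per file. The sketch below shows the two non-Whisper parts with plain Python; the RMS-threshold silence detector, `frame_len`, and `threshold` are assumptions for illustration, not Taters' actual detection method, and the Whisper embedding step itself is not shown.

```python
def nonsilent_spans(samples, frame_len, threshold):
    """Return (start, end) sample indices of runs of frames whose RMS exceeds threshold."""
    spans = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms > threshold and start is None:
            start = i  # a loud run begins
        elif rms <= threshold and start is not None:
            spans.append((start, i))  # the loud run ends at this frame
            start = None
    if start is not None:
        spans.append((start, len(samples)))  # run continues to the end
    return spans


def mean_pool(vectors):
    """Average a list of equal-length vectors dimension by dimension."""
    if not vectors:
        raise ValueError("no vectors to pool")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
```

Mean pooling gives every span equal weight regardless of its length; weighting by span duration is a reasonable alternative if long and short utterances should count differently.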

For more examples (including per-speaker splits, sentence embeddings, and end-to-end pipelines), see the Guides in the docs.


Installation

Use a fresh virtual environment. Then follow the step-by-step install guide (CPU or CUDA, FFmpeg, optional diarization extras): 👉 https://www.taters.wiki/install-guide


Pipelines

When you are ready to batch a whole dataset, use the YAML runner to chain steps and control concurrency:

python -m taters.pipelines.run_pipeline \
  --root_dir videos --file_type video \
  --preset conversation_video \
  --workers 8 --var device=cuda
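Under the hood, batching a dataset means collecting the matching files and fanning a per-file step out over a fixed number of workers. The stdlib sketch below shows that general pattern; `find_files`, `run_batch`, and the `process_file` callback are hypothetical stand-ins, and this is not how `run_pipeline` is implemented.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def find_files(root_dir, suffixes):
    """Recursively collect files under root_dir whose extension is in suffixes."""
    suffixes = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(root_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def run_batch(files, process_file, workers=8):
    """Apply process_file to every file using a pool of `workers` threads.

    Returns {path: result}; an exception raised for one file is stored
    as that file's result instead of stopping the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_file, f): f for f in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; record the error per file
                results[path] = exc
    return results
```

Threads suit steps that mostly wait on external work (FFmpeg subprocesses, GPU inference); for CPU-bound pure-Python steps, `ProcessPoolExecutor` is the usual drop-in swap.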

Details, presets, and how to write your own: 👉 https://www.taters.wiki/guides/pipelines/


Contributing

Bug reports and pull requests are welcome. If you are using Taters on real projects, feedback on rough edges and missing presets is especially valuable.


License

MIT. See LICENSE for details.



Download files

Download the file for your platform.

Source Distribution

taters-0.1.7.tar.gz (87.9 kB)

Uploaded Source

Built Distribution


taters-0.1.7-py3-none-any.whl (109.8 kB)

Uploaded Python 3

File details

Details for the file taters-0.1.7.tar.gz.

File metadata

  • Download URL: taters-0.1.7.tar.gz
  • Upload date:
  • Size: 87.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.5

File hashes

Hashes for taters-0.1.7.tar.gz
Algorithm Hash digest
SHA256 fd6fd3b3d2822a6bcf7c7a18a299f5d0ae77aa6638948d2ef905a4a0837b0fe1
MD5 1618082b765a9409b0d54b899214891e
BLAKE2b-256 4eaac7eb89e04ffbe6f4e21e92a9485cb1e086c71d35240fee84dc863638c441


File details

Details for the file taters-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: taters-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 109.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.5

File hashes

Hashes for taters-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 437b4a9da4bead24c97328a91cccb8a1aa6305486f105b97597f8dab0ac822dd
MD5 6e0bdd5b831140ad9cf88b4e5f4bed52
BLAKE2b-256 290e0d118e354faaac6979849b00034f94d723e7a52c3bc6125bd5af7cc22c9e

