Analyze, process, and extract from many types of input data. Highly modular/customizable.
🥔 TATERS: Takes All Things, Extracts Relevant Stuff
Taters is a Python toolkit (and CLI) that turns raw media into analysis-ready artifacts quickly, repeatably, and with predictable outputs. Point it at video, audio, or text and it helps you build end-to-end workflows: extract WAV audio from video, diarize and transcribe, compute embeddings, run dictionary/archetype analyses, then gather everything into tidy datasets you can model or visualize.
- 🥔 Documentation: https://www.taters.wiki
- 🥔 Status: early but usable; APIs are likely to change. Pin a version if you need stability.
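
If you pin a version (for example with `pip install "taters==0.1.93"`, the release described on this page), a small guard at the top of a script can confirm the environment matches the pin. This is a minimal sketch using only the standard library; the helper name `installed_version_matches` is hypothetical and not part of Taters.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version_matches(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed in this environment at all.
        return False


# Example: installed_version_matches("taters", "0.1.93")
```

Failing fast on a mismatch is usually cheaper than debugging a changed API halfway through a long batch run.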
What Taters is (and is not)
- Is: A library + CLI with small, composable functions and an optional YAML pipeline runner. Predictable I/O, friendly defaults, and “do not overwrite unless asked.”
- Is not: A single black-box pipeline. You keep control of each step and can run pieces à la carte or all at once.
- Is not: Edible.
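
The "do not overwrite unless asked" behavior can be pictured with a short sketch. This is not Taters' implementation; `write_unless_exists` and its `overwrite` flag are hypothetical names used only to illustrate the policy of skipping work when the output file is already present.

```python
from pathlib import Path
from typing import Callable


def write_unless_exists(
    path: Path, produce: Callable[[], str], overwrite: bool = False
) -> bool:
    """Write produce() to `path`. Return True if written, False if skipped."""
    if path.exists() and not overwrite:
        # An earlier run already produced this artifact; leave it untouched.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(produce(), encoding="utf-8")
    return True
```

Because `produce` is only called when a write will happen, re-running a workflow skips the expensive step as well as the write.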
A tiny taste of Taters
Python
from taters import Taters
t = Taters()
# Pull audio from video
wavs = t.audio.extract_wavs_from_video(input_path="input.mp4")
# Diarize & transcribe (CSV/SRT/TXT)
diar = t.audio.diarize_with_thirdparty(audio_path=wavs[0], device="auto")
# Features (defaults write under ./features/<kind>/)
t.audio.extract_whisper_embeddings(source_wav=wavs[0], transcript_csv=diar["csv"])
t.text.analyze_with_dictionaries(csv_path=diar["csv"], dict_paths=["dictionaries/liwc"])
t.text.analyze_with_archetypes(csv_path=diar["csv"], archetype_csvs=["archetypes/Resilience.csv"])
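
To apply the same steps to several files, the calls above can be wrapped in one function. The sketch below uses only the method names and keyword arguments shown in the snippet; it assumes, as the snippet implies, that `extract_wavs_from_video` returns an indexable sequence of paths and that the diarization result is a mapping with a `"csv"` key. The wrapper `process_video` is hypothetical.

```python
from typing import Any, Dict


def process_video(t: Any, video_path: str, device: str = "auto") -> Dict[str, Any]:
    """Extract audio, diarize, and embed one video using a Taters-like object."""
    wavs = t.audio.extract_wavs_from_video(input_path=video_path)
    if not wavs:
        raise ValueError(f"no audio extracted from {video_path}")
    wav = wavs[0]  # use the first extracted stream, as in the example above

    diar = t.audio.diarize_with_thirdparty(audio_path=wav, device=device)
    t.audio.extract_whisper_embeddings(source_wav=wav, transcript_csv=diar["csv"])
    return {"wav": wav, "transcript_csv": diar["csv"]}
```

Passing the `Taters()` instance in as `t` keeps the function easy to test with a stand-in object.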
CLI
# Whisper embeddings over non-silent spans, then mean-pool
python -m taters.audio.extract_whisper_embeddings \
--source_wav audio/session.wav --strategy nonsilent --aggregate mean
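
As a rough picture of what mean-pooling does, the sketch below averages a set of per-span embedding vectors into a single vector. It illustrates the general technique only and makes no claim about how Taters implements `--aggregate mean`; the function name `mean_pool` is hypothetical.

```python
from typing import List, Sequence


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length vectors element-wise into one vector."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must share one dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Two 2-dimensional span embeddings collapse to one file-level embedding.
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```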
For more examples (including per-speaker splits, sentence embeddings, and end-to-end pipelines), see the Guides in the docs.
Installation
Use a fresh virtual environment. Then follow the step-by-step install guide (CPU or CUDA, FFmpeg, optional diarization extras): 👉 https://www.taters.wiki/install-guide
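
Before installing, it can help to confirm the two prerequisites named above. The sketch below is a standard-library check with a hypothetical helper name, `check_environment`; it assumes FFmpeg is exposed as an `ffmpeg` executable on `PATH`, and its virtual-environment test only recognizes `venv`/`virtualenv` style environments (not, for example, conda).

```python
import shutil
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether we are inside a venv and whether ffmpeg is on PATH."""
    return {
        # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


print(check_environment())
```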
Pipelines
When you are ready to batch a whole dataset, use the YAML runner to chain steps and control concurrency:
python -m taters.pipelines.run_pipeline \
--root_dir videos --file_type video \
--preset conversation_video \
--workers 8 --var device=cuda
Details, presets, and how to write your own: 👉 https://www.taters.wiki/guides/pipelines/
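
If you launch the runner from a Python script rather than a shell, building the argument list explicitly avoids quoting problems. The sketch below uses only the flags shown in the command above; the helper `build_pipeline_command` is hypothetical, and passing `--var` more than once for several variables is an assumption to verify against the pipelines guide.

```python
import sys
from typing import Dict, List, Optional


def build_pipeline_command(
    root_dir: str,
    file_type: str,
    preset: str,
    workers: int = 1,
    variables: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build the argv list for `python -m taters.pipelines.run_pipeline`."""
    cmd = [
        sys.executable, "-m", "taters.pipelines.run_pipeline",
        "--root_dir", root_dir,
        "--file_type", file_type,
        "--preset", preset,
        "--workers", str(workers),
    ]
    # Assumption: each variable is passed as its own `--var key=value` pair.
    for key, value in (variables or {}).items():
        cmd += ["--var", f"{key}={value}"]
    return cmd


# To run it: subprocess.run(build_pipeline_command(...), check=True)
```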
Contributing
Bug reports and pull requests are welcome. If you are using Taters on real projects, feedback on rough edges and missing presets is especially valuable.
License
MIT. See LICENSE for details.
Download files
File details
Details for the file taters-0.1.93.tar.gz.
File metadata
- Download URL: taters-0.1.93.tar.gz
- Upload date:
- Size: 101.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06200d8de5e657314a6c48b5e014e6819d1a3bf969ee1279c6f41fe43a57f055 |
| MD5 | 68402659cb0ef43d7f8cfd2585821937 |
| BLAKE2b-256 | b67b753ddb7578fd236b5e0d8dbbee4d02f268f74b2b4d9d80517cd716f1227e |
File details
Details for the file taters-0.1.93-py3-none-any.whl.
File metadata
- Download URL: taters-0.1.93-py3-none-any.whl
- Upload date:
- Size: 128.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05c23a4f7af07a88cee3e3869e4aff982c0ef4bd67e5ad72a527f386be16f1c4 |
| MD5 | db37cc341b546f5f230687d1ff5bb4f7 |
| BLAKE2b-256 | 90b6a1012103806ffd04525ca65838af636157b14e1fec3e82eecda2dcf1ddc0 |
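
A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; `sha256_matches` is a hypothetical helper, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Example:
# sha256_matches(
#     Path("taters-0.1.93.tar.gz"),
#     "06200d8de5e657314a6c48b5e014e6819d1a3bf969ee1279c6f41fe43a57f055",
# )
```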