CLI for transcribing court hearings with WhisperX and speaker diarization

tecjustica-transcribe

A CLI for transcribing court hearings with WhisperX and speaker diarization.

It transcribes MP4 videos into text with timestamps and identifies who is speaking. The speakers (judge, prosecutor, attorney, etc.) appear in the output as SPEAKER_00, SPEAKER_01, and so on.

System Requirements

Operating System

OS	Support	Notes
Ubuntu/Debian (WSL2)	✅ Tested	Recommended. Runs on Windows through WSL2
Ubuntu/Debian (native)	✅ Compatible	Direct installation
Windows (native)	❌ Not supported	Use WSL2 (see below)
macOS	❌ Not supported	Requires an NVIDIA GPU (CUDA)

Windows users: install WSL2 with Ubuntu. WSL2 accesses the Windows NVIDIA GPU automatically.

Hardware

Component	Minimum	Recommended
NVIDIA GPU	6 GB VRAM (e.g. RTX 3050)	8 GB+ VRAM (e.g. RTX 3060, RTX 4060)
RAM	8 GB	16 GB
Disk	10 GB free (AI models)	15 GB+

AMD and Intel GPUs are not supported. An NVIDIA GPU with CUDA support is required.
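As a rough way to compare a machine against the minimum in the table above, the sketch below reads each GPU's total VRAM from nvidia-smi and checks it against 6 GB. The function names and the 6144 MiB threshold are illustrative assumptions; this is not how tecjustica-transcribe itself performs the check.

```python
import subprocess

# 6 GB minimum from the hardware table, assumed here to mean 6144 MiB.
MIN_VRAM_MIB = 6 * 1024


def parse_vram_mib(csv_output: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' rows printed by nvidia-smi in CSV mode."""
    gpus = []
    for line in csv_output.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Run nvidia-smi and return (name, total VRAM in MiB) for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_vram_mib(result.stdout)


def meets_minimum(gpus: list[tuple[str, int]]) -> bool:
    """True if at least one GPU has the assumed minimum amount of VRAM."""
    return any(mem >= MIN_VRAM_MIB for _, mem in gpus)
```

On a machine with the NVIDIA driver installed, `meets_minimum(query_gpus())` gives a quick yes/no answer. `query_gpus()` raises an exception if nvidia-smi is missing or fails.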

Software

Dependency	How to install	How to verify
NVIDIA driver	nvidia.com/drivers or Windows Update	`nvidia-smi`
CUDA	Installed automatically with PyTorch	`python -c "import torch; print(torch.cuda.is_available())"`
ffmpeg	`sudo apt install ffmpeg`	`ffmpeg -version`
Python 3.10–3.13	`sudo apt install python3.12`	`python3 --version`
uv (recommended)	`curl -LsSf https://astral.sh/uv/install.sh | sh`	`uv --version`

The tecjustica-transcribe init command checks all of these automatically and reports what is missing.
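For readers who want to see what a check of this kind involves, here is a minimal sketch that looks for the executables from the table on the PATH and tests the Python version range. It is an independent illustration: the names `CHECKED_TOOLS`, `python_supported`, and `missing_tools` are hypothetical, and the real init command may work differently.

```python
import shutil
import sys

# Executables named in the software table. Which ones a real check covers
# is an assumption made for this sketch.
CHECKED_TOOLS = ("nvidia-smi", "ffmpeg", "uv")

# Inclusive range of supported Python versions, from the table.
SUPPORTED_PYTHON = ((3, 10), (3, 13))


def python_supported(version=sys.version_info[:2]) -> bool:
    """True if (major, minor) falls inside the supported range."""
    low, high = SUPPORTED_PYTHON
    return low <= tuple(version) <= high


def missing_tools(tools=CHECKED_TOOLS, which=shutil.which) -> list[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]
```

`missing_tools()` returns an empty list when everything is found. The `which` parameter exists only so the lookup can be replaced in tests.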

HuggingFace Token (for speaker identification)

Diarization (identifying who is speaking) uses the pyannote model, which requires a free HuggingFace token:

1. Create an account at https://huggingface.co
2. Accept the model terms at https://huggingface.co/pyannote/speaker-diarization-community-1
3. Generate a token (type "Read") at https://huggingface.co/settings/tokens

The init command prompts for this token and saves it automatically.

Without a token, you can still transcribe with --sem-diarizacao. Transcription works normally but does not identify the speakers.

Installation

uv tool install tecjustica-transcribe

Or with pip:

pip install tecjustica-transcribe

First Use

# 1. Check requirements and configure the token (only needs to run once)
tecjustica-transcribe init

The init command prints a full diagnostic:

╭──────────────── TecJustiça Transcribe — Diagnóstico ─────────────────╮
│ Python            ✅ 3.12.3                                          │
│ Driver NVIDIA     ✅ 591.44                                          │
│ CUDA              ✅ 12.8                                            │
│ GPU               ✅ NVIDIA GeForce RTX 3050 6GB Laptop GPU (6.0 GB) │
│ ffmpeg            ✅ 6.1.1                                           │
│ Token HuggingFace ✅ hf_pcgK...                                      │
╰──────────────────────────────────────────────────────────────────────╯

Transcribe

# Transcribe with speaker identification
tecjustica-transcribe transcrever audiencia.mp4

# Transcribe without speaker identification (no token required)
tecjustica-transcribe transcrever audiencia.mp4 --sem-diarizacao

# Choose the output folder (default: ./transcricoes/)
tecjustica-transcribe transcrever audiencia.mp4 --output ./minha-pasta
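To process several hearings in one go, the commands above can be driven from a short script. The sketch below only assembles the subcommand and flags shown in this section. The helper names are hypothetical, and it assumes that --sem-diarizacao and --output can be combined in a single call, which the README does not state.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_command(video: Path, output_dir: Optional[Path] = None,
                  diarize: bool = True) -> list[str]:
    """Assemble the CLI call for one video using the documented flags."""
    cmd = ["tecjustica-transcribe", "transcrever", str(video)]
    if not diarize:
        cmd.append("--sem-diarizacao")
    if output_dir is not None:
        cmd += ["--output", str(output_dir)]
    return cmd


def transcribe_folder(folder: Path, output_dir: Optional[Path] = None,
                      diarize: bool = True) -> None:
    """Run the CLI once per .mp4 file in the folder, in name order."""
    for video in sorted(folder.glob("*.mp4")):
        # check=True stops the batch at the first failing transcription.
        subprocess.run(build_command(video, output_dir, diarize), check=True)
```

For example, `transcribe_folder(Path("audiencias"), Path("minha-pasta"))` would transcribe every MP4 in a folder named audiencias, one file at a time.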

Output

The command writes 3 files to the ./transcricoes/ folder (or to the folder given with --output):

File	Format	Use
`audiencia.txt`	Plain text with `[SPEAKER_00]` labels	Reading and analysis
`audiencia.srt`	Subtitles with timestamps	Video players (VLC, etc.)
`audiencia.json`	Complete word-level data	Integration with other systems
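Of the three files, the .srt follows the widely used SubRip layout (index line, `start --> end` line, then text), so it can be read without knowing anything specific to this tool. The sketch below parses such a file into timed cues. The `Cue` class and function names are illustrative, and the README does not say whether the .srt text carries speaker labels, so the parser keeps the text untouched.

```python
import re
from dataclasses import dataclass

# SubRip timestamps look like 00:01:02,345 (a dot is tolerated as well).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Split SRT content into cues; blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed blocks instead of failing
        start, end = lines[1].split("-->")
        cues.append(Cue(to_seconds(start), to_seconds(end),
                        "\n".join(lines[2:])))
    return cues
```

A call such as `parse_srt(Path("transcricoes/audiencia.srt").read_text(encoding="utf-8"))` (with `pathlib.Path` imported) would return the cues in file order. The word-level .json file is not covered here because its schema is not described in this README.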

Troubleshooting

Problem	Solution
`nvidia-smi` not found	Install the NVIDIA driver: nvidia.com/drivers
CUDA not available	Check that the NVIDIA driver is compatible with CUDA 12+
Out-of-memory (OOM) error	Close other programs, especially web browsers
HuggingFace token denied	Accept the terms at huggingface.co/pyannote/speaker-diarization-community-1
ffmpeg not found	`sudo apt install ffmpeg`

Quick Guide: Windows with WSL2

# 1. Open PowerShell as Administrator and install WSL2
wsl --install

# 2. Inside Ubuntu (WSL2), install the dependencies
sudo apt update && sudo apt install ffmpeg python3.12

# 3. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# 4. Install tecjustica-transcribe
uv tool install tecjustica-transcribe

# 5. Configure and transcribe
tecjustica-transcribe init
tecjustica-transcribe transcrever /mnt/c/Users/YourUser/Downloads/audiencia.mp4

In WSL2, your Windows files are under /mnt/c/. Example: C:\Users\marcos\Downloads\ → /mnt/c/Users/marcos/Downloads/
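The mapping in the example above is mechanical: the drive letter is lowercased and placed under /mnt, and backslashes become forward slashes. A small sketch of that rule follows. The function name is hypothetical, and it assumes the default WSL2 mount point of /mnt/<drive>.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path: str) -> str:
    """Map a drive-letter Windows path to its default WSL2 location."""
    win = PureWindowsPath(path)
    # Only plain drive-letter paths (e.g. "C:") are handled, not UNC shares.
    if len(win.drive) != 2 or not win.drive[0].isalpha():
        raise ValueError(f"expected a drive-letter path, got {path!r}")
    drive_letter = win.drive[0].lower()
    # parts[0] is the anchor ("C:\\"); the rest are folder and file names.
    return str(PurePosixPath("/mnt", drive_letter, *win.parts[1:]))
```

WSL also ships a `wslpath` utility that performs this conversion from the shell. The Python version is shown only to make the rule explicit.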

Release history

Version	Release date
0.1.22	Mar 19, 2026
0.1.21	Mar 19, 2026
0.1.20	Mar 19, 2026
0.1.19	Mar 19, 2026
0.1.18	Mar 19, 2026
0.1.17	Mar 19, 2026
0.1.16	Mar 19, 2026
0.1.14	Mar 19, 2026
0.1.13	Mar 19, 2026
0.1.12	Mar 19, 2026
0.1.11	Mar 19, 2026
0.1.10	Mar 19, 2026
0.1.9	Mar 19, 2026
0.1.8	Mar 19, 2026
0.1.7	Mar 19, 2026
0.1.6	Mar 19, 2026
0.1.5	Mar 19, 2026
0.1.4	Mar 19, 2026
0.1.3 (this version)	Mar 19, 2026
0.1.2	Mar 19, 2026
0.1.1	Mar 19, 2026
0.1.0	Mar 19, 2026

Download files

Source Distribution

tecjustica_transcribe-0.1.3.tar.gz (7.9 kB)

Uploaded Mar 19, 2026 (Source)

Built Distribution

tecjustica_transcribe-0.1.3-py3-none-any.whl (10.4 kB)

Uploaded Mar 19, 2026 (Python 3)

File details

Details for the file tecjustica_transcribe-0.1.3.tar.gz.

File metadata

Download URL: tecjustica_transcribe-0.1.3.tar.gz
Upload date: Mar 19, 2026
Size: 7.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.16 (uv publish, on Ubuntu 24.04 "noble")

File hashes

Hashes for tecjustica_transcribe-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`d354c43b9a66c0b0fd3b61aae07885e0e8784fa1edb832c54faceca8544311c4`
MD5	`7cc767eb8e28927bd88c4bf242ba4f47`
BLAKE2b-256	`9ba757adf157fb77c84086c25962de88b0cf76bed7d5337907609aa53c1cce5d`
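A downloaded file can be compared against the SHA256 digest listed above before it is installed. The sketch below uses Python's hashlib for that. The function names are illustrative, and the file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `matches(Path("tecjustica_transcribe-0.1.3.tar.gz"), "d354c43b9a66c0b0fd3b61aae07885e0e8784fa1edb832c54faceca8544311c4")` should return True for an intact download.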


File details

Details for the file tecjustica_transcribe-0.1.3-py3-none-any.whl.

File metadata

Download URL: tecjustica_transcribe-0.1.3-py3-none-any.whl
Upload date: Mar 19, 2026
Size: 10.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.16 (uv publish, on Ubuntu 24.04 "noble")

File hashes

Hashes for tecjustica_transcribe-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ae30c51a38c66292cff114b3ad36a5b9233813db4017f7a11a6c6a3adc546555`
MD5	`c9f5e878a3a3b001adaec08e378294a3`
BLAKE2b-256	`4e4cbfb0190aed45efff54d5b131207942286d6778c1f30979fca884e9a8ae44`
