A toolkit for tagging and analyzing transcript content using AI

Transcript Tagger

A toolkit for tagging and analyzing transcript content using AI. The SDK automatically categorizes transcript text and rates its difficulty level.

Features

  • AI-Powered Content Tagging: Generate relevant topic, format, audience, and other tags for transcript content
  • Content Difficulty Analysis: Analyze and rate the difficulty level of transcript content based on various metrics
  • Fully Customizable: Configure thresholds, categories, and storage options to fit your needs
  • Command Line Interface: Process transcripts directly from the command line
  • Python API: Integrate tagging and analysis capabilities into your own applications

Installation

pip install transcript-tagger

Quick Start

Basic Usage

from transcript_tagger_sdk import TranscriptTagger

# Create a tagger with default configuration
tagger = TranscriptTagger()

# Process a transcript file
result = tagger.process_transcript("path/to/transcript.txt")

# Access the results
print(f"Difficulty level: {result['difficulty']['difficulty_name']}")
print(f"Topics: {result['tags'].get('Topic', [])}")
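
The snippet above indexes the result as a nested dictionary. As a minimal sketch, a helper can read those same keys defensively so that a missing entry does not raise. It assumes only the keys shown above (`difficulty`, `difficulty_name`, `tags`, `Topic`); the function name `summarize_result` and the sample data are hypothetical and not part of the SDK.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Build a one-line summary from a result dict shaped like the example above.

    Only the keys used in the Basic Usage snippet are assumed to exist;
    every lookup falls back to a default instead of raising KeyError.
    """
    difficulty = result.get("difficulty") or {}
    tags = result.get("tags") or {}
    name = difficulty.get("difficulty_name", "unknown")
    topics = tags.get("Topic", [])
    topic_text = ", ".join(topics) if topics else "none"
    return f"Difficulty: {name}; Topics: {topic_text}"


# Illustrative data only: this is not real output from the SDK.
sample = {
    "difficulty": {"difficulty_name": "Intermediate"},
    "tags": {"Topic": ["cooking", "travel"]},
}
print(summarize_result(sample))
```

With a real tagger, the same helper could be called as `summarize_result(tagger.process_transcript(path))`, provided the result keeps the shape shown in the example.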

Analyzing Difficulty Only

from transcript_tagger_sdk import DifficultyAnalyzer

# Create an analyzer
analyzer = DifficultyAnalyzer()

# Analyze text
result = analyzer.analyze_text("Your transcript text here...")

# Print difficulty level
print(f"Difficulty: {result['difficulty_name']} ({result['difficulty_level']}/5)")

Custom Configuration

from transcript_tagger_sdk import Config, TranscriptTagger

# Create custom configuration
config = Config()
config.set_api_key("your-openai-api-key")
config.set_model("gpt-4")
config.set_storage_path("./custom/path")

# Custom readability thresholds
config.set_readability_thresholds({
    "Beginner": 3.0,  # 0-3.0
    "Intermediate": 9.0,  # 3.1-9.0
    "Advanced": 15.0,  # 9.1+
})

# Create tagger with custom config
tagger = TranscriptTagger(config)
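
To make the threshold dictionary concrete, here is a rough sketch of how such thresholds could map a readability score to a level name. It assumes each value is an inclusive upper bound and that scores above the highest bound fall into the top level, as the inline comments suggest. The SDK's internal logic may differ, and `classify_readability` is a hypothetical helper, not part of the package.

```python
from typing import Dict


def classify_readability(score: float, thresholds: Dict[str, float]) -> str:
    """Return the first level whose upper bound is >= score.

    Assumption: each threshold value is an inclusive upper bound, and any
    score above the largest bound is assigned to the highest level.
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    # Sort levels by their upper bound so the lowest matching level wins.
    ordered = sorted(thresholds.items(), key=lambda item: item[1])
    for name, upper_bound in ordered:
        if score <= upper_bound:
            return name
    return ordered[-1][0]


thresholds = {"Beginner": 3.0, "Intermediate": 9.0, "Advanced": 15.0}
for score in (2.5, 7.0, 20.0):
    print(score, classify_readability(score, thresholds))
```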

Command Line Usage

Process transcripts:

# Process a single transcript
transcript-tagger process path/to/transcript.txt

# Process multiple transcripts
transcript-tagger process file1.txt file2.txt file3.txt

# Only analyze difficulty (no tagging)
transcript-tagger process --difficulty-only transcript.txt

# Only generate tags (no difficulty analysis)
transcript-tagger process --tags-only transcript.txt
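
When the CLI is driven from a script, the argument list can be assembled in Python. The sketch below uses only the subcommand and flags shown above. Treating `--difficulty-only` and `--tags-only` as mutually exclusive is an assumption, and `build_process_command` is a hypothetical helper.

```python
from typing import List, Sequence


def build_process_command(
    paths: Sequence[str],
    difficulty_only: bool = False,
    tags_only: bool = False,
) -> List[str]:
    """Assemble an argv list for `transcript-tagger process`."""
    if not paths:
        raise ValueError("at least one transcript path is required")
    # Assumption: the two mode flags are not meant to be combined.
    if difficulty_only and tags_only:
        raise ValueError("choose either difficulty_only or tags_only, not both")
    command = ["transcript-tagger", "process"]
    if difficulty_only:
        command.append("--difficulty-only")
    if tags_only:
        command.append("--tags-only")
    command.extend(str(path) for path in paths)
    return command


print(build_process_command(["file1.txt", "file2.txt"], difficulty_only=True))
# To actually run it: subprocess.run(command, check=True)
```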

View results:

# View all results
transcript-tagger view

# View results for a specific video ID
transcript-tagger view --video-id video123

Advanced Usage

For more advanced usage examples, check out the examples directory:

  • basic_usage.py: Simple usage example
  • advanced_usage.py: Advanced features including batch processing and custom configurations
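
The contents of `advanced_usage.py` are not reproduced here. As a rough idea of what batch processing can look like, the sketch below loops over paths and collects failures instead of stopping at the first one. `process_batch` is a hypothetical helper; it takes any callable, so `TranscriptTagger().process_transcript` can be passed in.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def process_batch(
    paths: Iterable[str],
    process: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Apply `process` to each path, returning (results, errors) keyed by path."""
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = process(path)
        except Exception as exc:  # keep going so one bad file does not abort the batch
            errors[path] = f"{type(exc).__name__}: {exc}"
    return results, errors


# Stand-in for tagger.process_transcript, used only to demonstrate the helper.
def fake_process(path: str) -> dict:
    if path.endswith(".missing"):
        raise FileNotFoundError(path)
    return {"path": path}


results, errors = process_batch(["a.txt", "b.missing"], fake_process)
print(results)
print(errors)
```

With the SDK installed, the call would be `process_batch(files, TranscriptTagger().process_transcript)`.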

API Reference

Main Classes

  • TranscriptTagger: Main class for tagging and analyzing transcripts
  • Config: Configuration class for customizing tagger behavior
  • DifficultyAnalyzer: Class for analyzing the difficulty level of text

Difficulty Levels

The toolkit defines 5 difficulty levels:

  1. Beginner (初级): Basic vocabulary and simple sentences, suitable for beginners
  2. Elementary (初中级): Slightly more complex vocabulary, suitable for early learners
  3. Intermediate (中级): Moderate complexity, suitable for intermediate learners
  4. Upper-Intermediate (中高级): More complex language, suitable for upper-intermediate learners
  5. Advanced (高级): Complex vocabulary and sentence structures, suitable for proficient users
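
The five levels above can be kept in a small lookup table in application code. This sketch assumes `difficulty_level` is an integer from 1 to 5 that follows the numbering above. The exact `difficulty_name` strings returned by the SDK may differ, and `describe_level` is a hypothetical helper.

```python
DIFFICULTY_NAMES = {
    1: "Beginner",
    2: "Elementary",
    3: "Intermediate",
    4: "Upper-Intermediate",
    5: "Advanced",
}


def describe_level(level: int) -> str:
    """Format a numeric level as 'Name (n/5)', rejecting values outside 1-5."""
    if level not in DIFFICULTY_NAMES:
        raise ValueError(f"difficulty level must be 1-5, got {level!r}")
    return f"{DIFFICULTY_NAMES[level]} ({level}/5)"


print(describe_level(3))
```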

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
