Transcript Tagger

A toolkit for tagging and analyzing transcript content using AI. This SDK automatically categorizes transcript text and determines its difficulty level.

Features

  • AI-Powered Content Tagging: Generate relevant topic, format, audience, and other tags for transcript content
  • Content Difficulty Analysis: Rate transcript content on a five-level difficulty scale, using configurable readability thresholds among other metrics
  • Fully Customizable: Configure thresholds, categories, and storage options to fit your needs
  • Command Line Interface: Process transcripts directly from the command line
  • Python API: Integrate tagging and analysis capabilities into your own applications

Installation

pip install transcript-tagger

Quick Start

Basic Usage

from transcript_tagger_sdk import TranscriptTagger, Config

# Create a configuration and set the API key, model, and storage path
config = Config()
config.set_api_key("your-openai-api-key")
config.set_model("gpt-4")
config.set_storage_path("./custom/path")

tagger = TranscriptTagger(config)

# Process a transcript file
result = tagger.process_transcript("path/to/transcript.txt")

# Access the results
print(f"Difficulty level: {result['difficulty']['difficulty_name']}")
print(f"Topics: {result['tags'].get('Topic', [])}")
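
The example above reads the result as a nested dictionary. As a minimal sketch, assuming only the keys shown there (`difficulty`, `difficulty_name`, `tags`, `Topic`), a small helper can read such a result defensively. The `summarize_result` function and the sample data are hypothetical and not part of the SDK. Whether a key can actually be missing from real output is an assumption.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Build a one-line summary from a result dict shaped like the example above."""
    # Assumption: a section may be absent or None, so fall back to empty dicts.
    difficulty = result.get("difficulty") or {}
    tags = result.get("tags") or {}
    name = difficulty.get("difficulty_name", "unknown")
    topics = tags.get("Topic", [])
    topic_text = ", ".join(topics) if topics else "none"
    return f"Difficulty: {name}; Topics: {topic_text}"


# Hand-written sample data, not real SDK output
sample = {
    "difficulty": {"difficulty_name": "Intermediate"},
    "tags": {"Topic": ["Cooking", "Travel"]},
}
print(summarize_result(sample))
```

Using `.get` with defaults keeps the summary code from raising `KeyError` if a section of the result is missing.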

Analyzing Difficulty Only

from transcript_tagger_sdk import Config, DifficultyAnalyzer

config = Config()
config.set_api_key("your-openai-api-key")
config.set_model("gpt-4")
config.set_storage_path("./custom/path")

# Create an analyzer
analyzer = DifficultyAnalyzer(config)

# Analyze text
result = analyzer.analyze_text("Your transcript text here...")

# Print difficulty level
print(f"Difficulty: {result['difficulty_name']} ({result['difficulty_level']}/5)")

Custom Configuration

from transcript_tagger_sdk import Config, TranscriptTagger

# Create custom configuration
config = Config()
config.set_api_key("your-openai-api-key")
config.set_model("gpt-4")
config.set_storage_path("./custom/path")

# Custom readability thresholds
config.set_readability_thresholds({
    "Beginner": 3.0,  # 0-3.0
    "Intermediate": 9.0,  # 3.1-9.0
    "Advanced": 15.0,  # 9.1+
})

# Create tagger with custom config
tagger = TranscriptTagger(config)
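
This README does not state how the thresholds are applied. One plausible reading, sketched here purely as an assumption, is that each value is the inclusive upper bound of a readability score for that level, and that scores above every bound fall into the highest level. `level_for_score` is a hypothetical helper, not an SDK function.

```python
from typing import Dict


def level_for_score(score: float, thresholds: Dict[str, float]) -> str:
    """Return the first level whose upper bound is >= score (assumed semantics)."""
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    # Sort by bound so the lookup does not depend on dict insertion order.
    ordered = sorted(thresholds.items(), key=lambda item: item[1])
    for name, upper_bound in ordered:
        if score <= upper_bound:
            return name
    # Assumption: scores above every bound map to the highest level.
    return ordered[-1][0]


thresholds = {"Beginner": 3.0, "Intermediate": 9.0, "Advanced": 15.0}
print(level_for_score(2.5, thresholds))   # Beginner
print(level_for_score(9.0, thresholds))   # Intermediate
print(level_for_score(20.0, thresholds))  # Advanced
```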

Command Line Usage

Process transcripts:

# Process a single transcript
transcript-tagger process path/to/transcript.txt

# Process multiple transcripts
transcript-tagger process file1.txt file2.txt file3.txt

# Only analyze difficulty (no tagging)
transcript-tagger process --difficulty-only transcript.txt

# Only generate tags (no difficulty analysis)
transcript-tagger process --tags-only transcript.txt

View results:

# View all results
transcript-tagger view

# View results for a specific video ID
transcript-tagger view --video-id video123
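
When driving the CLI from a script, it can help to build the argument list in one place. The sketch below uses only the `process` subcommand and the `--difficulty-only` and `--tags-only` flags shown above. The `build_process_command` helper is hypothetical, and treating the two flags as mutually exclusive is an assumption, since the README does not say how the CLI handles both together.

```python
import shlex
from typing import List, Sequence


def build_process_command(
    paths: Sequence[str],
    difficulty_only: bool = False,
    tags_only: bool = False,
) -> List[str]:
    """Build the argument list for `transcript-tagger process`."""
    if not paths:
        raise ValueError("at least one transcript path is required")
    # Assumption: the two flags are mutually exclusive.
    if difficulty_only and tags_only:
        raise ValueError("choose at most one of difficulty_only and tags_only")
    command = ["transcript-tagger", "process"]
    if difficulty_only:
        command.append("--difficulty-only")
    if tags_only:
        command.append("--tags-only")
    command.extend(paths)
    return command


command = build_process_command(["file1.txt", "file2.txt"], tags_only=True)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.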

Advanced Usage

For more advanced usage examples, check out the examples directory:

  • basic_usage.py: Simple usage example
  • advanced_usage.py: Advanced features including batch processing and custom configurations

API Reference

Main Classes

  • TranscriptTagger: Main class for tagging and analyzing transcripts
  • Config: Configuration class for customizing tagger behavior
  • DifficultyAnalyzer: Class for analyzing the difficulty level of text

Difficulty Levels

The toolkit defines 5 difficulty levels:

  1. 初级/Beginner: Basic vocabulary, simple sentences, suitable for beginners
  2. 初中级/Elementary: Slightly more complex vocabulary, suitable for early learners
  3. 中级/Intermediate: Moderate complexity, suitable for intermediate learners
  4. 中高级/Upper-Intermediate: More complex language, suitable for advanced learners
  5. 高级/Advanced: Complex vocabulary and sentence structures, suitable for proficient users
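
In application code, the five levels listed above can be mirrored as an enumeration so that comparisons read clearly. This is a sketch only: `DifficultyLevel` and `at_most` are hypothetical names, not SDK classes, and the integer values assume that `difficulty_level` in the analyzer result is a number from 1 to 5, as the `/5` in the earlier example suggests.

```python
from enum import IntEnum


class DifficultyLevel(IntEnum):
    """Local mirror of the five levels listed above (not an SDK type)."""
    BEGINNER = 1
    ELEMENTARY = 2
    INTERMEDIATE = 3
    UPPER_INTERMEDIATE = 4
    ADVANCED = 5


def at_most(level: int, ceiling: DifficultyLevel) -> bool:
    """Return True if a numeric level does not exceed the given ceiling."""
    # DifficultyLevel(level) raises ValueError for values outside 1-5.
    return DifficultyLevel(level) <= ceiling


# Keep only transcripts at or below Intermediate (sample numbers, not SDK output)
levels = [1, 4, 3, 5, 2]
suitable = [n for n in levels if at_most(n, DifficultyLevel.INTERMEDIATE)]
print(suitable)
```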

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
