Transcript Tagger
A comprehensive toolkit for tagging and analyzing transcript content using AI. This SDK allows you to automatically categorize and determine the difficulty level of transcript text.
Features
- AI-Powered Content Tagging: Generate relevant topic, format, audience, and other tags for transcript content
- Content Difficulty Analysis: Analyze and rate the difficulty level of transcript content based on various metrics
- Fully Customizable: Configure thresholds, categories, and storage options to fit your needs
- Command Line Interface: Process transcripts directly from the command line
- Python API: Integrate tagging and analysis capabilities into your own applications
Installation
pip install transcript-tagger
Quick Start
Basic Usage
from transcript_tagger_sdk import TranscriptTagger, Config
# Configure the API key, model, and storage location, then create a tagger
config = Config()
config.set_api_key("your-openai-api-key")
config.set_model("gpt-4")
config.set_storage_path("./custom/path")
tagger = TranscriptTagger(config)
# Process a transcript file
result = tagger.process_transcript("path/to/transcript.txt")
# Access the results
print(f"Difficulty level: {result['difficulty']['difficulty_name']}")
print(f"Topics: {result['tags'].get('Topic', [])}")
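The example above reads two fields from the result. The sketch below assumes the result is a plain dict shaped like that example: `result["difficulty"]["difficulty_name"]`, plus `result["tags"]` mapping each tag category to a list of tags. `summarize_result` and the sample data are hypothetical and only illustrate how such a dict could be turned into a one-line summary. It also tolerates a missing `difficulty` key, which might happen when tagging runs without difficulty analysis.

```python
from typing import Any, Dict, List


def summarize_result(result: Dict[str, Any]) -> str:
    """Build a one-line summary from a result dict (assumed shape, see lead-in)."""
    # Fall back to "unknown" when no difficulty analysis is present
    difficulty = result.get("difficulty", {}).get("difficulty_name", "unknown")
    tags: Dict[str, List[str]] = result.get("tags", {})
    # Render each non-empty category as "Category: tag1, tag2", sorted for stable output
    parts = [
        f"{category}: {', '.join(values)}"
        for category, values in sorted(tags.items())
        if values
    ]
    return f"[{difficulty}] " + "; ".join(parts)


# Hypothetical sample result, not real output from the library
sample = {
    "difficulty": {"difficulty_name": "Intermediate"},
    "tags": {"Topic": ["science", "space"], "Format": ["interview"], "Audience": []},
}
print(summarize_result(sample))
```

With the sample above this prints `[Intermediate] Format: interview; Topic: science, space`; empty categories are skipped.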
Analyzing Difficulty Only
from transcript_tagger_sdk import Config, DifficultyAnalyzer
config = Config()
config.set_api_key("your-openai-api-key")
config.set_model("gpt-4")
config.set_storage_path("./custom/path")
# Create an analyzer
analyzer = DifficultyAnalyzer(config)
# Analyze text
result = analyzer.analyze_text("Your transcript text here...")
# Print difficulty level
print(f"Difficulty: {result['difficulty_name']} ({result['difficulty_level']}/5)")
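The examples pass the API key to `config.set_api_key` as a literal string. A common alternative is to read it from an environment variable so the key never lands in source control. The sketch below assumes the variable is named `OPENAI_API_KEY` (the OpenAI convention); whether this SDK reads that variable on its own is not documented here, so the helper passes the key explicitly. `get_api_key` is a hypothetical helper, not part of `transcript_tagger_sdk`.

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message rather than sending an empty key to the API
        raise RuntimeError(f"Set the {var_name} environment variable before running the tagger")
    return key
```

Usage with the configuration shown above: `config.set_api_key(get_api_key())`.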
Custom Configuration
from transcript_tagger_sdk import Config, TranscriptTagger
# Create custom configuration
config = Config()
config.set_api_key("your-openai-api-key")
config.set_model("gpt-4")
config.set_storage_path("./custom/path")
# Custom readability thresholds
config.set_readability_thresholds({
"Beginner": 3.0, # 0-3.0
"Intermediate": 9.0, # 3.1-9.0
"Advanced": 15.0, # 9.1+
})
# Create tagger with custom config
tagger = TranscriptTagger(config)
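How the library maps a readability score onto these thresholds is not spelled out here. A minimal sketch of one plausible reading: each value is the inclusive upper bound of its label, and any score above the highest bound takes the highest label, as the "9.1+" comment suggests. It also assumes a higher score means harder text. `classify_score` is a hypothetical helper, not the library's implementation.

```python
from typing import Dict


def classify_score(score: float, thresholds: Dict[str, float]) -> str:
    """Map a readability score to a label using upper-bound thresholds (assumed semantics)."""
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    # Check labels from the lowest bound to the highest
    ordered = sorted(thresholds.items(), key=lambda item: item[1])
    for label, upper in ordered:
        if score <= upper:
            return label
    # Scores above every bound fall into the hardest label ("9.1+" reading)
    return ordered[-1][0]


thresholds = {"Beginner": 3.0, "Intermediate": 9.0, "Advanced": 15.0}
print(classify_score(7.2, thresholds))
```

Under these assumptions, 7.2 is classified as `Intermediate`, and a score of 20.0 still returns `Advanced`.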
Command Line Usage
Process transcripts:
# Process a single transcript
transcript-tagger process path/to/transcript.txt
# Process multiple transcripts
transcript-tagger process file1.txt file2.txt file3.txt
# Only analyze difficulty (no tagging)
transcript-tagger process --difficulty-only transcript.txt
# Only generate tags (no difficulty analysis)
transcript-tagger process --tags-only transcript.txt
View results:
# View all results
transcript-tagger view
# View results for a specific video ID
transcript-tagger view --video-id video123
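To process a whole folder, you can drive the CLI from a script. The sketch below only uses the `process` subcommand and the `--difficulty-only` / `--tags-only` flags shown above, and passes all files in one call as in the multi-file example. Treating the two flags as mutually exclusive is an assumption, since combining them would skip both steps. `build_process_command` and `process_directory` are hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_process_command(
    paths: Sequence[str], difficulty_only: bool = False, tags_only: bool = False
) -> List[str]:
    """Build an argument list for `transcript-tagger process`."""
    if difficulty_only and tags_only:
        # Assumption: asking for both "only" modes is a user error
        raise ValueError("choose at most one of difficulty_only and tags_only")
    cmd = ["transcript-tagger", "process"]
    if difficulty_only:
        cmd.append("--difficulty-only")
    if tags_only:
        cmd.append("--tags-only")
    cmd.extend(str(p) for p in paths)
    return cmd


def process_directory(directory: str, pattern: str = "*.txt") -> subprocess.CompletedProcess:
    """Run the CLI once over every matching file in a directory."""
    paths = sorted(str(p) for p in Path(directory).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} in {directory}")
    # check=True raises CalledProcessError if the CLI exits with a non-zero status
    return subprocess.run(build_process_command(paths), check=True)
```

Passing an argument list (rather than a shell string) to `subprocess.run` avoids quoting problems with file names that contain spaces.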
Advanced Usage
For more advanced usage examples, see the examples directory:
- basic_usage.py: Simple usage example
- advanced_usage.py: Advanced features including batch processing and custom configurations
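The following is not the contents of `advanced_usage.py`; it is a hedged sketch of one way to batch-process files through the Python API while continuing past individual failures. The per-file function is injected, so in real use you would pass `tagger.process_transcript` from the Basic Usage example; `process_batch` itself is a hypothetical helper.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def process_batch(
    paths: Iterable[str], process: Callable[[str], Dict[str, Any]]
) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, str]]:
    """Apply `process` to each path, collecting results and per-file errors."""
    results: Dict[str, Dict[str, Any]] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = process(path)
        except Exception as exc:  # keep going; record why this file failed
            errors[path] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

For example: `results, errors = process_batch(sorted(str(p) for p in Path("transcripts").glob("*.txt")), tagger.process_transcript)`, with `Path` imported from `pathlib` and `transcripts` standing in for your own folder.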
API Reference
Main Classes
- TranscriptTagger: Main class for tagging and analyzing transcripts
- Config: Configuration class for customizing tagger behavior
- DifficultyAnalyzer: Class for analyzing the difficulty level of text
Difficulty Levels
The toolkit defines 5 difficulty levels:
- 初级/Beginner: Basic vocabulary, simple sentences, suitable for beginners
- 初中级/Elementary: Slightly more complex vocabulary, suitable for early learners
- 中级/Intermediate: Moderate complexity, suitable for intermediate learners
- 中高级/Upper-Intermediate: More complex language, suitable for advanced learners
- 高级/Advanced: Complex vocabulary and sentence structures, suitable for proficient users
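The Difficulty Only example reports `difficulty_level` out of 5. The sketch below assumes level 1 is 初级/Beginner and level 5 is 高级/Advanced, following the order of the list above. `DifficultyLevel` and `describe` are hypothetical names for illustration; the library's own constants may differ.

```python
from enum import IntEnum


class DifficultyLevel(IntEnum):
    """Hypothetical enum mirroring the five documented levels (1 = easiest)."""
    BEGINNER = 1
    ELEMENTARY = 2
    INTERMEDIATE = 3
    UPPER_INTERMEDIATE = 4
    ADVANCED = 5


# Bilingual labels copied from the list above
LABELS = {
    DifficultyLevel.BEGINNER: ("初级", "Beginner"),
    DifficultyLevel.ELEMENTARY: ("初中级", "Elementary"),
    DifficultyLevel.INTERMEDIATE: ("中级", "Intermediate"),
    DifficultyLevel.UPPER_INTERMEDIATE: ("中高级", "Upper-Intermediate"),
    DifficultyLevel.ADVANCED: ("高级", "Advanced"),
}


def describe(level: int) -> str:
    """Format a numeric level as "中文/English (n/5)"."""
    lvl = DifficultyLevel(level)  # raises ValueError for anything outside 1-5
    zh, en = LABELS[lvl]
    return f"{zh}/{en} ({int(lvl)}/5)"
```

For example, `describe(3)` returns `中级/Intermediate (3/5)`.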
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file transcript-tagger-0.1.2.tar.gz.
File metadata
- Download URL: transcript-tagger-0.1.2.tar.gz
- Upload date:
- Size: 17.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc088f2d3bfcdc500f3e98328814ceb8f1da050aaead94bd30328ec051e1d5e3 |
| MD5 | ab45d283ab3a73a1d5a920795f1a191e |
| BLAKE2b-256 | 0bd588e55bd5d2b8dc4868c54d25d62dd320618cb3f2f28b317dd60b7b674619 |
File details
Details for the file transcript_tagger-0.1.2-py3-none-any.whl.
File metadata
- Download URL: transcript_tagger-0.1.2-py3-none-any.whl
- Upload date:
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47f84ceb38fb2ce5e19effae0a88b2d22a79401f530f2b5567a07044e70dd137 |
| MD5 | 7e5fbeab519426ec1767484fa9b418b6 |
| BLAKE2b-256 | 00f519614a4aaf6ec73c9c3cb74702c8239608c3bef871c81cb822b55c9d1dc1 |