Skip to main content

A multilingual voice transcription tool for Telugu and English

Project description

Dhvagna

A powerful multilingual voice transcription tool that supports both Telugu and English speech, powered by Google's Gemini AI. Dhvagna makes it easy to transcribe speech from both recordings and live microphone input, with support for customizable prompts and formatting.

🔑 API Key Required

To use Dhvagna, you need a valid API key with the following requirements:

  1. Valid API key format: All API keys must start with dk- followed by a unique alphanumeric string
  2. Limited usage: Each API key is limited to 100 transcriptions
  3. Verification: API keys are verified with our servers

How to obtain an API key:

Visit https://github.com/gnanesh-16/Dhwagna to register and receive your API key.

Setting up your API key:

# On Windows (Command Prompt)
set DHVGNA_API_KEY=dk-your_unique_key_here

# On Windows (PowerShell)
$env:DHVGNA_API_KEY="dk-your_unique_key_here"

# On Linux/MacOS
export DHVGNA_API_KEY="dk-your_unique_key_here"

Alternatively, you can create a .env file in your project directory:

DHVGNA_API_KEY=dk-your_unique_key_here

Usage tracking:

The package tracks usage against your API key quota. You'll receive notifications about:

  • Remaining transcriptions at startup
  • Updated count after each transcription
  • Errors when your limit is reached

🌟 Features

  • Multilingual Support: Transcribe both Telugu and English speech with high accuracy
  • Multiple Input Methods:
    • Live microphone recording with simple keyboard controls
    • Process existing WAV audio files
  • Smart Language Detection: Automatically detects whether the speech is in Telugu or English
  • Advanced Processing:
    • Raw transcription preserving original speech patterns
    • Refined output with improved grammar and formatting
    • Customizable prompts for different use cases
  • Flexible Output:
    • Customizable title formatting
    • Both raw and refined transcriptions
    • Automatic file saving with timestamps
  • Easy-to-Use Interface:
    • Simple keyboard controls ('K' to start/stop recording)
    • Progress indicators and timers
    • Beautiful console output with color coding
  • API Key Management:
    • Secure API key validation
    • Usage tracking with quota limits
    • Graceful error handling for exceeded quotas

📦 Installation

pip install dhvagna

🚀 Quick Start

Basic Usage

from dhvagna import record_audio, transcribe_wav_file

# Make sure you've set your DHVGNA_API_KEY environment variable first!
# The key must start with 'dk-' and be registered with our service

# Record and transcribe from microphone
record_audio()

# Or transcribe an existing WAV file
result = transcribe_wav_file("path/to/audio.wav")

📝 Examples

The package includes several example scripts to help you get started:

1. Basic Example

# basic_example.py
import os
from dhvagna import record_audio, transcribe_wav_file

def check_api_key():
    """Check if API key is properly set"""
    api_key = os.getenv('DHVGNA_API_KEY')
    if not api_key:
        print("\n❌ Error: DHVGNA_API_KEY environment variable not found!")
        print("\nTo set your API key:")
        print("1. Get your API key from: https://github.com/gnanesh-16/Dhwagna")
        print("\n2. Set it as an environment variable:")
        print("   Windows (Command Prompt):")
        print("   set DHVGNA_API_KEY=your_api_key_here")
        return False
    return True

def main():
    print("===== Dhvagna Basic Usage Example =====")
    
    # First, verify API key is set
    if not check_api_key():
        return
    
    # Present options to user
    print("\nWhat would you like to do?")
    print("1. Record audio from microphone")
    print("2. Transcribe a WAV file")
    
    choice = input("\nEnter your choice (1 or 2): ")
    
    if choice == "1":
        print("\nPress 'K' to start recording")
        print("Press 'K' again to stop recording")
        record_audio()
        
    elif choice == "2":
        file_path = input("\nEnter the path to your WAV file: ")
        if file_path.strip():
            result = transcribe_wav_file(file_path.strip())
            if result:
                original, refined, language = result
                print(f"\nTranscription successful!")
                print(f"Detected language: {language}")

if __name__ == "__main__":
    main()

2. Simple Example

# simple_example.py
import os
from dhvagna import record_audio, transcribe_wav_file, set_custom_prompts

def verify_api_key():
    """Make sure API key is set up correctly"""
    api_key = os.getenv('DHVGNA_API_KEY')
    if not api_key:
        print("\n❌ Error: DHVGNA_API_KEY not found!")
        print("\nGet your API key from:")
        print("https://github.com/gnanesh-16/Dhwagna")
        return False
    if not api_key.startswith('dk'):
        print("\n❌ Error: Invalid API key format!")
        print("Dhvagna API keys must start with 'dk'")
        return False
    return True

def main():
    print("===== Dhvagna Quick Start =====")
    
    # Check API key first
    if not verify_api_key():
        return
    
    # Set up a simple custom prompt (optional)
    custom_prompt = """
Transcribe this content in Telugu or English.
Identify the language as [LANGUAGE: Telugu] or [LANGUAGE: English].
Capture the speech exactly as spoken.
"""
    set_custom_prompts(new_transcription_prompt=custom_prompt)
    
    # Show options
    print("\nWhat would you like to do?")
    print("1. Record new audio")
    print("2. Transcribe WAV file")
    
    choice = input("\nChoice (1 or 2): ")
    
    if choice == "1":
        print("\nRecording:")
        print("1. Press 'K' to start")
        print("2. Speak your content")
        print("3. Press 'K' again to stop")
        record_audio()
    
    elif choice == "2":
        file_path = input("\nEnter WAV file path: ")
        if file_path.strip():
            print("\nTranscribing file...")
            transcribe_wav_file(file_path.strip())

if __name__ == "__main__":
    main()

3. Prompt Customization Example

# prompt_customization_example.py
import os
from dhvagna import (
    record_audio, 
    transcribe_wav_file, 
    set_custom_prompts,
    reset_prompts,
    DEFAULT_TRANSCRIPTION_PROMPT,
    DEFAULT_REFINEMENT_PROMPT_TEMPLATE,
    DEFAULT_TITLE_FORMAT
)

# View and customize different aspects of transcription
def main():
    print("===== Dhvagna Customization Example =====")
    
    # First, verify API key is set
    if not check_api_key():
        return
    
    # Let's customize for technical documentation
    technical_prompt = """
Transcribe this technical content. The speech may be in Telugu or English.
Identify the language and indicate it with [LANGUAGE: Telugu] or [LANGUAGE: English].
Focus on capturing:
- Code snippets and technical terms
- API references and documentation
- Software architecture descriptions
"""
    
    technical_template = """
Here is a technical transcription in {language}: "{text}"

Please refine this text while:
1. Maintaining all technical terms and code references exactly
2. Using proper technical writing style
3. Organizing content logically
"""
    
    technical_title = "💻 Technical {language} Documentation"
    
    # Set all custom options
    set_custom_prompts(
        new_transcription_prompt=technical_prompt,
        new_refinement_prompt_template=technical_template,
        new_title_format=technical_title
    )
    
    # Try the custom settings
    print("\nPress 'K' to start recording technical documentation")
    print("Press 'K' again to stop")
    record_audio()

if __name__ == "__main__":
    main()

4. Advanced Example

# advanced_example.py
import os
from rich.console import Console
from rich.panel import Panel
from dhvagna import (
    record_audio, 
    transcribe_wav_file, 
    set_custom_prompts,
    reset_prompts
)

# Create Rich console for beautiful output
console = Console()

class TranscriptionProfile:
    """Represents a complete transcription configuration profile"""
    def __init__(self, name, prompt, template, title_format, description):
        self.name = name
        self.prompt = prompt
        self.template = template
        self.title_format = title_format
        self.description = description
    
    def apply(self):
        """Apply this profile's settings"""
        set_custom_prompts(
            new_transcription_prompt=self.prompt,
            new_refinement_prompt_template=self.template,
            new_title_format=self.title_format
        )

# Define specialized transcription profiles
PROFILES = {
    "academic": TranscriptionProfile(
        name="Academic",
        prompt="""
Transcribe this academic content with high precision. The speech may be in Telugu or English.
Identify the language and indicate it with [LANGUAGE: Telugu] or [LANGUAGE: English].
Focus on capturing:
- Technical and academic terminology
- Research methodologies and findings
- Statistical data and measurements
""",
        template="""
Here is an academic transcription in {language}: "{text}"

Please refine this content while:
1. Maintaining academic rigor and technical accuracy
2. Preserving all citations and references
3. Structuring into proper academic sections
""",
        title_format="🎓 Academic {language} Transcript",
        description="Optimized for research presentations, lectures, and academic discussions"
    ),
    
    # Additional profiles defined for medical and legal transcription
    # ...
}

def main():
    console.print(Panel(
        "[bold cyan]Welcome to the Advanced Dhvagna Example![/]\n\n"
        "This example demonstrates all features including:\n"
        "• Specialized transcription profiles\n"
        "• Custom formatting options\n"
        "• Multiple use cases\n"
        "• Profile switching",
        title="🚀 Advanced Features Demo",
        border_style="green"
    ))
    
    # Apply profile and record
    profile = PROFILES["academic"]
    profile.apply()
    console.print(f"\n[green]✓[/] Applied [bold]{profile.name}[/] profile!")
    
    console.print("\nPress 'K' to start recording")
    console.print("Press 'K' again to stop")
    record_audio()

if __name__ == "__main__":
    main()

Each example is located in the dhvagna/examples/ directory and demonstrates different aspects of using the Dhvagna package.

🛠️ API Key Troubleshooting

If you encounter API key issues:

  1. Invalid key format: Ensure your key starts with dk-
  2. Verification failure: Check that your key is registered in our system
  3. Usage limit exceeded: Contact us to upgrade your plan if you need more than 100 transcriptions
  4. Server connection issues: The package will run in offline mode with limited functionality if it can't connect to our servers

🎯 Use Cases

  1. Academic Research

    • Transcribe research presentations
    • Document academic discussions
    • Record seminar content
  2. Medical Transcription

    • Patient consultations
    • Medical dictations
    • Clinical observations
  3. Legal Documentation

    • Court proceedings
    • Client interviews
    • Legal consultations
  4. General Purpose

    • Meeting minutes
    • Interviews
    • Personal notes

🛠️ Customization Options

1. Transcription Prompts

Customize how the initial transcription is processed:

set_custom_prompts(new_transcription_prompt="Your custom prompt here")

2. Refinement Templates

Control how the transcription is refined:

template = """
Refine this {language} text: "{text}"
Include your refinement instructions here.
"""
set_custom_prompts(new_refinement_prompt_template=template)

3. Title Formatting

Customize the display title:

set_custom_prompts(new_title_format="✨ Custom {language} Title")

📝 Requirements

  • Python 3.7+
  • Microphone (for recording features)
  • Internet connection (for AI processing and API key verification)
  • WAV file support
  • Valid Dhvagna API key

🔧 Technical Details

  • Uses Google's Gemini AI for advanced language processing
  • Supports WAV audio format
  • Real-time audio processing
  • Automatic language detection
  • Multi-threaded recording handling
  • Smart error recovery and fallback options
  • Secure API key validation and usage tracking

🌐 Links

📄 License

MIT License - See LICENSE file for details

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dhvagna-0.1.2-py3-none-any.whl (19.2 kB view details)

Uploaded Python 3

File details

Details for the file dhvagna-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: dhvagna-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 19.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.5

File hashes

Hashes for dhvagna-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4e957c52ed27bc7026a9140cad463debb147b236c5dbf3f7b5044c20bc40f90e
MD5 30b2d1a74812bdd20d7139b1a66ffcf8
BLAKE2b-256 468f491084b21f3b440abe6bc0dca9facc077cda8bca133fcc489e29a3d2d6d9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page