Skip to main content

A sophisticated desktop application for capturing and analyzing live captions with AI assistance

Project description

Live Caption Assistant

Overview

This project is a sophisticated desktop application designed to capture, process, and analyze live captions with integrated AI assistance capabilities. It's particularly useful for interview scenarios, meetings, and other situations requiring real-time caption analysis.

Key Features

1. Live Caption Capture

  • Real-time capture of Chrome's Live Caption window
  • Intelligent text deduplication and sentence processing
  • Automatic text formatting and display
class LiveCaptionCapture:
    def __init__(self, ui):
        self.ui = ui
        self.running = False
        self.last_text = ""
        self.seen_fragments = set()
        self.current_sentence = ""  # 添加这行来跟踪当前正在构建的句子

    def find_caption_window(self):
        try:
            caption_window = auto.WindowControl(searchDepth=1, ClassName='Chrome_WidgetWin_1', SubName='Live Caption')
            if caption_window.Exists(maxSearchSeconds=1):
                return caption_window
            return None
        except Exception as e:
            self.ui.status_var.set(f"查找窗口错误: {str(e)}")
            return None

    def get_caption_text(self, window):
        try:
            doc_control = window.DocumentControl()
            if doc_control.Exists():
                return doc_control.Name
            return None
        except Exception as e:
            self.ui.status_var.set(f"获取文本错误: {str(e)}")
            return None

    def process_text(self, new_text):
        if not new_text:
            return

        # 将文本按句子分割,但保持完整性
        sentences = new_text.split('. ')
        
        for i, sentence in enumerate(sentences):
            sentence = sentence.strip()
            if not sentence:
                continue

            # 如果不是最后一个句子,添加句号
            if i < len(sentences) - 1:
                sentence = sentence + "."

            # 如果是新的句子且不在已见集合中
            if sentence and sentence not in self.seen_fragments:
                self.seen_fragments.add(sentence)
                # 使用 after 方法在主线程中更新UI
                self.ui.root.after(0, self.ui.append_text, sentence)

        # 限制已见片段集合的大小
        if len(self.seen_fragments) > 100:
            self.seen_fragments.clear()
    def start_capture(self):
        self.running = True
        self.ui.root.after(0, self.ui.status_var.set, "开始捕获...")
        
        while self.running:
            try:
                window = self.find_caption_window()
                if window:
                    text = self.get_caption_text(window)
                    if text:
                        self.process_text(text)
                else:
                    time.sleep(3)
                    continue
                
                time.sleep(0.1)
                
            except Exception as e:
                self.ui.root.after(0, self.ui.status_var.set, f"发生错误: {str(e)}")
                time.sleep(1)

    def stop_capture(self):
        self.running = False
        self.ui.root.after(0, self.ui.status_var.set, "已停止捕获")

2. AI-Powered Analysis

  • Integration with multiple AI models including:
    • o1-all
    • o1-mini
    • gpt-4o
    • claude-3-5-sonnet
    • o1-preview
    • o1-pro-all
  • Contextual analysis of conversations
  • Bilingual response generation (Chinese/English)

3. User Interface

  • Modern Tkinter-based GUI with:
    • Split-pane layout
    • Real-time transcript display
    • Configurable input panels for JD, CV, and Notes
    • Response panel for AI analysis
class LiveCaptionUI:
    def __init__(self):
        self.root = tk.Tk()
        self.root.title("Live Caption Transcript")
        # 设置最小窗口大小
        self.root.minsize(800, 500)
        
        # 添加用于LLM调用的成员变量
        self.llm_queue = queue.Queue()
        self.button_states = {'ask': False}
        self.buttons = {}
        self.animation_count = 0
        
        # 创建主框架
        self.main_frame = ttk.Frame(self.root)
        self.main_frame.grid(row=0, column=0, sticky="nsew", padx=10, pady=10)
        
        # 配置root的grid权重
        self.root.grid_rowconfigure(0, weight=1)
        self.root.grid_columnconfigure(0, weight=1)
        
        # 创建垂直方向的PanedWindow作为主容器
        self.main_paned = ttk.PanedWindow(self.main_frame, orient=tk.VERTICAL)
        self.main_paned.grid(row=0, column=0, sticky="nsew")
        
        # 创建上部面板
        self.upper_frame = ttk.Frame(self.main_paned)
        self.main_paned.add(self.upper_frame, weight=2)  # 上部分配更多空间
        
        # 创建水平方向的PanedWindow
        self.paned_window = ttk.PanedWindow(self.upper_frame, orient=tk.HORIZONTAL)
        self.paned_window.pack(fill=tk.BOTH, expand=True)
        
        # 左侧面板 - 字幕显示
        self.left_frame = ttk.Frame(self.paned_window)
        self.paned_window.add(self.left_frame, weight=1)
        
        # 右侧面板
        self.right_frame = ttk.Frame(self.paned_window)
        self.paned_window.add(self.right_frame, weight=1)
        
        # 创建左侧文本显示区域
        self.text_area = scrolledtext.ScrolledText(
            self.left_frame, 
            wrap=tk.WORD,
            width=40,
            height=20,
            font=("Microsoft YaHei UI", 10)
        )
        self.text_area.pack(fill=tk.BOTH, expand=True)
        
        # 创建右侧文本框
        self.create_right_panels()
        
        # 创建下部答案面板
        self.lower_frame = ttk.Frame(self.main_paned)
        self.main_paned.add(self.lower_frame, weight=1)  # 下部分配较少空间
        
        # 创建答案文本框
        self.create_answer_panel()
        

4. Configuration Management

  • Flexible configuration system using INI format
  • Customizable shortcuts
  • Persistent settings storage
[GenAI]
model = o1-mini
openai_token = 
openai_token_url = 
openai_health_url = 
openai_mm_url = 
openai_chat_url = 
openai_user_name = 
openai_password = 
openai_application_id = 
openai_application_name = 
head_token_key = Authorization

[Prompts]
summarize_prompt = Summarize the current state of the meeting based on the following transcript, considering the meeting topic, goals, and background. Provide a concise overview of key points discussed and any decisions made. \n** Transcript** : {transcript}\n ** Meeting Topic **: {meeting_topic}\n** Meeting Goals:**  {meeting_goals}\n ** Background** : {background}\n ** Output  Language: **  {language}
viewpoints_prompt = Summarize each participant·s main points from the transcript , Highlight key ideas from key Stakeholders\n Transcript: {transcript}\n Meeting Topic: {meeting_topic}\n Meeting Goals: {meeting_goals}\n Key Stakeholders {key_stakeholders}\n Output Language: {language}
navigate_prompt = Based on the meeting topic, goals, transcript, and {user_name}·s stance, suggest the next statement for {user_name} should make to navigate the meeting effectively. Consider:communication skills, technical understanding, decision-making, leadership, strategic thinking, adaptability, and stakeholder management.\n  Transcript: {transcript}\n  Meeting Topic: {meeting_topic}\n  Meeting Goals: {meeting_goals}  \n Key Stakeholders: {key_stakeholders}  \n User Name: {user_name}\n Output Language: {language}\nNotes: {notes}
minutes_prompt = Convert the following transcript into a formal meeting minutes document, including key points, decisions, and action items etc.   please try to keep the output concise and to the point. try to compile the output in a way that is easy to read and understand, write in header + paragraphs rather than bullet points alone. Ensure clarity and structure align with standard meeting minutes format.\n Transcript: {transcript}\n Meeting Topic: {meeting_topic}\n Meeting Goals: {meeting_goals}\n Output Language: {language}

[Shortcuts]
hotkey_snip = <shift>+a+s
hotkey_paint = <ctrl>+p
hotkey_text = <ctrl>+t
hotkey_screenpen_toggle = <ctrl>+<cmd>+<alt>
hotkey_undo = <ctrl>+z
hotkey_redo = <ctrl>+y
hotkey_screenpen_exit = <esc>
hotkey_screenpen_clear_hide = <ctrl>+<esc>
hotkey_topmost_on = <esc>+`
hotkey_topmost_off = <cmd>+<shift>+\
hotkey_opacity_down = <left>+<right>+<down>
hotkey_opacity_up = <left>+<right>+<up>
hotkey_ask_dialog_key = <ctrl>
hotkey_ask_dialog_count = 4
hotkey_ask_dialog_time_window = 1.0

[Defaults]
duration = 30min
username = Jim
language = En
live_freq = 30
notification_showtime = 4
context = Please input meeting context...
agenda = Please input meeting agenda/target...
topics = Please input meeting topics...
stakeholders = Please input key stakeholders...
notes = Please input meeting notes...
default_jd = 1111111
default_cv = 22222
default_notes = NA

Technical Architecture

Core Components

  1. LiveCaptionUI

    • Main application window handler
    • Manages UI components and event loops
    • Handles user interactions and display updates
  2. LiveCaptionCapture

    • Manages caption window detection and text extraction
    • Implements intelligent text processing
    • Handles COM initialization for Windows UI Automation
  3. ConfigWindow

    • Configuration management interface
    • Model selection and default text management
    • Settings persistence
  4. GPT Integration

    • Custom API client for AI model interaction
    • Support for multiple model endpoints
    • Robust error handling and response processing

Setup and Dependencies

Required Packages

uiautomation
tkinter
pyperclip
pythoncom
difflib

Configuration

The application requires a config.ini file with the following sections:

  • GenAI: AI model configuration
  • Prompts: System prompt templates
  • Shortcuts: Keyboard shortcut definitions
  • Defaults: Default values and settings

Usage

  1. Starting the Application
from capture import LiveCaptionUI

app = LiveCaptionUI()
app.run()
  1. Basic Operations
  • Start/Stop Capture: Toggles live caption capture
  • Clear: Clears all text fields
  • Copy: Copies all content to clipboard
  • Save: Saves transcript and analysis to file
  • Ask: Triggers AI analysis of current content
  1. Configuration
  • Access through the Config button
  • Select AI model
  • Configure default texts for JD, CV, and Notes
  • Save settings for persistence

Technical Details

Text Processing Algorithm

The application uses a sophisticated text processing system that:

  • Removes duplicates using sequence matching
  • Maintains sentence integrity
  • Handles partial updates
    def process_text(self, new_text):
        if not new_text:
            return

        # 将文本按句子分割,但保持完整性
        sentences = new_text.split('. ')
        
        for i, sentence in enumerate(sentences):
            sentence = sentence.strip()
            if not sentence:
                continue

            # 如果不是最后一个句子,添加句号
            if i < len(sentences) - 1:
                sentence = sentence + "."

            # 如果是新的句子且不在已见集合中
            if sentence and sentence not in self.seen_fragments:
                self.seen_fragments.add(sentence)
                # 使用 after 方法在主线程中更新UI
                self.ui.root.after(0, self.ui.append_text, sentence)

AI Integration

The system implements a robust AI communication layer that:

  • Handles authentication
  • Manages API endpoints
  • Processes responses
def ask(msgs):
    # 检查OPENAI_TOKEN是否已经存在

    print("~"*100)
    print(msgs)
    print("~"*100)
    
    _token = ""
    
    if OPENAI_TOKEN and OPENAI_TOKEN.strip():  # 优先从环境变量中取token
        _token = "Bearer " + OPENAI_TOKEN
    else:
        # 如果没有找到环境变量中的token,尝试通过get_token获取
        _token = get_token()
    resp = ask_with_msgs(_token, msgs)
    return resp

Development Notes

Threading Model

  • Main UI thread for interface operations
  • Separate capture thread for COM operations
  • Queue-based communication between threads
class CaptureThread(threading.Thread):
    def __init__(self, capturer):
        super().__init__()
        self.capturer = capturer
        self.daemon = True

    def run(self):
        try:
            # ���线程中初始化COM
            pythoncom.CoInitialize()
            # 初始化UI Automation
            auto.InitializeUIAutomationInCurrentThread()
            # 动捕获
            self.capturer.start_capture()
        finally:
            # 清理COM
            pythoncom.CoUninitialize()

Error Handling

  • Robust exception handling for UI operations
  • Graceful degradation for API failures
  • User feedback through status bar updates

Future Enhancements

  1. Support for additional AI models
  2. Enhanced text processing algorithms
  3. Multiple language support
  4. Advanced configuration options
  5. Plugin system for extensibility

Contributing

Contributions are welcome! Please ensure:

  1. Code follows existing style
  2. New features include appropriate tests
  3. Documentation is updated
  4. Pull requests include description of changes

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tiktalk-0.1.0.tar.gz (37.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tiktalk-0.1.0-py3-none-any.whl (34.4 kB view details)

Uploaded Python 3

File details

Details for the file tiktalk-0.1.0.tar.gz.

File metadata

  • Download URL: tiktalk-0.1.0.tar.gz
  • Upload date:
  • Size: 37.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.9

File hashes

Hashes for tiktalk-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5f978687d3a949211a93cb1b626b7e363eec490b6c466fcb7b5a6e08123f55e9
MD5 2a466ea7a4b98e939cc56d3b64f7e27a
BLAKE2b-256 e7b547f030366b3143aa85c0e6d56903c70762ffe9301f7d13a483d749054cb2

See more details on using hashes here.

File details

Details for the file tiktalk-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tiktalk-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 34.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.9

File hashes

Hashes for tiktalk-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 33f909c92ded7191760ab1cf50f2cbdd95eadf0403bf3a7f4e00575e54cc9706
MD5 562226cbed3fece5ee8f399d441a1af7
BLAKE2b-256 c939480a089b9e0ddeba5241ccd8df984d9a348f45777eb8a99992429291f4d9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page