A sophisticated desktop application for capturing and analyzing live captions with AI assistance

These details have not been verified by PyPI

Project links

Homepage

Project description

Live Caption Assistant

Overview

This project is a sophisticated desktop application designed to capture, process, and analyze live captions with integrated AI assistance capabilities. It's particularly useful for interview scenarios, meetings, and other situations requiring real-time caption analysis.

Key Features

1. Live Caption Capture

Real-time capture of Chrome's Live Caption window
Intelligent text deduplication and sentence processing
Automatic text formatting and display

class LiveCaptionCapture:
    def __init__(self, ui):
        self.ui = ui
        self.running = False
        self.last_text = ""
        self.seen_fragments = set()
        self.current_sentence = ""  # 添加这行来跟踪当前正在构建的句子

    def find_caption_window(self):
        try:
            caption_window = auto.WindowControl(searchDepth=1, ClassName='Chrome_WidgetWin_1', SubName='Live Caption')
            if caption_window.Exists(maxSearchSeconds=1):
                return caption_window
            return None
        except Exception as e:
            self.ui.status_var.set(f"查找窗口错误: {str(e)}")
            return None

    def get_caption_text(self, window):
        try:
            doc_control = window.DocumentControl()
            if doc_control.Exists():
                return doc_control.Name
            return None
        except Exception as e:
            self.ui.status_var.set(f"获取文本错误: {str(e)}")
            return None

    def process_text(self, new_text):
        if not new_text:
            return

        # 将文本按句子分割，但保持完整性
        sentences = new_text.split('. ')
        
        for i, sentence in enumerate(sentences):
            sentence = sentence.strip()
            if not sentence:
                continue

            # 如果不是最后一个句子，添加句号
            if i < len(sentences) - 1:
                sentence = sentence + "."

            # 如果是新的句子且不在已见集合中
            if sentence and sentence not in self.seen_fragments:
                self.seen_fragments.add(sentence)
                # 使用 after 方法在主线程中更新UI
                self.ui.root.after(0, self.ui.append_text, sentence)

        # 限制已见片段集合的大小
        if len(self.seen_fragments) > 100:
            self.seen_fragments.clear()
    def start_capture(self):
        self.running = True
        self.ui.root.after(0, self.ui.status_var.set, "开始捕获...")
        
        while self.running:
            try:
                window = self.find_caption_window()
                if window:
                    text = self.get_caption_text(window)
                    if text:
                        self.process_text(text)
                else:
                    time.sleep(3)
                    continue
                
                time.sleep(0.1)
                
            except Exception as e:
                self.ui.root.after(0, self.ui.status_var.set, f"发生错误: {str(e)}")
                time.sleep(1)

    def stop_capture(self):
        self.running = False
        self.ui.root.after(0, self.ui.status_var.set, "已停止捕获")

2. AI-Powered Analysis

Integration with multiple AI models including:
- o1-all
- o1-mini
- gpt-4o
- claude-3-5-sonnet
- o1-preview
- o1-pro-all
Contextual analysis of conversations
Bilingual response generation (Chinese/English)

3. User Interface

Modern Tkinter-based GUI with:
- Split-pane layout
- Real-time transcript display
- Configurable input panels for JD, CV, and Notes
- Response panel for AI analysis

class LiveCaptionUI:
    def __init__(self):
        self.root = tk.Tk()
        self.root.title("Live Caption Transcript")
        # 设置最小窗口大小
        self.root.minsize(800, 500)
        
        # 添加用于LLM调用的成员变量
        self.llm_queue = queue.Queue()
        self.button_states = {'ask': False}
        self.buttons = {}
        self.animation_count = 0
        
        # 创建主框架
        self.main_frame = ttk.Frame(self.root)
        self.main_frame.grid(row=0, column=0, sticky="nsew", padx=10, pady=10)
        
        # 配置root的grid权重
        self.root.grid_rowconfigure(0, weight=1)
        self.root.grid_columnconfigure(0, weight=1)
        
        # 创建垂直方向的PanedWindow作为主容器
        self.main_paned = ttk.PanedWindow(self.main_frame, orient=tk.VERTICAL)
        self.main_paned.grid(row=0, column=0, sticky="nsew")
        
        # 创建上部面板
        self.upper_frame = ttk.Frame(self.main_paned)
        self.main_paned.add(self.upper_frame, weight=2)  # 上部分配更多空间
        
        # 创建水平方向的PanedWindow
        self.paned_window = ttk.PanedWindow(self.upper_frame, orient=tk.HORIZONTAL)
        self.paned_window.pack(fill=tk.BOTH, expand=True)
        
        # 左侧面板 - 字幕显示
        self.left_frame = ttk.Frame(self.paned_window)
        self.paned_window.add(self.left_frame, weight=1)
        
        # 右侧面板
        self.right_frame = ttk.Frame(self.paned_window)
        self.paned_window.add(self.right_frame, weight=1)
        
        # 创建左侧文本显示区域
        self.text_area = scrolledtext.ScrolledText(
            self.left_frame, 
            wrap=tk.WORD,
            width=40,
            height=20,
            font=("Microsoft YaHei UI", 10)
        )
        self.text_area.pack(fill=tk.BOTH, expand=True)
        
        # 创建右侧文本框
        self.create_right_panels()
        
        # 创建下部答案面板
        self.lower_frame = ttk.Frame(self.main_paned)
        self.main_paned.add(self.lower_frame, weight=1)  # 下部分配较少空间
        
        # 创建答案文本框
        self.create_answer_panel()

4. Configuration Management

Flexible configuration system using INI format
Customizable shortcuts
Persistent settings storage

[GenAI]
model = o1-mini
openai_token = 
openai_token_url = 
openai_health_url = 
openai_mm_url = 
openai_chat_url = 
openai_user_name = 
openai_password = 
openai_application_id = 
openai_application_name = 
head_token_key = Authorization

[Prompts]
summarize_prompt = Summarize the current state of the meeting based on the following transcript, considering the meeting topic, goals, and background. Provide a concise overview of key points discussed and any decisions made. \n** Transcript** : {transcript}\n ** Meeting Topic **: {meeting_topic}\n** Meeting Goals:**  {meeting_goals}\n ** Background** : {background}\n ** Output  Language: **  {language}
viewpoints_prompt = Summarize each participant·s main points from the transcript , Highlight key ideas from key Stakeholders\n Transcript: {transcript}\n Meeting Topic: {meeting_topic}\n Meeting Goals: {meeting_goals}\n Key Stakeholders {key_stakeholders}\n Output Language: {language}
navigate_prompt = Based on the meeting topic, goals, transcript, and {user_name}·s stance, suggest the next statement for {user_name} should make to navigate the meeting effectively. Consider:communication skills, technical understanding, decision-making, leadership, strategic thinking, adaptability, and stakeholder management.\n  Transcript: {transcript}\n  Meeting Topic: {meeting_topic}\n  Meeting Goals: {meeting_goals}  \n Key Stakeholders: {key_stakeholders}  \n User Name: {user_name}\n Output Language: {language}\nNotes: {notes}
minutes_prompt = Convert the following transcript into a formal meeting minutes document, including key points, decisions, and action items etc.   please try to keep the output concise and to the point. try to compile the output in a way that is easy to read and understand， write in header + paragraphs rather than bullet points alone. Ensure clarity and structure align with standard meeting minutes format.\n Transcript: {transcript}\n Meeting Topic: {meeting_topic}\n Meeting Goals: {meeting_goals}\n Output Language: {language}

[Shortcuts]
hotkey_snip = <shift>+a+s
hotkey_paint = <ctrl>+p
hotkey_text = <ctrl>+t
hotkey_screenpen_toggle = <ctrl>+<cmd>+<alt>
hotkey_undo = <ctrl>+z
hotkey_redo = <ctrl>+y
hotkey_screenpen_exit = <esc>
hotkey_screenpen_clear_hide = <ctrl>+<esc>
hotkey_topmost_on = <esc>+`
hotkey_topmost_off = <cmd>+<shift>+\
hotkey_opacity_down = <left>+<right>+<down>
hotkey_opacity_up = <left>+<right>+<up>
hotkey_ask_dialog_key = <ctrl>
hotkey_ask_dialog_count = 4
hotkey_ask_dialog_time_window = 1.0

[Defaults]
duration = 30min
username = Jim
language = En
live_freq = 30
notification_showtime = 4
context = Please input meeting context...
agenda = Please input meeting agenda/target...
topics = Please input meeting topics...
stakeholders = Please input key stakeholders...
notes = Please input meeting notes...
default_jd = 1111111
default_cv = 22222
default_notes = NA

Technical Architecture

Core Components

LiveCaptionUI
- Main application window handler
- Manages UI components and event loops
- Handles user interactions and display updates
LiveCaptionCapture
- Manages caption window detection and text extraction
- Implements intelligent text processing
- Handles COM initialization for Windows UI Automation
ConfigWindow
- Configuration management interface
- Model selection and default text management
- Settings persistence
GPT Integration
- Custom API client for AI model interaction
- Support for multiple model endpoints
- Robust error handling and response processing

Setup and Dependencies

Required Packages

uiautomation
tkinter
pyperclip
pythoncom
difflib

Configuration

The application requires a config.ini file with the following sections:

GenAI: AI model configuration
Prompts: System prompt templates
Shortcuts: Keyboard shortcut definitions
Defaults: Default values and settings

Usage

Starting the Application

from capture import LiveCaptionUI

app = LiveCaptionUI()
app.run()

Basic Operations

Start/Stop Capture: Toggles live caption capture
Clear: Clears all text fields
Copy: Copies all content to clipboard
Save: Saves transcript and analysis to file
Ask: Triggers AI analysis of current content

Configuration

Access through the Config button
Select AI model
Configure default texts for JD, CV, and Notes
Save settings for persistence

Technical Details

Text Processing Algorithm

The application uses a sophisticated text processing system that:

Removes duplicates using sequence matching
Maintains sentence integrity
Handles partial updates

    def process_text(self, new_text):
        if not new_text:
            return

        # 将文本按句子分割，但保持完整性
        sentences = new_text.split('. ')
        
        for i, sentence in enumerate(sentences):
            sentence = sentence.strip()
            if not sentence:
                continue

            # 如果不是最后一个句子，添加句号
            if i < len(sentences) - 1:
                sentence = sentence + "."

            # 如果是新的句子且不在已见集合中
            if sentence and sentence not in self.seen_fragments:
                self.seen_fragments.add(sentence)
                # 使用 after 方法在主线程中更新UI
                self.ui.root.after(0, self.ui.append_text, sentence)

AI Integration

The system implements a robust AI communication layer that:

Handles authentication
Manages API endpoints
Processes responses

def ask(msgs):
    # 检查OPENAI_TOKEN是否已经存在

    print("~"*100)
    print(msgs)
    print("~"*100)
    
    _token = ""
    
    if OPENAI_TOKEN and OPENAI_TOKEN.strip():  # 优先从环境变量中取token
        _token = "Bearer " + OPENAI_TOKEN
    else:
        # 如果没有找到环境变量中的token，尝试通过get_token获取
        _token = get_token()
    resp = ask_with_msgs(_token, msgs)
    return resp

Development Notes

Threading Model

Main UI thread for interface operations
Separate capture thread for COM operations
Queue-based communication between threads

class CaptureThread(threading.Thread):
    def __init__(self, capturer):
        super().__init__()
        self.capturer = capturer
        self.daemon = True

    def run(self):
        try:
            # ���线程中初始化COM
            pythoncom.CoInitialize()
            # 初始化UI Automation
            auto.InitializeUIAutomationInCurrentThread()
            # 动捕获
            self.capturer.start_capture()
        finally:
            # 清理COM
            pythoncom.CoUninitialize()

Error Handling

Robust exception handling for UI operations
Graceful degradation for API failures
User feedback through status bar updates

Future Enhancements

Support for additional AI models
Enhanced text processing algorithms
Multiple language support
Advanced configuration options
Plugin system for extensibility

Contributing

Contributions are welcome! Please ensure:

Code follows existing style
New features include appropriate tests
Documentation is updated
Pull requests include description of changes

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

0.1.1

Jan 5, 2025

This version

0.1.0

Jan 5, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tiktalk-0.1.0.tar.gz (37.0 kB view details)

Uploaded Jan 5, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

tiktalk-0.1.0-py3-none-any.whl (34.4 kB view details)

Uploaded Jan 5, 2025 Python 3

File details

Details for the file tiktalk-0.1.0.tar.gz.

File metadata

Download URL: tiktalk-0.1.0.tar.gz
Upload date: Jan 5, 2025
Size: 37.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.11.9

File hashes

Hashes for tiktalk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5f978687d3a949211a93cb1b626b7e363eec490b6c466fcb7b5a6e08123f55e9`
MD5	`2a466ea7a4b98e939cc56d3b64f7e27a`
BLAKE2b-256	`e7b547f030366b3143aa85c0e6d56903c70762ffe9301f7d13a483d749054cb2`

See more details on using hashes here.

File details

Details for the file tiktalk-0.1.0-py3-none-any.whl.

File metadata

Download URL: tiktalk-0.1.0-py3-none-any.whl
Upload date: Jan 5, 2025
Size: 34.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.11.9

File hashes

Hashes for tiktalk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`33f909c92ded7191760ab1cf50f2cbdd95eadf0403bf3a7f4e00575e54cc9706`
MD5	`562226cbed3fece5ee8f399d441a1af7`
BLAKE2b-256	`c939480a089b9e0ddeba5241ccd8df984d9a348f45777eb8a99992429291f4d9`

See more details on using hashes here.

tiktalk 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Live Caption Assistant

Overview

Key Features

1. Live Caption Capture

2. AI-Powered Analysis

3. User Interface

4. Configuration Management

Technical Architecture

Core Components

Setup and Dependencies

Required Packages

Configuration

Usage

Technical Details

Text Processing Algorithm

AI Integration

Development Notes

Threading Model

Error Handling

Future Enhancements

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes