A voice recording and transcription application
Voice Notes
A sophisticated audio recording, playback, and transcription application built with PyQt6, featuring modern glassmorphism design, AI-powered transcription, and comprehensive note-taking capabilities.
Main application interface showing the horizontal layout with Media tabs (Player, Record, Transcribe) on the left and Notes panel on the right.
Project Structure
voice-notes/
├── main.py # Application entry point (for development)
├── voice_notes/ # Main package
│ ├── __init__.py
│ ├── app.py # Application core and initialization
│ ├── core/ # Business logic and services
│ │ ├── __init__.py
│ │ ├── config.py # Application constants and settings
│ │ ├── audio_devices.py # Audio device enumeration and management
│ │ ├── audio_library.py # Audio file operations and library management
│ │ ├── notes_manager.py # Notes storage, retrieval, and time anchoring
│ │ ├── recorder.py # Audio recording thread with level monitoring
│ │ ├── transcribe.py # Transcription service integration
│ │ └── transcription.py # Whisper transcription processing
│ ├── ui/ # User interface layer
│ │ ├── __init__.py
│ │ └── main_window.py # Main application window and layout
│ ├── widgets/ # Reusable UI components
│ │ ├── __init__.py
│ │ ├── library_widget.py # File browser, search, and import functionality
│ │ ├── media_widget.py # Tabbed interface for Player, Recorder, and Transcription
│ │ ├── notes_widget.py # Notes editing, export/import, and global actions
│ │ ├── player_widget.py # Audio playback with word-level highlighting
│ │ ├── recorder_widget.py # Recording controls and device selection
│ │ └── transcription_widget.py # Transcription model selection and processing
│ ├── styles/ # Styling and theming
│ │ ├── __init__.py
│ │ └── theme.py # Comprehensive glassmorphism QSS theme
│ └── utils/ # Utility functions and helpers
│ ├── __init__.py
│ ├── helpers.py # Time formatting and text utilities
│ └── platform.py # Platform-specific effects and optimizations
├── recordings/ # Audio recordings (auto-created)
├── outputs/ # Notes and transcription files (auto-created)
├── session_media/ # Temporary session files
├── requirements.txt # Python dependencies
├── pyproject.toml # Package configuration
├── README.md # This file
├── LICENSE # MIT License
├── .gitignore # Git ignore rules
└── screenshots/ # Application screenshots (optional)
├── app-overview.png # Main application interface
└── transcription-highlighting.png # AI transcription with word highlighting
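The tree marks `recordings/` and `outputs/` as auto-created. Below is a minimal sketch of that step using `pathlib`. It assumes both folders live under one base directory. Where the app actually places them, and the helper name `ensure_data_dirs`, are assumptions and are not taken from `config.py`.

```python
from pathlib import Path
from typing import Dict

# Folder names come from the project tree; the constant name is hypothetical.
DATA_DIRS = ("recordings", "outputs")


def ensure_data_dirs(base: Path) -> Dict[str, Path]:
    """Create the app's data folders under base if missing; return their paths."""
    paths = {}
    for name in DATA_DIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths[name] = path
    return paths
```

Calling it on every launch is safe because `exist_ok=True` makes the call idempotent.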
Features
- 🎨 Modern Glassmorphism UI - Beautiful translucent design with soft gradients and shadows
- 🎵 Advanced Audio Recording - Record from any input device with real-time level monitoring
- ▶️ Intelligent Audio Playback - Full-featured player with speed control, seeking, and word-level highlighting
- 📝 Smart Note-Taking - Time-anchored notes with automatic word-level timing synchronization
- 🤖 AI-Powered Transcription - Multiple Whisper model sizes with real-time word highlighting during playback
- 📚 Library Management - Search, organize, and delete recordings in an intuitive file browser
- 🔄 Tabbed Media Interface - Combined Player, Recorder, and Transcription in organized tabs
- 💾 Export/Import - Save and load notes in multiple formats
- 🎯 Word-Level Highlighting - Visual feedback showing current word position during audio playback
- 🖥️ Cross-Platform - Native support for macOS, Windows, and Linux
- ⚡ Fast Transcription - Optimized Whisper integration with faster-whisper for quick processing
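Word-level highlighting comes down to finding the word whose time span contains the current playback position. The sketch below uses a binary search for this. It assumes the transcript is stored as sorted `(start, end, word)` tuples in seconds. That layout is hypothetical and is not necessarily what `player_widget.py` uses.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical layout: (start_seconds, end_seconds, word), sorted by start time.
Word = Tuple[float, float, str]


def current_word_index(words: List[Word], position_s: float) -> Optional[int]:
    """Return the index of the word being spoken at position_s, or None."""
    starts = [start for start, _end, _text in words]
    # Index of the last word whose start time is <= position_s.
    i = bisect_right(starts, position_s) - 1
    if i >= 0 and position_s < words[i][1]:
        return i
    return None  # before the first word, or in a pause between words


words = [(0.0, 0.4, "Hello"), (0.5, 0.9, "voice"), (1.0, 1.5, "notes")]
print(current_word_index(words, 0.7))   # 1
print(current_word_index(words, 0.95))  # None (gap between words)
```

A real player would build the `starts` list once per transcript rather than on every timer tick.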
Installation
Prerequisites
- Python 3.9 or higher
- pip package manager
Install from PyPI (Recommended)
pip install voice-notes
Then run:
voice-notes
Install from Source
Clone the repository:
git clone https://github.com/zangjiucheng/Voice-Notes.git
cd Voice-Notes
Install dependencies:
pip install -r requirements.txt
Or install manually:
# Core UI framework
pip install PyQt6 PyQt6-Qt6 PyQt6-sip
# Audio processing
pip install sounddevice soundfile pydub numpy
# AI transcription (choose one)
pip install faster-whisper # Recommended: faster and more efficient
# OR
pip install openai-whisper # Alternative: original OpenAI implementation
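Either Whisper package may be installed, so code that loads a model has to detect which one is available. Here is a minimal standard-library sketch. It checks faster-whisper first, following the recommendation above. The function name and return values are hypothetical and are not the app's actual API. Note that the import names differ from the pip names: faster-whisper imports as `faster_whisper`, and openai-whisper imports as `whisper`.

```python
import importlib.util
from typing import Optional


def detect_whisper_backend() -> Optional[str]:
    """Return the import name of the first installed Whisper backend, or None."""
    # Preference order: faster-whisper first, then the original openai-whisper.
    for module_name in ("faster_whisper", "whisper"):
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None


backend = detect_whisper_backend()
if backend is None:
    print("No Whisper backend found; run: pip install faster-whisper")
else:
    print(f"Using backend: {backend}")
```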
Optional: macOS Enhancements
pip install pyobjc-framework-Cocoa # For native macOS vibrancy effects
Usage
Getting Started
Launch the application:
voice-notes
Or from source:
python main.py
Interface Overview
The application features a horizontal layout with two main panels:
- Left Panel (Media): Tabbed interface containing:
  - Player Tab: Audio playback with word-level highlighting
  - Record Tab: Audio recording with device selection
  - Transcribe Tab: AI transcription with model selection
- Right Panel (Notes): Note-taking interface with:
  - Rich text editor for notes
  - Export/import functionality
  - Global action buttons (Clear All, Export, Import)
Keyboard Shortcuts
- Space - Play/Pause audio
- Ctrl+N (or Cmd+N on macOS) - Start new recording
- Ctrl+V - Import audio files
- Delete - Remove selected recording (with confirmation)
Workflow
- Record Audio: Switch to Record tab, select input device, and start recording
- Transcribe: Switch to Transcribe tab, select Whisper model, and process audio
- Playback: Switch to Player tab to play audio with synchronized word highlighting
- Take Notes: Use the Notes panel to add time-anchored notes during playback
- Export: Save your notes and transcriptions for later use
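Highlighting during playback requires per-word start and end times from the Transcribe step. In faster-whisper, passing `word_timestamps=True` to `WhisperModel.transcribe` attaches a `words` list to each segment. The sketch below assumes these are flattened into the same `(start, end, word)` tuples used for highlighting. The helper names, the model size, and the CPU/int8 settings are illustrative choices, not taken from `transcription.py`.

```python
from typing import Iterable, List, Tuple


def flatten_words(segments: Iterable) -> List[Tuple[float, float, str]]:
    """Collect (start, end, word) tuples from segments that carry word timings."""
    words = []
    for segment in segments:
        # segment.words is None when word timestamps were not requested.
        for w in segment.words or []:
            words.append((w.start, w.end, w.word.strip()))
    return words


def transcribe_words(audio_path: str, model_size: str = "base"):
    """Transcribe audio_path with faster-whisper and return word timings."""
    from faster_whisper import WhisperModel  # optional dependency, imported lazily

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus transcription info.
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    return flatten_words(segments)
```

Because the segments are produced lazily, the transcription itself runs while `flatten_words` iterates. In a GUI app, this is the part to keep off the UI thread.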
Architecture Benefits
Advanced Modular Design
- Separation of Concerns: Business logic, UI, and utilities are cleanly separated
- Component-Based: Reusable widgets with clear interfaces and responsibilities
- Service-Oriented: Core services (transcription, audio, notes) are independent modules
- Centralized Theming: All styling managed through a single QSS theme file
Enhanced User Experience
- Tabbed Media Interface: Organized workflow with Player, Recorder, and Transcription tabs
- Word-Level Highlighting: Visual feedback during audio playback for better synchronization
- Horizontal Layout: Efficient use of screen space with side-by-side panels
- Glassmorphism Design: Modern, translucent UI with consistent theming across all dialogs
Developer Experience
- Maintainable Codebase: Clear structure with logical file organization
- Easy Testing: Modular components can be tested independently
- Extensible Architecture: Simple to add new features or modify existing ones
- Type Hints: Python type annotations for better code documentation
- Cross-Platform: Platform-specific optimizations and effects
Performance & Reliability
- Optimized Transcription: Faster-whisper integration for quick AI processing
- Efficient Audio Handling: Real-time level monitoring and device management
- Robust File Management: Comprehensive library with search and cleanup
- Error Handling: Proper exception handling and user feedback
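The README does not say how the recorder computes its real-time level. A common approach is shown in this minimal sketch: the RMS of each audio block, expressed in dBFS (0 dB = full scale). The function name and the -120 dB silence floor are assumptions.

```python
import math

import numpy as np

SILENCE_FLOOR_DB = -120.0  # hypothetical floor so silence doesn't yield -inf


def level_dbfs(block) -> float:
    """Return the RMS level of a float audio block in dBFS (0 dB = full scale)."""
    samples = np.asarray(block, dtype=np.float64)
    if samples.size == 0:
        return SILENCE_FLOOR_DB
    rms = float(np.sqrt(np.mean(np.square(samples))))
    if rms <= 0.0:
        return SILENCE_FLOOR_DB
    return max(20.0 * math.log10(rms), SILENCE_FLOOR_DB)
```

In a sounddevice `InputStream` callback, `callback(indata, frames, time, status)`, `indata` is a float32 NumPy array by default. Calling `level_dbfs(indata)` on each block and passing the result to the UI thread (for example through a Qt signal) would drive a level meter. A full-scale sine wave reads about -3 dB.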
Development Status
This application has evolved from a single-file script into a professional, modular PyQt6 application with:
- ✅ Complete modular architecture with 20+ organized files
- ✅ Advanced glassmorphism UI with consistent theming
- ✅ AI-powered transcription with multiple model support
- ✅ Word-level highlighting during audio playback
- ✅ Comprehensive audio library management
- ✅ Cross-platform compatibility
- ✅ Export/import functionality for notes
- ✅ Horizontal layout optimization
- ✅ Centralized QSS theming for all components
The codebase follows modern Python practices with proper package structure, type hints, and comprehensive documentation.
Recent Enhancements
v2.0 Features
- Tabbed Media Widget: Combined Player, Recorder, and Transcription in organized tabs
- Word-Level Highlighting: Yellow background highlighting of current words during playback
- Enhanced UI Layout: Horizontal arrangement for better space utilization
- Separate Transcription Widget: Dedicated transcription interface with model selection
- Improved Dialog Styling: Glassmorphism theming for all popup dialogs
- Advanced Library Management: Delete functionality with confirmation dialogs
- Global Action Buttons: Clear All, Export, and Import functionality in Notes panel
- Optimized Performance: Faster-whisper integration for improved transcription speed
- Comprehensive Theming: Centralized QSS with objectName-specific selectors
Technical Improvements
- Modular Architecture: 20+ files organized in logical packages
- Enhanced Error Handling: Better user feedback and exception management
- Platform Optimizations: macOS-specific effects and cross-platform compatibility
- Code Quality: Type hints, documentation, and clean separation of concerns
- UI Consistency: Matching backgrounds, rounded corners, and unified styling
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
- Fork the repository
- Create a virtual environment: python -m venv venv
- Activate the environment: source venv/bin/activate (Linux/macOS) or venv\Scripts\activate (Windows)
- Install dependencies: pip install -r requirements.txt
- Install in development mode: pip install -e .
- Run the application: voice-notes
Code Style
- Follow PEP 8 guidelines
- Use type hints for function parameters and return values
- Add docstrings to classes and methods
- Keep functions focused on single responsibilities
Module Overview
Core Modules
- config.py - Application constants, file paths, UI settings, and configuration
- audio_library.py - Audio file operations, library management, and file cleanup
- notes_manager.py - Notes storage, retrieval, time anchoring, and export/import
- recorder.py - Audio recording thread with real-time level monitoring
- transcription.py - Whisper integration and transcription processing
- transcribe.py - Transcription service wrapper and model management
- audio_devices.py - Audio device enumeration, selection, and validation
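According to the module list, `notes_manager.py` handles time anchoring, and the feature list mentions automatic word-level timing synchronization. The sketch below shows one plausible reading of that behavior, which is an assumption. Each note stores a playback time. That time is snapped back to the most recent word start in the transcript, and notes are kept sorted by time. The class and function names are hypothetical.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass(order=True)
class Note:
    time_s: float                     # playback position the note is anchored to
    text: str = field(compare=False)  # ordering uses the anchor time only


def snap_to_word_start(position_s: float, word_starts: Sequence[float]) -> float:
    """Move position_s back to the most recent word start at or before it."""
    i = bisect_right(word_starts, position_s) - 1
    return word_starts[i] if i >= 0 else position_s


class NoteList:
    """Keeps notes sorted by anchor time."""

    def __init__(self) -> None:
        self.notes: List[Note] = []

    def add(self, position_s: float, text: str,
            word_starts: Sequence[float] = ()) -> Note:
        note = Note(snap_to_word_start(position_s, word_starts), text)
        insort(self.notes, note)  # binary insertion keeps the list sorted
        return note
```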
UI Components
- main_window.py - Main application window with horizontal layout and splitter
- media_widget.py - Tabbed container for Player, Recorder, and Transcription widgets
- player_widget.py - Audio playback controls with word highlighting and seeking
- recorder_widget.py - Recording interface with device selection and level display
- transcription_widget.py - Transcription controls with model selection and progress
- notes_widget.py - Notes editor with export/import and global actions
- library_widget.py - File browser with search, import, and deletion functionality
Utilities
- helpers.py - Time formatting, text parsing, and utility functions
- platform.py - Platform-specific effects, shadows, and optimizations
- theme.py - Comprehensive glassmorphism QSS stylesheet with dialog styling
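The module list says `helpers.py` provides time formatting, but the exact display format is not documented. This minimal sketch assumes an M:SS display that switches to H:MM:SS once a recording passes an hour. The function name is hypothetical.

```python
def format_time(seconds: float) -> str:
    """Format a duration as M:SS, or H:MM:SS once it reaches an hour."""
    total = max(0, int(seconds))  # drop fractions; clamp negatives to 0
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


print(format_time(65.9))  # 1:05
print(format_time(3661))  # 1:01:01
```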
Migration from Single File
The original voice_notes_glass.py has been completely refactored into this structured approach while maintaining 100% feature compatibility. The new architecture provides:
- Better separation of concerns
- Easier testing and debugging
- More maintainable codebase
- Cleaner code organization
- Enhanced extensibility
To use the new version, simply run python main.py instead of the old single file.
Download files
Source Distribution
- voice_notes-0.1.0.tar.gz (35.0 kB)
Built Distribution
- voice_notes-0.1.0-py3-none-any.whl (37.9 kB)
File details
Details for the file voice_notes-0.1.0.tar.gz.
File metadata
- Download URL: voice_notes-0.1.0.tar.gz
- Size: 35.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4808f1a8a6f849c3239213e470b9428f36f950c7437c1b895ad714307cd04dfe |
| MD5 | ea3be6e487d2f6ad924f564041ca5282 |
| BLAKE2b-256 | 9474818e8a6c29165caf9abc818c0dcac8269a90c3d1865142b5180ed25745c0 |
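These digests let you check a downloaded file before installing it. The following minimal sketch uses only the standard library and hashes the file in chunks, so it works on Python 3.9 and never loads the whole file into memory. The helper name `sha256_of` is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "4808f1a8a6f849c3239213e470b9428f36f950c7437c1b895ad714307cd04dfe"
# Usage, assuming the sdist was downloaded to the current directory:
# assert sha256_of(Path("voice_notes-0.1.0.tar.gz")) == EXPECTED
```

pip can also enforce this itself. A requirements line such as `voice-notes==0.1.0 --hash=sha256:<digest>` turns on hash-checking mode. List both the sdist and wheel digests so that either file is accepted.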
Provenance
The following attestation bundles were made for voice_notes-0.1.0.tar.gz:
Publisher: publish-to-pypi.yml on zangjiucheng/Voice-Notes
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voice_notes-0.1.0.tar.gz
- Subject digest: 4808f1a8a6f849c3239213e470b9428f36f950c7437c1b895ad714307cd04dfe
- Sigstore transparency entry: 565433205
- Permalink: zangjiucheng/Voice-Notes@189477d51cd2119a92f7371e617b53fe24f9b7e8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/zangjiucheng
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@189477d51cd2119a92f7371e617b53fe24f9b7e8
- Trigger Event: push
File details
Details for the file voice_notes-0.1.0-py3-none-any.whl.
File metadata
- Download URL: voice_notes-0.1.0-py3-none-any.whl
- Size: 37.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e97f2c454f7a1bbbe60f15d51682af88435cac8a3a78cf5051d78812cabdb74 |
| MD5 | 5eefab045e296dea85ff94f26254e30a |
| BLAKE2b-256 | 716d3ff9b068c67d65af167141740d1af08edcf24496d6352e6c8fdc477c994a |
Provenance
The following attestation bundles were made for voice_notes-0.1.0-py3-none-any.whl:
Publisher: publish-to-pypi.yml on zangjiucheng/Voice-Notes
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voice_notes-0.1.0-py3-none-any.whl
- Subject digest: 4e97f2c454f7a1bbbe60f15d51682af88435cac8a3a78cf5051d78812cabdb74
- Sigstore transparency entry: 565433210
- Permalink: zangjiucheng/Voice-Notes@189477d51cd2119a92f7371e617b53fe24f9b7e8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/zangjiucheng
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@189477d51cd2119a92f7371e617b53fe24f9b7e8
- Trigger Event: push