An AI assistant with PDF processing, web browsing, and speech capabilities

Karhu AI Assistant CLI

Karhu is a command-line AI assistant designed for productivity, research, and creative tasks. It supports file and document processing, web browsing, contextual conversations, speech synthesis, and profile and model management, all from your terminal.


Features

  • File & Document Processing: Read and process PDF, text, and other files.
  • Web Browsing & Search: Browse web pages and perform web searches directly from the CLI.
  • Contextual Conversations: Maintain, save, and manage conversation context for seamless multi-turn interactions.
  • Profile & Model Management: Switch between AI models and conversational profiles (e.g., coding, creative, academic, therapist).
  • Speech Synthesis & Recognition: Text-to-speech (TTS) and speech-to-text (STT) support, including multiple voice engines.
  • Interactive Mode: Chat with Karhu in a conversational loop with command autocompletion and history.
  • Robust Error Handling: Graceful error messages and recovery for all operations.
  • Extensible: Modular design for easy addition of new features and integrations.

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/karhu-cli.git
    cd karhu-cli
    
  2. Install dependencies:

    pip install -r requirements.txt
    

    (If using a virtual environment, activate it first.)

  3. (Optional) Install extra system dependencies for TTS/STT features (see Speech and Voice Features).


Configuration

Karhu uses JSON configuration files in src/karhu/config/:

  • models.json: Define available AI models and their parameters.
  • profiles.json: Define conversational profiles (e.g., coding, creative, therapist).
  • system_prompt.json: Set the default system prompt for the assistant.

You can customize or add new profiles and models by editing these files.
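
As a rough illustration of editing these files from a script rather than by hand, the sketch below adds one profile entry to a profiles file. The schema it uses (a top-level object mapping each profile name to an object with a `system_prompt` key) is an assumption made for this example, not Karhu's documented format, and the helper name `add_profile` is hypothetical. Check the shipped profiles.json before adapting it.

```python
import json
import tempfile
from pathlib import Path


def add_profile(path: Path, name: str, prompt: str) -> dict:
    """Add or overwrite one profile in a JSON file and return all profiles.

    ASSUMPTION: the file holds {"<name>": {"system_prompt": "<text>"}, ...}.
    Karhu's real profiles.json may use a different layout.
    """
    if path.exists():
        profiles = json.loads(path.read_text(encoding="utf-8"))
    else:
        profiles = {}
    profiles[name] = {"system_prompt": prompt}
    path.write_text(
        json.dumps(profiles, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return profiles


if __name__ == "__main__":
    # Work on a throwaway file so the real configuration is never touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "profiles.json"
        demo.write_text(
            '{"coding": {"system_prompt": "You are a coding assistant."}}',
            encoding="utf-8",
        )
        updated = add_profile(demo, "reviewer", "You review pull requests tersely.")
        print(sorted(updated))
```

Writing to a temporary copy first, then comparing it with the original, is a cheap way to confirm the layout matches before changing the live configuration.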


Usage

CLI Options

Run Karhu from the project root:

python src/karhu/cli.py [OPTIONS]

Main options:

  • --query, -q <question>: Ask a direct question.
  • --interactive, -i: Start interactive chat mode.
  • --file, -f <path>: Process a specific file.
  • --files, -ff <directory>: Process all files in a directory.
  • --web, -w <url>: Browse a web page.
  • --search, -s <query>: Perform a web search.
  • --model, -m <name>: Select AI model.
  • --profile, -P <name>: Select conversational profile.
  • --setsprompt <prompt>: Set a custom system prompt.
  • --save: Save conversation context.
  • --clear, -c: Clear current context.
  • --list-models: List available models.
  • --list-profiles: List available profiles.
  • --voices: List TTS voices.
  • --kokoro-voices: List Kokoro TTS voices.
  • --kokoro-blend <indices>: Blend Kokoro voices.
  • --help-commands: Show all available commands.
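
For readers who want to see how options like these fit together, here is a minimal sketch that declares a subset of them with Python's standard `argparse` module. It is not taken from Karhu's cli.py; the function name `build_parser` and the `metavar` labels are assumptions for illustration, and only the flag names come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="karhu")
    parser.add_argument("--query", "-q", metavar="QUESTION",
                        help="Ask a direct question.")
    parser.add_argument("--interactive", "-i", action="store_true",
                        help="Start interactive chat mode.")
    parser.add_argument("--file", "-f", metavar="PATH",
                        help="Process a specific file.")
    parser.add_argument("--model", "-m", metavar="NAME",
                        help="Select AI model.")
    parser.add_argument("--profile", "-P", metavar="NAME",
                        help="Select conversational profile.")
    parser.add_argument("--save", action="store_true",
                        help="Save conversation context.")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for the demo.
    args = build_parser().parse_args(["--interactive", "--profile", "therapist"])
    print(args.interactive, args.profile)
```

Boolean switches such as `--interactive` and `--save` use `action="store_true"`, so they default to `False` and take no value, while options such as `--model` expect one argument.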

Interactive Mode

Start with:

python src/karhu/cli.py --interactive

Features:

  • Command autocompletion and history.
  • All CLI and special commands available as !command (see below).

Interactive Commands

  • !model [name] — Switch AI model.
  • !list_models — List models.
  • !profile [name] — Switch profile.
  • !list_profiles — List profiles.
  • !create_profile [name:prompt] — Create a new profile.
  • !system_prompt — Show current system prompt.
  • !setsprompt [prompt] — Set system prompt.
  • !file [path] — Read a file.
  • !files [directory] — Read all files in a directory.
  • !browse [url] — Browse a web page.
  • !search [query] — Web search.
  • !context_size — Show context size.
  • !context_info — Show context details.
  • !optimize_context — Summarize/optimize context.
  • !search_context [query] — Search within context.
  • !chunk [id] — List/retrieve document chunks.
  • !save — Save conversation.
  • !clear — Clear context.
  • !clearall — Clear all context/history.
  • !lazy — Toggle speech-to-text mode.
  • !speak — Toggle text-to-speech mode.
  • !voices — List TTS voices.
  • !voice [index] — Change TTS voice.
  • !kokoro — Toggle Kokoro TTS.
  • !kokoro_voices — List Kokoro voices.
  • !kokoro_voice [index] — Change Kokoro voice.
  • !kokoro_blend [indices] — Blend Kokoro voices.
  • !help — Show help.
  • !quit — Exit.
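
The `!command [argument]` shape used above can be separated from ordinary chat input with a small parser. The following is a sketch under stated assumptions, not Karhu's interactive.py: the function name `parse_command` is hypothetical, and treating command names as case-insensitive is a choice made for this example.

```python
from typing import Optional, Tuple


def parse_command(line: str) -> Optional[Tuple[str, str]]:
    """Split '!name rest of line' into (name, argument).

    Returns None for ordinary chat input, i.e. anything that does not
    start with '!' immediately followed by a command name.
    """
    text = line.strip()
    if not text.startswith("!"):
        return None
    parts = text[1:].split(maxsplit=1)
    # Reject a bare '!' and forms like '! model' with a gap after the '!'.
    if not parts or text[1].isspace():
        return None
    name = parts[0].lower()  # ASSUMPTION: command names are case-insensitive
    argument = parts[1].strip() if len(parts) > 1 else ""
    return name, argument


if __name__ == "__main__":
    for sample in ("!model gpt-4o", "!search quantum computing", "!save", "hello"):
        print(repr(sample), "->", parse_command(sample))
```

Keeping the whole remainder of the line as one argument lets commands such as `!search` and `!setsprompt` accept free text containing spaces.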

Example Commands

  • Process a PDF:
    python src/karhu/cli.py --file path/to/file.pdf
    
  • Web search:
    python src/karhu/cli.py --search "What is quantum computing?"
    
  • Start interactive mode with the therapist profile:
    python src/karhu/cli.py --interactive --profile therapist
    

Profiles and Models

Karhu supports multiple AI models (e.g., GPT-4o, Claude, Gemma) and conversational profiles (e.g., coding, creative, academic, therapist, funny, sarcastic, chill). You can switch models and profiles, or create new profiles, at runtime.

  • List models: !list_models
  • Switch model: !model <name>
  • List profiles: !list_profiles
  • Switch profile: !profile <name>
  • Create profile: !create_profile name:prompt

Profiles are defined in src/karhu/config/profiles.json.
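
The `name:prompt` argument to `!create_profile` has to be split into its two parts before a profile can be stored. One plausible way to do that is sketched here. Splitting at the first colon, so that the prompt itself may contain colons, is an assumption about the intended behavior, and `parse_profile_spec` is a hypothetical helper rather than a Karhu function.

```python
from typing import Tuple


def parse_profile_spec(spec: str) -> Tuple[str, str]:
    """Split 'name:prompt' at the first colon.

    Only the first colon separates the two parts, so the prompt text
    may contain further colons.
    """
    name, sep, prompt = spec.partition(":")
    name, prompt = name.strip(), prompt.strip()
    if not sep or not name or not prompt:
        raise ValueError("expected the form name:prompt")
    return name, prompt


if __name__ == "__main__":
    print(parse_profile_spec("tutor: Explain step by step: be patient"))
```

Rejecting an empty name or an empty prompt up front avoids storing a profile that cannot be selected or that does nothing.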


Speech and Voice Features

  • Text-to-Speech (TTS): Use !speak, !voices, !voice [index] to enable and select voices.
  • Kokoro TTS: Advanced TTS engine with voice blending (!kokoro, !kokoro_voices, !kokoro_voice, !kokoro_blend).
  • Speech-to-Text (STT): Use !lazy to toggle speech input mode.

Note: Some features may require additional system dependencies (e.g., espeak, ffmpeg, or platform-specific TTS engines).


Context Management

  • Save context: !save
  • Clear context: !clear
  • Clear all: !clearall
  • Show context size/info: !context_size, !context_info
  • Optimize context: !optimize_context
  • Search context: !search_context [query]
  • Chunking: !chunk [id] for large documents
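
To make the idea behind chunking and context search concrete, the sketch below splits a long text into numbered chunks and finds the chunks that mention a query. How Karhu actually chunks documents is not described here, so the character limit, the whitespace-based splitting, and the names `chunk_text` and `search_chunks` are all assumptions for illustration.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking on whitespace."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def search_chunks(chunks: List[str], query: str) -> List[int]:
    """Return the ids (list indices) of chunks containing query, ignoring case."""
    needle = query.lower()
    return [i for i, chunk in enumerate(chunks) if needle in chunk.lower()]


if __name__ == "__main__":
    parts = chunk_text("alpha beta gamma delta", max_chars=11)
    for chunk_id, chunk in enumerate(parts):
        print(chunk_id, repr(chunk))
    print(search_chunks(parts, "GAMMA"))
```

Using the list index as the chunk id keeps retrieval by id trivial. A real implementation would more likely split on sentence or token boundaries so that chunks line up with the model's context limits.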

Module Reference

  • ai_assistant.py: Core assistant logic and LLM interaction.
  • cli.py: Command-line interface and argument parsing.
  • interactive.py: Interactive chat mode.
  • model_manager.py: Model selection and management.
  • profile_manager.py: Profile selection and management.
  • context_manager.py: Context storage, retrieval, and optimization.
  • document_processor.py: File and document parsing.
  • web_browser.py: Web browsing and search.
  • TextToSpeech.py / SpeechToText.py / kokorotts.py: Speech synthesis and recognition.
  • Display_help.py: Command help and documentation.
  • Errors.py: Error handling and reporting.
  • config_parser.py: Configuration file parsing.
  • globals.py: Global state and settings.

Testing

Run all tests with:

pytest

Tests are located in the tests/ directory and cover core modules and features.


Contributing

  1. Fork the repository and create a new branch.
  2. Add your feature or fix.
  3. Write or update tests as needed.
  4. Submit a pull request with a clear description.

License

This project is licensed under the MIT License.


For questions or support, please open an issue on GitHub.
