A full-featured, premium AI command line interface with Transformers and GGUF support

### Generation Controls

```bash
# Disable thinking traces
cognicli --model gpt2 --no-think --generate "Quick answer:"

# Disable streaming for batch processing
cognicli --model gpt2 --no-stream --generate "Batch response"

# Adjust sampling parameters
cognicli --model gpt2 --temperature 0.9 --max-tokens 1024 --generate "Creative story:"
```

# CogniCLI 🧠⚡

[![PyPI version](https://badge.fury.io/py/cognicli.svg)](https://badge.fury.io/py/cognicli)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

CogniCLI is a full-featured AI command-line interface that runs both **Transformers and GGUF** models behind a single `--model` flag. It downloads models from Hugging Face automatically, offers precision controls (`--type bf16 | fp16 | fp32 | q4 | q8`), and provides a `--no-think` toggle that hides reasoning traces. Output features include **animated streaming**, an **ASCII logo with rich CLI colors**, and **Markdown rendering with syntax-highlighted code** for major programming languages. The `face` command powers model exploration: `--list` filters, detailed `--info` model cards with README previews, and `--files` for repository contents with file sizes, including the ability to pick a specific GGUF quant file. Beyond chat and generation, CogniCLI includes **benchmarking tools** that report latency, tokens/sec, and perplexity, with JSON export.

## ✨ Features

### 🚀 **Dual Runtime Support**
- **Transformers**: Native PyTorch models with automatic GPU acceleration
- **GGUF**: Optimized quantized models via llama-cpp-python
- **Single `--model` flag** switches between both seamlessly

### 🎯 **Precision & Quantization Control**
- `--type bf16` - BFloat16 for optimal performance
- `--type fp16` - Half precision for memory efficiency  
- `--type fp32` - Full precision for maximum accuracy
- `--type q4` - 4-bit quantization (BitsAndBytes for Transformers, GGUF native)
- `--type q8` - 8-bit quantization (BitsAndBytes for Transformers, GGUF native)
- **Automatic quantization detection** - seamlessly switches between BitsAndBytes and GGUF quantization

### 🧠 **Advanced Generation**
- **Reasoning traces**, which can be disabled with `--no-think`
- **Animated streaming output** with real-time rendering
- **Markdown rendering** with syntax highlighting
- **Temperature and top-p** sampling controls
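
To make the two sampling controls concrete, here is a minimal, standard-library sketch of temperature scaling followed by top-p (nucleus) filtering. It illustrates the general technique only; it is not CogniCLI's internal implementation, and the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}  # renormalize the survivors


def sample_token(logits, temperature=0.9, top_p=0.95, rng=random):
    """Draw one token index using temperature scaling, then nucleus filtering."""
    candidates = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

Lower temperatures concentrate probability on the most likely token, while a smaller `top_p` trims the unlikely tail before sampling.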

### 🔍 **Model Explorer (`face` command)**
- `--list [filter]` - Browse thousands of models with smart filtering
- `--info model-id` - Detailed model cards with stats and README
- `--files model-id` - Repository browser with file sizes and GGUF variants

### 📊 **Performance Benchmarking**
- **Latency measurements** - Precise timing for each generation
- **Tokens/second** - Throughput analysis
- **JSON export** - Structured results for analysis
- **Batch testing** - Multiple iterations for statistical accuracy
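
The metrics above can be sketched with a small timing harness. This is an illustration under stated assumptions, not CogniCLI's benchmark code: `fake_generate` is a hypothetical stand-in for a real model call, and the report keys are invented for the example rather than taken from the tool's JSON schema.

```python
import json
import statistics
import time


def fake_generate(prompt, max_tokens=32):
    """Stand-in for a real model call; returns the number of tokens 'generated'."""
    time.sleep(0.001)  # simulate a little work
    return max_tokens


def run_benchmark(generate, prompt, iterations=5, max_tokens=32):
    """Time several generations and summarize latency and throughput."""
    latencies, throughputs = [], []
    for _ in range(iterations):
        start = time.perf_counter()
        tokens = generate(prompt, max_tokens=max_tokens)
        elapsed = time.perf_counter() - start
        latencies.append(elapsed * 1000.0)    # milliseconds per generation
        throughputs.append(tokens / elapsed)  # tokens per second
    return {
        "iterations": iterations,
        "latency_ms_mean": statistics.mean(latencies),
        "latency_ms_stdev": statistics.stdev(latencies) if iterations > 1 else 0.0,
        "tokens_per_sec_mean": statistics.mean(throughputs),
    }


if __name__ == "__main__":
    report = run_benchmark(fake_generate, "The future of AI is")
    print(json.dumps(report, indent=2))
```

Running several iterations and reporting the spread alongside the mean is what makes single-run noise visible.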

### 🎨 **Rich Interface**
- **ASCII art logo** with colorful branding
- **Progress spinners** and live updates
- **Syntax highlighting** for 50+ programming languages
- **Tables and panels** for organized information display

## 🚀 Quick Start

### Installation

```bash
# Core installation (Transformers models only)
pip install cognicli

# With quantization support (BitsAndBytes)
pip install "cognicli[quantization]"

# With GGUF support
pip install "cognicli[gguf]"

# GPU-optimized (CUDA + quantization)
pip install "cognicli[gpu]"

# Apple Silicon (Metal + quantization)
pip install "cognicli[metal]"

# Everything included
pip install "cognicli[full]"
```

**Note:** The CLI prompts you to install missing dependencies when you use a feature that requires them. The extras are quoted so that shells such as zsh do not treat the square brackets as glob patterns.

### Basic Usage

```bash
# Explore available models
cognicli --list llama

# Get detailed model information
cognicli --info microsoft/DialoGPT-medium

# Load and chat with a model
cognicli --model microsoft/DialoGPT-medium --chat

# Generate a single response
cognicli --model gpt2 --generate "The future of AI is"

# Use a GGUF model with a specific quantization file
cognicli --model TheBloke/Llama-2-7B-Chat-GGUF --gguf-file llama-2-7b-chat.Q4_0.gguf --chat
```

## 📖 Comprehensive Usage Guide

### Model Management

```bash
# List trending models
cognicli --list

# Filter models by name
cognicli --list "code"

# Get model details
cognicli --info codellama/CodeLlama-7b-Python-hf

# Browse model files and GGUF variants
cognicli --files TheBloke/CodeLlama-7B-Python-GGUF
```

### Precision & Quantization

```bash
# BitsAndBytes 4-bit quantization for Transformers models
cognicli --model microsoft/DialoGPT-large --type q4 --chat

# BitsAndBytes 8-bit quantization
cognicli --model microsoft/DialoGPT-large --type q8 --generate "Hello world"

# BFloat16 inference
cognicli --model gpt2 --type bf16 --generate "High performance generation"

# GGUF quantization (automatic detection)
cognicli --model TheBloke/Llama-2-7B-GGUF --type q4 --chat
```

### Interactive Chat

```bash
# Start chat mode
cognicli --model microsoft/DialoGPT-medium --chat

# Chat with custom settings
cognicli --model gpt2 --type bf16 --temperature 0.8 --no-think --chat
```

### Benchmarking

```bash
# Basic benchmark
cognicli --model gpt2 --benchmark

# Save results to JSON
cognicli --model gpt2 --benchmark --json --save-benchmark results.json

# Custom benchmark prompt
cognicli --model gpt2 --benchmark --generate "Custom benchmark prompt"
```

### GGUF Models

```bash
# Auto-select GGUF file
cognicli --model TheBloke/Llama-2-7B-GGUF --chat

# Specify exact GGUF file
cognicli --model TheBloke/Llama-2-7B-GGUF --gguf-file llama-2-7b.Q4_0.gguf --chat

# List available GGUF files
cognicli --files TheBloke/Llama-2-7B-GGUF
```
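
How CogniCLI auto-selects a GGUF file is not documented here. The sketch below shows one plausible heuristic: prefer a file whose name matches the requested quantization, otherwise fall back to the first `.gguf` file. The pattern table and function name are assumptions made for illustration.

```python
import re

# Hypothetical mapping from a --type value to filename patterns, in preference order.
QUANT_PATTERNS = {
    "q4": [r"q4_k_m", r"q4_0", r"q4"],
    "q8": [r"q8_0", r"q8"],
}


def pick_gguf_file(filenames, quant=None):
    """Return the first .gguf file matching the requested quant, else the first .gguf file."""
    gguf_files = sorted(f for f in filenames if f.lower().endswith(".gguf"))
    if not gguf_files:
        raise ValueError("repository contains no .gguf files")
    for pattern in QUANT_PATTERNS.get(quant, []):
        for name in gguf_files:
            # Match case-insensitively, since repositories differ in filename casing.
            if re.search(pattern, name, flags=re.IGNORECASE):
                return name
    return gguf_files[0]
```

Passing `--gguf-file` explicitly sidesteps any such heuristic, which is the safer choice when a repository ships many quant variants.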

## 🛠️ Advanced Configuration

### Quantization Options

CogniCLI supports multiple quantization backends:

- **BitsAndBytes** (for Transformers models):
  - `--type q4`: 4-bit NF4 quantization with double quantization
  - `--type q8`: 8-bit quantization with CPU offloading
  - Automatic GPU memory optimization
  - Works with any Transformers-compatible model
- **GGUF** (for llama.cpp models):
  - `--type q4`: Native GGUF 4-bit quantization
  - `--type q8`: Native GGUF 8-bit quantization
  - CPU and GPU acceleration support
  - Optimized for inference speed

```bash
# Compare quantization methods
cognicli --model microsoft/DialoGPT-medium --type q4 --benchmark  # BitsAndBytes
cognicli --model TheBloke/DialoGPT-medium-GGUF --type q4 --benchmark  # GGUF
```
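
As a rough guide to what each `--type` setting costs in memory, the sketch below estimates weight storage from the parameter count. The bytes-per-parameter figures are nominal values for each format; real quantized models also store scales and other metadata, so actual usage is somewhat higher. The function name is hypothetical.

```python
# Approximate bytes needed to store one weight at each --type setting.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "q8": 1.0,
    "q4": 0.5,
}


def estimate_weight_memory_gb(num_params, precision):
    """Estimate weight storage in GB (10^9 bytes), ignoring activations and overhead."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in ("fp32", "fp16", "q8", "q4"):
        gb = estimate_weight_memory_gb(7_000_000_000, precision)
        print(f"7B parameters at {precision}: about {gb:.1f} GB of weights")
```

For a 7-billion-parameter model this gives about 14 GB at fp16 and about 3.5 GB at 4 bits, before any runtime overhead.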

### Environment Variables

```bash
# Set cache directory
export COGNICLI_CACHE_DIR="/path/to/cache"

# Configure Hugging Face token
export HUGGINGFACE_TOKEN="your_token_here"

# Set default model
export COGNICLI_DEFAULT_MODEL="microsoft/DialoGPT-medium"
```
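
A tool might read these variables along the following lines. This is a minimal sketch: the variable names come from the list above, but the fallback values (`~/.cognicli/cache` and `gpt2`) and the function name are assumptions, not documented behavior.

```python
import os
from pathlib import Path


def resolve_settings(environ=None):
    """Read CogniCLI-related environment variables, falling back to assumed defaults."""
    env = os.environ if environ is None else environ
    cache_dir = env.get("COGNICLI_CACHE_DIR", "~/.cognicli/cache")
    return {
        "cache_dir": Path(cache_dir).expanduser(),
        "default_model": env.get("COGNICLI_DEFAULT_MODEL", "gpt2"),
        # Report only whether a token is set, so the secret never gets printed.
        "has_hf_token": bool(env.get("HUGGINGFACE_TOKEN")),
    }
```

Accepting an optional mapping instead of always reading `os.environ` keeps the function easy to test without touching the real environment.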

### Model Configuration

```yaml
# ~/.cognicli/config.yaml
default_model: "gpt2"
default_precision: "fp16"
default_temperature: 0.7
default_max_tokens: 512
cache_dir: "~/.cognicli/cache"
streaming: true
show_thinking: true
```
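
The sketch below shows one way such settings could be layered and validated: built-in defaults, then values from the config file, then command-line overrides. The keys and defaults mirror the sample file above, but the precedence order, the validation rules, and the names `Config` and `build_config` are assumptions for illustration.

```python
from dataclasses import dataclass, fields

ALLOWED_PRECISIONS = {"bf16", "fp16", "fp32", "q4", "q8"}


@dataclass
class Config:
    default_model: str = "gpt2"
    default_precision: str = "fp16"
    default_temperature: float = 0.7
    default_max_tokens: int = 512
    cache_dir: str = "~/.cognicli/cache"
    streaming: bool = True
    show_thinking: bool = True


def build_config(file_values=None, cli_overrides=None):
    """Layer defaults, file values, then CLI overrides, and validate the result."""
    known = {f.name for f in fields(Config)}
    merged = {}
    for layer in (file_values or {}, cli_overrides or {}):
        unknown = set(layer) - known
        if unknown:
            raise KeyError(f"unknown config keys: {sorted(unknown)}")
        # Later layers win; None means "not provided" and is skipped.
        merged.update({k: v for k, v in layer.items() if v is not None})
    config = Config(**merged)
    if config.default_precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {config.default_precision}")
    if config.default_temperature < 0:
        raise ValueError("default_temperature must not be negative")
    if config.default_max_tokens < 1:
        raise ValueError("default_max_tokens must be at least 1")
    return config
```

Rejecting unknown keys early turns a typo in the config file into an immediate error rather than a silently ignored setting.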

## 🏗️ Architecture

CogniCLI is built with a modular architecture:

- **Model Loaders**: Unified interface for Transformers and GGUF
- **Generation Engine**: Streaming and batch generation with precision control
- **CLI Framework**: Rich terminal interface with animated components
- **Benchmark Suite**: Performance measurement and analysis tools
- **Model Explorer**: Hugging Face integration for model discovery

## 🔧 Development

### Building from Source

```bash
git clone https://github.com/cognicli/cognicli.git
cd cognicli
pip install -e .
```

### Running Tests

```bash
pytest tests/
```

### Contributing

We welcome contributions! Please see our Contributing Guide for details.

## 📊 Performance

CogniCLI is optimized for both speed and memory efficiency:

- **GPU Acceleration**: Automatic CUDA detection and optimization
- **Memory Management**: Smart batching and gradient checkpointing
- **Quantization**: 4-bit and 8-bit support (BitsAndBytes and GGUF) for resource-constrained environments
- **Streaming**: Real-time token generation with minimal latency

### Benchmark Results

| Model | Backend | Precision | Tokens/sec | Memory (GB) | Latency (ms) |
|---|---|---|---|---|---|
| GPT-2 | Transformers | fp16 | 45.2 | 1.2 | 22 |
| GPT-2 | Transformers | q4 (BnB) | 38.7 | 0.8 | 26 |
| GPT-2 | GGUF | q4 | 42.1 | 0.6 | 24 |
| Llama-7B | Transformers | fp16 | 12.3 | 14.2 | 81 |
| Llama-7B | Transformers | q4 (BnB) | 15.8 | 4.1 | 63 |
| Llama-7B | GGUF | q4 | 18.2 | 3.8 | 55 |
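
The figures above can be compared with a few lines of arithmetic. The sketch below copies the values from the results and computes relative throughput and memory savings. It also checks an observation that is not stated in the source: each latency value is consistent with 1000 / tokens-per-second, which suggests the column reports per-token latency.

```python
# (model, backend, precision) -> (tokens/sec, memory GB, latency ms), copied from the results.
RESULTS = {
    ("GPT-2", "Transformers", "fp16"): (45.2, 1.2, 22),
    ("GPT-2", "Transformers", "q4 (BnB)"): (38.7, 0.8, 26),
    ("GPT-2", "GGUF", "q4"): (42.1, 0.6, 24),
    ("Llama-7B", "Transformers", "fp16"): (12.3, 14.2, 81),
    ("Llama-7B", "Transformers", "q4 (BnB)"): (15.8, 4.1, 63),
    ("Llama-7B", "GGUF", "q4"): (18.2, 3.8, 55),
}


def compare(baseline_key, candidate_key):
    """Return (throughput ratio, fractional memory saving) of candidate vs baseline."""
    base_tps, base_mem, _ = RESULTS[baseline_key]
    cand_tps, cand_mem, _ = RESULTS[candidate_key]
    return cand_tps / base_tps, 1.0 - cand_mem / base_mem


def latency_matches_throughput(tolerance_ms=0.5):
    """Check whether each latency equals 1000 / tokens_per_sec after rounding."""
    return all(abs(1000.0 / tps - ms) <= tolerance_ms for tps, _, ms in RESULTS.values())


if __name__ == "__main__":
    speedup, saving = compare(
        ("Llama-7B", "Transformers", "fp16"), ("Llama-7B", "GGUF", "q4")
    )
    print(f"Llama-7B GGUF q4 vs fp16: {speedup:.2f}x throughput, {saving:.0%} less memory")
    print("latency is per-token:", latency_matches_throughput())
```

For Llama-7B, GGUF q4 comes out at about 1.48x the fp16 throughput with roughly 73% less memory. For GPT-2, both q4 variants are slower than fp16, so quantization there mainly saves memory.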

## 🤝 Support

## 📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

## 🙏 Acknowledgments

- Hugging Face for the transformers library and model hub
- BitsAndBytes for efficient quantization algorithms
- llama.cpp team for GGUF format and optimization
- Rich for the beautiful terminal interface
- PyTorch for the deep learning foundation

Made with ❤️ by the CogniCLI team

Transform your command line into an AI powerhouse 🚀

Download files

Source Distribution

cognicli-1.1.2.tar.gz (17.8 kB view details)

Uploaded Source

Built Distribution

cognicli-1.1.2-py3-none-any.whl (15.9 kB view details)

Uploaded Python 3

File details

Details for the file cognicli-1.1.2.tar.gz.

File metadata

  • Download URL: cognicli-1.1.2.tar.gz
  • Upload date:
  • Size: 17.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for cognicli-1.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `31fd5dd636e582da443e52bde6b183cf540700ca47874880997979fefbb3c6d0` |
| MD5 | `1dd9cf6fae14ebc15156ffd6af57b58f` |
| BLAKE2b-256 | `1a9f33d8afda53d163dc2d65b1eb158d7b794cf28d033c96b5636ad8986b81cf` |
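
A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. The helper names below are hypothetical; the sketch streams the file in chunks so that large downloads need not fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True when the file's SHA-256 matches the published digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify_download("cognicli-1.1.2.tar.gz", "31fd5dd636e582da443e52bde6b183cf540700ca47874880997979fefbb3c6d0")` should return `True` for an intact copy of the source distribution.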

See more details on using hashes here.

File details

Details for the file cognicli-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: cognicli-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for cognicli-1.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `dce55cfe5dd94e8a5ebc3d4dc0ff2aa5908fe39283034f537f06e8ff190a4a28` |
| MD5 | `80cbce9d77a004528563e93d53ae17a9` |
| BLAKE2b-256 | `ff8a29347997e8aebe030171041a02c1e2eacb331a320317ff79cc978cb13dff` |

See more details on using hashes here.
