A full-featured, premium AI command line interface with Transformers and GGUF support
### Generation Controls
```bash
# Disable thinking traces
cognicli --model gpt2 --no-think --generate "Quick answer:"
# Disable streaming for batch processing
cognicli --model gpt2 --no-stream --generate "Batch response"
# Adjust sampling parameters
cognicli --model gpt2 --temperature 0.9 --max-tokens 1024 --generate "Creative story:"
```
# CogniCLI 🧠⚡
[PyPI version](https://badge.fury.io/py/cognicli) |
[Python 3.8+](https://www.python.org/downloads/release/python-380/) |
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
CogniCLI is a **full-featured, premium AI command line** interface that runs both **Transformers and GGUF** models behind a single `--model` flag. It downloads models from Hugging Face automatically, offers precision controls (`--type bf16 | fp16 | fp32 | q4 | q8`), and provides a `--no-think` toggle that disables reasoning traces. Output features include **animated streaming**, an **ASCII logo with rich CLI colors**, and **Markdown rendering with syntax-highlighted code**. The `face` model explorer provides `--list` filters, detailed `--info` model cards with README previews, and `--files`, which shows repository contents with file sizes and lets you pick a specific GGUF quant file. Beyond chat and generation, CogniCLI includes **benchmarking tools** that report latency, tokens/sec, and perplexity, with optional JSON output.
## ✨ Features
### 🚀 **Dual Runtime Support**
- **Transformers**: Native PyTorch models with automatic GPU acceleration
- **GGUF**: Optimized quantized models via llama-cpp-python
- **Single `--model` flag** switches between both seamlessly
### 🎯 **Precision & Quantization Control**
- `--type bf16` - BFloat16 for optimal performance
- `--type fp16` - Half precision for memory efficiency
- `--type fp32` - Full precision for maximum accuracy
- `--type q4` - 4-bit quantization (BitsAndBytes for Transformers, GGUF native)
- `--type q8` - 8-bit quantization (BitsAndBytes for Transformers, GGUF native)
- **Automatic quantization detection** - seamlessly switches between BitsAndBytes and GGUF quantization
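The mapping from a `--type` value to a loading strategy can be pictured roughly as follows. This is only a sketch of the idea described in the list above, not CogniCLI's actual code: `LoadPlan`, `resolve_precision`, and the choice of `float16` as the compute dtype for quantized loads are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical names: LoadPlan and resolve_precision are illustrative only
# and are not taken from CogniCLI's source.
@dataclass(frozen=True)
class LoadPlan:
    backend: str     # "transformers" or "gguf"
    dtype: str       # floating-point dtype used for weights or compute
    quant_bits: int  # 0 means no quantization
    quantizer: str   # "none", "bitsandbytes", or "gguf"

FLOAT_TYPES = {"bf16": "bfloat16", "fp16": "float16", "fp32": "float32"}
QUANT_BITS = {"q4": 4, "q8": 8}

def resolve_precision(type_flag: str, is_gguf: bool) -> LoadPlan:
    """Translate a --type value into a loading plan for the chosen backend."""
    backend = "gguf" if is_gguf else "transformers"
    if type_flag in FLOAT_TYPES:
        return LoadPlan(backend, FLOAT_TYPES[type_flag], 0, "none")
    if type_flag in QUANT_BITS:
        # Same flag, different quantizer depending on the backend.
        quantizer = "gguf" if is_gguf else "bitsandbytes"
        # Assumption: quantized weights are paired with float16 compute.
        return LoadPlan(backend, "float16", QUANT_BITS[type_flag], quantizer)
    raise ValueError(f"unsupported --type value: {type_flag!r}")

print(resolve_precision("q4", is_gguf=False))
print(resolve_precision("q4", is_gguf=True))
```

The point of the sketch is that the user-facing flag stays the same while the backend decides which quantization mechanism is used.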
### 🧠 **Advanced Generation**
- **Reasoning traces** with `--no-think` toggle
- **Animated streaming output** with real-time rendering
- **Markdown rendering** with syntax highlighting
- **Temperature and top-p** sampling controls
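For readers unfamiliar with the sampling controls listed above, here is a minimal, dependency-free sketch of temperature scaling followed by top-p (nucleus) sampling. It illustrates the general technique only; the function name `sample_top_p` is hypothetical, and real backends implement this inside their own generation loops.

```python
import math
import random

def sample_top_p(logits, temperature=0.9, top_p=0.95, rng=None):
    """Pick a token index from raw logits using temperature and nucleus sampling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Temperature: divide logits before softmax. Lower values sharpen the
    # distribution, higher values flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw from the kept tokens, weighted by their original probabilities.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

token = sample_top_p([2.0, 1.0, 0.5, -1.0], temperature=0.8, top_p=0.9)
print("sampled token index:", token)
```

With a small `top_p`, only the highest-probability tokens survive the cut, which makes output more conservative; raising the temperature has the opposite effect.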
### 🔍 **Model Explorer (`face` command)**
- `--list [filter]` - Browse thousands of models with smart filtering
- `--info model-id` - Detailed model cards with stats and README
- `--files model-id` - Repository browser with file sizes and GGUF variants
### 📊 **Performance Benchmarking**
- **Latency measurements** - Precise timing for each generation
- **Tokens/second** - Throughput analysis
- **JSON export** - Structured results for analysis
- **Batch testing** - Multiple iterations for statistical accuracy
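The metrics above can be computed with a small timing loop. The sketch below assumes a stand-in `fake_generate` function in place of a real model call, and the report keys are illustrative; the JSON schema that CogniCLI actually writes is not documented here.

```python
import json
import statistics
import time

def fake_generate(prompt, max_tokens=32):
    """Stand-in for a real model call (assumption); returns a list of 'tokens'."""
    return (prompt.split() * max_tokens)[:max_tokens]

def benchmark(generate, prompt, iterations=5, max_tokens=32):
    """Time several generations and summarize latency and throughput."""
    latencies, token_counts = [], []
    for _ in range(iterations):
        start = time.perf_counter()
        tokens = generate(prompt, max_tokens)
        latencies.append(time.perf_counter() - start)
        token_counts.append(len(tokens))

    total_time = sum(latencies)
    return {
        "iterations": iterations,
        "mean_latency_ms": statistics.mean(latencies) * 1000,
        # Standard deviation needs at least two samples.
        "stdev_latency_ms": statistics.stdev(latencies) * 1000 if iterations > 1 else 0.0,
        "tokens_per_sec": sum(token_counts) / total_time if total_time > 0 else 0.0,
        "total_tokens": sum(token_counts),
    }

report = benchmark(fake_generate, "The future of AI is")
print(json.dumps(report, indent=2))
```

Running several iterations and reporting the spread, rather than a single timing, is what makes the numbers comparable across runs.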
### 🎨 **Rich Interface**
- **ASCII art logo** with colorful branding
- **Progress spinners** and live updates
- **Syntax highlighting** for 50+ programming languages
- **Tables and panels** for organized information display
## 🚀 Quick Start
### Installation
```bash
# Quotes keep shells such as zsh from expanding the square brackets
# Core installation (Transformers models only)
pip install cognicli
# With quantization support (BitsAndBytes)
pip install "cognicli[quantization]"
# With GGUF support
pip install "cognicli[gguf]"
# GPU-optimized (CUDA + quantization)
pip install "cognicli[gpu]"
# Apple Silicon (Metal + quantization)
pip install "cognicli[metal]"
# Everything included
pip install "cognicli[full]"
```
**Note:** The CLI automatically prompts to install missing dependencies when you use a feature that requires them.
### Basic Usage
```bash
# Explore available models
cognicli --list llama
# Get detailed model information
cognicli --info microsoft/DialoGPT-medium
# Load and chat with a model
cognicli --model microsoft/DialoGPT-medium --chat
# Generate a single response
cognicli --model gpt2 --generate "The future of AI is"
# Use GGUF model with specific quantization
cognicli --model TheBloke/Llama-2-7B-Chat-GGUF --gguf-file llama-2-7b-chat.q4_0.gguf --chat
```
## 📖 Comprehensive Usage Guide
### Model Management
```bash
# List trending models
cognicli --list
# Filter models by name
cognicli --list "code"
# Get model details
cognicli --info codellama/CodeLlama-7b-Python-hf
# Browse model files and GGUF variants
cognicli --files TheBloke/CodeLlama-7B-Python-GGUF
```
### Precision & Quantization
```bash
# BitsAndBytes 4-bit quantization for Transformers models
cognicli --model microsoft/DialoGPT-large --type q4 --chat
# BitsAndBytes 8-bit quantization
cognicli --model microsoft/DialoGPT-large --type q8 --generate "Hello world"
# BFloat16 precision for inference
cognicli --model gpt2 --type bf16 --generate "High performance generation"
# GGUF quantization (automatic detection)
cognicli --model TheBloke/Llama-2-7B-GGUF --type q4 --chat
```
### Interactive Chat
```bash
# Start chat mode
cognicli --model microsoft/DialoGPT-medium --chat
# Chat with custom settings
cognicli --model gpt2 --type bf16 --temperature 0.8 --no-think --chat
```
### Benchmarking
```bash
# Basic benchmark
cognicli --model gpt2 --benchmark
# Save results to JSON
cognicli --model gpt2 --benchmark --json --save-benchmark results.json
# Custom benchmark prompt
cognicli --model gpt2 --benchmark --generate "Custom benchmark prompt"
```
### GGUF Models
```bash
# Auto-select GGUF file
cognicli --model TheBloke/Llama-2-7B-GGUF --chat
# Specify exact GGUF file
cognicli --model TheBloke/Llama-2-7B-GGUF --gguf-file llama-2-7b.q4_0.gguf --chat
# List available GGUF files
cognicli --files TheBloke/Llama-2-7B-GGUF
```
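How a GGUF file might be auto-selected from a repository listing can be sketched as below. The selection rules (an explicit file wins, then a file whose quant tag matches `--type`, then the first `.gguf` file alphabetically) are assumptions for illustration, and `pick_gguf_file` and `quant_tag` are hypothetical names; CogniCLI's real selection logic may differ.

```python
import re

# Matches a quant tag such as "q4_0" or "Q4_K_M" right before the extension.
QUANT_PATTERN = re.compile(r"\.(q\d+(?:_[a-z0-9]+)*)\.gguf$", re.IGNORECASE)

def quant_tag(filename):
    """Return the lowercase quant tag of a GGUF filename, or None."""
    match = QUANT_PATTERN.search(filename)
    return match.group(1).lower() if match else None

def pick_gguf_file(filenames, type_flag=None, explicit=None):
    """Choose one GGUF file from a repo listing (hypothetical rules)."""
    gguf = sorted(f for f in filenames if f.lower().endswith(".gguf"))
    if explicit is not None:
        if explicit not in gguf:
            raise FileNotFoundError(f"{explicit} is not in the repository")
        return explicit
    if not gguf:
        raise FileNotFoundError("repository contains no .gguf files")
    if type_flag:
        wanted = type_flag.lower()
        for name in gguf:
            tag = quant_tag(name)
            # Compare only the leading part, so "q4" matches "q4_0" and "q4_k_m".
            if tag and tag.split("_")[0] == wanted:
                return name
    return gguf[0]

files = ["README.md", "llama-2-7b.q4_0.gguf", "llama-2-7b.q4_k_m.gguf", "llama-2-7b.q8_0.gguf"]
print(pick_gguf_file(files, type_flag="q8"))
```

Passing `explicit` corresponds to using `--gguf-file`, which bypasses the guesswork entirely.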
## 🛠️ Advanced Configuration
### Quantization Options
CogniCLI supports multiple quantization backends:
- **BitsAndBytes** (for Transformers models):
  - `--type q4`: 4-bit NF4 quantization with double quantization
  - `--type q8`: 8-bit quantization with CPU offloading
  - Automatic GPU memory optimization
  - Works with any Transformers-compatible model
- **GGUF** (for llama.cpp models):
  - `--type q4`: Native GGUF 4-bit quantization
  - `--type q8`: Native GGUF 8-bit quantization
  - CPU and GPU acceleration support
  - Optimized for inference speed
```bash
# Compare quantization methods
cognicli --model microsoft/DialoGPT-medium --type q4 --benchmark # BitsAndBytes
cognicli --model TheBloke/DialoGPT-medium-GGUF --type q4 --benchmark # GGUF
```
### Environment Variables
```bash
# Set cache directory
export COGNICLI_CACHE_DIR="/path/to/cache"
# Configure Hugging Face token
export HUGGINGFACE_TOKEN="your_token_here"
# Set default model
export COGNICLI_DEFAULT_MODEL="microsoft/DialoGPT-medium"
```
### Model Configuration
```yaml
# ~/.cognicli/config.yaml
default_model: "gpt2"
default_precision: "fp16"
default_temperature: 0.7
default_max_tokens: 512
cache_dir: "~/.cognicli/cache"
streaming: true
show_thinking: true
```
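With settings coming from a config file, environment variables, and command-line flags, some precedence order is needed. The sketch below assumes the common convention that CLI flags override environment variables, which in turn override the config file. That ordering is an assumption, not documented CogniCLI behavior, and `resolve_settings` is a hypothetical helper. The config file is represented as an already parsed dictionary to keep the example free of third-party dependencies.

```python
import os

# Environment variables named in this README, mapped to config keys.
ENV_KEYS = {
    "default_model": "COGNICLI_DEFAULT_MODEL",
    "cache_dir": "COGNICLI_CACHE_DIR",
}

def resolve_settings(cli_overrides, file_config, env=None):
    """Merge settings; assumed precedence is CLI > environment > config file."""
    env = os.environ if env is None else env
    settings = dict(file_config)                # lowest priority: config file
    for key, var in ENV_KEYS.items():           # environment overrides the file
        if env.get(var):
            settings[key] = env[var]
    for key, value in cli_overrides.items():    # CLI flags override everything
        if value is not None:
            settings[key] = value
    if "cache_dir" in settings:
        # Expand "~" so the path from the YAML example becomes absolute.
        settings["cache_dir"] = os.path.expanduser(settings["cache_dir"])
    return settings

file_config = {"default_model": "gpt2", "default_temperature": 0.7, "cache_dir": "/tmp/cache"}
env = {"COGNICLI_DEFAULT_MODEL": "microsoft/DialoGPT-medium"}
cli = {"default_temperature": 0.9, "default_model": None}
print(resolve_settings(cli, file_config, env))
```

Here the model comes from the environment, the temperature from the command line, and the cache directory from the file.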
## 🏗️ Architecture
CogniCLI is built with a modular architecture:
- **Model Loaders**: Unified interface for Transformers and GGUF
- **Generation Engine**: Streaming and batch generation with precision control
- **CLI Framework**: Rich terminal interface with animated components
- **Benchmark Suite**: Performance measurement and analysis tools
- **Model Explorer**: Hugging Face integration for model discovery
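One way to picture the unified loader interface and the streaming or batch generation modes is a small protocol that every backend implements. The names below (`Runner`, `EchoRunner`, `generate`) are hypothetical and are not taken from CogniCLI's source; `EchoRunner` is a stub that stands in for a real Transformers or GGUF backend.

```python
from typing import Iterator, Protocol

class Runner(Protocol):
    """Minimal interface a backend would need to expose (assumed shape)."""
    def stream(self, prompt: str, max_tokens: int) -> Iterator[str]: ...

class EchoRunner:
    """Stub backend: echoes the prompt word by word instead of running a model."""
    def __init__(self, name: str):
        self.name = name

    def stream(self, prompt: str, max_tokens: int) -> Iterator[str]:
        for word in prompt.split()[:max_tokens]:
            yield word + " "

def generate(runner: Runner, prompt: str, max_tokens: int = 16, stream: bool = True) -> str:
    """Print pieces as they arrive when streaming; always return the full text."""
    pieces = []
    for piece in runner.stream(prompt, max_tokens):
        if stream:
            print(piece, end="", flush=True)
        pieces.append(piece)
    if stream:
        print()
    return "".join(pieces).strip()

text = generate(EchoRunner("gguf"), "hello streaming world", stream=False)
print(text)
```

Because the generation code depends only on `stream`, the same loop can serve both backends, and disabling streaming simply means collecting the pieces silently.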
## 🔧 Development
### Building from Source
```bash
git clone https://github.com/cognicli/cognicli.git
cd cognicli
pip install -e .
```
### Running Tests
```bash
pytest tests/
```
### Contributing
We welcome contributions! Please see our Contributing Guide for details.
## 📊 Performance
CogniCLI is optimized for both speed and memory efficiency:
- **GPU Acceleration**: Automatic CUDA detection and optimization
- **Memory Management**: Smart batching and gradient checkpointing
- **Quantization**: 4-bit and 8-bit modes (BitsAndBytes and GGUF) for resource-constrained environments
- **Streaming**: Real-time token generation with minimal latency
### Benchmark Results
| Model | Backend | Precision | Tokens/sec | Memory (GB) | Latency (ms) |
|---|---|---|---|---|---|
| GPT-2 | Transformers | fp16 | 45.2 | 1.2 | 22 |
| GPT-2 | Transformers | q4 (BnB) | 38.7 | 0.8 | 26 |
| GPT-2 | GGUF | q4 | 42.1 | 0.6 | 24 |
| Llama-7B | Transformers | fp16 | 12.3 | 14.2 | 81 |
| Llama-7B | Transformers | q4 (BnB) | 15.8 | 4.1 | 63 |
| Llama-7B | GGUF | q4 | 18.2 | 3.8 | 55 |
## 🤝 Support
- **Documentation**: docs.cognicli.ai
- **Issues**: GitHub Issues
- **Discussions**: GitHub Discussions
- **Discord**: CogniCLI Community
## 📄 License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
## 🙏 Acknowledgments
- Hugging Face for the transformers library and model hub
- BitsAndBytes for efficient quantization algorithms
- llama.cpp team for GGUF format and optimization
- Rich for the beautiful terminal interface
- PyTorch for the deep learning foundation

Made with ❤️ by the CogniCLI team

Transform your command line into an AI powerhouse 🚀