cognicli

A full-featured, premium AI command line interface with Transformers and GGUF support

### Generation Controls

```bash
# Disable thinking traces
cognicli --model gpt2 --no-think --generate "Quick answer:"

# Disable streaming for batch processing
cognicli --model gpt2 --no-stream --generate "Batch response"

# Adjust sampling parameters
cognicli --model gpt2 --temperature 0.9 --max-tokens 1024 --generate "Creative story:"
```

# CogniCLI 🧠⚡

[![PyPI version](https://badge.fury.io/py/cognicli.svg)](https://badge.fury.io/py/cognicli)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

CogniCLI has evolved into a **full-featured, premium AI command line** interface that supports both **Transformers and GGUF runners** through a single `--model` flag. It downloads models from Hugging Face automatically and provides precision controls such as `--type bf16 | fp16 | q4 | q8`, plus a `--no-think` toggle for reasoning traces. Output features **animated streaming**, an **ASCII logo with rich CLI colors**, and **extensive Markdown + syntax-highlighted code support** for all major programming languages. The `face` command powers model exploration with `--list` filters, detailed `--info` model cards with README previews, and `--files` for browsing repository contents with file sizes and picking a specific GGUF quant file. Beyond chatting and generating, CogniCLI also delivers **benchmarking tools** that report latency, tokens/sec, perplexity, and JSON results, all wrapped in a sleek, colorful interface.

## ✨ Features

### 🚀 **Dual Runtime Support**
- **Transformers**: Native PyTorch models with automatic GPU acceleration
- **GGUF**: Optimized quantized models via llama-cpp-python
- **Single `--model` flag** switches between both seamlessly

### 🎯 **Precision & Quantization Control**
- `--type bf16` - BFloat16: same memory as fp16 with fp32's dynamic range (best on GPUs with native bf16 support)
- `--type fp16` - Half precision for memory efficiency  
- `--type fp32` - Full precision for maximum accuracy
- `--type q4` - 4-bit quantization (BitsAndBytes for Transformers, GGUF native)
- `--type q8` - 8-bit quantization (BitsAndBytes for Transformers, GGUF native)
- **Automatic quantization detection** - seamlessly switches between BitsAndBytes and GGUF quantization
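As a rough guide to what each `--type` costs, weight memory is approximately parameter count times bytes per parameter. The sketch below is a back-of-the-envelope lower bound: it ignores activations, the KV cache, runtime overhead, and the scale factors that quantized formats store alongside the weights (GGUF Q4_0, for instance, adds an fp16 scale per 32-weight block, about 4.5 bits per weight in total). The `BYTES_PER_PARAM` table and `estimate_weight_gb` helper are hypothetical, not part of CogniCLI.

```python
# Approximate storage cost of one weight for each --type value.
# Assumption: ignores quantization scales/zero-points and runtime overhead.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "fp16": 2.0,
    "q8": 1.0,
    "q4": 0.5,
}


def estimate_weight_gb(num_params: float, precision: str) -> float:
    """Lower-bound estimate of weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    try:
        bytes_per_param = BYTES_PER_PARAM[precision]
    except KeyError:
        raise ValueError(f"unknown precision: {precision!r}") from None
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision in ("fp32", "fp16", "q8", "q4"):
        print(f"7B @ {precision}: ~{estimate_weight_gb(7e9, precision):.1f} GB")
```

For a 7B-parameter model this gives roughly 28, 14, 7 and 3.5 GB of weights; measured usage will be higher.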

### 🧠 **Advanced Generation**
- **Reasoning traces**, which can be hidden with the `--no-think` toggle
- **Animated streaming output** with real-time rendering
- **Markdown rendering** with syntax highlighting
- **Temperature and top-p** sampling controls
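Temperature and top-p (nucleus) sampling are standard decoding techniques. This standalone sketch shows what the two knobs do to a next-token distribution, working on a plain list of logits; it is illustrative only, not CogniCLI's internal code, and the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalised over the kept tokens


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Apply temperature, then nucleus filtering, then draw one token index."""
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_p_filter(probs, top_p)
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, `sample_token([2.0, 1.0, 0.1], temperature=0.9, top_p=0.9, rng=random.Random(0))` draws from the two most likely tokens only; raising the temperature flattens the distribution, so more tokens fall inside the top-p set.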

### 🔍 **Model Explorer (`face` command)**
- `--list [filter]` - Browse thousands of models with smart filtering
- `--info model-id` - Detailed model cards with stats and README
- `--files model-id` - Repository browser with file sizes and GGUF variants

### 📊 **Performance Benchmarking**
- **Latency measurements** - Precise timing for each generation
- **Tokens/second** - Throughput analysis
- **JSON export** - Structured results for analysis
- **Batch testing** - Multiple iterations for statistical accuracy

### 🎨 **Rich Interface**
- **ASCII art logo** with colorful branding
- **Progress spinners** and live updates
- **Syntax highlighting** for 50+ programming languages
- **Tables and panels** for organized information display

## 🚀 Quick Start

### Installation

```bash
# Core installation (Transformers models only)
pip install cognicli

# Quote extras so shells such as zsh don't treat the brackets as glob patterns

# With quantization support (BitsAndBytes)
pip install "cognicli[quantization]"

# With GGUF support
pip install "cognicli[gguf]"

# GPU-optimized (CUDA + quantization)
pip install "cognicli[gpu]"

# Apple Silicon (Metal + quantization)
pip install "cognicli[metal]"

# Everything included
pip install "cognicli[full]"
```

Note: The CLI will automatically prompt to install missing dependencies when you try to use features that require them.
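How the automatic dependency prompt works is not documented here. A minimal sketch of such a check uses the standard-library `importlib.util.find_spec` to test whether an optional module is importable before a feature needs it. The feature-to-extra mapping is a hypothetical one based on the extras listed above (`llama_cpp` is the import name of llama-cpp-python).

```python
import importlib.util

# Hypothetical mapping: feature -> (import name, pip extra that provides it).
OPTIONAL_FEATURES = {
    "gguf": ("llama_cpp", "cognicli[gguf]"),
    "quantization": ("bitsandbytes", "cognicli[quantization]"),
}


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_extra(feature: str):
    """Return the pip install target for a feature whose module is missing, else None."""
    module_name, extra = OPTIONAL_FEATURES[feature]
    return None if is_installed(module_name) else extra


if __name__ == "__main__":
    for feature in OPTIONAL_FEATURES:
        extra = missing_extra(feature)
        if extra:
            print(f'{feature}: not available, try: pip install "{extra}"')
        else:
            print(f"{feature}: available")
```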

### Basic Usage

```bash
# Explore available models
cognicli --list llama

# Get detailed model information
cognicli --info microsoft/DialoGPT-medium

# Load and chat with a model
cognicli --model microsoft/DialoGPT-medium --chat

# Generate a single response
cognicli --model gpt2 --generate "The future of AI is"

# Use a GGUF model with a specific quantization file
cognicli --model TheBloke/Llama-2-7B-Chat-GGUF --gguf-file llama-2-7b-chat.Q4_0.gguf --chat
```

## 📖 Comprehensive Usage Guide

### Model Management

```bash
# List trending models
cognicli --list

# Filter models by name
cognicli --list "code"

# Get model details
cognicli --info codellama/CodeLlama-7b-Python-hf

# Browse model files and GGUF variants
cognicli --files TheBloke/CodeLlama-7B-Python-GGUF
```
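`--files` lists repository files together with their sizes. A small formatter like the hypothetical `human_size` below is one way to render raw byte counts. It uses decimal (SI) units, the convention PyPI uses for sizes such as 17.8 kB; whether CogniCLI uses SI or binary units is not stated.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with decimal (SI) units: 1 kB = 1000 bytes."""
    if num_bytes < 0:
        raise ValueError("size cannot be negative")
    units = ["B", "kB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1000, or at the largest unit.
        if size < 1000 or unit == units[-1]:
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1000
```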

### Precision & Quantization

```bash
# BitsAndBytes 4-bit quantization for Transformers models
cognicli --model microsoft/DialoGPT-large --type q4 --chat

# BitsAndBytes 8-bit quantization
cognicli --model microsoft/DialoGPT-large --type q8 --generate "Hello world"

# BFloat16 inference
cognicli --model gpt2 --type bf16 --generate "High performance generation"

# GGUF quantization (automatic detection)
cognicli --model TheBloke/Llama-2-7B-GGUF --type q4 --chat
```

### Interactive Chat

```bash
# Start chat mode
cognicli --model microsoft/DialoGPT-medium --chat

# Chat with custom settings
cognicli --model gpt2 --type bf16 --temperature 0.8 --no-think --chat
```

### Benchmarking

```bash
# Basic benchmark
cognicli --model gpt2 --benchmark

# Save results to JSON
cognicli --model gpt2 --benchmark --json --save-benchmark results.json

# Custom benchmark prompt
cognicli --model gpt2 --benchmark --generate "Custom benchmark prompt"
```
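The benchmark reports latency and tokens/sec. The sketch below shows one conventional way to compute those numbers over several iterations and emit a JSON report. `fake_generate` is a hypothetical stand-in for a model call, and the report fields are illustrative rather than the exact schema `--save-benchmark` writes.

```python
import json
import statistics
import time


def benchmark(generate, prompt, iterations=3):
    """Time `generate(prompt)` (which must return a list of tokens) and summarise throughput."""
    runs = []
    for _ in range(iterations):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = time.perf_counter() - start  # end-to-end latency of one generation
        runs.append({
            "latency_s": elapsed,
            "tokens": len(tokens),
            "tokens_per_s": len(tokens) / elapsed if elapsed > 0 else float("inf"),
        })
    return {
        "prompt": prompt,
        "iterations": iterations,
        "mean_latency_s": statistics.mean(r["latency_s"] for r in runs),
        "mean_tokens_per_s": statistics.mean(r["tokens_per_s"] for r in runs),
        "runs": runs,
    }


def fake_generate(prompt):
    """Hypothetical stand-in for a model: 'generates' 20 tokens after a short delay."""
    time.sleep(0.01)
    return ["tok"] * 20


if __name__ == "__main__":
    report = benchmark(fake_generate, "Custom benchmark prompt")
    print(json.dumps(report, indent=2))
```

Running several iterations and averaging smooths out one-off effects such as caching on the first call.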

### GGUF Models

```bash
# Auto-select GGUF file
cognicli --model TheBloke/Llama-2-7B-GGUF --chat

# Specify exact GGUF file
cognicli --model TheBloke/Llama-2-7B-GGUF --gguf-file llama-2-7b.Q4_0.gguf --chat

# List available GGUF files
cognicli --files TheBloke/Llama-2-7B-GGUF
```
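When `--gguf-file` is omitted, CogniCLI auto-selects a file, but the README does not say how. One plausible heuristic, sketched here purely as an assumption, matches the `--type` value against the quantization tag in GGUF filenames (for example `Q4_0` or `Q4_K_M` for `q4`) and prefers the smallest matching file. `pick_gguf_file` is hypothetical, and the sizes in the example are made up.

```python
import re


def pick_gguf_file(files, precision=None):
    """Hypothetical heuristic: choose a .gguf file from a repo listing.

    files: mapping of filename -> size in bytes.
    precision: 'q4', 'q8', or None.
    """
    gguf = {name: size for name, size in files.items() if name.lower().endswith(".gguf")}
    if not gguf:
        raise ValueError("no .gguf files in repository")
    if precision in ("q4", "q8"):
        bits = precision[1]
        # Match tags such as Q4_0, Q4_K_M, Q8_0 (case-insensitive).
        pattern = re.compile(rf"[._-]q{bits}_", re.IGNORECASE)
        matches = {n: s for n, s in gguf.items() if pattern.search(n)}
        if matches:
            gguf = matches
    # Prefer the smallest candidate to keep memory use down.
    return min(gguf, key=gguf.get)


if __name__ == "__main__":
    listing = {  # illustrative sizes, not real ones
        "llama-2-7b.Q4_0.gguf": 3_800_000_000,
        "llama-2-7b.Q4_K_M.gguf": 4_100_000_000,
        "llama-2-7b.Q8_0.gguf": 7_200_000_000,
        "README.md": 10_000,
    }
    print(pick_gguf_file(listing, "q8"))
```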

## 🛠️ Advanced Configuration

### Quantization Options

CogniCLI supports multiple quantization backends:

**BitsAndBytes** (for Transformers models):
- `--type q4`: 4-bit NF4 quantization with double quantization
- `--type q8`: 8-bit quantization with CPU offloading
- Automatic GPU memory optimization
- Works with any Transformers-compatible model

**GGUF** (for llama.cpp models):
- `--type q4`: Native GGUF 4-bit quantization
- `--type q8`: Native GGUF 8-bit quantization
- CPU and GPU acceleration support
- Optimized for inference speed

```bash
# Compare quantization methods
cognicli --model microsoft/DialoGPT-medium --type q4 --benchmark  # BitsAndBytes
cognicli --model TheBloke/DialoGPT-medium-GGUF --type q4 --benchmark  # GGUF
```
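The BitsAndBytes settings listed above correspond to real `transformers.BitsAndBytesConfig` parameters. The sketch below maps `--type` values to those keyword arguments without importing the library, so it runs anywhere. That CogniCLI passes exactly these values is an assumption drawn from the bullet points (NF4 with double quantization for `q4`, CPU offloading for `q8`); `bnb_kwargs` is a hypothetical helper.

```python
def bnb_kwargs(precision: str) -> dict:
    """Map a --type value to keyword arguments for transformers.BitsAndBytesConfig."""
    if precision == "q4":
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",       # 4-bit NormalFloat
            "bnb_4bit_use_double_quant": True,  # also quantize the quantization constants
        }
    if precision == "q8":
        return {
            "load_in_8bit": True,
            # Let modules that device_map places on the CPU stay in fp32.
            "llm_int8_enable_fp32_cpu_offload": True,
        }
    raise ValueError(f"{precision!r} is not a BitsAndBytes quantization type")
```

With `transformers` and `bitsandbytes` installed, the result would be passed as `AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(**bnb_kwargs("q4")), device_map="auto")`. A `bnb_4bit_compute_dtype` such as `torch.bfloat16` is commonly added for 4-bit loading.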

### Environment Variables

```bash
# Set cache directory
export COGNICLI_CACHE_DIR="/path/to/cache"

# Configure Hugging Face token
export HUGGINGFACE_TOKEN="your_token_here"

# Set default model
export COGNICLI_DEFAULT_MODEL="microsoft/DialoGPT-medium"
```
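The README does not say how these variables interact with the config file. A minimal sketch, assuming a set environment variable takes precedence over the defaults `~/.cognicli/cache` and `gpt2` used in the example config, might resolve them like this (both helpers are hypothetical):

```python
import os
from pathlib import Path

DEFAULT_CACHE_DIR = "~/.cognicli/cache"  # matches cache_dir in the example config


def resolve_cache_dir(environ=None) -> Path:
    """Return COGNICLI_CACHE_DIR if set and non-empty, else the default, with ~ expanded."""
    env = os.environ if environ is None else environ
    raw = env.get("COGNICLI_CACHE_DIR") or DEFAULT_CACHE_DIR
    return Path(raw).expanduser()


def resolve_default_model(environ=None, fallback="gpt2") -> str:
    """Return COGNICLI_DEFAULT_MODEL if set and non-empty, else the fallback model ID."""
    env = os.environ if environ is None else environ
    return env.get("COGNICLI_DEFAULT_MODEL") or fallback
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the lookup easy to test.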

### Model Configuration

```yaml
# ~/.cognicli/config.yaml
default_model: "gpt2"
default_precision: "fp16"
default_temperature: 0.7
default_max_tokens: 512
cache_dir: "~/.cognicli/cache"
streaming: true
show_thinking: true
```
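The config file is plain YAML. Once parsed (for example with PyYAML's `yaml.safe_load`), its values can be checked before use. The sketch below validates an already-parsed dictionary against the options this README documents; the `Config` dataclass and its checks are illustrative assumptions, not CogniCLI's own schema.

```python
from dataclasses import dataclass, fields

VALID_PRECISIONS = {"bf16", "fp16", "fp32", "q4", "q8"}


@dataclass
class Config:
    default_model: str = "gpt2"
    default_precision: str = "fp16"
    default_temperature: float = 0.7
    default_max_tokens: int = 512
    cache_dir: str = "~/.cognicli/cache"
    streaming: bool = True
    show_thinking: bool = True

    def __post_init__(self):
        # Reject values the CLI flags documented above would not accept.
        if self.default_precision not in VALID_PRECISIONS:
            raise ValueError(f"default_precision must be one of {sorted(VALID_PRECISIONS)}")
        if self.default_temperature < 0:
            raise ValueError("default_temperature must be >= 0")
        if self.default_max_tokens <= 0:
            raise ValueError("default_max_tokens must be positive")


def load_config(parsed: dict) -> Config:
    """Build a Config from a parsed YAML mapping, rejecting unknown keys."""
    known = {f.name for f in fields(Config)}
    unknown = set(parsed) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return Config(**parsed)
```

Rejecting unknown keys catches typos such as `default_temprature` that would otherwise be silently ignored.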

## 🏗️ Architecture

CogniCLI is built with a modular architecture:

- **Model Loaders**: Unified interface for Transformers and GGUF
- **Generation Engine**: Streaming and batch generation with precision control
- **CLI Framework**: Rich terminal interface with animated components
- **Benchmark Suite**: Performance measurement and analysis tools
- **Model Explorer**: Hugging Face integration for model discovery
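The "unified interface" idea can be illustrated with a small protocol that both runtimes implement, plus a dispatcher that picks one from the `--model` value. Everything here is hypothetical: the class names, the placeholder output, and the detection rule (a `--gguf-file` argument or "gguf" anywhere in the model ID selects the llama.cpp runner).

```python
from typing import Iterator, Optional, Protocol


class ModelRunner(Protocol):
    """Interface both runtimes would implement."""

    def stream(self, prompt: str, max_tokens: int) -> Iterator[str]: ...


class TransformersRunner:
    def __init__(self, model_id: str):
        self.model_id = model_id

    def stream(self, prompt: str, max_tokens: int) -> Iterator[str]:
        yield "(transformers output)"  # placeholder for real token streaming


class GGUFRunner:
    def __init__(self, model_id: str, gguf_file: Optional[str] = None):
        self.model_id = model_id
        self.gguf_file = gguf_file

    def stream(self, prompt: str, max_tokens: int) -> Iterator[str]:
        yield "(llama.cpp output)"  # placeholder for real token streaming


def select_runner(model_id: str, gguf_file: Optional[str] = None) -> ModelRunner:
    """Pick a backend from the --model value (hypothetical detection rule)."""
    is_gguf = gguf_file is not None or "gguf" in model_id.lower()
    return GGUFRunner(model_id, gguf_file) if is_gguf else TransformersRunner(model_id)
```

Because the rest of the CLI only depends on `stream`, the generation engine and renderer need no backend-specific code.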

## 🔧 Development

### Building from Source

```bash
git clone https://github.com/cognicli/cognicli.git
cd cognicli
pip install -e .
```

### Running Tests

```bash
pytest tests/
```

### Contributing

We welcome contributions! Please see our Contributing Guide for details.

## 📊 Performance

CogniCLI is optimized for both speed and memory efficiency:

- **GPU Acceleration**: Automatic CUDA detection and optimization
- **Memory Management**: Smart batching and gradient checkpointing
- **Quantization**: 4-bit and 8-bit support (BitsAndBytes and GGUF) for resource-constrained environments
- **Streaming**: Real-time token generation with minimal latency

### Benchmark Results

| Model | Backend | Precision | Tokens/sec | Memory (GB) | Latency (ms/token) |
|---|---|---|---|---|---|
| GPT-2 | Transformers | fp16 | 45.2 | 1.2 | 22 |
| GPT-2 | Transformers | q4 (BnB) | 38.7 | 0.8 | 26 |
| GPT-2 | GGUF | q4 | 42.1 | 0.6 | 24 |
| Llama-7B | Transformers | fp16 | 12.3 | 14.2 | 81 |
| Llama-7B | Transformers | q4 (BnB) | 15.8 | 4.1 | 63 |
| Llama-7B | GGUF | q4 | 18.2 | 3.8 | 55 |
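The latency column is consistent with per-token latency, the reciprocal of throughput: 1000 / 45.2 tokens/sec ≈ 22 ms. The check below recomputes it for every row of the benchmark table.

```python
# (model, backend, precision, tokens/sec, reported latency in ms) from the benchmark table
ROWS = [
    ("GPT-2", "Transformers", "fp16", 45.2, 22),
    ("GPT-2", "Transformers", "q4 (BnB)", 38.7, 26),
    ("GPT-2", "GGUF", "q4", 42.1, 24),
    ("Llama-7B", "Transformers", "fp16", 12.3, 81),
    ("Llama-7B", "Transformers", "q4 (BnB)", 15.8, 63),
    ("Llama-7B", "GGUF", "q4", 18.2, 55),
]


def ms_per_token(tokens_per_sec: float) -> float:
    """Per-token latency in milliseconds is the reciprocal of throughput."""
    return 1000.0 / tokens_per_sec


for model, backend, precision, tps, reported in ROWS:
    derived = ms_per_token(tps)
    print(f"{model:9} {backend:13} {precision:9} {derived:6.1f} ms (reported {reported})")
```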

## 🤝 Support

- **Documentation**: docs.cognicli.ai
- **Issues**: GitHub Issues
- **Discussions**: GitHub Discussions
- **Discord**: CogniCLI Community

## 📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

## 🙏 Acknowledgments

- **Hugging Face** for the transformers library and model hub
- **BitsAndBytes** for efficient quantization algorithms
- **llama.cpp team** for GGUF format and optimization
- **Rich** for the beautiful terminal interface
- **PyTorch** for the deep learning foundation

Made with ❤️ by the CogniCLI team

Transform your command line into an AI powerhouse 🚀

## Release history

| Version | Release date |
|---|---|
| 2.2.6 | Aug 23, 2025 |
| 2.2.5 | Aug 23, 2025 |
| 2.2.4 | Aug 23, 2025 |
| 2.2.3 | Aug 21, 2025 |
| 2.2.1 | Aug 21, 2025 |
| 2.2.0 | Aug 21, 2025 |
| 2.1.1 | Aug 19, 2025 |
| 2.1.0 | Aug 19, 2025 |
| 2.0.9 | Aug 19, 2025 |
| 2.0.8 | Aug 19, 2025 |
| 2.0.7 | Aug 19, 2025 |
| 2.0.6 | Aug 19, 2025 |
| 2.0.5 | Aug 19, 2025 |
| 2.0.3 | Aug 19, 2025 |
| 2.0.2 | Aug 19, 2025 |
| 2.0.1 | Aug 19, 2025 |
| 1.1.3 | Aug 17, 2025 |
| 1.1.2 (this version) | Aug 17, 2025 |
| 1.1.1 | Aug 17, 2025 |
| 1.1.0 | Aug 17, 2025 |
| 1.0.2 | Aug 17, 2025 |

## Download files

**Source distribution**

- `cognicli-1.1.2.tar.gz` (17.8 kB), uploaded Aug 17, 2025

**Built distribution**

- `cognicli-1.1.2-py3-none-any.whl` (15.9 kB), uploaded Aug 17, 2025, Python 3

### File details: `cognicli-1.1.2.tar.gz`

| Field | Value |
|---|---|
| Download URL | cognicli-1.1.2.tar.gz |
| Upload date | Aug 17, 2025 |
| Size | 17.8 kB |
| Tags | Source |
| Uploaded using Trusted Publishing? | No |
| Uploaded via | twine/6.1.0 CPython/3.11.0 |

| Algorithm | Hash digest |
|---|---|
| SHA256 | `31fd5dd636e582da443e52bde6b183cf540700ca47874880997979fefbb3c6d0` |
| MD5 | `1dd9cf6fae14ebc15156ffd6af57b58f` |
| BLAKE2b-256 | `1a9f33d8afda53d163dc2d65b1eb158d7b794cf28d033c96b5636ad8986b81cf` |

### File details: `cognicli-1.1.2-py3-none-any.whl`

| Field | Value |
|---|---|
| Download URL | cognicli-1.1.2-py3-none-any.whl |
| Upload date | Aug 17, 2025 |
| Size | 15.9 kB |
| Tags | Python 3 |
| Uploaded using Trusted Publishing? | No |
| Uploaded via | twine/6.1.0 CPython/3.11.0 |

| Algorithm | Hash digest |
|---|---|
| SHA256 | `dce55cfe5dd94e8a5ebc3d4dc0ff2aa5908fe39283034f537f06e8ff190a4a28` |
| MD5 | `80cbce9d77a004528563e93d53ae17a9` |
| BLAKE2b-256 | `ff8a29347997e8aebe030171041a02c1e2eacb331a320317ff79cc978cb13dff` |
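PyPI publishes these digests so a download can be checked before installing. A minimal sketch with Python's standard `hashlib` module streams the file in chunks and compares its SHA256 digest with the published value. The local path in the example is an assumption about where the wheel was saved.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Compute a file's SHA256 hex digest without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare against a published digest (case-insensitive, constant-time)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    wheel = Path("cognicli-1.1.2-py3-none-any.whl")  # assumed download location
    expected = "dce55cfe5dd94e8a5ebc3d4dc0ff2aa5908fe39283034f537f06e8ff190a4a28"
    if wheel.exists():
        print("OK" if matches_published_hash(wheel, expected) else "HASH MISMATCH")
```

pip can enforce the same check itself in hash-checking mode, for example with a requirements line `cognicli==1.1.2 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`.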
