FreeLlama
The simplest Python client for free Llama-3.3-70B access via public proxies
freellama is a lightweight, easy-to-use Python package that gives you instant access to the powerful Llama-3.3-70B-Instruct model — completely free, no API key, no registration required.
It works by communicating directly with public LLM web interfaces, making high-quality AI available to everyone with just a few lines of code.
Features
- Zero setup — no accounts, no keys
- Simple `.ask("your message")` interface
- Optional conversation memory (sends full history when `limit` is enabled)
- Per-conversation message limit with automatic reset
- Clean CLI for interactive chatting
- Streaming support
- Minimal dependencies (only `requests`)
Installation
```bash
pip install freellama
```

Requires Python 3.8+.
Quick Start
Programmatic Use
```python
from freellama import FreeLlama

# One-shot queries (no memory)
print(FreeLlama().ask("Tell me a joke"))

# With conversation memory
bot = FreeLlama(limit=20)  # remembers up to 20 user messages
bot.ask("My name is Alice")
print(bot.ask("What is my name?"))  # should recall "Alice" if the backend supports it
```
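Because the backends are public proxies, individual calls can fail or time out. Below is a minimal retry sketch around the documented `.ask()` method; the assumption that network failures surface as `requests` exceptions is ours (the package only depends on `requests`), not something the docs guarantee:

```python
import time

import requests
from freellama import FreeLlama

def ask_with_retry(prompt: str, retries: int = 3, delay: float = 2.0) -> str:
    """Retry a one-shot query a few times before giving up."""
    for attempt in range(retries):
        try:
            return FreeLlama().ask(prompt)
        except requests.exceptions.RequestException:
            # Assumption: freellama lets requests' network errors propagate.
            if attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear backoff

print(ask_with_retry("Tell me a joke"))
```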
Interactive Chat (CLI)
```bash
freellama              # starts interactive chat
freellama --limit 15   # enables memory, resets after 15 messages
freellama "Hello!"     # one-shot message
freellama --stream     # streaming output
```
Usage Examples
```python
from freellama import FreeLlama

# Persistent conversation
bot = FreeLlama(limit=10)
bot.ask("Explain quantum entanglement")
bot.ask("Now give a real-world analogy")
bot.ask("Make it simpler")
# ... continues with context until the limit is reached

# Quick stateless queries
questions = ["Capital of Japan?", "2+2?", "Best pizza topping?"]
for q in questions:
    print(FreeLlama().ask(q))
```
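The automatic reset described above means a long-running loop never accumulates unbounded history. A sketch of how the `limit` boundary might play out, assuming the reset simply clears the stored history once the limit is reached (the precise semantics are inferred from the docs, not confirmed):

```python
from freellama import FreeLlama

bot = FreeLlama(limit=3)
bot.ask("My favorite color is teal.")         # user message 1
bot.ask("Please remember that.")              # user message 2
print(bot.ask("What is my favorite color?"))  # message 3: history still intact
# After 3 user messages the conversation resets automatically,
# so the next question starts from a blank slate:
print(bot.ask("What is my favorite color?"))  # message 4: fresh conversation
```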
CLI Options
```text
$ freellama --help
usage: freellama [-h] [--limit LIMIT] [--stream] [message]

FreeLlama - Free Llama-3.3-70B Chat Client

positional arguments:
  message        Send a single message and exit

options:
  -h, --help     show this help message
  --limit LIMIT  Enable memory: max user messages before conversation reset
  --stream       Show response token-by-token
```
Important Note on Memory
The underlying public services are free anonymous proxies and do not guarantee multi-turn conversation memory.
When you set `--limit` (or `limit=N` in code), FreeLlama sends the full conversation history with every request, which is the standard chat format and gives the best possible chance of context retention.
Memory works best in short conversations and may vary with backend load and model routing.
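For clarity, "full conversation history" in chat-style APIs conventionally means a list of role-tagged messages that is re-sent in its entirety on each turn. A sketch of the shape such a history takes; the exact wire format freellama sends to the proxies is an assumption on our part, not taken from the package source:

```python
# Hypothetical history payload after two turns with limit enabled.
# Each .ask() appends the new user message and re-sends everything.
history = [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
    {"role": "user", "content": "What is my name?"},  # current turn
]
```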
Author
IMApurbo
GitHub: @IMApurbo
License
MIT License — free to use, modify, and distribute.
Enjoy frontier-level AI for free — no barriers, no costs! 🚀
Made with ❤️ by IMApurbo