
An AI library (in development)

Project description

corellm — LLM integration library

corellm is an LLM integration library that provides an easy-to-use API for running local large language models (LLMs) via llama-cpp-python and for instantly building a web GUI via Gradio. The library exposes a single high-level LLM class for chat/prompt interactions (including streaming) and an Interface class that renders a Gradio-based chat interface.

Local LLMs can be downloaded as .gguf models. You can find these on Hugging Face.


Quick Start

from corellm import LLM

llm = LLM(path="models/your-model.gguf")
resp = llm.chat("Hello — what can you do?")
print(resp)

Note: corellm uses llama-cpp-python under the hood to load and run GGUF models. Gradio is used only by the optional Interface class. Both dependencies are installed automatically when you install corellm.


Concepts

  • Memory: LLM keeps a conversation history in memory, a list of role/content dictionaries ({"role": "system"|"user"|"assistant", "content": str}) similar to standard chat formats. The first element is always the system prompt (see the example after this list).
  • System prompt: The initial system instruction (defaults to "You are a helpful CoreLLM assistant."). It is stored as memory[0].
  • Stateless prompt: A one-off prompt that does not modify memory.
  • Streaming: Token-by-token output is available via generators for both chat and stateless prompts.
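
For illustration, a memory list after a single chat() exchange might look like this (the user and assistant strings here are hypothetical):

# Hypothetical snapshot of llm.get_memory() after one chat() call
memory = [
    {"role": "system", "content": "You are a helpful CoreLLM assistant."},  # always memory[0]
    {"role": "user", "content": "Hello! Who are you?"},
    {"role": "assistant", "content": "I am a CoreLLM assistant. Ask me anything."},
]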

API Reference

The reference below documents the classes and methods you can interact with. The types and behavior described here match the exposed runtime behavior.

class corellm.LLM

Create an LLM instance and interact with the model.

LLM(path: str, system_prompt: str = "You are a helpful CoreLLM assistant.", max_tokens: int = 4096) -> None

Create a new LLM instance.

  • path: Filesystem path to the GGUF model file to load.
  • system_prompt: Initial system message persisted at the start of memory.
  • max_tokens: Maximum context window (passed to the underlying model).

Raises: Errors from the model loader (for example if path is invalid).

Example:

llm = LLM(path="models/your-model.gguf")

chat(prompt: str) -> str

Send a prompt as a user message and store both the prompt and the assistant response in conversation memory.

  • prompt: Text to send to the model.
  • Returns: the assistant's response string.

Behavior:

  • Appends {"role":"user","content": prompt} to memory before calling the model.
  • Appends {"role":"assistant","content": response} to memory after receiving the reply.

Example:

resp = llm.chat("Summarize this paragraph.")
print(resp)
# memory now contains the user prompt and the assistant response

prompt(prompt: str, use_memory: bool = True) -> str

Send a prompt but do not modify persistent conversation memory.

  • prompt: Text to send to the model.
  • use_memory: If True, the model call uses the current memory as context (so it is influenced by prior messages); if False, only the current system prompt and the supplied prompt are used.
  • Returns: the assistant's response string.

Example:

# Ask a one-off question without recording it in memory
answer = llm.prompt("Translate to Spanish: 'Good morning'", use_memory=False)

chat_stream(prompt: str, print_stream: bool = False) -> Generator[str] | None

Stream tokens for a chat()-style interaction and add the final response to memory.

  • prompt: The prompt text to send (will be appended to memory as a user message).
  • print_stream: If True, tokens are printed to stdout as they are produced; the method consumes the stream itself and returns None. If False, the method returns a generator that yields token strings, which you can iterate to consume the streamed output.
  • Returns: a generator that yields token chunks (if print_stream is False), or None if print_stream is True.

Behavior notes:

  • The full assistant message is appended to memory after the stream completes.
  • If you want to collect the whole response while streaming, iterate the generator and concatenate tokens.

Examples:

# Manual consumption of stream
gen = llm.chat_stream("Explain recursion simply.")
response = ""
for token in gen:
    response += token
print("Full response:", response)

# Auto-run printing
llm.chat_stream("Tell me a short joke.", print_stream=True)
# prints the tokens as they arrive and returns None

prompt_stream(prompt: str, print_stream: bool = False, use_memory: bool = True) -> Generator[str] | None

Stream tokens for a stateless prompt (does not add assistant response to memory).

  • prompt: The prompt text.
  • print_stream: If True, tokens are printed to stdout while the stream runs and the method returns None. If False, it returns a generator you can iterate.
  • use_memory: If True, the stream uses the current memory as context; if False, only system prompt + the prompt are used.
  • Returns: a generator yielding token chunks (if print_stream is False), otherwise None.

Example:

# Collect tokens from the stream and build the final string
tokens = []
for t in llm.prompt_stream("Write a haiku about autumn.", use_memory=False):
    tokens.append(t)
haiku = "".join(tokens)
print(haiku)

clear_memory() -> None

Reset conversation memory to only the system prompt.

Example:

llm.clear_memory()

set_memory(memory: list[dict[str, str]]) -> None

Replace the entire conversation memory.

  • memory must be a list of dictionaries where each dict has at least the keys "role" and "content". Typical roles: "system", "user", "assistant".

Example:

llm.set_memory([
    {"role": "system", "content": "You are an assistant that replies in bullet points."},
    {"role": "user", "content": "List the steps to boil an egg."},
    {"role": "assistant", "content": "1. ... 2. ..."}
])

get_memory() -> list[dict[str, str]]

Return the current conversation memory.

Example:

history = llm.get_memory()

modify_system_prompt(system_prompt: str) -> None

Change the system prompt (overwrites memory[0]).

Example:

llm.modify_system_prompt("You are concise and formal.")

class corellm.Interface

A small utility class that builds and launches a Gradio-based chat interface.

Interface(llm: LLM, port: int = 3001) -> None

Create an Interface bound to an LLM instance.

  • llm: A corellm.LLM instance.
  • port: Port to run the Gradio server on.

render() -> None

Launch the Gradio chat interface in a browser. The UI will reflect the LLM's current memory (user/assistant pairs are shown). The interface streams responses if the underlying LLM streams.

Notes:

  • This requires Gradio to be installed (gradio package).
  • The method launches a local server and opens a browser tab by default (standard Gradio behavior).

Example:

from corellm import LLM, Interface

llm = LLM(path="models/your-model.gguf")
ui = Interface(llm, port=3001)
ui.render()

Examples

1) Simple chat (stateful)

from corellm import LLM

llm = LLM(path="models/your-model.gguf")
print(llm.chat("Hello! Who are you?"))

# The model remembers that conversation
print(llm.chat("What did I ask you previously?"))

2) One-off stateless prompt

summary = llm.prompt("Summarize this text: <paste text here>", use_memory=False)
print(summary)

# memory is unchanged by prompt(..., use_memory=False)

3) Streaming and collecting tokens manually

gen = llm.chat_stream("Explain the difference between TCP and UDP.")
collected = []
for token in gen:
    collected.append(token)
final_response = "".join(collected)
print(final_response)
# full response is now also appended to memory

4) Streaming with automatic printing

# prints tokens to stdout as they arrive; returns None
llm.prompt_stream("Write a short limerick about a coder.", print_stream=True, use_memory=False)

5) Memory manipulation

# Reset to fresh system prompt
llm.clear_memory()

# Overwrite memory with a custom conversation
llm.set_memory([
    {"role":"system","content":"You are an assistant that always answers in JSON."},
    {"role":"user","content":"Provide metadata for a book."}
])

# Inspect
print(llm.get_memory())

# Change system prompt (does not remove other messages)
llm.modify_system_prompt("You are an assistant that replies in YAML.")

6) Launching the GUI (optional)

from corellm import LLM, Interface

llm = LLM(path="models/your-model.gguf")
ui = Interface(llm, port=3001)
ui.render()  # launches Gradio and opens browser

Usage notes & best practices

  • Model path: Provide the correct GGUF model file path when constructing LLM. The library will pass this to llama_cpp to load the model.

  • Dependencies: llama-cpp-python is used for low-level operations with LLM models. gradio is used in the Interface class to render the chat window.

  • Memory size and tokens: Be mindful of the max_tokens/context window. Large histories can exceed the model's context limit and may cause errors or truncated context (a trimming sketch follows this list).

  • Streaming behavior:

    • If you pass print_stream=True to chat_stream or prompt_stream, the method will print tokens as they arrive and will return None. If you need the generator to process tokens yourself, set print_stream=False.
    • The chat_stream method appends the final assistant message to memory after the stream completes; prompt_stream does not.
  • Threading / concurrency: The underlying model runtime may not be thread-safe. For concurrent usage, create separate LLM instances per thread or process, or serialize model access.

  • Error handling: Errors from model loading or runtime will bubble up from llama-cpp-python (e.g., file not found, out-of-memory). Catch exceptions around model creation and calls as needed.
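
As a rough sketch of the memory-size note above, you can trim old history yourself with get_memory()/set_memory(); the keep_last value here is an arbitrary choice for illustration, not a library setting:

# Keep the system prompt plus only the most recent messages
keep_last = 6  # arbitrary number of recent messages to retain
history = llm.get_memory()
if len(history) > keep_last + 1:
    trimmed = [history[0]] + history[-keep_last:]  # system prompt + latest messages
    llm.set_memory(trimmed)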


Troubleshooting

  • Model fails to load: Confirm the path you provided is correct and points to a GGUF model file compatible with your llama-cpp build (see the loading sketch after this list).
  • No output or empty responses: Check model compatibility and the tokenization/context length. Try with a smaller prompt or a different model to isolate the issue.
  • Streaming does not yield: If you call streaming with print_stream=True, the function will auto-consume the stream and return None. To get a generator you can iterate, use print_stream=False.
  • GUI does not open: If Interface.render() seems to hang, check your environment (headless servers often need inbrowser=False or share=True depending on your configuration). Also ensure gradio is installed.
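
A minimal sketch of the model-loading advice above, assuming only that loader errors propagate as ordinary Python exceptions:

from corellm import LLM

try:
    llm = LLM(path="models/your-model.gguf")
except Exception as exc:  # e.g. invalid path or incompatible GGUF file
    print(f"Failed to load model: {exc}")
    raise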

Minimal requirements

  • Python 3.8+

  • Optional runtime dependencies only if you use those features:

    • llama-cpp-python — required for model execution
    • gradio — required only to use Interface

Design intent

corellm aims to provide a clear and compact API for integrating local LLMs into applications with minimal friction:

  • Make it straightforward to do stateful chat and stateless prompts.
  • Provide token-level streaming so applications can render progressive output.
  • Keep a simple and inspectable conversation memory model.
  • Offer a small, optional web UI for development and demos.

corellm's API exposes a stable LLM and Interface abstraction so you can integrate model-powered features quickly.


Contributing

Contributions, bug reports, and feature requests are welcome. When opening issues or PRs, include:

  • Minimal reproduction (snippets)
  • Model path or example model (if relevant)
  • Error tracebacks

corellm is maintained by Jadon Duff.


License

MIT License

Copyright (c) 2025 Jadon Duff

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Download files

Download the file for your platform.

Source Distribution

corellm-0.1.0.tar.gz (12.7 kB)

Built Distribution


corellm-0.1.0-py3-none-any.whl (9.0 kB)

File details

Details for the file corellm-0.1.0.tar.gz.

File metadata

  • Download URL: corellm-0.1.0.tar.gz
  • Upload date:
  • Size: 12.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for corellm-0.1.0.tar.gz

  • SHA256: 158e25c2204ece56ce7a2a9f97b53f024f3deeada96537c46eeb5b12d568576e
  • MD5: 88c179e5605b747399bf0bc519a87dd9
  • BLAKE2b-256: d52de241a741a32af1f1cf89727cb77178f2e5af6537001f4b0c1edd68853758


File details

Details for the file corellm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: corellm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for corellm-0.1.0-py3-none-any.whl

  • SHA256: 0ea4b2a3c3d8cbf8e4e9736a94e4277f9978ea9420b0d929888e8f502374602b
  • MD5: 807457abae9ef666fa255f37fc5dc7d8
  • BLAKE2b-256: f6a280e1bd3477fe871c0587ac6ca697826b22fc96d3e24e3f24c33c50ad979e

