Standalone LFM ONNX inference with first-run Hugging Face download and local cache
LFM ONNX HF Library
Install
pip install lfm-onnx-hf
Python Usage
Basic Prompt (sync, stream=False)
from lfm_onnx_hf import LFMOnnxEngine, GenerationConfig
engine = LFMOnnxEngine()
text = engine.basic_prompt(
    "What is the capital of France?",
    stream=False,
    generation=GenerationConfig(
        max_new_tokens=64,
        temperature=0.0,
        top_k=50,
        repetition_penalty=1.05,
        seed=42,
    ),
)
print(text)
Basic Prompt (sync, stream=True)
from lfm_onnx_hf import LFMOnnxEngine
engine = LFMOnnxEngine()
for chunk in engine.basic_prompt("Write a one-line poem about the sea.", stream=True):
    print(chunk, end="", flush=True)
print()
Basic Prompt + Assistant Prefill
from lfm_onnx_hf import LFMOnnxEngine
engine = LFMOnnxEngine()
text = engine.basic_prompt(
    "Return strict JSON with fields city and country.",
    assistant_prefill="```json\n",
    stream=False,
)
print(text)
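With a fenced-JSON prefill, the completion will often end with a closing fence, and this page does not say whether the returned text includes the prefill itself. A minimal sketch of a tolerant parser that accepts either case; `parse_fenced_json` is a hypothetical helper and not part of lfm_onnx_hf:

```python
import json

# Built at runtime so this example contains no literal fence marker.
FENCE = "`" * 3


def parse_fenced_json(text: str):
    """Parse JSON from output that may be wrapped in a Markdown code fence.

    Hypothetical helper (not part of lfm_onnx_hf). It tolerates output with
    or without the opening fence and with or without the closing fence.
    """
    body = text.strip()
    # Drop the opening fence line (with its optional language tag) if present.
    if body.startswith(FENCE):
        first_newline = body.find("\n")
        body = body[first_newline + 1:] if first_newline != -1 else ""
    # Cut everything from the closing fence onward, if there is one.
    fence_at = body.find(FENCE)
    if fence_at != -1:
        body = body[:fence_at]
    return json.loads(body)
```

Passing the `text` from the example above to `parse_fenced_json(text)` would then yield a dict, or raise `json.JSONDecodeError` if the model did not produce valid JSON.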
Chat Input (sync, stream=False)
from lfm_onnx_hf import LFMOnnxEngine
engine = LFMOnnxEngine()
turns = [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "My name is Ana."},
    {"role": "assistant", "content": "Nice to meet you, Ana."},
    {"role": "user", "content": "What is my name?"},
]
text = engine.chat_input(turns, stream=False)
print(text)
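Because the turns are plain dicts, a conversation can be carried forward by appending each reply before the next call. A minimal sketch, assuming `chat_input(turns, stream=False)` returns the assistant reply as a string as in the example above; `ChatSession` is a hypothetical wrapper that takes the call as a function, so it can be exercised without loading a model:

```python
from typing import Callable, Dict, List

Turn = Dict[str, str]


class ChatSession:
    """Hypothetical wrapper that accumulates turns across calls."""

    def __init__(self, reply_fn: Callable[[List[Turn]], str], system: str = ""):
        self._reply_fn = reply_fn
        self.turns: List[Turn] = []
        if system:
            self.turns.append({"role": "system", "content": system})

    def ask(self, user_text: str) -> str:
        """Append the user turn, get a reply, and record it in the history."""
        self.turns.append({"role": "user", "content": user_text})
        # Pass a copy so the callee cannot mutate the stored history.
        reply = self._reply_fn(list(self.turns))
        self.turns.append({"role": "assistant", "content": reply})
        return reply
```

With a real engine this might be wired up as `ChatSession(lambda turns: engine.chat_input(turns, stream=False), system="Be concise.")`.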
Chat Input (sync, stream=True)
from lfm_onnx_hf import LFMOnnxEngine
engine = LFMOnnxEngine()
turns = [{"role": "user", "content": "Give me 3 short productivity tips."}]
for chunk in engine.chat_input(turns, stream=True):
    print(chunk, end="", flush=True)
print()
Full Generate API (sync)
from lfm_onnx_hf import LFMOnnxEngine
engine = LFMOnnxEngine()
messages = [{"role": "user", "content": "Explain recursion in 2 sentences."}]
text, stats = engine.generate(
    messages=messages,
    max_new_tokens=80,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    seed=7,
    assistant_prefill="",
)
print(text)
print(stats)
Async Usage (stream=False and stream=True)
import asyncio
from lfm_onnx_hf import LFMOnnxEngine
async def main():
    engine = LFMOnnxEngine()

    # async non-stream
    text = await engine.basic_prompt_async(
        "One word for water in French?",
        stream=False,
    )
    print(text)

    # async stream
    stream_iter = await engine.chat_input_async(
        [{"role": "user", "content": "List 5 planets."}],
        stream=True,
    )
    async for chunk in stream_iter:
        print(chunk, end="", flush=True)
    print()

asyncio.run(main())
Hugging Face Source
By default, first use downloads from:
- Repo: cnmoro/LFM-Q4-GGUFS
- Subfolder: 2_5_350m
- Model: model_q4.slim.spec.strip.min.onnx
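Together, these three values identify a single file on the Hugging Face Hub. As a rough sketch of how they combine, the snippet below builds the Hub's standard `resolve` URL for that file. The `main` revision and the `hub_file_url` helper are assumptions for illustration; how the library itself performs the download is not documented here.

```python
from urllib.parse import quote

DEFAULT_REPO_ID = "cnmoro/LFM-Q4-GGUFS"
DEFAULT_SUBFOLDER = "2_5_350m"
DEFAULT_MODEL = "model_q4.slim.spec.strip.min.onnx"


def hub_file_url(repo_id, subfolder, filename, revision="main"):
    """Build a Hub 'resolve' URL for one file (hypothetical helper).

    The default revision of "main" is an assumption, not a documented
    default of lfm_onnx_hf.
    """
    path = f"{subfolder}/{filename}" if subfolder else filename
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(path)}"
    )


print(hub_file_url(DEFAULT_REPO_ID, DEFAULT_SUBFOLDER, DEFAULT_MODEL))
```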
CLI Usage
Basic
lfm-onnx-hf \
--prompt "What is the capital of France?" \
--max-new-tokens 64 \
--temperature 0.0
Stream Output
lfm-onnx-hf \
--prompt "Write a short haiku about rain" \
--stream
Assistant Prefill
lfm-onnx-hf \
--prompt "Return strict JSON with fields city and country" \
--assistant-prefill '```json\n'
Multi-turn Messages
lfm-onnx-hf \
--messages-json '[{"role":"system","content":"Be concise."},{"role":"user","content":"Summarize photosynthesis in one paragraph."}]'
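Hand-writing JSON inside shell quotes gets fragile once a message contains quotes or apostrophes. One way around that, sketched below, is to build the argument list in Python and let `json.dumps` and `shlex.join` handle escaping. `build_messages_command` is a hypothetical helper; it uses only flags listed on this page.

```python
import json
import shlex


def build_messages_command(messages, max_new_tokens=None, stream=False):
    """Return an argv list for the lfm-onnx-hf CLI (hypothetical helper)."""
    argv = [
        "lfm-onnx-hf",
        "--messages-json",
        json.dumps(messages, ensure_ascii=False),
    ]
    if max_new_tokens is not None:
        argv += ["--max-new-tokens", str(max_new_tokens)]
    if stream:
        argv.append("--stream")
    return argv


messages = [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "What's photosynthesis?"},
]
argv = build_messages_command(messages, max_new_tokens=64)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

The same `argv` list can be passed directly to `subprocess.run(argv)`, which avoids shell quoting altogether.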
HF Options
lfm-onnx-hf \
--repo-id cnmoro/LFM-Q4-GGUFS \
--subfolder 2_5_350m \
--model model_q4.slim.spec.strip.min.onnx \
--download-max-retries 8 \
--download-initial-backoff 1.5
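The two download flags suggest a retry loop with a growing delay. The exact policy is not documented here, so the following is only a sketch of one common scheme, exponential backoff with a doubling factor. The factor, the `OSError` trigger, and both function names are assumptions.

```python
import time


def backoff_delays(max_retries, initial_backoff, factor=2.0):
    """Delays in seconds before each retry, growing by `factor` (assumed)."""
    return [initial_backoff * factor**attempt for attempt in range(max_retries)]


def call_with_retries(fn, max_retries=8, initial_backoff=1.5, sleep=time.sleep):
    """Call fn(), retrying on OSError up to max_retries times.

    Hypothetical helper: one initial attempt plus max_retries retries,
    sleeping for the matching backoff delay before each retry.
    """
    delays = backoff_delays(max_retries, initial_backoff)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except OSError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last error
            sleep(delays[attempt])


print(backoff_delays(3, 1.5))
```

Under this doubling assumption, `--download-max-retries 8` with `--download-initial-backoff 1.5` would wait up to 382.5 seconds in total before giving up.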
Common CLI Options
- --repo-id
- --subfolder
- --model
- --revision
- --token
- --cache-root
- --prompt
- --system
- --messages-json
- --max-new-tokens
- --temperature
- --top-k
- --repetition-penalty
- --seed
- --stream
- --assistant-prefill
- --benchmark-runs
- --provider
- --intra-op-threads
- --inter-op-threads
- --download-max-retries
- --download-initial-backoff