A minimal Flask API server for local HuggingFace LLMs

LLM REST API

The simplest possible Python code for serving LLM inference calls as a REST API, together with a simple client for it.

In this setup, both the server and the client are written in Python and run on the same computer.

This is the basic code to use if you want to run an LLM on your own server or computer and make it accessible to local applications and code.

Installation

To install the package, clone the repository and run the following command in the project root directory:

pip install .

You can install the dependencies using:

pip install -r requirements.txt

Usage

Configuration

Configuration settings, such as the model path and device, are defined in the server's settings module (src/setting.py; see the project structure below). Make sure to update them for your environment.
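
For illustration, the defaults collected there might look like the following sketch; the variable names here are assumptions, not the project's actual code.

# Hypothetical sketch of src/setting.py; the real names may differ.
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # HuggingFace model id to load
MAX_NEW_TOKENS = 100  # cap on tokens generated per request
DEVICE = "cpu"        # "cpu", a GPU index like 0, or "cuda:0"
HOST = "127.0.0.1"    # address the Flask server binds to
PORT = 5000           # port the Flask server listens on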

Starting the Server

To start the LLM inference server, run:

 python -m min_llm_server_client.src.local_llm_inference_server_api

You can pass arguments such as:

  • --model_name
  • --max_new_tokens
  • --device

For example:

 python -m min_llm_server_client.src.local_llm_inference_server_api --model_name meta-llama/Llama-3.3-70B-Instruct --max_new_tokens 100 --device cuda:1

--device can be cpu, or a GPU index such as 0 or 1 to select which GPU to use (the cuda:1 form above also works).

Running on CPU:

 python -m min_llm_server_client.src.local_llm_inference_server_api --model_name openai/gpt-oss-20b --max_new_tokens 100 --device cpu

Testing from a browser or the command line:

GET test: curl http://127.0.0.1:5000/llm/q

Or a POST test without a key: curl -X POST http://127.0.0.1:5000/llm/q -H "Content-Type: application/json" -d '{"query": "what is earth?"}'

POST test with an API key: curl -X POST http://127.0.0.1:5000/llm/q -H "Content-Type: application/json" -d '{"query": "what is earth?", "key": "key1"}'
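
If you prefer Python to curl, the same POST can be issued with the requests library (a sketch; requests is not among the listed dependencies, and the shape of the JSON reply depends on the server):

import requests

# Mirror of the curl POST above: send a query plus an API key.
resp = requests.post(
    "http://127.0.0.1:5000/llm/q",
    json={"query": "what is earth?", "key": "key1"},
    timeout=120,  # CPU inference can take tens of seconds (see timings below)
)
print(resp.json())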

Local test runs using Llama 3.1 8B:

On an Intel CPU, a query takes about 30 seconds and uses about 2.4 GB of CPU memory. On an A100 GPU, it takes less than a second and uses about 34 GB of GPU memory and 4.8 GB of CPU memory.

Author's contact:

sadeghi.afshin@gmail.com

License

This project is open source, licensed under the Apache 2.0 License. See the LICENSE file for more details.

Explanation

This project provides a simple REST API server and client for interacting with a local language model (LLM) inference server. The server is built using Flask and allows users to send queries to the model and receive generated responses.
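
To make that pattern concrete, here is a minimal sketch of such a Flask server wrapping a HuggingFace pipeline behind /llm/q. It is an illustration under assumed names, not the project's actual implementation (which adds argument parsing and key handling):

from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)
# A small model keeps the sketch runnable on CPU; the real server loads the
# model chosen via --model_name instead.
generator = pipeline("text-generation", model="gpt2", device=-1)  # -1 = CPU

@app.route("/llm/q", methods=["GET", "POST"])
def llm_query():
    if request.method == "GET":
        # Liveness check, as in the curl GET test above.
        return jsonify({"status": "ok"})
    data = request.get_json(force=True)
    out = generator(data["query"], max_new_tokens=100)
    return jsonify({"response": out[0]["generated_text"]})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)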

Project Structure
min_llm_server_client
├── src
│   ├── __init__.py
│   ├── local_llm_inference_api_client.py
│   ├── local_llm_inference_server_api.py
│   └── setting.py
├── setup.py
└── README.md

Using it in third-party code

Sending Queries:

To interact with the server, you can use the client provided in src/local_llm_inference_api_client.py. This client includes functions to send queries to the server and handle responses.

Example usage

Here is a simple example of how to send a query to the server:

from min_llm_server_client.src.local_llm_inference_api_client import send_query

# Query the local server with an optional user id and API key.
response = send_query("What is the capital of France?", user="user1", key="key1")
print(response)

Dependencies

This project requires the following Python packages:

  • Flask
  • transformers
  • sentencepiece
