A local language model API for sharing models between programs.

Project description

ShareLMAPI

English | 中文

ShareLMAPI is a local language model sharing API built on FastAPI. It lets different programs share the same locally loaded model, reducing resource consumption, and supports streaming and non-streaming generation as well as several model loading methods.

Table of Contents

  • Features
  • Installation
  • Configuration
  • Usage
  • API Documentation
  • Client Usage
  • Testing
  • Contributing
  • License

Features

  • Support for multiple model loading methods: default, BitsAndBytes quantization, PEFT
  • Support for streaming and non-streaming text generation
  • Support for dialogue history and system prompts
  • Easy to configure and extend
  • Flexible model server URL configuration

Installation

1. Clone the repository

git clone https://github.com/yourusername/ShareLMAPI.git
cd ShareLMAPI

2. Install dependencies

Dependencies can be installed using Conda or Pip.

Using Conda:

conda env create -f environment.yml
conda activate ShareLMAPI

Using Pip:

pip install -r requirements.txt

3. Install for local development

If you plan to develop this package, install it in editable mode so that local changes take effect immediately:

pip install -e .

Configuration

  1. Navigate to the configs directory and open model_config.yaml.
  2. Modify the configuration according to your needs. You can specify:
    • Model name
    • Loading method (default, bitsandbytes, or peft)
    • Device (CPU or CUDA)
    • Other model-specific settings
    • Model server URL

Example configuration:

model:
  name: "gpt-2"
  loading_method: "default"
  default:
    device: "cuda"
  bitsandbytes:
    device: "cuda"
    quantization_config:
      quant_type: "nf4"
      load_in_4bit: True
      bnb_4bit_quant_type: "nf4"
      bnb_4bit_compute_dtype: "float16"
      bnb_4bit_use_double_quant: False
  peft:
    device: "cuda"
    peft_type: "lora"
    peft_config:
      r: 8
      lora_alpha: 16
      lora_dropout: 0.1
      target_modules: ["q_proj", "v_proj"]

model_server:
  model_server_url: "http://localhost:5000"
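
The server presumably reads this file with a standard YAML parser. Here is a minimal sketch for inspecting the active configuration yourself, assuming PyYAML is installed and the file lives at configs/model_config.yaml:

import yaml

# Load the model configuration that the servers will use
with open("configs/model_config.yaml") as f:
    config = yaml.safe_load(f)

print(config["model"]["name"])                     # e.g. "gpt-2"
print(config["model"]["loading_method"])           # "default", "bitsandbytes", or "peft"
print(config["model_server"]["model_server_url"])  # e.g. "http://localhost:5000"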

Usage

Start the model server

First, start the model server to load and manage the language model:

uvicorn ShareLMAPI.server.model_server:app --host 0.0.0.0 --port 5000

Start the frontend API server

After the model server is running, start the frontend server to handle client requests:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker ShareLMAPI.server.server:app --bind 0.0.0.0:8000
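
To confirm that both servers are up, you can query FastAPI's auto-generated OpenAPI schema, which is served at /openapi.json by default (assuming the docs routes have not been disabled). A minimal sketch using requests:

import requests

# Both endpoints should return HTTP 200 once the servers are ready
for url in ("http://localhost:5000/openapi.json", "http://localhost:8000/openapi.json"):
    r = requests.get(url, timeout=5)
    print(url, "->", r.status_code)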

API Documentation

1. /generate_stream

Generate model responses and stream the results.

  • Method: POST
  • URL: http://localhost:8000/generate_stream
  • Parameters:
    • dialogue_history: List of dialogue messages (optional)
    • prompt: User input prompt (used when dialogue_history is not provided)
    • max_length: Maximum number of tokens to generate
    • temperature: Parameter to control generation randomness
    • generation_kwargs: Other generation parameters (optional)
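
Since the stream's wire format is not specified here, the sketch below simply prints raw bytes as they arrive; the endpoint and field names are taken from the parameter list above:

import requests

payload = {
    "prompt": "Once upon a time",
    "max_length": 50,
    "temperature": 0.7,
}

# stream=True keeps the connection open and yields chunks as they are generated
with requests.post("http://localhost:8000/generate_stream", json=payload, stream=True) as r:
    r.raise_for_status()
    for chunk in r.iter_content(chunk_size=None):
        print(chunk.decode("utf-8", errors="replace"), end="", flush=True)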

2. /generate

Generate model responses without streaming.

  • Method: POST
  • URL: http://localhost:8000/generate
  • Parameters: Same as /generate_stream
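
A corresponding non-streaming call, assuming the same payload fields; the exact shape of the response body is not documented here, so treat the parsing as a placeholder:

import requests

payload = {"prompt": "What is the capital of France?", "max_length": 50}
r = requests.post("http://localhost:8000/generate", json=payload)
r.raise_for_status()
print(r.text)  # may be plain text or JSON, depending on the server's response format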

Client Usage

Install

pip install ShareLMAPI

Here's an example of how to use ShareLMAPIClient to call the API:

from ShareLMAPI.client import ShareLMAPIClient

# Create API client
client = ShareLMAPIClient(base_url="http://localhost:8000")

# Streaming generation
for chunk in client.generate_text("Once upon a time", max_length=50, streamer=True):
    print(chunk, end='', flush=True)

# Non-streaming generation
response = client.generate_text("What is the capital of France?", max_length=50, streamer=False)
print(response)

# Using dialogue history
dialogue_history = [
    {"role": "user", "content": "Hi, who are you?"},
    {"role": "assistant", "content": "I'm an AI assistant. How can I help you today?"},
    {"role": "user", "content": "Can you explain quantum computing?"}
]
response = client.generate_text(dialogue_history=dialogue_history, max_length=200, streamer=False)
print(response)
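
The API also accepts a generation_kwargs field. Assuming the client forwards it to the server unchanged (this is not shown in the example above, so treat it as a hypothetical), a call could look like:

# Hypothetical: assumes generate_text() passes generation_kwargs through to the server
response = client.generate_text(
    "Write a haiku about the sea",
    max_length=60,
    streamer=False,
    generation_kwargs={"top_p": 0.9, "repetition_penalty": 1.1}
)
print(response)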

Testing

Run the following command in the project root directory to execute tests:

pytest -s tests/test_client.py

This runs the tests and, because of the -s flag, displays their standard output.

Contributing

Contributions of any kind are welcome. Please follow these steps:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is open-sourced under the MIT License. See the LICENSE file for more details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sharelmapi-0.1.1.tar.gz (9.9 kB)

Uploaded Source

Built Distribution

ShareLMAPI-0.1.1-py3-none-any.whl (17.0 kB)

Uploaded Python 3

File details

Details for the file sharelmapi-0.1.1.tar.gz.

File metadata

  • Download URL: sharelmapi-0.1.1.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for sharelmapi-0.1.1.tar.gz

  • SHA256: bc997492dbbe5dce17260588eb740455b6e7d467522bdd4ca1b9bad86acacac8
  • MD5: 1a18af08d9318ad512cbbce8cf7ff5f9
  • BLAKE2b-256: d8014b9a4175c46685d75d954c1c13a23ef764ca234ad70920daa7d2c82ec8da

See more details on using hashes here.

File details

Details for the file ShareLMAPI-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: ShareLMAPI-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for ShareLMAPI-0.1.1-py3-none-any.whl

  • SHA256: faaf16ec76afeff3c7e252cdcbe1a149d9cc7726fcdfc71a286bb2b265843136
  • MD5: e0df82bd13f20c2bbdb189ed9372e6ea
  • BLAKE2b-256: 0c9c8f5f29b6554e74d74fd431272056eae39851336fe0dc6e9ecdea9bda4205

See more details on using hashes here.
