
ShareLMAPI

English | 中文

ShareLMAPI is a local language model sharing API that uses FastAPI to expose HTTP endpoints, allowing different programs to share the same locally hosted model and thereby reduce resource consumption. It supports streaming generation and multiple model configuration methods.

Table of Contents

  • Features
  • Installation
  • Configuration
  • Usage
  • Docker Guide
  • API Documentation
  • Client Usage
  • Testing
  • Contributing
  • License

Features

  • Support for multiple model loading methods: default, BitsAndBytes quantization, PEFT
  • Support for streaming and non-streaming text generation
  • Support for dialogue history and system prompts
  • Easy to configure and extend
  • Flexible model server URL configuration

Installation

1. Clone the Repository

git clone https://github.com/starpig1129/ShareLMAPI.git
cd ShareLMAPI

2. Install Dependencies

Dependencies can be installed using either Conda or Pip.

Using Conda:

conda env create -f environment.yml
conda activate ShareLMAPI

Using Pip:

pip install -r requirements.txt

3. Local Installation

To use the package from other local programs, install it in editable mode:

pip install -e .

Configuration

  1. Navigate to the configs directory and open model_config.yaml.
  2. Modify the configuration according to your needs. You can specify:
    • Model name
    • Loading method (default, bitsandbytes, or peft)
    • Device (CPU or CUDA)
    • Other model-specific settings
    • Model server URL

Configuration example:

model:
  name: "gpt-2"
  loading_method: "default"
  default:
    device: "cuda"
  bitsandbytes:
    device: "cuda"
    quantization_config:
      quant_type: "nf4"
      load_in_4bit: True
      bnb_4bit_quant_type: "nf4"
      bnb_4bit_compute_dtype: "float16"
      bnb_4bit_use_double_quant: False
  peft:
    device: "cuda"
    peft_type: "lora"
    peft_config:
      r: 8
      lora_alpha: 16
      lora_dropout: 0.1
      target_modules: ["q_proj", "v_proj"]

model_server:
  model_server_url: "http://localhost:5000"
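
For reference, here is a minimal sketch of how the bitsandbytes section above maps onto Hugging Face's transformers API. The package's own loader handles this internally; this standalone example only illustrates what the quantization_config fields mean (BitsAndBytesConfig and its arguments are real transformers names, while the surrounding usage is illustrative):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the quantization_config block above. "nf4" selects 4-bit
# NormalFloat quantization; double quantization is disabled as in the YAML.
# Requires the bitsandbytes package to be installed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# "gpt2" is the Hugging Face Hub id for the GPT-2 model named above.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=bnb_config,
    device_map="cuda",
)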

Usage

Start the Model Server

First, start the model server to load and manage the language model:

uvicorn ShareLMAPI.server.model_server:app --host 0.0.0.0 --port 5000

Start the Frontend API Server

After the model server is running, start the frontend server to handle client requests:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker ShareLMAPI.server.server:app --bind 0.0.0.0:8000
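
To confirm both servers are up before sending requests, you can probe the OpenAPI schema that FastAPI serves by default. A quick sketch, assuming neither server disables the default /openapi.json route:

import requests

# FastAPI exposes its OpenAPI schema at /openapi.json by default;
# an HTTP 200 from each URL confirms the server is reachable.
for name, url in [("model server", "http://localhost:5000"),
                  ("frontend API", "http://localhost:8000")]:
    resp = requests.get(f"{url}/openapi.json", timeout=5)
    print(f"{name}: HTTP {resp.status_code}")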

Docker Guide

If you want to use Docker to run ShareLMAPI, follow these steps:

1. Build Docker Image

Run the following command in the directory containing the Dockerfile to build the Docker image:

docker build -t sharelmapi .

This will create a Docker image named sharelmapi.

2. Run Docker Container

After building, use the following command to run the container:

docker run -p 5000:5000 -p 8000:8000 sharelmapi

This will start the container and map ports 5000 and 8000 from the container to the corresponding ports on the host.

3. Access the API

You can now access the API via http://localhost:8000, just like in a non-Docker environment.

Notes

  • Ensure that the model settings in your model_config.yaml file are suitable for running in a Docker environment.
  • Consider using Docker volumes if you need to persist data or configurations.
  • For large models, ensure your Docker host has sufficient resources (especially GPU support, if needed; see the check below).
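
With the NVIDIA Container Toolkit installed on the host, GPU access generally requires passing --gpus all to docker run. A quick way to confirm that CUDA is visible from inside the container (assuming PyTorch is installed, which the CUDA loading paths require):

import torch

# Prints True only if the container can see a CUDA-capable GPU.
print("CUDA available:", torch.cuda.is_available())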

API Documentation

1. /generate_stream

Generate model responses and stream the results.

  • Method: POST
  • URL: http://localhost:8000/generate_stream
  • Parameters:
    • dialogue_history: List of dialogue messages (optional)
    • prompt: User input prompt (if dialogue history is not provided)
    • max_length: Maximum number of tokens to generate
    • temperature: Parameter to control generation randomness
    • generation_kwargs: Other generation parameters (optional)

2. /generate

Generate model responses without streaming.

  • Method: POST
  • URL: http://localhost:8000/generate
  • Parameters: Same as /generate_stream
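
If you prefer raw HTTP over the bundled client, the endpoints can be called with any HTTP library. Below is a minimal sketch using requests; the field names mirror the parameter list above, but the exact request and response schemas are not documented here, so treat the payload shape as an assumption and verify it against the server code:

import requests

payload = {
    "prompt": "Once upon a time",
    "max_length": 50,
    "temperature": 0.7,
    "generation_kwargs": {"top_p": 0.9},  # optional extra generation parameters
}

# Non-streaming endpoint: one complete response body.
resp = requests.post("http://localhost:8000/generate", json=payload)
print(resp.text)

# Streaming endpoint: consume the body incrementally as it arrives.
with requests.post("http://localhost:8000/generate_stream", json=payload, stream=True) as r:
    for chunk in r.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)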

Client Usage

Installation

pip install ShareLMAPI

Here's an example of how to use ShareLMAPI to call the API:

from ShareLMAPI.client import ShareLMAPIClient

# Create API client
client = ShareLMAPIClient(base_url="http://localhost:8000")

# Streaming generation
for chunk in client.generate_text("Once upon a time", max_length=50, streamer=True):
    print(chunk, end='', flush=True)

# Non-streaming generation
response = client.generate_text("What is the capital of France?", max_length=50, streamer=False)
print(response)

# Using dialogue history
dialogue_history = [
    {"role": "user", "content": "Hello, who are you?"},
    {"role": "assistant", "content": "I'm an AI assistant. How can I help you today?"},
    {"role": "user", "content": "Can you explain quantum computing?"}
]
response = client.generate_text(dialogue_history=dialogue_history, max_length=200, streamer=False)
print(response)

Testing

Run the following command in the project root directory to execute tests:

pytest -s tests/test_client.py

This will run the tests and display their output.

Contributing

Contributions of any form are welcome. Please follow these steps:

  1. Fork this repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is open-sourced under the MIT License. See the LICENSE file for more details.
