
Run large language models.

LLM Text Generation

Generate natural language text with large language models.

Installation

This project requires Python 3.10 or higher.

From PyPI

pip install llm-text-generation

From source

git clone git@github.com:ad-freiburg/llm-text-generation.git
cd llm-text-generation
pip install -e .

Usage

From Python
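
The Python API is described in more detail in the Documentation section below. As a minimal sketch of what usage from Python looks like (note that the name and signature of the generation method are assumptions here; see the Documentation section and the library source for the actual API):

from llm_text_generation import TextGenerator

# load the default pretrained model (downloaded automatically on first use)
gen = TextGenerator.from_pretrained()

# hypothetical generation call; the exact method name and signature may differ
print(gen.generate("The capital of Germany is"))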

From command line

After installation, the command llm-gen is available in your Python environment. It lets you use the text generation models directly from the command line. Below are examples of how to use llm-gen. See llm-gen -h for all options.

# print version
llm-gen -v

# list available models
llm-gen -l

# by default llm-gen tries to read from stdin, completes the input line by line,
# and prints the completed lines back out,
# so you can, for example, use text generation with pipes
echo "The capital of Germany is" | llm-gen
cat "path/to/input/file.txt" | llm-gen > output.txt

# complete a string passed directly on the command line
llm-gen -p "The capital of Germany is"

# complete a text file line by line and print the completed lines
llm-gen -f path/to/input/file.txt
# optionally specify an output file path where the completed lines are saved
llm-gen -f path/to/input/file.txt -o output.txt

# start an interactive text generation session
# where your input will be completed and printed back out
llm-gen -i

# start a text generation server with the following endpoints:
### /models [GET] --> output: available models as json 
### /info [GET] --> output: info about backend as json
### /generate [POST] input: some input text --> output: continuation of the input text
### /live [WS] websocket endpoint for live text generation (only single unbatched requests)
llm-gen --server <config_file>

### OPTIONS
### Pass the following flags to the llm-gen command to customize its behaviour
-m <model_name> # use a different text generation model than the default one 
--cpu # force execution on CPU, by default a GPU is used if available
--progress # display a progress bar (always on when a file is completed using -f)
-b <batch_size> # specify a different batch size
--batch-max-tokens <batch_max_tokens> # limit batch by a number of tokens and not by number of samples
-u # do not sort the inputs before completing
-e <experiment_dir> # specify the path to an experiment directory to load the model from 
                    # (equivalent to TextGenerator.from_experiment(experiment_dir) in Python API)
--force-download # force download of the text generation model even if it was already downloaded
--report # print a report on the runtime of the model after finishing the completion

Note: When first using llm-gen with a pretrained model, the model needs to be downloaded, so depending on your internet speed the command might take considerably longer.

Note: Loading the text generation model requires an initial startup time each time you invoke the llm-gen command. CPU startup time is around 1s, GPU startup time around 3.5s, so for small inputs or files you should probably pass the --cpu flag to force CPU execution for best performance.

See configs/server.yaml for an example server configuration file.
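
As a rough illustration of how a client might talk to the HTTP endpoints listed above, here is a minimal sketch using Python and the requests library. The host, port, and the payload/response format of /generate are assumptions and may differ from the actual server; check configs/server.yaml and the server code for the real schema.

import requests

# assumed host and port, determined by your server configuration
BASE_URL = "http://localhost:8000"

# list available models
print(requests.get(f"{BASE_URL}/models").json())

# info about the backend
print(requests.get(f"{BASE_URL}/info").json())

# generate a continuation for some input text
# (the JSON payload below is an assumption; consult the server code for the actual format)
response = requests.post(f"{BASE_URL}/generate", json={"text": "The capital of Germany is"})
print(response.json())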

Documentation

Use pretrained model

If you just want to use this project to generate text with a pretrained model, this is the recommended way.

from llm_text_generation import TextGenerator

gen = TextGenerator.from_pretrained(
    # pretrained model to load, get all available models from available_models(),
    # if None, loads the default model
    model=None,
    # the device to run the model on
    # ("cuda" by default)
    device="cuda",
    # optional path to a cache directory where downloaded models will be extracted to,
    # if None, we check the env variable TEXT_GENERATION_CACHE_DIR, if it is not set 
    # we use a default cache directory at <install_path>/api/.cache 
    # (None by default)
    cache_dir=None,
    # optional path to a download directory where pretrained models will be downloaded to,
    # if None, we check the env variable TEXT_GENERATION_DOWNLOAD_DIR, if it is not set 
    # we use a default download directory at <install_path>/api/.download
    # (None by default)
    download_dir=None,
    # force download of model even if it already exists in download dir
    # (False by default)
    force_download=False
)

When used for the first time with the command line interface or Python API, the pretrained model is downloaded automatically. However, you can also download our pretrained models first as zip files, put them in a directory on your local drive, and set TEXT_GENERATION_DOWNLOAD_DIR (or the download_dir parameter above) to this directory.
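
For example, assuming you have placed the downloaded model zip files in /data/llm-gen-models (a hypothetical path), you can point the library at that directory either by setting TEXT_GENERATION_DOWNLOAD_DIR=/data/llm-gen-models in your environment or via the download_dir parameter documented above:

from llm_text_generation import TextGenerator

# load the default model from a local download directory instead of downloading it again
gen = TextGenerator.from_pretrained(download_dir="/data/llm-gen-models")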

Use your own model

Once you have trained your own model, you can use it in the following way.

from llm_text_generation import TextGenerator

gen = TextGenerator.from_experiment(
    # path to the experiment directory that is created by your training run
    experiment_dir="path/to/experiment_dir",
    # the device to run the model on
    # ("cuda" by default)
    device="cuda"
)

Directory structure

The most important directories you might want to look at are:

configs -> (example yaml config files for training and server)
src -> (library code used by this project)

Docker

You can also run this project using Docker. Build the image using

docker build -t llm-text-generation .

If you have an older GPU, build the image using

docker build -t llm-text-generation -f Dockerfile.old .

By default, the entrypoint is set to the llm-gen command, so you can use the Docker image in the same way as described earlier.

You can mount /llm-gen/cache and /llm-gen/download to volumes on your machine so that you do not have to download the models every time.

# complete text
docker run llm-text-generation -p "completethisplease"

# complete file
docker run llm-text-generation -f path/to/file.txt

# start a server
docker run llm-text-generation --server path/to/config.yaml

# with volumes
docker run -v $(pwd)/.cache:/llm-gen/cache -v $(pwd)/.download:/llm-gen/download \
  llm-text-generation -p "completethisplease"

# optional parameters recommended when using a GPU:
# --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864

Note
----
Make sure you have Docker version >= 19.03, an NVIDIA driver,
and the NVIDIA container toolkit installed (see https://github.com/NVIDIA/nvidia-docker)
if you want to run the container with GPU support.

