
Project description

C Transformers

Python bindings for the Transformer models implemented in C/C++ using the GGML library.

Supported Models

Models              Model Type
GPT-2               gpt2
GPT-J, GPT4All-J    gptj
GPT-NeoX, StableLM  gpt_neox
Dolly V2            dolly-v2
StarCoder           starcoder

More models coming soon.

Installation

pip install ctransformers

Usage

It provides a unified interface for all models:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2')

print(llm('AI is going to'))

Run in Google Colab

If you are getting an illegal instruction error, try using lib='avx' or lib='basic':

llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2', lib='avx')

It provides a generator interface for more control:

tokens = llm.tokenize('AI is going to')

for token in llm.generate(tokens):
    print(llm.detokenize(token))

Since the generator interface operates directly on token IDs, it also allows you to use a custom tokenizer.
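
A minimal sketch of that, assuming the separate transformers package and its GPT-2 tokenizer (neither ships with ctransformers; any tokenizer that produces compatible token IDs works):

from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer  # assumed external dependency

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
tokenizer = AutoTokenizer.from_pretrained('gpt2')  # illustrative tokenizer choice

tokens = tokenizer.encode('AI is going to')  # tokenize outside ctransformers

output = []
for token in llm.generate(tokens):
    output.append(token)

print(tokenizer.decode(output))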

It also provides access to the low-level C API. See the Documentation section below.
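
Short of calling the C functions directly, the documented LLM methods already allow a manual evaluate-and-sample loop. A minimal sketch, using only eval(), sample() and detokenize() as listed in the Documentation section:

tokens = llm.tokenize('AI is going to')

llm.eval(tokens)              # evaluate the prompt tokens
token = llm.sample()          # sample the next token from the resulting logits
print(llm.detokenize(token))  # convert the sampled token back to text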

Hugging Face Hub

It can be used with models hosted on the Hub:

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

If a model repo has multiple model files (.bin files), specify a model file using:

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml', model_file='ggml-model.bin')

It can also be used with your own models uploaded to the Hub. For a better user experience, upload only one model per repo.

To use it with your own model, add a config.json file to your model repo specifying the model_type:

{
  "model_type": "gpt2"
}

You can also specify additional parameters under task_specific_params.text-generation:

{
  "model_type": "gpt2",
  "task_specific_params": {
    "text-generation": {
      "top_k": 40,
      "top_p": 0.95,
      "temperature": 0.8,
      "repetition_penalty": 1.1,
      "last_n_tokens": 64
    }
  }
}
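
These values presumably serve as generation defaults for the repo. The same parameters can also be passed per call, following the LLM.__call__ signature in the Documentation section below:

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

print(llm('AI is going to', top_k=40, top_p=0.95, temperature=0.8, max_new_tokens=256))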

See marella/gpt-2-ggml for a minimal example and marella/gpt-2-ggml-example for a full example.

LangChain

LangChain is a framework for developing applications powered by language models. A LangChain LLM object can be created using:

from ctransformers.langchain import CTransformers

llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2')

print(llm('AI is going to'))

If you are getting an illegal instruction error, try using lib='avx' or lib='basic':

llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2', lib='avx')

It can also be used with models hosted on the Hugging Face Hub:

llm = CTransformers(model='marella/gpt-2-ggml')

Additional parameters can be passed using the config parameter:

config = {'max_new_tokens': 256, 'repetition_penalty': 1.1}

llm = CTransformers(model='marella/gpt-2-ggml', config=config)
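
The recognized keys appear to be the generation parameters listed under Documentation below; the values here are purely illustrative:

config = {
    'top_k': 40,
    'top_p': 0.95,
    'temperature': 0.8,
    'repetition_penalty': 1.1,
    'last_n_tokens': 64,
    'max_new_tokens': 256,
    'batch_size': 8,
    'threads': 4,  # illustrative; defaults to auto-detection
}

llm = CTransformers(model='marella/gpt-2-ggml', config=config)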

It can be used with other LangChain modules:

from langchain import PromptTemplate, LLMChain

template = """Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=['question'])

llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run('What is AI?'))

Documentation

Parameters

Name                Type   Description                                                        Default
top_k               int    The top-k sampling parameter.                                      40
top_p               float  The top-p sampling parameter.                                      0.95
temperature         float  The temperature parameter.                                         0.8
repetition_penalty  float  The repetition penalty parameter.                                  1.0
last_n_tokens       int    Number of last tokens to use for repetition penalty.               64
seed                int    Seed for sampling tokens.                                          Random
max_new_tokens      int    Maximum number of new tokens to generate.                          256
reset               bool   Whether to reset the model state before evaluating a new prompt.  True
batch_size          int    Batch size for evaluating tokens.                                  8
threads             int    Number of threads to use.                                          Auto

class AutoModelForCausalLM


classmethod AutoModelForCausalLM.from_pretrained

from_pretrained(
    model_path_or_repo_id: 'str',
    model_type: 'Optional[str]' = None,
    model_file: 'Optional[str]' = None,
    config: 'Optional[AutoConfig]' = None,
    lib: 'Optional[str]' = None,
    **kwargs
) → LLM

class LLM

method LLM.__init__

__init__(
    model_path: str,
    model_type: str,
    config: Optional[ctransformers.llm.Config] = None,
    lib: Optional[str] = None
)

property LLM.config

property LLM.model_path

property LLM.model_type

method LLM.detokenize

detokenize(tokens: Union[Sequence[int], int]) → str

method LLM.eval

eval(
    tokens: Sequence[int],
    batch_size: Optional[int] = None,
    threads: Optional[int] = None
) → None

method LLM.generate

generate(
    tokens: Sequence[int],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    reset: Optional[bool] = None
) → Generator[int, NoneType, NoneType]

method LLM.is_eos_token

is_eos_token(token: int) → bool

method LLM.reset

reset() → None

method LLM.sample

sample(
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None
) → int

method LLM.tokenize

tokenize(text: str) → List[int]

method LLM.__call__

__call__(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    reset: Optional[bool] = None
) → str
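
As a note on the reset parameter: its description above suggests that passing reset=False lets a follow-up prompt continue from the existing model state instead of starting fresh. A hedged sketch of that usage:

llm('AI is going to', max_new_tokens=32)

# Assumed behavior per the parameter description: keep the prior state.
llm(' and the next step is', reset=False, max_new_tokens=32)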

License

MIT

Download files

Source Distribution

ctransformers-0.1.0.tar.gz (2.1 MB)

Built Distribution

ctransformers-0.1.0-py3-none-any.whl (2.1 MB)

File details

Details for the file ctransformers-0.1.0.tar.gz.

File metadata

  • Download URL: ctransformers-0.1.0.tar.gz
  • Size: 2.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for ctransformers-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       5bf3f253471edfcfe39db424f5d684b604610d29f1b8247e9f4c5098120eb29f
MD5          65a2562c86143298d80d0dec4a10bbc8
BLAKE2b-256  689c731f9158c2b84a8c32f17ad7e1c8e9195bed92d7d10003677388f41da799


File details

Details for the file ctransformers-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ctransformers-0.1.0-py3-none-any.whl
  • Size: 2.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for ctransformers-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       54e9c3621a27836f2205943bc1f0065e7380157b93fdad389cf92a30992445ec
MD5          aca8a6731a4aeea4d72046a1a263f20c
BLAKE2b-256  12d4fb66cdcf784a59de35be8e63d9f85cb75ca66b911d87516ffb32c13425bb

