
Falcon - PyTorch

Project description

Simple Falcon

A simple package for leveraging Falcon 180B and the Hugging Face ecosystem's tooling, including training and inference scripts, safetensors, integrations with bitsandbytes, PEFT, and GPTQ, assisted generation, RoPE scaling support, and rich generation parameters.

Installation

You can install the package using pip:

pip3 install simple-falcon

Usage

from falcon.main import Falcon


falcon = Falcon(
    temperature=0.5, 
    top_p=0.9, 
    max_new_tokens=500,
    quantized=True,
    system_prompt=""
)

prompt = "What is the meaning of the collapse of the wave function?"

falcon.run(prompt=prompt)  # the generated response is printed to the console

Documentation

The Falcon class provides a convenient interface for conversational agents built on transformer-based models from the Hugging Face ecosystem. It facilitates both single-turn and multi-turn conversations with pre-trained models and lets users customize inference settings such as temperature, top_p, and the token generation limit. It can also load 8-bit quantized models to reduce memory requirements.

Purpose

The main purpose of the Falcon class is to:

  • Make it easy to initialize and run generative language models.
  • Provide an efficient conversation interface with customizable inference settings.
  • Support both full-precision and 8-bit quantized models.
  • Manage conversation history in multi-turn scenarios.

Class Definition:

class Falcon:
    def __init__(
        self,
        *,
        model_id: str = "tiiuae/falcon-180B",
        temperature: float | None = None,
        top_p: float | None = None,
        max_new_tokens: int | None = None,
        quantized: bool = False,
        system_prompt: str | None = None,
    ):

Parameters:

  • model_id (str): Model identifier from the HuggingFace Model Hub. Default is "tiiuae/falcon-180B".

  • temperature (float, optional): Sampling temperature. Higher values produce more random output; lower values make the output more deterministic.

  • top_p (float, optional): Nucleus sampling threshold. Sampling is restricted to the smallest set of top-ranked tokens whose cumulative probability reaches this value.

  • max_new_tokens (int, optional): Maximum number of tokens that can be generated in a single inference call.

  • quantized (bool): If set to True, the model loads in 8-bit quantized mode (see the sketch after this list). Default is False.

  • system_prompt (str, optional): Initial system prompt to set the context for the conversation.
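
The intro mentions a bitsandbytes integration. A minimal sketch of how quantized=True could map onto the standard transformers 8-bit loading API (an illustration under that assumption, not the package's confirmed internals):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-180B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit weights via bitsandbytes; requires a CUDA-capable GPU.
# Older transformers releases may also need trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # spread layers across the available GPUs
)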

Method Descriptions:

1. run:

def run(self, prompt: str) -> None:

Generates a response based on the provided prompt.

Parameters:

  • prompt (str): Input string to which the model responds.

Returns: None. The response is printed to the console.
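
Since run prints rather than returns, its core amounts to a standard generate call. A hedged sketch of the equivalent transformers code (run_sketch is an illustrative helper, reusing the model and tokenizer objects from the loading sketch above):

import torch

def run_sketch(model, tokenizer, prompt, temperature=0.5, top_p=0.9, max_new_tokens=500):
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            do_sample=True,  # sample rather than greedy-decode
            temperature=temperature,
            top_p=top_p,
            max_new_tokens=max_new_tokens,
        )
    # Decode only the newly generated tokens and print them, matching
    # the documented behavior of printing to the console.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))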

2. chat:

def chat(self, message: str, history: list[tuple[str, str]], system_prompt: str | None = None) -> None:

Generates a response considering the conversation history.

Parameters:

  • message (str): User's current message to which the model will respond.

  • history (list[tuple[str, str]]): Conversation history as a list of tuples. Each tuple consists of the user's prompt and the Falcon's response.

  • system_prompt (str, optional): Initial system prompt to set the context for the conversation.

Returns: None. The response is printed to the console.
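
The package's exact prompt template is not documented here. As an illustration only, the history tuples could be flattened into a single prompt along these lines (build_chat_prompt and the "User:"/"Falcon:" labels are assumptions, not the package's confirmed format):

def build_chat_prompt(message: str, history: list[tuple[str, str]], system_prompt: str | None = None) -> str:
    # Flatten (user, assistant) turns into one prompt string.
    parts = [system_prompt] if system_prompt else []
    for user_turn, falcon_turn in history:
        parts.append(f"User: {user_turn}")
        parts.append(f"Falcon: {falcon_turn}")
    parts.append(f"User: {message}")
    parts.append("Falcon:")  # cue the model to continue as the assistant
    return "\n".join(parts)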

Usage Examples:

1. Single-turn conversation:

from falcon import Falcon

model = Falcon(temperature=0.8)
model.run("What is the capital of France?")

2. Multi-turn conversation with history:

from falcon import Falcon

model = Falcon(system_prompt="Conversational Assistant")
history = [
    ("Hi there!", "Hello! How can I assist you?"),
    ("What's the weather like?", "Sorry, I can't fetch real-time data, but I can provide general info.")
]
model.chat("Tell me a joke.", history)

3. Using quantized models:

from falcon import Falcon

model = Falcon(quantized=True)
model.run("Tell me about quantum computing.")

Mathematical Representation:

The Falcon class leverages a transformer-based generative language model for text generation. The mathematical process can be generalized as follows.

Given an input sequence \( x = [x_1, x_2, \ldots, x_n] \), greedy decoding predicts the next token \( x_{n+1} \) by:

\[ x_{n+1} = \arg\max_{x \in V} P(x \mid x_1, x_2, \ldots, x_n) \]

Where:

  • \( P \) is the probability distribution over the vocabulary \( V \) produced by the model.
  • The argmax operation selects the highest-probability token; with sampling enabled (as when temperature and top_p are set), the next token is instead drawn from this distribution.
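
The temperature and top_p parameters reshape this distribution before a token is drawn. In the standard formulation (general to sampling-based decoding, not specific to this package), with \( z_i \) denoting the model's raw logits:

% Temperature scaling: higher T flattens the distribution (more random),
% lower T sharpens it (more deterministic).
P_T(x_i \mid x_{1:n}) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}

% Nucleus (top-p) sampling: sort tokens by P_T in descending order, keep
% the smallest prefix V_p whose cumulative probability reaches p,
% renormalize over V_p, and sample from it:
\sum_{x_i \in V_p} P_T(x_i \mid x_{1:n}) \ge p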

Additional Information:

  • For best performance, it's recommended to use the Falcon class with CUDA-enabled devices. Ensure that your PyTorch setup supports CUDA (a quick check is shown after this list).

  • The Falcon class uses models from the HuggingFace model hub. Ensure you have an active internet connection during the first run as models will be downloaded.

  • If memory issues arise, consider reducing the max_new_tokens parameter or using quantized models.
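
A quick way to confirm the CUDA setup before loading a model (plain PyTorch, independent of this package):

import torch

# Verify that PyTorch can see a CUDA device before downloading weights.
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; running Falcon 180B on CPU is impractical.")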


License

MIT

Download files

Download the file for your platform.

Source Distribution

simple_falcon-0.0.7.tar.gz (5.3 kB, Source)

Built Distribution

simple_falcon-0.0.7-py3-none-any.whl (5.1 kB, Python 3)

File details

Details for the file simple_falcon-0.0.7.tar.gz.

File metadata

  • Download URL: simple_falcon-0.0.7.tar.gz
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for simple_falcon-0.0.7.tar.gz:

  • SHA256: 9505b4c2f9d29b406512d9ed47a677d77bfcb020fb2b8f10163969f450e99718
  • MD5: 76f3e7af8a394bc2ef973d42c9ee8f67
  • BLAKE2b-256: 47a119bb6324dd6ec40a6e8042890d2c242c96b1bc8915126c2bc3e12cb2fa2c


File details

Details for the file simple_falcon-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: simple_falcon-0.0.7-py3-none-any.whl
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for simple_falcon-0.0.7-py3-none-any.whl:

  • SHA256: bbcbfeb175ce8612949f96e2117da4586879038d21f8b2a264313d2d3c36c686
  • MD5: 7ed1101a62d34a4b7e5ec6151c294e3e
  • BLAKE2b-256: 696a29e5cb2877367a8ca21a856dfe4a606937f3c4b95bcd27949d6a6212fa9a

