Inference Llama 2 in one file of pure Python

Project description

llama2.py

why this fork?

This repository is a fork that provides a Python-based implementation of llama2.c. Aimed at a broad audience, it is meant to be a straightforward "reference implementation" suitable for educational purposes.

The current llama2.c repository contains two Python files for model training and one C file for inference. This fork fills the gap on the inference side by offering a clear-cut reference implementation that keeps all of the transformer logic in a single, concise Python file of no more than 500 lines of code.
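
To give a feel for the style (this is an illustrative sketch, not code copied from llama2.py), such a pure-Python reference implementation builds on plain lists and loops; for example, an RMSNorm step can be written as:

import math

def rmsnorm(x, weight, eps=1e-5):
    # RMS-normalize a vector and scale it by a learned weight.
    # x and weight are plain Python lists of floats (no NumPy), as a
    # pure-Python reference implementation would use. Illustrative sketch,
    # not the exact function from llama2.py.
    ss = sum(v * v for v in x) / len(x)        # mean of squares
    inv = 1.0 / math.sqrt(ss + eps)            # inverse RMS
    return [w * (v * inv) for v, w in zip(x, weight)]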

Though the original Facebook/llama is also written in Python, its many dependencies and sophisticated optimizations make it fairly complex and hard to follow, particularly for those new to the field.

Please note that the current performance of this implementation is quite slow, at roughly 1 tok/sec, so there is ample room for significant optimization. Contributions that improve its efficiency are welcome.
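
For context on where the time goes: in pure Python, every matrix-vector product in the transformer becomes an interpreted double loop, roughly like the sketch below (an illustration of the technique, not necessarily the exact code in llama2.py), so each generated token costs millions of interpreted multiply-adds:

def matmul(xout, x, w, n, d):
    # Computes xout = W @ x, where W has shape (d, n) and is stored as a
    # flat list of length d*n. This interpreted inner loop is the hot spot
    # that keeps pure-Python inference at ~1 tok/sec.
    for i in range(d):
        val = 0.0
        row = i * n
        for j in range(n):
            val += w[row + j] * x[j]
        xout[i] = val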

feel the magic

First, navigate to the folder where you keep your projects and clone this repository into it:

git clone https://github.com/tairov/llama2.py.git

Then, open the repository folder:

cd llama2.py

Now, let's run a baby Llama 2 model in Python. First, download the model checkpoint:

wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin

Then run the script, passing the checkpoint file, the sampling temperature, the number of steps to generate, and a prompt:

python3 llama2.py stories15M.bin 0.8 256 "Dream comes true this day"
<s>
Dream comes true this day. To their surprise. A big game was easy and everyone was going on the day. Jack and they were playing beneath: life, free, butter! There was the time to think of the universe. There was very happy, fun and the joy and the following down below of this day they were there was a lot of a wide, new camping.
Jack and they had happened. The town was the saving up above the camp of the waves shor of their laughter, friendly journey of friendship to one. The night sky show of the end. Little ceremony, happy again.
<s>
 Once upon his family of a big day when Jack. They were filled foreshadowed happy and they were the joy filled this, different: the King of their appreciation they were to a wave to the spring limit. They were becoming Ruby, happy and the sunset of life of an amazing friendship and he had a robot.
<s>
 Once upon a 4, happy to the wonderful experience of the celebration of their friendship. Even the playground.
Jack and Sammy fishing adventure foreshium of a wishing being free time, happy. The generous adventure foreshly made it. The chance to
achieved tok/s: 1.3463711338028914

use as a package

PyPI: llama2-py

pip install llama2-py
>>> import llama2
>>> llama2.run({"checkpoint": "out/model.bin", "temperature": 0.0, "steps": 256, "prompt": None})
<s>
Once upon a time, there was...
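
The same entry point can be called from a script; below is a hedged sketch assuming the stories15M.bin checkpoint from the previous section sits in the current directory:

import llama2

# Greedy decoding (temperature 0.0) from a fixed prompt; the keys mirror
# the run() call shown above. Paths and values here are only an example.
llama2.run({
    "checkpoint": "stories15M.bin",   # checkpoint downloaded earlier with wget
    "temperature": 0.0,               # 0.0 = deterministic / greedy sampling
    "steps": 256,                     # maximum number of tokens to generate
    "prompt": "Once upon a time",     # or None to sample unconditionally
})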

performance

Performance is awful at the moment: on my Mac M1 Max, ~1.3 tok/sec.

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama2.py-0.0.3.tar.gz (8.8 kB)

Uploaded Source

Built Distributions

llama2_py-0.0.3-py3-none-any.whl (8.6 kB)

Uploaded Python 3

llama2.py-0.0.3-py3-none-any.whl (3.8 kB)

Uploaded Python 3

File details

Details for the file llama2.py-0.0.3.tar.gz.

File metadata

  • Download URL: llama2.py-0.0.3.tar.gz
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for llama2.py-0.0.3.tar.gz
  • SHA256: 70374f365959d8f672d8f88ee740443056326eb7cafb95b875db79efdc9957fb
  • MD5: 04fabd25028b85ec94e3a56280300a17
  • BLAKE2b-256: 4bd803195ccd603db865ab5cfdfa1f7a5e8ccb7256b32de43041edf67a6d30e8

See more details on using hashes here.
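
If you want to verify a downloaded sdist against the SHA256 digest above, a minimal check using only the standard library looks like this (the filename is assumed to match the listing):

import hashlib

# Expected digest, copied from the hash listing above.
EXPECTED = "70374f365959d8f672d8f88ee740443056326eb7cafb95b875db79efdc9957fb"

with open("llama2.py-0.0.3.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == EXPECTED else "hash mismatch!")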

File details

Details for the file llama2_py-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: llama2_py-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for llama2_py-0.0.3-py3-none-any.whl
  • SHA256: 87d905f5bd714ad9d0665000a3c5e8751e280d08318fb3c819c75cb23deef8da
  • MD5: 072cc3f9005244084fd8195c14a260d4
  • BLAKE2b-256: 496c0f77787e8b21b6dabf9d4955086bd2f7125fbbb71798420ede6c7fd7dd86

See more details on using hashes here.

File details

Details for the file llama2.py-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: llama2.py-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 3.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for llama2.py-0.0.3-py3-none-any.whl
  • SHA256: 3acc0b74533a1459d7c750042ff89fc6f14593b54f3cf0ffe877d5a40bdf0625
  • MD5: 9200defd4a9bbe72267e0aa048cc849c
  • BLAKE2b-256: a1650e947bdb0f208bc77c71803dbdb4efcd0f669d7e8830b153aba4a6d1b842

See more details on using hashes here.
