SHARK layers and inference models for genai

SHARK Tank

WARNING: This is an early preview that is in progress. It is not ready for general use.

Lightweight, inference-optimized layers and models for popular genai applications.

This sub-project is a work in progress. It is intended to be a repository of layers, model recipes, and conversion tools from popular LLM quantization tooling.

Project Status

Examples

The repository will ultimately grow a curated set of models and tools for constructing them, but for the moment, it largely contains some CLI examples. These are all under active development and should not yet be expected to work.

Perform batched inference in PyTorch on a paged, Llama-derived LLM:

Note: Use `--device='cuda:0'` to run this inference on an AMD GPU. ROCm builds of PyTorch expose AMD GPUs through the `cuda` device type.

```
python -m sharktank.examples.paged_llm_v1 \
  --hf-dataset=open_llama_3b_v2_f16_gguf \
  --device='cuda:0' \
  "Prompt 1" \
  "Prompt 2" ...
```
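When the prompts come from a script rather than the shell, it can help to assemble the same command as an argument list. The following is a minimal sketch, not part of sharktank: `build_paged_llm_command` is a hypothetical helper, and it only uses the module path and the two flags shown in the command above.

```python
import shlex
import sys


def build_paged_llm_command(prompts, hf_dataset="open_llama_3b_v2_f16_gguf",
                            device="cuda:0"):
    """Return an argv list mirroring the paged_llm_v1 example command.

    Hypothetical helper: only the flags shown in the README are used.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    return [
        sys.executable, "-m", "sharktank.examples.paged_llm_v1",
        f"--hf-dataset={hf_dataset}",
        f"--device={device}",
        *prompts,  # each prompt is passed as its own positional argument
    ]


cmd = build_paged_llm_command(["Prompt 1", "Prompt 2"])
# Print a shell-quoted form for inspection; nothing is executed here.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the example, provided sharktank is installed and the dataset can be fetched. Using a list avoids shell quoting problems with prompts that contain spaces or quotes.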

Export an IREE-compilable batched LLM for serving:

```
python -m sharktank.examples.export_paged_llm_v1 \
  --hf-dataset=open_llama_3b_v2_f16_gguf \
  --output-mlir=/tmp/open_llama_3b_v2_f16.mlir \
  --output-config=/tmp/open_llama_3b_v2_f16.json
```

Generate sample input tokens for IREE inference/Tracy:

```
python -m sharktank.examples.paged_llm_v1 \
  --irpa-file=open_llama_3b_v2_f16.irpa \
  --tokenizer-config-json=tokenizer_config.json \
  --prompt-seq-len=128 \
  --bs=4 \
  --dump-decode-steps=1 \
  --max-decode-steps=1 \
  --dump-path='/tmp' \
  --device='cuda:0'
```

Dump parsed information about a model from a GGUF file:

```
python -m sharktank.tools.dump_gguf --hf-dataset=open_llama_3b_v2_f16_gguf
```

Package Python Release Builds

To build wheels for Linux:
```
./build_tools/build_linux_package.sh
```
That should produce `build_tools/wheelhouse/sharktank-{X.Y.Z}.dev0-py3-none-any.whl`, which can then be installed with:
```
python3 -m pip install build_tools/wheelhouse/sharktank-{X.Y.Z}.dev0-py3-none-any.whl
```

To build a wheel for your host OS/arch manually:

```
# Build sharktank.*.whl into the dist/ directory
#   e.g. `sharktank-3.0.0.dev0-py3-none-any.whl`
python3 -m pip wheel -v -w dist .

# Install the built wheel.
python3 -m pip install dist/*.whl
```
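The wheel file names above follow the standard wheel naming scheme: distribution, version, optional build tag, Python tag, ABI tag, and platform tag. The sketch below splits such a name into its parts; `parse_wheel_filename` and `WheelName` are illustrative names introduced here, not part of sharktank or pip.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelName(*parts)


print(parse_wheel_filename("sharktank-3.0.0.dev0-py3-none-any.whl"))
```

For the example name, the tags `py3`, `none`, and `any` mean the wheel is pure Python: it targets any Python 3 interpreter, has no ABI requirement, and is not tied to a platform.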

Release history

| Version | Date | Notes |
| --- | --- | --- |
| 3.8.0 | Oct 14, 2025 | This version |
| 3.7.0 | Sep 5, 2025 | |
| 3.6.0 | Jul 21, 2025 | |
| 3.5.1 | Jul 2, 2025 | |
| 3.5.0 | Jun 11, 2025 | |
| 3.4.0 | May 5, 2025 | |
| 3.3.0 | Mar 24, 2025 | |
| 3.2.0 | Feb 10, 2025 | |
| 3.1.0 | Jan 8, 2025 | |
| 3.0.0 | Nov 18, 2024 | |
| 2.9.2 | Nov 15, 2024 | |
| 2.9.1 | Nov 14, 2024 | |
| 2.9.0 | Nov 11, 2024 | |
| 0.1.dev3 | Apr 21, 2024 | Pre-release, yanked. Reason: Iterating on initial setup |
| 0.1.dev2 | Apr 20, 2024 | Pre-release, yanked. Reason: Iterating on initial setup |
| 0.1.dev1 | Apr 20, 2024 | Pre-release, yanked. Reason: Iterating on initial setup |

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

sharktank-3.8.0-py3-none-any.whl (479.8 kB)

Uploaded Oct 14, 2025 Python 3

File details

Details for the file sharktank-3.8.0-py3-none-any.whl.

File metadata

Download URL: sharktank-3.8.0-py3-none-any.whl
Upload date: Oct 14, 2025
Size: 479.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for sharktank-3.8.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `39c069a339f470bdd55d22bfd61442e7590711d41c5fce36211534a8aa01118b` |
| MD5 | `470b53004b5aaf47d59c73678b7ddc28` |
| BLAKE2b-256 | `00c8a683f401e1b2e9ea8d87dc693c51184c39b1ae625ea7a908a0afd3a5cd36` |
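The SHA256 digest listed above can be compared against a downloaded copy of the wheel. This is a minimal sketch using only the Python standard library; `sha256_of_file` and `verify_wheel` are illustrative names, and the local file path in the usage comment is an assumption.

```python
import hashlib

# SHA256 digest published for sharktank-3.8.0-py3-none-any.whl.
EXPECTED_SHA256 = "39c069a339f470bdd55d22bfd61442e7590711d41c5fce36211534a8aa01118b"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected.lower()


# Usage, assuming the wheel was downloaded to the current directory:
#   verify_wheel("sharktank-3.8.0-py3-none-any.whl")
```

pip can perform the same check at install time when a requirements file pins the package with `--hash=sha256:...` entries.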
