TextSlinger: Fast and accurate text predictions in Python
This is a Python library for making text predictions using different types of language models. Current features:
- Predict the distribution over the next character given the previous text.
- Predict the most likely next words given the previous text and prefix of current word.
- Supports:
  - N-gram language models via KenLM.
  - Subword tokenized large language models (LLMs) via Hugging Face.
  - Byte tokenized LLMs via Hugging Face and Byte Latent Transformer.
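To make the two prediction tasks concrete, here is a minimal sketch that uses a toy character n-gram counter. It does not use TextSlinger's API: the class `ToyCharModel` and its methods are hypothetical names invented for this illustration. The word completion here also ignores the previous text and ranks by plain word frequency, which is a simplification of what the library describes.

```python
from collections import Counter, defaultdict


class ToyCharModel:
    """Character n-gram counts with add-one smoothing. Illustration only."""

    def __init__(self, corpus: str, order: int = 3):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        self.vocab = sorted(set(corpus))
        self.word_counts = Counter(corpus.split())
        # Map each context (up to order - 1 characters) to next-character counts.
        self.counts = defaultdict(Counter)
        for i, ch in enumerate(corpus):
            context = corpus[max(0, i - (order - 1)):i]
            self.counts[context][ch] += 1

    def next_char_distribution(self, text: str) -> dict:
        """Return P(next character | trailing context of text) over the vocabulary."""
        keep = self.order - 1
        context = text[-keep:] if keep > 0 else ""
        seen = self.counts.get(context, Counter())
        total = sum(seen.values()) + len(self.vocab)
        return {ch: (seen[ch] + 1) / total for ch in self.vocab}

    def complete_word(self, prefix: str, k: int = 3) -> list:
        """Return up to k known words starting with prefix, most frequent first."""
        matches = [(n, w) for w, n in self.word_counts.items() if w.startswith(prefix)]
        matches.sort(key=lambda pair: (-pair[0], pair[1]))
        return [w for _, w in matches[:k]]


model = ToyCharModel("the cat sat on the mat. the cat ate.")
dist = model.next_char_distribution("the cat sat on th")
best = max(dist, key=dist.get)
print(repr(best), round(dist[best], 3))
print(model.complete_word("c"))
```

The distribution always sums to one because every vocabulary character receives a smoothed count. A real language model replaces the counting step, but the shape of the output is the same: a probability for each candidate next character, and a ranked list of candidate words.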
Developer setup
Our code style is whatever the Black formatter says it should be. You should configure your IDE to format using Black when you save.
Setting up a Python environment
If Miniforge is not installed in your user account, install it first.
To install Miniforge on macOS with Apple Silicon:
curl -LO https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
zsh Miniforge3-MacOSX-arm64.sh
~/miniforge3/bin/conda init zsh
To install Miniforge on Linux:
curl -LO https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
After installing Miniforge, be sure to close your terminal and start a new one. Create an environment as follows:
conda config --remove-key channels
conda config --add channels conda-forge
conda config --set channel_priority strict
conda create -n textslinger python=3.10 -y
conda activate textslinger
Installation of PyTorch
macOS with Apple Silicon:
pip install torch torchvision torchaudio
Linux with CUDA support (the GPU driver must support the CUDA version of the installed library or a newer one; run nvidia-smi to check which CUDA version your driver supports):
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Linux without CUDA support:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Test if the PyTorch installation worked:
python - <<'EOF'
import torch
print("Torch version:", torch.__version__)
print("MPS available:", torch.backends.mps.is_available())
print("MPS built:", torch.backends.mps.is_built())
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("CUDA device count:", torch.cuda.device_count())
if torch.backends.mps.is_available():
    x = torch.randn(2, 2, device="mps")
    print("Tensor device:", x.device)
elif torch.cuda.is_available():
    x = torch.randn(2, 2, device="cuda")
    print("Tensor device:", x.device)
else:
    x = torch.randn(2, 2)
    print("Tensor device:", x.device)
EOF
Installation of libraries
Install transformers (5.2.0 or greater required).
pip install transformers
Check transformers version and model support:
python - <<'EOF'
import transformers
from transformers import __version__
from transformers.utils import is_torch_available
print("Version:", __version__)
print("File:", transformers.__file__)
# Check for BLT symbols that do NOT exist in stable 5.0.0
try:
    from transformers.models.blt.modeling_blt import BltModel
    print("BLT model available ✅")
except Exception as e:
    print("BLT model missing ❌", e)
EOF
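If you want the version requirement checked programmatically rather than by eye, the following sketch compares the installed version against the 5.2.0 minimum stated above. The helper names `parse_version`, `meets_minimum`, and `check_package` are made up for this example. The parser is deliberately simple: it reads only the leading integer of each dotted component, so a pre-release such as 5.2.0rc1 is treated the same as 5.2.0.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '5.2.0' or '5.3.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "5.2.0") -> bool:
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str = "transformers", minimum: str = "5.2.0") -> str:
    """Return a one-line status message for the named installed package."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name} is not installed"
    if meets_minimum(installed, minimum):
        return f"{name} {installed} satisfies >= {minimum}"
    return f"{name} {installed} is older than {minimum}"


print(check_package())
```

This check only reads package metadata, so it does not import transformers and runs quickly even in an environment where the import itself would fail.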
Install other dependencies:
pip install pytest scipy peft psutil datasets
# NOTE: increase MAX_ORDER if you plan to load n-gram models with longer context
MAX_ORDER=12 pip install https://github.com/kpu/kenlm/archive/master.zip
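To decide whether MAX_ORDER=12 is enough for a given model, you can read the n-gram order from the header of an ARPA text file. The sketch below relies on the standard ARPA layout, a `\data\` line followed by `ngram N=count` lines. It does not apply to KenLM binary files. The function names `arpa_order_from_lines` and `arpa_order` are hypothetical and not part of KenLM or TextSlinger.

```python
import gzip
import re
from pathlib import Path

NGRAM_LINE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)")


def arpa_order_from_lines(lines) -> int:
    """Return the highest n-gram order declared in an ARPA header."""
    order = 0
    in_header = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = NGRAM_LINE.match(line)
        if match:
            order = max(order, int(match.group(1)))
        elif line.startswith("\\"):
            break  # the first n-gram section begins, so the header is over
    if order == 0:
        raise ValueError("no ARPA header found")
    return order


def arpa_order(path) -> int:
    """Read the order from a plain or gzip-compressed ARPA file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return arpa_order_from_lines(handle)


sample = ["\\data\\", "ngram 1=5", "ngram 2=7", "ngram 3=4", "", "\\1-grams:"]
print(arpa_order_from_lines(sample))
```

If the reported order is larger than the MAX_ORDER used when KenLM was built, reinstall KenLM with a higher value before loading the model. Only the header is read, so the check is fast even for large files.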
Reinstall setuptools to fix a harmless warning message:
pip install --upgrade --force-reinstall setuptools
Testing installation
Download assets needed by the test suite and then run it:
cd textslinger/assets
./download.sh
cd ..
pytest -v -rs
This material is based upon work supported by the NSF under Grant Nos. IIS-1909089 and IIS-2402876.
File details
Details for the file textslinger-0.2.0.tar.gz.
File metadata
- Download URL: textslinger-0.2.0.tar.gz
- Upload date:
- Size: 47.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01b43e4e4605f13934088e216a08967780888d974bd2262c87bf06cdecac57cb |
| MD5 | 523b922dc7244b14d9eaa11f21ec4d78 |
| BLAKE2b-256 | f33b5f6da3016b015f2844a80ec12a7a216c21c905c59914b74d81bfe0d00d59 |
File details
Details for the file textslinger-0.2.0-py3-none-any.whl.
File metadata
- Download URL: textslinger-0.2.0-py3-none-any.whl
- Upload date:
- Size: 52.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc07db5f16d2bcc57c13589c2373e00670aeef9853926f44b8b9a4e5a5d3454f |
| MD5 | 7e7c740e12b931985713f1555d4cb197 |
| BLAKE2b-256 | 017eb27cab3ec100710e388e0e2667e15c094736fcac2271e52e9ecca63b1af7 |
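The SHA256 digests listed above can be used to verify a downloaded file. The sketch below uses only the standard library. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the file path in the commented usage assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming textslinger-0.2.0.tar.gz is in the current directory:
# ok = matches_published_digest(
#     "textslinger-0.2.0.tar.gz",
#     "01b43e4e4605f13934088e216a08967780888d974bd2262c87bf06cdecac57cb",
# )
# print("digest matches:", ok)
```

A mismatch means the file differs from the one that was published, for example because the download was truncated, and the file should be downloaded again.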