LucaGPLM - The LUCA general purpose language model.
Installation
You can install the package from source using pip:
pip install .
Usage
Basic Model Usage
from lucagplm import LucaGPLMModel, LucaGPLMTokenizer
# Load model
model = LucaGPLMModel.from_pretrained("Yuanfei/lucavirus-large-step3.8M")
tokenizer = LucaGPLMTokenizer.from_pretrained("Yuanfei/lucavirus-large-step3.8M")
# Example usage: encode a nucleotide sequence
seq = "ATCG"
inputs = tokenizer(seq, seq_type="gene", return_tensors="pt")
outputs = model(**inputs)
# Encode a protein sequence with the same model
seq = "NSQTA"
inputs = tokenizer(seq, seq_type="prot", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
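The two calls above differ only in the seq_type argument. When nucleotide and protein sequences arrive mixed together, a small helper can choose the value before tokenizing. The sketch below is a hypothetical heuristic and not part of lucagplm: the name guess_seq_type and the nucleotide alphabet it checks are assumptions. A peptide made up only of A, C, G, T, or N would be misclassified, so pass seq_type explicitly whenever the type is known.

```python
# Hypothetical helper, not part of lucagplm.
# Assumption: nucleotide sequences use only these letters.
NUCLEOTIDE_LETTERS = frozenset("ACGTUN")


def guess_seq_type(seq: str) -> str:
    """Return "gene" for nucleotide-only sequences, otherwise "prot"."""
    letters = set(seq.upper())
    if not letters:
        raise ValueError("empty sequence")
    return "gene" if letters <= NUCLEOTIDE_LETTERS else "prot"


print(guess_seq_type("ATCG"))   # gene
print(guess_seq_type("NSQTA"))  # prot
```

The returned string can then be passed as seq_type to the tokenizer call shown above.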
Pretraining Model Usage
The package also includes a pretraining model with multiple pretraining heads for different tasks:
from lucagplm import LucaGPLMForPretraining, LucaGPLMTokenizer
# Load pretraining model
model = LucaGPLMForPretraining.from_pretrained("path/to/pretraining/model")
tokenizer = LucaGPLMTokenizer.from_pretrained("path/to/pretraining/model")
# Example usage with pretraining tasks
seq = "ATCGATCGATCG"
inputs = tokenizer(seq, seq_type="gene", return_tensors="pt")
# Forward pass with pretraining heads
outputs = model(**inputs)
# Access logits for different pretraining tasks
print("Available task logits:", list(outputs['logits'].keys()))
# Token-level tasks (e.g., masked language modeling)
if 'token_level' in outputs['logits']:
    for task_name, logits in outputs['logits']['token_level'].items():
        print(f"Token-level task '{task_name}' logits shape:", logits.shape)
# Span-level tasks
if 'span_level' in outputs['logits']:
    for task_name, logits in outputs['logits']['span_level'].items():
        print(f"Span-level task '{task_name}' logits shape:", logits.shape)
# Sequence-level tasks
if 'seq_level' in outputs['logits']:
    for task_name, logits in outputs['logits']['seq_level'].items():
        print(f"Sequence-level task '{task_name}' logits shape:", logits.shape)
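The three loops above follow the same pattern, so they can be folded into one pass over the nested mapping. The following is a minimal sketch, assuming outputs['logits'] is a two-level dictionary of the form {level: {task_name: tensor}} as the example suggests. The function summarize_logits is hypothetical, and the stand-in objects and their shapes are made-up placeholders rather than real model output.

```python
from types import SimpleNamespace


def summarize_logits(logits):
    """Flatten {level: {task: tensor}} into {(level, task): shape tuple}."""
    summary = {}
    for level, tasks in logits.items():
        for task_name, tensor in tasks.items():
            summary[(level, task_name)] = tuple(tensor.shape)
    return summary


# Stand-ins with a .shape attribute; the shapes are arbitrary placeholders.
# With a real model, pass outputs['logits'] instead.
fake_logits = {
    "token_level": {"mlm": SimpleNamespace(shape=(1, 14, 39))},
    "seq_level": {"cls": SimpleNamespace(shape=(1, 2))},
}

for (level, task), shape in summarize_logits(fake_logits).items():
    print(f"{level}/{task}: {shape}")
```

Because the helper only reads .shape, it works the same way on any tensor-like object.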
Converting Old Models
The package includes a utility for converting old LucaOneVirus checkpoints to the new LucaGPLM format. It is available both as a command-line tool and as a Python function.
Using the command-line tool:
# Convert without pretraining heads
lucagplm-convert --old-checkpoint /path/to/old/checkpoint --output-dir /path/to/new/model
# Convert with pretraining heads
lucagplm-convert --old-checkpoint /path/to/old/checkpoint --output-dir /path/to/new/model --with-pretraining-heads
Using the Python API:
from lucagplm.convert_model import convert_old_weights
# Convert without pretraining heads
convert_old_weights(
    old_checkpoint_path="/path/to/old/checkpoint",
    output_dir="/path/to/new/model",
    with_pretraining_heads=False
)
# Convert with pretraining heads
convert_old_weights(
    old_checkpoint_path="/path/to/old/checkpoint",
    output_dir="/path/to/new/model",
    with_pretraining_heads=True
)
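When several old checkpoints need converting, the command line can be assembled programmatically. The sketch below uses only the three flags shown above (--old-checkpoint, --output-dir, --with-pretraining-heads). The helper build_convert_command and the checkpoint paths are hypothetical, and the commands are printed rather than executed.

```python
import shlex
from pathlib import PurePosixPath


def build_convert_command(old_checkpoint, output_dir, with_pretraining_heads=False):
    """Return the lucagplm-convert argument list for one checkpoint."""
    cmd = [
        "lucagplm-convert",
        "--old-checkpoint", str(old_checkpoint),
        "--output-dir", str(output_dir),
    ]
    if with_pretraining_heads:
        cmd.append("--with-pretraining-heads")
    return cmd


# Hypothetical checkpoint locations, for illustration only.
checkpoints = [
    PurePosixPath("/path/to/old/checkpoint-a"),
    PurePosixPath("/path/to/old/checkpoint-b"),
]
output_root = PurePosixPath("/path/to/new")

for ckpt in checkpoints:
    cmd = build_convert_command(ckpt, output_root / ckpt.name, with_pretraining_heads=True)
    # Print the command; to run it, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting problems when paths contain spaces.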
Pretraining Tasks
The LucaGPLMForPretraining model includes multiple pretraining tasks organized into three levels:
- Token-level tasks: Tasks that operate on individual tokens
  - mlm: Masked Language Modeling
  - erc: Entity Recognition and Classification
  - pos: Part-of-Speech tagging
- Span-level tasks: Tasks that operate on spans of tokens
  - ner: Named Entity Recognition
  - sbo: Span Boundary Optimization
  - spr: Span Prediction and Recovery
- Sequence-level tasks: Tasks that operate on entire sequences
  - cls: Sequence Classification
  - sim: Sequence Similarity
  - gen: Sequence Generation
Each task has its own prediction head (classifier) that can be fine-tuned for specific downstream applications.
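To locate a given head in the model output, it helps to know which level a task belongs to. The sketch below records the task names listed above under the level keys used in the usage example (token_level, span_level, seq_level). The dictionary and the level_of function are illustrative assumptions, not objects exported by lucagplm.

```python
# Task names copied from the list above. The mapping itself is an
# illustrative assumption, not part of the lucagplm API.
PRETRAINING_TASKS = {
    "token_level": ("mlm", "erc", "pos"),
    "span_level": ("ner", "sbo", "spr"),
    "seq_level": ("cls", "sim", "gen"),
}


def level_of(task_name: str) -> str:
    """Return the level key under which a task's logits would be grouped."""
    for level, tasks in PRETRAINING_TASKS.items():
        if task_name in tasks:
            return level
    raise KeyError(f"unknown pretraining task: {task_name!r}")


print(level_of("mlm"))  # token_level
print(level_of("sbo"))  # span_level
```

Under this assumption, the logits for a task would be read as outputs['logits'][level_of(name)][name], provided the loaded checkpoint includes that head.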
File details
Details for the file lucagplm-1.1.1.tar.gz.
File metadata
- Download URL: lucagplm-1.1.1.tar.gz
- Upload date:
- Size: 29.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b507112910932b18fd3e3f999c7f342e2e33b79181fa6faaca1742d540f02e89 |
| MD5 | 12ee84b93abdba331271f9406eab0706 |
| BLAKE2b-256 | 94af8f866f0ceae423291d2cb12d41e11dc910609576b59551a4b77f56633db1 |
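A downloaded archive can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the Python standard library. The helper names sha256_of and matches_sha256 are hypothetical, and the archive is assumed to sit in the current directory.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest of lucagplm-1.1.1.tar.gz as listed in the table above.
EXPECTED_SHA256 = "b507112910932b18fd3e3f999c7f342e2e33b79181fa6faaca1742d540f02e89"
archive = "lucagplm-1.1.1.tar.gz"  # assumed to be in the current directory

if os.path.exists(archive):
    print("digest OK" if matches_sha256(archive, EXPECTED_SHA256) else "digest MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

The same helpers apply to the wheel file when given its own digest.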
File details
Details for the file lucagplm-1.1.1-py3-none-any.whl.
File metadata
- Download URL: lucagplm-1.1.1-py3-none-any.whl
- Upload date:
- Size: 27.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 982dd8838b5221278fb963604f9058169110333f69993c956da9abf57f097a35 |
| MD5 | fd0dcf058b9eaf402d6d8ce13f41467a |
| BLAKE2b-256 | 715c3ca06187a3bd2b2b34a39552027abb1b6cd7f11a33a2a96391e99937602e |