LucaGPLM - The LUCA general purpose language model.

Project description

LucaGPLM


Installation

You can install the package from source using pip:

pip install .

Usage

Basic Model Usage

from lucagplm import LucaGPLMModel, LucaGPLMTokenizer

# Load model
model = LucaGPLMModel.from_pretrained("Yuanfei/lucavirus-large-step3.8M")
tokenizer = LucaGPLMTokenizer.from_pretrained("Yuanfei/lucavirus-large-step3.8M")

# Encode a nucleotide (gene) sequence
seq = "ATCG"
inputs = tokenizer(seq, seq_type="gene", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

# Encode a protein sequence
seq = "NSQTA"
inputs = tokenizer(seq, seq_type="prot", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
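In Hugging Face-style models, `last_hidden_state` typically has shape (batch_size, sequence_length, hidden_size), i.e. one vector per token. A common way to reduce this to a single per-sequence embedding is to mean-pool over the token axis. The sketch below is not part of the package API; it illustrates the pooling arithmetic with plain Python lists standing in for the tensor (`mean_pool` is a hypothetical helper name):

```python
def mean_pool(hidden_states):
    """Average token vectors into one embedding per sequence.

    hidden_states: nested lists of shape
    (batch_size, sequence_length, hidden_size).
    """
    pooled = []
    for seq in hidden_states:
        hidden_size = len(seq[0])
        pooled.append([
            sum(token[d] for token in seq) / len(seq)
            for d in range(hidden_size)
        ])
    return pooled

# Two sequences, three tokens each, hidden size 2
batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],
]
print(mean_pool(batch))  # [[3.0, 4.0], [2.0, 2.0]]
```

With real model outputs the same reduction is usually a one-liner, e.g. `outputs.last_hidden_state.mean(dim=1)` in PyTorch.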

Pretraining Model Usage

The package also includes a pretraining model with multiple pretraining heads for different tasks:

from lucagplm import LucaGPLMForPretraining, LucaGPLMTokenizer

# Load pretraining model
model = LucaGPLMForPretraining.from_pretrained("path/to/pretraining/model")
tokenizer = LucaGPLMTokenizer.from_pretrained("path/to/pretraining/model")

# Example usage with pretraining tasks
seq = "ATCGATCGATCG"
inputs = tokenizer(seq, seq_type="gene", return_tensors="pt")

# Forward pass with pretraining heads
outputs = model(**inputs)

# Access logits for different pretraining tasks
print("Available task logits:", list(outputs['logits'].keys()))

# Token-level tasks (e.g., masked language modeling)
if 'token_level' in outputs['logits']:
    for task_name, logits in outputs['logits']['token_level'].items():
        print(f"Token-level task '{task_name}' logits shape:", logits.shape)

# Span-level tasks
if 'span_level' in outputs['logits']:
    for task_name, logits in outputs['logits']['span_level'].items():
        print(f"Span-level task '{task_name}' logits shape:", logits.shape)

# Sequence-level tasks
if 'seq_level' in outputs['logits']:
    for task_name, logits in outputs['logits']['seq_level'].items():
        print(f"Sequence-level task '{task_name}' logits shape:", logits.shape)
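The traversal above depends only on the nested level -> task layout of `outputs['logits']`. A minimal stand-in (mock task names and shape strings, no model required) shows the same pattern; `collect_tasks` is a hypothetical helper, not package API:

```python
# Mock of the nested logits structure described above:
# top-level keys are task levels, inner keys are task names.
mock_logits = {
    "token_level": {"mlm": "(1, 12, 39)"},
    "span_level": {"ner": "(1, 12, 5)"},
    "seq_level": {"cls": "(1, 2)"},
}

def collect_tasks(logits):
    """Flatten the level -> task nesting into (level, task) pairs."""
    return [
        (level, task)
        for level, tasks in logits.items()
        for task in tasks
    ]

print(collect_tasks(mock_logits))
# [('token_level', 'mlm'), ('span_level', 'ner'), ('seq_level', 'cls')]
```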

Converting Old Models

The package includes a utility script to convert old LucaOneVirus checkpoints to the new LucaGPLM format:

Using the command-line tool:

# Convert without pretraining heads
lucagplm-convert --old-checkpoint /path/to/old/checkpoint --output-dir /path/to/new/model

# Convert with pretraining heads
lucagplm-convert --old-checkpoint /path/to/old/checkpoint --output-dir /path/to/new/model --with-pretraining-heads

Using the Python API:

from lucagplm.convert_model import convert_old_weights

# Convert without pretraining heads
convert_old_weights(
    old_checkpoint_path="/path/to/old/checkpoint",
    output_dir="/path/to/new/model",
    with_pretraining_heads=False
)

# Convert with pretraining heads
convert_old_weights(
    old_checkpoint_path="/path/to/old/checkpoint",
    output_dir="/path/to/new/model",
    with_pretraining_heads=True
)

Pretraining Tasks

The LucaGPLMForPretraining model includes multiple pretraining tasks organized into three levels:

  1. Token-level tasks: Tasks that operate on individual tokens

    • mlm: Masked Language Modeling
    • erc: Entity Recognition and Classification
    • pos: Part-of-Speech tagging
  2. Span-level tasks: Tasks that operate on spans of tokens

    • ner: Named Entity Recognition
    • sbo: Span Boundary Optimization
    • spr: Span Prediction and Recovery
  3. Sequence-level tasks: Tasks that operate on entire sequences

    • cls: Sequence Classification
    • sim: Sequence Similarity
    • gen: Sequence Generation

Each task has its own prediction head (classifier) that can be fine-tuned for specific downstream applications.
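Viewed as data, the catalogue above is a level -> task-code mapping. The sketch below mirrors that organization and resolves a task code to its level; the `PRETRAINING_TASKS` dict and `task_level` function are illustrative names (only the task codes and descriptions come from the list above), not part of the package:

```python
# Three-level task catalogue, transcribed from the list above.
PRETRAINING_TASKS = {
    "token_level": {
        "mlm": "Masked Language Modeling",
        "erc": "Entity Recognition and Classification",
        "pos": "Part-of-Speech tagging",
    },
    "span_level": {
        "ner": "Named Entity Recognition",
        "sbo": "Span Boundary Optimization",
        "spr": "Span Prediction and Recovery",
    },
    "seq_level": {
        "cls": "Sequence Classification",
        "sim": "Sequence Similarity",
        "gen": "Sequence Generation",
    },
}

def task_level(code):
    """Return the level a task code belongs to, or None if unknown."""
    for level, tasks in PRETRAINING_TASKS.items():
        if code in tasks:
            return level
    return None

print(task_level("sbo"))  # span_level
```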

Download files

Source Distribution

lucagplm-1.1.2.tar.gz (29.1 kB)

Uploaded Source

Built Distribution

lucagplm-1.1.2-py3-none-any.whl (27.8 kB)

Uploaded Python 3

File details

Details for the file lucagplm-1.1.2.tar.gz.

File metadata

  • Download URL: lucagplm-1.1.2.tar.gz
  • Upload date:
  • Size: 29.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for lucagplm-1.1.2.tar.gz:

  • SHA256: 09ce627eafdd630071c3863776aa32153548f9844030b16c93962d952b82b80a
  • MD5: 3d06d88301f8e5120dd870413efafe50
  • BLAKE2b-256: 589d9f55fdc5e14b9b52fd54ed7be21fa5eb56480e2ce330a0296fe2cce806fd

File details

Details for the file lucagplm-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: lucagplm-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 27.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for lucagplm-1.1.2-py3-none-any.whl:

  • SHA256: 7cb98ae87f95217e1a1ed08d81fddbcd49c4dbbf9950cf073184510ab2992fc6
  • MD5: 04e08598752a3a0e0b36c1a667375ae1
  • BLAKE2b-256: 9c5c844eba5d958872e07ffef6aebb1ee6b5248a243492cf97817a6ec9a70ee8
