CPU-friendly sequence-only CRISCross off-target prediction (scikit-learn-style API). Ships compressed weights in-wheel; optional genome scanning via criscross.offinder needs a system OpenCL runtime (see README).

criscross

CPU-friendly sequence-only CRISCross off-target prediction with a scikit-learn-style API. The pretrained model weights ship inside the wheel (fp16 + zstd-compressed) so no extra downloads are required.

Install

pip install criscross

Optional extra: use pip install "criscross[offinder]" when you intend to use genome scanning (quote the argument in shells such as zsh, where square brackets are glob characters). The extra is a marker declared under PEP 621 optional-dependencies and pulls in no additional Python packages; OpenCL drivers must still be installed separately on the system. See the Python Packaging User Guide on optional dependencies.

CPU-only install (no CUDA libraries pulled in):

pip install criscross --extra-index-url https://download.pytorch.org/whl/cpu

If you plan to use criscross.offinder (important)

pip install criscross is enough for model inference (sequence_model.predict(...)).

If you also want genome scanning via criscross.offinder.prepare(...), you must install an OpenCL runtime on the machine, because Cas-OFFinder requires OpenCL even in CPU mode. The runtime is a system dependency rather than a Python package, so pip cannot install it: the offinder extra only makes the requirement discoverable, and the feature fails with an explicit error when it is used without the runtime.

Check your environment before a long scan:

from criscross.offinder import check_opencl, opencl_setup_instructions

status = check_opencl()
print(status)
if not status["ok"]:
    print(opencl_setup_instructions())

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install pocl-opencl-icd

Conda (cross-platform option):

conda install -c conda-forge pocl ocl-icd-system

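If OpenCL is missing, offinder.prepare(...) will fail with an error like clGetPlatformIDs Failed: -1001. The code -1001 is OpenCL's CL_PLATFORM_NOT_FOUND_KHR, meaning the loader found no installed platform.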
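As a complement to check_opencl(), the following is a minimal standard-library sketch for looking at the system directly. The function name probe_opencl and the default vendors directory are assumptions made for illustration; this is not how criscross performs its own check, and finding these files does not prove that a device is usable.

```python
import ctypes.util
import glob
import os


def probe_opencl(vendors_dir="/etc/OpenCL/vendors"):
    """Best-effort look for an OpenCL loader library and registered ICD files.

    Hypothetical helper, not part of criscross. A heuristic only.
    """
    # Name of the shared OpenCL loader library, or None if the linker cannot find it.
    library = ctypes.util.find_library("OpenCL")
    # On Linux, vendor runtimes (PoCL, Intel, NVIDIA, ...) register *.icd files here.
    icd_files = sorted(glob.glob(os.path.join(vendors_dir, "*.icd")))
    return {"library": library, "icd_files": icd_files}


if __name__ == "__main__":
    print(probe_opencl())
```

In a conda environment the ICD files are usually registered under the environment prefix (for example $CONDA_PREFIX/etc/OpenCL/vendors) rather than /etc, so pass that directory explicitly if the default finds nothing.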

Quickstart

from criscross import sequence_model, sample_input_path
import pandas as pd

# Single datapoint (dict)
prob = sequence_model.predict({
    "Guide_sequence":   "GCTCGGGGACACAGGATCCCTGG",     # 23 nt
    "off_target_512nt": "GCAG...TGCC",                 # placeholder: supply the full 512-nt window, reverse-complemented for the - strand
    "strand_id":        1,                             # 1 for + strand, 0 for - strand
})
print(prob)   # float in [0, 1]

# Dataset (DataFrame or path to CSV)
df = pd.read_csv(sample_input_path())
probs = sequence_model.predict(df)           # -> np.ndarray, shape [N]
probs = sequence_model.predict(sample_input_path())  # same

Preparing inputs from a genome scan (Cas-OFFinder)

If you have guide RNA(s) and a reference genome FASTA, you can generate the Guide_sequence/off_target_512nt/strand_id table with Cas-OFFinder and feed it directly into sequence_model.predict(...).

from criscross import offinder, sequence_model

X = offinder.prepare(
    guide_rnas=["GCTCGGGGACACAGGATCCCTGG"],
    fasta="reference.fasta",  # path to your reference FASTA (file or directory)
    pam="NGG",            # default
    max_mismatches=6,     # default
)

# X is a DataFrame you can pass straight to sequence_model.predict
probs = sequence_model.predict(X)

Requirements:

  • Cas-OFFinder needs an OpenCL runtime even for CPU mode. On Linux, the simplest CPU runtime is PoCL, e.g. conda install -c conda-forge pocl ocl-icd-system (or sudo apt install pocl-opencl-icd).
  • Cas-OFFinder 2.4.1 is bundled inside the criscross wheel. To use a different binary, set the CAS_OFFINDER environment variable to its path (or pass cas_offinder_path=).
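To make the override options above concrete, here is a small sketch of how a caller might decide which Cas-OFFinder binary is in effect. The function resolve_cas_offinder, the precedence order (explicit argument, then CAS_OFFINDER, then PATH) and the executable name cas-offinder are all assumptions for illustration; the README does not state how criscross resolves these internally.

```python
import os
import shutil


def resolve_cas_offinder(cas_offinder_path=None, env=None):
    """Return a user-supplied Cas-OFFinder path, or None if none is configured.

    Hypothetical helper. Assumed precedence: explicit argument,
    then the CAS_OFFINDER environment variable, then a PATH lookup.
    """
    env = os.environ if env is None else env
    if cas_offinder_path:
        return cas_offinder_path
    if env.get("CAS_OFFINDER"):
        return env["CAS_OFFINDER"]
    # shutil.which returns None when no such executable is on PATH.
    return shutil.which("cas-offinder")
```

A return value of None would correspond to using the binary bundled in the wheel.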

Accepted inputs to predict(X)

X                                                         Returned
dict / pandas.Series with the 3 required keys             float
(guide, off_target_512nt, strand_id) 3-tuple              float
pandas.DataFrame with the 3 required columns              np.ndarray, shape [N]
list of dicts                                             np.ndarray, shape [N]
str / pathlib.Path pointing to a CSV with the 3 columns   np.ndarray, shape [N]

Required columns/keys:

key                dtype           meaning
Guide_sequence     23-nt string    sgRNA guide sequence
off_target_512nt   512-nt string   candidate off-target window, already reverse-complemented for - strand
strand_id          int (0/1)       1 for + strand, 0 for -
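If you build the input table by hand rather than through offinder.prepare(...), a sketch like the one below can help enforce the column contract above. The helper names (reverse_complement, make_row, validate_row, write_rows) are hypothetical, and the sketch assumes your 512-nt window was extracted from the forward strand of the reference; how the window should be positioned around the candidate site is not specified here and is left to the caller.

```python
import csv

REQUIRED = ("Guide_sequence", "off_target_512nt", "strand_id")
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A<->T, C<->G, N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def validate_row(row):
    """Raise ValueError if a row violates the documented keys, lengths, or encoding."""
    missing = [key for key in REQUIRED if key not in row]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if len(row["Guide_sequence"]) != 23:
        raise ValueError("Guide_sequence must be 23 nt")
    if len(row["off_target_512nt"]) != 512:
        raise ValueError("off_target_512nt must be 512 nt")
    if row["strand_id"] not in (0, 1):
        raise ValueError("strand_id must be 0 or 1")


def make_row(guide, forward_window, strand):
    """Build one input row from a forward-strand window and a '+'/'-' strand sign."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # The table expects - strand windows to be reverse-complemented already.
    window = forward_window if strand == "+" else reverse_complement(forward_window)
    row = {
        "Guide_sequence": guide,
        "off_target_512nt": window,
        "strand_id": 1 if strand == "+" else 0,
    }
    validate_row(row)
    return row


def write_rows(rows, path):
    """Write rows to a CSV with exactly the three required columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=REQUIRED)
        writer.writeheader()
        writer.writerows(rows)
```

The resulting CSV path can then be given to sequence_model.predict(...), which accepts a str or pathlib.Path pointing to a CSV with these three columns.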

CLI

criscross predict --demo --out preds.csv
# or, with your own file:
criscross predict --csv path/to/your_input.csv --out preds.csv

If the input CSV also has a label column (0/1), AUPRC is printed to stderr.
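The README does not say which AUPRC estimator the CLI uses. A common choice is average precision, the mean of the precision values at the rank of each positive example; the pure-Python sketch below illustrates that estimator under this assumption and may differ from the CLI's number, particularly when scores are tied. The function name average_precision is hypothetical.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    Illustrative sketch only. Tied scores keep their input order here,
    whereas library implementations usually group tied thresholds.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Indices sorted by descending score (the sort is stable for ties).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

With the labels and scores in the example, the positives sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2.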

Loading a custom checkpoint

from criscross import sequence_model
sequence_model.load("path/to/my_model.pt")           # fp32 raw .pt
sequence_model.load("path/to/my_model.pt.zst")       # zstd-compressed fp16

Inspecting the model

sequence_model.config()    # hyperparameters used to build CRISCross(**config)
sequence_model.metadata()  # versions, training-time test_auprc, input/output signature, seed

Citation

If you use this package in research, please cite the upstream CRISCross work. This package is a CPU-only, sequence-only repackaging of that model.

