Rust-native toolkit for parsing, cleaning, writing, and sharding protein FASTA files.

rfasta

rfasta is a production-ready toolkit for parsing, cleaning, writing, and sharding protein FASTA files.

It provides:

  • A high-performance Rust library
  • A Python package for scripting and notebooks
  • A CLI for batch and pipeline workflows

Highlights

  • FASTA parsing from files or in-memory readers
  • Configurable cleanup policies for duplicate records and invalid residues
  • Deterministic shard generation for parallel processing
  • Python bindings via PyO3
  • CLI designed for operational workflows
  • Documentation for Rust, Python, and CLI usage

Installation

Python (recommended for end users)

pip install rfasta

Rust (library development)

Add to Cargo.toml:

[dependencies]
rfasta = "0.1"

Quick Start

CLI

Clean a FASTA file:

rfasta clean proteins.fasta \
  --non-unique-header \
  --duplicate-record remove \
  --invalid-sequence convert-remove \
  -o cleaned.fasta
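
To make the idea of a cleanup policy concrete, here is a minimal pure-Python sketch that does not use rfasta. It assumes a record is a hypothetical (header, sequence) pair, that a duplicate is an exact repeat of both fields, and that a residue is valid only if it is one of the 20 standard amino-acid codes. The actual behavior of `--duplicate-record remove` and `--invalid-sequence convert-remove` may differ (for example, by converting some residues before removal), so check the CLI documentation.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape for illustration; not rfasta's own type.
Record = Tuple[str, str]  # (header, sequence)

# Assumed validity rule: the 20 standard amino-acid one-letter codes.
VALID_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_records(records: Iterable[Record]) -> List[Record]:
    """Drop exact duplicate records and records with non-standard residues."""
    seen = set()
    cleaned = []
    for header, sequence in records:
        sequence = sequence.upper()  # normalize case before comparing
        if (header, sequence) in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add((header, sequence))
        if not set(sequence) <= VALID_RESIDUES:
            continue  # at least one invalid residue: drop the whole record
        cleaned.append((header, sequence))
    return cleaned


records = [("seq1", "ACDE"), ("seq1", "ACDE"), ("seq2", "ACDX"), ("seq3", "mktw")]
print(clean_records(records))
```

In this sketch the second `seq1` is dropped as a duplicate and `seq2` is dropped because `X` is not in the assumed alphabet.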

Split a large FASTA file into one-pass round-robin shards:

rfasta split proteins.fasta --output-dir shards --chunks 8
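
As a rough illustration of what one-pass round-robin sharding means, the sketch below assigns record number i to shard i modulo the chunk count while reading the input once. It is plain Python, not rfasta's implementation; the function name `round_robin_shards` and the (header, sequence) record shape are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape for illustration; not rfasta's own type.
Record = Tuple[str, str]  # (header, sequence)


def round_robin_shards(records: Iterable[Record], chunks: int) -> List[List[Record]]:
    """Distribute records over `chunks` shards in a single pass."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    shards: List[List[Record]] = [[] for _ in range(chunks)]
    for index, record in enumerate(records):
        # Record i always lands in shard i % chunks, so the same input
        # order always produces the same shards.
        shards[index % chunks].append(record)
    return shards


records = [(f"seq{i}", "ACDE") for i in range(10)]
shards = round_robin_shards(records, 3)
print([len(shard) for shard in shards])  # [4, 3, 3]
```

Because the assignment depends only on the record's position, the input never has to be counted or buffered first, and shard sizes differ by at most one record.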

Python

import rfasta

rows = rfasta.read_fasta("proteins.fasta", expect_unique_header=True, verbose=False)
rfasta.write_fasta(rows, "proteins.copy.fasta", line_length=60)
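
The `line_length=60` argument presumably follows the usual FASTA convention of wrapping sequence lines at a fixed width. The sketch below shows that convention in plain Python; `wrap_sequence` and `format_record` are hypothetical helpers written for this example and are not part of rfasta.

```python
from typing import List


def wrap_sequence(sequence: str, line_length: int = 60) -> List[str]:
    """Split a sequence into consecutive chunks of at most `line_length` characters."""
    if line_length < 1:
        raise ValueError("line_length must be at least 1")
    return [sequence[i:i + line_length] for i in range(0, len(sequence), line_length)]


def format_record(header: str, sequence: str, line_length: int = 60) -> str:
    """Render one FASTA record: a '>' header line followed by wrapped sequence lines."""
    lines = [">" + header] + wrap_sequence(sequence, line_length)
    return "\n".join(lines) + "\n"


# A short width makes the wrapping visible on a ten-residue sequence.
print(format_record("seq1", "ACDEFGHIKL", line_length=4), end="")
```

With a width of 4, the ten-residue sequence is written as three lines: `ACDE`, `FGHI`, and `KL`.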

Rust

use std::io::Cursor;

use rfasta::clean::{clean_sequences, CleanOptions, DuplicateAction, InvalidSequenceAction};
use rfasta::parse::{parse_fasta_reader, ParseOptions};
use rfasta::write::{write_fasta_writer, WriteOptions};

fn main() -> Result<(), rfasta::RfastaError> {
    // Parse two records from an in-memory buffer; no file on disk is needed.
    let input = b">seq1\nacdx\n>seq2\nTTTT\n";
    let records = parse_fasta_reader(Cursor::new(input), ParseOptions::default())?;

    // Apply the ConvertRemove policy to invalid sequences and fail on duplicate records.
    let cleaned = clean_sequences(
        records,
        &CleanOptions {
            invalid_sequence_action: InvalidSequenceAction::ConvertRemove,
            duplicate_record_action: DuplicateAction::Fail,
            ..CleanOptions::default()
        },
    )?;

    // Write the cleaned records to an in-memory buffer.
    let mut output = Vec::new();
    write_fasta_writer(&mut output, &cleaned, &WriteOptions::default())?;
    Ok(())
}

Documentation and Guides

Build the guide site locally:

pip install -r docs/requirements.txt
mkdocs build --strict

Local Performance Check

For a repo-local benchmark driver that does not require extra benchmark crates, run:

cargo run --release --example benchmark_driver


