Lightweight IO and conversion for bioinformatics file formats.

💻 bioino

Command-line tools and Python API for interconverting FASTA, GFF, and CSV.

bioino converts tables to FASTA, and GFF to tables. It also provides a Python API for reading, writing, and querying GFF and FASTA files.

Installation

The easy way

pip install bioino

From source

Clone the repository, then cd into it and run:

pip install -e .

Usage

Command line

Informational messages are written to stderr, so the converted output on stdout can be piped freely.

`gff2table`

Convert a GFF file to TSV (default) or CSV.

$ printf 'test_seq\ttest_source\tgene\t1\t10\t.\t+\t.\tID=test01;attr1=+\n' \
    | bioino gff2table 2>/dev/null
seqid   source  feature start   end     score   strand  phase   ID      attr1
test_seq        test_source     gene    1       10      .       +       .       test01  +

$ printf 'test_seq\ttest_source\tgene\t1\t10\t.\t+\t.\tID=test01;attr1=+\n' \
    | bioino gff2table -f CSV 2>/dev/null
seqid,source,feature,start,end,score,strand,phase,ID,attr1
test_seq,test_source,gene,1,10,.,+,.,test01,+

Pass --metadata / -m to include the GFF header as commented lines in the output.

`table2fasta`

Convert a CSV or TSV table of sequences to FASTA.

$ printf 'name\tseq\tdata\nSeq1\tAAAAA\tSome-info\n' \
    | bioino table2fasta -n name -s seq -d data 2>/dev/null
>Seq1 data=Some-info
AAAAA

Multiple --name columns are concatenated with _; multiple --description columns are formatted as key=value pairs separated by ;.
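
The naming rules above, together with the space-replacement rules listed in the help text under Detailed usage, can be sketched in plain Python. This is only an illustration of the described behaviour, not bioino's implementation; `fasta_header` is a hypothetical helper name.

```python
def fasta_header(row, names, descriptions=()):
    """Build a FASTA header line from one table row (a dict). Hypothetical helper."""
    # Name columns: joined with "_", spaces replaced with "-".
    name = "_".join(str(row[col]) for col in names).replace(" ", "-")
    # Description columns: "key=value" pairs joined with ";", spaces replaced with "_".
    desc = ";".join(f"{col}={row[col]}" for col in descriptions).replace(" ", "_")
    return f">{name} {desc}" if desc else f">{name}"


row = {"name": "Seq1", "seq": "AAAAA", "data": "Some-info"}
print(fasta_header(row, names=["name"], descriptions=["data"]))  # >Seq1 data=Some-info
print(row["seq"])                                                # AAAAA
```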

Detailed usage

usage: bioino [-h] [--version] {gff2table,table2fasta} ...

Interconvert some bioinformatics file formats.

options:
  -h, --help            show this help message and exit
  --version, -v         show program's version number and exit

Sub-commands:
  {gff2table,table2fasta}
    gff2table           Convert a GFF to a TSV file.
    table2fasta         Convert a CSV or TSV of sequences to a FASTA file.

usage: bioino gff2table [-h] [--format {TSV,CSV}] [--metadata]
                        [--output OUTPUT]
                        [input]

positional arguments:
  input                 Input file in GFF format. Default: stdin.

options:
  -h, --help            show this help message and exit
  --format {TSV,CSV}, -f {TSV,CSV}
                        Output format. Default: "TSV".
  --metadata, -m        Write GFF header as commented lines.
  --output OUTPUT, -o OUTPUT
                        Output file. Default: stdout.

usage: bioino table2fasta [-h] [--format {TSV,CSV}] [--sequence SEQUENCE]
                          --name [NAME ...] [--description [DESCRIPTION ...]]
                          [--worksheet WORKSHEET] [--output OUTPUT]
                          [input]

positional arguments:
  input                 Input table file (TSV, CSV, or XLSX). Default: stdin.

options:
  -h, --help            show this help message and exit
  --format {TSV,CSV}, -f {TSV,CSV}
                        Input format. Default: "TSV".
  --sequence SEQUENCE, -s SEQUENCE
                        Column to take sequence from. Default: "sequence".
  --name [NAME ...], -n [NAME ...]
                        Column(s) for sequence name. Concatenated with "_",
                        spaces replaced with "-". Required.
  --description [DESCRIPTION ...], -d [DESCRIPTION ...]
                        Column(s) for sequence description. Formatted as
                        "key=value" pairs separated by ";", spaces replaced
                        with "_". Default: omitted.
  --worksheet WORKSHEET, -w WORKSHEET
                        For XLSX files, the worksheet to read. Default: "Sheet 1".
  --output OUTPUT, -o OUTPUT
                        Output file. Default: stdout.

Python API

FASTA

FastaSequence is a dataclass holding a sequence name, description, and sequence string. FastaCollection wraps an iterable of FastaSequence objects.

>>> from bioino import FastaSequence, FastaCollection

>>> seq1 = FastaSequence("example", "This is a description", "ATCG")
>>> seq2 = FastaSequence("example2", "This is another sequence", "GGGAAAA")
>>> FastaCollection([seq1, seq2]).write()
>example This is a description
ATCG
>example2 This is another sequence
GGGAAAA

Read from a file handle or filename with FastaCollection.from_file():

>>> from io import StringIO
>>> buf = StringIO()
>>> FastaCollection([seq1, seq2]).write(buf)
>>> buf.seek(0)
0
>>> FastaCollection.from_file(buf).write()
>example This is a description
ATCG
>example2 This is another sequence
GGGAAAA
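
For readers unfamiliar with the format, the following is a minimal standard-library sketch of what reading FASTA involves: the header is split into a name and a description at the first space, and wrapped sequence lines are joined. `Record` and `parse_fasta` are hypothetical names for illustration and are not part of bioino.

```python
from dataclasses import dataclass
from io import StringIO


@dataclass
class Record:
    """Hypothetical stand-in for a FASTA record; not bioino's FastaSequence."""
    name: str
    description: str
    sequence: str


def parse_fasta(handle):
    """Yield one Record per '>' header, joining wrapped sequence lines."""
    name, description, chunks = None, "", []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield Record(name, description, "".join(chunks))
            # The name is everything up to the first space; the rest is the description.
            name, _, description = line[1:].partition(" ")
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield Record(name, description, "".join(chunks))


text = ">example This is a description\nAT\nCG\n>example2 This is another sequence\nGGGAAAA\n"
for record in parse_fasta(StringIO(text)):
    print(record.name, len(record.sequence))
```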

Build a FastaCollection from a Pandas DataFrame with FastaCollection.from_pandas(). The names columns are concatenated with name_sep (default _); descriptions columns are formatted as key=value pairs separated by desc_sep (default ;).

>>> import pandas as pd
>>> from bioino import FastaCollection

>>> df = pd.DataFrame(dict(
...     seq=['atcg', 'aaaa'],
...     title=['seq1', 'seq2'],
...     info=['SeqA', 'SeqB'],
...     score=[1, 2],
... ))
>>> FastaCollection.from_pandas(df, sequence='seq',
...                             names=['title'],
...                             descriptions=['info', 'score']).write()
>seq1 info=SeqA;score=1
atcg
>seq2 info=SeqB;score=2
aaaa
>>> FastaCollection.from_pandas(df, sequence='seq',
...                             names=['title', 'info'],
...                             descriptions=['score']).write()
>seq1_SeqA score=1
atcg
>seq2_SeqB score=2
aaaa

GFF

The GFF support attempts to conform to GFF3 but makes no guarantees.

Reading and writing

GffFile.from_file() accepts a file handle or filename and returns a GffFile that streams records lazily.

>>> from io import StringIO
>>> from bioino import GffFile

>>> lines = [
...     "##meta1 item1",
...     "#meta2  item2  comment",
...     "\t".join("test_seq test_source gene 1 10 . + . ID=test01;attr1=+".split()),
...     "\t".join("test_seq test_source gene 9 100 . + . Parent=test01;attr2=+".split()),
... ]
>>> gff = GffFile.from_file(StringIO("\n".join(lines)))
>>> gff.write()
##meta1 item1
#meta2  item2  comment
test_seq    test_source     gene    1       10      .       +       .       ID=test01;attr1=+
test_seq    test_source     gene    9       100     .       +       .       Parent=test01;attr2=+
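
To illustrate what lazy streaming means here, the sketch below uses a generator that reads one line at a time and distinguishes header lines (starting with `#`) from nine-column records. It is a standard-library illustration under those assumptions, not how GffFile is implemented; `stream_gff` and `GFF_COLUMNS` are hypothetical names.

```python
from io import StringIO

GFF_COLUMNS = ("seqid", "source", "feature", "start", "end",
               "score", "strand", "phase", "attributes")


def stream_gff(handle):
    """Lazily yield ("meta", text) for '#' lines and ("record", dict) otherwise."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            yield "meta", line
        else:
            # Split into at most nine fields so stray tabs stay in the last column.
            fields = line.split("\t", len(GFF_COLUMNS) - 1)
            yield "record", dict(zip(GFF_COLUMNS, fields))


lines = [
    "##meta1 item1",
    "test_seq\ttest_source\tgene\t1\t10\t.\t+\t.\tID=test01;attr1=+",
]
for kind, item in stream_gff(StringIO("\n".join(lines))):
    print(kind, item)
```

Because the generator only advances when the caller asks for the next item, a large file is never held in memory all at once.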

Converting to table

GffFile.to_csv() writes a flat table with one row per GFF line and one column for each of the eight standard GFF fields plus every unique attribute key. Use sep='\t' for TSV output.

>>> from io import StringIO
>>> from bioino import GffFile

>>> lines = [
...     "\t".join("TEST test gene 1 100 . + + ID=test001;comment=Test".split()),
...     "\t".join("TEST test gene 121 120 . + - ID=test001;tag=test_tag".split()),
... ]
>>> GffFile.from_file(StringIO("\n".join(lines))).to_csv()
seqid,source,feature,start,end,score,strand,phase,ID,comment,tag
TEST,test,gene,1,100,.,+,+,test001,Test,
TEST,test,gene,121,120,.,+,-,test001,,test_tag
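
The flattening step can be sketched with the standard library alone. The version below is an illustration under stated assumptions, not bioino's code: attribute keys are ordered by first appearance, missing attributes are written as empty cells, and `flatten` and `gff_to_csv` are hypothetical names.

```python
import csv
from io import StringIO

COLUMNS = ["seqid", "source", "feature", "start", "end", "score", "strand", "phase"]


def flatten(line):
    """Turn one nine-column GFF line into a flat dict of columns plus attributes."""
    *fields, attributes = line.rstrip("\n").split("\t")
    row = dict(zip(COLUMNS, fields))
    for pair in attributes.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            row[key] = value
    return row


def gff_to_csv(lines, sep=","):
    """Return a delimited table with the union of all attribute keys as columns."""
    rows = [flatten(line) for line in lines]
    # Header: the eight standard columns, then attribute keys in first-seen order.
    header = list(COLUMNS)
    for row in rows:
        for key in row:
            if key not in header:
                header.append(key)
    out = StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="",
                            delimiter=sep, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


lines = [
    "\t".join("TEST test gene 1 100 . + + ID=test001;comment=Test".split()),
    "\t".join("TEST test gene 121 120 . + - ID=test001;tag=test_tag".split()),
]
print(gff_to_csv(lines), end="")
```

Note that this sketch collects every row before writing, because the full set of attribute keys is only known once all lines have been seen.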

Interconversion

GffLine.from_dict() constructs a GffLine from a dictionary. Keys matching the standard GFF column names (seqid, source, feature, start, end, score, strand, phase) populate the columns; all other keys become attributes.

>>> from bioino import GffLine

>>> d = dict(seqid='TEST', source='test', feature='gene',
...          start=1, end=100, score='.', strand='+', phase='+')
>>> print(GffLine.from_dict(d))
TEST    test    gene    1       100     .       +       +

>>> d.update(dict(ID='test001', comment='This is a test'))
>>> GffLine.from_dict(d).write()
TEST    test    gene    1       100     .       +       +       ID=test001;comment=This is a test
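
The split between column keys and attribute keys described above can be sketched as follows. This is a standard-library illustration rather than the GffLine implementation; `dict_to_gff_line` is a hypothetical name, and filling missing columns with "." is an assumption of the sketch.

```python
COLUMNS = ("seqid", "source", "feature", "start", "end", "score", "strand", "phase")


def dict_to_gff_line(d):
    """Render a flat dict as one tab-separated GFF line. Hypothetical helper."""
    # Standard columns come first, in GFF order; "." stands in for missing ones
    # (an assumption of this sketch).
    columns = [str(d.get(col, ".")) for col in COLUMNS]
    # Every other key becomes a key=value attribute in the ninth column.
    attributes = ";".join(f"{k}={v}" for k, v in d.items() if k not in COLUMNS)
    if attributes:
        columns.append(attributes)
    return "\t".join(columns)


d = dict(seqid="TEST", source="test", feature="gene",
         start=1, end=100, score=".", strand="+", phase="+",
         ID="test001", comment="This is a test")
print(dict_to_gff_line(d))
```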

GffFile.as_dict() yields each line as a flat dictionary:

>>> from io import StringIO
>>> from bioino import GffFile

>>> lines = [
...     "TEST\ttest\tgene\t1\t100\t.\t+\t+\tID=test001;comment=Test",
...     "TEST2\ttest2\tgene\t101\t200\t.\t+\t+\tID=test002;comment=Test2",
... ]
>>> list(GffFile.from_file(StringIO("\n".join(lines))).as_dict())
[{'seqid': 'TEST', 'source': 'test', 'feature': 'gene', 'start': 1, 'end': 100,
  'score': '.', 'strand': '+', 'phase': '+', 'ID': 'test001', 'comment': 'Test'},
 {'seqid': 'TEST2', 'source': 'test2', 'feature': 'gene', 'start': 101, 'end': 200,
  'score': '.', 'strand': '+', 'phase': '+', 'ID': 'test002', 'comment': 'Test2'}]

Positional lookup

GffFile can build a per-chromosome interval index for fast positional annotation queries. Pass lookup=True to GffFile.from_file().

>>> from io import StringIO
>>> from bioino import GffFile

>>> lines = [
...     "\t".join(["chr1", "src", "gene", "10",  "50",  ".", "+", ".", "ID=g1;Name=geneA"]),
...     "\t".join(["chr1", "src", "gene", "100", "150", ".", "+", ".", "ID=g2;Name=geneB"]),
...     "\t".join(["chr2", "src", "gene", "20",  "80",  ".", "-", ".", "ID=g3;Name=geneC"]),
... ]
>>> gff = GffFile.from_file(StringIO("\n".join(lines)), lookup=True)

Query with lookup_at(seqid, pos), which returns a tuple of GffLine objects covering that position. Each returned line has locus_tag and offset attributes computed for that exact position.

# Gene body — offset from annotated start (+ strand) or end (- strand)
>>> r = gff.lookup_at('chr1', 30)
>>> r[0].attributes['locus_tag'], r[0].attributes['offset']
('geneA', 20)

# Intergenic — first half of gap attributed to upstream gene
>>> r = gff.lookup_at('chr1', 75)
>>> r[0].attributes['locus_tag'], r[0].attributes['offset']
('_down-geneA', 65)

# Intergenic — second half of gap attributed to downstream gene
>>> r = gff.lookup_at('chr1', 76)
>>> r[0].attributes['locus_tag'], r[0].attributes['offset']
('_up-geneB', 24)

# Up to 1000 bp past the last annotated feature is covered
>>> r = gff.lookup_at('chr1', 200)
>>> r[0].attributes['locus_tag'], r[0].attributes['offset']
('_down-geneB', 100)

# Each chromosome is indexed independently
>>> r = gff.lookup_at('chr2', 50)
>>> r[0].attributes['locus_tag'], r[0].attributes['offset']
('geneC', 30)

# Returns an empty tuple for unknown seqids or positions outside all intervals
>>> gff.lookup_at('chrX', 50)
()

The lookup index:

- handles multi-chromosome GFFs
- only indexes parent features (Name attribute present, no Parent attribute)
- ignores feature types region and repeat_region
- stores references to the original GffLine objects; offsets are computed on demand
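
The attribution rules shown in the chr1 examples can be reproduced with a small sketch built on bisect. Everything here is inferred from those examples and is an assumption rather than bioino's actual algorithm: it handles a single sequence of non-overlapping + strand genes, splits each gap at its integer midpoint, measures the 1000 bp tail from the end of the last gene, and returns None for positions before the first gene. `annotate` and `TAIL` are hypothetical names.

```python
from bisect import bisect_right

TAIL = 1000  # bp past the last feature that are still attributed to it


def annotate(genes, pos):
    """Return (locus_tag, offset) for pos, or None if pos is not covered.

    genes: list of (start, end, name), sorted by start, non-overlapping, + strand.
    """
    starts = [start for start, _, _ in genes]
    i = bisect_right(starts, pos) - 1  # last gene starting at or before pos
    if i < 0:
        return None  # before the first gene: not handled by this sketch
    start, end, name = genes[i]
    if pos <= end:
        return name, pos - start  # gene body: offset from annotated start
    if i + 1 < len(genes):
        next_start, _, next_name = genes[i + 1]
        midpoint = (end + next_start) // 2
        if pos <= midpoint:
            return f"_down-{name}", pos - start  # first half of the gap
        return f"_up-{next_name}", next_start - pos  # second half of the gap
    if pos - end <= TAIL:
        return f"_down-{name}", pos - start  # tail after the last gene
    return None


genes = [(10, 50, "geneA"), (100, 150, "geneB")]
for pos in (30, 75, 76, 200):
    print(pos, annotate(genes, pos))
```

Keeping only the sorted starts and resolving offsets at query time mirrors the point above that offsets are computed on demand rather than stored.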

Suggestions, issues, fixes

File an issue here.

Documentation

API reference at bioino.readthedocs.org.

Release history

0.0.3: May 9, 2026
0.0.2.post1: Mar 19, 2024
0.0.2: Feb 25, 2024
0.0.1.post1: May 23, 2023
0.0.1: May 17, 2023

Download files

Source Distribution

bioino-0.0.3.tar.gz (21.0 kB view details)

Uploaded May 9, 2026 Source

Built Distribution

bioino-0.0.3-py3-none-any.whl (21.2 kB view details)

Uploaded May 9, 2026 Python 3

File details

Details for the file bioino-0.0.3.tar.gz.

File metadata

Download URL: bioino-0.0.3.tar.gz
Upload date: May 9, 2026
Size: 21.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for bioino-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`be52b7c84009ca2305628117f742e8b69f192fb486c2eef5ced5ee07c493ba97`
MD5	`8a45a6d57994abf19d8ea5d8db3fba43`
BLAKE2b-256	`e39474640ab16f86ed841d0b1ac237b7c5c65e64593bba73de30a92d75c168a6`

File details

Details for the file bioino-0.0.3-py3-none-any.whl.

File metadata

Download URL: bioino-0.0.3-py3-none-any.whl
Upload date: May 9, 2026
Size: 21.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for bioino-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7f406518a5e5bf33a93b3a720bbd173dd2193c3be4358b4ea65c51956e0d6f3f`
MD5	`7791dd21a3718e6f92425551f6571dc4`
BLAKE2b-256	`bd1c85b5ea88fd3cf1de6d13e9b0967f9c9d097143d3b6ce105523c2dc16e47b`
