Incenp.Bioutils - Incenp.org’s utilities for computational biology
Incenp.Bioutils is a set of command line utilities and helper Python modules to deal with some computational biology tasks.
Utilities
SeqTool
SeqTool is intended as a “Swiss Army knife” to perform various operations on biological sequence files.
Among other things, it provides wrappers for the blast and dotter
alignment tools. The main benefit of those wrappers is that they can be
used with sequences in any format supported by Biopython’s SeqIO module,
whereas the original tools only read files in the FASTA format.
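The idea behind such a wrapper can be sketched as follows. This is only an illustration, not SeqTool's actual code: it assumes the BLAST+ blastn program (with its -query and -subject options) and Biopython's SeqIO.convert function, and the helper names blast_command and run_blast_any_format are hypothetical.

```python
import subprocess
import tempfile


def blast_command(query_fasta, subject_fasta, program="blastn"):
    """Build a BLAST+ command line comparing two FASTA files."""
    return [program, "-query", query_fasta, "-subject", subject_fasta]


def run_blast_any_format(query_path, query_format, subject_fasta):
    """Convert the query to FASTA, then hand it over to BLAST.

    Hypothetical helper: the query may be in any format known to
    Biopython's SeqIO (e.g. "genbank"), the subject is already FASTA.
    """
    # Imported here so that blast_command() stays usable without Biopython.
    from Bio import SeqIO

    # Write the converted query to a temporary FASTA file.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        SeqIO.convert(query_path, query_format, tmp, "fasta")

    result = subprocess.run(
        blast_command(tmp.name, subject_fasta),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The conversion step is what lifts the FASTA-only restriction: the alignment tool itself still only ever sees a FASTA file.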
SeqVault
SeqVault is a command-line interface to BioSQL databases.
SeqVault is intended to be used with a slightly modified version of the
BioSQL database schema (provided in the biosql directory), where every
biodatabase is associated with a three-letter prefix. That prefix is then
used to automatically assign accession numbers (of the form PRE_xxxxxx,
where PRE is the prefix) when importing sequences into the database.
However, SeqVault can also be used with pristine BioSQL databases.
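As an illustration of the PRE_xxxxxx form, here is a minimal sketch of how such an accession number could be built. It assumes that xxxxxx stands for a zero-padded, six-digit serial number, which the text above does not state explicitly; the function name format_accession is hypothetical and is not part of SeqVault.

```python
def format_accession(prefix, number, width=6):
    """Format an accession number of the form PRE_xxxxxx.

    Assumption: the xxxxxx part is a zero-padded decimal serial number.
    """
    if len(prefix) != 3 or not prefix.isalpha():
        raise ValueError("prefix must be exactly three letters")
    if not 0 <= number < 10 ** width:
        raise ValueError("number does not fit in the available digits")
    return f"{prefix}_{number:0{width}d}"


accession = format_accession("PRE", 42)
```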
Quickstart (example with PostgreSQL)
Create a new PostgreSQL user account and a new database::
    # createuser <username>
    # createdb -O <username> <dbname>
Initialize the newly created database by running the provided
biosql/biosqldb-pg.sql script::
    $ psql -h localhost -U <username> <dbname> < biosql/biosqldb-pg.sql
Next, create a plain-text file at $XDG_CONFIG_HOME/bioutils/databases.ini
with the following contents::
    [Server]
    host: localhost
    user: <username>
    password: <PostgreSQL user password>
    database: <dbname>
SeqVault is now ready to be used.
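To check that the configuration file is well-formed, it can be parsed with Python's standard configparser module, which accepts the "key: value" syntax used above. The sketch below is not SeqVault's own loading code. The fallback to ~/.config when XDG_CONFIG_HOME is unset follows the XDG Base Directory convention and is an assumption about how the path is resolved, and the values in EXAMPLE are placeholders.

```python
import configparser
import os
from pathlib import Path


def databases_ini_path():
    """Return the expected location of databases.ini."""
    # Assumption: fall back to ~/.config when XDG_CONFIG_HOME is unset or empty.
    base = os.environ.get("XDG_CONFIG_HOME") or os.path.join(Path.home(), ".config")
    return Path(base) / "bioutils" / "databases.ini"


def read_server_settings(text):
    """Parse the [Server] section from the contents of databases.ini."""
    # Interpolation is disabled so that a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["Server"])


# Placeholder values, for illustration only.
EXAMPLE = """\
[Server]
host: localhost
user: alice
password: s3cr%t
database: seqdb
"""

settings = read_server_settings(EXAMPLE)
```

For a real file, the same function can be applied to databases_ini_path().read_text().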
cc3d-runner
CC3d-Runner is a small wrapper around the command-line interface of CompuCell3D, used to run simulations in headless mode.
Helper modules
The main helper module is the incenp.bio.seq.usa module, which
provides an implementation of the Uniform Sequence Address format.
A Uniform Sequence Address (USA) is a way to refer to a biological sequence and to fetch that sequence from its source.
The following example shows how to read a sequence specified by a USA::
    from incenp.bio.seq import usa
    records = usa.read_usa('genbank::file.gb:acc:X65923')
The returned records object is an array of Bio.SeqRecord.SeqRecord objects.
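To make the structure of the example address more visible, the sketch below splits it into its visible parts: a format name before the double colon, then colon-separated components. This is only a reading aid based on the single example above. It is not the parser used by incenp.bio.seq.usa, it does not cover the full USA syntax, and the function name split_usa is hypothetical.

```python
def split_usa(usa):
    """Split a USA string into (format, source, remaining parts).

    Assumption: an optional format name is separated by '::' and the
    remaining components are separated by single colons.
    """
    fmt, sep, rest = usa.partition("::")
    if not sep:
        # No explicit format given: the whole string is the address.
        fmt, rest = None, usa
    parts = rest.split(":")
    return fmt, parts[0], parts[1:]


fmt, source, query = split_usa("genbank::file.gb:acc:X65923")
```

Here fmt is the format name, source is the file name, and query holds the components that select an entry within that file.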
Prerequisites
The following Python packages are necessary:
ReportLab is required to run the plasmm subcommand of the seqtool utility.
The seqvault utility requires a SQL adapter for the database server
you plan to use it with, such as psycopg2 for PostgreSQL.
Copying
Incenp.Bioutils is distributed under the terms of the GNU General Public License, version 3 or later. The full license is included in the COPYING file of the source distribution.
Homepage and repository
The project is located at https://incenp.org/dvlpt/bioutils.html. The source code is available in a Git repository at https://git.incenp.org/damien/bioutils.
Hashes for incenp.bioutils-0.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 35bd352e71b3903521f9a9e53a9fc36f2c689b38c6317dfe1203c929ed256696
MD5 | 65620c1ebc487ffd82ea5047b7c8bb70
BLAKE2b-256 | 73a2069e35b2e55d00e4115b7b5500166cea66b9d71aacdd75b490de36615b44